Chap 6 - Playing with Outliers

Purpose: Examine the PFC - RECLTD pair and look for possible outliers in the data, based on the regression relationship between the two stocks.

> library(RSQLite)
> temp <- hdata[, c("RECLTD", "PFC")]
> dates <- hdata[, 1]
> rownames(temp) <- dates
> fit <- lm(RECLTD ~ PFC + 0, data = temp)
> summary(fit)
Call:
lm(formula = RECLTD ~ PFC + 0, data = temp)

Residuals:
     Min       1Q   Median       3Q      Max
-33.8711  -9.3639   0.5983   8.4083  25.7757

Coefficients:
    Estimate Std. Error t value Pr(>|t|)
PFC 0.922983   0.003482   265.1   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 12.36 on 211 degrees of freedom
Multiple R-squared: 0.997,      Adjusted R-squared: 0.997
F-statistic: 7.026e+04 on 1 and 211 DF,  p-value: < 2.2e-16

> plot(resid(fit), col = "blue", type = "l")
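The through-origin fit above has a simple closed form, which makes it easier to see what the hedge ratio and the residual series are. The following is a minimal sketch on synthetic prices; `sim`, `fit_sim`, and the simulated numbers are illustrative stand-ins, not the real hdata.

```r
# Synthetic stand-in for the two price series (NOT the real hdata).
set.seed(1)
n   <- 212
sim <- data.frame(PFC = 200 + cumsum(rnorm(n)))         # hypothetical PFC prices
sim$RECLTD <- 0.92 * sim$PFC + rnorm(n, sd = 10)        # hypothetical RECLTD prices

# Same model form as in the text: regression through the origin.
fit_sim <- lm(RECLTD ~ PFC + 0, data = sim)

# With no intercept, the least-squares slope is sum(x * y) / sum(x^2).
beta_manual <- sum(sim$PFC * sim$RECLTD) / sum(sim$PFC^2)

# The slope is the hedge ratio; the residuals are the spread of the pair.
hedge_ratio <- unname(coef(fit_sim)["PFC"])
spread      <- sim$RECLTD - hedge_ratio * sim$PFC
```

Here `spread` is the same series that `resid(fit_sim)` returns, so the residual plot is a plot of the pair's spread.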

Now look at some regression diagnostics.

> library(car)

Hat Values

> plot(hatvalues(fit), pch = 19, col = "blue", ylim = c(0, 0.1))
> abline(h = c(2, 3) * 2/212, lty = 2)

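None of the hat values exceed the 2h or 3h reference lines, so one can assume there are no high-leverage observations that unduly influence the hedge ratio of the pair. Note that the lines above take h = 2/212, the average hat value for a fit with an intercept; for the no-intercept fit used here the average is 1/212, so these cutoffs are lenient.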
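The reference lines are multiples of the average hat value, which equals the number of coefficients divided by the number of observations. A small sketch on synthetic data (the `sim` data frame and its values are assumptions for illustration only) shows this for a single regressor with no intercept, where each hat value also has a closed form.

```r
# Synthetic stand-in data (NOT the real hdata).
set.seed(2)
n   <- 212
sim <- data.frame(PFC = 200 + cumsum(rnorm(n)))
sim$RECLTD <- 0.92 * sim$PFC + rnorm(n, sd = 10)

fit_sim <- lm(RECLTD ~ PFC + 0, data = sim)
h       <- hatvalues(fit_sim)

# Closed form for one regressor and no intercept: h_i = x_i^2 / sum(x^2).
h_manual <- sim$PFC^2 / sum(sim$PFC^2)

# Hat values sum to the number of coefficients, so their mean is p / n.
p     <- length(coef(fit_sim))   # 1 for the no-intercept fit
h_bar <- p / n

# Observations above twice the average hat value (may be empty).
flagged <- which(h > 2 * h_bar)
```

Using `mean(hatvalues(fit))` for the reference level avoids hard-coding the coefficient count, which changes when the intercept is added or dropped.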

dfbetas - Function to compute, for each observation, the change in each coefficient (here the intercept and the hedge ratio) when that observation is removed, scaled by the coefficient's standard error.

> fit <- lm(RECLTD ~ PFC, data = temp)
> dfbs.fit <- dfbetas(fit)
> plot(dfbs.fit, pch = 19, col = "blue")

The problem here is too much data: dfbetas returns one value for each coefficient for each observation, i.e. a 212 x 2 matrix for this fit.
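One way to make the dfbetas output manageable is to keep only the column for the slope and compare it with a cutoff. The sketch below uses synthetic data and the common size-adjusted rule of thumb 2/sqrt(n); both the data and the choice of cutoff are assumptions, not part of the original analysis.

```r
# Synthetic stand-in data (NOT the real hdata).
set.seed(3)
n   <- 212
sim <- data.frame(PFC = 200 + cumsum(rnorm(n)))
sim$RECLTD <- 0.92 * sim$PFC + rnorm(n, sd = 10)

# Fit with an intercept, as in the dfbetas step of the text.
fit_sim  <- lm(RECLTD ~ PFC, data = sim)
dfbs_sim <- dfbetas(fit_sim)      # n x 2 matrix: "(Intercept)" and "PFC"

# Keep only the hedge-ratio column: one value per observation.
slope_dfbs <- dfbs_sim[, "PFC"]

# Rule-of-thumb cutoff for |dfbetas| (an assumed choice).
cutoff  <- 2 / sqrt(n)
flagged <- which(abs(slope_dfbs) > cutoff)
```

Passing `slope_dfbs` to `plot()` with `type = "h"` then gives one spike per observation, which is easier to read than the two-column scatter.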

> fit <- lm(RECLTD ~ PFC + 0, data = temp)
> plot(cooks.distance(fit), pch = 19, col = "blue")
> abline(h = 4/(212 - 1 - 1), lty = 2)

About 7 observations have a Cook's distance above the heuristic cutoff.
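Rather than counting points off the plot, the observations above the cutoff can be listed directly. This is a minimal sketch on synthetic data with hypothetical dates as row names; it uses `cooks.distance` from the stats package and the same 4/(n - 1 - 1) heuristic as above.

```r
# Synthetic stand-in data with hypothetical dates (NOT the real hdata).
set.seed(4)
n   <- 212
sim <- data.frame(PFC = 200 + cumsum(rnorm(n)))
sim$RECLTD <- 0.92 * sim$PFC + rnorm(n, sd = 10)
rownames(sim) <- as.character(as.Date("2010-01-01") + seq_len(n) - 1)

fit_sim <- lm(RECLTD ~ PFC + 0, data = sim)

# Cook's distance per observation; names are taken from the row names.
cd     <- cooks.distance(fit_sim)
cutoff <- 4 / (n - 1 - 1)

# Observations above the cutoff, largest first.
flagged <- sort(cd[cd > cutoff], decreasing = TRUE)
```

Because the row names of the data are dates, `names(flagged)` gives the dates of the influential observations, which can then be checked against the residual plot.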