Should one use GLS for pairs

Purpose
All along I was using a simple regression model to find out the hedge factor in a pairs trade.

Yesterday after coming back from gym and while I was just thinking about structural change, an idea struck me. I know for certain that errors do not form a gaussian distribution. They are not standard normal gaussian realizations. However I was using the plain simple regression to find out the beta.

So the purpose of this beautiful day is to find out whether glm works in the context of pairs. Can the use of glm give bettter insight in to pairs ?

Let me investigate like a Poirot or Holmes ..heheh I am going to use gls to investigate the parameters

> library(nlme)
> y <- security.db1[, "AMBUJACEM"]
> x <- security.db1[, "GRASIM"]
> dataset <- data.frame(y = y, x = x)
> dataset$trade_date <- as.Date(z$trade_date)
> fit.gls <- gls(y ~ x, correlation = corAR1(), data = dataset)
> summary(fit.gls)
Generalized least squares fit by REML
  Model: y ~ x
  Data: dataset
       AIC      BIC    logLik
  1099.082 1113.199 -545.5408

Correlation Structure: AR(1)
 Formula: ~1
 Parameter estimate(s):
      Phi
0.9181475

Coefficients:
               Value Std.Error   t-value p-value
(Intercept) 40.74003  4.804454  8.479638       0
x            0.02263  0.001957 11.563727       0

 Correlation:
  (Intr)
x -0.95

Standardized residuals:
       Min         Q1        Med         Q3        Max
-2.0890921 -0.7468221 -0.0644149  0.6064483  2.3754263

Residual standard error: 5.150884
Degrees of freedom: 254 total; 252 residual
> dataset$er <- resid(fit.gls)
> pair <- "AMBUJACEM-GRASIM-GLM"
> p <- ggplot(dataset, aes(x = trade_date, y = er)) + scale_x_date()
> q <- p + geom_line(colour = "blue", lwd = 1.3)
> q <- q + geom_hline(yintercept = 0)
> q <- q + scale_x_date("Date")
> q <- q + scale_y_continuous("Spread")
> q <- q + opts(title = pair)
> print(q)

Use the old ols method

> dataset <- data.frame(y = y, x = x)
> dataset$trade_date <- as.Date(z$trade_date)
> fit <- lm(y ~ x, data = dataset)
> summary(fit)
Call:
lm(formula = y ~ x, data = dataset)

Residuals:
     Min       1Q   Median       3Q      Max
-12.7878  -2.7966  -0.2604   2.9760  12.5448

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 3.112e+01  1.572e+00   19.80   <2e-16 ***
x           2.666e-02  6.582e-04   40.51   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 4.308 on 252 degrees of freedom
Multiple R-squared: 0.8669,     Adjusted R-squared: 0.8663
F-statistic:  1641 on 1 and 252 DF,  p-value: < 2.2e-16
> dataset$er <- resid(fit)
> pair <- "AMBUJACEM-GRASIM- LM"
> p <- ggplot(dataset, aes(x = trade_date, y = er)) + scale_x_date()
> q <- p + geom_line(colour = "blue", lwd = 1.3)
> q <- q + geom_hline(yintercept = 0)
> q <- q + scale_x_date("Date")
> q <- q + scale_y_continuous("Spread")
> q <- q + opts(title = pair)
> q0 <- q
> print(q0)

> pushViewport(viewport(layout = grid.layout(1, 2)))
> vplayout <- function(x, y) viewport(layout.pos.row = x, layout.pos.col = y)
> print(q0, vp = vplayout(1, 1))
> print(q, vp = vplayout(1, 2))

So compare the coefficients

> coef(fit)
(Intercept)           x
31.12278855  0.02666309
> coef(fit.gls)
(Intercept)           x
40.74003244  0.02262649

Clearly the intercept is very very different and the hedge ratio does not change by much.

So, the usage of generalized least square does nothing to the prediction of hedge ratio. A better estimate of intercept shifts the spread vertically and hence there is actually no big change in the way pairs are traded

TAKEAWAY Don’t stretch yourself with a glm…Least squares would do.