Predictor Problems
Purpose
If you have two variables X and Y, do you regress Y on X or X on Y?
> data(cars)
> head(cars)
  speed dist
1     4    2
2     4   10
3     7    4
4     7   22
5     8   16
6     9   10
> g <- lm(dist ~ speed, cars)
> summary(g)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511,  Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.490e-12

> g <- lm(speed ~ dist, cars)
> summary(g)

Call:
lm(formula = speed ~ dist, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-7.5293 -2.1550  0.3615  2.4377  6.4179

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  8.28391    0.87438   9.474 1.44e-12 ***
dist         0.16557    0.01749   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.156 on 48 degrees of freedom
Multiple R-squared: 0.6511,  Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.490e-12
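Notice that the two fitted slopes (3.9324 and 0.16557) are not reciprocals of each other, so the two regressions describe genuinely different lines. A quick check (this snippet is mine, not part of the original transcript) shows that the product of the two slopes equals R-squared, which is why both summaries report the same Multiple R-squared of 0.6511:

data(cars)
b.yx <- coef(lm(dist ~ speed, cars))["speed"]  # slope of Y on X
b.xy <- coef(lm(speed ~ dist, cars))["dist"]   # slope of X on Y
b.yx * b.xy                                    # equals R-squared, ~0.6511
summary(lm(dist ~ speed, cars))$r.squared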
What if there are errors in the measurement of X and Y?
Suppose there is random measurement error in the dependent variable:
> g <- lm(dist ~ speed, cars)
> summary(g)

Call:
lm(formula = dist ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-29.069  -9.525  -2.272   9.215  43.201

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -17.5791     6.7584  -2.601   0.0123 *
speed         3.9324     0.4155   9.464 1.49e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.38 on 48 degrees of freedom
Multiple R-squared: 0.6511,  Adjusted R-squared: 0.6438
F-statistic: 89.57 on 1 and 48 DF,  p-value: 1.490e-12

> g <- lm(I(dist + rnorm(50)) ~ speed, cars)
> summary(g)

Call:
lm(formula = I(dist + rnorm(50)) ~ speed, data = cars)

Residuals:
    Min      1Q  Median      3Q     Max
-28.599  -9.819  -2.601   9.365  43.668

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) -16.9210     6.7803  -2.496   0.0161 *
speed         3.8877     0.4169   9.326 2.36e-12 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 15.43 on 48 degrees of freedom
Multiple R-squared: 0.6444,  Adjusted R-squared: 0.637
F-statistic: 86.98 on 1 and 48 DF,  p-value: 2.363e-12
There is essentially no change in the parameter estimates: measurement error in the dependent variable is simply absorbed into the residuals, so the slope stays (approximately) unbiased and only the residual variance grows.
Now suppose instead there is random measurement error in the independent variable:
> g <- lm(dist ~ speed, cars)
> ge1 <- lm(dist ~ I(speed + rnorm(50)), cars)
> coef(ge1)
         (Intercept) I(speed + rnorm(50))
           -13.61135              3.70888
> ge1 <- lm(dist ~ I(speed + rnorm(50)), cars)
> coef(ge1)
         (Intercept) I(speed + rnorm(50))
          -14.768600             3.755804
> ge2 <- lm(dist ~ I(speed + 2 * rnorm(50)), cars)
> coef(ge2)
             (Intercept) I(speed + 2 * rnorm(50))
              -11.690288                  3.617516
> ge2 <- lm(dist ~ I(speed + 4 * rnorm(50)), cars)
> coef(ge2)
             (Intercept) I(speed + 4 * rnorm(50))
               0.5258011                 2.6991028
> ge2 <- lm(dist ~ I(speed + 6 * rnorm(50)), cars)
> coef(ge2)
             (Intercept) I(speed + 6 * rnorm(50))
               11.541484                  1.966775
As you can see, the slope becomes flatter and flatter as the measurement error in the independent variable increases.
One important thing to note is that there is a simple relationship between the attenuated slope and the true slope: with measurement error of variance sigma_delta^2 in the predictor X, the expected slope shrinks by the factor var(X) / (var(X) + sigma_delta^2). So if the standard deviation of the measurement error is small relative to the spread of the fixed predictor, this attenuation factor is close to 1, and one can safely ignore the bias induced by measurement error in the independent variable.
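A quick simulation (my own sketch, with made-up values for the true slope and the error standard deviation) checks this attenuation factor directly:

set.seed(1)
n     <- 1e5
beta  <- 3                              # true slope (assumed for the demo)
x     <- rnorm(n, sd = 2)               # true predictor, var(x) ~ 4
y     <- 1 + beta * x + rnorm(n)        # outcome with its own noise

sigma <- 1                              # sd of measurement error in x
x.obs <- x + rnorm(n, sd = sigma)       # we only get to observe a noisy x

fit <- lm(y ~ x.obs)
coef(fit)["x.obs"]                      # attenuated slope from the data
beta * var(x) / (var(x) + sigma^2)      # theoretical attenuation, ~2.4

With a large n, the fitted slope on the noisy predictor lands close to the theoretical value of roughly 3 * 4 / 5 = 2.4 rather than the true slope of 3.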
Controlled variables are hypothetical in finance. In the hard sciences, if you are experimenting in a lab, you have the choice of controlling a variable; in finance, one can't even contemplate such a setup.
Another takeaway is that, in an environment where variables can be controlled, whichever variable has the smaller estimation / measurement error should be taken as the independent variable when conducting the experiment.