Purpose
The purpose of this script is to spell out, as usual from scratch, some connections between power, the F-test, the chi-square distribution, and multiple regression.

Let’s start off with Yule’s pauperism dataset.

The first few rows of the data and the result of the multiple regression are given below.

> head(x)
  dPaup dOut dOld dPop
1   -73  -95    4   36
2   -53  -88   15   11
3   -69  -79  -15   74
4   -36  -79  -19   24
5   -54  -82   13   -4
6   -48  -73    5   -9
> fit <- lm(dPaup ~ dOut + dOld + dPop, x)
> fit.sum <- summary(fit)
> fit.sum
Call:
lm(formula = dPaup ~ dOut + dOld + dPop, data = x)

Residuals:
    Min      1Q  Median      3Q     Max 
-17.475  -5.311  -1.829   3.132  25.335 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 12.88356   10.36722   1.243    0.224    
dOut         0.75209    0.13499   5.572 5.83e-06 ***
dOld         0.05560    0.22336   0.249    0.805    
dPop        -0.31074    0.06685  -4.648 7.25e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 9.547 on 28 degrees of freedom
Multiple R-squared:  0.6972, Adjusted R-squared:  0.6647 
F-statistic: 21.49 on 3 and 28 DF,  p-value: 2.001e-07
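
As an aside, the overall F-statistic reported in the summary tests the hypothesis that all three slopes are zero, and it can be recovered directly from R-squared; a quick check of the arithmetic:

> r2 <- fit.sum$r.squared
> (r2/3)/((1 - r2)/28)    # matches the reported F-statistic of 21.49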

If we write a, b, c, d for the coefficients of the model, one needs a way to test an arbitrary linear hypothesis, say c = d = 0. Let me illustrate a few ways to do it.

METHOD 1
The standard way is given in Johnston's book on page 92. Create a matrix R representing the two restrictions, so that the hypothesis becomes Rb = r. The statistic

F = (Rb - r)' [R (X'X)^-1 R']^-1 (Rb - r) / q / s^2,

where q = 2 is the number of restrictions and s^2 is the residual variance of the unrestricted model, then has an F(q, n - 4) distribution under the null.

> X <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
> beta <- fit.sum$coef[, 1]
> R <- rbind(c(0, 0, 1, 0), c(0, 0, 0, 1))
> r <- rbind(c(0), c(0))
> t1 <- t(R %*% beta - r)                                # (Rb - r)'
> t2 <- solve(R %*% (solve(crossprod(X, X))) %*% t(R))   # [R (X'X)^-1 R']^-1
> t3 <- R %*% beta - r                                   # Rb - r
> num <- (t1 %*% t2 %*% t3)/2                            # numerator, divided by q = 2
> denom <- deviance(fit)/df.residual(fit)                # s^2, the residual variance
> fvalue <- num/denom
> fvalue
         [,1]
[1,] 15.91833
> fvalue > qf(0.95, 2, 28)
     [,1]
[1,] TRUE

Here fvalue is greater than the critical value, so one can reject the null hypothesis that c = 0 and d = 0.
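
As a cross-check, the same Wald test is available off the shelf; a sketch, assuming the car package is installed. The first call should reproduce F = 15.918, and test = "Chisq" gives the asymptotic chi-square version of the Wald statistic (q*F on q degrees of freedom), which is the chi-square connection mentioned in the purpose:

> library(car)
> linearHypothesis(fit, c("dOld = 0", "dPop = 0"))
> linearHypothesis(fit, c("dOld = 0", "dPop = 0"), test = "Chisq")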

METHOD 2
Thanks to the Wald-statistic view, you can simply fit two models and test for a statistically significant difference between them using anova.

> fit1 <- lm(dPaup ~ dOut + dOld + dPop, x)
> fit.sum1 <- summary(fit1)
> X1 <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
> fit2 <- lm(dPaup ~ dOut, x)
> fit.sum2 <- summary(fit2)
> X2 <- cbind(rep(1, 32), x$dOut)
> anova(fit1, fit2)
Analysis of Variance Table

Model 1: dPaup ~ dOut + dOld + dPop
Model 2: dPaup ~ dOut
  Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
1     28 2551.9                                  
2     30 5453.5 -2   -2901.6 15.918 2.414e-05 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

You can see that the fvalue obtained is the same as that obtained in METHOD 1.
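
The same F-value can also be computed by hand from the two residual sums of squares printed in the table:

> rss1 <- deviance(fit1)                        # 2551.9, unrestricted model
> rss2 <- deviance(fit2)                        # 5453.5, restricted model
> ((rss2 - rss1)/2)/(rss1/df.residual(fit1))    # 15.918, as before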

METHOD 3
Thanks to David Freedman, there is another way to test the hypothesis, via the squared Euclidean length of the fitted values (see ?norm; note that base R's norm() defaults to the one-norm, so type "2" is passed below to get the Euclidean norm of a column vector).

> beta1 <- fit.sum1$coef[, 1]
> beta2 <- fit.sum2$coef[, 1]
> num2 <- (norm(X1 %*% beta1, "2")^2 - norm(X2 %*% beta2, "2")^2)/2
> fvalue <- num2/denom
> fvalue > qf(0.95, 2, 28)
[1] TRUE

Thus fvalue exceeds the critical value, and hence the null hypothesis that c = d = 0 is rejected.
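
Why does this work? By the Pythagorean theorem for least squares, the gain in the squared length of the fitted values equals the drop in the residual sum of squares, which can be verified numerically:

> sum((X1 %*% beta1)^2) - sum((X2 %*% beta2)^2)   # gain in explained sum of squares
> deviance(fit2) - deviance(fit1)                 # drop in RSS; both equal 2901.6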

METHOD 4
Use simulation to investigate the distribution of the F-statistic for testing the null hypothesis that c = d = 0.

> res <- data.frame(matrix(nrow = 1000, ncol = 1))
> beta <- matrix(c(8, 0.8, 0, 0), ncol = 1)
> sig <- 15
> i <- 1
> X <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
> for (i in 1:1000) {
+     Y <- X %*% beta + rnorm(32, 0, sig)    # simulate a response under the null (c = d = 0)
+     fit1 <- lm(Y ~ dOut + dOld + dPop, x)  # unrestricted model
+     fit.sum1 <- summary(fit1)
+     X1 <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
+     fit2 <- lm(Y ~ dOut, x)                # restricted model
+     fit.sum2 <- summary(fit2)
+     X2 <- cbind(rep(1, 32), x$dOut)
+     beta1 <- fit.sum1$coef[, 1]
+     beta2 <- fit.sum2$coef[, 1]
+     num2 <- (norm(X1 %*% beta1, "2")^2 - norm(X2 %*% beta2, "2")^2)/2
+     denom <- deviance(fit1)/df.residual(fit1)
+     res[i, 1] <- num2/denom                # store the F-statistic for this replication
+ }
> critical.val <- qf(0.95, 2, 28)
> length(which(res[, 1] > critical.val))/1000
[1] 0.052

The rejection rate is about 0.05. Since the data were generated with c = d = 0, i.e. under the null, this is just the level of the test, and it matches the nominal 5% as it should.
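
Indeed, under the null the statistic has an exact F(2, 28) distribution, so the theoretical rejection probability is exactly the nominal level:

> pf(qf(0.95, 2, 28), 2, 28, lower.tail = FALSE)
[1] 0.05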

Let's try another set of parameters, this time making c and d nonzero.

> res <- data.frame(matrix(nrow = 1000, ncol = 1))
> beta <- matrix(c(13, 0.8, 0.1, 0.3), ncol = 1)
> sig <- 10
> i <- 1
> X <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
> for (i in 1:1000) {
+     Y <- X %*% beta + rnorm(32, 0, sig)
+     fit1 <- lm(Y ~ dOut + dOld + dPop, x)
+     fit.sum1 <- summary(fit1)
+     X1 <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
+     fit2 <- lm(Y ~ dOut, x)
+     fit.sum2 <- summary(fit2)
+     X2 <- cbind(rep(1, 32), x$dOut)
+     beta1 <- fit.sum1$coef[, 1]
+     beta2 <- fit.sum2$coef[, 1]
+     num2 <- (norm(X1 %*% beta1, "2")^2 - norm(X2 %*% beta2, "2")^2)/2
+     denom <- deviance(fit1)/df.residual(fit1)
+     res[i, 1] <- num2/denom
+ }
> critical.val <- qf(0.95, 2, 28)
> length(which(res[, 1] > critical.val))/1000
[1] 0.994

The rejection probability is now huge: with c = 0.1 and d = 0.3 the alternative is true, and the test detects it 99.4% of the time. This rejection rate is the power of the test against this particular alternative.
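
The simulated power can be checked against theory. With X held fixed, the F-statistic has a noncentral F(2, 28) distribution whose noncentrality parameter is lambda = (R beta)' [R (X'X)^-1 R']^-1 (R beta) / sigma^2. A sketch, reusing the restriction matrix R from METHOD 1 (the result should land near the simulated 0.994):

> Rb <- R %*% beta
> ncp <- drop(t(Rb) %*% solve(R %*% solve(crossprod(X)) %*% t(R)) %*% Rb)/sig^2
> pf(qf(0.95, 2, 28), 2, 28, ncp = ncp, lower.tail = FALSE)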

OK, let's increase the sigma level and check the F-test again.

> res <- data.frame(matrix(nrow = 1000, ncol = 1))
> beta <- matrix(c(13, 0.8, 0.1, 0.3), ncol = 1)
> sig <- 25
> i <- 1
> X <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
> for (i in 1:1000) {
+     Y <- X %*% beta + rnorm(32, 0, sig)
+     fit1 <- lm(Y ~ dOut + dOld + dPop, x)
+     fit.sum1 <- summary(fit1)
+     X1 <- cbind(rep(1, 32), x$dOut, x$dOld, x$dPop)
+     fit2 <- lm(Y ~ dOut, x)
+     fit.sum2 <- summary(fit2)
+     X2 <- cbind(rep(1, 32), x$dOut)
+     beta1 <- fit.sum1$coef[, 1]
+     beta2 <- fit.sum2$coef[, 1]
+     num2 <- (norm(X1 %*% beta1, "2")^2 - norm(X2 %*% beta2, "2")^2)/2
+     denom <- deviance(fit1)/df.residual(fit1)
+     res[i, 1] <- num2/denom
+ }
> critical.val <- qf(0.95, 2, 28)
> length(which(res[, 1] > critical.val))/1000
[1] 0.36

What's the takeaway? Increasing the error variance leaves the level of the test at the nominal 5%, but drops the power from 0.994 (sigma = 10) to about 0.36 (sigma = 25): the level is fixed by the critical value, while the power depends on the noncentrality parameter, which shrinks as sigma squared grows.
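
To make that concrete, here is a sketch tracing theoretical power as a function of sigma using the noncentral-F formula above (it assumes beta, X, and the restriction matrix R from the earlier snippets are still in the workspace):

> power.at <- function(sig) {
+     Rb <- R %*% beta
+     ncp <- drop(t(Rb) %*% solve(R %*% solve(crossprod(X)) %*% t(R)) %*% Rb)/sig^2
+     pf(qf(0.95, 2, 28), 2, 28, ncp = ncp, lower.tail = FALSE)
+ }
> sapply(c(10, 25, 50, 100), power.at)   # power decays monotonically in sigma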