Hypothesis testing - Multiple Regression
Purpose
Simulate a multiple regression and test a hypothesis about one of its coefficients.
> set.seed(1977)
> n <- 15000
> beta.actual <- matrix(c(2, 3, 4, 5), ncol = 1)
> beta.sample <- cbind(rnorm(n, beta.actual[1]), rnorm(n, beta.actual[2]),
+     rnorm(n, beta.actual[3]), rnorm(n, beta.actual[4]))
> error <- rnorm(n)
> x <- cbind(rep(1, n), runif(n), runif(n), runif(n))
> y <- x[, 1] * beta.sample[, 1] + x[, 2] * beta.sample[, 2] +
+     x[, 3] * beta.sample[, 3] + x[, 4] * beta.sample[, 4] + error
> summary(lm(y ~ x + 0))

Call:
lm(formula = y ~ x + 0)

Residuals:
      Min        1Q    Median        3Q       Max
-6.877817 -1.153599 -0.001764  1.160492  6.736600

Coefficients:
   Estimate Std. Error t value Pr(>|t|)
x1  1.95455    0.04478   43.65   <2e-16 ***
x2  3.02972    0.04889   61.97   <2e-16 ***
x3  3.98549    0.04898   81.37   <2e-16 ***
x4  5.09754    0.04900  104.03   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.737 on 14996 degrees of freedom
Multiple R-squared: 0.9578, Adjusted R-squared: 0.9578
F-statistic: 8.515e+04 on 4 and 14996 DF, p-value: < 2.2e-16
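Test the hypothesis b2 = 0.
As far as I can see, there are at least 3 ways to do it. The examples below use the savings data from the faraway package, where the corresponding null hypothesis is that the coefficient of pop15 is 0.
The first way is to compare the restricted and unrestricted models.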
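> library(faraway)
> g <- lm(sr ~ pop15 + pop75 + dpi + ddpi, savings)
> RSS0 <- sum(resid(g)^2)
> g.res <- lm(sr ~ pop75 + dpi + ddpi, savings)
> RSS1 <- sum(resid(g.res)^2)
> fstat <- ((RSS1 - RSS0)/1)/(RSS0/45)
> round(fstat, 3)
[1] 10.167
> round(qf(0.95, 1, 45), 2)
[1] 4.06

Dropping pop15 imposes a single restriction, and the unrestricted model has 45 residual degrees of freedom, so the F statistic uses 1 and 45 degrees of freedom. Since the F statistic exceeds the 95 percent critical value, reject the hypothesis that the coefficient of pop15 = 0.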
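For reference, the restricted versus unrestricted comparison can be written out as a worked equation. This is a sketch in my own notation, not the author's: RSS_R and RSS_U denote the restricted and unrestricted residual sums of squares, q the number of restrictions, n the number of observations and p the number of coefficients in the unrestricted model. For the savings data, q = 1 (only pop15 is dropped), n = 50 and p = 5 (four regressors plus the intercept), and R reports RSS values of 797.72 and 650.71 for the two models.

```latex
F = \frac{(RSS_R - RSS_U)/q}{RSS_U/(n - p)}
  = \frac{(797.72 - 650.71)/1}{650.71/45}
  = \frac{147.01}{14.460}
  \approx 10.17
```

Under the null hypothesis this statistic follows an F distribution with q and n - p degrees of freedom, here 1 and 45.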
The second way is anova, which compares the two nested models directly.
> g1 <- lm(sr ~ pop15 + pop75 + dpi + ddpi, savings)
> g2 <- lm(sr ~ pop75 + dpi + ddpi, savings)
> anova(g1, g2)
Analysis of Variance Table

Model 1: sr ~ pop15 + pop75 + dpi + ddpi
Model 2: sr ~ pop75 + dpi + ddpi
  Res.Df    RSS Df Sum of Sq      F   Pr(>F)
1     45 650.71
2     46 797.72 -1   -147.01 10.167 0.002603 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The null hypothesis is that the coefficient of pop15 = 0, which is clearly rejected by the F test results (F = 10.167, p = 0.002603).
The third way is the more involved procedure of actually computing the variance of Rb - r. It produces the same F statistic as the anova table.
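A minimal sketch of the three procedures side by side, applied to the simulated data from the start of this post with H0: b2 = 0. The data-generating lines are copied from the simulation above; the restriction matrix R, the value r, the count q, the use of x[, -2] for the restricted fit, and the names f.rss, f.anova and f.wald are my own illustrative choices rather than part of the original session. Only base R is used.

```r
# Regenerate the simulated data exactly as in the simulation above
set.seed(1977)
n <- 15000
beta.actual <- matrix(c(2, 3, 4, 5), ncol = 1)
beta.sample <- cbind(rnorm(n, beta.actual[1]), rnorm(n, beta.actual[2]),
                     rnorm(n, beta.actual[3]), rnorm(n, beta.actual[4]))
error <- rnorm(n)
x <- cbind(rep(1, n), runif(n), runif(n), runif(n))
y <- x[, 1] * beta.sample[, 1] + x[, 2] * beta.sample[, 2] +
  x[, 3] * beta.sample[, 3] + x[, 4] * beta.sample[, 4] + error

q <- 1                                   # one restriction: b2 = 0

# 1. Restricted vs unrestricted residual sums of squares
g     <- lm(y ~ x + 0)                   # unrestricted: all four columns
g.res <- lm(y ~ x[, -2] + 0)             # restricted: column 2 dropped
RSS0  <- sum(resid(g)^2)
RSS1  <- sum(resid(g.res)^2)
f.rss <- ((RSS1 - RSS0) / q) / (RSS0 / df.residual(g))

# 2. anova on the two nested fits (smaller model first)
f.anova <- anova(g.res, g)$F[2]

# 3. Wald form: (Rb - r)' [Var(Rb - r)]^-1 (Rb - r) / q
R <- matrix(c(0, 1, 0, 0), nrow = 1)     # picks out b2 (assumed restriction)
r <- 0
d <- R %*% coef(g) - r                   # Rb - r
V <- R %*% vcov(g) %*% t(R)              # estimated variance of Rb - r
f.wald <- drop(t(d) %*% solve(V) %*% d) / q

c(rss = f.rss, anova = f.anova, wald = f.wald)
```

The three entries of the printed vector should agree up to floating point rounding, and each can be compared against qf(0.95, q, df.residual(g)) to decide whether to reject.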