Simple Regression
If you are given 1000 values of Y and 1000 values of X1, X2, X3, X4 , X5, X6
Lets say the variables in the dataset are the following .Variables - Dependent : Species - Independent : Area + Elevation + Nearest + Scruz + Adjacent ( XT X )(-1) ( XT Y) gives the parameter estimates To solve the above equation of XTX b = XT Y , one uses QR decomposition Refer to Why use QR decomposition
> library(faraway) > data(gala) > head(gala) Species Endemics Area Elevation Nearest Scruz Adjacent Baltra 58 23 25.09 346 0.6 0.6 1.84 Bartolome 31 21 1.24 109 0.6 26.3 572.33 Caldwell 3 3 0.21 114 2.8 58.7 0.78 Champion 25 9 0.10 46 1.9 47.4 0.18 Coamano 2 1 0.05 77 1.9 1.9 903.82 Daphne.Major 18 11 0.34 119 8.0 8.0 1.84 > fit <- lm(Species ~ Area + Elevation + Nearest + Scruz + Adjacent, + data = gala) > fit.sum <- summary(fit) |
- The following are the attributes present in fit and summary fit
> names(fit) [1] "coefficients" "residuals" "effects" "rank" [5] "fitted.values" "assign" "qr" "df.residual" [9] "xlevels" "call" "terms" "model" > names(fit.sum) [1] "call" "terms" "residuals" "coefficients" [5] "aliased" "sigma" "df" "r.squared" [9] "adj.r.squared" "fstatistic" "cov.unscaled" |
To get parameter estimates
> solve(crossprod(x, x), crossprod(x, y)) [,1] (Intercept) 7.068220709 Area -0.023938338 Elevation 0.319464761 Nearest 0.009143961 Scruz -0.240524230 Adjacent -0.074804832 |
To get standard errors from the model
> sig <- sqrt(deviance(fit)/df.residual(fit)) > sqrt(diag(fit.sum$cov.unscaled)) * sig (Intercept) Area Elevation Nearest Scruz Adjacent 19.15419782 0.02242235 0.05366280 1.05413595 0.21540225 0.01770019 |
To get R squared from the model , one uses
> 1 - deviance(fit)/sum((gala$Species - mean(gala$Species))^2) [1] 0.765847 |