Dobson GLM - Exercise 6.2
Purpose
To work out Exercise 6.2 from Dobson's book. Table 6.16 shows the response of a grass and legume pasture system to various quantities of phosphorus fertilizer (data from D. F. Sinclair; the results were reported in Sinclair and Probert, 1986). The total yield, of grass and legume together, and amount of phosphorus (K) are both given in kilograms per hectare. Find a suitable model for describing the relationship between yield and quantity of fertilizer.
(a) Plot yield against phosphorus to obtain an approximately linear relationship; you may need to try several transformations of either or both variables in order to achieve approximate linearity.
> folder <- "C:/Cauldron/garage/R/soulcraft/Volatility/Learn/Dobson-GLM/"
> file.input <- paste(folder, "Table 6.16 Pasture yield.csv", sep = "")
> data <- read.csv(file.input, header = T, stringsAsFactors = F)

> par(mfrow = c(1, 1))
> plot(data$K, data$yield, pch = 19, col = "blue")

> lambda <- 2
> temp <- (data$yield^lambda - 1)/lambda
> par(mfrow = c(1, 1))
> plot(data$K, temp, pch = 19, col = "blue")
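The transformation applied above is the Box-Cox power transform, (y^lambda - 1)/lambda, which reduces to log(y) when lambda is 0. A minimal sketch for comparing several values of lambda side by side is given below. The helper name `box_cox` is hypothetical, and R's built-in `cars` data set is used only as a stand-in because the pasture table is not reproduced here.

```r
# Box-Cox power transform; lambda = 0 is defined as the log limit.
# `box_cox` is a hypothetical helper name, not part of the original analysis.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))  # the transform is only defined for positive values
  if (lambda == 0) log(y) else (y^lambda - 1) / lambda
}

# Stand-in data (assumption): built-in `cars` replaces the pasture table.
x <- cars$speed
y <- cars$dist

# One panel per candidate lambda, to judge linearity by eye.
lambdas <- c(0, 0.5, 1, 2)
par(mfrow = c(2, 2))
for (lambda in lambdas) {
  plot(x, box_cox(y, lambda), pch = 19, col = "blue",
       main = paste("lambda =", lambda), ylab = "transformed response")
}
par(mfrow = c(1, 1))
```

With the pasture data, `x` and `y` would be `data$K` and `data$yield`.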
A quadratic looks like a good fit. Let's fit the model and check it.
(b) Use the results of (a) to specify a possible model. Fit the model.
> fit <- lm(yield ~ K + I(K^2), data)
> summary(fit)

Call:
lm(formula = yield ~ K + I(K^2), data = data)

Residuals:
    Min      1Q  Median      3Q     Max
-876.18 -257.45   82.54  287.88  722.17

Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) 2471.2637   196.1115  12.601 4.51e-12 ***
K             73.8059    19.8999   3.709  0.00110 **
I(K^2)        -0.5152     0.3894  -1.323  0.19827
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 438.6 on 24 degrees of freedom
Multiple R-squared: 0.7915, Adjusted R-squared: 0.7742
F-statistic: 45.56 on 2 and 24 DF, p-value: 6.736e-09
(c) Calculate the standardized residuals for the model and use appropriate plots to check for any systematic effects that might suggest alternative models and to investigate the validity of any assumptions made.
> library(car)
> # qq.plot() was renamed qqPlot() in later versions of car
> qq.plot(fit, simulate = T)
[1] 19
- All points seem to lie in the confidence bands
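Part (c) asks for the standardized residuals themselves, so a plot of them against the fitted values may be a useful complement to the QQ plot. The sketch below uses only base R. The model `fit_demo` and the built-in `cars` data are stand-ins (assumptions) for `fit` and the pasture table, which is not reproduced here.

```r
# Stand-in model (assumption): quadratic fit on the built-in `cars` data.
fit_demo <- lm(dist ~ speed + I(speed^2), data = cars)

# Standardized residuals: raw residuals divided by s * sqrt(1 - h_ii),
# where s is the residual standard error and h_ii the hat values.
std_res <- rstandard(fit_demo)

# Residuals against fitted values: curvature or a funnel shape would point
# to a different mean structure or to non-constant variance.
plot(fitted(fit_demo), std_res, pch = 19, col = "blue",
     xlab = "fitted values", ylab = "standardized residuals")
abline(h = c(-2, 0, 2), lty = c(2, 1, 2))

# Observations with unusually large standardized residuals (|r| > 2).
large_res <- which(abs(std_res) > 2)
```

For the pasture model, `rstandard(fit)` and `fitted(fit)` would take the place of the stand-in objects.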
Influence stats
> plot(hatvalues(fit), ylim = c(0, 0.5))
> abline(h = c(2, 3) * 3/27)
> identify(1:27, hatvalues(fit), row.names(data))
[1] 3 12 26

Cook's distance
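The reference lines at `c(2, 3) * 3/27` are the usual 2p/n and 3p/n leverage cutoffs, with p = 3 coefficients and n = 27 observations. The sketch below derives p and n from the fitted model instead of hard-coding them. It runs on the built-in `cars` data as a stand-in (an assumption), and `fit_demo` is a hypothetical name.

```r
# Stand-in model (assumption): quadratic fit on the built-in `cars` data.
fit_demo <- lm(dist ~ speed + I(speed^2), data = cars)

h <- hatvalues(fit_demo)
p <- length(coef(fit_demo))  # number of estimated coefficients
n <- nobs(fit_demo)          # number of observations

# The hat values sum to p, so p/n is the average leverage;
# 2p/n and 3p/n are the conventional warning levels.
cutoffs <- c(2, 3) * p / n

plot(h, ylim = c(0, max(h, cutoffs)), pch = 19, col = "blue",
     ylab = "hat values")
abline(h = cutoffs, lty = 2)

# Indices of observations above the lower cutoff.
high_leverage <- which(h > cutoffs[1])
```

Using `which()` in place of `identify()` also makes the flagged points reproducible in a non-interactive session.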
> # cookd() was later dropped from car; cooks.distance() in stats computes the same quantity
> plot(cookd(fit), ylim = c(0, 0.5))
> abline(h = 4/24)
> identify(1:27, cookd(fit), row.names(data))
[1] 1
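The cutoff `4/24` corresponds to 4/(n - p), that is, 4 divided by the residual degrees of freedom. A base-R sketch of the same check with `cooks.distance()` follows. The `cars` data and the name `fit_demo` are stand-ins (assumptions), since the pasture table is not reproduced here.

```r
# Stand-in model (assumption): quadratic fit on the built-in `cars` data.
fit_demo <- lm(dist ~ speed + I(speed^2), data = cars)

# Cook's distance for every observation (stats package, no car needed).
d <- cooks.distance(fit_demo)

# Rule-of-thumb cutoff 4 / (n - p); for the pasture fit this is 4/24.
cutoff <- 4 / df.residual(fit_demo)

plot(d, ylim = c(0, max(d, cutoff)), pch = 19, col = "blue",
     ylab = "Cook's distance")
abline(h = cutoff, lty = 2)

# Indices of observations above the cutoff.
influential <- which(d > cutoff)
```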
DFBETAS
> dfbs.fit <- dfbetas(fit)
> par(mfrow = c(3, 1))
> plot(1:27, dfbs.fit[, 1], pch = 19, col = "blue")
> plot(1:27, dfbs.fit[, 2], pch = 19, col = "blue")
> plot(1:27, dfbs.fit[, 3], pch = 19, col = "blue")
The first point clearly has a large influence on the second parameter of the model (the coefficient of K) and hence a larger Cook's distance. This agrees with the Cook's distance plot above, where the same point was flagged.
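The three DFBETAS panels can also be drawn in a loop, with each panel labelled by its coefficient and given a reference line. The cutoff of 2/sqrt(n) used here is a common rule of thumb and is not taken from the analysis above. As before, the `cars` data and the name `fit_demo` are stand-ins (assumptions).

```r
# Stand-in model (assumption): quadratic fit on the built-in `cars` data.
fit_demo <- lm(dist ~ speed + I(speed^2), data = cars)

# One column of DFBETAS per coefficient, one row per observation.
dfbs <- dfbetas(fit_demo)
n <- nrow(dfbs)

# Size-adjusted rule-of-thumb cutoff (an assumption, not from the source).
cutoff <- 2 / sqrt(n)

par(mfrow = c(ncol(dfbs), 1))
for (j in seq_len(ncol(dfbs))) {
  plot(seq_len(n), dfbs[, j], pch = 19, col = "blue",
       xlab = "observation", ylab = "DFBETAS", main = colnames(dfbs)[j])
  abline(h = c(-cutoff, cutoff), lty = 2)
}
par(mfrow = c(1, 1))

# Row = observation, col = coefficient, for every value beyond the cutoff.
flagged <- which(abs(dfbs) > cutoff, arr.ind = TRUE)
```

Labelling each panel with `colnames(dfbs)` makes it explicit which parameter a point influences, which is what the closing remark about the second parameter relies on.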