Dobson - GLM - Chap 2
Purpose
To work out the examples and exercises from Chap 2
Example 1
Here’s the data
> x <- read.csv("test2.csv", header = T, stringsAsFactors = F) > print(x) count category 1 0 TOWN 2 1 TOWN 3 1 TOWN 4 0 TOWN 5 2 TOWN 6 3 TOWN 7 0 TOWN 8 1 TOWN 9 1 TOWN 10 1 TOWN 11 1 TOWN 12 2 TOWN 13 0 TOWN 14 1 TOWN 15 3 TOWN 16 0 TOWN 17 1 TOWN 18 2 TOWN 19 1 TOWN 20 3 TOWN 21 3 TOWN 22 4 TOWN 23 1 TOWN 24 3 TOWN 25 2 TOWN 26 0 TOWN 27 2 COUNTRY 28 0 COUNTRY 29 3 COUNTRY 30 0 COUNTRY 31 0 COUNTRY 32 1 COUNTRY 33 1 COUNTRY 34 1 COUNTRY 35 1 COUNTRY 36 0 COUNTRY 37 0 COUNTRY 38 2 COUNTRY 39 2 COUNTRY 40 0 COUNTRY 41 1 COUNTRY 42 2 COUNTRY 43 0 COUNTRY 44 0 COUNTRY 45 1 COUNTRY 46 1 COUNTRY 47 1 COUNTRY 48 0 COUNTRY 49 2 COUNTRY |
Coding up the likelihood function for poisson
> lambda0 <- 0.3 > loglikf <- function(lambda, z) { + loglik <- 0 + for (i in seq_along(z[, 1])) { + t1 <- z[i, 1] + t2 <- log(lambda) + t3 <- log(factorial(z[i, 1])) + temp <- t1 * t2 - lambda - t3 + loglik <- loglik + temp + } + return(loglik) + } > loglikf(lambda0, x) [1] -104.6974 |
Using optim
> lambda0 <- 0.3 > solution <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2), + z = x) > print(solution$par) [1] 1.183594 |
Using optim separately
> lambda0 <- 0.3 > x1 <- x[x$category == "TOWN", ] > solution1 <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2), + z = x1) > print(solution1$par) [1] 1.423125 > x2 <- x[x$category == "COUNTRY", ] > solution2 <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2), + z = x2) > print(solution2$par) [1] 0.913125 |
Loglik of Separate Models Vs Loglik of single model
> loglikf(solution$par, x) [1] -68.38682 > loglikf(solution1$par, x1) + loglikf(solution2$par, x2) [1] -67.02295 |
They are very close.. How to statistically compare
Residuals for single model, individual models.
> (0:4 - solution$par)/sqrt(solution$par) [1] -1.0879309 -0.1687550 0.7504210 1.6695970 2.5887730 > (0:4 - solution1$par)/sqrt(solution1$par) [1] -1.1929480 -0.3546885 0.4835709 1.3218304 2.1600899 > (0:4 - solution2$par)/sqrt(solution2$par) [1] -0.95557574 0.09091378 1.13740330 2.18389282 3.23038234 |
Example 2
Here’s the data
> x <- read.csv("test3.csv", header = T, stringsAsFactors = F) > print(x) gender period weight 1 b 40 2968 2 b 38 2795 3 b 40 3163 4 b 35 2925 5 b 36 2625 6 b 37 2847 7 b 41 3292 8 b 40 3473 9 b 37 2628 10 b 38 3176 11 b 40 3421 12 b 38 2975 13 g 40 3317 14 g 36 2729 15 g 40 2935 16 g 38 2754 17 g 42 3210 18 g 39 2817 19 g 40 3126 20 g 37 2539 21 g 36 2412 22 g 38 2991 23 g 39 2875 24 g 40 3231 |
> library(ggplot2) > p <- ggplot(x, aes(y = weight, x = period, col = factor(gender))) > q <- p + geom_point(size = 3) > print(q) |
Looks like there is a linear relation
Fit a model with different intercept for b and g b, but have the same slope
> x$gender <- factor(x$gender) > fit <- lm(x$weight ~ x$period + x$gender + 0) > coef(fit) x$period x$genderb x$genderg 120.8943 -1610.2825 -1773.3218 |
Fit Separate models
> x1 <- x[x$gender == "b", ] > fit1 <- lm(x1$weight ~ x1$period) > coef(fit1) (Intercept) x1$period -1268.6724 111.9828 > x2 <- x[x$gender == "g", ] > fit2 <- lm(x2$weight ~ x2$period) > coef(fit2) (Intercept) x2$period -2141.667 130.400 |
One can see that deviance for both the models are very close
> deviance(fit) [1] 658770.7 > deviance(fit1) + deviance(fit2) [1] 652424.5 |
WHICH MODEL IS BETTER
> S0 <- deviance(fit) > S1 <- deviance(fit1) + deviance(fit2) > Fstat <- (S0 - S1)/(S1/20) > pf(Fstat, 1, 20) [1] 0.3361066 |
ALTERNATE REJECTED…
> qqnorm(residuals(fit)) > qqline(residuals(fit)) |
> plot.ts(residuals(fit)) |