Dobson - GLM - Chap 2
Purpose
To work out the examples and exercises from Chap 2
Example 1
Here’s the data
> x <- read.csv("test2.csv", header = T, stringsAsFactors = F)
> print(x)
count category
1 0 TOWN
2 1 TOWN
3 1 TOWN
4 0 TOWN
5 2 TOWN
6 3 TOWN
7 0 TOWN
8 1 TOWN
9 1 TOWN
10 1 TOWN
11 1 TOWN
12 2 TOWN
13 0 TOWN
14 1 TOWN
15 3 TOWN
16 0 TOWN
17 1 TOWN
18 2 TOWN
19 1 TOWN
20 3 TOWN
21 3 TOWN
22 4 TOWN
23 1 TOWN
24 3 TOWN
25 2 TOWN
26 0 TOWN
27 2 COUNTRY
28 0 COUNTRY
29 3 COUNTRY
30 0 COUNTRY
31 0 COUNTRY
32 1 COUNTRY
33 1 COUNTRY
34 1 COUNTRY
35 1 COUNTRY
36 0 COUNTRY
37 0 COUNTRY
38 2 COUNTRY
39 2 COUNTRY
40 0 COUNTRY
41 1 COUNTRY
42 2 COUNTRY
43 0 COUNTRY
44 0 COUNTRY
45 1 COUNTRY
46 1 COUNTRY
47 1 COUNTRY
48 0 COUNTRY
49 2 COUNTRY |
Coding up the likelihood function for poisson
> lambda0 <- 0.3
> loglikf <- function(lambda, z) {
+ loglik <- 0
+ for (i in seq_along(z[, 1])) {
+ t1 <- z[i, 1]
+ t2 <- log(lambda)
+ t3 <- log(factorial(z[i, 1]))
+ temp <- t1 * t2 - lambda - t3
+ loglik <- loglik + temp
+ }
+ return(loglik)
+ }
> loglikf(lambda0, x)
[1] -104.6974 |
Using optim
> lambda0 <- 0.3 > solution <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2), + z = x) > print(solution$par) [1] 1.183594 |
Using optim separately
> lambda0 <- 0.3 > x1 <- x[x$category == "TOWN", ] > solution1 <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2), + z = x1) > print(solution1$par) [1] 1.423125 > x2 <- x[x$category == "COUNTRY", ] > solution2 <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2), + z = x2) > print(solution2$par) [1] 0.913125 |
Loglik of Separate Models Vs Loglik of single model
> loglikf(solution$par, x) [1] -68.38682 > loglikf(solution1$par, x1) + loglikf(solution2$par, x2) [1] -67.02295 |
They are very close.. How to statistically compare
Residuals for single model, individual models.
> (0:4 - solution$par)/sqrt(solution$par) [1] -1.0879309 -0.1687550 0.7504210 1.6695970 2.5887730 > (0:4 - solution1$par)/sqrt(solution1$par) [1] -1.1929480 -0.3546885 0.4835709 1.3218304 2.1600899 > (0:4 - solution2$par)/sqrt(solution2$par) [1] -0.95557574 0.09091378 1.13740330 2.18389282 3.23038234 |
Example 2
Here’s the data
> x <- read.csv("test3.csv", header = T, stringsAsFactors = F)
> print(x)
gender period weight
1 b 40 2968
2 b 38 2795
3 b 40 3163
4 b 35 2925
5 b 36 2625
6 b 37 2847
7 b 41 3292
8 b 40 3473
9 b 37 2628
10 b 38 3176
11 b 40 3421
12 b 38 2975
13 g 40 3317
14 g 36 2729
15 g 40 2935
16 g 38 2754
17 g 42 3210
18 g 39 2817
19 g 40 3126
20 g 37 2539
21 g 36 2412
22 g 38 2991
23 g 39 2875
24 g 40 3231 |
> library(ggplot2) > p <- ggplot(x, aes(y = weight, x = period, col = factor(gender))) > q <- p + geom_point(size = 3) > print(q) |

Looks like there is a linear relation
Fit a model with different intercept for b and g b, but have the same slope
> x$gender <- factor(x$gender) > fit <- lm(x$weight ~ x$period + x$gender + 0) > coef(fit) x$period x$genderb x$genderg 120.8943 -1610.2825 -1773.3218 |
Fit Separate models
> x1 <- x[x$gender == "b", ] > fit1 <- lm(x1$weight ~ x1$period) > coef(fit1) (Intercept) x1$period -1268.6724 111.9828 > x2 <- x[x$gender == "g", ] > fit2 <- lm(x2$weight ~ x2$period) > coef(fit2) (Intercept) x2$period -2141.667 130.400 |
One can see that deviance for both the models are very close
> deviance(fit) [1] 658770.7 > deviance(fit1) + deviance(fit2) [1] 652424.5 |
WHICH MODEL IS BETTER
> S0 <- deviance(fit) > S1 <- deviance(fit1) + deviance(fit2) > Fstat <- (S0 - S1)/(S1/20) > pf(Fstat, 1, 20) [1] 0.3361066 |
ALTERNATE REJECTED…
> qqnorm(residuals(fit)) > qqline(residuals(fit)) |

> plot.ts(residuals(fit)) |
