Dobson - GLM - Chap 2 - RK's Musings

Purpose
To work out the examples and exercises from Chap 2

Example 1
Here’s the data

> x <- read.csv("test2.csv", header = T, stringsAsFactors = F)
> print(x)
   count category
1      0     TOWN
2      1     TOWN
3      1     TOWN
4      0     TOWN
5      2     TOWN
6      3     TOWN
7      0     TOWN
8      1     TOWN
9      1     TOWN
10     1     TOWN
11     1     TOWN
12     2     TOWN
13     0     TOWN
14     1     TOWN
15     3     TOWN
16     0     TOWN
17     1     TOWN
18     2     TOWN
19     1     TOWN
20     3     TOWN
21     3     TOWN
22     4     TOWN
23     1     TOWN
24     3     TOWN
25     2     TOWN
26     0     TOWN
27     2  COUNTRY
28     0  COUNTRY
29     3  COUNTRY
30     0  COUNTRY
31     0  COUNTRY
32     1  COUNTRY
33     1  COUNTRY
34     1  COUNTRY
35     1  COUNTRY
36     0  COUNTRY
37     0  COUNTRY
38     2  COUNTRY
39     2  COUNTRY
40     0  COUNTRY
41     1  COUNTRY
42     2  COUNTRY
43     0  COUNTRY
44     0  COUNTRY
45     1  COUNTRY
46     1  COUNTRY
47     1  COUNTRY
48     0  COUNTRY
49     2  COUNTRY

Coding up the likelihood function for poisson

> lambda0 <- 0.3
> loglikf <- function(lambda, z) {
+     loglik <- 0
+     for (i in seq_along(z[, 1])) {
+         t1 <- z[i, 1]
+         t2 <- log(lambda)
+         t3 <- log(factorial(z[i, 1]))
+         temp <- t1 * t2 - lambda - t3
+         loglik <- loglik + temp
+     }
+     return(loglik)
+ }
> loglikf(lambda0, x)
[1] -104.6974

Using optim

> lambda0 <- 0.3
> solution <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2),
+     z = x)
> print(solution$par)
[1] 1.183594

Using optim separately

> lambda0 <- 0.3
> x1 <- x[x$category == "TOWN", ]
> solution1 <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2),
+     z = x1)
> print(solution1$par)
[1] 1.423125
> x2 <- x[x$category == "COUNTRY", ]
> solution2 <- optim(par = lambda0, fn = loglikf, control = list(fnscale = -2),
+     z = x2)
> print(solution2$par)
[1] 0.913125

Loglik of Separate Models Vs Loglik of single model

> loglikf(solution$par, x)
[1] -68.38682
> loglikf(solution1$par, x1) + loglikf(solution2$par, x2)
[1] -67.02295

They are very close.. How to statistically compare

Residuals for single model, individual models.

> (0:4 - solution$par)/sqrt(solution$par)
[1] -1.0879309 -0.1687550  0.7504210  1.6695970  2.5887730
> (0:4 - solution1$par)/sqrt(solution1$par)
[1] -1.1929480 -0.3546885  0.4835709  1.3218304  2.1600899
> (0:4 - solution2$par)/sqrt(solution2$par)
[1] -0.95557574  0.09091378  1.13740330  2.18389282  3.23038234

Example 2
Here’s the data

> x <- read.csv("test3.csv", header = T, stringsAsFactors = F)
> print(x)
   gender period weight
1       b     40   2968
2       b     38   2795
3       b     40   3163
4       b     35   2925
5       b     36   2625
6       b     37   2847
7       b     41   3292
8       b     40   3473
9       b     37   2628
10      b     38   3176
11      b     40   3421
12      b     38   2975
13      g     40   3317
14      g     36   2729
15      g     40   2935
16      g     38   2754
17      g     42   3210
18      g     39   2817
19      g     40   3126
20      g     37   2539
21      g     36   2412
22      g     38   2991
23      g     39   2875
24      g     40   3231

> library(ggplot2)
> p <- ggplot(x, aes(y = weight, x = period, col = factor(gender)))
> q <- p + geom_point(size = 3)
> print(q)

Looks like there is a linear relation
Fit a model with different intercept for b and g b, but have the same slope

> x$gender <- factor(x$gender)
> fit <- lm(x$weight ~ x$period + x$gender + 0)
> coef(fit)
  x$period  x$genderb  x$genderg
  120.8943 -1610.2825 -1773.3218

Fit Separate models

> x1 <- x[x$gender == "b", ]
> fit1 <- lm(x1$weight ~ x1$period)
> coef(fit1)
(Intercept)   x1$period
 -1268.6724    111.9828
> x2 <- x[x$gender == "g", ]
> fit2 <- lm(x2$weight ~ x2$period)
> coef(fit2)
(Intercept)   x2$period
  -2141.667     130.400

One can see that deviance for both the models are very close

> deviance(fit)
[1] 658770.7
> deviance(fit1) + deviance(fit2)
[1] 652424.5

WHICH MODEL IS BETTER

> S0 <- deviance(fit)
> S1 <- deviance(fit1) + deviance(fit2)
> Fstat <- (S0 - S1)/(S1/20)
> pf(Fstat, 1, 20)
[1] 0.3361066

ALTERNATE REJECTED…

> qqnorm(residuals(fit))
> qqline(residuals(fit))

> plot.ts(residuals(fit))