Hobson GLM - Chapter 7-Exercise -2

Purpose
To work on exercises from Dobson’s Book Chap 7

Survival Rates

> folder <- "C:/Cauldron/garage/R/soulcraft/Volatility/Learn/Dobson-GLM/"
> file.input <- paste(folder, "Tables 7.13 Adelaide graduates.csv",
+     sep = "")
> data <- read.csv(file.input, header = T, stringsAsFactors = F)
> data
   year survive total     faculty   sex
1  1938      18    22    medicine   men
2  1939      16    23    medicine   men
3  1940       7    17    medicine   men
4  1941      12    25    medicine   men
5  1942      24    50    medicine   men
6  1943      16    21    medicine   men
7  1944      22    32    medicine   men
8  1945      12    14    medicine   men
9  1946      22    34    medicine   men
10 1947      28    37    medicine   men
11 1938      16    30        arts   men
12 1939      13    22        arts   men
13 1940      11    25        arts   men
14 1941      12    14        arts   men
15 1942       8    12        arts   men
16 1943      11    20        arts   men
17 1944       4    10        arts   men
18 1945       4    12        arts   men
19 1946       *     *        arts   men
20 1947      13    23        arts   men
21 1938       9    14     science   men
22 1939       9    12     science   men
23 1940      12    19     science   men
24 1941      12    15     science   men
25 1942      20    28     science   men
26 1943      16    21     science   men
27 1944      25    31     science   men
28 1945      32    38     science   men
29 1946       4     5     science   men
30 1947      25    31     science   men
31 1938      10    16 engineering   men
32 1939       7    11 engineering   men
33 1940      12    15 engineering   men
34 1941       8     9 engineering   men
35 1942       5     7 engineering   men
36 1943       1     2 engineering   men
37 1944      16    22 engineering   men
38 1945      19    25 engineering   men
39 1946       *     * engineering   men
40 1947      25    35 engineering   men
41 1938      14    19        arts women
42 1939      11    16        arts women
43 1940      15    18        arts women
44 1941      15    21        arts women
45 1942       8     9        arts women
46 1943      13    13        arts women
47 1944      18    22        arts women
48 1945      18    22        arts women
49 1946       1     1        arts women
50 1947      13    16        arts women
51 1938       1     1     science women
52 1939       4     4     science women
53 1940       6     7     science women
54 1941       3     3     science women
55 1942       4     4     science women
56 1943       8     9     science women
57 1944       5     5     science women
58 1945      16    17     science women
59 1946       1     1     science women
60 1947      10    10     science women

(a) Are the proportions of graduates who survived for 50 years after graduation the same all years ofgraduation?

> head(data)
  year survive total  faculty sex
1 1938      18    22 medicine men
2 1939      16    23 medicine men
3 1940       7    17 medicine men
4 1941      12    25 medicine men
5 1942      24    50 medicine men
6 1943      16    21 medicine men
> data <- data[data$survive != "*", ]

Convert strings to numeric

> data$survive <- as.numeric(data$survive)
> data$total <- as.numeric(data$total)
> data$year <- factor(data$year)

> t1 <- data.frame(tapply(data$survive, data$year, sum))
> t2 <- data.frame(tapply(data$total, data$year, sum))
> temp <- data.frame(year = rownames(t1), y = t1[, 1], total = t2[,
+     1])
> temp$prob <- temp$y/temp$total
> temp$ny <- temp$total - temp$y

> plot(temp$year, temp$prob, pch = 19, col = "blue")

> prop.test(cbind(temp$y, temp$ny))
        10-sample test for equality of proportions without continuity
        correction

data:  cbind(temp$y, temp$ny)
X-squared = 15.0388, df = 9, p-value = 0.08988
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4    prop 5    prop 6    prop 7    prop 8
0.6666667 0.6818182 0.6237624 0.7126437 0.6272727 0.7558140 0.7377049 0.7890625
   prop 9   prop 10
0.6829268 0.7500000

Clearly prop test shows null can be rejected at 0.1 level. Which is more than clearly visible in the graph But alternate can be rejected at 0.05 level

(b) Are the proportions of male graduates who survived for 50 years after graduation the same for all Faculties?

> data.men <- data[data$sex == "men", ]
> t1 <- data.frame(tapply(data.men$survive, data.men$faculty, sum))
> t2 <- data.frame(tapply(data.men$total, data.men$faculty, sum))
> temp <- data.frame(faculty = rownames(t1), y = t1[, 1], total = t2[,
+     1])
> temp$prob <- temp$y/temp$total
> temp$ny <- temp$total - temp$y

> plot(temp$faculty, temp$prob, pch = 19, col = "blue")

> prop.test(cbind(temp$y, temp$ny))
        4-sample test for equality of proportions without continuity
        correction

data:  cbind(temp$y, temp$ny)
X-squared = 23.226, df = 3, p-value = 3.623e-05
alternative hypothesis: two.sided
sample estimates:
   prop 1    prop 2    prop 3    prop 4
0.5476190 0.7253521 0.6436364 0.7663551

Clearly prop test shows null can be rejected . Which is more than clearly visible in the graph

> data.women <- data[data$sex == "women", ]
> t1 <- data.frame(tapply(data.women$survive, data.women$faculty,
+     sum))
> t2 <- data.frame(tapply(data.women$total, data.women$faculty,
+     sum))
> temp <- data.frame(faculty = rownames(t1), y = t1[, 1], total = t2[,
+     1])
> temp$prob <- temp$y/temp$total
> temp$ny <- temp$total - temp$y

> plot(temp$faculty, temp$prob, pch = 19, col = "blue")

> prop.test(cbind(temp$y, temp$ny))
        2-sample test for equality of proportions with continuity correction

data:  cbind(temp$y, temp$ny)
X-squared = 6.2537, df = 1, p-value = 0.01239
alternative hypothesis: two.sided
95 percent confidence interval:
 -0.24224935 -0.05429445
sample estimates:
   prop 1    prop 2
0.8025478 0.9508197

Clearly prop test shows null can be rejected . Which is more than clearly visible in the graph

(d) Is the difference between men and women in the proportion ofgraduates who survived for 50 years after graduation the same for Arts and Science?

> data.comp <- data[data$faculty == "arts" | data$faculty == "science",
+     ]
> data.comp$ns <- data.comp$total - data.comp$survive
> xtabs(cbind(data.comp$survive, data.comp$ns) ~ data.comp$faculty +
+     data.comp$sex)
, ,  = V1

                 data.comp$sex
data.comp$faculty men women
          arts     92   126
          science 164    58

, ,  = V2

                 data.comp$sex
data.comp$faculty men women
          arts     76    31
          science  50     3
> t1 <- xtabs(cbind(data.comp$survive) ~ data.comp$faculty + data.comp$sex)
> t2 <- xtabs(cbind(data.comp$total) ~ data.comp$faculty + data.comp$sex)
> t3 <- as.matrix(t1/t2)
> t.test(t3[, 1] - t3[, 2])
        One Sample t-test

data:  t3[, 1] - t3[, 2]
t = -6.2357, df = 1, p-value = 0.1012
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
 -0.6673628  0.2279696
sample estimates:
 mean of x
-0.2196966

Well … the conf bands include 0 and hence there are no differences