Purpose
To work on exercises from Dobson’s Book Chap 7
Survival Rates
> folder <- "C:/Cauldron/garage/R/soulcraft/Volatility/Learn/Dobson-GLM/"
> file.input <- paste(folder, "Tables 7.13 Adelaide graduates.csv",
+ sep = "")
> data <- read.csv(file.input, header = T, stringsAsFactors = F)
> data
year survive total faculty sex
1 1938 18 22 medicine men
2 1939 16 23 medicine men
3 1940 7 17 medicine men
4 1941 12 25 medicine men
5 1942 24 50 medicine men
6 1943 16 21 medicine men
7 1944 22 32 medicine men
8 1945 12 14 medicine men
9 1946 22 34 medicine men
10 1947 28 37 medicine men
11 1938 16 30 arts men
12 1939 13 22 arts men
13 1940 11 25 arts men
14 1941 12 14 arts men
15 1942 8 12 arts men
16 1943 11 20 arts men
17 1944 4 10 arts men
18 1945 4 12 arts men
19 1946 * * arts men
20 1947 13 23 arts men
21 1938 9 14 science men
22 1939 9 12 science men
23 1940 12 19 science men
24 1941 12 15 science men
25 1942 20 28 science men
26 1943 16 21 science men
27 1944 25 31 science men
28 1945 32 38 science men
29 1946 4 5 science men
30 1947 25 31 science men
31 1938 10 16 engineering men
32 1939 7 11 engineering men
33 1940 12 15 engineering men
34 1941 8 9 engineering men
35 1942 5 7 engineering men
36 1943 1 2 engineering men
37 1944 16 22 engineering men
38 1945 19 25 engineering men
39 1946 * * engineering men
40 1947 25 35 engineering men
41 1938 14 19 arts women
42 1939 11 16 arts women
43 1940 15 18 arts women
44 1941 15 21 arts women
45 1942 8 9 arts women
46 1943 13 13 arts women
47 1944 18 22 arts women
48 1945 18 22 arts women
49 1946 1 1 arts women
50 1947 13 16 arts women
51 1938 1 1 science women
52 1939 4 4 science women
53 1940 6 7 science women
54 1941 3 3 science women
55 1942 4 4 science women
56 1943 8 9 science women
57 1944 5 5 science women
58 1945 16 17 science women
59 1946 1 1 science women
60 1947 10 10 science women |
(a) Are the proportions of graduates who survived for 50 years after graduation the same all years ofgraduation?
> head(data)
year survive total faculty sex
1 1938 18 22 medicine men
2 1939 16 23 medicine men
3 1940 7 17 medicine men
4 1941 12 25 medicine men
5 1942 24 50 medicine men
6 1943 16 21 medicine men
> data <- data[data$survive != "*", ] |
Convert strings to numeric
> data$survive <- as.numeric(data$survive)
> data$total <- as.numeric(data$total)
> data$year <- factor(data$year) |
> t1 <- data.frame(tapply(data$survive, data$year, sum))
> t2 <- data.frame(tapply(data$total, data$year, sum))
> temp <- data.frame(year = rownames(t1), y = t1[, 1], total = t2[,
+ 1])
> temp$prob <- temp$y/temp$total
> temp$ny <- temp$total - temp$y |
> plot(temp$year, temp$prob, pch = 19, col = "blue") |
> prop.test(cbind(temp$y, temp$ny))
10-sample test for equality of proportions without continuity
correction
data: cbind(temp$y, temp$ny)
X-squared = 15.0388, df = 9, p-value = 0.08988
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3 prop 4 prop 5 prop 6 prop 7 prop 8
0.6666667 0.6818182 0.6237624 0.7126437 0.6272727 0.7558140 0.7377049 0.7890625
prop 9 prop 10
0.6829268 0.7500000 |
- Clearly prop test shows null can be rejected at 0.1 level. Which is more than clearly visible in the graph But alternate can be rejected at 0.05 level
(b) Are the proportions of male graduates who survived for 50 years after graduation the same for all Faculties?
> data.men <- data[data$sex == "men", ]
> t1 <- data.frame(tapply(data.men$survive, data.men$faculty, sum))
> t2 <- data.frame(tapply(data.men$total, data.men$faculty, sum))
> temp <- data.frame(faculty = rownames(t1), y = t1[, 1], total = t2[,
+ 1])
> temp$prob <- temp$y/temp$total
> temp$ny <- temp$total - temp$y |
> plot(temp$faculty, temp$prob, pch = 19, col = "blue") |
> prop.test(cbind(temp$y, temp$ny))
4-sample test for equality of proportions without continuity
correction
data: cbind(temp$y, temp$ny)
X-squared = 23.226, df = 3, p-value = 3.623e-05
alternative hypothesis: two.sided
sample estimates:
prop 1 prop 2 prop 3 prop 4
0.5476190 0.7253521 0.6436364 0.7663551 |
- Clearly prop test shows null can be rejected . Which is more than clearly visible in the graph
(c) Are the proportions of female graduates who survived for 50 years after graduation the same for Arts and Science?
> data.women <- data[data$sex == "women", ]
> t1 <- data.frame(tapply(data.women$survive, data.women$faculty,
+ sum))
> t2 <- data.frame(tapply(data.women$total, data.women$faculty,
+ sum))
> temp <- data.frame(faculty = rownames(t1), y = t1[, 1], total = t2[,
+ 1])
> temp$prob <- temp$y/temp$total
> temp$ny <- temp$total - temp$y |
> plot(temp$faculty, temp$prob, pch = 19, col = "blue") |
> prop.test(cbind(temp$y, temp$ny))
2-sample test for equality of proportions with continuity correction
data: cbind(temp$y, temp$ny)
X-squared = 6.2537, df = 1, p-value = 0.01239
alternative hypothesis: two.sided
95 percent confidence interval:
-0.24224935 -0.05429445
sample estimates:
prop 1 prop 2
0.8025478 0.9508197 |
- Clearly prop test shows null can be rejected . Which is more than clearly visible in the graph
(d) Is the difference between men and women in the proportion ofgraduates who survived for 50 years after graduation the same for Arts and Science?
> data.comp <- data[data$faculty == "arts" | data$faculty == "science",
+ ]
> data.comp$ns <- data.comp$total - data.comp$survive
> xtabs(cbind(data.comp$survive, data.comp$ns) ~ data.comp$faculty +
+ data.comp$sex)
, , = V1
data.comp$sex
data.comp$faculty men women
arts 92 126
science 164 58
, , = V2
data.comp$sex
data.comp$faculty men women
arts 76 31
science 50 3
> t1 <- xtabs(cbind(data.comp$survive) ~ data.comp$faculty + data.comp$sex)
> t2 <- xtabs(cbind(data.comp$total) ~ data.comp$faculty + data.comp$sex)
> t3 <- as.matrix(t1/t2)
> t.test(t3[, 1] - t3[, 2])
One Sample t-test
data: t3[, 1] - t3[, 2]
t = -6.2357, df = 1, p-value = 0.1012
alternative hypothesis: true mean is not equal to 0
95 percent confidence interval:
-0.6673628 0.2279696
sample estimates:
mean of x
-0.2196966 |
- Well … the conf bands include 0 and hence there are no differences