T-Chisquare-F
Purpose
I want to tell the story of three friends: the t distribution, the Normal distribution, and the Chi-square distribution.
The Normal distribution is easy to visualize.
set.seed(1977)
n <- 10000
x <- rnorm(n)
par(mfrow = c(1, 1))
hist(x, col = "grey", prob = T, ylim = c(0, 0.5), xlim = c(-3, 3),
     main = "Normal Distribution", breaks = seq(-9, 9, 0.05), xlab = "")
par(new = T)
# overlay the kernel density estimate on the histogram
plot(density(x), ylim = c(0, 0.5), col = "red", lty = 1, lwd = 3,
     xlim = c(-3, 3), main = "")
How does the t density look for varying degrees of freedom?
par(new = F)
cols <- rainbow(90)
# one density curve per degrees-of-freedom value, overlaid on the same axes
for (k in 2:90) {
    K <- k
    x <- rt(n, K)
    plot(density(x), ylim = c(0, 1), col = cols[k], lty = 1,
         lwd = 3, xlim = c(-3, 3), xlab = "", main = "")
    par(new = T)
}
Chi-square visualization
set.seed(1977)
par(new = F)
n <- 10000
xrange <- c(0, 20)
k <- 1:5
cols <- rainbow(5)
par(mfrow = c(1, 1))
for (k in 1:5) {
    x <- rchisq(n, k)
    hist(x, prob = T, ylim = c(0, 0.5), xlim = xrange,
         main = "Chi Square Distribution", xlab = "", col = cols[k])
    par(new = T)
}
legend("topright", legend = 1:5, fill = cols)
How does the Chi-square density look for varying degrees of freedom?
plot.new()
par(new = F)
k <- 1:8
cols <- rainbow(8)
for (k in 1:8) {
    x <- rchisq(n, k)
    plot(density(x), ylim = c(0, 1), col = cols[k], lty = 1,
         lwd = 3, xlim = c(0, 30), xlab = "", main = "")
    par(new = T)
}
legend("topright", legend = 1:8, fill = cols)
How does the standardized Chi-square density look for varying degrees of freedom?
par(new = F)
k <- 1:90
cols <- rainbow(90)
for (k in 1:90) {
    K <- k + 10
    x <- rchisq(n, K)
    # standardize using the Chi-square mean K and variance 2K
    x.t <- (x - K)/sqrt(2 * K)
    plot(density(x.t), ylim = c(0, 1), col = cols[k], lty = 1,
         lwd = 3, xlim = c(-3, 3), xlab = "", main = "")
    par(new = T)
}
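To make explicit what these standardized curves are settling towards, here is a small sketch of my own (not part of the original transcript) that overlays the standard normal density on one standardized Chi-square sample; K = 100 is an arbitrary choice of degrees of freedom.

# My own sketch: overlay the N(0, 1) density on a standardized Chi-square sample
par(new = F)
set.seed(1977)
n <- 10000
K <- 100                                 # arbitrary degrees of freedom for illustration
x <- rchisq(n, K)
x.t <- (x - K)/sqrt(2 * K)               # standardize: mean K, variance 2K
plot(density(x.t), xlim = c(-3, 3), ylim = c(0, 0.5), col = "blue", lwd = 3,
     xlab = "", main = "Standardized Chi-square vs Normal")
curve(dnorm(x), from = -3, to = 3, col = "red", lwd = 2, add = TRUE)
legend("topright", legend = c("standardized Chi-square", "N(0, 1)"), fill = c("blue", "red"))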
Now for the relation between the three distributions.
If x1 is N(0, 1) and x2 is Chi-square(k), then x1 / sqrt(x2 / k) follows a t distribution with k degrees of freedom.
par(new = F)
n <- 1e+05
k <- 3
x <- rnorm(n)
y <- rchisq(n, k)
y.hat <- y/k
# z = x / sqrt(y / k): standard normal over the root of a scaled Chi-square
z <- x/sqrt(y.hat)
plot(density(z), xlim = c(-4, 4), col = "blue", lwd = 2, main = "", xlab = "")
par(new = T)
plot(density(x), xlim = c(-4, 4), col = "red", lwd = 2, main = "", xlab = "")
legend("topleft", legend = c("z", "normal"), fill = c("blue", "red"))
Clearly z is not normal; compare it with a t distribution instead.
par(new = F)
z1 <- rt(n, k)
plot(density(z), xlim = c(-4, 4), col = "blue", lwd = 2, main = "", xlab = "")
par(new = T)
plot(density(z1), xlim = c(-4, 4), col = "red", lwd = 2, main = "", xlab = "")
legend("topleft", legend = c("z", "t"), fill = c("blue", "red"))
- Thus the random variable x1 / sqrt(x2 / k) follows a t distribution with k degrees of freedom.
Why? I know that the ratio of two Chi-square variables, each divided by its degrees of freedom, is F distributed, and since x1^2 is Chi-square(1), the above t-type statistic is nothing but the square root of an F(1, k) statistic (a quick check of this is sketched below). But what exactly is the connection between z and t?
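Here is a quick simulation sketch of my own (not from the original), reusing the same k = 3 as above, to check that squaring the t-type statistic does give an F(1, k) variable.

# My own sketch: check that z^2, where z = x / sqrt(y / k), matches F(1, k)
par(new = F)
set.seed(1977)
n <- 1e+05
k <- 3
x <- rnorm(n)
y <- rchisq(n, k)
z <- x/sqrt(y/k)
f <- rf(n, 1, k)                         # reference F(1, k) sample
qqplot(z^2, f, xlim = c(0, 20), ylim = c(0, 20),
       xlab = "quantiles of z^2", ylab = "quantiles of F(1, k)", main = "z^2 vs F(1, k)")
abline(0, 1, col = "red", lwd = 2)       # points should hug this line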
I feel like a complete fool because I have never looked into the assumptions of the t-test. I am 33 years old, and in my whole life on this planet I have never once looked into this. The assumptions underlying a t-test are the following.
Most t-test statistics have the form T = Z/s, where Z and s are functions of the data.
- Z follows a standard normal distribution under the null hypothesis
- p times s^2 follows a Chi-square distribution with p degrees of freedom under the null hypothesis, where p is a positive constant
- Z and s are independent.
So the assumptions actually create the connection between the t statistic and the F statistic: T = Z/s is exactly the x1 / sqrt(x2 / p) construction above (with x2 = p * s^2), and T^2 = Z^2 / s^2 is a ratio of independent Chi-squares, each divided by its degrees of freedom, i.e. an F(1, p) statistic.
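To see this form concretely, here is a sketch of my own (not from the original) using the one-sample t statistic: for a normal sample, Z = sqrt(n) * mean(x) is standard normal, (n - 1) * s^2 is Chi-square(n - 1) and independent of Z, so T = Z/s reproduces the t(n - 1) distribution and T^2 reproduces F(1, n - 1).

# My own sketch: the one-sample t statistic as T = Z / s, with T ~ t(n - 1) and T^2 ~ F(1, n - 1)
par(new = F)
set.seed(1977)
m <- 20000                               # number of simulated samples
n <- 10                                  # sample size per replication (arbitrary)
T.stat <- replicate(m, {
    smp <- rnorm(n)                      # null is true: mean 0, sd 1
    Z <- sqrt(n) * mean(smp)             # standard normal under the null
    s <- sd(smp)                         # (n - 1) * s^2 ~ Chi-square(n - 1), independent of Z
    Z/s                                  # the t statistic
})
par(mfrow = c(1, 2))
qqplot(T.stat, rt(m, n - 1), xlab = "simulated T", ylab = "t(n - 1) quantiles",
       main = "T vs t(n - 1)")
abline(0, 1, col = "red", lwd = 2)
qqplot(T.stat^2, rf(m, 1, n - 1), xlab = "simulated T^2", ylab = "F(1, n - 1) quantiles",
       main = "T^2 vs F(1, n - 1)")
abline(0, 1, col = "red", lwd = 2)
par(mfrow = c(1, 1))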