Purpose
I want to tell a story of three friends: the t distribution, the normal distribution, and the chi-square distribution.

The normal distribution is easy to visualize:

> set.seed(1977)
> n <- 10000
> x <- rnorm(n)
> par(mfrow = c(1, 1))
> hist(x, col = "grey", prob = T, ylim = c(0, 0.5), xlim = c(-3,
+     3), main = "Normal Distribution", breaks = seq(-9, 9, 0.05),
+     xlab = "")
> par(new = T)
> plot(density(x), ylim = c(0, 0.5), col = "red", lty = 1, lwd = 3,
+     xlim = c(-3, 3), main = "")

TDist-001.jpg

How does the t density look for varying degrees of freedom?

> par(new = F)
> cols <- rainbow(90)
> for (k in 2:90) {
+     K <- k
+     x <- rt(n, K)
+     plot(density(x), ylim = c(0, 1), col = cols[k], lty = 1,
+         lwd = 3, xlim = c(-3, 3), xlab = "", main = "")
+     par(new = T)
+ }

TDist-002.jpg

Chi-square visualization

> set.seed(1977)
> par(new = F)
> n <- 10000
> xrange <- c(0, 20)
> k <- 1:5
> cols <- rainbow(5)
> par(mfrow = c(1, 1))
> for (k in 1:5) {
+     x <- rchisq(n, k)
+     hist(x, prob = T, ylim = c(0, 0.5), xlim = xrange, main = "Chi Square Distribution",
+         xlab = "", col = cols[k])
+     par(new = T)
+ }
> legend("topright", legend = 1:5, fill = cols)

TDist-003.jpg

How does the chi-square density look for varying degrees of freedom?

> plot.new()
> par(new = F)
> k <- 1:8
> cols <- rainbow(8)
> for (k in 1:8) {
+     x <- rchisq(n, k)
+     plot(density(x), ylim = c(0, 1), col = cols[k], lty = 1,
+         lwd = 3, xlim = c(0, 30), xlab = "", main = "")
+     par(new = T)
+ }
> legend("topright", legend = 1:8, fill = cols)

TDist-004.jpg

How does the standardized chi-square density look for varying degrees of freedom?

> par(new = F)
> k <- 1:90
> cols <- rainbow(90)
> for (k in 1:90) {
+     K <- k + 10
+     x <- rchisq(n, K)
+     x.t <- (x - K)/sqrt(2 * K)
+     plot(density(x.t), ylim = c(0, 1), col = cols[k], lty = 1,
+         lwd = 3, xlim = c(-3, 3), xlab = "", main = "")
+     par(new = T)
+ }

TDist-005.jpg

Now for the relation between the three distributions:
If x1 is N(0,1) and x2 is Chisq(k), independent of x1, then x1 / squareroot(x2/k) follows a t distribution with k degrees of freedom.

> par(new = F)
> n <- 1e+05
> k <- 3
> x <- rnorm(n)
> y <- rchisq(n, k)
> y.hat <- y/k
> z <- x/sqrt(y.hat)
> plot(density(z), xlim = c(-4, 4), col = "blue", lwd = 2, main = "",
+     xlab = "")
> par(new = T)
> plot(density(x), xlim = c(-4, 4), col = "red", lwd = 2, main = "",
+     xlab = "")
> legend("topleft", legend = c("z", "normal"), fill = c("blue",
+     "red"))

TDist-006.jpg

Clearly it is not normal. Compare it with a t distribution with k degrees of freedom instead:

> par(new = F)
> z1 <- rt(n, k)
> plot(density(z), xlim = c(-4, 4), col = "blue", lwd = 2, main = "",
+     xlab = "")
> par(new = T)
> plot(density(z1), xlim = c(-4, 4), col = "red", lwd = 2, main = "",
+     xlab = "")
> legend("topleft", legend = c("z", "t"), fill = c("blue", "red"))

TDist-007.jpg

  1. Thus the rv x1 / squareroot(x2/k) follows a t distribution with k degrees of freedom.

Why? I know that the ratio of two scaled chi-square variables is F, and since x1 squared is Chisq(1), the above t statistic is nothing but the square root of an F statistic, F(1, k). But what on earth is the connection between z and t?
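The t-squared-is-F claim can be checked numerically; here is a minimal sketch of my own (the variable names and quantile levels are my choices, not part of the transcript above). If T ~ t with k degrees of freedom, then T^2 = (x1^2 / 1) / (x2 / k) is a ratio of two scaled chi-squares, i.e. F(1, k), so the quantiles must match exactly:

```r
# If T ~ t_k, then T^2 ~ F(1, k): the squared two-sided t critical
# value equals the one-sided F(1, k) critical value.
k <- 3
qt(0.975, df = k)^2         # squared t critical value
qf(0.95, df1 = 1, df2 = k)  # F critical value: the same number
```

Both lines print about 10.13 for k = 3.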

I have to confess: I have never looked into the assumptions of the t-test. I am 33 years old, and in my whole life on this planet I have never once looked into this. The assumptions underlying a t-test are:

Most t-test statistics have the form T = Z/s, where Z and s are functions of the data.

  • Z follows a standard normal distribution under the null hypothesis
  • p times s squared follows a chi-square distribution with p degrees of freedom under the null hypothesis, where p is a positive constant
  • Z and s are independent.

So the assumptions themselves create the connection between the t statistic and the F statistic.
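These assumptions can be watched in action with a small simulation; a sketch under my own choices of sample size and seed (not from the original post). For m iid N(0,1) observations, Z = sqrt(m)*mean(x) is standard normal, (m-1)*var(x) is chi-square with m-1 degrees of freedom and independent of Z, so the one-sample t statistic Z/sd(x) reproduces the t density with m-1 degrees of freedom:

```r
# Simulate the one-sample t statistic T = Z/s and overlay the
# theoretical t density with m - 1 degrees of freedom.
set.seed(1977)
m <- 5
tstat <- replicate(10000, {
    x <- rnorm(m)
    sqrt(m) * mean(x) / sd(x)  # Z = sqrt(m)*mean(x); s = sd(x)
})
plot(density(tstat), xlim = c(-4, 4), col = "blue", lwd = 2,
    main = "", xlab = "")
curve(dt(x, df = m - 1), add = TRUE, col = "red", lwd = 2)
legend("topleft", legend = c("simulated T", "t density"),
    fill = c("blue", "red"))
```

The two curves sit on top of each other, which is exactly the z-to-t connection the assumptions encode.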