Fitting Distribution in R
Purpose
Is to work through FITTING DISTRIBUTIONS WITH R Release 0.4-21 February 2005 It is a 6 year old document but I think I should get my hands dirty a bit to be a plug and play mode.Also , any revisit of this document will make me think in a different way.
There are four steps in fitting distributions 1. Model/funtion choice 2. Estimate parameters 3. Evaluate goodness of fit 4. Goodness of fit statistical tests
Histogram
> x.norm <- rnorm(n = 200, m = 10, sd = 2) > hist(x.norm, main = "Histogram of observed data") |
Density state
> plot(density(x.norm), main = "Density estimate") |
ECDF state
> plot(ecdf(x.norm), main = "Density estimate") |
This is something I learnt today
qqplot can be used to check any sample distribution with theoretical distribution
> x.wei <- rweibull(n = 200, shape = 2.1, scale = 1.1) > x.teo <- rweibull(n = 200, shape = 2, scale = 1) > qqplot(x.teo, x.wei, main = "QQ-plot distr. Weibull") > abline(0, 1) |
The biggest challenge is that , curves only differ by mean, variability, skewness and kurtosis. So, if you standardize the variables, the a functional form with both skewness and kurtosis can be used to check the distribution
Normal
> curve(dnorm(x, m = 10, sd = 2), from = 0, to = 20, main = "Normal") |
Gamma
> curve(dgamma(x, scale = 1.5, shape = 2), from = 0, to = 15, main = "Gamma") |
Weibull
> curve(dweibull(x, scale = 2.5, shape = 1.5), from = 0, to = 15, + main = "weibull") |
Skewness and Kurtosis for Normal Distribution
> library(fBasics) > skewness(x.norm) [1] 0.0132836 attr(,"method") [1] "moment" > kurtosis(x.norm) [1] -0.1502271 attr(,"method") [1] "excess" |
Skewness and Kurtosis for Weibull
> skewness(x.wei) [1] 0.8799243 attr(,"method") [1] "moment" > kurtosis(x.wei) [1] 0.5694369 attr(,"method") [1] "excess" |
Check normal
> x <- rnorm(1000, 2, 3) > fitdistr(x, "normal") mean sd 2.02103915 3.06295954 (0.09685929) (0.06848986) |
Check gamma
> x <- rgamma(1000, 2, 3) > fitdistr(x, "gamma") shape rate 2.11176055 3.22441614 (0.08800314) (0.15158757) |
Check pois
> x <- rpois(1000, 2) > fitdistr(x, "poisson") lambda 1.96500000 (0.04432832) |
Look at the amazing collection that this fitdistr can be used to check the distributions
Chi-Square goodness of fit
> n <- 200 > x.pois <- rpois(n, 2.5) > lambda.est <- mean(x.pois) > tab.os <- table(x.pois) > freq.os <- vector() > for (i in 1:length(tab.os)) { + freq.os[i] <- tab.os[[i]] + } > freq.ex <- dpois((0:max(x.pois)), lambda = lambda.est) * n > acc <- mean(abs(freq.os - trunc(freq.ex))) > acc * 100/mean(freq.os) [1] 16.5 |
I began going through this 24 page document to check whether a particular distribution comes from cauchy.
Let me generate a gaussian normal variable and check whether it comes from cauchy
> x <- rnorm(1000, 0, 1) > h <- hist(x, breaks = 20) > y <- rcauchy(1000, 0) > y.cut <- cut(y, breaks = h$breaks) > observed.freq <- as.vector(table(y.cut))/1000 > expected.freq <- as.vector(h$density) > temp <- 0 > for (i in seq_along(observed.freq)) { + temp <- temp + ((observed.freq[i] - expected.freq[i])^2)/expected.freq[i] + } > dof <- length(observed.freq) - 2 - 1 > pchisq(temp, dof) [1] 1 |
This p is extremely small and hence one can reject the null hypothesis.
Obviously sample from cauchy is vastly different from sample from normal
Another learning is the package vcd which can be used to check goodness of fit
> library(vcd) > y <- rpois(1000, 2) > gf <- goodfit(y, type = "poisson", method = "MinChisq") > summary(gf) Goodness-of-fit test for poisson distribution X^2 df P(> X^2) Pearson 5.228552 7 0.6320941 |