MLE
Purpose
To understand maximum likelihood estimation (MLE) and to code a likelihood function so that parameters can be estimated by maximizing it. The basic idea is to take a process (for example an ARMA process), write down its likelihood, and obtain the parameter estimates, their standard errors, t statistics and so on by maximizing the log-likelihood.
Typically the parameters of interest include the mean and the variance (sigma) of the process. If the model has, say, q parameters, one writes the likelihood as a function of those q parameters and maximizes it numerically.
The maximized likelihood, together with its Hessian, can then be used to carry out inference about the parameters.
First, let's simulate some data:
> set.seed(1977)
> N <- 10000
> x <- cbind(1, runif(N))
> beta.true <- c(2, 3)
> error.var <- 3
> indep <- as.matrix(x)
> dep <- indep %*% beta.true + sqrt(error.var) * rnorm(N)
> fit <- lm(dep ~ indep[, 2])
> fit.sum <- summary(fit)
> coef(fit.sum)[, 1]
(Intercept)  indep[, 2]
   2.035283    2.957402
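Before maximizing anything, it helps to know what answer to expect. A minimal sketch (not part of the original transcript, using only the indep and dep objects defined above) of the closed-form OLS solution, which the MLE of beta should reproduce:

## Closed-form OLS estimate beta.hat = (X'X)^{-1} X'y; with this seed it
## should match the lm() coefficients above and be close to beta.true = c(2, 3).
beta.hat <- solve(t(indep) %*% indep, t(indep) %*% dep)
beta.hat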
Now let's write the (negative) log-likelihood function and the analytical gradient function. One version takes the parameters as a vector; the other takes them as a named list.
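Both versions compute the same quantity. In the notation of the code below, for the linear model $y = X\beta + e$ with $e \sim N(0, \sigma^2 I)$, the log-likelihood is

\[
\log L(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}\,(y - X\beta)'(y - X\beta),
\]

and since optim() minimizes by default, the functions return its negative.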
> ## Negative log-likelihood, parameters supplied as a vector
> myMLE <- function(params, n, dep, indep) {
+     beta.test <- as.matrix(params[1:2])
+     sigma.sq <- as.matrix(params[3])
+     if (sigma.sq <= 0)
+         return(NA)
+     e <- dep - indep %*% beta.test
+     loglike <- (-n/2) * log(2 * pi) - (n/2) * log(sigma.sq) -
+         (1/(2 * sigma.sq)) * t(e) %*% (e)
+     return(-loglike)
+ }
> myparams <- list(intercept = 1, slope = 1, sigma.sq = 1)
> ## Negative log-likelihood, parameters supplied as a named list
> myMLE.p <- function(myparams, n, dep, indep) {
+     myparams <- as.list(myparams)
+     beta.test <- as.matrix(c(myparams$intercept, myparams$slope))
+     sigma.sq <- as.matrix(myparams$sigma.sq)
+     if (sigma.sq <= 0)
+         return(NA)
+     e <- dep - indep %*% beta.test
+     loglike <- (-n/2) * log(2 * pi) - (n/2) * log(sigma.sq) -
+         (1/(2 * sigma.sq)) * t(e) %*% (e)
+     return(-loglike)
+ }
> ## Analytical gradient of the negative log-likelihood
> ols.gradient <- function(params, n, dep, indep) {
+     beta.test <- as.matrix(params[1:2])
+     sigma.sq <- params[3]
+     e <- dep - indep %*% beta.test
+     g <- numeric(3)
+     g[1:2] <- (t(indep) %*% e)/sigma.sq
+     g[3] <- (-n/(2 * sigma.sq)) + (t(e) %*% e)/(2 * sigma.sq *
+         sigma.sq)
+     return(-g)
+ }
> fit <- optim(c(1, 1, 1), myMLE, n = N, dep = dep, indep = indep)
> cat("The fit parameters are ", fit$par)
The fit parameters are 2.035939 2.957095 3.051341
> fit <- optim(myparams, myMLE.p, n = N, dep = dep, indep = indep)
> cat("The fit parameters are ", fit$par)
The fit parameters are 2.035939 2.957095 3.051341
> fit <- optim(c(1, 1, 1), method = "L-BFGS-B", myMLE, gr = ols.gradient,
+     lower = c(-Inf, -Inf, 1e-06), upper = rep(Inf, 3), n = N,
+     dep = dep, indep = indep)
> cat("The fit parameters are ", fit$par)
The fit parameters are 2.035283 2.957401 3.051711
> fit <- optim(c(1, 1, 1), method = "L-BFGS-B", myMLE, gr = ols.gradient,
+     lower = c(-Inf, -Inf, 1e-06), upper = rep(Inf, 3), hessian = TRUE,
+     n = N, dep = dep, indep = indep)
> inverted <- solve(fit$hessian)
> results <- cbind(fit$par, sqrt(diag(inverted)), fit$par/sqrt(diag(inverted)))
> colnames(results) <- c("Coefficient", "Std. Err.", "t")
> rownames(results) <- c("Intercept", "X", "Sigma.sq")
> cat("MLE results --\n")
MLE results --
> print(results)
          Coefficient  Std. Err.        t
Intercept    2.035283 0.03509248 57.99769
X            2.957401 0.06077070 48.66492
Sigma.sq     3.051711 0.04315770 70.71068
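For comparison (this check is not part of the original transcript), the lm() summary stored in fit.sum earlier gives essentially the same point estimates and standard errors for the intercept and slope; OLS applies a degrees-of-freedom correction to the error variance, so the numbers differ only slightly from the MLE ones:

## fit.sum was created before `fit` was overwritten by optim(), so it is
## still available; the columns shown are Estimate, Std. Error and t value.
coef(fit.sum)[, 1:3]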
Results using some of the other optimization routines available in optim() are:
> fit <- optim(c(1, 1, 1), method = "Nelder-Mead", myMLE, gr = ols.gradient, + n = N, dep = dep, indep = indep) > cat("The fit using Nelder Mead is ", fit$par, "\n") The fit using Nelder Mead is 2.035939 2.957095 3.051341 > fit <- optim(c(1, 1, 1), method = "L-BFGS-B", myMLE, gr = ols.gradient, + lower = c(-Inf, -Inf, 1e-06), upper = rep(Inf, 3), hessian = TRUE, + n = N, dep = dep, indep = indep) > cat("The fit using L-BFGS-B is ", fit$par) The fit using L-BFGS-B is 2.035283 2.957401 3.051711 > fit <- optim(c(1, 1, 1), method = "SANN", myMLE, gr = ols.gradient, + n = N, dep = dep, indep = indep) > cat("The fit using SANN is ", fit$par) The fit using SANN is 1 1 1 > fit <- optim(c(1, 1, 1), method = "CG", myMLE, gr = ols.gradient, + n = N, dep = dep, indep = indep) > cat("The fit using CG is ", fit$par) The fit using CG is 2.035283 2.957401 3.051710 |
The key is to decide on the distributional assumption for the data (here, normally distributed errors); once the likelihood is written down, the same machinery delivers the estimates, standard errors and t statistics for the parameters.