Cracked density() function

After 3 weeks and close to 2 days of constant thinking about density functions, I have finally cracked the density function in base R.My head was spinning in all directions and I could not get my hands on cracking it.

Finally I did what any failed person would do to learn something. Start from scratch

Firstly I wrote my own R code to do kernel density estimation

> gauss.wt <- function(x) {
+     1/(sqrt(2 * pi)) * exp(-x^2/2)
+ }
> set.seed(1977)
> n <- 512
> obs.c <- sample(c(1, 2), n, T)
> mus <- c(3, 7)
> sigs <- c(1, 1)
> x <- sapply(obs.c, function(x) rnorm(1, mus[x], sigs[x]))
> n <- length(x)
> h <- bw.nrd0(x)
> prob <- sapply(x, function(z) (1/(n * h)) * sum(gauss.wt((z -
+     x)/h)))
> z <- cbind(x, prob)
> z <- z[order(x), ]

> par(mfrow = c(1, 1))
> plot(prob ~ x, type = "l", data = z, xlim = range(x), ylim = c(0,
+     0.2), col = "red", lwd = 3)

Ok, it looks like KDE

Now I wanted to compare with base R output

> res <- density(x)
> prob1 <- approx(res$x, res$y, z[, 1])

So, I was able to get the density values at all the relevant points.

> par(mfrow = c(1, 1))
> plot(prob ~ x, type = "l", data = z, xlim = range(x), ylim = c(0,
+     0.2), col = "grey", lwd = 8)
> points(z[, 1], prob1$y, col = "red", lty = "dashed", cex = 0.2,
+     pch = 19)

This was the first mini achievement. I could manually replicate the density function

Now the next task was to customize density function of R and code my own. I download the R source code, read through the C file that was being used in the density function and replicated the C code in R.

> density.rad <- function(x, bw = "nrd0", n = 512, cut = 3) {
+     x <- as.vector(x)
+     weights <- rep.int(1/n, n)
+     n <- max(n, 512)
+     if (n > 512)
+         n <- 2^ceiling(log2(n))
+     bw <- bw.nrd0(x)
+     from <- min(x) - cut * bw
+     to <- max(x) + cut * bw
+     lo <- from - 4 * bw
+     up <- to + 4 * bw
+     BinDist <- function(x, weights, lo, up, n) {
+         y <- numeric(2 * n)
+         xdelta <- (up - lo)/n
+         for (i in seq_along(x)) {
+             xpos <- (x[i] - lo)/xdelta
+             ix <- floor(xpos)
+             fx <- xpos - ix
+             wi <- weights[i]
+             y[ix] <- y[ix] + (1 - fx) * wi
+             y[ix + 1] <- y[ix + 1] + (fx) * wi
+         }
+         return(y)
+     }
+     y <- BinDist(x, weights, lo, up, n)
+     kords <- seq.int(0, 2 * (up - lo), length.out = 2L * n)
+     kords[(n + 2):(2 * n)] <- -kords[n:2]
+     kords <- dnorm(kords, sd = bw)
+     kords <- fft(fft(y) * Conj(fft(kords)), inverse = TRUE)
+     kords <- pmax.int(0, Re(kords)[1L:n]/length(y))
+     xords <- seq.int(lo, up, length.out = n)
+     x <- seq.int(from, to, length.out = n)
+     list(x = x, y = approx(xords, kords, x)$y)
+ }

> res <- density.rad(x)
> prob1 <- approx(res$x, res$y, z[, 1])

So, I was able to get the density values at all the relevant points.

> par(mfrow = c(1, 1))
> plot(prob ~ x, type = "l", data = z, xlim = range(x), ylim = c(0,
+     0.2), col = "grey", lwd = 8)
> points(z[, 1], prob1$y, col = "red", lty = "dashed", cex = 0.2,
+     pch = 19)

So, finally I have achieved a mini milestone, i.e. understanding the density function at a certain depth.

I am still unclear on the convolution front. Let me take some time and understand that aspect. But for now , let me blog this. After a very long time , I feel that I have cracked something nice. I had been going through a lot of troubles to understand this.