Purpose
I thought I knew everything about boxplots and was even tempted to skip the first chapter on them. How naive of me! I had recently heard a Stanford professor speaking about mindsets.

If there are 10 data points, say 1, 2, 3, …, 10, what is the median?

> x <- 1:10
> print((x[5] + x[6])/2)
[1] 5.5
> print(median(x))
[1] 5.5

What are the first and third quartiles?

> boxplot(x)
> y <- (boxplot(x))
> print(y)
$stats
     [,1]
[1,]  1.0
[2,]  3.0
[3,]  5.5
[4,]  8.0
[5,] 10.0
attr(,"class")
        1
"integer"
$n
[1] 10
$conf
         [,1]
[1,] 3.001801
[2,] 7.998199
$out
numeric(0)
$group
numeric(0)
$names
[1] "1"
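
Quartile conventions differ between functions, which is worth keeping in mind when reading `$stats`. As a small sketch (the variable names `hinges` and `q` are only for illustration), the hinges that the boxplot uses can be compared with the default `quantile()` estimate:

```r
x <- 1:10

# Tukey's five-number summary; elements 2 and 4 are the lower and upper hinges
hinges <- fivenum(x)[c(2, 4)]
hinges

# Default quantile() (type = 7) interpolates differently from the hinges
q <- quantile(x, c(0.25, 0.75))
q
```

For this `x` the hinges are 3 and 8, while `quantile()` returns 3.25 and 7.75, so "the first quartile" depends on which definition is meant.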

[Figure: boxplot of x (Boxplot-002.jpg)]

Well, at 33 years of age, I have learnt a lesson: knowledge about anything is not fixed. It keeps growing.

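I was expecting the first quartile at 3 and the third quartile at 8, and the $stats matrix agrees. The conf attribute, however, shows 3.001801 and 7.998199, which is slightly different. Why? I did not know the answer at first. It turns out that conf is not a pair of quartiles at all: it holds the lower and upper extremes of the notch that is drawn when notch = TRUE.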
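
The help page for `boxplot.stats` describes `conf` as the extremes of the notch, computed as roughly median ± 1.58 * IQR / sqrt(n). A quick check of that formula, assuming the spread between the hinges is what is used as the IQR (the names `hinges` and `notch` are only illustrative):

```r
x <- 1:10

# Lower and upper hinges from Tukey's five-number summary
hinges <- fivenum(x)[c(2, 4)]

# Notch limits: median +/- 1.58 * (hinge spread) / sqrt(n)
notch <- median(x) + c(-1.58, 1.58) * diff(hinges) / sqrt(length(x))
notch                      # 3.001801 7.998199

# The same numbers as reported by boxplot.stats()
boxplot.stats(x)$conf
```

So the values near 3 and 8 are a coincidence of this particular data set, not a second estimate of the quartiles.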

> boxplot.default
function (x, ..., range = 1.5, width = NULL, varwidth = FALSE,
    notch = FALSE, outline = TRUE, names, plot = TRUE, border = par("fg"),
    col = NULL, log = "", pars = list(boxwex = 0.8, staplewex = 0.5,
        outwex = 0.5), horizontal = FALSE, add = FALSE, at = NULL)
{
    args <- list(x, ...)
    namedargs <- if (!is.null(attributes(args)$names))
        attributes(args)$names != ""
    else rep(FALSE, length.out = length(args))
    groups <- if (is.list(x))
        x
    else args[!namedargs]
    if (0 == (n <- length(groups)))
        stop("invalid first argument")
    if (length(class(groups)))
        groups <- unclass(groups)
    if (!missing(names))
        attr(groups, "names") <- names
    else {
        if (is.null(attr(groups, "names")))
            attr(groups, "names") <- 1:n
        names <- attr(groups, "names")
    }
    cls <- sapply(groups, function(x) class(x)[1])
    cl <- if (all(cls == cls[1]))
        cls[1]
    else NULL
    for (i in 1:n) groups[i] <- list(boxplot.stats(unclass(groups[[i]]),
        range))
    stats <- matrix(0, nrow = 5, ncol = n)
    conf <- matrix(0, nrow = 2, ncol = n)
    ng <- out <- group <- numeric(0)
    ct <- 1
    for (i in groups) {
        stats[, ct] <- i$stats
        conf[, ct] <- i$conf
        ng <- c(ng, i$n)
        if ((lo <- length(i$out))) {
            out <- c(out, i$out)
            group <- c(group, rep.int(ct, lo))
        }
        ct <- ct + 1
    }
    if (length(cl) && cl != "numeric")
        oldClass(stats) <- cl
    z <- list(stats = stats, n = ng, conf = conf, out = out,
        group = group, names = names)
    if (plot) {
        if (is.null(pars$boxfill) && is.null(args$boxfill))
            pars$boxfill <- col
        do.call("bxp", c(list(z, notch = notch, width = width,
            varwidth = varwidth, log = log, border = border,
            pars = pars, outline = outline, horizontal = horizontal,
            add = add, at = at), args[namedargs]))
        invisible(z)
    }
    else z
}
<environment: namespace:graphics>
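
The listing above shows that `boxplot.default` hands each group to `boxplot.stats` and only then collects `stats`, `conf`, `n` and `out`. As a minimal sketch, the same numbers can be obtained without drawing anything by calling `boxplot.stats` directly (here `coef` plays the role of the `range` argument; the name `s` is only illustrative):

```r
x <- 1:10

# Compute the boxplot statistics for a single group, no plotting involved
s <- boxplot.stats(x, coef = 1.5)

s$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
s$n       # number of non-missing observations
s$conf    # notch limits
s$out     # points beyond the whiskers (none for this data)
```

Calling `boxplot(x, plot = FALSE)` is another way to get the list without a plot, since the `plot` argument appears in the signature above.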

OK, the five-number summary consists of the median, the lower quartile, the upper quartile, and the two extremes.

> median(x)
[1] 5.5
> fivenum(x)
[1]  1.0  3.0  5.5  8.0 10.0
> # the whisker limits are measured from the hinges (3 and 8), not from y$conf
> h <- fivenum(x)[c(2, 4)]
> h + c(-1.5, 1.5) * diff(h)
[1] -4.5 15.5
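
To see the `range = 1.5` rule actually flag something, here is a rough sketch with a made-up data set (`x2` is hypothetical: the same 1 to 10 plus one extreme value). The fences are computed from the hinges, and anything outside them is reported in `$out`:

```r
x2 <- c(1:10, 30)                      # hypothetical data with one extreme value

h <- fivenum(x2)[c(2, 4)]              # lower and upper hinges
fences <- h + c(-1.5, 1.5) * diff(h)   # hinge -/+ 1.5 * hinge spread
fences

# Points outside the fences, computed by hand and by boxplot.stats()
manual_out <- x2[x2 < fences[1] | x2 > fences[2]]
manual_out
boxplot.stats(x2)$out
```

The whiskers themselves stop at the most extreme data points that still lie inside the fences, which is why they are usually shorter than the fences.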

OK, to end with, here are the basic properties of a boxplot:

  1. Median and Mean bars are measures of location
  2. Relative location of the median and the mean in the box is a measure of skewness
  3. The lengths of the box and the whiskers are measures of spread
  4. The length of the whiskers indicates the tail length of the distribution
  5. Outlying points are marked with * or o
  6. The boxplots do not indicate multi modality or clusters
  7. If we compare the relative size and location of the boxes, we are comparing distributions

So, obviously, histograms are better for understanding multimodal distributions.
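
A small simulated illustration of that last point, assuming a mixture of two normal samples as the bimodal data (the seed, sample sizes and means are arbitrary choices for the sketch):

```r
set.seed(1)                                  # arbitrary seed, for reproducibility only

# Hypothetical bimodal sample: two well-separated normal clusters
z <- c(rnorm(200, mean = 0), rnorm(200, mean = 6))

op <- par(mfrow = c(1, 2))                   # two plots side by side
boxplot(z, main = "Boxplot")                 # a single box; the two clusters are hidden
hist(z, breaks = 30, main = "Histogram")     # two separate peaks are visible
par(op)                                      # restore the previous layout
```

The boxplot collapses the sample into one box and two whiskers, while the histogram shows the two modes directly.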