Purpose

The purpose of this post is to summarize what I learned from John Fox's book, “An R and S-PLUS Companion to Applied Regression”.

Although I have read Faraway's “Linear Models with R”, I would like to take 3 hours to go through this entire book and write down any specific code learnings from it.

Chapter 3

TOOL : Histogram

> library(car)
> attach(Prestige)
> hist(income, n.bins(income))


n.bins implements the Freedman-Diaconis rule for choosing the number of histogram bins.
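To make the rule concrete, here is a minimal sketch of the Freedman-Diaconis calculation in base R. The helper name `fd_bins` is my own, and the built-in `faithful` data stands in for `income` so the snippet runs without the car package. It is an illustration of the rule, not the code behind `n.bins`.

```r
# Freedman-Diaconis rule: bin width h = 2 * IQR(x) * n^(-1/3),
# number of bins = ceiling(range / h).
# fd_bins is a hypothetical helper, not part of car.
fd_bins <- function(x) {
  x <- x[!is.na(x)]
  h <- 2 * IQR(x) * length(x)^(-1/3)
  if (h <= 0) return(1L)                 # degenerate case: IQR is zero
  as.integer(ceiling(diff(range(x)) / h))
}

x <- faithful$waiting                    # stand-in data, not Prestige$income
k <- fd_bins(x)
k
nclass.FD(x)                             # base R's version of the same rule

# hist() treats a single number for breaks as a suggestion only,
# so the number of bars drawn can differ from k.
h <- hist(x, breaks = k, plot = FALSE)
length(h$counts)
```

Base R also accepts `hist(x, breaks = "FD")`, which applies the same rule without a helper.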

TOOL : Stem and leaf plot

> stem(income)
  The decimal point is 3 digit(s) to the right of the |

   0 | 6979
   2 | 44689001125556667999
   4 | 012233456777881111234566889
   6 | 01233556679901145679
   8 | 000012334488999936
  10 | 4004
  12 | 45
  14 | 026
  16 | 5
  18 | 3
  20 |
  22 |
  24 | 39


TOOL : Density Estimates

> hist(income, nclass = n.bins(income), prob = T)
> lines(density(income), lwd = 2)
> box()
> lines(density(income, adjust = 0.5), lwd = 1)
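The two density curves above differ only in the `adjust` argument, which multiplies the default bandwidth. A quick check of that, again using `faithful` as stand-in data rather than `income`:

```r
x <- faithful$waiting                     # stand-in data, not Prestige$income
d_default <- density(x)                   # default bandwidth selector
d_rough   <- density(x, adjust = 0.5)     # half the default bandwidth

# The second bandwidth is half the first, so its curve follows the data
# more closely and looks rougher.
c(default = d_default$bw, half = d_rough$bw)
```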


TOOL : Quantile Plot

> qq.plot(income)


TOOL : Box Plot

> boxplot(income)


TOOL : Scatter Plot

> scatterplot(income, prestige, span = 0.6, lwd = 3)


TOOL : Coded Scatter Plots

> scatterplot(prestige ~ income | type, span = 0.6, lwd = 3)
> detach(Prestige)


TOOL : Jitter

> attach(Vocab)
> par(mfrow = c(1, 2))
> plot(education, vocabulary, main = "Without Jitter")
> plot(jitter(education), jitter(vocabulary), main = "With Jitter")
> abline(lm(vocabulary ~ education))
> lines(lowess(education, vocabulary, f = 0.2), lwd = 3)


lowess is an acronym for locally weighted scatterplot smoothing, a form of locally weighted regression.
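The idea behind locally weighted regression can be sketched in a few lines: to get the smoothed value at a target point, fit a weighted straight line to the nearest fraction `f` of the data, with weights that fall off with distance. The function `local_fit` below is a hypothetical helper written for illustration. It leaves out the robustness iterations and interpolation that `lowess()` performs, so its values will not match `lowess()` exactly.

```r
# One local linear fit at a target value x0 with tricube weights.
# local_fit is a hypothetical helper, not the lowess() algorithm itself.
local_fit <- function(x, y, x0, f = 2/3) {
  n <- length(x)
  k <- max(2, ceiling(f * n))             # number of nearest neighbours to use
  d <- abs(x - x0)
  h <- sort(d)[k]                         # distance to the k-th nearest point
  if (h == 0) stop("too many ties at x0 for this span")
  w <- ifelse(d < h, (1 - (d / h)^3)^3, 0)  # tricube weights, zero outside the window
  fit <- lm(y ~ x, weights = w)           # weighted least-squares line
  unname(predict(fit, newdata = data.frame(x = x0)))
}

# Smoothed stopping distance at speed 15 in the built-in cars data
local_fit(cars$speed, cars$dist, x0 = 15, f = 0.5)
```

A smaller `f` uses fewer neighbours and tracks local wiggles more closely, which is why `f = 0.2` in the jitter example gives a fairly flexible curve.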

TOOL : Bivariate Density Estimates

> library(sm)
> attach(SLID)
> par(mfrow = c(1, 1))
> valid <- complete.cases(wages, education)
> sm.density(cbind(education[valid], wages[valid]), display = "image",
+     col = gray(seq(1, 0, length = 100)))
Warning: weights overwritten by binning
> points(jitter(education, amount = 0.25), wages, cex = 0.15)
> box()
> lines(lowess(education[valid], wages[valid], f = 1/3), lwd = 3)
> remove(valid)


TOOL : Parallel Box plots

> attach(Ornstein)
> boxplot(interlocks ~ nation)


TOOL : Scatterplot Matrices

> attach(Prestige)
> scatterplot.matrix(cbind(prestige, income, education, women),
+     diagonal = "density", span = 0.75)


TOOL : Conditional Plots

> attach(SLID)
> coplot(log(wages) ~ education | age + sex, panel = panel.car,
+     col = gray(0.5), lwd = 3, cex = 0.4)


TOOL : Box cox transformation

> summary(box.cox.powers(income))
Box-Cox Transformation to Normality

 Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
    0.1793   0.1108        1.6179       -7.4062

L.R. test, power = 0:  2.7103   df = 1   p = 0.0997
L.R. test, power = 1:  47.261   df = 1   p = 0
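In the output above, the estimated power is close to 0 and the test of power = 0 is not rejected at the 5% level (p = 0.0997), while power = 1 is clearly rejected, which points towards a log transformation of income. The sketch below shows roughly how such an estimate and likelihood-ratio test can be computed by hand. The names `box_cox` and `bc_loglik` are my own, and simulated log-normal data is used in place of `income`. This illustrates the Box-Cox idea under those assumptions and is not the code behind `box.cox.powers`.

```r
# Box-Cox family: (x^lambda - 1) / lambda, with log(x) as the limit at lambda = 0
box_cox <- function(x, lambda) {
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood (up to a constant) for transforming x towards normality
bc_loglik <- function(lambda, x) {
  z <- box_cox(x, lambda)
  n <- length(x)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(x))
}

set.seed(1)
x <- rlnorm(200)                          # simulated positive, right-skewed data

# Maximum-likelihood estimate of the power over a plausible range
est <- optimize(bc_loglik, interval = c(-2, 2), x = x, maximum = TRUE)
est$maximum

# Likelihood-ratio test of power = 1 (no transformation needed)
lr1 <- 2 * (est$objective - bc_loglik(1, x))
pchisq(lr1, df = 1, lower.tail = FALSE)
```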

Two additional tools that I learnt about are:

- The bulging rule, and applying transformations according to it
- The spread-level plot, which plots the log IQR against the log median
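The spread-level idea can be sketched with base R alone: compute the median and IQR within each group, then regress log IQR on log median. By Tukey's rule of thumb, one minus the slope suggests a power transformation that evens out the spread. The built-in `InsectSprays` data is used here as a stand-in. It is not one of the book's examples, and this is a rough illustration rather than the car implementation.

```r
# Level (median) and spread (IQR) of insect counts within each spray group
med <- tapply(InsectSprays$count, InsectSprays$spray, median)
iqr <- tapply(InsectSprays$count, InsectSprays$spray, IQR)

# Spread-level relationship on the log scale
fit <- lm(log(iqr) ~ log(med))
slope <- unname(coef(fit)[2])

# Rule of thumb: suggested power = 1 - slope
power <- 1 - slope
round(c(slope = slope, power = power), 2)
```

A slope near 1 gives a suggested power near 0, that is, a log transformation.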