Basic Plots

Purpose

The purpose of this post is to summarize what I learned from John Fox's book “An R and S-PLUS Companion to Applied Regression”.

Although I have already read Faraway's Linear Models with R, I would like to take 3 hours to go through this entire book and write down the specific code I learn from it.

Chapter 3

TOOL : Histogram

> library(car)
> attach(Prestige)
> hist(income, n.bins(income))

n.bins computes the number of histogram bins using the Freedman-Diaconis rule.
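To see what the rule does, here is a minimal sketch in base R. It assumes the textbook form of the Freedman-Diaconis rule (bin width = 2 * IQR * n^(-1/3)) and uses the built-in faithful data only so that it runs without the car package. It is not the source of n.bins itself.

```r
# Freedman-Diaconis rule by hand (sketch; faithful is a stand-in data set)
x <- faithful$waiting
n <- length(x)

h <- 2 * IQR(x) * n^(-1/3)          # bin width suggested by the rule
k <- ceiling(diff(range(x)) / h)    # number of bins needed to cover the range

# Base R ships the same rule as nclass.FD(); hist() accepts it by name.
# hist() passes the count through pretty(), so the drawn histogram
# may end up with a slightly different number of bins.
hist(x, breaks = "FD", main = "Freedman-Diaconis bins")
```

Because the width depends on the IQR rather than the standard deviation, a few extreme incomes do not inflate the bin width much.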

TOOL : Stem-and-Leaf Plot

> stem(income)
  The decimal point is 3 digit(s) to the right of the |

   0 | 6979
   2 | 44689001125556667999
   4 | 012233456777881111234566889
   6 | 01233556679901145679
   8 | 000012334488999936
  10 | 4004
  12 | 45
  14 | 026
  16 | 5
  18 | 3
  20 |
  22 |
  24 | 39

TOOL : Density Estimates

> hist(income, nclass = n.bins(income), prob = TRUE)
> lines(density(income), lwd = 2)
> box()
> lines(density(income, adjust = 0.5), lwd = 1)

TOOL : Quantile Plot

> qq.plot(income)

TOOL : Box Plot

> boxplot(income)

TOOL : Scatter Plot

> scatterplot(income, prestige, span = 0.6, lwd = 3)

TOOL : Coded Scatter Plots

> scatterplot(prestige ~ income | type, span = 0.6, lwd = 3)
> detach(Prestige)

TOOL : Jitter

> attach(Vocab)
> par(mfrow = c(1, 2))
> plot(education, vocabulary, main = "Without Jitter")
> plot(jitter(education), jitter(vocabulary), main = "With Jitter")
> abline(lm(vocabulary ~ education))
> lines(lowess(education, vocabulary, f = 0.2), lwd = 3)

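lowess is an acronym for locally weighted scatterplot smoothing, a form of locally weighted regression. The argument f is the span: the fraction of the points used in each local fit.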
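The effect of the span is easiest to see by drawing two smooths on the same plot. This is a small sketch using the built-in cars data (a stand-in, not the Vocab data used above); the two values of f are arbitrary choices for illustration.

```r
# Compare a small and a large lowess span on the same scatterplot
fit_rough  <- lowess(cars$speed, cars$dist, f = 0.2)  # few points per local fit
fit_smooth <- lowess(cars$speed, cars$dist, f = 0.8)  # many points per local fit

plot(cars$speed, cars$dist, xlab = "speed", ylab = "dist")
lines(fit_rough, lty = 2)    # wigglier curve, follows local bumps
lines(fit_smooth, lwd = 2)   # smoother curve, follows the overall trend
```

A small f tracks local detail at the cost of a noisier curve; a large f gives a smoother curve that can miss real bends.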

TOOL : Bivariate Density Estimates

> library(sm)
> attach(SLID)
> par(mfrow = c(1, 1))
> valid <- complete.cases(wages, education)
> sm.density(cbind(education[valid], wages[valid]), display = "image",
+     col = gray(seq(1, 0, length = 100)))
Warning: weights overwritten by binning
> points(jitter(education, amount = 0.25), wages, cex = 0.15)
> box()
> lines(lowess(education[valid], wages[valid], f = 1/3), lwd = 3)
> remove(valid)

TOOL : Parallel Box Plots

> attach(Ornstein)
> boxplot(interlocks ~ nation)

TOOL : Scatterplot Matrices

> attach(Prestige)
> scatterplot.matrix(cbind(prestige, income, education, women),
+     diagonal = "density", span = 0.75)

TOOL : Conditional Plots

> attach(SLID)
> coplot(log(wages) ~ education | age + sex, panel = panel.car,
+     col = gray(0.5), lwd = 3, cex = 0.4)

TOOL : Box-Cox Transformation

> summary(box.cox.powers(income))
Box-Cox Transformation to Normality

 Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
    0.1793   0.1108        1.6179       -7.4062

L.R. test, power = 0:  2.7103   df = 1   p = 0.0997
L.R. test, power = 1:  47.261   df = 1   p = 0
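Reading the output: the estimated power is about 0.18, the test does not reject power = 0 (the log transformation) at the 5% level, and power = 1 (no transformation) is clearly rejected. To make the mechanics concrete, here is a rough base R sketch of the same idea. It assumes the standard Box-Cox family and a grid search over the profile log-likelihood, and it uses the built-in rivers data as a stand-in. The helper names box_cox and profile_loglik are mine, not functions from car, and box.cox.powers may compute its estimates differently.

```r
# Box-Cox family: (y^lambda - 1) / lambda, with log(y) at lambda = 0
box_cox <- function(y, lambda) {
  if (abs(lambda) < 1e-8) log(y) else (y^lambda - 1) / lambda
}

# Profile log-likelihood (up to a constant) for normality of the transformed y
profile_loglik <- function(lambda, y) {
  z <- box_cox(y, lambda)
  n <- length(y)
  -n / 2 * log(mean((z - mean(z))^2)) + (lambda - 1) * sum(log(y))
}

y <- rivers                         # positive, right-skewed stand-in data
lambdas <- seq(-2, 2, by = 0.01)    # grid of candidate powers
ll <- sapply(lambdas, profile_loglik, y = y)
lambda_hat <- lambdas[which.max(ll)]

# Likelihood-ratio test of power = 0 (log transformation), 1 df
lr0 <- 2 * (max(ll) - profile_loglik(0, y))
p0  <- pchisq(lr0, df = 1, lower.tail = FALSE)
```

The grid search is only approximate; a finer grid or optimize() would tighten the estimate.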

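Two additional tools that I learnt about are:

- The bulging rule, which uses the direction in which a curved scatterplot bulges to pick power transformations of x or y that straighten it.
- The spread-level plot, which plots log IQR (spread) against log median (level) across groups.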
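A spread-level plot can be put together by hand in a few lines. The sketch below uses the built-in chickwts data as a stand-in for grouped data such as Ornstein, and it assumes the usual reading of the plot: if the fitted slope is b, a power transformation with p = 1 - b should roughly even out the spread.

```r
# Spread-level plot by hand: log IQR against log median, one point per group
level  <- tapply(chickwts$weight, chickwts$feed, median)
spread <- tapply(chickwts$weight, chickwts$feed, IQR)

plot(log(level), log(spread), xlab = "log median", ylab = "log IQR")
fit <- lm(log(spread) ~ log(level))
abline(fit)

# Suggested power for stabilizing spread: one minus the fitted slope
b <- unname(coef(fit)[2])
suggested_power <- 1 - b
```

With only a handful of groups the slope is noisy, so the suggested power is best rounded to a simple value such as 0 (log) or 0.5 (square root).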