Basic Plots
Purpose
The purpose of this post is to summarize what I learned from John Fox's book “An R and S-PLUS Companion to Applied Regression”.
Although I have already read Faraway's “Linear Models with R”, I would like to take about 3 hours to go through this entire book and write down any specific code learnings from it.
Chapter 3
TOOL : Histogram
> library(car)
> attach(Prestige)
> hist(income, n.bins(income))
n.bins implements the Freedman-Diaconis rule for choosing the number of histogram bins.
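As a rough sketch of what the Freedman-Diaconis rule computes, the bin width is 2 * IQR / n^(1/3), and the number of bins is the data range divided by that width. The helper name fd_bins is my own, and the faithful data set is only a stand-in for the Prestige income variable. Base R ships nclass.FD, which should agree on ordinary data.

```r
# Hypothetical helper (not from the book): Freedman-Diaconis bin count by hand.
fd_bins <- function(x) {
  x <- x[!is.na(x)]
  h <- 2 * IQR(x) / length(x)^(1/3)   # Freedman-Diaconis bin width
  ceiling(diff(range(x)) / h)         # number of bins covering the range
}

x <- faithful$waiting                 # stand-in data shipped with base R
bins_by_hand <- fd_bins(x)
bins_base_r  <- nclass.FD(x)          # base R implementation of the same rule
hist(x, breaks = bins_by_hand)        # breaks is treated as a suggestion by hist
```

Note that hist treats a single number for breaks as a suggestion and may round it to get "pretty" break points.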
TOOL : Stem-and-Leaf Plot
> stem(income)

  The decimal point is 3 digit(s) to the right of the |

   0 | 6979
   2 | 44689001125556667999
   4 | 012233456777881111234566889
   6 | 01233556679901145679
   8 | 000012334488999936
  10 | 4004
  12 | 45
  14 | 026
  16 | 5
  18 | 3
  20 |
  22 |
  24 | 39
TOOL : Density Estimates
> hist(income, nclass = n.bins(income), prob = TRUE)
> lines(density(income), lwd = 2)
> box()
> lines(density(income, adjust = 0.5), lwd = 1)
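The adjust argument in the second density call rescales the default bandwidth, so adjust = 0.5 halves it and gives a wigglier estimate. A small sketch of that relationship, using the faithful data set as a stand-in for income:

```r
x <- faithful$waiting                   # stand-in data, not the Prestige income

d_default <- density(x)                 # default bandwidth selector
d_half    <- density(x, adjust = 0.5)   # same selector, bandwidth multiplied by 0.5

bw_ratio <- d_half$bw / d_default$bw    # ratio of the two bandwidths actually used

hist(x, prob = TRUE)
lines(d_default, lwd = 2)               # smoother curve
lines(d_half, lwd = 1)                  # rougher curve, smaller bandwidth
```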
TOOL : Quantile Plot
> qq.plot(income)
TOOL : Box Plot
> boxplot(income)
TOOL : Scatter Plot
> scatterplot(income, prestige, span = 0.6, lwd = 3)
TOOL : Coded Scatter Plots
> scatterplot(prestige ~ income | type, span = 0.6, lwd = 3)
> detach(Prestige)
TOOL : Jitter
> attach(Vocab)
> par(mfrow = c(1, 2))
> plot(education, vocabulary, main = "Without Jitter")
> plot(jitter(education), jitter(vocabulary), main = "With Jitter")
> abline(lm(vocabulary ~ education))
> lines(lowess(education, vocabulary, f = 0.2), lwd = 3)
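lowess is short for locally weighted scatterplot smoothing, a form of locally weighted regression. The f argument sets the fraction of the data used in each local fit, so a smaller f follows the data more closely.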
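To see the effect of the span, here is a minimal sketch on simulated data (the sine curve, noise level, and seed are my own choices, not from the book). It fits lowess twice, once with a small f and once with a large f, and overlays both.

```r
set.seed(1)                                  # arbitrary seed for reproducibility
x <- seq(0, 10, length.out = 200)
y <- sin(x) + rnorm(200, sd = 0.3)           # simulated noisy curve

fit_local  <- lowess(x, y, f = 0.2)          # small span: follows local wiggles
fit_global <- lowess(x, y, f = 0.8)          # large span: much smoother

plot(x, y, col = "gray", main = "lowess with f = 0.2 and f = 0.8")
lines(fit_local, lwd = 3)                    # lowess returns a list with x and y
lines(fit_global, lwd = 3, lty = 2)
```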
TOOL : Bivariate Density Estimates
> library(sm)
> attach(SLID)
> par(mfrow = c(1, 1))
> valid <- complete.cases(wages, education)
> sm.density(cbind(education[valid], wages[valid]), display = "image",
+     col = gray(seq(1, 0, length = 100)))
Warning: weights overwritten by binning
> points(jitter(education, amount = 0.25), wages, cex = 0.15)
> box()
> lines(lowess(education[valid], wages[valid], f = 1/3), lwd = 3)
> remove(valid)
TOOL : Parallel Box Plots
> attach(Ornstein)
> boxplot(interlocks ~ nation)
TOOL : Scatterplot Matrices
> attach(Prestige)
> scatterplot.matrix(cbind(prestige, income, education, women),
+     diagonal = "density", span = 0.75)
TOOL : Conditional Plots
> attach(SLID)
> coplot(log(wages) ~ education | age + sex, panel = panel.car,
+     col = gray(0.5), lwd = 3, cex = 0.4)
TOOL : Box-Cox Transformation
> summary(box.cox.powers(income))
Box-Cox Transformation to Normality

 Est.Power Std.Err. Wald(Power=0) Wald(Power=1)
    0.1793   0.1108        1.6179       -7.4062

L.R. test, power = 0:  2.7103   df = 1   p = 0.0997
L.R. test, power = 1:  47.261   df = 1   p = 0
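For reference, the Box-Cox family behind this output maps y to (y^lambda - 1) / lambda, with log(y) as the limiting case at lambda = 0. The output above tests power = 0 (log) and power = 1 (no transformation); the log is not rejected at the 5% level, while leaving income untransformed is strongly rejected. A minimal sketch of the transformation itself, where box_cox is my own helper name and not a function from the book:

```r
# Hypothetical helper: Box-Cox power transformation for positive data.
box_cox <- function(y, lambda) {
  stopifnot(all(y > 0))               # the family is defined for positive values
  if (abs(lambda) < 1e-8) {
    log(y)                            # limiting case as lambda -> 0
  } else {
    (y^lambda - 1) / lambda
  }
}

y <- c(1, 4, 9, 16)
sqrt_like <- box_cox(y, 0.5)          # power 0.5: a shifted, rescaled square root
log_like  <- box_cox(y, 0)            # power 0: natural log
```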
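Two additional tools that I learnt about are:
- The bulging rule, which uses the direction in which a scatterplot bulges to decide whether to move x or y up or down the ladder of powers.
- The spread-level plot, which plots the log IQR against the log median across groups to check whether spread grows with level.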
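The spread-level idea can be sketched by hand in base R. This is only an illustration under my own assumptions: the chickwts data set stands in for a grouped variable such as interlocks by nation, and the IQR is used in place of the hinge spread. The slope b of the fitted line suggests a variance-stabilizing power of 1 - b.

```r
# Stand-in grouped data from base R: chick weight by feed type.
med    <- tapply(chickwts$weight, chickwts$feed, median)  # level of each group
spread <- tapply(chickwts$weight, chickwts$feed, IQR)     # spread of each group

fit <- lm(log(spread) ~ log(med))      # line through the spread-level plot
b   <- unname(coef(fit)[2])            # slope of log spread on log level
suggested_power <- 1 - b               # power suggested by the 1 - b rule

plot(log(med), log(spread),
     xlab = "log median", ylab = "log IQR", main = "Spread-level plot")
abline(fit)
```

A slope near 0 suggests a power near 1, meaning no transformation is needed; a slope near 1 points toward the log.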