I have always wondered about the binning criterion in histograms, a basic tool used widely by data analysts. By default, most of the available software offers Sturges' rule and Scott's rule, which say that the bin width chosen should be

image

image 
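For reference, the two rules are usually stated in the form below. This is a sketch in common textbook notation, and the symbols are assumptions rather than anything taken from the formulas pictured above: $n$ is the sample size, $R$ the sample range, $\hat{\sigma}$ the sample standard deviation, $k$ the number of bins and $h$ the bin width.

```latex
% Sturges' rule: number of bins, and the bin width it implies
k = 1 + \log_2 n, \qquad h_{\text{Sturges}} = \frac{R}{1 + \log_2 n}

% Scott's rule: bin width
h_{\text{Scott}} = 3.49\,\hat{\sigma}\,n^{-1/3}
```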

There must be a derivation in some book, but I had never bothered to look it up until last week, when I had to for some reason.

I found a nice explanation by David Scott, and it's well worth knowing the logic behind these simple rules. Why? Because it helps in appreciating kernel density estimation, which takes care of the limitations of standard histogram plots.

However, all these rules should be used as diagnostic tools, and even then with a pound of salt. Histograms can be deceptive, as one can see with the following data.

image

The above data is from a Gaussian random variable, and hence the histogram plots are OK. But once we take a mixture of normals, the default histogram fails. In the following, data from a mixture of normals is used.

image 

As one can see from the above figure, one can hardly get the true picture of the bimodal nature of the data. One of the other binning methods used is the one proposed by Hardle, Muller, Sperlich and Werwatz (2003):

image
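To make the Gaussian-versus-mixture comparison concrete, here is a minimal Python sketch that computes the Sturges and Scott bin widths and the resulting histogram counts for both kinds of data. It is not the code behind the figures in this post. The function names, the random seed, the sample sizes and the mixture parameters are all assumptions chosen for illustration, and the Scott constant 3.49 is the usual textbook value.

```python
import numpy as np


def sturges_width(x):
    """Bin width implied by Sturges' rule: range / (1 + log2(n))."""
    x = np.asarray(x, dtype=float)
    return (x.max() - x.min()) / (1.0 + np.log2(x.size))


def scott_width(x):
    """Bin width from Scott's rule: 3.49 * sample std * n^(-1/3)."""
    x = np.asarray(x, dtype=float)
    return 3.49 * x.std(ddof=1) * x.size ** (-1.0 / 3.0)


def histogram_with_width(x, width):
    """Histogram over [min, max] using equal bins no wider than `width`.

    Assumes non-degenerate data, i.e. width > 0.
    """
    x = np.asarray(x, dtype=float)
    n_bins = max(1, int(np.ceil((x.max() - x.min()) / width)))
    return np.histogram(x, bins=n_bins)


if __name__ == "__main__":
    rng = np.random.default_rng(0)  # fixed seed, for reproducibility only
    gaussian = rng.normal(0.0, 1.0, size=1000)
    # Hypothetical mixture: the means and spreads below are assumed values.
    mixture = np.concatenate([
        rng.normal(-2.0, 1.0, size=500),
        rng.normal(2.0, 0.5, size=500),
    ])
    for name, data in [("gaussian", gaussian), ("mixture", mixture)]:
        for rule in (sturges_width, scott_width):
            width = rule(data)
            counts, _ = histogram_with_width(data, width)
            print(f"{name:8s} {rule.__name__:13s} "
                  f"width={width:.3f} bins={counts.size}")
            print("  counts:", counts.tolist())
```

NumPy's `np.histogram` also accepts `bins="sturges"` and `bins="scott"` directly. Those built-in estimators may round the number of bins slightly differently from this sketch, so the counts need not match exactly.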

Anyway, I have diverged a long way from the intent of the post, which is the derivation of the simple Sturges' rule and Scott's rule.