Bumping

Classification trees fail miserably in some cases, and in such situations bumping can be a good method. A stylized example of bumping is as follows: imagine two covariates, x1 and x2, where the true class label depends on the XOR of the two covariates. The orange labels represent one class and the blue labels the other. Any plain-vanilla classification algorithm that makes greedy binary splits will fail here, because no single split on x1 or x2 improves class purity, so the greedy search never gets off the ground. Bumping gets around this by fitting the model to many bootstrap samples of the training data and keeping the single fit that performs best on the original training set; the perturbed samples give the greedy search a chance to stumble onto a good first split, as in the sketch below.
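Here is a minimal sketch of bumping on XOR-style data, using scikit-learn's DecisionTreeClassifier; the data-generating process, sample size, and number of bootstrap draws are illustrative choices, not from the source.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)

# Two covariates; the true class is the XOR of their signs.
X = rng.uniform(-1, 1, size=(400, 2))
y = (X[:, 0] * X[:, 1] < 0).astype(int)

# Plain greedy tree: on XOR data no single axis-aligned split reduces
# impurity, so the first (greedy) split is essentially arbitrary.
plain = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)
print("plain tree accuracy:", plain.score(X, y))

# Bumping: refit on bootstrap resamples and keep the model that scores
# best on the ORIGINAL training data (the original fit is a candidate).
best_model, best_score = plain, plain.score(X, y)
for b in range(25):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap indices
    candidate = DecisionTreeClassifier(max_depth=2, random_state=0)
    candidate.fit(X[idx], y[idx])
    score = candidate.score(X, y)                # judge on original data
    if score > best_score:
        best_model, best_score = candidate, score

print("bumped tree accuracy:", best_score)
```

With a depth-2 tree the plain fit typically hovers near chance, while one of the bootstrap fits usually finds a split near zero on x1 or x2 and recovers the XOR structure.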

Dynamic Linear Models with R: Summary

Link: Detailed Summary of the book. Takeaway: the dlm package in R is one of the best open-source resources available for DLM inference, and the fact that one of the book's authors is also a contributor to the package makes the book especially apt for practitioners. However, the book is best understood with a working knowledge of Bayesian inference. By learning to think in the state-space framework, a modeler gains many more options for modeling univariate or multivariate time series data.

Volatility understanding – Reality check

Taleb and Goldstein asked the following question of about 87 people, including portfolio managers, Ivy League graduates, and investment professionals: A stock (or a fund) has an average return of 0%. It moves on average 1% a day in absolute value; the average up move is 1% and the average down move is 1%. That does not mean that all up moves are 1%; some are .6%, others 1.45%, and so forth. Assuming the returns are Gaussian, what is the stock's standard deviation? Most respondents answered 1%, but 1% is the mean absolute deviation, not the standard deviation: for a Gaussian, the standard deviation is √(π/2) ≈ 1.25 times the mean absolute move, so the correct answer is roughly 1.25%. Even seasoned professionals conflate the two measures of volatility.
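The Gaussian fact behind the answer is easy to verify by simulation. A minimal numpy sketch (the 1% figure mirrors the question; everything else is illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Daily returns from a Gaussian with a 1% standard deviation.
returns = rng.normal(loc=0.0, scale=0.01, size=1_000_000)

mad = np.abs(returns).mean()    # mean absolute deviation (the "1% a day")
sigma = returns.std()           # standard deviation

print(f"mean absolute move: {mad:.4%}")          # ~0.80%, i.e. sigma * sqrt(2/pi)
print(f"standard deviation: {sigma:.4%}")        # ~1.00%
print(f"ratio sigma / MAD:  {sigma / mad:.3f}")  # ~1.253 = sqrt(pi/2)
```

Reading the question the other way around: a stock that moves 1% a day on average in absolute value has a standard deviation of about 1.25%, not 1%.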

More accurate estimate == Poor classification

Jerome Friedman's paper, "On bias, variance, 0/1-loss, and the curse-of-dimensionality", provides great insight into how classification errors work. The paper shows how bias and variance conspire to make some highly biased methods perform well on test data: Naive Bayes works, k-nearest neighbors works, and so do many other classifiers that are highly biased. Friedman works out the actual math behind classification error and shows that the additive bias-variance decomposition that holds for estimation error does not carry over to classification error; under 0/1 loss, what matters is whether the estimated class probability lands on the correct side of the decision boundary, not how close it is to the truth, as the toy example below illustrates.
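A toy simulation (a construction for illustration, not taken from the paper) makes the point concrete: an estimate of P(y = 1 | x) that is biased toward 1/2 but stable can classify better than an unbiased but noisy one, even though its squared estimation error is worse. The two estimator distributions below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

p_true = 0.9   # true P(y=1|x); the Bayes rule predicts class 1

# Estimator A: badly biased toward 1/2 (mean 0.6) but very stable.
est_a = rng.normal(loc=0.6, scale=0.05, size=200_000)
# Estimator B: unbiased (mean 0.9) but noisy.
est_b = rng.normal(loc=0.9, scale=0.25, size=200_000)

for name, est in [("biased, stable ", est_a), ("unbiased, noisy", est_b)]:
    mse = np.mean((est - p_true) ** 2)   # estimation error
    miss = np.mean(est < 0.5)            # estimate crosses 0.5 -> wrong class
    print(f"{name}  MSE = {mse:.4f}   misclassification rate = {miss:.4f}")
```

Estimator B wins on mean squared error (about 0.06 versus 0.09) yet loses on classification (about 5.5% versus 2.3% misclassified), because its noise occasionally pushes the estimate across the 0.5 boundary.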