2009 KDD Cup entry – Model Description
Key Steps :
- Data Import - Used SPSS to read the data; R was not used for this step
- Feature Selection - Used R in this step
- Data Cleaning - Treatment of categorical variables was a problem
Software used : SAS + R
Techniques used : Gradient Boosting Machine (gbm package)
Rationale :
- Handling of missing values
- Robustness against extreme values
- Handling of both categorical and continuous variables
- Models interactions between predictors
- Can model nonlinear dependencies
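The last point in the rationale can be illustrated with a minimal sketch of gradient boosting under squared-error loss. It is written in plain Python rather than R, does not use the gbm package, and is not the model from this entry. The function names (fit_stump, fit_gbm, predict_gbm), the toy data, and the settings n_trees=200 and shrinkage=0.1 are all assumptions chosen for illustration. Each round fits a one-split tree to the current residuals and adds a shrunken copy of it to the ensemble, which is how a sum of very simple trees can track a nonlinear target.

```python
# Minimal gradient boosting with one-split regression trees (stumps).
# Illustrative assumption only: not the gbm package, not the entry's model.

def fit_stump(xs, residuals):
    """Find the single threshold on x that minimises squared error."""
    best = None
    order = sorted(set(xs))
    for lo, hi in zip(order, order[1:]):
        threshold = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return threshold, left_mean, right_mean


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return left_mean if x <= threshold else right_mean


def fit_gbm(xs, ys, n_trees=200, shrinkage=0.1):
    """Boost stumps on residuals (the negative gradient of squared error)."""
    base = sum(ys) / len(ys)          # start from the mean response
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_trees):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + shrinkage * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, shrinkage, stumps


def predict_gbm(model, x):
    base, shrinkage, stumps = model
    return base + shrinkage * sum(predict_stump(s, x) for s in stumps)


def mse(model, xs, ys):
    return sum((predict_gbm(model, x) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)


# Toy nonlinear target (hypothetical data): y = x^2 on [-3, 3].
xs = [i / 10.0 for i in range(-30, 31)]
ys = [x * x for x in xs]
mean_y = sum(ys) / len(ys)
baseline_mse = sum((y - mean_y) ** 2 for y in ys) / len(ys)  # constant model
model = fit_gbm(xs, ys)
```

Stumps on their own give an additive model, so this sketch shows only the nonlinear fit. Capturing interactions between predictors requires trees with more than one split each.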
Fitting Time : A couple of hours on a desktop