Here is a write-up by Hal R. Varian about the deluge of data that makes it imperative for econometricians to equip themselves with “big data” skills. Some of the main points of the write-up are:

  • Machine learning techniques such as decision trees, support vector machines, neural nets, and deep learning may allow for more effective ways to model complex relationships.

  • Tools such as OpenRefine and Data Wrangler can assist with data cleaning.

  • Data analysis in statistics and econometrics can be broken down into four categories: 1) prediction, 2) summarization, 3) estimation, and 4) hypothesis testing. Machine learning is concerned primarily with prediction; the closely related field of data mining is also concerned with summarization, particularly with finding interesting patterns in the data. Econometricians, statisticians, and data mining specialists are generally looking for insights that can be extracted from the data. Machine learning specialists are often primarily concerned with developing high-performance computer systems that can provide useful predictions in the presence of challenging computational constraints. Data science, a somewhat newer term, is concerned with both prediction and summarization, but also with data manipulation, visualization, and other similar tasks.

  • When confronted with a prediction problem, an economist would think immediately of a linear or logistic regression. However, there may be better choices, particularly if a lot of data is available. These include nonlinear methods such as 1) classification and regression trees (CART), 2) random forests, and 3) penalized regression such as LASSO, LARS, and elastic nets. (There are also other techniques such as neural nets, deep learning, and support vector machines, which Varian does not cover in the review.)
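
To make the contrast concrete, here is a minimal sketch, using scikit-learn on simulated data (none of this comes from Varian's paper), of fitting OLS, a cross-validated LASSO, and a regression tree on the same prediction problem:

```python
import numpy as np
from sklearn.linear_model import LinearRegression, LassoCV
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
# Nonlinear target that depends on only a couple of the predictors
y = np.where(X[:, 0] > 0, 3.0, -3.0) + X[:, 1] ** 2 + rng.normal(size=2000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for name, model in [
    ("OLS", LinearRegression()),
    ("LASSO (CV-chosen penalty)", LassoCV(cv=5)),
    ("Regression tree (CART)", DecisionTreeRegressor(max_depth=5, random_state=0)),
]:
    model.fit(X_tr, y_tr)
    print(name, round(r2_score(y_te, model.predict(X_te)), 3))
```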

  • The test-train cycle and cross-validation are very commonly used in machine learning and, in Varian's view, should be used much more in economics, particularly when working with large datasets. For many years, economists have reported in-sample goodness-of-fit measures with the excuse that we had small datasets. But now that larger datasets have become available, there is no reason not to use separate training and testing sets.
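
A minimal sketch of holding out a test set and running k-fold cross-validation with scikit-learn (simulated data, illustrative only):

```python
import numpy as np
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=1000) > 0).astype(int)

# Hold out a test set; fit on the training set only
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
clf = LogisticRegression().fit(X_tr, y_tr)
print("out-of-sample accuracy:", clf.score(X_te, y_te))

# 10-fold cross-validation on the training data (e.g., for model tuning)
print("CV accuracy:", cross_val_score(LogisticRegression(), X_tr, y_tr, cv=10).mean())
```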

  • An example is given where plain-vanilla logistic regression fails whereas trees work. There are many other approaches to creating trees, including some that are explicitly statistical in nature. For example, a “conditional inference tree,” or ctree for short, chooses the structure of the tree using a sequence of hypothesis tests. The resulting trees tend to need very little pruning.
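
A toy illustration of that failure mode (not Varian's actual example): when the outcome depends on an interaction of features, a plain logit sits near chance while a tree recovers the rule. Note that ctree itself lives in R's party/partykit packages; the sketch below uses an ordinary scikit-learn decision tree.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(2)
X = rng.uniform(-1, 1, size=(2000, 2))
# XOR-style rule: the class depends on the interaction of the two features
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=2)
print("logit:", LogisticRegression().fit(X_tr, y_tr).score(X_te, y_te))          # near chance
print("tree: ", DecisionTreeClassifier(random_state=2).fit(X_tr, y_tr).score(X_te, y_te))  # near perfect
```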

  • Econometricians are well acquainted with the bootstrap but rarely use the two related techniques of bagging and boosting. Bagging is primarily useful for nonlinear models such as trees. Boosting tends to improve the predictive performance of an estimator significantly and can be used for pretty much any kind of classifier or regression model, including logits, probits, trees, and so on. It is also possible to combine these techniques and create a “forest” of trees that can often significantly improve on single-tree methods.
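
A minimal sketch of bagging and boosting trees with scikit-learn, again on simulated data; gradient boosting stands in here for the broader boosting family Varian describes:

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import BaggingClassifier, GradientBoostingClassifier

X, y = make_classification(n_samples=3000, n_features=20, n_informative=5, random_state=3)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=3)

single = DecisionTreeClassifier(random_state=3).fit(X_tr, y_tr)
# Bagging: average many trees fit on bootstrap resamples of the training data
bagged = BaggingClassifier(DecisionTreeClassifier(), n_estimators=200, random_state=3).fit(X_tr, y_tr)
# Boosting: fit trees sequentially, each one focusing on the errors of the previous fit
boosted = GradientBoostingClassifier(random_state=3).fit(X_tr, y_tr)

for name, m in [("single tree", single), ("bagged trees", bagged), ("boosted trees", boosted)]:
    print(name, m.score(X_te, y_te))
```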

  • Ensembles of decision trees (often known as Random Forests) have been the most successful general-purpose algorithm in modern times.
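
A short random-forest sketch in the same vein; its feature importances are a by-product that can serve as a rough variable-screening device (simulated data, illustrative only):

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=2000, n_features=15, n_informative=4, noise=10.0, random_state=4)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=4)

rf = RandomForestRegressor(n_estimators=300, random_state=4).fit(X_tr, y_tr)
print("out-of-sample R^2:", rf.score(X_te, y_te))
# Rough ranking of which predictors the forest leaned on
print(sorted(enumerate(rf.feature_importances_), key=lambda t: -t[1])[:5])
```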

  • Elastic net regression, a combination of lasso and ridge regression, can be used for more robust fits.
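
A minimal elastic net sketch with scikit-learn, where both the penalty strength and the lasso/ridge mix are chosen by cross-validation (simulated data with many correlated regressors):

```python
import numpy as np
from sklearn.linear_model import ElasticNetCV
from sklearn.datasets import make_regression

# Many correlated regressors, only a handful of them truly relevant
X, y = make_regression(n_samples=500, n_features=100, n_informative=10,
                       effective_rank=20, noise=5.0, random_state=5)

# l1_ratio interpolates between ridge (0) and lasso (1); it and the penalty
# strength alpha are both chosen by cross-validation
enet = ElasticNetCV(l1_ratio=[0.1, 0.5, 0.9, 1.0], cv=5).fit(X, y)
print("chosen l1_ratio:", enet.l1_ratio_, "alpha:", enet.alpha_)
print("non-zero coefficients:", np.sum(enet.coef_ != 0))
```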

  • Another approach to variable selection that is novel to most economists is spike-and-slab regression, a Bayesian technique.
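
In one common formulation, each coefficient gets a two-component mixture prior: a point mass at zero (the “spike”) and a diffuse normal (the “slab”), with a Bernoulli inclusion indicator whose posterior gives the probability that the variable belongs in the model. Roughly:

```latex
\beta_j \mid \gamma_j \;\sim\; \gamma_j \,\mathcal{N}(0, \tau^2) \;+\; (1 - \gamma_j)\,\delta_0,
\qquad \gamma_j \sim \mathrm{Bernoulli}(\pi)
```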

  • Bayesian Structural Time Series (BSTS) models seem to work well for variable selection problems in time-series applications.
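
In its simplest local-level form, the model combines an unobserved state with a regression component whose coefficients get a spike-and-slab prior (a sketch, not the full specification used in the papers):

```latex
y_t = \mu_t + \beta^\top x_t + \varepsilon_t, \qquad \varepsilon_t \sim \mathcal{N}(0, \sigma_\varepsilon^2)
\\
\mu_{t+1} = \mu_t + \eta_t, \qquad \eta_t \sim \mathcal{N}(0, \sigma_\eta^2)
```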

  • “Bayesian variable selection for nowcasting economic time series” (Scott and Varian): a paper that shows how machine learning techniques can be used by econometricians.

  • Machine learning researchers have a lot to learn from econometricians in the area of causal inference. Econometricians have developed several tools for causal inference, such as instrumental variables, regression discontinuity, difference-in-differences, and various forms of natural and designed experiments.
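
As a reminder of what one of these tools looks like in code, here is a minimal difference-in-differences sketch with statsmodels on made-up data (all variable names are invented for the illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(6)
n = 4000
df = pd.DataFrame({
    "treated": rng.integers(0, 2, n),   # 1 = treatment group
    "post": rng.integers(0, 2, n),      # 1 = after the intervention
})
true_effect = 2.0
df["y"] = (1.0 + 0.5 * df.treated + 1.5 * df.post
           + true_effect * df.treated * df.post + rng.normal(size=n))

# The coefficient on treated:post is the difference-in-differences estimate
did = smf.ols("y ~ treated * post", data=df).fit()
print(did.params["treated:post"])
```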

  • How can one understand the effect of an ad campaign without a control group? The paper explains how to do it with a BSTS model: forecast what the outcome would have been without the campaign and compare that counterfactual against what actually happened.
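
Heavily simplified, with plain OLS standing in for the BSTS model and invented series names, the mechanics look like this: fit the outcome on related predictor series using pre-campaign data only, forecast the counterfactual path, and read the lift off the gap.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
T, campaign_start = 200, 150
# Predictor series, e.g. activity in comparable markets not exposed to the campaign
controls = rng.normal(size=(T, 3)).cumsum(axis=0)
sales = controls.sum(axis=1) + rng.normal(scale=0.5, size=T)
sales[campaign_start:] += 5.0            # true lift from the campaign

# Fit on pre-campaign data only, then forecast the counterfactual path
model = LinearRegression().fit(controls[:campaign_start], sales[:campaign_start])
counterfactual = model.predict(controls[campaign_start:])
print("estimated lift:", (sales[campaign_start:] - counterfactual).mean())
```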

  • An important insight from machine learning is that averaging over many small models tends to give better out-of-sample prediction than choosing a single model.
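
A toy sketch of the mechanics of averaging (simulated data; whether the blend actually wins depends on the models and the problem): fit a few different models, then compare each one's out-of-sample error with the error of their equal-weight average.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split
from sklearn.linear_model import Ridge, Lasso
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

X, y = make_regression(n_samples=1500, n_features=30, n_informative=8, noise=20.0, random_state=8)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=8)

models = [Ridge(), Lasso(), DecisionTreeRegressor(max_depth=6, random_state=8)]
preds = [m.fit(X_tr, y_tr).predict(X_te) for m in models]

for m, p in zip(models, preds):
    print(type(m).__name__, mean_squared_error(y_te, p))
# Simple equal-weight average of the three prediction vectors
print("average", mean_squared_error(y_te, np.mean(preds, axis=0)))
```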

  • In 2006, Netflix offered a million dollar prize to researchers who could provide the largest improvement to their existing movie recommendation system. The winning submission involved “a complex blending of no fewer than 800 models,” though they also point out that “predictions of good quality can usually be obtained by combining a small number of judiciously chosen methods.” It also turned out that a blend of the best and second-best submissions outperformed both of them.

  • Read up on everything about Bayesian model averaging. It is going to play a critical role in the times to come.

  • In this period of “big data” it seems strange to focus on sampling uncertainty, which tends to be small with large datasets, while completely ignoring model uncertainty, which may be quite large.

  • Books/papers/online material suggested by Hal R. Varian:

  • Pedro Domingos. A Few Useful Things to Know About Machine Learning.

  • Graham Williams. Data Mining with Rattle and R.

  • Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning (ESL).

  • Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. An Introduction to Statistical Learning (ISL).

  • Xindong Wu and Vipin Kumar, editors. The Top Ten Algorithms in Data Mining.

  • Bill Howe. Introduction to Data Science.

  • Jeff Leek videos on Coursera

  • Venables and Ripley. Modern Applied Statistics with S.

  • Kevin Murphy. Machine Learning: A Probabilistic Perspective.