Why econometricians need to learn new tricks

Here is a write-up by Hal R. Varian that talks about the deluge of data that makes it imperative for econometricians to equip themselves with "big data" skills. Some of the main points of the write-up are:

- Machine learning techniques such as decision trees, support vector machines, neural nets, deep learning and so on may allow for more effective ways to model complex relationships (see the sketch after this list)
- Data cleaning tools such as OpenRefine and Data Wrangler can be used to assist in data cleansing
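As a toy illustration of the first point, here is a minimal Python sketch, assuming scikit-learn is available: a shallow decision tree picks up a step-shaped relationship that a straight line would model poorly. The data is simulated purely for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Simulated data: a step function plus noise, a relationship a
# linear model would smear out but a tree can split on cleanly.
rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=(500, 1))
y = np.sign(x[:, 0]) + 0.1 * rng.standard_normal(500)

tree = DecisionTreeRegressor(max_depth=2).fit(x, y)
print(tree.predict([[-1.0], [1.0]]))  # roughly [-1, 1]
```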

A Mind for Numbers

This book is mainly targeted at high school and college kids who feel their learning efforts are not paying off, teachers who are on the lookout for effective instruction techniques, and parents who are concerned about their child's academic results and want to do something about it. The author of the book, Dr. Barbara Oakley, has an interesting background: she served in the US Army as a language translator before transitioning to academia.

Matrix Algebra: Theory, Computations, and Applications in Statistics

We often come across mathematical expressions represented via matrices and assume that the numerical calculations happen exactly the way the expressions appear. Take for example $\hat{\beta} = (X^T X)^{-1} X^T y$, the well-known "normal equations" solution for the regression coefficients. One might look at this expression and conclude that the code that computes $\hat{\beta}$ inverts the Gramian matrix $X^T X$ and then multiplies the inverse with $X^T y$. Totally false. Why? The condition number of the Gramian matrix $X^T X$ equals the square of the condition number of $X$, so explicitly forming and inverting it needlessly amplifies rounding error; well-written least-squares routines instead work with a factorization of $X$ itself.
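A minimal NumPy sketch of the point, using a made-up design matrix with two nearly collinear columns: the Gramian's condition number is the square of that of $X$, and a library least-squares routine avoids forming $X^T X$ altogether.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100
x = rng.standard_normal(n)
# Hypothetical design matrix whose last two columns are nearly collinear.
X = np.column_stack([np.ones(n), x, x + 1e-4 * rng.standard_normal(n)])
y = X @ np.array([1.0, 2.0, 3.0]) + 0.01 * rng.standard_normal(n)

# cond(X^T X) equals cond(X) squared (in the 2-norm).
print(np.linalg.cond(X))
print(np.linalg.cond(X.T @ X))

# Naive route suggested by the formula: invert the Gramian. Fragile.
beta_naive = np.linalg.inv(X.T @ X) @ (X.T @ y)

# What a library routine does instead: solve the least-squares problem
# from a factorization of X (lstsq calls an SVD-based LAPACK routine),
# never forming X^T X.
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_naive, beta_ls)
```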

Axler revisited

I was looking for something in my old stack of books when I stumbled onto Sheldon Axler's fantastic book, "Linear Algebra Done Right". I have fond memories of the book; I think the last time I referred to it was more than 3.5 years ago. It took a few hours to go over the book again. Like wine that tastes better with age, some books seem to improve on revisiting, at least for me.

Curse of Dimensionality

Our intuition does not serve us well in high-dimensional spaces. Hence there are a few issues with using nearest-neighbor methods on high-dimensional data. Firstly, methods that capture a fixed-size neighborhood around each point give a high-variance fit, because in high dimensions such a neighborhood contains very few training points. Secondly, if you relax the fixed-neighborhood criterion and instead capture a specific number of neighbors, the methods are no longer local, because those neighbors can lie very far from the query point. Hence it pays to think through these issues on whatever dataset you are working on.
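A small simulation, assuming points drawn uniformly in the unit hypercube, makes the effect concrete: as the dimension grows, the nearest neighbor of a query point at the origin ends up nearly as far away as a typical point.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 1000
for d in (1, 2, 10, 100):
    pts = rng.uniform(size=(n, d))        # n points in the unit hypercube
    dists = np.linalg.norm(pts, axis=1)   # distances from the origin (the query)
    # In high dimensions the minimum distance approaches the median distance,
    # so any neighborhood that contains data at all is no longer "local".
    print(d, round(dists.min(), 2), round(np.median(dists), 2))
```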