book_cover

This book is a great quick read that highlights various aspects of Neural Network components. There are three parts to the book. The first part of the book takes the reader through the various components of a Neural Network. The second part of the book equips the reader to write a simple piece of python code from scratch(not using any off-the-shelf library) that sets up a neural network, trains the neural network and test the structure for a certain set of test samples. The third part of the book adds some fun exercises that test a specific neural network.

Deep Learning as an algo and framework has become extremely popular in the last few years. In order to understand Deep Learning, a basic familiarly of Neural Networks in its original form is useful. This book acts as a good entry point for any ML newbie who wants to learn Neural Networks. Even though there are few equations mentioned in the first part of the book, the details of every component of the equation are explained in such a way that you just need to know high school math to understand the contents of the book.

The author starts with a very simple example of estimating a conversion function that takes input in kilometers and gives the output in miles. Although you might know the exact conversion between the two scales, there is another interesting way to go about solving this problem. If you know the exact value of the input in miles, then you can guess the conversion function and then check the extent of error between the function estimate and the true value. Based on the magnitude and direction of the error, one can adjust the conversion function in such a way that the output of the function is as close to the true value as possible. The way you tweak the various parameters in the conversion function depends on the derivative of the conversion function with respect to the various parameters.

In an Neural Network retraining, one needs to compute the partial derivative of the error with respect to a weight parameter and then use that value to tweak the weight parameter. Why should one do that ? What’s partial derivative of error got to do with weight parameter updates ? One feature that is particularly explained well in the book is the fact that there is a need for delayed updates for weight parameters. If you update parameters after every training sample, then the weight parameters are going to fit only the last training sample very well and the whole model might perform badly for all the other samples. By using learning rate as a hyperparameter, one can control the amount of weight updates. The rationale for something more than linear classifier is illustrating via classifying the output of XOR operator.

The analogy between water in a cup and activation function is very interesting. Observations suggest that neurons don’t react readily, but instead suppress the input until it has grown so large that it triggers an output. The idea of connecting the threshold with an activation function is a nice way to understand the activation functions. The idea that there are two functions that are being used on every node, one is the summation operator and the second is the activation function is illustrated via nice visuals. Even a cursory understanding of Neural Networks requires at least a basic knowledge of Matrices. The author does a great job of demonstrating the purpose of matrices, i.e. easy representation of the NN model as well as efficient computation of forward propagation and back propagation

The idea that the partial derivative is useful in tweaking the weight parameters is extremely powerful. One can visualize by thinking of a person trying to reach to the lowest point of a topsy turvy surface. There are many ways to reach the lowest point on the surface. In the process of reaching the lowest point, the gradient of the surface in several directions is essential as it will guide the person in moving in the right direction. The gradient gets smaller as you reach the minimum point. This basic calculus idea is illustrated via a set of rich visuals. Any person who has no understanding of partial derivatives will still be able to understand by going through the visuals.

The second part of the book takes the reader through a Python class that can be used to train and test a Neural Network structure. Each line of the code is explained in such a way that a Python newbie will have no problem in understanding the various components of the code.

The third part of the book invites the readers to test Neural network digit classification on a sample generated from their own handwriting. For fun, you can write down a few digits on a piece of paper and then check whether a NN network built trained based on x number of sample points can successfully recognize the digits in your handwriting.

The book shows a lot of visuals that show NN performance based on tweaking the hyperparameters such as learning rate, number of hidden layers etc. All of those visuals, I guess, will make any curious reader to explore this model further. In fact NN model in its new avatar, Deep Learning, is becoming a prerequisite skill for any Machine Learning engineer.