Image Compression & SVD

Image compression is a billion dollar industry. An optimal way to store and retrieve image data storage is the key. Though there are multiple algorithms that are usually used , one such algorithm that has found wide spread popularity is SVD( Singular Value Decomposition).

To explain SVD in simple words, let’s take an image

For illustration purpose, I have deliberately chosen two colors. Now let’s say you want to store this data. One way is to store in a matrix. Lets say that the above image can be fit in a 15 by 40 matrix with each cell carrying either a white or a black colour. Now if you were to store the above data in a matrix , it will be a whole lot of 0’s to represent black color and sparse 1’s to represent white color. Most of the images that we come across are of the same pattern. If you represent them in a data matrix , it will turn out to be a sparse matrix, meaning a matrix with whole lot of 0’s as its elements. Now if you want to store the above data, you will be storing 15*40 = 600 numbers. It might not sound a lot but if you pick up an image from the real world, the image might have 512 pixels in each row and 256 pixels in each column, thus having massive amount of data (2 power 17 entries). The example that I have chosen is for mere illustration purpose but the principle is applicable to any massive image data matrix too.

One needs an effective way of storing data than the naive m * n ( row times column) data points. SVD provides a technique to do so. Well, in simple words, SVD breaks the data matrix in to sets of rank 1 matrices . The image quality improves as you consider more and more rank 1 matrices. Each rank one matrix needs a storage of m+n data points , meaning in the above case , 15+ 40 = 55 data points.

So, the more rank 1 matrices you want to use, the better the clarity of the picture. The beauty of this method is that , for sparse matrices, a few rank1 matrices usually do a good job of representing the image.

If I were to do SVD of the above matrix (15 rows*40 columns )and choose let’s say the first three rank 1 matrices, the image would look like this

Not good.Image looks very blurred.Let’s consider the first 6 rank 1 matrices ?

Looks better. What if we consider the first 8 rank 1 matrices ?

Looks much better. You can stop here OR based on the requirement, you can include a few more to get

At this stage, you have retrieved the image with total clarity and at the same time reduced the data storage for the image by a LOT. So, basically instead of storing 600 data points, depending upon the context, one can store far lesser data points and regenerate the image accordingly.

SVD has widespread applications in cryptography too. BTW, SVD is a crucial piece in Google’s page rank algorithm.

My particular interest in SVD stems from the fact that it is an incredibly useful tool in statistics. There are multiple areas where one can use SVD. But let me mention one of the most commonly known area, linear regression. If you have a linear model Xb = Y , the solution to b has a term (X*X) inverse. You cannot find X*X inverse by the usual methods as it is numerically very unstable. One uses SVD to get around these numerical issues and quickly gets to know the linear model coefficients. Recently I came across an intra day arb model where SVD is heavily used.

An SVD trivia – Netflix $1 million dollar prize for improving their recommendation algo , had seen many participants using SVD to improve their algo results.