
If one rips apart a computer and looks at its innards, one sees a coalescence of beautiful ideas. The modern computer is built from electronics, but the ideas behind its design have nothing to do with electronics: the same basic design can be built from valves, water pipes, and so on. The principles are the essence of what makes computers compute.

This book introduces some of the main ideas of computer science, such as Boolean logic, finite-state machines, programming languages, compilers and interpreters, Turing universality, information theory, algorithms and algorithmic complexity, heuristics, uncomputable functions, parallel computing, quantum computing, neural networks, machine learning, and self-organizing systems.

Who might be the target audience of the book? I guess it would appeal to someone who has a passing familiarity with the main ideas of computer science and wants to know a bit more, without being overwhelmed. If you have ever wanted to explain the key ideas of computing to a numerate high school kid in a way that is illuminating and at the same time makes him/her curious, and you are struggling to explain the ideas in simple words, then you might enjoy this book.

Nuts and Bolts

Recollect any algebraic equation you came across in high school. Now replace the variables with logic statements that are either true or false, replace the arithmetic operations with the relevant Boolean operators, and you get a Boolean algebraic equation. This sort of algebra was invented by George Boole. It was Claude Shannon who showed that one could build electronic circuits that mirror any Boolean expression. The implication of this construction is that any function capable of being described as a precise logical statement can be implemented by an analogous system of switches.
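
As a minimal illustration (mine, not the book's), here is a precise logical statement written as a Python function; its truth table, enumerated below, is exactly the specification that an equivalent circuit of switches would have to implement. The alarm scenario is a made-up example.

```python
from itertools import product

# A precise logical statement: "the alarm rings if the door is open
# AND (it is night OR the system is armed)".
def alarm(door_open: bool, night: bool, armed: bool) -> bool:
    return door_open and (night or armed)

# Enumerate the truth table: this table is the complete specification
# that an equivalent circuit of switches would have to implement.
for door_open, night, armed in product([False, True], repeat=3):
    print(door_open, night, armed, "->", alarm(door_open, night, armed))
```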

Two principles underlie any type of computer: 1) reducing a task to a set of logical functions and 2) implementing logical functions as a circuit of connected switches. To illustrate these two principles, the author narrates his experience of building a tic-tac-toe machine out of switches and bulbs. If you were to manually enumerate all the possible combinations of a two-player game, one of the best representations of all the moves is a decision tree. This decision tree helps one choose the right response to whatever move the opponent makes. Traversing the decision tree amounts to evaluating a set of Boolean expressions. Using switches in series or parallel, it is possible to create an automated response to a player's moves, and the author briefly describes the circuit he built using roughly 150 switches. So, all one needs to construct an automated tic-tac-toe game is switches, wires, bulbs and a way to construct logic gates. There are two crucial elements missing from the design. First, the circuit has no concept of events happening over time; therefore the entire sequence of the game must be determined in advance. Second, the circuit can perform only one function. There is no software in it.
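
A small sketch (not the author's actual circuit) of the series/parallel idea: switches in series behave like AND, switches in parallel behave like OR, and that is all a hard-wired responder needs to map a board position to a lit bulb. The "play centre" rule below is invented purely for illustration.

```python
# Each "switch" is just a boolean: closed (True) lets current through.
def series(*switches):
    """Switches in series: current flows only if ALL are closed (AND)."""
    return all(switches)

def parallel(*switches):
    """Switches in parallel: current flows if ANY is closed (OR)."""
    return any(switches)

# A tiny, hypothetical fragment of a hard-wired response:
# light the "play centre" bulb if the opponent opened in any corner
# and the centre square is still free.
corner_1, corner_2, corner_3, corner_4 = True, False, False, False
centre_free = True

play_centre_bulb = series(parallel(corner_1, corner_2, corner_3, corner_4),
                          centre_free)
print(play_centre_bulb)  # True: the bulb lights, the machine "responds"
```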

Early computers were made with mechanical components. One can represent AND, OR, NOR, etc. using mechanical contraptions that can then be used to perform calculations. Even in the 1960s, most arithmetic calculators were mechanical.

Building a computer out of any technology requires two components, i.e.

  • switches : the steering elements, which can combine multiple signals into one signal

  • connectors : carry signals between switches

The author uses hydraulic valves, Tinkertoys, mechanical contraptions, and electrical circuits to show the various ways in which Boolean logic can be implemented. The basic takeaway from this chapter is the principle of functional abstraction. Functional abstraction is fundamental to computer design-not the only way to design complicated systems, but the most common way. Computers are built on a hierarchy of such functional abstractions, each one embodied in a building block. Once someone implements 0/1 logic, you build on top of it. The blocks that perform functions are hooked together to implement more complex functions, and these collections of blocks in turn become the new building blocks for the next level.
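
A minimal sketch of functional abstraction in code (my own, not from the book): start from a single primitive block, here NAND, and build each new level only out of blocks from the level below, never reaching back into the underlying implementation.

```python
# One primitive block, implemented "physically" (here: in Python).
def nand(a: bool, b: bool) -> bool:
    return not (a and b)

# Each new level uses only blocks from the level below,
# never the underlying implementation.
def not_(a):      return nand(a, a)
def and_(a, b):   return not_(nand(a, b))
def or_(a, b):    return nand(not_(a), not_(b))
def xor(a, b):    return and_(or_(a, b), nand(a, b))

# The next level up treats xor/and_ as opaque building blocks in turn.
print(xor(True, False), xor(True, True))  # True False
```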

Universal Building Blocks

The author explains the basic mechanism behind translating any logical function into a circuit: break the logical function down into parts, then figure out which gates are needed to translate the various inputs into the appropriate outputs. To make this somewhat abstract principle concrete, the author comes up with a circuit design for a “Majority wins” logic block. He also explains how one could design a circuit for the “Rock-Paper-Scissors” game. The lessons from these simple circuits are then extrapolated to the general computer. Consider an operation like addition or multiplication; you can always represent it as a Boolean logic block. Most computers have logic blocks called arithmetic units that perform this job.
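
Here is a hedged sketch of a "Majority wins" block (the book's actual circuit diagram is not reproduced here): the output is true whenever at least two of the three inputs are true, which expands into the gate expression (A AND B) OR (B AND C) OR (A AND C).

```python
from itertools import product

def majority(a: bool, b: bool, c: bool) -> bool:
    # "Majority wins": true whenever at least two of the inputs are true.
    return (a and b) or (b and c) or (a and c)

# The truth table below is the complete specification for the gate circuit.
for a, b, c in product([False, True], repeat=3):
    print(int(a), int(b), int(c), "->", int(majority(a, b, c)))
```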

Hence the core idea behind making a computer perform subtraction, addition, multiplication, or any other computation is to write out the Boolean logic on a piece of paper and then implement it using logic gates.
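
To make this concrete, here is a hedged sketch of a one-bit full adder built purely from gates; its carry output happens to be exactly the majority function from the previous sketch, and chaining such adders gives multi-bit addition. This is a standard construction, not necessarily the one used in the book.

```python
def xor(a: bool, b: bool) -> bool:
    return a != b

def full_adder(a: bool, b: bool, carry_in: bool):
    """Add three bits; return (sum_bit, carry_out), built purely from gates."""
    sum_bit = xor(xor(a, b), carry_in)
    carry_out = (a and b) or (b and carry_in) or (a and carry_in)  # majority
    return sum_bit, carry_out

# Chain full adders to add two 4-bit numbers, least significant bit first.
def add_4bit(x_bits, y_bits):
    result, carry = [], False
    for a, b in zip(x_bits, y_bits):
        s, carry = full_adder(a, b, carry)
        result.append(s)
    return result, carry

print(add_4bit([True, False, True, False],   # 5 (LSB first)
               [True, True, False, False]))  # 3 (LSB first) -> 8
```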

The most important class of functions are time-varying functions, i.e. functions whose output depends on the previous history of inputs. These are handled via a finite-state machine. The author conveys the basic idea of a finite-state machine through examples such as a ballpoint pen, a combination lock, a tally counter, and an odometer. To store the state of a finite-state machine, one needs a device called a register. An n-bit register has n inputs and n outputs, plus an additional timing input that tells the register when to change state. Storing new information is called “writing” the state of the register. When the timing signal tells the register to write a new state, the register changes its state to match the inputs. The outputs of the register always indicate its current state. Registers can be implemented in many ways, one of which is to use a Boolean logic block to steer the state information around in a circle. This type of register is often used in electronic computers, which is why they lose track of what they’re doing if their power is interrupted.

A finite-state machine consists of a Boolean logic block connected to a register. The finite-state machine advances its state by writing the output of the Boolean logic block into the register; the logic block then computes the next state, based on the input and the current state. This next state is written into the register on the next cycle, and the process repeats in every cycle. The machine on which I am writing this blog post is a 1.7 GHz machine, i.e. it can change its state 1.7 billion times per second. To explain the details of a finite-state machine and its circuit implementation, the author uses familiar examples like traffic lights and a combination lock. These examples are superbly explained, and that’s the beauty of this book: simple examples are chosen to illustrate profound ideas.
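
A toy sketch of the register-plus-logic-block idea, assuming a simple three-state traffic light (green, yellow, red): the state variable stands in for the register, the next-state table stands in for the Boolean logic block, and each loop iteration is one timing cycle.

```python
# Next-state logic block: maps the current state (and, in general, the inputs)
# to the next state.
NEXT_STATE = {"green": "yellow", "yellow": "red", "red": "green"}

state = "green"            # the register holds the current state
for cycle in range(6):     # each iteration is one timing ("clock") cycle
    print(f"cycle {cycle}: light is {state}")
    state = NEXT_STATE[state]   # write the logic block's output back in
```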

Finite-state machines are powerful but limited. They cannot recognize many patterns that are common in our world. For instance, it is impossible to build a finite-state machine that will unlock a lock whenever you enter any palindrome. So we need something else besides logic gates and finite-state machines.

Programming

Boolean logic and the finite-state machine are the building blocks of computer hardware. The programming language is the building block of computer software. There are many programming languages, and if you learn one or two, you can quickly pick up the others. Having said that, writing code that performs something is one thing and writing effective code is a completely different thing; the latter takes many hours of deliberate effort. This is aptly put by the author,

Every computer language has its Shakespeares, and it is a joy to read their code. A well-written computer program possesses style, finesse, even humor-and a clarity that rivals the best prose.

There are many programming languages and each has its own specific syntax. The syntax of these languages is far more convenient to write than machine-level instructions. Once you have written a program, how does the machine know what to do? There are three main steps:

  1. a finite-state machine can be extended by adding a storage device called a memory, which allows the machine to store the definitions of what it is asked to do

  2. the extended machine can follow instructions written in machine language, a simple language that specifies the machine’s operation

  3. machine language can instruct the machine to interpret the programming language

A computer is just a special type of finite-state machine connected to a memory. The computer’s memory-in effect, an array of cubbyholes for storing data-is built of registers, like the registers that hold the states of finite-state machines. Each register holds a pattern of bits called a word, which can be read (or written) by the finite-state machine. The number of bits in a word varies from computer to computer. Each register in the memory has a different address, so registers are referred to as locations in memory. The memory contains Boolean logic blocks, which decode the address and select the location for reading or writing. If data is to be written at this memory location, these logic blocks store the new data into the addressed register. If the register is to be read, the logic blocks steer the data from the addressed register to the memory’s output, which is connected to the input of the finite-state machine. Memory can contain data, processing instructions, and control instructions, all stored in machine language. Here is the basic hierarchy of functional dependence among the various components:

  1. Whatever we need the computer to perform, we write it in a programming language

  2. This is converted into machine language by a compiler, aided by a predetermined set of subroutines called the operating system

  3. The instructions are stored in memory. They are categorized into control and processing instructions.

  4. Finite-state machines fetch and execute the instructions

  5. The instructions as well as the data are represented as bits and stored in the memory

  6. Finite-state machines and memory are built from storage registers and Boolean logic blocks

  7. Boolean blocks are implemented via switches in series or parallel

  8. Switches control something physical that sends a 0 or a 1

If you look at these steps, each idea is built upon a level of abstraction. In one sense, that is why anyone can write simple software programs without understanding much about computer architecture. However, the closer one gets to machine language, the more one needs to understand the details of the various abstractions that have been implemented.
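
To tie the hierarchy together, here is a toy fetch-decode-execute loop. The instruction set (LOAD, ADD, STORE, HALT) and the memory layout are entirely made up for illustration; real machine languages differ, but the shape of the cycle is the same.

```python
# Memory holds instructions and data side by side; addresses are indices.
# The instruction format here is hypothetical:
# ("LOAD", addr), ("ADD", addr), ("STORE", addr), ("HALT",)
memory = [
    ("LOAD", 5),    # 0: accumulator <- memory[5]
    ("ADD", 6),     # 1: accumulator <- accumulator + memory[6]
    ("STORE", 7),   # 2: memory[7] <- accumulator
    ("HALT",),      # 3: stop
    None,           # 4: unused
    40, 2, 0,       # 5, 6, 7: data words
]

pc, acc = 0, 0                     # program counter and accumulator: registers
while True:                        # the finite-state machine's cycle
    op, *args = memory[pc]         # fetch and decode
    pc += 1
    if op == "LOAD":    acc = memory[args[0]]
    elif op == "ADD":   acc += memory[args[0]]
    elif op == "STORE": memory[args[0]] = acc
    elif op == "HALT":  break

print(memory[7])   # 42
```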

How universal are Turing machines?

The author discusses the idea of the universal computer, first described in 1937 by Alan Turing. What is a Turing machine?

Imagine a mathematician performing calculations on a scroll of paper. Imagine further that the scroll is infinitely long, so that we don’t need to worry about running out of places to write things down. The mathematician will be able to solve any solvable computational problem no matter how many operations are involved, although it may take him an inordinate amount of time.

Turing showed that any calculation that can be performed by a smart mathematician can also be performed by a stupid but meticulous clerk who follows a simple set of rules for reading and writing the information on the scroll. In fact, he showed that the human clerk can be replaced by a finite-state machine. The finite-state machine looks at only one symbol on the scroll at a time, so the scroll is best thought of as a narrow paper tape, with a single symbol on each line. Today, we call the combination of a finite-state machine with an infinitely long tape a Turing machine.
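
A minimal Turing machine simulator, as a hedged sketch: the transition table plays the role of the finite-state machine and a dictionary stands in for the unbounded tape. The example machine simply inverts every bit it reads until it hits a blank.

```python
# Transition table of the finite-state machine:
# (state, symbol_read) -> (symbol_to_write, head_move, next_state)
RULES = {
    ("invert", "0"): ("1", +1, "invert"),
    ("invert", "1"): ("0", +1, "invert"),
    ("invert", " "): (" ",  0, "halt"),   # blank: stop
}

def run(tape_string: str) -> str:
    tape = dict(enumerate(tape_string))   # a dict stands in for the infinite tape
    head, state = 0, "invert"
    while state != "halt":
        symbol = tape.get(head, " ")
        write, move, state = RULES[(state, symbol)]
        tape[head] = write
        head += move
    return "".join(tape[i] for i in sorted(tape) if tape[i] != " ")

print(run("010011"))   # 101100
```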

The author also gives a few examples of noncomputable problems, such as the halting problem. He also touches briefly upon quantum computing and explains the core idea using a water molecule:

When two hydrogen atoms bind to an oxygen atom to form a water molecule, these atoms somehow “compute” that the angle between the two bonds should be 107 degrees. It is possible to approximately calculate this angle from quantum mechanical principles using a digital computer, but it takes a long time, and the more accurate the calculation the longer it takes. Yet every molecule in a glass of water is able to perform this calculation almost instantly. How can a single molecule be so much faster than a digital computer?

The reason it takes the computer so long to calculate this quantum mechanical problem is that the computer would have to take into account an infinite number of possible configurations of the water molecule to produce an exact answer. The calculation must allow for the fact that the atoms comprising the molecule can be in all configurations at once. This is why the computer can only approximate the answer in a finite amount of time.

One way of explaining how the water molecule can make the same calculation is to imagine it trying out every possible configuration simultaneously-in other words, using parallel processing. Could we harness this simultaneous computing capability of quantum mechanical objects to produce a more powerful computer? Nobody knows for sure.

Algorithms and Heuristics

The author uses simple examples to discuss various aspects of algorithms, such as designing an algorithm and computing its running time. For many problems where precise algorithms are not available, the next best option is to use heuristics, and designing heuristics is akin to an art. Most real-life problems require a healthy mix of algorithmic and heuristic solutions. IBM’s Deep Blue is an amazing example showing that the intermixing of algorithms and heuristics can beat one of the best human minds. Indeed, some cognitive tasks will remain out of a computer’s reach, and humans will have to focus on skills that are inherently human. A book-length treatment of this topic is given by Geoff Colvin in his book Humans Are Underrated.
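
As a generic illustration of "running time" (not an example from the book), compare how many comparisons linear search and binary search need on a million items: roughly n versus roughly log2(n).

```python
def linear_search_comparisons(items, target):
    """Comparisons made by a straightforward left-to-right scan."""
    for count, value in enumerate(items, start=1):
        if value == target:
            return count
    return len(items)

def binary_search_comparisons(n):
    """Worst-case comparisons for binary search over n sorted items (~log2 n)."""
    count = 0
    while n > 0:
        n //= 2
        count += 1
    return count

items = list(range(1_000_000))
print(linear_search_comparisons(items, 999_999))   # 1,000,000 comparisons
print(binary_search_comparisons(len(items)))       # 20 comparisons
```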

Memory: Information and Secret Codes

Computers do not have infinite memory, so there needs to be a way to measure the information stored in memory. An n-bit memory can store n bits of data, but we need to know how many bits are required to store a certain piece of input. One can think of various ways of doing so. One could use a “representative” definition, i.e. think about how each character in the input is represented on a computer and then count the total number of bits required to represent the input. Let’s say this blog post has 25,000 characters and each character takes 8 bits on my computer; then the blog post takes up 200,000 bits. But 8 bits per character could be a stretch: maybe all the characters in this post require only 6 bits each, in which case 150,000 bits would be enough. The problem with this way of quantifying information is that it is representation dependent. An ideal measure would be the minimum number of bits needed to represent the information. Hence the key question is, “How much can you compress a given text without losing information?”

Let’s say one wants to store the text of “War and Peace”; the information size obtained by multiplying 8 bits by the number of characters in the novel gives an upper bound. By considering various forms of compression, such as 6-bit encoding, taking advantage of regularities in the data, and taking advantage of the grammar of the language, one can reduce the information size of the novel. In the end, a compression scheme that uses the best available statistical methods would probably reach an average representation size of fewer than 2 bits per character-about 25 percent of the standard 8-bit character representation.
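
A rough, hedged way to see this on a computer is to run a general-purpose compressor over some text and count bits per character; zlib is used below as a convenient stand-in for "the best available statistical methods", and the repeated sentence is only a placeholder for the novel's full text.

```python
import zlib

# Placeholder text: repeating one sentence compresses unrealistically well,
# while real prose lands somewhere between this and the raw 8 bits/char.
text = ("Well, Prince, so Genoa and Lucca are now just family estates "
        "of the Buonapartes. " * 200)

raw_bits = 8 * len(text)
compressed_bits = 8 * len(zlib.compress(text.encode("utf-8"), level=9))

print(raw_bits, compressed_bits, compressed_bits / len(text), "bits/char")
```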

If the minimum number of bits required to represent an image is taken as a measure of the amount of information in the image, then an image that is easy to compress has less information. A picture of a face, for example, will have less information than a picture of a pile of pebbles on the beach, because the adjacent pixels in the facial image are more likely to be similar. The pebbles require more information to be communicated and stored, even though a human observer might find the picture of the face much more informative. By this measure, the picture containing the most information would be a picture of completely random pixels, like the static on a damaged television set. If the dots in the image have no correlation to their neighbors, there is no regularity to compress. So pictures that are totally random require a lot of bits and hence contain a lot of information. This goes against our everyday notion of information: in common parlance, a picture of random dots should carry less information than a picture with a specific dot pattern. Hence it is important for computers to store meaningful information, and indeed that is how many image and sound compression algorithms work: they discard meaningless information. A further generalization of this idea is to consider a program that can generate the data being stored. This leads us to another measure of information:

The amount of information in a pattern of bits is equal to the length of the smallest computer program capable of generating those bits.

This definition of information holds whether the pattern of bits ultimately represents a picture, a sound, a text, a number, or anything else.
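
A hedged illustration of this program-length measure: a million perfectly regular bits can be produced by a one-line recipe, whereas a million random bits have, with overwhelming probability, no description much shorter than the bits themselves.

```python
import random

# A very regular pattern: a short program is a complete description of it.
regular = "01" * 500_000          # one million bits from a one-line recipe

# A random pattern: barring luck, no program much shorter than the data
# itself can reproduce these exact bits.
random_bits = "".join(random.choice("01") for _ in range(1_000_000))

# Both strings are the same length, but by the program-length measure the
# second one carries far more information than the first.
print(len(regular), len(random_bits))
```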

The second part of this chapter talks about public-key/private-key encryption and error-correction mechanisms. The author strips away all the math and explains them in plain, simple English.
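
As a toy stand-in for the error-correction discussion (much simpler than anything in the book), a single parity bit appended to a word lets the receiver detect, though not correct, any single flipped bit.

```python
def add_parity(bits):
    """Append a parity bit so the total number of 1s is even."""
    return bits + [sum(bits) % 2]

def check_parity(bits_with_parity):
    """True if no single-bit error is detected (even number of 1s)."""
    return sum(bits_with_parity) % 2 == 0

word = [1, 0, 1, 1, 0, 0, 1, 0]
sent = add_parity(word)
print(check_parity(sent))          # True: arrives intact

sent[3] ^= 1                       # a single bit gets flipped in transit
print(check_parity(sent))          # False: the error is detected
```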

Speed: Parallel Computers

“Parallel computing” is a term that gets tossed around in many places. What is the basic problem with a normal computer that parallel computing aims to solve? In the basic design of a computer, processing and memory have always been two separate components, and all the effort has gone into increasing the processing speed. If you compare a modern silicon-chip computer with, say, a vintage room-sized computer, the basic two-part design has remained the same: a processor connected to a memory. This has come to be known as the sequential computer, and its limits are what create the need for parallel computing. To work any faster, today’s computers need to do more than one operation at once. We can accomplish this by breaking up the computer’s memory into lots of little memories and giving each its own processor. Such a machine is called a parallel computer. Parallel computers are practical because of the low cost and small size of microprocessors. We can build a parallel computer by hooking together dozens, hundreds, or even thousands of these smaller processors. The fastest computers in the world are massively parallel computers, which use thousands or even tens of thousands of processors.
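
A hedged sketch of the same idea on a single modern machine: split a CPU-heavy job into chunks and hand each chunk to its own processor using Python's multiprocessing module. The prime-counting workload is an arbitrary example.

```python
from multiprocessing import Pool

def count_primes(bounds):
    """Count primes in [lo, hi) -- a deliberately CPU-heavy chunk of work."""
    lo, hi = bounds
    def is_prime(n):
        if n < 2:
            return False
        return all(n % d for d in range(2, int(n ** 0.5) + 1))
    return sum(is_prime(n) for n in range(lo, hi))

if __name__ == "__main__":
    # Break the problem into chunks and give each chunk to its own processor.
    chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with Pool() as pool:
        print(sum(pool.map(count_primes, chunks)))   # primes below 100,000: 9592
```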

In the early days of parallel computing, many people were skeptical about it, the main reason being Amdahl’s law. The argument was that, since some fraction of any task (say 10 percent) must be done sequentially, the completion time stops shrinking no matter how many parallel processors you use. Soon it was realized that most large tasks have a negligible sequential component, and that with smart design one can parallelize many tasks.
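
For the record, Amdahl's law says that if a fraction s of the work is inherently sequential, the best possible speedup on N processors is 1 / (s + (1 - s)/N). A quick numeric sketch with s = 10 percent:

```python
def amdahl_speedup(sequential_fraction, processors):
    """Best possible speedup under Amdahl's law."""
    s = sequential_fraction
    return 1.0 / (s + (1.0 - s) / processors)

for n in (1, 10, 100, 1000, 1_000_000):
    print(f"{n:>9} processors -> speedup {amdahl_speedup(0.10, n):6.2f}")
# The speedup approaches, but never exceeds, 1/s = 10.
```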

Highly parallel computers are now fairly common. They are used mostly in very large numerical calculations (like weather simulation) or in large database calculations, such as extracting marketing data from credit card transactions. Since parallel computers are built of the same parts as personal computers, they are likely to become less expensive and more common with time. One of the most interesting parallel computers today is the one that is emerging almost by accident from the networking of sequential machines. The worldwide network of computers called the Internet is still used primarily as a communications system for people: the computers act mostly as a medium, storing and delivering information (like electronic mail) that is meaningful only to humans. Already, standards are beginning to emerge that allow these computers to exchange programs as well as data. The computers on the Internet, working together, have a potential computational capability that far surpasses any individual computer that has ever been constructed.

Computers that learn and adapt

The author gives some of the basic ideas on which adaptive learning algorithms have been built. Using examples such as neural networks and the perceptron, he manages to explain the key ideas of machine learning.
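
As a hedged sketch of the perceptron idea (the book's presentation is prose-only), a single perceptron trained with the classic update rule learns the logical AND function:

```python
# Training data: the AND function (inputs -> desired output).
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

weights = [0.0, 0.0]
bias = 0.0
learning_rate = 0.1

def predict(x):
    total = weights[0] * x[0] + weights[1] * x[1] + bias
    return 1 if total > 0 else 0

# Classic perceptron learning rule: nudge the weights after every mistake.
for epoch in range(100):
    for x, target in data:
        error = target - predict(x)
        weights[0] += learning_rate * error * x[0]
        weights[1] += learning_rate * error * x[1]
        bias += learning_rate * error

print([predict(x) for x, _ in data])   # [0, 0, 0, 1]
```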

Beyond Engineering

The brain cannot be analyzed via the usual “divide and conquer” approach that is used to understand a sequential computer. As long as the function of each part is carefully specified and implemented, and as long as the interactions between the parts are controlled and predictable, this system of divide and conquer works very well, but an evolved object like the brain does not necessarily have this kind of hierarchical structure. The brain is much more complicated than a computer, yet it is much less prone to catastrophic failure. The contrast in reliability between the brain and the computer illustrates the difference between the products of evolution and those of engineering. A single error in a computer’s program can cause it to crash, but the brain is usually able to tolerate bad ideas, incorrect information, and even malfunctioning components. Individual neurons in the brain are constantly dying and are never replaced; unless the damage is severe, the brain manages to adapt and compensate for these failures. Humans rarely crash.

So, how does one go about designing something that is different from “engineering”? The author illustrates this via a “sorting numbers” example, in which one proceeds as follows (a toy sketch appears after the list):

  • Generate a “population” of random programs

  • Test the population to find which programs are the most successful.

  • Assign a fitness score to each program based on how successfully they sort the numbers

  • Create new populations descended from the high-scoring programs. One can think of many schemes here: keep only the fittest, or “breed” new programs by pairing survivors from the previous generation

  • When the new generation of programs is produced, it is again subjected to the same testing and selection procedure, so that once again the fittest programs survive and reproduce. A parallel computer will produce a new generation every few seconds, so the selection and variation processes can feasibly be repeated many thousands of times. With each generation, the average fitness of the population tends to increase-that is, the programs get better and better at sorting. After a few thousand generations, the programs will sort perfectly.
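
A toy sketch of this evolutionary recipe (far smaller than the author's experiment, and with arbitrary population size, genome length, and mutation rate): each "program" is a fixed-length sequence of compare-and-swap steps, fitness counts how nearly sorted the test lists end up, and only the fittest survive and breed. It may or may not reach a perfect sorter within the given number of generations.

```python
import random

LIST_LEN = 6          # sort lists of this length
GENOME_LEN = 18       # each program is a fixed number of compare-swap steps
POP_SIZE = 200
TESTS = [random.sample(range(100), LIST_LEN) for _ in range(50)]

def random_program():
    # A "program" is a list of (i, j) compare-and-swap instructions.
    return [tuple(sorted(random.sample(range(LIST_LEN), 2)))
            for _ in range(GENOME_LEN)]

def run_program(program, data):
    data = list(data)
    for i, j in program:
        if data[i] > data[j]:
            data[i], data[j] = data[j], data[i]
    return data

def fitness(program):
    # Score: how many adjacent pairs end up in the right order, over all tests.
    score = 0
    for test in TESTS:
        out = run_program(program, test)
        score += sum(out[k] <= out[k + 1] for k in range(LIST_LEN - 1))
    return score

def breed(mum, dad):
    cut = random.randrange(GENOME_LEN)
    child = mum[:cut] + dad[cut:]
    if random.random() < 0.3:                        # occasional mutation
        child[random.randrange(GENOME_LEN)] = tuple(
            sorted(random.sample(range(LIST_LEN), 2)))
    return child

population = [random_program() for _ in range(POP_SIZE)]
for generation in range(60):
    population.sort(key=fitness, reverse=True)
    best = fitness(population[0])
    survivors = population[: POP_SIZE // 4]          # only the fittest survive
    population = survivors + [breed(random.choice(survivors),
                                    random.choice(survivors))
                              for _ in range(POP_SIZE - len(survivors))]

print("best score:", best, "out of", 50 * (LIST_LEN - 1))
```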

The author confesses that the programs produced by these experiments work, but that it is difficult to “understand” why they work.

One of the interesting things about the sorting programs that evolved in my experiment is that I do not understand how they work. I have carefully examined their instruction sequences, but I do not understand them: I have no simpler explanation of how the programs work than the instruction sequences themselves. It may be that the programs are not understandable-that there is no way to break the operation of the program into a hierarchy of understandable parts. If this is true-if evolution can produce something as simple as a sorting program which is fundamentally incomprehensible-it does not bode well for our prospects of ever understanding the human brain.

The chapter ends with a discussion about building a thinking machine.

Takeaway

The book explains the main ideas of computer science in simple words and then discusses many interesting areas that are currently being explored by researchers and practitioners. Anyone who is curious about questions like “How do computers work?” or “What are the limitations of today’s computer hardware and software?”, and wants to get some idea of the answers without being overwhelmed by them, will find this book interesting.