Superforecasting - Book Review

Superforecasting: Book review

An optimistic skeptic

The chapter starts off by saying that there are indeed people in the world who become extremely popular, make tons of money, get all the press coverage, by providing a perfect explanation, after the fact. The author gives one such example of a public figure who rose to fame explaining events post-fact, Tom Friedman. At the same time, there are many people who are lesser known in the public space but have an extraordinary skill at forecasting. The purpose of the book is to explain how these ordinary people come up with reliable forecasts and beat the experts hands down.

Philip Tetlock, one of the authors of the book is responsible for a landmark study spanning 20 years(1984-2004) that compares experts predictions and random predictions. The conclusion was, the average expert had done little better than guessing on many of the political and economic questions. Even though the right comparison should have been a coin toss, the popular financial media used “dart-throwing chimpanzee pitted against experts” analogy. In a sense, the analogy was more “sexy” than the mundane word, “random”. The point was well taken by all, that the experts are not all that good in predicting outcomes. However the author feels disappointed that his study has been used to dish out extreme opinions about experts and forecasting abilities such as, “all expert are useless”. Tetlock believes that it is possible to see into the future, at least in some situations and to some extent, and that any intelligent, open-minded, and hardworking person can cultivate the requisite skills. Hence one needs to have “optimistic” mindset about predictions. It is foolhardy to have a notion that all predictions are useless.

The word “Skeptic” in the chapter’s title reflects the mindset on must possess in this increasingly nonlinear world. The chapter mentions an example of Tunisian man committing suicide that leads to a massive revolution in the Arab world. Could anyone have predicted such catastrophic ripple effects of a seemingly common event ? It is easy to look backward and sketch a narrative arc, but difficult to actually peer in to the future and forecast? To make effective predictions, the mindset should be that of an “optimistic skeptic”.

So is reality clock-like or cloud-like? Is the future predictable or not? These are false dichotomies, the first of many we will encounter. We live in a world of clocks and clouds and a vast jumble of other metaphors. Unpredictability and predictability coexist uneasily in the intricately interlocking systems that make up our bodies, our societies, and the cosmos. How predictable something is depends on what we are trying to predict, how far into the future, and under what circumstances.

In fields where forecasts have been reliable and good, one sees that the people who make these forecasts follow, Forecast, measure, revise. Repeat. procedure. It’s a never-ending process of incremental improvement that explains why weather forecasts are good and slowly getting better. Why is this process non-existent in many stock market predictions, macro economic predictions? The author says that it is a demand-side problem. The consumers of forecasting don’t demand evidence of accuracy and hence there is no measurement.

Most of the readers and general public might be aware of the research done by Tetlock that produced the dart-throwing chimpanzee article. However this chapter talks about another research study that Tetlock and his research partner&wife started in 2011, Good Judgement Project. The couple invited volunteers to sign up and answer well designed questions about the the future. In total, there were 20,000 people who volunteered to predict event outcomes. The author collated the predictions from this entire crew of 20,000 people and played a tournament conducted by IARA, an intelligence research agency. The game comprised predicting events spanning a month to a year in to the future. It was held between 5 teams, one of which was GJP. Each team would effectively be its own research project, free to improvise whatever methods it thought would work, but required to submit forecasts at 9 a.m. eastern standard time every day from September 2011 to June 2015. By requiring teams to forecast the same questions at the same time, the tournament created a level playing field-and a rich trove of data about what works, how well, and when. Over four years, IARPA posed nearly five hundred questions about world affairs. In all, one million individual judgments about the future. In all the years, the motley crowd of forecasters of The Good Judgement project beat the experts hand down. The author says that there are two major takeaways from the performance of his team

Foresight is real : They aren’t gurus or oracles with the power to peer decades into the future, but they do have a real, measurable skill at judging how high-stakes events are likely to unfold three months, six months, a year, or a year and a half in advance.
Forecasting is not some mysterious gift : It is the product of particular ways of thinking, of gathering information, of updating beliefs. These habits of thought can be learned and cultivated by any intelligent, thoughtful, determined person.

The final section of the first chapter contains author’s forecast on the entire field of forecasting. With machines doing most of the cognitive work, there is a threat the forecasting done by humans will be no match to that supercomputers. However the author feels that humans are underrated(a book length treatment has been given by Geoff Colvin). In the times to come, the best forecasts would result from a combination of human-machine teams rather than humans only or machines only forecasts.

The book is mainly about specific type of people, approximately 2% of the forecasters who did phenomenally well. The author calls them “superforecasters”. Have they done well because of luck or skill? If it is skill, what can one learn from them ? are some of the teasers posed in the first chapter of the book.

Illusions of Knowledge

Given the number of books and articles that have been churned out talking about our cognitive biases, there is nothing really new in the chapter. The author reiterates the System-1 and System-2 thinking from Daniel Kahnenman’s book. He also talks about the perils of being over-confident of our own abilities. He talks about various medical practices that were prevalent before the advent of clinical trials. Many scientists advocated medicine based on their “tip-of-your-nose” perspective without vetting their intuitions.

The tip-of-your-nose perspective can work wonders but it can also go terribly awry, so if you have the time to think before making a big decision, do so-and be prepared to accept that what seems obviously true now may turn out to be false later.

The takeaway from this chapter is obvious from the title of the chapter. One needs to weigh in both system-1 and system-2 thinking in most of our decisions. An example of Magnus carlsen is given where his intuition tells him what are the possible steps immediately, and he spends most of the time double checking his intuition. Only then does he make the next move in a chess tournament. Its an excellent practice to mix System-1 thinking and Sytem-2 thinking, but one requires conscious effort to do that.

Keeping Score

The chapter starts with the infamous statement of Steve Balmer who predicted that iPhone was not going to have a significant market share. To evaluate Balmer’s forecast in a scientific manner, the author looks at the entire content of Balmer’s speech and says that there are many vague terms in the statement that it is difficult to give a verdict on the forecast. Another example is the “open letter” to Bernanke that was sent by many economists to stop QE to restore stability. QE did not stop and US has not seen any of the dire consequences that economists had predicted. So, is the forecast wrong ? Again the forecast made by economists is not worded precisely in numerical terms so that one can evaluate it. The basic message that the author tries to put across is, “judging forecasts is difficult”.

The author’s basic motivation to conduct a study on forecasting came while sitting on a panel of experts who were asked to predict the future of Russia. Many of the forecasts were a complete disaster. However that did not make them humble. No matter what had happened the experts would have been just as adept at downplaying their predictive failures and sketching an arc of history that made it appear they saw it coming all along. In such scenario, how does one go about testing forecasts ? Some of the forecasts have no time lines. Some of the forecasts are worded in vague terms. Some of them are not worded in numbers. Even if there are numbers, the event happened cannot be repeated and hence how does one decide whether it is luck or skill ? We cannot rerun history so we cannot judge one probabilistic forecast- but everything changes when we have many probabilistic forecasts. Having many forecasts helps one pin down two essential features of any forecast analysis, i.e. calibration and resolution. Calibration involves testing whether the forecast and the actual are in sync. Resolution involves whether the forecast involved are decisive probabilistic estimate and not somewhere around 40%-60%. The author takes all the above in to consideration and starts a 20 year project from 1984-2004 that goes like this :

assemble experts in various fields
ask a large number of questions with precise time frames and unambigous language
require that forecast be expressed using numerical probability scales
measure the calibration of the forecasters
measure the resolution of the forecasters
use brier score to evaluate the distance between the forecast and the actual

The author patiently waits for 20 years to see the results of all the forecasts. The following are the findings/insights from the project :

To make a good analogy, the author says big idea thinkers are akin to “Hedgehogs” and many idea thinkers are akin to “foxes”
Foxes were better forecasters than Hedgehogs
Foxes don’t fare well with the media
Aggregating among a diverse set of opinions beats hedgehogs. That’s why averaging from several polls gives a better result than single poll. This doesn’t mean “wisdom of any sort of crowd” works. It means “wisdom of certain type of crowd” works.
best metaphor for developing various perspective is to have a dragonfly eye. Dragonflies have two eyes, but theirs are constructed very differently. Each eye is an enormous, bulging sphere, the surface of which is covered with tiny lenses. Depending on the species, there may be as many as thirty thousand of these lenses on a single eye, each one occupying a physical space slightly different from those of the adjacent lenses, giving it a unique perspective. Information from these thousands of unique perspectives flows into the dragonfly’s brain where it is synthesized into vision so superb that the dragonfly can see in almost every direction simultaneously, with the clarity and precision it needs to pick off flying insects at high speed. A fox with the bulging eyes of a dragonfly is an ugly mixed metaphor but it captures a key reason why the foresight of foxes was superior to that of hedgehogs with their green-tinted glasses. Foxes aggregate perspectives.
Simple AR(1) kind of models performed better than hedgehogs and foxes

Superforecasters

The chapter starts off recounting the massive forecasting failure from the National Security Agency, the Defense Intelligence Agency, and thirteen other agencies that constitute the intelligency community of US government. These agencies had a consensus view that IRAQ has weapons of mass destruction. This view made everyone support Bush’s policy of waging the Iraq war. After the invasion in 2003, no WMDs were found. How come the agencies that employ close to twenty thousand intelligence analysts were so wrong? Robert Jervis who has critically analyzed the performance of these agencies over several decades says that the judgement was a reasonable one but wrong. This statement does require some explanation and the author provides the necessary details. The takeaway from the story is that the agencies did some errors that would have scaled back the probability levels that were associated with the consensus view. Who knows it would have changed the course of Iraq’s history?

After this failure, IARPA(Intelligence Advanced Research Projects Activity) was created in 2006. Its mission was to fund cutting-edge research with the potential to make the intelligence community smarter and more effective. They approach the author with a specific type of game in mind. IARPA’s plan was to create tournament-style incentives for top researchers to generate accurate probability estimates for Goldilocks-zone questions. The research teams would compete against one another and an independent control group. Teams had to beat the combined forecast-the “wisdom of the crowd”-of the control group. In the first year, IARPA wanted teams to beat that standard by 20%-and it wanted that margin of victory to grow to 50% by the fourth year. But that was only part of IARPA’s plan. Within each team, researchers could run experiments to assess what really works against internal control groups. Tetlock’s team beat the control group hands down. Was it luck ? Was it the team had a slower reversion to mean ? Read the chapter to judge it for yourself. Out of several volunteers that were involved GJP, the author finds that there were certain forecasters who were very extremely good. The next five chapters are all about the way superforecasters seem go about forecasting. The author argues that there are two things to note from GJP’s superior performance :

We should not treat the superstars of any given year as infallible. Luck plays a role and it is only to be expected that the superstars will occasionally have a bad year and produce ordinary results
Superforecasters were not just lucky. Mostly, their results reflected skill.

Supersmart?

The set of people whom the author calls superforecasters do not represent a random sample of people. So, the team’s outcome is not the same thing as collating predictions from a large set of random people. These people are different is what the author says. But IQ or education are not the boxes based on which they can be readily classified. The author reports that in general the volunteers had higher IQ than others but there was no marked distinction between forecasters and superforecasters. So it seems intelligence and knowledge help but they add little beyond a certain threshold-so superforecasting does not require a Harvard PhD and the ability to speak five languages.

The author finds that superforecasters follow a certain way of thinking that seems to be marking better forecasters

good back of the envelope calculations
Starting with outside view that reduces anchoring bias
Subsequent to outside view, get a grip on the inside view
Look out for various perspectives about the problem
Think thrice/four times, think deeply to root out confirmation bias
It’s not the raw crunching power you have that matters most. It’s what you do with it.

Most of the above findings are not groundbreaking. But what it emphasizes is that good forecasting skills do not belong to some specific kind of people. It can be learnt and consciously cultivated.

For superforecasters, beliefs are hypotheses to be tested, not treasures to be guarded. It would be facile to reduce superforecasting to a bumper-sticker slogan, but if I had to, that would be it.

Superquants?

Almost all the superforecasters were numerate but that is not what makes their forecasts better. The author gives a few examples which illustrate the mindset that most of us carry. It is the mindset of Yes, No and Maybe, where Yes mean very almost certainty, No means almost impossible and Maybe means 50% chance. This kind of probabilistic thinking with only three dials does not help us become a good forecasters. Based on the GJP analysis, the author says that the superforecasters have a more fine grained sense of probability estimates than the rest of forecasters. This fine grained probability estimates are not a result of some complex math model, but are a result of careful thought and nuanced judgement.

Supernewsjunkies?

The chapter starts with the author giving a broad description of the way a superforecaster works:

Unpack the question into components. Distinguish as sharply as you can between the known and unknown and leave no assumptions unscrutinized. Adopt the outside view and put the problem into a comparative perspective that downplays its uniqueness and treats it as a special case of a wider class of phenomena. Then adopt the inside view that plays up the uniqueness of the problem. Also explore the similarities and differences between your views and those of others-and pay special attention to prediction markets and other methods of extracting wisdom from crowds. Synthesize all these different views into a single vision as acute as that of a dragonfly. Finally, express your judgment as precisely as you can, using a finely grained scale of probability.

One of the things that the author notices about superforecasters is their tendency to make changes to the forecasts frequently. As things/facts change around them, they revise their forecasts. This begs the question, “Does the initial forecast matter ?”. What if one starts with a vague prior and keep updating it based on the changing world. The GJP analysis shows that superforecasters initial estimates were 50% more accurate than the regular forecasters. The real takeaway is that “updating matters”;frequent updating is as demanding as challenging and it is a huge mistake to belittle belief updating.Both under and overreaction to events happening can diminish accuracy. Both can also, in extreme cases, destroy a perfectly good forecast. Superforecasters have little ego invested in their intial judgements and the subsequent judgements. This makes them update their forecasts far quicker than other forecasters. Superforecasters update frequently and update in smaller increments. Thus they tread the middle path between over forecasting and underforecasting. The author mentions one superforecaster who uses Bayes theorem to revise his estimates. Does that mean Bayes is the answer to getting forecasts accurate? No, says the author. He found that even though all the superforecasters were numerate enough to apply Bayes, but nobody actually crunched numbers that explicitly. The message is that all the superforecasters appreciate the Bayesian spirit, though none had explicitly used a formula to update their forecasts. But not always “small updations” work. The key idea that the author wants to put across is that there is no one “magic” way to go about forecasting. Instead there are many broad principles with lots of caveats.

Perpetual Beta

The author starts off by talking about Carol Dwecks' “growth mindset” principle and says that this is one of the important traits of a superforecaster.

We learn new skills by doing. We improve those skills by doing more. These fundamental facts are true of even the most demanding skills. Modern fighter jets are enormously complex flying computers but classroom instruction isn’t enough to produce a qualified pilot. Not even time in advanced flight simulators will do. Pilots need hours in the air, the more the better. The same is true of surgeons, bankers, and business executives.

It goes without saying that practice is key to becoming good. However it is actually “informed practice” that is the key to becoming good. Unless there is a clear and timely feedback about how you are doing, the quantity of practice might be an erroneous indicator of your progress. This idea has been repeated in many books in the past few years. An officer’s ability to spot a lier is generally poor because the feedback of his judgement takes long time to reach him. On the other hand, people like metereologists, seasoned bridge players learn from failure very quickly and improve their estimates. I think this is the mindset of a day trader in financial markets. He makes a trade, he gets a quick feeback about it and learns from the mistakes. If you take a typical mutual fund manager and compare him/her with a day trader, the cumulative feedback that the day trader receives is far more than what an MF manager receives. Till this day, there is a debate about whether Mr.XYZ was a good fund manager or not. You can fill in any name for XYZ. Some say luck, Some say skill. It is hard to tease out which is which when the data points are coarse grained. However if you come across a day trader who consistently makes money for a decent number of years, it is hard to atttribute luck to his performance, for the simple reason that he has made far more trades cumulatively than an MF manager. The basic takeaway atleast for a forecaster is that he/she must know when the forecast fails. This is easier said/written than done. Forecasts could be worded in ambiguous language, the feedback might have a large time lag like years by which time our flawed memories can no longer remember the old forecast estimate. The author gives a nice analogy for forecasters who do not have timely feedback. He compares them with basketball players doing free throws in the dark.

They are like basketball players doing free throws in the dark. The only feedback they get are sounds-the clang of the ball hitting metal, the thunk of the ball hitting the backboard, the swish of the ball brushing against the net. A veteran who has taken thousands of free throws with the lights on can learn to connect sounds to baskets or misses. But not the novice. A “swish!” may mean a nothing-but-net basket or a badly underthrown ball. A loud “clunk!” means the throw hit the rim but did the ball roll in or out? They can’t be sure. Of course they may convince themselves they know how they are doing, but they don’t really, and if they throw balls for weeks they may become more confident-I’ve practiced so much I must be excellent!-but they won’t get better at taking free throws. Only if the lights are turned on can they get clear feedback. Only then can they learn and get better.

The author also highlights “grit” as a common trait across all superforecasters.

Towards the end of this chapter, the author manages to give a rough composite portrait of a superforecaster :

Philosophical Outlook	Cautious	Nothing is certain
	Humble	Reality is infinitely complex
	Non Deterministic	What happens is not meant to be and does not have to happen
Abilities and thinking styles	Actively open-minded	Beliefs are hypotheses to be tested, not treasures to be protected
	Intelligent, Knowledgable with a need for cognition	Intellectually curious, enjoy puzzles and mental challenges
	Reflective	Introspective and self-critical
	Numerate	Comfortable with numbers
Methods of forecasting	Pragmatic	Not wedded to any idea or agenda
	Analytical	Capable of stepping back from the tip-of-your-nose perspective and considering other views
	Dragon-fly eyed	Value diverse views and synthesize them into their own
	Probablistic	Judge using many grades of maybe
	Thoughtful updaters	When facts change, they change their minds
	Good intuitive psychologists	Aware of the value of checking thinking for cognitive and emotional biases
Work ethic	Growth mindset	Believe it's possible to get better
	Grit	Determined to keep at it however long it takes

The author says that the single most predictor of rising to the ranks of superforecasters is to be in a state of “perpetual beta”.

Superteams

The author uses GJP as a fertile ground to ask many interesting questions such as :

When does “wisdom of crowd” thinking help ?
Given a set of individuals, does weighing team forecasts work better than weighting individual forecasts?
Given that there are two groups, forecasters and superforecasters, does acknowledging superforecasters after the year 1 performance works in improving or degrading the subsequent year performance for superforecasters?
How do forecasters perform against prediction markets ?
How do superforecasters perform against prediction markets ?
How do you counter “groupthink” amongst a team of superforecasters?
Does face-to-face interaction amongst superforecasters help/worsen the forecast performance ?
If aggregation of different perspectives gives better performance, should the aggregation be based on ability or diversity ?

These and many other related questions are taken up in this chapter. I found this chapter very interesting as the arguments made by the author are based on data rather than some vague statements and opinions.

Leader’s dilemma

I found it difficult to keep my attention while reading this chapter. It was trying to address some of the issues that typical management books talk about. I have read enough management BS books that my mind has become extremely repulsive to any sort of general management gyan. May be there is some valuable content in this chapter. May be there are certain type of readers who will find the content in the chapter appealing. Not me.

Are they really Super?

The chapter critically looks at the team of superforecasters and tries to analyze viewpoints of various people who don’t belive that superforecasters have done something significant. The first skeptic is Daniel Kahneman who seems to be of the opinion that there is a scope bias in forecasting. Like a true scientist, the author puts his superforecasting team in a controlled experiment that gives some empirical evidence that supeforecasters are less prone to scope bias. The second skeptic that the author tries to answer is Nassim Taleb. It is not so much as an answer to Taleb, but an acknowledgement that superforecasters are different. Taleb is dismissive of many forecasters as he believes that history jumps and these jumps are blackswans (highly improbable events with a lot of impact). The author defends his position by saying

If forecasters make hundreds of forecasts that look out only a few months, we will soon have enough data to judge how well calibrated they are. But by definition, “highly improbable” events almost never happen. If we take “highly improbable” to mean a 1% or 0.1% or 0.0001% chance of an event, it may take decades or centuries or millennia to pile up enough data. And if these events have to be not only highly improbable but also impactful, the difficulty multiplies. So the first-generation IARPA tournament tells us nothing about how good superforecasters are at spotting gray or black swans. They may be as clueless as anyone else-or astonishingly adept. We don’t know, and shouldn’t fool ourselves that we do.

Now if you believe that only black swans matter in the long run, the Good Judgment Project should only interest short-term thinkers. But history is not just about black swans. Look at the inch-worm advance in life expectancy. Or consider that an average of 1% annual global economic growth in the nineteenth century and 2% in the twentieth turned the squalor of the eighteenth century and all the centuries that preceded it into the unprecedented wealth of the twenty-first. History does sometimes jump. But it also crawls, and slow, incremental change can be profoundly important.

So, there are people who trade or invest based on blackswan thinking. Vinod Khosla invests in many startups so that one of them can be the next google. Taleb himself played with OTM options till one day he cracked it big time. However this is the not the only kind of philosophy that one can adopt. A very different way is to beat competitors by forecasting more accurately-for example, correctly deciding that there is a 68% chance of something happening when others foresee only a 60% chance. This is the approach of the best poker players. It pays off more often, but the returns are more modest, and fortunes are amassed slowly. It is neither superior nor inferior to black swan investing. It is different.

What Next ?

The chapter starts off by giving a few results of the opinion polls that were conducted before the Scotland’s referendum of joining UK. The numbers show that there was no clear sign of which way the referendum would go. In any case, the final referendum was voted NO. It was hard to predict the outcome. There was one expert/pundit, Daniel Drezner, who came out in the open and admitted that it is extremely easy to give an explanation after the fact but doing so, before the fact forecast, is a different ball game. Drezner also noted that he himself had stuck to NO for sometime before switching to YES. He made an error while correcting his prior opinion. As a learning, he says, in the future he would give a confidence interval for the forecast, rather than a binary forecast. The author wishes that in the future many more experts/forecasters adopt the confidence interval mindset. This shift from point estimate to interval estimate might do a world of good, says the author. What will this 500 page book do to the general reader/society ? The author says that there could be two scenarios.

Scenario 1: forecasting is mainly used to advance a tribes interests. In all such situations, the accuracy of the forecast would be brushed aside and whoever makes the forecast that suits the popular tribe will be advertised and sadly actions will be taken based on these possibly inaccurate forecasts. This book will be just another book on forecasting that is good to read, but nothing actionable will come out of it.
Scenario 2 : Evidence based forecasting takes off. Many people will demand accuracy, calibration results of experts

Being an optimistic skeptic, the author feels that evidence based forecasting will be adopted in the times to come. Some quantification is always better than no quantification(which is want we see currently). The method or system used in the forecasting tournament to come out ahead is a work-in-progress, admits the author. However that doesn’t mean it is not going to improve our forecasting performance.

Towards the end of the book, the author does seem to acknowledge the importance of Tom Friedmans of the world, not because of their forecasting ability. It is their vague forecasts that are actually superquestions for the forecasters. Whenever pundits give their forecasts in a imprecise manner, that serves as the fodder for all the forecasters to actually get to work. The assumption the author makes is that superforecasters are not superquestioners. Superquestioner are typically hedgehogs who have one big idea, think deeply and see the world based on that one big idea. Superforecasters, i.e. foxes are not that good at churning out big questions, is what the author opines. In conclusion, he says the idea person would be a combination of superforecaster and superquestioner.

Takeaway

This book is not “ONE BIG IDEA” book. Clearly the author is on the side of foxes and not hedgehogs. The book is mainly about analyzing the performance a specific set of people from a forecasting team that participated in IARPA sponsored tournament. The book looks at these superforecasters and spells out a number of small but powerful ideas/principles that can be cultivated by anyone, who aspires to be a better forecaster.