August 14, 2017

Getting it Righter, Faster

The Role of Prediction in Agile Government Decisionmaking

By Dr. Kathryn McNabb Cochran and CDR Gregory Tozzi

Humanity is fascinated by prediction failures. The failure to predict the September 11, 2001, terrorist attacks has replaced the failure to predict the Japanese attack on Pearl Harbor as the United States’ canonical intelligence failure, but the failure to predict the Arab Spring is gaining ground. These failures have prompted some to argue that because prediction is a futile exercise, organizations are better served by investing in agile systems that can react rapidly to change rather than investing in predictive systems that help them anticipate change. The authors argue that this is a false dichotomy. Predictive systems can support agility, and recent advances in the science of forecasting offer multiple tools for organizations seeking to see around the corner faster and with more acuity.

To be effective, prediction must be approached more rigorously than it has been in the past, which places stringent requirements on both predictive systems and the organizations that employ them. Predictive systems must generate forecasts that are numerically precise, probabilistic, and continuously updated, while the policy community must embrace probabilistic thinking, develop systems for mapping outcomes to measures of utility, and organize agencies so that they have the flexibility to respond proactively to shifting predictions.

In this paper, the authors outline how the complexity of today’s world underlines the need for agility in government decisionmaking and argue that predictive systems can support agility at multiple decision points in the policymaking process. The authors then provide an overview of forecasting methodologies that meet these requirements. The paper concludes with a case study of how the Department of State’s Bureau of International Narcotics and Law Enforcement Affairs used one of these methods, the forecasting tournaments developed by the Good Judgment Project, to inform its decisions on how much to invest in manual eradication technologies in Colombia. That case study revealed that forecasts have the most impact when they diverge from internal consensus and when they are combined with a decisionmaking framework that can help offices map numeric values onto intangible utilities.

Introduction

Prediction has always been difficult, but the interconnectedness and speed of our globalized world make managing uncertainty especially challenging. Increased interdependence has exponentially increased links between actors, creating potential for nonlinear change. In a U.S. Institute of Peace report, Robert Lamb and Melissa Gregg describe the impact of this complexity on efforts to manage humanitarian assistance in complex conflicts: “internal dynamic inputs can create cascade effects (second and third order or higher) that make it extremely hard to predict what effects they ultimately will have.” Additionally, technology has increased the speed at which system-level change can occur. Speed combined with interdependence means that “the amount of interactive complexity previously contained in months of say, local conversation and letter exchange might now be squeezed into a few hours of explosive social media escalation.” The iconic example is the video of the self-immolation of Tunisian street vendor Tarek al-Tayeb Mohammed Bouazizi going viral on social media, setting off a wave of protests that toppled the Tunisian government in less than a month and led to mass protests throughout the Middle East and North Africa. The unrest ultimately resulted in the demise of entrenched regimes in Egypt and Libya and sparked the civil wars that continue to rage in Syria and Yemen.

At the same time, interdependence and technological change have also vastly increased the amount of information organizations have at their fingertips, and advances in forecasting methodologies have yielded a number of proven techniques that can tap into this information to generate accurate predictions. The goal of this paper is to explain how those techniques can best be used by the U.S. government, particularly the defense and foreign policy sectors. This report begins by reviewing the literature on the need for organizational agility in today’s world and exploring how prediction can be used to facilitate the three key components of agility: adaptation, flexibility, and responsiveness. It then argues that to support these components, predictive systems must be precise, probabilistic, and continuously updated. After describing a few of the advanced forecasting tools available, the report features a case study of the use of one of the tools – Good Judgment’s forecasting tournaments – by the Department of State’s Bureau of International Narcotics and Law Enforcement Affairs. After discussing the findings from the case study, the report concludes with recommendations for both predictive systems and policymakers. The report recommends that predictive systems maximize responsiveness by providing continuously updated forecasts and maximize impact by embedding predictions within a decisionmaking framework. It also recommends that the policymaking community familiarize decisionmakers and analysts with probabilistic thinking and empower office-level strategy cells with predictive mandates so organizations can harness the expertise of those closest to emerging issues.

The Need for Agility

Faced with today’s complexity, many organizations are eschewing explicit prediction, instead adopting systems that can react quickly to unforeseen changes. General Stanley McChrystal argues in Team of Teams that the key criterion for success has shifted from efficiency to agility. Effectiveness, McChrystal says, is less about optimization than it is about an organization’s “responsiveness to a constantly shifting environment.” A similar logic underpins former Secretary of the Navy Richard Danzig’s recommendation to plan for predictive failure. He argues in a 2011 Center for a New American Security (CNAS) report that because prediction in national security is so difficult, the government should develop platforms and processes that handle rapid change rather than investing in capabilities that increase foresight. Others focus on creating flat, decentralized organizations that share information and empower all levels of leadership to respond rapidly and creatively to new problems.

Little attention is paid to increasing predictive capacity, because the arguments underpinning the recommendations outlined above assume that prediction is foolhardy in complex, volatile environments. Prediction is largely limited to scenario planning, which is designed to inspire creative thinking about the range of potential scenarios without assessing their likelihoods. Prediction is considered an enabler of efficiency rather than agility, and so is deprioritized. But this is a false dichotomy.

Prashant Patel and Michael Fischerkeller, analysts at the Institute for Defense Analyses, break down agility into three components: Adaptability measures the ability of the organization (or equipment or software) to change, flexibility measures the costs of adaptation, and responsiveness measures the time required for adaptation. Prediction is essential for all three components. At the planning phase, it enables prioritization of different types of adaptability and allows the cost structures of varying levels of flexibility to be compared in a systematic manner. Prediction can also increase responsiveness by enabling organizations to spot and respond to disruptive change early.

Enabling Adaptation and Flexibility

Prediction is the critical precursor of rational decisionmaking because it enables rigorous comparisons of alternatives. When faced with choices – of strategies, of weapon systems, of investment portfolios – decisionmakers must be able to assess the potential consequences of each choice and the likelihood of each of those potential consequences. A rational decision to purchase F-22s or F-35s is a function of predictions about adversaries’ future capabilities, future operating environments, and future mission sets. The decision to open an alternative logistical path to Afghanistan in 2009 was informed – implicitly or explicitly – by predictions about the likelihood that Pakistan would reverse course and deny the United States access to its ports, roads, and airspace.

In an era where efficiency was the primary criterion for success, prediction played a role in determining the scenario for which a decision would be optimized and in predicting how the decision would influence a set of critical variables that had been identified ex ante. The U.S. Navy developed the AEGIS weapon system specifically to counter the specter of Soviet anti-ship cruise missiles denying access to areas of strategic importance. Likewise, American nuclear strategy in the 1960s used predictions about Soviet nuclear strategy to optimize force structure to ensure that the United States could achieve assured destruction.

In an era where agility is the primary criterion for success, the role of prediction is more nuanced and involves measuring the uncertainty associated with various courses of action in addition to the likely outcome. These predictions are key to determining levels of adaptability and flexibility during the planning process. Defense acquisition provides instructive examples. The design process for long-lived military systems involves the use of trade space analysis to resolve the tension between optimizing for the current operating environment and providing space, weight, and power margins to allow the system to adapt to evolving threats, foreseen or unforeseen. The size of the growth margin and the degree of initial optimization are bets based on predictions about the future state of the operating environment and hedges in response to uncertainty in those predictions.

Prediction can support flexibility through the acquisition of a portfolio consisting of low-cost systems with options to develop systems able to bridge anticipated capability gaps. Structured to employ novel authorities, the Defense Innovation Unit – Experimental (DIUx) can contract prototype projects with nontraditional defense contractors in fewer than 60 days and can transition successful demonstrations quickly to follow-on production. Where traditional defense acquisition embeds flexibility based on prediction in the system design process, the DIUx business model, structure, and enabling legislation are themselves bets on the future state and uncertainty of technological innovation.

Enabling Responsiveness

In addition to providing a rigorous basis for planning, real-time prediction can be used to update plans proactively as conditions change. The response to improvised explosive devices (IEDs) in the Iraq War is an instructive example. The vulnerability of Humvees to IEDs was identified as early as 2003, but it was not until 2007 that the Pentagon prioritized the rapid deployment of mine-resistant ambush protected vehicles (MRAPs) to address the IED challenge. Once the decision was made, more than 10,000 MRAPs were fielded in less than a year. The success of the MRAP program has been debated, with proponents pointing to significant decreases in casualties and opponents focusing on the costs. Both sides note, however, that the MRAPs would have had a much greater impact had they been deployed earlier, at the height of the insurgency. If predictions about the likely trajectory of IED attacks and lethality had been available and incorporated into the decisionmaking process, the switch to MRAPs could have been made sooner. Rather than reacting to the accumulation of casualties, the Pentagon could have responded proactively to forecasts showing a high probability that IED use and lethality would increase exponentially in the early years of the conflict.

Prediction can also enable responsiveness in policymaking. Forecasts about the likelihood of famine or the spread of infectious disease can (and do) enable aid to be repositioned before humanitarian challenges reach catastrophic levels. Enabling such preventive measures is the explicit goal of the U.S. Holocaust Memorial Museum’s Early Warning Project, which uses statistical models and crowdsourcing to provide policymakers with predictions about the likelihood of mass killing.

Requirements for Agile Predictive Systems 

Enabling responsiveness, adaptability, and flexibility places four requirements on predictive systems. First, predictions must be probabilistic. Predictions should reflect the uncertainty that is a hallmark of today’s world. It is not enough to know the most likely outcome or scenario; being able to measure the associated likelihoods is critical to making rational decisions that properly account for risk. Embracing probabilistic reasoning acknowledges that no outcome is ever certain and that attention must be paid to improbable, but not impossible, outcomes.

Second, predictions must be updated continuously. The true probability of an event occurring changes (often quickly) over time. Continuous prediction systems should update rapidly when presented with new evidence. Point predictions can support planning when systems are designed, budgets allocated, and priorities established. Those predictions should be continuously updated as decisions are implemented to allow dynamic adjustment in response to changes in the likely trajectory of different scenarios.

Third, predictions must be precise. Assigning numeric probabilities to outcomes enables assessment of the accuracy of forecasters and prediction methods. In contrast, assigning vague terms like “likely,” “probable,” or “possible” to outcomes makes evaluation impossible, preventing systematic improvement of predictive abilities. It also makes it difficult to perform meaningful comparisons of the relative likelihood of alternative scenarios. For example, in February 2017 the United Nations warned that famine was likely in Nigeria, Somalia, and Yemen. Simply knowing that famine is likely in three countries is insufficient to apportion aid among them. Knowing that the associated probabilities are 60 percent, 85 percent, and 90 percent, and using those probabilities as input to an expected utility-based system, is much more helpful. Precise probabilities also enable temporal comparison. From April to May 2017, the Famine Early Warning Systems Network (FEWS NET) categorized Nigeria as likely to experience famine, but FEWS NET’s qualitative predictions make it difficult to assess whether famine became more or less likely during this time. Good Judgment Open, Good Judgment Inc.’s crowdsourced prediction website, has been tracking the probability of famine in Nigeria in real time using the probabilistic forecasts presented in Figure 1. Because precise probabilities are given, trends are easy to spot. Finally, precise numerical predictions are required to make expected value judgments, and it is those judgments that are key to decisionmaking.

Figure 1: Good Judgment Open’s real-time forecasts of the probability of famine in Nigeria

Finally, the predictive system must enable expected utility calculations. A 2006 Central Intelligence Agency (CIA) research paper detailing the value of prediction markets noted that “at the end of the day, prediction market results are just probabilistic estimates of future outcomes. … [P]olicymakers still must decide on the threshold for action.” Expected utility theory provides a tractable framework for using probabilities to make decisions under uncertainty, while accounting for the risk tolerance of the decisionmaker. The basic idea is to enumerate the utilities associated with plausible outcomes, multiply those utilities by their associated probabilities, and take the sum. Quantifying the decisionmaking problem enables repeatable comparisons of alternatives. Although expressing abstract ideas such as political capital in numbers is challenging, it is not impossible. The foreign policy and medical literatures include examples of mapping qualitative outcomes or conditions to linear scales. The medical community uses survey questions designed to determine the dollar equivalence of outcomes or to understand the willingness of individuals to trade outcomes such as longevity and quality of life.

An expected utility-based decision support system would produce a time series of the expected utilities associated with decisions under consideration. Decisionmakers would be able to identify how changes in key predictions shift their expected utilities and to reallocate resources accordingly. 
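
To make the mechanics concrete, the following is a minimal sketch in Python of an expected utility calculation, using the famine probabilities cited above. The utility values are invented placeholders on an arbitrary 0-100 scale, not estimates from any actual aid program:

    # Expected-utility ranking using the famine probabilities cited above.
    # All utility values are hypothetical placeholders.

    P_FAMINE = {"Nigeria": 0.60, "Somalia": 0.85, "Yemen": 0.90}

    # Hypothetical utility of directing a marginal aid package to each
    # country, conditional on whether famine actually occurs there.
    U_IF_FAMINE = {"Nigeria": 95, "Somalia": 90, "Yemen": 85}
    U_IF_NO_FAMINE = {"Nigeria": 20, "Somalia": 25, "Yemen": 15}

    def expected_utility(country):
        """Probability-weighted sum of utilities over the two outcomes."""
        p = P_FAMINE[country]
        return p * U_IF_FAMINE[country] + (1 - p) * U_IF_NO_FAMINE[country]

    for country in sorted(P_FAMINE, key=expected_utility, reverse=True):
        print(f"{country}: expected utility = {expected_utility(country):.1f}")

Rerunning the calculation each time the forecasts update yields exactly the time series of expected utilities described above; a decisionmaker would reallocate resources when the ranking changes.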

Forecasting: Tools for Prediction

A number of forecasting tools are available to decisionmakers looking to leverage the power of precise predictions. The list of methods presented below is not exhaustive but includes techniques of established value to the defense and foreign policy communities. 

Forecasting Tournaments

In 1906, Francis Galton asked fairgoers in Plymouth, England, to predict the weight of a slaughtered ox. While individual predictions varied widely, the median of the crowd’s predictions was almost exactly right. Out of this experiment came the theoretical basis for the wisdom of the crowd, which postulates that crowd-based estimates will converge on the truth because individual prediction errors are random and thus cancel when aggregated.
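
The error-cancellation logic is easy to verify in simulation. Below is a minimal sketch that assumes independent, unbiased guesses; the true weight is the figure reported in Galton’s account, while the error spread and crowd size are arbitrary:

    # Simulating Galton's ox: many noisy individual guesses, aggregated.
    import random
    import statistics

    random.seed(42)
    TRUE_WEIGHT = 1198  # pounds, the dressed weight in Galton's account
    # Assume each guess is the truth plus independent random error.
    guesses = [TRUE_WEIGHT + random.gauss(0, 150) for _ in range(800)]

    print(f"spread of individual guesses: {statistics.stdev(guesses):.0f} lbs")
    print(f"crowd median: {statistics.median(guesses):.0f} lbs "
          f"(truth: {TRUE_WEIGHT})")
    # Individual errors on the order of 150 lbs largely cancel; the median
    # lands within a few pounds of the truth and resists outlier guesses.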

Forecasting tournaments leverage this logic to extract accurate probabilistic predictions. Individuals compete by making predictions on a series of discrete forecasting questions. These predictions are scored, ranked, and weighted. The crowd’s prediction, computed by aggregating individually weighted forecasts, has been found to be more accurate than the prediction of individuals, even experts. From 2011 to 2015, the Intelligence Advanced Research Projects Activity (IARPA) ran a series of forecasting tournaments to assess the utility of crowdsourced prediction methods for the intelligence community (IC). The winning team, the Good Judgment Project, employed a forecasting tournament that outperformed prediction markets, other elicitation techniques, and IC analysts. Embedded within the Good Judgment Project was a series of experiments which showed that cognitive de-biasing, training in probabilistic reasoning, teaming, tracking elite forecasters, and using algorithmic strategies could boost the accuracy of individuals and the crowd. Today’s cutting-edge tournaments employ these strategies to deliver accurate, real-time forecasts on issues such as the future of European integration and the future of artificial intelligence. 
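
A sketch of the scoring-and-weighting mechanics follows. Brier scoring is the standard accuracy measure in these tournaments; the inverse-Brier weighting rule below is a simplified stand-in, not the aggregation algorithm any particular tournament actually uses:

    # Score forecasters on resolved questions, then weight their current
    # predictions by demonstrated skill. The weighting rule is illustrative.

    def brier(forecasts, outcomes):
        """Mean squared error between probabilities and 0/1 outcomes
        (lower is better)."""
        return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)

    # Each forecaster's past probabilities on resolved yes/no questions.
    history = {
        "alice": ([0.9, 0.2, 0.7], [1, 0, 1]),
        "bob":   ([0.6, 0.5, 0.4], [1, 0, 1]),
        "carol": ([0.8, 0.1, 0.9], [1, 0, 1]),
    }
    skill = {name: 1.0 / (brier(f, o) + 0.01) for name, (f, o) in history.items()}

    # Aggregate current predictions on an open question with skill weights.
    current = {"alice": 0.75, "bob": 0.55, "carol": 0.80}
    crowd = sum(skill[n] * p for n, p in current.items()) / sum(skill.values())
    print(f"skill-weighted crowd forecast: {crowd:.2f}")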

Prediction Markets

Markets for trading securities derived from future events are an alternative way of crowdsourcing prediction. The Iowa Electronic Markets (IEM) provide an excellent example of this model. Users trade contracts that pay the holder a certain amount of real money on expiration. Contract liquidation values range from $0 to $1 and can be mapped directly to a probability. During the 2016 presidential election, users could trade two instruments: winner-take-all contracts paying one dollar in the event of a Republican or Democratic popular vote plurality, and vote share contracts that paid one dollar times the Republican or Democratic share of the two-party popular vote. The IEM has a decades-long history of beating polls and outperformed Gallup’s predictions of presidential elections from 1988 to 2000 by a healthy margin. Hypermind, a prediction market based in France, was nearly as accurate as Good Judgment in predicting the outcome of the 2016 presidential election, though most other prediction markets placed much higher probabilities on a Hillary Clinton win.
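
Because a winner-take-all contract pays $1 if the event occurs and nothing otherwise, its price can be read directly as the market’s probability, and a trader’s decision reduces to an expected-value comparison. A sketch with illustrative numbers:

    # Winner-take-all contract: pays $1 if the event occurs, $0 otherwise,
    # so price and implied probability coincide. Numbers are illustrative.

    def expected_profit(p_event, price):
        """Expected profit per contract for a buyer who believes
        P(event) = p_event."""
        return p_event * 1.00 - price

    price = 0.62       # market price, i.e., an implied probability of 62%
    my_belief = 0.70   # the trader's own probability estimate

    print(f"implied probability: {price:.0%}")
    print(f"expected profit per contract: ${expected_profit(my_belief, price):.2f}")

Traders acting on gaps between their own estimates and the market price (here, an expected profit of $0.08 per contract signals a buy) are what push prices toward the crowd’s aggregate probability.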

Prediction markets face several challenges. Regulation in the United States may prevent markets from being sufficiently liquid to enable efficient trading, which can be particularly problematic when small groups of individuals acquire enough capital to move the entire market with single trades. Using real-money markets to predict events of great consequence also faces ethical and legal scrutiny. The two legal prediction markets in the United States are experimental and operate in narrow legal carve-outs. An experimental market created by the Defense Advanced Research Projects Agency (DARPA) in 2003 was shut down after critics alleged that it was tantamount to gambling on terrorism. Recent research suggests that these latter challenges could be addressed by shifting to play-money markets, which have proven as effective as real-money markets.

Formal Models

Game theory offers an alternative to crowdsourcing and has the benefit of explicitly modeling the strategic interactions that generate outcomes of interest. Bruce Bueno de Mesquita’s model, developed in the early 1980s, uses an expanded bargaining game to predict shifts in players’ revealed positions on an issue in response to challenges from counterparties. Experts identify the actors, define their preferences, and anchor the bargaining space. The model uses these inputs to predict the outcome of the negotiations. These models are particularly attractive because they offer insights into the dynamics of contested issues in ways that the other methods considered here cannot. An analyst using the latest models will be presented with a timeline of actions taken by relevant stakeholders as the issue moves toward resolution. Armed with these dynamics, a policymaker can identify the optimal series of bluffs, compromises, and threats to achieve an outcome as close as possible to the desired end state.

Although Bueno de Mesquita’s model is challenging to replicate and not readily transparent, a CIA study suggested that it correctly identified the underlying dynamics of collective decisionmaking processes twice as often as did area experts. His willingness to publish predictions of high-profile events, including the succession of the Ayatollah Khomeini, earned him and his method no small amount of fame, and recent updates to the model enable experts to use distributions to define the model’s inputs, thus generating a probabilistic forecast rather than a point prediction. This innovation, coupled with the growing number of actors and interactions the model can handle, makes it a particularly powerful tool. With an open-source toolkit developed at the King Abdullah Petroleum Studies and Research Center (KAPSARC) coming online, the transparency and replicability challenges may also be solved.
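
The flavor of these models can be seen in a toy one-dimensional spatial sketch in the spirit of KTAB’s one-dimensional models: each actor holds a position on the issue and exerts influence proportional to capability times salience. The weighted-median output below is a drastic simplification of the bargaining dynamics described above, and all inputs are invented:

    # Toy one-dimensional spatial model: actors hold positions on a 0-100
    # issue scale; influence is capability times salience. The predicted
    # outcome is the influence-weighted median, a drastic simplification
    # of the full bargaining models. All inputs are invented.

    actors = [
        # (name, position, capability, salience)
        ("government", 20, 100, 0.9),
        ("opposition", 80,  60, 0.8),
        ("military",   35,  80, 0.5),
        ("business",   55,  40, 0.7),
    ]

    def weighted_median(actors):
        """Position at which half of total influence lies on either side."""
        ranked = sorted(actors, key=lambda a: a[1])
        total = sum(cap * sal for _, _, cap, sal in ranked)
        cumulative = 0.0
        for name, pos, cap, sal in ranked:
            cumulative += cap * sal
            if cumulative >= total / 2:
                return pos

    print(f"predicted outcome: {weighted_median(actors)}")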

Big Data

Exponential increases in computing power and networked connectivity create opportunities to glean forecasting insights from massive data sets. The promise of big data is the ability to match a machine-learning algorithm to a large data set in order to yield predictive models otherwise unattainable through conventional research. IARPA launched the Open Source Indicators (OSI) tournament in 2011 to explore big data’s value in training models based on social network data to predict consequential societal events. The winning system, Virginia Tech’s EMBERS, achieved near-perfect accuracy in predicting occurrences, though it did not always categorize them correctly. In 2015, IARPA launched the Mercury Project, an experimental project designed to take lessons learned from OSI and apply them to data acquired through foreign signals intelligence.

Big data holds obvious promise to decisionmakers in the defense and foreign policy spaces, but it must be applied carefully. In addition to addressing privacy concerns, practitioners must take care to avoid “overfitting,” an error evident in Google’s effort to predict influenza outbreaks: the model fit its own training data superbly but was largely unable to predict new outbreaks accurately. In addition, practitioners must navigate the tradeoffs between predictive power and opacity. IARPA’s OSI tournament and Mercury project require that models be built from shallow features of the data. Shallow features are model inputs that are directly interpretable by analysts and could include inputs like key words in social media posts, estimated participation in protests, and economic indicators. Restricting model inputs to shallow features allows analysts to follow the model’s reasoning from input to output. The alternative, deep learning, can produce significantly more accurate models. However, these models are essentially “black boxes,” with inner workings that are impossible to explain beyond their general mathematical properties. If it is unsettling to consider a deep learning–powered car that operates based on inscrutable relationships between sensor-derived input and mechanical output, how much more difficult will it be to trust a similar system whose forecasts are used to justify military intervention?
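
The overfitting failure mode is easy to reproduce in miniature. In the sketch below, which uses synthetic data rather than anything from the Google Flu episode, a high-degree polynomial matches its training data almost perfectly but generalizes far worse than a simpler model:

    # Overfitting in miniature: a high-degree polynomial memorizes its
    # training data but generalizes poorly. Data are synthetic.
    import numpy as np

    rng = np.random.default_rng(0)
    x_train = np.linspace(0, 1, 12)
    y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, 12)
    x_test = np.linspace(0, 1, 100)
    y_test = np.sin(2 * np.pi * x_test)

    for degree in (3, 9):
        coeffs = np.polyfit(x_train, y_train, degree)
        train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
        test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
        print(f"degree {degree}: train error {train_err:.4f}, "
              f"test error {test_err:.4f}")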

Bayesian Reasoning

Bayes’ theorem is the basis of an array of tools for updating forecasts and aggregating the results of other probabilistic forecasting techniques. As such, it provides a tractable way to begin incorporating probabilistic forecasting into an organization and a powerful way to integrate the methods described above. Essentially, it is a mathematically consistent means of updating forecasts as new information becomes available. An analyst begins with a prior forecast of the likelihood of an event. As new information arrives, that forecast is updated by assessing two quantities: the likelihood that the new information would have been observed if the scenario were true, and the overall likelihood that the new information would have been observed at all. The formula and an example are given in Figure 2. By explicitly comparing the likelihood of the information under the relevant scenario with the total likelihood of the information occurring, Bayesian reasoning forces an assessment of the likelihood that the event will not occur even given the new information, and then incorporates that assessment into the updated forecast. Consider the example of Iraqi weapons of mass destruction (WMD). Bayesian updating would force analysts to consider the total probability of Saddam Hussein interfering with International Atomic Energy Agency (IAEA) inspectors, which included both the probability that he interfered with the inspectors because he had WMD and the probability that he interfered with the inspectors even though he had no WMD. Applied rigorously, Bayesian reasoning reveals the true evolution of the likelihood that an event will occur. In fact, persistent errors in human judgment were famously revealed by comparing the trajectory of individual predictions against Bayesian updating. Bayesian reasoning does not come naturally to people, but the literature suggests that it can be taught with relative ease.

Figure 2: Bayes’ theorem and a worked example
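
The formula in Figure 2 is Bayes’ theorem: P(H|E) = P(E|H) × P(H) / P(E), where the denominator is the total probability of observing the evidence. Below is a minimal sketch applying it to the WMD example; the probabilities are illustrative assumptions, not estimates drawn from the historical record:

    # Bayesian updating for the WMD example above. All probabilities are
    # illustrative assumptions chosen to show the mechanics.

    def bayes_update(prior, p_e_given_h, p_e_given_not_h):
        """Return P(H | evidence), computing the total probability of the
        evidence from both branches explicitly."""
        p_evidence = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
        return p_e_given_h * prior / p_evidence

    prior = 0.50               # P(has WMD) before observing interference
    p_interfere_wmd = 0.90     # P(interferes with IAEA | has WMD)
    p_interfere_no_wmd = 0.60  # P(interferes anyway | no WMD) -- the branch
                               # analysts are forced to weigh

    posterior = bayes_update(prior, p_interfere_wmd, p_interfere_no_wmd)
    print(f"P(WMD | interference) = {posterior:.2f}")  # 0.60, not near 1.0

Because interference is fairly likely even without WMD under these assumptions, the posterior rises only modestly; neglecting that second branch of the total probability is precisely the error the preceding paragraph describes.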

Case Study: The Question of Aerial Coca Eradication in Colombia 

To demonstrate the utility of these methods and identify practical challenges in employing forecasting within government, the authors embarked on a forecasting project with the Department of State’s Bureau of International Narcotics and Law Enforcement Affairs (INL). The case study examined how INL used Good Judgment Inc.’s forecasting reports to inform its decisions on how to support Colombia in its eradication of illicit coca crops, a topic in which one of the co-authors of this report has been deeply involved.

Background

Eradicating illicit Colombian coca – the principal ingredient of cocaine – through aerial spraying of glyphosate was a key element of Plan Colombia, the comprehensive bilateral U.S.-Colombian approach to combating the corrosive influence of organized crime and the leftist insurgency led by the Revolutionary Armed Forces of Colombia (FARC). The Colombian government suspended aerial eradication in May 2015, citing concerns about the carcinogenic properties of glyphosate, but some speculated that the move was intended principally to placate the FARC during a key phase of peace negotiations. This posed a challenge for Washington, which was supportive of the peace deal but also saw aerial eradication as the key element of the U.S.-Colombia counternarcotics strategy. It also sparked a debate within Colombia as coca cultivation surged. Highlighting divergent views within the government, Colombian Attorney General Néstor Humberto Martínez called for his government to resume aerial fumigation, a move backed by the country’s inspector general, Alejandro Ordóñez.

U.S. Policy Options

INL was faced with several choices after the suspension of aerial eradication. First, INL had to decide what to do with its fleet of heavily modified AT-802 spray aircraft. Second, it had to determine its level of support for alternative eradication techniques. Aerial eradication was preferred to manual methods, and funding manual techniques could tie up resources needed to support a renewed Colombian aerial eradication program. If Colombia’s suspension of aerial eradication were truly permanent, divesting the spray planes and investing heavily in manual techniques would be sensible. If not, it would be preferable to maintain the fleet and limit investment in manual techniques to short-term programs so that the aerial eradication program could be brought back online quickly.

INL struck a balance between these two approaches, selling part of the fleet at auction and arranging to transfer the remainder to the Colombian National Police so Colombia could resume aerial eradication if the 2015 decision was reversed. Although this shifted some of the programming risk onto the Colombian government, INL still faced the question of how much to invest in manual eradication. Were Colombia to reactivate the program, INL would likely need to provide maintenance and operational support to the Colombian effort, both to maximize its impact and to deflect concerns that widening drug production was putting the Colombian peace process at risk. Having funds on hand to do this would enable aerial eradication to resume quickly, but keeping those funds in reserve would have opportunity costs.

Thus, the uncertainty over the future of the aerial eradication program posed two distinct challenges to INL:

  1. Reprogramming counternarcotics support in Colombia to achieve best effect while maintaining the option to support a renewed Colombian aerial eradication program.
  2. Developing a narrative explaining INL’s overall Colombia program when the future of aerial eradication remained uncertain.

The first challenge is practical. Reprogramming decisions carry opportunity costs and cannot be made without considering the overall Colombia program. Should INL reserve funds to respond rapidly to potential Colombian policy shifts, or should it optimize resource expenditures based on current constraints? How adaptable should INL’s programming be in this respect?

The second challenge is political. No government agency operates in a vacuum. Getting the resource decisions wrong in the extreme – by going full throttle with support for an aerial eradication program that never materializes or by being entirely unprepared to support a Colombian decision to resume the program – would cause uncomfortable discussions at the White House and on Capitol Hill, discussions that could place elements of INL’s broader enterprise at risk. 

The Forecasting Project

To assist INL’s decisionmaking, the bureau partnered with CNAS and Good Judgment. In consultation with INL and CNAS, Good Judgment asked its top-tier forecasters, referred to as “superforecasters,” whether the government of Colombia would allow aerial spraying to resume before the end of Colombian President Juan Manuel Santos’ term in August 2018. In addition to providing continuous probability estimates for each of the four plausible outcomes regarding the resumption of aerial spraying, Good Judgment Inc. provided an analysis of key forecast drivers and risk factors culled from conversations among forecasters on Good Judgment’s platform.

Figure 3: Good Judgment’s forecasts of the likelihood that Colombia allows aerial spraying to resume before August 2018

Good Judgment’s forecasts, presented in Figure 3, reinforced INL’s assessment that Colombia was unlikely to restart aerial eradication before August 7, 2018. The forecasts provided additional evidence in support of INL’s decisions to focus resources on manual eradication efforts.

The authors interviewed Richard Glenn, INL’s director of Western Hemisphere programs, for his assessment of the usefulness of having access to a live forecasting stream, the ways in which the predictions were used, and the organizational changes that could make the tool more useful. That conversation produced three key findings. First, having the forecasting data was useful, but its effect was limited because it reinforced INL’s qualitative assessment. Had the forecasting data revealed a likely departure from the status quo, or even shown signs that the probability of a resumption of aerial spraying was climbing, it would have had a greater policy impact. INL had the flexibility to move funds in response to such a forecast. Glenn noted that if the forecast had shown a high likelihood of aerial spraying resuming, INL likely would have decreased its investment in manual eradication programs or designed those programs to be more short-term.

Second, and unsurprisingly, the expected utility calculations were not straightforward despite the precision of Good Judgment’s forecasts. Simple budgetary analysis does not adequately support expected utility calculations because, while comparing the costs of alternative courses of action is often straightforward, the benefits that accrue to the decisionmakers as a result of embarking on a course of action are often intangible and uncertain. In this case, the primary anticipated benefit was the impact of the selected course of action on illicit narcotics production in Colombia, but the opacity of the illicit narcotics supply chain makes projections of the efficacy of interventions uncertain. Tangential impacts of the decision were reputational and political, and thus challenging to quantify. A concrete measure of the utility associated with competing cases would have required some degree of internal effort and expert support to develop. Such a measure, however, would have enabled INL to present rigorous comparisons of alternatives to the White House, Congress, or the government of Colombia.
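
A sketch of what such a measure might look like, with purely hypothetical utilities on an arbitrary 0-100 scale; none of these numbers reflect INL’s actual assessments or Good Judgment’s actual forecast values:

    # Hypothetical utility matrix for two INL postures. All numbers are
    # invented for illustration only.

    p_resume = 0.15  # assumed probability that aerial spraying resumes

    utilities = {
        "long-term manual programs":   {"resumes": 35, "does_not": 85},
        "short-term manual + reserve": {"resumes": 80, "does_not": 55},
    }

    for posture, u in utilities.items():
        eu = p_resume * u["resumes"] + (1 - p_resume) * u["does_not"]
        print(f"{posture}: expected utility = {eu:.1f}")
    # With resumption judged unlikely, the long-term manual investment wins;
    # a rising forecast would flip the ranking, consistent with Glenn's
    # comments above.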

Finally, access to forecasting products would be more useful if line offices like INL’s Office of Western Hemisphere Programs had dedicated planning staffs focused on identifying future opportunities and challenges at the operational level. The State Department’s policy and planning staffs are centralized and focus on developing strategic plans, but those plans often do not speak directly to the operational decisions that line offices must make, such as what to do with a fleet of armored crop dusters. Operational decisions need to be informed by longer-term planning. Providing decentralized forecasting resources at the operational and tactical levels would allow, in this instance, detailed analyses of alternative futures specific to Colombia. An office armed with these resources would be better positioned to ask prescient forecasting questions and use the predictive tools described above. The prevalence of such planning staffs varies across government agencies. The Department of Defense and the military services have large planning staffs able to take advantage of advances in forecasting; indeed, several already are. Agencies without robust planning staffs will find integrating these technologies into their decisions more challenging.

Conclusions and Recommendations for Forecasting Practitioners

This case study and our broader analysis suggest that building predictive capacity can promote agility within government organizations. To be most effective at enabling agility and influencing the policy process, predictive systems must: 

  • Be responsive. Prediction is more valuable and relevant to the client when it is updated frequently. Point predictions are useful during planning, but continuously updated predictions are needed during implementation as the policy evolves.
  • Provide a decisionmaking framework. As described above, defining utility in the INL case study was difficult because the outcomes of proposed courses of action were uncertain and intangible. Helping clients map similar intangibles to numeric values is critical to unlocking the power of precise predictions.

To be most effective at leveraging predictive technology, policymakers need to:

  • Learn to think probabilistically. Probabilistic thinking is on the rise, but its adoption is uneven. A policymaker who lacks an intuitive grasp of probabilistic methods risks impeding the sound and robust decisionmaking that prediction methods support.
  • Organize for agility. Prediction-based decisionmaking promotes agility, reduces energy wasted on planning against implausible or inconsequential futures, and reduces surprise. However, historically reactive departments and agencies are not organized to take advantage of prediction. When restructuring is necessary, avoid isolating prediction and strategy experts in an ivory tower. Rather, make forecasting tools available both to strategic-level policy planning staffs and to office-level units, whose strategy cells can leverage prediction methods to harness the expertise of those closest to challenging issues.

The world is complex and evolving. Organizations need to be agile and adapt quickly to change. But embracing agility does not require abandoning prediction. A predictive approach to agility provides decisionmakers with rigorous and regularly updated predictions of the costs and benefits associated with competing decisions. The alternative approach, which creates policies and organizations that adapt fluidly to a never-ending sequence of surprises, outwardly eschews prediction, but is in fact based on unvoiced and imprecise estimates of threats and opportunities. The predictive approach, in contrast, encourages a transparent accounting of underlying assumptions, risk points, and the impact of new evidence on decisions. Adopting the predictive approach requires a degree of training and restructuring, but investing in predictive capacity will pay dividends as governments and organizations adapt to an increasingly volatile world.

Acknowledgements

This project transitioned from an informal discussion on the state and value of prediction to a yearlong research effort thanks to the space and time afforded by CNAS, Good Judgment, and the U.S. Coast Guard. Our view of the promise of prediction was formed thanks to candid discussions with experts from the policy and forecasting communities, including Secretary Henry Kissinger, Secretary Colin Powell, Secretary Madeleine Albright, Director John Brennan, Michèle Flournoy, General David Petraeus, General Stanley McChrystal, Richard Armitage, Richard Danzig, Kristen Jordan, Bruce Bueno de Mesquita, Robert Kaplan, Ben Wise, Brian Efird, David Massad, and Marc Koehler. We are especially grateful to our collaborators from the Department of State’s Bureau of International Narcotics and Law Enforcement Affairs, Richard Glenn and Kale Edwards, for their willingness to help us ground the research with a practical case study.

  1. Stanley McChrystal et al., Team of Teams: New Rules of Engagement for a Complex World (Penguin, 2015), Chapter 3; Richard Danzig, “Driving in the Dark: Ten Propositions About Prediction and National Security,” (CNAS, October 2011), 15, https://www.cnas.org/publications/reports/driving-in-the-dark-ten-propositions-about-prediction-and-national-security.
  2. Robert D. Lamb and Melissa R. Gregg, “Preparing for Complex Conflicts,” (U.S. Institute of Peace, October 2016), 2-3, https://www.usip.org/publications/2016/10/preparing-complex-conflicts.
  3. McChrystal et al., Team of Teams, 70.
  4. Rania Abouzeid, “How Mohammed Bouazizi Sparked a Revolution,” Time (January 21, 2011), http://content.time.com/time/magazine/article/0,9171,2044723,00.html.
  5. McChrystal et al., Team of Teams, 20.
  6. Danzig, “Driving in the Dark,” 19-21.
  7. McChrystal et al., Team of Teams; John Dowdy and Kirk Rieckhoff, “Agility in US National Security,” mckinsey.com, March 2017, http://www.mckinsey.com/industries/public-sector/our-insights/agility-in-us-national-security; Eric G. Kail, “Leading Effectively in a VUCA Environment,” Harvard Business Review (December 3, 2010), https://hbr.org/2010/12/leading-effectively-in-a-vuca.
  8. Prashant R. Patel and Michael P. Fischerkeller, “Prepare to be Wrong: Assessing and Designing for Adaptability, Flexibility, and Responsiveness,” (Institute for Defense Analyses, August 2013), 7, https://www.ida.org/idamedia/Corporate/Files/Publications/IDA_Documents/CARD/ida-document-p-5005.ashx.
  9. Stratfor, “Special Report: U.S.-NATO, Facing the Reality of Risk in Pakistan,” stratfor.com, April 27, 2009, https://www.stratfor.com/analysis/special-report-us-nato-facing-reality-risk-pakistan-stratfor-interactive-map. 
  10. Joseph T. Threston, “The AEGIS Weapon System,” Naval Engineers Journal, 121 no. 3 (2009), 85-108.
  11. Robert McNamara, "Mutual Deterrence," (San Francisco, September 18, 1967), https://astro.temple.edu/~rimmerma/mutual_deterrence.htm.
  12. Patel and Fischerkeller, “Prepare to be Wrong,” 8.
  13. Ash Carter, “Remarks on Opening DIUx East and Announcing the Defense Innovation Board,” (Cambridge, MA, July 26, 2016), https://www.defense.gov/News/Speeches/Speech-View/Article/858155/remarks-on-opening-diux-east-and-announcing-the-defense-innovation-board/.
  14. Christopher J. Lamb, Matthew J. Schmidt, and Berret J. Fitzsimmons, “MRAPs, Irregular Warfare, and Pentagon Reform,” (Institute for National Strategic Studies, June 2009), 16, http://usacac.army.mil/cac2/cgsc/sams/media/MRAPs.pdf. 
  15. Chris Rohlfs and Ryan Sullivan, “The MRAP Boondoggle,” Foreign Affairs (May 4, 2017), https://www.foreignaffairs.com/articles/2012-07-26/mrap-boondoggle; Christopher J. Lamb and Sally Scudder, “Why the MRAP Is Worth the Money,” Foreign Affairs (May 4, 2017), https://www.foreignaffairs.com/articles/afghanistan/2012-08-23/why-mrap-worth-money.
  16. Naomi Larson, “The ‘Supers’ Who Can Predict the Future: Can We Learn How to See Disaster Coming?” The Guardian, March 6, 2017, https://www.theguardian.com/global-development-professionals-network/2017/mar/06/forewarned-is-forearmed-learning-how-to-predict-the-future.
  17. Early Warning Project, https://www.earlywarningproject.org. 
  18. Walter Frick, “What Research Tells Us About Making Accurate Predictions,” Harvard Business Review (February 2, 2015), https://hbr.org/2015/02/what-research-tells-us-about-making-accurate-predictions.
  19. Philip E. Tetlock and Dan Gardner, Superforecasting: The Art and Science of Prediction (New York: Broadway Books, 2015), Chapter 3.
  20. Somini Sengupta, “Why 20 Million People Are on Brink of Famine in a ‘World of Plenty,’” New York Times, February 22, 2017, https://www.nytimes.com/2017/02/22/world/africa/why-20-million-people-are-on-brink-of-famine-in-a-world-of-plenty.html.
  21. Puong Fei Yeh, “Using Prediction Markets to Enhance U.S. Intelligence Capabilities,” Studies in Intelligence, 50 no. 4 (2006), https://www.cia.gov/library/center-for-the-study-of-intelligence/csi-publications/csi-studies/studies/vol50no4/using-prediction-markets-to-enhance-us-intelligence-capabilities.html.
  22. Bruce Bueno de Mesquita, David Newman, and Alvin Rabushka, Forecasting Political Events: The Future of Hong Kong (New Haven: Yale University Press, 1985); Ulf-Dietrich Reips and Frederik Funke, "Interval-Level Measurement with Visual Analogue Scales in Internet-Based Research: VAS Generator," Behavior Research Methods, 40 no. 3 (2008), 699-704.
  23. Robert Hutchins et al., “Quantifying the Utility of Taking Pills for Cardiovascular Prevention,” Circulation: Cardiovascular Quality and Outcomes, 8 no. 2 (2015), 155-163; Paul Dolan et al., “The Time Trade‐Off Method: Results from a General Population Study,” Health Economics, 5 no. 2 (1996), 141-154.
  24. Francis Galton, “Vox Populi (The Wisdom of Crowds),” Nature, 75 no. 7 (1907), 450-451, http://www.nature.com/nature/journal/v75/n1949/abs/075450a0.html.
  25. James Surowiecki, The Wisdom of Crowds: Why the Many Are Smarter than the Few and How Collective Wisdom Shapes Business, Economies, Societies, and Nations (New York: Doubleday, 2004), Chapter 1. 
  26. Philip Tetlock, Barbara Mellers, Nick Rohrbaugh, and Eva Chen, “Forecasting Tournaments: Tools for Increasing Transparency and Improving the Quality of Debate,” Current Directions in Psychological Science, 23 no. 4 (2014), 290-295, https://faculty.wharton.upenn.edu/wp-content/uploads/2015/07/2014---forecasting-tournaments-tools-for-increasing-transparency-and-improving-debate.pdf.
  27. Barbara Mellers et al., “Psychological Strategies for Winning a Geopolitical Forecasting Tournament,” Psychological Science, 25 no. 5 (2014), 1106-1115, https://faculty.wharton.upenn.edu/wp-content/uploads/2015/07/2014---psychological-strategies-for-winning-a-tournament.pdf.
  28. Mellers et al., “Psychological Strategies for Winning a Geopolitical Forecasting Tournament,” 1106-1115; Alix Spiegel, “So You Think You’re Smarter Than a CIA Agent?” npr.org, April 2, 2015, http://www.npr.org/sections/parallels/2014/04/02/297839429/-so-you-think-youre-smarter-than-a-cia-agent.
  29. Mellers et al., “Psychological Strategies for Winning a Geopolitical Forecasting Tournament,” 1106-1115.
  30. Iowa Electronic Markets, “Trader’s Manual,” tippie.biz.uiowa.edu, https://tippie.biz.uiowa.edu/iem/trmanual/.
  31. Iowa Electronic Markets, “2016 U.S. Presidential Election Markets,” tippie.biz.uiowa.edu, https://tippie.biz.uiowa.edu/iem/markets/pres16.html.
  32. “Prediction 2016,” The Economist (January 2, 2016), http://www.economist.com/news/united-states/21684798-how-jesse-jackson-inadvertently-revived-political-betting-prediction-2016; Kenneth J. Arrow et al., “The Promise of Prediction Markets,” Science, 320 no. 5878 (2008), 877, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.320.1811&rep=rep1&type=pdf.
  33. Pavel Atanasov and Regina Joseph, “Which Election Forecast Was the Most Accurate? Or Rather: The Least Wrong?” Washington Post, November 30, 2016, https://www.washingtonpost.com/news/monkey-cage/wp/2016/11/30/which-election-forecast-was-the-most-accurate-or-rather-the-least-wrong/?utm_term=.0fb5fe653f5f. 
  34. Arrow et al., “The Promise of Prediction Markets,” 877.
  35. Pavel Atanasov, Phillip Rescober, Eric Stone, Samuel Swift, Emile Servan-Schreiber, Philip Tetlock, Lyle Ungar, and Barbara Mellers, “Distilling the Wisdom of Crowds: Prediction Markets Versus Prediction Polls,” Management Science (2016), http://pubsonline.informs.org/doi/abs/10.1287/mnsc.2015.2374.
  36. Jessica Contrera, “Here’s How to Legally Gamble on the 2016 Race,” Washington Post, March 28, 2016, https://www.washingtonpost.com/lifestyle/style/heres-how-to-legally-gamble-on-the-2016-race/2016/03/28/14397dde-f1dc-11e5-85a6-2132cf446d0a_story.html?utm_term=.4f39fd7482b5; Katy Bachman, “Meet the ‘Stock Market’ for Politics,” Politico, October 31, 2014, http://www.politico.com/story/2014/10/predictit-online-politics-stock-market-112374.
  37. Meghan Holloway, “Profiting from Terrorism: Futures Markets and World Events,” Brown Political Review, December 9, 2014, http://www.brownpoliticalreview.org/2014/12/profiting-from-terrorism-futures-markets-and-world-events/.
  38. Emile Servan‐Schreiber et al., "Prediction Markets: Does Money Matter?" Electronic Markets, 14 no. 3 (2004), 243-251, http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.637.7533&rep=rep1&type=pdf.
  39. Leo Lester, Brian Efird, and Ben Wise, “Reforming the Role of State-Owned Enterprise in China’s Energy Sector: An Analysis of Collective Decision-Making Processes Using the KAPSARC Toolkit for Behavioral Analysis (KTAB),” (King Abdullah Petroleum Studies and Research Center, July 2015), 15, https://www.kapsarc.org/openkapsarc/kapsarc-toolkit-for-behavioral-analysis-ktab/. 
  40. Jason B. Scholz, Gregory J. Calbert, and Glen A. Smith, “Unravelling Bueno de Mesquita’s Group Decision Model,” Journal of Theoretical Politics, 23 no. 4 (2011), 510-531; Clive Thompson, “Can Game Theory Predict When Iran Will Get the Bomb?” New York Times, August 15, 2009, http://www.nytimes.com/2009/08/16/magazine/16Bruce-t.html.
  41. Bruce Bueno de Mesquita, "Forecasting Policy Decisions: An Expected Utility Approach to Post-Khomeini Iran," PS: Political Science & Politics, 17 no. 2 (1984), 226-236.
  42. Michael A. M. Lerner and Ethan Hill, “The New Nostradamus.” GOOD Magazine (February 17, 2015), https://www.good.is/articles/the-new-nostradamus.
  43. Ben Wise, Leo Lester, and Brian Efird, “An Introduction to the KAPSARC Toolkit for Behavioral Analysis (KTAB) Using One-Dimensional Spatial Models,” (King Abdullah Petroleum Studies and Research Center, May 2015), https://www.kapsarc.org/openkapsarc/kapsarc-toolkit-for-behavioral-analysis-ktab/.
  44. “Open Source Indicators Program (Solicitation #: IARPA-BAA-11-11),” (IARPA, August 23, 2011), 5, https://www.fbo.gov/index?s=opportunity&mode=form&id=cf2e4528d4cbe25b31855a3aa3e1e7c9&tab=core&_cview=0. 
  45. Leah McGrath Goodman, “The EMBERS Project Can Predict the Future with Twitter,” Newsweek (March 7, 2015), http://www.newsweek.com/2015/03/20/embers-project-can-predict-future-twitter-312063.html.
  46. Kristen Jordan, “Predictive Analytics: A New Tool for the Intelligence Community,” The Cipher Brief, December 4, 2016, https://www.thecipherbrief.com/article/tech/predictive-analytics-new-tool-intelligence-community-1092.
  47. Charles Duhigg, “How Companies Learn Your Secrets,” New York Times, February 16, 2012, http://www.nytimes.com/2012/02/19/magazine/shopping-habits.html.
  48. David Lazer et al., “The Parable of Google Flu: Traps in Big Data Analysis,” Science, 343 no. 6176 (2014), 1203-1205, https://dash.harvard.edu/bitstream/handle/1/12016836/The%20Parable%20of%20Google%20Flu%20(WP-Final).pdf?sequence=1.
  49. Will Knight, “The Dark Secret at the Heart of AI,” Technology Review (May/June 2017).
  50. Charles E. Fisk, “The Sino-Soviet Border Dispute: A Comparison of the Conventional and Bayesian Methods for Intelligence Warning,” in Inside CIA’s Private World: Declassified Articles from the Agency’s Internal Journal 1955-1992, ed. H. Bradford Westerfield (New Haven: Yale University Press, 1995), 264-273.
  51. Central Intelligence Agency, “Handbook of Bayesian Analysis for Intelligence” (Central Intelligence Agency, Directorate of Intelligence, Office of Political Research, June 1975), 1-3, https://www.cia.gov/library/readingroom/docs/CIA-RDP86B00269R001100080001-0.pdf.
  52. Daniel Kahneman and Amos Tversky, “On the Psychology of Prediction,” Psychological Review, 80 no. 4 (1973), 237-251.
  53. Peter Sedlmeier and Gerd Gigerenzer, “Teaching Bayesian Reasoning in Less than Two Hours,” Journal of Experimental Psychology: General, 130 no. 3 (2001), 380, https://www.apa.org/pubs/journals/releases/xge-1303380.pdf.
  54. Connie Veillette, “Plan Colombia: A Progress Report” (Congressional Research Service, June 22, 2005), 5, https://fas.org/sgp/crs/row/RL32774.pdf; Daniel Mejía, “Plan Colombia: An Analysis of Effectiveness and Costs” (The Brookings Institution, 2015), 5, https://www.brookings.edu/wp-content/uploads/2016/07/Mejia-Colombia-final-2.pdf.
  55. William Neuman, “Defying U.S., Colombia Halts Aerial Spraying of Crops Used to Make Cocaine,” New York Times, May 14, 2015, https://www.nytimes.com/2015/05/15/world/americas/colombia-halts-us-backed-spraying-of-illegal-coca-crops.html?_r=0.
  56. Sibylla Brodzinsky, “FARC Peace Talks: Colombia Nears Historic Deal After Agreement on Justice and Reparations,” The Guardian, September 23, 2015, https://www.theguardian.com/world/2015/sep/24/farc-peace-talks-colombia-nears-historic-deal-after-agreement-on-justice-and-reparations.
  57. Marco Rubio, “U.S.-Colombia Partnership Should Be Strengthened,” Miami Herald, November 18, 2014, http://www.miamiherald.com/opinion/op-ed/article4001535.html.
  58. Alba Tobella and Christine Armario, “U.S. Says Colombia’s Coca Production Surges to Record Levels,” Associated Press, March 14, 2017, http://bigstory.ap.org/article/cf342b9aaa8b4235b8071f485d99406f/us-colombia-coca-production-surges-record-levels; United Nations Office on Drugs and Crime, “Coca Crops in Colombia Increase Almost 40 per Cent over One Year,” unodc.org, July 8, 2016, https://www.unodc.org/unodc/en/frontpage/2016/July/coca-crop-in-colombia-increases-almost-40-per-cent-over-one-year_-new-unodc-report.html.
  59. teleSUR, “Colombia’s Attorney General Calls for Renewed Aerial Fumigation,” telesur.net, September 5, 2016, http://www.telesurtv.net/english/news/Colombias-Attorney-General-Calls-for-Renewed-Aerial-Fumigation-20160905-0024.html.
  60. James Bargent, “March of the White Powder: A Cocaine Boom Could Derail Colombia’s Peace Process,” news.vice.com, March 30, 2017, https://news.vice.com/story/a-cocaine-boom-could-derail-colombias-peace-process.
  61. Barbara Mellers et al., “Identifying and Cultivating Superforecasters as a Method of Improving Probabilistic Predictions,” Perspectives on Psychological Science, 10 no. 3 (2015), https://faculty.wharton.upenn.edu/wp-content/uploads/2015/07/2015---superforecasters.pdf.

Authors

  • Dr. Kathryn McNabb Cochran

    Next Generation National Security Fellow, 2016, Good Judgment, Inc.

    Kathryn McNabb Cochran is the Director of Foreign Policy Research at Good Judgment Inc., the corporate spin-off of the Good Judgment Project, an IARPA-funded initiative, which ...

  • CDR Gregory Tozzi

    Senior Military Fellow, 2016-2017

    CDR Gregory Tozzi was commissioned as an Ensign in the U.S. Coast Guard in May 1998.  He is a designated Cutterman and Surface Warfare Officer with tours as Engineer Offi...
