Let us define what we mean here by “prediction.” To predict something is to say that a specific thing will happen. Examples of prediction in technology include:
- “A computer will beat the best human player in chess by the year 2000.” (Ray Kurzweil, 1990¹) The computer Deep Blue beat world champion Garry Kasparov in 1997.
- “Mail will be delivered between New York and Australia by guided missiles by 1969.” (Arthur Summerfield, US Postmaster General, 1959²) This didn’t happen.
- “25% of all cars will be electric by 2040.” (Bloomberg New Energy Finance report, 2016³)
These predictions are specific and concrete enough to be objectively verifiable. Statements about the future can be used for many things, such as inspiring people or helping them make decisions⁴, but here we are interested in predictions for their ability to be objectively tested by comparing them to the real world. If the prediction matches the real world, the prediction was correct; if it doesn’t, the prediction was incorrect. Correct predictions may then be used for other purposes like decision-making or inspiration, but their usefulness will stem from their accuracy, not other factors.
Most predictions, particularly successful predictions, are based on a model of the real world. For example, if someone asks you whether juice made of apples would taste good, you probably first imagine the juice, and then answer the question. That imagined juice is a model of the world, in your head. The model has the components of the juice, how they taste in your mouth, and what you yourself think of that taste. We make models like this all the time; whenever we imagine, plan, forecast, or predict, we are typically using models. These models might be implicit, but they’re there.
What’s Inside a Model
Causes: Not All of Them
In order to be accurate, a model does not have to be a complete model of everything. Your mental model of food likely doesn’t include atoms, and it definitely doesn’t include Jupiter. Some parts of the real world that aren’t included in your model, like Jupiter, have no bearing on what causes food to taste good on Earth, so it’s clear they are not necessary for the model to make good predictions. However, some of the things your model skips over are actually critical to what makes a food taste good, like atoms. Your experience of how a food tastes is determined by how a fat or sugar molecule’s atoms lie inside your taste buds’ receptor molecules. Your experience is also determined by a myriad of other factors, like how the taste buds are coupled to neurons that send a signal into your brain. But our models can still make good predictions without perfectly representing all aspects of the relevant parts of the universe. This is because the world often has persistent relationships. For example, apple juice has sugar molecules in it, and those molecules’ atoms can interact with your taste buds’ receptors in a particular way, and those receptors interact with the neurons in your tongue in a particular way, and that eventually gives your mind a particular sensation of taste. Because this chain of causes in the world is persistent, your model of the world can simply say “apple juice tastes sweet” and make accurate predictions. There can be fathomless depths of causality hiding inside that statement, but because the hidden causal interactions are persistent, they can be compressed. This compressibility of the world is what allows a simple model to make correct predictions.
Sometimes, however, those hidden chains of causality are not persistent, and so they start to matter. For example, “vinegar tastes sour” is a good predictive model, for the same reason that “apple juice tastes sweet” is a good predictive model: the causes underneath that statement are persistent. But if you eat the fruit of synsepalum dulcificum, often called “miracle fruit,” then our model for vinegar breaks. Miracle fruit contains the molecule miraculin, which binds to the taste receptors that typically bind to sugar. Miraculin stays bound to those receptors, and if you then eat vinegar, a miracle occurs: the normally sour vinegar tastes sweet. The miraculin molecule interacts with the molecules of the vinegar (or any sour food) in a way which causes the sugar-sensitive taste receptor to activate, beginning the chain of causes that lead your mind to experience “tastes sweet.” Your model of “vinegar tastes sour” now predicts the wrong thing, because it did not include the relevant causes.
Essentially all models of the physical world have hidden causality. The process of finding these hidden causes is frequently called “science.” Science involves proposing new models of the world (hypotheses) that yield different predictions from our old models, then testing to see if those new predictions are correct (experiments). For example, the old model of “vinegar tastes sour, regardless of having eaten a miracle fruit” would get supplanted by “vinegar tastes sour typically, but tastes sweet after having eaten a miracle fruit,” because the second model gave better predictions. Science could then go deeper to understand what the causes were that led vinegar to taste sweet with the miracle fruit, finding previously hidden mechanisms like taste receptors, molecules, etc. The same inquiry could be used to understand why vinegar didn’t taste sweet without the miracle fruit! Through the scientific process we can get ever more accurate understandings of the causes in the world, which allow us to create models that give us ever more accurate predictions.
Predictors: Maybe Causes, Maybe Not
Consider two bottles, one labeled “apple juice” and one labeled “vinegar.” You have not eaten a miracle fruit, and none are around. Which bottle’s contents will taste sweet, and which will taste sour? Why? In this case, drinking the bottle labeled “apple juice” will lead you to taste sourness, and drinking the bottle labeled “vinegar” will lead you to taste sweetness. The reason, of course, is the bottle labeled “apple juice” actually contained vinegar, and the bottle labeled “vinegar” actually contained apple juice. You know why changing the labels did not cause a change in taste: bottle labels do not cause taste (unless you eat the label). But even knowing this, you may have used the bottle label to predict the taste. You may have a model that says bottles are labeled “apple juice” and their contents taste sweet due to the same common cause: actually containing apple juice. Seeing a bottle labeled “apple juice” led you to infer that it contained apple juice, and thus that the contents were sweet. You would not have thought that the bottle label caused the taste, but you would have thought it predicted the taste. Thus, the bottle label was a predictor with a predictive relationship with the thing to be predicted. In contrast, “drinking apple juice causes a sensation of sweet taste” considers the apple juice also as a predictor, but it has an explanatory or causal relationship with the thing predicted⁵.
Predictive relationships can be valuable because they can, of course, lead to correct predictions. These predictions are not guaranteed, however, since other forces can get in the way and break the apparent relationship between a predictor and the predicted thing, as happened with the bottle labels. This most notably happens if we confuse a predictor for a cause and assume that manipulating that predictor ourselves will change the outcome. Making that mistake might lead us to write “apple juice” on a bottle and pour water into it, then wonder why the liquid inside doesn’t taste sweet. Predictors are very useful, but we should not assume they are necessarily causes.
Associational Models: Making Acausal Predictions and Inspiring Causes
Not all models try to understand causes; sometimes we construct models solely for the purpose of prediction. In these prediction-focused models there may be no explicit causal structure, but simply associations between one or more variables and the thing to predict. These are acausal, associational models. For example, you may have noticed that all bottles labeled “apple juice” that you ever encountered had contents that tasted sweet, and use that fact alone to predict that any bottle in the future labeled “apple juice” will have contents that taste sweet. The current exemplar for acausal, associational models is called “machine learning”⁶; machine learning techniques essentially transform many kinds of data in increasingly sophisticated ways to find associations between the thing to be predicted (like your risk of a car accident) and many possible predictor variables (age, income, education, car, etc.). In technology development, a long-used predictive model is trend extrapolation. In trend extrapolation the thing to be predicted is the change in performance of a technology (like computer performance) and the predictor variable is the amount of time. Predictive models like these are then tested by predicting on new data (such as data from the future), to test if the associations in the original data that the model was trained on continue to hold for new data. If the associations continue to hold, predictive models can make good predictions for new parts of the world, which is their purpose.
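As an illustration, trend extrapolation can be sketched in a few lines: fit a straight line to the logarithm of performance over time, then extend the line into the future. The performance figures below are invented for illustration, not real measurements.

```python
import math

# Hypothetical yearly performance figures (invented, roughly doubling
# every two years, in the spirit of Moore's law).
years = [2010, 2012, 2014, 2016, 2018]
performance = [1.0, 2.1, 3.9, 8.2, 15.8]

# Exponential trends become straight lines after a log transform, so
# fit log(performance) = intercept + slope * year by least squares.
logs = [math.log(p) for p in performance]
n = len(years)
mean_x = sum(years) / n
mean_y = sum(logs) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(years, logs))
         / sum((x - mean_x) ** 2 for x in years))
intercept = mean_y - slope * mean_x

def extrapolate(year):
    """The trend's prediction for a given year: just extend the line."""
    return math.exp(intercept + slope * year)

# The model predicts from the calendar year alone, whatever economics,
# engineering, or physics might say about the underlying causes.
print(round(extrapolate(2020), 1))
```

Note that the only predictor variable here is the year itself, which is exactly why such models break when the underlying causes shift.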
These prediction-focused, associational models can potentially do great prediction, given the right data that shows the right associations between parts of reality. However, they are fragile to changes in the world that weren’t reflected in the data they were trained on, particularly if they rely on predictors that are not the actual causes of the thing to be predicted. For example, a predictive model based on trend extrapolation will predict that a trend in technology will continue as long as the years continue to go by, regardless of data from economics, engineering, or physics. But the actual causes of technology development are not calendar years. The longtime success story of technology trend extrapolation, Moore’s law, has now slowed to what may soon be a halt⁷. But technology can also develop faster than trends predict: genome sequencing prices dropped smoothly between 2001 and 2008 from $100 million to $10 million, but then plummeted to $10 thousand by 2011, 2.5 orders of magnitude below the expectation from the previous trend⁸. Association-based predictive models can give more robust prediction by expanding the amount of data they are trained on. Large amounts of data have been the basis of the current machine learning explosion. However, unless they happen to capture the real causal mechanism as one of their associations, these models still work best when predicting situations that are similar to situations in the data they were trained with (e.g. the same period of history for trend extrapolation). Outside of that regime they are blind.
The solution to models that cannot accurately predict the parts of the world you want to predict is to make better models. To accurately predict a part of the world that is different from previously observed parts of the world, the best hope is to use models that reflect the causal mechanisms of reality. In order to identify these causal mechanisms, we need to do science. We need to propose mechanistic models for how the world works (hypotheses) and use them to make and test falsifiable predictions (experiments). But where do we get these hypotheses from? Frequently, association-based modeling. “Correlation doesn’t imply causation, but it does waggle its eyebrows suggestively and gesture furtively while mouthing ‘look over there’”⁹. Associations between variables (like apple juice and sweetness) are data that a causal model would need to explain, and they are frequently the inspiration behind studies that eventually find more detailed mechanisms (like sugar and taste receptors).
What’s Inside an Experiment
Calibration of Uncertainty
Your model of the world may be predictively accurate, in that if a food tastes good in the model it also tastes good in the real world. The model can also be useful without being perfectly accurate. A model that accurately predicted how foods would taste 90% of the time might still be useful, especially compared to a model that was correct 50% of the time. But even better is a model that you know gives correct predictions only 90% of the time. If you think a model is perfectly accurate, you will expect your decisions to let you enjoy 100% of your meals. But if you know the model is only correct for 90% of foods, you can anticipate the occasional surprise and prepare accordingly (like having a bottle of mouthwash handy). Conversely, if you think a model is correct 50% of the time and it’s actually 90%, you’re missing out on predictive power; you may decide to just select food by flipping a coin, even though you could use your imperfect model and enjoy 40% more of your meals!
Knowing the accuracy of a model’s predictions is called calibration¹⁰. Calibration is a statement about the certainty or confidence associated with any prediction. Instead of your model simply predicting “Vinegar will taste bad” and “Apple juice will taste good,” it could instead predict “Vinegar will taste: Bad-90%, Good-10%” and “Apple juice will taste: Good-70%, Bad-30%.” If the model’s predictions with 90% confidence are correct 9 out of 10 times, and its predictions with 70% confidence are correct 7 out of 10 times, this model is well-calibrated. Most models of interesting phenomena are imperfect, but the well-calibrated models are more useful because they allow you to correctly predict how often you’ll be right or wrong, on average.
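A calibration check can be made concrete with a short sketch: group predictions by their stated confidence, then compare each group’s claimed confidence to its observed hit rate. The prediction records here are invented for illustration.

```python
# Each record is (stated confidence, whether the prediction came true).
# Invented records: the 90%-confidence predictions hit 4 times out of 5,
# and the 70%-confidence predictions hit 3 times out of 5.
predictions = [
    (0.9, True), (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.7, True), (0.7, False), (0.7, True), (0.7, True), (0.7, False),
]

# Group outcomes by the confidence the model claimed.
by_confidence = {}
for confidence, came_true in predictions:
    by_confidence.setdefault(confidence, []).append(came_true)

# A well-calibrated model's observed hit rate matches its claimed
# confidence in every group; this one is slightly overconfident.
for confidence, outcomes in sorted(by_confidence.items()):
    hit_rate = sum(outcomes) / len(outcomes)
    print(f"claimed {confidence:.0%}, observed {hit_rate:.0%}")
    # claimed 70%, observed 60%
    # claimed 90%, observed 80%
```

With real forecasting records in place of the invented list, the same few lines give an honest report of how much to trust each confidence level.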
What properties does a model need to have in order for it to be anything other than completely certain? A model could include a mechanism that is variable. For example, if someone rolls a fair die and you need to predict what value will come up, your mental model includes the fact that die rolls are variable, and that they come up 1, 2, 3, 4, 5, and 6 equally often. So your model will predict “1-1/6, 2-1/6, 3-1/6, 4-1/6, 5-1/6, 6-1/6.” This is how a well-calibrated model will predict outcomes that have causes that are variable.
Variability, however, is in the eye of the beholder. Imagine someone rolls a die, looks at it, covers it, and then asks you what the value is. Your mental model has the same information as before, and so it again should predict “1-1/6, 2-1/6, 3-1/6, 4-1/6, 5-1/6, 6-1/6.” The other person, however, saw the die’s value and so has more information. Their mental model will give a different response: “5-100%.” These different responses are due to different uncertainty about the world, which is caused by a combination of what data we have observed and what our estimates of the world were prior to receiving that data.
To return to our apple juice, maybe our initial model is “Apple juice tastes: Good-50%, Bad-50%.” We drink three glasses of apple juice and they all taste good, so perhaps we update our model to “Apple juice tastes: Good-88%, Bad-12%,” depending on how strongly we believed in our initial model. Later we drink another glass of apple juice and it tastes bad, so we update our model to “Apple juice tastes: Good-70%, Bad-30%.” As we drink more and more samples of apple juice, our model’s distribution of good and bad will shift to match the real-world variability of apple juice¹¹.
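These particular numbers can be reproduced by one standard Bayesian model, a Beta-Binomial update with a Beta(0.5, 0.5) (Jeffreys) prior. That choice of prior is an assumption made here for illustration; the text does not commit to a specific model.

```python
# Beta-Binomial updating: track two pseudo-counts, one for "good" and
# one for "bad." The Jeffreys prior Beta(0.5, 0.5) encodes the initial
# "Good-50%, Bad-50%" model.
good, bad = 0.5, 0.5

def p_good():
    """Posterior mean probability that the next glass tastes good."""
    return good / (good + bad)

good += 3          # three glasses that tasted good
print(round(p_good(), 2))   # 0.88

bad += 1           # one glass that tasted bad
print(round(p_good(), 2))   # 0.7
```

A stronger prior (larger starting pseudo-counts) would make the model update more slowly; that is the “how strongly you believed in your initial model” knob.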
The variability of the real world and the calibration of your model are important for identifying if your model is making true predictions about the world. For example, consider predicting a technology’s future performance using trend extrapolation. Historically, these models predict a value within ±30% of the real value, about 50% of the time¹². Are these good predictions? If these models were 50% confident that the true value would lie within that range, then these models gave true predictions about the world. If these models were 99% certain the true value would lie within that range, then they did not make true predictions about the world. And if the models gave no statement about their certainty, but just predicted a single value at one specific point, then they were almost certainly wrong. Unfortunately, most trend extrapolation has no statement of certainty, and may implicitly fall in the last group. More informative trend extrapolation uses historical data to estimate the variability of the trend, and then uses that variability to generate a distribution of confidence values for all possible levels of future performance. These methods can and have been used for predicting technology development, with promising initial results¹³. Having well-calibrated predictions of future technology allows us to weigh different outcomes by how probable they are, and then prepare for the entire distribution of outcomes accordingly.
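In the spirit of those methods (though not their exact procedure), here is a sketch of trend extrapolation with an uncertainty statement: estimate both the average step of the trend and its historical variability, then report a range instead of a single point. The cost series is invented for illustration.

```python
import math
import statistics

# Invented log-cost series for a technology, dropping about 20% per year.
costs = [100.0, 80.0, 62.0, 50.0, 41.0, 33.0]
log_costs = [math.log(c) for c in costs]

# Year-to-year changes in log cost: their mean is the trend (drift),
# their standard deviation is the historical variability around it.
steps = [b - a for a, b in zip(log_costs, log_costs[1:])]
drift = statistics.mean(steps)
spread = statistics.stdev(steps)

# A one-year-ahead forecast as a distribution summary rather than a
# single number: a central value plus a roughly-2-sigma interval.
central = math.exp(log_costs[-1] + drift)
low = math.exp(log_costs[-1] + drift - 2 * spread)
high = math.exp(log_costs[-1] + drift + 2 * spread)
print(f"next year: ~{central:.1f}, plausibly {low:.1f} to {high:.1f}")
```

Whether such intervals are well-calibrated is itself an empirical question, testable exactly as described above: check how often the realized values fall inside them.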
Adding Predictors to Reduce Uncertainty and Test Causes
A well-calibrated model that accurately reflects the distribution of good and bad tasting apple juice is useful, because we can use that to make predictions of how often apple juice tastes good or bad. But it would be even more useful if it reflected which of the juice tasted good or bad; i.e. if it made better predictions. Such a model would need more data than just the good and bad taste of all apple juice. It would need additional data: predictors of the apple juice’s taste. As we’ve discussed, predictors are ideally causes of the apple juice’s taste, but they could also be factors that don’t cause the taste, but have historically been associated with it. Whatever their origin, predictors allow a model to segment apple juice further, such as into green and red apple juice. After collecting data, we may get a model that says “Green apple juice tastes: Good-10%, Bad-90%; Red apple juice tastes: Good-90%, Bad-10%.” This model may make much better predictions due to the lowered uncertainty, which is thanks to the additional predictors.
A predictor may enable you to make better predictions, but what if it’s not a cause? Eventually you may run into a situation like the bottle labels, where your predictive model will fail. How can we determine if using green apples to make apple juice actually causes the bad taste, or if it’s an acausal predictor that has only benefited prediction due to its historical association with bad taste? There are two answers: 1. add more predictors, or 2. run an experiment.
Add More Predictors
We could add another predictor: the ripeness of the apples used to make the juice. Ripe apples can be green or red, but unripe apples can only be green. We collect data on the taste of apple juice, considering the apples’ color and their ripeness, and find: “Unripe green apple juice tastes: Good-0%, Bad-100%; Ripe green apple juice tastes: Good-90%, Bad-10%; Ripe red apple juice tastes: Good-90%, Bad-10%.” Ripeness is what’s important! The color of the apple appears to have no relevance for predicting taste.
Run An Experiment
One might say that adding more predictors is all well and good, but to find the causes we really need to run an experiment. However, that is what we just did: we tested the effect of apple color on juice taste by controlling for the effect of apple ripeness. Testing the influence of one factor while controlling for other factors is an experiment. The process of finding causes with experiments is the process of testing predictors by adding other predictors.
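The color-versus-ripeness analysis above can be sketched as a stratified comparison. The tasting records are invented to match the rates quoted in the text.

```python
# Hypothetical tasting records: (color, ripeness, tasted_good), shaped
# to match the rates in the text: unripe green juice always tastes bad,
# ripe juice of either color tastes good 90% of the time.
records = (
    [("green", "unripe", False)] * 10
    + [("green", "ripe", True)] * 9 + [("green", "ripe", False)]
    + [("red", "ripe", True)] * 9 + [("red", "ripe", False)]
)

def good_rate(color=None, ripeness=None):
    """Fraction tasting good among records matching the given conditions."""
    matching = [good for c, r, good in records
                if (color is None or c == color)
                and (ripeness is None or r == ripeness)]
    return sum(matching) / len(matching)

# Ignoring ripeness, color looks strongly predictive:
print(good_rate(color="green"), good_rate(color="red"))    # 0.45 0.9
# Controlling for ripeness, the apparent effect of color vanishes:
print(good_rate(color="green", ripeness="ripe"),
      good_rate(color="red", ripeness="ripe"))             # 0.9 0.9
```

Each new predictor we condition on is exactly the “controlling” step described above; a designed experiment differs mainly in how the records are generated, not in how they are analyzed.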
Earlier we introduced “predictors” versus “causes” because the distinction fits our intuitions and reflects the ideal that we are pursuing. Now that we have built up understanding about how we use data to build models of the world, we can handle a more complete picture: in practice, “causes” are not functionally different from other predictors. True causes can be found, but we can never be sure that’s what they are. In our experiments with apple juice we have only controlled for apple color and ripeness; but perhaps the unripe apples actually contain very little juice, so they can’t make the plain water taste good. If we made an apple juice with a great many unripe apples, perhaps that juice would taste good, which our predictive model would have failed to predict correctly. Unripe apples’ possible lack of juice is an example of a confounding factor we can “control for,” alternatively known as “collect data on.” But after collecting this data (or running this experiment) we see that controlling for quantity of juice doesn’t help our predictions. The ripeness of the apples remains a cause of flavor, or so it seems. What we call “causes” are predictors whose predictive power has survived new data, so far. Practically, causes are not so much “true” as they are “tough”¹⁴.
Science finds new data, like the miracle fruit’s effect on taste, that we could not previously predict. There is already no shortage of such data; nearly every model is created with an imperfect ability to predict¹⁵. Science is also the process of finding factors that, when we account for them, yield better prediction. In practice what sets apart “doing science” from making acausal predictive models is deciding what data to collect. There is a difference between having data from many different predictors and having data that can be used to distinguish which of those predictors can be explained by the others. Perhaps in our experiment with the unripe apples all the unripe apples had been stored in a dirty kitchen sink, marring their flavor. If we are creating an acausal, associational model we would not need to care; the model may still make good predictions, up until we start predicting the taste of apple juice from a different kitchen. But if our goal is to identify possible causes, then we must consider such potential confounders and other ways the data could be misrepresenting reality¹⁶. The work of science is then to carefully construct a data set that makes clear what is the influence of each predictor, including the kitchen sink¹⁷.
When we think of new factors to control, we are essentially proposing different causal models (often called hypotheses). If we think that data on the kitchen sink may enable better predictions, we are proposing a causal model in which the kitchen sink is a cause. If we think that data on the person making the juice may enable better prediction, such as whether they are right- or left-handed, we are considering a causal model in which handedness is a cause¹⁸. There are infinitely many causal models that explain any data set perfectly. How do we know which of these causal models are credible? By testing their predictions against new data that is different from what we have seen before. New data that shows new associations shrinks the space of causal models that are plausible. When we design and run an experiment we are constructing a set of data that will reduce our uncertainty in what causal models are probable. We have historically found that simpler models do better on accurately predicting new data, and so we often have a prior assumption that the simpler causal models are more probable (“Occam’s Razor”). However, if the data of the real world is shown to have many complexities, like with miracle fruit, then the most probable surviving models may also have many complexities, like taste receptors and miraculin molecules and neurons.
We never know for certain what the true causes are. As with the miracle fruit, there is always the opportunity for some part of the world to hinge on a hidden cause that wasn’t in the models that previously seemed most probable. As such, the goal of predictive modeling is not to be right, but to be less wrong. “All models are wrong, but some are useful”¹⁹. With the continued thoughtful inspection of purposefully collected data, we can create models of reality that give ever better prediction and are ever more useful.
A Note on Qualitative and Quantitative Models
When you make a mental model, like a model of what apple juice will taste like, it often feels different from a model that you have written down. This is particularly true if the model written outside your head has been formalized to include explicit rules, like “Apple juice always tastes sweet” or perhaps “IF consume apple juice THEN taste sweetness.” Your mental model certainly feels different if the written rule is “Consume apple juice ⇒ p(taste sweetness) = 0.873.”
The difference between mental models and formalized models has often been cast as “qualitative” vs. “quantitative” models. However, qualitative and quantitative models can act a lot alike. The gut predictions of a well-trained expert have a lot of the same properties as a well-trained artificial neural network. Both human minds and artificial networks can contain very accurate predictive models of the world, and they are created by observing potentially vast amounts of data. More importantly, in both cases the internal dynamics of the model can be unknown. A human expert can make an accurate prediction just because it feels right, and artificial neural networks can be horribly opaque in the meaning of what is being calculated inside. These two kinds of models are implicit: the structure of the model is unknown or unclear. Implicit models are in contrast to explicit models, where the structure of the model is known. The main thing that functionally sets qualitative models apart from many quantitative models is that their internal structure is unclear; a human may be able to somewhat explain their gut judgment, but often not completely enough to enable reproducing the model elsewhere.
Explicit models are more easily scrutinized, and thus more easily tested and improved. The implicit models of human judgment can be worse than explicit models if the latter have been repeatedly refined. However, when people are able to update their judgment in response to new data, then the implicit mental models of humans can still be part of the world’s best predictors²⁰. But regardless of whether models are explicit or implicit, quantitative or qualitative, for the purpose of making predictions they can be functionally the same. They all make predictions about the world, they all can be accurate or inaccurate, they all can be well- or poorly-calibrated, and they all can be updated in response to new data.
1. Ray Kurzweil, The Age of Intelligent Machines (The MIT Press, 1990).↩
2. David William English, The Air Up There (McGraw Hill Professional, 2003).↩
3. “New Energy Outlook 2016” (Bloomberg New Energy Finance, June 2016).↩
4. Persistent Forecasting of Disruptive Technologies (Washington, D.C.: National Academies Press, 2009), http://www.nap.edu/catalog/12557.↩
5. Galit Shmueli, “To Explain or to Predict?” Statist. Sci. 25, no. 3 (August 2010): 289–310, doi:10.1214/10-STS330.↩
6. Machine learning includes such disparate methods as artificial neural networks, random forests, and support vector machines. They all transform and associate data without assuming any particular causal structure.↩
7. M. Mitchell Waldrop, “The Chips Are Down for Moore’s Law,” Nature 530, no. 7589 (February 9, 2016): 144–47, doi:10.1038/530144a.↩
8. “The Cost of Sequencing a Human Genome,” July 6, 2016, https://www.genome.gov/27565109/the-cost-of-sequencing-a-human-genome/.↩
9. Randall Munroe, “Xkcd: Correlation,” March 6, 2009, https://xkcd.com/552/.↩
10. A. Philip Dawid, “The Well-Calibrated Bayesian,” Journal of the American Statistical Association 77, no. 379 (1982): 605–10, http://amstat.tandfonline.com/doi/abs/10.1080/01621459.1982.10477856.↩
11. This whole process is what happens in Bayesian statistics.↩
12. Shannon R. Fye et al., “An Examination of Factors Affecting Accuracy in Technology Forecasts,” Technological Forecasting and Social Change 80, no. 6 (July 2013): 1222–31, doi:10.1016/j.techfore.2012.10.026; Carie Mullins, “Retrospective Analysis of Technology Forecasting: In-Scope Extension,” August 13, 2012.↩
13. J. Doyne Farmer and François Lafond, “How Predictable Is Technological Progress?” Research Policy 45, no. 3 (April 2016): 647–65, doi:10.1016/j.respol.2015.11.001.↩
14. Stephen Thornton, “Karl Popper,” in The Stanford Encyclopedia of Philosophy, ed. Edward N. Zalta, Winter 2015, http://plato.stanford.edu/archives/win2015/entries/popper/.↩
15. Alan Musgrave and Charles Pigden, “Imre Lakatos,” in The Stanford Encyclopedia of Philosophy, ed. Edward N. Zalta, Summer 2016, http://plato.stanford.edu/archives/sum2016/entries/lakatos/.↩
16. There are many.↩
17. The data sets most associated with science are randomized control trials, which aim to control for all possible factors by randomly assigning objects to be tested in some way. This works great if you can readily place certain properties on objects, like taking an apple and making it ripe or unripe. However, this isn’t always so easy, so other kinds of data construction are also useful tools.↩
18. Or at least shares a common cause with the apple juice taste.↩
19. George E. P. Box and Norman R. Draper, Empirical Model-Building and Response Surfaces, 1st edition (New York: Wiley, 1987).↩
20. Philip E. Tetlock and Dan Gardner, Superforecasting: The Art and Science of Prediction (Crown, 2015); K. B. Laskey, R. Hanson, and C. Twardy, “Combinatorial Prediction Markets for Fusing Information from Distributed Experts and Models,” in 18th International Conference on Information Fusion, July 2015, 1892–98.↩