Political Prediction Isn't Dead. But It Needs to Adapt
2016 has been difficult for the political data industry. Predictions have been confounded and data-driven campaigns have fallen short. While the mishaps have sometimes involved small errors in contests that happened both to be close and hugely significant, quantitative analysis has lost its mojo and is under fire.
Some of the criticism is unfair. The U.K.'s Brexit referendum, for example, was a one-time event without recent precedent, so polling was always going to be difficult. But it would also be wrong to downplay the challenges that the numbers game faces. While opinion polls have margins of error, that doesn't explain all of them missing in the same direction. Likewise, offsetting errors that happen to cancel each other out, as happened at the U.K. election in 2010 and may have afflicted U.S. national polls, should be taken just as seriously as those that cause wrong calls.
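To put rough numbers on that point, here is a minimal sketch. It assumes simple random sampling, polls that are unbiased and independent of one another, and no exact ties. The function names are illustrative and are not taken from any pollster's methodology.

```python
import math


def margin_of_error(share, sample_size, z=1.96):
    """Approximate 95% margin of error for one poll's estimate of a vote share.

    Assumes a simple random sample; real polls use weighting, which
    typically widens the margin.
    """
    return z * math.sqrt(share * (1 - share) / sample_size)


def chance_all_miss_same_way(n_polls):
    """Chance that n unbiased, independent polls all err in one direction.

    Each poll is assumed equally likely to land above or below the true
    figure, so the chance of all being too high is 0.5 ** n, and the
    same again for all being too low.
    """
    return 2 * 0.5 ** n_polls


# A 1,000-person poll showing a 50% share has a margin of about 3 points.
print(round(margin_of_error(0.5, 1000) * 100, 1))

# Ten such polls all missing the same way would be roughly a 1-in-500 event.
print(chance_all_miss_same_way(10))
```

Under those assumptions, sampling noise alone would almost never push every poll the same way, which is why a uniform miss points to a shared source of error rather than bad luck.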
What's clear is that getting a representative sample and predicting who will actually get to the ballot box are both getting harder. The former can be reliably achieved only by physically knocking on doors as many times as necessary, a labor-intensive exercise that's expensive to undertake. The latter is no easier: the trend of unlikely voters turning out on election day continues to be a significant headache for those of us who make our living from political predictions.
That's because politics itself has changed, and has done so rapidly. Attitudes toward social progressivism -- shown by studies in several countries to be highly correlated with education -- are now playing a significant part alongside the traditional left-right divide over the size and role of the state.
While that's an interesting development, it isn't in and of itself an issue for pollsters. It does become a factor, though, when it changes voting patterns. And what appears to have been happening recently isn't simply that populist insurgencies have attracted strong support among whites without college degrees. Crucially, voter turnout within that demographic -- often reliably low in recent years -- has been increasing. The effect of those new actors can be particularly stark in low turnout countries, including the U.K. and U.S.
One charge frequently leveled at political soothsayers is that they ought to get out of the big cities and talk to a wider range of people around their countries. That's a sensible suggestion; but to measure public opinion accurately in doing so, you need to find representative samples of people in representative locations and ask them identical questions. In other words, you end up conducting opinion polls -- encountering exactly the same challenges already outlined.
The results of real elections, such as midterms, can be a useful guide in some countries, as they were before the 2015 U.K. general election, but the approach doesn't work everywhere. It also involves a number of biases that need to be adjusted for, and which aren't necessarily stable over time.
Social media sentiment analysis is one of the most exciting areas of potential research, but it's also fraught with potential pitfalls. Determining whether those expressing opinions in public are representative of the country (geographically, demographically or politically) is one challenge. But a more serious problem is the lack of history. The aforementioned analysis of polling before the 2015 U.K. general election was based on decades of electoral data; the social media era provides a wealth of tweets, but only a tiny number of elections against which to backtest them thus far.
Some have even gone as far as to suggest that polls undermine the democratic process and should be banned. This rests on the perception that polls influence voting behavior. However, it is not the polls themselves that are doing the influencing, but expectations of the outcome. In a world with opinion polls, those expectations can be informed by science -- however imperfect -- conducted in good faith. Without polls, they would be based on rumor, briefing, cherry-picked internal data and personal biases. Besides which, how many of the commentators blasting the polls for not anticipating a Trump victory would even have thought it close in the absence of polling?
So what needs to change? More work needs to be done on poll samples and likely-voter analysis, particularly among politically disengaged groups. Forecasting models can also often be improved to better allow for the fact that data is imperfect. A key reason for the huge variation between forecasters at the U.S. presidential election was the way they dealt with correlated errors that repeat across several geographic regions, as opposed to random polling flaws.
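The difference between correlated and random errors can be illustrated with a small simulation. This is a sketch under stated assumptions, not a reconstruction of any forecaster's model: the number of regions, the size of the polling lead, the error size and the function name `upset_probability` are all hypothetical.

```python
import random


def upset_probability(lead, total_sd, shared_share,
                      n_regions=5, n_sims=20000, seed=0):
    """Estimate how often a poll leader loses a majority of regions.

    lead          polling lead in every region, in points (assumed equal)
    total_sd      standard deviation of the total polling error, in points
    shared_share  fraction of error variance common to all regions
                  (0 = purely random flaws, 1 = fully correlated)
    """
    rng = random.Random(seed)
    shared_sd = total_sd * shared_share ** 0.5
    local_sd = total_sd * (1 - shared_share) ** 0.5
    upsets = 0
    for _ in range(n_sims):
        # One error component hits every region at once.
        shared_error = rng.gauss(0, shared_sd)
        regions_lost = 0
        for _region in range(n_regions):
            # Each region also gets its own independent error.
            result = lead + shared_error + rng.gauss(0, local_sd)
            if result < 0:
                regions_lost += 1
        if regions_lost > n_regions // 2:
            upsets += 1
    return upsets / n_sims


# Same lead and same total error in both cases; only the correlation differs.
independent = upset_probability(lead=3, total_sd=3, shared_share=0.0)
correlated = upset_probability(lead=3, total_sd=3, shared_share=0.75)
print(independent, correlated)
```

With these illustrative inputs, the leader's chance of an overall upset is several times higher when most of the error is shared across regions, even though each individual regional poll is exactly as accurate as before. A model that treats regional misses as independent will therefore tend to look more confident than one that allows them to move together.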
But how the data is communicated also matters. One particular problem seems to involve how probabilities between about 60 and 90 percent tend to be misinterpreted. It's intuitive that a binary outcome with a 50 to 60 percent probability is only a narrow favorite; likewise, one with a likelihood of 90 to 100 percent is a near certainty. But when it comes to something like the 70 to 80 percent range -- likely, but nowhere near certain -- audiences are likely to struggle. They don't instinctively think in terms of probability distributions, and may treat such events as certainties when that is not the message from the numbers.
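One way to make such numbers easier to read is to restate them as frequencies. The snippet below is a minimal sketch of that idea; the function name and the "1 in N" framing are illustrative choices, not an established reporting standard.

```python
def upset_frequency(win_probability):
    """Return N such that the favorite loses roughly 1 time in N.

    Expects the favorite's win probability, from 0.5 up to but not
    including 1.
    """
    if not 0.5 <= win_probability < 1:
        raise ValueError("win_probability must be in [0.5, 1)")
    return round(1 / (1 - win_probability))


for probability in (0.75, 0.8, 0.9):
    n = upset_frequency(probability)
    print(f"{probability:.0%} favorite: loses about 1 time in {n}")
```

Framed this way, a 75 percent favorite is someone who loses about one contest in four, which is far from a certainty.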
Political data has had problems in the past and will again in the future. But, like any science, it can and will adapt. Social media analysis may, in time, become a useful indicator as it becomes better understood and more thoroughly tested against outcomes. The more data there is on unlikely voters casting a vote, the more flexible models can become in adjusting for a shift in demographics. But in the absence of a proven alternative, the existing political data framework, imperfect as it is, remains the best we have to work with.
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.
To contact the author of this story:
Matt Singh at email@example.com
To contact the editor responsible for this story:
Mark Gilbert at firstname.lastname@example.org