For political junkies, polls and predictions are almost irresistible. But they can also overload us with information that's at best confusing and at worst wrong.
Recent high-profile polling misses—from Israel and the U.K. to Michigan—have prompted concerns that election results can no longer accurately be forecast by surveying the electorate and that technological change is causing an irreversible decline in the industry.
That's why Bloomberg Politics undertook an analysis of hundreds of polls, as well as several common prediction models, in use during the 2016 presidential race.
The analysis examined 258 final projections covering 78 state primaries or caucuses—excluding the District of Columbia and overseas territories—from four predictors: RealClearPolitics, an aggregator of statewide polls; PredictWise, an aggregator of betting-market data; FiveThirtyEight, whose "poll-plus" prediction model considers statewide polls, national polls, and endorsements; and Bing Predicts, which combines prediction market data, polling, Internet queries, and social media posts.
The analysis showed how often readers can trust polls and other predictions, when they're most reliable (hint: later in the primary calendar), and which specific candidates have been most discounted this cycle.
Turns Out, the 2016 Polls Haven’t Been That Bad
There's good news for those still looking to polls for insight: In our study, polls, particularly when taken in aggregate, remain a very accurate way to predict elections, and big discrepancies between polls and results are more the exception than the rule.
Of the 524 individual poll predictions collected by RealClearPolitics and HuffPost Pollster conducted within one month of a state primary or caucus, 450 of them (86 percent) correctly forecast the eventual winner. When we strip out the two biggest misses for polling this cycle, the Iowa Republican caucuses and the Michigan Democratic primary, where 33 out of 38 poll predictions missed the mark, this increases the overall accuracy rate to 92 percent.
Events can move fast on the campaign trail, so looking at month-old polls may seem problematic, except that they turn out to be just as reliable, in the aggregate, as surveys conducted closer to when states vote. Of the 183 polls conducted within one week of voting, 157 correctly predicted the eventual winner, the same 86 percent accuracy rate as the full sample. Stripping out the Iowa Republican and Michigan Democratic polls conducted within one week of voting yields a 92 percent accuracy rate, again matching the whole sample.
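The accuracy figures above are simple ratios of correct calls to total polls. A minimal sketch, using only the counts reported in this analysis (and the stated breakdown of the 38 Iowa Republican and Michigan Democratic polls, of which just 5 were correct), shows how stripping out those two misses moves the rate:

```python
# Accuracy rates computed from the counts reported in the article.

def accuracy(correct, total):
    """Share of polls that named the eventual winner."""
    return correct / total

# Full sample: polls conducted within one month of voting
full = accuracy(450, 524)              # ~86 percent

# Excluding the Iowa GOP caucuses and Michigan Dem primary
# (38 polls in those contests, of which only 5 were correct)
trimmed = accuracy(450 - 5, 524 - 38)  # ~92 percent

# Final-week subsample
week = accuracy(157, 183)              # ~86 percent

print(f"{full:.0%} {trimmed:.0%} {week:.0%}")  # prints "86% 92% 86%"
```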
As for the final poll averages published by RealClearPolitics before voting, the aggregation site correctly predicted the winner 90 percent of the time across the 49 contests covered.
Looking more closely at the 13 pollsters that published at least 10 contest predictions across two or more states, we find a greater mix of results. Seven pollsters (including SurveyUSA) accurately predicted the winner at least 90 percent of the time, while five more (including YouGov and Public Policy Polling) were right at least 80 percent of the time. The real under-performer in the group, at 65 percent, was Quinnipiac University, which has a B grade from poll-obsessed FiveThirtyEight.com, the website run by former New York Times stats guru Nate Silver. This is largely because of several polls Quinnipiac conducted in Iowa that showed Donald Trump and Bernie Sanders narrowly in the lead. (Ultimately, only five of 25 Iowa polls correctly predicted Texas Senator Ted Cruz's eventual win in the Hawkeye State.)
Perhaps most surprising, in an era when phone surveys of likely voters are still considered the gold standard, online pollster SurveyMonkey has proved one of the most reliable, with a 92 percent track record. The site has polled 13 contests to date (virtually all on the Republican side) in seven states and has always targeted registered (not likely) voters.
The reason polls still matter is that asking people how they plan to vote is still the most direct and accurate way to figure out what they might do in a contest, said Cliff Young, president of public affairs in the U.S. for the market research firm Ipsos.
“We’re in a world now where our toolbox is much, much larger,” Young said. But “you’re always going to need a measure of public opinion.”
Of course, pollsters will continue to face an uphill struggle, needing to make complicated decisions and assumptions about which people are likely to vote and how best to reach a representative subset of them (whether by landline, cell phone, e-mail, or other means). This means careful polling will continue to be time-consuming and expensive in a way that, when combined with our hunger for predictions, is almost guaranteed to lead to a flood of slapdash surveys. For that reason, pollsters' methodologies are an important factor to consider when consuming the polls they publish.
Take the Michigan blunder, for instance. Hillary Clinton was projected to win the state's Democratic primary by 21.4 points in the final RealClearPolitics average, but instead Sanders won by 1.5 points. The problem was that many pollsters designed surveys to measure an electorate that looked—in size and demographics—like the one that had turned out in the state's last contested Democratic presidential primary, in 2008. Yet Barack Obama didn't even appear on the Michigan primary ballot eight years ago after the Democratic National Committee sanctioned the state for moving its primary earlier in violation of party rules.
Polls by themselves can vary widely in quality and reliability, but aggregations of polls, such as those on RealClearPolitics or FiveThirtyEight (which weights results by pollsters' previous accuracy and also factors in national polls and endorsements), can help smooth out methodological concerns and small sample sizes and allow trends to emerge.
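The smoothing effect of aggregation is easy to see in miniature. A hypothetical sketch: the poll numbers and pollster weights below are invented for illustration, and the weighted variant is only a rough stand-in for the kind of accuracy-weighting FiveThirtyEight describes, not its actual model.

```python
# Hypothetical illustration of poll aggregation.
# All numbers below are made up for the sketch, not real polling data.

def simple_average(polls):
    """Unweighted mean of a candidate's share across recent polls,
    in the spirit of a RealClearPolitics-style average."""
    return sum(polls) / len(polls)

def weighted_average(polls, weights):
    """Weight each poll, e.g. by the pollster's past accuracy,
    so historically reliable pollsters count for more."""
    return sum(p * w for p, w in zip(polls, weights)) / sum(weights)

candidate_share = [48.0, 52.0, 47.0, 50.0]  # hypothetical poll results
pollster_weight = [1.0, 0.6, 1.2, 0.9]      # hypothetical accuracy weights

print(simple_average(candidate_share))       # prints 49.25
print(weighted_average(candidate_share, pollster_weight))
```

Either way, one outlier poll moves the aggregate far less than it would move a single-poll headline, which is the point of averaging.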
Who’s the Best Predictor? Polls When You Have Them, Prediction Markets When You Don’t
At first glance, the poll aggregators included in this analysis look to be the most accurate predictors, correctly calling the various nominating contests they covered around 90 percent of the time. Prediction markets, as aggregated by PredictWise, a research project led by David Rothschild, an economist at Microsoft Research in New York City, were right nearly 86 percent of the time. Bing Predicts, which takes a more kitchen-sink approach by considering polls and prediction markets, as well as Internet searches and social media, fared slightly poorer at 83 percent.
What seemed to matter a great deal was the type of contest, with predictors faring much better when considering primary contests (of which there were 55 at the state level) than caucuses (of which there were 23). Looking at primaries, all four predictors were right at least 91 percent of the time across whichever contests each covered, beating their overall accuracy rates, with FiveThirtyEight's 94 percent performance the best of the group. The flip side of this universally strong performance is that all four predictors were clearly flummoxed by the more complicated communal pressures that are a hallmark of caucuses, with accuracy rates here ranging from as little as 65 percent for Bing Predicts to 75 percent for RealClearPolitics.
An important consideration when comparing these predictors, of course, is the sometimes wildly different sets of state nominating contests that each has covered or modeled to date. Across the 48 combined Democratic and Republican contests that were covered by all four sources in our analysis, FiveThirtyEight and PredictWise narrowly lead with a 92 percent accuracy rate, followed close behind by RealClearPolitics and Bing Predicts at 90 percent. In other words, where polling is plentiful, there's no real need to look to other sources.
Where polling is less readily available, such as in smaller caucus states out west, prediction markets do a good job filling the gap. Across the 28 nominating contests that PredictWise modeled but where there was insufficient polling for RealClearPolitics to calculate an average, PredictWise correctly predicted the winner 75 percent of the time. Bing, which covered 29 more contests than RealClearPolitics, was right over 72 percent of the time.
Unlike polls, which measure voter preference simply by asking about it directly, these other models incorporate soft factors: qualitative information about the race, often from pundits or news accounts, that can help politics-watchers get a better sense of the most important dynamics. Ipsos, for instance, incorporates models that include social data early in races but has found that the predictions tend to converge (or regular polls become even more accurate) as a race progresses and voter preference becomes the single dominating factor, according to Young.
A final factor we considered in our analysis is the impact of the actual primary calendar on predictions. As it turns out, predictors were more likely to go astray in the first five weeks of voting.
By March 8, around 56 percent of the nominating contests had already taken place, yet that same period accounted for 85 percent of all faulty predictions. The culprits: a much wider field of candidates (at least 12 on the Republican side alone), and pollsters often working from eight-year-old information about likely turnout and core issues.
“Early in the cycle there are more viable candidates. And, there is much less information about the likely voter space and sentiment of the voters,” said David Rothschild, of PredictWise. “Exit polls (turnout and sentiment) and voter files (turnout) are keys for sophisticated predictions of later races.”
How the Predictions Have Been Wrong About Cruz and Sanders
Runner-up candidates Cruz and Sanders sometimes say the media, with its breathless coverage of whatever poll's just been released, has given them short shrift. That kind of dismissal can become self-fulfilling, they argue, scaring away voters who want to align themselves with a winner. When it comes to recent predictions, which have clearly underestimated their campaigns in particular, they may have a point.
About 85 percent of prediction misses in the 2016 cycle occurred in contests where Trump or Clinton were expected to win, and often by sizable margins, only for Cruz or Sanders to pull off a surprise. This pattern encompasses 11 states in total, mostly in the Midwest: Iowa, Minnesota, Oklahoma, Kansas, Nebraska, and Michigan all featured predictions that missed the mark.
As it so happens, Trump and Clinton are currently favored by most of the predictors in Tuesday's primary in Indiana. The question now: Will this be yet another case of underestimating the underdogs, or have the Republican and Democratic front-runners amassed enough of a lead that Indiana will simply extend the predictor accuracy streak of the past six weeks?