Beginning in early September, we started aggregating the internal demographic and partisan data of national polling by 18 outlets. In part, this was an effort to encourage greater transparency by pollsters. In the week before the election, we pulled together an historical view of this data, highlighting a few of the emerging trends, and then filled the Decoder with the most recent high-quality polling data we could find: Our final pre-election data set includes national polls from 10 outlets, all of which were conducted after FBI Director James Comey's October 28 letter to Congress. Here's what the polls indicated on Nov. 8 about partisan turnout, partisan loyalty, and the independent vote—as well as voters' leanings parsed by race, gender, age and education. We've also included the 2012 and 2016 exit polls, and a list of takeaways. All polling is flawed—something the Decoder said on day one. This project is an attempt to better understand what it can, and can't, tell us.
The distribution of results for each poll gives an idea of how much each group—by race, gender, age, education, and party ID—counted in each pollster’s sample. Like the layout above, the candidate margin is still represented on the horizontal axis, but we’ve added the size of each group to the vertical axis. This lets you see how widely the pollsters varied in their estimates of proportion. (Not all polls reported share data for every demographic slice.)
Published October 25, 2016
People love to take potshots at pollsters. But anyone doing pre-election polling is putting a number out there, completely in the public view, which people will eventually judge as right or wrong. I give pollsters great credit for sticking their heads outside the foxhole. Polling is hard, and getting harder: Response rates are declining, different segments of the electorate are hard to reach, and we simply don’t know what the actual electorate is going to be—who’s really going to show up to vote on Nov. 8.
Polling is a science, but it’s also an art. It involves modeling, hypothesizing, even highly educated guessing. The margin of error—a statistical calculation that says how confident you are in the result, given the number of people you talk to—only begins to capture its uncertainty.
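To make the margin of error concrete, here is a minimal sketch of the standard calculation for a simple random sample. The function name and the example sample size are my own; the formula itself is the textbook one (a 95 percent confidence interval uses a z-score of 1.96, and the interval is widest when support is split 50-50):

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """95% margin of error for a proportion p in a simple random sample of n."""
    return z * math.sqrt(p * (1 - p) / n)

# A typical national poll of 1,000 respondents:
print(round(margin_of_error(1000) * 100, 1))  # → 3.1 (points)
```

Note what this does not capture: the formula assumes a perfect random sample, so it says nothing about non-response bias, coverage gaps, or bad likely-voter models—the very sources of error the rest of this piece is about.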
Different pollsters use not only different methods to gather data (live phone, robo-calling, Internet) and different sample sources (random-digit dialing, sampling from a voter file, and more), but also different methodologies to estimate the shares that key demographic groups will ultimately contribute to the electorate. These variables (plus random error, which is widely ignored by most casual observers) are why polls designed to measure the same electorate, conducted at roughly the same time with the same sample size, can tell such different stories.
Pollsters try to get a handle on two things for each of these demographic groups: How many voters in each group support a particular candidate, and how many will actually show up to vote.
You can glean a lot from looking beneath the top-line horse race numbers at the internals of a poll—not just what pollsters learned when they interviewed voters, but also the assumptions and models they used, which are built on what occurred in the past. Can Hillary Clinton reassemble the Obama coalition, or will an enthusiasm gap keep some home? Will Trump bring new voters—the sort that haven’t voted in the past—into the process? And how loyal will these different segments of potential voters be? White voters will almost certainly cast the majority of their votes for Trump, but by what margin? And what proportion of the vote will they make up? No one knows for sure.
Because the United States has such a huge, diverse population and because there are differences in the turnout rates of these diverse groups, the only way pollsters (and campaigns) can even begin to evaluate estimates of the electorate is by scrutinizing demographic subsets. When you look closely at turnout, you can easily see how demographic groups can punch above or below their weight—young people for instance, tend to be difficult to get to the polls, whereas their parents vote in higher numbers—and why modern presidential campaigns have become so focused on the ground game, targeted messaging, and other tactics.
Put another way, elections are about share and performance. Candidates for office seek to maximize the turnout (share) of segments of the electorate that are likely to support them while trying to maximize how they do (perform) with various groups in the electorate. And when it comes to pre-election surveys, accurate polling is about getting share and performance right.
Age, race, gender, education, and party identification are examples of demographic or partisan categories that are considered to be voter segments. An election outcome—or any given poll result—is the sum of the products of the size of each segment and how each candidate is doing with that segment.
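That "sum of products" arithmetic can be sketched in a few lines. The shares and support numbers below are purely illustrative, not drawn from any actual poll:

```python
# Hypothetical segments: (share of electorate, candidate A support, candidate B support)
segments = {
    "white":  (0.70, 0.42, 0.52),
    "black":  (0.12, 0.90, 0.07),
    "latino": (0.11, 0.68, 0.28),
    "other":  (0.07, 0.60, 0.33),
}

# The top-line result is the sum over segments of (size × performance).
a_total = sum(share * a for share, a, b in segments.values())
b_total = sum(share * b for share, a, b in segments.values())
```

With these made-up numbers, candidate A edges ahead—and shifting either a segment's share or a candidate's performance within it moves the top line, which is exactly why pollsters must get both right.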
We know what a poorly chosen survey looks like—a “pollster” goes to the mall and talks to whoever will consent to talk. A famous example of a problematic survey (although a highly entertaining one) is the Kinsey Reports. These were voluntary, opt-in surveys; the people who answered were the people who wanted to answer. When the results came back, they appeared to show that Americans were a bit more adventurous than previously believed. But there is very good reason to doubt the result. It’s a reasonable hypothesis that the kind of person who’s likely to opt into a poll about their sex practices may possess certain other idiosyncrasies.
The same is true of the subject at hand: The people who are most likely to answer a poll about politics are people who are fired up about politics, passionate about sharing their opinion. This is a situation that pollsters in our era are at pains to avoid.
The textbook definition of a well-chosen sample is what’s known as a random probability sample: Everyone in the target population, the one we’re trying to measure, has a chance of being chosen. With a hat tip to my first boss, Murray Edelman of CBS News, we can use the analogy of soup. Say I have people coming to dinner, and I want to make a big pot of soup. Before everyone comes over, when I want to make sure I’m not giving my guests bad soup, what do I do? Do I drink the entire pot of soup? No, I wouldn’t have anything left to serve. So I take a taste.
But what happens if I take just a little off the top? That might not be a representative flavor—the good stuff may have sunk to the bottom. So I stir it up. Good pollsters do with their polls what I’m doing with my soup. I’m not tasting every droplet—but every droplet of soup has a chance of being in my spoon.
That means I need to avoid missing big chunks of people, as pollsters used to when they called only landlines and not cell phones. Pollsters have largely fixed that, and the better polls now draw 40 to 60 percent of their responses from cell phones.
But choosing a good sample is just a first step. I may have drawn a statistically correct random probability sample of people I need to call. Now I have to contact them. But response rates are low, even after multiple callbacks—and they have been declining for decades, dropping from about 50 percent 20 years ago to an average of around 7 percent now.
That’s not a problem if the 7 percent of people I manage to contact is a random sub-sample of our entire sample. But we know, intuitively, that it’s probably not. Indeed, looking at census data will usually prove it. We may not have enough men, Latinos, young people, people without college degrees, or whatever.
In order to bring the sample back in line with previous measurements, pollsters weight. (The technical term is “post-stratification.”) In other words, because of sampling error or non-response, a particular survey may have too few young people, too many educated people, or too few black respondents. The exercise of weighting is about getting share right for those segments whose true proportion we know. For example, if we know (and we do) the exact proportion of the population made up of people over 65 years of age and under 29 years of age, we can weight the sizes of these groups in our sample to those actual shares. In telephone polling, it is almost always the case that pollsters succeed in talking to far too many older voters and far too few younger voters—the opposite can be true with Internet samples—so age weighting is ubiquitous.
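A minimal sketch of that post-stratification step, using made-up numbers for a phone sample skewed old (the group labels, counts, and target shares are illustrative assumptions, not real census figures):

```python
# Hypothetical raw sample of 1,000 respondents, skewed toward older voters.
sample_counts = {"18-29": 80, "30-49": 220, "50-64": 300, "65+": 400}
# Known (assumed, for this example) population shares to weight toward.
target_share  = {"18-29": 0.21, "30-49": 0.34, "50-64": 0.26, "65+": 0.19}

n = sum(sample_counts.values())
# Weight = target share / observed share, applied to every respondent in the group.
weights = {g: target_share[g] / (sample_counts[g] / n) for g in sample_counts}
# Each 18-29 respondent now counts as 0.21 / 0.08 = 2.625 "people";
# each 65+ respondent counts as only 0.19 / 0.40 = 0.475.
```

After weighting, the effective group sizes match the targets exactly—but, as the next paragraphs explain, that only helps if the respondents you did reach in each group resemble the ones you didn't.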
Sometimes weighting has an impact on the horse race number, and sometimes it doesn’t. For example, if there were little or no difference between the attitudes (or turnout at the polls) of people under 29 and people over 65, it wouldn’t matter if you had too few or too many of either age group. But, again, we know intuitively that this is not the case.
No matter what, it is best practice to weight data to known shares—which means everything except party identification; more on that later—and cope with the challenges. One big challenge is that weighting assumes the responses we get from each group are valid measures of the overall attitudes of that group, and that the only problem is that the group’s size in the sample is too large or too small. In other words, the age example above assumes that the young people the pollster talked to had, on average, the same attitudes as those the pollster didn’t talk to. We don’t know that for sure.
I was reminded of this all too clearly in an Iowa poll I conducted in September 2008 with my former colleague and good friend, Charles Franklin, who is now director of the Marquette Law School Poll. We didn’t have enough 18- to 29-year-olds in our sample—and we knew how many there should be in the general population—so we up-weighted them. The results went from a slight Obama lead to a McCain lead. We thought: That doesn’t make sense. How, if we gave more weight to the responses of the young people we did have, did it get better for McCain? We failed to consider that the 18- to 29-year-olds we did have, whom we had reached on their landlines, were living at home in Iowa with their parents and were more likely to be religious and conservative. We didn’t have enough young people, but the ones we had were too Republican. So when we fixed their size, we made it worse; we amplified their impact.
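The arithmetic of that Iowa surprise can be sketched with invented numbers (these are not the actual 2008 figures): if the young respondents you reached lean the "wrong" way relative to young voters overall, up-weighting them to their true share amplifies the error.

```python
def weighted_margin(groups):
    """groups: list of (share, obama_support, mccain_support); returns Obama's margin."""
    return sum(share * (obama - mccain) for share, obama, mccain in groups)

# Raw sample (illustrative): young respondents are only 8% of the sample
# and—because they were landline-reachable—lean McCain by 10 points.
raw      = [(0.08, 0.45, 0.55), (0.92, 0.51, 0.49)]
# Up-weighted to an assumed true 17% share of the electorate.
weighted = [(0.17, 0.45, 0.55), (0.83, 0.51, 0.49)]

# Raw data shows a narrow Obama lead; up-weighting the unrepresentative
# young subsample flips the margin toward McCain.
```

The fix for the group's size made the poll worse because the group's measured attitudes were themselves off—exactly the trap described above.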
An additional challenge is that when it comes to pre-election polling, it’s often not clear what the population targets should be. We have precise data on the demographics of the general population and good data on the demographics of registered voters, but we don’t have “truth” on the demographics of who will vote. Pollsters use a variety of methods to determine whether the people they are speaking to are likely to vote, from simply asking whether they will vote to applying more complex models that weight respondents on a variety of questions, such as whether they know where their polling place is. Differences in those approaches are one of the chief sources of variance in polls, and they can easily throw things off. For example, in 2012, one of the main reasons some polls (namely, Gallup’s and the Romney campaign’s) missed the mark was that their weighting scheme assumed whites would make up a greater share of the electorate than they did on Election Day.
Party identification is the characteristic, or attitude, most closely tied to vote choice—and it matters whether you think party attachments are a characteristic or an attitude. Some scholars and pollsters believe party identification is akin to a demographic characteristic (such as age or gender) or—a bit less strongly—to a religious or sports-team attachment. Others believe party identification is an attitude like any other and varies with the times. I’m closer to the fixed-attachment school. But because of population replacement and changing individual attitudes, I believe that levels of partisanship can change gradually, more like a slow-turning battleship than a quick-tacking sailboat.
For example, after 2004, as people gave up on the war in Iraq and there was general disapproval of George W. Bush, party identification shifted away from Republicans. That was the biggest change in my lifetime. The rough parity of 2004 jumped to plus-7 percent for the Democrats in 2008.
Party identification happens to be one of the biggest factors influencing polling in this election. The current polls are pretty much in agreement that while Democrats may be a bit more loyal to Clinton than Republicans are to Trump, partisans are supporting their respective nominees by large majorities. Where polls differ, especially state polls, is often in their estimate of the share of partisans in the likely electorate.
Polls in the last couple of weeks seem to show a dip in Democrats' support for Clinton. Is this a real change, or, given recent news, are Democrats just less likely to answer surveys? Perhaps we are seeing in the current race what we saw in 2012, when the race tightened in the polls after the first presidential debate. The margin narrowed, but there is some evidence this was partly due to decreased participation by Democrats in polls. President Obama, by his own admission, did not do well in that debate, and demoralized Democrats may have wanted to retreat a little—an effect that shows up in polling: "I don’t feel like talking for 20 minutes on the phone about politics.” It’s like when your team loses and you skip the article about the game the next day.
All the news about Hillary Clinton—the Clinton Foundation, the e-mails, the stumble leaving the 9/11 memorial—may be having the same depressive effect.
You can easily see how pollsters would be tempted to weight to party identification. Do they? Some do, some don’t, and those that do would never admit it. Differences in current polls, however, are not completely explained by differences in party identification. In this election, it seems hard to get a read on independents—who dislike both candidates. In the two polls granting Clinton her largest advantage (Suffolk and Monmouth), she is winning independents. In polls where the race is tied, or where Clinton’s lead is more modest, she is losing independents.
The beauty—and for some pollsters, the tragic flaw—of polling is that all these questions will be answered on Election Day. But be charitable with the ones who got the race wrong. To be right, you’ve got to be good and lucky.
Ken Goldstein is the polling analyst for Bloomberg Politics and a professor of politics at the University of San Francisco.