Image: When government surveys were fun. (Thomas Worth/MPI/Getty Images)

This Isn't 'Big Data.' It's Just Bad Data.

Peter R. Orszag is a Bloomberg View columnist. He is a vice chairman of investment banking at Lazard. He was President Barack Obama’s director of the Office of Management and Budget from 2009 to 2010 and the director of the Congressional Budget Office from 2007 to 2008.
With response rates that have declined to under 10 percent, public opinion polls are increasingly unreliable. Perhaps even more concerning, though, is that the same phenomenon is hindering surveys used for official government statistics, including the Current Population Survey, the Survey of Income and Program Participation and the American Community Survey. Those data are used for a wide array of economic statistics -- for example, the numbers you read in newspapers on unemployment, health insurance coverage, inflation and poverty. 

An article in the latest issue of the Journal of Economic Perspectives underscores the alarming decline in the quality of the data from these surveys. Take the Consumer Expenditure Survey, which plays a crucial role in the construction of the consumer price index. In the mid-to-late 1980s, less than 15 percent of households contacted for the survey failed to respond. By 2013, the non-response rate had risen to more than 33 percent. The likely explanation is that families increasingly view surveys, whether from political pollsters or government statisticians, as too time-consuming, annoying and intrusive.

It's possible to correct for this rising non-response by adjusting the weights attached to the households that do respond. Unfortunately, non-response is just a small part of the growing crisis in household surveys. A far bigger problem stems from the responses to the questions asked. Many responses are simply missing, either because the family refused to answer or because the interviewer failed to record the response. 
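To make the weighting adjustment concrete, here is a minimal sketch of the general idea, not the statistical agencies' actual procedure: responding households in a demographic cell have their weights scaled up by the inverse of that cell's response rate, so they stand in for similar households that didn't answer. The field names and cells are illustrative assumptions.

```python
# Illustrative non-response weight adjustment within demographic cells.
# Responders' weights are inflated so each cell's total weight is preserved.
from collections import defaultdict

def adjust_weights(households):
    """households: list of dicts with 'cell', 'weight', 'responded' keys."""
    contacted = defaultdict(float)   # total weight contacted, per cell
    responded = defaultdict(float)   # total weight responding, per cell
    for h in households:
        contacted[h["cell"]] += h["weight"]
        if h["responded"]:
            responded[h["cell"]] += h["weight"]
    adjusted = []
    for h in households:
        if h["responded"]:
            factor = contacted[h["cell"]] / responded[h["cell"]]
            adjusted.append({**h, "weight": h["weight"] * factor})
    return adjusted

sample = [
    {"cell": "urban", "weight": 1.0, "responded": True},
    {"cell": "urban", "weight": 1.0, "responded": False},
    {"cell": "rural", "weight": 1.0, "responded": True},
]
new = adjust_weights(sample)
# The urban responder now carries weight 2.0, standing in for the
# urban non-responder; total weight within each cell is preserved.
```

The catch, as the column goes on to explain, is that this only works if responders really do resemble the non-responders in their cell.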

The government statistical agencies then often impute a response, by using statistical techniques to guess at what the response would have been based on responses from other similar families. 
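One common family of techniques for this is "hot-deck" imputation, in which a missing value is filled with the observed value from a similar "donor" household. The sketch below is a simplified illustration with made-up field names, not any agency's actual method.

```python
# Simplified hot-deck imputation: fill a missing field with a value
# drawn from a donor household in the same demographic cell.
import random

def hot_deck_impute(records, field, rng=None):
    rng = rng or random.Random(0)  # fixed seed so the example is reproducible
    donors = {}
    for r in records:
        if r[field] is not None:
            donors.setdefault(r["cell"], []).append(r[field])
    out = []
    for r in records:
        if r[field] is None and donors.get(r["cell"]):
            r = {**r, field: rng.choice(donors[r["cell"]]), "imputed": True}
        out.append(r)
    return out

recs = [
    {"cell": "A", "ss_income": 1200},
    {"cell": "A", "ss_income": None},  # missing response
    {"cell": "B", "ss_income": 900},
]
filled = hot_deck_impute(recs, "ss_income")
# The missing cell-A value is filled from the only cell-A donor: 1200.
```

The flag on imputed records matters: it is what lets researchers compute the imputation rates the column cites.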

An example involves the Current Population Survey, which is used for many purposes, including as a gauge of household income and poverty. More than a third of the money recorded as being received by households from Social Security in this survey was imputed; about a quarter of the money recorded for Temporary Assistance for Needy Families, commonly known as welfare, was similarly estimated. These imputation rates have generally been rising over time.

All of that would be fine if the imputations were highly reliable. Unsurprisingly, that’s often not the case. The authors compare the dollars reported in the surveys (including the imputations) against administrative data on how much various programs actually sent to households. For TANF, the estimates suggest that half or less of the benefits provided are captured in official surveys. For food stamps, dollars are often undercounted by 30 percent or more. For Social Security, the bias is smaller, but still can range from 5 to 30 percent in major surveys. 
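The arithmetic behind these comparisons is simple, which is part of what makes the gaps so striking. The figures below are placeholders, not the paper's numbers:

```python
def undercount_rate(survey_total, admin_total):
    """Fraction of administratively recorded dollars missing from the survey."""
    return 1 - survey_total / admin_total

# e.g., a survey capturing $70 billion of $100 billion actually paid out
rate = undercount_rate(70e9, 100e9)
# rate is about 0.30, i.e., roughly a 30 percent undercount
```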

These errors have been increasing over time as survey quality has deteriorated. And their impact is large: Adjusting for these gaps would, according to the authors, reduce the overall poverty rate by more than 2 percentage points and the poverty rate for single mothers by more than 10 percentage points. That, in turn, suggests that policymakers are basing their decisions on data that are meaningfully different from what's happening in the real world.

So what is to be done? Three steps would be useful. The first is to more quickly link administrative records with survey data, to help correct for bias in the latter. The second is to explore new ways of combining the massive databases being collected in the private sector with official data. 

The final element is to protect funding for official surveys, to allow more efforts to reduce measurement error. For example, the Obama administration has sought money to resume follow-ups on data left incomplete by respondents in the American Community Survey. When funding for that follow-up was eliminated in 2013, the missing data rate rose from 5.5 percent to 8.5 percent. 

Instead of reinstating that funding, the House’s appropriations legislation for 2016 further cut overall funding for the survey (and also for the decennial census, on which the survey builds). Such penny-wise, pound-foolish budgeting will lead only to increasingly unreliable data -- and worse policy.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

To contact the author of this story:
Peter R. Orszag at porszag3@bloomberg.net

To contact the editor responsible for this story:
Christopher Flavelle at cflavelle@bloomberg.net