
People Lie, But Search Data Tell the Truth

Looking to Google for a revolution in social science.


Seth Stephens-Davidowitz, a former research assistant of mine, would not strike most people as a revolutionary. Yet in his new book “Everybody Lies,” he argues persuasively for a mutiny in social science.

The problem should be familiar to anyone who has followed political polling over the past few years, despite the successful predictions of Emmanuel Macron’s victory in France. Put simply, people tend to lie, both on surveys and on social media. As a result, when we study survey responses or what people say on social media, we come away with a misleading picture.

Rather than disparage surveys and social media posts, Stephens-Davidowitz points to a different way of understanding ourselves. In the ostensible privacy of online searching, he argues, we inadvertently reveal ourselves, and this digital truth serum offers the best way of finding out who we really are.

Examples abound. According to survey data, Americans overall are not particularly racist, and whatever racism does exist is concentrated in the South, a view the media often endorse. Yet online searches reveal a remarkable number of racist queries by Americans, and these searches are by no means limited to the South. Indeed, the highest rates of racist searches are found in places such as upstate New York, eastern Ohio and western Pennsylvania. The true racism divide, it turns out, is not North-South but East-West, with relatively little racist search activity west of the Mississippi River. This pattern correlates strongly with presidential election results; in the local areas with the highest share of racist online searches, Barack Obama substantially underperformed, and Donald Trump substantially overperformed.

Another example involves homosexuality. Survey data and social media profiles suggest that the proportion of men who report being gay is roughly twice as high in Rhode Island as in Mississippi. Yet Google searches for gay pornography vary little across the country and are only marginally higher in Rhode Island than in Mississippi, suggesting that survey responses and social media profiles in some states understate the true numbers. Indeed, in the states where such under-reporting may be larger, spouses tend to be more suspicious. The most searched-for term on Google after “Is my husband…” is not “cheating” or “depressed” but “gay,” and that question is asked far more often in states where fewer men report being gay.

Many other myths are exploded in the book, some by search data and some by other evidence. The notion that violent movies cause violence? Not correct. Crime data show that violence declines before, during and after the showing of violent movies. Perhaps people who would otherwise be inclined to commit violence go to see the violent movie instead. And given the association between drinking and violence, the diversionary effect may linger because movie theaters generally don’t serve alcohol.

Another accepted idea, first offered by the historian James McPherson, is that the Civil War caused Americans to shift common usage from “the United States are” to “the United States is.” Nope again: A search of digitized books shows no noticeable shift around the time of the war, and “the United States are” remained common for 15 or more years afterward.

Consider, next, the assumption that people start out liberal and become more conservative as they age. Again, not really. What seems to matter instead is an imprinting effect that occurs when people are 18. Americans born in 1941, for example, turned 18 during Dwight Eisenhower’s presidency, and throughout their lives they have leaned Republican by about 10 percentage points. A similar phenomenon applies to sports teams: A person’s favorite baseball team tends to be one that won the World Series during his or her childhood.

All of this would be merely amusing if it left us with only a collection of punctured myths. But Stephens-Davidowitz aims higher, writing that “Google searches are the most important dataset ever collected on the human psyche.” Therein lies the power of his new book: While acknowledging its limitations, Stephens-Davidowitz argues that big data can rescue social science from its garbage-in, garbage-out problem.

We are still early on this journey, but “Everybody Lies” makes a strong case that it’s the right road.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

    To contact the author of this story:
    Peter R. Orszag at porszag5@bloomberg.net

    To contact the editor responsible for this story:
    Mary Duenwald at mduenwald@bloomberg.net
