Statistical Significance Is Overrated

Just because something can be observed doesn't mean that it's important.

Photo caption: Seeking R-squared. Photographer: Ann Johansson/Corbis/Getty Images

Ronald A. Fisher, one of the fathers of modern statistics, reportedly got on the nerves of many of his contemporaries. But if there’s a reason we should be annoyed with Fisher today, it’s for coining the misleading term “statistical significance.”

Those two words don’t necessarily mean that a finding is important or that an effect is big. They only mean that the effect is clearly detectable. Roughly speaking, statistical significance indicates how unlikely a result would be if nothing but random noise were at work. As a measure of that, it already has some major problems. In 2016, the American Statistical Association issued a statement cautioning against excessive reliance on p-values, a common measure of statistical significance. Researchers such as Andrew Gelman, John Ioannidis and many others have demonstrated how these measures can be misused and misinterpreted to make accidental results seem real.

But beyond these methodological problems, the idea of statistical significance has also distorted the way people read about and understand scientific findings.

First, statistical significance doesn’t tell you how strong an effect is. For example, some studies find that low-skilled immigration drives down the wages of American workers without college degrees. This finding is statistically significant, but that doesn’t tell you how big the effect is. Look closer and you notice that most of the papers find a very small impact -- an inflow of 12 million low-skilled workers is predicted to reduce the wages of U.S. high-school dropouts by a few percentage points at most. That’s an effect worth considering, of course, but it’s not as big a deal as most immigration opponents would have us believe.

How could such a small effect be statistically significant? The answer is: lots and lots of data. When researchers gather huge amounts of information, even tiny effects can be detected. In the age of big data, it’s becoming easier and easier to find statistically significant needles in haystacks.
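To make that concrete, here is a minimal sketch in Python. It assumes a hypothetical effect equal to 2 percent of a standard deviation and a simple two-group comparison with a normal approximation; the numbers and the function names `expected_z` and `two_sided_p` are invented for illustration and come from no study mentioned here. It computes the z statistic you would typically expect at several sample sizes and the p-value that goes with it.

```python
import math


def expected_z(effect_in_sd: float, n_per_group: int) -> float:
    """Typical z statistic for a difference in means between two equal groups.

    With an outcome whose standard deviation is 1, the standard error of the
    difference is sqrt(2 / n), so z grows with the square root of n.
    """
    return effect_in_sd / math.sqrt(2.0 / n_per_group)


def two_sided_p(z: float) -> float:
    """Two-sided p-value for a z statistic under a standard normal null."""
    return math.erfc(abs(z) / math.sqrt(2.0))


EFFECT = 0.02  # hypothetical true effect: 2% of a standard deviation

for n in (100, 10_000, 1_000_000):
    z = expected_z(EFFECT, n)
    print(f"n per group = {n:>9,}  z = {z:6.2f}  p = {two_sided_p(z):.3g}")
```

The effect is exactly the same size in every row. Only the amount of data changes, and with it the p-value, which falls from nowhere near the usual 0.05 cutoff at 100 observations per group to far below it at a million.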

Because Fisher appropriated the word “significance” to mean detectability, there’s no commonly accepted term for the size of an effect. Economists sometimes call effect size “economic significance,” and biologists sometimes call it “biological significance,” but these terms are specific to their fields, and rarely get picked up by the press. As a result, many people reading about the latest research results in the news don’t realize how minuscule some of those findings are.

A second problem happens when people confuse statistical significance with explanatory power. For example, take the question of why people voted for Donald Trump. Many explanations have been advanced -- racial resentment, economic anxiety, hostile sexism, authoritarian attitudes and a feeling of voicelessness. Each of these reasons has adherents who believe it’s the key to understanding Trumpism. Each comes armed with studies showing statistically significant correlations between their preferred factor and Trump support.

But you don’t often hear about how much of Trump’s support is explained by each. In statistics, there are measures of how well a model can explain the observed data -- the most famous of these is called R-squared, which measures the share of the variation in the data explained by a researcher’s model. If you have an R-squared of one, your model explains everything. But if R-squared is close to zero, most of the reasons for whatever you’re studying remain a mystery.
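A rough illustration of how a factor can be statistically significant and still explain very little: the Python sketch below uses simulated data, not real survey results. The sample size, the slope of 0.1 and the variable names are all assumptions chosen for the example. It fits the simplest possible model, one factor predicting one outcome, and reports both R-squared and the p-value for the relationship.

```python
import math
import random

random.seed(0)  # fixed seed so the sketch is repeatable

N = 20_000
SLOPE = 0.1  # hypothetical weak relationship between factor and outcome

# Simulated data: the outcome depends slightly on x and mostly on noise.
x = [random.gauss(0.0, 1.0) for _ in range(N)]
y = [SLOPE * xi + random.gauss(0.0, 1.0) for xi in x]

mean_x = sum(x) / N
mean_y = sum(y) / N
sxx = sum((xi - mean_x) ** 2 for xi in x)
syy = sum((yi - mean_y) ** 2 for yi in y)
sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

# For a one-variable linear regression, R-squared is the squared correlation.
r = sxy / math.sqrt(sxx * syy)
r_squared = r ** 2

# t statistic for the slope; with N this large a normal approximation is fine.
t_stat = r * math.sqrt((N - 2) / (1.0 - r_squared))
p_value = math.erfc(abs(t_stat) / math.sqrt(2.0))

print(f"R-squared = {r_squared:.4f}")
print(f"t = {t_stat:.1f}, two-sided p = {p_value:.3g}")
```

In a setup like this, the factor clears any conventional significance threshold with room to spare, yet R-squared comes out at roughly 0.01. The relationship is real, but about 99 percent of the variation in the outcome is left unexplained.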

I’ve never seen a model that manages to get close to explaining all of Trump’s support with factors like these. The closest I’ve seen is a study claiming that racist and sexist attitudes explained two-thirds of the gap between college-educated and non-college-educated white voters. Even this claim, if it turns out to be true (the highest R-squared I could find in one of their tables was 0.5), would leave a substantial fraction of the gap unexplained.

So people who look at these studies and say that “Trump voters are authoritarians” or “Trump voters are racist” need to realize that there are many different factors associated with Trump support, some of which nobody even knows. Just because these factors are statistically significant doesn’t mean they’re the whole story.

Many people who do statistical research in fields like economics are taught to ignore R-squared and other measures of explanatory power, and focus more on statistical significance. My hunch is that this attitude comes from pharmaceutical testing -- if you’re looking for a drug that reliably reduces headaches, you’re mostly interested in verifying that it works so that you can start selling it to customers. That means you want to focus on statistical significance. But when trying to explain big social phenomena, the question of whether we know the whole story or only a small piece of it is often of central importance.

So when you read about research findings, remember that statistical significance is only part of what you need to know. How strong an effect is, and how important it is in the real world, might matter even more.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.