Researchers Need to Recognize What They Don't Know
Researchers in the fields of social science and medicine are debating how to fix an increasingly recognized problem: A lot of their findings are either outright wrong or can't be replicated. Satisfying as it would be to have a simple solution, to some extent they might just have to learn to live with uncertainty.
The medical researcher John Ioannidis laid out the issue more than a decade ago, in a paper titled “Why Most Published Research Findings Are False.” Looking specifically at social science and biomedicine, he found that the limitations of available data, study design and the human mind meant that research outcomes often did little more than reflect the prevailing bias.
Others have found similarly troubling results. In one study, Federal Reserve economists were able to confirm the findings of less than half of a selection of 67 economics research papers. Another analysis estimated that 85 percent of biomedical research is wasted, because results either never get published or rest on unscientific study design.
One big weakness is a measure that researchers commonly use to assess their results. Known as a p-value, it estimates how likely a result at least as striking would be to arise by pure chance if there were no real effect -- that is, by statistical fluke. If that likelihood is below 5 percent (a p-value of less than 0.05), the result is deemed “statistically significant.” Yet researchers have so much freedom in designing studies that even the well-intentioned often fool themselves, tweaking the data and analysis until they yield “significant” results for the wrong reasons.
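The fluke problem is easy to see in a small simulation (a hypothetical sketch, not from the column; the groups, sample sizes and helper function are illustrative assumptions). If two groups are drawn from the exact same distribution, so there is truly nothing to find, roughly 5 percent of comparisons will still clear the 0.05 bar by chance alone:

```python
import math
import random

random.seed(0)

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    # Convert the z score to a two-sided p-value via the normal CDF.
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# Run 2,000 "experiments" where both groups come from the same population,
# so every "significant" result is a false positive.
trials = 2000
false_positives = sum(
    two_sample_p([random.gauss(0, 1) for _ in range(50)],
                 [random.gauss(0, 1) for _ in range(50)]) < 0.05
    for _ in range(trials)
)
print(f"false positive rate: {false_positives / trials:.3f}")  # roughly 0.05
```

And that 5 percent rate assumes a single honest test per study; once researchers try many analyses and keep the one that "works," the effective rate climbs much higher.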
What to do? Some researchers want to set a more challenging threshold, lowering the required p-value to 0.005 (0.5 percent). While this could eliminate some borderline studies and reduce the number of papers published, it might also suppress some useful work. Experiments involving people are subject to a lot of natural variation, and researchers must often work with small data sets. As a result, some legitimately interesting effects or relationships deserving further study might not meet a p-value test.
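The trade-off described above can also be simulated (again a hypothetical sketch; the effect size, sample sizes and helper function are assumptions for illustration). Here a modest but real effect exists, yet with small samples the stricter 0.005 threshold misses it far more often than 0.05 does:

```python
import math
import random

random.seed(1)

def two_sample_p(a, b):
    """Two-sided p-value for a difference in means (normal approximation)."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    z = (ma - mb) / math.sqrt(va / na + vb / nb)
    return 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

# A real but modest effect (mean shift of half a standard deviation),
# studied with small samples of 20 people per group.
trials = 2000
detected_05 = detected_005 = 0
for _ in range(trials):
    p = two_sample_p([random.gauss(0.5, 1) for _ in range(20)],
                     [random.gauss(0.0, 1) for _ in range(20)])
    detected_05 += p < 0.05
    detected_005 += p < 0.005
print(f"detected at p < 0.05:  {detected_05 / trials:.2f}")
print(f"detected at p < 0.005: {detected_005 / trials:.2f}")
```

In runs like this, the 0.05 threshold catches the effect only about a third of the time, and the 0.005 threshold catches it far less often still -- the genuine relationships that a tougher cutoff would suppress.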
Another group of statisticians has sensibly suggested that researchers drop the whole illusion of precision. Focusing on a specific p-value threshold, they argue, puts too much emphasis on one line of evidence. Such measures should be viewed in the context of other information, including the size of the effect and the presence of a plausible explanation for its existence.
Beyond that, various practices can help researchers avoid fooling themselves. Consider how physicists searching for evidence of gravitational waves kept their colleagues on their toes: They secretly authorized several team members to inject false data into the detectors, forcing everyone to take every possible precaution to ensure that they wouldn’t take some fluke for the real thing. Elsewhere, journals are pushing researchers to better document their work, and are moving to approve and register papers earlier in the process -- an approach that refocuses scrutiny on the quality of the scientific approach and the importance of the research question, rather than on the statistical significance of the results.
Medicine and the social sciences face a daunting task: They’re trying to answer questions that are both urgent for humanity and extremely difficult, due to the inherently complex nature of their subjects. Instead of seeking certainty, they might have to recognize that some problems overwhelm our current abilities to tease out any useful insight. No use of statistics can get around that.
This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.
To contact the editor responsible for this story:
Mark Whitehouse at email@example.com