For Better Schools, Start With the Evidence
We’d like to think that most government regulations begin with a firm grounding in evidence that they’ll have the intended result. After all, U.S. federal regulations must undergo a cost-benefit analysis to determine that they make economic sense, and the opinions of Congressional Budget Office analysts on the fiscal impact of reforms can help sway votes. Even so, most policy changes in America and worldwide are based on ideology rather than strong evidence of what works.
A nascent, evidence-based policy movement is trying to change that. At the same time, a close look at efforts to improve education demonstrates how very far we have to go before “evidence-based policy” can be much more than a slogan.
Researchers testing new medicines often use the “randomized control trial” as a powerful tool to see if drugs work. Take a bunch of patients with, say, the flu. Divide them into two groups by a coin toss; give one group your new wonder drug, while the other group gets tea, toast, and bed rest. If those given the wonder drug get better faster than the “control group,” you are on your way to Food and Drug Administration approval and riches. The supporters of evidence-based policy advocate the same approach for policy.
For many big government decisions, this is impossible. Ben Bernanke can’t flip a coin and raise the interest rate 1 percent in half the states and keep it the same in the rest. But there are some policy interventions that could be tested with a randomized control trial.
Take the U.S. Department of Education. It runs a “What Works Clearinghouse,” which provides examples of “proven” techniques to improve learning outcomes in America’s schools. To be included in the clearinghouse, evidence of success must come from a randomized trial in a school setting. Want to improve third graders’ reading? The site has 20 different third-grade literacy interventions, from Earobics to Spell Read, that have passed the randomized control sniff test. Implement all of the “what works” recommendations, and presumably we’d be able to ensure that every child gets a decent education.
But the problem with the “what works” approach is that it relies on the idea that one randomized trial in one place can demonstrate what works generally. That’s usually true of medical interventions: It is rare that a successful cure for a disease in California would have no effect on patients in Colorado or Chile. But new research by Justin Sandefur and Lant Pritchett of the Center for Global Development suggests that the same is not so true of policy experiments. People are much more complicated than bodies. People in different settings respond differently.
Back to education: A randomized study of class sizes in Tennessee by Princeton’s Alan Krueger found that kids in larger classes performed worse on tests. A randomized study by MIT’s Esther Duflo and colleagues in Kenya found the same relationship—but the impact was about one-quarter the size of what Krueger found in the U.S. Another randomized study by Duflo’s MIT colleague Abhijit Banerjee and colleagues in India found a weak positive relationship between class size and student performance: The bigger the class, the better the students performed. Whether making class sizes smaller improved test scores depended on where it was tried.
It isn’t just where the policy change happens—it is who manages the change. Another randomized study co-authored by Duflo found that in one region of Kenya, contract teachers hired by an NGO helped to improve test scores in schools. But an evaluation co-authored by Sandefur of a scaled-up program in the same country found that contract teachers hired by the government rather than NGOs had no impact on test scores at all.
That doesn’t mean using randomized trials is worthless for making broader policy conclusions. First off, a negative result from a randomized trial provides valuable evidence that something doesn’t work in all settings. Take the randomized trials that have looked at microfinance—the practice of making very small loans to very poor people. The fact that reported impacts were small or absent was a useful corrective to the hype that implied small-scale lending was a global panacea for poverty.
Second, and more positively, Sandefur and Pritchett argue that varying results from the same policy intervention across different regions and implementing agencies suggest not that evaluation is pointless, but that we should be experimenting and evaluating much more often. In some cases, it will turn out that a new policy has broadly similar effects when evaluated in many different places: Cash transfers to poor families, for example, have been evaluated in a range of different communities around the world and pretty consistently lead to better health and education outcomes for kids. But for many—perhaps the majority—of policy problems, there isn’t a one-size-fits-all solution. When an expert says, “If it worked in Cleveland, it will work here,” ask why it worked in Cleveland, and whether the conditions are the same here. Then try it, and evaluate it. And then try something else. And so on.
This need for constant experimentation in policymaking means we can’t outsource innovation and testing to a small elite of PhDs like we do with drugs. Everyone engaged in making policy needs the freedom and encouragement to experiment, evaluate, and reformulate policies. Of course, that implies many policy experiments will fail. But politicians and bureaucrats should not be punished for trying something new, only to find it doesn’t work. The true failure is favoring principles or ideology over evidence, which leads to snake oil solutions based on nothing more than fear, ignorance, and prejudice.