Policymakers Don’t Care About Your RCTs
Michelle Rao on why policy evidence doesn’t move spending
People often assume that when rigorous evidence shows a policy works, policymakers will want to expand or scale that work. The standard model is simple: run a pilot program, evaluate it, and use the findings to decide whether to expand the pilot or shutter it. Makes sense, right?
Rao’s recent study suggests that this isn’t how the world works. Studying Conditional Cash Transfers (CCTs) across Latin America, she finds that policymakers don’t systematically increase or decrease spending based on new evaluations, even when the findings are big, surprising, or high-quality.
This should make us rethink how we approach policy evaluation and what it means to actually influence decision-makers.
Recently I’ve been wrestling with how we should think about evidence-based policymaking in a world where most studies of social interventions fail to replicate. Elsewhere, I explored how that fact should influence the type of policy goals and political work that well-intentioned advocates engage in. Rao’s study provides a chance to look at another link in the chain: the connection between policymakers and evidence.
Credit where it’s due: I found this paper via a Twitter thread that sums up its findings well, as Rao does herself in a World Bank guest blog post. I’ll summarize the findings here, connect them to a key model of policymaking, and explore what this means for researchers and funders.
What are CCTs?
Rao is able to study this dynamic because CCTs are so popular in Latin America. The most famous of these is Brazil’s Bolsa Familia. Essentially, many Brazilian families were so poor that they kept their children home to work instead of sending them to school. They did this because they might starve without the extra wages, but in the long run those same kids lose out on an education and the ability to earn more money to support their families later. Education is important!
Bolsa Familia basically says “send your kid to school and get them vaccinated and we’ll give you more money than your kid would have made by staying at home and working that low-wage job.” It’s a win-win-win: reduce poverty, get kids educated, reduce the spread of disease. Brazil isn’t the only country to realize that this is a good policy, and CCTs have been adopted and evaluated all over Latin America.
What does Rao find?
Rao uses this wide-but-varied adoption to her advantage. Even though many countries adopted CCTs, there’s plenty of variation in when each country created its program and how much money it spends on it.

So if policy wonk philosopher kings were in charge of the government, how would we expect this to play out? Let’s say Ecuador sees Bolsa Familia and decides to copy it, but wants to proceed cautiously because programs don’t always transfer perfectly to new settings. The program could reach enough people to justify $100M in spending, but they start with a $10M pilot.
If those results come back positive, wouldn’t you expect Ecuador to expand the pilot to $20M, $30M, maybe even the full $100M? Not everyone is perfectly rational, but isn’t this kind of program expansion the whole reason you do the evaluation in the first place?
Rao finds that this isn’t how the world works. There’s no correlation between the production of new evidence from these studies and spending on these policies. Ecuador would be no more or less likely to expand a $10M pilot to $20M after getting back a good evaluation of the pilot. And it works in both directions: if there’s a surprise negative finding (relative to the CCT literature), policymakers don’t reduce spending either.
Rao kicks the tires on a variety of theories that might explain this headline finding (studies don’t matter) while testing whether some studies matter more than others. At times, this involves some Bayesian math that’s over my head, but I’ll trust she knows what she’s doing (I sketch a toy version of that kind of updating below). Here are some examples of the sensible models and questions she tests out:
Are policymakers not changing their mind based on one study because they’re already aware of the overall CCT literature?
Do they respond more to “surprise” findings that deviate from the literature?
Does the way researchers frame their results in their abstract matter, since this is the kind of Executive Summary most busy policymakers would look at?
Do policymakers care about high-quality evidence like randomized controlled trials (RCTs) and respond more to those?
Are policymakers worried about the replication crisis, responding more to evidence that’s more generalizable and likely to scale?
The answer to all of these questions? Nope. None of these factors impact policy spending on CCTs.
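To make the “already aware of the literature” and “surprise findings” questions concrete, here’s a toy sketch of the kind of Bayesian updating involved. This is my own illustration, not Rao’s actual model, and every number in it is made up: a policymaker holds a prior belief about the effect of CCTs on school enrollment, formed from the existing literature, and updates it when a new evaluation arrives. A “surprise” is just a result that sits far from that prior.

```python
# Toy sketch (my own illustration, not Rao's model): a "Bayesian policymaker"
# holds a prior about the effect of CCTs on school enrollment, formed from
# the existing literature, and updates it when a new study arrives.
# All numbers are invented for illustration.

def update_belief(prior_mean, prior_var, study_effect, study_var):
    """Normal-normal conjugate update of the believed effect size."""
    precision = 1 / prior_var + 1 / study_var
    post_var = 1 / precision
    post_mean = post_var * (prior_mean / prior_var + study_effect / study_var)
    return post_mean, post_var

# Prior from the CCT literature: roughly an 8 percentage-point enrollment gain.
prior_mean, prior_var = 8.0, 4.0

# A new, precisely estimated evaluation comes in well below the literature.
post_mean, post_var = update_belief(prior_mean, prior_var,
                                    study_effect=2.0, study_var=1.0)

print(f"Posterior effect: {post_mean:.1f} pp (variance {post_var:.2f})")
# If spending tracked beliefs, a surprise like this should pull budgets down.
# Rao's finding is that, in practice, it doesn't.
```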
The importance of timely studies
So much of the policy wonk community cares about doing evidence-based policy, built on high quality studies that are likely to generalize. As a policy wonk, I’ve spilled ink on those subjects myself. What Rao is telling us is that, at least in this big cross-country study, those factors just don’t matter to policymakers.
But wait! It’s not a nihilist “nothing matters” paper. In exploring all these different models, she does find one factor that makes a difference. What policymakers seem to respond to, at least in this study, is “actionable” evidence.
Primarily, what makes a study actionable is how timely it is. Here, timeliness is defined as how much time passed from the end of the study period to publication. So if you publish a study in 2015 on CCTs covering the period 2000-2010, that’s a 5-year lag (which is about average).
Rao finds that the shorter the lag, the more impact the study has. She also finds that studies are much more likely to be “actionable” and influence policy if the same party is in power when they’re published as when the study began. Both of these make sense and feel related: if it takes you 10 years to publish a study, the chances that a new government is in power go up.
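For concreteness, here’s a minimal sketch of how you might compute the two “actionability” markers just described, publication lag and whether the same party is still in power, from study metadata. The field names and example values are hypothetical, not Rao’s actual dataset or coding.

```python
# Minimal sketch of the two "actionability" features discussed above,
# computed from hypothetical study metadata. Field names and example values
# are my own, not Rao's dataset.

from dataclasses import dataclass

@dataclass
class Study:
    study_end_year: int        # last year of the evaluation period
    publication_year: int
    party_at_study_start: str
    party_at_publication: str

def actionability_features(s: Study) -> dict:
    return {
        # Years from the end of the study period to publication.
        "publication_lag": s.publication_year - s.study_end_year,
        # Is the same party still in power when the results land?
        "same_party_in_power": s.party_at_study_start == s.party_at_publication,
    }

example = Study(study_end_year=2010, publication_year=2015,
                party_at_study_start="Party A", party_at_publication="Party B")
print(actionability_features(example))
# {'publication_lag': 5, 'same_party_in_power': False}
```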
This also jibes very well with one of my favorite models of American politics, the “punctuated equilibrium” model from Frank Baumgartner and Bryan Jones’s Agendas and Instability in American Politics.
Windows of opportunity are key and policymakers are busy!
Baumgartner and Jones did a bunch of empirical work and found that US policymaking proceeds in two modes. Most of the time, policy in a given domain changes slowly and incrementally due to heavy friction in the government, our famous checks and balances. But occasionally there are huge, dramatic changes in a policy area, which then settles back down into only incremental change.

Why does it work this way? Here’s their model.
Policymakers have “bounded rationality” (a term that comes from Herbert Simon). It’s a fancy way to say that policymakers have a limited attention span and limited knowledge. Their job covers so many issues and they only have so many hours in the day.
But sometimes the political stars align because of factors like public scandals, media coverage, or outside advocacy pressure. To pick an example from the book, federal legislators ignored nuclear power as an issue until Three Mile Island drew attention to it.
In those moments, a window of political opportunity opens and different politicians maneuver to get their own solutions implemented so they can take credit for solving a high-salience political issue. In those moments, policy doesn’t move incrementally; it shifts suddenly and dramatically. But then it settles back down into a new equilibrium with plenty of friction, policy moves slowly again, and policymakers shift their limited attention to a new issue.
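If you want that intuition in toy form, here’s a small simulation of the friction-plus-threshold logic: pressure for change builds up each year, institutional friction blocks action until that pressure crosses a threshold, and then policy jumps. This is my own caricature of the idea, not Baumgartner and Jones’s actual empirical setup, and all the parameters are invented.

```python
# Toy simulation of the punctuated-equilibrium intuition (my own caricature,
# not Baumgartner and Jones's model): pressure for change accumulates, but
# friction blocks action until pressure crosses a threshold, at which point
# policy jumps and pressure resets. The result is long stretches of
# incremental drift punctuated by sudden, large shifts.

import random

random.seed(42)

policy = 100.0        # e.g., spending on some program, in $M
pressure = 0.0        # accumulated, unaddressed demand for change
FRICTION = 15.0       # how much pressure it takes to open a "window"

for year in range(1, 31):
    pressure += random.gauss(1.0, 3.0)   # new evidence, scandals, advocacy...
    if abs(pressure) > FRICTION:
        policy += pressure               # window opens: big, sudden shift
        print(f"Year {year:2d}: window opens, policy jumps to {policy:6.1f}")
        pressure = 0.0
    else:
        policy += 0.2 * random.gauss(0, 1)  # otherwise, incremental drift

print(f"Final policy level: {policy:.1f}")
```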

What’s the upshot here? Rao’s evidence fits this model. Policy change doesn’t happen incrementally in response to new evidence. Instead, evidence only matters when political conditions make it actionable. Policy change only follows evidence when it is timely and politically convenient.
A two-track approach to policy research
This doesn’t mean that less timely research is totally useless. If policymakers don’t respond to evaluation results in the short term, how else might research influence policy? It could:
Shift policy narratives,
Influence policymakers in another country or jurisdiction,
Provide justification for action down the line (when a political window opens), or
Help refine program implementation rather than just affecting spending.

High-quality evidence in particular is critical, and we need to pay more attention to the replication crisis (I wrote more on that here). But, in light of Rao’s findings, researchers and research funders in the social policy arena should think of their work as falling on two separate tracks:
Rapid response piloting and evaluation for live political windows, even if the study is imperfect.
Rigorous evaluation for long-term evidence-building.
What’s out? Everything in the middle. Mediocre policy evaluations that neither build a timely case for public funding expansion nor contribute to the body of rigorous evidence.
I’m sympathetic to researchers here. It’s good that they want to produce high quality evidence, but our research institutions are just not set up for rapid response. It’s genuinely challenging to set up the partnerships and data for a good study, run it through IRB approval, clean up the data, go through the peer review process, and then publish findings.
Which approach to take depends on the political moment. In those rare moments when a policy window opens, speed matters more than methodological perfection. If you take five years to produce a gold-standard study, the moment is gone. Go with Approach 1.
When it looks like there’s no political momentum, focus on Approach 2, using the luxury of time to create long-term evidence.
In my experience, funders of research are bad at the “rapid response” part of Approach 1 above. This is particularly acute in the hard sciences, where institutions like the NIH were so incapable of rapid-response funding for COVID research that private alternatives like Fast Grants had to fill the void. But it’s bad in the social sciences too, and we need more work like Fast Grants happening for social policy research.
I’ve followed Arnold Ventures in this area; they’ve done inspiring work and been at the forefront of thoughtful evaluation that balances high quality with engaging policymakers on issues where you can actually influence them. I’ve been particularly inspired by the work of Jennifer Doleac and others in the criminal justice space there.
Conclusion
So what have we learned?
Policymakers don’t respond to new evidence in the way we might assume. Rao finds that research findings have no clear correlation with government spending. The only thing that seems to matter is whether the research is timely and politically convenient.
Policy change follows political windows, not incremental learning. While Rao studies Latin America, her findings fit neatly into Baumgartner and Jones’s punctuated equilibrium model of American politics. Most of the time, policy moves slowly due to institutional friction, but when a political window opens big change can happen fast. Research only influences policy when it’s aligned with that window.
The right approach to research depends on timing. When a political opportunity is open, focus on quick-turnaround, actionable evaluations that arm policymakers with evidence they can use right now. When no window is open, invest in rigorous, long-term research that can be ready when the moment arrives. Everything in the middle is a waste of time.
Finally, let’s recognize that this is one study and that it only covers political spending on a particular set of policies in one swathe of the world. But I hope that more research like this follows. Rao’s study forces us to examine how evidence and policymaking actually interact, not how they should interact. If we want research to matter, we need to meet policymakers where they are, not where we wish they’d be.
Addendum: Massachusetts/Arnold Ventures Collab
Coincidentally, one of the next things in my reading pile after I wrote this (yes, I have real piles because I still like to print things out) was an Arnold Ventures evaluation of their Massachusetts Juvenile Justice Pay for Success Pilot. It’s a creative idea that gives some clear structure to, and tightens, some of the feedback loops Rao is studying.
Basically, rather than having the government (in this case the state of Massachusetts) pay per unit of service, they instead pay for outcomes, as measured by a robust, independent evaluation (an RCT and a difference-in-differences (DiD) study). In this case, it was a program run by Roca to reduce recidivism and improve job prospects for the formerly incarcerated. Funders (including Arnold Ventures) fronted the money, and the state of Massachusetts agreed to pay them back with interest if the evaluator found that Roca met certain agreed-upon goals.
It’s a fascinating experiment in how government procurement could work. And models like this, where the state pre-commits to spending depending on evaluation results, address some of the disconnect between policy evaluation and political spending in Rao’s study. You could imagine a similar setup where the state pays for a pilot and pre-commits to expanding it contingent on certain outcomes.
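For intuition, here’s a minimal sketch of the pre-commitment logic in an outcomes-based contract like this one. The structure follows the description above (funders front the money, the government repays with interest only if the independent evaluation finds the agreed-upon effect), but the thresholds, interest rate, and dollar amounts are hypothetical, not the actual terms of the Massachusetts/Roca deal.

```python
# Minimal sketch of the pre-commitment logic in a pay-for-success contract.
# The structure follows the description above; the specific thresholds,
# rates, and amounts are hypothetical, not the actual contract terms.

def pfs_repayment(upfront_funding: float,
                  interest_rate: float,
                  measured_effect: float,
                  effect_threshold: float) -> float:
    """Return what the government owes funders once the evaluation is in."""
    if measured_effect >= effect_threshold:
        # Outcomes met: repay principal plus the agreed interest.
        return upfront_funding * (1 + interest_rate)
    # Outcomes not met: funders absorb the loss, government pays nothing.
    return 0.0

# Funders front $10M; repayment hinges on a pre-agreed recidivism target.
owed = pfs_repayment(upfront_funding=10_000_000,
                     interest_rate=0.05,
                     measured_effect=0.03,      # e.g., 3 pp recidivism drop
                     effect_threshold=0.05)     # contract required 5 pp
print(f"Government owes funders: ${owed:,.0f}")  # $0 -- outcomes not met
```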
In this specific case, the results were a bit unsatisfying. The Pay for Success project was an experiment with lots of moving parts and partners. They ran into challenges recruiting enough participants, had to rely on inaccurate state data, and were hit by the pandemic. One of their evaluation advisors said that the end result was an evaluation that was “extremely underpowered and only marginally informative.” All that said, as Megan Stevenson noted, lots of criminal justice interventions like this fail, so the null result shouldn’t surprise us.
Lots of work for a null finding, but this is why you experiment! As they note, many of the challenges could have been addressed with a strong pilot phase and by engaging a few additional government entities to flag the data and referral issues they encountered. Either or both of those would have helped with the problems that hampered program delivery and evaluation.
Kudos to Arnold Ventures, Roca, Massachusetts, and others for trying this out; I hope others follow in their footsteps.
The implicit hypothesis in this piece seems to be that data "ought to" move policy. I find this naive, but perhaps I'm extremely jaded. I would assume that data has (almost) no influence on policy until proven otherwise (i.e. it's the Null Hypothesis)!
Perhaps it's because of my heavy exposure to healthcare policy, by way of pharmaceuticals (my professional field). There is this charming expectation among academics in the pharmacoeconomics field that their analyses, if only paid attention to by whoever's in charge, would lead to a more efficient / equitable / [insert liberal nostrum] healthcare "system". I'm leading the witness here, but you can see the problem: nobody's in charge of healthcare (in the US), so it's not a "system". All of it is transactional, so nobody has any use for some disembodied view of what the "right" thing to do might be; instead, everybody has bills to pay, shareholders to placate, etc.
Now, extending this theme beyond healthcare: "policy makers", in countries with elected, representative governments (more or less), ultimately respond to "politicians" (or they might be one and the same). Thus, "policy" is downstream of "politics", and we know how little these rely on data and studies. You could have the best data in the world showing that free needles for drug abusers have good societal outcomes, or that jailing people for crimes has bad ones; none of this matters as much as what voters believe. And data don't change beliefs very much. So, it's actually interesting when studies *do* have an effect on policy (and opinions).