What do we do if nothing works?

An important question for anyone who wants to make the world a better place

Dec 09, 2024

I recently enjoyed digging into a symposium from my favorite think tank (Niskanen Center) and Vital City based on a controversial law review article from Megan Stevenson. In the article, she examines the past 50 years of randomized controlled trials (RCTs) of different interventions in the criminal justice space and finds that almost none of them work, particularly when we look at replication or scaling to new locations.

That is true for all sorts of interventions: job training programs for re-entry, cognitive behavioral therapy to reduce violence, modified probation protocols, etc. Name a darling program and it likely does not replicate or scale when tested with RCTs, which are some of the best evidence available. It’s a great article, written in remarkably plain language for an academic law review, and I recommend reading it.

This argument cuts to the core of the entire nonprofit world. The symposium features a bunch of responses, including from some of my favorite social scientists like Alex Tabarrok and Jennifer Doleac.

First, I’ll summarize Stevenson’s article. Second, I’ll give an overview of the most interesting responses. Third, I’ll give my own takeaways.

[Image: AI-generated illustration of figures debating around a large balance scale that weighs social programs against structural forces.] This is GPT’s attempt, I think, to show people debating and weighing different policy solutions?

Stevenson’s argument

Part 1: nothing works

RCTs are in many ways the “gold standard” of evidence in both hard science and social science. If we look at RCTs for most criminal justice interventions over the last fifty years, it turns out that most of them don’t work.

Sometimes there’s an intervention, like a job training program for re-entry, that generates some promising data. But there are two problems:

Often when we have some preliminary good data from that program from surveys or other weaker forms of evaluation, we do an RCT and find that actually the program doesn’t have the impact we thought it had.
Even when an RCT shows positive impacts of a program in its original context (e.g., Hawaii’s Opportunity Probation with Enforcement program), if we try to take that program to another place and do an RCT there, we usually find that it doesn’t work. In the case of that Hawaii probation program, researchers even ran a second RCT in Hawaii and found that it didn’t work in the same place as the original study.

Stevenson isn’t the first to notice this. It sounds a lot like what sociologist Peter H. Rossi called the Iron Law of Evaluation:

The expected value of any net impact assessment of any large scale social program is zero. The Iron Law arises from the experience that few impact assessments of large scale social programs have found that the programs in question had any net impact. The law also means that, based on the evaluation efforts of the last twenty years, the best a priori estimate of the net impact assessment of any program is zero, i.e., that the program will have no effect.

In other words, we have so much evidence that past programs don’t work, we should start off by assuming a new program idea doesn’t work unless we have high quality evidence to the contrary.
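One way to make that starting assumption concrete is as a Bayesian prior. The sketch below is only an illustration, and every number in it is hypothetical rather than taken from Rossi or Stevenson: I assume 10% of programs truly work, a study detects a real effect 80% of the time, a study of a useless program shows a false positive 5% of the time, and the two studies are independent.

```python
def posterior_works(prior, power, false_positive_rate, positive_result):
    """Bayes' rule: probability a program truly works after one study result."""
    if positive_result:
        like_works, like_null = power, false_positive_rate
    else:
        like_works, like_null = 1 - power, 1 - false_positive_rate
    numerator = prior * like_works
    return numerator / (numerator + (1 - prior) * like_null)


# Hypothetical inputs, chosen only for illustration.
prior = 0.10                # share of programs assumed to truly work
power = 0.80                # chance a study detects a real effect
false_positive_rate = 0.05  # chance a study "finds" an effect that isn't there

after_first_rct = posterior_works(prior, power, false_positive_rate, True)
after_failed_replication = posterior_works(
    after_first_rct, power, false_positive_rate, False
)

print(f"After one positive RCT: {after_first_rct:.2f}")
print(f"After a failed replication: {after_failed_replication:.2f}")
```

Under these made-up inputs, one positive study moves a skeptical prior a long way, and one failed replication takes much of that back. Different assumptions would give different numbers, but the direction of the update is the point.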

Stevenson alleges that this reveals something broader about the social world, as a similar phenomenon (“nothing works”) has been observed in education, sociology, global health, and other fields. Stevenson uses this to challenge what she calls the “engineer’s view” of the social world, namely that we can look at the social world as a mechanistic system of cause and effect, target a particular intervention to change one part of that system, and see a bunch of dominos fall. Most of what she finds doesn’t replicate is this kind of domino effect, what she calls a “cascade.”

Here’s one example of a cascade theory. Stevenson gives us the pitch:

People released from prison are extraordinarily vulnerable. They often have no money, no home, and little prospect for employment with a felony conviction on their record. This is a pivotal time. If they can find a job, they can begin the process of reestablishing themselves within the community. If they can’t, they are more likely to fall back on the same behaviors that put them in prison in the first place and may end up back there.

This is an opportunity for us to intervene! We need to invest resources to help recently released individuals find jobs. We need to connect them with employers and support them in creating a resume or prepping for interviews. Maybe we should subsidize their employment for a few months to help establish good work habits and add recent employment to their resume. A program like this has the potential to yield very high dividends, since it will launch people onto a new life trajectory with increased rates of employment and reduced rates of crime and incarceration. And you don’t need to pay for people to have a job forever, because it’s really just the transition point that’s an issue. After that critical period is over, they’ll have a stable job, be integrated into community, and be good to go.

This theory of change is a cascade because one intervention (helping someone get a job at a critical transition point) can have ripple effects and alter the course of the person’s life.

Great story, right? I certainly bought it. But it turns out that job training re-entry programs have none of these spillover effects. Participants get a job during the training program, but that’s it.

Stevenson uses the metaphor of trying to keep an orange at the top of an empty bowl. You can push the orange up to the edge of the bowl and it can stay there while you hold it, but as soon as you’re done holding it up it rolls back down.

Part 2: why does nothing work?

All of that is still just Stevenson’s description of the current state of affairs. But why do these interventions fail? Why do these cascades not happen, even when we can tell a pretty compelling story about why they should work?

Stevenson says that, essentially, whatever is keeping people from already achieving what we want them to achieve (e.g., more money, good health, a stable job, not committing crimes) comes down to structural constraints in the environment. She calls these “stabilizers,” and defines them as “the set of socioeconomic forces that resist externally-imposed change.”

This doesn’t always need to mean that things stay static, like the orange example implies. It can just be that there’s a much bigger macro effect going on.

Let’s say you hope to see a job training program increase employment levels, but a recession happens so none of the program participants can find a job for reasons that have nothing to do with the program or their felony record. And then you study the same job training program in the post-pandemic tight labor market where people are desperate for workers so ex-cons are being hired whether they’re in that job training program or not.

In both examples, the same structural factors impact both the control group and the intervention group in the RCT study. Maybe the job training program is well-designed, but its impact is too small to be significant in the face of macroeconomic conditions.
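Here is a rough sketch of how a real but modest effect can fail to show up in a trial. It uses the standard normal approximation for comparing two proportions. The employment rates, the 2-point program effect, and the 500-person arms are all hypothetical numbers I picked for illustration, not figures from Stevenson’s article.

```python
from math import sqrt
from statistics import NormalDist


def power_two_proportions(p_control, p_treat, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided z-test comparing two proportions."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)
    se = sqrt(
        p_control * (1 - p_control) / n_per_arm
        + p_treat * (1 - p_treat) / n_per_arm
    )
    shift = abs(p_treat - p_control) / se
    # Chance the test statistic lands beyond either critical value.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


# Hypothetical scenarios: the economy moves both groups by 50 points,
# while the program adds 2 points in each case.
scenarios = {
    "recession": (0.20, 0.22),
    "tight labor market": (0.70, 0.72),
}

for name, (p_control, p_treat) in scenarios.items():
    power = power_two_proportions(p_control, p_treat, n_per_arm=500)
    print(f"{name}: chance of detecting the effect = {power:.2f}")
```

With these assumed inputs, the chance of detecting the program’s effect is well under the conventional 80% target in both economies, even though the program “works” by construction. The tide moves everyone by 50 points; the breeze adds 2.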

Stevenson uses the metaphor of the tides here: the water level changes according to the tides. An intervention is like a small breeze that moves some water around in a particular place, but doesn’t change the overall direction of the tide. In the long run, the breeze has no measurable impact on the body of water.

Part 3: what do we do with this information?

Here’s a summary of Stevenson’s argument up to this point: RCTs are some of the best evidence we have for any intervention. RCTs consistently find that criminal justice interventions don’t work, especially ones built on the “cascade” model of a small change having broad ripple effects for the rest of someone’s life. When one RCT does work, that same intervention almost always fails to scale or be replicated. Given all that, we should stop banking on the “cascade” theory of change and place greater weight on structural “stabilizers” as a cause of people’s outcomes.

[Image: a darker AI-generated version of the same balance-scale illustration, with the scale crumbling.] GPT’s more “nihilist,” discouraged version of the image above.

Pretty discouraging, right? She offers three potential ways we can respond, assuming we agree with her conclusion that cascades are a myth and constraints are binding:

Focus on direct interventions

Stevenson’s finding suggests that giving someone transitional housing for 6 months doesn’t have 50 ripple effects throughout their life or make them more likely to have housing after that 6 months is over. But they sure are in a house for those 6 months.

So... maybe if we want someone to be housed, then we just put them in a house? Stevenson frames this in reference to the common parable “give a man a fish and feed him for a day; teach him how to fish and he’ll eat for a lifetime.” Stevenson says that evidence tells us that the second part of that parable is wrong, but it doesn’t mean we shouldn’t just give the man a fish. You can apply this to a lot of interventions: feed hungry people, house the homeless, give poor people money. All of those work directly: you can indeed feed a hungry person, but you should expect the effect to wear off as soon as you stop giving her the food.

Uncertain incrementalism

Screw RCTs. We should trust our instincts: if a program seems like it’s helping someone but the RCT shows no evidence, we should trust our gut and assume that the intervention is having a positive effect, just one so small that it’s not showing up in the RCT. But the caveat is we can never be sure, and if the impact is so small it doesn’t show up in an RCT, there’s no reason that it’s more likely to be a small positive effect than a negative effect. So we need to be careful. And this isn’t just theoretical: some good-sounding programs do backfire and have the opposite of their intended effect (see Pahlka’s article below).

Systemic reform

Okay, RCTs show that “limited-scope, isolable interventions rarely lead to meaningful change.” So we need to go bigger! Including “changes that are so large in scope that experimental evaluation is infeasible.” She gives prison abolition as an example and mentions that this type of work “requires changing the hearts and minds of large numbers of people, as well as changing the concrete structural factors of our lived experience.”

Like lots of systems thinking practitioners, she emphasizes that this work is complex, full of unintended consequences, and that when we do this work “we are flying half blind.” There’s no reason that these systemic reforms need be left wing or right wing: she gives examples from both ends of the political spectrum of people who reject incremental, RCT-like work in favor of sweeping revolution.

Symposium Responses

Big claims from Stevenson! So Niskanen and Vital City convened social scientists, public servants, philanthropists, and others to respond. I read all of the responses in the symposium. I’ve included a brief summary of something I found interesting in most of them (I left a handful off that didn’t stand out as much to me). Sometimes what felt salient wasn’t the author’s main thesis, so I’m sure that if you read some of them you’d have different takeaways. The Vital City editors have their own summary of trends across the articles in the introduction to the issue.

If you only have time to read three of these, read the first three that are bolded. My personal conclusions from all this reading are in the last section below this one:

Alex Tabarrok has a wide-ranging response, but his primary lens is that interventions that try to change people’s preferences fail, but if we change people’s constraints and incentives they often do work. The whole idea of the “cascade” that Stevenson describes as a myth is premised on the idea that you can change someone’s preferences and that will ripple out going forward. He also contends that “if an individual wishes to change, these programs aren’t necessary.”
- I also appreciated another point of his: if it’s really impossible to change people’s behavior, then why do large companies like Amazon and Google pay people tons of money to change what their customers do? Surely some kind of behavior change is possible or these companies wouldn’t pay those people tons of money to increase their profits.
Mark Gray has a primer on social science methods if you need to brush up.
Anna Harvey is a bit of an optimist – she thinks Stevenson omits a whole swathe of quasi-experimental, large sample size studies that are just as legit as RCTs. She lists out some interventions that have been found to work using those methods, including reducing cash bail without increasing recidivism.
Sherry Glied says, basically, why are we so surprised? Research is hard! She compares it to medical research, where 2% of drugs investigated end up making it to market. She also makes another solid point: if there was something super obvious that worked, our ancestors probably already figured it out. The low hanging fruit has already been picked and we’re investigating harder problems now. (John Arnold makes a similar point in his article)
- Findings of null effects can change: she shows how Medicaid expansion changed the previous view that health insurance doesn’t impact mortality. Turns out it does by a small amount that didn’t show up in previous studies, but at the national level (330M people) those small gains end up saving a lot of lives per year.
Jennifer Doleac disaggregates the issues with RCTs into four categories and says social science has good tools to address three of the four (things like pre-registering your studies to guard against publication bias). The fourth issue, where we don’t know as much, is exactly why interventions don’t scale well, so she suggests more research specifically on that question.
- Side note, she is the best, check out her podcast on evidence-based criminal justice reform. For a sample episode, this is on restorative justice diversion programs. It’s a great podcast for learning about different interventions and about how the sausage gets made for high quality, quantitative social science work.
Chloe Gibbs zooms in on evidence around early childhood education, specifically Head Start programs, to make the point that “we can actually learn a lot from mixed findings and variations in effectiveness.” In a lot of areas of social policy, RCTs are pretty new, and each time something doesn’t work (or we find it works with one population and not another) it helps us generate new hypotheses to test, refining our understanding of exactly why certain interventions do or don’t work with different people and in different contexts. So if something doesn’t work that’s okay: it’s part of the knowledge generation process.
Marc A. Levin is also an optimist. Turns out, most of Stevenson’s evidence is from a 2006 review of RCTs. There have been a lot of RCTs done since 2006 and some of them have found positive impacts! In addition, RCTs have been a valuable tool in finding some interventions that were actively bad, like DARE drug programs, so we shouldn’t discount them as an important tool even if they’re finding null results.
Matt Grossmann advocates for humility: the world is hard to change. But he argues against both left and right wing people who take this difficulty as evidence that we should opt for some sort of grand revolution instead of incremental change.
Jonathan Rauch takes a journalist’s perspective and centers the value of touching grass and actually going out and talking to people. Don’t over-rely on data, he says.
John Maki thinks that the process of trying to do evidence-based policy is as important as the outcomes. He worked in Illinois and there’s lots here about the value of convening and relationship-building.
Jeffrey Liebman makes a variety of points, but my favorite is that we should evaluate any program against the baseline of what would happen if we just gave participants the same amount of money in cash.
Jennifer Pahlka notes that interventions often backfire. For example, abortions went up nationally after the overturn of Roe v. Wade.
Candice Jones says that the idea that policy is driven by what’s evidence-based is wrong: political ideology drives policy and then politicians look for studies that confirm what they already want to do.

My conclusions and takeaways

Some things do work!

Overall, Stevenson is right that most things don’t work, and that we should be particularly skeptical of cascade theories. Almost everyone in the Vital City symposium agrees with this.

But the difference between “nothing works” and “almost nothing works” is a pretty important one. Harvey, Glied, Levin, and Gibbs all highlight some examples of more recent work and non-RCT, equally-valid studies that have found positive results. Doleac shows how the quality of research is improving. Tabarrok notes that for-profits can influence consumer behavior in ways that mirror what social interventions try to do, so unless we think they're all lighting money on fire then it's probably possible in the social realm.

But I still think we should reckon with the facts and social theory that Stevenson presents and largely down-weight how many resources we spend on interventions like the ones she describes. And we should start with the assumption that a great-sounding program doesn’t work unless we have evidence to the contrary.

This is equally a challenge to left wing people and right wing people

Most of my friends are left wing, many more so than I am, and whenever I’ve talked to them about this article and symposium they see it as confirming their structural view of problems.

The progressive view of things is that the government should be providing more services to people. And many of these services are part of the same engineer’s view of the world and social cascade theory. I’ve often heard it said by progressives that the entire nonprofit service world is just doing what the government would be doing in more enlightened and progressive countries. Nonprofits/NGOs then are part of the "neoliberal" approach to privatizing state functions.

But if what Stevenson finds is that a ton of what’s happening in the nonprofit world is useless, then that also means a ton of what progressive governments are doing (or what progressives want the government to be doing) is also useless.

But right wingers shouldn’t gloat too hard. Sure, people on the right are often more cautious about unintended consequences of social interventions because of Burkean epistemic humility or general prudence. But there are also plenty of examples of conservatives trying to influence people’s lives and roll back societal shifts that they bemoan.

For one, declining marriage rates. George W. Bush’s administration attempted to influence this with policy and programmatic interventions and the results were not great. Here’s a good New Yorker article about it.

If there aren’t policy levers to be pulled about these cultural shifts that conservatives bemoan, it’s easy for conservative politics to devolve completely into “man yells at cloud” territory.

I choose the systems approach

Of Stevenson’s three options, I choose the systemic reform approach. To me, the lack of high quality evidence around most direct service interventions is further validation of much of the move in parts of philanthropy, law, and other fields away from direct service funding and towards a systems change framework.

While I do believe that some interventions work, our assumption should be that they don’t. Or that for every person they help they may inadvertently harm someone else. Even if they’re adapting a successful evidence-based practice from elsewhere.

This is a hard pill to swallow! The stories that we can tell about these interventions are so persuasive (see the job training re-entry example above) that it’s hard to acknowledge the evidence that they don’t work in the way it’s described. But we should acknowledge this evidence and not put our heads in the sand.

It’s hard to use Stevenson’s framework: Medicaid as case study

I appreciate that Stevenson doesn’t just drop a bomb and say “you guys figure out what we should do with the fact that nothing works.” She does try to offer three potential approaches to social change work that reckon with the lack of reproducibility.

But as I thought of the nonprofit work that is funded by philanthropy, I had a hard time categorizing different interventions.

Let’s take Medicaid Expansion. As Stevenson and Glied point out, giving people health insurance has been treated as an intervention and we’ve had different results of its effectiveness over the years.

But should we actually think of that as a direct intervention being done forever, akin to putting someone in government-funded housing for their whole life? It’s basically the state paying for poor people’s health expenses. In some ways that’s just a monetary transfer, even if we assume it has no spillover effects. You need to go to the doctor with or without health insurance and the state will pay for it, putting more money back in your pocket and/or reducing your amount of medical debt.

Or is it a cascade theory? Giving someone access to medical care means they’ll do proactive, preventative care or go to therapy, putting them on a healthier trajectory, allowing them to be more productive and self-sufficient members of society. Pretty soon they get a great-paying job and don’t need Medicaid anymore.

Or is Medicaid expansion a systemic reform? In states like Missouri where it took a 10-year battle post-ACA, there was real organizing that took place that shifted voters’ understanding of politics. People were persuaded that we have a shared responsibility for the health of the poorest people in the state. That is a big mindset shift and may mean that even if Medicaid changes, the state will still need to care for poor people some other way. That’s a profound shift for a conservative voting public, and likely more impactful in the long term than the Medicaid expenditures for a given year.

As if this wasn’t complicated enough, Sherry Glied’s article above showed that for years researchers thought that expanded health insurance coverage had little to no impact on mortality. More recently, now that there’s a large enough sample size to draw on, it was found that there is a small reduction in mortality, one that would translate at the national level to thousands of lives saved per year. But we only know that because we expanded Medicaid via the ACA and thus had large enough sample sizes to draw on. So in some sense we had to take the leap on the theory and spend tons of public money to then get the evidence that confirmed it was money well spent.
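To get a feel for why it took a national expansion to see the effect, here is a back-of-the-envelope sample size calculation using the textbook formula for comparing two proportions. The mortality rates below are hypothetical placeholders, not the estimates from Glied’s article or the underlying studies.

```python
from math import ceil
from statistics import NormalDist


def required_n_per_arm(p_control, p_treat, alpha=0.05, power=0.80):
    """People needed in each group to detect p_control vs p_treat
    with a two-sided z-test (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_power = z.inv_cdf(power)
    variance = p_control * (1 - p_control) + p_treat * (1 - p_treat)
    return ceil((z_alpha + z_power) ** 2 * variance / (p_control - p_treat) ** 2)


# Hypothetical annual mortality rates, for illustration only.
small_effect = required_n_per_arm(0.0100, 0.0095)  # 1.00% vs 0.95%
large_effect = required_n_per_arm(0.0100, 0.0050)  # 1.00% vs 0.50%

print(f"Small effect: about {small_effect:,} people per group")
print(f"Large effect: about {large_effect:,} people per group")
```

With these assumed rates, the small effect needs hundreds of thousands of people per group, while a much larger effect needs only a few thousand. That gap is roughly the difference between what a single state pilot can enroll and what a national policy change provides.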

The point I’m trying to make is that in practice it’s not always easy to tease apart the different levels that reforms are working on. It’s easy to spin whatever narrative you want to.

Social science is hard, but don’t let hard science off the hook

The comparison to hard science and medical research is helpful. That’s a perspective I bring from almost going down the Chemistry PhD route: hard science is just as riddled with research errors and false findings as social science; hard scientists are just less reflective about it. I spent an entire summer unsuccessfully trying to replicate the yield from one organometallic reaction. The research assistant who took over for me spent another 6 months on it before the lab abandoned the experiment. A friend of mine who did biochemistry research heard about this and joked to me about how “reproducibility is fake.”

So social change practitioners need to be humble, but also can cut themselves some slack. There is progress being made here (see Harvey, Glied) and some tweaks to research can help fix these problems (see Doleac) to help us understand how to scale programs better.

Tabarrok is right: incentives matter

I’m wrestling with Alex Tabarrok’s contention that you can only durably change behavior by changing incentives, rather than changing someone’s preferences. It’s a very economist’s view of the world that I’m sure misses a lot, but his argument and examples are persuasive. Scott Sumner, who’s in the same economist orbit as Tabarrok (though I mostly read him for his film takes), recently published a piece laying out what he calls the “it doesn’t matter” perspective. It basically argues that structural economic forces and incentives are so strong that many of the things that both right wing and left wing people complain about don’t really matter in the grand scheme of things.

I used to run a counseling clinic and to some extent people’s preferences can change, but maybe Tabarrok is right that those people would have reached the same conclusion and changed their lives without the therapy.

Though isn’t there some value in helping speed along people’s positive life trajectories through programmatic interventions, even just from a utilitarian perspective? If counseling helps you speed along the process of getting your life in shape, that’s fewer years of suffering and more years of thriving.

Flyover Takes
