Randomized control trials and economic models: friends or foes?

Randomized control trials (RCTs) have received growing attention from policymakers over the last few decades. The RCT is also one of the core experimental methods used by the recent Nobel Prize laureates in economics, Duflo, Kremer and Banerjee.

Given the excitement around these methods, the University of Chicago recently ran an IGM Economic Experts Panel poll asking economic experts whether “Randomized control trials are a valuable tool for making significant progress in poverty reduction”. The results of the poll are summarized in the graph below.

The chart above shows the distribution of respondents’ agreement. What struck me most about the results was Angus Deaton’s strong disagreement with the statement, especially given that he is an expert in the field.

Why does Deaton strongly disagree?

To answer that, we need to think about what an RCT is and how well suited it is to answering policy questions. Let’s shed some light on it.

What is an RCT?

The RCT is a technique used predominantly in the medical sciences, but it has also been applied intensively in economics, especially in the last few decades. The technique works in the following way. Researchers randomly assign a group of people to receive a clinical intervention (such as an anti-cancer pill). The comparison group (called the control group) is also randomly assigned and receives a placebo intervention (such as a sugar pill).

The researchers then compare outcomes between the two groups to quantify the effect of the clinical intervention (the “treatment effect”).
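
The comparison described above can be sketched as a small simulation. This is only an illustration under made-up assumptions: the function name, the baseline outcome distribution, the sample size and the "true" effect of 2.0 are all hypothetical, and a real trial would also report uncertainty around the estimate.

```python
import random
import statistics

def simulate_rct(n=10_000, true_effect=2.0, seed=0):
    """Simulate a simple RCT and return the difference-in-means estimate.

    All numbers are hypothetical and chosen only for illustration.
    """
    rng = random.Random(seed)
    treated, control = [], []
    for _ in range(n):
        # Outcome the person would have without the treatment (made-up units).
        baseline = rng.gauss(10.0, 3.0)
        # Coin-flip assignment: this is the "randomized" part of the trial.
        if rng.random() < 0.5:
            treated.append(baseline + true_effect)
        else:
            control.append(baseline)
    # Estimated treatment effect: mean outcome of treated minus mean of control.
    return statistics.mean(treated) - statistics.mean(control)

estimate = simulate_rct()
print(f"estimated treatment effect: {estimate:.2f}")
```

Because assignment is random, the two groups are similar on average in everything except the treatment, so the difference in average outcomes should land close to the effect built into the simulation.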

In economic research, RCTs are often applied to poverty alleviation schemes to help quantify the effect of a policy intervention. However, they have been applied much more widely, giving insights into the labour market, behavioural economics, health economics, taxation, and industrial economics.

So an RCT tells me what a policy does?

An RCT gives us an empirical treatment effect under specific conditions. This is the type of thing economists often call a stylised fact.

However, stylised facts cannot give us general policy effects – they tell us what the policy response was in a specific set of circumstances, but we need to be able to generalize that effect to apply it in other circumstances.

This is where Deaton gets concerned, and where some of the push-back against RCTs stems from.

To get a policy effect we still need a model – simply scaling up an RCT involves imposing an implicit model about how the policy and behavioural responses work, one that assumes the scale of the policy change does not matter and that there are no general equilibrium effects.

This matters. If we provided a minimum income payment in Treviso, Italy, we may find certain changes in prices and labour supply in that community. However, we could not then take that result and “scale it up” across Italy as a whole: Treviso is not a closed system in the same way an entire country may be, so the larger scale of the policy would influence prices and labour market responses differently. For example, if a minimum income increased demand for particular goods, doing so in a small region may not change the price of those goods, while doing it for the whole country would.
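
The scale argument can be made concrete with a toy market. The sketch below is an assumption-laden illustration, not a model from the literature discussed here: it assumes log-linear supply and demand in a single national market, and the function name, elasticities, size of the demand boost and the 1% pilot share are all hypothetical.

```python
def price_rise(share_treated, demand_boost=0.10, supply_elast=1.0, demand_elast=0.5):
    """Approximate proportional price rise in a toy log-linear market.

    share_treated: fraction of national demand that receives the payment.
    demand_boost:  proportional rise in demand among recipients (hypothetical).
    The elasticities are hypothetical values chosen for illustration.
    """
    # Shift in aggregate demand is proportional to the share of buyers treated.
    shift = demand_boost * share_treated
    # Equilibrium price change when log demand shifts by `shift`.
    return shift / (supply_elast + demand_elast)

pilot = price_rise(share_treated=0.01)    # a small regional pilot
national = price_rise(share_treated=1.0)  # the same policy for everyone
print(f"pilot: {pilot:.4%}, national: {national:.4%}")
```

In this sketch the pilot barely moves prices while the national roll-out does, so an effect measured in the pilot holds prices roughly fixed in a way the scaled-up policy would not.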

How do economic models fit in here?

Economic models provide the mechanism for generalising results. At the same time, models and RCT results should inform each other in a reciprocal way.

Given the same conditions as the RCT, a good economic model should be able to replicate the result – or at least key attributes of it. Given the ability to replicate an RCT for those conditions, the model then embeds key assumptions about why that result held and a description of the systems that make up the question at hand – this allows an economist to ask counterfactual questions about what would happen if the policy introduced was much larger.

However, it isn’t all one way. Models should in turn be reevaluated if a robust body of RCT evidence suggests that, for a given set of conditions, the model’s results are false. RCTs provide the pieces of evidence that models should be able to replicate, while models provide a framework for understanding what can’t be measured and how other, counterfactual, policy changes will work.

Examples of policy implementations (treatments):

To clarify, let’s talk about specific examples of how the RCT can be used.

Minimum wage and labour market

Let’s consider the example of a minimum wage increase and labour market outcomes. Card and Krueger (1993) found that a minimum wage increase in New Jersey led to an increase in employment in that state relative to a neighbouring state (Pennsylvania), where the same policy was not applied.
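
A comparison of this kind, between a state that changed its policy and one that did not, is usually computed as a difference-in-differences: the change in the treated state minus the change in the comparison state. The sketch below shows only the arithmetic. The function name is hypothetical and the employment figures are made up for illustration; they are NOT the numbers reported by Card and Krueger.

```python
def diff_in_diff(treated_before, treated_after, control_before, control_after):
    """Change in the treated group minus change in the comparison group."""
    return (treated_after - treated_before) - (control_after - control_before)

# Made-up average employment per store; NOT figures from the paper.
effect = diff_in_diff(
    treated_before=20.0, treated_after=21.0,   # state that raised the minimum wage
    control_before=23.0, control_after=21.5,   # neighbouring state with no change
)
print(effect)
```

Subtracting the comparison state's change is what removes shocks common to both states, under the assumption that the two would otherwise have followed the same trend.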

Now, if we take this result and generalise it to the population level, saying that an increase in the minimum wage will lead to an increase in the employment rate, we are making a mistake. Why? Because the same increase in the minimum wage in all states would have different impacts due to the composition of those states, the overall change in prices in the economy, and the capital structure and industries that are viable across the US economy.

However, the study showed there were real shortcomings in models that could ONLY indicate that an increase in the minimum wage could reduce employment. This helped to generate a literature that has more carefully considered the role of minimum wages given the potential for market power and strategic interaction in the market for low wage workers.

What is the solution then?

In Deaton’s view, too much is being asked of RCTs, and people need to recognise how to “transport” the results to another context:

“More generally, demonstrating that a treatment works in one situation is exceedingly weak evidence that it will work in the same way elsewhere; this is the ‘transportation’ problem: what does it take to allow us to use the results in new contexts, whether policy contexts or in the development of theory?
It can only be addressed by using previous knowledge and understanding, i.e. by interpreting the RCT within some structure, the structure that, somewhat paradoxically, the RCT gets its credibility from refusing to use. If we want to go from an RCT to policy, we need to build a bridge from the RCT to the policy.”

Deaton’s concern, which is reasonable, is that RCTs are treated as a sole source of truth. But such a focus isn’t just misleading, it would be bad science.

Card and Krueger’s paper did not tell us that a higher minimum wage would increase employment. It taught us that reality is complicated, and that the evaluation of policy must be based on trying to understand how this works, using both evidence and theory. Duflo, Kremer, and Banerjee similarly see the importance of both. In her article “The Economist as Plumber”, Duflo notes:

“However, because the economist-plumber intervenes in the real world, she has a responsibility to assess the effects of whatever manipulation she was involved with, as rigorously as possible, and help correct the course: the economist-plumber needs to persistently experiment, and repeat the cycle of trying something out, observing, tinkering, trying again”

Deaton’s concern is that people will experiment and measure without ever trying to model and understand what they are doing – thereby generating a stream of published studies but no understanding. Those that are more positive about the RCT revolution instead see such experimentation as part of this very iterative process that helps to describe the “transport” problem that Deaton is concerned about.

To sum it up

Predicting the result of a given policy involves an implicit model, irrespective of the number of RCTs that have been run. However, these RCTs provide a discipline for any worthwhile predictive model: they provide the true stylised facts (if done properly) that a predictive model must be able to replicate to be credible.