Adaptive experiments for policy research

November 30, 2020

Maximilian Kasy and Anja Sautmann

The goal of policy research is often to identify which policy – out of a range of possible design or implementation options – will have the greatest impact on the outcome in question. Such efforts may require large sample sizes with potentially long periods of observation, as well as painstaking and costly data collection. This column proposes a new experimental research design that fulfills the goal of identifying the best policy faster and with fewer observations than a ‘standard’ experiment. ‘Adaptive’ experimental designs like this, which focus on the policies that show the greatest promise as the evidence accumulates, have the advantage that even as an experiment is in progress, a rising share of participants benefit from policies that emerge as the most successful.

One of the biggest methodological shifts in public policy research has been the rise of experiments – ‘randomized controlled trials’ (RCTs) – to measure the effects of social programs or policies. RCTs have been particularly influential in development economics.

When an academic researcher conducts an experiment to test a policy, the goal is typically precise measurement. This is best done by creating two groups of people that are as similar as possible, one of which (the ‘treatment’ group) receives the program and the other (the ‘control’ group) does not. An RCT can answer the question: does this program have a significant effect and what exactly is the effect?

But when a government or NGO conducts an experiment, measurement is not always their first goal. Instead, they are often most interested in quickly finding the best possible program variant among several available options.

They might want to learn, for example, what type of support is most effective in helping refugees to find work in their host country; or which form of outreach is most successful in engaging parents in their children’s schoolwork. In other words, they would like to answer the question: which version of the program will have the greatest effect and should therefore be implemented?

Take Precision Agriculture for Development (PAD), which provides a free agricultural extension service for smallholder farmers. For an enrollment drive in India with one million farmers, PAD wanted to learn as quickly as possible how to conduct enrollment calls effectively, so that farmers would not screen the calls without learning about the service. By contrast, PAD did not need precise estimates of the enrollment rates from these calls.

Our research provided a solution to PAD’s problem by conducting an ‘adaptive’ experiment. The idea is to carry out the experiment in several waves and to adapt the experiment after each wave, so that the learning goal is reached as quickly as possible.

In the first wave, the experiment for PAD looked just like a non-adaptive RCT. But from the second wave onwards, the algorithm we used, ‘exploration sampling’, began focusing on the call methods with higher response rates, in order to learn as much as possible about those treatment arms that are the most likely candidates for implementation.
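The wave-by-wave logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the code used in the PAD experiment: it assumes binary outcomes (call answered or not), a uniform Beta(1, 1) prior for each arm, and assignment shares proportional to p(1 - p), where p is the posterior probability that an arm is the best one. The function names, the Monte Carlo approximation, and the fallback for the degenerate case are all choices made for this sketch.

```python
import random


def prob_best(successes, trials, draws=5000, rng=None):
    """Monte Carlo estimate of the posterior probability that each arm is best.

    Assumes binary outcomes and an independent Beta(1, 1) prior per arm.
    """
    rng = rng or random.Random(0)
    wins = [0] * len(successes)
    for _ in range(draws):
        # One draw from each arm's Beta posterior; the largest draw "wins".
        sample = [
            rng.betavariate(1 + s, 1 + n - s)
            for s, n in zip(successes, trials)
        ]
        wins[sample.index(max(sample))] += 1
    return [w / draws for w in wins]


def exploration_shares(p_best):
    """Assignment shares proportional to p * (1 - p), normalised to sum to 1."""
    weights = [p * (1 - p) for p in p_best]
    total = sum(weights)
    if total == 0:
        # Degenerate case: every draw picked the same arm.
        # Falling back to p_best is an arbitrary choice for this sketch.
        return list(p_best)
    return [w / total for w in weights]


def run_waves(true_rates, wave_size, n_waves, draws=5000, seed=0):
    """Simulate an adaptive experiment; true_rates are hypothetical inputs."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes = [0] * k
    trials = [0] * k
    shares = [1 / k] * k  # first wave: equal split, like a standard RCT
    for _ in range(n_waves):
        arms = rng.choices(range(k), weights=shares, k=wave_size)
        for arm in arms:
            trials[arm] += 1
            successes[arm] += rng.random() < true_rates[arm]
        # Adapt: update the shares used for the next wave.
        p_best = prob_best(successes, trials, draws=draws, rng=rng)
        shares = exploration_shares(p_best)
    return successes, trials
```

Calling `run_waves([0.1, 0.2, 0.6], wave_size=90, n_waves=4)` with these made-up response rates shows the pattern described above: after the first wave, few further observations go to the clearly inferior arm. With only two arms, the p(1 - p) rule always gives an equal split, so the adaptive gains come from settings with three or more options.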

The idea of adaptive sampling is almost as old as the idea of randomized experiments. Adaptive designs have been used in clinical trials as well as in the targeting of online advertisements.

PAD’s objective was to identify the enrollment method with the highest success rate, so that it could be implemented at scale after the experiment. But algorithms for adaptive experimentation can cater to many different objectives.

For example, in the case of a job search support program for refugees in Jordan by the International Rescue Committee (IRC), the goal was to learn as much as possible about different interventions – here providing information sessions, counseling, or financial support – but at the same time, the IRC also wanted the best program arm to benefit as many refugees as possible. Researchers therefore implemented a hybrid adaptive algorithm that achieves better treatment outcomes than a pure RCT (by assigning more subjects to treatment arms that perform well) but also learns about each arm with high precision.
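One simple way to build such a hybrid, sketched here as an assumption rather than as the algorithm the IRC study used, is to mix two assignment rules: give each arm a share equal to its posterior probability of being the best arm, then blend that with an equal split so that every arm keeps receiving observations. The function name and the `explore_weight` parameter are hypothetical, and the default value is chosen only for illustration.

```python
def hybrid_shares(p_best, explore_weight=0.2):
    """Blend outcome-focused shares with an equal split across arms.

    p_best: posterior probability that each arm is best (should sum to 1).
    explore_weight: fraction of each wave spread equally across all arms
        (hypothetical default, for illustration only).
    """
    if not 0 <= explore_weight <= 1:
        raise ValueError("explore_weight must be between 0 and 1")
    k = len(p_best)
    return [(1 - explore_weight) * p + explore_weight / k for p in p_best]
```

For instance, with posterior probabilities of 0.7, 0.2, and 0.1 and a weight of 0.2, the shares come out at roughly 0.627, 0.227, and 0.147. Setting the weight to 1 recovers the equal split of a standard RCT, while a weight of 0 puts all the emphasis on the arms that currently look best.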

In the case of PAD, we showed that the exploration sampling algorithm leads to consistently better recommendations on which policy to implement than a standard RCT. PAD carried out exploration sampling with the phone numbers of 10,000 farmers in June 2019. They tested six different methods of calling: placing calls in the mornings or evenings, and alerting the farmer with a text message at different lead times.

Figure 1 shows the share of successful calls in the six treatment arms by the end of the experiment. Figure 2 illustrates what shares of each wave were assigned to the different treatment arms over time.

Figure 1. Share of successful calls in each treatment arm (out of 1).

Figure 2. Assignment shares of phone numbers (observations) to treatment arms over time. Shares add to 1 at each date.

Despite considerable variation, calling at 10am with a text message one hour ahead emerged as the most successful treatment early on. As a result, a greater share of each wave was assigned to that call option. By the end, nearly 4,000 out of the 10,000 phone numbers were assigned to this arm, as seen from the aggregate assignment numbers in Figure 3.

Figure 3. Aggregate number of observations (phone numbers) assigned to each treatment arm at the end of the experiment. Nearly 4,000 phone numbers were assigned to the arm which had the highest call response rates: calling at 10am with a text message sent one hour ahead of time.

The key insight of adaptive experimentation is that splitting the sample into equal-sized treatment and control groups, as a standard RCT does, may not be the best thing to do once we have learned a little more about the different treatment arms. Adjusting the assignment as evidence accumulates has great potential to improve how experiments are done in practice.

A welcome feature of many adaptive procedures is that more participants benefit from the best treatment options, facilitating the ethical conduct of experiments in development and policy research. Moreover, the learning process is completed faster and with smaller sample sizes. At the same time, using an adaptive algorithm for learning ensures that the resulting policy decision is still replicable and fully empirically justified.