Elizabeth Nelson

Accurate Uncertainty

by J. Edward Anthony

Most of us would say a little luck never hurts. But luck is a problem for economists, pollsters, and anyone who wants to infer information about a large population based on a limited sample, says Douglas L. Miller, Policy Analysis and Management/Economics.

Imagine a drug trial of 50 patients. Half the patients are randomly assigned to a group that receives an experimental drug, while the other half receives a placebo. If 15 patients who receive the drug show improvement, but only 11 of those given the placebo do, how confident can we be that the new medication is effective? What’s the chance that, by the luck of the draw, the two healthiest patients fell into the group that received the new medication? If researchers could increase their sample size, might they determine that the medication has no effect? Or even that it does more harm than good?
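One way to put a number on how much room luck leaves in a trial like this is a pooled two-proportion z-test. The sketch below uses the article's hypothetical counts (15 of 25 versus 11 of 25); the test choice and code are illustrative, not drawn from Miller's work:

```python
import math

def two_prop_ztest(x1, n1, x2, n2):
    """Pooled two-proportion z-test: returns (z statistic, two-sided p-value)."""
    p1, p2 = x1 / n1, x2 / n2
    p = (x1 + x2) / (n1 + n2)                      # pooled rate under the null
    se = math.sqrt(p * (1 - p) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via the error function)
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

z, p_value = two_prop_ztest(15, 25, 11, 25)        # 15 vs. 11 improvers
print(f"z = {z:.2f}, p = {p_value:.2f}")           # z = 1.13, p = 0.26
```

With these counts the gap is well within what chance alone could produce: roughly a quarter of identical trials of a completely ineffective drug would show a difference at least this large.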

For economists like Miller, luck sometimes poses even greater challenges than it does for medical researchers. Miller is interested in measuring the effects of public policies and social programs on health and other life outcomes. In that context, it’s often difficult or impossible to manipulate experimental variables and control for the many factors that might influence an outcome.

Assessing Real-World Complexity

Consider efforts to assess Head Start, the early childhood education program, based on the life outcomes of its participants as adults. Researchers in the 1990s struck on the idea of comparing children within the same family—some who attended Head Start and some who did not—in an attempt to control for the multitude of important factors, such as location, household income, and parental education, that are known to affect life outcomes.

In a paper that began as a project for graduate students, Miller and his collaborators discovered that using within-family comparisons can skew results. Because within-family comparisons are more likely to draw on data from large families—since large families are most likely to have at least one child who did and one child who did not participate—the estimates turn out to be more representative of large families than of the population as a whole. Miller and his collaborators worked out a way to correct the main estimate by giving greater weight to the smaller families used in the comparison. But doing so also increased the possible role of luck in producing the estimate.
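The reweighting logic can be shown with a toy calculation. All numbers below are invented for illustration, and this is a sketch of the idea, not the paper's actual estimator: if the sibling-comparison sample over-represents large families, restoring each family-size group to its population share shifts the average effect toward what smaller families experience.

```python
# Hypothetical family-level effects from a sibling comparison, by family size.
# The comparison sample over-represents large families, so the raw average
# leans toward their (here, smaller) effect.
sample = {                  # family size -> (families in sample, mean effect)
    2: (10, 0.30),
    4: (40, 0.10),
}
population_share = {2: 0.7, 4: 0.3}      # assumed population mix of sizes

# Unweighted average: dominated by the many large families in the sample
n_total = sum(n for n, _ in sample.values())
unweighted = sum(n * effect for n, effect in sample.values()) / n_total

# Reweighted average: each family-size group restored to its population share
reweighted = sum(population_share[size] * effect
                 for size, (_, effect) in sample.items())

print(f"unweighted = {unweighted:.2f}, reweighted = {reweighted:.2f}")
```

The correction moves the estimate, but because it leans on the relatively few small families in the sample, it also widens the role chance can play—the trade-off described above.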

“When we look at an estimate,” says Miller, “we want to know how large a role chance or luck may have played in producing it. Of course, we’d rather avoid luck altogether, but that’s often beyond our control. What we can do is carefully assess how much of our main estimate may be the product of luck.”

One of Miller’s achievements has been to improve statistical tools so that researchers can more accurately calculate the role of luck—or what Miller calls statistical uncertainty—when producing estimates about a larger population based on a limited sample.

Businesses, fundraisers, activists, and legislators all rely on inferences drawn from limited data sets, and to make informed decisions they need an accurate measure of the uncertainty underlying an estimate. “Our sense of statistical uncertainty is critical in decision making,” says Miller. “If, for example, our methods for calculating uncertainty are faulty and make our measurement of statistical uncertainty too low, we can wind up being falsely confident and draw conclusions that are unsupported by the evidence.”

Statistical uncertainty has legal implications as well. With accurate methods of calculating statistical uncertainty, a researcher can, for example, analyze evidence of race or gender bias and then evaluate an employer’s claim that a pay gap is the result of chance and not systemic discrimination.

Cluster-Robust Inference

One challenge arises when clusters of data points are subject to influence by one or more unobserved factors. “There’s a whole set of assumptions that we teach when we teach regressions. One of these assumptions is that our observations are independent of each other,” says Miller. “When each data point is independent, each observation carries with it more or less the same amount of information. The estimate of uncertainty is premised on the independence of data points. If your regression model assumes that your observations are all independent when they really aren’t, your estimate itself might be fine—but your uncertainty around that estimate will show false precision. It will make you think you can be more confident in the estimate than you should be.”

Imagine a study that collects data from 20 cities to determine how increasing the sales tax affects household spending. If one city in the study relies heavily on tourism, and the tourism industry experiences a sudden slump, the households in that city would be a cluster.

“Because they share that common unobserved factor, they’re not going to be as independent as you might think they are,” says Miller. “They don’t provide extra information. It’s not as if you’re getting two different observations from across the country where, in some sense, those things wash out. They don’t wash out if they’re within the same community.”
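A small simulation makes the point concrete. The setup is hypothetical: households within each city share one unobserved shock, so a naive standard error that treats every household as an independent draw understates the uncertainty, while a simple cluster-aware alternative (treating each city as the effective unit of observation) does not.

```python
import random
import statistics

random.seed(42)
N_CITIES, HH_PER_CITY = 20, 50

# Hypothetical data: each city has one unobserved shock (e.g., a tourism
# slump) shared by all of its households.
spending, city_means = [], []
for _ in range(N_CITIES):
    shock = random.gauss(0, 1)
    households = [shock + random.gauss(0, 1) for _ in range(HH_PER_CITY)]
    spending.extend(households)
    city_means.append(statistics.mean(households))

# Naive SE: treats all 1,000 households as independent observations
naive_se = statistics.stdev(spending) / len(spending) ** 0.5

# Cluster-aware SE: treats each city as the effective observation
cluster_se = statistics.stdev(city_means) / N_CITIES ** 0.5

print(f"naive SE = {naive_se:.3f}, cluster-aware SE = {cluster_se:.3f}")
```

The cluster-aware standard error comes out several times larger: the extra households within a city "don't wash out" the shared shock, so they add much less information than the naive count of 1,000 observations suggests.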

Miller has been creating methods to better calculate statistical uncertainty when observations are clustered. “Researchers were mistakenly missing out on accounting for some ways that luck could be in play in their estimates,” he says. “As a consequence, they were treating their estimates with too much confidence.”


Accurately calculating the role of luck when relying on related data points is called cluster-robust inference. Getting it right used to require many more clusters—collecting data from 50 cities, for example, instead of 20. Miller devised a way to improve measures of uncertainty with fewer clusters. More recently, Miller advanced a method for dealing with different types of clusters simultaneously.

“Maybe I want to think about the data being related in terms of the community, and also, separately, in terms of occupation,” says Miller. “I might need to take into consideration that all firefighters across communities have experienced a common economic shock because of something happening in terms of their occupation. As a researcher, I need to allow for dependence across two different dimensions of relatedness—community and occupation. Using the traditional cluster-robust toolkit, you had to choose. You could only account for one or the other. The new technique enables researchers to account for multiple dimensions.”
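The construction behind allowing two dimensions of dependence at once can be sketched for the simplest possible target, the variance of a sample mean: cluster by the first dimension, add clustering by the second, and subtract the intersection so the overlap is not double-counted. The data and group assignments below are invented for illustration; real two-way cluster-robust estimators apply the same idea to regression coefficients.

```python
import random
from collections import defaultdict

def clustered_var_of_mean(values, groups):
    """Cluster-robust variance of the sample mean: sum residuals within each
    group, square the group sums, add them up, and divide by n^2."""
    n = len(values)
    mean = sum(values) / n
    sums = defaultdict(float)
    for v, g in zip(values, groups):
        sums[g] += v - mean
    return sum(s * s for s in sums.values()) / n ** 2

random.seed(7)
n = 600
community = [i % 20 for i in range(n)]         # 20 communities
occupation = [i % 6 for i in range(n)]         # 6 occupations
comm_shock = [random.gauss(0, 1) for _ in range(20)]
occ_shock = [random.gauss(0, 1) for _ in range(6)]
y = [comm_shock[c] + occ_shock[o] + random.gauss(0, 1)
     for c, o in zip(community, occupation)]

# Inclusion-exclusion across the two dimensions of relatedness:
# by community, plus by occupation, minus their intersection
v_comm = clustered_var_of_mean(y, community)
v_occ = clustered_var_of_mean(y, occupation)
v_both = clustered_var_of_mean(y, list(zip(community, occupation)))
v_twoway = v_comm + v_occ - v_both

print(f"two-way cluster-robust variance of the mean: {v_twoway:.4f}")
```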

Miller is currently working on methods of cluster-robust inference for data that are organized into dyads. “One data set might involve trade between the United States and the United Kingdom, another between the United Kingdom and China, and another between Germany and Denmark,” says Miller. “We need to account for the fact that the United States–United Kingdom and United Kingdom–China observations are possibly related, but are independent from the Germany–Denmark dyad. That could have a real impact on your measure of uncertainty.”
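A minimal sketch of the dyadic case, again for the variance of a simple mean and with made-up trade values: cross-products of residuals are kept only for pairs of observations whose dyads share a country, while fully disjoint dyads are treated as independent. Actual dyadic-robust estimators for regression are more involved; this only illustrates the dependence structure.

```python
def dyadic_var_of_mean(values, dyads):
    """Dyadic-robust variance of the sample mean: keep cross-products of
    residuals only for pairs of dyads that share a member."""
    n = len(values)
    mean = sum(values) / n
    resid = [v - mean for v in values]
    total = 0.0
    for i in range(n):
        for j in range(n):
            if set(dyads[i]) & set(dyads[j]):    # share a country?
                total += resid[i] * resid[j]
    return total / n ** 2

# Invented trade figures for three dyads from the example
trade = {("US", "UK"): 2.0, ("UK", "CN"): 1.5, ("DE", "DK"): 0.7}
dyads, values = list(trade.keys()), list(trade.values())

v_dyadic = dyadic_var_of_mean(values, dyads)
# Naive variance: treats all three dyads as mutually independent
v_naive = sum((v - sum(values) / 3) ** 2 for v in values) / 9
print(f"naive = {v_naive:.4f}, dyadic-robust = {v_dyadic:.4f}")
```

Here the US–UK and UK–CN residuals are positively related through the shared UK partner, so the dyadic-robust variance comes out larger than the naive one, just as the quote describes.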

Economics—Looking at Wellbeing

Miller started graduate school with the intention of studying the economy of developing nations. It was the late 1990s, and the field of economics was expanding. “Through peer and adviser influence, I got on a different trajectory,” he says. Increasingly, development economists were looking beyond economic outcomes, such as earnings and consumption, to consider people’s wellbeing and life chances.

“Part of what was exciting about that moment was expanding the set of important life outcomes that economists wanted to study,” says Miller. “I started looking at questions of health and economic determinants of health. That led to one branch of the work I do today.”

There’s a science to developing better statistical tools, says Miller, and an art to imagining the comparisons that can put the impact of a policy or program into sharp relief. Miller likes working in the space between—knowing the strengths and limitations of statistical tools and optimizing them for the job at hand. “We need to pay attention to what’s really happening in the world—the facts on the ground. But often the facts we’re interested in, it’s not like a physics experiment where you can observe a clean A-to-B connection. There’s a lot of messiness in the world around us. Statistics lets us seek out patterns that are hidden underneath the messiness.”