Yale Study Finds Twice as Many Undocumented Immigrants as Previous Estimates

Generally accepted estimates put the population of undocumented immigrants in the United States at approximately 11.3 million. A new study, using mathematical modeling on a range of demographic and immigration operations data, suggests that the actual undocumented immigrant population may be more than 22 million.


Immigration is the focus of fierce political and policy debate in the United States. Among the most contentious issues is how the country should address undocumented immigrants. Like a tornado that won’t dissipate, arguments have spun around and around for years. At the center lies a fairly stable and largely unquestioned number: 11.3 million undocumented immigrants residing in the U.S. But a paper by three Yale-affiliated researchers suggests all the perceptions and arguments based on that number may have a faulty foundation; the actual population of undocumented immigrants residing in the country is much larger than that, perhaps twice as high, and has been underestimated for decades.

Using mathematical modeling on a range of demographic and immigration operations data, the researchers estimate there are 22.1 million undocumented immigrants in the United States. Even using parameters intentionally aimed at producing an extremely conservative estimate, they found a population of 16.7 million undocumented immigrants. 


Read the study: The Number of Undocumented Immigrants in the United States: Estimates Based on Demographic Modeling with Data from 1990 to 2016 

The results, published in PLOS ONE, surprised the authors themselves. They started with the extremely conservative model and expected the results to be well below 11.3 million. 

“Our original idea was just to do a sanity check on the existing number,” says Edward Kaplan, the William N. and Marie A. Beach Professor of Operations Research at the Yale School of Management. “Instead of a number which was smaller, we got a number that was 50% higher. That caused us to scratch our heads.”

Jonathan Feinstein, the John G. Searle Professor of Economics and Management at Yale SOM, adds, “There’s a number that everybody quotes, but when you actually dig down and say, ‘What is it based on?’ you find it’s based on one very specific survey and possibly an approach that has some difficulties. So we went in and just took a very different approach.”

The 11.3 million number is extrapolated from the Census Bureau’s annual American Community Survey. “It’s been the only method used for the last three decades,” says Mohammad Fazel‐Zarandi, a senior lecturer at the MIT Sloan School of Management and formerly a postdoctoral associate and lecturer in operations at the Yale School of Management. That made the researchers curious—could they reproduce the number using a different methodology?

The approach in the new research was based on operational data, such as deportations and visa overstays, and demographic data, including death rates and immigration rates. “We combined these data using a demographic model that follows a very simple logic,” Kaplan says. “The population today is equal to the initial population plus everyone who came in minus everyone who went out. It’s that simple.”
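The accounting Kaplan describes can be sketched in a few lines of Python. This is only an illustration of the inflow-minus-outflow logic, not the study's code: the function name `project_population` and every number below are hypothetical placeholders.

```python
def project_population(initial, inflows, outflows):
    """Return the population after each year: previous + inflow - outflow."""
    population = initial
    trajectory = []
    for came_in, went_out in zip(inflows, outflows):
        population = population + came_in - went_out
        trajectory.append(population)
    return trajectory


# Hypothetical placeholder values in thousands of people, NOT from the study.
initial_population = 3500
yearly_inflows = [1000, 1200, 900]
yearly_outflows = [400, 500, 600]

trajectory = project_population(initial_population, yearly_inflows, yearly_outflows)
print(trajectory)  # [4100, 4800, 5100]
```

Each year's total feeds the next, which is why the choice of starting population matters for every later year.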

While the logic is simple—tally the inflows and outflows over time—actually gathering, assessing, and inserting the data appropriately into a mathematical model isn’t at all simple. Because there is significant uncertainty, the results are presented as a range. After running 1,000,000 simulations of the model, the researchers’ 95% probability range is 16 million to 29 million, with 22.1 million as the mean. 
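A rough sense of how repeated simulation turns uncertain inputs into a mean and a 95% range can be had from the sketch below. It assumes, purely for illustration, that the starting population, yearly inflow, and yearly outflow rate are each drawn from a uniform distribution. Those distributions, the function names, and the run count are hypothetical and are not the study's parameters.

```python
import random
import statistics


def simulate_once(rng, years=26):
    """One simulated run; every distribution here is a hypothetical placeholder."""
    population = rng.uniform(2.0, 5.0)     # uncertain starting population (millions)
    inflow = rng.uniform(0.5, 1.5)         # uncertain yearly inflow (millions)
    outflow_rate = rng.uniform(0.02, 0.06) # uncertain share leaving each year
    for _ in range(years):
        population = population + inflow - outflow_rate * population
    return population


def summarize(n_runs=10_000, seed=0):
    """Return (mean, lower, upper), where lower/upper bracket the central 95% of runs."""
    rng = random.Random(seed)
    results = sorted(simulate_once(rng) for _ in range(n_runs))
    mean = statistics.fmean(results)
    lower = results[n_runs * 25 // 1000]       # 2.5th percentile
    upper = results[n_runs * 975 // 1000 - 1]  # 97.5th percentile
    return mean, lower, upper


mean, lower, upper = summarize()
print(f"mean={mean:.1f}M, 95% range=({lower:.1f}M, {upper:.1f}M)")
```

The wider the uncertainty in the inputs, the wider the resulting range, which is the effect the researchers describe.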

Notably, the upper bound of the traditional survey approach, which also produces a range, doesn’t overlap with the lower bound of the new modeling method. “There really is some open water between these estimates,” Kaplan says. He believes that means the differences between the approaches can’t be explained by sampling variability or annual fluctuations. 

There are key areas of agreement between this paper and the existing survey numbers. Both methods found that the greatest growth of the undocumented population happened in the 1990s and early 2000s. Both found that the population size has been relatively stable since 2008. 

“The trajectory is the same. We see the same patterns happening, but they’re just understating the actual number of people who have made it here,” says Fazel‐Zarandi. In his view, that suggests the survey method doesn’t effectively reach a group with incentives to stay undetected. “They are capturing part of this population, but not the whole population.”

Chart of various estimates of undocumented immigrant population over time

Kaplan and Feinstein have worked on this type of problem for many years. “The analysis we’ve done can be thought of as estimating the size of a hidden population,” Kaplan says. “People who are undocumented immigrants are not walking around with labels on their foreheads. Neither are populations of homeless people, neither are populations of drug users, and neither are populations of terrorists. Yet for policy, it is very important to know the size of these hidden populations because that sets the scale of the problem in each of these different policy areas.”

Invariably, such work requires scholars to find ways to work with incomplete data. Feinstein says, “I see this project as filling in the pieces of a jigsaw puzzle. You’re taking the data from different places and bringing it together in a way that’s logical and helps you estimate something important, but not all those pieces have all the information you’d like.”

In fact, some of the relevant data sets have only recently become available, so this approach might not have been possible for this particular puzzle even a few years ago. Fazel‐Zarandi notes that 2015 was the first time that data on visa overstays was collected by the Department of Homeland Security.

Bringing all the different sources of data together is arduous. “There’s a lot hidden under the hood, so to speak,” Feinstein says. The key components—inflows and outflows—are each made of numerous subcomponents. Each subcomponent must be aggregated from different sources, evaluated for its specific level of certainty, then incorporated into the mathematical model in a consistent way. 

“There are very few numbers we can point to and say this is carved in stone,” Kaplan adds. “We allow for all of that variability in the modeling, which complicates everything and explains why we get such a wide range of possible outcomes.” 

He continues, “How many people are actually being apprehended at the border? That’s hard data. That’s reported each year.” From there it’s possible to reverse engineer an estimate of how many people must have tried to cross the border. “This kind of ‘backwards logic’ is common in models of this form.” Kaplan notes that in the early days of the HIV/AIDS crisis, the number of new HIV infections was reverse engineered from the number of new AIDS cases.
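The backwards logic can be illustrated with a deliberately simplified sketch. It assumes a single, known probability that any given crossing attempt ends in apprehension, and it ignores repeat attempts. That probability, the function names, and the numbers are hypothetical and do not reproduce the paper's model.

```python
def estimate_attempts(apprehensions, apprehension_probability):
    """Infer total crossing attempts from the observed number of apprehensions."""
    if not 0 < apprehension_probability <= 1:
        raise ValueError("apprehension_probability must be in (0, 1]")
    return apprehensions / apprehension_probability


def estimate_successful_entries(apprehensions, apprehension_probability):
    """Attempts that were not apprehended, under the same assumption."""
    return estimate_attempts(apprehensions, apprehension_probability) - apprehensions


# Hypothetical placeholder values, NOT from the study.
observed_apprehensions = 500_000
assumed_probability = 0.5

attempts = estimate_attempts(observed_apprehensions, assumed_probability)
entries = estimate_successful_entries(observed_apprehensions, assumed_probability)
print(attempts, entries)  # 1000000.0 500000.0
```

Because the observed count is divided by the assumed probability, a lower assumed apprehension rate implies more unobserved entries for the same number of apprehensions.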



The paper examines the years 1990 to 2016. The initial population is a key component of all subsequent years’ calculations; Fazel‐Zarandi explains that the team chose 1990 as a starting point because it fell between the amnesty President Reagan offered for undocumented immigrants in 1986 and the rapid growth of illegal immigration in the 1990s.

While the findings are startling, they aren’t describing a new situation. “We wouldn’t want people to walk away from this research thinking that suddenly there’s a large influx happening now,” says Feinstein. “It’s really something that happened in the past and maybe was not properly counted or documented.” 

Kaplan adds, “What we’re saying is the number has been higher all along.” 

While immigration is a hot-button topic, the researchers are adamant that their aim is to provide information. “Of course, our findings will get pulled and tugged in many ways, but our purpose is just to provide better information,” Feinstein says. “This paper is not oriented towards politics or policy. I want to be very clear: this paper is about coming up with a better estimate of an important number.”

How might this research inform the debate around immigration? Some might argue that the presence of twice as many undocumented immigrants justifies tougher immigration enforcement. 

“One of the most common arguments in favor of a tougher immigration policy is that undocumented immigrants are coming with a lot of criminality,” Kaplan notes. But paradoxically, the new findings may undercut that argument. He points out that previous studies, based on the widely accepted total of 11.3 million undocumented immigrants, found that the rate of serious crimes committed by these immigrants is lower than for U.S. citizens. The new findings suggest that the rate is even lower than previously believed: “You have the same number of crimes but now spread over twice as many people as was believed before, which right away means that the crime rate among undocumented immigrants is essentially half whatever was previously believed.”
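Kaplan's arithmetic can be checked with a short calculation. The two population figures are the ones quoted in this article. The crime count is a hypothetical placeholder, and it cancels out of the ratio, so any fixed count gives the same result.

```python
def crime_rate_per_100k(crimes, population):
    """Crimes per 100,000 people."""
    return crimes / population * 100_000


crimes = 50_000                # hypothetical placeholder, NOT from any study
survey_estimate = 11_300_000   # widely cited survey-based population estimate
model_estimate = 22_100_000    # mean estimate from the new modeling approach

old_rate = crime_rate_per_100k(crimes, survey_estimate)
new_rate = crime_rate_per_100k(crimes, model_estimate)
ratio = new_rate / old_rate

print(f"new rate is {ratio:.2f} times the old rate")  # 0.51
```

The ratio depends only on the two population estimates (11.3 / 22.1), which is why the rate comes out at roughly half.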

With respect to the idea that undocumented immigrants take job opportunities from citizens, Kaplan points to different possible interpretations of the new findings. “The fact that there are actually more people here than we thought before might explain that, but you can also look at it the other way: whatever job displacement there has been happened with twice as many undocumented immigrants as we thought. That causes you to rethink just how much pressure there is.”

As is typical with academic work, this finding is not an endpoint. Feinstein says, “Hopefully, these results spur further thinking.”
 
