Faculty Viewpoints

Why Hidden Populations Are So Hard to Count

Yale researchers Edward Kaplan and Jonathan Feinstein explain how widely accepted estimates have greatly undercounted the number of undocumented immigrants in the United States.

[Image: A map of the United States illustrating inflows and outflows of people.]
  • Jonathan S. Feinstein
    John G. Searle Professor of Economics and Management
  • Edward H. Kaplan
    William N. and Marie A. Beach Professor of Operations Research, Professor of Public Health & Professor of Engineering

The recent finding from the Pew Research Center that there are 10.7 million undocumented immigrants in the country has been widely cited in the media. But a closer look shows that this number is based on faulty analysis of broader population surveys and is certainly not a reliable basis for understanding the issue of illegal immigration. Our own research, using a methodology based on population flows into and out of the country, came to a startlingly different conclusion. We estimate there are around 22 million undocumented immigrants in the U.S.
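In rough terms, a flow-based estimate keeps a running tally of the population stock, adding estimated inflows and subtracting estimated outflows each year, rather than relying on what survey respondents choose to report. The sketch below is only an illustration of that bookkeeping; the starting stock and yearly flows are made-up numbers, not the inputs to our actual model, which draws on detailed data such as border crossings, visa overstays, deportations, and emigration.

```python
# Illustrative sketch of a flow-based (stock-and-flow) estimate.
# All numbers are hypothetical placeholders, not inputs from the actual model.

def update_stock(stock, inflow, outflow):
    """One year of bookkeeping: this year's stock plus arrivals minus departures."""
    return stock + inflow - outflow

stock = 3.5e6                   # hypothetical starting undocumented population
yearly_flows = [                # (inflow, outflow) pairs, persons per year, hypothetical
    (0.9e6, 0.30e6),
    (0.8e6, 0.35e6),
    (0.7e6, 0.40e6),
]

for inflow, outflow in yearly_flows:
    stock = update_stock(stock, inflow, outflow)

print(f"Estimated stock after {len(yearly_flows)} years: {stock:,.0f}")
```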

How can there be such a big discrepancy? The answer can be found in how the survey-based approaches handle the questions that respondents left blank.

The Pew data, for example, is based on U.S. Census counts and government surveys such as the American Community Survey. One question on that survey asks whether each individual in the household was born outside the United States. We know the number of legal immigrants in the country from other government records, so to calculate the number of undocumented immigrants, Pew simply subtracts the legal immigrants from the total foreign-born count, an approach known as the residual method. The logic is clear, but the survey numbers are riddled with uncertainty.
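To make the subtraction concrete, here is the residual calculation in miniature; the figures are hypothetical placeholders, not Pew’s actual inputs.

```python
# Minimal sketch of the residual method; the numbers are hypothetical placeholders.
survey_foreign_born = 44.0e6   # total foreign-born counted in the survey
legal_immigrants    = 33.3e6   # legal immigrants known from administrative records

undocumented_residual = survey_foreign_born - legal_immigrants
print(f"Residual estimate of undocumented immigrants: {undocumented_residual:,.0f}")
# Any foreign-born respondent who skips the question (or is imputed as U.S.-born)
# silently drops out of survey_foreign_born, shrinking the residual.
```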

“The Census is too blunt an instrument to reach a relatively small population that has an incentive to remain undetected.”

First of all, about 5% of households don’t respond to the survey at all. Furthermore, around 8% of those who respond skip the question about place of birth.

Combine the nonresponders with the question skippers, and you get nearly 13% of the population of the United States for whom we have no clear answer to the origin question (and that does not even include undocumented immigrants who misrepresent themselves as having been born in the U.S.). That’s approximately 40 million people, far more than the difference between Pew’s estimate and ours.

How do the survey-based estimates fill in those blanks? With a technique called “hot deck” allocation, which in essence matches each record missing an answer to the origin question with a record that is similar on its other answers and assumes the origin answer should match as well. So if you are a single male, age 33, working in construction, who skipped the origin question, you might be matched with another single male, age 33, working in construction, who was born in Cleveland.

This is a reasonable approach if we can assume that people skip the origin question at random, and thus resemble the broader population. But there are many reasons why undocumented immigrants in particular might want to skip this question; they might fear official retribution if they admit their status, for instance. They are not missing at random; they’re missing on purpose.
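A stripped-down version of hot-deck allocation might look like the following sketch. The matching variables and records are invented for illustration and are far simpler than what the Census Bureau actually uses.

```python
# Toy hot-deck imputation: fill a missing "foreign_born" answer by copying it
# from a "donor" record that matches on the other answers.
# Fields and records are invented for illustration only.

records = [
    {"sex": "M", "age": 33, "industry": "construction", "foreign_born": False},
    {"sex": "M", "age": 33, "industry": "construction", "foreign_born": None},  # skipped
    {"sex": "F", "age": 51, "industry": "health care",  "foreign_born": True},
]

def hot_deck_impute(records, match_keys=("sex", "age", "industry")):
    donors = [r for r in records if r["foreign_born"] is not None]
    for r in records:
        if r["foreign_born"] is None:
            # Find a donor that matches on the observed characteristics.
            for d in donors:
                if all(r[k] == d[k] for k in match_keys):
                    r["foreign_born"] = d["foreign_born"]  # copy the donor's answer
                    break
    return records

hot_deck_impute(records)
# The record that skipped the question is now imputed as U.S.-born,
# because its closest donor was born in the U.S.
```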

Basically, since undocumented immigrants are a small population with incentives not to answer, “hot deck” allocation will, with very high probability, assign the response “born in the USA!” to undocumented immigrants who participated in the survey but did not reveal their place of birth. This leads directly to a large undercount (and again, it ignores deliberate misrepresentation of place of birth).
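A back-of-the-envelope calculation, with entirely made-up skip rates, shows how this plays out: if undocumented respondents leave the question blank far more often than everyone else, and the blanks are mostly imputed as U.S.-born, the residual estimate falls well below the true number.

```python
# Hypothetical illustration of missing-not-at-random bias; every number here is invented.
true_undocumented  = 22.0e6   # suppose this were the true undocumented population
legal_foreign_born = 33.0e6   # legal immigrants, known from administrative records

skip_rate_undoc = 0.50        # hypothetical: half of undocumented respondents skip or misreport
skip_rate_legal = 0.05        # hypothetical: few legal immigrants skip

# Records that skip the question are mostly imputed as U.S.-born (the donor pool
# is overwhelmingly native-born), so they drop out of the foreign-born total.
observed_foreign_born = (true_undocumented * (1 - skip_rate_undoc)
                         + legal_foreign_born * (1 - skip_rate_legal))

residual_estimate = observed_foreign_born - legal_foreign_born
print(f"Residual estimate: {residual_estimate:,.0f} vs. true {true_undocumented:,.0f}")
# Prints roughly 9.35 million vs. 22 million: in this toy example, the nonrandom
# skipping alone cuts the estimate by more than half.
```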

The survey-based approach does try to compensate for this kind of undercounting, adding about 10% to its calculation of the undocumented population. But if you look carefully at this 10% adjustment, you find that it has been justified over the years largely by appeal to a small study performed in the Los Angeles area after the 2000 Census. Leaving aside whether response rates in one urban Western area can be treated as representative of the whole country, that study was plagued by the same problem of nonresponses from people who would prefer not to be found, leading us to believe the 10% figure is too low.

The bottom line is that the Census is too blunt an instrument to reach a relatively small population that has an incentive to remain undetected.

Why do we think our approach of measuring population flows into and out of the country is more likely to be right? We find supporting evidence in other immigration data. For instance, the survey data finds about 54% of the undocumented population is male. However, we know from government data that about 80% of individuals crossing the southern border are male. If you assume the survey is undercounting males and adjust to the 80% ratio, you come out with a total undocumented population very close to our estimate.
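One way to make that adjustment concrete (and it is only one of several reasonable ways) is to hold the surveyed female count fixed, rescale the male count until males make up 80% of the total, and see where the total lands. Treating the female count as accurate is a simplifying assumption for this sketch, not a claim from our paper.

```python
# Back-of-the-envelope sex-ratio adjustment.
# Simplifying assumption (for illustration only): the survey counts women
# accurately and only undercounts men.

pew_total = 10.7e6            # Pew's estimate of the undocumented population
male_share_survey = 0.54      # share of that population reported as male
male_share_border = 0.80      # share of southern-border crossers who are male

females = pew_total * (1 - male_share_survey)      # roughly 4.9 million women
# If women are really only 20% of the population, the implied total is:
adjusted_total = females / (1 - male_share_border)

print(f"Implied total after adjustment: {adjusted_total:,.0f}")  # roughly 24-25 million
```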

Immigration has become a hot-button political issue, and our work has been attacked and manipulated by both the left and the right. Some critics from the left have suggested that there is no issue with undocumented immigrants underreporting, but these are the same people who argue against adding a citizenship question to the U.S. Census form, ostensibly because such a question would dissuade undocumented immigrants from participating. Meanwhile, commentators on the right have argued that since our study showed millions more undocumented immigrants in the country, millions of them likely voted in the election. Of course, if the reason our numbers are so much higher is that undocumented immigrants underreport, it seems highly unlikely that those same people would register to vote. Our interest is only in reaching an honest assessment of the situation.

When we started studying this issue, we were surprised to find that we were the first people to challenge the residual method in the last 20 years. The difficulty of measuring hidden populations is very real. Policymakers and scholars interested in issues related to the homeless or drug users face similar challenges in getting reliable baseline measures. If we build our understanding on questionable numbers or politically slanted interpretations, we risk fundamentally misrepresenting the nature of the challenges we want to address.
