Is Big Data Bigger than Its Own Hype?

Nicholas A. Christakis
Sterling Professor of Social and Natural Science, Internal Medicine & Biomedical Engineering
Meagen Eisenberg
Chief Marketing Officer, MongoDB
Harlan M. Krumholz
Harold H. Hines, Jr. Professor of Medicine and Professor in the Institute of Social Policy Studies, of Investigative Medicine, and of Public Health (Health Policy); and Director of the Yale Center for Outcomes Research and Evaluation

July 03, 2017

“Big data will not solve a single problem for us,” said Yale’s Harlan Krumholz. It’s not that he is a big data skeptic, though to date, he sees more hype than results, at least in medicine, his field of expertise. What Krumholz wants to make clear is that data alone doesn’t do anything—even when compiled into astoundingly large data sets on our most powerful computers. Fundamentally what matters will be the questions we ask and the insights that come from this new way of seeing.

Put another way, Krumholz, Harold H. Hines, Jr. professor of medicine and director of the Center for Outcomes Research and Evaluation at the Yale School of Medicine, likened big data to the microscope, a device that didn’t do anything on its own but enabled a new way of seeing. The microscope was a tool that enabled a process that produced germ theory. Krumholz’s view of big data’s ultimate potential is similar: “It will create immense opportunities.” If we do our part to use big data effectively, Krumholz said, it could launch a “new age of discovery.”

His remarks were part of the June 27, 2017, Yale SOM webinar “Will Big Data Change the World?” sponsored by the Office of Development & Alumni Relations. The panel also included Meagen Eisenberg ’04, chief marketing officer for MongoDB, and Nicholas Christakis, the Sol Goldman Family Professor of Social and Natural Science at Yale University.

All three experts highlighted the transformative potential of big data, the remarkable rate of change, and the complexity of developing effective applications. Eisenberg underscored that big data “is not a single technology, technique, or initiative. It’s a practice applied across different fields.” She also pointed to a number of effective big data tools already in use. One is Chicago’s WindyGrid, an open-source dashboard for managing a city. The geographic information system application creates a unified view of all city operations that can be overlaid on a municipal map. Another is MetLife’s Wall, which lets customer service representatives help the company’s 100 million-plus clients more effectively. Instead of fielding a question and then deciding which of more than 70 different administrative systems to access, the rep can reach all of the resources through a single interface that is as intuitive as a Facebook page.

Both examples work by drawing together existing flows of data to make them more useful. Christakis pointed out that big data is “big” in part because the cost of collecting it has plummeted. Digitization and the connectivity created by sensors, the internet, and mobile technology created a new playing field. A few decades ago, the most ambitious studies drew on a few thousand subjects and looked at a limited number of variables at a few points in time. “Because of the massive passive data collection that is possible in so many industries those are all exploding,” Christakis said. “You might have observations on millions or billions of people and you might have thousands of variables about each individual and you might have them at levels of temporal resolution that are just mind-boggling, literally to the second sometimes.”

The passive collection of previously unimaginably detailed data sets has helped drop the cost of experimentation and testing. At the same time, processing capacity has improved fast enough that what was once a challenge for supercomputers is now feasible on widely available machines.

In his own work, which focuses on connection and contagion in social and biological networks, Christakis has been able to show that popular people on Twitter were using hashtags on average nine days before they went viral. The work may apply to contagion phenomena in politics, consumer activity, stocks, or epidemics. Much of the research is only possible as a result of big data.

Christakis and Krumholz compared big data to the invention of the telescope or the microscope. They pointed to a similar potential to see things differently, which allows questioning of old assumptions and generation of new ideas. Krumholz said it could be “a re-engineering of the way that we learn about our world.”

He noted that medicine has lagged behind in moving big data from hype to reality in part because it is still often using “small data.” There are a number of reasons for that, but privacy around medical records and data is a significant issue.

All of the panelists noted the challenge of privacy and security issues that come with big data. The technology, tools, and baseline practices are moving forward very quickly in many sectors using rules that are often buried in unread terms of service agreements. That allows for progress, but everything may be moving too fast for a full public dialogue or thoughtful policy framework. Krumholz noted one result in medicine: “While people have trouble getting their records and data, it turns out that companies get ahold of our data all the time through business associate agreements.” He added, “I believe that people ought to have more agency over their own data.…That’s a long way from where we are now.”

Work in the field tends to be highly collaborative. Open-source tools are commonplace. But even on specific projects there’s a need for multidisciplinary teams. “We’re getting smarter about how to process the data with algorithms,” said Eisenberg. But she underscored that data scientists aren’t going to transform the world on their own. “You have to know the questions to ask.” And that requires domain expertise.

Building on the point, Krumholz added, “Where there’s success is where there’s a clear idea of who the end user is and how they are likely to use the information.” That then directs the technical aspects of data analysis as well as the translation of the results into forms that are understandable and actionable.

Ideally, the outcome is tracked, throwing off new data that can be assessed and turned into incremental improvement, leading to a virtuous spiral. But the experts warned that, as with any technology, our choices determine whether it is used for good or for ill.

Department: Faculty Viewpoints
