Machine Learning Model Extracts Insights from Customer Reviews

Vast amounts of potentially useful information about consumer opinions are captured in written reviews, but this unstructured data goes largely unanalyzed. A new study co-authored by Yale SOM’s K. Sudhir uses natural-language analysis to learn from what customers are saying and to infer meaning from what remains unsaid.


  • K. Sudhir
    James L. Frank ’32 Professor of Private Enterprise and Management, Professor of Marketing & Director of the China India Insights Program

A collaboration with Insights Review, the online publication of the Yale School of Management's Center for Customer Insights.

By Dylan Walsh

Feedback is the lifeblood of many consumer-facing companies.

For such companies, the five-star survey has become a universal tool of self-improvement. How satisfied are you with your recent purchase of the product or service? How likely are you to return? How enjoyable was the shopping and checkout experience? And, of course, how likely are you to recommend us to a friend?

Answers to these questions arrive neatly quantified, ready to be graphed and analyzed. But the process captures a narrow and artificial sentiment, without the detail and subtlety expressed in written customer feedback like online reviews. Such reviews are one of many examples of unstructured data: things like writing and images that can’t be readily packaged in a spreadsheet. Estimates suggest, in fact, that 80-90% of business data is represented unsystematically, and so less than 1% of it ever gets analyzed.

A new study from Yale SOM’s K. Sudhir tries to make order out of that chaos. “What we’ve tried to do is to very broadly get at that data and develop insights that might be useful to companies,” he says. In collaboration with Ishita Chakraborty, a PhD candidate at Yale SOM, and Minkyung Kim of UNC Chapel Hill, Sudhir used natural-language analysis to extract practicable information from written customer reviews. “We wanted to understand more clearly what is being talked about and convert that into a scoring scheme.”

Read the study: “Attribute Sentiment Scoring with Online Text Reviews: Accounting for Language Structure and Attribute Self-Selection”

The researchers collected nearly 30,000 Yelp reviews of restaurants, comprising a total of one million sentences. They then adapted state-of-the-art deep learning architectures in machine learning to automatically classify every sentence along two dimensions. First, does it discuss specific attributes of the restaurant like food, service, price, ambiance, or location? Second, for sentences that did discuss one of these attributes, how favorably did it label the attribute? These scores ranged from extremely positive to extremely negative.
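The two-dimensional labeling described above can be pictured as a simple data structure. The following is a minimal sketch, not the study's implementation: the deep learning classifiers are not reproduced, so the sentences arrive already labeled, and the 1-to-5 sentiment scale, the class name LabeledSentence, and the function attribute_scores are all assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean
from typing import Optional

# The five restaurant attributes named in the article.
ATTRIBUTES = ("food", "service", "price", "ambiance", "location")


@dataclass
class LabeledSentence:
    """One review sentence with hypothetical classifier output attached."""
    text: str
    attribute: Optional[str]  # None when the sentence discusses no attribute
    sentiment: Optional[int]  # assumed scale: 1 (very negative) to 5 (very positive)


def attribute_scores(sentences):
    """Average the sentiment per attribute; None marks an attribute never mentioned."""
    by_attribute = defaultdict(list)
    for s in sentences:
        if s.attribute is not None and s.sentiment is not None:
            by_attribute[s.attribute].append(s.sentiment)
    return {a: (mean(by_attribute[a]) if by_attribute[a] else None) for a in ATTRIBUTES}


# A made-up review: strong food, mediocre service, poor ambiance, silent on the rest.
review = [
    LabeledSentence("The pasta was outstanding.", "food", 5),
    LabeledSentence("Our server was fine, nothing special.", "service", 3),
    LabeledSentence("The room was loud and cramped.", "ambiance", 1),
    LabeledSentence("We came for my sister's birthday.", None, None),
]
scores = attribute_scores(review)
```

Attributes the review never mentions come back as None rather than as a score, which keeps silence distinct from a neutral rating.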

Prior work using natural-language algorithms has covered similar ground, but most of its success has come from a binary determination of whether a review is positive or negative. This project stands apart on several fronts. First, it proved particularly capable at parsing sentiment along specific restaurant attributes: for example, the same review may rank food quality highly, service as mediocre, and ambiance as especially bad. “Trying to associate the right sentiments with particular attributes was a harder problem that we tackled,” Sudhir said. Second, the model paid particular attention to interpreting traditionally challenging language (meandering sentences, grammatical twists, sarcasm) and did so with much greater success than earlier approaches. All told, these kinds of sentences constituted roughly half of the data set.

Third, and perhaps most important, the model was designed to infer meaning from absence. Previous work in this field has taken for granted that customers who don’t mention, for example, the price of a restaurant do so because the issue is not important to them. But this may not be the case.

“What does it mean when somebody doesn’t talk about something?” Sudhir says. Perhaps the attribute simply wasn’t worth mentioning. Perhaps, instead, the attribute fell in line with the reviewer’s personal expectations, whatever those may be, or with the more general expectations of the kind of restaurant they were visiting. We may expect one kind of ambiance from Chuck E. Cheese, and another kind of ambiance from a three-star Michelin restaurant, but a reviewer’s silence in each case does not necessarily mean the same thing. “What we tried to do is make that inference (how should we interpret the silence?) to get a more accurate average score from the data.”

Their model was able to distill important nuance through this mechanism. At lower-end restaurants, for instance, customers’ opinions about service are more positive than one might assume from written reviews alone, as people are generally satisfied with service but remain silent on the point. Likewise, a lower-end restaurant’s value is probably far higher than written reviews would suggest, as those who most frequently write explicitly about value are people who think it’s lacking; their voice depresses the overall picture among a large group of generally content but silent patrons.
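A toy calculation shows why silence matters for the average. This is only an illustration under stated assumptions, not the researchers' model: the scores are invented, the 1-to-5 scale is assumed, and silent_score (the sentiment attributed to a reviewer who says nothing) is a hypothetical parameter supplied by hand, whereas the study infers how to interpret silence from the data.

```python
def observed_mean(scores):
    """Average over reviewers who mentioned the attribute; silence (None) is ignored."""
    mentioned = [s for s in scores if s is not None]
    return sum(mentioned) / len(mentioned)


def silence_adjusted_mean(scores, silent_score):
    """Average over all reviewers, treating each silent one as holding silent_score.

    silent_score is an illustrative assumption, not an estimate from the study.
    """
    filled = [silent_score if s is None else s for s in scores]
    return sum(filled) / len(filled)


# Invented "value" scores for ten reviewers of a lower-end restaurant:
# three unhappy reviewers wrote about value, seven content reviewers stayed silent.
value_scores = [2, 1, 2, None, None, None, None, None, None, None]

naive = observed_mean(value_scores)                # 5 / 3, about 1.67
adjusted = silence_adjusted_mean(value_scores, 4)  # (5 + 7 * 4) / 10 = 3.3
```

With these made-up numbers, the handful of vocal critics pulls the naive average well below the adjusted one, which is the direction of bias the article describes for value at lower-end restaurants.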

Though the researchers focused their work on five attributes specific to restaurants, Sudhir notes that the model is portable between sectors. If a hotel chain were interested in listening closely to customer sentiment from Tripadvisor reviews, the company could simply retrain the model to extract the most salient hotel-related attributes.

“If the goal is to listen to your customers and respond as an organization to places where they say you need to pay more attention, then the granularity of this model is pretty good and the results should be reasonably accurate,” Sudhir said. “This is a very cost-effective way to listen to open-ended feedback—and there are a great many places where you could apply it.”

Department: Research