By Roberta Kwok
Consumers rely on online ratings to guide everything from restaurant choices to furniture purchases. A single review might not be trustworthy, but when a website averages many ratings together, the thinking goes, the resulting score should reflect the “wisdom of the crowd” and accurately capture an item’s quality.
But research by Balázs Kovács, an assistant professor of organizational behavior at Yale SOM, and his collaborators suggests that this system contains a flaw. According to their analyses, products or businesses that initially receive poor reviews are likely to get fewer ratings in the future. Since the rest of the “crowd” doesn’t weigh in, the item may be stuck with an unfairly low score. It “might never be corrected,” Kovács says.
“If there are few reviews, the score is not only inaccurate but is biased in a systematic way,” adds Gaël Le Mens, a professor of behavioral science at the Universitat Pompeu Fabra, who collaborated with Kovács, along with Judith Avrahami and Yaakov Kareev of the Federmann Center for the Study of Rationality at the Hebrew University of Jerusalem. “There is a systematic tendency for such scores to be underestimations.”
Read the study: “How Endogenous Crowd Formation Undermines the Wisdom of the Crowd in Online Ratings”
To understand the team’s reasoning, consider the following scenario. Two new restaurants of equal quality open. An accurate rating for both would be, say, three stars. The average of the first few reviews will likely not be exactly three stars since the sample size is small. By chance, restaurant A might get lucky and average four stars after the first week, while restaurant B gets unlucky and averages two stars.
What happens next? Diners who read online ratings will likely choose restaurant A instead of B. Restaurant A accumulates more reviews, and the rating eventually converges to an accurate average of three stars. But since few people try restaurant B, it gets stuck at two stars—even though it’s as good as A.
The rating isn’t the only problem, Kovács says. Review sites often order options from highest to lowest scores. If your business is poorly reviewed, “you go to the bottom,” he says. “People are not going to choose and rate you.”
To find evidence for this pattern, the researchers analyzed 78 million Amazon ratings and 2.2 million Yelp ratings. Highly rated products and businesses accumulated reviews more quickly than poorly rated ones did, they found. If an item’s score increased by one star, it received reviews an average of 16% faster on Amazon and 14% faster on Yelp.
To demonstrate that this leads items with few reviews to be underestimated, the team created a computer model. The model simulated the process of users selecting items to rate, based on their current scores, and submitting more ratings. As expected, an item that started out poorly was likely to remain stuck with a score that was lower than its true quality.
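The feedback loop the model captures can be sketched in a few lines. This is a toy illustration, not the authors’ actual model: the exponential preference for high-scoring items, the 1–5 star clipping, and all parameter values are assumptions chosen only to make the mechanism visible.

```python
import random

def simulate(true_quality=3.0, noise=1.0, n_raters=1000, n_items=50, seed=0):
    """Toy version of the mechanism: raters pick which item to rate based
    on current average scores, so low-scoring items attract few new ratings
    and their averages never get a chance to correct themselves."""
    rng = random.Random(seed)
    # every item has the same true quality and starts with one noisy rating
    ratings = [[min(5.0, max(1.0, rng.gauss(true_quality, noise)))]
               for _ in range(n_items)]
    for _ in range(n_raters):
        avgs = [sum(r) / len(r) for r in ratings]
        # assumption: preference for high-scoring items grows exponentially
        weights = [3.0 ** a for a in avgs]
        item = rng.choices(range(n_items), weights=weights)[0]
        ratings[item].append(min(5.0, max(1.0, rng.gauss(true_quality, noise))))
    return ratings

ratings = simulate()
by_count = sorted(ratings, key=len)  # fewest-rated items first
half = len(by_count) // 2
low_mean = sum(sum(r) / len(r) for r in by_count[:half]) / half
high_mean = sum(sum(r) / len(r) for r in by_count[half:]) / (len(by_count) - half)
print(f"rarely rated items average {low_mean:.2f} stars")
print(f"often rated items average {high_mean:.2f} stars")
```

Even though every simulated item has identical true quality, the rarely rated half ends up with a lower average than the frequently rated half, mirroring the stuck-at-two-stars restaurant in the scenario above.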
Next, the researchers conducted an experiment in which participants rated pictures presented on their computers. First, the participants were shown 50 buttons, each corresponding to a picture, and had to choose 10 to view and rate. For some participants, the buttons were ordered so that those corresponding to pictures with the highest ratings were at the top of the page, while lower-rated ones were near the bottom. Previous research suggests that in this situation, people are more likely to pick options near the top of the page, so pictures with the highest ratings would receive the most additional ratings. This is what happened in the experiment. And just as the model predicted, pictures with few ratings tended to earn average scores that were lower than their “actual” quality, as determined by other participants rating pictures presented at random.
The team also demonstrated that they could reverse the bias by displaying the buttons in the opposite order. For some participants, the buttons were ordered from lowest- to highest-rated. In this case, pictures with few ratings got average scores that exceeded their actual quality.
Finally, the team studied ratings of iPad cases on the French and German Amazon sites. The products were exactly the same, but ratings differed depending on the site. By comparing the data, the researchers could isolate the effect of the number of reviews. They found that having 10 times more reviews was linked to an average score 0.35 stars higher.
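That 0.35-star figure implies a gap that scales with the logarithm of the review-count ratio. A back-of-envelope sketch, assuming a linear-in-log relationship (an illustrative simplification, not the study’s estimation method):

```python
import math

def predicted_gap(n_reviews_a, n_reviews_b, stars_per_decade=0.35):
    """Predicted average-score gap between two listings of the same product,
    given their review counts. Assumes the gap grows linearly with log10 of
    the count ratio, at 0.35 stars per tenfold increase (the article's figure)."""
    return stars_per_decade * math.log10(n_reviews_b / n_reviews_a)

print(round(predicted_gap(20, 200), 2))   # tenfold more reviews
print(round(predicted_gap(20, 2000), 2))  # hundredfold more reviews
```

Under this assumption, a listing with a hundred times the reviews of an otherwise identical one would be expected to score about 0.7 stars higher.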
The study suggests that customers shouldn’t dismiss low-rated items with few reviews. “You’re likely to be positively surprised,” Le Mens says.
The results also suggest that businesses that start off with bad reviews can counteract the effect by asking more customers to review their product or service. There’s no need to plead for high ratings, Le Mens says: “Even if you just tell them, ‘Write an honest review,’ it will help.”
Online review sites also could take steps to counteract this bias, the researchers say. For instance, the sites could avoid displaying average scores for products with only a few reviews. And instead of ordering options based strictly on ratings, they could randomize the order a bit. Or the sites could identify items with few reviews and encourage users to rate them. “That’s one of the ways to fight this,” Kovács says.