Prof. Tristan Botelho has conducted multiple studies showing how women are penalized in evaluations of their work. He says it’s time to move beyond documenting bias and figure out how to reduce it.
“Unfortunately, bias is a common finding. You don’t get surprised anymore,” he says. But if we can better understand the levers that attenuate or magnify bias, “we can then design these processes more equitably.”
A new study co-authored by Botelho suggests one possible way to do that. He and Marina Gertsberg of Monash University found evidence that awards for evaluators may prompt them to consider their assessments more carefully. Specifically, the change in status appears to reduce the gender bias that often plagues evaluations.
The team came to this conclusion after analyzing more than 1.6 million Yelp restaurant reviews. In general, they found a bias in favor of male servers. But among reviewers who received an “Elite” designation from Yelp, the gender gap in ratings shrank after the recognition.
The award may have drawn attention to the reviewer, motivating them to judge their dining experiences more fairly. “There’s a spotlight on them,” says Botelho. “That might actually make you stop paying attention to things that you shouldn’t have paid attention to in the first place,” such as gender.
Botelho says that this insight may help organizations reduce the bias of evaluators, such as hiring managers. Of course, not everyone in the workplace can be given an award. But organizations could find other ways to achieve a similar effect, such as scrutinizing evaluators’ judgments more heavily.
“It’s not about the award per se,” Botelho says. “It’s about the levers we can pull to amp up pressures of attention and scrutiny.”
When Botelho and Gertsberg began their study, it wasn’t clear whether an award would have any effect on an evaluator. The awardee might assume that they’ve been recognized for their excellent work, and therefore think they don’t need to change any of their practices. But the researchers speculated that the award might motivate evaluators to perform even better.
The Yelp data set proved useful for investigating this question, for a few reasons. It contained a huge trove of information: the team obtained about 1.6 million reviews that mentioned the server, covering more than 50,000 restaurants in 11 urban areas from 2004 to 2017. In about one-fifth of those reviews, the researchers could identify whether the server was a man or a woman, based on whether the writer used the term “waiter,” “waitress,” or other gendered language.
And Yelp had given 6% of the reviewers an “Elite” designation. While this award didn’t come with money, the people who received it did seem to care about the status boost. After becoming an Elite member, they tended to submit more reviews and write longer reviews.
First, the team looked at reviews by non-Elite members. They compared the ratings associated with reviews that mentioned male versus female servers.
The researchers found that when the server was a woman, the average rating was lower. In particular, the reviewer was more likely to give the worst possible star rating: about 25% of reviews referencing waitresses received one star, compared to 20% of those mentioning waiters. The pattern reversed at the top of the scale, where 30% of reviews with waiters received five stars, compared to only 23% of those with waitresses.
Next, the researchers examined reviews by Elite members. These evaluators seemed to be more discerning; they gave out fewer one- and five-star reviews and more three- and four-star reviews overall. And gender differences shrank. For instance, 7% of reviews mentioning waitresses and 5% of those mentioning waiters got one star.
To examine the data more rigorously, the team compared sets of reviews that were otherwise very similar. To do so, they matched reviews on key factors: the restaurant’s price range and location, the gender of the server mentioned, and the evaluator’s activity, such as the number of reviews they had submitted so far, the length of the review, and when it was posted.
The researchers also knew that evaluators might be influenced by other people’s reviews. To control for that effect, they calculated how much each reviewer’s rating deviated from the restaurant’s average Yelp rating to date.
Among non-Elite members, mentioning any server, regardless of gender, pushed the rating down. The rating dropped 0.43 stars below the restaurant’s average to date when reviews referenced male servers and 0.55 stars for female servers, creating a gender gap of 0.13 stars. But among Elite members, the gender gap shrank to 0.05 stars.
“It reduces their bias by more than half,” Botelho says.
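The deviation-from-running-average adjustment described above can be sketched in a few lines. This is a hypothetical illustration of the idea, assuming reviews arrive in chronological order; the function and field names are my own, not the researchers’.

```python
def deviations_from_running_average(reviews):
    """reviews: list of (restaurant_id, stars) tuples in chronological order.
    Returns a parallel list of each rating's deviation from that
    restaurant's average rating to date. The first review of a restaurant
    has no prior average, so its deviation is None."""
    totals = {}  # restaurant_id -> (running sum of stars, review count)
    deviations = []
    for rid, stars in reviews:
        total, count = totals.get(rid, (0, 0))
        deviations.append(stars - total / count if count else None)
        totals[rid] = (total + stars, count + 1)
    return deviations
```

Averaging these deviations separately for male- and female-server reviews gives the gaps the article reports (for example, −0.43 versus −0.55 stars among non-Elite members), while controlling for how highly rated each restaurant already was.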
The team found similar trends when they looked only at Elite users and compared their reviews before and after attaining Elite status. The award itself seemed to have caused the change in reviewer behavior.
The study suggests that “the status award changes the way people evaluate,” Botelho says. “We often think about getting awards as recognition for past actions. But what this is showing is that these awards also have an effect on subsequent actions.”
Botelho speculates that this mechanism works only if the evaluator is at risk of losing their status due to poor performance. For example, a lifetime achievement award likely wouldn’t have much of an effect.
How can organizations translate the team’s finding to the workplace? Giving an award to every evaluator isn’t practical, but there are other ways to increase scrutiny, Botelho says.
For example, when someone wants to reject a job applicant or deny a promotion, they could be required to justify their decision in a group meeting. Instead of giving a vague reason, such as saying the person “is a bad fit,” the evaluator would need to identify a specific key performance indicator (KPI) that the candidate failed to meet.
“Try to move them away from using indicators unrelated to the candidate’s quality,” Botelho says, and toward criteria “that are tangible, that can be defendable, that are KPI-based.” Increasing accountability for these judgments “is not going to eliminate bias, but it’s going to start stripping away the likelihood that individual biases are going to creep in.”