Is Women’s Work Evaluated Fairly?

Tristan L. Botelho
Associate Professor of Organizational Behavior

June 19, 2018

Does gender bias prevent women from being treated fairly in job interviews, performance assessments, and other evaluations? Yale SOM’s Tristan Botelho set out to examine bias in the evaluation of women in a notoriously male-dominated field: buy-side investment management.

Botelho, who has worked in the industry, says that while it has a macho culture, it also has a singular focus on measurable performance. “You buy a stock and the stock goes up or down and that gets attributed to you as part of your performance,” he says. “And we know that investment professionals really care about that performance.” That focus on performance might serve to counteract bias. It also makes it easier to detect bias, because performance can be easily separated from other factors that might be affecting how someone is evaluated.

Botelho and Mabel Abraham of Columbia University found an ideal setting in which to test for gender bias and its relationship to performance: an online platform where investment professionals post stock recommendations and rate each other’s recommendations. Data from the site allowed them to see how users rated recommendations that were associated with a typically female name. They could also look forward in time and see how accurate the recommendations turned out to be.

Read the study: “Pursuing Quality: How Search Costs and Uncertainty Magnify Gender-based Double Standards in a Multistage Evaluation Process”

The researchers found that, in fact, recommendations from men and women were rated equally relative to their future performance. But crucially, users were less likely to click on women’s recommendations to begin with. Once the investment professionals saw women’s proposals in detail, they treated them fairly. But the women’s ideas were less likely to get attention than their male colleagues’.

Q: What were you trying to learn in this research?

The question that we’re after in this paper is how gender affects an evaluation process. We’re doing it in the context of the investment management industry. Hedge fund and mutual fund managers come together on an online platform to share recommendations about their portfolios. As part of the platform, people can rate one another’s recommendations. What we want to know is how a recommendation posted by a woman gets rated versus a recommendation posted by a man when we can control for the quality of that recommendation, which is really key in this industry.

The platform is set up like a multi-stage evaluation. At the first stage you’re getting a list of investment recommendations that you can sift through—buy this stock, sell this stock, how it’s performing to date, who posted it—but very little information about why they’re saying to buy or sell—the analysis that’s supporting this position.

To get to that information, you actually have to click on a recommendation. You can think of it very much like hiring. At the first stage you get all these résumés that give you some information about the individual, but if you really want to know more about them, you have to bring them in for an interview. That’s akin to clicking on the recommendation. Once you click, you get more information about the recommendation, and you’re allowed to give some feedback in the form of a comment and rate the quality of that recommendation from one to five stars.

What we’re really trying to pay attention to is how gender affects when people are clicking on these recommendations and then once they’ve clicked, whether they’re treating these recommendations differently—by leaving more or less feedback, better or worse ratings.

What we find is that when recommendations are posted by women investment professionals, they get fewer views than those posted by their male colleagues. But conditional on getting attention, once people click on the recommendation and rate the quality, they are treating men and women equally.
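The two-stage comparison described here can be sketched in a few lines of Python. This is only an illustration, not the authors' analysis: the record fields (`poster_gender`, `clicked`, `rating`) and the sample rows are hypothetical, and the study itself also controls for how each recommendation performed, which this sketch leaves out.

```python
from collections import defaultdict
from statistics import mean


def two_stage_summary(impressions):
    """Summarize each stage of the evaluation by the poster's gender.

    `impressions` is a list of dicts with hypothetical keys:
      poster_gender: "F" or "M"
      clicked: bool, whether the viewer opened the recommendation (stage 1)
      rating: int from 1 to 5 if the viewer rated it after clicking, else None (stage 2)
    """
    shown = defaultdict(int)
    clicks = defaultdict(int)
    ratings = defaultdict(list)
    for row in impressions:
        gender = row["poster_gender"]
        shown[gender] += 1
        if row["clicked"]:
            clicks[gender] += 1
            if row["rating"] is not None:
                ratings[gender].append(row["rating"])
    return {
        gender: {
            # Stage 1: share of listed recommendations that were opened.
            "click_rate": clicks[gender] / shown[gender],
            # Stage 2: average star rating among those opened and rated.
            "mean_rating_if_clicked": mean(ratings[gender]) if ratings[gender] else None,
        }
        for gender in shown
    }


# Synthetic rows for illustration only; these are not data from the study.
sample = [
    {"poster_gender": "F", "clicked": True, "rating": 4},
    {"poster_gender": "F", "clicked": False, "rating": None},
    {"poster_gender": "M", "clicked": True, "rating": 4},
    {"poster_gender": "M", "clicked": True, "rating": None},
]
summary = two_stage_summary(sample)
```

Keeping the two stages as separate numbers matters: a gap in `click_rate` alongside equal `mean_rating_if_clicked` is the pattern the interview describes, and a single pooled average would hide it.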

Q: How did you determine if a recommendation was posted by a man or a woman?

We used an algorithm, developed by IBM, that takes all the names that they have been able to collect through census-level data in the U.S. and abroad and scores each name with a probability of that name being associated with a man or a woman. So a name like Matthew, for example, would receive a score of zero. A name like Mary would receive a score of 99. In the middle would be names that could be seen as either, names like Taylor or Chris.

While people on the site self-disclose their gender, it’s not available anywhere during the evaluation process, so you’d actually have to do a little digging to find out if someone is actually a man or a woman. We wanted to replicate what was going on when you were sitting in your chair viewing other people’s recommendations and all you could see was the name. So if you saw a name like Mary, there’s a strong possibility that you think that’s a woman, whereas if you see a name like Taylor, maybe you’d be less sure.
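As a rough sketch of how name-based scoring of this kind might be used, the Python below maps a first name to a score and then to a perceived-gender label. It is not the IBM algorithm: the lookup table, the assumed 0 to 100 scale, the cutoffs, and the function name `perceived_gender` are all hypothetical. Only the scores for Matthew and Mary come from the interview; the values for Taylor and Chris are placeholders standing in for "somewhere in the middle."

```python
# Hypothetical score table on an assumed 0-100 scale:
# 0 = name almost always associated with a man, 100 = almost always a woman.
# Matthew (0) and Mary (99) are the scores quoted in the interview;
# Taylor and Chris are made-up mid-range placeholders.
NAME_SCORES = {
    "matthew": 0,
    "mary": 99,
    "taylor": 50,
    "chris": 40,
}


def perceived_gender(first_name, low=10, high=90):
    """Label a first name by how a viewer would likely read it.

    The `low` and `high` cutoffs are illustrative assumptions,
    not values taken from the study.
    """
    score = NAME_SCORES.get(first_name.strip().lower())
    if score is None:
        return "unknown"
    if score <= low:
        return "likely male"
    if score >= high:
        return "likely female"
    return "ambiguous"


labels = {name: perceived_gender(name) for name in ["Mary", "Matthew", "Taylor"]}
```

Scoring the displayed name rather than the self-disclosed gender mirrors the point made above: the goal is to capture what a viewer could infer from the list, not what is true of the poster.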

Q: What inspired the paper?

I worked in the investment industry for a little bit and the one thing that you notice right away is that it’s pretty dominated by men. Depending on the survey you look at, about 5% to 10% of investment professionals are women, and especially as you move up to more senior levels, that number dwindles.

We really wanted to get at whether there’s actually a gender bias, and what got us really excited about it was the fact that in investment management there are core performance indicators. You buy a stock, the stock goes up or down, and that gets attributed to you as part of your performance. We thought this was a really clean setting where we could try to control for performance to see if characteristics that shouldn’t be affecting that performance, such as someone’s gender, are affecting how their colleagues are paying attention to their ideas and rating those ideas.

Q: Did the results jibe with what you expected, given your experience in this industry?

I went in with an expectation that we might find gender bias in the industry. But we also thought that this is such a performance-driven industry that if you could actually measure how someone’s performing at the moment you’re deciding whether to click on that recommendation and rate it, that should counteract the subconscious bias in the industry. Unfortunately, it looks like in the first stage, when people have very little information about the analysis, people are using gender as some sort of screening characteristic.

But the silver lining is that once they’re given all the pertinent information and can see how the poster analyzed a stock, the gender bias disappears; men and women are treated exactly the same. So we’re finding bias when there are search costs and uncertainty, related, maybe, to not having enough pertinent information or having too many choices to sort through.

Q: Beyond reflecting gender inequality, what do you think is driving this difference?

It’s hard to tell. In the paper, one thing we do test is whether the bias gets worse when there are many ideas to sort through. If there are seven or eight new recommendations on the platform, maybe you click on all of them. But when there are 100 in the last couple of days, that becomes a little harder; you obviously can’t allocate time to all of them. We do find that the more recommendations there are to sift through over the last several days, the stronger the gender bias is. Maybe you’re used to working with your male peers, so when you have a lot of recommendations to sift through, you’re going to be more likely just to click on the ones that seem to fit with the industry.

I think it’s really important to highlight that something like this is going on. Maybe being aware that this bias exists, you might be more likely, when faced with a choice of clicking on something or letting someone pitch you or hiring someone, to think, “Wait, maybe I am being a little biased here; let me take a step back.”

Q: How much of this is a result of it being online? Would there be a different result in another setting?

That’s an important question, because we need to be clear about boundary conditions with any research. Maybe we’d only see this online, whereas face to face we would expect less of this bias or a different kind of bias. It’s hard to tell how it would translate exactly, but I think there are a lot of contexts that this resembles.

I’ll go back to hiring, where you’re just given a stack of résumés; it’s no different than seeing a screen full of potential investment recommendations. You have very little actual knowledge about the individual and are trying to make the best decision, often time-constrained. A lot of popular jobs are not getting one or two applications; they’re getting hundreds. And there are one or two hiring managers sifting through them, so they may actually be relying on a lot of these subconscious biases or sorting characteristics. I think it resembles a lot of organizational processes where you don’t actually have face-to-face interaction.

And it obviously translates very well to digital platforms more generally, which have been growing in importance, from crowdfunding platforms to labor markets, such as the freelance market. There are a lot of these online platforms and we think that what we find would be very pertinent to those.

When it comes to organizational processes like hiring, we may even expect a stronger bias because when you’re hiring someone to be an engineer, let’s say, if they don’t have work product that’s solely theirs that you can judge performance on, you might be more likely to fall back on your biases. Whereas we’re finding it in a context where you’re being told this person is outperforming the market, objectively speaking, and we’re still seeing, at least at the first stage, that people are clicking less on recommendations posted by women versus posted by men.

Q: It sounds like money’s being left on the table.

It’s hard to say that these fund managers perform worse because they’re not paying attention, but they are leaving some opportunities out there that are just as good as the ones that they are viewing. They’re not making the decision with all the information they could be.

That brings us to another key point in the paper, which is that we find that women have to substantially outperform the average individual to get the same number of clicks at the first stage as a man. We’re able to match these data with the financial market data for the next year or two so we know what the best ideas were at the time. And what we found is that it’s not only that you have to be a good performer, but you have to be one of the best performers to get the same level of attention.

Q: Do you know whether there’s anything else besides gender that might be impacting how people choose? Ethnicity, for example?

We were really interested in making the paper even broader, not just about gender. But another unfortunate fact about this industry is that it’s heavily dominated by white males. When we started looking at names that suggested different ethnicities, there just weren’t enough people in our sample to have any meaningful analysis. It suggested a very similar result, but the numbers are even smaller than those for women.

What makes it harder is, when we talked to people in the industry while running this research, some people with more ethnic-signaling names said they would actually publicly use a different first name, so they would not be kind of singled out for their ethnicity.

Women do the same. We presented this paper at one point, and a female former analyst came up to us and said, “Thank you for writing this paper, because my whole career I went by my initials thinking that my name was really holding me back. People in my family thought I was crazy, but people in my industry understood.”

Q: Can you extrapolate from these findings to other industries?

The key reason we focused on finance is the ability to control for objective performance. I think once you move into other industries, it’s really hard to say that this person’s a better writer than that person or a better marketer than this person or they can hire better talent than that person. But I do think that in other industries, especially those with a stark gender imbalance, if we were able to focus on some sort of evaluation process (project evaluation, end-of-year evaluations, hiring), we would see something similar under conditions of too little pertinent information about what’s being evaluated, too much uncertainty, or high search costs.
