How to Decrease Gender Bias in Performance Reviews

When Nadia Comăneci scored a perfect 10 for her routine on the uneven bars at the 1976 Olympics, she made history. No other gymnast had ever received such a pristine mark at the games, and she went on to earn six more of them—along with three gold medals—in Montreal that year. Her dazzling performances and those perfect 10s have etched themselves into collective memory as a story of brilliance.

The stories of brilliance we tell ourselves, in everyday life, at work, and especially in certain fields, tend to feature male protagonists. In performance reviews, men are far more likely to garner perfect 10s than the women who work alongside them in the same roles, even when those women perform just as well by other measures.

So says research recently published in the American Sociological Review. But here’s the surprising part: When evaluations were based on a six-point scale rather than a 10-point scale, the gender gap virtually disappeared.

The results suggest that just a small change to how we design rating systems—even one as seemingly inconsequential as the number of possible ratings on a scale—could disrupt gender bias.

The study authors first looked at real teaching evaluations at an unnamed university in North America, which just so happened to transition from a 10-point scale to a six-point scale. Before the change, male professors in male-dominated subject areas received top, or “10,” ratings in 31.4% of cases, compared to only 19.5% of cases for female professors. After the change, men and women received top, or “6,” ratings 41.2% and 42.7% of the time, respectively.

In other words, the new scale meant that women—in many cases the exact same professors teaching the exact same classes they’d taught before—suddenly got top marks just as often as their male colleagues.

The authors were keenly aware that some critics (and many sexists) would argue that the male professors were simply more likely to be exceptional, and that all the new condensed scale did was muddy the waters and make it harder to distinguish the very good from the truly brilliant.

So their second study controlled for any potential differences in actual teaching quality. They showed online participants the transcript of a lecture supposedly given by a professor (in actuality it was based on a TED Talk), but some were told the instructor was John Anderson and others that it was Julie Anderson.

When participants used a 10-point scale, “John” got the top mark 22% of the time, compared to 13% for “Julie.” But when other participants used a six-point scale they gave “John” and “Julie” top marks 25% and 24% of the time, respectively.
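To make the size of the shift concrete, the gaps can be worked out directly from the percentages reported above. The short Python sketch below only restates those figures; the dictionary layout and the function name `gap_in_points` are illustrative choices, not anything taken from the paper.

```python
# Share of top ratings (in percent), as reported in the article.
# In Study 2 the "men"/"women" entries are the "John"/"Julie" conditions.
top_rating_share = {
    "Study 1, 10-point scale": {"men": 31.4, "women": 19.5},
    "Study 1, 6-point scale": {"men": 41.2, "women": 42.7},
    "Study 2, 10-point scale": {"men": 22.0, "women": 13.0},
    "Study 2, 6-point scale": {"men": 25.0, "women": 24.0},
}


def gap_in_points(shares):
    """Men's top-rating share minus women's, in percentage points."""
    return round(shares["men"] - shares["women"], 1)


gaps = {label: gap_in_points(s) for label, s in top_rating_share.items()}

for label, gap in gaps.items():
    print(f"{label}: {gap:+.1f} percentage points")
```

A positive number means men received top marks more often. On the 10-point scale the gap is roughly 12 points in the real evaluations and 9 points in the experiment; on the six-point scale it shrinks to about 1 point in either direction.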

“Whereas the top score on a 10-point scale elicited images of exceptional or perfect performance—and, as a result, activated gender stereotypes of brilliance manifest in raters’ hesitation to assign women top scores—the top score on the six-point scale did not carry such strong performance expectations,” the authors of the paper, Lauren A. Rivera from Northwestern University and András Tilcsik from the University of Toronto, write. “Under the six-point system, evaluators recognized a wider variety of performances—and, critically, performers—as meriting top marks.”

Even though these studies focused on academia, the results should prompt anyone who evaluates performance to ask how fair their supposedly objective tools really are. The authors point out that the number 10 has a unique cultural meaning, so a scale built around it may be particularly prone to reflecting biases.

The bottom line is that you have to evaluate the evaluations. If there appears to be a performance gap between groups, ask and investigate whether the problem is the performance itself or how you’re measuring it.
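One simple way to start that investigation is to compare how often each group receives the top score. The Python sketch below is a minimal illustration under stated assumptions: the function `top_rating_rates` and the sample ratings are hypothetical and invented for this example, and a real audit would also need to account for role, tenure, and other performance measures before drawing conclusions.

```python
from collections import defaultdict


def top_rating_rates(reviews, top_score):
    """Return the share of reviews at the top score for each group.

    `reviews` is an iterable of (group, score) pairs.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [top ratings, total ratings]
    for group, score in reviews:
        counts[group][1] += 1
        if score == top_score:
            counts[group][0] += 1
    return {group: top / total for group, (top, total) in counts.items()}


# Hypothetical ratings on a 10-point scale, invented for illustration only.
sample_reviews = [
    ("men", 10), ("men", 9), ("men", 10), ("men", 8),
    ("women", 9), ("women", 10), ("women", 9), ("women", 8),
]

rates = top_rating_rates(sample_reviews, top_score=10)
for group, rate in rates.items():
    print(f"{group}: {rate:.0%} of reviews at the top score")
```

A gap in these rates does not by itself show that the scale is biased, but it flags where to look more closely at whether the measurement, rather than the performance, is driving the difference.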

These differences might seem small in the grand scheme of things, but reviews feed directly into pay and advancement. “Given that performance ratings are often tied to important rewards, such as salaries, bonuses, and promotions, rating systems can have direct implications for employees’ career trajectories,” the authors write.

Bias builds up. If women get slightly worse performance reviews, they’re less likely to get raises and promotions than their male colleagues, and the cycle repeats itself as fewer and fewer women make it to the most senior levels, especially in male-dominated industries. That in itself reinforces the notion that men are more likely to be brilliant and worthy of those positions of power, which fuels the original stereotypes. And around and around we go.

Here’s an important caveat: The researchers emphasize that while the six-point scale all but eliminated the gender gap in numeric ratings, it didn’t magically eradicate gender bias. The new scale simply changed how much the tool reflected existing biases. The participants in the second study were still far more likely to use superlatives to describe “John” than “Julie” when they shared “the words that first came to mind when they thought of the instructor’s teaching performance.” It’s just that these differences were less likely to be reflected in the numeric ratings.

So while changing a scale could help on the surface and in the short run, there’s still a great deal of work to be done to rid the world of the underlying bias—and a long way to go before our stories of brilliance in every field star women in the leading roles as often as they do men.