Q & A

Can you find leadership in the numbers?

Cade Massey and a former student, Rufus Peabody, developed a new way to calculate power rankings for NFL teams. Massey discusses the importance of clean, bias-free statistical analysis, and considers how a study of leadership in athletics might be applied to the business world.


Q: What is a power ranking?

A power ranking evaluates the strength of a team and looks toward its future performance. It’s like a poll, in that it ranks all the teams, only in our case it is statistically driven. We rank the teams based on the predicted point differential against an average team on a neutral field. We adjust team statistics for things like home field, opponent, and game situation, and then weight them according to their relevance for future performance. In short, we look at the past to predict the future.

We’ve generated a little buzz here and there. We’ve been told by professionals we’re foolish for giving this stuff away, but we’re not in it for the money. It’s a fascinating exercise, plus we get to soapbox a bit. A major reason for doing this is to focus on really good analysis and talk about ways to avoid the behavioral biases that can make other rankings unreliable. 

Q: How do you come up with your rankings?

We use play-by-play data, which we have going back nine years, to determine which statistics correlate with future wins and how. We have a lot of history, so we can go back in order to hone our model. Our goal is to get as clean a measure as possible, which is difficult. People may start out looking at the appropriate things, but then factor in bad measures or weight them wrong. This isn’t confined just to sports. People get this wrong in domains all over the place.

There are a lot of new statistics out there, and we like some of them. But our background in psychology led us in the direction of taking straightforward measurements and finding the purest way to use them. Our version of a better mousetrap is the same mousetrap, just built stronger and truer. I’m sure we’ll end up adding things over time, but for now we’re looking at eight statistics—four offensive, four defensive. We have basic stats, like yards per pass attempt, and scoring efficiency, which is just points scored per yard. The final statistic, which we tested at the last minute, turned out to be very important: contextualizing performance. So a three-yard gain on third-and-four is very different from the same gain on third-and-two. It turns out to be a distinction that matters, so we went ahead and folded it in.

Q: So how’s it working?

Well. Very well, in fact. We were early to short some bad teams that started out with good records, and stayed with some good teams that started out with bad records. There’s been one potential aberration. One team has been ranked very high through the first several games though their actual won-loss record has been dismal: the Dallas Cowboys, which happens to be the team I grew up on. They’ve been ranked egregiously high, and they’re likely going to push us back into the laboratory to see how we can improve the model. What are we missing? Maybe nothing. It’s quite possible that the things that led them to lose won’t continue in the future, or that if they do, our model will begin to shift them into their proper spot. 

What we’re counting are things we know matter for the future. We’re trying not to count things that matter in a given game but not the future. The canonical example is fumble recoveries. But what’s interesting about the Cowboys is all the penalties. They come into our model indirectly as chance, but maybe there’s some persistent quality that we need a better way to measure. We’re rookies at this. 

Q: With the Cowboys in particular, you hear a lot about coaching, about discipline, as if their performance is a result of these intangible leadership-type factors. If that’s the case, how do you measure that?

With the Cowboys, the two things you’re hearing are that it’s either the coach’s fault or the quarterback’s. The coach is seen as too nice, essentially, as fostering an undisciplined team, and that leads to penalties. There really is a lot of chance in football that matters for determining the winner. But at what point can you assign the results to a leadership issue, such as a coach who is too nice? We should be able to measure that. 

All this rhetoric around intangibles is overdone. I don’t believe that it never matters, but I am certain that people read more into intangibles than is actually there. One of the main reasons behind our approach is to navigate this problem. We may have gone too far, because there’s absolutely nothing in our system that can pick up on intangibles. We might come back, over time. But part of what we’re doing is trying to counterbalance all the rhetoric out there, especially by the sports pundits, on intangibles. We’re trying to look at objective performance.

[Editor’s note: Subsequent to this conversation, the Cowboys fired their coach, Wade Phillips, on November 8, 2010. In their next five games, they posted a record of three wins and two losses.]

Q: Presumably, some of those intangibles are actually tangible—they just haven’t been identified. 

That’s right. A lot of work in social sciences right now is exploring these behind-the-scenes measurements. And sports is particularly good for this. Take home-field advantage. Is it some fuzzy thing, such as the players getting to sleep in their own beds, or could it be crowd participation? There are people who have looked at the data really hard and believe it’s a referee thing. It’s not coming through the players after all. If you look at our data, you’ll see that teams get 5 to 10% more per rush at home than when they’re away. So it’s real. Why is hard to explain. 

This is why I enjoy studying sports. You can observe things in such greater detail than you can in other organizational settings, and we get to see performance. You can break it down. Hopefully, we can take the insights other places. The other thing is that in every game there is settling up—there is a final outcome. Compare this to financial markets, where you never know the true underlying values, the true outcomes. There is no true outcome, really, in an equity price. This helps me to feel better about the fact that we spend all this time on the sports research.

The trick to all of this is to find a way to harvest chance from other facts. It’s true both in sports and the business world. In 2002, Ohio State won the national championship. They won all their games, but seven, including the title game, were within a touchdown. Basically they squeaked by all year. They had an unflashy quarterback. So people said they were “winners.” They knew how to get it done. They were “clutch.” They possessed all these intangibles. Or maybe they were just lucky. With apologies to the Buckeyes out there, we’d suggest luck had a lot to do with it.

Q: How far can you go in applying this to the business world, specifically in terms of leadership? Is leadership quantifiable? 

Right off I should say I’m sufficiently new to this enterprise that I’ve not performed any real scholarship in this arena. But a lot of what we’ve been discussing, in terms of methodology, are ideas we talk about with our students all the time. This kind of analysis is hugely important for performance evaluation. How do you really judge the performance of an executive? Just as people weight sports statistics wrong, they are often plagued with outcome bias when it comes to performance evaluation. It’s hard to disabuse someone of the notion that if something failed, the guy who decided to do it is to blame. The same is true if the initiative is a success. We shouldn’t forget the role chance plays in all of this. Things happen after a decision is made that are purely chance but dramatically change the outcome, of course. But there is study after study in psychology showing that people’s opinions are biased by the outcome. 

I have another garage project with a guy who did an independent study with me at Duke. He played basketball in college, and professionally in Europe, and worked some with Coach Krzyzewski and the Duke team when he was doing his MBA there. Now he works for the NBA as their numbers expert. A few years ago, we were trying to come up with a system for evaluating players in the NBA. Since there are fewer guys on the court, it should be easier than in other sports. What we did there was use Shapley values, which come from economics. It’s the idea that, essentially, when you add someone to a group you can ask what additional value he brings. And you can do that in basketball. You can observe basics like how much value one guy adds to a group of another four guys by comparing that to the value that everybody else that ever plays with those same four guys adds. You can look at the individual contribution of a player to any set of players, relative to anybody else’s contribution to that same set of players. You’ve got enough data. Yet it’s still very complicated, which tells you something about how hard it is to do in the non-sports world.
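[Illustration: The idea of asking what additional value someone brings to a group can be made concrete with a small sketch of the textbook Shapley value. This is not the system described in the interview. The three players, the MARGIN table of point margins, and the function name shapley_values are all made up for illustration.]

```python
from itertools import permutations

def shapley_values(players, value):
    """Average each player's marginal contribution over every join order."""
    totals = {p: 0.0 for p in players}
    orders = list(permutations(players))
    for order in orders:
        group = frozenset()
        for p in order:
            with_p = group | {p}
            # Marginal contribution: value with the player minus value without.
            totals[p] += value(with_p) - value(group)
            group = with_p
    return {p: total / len(orders) for p, total in totals.items()}

# Hypothetical point margin produced by each possible group (invented numbers).
MARGIN = {
    frozenset(): 0.0,
    frozenset({"A"}): 2.0,
    frozenset({"B"}): 1.0,
    frozenset({"C"}): 0.0,
    frozenset({"A", "B"}): 5.0,
    frozenset({"A", "C"}): 3.0,
    frozenset({"B", "C"}): 2.0,
    frozenset({"A", "B", "C"}): 7.0,
}

credit = shapley_values(["A", "B", "C"], lambda group: MARGIN[group])
for player, share in credit.items():
    print(player, share)
```

The shares always add up to the value of the full group, so the method splits credit without double counting. The loop visits every possible ordering, which grows factorially with the number of players, and the value of every group has to be known or estimated. That is one way to see why the exercise gets complicated quickly, even with plenty of data.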

Where would something like this work? How about a consulting firm? At consulting firms, people move from team to team over the course of several years. So you could perform a similar study to get an idea of how much value an employee brings to a team. You could also use this to evaluate the team leaders or the engagement managers. If they were systematic about it, then they might start reassigning people in certain ways to maximize their evaluability. Given how important the quality of people is to their business, it might be worth paying more attention to analytics. 

Another challenge with performance evaluations is that as soon as you start measuring things, people start manipulating their performance to match the measurements. You need measures that are, to the extent possible, outside of people’s manipulability. You want to be careful about going too far away from outcomes. But my sense is that companies don’t do a very thorough or rigorous or creative job coming up with different performance measures. And, yes, I think we could take this approach to an organization and come up with some useful measures, and it would be an interesting thing to do. 

Q: What about looking at top-level leadership? If you could just look at all of the CEO transitions on all the companies on the New York Stock Exchange over a long period, could you draw some kind of conclusion about whether those CEO changes have effects?

CEOs get fired, and they quit, at times that are related to performance. This happens in sports, too. Teams tend to do better after they fire their coach. Is this a reflection on the leadership of the new coach? Does it validate the decision to make a change? What tends to happen in a regression-to-the-mean world, which of course is what we live in, is that after bad situations tend to come better situations, simply because of chance. On average, going forward, chance won’t break against you as often. So good tends to follow bad, or at least better tends to follow bad. Regression to the mean is another way of talking about the role of chance. So with CEOs, a big study like you’re talking about might show that companies tend to do better after the change. Is that because the CEO is better? That’s not at all clear. 
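[Illustration: A short simulation shows how regression to the mean alone can make a change look effective. It is a sketch under stated assumptions, not an analysis of real data: every team is a coin flip in every game, a "season" is 16 games, and a team "fires its coach" after winning five or fewer. The function name and all of the numbers are invented for illustration.]

```python
import random

def simulate_firings(n_teams=2000, n_games=16, fire_at=5, seed=0):
    """Compare wins before and after a 'firing' when only chance is at work."""
    rng = random.Random(seed)

    def season():
        # Every team has the same 50% chance in every game: no skill, no coach effect.
        return sum(rng.random() < 0.5 for _ in range(n_games))

    before, after = [], []
    for _ in range(n_teams):
        first = season()
        second = season()
        if first <= fire_at:  # a bad year triggers the change
            before.append(first)
            after.append(second)
    return sum(before) / len(before), sum(after) / len(after)

wins_before, wins_after = simulate_firings()
print("average wins before the change:", wins_before)
print("average wins after the change:", wins_after)
```

The teams that made a change improve on average in the following season, even though nothing about them differs from anyone else. Any real study of coaching or CEO transitions has to beat this baseline before crediting the new leader.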

Jeff Sonnenfeld published a nice article recently on CEO charisma. This is another problem with CEO evaluations, and it’s related in some ways. He asked the question, are charismatic CEOs more successful? And what he found is that it’s not the case that charismatic CEOs are more successful. It’s that successful CEOs are perceived as being more charismatic. So that’s a case where it’s just reverse causality. But if you ask people, they see a causal relation.

In a real way, it comes down to finding ways to factor chance out of the equation. A CEO makes a decision, which may be bad or good, and then chance factors into the outcome. I spoke of outcome bias and you just can’t overstate the impact it has in how leaders are evaluated. The key is to get beyond this, to evaluate the person based on the decision made at the time, based on the inputs that were available at the time, and not the chance that can determine their success or failure. There’s study after study that shows people get blamed for things outside their control when things go badly, and they get credit for things that happen outside their control when things go right. This is a terrible way to evaluate performance, but it’s ingrained in the way people operate. You can see it in sports, where the coach is a genius or a bum based on what other people do on the field and the chance interactions between them. You can see it with CEOs, and middle managers, all the way down the line. We need to find a way to evaluate those decisions almost despite their outcomes. It sounds easy, but it’s not. 

Interview conducted and edited by John Zebrowski

Cade Massey is Assistant Professor of Organizational Behavior, Yale School of Management.