A Model for Understanding with Edward Kaplan

Sean David Williams

Subscribe to Lessons from the Yale School of Management in Apple Podcasts, Spotify, YouTube, or your favorite podcast player.

While the rest of the world races to pour huge amounts of data into AI systems, operations research expert Edward Kaplan is focused on what he calls “little data”: finding the right mathematical model to illuminate a business or public policy problem when you have limited information.

Hosted by Blake Eskin

I got into Grand Central with seven minutes to spare. That was less than I’d planned for, but sometimes it’s hard to get out of the house in the morning. And because it had been so hard, I hadn’t had breakfast and I was hungry. ‌

On the way to the platform I walked by a kiosk that sells coffee and pastry. There was one person behind the counter and three people waiting in line. I studied the line, tried to size the people up. If they were black-coffee-and-croissant types, I could probably place my order and make my train, which was now leaving in six minutes. That is, if the guy behind the counter had any hustle in him. But if any of those three people wanted a bagel with vegetable cream cheese and onion and… no, no wait, forget about the onion, just cream cheese and tomato and a latte half-caff with almond milk. Oh, you’re out of almond milk? What other kinds of milk do you have? Then there would be no hope for me. I was going to be grouchy by the time I got to New Haven.‌

But better to arrive grouchy than miss my train, and I’m trying to cut back on carbs. So I ditched the line, caught the train, and grabbed a snack on my way over to the Yale School of Management, and it was there, talking to Ed Kaplan, that I started to see that waiting-in-line situation differently. You see, Kaplan is a professor of operations research at SOM. He’s been teaching a class called Policy Modeling for almost four decades, and the dilemma I faced was right up his alley.‌

I’m very good at something called queuing theory.

In the world of queuing theory, my decision to ditch the line that morning is commonplace enough to have a name.

That’s actually a well-known behavior in queuing theory that’s known as reneging or abandonment, where basically people join a queue and they start to wait, but then they give up because they get too impatient. Which is not the same thing as walking into a room and seeing that the line is so large that you just turn around and leave right away. That’s called balking.

OK, so I balked, I didn’t renege.

OK, fine.

Now if you know anything about queuing theory, which I did not, my missed breakfast is a pretty obvious queuing problem. I was a customer balking at a queue because I was nervous about getting served in time. But Kaplan, as he says, is very good at queuing theory and he can look at situations that don’t look anything like my breakfast dilemma, situations where you can’t see any customers waiting to be served, and model those as queuing problems too. ‌

For example, slowing the spread of viruses or thwarting terrorism plots—these can be framed as queuing problems. And queuing theory isn’t the only thing Kaplan is very good at. He’s able to take lots of situations and find a way to boil them down to equations. And in doing so, in finding simple mathematical models for real-world problems, Kaplan can identify solutions that otherwise might not be so obvious.‌

The most important skill is taking whatever the issue is and being able to structure it in a way that you can analyze and that you see structure that enables you to build a model for that problem.

Today’s lesson is about tackling complex problems even when you don’t have a lot to go on.

When I got to Kaplan’s office, he was wearing a T-shirt, no collar, and his vibe was much more mathematician or engineer than business executive. There’s a lot going on in his office. In the background, you can hear the whir of his computer hard drives. Behind his desk, there’s a flyer for the Saskatoon Klezmer Band, which Kaplan’s father started; Kaplan grew up in Saskatchewan. And on the table in front of him about $15 in loose change, a memorial of sorts to his former coffee buddy, Sharon Oster, who was a professor at SOM and—‌

Former dean of the school. Actually, you’re sitting in her chair, which is fine.

Oster died a few years ago, but Kaplan is a creature of habit.

I just never really liked carrying change around, so I’d always just kind of come in and clonk it on the table. And then Sharon would come in and she would, while we’re talking, shooting the breeze, make all kinds of intricate designs like geometric patterns and stack things up, little pyramids or whatever. None of that is there now, of course; it’s all been knocked down. But it just kind of became a thing.

Kaplan holds multiple appointments: in the School of Management, the School of Public Health, and Engineering, and there are more.

Apparently I have a secondary appointment in Statistics and Data Science too. I show up on the website. I’m affiliated, I guess, with Economics.

His mind, in other words, goes in a lot of directions. In our two hours together, we covered the NCAA basketball bracket, immigration, World War II, shoplifting, elections…and that’s in addition to everything else you’re going to hear about. Kaplan is full of information, and management students are often trained to gather as much information as they can when they try to solve a problem. But that’s not usually how Kaplan works.‌

Everybody talks about big data. Everybody talks about data mining and AI and other things. No, I do little data. I squeeze water out of a rock.

And sometimes big data isn’t the best way to get an answer to an urgent problem.

Is it more important to try to be extremely precise and get everything right down to the last detail, which is almost never possible? Sometimes it’s important if you literally are landing a rocket on the moon, for example, but most of the time it’s not. Or are you better off coming up with a model, the result of which is qualitative understanding of the phenomenon, whatever that phenomenon is? And that’s the thing that you can explain to people, and that’s the thing that people can take away with them. Not the equations, not the epsilons, not the deltas, but actually the result of the modeling.

This, I think, is Kaplan’s superpower: his ability to find simple solutions to daunting, complex problems that there isn’t a lot of data for, and in areas where he may not have any previous expertise. And it’s because he’s not a specialist in the field that Kaplan can step back and recognize how those problems share a structure with other problems that we already know how to describe.‌

I’ll see if I can give an example of this. I’ve never taken a class in public health in my life. I’ve never taken a class in epidemiology in my life, and yet a lot of my best-known work is exactly in that area. How is that even possible? Well, a standard problem in epidemiology is you want to know how many infected people there are with something. HIV, COVID, tuberculosis, whatever it is.

Whatever it is matters less to Kaplan than the underlying structure. And for that, he turns to queuing theory.

Queuing theory is a theory about, basically, customers who are waiting in lines and trying to get processed, and you’re trying to get, on the one hand, descriptive measures: How long do people have to wait on average? How many people are waiting? How busy are the servers, how efficient is the system? And once you kind of have that static picture in your head, how can you design things to be better? You can come up with better policies for selecting customers, for ordering, all this kind of stuff.

And then I look at the problem in public health and I say, wait a minute, infections are like customers. The time from when a person becomes infected until they develop symptoms, or the infectious period, as it were, is kind of like a service time. That’s when people are going to be out there infecting people.

And once I realize that, I can use these ideas from queuing theory and start answering questions like, How many infected people are there? When is this outbreak going to peak? What’s the ultimate fraction of people that are going to be infected? How can I use that knowledge and all of a sudden translate it into all kinds of different pieces of data that people wouldn’t have thought of before?
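One textbook queuing result that fits the "infections are like customers" analogy is Little's law: the average number of customers in a system equals the arrival rate times the average time each one spends there. The conversation doesn't say which formulas Kaplan actually used, so treat this as a minimal sketch of the analogy, with made-up numbers and a hypothetical function name.

```python
def infections_in_system(new_infections_per_day: float, mean_infectious_days: float) -> float:
    """Little's law (L = lambda * W), read in epidemic terms.

    lambda: new infections per day (the "arrival rate" of customers)
    W:      average days a person stays infectious (the "service time")
    L:      average number of people infected at any one moment
    """
    return new_infections_per_day * mean_infectious_days


# Hypothetical inputs: 40 new infections a day, each infectious for 7 days on average.
currently_infected = infections_in_system(40.0, 7.0)
print(currently_infected)  # 280.0
```

The point is the direction of the reasoning: two quantities you might be able to estimate give you a third that you cannot observe directly.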

As Kaplan talks through this, he’s not writing equations with epsilons and deltas on a whiteboard. Instead, he’s making the logic behind his model clear. Infections are like customers. And that insight sets you up to make decisions without a lot of data.‌

For example, if I’ve got people who are infected, they’re transmitting infections. If you could test everybody under the sun, you could watch the whole process, but you can’t. And with COVID, for sure, you couldn’t; we didn’t have any tests. But it manifests in terms of cases. It manifests in terms of people going to the hospital. It manifests in terms of people dying. It manifests in terms of what you can see in the wastewater. And all of these different manifestations are coming from the same process. It’s the same customers showing up, the infections, but all of a sudden these different things are being observed. So the question is, how do you connect all that stuff together?

And for a lot of people, it may not have been obvious, but for me it was pretty direct because you could see that all of these other measures, cases, hospitalizations, deaths, are basically lagged indicators (gesundheit, speaking of public health) of when people got infected in the first place.

That sneeze, that was my producer, Julie. She’s fine, not contagious.‌
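The lagged-indicator idea can be sketched in a few lines. This is an illustration under stated assumptions, not Kaplan's model: the function name, the delay distribution, and the 10% figure are all invented. Each observed signal is treated as the hidden stream of infections, delayed and thinned.

```python
def lagged_signal(daily_infections, delay_probs, fraction_observed):
    """Expected daily count of an observed outcome (e.g. hospital admissions).

    daily_infections:  new infections per day (the hidden "arrivals")
    delay_probs[k]:    probability the outcome shows up k days after infection
    fraction_observed: share of infections that ever produce this outcome
    """
    signal = []
    for day in range(len(daily_infections)):
        expected = 0.0
        for lag, prob in enumerate(delay_probs):
            if day - lag >= 0:
                expected += daily_infections[day - lag] * prob
        signal.append(fraction_observed * expected)
    return signal


# Hypothetical: a burst of 100 infections on day 0 and none afterward.
infections = [100, 0, 0, 0, 0, 0]
# Hypothetical delay: the outcome appears 2, 3, or 4 days after infection.
delays = [0.0, 0.0, 0.25, 0.5, 0.25]
admissions = lagged_signal(infections, delays, fraction_observed=0.1)
print(admissions)
```

With these inputs the admissions peak lands three days after the infections did. Running the logic in the other direction, from several such signals back to the infections that caused them, is the harder and more useful problem.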

Anyway, Kaplan’s example isn’t a hypothetical. When COVID first emerged, Yale asked its faculty, including Kaplan, to help manage the crisis. So along with a statistician and a public health scholar, Kaplan produced a model to determine how often you need to test undergraduates to contain the virus in the dorms. Was once a week enough? To figure this out, the team had to start somewhere, so they made some assumptions. In other words, they made up some numbers. Let’s say there are three infected students, and each day one more gets infected through contact with someone off campus. And containing the virus means infection rates need to stay below 5%. Once a week wouldn’t be enough, it turned out; you’d have to test students every three days. This is how he teaches his students in Policy Modeling to approach problems.

I have some soundbites that I use in the class which turn out to be pretty effective in getting some of these ideas across. One of my favorite ones is I say, OK, this is policy modeling. When we don’t know something, we assume we do.

What does that mean? I need to know a number. I need to know some parameter. I don’t know what it is, so I’m just going to invent something and say, OK, I’m going to call it Q. Now we’re going to go through this thought. Let’s pretend if we knew what this thing was, what would we do with it? How would you use it? And then all of a sudden, the answer to your original question just ends up falling out because you’re able to build the whole system around it pretending you knew what the answer was. We use this over and over and over again.

How did Kaplan get so good at policy modeling? It all started back in Saskatoon.

I just always enjoyed mathematics, but it wasn’t merely the abstraction part of it. It was the idea of using logic in a systematic way to solve problems. I always liked math more on the applied side than on the pure theory side.

Then he went to college in Montreal at McGill, where he was thinking he’d like to major in urban planning, but there was a class in the course catalog that caught his eye.‌

Mathematical Models and Applications. That’s the name of the course. It’s also the name of a book that I have sitting on the shelf, which was the textbook for that class. And I started reading about this. In one class, you would study things as disparate as the laws of planetary motion, the spread of rumors, traffic, voting—just all manner of different real problems, but using mathematics.

Kaplan knew he wanted to take that course, but there were a lot of prerequisites. So he had to chart a path of study specifically designed to prepare him for that class. This, by the way, is another strategy Kaplan teaches for solving problems.‌

I actually ended up solving, without knowing it, an applied mathematical problem called dynamic programming, which we can summarize as, if I know what I want to do and where I want to be at some point in the future, what do I have to do now to get me there? And what you do is work backwards. A lot of people work forwards with problems. Sometimes you have to say, where is it you want to end up? And then you work backwards.

By the way, that’s the same principle for the best ways of filling out basketball brackets for March Madness. Don’t ask, who’s going to win this game? Who’s going to win this game? Ask yourself at the beginning, who’s going to win the whole tournament? And then work backwards and say, well, if they’re going to win the tournament, then they have to win all the intervening games.
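Here is a minimal sketch of that work-backwards idea applied to a course plan. The prerequisite chart is invented for illustration (the conversation doesn't list the real prerequisites), and the function names are hypothetical. Starting from the target course, the recursion asks what has to come first, and caches each answer so no course is worked out twice.

```python
from functools import lru_cache

# Hypothetical prerequisite chart; every course except the target is invented.
PREREQS = {
    "Mathematical Models and Applications": ("Differential Equations", "Probability"),
    "Differential Equations": ("Calculus II",),
    "Probability": ("Calculus I",),
    "Calculus II": ("Calculus I",),
    "Calculus I": (),
}


@lru_cache(maxsize=None)
def terms_needed(course: str) -> int:
    """Fewest terms needed to finish `course` if prerequisites must come in earlier terms."""
    prereqs = PREREQS[course]
    if not prereqs:
        return 1  # no prerequisites: can be taken in the first term
    return 1 + max(terms_needed(p) for p in prereqs)


def plan(target: str) -> dict:
    """Earliest term for each course on the way to `target`, found by walking backwards."""
    schedule = {}

    def visit(course: str) -> None:
        schedule[course] = terms_needed(course)
        for prereq in PREREQS[course]:
            visit(prereq)

    visit(target)
    return schedule


for course, term in sorted(plan("Mathematical Models and Applications").items(), key=lambda kv: kv[1]):
    print(f"term {term}: {course}")
```

In this made-up chart the target course can't be taken before the fourth term, which tells you what has to go on the schedule in the first.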

So I took that class and I loved it and it really changed me.

What the road to Damascus was for Paul, Mathematical Models and Applications became for Ed Kaplan. His policy modeling course, the one he’s taught for 37 years, is its own thing, but he got a lot of inspiration from that modeling class back in Montreal.

The kinds of topics that I deal with did not show up in that class. But what did show up in that class and what does show up in my class is this idea that you can actually talk about what makes good versus bad models.

Then in graduate school, Kaplan discovered a field where a polymath like him could thrive.‌

One of my professors was a professor of Operations Research, and this was the first time I’d ever heard of that field, but it was very much applied mathematics for problem solving. And his expertise was urban problem solving. It was sort of exactly the sort of stuff that I was interested in. And all of a sudden, it became clear to me that it was that kind of modeling and that kind of mathematics applied to real problems which was much more interesting to me than city planning per se.

For the operations research work Kaplan does, the math is not an end in and of itself, but it opens up a lot of possibilities, especially when there isn’t much data.

A lot of policy problems, if you think about it, are dominated by people who are substantive experts. People understand housing policy. You have doctors who understand addiction or epidemiologists who understand how infectious diseases spread. We could kind of go on and on and on in different policy areas, but in almost every case, they do not have the mathematical ability to sort of formulate some of the key problems crisply, and then develop solutions based on that mathematics. And that’s what I became an expert at.

It isn’t even so much that what I do is the world’s most complicated math. That’s not true at all. In fact, I put a premium on making things as simple as possible because if you want someone to adopt the solution or consider it, they need to understand it. And if they’re going to understand it, they have to be able to understand it at a level that does not require equations.

You don’t need to understand queuing theory to understand that, as Kaplan says, infections are like customers. Public housing tenants are also like customers. This is something Kaplan realized when he was working for the city of Boston.‌

It’s a peculiar kind of a queue because of course, people don’t wait forever. People are waiting for public housing, all of a sudden something happens—it could be that they found something that was affordable on the private market; it could be that they leave the area altogether; it could be that they move in with a relative. Lots of things. Which means that from the queuing point of view, people who were in line have dropped out.

And renovating a public housing project is also a queuing problem.

Basically, you have units that are being taken out of service and new units are replaced, and they’re coming on board. And so you get this fluctuating inventory of units, which becomes this queue. And what you’re trying to do is basically minimize the number of vacancies over this whole time. If you have a lot of vacancies, of course that’s inefficient. On the other hand, you can never have a situation where there’s not enough vacancies so that people actually don’t have a place to live. So this actually turned out to be a model that looks like a great big game of musical chairs, in a way. You’re moving everybody around trying to make sure that everybody always has a place to sit, so to speak. But you’re trying to make the whole thing go as fast as possible because this is a big construction project, so time literally is money.

The Boston Housing Authority—they didn’t know much about operations research.‌

I walk into a room, it’s about, I don’t know, a quarter of the size of this one. No windows, big pieces of paper taped to the wall where they had this housing project that had, I don’t know, 15 buildings or something, and they were literally by hand trying to say, OK, next Thursday the Smith family is going to move from this apartment to that apartment, this building, that building, whatever. And they’d gotten about halfway through and they got stuck. And by getting stuck, I mean it was impossible for them to make the moves. They didn’t have any available units that were appropriate for the people who had to be moved out, and you had to move the people out of the building so that you could modernize it. ‌

I looked at them and I said, how many buildings do you have? It’s like 15. I’m thinking to myself, how many sequences have you tried? Well, this is the first one.

Fifteen factorial is a big number! I mean 15 times 14 times 13 times 12… That’s the number of different sequences there are. I said, there’s a better way to do this. And so that actually was the first problem that got me into this whole public housing thing. And the problem was to come up with an optimal (that is to say, time-minimizing) sequence that would make sure that everybody is always housed in an appropriately sized unit. I was able to use this analysis and shave something like six months off the whole process. And it was really important. I mean, the people who were the contractors were coming in and they had their own ideas about how to sequence things. “Oh, this would never work.” Well, actually, the model said no, it would work. It produced a schedule that said, you can move these people on this day and these people on this day, and it all worked.
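For a sense of how big, the count of possible building orders can be computed directly. This only counts sequences; it says nothing about how Kaplan's model actually searched among them.

```python
import math

buildings = 15
orderings = math.factorial(buildings)  # 15 * 14 * 13 * ... * 1
print(f"{orderings:,}")  # 1,307,674,368,000
```

That is about 1.3 trillion sequences, which is why trying them one at a time on paper taped to a wall gets stuck.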

Kaplan’s PhD thesis was called Managing the Demand for Public Housing. Since then, he’s worked on all sorts of problems. He built another model showing that you could reduce sexual assault in a South African township by building more toilets. How did that solve the problem? Well, more toilets means, on average, closer toilets, which means women don’t have to walk as far in the middle of the night, so it’s less risky. These projects tend to be collaborations with other researchers who specialize in a specific topic and they come to Kaplan for help.‌

For me, while I agree that it’s very hard to be successful if you work on something that you’re not interested in, it’s not enough that you’re the one who’s interested in it. Especially with this kind of work, there needs to be some third party that thinks this is the most important thing in the world. Otherwise, you’re kind of doing all this work and at the end, who cares? You don’t want to work on problems where at the end people can come up to you and say, who cares? That’s a bad use of your time, in my view, for what I’m doing.

We’ve talked a lot so far about queuing problems, but not every situation can be reduced to a queuing problem. Kaplan tells his policy modeling students about different kinds of problems.‌

A queuing problem, a Markov problem, a Poisson problem, a certain kind of optimization, whatever, cost-effectiveness, whatever.

But more than that, he’s trying to help them develop a way of thinking about problems. So is that the skill you’re trying to teach? Is this inductive reasoning?

It turns out that what we’re doing is not inductive. Or it’s certainly not inferential. In other words, it’s not like statistics; it’s the opposite. What we do is you make a set of assumptions which are very simple, and you try to get as far as you can with the simplest model, and you only start making things more complicated if you have to. Because the beauty of the simple models is that they’re easy to understand and you can explain them.

What a lot of people do is exactly the opposite of what I do. What they do is they go out, they collect masses of data, and then they just stare at the data. And somehow or another, the truth about whatever they’re studying is just going to leap out.

But the problem with doing that is if you go in and you don’t have a very clear sense of what the question is you’re trying to answer, then how do you even know what the right data are to collect? Whereas with me, the modeling is what tells me what data I have to collect.

In his Policy Modeling class, Kaplan teaches his students about the study that brought him his first big burst of attention back in the 1990s. It was about whether needle exchange programs could slow the spread of HIV/AIDS. One way to do public health research is through surveys, but people aren’t always the most reliable source of data about themselves. So Kaplan’s team found their answers elsewhere.

We came up with a system that basically let the needles do the talking. They don’t have the same proclivity to please the interviewer. That was what was so compelling about the study.

And he found a clever way to explain it all.

With needle exchange, the analogy is to malaria. People infect mosquitoes; mosquitoes infect people. All right, so now people infect needles; needles infect people. The needles play the role of the mosquito. Imagine that you’re going to study malaria, not by testing people, but by testing mosquitoes. And the rate at which people get infected depends on what fraction of the mosquitoes are infected.

The same thing happens here. The rate at which people get infected from needle sharing depends on the fraction of needles that are infected. And that’s something that you can directly affect with the needle exchange program. Because after all, here’s two needles; this one’s been floating around for three weeks in the population, and this one’s been floating around for three days. Which one’s more likely to be HIV positive? Not too hard to figure that out, right?
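The two-needle comparison can be made concrete with a toy calculation. This is not the model from the study: it assumes, purely for illustration, that each day a needle circulates it has the same small, independent chance of being used by someone who is HIV positive, and it ignores cleaning and everything else. The 5% daily risk and the function name are hypothetical.

```python
def prob_needle_infected(days_in_circulation: int, daily_risk: float) -> float:
    """Chance a needle has been contaminated at least once while circulating.

    Assumes an independent `daily_risk` of contamination on each day (a toy assumption).
    """
    return 1.0 - (1.0 - daily_risk) ** days_in_circulation


DAILY_RISK = 0.05  # hypothetical value, chosen only to show the shape of the effect

three_weeks = prob_needle_infected(21, DAILY_RISK)
three_days = prob_needle_infected(3, DAILY_RISK)
print(f"3 weeks out: {three_weeks:.2f}   3 days out: {three_days:.2f}")
```

Whatever the daily risk is, the needle that has been out longer comes back more likely to be contaminated, so shortening circulation time lowers the fraction of infected needles. That is the lever a needle exchange pulls.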

And it all came back to the model.

The entire evaluation of this needle exchange program was driven by a mathematical model of how needle exchange actually worked. And on the basis of that model, we were able to say, what do we need to know in order to figure out if it’s working? And that led to this whole idea of tracking the needles, testing the needles, being able to identify the needles, so that whenever a needle would come back, we would know who it was given to and if it was the same person as the person who returned it. And could we see how long that needle had actually been out in the community? And what fraction of the needles were actually infectious with HIV? We were able to do all of that. But the model was what told us what data we needed. Which is very different than going in with a big survey and saying, all right, tell me all the different drugs you use. Tell me how many times you inject. Tell me how many partners you have. Just answer all of these questions.

After two hours in Kaplan’s office, I was starting to understand what he does. Not the details of the math, but the approach to problem solving. So needles are like mosquitoes. Infections are like customers. Public housing tenants are like hungry podcast hosts. There’s a practical side of all this for the MBA students. Most of them won’t be creating their own models, but they are going to hear about them and have to evaluate them.

Especially if they’re going into areas like consulting, for example. They need to know whether or not what some analyst is coming up with is—does it pass the laugh test, in some sense?

But most importantly, Kaplan wants his students to learn how to think.

What it’s giving them is the facility to take a look at something, a problem that they read in the newspaper or they know something about, and to see structure in it, to see simple structure in it, and to see the consequences of that structure. It’s not a math class—a theorem and proof thing. It really is now that you understand how some of these simple models work, you can put them together and create your own models for problems that just come to you, not in mathematical form.

What I really consider to be the best student is not the one who gets a perfect score on all the problems, but someone who could take a newspaper article and create a problem set based on it, that would lead somebody step by step through an analysis of a problem. That to me is great.

Little data. Getting water from a rock. It’s a useful approach, but the world isn’t moving that way.‌

You know, a lot of attention these days has gone to the AI world and into the big data and machine learning and stuff like that, and there’s some overlap. There are some questions that I look at with this approach that someone who knows nothing about the process or whatever can just take a bunch of data and dump it into an AI program.

And then the program could give you an answer, and quickly. But having an answer isn’t always enough.

People would have no clue as to why, but just in terms of getting numbers out, they can do that. And so that’s where the limit of AI is pushing us here.

Kaplan feels strongly that his back-of-the-envelope approach to problem solving has lasting value.

I think there’s always going to be an audience for this because there’s always going to be people who need to be able to explain stuff, and who really do want to understand stuff as opposed to just push a button and get an answer.

Getting the correct answer matters less to Kaplan than formulating the right question. It’s like the difference between following directions on the map in your smartphone versus looking at the sky or the shadows on the ground and knowing where to go. Because with some problems, some of the most important and urgent problems, in fact, there isn’t going to be a map yet for how to solve them. But there may be a model.