In Search of the Elusive Makeup Call

The makeup call is an infuriating piece of NBA folklore. We assume it exists, rolling our eyes as officials compensate for one mistake by piling on another. But it takes work to prove something is real, and the NBA will fight like hell against any suggestion that its officials operate with something below pristine integrity. One referee sued a courtside reporter who claimed to overhear the promise of a makeup call.

The fight will take a new turn in the coming weeks, when the journal Economic Inquiry publishes what is among the first peer-reviewed studies claiming to present strong evidence of possible makeup calls. Paul Gift, the author of the study and an economics professor at Pepperdine University, scoured five years of play-by-play data to see if particular calls against one team triggered a pattern of calls against the other.

Gift zeroed in on calls against offensive teams that require the most subjective judgment from referees: offensive fouls, traveling calls, and three-second violations. Most other turnover types don’t require the same level of interpretation from referees, Gift argues. The offensive team doesn’t have much of a gripe when it throws the ball out of bounds, diddles around as the 24-second clock expires, or tosses a lazy pass into the waiting hands of a defender.

The charges, travels, illegal screens, and selective three-second enforcement are the calls that inspire wild-eyed anger and Zapruder-style film breakdowns. Gift found they also lead to a stunning uptick in the same calls against the other team, often in the very next possession. If the refs hit Team A with an offensive foul, traveling call, or three-second violation on one possession, the likelihood of their opponents getting dinged with the same call on the next possession jumps by between 16 percent and 66 percent depending on the type of call in question, Gift’s study found.

The makeup effect persists, though in gradually diminishing power, for as long as a half-dozen possessions after the original call, Gift says.
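To make the idea concrete, here is a minimal sketch of the kind of comparison involved: how often a call shows up overall versus how often it shows up on the opponent's possession right after the same call. This is not Gift's model, which the study specifies in far more detail than a raw frequency count. The function name `makeup_lift`, the outcome labels, and the ten-possession log are all made up for illustration.

```python
from typing import List, Optional, Tuple

Possession = Tuple[str, str]  # (team with the ball, outcome label); hypothetical format


def makeup_lift(log: List[Possession], call: str, lag: int = 1) -> Tuple[float, Optional[float]]:
    """Return (baseline_rate, rate_after_opponent_call) for one call type.

    baseline_rate: share of all possessions that end in `call`.
    rate_after_opponent_call: share of possessions ending in `call` among
    the opponent's `lag`-th possession after each such call.
    """
    baseline = sum(1 for _, outcome in log if outcome == call) / len(log)

    followups = []
    for i, (team, outcome) in enumerate(log):
        if outcome != call:
            continue
        # Walk forward to the other team's lag-th possession after this call.
        seen = 0
        for other_team, other_outcome in log[i + 1:]:
            if other_team != team:
                seen += 1
                if seen == lag:
                    followups.append(other_outcome == call)
                    break

    conditional = sum(followups) / len(followups) if followups else None
    return baseline, conditional


# Toy log, invented for illustration only (real rates are far lower).
toy_log = [
    ("A", "offensive_foul"), ("B", "offensive_foul"),
    ("A", "made_shot"), ("B", "missed_shot"),
    ("A", "travel"), ("B", "made_shot"),
    ("A", "offensive_foul"), ("B", "made_shot"),
    ("A", "missed_shot"), ("B", "made_shot"),
]

baseline, after_call = makeup_lift(toy_log, "offensive_foul", lag=1)
print(f"baseline: {baseline:.3f}, after opponent's call: {after_call:.3f}")
```

Raising `lag` from 1 toward 6 would trace how the gap fades over later possessions. A raw count like this cannot separate officiating from changes in how the teams play, which is why the caveats further down matter.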

There are caveats, of course. Public play-by-play data doesn’t always distinguish between types of offensive fouls, so there was no way in many cases for Gift to determine whether the original call was a charge, an illegal screen, a push-off, or something else. On the whole, referees were 21 percent more likely than usual to call an offensive foul on one team on the possession immediately following an offensive foul call on the other team, the study found.

The effect was most pronounced for three-second violations, which were 66 percent more likely in the possession after officials called one on the other team.

Those sound like huge numbers, and they are. But these events are so rare relative to other calls (and non-calls) that even a massive makeup effect wouldn’t manifest itself in every game. Gift found that a team commits an offensive foul on only about two of every 100 possessions. Three-second violations are even rarer; teams commit about one on average every 300 possessions.

“These are reasonably large numbers,” Gift says. “But they are small when you think about it another way.”
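The arithmetic behind that point is short enough to write out. The sketch below applies the reported relative increases (21 percent and 66 percent) to the reported baseline rates (about two per 100 possessions and about one per 300). It assumes the increase multiplies the baseline directly, which simplifies whatever the study's model actually estimates. The function name `rate_after_call` is made up for illustration.

```python
def rate_after_call(baseline_per_possession: float, relative_increase: float) -> float:
    """Per-possession rate after scaling a baseline by a relative increase."""
    return baseline_per_possession * (1 + relative_increase)


# Baselines and increases as reported in the article.
offensive_foul = rate_after_call(2 / 100, 0.21)
three_seconds = rate_after_call(1 / 300, 0.66)

print(f"Offensive fouls: {offensive_foul * 100:.2f} per 100 possessions")
print(f"Three-second calls: one every {1 / three_seconds:.0f} possessions")
```

Under that assumption, the offensive foul rate on the possession after a call moves from 2 per 100 to about 2.4 per 100, and three-second violations go from one every 300 possessions to roughly one every 181. The relative jump is big, but the absolute change is a fraction of a call per game.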

The makeup effect remains across all four quarters in the regular season and postseason, regardless of scoring margin, Gift found.

Factors beyond officiating could muck up Gift’s findings. An aggrieved team might play harder right after a shaky offensive call against it, dialing up feistier on-ball defense and more diligent rotations, and that attentiveness could lead to an uptick in charges and traveling calls without any help from referees. That effect appears to exist on offense, anyway. Gift found that when a team commits a “non-judgment” turnover on one possession (a lost ball, errant pass, or 24-second violation), it is less likely to commit the same turnover type the next time it gets the ball.

Here’s the rub: more effort and urgency from the defensive team should result in an increase of all turnover types, since better defense can make any bad outcome more likely for the offense — missed shots, live-ball turnovers, 24-second calls, and the sorts of judgment calls at the focus of Gift’s study. If teams gear up on defense after a bad charging call torpedoes one of their offensive possessions, we should see a jump in all those bad offensive outcomes for their opponents.

But we don’t, Gift says. Rates of all turnover types stay about the same¹ in the possession after one of Gift’s judgment calls — except for offensive fouls, traveling calls, and three-second violations, per Gift’s parsing of play-by-play data. In other words: Those three turnover types, and only those three turnover types, happen much more often on one end of the floor if they have just happened on the other. And those turnover types require major subjective judgment from the refs.

Again, the analysis does not distinguish between types of offensive fouls. If Gift could isolate charging calls, it’s possible that he’d find that increased defensive effort explains some of the makeup call effect. The analysis also doesn’t drill down to the team level, and teams of course play very different styles on both ends of the floor. Maybe teams that force a lot of turnovers overall can really dig in after a bad offensive foul call against them. Perhaps a low-turnover defense, like last season’s ultra-conservative Blazers, gets pissed off after a bad charging call and flies around the floor with unusual vigor on the other end.

An analysis based on SportVU camera data could examine one subset of calls — say, the alleged makeup traveling calls — and see if increased man-to-man pressure contributed more than guilt-ridden officiating. Like every academic tackling this stuff, Gift did not have access to the specifics of which referee whistled each call; the NBA keeps that data private.

The league read a draft of Gift’s paper two years ago. The NBA was already studying just about every aspect of its officiating, but higher-ups were curious about Gift’s research. The league decided to conduct video reviews of makeup call scenarios that fit Gift’s model, says Steven Angel, NBA senior vice-president for referee operations and analytics. The league found no evidence that the second call in those sequences was less accurate than the first call, Angel says.

In theory, we would expect the second call to be more of a stretch, since officials searching for a makeup opportunity might push a rule beyond its boundaries.

This is not the final say on makeup calls, and as Gift is careful to point out, it is not proof that referees are devious or evil. That sort of sensationalist headline popped up after the famous study by Justin Wolfers and Joseph Price in 2007 found that majority-white officiating crews were slightly less likely to call fouls on white players. Critics rushed to demonize referees as racist, when the study really showed evidence of a tiny and subconscious cognitive bias in favor of the familiar.

“This research, to me, shows that referees are human,” Gift says.

The data does not necessarily prove that officials make a conscious decision to do one team a favor as a makeup gift, though that may be the case with some calls. Officials may feel some subconscious regret, or this could be a simple case of a phenomenon called “priming,” by which the act of noticing one thing — a three-second violation, for instance — sets off an involuntary prompt among the rest of the crew to notice the same thing.

The league thinks priming likely explains some of the data. “This paper has some interesting findings,” Angel says. “I come to a different conclusion than the author does, and I have some additional resources at my disposal.”

The evaluation of referees, from both inside and outside the league, will only get more exacting as new data emerges. Adam Silver has promised increased transparency about officiating, and the league this season will send teams more of its internal postgame referee evaluations. The Price-Wolfers effect has vanished since the publication of the study.

The league itself is also getting into the academic study game. A little over a month ago, the NBA hired Arup Sen, a PhD economist who has presented at the MIT Sloan Sports Analytics Conference and once proposed an intriguing redesign of the draft process. Sen will study officiating trends just like outside academics, only with access to the NBA’s secret data.

Officiating will never be perfect; even robot officials would either miss calls or whistle so many fouls as to make NBA games interminably long. Not every bias is insidious, and referees really do try their best to get things right.

“We’re not looking to whitewash anything,” Angel says. “We want to make sure we have the very best officiating we can for every game.”