Schrödinger’s Refs: A Halfhearted Audit of the 2011 NFL Officials
The Titans are trailing by 25 points, deep in the third quarter, when they score a touchdown. The extra point is converted, the onside kick is well-executed, and the opposing team flubs the catch. Players fling themselves at the loose ball from all directions, and it is quickly lost under a pile of bodies. Two referees arrive on the scene, take a moment to sort through the melee, stand up, and point in opposite directions. The referee who had originally signaled against Tennessee realizes that there is dissent in the zebra ranks, and swiftly defers to his colleague, swapping arms to rule in favor of the Titans. Unfortunately, his colleague’s desire for conformity is just as strong, and as one right arm falls, another rises, and once again, the two officials point in opposite directions. The pair of them look less like referees and more like a novelty two-man dance troupe on America’s Got Talent, possibly called “Flip-Flop, Don’t Stop”.
Now, this clearly did not happen this season. As much as I’d like to follow through with my feint, I’m aware that my pull-back-and-reveal trickery is unlikely to fool our razor-sharp readership into thinking that these were replacement referees. Apart from anything else, the title of this article provides a fairly hefty clue to the contrary, and you’ll not be surprised to hear that this happened at Heinz Field in Week 5 of the 2011-12 season, in a game blessed with unionized, non-replacement referees. It should also be noted that those refs (once they’d stopped voguing) succeeded in correctly awarding the ball to the Titans, and the confusion merely served to add some light relief to an otherwise mirthless Steelers victory procession. Pittsburgh intercepted a Hasselbeck pass on the line of scrimmage on the very next play, so the referees’ decision could not have been more moot, even if it were incorrect, and it didn’t even warrant a mention in most game write-ups. And why should it?
But had it happened under the eye of replacement referees, there’s no doubt that everyone would remember it. Every game of the first three weeks of the 2012-13 season was intensely scrutinized for evidence of refereeing incompetence, and that evidence has been extensively documented. It all points in one direction: The replacement referees were terrible. But it doesn’t give us any context. As someone who instinctively defends referees (in all sports) on the grounds that they do a near-impossible job far better than I could, I found myself compelled to subject the union refs to the same degree of scrutiny, so I decided to review their performance through the 2011 season. I then realized that would mean watching 267 games, and quickly downgraded my objectives, deciding instead to focus on the level of press attention and online anti-referee fan outrage caused by their decisions.
This has not been a scientific process, but it has nonetheless revealed some interesting results. However, before we get on to those, I should outline the methods used, so you can see precisely how unscientific the process has been. I initially decided to monitor levels of anti-referee Twitter activity for each game, but found this to be both mind-numbingly dull and completely futile. When dealing with something as subjective as football, it’s possible to find someone willing to criticize every vaguely contentious refereeing decision, and someone else who’s happy to defend each call (often the NFL itself), so I instead opted for an approach which was merely mind-numbingly dull and possibly futile. This involved Googling every matchup using various keywords (flag, blown, referee, call, and so on), and then scanning the first few pages of results for reports of refereeing incompetence. I then watched each disputed play and assessed which I thought were genuinely blown calls, or at the very least extremely contentious decisions, and which were merely the marks left on the Internet by sobbing Raiders fans. Once again, I’d like to stress that this process is hugely flawed, for all sorts of reasons, not least the vagaries of Google’s search algorithms, and I’m absolutely certain that if you were foolish enough to try this yourself, you’d come up with different results.
Right, enough caveats. Here’s what I found: Of the 267 games played last season, 65 (just under a quarter) featured controversial calls that were sufficiently noteworthy to turn up in my various searches, a fairly unsurprising result. What is surprising is the distribution of those games: There were only four in the last three weeks of the regular season (one in 12, or an 8 percent chance), and regular-season Sunday games were less likely to be contentious than Thursday/Monday Night Football and postseason games (23 percent versus 30 percent). Also noteworthy is that the most controversial periods of the regular season were Weeks 1 to 3 (30 percent controversial) and Weeks 10 and 11 (37 percent controversial).
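For anyone inclined to check the sums, the two headline rates above can be reproduced in a few lines of Python. This is only a rough sketch: the counts are the ones quoted in the text, while the figure of 16 games per week for the final three weeks is an assumption (it is what "one in 12" implies), and the variable and function names are purely illustrative.

```python
from fractions import Fraction

# Counts quoted in the article.
TOTAL_GAMES = 267      # regular season plus postseason
FLAGGED_GAMES = 65     # games with a noteworthy controversial call
LATE_FLAGGED = 4       # flagged games in the last three regular-season weeks

# Assumption: 16 games in each of the final three weeks (no byes),
# which is what the article's "one in 12" figure implies.
LATE_GAMES = 3 * 16


def share(part: int, whole: int) -> Fraction:
    """Return part/whole as an exact fraction."""
    return Fraction(part, whole)


overall = share(FLAGGED_GAMES, TOTAL_GAMES)
late = share(LATE_FLAGGED, LATE_GAMES)

print(f"Whole season: {float(overall):.1%} of games flagged")
print(f"Final three weeks: {late} of games, or {float(late):.1%}")
```

The other percentages (Sunday versus prime-time games, Weeks 1 to 3, Weeks 10 and 11) are left out because the underlying game counts are not given in the text.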
Clearly, a nationwide television audience increases the odds of a contentious decision appearing in my search results, but there’s also a significant bias toward periods in which the league is at its most competitive. Those first three weeks are the only time in which every fan believes their team can make the playoffs (speaking as a Browns fan, I know how short-lived that optimism can be), and as a result, every decision that shapes a game’s outcome in that period is of tremendous importance. It’s a lot harder to get worked up about a phantom pass interference call when your team’s 4-11 for the season, and it’s also clearly a lot easier for a referee to make the right calls when no one cares too much about the result. Change the scenario to a closely contested Monday night battle between two 6-4 sides, and you increase both the chance of referee error and the chance that journalists and fans will care enough about the controversy to trouble the Google search rankings.
The short version of the above simply reads: The act of observation is enough to influence the event, and if we bear that in mind when analyzing the replacement refs, it explains an awful lot about their performance. It’s easy to forget that they started Week 1 reasonably well, but every error they made was cataloged with forensic rigor by an eagle-eyed media (who were equally rigorous in their background checks) and the replacements clearly couldn’t cope with the enormous level of scrutiny, both on and off the field. Their confidence was likely shattered, they lost any respect they may have originally had with players and coaches, and, by Week 2, refereeing standards had slipped faster than a wide receiver stepping on a referee’s hat in the end zone. Which reminds me: The refs were also hellishly unlucky, with a capacity to conjure disasters from the most unlikely of situations, but that particular incident was one of the very few that didn’t have a precedent from the 2011 season. Others did. For instance, who can forget the regular referees gifting the Niners five bonus yards when spotting the ball on a game-winning drive that ended with a 6-yard fourth-and-goal touchdown? Well, not the Detroit Lions, that’s for sure.
Of course, we now have our union referees back, and they’ve returned with a highly competent display in Week 4 (apart from awarding Matt Stafford a TD after he broke the plane of the end zone with the football’s aura; the ball certainly didn’t cross the line, I know that much). Nonetheless, it’s great to have the proper referees back, for all their many failings, and I’m sure they’ll do a fantastic job during the honeymoon period they’ve earned while watching the replacements (mis)handle the season openers. Based on my (so-called) research, I expect that honeymoon to end in Week 10, at which point fans will have tired of seeing fewer incorrect decisions delivered with more haste, and with significantly more confidence. My relief at seeing the original referees return has less to do with an improvement in standards, and more to do with the fact that I can’t help wondering what would have happened had the replacements made the correct call on Golden Tate: Would the NFL have been able to hold firm without that high-profile Monday night error, and how many more games would it have taken for the replacements to become the regulars? I suspect it wouldn’t have required much more game time for the scabs (if you’ll excuse the slightly grisly analogy) to heal themselves, and I’m mightily relieved that we’ll never know the answer.