Rung Up: Are Postseason Umpires Actually Baseball’s Most Accurate?

The postseason is intended to showcase and reward baseball’s best teams, which make their case for inclusion over a revealing 162-game schedule. And while one wouldn’t know it from the invective that accompanies every questionable October call, the same is supposed to be true for umpires. “The goal each and every season,” MLB spokesman Mike Teevan told me via email, “is to have the most deserving Umpires working Postseason games.”

In the small samples of postseason play, the most deserving players and teams don’t always perform the way they did during the regular season. Presumptive every-award winner Clayton Kershaw took two losses in the NLDS after losing three games all year; in the ALDS, probable MVP Mike Trout went 1-for-12. All four division-series favorites were either swept or eliminated in four games. Umpires, too, have been in the spotlight for undesirable reasons. Fourth-year ump Vic Carapazza, working his first postseason, ejected Nationals second baseman Asdrubal Cabrera and manager Matt Williams after they objected to his strike zone in the 10th inning of an 18-inning Game 2. After the Dodgers’ Game 3 loss, outfielder Matt Kemp called umpire Dale Scott’s zone “terrible” and “by far the worst I’ve ever seen.” And with the Nats on the verge of elimination in the ninth inning of Game 4, plate ump Hunter Wendelstedt drew widespread criticism after ringing up shortstop Ian Desmond on what appeared to be a checked swing without appealing to the first-base umpire for a more informed ruling.

Given the elevated stakes and the national profile of postseason games, it’s only natural that umpires would be subjected to greater scrutiny at this time of year and that we’d fixate on every missed call as intently as we dissect every managerial decision. It’s also not surprising that frustrated players, moments after a setback or a demoralizing defeat in the playoffs, would resort to umpire-blaming more quickly than they would after an equivalent call in a meaningless midsummer matchup. In October, more smoke surrounding umpire calls doesn’t necessarily mean more fire.¹ It might just mean that the inevitable mistakes humans make when asked to do an impossible job provoke stronger reactions when the calls matter more.

MLB’s postseason umpire selection system is supposed to minimize those mistakes. “The main component in the selection of Umpires for Postseason assignments is performance during the season,” Teevan wrote. “We factor in results from the Zone Evaluation system for their plate assignments, accuracy on their calls and rulings, and observations of their work by our Supervisory staff. In addition, there is consideration given to an Umpire’s experience level (overall seniority and previous Postseasons), his proficiency at handling situations, health and time missed during the season, and a number of other administrative factors.”

Theoretically, then, umpires who work in the postseason should show above-average regular-season performance, just as the hitters and pitchers on playoff teams will, on the whole, have better stats than those on losing teams. Similarly, postseason umpiring should be more accurate than regular-season umpiring on the whole.

We can look for evidence of increased October umpire quality with data from PITCHf/x, Major League Baseball Advanced Media’s pitch-tracking technology. Calling pitches is only one aspect of an umpire’s job, but it might be the most important, especially now that the expanded instant replay system gives teams recourse to appeal most other kinds of calls.

With assistance from Daren Willman, proprietor of invaluable advanced-stats resource Baseball Savant, I examined umpire correct-call rates from 2009 to 2014, all of which fell into a narrow band between the lower limits of MLB’s tolerance for idiosyncratic strike zones and the upper limits of the human sensory system.² Willman classified strikes on called pitches outside the dimensions of the rulebook strike zone and balls on called pitches inside the zone as incorrect calls. Balls on called pitches outside the zone and strikes on called pitches inside the zone were designated as correct calls. Each umpire’s correct-call rate is simply his tally of correct calls divided by all of his calls.
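
The bookkeeping behind that rate can be sketched in a few lines of Python. This is a minimal illustration, not Willman's actual query: the `CalledPitch` record and its `in_zone` flag are hypothetical names, and the work of deciding whether a tracked pitch crossed the rulebook zone is assumed to have happened upstream.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class CalledPitch:
    in_zone: bool        # hypothetical flag: did the pitch cross the rulebook zone?
    called_strike: bool  # True if the umpire called a strike, False if a ball


def is_correct(pitch: CalledPitch) -> bool:
    # A strike inside the zone or a ball outside it is correct;
    # a strike outside the zone or a ball inside it is a miss.
    return pitch.called_strike == pitch.in_zone


def correct_call_rate(pitches: Sequence[CalledPitch]) -> float:
    # Correct calls divided by all called pitches.
    if not pitches:
        raise ValueError("need at least one called pitch")
    return sum(is_correct(p) for p in pitches) / len(pitches)


# Made-up sample of four called pitches (not real data): two right, two wrong.
sample = [
    CalledPitch(in_zone=True, called_strike=True),
    CalledPitch(in_zone=False, called_strike=False),
    CalledPitch(in_zone=False, called_strike=True),
    CalledPitch(in_zone=True, called_strike=False),
]
print(correct_call_rate(sample))
```

Swings, fouls, and balls in play never enter the tally, since only called pitches reflect the umpire's judgment.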

Among the 79 umpires who called at least 3,000 pitches during the 2014 regular season, the difference between the most accurate (Lance Barksdale, 88.6 percent correct calls) and the least accurate (Brian O’Nora, 84.2 percent) was only 4.4 percentage points. Because full-time umps can call several thousand pitches in a season, though, minor differences in accuracy add up: The gap between Barksdale and O’Nora translates to 193 incorrect calls over the course of a typical umpire’s season, or roughly seven per full game behind the plate (which would, on average, be distributed evenly between teams). Most umpires, however, are clustered so closely together that you’d have a hard time telling the good from the bad by watching.³
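
As a back-of-the-envelope check, the quoted figures can be run in reverse to see what workload they imply. The inputs below come straight from the paragraph above; the implied pitch counts are inferences from those figures, not numbers reported by MLB or Baseball Savant.

```python
# Figures quoted in the article
BEST_RATE = 0.886              # Lance Barksdale
WORST_RATE = 0.842             # Brian O'Nora
EXTRA_MISSES_PER_SEASON = 193
EXTRA_MISSES_PER_GAME = 7      # "roughly seven"

accuracy_gap = BEST_RATE - WORST_RATE  # about 0.044 of all called pitches

# Workload implied by those figures (inferred, not reported)
implied_pitches_per_season = EXTRA_MISSES_PER_SEASON / accuracy_gap
implied_pitches_per_game = EXTRA_MISSES_PER_GAME / accuracy_gap
implied_plate_games = EXTRA_MISSES_PER_SEASON / EXTRA_MISSES_PER_GAME

print(f"called pitches per season: {implied_pitches_per_season:.0f}")
print(f"called pitches per plate game: {implied_pitches_per_game:.0f}")
print(f"plate games per season: {implied_plate_games:.1f}")
```

The result is in the neighborhood of 4,400 called pitches a season and about 160 per plate game, which squares with the "several thousand pitches" a full-time ump sees.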

Roughly one-third of umpires who call games from behind the plate during the regular season also do so during the postseason. The following table compares the regular-season accuracy of postseason umps to the regular-season accuracy of all umps. If the best ball/strike-callers are being picked for the postseason, the accuracy of the “postseason only” group should be above the league average.

Year	All Umps	Postseason Umps Only
2009	84.8 percent	84.8 percent
2010	85.6	85.3
2011	85.9	85.7
2012	86.2	86.0
2013	86.8	86.6
2014	86.7	86.5
Total	86.0	85.8

It’s not. Only in 2009 did postseason umps even match the league average. In each of the past five seasons, umpires selected for postseason duty have been less accurate than their counterparts who spent October at home. The differences are small, but these are enormous samples — more than 850,000 combined called pitches for the postseason umps alone.
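
To get a rough sense of why sample size matters here, one can compute the sampling error of a rate measured over 850,000 pitches. This is only a sketch under simplifying assumptions: it treats each call as an independent coin flip, treats the league average as a fixed benchmark, and relies on the rounded totals from the table above.

```python
import math


def standard_error(rate: float, n: int) -> float:
    # Standard error of a proportion under a simple binomial model.
    return math.sqrt(rate * (1.0 - rate) / n)


postseason_umps_rate = 0.858  # "Total" row, postseason umps only
league_rate = 0.860           # "Total" row, all umps
called_pitches = 850_000      # lower bound quoted in the article

se = standard_error(postseason_umps_rate, called_pitches)
gap = postseason_umps_rate - league_rate
z_score = gap / se

print(f"standard error: {se * 100:.3f} percentage points")
print(f"gap: {gap * 100:.1f} points, or {z_score:.1f} standard errors")
```

Under those assumptions the standard error is only a few hundredths of a percentage point, so a 0.2-point gap is several times larger than chance alone would typically produce. Calls within a game aren't truly independent, so the real uncertainty is somewhat wider.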

As another check, we can see whether the correct-call rate rises in the postseason relative to the regular season.

Year	Regular Season	Postseason
2009	84.8 percent	83.6 percent
2010	85.6	84.9
2011	85.9	86.6
2012	86.2	86.3
2013	86.8	86.8
2014	86.7	86.0
Total	86.0	85.7

Again, no. In four of the past six seasons, postseason strike zones haven’t been any more accurate than regular-season strike zones.

So what’s going on here? The words Teevan used to describe the league’s goals — to have the “most deserving” umpires, not necessarily the best or most accurate — might be telling. Here’s how this year’s ALCS and NLCS umpiring crews ranked in regular-season correct-call rate (out of 79 qualified umps), along with their accumulated years of service in the majors. (Asterisks denote crew chiefs.)

ALCS
Name	Accuracy Rank	Years of Service
Joe West*	65	37
Brian Gorman	77	23
Marvin Hudson	55	15.5
Dan Iassogna	63	13
Ron Kulpa	68	16
Tim Timmons	67	15.5
Mark Wegner	13	16
NLCS
Name	Accuracy Rank	Years of Service
Gerry Davis*	10	31
Mark Carlson	37	15.5
Phil Cuzzi	16	16
Paul Emmel	51	15.5
Greg Gibson	2	15
Bill Miller	71	16
Bill Welke	75	31

On the American League side, only one member of the seven-man umpiring crew⁴ ranked above the median in accuracy rate. The National League crew fares somewhat better but still contains two of the lowest-ranked umps. Every umpire on the list, however, has logged some serious time in a chest protector. With 37 years of service, Joe West is the active leader among all umps, and even Dan Iassogna, the least-experienced ump in either of this year’s championship series, has put in 13 years. In this group, the top-ranked Barksdale, who has 11 years of major league service, would be the new kid on the black. And Hal Gibson, a 33-year-old ump who debuted in July 2013 but finished third in this year’s accuracy ratings, probably wouldn’t understand any of his colleagues’ pop-culture references.
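
The above-the-median count is easy to verify from the table. The ranks below are copied from it; the only thing added is the midpoint of a 79-man ranking, which is rank 40.

```python
QUALIFIED_UMPS = 79
MEDIAN_RANK = (QUALIFIED_UMPS + 1) / 2  # rank 40 is the middle of 79

ALCS_RANKS = {
    "Joe West": 65, "Brian Gorman": 77, "Marvin Hudson": 55,
    "Dan Iassogna": 63, "Ron Kulpa": 68, "Tim Timmons": 67,
    "Mark Wegner": 13,
}
NLCS_RANKS = {
    "Gerry Davis": 10, "Mark Carlson": 37, "Phil Cuzzi": 16,
    "Paul Emmel": 51, "Greg Gibson": 2, "Bill Miller": 71,
    "Bill Welke": 75,
}


def above_median(crew: dict) -> list:
    # A lower rank number means a more accurate umpire.
    return [name for name, rank in crew.items() if rank < MEDIAN_RANK]


print(above_median(ALCS_RANKS))       # ['Mark Wegner']
print(len(above_median(NLCS_RANKS)))  # 4
```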

Although West has been a competent pitch-caller in the past, he wasn’t close to being one of this season’s most accurate umps, and it seems like a stretch to say he’s one of the most proficient at “handling situations.” West earned a one-game suspension in September for grabbing Jonathan Papelbon’s jersey during a dispute; ironically, MLB executive VP for baseball operations Joe Torre, who disciplined West, was once on the receiving end of a shove from him in an incident that resulted in a three-game suspension for West. “Cowboy Joe” had a reputation for being a bit of a brawler more than two decades ago, and he regularly ranks among players’ least-favorite officials. However, he’s the president of the powerful World Umpires Association, and no umpire has been on the job longer, which Teevan called a “consideration.” To be fair, West’s calls were reviewed via replay 53 times and overturned only 22 times, according to data from MLB. That’s a 41.5 percent overturn rate, lower than the league-average 47.3 percent. On the whole, postseason umps had a 43.9 percent overturn rate (although that doesn’t tell us whether their calls were challenged more or less often than average).

In theory, giving preference to experienced umps sounds like a strategy that would improve the quality of the calls. In practice, though, umpiring experience might not matter much more than postseason experience. This season saw the biggest crop of rookie umpires in the past several years, as MLB used 11 first-time major league umps to accommodate the need for replay officials. Despite some anecdotal evidence that rookie/fill-in umps are prone to making mistakes, more comprehensive data shows that rookie umps actually have above-average correct-call rates — and that veteran umps (defined here as those who debuted before 2001) are below average. That would suggest that the replacement level for pitch-calling is high, and that accuracy in judging balls and strikes doesn’t tend to improve over time.

Year	All Umps	Rookie Umps	Veteran Umps
2009	84.8	85.5	84.6
2010	85.6	85.3	85.4
2011	85.9	85.5	85.7
2012	86.2	86.8	86.1
2013	86.8	87.2	86.6
2014	86.7	87.6	86.9
Total	86.0	86.6	85.8

I chose pre-2001 as the cutoff for “veteran” umps because 2001 was the first season for QuesTec’s Umpire Information System, a precursor to PITCHf/x in umpire evaluation. The increased role of technology in internal umpire reviews has made strike zones more standardized and brought them closer in line with rulebook zones, as indicated by the rising percentages in the tables above. As a result, the shape of the zone has changed to make low strikes more frequent without any reduction in the rate of high strikes. Might it be that rookie umps have been more adaptable (or selected for their adherence to the rulebook zone),⁵ and that veteran umps show up as less accurate because they’re still calling strikes the way they were before QuesTec and PITCHf/x?

% Strikes on Low Pitches				% Strikes on High Pitches
Year	All Umps	Rookies	Veterans	Year	All Umps	Rookies	Veterans
2009	72.0	73.1	71.5	2009	86.6	85.6	85.0
2010	81.4	78.7	81.4	2010	87.1	82.6	83.7
2011	78.1	73.2	77.8	2011	86.5	89.9	88.1
2012	81.6	76.8	81.3	2012	88.2	84.3	86.4
2013	84.1	85.4	84.0	2013	83.5	87.5	87.1
2014	86.0	85.2	85.9	2014	85.1	84.7	86.1

No: Both rookies and vets have evolved to call low strikes (those in the lower third of the zone) at roughly the same rate. Young umpires haven’t driven the downward expansion of the strike zone, and older umps aren’t less accurate because they’re clinging to an outdated understanding of the zone.

We should note that there are more sophisticated ways of studying umpire accuracy than simply counting out-of-zone strikes and in-zone balls as missed calls. For instance, one could use a more probabilistic model and assign fractional misses to each mistake based on how often pitches in a given location are called strikes. MLB’s Zone Evaluation system, which Teevan referenced and which I wrote about last year in an article about automating the strike zone, might make more adjustments to the data that could further refine the numbers. MLB also reviews every non-ball/strike call through its SURE system, whether it was challenged or not, which provides a more complete picture of an umpire’s overall accuracy — although accuracy on non-reviewable calls that can’t be undone, such as the ball/strike judgments that made Kemp and Cabrera mad, might be a more important criterion.
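
One way to read the "fractional misses" idea is sketched below. It is a single possible interpretation, not MLB's Zone Evaluation method: the function name, the `p_strike` input (the share of called pitches at that location that umpires as a group call strikes), and the weighting rule are all assumptions made for illustration.

```python
def fractional_miss(in_zone: bool, called_strike: bool, p_strike: float) -> float:
    """Weight a missed call by how unusual it is at that location.

    p_strike is the hypothetical league-wide share of called pitches at
    this location that are ruled strikes.
    """
    if not 0.0 <= p_strike <= 1.0:
        raise ValueError("p_strike must be between 0 and 1")
    if called_strike == in_zone:
        return 0.0  # correct call, no miss at all
    # Probability that umpires in general make the same (wrong) call here
    p_same_call = p_strike if called_strike else 1.0 - p_strike
    # A mistake most umpires also make counts for less than a rare one
    return 1.0 - p_same_call


# Made-up locations (not real data)
borderline = fractional_miss(in_zone=False, called_strike=True, p_strike=0.45)
way_outside = fractional_miss(in_zone=False, called_strike=True, p_strike=0.02)
print(borderline, way_outside)
```

Under this weighting, a strike call on a pitch just off the edge costs an umpire about half a miss, while the same call on a pitch well outside costs nearly a full one.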

The variation in pitch-calling skill among big league umpires is slim enough that choosing one umpire over another might not make any difference in a given game, just as a manager’s decision to signal for a slightly less effective reliever won’t usually cost his team a win. However, it takes only one crucial blown call to inflame a fan base. By allowing longevity, missed time, and politics to play a role in postseason umpire selection, the sport might be making the same mistake Don Mattingly and Matt Williams made in the NLDS: leaving the best available options on the bench at the most important point in the season. Major League Baseball sought to make postseason assignments more flexible in the last round of collective bargaining with the union, but it could behoove MLB to further redefine “deserving” when the league and its umpires negotiate a new contract at the end of this year.