Rise of the Machines?

In an interview with Baseball Prospectus almost 10 years ago, a project manager for QuesTec, the purveyors of the “Umpire Information System” that Major League Baseball used to evaluate umps from 2002 to 2008, was asked whether he could envision the system simply making the calls instead of serving as a supplement. His answer was emphatic: “Never.” “Not only do we not want to,” he explained, “but since a lot of our data processing is postgame, a real-time application that would satisfy the required pace of the game would be virtually impossible.”

That was then. A decade later, the QuesTec system is dead (though its website lives on, like an ugly Internet time capsule from 2004). In its place, MLB and its clubs rely upon more sophisticated and streamlined motion-tracking technologies, most prominently Sportvision’s PITCHf/x. Sportvision captures information on pitch speed and trajectory via two cameras installed in every big league park, then provides that information to both broadcasters and Major League Baseball Advanced Media. In turn, MLBAM distributes the data in XML format for free, which has created a cottage industry of online PITCHf/x analysts, many of whom have been hired by teams to continue their work out of the public eye.

PITCHf/x is what powers those strike zone overlays you see on your screens during baseball broadcasts — the ones that sometimes suggest that the plate umpire just cost your team a strike. Unlike QuesTec, PITCHf/x requires little turnaround time; according to Marv White, chief technologist for innovation at ESPN and Sportvision’s former CTO, “the data is real time.” Not only is it processed quickly, it’s precise: Alan Nathan, a professor emeritus of physics at the University of Illinois and an expert on the physics of baseball, has demonstrated that the system is accurate to within an inch, or about a third of a baseball diameter.

So, is today’s pitch-tracking technology up to taking over for umpires? “I think devices such as PITCHf/x … are already accurate enough to get the correct strike zone at the 90 percent level,” Nathan says. Many inside the industry agree. “Technologically, I think it’s certainly achievable,” says Mike Port, former Angels and Red Sox general manager and MLB’s vice-president of umpiring from 2005 to 2011. “It could be done,” allows Jim McKean, former MLB ump and umpire supervisor.

But those endorsements come with some caveats. For one thing, PITCHf/x still has hiccups. Occasionally, an error will cause the system to outright miss a pitch, which would be a crippling problem if PITCHf/x were entrusted with sole responsibility for calling balls and strikes. Granted, this is very rare — according to Cory Schwartz, vice-president of stats at MLBAM, PITCHf/x tracked all but 892 of the 709,917 pitches thrown this season (a 99.87 percent success rate). Many of those omissions are concentrated in a small sample of games affected by hardware errors, while the others (usually no more than one per game) are attributable to glare or other adverse lighting. As infrequent as they are, though, the potential for missing pitches would require some fail-safe for a computerized system. “Do human umpires miss pitches? Sure,” says Dan Brooks, founder of PITCHf/x repository BrooksBaseball.net. “But for the most part, they don’t just forget pitches happened. It’s not like everybody is waiting for the call and they’re just like, ‘I didn’t see that one, guys.’”
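The success-rate figure is easy to check. Here is a minimal Python sketch that uses only the two counts Schwartz cites; the function name is ours, for illustration:

```python
def tracking_rate(thrown: int, missed: int) -> float:
    """Percentage of thrown pitches that the tracking system recorded."""
    return 100 * (thrown - missed) / thrown


# Season totals as reported by MLBAM's Cory Schwartz.
THROWN = 709_917
MISSED = 892

print(f"{tracking_rate(THROWN, MISSED):.2f} percent tracked")
print(f"about one untracked pitch in every {round(THROWN / MISSED)}")
```

That works out to roughly one miss per 800 pitches. Rare as that is, a system acting as sole arbiter would still need a rule for the pitch it never saw.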

Calibration problems are another obstacle. From time to time, PITCHf/x systems in certain ballparks can get out of whack, reporting plate locations that can be several inches off. Schwartz says that MLBAM receives “diagnostics … on every system after every game,” and when calibration quirks are detected, the company can make minor software tweaks to tilt or pan an uncooperative camera by a fraction of a degree. If that doesn’t fix what’s ailing the system, technicians can perform a full “field registration,” a two-hour process that involves placing reference markers on the field. Full field regs are done at each ballpark before Opening Day and throughout the season as needed. “With any data-capture system you learn more and more about the strengths and weaknesses and the stress points, and you learn how to strengthen, tighten those up,” Schwartz says. “I would never say it’s 100 percent accurate, I’d never say it’s 100 percent complete. I don’t think anybody would be wise to make a claim like that about pretty much any kind of closed system. But I think that the effort that goes into completeness and detail and accuracy is pretty considerable.”

However, there’s also a fact that the fans screaming from the rooftops for robot umps would prefer to forget: Even an automated strike zone would have to have a human element, because the cameras and computers aren’t fully self-sufficient. The difficulty of computerizing calls varies greatly depending on pitch location, so as McKean says, “There’s going to be somebody running that machine that’s going to make judgments.”

The east-west dimension is relatively easy, since the zone’s horizontal boundaries are set by the 17-inch width of home plate and don’t shift from at-bat to at-bat. “You could do that right now, with the existing technology,” says Harry Pavlidis, director of technology at Baseball Prospectus and founder of PITCH Info LLC, which provides consulting services to teams. “On 98 percent of the pitches, that would work.”

But determining the top and bottom boundaries of the strike zone is much more difficult, both because batter heights and stances vary (from batter to batter and, to a lesser extent, for the same batter within a season) and because the rulebook description of where the zone extends — from “the bottom of the knees” to “a horizontal line at the midpoint between the top of the shoulders and the top of the uniform pants” — is open to interpretation. Even if a computer could identify that midpoint, how could it pick the moment at which “the batter is prepared to swing at a pitched ball,” the rulebook’s equally vague prescription for when the dimensions are supposed to be set?
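To make the asymmetry concrete, here is a minimal Python sketch of a two-dimensional strike check. The field names px, pz, sz_top and sz_bot are modeled on attributes in MLBAM’s public PITCHf/x data, but treat them and the units (feet, with px measured from the middle of the plate) as assumptions; the 2.9-inch ball diameter is approximate, and the check ignores the plate’s depth. Only the width rests on a fixed fact, the 17-inch plate; the top and bottom have to be supplied from outside.

```python
from dataclasses import dataclass

PLATE_HALF_WIDTH_FT = 17 / 2 / 12  # home plate is 17 inches wide
BALL_RADIUS_FT = 2.9 / 2 / 12      # assumed ~2.9-inch ball diameter


@dataclass
class Pitch:
    px: float      # horizontal location, feet from the middle of the plate (assumed)
    pz: float      # height above the ground, feet (assumed)
    sz_top: float  # top of this batter's zone, feet, set by a human operator
    sz_bot: float  # bottom of this batter's zone, feet, set by a human operator


def is_strike(p: Pitch) -> bool:
    """A strike if any part of the ball overlaps the zone's rectangle."""
    # East-west: fixed by the plate, identical for every batter.
    in_width = abs(p.px) <= PLATE_HALF_WIDTH_FT + BALL_RADIUS_FT
    # North-south: depends entirely on the per-batter boundaries.
    in_height = p.sz_bot - BALL_RADIUS_FT <= p.pz <= p.sz_top + BALL_RADIUS_FT
    return in_width and in_height


print(is_strike(Pitch(px=0.0, pz=2.5, sz_top=3.4, sz_bot=1.6)))
print(is_strike(Pitch(px=1.0, pz=2.5, sz_top=3.4, sz_bot=1.6)))
```

Everything hard about the problem hides in the last two fields: feed in a wrong sz_top and the function returns a confidently wrong call.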

It’s hard enough for a human to make that kind of call, and as of now, no completely automated method can reliably accomplish the task. That’s where the PITCHf/x operators, editors, and auditors come in, Schwartz explains. “We have auditors here at our group,” he says. “They’re trained by the umpiring department and the commissioner’s office, and they tweak the tops and bottoms of the strike zones for every called pitch … then that data is saved into our database, and then when we launch the PITCHf/x system for the start of each game, the strike zones that are stored in the database are downloaded to the operator who’s running the PITCHf/x system.”

During the game, according to Schwartz, “[the operators] may tweak a pixel here or there, and then that data in turn is saved back into the database and edited and audited.” Unless a player is making his major league debut — in which case he’d start out with only estimated top and bottom boundaries based on his height — there generally aren’t big adjustments made from game to game. “We’ve gotten to the point where we really change them very, very little, because even though batters will change their stance somewhat, the process we have for maintaining them is very sound,” Schwartz says. “It’s sort of like a spiral, where we gradually spiral in on almost an exact strike zone for every single pitch.”

As always, a human element makes human error inevitable. For instance, here’s how the MLB Gameday application drew Jon Jay’s strike zone in one August at-bat:

[Image: Jon Jay’s strike zone as drawn by MLB Gameday]

Due to an operator error, the top of Jay’s strike zone was set in a region where even Vladimir Guerrero wouldn’t be tempted to swing. If the computer had been calling strikes in this situation, A.J. Burnett could have thrown Jay a pitch roughly the height of the Gateway Arch and gotten away with it. Schwartz says this mistake was made only twice this season, and it’s easily identified and corrected after the fact. However, post hoc corrections aren’t always an acceptable solution when a game is going on. You could, of course, restrict the maximum height at which the system could set the zone to prevent this particular problem from occurring again. But inevitably, some other error or edge case would pop up at an inopportune time. PITCHf/x works very well — maybe better than humans, most of the time. But when something goes wrong with the system, the resulting problems can be perplexing and potentially highly disruptive. As Brooks puts it, to rely on PITCHf/x alone instead of umpires, one would have to be “willing to accept a much smaller amount of inexplicable error in exchange for a larger amount of explicable error.”¹
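A height cap like the one described above is simple to express, which is also its limitation: it catches only the failures someone thought to anticipate. A minimal Python sketch follows; the limits are hypothetical placeholders, not MLBAM values.

```python
# Hypothetical sanity limits in feet; real values would need league input.
MAX_TOP_FT = 4.5     # no batter's zone should top out above this
MIN_BOTTOM_FT = 0.8  # no batter's zone should bottom out below this
MIN_SPAN_FT = 1.0    # the zone should be at least this tall


def zone_problems(sz_top: float, sz_bot: float) -> list[str]:
    """Return reasons an operator-set zone looks implausible (empty if none)."""
    problems = []
    if sz_top > MAX_TOP_FT:
        problems.append(f"top {sz_top:.2f} ft exceeds cap of {MAX_TOP_FT} ft")
    if sz_bot < MIN_BOTTOM_FT:
        problems.append(f"bottom {sz_bot:.2f} ft below floor of {MIN_BOTTOM_FT} ft")
    if sz_top - sz_bot < MIN_SPAN_FT:
        problems.append(f"zone only {sz_top - sz_bot:.2f} ft tall")
    return problems


print(zone_problems(3.4, 1.6))
print(zone_problems(6.0, 1.6))
```

A top set several inches too high but still under the cap would pass unflagged, which is exactly the kind of edge case that surfaces at an inopportune time.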

The goal, of course, is no error, or as close to that ideal as we can possibly come. And so the best solution might be a hybrid approach that combines tradition with technology. Not robot umps, but regular umps with input from robot brains.

Umpires are already more closely acquainted with pitch-tracking technology than is often assumed. “We currently use technology for both evaluation purposes and training purposes,” says MLB spokesman Mike Teevan. “All umpires receive a computerized breakdown of their plate performances, including calls they got right and calls they missed.”

The Zone Evaluation system, as the league’s PITCHf/x-based balls-and-strikes review heuristic has been dubbed, is one component of a comprehensive umpire assessment program. The league claims an average ZE umpire accuracy rate of 95 percent or higher, although certain pitches on which the umpire — but not PITCHf/x — is blocked by the catcher are excluded from the count. Even so, with an average of 156 called pitches per game (between both teams), a 95 percent success rate suggests that roughly eight incorrect calls per game still slip through, too many not to notice. The explanation, contrary to what you may have heard from hecklers, isn’t that umpires are incompetent. It’s that their job is impossible for human beings to do perfectly.
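The eight-call estimate is back-of-the-envelope arithmetic: expected misses equal called pitches times the miss rate. A short Python version using the article’s per-game average; the 99 percent case is included only as an illustrative comparison:

```python
def expected_misses(called_pitches: float, accuracy: float) -> float:
    """Expected number of incorrect calls, given an accuracy rate between 0 and 1."""
    return called_pitches * (1 - accuracy)


CALLED_PER_GAME = 156  # average called pitches per game, both teams

for accuracy in (0.95, 0.99):
    misses = expected_misses(CALLED_PER_GAME, accuracy)
    print(f"{accuracy:.0%} accurate -> {misses:.1f} missed calls per game")
```

Even an implausibly sharp 99 percent umpire would still miss a call or two every night.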

There’s evidence that MLB’s monitoring, combined with merit-based incentives like All-Star Game and postseason assignments, has had the intended effect. A 2007 study by The Hardball Times found that there had been a 25 percent reduction in the spread among umpire strike zone sizes since the QuesTec system was installed. And while the Zone Evaluation system’s output is proprietary, data from Brooks Baseball suggests that the percentage of calls umpires and PITCHf/x have in common has increased by a few percentage points since 2008 (though the publicly available info provided by Pavlidis pegs the current rate closer to 90 percent than 95).
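“Calls in common” is simply an agreement rate. The Python sketch below assumes each call is recorded as "S" or "B" and that a pitch the system failed to track is stored as None and left out of the denominator. That is one plausible convention, not necessarily the one Brooks Baseball or MLB uses.

```python
from typing import Optional, Sequence


def agreement_rate(umpire: Sequence[str], system: Sequence[Optional[str]]) -> float:
    """Share of tracked called pitches on which umpire and system agree."""
    if len(umpire) != len(system):
        raise ValueError("call lists must be the same length")
    # Drop pitches the system never tracked (assumed to be stored as None).
    pairs = [(u, s) for u, s in zip(umpire, system) if s is not None]
    if not pairs:
        raise ValueError("no tracked pitches to compare")
    return sum(u == s for u, s in pairs) / len(pairs)


ump = ["S", "B", "S", "B", "S"]
system_calls = ["S", "B", "B", "B", None]  # last pitch untracked
print(f"{agreement_rate(ump, system_calls):.0%}")  # 3 of 4 tracked pitches agree
```

Small choices like how to treat untracked or blocked pitches can move the headline figure, which is one reason public and league numbers can differ.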

If the knowledge that they’re being graded, coupled with postgame PITCHf/x feedback, can have an appreciable effect on the way umpires call pitches, imagine what real-time, in-game feedback could accomplish. Along with the expanded replay review system in the works for next season, it’s the logical next step in leveraging technology to make officiating more accurate: Instead of waiting until after the game to tell an umpire how he’s stacking up to the system, give him access to the same real-time information that fans watching at home and in the stands can see.

Pavlidis proposes putting a buzzer in the umpire’s pocket. Brooks and Port prefer a visual aid. As Port points out, an understandable instinct for self-preservation leads umpires to position themselves in the “slot” between batter and catcher, which gives them an off-center perspective on the plate and impairs their ability to see the outside corner. Port likes the idea of a PITCHf/x-informed “heads-up display … that would give the umpire a better overall look at the strike zone,” compensating for the need to protect his non-robotic body. “If he centers it properly as the pitch comes in,” Port continues, “in the corner of the mask there is a red or green light” that would signal strike or ball. An LED indicator would be a simple addition to the umpire’s mask.

The umpire wouldn’t be obligated to parrot the PITCHf/x ruling, though if he went against it too often, his Zone Evaluation results would suffer. Instead, he could use it as a tool to improve his own sense of the strike zone and as a valuable piece of corroborating or contradictory evidence when he’s worried that he didn’t get a great look at a pitch.
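The advisory arrangement reduces to a few lines of logic: the system lights a signal, the umpire’s call always stands, and disagreements are logged for later evaluation. Here is a hypothetical Python sketch. Port doesn’t say which color means strike, so the green-for-strike mapping is an assumption, as are all the names.

```python
from typing import Optional

# Assumed mapping; the source says only "a red or green light".
LIGHT_FOR_CALL = {"S": "green", "B": "red"}


def mask_light(system_call: Optional[str]) -> Optional[str]:
    """Color for the mask LED, or None (light stays dark) if the pitch wasn't tracked."""
    if system_call is None:
        return None
    return LIGHT_FOR_CALL[system_call]


def final_call(umpire_call: str, system_call: Optional[str], log: list) -> str:
    """The umpire's call always stands; disagreements are recorded for evaluation."""
    if system_call is not None and system_call != umpire_call:
        log.append((umpire_call, system_call))
    return umpire_call


disagreements: list = []
print(mask_light("S"))
print(final_call("B", "S", disagreements))  # the umpire overrules the light
print(disagreements)
```

Returning the umpire’s call regardless of the light is the design choice that keeps a human in charge when the system misses a pitch or drifts out of calibration.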

This arrangement might seem unnecessarily sentimental — after all, if PITCHf/x is the standard by which umpires are judged, and it works just as quickly, why not just drop the middlemen? Actually, there are a few reasons. For one, putting PITCHf/x on strike-zone duty wouldn’t render plate umpires redundant. Although calling balls and strikes is the plate ump’s most important task, he also determines whether a batter is hit by a pitch or by a batted ball in fair territory, as well as whether a ball is foul tipped, a hitter commits to a swing, or a pitcher balks; ensures that the plate and ball are kept in clean, playable condition and (theoretically) that the pace of the game is maintained; and rules on tag and force plays at home, among other jobs. That’s a long list of responsibilities for a robot.

And while you may care more about accuracy than about what the umpires want, it pays to be pragmatic. Major league umpires — and their union, the World Umpires Association — are far more likely to accept an expanded “advisory” role for PITCHf/x than they are to sign off on giving up pitch-calling completely. If the goal is improving accuracy as quickly as possible, there’s something to be said for taking a path of less resistance that leads to a significant incremental improvement.

The most persuasive factor in favor of the PITCHf/x-aided umpire plan is the benefit of having a human on hand in case of occasional calibration errors and problems with missed pitches. “I think having a buffer between the technology and the judgment is very useful,” says Brooks. Nathan concurs, adding, “there would have to be some kind of ‘manual override.’” And for human umps to maintain the focus necessary to make an accurate ruling when PITCHf/x has issues, they would have to have the final say under normal circumstances. As Pavlidis puts it, “Once you take them away from being the primary, they’re no longer capable of making that call.”

Ninety-five percent accurate or not, major league umpires clearly aren’t quite good enough to keep fans from complaining. But it’s possible that even a perfectly accurate system wouldn’t satisfy everyone. Replace the umpires with an automated system and we’d still see some people who believe that the robot isn’t properly programmed (or has become self-aware and developed an affinity for the other team). In deciding whether to automate the zone, MLB has to ask itself a question Port asked me: “Do you want to go into a technological arrangement just to cover that remaining four-plus percent [of called pitches] when you’re still going to have people questioning technology?” A discussion that starts out being about whether we can do something quickly morphs into one about whether we should. Automating the strike zone is as much a philosophical matter as it is a technological one.

Keep in mind that automation could have unintended consequences, the kind we aren’t inclined to consider when, in the angry aftermath of an incorrect call, we let loose a reflexive #robotumps. Tinkering with the strike zone is serious business; as Bill James wrote in his 1988 Abstract, “The strike zone is at the very heart of a baseball game. An inch in the strike zone means far more than 10 yards in the outfield.” Previous adjustments to the zone have produced significant changes in baseball’s offensive environment, and it’s possible that even a stricter observance of the current zone could do the same. “Going to a fully standardized, technology-based system would be way more jarring than people realize,” Pavlidis says.

Over time, players have internalized some of the idiosyncrasies of the strike zone as it’s currently called. The zone called against left-handed hitters is shifted a couple inches away relative to righties. The size of the zone fluctuates depending on the count — expanding dramatically on 3-0 and shrinking severely on 0-2 — and according to the base-out state, velocity of the pitch, and many other factors. Yes, these are all arguments in support of standardizing the strike zone, assuming you like to see pitches called according to code. They’re also reasons to exercise caution. “Because it’s always worked this way” isn’t a good reason not to do something different, but it is a reason to think through the possible ramifications before making a major change that could upset the delicate batter-pitcher balance. Players will adjust to whatever the zone looks like, but it’s in baseball’s best interests to make those adjustments smooth.

McKean cautions that instituting an automatic zone “would ruin the game,” which makes him the latest in a long line of thus-far-incorrect critics who’ve warned that something would be the end of baseball. “If you told the pitchers to try and throw that ball with an automatic strike zone, which means it has to hit some part of that plate or be in some part of that strike zone, heck, the games would go on for five, six hours,” he says. My guess is that he has the direction of the effect right, but the magnitude wrong. Automating the strike zone would probably make it slightly smaller, on the whole, and more predictable for the hitter. That could increase scoring and perhaps lead to longer games, but not to such an extent that the sport would be broken.

However, standardizing the zone would remove a level of interplay between batter, pitcher, catcher, and umpire that many fans find compelling. No longer could a savvy pitcher with pinpoint command, like Tom Glavine or Mariano Rivera, annex extra territory off the corners or learn how to tailor his approach to each umpire’s personalized zone. And catcher receiving skills — the impact of which has only recently been recognized — would become obsolete overnight. With less emphasis on soft hands and the ability to frame pitches, catcher could become a more offensively oriented position. “People are going to have to understand that if you have a technological system, it is evaluating where the ball crosses the plate in relation to the strike zone, not where the catcher catches it,” Port says.

While these changes might make the batter-pitcher confrontation fairer, they would also sap it of some of its nuance, leaving less to analyze and discuss. And if the major league strike zone is automated, the same system would have to be put in place throughout the minors, at least, so that prospects could become accustomed to the same conditions. Only some minor league parks are equipped with PITCHf/x, and installing the system in others would represent an additional expense.

McKean offers another argument in support of keeping umpires around: Removing them, or reducing their role, might make baseball more boring. The former umpire makes the case that the controversy generated by incorrect calls — or at least the perception of incorrect calls — generates excitement. “I’ve been out there and had arguments with managers on the field,” he says. “When the manager walks off the field, he gets a standing ovation. So you want to take all that out of baseball?” It’s true that bad calls might not be bad for baseball’s bottom line; if a fan cares enough to complain about a call, he or she is probably already hooked. But the argument for fan excitement seems like a slippery slope — you wouldn’t want to make calls less accurate in order to create more controversy.

The “human element” argument in favor of flesh-and-blood umpires is often dismissed as an admission of fear: fear of change, fear of the unknown, and fear that one day they’ll build a robot that can replace us, too. But it’s not just a sentiment that only Luddites could love. It contains a kernel of truth: Robot umps really would have human consequences.

For now, the automated–strike zone debate is academic, with no indication that robot umps are in MLB’s plans. “Balls and strikes will not be subject to the instant-replay expansion we are currently studying,” Teevan says. “We believe our umpires do an excellent job calling balls and strikes.”

And while many inside the game are eager to see the use of replay expanded, Twitter’s enthusiasm for robot umps isn’t as widespread within the industry. “That usually comes, I would say, from well-intended fan interest,” Port says. “I cannot tell you that I have ever heard of anyone within baseball saying, ‘Let’s go to an automated strike zone’ … not even managers or coaches.”²

But as motion-tracking technology becomes even more trustworthy and widely embraced — not only in baseball, but in tennis, cricket, and soccer, among other sports — support for the idea will spread. It has to, because the history of baseball is a story of specialization. Today’s pitchers average fewer innings than ever. American League pitchers don’t hit, and designated hitters don’t field. The last player-manager (Pete Rose) retired in 1986, and non-player managers have seen their coaching staffs swell to the point that a single hitting coach no longer suffices for some teams. Similar waves of specialization have swept through the front office and even the broadcast booth. Some of these changes don’t sit well with old-school fans, but baseball has become a bigger business, and as the stakes have risen, so has the level of competition. Everyone has to be better at their job.

Umpires are no exception. In the beginning, there was one umpire on the field. Now there are four, each with separate responsibilities — and in the playoffs, when the calls really have to be right, the crews expand to six. We’ve reached the point at which a greater human element isn’t the answer: It’s time for technology to lend another helping hand. Sell real-time PITCHf/x feedback not as an indictment of umpires, but as recognition of the fact that they do a very difficult job. Ask umpires for input. Try out the system in spring training. And watch the accuracy rate rise. “Ten to 15 years ago, the technology of the month was that big overhead camera,” major league umpire Fieldin Culbreth said in As They See ‘Em: A Fan’s Travels in the Land of Umpires. “Then there was slow-motion replay, then QuesTec and the strike zone they put up on the TV screen. But you know what? Every step along the way, technology has ended up befriending the umpires.”

Umpires are already better than you think. But there isn’t one of them who couldn’t benefit from a robot friend.