Picture a baseball player who’s had a bad week. He has a career .375 on-base percentage, but over the past five games he’s gone 2-for-20 with no walks and seven strikeouts (for a .100 on-base percentage). If I asked you to explain the problem, you’d most likely just shrug your shoulders and tell me that the guy was having a rough week and that things were bound to get better. After all, what’s 20 plate appearances in a 162-game season?
But what if I told you that the player is a left-handed batter? And then I added that he’s faced five left-handed pitchers over this stretch? This new information changes the equation: Historically, left-handed hitters do worse against left-handed pitchers. Expecting our imaginary slugger to maintain a .375 on-base percentage during this week would be overly optimistic. We shouldn’t expect him to perform as poorly as he has, either, but the schedule should’ve raised questions for the manager: Do I have another player who might be worse overall but better situationally? Knowing that a key player might struggle this week, is it a good time to give him a rest, even if he’s generally still the best option?
The field of public soccer analytics has a left-handed-pitcher problem. Baseball, with its enormous volume of discrete events, is fertile ground for analytics: A whopping 143 players in Major League Baseball had more than 500 plate appearances this season. Soccer offers far smaller samples. In the Premier League last season, only seven players took 100 shots, and only two had more than 50 shots on target. To get around that small-sample problem, most of the advances in soccer analytics have come from working with aggregates. Whether it’s some of the more basic concepts (like total shots ratio or comparative shots on target) or slightly more opaque metrics (like expected goals), the method is the same: Look at the totals across leagues, then draw conclusions.
Looking at lots of shots by lots of teams across lots of leagues provides a good idea of how things tend to work. The best teams consistently get more shots on target than their opponents. How many of those shots end up in the back of the net can vary wildly from game to game, but by the end of the season, the teams that win the shots-on-target battles tend to win the points battles as well. Expected goals gets us to the same place, with the added wrinkle of accounting for how likely various types of shots are to be scored. To put it simply: A player rounding the keeper has a much better chance of scoring than a player attempting a contested header from the exact same location. Expected goals helps show which teams can augment their shots-on-target edge by taking higher-percentage shots and conceding lower-percentage ones. Come season’s end, most teams will end up close to where their averages suggest they should be.
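The core idea behind expected goals can be sketched in a few lines: a team’s expected goals is the sum of the scoring probabilities of its shots. The probabilities below are made-up placeholders, not figures from Opta or any real model, and the `Shot` class and `expected_goals` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    description: str
    goal_probability: float  # assumed chance this type of shot is scored (placeholder)


def expected_goals(shots: list[Shot]) -> float:
    """Expected goals is the sum of each shot's scoring probability."""
    return sum(shot.goal_probability for shot in shots)


# Hypothetical teams: A takes few but high-quality shots, B takes many low-quality ones.
team_a = [
    Shot("rounds the keeper", 0.60),
    Shot("contested header, same spot", 0.10),
    Shot("shot from distance", 0.03),
]
team_b = [Shot("shot from distance", 0.03)] * 6

print(f"Team A: {len(team_a)} shots, {expected_goals(team_a):.2f} xG")
print(f"Team B: {len(team_b)} shots, {expected_goals(team_b):.2f} xG")
```

With these assumed numbers, Team A’s three shots are worth 0.73 expected goals, while Team B’s six long-range efforts add up to only 0.18: more shots, but fewer expected goals.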
The approach works, but it’s frustrating. Aggregates don’t explain why teams are diverging from where analysts expect them to be; they only suggest that those teams will likely come back to normal. That left-handed batter who went 2-for-20 was experiencing two separate kinds of bad luck. First, he was performing worse than he should have, even given that he was facing left-handed pitching, and that just happens sometimes. Second, he got unlucky because he faced five lefties in a row, and while that won’t keep happening, it’s a situation that can be addressed beforehand. Soccer’s aggregate models, however, don’t differentiate between those two kinds of luck. If a baseball manager couldn’t tell the difference, he’d soon lose his job.
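The two kinds of bad luck can be pulled apart with simple arithmetic. This sketch assumes a hypothetical .320 career on-base percentage against left-handed pitchers (no such split is given above), and it treats on-base percentage as times on base divided by plate appearances, a simplification that holds here only because the player had no walks and ignores hit-by-pitches and sacrifice flies. The function names are illustrative.

```python
# Hypothetical numbers for the imaginary batter; the lefty split is an assumption.
CAREER_OBP = 0.375
OBP_VS_LHP = 0.320  # assumed career on-base percentage against left-handed pitchers


def obp(times_on_base: int, plate_appearances: int) -> float:
    """Simplified on-base percentage: times on base / plate appearances."""
    return times_on_base / plate_appearances


def decompose_slump(actual: float, career: float, matchup_expected: float) -> dict:
    """Split the gap between career and actual performance into two parts.

    schedule:    how much the matchups alone should lower expectations
    performance: how far the player fell short of that adjusted expectation
    """
    return {
        "schedule": career - matchup_expected,
        "performance": matchup_expected - actual,
    }


actual = obp(2, 20)  # 2-for-20 with no walks
parts = decompose_slump(actual, CAREER_OBP, OBP_VS_LHP)

print(f"actual OBP: {actual:.3f}")
for name, gap in parts.items():
    print(f"{name}: {gap:.3f}")
```

Under that assumed split, about .055 of the drop from .375 comes from the schedule, which a manager could have planned around, and the remaining .220 is the player falling short of his own matchup expectation.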
Soccer analytics is having something of a moment. In February 2014, soccer-data company Opta¹ held its first OptaPro Analytics Forum, a soccer counterpart to the Sloan Sports Analytics Conference designed to bring together people doing analytics work with the clubs who might be interested in that work.² The panel that judged the papers submitted to the first forum included Chris Anderson, coauthor of The Numbers Game; Blake Wooster, cofounder and CEO of 21st Club; and Sam Green and Devin Pleuler, Opta’s advanced data analysts.
1. Opta’s data powers ESPN Stats & Info.
2. There’s also the added benefit of showing off to clients why it might be a good idea to purchase Opta data.
Among next year’s crop of judges, some of the names are the same, but the titles sure have changed: Dr. Ian Graham, head of research, Liverpool FC; Sam Green, data scientist, Aston Villa FC; Devin Pleuler, analytics manager, Toronto FC; Ted Knutson, head of player analytics, Smartodds (Brentford FC and FC Midtjylland); and Johannes Harkins, advanced analyst, Opta. Gone are the days when clubs at large ignored the work being done in the public sphere. Now, Arsene Wenger talks about expected goals in press conferences.
Of course, with increased influence comes increased derision. When Liverpool replaced Brendan Rodgers, the club’s supposed reliance on numbers took the blame. The same was true when Championship side Brentford, a team that has very publicly invested in an analytics-heavy approach, fired their manager just months after hiring him this summer. Mainstream derision toward analytics obviously isn’t anything new to sports. But as in baseball and basketball before it, the rise in anti-analytics sentiment drives home just how ingrained data has become in the soccer world. Once, the conversation centered on whether analytics would ever gain a foothold inside the game; now, we’ve got the familiar hand-wringing over the fact that it already has.
While the divide between analytics and “proper football men” is destined to fade into meaninglessness, the divide with more immediate and legitimate ramifications is between clubs and the public. There’s what’s being done publicly for the education and entertainment of those who consume the game, and then there’s what’s being done behind the curtain, as teams scrape for every edge they can. As Ted Knutson, perhaps the most prominent analyst to make the switch from public work to club employment, said, “On StatsBomb, I was interested in increasing the overall knowledge of the game. As someone who works inside a football club, it would be in our interest if no one else increased the knowledge of how football works any further.”
It’s not only about how the information is treated, either. It’s also about what that information is ultimately for. Dan Altman, through his statistical consulting company, North Yard Analytics, has done work for public consumption, in addition to working behind the curtain with a number of teams.
“I think the public is most interested in what’s happening now and what’s going to happen next,” Altman said. “By contrast, clubs also want to know a lot about how and why things happened in the past, so they can learn and do better in the future. It’s the difference between the performer and the audience. But clubs need forecasts of the future, too. A classic situation would be to ask, ‘Where are we heading?’ If we don’t like the answer, what can we do to change it?”
“What can we do to change it?”
That’s the question that public soccer analytics, with its aggregated approach, struggles to answer. We’ve gotten progressively better at figuring out what levels we can expect teams to return to, but no better at separating out how teams get there. From the outside, figuring out what causes fluctuations in a team’s performance ultimately becomes an academic exercise, the kind of thing an annoying college freshman who just took his or her first philosophy class won’t shut up about. Why does it matter if a team is going through a short spell of excellent form or a short spell of getting lucky, if either one will inevitably end … man?
For example, Atletico Madrid’s two-year run of set-piece dominance seems to have come to an end this season. Was Atletico Madrid getting lucky? Or were their opponents all hapless left-handed batters who couldn’t see that Diego Simeone was walking to the mound and signaling for a LOOGY before each corner kick? For the public, the difference may not matter. Over time, all teams score set-piece goals at roughly the same rate, so who cares why there’s a blip? From the inside, where time and training are finite resources, it’s an extremely important question. Does a manager need to devote extra time to preparing for Simeone’s set pieces? If the team’s success is just a quirk, then no, those precious minutes and hours can be put to use elsewhere. If, however, it’s a repeatable strength of Atletico Madrid, then it should go into opponent game planning. Eventually, enough opponents will focus on stopping specifically that, and Atletico’s edge will ebb. From the outside, the two explanations look identical: two different paths to the same result. But every Atletico opponent needs to decide which path the team is actually on.
The difficulty of spotting the metaphorical left-handed pitchers in soccer extends to all parts of the game. Imagine a team of left-handed hitters. They’d be perfect for a league with very few left-handed pitchers. Sure, maybe there’d be a weird, unlucky stretch when they happened to run into two or three lefties in a row, but those would pass, and most of the time they’d be uniquely situated to dominate the league. Now imagine that team had to play in the Champions League, where lefties came along a lot more often. It’s worth considering whether that is what’s been happening with Arsenal³ (and maybe Manchester City) in recent years. For the most part, they’ve dominated in England, but when the style of play changes against continental sides, they’ve struggled. You wouldn’t necessarily know it from looking at the data, though, where the results would be indistinguishable from bad luck.
3. Not including yesterday’s 2-0 win over Bayern Munich, of course.
The same problem applies to player acquisition. Say there’s a guy whose attacking stats make him look like a superstar: great shooting numbers, plenty of shots from dangerous positions, and lots of chances created. Yet a new team in a different league signs him and soon finds out that the things that made him great don’t translate to its tactics, its personnel, or the style of the league. Three years on, Tottenham may very well be asking themselves whether that is what happened with Erik Lamela. He’s young and he’s learning, and he may yet figure out Premier League pitching, but gone are the fat fastballs he was feasting on in Italy.
As it stands, public analytics has gotten very good at figuring out what a team’s batting average should be. It can confidently tell a team not to worry about a slump, because its predictions are accurate enough to forecast which downturns will eventually turn around. The next step, the one that’s being worked on in private, is figuring out how teams can improve their batting average: how they can spot those lefty pitchers on the mound and how they can avoid those temporary slumps before they happen. That won’t be easy, given the paucity of data compared to other sports. Then again, it was only a few years ago that people within the game wondered whether numbers mattered at all. And look how far we’ve come since then.