The NHL’s Analytics Awakening
This time last year, mentioning the word “analytics” in hockey circles was a good way to bring any conversation to a screeching halt. At best, you might get blank stares. At worst, you could expect a scowl, followed by a clichéd lecture about spreadsheets, protractors, and just watching the damn games.
Those days now seem like a very long time ago. In what became known as hockey’s summer of analytics, several of the community’s top minds were snapped up by NHL teams. Fan forums and Twitter lit up with those wanting to learn about these newfangled numbers. Talk radio was debating the merits of Corsi and Fenwick, and even old-school media began incorporating the newer stats into their work. Suddenly, analytics is everywhere.
All of this has left the field with new credibility, as even longtime critics have been forced to concede that there’s value in analytics. But it’s also left behind a void, with thought leaders like Tyler Dellow and Eric Tulsky now largely silenced by the terms of their new employment.
On Saturday, several of the field’s top remaining names gathered in Calgary for the second Alberta Analytics Conference. The event was organized by Rob Vollman and attended by roughly 70 fans, as well as media and at least one team executive.
As you’d expect, the day featured the occasional mention of how much the tide had turned, and maybe even a little bit of gloating. But for the most part, the focus was on the future. Hockey analytics has arrived and is here to stay. But compared to sports like baseball, the field is still in its infancy. Despite the progress of the last year, there’s a long road ahead, and lots of ground still left to be covered.
So where do we go from here? Here are five areas you can expect to hear more about over the coming months and years.
All the small things
The best known of the new wave of hockey stats is Corsi, which measures the number of shots that each team attempts. Corsi turns out to be hugely important: it’s one of the best indicators we have of future success, especially at the team level. Put simply, teams that can gain an edge in possession and create more shot attempts usually go on to beat the teams that can’t. As far as analytics go, this is settled science and has been for a long time.
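For readers who want to see the arithmetic, here is a minimal Python sketch of a Corsi count. It assumes the usual convention that a shot attempt is any goal, saved shot, missed shot, or blocked shot. The `ShotAttempt` record and the `corsi` function are hypothetical names used for illustration, and real inputs would come from play-by-play data rather than a hand-built list.

```python
from dataclasses import dataclass

# Event kinds counted as shot attempts (an assumed convention for this sketch).
ATTEMPT_KINDS = {"goal", "save", "miss", "block"}


@dataclass
class ShotAttempt:
    team: str  # team that took the attempt
    kind: str  # expected to be one of ATTEMPT_KINDS


def corsi(attempts, team):
    """Return (attempts for, attempts against, Corsi-for percentage) for `team`."""
    # Ignore any event that is not a shot attempt.
    counted = [a for a in attempts if a.kind in ATTEMPT_KINDS]
    cf = sum(1 for a in counted if a.team == team)
    ca = len(counted) - cf
    # With no attempts at all, a percentage is undefined.
    pct = 100.0 * cf / (cf + ca) if counted else None
    return cf, ca, pct
```

A team above 50 percent took more of the shot attempts than its opponent did, which is the edge the paragraph above describes.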
Great. So now what?
After all, any coach who looks at the numbers is going to want to know: If Corsi is so important, how do I improve my team’s number? “Be a better possession team” isn’t a useful answer. “Create more Corsi events” is even worse. All of this stuff may be useful for predicting future success and failure at a macro level, but as a practical matter it doesn’t help coaching staffs unless it can translate into specific changes to a team’s strategic approach.
We covered one example last season: zone entries. At last year’s Sloan Conference, Tulsky and others presented a paper that looked at the value of entering the offensive zone with possession, as opposed to dumping the puck in deep and then trying to retrieve it. They found that entry with possession was roughly twice as valuable in terms of generating shots and scoring chances. That went against conventional North American hockey wisdom, which has long leaned toward the safer dump-and-chase approach, but the numbers were convincing. It was also exactly the sort of insight that a coach can actually use, and in fact, some NHL coaches did.
Now, analysts like Justin Azevedo are looking for similar breakthroughs. On Saturday, Azevedo presented his efforts to study what he refers to as microstats, the sort of common plays that take place on virtually every shift, but that aren’t tracked separately in the box score. Azevedo wants to know whether the way teams approach those common plays could impact their Corsi.
For instance, think about the stretch pass. Hockey fans have come to appreciate the ability of a defenseman to make a long pass across multiple zones to a streaking forward; it’s one of the most coveted skills that an offensive blueliner can have. But how often do stretch passes succeed? And do the failed attempts, which can often result in an icing call or, worse, the play quickly coming back the other way, hurt a team more than the successful ones help?
Those are the sorts of questions that will become more common as fans like Azevedo figure out what’s worth tracking. It’s daunting work; unlike shot attempts, the data can’t be scraped from the NHL’s logs, so it has to be tracked manually by somebody watching the game and recording each play they see. It’s hard for one person to track more than one team at a time, and much of the work is subjective and prone to error.
It’s a tough job, but it’s the sort of thing that will have to be done if analysts want to answer that coach’s question: Great, now what?
Preparing for the new wave of technology
While analysts like Azevedo are tracking data manually, most of the numbers used in modern analytics still come from publicly available game files that can be downloaded from the NHL website. And that presents a problem, because as speaker after speaker noted Saturday, that data is notoriously unreliable.
The NHL uses multiple people at every game to input data in real time, and there’s a degree of between-periods quality control. But hockey turns out to be an enormously difficult game to track. Unlike baseball or football with their frequent breaks, hockey can go long stretches without a pause in the action. In the time it takes a tracker to look down and press a button on an iPad, something else could happen and get missed. And a lot of what’s being tracked is subjective, leading to significant rink bias that skews the results even further.
All of that has led to a nagging worry that much of the work being done may be built on bad inputs. That’s not a crushing blow — there are ways to adjust for things like rink bias, and many of the other errors should even out over large enough sample sizes — but it’s a concern.
The good news is that the days of relying primarily on hand-entered data may be coming to a close, as the league moves toward incorporating new types of technology. Camera-based tools like SportVU, already well known for its revolutionary impact on the NBA, could be coming to the NHL. Meanwhile, the league is already looking into placing RFID tags on players that would track their movements during games. Adding a chip to the puck is another possibility.
Theoretically, that sort of technology could track virtually everything — the movement of every player, the movement of the puck, who has it and for how long, and what they do with it. The amount of data generated would be close to overwhelming, all of it much more accurate than what we have now.
The challenge will be figuring out what to do with it all. Not surprisingly, the analytics community already has some ideas. But when and if the technology does become available, the race will be on to see who can develop the next generation of stats and systems. The results could eventually make today’s stats seem simplistic by comparison.
The quest for the one-stop stat
In the decade or so that we could define as hockey’s analytics era, the sport has seen an avalanche of new statistics. We have stats to measure possession, competition quality, teammate quality, and even luck.
What we don’t have, yet, is a universal stat that captures everything in one number, the way WAR does in baseball. That’s created frustration, as old-school cynics struggle to use stats like Corsi to measure overall performance. And it’s left hockey fans without a stats-based answer to the age-old question, “Is this guy better than that guy?”
It’s not for lack of trying. Attempts to create a hockey version of WAR date back to at least 2003, and there have been at least a half-dozen distinct attempts, including concepts with names like DeltaSOT and Goals Versus Threshold. More recently, a stat called Total Hockey Rating was introduced, but it was widely mocked for ranking players like Tyler Kennedy and Patric Hornqvist ahead of Sidney Crosby and Alexander Ovechkin. Other recent attempts have included Hockey-Reference’s Point Shares. Each has its supporters, but none has earned the sort of widespread acceptance that baseball’s WAR has found.
Andrew Thomas, a Carnegie Mellon professor, is the latest to take a crack at the problem, using a model called “G-net through the Mean Even-Strength Hazard.” Thomas chronicled those efforts Saturday in a presentation called “The Single-Number Dream.” It’s not an easy problem to solve. On the surface, you need to figure out how many goals each play is worth. That can actually be done reasonably well; for example, we can assign a fraction of a goal to a center for each offensive faceoff win. Do that for every stat we can measure and then add it all up, and you’ve got yourself an all-encompassing stat.
But then you start running into problems. One is what Thomas calls “the Crosby effect,” the fact that players are more productive when playing beside an elite talent. Should that be factored in? And do you treat each shot a player takes equally, or weight them based on whether they represent a legitimate scoring chance? That answer seems obvious, until you realize that we don’t really have a universally agreed-upon definition of what a scoring chance is.
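The add-it-all-up idea can be sketched in a few lines of Python. This is only an illustration of the general approach described above, not Thomas’s model: the event names and the goal values attached to them are hypothetical placeholders, not estimates from any real analysis.

```python
# Hypothetical goal values per event. These numbers are placeholders chosen
# for illustration only; they are not taken from any published model.
EVENT_GOAL_VALUES = {
    "offensive_faceoff_win": 0.01,
    "shot_attempt": 0.03,
    "giveaway": -0.02,
}


def total_goal_value(event_counts, values=EVENT_GOAL_VALUES):
    """Sum (count x goal value) over every event type that has a known value.

    Event types missing from `values` are skipped rather than guessed at.
    """
    return sum(values[event] * count
               for event, count in event_counts.items()
               if event in values)
```

With these placeholder weights, a player credited with 100 offensive faceoff wins, 50 shot attempts, and 10 giveaways would come out at roughly 2.3 goals. The hard part, as the next paragraphs explain, is deciding what the weights should be.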
Making things even more complicated is that, in today’s NHL, even the seemingly straightforward concept of a “win” gets tricky. Teams are actually accumulating points, and those aren’t handed out equally thanks to the league’s ridiculous overtime point. At least in baseball, there’s one win available every game. In hockey, there are two points up for grabs some nights, and three points on others. How do you factor that in?
One key, Thomas argued, is to not let the perfect be the enemy of the very good. The number of possible factors that could be worked into a single all-encompassing hockey stat could reach into the thousands, and many of those would end up amounting to noise. The nature of this sort of stat is that nobody will ever be entirely happy with the results. That’s no reason not to try.
And stats guys have been trying for a long time. Nobody has managed to solve the riddle yet, but that day may be coming.
Unraveling shot quality
Shot quality can be a touchy subject in the analytics community. Much of the basic work being done now is based on attempted shot volume, but we know all shots are not created equal. An unscreened shot from the point isn’t the same as a goalmouth tap-in, and it seems odd to give them equal value when calculating something like Corsi. That often leads to skeptics trying to dismiss the numbers outright for being blind to what’s really happening on the ice; analytics advocates respond by pointing out that those effects largely disappear with a large enough sample size.
While that debate rumbles on, the analytics community is working to find ways to incorporate shot quality into what they do. That unscreened point shot should be less valuable than the tap-in, and under certain circumstances the data would ideally reflect that. The tricky part is figuring out how big the difference is.
So far, much of the work has been based on shot location. Common sense would tell us that shots become more likely to go in when they’re taken closer to the net, and the numbers back that up. That gives us a starting point (although one that’s hindered by the lousy quality of the NHL’s shot location data), and sites like Greg Sinclair’s Super Shot Search allow users to chart all the shots and goals taken by specific teams and players.
Location doesn’t tell the whole story, though, because how the puck got there matters too. Shots taken off a rebound tend to go in more than first shots, and we can track those. Deflections are also more likely to turn into goals. And shots that come off passes are also more valuable, especially if the pass goes cross-ice and forces the goaltender to move laterally.
Various efforts are under way to better understand how to represent and account for different types of shots. Sportsnet’s Chris Boyle has been leading the way, tracking shot quality as it relates to goaltenders’ save percentages. Others are working on assigning a value to each shot attempt taken, based on its historical likelihood of turning into a goal. This is where the various pieces start to come together; better tracking technology will help with the data, and the results could then be used to create more accurate player-value statistics.
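The idea of valuing each attempt by its historical likelihood of becoming a goal can be sketched as follows. This is a minimal Python illustration under stated assumptions: shots are grouped into simple categories (for example "rebound" or "point"), the function names are hypothetical, and a real model would use far finer categories and much more data.

```python
from collections import defaultdict


def goal_rates(history):
    """Estimate a goal rate per shot category.

    `history` is an iterable of (category, was_goal) pairs from past shots.
    """
    shots = defaultdict(int)
    goals = defaultdict(int)
    for category, was_goal in history:
        shots[category] += 1
        goals[category] += int(was_goal)
    return {category: goals[category] / shots[category] for category in shots}


def expected_goals(attempt_categories, rates):
    """Add up the historical goal rate of each attempt.

    Categories with no history contribute 0.0, a simplifying assumption.
    """
    return sum(rates.get(category, 0.0) for category in attempt_categories)
```

Under this approach a rebound chance and an unscreened point shot no longer count the same; each is worth whatever fraction of a goal similar shots have produced in the past.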
But in the meantime, don’t let anyone tell you that modern hockey analytics ignores shot quality altogether. It’s a piece of the puzzle right now, and that piece will get bigger as we go.
Filling the Extra Skater void
The Maple Leafs, owners of the infamous unspent analytics budget, shocked the hockey world by going on an offseason hiring spree. One of their new additions was Darryl Metcalf, who was best known in the hockey world as the man behind ExtraSkater.com, a website that launched before last season and quickly emerged as the most popular resource for modern hockey stats.
When Metcalf was hired, the site shut down. That’s left a huge void for the legions of fans, media, and even front-office employees who’d come to rely on Extra Skater. And it’s created a race to become the site that will emerge as its replacement.
To be clear, there are lots of hockey stat sites available today — probably too many. There are the old standbys, many of which predate Metcalf’s creation by several years. There are brand-new sites that have just recently launched, with new ones seeming to appear almost daily. Each one does something well. So far, none come close to being the one-stop shop that Extra Skater had become.
The void was a hot topic of conversation Saturday, with much of the talk centered on which sites would be the first to add certain tools and features. That’s all well and good, but this race will be won by whoever can figure out how to do what Metcalf did: build a site that’s easy enough for the average fan to use. Extra Skater’s breakthrough wasn’t inventing stats that nobody had ever seen before. It was taking the stuff that was largely already available elsewhere and making it accessible to anyone. For the average fan, the concepts behind some of these modern stats are difficult enough; nobody wants to feel like they have to take a postgraduate course just to find the information online.
The Extra Skater void won’t last long. The incentives to win the race are too high; after all, it took Metcalf less than one year to go from launching his site to landing his dream job. With interest in hockey analytics suddenly booming, it’s possible that a major media property will wade in, or maybe even the NHL itself.
Or maybe it will be another Metcalf — one guy working solo to build something that will take the world by storm. As with so much of the hockey analytics world, we may not know the answer right now, but we’ll likely find out soon enough.