The NFL’s Cloudy Crystal Ball

If you’ve been following this space at Grantland for a while now, you’ve seen a variety of advanced concepts and statistical rules of thumb pop up with regard to measuring football performance. With the season rapidly approaching, now seems like a good time to review those concepts, explain why they work, and apply them to the 2012 season. We’ve already started with our previews of the 49ers, Dolphins, Buccaneers, and Broncos from two weeks ago, but there are 28 other teams worth considering, too.

Before we start, though, a quick acknowledgment about football statistics. For a variety of reasons, they are less meaningful and harbor less authority than their big brothers in baseball. Football statistics are about where Bill James was in 1986, just with additional computing power. There are still reams of data to be compiled, stunning concepts to be proven, and a billion arguments to be had. Even then, though, football has far fewer discrete events and games than baseball does. Our collective knowledge of football through numbers is going to improve, but it’s never going to catch up.

While wins are obviously very important for measuring team performance, as a statistic they limit the possibilities and reduce entire games to single data points. Using point differential or an advanced metric like DVOA to measure a team’s performance is best because you can think about it in the context of plays as opposed to games. The average team from 2011 was involved in about 2,035 plays, and you can score or be scored upon on just about every single snap. That’s 2,035 chances to prove how strong or weak you are. You only have 16 opportunities to record a win or loss, and you might only get three or four opportunities to win a close game. And those close games might come down to how you do on one play. Those three or four plays might mean a whole lot to your win-loss record, but they aren’t the best indicators of how you’re going to do on those four plays in the future; obviously, 2,035 is a much larger sample than 16 or four.
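To put a rough number on that sample-size gap, here is a minimal sketch. It assumes, purely for illustration, that every opportunity is an independent coin flip with a true success rate of 50 percent. Real plays and games are neither independent nor coin flips, and the function name standard_error is just a label for this example.

```python
import math


def standard_error(n, p=0.5):
    """Standard error of an observed success rate over n independent trials.

    Assumption for illustration only: each trial succeeds with probability p,
    independently of the others.
    """
    return math.sqrt(p * (1 - p) / n)


# Sample sizes taken from the paragraph above.
for label, n in [("close games", 4), ("games", 16), ("plays", 2035)]:
    print(f"{label:12s} n={n:5d}  standard error = {standard_error(n):.3f}")
```

Under those simplified assumptions, the noise in a four-game sample is more than 20 times the noise in a 2,035-play sample, which is the intuition behind leaning on play-level measures.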

The four concepts below are some of the core underlying indicators of team performance. Historically, they’ve each shown great significance in projecting a team’s performance into the future, so keep them in mind as you prepare for the 2012 season.

Football’s Pythagorean Theorem

In a Sentence: Point differential is a better indicator of future winning percentage than winning percentage itself.

How It Works: Created by Bill James for baseball and modified for football in the early ’90s by current Houston Rockets general manager Daryl Morey, the Pythagorean theorem (or “Pythagorean expectation”) is a formula that translates a team’s points scored and allowed into an “expected” winning percentage. That formula isn’t exactly for the faint of heart:

Points For^2.37 / (Points For^2.37 + Points Against^2.37)

As an example, let’s take the 2011 Chiefs, who went 7-9 while scoring 212 points and allowing 338. Our formula is 212^2.37 / (212^2.37 + 338^2.37) ≈ 0.249. That’s the Chiefs’ expected winning percentage from their point differential, and if we multiply it by 16 games, we get a total of just 4.0 wins. The Pythagorean theorem suggests that the Chiefs outperformed their true level of performance by three full wins.
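For anyone who would rather let a computer do the exponents, the calculation can be sketched in a few lines of Python. The function name pythagorean_win_pct and its parameter names are invented for this example; only the formula, the 2.37 exponent, and the Chiefs’ numbers come from the text.

```python
def pythagorean_win_pct(points_for, points_against, exponent=2.37):
    """Expected winning percentage from points scored and points allowed."""
    pf = points_for ** exponent
    pa = points_against ** exponent
    return pf / (pf + pa)


# 2011 Chiefs: 212 points scored, 338 allowed, 7 actual wins in 16 games.
chiefs_pct = pythagorean_win_pct(212, 338)
chiefs_expected_wins = chiefs_pct * 16
chiefs_gap = 7 - chiefs_expected_wins  # actual wins minus expected wins

print(f"Expected winning percentage: {chiefs_pct:.3f}")
print(f"Expected wins over 16 games: {chiefs_expected_wins:.1f}")
print(f"Actual minus expected wins:  {chiefs_gap:+.1f}")
```

Swapping in any other team’s points scored, points allowed, and win total gives that team’s gap in the same way.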

Why It Works: Because all wins aren’t created equal. During Kansas City’s three-game winning streak last year, they beat the Raiders 28-0 in a game in which they forced six interceptions. Pretty impressive. A week later, they beat the Chargers 23-20 in overtime during that game in which Philip Rivers fumbled a meaningless snap moments before San Diego could attempt a game-winning field goal. For each of those two performances, Kansas City got the same exact mark on their record: one win. Nobody in his right mind would think that Kansas City looked equally good in both of those games, even if they got the same result. That’s where the “All that matters is the W” argument falls apart. It’s like saying the pass/fail system is just as useful as the traditional grading scale when figuring out how well somebody did in a class.

Prove That It Works: The simplest way to show off the efficacy of the Pythagorean expectation is to show off the impact it has on a team’s record during the following year. The chart below lumps teams from 1983 to 2010 into groups by the difference between their expected win total and their actual win total, and then notes how that team’s win total changed during the following year.

Actual Wins Minus Expected Wins	Teams	Avg. Change in Wins
-3 to -2	57	Improved by 2.6 wins
-2 to -1.5	44	Improved by 2.5 wins
-1.5 to -1	86	Improved by 2.0 wins
-1 to -0.5	124	Improved by 0.6 wins
-0.5 to 0	119	Neither improved nor declined
0 to 0.5	123	Neither improved nor declined
0.5 to 1	127	Declined by 0.9 wins
1 to 1.5	97	Declined by 1.5 wins
1.5 to 2	66	Declined by 1.8 wins
2 to 3	33	Declined by 2.5 wins

The Chiefs and their difference, a full three wins, would fit into the very last category within the table, one that suggests that they’ll decline by 2.5 wins this season. Of course, that’s just the average change, and the Chiefs may benefit greatly from full years with Eric Berry, Jamaal Charles, and Matt Cassel; like every bit of information that surrounds a team, it’s important to blend the statistics with the specific context to which they’re being applied.
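The table lookup can be sketched as a small function. This reads the table’s differences as actual wins minus expected wins (the sign convention under which the Chiefs’ three-win surplus lands in the last row). The table does not say which row owns a value sitting exactly on a boundary, so this sketch assumes the first matching row wins. The names REGRESSION_TABLE and average_change are hypothetical.

```python
# (low, high, average change in wins the following year), copied from the table.
# Differences are actual wins minus Pythagorean expected wins.
REGRESSION_TABLE = [
    (-3.0, -2.0, 2.6),
    (-2.0, -1.5, 2.5),
    (-1.5, -1.0, 2.0),
    (-1.0, -0.5, 0.6),
    (-0.5, 0.0, 0.0),
    (0.0, 0.5, 0.0),
    (0.5, 1.0, -0.9),
    (1.0, 1.5, -1.5),
    (1.5, 2.0, -1.8),
    (2.0, 3.0, -2.5),
]


def average_change(diff):
    """Return the historical average change in wins for a given difference.

    Assumption: a value exactly on a boundary goes to the first row that
    contains it. Values outside the table's range return None.
    """
    for low, high, change in REGRESSION_TABLE:
        if low <= diff <= high:
            return change
    return None


# The Chiefs' difference, rounded to the "full three wins" used in the text.
print(average_change(3.0))
```

As the paragraph above stresses, the result is only a group average, not a forecast for any one team.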

It also works on the half-seasonal level. The 30 teams from 1994 to 2011 that “underperformed” their Pythagorean win totals by the largest amount during their first eight games of the year saw their win totals more than double during the second eight, going from winning an average of 1.8 games in the first half to 3.7 in the second half. Likewise, the 30 teams that “overperformed” by garnering win totals that put their point expectations to shame suffered dramatic dropoffs; after winning an average of 6.1 of their first eight games, they averaged just 4.3 wins over the second half. In both cases, the gap between our outliers’ expected win totals and their actual win totals went from an enormous first-half total to a second-half total of essentially zero. You can cheat Pythagoras, but not for long.

Apply It to 2012: The table below lists the five teams with the biggest positive and negative differences between their actual win totals and their Pythagorean expectations from last year.

It will be really interesting to see if the Packers retreat toward league-average. Teams with great quarterbacks are more likely to outplay their Pythagorean expectations than teams without them, but that hadn’t been the case with the Packers and Aaron Rodgers before last year. During Rodgers’s first three years as the Packers’ starter, Green Bay underperformed their Pythagorean expectation by a total of six full wins.

Record in Close Games

In a Sentence: Teams are incredibly inconsistent from year to year when it comes to winning games that are decided by one touchdown or less.

How It Works: No abstruse formula here. Just count up each team’s number of games that were decided by one touchdown or less, check their winning percentage, and then see if they were similarly good or bad during the following season. When you do, we bet that you’ll find it’s essentially random.

Why It Works: Because, as we mentioned in the intro, a few close games per year isn’t enough to draw any conclusions.

Prove That It Works: Let’s start with a group of teams that were dominant in close games during given NFL seasons. Our arbitrary group of teams played six or more games that were decided by a touchdown or less in those seasons and each of them won 75 percent or more of those games. In all, those teams went a combined 449-102 (81.5 percent) in close contests. If there were really something consistent about how a team performs in the tight ones, these teams would at least emulate their record during the following season. Instead, they went a combined 256-249 (50.7 percent) in those same close games the following year.

On the other side of the tracks are the teams that couldn’t pull out those close games, the ones that didn’t know how to win or finish or whatever. They were the ones that played six or more games and won only 25 percent or less of them. In their downtrodden year, they went a combined 103-479 (17.7 percent). The following year? 241-284 (45.9 percent). Winning the close ones just isn’t a sustainable way to make the playoffs year in and year out.
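The percentages in those two groups are easy to check. Here is a short sketch using only the combined records quoted above; the helper name win_pct and the group labels are invented for the example, and ties are ignored because none appear in the quoted records.

```python
def win_pct(wins, losses):
    """Winning percentage as a fraction, ignoring ties."""
    return wins / (wins + losses)


# (record in the extreme year, record in close games the following year)
groups = {
    "won 75 percent or more": ((449, 102), (256, 249)),
    "won 25 percent or less": ((103, 479), (241, 284)),
}

for label, (extreme_year, next_year) in groups.items():
    before = win_pct(*extreme_year)
    after = win_pct(*next_year)
    print(f"{label}: {before:.1%} -> {after:.1%}")
```

Both groups land within a few points of 50 percent the following year, whichever extreme they started from.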

If there’s an exception to the rule, as with the Pythagorean expectation, it’s having a great quarterback. Peyton Manning was 64-33 in those games with the Colts, and it’s not surprising when you consider how well he managed endgame scenarios. Tom Brady is 43-15 in those same games. On the other hand, Aaron Rodgers is 13-17 in one-touchdown games, and Drew Brees is 22-16 during his time in New Orleans (after going 14-14 as the Chargers starter). So some great quarterbacks seem to drastically outperform the expected regressions, but others don’t.

Apply It to 2012: Here are 10 teams that stood out on either side of the coin in close games last year.

The Raiders won just one game by more than a touchdown last year, and that was a 10-pointer over the Jets in which they sacked Mark Sanchez on the Oakland 2-yard line on fourth down with 52 seconds left. You might consider the case that having a talented kicker like Sebastian Janikowski would produce a few extra close wins, but the Raiders were exactly 9-9 in games decided by a touchdown or less over the previous three seasons.

Minnesota, meanwhile, lost as many games by seven points or fewer as any team has since the advent of the 16-game season. Three other teams have lost nine games by one touchdown or less in a given season; the following year, two bounced back to relatively normal records in close games and improved by four wins each, while a third — the 1989 Chargers — failed to win one of their five close games the subsequent year.

The Plexiglass Principle

In a Sentence: Teams that make a significant leap in their performance (or an aspect thereof) over a given season often give back some of those gains during the following year.

How It Works: By observing the performance of a team after it made a dramatic improvement, relative to the rest of the league, in a given aspect of play during the previous season. Under the league’s current 32-team setup, an improvement of 20 spots in the rankings in a given statistic (e.g., wins, points allowed, etc.) might be a good place to start concerning oneself with the Plexiglass Principle.

Why It Works: Because too many things have to go right for a team to make an extreme leap, and those things don’t often happen in consecutive seasons. Take the 2008 Dolphins, who went from winning just one game in 2007 to 11 wins in 2008. That’s a classic Plexiglass scenario. Those Dolphins were 7-2 in games decided by a touchdown or less and beat their Pythagorean expectation by 2.2 wins. We already know that’s not sustainable.

Getting past the statistics, though, consider what happened to their personnel. After an injury-filled 2007, the 2008 Dolphins got healthy seasons from the vast majority of their starters. That included quarterback Chad Pennington and halfback Ronnie Brown, who, as a pair, had combined for just one healthy 16-game season in nine beforehand. They lasted a total of 12 games the following year. Miami also faced the fifth-easiest schedule in football and implemented a gimmicky new offensive tactic (the Wildcat) that worked in 2008 but was likely to wear off with future exposure. It took all those little lucky breaks to get the Dolphins to improve by 10 wins, and while some of the genuine improvements (an upgrade in coaching from Cam Cameron to Tony Sparano, quality drafts under Bill Parcells/Jeff Ireland, some modicum of health and vestiges of the Wildcat) stuck around, they didn’t have quite the impact they did during 2008.

Prove That It Works: Check out the first table in the Buccaneers preview from August 8 to see how the Plexiglass Principle affects win totals from year to year.

It also holds water when focused onto individual aspects of team performance, such as scoring points. Since 1989, 10 teams have suffered a drop of 20 spots or more in the league’s rankings for points scored over a given season. Most recently, the Colts went from fourth in scoring in 2010 to 28th last year. In the year after their precipitous collapse, those teams have improved by an average of more than eight spots in the rankings. Eleven teams have moved up by 20 spots or more in the rankings, and those teams declined by more than 11 spots in the following season.

And, as you’re probably guessing, it also applies to a team’s defensive performance. Using the same time frame as for offense, 15 teams have improved their points-allowed placements in the league ranking by 20 spots or more in a given year, as the Texans (29th to fourth) did last season. Those organizations declined by an average of more than eight spots in the rankings during the subsequent year. The 22 teams that saw their defensive performances drop by 20 or more spots, like the Buccaneers (ninth to 32nd) last season, improved by an average of more than 11 spots. Take a lot, give a little back.
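Spotting Plexiglass candidates amounts to comparing a team’s league rank in consecutive seasons against the 20-spot rule of thumb. A minimal sketch follows, using the three rank changes cited above; the function names and the sign convention (positive means moving toward No. 1) are choices made for this example.

```python
PLEXIGLASS_THRESHOLD = 20  # spots in a 32-team league, per the rule of thumb


def rank_shift(previous_rank, current_rank):
    """Spots gained in the rankings; positive means a move toward No. 1."""
    return previous_rank - current_rank


def is_plexiglass_candidate(previous_rank, current_rank,
                            threshold=PLEXIGLASS_THRESHOLD):
    """True when a team moved at least `threshold` spots in either direction."""
    return abs(rank_shift(previous_rank, current_rank)) >= threshold


# (2010 rank, 2011 rank) for the examples mentioned in the text.
examples = {
    "Colts, points scored": (4, 28),
    "Texans, points allowed": (29, 4),
    "Buccaneers, points allowed": (9, 32),
}

for label, (previous, current) in examples.items():
    shift = rank_shift(previous, current)
    flagged = is_plexiglass_candidate(previous, current)
    print(f"{label}: {shift:+d} spots, candidate={flagged}")
```

The flag only says a team is a candidate for giving some of the change back; the historical averages above suggest how much, not whether a particular team will.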

Apply It to 2012: In addition to the teams mentioned above and their shifts on one particular side of the ball, here are the teams with the biggest shifts in win totals from 2010 to 2011:

Team Turnover Margin

In a Sentence: The turnover margins produced by teams over a 16-game season are markedly inconsistent from year to year.

How It Works: We underestimate just how random turnovers can be from year to year. While teams are historically consistent at forcing fumbles, they struggle to recover a consistently high percentage of those fumbles from year to year, suggesting that randomness overruns that incredibly important type of play. Interceptions are perhaps more stable, but as mentioned regarding Alex Smith in the Niners preview, even they can vary dramatically from year to year.

Why It Works: Physics. When players who aren’t regular ballcarriers try to scoop up an oblong spheroid, funny things tend to happen. Like this Derrick Mason fumble recovery against the Cowboys in 2008.

Prove That It Works: The aforementioned Niners (positive) and Buccaneers (negative) pieces note what happens to teams living on the very extremes of the turnover margin; they tend to bounce almost all the way back toward a seasonal turnover margin of zero, and their records shift about two games toward average.

Apply It to 2012: Beyond the Niners and Bucs, here are the teams with the largest turnover margins from last year:

There are other factors that also exhibit meaning when used properly, like strength of schedule, average starting field position, penalty rates, and the “hidden” aspects of special-teams performance on things like field goal percentage allowed and kickoff distance against.

Detractors of statistics often say that metrics reduce a fascinating game to numbers or work in service of those who would prefer to see the games played on paper. In baseball, these sorts of straw-man arguments end up glorifying triples as some Holy Grail of meaning beyond numbers. We don’t agree, obviously, but it’s less about how the numbers work and more about how you use them. Football is always going to be a mysterious game, even to those who played it. At their best, statistical concepts in any sport don’t remove the mystery of the triple or force the game onto paper; they give us a way to navigate and try to understand the void, to make sense of millions of helmets crashing into each other on weekends.

The concepts above are little bits of mystery reclaimed as knowledge, but they’re still subject to incredible amounts of randomness. We might know that the Niners’ turnover margin is going to decline 97 percent of the time, but we have no idea what it’s going to decline to. It could go from plus-28 to minus-15, and they could go from 13-3 to 5-11, or it could fall to plus-15, and they could stick at 13-3. Maybe they’ll be the exception to the rule, push their turnover margin up to plus-31, and go 14-2. We don’t know for sure and won’t until the season is over, and that’s fine. The goal with using statistics in sports isn’t to know, because truly knowing something over a 16-game schedule before it even happens is impossible; it’s to learn. Stuff like the Pythagorean expectation and the Plexiglass Principle? That’s what we’ve learned so far about football.

Underachievers	Diff.	Overachievers	Diff.
Dolphins	-2.5	Packers	3.1
Vikings	-2.5	Chiefs	3.0
Eagles	-1.8	Broncos	2.2
Panthers	-1.5	Raiders	1.7
Colts	-1.3	Patriots	1.4

“Lucky” Teams	Record	“Unlucky” Teams	Record
Raiders	7-2	Vikings	2-9
Packers	5-1	Panthers	1-5
Saints	4-1	Rams	1-5
49ers	6-2	Colts	1-5
Steelers	5-2	Eagles	2-5

Improved	Win Change	Declined	Win Change
49ers	+7	Colts	-8
Packers	+5	Buccaneers	-6
Bengals	+5	Rams	-5
4 teams	+4	5 teams	-3

Positive	Margin	Negative	Margin
Niners	+28	Buccaneers	-16
Packers	+24	Eagles	-14
Patriots	+17	Redskins	-14
Lions	+11	Cardinals	-13
Seahawks	+8	Steelers	-13

