The Misuse and Misunderstanding of NFL Analytics
The use of analytics is confusing fans and receiving criticism…for good reason
- Ben Alamar, ESPN Director of Sports Analytics
If you’re an NFL fan, you’ve been hearing a lot about analytics lately. For example, a few Sundays ago the Minnesota Vikings had a chance to kick an easy field goal late in the game to extend their lead to 8 points. Instead, coach Mike Zimmer opted to go for a first down on 4th and 1 — it was unsuccessful and they ultimately lost the game.
Why go for it in that situation? Because analytics supported the decision. According to ESPN’s analytic modeling, the Vikings had a 98% chance of winning by going for it. It seemed like the smart play. But such decisions need to be reconsidered, because NFL teams and the NFL talking heads are not understanding or applying these data correctly.
Here are 3 key problems with how analytics is being used in the NFL:
1. Applying Group Data at the Individual Level
It’s not unusual for various professionals to use group level data to make predictions about specific individuals. Group level data refers to statistics that describe the behaviour or outcomes for a whole group of people or things. For example, as noted in the quote above, ESPN collects data from all NFL teams across all (or many) NFL games to create an estimate. The estimate is a prediction for an individual — in this case any NFL team at a particular point in time (ex: 4th and 1 late in the fourth quarter).
A similar use of data can be seen in actuarial sciences. For example, insurance companies will gather data about large groups of people, and then try to match you (an individual) to a particular group in order to make a prediction. If you’re a 37 year-old non-smoker, then the insurance company can look at what their stats say about people in that group and estimate the likelihood of death, which allows them to determine how much money they should charge for life insurance.
We know that insurance companies do a very good job making actuarial predictions. In other words, they can take group data and successfully apply it to an individual. So, how are NFL teams getting it wrong?
The problem is that analytics providers like ESPN are gathering situational data as opposed to team data. In the insurance example, the data gathered from people (e.g., are you a smoker?) tell us what the person is like. This is not what analytics providers like ESPN are doing, as far as I can tell. Let’s take the 4th and 1 example. They are looking at the average outcome based on what all NFL teams have done in similar situations. Insurance companies and ESPN are using different kinds of group data: the former gather info to make predictions about the person, while the latter make predictions about situations.
Why is the ESPN approach a problem? The situational data doesn’t account for the actual team characteristics or the in-game contextual factors.
For example, if a team facing 4th and 1 had the best run offense and was going against the worst run defense, is it reasonable to expect that its probability of winning is the same as that of a team with the worst run offense going against the best run defense? As far as I can tell, the abilities of the teams running the play in question are not being properly factored into the modeling. Furthermore, game-specific information is also not properly included in the model (ex: you might have the best run offense, but your offensive line could be struggling that game or your starting running back just left with an injury).
In other words, there are specific team and game factors that add so much contextual information that relying on averaged group data is just inappropriate.
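To make the point concrete, here is a minimal sketch of how a pooled situational average can hide large differences between matchups. The conversion rates below are hypothetical numbers I made up for illustration; they are not ESPN figures or real NFL data, and `pooled_rate` is just a helper name for this example.

```python
# Hypothetical 4th-and-1 conversion rates for three kinds of matchup.
# These values are invented for illustration only.
matchups = {
    "best run offense vs worst run defense": 0.80,
    "average run offense vs average run defense": 0.65,
    "worst run offense vs best run defense": 0.50,
}


def pooled_rate(rates):
    """Situational average: every matchup is lumped together equally."""
    return sum(rates.values()) / len(rates)


league_rate = pooled_rate(matchups)

# Compare each matchup's own rate with the single pooled number.
for name, rate in matchups.items():
    gap = rate - league_rate
    print(f"{name}: own rate {rate:.2f}, pooled {league_rate:.2f}, gap {gap:+.2f}")
```

In this toy setup the pooled number describes the average matchup well, but it overstates the weak matchup and understates the strong one by the same margin.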
2. Running Averages are Not the Same as Time Point Probabilities
The lifetime risk of a woman being diagnosed with breast cancer is 1 in 8 or about 12.5%. What does this have to do with NFL analytics?
Statistics are only useful if you understand their meaning. I’ve read articles (including some professional health sites) where the authors misunderstood the difference between lifetime and point prevalence statistics. Lifetime statistics refer to the probability of getting disease X once a person has lived a standard lifetime (typically 80 years). Point prevalence can be used to refer to the probability a person will have disease X at a particular point in time.
The mistake I’ve seen some people make is assuming that a 45 year-old woman going for a mammogram has a 12.5% chance of having a positive test result (i.e., she has breast cancer). This is not true. The probability that the average 45 year-old has breast cancer at that moment is not the same as the probability that the average woman is diagnosed at some point over an 80-year lifetime.
Again, what does this have to do with the NFL? Let’s take a look at 2-point conversions, which have been a hot topic in the past few years, with many analysts claiming that teams should go for 2 more often or at particular points in the game. A nice SB Nation article on this shows that, on average, NFL teams score 1.02 points per 2-point attempt and .938 points per 1-point attempt. The conclusion many would make (and do) is that the 2-point attempt should be made more often or at a particular point in the game because it produces more points on average.
The logic of the 2-point argument has some truth, but it is neither framed nor used correctly. There are two significant problems with this line of thought (I’ll review the second problem in point #3 below).
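The arithmetic behind those averages can be sketched as follows. The 1.02 and .938 figures are the ones cited above; the implied success rates are just those figures divided by the points on offer, and the function names are my own for this example.

```python
def implied_success_rate(avg_points, points_if_successful):
    """Back out the success rate from average points per attempt."""
    return avg_points / points_if_successful


def expected_points(success_rate, points_if_successful):
    """Average points per attempt for a given success rate."""
    return success_rate * points_if_successful


# League-average points per attempt, as cited in the article.
two_point_avg = 1.02
one_point_avg = 0.938

two_point_rate = implied_success_rate(two_point_avg, 2)  # 51% of 2-point tries succeed
one_point_rate = implied_success_rate(one_point_avg, 1)  # 93.8% of kicks succeed

# Average edge per attempt from always going for 2.
edge = two_point_avg - one_point_avg

print(f"Implied 2-point success rate: {two_point_rate:.3f}")
print(f"Implied 1-point success rate: {one_point_rate:.3f}")
print(f"Average edge per attempt: {edge:.3f} points")
```

Note that the edge is an average across many attempts by many teams, which is exactly why it needs careful handling.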
The first error touches on the statistical concept I described above — the difference between probability at time X vs probability across time. While watching an NFL game, the announcer might say “The team should probably go for 2 here because the stats say so.” No. They don’t. If the stats for a particular team showed that they succeed on the 2-point conversion 60% of the time, that doesn’t mean there’s a 60% chance on each 2-point try. The actual probability for each try will vary across games, opponents, situations, etc.
The 60% success rate means that across a period of time (i.e., a certain number of 2-point attempts), the average turns out to be 60%. This is an important distinction because in order for a team to capitalize on a favourable 2-point statistic, they would need to regularly make the attempt. This would involve a 40% fail rate, but at the end of the season, if the statistic is true, then regularly trying for 2 would pay off.
You have to live a lifetime to get the 12.5% lifetime risk (for the average woman), and likewise, you need a large enough data set (i.e., a regular use of the 2-point attempt) to have the 60% be true across time.
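A rough way to see why the number of attempts matters is the standard error of an observed success rate. This sketch makes a simplifying assumption that I argue against elsewhere in this section: it holds the probability fixed at 60% on every try. Even in that best case, a handful of attempts tells you very little.

```python
import math


def standard_error(p, n):
    """Standard error of an observed success rate over n independent tries,
    assuming the same success probability p on every try (a simplification)."""
    return math.sqrt(p * (1 - p) / n)


p = 0.60  # the example success rate used in the text

# Roughly 95% of observed rates fall within two standard errors of p.
for n in (5, 20, 100, 1000):
    margin = 2 * standard_error(p, n)
    print(f"{n:>4} attempts: observed rate typically within +/- {margin:.2f} of {p:.2f}")
```

With only a few attempts the plausible range is wide enough that a team with a true 60% rate could easily look like a losing bet, which is why the statistic only pays off through regular use.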
Now, before I move on to problem #3, some readers might take issue with my claim that the probability of success isn’t necessarily 60% on each 2-point try. One might argue that the 2-point try is a binary outcome just like a coin flip (you either make the 2-point try or you don’t). Each flip of a coin has a 50% chance of heads, so why isn’t this the same?
The coin flip has no conditions and is considered a random outcome. The 2-point success rate is conditional and not random. The success of each attempt is conditional on a number of factors (team competency, weather, injuries, opposition competency, etc.). Because these conditional factors are expected to vary across time, so too will the probability.
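Here is a small sketch of that idea. The per-attempt probabilities are hypothetical values I chose so that they average out to 60%; the .938 kick value is the league-average figure cited earlier, used here only as a comparison point.

```python
# Hypothetical success probabilities for five 2-point tries by one team.
# Each differs because opponent, weather, injuries, etc. differ.
attempt_probs = [0.75, 0.45, 0.70, 0.50, 0.60]

# The season-long statistic is just the average of these.
season_rate = sum(attempt_probs) / len(attempt_probs)

# Break-even success rate: below this, kicking is worth more on average.
kick_value = 0.938  # league-average points per 1-point attempt (cited above)
break_even = kick_value / 2

# Attempts where going for 2 was the worse choice despite the 60% average.
below_break_even = [p for p in attempt_probs if p < break_even]

print(f"Season rate: {season_rate:.2f}")
print(f"Range of per-attempt probabilities: {min(attempt_probs):.2f} to {max(attempt_probs):.2f}")
print(f"Break-even rate: {break_even:.3f}")
print(f"Attempts below break-even: {below_break_even}")
```

The season average says 60%, yet no single attempt here except one actually carried a 60% chance, and one attempt was a worse bet than the kick.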
How often would a team have to go for 2 in order to achieve the estimated stat (in this example, 60%)? I don’t know. I also suspect that if teams started to go for 2 regularly, other teams would adjust by preparing more for the 2-point try, and the numbers would change. These things are dynamic and iterative, but if a team wants to try, it needs to understand the stats.
3. Intra-team Statistics are More Useful than Inter-team Stats
Inter-team statistics refers to stats derived from all the teams (e.g., average NFL 2-point success rate), while intra-team refers to stats that come from one team (e.g., the difference between 1-point and 2-point success rates for one team).
Let’s stick with the 2-point conversion to highlight how best to use analytic data. I’ll return to the stats cited in the SB Nation article from section #2. When we look at the inter-team statistics, we note that the 2-point attempt yields 1.02 points for an average NFL team, while the 1-point attempt yields about .93 points. Thus, one might conclude that NFL teams should always go for 2 because they will produce roughly .09 extra points on average for each attempt across a season, which could matter.
However, this would be a mistake for some teams. Look at the woeful Detroit Lions, whose average 2-point attempt produces less than .5 points, while their average 1-point attempt is around .95 points. If they used the inter-team statistics (i.e., statistics from league averages), they’d lose an average of .45 points per attempt across a season. Teams like the Lions aren’t average NFL teams, and the further a team is from average, the less likely inter-team statistics will be accurate. The SB Nation article correctly limits its conclusions to intra-team suggestions only.
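The Lions comparison can be worked through in a few lines. The league figures are the ones quoted in this section; for the Lions I use .50 as a stand-in for "less than .5" (so the real cost would be at least this large) and .95 for "around .95". The function names are my own for this sketch.

```python
# Average points per attempt, from the figures quoted in this section.
league = {"two_point": 1.02, "one_point": 0.93}
# Lions: 0.50 is an upper bound standing in for "less than .5".
lions = {"two_point": 0.50, "one_point": 0.95}


def better_option(avg_points):
    """Pick the attempt type with the higher average points."""
    return max(avg_points, key=avg_points.get)


def cost_of_decision(decision, team):
    """Points given up per attempt by making `decision` instead of
    the option that is actually best for this team."""
    return max(team.values()) - team[decision]


league_choice = better_option(league)  # what the inter-team stats recommend
lions_choice = better_option(lions)    # what the intra-team stats recommend
lions_cost = cost_of_decision(league_choice, lions)

print(f"League averages recommend: {league_choice}")
print(f"Lions' own numbers recommend: {lions_choice}")
print(f"Cost to the Lions of following the league: {lions_cost:.2f} points per attempt")
```

The league-wide recommendation and the team-specific recommendation point in opposite directions, which is the whole argument for leaning on intra-team data.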
In fact, one of the main criticisms that I and others have of analytics could be rectified by using more intra-team data to make decisions. How does a specific team tend to do in situation X?
There are great benefits to be found in analytics-informed decision making, and NFL teams are moving in the right direction by embracing these stats. The key is understanding when and how to use them, and recognizing their limitations.