Monday, January 26, 2015

Bayesian predictions for Super Bowl XLIX

Last fall I taught an introduction to Bayesian statistics at Olin College. My students worked on some excellent projects, and I invited them to write up their results as guest articles for this blog. Here is the first article of the series:


Predicting Super Bowl XLIX
Alex Crease and Matt Wismer


All code for this post can be found on GitHub: https://github.com/MattWis/PoissonFootball


With the Super Bowl coming up this Sunday, everyone is probably wondering where to place their bets. Fortunately, we've developed a model that uses Bayesian statistics to predict the results of football matches based upon prior season data of each team. We then used our model to predict a game between the Seahawks and the Patriots.


Our model is an expanded version of Allen Downey's solution to the “World Cup Problem”, which predicts the goals scored in a soccer match from prior data. While the soccer model uses a Poisson process to predict when the next goal will be scored, our football model is a bit trickier: it accounts for defense, and it incorporates the two types of scoring, 3-point field goals and 7-point touchdowns, to predict final scores. To simplify the model, we left out safeties and two-point conversions. We also didn't model overtime, so the two teams can be tied at the end of regulation.


This model can be useful in a number of different betting schemes. Here’s what it predicts:


Who’s going to win?


Patriots, easy. Well, we may be biased because we both go to school in New England. Here’s the real story:


We treat scoring as a single Poisson process, and keep a separate distribution that represents our belief about the probability that a given score is a touchdown rather than a field goal. Given the number of scores, we then use the binomial distribution to calculate the probability of each split between touchdowns and field goals. The rate of the Poisson process reflects the offense’s ability to get into field goal range, and the touchdown percentage reflects the team’s red zone efficiency.
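As a rough illustration, here is a minimal Python sketch of that scoring model. It is not the code from the PoissonFootball repository: the function name `points_distribution` and its parameters are hypothetical, and it plugs in single fixed values for the scoring rate and the touchdown fraction, whereas the full Bayesian model works with distributions over both quantities. Scores are either 7-point touchdowns or 3-point field goals, matching the simplifications described above.

```python
from math import comb, exp, factorial


def points_distribution(rate, td_prob, max_scores=15):
    """Return {points: probability} for one team in one game.

    rate:       expected number of scores per game (mean of the Poisson process)
    td_prob:    probability that any given score is a touchdown (7 points)
                rather than a field goal (3 points)
    max_scores: where to truncate the Poisson sum (an assumption of this sketch)
    """
    dist = {}
    for n in range(max_scores + 1):
        # Poisson probability of exactly n scores in the game
        p_n = exp(-rate) * rate ** n / factorial(n)
        for tds in range(n + 1):
            # Binomial probability that tds of the n scores are touchdowns
            p_split = comb(n, tds) * td_prob ** tds * (1 - td_prob) ** (n - tds)
            points = 7 * tds + 3 * (n - tds)
            dist[points] = dist.get(points, 0.0) + p_n * p_split
    return dist


# Hypothetical parameters, not estimates from the authors' data
dist = points_distribution(rate=3.5, td_prob=0.55)
most_likely = max(dist, key=dist.get)
```

Because the Poisson sum is truncated at `max_scores`, the probabilities add up to slightly less than 1; raising the cutoff shrinks that gap.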


We also keep track of offense and defense separately. To calculate a team’s distribution of points scored, we average its offense’s scoring rate with the rate at which the opposing defense allows scores, and we average its offense’s touchdown percentage with the percentage of touchdowns that the opposing defense allows.
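A minimal sketch of that averaging step, assuming each team is summarized by four per-game numbers. The `TeamStats` class, its field names, and the sample values are hypothetical; in the actual model these quantities are distributions rather than single numbers.

```python
from dataclasses import dataclass


@dataclass
class TeamStats:
    # Hypothetical per-game summaries; the field names are ours, not the authors'
    scoring_rate: float    # scores per game produced by the offense
    td_pct: float          # fraction of the offense's scores that are touchdowns
    rate_allowed: float    # scores per game allowed by the defense
    td_pct_allowed: float  # fraction of allowed scores that are touchdowns


def matchup(offense: TeamStats, defense: TeamStats):
    """Return (rate, td_prob) for `offense` playing against `defense`."""
    # Average what the offense usually produces with what the defense usually allows
    rate = (offense.scoring_rate + defense.rate_allowed) / 2
    td_prob = (offense.td_pct + defense.td_pct_allowed) / 2
    return rate, td_prob


# Made-up numbers for illustration only
team_a = TeamStats(scoring_rate=4.0, td_pct=0.6, rate_allowed=3.0, td_pct_allowed=0.45)
team_b = TeamStats(scoring_rate=3.6, td_pct=0.55, rate_allowed=3.0, td_pct_allowed=0.5)
rate_a, td_a = matchup(team_a, team_b)  # team_a's offense against team_b's defense
```

The resulting `(rate, td_prob)` pair is what would feed a per-team points distribution like the one described above.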


With each team’s points distribution generated as described above, we calculated the probability that one team’s score would be greater than the other’s; in simpler terms, the probability that each team would win:


Outcome          Probability
Patriots win     52.0%
Seahawks win     44.1%
Overtime         3.9%
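Turning two points distributions into win probabilities can be sketched as below. This assumes the two teams’ scores are independent (an assumption of this sketch, not something the post states), and the toy distributions are made up; a tie at the end of regulation corresponds to the overtime row in the table.

```python
def outcome_probabilities(dist_a, dist_b):
    """Return (P(A wins), P(B wins), P(tie)) from two {points: prob} dicts.

    Assumes the two teams' scores are independent.
    """
    win_a = win_b = tie = 0.0
    for points_a, p_a in dist_a.items():
        for points_b, p_b in dist_b.items():
            joint = p_a * p_b
            if points_a > points_b:
                win_a += joint
            elif points_a < points_b:
                win_b += joint
            else:
                tie += joint  # the model stops at regulation; a tie means overtime
    return win_a, win_b, tie


# Toy distributions for illustration only
toy_a = {3: 0.2, 7: 0.3, 10: 0.5}
toy_b = {3: 0.4, 10: 0.6}
win_a, win_b, tie = outcome_probabilities(toy_a, toy_b)
```

For these toy inputs, team A wins with probability of about 0.32, team B with about 0.30, and they tie with about 0.38.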

The teams seem well matched, with our model slightly favoring the Patriots. Looking more closely at the data, the 90% credible interval for points scored runs from 6 to 45 for the Patriots and from 6 to 44 for the Seahawks, which brings us to the point spread.
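One simple way to get an interval like this from a discrete points distribution is to read off the 5th and 95th percentiles of its cumulative distribution. The sketch below does that; the function name and the toy distribution are hypothetical, and because scores are discrete the resulting interval can cover somewhat more than 90% of the probability.

```python
def credible_interval(dist, prob=0.90):
    """Equal-tailed interval holding at least `prob` of a discrete distribution."""
    tail = (1 - prob) / 2

    def percentile(p):
        # Smallest value whose cumulative probability reaches p
        total = 0.0
        for value in sorted(dist):
            total += dist[value]
            if total >= p:
                return value
        return max(dist)

    return percentile(tail), percentile(1 - tail)


# Made-up distribution for illustration only
toy = {0: 0.02, 10: 0.48, 20: 0.48, 30: 0.02}
low, high = credible_interval(toy)
```

Here the interval is (10, 20): the 2% tails at 0 and 30 points fall outside it.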


Betting the Point Spread


This game is likely to be pretty close. We computed the distribution of the point spread by subtracting the Seahawks’ point distribution from the Patriots’ point distribution. The cumulative distribution of the point spread is shown below. As you can see, the 50% mark falls right around the Patriots winning by 1. So if you find a spread well away from that, you can consult this chart to see how likely you are to win the bet. Keep in mind that the model does not play overtime; instead, it assigns some probability to a tie at the end of regulation (a spread of 0). If the game does go into overtime, the final spread will differ from what the chart suggests.
Point_Spread.png
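The spread calculation can be sketched as follows, again assuming independent scores and using made-up toy distributions. `spread_distribution` builds the distribution of the difference, `cdf` evaluates a cumulative distribution like the one plotted above, and `prob_covers` gives the chance that the first team wins by more than a given line; all three names are hypothetical. The probability at a spread of 0 is the tie (overtime) probability.

```python
def spread_distribution(dist_a, dist_b):
    """Distribution of (A's points - B's points), assuming independent scores."""
    spread = {}
    for points_a, p_a in dist_a.items():
        for points_b, p_b in dist_b.items():
            diff = points_a - points_b
            spread[diff] = spread.get(diff, 0.0) + p_a * p_b
    return spread


def cdf(dist, x):
    """P(value <= x) for a {value: probability} distribution."""
    return sum(p for value, p in dist.items() if value <= x)


def prob_covers(spread, line):
    """P(A wins by more than `line` points); a half-point line avoids pushes."""
    return sum(p for diff, p in spread.items() if diff > line)


# Toy distributions for illustration only
toy_a = {3: 0.2, 7: 0.3, 10: 0.5}
toy_b = {3: 0.4, 10: 0.6}
spread = spread_distribution(toy_a, toy_b)
```

With these inputs, the cumulative probability at a spread of 0 is about 0.68, and team A covers a 1.5-point line with probability of about 0.32.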


Bets on the Total


Some people like to bet on the total points scored by both teams, so we wanted to figure that out as well. We added the two teams’ points distributions to get a distribution of the total, and got an interesting result: a total of 37 (3.636%) is almost exactly as likely as a total of 44 (3.634%). Consult the graph below for other likely totals to see where to place your bets.


Total_Points.png
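The distribution of the total is the convolution of the two points distributions: every pair of scores contributes the product of their probabilities to their sum. A minimal sketch, assuming independent scores, with hypothetical function names and toy inputs:

```python
from collections import defaultdict


def total_distribution(dist_a, dist_b):
    """Distribution of A's points + B's points (a convolution), assuming independence."""
    total = defaultdict(float)
    for points_a, p_a in dist_a.items():
        for points_b, p_b in dist_b.items():
            total[points_a + points_b] += p_a * p_b
    return dict(total)


def most_likely_totals(total, k=2):
    """The k most probable totals, most probable first."""
    return sorted(total.items(), key=lambda item: item[1], reverse=True)[:k]


# Toy distributions for illustration only
toy_a = {0: 0.2, 7: 0.8}
toy_b = {3: 0.4, 7: 0.6}
total = total_distribution(toy_a, toy_b)
top_two = most_likely_totals(total)
```

Sorting the totals by probability is a quick way to spot near-ties like the 37 versus 44 result above.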


Another way to bet on the total is the over/under line, which works much like the point spread. We've got you covered there, too. We expect the total score to be centered around 45 points, the mean of the distribution. According to http://www.oddsshark.com/nfl/odds, the over/under line is currently at 48 or 48.5, so with our results in mind, we advise betting the under. However, if the game goes into overtime, all bets are off (pun intended), and the total score could go higher.


Total_CDF.png
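Given a distribution of the total, the mean and the over/under probabilities take only a few lines. This sketch uses a made-up three-point distribution rather than the model’s output, and the function names are hypothetical.

```python
def mean(dist):
    """Expected value of a {value: probability} distribution."""
    return sum(value * p for value, p in dist.items())


def over_under(total, line):
    """Return (P(over), P(under), P(push)) for a betting line on the total."""
    over = sum(p for value, p in total.items() if value > line)
    under = sum(p for value, p in total.items() if value < line)
    push = total.get(line, 0.0)  # only possible for whole-number lines
    return over, under, push


# Made-up distribution for illustration only
toy_total = {30: 0.25, 45: 0.5, 60: 0.25}
```

Comparing the mean with the line is a quick heuristic; `over_under` answers the betting question more directly, since it reports the probability that each side of the bet wins.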


Office Pool Grid Betting


Because football has more than one way to score, some scores are more likely than others. As you can see in the histograms below, 17 points is the most likely score for both teams: it takes 2 touchdowns and 1 field goal, and both counts are common in a football game. On the other hand, it is impossible to score 11 points in our model, because no combination of 7s and 3s adds up to 11. The two graphs also suggest that the Seahawks are more likely to kick field goals than the Patriots, since their probability of scoring exactly 3 points is higher.


Patriots.png


Seahawks.png
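Under the model’s simplifications (touchdowns always worth 7, field goals worth 3, no safeties or two-point conversions), which scores are possible is simple arithmetic. The helper names below are hypothetical.

```python
def score_combinations(points, td=7, fg=3):
    """All (touchdowns, field_goals) pairs worth exactly `points`."""
    combos = []
    for tds in range(points // td + 1):
        remainder = points - td * tds
        if remainder % fg == 0:
            combos.append((tds, remainder // fg))
    return combos


def impossible_scores(limit, td=7, fg=3):
    """Scores from 0 to `limit` that no mix of touchdowns and field goals can reach."""
    return [s for s in range(limit + 1) if not score_combinations(s, td, fg)]
```

For example, `score_combinations(17)` returns `[(2, 1)]`, so 17 points can only come from 2 touchdowns and 1 field goal, and `impossible_scores(50)` returns `[1, 2, 4, 5, 8, 11]`; every score above 11 is reachable.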

To satisfy your betting needs, we've created the histograms below, which show the probability that each team’s final score ends in a given digit, so you can (hopefully) blow your office mates away with your office pool bets. If you are familiar with the way points add up in football, our data isn't very surprising. Across both teams, the top two digits to bet on are 0 and 7, followed by 4 and 3.


Patriots_Digits.png


Seahawks_Digits.png
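The last-digit histograms come from folding each points distribution modulo 10. The sketch below does that and, assuming the two teams’ final scores are independent, multiplies the digit probabilities to estimate the chance that a particular office pool square wins. Function names and toy inputs are hypothetical.

```python
from collections import defaultdict


def last_digit_distribution(dist):
    """Probability that a team's final score ends in each digit 0-9."""
    digits = defaultdict(float)
    for points, p in dist.items():
        digits[points % 10] += p
    return dict(sorted(digits.items()))


def square_probability(dist_a, dist_b, digit_a, digit_b):
    """Chance that the pool square (digit_a, digit_b) wins on the final score,
    assuming the two teams' scores are independent."""
    digits_a = last_digit_distribution(dist_a)
    digits_b = last_digit_distribution(dist_b)
    return digits_a.get(digit_a, 0.0) * digits_b.get(digit_b, 0.0)


# Toy distributions for illustration only
toy_a = {0: 0.1, 7: 0.2, 10: 0.3, 17: 0.4}
toy_b = {3: 0.5, 7: 0.5}
```

With these inputs, team A’s score ends in 0 with probability of about 0.4 and in 7 with about 0.6, so the (7, 7) square wins with probability of about 0.3.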

Sadly, our results do not point to a clear outcome to bet on. However, we hope that we've shed some light on how to make an educated guess when deciding where to put your money. Please remember that you ultimately decide how to bet, and we are in no way liable for any money you may lose (or win) over Super Bowl weekend.