This is a review of the NCAA basketball research conducted by Francisco J. R. Ruiz and Fernando Perez-Cruz, applying Poisson factorization.
Predicting the outcomes of sporting events is difficult because it is often unclear which variables actually affect the outcome and what information is available before the event begins. Team sports are harder still, because the individual players add another layer of information that affects the prediction.
The researchers develop a model to estimate outcome probabilities for the March Madness Tournament in college basketball. The model combines ideas from soccer models, which characterize each team by its attack and defense coefficients, with Poisson factorization, in which the elements of a matrix are assumed to be independent Poisson random variables given some latent attributes. The structure of the March Madness Tournament is also taken into account.
To start, attack and defense coefficients are defined for each team and for each NCAA conference. The conference coefficients capture the overall behavior of each conference, and the team coefficients capture differences within each conference. Each coefficient represents a particular strategy, as some teams may be successful at defending against some strategies while less successful against others. The same idea applies to attacking.
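The structure described above can be sketched in a few lines of Python. This is an illustration only: the additive form of the rate (home bonus plus a team-level product plus a conference-level product), the function names, and every numeric value are assumptions made for the example, not details confirmed by this review or taken from the paper.

```python
import random

def expected_points(att_team, def_opp, att_conf, def_conf_opp, home_bonus=0.0):
    """Poisson rate for one side's score (assumed additive form, for illustration)."""
    # Team-level term: attacker's attack coefficients against opponent's defense coefficients.
    team_term = sum(a * d for a, d in zip(att_team, def_opp))
    # Conference-level term: same idea, using the two conferences' coefficients.
    conf_term = sum(a * d for a, d in zip(att_conf, def_conf_opp))
    return home_bonus + team_term + conf_term

def sample_poisson(rate, rng):
    """Draw a Poisson sample by counting unit-rate exponential arrivals before time `rate`."""
    count, t = 0, rng.expovariate(1.0)
    while t < rate:
        count += 1
        t += rng.expovariate(1.0)
    return count

# Hypothetical coefficients with two latent "strategies" per team and conference.
home_rate = expected_points([6.0, 5.0], [5.0, 4.0], [3.0, 2.0], [4.0, 3.0], home_bonus=2.0)
away_rate = expected_points([5.0, 6.0], [4.0, 5.0], [2.0, 3.0], [3.0, 4.0])

rng = random.Random(0)
home_score = sample_poisson(home_rate, rng)
away_score = sample_poisson(away_rate, rng)
print(home_rate, away_rate, home_score, away_score)
```

With these made-up numbers the expected scores are 70 points for the home team and 68 for the visitor; the sampled scores vary around those means.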
The posterior distribution of the attack and defense coefficients, along with that of the home coefficient, which captures the advantage of playing at home, is approximated with a mean-field inference algorithm. Variational algorithms recast inference as a non-convex optimization problem, which is simpler to compute than Markov chain Monte Carlo methods and does not share the same limitations.
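For reference, mean-field variational inference in its generic textbook form can be written as follows. This is not the specific set of updates derived by Ruiz and Perez-Cruz, which this review does not reproduce; the symbols are assumptions chosen for illustration, with y standing for the observed scores and z for all latent coefficients.

```latex
q(z) = \prod_{i} q_i(z_i), \qquad
\mathcal{L}(q) = \mathbb{E}_{q}\left[\log p(y, z)\right] - \mathbb{E}_{q}\left[\log q(z)\right] \le \log p(y)
```

Maximizing the bound over the factors q_i is equivalent to minimizing the Kullback-Leibler divergence from q(z) to the true posterior p(z | y). Because the objective is non-convex, different starting points can reach different local optima.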
The variational algorithm is applied to four years' worth of tournament data. Its output is used to estimate each team's probability of winning each tournament game. The predicted probabilities are averaged over 100 independent runs of the variational algorithm, each with a different initialization.
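One way to turn fitted Poisson means into a win probability, and then average over runs, is sketched below. It assumes the two scores are independent Poisson variables and handles ties by conditioning on a non-tied score, since basketball games are settled in overtime. That tie rule, the function names, and the example means are all assumptions for illustration, not the paper's stated procedure.

```python
import math

def poisson_pmf(k, rate):
    """P(X = k) for X ~ Poisson(rate), computed in log space to avoid overflow."""
    return math.exp(k * math.log(rate) - rate - math.lgamma(k + 1))

def win_probability(rate_a, rate_b, max_points=300):
    """P(A outscores B) for independent Poisson scores, conditioned on no tie."""
    p_win, p_tie, cdf_b = 0.0, 0.0, 0.0  # cdf_b holds P(B < k) at the top of each loop
    for k in range(max_points + 1):
        pa = poisson_pmf(k, rate_a)
        pb = poisson_pmf(k, rate_b)
        p_win += pa * cdf_b   # A scores exactly k, B scores fewer than k
        p_tie += pa * pb      # both score exactly k
        cdf_b += pb
    return p_win / (1.0 - p_tie)

def average_win_probability(runs):
    """Average the win probability over (rate_a, rate_b) pairs, one pair per run."""
    probs = [win_probability(a, b) for a, b in runs]
    return sum(probs) / len(probs)

# Hypothetical Poisson means from three runs (the study averages over 100).
runs = [(72.0, 68.0), (70.5, 69.0), (71.0, 70.0)]
avg = average_win_probability(runs)
print(round(avg, 3))
```

The truncation at 300 points is far above any realistic basketball score, so the neglected tail probability is negligible for means in the usual range.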
One advantage of this model is that it not only generates results but also supplies explanations for them. The experiments indicate that there is some advantage to playing at home, but that it is not as strong as it is for soccer teams. The model can also rank conferences. The team rankings indicate that teams that lose a few close games and win many games by a large margin finish higher than teams that win all their games by a small margin. Finally, the model can predict the expected result of each game. These predictions are obtained by averaging the expected Poisson means over the 100 independent runs of the model.
This model is general enough to be applied to most team sports, during the regular season or the playoffs. Analysts can use it to rank teams by the strengths and weaknesses of their offense and defense, and to compare different conferences within a league.
Analytics Used: Poisson Factorization, Mean-Field Inference Algorithm, Variational Algorithm