This is a review of the tennis analytics research conducted by Stephanie Ann Kovalchik, applying a bookmaker consensus model.
Where there are sports, there are people predicting the outcome. Trying to figure out who will win fascinates people of all ages all around the world. This includes fans of the sport of tennis.
Models that predict the outcomes of singles matches in professional tennis are plentiful. The purpose of this research is to compare 11 of those models on accuracy and discriminatory power. The models were tested on data from the 2014 Association of Tennis Professionals (ATP) Tour. The study also investigates how performance differs across surfaces and tournament levels. The aim is to identify the major components of win ability in professional tennis and to determine how current models can be improved.
The eleven models fall into three categories: regression, point-based and paired comparison models. Regression models directly model the winner of a match; they are typically based on the probit family and include player rankings as a predictor. Point-based models estimate the probability of winning a point on serve and from there compute a prediction for the match outcome using an algebraic formula, which assumes that points are independent and identically distributed. Paired comparison models estimate each player's underlying ability from past results and predict a match from the difference in abilities between the two players. A bookmaker consensus model was included as a standard of reference for predictive performance.
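To make the point-based idea concrete, here is a minimal sketch of its first step: turning a server's probability of winning a point into the probability of holding a service game, assuming points are independent and identically distributed. The function name game_win_prob is hypothetical, and this covers a single game only. The models in the study extend the same logic to sets and matches and are not reproduced here.

```python
def game_win_prob(p):
    """Probability that the server wins a game, given probability p of
    winning any single point on serve (points assumed independent and
    identically distributed)."""
    q = 1.0 - p
    # Server wins to love, to 15, or to 30 (needs 4 points before deuce).
    before_deuce = p**4 * (1 + 4 * q + 10 * q**2)
    # Reach deuce at 3-3 (20 orderings), then win two points in a row
    # before the opponent does: p^2 / (1 - 2pq).
    from_deuce = 20 * p**3 * q**3 * (p**2 / (1 - 2 * p * q))
    return before_deuce + from_deuce


if __name__ == "__main__":
    for p in (0.50, 0.60, 0.65):
        print(f"point win prob {p:.2f} -> game win prob {game_win_prob(p):.3f}")
```

A small edge on each point compounds into a much larger edge over a game, which is why these models can be sensitive to the estimated serve probability.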
To maintain a fair comparison, all models were analysed using one year's worth of data, comprising 2395 ATP singles matches played during the 2014 season. Models were evaluated on four measures: prediction accuracy, calibration, log-loss, and discrimination. Prediction accuracy is the percentage of correct predictions. Calibration compares the number of wins a model expects across a set of matches with the number actually observed. Log-loss scores each prediction by the logarithm of the probability the model assigned to the actual result, so confident predictions that turn out wrong are penalized most heavily. Discrimination is the mean prediction for matches won by higher-ranked players minus the mean prediction for the matches they lost.
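The four measures can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the study's code: the function name evaluate is hypothetical, each prediction is taken to be the probability that the higher-ranked player wins, calibration is expressed as expected wins divided by observed wins, and the sample numbers are made up purely to show the call.

```python
import math


def evaluate(preds, outcomes):
    """preds: predicted win probability for the higher-ranked player.
    outcomes: 1 if the higher-ranked player won the match, else 0."""
    n = len(preds)
    eps = 1e-15  # guards against log(0) for predictions of exactly 0 or 1

    # Accuracy: share of matches where the favoured side (p > 0.5) matched the result.
    accuracy = sum((p > 0.5) == (y == 1) for p, y in zip(preds, outcomes)) / n

    # Calibration (assumed form): expected wins / observed wins; 1.0 is ideal.
    calibration = sum(preds) / sum(outcomes)

    # Log-loss: mean negative log of the probability given to the actual result.
    log_loss = -sum(
        math.log(max(p if y == 1 else 1 - p, eps)) for p, y in zip(preds, outcomes)
    ) / n

    # Discrimination: mean prediction in wins minus mean prediction in losses.
    won = [p for p, y in zip(preds, outcomes) if y == 1]
    lost = [p for p, y in zip(preds, outcomes) if y == 0]
    discrimination = sum(won) / len(won) - sum(lost) / len(lost)

    return {
        "accuracy": accuracy,
        "calibration": calibration,
        "log_loss": log_loss,
        "discrimination": discrimination,
    }


if __name__ == "__main__":
    # Illustrative inputs only, not data from the study.
    print(evaluate([0.8, 0.6, 0.7, 0.4], [1, 1, 0, 0]))
```

Under this convention a calibration value below 1.0 means the model expects fewer wins for higher-ranked players than actually happened.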
In terms of accuracy, regression and point-based models performed similarly, with the regression models that included player ranking being the most accurate. Adding in additional predictors did not improve the accuracy of the models. The paired comparison models showed improved accuracy as the amount of data increased. Overall, the bookmaker consensus model was the most accurate in its predictions.
In terms of calibration, several models showed a bias: they tended to underestimate the higher-ranked player’s likelihood of winning, predicting more upsets than actually occurred during the year analyzed.
The bookmaker consensus model also generated the lowest log-loss of all models, which indicates that it made fewer overconfident predictions. The point-based models predicted the greatest number of upsets.
Again, the bookmaker consensus model demonstrated the best discriminatory ability. Regression-based models were the worst in this area.
Accuracy varied with player rank, tournament level, and type of surface. The greatest differences were found between matches that included a top 30 player and matches that did not. All models were significantly more accurate in predicting matches involving a top 30 player. Predictions for Grand Slam matches were also more accurate than those for other tournaments. Finally, predictions for grass and hard-court tournaments were more accurate than those for clay-court tournaments.
The only model that came close to the bookmaker consensus model in accuracy was Elo, which is the only model that considers a player’s past wins and losses. It weights recent performances more heavily and gives more credit for wins against stronger opponents.
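For readers unfamiliar with Elo, the following is a minimal sketch of the classic rating update. It is not the exact variant evaluated in the study: the constant k=32 and the 400-point scale are the conventional chess defaults and are assumptions here, and the function names are hypothetical.

```python
def expected_score(rating_a, rating_b):
    """Probability that player A beats player B under the Elo logistic curve."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update(winner_rating, loser_rating, k=32.0):
    """Return the new (winner, loser) ratings after one match.
    k is an assumed constant step size; it controls how fast ratings move."""
    surprise = 1.0 - expected_score(winner_rating, loser_rating)
    delta = k * surprise  # bigger gain when the win was less expected
    return winner_rating + delta, loser_rating - delta


if __name__ == "__main__":
    # An underdog win moves ratings more than a win between equals.
    print(update(1500, 1500))
    print(update(1400, 1600))
```

Because each match shifts the rating from its current value, older results fade as new ones arrive, and beating a higher-rated opponent earns a larger gain than beating a lower-rated one.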
This information provides useful data for analysts who are always trying to generate the best predictions. It can also point analysts to the opportunity to create an even better model using the strengths of the various models.
Analytics Used: Regression Model, Point-based Model, Paired Comparison Model, Bookmaker Consensus Model