This is a review of soccer research by Patrick Lucey, Alina Bialkowski, Mathew Monfort, Peter Carr, and Iain Matthews that applies conditional random fields and point-in-polygon calculations.
In soccer, the team with the most shots on goal often does not win. Why? Not every shot has the same probability of producing a goal. Using player-tracking data, this research looks at how to quantify the quality of each shot.
Fine-grained player tracking data is used to compute several features: the distance from the shot to the goal, the distance to the nearest defender, the number of defenders between the shot and the goal, and the positions of the other attackers on the field.
Match context is also taken into account. Shots are grouped into six contexts: open play, counter attack, corners, penalties, free kicks and set pieces. Comparing how often shots in each context produce goals shows that teams have a greater probability of scoring on a counter attack than during a normal possession. A normal possession is more likely to result in a goal than a penalty kick. Of all six contexts, corners have the lowest probability of resulting in a goal.
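A per-context scoring rate is simply goals divided by shots within each group. The sketch below is a minimal illustration using made-up shot records (not the study's data); the context labels follow the six groups named above.

```python
from collections import defaultdict

# The six match contexts named in the review.
CONTEXTS = ("open play", "counter attack", "corner", "penalty", "free kick", "set piece")

def conversion_by_context(shots):
    """Return {context: goals / shots} for a list of (context, scored) pairs."""
    attempts = defaultdict(int)
    goals = defaultdict(int)
    for context, scored in shots:
        if context not in CONTEXTS:
            raise ValueError(f"unknown context: {context}")
        attempts[context] += 1
        goals[context] += int(scored)
    return {c: goals[c] / attempts[c] for c in attempts}

# Hypothetical toy shots, for illustration only.
example_shots = [
    ("counter attack", True), ("counter attack", False),
    ("open play", True), ("open play", False), ("open play", False),
    ("corner", False), ("corner", False), ("corner", False),
]
rates = conversion_by_context(example_shots)
```

With real data, ranking `rates` by value gives the kind of context comparison described in the paragraph above.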
The study uses one season of player and ball tracking data from a professional soccer league, supplied by Prozone. This yielded 9,732 shots, from which spatiotemporal patterns were analyzed. These patterns were fed into a Conditional Random Field to estimate the probability of a team scoring from a given shot. Teams are analyzed in terms of the quantity and quality of their shots, using team tendencies rather than individual player characteristics.
The position of the defenders has a major impact on the attack, both on the decision whether to attempt a shot and on how the shot is executed. To capture how defenders are positioned relative to the ball, the first step is to check whether any defenders are located between the shot and the goal. If any are present, the Euclidean distance between the shot and the nearest such defender is computed. If there are no defenders within that area, the feature is given a negative value. In every match context, the probability of scoring increases when there is no defender between the shot and the goal.
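A minimal sketch of this defender-proximity feature follows. It rests on three assumptions made only for illustration: the "area" is the triangle formed by the shot location and the two goalposts, coordinates are in metres with the goal line on x = 0, and -1 is the negative placeholder value.

```python
import math

Point = tuple[float, float]

# Hypothetical frame: metres, goal line on x = 0, posts 7.32 m apart.
LEFT_POST: Point = (0.0, -3.66)
RIGHT_POST: Point = (0.0, 3.66)

def _cross(o: Point, a: Point, b: Point) -> float:
    """z-component of (a - o) x (b - o); its sign says which side of o->a the point b is on."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def in_triangle(p: Point, a: Point, b: Point, c: Point) -> bool:
    """True if p is inside or on triangle abc, for either vertex winding."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    return not (has_neg and has_pos)

def defender_proximity(shot: Point, defenders: list[Point], missing: float = -1.0) -> float:
    """Distance from the shot to the nearest defender inside the shot-to-goal
    triangle, or the negative sentinel `missing` when the triangle is empty."""
    inside = [d for d in defenders if in_triangle(d, shot, LEFT_POST, RIGHT_POST)]
    if not inside:
        return missing
    return min(math.dist(shot, d) for d in inside)
```

For a shot from (16, 0), a defender at (10, 0.5) lies in the triangle and one at (12, 8) does not, so the feature is about 6.02 m. With only the second defender present, it falls back to -1.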
Another factor is the number of defenders in this area. A standard point-in-polygon calculation determines which players' positions lie within it.
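Counting defenders in the area only requires a point-in-polygon test. The sketch below uses even-odd ray casting, one standard method; the review does not say which variant the authors used. Points lying exactly on an edge may be classified either way. The triangle and positions are hypothetical.

```python
Point = tuple[float, float]

def point_in_polygon(p: Point, polygon: list[Point]) -> bool:
    """Even-odd ray casting: count how many polygon edges a ray from p towards +x crosses."""
    x, y = p
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through p
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:       # crossing lies to the right of p
                inside = not inside
    return inside

def defenders_in_area(defenders: list[Point], polygon: list[Point]) -> int:
    """Number of defender positions that fall inside the polygon."""
    return sum(point_in_polygon(d, polygon) for d in defenders)

# Hypothetical shooting triangle (shot at x = 16 m, posts at y = +/-3.66 m) and defenders.
shot_triangle = [(16.0, 0.0), (0.0, -3.66), (0.0, 3.66)]
defenders = [(10.0, 0.5), (5.0, -1.0), (12.0, 8.0), (20.0, 0.0)]
count = defenders_in_area(defenders, shot_triangle)  # 2
```

Ray casting works for any simple polygon, so the same function still applies if the area is defined by a shape other than a triangle.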
Defensive structure is determined by assigning a role to each player. This means finding the permutation of the raw player locations that minimizes the total distance to the roles, a problem the Hungarian algorithm solves. From these roles, four further features are created: the spacing of the defensive line, the distance between the back line and the midfield line, the number of defensive role changes, and the number of attackers in front of the defensive center.
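Role assignment is an assignment problem: choose the one-to-one pairing of players and roles with the smallest total distance. The Hungarian algorithm solves it in polynomial time. For a handful of defenders, the sketch below simply tries every permutation. That exhaustive search returns the same minimum-cost assignment and makes the objective easy to see, but it is not the Hungarian algorithm itself. For a full team, `scipy.optimize.linear_sum_assignment` solves the same problem from a cost matrix. The role template and coordinates are hypothetical. Counting role changes between two frames is one simple reading of the "number of defensive role changes" feature and is also an assumption.

```python
import itertools
import math

Point = tuple[float, float]

def assign_roles(players: list[Point], role_positions: list[Point]) -> list[int]:
    """Return, for each player, the index of the role that minimises the total
    player-to-role Euclidean distance. Exhaustive search over permutations:
    fine for a back four, too slow (n!) for a full team."""
    if len(players) != len(role_positions):
        raise ValueError("need exactly one role per player")
    best_perm, best_cost = None, math.inf
    for perm in itertools.permutations(range(len(role_positions))):
        cost = sum(math.dist(p, role_positions[r]) for p, r in zip(players, perm))
        if cost < best_cost:
            best_perm, best_cost = perm, cost
    return list(best_perm)

def role_changes(previous: list[int], current: list[int]) -> int:
    """Number of players whose assigned role differs between two frames."""
    return sum(a != b for a, b in zip(previous, current))

# Hypothetical back-four role template (x, y in metres) and two frames of positions.
template = [(0.0, -20.0), (0.0, -7.0), (0.0, 7.0), (0.0, 20.0)]
frame_1 = [(1.0, 8.0), (2.0, -19.0), (0.0, 21.0), (-1.0, -6.0)]
frame_2 = [(1.0, -8.0), (2.0, -19.0), (0.0, 21.0), (-1.0, 6.0)]
roles_1 = assign_roles(frame_1, template)  # [2, 0, 3, 1]
roles_2 = assign_roles(frame_2, template)  # [1, 0, 3, 2]
```

Between the two frames, the first and last defenders swap central roles, so `role_changes(roles_1, roles_2)` is 2.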
Offensive features in the model include the type of play leading to the shot (a long pass, a cross, dribbling and taking on players, or pressing), the position of the player who passes the ball to the shooter, the speed of the players, and how the attacking team moves relative to the defense.
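Player pace can be derived from tracking data by differencing consecutive positions. The sketch below is minimal and assumes positions in metres and a hypothetical sampling rate `fps`, since the review does not state the data's frame rate.

```python
import math

Point = tuple[float, float]

def speeds(track: list[Point], fps: float = 10.0) -> list[float]:
    """Per-step speed in m/s from consecutive (x, y) positions in metres.
    `fps` is an assumed sampling rate, not a documented property of the data."""
    dt = 1.0 / fps
    return [math.dist(a, b) / dt for a, b in zip(track, track[1:])]

def mean_speed(track: list[Point], fps: float = 10.0) -> float:
    """Average speed over the track, or 0.0 if there are fewer than two samples."""
    s = speeds(track, fps)
    return sum(s) / len(s) if s else 0.0

# Hypothetical player moving 0.5 m per frame along the x-axis.
run = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (1.5, 0.0)]
pace = mean_speed(run, fps=10.0)  # about 5 m/s
```

Raw frame-to-frame differences are noisy, so in practice positions are usually smoothed before speeds are computed.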
From all of these features, the expected goal value of each shot is then calculated using logistic regression.
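The review does not list the model's inputs or coefficients, so the following is only a minimal sketch. It assumes two features (distance to goal and number of defenders in the shooting triangle) and a dozen made-up shots. It fits a logistic regression by plain batch gradient descent and returns a probability for each shot, which is what an expected-goal value is. The actual model uses many more features than this.

```python
import math

def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def features(distance_m: float, defenders_in_triangle: int) -> list[float]:
    # Bias term, distance rescaled to tens of metres, and raw defender count.
    return [1.0, distance_m / 10.0, float(defenders_in_triangle)]

def fit_logistic(shots, outcomes, lr=0.1, iterations=5000):
    """Fit weights by batch gradient descent on the mean log-loss."""
    X = [features(d, n) for d, n in shots]
    w = [0.0] * len(X[0])
    m = len(X)
    for _ in range(iterations):
        grad = [0.0] * len(w)
        for x, y in zip(X, outcomes):
            error = sigmoid(sum(wi * xi for wi, xi in zip(w, x))) - y
            for j, xj in enumerate(x):
                grad[j] += error * xj / m
        w = [wi - lr * gj for wi, gj in zip(w, grad)]
    return w

def expected_goal(w, distance_m, defenders_in_triangle):
    """Predicted probability that a shot with these features is scored."""
    x = features(distance_m, defenders_in_triangle)
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

# Hypothetical toy shots: (distance to goal in m, defenders in triangle); 1 = goal.
toy_shots = [(6, 0), (8, 0), (10, 1), (11, 0), (20, 1), (18, 2),
             (22, 1), (25, 3), (16, 2), (30, 2), (9, 2), (7, 0)]
toy_goals = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
weights = fit_logistic(toy_shots, toy_goals)
```

On this toy data, `expected_goal(weights, 7, 0)` for a close, unobstructed shot comes out higher than `expected_goal(weights, 25, 3)` for a long, crowded one.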
These values can be used to compare teams. The efficiency of each team's offense and defense can be calculated and compared. Coaches can also analyze their team's efficiency across different matches to identify the strengths and weaknesses of both their own team and their opponents, which helps them build stronger game plans for each opponent. Finally, analysts can use the results to judge whether a winning team won through talent or luck.
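Quantity versus quality can be summarized per team by counting shots and summing their expected-goal values. The sketch below is minimal and uses hypothetical shot records. Defining efficiency as goals divided by expected goals is an assumption: it is one simple choice and not necessarily the paper's definition.

```python
from collections import defaultdict

def team_summary(shots):
    """Aggregate (team, xg, scored) shot records into shots, summed xG, goals,
    and efficiency = goals / xG (NaN when a team's xG is zero)."""
    agg = defaultdict(lambda: {"shots": 0, "xg": 0.0, "goals": 0})
    for team, xg, scored in shots:
        row = agg[team]
        row["shots"] += 1
        row["xg"] += xg
        row["goals"] += int(scored)
    for row in agg.values():
        row["efficiency"] = row["goals"] / row["xg"] if row["xg"] > 0 else float("nan")
    return dict(agg)

# Hypothetical records: team A takes many low-quality shots, team B a few good ones.
records = [("A", 0.05, False)] * 6 + [("B", 0.40, True), ("B", 0.35, False)]
summary = team_summary(records)
```

Here team A takes three times as many shots as team B but accumulates less total xG (about 0.3 against 0.75). This is the quantity-versus-quality gap the study examines.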
Training players to evaluate the expected value of a shot should ultimately improve shot efficiency. At the same time, defenders could use the same information to improve their defensive strategy.
Analytics Used: Conditional Random Field, Euclidean Distance, Standard Point-in-Polygon Calculation, Permutation, Hungarian Algorithm, Logistic Regression
——————————————————–
Find out how Sports Analytics Expert Victor Holman can give your team the competitive advantage.
How mature is your team’s analytics program? Take the Sports Analytics Maturity Assessment.
Discover the Groundbreaking Sports Analytics Software and Framework coaches and sports analysts are talking about!
Learn all about sports analytics in Victor Holman’s Sports Analytics Blog.