Germany’s top national football league, the Bundesliga, has since 2020 given fans real-time insight into match data using AWS machine learning, compute, database, and storage cloud services. Now it is adding two new stats on player efficiency, to be premiered as graphics during broadcasts in September 2021 for the 2021-22 season.
The first advanced stat, shot efficiency, compares the actual number of goals that a player or team has scored with how many goals the player or team should have scored based on the quality of their chances. The second, passing profile, provides deeper insights into the pass quality of a player or an entire team. Both stats are debuting during Matchday 4 on September 11, 2021, featuring the faceoff between German Champion FC Bayern München and the second-place team of the previous season, RB Leipzig.
Bundesliga “Match Facts” help audiences better understand nuanced aspects of the game of football, such as decision making on the pitch or what goes into exceptional player performance. Bundesliga generates the match facts by gathering and analyzing the match feeds from live games in real time as they’re streamed into AWS. On the backend, Bundesliga uses AWS capabilities in analytics, machine learning, compute, storage, database, serverless, and media services to process and store the vast amount of data that powers these statistics, as well as to train, deploy, and scale the machine learning models used to generate predictions. Fans see the insights as graphics during broadcasts, with additional details in the official Bundesliga app. The two new match facts will better showcase the action on the field and give fans, coaches, players, and commentators visual support for analyzing players’ and teams’ performance.
To develop these stats, machine learning models trained on Amazon SageMaker (AWS’s service that enables data scientists and developers to build, train, and deploy machine learning models quickly) analyzed thousands of hours of video from previous Bundesliga seasons. In the case of shot efficiency, Bundesliga trained its machine learning models on a dataset of more than 40,000 historical shots on goal, which includes features derived from player positional data, such as distance to goal, angle to goal, player speed, number of defenders in the line of a shot, and goalkeeper coverage.
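As an illustration of how two of the listed features could be derived from positional data, here is a minimal Python sketch. The coordinate system (goal centre at the origin, `x` measured out from the goal line, `y` along it) and the function names are assumptions made for this example, and defining the angle as the one subtended by the two goalposts is one common convention rather than the Bundesliga's confirmed definition. The 7.32 m goal width comes from the Laws of the Game.

```python
import math

GOAL_WIDTH_M = 7.32  # distance between the posts, per the Laws of the Game


def distance_to_goal(x, y):
    """Straight-line distance in metres from the shot location to the goal centre."""
    return math.hypot(x, y)


def goal_angle(x, y):
    """Angle in radians subtended by the two posts, seen from (x, y) with x > 0."""
    to_near_post = math.atan2(GOAL_WIDTH_M / 2 - y, x)
    to_far_post = math.atan2(-GOAL_WIDTH_M / 2 - y, x)
    return to_near_post - to_far_post


# A shot from the penalty spot (11 m out, central) versus one from a wide position.
central = goal_angle(11.0, 0.0)
wide = goal_angle(11.0, 20.0)
print(f"central: {math.degrees(central):.1f} deg, wide: {math.degrees(wide):.1f} deg")
```

A wider visible angle and a shorter distance would both be expected to push a shot's expected-goal value up, which is why features of this kind appear in the training data.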
In the case of passing profile, Bundesliga analyzed video of nearly 2 million passes and used that data to construct an algorithm that computes a difficulty score for each pass at any moment, evaluating characteristics such as distance to the receiver, the number of defending players in between, and pressure on a player. Once computed, Bundesliga aggregates difficulty scores for each player and team to form a passing profile.
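The aggregation step can be pictured with a small Python sketch. The source does not say how the difficulty scores are combined, so the summary used here (pass count, completion rate, and mean difficulty per player) is an assumption, as are the tuple layout and the name `passing_profile`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-pass records: (player, difficulty score in [0, 1], completed?)
passes = [
    ("player_a", 0.25, True),
    ("player_a", 0.75, False),
    ("player_b", 0.5, True),
    ("player_b", 0.5, True),
]


def passing_profile(records):
    """Group pass records by player and summarise them."""
    by_player = defaultdict(list)
    for player, difficulty, completed in records:
        by_player[player].append((difficulty, completed))

    profile = {}
    for player, rows in by_player.items():
        profile[player] = {
            "passes": len(rows),
            "completion_rate": sum(1 for _, done in rows if done) / len(rows),
            "mean_difficulty": mean(d for d, _ in rows),
        }
    return profile


profile = passing_profile(passes)
for player, stats in profile.items():
    print(player, stats)
```

A team-level profile would follow the same pattern, grouping by team instead of by player.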
Andreas Heyden, executive vice president of digital innovations for DFL Deutsche Fußball Liga (DFL) Group, said, “Bundesliga Match Facts powered by AWS allow us to give fans more insight into the game of football, broadcasters more interesting stories to tell, and coaches and teams more data to excel at their game. Last year, the reception for Bundesliga Match Facts around the world was very positive, and we will continue to raise the bar and innovate on these analytics using machine learning to make them even better. The two new stats for this season give fans a view into player efficiency that hasn’t been achieved before, and we are still just at the beginning of our relationship with AWS. I’m excited to see how technology will continue to evolve the fan experience and the game.”
Klaus Buerg, general manager for AWS Germany, Austria, and Switzerland, Amazon Web Services EMEA, added, “Through the work we’ve accomplished with Bundesliga in creating eight Bundesliga Match Facts in a short period of time, we are giving fans a new way to appreciate player speed, field positioning, goals, passing, and shot efficiency, creating even more excitement in watching the game.”
Shot efficiency and passing profile
Football fans love to revisit and break down scoring opportunities to better celebrate or mourn key game moments. Shot efficiency helps fans determine which players or teams best exploit their chances at scoring a goal. It compares the number of goals that a player or team has actually scored with the cumulative value from expected goals (xGoals), an existing Bundesliga Match Fact that gives the number of goals the player or team should have scored based on the quality of their attempted shots. The difference between these two values is the shot efficiency number. If the value is negative (shown on TV by a red arrow pointing down), the player or team has scored fewer goals than would have been expected. If the value is positive (green arrow up), the player or team exceeded the expected value.
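The arithmetic behind the stat is simple enough to sketch in a few lines of Python. The `Shot` type, the sample xGoals values, and the function names are invented for illustration; only the definition (goals scored minus cumulative xGoals) and the arrow colours come from the description above.

```python
from dataclasses import dataclass


@dataclass
class Shot:
    xg: float     # expected-goal value of the shot, between 0 and 1
    scored: bool  # whether the shot resulted in a goal


def shot_efficiency(shots):
    """Goals actually scored minus the cumulative xGoals of the same shots."""
    goals = sum(1 for shot in shots if shot.scored)
    expected = sum(shot.xg for shot in shots)
    return goals - expected


def arrow(efficiency):
    """Broadcast indicator; the neutral case at exactly zero is an assumption."""
    if efficiency > 0:
        return "green arrow up"
    if efficiency < 0:
        return "red arrow down"
    return "neutral"


shots = [Shot(0.5, True), Shot(0.25, False), Shot(0.75, True), Shot(0.25, False)]
efficiency = shot_efficiency(shots)
print(efficiency, arrow(efficiency))
```

Here the player scored 2 goals from chances worth 1.75 xGoals, so the shot efficiency is +0.25.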
For the first time, each player’s efficiency can be objectively assessed based on the overall quality of shots and the number of goals scored. For example, this advanced stat can compare two strikers who scored the same number of goals after 10 matchdays to determine which player is converting goals in challenging versus easy situations. Commentators can also use the stat, for instance, to analyze if players have a high number of goals because they are well supported by their teammates or because they exploit openings in the defense.
Fans often put themselves into a player’s shoes and compare the choices they might have made in a given situation to what the player actually did. Passing profile helps fans understand how players think and decide where to pass the ball. It also provides deeper insights into the pass quality and pass strength of players and teams, including which passing decisions they prioritize, such as an offensive pass, passing the ball back, or opening up play with a long ball. Before passing profile, the effectiveness of player passing was measured primarily by the number of passes that successfully reach a player’s target; now, with passing profile, it is possible to assess the quality of passes too, accounting for pass difficulty.
For instance, by looking at how many opponents press the recipient and passer, how high the ball is in the air, and how many opponents were positioned between the recipient and passer, the stat calculates the pass difficulty rating. It also offers further insights into the passing behavior of a player or team by identifying the number of long and short passes, pass direction, and the type of passes a player favors.
A deep dive into analyzing passing in football
In the 2020-2021 Bundesliga season, an average of 917 passes were completed per match. The records for the highest and lowest number of completed passes by a team were set in the same match, on Matchday 26, when Arminia Bielefeld hosted RB Leipzig. Arminia completed 152 passes compared to 865 by Leipzig. In traditional football analysis, the number of completed passes is widely seen as an indicator of team dominance (even though Leipzig won that match only by the slightest of margins: 1:0).
To assess an individual player’s performance, consider that in 2020-2021 the average Bundesliga player completed 86% of his passes, but individual numbers vary from 22% completion rate to 100%. If a higher completion rate indicates dominance, what then if the player with 22% brought a striker into a scoring position with every pass, ideally bypassing several defenders each time? And what does it say if the player with the 100% rate was a defender passing the ball horizontally to another defender, because he couldn’t find a teammate to pass to up the pitch? Then again, a horizontal pass can be a great tool for offense too, opening up the field by moving the ball to the other side of the pitch.
Building an ML model that can predict the difficulty of a given pass requires the creation of a large dataset filled with both successful and unsuccessful passes from the past. Although much is known about successful passes (for example, the receiver, the location where the ball was controlled, the duration and distance of the pass), little is known about an unsuccessful pass because it simply didn’t reach its intended target. AWS therefore adopted an approach proposed by Anzer and Bauer (2021) to identify the intended receiver of an unsuccessful pass using a ball trajectory and motion model, so that these entries could be added to the passing dataset.
Writing in a blog post to explain the development of the model, the AWS professional services team said that, although it sometimes doesn’t look like it, a ball has to adhere to the laws of physics. “We can use gravity, air drag, and rolling drag to map the trajectory of a pass. With a physical model as proposed by Spearman and Basye (2017), we can use the first 0.4 seconds after a pass is given to map the entire trajectory of the ball. The physical model in the following figure estimates the trajectory based on this 0.4-second timeframe.” Describing the figure that accompanies the post, they continue, “This computed path is shown in the image on the left in orange. In this example, player 11 from the blue team attempts to initiate an attack with a pass to player 32, who is making a run on the right flank. To evaluate our physical model, we can compare the estimated ball trajectory to the actual trajectory provided by the tracking data, shown in black. Comparing both trajectories shows that the estimated trajectory is fairly close to reality. However, the model doesn’t account for the drag force after a pass meets the pitch (due to weather conditions) and doesn’t consider curve balls because this information isn’t available within the tracking data.”
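To give a feel for what such a physical model involves, the Python sketch below integrates the flight of a ball under gravity and air drag. It is not the Spearman and Basye model: it ignores rolling drag and spin, takes the initial velocity as given rather than estimating it from the first 0.4 seconds of tracking data, and uses an assumed drag coefficient and approximate ball dimensions. All names are hypothetical.

```python
import math

G = 9.81            # gravitational acceleration, m/s^2
AIR_DENSITY = 1.2   # kg/m^3, approximate value near sea level
DRAG_COEFF = 0.25   # assumed value for a football in flight
BALL_RADIUS = 0.11  # m, approximate
BALL_MASS = 0.43    # kg, approximate


def simulate_flight(pos, vel, dt=0.01, max_time=10.0):
    """Step the ball forward until it first touches the ground.

    pos and vel are (x, y, z) tuples in metres and metres per second.
    Returns a list of (t, x, y, z) samples.
    """
    area = math.pi * BALL_RADIUS ** 2
    k = 0.5 * AIR_DENSITY * DRAG_COEFF * area / BALL_MASS
    x, y, z = pos
    vx, vy, vz = vel
    t = 0.0
    path = [(t, x, y, z)]
    while t < max_time:
        speed = math.sqrt(vx * vx + vy * vy + vz * vz)
        # Quadratic drag opposes the velocity; gravity acts on z only.
        vx += -k * speed * vx * dt
        vy += -k * speed * vy * dt
        vz += (-G - k * speed * vz) * dt
        x += vx * dt
        y += vy * dt
        z += vz * dt
        t += dt
        if z <= 0.0:
            path.append((t, x, y, 0.0))
            break
        path.append((t, x, y, z))
    return path


# A lofted pass struck from the ground at 15 m/s forward and 8 m/s upward.
path = simulate_flight((0.0, 0.0, 0.0), (15.0, 0.0, 8.0))
landing_time, landing_x, _, _ = path[-1]
print(f"lands after {landing_time:.2f} s, {landing_x:.1f} m downfield")
```

Without drag the same pass would travel about 24.5 m before landing, so the simulated landing point falls noticeably short of that.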
Once the trajectory of the ball is modeled, the AWS team said, its estimated landing point is known. The next step is to calculate who could reach the ball, which is done with a motion model. This motion model estimates the area a player can reach within a pre-defined time window and is largely based on a player’s speed and direction. The model is compared to the movement of players in the previous three seasons of Bundesliga data to understand how players move. The results can be visualized as four circles around each player, representing the area they can reach within 0.5, 1, 1.5, and 2 seconds.
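A very rough Python sketch of the idea follows. The actual motion model is fitted to three seasons of tracking data; here, purely as an assumption, a player is taken to keep their current velocity and to be able to deviate from that path by at most what a constant acceleration of 3 m/s² allows. The function name and the acceleration value are hypothetical.

```python
TIME_WINDOWS = (0.5, 1.0, 1.5, 2.0)  # seconds, as in the visualisation described above


def reachable_circle(pos, vel, t, max_accel=3.0):
    """Centre and radius of the area a player could reach after t seconds.

    The centre drifts with the current velocity; the radius grows with
    the assumed maximum acceleration (0.5 * a * t^2).
    """
    centre = (pos[0] + vel[0] * t, pos[1] + vel[1] * t)
    radius = 0.5 * max_accel * t * t
    return centre, radius


# A player at the origin running at 4 m/s along the x axis.
circles = [reachable_circle((0.0, 0.0), (4.0, 0.0), t) for t in TIME_WINDOWS]
for t, (centre, radius) in zip(TIME_WINDOWS, circles):
    print(f"{t:.1f} s: centre={centre}, radius={radius:.2f} m")
```

Because the centres shift with the player's velocity, a player already sprinting in one direction covers far more ground ahead than behind, which matches the description of a model driven by speed and direction.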
They add, “Each player’s potential movement is computed and compared to the estimated landing location of the ball. Given the assumption that a ball can be controlled when it’s below 1.5 meters in height, we can make an estimated guess of which player could reach the ball first. Now, to estimate the intended receiver of a given pass, we combine the ball trajectory model with the motion model. If we map the trajectory of an unsuccessful pass, we can use our motion model to determine which player could reach the ball first. This player is likely to be the intended receiver of a pass. We can then use this information to add relevant data points (such as the receiver, the location where the ball was controlled, the duration and distance of the pass) for unsuccessful passes to our dataset.”
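The combination described in the quote can be sketched in Python as follows. The trajectory samples and player positions are made up, the search is restricted to the passer's teammates, and the time a player needs to reach a point is approximated as distance divided by an assumed top speed of 7 m/s; only the 1.5-meter control height is taken from the source. The shirt numbers echo the example above, in which player 11 passes towards player 32.

```python
import math

CONTROL_HEIGHT_M = 1.5  # the ball is assumed controllable below this height


def time_to_reach(player_pos, target, max_speed=7.0):
    """Crude stand-in for a motion model: straight line at an assumed top speed."""
    return math.dist(player_pos, target) / max_speed


def intended_receiver(trajectory, teammates):
    """Return the teammate who can first meet the ball at a controllable height.

    trajectory: list of (t, x, y, z) samples of the estimated ball path.
    teammates:  dict mapping player id to (x, y) position at the time of the pass.
    """
    for t, x, y, z in trajectory:
        if z > CONTROL_HEIGHT_M:
            continue  # ball is still too high to be controlled
        arrivals = [
            (time_to_reach(pos, (x, y)), player)
            for player, pos in teammates.items()
        ]
        in_time = [(needed, player) for needed, player in arrivals if needed <= t]
        if in_time:
            return min(in_time)[1]
    return None


# Estimated path of an unsuccessful lofted pass, sampled every 0.5 seconds.
trajectory = [
    (0.0, 0.0, 0.0, 0.0),
    (0.5, 8.0, 0.0, 2.5),
    (1.0, 16.0, 0.0, 3.0),
    (1.5, 24.0, 0.0, 1.2),
    (2.0, 30.0, 0.0, 0.0),
]
teammates = {"32": (26.0, 3.0), "9": (20.0, -15.0)}
receiver = intended_receiver(trajectory, teammates)
print(receiver)
```

In this example player 32 can meet the ball as it drops below 1.5 meters, so that player is recorded as the intended receiver and the pass can be added to the dataset with a receiver, location, duration, and distance.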
They go on to explain how ML can be used to estimate passing difficulty, and then estimate the passing profile of a player and his passing efficiency.
They then describe how the passing profile model is trained and deployed. “The passing profile model is only the tip of the iceberg; behind the scenes we need to account for several important operations, such as continuous training, continuous improvements to the model, continuous deployment of new models, model monitoring, metadata tracking, model lineage, and multi-account deployment. To address these particularities of industrializing ML models, we created training and deployment pipelines. Moreover, looking towards the future development of additional Match Facts, we invested additional time in developing reusable model training and deployment pipelines. These generic pipelines are designed and implemented using the AWS Cloud Development Kit (AWS CDK). Templatizing these pipelines ensures the consistent development of new Match Facts while reducing effort and time to market. Our architecture considers all our three environments: development, staging, and production. Given the experimental nature of model training, the actual training pipeline resides on our development environment. This allows our data and ML engineers to freely work and experiment with new features and analysis.”
AWS said they used historical data of nearly 2 million passes to build an ML model on SageMaker, which computes the difficulty of a pass. The model is based on 26 factors, such as the distance the ball travels or the pressure the passer is under. They conclude, “We’ve shown how to build a reusable model training pipeline and facilitate multi- and cross-account deployments of ML models with the click of a button.”