
Football analytics: a literature analysis from 2010 to 2020

2021

The goal of this study is to present a literature review of analytics in football, specifically of the reference authors in machine learning (ML), in terms of their methods and applicable scopes of study. Football is a field in which decisions have historically been empirical and in which the use of analytics has been growing intensely. The research aims to list relevant academic contributions published between 2010 and 2020 and to compare authors across two subsets: player individual technical skills and team performance. The approach also provides a summary of studies on machine learning methods applied in football. These outcomes contribute to the discussion about football analytics, since the summaries can lead researchers directly to the references reviewed in the thesis for their fields of interest. Results indicate that football analytics offers vast research opportunities with regard to machine learning methods, and high potential for deeper exploration of the team and player perspectives. This study can support and pave the way for further in-depth, targeted investigation of football analytics.

by Leonardo Mendes Serra Fontanive

Dissertation presented as the partial requirement for obtaining a Master's degree in Information Management, Specialization in Knowledge Management and Business Intelligence

NOVA Information Management School
Instituto Superior de Estatística e Gestão de Informação
Universidade Nova de Lisboa

Advisor: Mauro Castelli

January 2021

KEYWORDS

Machine Learning; Football; Sports; Analytics; Knowledge

INDEX

1. INTRODUCTION
1.1. Background and problem identification
1.2. Problem (Research question) / General Objective (main goal)
1.3. Specific Objectives
1.4. Assumptions
1.5. Study relevance and importance
1.6. Methodology
2. LITERATURE REVIEW
2.1. Football
2.1.1. Overall Concept
2.1.2. Player individual technical skills: the complexity, samples, and limitations
2.1.3. Team Performance and the associated challenge
2.1.4. Other Fields
2.2. Football analytics
2.2.1. Concept
2.2.2. Use Cases
3. MACHINE LEARNING MODELS
4. AUTHORS PER FIELD: PLAYER AND TEAM
5. CONCLUSIONS
5.1. Synthesis of the developed work
5.2. Future work
6. BIBLIOGRAPHY

LIST OF FIGURES

Figure 1 - Technical performance and soccer player ratings (Pappalardo, Cintia, Pedreschi, Giannotti, & Barabási, 2017)
Figure 2 - Importance of technical and contextual features to human rating process (Pappalardo, Cintia, Pedreschi, Giannotti, & Barabási, 2017)
Figure 3 - Proposed definitions for skill-related performance in soccer based on a literature search (Aquino, Alves, Fuini, & Garganta, 2017)
Figure 4 - A sports analytics framework (Morgulev, Azar, & Lidor, 2018)
Figure 5 - Model of value creation from business intelligence & Analytics in competitive sports (Caya & Bourdon, 2016)
Figure 6 - Screen capture from Match Vision Studio Premium® software with the categorical matrix used in the study (Borges, Garganta, Guilherme, & Jaime, 2019)
Figure 7 - TeamSense software application developed in Matlab with entry point view; an example of a time-plot variable output (e.g. surface area); an exemplar of photogram from a variable 2D video animation (Frias, 2012)
Figure 8 - Statistical indicators for improving the soccer performance from an individual's perspective (Vanoye, Penna, Parra, & Díaz, 2017)
Figure 9 - Statistical indicators for improving the soccer performance from the team's perspective (Vanoye, Penna, Parra, & Díaz, 2017)
Figure 10 - Schema of the PlayeRank framework. Starting from a database of soccer-logs (a), it consists of three main phases. The learning phase (c) is an "offline" procedure: It must be executed at least once before the other phases since it generates information used in the other two phases, but then it can be updated separately. The rating (b) and the ranking phases (d) are online procedures, i.e., they are executed every time a new match is available in the database of soccer-logs (Pappalardo, Ferragina, Cintia, & Pedreschi, 2019)
Figure 11 - Zone of ball recovered (Santos, et al., 2016)
Figure 12 - Zone of the last pass to shot (Santos, et al., 2016)
Figure 13 - Shot zone (Santos, et al., 2016)
Figure 14 - Distribution of positions per event type (Pappalardo, et al., 2019)
Figure 15 - Player passing networks (Pappalardo, et al., 2019)
Figure 16 - Model used for comparison (Khan & Kirubanand, 2019)
Figure 17 - Left: Illustration of the football game. Right: Strategies of the hand-crafted rule-based agent (Boyd-Graber, He, Kwok, & Daumé, 2016)
Figure 18 - The attack leading up to Barcelona's final goal in their 3-0 win against Real Madrid on December 23, 2017 (Decroos, Haaren, Bransen, & Davis, 2019)
Figure 19 - Value per action (Decroos, Haaren, Bransen, & Davis, 2019)
Figure 20 - The skill requirements (Key Performance Indicators) for the different positions in soccer (Hughes, et al., 2012)
Figure 21 - Example of a passing event as recorded by OPTA (Kröckel, 2019)
Figure 22 - England's offensive sequences ending in a shooting attempt (Kröckel, 2019)
Figure 23 - An example of ball possession chain data. The table shows a part of a ball possession chain dataset, which represents events in the 1st half of a match (Kusmakar S., et al., 2016)
Figure 24 - Validation of the proposed approach on the largest open collection of soccer logs from 7 major competitions (Kusmakar S., et al., 2020)

LIST OF TABLES

Table 1 - Key performance indicators (Pratas, Volossovitch, & Carita, 2018)
Table 2 - For each competition, described the corresponding geographic area, the total number of seasons, matches, events, and players. (*) 21,361 indicates the number of distinct players, as some players play with their teams in both national and continental/international competitions (Pappalardo, Ferragina, Cintia, & Pedreschi, 2019)
Table 3 - ML Summary
Table 4 - Algorithms performance on match data (number of attributes=336) (Kumar, 2013)
Table 5 - Algorithms performance with selected attributes (threshold=0.07, number
Table 6 - Top 34 highest gain-ratio team attributes (Kumar, 2013)
Table 7 - Evaluation of logo based replay using SVM and NN (Zawbaa, Hassanien, & El-Bendary, 2011)
Table 8 - Confusion matrix for event detection and summarization (Zawbaa, Hassanien, & El-Bendary, 2011)
Table 9 - An example of ball possession chain data. The table shows a part of a ball possession chain dataset, which represents events in the 1st half of a match (Kusmakar, et al., 2016)
Table 10 - The predictive performance of the developed machine learning models. Shown are the segments predicted in favor of a team with the overall prediction accuracy, the predicted winner, and the true match results (Kusmakar, et al., 2016)
Table 11 - Authors Summary per field: Player and Team

1. INTRODUCTION

1.1. BACKGROUND AND PROBLEM IDENTIFICATION

Football has been moving toward a data-driven approach to decision-making. Following this trend, data research has been emerging in clubs, and interest in data is growing.
Also, with the increase of Internet of Everything technologies in football, new sources of data are expanding the opportunities for analysis. Combining data, technology, and football, Américo (2013) points out that innovation introduced in football led to the creation of game analysis systems. These systems analyze players' moves collectively and individually to obtain statistics that support coaches' plans during games. Fundamentally, this approach uses sophisticated techniques and algorithms to handle important information about broad aspects of the game, such as the aptitude and tactical capacity of the players and teams involved.

According to De Silva et al. (2018), it is possible to apply data analytics from specific football perspectives, and balancing player indicators would be a key success factor. For instance, tracking in football would enable monitoring and, consequently, make it possible to determine whether an athlete is performing as expected in matches and training sessions.

Pires & Santos (2018) present a study stating that sports results increasingly lie in details that can be noticed with the use of technology or devices that can make the difference. Sports are also part of the big data era: large amounts of data are collected that can be applied for analysis, and sports technology is in constant growth and development, as seen in the increasing role of science in sport.

In line with the above findings, Memmert et al. (2011) conclude that many experts regard tactics as the factor that gets the least attention in the training process, and that work on those deficits is heavily empirical rather than based on a scientific analysis of football-specific group tactics.
Exploring the scenario more broadly, the studies of Mohammadi & Sorour (2018) and Rein & Memmert (2016) also illustrate the effects of machine learning on jobs and the workforce in general, given that knowledge work automation addresses creative problem-solving. Furthermore, ML techniques are enablers for big data analytics and knowledge extraction. Novatchkov & Baca (2013) summarize that gathering data in sports makes it possible to gain insight into significant characteristics. These inputs can be applied to develop intelligent methods adapted from conventional machine learning concepts, allowing automatic assessment and providing appropriate feedback. In practice, the implementation of such techniques could be crucial for investigating the quality of execution and for assisting athletes as well as coaches.

In the literature reviewed, there is a substantial group of machine learning methods for football analytics. Also, among all the perspectives that involve the game, player individual technical skills and team performance appear to be contexts that can yield interesting results, given the lack of a scientific decision-support approach.

1.2. PROBLEM (RESEARCH QUESTION) / GENERAL OBJECTIVE (MAIN GOAL)

Data analytics is like a storm that is gathering in football. It is one that will wash away all certainties and change the game we know. Football analytics has not progressed as far as it could. Every piece of shared knowledge will help us love the game more. Analytics in football is the newest frontier. It is not just a matter of collecting data. You have to know what to do with it (Anderson & Sally, 2013). Also, technology has been changing the data collection process in high-performance sport.
Increasingly, the successful athletes and teams will be those supported by a strong analytical capability to produce the analysis required to create competitive advantage (Schulenkorf & Frawley, 2017).

Building on the background and problem identification, the research aims to present relevant academic contributions published between 2010 and 2020 and to compare authors across two subsets: player individual technical skills and team performance. Furthermore, the approach will provide a summary of studies on machine learning methods applied in football. This general objective is expected to be reached by providing summaries and a basis for discussion of football analytics in terms of methods and of the referred subsets per author.

1.3. SPECIFIC OBJECTIVES

The primary goal of this project can be divided into the following objectives, in order:

1. To investigate and study relevant academic contributions published between 2010 and 2020 on machine learning methods applied to football analytics and the respective fields: player individual technical skills and team performance;
2. To evaluate and discuss the most relevant ML models per author in order to produce a summary;
3. To evaluate and discuss the most relevant authors per chosen field in order to produce a summary;
4. To give future directions for the next steps using this study.

1.4. ASSUMPTIONS

The present research will draw on references to build summaries about the subject of football analytics. The proposal is to target the relevant authors and conduct a literature review, not a comparison between algorithms with regard to accuracy or to the weight of each subset in the football context.

1.5. STUDY RELEVANCE AND IMPORTANCE

Regarding relevance and importance, coaches can also take advantage of the data extracted from players' performance. They are moving toward integrating analytics more dynamically.
Decisions may use technology to support and justify them. Coaches can make tactical decisions and also help scouts decide which players their team should recruit. Coaches have access to live analytics of the players' physical condition and overall performance, made available by the analytics department, which also works in real time in the stands (Korte, 2014).

In the same vein, analytics has influenced tactics in professional baseball and basketball in recent years. Ultimately, it may have just as great an impact on football, which traditionally has not relied on statistics to figure out much of anything. In 2019, Liverpool, a recognized English team that invests heavily in analytics, used data for big decisions, which supported its run to the UEFA Champions League final (Schoenfeld, 2019).

Regarding the economic impact, the market for sporting events was worth $80 billion in 2014, with impressive growth projected for the foreseeable future. For a content industry wracked with uncertainty, sports analytics is a beacon of hope. On a sport-by-sport basis, growth occurred nearly across the board, but football remains the runaway leader. Football revenues increased from $25.1 billion in 2009 to $35.3 billion in 2013 (Collignon, 2019).

The work of Morgulev, Azar, & Lidor (2018) provides an introduction that underlines the big data characteristics of sport as a uniquely authentic arena for exploring research ideas and an excellent source for analytics, specifically concerning certain contexts of human behavior. Additionally, Kumar (2013) notes that the performance analytics literature related to football shows that most research is done with few performance variables and depends on understanding the structure of the game. The possible interaction of multiple factors adds to the complexity of football match analysis.
Thus, it is necessary to improve the contribution in terms of studies and cases.

1.6. METHODOLOGY

The design of the research is descriptive. The first step consolidates a literature review about football analytics. The second step evaluates and discusses the most relevant ML models per author to produce a summary based on the literature analysis. The third step identifies a set of relevant authors in the following fields: player individual technical skills and team performance. Finally, the results and discussion are presented, with a conclusion and recommendations for further research.

2. LITERATURE REVIEW

2.1. FOOTBALL

2.1.1. Overall Concept

Ali (2011) describes football as the premier spectator sport in the world; due to its growing popularity and the amount of financial interest in the game, it is one of the most extensively researched intermittent team sports. Many subject areas have taken advantage of scientific knowledge gained from football, including the natural and physical sciences, medicine, and the social sciences.

From a timeline perspective, Doidge et al. (2019) show that football has undergone an economic transformation, with a significant change in how football clubs are managed and an increasing focus on commercial and media growth. Clubs have also been investing in stadiums and branding, and sponsorship and contracts have grown sharply in pursuit of global audience penetration. For instance, the Premier League and the Champions League were formed in 1992 and have acted as economic models for other leagues, becoming successful and pushing the economic transformation further.
Adding to the conceptualization of football, FIFA (2019), the world governing body, states that football has been part of our communities since its birth. It is more than a game, more than a sport; it is a way of life that we all embrace, regardless of nationality, creed, ethnicity, education, gender, or religion. It is also about supporting the growth and development of football by promoting the integrity and quality of the game for everyone, not just for today but for generations to come.

Bridging to the subject of this study, Ali (2011) notes that football is a complex sport, requiring the repetition of many disparate actions. For instance, several tests are currently used to assess the physical prowess of players, including simple running tests that monitor speed, agility tests, and repeated sprint performance. In line with Ali (2011), De Silva et al. (2018) present the performance management of top football players as a complex system involving enhancement of physical performance, skill-based training, tactical training, minimization of injury risk, and psychological support. Managing practice is vital to allowing players to perform at an optimal level throughout the length of a season.

Constantinou & Fenton (2017) agree with Ali (2011) and FIFA (2019) that football is the most popular sport in the world, which has inspired several researchers to use football as a real-world application field to test various statistical, probabilistic, and machine learning techniques.

Continuing to explore this complexity, Qing et al. (2020) note that football is influenced by many factors, such as technical, tactical, mental, and physiological ones. However, as matches are highly complex and dynamic, other aspects need to be addressed as situational variables, such as match location, team quality, quality of opposition, and match outcome.
It is also important to underline other dynamics inherent to the game, such as interactions between players and positions.

2.1.2. Player individual technical skills: the complexity, samples, and limitations

Football players' technical skills are directly connected with data and performance. Caya & Bourdon (2016) propose that Business Intelligence & Analytics techniques can serve individual athletes well in their pursuit of positive achievements. Individual athletes are also eager to improve their athletic performance in their respective sport and aspire to excel at what their sport demands in terms of physical and competitive accomplishments.

As a practical example relating football players' technical skills, data, and performance, Spearman & Basye (2017) present a model for ball control in football based on two concepts: how long it takes a player to reach the ball (time-to-intercept) and how long it takes a player to control the ball (time-to-control). Players could thus gain an advantage from this physics-based modeling of pass probabilities applied to the field, given that these metrics are constructed at the per-player level.

Additionally, Figure 1, from Pappalardo, Cintia, Pedreschi, Giannotti, & Barabási (2017), gives another sample: the events produced by a player during a match, which are relevant in terms of player individual technical skills. The study uses as a source games followed by reporters from three sports newspapers, who then assign an individual player rating according to their personal interpretation of each player's performance.
Figure 1 - Technical performance and soccer player ratings (Pappalardo, Cintia, Pedreschi, Giannotti, & Barabási, 2017)

Furthermore, Pappalardo, Cintia, Pedreschi, Giannotti, & Barabási (2017) point out a caveat to consider before analysis: human evaluators attend to only a limited set of features when constructing their evaluation. An important step is therefore to understand how the human evaluation process can be supported by data science and artificial intelligence. In Figure 2, the charts indicate the importance of every attribute, normalized in the range [0, 1], to the human rating process for the typical football positions: Goalkeepers (a), Defenders (c), Midfielders (d), and Forwards (b). As the plots obtained with machine learning models indicate, most of the features have a negligible influence on the human judges' evaluation process.

Figure 2 - Importance of technical and contextual features to human rating process (Pappalardo, Cintia, Pedreschi, Giannotti, & Barabási, 2017)

Continuing with the importance of context, Aquino, Alves, Fuini, & Garganta (2017) identify a lack of definition and classification of skill-related variables. Two additional limitations were detected: the omission of the contextualization of the sample and of the influence of match situational variables (e.g. location, quality of opponent, status); and the absence of representative task design to measure skill-related performance. Figure 3 shows the definitions proposed by the research according to the literature.
Figure 3 - Proposed definitions for skill-related performance in soccer based on a literature search (Aquino, Alves, Fuini, & Garganta, 2017)

Moreover, Aquino, Alves, Fuini, & Garganta (2017) emphasize that a fundamental task in sports science and performance analysis is to understand the relationship between skill acquisition and the development of players to achieve sports excellence. Hence, it is essential to develop theoretical principles to guide the design of skill acquisition programs. Improvements in decision-making and regulation of action in dynamic environments, such as football, emerge from continuous performer-environment interactions.

2.1.3. Team Performance and the associated challenge

While the previous topic addressed player technical skills in terms of complexity, samples, and limitations, this study also addresses another perspective: the team as a separate entity for performance analysis.

In support of this, Pratas, Volossovitch, & Carita (2018) report that, in football match analysis research, goal-scoring trends have been analyzed from two different perspectives: the static and the dynamic. The inherent randomness of football makes analysis even more impactful, and what makes a difference is perhaps not the data itself but the ability to use the data to formulate a theory that explains how a team increases its chances of winning. Furthermore, some relevant performance metrics correlated with goal-scoring may be appropriate in one context of the game but likely insignificant in another, depending on several factors related to the quality of the teams and the style of play. Thus, football analytics holds that performance indicators are relevant, and the problem is how the importance of different performance metrics varies with context.
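Many of the team performance indicators discussed in this section reduce to simple ratios over match events. As a minimal sketch, assuming a hypothetical record of per-match team totals (the class, field names, and numbers below are illustrative and do not come from any of the cited studies), scoring efficiency can be computed as goals divided by shots and pass accuracy as completed passes divided by attempted passes:

```python
from dataclasses import dataclass


@dataclass
class TeamMatchStats:
    """Hypothetical per-match team totals; field names are illustrative."""
    goals: int
    shots: int
    passes_attempted: int
    passes_completed: int


def safe_ratio(numerator: int, denominator: int) -> float:
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def scoring_efficiency(stats: TeamMatchStats) -> float:
    """Goals divided by shots."""
    return safe_ratio(stats.goals, stats.shots)


def pass_accuracy(stats: TeamMatchStats) -> float:
    """Completed passes divided by attempted passes."""
    return safe_ratio(stats.passes_completed, stats.passes_attempted)


# Invented numbers, for illustration only.
example = TeamMatchStats(goals=2, shots=10, passes_attempted=400, passes_completed=340)
print(scoring_efficiency(example))  # 0.2
print(pass_accuracy(example))       # 0.85
```

Returning 0.0 when a team has no shots or no passes is a convenience choice of this sketch; an analysis might instead treat such matches as missing values so that they do not distort averages.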
Continuing with the challenge of analyzing team performance, Lepschy, Woll, & Wäsche (2018) note that, despite the popularity of football and the availability of reviews on performance indicators, none focuses solely on the identification of success factors. Their review found that the most significant variables are efficiency (the number of goals divided by the number of shots), shots on goal, ball possession, pass accuracy/successful passes, as well as quality of opponent and match location.

Following again Pratas, Volossovitch, & Carita (2018), Table 1 lists the characteristics that can be analyzed as key performance indicators for goal scoring:

Pass accuracy
Number of passes
Temporal analysis
Duration of possession
Scoring efficiency
Number of passes
Types of passes
Game situation
Zones in which possessions started
First and next goals
Playing style
Space-time coordination
Scoring efficiency
First goal
Areas from which goals were scored

Table 1 - Key performance indicators (Pratas, Volossovitch, & Carita, 2018)

Sarmento, Campanico, & Marcelino (2014) give an overview of team analysis, indicating relationships between physical activity patterns and the efficacy of game actions (involvements with the ball, successful passes, dribbling, shots, and shots on target). High-intensity activity patterns can be a key success factor for team performance: compared with less successful teams, players of more successful teams covered greater total distances with the ball and at very high-intensity running, had a higher average of goals per total shots on target, and performed more actions with the ball, with higher numbers of passes, tackles, dribbles, and shots on target.

2.1.4. Other Fields

Beyond the scope of this study, football spans many other fields. According to Morgulev, Azar, & Lidor (2018), the sports betting market is one of the largest sports business sectors, in which consumers and suppliers try to predict the results of future events correctly. Scientists therefore continue to develop a variety of forecasting models built with different methodologies. Such models concentrate on predicting the outcomes of individual matches or of tournaments. Due to the extremely competitive nature of the gambling industry, it has become an arena for progress in computing and machine learning, with cutting-edge predictive analytics.

Klyuchka, Cherednichenko, Vasylenko, & Yakovleva (2015) aimed to find the most important factors that are not confidential and can be easily determined before the start of a football match. Their research shows that forecasting rules are used to increase the accuracy of predicting football match results by identifying the winning team based on data from the results of previous championship games, adding substantial factors to understand their influence on results.

The football codes are intermittent team sports with bursts of high-intensity action intermixed with low-intensity activity and rest. A variety of stresses imposed on intermittent team sport athletes contribute to temporary, acute, or chronic fatigue. Fatigue is dynamic and multifactorial and depends on various contextual factors such as physical ability, technical abilities, playing role, training load, the importance of the game, and the period of the season. The number of competitive matches per season is often very high; thus, between training sessions and competition, athletes have only a short period to recover.
There is evidence that too many matches can result in a lack of motivation and mental burn-out, a decline in physical and match quality, and an increase in injuries. Recovery approaches are therefore required to relieve fatigue, recover efficiency, and reduce injury risk (Clarke & Noon, 2019).

Studies such as (Morgulev, Azar, & Lidor, 2018), (Klyuchka, Cherednichenko, Vasylenko, & Yakovleva, 2015), and (Clarke & Noon, 2019) are outside the scope of this research, but it is important to note that, conceptually, other fields can be combined with it to address team and individual performance in more depth, especially injury prevention according to (Clarke & Noon, 2019).

2.2. FOOTBALL ANALYTICS

2.2.1. Concept

Moving forward with football performance in terms of team and individual player skills, it is important to connect it with the concept of football analytics. (Babbar, 2019) shows that sport has progressed from being just a game to involving science. That study treats sports analytics as a combination of collecting data, forecasting the game, and using tools and techniques to interpret game strategy in order to improve performance, both for the individual player and for the team. Hence, sports analysis is expected to foster many new applications for end users, sports coaches, and sports managers. The analytical goals in these applications include comparison of results, prediction, and behavioral correlation of attributes between players and teams. Moreover, information can be either quantitative or qualitative and is usually collected from athletes' biographical data, performance, medical reports, and scouting reports.
Following the rationale of (Babbar, 2019), football analytics can provide reliable and systematic data that help athletes and coaches improve their decisions. To put this kind of initiative in place, real-time systems can be used to find key analysis points by capturing the position of the ball and the movement of players throughout the game; combining these data with advanced statistical algorithms and software enables coaches and managers to alter their tactics and gain the upper hand over the competitor.

According to (Morgulev, Azar, & Lidor, 2018), sports analytics can be described as a framework: historical data, either quantitative or qualitative, are typically collected from multiple sport-relevant sources and are then standardized, centralized, integrated, and analyzed using different metrics. It is thus assumed that a reliable and systematic analysis of the data will enable different stakeholders to strengthen their decision-making processes. A sports analytics framework is described in the following figure.

Figure 4 – A sports analytics framework (Morgulev, Azar, & Lidor, 2018)

In this context, and following the study of (Morgulev, Azar, & Lidor, 2018), teams in the English Premier League (EPL) have become advanced in performance analytics. However, compared with basketball, for instance, assessing players' scoring skills in football is slowed down by the low frequency of scoring events. Tactical factors, such as the number and length of possessions, passing sequences, and spatial analysis of the territory played, are aggregated to optimize performance. A specific example of how players and coaches may benefit from the assessment of large samples of events is the information, combined with probabilities, on the directions of penalty shots, which analysts derive from the shooters' previous statistics and provide to goalkeepers before critical matches.
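The penalty example can be made concrete with a small sketch. It is only an illustration under stated assumptions: the shot history is invented, the three direction labels and the function name `direction_probabilities` are hypothetical, and the additive smoothing controlled by `alpha` is a choice made here, not something described by (Morgulev, Azar, & Lidor, 2018).

```python
from collections import Counter


def direction_probabilities(past_directions,
                            directions=("left", "centre", "right"),
                            alpha=1.0):
    """Estimate how likely a shooter is to aim in each direction.

    Uses relative frequencies with additive smoothing (alpha), so a
    direction the shooter has never chosen still gets a small probability.
    The smoothing is an assumption of this sketch.
    """
    counts = Counter(past_directions)
    total = len(past_directions) + alpha * len(directions)
    return {d: (counts[d] + alpha) / total for d in directions}


# Hypothetical history of one shooter's previous penalties.
history = ["left", "left", "right", "left", "centre", "right", "left"]

probs = direction_probabilities(history)
most_likely = max(probs, key=probs.get)
print(probs, most_likely)
```

With so few observations the estimate is very uncertain, which is why the smoothing term is included; an analyst would normally also take the goalkeeper, the match situation, and the recency of each shot into account.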
As football is a competitive sport, an integrative framework was developed by (Caya & Bourdon, 2016) in which the potential value of a Business Intelligence and Analytics project is tied to the level at which the value of these investments is expected to materialize. While this framework provides a very high-level representation of how and where this kind of project generates value in competitive sports, it also gives more precise examples of value creation at each level of analysis (institutional, organizational, and individual). Each “sub-model” gives specific detail about the nature of Business Intelligence and Analytics initiatives, along with particular conversion contingencies and measures of the value created by those initiatives.

Figure 5 – Model of value creation from business intelligence & Analytics in competitive sports (Caya & Bourdon, 2016)

2.2.2. Use Cases

To make the concepts of football analytics tangible, this sub-section presents a compilation of examples that show their applicability. There are seven studies, each with a different perspective on football analytics: tactical efficacy, offensive behaviors, defensive playing method, statistical indicators for improving player performance, a data-driven framework to evaluate player performance, analysis of match situations that finished in a goal, and the distribution of the positions of events.
The first example is the study of (Borges, Garganta, Guilherme, & Jaime, 2019), which addresses the tactical efficacy and offensive game processes adopted by Italian and Brazilian youth soccer players. It examined team performance with the following scope: 218 offensive actions selected from 28 matches, comprising 18 matches of the Italian U-15 team in the Italian championship, season 2015/2016, and 10 matches of the Brazilian team (5 matches U-15 and 5 matches U-17) in the national and state championships, season 2016. Matches were randomly selected throughout the season. The research collects data through observational analysis using Match Vision Studio Premium®, software that enables the researcher to create a categorical matrix according to the variables to be analyzed.

Figure 6 - Screen capture from Match Vision Studio Premium® software with the categorical matrix used in the study (Borges, Garganta, Guilherme, & Jaime, 2019)

(Borges, Garganta, Guilherme, & Jaime, 2019) considers offensive sequences that ended in shots and describes them by the following variables: number of players involved, ball touches, passing, duration, and corridor change. It also classifies offensive actions into three types: counter-attack, quick attack, and positional attack. In this context, the research suggests that all the offensive methods adopted can be used to achieve success in games of U-15 and U-17 soccer players.

The research of (Castelão, Garganta, Afonso, José, & Costa, 2015) aims to analyze offensive behaviors performed by six national football teams that were involved in the finals of the 2006 World Cup and the 2004 and 2008 Euro Cup.
The study of (Castelão, Garganta, Afonso, José, & Costa, 2015), supported by the software SDIS (Sequential Data Interchange Standard) and GSEQ (Generalized Sequential Querier), uses sequential analysis by the lag method to verify the different offensive game patterns and analyzes 647 offensive game sequences. Within the bias and sample explored, it shows that top-performing football teams can use different patterns and methods of offensive play and still be victorious. Furthermore, no pattern was found to be more effective than the others for any specific offensive behavior.

Continuing with football analysis from a group perspective, (Frias, 2012) shows that changes in the defensive playing method influence the collective behavior of football teams, using the Team AMS software application and TeamSense as support tools. The study aimed to understand the influence of specific performance constraints on the actions of teams; its main goal was to analyze the influence of the defensive method (zone vs. man-to-man) on the collective performance of football teams. The research analyzes two small-sided games played by two teams of 6 players (5 outfield players plus a goalkeeper), both using zone defense in the first experimental condition and man-to-man defense in the second. The collective performance of the teams was captured by 4 collective variables: surface area, stretch index, length per width ratio, and the distance between the teams' centers.

Figure 7 - TeamSense software application developed in Matlab with entry point view; an example of a time-plot variable output (e.g. surface area); an exemplar of photogram from a variable 2D video animation (Frias, 2012)

(Frias, 2012) concludes that the changes imposed on the collective behavior of a team by adopting a different defensive method can be smaller than the differences between the two teams themselves.
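The four collective variables named above can be illustrated with a short sketch. The definitions used here are assumptions based on how these measures are commonly defined, since the text does not spell them out: surface area as the area of the convex hull of the outfield players, stretch index as the mean distance of the players to the team centroid, length per width ratio as the team's extent along the goal-to-goal axis (taken here to be x) divided by its lateral extent (y), and teams' centers distance as the distance between the two centroids. The player coordinates are invented, and this is not the TeamSense implementation.

```python
from math import hypot


def centroid(points):
    """Geometric center of a list of (x, y) player positions."""
    n = len(points)
    return (sum(x for x, _ in points) / n, sum(y for _, y in points) / n)


def stretch_index(points):
    """Mean distance from each player to the team centroid."""
    cx, cy = centroid(points)
    return sum(hypot(x - cx, y - cy) for x, y in points) / len(points)


def length_per_width(points):
    """Team length (x extent) divided by team width (y extent)."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return (max(xs) - min(xs)) / (max(ys) - min(ys))


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices counter-clockwise."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def surface_area(points):
    """Area of the convex hull, computed with the shoelace formula."""
    hull = convex_hull(points)
    twice_area = 0.0
    for i in range(len(hull)):
        x1, y1 = hull[i]
        x2, y2 = hull[(i + 1) % len(hull)]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2


def centers_distance(team_a, team_b):
    """Distance between the centroids of the two teams."""
    (ax, ay), (bx, by) = centroid(team_a), centroid(team_b)
    return hypot(ax - bx, ay - by)


# Hypothetical positions (in meters) of 5 outfield players per team.
team_a = [(0, 0), (4, 0), (4, 2), (0, 2), (2, 1)]
team_b = [(6, 0), (10, 0), (10, 2), (6, 2), (8, 1)]

print(surface_area(team_a), stretch_index(team_a),
      length_per_width(team_a), centers_distance(team_a, team_b))
```

Evaluating these functions frame by frame would give time series of the kind shown in Figure 7, which can then be compared between the zone and man-to-man conditions.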
The previous study points to a balance in prioritization between defensive tactics and opponent analysis. From the perspective of improving performance, the paper of (Vanoye, Penna, Parra, & Díaz, 2017) proposes using metrics or statistical indicators to improve football performance. It rates individual errors with negative points and combines them into a metric called the Motivation Index, or lack of motivation. The errors, which significantly affect the soccer performance of the team, are: goal shots off target, no goals from direct or indirect free-kicks, unsuccessful dribbles, caught opposition offside, unsuccessful shots from direct or indirect free-kicks, headed shots off target, shots off target, unsuccessful long/short passes, incorrect pass directions, incorrect pass lengths, incorrect pass locations, duels lost in offense/defense, aerial duels lost in offense/defense, own goals, penalties conceded, defensive mistakes, fouls committed, corner crosses and direct or indirect free-kicks conceded, throw-ins conceded, yellow or red cards, being substituted off, and others.

To go further with this index, the study of (Vanoye, Penna, Parra, & Díaz, 2017) uses a European football match to obtain the motivation index and thereby determine the relationship of the index with the outcome of the match. The software NacSports supports the performance analysis, and the indicators are organized from the individual perspective, as in figure 8, and from the team perspective, as in figure 9.
Figure 8 - statistical indicators for improving the soccer performance from an individual’s perspective (Vanoye, Penna, Parra, & Díaz, 2017)

Figure 9 - statistical indicators for improving the soccer performance from the team’s perspective (Vanoye, Penna, Parra, & Díaz, 2017)

After the experimentation stage, (Vanoye, Penna, Parra, & Díaz, 2017) concludes that small individual errors affect the motivation state of the players, which in turn affects the result of the match. Hence, motivation indicators can be a key tool for making corrections during the match and avoiding a defeat caused by negative values of those indicators.

Among the different supporting structures for analysis in football, the work of (Pappalardo, Ferragina, Cintia, & Pedreschi, 2019) defines PlayeRank, a data-driven framework that offers a principled, multi-dimensional, and role-aware evaluation of the performance of football players. It was applied to a massive dataset of soccer-logs: millions of match events from four seasons of 18 prominent soccer competitions. The framework consists of 3 phases starting from a database of soccer-logs: the rating phase, in charge of computing the performance ratings; the ranking phase, in which PlayeRank assigns a player to a position according to a set of rules and based on the players' ratings; and the learning phase, which generates information used in the rating and ranking phases by performing two steps, weighting and role detector training.

Figure 10 - Schema of the PlayeRank framework. Starting from a database of soccer-logs (a), it consists of three main phases. The learning phase (c) is an “offline” procedure: it must be executed at least once before the other phases since it generates information used in the other two phases, but then it can be updated separately. The rating (b) and the ranking phases (d) are online procedures, i.e., they are executed every time a new match is available in the database of soccer-logs (Pappalardo, Ferragina, Cintia, & Pedreschi, 2019)

The dataset of (Pappalardo, Ferragina, Cintia, & Pedreschi, 2019) covers a total of 64 soccer seasons and more than 31 million events. It was provided by Wyscout, a leading company in the football industry that connects soccer professionals worldwide and supports more than 50 soccer associations and more than 1,000 professional clubs around the world. The table below provides the details.

Table 2 - For each competition, the corresponding geographic area and the total number of seasons, matches, events, and players. (*) 21,361 indicates the number of distinct players, as some players play with their teams in both national and continental/international competitions (Pappalardo, Ferragina, Cintia, & Pedreschi, 2019)

With PlayeRank, (Pappalardo, Ferragina, Cintia, & Pedreschi, 2019) also observe that top performances are rare and unevenly distributed, since a few top players produce most of the performances considered excellent. A result worth highlighting is that top players do not always play excellently; they just achieve top performances more frequently than other players. PlayeRank can also be seen as a valuable tool to support professional football scouts in evaluating, searching, ranking, and recommending soccer players.

Under team performance analysis, the research of (Santos, et al., 2016) focuses on match situations that finished in a goal: 557 goals were analyzed from 10 teams across Portugal, Spain, England, and Germany.
Only goals for which the sequence of actions from the moment of ball possession could be obtained were considered. The study used the Football Goal Observation System and found that more goals come from balls recovered through a lost ball in the offensive zone and offensive midfield areas, with the last pass made in the offensive sector zones, through counterattack, and with the shot taken within the penalty area, with the right foot in 53.17% of the sample and the left foot in 28.83%. These results are shown in the next three figures.

Figure 11 - Zone of ball recovered (Santos, et al., 2016)

Figure 12 - Zone of the last pass to shot (Santos, et al., 2016)

Figure 13 - Shot zone (Santos, et al., 2016)

The paper of (Pappalardo, et al., 2019) describes an open collection of soccer-logs covering seven men's football competitions, provided by Wyscout; the data were used in the football Data Challenge initiative (https://sobigdata-soccerchallenge.it/). The soccer-logs describe match events in detail, each containing information such as the event type (pass, shot, foul, tackle), a time-stamp, the player(s) involved, the position on the field, pass accuracy, and other relevant data. Like the other studies presented above, (Pappalardo, et al., 2019) explores football analytics use cases concerning player performance in the team context, as in the two figures described below. The first presents the distribution of positions per event type, plotting the positions of the events during the match: the darker the green, the higher the number of events in a specific field zone. The same figure also shows the distribution of pass positions during a match for each player role: the darker the color, the higher the number of passes in a specific field zone.
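The zone-based distribution described above amounts to counting events per cell of a grid laid over the pitch. The sketch below shows one way to do that. It assumes that positions are given as (x, y) pairs on a 0 to 100 scale along each axis; the grid size, the function name `zone_counts`, and the sample positions are all hypothetical.

```python
def zone_counts(positions, n_cols=5, n_rows=3, x_max=100.0, y_max=100.0):
    """Count events per pitch zone on an n_rows x n_cols grid.

    positions: iterable of (x, y) pairs with 0 <= x <= x_max and
    0 <= y <= y_max (an assumed coordinate convention).
    Returns a list of rows, each a list of per-zone counts.
    """
    grid = [[0] * n_cols for _ in range(n_rows)]
    for x, y in positions:
        # Events exactly on the far boundary fall into the last zone.
        col = min(int(x / x_max * n_cols), n_cols - 1)
        row = min(int(y / y_max * n_rows), n_rows - 1)
        grid[row][col] += 1
    return grid


# Hypothetical event positions from one match.
positions = [(10, 10), (12, 20), (55, 50), (99, 99), (100, 100), (50, 40)]

grid = zone_counts(positions)
for row in grid:
    print(row)
```

Shading each cell in proportion to its count gives a plot of the "darker means more events" kind; filtering the events by type or by player role before counting gives one grid per event type or per role.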
The second figure shows the player passing networks of the match between Napoli and Juventus, Italian first division, where each node is a player and edges represent passes between players. The size of a node reflects the number of ingoing and outgoing passes, while the size of an edge is proportional to the number of passes between the two players.

Figure 14 - distribution of positions per event type (Pappalardo, et al., 2019)

Figure 15 - Player passing networks (Pappalardo, et al., 2019)

3. MACHINE LEARNING MODELS

To evaluate and discuss the relevant ML models per author, the study summarizes all the individual studies reviewed in Table 3, tracking key findings in terms of authors, sample or dataset, methods or algorithms, and key outputs. The studies are mixed, with different datasets and algorithms. Each has a specific approach and achievement, but instead of evaluating isolated variables and accuracy, this chapter aims to report useful methods that have produced results in previous investigations, to support a better understanding of behaviors, and to offer key reference authors as a starting point for further research.

Starting with the research of (Kumar, 2013), it helps to find the attributes that are relevant for assessing players' performance and that most influence the game. Kumar explored different methods, and the analysis serves as a starting point for analyzing both specific characteristics and the different machine learning algorithms applicable to football within a previously explored study.
Following Table 3, the second row is the research of (Khan & Kirubanand, 2019). It compares the performance of two algorithms, XGBoost and SVM, with the purpose of measuring accuracy in match prediction; within the context of this work, the study points to algorithms that behave well with football variables. Next in the table is the study of (Boyd-Graber, He, Kwok, & Daumé, 2016), which uses machine learning to understand the opponent's behavior and adapt strategy based on predictions from the opponent's parameters.

Among the studies presented, (Zawbaa, Hassanien, & El-Bendary, 2011) shows the potential of algorithms such as support vector machine (SVM) and neural network (NN) for analyzing football performance. The paper builds on a video summarization system and demonstrates that high accuracy and precision can be achieved. It is another preliminary analysis that can be used as a reference for research within football.

The fifth row of the summary lists the study of (Kusmakar, et al., 2016). It analyzes the pattern-forming dynamics of player interactions, which can improve the understanding of tactical behavior. The study also explores quantitative measures of a team’s performance, focused on player interactions. Moreover, the research points toward a machine learning-enabled approach for automated predictive analysis of performance.

Study / Author: (Kumar, 2013)
Sample / Dataset: The dataset of player performances for EPL released by OPTA contained 210 attributes and 10369 instances. Out of the 210 attributes of players, 198 attributes were performance statistics while the others were identifiers for the player for that match. The dataset released by OPTA did not contain match outcomes. The dataset did not contain Own Goals, the goals scored by a player against his team. The data for Own Goals scored by each player in each match of the tournament was fetched from WhoScored.com.
Methods / Algorithms: Multilayer Perceptron; Functional Trees; Sequential Minimal Optimization; Naive Bayes; Random Forest; Decision Table; Fuzzy Unordered Rule Induction Algorithm; J48Graft; J48; JRip; REP Tree; LibSVM; KStar; AdaBoostM1 with Functional Tree.
Key Outputs: The 59 attributes obtained with a positive gain ratio are the ones affecting match outcome. With the applied algorithms, it concluded that the top 34 attributes characterize the match outcome to a satisfactory extent.

Study / Author: (Khan & Kirubanand, 2019)
Sample / Dataset: The dataset selected contained features such as the number of goals scored by the home team, the number of goals scored by the away team, shots taken by the home team, shots taken by the away team, home team points, away team points, a variety of betting odds, and finally the full-time result. The datasets collected were from the year 2000 to 2013.
Methods / Algorithms: XGBoost; SVM.
Key Outputs: Ensemble learning can be a better choice than SVM when trying to predict the results in this field.

Study / Author: (Boyd-Graber, He, Kwok, & Daumé, 2016)
Sample / Dataset: A multi-agent game played on a 6x9 grid by two players, which simulates movements and situations of a football environment.
Methods / Algorithms: DRON (Deep Reinforcement Opponent Network); DQN.
Key Outputs: DQN-world is confused by the defensive behavior and significantly sacrifices its performance against the offensive opponent; DRON achieves a much better trade-off, retaining rewards close to both upper bounds against the varying opponent.

Study / Author: (Zawbaa, Hassanien, & El-Bendary, 2011)
Sample / Dataset: Five videos of soccer matches from World Cup Championship 2010, Africa Championship League 2010, Africa Championship League 2008, European Championship League 2008, and Euro 2008.
Methods / Algorithms: ANN; SVM.
Key Outputs: Compared to the performance results obtained using the SVM classifier, the proposed system attained good NN-based performance results concerning recall ratio; however, it attained poor NN-based performance results concerning precision ratio. Accordingly, it has been concluded that using the SVM classifier is more appropriate for soccer video summarization than the NN classifier.

Study / Author: (Kusmakar, et al., 2016)
Sample / Dataset: A dataset from a season of the Major League Soccer division of the United States and Canada. The dataset consists of the possession chain data from 13 matches. The interaction information (possession chain) comprises the time and duration of all ball passes and tackles between players. The dataset also includes the nature of the interaction, which can be categorized as being between teammates or between opposing players. The positional information includes the x-y position of all individuals throughout the entire match.
Methods / Algorithms: NN; SVM.
Key Outputs: A machine learning-enabled approach for automated predictive analysis of performance and of a team’s network derived from possession chain data, by quantitatively analyzing measures of performance that have a specific distribution and that can be used to predict the performance of a team.

Table 3 - ML Summary

The study of (Kumar, 2013) presents the following table of algorithm performance, sorted by correctly classified instances, ROC area, F-Measure, and Kappa statistic, taking into account all the threshold levels. The top four of all the algorithms involved are Multilayer Perceptron, Functional Trees (FT), AdaBoostM1 with FT, and Sequential Minimal Optimization (SMO). This shows that the data used for the classification activity are applicable and informative in terms of match outcome.
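The ML summary reports that (Kumar, 2013) kept the attributes with a positive gain ratio. As a rough illustration of that criterion, the sketch below computes the gain ratio (information gain divided by split information) of a categorical attribute with respect to a class label. The attribute values and match outcomes are invented, the function names are hypothetical, and treating attributes as already discretized is an assumption; the text does not describe how numeric attributes were handled.

```python
from collections import Counter
from math import log2


def entropy(items):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(items)
    return -sum((c / n) * log2(c / n) for c in Counter(items).values())


def gain_ratio(values, labels):
    """Gain ratio of one categorical attribute with respect to the labels.

    gain ratio = information gain / split information, where split
    information is the entropy of the attribute's own value distribution.
    """
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    info_gain = entropy(labels) - conditional
    split_info = entropy(values)
    return info_gain / split_info if split_info > 0 else 0.0


# Hypothetical, already-discretized team attributes for four matches.
outcome = ["win", "win", "loss", "loss"]
shots_on_target = ["high", "high", "low", "low"]   # separates outcomes fully
throw_ins = ["high", "low", "high", "low"]         # carries no information

informative = gain_ratio(shots_on_target, outcome)
uninformative = gain_ratio(throw_ins, outcome)
print(informative, uninformative)
```

Ranking attributes by this score and keeping those above a chosen threshold yields a reduced attribute set that can then be passed to the classifiers.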
Table 4 - Algorithms performance on match data (number of attributes=336) (Kumar, 2013)

Moreover, as the study of (Kumar, 2013) reduces the number of attributes by setting a gain-ratio threshold, the performance of some algorithms changes. It concludes that the performance of the Multilayer Perceptron decreases as the number of attributes decreases, whereas the Sequential Minimal Optimization algorithm keeps its good prediction. Another interesting result is KStar, whose performance improved with fewer variables.

Table 5 - Algorithms performance with selected attributes (threshold=0.07, number of attributes=34) on match data (Kumar, 2013)

Besides presenting insights on algorithms and their performance, the study of (Kumar, 2013) provides a list of the team attributes with the highest gain ratio, which can support the exploration of the relevance of football attributes. The table below is sorted by the gain ratio found in the research, from the highest.

Table 6 - Top 34 highest gain-ratio team attributes (Kumar, 2013)

In addition to the algorithms covered in Kumar's study, the research of (Khan & Kirubanand, 2019) tested the performance of the highest gain-ratio team attributes applied to football. The test followed the flow in the figure below, which starts with the training and test sets. The data are cleaned and the features are computed for the current data set. The selected features are then fed into the SVM model with the RBF kernel. Finally, the data are fed into the XGBoost model, and the outcomes of both methods are compared.

Figure 16 - Model used for comparison (Khan & Kirubanand, 2019)

Extending the team attribute analysis, the study of (Boyd-Graber, He, Kwok, & Daumé, 2016) explores the simulation of movements and situations in a football environment using a multi-agent game.
It involves developing probabilistic models or parameterized strategies for specific applications and encoding observations through a deep Q-Network (DQN) and Deep Reinforcement Opponent Network (DRON) variations. The following figure illustrates the game situations and the responses to different behaviors.

Figure 17 - Left: Illustration of the football game. Right: Strategies of the hand-crafted rule-based agent (Boyd-Graber, He, Kwok, & Daumé, 2016)

So far, the studies listed in this chapter cover different types of analysis and, together, cover potential gaps across tested algorithms and outcomes. Furthermore, the findings and experiments of (Zawbaa, Hassanien, & El-Bendary, 2011), which use video as a source and adopt neural network (NN) and SVM classifiers, can highlight the key events during a match. According to (Zawbaa, Hassanien, & El-Bendary, 2011), Table 7 gives an overview of the comparison between SVM and NN; as reported, the SVM performs better, with a precision 20% higher than that of the NN. Meanwhile, Table 8 shows the precision of event detection; all the results are above 89%, which means the proposed system achieved high accuracy.

Table 7 - Evaluation of logo based replay using SVM and NN (Zawbaa, Hassanien, & El-Bendary, 2011)

Table 8 - Confusion matrix for event detection and summarization (Zawbaa, Hassanien, & El-Bendary, 2011)

After the high accuracy achieved in the last study and the approach of (Boyd-Graber, He, Kwok, & Daumé, 2016) to behavior and movements, additional research addresses the analysis of chains of events during the match. The findings of (Kusmakar, et al., 2016) can unlock value from this type of approach. Table 9, presented by (Kusmakar, et al., 2016), shows the type of events that can be analyzed as a sequence during the match.

Table 9 - An example of ball possession chain data. The table shows a part of a ball possession chain dataset, which represents events in the 1st half of a match (Kusmakar, et al., 2016)

The predictive models developed by (Kusmakar, et al., 2016) show a mean accuracy of up to 75% in predicting the segmental outcome, that is, the likelihood of a team making a successful attempt to score. The following table presents the segments predicted in favor of a team with the overall prediction accuracy.

Table 10 - The predictive performance of the developed machine learning models. Shown are the segments predicted in favor of a team with the overall prediction accuracy, the predicted winner, and the true match results (Kusmakar, et al., 2016)

4. AUTHORS PER FIELD: PLAYER AND TEAM

Five different authors are reviewed and presented in Table 11, tracking key findings in terms of authors, sample, field, and key outputs. The summary is distributed as 40% focused on the player, 40% on the team, and 20% on both player and team.

The first study listed in Table 11, from top to bottom, is that of (Decroos, Haaren, Bransen, & Davis, 2019), which presents a framework for valuing player actions with respect to their outcome, taking the context into account. Thus, player contributions can be measured according to offensive and defensive performance. In the next row of Table 11, the article of (Hughes, et al., 2012) also takes the player perspective, covering performance indicators per position and category sets that may support prioritizing skills according to the needs of the game.

On the other hand, some authors cover both perspectives, team and player, in the same study, such as (Kröckel, 2019). The outcome of that work is a valuable deep dive into different approaches to player and team performance, from which insights on metrics and methods can be extracted.
It also provides an extended overview of other topics that are not the focus of this research but are worth considering as a source of further references. Moving on in Table 11, the research of (Kumar, 2013) is a reference for the team perspective. Besides the last chapter, where this author was explored, the study specifically addresses team performance and the attributes that may influence the game. Last but not least, the study of (Kusmakar S., et al., 2020) shows the potential for uncovering local numerical markers of team performance. As ML was covered in chapter 3, it is important to underline that this research also takes a machine learning-enabled approach. Moreover, it offers valuable insights into team analysis.

Study / Author: (Decroos, Haaren, Bransen, & Davis, 2019)
Sample / Dataset: Wyscout data for the English, Spanish, German, Italian, French, Dutch, and Belgian top divisions. It considered 11,565 games played in the 2012/2013 through 2017/2018 seasons.
Field: Player
Key Outputs: Values all action types, such as passes, crosses, dribbles, and shots. Reasons about an action’s possible effects on the subsequent actions. Identifies the player actions that increase and decrease a team’s chance of scoring.

Study / Author: (Hughes, et al., 2012)
Sample / Dataset: European Football Championships of 2004. The measure was based on a subjectively drawn continuum that analyses a player’s technical movement throughout the game.
Field: Player
Key Outputs: Key performance indicators per position and category sets.

Study / Author: (Kröckel, 2019)
Sample / Dataset: Euro 2016 dataset of 51 games from the OPTA database; not all games are used for each analysis approach during the study. A selection was performed based on the aim of the analysis and the required amount of data.
Field: Player and Team
Key Outputs: Social network analysis (SNA). Dynamic network analysis (DNA). Overview of data and tools in SNA research in football. Social network metrics used in football performance. SOM network applied for team performance analysis. Comparison of mining algorithms, regarding football. Clustering algorithms. Data and information useful for real-time decision support.

Study / Author: (Kumar, 2013)
Sample / Dataset: The dataset of player performances for EPL released by OPTA contained 210 attributes and 10369 instances. Out of the 210 attributes of players, 198 attributes were performance statistics while the others were identifiers for the player for that match. The dataset released by OPTA did not contain match outcomes. The dataset did not contain Own Goals, the goals scored by a player against his team. The data for Own Goals scored by each player in each match of the tournament was fetched from WhoScored.com.
Field: Team
Key Outputs: Top 34 team attributes that affect the match outcome.

Study / Author: (Kusmakar S., et al., 2020)
Sample / Dataset: A dataset from a season of the Major League Soccer division of the United States and Canada. The dataset consists of the possession chain data from 13 matches. The interaction information (possession chain) comprises the time and duration of all ball passes and tackles between players. The dataset also includes the nature of the interaction, which can be categorized as being between teammates or between opposing players. The positional information includes the x-y position of all individuals throughout the entire match.
Field: Team
Key Outputs: Potential for uncovering local numerical markers of team performance.

Table 11 – Authors Summary per field: Player and Team

Moving back to the research of (Decroos, Haaren, Bransen, & Davis, 2019), the framework assigns different ratings to players' offensive actions and to the influence of events. Figure 18 gives an example of this kind of approach.
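To illustrate what valuing an action by its effect on a team's chance of scoring can look like, the sketch below scores an action as the change in the probability of scoring minus the change in the probability of conceding, and then summarizes a player by value per action and actions per 90 minutes. This formulation is an assumption used for illustration; the probabilities are invented numbers that would in practice have to come from some estimator, and the function names are hypothetical.

```python
def action_value(p_score_before, p_score_after,
                 p_concede_before, p_concede_after):
    """Value of one action (assumed formulation).

    offensive part: how much the action raised the chance of scoring
    defensive part: how much the action lowered the chance of conceding
    """
    offensive = p_score_after - p_score_before
    defensive = p_concede_before - p_concede_after
    return offensive + defensive


def value_per_action(values):
    """Average value over all of a player's actions."""
    return sum(values) / len(values)


def actions_per_90(n_actions, minutes_played):
    """Number of actions normalized to 90 minutes of play."""
    return n_actions * 90.0 / minutes_played


# Hypothetical probabilities for three actions by one player in 45 minutes.
values = [
    action_value(0.02, 0.05, 0.01, 0.01),  # pass that improves the attack
    action_value(0.05, 0.03, 0.01, 0.04),  # ball lost in a dangerous area
    action_value(0.03, 0.30, 0.02, 0.01),  # dribble that creates a chance
]

print(values, value_per_action(values), actions_per_90(len(values), 45))
```

Under this reading, an action can be valuable without producing a goal, and a lost ball is penalized both for the attack it ends and for the danger it creates.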
Figure 18 - The attack leading up to Barcelona’s final goal in their 3-0 win against Real Madrid on December 23, 2017 (Decroos, Haaren, Bransen, & Davis, 2019)

Following the research of (Decroos, Haaren, Bransen, & Davis, 2019), figure 18 illustrates exactly what is stated in the title of the article: actions speak louder than goals. In connection with this, figure 19 relates the number of actions per 90 minutes to the value per action; the finding shows that Lionel Messi is a world-class player.

Figure 19 – Value per action (Decroos, Haaren, Bransen, & Davis, 2019)

Keeping in mind the player perspective, the study of (Hughes, et al., 2012) presents the following figure, indicating the key performance indicators (KPI) that best fit each position. Thus, it is a valuable reference for a deep dive into the important KPIs regarding the player's objective during a match.

Figure 20 - The skill requirements (Key Performance Indicators) for the different positions in soccer (Hughes, et al., 2012)

Shifting to another author, the research of (Kröckel, 2019) approaches some relevant topics in football analytics from the player and team perspectives. The results of the study present insights on metrics and methods, capturing how coaches can take advantage of decision support during live matches. One way to structure data, according to (Kröckel, 2019), is to track a single event, for instance a shot on goal, in such a way that it is possible to define the actor (who), the type of action (what), the time (when), and the pitch position (where). The figure below shows the file format that aggregates this kind of information.

Figure 21 - Example of a passing event as recorded by OPTA (Kröckel, 2019)

Continuing with the same study, it reviews a long list of authors and, in a specific chapter, depicts player performance analysis and team performance, largely exploring social network analysis and dynamic network analysis.
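To make the who/what/when/where structure and its link to social network analysis more concrete, the following is a minimal Python sketch. It is not the OPTA file format nor the method of (Kröckel, 2019): the field names, the 0-100 pitch coordinate scale, the player labels, and the helper functions are all illustrative assumptions. It shows how a list of event records could be reduced to a weighted passing network, which is the kind of input that social network metrics are typically computed on.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class MatchEvent:
    """One tracked event (hypothetical schema, not the OPTA format)."""
    player: str                     # who
    action: str                     # what, e.g. "pass" or "shot"
    minute: int                     # when
    second: int
    x: float                        # where: assumed 0-100 along pitch length
    y: float                        # where: assumed 0-100 along pitch width
    receiver: Optional[str] = None  # only meaningful for passes


def passing_network(events):
    """Count passes for each (passer, receiver) pair."""
    edges = Counter()
    for event in events:
        if event.action == "pass" and event.receiver is not None:
            edges[(event.player, event.receiver)] += 1
    return edges


def passes_made(edges):
    """Weighted out-degree: total passes made by each player."""
    totals = Counter()
    for (passer, _receiver), count in edges.items():
        totals[passer] += count
    return totals


# Made-up events for illustration only.
events = [
    MatchEvent("Player A", "pass", 12, 3, 40.0, 50.0, receiver="Player B"),
    MatchEvent("Player B", "pass", 12, 6, 55.0, 60.0, receiver="Player C"),
    MatchEvent("Player A", "pass", 30, 41, 35.0, 45.0, receiver="Player B"),
    MatchEvent("Player C", "shot", 30, 47, 88.0, 52.0),
]

edges = passing_network(events)
out_degree = passes_made(edges)
print(dict(edges))
print(dict(out_degree))
```

Keeping the event record flat, with one row per action, means the same list can feed both sequence-oriented analysis (ordering by minute and second) and network-oriented analysis (aggregating by passer and receiver) without restructuring the data.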
Another interesting highlight of the (Kröckel, 2019) research is the chapter that dives deeply into tactical behavior and the individual actions performed, using mining algorithms. It also explores sequences of game steps ending in a final target action, as demonstrated in the following figure.

Figure 22 - England’s offensive sequences ending in a shooting attempt (Kröckel, 2019)

Turning to references on the team perspective, there is the research of (Kumar, 2013), also mentioned in the last chapter. In addition, (Kusmakar S., et al., 2020) analyzes football as a dynamic system, considering player actions and the context among events, as demonstrated in the following figure.

Figure 23 - An example of ball possession chain data. The table shows a part of a ball possession chain dataset, which represents events in the 1st half of a match (Kusmakar S., et al., 2016)

Staying with the study of (Kusmakar S., et al., 2016), another key finding is the combination of machine learning with team performance, which yields an automated machine learning model to predict the outcome of a game segment and compares it with other authors and studies, as illustrated in the next figure. This allows results to be compared across different approaches.

Figure 24 - Validation of the proposed approach on the largest open collection of soccer logs from 7 major competitions (Kusmakar S., et al., 2020)

5. CONCLUSIONS

5.1. SYNTHESIS OF THE DEVELOPED WORK

The objective of this study was to investigate and explore relevant academic contributions on machine learning methods applied to football analytics, within the following fields: player individual technical skills and team performance. Furthermore, it aimed to assess the most relevant ML models per author in the football analytics field and deliver a summary. The aim was also to evaluate the most relevant authors per chosen field and deliver a summary as well.
Finally, this final chapter gives future directions for the next steps building on this study.

In the first part of the research, the background of football analytics was covered in the literature review. The boundaries and fundamentals of the study were established to set the path for the next steps. After that, from the analysis of the literature, the summary of machine learning methods in football analytics yielded five relevant findings, concentrated on SVM and different kinds of neural network algorithms.

After comparing machine learning methods, the analysis drilled down to authors within the defined scope boundaries: team and player. Regarding results, there were five satisfactory studies in which it is possible to explore valuable player and team metrics, the importance of actions, and the sequence of actions itself. This demonstrates the applicability of football analytics to the specific fields covered in this research.

Despite the limited sample size found in the study, the results from the literature can guide other researchers and, through the compiled summaries, pave the way for further in-depth and targeted investigation of football analytics. It was also concluded that football analytics offers vast research opportunities regarding machine learning methods, and high potential for deep exploration of the team and player perspectives.

5.2. FUTURE WORK

As the next step, I would explore in depth samples and experiments in the application of machine learning, as well as use cases that take the team and the player as the target. It would also be valuable to apply them in real-world settings and follow up on the results, supporting football decisions and tracking achievements.
Second, future research might further investigate the usage of the SVM, given that this model was found in most of the studies shown in the present research, and compare the results obtained with those of other machine learning algorithms, applying football datasets and targeting player and team analysis reliably and accurately.

Last but not least, football has alternative perspectives that influence team results and should be explored, such as player recovery in terms of fatigue, psychological mindset, and other relevant aspects. Thus, a further investigation would be the correlation between team results and these other potential perspectives, taking advantage of previous studies and the list of research already touched on in the present work.

6. BIBLIOGRAPHY

Américo, J. (2013). Sistema de Seguimento de Jogadores de Futebol baseado em Vídeo de Baixa Qualidade. Porto: Faculdade de Ciências da Universidade do Porto em Ciência de Computadores.
Anderson, C., & Sally, D. (2013). The Numbers Game: Why Everything You Know About Soccer Is Wrong. London: Penguin Group.
Aquino, R., Alves, I., Fuini, E., & Garganta, J. (2017, June). Skill-Related Performance in Soccer: A Systematic Review. Human Movement.
Babbar, M. (2019). A systematic review of sports analytics.
Borges, P., Garganta, J., Guilherme, J., & Jaime, M. (2019, August). Tactical efficacy and offensive game processes adopted by Italian and Brazilian youth soccer players. Motriz. Revista de Educação Física.
Boyd-Graber, J., He, H., Kwok, K., & Daumé, H. I. (2016). Opponent Modeling in Deep Reinforcement Learning. Proceedings of the 33rd International Conference on Machine Learning. New York.
Castelão, D., Garganta, J., Afonso, J., & Costa, I. (2015, June). Sequential analysis of attacking behaviors performed by top-level national soccer teams. Revista brasileira de ciência do esporte, pp. 230-236.
Caya, O., & Bourdon, A. (2016).
A framework of value creation from business intelligence and analytics in competitive sports. 49th Hawaii International Conference on System Sciences (HICSS) (pp. 1061-1071). IEEE.
Clarke, N., & Noon, M. (2019). Editorial: Fatigue and Recovery in Football. Coventry University.
Collignon, H. (2019). Winning in the Business of Sports. Retrieved from Atkearney: https://www.atkearney.com/communications-media-technology/article?/a/winning-in-thebusiness-of-sports
Constantinou, A., & Fenton, N. (2017). Towards Smart-Data: Improving predictive accuracy in a long-term football team. Knowledge-Based Systems.
De Silva, V., Caine, M., Skinner, J., Dogan, S., Kondoz, A., Peter, T., . . . Smith, B. (2018, October 26). Player Tracking Data Analytics as a Tool for Physical Performance Management in Football: A Case Study from Chelsea Football Club Academy.
Decroos, T., Haaren, J. V., Bransen, L., & Davis, J. (2019, August). Actions Speak Louder than Goals: Valuing Player Actions in Soccer. The 25th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD ’19).
Doidge, M., Claus, R., Gabler, J., Irving, R., Millward, P., & Silvério, J. (2019). The impact of international football events on local, national, and transnational fan cultures: a critical overview. Soccer & Society.
FIFA. (2019, September 29). FIFA. Retrieved from https://www.fifa.com/livingfootball
Frias, T. (2012). Changes in defensive playing methods influence the collective behavior of association football teams. Universidade Técnica de Lisboa.
Hughes, M., Caudrelier, T., James, N., Redwood-Brown, A., Donnelly, I., Kirkbride, A., & Duschesne, C. (2012). Moneyball and soccer - An analysis of the key performance indicators of elite male soccer players by position. Journal of Human Sport and Exercise, 7(2).
Khan, S., & Kirubanand, V. (2019). Comparing machine learning and ensemble learning in the field of football. International Journal of Electrical and Computer Engineering (IJECE), 4321-4325.
Klyuchka, Y. A., Cherednichenko, O. Y., Vasylenko, A. V., & Yakovleva, O. V. (2015). Forecasting the results of football matches on the internet-based information. Bulletin of NTU "KPI".
Korte, T. (2014, June 19). Datainnovation.org. Retrieved from Datainnovation.org: https://www.datainnovation.org/2014/06/how-data-and-analytics-have-changed-thebeautiful-game/
Kröckel, P. (2019, July). Big Data Event Analytics in Football for Tactical Decision Support.
Kumar, G. (2013). Machine Learning for Soccer Analytics. Dublin.
Kusmakar, S., Shelyag, S., Zhu, Y., Dwyer, D., Gastin, P., & Angelova, M. (2016). Machine learning-enabled team performance analysis in the dynamical environment of soccer.
Kusmakar, S., Shelyag, S., Zhu, Y., Dwyer, D., Gastin, P., & Angelova, M. (2020, March). Machine learning-enabled team performance analysis in the dynamical environment of soccer. DSI Collaborative Research (Intelligent Sensor Processing for Enhancing Defence Decision Support).
Lepschy, H., Woll, A., & Wäsche, H. (2018). How to be Successful in Football: A Systematic Review. The Open Sports Sciences Journal.
Memmert, D., Bischof, J., Endler, S., Grunz, A., Schmid, M., Schmidt, A., & Perl, J. (2011). World-Level Analysis in Top Level Football Analysis and Simulation of Football Specific Group Tactics by Means of Adaptive Neural Networks. Artificial Neural Networks.
Mohammadi, M., & Sorour, S. (2018, June 5). Deep Learning for IoT Big Data and Streaming Analytics: A Survey. IEEE Communications Surveys & Tutorials.
Morgulev, E., Azar, O. H., & Lidor, R. (2018). Sports analytics and the big-data era. International Journal of Data Science and Analytics.
Novatchkov, H., & Baca, A. (2013, December). Artificial Intelligence in Sports on the Example of Weight Training. Journal of Sports Science and Medicine, pp. 27-37.
Pappalardo, L., Cintia, P., Pedreschi, D., Giannotti, F., & Barabási, A.-L. (2017, December). Human Perception of Performance. arXiv on Physics and Society.
Pappalardo, L., Cintia, P., Rossi, A., Massucco, E., Ferragina, P., Pedreschi, D., & Giannotti, F. (2019). A public data set of spatio-temporal match events in soccer competitions. Scientific Data.
Pappalardo, L., Ferragina, P., Cintia, P., & Pedreschi, D. (2019, September). Player Rank: Data-driven Performance Evaluation and Player Ranking in Soccer via a Machine Learning Approach. ACM Transactions on Intelligent Systems and Technology, 10.
Pires, M., & Santos, V. (2018). Assessing the Impact of the Internet of Everything Technologies in Football. Journal of Sports Science 6, pp. 36-55.
Pratas, J. M., Volossovitch, A., & Carita, A. I. (2018). Goal scoring in elite male football: A systematic review. Journal of Human Sport and Exercise.
Qing, Y., Ruano, M.-Á., Hongyou, L., Zhang, S., Gao, B., Wunderlich, F., & Memmert, D. (2020). Evaluation of the Technical Performance of Football Players in the UEFA Champions League. International Journal of Environmental Research and Public Health.
Rein, R., & Memmert, D. (2016). Big data and tactical analysis in elite soccer: future challenges and opportunities for sports science. SpringerPlus.
Santos, F., Mendes, B., Maurício, N., Furtado, B., Sousa, P., & Pinheiro, V. (2016). Análise do Golo em Equipas de Elite de futebol na época 2013-2014. Revista de Desporto e Actividade Física, 8, 11-22.
Sarmento, H., Campanico, J., & Marcelino, R. (2014). Match analysis in football: a systematic review. Journal of Sports Sciences.
Schoenfeld, B. (2019, May 22). Nytimes. Retrieved from Nytimes: https://www.nytimes.com/2019/05/22/magazine/soccer-data-liverpool.html
Schulenkorf, N., & Frawley, S. (2017). Critical Issues in Global Sport Management. London: Routledge.
Spearman, W., & Basye, A. T. (2017). Physics-Based Modeling of Pass Probabilities in Soccer. Sports Analytics Conference. MIT Sloan.
Tax, N., & Joustra, Y. (2015). Predicting The Dutch Football Competition Using Public Data: A Machine Learning Approach. TRANSACTIONS ON KNOWLEDGE AND DATA ENGINEERING, VOL. X, NO. X, MONTH YEAR 1.
Vanoye, J., Penna, A., Parra, O., & Díaz, D. (2017). Motivation Index to Improve Soccer Performance. International Journal of Combinatorial Optimization Problems and Informatics, 8, 45-57.
Zawbaa, H. M., Hassanien, A. E., & El-Bendary, N. (2011). Machine Learning-Based Soccer Video Summarization System. Communications in Computer and Information Science, (pp. 19-28).