1. Introduction
Soccer is a complex system comprising multiple components that evolve at different scales in both time and space. Soccer currently has huge economic and social relevance [1,2], but its study using advanced numerical and computational tools is still limited. Distinct levels of competition have been tackled, namely the technical progress of a player during his/her career [3,4,5], the time–space trajectories of the players in a match [6,7,8,9,10], and the performance of a number of teams over a league season [11,12,13,14,15].
The prediction of the outcome of soccer matches is another important field, given its interest for the public, clubs, advertising companies, media and odds setters, as well as researchers [16]. A variety of statistical tools have been adopted, namely Poisson models [17], Bayesian methods [18], rating systems [19] and machine learning schemes [20], among others [21,22].
The prediction of a match, league, or competition outcome is closely related to the concept of uncertainty. Uncertainty arouses fans’ emotion, is essential in the betting business, and is the factor that moves the sports industry. The uncertainty about the result of a match, a league, or any other competition, is measured by the ‘competitive balance’ [23,24]. In a league, or multi-team competition, the final standings of the teams are the main point of interest. If the competitiveness is high, then the uncertainty in the match outcome, and hence in the teams’ ranking in a league or competition, is also high, and vice versa [25]. Classical measures to quantify competitiveness either adopt simple ratios of standard features [26,27], or are developed based on graph theory [25].
Recent advances in the analysis of soccer dynamics have been accomplished with the developments registered in the area of sports analytics [28,29]. Sports analytics consists of the mathematical and statistical analysis of data related to sports, with the objective of providing a competitive advantage to a team or an individual. Often, we distinguish between on-field and off-field analytics [30]. The first deals with the improvement of the on-field behavior of players and teams and, for example, may address player fitness and game tactics. The second deals with business and focuses on helping sport organizations to increase ticket and merchandise sales, improve fans’ engagement and reach good management decisions, just to mention a few. Sports analytics has developed rapidly in the last few years, supported by technological advances in data measurement, storage and computational processing. Object-tracking tools have allowed the automatic collection of information about players over time. The resulting spatiotemporal datasets were adopted in a number of research works, including the retrieval of play sequences [31] and the classification of defensive strategies [32] in basketball, and shot prediction [33] in tennis. Spatiotemporal data were used in soccer to identify play styles and team formations [34], as well as to plan coordinated playing tactics [35].
The strategies to form competitive sports teams with limited resources have attracted the attention of professionals, scientists and society. Scouting is fundamental in many sports, namely in professional soccer, to identify talented players [36]. Recognizing player styles and the similarities between them is also crucial in forming a team lineup. For such purposes, scouts, technical directors and coaches often depend on heuristics (e.g., wage, specific abilities, previous experience and intuition) to choose players for their teams [37], independently of the time horizon of interest, that is, prior to, or during, a season or match. However, the standard procedures adopted are subjective, and mistakes can lead to sporting and economic failure. The rapid increase in the volume and quality of digital soccer data has allowed the application of computer tools to characterize and rank athletes in the light of their perceived abilities [38]. Nonetheless, the automatic characterization of players based on such data is challenging in modern soccer [39], since players’ positions are not rigidly defined. Indeed, many players can occupy various roles on the field and each position requires a particular set of skills and physical attributes. The need for tools to search for relevant information in large soccer datasets has motivated the interest of researchers in the field of computer science. Machine learning methods have been successfully applied in the prediction of match outcomes [20,40] and athletes’ injuries [41,42], the analysis of team performance [43,44] and talent discovery [45,46], just to cite a few. Nevertheless, the characterization and selection of players based on data remain a challenge.
The multidimensional nature of the data required to analyze soccer player styles and to compare elements between each other made the dimensionality reduction and clustering algorithms key tools to deal with soccer datasets. Dimensionality reduction-based schemes try to preserve in low dimensional representations the information embedded in the original datasets. They include linear methods, such as classic multidimensional scaling [47], principal component [48], canonical correlation [49], linear discriminant [50] and factor analysis [51], as well as nonlinear approaches, such as non-classic MDS, or Sammon’s projection [52], isomap [53], Laplacian eigenmap [54], diffusion map [55], t-distributed stochastic neighbor embedding [56] and uniform manifold approximation and projection (UMAP) [57]. These techniques are closely connected to the field of information visualization, which corresponds to the computational generation of visual portraits of a dataset. Its main goal is to expose features embedded in the data, in order to understand the system that generated such data [58,59].
We find nowadays a vast literature on soccer data, but research based on dimensionality reduction, clustering and computer visualization of soccer player data is scarce. We can cite some works that adopt these techniques, although not necessarily all three together. Abade et al. [60] classified young players according to their physical and physiological profiles gathered from training sessions, from the point of view of age and playing position. The data from the time-motion and body acceleration/deceleration features were processed using repeated-measures factorial ANOVA and two-step cluster analysis to classify players. Fortuna et al. [61] analyzed the notoriety and international popularity of players from the viewpoint of Google queries over time. The data streams were processed through K-means clustering and three semi-metrics using the functional principal component decomposition and their first and second derivatives. Kirschstein and Liebscher [62] studied the athletes’ market value versus their performance skills by applying principal component analysis. Gavião et al. [63] used ranking, classification, dynamic evaluation and regularity analysis within the framework of composition of probabilistic preferences to determine the best investment opportunities when choosing among players.
This paper adopts dimensionality reduction, clustering and computer visualization tools to compare soccer players based on a set of attributes. The players are characterized by numerical data that rate their specific skills. The dataset used is retrieved from the soccer video game FIFA by Electronic Arts (EA) (
The paper structure is as follows. Section 2 and Section 3 introduce the UMAP algorithm, used for processing and visualizing the dataset, and the FIFA dataset, respectively. Section 4 analyses the data in a global perspective and interprets the results in the light of the geometric patterns generated. Section 5 compares the players based on their skills according to their position on the pitch. Section 6 presents the conclusions.
2. The Uniform Manifold Approximation and Projection
The UMAP is a novel technique [57] for dimensionality reduction, clustering and visualization of high-dimensional datasets, which seeks to accurately represent both the local and global structures that characterize the information [64,65].
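Let us consider a set of N objects, $x_i$, $i = 1, \dots, N$, in an r-dimensional space. Those are represented in an s-dimensional embedding space, $s < r$, by $y_i$, while preserving as best as possible the inter-object distances.
The UMAP computational tool requires a distance, $d(x_i, x_j)$, between pairs of objects $x_i$ and $x_j$, $i, j = 1, \dots, N$, and the number of neighbors to consider, k. The algorithm has two main stages. In the first, it starts by computing the k-nearest neighbors of $x_i$, $\{x_{i_1}, \dots, x_{i_k}\}$, with respect to the distance $d$. Then, the UMAP calculates the parameters $\rho_i$ and $\sigma_i$ for each data point $x_i$. The parameter $\rho_i$ stands for a nonzero distance between $x_i$ and its nearest neighbor and is determined as:
$$\rho_i = \min \left\{ d(x_i, x_{i_j}) \mid 1 \le j \le k,\ d(x_i, x_{i_j}) > 0 \right\}$$ (1)
The parameter $\rho_i$ plays a key role for assuring the local connectivity of the manifold. This means that $\rho_i$ yields a locally adaptive exponential kernel for each point.
The constant $\sigma_i$ must be chosen so that the following condition is satisfied:
$$\sum_{j=1}^{k} \exp\left( -\frac{\max\left(0,\ d(x_i, x_{i_j}) - \rho_i\right)}{\sigma_i} \right) = \log_2 k$$ (2)
and it is determined using a binary search. The algorithm determines a joint probability distribution, $p_{ij}$, that measures the similarity between $x_i$ and $x_j$, in such a way that similar (dissimilar) objects are assigned a higher (lower) probability:
$$p_{ij} = p_{j|i} + p_{i|j} - p_{j|i}\, p_{i|j}$$ (3)
$$p_{j|i} = \exp\left( -\frac{\max\left(0,\ d(x_i, x_j) - \rho_i\right)}{\sigma_i} \right)$$ (4)
where $i, j = 1, \dots, N$ and $i \neq j$. In the second stage, the UMAP algorithm calculates the similarities between each pair of points in the embedding s-dimensional space:
$$q_{ij} = \left( 1 + a\, d_{ij}^{2b} \right)^{-1}$$ (5)
$$d_{ij} = \left\| y_i - y_j \right\|_2$$ (6)
where $i, j = 1, \dots, N$ and $i \neq j$. The parameters a and b are either user-defined, or are determined by the algorithm given the required separation between close points, $d_{\min}$, in the embedding space:
$$\left( 1 + a\, d_{ij}^{2b} \right)^{-1} \approx \begin{cases} 1, & d_{ij} \le d_{\min} \\ \exp\left( -(d_{ij} - d_{\min}) \right), & d_{ij} > d_{\min} \end{cases}$$ (7)
The UMAP performs an optimization, while minimizing the cross-entropy between the distribution of points in the original and the embedding spaces:
$$C = \sum_{i \neq j} \left[ p_{ij} \ln \frac{p_{ij}}{q_{ij}} + (1 - p_{ij}) \ln \frac{1 - p_{ij}}{1 - q_{ij}} \right]$$ (8)
The minimization procedure starts with a given initial set of points in the embedding space. The UMAP uses the graph Laplacian to assign initial low-dimensional coordinates and, then, proceeds with the optimization using the gradient descent:
$$y_i \leftarrow y_i - \alpha \frac{\partial C}{\partial y_i}$$ (9)
where $\alpha$ denotes the learning rate.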
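The two stages described in this section can be sketched in a few lines of Python. This is only a minimal illustration of the standard UMAP formulation of [57], not the implementation used in this work. The function names (`smooth_knn`, `membership`, `fuzzy_union`, `low_dim_similarity`), the symbols rho and sigma for the two per-point parameters, the bracketing strategy of the binary search and the default values a = b = 1 are all assumptions made for the example.

```python
import math


def smooth_knn(dists, n_iter=64):
    """Return (rho, sigma) for one point, given the distances to its k neighbors (k >= 2).

    rho   : smallest nonzero neighbor distance (local connectivity).
    sigma : scale found by binary search so that
            sum_j exp(-max(0, d_j - rho) / sigma) = log2(k).
    """
    k = len(dists)
    positive = [d for d in dists if d > 0.0]
    rho = min(positive) if positive else 0.0
    target = math.log2(k)

    def mass(sigma):
        # Total kernel weight of the k neighbors; increases with sigma.
        return sum(math.exp(-max(0.0, d - rho) / sigma) for d in dists)

    lo, hi = 1e-12, 1.0
    while mass(hi) < target:  # grow the bracket until it contains the solution
        hi *= 2.0
    for _ in range(n_iter):  # plain bisection on sigma
        mid = 0.5 * (lo + hi)
        if mass(mid) < target:
            lo = mid
        else:
            hi = mid
    return rho, 0.5 * (lo + hi)


def membership(d, rho, sigma):
    """Directed similarity of a neighbor at distance d (locally adaptive kernel)."""
    return math.exp(-max(0.0, d - rho) / sigma)


def fuzzy_union(p_ij, p_ji):
    """Symmetrize two directed similarities into one joint value."""
    return p_ij + p_ji - p_ij * p_ji


def low_dim_similarity(y_i, y_j, a=1.0, b=1.0):
    """Similarity 1 / (1 + a * d^(2b)) between two embedded points."""
    d2 = sum((u - v) ** 2 for u, v in zip(y_i, y_j))
    return 1.0 / (1.0 + a * d2 ** b)


# Hypothetical distances from one player to its k = 4 nearest neighbors.
neighbor_dists = [0.5, 1.0, 1.5, 3.0]
rho, sigma = smooth_knn(neighbor_dists)
weights = [membership(d, rho, sigma) for d in neighbor_dists]
```

After the search, the weights of the k neighbors sum to log2(k), which is the normalization condition that fixes the scale of each point's kernel.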
3. Description of the Dataset
Comprehensive datasets of sports are either obtained by the end-user through dedicated hardware and software tools, or are bought from professional service providers. Soccer-related statistics characterize specific aspects of the teams and players during a match, such as the percentage of time with ball possession, the number of attempts on goal and the number of finishes and turnovers. Moreover, we can also have, for a given season, the accumulated points, the average number of goals scored and conceded per match, and the average time to score, just to cite a few. These data are generated automatically by means of sensors, such as video cameras and 3D motion tracking systems, processed using specific software and organized in databases. Gathering such rich information about teams and players is therefore costly and has been available only to entities with high financial resources.
Fortunately, public sports-related datasets, ranging from individual players’ performance attributes and game statistics to event logs of matches, have also become available to the scientific community and professionals. Concerning data about soccer players’ skills, besides those obtained using automatic procedures, knowledge comes also from coaches, former players, journalists and other sports agents. The precise characterization of players will allow a better understanding of teams, matches and leagues, as well as an improvement in the economic aspects of the modern soccer industry.
In this paper we use data from the FIFA 2021 video game. The FIFA was launched in 1995 by the company EA
The FIFA 2021 raw dataset contains 18,944 players. However, after data cleaning to eliminate entries with missing or inaccurate values, we obtain a total of 18,708 players, distributed within the groups {Goalkeepers, Defenders, Centre Midfielders, Wingers, Strikers}, comprising {2054, 6725, 3556, 2854, 3519} athletes, respectively, as shown in Table 1.
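As an illustration of the grouping step, the sketch below maps the position acronyms of Table 1 onto the five groups. The text does not state how a player listed with several positions is assigned to a single group, so the rule used here (majority vote over the listed positions, with ties going to the earliest listed position) is purely an assumption, and the names `GROUPS` and `assign_group` are hypothetical.

```python
from collections import Counter

# Position acronyms per group, as listed in Table 1.
GROUPS = {
    "Goalkeepers": {"GK"},
    "Defenders": {"CB", "RB", "LB", "RWB", "LWB"},
    "Centre Midfielders": {"CDM", "CM", "CAM"},
    "Wingers": {"RM", "LM", "RW", "LW"},
    "Strikers": {"RF", "CF", "LF", "ST"},
}
POSITION_TO_GROUP = {acr: group for group, acrs in GROUPS.items() for acr in acrs}


def assign_group(player_positions):
    """Assign one group to a comma-separated position string.

    Assumed rule (not taken from the paper): majority vote over the listed
    positions; ties go to the group of the earliest listed position.
    Returns None when no known acronym is present.
    """
    groups = [
        POSITION_TO_GROUP[p.strip()]
        for p in player_positions.split(",")
        if p.strip() in POSITION_TO_GROUP
    ]
    if not groups:
        return None
    counts = Counter(groups)
    best = max(counts.values())
    return next(g for g in groups if counts[g] == best)


# player_positions values of Table 2.
messi_group = assign_group("RW, ST, CF")
ronaldo_group = assign_group("ST, LW")
```

Under this assumed rule both players of Table 2 fall into the Strikers group, which agrees with the way they are treated in Section 5; other rules could reproduce the same assignment.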
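Figure 1 depicts the histograms that characterize the distributions of the players’ attributes age, ln(value_eur), ln(wage_eur) and ln(release_clause_eur).
Figure 2 shows box plots of the attributes for the groups {Goalkeepers, Defenders, Centre Midfielders, Wingers, Strikers}.
In a different dimension, Figure 3 portrays the Goalkeepers’ and Strikers’ attributes ln(value_eur) and potential versus age.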
Figure 4 shows the attributes for Goalkeepers and Strikers. It should be mentioned that, besides their ‘standard’ attributes, Goalkeepers and Strikers are also assigned field player- and goalkeeper-specific attributes, respectively. This seems somewhat strange but, in fact, soccer allows goalkeepers and field players to occupy any position on the pitch as long as they comply with the rules that apply to those positions. The analysis for other playing positions is not included here for the sake of parsimony.
4. The UMAP for Global Comparison and Visualization of Soccer Players
For implementing the UMAP dimensionality reduction, clustering and visualization tool we used the
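We present results for the distances {Arccosine, Canberra, Correlation, Lorentzian} = $\{d_{Ar}, d_{Ca}, d_{Co}, d_{Lo}\}$ to compare the objects $x_i$ and $x_j$, $i, j = 1, \dots, N$, that stand for players and are characterized by the attributes $x_{ik}$ ($k = 1, \dots, r$) listed in Table 2. The choice for $r$ is based on the available database information. We included all players’ technical attributes (i.e., the maximum possible). The distances are given by [68]:
$$d_{Ar}(x_i, x_j) = \arccos\left( \frac{\sum_{k=1}^{r} x_{ik} x_{jk}}{\sqrt{\sum_{k=1}^{r} x_{ik}^2}\, \sqrt{\sum_{k=1}^{r} x_{jk}^2}} \right)$$ (10)
$$d_{Ca}(x_i, x_j) = \sum_{k=1}^{r} \frac{|x_{ik} - x_{jk}|}{|x_{ik}| + |x_{jk}|}$$ (11)
$$d_{Co}(x_i, x_j) = 1 - \frac{\sum_{k=1}^{r} (x_{ik} - \bar{x}_i)(x_{jk} - \bar{x}_j)}{\sqrt{\sum_{k=1}^{r} (x_{ik} - \bar{x}_i)^2\, \sum_{k=1}^{r} (x_{jk} - \bar{x}_j)^2}}$$ (12)
$$d_{Lo}(x_i, x_j) = \sum_{k=1}^{r} \ln\left( 1 + |x_{ik} - x_{jk}| \right)$$ (13)
where $\bar{x}_i$ and $\bar{x}_j$ denote the mean values of the attributes of $x_i$ and $x_j$, respectively.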
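For concreteness, the four distances can be written directly in Python. The sketch below uses the common textbook definitions of the Arccosine, Canberra, Correlation (one minus the Pearson coefficient) and Lorentzian distances; whether these match the exact variants adopted from [68] is an assumption, and the function names are hypothetical. The short vectors hold the first five attributes (attacking_crossing to attacking_volleys) of the two players in Table 2.

```python
import math


def arccosine(x, y):
    """Angle between two nonzero attribute vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    cosine = dot / (norm_x * norm_y)
    # Clamp to [-1, 1] to guard against rounding just outside the domain of acos.
    return math.acos(max(-1.0, min(1.0, cosine)))


def canberra(x, y):
    """Sum of |a - b| / (|a| + |b|); terms with a = b = 0 are skipped."""
    total = 0.0
    for a, b in zip(x, y):
        denom = abs(a) + abs(b)
        if denom > 0.0:
            total += abs(a - b) / denom
    return total


def correlation(x, y):
    """One minus the Pearson correlation of two non-constant vectors."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return 1.0 - cov / math.sqrt(var_x * var_y)


def lorentzian(x, y):
    """Sum of ln(1 + |a - b|)."""
    return sum(math.log1p(abs(a - b)) for a, b in zip(x, y))


# Attributes 1 to 5 of Table 2.
messi = [85, 95, 70, 91, 88]
ronaldo = [84, 95, 90, 82, 86]
distances = {f.__name__: f(messi, ronaldo)
             for f in (arccosine, canberra, correlation, lorentzian)}
```

The Canberra and Lorentzian distances weight each attribute separately, whereas the Arccosine and Correlation distances depend only on the overall shape of the attribute profile, which is one reason why different distances can produce different loci.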
Figure 5 depicts the 3D loci of the players in the FIFA 2021 dataset obtained by the UMAP with the distances $\{d_{Ar}, d_{Ca}, d_{Co}, d_{Lo}\}$. We verify that the Goalkeepers form a cluster quite different from the others, while the {Defenders, Centre Midfielders, Wingers, Strikers} show some superposition. This is expected, since the field players have characteristics very different from those exhibited by the goalkeepers, but closer to each other. Moreover, we find players that have skills allowing them to play in different positions on the pitch. For example, L. Messi can play as RW, ST and CF. We verify also that three of the four distances separate the five groups well, while the remaining one has more difficulty separating the Goalkeepers from the other groups, and that two of the distances yield very similar loci.
Different distances can lead to valid visual representations, but not all of them are able to capture the structures of interest hidden in the data. It should be mentioned that the selection of an adequate distance often requires a number of numerical trials. In this work, we tested other distances, but including additional metrics would have led to a huge number of figures. Therefore, to save space, we selected those that we found best.
We can obtain an alternative representation by changing the fourth dimension from a categorical to a numerical variable. Figure 6 highlights different aspects of the 2021 dataset by means of colormaps applied to the locus obtained with the Canberra distance, $d_{Ca}$, proportional to the attributes ln(overall), ln(value_eur), ln(wage_eur) and ln(release_clause_eur). It can be seen that, for all attributes, the UMAP places similar objects close to each other in the embedding space. Moreover, the objects tend to distribute uniformly over a smooth surface. Naturally, other attributes can be represented using a similar procedure.
It should be emphasized that we can compare subsets of players that are selected from the original dataset by means of some criterion. Figure 7 illustrates this idea by considering merely the players in the four groups {Defenders, Centre Midfielders, Wingers, Strikers}. In this case, the Goalkeepers were not included in the processed dataset, since, as shown in Figure 5, they are quite different from the others. We verify that the four groups now emerge slightly more clearly than before, even though we still have some superposition.
5. The UMAP for Local Comparison and Visualization of Soccer Players
In this section, we analyze the UMAP loci for each group separately. In other words, we considered each group in the set {Goalkeepers, Defenders, Centre Midfielders, Wingers, Strikers} and, therefore, we have five cases. Obviously, the study can also be performed for other groups, for samples extracted from a single or various groups, and for distinct years.
Figure 8 depicts the results obtained for Goalkeepers and Strikers, where the colormap is proportional to the attribute ln(value_eur). For the other groups, the charts are of the same type. We verify that, in both cases, the players, represented by points, distribute regularly in space, with the most valuable ones occupying the edges of the surface. Other possible patterns (if they exist) are difficult to distinguish due to the large number of objects, which hides more subtle relationships. Therefore, even when adopting 3D loci, it is difficult to perceive clearly the location of the objects when their number is large. Magnifying the cloud of points mitigates the problem, but does not solve it satisfactorily. One possibility is to consider subsets with just the objects of interest and generate new (different) loci based on the new datasets.
In the sequel, we analyze just the top 100 players according to the criterion value_eur, in each group {Goalkeepers, Defenders, Centre Midfielders, Wingers, Strikers}. Naturally, other criteria can be adopted to extract the elements from the groups and we can mix players from various groups, but the criterion adopted illustrates the procedure well.
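The extraction of the top players of a group amounts to a filter followed by a sort. In the minimal sketch below, the record layout (a list of dictionaries with a `group` field) and the function name `top_n_by` are hypothetical, and the ranking key defaults to value_eur, in line with the "most valuable" players referred to in the captions of Figures 9 to 13. The two records use the value_eur figures of Table 2.

```python
def top_n_by(players, group, n=100, key="value_eur"):
    """Return the n players of `group` with the largest `key`, best first."""
    members = [p for p in players if p["group"] == group]
    return sorted(members, key=lambda p: p[key], reverse=True)[:n]


# Two records built from Table 2; the "group" field is an illustrative addition.
players = [
    {"short_name": "Cristiano Ronaldo", "group": "Strikers", "value_eur": 63_000_000},
    {"short_name": "L. Messi", "group": "Strikers", "value_eur": 103_500_000},
]
top_strikers = top_n_by(players, "Strikers", n=1)
```

Changing `key` (for example to wage_eur) or passing a merged list of groups reproduces the alternative selection criteria mentioned in the text.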
Firstly, the players are compared using the Canberra distance and their locus is generated through the UMAP dimensionality reduction and clustering algorithm. Secondly, given one element in the locus, freely chosen by the user, the w players who are closest to the one adopted as reference are identified according to the Euclidean distance in the 3D embedding space, yielding a small cluster of w elements. Finally, the user can evaluate the w most ‘interesting’ players from the perspective of additional criteria, such as value_eur or wage_eur. Of course, if w = 1, then we have the player closest to the reference one.
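The second step, finding the w elements closest to a reference in the embedding space, can be sketched as follows. The coordinates in the example are invented placeholders, not values produced by the UMAP, and the function name `closest_players` is hypothetical.

```python
import math


def closest_players(embedding, reference, w=10):
    """Return the w (name, distance) pairs nearest to `reference`.

    `embedding` maps a player name to its 3D coordinates; distances are
    Euclidean and the result is sorted by increasing distance.
    """
    ref_point = embedding[reference]
    ranked = [
        (name, math.dist(ref_point, point))
        for name, point in embedding.items()
        if name != reference
    ]
    ranked.sort(key=lambda item: item[1])
    return ranked[:w]


# Placeholder coordinates for four anonymous players.
embedding = {
    "A": (0.0, 0.0, 0.0),
    "B": (1.0, 0.0, 0.0),
    "C": (0.0, 2.0, 0.0),
    "D": (3.0, 3.0, 3.0),
}
neighbors = closest_players(embedding, "A", w=2)
```

The returned list can then be re-ranked or filtered by any additional criterion, which corresponds to the final step of the procedure.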
Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13 depict the UMAP loci generated. For the Goalkeepers, the most valuable one, J. Oblak, was taken as the reference. Then, choosing w = 10, the closest elements, sorted by increasing distance, were {B. Leno, N. Guzmán, D. Livaković, S. Romero, E. Martínez, F. Muslera, K. Schmeichel, Alisson, A. Onana, J. Cillessen}. Therefore, B. Leno emerges as the best choice for substituting J. Oblak, when merely the player’s skills criterion is considered. However, if the user decides to choose additional criteria, such as value_eur and wage_eur, then a compromise exists between skills and cost, and the best choices could instead correspond to N. Guzmán or S. Romero, since they can be hired with a more limited economic effort.
For the Defenders, Centre Midfielders, Wingers and Strikers, we chose V. van Dijk, K. De Bruyne, Neymar Jr and L. Messi as references, and for w = 10, we obtain the sets {M. Hummels, Piqué, Azpilicueta, L. Hernández, Thiago Silva, T. Alderweireld, J. Vertonghen, L. Bonucci, H. Maguire, Marquinhos}, {Bruno Fernandes, P. Pogba, L. Modrić, T. Kroos, D. Alli, Parejo, M. Kovačić, M. Sabitzer, Arthur, Thiago}, {S. Mané, R. Sterling, M. Salah, Bernardo Silva, A. Di María, H. Ziyech, J. Sancho, C. Eriksen, R. Mahrez, Oyarzabal} and {Cristiano Ronaldo, K. Mbappé, P. Dybala, K. Benzema, H. Son, K. Havertz, M. Rashford, M. Reus, R. Lewandowski, E. Hazard}, respectively. By applying the same approach as before for the Goalkeepers, the best options for substituting the references can be found. Let us focus on the Strikers. Usually, those are the most valuable and the most popular, as they are the most effective goal scorers, and goals are the essence of soccer. Let us assume that the conflicts between L. Messi and F. C. Barcelona of Summer 2020 have intensified and that the club is forced to replace the player. The question that will then be asked is whom to hire. According to the UMAP loci generated, the first choice will be Cristiano Ronaldo, if the criterion is exclusively based on the player’s skill. However, if there are no economic restrictions, as seems to be the case with elite clubs, K. Mbappé may be a more suitable choice. His value is higher and he earns a higher salary, but, on the other hand, he is younger and has greater potential for progression than Cristiano Ronaldo. Thus, it is up to the club to weigh the most convenient factors in deciding who should replace L. Messi.
Figure 14 portrays the normalized distance between the most valuable player in each group {Goalkeepers, Defenders, Centre Midfielders, Wingers, Strikers}, that is, taking as references {J. Oblak, V. van Dijk, K. De Bruyne, Neymar Jr, L. Messi}, and their j = 1, …, 10 closest elements in the UMAP coordinates. We verify that the distance increases in jumps, which translate into worse skills as we move from the first towards the next-choice players.
The UMAP has proven very effective for visualizing clusters of objects, outperforming other dimensionality reduction, clustering and information visualization techniques in terms of computational time, memory requirements and ability to unveil patterns embedded in the data [57]. One must note that concrete information about the management decisions of the soccer teams is not available. Therefore, a comparison with “real-world” data is virtually impossible, not only for researchers, but also for governments and soccer associations. The experience gathered in other applications [69,70] allows us to judge whether a given algorithm is “better” or “worse” based on its clustering performance. Certainly, this is a subjective point of view, but the fact is that the assessment of the results provided by such kinds of techniques is based on the user’s experience and intuition. Another issue that needs to be highlighted is that the main goal of the paper is not to directly provide a commercial/computational tool for sport managers. Therefore, to avoid unclear legal, commercial, financial and ethical issues, we limited ourselves to referring to the names of the players without commenting on their qualities. In summary, the goal of the paper is to explore the potential associated with the adoption of advanced clustering techniques for soccer players.
6. Conclusions
This paper adopted the UMAP dimensionality reduction, clustering and information visualization technique to explore relationships between soccer players. The algorithm constructs representations of the original dataset of players’ skills without imposing a priori requirements. The loci generated in a low-dimensional space allow a straightforward interpretation of the data. The results showed that the adoption of dimensionality-reduction and visualization tools for processing complex data is a key modeling option with current computational resources. The approach can be easily extended to deal with more features and richer descriptions of the data involving a higher number of dimensions.
Author Contributions
A.M.L. and J.A.T.M. conceived, designed and performed the experiments, analyzed the data and wrote the paper. All authors have read and agreed to the published version of the manuscript.
Funding
This research received no external funding.
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
Data supporting reported results can be found at
Conflicts of Interest
The authors declare no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Figure 1. Histograms characterizing the FIFA 2021 dataset according to the attributes: (a) age; (b) ln(value_eur); (c) ln(wage_eur); (d) ln(release_clause_eur).
Figure 2. Box plots characterizing the attributes of {Goalkeepers, Defenders, Centre Midfielders, Wingers, Strikers} in the FIFA 2021 dataset.
Figure 3. The attributes ln(value_eur) and potential versus age of Goalkeepers and Strikers (FIFA 2021 dataset).
Figure 5. The 3D loci of players in the FIFA 2021 dataset obtained by the UMAP with the distances: (a) dAr; (b) dCa; (c) dCo; (d) dLo.
Figure 6. The 3D loci obtained by the UMAP with the Canberra distance dCa for the FIFA 2021 dataset. The colormap is proportional to the attributes: (a) ln(overall); (b) ln(value_eur); (c) ln(wage_eur); (d) ln(release_clause_eur).
Figure 7. The 3D loci of players in the groups {Defenders, Centre Midfielders, Wingers, Strikers} in the FIFA 2021 dataset obtained by the UMAP with the distances: (a) dAr; (b) dCa; (c) dCo; (d) dLo.
Figure 8. The 3D loci obtained by the UMAP with the Canberra distance for the FIFA 2021 dataset: (a) Goalkeepers; (b) Strikers. The colormap is proportional to the attribute ln(value_eur).
Figure 9. The 3D locus generated by the UMAP with the Canberra distance for the N=100 most valuable goalkeepers in the FIFA 2021 dataset. The reference is J. Oblak and w=10. The size of the circular marks and the colormap are proportional to the attributes wage_eur and value_eur, respectively.
Figure 10. The 3D locus generated by the UMAP with the Canberra distance for the N=100 most valuable defenders in the FIFA 2021 dataset. The reference is V. van Dijk and w=10. The size of the circular marks and the colormap are proportional to the attributes wage_eur and value_eur, respectively.
Figure 11. The 3D locus generated by the UMAP with the Canberra distance for the N=100 most valuable midfielders in the FIFA 2021 dataset. The reference is K. De Bruyne and w=10. The size of the circular marks and the colormap are proportional to the attributes wage_eur and value_eur, respectively.
Figure 12. The 3D locus generated by the UMAP with the Canberra distance for the N=100 most valuable wingers in the FIFA 2021 dataset. The reference is Neymar Jr and w=10. The size of the circular marks and the colormap are proportional to the attributes wage_eur and value_eur, respectively.
Figure 13. The 3D locus generated by the UMAP with the Canberra distance for the N=100 most valuable strikers in the FIFA 2021 dataset. The reference is L. Messi and w=10. The size of the circular marks and the colormap are proportional to the attributes wage_eur and value_eur, respectively.
Figure 14. The normalized distance between the most valuable player in each group {Goalkeepers, Defenders, Centre Midfielders, Wingers, Strikers}, with reference {J. Oblak, V. van Dijk, K. De Bruyne, Neymar Jr, L. Messi}, and with relation to their j=1,…,10 closer elements.
Table 1. List of typical positions of the players on the pitch and the number of players assigned to these positions in FIFA 2021 (April).
Group | Number of Players | Position | Acronym |
---|---|---|---|
Goalkeepers | 2054 | Goalkeepers | GK |
Defenders | 6725 | Centre Back | CB |
 | | Right Back | RB |
 | | Left Back | LB |
 | | Right Wing Back | RWB |
 | | Left Wing Back | LWB |
Centre Midfielders | 3556 | Centre Defensive Midfielder | CDM |
 | | Centre Midfielder | CM |
 | | Centre Attacking Midfielder | CAM |
Wingers | 2854 | Right Midfielder | RM |
 | | Left Midfielder | LM |
 | | Right Wing | RW |
 | | Left Wing | LW |
Strikers | 3519 | Right Forward | RF |
 | | Centre Forward | CF |
 | | Left Forward | LF |
 | | Striker | ST |
Table 2. List of attributes of L. Messi and Cristiano Ronaldo in FIFA 2021 (April).
Number | Name | Value (L. Messi) | Value (C. Ronaldo) | Number | Name | Value (L. Messi) | Value (C. Ronaldo) |
---|---|---|---|---|---|---|---|
1 | attacking_crossing | 85 | 84 | 26 | mentality_composure | 96 | 95 |
2 | attacking_finishing | 95 | 95 | 27 | defending_marking | 32 | 28 |
3 | attacking_heading_accuracy | 70 | 90 | 28 | defending_standing_tackle | 35 | 32 |
4 | attacking_short_passing | 91 | 82 | 29 | defending_sliding_tackle | 24 | 24 |
5 | attacking_volleys | 88 | 86 | 30 | goalkeeping_diving | 6 | 7 |
6 | skill_dribbling | 96 | 88 | 31 | goalkeeping_handling | 11 | 11 |
7 | skill_curve | 93 | 81 | 32 | goalkeeping_kicking | 15 | 15 |
8 | skill_fk_accuracy | 94 | 76 | 33 | goalkeeping_positioning | 14 | 14 |
9 | skill_long_passing | 91 | 77 | 34 | goalkeeping_reflexes | 8 | 11 |
10 | skill_ball_control | 96 | 92 | 35 | sofifa_id | 158023 | 20801 |
11 | movement_acceleration | 91 | 87 | 36 | short_name | L. Messi | Cristiano Ronaldo |
12 | movement_sprint_speed | 80 | 91 | 37 | age | 33 | 35 |
13 | movement_agility | 91 | 87 | 38 | overall | 93 | 92 |
14 | movement_reactions | 94 | 95 | 39 | potential | 93 | 92 |
15 | movement_balance | 95 | 71 | 40 | value_eur | 103.5 M | 63 M |
16 | power_shot_power | 86 | 94 | 41 | wage_eur | 560 k | 220 k |
17 | power_jumping | 68 | 95 | 42 | player_positions | RW, ST, CF | ST, LW |
18 | power_stamina | 72 | 84 | 43 | release_clause_eur | 212.2 M | 104 M |
19 | power_strength | 69 | 78 | 44 | height_cm | 170 | 187 |
20 | power_long_shots | 94 | 93 | 45 | weight_kg | 72 | 83 |
21 | mentality_aggression | 44 | 63 | 46 | preferred_foot | left | right |
22 | mentality_interceptions | 40 | 29 | 47 | international_reputation | 5 (maximum 5) | 5 (maximum 5) |
23 | mentality_positioning | 93 | 95 | 48 | work_rate | medium/low | high/low |
24 | mentality_vision | 95 | 82 | 49 | weak_foot | 4 (maximum 5) | 4 (maximum 5) |
25 | mentality_penalties | 75 | 84 | 50 | team_position | CAM | LS |
© 2021 by the authors.
Abstract
In professional soccer, the choices made in forming a team lineup are crucial for achieving good results. Players are characterized by different skills and their relevance depends on the position that they occupy on the pitch. Experts can recognize similarities between players and their styles, but the procedures adopted are often subjective and prone to misclassification. The automatic recognition of players’ styles based on their diversity of skills can help coaches and technical directors to prepare a team for a competition, to substitute injured players during a season, or to hire players to fill gaps created by teammates that leave. The paper adopts dimensionality reduction, clustering and computer visualization tools to compare soccer players based on a set of attributes. The players are characterized by numerical vectors embedding their particular skills and these objects are then compared by means of suitable distances. The intermediate data is processed to generate meaningful representations of the original dataset according to the (dis)similarities between the objects. The results show that the adoption of dimensionality reduction, clustering and visualization tools for processing complex datasets is a key modeling option with current computational resources.
1 INEGI, Faculty of Engineering, University of Porto, 4200-465 Porto, Portugal
2 Institute of Engineering, Polytechnic of Porto, Dept. of Electrical Engineering, 4249-015 Porto, Portugal