Abstract-This study aims to objectively evaluate the offensive performance of Major League Baseball players. We analyzed offensive statistics for the 2023 season, focusing on the top 30 players in the American League by batting average. Principal component analysis was conducted to reduce the number of parameters and summarize key aspects of batting performance. The first and second principal components effectively captured players' overall abilities. Scatter plots of these components also clearly illustrated distinct player characteristics. Notably, Shohei Ohtani's scores differed markedly from those of other players, indicating his exceptional performance even when assessed solely on 2023 batting data.
Index Terms-batting performance; Major League Baseball; offensive statistics; principal component analysis (PCA); Shohei Ohtani
I. INTRODUCTION
Major League Baseball fans often discuss player statistics as well as the game results of their favorite teams. A player's offensive performance can be assessed using many parameters, such as batting average, home runs, and on-base plus slugging (OPS). Therefore, it can be difficult to grasp a batter's overall ability at a glance. This study aims to objectively evaluate the offensive performance of Major League Baseball players. To this end, we conduct a multivariate analysis to develop statistical measures of players' overall abilities. As the primary statistical method, principal component analysis (PCA) is employed to reduce the number of parameters and summarize key aspects of batting performance. Additionally, players are grouped according to offensive performance profiles.
II. RESEARCH METHODS
A. Research Materials
We analyzed MLB players' offensive statistics for the 2023 season, obtained from the Yahoo Japan website [1] and summarized in Table I. The dataset covers the 30 players with the highest batting averages in MLB's American League.
B. Parameters Used in this Research
All parameters used in this study are classified as Standard Stats by Major League Baseball [2]. These include batting average (AVG), triples (3B), home runs (HR), total bases (TB), runs batted in (RBI), runs (R), strikeouts (SO or K), walks (BB), stolen bases (SB), ground into double plays (GIDP), on-base percentage (OBP), slugging percentage (SLG), on-base plus slugging (OPS), and batting average with runners in scoring position.
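For reference, the rate statistics in this list follow from counting statistics through their standard definitions. The sketch below is illustrative only: the function names and the sample batting line are hypothetical and are not taken from Table I.

```python
def batting_average(hits, at_bats):
    """AVG = H / AB."""
    return hits / at_bats


def on_base_percentage(hits, walks, hit_by_pitch, at_bats, sac_flies):
    """OBP = (H + BB + HBP) / (AB + BB + HBP + SF)."""
    return (hits + walks + hit_by_pitch) / (at_bats + walks + hit_by_pitch + sac_flies)


def slugging_percentage(total_bases, at_bats):
    """SLG = TB / AB."""
    return total_bases / at_bats


# Hypothetical batting line (not a real player's 2023 record).
avg = batting_average(hits=150, at_bats=500)
obp = on_base_percentage(hits=150, walks=60, hit_by_pitch=5, at_bats=500, sac_flies=5)
slg = slugging_percentage(total_bases=270, at_bats=500)
ops_value = obp + slg  # OPS is the sum of OBP and SLG

print(f"AVG={avg:.3f} OBP={obp:.3f} SLG={slg:.3f} OPS={ops_value:.3f}")
```

Because OPS is the sum of OBP and SLG, these three parameters carry overlapping information, which is one reason a dimension-reduction method is useful here.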
C. System
1) Hardware: The client computer system was a standard PC laptop, an ASUS ExpertBook with an Intel Core i5-10210U CPU @ 1.60 GHz, 8 GB memory, and a 512 GB SSD.
2) Software: For basic calculations, we used LibreOffice Calc 7.3.6.2 (x64) spreadsheet software [3] running on the Windows 10 Professional operating system. For statistical analysis, we used R System version 4.2.2 [4] running on the same operating system. Both are well-known open-source applications used worldwide.
D. Data Processing
Using the acquired data, we conducted a PCA [5] with the R statistical system [4], employing the princomp function. PCA summarizes multiple variables into a smaller number of components, making the data easier to interpret. This method has been used in several studies related to baseball [6], [7], [8]. Furthermore, Attarian et al. [9], [10] placed particular emphasis on improving the accuracy of Bayesian classifiers [11] through feature selection and dimension reduction via linear discriminant analysis (LDA) and PCA, respectively.
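The study itself ran PCA with R's princomp function; the following Python sketch is not that implementation. It only illustrates the underlying steps (standardize each variable, form the correlation matrix, extract the leading eigenvector, and project the data) on a small hypothetical data set. The use of the correlation matrix, the divisor n, the power-iteration solver, and all data values are assumptions made for illustration.

```python
import math


def standardize(rows):
    """Column-wise z-scores (mean 0, standard deviation 1, divisor n)."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - m) ** 2 for v in c) / n) for c, m in zip(cols, means)]
    return [[(v - m) / s for v, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Correlation matrix of data that is already standardized."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def first_component(matrix, iterations=500):
    """Leading unit eigenvector and its eigenvalue, found by power iteration."""
    p = len(matrix)
    v = [1.0 / math.sqrt(p)] * p
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient v' M v gives the eigenvalue for a unit vector v.
    eigenvalue = sum(v[i] * sum(matrix[i][j] * v[j] for j in range(p)) for i in range(p))
    return v, eigenvalue


# Hypothetical rows: (AVG, HR, SB) for five made-up players.
rows = [
    [0.330, 44, 20],
    [0.310, 30, 5],
    [0.290, 25, 30],
    [0.280, 15, 10],
    [0.275, 10, 40],
]

z = standardize(rows)
loadings, eigenvalue = first_component(correlation_matrix(z))
scores = [sum(l * v for l, v in zip(loadings, row)) for row in z]

print("loadings:", [round(l, 3) for l in loadings])
print("standard deviation of component 1:", round(math.sqrt(eigenvalue), 3))
print("scores:", [round(s, 3) for s in scores])
```

The square root of the eigenvalue plays the role of the component's standard deviation, and the projected values play the role of the component scores.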
III. RESULTS AND DISCUSSION
A. Results of PCA
1) Table II summarizes the PCA results, focusing on the first five components, whose standard deviations exceed 1.00: specifically, 2.370, 1.475, 1.292, 1.251, and 1.030. The proportions of variance explained by components 1 through 5 were 0.401, 0.155, 0.119, 0.112, and 0.076, respectively. The cumulative proportions of variance explained by these components were 0.401, 0.556, 0.676, 0.787, and 0.863, respectively.
2) Figure 1 illustrates the coefficients (eigenvectors) of the first and second principal components, calculated using PCA.
3) Figure 2 displays a scatter plot of the scores for the first and second principal components, also obtained through PCA.
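The proportions of variance reported for Table II can be checked from the component standard deviations. The sketch below assumes that the PCA was run on standardized variables, so that the total variance equals the number of parameters (14, counting the list in Section II-B); this assumption is not stated in the text, but it reproduces the reported figures to within rounding.

```python
# Component standard deviations as reported for Table II.
sdev = [2.370, 1.475, 1.292, 1.251, 1.030]

# Assumption: 14 standardized variables, so the total variance is 14.
n_variables = 14

# A component's variance is its squared standard deviation.
proportions = [s ** 2 / n_variables for s in sdev]

cumulative = []
running_total = 0.0
for p in proportions:
    running_total += p
    cumulative.append(running_total)

for k, (p, c) in enumerate(zip(proportions, cumulative), start=1):
    print(f"Component {k}: proportion={p:.3f} cumulative={c:.3f}")
```

Under the same assumption, a standard deviation above 1.00 means that the component explains more variance than any single standardized parameter does on its own.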
B. Interpretation of Principal Component Analysis
The present results have the following implications:
1) For the first principal component, the coefficients of OPS (0.392), HR (0.363), and RBI (0.341) are high. Therefore, the first principal component can be regarded as representing players' power-hitting ability.
2) For the second principal component, the coefficients of on-base percentage (0.361) and batting average (0.307) are high, while those of stolen bases (-0.479) and triples (-0.372) are strongly negative. Therefore, the second principal component can be regarded as representing players' contact hitting and speed abilities: contact hitting raises the score, whereas speed lowers it.
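To make the role of the coefficients concrete, a component score is a weighted sum of a player's standardized statistics, with the coefficients as weights. The sketch below uses only the coefficients quoted above, so it yields partial scores rather than the full 14-term sums; the standardized values for the player are hypothetical.

```python
# Coefficients quoted in the text (only a subset of the 14 parameters).
pc1_coefficients = {"OPS": 0.392, "HR": 0.363, "RBI": 0.341}
pc2_coefficients = {"OBP": 0.361, "AVG": 0.307, "SB": -0.479, "3B": -0.372}


def partial_score(coefficients, z_values):
    """Sum of coefficient * standardized value over the listed parameters."""
    return sum(coef * z_values[name] for name, coef in coefficients.items())


# Hypothetical standardized statistics for a powerful, fast player.
z_player = {"OPS": 2.0, "HR": 1.5, "RBI": 1.0,
            "OBP": 1.0, "AVG": 0.5, "SB": 2.0, "3B": 1.0}

pc1_partial = partial_score(pc1_coefficients, z_player)
pc2_partial = partial_score(pc2_coefficients, z_player)

print(f"partial PC1 score: {pc1_partial:.4f}")
print(f"partial PC2 score: {pc2_partial:.4f}")
```

In this hypothetical case the above-average power statistics push the first component up, while the stolen bases and triples outweigh the contact statistics and push the second component below zero.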
C. Classification of Players into Characteristic Groups
We can identify characteristic groups in Fig. 2.
1) Group One
Group One consists of players with a high first principal component and a midrange second principal component. This group includes Ohtani (LAA), Tucker (HOU), Semien (TEX), and Devers (BOS). It is characterized by high OPS, high RBI, and high HR, indicating that these players are power hitters.
2) Group Two
Group Two consists of players with both a high first principal component and a high second principal component. This group includes Diaz (TB) and Seager (TEX). It is characterized by a high on-base percentage and a high batting average, indicating contact hitters with strong consistency at the plate.
3) Group Three
Group Three consists of players with a midrange first principal component and a low second principal component. This group includes Witt Jr. (KC), Robert (CWS), and Rodriguez (SEA). It is characterized by a high number of stolen bases and a high strikeout rate, indicating speed-oriented players who are prone to strikeouts.
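The three group definitions can be written as a simple rule on the two component scores. The groups in this study were identified by inspecting Fig. 2, not by fixed thresholds, so the cut-off value and the example scores below are hypothetical and serve only to show the logic.

```python
def level(score, cut=1.5):
    """Label a score as high, mid, or low using a hypothetical cut-off."""
    if score >= cut:
        return "high"
    if score <= -cut:
        return "low"
    return "mid"


# (first component level, second component level) -> group, as described in the text.
GROUPS = {
    ("high", "mid"): "Group One",    # power hitters
    ("high", "high"): "Group Two",   # contact hitters
    ("mid", "low"): "Group Three",   # speed-oriented players
}


def classify(pc1, pc2, cut=1.5):
    """Return the group name, or 'Unassigned' if no definition matches."""
    return GROUPS.get((level(pc1, cut), level(pc2, cut)), "Unassigned")


# Hypothetical score pairs, not values read from Fig. 2.
for pc1, pc2 in [(4.0, 0.2), (2.5, 2.0), (0.5, -2.5), (-1.0, 0.0)]:
    print((pc1, pc2), "->", classify(pc1, pc2))
```

Players whose scores match none of the three definitions are left unassigned, mirroring the fact that the text names only some of the 30 players.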
D. Shohei Ohtani's Exceptional Batting Performance
As Figure 2 shows, Ohtani's score for the first principal component is extremely high, while that for the second is midrange. His scores are plotted in clearly distinct positions relative to other players. These results underscore Ohtani's exceptional performance, even when evaluated solely based on 2023 batting statistics.
IV. CONCLUSIONS
This study applied PCA to condense multiple batting statistics into a smaller set of key performance indicators. We found that the first and second principal components effectively capture players' overall abilities. Additionally, scatter plots of these components clearly illustrate differences in player characteristics. Notably, Shohei Ohtani's scores were plotted in distinct positions compared to other players. These results highlight Ohtani's exceptional performance, even when evaluated solely based on 2023 batting statistics.
REFERENCES
[1] Yahoo Japan MLB (accessed on January 1, 2024) https://baseball.yahoo.co.jp/mlb/
[2] Official Site of Major League Baseball, Offence of Standard Stats (accessed on January 1, 2024) https://www.mlb.com/glossary/standard-stats
[3] Official Website of the LibreOffice Project (accessed on January 1, 2024) http://www.libreoffice.org/
[4] The R Project for Statistical Computing (accessed on January 1, 2024) https://www.r-project.org/
[5] Greenacre, M., Groenen, P.J.F., Hastie, T. et al. Principal component analysis. Nat Rev Methods Primers 2, Article number 100, 2022. Available at https://doi.org/10.1038/s43586-022-00184-w
[6] Depken, C.A., Grant, D., "Multiproduct pricing in Major League Baseball: A principal components analysis," Economic Inquiry, vol.49, no.2, pp474-488, 2011.
[7] Gushiken, S., Ikezaki, J., Miyata, R., "Principal component analysis of starting pitcher indexes in Nippon professional baseball," ICIIBMS 2015 - International Conference on Intelligent Informatics and Biomedical Sciences, pp378-379, 7439490, 2016.
[8] Matsuka, H., Asahi, Y., "What kind of foreign baseball players want to get Japanese baseball team?" Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 9735, pp560-568, 2016.
[9] Adam Attarian, George Danis, Jessica Gronsbell, Gerard Iervolino, and Hien Tran, "A Comparison of Feature Selection and Classification Algorithms in Identifying Baseball Pitches," Lecture Notes in Engineering and Computer Science: Proceedings of The International MultiConference of Engineers and Computer Scientists 2013, IMECS 2013, 13-15 March, 2013, Hong Kong, pp263-268.
[10] Attarian A., Danis G., Gronsbell J., Iervolino G., Layne L., Padgett D., Tran H., "Baseball pitch classification: A Bayesian method and dimension reduction investigation," IAENG Transactions on Engineering Sciences - Special Issue of the International MultiConference of Engineers and Computer Scientists 2013 and World Congress on Engineering 2013, IMECS 2013 and WCE 2013, Routledge: Taylor & Francis Group, pp393-399, 2014.
[11] Lei Jiang, Peng Yuan, Qiongbing Zhang, and Qi Liu, "A Study of the Naive Bayes Classification Based on the Laplacian Matrix," IAENG International Journal of Computer Science, vol.47, no.4, pp713-722, 2020.
© 2025. This work is published under https://creativecommons.org/licenses/by-nc-nd/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.