Accurate and reliable Gross Domestic Product (GDP) forecasting is indispensable for informed economic policymaking and risk management. Autocorrelation, a prevalent characteristic of macroeconomic time series, poses significant challenges to traditional forecasting methodologies and statistical process control. This study introduces a novel approach to GDP forecasting and monitoring by integrating XGBoost regression, a robust machine learning algorithm, with Individual and Moving Range (I-MR) control charts. By effectively capturing complex nonlinear relationships and mitigating autocorrelation, the proposed model offers enhanced predictive accuracy compared to conventional methods. Empirical results demonstrate the model’s efficacy in phase I, aligning closely with actual GDP values. However, phase II analysis reveals discrepancies, suggesting the need for further model refinement and the potential incorporation of additional economic indicators to improve forecast precision.
Introduction
Economic growth is quantitatively assessed by changes in a nation’s GDP. GDP represents the aggregate monetary value of all final goods and services produced within a country’s geographic boundaries over a specific period, typically one year. This metric encapsulates the total remuneration accruing to the factors of production employed in the domestic production process [1]. Commonly used as a proxy for a country’s economic health and aggregate demand, GDP serves as a fundamental measure for assessing economic growth, productivity, and living standards. Moreover, economic size, as reflected in GDP, is a key consideration for membership in the Group of Twenty (G20), an intergovernmental forum of the world’s largest economies. The G20’s focus on global economic stability, financial regulation, and sustainable development necessitates the inclusion of countries exhibiting significant GDP, indicative of their economic influence and capacity to contribute to these objectives [2]. Indonesia, a regional economic powerhouse in Southeast Asia, is a prominent member of this group, primarily due to its substantial population and economic scale. While other Southeast Asian nations, such as Malaysia, Thailand, Brunei Darussalam, and Singapore, also exhibit significant regional economic influence, Indonesia remains the sole representative of the region within this body [3].
GDP is a primary metric for assessing a nation’s economic activity and overall health. Accurate GDP forecasting is instrumental in informing effective policy formulation, investment strategies, and macroeconomic stabilization. The Autoregressive Integrated Moving Average (ARIMA) model has established itself as a prevalent time series methodology for modeling GDP fluctuations. Its efficacy stems from its capacity to capture the inherent autocorrelation and temporal patterns characteristic of economic data [4,5]. Huda et al. employed a Generalized Space-Time Autoregressive (GSTAR) model to examine the spatial and temporal determinants of GDP growth in Indonesia, Malaysia, Singapore, Thailand, and Brunei Darussalam. Their findings indicate that a uniform spatial weight matrix effectively captures the spatial dependencies among these economies, suggesting a homogeneous spatial structure in GDP growth within the region [6]. In recent years, Long Short-Term Memory (LSTM) neural networks have emerged as a promising tool for GDP forecasting. These models have demonstrated superior performance in capturing complex temporal dependencies within economic data, resulting in enhanced predictive accuracy compared to traditional time series methods [7,8].
To effectively monitor GDP trends, time series modeling can be employed in conjunction with control chart analysis. A fundamental assumption underlying control chart methodology is that observations are independent and uncorrelated. In the context of time series analysis, where data points exhibit inherent dependencies, residuals derived from an appropriately specified time series model are employed as control chart inputs. This approach is justified by the assumption that such residuals approximate white noise, characterized by independence and zero autocorrelation. Consequently, control charts can be effectively applied to any time series process for which a white noise residual structure is tenable. Time series models serve to mitigate the impact of autocorrelation, a statistical property that can obscure genuine process changes and lead to erroneous conclusions about GDP stability [9]. To monitor autocorrelated data, a modified control chart can be implemented by analyzing residuals derived from a time series model. As suggested by Yashchin, for data exhibiting a high degree of autocorrelation, a residual-based control chart is a suitable approach [10]. A study by Imro’ah et al. investigated the applicability of residual control charts for monitoring the forecasting accuracy of ARIMA models in the context of GDP prediction. They employed an Individual and Moving Range (I-MR) control chart to analyze the residuals obtained from the fitted ARIMA model. The diagnostic tests revealed a lack of normality in the residuals, suggesting potential model misspecification. Furthermore, the I-MR control chart identified observations exceeding the control limits, indicating the presence of outliers or significant deviations from the predicted values. These findings collectively suggest that the chosen ARIMA model may not be sufficiently accurate in capturing the underlying dynamics of GDP and, consequently, may not be reliable for forecasting future values [11].
The I-MR chart is employed to graphically monitor individual data points collected sequentially. The I chart tracks the central tendency of the process, while the MR chart provides insights into process variability. By examining both charts concurrently, practitioners can identify potential shifts in the process mean or increases in dispersion [12,13].
Integrating control charts with machine learning models such as support vector machines (SVM), neural networks, and Extreme Gradient Boosting (XGBoost) offers various advantages and disadvantages for tracking GDP predictions. Control charts combined with SVM can monitor the stability of predictions over time and detect shifts in GDP trends. However, SVM’s computational cost can be a limitation when dealing with very large datasets, requiring further optimization techniques. Neural networks, while highly accurate at capturing nonlinear relationships, require significantly more computational resources and larger training datasets to achieve optimal results. XGBoost, by contrast, is known for its high performance and speed; it can monitor prediction consistency, detect deviations in GDP trends, and offer robustness to missing data while providing feature importance insights.
To address the limitations that autocorrelation imposed on prior research, this paper introduces a modeling framework based on XGBoost regression. One of the primary reasons for selecting XGBoost regression is its accuracy: the algorithm has frequently been reported to outperform other machine learning models. For GDP prediction, where precision is crucial, XGBoost’s ability to reduce prediction errors through techniques such as regularization and cross-validation is valuable [14]. Because of its effectiveness in managing large datasets and its capacity to reduce overfitting, XGBoost has been used extensively for forecasting and classification [15]. In one application, an XGBoost model successfully captured the dynamic behavior of the data, accurately predicting the full range of values, including extreme events, and closely reflecting the observed flow patterns [16]. By explicitly acknowledging and accommodating the temporal dependencies inherent in the data, this approach seeks to enhance predictive accuracy and robustness compared to traditional methods that implicitly assume independence between observations. The dataset in this study covers a long time range (1976–2021) and multiple countries (Indonesia, Brunei Darussalam, Malaysia, Singapore, and Thailand). XGBoost’s scalability means that it can handle this dataset efficiently, making it a suitable choice for the study. Kovarik et al. used control charts for time series analysis in financial management [17]. Their case studies, which analyze the Slovak currency and Argentina’s Gross Domestic Product, highlight the flexibility of this method in managing cash flows and financial stability.
Control charts such as CUSUM, EWMA, and ARIMA-based charts are not only capable of monitoring heteroskedastic financial processes but can also be applied in various economic contexts to detect small changes that might be overlooked by conventional methods [17]. Sulistiawanti et al. implemented a hybrid modeling approach to enhance water quality surveillance at multiple Water Treatment Plants (WTPs). The researchers used Multivariate Exponentially Weighted Moving Average (MEWMA) and Multivariate Exponentially Weighted Moving Variance (MEWMV) control charts to monitor and detect anomalies in key water quality parameters. To address the inherent autocorrelation present in hydrological and water quality data, the study used XGBoost regression residuals as a preprocessing step, thereby improving the sensitivity and reliability of the control charting methodology in identifying process instabilities [18]. Therefore, this study addresses the monitoring of GDP growth in five ASEAN countries, using an I-MR control chart based on XGBoost regression residuals.
For these reasons, this research investigates the application of I-MR control charts to monitor and potentially regulate the GDP growth trajectories of five ASEAN nations. A novel approach is proposed, utilizing the residuals derived from XGBoost regression as the input data for the control charting process. This methodology aims to enhance the precision and sensitivity of identifying anomalies in GDP growth patterns, thereby providing early warning signals for potential economic instabilities. Detected anomalies or out-of-control signals in GDP growth can provide critical insights for policymakers and economists. These signals may indicate underlying economic issues such as inflation, unemployment, or supply chain disruptions. By identifying these anomalies early, policymakers can implement targeted interventions, such as adjusting interest rates, modifying fiscal policies, or introducing stimulus packages, to stabilize the economy. Economists can also use this information to forecast future economic trends and advise on long-term strategic planning.
Materials and Methods
Data source
This study employs secondary time-series data on GDP procured from the World Bank database. The sample encompasses five ASEAN nations: Indonesia, Brunei Darussalam, Malaysia, Singapore, and Thailand, spanning the period from 1976 to 2021, resulting in a total of 46 annual observations. To account for potential structural breaks in the GDP series, the data is partitioned into two phases. Phase I covers the period from 1976 to 2010, while Phase II spans 2011–2021. This delineation is informed by visual inspection of the time series plot which reveals discernible shifts in the GDP trajectories, likely attributable to the lingering effects of the 2008–2009 global financial crisis. The post-crisis period (Phase II) is characterized by the ASEAN economies’ efforts to restore economic equilibrium, consequently influencing the GDP growth dynamics within the region [19].
Data analytical method
Extreme Gradient Boosting model.
XGBoost is a machine learning ensemble method that employs gradient boosting of decision trees. By iteratively constructing a set of decision trees, each aiming to correct the errors of its predecessors, XGBoost produces highly accurate predictive models. Recent comparative studies have reported that XGBoost outperforms traditional empirical formulas and numerical models in terms of predictive performance and computational efficiency [20]. XGBoost has been used extensively and has produced excellent outcomes for a wide range of problems. With its support for regression, classification, and ranking, this version of the Gradient Boosting Method (GBM) is more efficient and scalable than previous models. XGBoost is also widely used in machine learning and data mining challenges, where winning teams’ solutions have included XGBoost alone or in an ensemble with neural networks. The system is reported to run about ten times faster than other widely used implementations [15].
The objective function of the XGBoost model is expressed as follows [21]:
$$\mathcal{L}^{(t)} = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t) \tag{1}$$
where $n$ is the number of training data, $x_i$ represents the feature vector and $y_i$ represents the label of the $i$-th instance, $\hat{y}_i^{(t-1)}$ represents the prediction of the $i$-th instance at the $(t-1)$-th iteration, and $l$ is the loss function that measures the difference between the label and the previous prediction plus the output of the new tree. $f_t(x_i)$ represents the new tree that classifies the $i$-th instance using $x_i$, and $\Omega(f_t)$ denotes the regularization term that reduces the complexity of the new tree.
The primary objective of XGBoost is to continuously add weak trees with different weights to the ensemble of models. These trees should approximate the residuals of the previous predictions as closely as possible, which is expressed as follows [22]:
$$\hat{y}_i = \sum_{k=1}^{K} f_k(x_i), \qquad f_k \in \mathcal{F} \tag{2}$$
where $\hat{y}_i$ is the prediction result, $\mathcal{F}$ represents the set that includes all regression trees, $f_k$ represents one of the regression trees, and $K$ is the number of regression trees. The primary objective is to ensure that the predicted value $\hat{y}_i$ is as close as possible to the actual value $y_i$ while still maintaining generalizability. This technique has attracted significant attention for many years because of its efficiency, accuracy, ease of interpretation, flexibility, and scalability. For instance, it can handle large-scale data in parallel with high efficiency and can iteratively optimize the model, which generally results in better prediction accuracy compared to other algorithms [23].
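The additive structure of the ensemble can be illustrated with a minimal Python sketch. This is not the XGBoost library and not the authors' implementation: it is a simplified gradient boosting loop for squared-error loss that uses depth-1 trees (stumps) on a single feature and omits the regularization term. The function names, the learning rate of 0.3, and the toy data are assumptions made for illustration only.

```python
# Simplified gradient boosting for squared-error loss using depth-1 trees (stumps).
# Illustrative only: not the XGBoost library, and without its regularization term.

def fit_stump(x, residuals):
    """Find the single-feature split that best fits the current residuals."""
    values = sorted(set(x))
    best = None
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2.0
        left = [r for xi, r in zip(x, residuals) if xi <= threshold]
        right = [r for xi, r in zip(x, residuals) if xi > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    return best[1:]  # (threshold, left_mean, right_mean)


def predict_stump(stump, xi):
    threshold, left_mean, right_mean = stump
    return left_mean if xi <= threshold else right_mean


def boost(x, y, n_rounds=50, learning_rate=0.3):
    """Build the additive model: a base value plus a sum of shrunken stumps."""
    base = sum(y) / len(y)
    predictions = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # Each new tree is fitted to the residuals of the current ensemble.
        residuals = [yi - pi for yi, pi in zip(y, predictions)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        predictions = [pi + learning_rate * predict_stump(stump, xi)
                       for pi, xi in zip(predictions, x)]
    return base, stumps, predictions


def sum_squared_error(y, predictions):
    return sum((yi - pi) ** 2 for yi, pi in zip(y, predictions))


# Hypothetical toy data, not GDP values.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [1.0, 1.2, 0.9, 3.1, 3.0, 2.8, 5.2, 5.0]
base, stumps, fitted = boost(x, y)
print(sum_squared_error(y, [base] * len(y)), sum_squared_error(y, fitted))
```

Each round adds one weak tree fitted to what the previous trees left unexplained, so the training error of the ensemble falls below that of the constant baseline.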
Individual-moving range control chart
The basic assumption in control charts is that observations must be independent and uncorrelated with each other. In a time series control chart, residuals are used as observations because, in time series models, residuals are assumed to be independent and uncorrelated white noise. Therefore, control charts can be applied to any time series model that assumes white noise in its residuals.
A control chart that meets this condition is the I-MR chart. In many situations, the sample size used for process monitoring is n = 1; that is, the sample comprises individual units. To complement the individuals chart, a moving range chart uses the successive absolute differences $|X_i - X_{i-1}|$ as the basis for estimating the process variability [24]. The calculation can be written as follows [11]:
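$$\overline{MR} = \frac{1}{m-1}\sum_{i=2}^{m}\left|X_i - X_{i-1}\right| \tag{3}$$
$$UCL = \bar{X} + 3\,\frac{\overline{MR}}{d_2}, \qquad CL = \bar{X}, \qquad LCL = \bar{X} - 3\,\frac{\overline{MR}}{d_2} \tag{4}$$
where $\overline{MR}$ represents the mean of the moving ranges $|X_i - X_{i-1}|$, with $X_i$ being the $i$-th observation, $\bar{X}$ the mean of the $m$ observations, and $d_2$ representing a constant ($d_2 = 1.128$ for moving ranges of two consecutive observations).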
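A minimal Python sketch of the I-MR calculation follows. It assumes the conventional Shewhart constants for moving ranges of two consecutive observations (d2 = 1.128, D3 = 0, D4 = 3.267), which are not stated in the text; the function names and the sample values are hypothetical.

```python
# Conventional constants for moving ranges of span 2 (assumed, not from the text).
D2 = 1.128
D3 = 0.0
D4 = 3.267


def moving_ranges(x):
    """Absolute differences between consecutive observations."""
    return [abs(b - a) for a, b in zip(x, x[1:])]


def imr_limits(x):
    """Return (LCL, CL, UCL) for the individuals chart and the moving range chart."""
    mr = moving_ranges(x)
    x_bar = sum(x) / len(x)
    mr_bar = sum(mr) / len(mr)
    sigma_hat = mr_bar / D2  # estimate of the process standard deviation
    return {
        "I": (x_bar - 3 * sigma_hat, x_bar, x_bar + 3 * sigma_hat),
        "MR": (D3 * mr_bar, mr_bar, D4 * mr_bar),
    }


# Hypothetical observations for illustration.
sample = [10.0, 12.0, 11.0, 13.0, 12.0]
limits = imr_limits(sample)
print(limits)
```

The moving range average serves as the estimate of short-term variability, so a single large jump between consecutive observations widens both sets of limits.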
The I chart monitors the process level over time, while the MR chart shows the variation between successive measurements. Combined as the I-MR chart, they provide a comprehensive view of process performance, especially when the focus is on individual measurements rather than group averages. The I chart plots each individual data point at regular intervals to detect trends and changes in the process, capturing both common cause and special cause variation. Data in the I chart should be arranged chronologically for effective time-related performance analysis. The MR chart complements the I chart by displaying the differences between successive data points. The I-MR control chart helps identify when a process moves out of statistical control and signals the presence of special cause variation [25].
Steps of the analysis
The research methodology employed a sequential approach encompassing the following stages:
1. Comprehensive Literature Review: An in-depth exploration of existing literature was conducted to establish a theoretical framework and identify research gaps pertinent to the study objectives.
2. Simulation Studies: To evaluate the efficacy of the proposed approach under controlled conditions, simulation studies were implemented.
3. Variable Definition and Data Collection: Relevant variables were meticulously defined and operationalized, followed by the systematic collection of associated data.
4. Data Partitioning: Time series analysis was utilized to segment the collected data into distinct phases I and II, facilitating subsequent analysis.
5. Autocorrelation Assessment: The presence of autocorrelation within the data for each country was examined using autocorrelation function (ACF) and partial autocorrelation function (PACF) plots.
6. Modeling: In instances of significant autocorrelation, the XGBoost regression method was employed to obtain residual data and mitigate the autocorrelation effect.
7. Statistical Process Control: I-MR control charts were constructed based on the residuals of the XGBoost regression to monitor process stability and identify potential anomalies.
8. Conclusion and Recommendation Formulation: Based on the findings derived from the preceding steps, comprehensive conclusions were drawn, and actionable recommendations were developed.
Flowchart
The flowchart in Fig 1 illustrates the structured methodology used throughout this paper. It orients the reader to the framework of the study and shows how each stage of the research connects to the others.
[Figure omitted. See PDF.]
Pseudocodes
By presenting this pseudocode, the reader can gain a more detailed understanding of how the algorithm operates, allowing for better comprehension of the computational approach or analysis applied throughout this study.
[Figure omitted. See PDF.]
Results and discussion
Simulation Studies
The ARIMA model is a widely employed statistical method for time series forecasting, capable of capturing the autocorrelation and non-stationarity inherent in such data. Prior to the application of more complex ensemble or machine learning models like XGBoost, simulation studies based on ARIMA-type processes serve as a foundational step to characterize the underlying patterns and stochastic properties of time series [26]. This study investigates the potential for enhancing predictive accuracy through the hybrid modeling of ARIMA and XGBoost algorithms. A simulation-based approach is employed to evaluate the model’s performance using synthetic datasets prior to real-world application. This methodology enables an assessment of the model’s predictive capabilities under controlled conditions [27]. Three simulation scenarios were conducted, generating 100 data points each from AR(1), MA(1), and ARMA(1,1) processes. These datasets were subsequently partitioned into training (n = 70) and testing (n = 30) subsets for Phase I and Phase II analysis, respectively.
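The three simulation scenarios can be sketched in Python as follows. The coefficient values (0.6 and 0.3), the unit innovation variance, the burn-in length, and the random seed are assumptions chosen for illustration, since the text does not report the parameters used; the function names are hypothetical.

```python
import random


def simulate_arma(n, phi=0.0, theta=0.0, sigma=1.0, seed=0, burn_in=50):
    """Simulate x_t = phi * x_{t-1} + e_t + theta * e_{t-1} with Gaussian noise."""
    rng = random.Random(seed)
    x_prev, e_prev = 0.0, 0.0
    series = []
    for t in range(n + burn_in):
        e = rng.gauss(0.0, sigma)
        x = phi * x_prev + e + theta * e_prev
        if t >= burn_in:  # discard the start-up values
            series.append(x)
        x_prev, e_prev = x, e
    return series


def split_phases(series, n_phase1=70):
    """Split a series into Phase I (training) and Phase II (testing)."""
    return series[:n_phase1], series[n_phase1:]


# Assumed coefficients; the source does not state the values used.
scenarios = {
    "AR(1)": simulate_arma(100, phi=0.6),
    "MA(1)": simulate_arma(100, theta=0.6),
    "ARMA(1,1)": simulate_arma(100, phi=0.6, theta=0.3),
}
phases = {name: split_phases(s) for name, s in scenarios.items()}
for name, (phase1, phase2) in phases.items():
    print(name, len(phase1), len(phase2))
```

Fixing the seed makes the synthetic series reproducible, which is convenient when the same data are later passed to the modeling and control chart steps.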
Prior to extracting residual values via XGBoost regression, a lag structure analysis was conducted for each AR(1), MA(1), and ARMA(1,1) model in Phase I. PACF plots, as depicted in Fig 2, revealed a significant lag of one at the 0.05 level for all models. Subsequently, XGBoost models were constructed for each time series in Phase I, with their respective residuals serving as input for the subsequent Phase II modeling.
[Figure omitted. See PDF.]
Subsequent model diagnostics were conducted using I-MR control charts to assess the quality of model residuals. A two-phase approach was employed for control limit estimation and process monitoring. Phase I control limits were established based on initial residual data and subsequently used to monitor the stability of residuals in phase II. Fig 3 illustrates the I-MR control charts for the three models.
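The two-phase procedure can be sketched as follows: limits are estimated from Phase I residuals and then held fixed while Phase II residuals are checked against them. This is a minimal sketch under assumptions, using the conventional constants d2 = 1.128 and D4 = 3.267 for moving ranges of span two; the function names and the residual values are hypothetical and are not taken from the study.

```python
D2, D4 = 1.128, 3.267  # conventional constants for moving ranges of span 2


def phase1_limits(residuals):
    """Estimate I and MR chart limits from Phase I residuals."""
    mr = [abs(b - a) for a, b in zip(residuals, residuals[1:])]
    center = sum(residuals) / len(residuals)
    mr_bar = sum(mr) / len(mr)
    half_width = 3 * mr_bar / D2
    return {"I": (center - half_width, center + half_width),
            "MR": (0.0, D4 * mr_bar)}


def out_of_control(residuals, limits):
    """Return indices of points outside the I limits and above the MR upper limit."""
    lcl, ucl = limits["I"]
    i_flags = [i for i, r in enumerate(residuals) if r < lcl or r > ucl]
    mr = [abs(b - a) for a, b in zip(residuals, residuals[1:])]
    mr_ucl = limits["MR"][1]
    # A moving range is attributed to the later of its two observations.
    mr_flags = [i + 1 for i, m in enumerate(mr) if m > mr_ucl]
    return i_flags, mr_flags


# Hypothetical residuals for illustration.
phase1 = [0.1, -0.2, 0.0, 0.3, -0.1, 0.2, -0.3, 0.1]
phase2 = [0.2, -0.1, 2.5, 0.0]
limits = phase1_limits(phase1)
print(out_of_control(phase1, limits))
print(out_of_control(phase2, limits))
```

In this toy example the third Phase II residual falls above the upper I limit, and the two moving ranges that involve it exceed the MR limit, so one shift produces signals on both charts.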
[Figure omitted. See PDF.]
Table 1 presents a comparative analysis of out-of-control observations during Phase I and Phase II for three simulated models. Results indicate that the AR(1) model exhibited the highest frequency of out-of-control points in Phase I, while the ARMA(1,1) model demonstrated the greatest number of such occurrences in Phase II.
[Figure omitted. See PDF.]
XGBoost Modelling for GDP Data
The time series plot of GDP for each country is presented in Fig 4. The partitioning of the dataset into training and testing subsets was guided by the observed patterns and fluctuations within the data. The training data employed in this study did not contain any GDP values that deviated significantly from the overall distribution, either as extreme lows or highs. Before developing the time series forecasting model, autocorrelation analysis was conducted. Fig 5 presents autocorrelation function (ACF) plots for GDP growth rates across the examined countries. The ACF plots exhibit significant autocorrelation in each case. Given the presence of autocorrelation, an XGBoost regression model was employed to forecast GDP growth. To inform feature engineering for the XGBoost model, PACF plots were analyzed. These plots, depicted in Fig 6, indicate significant autocorrelation at lag 1 for Thailand’s GDP at the 0.05 significance level. This finding suggests that incorporating the previous period’s GDP value as a feature in the XGBoost model may enhance predictive performance. The inputs for the XGBoost model in Phase I are formed as y1(i-1), y2(i-1), y3(i-1), y4(i-1), and y5(i-1).
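The construction of lag-1 inputs can be sketched in Python as follows. The sketch assumes one reading of the input specification, namely that each country's model uses the previous-period values of all five series as features; the function name and the toy numbers are hypothetical and are not GDP data.

```python
def make_lag1_dataset(series_by_country, target):
    """Build features y_c(i-1) for every country c and the response y_target(i)."""
    names = list(series_by_country)
    n = len(series_by_country[names[0]])
    features = [[series_by_country[c][i - 1] for c in names] for i in range(1, n)]
    response = [series_by_country[target][i] for i in range(1, n)]
    return features, response


# Hypothetical toy series for two placeholder countries.
toy = {
    "A": [1.0, 2.0, 3.0, 4.0],
    "B": [10.0, 20.0, 30.0, 40.0],
}
X, y = make_lag1_dataset(toy, target="B")
print(X)
print(y)
```

Because the first observation has no predecessor, a series of length n yields n - 1 training rows.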
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
In Phase I, five GDP time series models were developed using XGBoost with a maximum depth of 3 and 50 boosting iterations (nrounds). A maximum depth of 3 limits tree complexity, which helps reduce overfitting so that the model captures patterns rather than noise. Fifty iterations provide enough boosting rounds for convergence while keeping the computational load low. Table 2 shows the model’s iterative improvement over the 50 rounds, with decreasing mean absolute percentage error (MAPE) and root mean square error (RMSE) values indicating improved fit. The initially high errors drop substantially by the final iterations, and this pattern holds across the different datasets.
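For reference, the two error measures can be computed as in the following Python sketch. The function names and the example values are hypothetical; the formulas are the standard definitions of RMSE and MAPE.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be nonzero."""
    ratios = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(ratios) / len(ratios)


# Hypothetical values for illustration.
actual = [100.0, 200.0]
predicted = [110.0, 180.0]
print(rmse(actual, predicted), mape(actual, predicted))
```

MAPE is undefined when an actual value equals zero and becomes very large when actual values are close to zero, which is worth keeping in mind when it is applied to growth rates.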
[Figure omitted. See PDF.]
As illustrated in Fig 7, the phase I predictions exhibited a strong alignment with the actual data patterns for all countries, as confirmed by the RMSE and MAPE values presented in Table 3. However, the phase II modeling revealed notable deviations between the predicted and actual GDP trajectories for Brunei, Malaysia, and Thailand. These discrepancies are likely attributable to structural shifts that occurred in the time series between the two phases. Despite these challenges, residual calculations were conducted for all countries in phase II to facilitate further analysis and evaluation.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
To assess the adequacy of the XGBoost model for subsequent control chart monitoring, residual diagnostics were conducted. Specifically, the ACF of the residuals was examined to verify the absence of autocorrelation, a critical assumption for control chart effectiveness. As depicted in Fig 8, the ACF plot reveals no significant autocorrelation at any lag, indicating that the XGBoost model effectively captures the underlying data structure and generates residuals consistent with the white noise assumption.
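A residual autocorrelation check of this kind can be sketched in Python as follows. The sketch computes the sample ACF and flags lags whose autocorrelation exceeds the approximate 95% white noise bound of 1.96 divided by the square root of the series length. The function names and the test series are hypothetical, and the bound is the usual large-sample approximation rather than a procedure stated in the text.

```python
import math


def acf(x, max_lag):
    """Sample autocorrelations for lags 1..max_lag."""
    n = len(x)
    mean = sum(x) / n
    c0 = sum((v - mean) ** 2 for v in x)
    return [
        sum((x[t] - mean) * (x[t - k] - mean) for t in range(k, n)) / c0
        for k in range(1, max_lag + 1)
    ]


def significant_lags(x, max_lag=10):
    """Lags whose autocorrelation lies outside the approximate 95% bound."""
    bound = 1.96 / math.sqrt(len(x))
    return [k for k, r in enumerate(acf(x, max_lag), start=1) if abs(r) > bound]


# Hypothetical, strongly autocorrelated series: alternating signs.
alternating = [1.0, -1.0] * 10
print(acf(alternating, 2))
print(significant_lags(alternating, max_lag=2))
```

For residuals that behave like white noise, the returned list should be empty or nearly so; with many lags tested, an occasional false signal is expected by chance.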
[Figure omitted. See PDF.]
Monitoring XGBoost Residuals using I-MR Control Chart
The I-MR control chart, presented in Fig 9, was employed to assess the statistical control of the residuals generated from the modeling process. A two-phase approach was adopted. In Phase I, control limits were established based on the initial residual data. The GDP values corresponding to this phase exhibited statistical control, as evidenced by the absence of data points exceeding the control limits. Consequently, the analysis progressed to Phase II.
[Figure omitted. See PDF.]
During Phase II, the I-MR control chart, incorporating distinct control limits for this phase, revealed that residual values for all countries exceeded the established thresholds at certain points. These findings indicate the presence of significant and anomalous fluctuations in GDP values.
Comparison with Time Series Modeling using ARIMA
To further evaluate the effectiveness of different forecasting approaches, this study compares traditional ARIMA forecasting with the proposed XGBoost models in terms of their predictive performance and suitability for residual-based I-MR control charts. ARIMA, a widely used time series forecasting method, requires careful parameter tuning, typically based on ACF and PACF analysis. To manage this, we used Auto ARIMA, which automates the selection of the model orders by minimizing the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC). Auto ARIMA thereby reduces manual intervention in model selection.
A comparison with the XGBoost evaluation results in Table 4 shows that the RMSE and MAPE values for the ARIMA models were consistently higher than those of XGBoost across countries. This suggests that ARIMA struggles to capture the underlying patterns in GDP growth data, leading to significant forecast deviations. Auto ARIMA, while optimizing ARIMA parameters automatically, did not substantially improve forecast accuracy and remained less reliable than XGBoost. As shown in Fig 10, the ARIMA forecasts deviate noticeably from actual GDP growth trends, particularly in Phase II, where structural shifts and economic fluctuations appear more pronounced. The forecasts generated by ARIMA tend to lag behind actual values and, in some cases, overestimate or underestimate GDP growth trends. In contrast, XGBoost models align more closely with the observed data, capturing both short-term and long-term fluctuations.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
The I-MR control charts in Fig 9 (XGBoost residual-based) and Fig 11 (ARIMA residual-based), along with Table 5, show that XGBoost detects more out-of-control (OOC) points in the I chart compared to ARIMA. This indicates that XGBoost is more sensitive to structural shifts in GDP trends, particularly in Brunei and Singapore, where it identifies more deviations. In contrast, ARIMA’s lower OOC count suggests it smooths out variations excessively, potentially missing key economic changes. The MR chart results further support XGBoost’s stability, as its residuals exhibit more controlled variability, especially in Malaysia, while ARIMA residuals show greater fluctuations.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
Although XGBoost has a higher OOC count, this does not indicate instability but rather a better ability to capture GDP anomalies. Table 5 confirms that XGBoost identifies real economic shifts, while ARIMA’s lower OOC count may indicate an inability to detect significant changes. Additionally, XGBoost’s MR chart remains more stable across countries, reinforcing its robustness. Given GDP’s nonlinear nature, XGBoost’s adaptability makes it the superior model for forecasting and monitoring, offering better anomaly detection and more reliable insights than ARIMA.
Discussion and limitation
This study proposes a hybrid modeling framework to enhance GDP forecasting accuracy for Malaysia, Brunei, Singapore, Thailand, and Indonesia. The methodology incorporates time series models, namely AR(1), MA(1), and ARMA(1,1), as foundational components. To capture potential non-linear relationships and improve predictive power, these time series models are subsequently integrated with the ensemble learning technique XGBoost. The resulting forecasts are monitored using I-MR control charts to assess the stability and reliability of the model’s performance over time.
It is essential to acknowledge the limitations inherent in this approach. The effectiveness of the hybrid model is contingent upon the underlying assumption that the GDP data for the target countries exhibit characteristics amenable to the specified time series models. Deviations from these assumptions could potentially compromise the accuracy of the forecasts. Moreover, the successful application of XGBoost is predicated on meticulous hyperparameter tuning to prevent overfitting and optimize model performance. The quality and quantity of the training data are also critical factors influencing the algorithm’s predictive capabilities. The efficacy of I-MR control charts in monitoring residual values is reliant on the assumption of stable and normally distributed error terms. Violations of these assumptions may lead to inaccurate assessments of model performance.
For modelling comparison, ARIMA models rely on strong assumptions of linearity and stationarity, which may limit their ability to model complex GDP fluctuations. In contrast, XGBoost introduces a more flexible, non-linear approach, enabling it to learn from intricate economic patterns, adapt to structural changes, and provide more accurate predictions. By incorporating I-MR control charts, both models’ residuals are monitored to evaluate their stability, revealing that XGBoost outperforms ARIMA in detecting anomalies and economic shifts.
Despite its advantages, XGBoost is not without limitations. While it successfully captures non-linear relationships, its effectiveness depends on meticulous hyperparameter tuning to prevent overfitting and ensure generalizability. In contrast, ARIMA models require parameter estimation but follow a more structured selection process (e.g., ACF and PACF analysis), making them more interpretable. Moreover, the I-MR control chart assumes that residuals follow a stable and normally distributed pattern, which may not always hold for either model.
The identification of anomalies or out-of-control signals within GDP growth trajectories can offer invaluable insights for policymakers and economists. These aberrant signals often serve as indicators of latent economic challenges, including inflation, unemployment, or disruptions in supply chains. By proactively identifying these anomalies, policymakers can implement well-targeted interventions, such as adjusting interest rates, modifying fiscal policies, or introducing stimulus packages, to effectively stabilize the economy.
Furthermore, the early detection of anomalies can enable policymakers to anticipate and mitigate potential economic crises. By understanding the underlying causes of these anomalies, policymakers can take proactive steps to address the root issues and prevent their escalation. For instance, if an anomaly is indicative of inflationary pressures, policymakers can implement measures to curb inflation, such as raising interest rates or reducing government spending.
Additionally, economists can leverage the information derived from anomaly detection to refine their economic forecasting models. By incorporating these insights into their models, economists can improve the accuracy and reliability of their predictions, providing policymakers with more informed guidance for decision-making. Moreover, the identification of anomalies can help economists identify emerging economic trends and patterns, enabling them to anticipate future developments and advise on long-term strategic planning.
Conclusion
This research investigates the application of I-MR control charts to monitor and potentially regulate the GDP growth trajectories of five ASEAN nations. A novel approach is proposed, employing the residuals derived from XGBoost regression as input for the control charting process. This methodology addresses the autocorrelation challenge inherent in traditional time series models, as evidenced in previous studies. By incorporating XGBoost, the study develops a more robust predictive framework for GDP forecasting. While Phase I predictions demonstrate adequate alignment with actual data, Phase II models exhibit deviations, suggesting avenues for refinement. The analysis underscores the efficacy of I-MR control charts based on XGBoost residuals in monitoring GDP forecast accuracy. Phase I residuals conform to control limits, whereas Phase II reveals out-of-control signals, indicative of significant GDP fluctuations. This research contributes to enhancing economic stability monitoring for the selected ASEAN countries. Ongoing refinement of the XGBoost model through hyperparameter optimization is essential for improving predictive accuracy. Future research could explore the integration of more sophisticated machine learning algorithms or hybrid models to further enhance GDP forecasting capabilities.
Supporting information
S1 Data.
https://doi.org/10.1371/journal.pone.0321660.s001
(XLSX)
References
1. Aitken A. Measuring welfare beyond GDP. Natl Inst Econ Rev. 2019;249:R3–16.
2. Terra dos Santos LC, Frimaio A, Giannetti BF, Agostinho F, Liu G, Almeida CMVB. Integrating environmental, social, and economic dimensions to monitor sustainability in the G20 countries. Sustainability. 2023;15(8):6502.
3. Zamroni S. Indonesia in the G20: benefits and challenges amidst national interests and priorities. 2010.
4. Sankwa CE, Sharper S. Forecasting Zambia’s gross domestic product using time series autoregressive integrated moving average (ARIMA) model. Int J Innov Sci Res Technol. 2020;5(9):440–7.
5. Haque MA, Ahmed A. Time series modeling and forecasting on GDP data of Bangladesh: an application of ARIMA model. IJLTEMAS. 2024;XIII(IV):199–207.
6. Huda NM, Imro’ah N, Arini NF, Utami DS, Umairah T. Looking at GDP from a statistical perspective: spatio-temporal GSTAR(1;1) model. JTAM. 2023;7(4):976.
7. Bhavika Nemade. Computational analysis for enhanced forecasting of India’s GDP growth using a modified LSTM approach. Commun Appl Nonlinear Analy. 2024;31(2s):339–59.
8. Shams MY, Tarek Z, El-kenawy E-SM, Eid MM, Elshewey AM. Predicting Gross Domestic Product (GDP) using a PC-LSTM-RNN model in urban profiling areas. Comput Urban Sci. 2024;4(1).
9. Alwan LC. Effects of autocorrelation on control chart performance. Commun Stat Theory and Methods. 1992;21(4):1025–49.
10. Yashchin E. Performance of CUSUM control schemes for serially correlated observations. Technometrics. 1993;35(1):37–52.
11. Imro’ah N, Huda NM. Control chart as verification tools in time series model. BAREKENG: J Math & App. 2022;16(3):995–1002.
12. Marks NB, Krehbiel TC. Design and application of individuals and moving range control charts. JABR. 2011;25(5).
13. Adke SR, Hong X. A supplementary test based on the control chart for individuals. J Qual Technol. 1997;29(1):16–20.
14. Brownlee J. A gentle introduction to XGBoost for applied machine learning. https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/.
15. Chen T, Guestrin C. XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785–94.
16. Meddage DPP, Mohotti D, Wijesooriya K. Predicting transient wind loads on tall buildings in three-dimensional spatial coordinates using machine learning. J Build Eng. 2024;85:108725.
17. Kovářík M, Sarga L, Klímek P. Usage of control charts for time series analysis in financial management. J Business Econ Manage. 2014;16(1):138–58.
18. Sulistiawanti N, Ahsan M, Khusna H. Multivariate exponentially weighted moving average (MEWMA) and multivariate exponentially weighted moving variance (MEWMV) chart based on residual XGBoost regression for monitoring water quality. Engineering Letters. 2023;31(3):1001–8.
19. Hill H, Menon J. ASEAN economic integration: features, fulfillments, failures and the future. 2010.
20. Tarwidi D, Pudjaprasetya SR, Adytia D, Apri M. An optimized XGBoost-based machine learning method for predicting wave run-up on a sloping beach. MethodsX. 2023;10:102119. pmid:37007622
21. Fang Z-G, Yang S-Q, Lv C-X, An S-Y, Wu W. Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study. BMJ Open. 2022;12(7):e056685. pmid:35777884
22. Zhang L, Bian W, Qu W, Tuo L, Wang Y. Time series forecast of sales volume based on XGBoost. J Phys: Conf Ser. 2021;1873(1):012067.
23. Li X, Shi L, Shi Y, Tang J, Zhao P, Wang Y, et al. Exploring interactive and nonlinear effects of key factors on intercity travel mode choice using XGBoost. Appl Geog. 2024;166:103264.
24. Montgomery DC. Introduction to statistical quality control. 8th ed. John Wiley & Sons, Inc; 2020.
25. Ahmed N, Matsushima K, Nemoto K, Kondo F. Identification of inheritance and genetic loci responsible for wrinkled fruit surface phenotype in chili pepper (Capsicum annuum) by quantitative trait locus analysis. Mol Breed. 2024;45(1):5. pmid:39734933
26. Wilson GT. Time series analysis: forecasting and control, 5th Edition. Box GEP, Jenkins GM, Reinsel GC, Ljung GM (Editors) J Time Ser Anal. 2015. Hoboken, New Jersey: John Wiley and Sons Inc; 2016;37(5):709–711, 712. ISBN: 978‐1‐118‐67502‐1. https://doi.org/10.1111/jtsa.12194
27. Hyndman RJ, Athanasopoulos G. Forecasting: principles and practice. OTexts; 2018.
Citation: Aisy RR, Zulfa L, Rahim Y, Ahsan M (2025) Residual XGBoost regression—Based individual moving range control chart for Gross Domestic Product growth monitoring. PLoS One 20(5): e0321660. https://doi.org/10.1371/journal.pone.0321660
About the Authors:
Rahida Rihhadatul Aisy
Roles: Data curation, Formal analysis, Methodology, Visualization, Writing – review & editing
Affiliation: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Latifatuz Zulfa
Roles: Resources, Software, Writing – original draft
Affiliation: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Yolanda Rahim
Roles: Formal analysis, Investigation, Visualization
Affiliation: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
Muhammad Ahsan
Roles: Investigation, Methodology, Supervision, Validation, Writing – review & editing
E-mail: [email protected]
Affiliation: Department of Statistics, Faculty of Science and Data Analytics, Institut Teknologi Sepuluh Nopember, Surabaya, Indonesia
ORCID: https://orcid.org/0000-0003-3444-2766
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
1. Aitken A. measuring welfare beyond GDP. Natl Inst econ rev. 2019;249:R3–16.
2. Terra dos Santos LC, Frimaio A, Giannetti BF, Agostinho F, Liu G, Almeida CMVB. Integrating Environmental, social, and economic dimensions to monitor sustainability in the G20 Countries. Sustainability. 2023;15(8):6502.
3. Zamroni S. Indonesia in the G20: benefits and challenges amidst national interests and priorities. 2010.
4. Sankwa CE, Sharper S. Forecasting Zambia’s gross domestic product using time series autoregressive integrated moving average (ARIMA) model. Int J Innov Sci Res Technol. 2020;5(9):440–7.
5. Haque MA, Ahmed A. Time series modeling and forecasting on GDP Data of Bangladesh: an application of arima model. IJLTEMAS. 2024;XIII(IV):199–207.
6. Huda NM, Imro’ah N, Arini NF, Utami DS, Umairah T. Looking at GDP from a Statistical Perspective: Spatio-Temporal GSTAR(1;1) Model. JTAM. 2023;7(4):976.
7. Bhavika Nemade. Computational analysis for enhanced forecasting of India’s GDP growth using a modified LSTM Approach. Commun Appl Nonlinear Analy. 2024;31(2s):339–59.
8. Shams MY, Tarek Z, El-kenawy E-SM, Eid MM, Elshewey AM. Predicting Gross Domestic Product (GDP) using a PC-LSTM-RNN model in urban profiling areas. ComputUrban Sci. 2024;4(1).
9. Alwan LC. Effects of autocorrelation on control chart performance. Commun Stat Theory and Methods. 1992;21(4):1025–49.
10. Yashchin E. Performance of CUSUM control schemes for serially correlated observations. Technometrics. 1993;35(1):37–52.
11. Imro’ah N, Huda NM. Control chart as verification tools in time series model. BAREKENG: J Math & App. 2022;16(3):995–1002.
12. Marks NB, Krehbiel TC. Design and application of individuals and moving range control charts. JABR. 2011;25(5).
13. Adke SR, Hong X. A Supplementary Test Based on the Control Chart for Individuals. J Qual Technol. 1997;29(1):16–20.
14. Brownlee J. A gentle introduction to XGBoost for applied machine learning. https://machinelearningmastery.com/gentle-introduction-xgboost-applied-machine-learning/.
15. Chen T, Guestrin C. XGBoost. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016:785–94.
16. Meddage DPP, Mohotti D, Wijesooriya K. Predicting transient wind loads on tall buildings in three-dimensional spatial coordinates using machine learning. J Build Eng. 2024;85:108725.
17. Kovářík M, Sarga L, Klímek P. Usage of control charts for time series analysis in financial management. J Business Econ Manage. 2014;16(1):138–58.
18. Sulistiawanti N, Ahsan M, Khusna H. Multivariate exponentially weighted moving average (MEWMA) and multivariate exponentially weighted moving variance (MEWMV) chart based on residual XGBoost regression for monitoring water quality. Engineering Letters. 2023;31(3):1001–8.
19. Hill H, Menon J. ASEAN economic integration: features, fulfillments, failures and the future. 2010.
20. Tarwidi D, Pudjaprasetya SR, Adytia D, Apri M. An optimized XGBoost-based machine learning method for predicting wave run-up on a sloping beach. MethodsX. 2023;10:102119. pmid:37007622
21. Fang Z-G, Yang S-Q, Lv C-X, An S-Y, Wu W. Application of a data-driven XGBoost model for the prediction of COVID-19 in the USA: a time-series study. BMJ Open. 2022;12(7):e056685. pmid:35777884
22. Zhang L, Bian W, Qu W, Tuo L, Wang Y. Time series forecast of sales volume based on XGBoost. J Phys: Conf Ser. 2021;1873(1):012067.
23. Li X, Shi L, Shi Y, Tang J, Zhao P, Wang Y, et al. Exploring interactive and nonlinear effects of key factors on intercity travel mode choice using XGBoost. Appl Geog. 2024;166:103264.
24. Montgomery DC. Introduction to statistical quality control. 8th ed. John Wiley & Sons, Inc; 2020.
25. Ahmed N, Matsushima K, Nemoto K, Kondo F. Identification of inheritance and genetic loci responsible for wrinkled fruit surface phenotype in chili pepper (Capsicum annuum) by quantitative trait locus analysis. Mol Breed. 2024;45(1):5. pmid:39734933
26. Wilson GT. Time series analysis: forecasting and control, 5th Edition. Box GEP, Jenkins GM, Reinsel GC, Ljung GM (Editors) J Time Ser Anal. 2015. Hoboken, New Jersey: John Wiley and Sons Inc; 2016;37(5):709–711, 712. ISBN: 978‐1‐118‐67502‐1. https://doi.org/10.1111/jtsa.12194
27. Hyndman RJ, Athanasopoulos G. Forecasting: principles and practice. OTexts; 2018.
© 2025 Aisy et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.