Development of Hybrid Methods for Prediction of

This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Minerals contribute significantly to the development of the national economy of any country, and the mining industry requires effective policies. Because minerals have a massive impact on the economy of Pakistan, accurate prediction of mineral production can help to optimize the income obtained from the mineral sector and to alleviate the shortage of metals. However, mineral production data are nonstationary and have multiscale stochastic attributes, and the mining process is affected by climate change and by projects related to socioeconomic development, which makes predicting the production of mineral resources a challenging task. Two kinds of models are commonly used to predict such data: process-based models and data-driven models. The process-based model needs an extensive calibration and validation dataset [1]. The data-driven model considers the physical mechanism and scientific knowledge of stochastic geologic processes [2]. Huang et al. [2] used process-based models for the prediction of hydrological data, which are also nonlinear and have complex characteristics. They concluded that the lack of scientific knowledge and the unavailability of sufficient data make prediction difficult. Data-driven models have been used efficiently to overcome the drawbacks of process-based models [3].

Data-driven models are further categorized as traditional statistical models and machine learning (ML) models. Traditional statistical methods, e.g., the autoregressive integrated moving average (ARIMA) model, are suited only to stationary and linear data. The ARIMA model has been applied successfully to predict the production of mineral resources [4]. However, the problem with this traditional model is that it requires the data to be stationary. The production data of mineral resources are nonstationary and nonlinear, and they also contain complex time-varying characteristics. Therefore, traditional statistical models are not sufficient to capture the nonlinearity and time-varying characteristics of such data. ML techniques can overcome the drawbacks and instability of the traditional statistical models, as ML methods are designed to deal with nonstationary and nonlinear time-series data [5]. ML methods are therefore suitable for capturing the nonstationary and nonlinear behavior of mineral production data.

Commonly used ML techniques are the artificial neural network (ANN) and the support vector machine (SVM). These techniques suffer from problems of parameter selection and overfitting and do not consider the time-varying characteristics of nonstationary time-series data. Moreover, the production data of mineral resources contain noise that prevents researchers from predicting them accurately. Therefore, hybrid techniques are needed to model the noise-corrupted and time-varying characteristics of mineral resource production data. Bokde et al. [6] proposed a hybrid model (EMD-ANN) for nonstationary wind speed prediction and concluded that its performance was highly satisfactory and more robust for jumping samplings than that of the ANN and ARIMA models.

To deal with the limitations of simple existing models, preprocessing strategies are combined with different data-driven models to increase the prediction accuracy for different kinds of nonlinear and nonstationary time series. Hybrid models are developed to capture the time-varying characteristics through noise reduction. Different hybrid models with preprocessing strategies have already been applied to complex and nonlinear data with time-varying characteristics [7]. The strategy of the hybrid technique is based on decomposition and denoising, prediction, and ensemble stages [1, 8, 9]. Several algorithms, such as spectral analysis, wavelet analysis (WA), Fourier analysis, and empirical mode decomposition (EMD), have been developed to reduce these noises or stochastic volatiles in the data [10, 11]. Fourier analysis and spectral analysis are used for data that are stationary or linear. However, EMD and WA are the most commonly used preprocessing algorithms for nonlinear or nonstationary data and provide better results. The WA algorithms decompose the nonlinear and nonstationary data of mineral resources into multiscale components [12]. These components are used as inputs at the prediction stage, and the predicted components are then combined into an ensemble for the final prediction. The authors in [5] proposed a hybrid model for forecasting streamflow that uses the wavelet transform to decompose the streamflows and an ANN for forecasting. They concluded that their hybrid model was more efficient than simple existing models. In the present paper, EMD- and WA-based thresholds are used to reduce noise in the mineral production data. WA is considered a powerful tool for converting a signal into a stationary signal with specific effectiveness. In the literature, different hybrid models with wavelet decomposition are used for the prediction of different kinds of nonlinear and nonstationary time-series data [13]. Wu et al. [14] proposed a hybrid forecasting model that combines the particle swarm optimization algorithm with a wavelet neural network (PSO-WNN) to predict China’s natural gas consumption. They used the PSO algorithm to optimize the initial weights, and the wavelet parameters are updated through dynamic learning to improve forecasting precision and reduce the fluctuation of the WNN. They concluded that the proposed model is superior to ANN- and WNN-based models. Some improvements have been made in hybrid models based on WA decomposition to obtain more accurate predictions of daily flows, which are also nonlinear time-series data [13]. However, the performance of WA depends on the type of mother wavelet selected. Prior knowledge about the signal to be analyzed and about its frequency content is needed for a suitable choice of the mother wavelet. Wu et al. [15] proposed an EMD method to overcome the shortcomings of WA for analyzing nonlinear and nonstationary datasets. Complex time-series data can be decomposed into a small and finite number of intrinsic mode functions (IMFs) by using EMD. The EMD strategy has the advantage of converting a nonstationary series into stationary series. Different studies in the literature have used EMD with different data-driven models, such as EMD-artificial neural network (EMD-ANN), EMD-radial basis function (EMD-RBF), EMD-support vector machines (EMD-SVMs), EMD-relevant vector machine (EMD-RVM), and EMD-ARIMA, and these hybrid models improve the prediction accuracy [16–18]. EMD has been combined with the ANN in many past studies, especially in hydrology [19], and a novel model based on EMD and deep learning was used by Mi et al. [20] to reduce the noise and extract the trend information of original wind speed data. However, EMD suffers from the problem of mode mixing between the IMFs [13]. To overcome the mode mixing problem, the authors in [13] proposed a new decomposition technique called ensemble empirical mode decomposition (EEMD), which uses white Gaussian noise. The EEMD method can separate the signals without inappropriate mode mixing. It uses white noise, which helps to establish a dyadic reference frame in the time-scale space. They concluded that, by removing the mode mixing problem, EEMD produces a set of IMFs that carry full physical meaning. Many hybrid techniques based on EEMD are used for streamflow and wind speed prediction and in hydrology [21, 22]. Di et al. [1] proposed a four-stage hybrid model based on EEMD decomposition to improve the prediction accuracy by minimizing the noise. They concluded that EEMD in combination with data-driven models could improve the prediction accuracy compared to EMD-based hybrid models. Liu [23] proposed a hybrid model that combines EEMD with the grey SVM model (GSVM) to forecast high-speed rail passenger flow and concluded that this two-stage hybrid model is more suitable for short-term predictions than other single and hybrid models.

Ghumman et al. [24] proposed two hybrid models based on the EEMD preprocessing technique, called the EEMD pattern sequence-based forecasting (EEMD-PSF) and EEMD difference pattern sequence-based forecasting (EEMD-DPSF) models. The first model decomposes the series into a number of IMFs and improves the prediction performance using the PSF model, and the second model reduces the effects of seasonality, trend, and irregularity in the wind speed data. They compared the performance of these models with that of simple PSF, ARIMA, and LSSVM models for wind speed and concluded that the EEMD-PSF and EEMD-DPSF models outperformed the simple models. Jaitly and Hinton [25] presented a comprehensive review of decomposition-based methods for improving wind forecasting accuracy. They used and discussed wavelet analysis, EMD, seasonal adjustment, intrinsic time-scale decomposition, variational mode decomposition, and the Bernaola-Galvan algorithm for the decomposition of wind speed time-series data as alternative forecasting algorithms. Through a comparative analysis of various decomposition-based models, they concluded that the decomposition-based models provided more accurate predictions than the other forecasting algorithms.

Although EEMD proved helpful in solving mode mixing issues, it created one other problem: some residual noise during signal reconstruction. As a result, there may be different modes because of the various realizations of signal and noise. To solve this problem, Mehrsai et al. [26] proposed the complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) technique. A specific noise is added in their proposed model at each decomposition stage, and a residue is calculated to get each mode. The final complete results of decomposition are obtained numerically with the negligible error term.

This paper proposes a revised hybrid model, built upon CEEMDAN, to decompose the time-varying characteristics of the production data of mineral resources, based on a 48-step-ahead direct recursive prediction strategy. The study focuses on the preprocessing techniques (denoising and decomposing) and their effect on the prediction of mineral resources. An important point to note is that past studies used a one-step-ahead prediction strategy rather than multistep-ahead prediction [27]. In one-step-ahead prediction, all the observations are used by the predictor to estimate the parameters for the next time step, whereas we use 48-month-ahead prediction to predict the mineral resources. To improve the performance of the revised hybrid model, the long-term behavior of mineral resource production is observed by considering the 48-step-ahead direct recursive prediction strategy.

In the current study, we aim to build a revised hybrid model that uses preprocessing techniques to decompose the production data of mineral resources with the WA-CEEMDAN technique, which follows the steps of EEMD by adding white noise at each level of decomposition. The purpose of using the CEEMDAN algorithm in the current study is therefore to find an efficient way to decompose the time-series data of mineral resources and thereby increase the prediction accuracy. Furthermore, the current study explores the prediction performance of this emerging hybrid modeling technique using 48-step-ahead prediction, especially for nonlinear data.

Based on the details outlined above, this paper focuses on developing a modified model to predict the production of mineral resources by improving the EMD/WA-CEEMDAN-based multimodels (EMD/WA-CEEMDAN). The long-term prediction performance of CEEMDAN-based hybrid models for mineral resources is examined. Section 2 focuses on the motivations behind predicting the production data of mineral resources. Section 3 gives a short review of the models used for the prediction of mineral data and an introduction to EMD, EEMD, CEEMDAN, and their modified versions. Section 3 also gives a short review of the approaches used in hybrid CEEMDAN models to select appropriate prediction methods according to the characteristics of the respective IMFs. Section 4 describes the study area and data. Section 5 presents and discusses the results of the case study, and Section 6 presents the conclusions of this research.

2. Motivation behind the Prediction of the Mineral Resources

In recent years, the prediction of mineral resources has become a challenging task for researchers. Although Pakistan has abundant coal reserves, the share of gas and oil in the energy mix is still about 65%. Pakistan, despite being a mineral-rich country, is facing an alarming situation, as its power generation depends on foreign exchange. The mineral sector of Pakistan is dominated by four principal minerals: gas, oil, gypsum, and coal. The production of these four major minerals needs to be analyzed and predicted accurately to deal with emerging challenges. The main purpose of accurately predicting mineral production data is to achieve efficient and optimum utilization of natural resources in the development of the economy.

The prediction of mineral resources is considered an important study because the datasets are nonlinear and nonstationary and have time-varying characteristics. Many researchers are working towards the accurate prediction of nonlinear datasets with complex time-varying characteristics. Although these researchers share the same goal, their motivations vary.

Saadat et al. [28] used the CEEMDAN algorithm in their proposed model to predict the daily peak load. Their three-stage hybrid model, built on the CEEMDAN technique, showed a robust decomposition ability and reliable prediction. Nazir et al. [29] developed a robust hybrid model that uses the CEEMDAN algorithm to decompose hydrological time-series data, and it showed a robust decomposition ability for forecasting nonlinear and complex time-series data. These articles are included because they demonstrate the practicability and serviceability of hybrid models for the decomposition of nonlinear datasets.

3. Proposed Methodology

In the current study, we propose two hybrid methods to enhance the prediction accuracy for time-series data. Both methods have the same layout and structure except at the denoising stage, where the approaches for eliminating noise from the data differ. At the decomposition stage of both methods, CEEMDAN (an improved form of EEMD) [25] is used to identify the oscillations. At the prediction stage, multimodels are used, chosen according to the nature of the IMFs, to predict the obtained IMFs accurately. There are two main objectives of using the multimodels. The first is to predict each IMF accurately by taking its nature into account. The second is to assess the performance of complex and simple models once decomposition has reduced the complexity of the mineral resource time-series data. Finally, all the predicted IMFs are combined to predict the time-series data. The proposed hybrid models consist of denoising, decomposition, and prediction, denoted as D-step, Decompose-step, and P-step, respectively [29]. A short description of these steps is given as follows:

(1) In D-step, the WA and EMD methods are presented to remove the noise from the production of mineral resource time-series data.

(2) In Decompose-step, CEEMDAN is used to decompose the denoised series into the $k$ IMF components and one residual term.

(3) In P-step, the IMF components and the residual term obtained by decomposing the denoised series are predicted using linear stochastic and nonlinear machine learning methods. The model with a lower prediction error than the other models, judged by three performance evaluation criteria, is retained for further consideration. Finally, the predicted results are combined to obtain the final prediction.
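The selection and combination in the P-step can be sketched as follows. This is only an illustrative sketch: the mean absolute error stands in for the evaluation criteria, and the component values, candidate names, and helper functions are hypothetical.

```python
def mean_abs_error(actual, predicted):
    """Mean absolute error; a stand-in for the study's evaluation criteria."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def pick_best(actual, candidates):
    """Return (name, forecast) of the candidate with the lowest error."""
    name = min(candidates, key=lambda k: mean_abs_error(actual, candidates[k]))
    return name, candidates[name]


def combine(component_forecasts):
    """Sum the per-component forecasts point by point."""
    return [sum(values) for values in zip(*component_forecasts)]


# Hypothetical held-out values and candidate forecasts for two components.
imf1_actual = [0.5, -0.4, 0.3]
imf1_candidates = {"ml": [0.4, -0.5, 0.3], "stochastic": [0.0, 0.0, 0.0]}
residual_actual = [10.0, 10.5, 11.0]
residual_candidates = {"ml": [9.0, 11.5, 10.0], "stochastic": [10.0, 10.4, 11.1]}

best_imf1 = pick_best(imf1_actual, imf1_candidates)
best_residual = pick_best(residual_actual, residual_candidates)
final_forecast = combine([best_imf1[1], best_residual[1]])
```

Summation is used for the combination because the decomposed components add up to the series being predicted.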

Compared with earlier review papers, the main contribution of the current study is to review the prediction literature from the perspective of decomposition-based hybrid methods. More precisely, we compile the decomposition-based models and discuss their algorithms and structures. Furthermore, we examine how decomposition-based hybrid models improve prediction accuracy. Finally, existing techniques adopted in recent years to improve the performance of decomposition-based models are also compiled and discussed.

For ease of understanding, combining all the steps mentioned above, the proposed strategies can be named EMD (denoising)-CEEMDAN (decomposing)-MM (multimodels) and WA (denoising)-CEEMDAN (decomposing)-MM (multimodels), abbreviated as the EMD-CEEMDAN-MM and WA-CEEMDAN-MM models. Both models are shown in Figure 1.

[figure omitted; refer to PDF]

3.1. Denoising Step (D-Step)

In many time-series data, noises are certain components that sometimes reduce the prediction accuracy. Many algorithms exist in the literature, such as spectral analysis, WA, Fourier analysis, and EMD, to reduce these noises or stochastic volatiles from the data [30]. Fourier analysis and spectral analysis are used for those kinds of data that are stationary or linear. However, EMD and WA can also deal with nonlinear or nonstationary data and provide better results. In this study, EMD- and WA-based thresholds are used to reduce noises from the mineral production data.

3.1.1. Wavelet Analysis (WA)

WA has developed into an efficient tool for converting a signal into a stationary signal with specific effectiveness. A wavelet, i.e., a small wave with finite energy that is localized in time or frequency, serves as the basis function for the analysis of temporal phenomena. One of the most popular uses of the discrete wavelet transformation (DWT) is to eliminate noise from a signal. The strategy of applying a threshold to the wavelet coefficients of a noisy signal is called wavelet denoising, and wavelet thresholding is recognized as a powerful method for removing noise from a signal. Before soft and hard wavelet thresholds can be used, the mother wavelet function must be chosen. Using the Symlet 8 mother wavelet [31], the mineral production data are decomposed into the following coefficients [26, 30]: $\begin{matrix} (1) & b_{k, l} = \sum_{l = 0}^{2^{N - k} - 1} 2^{- k / 2} ϕ\left( 2^{- k} p - l \right), \\ (2) & c_{k, l} = \sum_{k = 1}^{K} \sum_{l = 0}^{2^{N - k} - 1} 2^{- k / 2} θ\left( 2^{- k} p - l \right), \end{matrix}$ where $b_{k, l}$ is the approximation coefficient and $c_{k, l}$ is the detail coefficient.

With this technique, the energy of the signal is concentrated in a few wavelet coefficients of high magnitude, whereas the energy of the noise is spread over many wavelet coefficients of low magnitude. The denoised signal is obtained by removing the noisy wavelet coefficients and reconstructing the signal from the remaining coefficients. Wavelet denoising has three steps: (1) decomposition of the signal into wavelet coefficients, (2) application of thresholds to the coefficients obtained in step 1 [32], and (3) reconstruction of the signal from the thresholded wavelet coefficients.

3.1.2. Hard and Soft Thresholds

Assigning a suitable threshold is necessary for eliminating noise during denoising, because the efficiency of wavelet denoising depends on the value of the threshold. Several rules are available for thresholding, of which the two most representative are hard thresholding and soft thresholding [33]. In the current study, both soft and hard thresholding are applied to the wavelet coefficients to implement the wavelet denoising technique, as listed below [26]. The hard thresholding is $\begin{matrix} (3) & c_{k, l}^{'} = \begin{cases} c_{k, l}, & \left| c_{k, l} \right| \geq {Th}_{k}, \\ 0, & \left| c_{k, l} \right| < {Th}_{k}, \end{cases} \end{matrix}$ and the soft thresholding is $\begin{matrix} (4) & c_{k, l}^{'} = \begin{cases} sgn\left( c_{k, l} \right) \left( \left| c_{k, l} \right| - {Th}_{k} \right), & \left| c_{k, l} \right| \geq {Th}_{k}, \\ 0, & \left| c_{k, l} \right| < {Th}_{k}, \end{cases} \end{matrix}$ where ${Th}_{k}$ is the threshold, which is calculated as ${Th}_{k} = a \sqrt{2 E_{k} \ln N}, k = 1,2,3, \dots, K$ , $a$ is a constant that takes values from 0.4 to 1.4 in steps of 0.1, and ${\hat{Md}}_{k}$ is the median deviation, i.e., $\begin{matrix} (5) & {\hat{Md}}_{k} = \frac{median\left( \left| c_{k - 1, l} \right| : l = 1,2,3, \dots, 2^{k - 1} - 1 \right)}{0.6745} . \end{matrix}$
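The two thresholding rules and the quantities in the threshold formula can be sketched as below. This is a minimal illustration under stated assumptions: the function names and the sample coefficients are hypothetical, hard thresholding is taken to mean keep-or-zero and soft thresholding to mean shrink-toward-zero, and $E_{k}$ is passed in by the caller exactly as it appears in the formula for ${Th}_{k}$.

```python
import math
from statistics import median


def hard_threshold(c, th):
    """Keep a coefficient unchanged if its magnitude reaches th, else zero it."""
    return c if abs(c) >= th else 0.0


def soft_threshold(c, th):
    """Zero small coefficients and shrink the remaining ones toward zero by th."""
    return math.copysign(abs(c) - th, c) if abs(c) >= th else 0.0


def median_deviation(coeffs):
    """Median of the absolute coefficients divided by 0.6745."""
    return median(abs(c) for c in coeffs) / 0.6745


def level_threshold(a, e_k, n):
    """Th_k = a * sqrt(2 * E_k * ln N); a, E_k and N are supplied by the caller."""
    return a * math.sqrt(2.0 * e_k * math.log(n))


detail = [0.1, -2.0, 0.05, 3.0, -0.2]   # hypothetical detail coefficients
th = 0.5                                # hypothetical threshold value
hard = [hard_threshold(c, th) for c in detail]
soft = [soft_threshold(c, th) for c in detail]
```

With these sample values, both rules zero the three small coefficients; the hard rule leaves -2.0 and 3.0 untouched, while the soft rule shrinks them to -1.5 and 2.5.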

The decomposed signal is reconstructed from the approximations and the noise-free details using the following equation: $\begin{matrix} (6) & x(p) = \sum_{l = 0}^{2^{N - K} - 1} b_{K, l}^{'} 2^{- K / 2} ϕ\left( 2^{- K} p - l \right) + \sum_{k = 1}^{K} \sum_{l = 0}^{2^{N - k} - 1} c_{k, l}^{'} 2^{- k / 2} θ\left( 2^{- k} p - l \right), \end{matrix}$ where $b_{K, l}^{'}$ and $c_{k, l}^{'}$ are the thresholded approximation and detail coefficients, respectively.
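The decompose, threshold, and reconstruct cycle can be illustrated with a small self-contained sketch. Note the substitutions: the Haar wavelet is used here in place of Symlet 8 because it can be written in a few lines, a single hypothetical threshold is applied to every level, and the keep-or-zero rule is used. It is meant only to show the mechanics, not to reproduce the study's settings.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_decompose(signal, levels):
    """Return (approximation, [detail_1, ..., detail_levels]) for a signal
    whose length is divisible by 2**levels."""
    approx, details = list(signal), []
    for _ in range(levels):
        pairs = list(zip(approx[0::2], approx[1::2]))
        details.append([(x0 - x1) / SQRT2 for x0, x1 in pairs])
        approx = [(x0 + x1) / SQRT2 for x0, x1 in pairs]
    return approx, details


def haar_reconstruct(approx, details):
    """Invert haar_decompose, starting from the coarsest level."""
    signal = list(approx)
    for detail in reversed(details):
        rebuilt = []
        for a, d in zip(signal, detail):
            rebuilt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        signal = rebuilt
    return signal


def denoise(signal, levels, th):
    """Decompose, zero every detail coefficient below th in magnitude, rebuild."""
    approx, details = haar_decompose(signal, levels)
    kept = [[d if abs(d) >= th else 0.0 for d in level] for level in details]
    return haar_reconstruct(approx, kept)


noisy = [4.0, 4.2, 3.9, 4.1, 8.0, 8.1, 7.9, 8.2]   # hypothetical series
smooth = denoise(noisy, levels=2, th=0.5)
```

In this example all detail coefficients fall below the threshold, so each block of four values is replaced by its mean while the jump between the two blocks is preserved. With a threshold of zero the original series is recovered, which is a convenient check that the decomposition loses no information.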

3.1.3. Empirical Mode Decomposition (EMD)

EMD is a flexible and purely data-driven decomposition technique that decomposes complicated nonlinear and nonstationary signals into a series of components. After decomposition, the original signal can be recovered without any loss of information by combining all the components. The primary purpose of EMD is to extract the IMFs from complex signals. The extracted IMFs should satisfy the following two conditions [34]: (a) over the whole dataset, the number of zero-crossings and the number of extrema should either be equal or differ by at most one; (b) at any point, the mean value of the upper and lower envelopes should be zero.

The main steps of the EMD for an original time series $x(p), p = 1,2,3, \dots, N$ are as follows [33]:

(1) Identify all the local extrema of the original time series $x(p)$ .

(2) Create the upper and lower envelopes $U(p)$ and $G(p)$ by using a cubic spline.

(3) Estimate the mean of the upper envelope and the lower envelope, $m(p) = \left( U(p) + G(p) \right) / 2$ .

(4) Subtract the mean of the envelopes from the original series $x(p)$ . The difference $d(p)$ is calculated as $d(p) = x(p) - m(p)$ . Then, examine the properties of $d(p)$ .

(5) Repeat steps 1–4 until the number of extrema is less than or equal to one, so that no more IMFs can be extracted, or until the residue $e(p)$ becomes a monotonic function.
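A single pass through steps (1) to (4) can be sketched as below. To stay self-contained, this sketch swaps the cubic spline for piecewise-linear envelopes and holds the envelopes constant beyond the first and last extremum; both choices, the function names, and the sample series are assumptions made for illustration.

```python
def local_extrema(x):
    """Indices of strict interior local maxima and minima of x."""
    maxima = [i for i in range(1, len(x) - 1) if x[i - 1] < x[i] > x[i + 1]]
    minima = [i for i in range(1, len(x) - 1) if x[i - 1] > x[i] < x[i + 1]]
    return maxima, minima


def envelope(x, knots):
    """Piecewise-linear curve through (i, x[i]) for i in knots, held constant
    outside the first and last knot (a stand-in for the cubic spline)."""
    out = []
    for p in range(len(x)):
        if p <= knots[0]:
            out.append(x[knots[0]])
        elif p >= knots[-1]:
            out.append(x[knots[-1]])
        else:
            right = next(k for k in knots if k >= p)
            left = max(k for k in knots if k < p)
            w = (p - left) / (right - left)
            out.append(x[left] + w * (x[right] - x[left]))
    return out


def sift_once(x):
    """One pass of steps (1)-(4): returns (d, m) with d = x - m."""
    maxima, minima = local_extrema(x)
    if not maxima or not minima:
        raise ValueError("series has too few extrema to sift")
    upper = envelope(x, maxima)
    lower = envelope(x, minima)
    mean = [(u + g) / 2.0 for u, g in zip(upper, lower)]
    detail = [xi - mi for xi, mi in zip(x, mean)]
    return detail, mean


series = [0.0, 2.0, 0.0, 3.0, 1.0, 4.0, 2.0, 5.0, 3.0]   # hypothetical series
d, m = sift_once(series)
```

For this oscillation riding on an upward drift, the envelope mean follows the drift and the difference oscillates around zero, which is the behavior that repeated sifting is meant to refine.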

Finally, the signal can be expressed as the sum of all the IMFs and the residue [26, 35]: $\begin{matrix} (7) & x(p) = \sum_{i = 1}^{n} d_{i}(p) + e(p), \end{matrix}$ where $n$ is the number of sifted IMFs, $d_{i}(p), i = 1,2,3, \dots, n$ , is the $i^{th}$ IMF, and $e(p)$ is the residue, i.e., the trend of the signal. The IMFs are denoised in the same way as in wavelet-based denoising, using equations (3), (4), and (7), except for the last two IMFs, which are used without denoising because of their low frequencies. In equations (3), (4), and (7), the subscript is replaced by $i$ according to the number of IMFs. Applying the thresholds to the IMFs before reconstructing the signal yields a smooth input signal. The rebuilding of the denoised signal is generalized as $\begin{matrix} (8) & x(p) = \sum_{i = 1}^{n_{1} - 2} h_{i}(p) + \sum_{i = n_{1} - 2}^{n} h_{i}(p) + e(p), \end{matrix}$ where the parameter $n_{1}$ is the number of IMFs, which makes it easy to eliminate the low-order IMFs, which are noisy, and the higher-order IMFs, which are slightly noisy under Gaussian noise conditions, and $h_{i}(p), i = 1,2,3, \dots, n$ , and $e(p)$ are the $i^{th}$ IMF and the trend of the signal, respectively.

3.2. Decomposition Step (Decompose-Step)

In the decomposition step, a suitable tool is needed to reduce the noise and provide accurate results. The EEMD technique is applied in the decomposition step to handle the mode mixing problem.

3.2.1. Ensemble Empirical Mode Decomposition (EEMD)

EEMD was developed to improve EMD and mitigate mode mixing. In this technique, the added white noise is distributed uniformly over the whole time-frequency space, which helps to separate the frequency scales and reduces the occurrence of mode mixing. The procedure is as follows [36, 37]:

(i) Initialize the ensemble number $Q$ .

(ii) Set the amplitude of the added white noise and set $i = 1$ .

(iii) Add the random white noise signal ${wn}_{i}(p)$ to the original signal $x(p)$ : $\begin{matrix} (9) & x_{i}(p) = x(p) + {wn}_{i}(p), \end{matrix}$

where ${wn}_{i}(p)$ is the $i^{th}$ added white noise series and $x_{i}(p)$ denotes the $i^{th}$ noise-added signal, $i = 1 \sim Q, Q > 1$ .

(iv) By using EMD, decompose the noise-added signal $x_{i}(p)$ into $P$ IMFs $C_{j, i}(p), j = 1,2,3, \dots, P$ , where $C_{j, i}(p)$ is the $j^{th}$ IMF of the $i^{th}$ noise-added signal and $P$ is the total number of IMFs.

(a) To obtain the proto-IMF $s_{1}(p)$ , subtract the mean envelope $m_{1}(p)$ from the original $x_{1}(p)$ , i.e., $s_{1}(p) = x_{1}(p) - m_{1}(p)$ , where $m_{1}(p) = \left( U(p) + G(p) \right) / 2$ .

(b) Consider $s_{1}(p)$ as a new signal if the average of the lower and upper envelopes becomes zero and if the number of zero-crossings and the number of extrema are equal or differ by at most one.

(d) From the original signal $x_{1}(p)$ , subtract the resulting IMF $C_{1}(p)$ . Consider the residue $e_{1}(p)$ as the new data and go back to substep (a): $\begin{matrix} (10) & x_{1}(p) = C_{1}(p) + e_{1}(p) . \end{matrix}$

(e) If the residue becomes a monotonic function, the decomposition of the current noise-added signal is complete; if $i < Q$ , go back to step (iii) with $i = i + 1$ . The last residual is treated as the trend.

(v) Estimate the ensemble mean ${IMF}_{j}(p)$ of each IMF over all trials: $\begin{matrix} (12) & {IMF}_{j}(p) = \frac{1}{Q} \sum_{i = 1}^{Q} C_{j, i}(p), j = 1,2,3, \dots, P . \end{matrix}$

(vi) Take the mean ${IMF}_{j}(p)$ as the final $j^{th}$ IMF, for each of the $P$ IMFs.
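The ensemble loop in steps (i) to (vi) can be sketched independently of the decomposition itself. In the sketch below, `toy_decompose` is a hypothetical placeholder that splits a series into a detail part and a moving-average part; it is not EMD and is used only so that the averaging over noise-added trials can be run end to end. The function names, the seed, and the sample signal are likewise assumptions.

```python
import random


def toy_decompose(x):
    """Placeholder for EMD: a detail part and a 3-point moving-average part.
    It is NOT a real EMD; it always returns exactly two components."""
    smooth = []
    for p in range(len(x)):
        window = x[max(0, p - 1):p + 2]
        smooth.append(sum(window) / len(window))
    detail = [xi - si for xi, si in zip(x, smooth)]
    return [detail, smooth]


def eemd(x, decompose, trials, noise_amp, seed=0):
    """Average the components of `trials` noise-added copies of x."""
    rng = random.Random(seed)
    sums = None
    for _ in range(trials):
        noisy = [xi + rng.gauss(0.0, noise_amp) for xi in x]
        comps = decompose(noisy)
        if sums is None:
            sums = [[0.0] * len(x) for _ in comps]
        for total, comp in zip(sums, comps):
            for p, value in enumerate(comp):
                total[p] += value
    return [[value / trials for value in total] for total in sums]


signal = [float(p % 8) for p in range(64)]   # hypothetical sawtooth series
components = eemd(signal, toy_decompose, trials=200, noise_amp=0.2)
rebuilt = [sum(values) for values in zip(*components)]
leftover = max(abs(r - s) for r, s in zip(rebuilt, signal))
```

The value `leftover` is small but not zero: averaging over a finite number of trials shrinks the added noise without removing it entirely from the reconstruction.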

3.2.2. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)

Although EEMD can reduce mode mixing to a certain degree by adding white noise sequences, the error cannot be eliminated by averaging over a finite number of trials, and this affects the reconstruction of the sequence. To eliminate mode mixing, CEEMDAN adds adaptive white noise during the decomposition to smooth pulse interference, and it exploits the zero-mean property of Gaussian white noise to make the decomposition of the data more complete. The detailed procedure of CEEMDAN is as follows.

Consistent with EEMD, CEEMDAN decomposes $P$ noise-added realizations of the original signal $x(p)$ , i.e., $x(p) + r_{i} {wn}_{i}(p)$ , where $r_{i}$ is the parameter that controls the signal-to-noise ratio. The first IMF component is $\begin{matrix} (13) & {IMF}_{1}(p) = \frac{1}{P} \sum_{i = 1}^{P} c_{i, 1}(p) . \end{matrix}$

The residual of the signal is $\begin{matrix} (14) & e_{1}(p) = x(p) - {IMF}_{1}(p) . \end{matrix}$

Let $d_{l}(\cdot)$ denote the $l^{th}$ IMF component obtained from EMD. The sequence $e_{1}(p) + r_{1} d_{1}\left( n_{j}(p) \right)$ is decomposed as follows to get the second IMF component: $\begin{matrix} (15) & {IMF}_{2}(p) = \frac{1}{P} \sum_{j = 1}^{P} d_{1}\left( e_{1}(p) + r_{1} d_{1}\left( n_{j}(p) \right) \right) . \end{matrix}$

The second residual signal is $\begin{matrix} (16) & e_{2}(p) = e_{1}(p) - {IMF}_{2}(p) . \end{matrix}$

Similarly, by following the above procedure, the expression of the $l^{th}$ residual signal is as follows: $\begin{matrix} (17) & e_{l}(p) = e_{l - 1}(p) - {IMF}_{l}(p) . \end{matrix}$

The expression of the $(l + 1)^{th}$ IMF component is $\begin{matrix} (18) & {IMF}_{l + 1}(p) = \frac{1}{P} \sum_{j = 1}^{P} d_{l}\left( e_{l}(p) + r_{l} d_{l}\left( n_{j}(p) \right) \right) . \end{matrix}$

The above procedure is repeated until the stopping criterion is met. If the number of IMF components is $M$ , the original sequence is expressed as follows: $\begin{matrix} (19) & x(p) = \sum_{i = 1}^{M} {IMF}_{i}(p) + e(p), \end{matrix}$ where ${IMF}_{i}(p)$ is the $i^{th}$ IMF, $e(p)$ is the overall residual signal, and $x(p)$ is the original sequence, recovered as the sum of its components. The decomposed IMFs are further used in the prediction stage.
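As a short worked check (added here for illustration and not part of the cited derivation), equation (19) follows from the residual recursions alone. Applying (17) repeatedly for $l = M, M - 1, \dots, 2$ and then (14) gives a telescoping sum: $\begin{matrix} e_{M}(p) = e_{M - 1}(p) - {IMF}_{M}(p) = \dots = e_{1}(p) - \sum_{l = 2}^{M} {IMF}_{l}(p) = x(p) - \sum_{l = 1}^{M} {IMF}_{l}(p) . \end{matrix}$ Rearranging yields (19) with $e(p) = e_{M}(p)$ , so the reconstruction is exact regardless of how many noise realizations are averaged at each stage.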

3.3. Prediction Step (P-Step)

Multistep-ahead prediction is used to predict a sequence of future values of a time series. In this approach, the model predicts step by step and uses the predicted value at the current time step to determine the value at the next time step. We split the data into training and testing datasets: 70% of the observations are used for training, and the remaining 30% are used for testing.
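The split and the step-by-step feedback of predictions can be sketched as follows. The one-step predictor used here (continuing the most recent change) is a hypothetical stand-in for whichever model is fitted to a component, and the series is illustrative only.

```python
def chronological_split(series, train_percent=70):
    """Split without shuffling so the test set follows the training set in time."""
    cut = len(series) * train_percent // 100
    return series[:cut], series[cut:]


def recursive_forecast(history, one_step_model, steps):
    """Predict `steps` values, feeding each prediction back as an input."""
    window = list(history)
    forecasts = []
    for _ in range(steps):
        nxt = one_step_model(window)
        forecasts.append(nxt)
        window.append(nxt)
    return forecasts


def last_difference_model(window):
    """Hypothetical one-step predictor: continue the most recent change."""
    return window[-1] + (window[-1] - window[-2])


data = [float(v) for v in range(1, 21)]   # 20 illustrative observations
train, test = chronological_split(data)
predicted = recursive_forecast(train, last_difference_model, steps=len(test))
```

Because every forecast after the first depends on earlier forecasts rather than on observations, errors can accumulate over a long horizon, which is why the choice of one-step model matters for multistep-ahead prediction.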

In the prediction stage, the denoised IMFs are used as inputs to machine learning and stochastic time-series methods to predict the production data of mineral resources. For this, we use a multistep-ahead forecasting strategy. Two different types of models are used because ML methods predict the high-frequency IMFs accurately but do not provide accurate results for the low-frequency IMFs, whereas stochastic models provide better outcomes for the low-frequency IMFs. These two types of models are used for the direct forecasting of the IMFs with high and low frequencies.

The models used for the prediction purpose are briefly described as follows.

3.3.1. Autoregressive Integrated Moving Average (ARIMA) Model

For the prediction of the IMFs, the autoregressive moving average model is used as follows: $\begin{matrix} (20) & {IMF}_{k}^{i} = ψ_{1} {IMF}_{k - 1}^{i} + \dots + ψ_{p} {IMF}_{k - p}^{i} + ε_{k}^{i} + φ_{1} ε_{k - 1}^{i} + \dots + φ_{q} ε_{k - q}^{i}, \end{matrix}$ where ${IMF}_{k}^{i}$ is the $i^{th}$ IMF and $ε_{k}^{i}$ is the $i^{th}$ residual obtained through CEEMDAN, and $p$ and $q$ are the lag orders of the autoregressive and moving average terms, respectively. When the time-series data are nonstationary, the series can be made stationary by differencing to an appropriate degree. In that case, the model is called $ARIMA(p, d, q)$ , where $d$ is the degree of differencing that makes the series stationary.
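A one-step prediction from the model above, together with the differencing used in the integrated case, can be sketched as follows. The coefficient values and inputs are hypothetical; in practice the coefficients would be estimated from the training data, and the unknown innovation at the predicted step is set to its mean of zero.

```python
def difference(series, d=1):
    """Apply first-order differencing d times."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def arma_one_step(values, residuals, psi, phi):
    """One-step ARMA prediction with the next innovation set to zero.

    values and residuals are ordered oldest to newest; psi holds the p
    autoregressive coefficients and phi the q moving-average coefficients.
    """
    ar = sum(c * values[-k] for k, c in enumerate(psi, start=1))
    ma = sum(c * residuals[-k] for k, c in enumerate(phi, start=1))
    return ar + ma


once = difference([1, 4, 9, 16])          # first differences
twice = difference([1, 4, 9, 16], d=2)    # second differences
forecast = arma_one_step([1.0, 2.0], [0.5, 1.0], psi=[0.5, 0.25], phi=[0.2])
```

The quadratic sample series needs two rounds of differencing before it becomes constant, which illustrates what the degree $d$ controls.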

3.3.2. Group Method of Data Handling (GMDH)-Type NN Model

GMDH is a type of unexplored neural network. The GMDH-NN models are established by considering the evolutionary method of modeling (GEvoM), which is a program that generates a polynomial-type neural network for modeling the data. The input variables, the hidden layers and their neurons, the best model structure, and the number of layers are determined automatically in these networks. Based on the evaluation criteria, some of the neurons are selected, and the outputs of these selected neurons become the inputs of the next layer. For the selection of the neurons, the prediction mean square criterion is considered, using the transfer functions shown in Table 1.

Table 1

Transfer functions for GMDH-NN algorithms.

Transfer function	Expression
Sigmoid function	$v = 1 / \left( 1 + e^{- u} \right)$
Tangent function	$v = \tan\left( u \right)$
Radial basis function	$v = e^{- u^{2}}$
Polynomial function	$v = u$

The procedure is repeated until the final layer, where only one neuron, the one that gives the prediction, is retained. However, GMDH-NN considers only the relation between pairs of variables and ignores the effect of an individual variable. The relationship between input and output variables is generally expressed through the Volterra functional series known as the Kolmogorov–Gabor polynomial [38]: $\begin{matrix} (21) & v = b_{0} + \sum_{k = 1}^{n} b_{k} u_{k} + \sum_{k = 1}^{n} \sum_{l = 1}^{n} b_{k l} u_{k} u_{l} + \sum_{k = 1}^{n} \sum_{l = 1}^{n} \sum_{m = 1}^{n} b_{k l m} u_{k} u_{l} u_{m} + \dots . \end{matrix}$

A refined form of GMDH-NN is the revised GMDH architecture (RGMDH-NN), which considers not only pairs of variables but also each variable individually; the remaining procedure of RGMDH is the same as that of GMDH.
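
The pairwise neuron selection described above can be illustrated with a short sketch. It assumes the common GMDH formulation in which each neuron is the second-order truncation of the Kolmogorov–Gabor polynomial in two inputs, fitted by least squares and ranked by MSE. The data are synthetic, the function names are hypothetical, and the neurons are scored on the fitting data for brevity, whereas a real GMDH layer would normally score them on a separate validation split.

```python
import numpy as np


def design_matrix(u1, u2):
    """Columns of the second-order Kolmogorov-Gabor polynomial in two inputs."""
    return np.column_stack([np.ones_like(u1), u1, u2, u1 * u2, u1**2, u2**2])


def fit_neuron(u1, u2, v):
    """Least-squares coefficients for v ~ b0 + b1*u1 + b2*u2 + b3*u1*u2 + b4*u1^2 + b5*u2^2."""
    coeffs, *_ = np.linalg.lstsq(design_matrix(u1, u2), v, rcond=None)
    return coeffs


def neuron_mse(coeffs, u1, u2, v):
    """Mean square error used to rank candidate neurons."""
    return float(np.mean((design_matrix(u1, u2) @ coeffs - v) ** 2))


rng = np.random.default_rng(0)
u1, u2, u3 = rng.normal(size=(3, 200))
v = 1.0 + 2.0 * u1 - 0.5 * u1 * u2   # synthetic target; u3 plays no role

# Fit one neuron per input pair and keep the pair with the lowest MSE.
pairs = {"u1,u2": (u1, u2), "u1,u3": (u1, u3), "u2,u3": (u2, u3)}
scores = {name: neuron_mse(fit_neuron(a, b, v), a, b, v) for name, (a, b) in pairs.items()}
best = min(scores, key=scores.get)
print(best, scores)
```

The outputs of the best-scoring neurons would then serve as inputs to the next layer, which is the step the text describes.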

3.3.3. Radial Basis Function Neural Network (RBFNN)

As an ANN technique, RBFNN is used to predict the decomposed IMFs and the residual components. RBFNN is selected for the simplicity of its structure and its flexibility in the number of neurons [39]. In contrast to other feedforward neural networks, the RBFNN has a single hidden layer. Moreover, the RBFNN has good approximation capability and fast convergence [40].

The structure of the RBFNN consists of two parts: a nonlinear mapping from the input layer (the first layer in Figure 2) to the hidden layer (the second layer in Figure 2), and a linear mapping from the hidden layer to the output layer (the third layer in Figure 2). Some radial basis functions are listed in Table 2.

[figure omitted; refer to PDF]

Table 2

Transfer functions for RBFN algorithms.

Radial basis function	Expression
Power function	$ψ(w) = w^{c}, c \text{ odd}, w \in R$
Gaussian function	$ψ(w) = \exp(- w^{2} / (2 c^{2})), c > 0, w \in R$
Square root function	$ψ(w) = \sqrt{w^{2} + c^{2}}, c > 0, w \in R$
Hyperbolic tangent function	$ψ(w) = (1 - e^{- 2 w}) / (1 + e^{- 2 w}), w \in R$
Sigmoid function	$ψ(w) = (1 + e^{- w})^{- 1}, w \in R$
Thin plate spline function	$ψ(w) = w^{2} \log w, w \in R$
Reciprocal square root function	$ψ(w) = (w^{2} + c^{2})^{- 1 / 2}, c > 0, w \in R$
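
The two-part RBFNN structure (a nonlinear hidden layer followed by a linear output layer) can be sketched with the Gaussian function of Table 2. This is only an illustration under simplifying assumptions: the inputs are one-dimensional, the centers and the width `c` are fixed by hand rather than learned, and the output weights are obtained by ordinary least squares. All names are hypothetical.

```python
import numpy as np


def gaussian_rbf(w, c):
    """Gaussian radial basis function: exp(-w^2 / (2 c^2))."""
    return np.exp(-(w**2) / (2.0 * c**2))


def hidden_layer(x, centers, c):
    """Nonlinear part: distance from every input to every center, passed through the RBF."""
    dist = np.abs(x[:, None] - centers[None, :])
    return gaussian_rbf(dist, c)


def fit_output_weights(x, y, centers, c):
    """Linear part: least-squares weights from hidden activations to the output."""
    weights, *_ = np.linalg.lstsq(hidden_layer(x, centers, c), y, rcond=None)
    return weights


def predict(x, centers, c, weights):
    """Network output: weighted sum of the hidden activations."""
    return hidden_layer(x, centers, c) @ weights


# Toy data generated by a known network so that the fit can be checked.
x = np.linspace(0.0, 1.0, 40)
centers = np.linspace(0.0, 1.0, 5)
true_weights = np.array([1.0, -2.0, 0.5, 3.0, -1.0])
y = hidden_layer(x, centers, 0.2) @ true_weights
weights = fit_output_weights(x, y, centers, 0.2)
max_error = float(np.max(np.abs(predict(x, centers, 0.2, weights) - y)))
print(weights, max_error)
```

Because the output layer is linear in the weights, this step reduces to a least-squares problem once the centers and width are fixed.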

3.3.4. Multistep-Ahead Prediction

Time-series prediction can be made for a single period (one-step-ahead prediction) or for multiple periods (multistep-ahead prediction). Unlike one-step-ahead prediction, multistep-ahead prediction has to deal with problems such as error accumulation, growing uncertainty, and loss of accuracy, so accurate prediction over a long horizon is challenging. A multistep-ahead prediction consists of predicting the next H values of a time series of N observations, where H > 1 denotes the forecasting horizon.

The DirRec strategy [41] combines the principles of the direct and recursive (iterated) strategies. The recursive strategy is used to predict all IMFs with three different models, and finally all the predicted IMFs are ensembled and predicted by the direct approach. Like the direct strategy, DirRec fits a different model for each horizon; like the recursive strategy, it enlarges the set of inputs at each step by adding variables corresponding to the predictions of the previous step. Consequently, the embedding size differs across horizons.
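
A minimal sketch of the DirRec idea follows, assuming linear models fitted by least squares in place of the ARIMA, RGMDH, and RBFNN models used in the study. Each horizon gets its own model, and the input window grows by one value per horizon. For simplicity the models are trained on observed values only, and the forecasts replace the unknown values at prediction time. The function names and the toy series are hypothetical.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares linear model with intercept; returns the coefficient vector."""
    A = np.column_stack([X, np.ones(len(X))])
    coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coeffs


def dirrec_forecast(series, d, H):
    """Forecast H steps ahead with one model per horizon and a growing input window."""
    series = np.asarray(series, dtype=float)
    n = len(series)
    inputs = list(series[-d:])           # last d observations
    forecasts = []
    for h in range(1, H + 1):
        width = d + h - 1                # embedding size grows with the horizon
        rows = range(n - width)          # each row: `width` inputs followed by one target
        X = np.array([series[i:i + width] for i in rows])
        y = np.array([series[i + width] for i in rows])
        coeffs = fit_linear(X, y)
        y_hat = float(np.dot(coeffs[:-1], inputs) + coeffs[-1])
        forecasts.append(y_hat)
        inputs.append(y_hat)             # the previous forecast becomes an input
    return forecasts


trend = np.arange(20.0)                  # toy series 0, 1, ..., 19
forecasts = dirrec_forecast(trend, d=2, H=3)
print(forecasts)
```

The growing `width` is what the text refers to as an embedding size that differs across horizons.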

4. Study Area and Experimental Design

4.1. Selection of Area for Study

Pakistan is blessed with considerable geological potential and has numerous reserves of minerals such as coal, copper, gold, and limestone, which are very useful for industrial development. However, the country has not yet used its natural resources to the fullest to promote growth and eliminate poverty. The province richest in minerals is Baluchistan, which holds approximately 80 to 85% of the minerals; the remaining 10 to 15% are found in KPK, Sindh, and Punjab. Despite Pakistan’s precious mineral resources and two consecutive mineral policies, this sector contributes poorly to the country’s GDP. Possible reasons include insufficient assessment and monitoring, political instability, weather-related problems, a shortage of foreign investment, and insecurity in the mineral-rich areas (https://www.pc.gov.pk/uploads/pub/FIRST_05_PAGES_STRATEGY_FOR_MINERAL_SECTOR_DEVELOPMENT_IN_PAKISTAN.pdf).

4.2. Description of Data

The observed data consist of the production of the principal mineral resources of Pakistan, namely natural gas, oil, coal, and gypsum, measured in metric tons. Each series consists of 168 monthly observations recorded from July 2005 to June 2019. The data are divided into a training dataset and a testing dataset for assessing model performance. The training dataset contains 118 observations from July 2005 to April 2015, and the testing dataset contains 50 observations from May 2015 to June 2019. The training dataset thus comprises about 70% of the observed series, and the testing dataset about 30%.
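
The split can be checked with a few lines of arithmetic. The sketch below only enumerates the 168 monthly periods and cuts them at observation 118; the helper name `month_labels` is hypothetical and no production values are involved.

```python
def month_labels(start_year, start_month, count):
    """Return `count` consecutive (year, month) pairs starting at the given month."""
    labels = []
    for k in range(count):
        total = (start_month - 1) + k
        labels.append((start_year + total // 12, total % 12 + 1))
    return labels


months = month_labels(2005, 7, 168)      # July 2005 onward, 168 months
train, test = months[:118], months[118:]

train_share = 100 * len(train) / len(months)
test_share = 100 * len(test) / len(months)
print(train[0], train[-1], len(train), round(train_share, 1))
print(test[0], test[-1], len(test), round(test_share, 1))
```

The split is chronological, so the testing observations always follow the training observations in time.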

4.3. Comparison of the Proposed Hybrid Model with Other Models

Both suggested models are compared with other prediction models, with and without decomposition and denoising techniques. For convenience, we name the models used for comparison 1-stage, 2-stage, and 3-stage models:

(i) 1-stage model: models without denoising and decomposition techniques, i.e., ARIMA, are selected in this stage. We call them 1-stage models, as in [29].

(ii) 2-stage model: in the 2-stage models, denoising techniques (EMD/WA) with noise removal capacity are combined with a prediction model, i.e., EMD/WA-ARIMA, EMD/WA-RGMDH, and EMD/WA-RBFNN. Different prediction models are selected to compare the statistical model with the models based on artificial intelligence, i.e., the RGMDH and RBFNN models. These 2-stage models are taken from [42] for comparison.

(iii) 3-stage model: in these models, both denoising and decomposition strategies are applied; that is, EMD-EEMD-MM is selected from [15] for comparison. Multiple models are selected for prediction under the 3-stage models, keeping the same strategy as in the proposed model. A direct 48-step-ahead forecasting strategy is used in the prediction step, within a multistep-ahead forecasting methodology. Three prediction methods are used: one traditional statistical model, i.e., ARIMA(p, d, q), and two machine learning methods, i.e., GMDH and RBFNN.

4.4. Accuracy Measure Techniques

The performance of a model is assessed by measuring the closeness of the predicted values to the observed values for the test dataset. The prediction accuracy of the selected and proposed models is evaluated with several measures, i.e., signal-to-noise ratio (SNR), mean relative error (MRE), mean absolute error (MAE), mean square error (MSE), and mean absolute percentage error (MAPE) [43]. The mathematical expressions of MRE, MAE, MSE, and MAPE are as follows: $\begin{matrix} (22) & MRE = \frac{1}{n} \sum_{p = 1}^{n} \frac{\left| g p_{0} - g p_{pred} \right|}{g p_{0}}, \\ & MAE = \frac{1}{n} \sum_{p = 1}^{n} \left| g p_{0} - g p_{pred} \right|, \\ & MSE = \frac{1}{n} \sum_{p = 1}^{n} \left( g p_{0} - g p_{pred} \right)^{2}, \\ & MAPE = \frac{\left| g p_{m 0} - g p_{m pred} \right|}{g p_{m 0}} \times 100, \end{matrix}$ where $g p_{pred}$ and $g p_{0}$ are the predicted and observed data, respectively, $n$ is the data size, $g p_{m 0}$ is the mean of the observed data, and $g p_{m pred}$ is the mean of the predicted values. MRE, MAE, and MSE measure the departure between the observed and predicted values. The neurons of the neural networks (GMDH and RGMDH) are selected according to their MSEs.
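
The four measures of equation (22) can be computed as in the sketch below. It assumes that the differences are taken in absolute value, that the observed values are positive (as production figures are), and that MAPE is computed from the means of the two series, as the definitions of $g p_{m 0}$ and $g p_{m pred}$ suggest. The function names and the toy numbers are hypothetical.

```python
def mre(obs, pred):
    """Mean relative error: average of |obs - pred| / obs."""
    return sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)


def mae(obs, pred):
    """Mean absolute error: average of |obs - pred|."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def mse(obs, pred):
    """Mean square error: average of (obs - pred)^2."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def mape_of_means(obs, pred):
    """Percentage gap between the mean of the observed and the mean of the predicted values."""
    mean_obs = sum(obs) / len(obs)
    mean_pred = sum(pred) / len(pred)
    return abs(mean_obs - mean_pred) / mean_obs * 100.0


# Toy values for illustration only.
observed = [100.0, 200.0]
predicted = [110.0, 210.0]
print(mre(observed, predicted), mae(observed, predicted),
      mse(observed, predicted), mape_of_means(observed, predicted))
```

MAE and MSE are expressed in the units of the series (and their square), whereas MRE and MAPE are scale-free, which matters when comparing minerals with very different production levels.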

5. Results and Discussion

The step-by-step procedure of the proposed method is described in Figure 1. In the first step (D-step), the original series is denoised using EMD or WA. In the second step (Decompose-step), the denoised series is decomposed using the CEEMDAN technique. In the third step, the IMFs obtained through CEEMDAN are predicted using the ARIMA, RGMDH, and RBFNN models. At the final stage, all the predicted IMFs are ensembled. The results of each stage are as follows:

D-stage results: according to Figure 1, the original series is denoised in the first step by using two noise removal filters; the results of denoising are described as follows:

Wavelet-based denoising: following the ELT process shown in Figure 1 and using equations (1) and (2), the approximations are calculated for the mineral resource data of Pakistan, and soft and hard thresholding are used to remove the noise from the coefficients of the mineral production time series. The soft and hard thresholding rules are given by equations (3) and (4), respectively. Because it yields the lower MSE, hard thresholding is then used to reconstruct the denoised series for wavelet analysis.

EMD-based denoising: to remove the noise from the mineral production data using EMD, the IMFs are calculated using equation (7). All IMFs except the last two are then denoised with the soft and hard thresholding rules of equations (3) and (4), as in WA. The last two IMFs are smooth and have low-frequency characteristics, so they need no denoising. The IMFs denoised by hard thresholding are used to reconstruct the noise-free mineral resource time series through equation (8). The WA- and EMD-based denoising results for gas and oil are shown together in Figure 3. The statistical measures of the original and denoised series of all four minerals, i.e., mean, standard deviation, MRE, MAE, and MSE for the EMD and wavelet noise removal techniques, are shown in Table 3. According to these findings, EMD performs better than WA.
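
The thresholding step can be sketched as follows. The code assumes that the soft and hard rules take their standard forms (hard: keep a coefficient unchanged if its magnitude exceeds the threshold, otherwise set it to zero; soft: additionally shrink the kept coefficients toward zero by the threshold) and that the series is reconstructed by summing the components. A single hand-picked threshold is used for all IMFs, which is a simplification, and the names are hypothetical.

```python
def hard_threshold(coeffs, t):
    """Keep coefficients whose magnitude exceeds t; set the rest to zero."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones toward zero by t."""
    return [(abs(c) - t) * (1.0 if c > 0 else -1.0) if abs(c) > t else 0.0
            for c in coeffs]


def denoise_imfs(imfs, t, rule=hard_threshold, keep_last=2):
    """Threshold all IMFs except the last `keep_last`, then sum the components."""
    cut = max(len(imfs) - keep_last, 0)
    cleaned = [rule(imf, t) for imf in imfs[:cut]] + [list(imf) for imf in imfs[cut:]]
    return [sum(values) for values in zip(*cleaned)]


# Toy coefficients and components for illustration only.
coeffs = [3.0, -0.5, 1.2, -2.0]
hard = hard_threshold(coeffs, 1.0)
soft = soft_threshold(coeffs, 1.0)
denoised = denoise_imfs([[0.5, 2.0], [1.0, 1.0], [3.0, 3.0]], 1.0)
print(hard, soft, denoised)
```

Hard thresholding leaves the retained coefficients untouched, which is one reason it can give a lower reconstruction MSE than soft thresholding.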

According to the statistical measures, WA and EMD behave differently across the four minerals. The means and standard deviations of the original series, the EMD-denoised series, and the WA-denoised series are almost the same for all minerals, except that for gas and coal production the standard deviation is smaller with WA. On the other measures, i.e., MRE, MAE, and MSE, EMD performs better than WA, with lower values for all minerals. Nevertheless, both EMD and WA are considered suitable for denoising the mineral resource series in the long run. In the decomposition stage, the series denoised by EMD and by WA are therefore used separately as inputs to extract the characteristics that vary with frequency, i.e., high and low frequencies.

Decompose-stage results: to extract the local time-varying features, the time series denoised by WA/EMD are decomposed further into six IMF components and one residual term. The EMD and CEEMDAN decomposition methods are used to extract the IMFs of the four minerals. The decomposition results of the EMD-CEEMDAN technique for gas and oil are presented in Figure 4; each mineral series is decomposed into six IMF components and one residual term. The extracted IMFs represent the characteristics of the mineral production time series: the first IMFs show the higher frequencies, whereas the last three IMFs represent the lower frequencies, and the residual represents the trend.

The results of the WA-CEEMDAN decomposition are shown in Figure 5. The extracted IMFs represent the attributes of the mineral production time series: the first IMFs have the higher frequencies, the frequency then decreases gradually up to the sixth IMF, and the residual represents the trend (Figures 4 and 5). The amplitude of the white noise is set to 0.2, and the maximum number of ensemble members is set to 1000. Almost all IMFs and residuals of all minerals show identical characteristics under the EMD-CEEMDAN and WA-CEEMDAN decomposition methods.

P-step results: three methods are adopted to predict all extracted IMFs and residuals precisely: one traditional statistical method, i.e., ARIMA(p, d, q), and two nonlinear methods, i.e., GMDH-NN and RBFNN. These are used to predict the IMFs and the residual for the production of all four minerals. The mineral resource data of the four minerals are partitioned into 70% for the training dataset and 30% for the testing dataset. The parameters and structure of each model are estimated using the 118 training observations. The suggested models and the models used for comparison are then validated using the remaining 30% of the data. After multiple models are estimated for every IMF component and residual, the model with the lowest MRE, MAE, and MSE is considered the most appropriate and is selected for the prediction of that component. The results for the training dataset of the suggested models and all comparison models for gas, oil, coal, and gypsum production are given in Table 4. The suggested models EMD-CEEMDAN-MM and WA-CEEMDAN-MM prove effective for all four minerals, with the minimum values of MRE, MAE, and MSE compared with the 1-stage, 2-stage, and 3-stage evaluation models. Of the two, WA-CEEMDAN-MM attains a lower MSE than EMD-CEEMDAN-MM. The model with the worst prediction is the 1-stage model, as shown in Table 4, with the maximum values of MRE, MAE, and MSE.
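
The multi-model (MM) selection described above can be sketched as follows. For each component the candidate with the lowest error is kept, and the selected predictions are then combined. The sketch ranks candidates by MSE alone and assumes that the ensemble is the sum of the component predictions; both are simplifying assumptions, and the candidate predictions below are made-up numbers rather than outputs of fitted models.

```python
def mse(obs, pred):
    """Mean square error between an observed and a predicted component."""
    return sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs)


def select_and_ensemble(components, candidates):
    """Pick the lowest-MSE model per component and sum the selected predictions.

    components -- observed component series (IMFs and residual)
    candidates -- one dict per component mapping model name -> predicted series
    """
    chosen = []
    total = [0.0] * len(components[0])
    for observed, preds in zip(components, candidates):
        best = min(preds, key=lambda name: mse(observed, preds[name]))
        chosen.append(best)
        total = [t + p for t, p in zip(total, preds[best])]
    return chosen, total


# Two toy components: one oscillating, one trend-like.
components = [[1.0, -1.0, 1.0], [10.0, 11.0, 12.0]]
candidates = [
    {"ARIMA": [0.0, 0.0, 0.0], "RGMDH": [0.9, -1.1, 1.0], "RBFNN": [2.0, 2.0, 2.0]},
    {"ARIMA": [10.0, 11.0, 12.5], "RGMDH": [9.0, 9.0, 9.0], "RBFNN": [0.0, 0.0, 0.0]},
]
chosen, ensemble = select_and_ensemble(components, candidates)
print(chosen, ensemble)
```

Selecting a model per component lets a statistical model handle the smooth components while a nonlinear model handles the oscillating ones.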

Here, the 1-stage model, i.e., ARIMA, which applies no denoising or decomposition to the mineral production time series, attains the maximum MSE. The predictions of the suggested EMD-CEEMDAN-MM model are compared graphically with those of the 2-stage EMD-ARIMA and EMD-RGMDH models in Figure 6, and the predictions of WA-CEEMDAN-MM are compared with those of the 2-stage WA-ARIMA and WA-RGMDH models in Figure 7. In Table 4, the MSE of the RBFNN-based models is greater than the MSEs of all other models, so RBFNN is omitted from the graphical presentation of the predicted results. Denoising, decomposition, and ensemble principles can thus be used to predict the production of mineral resources. In Table 4, the 2-stage ARIMA and RGMDH models perform better than the single-stage model, and among the 2-stage models, the WA-based models perform better than the EMD-based models.

Overall, Table 4 shows that the suggested models predict the production of mineral resources well: they reduce the complexity of the data and improve prediction performance over the 1-stage, 2-stage, and 3-stage models.

The prediction errors for the testing dataset are presented in Table 5. As with the training dataset, the suggested models perform better than all benchmark models on the testing dataset. Table 5 shows that the WA-CEEMDAN-MM model, with the minimum values of MRE, MAE, and MSE, performs better than all other models. The graphical presentation of the suggested models on the testing dataset, for gas production only, is given in Figure 8 (EMD-based) and Figure 9 (WA-based). These figures show that the suggested models perform better than all other existing models.

Overall comparison of the proposed model with denoised and decomposed models: in general, Tables 4 and 5 show that removing the noise from the mineral production data with the EMD and WA techniques gives better results than a single model without denoising and decomposition. For all four minerals, the MAE, MRE, and MSE values in Tables 4 and 5 are better for the 2-stage models than for the 1-stage models. The 1-stage ARIMA model predicts some of the low-frequency IMFs precisely, but not the high-frequency IMFs, which contain more time-varying characteristics. Moreover, different statistical and machine learning models are applied to the denoised series to predict mineral production and to explore the performance of simple and complex models. As Tables 4 and 5 show, RBFNN combined with EMD and WA performs worse than the EMD- and WA-based ARIMA and RGMDH models. This suggests that simple models can be preferred over complex models such as RBFNN for predicting mineral production.

The 2-stage models performed on average 6.7% better than the 1-stage model, and the existing 3-stage model performed on average 76.3% better than the 1-stage model and 87.3% better than the 2-stage models.

The proposed 3-stage EMD/WA-CEEMDAN-MM model attained on average a 29.4% lower MSE than the 1-stage model, a 49.2% lower MSE than the 2-stage models, and a 68.75% lower MSE than the existing 3-stage model. Similarly, its MRE was on average 10.87% lower than that of the 1-stage model, 2.0% lower than that of the 2-stage models, and 75.5% lower than that of the existing 3-stage model. The corresponding average decreases in MAE were 22.49%, 52.38%, and −46.63% relative to the 1-stage, 2-stage, and existing 3-stage models.
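
Percentage comparisons of this kind are usually computed as a relative reduction of the error with respect to a baseline. The source does not state the exact formula or how the averages over minerals and models were formed, so the helper below is only one plausible definition, shown with toy numbers; `percent_reduction` is a hypothetical name.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of an error measure with respect to a baseline, in percent."""
    return (baseline - proposed) / baseline * 100.0


# Toy error values for illustration only.
improvement = percent_reduction(200.0, 150.0)    # proposed error is lower
deterioration = percent_reduction(100.0, 120.0)  # proposed error is higher
print(improvement, deterioration)
```

Under this definition a positive value means the proposed model has the lower error, and a negative value means its error is higher than the baseline.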

The 3-stage models, i.e., EMD-CEEMDAN-MM and WA-CEEMDAN-MM, perform better than the 1-stage and 2-stage models because combining denoising and decomposition techniques reduces the complexity of the mineral production data. The integrated denoising and decomposition features of the 3-stage models enhance the prediction accuracy of mineral production, as shown in Tables 4 and 5. Moreover, the WA-based hybrid models provide better prediction accuracy than the EMD-based models.

[figure omitted; refer to PDF]

[figures omitted; refer to PDF]

[figure omitted; refer to PDF][figure omitted; refer to PDF]

Table 3

Statistical measures of WA- and EMD-based denoised production of four-mineral time-series datasets.

Mineral resources	Series	$μ$	$σ$	MAE	MRE	MSE
Gas production	Original series	122355	5477.39
	EMD	122352.4	5452.545	64.56	0.00	5849.80
	WA	122355	3757.383	3258.05	0.03	14792170

Oil production	Original series	2341219	353450.2
	EMD	2341179	353338.3	441.92	0.00	266404.30
	WA	2341219	343459.1	62569.99	0.03	6181270027

Gypsum production	Original series	109677.6	56078.37
	EMD	109673.2	56031.31	178.88	0.00	43509.52
	WA	109675.9	54138.90	9578.65	0.10	143169479

Coal production	Original series	297596.6	73584.54
	EMD	297615.9	73382.25	315.16	0.00	144581.50
	WA	297596.6	58222.67	30897.95	0.12	1552598008

Table 4

The evaluation of the prediction error of the suggested model (EMD-CEEMDAN-MM and WA-CEEMDAN-MM) in comparison with other existing models for gas, oil, coal, and gypsum production.

Mineral production	Stage	Model	MAE	MRE	MSE
Gas production	1-S	ARIMA	5562.08	0.05	34688672
	2-S	WA-ARIMA	564.79	0.00	500307.2
		WA-RGMDH	1041.51	0.01	1621799
		WA-RBFN	22919.26	0.19	545273283
		EMD-ARIMA	3750.39	0.03	23240715
		EMD-RGMDH	3753.87	0.03	23340680
		EMD-RBFN	22916.16	0.19	556913324
	3-S	EMD-EEMD-MM	2020.75	0.02	7766789
		WA-CEEMDAN-MM	158.46	0.00	38472.38
		EMD-CEEMDAN-MM	1195.00	0.01	2336537

Oil production	1-S	ARIMA	86924.62	0.04	12933064462
	2-S	WA-ARIMA	32680.16	0.01	1778236730
		WA-RGMDH	24693.97	0.01	981106280
		WA-RBFN	441208.80	0.17	313742347809
		EMD-ARIMA	86826.35	0.04	12907664665
		EMD-RGMDH	84082.76	0.04	11935015062
		EMD-RBFN	445083.30	0.17	318704000000
	3-S	EMD-EEMD-MM	34349.94	0.01	2014443387
		WA-CEEMDAN-MM	5676.63	0.00	54755189
		EMD-CEEMDAN-MM	26569.21	0.01	1213733733

Coal production	1-S	ARIMA	47665.83	0.19	3554183458
	2-S	WA-ARIMA	4919.99	0.02	63677954
		WA-RGMDH	7587.04	0.03	109980128
		WA-RBFN	65501.76	0.20	6500657737
		EMD-ARIMA	47592.77	0.19	3543414179
		EMD-RGMDH	46715.48	0.19	3480224927
		EMD-RBFN	74976.31	0.25	8461474651
	3-S	EMD-EEMD-MM	14999.79	0.06	365101582
		WA-CEEMDAN-MM	2384.12	0.01	10215110
		EMD-CEEMDAN-MM	13761.37	0.05	328725693

Gypsum production	1-S	ARIMA	16095.92	0.16	494553674
	2-S	WA-ARIMA	10218.46	0.09	239738995
		WA-RGMDH	9318.04	0.08	217996433
		WA-RBFN	44669.82	0.42	3399773324
		EMD-ARIMA	16040.73	0.16	492041414
		EMD-RGMDH	16495.23	0.16	501503414
		EMD-RBFN	45538.21	0.43	3549328950
	3-S	EMD-EEMD-MM	6876.55	0.08	77558497
		WA-CEEMDAN-MM	2569.18	0.02	25152289
		EMD-CEEMDAN-MM	21073.19	0.22	773453648

Table 5

The evaluation of the prediction error of the suggested models (EMD-CEEMDAN-MM and WA-CEEMDAN-MM) in comparison with other models for all four minerals on the testing dataset.

Mineral production	Stage	Model	MAE	MRE	MSE
Gas production	1-S	ARIMA	3253.46	0.03	16505715
	2-S	WA-ARIMA	3357.08	0.03	14297822
		WA-RGMDH	85.12	0.00	10700.46
		WA-RBFN	85178.71	0.70	7484485277
		EMD-ARIMA	3243.62	0.03	16313393
		EMD-RGMDH	3564.16	0.03	17443358
		EMD-RBFN	85223.53	0.70	7507698054
	3-S	EMD-EEMD-MM	120692.20	1.00	14573026239
		WA-CEEMDAN-MM	55.78	0.00	4528.72
		EMD-CEEMDAN-MM	1121.05	0.01	1835955

Oil production	1-S	ARIMA	104425.90	0.04	16912634727
	2-S	WA-ARIMA	92291.30	0.03	11294925595
		WA-RGMDH	5925.51	0.00	64583522
		WA-RBFN	1919080.00	0.70	3802014000000
		EMD-ARIMA	104445.20	0.04	16895725118
		EMD-RGMDH	100369.30	0.04	15464333281
		EMD-RBFN	1919040.00	0.70	3814679000000
	3-S	EMD-EEMD-MM	2722506.00	1.00	7416098000000
		WA-CEEMDAN-MM	7888.74	0.00	147703705
		EMD-CEEMDAN-MM	34261.48	0.01	1730054305

Coal production	1-S	ARIMA	52706.14	0.17	3986455945
	2-S	WA-ARIMA	34790.81	0.11	1834667964
		WA-RGMDH	3399.37	0.01	17296854
		WA-RBFN	234617.90	0.69	61083423522
		EMD-ARIMA	52689.36	0.17	3984927942
		EMD-RGMDH	47981.98	0.16	3413398302
		EMD-RBFN	234083.70	0.68	63318268986
	3-S	EMD-EEMD-MM	332627.70	1.06	111602246835
		WA-CEEMDAN-MM	2207.27	0.01	9144544
		EMD-CEEMDAN-MM	20414.31	0.07	20414.31

Gypsum production	1-S	ARIMA	26046.03	0.14	1164947275
	2-S	WA-ARIMA	29409.92	0.16	1482365241
		WA-RGMDH	1007.13	0.00	1651773
		WA-RBFN	135000.60	0.70	19562632811
		EMD-ARIMA	26005.65	0.14	11618477
		EMD-RGMDH	27240.40	0.14	1228486486
		EMD-RBFN	134601.00	0.69	20165977385
	3-S	EMD-EEMD-MM	193557.50	1.02	38076758501
		WA-CEEMDAN-MM	1023.04	0.01	2727448
		EMD-CEEMDAN-MM	11205.51	0.06	213535838

6. Conclusion

For the optimal supply and use of mineral resources, an accurate prediction of mineral production is necessary. Here, data processing methods are used to increase the prediction accuracy for such stochastic data by making efficient use of decomposition techniques. Using three strategies, i.e., denoising, decomposition, and ensemble, two 3-stage hybrid models are suggested, EMD-CEEMDAN-MM and WA-CEEMDAN-MM, of which WA-CEEMDAN-MM performed better than EMD-CEEMDAN-MM for the nonstationary and nonlinear mineral data. The production data of four minerals are used to evaluate the performance of both models. In general, the suggested models performed better for all four minerals than the 1-stage, 2-stage, and existing 3-stage models. Three evaluation measures are reported, i.e., MRE, MAE, and MSE. These 3-stage hybrid models can be used for the decomposition and prediction of any nonlinear and nonstationary data [44–47].

Acknowledgments

This work was supported by grants from the National Natural Science Foundation of China program (41801339). The authors are also thankful to the Deanship of Scientific Research at King Saud University, through research group no. RG-1437-027.

References

[1] C. Di, X. Yang, X. Wang, "A four-stage hybrid model for hydrological time series forecasting," PLoS One, vol. 9 no. 8, DOI: 10.1371/journal.pone.0104663, 2014.

[2] N. E. Huang, Z. Shen, S. R. Long, "The empirical mode decomposition and the Hilbert spectrum for nonlinear and non-stationary time series analysis," Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences, vol. 454 no. 1971, pp. 903-995, DOI: 10.1098/rspa.1998.0193, 1998.

[3] J. Dybała, R. Zimroz, "Rolling bearing diagnosing method based on empirical mode decomposition of machine vibration signal," Applied Acoustics, vol. 77, pp. 195-203, DOI: 10.1016/j.apacoust.2013.09.001, 2014.

[4] Y. Kopsinis, S. McLaughlin, "Development of EMD-based denoising methods inspired by wavelet thresholding," IEEE Transactions on Signal Processing, vol. 57 no. 4, pp. 1351-1362, DOI: 10.1109/tsp.2009.2013885, 2009.

[5] T. Peng, J. Zhou, C. Zhang, W. Fu, "Streamflow forecasting using empirical wavelet transform and artificial neural networks," Water, vol. 9 no. 6, DOI: 10.3390/w9060406, 2017.

[6] N. Bokde, A. Feijóo, K. Kulat, "Analysis of differencing and decomposition preprocessing methods for wind speed prediction," Applied Soft Computing, vol. 71, pp. 926-938, DOI: 10.1016/j.asoc.2018.07.041, 2018.

[7] M. Yang, Y. F. Sang, C. Liu, Z. Wang, "Discussion on the choice of decomposition level for wavelet based hydrological time series modeling," Water (Switzerland), vol. 8 no. 5, DOI: 10.3390/w8050197, 2016.

[8] T. Peng, J. Zhou, C. Zhang, W. Fu, "Streamflow forecasting using empirical wavelet transform and artificial neural networks," Water, vol. 9 no. 6, 2017.

[9] W. Chen, D. Zhang, Y. Chen, "Random noise reduction using a hybrid method based on ensemble empirical mode decomposition," Journal of Seismic Exploration, vol. 26 no. 3, pp. 227-249, 2017.

[10] Z. Islam, "Literature review on physically based hydrological modeling," Ph.D. thesis, 2011.

[11] H. Liu, C. Chen, H. Q. Tian, Y. F. Li, "A hybrid model for wind speed prediction using empirical mode decomposition and artificial neural networks," Renewable Energy, vol. 48, pp. 545-556, DOI: 10.1016/j.renene.2012.06.012, 2012.

[12] D. Srinivasan, "Energy demand prediction using GMDH networks," Neurocomputing, vol. 72 no. 1–3, pp. 625-629, DOI: 10.1016/j.neucom.2008.08.006, 2008.

[13] S. Kim, H. Kim, "A new metric of absolute percentage error for intermittent demand forecasts," International Journal of Forecasting, vol. 32 no. 3, pp. 669-679, DOI: 10.1016/j.ijforecast.2015.12.003, 2016.

[14] X. J. Wu, G. C. Jiang, X. J. Wang, N. Fang, L. Zhao, Y. M. Ma, "Prediction of reservoir sensitivity using RBF neural network with trainable radial basis function," Neural Computing and Applications, vol. 22, pp. 947-953, DOI: 10.1007/s00521-011-0787-z, 2013.

[15] C. L. Wu, K. W. Chau, Y. S. Li, "Methods to improve neural network performance in daily flows prediction," Journal of Hydrology, vol. 372 no. 1–4, pp. 80-93, DOI: 10.1016/j.jhydrol.2009.03.038, 2009.

[16] D. Wang, Y. Liu, Z. Wu, H. Fu, Y. Shi, H. Guo, "Scenario analysis of natural gas consumption in China based on wavelet neural network optimized by particle swarm optimization algorithm," Energies, vol. 11 no. 4, DOI: 10.3390/en11040825, 2018.

[17] T. Xiong, Y. Bao, Z. Hu, "Beyond one-step-ahead forecasting: evaluation of alternative multi-step-ahead forecasting models for crude oil prices," Energy Economics, vol. 40, pp. 405-415, DOI: 10.1016/j.eneco.2013.07.028, 2013.

[18] A. Sorjamaa, J. Hao, N. Reyhani, Y. Ji, A. Lendasse, "Methodology for long-term prediction of time series," Neurocomputing, vol. 70 no. 16–18, pp. 2861-2869, DOI: 10.1016/j.neucom.2006.06.015, 2007.

[19] M. E. Torres, M. A. Colominas, G. Schlotthauer, P. Flandrin, "A complete ensemble empirical mode decomposition with adaptive noise," Proceedings of the 2011 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), DOI: 10.1109/icassp.2011.5947265, 2011.

[20] X. Mi, H. Liu, Y. Li, "Wind speed prediction model using singular spectrum analysis, empirical mode decomposition and convolutional support vector machine," Energy Conversion and Management, vol. 180, pp. 196-205, DOI: 10.1016/j.enconman.2018.11.006, 2019.

[21] W. Niu, Z. Feng, M. Zeng, "Forecasting reservoir monthly runoff via ensemble empirical mode decomposition and extreme learning machine optimized by an improved gravitational search algorithm," Applied Soft Computing, vol. 82, DOI: 10.1016/j.asoc.2019.105589, 2019.

[22] M. A. Farahani, M. T. V. Wylie, E. Castillo-Guerra, B. G. Colpitts, "Reduction in the number of averages required in BOTDA sensors using wavelet denoising techniques," Journal of Lightwave Technology, vol. 30 no. 8, DOI: 10.1109/jlt.2011.2168599, 2012.

[23] W. Liu, K. He, Q. Gao, C. Liu, "Application of EMD-based SVD and SVM to coal-gangue interface detection," Journal of Applied Mathematics, vol. 2014, DOI: 10.1155/2014/283606, 2014.

[24] A. R. Ghumman, Y. M. Ghazaw, A. R. Sohail, K. Watanabe, "Runoff forecasting by artificial neural network and conventional model," Alexandria Engineering Journal, vol. 50 no. 4, pp. 345-350, DOI: 10.1016/j.aej.2012.01.005, 2011.

[25] N. Jaitly, G. Hinton, "Learning a better representation of speech soundwaves using restricted Boltzmann machines," Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, DOI: 10.1109/icassp.2011.5947700, 2011.

[26] A. Mehrsai, H. R. Karimi, K. D. Thoben, B. Scholz-Reiter, "Application of learning pallets for real-time scheduling by the use of radial basis function network," Neurocomputing, vol. 101, pp. 82-93, DOI: 10.1016/j.neucom.2012.07.028, 2013.

[27] J. Kim, C. Y. Chun, B. H. Cho, "Comparative analysis of the DWT-based denoising technique selection in noise-riding DCV of the Li-Ion battery pack," Proceedings of the 9th International Conference on Power Electronics-ECCE Asia: “Green World with Power Electronics”, ICPE 2015-ECCE Asia, DOI: 10.1109/icpe.2015.7168185, 2015.

[28] S. Saadat, I. Hussain, M. Faisal, "Modeling and forecasting of principal minerals production," Arabian Journal of Geosciences, vol. 14 no. 9, DOI: 10.1007/s12517-021-07135-x, 2021.

[29] H. M. Nazir, I. Hussain, M. Faisal, A. M. Shoukry, S. Gani, I. Ahmad, "Development of multidecomposition hybrid model for hydrological time series analysis," Complexity, vol. 2019, DOI: 10.1155/2019/2782715, 2019.

[30] S. Huang, J. Chang, Q. Huang, Y. Chen, "Monthly streamflow prediction using modified EMD-based support vector machine," Journal of Hydrology, vol. 511, pp. 764-775, DOI: 10.1016/j.jhydrol.2014.01.062, 2014.

[31] Y. Lei, Z. He, Y. Zi, "Application of the EEMD method to rotor fault diagnosis of rotating machinery," Mechanical Systems and Signal Processing, vol. 23 no. 4, pp. 1327-1338, DOI: 10.1016/j.ymssp.2008.11.005, 2009.

[32] M. Santhosh, C. Venkaiah, D. M. Vinod Kumar, "Ensemble empirical mode decomposition based adaptive wavelet neural network method for wind speed prediction," Energy Conversion and Management, vol. 168, pp. 482-493, DOI: 10.1016/j.enconman.2018.04.099, 2018.

[33] Z. Qu, K. Zhang, J. Wang, W. Zhang, W. Leng, "A hybrid model based on ensemble empirical mode decomposition and fruit fly optimization algorithm for wind speed forecasting," Advances in Meteorology, vol. 2016, DOI: 10.1155/2016/3768242, 2016.

[34] K. L. Chong, S. H. Lai, A. El-Shafie, "Wavelet transform based method for river stream flow time series frequency analysis and assessment in tropical environment," Water Resources Management, vol. 33 no. 6, pp. 2015-2032, DOI: 10.1007/s11269-019-02226-7, 2019.

[35] S. Xu, R. Niu, "Displacement prediction of Baijiabao landslide based on empirical mode decomposition and long short-term memory neural network in Three Gorges area, China," Computers and Geosciences, vol. 111, pp. 87-96, DOI: 10.1016/j.cageo.2017.10.013, 2018.

[36] J. Zhang, R. Jiang, B. Li, N. Xu, "An automatic recognition method of microseismic signals based on EEMD-SVD and ELM," Computers and Geosciences, vol. 133, DOI: 10.1016/j.cageo.2019.104318, 2019.

[37] H. Liu, C. Chen, H. Q. Tian, Y. F. Li, "A hybrid model for wind speed prediction using empirical mode decomposition and artificial neural networks," Renewable Energy, vol. 48, pp. 545-556, DOI: 10.1016/j.renene.2012.06.012, 2012.

[38] X. Wei, R. Lin, S. Liu, C. Zhang, "Improved EEMD denoising method based on singular value decomposition for the chaotic signal," Shock and Vibration, vol. 2016, DOI: 10.1155/2016/7641027, 2016.

[39] S. Dai, D. Niu, Y. Li, "Daily peak load forecasting based on complete adaptive noise and support vector machine optimized by modified grey wolf," Energies, vol. 11 no. 1, DOI: 10.3390/en11010163, 2018.

[40] Z. Guo, W. Zhao, H. Lu, J. Wang, "Multi-step forecasting for wind speed using a modified EMD-based artificial neural network model," Renewable Energy, vol. 37 no. 1, DOI: 10.1016/j.renene.2011.06.023, 2012.

[41] A. Grossmann, J. Morlet, "Decomposition of Hardy functions into square integrable wavelets of constant shape," Fundamental Papers in Wavelet Theory, pp. 126-139, DOI: 10.1515/9781400827268.126, 2009.

[42] Z. Qian, Y. Pei, H. Zareipour, N. Chen, "A review and discussion of decomposition-based hybrid models for wind energy forecasting applications," Applied Energy, vol. 235, pp. 939-953, DOI: 10.1016/j.apenergy.2018.10.080, 2019.

[43] H. Liu, H. Q. Tian, C. Chen, Y. Fei, "A hybrid statistical method to predict wind speed and wind power," Renewable Energy, vol. 35 no. 8, pp. 1857-1861, DOI: 10.1016/j.renene.2009.12.011, 2010.

[44] A. K. Tiwari, Z. Mukherjee, R. Gupta, M. Balcilar, "A wavelet analysis of the relationship between oil and natural gas prices," Resources Policy, vol. 60, pp. 118-124, DOI: 10.1016/j.resourpol.2018.11.020, 2019.

[45] Z. Y. Wang, J. Qiu, F. F. Li, "Hybrid models combining EMD/EEMD and ARIMA for long-term streamflow forecasting," Water (Switzerland), vol. 10 no. 7, DOI: 10.3390/w10070853, 2018.

[46] X. Jiang, L. Zhang, X. Chen, "Short-term forecasting of high-speed rail demand: a hybrid approach combining ensemble empirical mode decomposition and gray support vector machine with real-world applications in China," Transportation Research Part C: Emerging Technologies, vol. 44, pp. 110-127, DOI: 10.1016/j.trc.2014.03.016, 2014.

[47] S. Yannis Kopsinis, "Empirical mode decomposition based denoising techniques," Proceedings of the IAPR Workshop on Cognitive Information Processing.

Copyright © 2021 Maria Qurban et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/

Abstract

Accurate estimation of the mining process is vital for the optimal allocation of mineral resources, and the development of any country is closely connected with how those resources are managed. Forecasting mineral production therefore contributes substantially to the management, planning, and allocation of mineral resources. However, forecasting is challenging because the data exhibit multiscale variability, nonlinearity, nonstationarity, and high irregularity. In this paper, we propose two revised hybrid methods that address these issues. The methods are based on denoising, decomposition, prediction, and ensemble principles and are applied to time-series data on the production of mineral resources. Their performance is compared with the existing traditional one-stage model (without denoising or decomposition), two-stage hybrid models (with denoising), and three-stage hybrid models (with denoising and decomposition). Performance is evaluated using mean relative error (MRE), mean absolute error (MAE), and mean square error (MSE) for the production of four principal mineral resources of Pakistan. The proposed framework performs better than the existing one-stage, two-stage, and three-stage models. Furthermore, the revised hybrid models improve prediction accuracy by reducing the complexity of the mineral production time-series data.
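As an illustration of the three evaluation measures named in the abstract, the following is a minimal Python sketch. The abstract does not give the formulas, so the definitions here are assumptions based on common usage: MAE as the mean of |y - ŷ|, MSE as the mean of (y - ŷ)², and MRE as the mean of |y - ŷ| / |y|. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def _validate(actual: Sequence[float], predicted: Sequence[float]) -> None:
    """Reject empty or mismatched series before computing any measure."""
    if len(actual) == 0 or len(actual) != len(predicted):
        raise ValueError("actual and predicted must be non-empty and equal in length")


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE: average absolute difference between observed and predicted values."""
    _validate(actual, predicted)
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_square_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MSE: average squared difference between observed and predicted values."""
    _validate(actual, predicted)
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def mean_relative_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MRE (assumed definition): average of |error| divided by |observed value|."""
    _validate(actual, predicted)
    if any(a == 0 for a in actual):
        raise ValueError("MRE is undefined when an observed value is zero")
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


if __name__ == "__main__":
    # Hypothetical production figures, used only to exercise the functions.
    observed = [100.0, 200.0, 400.0]
    forecast = [110.0, 190.0, 400.0]
    print("MAE:", mean_absolute_error(observed, forecast))
    print("MSE:", mean_square_error(observed, forecast))
    print("MRE:", mean_relative_error(observed, forecast))
```

Under these definitions, lower values indicate better predictions. MRE is scale-free, which is convenient when comparing minerals whose production volumes differ by orders of magnitude.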

Details

Title

Development of Hybrid Methods for Prediction of Principal Mineral Resources

Author

Qurban, Maria¹; Zhang, Xiang²; Nazir, Hafiza Mamona¹; Hussain, Ijaz¹; Muhammad Faisal³; Elsayed Elsherbini Elashkar⁴; Jameel Ahmad Khader⁵; Soudagar, Sadaf Shamshoddin⁶; Shoukry, Alaa Mohamd⁷; Fares Fawzi Al-Deek⁸

¹ Department of Statistics, Quaid-i-Azam University, Islamabad, Pakistan
² National Engineering Research Center of Geographic Information System, School of Geography and Information Engineering, China University of Geosciences (Wuhan), Wuhan 430074, China
³ Faculty of Health Studies, University of Bradford, Bradford, BD7 1DP, UK
⁴ Administrative Sciences Department, Community College, King Saud University, Riyadh, Saudi Arabia
⁵ College of Business Administration, King Saud University Muzahimiyah, Al-Muzahmiya, Saudi Arabia
⁶ College of Business Administration, King Saud University Riyadh, Riyadh, Saudi Arabia
⁷ Arriyadh Community College, King Saud University, Riyadh, Saudi Arabia; KSA Workers University, Nsar, Egypt
⁸ Administrative Sciences Department, Arriyadh Community College, King Saud University, Riyadh, Saudi Arabia

Editor

Yuxing Li

Publication year

2021

Publication date

2021

Publisher

John Wiley & Sons, Inc.

ISSN

1024123X

e-ISSN

15635147

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2021/6362660

ProQuest document ID

2563363323
