
1. Introduction

Hydrological streamflow forecasting aims to predict future hydrological conditions from existing information. Accurate and robust streamflow forecasting is of great importance for many water resource systems [1]. Knowing future streamflow in advance can greatly reduce losses from flood disasters and improve the efficiency of reservoir operation and water resource utilization [2]. However, runoff formation is affected by many factors, such as rainfall, evaporation, underlying surface conditions, and human activities, and runoff time series are typically nonlinear, non-stationary, and unevenly distributed in space and time. Accurate runoff prediction is therefore challenging.

Generally, hydrological forecasts can be divided into medium- and long-term forecasts and short-term forecasts based on their time scales. Among them, medium- and long-term runoff forecasting mainly provides prerequisite support for the development of medium- and long-term profit scheduling and comprehensive water resource allocation plans for reservoirs, while short-term streamflow forecasting is particularly important because it has high reliability and can guide the actual dispatch plan [3].

In addition to classification by time scale, hydrological forecasting models can be classified as physically based models, conceptual models, and empirical models [4]. Unlike physically based and conceptual models, empirical models, also known as data-driven models, rely only on the information in historical data rather than on the hydrological processes of water movement. Such models therefore have obvious advantages for streamflow prediction in areas where data are scarce. Over the last decades, many models have been studied, including Linear Regression (LR), the Holt–Winters model, Vector Autoregression (VAR), the Autoregressive Integrated Moving Average model (ARIMA) [5], Random Forest (RF), Artificial Neural Network (ANN) [6,7], Gradient Boosting Machine (GBM), Gaussian Process Regression (GPR) [8], Extreme Learning Machine (ELM) [9], and so on [10]. However, these classical models are prone to falling into local optima and are sensitive to parameter selection. Moreover, because runoff data are highly random, non-stationary, and voluminous, traditional methods suffer from long training cycles, complex operation, and low prediction accuracy in practical applications. With the emergence of artificial intelligence algorithms, new and hybrid methods have been applied to runoff prediction with good results. Xie [11] proposed a non-stationary daily runoff series prediction model based on deep belief networks, using variational mode decomposition for signal decomposition and an improved particle swarm optimization algorithm for hyperparameter optimization. Wang [12] proposed an ensemble learning model based on least squares support vector regression (LSSVR) using seasonal decomposition (SD) for predicting hydropower consumption.
Adnan [13] examined the prediction and estimation capability of an optimally pruned extreme learning machine (OP-ELM) model for daily streamflow at the Shehang and Fujiangqiao stations on the Fujiang River. Yaseen [14] investigated an enhanced version of the extreme learning machine (EELM) model for river flow forecasting in a tropical environment. Rezaie-Balf [15] developed a hybrid model combining the M5 model tree (M5Tree) and multivariate adaptive regression splines (MARS) to forecast one- and multi-day-ahead river flow. To anticipate reservoir inflow, Nanda [16] devised a novel method combining the Variable Infiltration Capacity (VIC) land-surface model with an external wavelet-based nonlinear autoregressive with exogenous inputs (WNARX) error-updating model. Although decomposition and integration techniques can simplify complex data and extract data features, a single decomposition technique cannot completely eliminate the randomness and irregularity in a time series. The generated components still have dynamic complexity and irregular frequency ranges, which pose difficulties for the prediction model. Secondary (two-stage) decomposition has therefore been applied to prediction models. To improve multi-step outcomes, Wen [17] proposed a data-driven two-phase hybrid model that combines variational mode decomposition (VMD) with complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN). Shu [18] proposed a hybrid prediction model based on Support Vector Machine (SVM) using two-stage decomposition techniques. The results indicate that two-stage decomposition greatly improves predictive ability.
Although secondary decomposition can partly solve the above problems and reduce prediction difficulty, some components may still be unstable and highly complex after secondary decomposition. Moreover, the number of decompositions and the components to be decomposed are fixed in advance and cannot be adjusted dynamically according to the features of the data, resulting in poor transferability and reduced prediction accuracy. It is therefore necessary to improve the existing decomposition and reconstruction mode by dynamically adjusting the number and components of decomposition based on the features. At the same time, deep learning methods with simpler structures, fewer parameters, shorter training times, and better predictive performance have been proposed, including the gated recurrent unit (GRU) and the temporal convolutional network (TCN). He [19] used seasonal decomposition (SD) for signal decomposition and proposed a daily runoff prediction method based on a deep GRU. Zhang [20] proposed an innovative short-term runoff prediction framework based on Ensemble Attention (EA) and a TCN. As the foundation of this framework, the TCN’s efficient architecture is characterized by shared parameters and parallel computing, which significantly improves computational efficiency. TCN and related methods have also been shown to have great potential for runoff prediction [21].

In summary, in order to explore better hybrid application models of time–frequency dynamic decomposition technology and deep learning methods in runoff forecasting, we have developed a new runoff forecasting framework and proposed a runoff forecasting model within this framework. The contributions of this study include the following:

(1). A runoff prediction framework based on dynamic decomposition reconstruction integration processing (DDRI) is proposed by cleverly combining reconstruction integration technology and dynamic decomposition technology. On this basis, a monthly runoff prediction model based on TCN, attention mechanism, CEEMDAN, VMD, and TPE optimization algorithms is proposed.
(2). This study integrates dynamic decomposition methods and filtered decomposition reconstruction processes, fully utilizing dynamic decomposition techniques, complexity analysis, reconstruction techniques, and neural networks optimized by automatic hyperparameter optimization algorithms, effectively improving the interpretability and prediction accuracy of the model.
(3). The application of this model to monthly runoff forecasting at the Pingshan Hydrological Station and Yichang Hydrological Station in the upper reaches of the Yangtze River verified the reliability and predictive performance of the forecasting method.
(4). On this basis, the combination principles and evaluation indicators of eight prediction models, including LSTM, TCN, CEEMDAN-LSTM, CEEMDAN-TCN-Attention, CEEMDAN-VMD-LSTM, CEEMDAN-VMD-TCN, CEEMDAN-VMD-LSTM-Attention (DDRI), and CEEMDAN-WPD-TCN-Attention (DDRI), are compared to further demonstrate the rationality and superiority of the hybrid prediction model.

The remaining part of this article is organized as follows: Section 2 briefly introduces the research method used in this article, and proposes a detailed framework for runoff prediction and a monthly runoff prediction model. Section 3 contains several selected predictive performance evaluation metrics. Section 4 proposes a case study method and comparative method for monthly runoff prediction, selecting Yichang Hydrological Station and Pingshan Hydrological Station in the upper reaches of the Yangtze River as research objects to compare and analyze the predictive performance of the proposed model and the comparative model. Section 5 conducts result analysis and discussion, and the conclusion is presented in Section 6. The abbreviation descriptions of this article are shown in Abbreviation Section.

2. The Model and Methodology

2.1. Temporal Convolutional Network (TCN)

Temporal convolutional networks (TCNs) were first proposed by Shaojie Bai in 2018. The algorithm introduces causal convolution and dilated convolution, avoiding the information leakage problem of traditional algorithms. In addition, it effectively avoids problems such as increased training complexity, vanishing gradients, and poor fitting that arise as the number of convolutional layers grows [22].

The temporal convolutional network (TCN) is composed of causal, dilated 1-D convolution layers whose input and output have the same length. It combines the deep, stackable layer structure of convolutional neural networks (CNNs) [23] with the sequence modelling role of recurrent neural networks (RNNs) [24], while keeping the convolutions computable in parallel. Because it can retain long histories of past information without access to future values, it can meet the requirements of both high training speed and high quality when processing sequence information.

Causal convolution: In the causal convolution layer of a TCN, the value at time $t$ in a given layer depends only on the values at time $t$ and earlier in the layer below. Compared with a traditional convolutional neural network, causal convolution is therefore a strictly time-constrained model that cannot see future data, as illustrated in Figure 1.

Dilated convolution: Dilated convolution builds on causal convolution by enlarging the sampling interval of the convolution kernel, giving the output feature map a larger receptive field and stronger expressive ability. With simple causal convolution, the length of history that can be modelled is limited by the size of the convolution kernel, so capturing longer dependencies requires stacking more convolution layers linearly. However, as the number of convolutional layers increases, problems such as increased training complexity, vanishing gradients, and poor fitting performance emerge.

Unlike traditional convolution, dilated convolution samples its input at intervals, with the sampling rate controlled by the dilation factor $d$. In Figure 2, $d = 1$ in the bottom layer means that every point is sampled as input, and $d = 2$ in the middle layer means that every second point is sampled as input. In general, the higher the layer, the larger the value of $d$. Dilated convolution therefore makes the effective window size grow exponentially with the number of layers, so the network can obtain a large receptive field with few layers, as illustrated in Figure 2.
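The causal and dilated convolutions described above can be sketched in a few lines of plain Python. This is only an illustration: the function names `causal_dilated_conv1d` and `receptive_field`, the left zero-padding, and the example weights are assumptions made here, and the receptive field formula assumes one convolution per layer.

```python
def causal_dilated_conv1d(x, weights, dilation=1):
    """Causal dilated 1-D convolution with implicit left zero-padding.

    y[t] = sum_j weights[j] * x[t - j * dilation], where x[<0] counts as 0,
    so the output has the same length as the input and never reads x[t+1:].
    """
    y = []
    for t in range(len(x)):
        total = 0.0
        for j, w in enumerate(weights):
            idx = t - j * dilation
            if idx >= 0:  # indices before the start act as zero padding
                total += w * x[idx]
        y.append(total)
    return y


def receptive_field(kernel_size, dilations):
    """Input steps visible to one output after stacking one conv per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y1 = causal_dilated_conv1d(x, [1.0, 1.0], dilation=1)  # x[t] + x[t-1]
y2 = causal_dilated_conv1d(x, [1.0, 1.0], dilation=2)  # x[t] + x[t-2]
rf = receptive_field(kernel_size=2, dilations=[1, 2, 4])  # 8 time steps
```

With kernel size 2 and dilations 1, 2, 4, three layers already cover 8 time steps, whereas three undilated layers would cover only 4.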

Residual connection: Because it maintains consistency in information flow, a residual connection enables the network to transmit information across layers and helps the network learn identity mappings, reducing network degradation. Residual connections have therefore been proven to be an effective method for training deep networks [25]. A residual block is constructed to replace a single convolution layer. As shown in Figure 3, a ReLU nonlinear activation function and dropout regularization are added to each convolutional layer of the block. This not only helps the network converge faster during training but also mitigates vanishing gradients. Residual connections also reduce, to some extent, the impact of local vanishing and exploding gradients in deep neural networks on model performance.

The specific process of the temporal convolutional network algorithm is as follows:

(1). Given an input sequence $x_{0}, x_{1}, \dots, x_{T}$ , the output $y_{t}$ at time $t$ may be predicted using only the inputs $x_{0}, x_{1}, \dots, x_{t}$ observed up to time $t$ . This can be expressed as $\hat{y}_{0}, \hat{y}_{1}, \dots, \hat{y}_{T} = f (x_{0}, x_{1}, \dots, x_{T})$ , where $\hat{y}_{t}$ depends only on $x_{0}, x_{1}, \dots, x_{t}$ and not on any “future” input $x_{t + 1}, \dots, x_{T}$ . The task of sequence modeling is to build a model $f$ that minimizes the error loss $L (y_{0}, y_{1}, \dots, y_{T}, f (x_{0}, x_{1}, \dots, x_{T}))$ between the actual output values and the predicted values; the loss function is chosen according to the application.
$p (x | h) = \prod_{t = 0}^{N - 1} p (x_{t + 1} | x_{0}, x_{1}, \dots, x_{t}, h)$
(2). The dilated causal convolution is realized according to the above formula.
(3). The output $o$ of the residual block is calculated as follows, where $x$ is the input of the block and $F (x)$ is the output of its convolutional layers:
(1) $o = \mathrm{Activation} (x + F (x))$
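As a small illustration of Equation (1), the sketch below applies a residual connection around an arbitrary transformation. The names `relu` and `residual_block` and the toy transformations are assumptions for this example; in a TCN, $F$ would be the stack of dilated causal convolutions with ReLU and dropout.

```python
def relu(values):
    """Element-wise ReLU activation."""
    return [max(0.0, v) for v in values]


def residual_block(x, transform, activation=relu):
    """Compute o = Activation(x + F(x)) for a length-preserving F."""
    fx = transform(x)
    if len(fx) != len(x):
        raise ValueError("F(x) must have the same length as x")
    return activation([a + b for a, b in zip(x, fx)])


# If F outputs zeros, the block reduces to Activation(x): an identity
# mapping for non-negative inputs, which is what eases deep training.
identity_out = residual_block([1.0, 0.0, 3.0], lambda v: [0.0] * len(v))

# A toy F(x) = -2x gives x + F(x) = -x before the activation.
toy_out = residual_block([1.0, -2.0, 3.0], lambda v: [-2.0 * a for a in v])
```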

2.2. Attention-Based Temporal Convolutional Network (TCN-Attention)

The attention mechanism is a technique which is used to enhance the attention of deep learning networks to important parts of the input data. It reduces the model’s focus on irrelevant information through weight allocation, thus paying more attention to important information [26]. The formula for calculating the weight value of attention is as follows:

(2) $\begin{cases} Q = ω_{q} X + b_{q} \\ K = ω_{k} X + b_{k} \\ V = ω_{v} X + b_{v} \end{cases}$

(3) $\mathrm{Attention} (Q, K, V) = \mathrm{Softmax} \left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V$

Among them, $Q$ represents the query vector, $K$ represents the key vector, and $V$ represents the value vector; $X$ is the input; $ω_{q}, ω_{k}, ω_{v}$ and $b_{q}, b_{k}, b_{v}$ represent the corresponding weights and biases, respectively; and $d_{k}$ represents the dimension of the $Q$ and $K$ matrices.
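A minimal sketch of the scaled dot-product attention in Equation (3), using nested lists instead of a tensor library. The helper names are assumptions for this example, and $Q$, $K$, and $V$ are passed in directly rather than being produced by the learned projections of Equation (2).

```python
import math


def matmul(a, b):
    """Multiply an m x n matrix by an n x p matrix (lists of rows)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(v - peak) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(q, k, v):
    """Return (Softmax(Q K^T / sqrt(d_k)) V, attention weights)."""
    d_k = len(k[0])
    scores = matmul(q, transpose(k))
    weights = [softmax([s / math.sqrt(d_k) for s in row]) for row in scores]
    return matmul(weights, v), weights


# Identical keys give uniform weights, so each output row is the mean of V.
q = [[1.0, 0.0], [0.0, 1.0]]
k = [[1.0, 1.0], [1.0, 1.0]]
v = [[0.0, 2.0], [4.0, 6.0]]
output, weights = scaled_dot_product_attention(q, k, v)
```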

TCN-Attention is a deep learning framework for processing time series data that combines a Temporal Convolutional Network (TCN) with an attention mechanism. It combines the local feature capture capability of the TCN with the global feature capture capability of the attention mechanism. The structure can be adjusted and optimized according to specific tasks and datasets. It allows the network to focus on local patterns while effectively identifying long-term dependencies and global patterns in time series data during learning, thereby improving the performance of time series modeling. The structure is illustrated in Figure 4.

2.3. Tree-Structured Parzen Estimator (TPE)

TPE is a tree-based Bayesian optimization algorithm used to find the optimal parameter configuration in a large parameter space. Compared with traditional Bayesian algorithms, the TPE algorithm can adaptively iterate and optimize the probability density space of parameter groups while maintaining hyperparameter dependency, and adjust the size of the parameter search space, ultimately finding the optimal solution in as few iterations as possible [27]. The TPE algorithm divides the optimization problem into tree structures independent of parameters, and then seeks the optimal solution in these parameter spaces. In addition, the Parzen estimator of the TPE algorithm can estimate the probability density of each component in the observation space based on the current observation value and prior distribution type [28].

All hyperparameters are defined in the parameter space that needs to be optimized as $X = \{x^{(1)}, x^{(2)}, \dots, x^{(k)}\}$ . After iteratively optimizing the model hyperparameters, a set of optimal hyperparameters is obtained to minimize the objective loss function $f (x^{(i)}), (i = 1, 2, \dots, k)$ .

Two density functions in TPE are used to define $p (x | y)$ :

(4) $p (x | y) = \begin{cases} l (x) & \text{if } y < y^{*} \\ g (x) & \text{if } y \geq y^{*} \end{cases}$

Among them, $l (x)$ is the density estimated from the observations whose loss $f (x^{(i)})$ is less than the threshold $y^{*}$ , also known as the good group probability, and $g (x)$ is the density estimated from the observations whose loss $f (x^{(i)})$ is greater than or equal to the threshold $y^{*}$ , also known as the bad group probability. The final selection of optimal hyperparameters often relies not on the best observed value itself but on the probability distribution around it.

The function of the Parzen estimator is to generate a probability density model around $k$ configuration space observations. The probability density estimation formula of the Parzen estimator for $x$ in each input configuration space and the output loss function $y$ is as follows:

(5) $p (x) = γ l (x) + (1 - γ) g (x)$

In search strategies, optimization problems under uncertainty often use expected improvement (EI) as the acquisition function [29]. EI is a heuristic based on greedy improvement that selects, from a given set of candidate sampling points, the point with the largest expected improvement. Because $p (y | x)$ cannot be obtained directly, we use the Bayesian formula for the following conversion:

(6) $E I_{y^{*}} (x) = \int_{- \infty}^{y^{*}} (y^{*} - y) p (y | x) d y = \int_{- \infty}^{y^{*}} (y^{*} - y) \frac{p (x | y) p (y)}{p (x)} d y$

Among them, $y^{*}$ represents the threshold. We define $γ = p (y < y^{*})$ , a quantile of the observed losses in the TPE algorithm that is used to divide the observations between $l (x)$ and $g (x)$ , with a range of $(0, 1)$ . The EI expression is then obtained as follows:

(7) $E I_{y^{*}} (x) = \frac{γ y^{*} l (x) - l (x) \int_{- \infty}^{y^{*}} y \, p (y) d y}{γ l (x) + (1 - γ) g (x)} \propto {\left(γ + \frac{g (x)}{l (x)} (1 - γ)\right)}^{- 1}$

Equation (7) indicates that, to maximize EI, a candidate should have a high good group probability $l (x)$ and a low bad group probability $g (x)$ . In the actual calculation process, each iteration returns the candidate $x^{*}$ with the largest $E I_{y^{*}} (x)$ , which is then evaluated.
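The selection rule implied by Equation (7) can be sketched for a single continuous hyperparameter as follows. This is a simplified illustration, not the optimizer used in the study: the Gaussian Parzen window, the fixed `bandwidth`, the default `gamma`, the fixed candidate list, and all function names are assumptions made for the example.

```python
import math


def parzen_density(x, centers, bandwidth):
    """Gaussian Parzen-window density estimate at x."""
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2)
               for c in centers) / norm


def tpe_score(x, good, bad, gamma, bandwidth):
    """Quantity proportional to EI in Equation (7): 1 / (gamma + g/l * (1 - gamma))."""
    l = parzen_density(x, good, bandwidth) + 1e-12  # small constant avoids 0/0
    g = parzen_density(x, bad, bandwidth) + 1e-12
    return 1.0 / (gamma + (g / l) * (1.0 - gamma))


def tpe_suggest(observations, candidates, gamma=0.25, bandwidth=0.5):
    """Pick the candidate with the largest score.

    observations: list of (hyperparameter value, loss) pairs, at least two.
    """
    ranked = sorted(observations, key=lambda pair: pair[1])
    n_good = min(max(1, math.ceil(gamma * len(ranked))), len(ranked) - 1)
    good = [x for x, _ in ranked[:n_good]]   # losses below the threshold y*
    bad = [x for x, _ in ranked[n_good:]]    # losses at or above y*
    return max(candidates,
               key=lambda x: tpe_score(x, good, bad, gamma, bandwidth))


# Toy objective (x - 2)^2 evaluated at eight points.
history = [(float(x), (x - 2.0) ** 2) for x in range(8)]
suggestion = tpe_suggest(history, candidates=[0.5, 1.5, 4.5, 6.5])
```

In this toy run the suggestion is the candidate closest to the two best observations, which is the behaviour Equation (7) describes: high $l (x)$ and low $g (x)$.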

2.4. Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (CEEMDAN)

Decomposition models for time series prediction have evolved from empirical mode decomposition (EMD) to EEMD, CEEMD, CEEMDAN, and so on. CEEMDAN decomposes a complex signal into a finite number of intrinsic mode functions (IMFs) and a residual [30]. Complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) is an algorithm developed from the empirical mode decomposition and ensemble empirical mode decomposition methods [31]. Building on ensemble empirical mode decomposition, it adds adaptive white noise at each stage of the decomposition. In each stage, the residual signal is calculated to obtain the IMF component of each mode, which effectively solves the problem of noise transfer from high frequency to low frequency in the ensemble empirical mode decomposition method and ensures the completeness of the decomposition.

Let $E_{k} (\cdot)$ be the operator that yields the $k$ -th modal component obtained by EMD, $ε_{0} ν^{i} (t) \ (i = 1, \dots, I)$ be the noise component, $\tilde{IMF_{k}}$ be the $k$ -th modal component obtained by CEEMDAN, and $ε_{k - 1}$ be the adaptive coefficient used when solving $\tilde{IMF_{k}} (t)$ . The specific steps of the CEEMDAN algorithm are then as follows:

(1). In the CEEMDAN algorithm, EMD is used to decompose the noisy signal $s (t) + ε_{0} ν^{i} (t) \ (i = 1, \dots, I)$ in each of $I$ trials, where $s (t)$ is the original signal. The first modal component is obtained as the average over the trials:
(8) $\tilde{IMF_{1}} (t) = \frac{1}{I} \sum_{i = 1}^{I} IMF_{1}^{i} (t) = \bar{IMF_{1}} (t)$
(2). The first residual signal $r_{1} (t)$ of CEEMDAN, that is, the residual for $k = 1$ , is calculated as follows:
(9) $r_{1} (t) = s (t) - \tilde{IMF_{1}} (t)$
(3). As in step 1, the noise component $ε_{1} E_{1} (ν^{i} (t))$ is added to the residual signal $r_{1} (t)$ , and $I$ trials are carried out. In each trial, the signal $r_{1} (t) + ε_{1} E_{1} (ν^{i} (t))$ is decomposed until its first EMD modal component is obtained. The second modal component is then as follows:
(10) $\tilde{IMF_{2}} (t) = \frac{1}{I} \sum_{i = 1}^{I} E_{1} (r_{1} (t) + ε_{1} E_{1} (ν^{i} (t)))$
(4). By analogy, the $k$ -th residual signal is calculated in the same way as in step 3, and the $(k + 1)$ -th modal component can then be expressed as follows:
(11) $r_{k} (t) = r_{k - 1} (t) - \tilde{IMF_{k}} (t)$

(12) $\tilde{IMF_{(k + 1)}} (t) = \frac{1}{I} \sum_{i = 1}^{I} E_{1} (r_{k} (t) + ε_{k} E_{k} (ν^{i} (t)))$
(5). Step 4 is repeated until the residual signal can no longer be decomposed; the termination condition is that the residual signal has no more than two extreme points. Denoting the total number of modal components finally obtained by $K$ , the final residual signal can be expressed as follows:
(13) $R (t) = s (t) - \sum_{k = 1}^{K} \tilde{IMF_{k}} (t)$

Therefore, the original signal $s (t)$ is finally decomposed into the following:

(14) $s (t) = R (t) + \sum_{k = 1}^{K} \tilde{IMF_{k}} (t)$

According to the above process, the CEEMDAN decomposition is complete, which means that the original signal can be reconstructed accurately from the modal components and the residual [32].

2.5. The Proposed Model CEEMDAN-VMD-TCN-Attention-TPE Based on DDRI Framework

Under the influence of various complex factors such as climate change and human activities, hydrological time series are usually nonlinear and non-stationary [33]. In order to better explore the application of time series decomposition technology in runoff forecasting and effectively improve the accuracy of runoff forecasting, we combine reconstruction integration technology and dynamic decomposition technology to propose a Temporal Convolutional Network Fusion Attention Mechanism Runoff Prediction method based on dynamic decomposition reconstruction integration processing. The proposed model consists of six main stages, as illustrated in Figure 5.

The detailed steps are as follows:

Step 1: Data pre-processing. The raw runoff series data are obtained and preprocessed. The augmented Dickey–Fuller (ADF) test is used to test the stationarity of the data. The null hypothesis of the test is that the series is non-stationary; if it cannot be rejected, the time series is treated as non-stationary, and differencing is used to make the data stationary.

(15) $Δ x_{i} = x_{i + 1} - x_{i} \quad (i = 0, 1, 2, \dots, N - 1)$

In the formula, $x_{i}$ and $x_{i + 1}$ represent the runoff data at time $i$ and time $i + 1$ , respectively, and $Δ x_{i}$ represents the runoff data at time $i$ after the first-order difference.

At the same time, in order to eliminate the influence of dimensionality or numerical values on the calculation results, the data are normalized to the interval $[0, 1]$ .

(16) $X_{i} = \frac{x_{i} - x_{\min}}{x_{\max} - x_{\min}}$

In the formula, $X_{i}$ represents the normalized result, and $x_{i}$ represents the runoff data at time $i$ . $x_{\min}$ and $x_{\max}$ represent the minimum and maximum values of the runoff sequence, respectively.
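Equations (15) and (16) can be sketched as follows, together with the inverse normalization used in the final step of the framework. The function names and the sample values are illustrative assumptions, not station data. The text does not state which subset supplies $x_{\min}$ and $x_{\max}$ ; taking them from the training set only is a common way to avoid leaking information from the test period.

```python
def first_difference(series):
    """Equation (15): differences between consecutive values."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def min_max_normalize(series):
    """Equation (16): scale values to [0, 1]; also return the bounds."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("constant series cannot be min-max normalized")
    return [(v - lo) / (hi - lo) for v in series], lo, hi


def inverse_min_max(normalized, lo, hi):
    """Map normalized values back to the original units."""
    return [v * (hi - lo) + lo for v in normalized]


runoff = [1200.0, 1500.0, 4200.0, 3000.0]  # illustrative values only
diffs = first_difference(runoff)           # one element shorter than runoff
scaled, x_min, x_max = min_max_normalize(runoff)
restored = inverse_min_max(scaled, x_min, x_max)
```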

Step 2: Dataset partitioning. The whole runoff series data were divided into three datasets, the training set, validation set, and test set, and the ratio of each dataset was 80:10:10. The training set is used for iteratively training the proposed mixed model weight parameters, the validation set is used for improving and judging hyperparameters during the TPE iterative calculation process, and the test set is used for evaluating the final model performance.
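A chronological 80:10:10 split can be sketched as below. The function name `chronological_split` and the integer-ratio interface are assumptions for this example; the key point is that the series is cut in time order and not shuffled, so validation and test data always lie after the training data.

```python
def chronological_split(series, ratios=(8, 1, 1)):
    """Split a time-ordered series into train, validation, and test parts."""
    total = sum(ratios)
    n = len(series)
    n_train = n * ratios[0] // total
    n_val = n * ratios[1] // total
    train = series[:n_train]
    val = series[n_train:n_train + n_val]
    test = series[n_train + n_val:]  # remainder, so no sample is dropped
    return train, val, test


# 70 years of monthly values (840 points) as an illustrative length.
months = list(range(840))
train, val, test = chronological_split(months)
```

For 840 monthly values this yields 672, 84, and 84 points, that is, 56, 7, and 7 years.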

Step 3: Sequence decomposition. CEEMDAN is first used to decompose the pre-processed runoff series data into $k$ sub-modes. The modal number $k$ is an important parameter that needs to be given in advance; in this study, $k$ is optimized as a hyperparameter using the TPE algorithm. A TCN-Attention runoff forecasting model is then established, optimized, and trained for each IMF component.

Step 4: Model training and hyperparameter optimization. Open-source software libraries, such as Tensorflow 2.8.0 and Keras 2.15.0, were used to build and train the Temporal Convolutional Network Fusion Attention Mechanism Runoff Prediction Model based on dynamic decomposition reconstruction integration processing. The TPE optimization module iterates continuously: it updates $p (x | y)$ , maximizes EI to obtain new hyperparameters, trains the model with them, and returns the evaluation metrics. When the iterations are complete, the model with the best evaluation result is selected, and outputting its hyperparameters completes the hyperparameter tuning of the model.

Step 5: Low precision component dynamic decomposition. After the optimized model is used to predict each component, the prediction accuracy of each component is obtained. The low precision components that need to be decomposed again are selected based on the performance of the decomposed components on the validation set: a component is selected if its prediction RMSE is greater than or equal to the average prediction RMSE of all decomposed components. The selected low precision components are aggregated into high complexity and low complexity groups using permutation entropy (PE). After this classification and aggregation, the VMD method is used for secondary decomposition to generate new components.
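The selection and grouping logic of Step 5 can be sketched as follows. The embedding `order`, the `delay`, the entropy `threshold` separating high from low complexity, and all function names are assumptions for this example; the text does not report these settings. The permutation entropy is normalized to $[0, 1]$ by dividing by the logarithm of the number of possible ordinal patterns.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy of a series (0 = fully regular)."""
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + j * delay] for j in range(order)]
        # Ordinal pattern: indices of the window sorted by value.
        pattern = tuple(sorted(range(order), key=lambda j: window[j]))
        counts[pattern] += 1
    entropy = -sum((c / n_patterns) * math.log(c / n_patterns)
                   for c in counts.values())
    return entropy / math.log(math.factorial(order))


def select_low_precision(rmse_by_component):
    """Components whose validation RMSE is at or above the mean RMSE."""
    mean_rmse = sum(rmse_by_component.values()) / len(rmse_by_component)
    return [name for name, rmse in rmse_by_component.items()
            if rmse >= mean_rmse]


def group_by_complexity(components, threshold):
    """Split component names into (high, low) complexity by entropy."""
    high, low = [], []
    for name, values in components.items():
        target = high if permutation_entropy(values) >= threshold else low
        target.append(name)
    return high, low


# Hypothetical validation RMSEs for four components.
rmse = {"IMF1": 0.30, "IMF2": 0.10, "IMF3": 0.05, "IMF4": 0.20}
to_redecompose = select_low_precision(rmse)
```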

Step 6: Forecasting application. The secondary components are input into the trained and optimized TCN-Attention model for prediction. The training set and validation set are then combined to train the model proposed in this study. If the NRMSE of a component is more than 10%, the component is decomposed again; otherwise, it is retained. Finally, the prediction results of all components are summed, and the final prediction result is obtained through inverse normalization.

3. Assessment Metric

In this study, four evaluation indicators are discussed for assessing prediction performance in streamflow forecasting, namely the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), and Nash–Sutcliffe Efficiency coefficient (NSE) [34,35]. The Mean Absolute Error (MAE) represents the arithmetic mean of the absolute discrepancies between the forecasted and the actual values, serving as a metric to quantify the average magnitude of the divergence in the model’s predictions. The Root Mean Square Error (RMSE) is used to measure the degree of deviation between the predicted values and actual values. The Mean Absolute Percentage Error (MAPE) is used to measure the average percentage of relative error between the predicted and actual values. The Nash–Sutcliffe Efficiency coefficient (NSE) is used to evaluate the degree of fit between the model predictions and actual values. In general, prediction models with larger NSE values or smaller RMSE, NRMSE, and MAPE values have better prediction performance. The formulas for MAE, RMSE, MAPE, and NSE are as follows:

(17) $M A E = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|$

(18) $R M S E = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}$

(19) $M A P E (\%) = \frac{1}{n} \sum_{i = 1}^{n} \frac{|y_{i} - {\hat{y}}_{i}|}{y_{i}} \times 100$

(20) $N S E = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}$

where $n$ is the length of the observed runoff data series, $y_{i}$ represents the observed values, ${\hat{y}}_{i}$ stands for the predicted runoff values, $\bar{y}$ denotes the average observed value of runoff, and $y_{\min}$ and $y_{\max}$ represent the minimum and maximum values of the observed runoff data series, respectively.
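The four metrics in Equations (17) to (20) can be computed as in the sketch below. The function names and the sample values are illustrative assumptions; MAPE is returned in percent and assumes strictly positive observations, as is the case for runoff.

```python
import math


def mae(obs, pred):
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(o - p) / o for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is perfect, 0 matches the mean forecast."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


observed = [100.0, 200.0, 300.0, 400.0]   # illustrative values only
predicted = [110.0, 190.0, 310.0, 380.0]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "MAPE": mape(observed, predicted),
    "NSE": nse(observed, predicted),
}
```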

4. Case Study

4.1. Study Area Introduction

This study focused on the Yangtze River Basin (from the source of the Yangtze River to Shanghai). As shown in Figure 6, the Yangtze River originates from the Geladandong Snow Mountain, the main peak of the Tanggula Mountains on the Qinghai–Tibet Plateau [36]. The main stream of the Yangtze River flows through 11 provincial administrative units, including Qinghai and Xizang, and finally into the East China Sea [37]. Due to the influence of southwest monsoons, southeast monsoons, and the Qinghai–Tibet Plateau, the climate in this area is very sensitive and the precipitation distribution is very uneven. The flood season usually occurs between May and September, when rainfall accounts for 78% of the annual rainfall [38]. The area is also easily affected by water scarcity during the dry season. Therefore, climate diversity has brought many difficulties to the development and utilization of water resources in the upper reaches of the Yangtze River [39].

4.2. Data Analysis and Partitioning

The data analyzed in this article come from two main control stations in the Yangtze River—the Pingshan Hydrological Station and the Yichang Hydrological Station. The Pingshan Hydrological Station is the window for water and rain conditions in the upper reaches of the Yangtze River, and the Yichang Hydrological Station is the first major river control station in the middle reaches of the Yangtze River. Their positions are shown in Figure 6. The runoff data of Pingshan Hydrological Station cover the period from January 1940 to December 2010. The runoff data of Yichang Hydrological Station cover the period from January 1882 to December 2011. The whole runoff series data were divided into three datasets, the training set, validation set, and test set, and the ratio of each dataset was 80:10:10. As per the flowchart, we used the Pingshan Hydrological Station runoff data from January 1940 to December 1995 for training, from January 1996 to December 2002 for validation, and from January 2003 to December 2009 for testing. We also used the Yichang Hydrological Station runoff data from January 1882 to December 1985 for training, from January 1986 to December 1998 for validation, and from January 1999 to December 2011 for testing. Figure 7 and Figure 8 display the runoff data of the Pingshan Hydrological Station and Yichang Hydrological Station, respectively. Table 1 provides the statistical data for the three datasets, such as the minimum value, maximum value, average value, STDEV, etc.

The runoff series of the Pingshan Hydrological Station and Yichang Hydrological Station were analyzed. First, the runoff series dataset was preprocessed, including filling in missing values and data transformation; quadratic spline interpolation was used to fill in the missing values. Stationarity analysis of the runoff series was particularly important, and we applied the ADF test for this purpose. The ADF test takes as its null hypothesis that the runoff series is non-stationary. If the null hypothesis cannot be rejected, the series is considered non-stationary and must be processed to make it stationary. The ADF test results of the runoff series at the Pingshan Hydrological Station and Yichang Hydrological Station are shown in Table 2. According to Table 2, the test statistic at the Pingshan Hydrological Station is less than the critical values at the 1%, 5%, and 10% significance levels, with a $p$-value close to 0. The test therefore rejects the null hypothesis, indicating that the Pingshan Hydrological Station runoff series is stationary. Similarly, the test statistic at the Yichang Hydrological Station is less than the critical values at the 1%, 5%, and 10% significance levels, with a $p$-value close to 0. The test therefore rejects the null hypothesis, indicating that the Yichang Hydrological Station runoff series is stationary. In this work, all data preprocessing and management were completed using the Numpy 1.21.5 and Pandas 1.4.2 scientific computing packages in Python 3.9.12.

To reduce the difficulty of prediction, we used CEEMDAN to decompose the data into several IMF components. The decomposition parameters strongly affect the decomposition results; values that are too large or too small directly reduce the prediction accuracy of the model. When using the TPE algorithm to optimize the decomposition parameter through iterative calculations from 2 to 50, we found that values that were too large produced signals of similar scale in different IMF components, whereas values that were too small led to insufficient signal decomposition, leaving interference terms that were too large for time–frequency characterization. We therefore used the TPE optimization algorithm to determine the optimal decomposition level: $k = 7$ for the Pingshan Hydrological Station and $k = 9$ for the Yichang Hydrological Station. After decomposition, the data patterns are more regular, which greatly reduces the complexity of prediction. In previous studies, we found that once the raw data are decomposed into multiple modes, the components lose their correlation with the influencing factors, and these key influencing factors are difficult to incorporate into the prediction model of a single decomposed subsequence [40]. However, the inputs of a predictive model have a profound impact on its performance, so the selection of input variables must be optimized. Therefore, in this study, we used only lagged values of the historical runoff series to simulate and predict runoff. We used the partial autocorrelation function (PACF) to determine the optimal lag period, taking the historical values of the 36 time periods preceding the value to be predicted at the Pingshan and Yichang Hydrological Stations as the initial features. From these 36 initial features, we selected those whose partial autocorrelation coefficient lay outside the 95% confidence interval as the final input factors for the prediction model. The resulting lag numbers of the runoff sequences at the Pingshan Hydrological Station and Yichang Hydrological Station were 27 and 32, respectively, as shown in Figure 9 and Figure 10.
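The PACF-based input selection can be illustrated with a small pure-Python sketch. It assumes the common approximation of the 95% confidence band as ±1.96/√N and computes partial autocorrelations with the Durbin–Levinson recursion; both choices, the function names, and the synthetic AR(1) series are assumptions for illustration and are not taken from the paper.

```python
import math
import random


def autocorr(x, max_lag):
    """Sample autocorrelations r[0..max_lag] of a sequence."""
    n = len(x)
    mean = sum(x) / n
    dev = [v - mean for v in x]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(x, max_lag):
    """Partial autocorrelations for lags 0..max_lag (Durbin-Levinson recursion)."""
    r = autocorr(x, max_lag)
    out = [1.0]
    prev = []  # coefficients phi[k-1][1..k-1] from the previous order
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        out.append(phi_kk)
    return out


def significant_lags(x, max_lag=36):
    """Lags whose PACF lies outside the approximate 95% band +/- 1.96/sqrt(N)."""
    bound = 1.96 / math.sqrt(len(x))
    values = pacf(x, max_lag)
    return [k for k in range(1, max_lag + 1) if abs(values[k]) > bound]


# Synthetic AR(1) series used purely for illustration (not the runoff data).
random.seed(0)
series = [0.0]
for _ in range(1999):
    series.append(0.8 * series[-1] + random.gauss(0.0, 1.0))

lags = significant_lags(series, max_lag=36)
print(lags)
```

For a monthly runoff series, the lagged values at the returned lags would form the input vector of each component model.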

4.3. Parameter Settings

Simply put, hyperparameters are parameters that must be set before a model is trained, rather than parameters obtained through training. Examples include the decomposition level $k$, learning rate, number of convolution kernels, convolution kernel size, number of convolution layers, number of residual connections, and dilation coefficient. To improve the predictive performance of the model, a set of optimal hyperparameters must be selected. On the basis of the original TCN-Attention model structure, we added a hidden layer of a deep neural network so that the runoff time series data could be fitted accurately by the prediction model. To assess all techniques objectively, we used the Tree-structured Parzen Estimator (TPE) to determine the hyperparameter values. Specifically, the proposed CEEMDAN-VMD-TCN-Attention-TPE model based on the DDRI framework was configured as follows: the TCN-Attention runoff prediction model consists of two layers of residual units, each containing two convolutional units and a nonlinear mapping. The model uses ReLU as the nonlinear activation function. One convolutional layer with a kernel size of 1 × 1 is added in the residual mapping. The Adam optimizer and the stochastic gradient descent algorithm are used for model training. The specific parameter settings are shown in Table 3.
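To make the residual-unit description concrete, the following pure-Python sketch implements a single-channel dilated causal convolution and a residual unit with two convolutions, ReLU activations, and a 1 × 1 skip convolution. It is a simplified illustration under stated assumptions: one channel, no normalization, dropout, or attention, and hand-picked weights. The function names are hypothetical and the code is not the authors' implementation.

```python
def relu(values):
    """Element-wise ReLU."""
    return [max(0.0, v) for v in values]


def causal_dilated_conv1d(x, weights, dilation=1):
    """y[t] = sum_i weights[i] * x[t - i * dilation], with zeros before t = 0."""
    out = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(weights):
            idx = t - i * dilation
            if idx >= 0:  # causal: only the present and past samples are used
                acc += w * x[idx]
        out.append(acc)
    return out


def residual_unit(x, w1, w2, dilation, w_skip=1.0):
    """Two dilated causal convolutions with ReLU, plus a 1 x 1 skip convolution."""
    h = relu(causal_dilated_conv1d(x, w1, dilation))
    h = relu(causal_dilated_conv1d(h, w2, dilation))
    skip = [w_skip * v for v in x]  # a 1 x 1 convolution on a single channel
    return relu([a + b for a, b in zip(h, skip)])


# Two stacked residual units with growing dilation (illustrative weights).
signal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
hidden = residual_unit(signal, [0.5, 0.5], [1.0, 0.0], dilation=1)
output = residual_unit(hidden, [0.5, 0.5], [1.0, 0.0], dilation=2)
print(output)
```

Increasing the dilation from one unit to the next widens the receptive field without adding weights, which is why the dilation coefficient appears among the hyperparameters listed above.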

4.4. Comparative Experiments

To verify the reliability and accuracy of the proposed model, nine predictive models were established in this study: LSTM, TCN, CEEMDAN-LSTM, CEEMDAN-TCN-Attention, CEEMDAN-VMD-LSTM, CEEMDAN-VMD-TCN, CEEMDAN-VMD-LSTM-Attention, CEEMDAN-WPD-TCN-Attention, and CEEMDAN-VMD-TCN-Attention-TPE. All models were implemented in Python 3.9.12. Among these runoff prediction models, LSTM and TCN are single deep learning models. CEEMDAN-LSTM and CEEMDAN-TCN-Attention are hybrid prediction models based on single decomposition, CEEMDAN-VMD-LSTM and CEEMDAN-VMD-TCN are hybrid prediction models based on quadratic decomposition, and CEEMDAN-VMD-LSTM-Attention, CEEMDAN-WPD-TCN-Attention, and CEEMDAN-VMD-TCN-Attention-TPE are hybrid prediction models based on dynamic decomposition reconstruction integration processing.

5. Result Analysis and Discussion

On this basis, we analyzed and discussed the prediction results of nine models: LSTM, TCN, CEEMDAN-LSTM, CEEMDAN-TCN-Attention, CEEMDAN-VMD-LSTM, CEEMDAN-VMD-TCN, CEEMDAN-VMD-LSTM-Attention (DDRI), CEEMDAN-WPD-TCN-Attention (DDRI), and CEEMDAN-VMD-TCN-Attention-TPE (DDRI). After training each model on historical runoff data, we used the trained models to predict monthly runoff for a total of 85 months from December 2002 to December 2009 at the Pingshan Hydrological Station and 156 months from December 1998 to December 2011 at the Yichang Hydrological Station. This made it possible to analyze the predictive ability of each model. Figure 11 and Figure 12 show the predicted runoff sequences of the nine models during the testing period at the Pingshan and Yichang Hydrological Stations, respectively, allowing their predictive performance to be compared. In general, the prediction accuracy of the hybrid prediction models based on dynamic decomposition reconstruction integration processing was higher than that of the hybrid prediction models based on quadratic decomposition, the hybrid prediction models based on single decomposition, and the single prediction models. Optimizing the hyperparameters with the TPE algorithm further improved the prediction accuracy of the model. Figure 13 and Figure 14 show the scatter plots of the runoff series prediction results in the testing period for the Pingshan and Yichang Hydrological Stations.

From Figure 11 and Figure 12, it can be seen that the predictions of every model broadly follow the fluctuations of the runoff sequences at the Pingshan Hydrological Station and Yichang Hydrological Station. The prediction accuracy is ranked from high to low as follows: proposed model > CEEMDAN-VMD-LSTM-Attention > CEEMDAN-WPD-TCN-Attention > CEEMDAN-VMD-TCN > CEEMDAN-VMD-LSTM > CEEMDAN-TCN-Attention > CEEMDAN-LSTM > TCN > LSTM. Because they better eliminate the randomness and irregularity in the time series, hybrid models outperform single prediction models. However, some components produced by the quadratic decomposition models remain unstable and highly complex, and these models cannot adjust the decomposition dynamically according to the features of the components, which results in poor transferability and lower prediction accuracy. The model proposed in this paper improves on the existing decomposition and reconstruction mode by dynamically adjusting the number of decompositions and the components to be decomposed based on their features, and it uses the TPE optimization algorithm to optimize the parameters of the CEEMDAN-VMD-TCN-Attention-TPE model, which significantly improves predictive performance.

The scatter plots in Figure 13 and Figure 14 show clear differences in the predictive performance of the models. The points of the LSTM and CEEMDAN-LSTM models deviate significantly from the actual values, especially around the peaks of the runoff series. The CEEMDAN-VMD-LSTM and CEEMDAN-TCN-Attention models perform similarly to each other and better overall than the two models mentioned above, but their points are still relatively scattered. Overall, the scatter plots show that the models based on dynamic decomposition reconstruction integration (DDRI) perform significantly better than the other models. However, the neural network models mentioned above have a large number of hyperparameters that are difficult to select, and improper selection may lead to underfitting or overfitting. Therefore, using the TPE optimization algorithm to optimize the parameters of the CEEMDAN-VMD-TCN-Attention-TPE model clearly improves its prediction performance.
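The idea behind TPE-based tuning can be sketched for a single hyperparameter. The code below is a deliberately simplified, one-dimensional illustration and not the algorithm as implemented in any particular library: it uses a fixed kernel bandwidth, a toy quadratic objective in place of a validation loss, and hypothetical names throughout. It splits past trials into a "good" and a "bad" group, models each group with a Parzen (Gaussian kernel) density, and picks the candidate that maximizes the density ratio l(x)/g(x), which in TPE corresponds to maximizing expected improvement.

```python
import math
import random


def kde(x, centers, bandwidth):
    """Parzen (Gaussian kernel) density estimate at x; 0.0 if there are no centers."""
    if not centers:
        return 0.0
    norm = len(centers) * bandwidth * math.sqrt(2.0 * math.pi)
    return sum(math.exp(-0.5 * ((x - c) / bandwidth) ** 2) for c in centers) / norm


def tpe_minimize(objective, low, high, n_startup=10, n_trials=50,
                 gamma=0.25, n_candidates=24, seed=0):
    """Simplified one-dimensional TPE loop; returns (best_x, best_loss, history)."""
    rng = random.Random(seed)
    bandwidth = (high - low) / 10.0  # fixed bandwidth: a simplifying assumption
    history = []
    for trial in range(n_trials):
        if trial < n_startup:
            x = rng.uniform(low, high)  # random warm-up trials
        else:
            ranked = sorted(history, key=lambda pair: pair[1])
            n_good = max(1, int(gamma * len(ranked)))
            good = [p[0] for p in ranked[:n_good]]
            bad = [p[0] for p in ranked[n_good:]]
            # Draw candidates from l(x), the density of the good trials ...
            candidates = []
            for _ in range(n_candidates):
                c = rng.gauss(rng.choice(good), bandwidth)
                candidates.append(min(high, max(low, c)))
            # ... and keep the one with the largest ratio l(x) / g(x).
            x = max(
                candidates,
                key=lambda c: kde(c, good, bandwidth) / (kde(c, bad, bandwidth) + 1e-12),
            )
        history.append((x, objective(x)))
    best_x, best_loss = min(history, key=lambda pair: pair[1])
    return best_x, best_loss, history


# Toy objective standing in for a validation loss (an assumption for illustration).
best_x, best_loss, history = tpe_minimize(lambda x: (x - 2.0) ** 2, -5.0, 5.0)
print(best_x, best_loss)
```

In a real tuning run, the objective would train the forecasting model with the proposed hyperparameters and return its validation error, and the search space would cover several hyperparameters at once.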

Although the observed–predicted runoff hydrographs and scatter diagrams give an intuitive view of how well the predicted values match the observed data, statistical indicators evaluate the prediction ability of the different models more precisely. Therefore, this study computed the statistical indicators over 20 training runs and averaged them. Table 4 and Table 5 show the evaluation results of the nine forecasting methods for the Pingshan and Yichang Hydrological Stations. Figure 15 and Figure 16 show radar charts of the predictive performance at the Pingshan and Yichang Hydrological Stations, respectively.
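The four indicators and the averaging over repeated runs can be sketched as follows. The formulas are the standard definitions of MAE, RMSE, MAPE, and NSE, which are assumed here to match the definitions used in this paper; the function names and the small example arrays are illustrative only.

```python
import math


def mae(obs, pred):
    """Mean absolute error."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent; assumes no zero observations."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 minus error variance over observed variance."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def average_over_runs(obs, runs):
    """Mean of each indicator over repeated training runs (the text uses 20)."""
    metrics = {"MAE": mae, "RMSE": rmse, "MAPE": mape, "NSE": nse}
    return {
        name: sum(fn(obs, pred) for pred in runs) / len(runs)
        for name, fn in metrics.items()
    }


# Small made-up example, not runoff data.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [1.0, 5.0, 5.0, 9.0]
print(mae(observed, predicted), rmse(observed, predicted),
      mape(observed, predicted), nse(observed, predicted))
```

An NSE of 1 indicates a perfect match, while an NSE of 0 means the model is no better than predicting the observed mean.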

In the case study section, we compared and analyzed the runoff prediction models of the Pingshan Hydrological Station and Yichang Hydrological Station. To verify the performance of the proposed model, we selected four major types of comparison models, namely single deep learning models (LSTM and TCN), single decomposition models (CEEMDAN-LSTM and CEEMDAN-TCN-Attention), quadratic decomposition models (CEEMDAN-VMD-LSTM and CEEMDAN-VMD-TCN), and models based on DDRI (CEEMDAN-WPD-TCN-Attention and CEEMDAN-VMD-LSTM-Attention). We found that hybrid deep learning models exhibit better predictive performance than individual models. Compared with TCN, CEEMDAN-TCN-Attention, and CEEMDAN-VMD-TCN, the NSE of the CEEMDAN-VMD-TCN-Attention-TPE (DDRI) model on the test sets increased by more than 11.8%, 9.4%, and 4.9% at the Pingshan Hydrological Station and by more than 11.8%, 10.1%, and 0.5% at the Yichang Hydrological Station, respectively. This indicates that extracting the internal variation features of runoff sequences through dynamic decomposition algorithms can effectively reduce the difficulty of prediction. It also demonstrates the importance of hyperparameter optimization in improving the accuracy of runoff prediction models. In addition, Table 4 and Table 5 show that our proposed runoff forecasting method has the best statistical indicators during both the training and testing periods. Taking the Pingshan Hydrological Station as an example, the MAE, RMSE, MAPE, and NSE in the test period were 1007.93, 985.87, 16.47, and 0.922, respectively. Compared with the single deep learning model (TCN), the MAE decreased by 33.2%, the RMSE decreased by 36.1%, the MAPE decreased by 23.1%, and the NSE increased by 13.4%. Compared with the single decomposition model (CEEMDAN-TCN-Attention), the MAE decreased by 28.1%, the RMSE decreased by 37.3%, the MAPE decreased by 24.8%, and the NSE increased by 10.4%. Compared with the quadratic decomposition model (CEEMDAN-VMD-TCN), the MAE decreased by 21.2%, the RMSE decreased by 29.6%, the MAPE decreased by 10.6%, and the NSE increased by 5.1%. Compared with the DDRI-based model CEEMDAN-VMD-LSTM-Attention, the MAE decreased by 13.1%, the RMSE decreased by 22.9%, the MAPE decreased by 0.4%, and the NSE increased by 2.1%. The above analysis indicates that, compared with the other methods, the runoff forecasting method proposed in this paper provides more satisfactory forecasting results. These results indicate that the DDRI prediction framework developed in this article has better runoff prediction performance than the single decomposition and quadratic decomposition frameworks. In summary, the combination of the dynamic decomposition reconstruction ensemble processing framework, the Temporal Convolutional Network, the attention mechanism, and the hyperparameter optimization method can enhance the prediction performance of complex hydrological time series models.
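The improvement rates quoted above are relative changes with respect to the baseline model. As a short worked check, the sketch below recomputes the Pingshan test-period comparison between the proposed model and TCN from the Table 4 values; the function name `improvement_rates` and the sign convention (positive means better) are assumptions made for illustration.

```python
# Testing-period values for the Pingshan Hydrological Station, copied from Table 4.
proposed = {"MAE": 1007.93, "RMSE": 985.87, "MAPE": 16.47, "NSE": 0.922}
baseline_tcn = {"MAE": 1509.31, "RMSE": 1542.57, "MAPE": 21.41, "NSE": 0.813}


def improvement_rates(baseline, model):
    """Percentage change relative to the baseline, signed so that positive = better."""
    rates = {}
    for name in baseline:
        change = (model[name] - baseline[name]) / baseline[name] * 100.0
        # Error metrics should fall, whereas NSE should rise.
        rates[name] = round(change if name == "NSE" else -change, 1)
    return rates


rates_vs_tcn = improvement_rates(baseline_tcn, proposed)
print(rates_vs_tcn)  # {'MAE': 33.2, 'RMSE': 36.1, 'MAPE': 23.1, 'NSE': 13.4}
```

The same function applied to the other rows of Table 4 reproduces the remaining percentages in the paragraph above.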

6. Conclusions

In this paper, we proposed a CEEMDAN-VMD-TCN-Attention-TPE model (DDRI) for monthly runoff prediction. The proposed model preprocesses the data and uses the CEEMDAN decomposition method to decompose the runoff sequence. A TCN-Attention runoff forecasting model was established for each IMF component and then optimized and trained. To build and train the TCN-Attention runoff prediction model based on dynamic decomposition reconstruction integration processing, the TPE hyperparameter optimization algorithm iterates as follows: it proposes new hyperparameters by maximizing the expected improvement (EI), evaluates the model with them, and records the returned evaluation metrics, repeating these steps until the iterations are complete. The hyperparameters of the model that performs best are then selected and output, which completes hyperparameter tuning for the model. Then, the low-precision components that need to be decomposed again are grouped into high-complexity and low-complexity components using permutation entropy (PE). The VMD method is then used for secondary decomposition to generate new components. Finally, the secondary components are input into the trained and optimized TCN-Attention model for prediction. To verify the performance of the proposed model, several models, including LSTM, TCN, CEEMDAN-LSTM, CEEMDAN-TCN-Attention, CEEMDAN-VMD-LSTM, CEEMDAN-VMD-TCN, CEEMDAN-VMD-LSTM-Attention, and CEEMDAN-WPD-TCN-Attention, were chosen as comparison methods. The study offers the following conclusions:

(1). By comparing the single deep learning models (including LSTM and TCN), the single decomposition models (including CEEMDAN-LSTM and CEEMDAN-TCN-Attention) and the quadratic decomposition models (including CEEMDAN-VMD-LSTM and CEEMDAN-VMD-TCN), it was observed that the prediction performance of models based on dynamic decomposition reconstruction integration processing was superior to that of single and quadratic decomposition models in all indicators. On this basis, it was asserted that models based on dynamic decomposition reconstruction integration processing can improve prediction performance. The MAE, RMSE, MAPE, and NSE indicators of the proposed model showed the best performances, with test set values of 1007.93, 985.87, 16.47, and 0.922 for the Pingshan Hydrological Station and 1086.81, 1211.18, 17.20, and 0.919 for the Yichang Hydrological Station, respectively.
(2). The method proposed in this paper integrates dynamic decomposition methods and filtered decomposition reconstruction processes, fully utilizing decomposition techniques, reconstruction techniques, complexity analysis, dynamic decomposition techniques, and neural networks optimized by automatic hyperparameter optimization algorithms, effectively improving the interpretability and prediction accuracy of the model of monthly runoff.
(3). The temporal convolutional network can effectively extract cross-temporal nonlinear features of longer runoff data, and the attention mechanism can effectively capture the importance distribution and duration relationship of historical time features in runoff prediction, as verified in this paper.
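The complexity screening by permutation entropy mentioned in the summary above can be illustrated with a short sketch. It uses the standard normalized permutation entropy with embedding order and delay as parameters; the default values, the function name, and the example sequences are assumptions for illustration, and the thresholds used in this paper to separate high-complexity from low-complexity components are not reproduced here.

```python
import math
from collections import Counter


def permutation_entropy(series, order=3, delay=1):
    """Normalized permutation entropy in [0, 1] (0 = fully regular ordering)."""
    n_patterns = len(series) - (order - 1) * delay
    if n_patterns <= 0:
        raise ValueError("series too short for the chosen order and delay")
    counts = Counter()
    for start in range(n_patterns):
        window = [series[start + i * delay] for i in range(order)]
        # Ordinal pattern: the indices of the window sorted by value.
        pattern = tuple(sorted(range(order), key=lambda i: window[i]))
        counts[pattern] += 1
    entropy = -sum(
        (c / n_patterns) * math.log(c / n_patterns) for c in counts.values()
    )
    return entropy / math.log(math.factorial(order))


# A monotone sequence has a single ordinal pattern; an alternating one uses both
# patterns of order 2 equally often.
print(permutation_entropy(list(range(20))))
print(permutation_entropy([1, 2, 1, 2, 1], order=2))
```

Components with entropy close to 1 behave more irregularly and are natural candidates for further decomposition, whereas components with entropy close to 0 are already regular.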

The prediction results show that, among all the methods considered, the method proposed in this paper performs best in terms of prediction accuracy, model training time, and multi-step-ahead prediction. Specifically, our study answers the questions of which components need to be reconstructed, which components need to be decomposed, and how many times they need to be decomposed in a decomposition-ensemble framework. This research, however, is subject to several limitations. The dynamic decomposition–reconstruction–ensemble approach proposed in this study produces more components, which increases the complexity and computational cost of the model. Future work could reduce the model complexity by improving the decomposition technique so that it extracts the features of the original sequence more fully. In addition, only the data series of two stations were used for model verification in this study. Future research needs to be extended to stations in other basins to verify the robustness of the model.

Author Contributions

Conceptualization, Z.Q. and L.M.; methodology, Z.Q.; validation, Z.Q. and Y.Z.; formal analysis, L.M.; data curation, Z.Q., Y.Z. and P.R.; writing—original draft preparation, Z.Q.; writing—review and editing, L.M. and Y.Z.; visualization, Z.Q. and S.Z.; supervision, L.M. and H.Q. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

Acronym	Full Name
DDRI	Dynamic decomposition–reconstruction–integration
QDRI	Quadratic decomposition reconstruction integration
SDRI	Single decomposition reconstruction integration
CEEMDAN	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise
VMD	Variational Modal Decomposition
WPD	Wavelet Packet Decomposition
EMD	Empirical mode decomposition
TCN	Temporal Convolutional Network
LSTM	Long Short-Term Memory
TCN-Attention	Attention-based Temporal Convolutional Network
LSTM-Attention	Attention-based Long Short-Term Memory
CEEMDAN-LSTM	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise–Long Short-Term Memory
CEEMDAN-TCN-Attention	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise–Attention-based Temporal Convolutional Network
CEEMDAN-VMD-LSTM	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise–Variational Modal Decomposition–Long Short-Term Memory
CEEMDAN-VMD-TCN	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise–Variational Modal Decomposition–Temporal Convolutional Network
CEEMDAN-VMD-LSTM-Attention	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise–Variational Modal Decomposition–Attention-based Long Short-Term Memory
CEEMDAN-WPD-TCN-Attention	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise–Wavelet Packet Decomposition–Attention-based Temporal Convolutional Network
CEEMDAN-VMD-TCN-Attention-TPE	Complete Ensemble Empirical Mode Decomposition With Adaptive Noise–Variational Modal Decomposition–Attention-based Temporal Convolutional Network–Tree-structured Parzen Estimator
MAE	Mean Absolute Error
RMSE	Root Mean Square Error
MAPE	Mean Absolute Percentage Error
NSE	Nash–Sutcliffe Efficiency coefficient
TPE	Tree-structured Parzen Estimator
IMF	Intrinsic mode function
ADF	Augmented Dickey Fuller
PACF	Partial autocorrelation function

Footnotes

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Figures and Tables

Figure 1. Visualization of causal convolutional layers.

Figure 2. Visualization of dilative convolutional layers.

Figure 3. Visualization of residual connection.

Figure 4. Framework of TCN-Attention.

Figure 5. The framework of our proposed dynamic decomposition–reconstruction–ensemble approach.

Figure 6. Study area and locations of the two hydrological stations in the Yangtze River Basin.

Figure 7. Runoff of the Pingshan Hydrological Station.

Figure 8. Runoff of the Yichang Hydrological Station.

Figure 9. The PACF plot of Pingshan Hydrological Station runoff data.

Figure 10. The PACF plot of Yichang Hydrological Station runoff data.

Figure 11. Prediction results and residuals of different methods for the Pingshan Hydrological Station.

Figure 12. Prediction results and residuals of different methods for the Yichang Hydrological Station.

Figure 13. Scatter plots of different methods for the Pingshan Hydrological Station. (a) Proposed model; (b) CEEMDAN-VMD-LSTM-Attention model; (c) CEEMDAN-WPD-TCN-Attention model; (d) CEEMDAN-VMD-TCN model; (e) CEEMDAN-VMD-LSTM model; (f) CEEMDAN-TCN-Attention model; (g) CEEMDAN-LSTM model; (h) TCN model; (i) LSTM model.

Figure 14. Scatter plots of different methods for the Yichang Hydrological Station. (a) Proposed model; (b) CEEMDAN-VMD-LSTM-Attention model; (c) CEEMDAN-WPD-TCN-Attention model; (d) CEEMDAN-VMD-TCN model; (e) CEEMDAN-VMD-LSTM model; (f) CEEMDAN-TCN-Attention model; (g) CEEMDAN-LSTM model; (h) TCN model; (i) LSTM model.

Figure 15. Radar charts of statistical indicators for different runoff prediction models at the Pingshan Hydrological Station during the experimental stage. (a) MAE improvement rate; (b) RMSE improvement rate; (c) MAPE improvement rate; (d) NSE improvement rate.

Figure 16. Radar charts of statistical indicators for different runoff prediction models at the Yichang Hydrological Station during the experimental stage. (a) MAE improvement rate; (b) RMSE improvement rate; (c) MAPE improvement rate; (d) NSE improvement rate.

Table 1

Statistical data of the runoff sequence training, validation, and test sets at the Pingshan and Yichang Hydrological Stations.

Station	Data Range	Min (m³/s)	Max (m³/s)	Ave (m³/s)	STDEV (m³/s)
Pingshan	January 1940–December 1995	1109.03	17,309.68	4476.28	3596.699
	January 1996–December 2002	1274.52	19,448.39	5148.48	4339.637
	January 2003–December 2009	1415.48	14,184.84	4499.71	3362.689
Yichang	January 1882–December 1985	3057.74	49,458.06	14,264.18	10,272.54
	January 1986–December 1998	3183.93	52,167.74	13,447.46	9834.172
	January 1999–December 2011	3329.35	42,770.97	12,823.18	8860.593

Table 2

The ADF test results for the Pingshan Hydrological Station and Yichang Hydrological Station runoff series.

Station	Data Range	Test Statistic	$p$-Value	1%	5%	10%
Pingshan	January 1940–December 2009	-5.77	$6.07 \times 10^{-6}$	-4.18	-3.52	-3.17
Yichang	January 1882–December 2011	-6.13	$-4.59 \times 10^{-5}$	-5.82	-4.63	-4.91

Table 3

Optimal hyperparameters of the TCN-Attention model by the TPE algorithm.

Station	Learning Rate	Number of Convolution Kernels	Kernel Size	Loss Function	Optimizer	$k$	Epochs	Batch Size
Pingshan	0.002	0.002	2 × 2	mse	adam	8	100	64
Yichang	0.002	0.002	2 × 2	mse	adam	8	100	64

Table 4

The predictive performance of different forecasting methods at Pingshan Hydrological Station.

Model	Training				Testing
Model	MAE (m³/s)	RMSE (m³/s)	MAPE (%)	NSE	MAE (m³/s)	RMSE (m³/s)	MAPE (%)	NSE
LSTM	1411.36	1633.51	24.17	0.805	1554.29	1664.36	25.36	0.797
TCN	1396.26	1607.29	22.89	0.831	1509.31	1542.57	21.41	0.813
CEEMDAN-LSTM	1345.20	1591.85	20.17	0.842	1499.27	1633.21	22.73	0.827
CEEMDAN-TCN-Attention	1438.29	1611.81	21.41	0.857	1401.89	1571.92	21.89	0.835
CEEMDAN-VMD-LSTM	1351.35	1584.37	19.53	0.897	1371.24	1547.83	19.77	0.821
CEEMDAN-VMD-TCN	1281.48	1531.47	18.74	0.901	1279.03	1400.21	18.43	0.877
CEEMDAN-WPD-TCN-Attention	1134.31	1331.12	16.29	0.913	1258.40	1239.33	15.11	0.861
CEEMDAN-VMD-LSTM-Attention	1033.27	1295.17	16.17	0.917	1159.41	1278.42	16.54	0.903
Proposed Model	1007.92	1229.33	15.03	0.929	1007.93	985.87	16.47	0.922

Table 5

The predictive performance of different forecasting methods at Yichang Hydrological Station.

Model	Training				Testing
Model	MAE (m³/s)	RMSE (m³/s)	MAPE (%)	NSE	MAE (m³/s)	RMSE (m³/s)	MAPE (%)	NSE
LSTM	1418.84	1689.74	23.91	0.811	1429.19	1691.24	24.28	0.791
TCN	1392.04	1671.84	23.81	0.820	1421.28	1687.28	18.47	0.811
CEEMDAN-LSTM	1381.74	1665.33	22.80	0.829	1328.91	1691.19	21.29	0.819
CEEMDAN-TCN-Attention	1400.25	1691.72	21.63	0.834	1321.18	1601.28	22.38	0.826
CEEMDAN-VMD-LSTM	1363.94	1688.93	19.22	0.855	1363.01	1459.19	19.92	0.847
CEEMDAN-VMD-TCN	1231.95	1572.94	19.53	0.887	1309.83	1527.01	17.17	0.914
CEEMDAN-WPD-TCN-Attention	1291.62	1484.82	18.29	0.901	1291.74	1501.82	18.29	0.919
CEEMDAN-VMD-LSTM-Attention	1152.84	1376.29	17.81	0.917	1191.84	1410.29	18.49	0.911
Proposed Model	1121.81	1374.21	16.54	0.926	1086.81	1211.18	17.20	0.919

References

1. Ahmed, J.A.; Sarma, A.K. Artificial Neural Network Model for Synthetic Streamflow Generation. Water Resour. Manag.; 2007; 21, pp. 1015-1029. [DOI: https://dx.doi.org/10.1007/s11269-006-9070-y]

2. Liu, Y.; Qin, H.; Mo, L.; Wang, Y.; Chen, D.; Pang, S.; Yin, X. Hierarchical Flood Operation Rules Optimization Using Multi-Objective Cultured Evolutionary Algorithm Based on Decomposition. Water Resour. Manag.; 2019; 33, pp. 337-354. [DOI: https://dx.doi.org/10.1007/s11269-018-2105-3]

3. Wang, Q.; Yue, C.; Li, X.; Liao, P.; Li, X. Enhancing robustness of monthly streamflow forecasting model using embedded-feature selection algorithm based on improved gray wolf optimizer. J. Hydrol.; 2023; 617, 128995. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2022.128995]

4. Devia, G.K.; Ganasri, B.P.; Dwarakish, G.S. A Review on Hydrological Models. Aquat. Procedia; 2015; 4, pp. 1001-1007. [DOI: https://dx.doi.org/10.1016/j.aqpro.2015.02.126]

5. Valipour, M.; Banihabib, M.E.; Behbahani, S.M.R. Comparison of the ARMA, ARIMA, and the Autoregressive Artificial Neural Network Models in Forecasting the Monthly Inflow of Dez Dam Reservoir. J. Hydrol.; 2013; 476, pp. 433-441. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2012.11.017]

6. Castellano-Méndez, M.; González-Manteiga, W.; Febrero-Bande, M.; Manuel Prada-Sánchez, J.; Lozano-Calderón, R. Modelling of the Monthly and Daily Behaviour of the Runoff of the Xallas River Using Box–Jenkins and Neural Networks Methods. J. Hydrol.; 2004; 296, pp. 38-58. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2004.03.011]

7. Besaw, L.E.; Rizzo, D.M.; Bierman, P.R.; Hackett, W.R. Advances in Ungauged Streamflow Prediction Using Artificial Neural Networks. J. Hydrol.; 2010; 386, pp. 27-37. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2010.02.037]

8. Kisi, O. Pan Evaporation Modeling Using Least Square Support Vector Machine, Multivariate Adaptive Regression Splines and M5 Model Tree. J. Hydrol.; 2015; 528, pp. 312-320. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2015.06.052]

9. Mouatadid, S.; Adamowski, J. Using Extreme Learning Machines for Short-Term Urban Water Demand Forecasting. Urban Water J.; 2017; 14, pp. 630-638. [DOI: https://dx.doi.org/10.1080/1573062X.2016.1236133]

10. Wu, C.L.; Chau, K.W.; Li, Y.S. Predicting Monthly Streamflow Using Data-driven Models Coupled with Data-preprocessing Techniques. Water Resour. Res.; 2009; 45, W08432. [DOI: https://dx.doi.org/10.1029/2007WR006737]

11. Xie, T.; Zhang, G.; Hou, J.; Xie, J.; Lv, M.; Liu, F. Hybrid Forecasting Model for Non-Stationary Daily Runoff Series: A Case Study in the Han River Basin, China. J. Hydrol.; 2019; 577, 123915. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2019.123915]

12. Wang, S.; Yu, L.; Tang, L.; Wang, S. A Novel Seasonal Decomposition Based Least Squares Support Vector Regression Ensemble Learning Approach for Hydropower Consumption Forecasting in China. Energy; 2011; 36, pp. 6542-6554. [DOI: https://dx.doi.org/10.1016/j.energy.2011.09.010]

13. Adnan, R.M.; Liang, Z.; Trajkovic, S.; Zounemat-Kermani, M.; Li, B.; Kisi, O. Daily Streamflow Prediction Using Optimally Pruned Extreme Learning Machine. J. Hydrol.; 2019; 577, 123981. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2019.123981]

14. Yaseen, Z.M.; Sulaiman, S.O.; Deo, R.C.; Chau, K.-W. An Enhanced Extreme Learning Machine Model for River Flow Forecasting: State-of-the-Art, Practical Applications in Water Resource Engineering Area and Future Research Direction. J. Hydrol.; 2019; 569, pp. 387-408. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2018.11.069]

15. Rezaie-Balf, M.; Kim, S.; Fallah, H.; Alaghmand, S. Daily River Flow Forecasting Using Ensemble Empirical Mode Decomposition Based Heuristic Regression Models: Application on the Perennial Rivers in Iran and South Korea. J. Hydrol.; 2019; 572, pp. 470-485. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2019.03.046]

16. Nanda, T.; Sahoo, B.; Chatterjee, C. Enhancing Real-Time Streamflow Forecasts with Wavelet-Neural Network Based Error-Updating Schemes and ECMWF Meteorological Predictions in Variable Infiltration Capacity Model. J. Hydrol.; 2019; 575, pp. 890-910. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2019.05.051]

17. Wen, X.; Feng, Q.; Deo, R.C.; Wu, M.; Yin, Z.; Yang, L.; Singh, V.P. Two-Phase Extreme Learning Machines Integrated with the Complete Ensemble Empirical Mode Decomposition with Adaptive Noise Algorithm for Multi-Scale Runoff Prediction Problems. J. Hydrol.; 2019; 570, pp. 167-184. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2018.12.060]

18. Chen, S.; Ren, M.; Sun, W. Combining Two-Stage Decomposition Based Machine Learning Methods for Annual Runoff Forecasting. J. Hydrol.; 2021; 603, 126945. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2021.126945]

19. He, F.; Wan, Q.; Wang, Y.; Wu, J.; Zhang, X.; Feng, Y. Daily Runoff Prediction with a Seasonal Decomposition-Based Deep GRU Method. Water; 2024; 16, 618. [DOI: https://dx.doi.org/10.3390/w16040618]

20. Zhang, C.; Sheng, Z.; Zhang, C.; Wen, S. Multi-Lead-Time Short-Term Runoff Forecasting Based on Ensemble Attention Temporal Convolutional Network. Expert. Syst. Appl.; 2024; 243, 122935. [DOI: https://dx.doi.org/10.1016/j.eswa.2023.122935]

21. Qiao, X.; Peng, T.; Sun, N.; Zhang, C.; Liu, Q.; Zhang, Y.; Wang, Y.; Shahzad Nazir, M. Metaheuristic Evolutionary Deep Learning Model Based on Temporal Convolutional Network, Improved Aquila Optimizer and Random Forest for Rainfall-Runoff Simulation and Multi-Step Runoff Prediction. Expert. Syst. Appl.; 2023; 229, 120616. [DOI: https://dx.doi.org/10.1016/j.eswa.2023.120616]

22. Bai, S.; Kolter, J.Z.; Koltun, V. An Empirical Evaluation of Generic Convolutional and Recurrent Networks for Sequence Modeling. arXiv; 2018; arXiv: 1803.01271v2

23. Tan, K.; Chen, J.; Wang, D. Gated Residual Networks with Dilated Convolutions for Monaural Speech Enhancement. IEEE/ACM Trans. Audio Speech Lang. Process.; 2019; 27, pp. 189-198. [DOI: https://dx.doi.org/10.1109/TASLP.2018.2876171] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/31355300]

24. Sun, L.; Du, J.; Dai, L.-R.; Lee, C.-H. Multiple-Target Deep Learning for LSTM-RNN Based Speech Enhancement. Proceedings of the 2017 Hands-Free Speech Communications and Microphone Arrays (HSCMA); San Francisco, CA, USA, 1–3 March 2017; pp. 136-140.

25. He, Y.; Liu, R.; Li, H.; Wang, S.; Lu, X. Short-Term Power Load Probability Density Forecasting Method Using Kernel-Based Support Vector Quantile Regression and Copula Theory. Appl. Energy; 2017; 185, pp. 254-266. [DOI: https://dx.doi.org/10.1016/j.apenergy.2016.10.079]

26. Feng, S.; Feng, Y. A Dual-Staged Attention Based Conversion-Gated Long Short Term Memory for Multivariable Time Series Prediction. IEEE Access; 2022; 10, pp. 368-379. [DOI: https://dx.doi.org/10.1109/ACCESS.2021.3136712]

27. Yang, L.; Shami, A. On Hyperparameter Optimization of Machine Learning Algorithms: Theory and Practice. Neurocomputing; 2020; 415, pp. 295-316. [DOI: https://dx.doi.org/10.1016/j.neucom.2020.07.061]

28. Shen, K.; Qin, H.; Zhou, J.; Liu, G. Runoff Probability Prediction Model Based on Natural Gradient Boosting with Tree-Structured Parzen Estimator Optimization. Water; 2022; 14, 545. [DOI: https://dx.doi.org/10.3390/w14040545]

29. Jones, D.R. A Taxonomy of Global Optimization Methods Based on Response Surfaces. J. Glob. Optim.; 2001; 21, pp. 345-383. [DOI: https://dx.doi.org/10.1023/A:1012771025575]

30. Kala, A.; Ganesh Vaidyanathan, S.; Sharon Femi, P. CEEMDAN Hybridized with LSTM Model for Forecasting Monthly Rainfall. J. Intell. Fuzzy Syst.; 2022; 43, pp. 2609-2617. [DOI: https://dx.doi.org/10.3233/JIFS-213064]

31. Luukko, P.J.J.; Helske, J.; Räsänen, E. Introducing Libeemd: A Program Package for Performing the Ensemble Empirical Mode Decomposition. Comput. Stat.; 2017; 31, pp. 545-557. [DOI: https://dx.doi.org/10.1007/s00180-015-0603-9]

32. Zhang, J.; Li, Z.; Huang, J.; Cheng, M.; Li, H. Study on Vibration-Transmission-Path Identification Method for Hydropower Houses Based on CEEMDAN-SVD-TE. Appl. Sci.; 2022; 12, 7455. [DOI: https://dx.doi.org/10.3390/app12157455]

33. Chang, J.; Zhang, H.; Wang, Y.; Zhu, Y. Assessing the Impact of Climate Variability and Human Activities on Streamflow Variation. Hydrol. Earth Syst. Sci.; 2016; 20, pp. 1547-1560. [DOI: https://dx.doi.org/10.5194/hess-20-1547-2016]

34. Fang, J.; Yang, L.; Wen, X.; Yu, H.; Li, W.; Adamowski, J.F.; Barzegar, R. Ensemble Learning Using Multivariate Variational Mode Decomposition Based on the Transformer for Multi-Step-Ahead Streamflow Forecasting. J. Hydrol.; 2024; 636, 131275. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2024.131275]

35. Xu, Z.; Mo, L.; Zhou, J.; Fang, W.; Qin, H. Stepwise Decomposition-Integration-Prediction Framework for Runoff Forecasting Considering Boundary Correction. Sci. Total Environ.; 2022; 851, 158342. [DOI: https://dx.doi.org/10.1016/j.scitotenv.2022.158342]

36. Zhong, W.; Guo, J.; Chen, L.; Zhou, J.; Zhang, J.; Wang, D. Future Hydropower Generation Prediction of Large-Scale Reservoirs in the Upper Yangtze River Basin under Climate Change. J. Hydrol.; 2020; 588, 125013. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2020.125013]

37. Shi, M.; Yuan, Z.; Shi, X.; Li, M.; Chen, F.; Li, Y. Drought Assessment of Terrestrial Ecosystems in the Yangtze River Basin, China. J. Clean. Prod.; 2022; 362, 132234. [DOI: https://dx.doi.org/10.1016/j.jclepro.2022.132234]

38. Chao, N.; Li, F.; Yu, N.; Chen, G.; Wang, Z.; Ouyang, G.; Yeh, P.J.-F. Divergent Spatiotemporal Variability of Terrestrial Water Storage and Eight Hydroclimatic Components over Three Different Scales of the Yangtze River Basin. Sci. Total Environ.; 2023; 879, 162886. [DOI: https://dx.doi.org/10.1016/j.scitotenv.2023.162886]

39. Wen, S.; Su, B.; Wang, Y.; Zhai, J.; Sun, H.; Chen, Z.; Huang, J.; Wang, A.; Jiang, T. Comprehensive Evaluation of Hydrological Models for Climate Change Impact Assessment in the Upper Yangtze River Basin, China. Clim. Change; 2020; 163, pp. 1207-1226. [DOI: https://dx.doi.org/10.1007/s10584-020-02929-6]

40. Wang, Y.; Wu, L. On Practical Challenges of Decomposition-Based Hybrid Forecasting Algorithms for Wind Speed and Solar Irradiation. Energy; 2016; 112, pp. 208-220. [DOI: https://dx.doi.org/10.1016/j.energy.2016.06.075]

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Abstract

Accurate and reliable runoff forecasting is of great significance for hydropower station operation and watershed water resource allocation. However, complex factors such as climate conditions and human activities constantly affect runoff formation. Runoff data under changing environments are highly nonlinear, time-varying, and stochastic, which makes runoff prediction challenging. Against this background, this study combines reconstruction integration and dynamic decomposition techniques to propose a Temporal Convolutional Network Fusion Attention Mechanism runoff prediction method based on dynamic decomposition reconstruction integration processing. The method uses the Temporal Convolutional Network to extract cross-temporal nonlinear characteristics from longer runoff series, and introduces an attention mechanism to capture the importance distribution and duration relationships of historical temporal features in runoff prediction. It integrates a decomposition reconstruction process based on dynamic classification and filtering, combining decomposition, reconstruction, complexity analysis, dynamic decomposition, and neural networks tuned by automatic hyperparameter optimization algorithms, which improves the model's interpretability and prediction accuracy. Historical monthly runoff datasets from the Pingshan Hydrological Station and the Yichang Hydrological Station were used for validation, and eight models, including the LSTM model, the CEEMDAN-TCN-Attention model, and the CEEMDAN-VMD-LSTM-Attention (DDRI) model, were selected for comparative prediction experiments. The proposed model achieved the best MAE, RMSE, MAPE, and NSE, with test set values of 1007.93, 985.87, 16.47, and 0.922 for the Pingshan Hydrological Station and 1086.81, 1211.18, 17.20, and 0.919 for the Yichang Hydrological Station, respectively. The experimental results indicate that the trained fusion model learns runoff temporal features well and that the proposed model has clear advantages in overall predictive performance, stability, correlation, comprehensive accuracy, and statistical testing.
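The four indicators named in the abstract can be illustrated with a minimal sketch. It assumes the standard textbook definitions of MAE, RMSE, MAPE (in percent), and the Nash-Sutcliffe efficiency (NSE); the abstract does not state the exact formulas used in the study. The function names and the sample values are hypothetical and are not taken from the Pingshan or Yichang datasets.

```python
import math


def mae(obs, pred):
    """Mean absolute error, in the units of the runoff series."""
    return sum(abs(o - p) for o, p in zip(obs, pred)) / len(obs)


def rmse(obs, pred):
    """Root mean square error, in the units of the runoff series."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / len(obs))


def mape(obs, pred):
    """Mean absolute percentage error in percent (assumes no zero observations)."""
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(obs, pred)) / len(obs)


def nse(obs, pred):
    """Nash-Sutcliffe efficiency: 1 is a perfect fit, 0 matches the observed mean."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - p) ** 2 for o, p in zip(obs, pred))
    variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


# Illustrative values only (hypothetical monthly runoff, not from the paper).
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 180.0, 310.0, 390.0]

for name, metric in (("MAE", mae), ("RMSE", rmse), ("MAPE", mape), ("NSE", nse)):
    print(f"{name}: {metric(observed, predicted):.4f}")
```

Lower MAE, RMSE, and MAPE indicate smaller errors, while an NSE closer to 1 indicates a better fit to the observed series.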

Details

Title

A Temporal Convolutional Neural Network Fusion Attention Mechanism Runoff Prediction Model Based on Dynamic Decomposition Reconstruction Integration Processing

Author

Zhou, Qin¹; Zhang, Yongchuan¹; Qin, Hui¹; Li, Mo¹; Ren, Pingan¹; Zhu, Sipeng¹

¹ School of Civil and Hydraulic Engineering, Huazhong University of Science & Technology, Wuhan 430074, China; Hubei Key Laboratory of Digital Valley Science and Technology, Wuhan 430074, China

First page

3515

Publication year

2024

Publication date

2024

Publisher

MDPI AG

e-ISSN

2073-4441

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/w16233515
