1. Introduction

A warming climate is likely to reshape the global hydrological cycle, exacerbating water resources crises and hydrological hazards. In particular, floods, as one of the most destructive, widely distributed, and frequently occurring natural disasters, are expected to cause growing casualties and property losses [1]. To better prepare for, and mitigate, these crises and risks, it is crucial to develop timely and accurate streamflow and flood forecasting systems.

Currently, process-based hydrologic–hydraulic models and data-driven models represent the two mainstream approaches for streamflow and flood predictions [2]. Process-based hydrologic–hydraulic models explicitly represent key hydrological processes, including the rainfall-runoff process, flow routing, the interception and infiltration of precipitation, groundwater processes, evapotranspiration, ecological responses, and feedbacks. Each process is described by equations derived from the physical, phenomenological, or empirical understandings of the modelers. The various processes are threaded or coupled under the constraints of conservation laws. While there is a growing tendency in process-based modeling to move from lumped models to spatially-distributed models, and to incorporate more detailed characterization of formerly overlooked hydrological processes, this increase in model complexity does not necessarily lead to more accurate predictions [3]. This dilemma is due to the prevailing parametric and structural uncertainties in the model formulations, as well as the uncertainties of the required high-resolution input meteorological forcing data. While the community has long realized the importance of leveraging observational data to calibrate, diagnose, and inform the development of process-based models, a robust framework for consistently delivering multi-source, inhomogeneous observational information to process-based models, in order to quantify the models’ aleatoric uncertainties, and reduce the models’ epistemic uncertainties, is still lacking.

In addition to process-based models, data-driven, machine learning models recently demonstrated advantageous accuracy for streamflow predictions, and served as an attractive alternative to process-based models. Machine learning is a set of algorithms, such as linear regression [4], a support vector machine [5], model tree ensembles [4], and a neural network [6,7], and is increasingly and extensively used in the field of geosciences. In contrast to process-based models, machine learning models rely on sample data to make predictions, without being explicitly programmed to do so [8,9]. This is usually achieved by optimizing the data feature representation and model parameters to obtain an optimal objective function value. This ideology allows the data to speak for themselves, hence, revealing the potential deficiencies of our mechanistic understanding. As a result, machine learning strongly complements and enriches process-based models, and helps scientists gain new insights [10].

For streamflow forecasting in particular, a machine learning model named the long short-term memory (LSTM) recurrent neural network has demonstrated advantages in streamflow prediction [11,12,13,14]. An LSTM neural network has a hidden layer that uses gate settings to regulate information flow when modeling time series. Kratzert et al. [11] establish a rainfall-runoff model using LSTM to predict streamflow. Feng et al. [15] introduce data integration that leverages recent observations to improve short-term streamflow forecasts using LSTM. Xiang et al. [16] propose an LSTM-seq2seq model, based on LSTM and a sequence-to-sequence (seq2seq) structure, for hourly streamflow prediction; it shows excellent predictive ability and could enhance short-term flood forecast accuracy. Despite the success of the above-mentioned works, it is worth noting that purely data-driven LSTM streamflow forecast models face challenges if there are no abundant data to facilitate their training [17,18].

Applying artificial intelligence techniques, in conjunction with physical understanding, substantially improves simulation effectiveness [19,20,21]. Recently, several studies have used the so-called theory-guided machine learning approach, combining physical understanding with machine learning [2,21,22,23,24]. The two components of such a combination, based on different philosophies, complement each other in terms of their inherent strengths and limitations. While the hydrological processes represented in physics-based models can offset the black-box nature of machine learning, machine learning techniques may be helpful in extracting any information left in the residuals of physical models [25,26,27,28]. Karpatne et al. [21] propose a physics-guided neural network (PGNN) by combining a physical model with a neural network, and take lake water temperature simulation as an example to demonstrate the effectiveness of PGNN. The PGNN framework leverages the output of physics-based model simulations, along with observational features, to generate predictions using a neural network architecture [21]. Yang et al. [27] use the streamflow output of the global hydrological models (GHMs)-CaMa-Flood model chain, and meteorological data from the ERA-interim dataset [29], as the inputs of an LSTM model, and their results suggest that machine learning methods improve model-based flood simulation. To the best of our knowledge, the PGNN framework has only been used for streamflow simulation at global scale [27]. To date, attempts to use the PGNN method for streamflow simulation at river basin scale are limited, and detailed evaluation of the method is needed to assess the performance of hydrological simulations from upstream to downstream for watershed studies.
Therefore, this study integrated the physically-based VIC-CaMa-Flood model with LSTM for streamflow and flood simulation, using a PGNN framework in the Lancang–Mekong River Basin (LMRB), an important transboundary river basin in Southeast Asia.

The objectives of this study were to: (1) apply a PGNN framework that combined VIC-CaMa-Flood and LSTM, which we called a hybrid-physics-data (HPD) model, to improve streamflow and flood simulation; and (2) quantify the added value of the physical model by assessing the feature importance and relative contribution of the physical model and meteorological inputs.

2. Study Area and Data

2.1. Study Area

The transboundary Lancang–Mekong River, located between 9°60′–33°80′ N and 93°50′–108°60′ E, is the longest river in Southeast Asia. The length of the river is close to 4900 km, with a drainage area of 795,000 km², and an annual streamflow at the outlet of 14,500 m³/s (MRC, 2010). It flows through six countries: China, Myanmar, Laos, Thailand, Cambodia, and Vietnam. It is called the Lancang River in China and the Mekong River outside China. Streamflow observations at five hydrological stations, namely, Chiang Saen (CS), Luang Prabang (LP), Vientiane (VT), Mukdahan (MK), and Pakse (PK), were used in this study. The locations of the hydrological stations are shown in Figure 1.

The peak discharge per unit area of the LMRB is close to the limit value of global rain flood rivers [30]. Frequent floods pose a major threat to the safety and properties of people in surrounding countries. In addition, climate change would lead to an upward trend in the duration, occurrence frequency, and magnitude of floods in the LMRB [31,32,33], which warrants accurate simulation and prediction of floods.

2.2. Data

In this study, precipitation data, at a spatial resolution of 0.25°, were derived from a daily gridded precipitation dataset established by Japan’s Asian Precipitation-Highly Resolved Observational Data Integration Towards Evaluation of Water Resources (APHRODITE) research project [34]. Temperature and wind speed data, with a spatial resolution of 0.25°, were derived from the Global Meteorological Forcing Dataset [35]. The daily streamflow observation data (Table 1) were obtained from Mohammed et al. [36]. In addition, the soil data and land cover data were obtained from the Harmonized World Soil Database (HWSD) [37], and the global land cover data, at 1 km resolution, developed by the University of Maryland [38], respectively. The digital elevation model (DEM) was acquired from Shuttle Radar Topography Mission (SRTM) elevation data (https://srtm.csi.cgiar.org/, accessed on 18 August 2020).

3. Methods

3.1. Physical Model with VIC Coupled CaMa-Flood

The VIC model [39] is a large-scale, distributed land surface hydrological model based on physical mechanisms. The VIC model mainly considers the physical exchange processes between atmosphere, vegetation, and soil. Since its development, the VIC model has been widely used, and the model structure has been continuously improved and optimized. The VIC-2L model was gradually upgraded to the VIC-3L model, which is widely used nowadays. The grid-based VIC model simulates land surface hydrological processes following the principles of energy balance and water balance, and derives various hydrological variables, including evapotranspiration, infiltration, surface runoff, and baseflow. The VIC model has been repeatedly used in the LMRB [40,41,42], and has obtained good simulation results. In this study, the grid-based VIC model was used, and the LMRB was divided into a grid of 1288 cells, with a spatial resolution of 0.25°.

The Catchment-based Macro-scale Floodplain (CaMa-Flood) model is a distributed global river routing model that calculates flow process effectively in continental-scale rivers [43,44]. Runoff generated by hydrological models, such as VIC [45] and H08 [46], can be the input of the CaMa-Flood model. The code and data of the CaMa-Flood model are from http://hydro.iis.u-tokyo.ac.jp/~yamadai/cama-flood/ (accessed on 2 November 2020). In this study, the runoff (including surface runoff and baseflow) generated by the VIC model was taken as the input of the CaMa-Flood model, to obtain the daily streamflow.

3.2. Physics Guided LSTM Model

The PGNN framework leverages the simulation output of the physics-based model and observational features to generate simulations, using a neural network architecture [21]. LSTM is a special variant of the recurrent neural network (RNN) that can learn long-term dependencies. Compared with a standard RNN, the LSTM model adds three “gates”, which alleviates the difficulty that RNNs have in retaining information over long time spans. The LSTM computation mainly consists of four steps:

Step 1: The forget gate is used to determine how much previously stored, no longer useful information is discarded from the “cell state”. As shown in Equation (1), $h_{t-1}$ represents the output of the LSTM module at the moment $t-1$, $x_{t}$ represents the input of the current moment $t$, and the value range of $f_{t}$ is guaranteed between 0 and 1 through the activation function (the Sigmoid function is generally used).

Step 2: The new information is updated to the “cell state” through the input gate, as shown in Equations (2) and (3). Equation (2) shows that the candidate storage information ( $\tilde{c_{t}}$ ) at the current moment can be obtained through the tanh function, which connects the output ($h_{t-1}$) of the LSTM module at the moment $t-1$ with the input ($x_{t}$) at the current moment. Equation (3) plays the same role as Equation (1), but Equation (3) is designed for the input information.

Step 3: As shown in Equation (4), this step updates the “cell state”. The previous useless information is discarded by multiplying the stored information ($c_{t-1}$) at the previous moment by the $f_{t}$ calculated in the first step. The useful information is retained at the current moment by multiplying the candidate information ( $\tilde{c_{t}}$ ) at the current moment by the $i_{t}$ calculated in Equation (3). The information of the present moment is the sum of the previous useful information and the current useful information.

Step 4: The output is obtained based on the “cell state” through the output gate. First, Equation (5) is used to determine which part of the “cell state” is output. Then, as shown in Equation (6), the tanh activation function is used to process the “cell state”, and the result is multiplied by the output of Equation (5) to obtain the output information ($h_{t}$).

(1) $f_{t} = \sigma (W_{f} \cdot [h_{t-1}, x_{t}] + b_{f})$

(2) $\tilde{c_{t}} = \tanh (W_{c} \cdot [h_{t-1}, x_{t}] + b_{c})$

(3) $i_{t} = \sigma (W_{i} \cdot [h_{t-1}, x_{t}] + b_{i})$

(4) $c_{t} = f_{t} \circ c_{t-1} + i_{t} \circ \tilde{c_{t}}$

(5) $o_{t} = \sigma (W_{o} \cdot [h_{t-1}, x_{t}] + b_{o})$

(6) $h_{t} = o_{t} \circ \tanh (c_{t})$
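Equations (1)–(6) can be sketched as a single LSTM time step in NumPy. This is only an illustrative sketch, not the implementation used in this study: the function names, the random weight initialization, and the random input sequence are assumptions made for the example. The sizes (five input features, hidden state length of fifteen, 365 time steps) follow the settings described in this section.

```python
import numpy as np

def sigmoid(z):
    """Logistic activation that keeps gate values between 0 and 1."""
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step following Equations (1)-(6)."""
    concat = np.concatenate([h_prev, x_t])                      # [h_{t-1}, x_t]
    f_t = sigmoid(params["W_f"] @ concat + params["b_f"])       # Eq. (1): forget gate
    c_tilde = np.tanh(params["W_c"] @ concat + params["b_c"])   # Eq. (2): candidate state
    i_t = sigmoid(params["W_i"] @ concat + params["b_i"])       # Eq. (3): input gate
    c_t = f_t * c_prev + i_t * c_tilde                          # Eq. (4): cell state update
    o_t = sigmoid(params["W_o"] @ concat + params["b_o"])       # Eq. (5): output gate
    h_t = o_t * np.tanh(c_t)                                    # Eq. (6): hidden output
    return h_t, c_t

def init_params(n_in, n_hidden, seed=0):
    """Hypothetical small random weights; a trained model would learn these."""
    rng = np.random.default_rng(seed)
    params = {}
    for gate in ("f", "c", "i", "o"):
        params[f"W_{gate}"] = rng.normal(0.0, 0.1, size=(n_hidden, n_hidden + n_in))
        params[f"b_{gate}"] = np.zeros(n_hidden)
    return params

# Five input features, hidden state length 15, 365 daily steps.
n_in, n_hidden, seq_len = 5, 15, 365
params = init_params(n_in, n_hidden)
x_seq = np.random.default_rng(1).normal(size=(seq_len, n_in))  # placeholder inputs

h = np.zeros(n_hidden)
c = np.zeros(n_hidden)
for x_t in x_seq:
    h, c = lstm_step(x_t, h, c, params)

print(h.shape, c.shape)
```

In a streamflow model, the final hidden state would typically be passed to a dense output layer to produce the simulated discharge; that layer is omitted here.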

Here, we applied the PGNN framework, which combined VIC-CaMa-Flood and the popular deep learning technique, LSTM, to make full use of the advantages of these two models and, thus, improve the simulation performance of streamflow and flood. The meteorological variables (including precipitation, maximum temperature, minimum temperature, and wind speed), and the daily streamflow calculated by the VIC-CaMa-Flood coupled model, were input into the LSTM model together, which was called the HPD model, and its model structure is shown in Figure 2. The meteorological variables and the VIC-CaMa-Flood streamflow correspond to the selected hydrological stations. We set the length of the input features at 365 days, so the HPD model was used to simulate daily streamflow in the LMRB from 1966 to 2015. Due to the significant increase in reservoir storage capacity after 2008 [42], we divided the annual streamflow series into two stages: 1966–2007 (less impacted period), and 2008–2015 (human impact period). Moreover, we used data from the less impacted period to verify the effectiveness of the HPD model. Specifically, the daily streamflow data covering January 1966 to December 1992 were selected as the training dataset, the daily streamflow data covering January 1993 to December 1997 were selected as the validation dataset, and the daily streamflow data covering January 1998 to December 2007 were selected as the testing dataset. In addition, in order to verify the effectiveness and necessity of adding the output of the process-based model into the data-driven model, we used the individual LSTM model, whose inputs only contained meteorological variables (precipitation, maximum temperature, minimum temperature, and wind speed), to simulate streamflow and flood. Moreover, the division of training period, validation period, and testing period of the individual LSTM model was consistent with that of the HPD model. 
The HPD model and the individual LSTM model used the same hyperparameter settings: one stacked LSTM layer with a hidden state length of fifteen.
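The input windowing and the period split described above can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names (`make_windows`, `period_masks`) are hypothetical, the forcing record is assumed to start on 1 January 1965 so that a 365-day window is available for the first simulated days, and the feature array is a zero-filled placeholder rather than real data.

```python
import numpy as np

WINDOW = 365  # length of the input sequence in days

# Period boundaries as stated in the text.
PERIODS = {
    "training": ("1966-01-01", "1992-12-31"),
    "validation": ("1993-01-01", "1997-12-31"),
    "testing": ("1998-01-01", "2007-12-31"),
    "human_impact": ("2008-01-01", "2015-12-31"),
}

def make_windows(features, window=WINDOW):
    """Stack sliding windows; sample k covers days k .. k+window-1."""
    n_days = features.shape[0]
    return np.stack([features[k:k + window] for k in range(n_days - window + 1)])

def period_masks(target_dates):
    """Boolean mask per period, indexed by the date of the simulated day."""
    masks = {}
    for name, (start, end) in PERIODS.items():
        masks[name] = (target_dates >= np.datetime64(start)) & (
            target_dates <= np.datetime64(end)
        )
    return masks

# Assumed daily calendar from 1965-01-01 to 2015-12-31 (end date exclusive below).
dates = np.arange("1965-01-01", "2016-01-01", dtype="datetime64[D]")
target_dates = dates[WINDOW - 1:]   # last day of each 365-day window
masks = period_masks(target_dates)
counts = {name: int(mask.sum()) for name, mask in masks.items()}
print(counts)

# Small demonstration of the window shape: 400 days, 5 features
# (precipitation, Tmax, Tmin, wind speed, VIC-CaMa-Flood streamflow).
demo_features = np.zeros((400, 5))
demo_X = make_windows(demo_features)
print(demo_X.shape)  # (36, 365, 5)
```

Splitting by the date of the simulated day keeps the training, validation, and testing targets disjoint, although the input windows of neighbouring periods overlap at the boundaries.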

3.3. GBRT for Model Inputs Importance Measurement

The gradient boosting regression tree (GBRT) is an iterative decision tree algorithm, which constructs a set of weak learners (decision trees), and accumulates the results of multiple decision trees as the final prediction output. The algorithm effectively combines decision trees with ensemble learning. The main purpose of feature importance evaluation using GBRT is to quantify how much each feature contributes in each decision tree, average these contributions over the trees, and finally compare the contributions between features. In this study, the meteorological inputs and VIC-CaMa-Flood streamflow were the input features, and the observed streamflow was the output. In order to be consistent with the training of the HPD model, we selected the data of the training period to analyze the importance of input features for streamflow simulation at the five hydrological stations using the GBRT method.
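The feature importance procedure can be illustrated with scikit-learn's `GradientBoostingRegressor`. The sketch below uses synthetic data in which the target is deliberately dominated by the simulated-streamflow column, so the resulting numbers are purely illustrative and are not the values reported in this study. The feature names, the synthetic relationship, and the hyperparameters are assumptions made for the example.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

feature_names = ["precipitation", "t_max", "t_min", "wind_speed", "vic_cama_flow"]

# Synthetic stand-in data: 2000 samples, 5 features.
rng = np.random.default_rng(0)
n_samples = 2000
X = rng.normal(size=(n_samples, len(feature_names)))

# Assumed relationship: the target depends mostly on the simulated streamflow
# (last column) and weakly on precipitation (first column), plus noise.
y = 3.0 * X[:, 4] + 0.5 * X[:, 0] + 0.1 * rng.normal(size=n_samples)

model = GradientBoostingRegressor(n_estimators=100, max_depth=3, random_state=0)
model.fit(X, y)

# Impurity-based importances, averaged over the trees and normalized to sum to 1.
importance = dict(zip(feature_names, model.feature_importances_))
for name, value in sorted(importance.items(), key=lambda kv: -kv[1]):
    print(f"{name:15s} {value:.3f}")
```

Because the importances are normalized to sum to one, they can be read as relative contribution rates, which is how the percentages for the five inputs are compared between features.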

3.4. Evaluation Method

Three indices were selected as flood characteristics in this study, including maximum annual flow (MAF), the 95th percentile maximum streamflow (Q95), and the 90th percentile maximum streamflow (Q90). In addition, simulation performances of the HPD, individual LSTM, and VIC-CaMa-Flood models were quantitatively evaluated using three statistical indicators, namely, Nash–Sutcliffe efficiency (NSE) [47], relative error (RE), and correlation coefficient ($R^{2}$) (Equations (7)–(9)). The NSE value measures the agreement between the measured and simulated values, and its maximum value is 1. These indicators are defined as follows:

(7) $NSE = 1 - \frac{\sum_{i=1}^{N} (Q_{obs,i} - Q_{sim,i})^{2}}{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs})^{2}}$

(8) $RE = \frac{\sum_{i=1}^{N} Q_{sim,i} - \sum_{i=1}^{N} Q_{obs,i}}{\sum_{i=1}^{N} Q_{obs,i}}$

(9) $R^{2} = \frac{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs}) (Q_{sim,i} - \bar{Q}_{sim})}{\sqrt{\sum_{i=1}^{N} (Q_{obs,i} - \bar{Q}_{obs})^{2} \sum_{i=1}^{N} (Q_{sim,i} - \bar{Q}_{sim})^{2}}}$

where $Q_{obs,i}$ and $Q_{sim,i}$ are the observed and simulated daily streamflow on the i-th day, respectively; $\bar{Q}_{obs}$ and $\bar{Q}_{sim}$ are the mean values of the observed and simulated daily streamflow, respectively; and N is the total number of days.
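The three indicators and the flood indices can be sketched in NumPy as follows. The metric functions follow Equations (7)–(9) as written. The function `annual_flood_indicators` rests on an assumption that is not spelled out in the text, namely that MAF, Q95, and Q90 are taken per calendar year as the maximum, 95th percentile, and 90th percentile of daily flow. All function names and the toy numbers are hypothetical.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency, Equation (7)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def relative_error(obs, sim):
    """Relative error of total volume, Equation (8), as a fraction."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    return (np.sum(sim) - np.sum(obs)) / np.sum(obs)

def correlation(obs, sim):
    """Correlation statistic exactly as written in Equation (9)."""
    obs, sim = np.asarray(obs, dtype=float), np.asarray(sim, dtype=float)
    d_obs, d_sim = obs - obs.mean(), sim - sim.mean()
    return np.sum(d_obs * d_sim) / np.sqrt(np.sum(d_obs ** 2) * np.sum(d_sim ** 2))

def annual_flood_indicators(daily_flow, years):
    """Per-year MAF, Q95, and Q90 (assumed definitions, see lead-in)."""
    daily_flow = np.asarray(daily_flow, dtype=float)
    years = np.asarray(years)
    indicators = {}
    for year in np.unique(years):
        flow = daily_flow[years == year]
        indicators[int(year)] = {
            "MAF": float(flow.max()),
            "Q95": float(np.percentile(flow, 95)),
            "Q90": float(np.percentile(flow, 90)),
        }
    return indicators

# Toy example: the simulation overestimates every value by 10%.
obs = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
sim = 1.1 * obs
print(round(nse(obs, sim), 3))             # 0.945
print(round(relative_error(obs, sim), 3))  # 0.1
print(round(correlation(obs, sim), 3))     # 1.0
```

The toy case shows why the indicators are reported together: a uniformly biased simulation keeps a correlation of 1 while RE and NSE both register the bias.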

4. Results and Discussion

4.1. Identification of Streamflow Break Points

Figure 3 summarizes the abrupt change points of the annual observed streamflow series at the five stations from 1965 to 2015, using the Mann–Kendall test [48]. From the figure, the CS and VT stations have an abrupt change point in 2008. This abrupt change point might correspond to the rapid increase in reservoir storage capacity in the LMRB in 2008, due to the construction of two large reservoirs, the Nuozhadu and the Jinghong reservoirs [42,49]. After 2008, the abrupt change point at the LP station is in 2013, the abrupt change point at the PK station is in 2014, and there is no abrupt change point at the MK station. Although there are several abrupt change points before 2008, they were likely caused by climate change. Existing studies [42,49,50] also indicate that abrupt changes occur in 2008 in the LMRB, and that the impact of human activities is minor before 2008. Therefore, during the period of 1965–2007, the LMRB is slightly affected by human activities, and mainly affected by climate change; the period from 2008 to 2015 is the human impact period.
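For readers unfamiliar with the UF and UB statistics shown in Figure 3, the sketch below implements the commonly used sequential form of the Mann–Kendall test. The text does not give the formulas used in this study, so this formulation is an assumption, and the function names are hypothetical. In this form, an intersection of the UF and UB curves inside the confidence band (for example ±1.96 at the 5% level) is usually read as a candidate abrupt change point.

```python
import numpy as np

def mk_forward(x):
    """Forward sequential Mann-Kendall statistic (UF) for a 1-D series."""
    x = np.asarray(x, dtype=float)
    n = x.size
    uf = np.zeros(n)  # UF is defined as 0 for the first value
    s = 0.0
    for k in range(1, n):
        # Add the number of earlier values that are smaller than x[k].
        s += np.sum(x[k] > x[:k])
        m = k + 1                                  # number of values used so far
        mean = m * (m - 1) / 4.0                   # expected value of s
        var = m * (m - 1) * (2 * m + 5) / 72.0     # variance of s
        uf[k] = (s - mean) / np.sqrt(var)
    return uf

def sequential_mk(x):
    """Return (UF, UB); UB is the negated UF of the reversed series, re-reversed."""
    x = np.asarray(x, dtype=float)
    uf = mk_forward(x)
    ub = -mk_forward(x[::-1])[::-1]
    return uf, ub

# Toy series with a downward shift in the second half (not real streamflow data).
series = np.array([10.0, 11.0, 9.5, 10.5, 10.2, 7.0, 6.5, 7.2, 6.8, 6.9])
uf, ub = sequential_mk(series)
print(np.round(uf, 2))
print(np.round(ub, 2))
```

With annual streamflow series, `series` would hold one value per year, so the index at which the two curves cross maps directly to a calendar year.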

4.2. The Performance of HPD Model

Figure 4 shows the daily time series of observed and simulated streamflow at the five selected stations in the testing and human impact periods, and Table 2 shows the streamflow simulation results of the HPD model in the less impacted period (including the training period (1966–1992), validation period (1993–1997), and testing period (1998–2007)) and in the human impact period (2008–2015). The NSE values range from 0.91 to 0.96, and the RE values range from −9.6% to 4.5% during the testing period. Previous studies argue that a model is considered satisfactory when the NSE value is greater than 0.50 and the RE value is less than 25.0% [51]. At the CS and VT stations, the NSE values are 0.63 and 0.62, and the RE values are 11.0% and 20.1%, respectively, in the human impact period. At the LP, MK, and PK stations, the NSE values are 0.88, 0.93, and 0.88, and the RE values are 0.1%, 1.9%, and 12.0%, respectively, in the human impact period. The simulation performances at the CS and VT stations in the human impact period are clearly worse than those during the less impacted period, which might be because these two stations already experienced an abrupt change in streamflow in 2008. Since the abrupt change points at the other three stations appear later, the simulation performance at those stations in the human impact period is still very good.

Table 3 and Table 4 show the observed and simulated flood indicators (including MAF, Q95, and Q90) in the testing period (1998–2007) and human impact period (2008–2015), respectively. In the testing period, the RE value of MAF ranges from −19.1% to −8.1%, the RE value of Q95 ranges from −13.9% to −1.1%, and the RE value of Q90 ranges from −8.6% to −1.0%, which indicates that the HPD model is capable of simulating floods in the less impacted period. In the human impact period, the RE value of MAF ranges from −8.5% to 7.8%, the RE value of Q95 ranges from −1.1% to 17.2%, and the RE value of Q90 ranges from 4.4% to 26.0%. The RE values in the human impact period are greater than those in the less impacted period. Figure 5 shows the scatter plot, regression line, and performance of the simulated and measured flood at all selected stations in the testing period and human impact period. All R² values are close to 1. For Q95, the NSE values are 0.934 and 0.822, and the RE values are −9.2% and 18.7%, in the testing period and human impact period, respectively. For Q90, the NSE values are 0.956 and 0.646, and the RE values are −7.0% and 26.5%, in the testing period and human impact period, respectively. The simulation performance of Q95 and Q90 during the testing period is therefore better than that during the human impact period. For MAF, the NSE values are 0.858 and 0.871, and the RE values are −13.9% and 3.4%, in the testing period and human impact period, respectively. Although the NSE values of MAF are close, the RE value during the testing period is smaller than that during the human impact period, which indicates that the simulation performance on MAF during the testing period is better than that during the human impact period. In addition, we find that the RE values of the three flood indicators simulated by the HPD model during the human impact period are greater than those during the testing period, which suggests that floods affected by human activities decrease after 2008.

The above results indicate that the HPD model satisfactorily simulates streamflow and flood under climate change, which provides an important technical means for the future study of changes of streamflow and flood under climate change. For the human impact period, streamflow and flood affected by human activities decrease. The HPD model’s performance deteriorates in the human impact period. This is mainly because this period is not used as training data, as other studies show that, given sufficient training data, LSTM learns management patterns and simulates streamflow from basins with certain reservoirs [52].

4.3. Comparing HPD Model with VIC-CaMa-Flood and Individual LSTM Model

Figure 6 shows the performance of the VIC-CaMa-Flood model, individual LSTM model, and HPD model in simulating streamflow and flood. The HPD model outperforms the other two models by a significant margin.

In the streamflow simulation of the VIC-CaMa-Flood model, NSE is above 0.5 at only one station, while NSE values at the remaining stations are all less than 0.5, and the RE values are all greater than 40.0%. In the flood simulation of the VIC-CaMa-Flood model, the RE values are almost all greater than 40.0%. Therefore, the VIC-CaMa-Flood model does not simulate streamflow and flood well. However, when the streamflow obtained from the VIC-CaMa-Flood simulation and its meteorological forcing data are input into the LSTM model (i.e., the HPD model), the accuracy of streamflow and of the three flood indicators improves significantly. Therefore, the data-driven model effectively improves the simulation performance of a physically-based model, which indicates that the data-driven model plays a role in correcting the simulation deviation of the physically-based model. This is consistent with the conclusion of Yang et al. [27].

In the streamflow simulation of the individual LSTM model, NSE values range from 0.6 to 0.8, and the RE values are all less than 20%. The results of the individual LSTM model in streamflow simulation are satisfactory. However, in the simulation of the three flood indicators, the RE values are barely greater than 20%, which indicates that the individual LSTM model does not simulate flood well. When we added the VIC-CaMa-Flood streamflow into the LSTM model, the performance greatly improved. Therefore, we believe that the output of the physical model can be used as an important input feature of the data-driven model.

The GBRT model implemented in scikit-learn [53] has become a widely-used feature importance ranking method. To further evaluate the relative importance of each contributing feature, we used the GBRT method to measure the five input features of the HPD model. From Figure 7, VIC-CaMa-Flood streamflow contributes about 30.0% to the observed streamflow, while the contribution rates of the other input features (precipitation, maximum temperature, minimum temperature, wind speed) are between 16.1% and 19.3% at the five stations. The VIC-CaMa-Flood streamflow has the largest contribution rate, indicating that this input feature is more important than the other four inputs. Therefore, a machine learning model, in this case LSTM, is significantly reinforced by the input of our mechanistic understanding, such as simulated streamflow, from a physically-based model.

The HPD model proposed in this study not only improves the simulation ability of the physically-based models, in terms of computational expense and simulation accuracy, but also enables machine learning to contain a certain degree of mechanistic understanding. Combining physical process models with machine learning makes them complementary to each other, and strikes a balance between model complexity and data availability. At the same time, the HPD model is also applicable to other river basins, and on a global scale [27]. Its strong simulation capability makes it possible for the HPD model to guide more effective flood simulation and prediction systems. Moreover, it may also become an effective tool for studying future flood scenarios under climate change.

Our HPD model has certain limitations. It is applicable to river basins, and on a global scale, but it is not applicable to small watersheds, or on a local scale. However, this study also provides some ideas for future streamflow simulation in small watersheds and on a local scale. Our HPD model used a large-scale hydrological model (i.e., the VIC-CaMa-Flood model), which can make full use of topographic data such as DEM, land use, and slope on a river basin scale to provide useful predictive information for machine learning. Therefore, it remains to be explored whether topographic data can be integrated into machine learning to improve streamflow simulation in small watersheds and on a local scale, by considering physical models applicable to those scales.

4.4. The Impact of Human Activities on the Flood

Figure 8 shows the average flow changes and relative changes of the observed flood and of the flood simulated by the HPD model at the five stations (CS, LP, VT, MK, PK) in the human impact period (2008–2015), compared to the testing period (1998–2007). Compared with the testing period, the three observed flood indicators (MAF, Q95, Q90) are significantly reduced in the human impact period, especially at the CS station. The observed flood decreases in MAF (−2570.53 to −1160.13 m³/s, −6% to −26%), in Q95 (−2719.33 to −1356.71 m³/s, −7% to −30%), and in Q90 (−3203.10 to −627.61 m³/s, −7% to −30%). The change of observed flood is large (>25%) at the CS station. The MAF, Q95, and Q90 at the CS station decrease by 26% (−2570.53 m³/s), 30% (−2085.21 m³/s), and 30% (−1799.33 m³/s), respectively. Both the Jinghong and Xiaowan reservoirs were under construction during 2005–2007, and the Nuozhadu reservoir was constructed and put into operation in 2008 [49]. With the construction of the reservoirs, the flood is affected by both climate change and reservoirs in the human impact period. The flood simulated by the HPD model shows consistent changes across the three flood indicators: it decreases at the CS station, while increasing at the LP, VT, MK, and PK stations. Since the HPD model simulates flood well in the LMRB under climate change, the flood simulated by HPD is considered to only be affected by climate change, rather than by human activities. The MAF, Q95, and Q90 at the CS station decrease by 16% (−1372.64 m³/s), 22% (−1493.82 m³/s), and 17% (−1066.47 m³/s) under climate change, respectively. The changes of the observed floods at all stations are smaller than those of the flood simulated by the HPD model. The construction of reservoirs significantly reduces flood in the LMRB, which is consistent with the analysis performed by Yun et al. [42].
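The comparison at the CS station can be made concrete with a few lines of arithmetic. The observed and simulated changes below are the values quoted in the paragraph above; subtracting the simulated (climate-only) change from the observed change gives a rough indication of the part of the reduction that climate alone does not explain. This residual is a back-of-envelope reading of the reported numbers, not a result stated in the study, and the variable names are hypothetical.

```python
# Changes at the CS station between the testing period (1998-2007) and the
# human impact period (2008-2015), in m^3/s, as quoted in the text.
observed_change = {"MAF": -2570.53, "Q95": -2085.21, "Q90": -1799.33}
simulated_change = {"MAF": -1372.64, "Q95": -1493.82, "Q90": -1066.47}  # climate only

# Residual change not reproduced by the climate-driven simulation.
residual_change = {
    name: round(observed_change[name] - simulated_change[name], 2)
    for name in observed_change
}

for name, value in residual_change.items():
    print(f"{name}: {value} m^3/s")
# MAF: -1197.89 m^3/s
# Q95: -591.39 m^3/s
# Q90: -732.86 m^3/s
```

Under this reading, the negative residuals at the CS station are the portion of the observed flood reduction that the text associates with reservoir construction and operation.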

5. Conclusions

We developed a neural network model leveraging outputs from a process-based model, i.e., VIC-CaMa-Flood, and meteorological forcing data to simulate streamflow and flood at five hydrological stations during 1966–2015 in the LMRB. The results show that our hybrid physics-data (HPD) methodology delivers advantageous accuracy in streamflow and flood simulation, outperforming both the pure process-based VIC-CaMa-Flood model and the pure observational data-driven LSTM model by a large margin. These results suggest the usefulness of introducing physical regularization in data-driven modeling, and the necessity of observation-informed bias correction for process-based models. We further applied a gradient boosting tree method to measure the information contribution from the process-based model simulation and the meteorological forcing data in our HPD methodology. The results show that the process-based model simulation contributes about 30% to the HPD outcome, outweighing the information contribution from each of the meteorological forcing variables (<20%). Our HPD methodology inherits the physical mechanism of the process-based model and the high predictive capability of the LSTM model, offering a novel way to make use of incomplete physical understanding and insufficient data to enhance streamflow and flood predictions. We draw the following conclusions based on the experimental results:

(1). For the streamflow simulation of the HPD model, the NSE values are greater than 0.90, and the RE values are less than 10% in the less impacted period. For flood simulation of the HPD model, the NSE values are greater than 0.86, and the RE values are less than 20% in the less impacted period. These simulation results show that the HPD model simulates streamflow and flood well under climate change, and the performance is better than that of a pure process-based model or pure data-driven model. The reasonable integration of a hydrological model and deep learning is expected to provide accurate streamflow simulation or flood simulation, suggesting that the physics-guided long short-term memory network model is promising for hydrological application at basin scale.
(2). In deep learning, a good simulation model is largely constrained by effective features. The VIC-CaMa-Flood streamflow contributes about 30.0% to the simulation of the HPD model, while the contribution rates of other input features are between 16.1% and 19.3%. Therefore, the streamflow simulated by the physical model is an important feature for deep learning. This feature is obtained through our current understanding of the hydrological cycle process, which improves the accuracy of deep learning simulation, and also makes the data-driven model physically meaningful.
(3). Under climate change, the flood at the CS station decreases by 16–22%, while the flood at the other four stations shows an increasing trend in the period 2008–2015. The observed floods at all stations are significantly reduced, and observed flood variation is less than that under climate change. The result implies that the construction of reservoirs may significantly reduce flood in the LMRB.

Author Contributions

Conceptualization, Q.T. and B.L.; methodology, Q.T. and B.L.; software, B.L.; validation, B.L.; formal analysis, B.L.; investigation, Q.T. and B.L.; resources, Q.T.; data curation, Q.T. and B.L.; writing—original draft preparation, B.L.; writing—review and editing, Q.T., B.L., G.Z., L.G., B.P. and C.S.; visualization, B.L.; supervision, Q.T. and C.S.; project administration, Q.T.; funding acquisition, Q.T. All authors have read and agreed to the published version of the manuscript.

Data Availability Statement

The daily streamflow observations are available from Mohammed et al. [36] (https://www.sciencedirect.com/science/article/pii/S2352340918314318), precipitation data from the APHRODITE research project (http://aphrodite.st.hirosaki-u.ac.jp/download/), temperature and wind speed data from the GMFD (https://rda.ucar.edu/datasets/ds314.0/), soil and land cover data from the HWSD (http://webarchive.iiasa.ac.at/Research/LUC/External-World-soil-database/), and DEM data from SRTM (https://srtm.csi.cgiar.org/). The VIC model source code is available from https://vic.readthedocs.io/en/master/, and the CaMa-Flood model is available at http://hydro.iis.u-tokyo.ac.jp/~yamadai/cama-flood/.

Acknowledgments

The authors would like to thank Dai Yamazaki for his support with the CaMa-Flood model.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Figures and Tables

Figure 1. The LMRB and locations of the mainstream hydrological stations.

Figure 2. The HPD model based on LSTM model and VIC-CaMa-Flood coupling model.

Figure 3. Abrupt change detection using Mann–Kendall test at five hydrological stations in 1965–2015: (a) CS station, (b) LP station, (c) VT station, (d) MK station, (e) PK station. UF, positive sequence statistics; UB, reversal sequence statistics.

Figure 4. Observed and simulated daily streamflow using HPD model at five hydrological stations from 1998 to 2015 in the LMRB.

Figure 5. The relationship between measured and simulated flood (i.e., MAF, Q95, and Q90) for all selected stations in testing period and human impact period. The Nash–Sutcliffe efficiency (NSE), relative error (RE), and correlation coefficient (R²) are also shown.

Figure 6. Comparison of simulations from VIC-CaMa-Flood model, individual LSTM model, and HPD model for testing period (1998–2007): (a) Distribution of NSE across all stations for daily mean streamflow; (b) distribution of RE across all stations for daily mean streamflow; (c) distribution of RE for three flood indicators.

Figure 7. Importance of HPD model inputs (pr, precipitation; tmax, maximum temperature; tmin, minimum temperature; wind, wind speed; VIC-CaMa-Flood streamflow, streamflow simulated by VIC-CaMa-Flood model) at five selected stations in the training period.

Figure 8. Average and relative change of the observed and simulated flood obtained by HPD model at five selected stations in human impact period (2008–2015) compared with testing period (1998–2007).

Table 1

The five hydrological gauging stations.

No.	Station (abbr.)	Longitude (°)	Latitude (°)	Country	Drainage Area (km²)	Data Record
1	Chiang Saen (CS)	100.117	20.292	Thailand	191,055	1965–2015
2	Luang Prabang (LP)	102.082	19.878	Laos, PDR	273,838	1965–2015
3	Vientiane (VT)	102.620	18.049	Laos, PDR	303,528	1965–2015
4	Mukdahan (MK)	104.743	16.529	Thailand	394,134	1965–2015
5	Pakse (PK)	105.795	15.115	Laos, PDR	550,955	1965–2015

Table 2

Streamflow performance of HPD model for training, validation, testing, and human impact period.

Less impacted period (1966–2007): training period (1966–1992), validation period (1993–1997), and testing period (1998–2007). Human impact period: 2008–2015.
Station	Training NSE	Training RE	Validation NSE	Validation RE	Testing NSE	Testing RE	Human Impact NSE	Human Impact RE
CS	0.93	0.006	0.90	0.052	0.91	0.045	0.63	0.110
LP	0.96	0.007	0.93	−0.006	0.92	0.018	0.88	0.001
VT	0.96	−0.008	0.94	−0.051	0.94	−0.021	0.62	0.201
MK	0.98	−2 × 10⁻⁶	0.96	−0.024	0.93	−0.096	0.93	0.019
PK	0.98	0.009	0.98	−0.003	0.96	−0.028	0.88	0.120
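
For readers who want to reproduce metrics like those in Table 2, the sketch below computes the two scores on a pair of series. The NSE follows the standard Nash–Sutcliffe definition [47]. The RE is written here as the relative bias of the simulated total, (sum of simulated − sum of observed) / sum of observed, which is an assumption about the exact form used; the function names and the short example series are hypothetical.

```python
def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over the variance of obs."""
    mean_obs = sum(obs) / len(obs)
    squared_error = sum((o - s) ** 2 for o, s in zip(obs, sim))
    obs_variance = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - squared_error / obs_variance


def relative_error(obs, sim):
    """Relative bias of the simulated total (assumed form of RE)."""
    return (sum(sim) - sum(obs)) / sum(obs)


# Hypothetical daily streamflow values in m³/s, for illustration only.
observed = [1200.0, 1500.0, 2100.0, 3400.0, 2800.0]
simulated = [1150.0, 1600.0, 2000.0, 3300.0, 2950.0]

print(f"NSE = {nse(observed, simulated):.3f}")
print(f"RE  = {relative_error(observed, simulated):+.3f}")
```

An NSE of 1 indicates a perfect match, and an NSE of 0 means the simulation is no better than the mean of the observations. A positive RE indicates overestimation.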

Table 3

Flood indicators performance of HPD model for testing period (1998–2007).

Station	MAF Obs (m³/s)	MAF Sim (m³/s)	MAF RE	Q95 Obs (m³/s)	Q95 Sim (m³/s)	Q95 RE	Q90 Obs (m³/s)	Q90 Sim (m³/s)	Q90 RE
CS	9855	8803	−0.107	6979	6905	−0.011	5994	5935	−0.010
LP	15,357	13,174	−0.142	11,288	9723	−0.139	9236	8765	−0.051
VT	16,435	13,771	−0.162	12,605	11,474	−0.090	10,759	10,124	−0.059
MK	30,775	24,897	−0.191	25,026	21,977	−0.122	21,923	20,203	−0.078
PK	36,164	33,249	−0.081	30,441	28,119	−0.076	27,905	25,519	−0.086

Table 4

Flood indicators performance of HPD model for human impact period (2008–2015).

Station	MAF Obs (m³/s)	MAF Sim (m³/s)	MAF RE	Q95 Obs (m³/s)	Q95 Sim (m³/s)	Q95 RE	Q90 Obs (m³/s)	Q90 Sim (m³/s)	Q90 RE
CS	7284	7430	0.020	4894	5411	0.156	4195	4929	0.175
LP	13,890	12,174	−0.085	9932	9826	−0.011	8609	8987	0.044
VT	15,274	14,889	−0.025	10,742	12,593	0.172	9279	11,695	0.260
MK	27,079	26,791	−0.011	22,768	23,965	0.053	19,910	21,214	0.065
PK	34,101	36,752	0.078	27,722	31,296	0.129	24,147	27,228	0.128
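
As a worked check, the relative change in the simulated flood indicators between the two periods can be computed directly from the Sim columns of Tables 3 and 4. The sketch below does this for the CS station. It assumes that the HPD simulation represents the climate-driven signal and that the change is defined as (later − earlier) / earlier; the variable and function names are hypothetical.

```python
# HPD-simulated flood indicators at the CS station (m³/s), from Tables 3 and 4.
sim_testing = {"MAF": 8803.0, "Q95": 6905.0, "Q90": 5935.0}       # 1998–2007
sim_human_impact = {"MAF": 7430.0, "Q95": 5411.0, "Q90": 4929.0}  # 2008–2015


def relative_change(before, after):
    """Change of `after` relative to `before`, as a fraction."""
    return (after - before) / before


changes = {
    name: relative_change(sim_testing[name], sim_human_impact[name])
    for name in sim_testing
}

for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
# MAF: -15.6%
# Q95: -21.6%
# Q90: -17.0%
```

Under these assumptions, the three values fall in the range of roughly 16–22%, which appears consistent with the decrease reported for the CS station.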

References

1. UNDRR. Global Assessment Report on Disaster Risk Reduction; United Nations Office for Disaster Risk Reduction (UNDRR): Geneva, Switzerland, 2019.

2. Young, C.C.; Liu, W.C.; Wu, M.C. A physically based and machine learning hybrid approach for accurate rainfall-runoff modeling during extreme typhoon events. Appl. Soft Comput.; 2017; 53, pp. 205-216. [DOI: https://dx.doi.org/10.1016/j.asoc.2016.12.052]

3. dos Santos, F.M.; de Oliveira, R.P.; Mauad, F.F. Lumped versus Distributed Hydrological Modeling of the Jacare-Guacu Basin, Brazil. J. Environ. Eng.; 2018; 144, 04018056. [DOI: https://dx.doi.org/10.1061/(ASCE)EE.1943-7870.0001397]

4. Kadkhodazadeh, M.; Valikhan Anaraki, M.; Morshed-Bozorgdel, A.; Farzin, S. A New Methodology for Reference Evapotranspiration Prediction and Uncertainty Analysis under Climate Change Conditions Based on Machine Learning, Multi Criteria Decision Making and Monte Carlo Methods. Sustainability; 2022; 14, 2601. [DOI: https://dx.doi.org/10.3390/su14052601]

5. Kadkhodazadeh, M.; Farzin, S. A Novel LSSVM Model Integrated with GBO Algorithm to Assessment of Water Quality Parameters. Water Resour. Manag.; 2021; 35, pp. 3939-3968. [DOI: https://dx.doi.org/10.1007/s11269-021-02913-4]

6. Pan, B.; Hsu, K.; AghaKouchak, A.; Sorooshian, S. Improving Precipitation Estimation Using Convolutional Neural Network. Water Resour. Res.; 2019; 55, pp. 2301-2321. [DOI: https://dx.doi.org/10.1029/2018WR024090]

7. Li, W.; Pan, B.; Xia, J.; Duan, Q. Convolutional neural network-based statistical post-processing of ensemble precipitation forecasts. J. Hydrol.; 2021; 605, 127301. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2021.127301]

8. Adnan, R.M.; Liang, Z.; Trajkovic, S.; Zounemat-Kermani, M.; Li, B.; Kisi, O. Daily streamflow prediction using optimally pruned extreme learning machine. J. Hydrol.; 2019; 577, 123981. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2019.123981]

9. Sahoo, S.; Russo, T.A.; Elliott, J.; Foster, I. Machine learning algorithms for modeling groundwater level changes in agricultural regions of the US. Water Resour. Res.; 2017; 53, pp. 3878-3895. [DOI: https://dx.doi.org/10.1002/2016WR019933]

10. Liu, W.; Yang, T.; Sun, F.; Wang, H.; Feng, Y.; Du, M. Observation-Constrained Projection of Global Flood Magnitudes with Anthropogenic Warming. Water Resour. Res.; 2021; 57, e2020WR028830. [DOI: https://dx.doi.org/10.1029/2020WR028830]

11. Kratzert, F.; Klotz, D.; Brenner, C.; Schulz, K.; Herrnegger, M. Rainfall-runoff modelling using Long Short-Term Memory (LSTM) networks. Hydrol. Earth Syst. Sci.; 2018; 22, pp. 6005-6022. [DOI: https://dx.doi.org/10.5194/hess-22-6005-2018]

12. Kratzert, F.; Klotz, D.; Shalev, G.; Klambauer, G.; Hochreiter, S.; Nearing, G. Benchmarking a Catchment-Aware Long Short-Term Memory Network (LSTM) for Large-Scale Hydrological Modeling. Hydrol. Earth Syst. Sci. Discuss.; 2019; pp. 1-32. [DOI: https://dx.doi.org/10.5194/hess-2019-368]

13. Reichstein, M.; Camps-Valls, G.; Stevens, B.; Jung, M.; Denzler, J.; Carvalhais, N.; Prabhat. Deep learning and process understanding for data-driven Earth system science. Nature; 2019; 566, pp. 195-204. [DOI: https://dx.doi.org/10.1038/s41586-019-0912-1] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30760912]

14. Shen, C.P.; Laloy, E.; Elshorbagy, A.; Albert, A.; Bales, J.; Chang, F.; Ganguly, S.; Hsu, K.; Kifer, D.; Fang, Z. et al. HESS Opinions: Incubating deep-learning-powered hydrologic science advances as a community. Hydrol. Earth Syst. Sci.; 2018; 22, pp. 5639-5656. [DOI: https://dx.doi.org/10.5194/hess-22-5639-2018]

15. Feng, D.; Fang, K.; Shen, C. Enhancing Streamflow Forecast and Extracting Insights Using Long-Short Term Memory Networks with Data Integration at Continental Scales. Water Resour. Res.; 2020; 56, e2019WR026793. [DOI: https://dx.doi.org/10.1029/2019WR026793]

16. Xiang, Z.R.; Yan, J.; Demir, I. A Rainfall-Runoff Model with LSTM-Based Sequence-to-Sequence Learning. Water Resour. Res.; 2020; 56, e2019WR025326. [DOI: https://dx.doi.org/10.1029/2019WR025326]

17. Feng, D.; Lawson, K.; Shen, C. Mitigating Prediction Error of Deep Learning Streamflow Models in Large Data-Sparse Regions with Ensemble Modeling and Soft Data. Geophys. Res. Lett.; 2021; 48, e2021GL092999. [DOI: https://dx.doi.org/10.1029/2021GL092999]

18. Ma, K.; Feng, D.; Lawson, K.; Tsai, W.-P.; Liang, C.; Huang, X.; Sharma, A.; Shen, C. Transferring Hydrologic Data Across Continents—Leveraging Data-Rich Regions to Improve Hydrologic Prediction in Data-Sparse Regions. Water Resour. Res.; 2021; 57, e2020WR028600. [DOI: https://dx.doi.org/10.1029/2020WR028600]

19. Shamseldin, A.Y.; O’Connor, K.M. A non-linear neural network technique for updating of river flow forecasts. Hydrol. Earth Syst. Sci.; 2001; 5, pp. 577-597. [DOI: https://dx.doi.org/10.5194/hess-5-577-2001]

20. Anctil, F.; Perrin, C.; Andreassian, V. ANN output updating of lumped conceptual rainfall/runoff forecasting models. J. Am. Water Resour. Assoc.; 2003; 39, pp. 1269-1279. [DOI: https://dx.doi.org/10.1111/j.1752-1688.2003.tb03708.x]

21. Karpatne, A.; Watkins, W.; Read, J.; Kumar, V. Physics-guided Neural Networks (PGNN): An Application in Lake Temperature Modeling. Proceedings of the ACM SIGKDD 2018 International Conference; London, UK, 19–23 August 2018.

22. Karpatne, A.; Atluri, G.; Faghmous, J.H.; Steinbach, M.; Banerjee, A.; Ganguly, A.; Shekhar, S.; Samatova, N.; Kumar, V. Theory-Guided Data Science: A New Paradigm for Scientific Discovery from Data. IEEE Trans. Knowl. Data Eng.; 2017; 29, pp. 2318-2331. [DOI: https://dx.doi.org/10.1109/TKDE.2017.2720168]

23. Read, J.S.; Jia, X.; Willard, J.; Appling, A.P.; Zwart, J.A.; Oliver, S.K.; Karpatne, A.; Hansen, G.J.A.; Hanson, P.C.; Watkins, W. et al. Process-Guided Deep Learning Predictions of Lake Water Temperature. Water Resour. Res.; 2019; 55, pp. 9173-9190. [DOI: https://dx.doi.org/10.1029/2019WR024922]

24. Daw, A.; Thomas, R.Q.; Carey, C.C.; Read, J.S.; Appling, A.P.; Karpatne, A. Physics-Guided Architecture (PGA) of Neural Networks for Quantifying Uncertainty in Lake Temperature Modeling. Proceedings of the SIAM International Conference on Data Mining (SDM); Cincinnati, OH, USA, 7–9 May 2020; pp. 532-540.

25. Panda, R.K.; Pramanik, N.; Bala, B. Simulation of river stage using artificial neural network and MIKE 11 hydrodynamic model. Comput. Geosci.; 2010; 36, pp. 735-745. [DOI: https://dx.doi.org/10.1016/j.cageo.2009.07.012]

26. Napolitano, G.; See, L.; Calvo, B.; Savi, F.; Heppenstall, A. A conceptual and neural network model for real-time flood forecasting of the Tiber River in Rome. Phys. Chem. Earth; 2010; 35, pp. 187-194. [DOI: https://dx.doi.org/10.1016/j.pce.2009.12.004]

27. Yang, T.; Sun, F.B.; Gentine, P.; Liu, W.B.; Wang, H.; Yin, J.B.; Du, M.Y.; Liu, C.M. Evaluation and machine learning improvement of global hydrological model-based flood simulations. Environ. Res. Lett.; 2019; 14, 114027. [DOI: https://dx.doi.org/10.1088/1748-9326/ab4d5e]

28. Razavi, S. Deep learning, explained: Fundamentals, explainability, and bridgeability to process-based modelling. Environ. Model. Softw.; 2021; 144, 105159. [DOI: https://dx.doi.org/10.1016/j.envsoft.2021.105159]

29. Dee, D.P.; Uppala, S.M.; Simmons, A.J.; Berrisford, P.; Poli, P.; Kobayashi, S.; Andrae, U.; Balmaseda, M.A.; Balsamo, G.; Bauer, P. et al. The ERA-Interim reanalysis: Configuration and performance of the data assimilation system. Q. J. R. Meteorol. Soc.; 2011; 137, pp. 553-597. [DOI: https://dx.doi.org/10.1002/qj.828]

30. O’Connor, J.E.; Costa, J.E. The World’s Largest Floods, Past and Present: Their Causes and Magnitudes; U.S. Geological Survey Circular: Reston, VA, USA, 2004; [DOI: https://dx.doi.org/10.3133/cir1254]

31. Hirabayashi, Y.; Mahendran, R.; Koirala, S.; Konoshima, L.; Yamazaki, D.; Watanabe, S.; Kim, H.; Kanae, S. Global flood risk under climate change. Nat. Clim. Change; 2013; 3, pp. 816-821. [DOI: https://dx.doi.org/10.1038/nclimate1911]

32. Hoang, L.P.; Lauri, H.; Kummu, M.; Koponen, J.; van Vliet, M.T.H.; Supit, I.; Leemans, R.; Kabat, P.; Ludwig, F. Mekong River flow and hydrological extremes under climate change. Hydrol. Earth Syst. Sci.; 2016; 20, pp. 3027-3041. [DOI: https://dx.doi.org/10.5194/hess-20-3027-2016]

33. Rasanen, T.A.; Kummu, M. Spatiotemporal influences of ENSO on precipitation and flood pulse in the Mekong River Basin. J. Hydrol.; 2013; 476, pp. 154-168. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2012.10.028]

34. Yatagai, A.; Arakawa, O.; Kamiguchi, K.; Kawamoto, H.; Nodzu, M.I.; Hamada, A. A 44-Year Daily Gridded Precipitation Dataset for Asia Based on a Dense Network of Rain Gauges. Sola; 2009; 5, pp. 137-140. [DOI: https://dx.doi.org/10.2151/sola.2009-035]

35. Sheffield, J.; Goteti, G.; Wood, E.F. Development of a 50-year high-resolution global dataset of meteorological forcings for land surface modeling. J. Clim.; 2006; 19, pp. 3088-3111. [DOI: https://dx.doi.org/10.1175/JCLI3790.1]

36. Mohammed, I.N.; Bolten, J.D.; Srinivasan, R.; Meechaiya, C.; Spruce, J.P.; Lakshmi, V. Ground and satellite based observation datasets for the Lower Mekong River Basin. Data Brief; 2018; 21, pp. 2020-2027. [DOI: https://dx.doi.org/10.1016/j.dib.2018.11.038] [PubMed: https://www.ncbi.nlm.nih.gov/pubmed/30510987]

37. Fischer, G.; Nachtergaele, F.; Prieler, S.; Velthuizen, H.T.; Verelst, L.; Wiberg, D. Global Agro-Ecological Zones Assessment for Agriculture (GAEZ 2008); IIASA: Laxenburg, Austria, FAO: Rome, Italy, 2008.

38. Hansen, M.C.; Defries, R.S.; Townshend, J.R.G.; Sohlberg, R. Global land cover classification at 1km resolution using a decision tree classifier. Int. J. Remote Sens.; 2000; 21, pp. 1331-1364. [DOI: https://dx.doi.org/10.1080/014311600210209]

39. Liang, X.; Lettenmaier, D.P.; Wood, E.F.; Burges, S.J. A Simple Hydrologically Based Model of Land-Surface Water and Energy Fluxes for General-Circulation Models. J. Geophys. Res.-Atmos.; 1994; 99, pp. 14415-14428. [DOI: https://dx.doi.org/10.1029/94JD00483]

40. Chang, C.H.; Lee, H.; Hossain, F.; Basnayake, S.; Jayasinghe, S.; Chishtie, F.; Saah, D.; Yu, H.; Sothea, K.; Du Bui, D. A model-aided satellite-altimetry-based flood forecasting system for the Mekong River. Environ. Model. Softw.; 2019; 112, pp. 112-127. [DOI: https://dx.doi.org/10.1016/j.envsoft.2018.11.017]

41. Dang, T.D.; Chowdhury, A.K.; Galelli, S. On the representation of water reservoir storage and operations in large-scale hydrological models: Implications on model parameterization and climate change impact assessments. Hydrol. Earth Syst. Sci.; 2020; 24, pp. 397-416. [DOI: https://dx.doi.org/10.5194/hess-24-397-2020]

42. Yun, X.; Tang, Q.; Wang, J.; Liu, X.; Zhang, Y.; Lu, H.; Wang, Y.; Zhang, L.; Chen, D. Impacts of climate change and reservoir operation on streamflow and flood characteristics in the Lancang-Mekong River Basin. J. Hydrol.; 2020; 590, 125472. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2020.125472]

43. Yamazaki, D.; Kanae, S.; Kim, H.; Oki, T. A physically based description of floodplain inundation dynamics in a global river routing model. Water Resour. Res.; 2011; 47, W04501. [DOI: https://dx.doi.org/10.1029/2010WR009726]

44. Yamazaki, D.; de Almeida, G.A.M.; Bates, P.D. Improving computational efficiency in global river models by implementing the local inertial flow equation and a vector-based river network map. Water Resour. Res.; 2013; 49, pp. 7221-7235. [DOI: https://dx.doi.org/10.1002/wrcr.20552]

45. Wei, Z.W.; He, X.G.; Zhang, Y.G.; Pan, M.; Sheffield, J.; Peng, L.Q.; Yamazaki, D.; Moiz, A.; Liu, Y.P.; Ikeuchi, K. Identification of uncertainty sources in quasi-global discharge and inundation simulations using satellite-based precipitation products. J. Hydrol.; 2020; 589, 125180. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2020.125180]

46. Mateo, C.M.; Hanasaki, N.; Komori, D.; Tanaka, K.; Kiguchi, M.; Champathong, A.; Sukhapunnaphan, T.; Yamazaki, D.; Oki, T. Assessing the impacts of reservoir operation to floodplain inundation by combining hydrological, reservoir management, and hydrodynamic models. Water Resour. Res.; 2014; 50, pp. 7245-7266. [DOI: https://dx.doi.org/10.1002/2013WR014845]

47. Nash, J.E.; Sutcliffe, J.V. River flow forecasting through conceptual models part I—A discussion of principles. J. Hydrol.; 1970; 10, pp. 280-290. [DOI: https://dx.doi.org/10.1016/0022-1694(70)90255-6]

48. Hamed, K.H.; Rao, A.R. A modified Mann-Kendall trend test for autocorrelated data. J. Hydrol.; 1998; 204, pp. 182-196. [DOI: https://dx.doi.org/10.1016/S0022-1694(97)00125-X]

49. Han, Z.; Long, D.; Fang, Y.; Hou, A.; Hong, Y. Impacts of climate change and human activities on the flow regime of the dammed Lancang River in Southwest China. J. Hydrol.; 2019; 570, pp. 96-105. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2018.12.048]

50. Li, D.; Long, D.; Zhao, J.; Lu, H.; Hong, Y. Observed changes in flow regimes in the Mekong River basin. J. Hydrol.; 2017; 551, pp. 217-232. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2017.05.061]

51. Moriasi, D.N.; Arnold, J.G.; Van Liew, M.W.; Bingner, R.L.; Harmel, R.D.; Veith, T.L. Model evaluation guidelines for systematic quantification of accuracy in watershed simulations. Trans. ASABE; 2007; 50, pp. 885-900. [DOI: https://dx.doi.org/10.13031/2013.23153]

52. Ouyang, W.; Lawson, K.; Feng, D.; Ye, L.; Zhang, C.; Shen, C. Continental-scale streamflow modeling of basins with reservoirs: Towards a coherent deep-learning-based strategy. J. Hydrol.; 2021; 599, 126455. [DOI: https://dx.doi.org/10.1016/j.jhydrol.2021.126455]

53. Pedregosa, F.; Varoquaux, G.; Gramfort, A.; Michel, V.; Thirion, B.; Grisel, O.; Blondel, M.; Prettenhofer, P.; Weiss, R.; Dubourg, V. et al. Scikit-learn: Machine Learning in Python. J. Mach. Learn. Res.; 2011; 12, pp. 2825-2830.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

A warming climate will intensify the water cycle, exacerbating water resources crises and flooding risks in the Lancang–Mekong River Basin (LMRB). Mitigating these risks requires accurate streamflow and flood simulations. Process-based and data-driven hydrological models are the two major approaches for streamflow simulation, while a hybrid of the two promises improved prediction accuracy. In this study, we developed a hybrid physics-data (HPD) methodology for streamflow and flood prediction under the physics-guided neural network modeling framework. The HPD methodology leverages simulation information from a process-based model (i.e., VIC-CaMa-Flood) along with meteorological forcing information (precipitation, maximum temperature, minimum temperature, and wind speed) to simulate daily streamflow series and flood events using a long short-term memory (LSTM) neural network. The HPD methodology outperformed both the pure process-based VIC-CaMa-Flood model and the pure observation-driven LSTM model by a large margin, suggesting the usefulness of introducing physical regularization in data-driven modeling and the necessity of observation-informed bias correction for process-based models. We further developed a gradient boosting tree method to measure the information contribution of the process-based model simulation and the meteorological forcing data in the HPD methodology. The results show that the process-based model simulation contributes about 30% to the HPD outcome, outweighing the contribution of each meteorological forcing variable (<20%). The HPD methodology inherits the physical mechanisms of the process-based model and the high predictive capability of the LSTM model, offering a novel way to make use of incomplete physical understanding and insufficient data to enhance streamflow and flood predictions.

Details

Title

Physics-Guided Long Short-Term Memory Network for Streamflow and Flood Simulations in the Lancang–Mekong River Basin

Author

Liu, Binxiao¹; Tang, Qiuhong¹; Zhao, Gang²; Gao, Liang³; Shen, Chaopeng⁴; Pan, Baoxiang⁵

¹ Key Laboratory of Water Cycle and Related Land Surface Processes, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences, Beijing 100101, China; [email protected]; University of Chinese Academy of Sciences, Beijing 100049, China
² Department of Global Ecology, Carnegie Institution for Science, Stanford, CA 94305, USA; [email protected]
³ State Key Laboratory of Internet of Things for Smart City and Department of Civil and Environmental Engineering, University of Macau, Macao SAR 999078, China; [email protected]
⁴ Civil and Environmental Engineering, Pennsylvania State University, State College, PA 16801, USA; [email protected]
⁵ Lawrence Livermore National Lab, Atmospheric, Earth and Energy Division, Livermore, CA 94550, USA; [email protected]

First page

1429

Publication year

2022

Publication date

2022

Publisher

MDPI AG

e-ISSN

20734441

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.3390/w14091429

ProQuest document ID

2663085939
