In this study, a novel spatiotemporal hydrodynamic prediction task framework, named single frame prediction, was developed. The framework could generate results based on boundary conditions and a single flood map from the last time step, relying on hydrodynamic principles rather than historical trends, and does not require the assistance of traditional hydrodynamic models. Moreover, a post-processing method based on physical laws was developed to refine the outputs of deep learning models at each time step, aiming to reduce accumulated errors in long-term predictions. The performance of a widely used convolutional neural network-based model, U-Net, was evaluated to assess the feasibility of single frame prediction and the impact of the proposed post-processing method. The experiments showed that single frame prediction could produce accurate flood maps, demonstrating the feasibility of the novel framework. Furthermore, the results indicated that the physics-based post-processing method could mitigate errors at each step, thereby enhancing prediction accuracy across the entire flood event and showing strong effectiveness and applicability in flood prediction. Additionally, an ablation experiment was conducted to assess the effectiveness of each step in the method. The single frame prediction provided a more comprehensive and interpretable depiction of flood prediction processes with essential hydrodynamic variables, including water depth and unit discharge on all grid cells. The post-processing method significantly reduced the accumulated error in the later stages of single frame prediction to an acceptable range, with an average root-mean-square error of 0.041 m for water depth and 0.003 m²/s for unit discharge, suggesting a new technique for long-term flood predictions.
Introduction
In recent years, flood disasters have become one of the most perilous natural hazards, inflicting substantial economic and human losses (Jonkman & Vrijling, 2008). Notable examples include the severe flooding triggered by extreme rainfall across Western Europe from 12 to 15 July 2021 (Tradowsky et al., 2023) and the catastrophic flooding resulting from the St. Francis Dam break (Begnudelli & Sanders, 2007), both of which caused extensive damage. Bentivoglio et al. (2022) distinguished flooding into five categories: river floods, flash floods, coastal floods, urban floods, and dam break and dike breach floods. The first four categories are predominantly driven by meteorological conditions such as heavy precipitation, strong winds, and snowmelt. In contrast, dam break and dike breach floods are primarily caused by the failure of flood protection structures and are of particular concern because they are difficult to predict and can cause severe damage (Rifai et al., 2020). Therefore, the ability to predict the extent of dam break and dike breach floods in a short time frame is of utmost importance. In most flood scenarios, flood flows can travel tens of kilometers from the location of the breach to the outlet, with outcomes influenced not only by inflow boundary conditions but also by the hydrodynamic processes between adjacent grid cells. This necessitates a more robust analysis of spatial features, as the complex hydrodynamic processes involved challenge the rapid emulation of dam break and dike breach floods over long time periods and large spatial regions.
Hydrodynamic models continue to be the primary tool for spatiotemporal prediction of the entire flood process resulting from dam break and dike breach. These models calculate the distribution of hydrodynamic variables across the simulated area for the next time step from the flood map at the current time step by computing the convective flux between adjacent grid cells based on physical laws such as the 2D shallow water equations (Toro, 2001). Common models for spatiotemporal flood emulation include MIKE 21, Delft3D, and HEC-RAS. Nonetheless, accurately modeling hydrodynamic processes requires substantial computational resources and time, posing a significant challenge in the context of rapid response to unpredictable natural disasters (Karim et al., 2023).
With the advancement of artificial intelligence, deep learning models have been introduced into spatiotemporal prediction for dam break and dike breach floods, significantly enhancing prediction speed. Most current studies predict results at the next time step based on the history of flood maps. Some studies (Ding et al., 2020; Guo et al., 2025; Luo et al., 2024; Wang et al., 2024) segmented flooded images into multiple independent observation stations, with time series of hydrodynamic variables serving as inputs and outputs. This approach simplifies data analysis by transforming the spatiotemporal prediction task into a one-dimensional prediction task at specific observation stations such as outlet stations, albeit at the cost of neglecting the spatial relationships within the flooded map. As a result, several models are required to predict water level at each station, respectively, and an additional interpolation process is necessary to construct the water level distribution across the entire flooded area to illustrate the flood propagation process.
Alternatively, other studies have focused on generating the entire flooded map. Löwe et al. (2021) and Cache et al. (2024) used convolutional neural networks (CNNs) to predict the maximum flood water depth at each grid cell for a flood process. However, these studies could not predict flood arrival times, which poses challenges for flood early warning. Furthermore, RNN-based models (Besseling et al., 2024; Kazadi et al., 2024; Wei et al., 2024) and CNN-based models (Bentivoglio et al., 2023) have successfully generated flooded maps for the next time step from several preceding frames. By capturing historical trends of hydrodynamic variables from previous frames, they produced accurate flood maps. However, the inputs of these studies ignore boundary conditions or spatial terrain features, so the models capture the temporal variation trend rather than the hydrodynamic propagation process. This task pattern therefore has limitations. It requires multiple frames of flooded maps from the hydrodynamic model to start predictions, which means that users must have access to a hydrodynamic model, and interacting with the hydrodynamic model increases the programming complexity for practical use. Furthermore, the framework using historical trends demonstrated poor generalization under changing boundary conditions. As Figure 1 illustrates, given the same flood map input at time step t, the framework predicts a large flooded area if the historical trend is rising (red block) but a small area if the trend is falling (green block), which deviates from physics. According to physics, the propagation process is influenced only by the starting state from the last time step and the current boundary condition, and the historical trend should not have a direct effect.
[IMAGE OMITTED. SEE PDF]
In response, Fathi et al. (2025) introduced a novel prediction task framework wherein deep learning models predict the flooded areas for the next time step using upstream discharge and a single previous frame, thereby aligning with the operational methodology of hydrodynamic models. This study suggested an innovative and physics-based prediction task framework for flood prediction. Nonetheless, the work of Fathi et al. (2025) only employed water depth on each grid cell, terrain elevations, and upstream discharge as inputs, while unit discharge on each grid cell, another significant hydrodynamic variable in 2D shallow water equations, was not considered in input data, leading to an unexplainable prediction process. The previous experiment utilized a simple-structure deep learning model to assess prediction performance on a single terrain features map, whereas the applicability of the model and the prediction framework to complex and varying inputs remained unproven.
Furthermore, while deep learning models have shown high accuracy in one-step prediction tasks, intricate spatiotemporal correlations often lead to inaccurate long-term predictions in dynamic systems modeling (Wu et al., 2024). This is attributed to the accumulation of errors inherent in autoregressive tasks performed by deep learning models. To address this issue, Bentivoglio et al. (2023) utilized a loss function applied across multi-step results to mitigate the accumulated error. Nevertheless, this approach demanded substantial storage and computational resources to hold the back-propagation gradients over multi-step training. Such requirements proved to be impractical under typical hardware constraints, thereby constraining the broad adoption of this method.
Because outputs from hydrodynamic and hydrological models are susceptible to errors and systematic biases, post-processing methods are essential to achieve unbiased predictions (Madadgar et al., 2014). Fraehr et al. (2023) applied a threshold to remove minor deviations in water depth prediction on dry grid cells. Additionally, some research has utilized observed data or integrated deep learning models as postprocessors to boost prediction accuracy (Sattari et al., 2024). These various post-processing strategies have enhanced prediction accuracy, representing an innovative avenue for mitigating errors in deep learning long-term forecasts.
This paper proposed a novel spatiotemporal hydrodynamic prediction task framework designed for long-term predictions. The interpretable framework could generate flood maps based on boundary conditions and a single frame from the last time step independently of historical trends, and does not rely on the assistance of hydrodynamic models. Additionally, to enhance accuracy, a physics-based post-processing method was developed to correct the generated results at each time step, thereby effectively reducing the accumulated errors over the entire flood process. Furthermore, the performance of a CNN-based deep learning model was evaluated to assess the feasibility of single frame prediction and the impact of the post-processing method for long-term flood predictions.
The rest of the paper is organized as follows. Section 2 provides a detailed description of the methods employed, including the novel prediction task framework, the data set, the deep learning model architecture, and the post-processing method. Section 3 presents the experimental results, assesses the impact of the employed methods, and demonstrates the effect of the proposed method on a specific real-world case study. Section 4 summarizes the contributions of this study and proposes potential research directions.
Data and Methods
Prediction Task Framework
In this study, a novel spatiotemporal hydrodynamic prediction task framework was proposed, aligning with the operational methodology of hydrodynamic models.
Most previous studies have focused on capturing temporal features from several flood map frames in the flood history, a framework this study refers to as historical trend prediction. It generates results at step t + 1 from the frames at steps t − p + 1 to t, where p represents the length of the historical observations. However, this task framework has limitations, as mentioned in the introduction.
To address these issues, a novel spatiotemporal hydrodynamic prediction task framework for flood prediction, termed single frame prediction, was developed. This task framework predicts flood maps for the next time step based on boundary conditions and a single frame from the last time step. Following the work of Fathi et al. (2025), who applied this framework solely to an autoregressive water depth prediction task, this study aimed to extend the framework with additional hydrodynamic variables, including water depth (h) and unit discharges in the X- and Y-directions (qx and qy), to generate more comprehensive and interpretable flood maps.
Data Set
Due to the scarcity of three-dimensional (X-axis, Y-axis, and time axis) spatiotemporal hydrodynamic simulation data sets, most studies used hydrodynamic models to generate data for deep learning. Thus, this study created a new data set for single frame prediction that includes diverse terrain features and inflow boundary conditions. Although there were limitations on the quality and realism of the synthetic data, this work, like most previous studies (including the foundational work by Bentivoglio et al. (2023) upon which this work builds), focused on creating surrogate models for traditional hydrodynamic models to accelerate prediction. Therefore, the aim of this study was the development and evaluation of the proposed methods for the surrogate models, rather than an analysis of the differences between synthetic and authentic data.
The synthetic data set utilized in this study comprised 40 dike breach flood events with distinct terrain features and boundary conditions. Following the methodology of Bentivoglio et al. (2023), who used a Perlin noise generator (Perlin, 2002) to generate random digital elevation models for unbiased terrain features, this study generated distinct 64 × 64 grid terrain features for each event using the same noise generator and a 100 m spatial resolution but with different random seeds. Different random seeds ensured the diversity of the generated 2D noise maps. These maps were subsequently used directly as terrain elevations without any modification. In contrast to the fixed inlet location in previous studies, boundary conditions in this data set were deliberately varied to assess the generalization ability across complex and varying scenarios. For each dike breach flood event, the inlet flow location was randomly generated along the perimeter, and the constant inflow unit discharge was assigned a random value ranging from 0.2 to 0.8 m²/s. Additionally, to simulate solid walls surrounding the area, akin to flood case 2 presented in Jiang et al. (2021), highlands elevated by 25 m were set as two ghost cells around the perimeter. Subsequently, the flood processes were simulated for a duration of 60 hr using MIKE 21. The data set was divided into three subsets for training (events 1–32), validation (events 33–36), and test (events 37–40). To increase the diversity of the flood data set, an enhanced data augmentation method including rotation (90°, 180°, and 270°) and flipping for both scalar and vector data was applied to expand the data set by eight times. Additional samples were created by concurrently applying rotation and flipping to both the input and the label of each original sample, treating them as images to maintain their correspondence.
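Rotating or flipping a flood map changes the direction of the unit discharge vectors as well as the position of each cell, so vector data need an extra component transformation that scalar data do not. The following is a minimal sketch of that idea, not the implementation used in this study. It assumes that grids are stored as nested lists indexed as [row][column], that qx points along increasing column index and qy along increasing row index, and that the eight variants are the four rotations with and without a left-right flip. All function names are hypothetical.

```python
def rot90_scalar(grid):
    """Rotate a 2D list by 90 degrees clockwise (row 0 becomes the last column)."""
    n_rows, n_cols = len(grid), len(grid[0])
    return [[grid[n_rows - 1 - c][r] for c in range(n_rows)] for r in range(n_cols)]


def flip_scalar(grid):
    """Mirror a 2D list left-right by reversing the column order."""
    return [row[::-1] for row in grid]


def negate(grid):
    """Reverse the sign of every value in a 2D list."""
    return [[-value for value in row] for row in grid]


def rot90_sample(h, qx, qy):
    """Rotate depth and discharge together.

    Under the assumed axis convention, a flow along +X becomes a flow
    along +Y after the rotation, and a flow along +Y becomes a flow along -X.
    """
    return rot90_scalar(h), negate(rot90_scalar(qy)), rot90_scalar(qx)


def flip_sample(h, qx, qy):
    """Mirror depth and discharge left-right; only the X-component changes sign."""
    return flip_scalar(h), negate(flip_scalar(qx)), flip_scalar(qy)


def augment(h, qx, qy):
    """Return eight variants: four rotations, each with and without a flip."""
    variants = []
    sample = (h, qx, qy)
    for _ in range(4):
        variants.append(sample)
        variants.append(flip_sample(*sample))
        sample = rot90_sample(*sample)
    return variants
```

The same transformation would have to be applied to the input frame and to the label frame of a sample so that they stay aligned. Direction-dependent static features such as the terrain slopes would presumably need the same component handling as the unit discharges.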
Three static features, which remain constant throughout the flood process, were recorded on an arbitrary grid cell (i,j): terrain elevation (zi,j) and slopes along the X- and Y-directions. Additionally, four dynamic features were recorded for each time step t: water depth (ht,i,j), unit discharges in the X- and Y-directions (qxt,i,j and qyt,i,j), and water surface elevation (et,i,j). Because Fathi et al. (2025) considered only water depth as a dynamic feature in both input and output for their autoregressive flood prediction, their deep learning models could not replicate the relationships defined by physical equations, resulting in poor interpretability. According to the shallow water equations, the driving forces for a shallow flow are represented by the convective term and the water surface gradient term. In the method adopted by Fathi et al. (2025), which considered only water depth, the deep learning model accounted for only the water surface gradient term. By contrast, the proposed single frame prediction incorporated additional essential hydrodynamic variables, giving deep learning models the opportunity to establish more interpretable relationships. All input variables (x) were rescaled into a standard score (z) through standardization according to Equation 1, where μ is the mean of the population and σ is the standard deviation of the population.
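With the symbols defined above, the standard score takes the usual form z = (x − μ)/σ. A minimal sketch of this rescaling for one variable is given below. It assumes that the statistics are computed over all values of that variable and that they are stored so the transformation can be inverted on the model outputs; the function names are hypothetical.

```python
import statistics


def standardize(values):
    """Rescale values to zero mean and unit population standard deviation."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # population (not sample) standard deviation
    if sigma == 0.0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values], mu, sigma


def destandardize(scores, mu, sigma):
    """Map standard scores back to physical units."""
    return [z * sigma + mu for z in scores]
```

For example, `standardize([0.0, 0.2, 0.4])` returns the scores together with the mean and standard deviation, and `destandardize` recovers the original depths from them.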
For historical trend prediction, each sample comprised three consecutive frames with a temporal resolution of 1 hr, with the first two used as input and the last one as label. For single frame prediction, only two frames were used for input and label, respectively. The initial input frame was established based on the initial conditions for the first prediction step. The initial conditions for the single frame prediction, including initial water depth and unit discharge, could be automatically generated from observation data or manually defined as synthetic scenarios. This process operated independently of traditional hydrodynamic models. For all subsequent steps, each input frame was derived directly from the output of the preceding step. To enable the model to predict based on physical laws rather than historical trends, additional boundary condition information was integrated into the single frame prediction input. The first frame used as input included a ghost cell containing all static and dynamic variables as boundary conditions, like traditional hydrodynamic models (Figure 2). For the dike breach cases, inflow condition was specified at the breach and the flow rate was determined based on engineering experience, utilizing either theoretical formulas or scenario-specific values, while the remaining boundaries were treated as rigid walls. Compared to Fathi et al. (2025) whose method struggled to generate accurate predictions on our data set, this study incorporated more fundamental variables (water depth and unit discharges) for a more comprehensive prediction and evaluated the generalization ability of a more advanced deep learning model across diverse spatial terrain features.
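The autoregressive use of single frame prediction described above can be sketched as a simple loop. This is an illustration under stated assumptions rather than the code of this study: `model` stands for any callable that maps the current frame and the boundary conditions to the next frame, `post_process` is an optional callable that refines a prediction using the previous frame, and both names and signatures are hypothetical.

```python
def rollout(model, initial_frame, boundary, n_steps, post_process=None):
    """Predict n_steps frames, feeding each output back in as the next input."""
    frames = []
    frame = initial_frame
    for _ in range(n_steps):
        predicted = model(frame, boundary)
        if post_process is not None:
            # Refine the raw prediction before it becomes the next input.
            predicted = post_process(frame, predicted, boundary)
        frames.append(predicted)
        frame = predicted
    return frames
```

Only the initial frame and the boundary conditions are supplied from outside the loop, which is why no hydrodynamic model is needed to start the prediction. Time-varying boundary conditions could be handled by passing one boundary value per step instead of a single value.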
[IMAGE OMITTED. SEE PDF]
Deep Learning Model
In this study, U-Net, a CNN-based model previously utilized by Bentivoglio et al. (2023), was employed as a baseline for the spatiotemporal flood prediction task (Figure 3). Since its introduction by LeCun et al. (1989), CNN has become one of the most widely used feed-forward neural networks. Among various CNN-based models, U-Net (Ronneberger et al., 2015) distinguishes itself with its encoder-decoder structure and the fusion of deep and shallow features, leading to its adoption in flood prediction research (Bentivoglio et al., 2023; Cache et al., 2024; Löwe et al., 2021; Shao et al., 2024).
[IMAGE OMITTED. SEE PDF]
Post-Processing Method
The error in deep learning models could accumulate during long-term predictions, eventually producing erroneous results. A post-processing method based on physical laws was developed to mitigate error at each time step, thereby significantly reducing the accumulated error in the later stages and enhancing the robustness and stability of prediction (Figure 4). The method involved three sequential steps: outlier correction, connected component search, and mass conservation correction. Throughout the entire flood prediction process, the outputs generated by deep learning models at each time step were refined using this three-step post-processing method, with the aim of reducing errors and providing a more physical and self-consistent input for the next time step.
[IMAGE OMITTED. SEE PDF]
Outlier Correction
Due to prediction errors in deep learning models, the generated flood maps might contain grid cells with negative water depths, which violate physical laws. Thus, this step reset all dynamic features (water depth and unit discharges) to zero for grid cells whose water depth was below a threshold, defined as 10⁻⁴ m based on engineering experience.
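A minimal sketch of this step is shown below, assuming that the fields are stored as nested lists indexed as [row][column]. The function name is hypothetical; the threshold is the value stated above.

```python
DRY_THRESHOLD_M = 1e-4  # threshold stated in the text


def correct_outliers(h, qx, qy, threshold=DRY_THRESHOLD_M):
    """Zero all dynamic features on cells whose water depth is below the threshold."""
    rows, cols = len(h), len(h[0])
    wet = [[h[i][j] >= threshold for j in range(cols)] for i in range(rows)]

    def keep_wet(grid):
        # Return a new grid; cells flagged as dry are set to zero.
        return [[grid[i][j] if wet[i][j] else 0.0 for j in range(cols)]
                for i in range(rows)]

    return keep_wet(h), keep_wet(qx), keep_wet(qy)
```

Negative depths are covered automatically because they fall below the threshold.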
Connected Component Search
This step treated the 64 × 64 grid image as a directed graph (Bang-Jensen & Gutin, 2009) grounded in graph theory. In the directed graph, each grid cell was represented as a vertex (V), and the boundary between two adjacent flooded grid cells was represented as a directed edge (E) from Vi to Vj if ei exceeds zj, indicating that water could flow from Vi to Vj.
According to physical laws, it is evident that the flooded grid cells could constitute a connected component of the directed graph, irrespective of rainfall conditions and kinetic energy of flow. However, general deep learning models often struggle to understand these physical laws, resulting in the generation of high water depths on grid cells that are inaccessible to flood flow. Consequently, a search for connected components was conducted from the inlet vertex. Vertices not included in the connected component, indicating areas where water cannot reach from the inlet, were deemed to have no water, and all associated variables on these vertices were set to zero (Figure 5).
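The search from the inlet vertex can be sketched as a breadth-first traversal of the directed graph. The sketch below is one possible reading of the description, not the implementation of this study. It assumes four-neighbor adjacency, treats a cell as flooded when its water depth is positive, and takes the water surface elevation as terrain elevation plus water depth. The function names are hypothetical.

```python
from collections import deque


def reachable_from_inlet(z, h, inlet):
    """Return the set of flooded cells that water can reach from the inlet cell."""
    rows, cols = len(z), len(z[0])
    visited = {inlet}
    queue = deque([inlet])
    while queue:
        i, j = queue.popleft()
        surface = z[i][j] + h[i][j]  # water surface elevation e of the source cell
        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ni, nj = i + di, j + dj
            if not (0 <= ni < rows and 0 <= nj < cols):
                continue
            if (ni, nj) in visited or h[ni][nj] <= 0.0:
                continue
            # Directed edge: water can pass only if e of the source exceeds z of the target.
            if surface > z[ni][nj]:
                visited.add((ni, nj))
                queue.append((ni, nj))
    return visited


def remove_unreachable_water(z, h, qx, qy, inlet):
    """Zero the dynamic features on every cell outside the component of the inlet."""
    keep = reachable_from_inlet(z, h, inlet)

    def mask(grid):
        return [[value if (i, j) in keep else 0.0 for j, value in enumerate(row)]
                for i, row in enumerate(grid)]

    return mask(h), mask(qx), mask(qy)
```

In a one-row example with elevations `[0, 0, 5, 0]` and an inlet at the first cell, predicted water beyond the 5 m ridge is removed as long as the upstream water surface stays below the ridge.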
[IMAGE OMITTED. SEE PDF]
Mass Conservation Correction
According to the principle of mass conservation, the increased water quantity should equal the total flow at the inlet. However, due to prediction errors inherent in deep learning models and corrections in the preceding two steps, the principle of mass conservation was not maintained. To address this problem, the water depth and unit discharge at flooded grid cells were refined through the following steps:
- Calculate the change in water depth between the output and input for each grid cell (Equation 2).
- Adjust the water depth changes to ensure mass conservation, so that their sum equals the total inflow from the previous time step (Equation 3).
- Calculate the refined water depth and unit discharges while maintaining constant velocity (Equation 4).
As a result, noise was eliminated from the outputs of each prediction step following the three stages of post-processing. The refined outputs then served as inputs for the subsequent prediction step, thereby reducing accumulated errors throughout the entire flood process.
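The three steps of the mass conservation correction can be illustrated with a short sketch. Because the exact form of the adjustment is not reproduced here, the sketch makes explicit assumptions: the depth changes are rescaled by one uniform factor, the inflow is supplied as a volume over the time step, all cells share the same area, and velocity is taken as unit discharge divided by water depth. The function and parameter names are hypothetical.

```python
def conserve_mass(h_in, h_out, qx, qy, inflow_volume, cell_area):
    """Rescale predicted depth changes so the added volume matches the inflow."""
    rows, cols = len(h_out), len(h_out[0])

    # Step 1: depth change predicted for each cell.
    dh = [[h_out[i][j] - h_in[i][j] for j in range(cols)] for i in range(rows)]
    predicted_volume = cell_area * sum(sum(row) for row in dh)
    if predicted_volume <= 0.0:
        # No positive net change to rescale; leave the prediction untouched.
        return h_out, qx, qy

    # Step 2 (assumed uniform scaling): the summed change equals the inflow.
    scale = inflow_volume / predicted_volume
    h_new = [[h_in[i][j] + scale * dh[i][j] for j in range(cols)]
             for i in range(rows)]

    # Step 3: keep velocity q / h constant, so q scales with the depth ratio.
    def rescale(q):
        return [[q[i][j] * h_new[i][j] / h_out[i][j] if h_out[i][j] > 0.0 else 0.0
                 for j in range(cols)] for i in range(rows)]

    return h_new, rescale(qx), rescale(qy)
```

With a uniform factor, cells whose depth decreased could in principle be pushed below zero when the factor is large, so a practical implementation would need an additional safeguard for that case.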
Results and Discussion
In this study, the U-Net utilized by Bentivoglio et al. (2023) was employed for the following experiments. The model's performance was evaluated on a synthetic data set to assess the feasibility of single frame prediction and the impact of the post-processing method. Furthermore, a real-world flood event data set was included to test the robustness and practical applicability of the proposed methodology.
Experimental Details
The U-Net models with 64 base filters and two different depths (d) were employed for the experiments: d = 3 to maintain consistency with the best results in Bentivoglio et al. (2023), and d = 4 to increase the number of parameters. Furthermore, the models were evaluated on two prediction frameworks, both with and without post-processing (abbreviated to “p-p” in the following tables and figures).
For the training processes, the AdamW optimizer (Loshchilov & Hutter, 2017) was utilized with a batch size of 128. A cyclic cosine annealing learning rate schedule was implemented, with the learning rate varying between an upper limit of 10⁻³ and a lower limit of 10⁻⁵. The models were trained for 300 epochs on an NVIDIA Tesla V100 GPU.
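A cyclic cosine annealing schedule can be written in closed form, as in the sketch below. The upper and lower limits are those stated above, whereas the cycle length is not given in the text and is therefore a hypothetical parameter, as is the function name.

```python
import math


def cyclic_cosine_lr(epoch, cycle_length, lr_max=1e-3, lr_min=1e-5):
    """Learning rate that decays from lr_max toward lr_min and restarts each cycle."""
    position = (epoch % cycle_length) / cycle_length  # fraction of the current cycle
    return lr_min + 0.5 * (lr_max - lr_min) * (1.0 + math.cos(math.pi * position))
```

Each cycle starts at the upper limit, passes the midpoint of the two limits halfway through, and approaches the lower limit just before the next restart.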
The mean-square error function was utilized as the loss function to optimize the models, as shown in Equation 5, which compares the variable predicted by the deep learning models at position (i, j) at time step t for event k with the corresponding label generated by the hydrodynamic model. For evaluation, the root-mean-square error (RMSE, Equation 6) and mean-absolute-error (MAE, Equation 7) were used to assess the accuracy of predicted water depth and unit discharges across all frames during the entire flood process, where n represents the number of samples and T denotes the number of frames. The critical success index (CSI, Equation 8) was used to measure flood warning accuracy at a specified threshold τ, where TPτ, FNτ, and FPτ denote correct alarms, missed alarms, and false alarms, respectively. Two thresholds τ of 0.05 and 0.3 m were employed for CSI computation according to Löwe et al. (2021).
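The three evaluation metrics can be sketched as follows for flattened lists of predicted and reference values. The RMSE and MAE follow their standard definitions, and the CSI is computed in its standard form TP / (TP + FN + FP). Treating a cell as flooded when its depth strictly exceeds the threshold is an assumption, and the function names are hypothetical.

```python
import math


def rmse(pred, label):
    """Root-mean-square error over paired values."""
    return math.sqrt(sum((p - y) ** 2 for p, y in zip(pred, label)) / len(pred))


def mae(pred, label):
    """Mean absolute error over paired values."""
    return sum(abs(p - y) for p, y in zip(pred, label)) / len(pred)


def csi(pred_depth, label_depth, threshold):
    """Critical success index at a given water depth threshold (in m)."""
    tp = fn = fp = 0
    for p, y in zip(pred_depth, label_depth):
        pred_wet, label_wet = p > threshold, y > threshold
        if pred_wet and label_wet:
            tp += 1  # correct alarm
        elif label_wet:
            fn += 1  # missed alarm
        elif pred_wet:
            fp += 1  # false alarm
    total = tp + fn + fp
    # CSI is undefined when no cell is flooded in either map.
    return tp / total if total else float("nan")
```

Cells that are dry in both maps do not enter the CSI, so the index is not inflated by the large unflooded part of the domain.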
Experimental Results
Computational Speed
First, the computational speed of prediction when using the post-processing method was evaluated. The deep learning model's average prediction times for a 48-hr flood process without and with the post-processing method were 1.686 and 1.815 s, respectively. By contrast, MIKE 21 required an average of 58.388 s to simulate a 48-hr flood with a 1-s time step interval. For a 60-hr flood prediction, the deep learning model and MIKE 21 required 1.842 s (2.121 s with p-p) and 82.127 s, respectively. Consequently, using the post-processing method added less than 0.3 s to the prediction time, an increase that did not significantly impact the overall process. Furthermore, single frame prediction took roughly one-fortieth of the time required by the hydrodynamic model. This substantial time saving becomes even more pronounced as the hydrodynamic model's simulation duration extends beyond 24 hr, or as the time interval of deep learning predictions increases.
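The overhead and speed-up figures quoted above follow directly from the reported timings, as the short calculation below shows. The timings are those given in the text; the dictionary layout and names are only illustrative.

```python
# Reported average run times in seconds, keyed by simulated flood duration in hours.
timings = {
    48: {"dl": 1.686, "dl_pp": 1.815, "mike21": 58.388},
    60: {"dl": 1.842, "dl_pp": 2.121, "mike21": 82.127},
}


def summarize(t):
    """Post-processing overhead and speed-up relative to the hydrodynamic model."""
    return {
        "overhead_s": t["dl_pp"] - t["dl"],
        "speedup": t["mike21"] / t["dl"],
        "speedup_pp": t["mike21"] / t["dl_pp"],
    }


summary = {hours: summarize(t) for hours, t in timings.items()}
for hours, s in summary.items():
    print(f"{hours} hr: overhead {s['overhead_s']:.3f} s, "
          f"speed-up {s['speedup']:.1f}x ({s['speedup_pp']:.1f}x with p-p)")
```

The resulting speed-up factors lie between about 30 and 45 depending on the flood duration and on whether post-processing is applied, which is the basis for the "roughly one-fortieth" statement.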
Feasibility of Single Frame Prediction
The results indicated that single frame prediction, when combined with the post-processing method, could generate accurate flood maps in long-term predictions, as demonstrated in Figure 6.
[IMAGE OMITTED. SEE PDF]
Table 1 presents the test results of single frame prediction on the synthetic events. The deep learning models successfully generated accurate distributions of water depth and unit discharge across various terrain features and boundary conditions, illustrating the propagation of flooded areas (Figure 6). Compared to the previous work of Fathi et al. (2025), the above experiments considered essential hydrodynamic variables as input during predictions, instead of only water depth, thus providing a more comprehensive and interpretable flood prediction process. In addition, the widely used U-Net model generated more accurate inundation maps across several flood events than the previously used model, demonstrating the strong generalization ability of single frame prediction to distinct terrain features and boundary conditions. Consequently, the results confirmed the feasibility of the single frame prediction framework.
Table 1 Results of Single Frame Prediction and Historical Trend Prediction on Synthetic Events
| Framework | Model | RMSE h (10⁻² m) | RMSE q (10⁻² m²/s) | MAE h (10⁻² m) | MAE q (10⁻² m²/s) | AvgCSI (%) |
| Single frame prediction | U-Net d = 3 | 7.57 | 0.61 | 2.83 | 0.23 | 79.4 |
| Single frame prediction | U-Net d = 4 | 6.43 | 0.55 | 2.52 | 0.19 | 82.7 |
| Single frame prediction | U-Net d = 3 with p-p | 5.18 | 0.36 | 1.58 | 0.10 | 88.1 |
| Single frame prediction | U-Net d = 4 with p-p | 4.06 | 0.30 | 1.30 | 0.08 | 90.4 |
| Historical trend prediction | U-Net d = 3 | 4.87 | 0.30 | 1.58 | 0.10 | 89.0 |
| Historical trend prediction | U-Net d = 4 | 4.70 | 0.28 | 1.68 | 0.10 | 88.6 |
| Historical trend prediction | U-Net d = 3 with p-p | 4.34 | 0.28 | 1.33 | 0.08 | 89.9 |
| Historical trend prediction | U-Net d = 4 with p-p | 4.40 | 0.27 | 1.36 | 0.08 | 89.6 |
Without the post-processing method, single frame prediction remained less accurate than the historical trend baseline, which uses multiple frames, largely because it receives less input information. However, single frame prediction demonstrated its potential for accurate predictions when augmented with the post-processing method, achieving the lowest water depth errors and the highest AvgCSI among all test cases (U-Net d = 4 with p-p). This improved performance suggested that single frame prediction, which relied on physical laws and hydrodynamic principles, might benefit more from the physics-based post-processing method than historical trend prediction. Consequently, these findings underscored the feasibility and potential necessity of combining deep learning methods with physical principles.
Impact of Post-Processing Method
With the employment of the post-processing method, all metrics improved for all combinations of models and prediction frameworks (Table 1), demonstrating the strong effectiveness and applicability of the post-processing method. Furthermore, a comprehensive analysis is presented below.
Applicability of the Post-Processing Method
The applicability of the post-processing method to different prediction frameworks, flood events, and hydrodynamic variables was investigated. As illustrated in Table 1, all metrics calculated across the entire test data set showed improvement after post-processing. Furthermore, the percentages of improved samples after applying the post-processing method for water depth, unit discharge, and flooded area were calculated. For each variable, a sample was considered improved only if all relevant metrics (RMSE and MAE for water depth and unit discharge, or CSI0.05m and CSI0.3m for flooded area) showed improvement. Additionally, the percentages of improved samples based on metrics for all three variables were calculated.
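The criterion for counting a sample as improved can be sketched as follows. Each sample is represented here by two dictionaries of metric values, before and after post-processing. Treating a tie as "not improved" is an assumption, and the metric keys and function names are hypothetical.

```python
# Whether a smaller value means a better result for each metric.
LOWER_IS_BETTER = {"rmse": True, "mae": True, "csi_0.05": False, "csi_0.3": False}


def metric_improved(name, before, after):
    """Strict improvement: errors must decrease and CSI must increase."""
    return after < before if LOWER_IS_BETTER[name] else after > before


def sample_improved(before, after):
    """A sample counts as improved only if every listed metric improved."""
    return all(metric_improved(name, before[name], after[name]) for name in before)


def improved_percentage(pairs):
    """Percentage of (before, after) metric pairs that count as improved."""
    improved = sum(sample_improved(before, after) for before, after in pairs)
    return 100.0 * improved / len(pairs)
```

For water depth or unit discharge the dictionaries would hold the RMSE and MAE of that variable, for flooded area they would hold the two CSI values, and for the combined column they would hold the metrics of all three variables.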
The results in Table 2 indicated that the post-processing method improved approximately half of the synthetic samples in historical trend prediction and most synthetic samples in single frame prediction, demonstrating strong applicability across diverse prediction frameworks, flood events, and hydrodynamic variables. In single frame prediction, more than 80% of the samples processed using the U-Net d = 4 model showed improvement in metrics for water depth and flooded area after post-processing, attributable to the last two steps of the method, connected component search and mass conservation correction. However, the improvements in metrics for unit discharges were less pronounced than those for the other two variables. Because of the coarse spatial and temporal resolution (100 m and 1 hr) and the complex hydrodynamics described by the 2D shallow water equations, quantitatively analyzing the variation in unit discharges was more difficult than for the other two variables, resulting in a relatively less significant improvement.
Table 2 Percentages of Improved Samples After Post-Processing for Each Variable
| Framework | Model | Water depth (%) | Unit discharge (%) | Flooded area (%) | All (%) |
| Single frame prediction | U-Net d = 3 with p-p | 90.6 | 93.8 | 93.8 | 87.5 |
| Single frame prediction | U-Net d = 4 with p-p | 87.5 | 65.6 | 87.5 | 65.6 |
| Historical trend prediction | U-Net d = 3 with p-p | 56.3 | 59.4 | 56.3 | 50.0 |
| Historical trend prediction | U-Net d = 4 with p-p | 65.6 | 59.4 | 62.5 | 40.6 |
Ablation Experiment for the Post-Processing Method
To investigate the impact of each step in the post-processing method, an ablation experiment was conducted using the U-Net d = 4 model and single frame prediction on the synthetic events. The performance of the post-processing method was evaluated with each step removed in turn. As shown in Table 3, the improvement in accuracy and the percentages of improved samples declined significantly without the connected component search (step 2). Without the mass conservation correction (step 3), the RMSE decreased slightly and the percentages of improved samples rose, but the AvgCSI dropped from 90.4% to 90.2%. By contrast, no metrics changed without the outlier correction (step 1), indicating the high prediction accuracy of deep learning models. Nevertheless, the brief outlier correction at step 1 was retained as a safeguard, even though its outlier-removing effect was similar to that of step 3. The results demonstrated that both step 2 and step 3 played important roles in reducing errors. Although step 3 could enhance the accuracy without step 2, it might produce some negative effects on the accuracy of water depth and unit discharges when combined with step 2, due to the difficulty of quantitative correction. Among the three steps, step 2 was the most critical operation, as it could effectively remove most noise in the unsubmerged area. Because employing all three steps, each contributing effectively (particularly the last two steps), led to the highest AvgCSI, the result yielded by this configuration was considered the best in the study.
Table 3 Results of Ablation Experiment
| Employed step | RMSE | MAE | AvgCSI (%) | Improved samples (%) | |||||
| h (m) (10−2) | q (m2/s) (10−2) | h (m) (10−2) | q (m2/s) (10−2) | h | q | CSI | All | ||
| None | 6.43 | 0.55 | 2.52 | 0.19 | 82.7 | \ | \ | \ | \ |
| Step 2 and 3 | 4.06 | 0.30 | 1.30 | 0.08 | 90.4 | 87.5 | 65.6 | 87.5 | 65.6 |
| Step 1 and 3 | 6.09 | 0.50 | 2.23 | 0.16 | 83.3 | 81.3 | 71.9 | 59.4 | 50.0 |
| Step 1 and 2 | 3.92 | 0.29 | 1.30 | 0.08 | 90.2 | 87.5 | 84.4 | 87.5 | 81.3 |
| All | 4.06 | 0.30 | 1.30 | 0.08 | 90.4 | 87.5 | 65.6 | 87.5 | 65.6 |
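For readers who wish to reproduce metrics of the kind reported in Table 3, the sketch below computes RMSE, MAE, the critical success index (CSI) at one wet/dry threshold, and the percentage of improved samples. It is a minimal illustration under stated assumptions: the threshold value, the flattening of each flood map into a list, and the strict inequality used to count a sample as improved are hypothetical choices, and the averaging that yields AvgCSI in this study is not reproduced here.

```python
import math


def rmse(pred, true):
    """Root-mean-square error over all grid cells."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(pred))


def mae(pred, true):
    """Mean absolute error over all grid cells."""
    return sum(abs(p - t) for p, t in zip(pred, true)) / len(pred)


def csi(pred, true, threshold):
    """CSI = hits / (hits + misses + false alarms) for cells above threshold."""
    hits = misses = false_alarms = 0
    for p, t in zip(pred, true):
        pred_wet, true_wet = p > threshold, t > threshold
        if pred_wet and true_wet:
            hits += 1
        elif true_wet:
            misses += 1
        elif pred_wet:
            false_alarms += 1
    total = hits + misses + false_alarms
    # CSI is undefined when neither map contains a wet cell.
    return hits / total if total else float("nan")


def improved_share(err_raw, err_refined):
    """Percentage of samples whose error decreased after post-processing."""
    improved = sum(1 for a, b in zip(err_raw, err_refined) if b < a)
    return 100.0 * improved / len(err_raw)


# Toy data (not from the study): four grid cells of water depth in metres.
true_h = [0.0, 0.0, 0.5, 1.0]
pred_h = [0.1, 0.0, 0.4, 1.2]

rmse_h = rmse(pred_h, true_h)
mae_h = mae(pred_h, true_h)
csi_h = csi(pred_h, true_h, threshold=0.05)  # hypothetical threshold
share = improved_share([6.4, 5.0, 3.0, 2.0], [4.1, 5.2, 2.0, 2.0])
```

With these toy values, two cells are hits and one is a false alarm, so the CSI is 2/3, and two of the four samples count as improved.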
Influence on Prediction Stability
Compared with the early stages of the flood process, the optimization effect of the post-processing method was more significant in the later stages. A typical sample was presented in Figure 7 to illustrate this impact. It shows the spatial distributions of water depth and unit discharge in the X-direction across the entire flood event, generated by the U-Net d = 4 using single frame prediction with and without the post-processing method. As shown in Figure 7, without the post-processing method, noticeable outliers appeared on the right side of the flood map starting at 24 hr. The absolute error and range of the outliers continued to increase over time, ultimately resulting in a flood map with significant errors. In contrast, there were no outliers for either water depth or unit discharge in the flood map refined by the post-processing method. As a result, without modifying the deep learning models, accurate flood maps after 48 hr were produced through the post-processing method, which offers a simple approach to reducing errors in the later stages of single frame prediction to an acceptable range.
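The mechanism behind this stabilizing effect can be sketched with a toy autoregressive rollout, in which each predicted state is refined before being fed back as the next input. Everything in the sketch is a hypothetical stand-in: `toy_step_model` replaces the U-Net with a one-dimensional update that deliberately injects a self-amplifying error in the last cell, and `toy_post_process` replaces the physics-based post-processing with a simple rule that keeps only the wet cells contiguous with the inflow cell.

```python
def rollout(step_model, post_process, initial_state, boundary_conditions):
    """Predict one frame at a time, feeding each output back as the next input."""
    states = [initial_state]
    state = initial_state
    for bc in boundary_conditions:
        state = step_model(state, bc)
        if post_process is not None:
            state = post_process(state)  # refine before the next step
        states.append(state)
    return states


def toy_step_model(state, inflow):
    """Hypothetical one-step model on a 1D channel (not the U-Net)."""
    nxt = list(state)
    nxt[0] += inflow                  # water enters at the left boundary
    nxt[-1] = 1.5 * nxt[-1] + 0.01    # spurious error that feeds on itself
    return nxt


def toy_post_process(state, h_min=0.001):
    """Keep the wet run attached to the inflow cell; dry everything else."""
    cleaned = [0.0] * len(state)
    for i, h in enumerate(state):
        if h <= h_min:
            break
        cleaned[i] = h
    return cleaned


initial = [0.2, 0.1, 0.0, 0.0, 0.0]
inflows = [0.05] * 5

raw = rollout(toy_step_model, None, initial, inflows)
refined = rollout(toy_step_model, toy_post_process, initial, inflows)
```

In the unrefined rollout, the error in the last cell grows at every step because it is part of the next input; in the refined rollout, it is removed at each step and therefore cannot accumulate.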
[IMAGE OMITTED. SEE PDF]
A Real-World Case Study
To further test the proposed prediction framework and post-processing method, they were applied to a real-world dam break event, namely the Tous Dam break event (Alcrudo & Mulet, 2007), and the data set utilized was developed using Mike 21. The Tous Dam, located in Valencia, Spain, failed in 1982, leading to devastating flooding and, in turn, economic and environmental damage. The terrain elevation of Tous Dam (97 × 70 pixels with a 50 m spatial resolution) was employed to simulate data for model testing. An inflow condition was specified at the upstream boundary, with the flow rate set using the real upstream hydrograph (a 40-hr flood with a 15,000 m3/s peak flow); the downstream boundary was treated as an open boundary, and the remaining boundaries were treated as rigid walls. In addition, nine synthetic inflow hydrographs were combined with the real terrain elevations to generate training data (eight events) and validation data (one event).
Due to differences between the synthetic data set and the real-world case study, some experimental settings were modified. Since the inflow unit discharges varied over time during the flood, the unit discharge inflow boundary condition was approximated by the average unit discharge for the subsequent time interval. Furthermore, for the training process, a batch size of 8 and 800 epochs were employed. Additionally, the second step of the post-processing method used all flooded grid cells as search starting points due to the terrain's significant undulations, and the third step was omitted for the real-world event because the downstream outflow boundary condition resulted in a violation of mass conservation. Other settings remained consistent with the experiments using the synthetic data set.
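As a minimal sketch of how a time-varying inflow can be reduced to one boundary value per prediction step, the following Python function averages a uniformly sampled hydrograph over each subsequent interval. The trapezoidal averaging rule, the function name `interval_mean_inflow`, and the sample values are assumptions for illustration; the study does not specify how the average was computed, and the conversion from discharge to unit discharge is omitted.

```python
def interval_mean_inflow(q_samples, samples_per_interval):
    """Average inflow over each prediction interval (trapezoidal rule).

    q_samples            : inflow values at uniformly spaced times
    samples_per_interval : number of sample spacings in one prediction interval
    Returns one mean value per complete interval.
    """
    k = samples_per_interval
    means = []
    for start in range(0, len(q_samples) - k, k):
        window = q_samples[start:start + k + 1]
        # Sum of trapezoid areas, in units of one sample spacing.
        area = sum((a + b) / 2.0 for a, b in zip(window, window[1:]))
        means.append(area / k)
    return means


# Hypothetical rising and falling hydrograph (arbitrary units).
hydrograph = [0.0, 2.0, 4.0, 6.0, 4.0, 2.0, 0.0]
boundary_values = interval_mean_inflow(hydrograph, samples_per_interval=2)
# boundary_values == [2.0, 5.0, 2.0]
```

Each returned value would serve as the constant inflow boundary condition for the corresponding prediction step, which is why a rapidly changing hydrograph is only approximated by this treatment.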
For the real-world dam break case, predicting flood propagation during the dam break proved to be very challenging due to the extremely high peak flow. Therefore, results for both the initial period (the first 12 hr, before a significant part of the dam fell) and the entire flood process were presented in Table 4. The relatively low errors in predicted water depth and unit discharge (compared with the maximum water depth of 40.68 m and peak inflow of 15,000 m3/s), along with high AvgCSI values, demonstrated the practical applicability of single frame prediction with the post-processing method, which successfully modeled the flood's propagation. However, accurately predicting the entire dam break process, particularly with extremely high unit discharges, remains a challenge for future research.
Table 4 Results of Single Frame Prediction and Historical Trend Prediction on Real-World Event
| Framework | Model | Time period | RMSE h (m) | RMSE q (m2/s) | MAE h (m) | MAE q (m2/s) | AvgCSI (%) |
| Single frame prediction | U-Net d = 4 | Initial period | 0.70 | 3.01 | 0.29 | 0.99 | 33.4 |
| Single frame prediction | U-Net d = 4 | Entire flood | 4.35 | 11.18 | 1.82 | 4.47 | 16.4 |
| Single frame prediction | U-Net d = 4 with p-p | Initial period | 0.43 | 1.45 | 0.09 | 0.24 | 88.0 |
| Single frame prediction | U-Net d = 4 with p-p | Entire flood | 0.83 | 2.86 | 0.19 | 0.53 | 82.2 |
| Historical trend prediction | U-Net d = 4 | Initial period | 0.69 | 2.27 | 0.23 | 0.56 | 44.5 |
| Historical trend prediction | U-Net d = 4 | Entire flood | 1.57 | 5.91 | 0.52 | 1.46 | 28.4 |
| Historical trend prediction | U-Net d = 4 with p-p | Initial period | 0.61 | 2.10 | 0.14 | 0.36 | 84.8 |
| Historical trend prediction | U-Net d = 4 with p-p | Entire flood | 1.37 | 5.24 | 0.35 | 0.97 | 74.3 |
Whether in the initial period or throughout the entire flood, historical trend prediction generated superior results to single frame prediction when both were used without post-processing. However, single frame prediction with post-processing achieved the best result among all models, showing greater improvement through post-processing than historical trend prediction. Because historical trend prediction generalized poorly under changing boundary conditions, it predicted larger flooded areas once the inflow discharge decreased, since its prediction was based on the initial period of increasing inflow discharge. Therefore, employing the physics-based single frame prediction method with post-processing could achieve higher prediction accuracy than the previous methods, especially when inflow discharge varied, a finding consistent with conclusions drawn from previous experiments on the synthetic data set.
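The difference in improvement can be made concrete with a short calculation on the entire-flood water depth RMSE values reported in Table 4. The sketch below only performs arithmetic on those reported values; the dictionary layout and the function name `relative_reduction` are illustrative choices.

```python
# Water depth RMSE (m) over the entire flood, copied from Table 4.
table4_rmse_h = {
    "single frame prediction": {"raw": 4.35, "with_pp": 0.83},
    "historical trend prediction": {"raw": 1.57, "with_pp": 1.37},
}


def relative_reduction(raw, refined):
    """Percentage by which post-processing lowers an error metric."""
    return 100.0 * (raw - refined) / raw


reductions = {
    name: round(relative_reduction(v["raw"], v["with_pp"]), 1)
    for name, v in table4_rmse_h.items()
}
# reductions == {"single frame prediction": 80.9,
#                "historical trend prediction": 12.7}
```

Post-processing therefore lowered the water depth RMSE by about 80.9% for single frame prediction, compared with about 12.7% for historical trend prediction.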
Figure 8 illustrated the flood propagation process predicted using single frame prediction with and without the post-processing method. As shown in Figure 8, single frame prediction with the post-processing method successfully predicted flooded areas throughout the flood event. Furthermore, noticeable outliers began to appear after 14 hr in the predictions without the post-processing method, but these were effectively mitigated by applying the post-processing.
[IMAGE OMITTED. SEE PDF]
Conclusions
In this study, a novel spatiotemporal flood prediction task framework, named single frame prediction, was proposed. Compared to historical trend prediction used in previous studies, single frame prediction could predict based on boundary conditions and a single frame input from the last time step. Furthermore, a post-processing method based on physical laws was developed to reduce accumulated errors in spatiotemporal flood prediction. Additionally, the U-Net model was evaluated to demonstrate the feasibility of single frame prediction and the impact of the post-processing method. The conclusions derived are as follows:
- A novel flood prediction task framework, single frame prediction, was proposed based on the operational methodology of hydrodynamic models. Compared to the historical trend prediction utilized in previous studies, the single frame framework could predict based on boundary conditions and a single frame input from the last time step. As a result, single frame prediction does not require the assistance of traditional hydrodynamic models and is more aligned with their operational methodology. As shown in the experiments, it was feasible to generate accurate flood maps across various terrain features and boundary conditions with acceptable errors. When combined with the post-processing method, single frame prediction even outperformed historical trend prediction, highlighting the feasibility and potential necessity of integrating deep learning with physical principles.
- A post-processing method based on physical laws was developed to reduce errors at each time step during flood prediction, thereby significantly reducing the accumulated error in the later stages. The physics-based post-processing method refined the generated flood map according to hydrodynamic principles, providing a more physical and self-consistent input for the next time step. The experiments indicated that the post-processing method could increase prediction accuracy across various prediction frameworks, terrain features, boundary conditions, and hydrodynamic variables, demonstrating strong applicability. Additionally, the method could significantly increase the stability of single frame prediction by removing outliers from flood maps in the later stages. As a result, this study developed a simple method to increase the accuracy of flood prediction tasks without modifying deep learning models or enlarging data sets, demonstrating compatibility with other methods.
However, there were still some limitations in this study. Although the current method performed well with a constant unit discharge boundary condition, more advanced methods should be used to handle time-varying inlet discharge in further experiments. Furthermore, current deep learning models still could not fully capture complex physical laws. Introducing or designing an advanced model based on physical laws, rather than relying on post-processing methods, remains a potential research direction to improve prediction accuracy.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant U2243240) and the Water Conservancy and Technology Program of Hunan Province, China (XSKJ2024064-4). Moreover, this work utilized computing resources provided by the Center of High performance computing, Tsinghua University. The authors thank the reviewers for their valuable comments, which dramatically improved this manuscript.
Conflict of Interest
The authors declare no conflicts of interest relevant to this study.
Data Availability Statement
The original data and codes used in the study by Bentivoglio et al. (2023) can be downloaded from and . The original data in the study by Alcrudo and Mulet (2007) can be downloaded from . The independently created data sets, codes, and model parameters from this study will be uploaded to GitHub () as soon as possible.
References
Alcrudo, F., & Mulet, J. (2007). Description of the Tous Dam break case study (Spain). Journal of Hydraulic Research, 45(sup1), 45–57. https://doi.org/10.1080/00221686.2007.9521832
Bang‐Jensen, J., & Gutin, G. Z. (2009). Classes of digraphs. In J. Bang‐Jensen & G. Z. Gutin (Eds.), Digraphs: Theory, algorithms and applications (pp. 31–86). Springer. https://doi.org/10.1007/978-1-84800-998-1_2
Begnudelli, L., & Sanders, B. F. (2007). Simulation of the St. Francis dam‐break flood. Journal of Engineering Mechanics, 133(11), 1200–1212. https://doi.org/10.1061/(ASCE)0733-9399(2007)133:11(1200)
Bentivoglio, R., Isufi, E., Jonkman, S. N., & Taormina, R. (2022). Deep learning methods for flood mapping: A review of existing applications and future research directions. Hydrology and Earth System Sciences, 26(16), 4345–4378. https://doi.org/10.5194/hess-26-4345-2022
Bentivoglio, R., Isufi, E., Jonkman, S. N., & Taormina, R. (2023). Rapid spatio‐temporal flood modelling via hydraulics‐based graph neural networks. Hydrology and Earth System Sciences, 27(23), 4227–4246. https://doi.org/10.5194/hess-27-4227-2023
Besseling, L. S., Bomers, A., & Hulscher, S. J. M. H. (2024). Predicting flood inundation after a dike breach using a long short‐term memory (LSTM) neural network. Hydrology, 11(9), 152. https://doi.org/10.3390/hydrology11090152
Cache, T., Gomez, M. S., Beucler, T., Blagojevic, J., Leitao, J. P., & Peleg, N. (2024). Enhancing generalizability of data‐driven urban flood models by incorporating contextual information. In Hydrology and earth system sciences discussions (pp. 1–23). https://doi.org/10.5194/hess-2024-63
Ding, Y., Zhu, Y., Feng, J., Zhang, P., & Cheng, Z. (2020). Interpretable spatio‐temporal attention LSTM model for flood forecasting. Neurocomputing, 403, 348–359. https://doi.org/10.1016/j.neucom.2020.04.110
Fathi, M. M., Liu, Z., Fernandes, A. M., Hren, M. T., Terry, D. O., Nataraj, C., & Smith, V. (2025). Spatiotemporal flood depth and velocity dynamics using a convolutional neural network within a sequential deep‐learning framework. Environmental Modelling and Software, 185, 106307. https://doi.org/10.1016/j.envsoft.2024.106307
Fraehr, N., Wang, Q. J., Wu, W., & Nathan, R. (2023). Development of a fast and accurate hybrid model for floodplain inundation simulations. Water Resources Research, 59(6), e2022WR033836. https://doi.org/10.1029/2022WR033836
Guo, W.‐D., Chen, W.‐B., & Chang, C.‐H. (2025). A spatiotemporal watershed‐scale machine‐learning model for hourly and daily flood‐water level prediction: The case of the tidal Beigang River, Taiwan. Natural Hazards, 121(8), 9563–9611. https://doi.org/10.1007/s11069-025-07187-2
Jiang, C., Zhou, Q., Yu, W., Yang, C., & Lin, B. (2021). A dynamic bidirectional coupled surface flow model for flood inundation simulation. Natural Hazards and Earth System Sciences, 21(2), 497–515. https://doi.org/10.5194/nhess-21-497-2021
Jonkman, S. N., & Vrijling, J. K. (2008). Loss of life due to floods. Journal of Flood Risk Management, 1(1), 43–56. https://doi.org/10.1111/j.1753-318X.2008.00006.x
Karim, F., Armin, M. A., Ahmedt‐Aristizabal, D., Tychsen‐Smith, L., & Petersson, L. (2023). A review of hydrodynamic and machine learning approaches for flood inundation modeling. Water, 15(3), 566. https://doi.org/10.3390/w15030566
Kazadi, A., Doss‐Gollin, J., Sebastian, A., & Silva, A. (2024). FloodGNN‐GRU: A spatio‐temporal graph neural network for flood prediction. Environmental Data Science, 3, e21. https://doi.org/10.1017/eds.2024.19
LeCun, Y., Boser, B., Denker, J. S., Henderson, D., Howard, R. E., Hubbard, W., & Jackel, L. D. (1989). Backpropagation applied to handwritten zip code recognition. Neural Computation, 1(4), 541–551. https://doi.org/10.1162/neco.1989.1.4.541
Loshchilov, I., & Hutter, F. (2017). Decoupled weight decay regularization. In International conference on learning representations. https://doi.org/10.48550/arXiv.1711.05101
Löwe, R., Böhm, J., Jensen, D. G., Leandro, J., & Rasmussen, S. H. (2021). U‐FLOOD – Topographic deep learning for predicting urban pluvial flood water depth. Journal of Hydrology, 603, 126898. https://doi.org/10.1016/j.jhydrol.2021.126898
Luo, Y., Zhou, Y., Chen, H., Xiong, L., Guo, S., & Chang, F.‐J. (2024). Exploring a spatiotemporal hetero graph‐based long short‐term memory model for multi‐step‐ahead flood forecasting. Journal of Hydrology, 633, 130937. https://doi.org/10.1016/j.jhydrol.2024.130937
Madadgar, S., Moradkhani, H., & Garen, D. (2014). Towards improved post‐processing of hydrologic forecast ensembles. Hydrological Processes, 28(1), 104–122. https://doi.org/10.1002/hyp.9562
Perlin, K. (2002). Improving noise. ACM Transactions on Graphics, 21(3), 681–682. https://doi.org/10.1145/566654.566636
Rifai, I., Schmitz, V., Erpicum, S., Archambeau, P., Violeau, D., Pirotton, M., et al. (2020). Continuous monitoring of fluvial dike breaching by a laser profilometry technique. Water Resources Research, 56(10), e2019WR026941. https://doi.org/10.1029/2019WR026941
Ronneberger, O., Fischer, P., & Brox, T. (2015). U‐Net: Convolutional networks for biomedical image segmentation. In N. Navab, J. Hornegger, W. M. Wells, & A. F. Frangi (Eds.), Medical image computing and computer‐assisted intervention—MICCAI 2015 (pp. 234–241). Springer International Publishing. https://doi.org/10.1007/978-3-319-24574-4_28
Sattari, A., Jafarzadegan, K., & Moradkhani, H. (2024). Enhancing streamflow predictions with machine learning and Copula‐embedded Bayesian model averaging. Journal of Hydrology, 643, 131986. https://doi.org/10.1016/j.jhydrol.2024.131986
Shao, Y., Chen, J., Zhang, T., Yu, T., & Chu, S. (2024). Advancing rapid urban flood prediction: A spatiotemporal deep learning approach with uneven rainfall and attention mechanism. Journal of Hydroinformatics, 26(6), 1409–1424. https://doi.org/10.2166/hydro.2024.024
Toro, E. (2001). Shock‐capturing methods for free‐surface shallow flows. Retrieved from https://www.semanticscholar.org/paper/Shock-Capturing-Methods-for-Free-Surface-Shallow-Toro/04783f5cca09b8b1ccfdf6e592e35735e4c0d2ff
Tradowsky, J. S., Philip, S. Y., Kreienkamp, F., Kew, S. F., Lorenz, P., Arrighi, J., et al. (2023). Attribution of the heavy rainfall events leading to severe flooding in Western Europe during July 2021. Climatic Change, 176(7), 90. https://doi.org/10.1007/s10584-023-03502-7
Wang, Y., Wang, W., Xu, D., Zhao, Y., & Zang, H. (2024). A novel strategy for flood flow prediction: Integrating Spatio‐temporal information through a two‐dimensional hidden layer structure. Journal of Hydrology, 638, 131482. https://doi.org/10.1016/j.jhydrol.2024.131482
Wei, G., Xia, W., He, B., & Shoemaker, C. (2024). Quick large‐scale spatiotemporal flood inundation computation using integrated encoder‐decoder LSTM with time distributed spatial output models. Journal of Hydrology, 634, 130993. https://doi.org/10.1016/j.jhydrol.2024.130993
Wu, H., Liang, Y., Xiong, W., Zhou, Z., Huang, W., Wang, S., & Wang, K. (2024). Earthfarseer: Versatile spatio‐temporal dynamical systems modeling in one model (arXiv:2312.08403). arXiv. https://doi.org/10.48550/arXiv.2312.08403
© 2025. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/.