Content area
Flood modeling is crucial in flood management as it can provide early warnings and support informed decision-making on mitigation and adaptation strategies. However, it remains challenging to provide accurate flood predictions in real time using hydrodynamic flood models due to their high computational demands. This study presents a new discriminator-guided Generative Adversarial Neural Networks (GANs) model for two-dimensional, high-resolution urban flood prediction. Compared with traditional GANs, the role of the discriminator is redefined by modifying its structure and loss function, enabling pixel-wise discrimination based on errors and thereby better meeting the requirements of high-resolution flood prediction. The proposed model is tested on the case study of Exeter, which covers an area of 27 km2 with a spatial resolution of 2 m, and compared with the baseline models Pix2Pix and U-Net. The proposed model accurately predicts water depths across historical and design rainfall events, achieving an average root mean square error of 0.044 m and a Critical Success Index of 0.754, demonstrating generalization capability on unseen rainfall events. The proposed model significantly improves computational efficiency and offers a viable solution for spatiotemporal flood prediction in real time, supporting informed decision-making for urban flood management.
Introduction
Urban flooding is one of the most widespread natural hazards, posing significant risks to people, property, and critical infrastructure. For example, over 5.2 million homes and businesses in England are at risk of flooding, with associated economic losses amounting to £333 million between 2019 and 2020 (Department for Environment Food & Rural Affairs, 2024). The impacts of climate change and urbanization are likely to exacerbate flood risks, presenting persistent challenges for urban flood management (Fowler et al., 2021; Güneralp et al., 2015; Hanlon et al., 2021; Kew et al., 2024). Climate change will increase the frequency of extreme storm events globally (Fowler et al., 2021; Güneralp et al., 2015; Hammond et al., 2015; Hanlon et al., 2021; Kew et al., 2024). In parallel, the acceleration of urbanization can increase surface runoff by turning permeable green spaces into impervious surfaces (Hammond et al., 2015). By 2050, it is estimated that 68% of the global population will reside in urban areas, implying that a larger proportion will face flood risks, particularly in cities located in floodplains (Hammond et al., 2015; O'Donnell & Thorne, 2020). These converging trends suggest that urban pluvial flooding is likely to become more frequent and severe in the coming decades. As a result, robust flood modeling capabilities are essential for supporting both real-time flood response and long-term urban planning. However, the current hydrodynamic models have difficulties in providing real-time flood prediction due to their computationally intensive demands (Guo, Guan, & Yu, 2021; Zounemat-Kermani et al., 2020). Therefore, it is imperative to explore rapid and novel approaches to support informed decision-making in flood management (Fu et al., 2022; Rosenzweig et al., 2021).
Data-driven models have been developed to meet the need for flood-related tasks because of their high computational efficiency (Aderyani et al., 2025; Bentivoglio et al., 2022; Fu et al., 2022; Ivanov et al., 2021; Sit et al., 2020). Models, such as Random Forest, Naive Bayes, and Multilayer Perceptron, have been employed in flood prediction, demonstrating their effectiveness while generally being lightweight and computationally efficient (Aderyani et al., 2025; Khosravi et al., 2019; Li et al., 2021). However, increasing attention has been given to the importance of exploiting spatial features, especially when dealing with complex catchments where spatial variability plays a critical role.
As a response to the need for better spatial feature extraction and improved generalization, recent research has shifted toward deep learning approaches. Deep learning models have been widely applied to flood-related tasks, including flood assessment, detection, and prediction. Recent popular architectures including Transformers, Graph Neural Networks (GNNs), Physics-Informed Neural Networks (PINNs) and Convolutional Neural Networks (CNNs) have been explored across various flood types such as pluvial, fluvial, and dam-break flooding (Bentivoglio et al., 2023, 2025; Cao et al., 2025; do Lago et al., 2023; Pianforini et al., 2024; Sharma & Saharia, 2025; Taghizadeh et al., 2025), which have distinct challenges in terms of accurately capturing flood dynamics. Model selection is typically matched to the specific objectives of the flood prediction task. For example, in large-scale basins where training data is abundant, general-purpose Transformer models and their vision-specific variants (e.g., ViTs) may show strong performance (Pianforini et al., 2024; Sharma & Saharia, 2025). Fluvial and dam-break floods follow directional flow paths that can be naturally represented as graphs, making GNNs effective for modeling spatiotemporal dependencies (Bentivoglio et al., 2023, 2025; Taghizadeh et al., 2025). For pluvial flood modeling, the key challenges lie in the sensitivity to micro-topographic variations and the simultaneous rainfall impact across multiple sub-areas in the catchment, thus highlighting the need for high-resolution prediction. CNN-based models are well-suited for pluvial flood prediction, effectively capturing localized flooding patterns via grid-based spatial inputs and producing pixel-wise water depth outputs (Cao et al., 2025; Wang et al., 2024). Recognizing the match between task requirements and CNN strengths, this study adopts a CNN-based architecture to model urban pluvial floods.
As a refinement of conventional CNNs, U-Net architectures have been increasingly adopted for their ability to enhance the spatial encoding and decoding of flood-related features, and they are now widely regarded as a strong baseline in flood modeling applications (Guo, Leitao, et al., 2021; Löwe et al., 2021; Wang et al., 2024). This architecture enables pixel-wise predictions and has shown strong performance in representing spatial patterns. However, challenges remain in improving prediction accuracy, capturing fine-scale details, and enhancing generalization, all of which are critical for practical applications.
Generative Adversarial Neural Networks (GANs) represent a promising approach for enhancing flood prediction performance (Burrichter et al., 2023; do Lago et al., 2023; Hofmann & Schüttrumpf, 2021) by reducing over-smoothing and preserving fine-scale spatial features that are critical for high-resolution urban flood modeling. GANs consist of two neural networks, a generator and a discriminator, which are trained simultaneously in a competitive way. The generator aims to produce synthetic data that resembles the target distribution, while the discriminator attempts to distinguish between real and generated data. Through this adversarial training process, the generator progressively improves its ability to generate realistic synthetic data, while the discriminator becomes more effective at detecting them. Among GAN-based approaches, Pix2Pix has been widely applied to flood mapping tasks, generating high-resolution flood maps by learning from observed flood patterns and translating them into realistic representations (Burrichter et al., 2023; do Lago et al., 2023; Hofmann & Schüttrumpf, 2021; Isola et al., 2017). For instance, FloodGAN has been developed as a rainfall-driven flood prediction framework, utilizing satellite images as input to capture spatiotemporal rainfall dynamics and their impact on flooding (Hofmann & Schüttrumpf, 2021). Similarly, the DualGAN model has been designed to predict floods in previously unseen catchments, employing dual generator-discriminator pairs: one to identify flooded cells and another to estimate water depths with enhanced precision (do Lago et al., 2023). Another approach integrates graph-based learning with image-to-image translation by combining a Graph Neural Network with Pix2Pix to effectively incorporate complex infrastructure information, such as manhole data, into flood predictions (Burrichter et al., 2023). 
Previous studies on Pix2Pix have primarily focused on refining the generative component by incorporating novel input features, modifying the architecture, or integrating domain knowledge, while the design and limitations of the discriminator have received comparatively less attention.
However, this limited focus on the discriminator poses challenges for flood prediction tasks. In the original Pix2Pix framework, the discriminator is designed for image style transfer and provides feedback based on global image realism, not pixel-level accuracy (Isola et al., 2017). Its loss function treats the entire generated image as either real or fake, regardless of local correctness based on the ground truth. While this strategy promotes global visual consistency in general image translation tasks, the generator may produce outputs that appear realistic but lack numerical fidelity, leading to deviations from true flood depths (Isola et al., 2017). This is particularly problematic for flood prediction, where pixel-wise correctness is important.
To address this gap, this study aims to develop a discriminator-guided GAN model for two-dimensional, high-resolution urban flood prediction by redefining the role of the discriminator to effectively guide the generator rather than competing with it. Compared to the standard Pix2Pix, the new model is improved in the following aspects: (a) proposing a Fully Convolutional Network (FCN) instead of the standard discriminator to keep the spatial dimensions of the input and output the same; (b) introducing a ground-truth-guided discrimination approach that calculates the error between the prediction and the ground truth as the target when the discriminator evaluates the generated flood map. The proposed model is tested on the case study area of Exeter, which covers about 27 km2 with a 2 m spatial resolution. Rainfall events with a 5-min temporal resolution are used for model training and testing. The proposed model is compared with two baseline models, U-Net and Pix2Pix. The experimental results show that the proposed approach is effective in high-resolution spatiotemporal flood simulation, providing real-time prediction for urban flood management.
Methodology
Problem Formulation
The study predicts the flood depth map at time step t + n using static catchment features and dynamic rainfall data from the past m time steps up to the current time t, following the formulation

d(t + n) = f(C, t, R(t), r(t − m), …, r(t))

where d(t + n) is the flood depth map at time t + n, C denotes the static catchment features, R(t) the cumulative rainfall up to t, and r(t − m), …, r(t) the rainfall intensities of the past time steps.
The catchment features are static, including elevation and slope, which remain constant over time. The dynamic rainfall inputs include the current frame index at time t (counted from the onset of rainfall), the cumulative rainfall up to t, and the rainfall intensities from t − m to t (m frames in total). These features are used as input to predict the flood depth at time t + n. The study adopts a single-step sliding-window approach for prediction. By moving the temporal window along each flood event, we extract a series of input-output pairs corresponding to different target time steps. Each sample uses a 30-min rainfall window (m past frames) as input to predict the flood depth at frame t + n. By producing multiple single-step predictions per event, the model can generate a continuous sequence of flood depth maps across the entire flood event timeline.
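The sliding-window sampling described above can be sketched as follows (a minimal illustration, not the authors' code; at the 5-min resolution a 30-min window implies m = 6 past frames, and `lead` plays the role of n):

```python
import numpy as np

def make_samples(rain, m=6, lead=1):
    """rain: 1D array of rainfall intensities for one event (one value per 5-min frame).
    Returns a list of (input_features, target_frame_index) pairs."""
    samples = []
    for t in range(m - 1, len(rain) - lead):
        window = rain[t - m + 1 : t + 1]           # intensities of the m frames ending at t
        cum = rain[: t + 1].sum()                  # cumulative rainfall up to t
        features = np.concatenate([[t], [cum], window])
        samples.append((features, t + lead))       # predict the depth map at frame t + lead
    return samples

event = np.arange(12, dtype=float)                 # toy 1-hr event (12 frames)
pairs = make_samples(event)
```

Sliding the window along the event yields one sample per admissible target frame, so a full event produces the continuous prediction sequence mentioned above.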
Model
Structure
Our proposed model shares a similar structure with Pix2Pix, which is a type of conditional GAN. It consists of a generator and a discriminator. The proposed model employs a U-Net-based generator to produce flood maps, as depicted in Figure 1, utilizing an encoder-decoder framework (Ronneberger et al., 2015). Parameter counts are provided in Appendix C in Supporting Information S1. The data flow of the proposed model is provided in Appendix D in Supporting Information S1.
[IMAGE OMITTED. SEE PDF]
Each 1D rainfall feature (e.g., intensity at a given time step) is broadcast into a 2D matrix to match the spatial dimensions of the DEM and slope inputs. These inputs, that is, the broadcast rainfall features, DEM, and slope, are then stacked along the channel dimension and fed into both the generator and the discriminator. The encoder consists of three double convolutional blocks that extract multi-scale features, from simple geometries to complex environmental attributes. Each block includes convolutional layers with Leaky ReLU activation (Goyal et al., 2020), enhancing the model's ability to learn complex patterns. Max pooling reduces feature map dimensionality, preserving critical information while optimizing computational efficiency. Skip connections concatenate encoder and decoder feature maps, preserving gradients and spatial details. The decoder reconstructs flood maps using transposed convolutions for upsampling, followed by additional convolutional layers to refine predictions. The final convolutional layer outputs a single-channel map representing predicted water depths, ensuring high-fidelity flood extent reconstruction.
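The broadcasting-and-stacking step might look like the following sketch (array names are illustrative, and only a few channels are shown; the actual model input has 10 channels per Table 1):

```python
import numpy as np

H, W = 256, 256
dem = np.random.rand(H, W)
slope = np.random.rand(H, W)
rain_feats = [3.0, 12.5, 0.4]                 # e.g. frame index, cumulative rainfall, one intensity

# Broadcast each scalar rainfall feature to the DEM's spatial size,
# then stack everything along the channel dimension.
channels = [dem, slope] + [np.full((H, W), v) for v in rain_feats]
x = np.stack(channels, axis=0)                # shape: (num_channels, H, W)
```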
This study replaces the standard discriminator with an FCN-based classifier to enhance high-resolution spatial discrimination essential for flood prediction. The FCN classifier maintains the spatial dimensions of input data, enabling detailed grid-level analysis. As shown in Figure 1, catchment and rainfall feature matrices are concatenated with generated or real flood maps to form input pairs, which pass through four double convolutional blocks to learn spatial patterns and evaluate relationships. The output is a matrix providing pixel-wise discrimination, functioning as a classification or regression result depending on the activation function. Retaining the same spatial dimensions as the input, each cell represents classification confidence or regression error of the predicted water depth. This approach improves flood mapping accuracy and supports practical flood risk management by offering detailed spatial insights into potential flooding scenarios.
Loss Function
Loss functions quantify the difference between the model's predictions and the target values, guiding the training process by providing a learning signal for parameter adjustments that minimize the loss. The generator (G) and the discriminator (D) are trained using separate loss functions.
The discriminator is trained to distinguish between real and generated water depth maps on a pixel-wise basis. It takes as input the event features (catchment and rainfall features) together with either the ground-truth water depth or the generated result, and outputs a matrix of probabilities indicating whether each pixel is real (1) or fake (0).
The total loss of the discriminator is defined as the sum of the real pair loss and the fake pair loss:

L_D = L_real + L_fake
Real pair loss (L_real) encourages D to classify all pixels of the real pair as real (i.e., output close to 1):
In Equation 3, the target is a matrix of ones with the same dimensions as the discriminator output, representing the ideal discriminator output for real samples, and the loss is computed with a customizable loss function, such as Binary Cross Entropy (BCE).
Fake pair loss (L_fake) penalizes D when it fails to identify the difference between the generated result and the ground truth.
In Equation 4, the target is a binary mask constructed by thresholding the absolute error between the generated and true depth values:
This pixel-level mask reflects whether the generator's prediction at each pixel is considered acceptable within the threshold. In our experiments, the threshold is set to 0.05 m.
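The error-thresholded discriminator target can be sketched with a simple element-wise comparison (a minimal illustration under the 0.05 m threshold; variable names are ours):

```python
import numpy as np

def discriminator_target(pred, truth, eps=0.05):
    """Pixel-wise target for the discriminator: 1 where the absolute depth
    error is within the threshold (prediction acceptable), 0 otherwise."""
    return (np.abs(pred - truth) <= eps).astype(float)

truth = np.array([[0.00, 0.10], [0.30, 0.80]])
pred  = np.array([[0.02, 0.20], [0.33, 0.79]])
mask = discriminator_target(pred, truth)
```

Unlike the all-zero target of standard Pix2Pix, this mask credits the generator wherever its depths are already close to the ground truth.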
The generator is trained to produce realistic flood depth maps that both resemble the ground truth and are indistinguishable from real samples by the discriminator D. Its loss function comprises two components.
Reconstruction Loss (L_rec) encourages the generator to produce outputs that are close to the ground truth. It is defined as the mean absolute error (MAE) across the spatial dimensions of the predicted flood map:
Adversarial Loss (L_adv) guides the generator to produce outputs that are classified as real by the discriminator.
In Equation 7, the adversarial term uses a custom loss function applied to the discriminator's prediction, typically Binary Cross Entropy (BCE), and λ is a hyperparameter that controls the trade-off between adversarial realism and pixel-wise accuracy.
A larger λ places more weight on reconstructing the ground truth (i.e., minimizing L_rec), leading to results that are visually closer to the reference flood maps. Conversely, a smaller λ gives more emphasis to fooling the discriminator, potentially allowing for more creative or flexible outputs. In the original Pix2Pix and FloodGAN, λ was set to 100 (Hofmann & Schüttrumpf, 2021; Isola et al., 2017), a large value, to prioritize accurate reconstruction, while do Lago et al. (2023) set λ to 5 and 40 in their dual-GAN model, smaller values, to balance adversarial and reconstruction objectives more evenly.
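The λ trade-off can be illustrated with a simplified combined generator objective (a sketch only: the adversarial term is reduced to a pixel-averaged negative log of the discriminator's "realness" output, and all names are our assumptions):

```python
import numpy as np

def generator_loss(pred, truth, disc_out, lam=10.0):
    """Combined objective: adversarial term + lambda-weighted L1 reconstruction."""
    rec = np.mean(np.abs(pred - truth))                    # reconstruction (MAE)
    adv = -np.mean(np.log(np.clip(disc_out, 1e-7, 1.0)))   # push D's output toward "real"
    return adv + lam * rec

pred  = np.array([0.10, 0.20])
truth = np.array([0.12, 0.18])
d_out = np.array([0.9, 0.8])   # discriminator's per-pixel "realness" for the prediction

loss_small = generator_loss(pred, truth, d_out, lam=1.0)
loss_large = generator_loss(pred, truth, d_out, lam=100.0)
```

With a fixed prediction error, increasing λ inflates the reconstruction contribution relative to the adversarial one, which is exactly the prioritization described above.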
In this study, we compare two different formulations for the custom loss function: the Binary Cross Entropy (BCE) loss and a masked L1 loss.
In the following, the target denotes the ground-truth labels for the discriminator's output (all ones for real samples, or the threshold-based mask for generated samples), and the prediction denotes the discriminator's output.
BCE penalizes misclassification of each pixel independently:
Equation 8 has been widely used in classification tasks and provides probabilistic supervision across the full prediction domain (Semenov et al., 2019).
The second formulation follows the masking strategy introduced in the UFLOOD model (Löwe et al., 2021), which focuses the loss computation on flooded areas only, avoiding bias toward non-flooded regions that dominate the spatial domain.
A binary mask is first constructed using a predefined threshold to identify pixels that are considered flooded in either the prediction or the ground truth:
Only the values in positions where the mask is active (i.e., flooded areas) are selected:
The final masked L1 loss is computed as:
In Equation 9, the denominator is the total number of valid (flooded) pixels as determined by the mask. In our implementation, the threshold is set to 0.05 m across all flood events and time frames. This effectively excludes shallow or negligible flood depths, reducing the bias caused by large non-flooded areas dominating the loss.
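A minimal implementation sketch of this masked L1 loss, assuming the 0.05 m threshold described above (names are illustrative):

```python
import numpy as np

def masked_l1(pred, truth, thresh=0.05):
    """L1 loss averaged only over pixels flooded in either the prediction
    or the ground truth; returns 0 when no pixel qualifies."""
    mask = (pred > thresh) | (truth > thresh)
    n = mask.sum()
    if n == 0:
        return 0.0
    return float(np.abs(pred - truth)[mask].sum() / n)

truth = np.array([0.00, 0.10, 0.40, 0.01])
pred  = np.array([0.00, 0.00, 0.50, 0.02])
loss = masked_l1(pred, truth)
```

Here only two of the four pixels enter the average, so dry regions cannot dilute the loss signal from flooded areas.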
Differences From Standard Pix2Pix
This study made flood-specific adaptations to the structure and loss function of the discriminator, based on the standard Pix2Pix framework.
A previous study explored an FCN structure (i.e., a 1 × 1 PixelGAN classifier) by reducing the depth of the discriminator, concluding that it did not affect fine details (Isola et al., 2017). This is because simply reducing the number of neural network layers reduces the network's learning capacity. As illustrated in Figure 1, this study proposes a deeper FCN structure to achieve fine-grid discrimination, giving it more parameters to learn the complexity of fine details. This structure also serves as a prerequisite for the ground-truth-guided discrimination in Equation 5.
The standard Pix2Pix defines all parts of generated images as fake, that is, the discriminator target is a constant all-zero matrix, even if the generated images are identical to the targets (Isola et al., 2017). In this way, the losses of the discriminator and the generator (Equations 4 and 7) are contradictory, that is, one falls as the other rises. The generator and the discriminator can therefore maintain mutual, ongoing adversarial training, which guarantees the model's diversity and avoids model collapse. Training stops when the losses (Equations 4 and 7) converge over a certain number of epochs. However, a constant all-zero matrix in Equation 4 means a complete rejection of the generated results. This can mislead the generator, especially when it produces the desired results, leading to destructive competition (Sajeeda & Hossain, 2022; Saxena & Cao, 2021).
Compared with the standard Pix2Pix, this study calculates the discriminator target based on the error, avoiding a complete rejection of the generated results; we call this ground-truth-guided discrimination (Equation 5). In this way, Equations 4 and 7 are no longer contradictory, which implies that the loss values of the two adversaries can theoretically decrease to zero simultaneously, bringing the adversarial game to an end and pointing to deterministic results. This design enables the discriminator to focus more on learning flood patterns rather than merely outperforming the generator, leading to a more stable decline in the loss function and making it better suited for flood prediction.
Criteria
Root mean square error (RMSE), Coefficient of Determination (R2), and Nash-Sutcliffe Efficiency (NSE) were used in this study. In the following, N denotes the number of cells (pixels), and y_i and ŷ_i represent the target and predicted flood water depth of the i-th pixel.
In Equation 14, NSE is calculated for each location individually, using the target and predicted water depths at that location over all time steps and the mean target value at that location. In Equation 15, R2 is used to evaluate the overall agreement between targets and predictions across all locations, using the mean target value across all locations. Equations 14 and 15 share a similar mathematical expression. They range from −∞ to 1, with higher values indicating a better model fit. In this study, NSE evaluates the predictive accuracy of flood depth at individual pixels over time, focusing on the model's ability to capture the temporal dynamics of flood events at specific locations. In contrast, R2 assesses the overall performance of the model across the entire data set, measuring how well the predicted flood depths correspond to observed values across all pixels. This dual approach provides a comprehensive evaluation, balancing local accuracy with broader predictive capability.
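Both metrics share the 1 − SSE/SST form and differ only in the averaging domain, which can be sketched as follows (illustrative code, not the authors' implementation):

```python
import numpy as np

def nse(target_ts, pred_ts):
    """NSE for one location over its time series."""
    sse = np.sum((target_ts - pred_ts) ** 2)
    sst = np.sum((target_ts - target_ts.mean()) ** 2)
    return 1.0 - sse / sst

def r2(target_all, pred_all):
    """R^2 over all locations and times pooled together."""
    t, p = np.ravel(target_all), np.ravel(pred_all)
    return 1.0 - np.sum((t - p) ** 2) / np.sum((t - t.mean()) ** 2)

t = np.array([0.1, 0.2, 0.3, 0.4])
perfect = nse(t, t.copy())          # exact predictions give NSE = 1
```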
The error of a single cell, defined below, is used for straightforward visualization of flood maps:
Equations 13–16 are sensitive to extreme values. As water depths vary significantly across different rainfall events, comparing these values is meaningful only for the same rainfall event.
Precision, Recall, and Critical Success Index (CSI) were introduced to evaluate model performance. Let h denote a threshold that converts water depths to binary values: a water depth higher than h is flooded (positive); otherwise, it is non-flooded (negative). The prediction results are therefore classified into True Positive (TP), False Positive (FP), True Negative (TN), and False Negative (FN).
Precision measures the quality of positive predictions, that is, the proportion of predicted flooding cells that are correct, while Recall assesses the completeness of positive predictions, that is, the proportion of true flooding cells correctly identified. CSI combines both Precision and Recall into a single metric. Equations 17–19 range from 0 to 1, with higher scores indicating better model performance. When the denominator is zero, this study assigns a score of zero. In previous studies, a threshold of 0.05 m was commonly used, where a depth greater than 0.05 m is considered flooded, and 0.3 m is regarded as moderate water depth (do Lago et al., 2023; Environment Agency, 2019; Löwe et al., 2021).
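These threshold-based metrics, including the zero-denominator convention described above, can be sketched as follows (illustrative names, using the 0.05 m flooded/non-flooded threshold):

```python
import numpy as np

def csi_metrics(pred, truth, thresh=0.05):
    """Precision, Recall, and CSI on binarized flood maps; a zero
    denominator yields a score of zero, as in the study."""
    p, t = pred > thresh, truth > thresh
    tp = np.sum(p & t)
    fp = np.sum(p & ~t)
    fn = np.sum(~p & t)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    csi = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
    return precision, recall, csi

truth = np.array([0.00, 0.10, 0.40, 0.20])
pred  = np.array([0.00, 0.06, 0.30, 0.01])
precision, recall, csi = csi_metrics(pred, truth)
```

Note that CSI is bounded above by both Precision and Recall, since its denominator adds FP and FN together.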
Most natural rainfall events do not lead to flooding in the early stage, or they result in only minor flooding, which implies that the number of TPs is generally zero or very small. However, Equations 17–19 are sensitive to the number of TPs. Consequently, Accuracy is introduced as a complementary metric, as below,
Equation 20 ranges from 0 to 1, with higher scores indicating good performance on both TP and TN.
Case Study
Case Study Area
The city of Exeter, located in the Southwest of England, was selected as the study area, covering approximately 27 km2 with a spatial resolution of 2 m. Situated on the wide floodplain and estuary of the River Exe, the city's elevation ranges from 0 to 162 m, with higher terrain in the north and lower areas in the south, as shown in Figure 2a. Due to this topographical variation, Exeter is prone to frequent flooding. Historical extreme rainfall events in summer and winter have led to severe floods in the past two decades. Furthermore, research indicates that the frequency of extreme rainfall events is expected to increase due to climate change (Kew et al., 2024), highlighting the need for rapid spatiotemporal flood mapping tools. This study used elevation and slope derived from a Digital Elevation Model (DEM) as input features (Environment Agency, 2023; Li et al., 2025), according to the feature importance reported in previous research (Khosravi et al., 2019; Löwe et al., 2021), as shown in Figures 2b and 2c.
[IMAGE OMITTED. SEE PDF]
Rainfall Events
This study included 44 rainfall events in total: 26 synthetic rainfall events and 18 natural rainfall events. The synthetic rainfall events, with durations ranging from 1 to 3 hr, were generated using return periods from 10 to 1,000 years, which aligns with the UK standard based on the handbook published by the UK Environment Agency (Environment Agency, 2019; Faulkner, 1999). The historical rainfall events were obtained from the inventory of Met Office and Devon County Council (Devon County Council, 2024; Met Office, 2024). We collected radar data from Met Office Nimrod System (Met Office, 2003). For each selected event, we extracted rainfall data from the first 6 hr following the onset of rainfall and conducted a 6-hr simulation accordingly. We applied a spatially uniform rainfall intensity over the study area with a temporal resolution of 5 min for both natural and synthetic rainfall events. Radar data was used solely to compute the area-averaged rainfall intensity.
Ground Truth Flood Water Depth
This study utilized simulation outputs from CADDIES (Guidolin et al., 2016), a conceptual flood mapping model, as the ground truth (targets). For each rainfall event, flood maps were generated at 5-min interval over a 6-hr period, starting from the onset of rainfall. The input DEM and rainfall data used in CADDIES were the same as those employed in the proposed model.
Data Augmentation and Normalization
Data augmentation is an essential and common approach to reduce overfitting for small data sets in deep learning. This study employed random cropping to increase the number of spatiotemporal samples and enhance the diversity of the training set. Furthermore, convolutional neural networks (CNNs) inherently face challenges in effectively processing information at the corners and edges of the input matrix (Innamorati et al., 2020; Islam et al., 2024). Various regions can be centered within the cropped patches by using random cropping rather than fixed patches. This approach allows the model to learn spatial patterns more comprehensively while mitigating boundary effects.
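The random-cropping augmentation might look like the following sketch, where the same patch location is applied to inputs and targets so spatial correspondence is preserved (sizes and names are our assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_crop(arrays, size=256):
    """Crop the same random patch from each array in `arrays`."""
    h, w = arrays[0].shape[-2:]
    top = rng.integers(0, h - size + 1)
    left = rng.integers(0, w - size + 1)
    return [a[..., top : top + size, left : left + size] for a in arrays]

dem = np.zeros((1024, 1024))
depth = np.zeros((1024, 1024))
dem_patch, depth_patch = random_crop([dem, depth])
```

Because the patch location is re-drawn for every sample, any region of the catchment can end up at the patch center, which mitigates the CNN boundary effects mentioned above.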
The min-max normalization method was applied to the data set to unify the feature scales and improve convergence (do Lago et al., 2023; Li et al., 2021).
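A minimal min-max normalization sketch (fitting the statistics on the training set only is our assumption; the paper does not specify the fitting domain):

```python
import numpy as np

def minmax(x, lo, hi):
    """Rescale x to [0, 1] using precomputed min/max statistics."""
    return (x - lo) / (hi - lo)

train_dem = np.array([2.0, 50.0, 162.0])   # toy elevation values
scaled = minmax(train_dem, train_dem.min(), train_dem.max())
```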
Data Set
This study divided synthetic and natural rainfall events into training and test sets at an 8:2 ratio, maintaining the same proportion for both types of events. Two experiments were conducted:
- Experiment 1: To evaluate the model's stability, 5-fold cross-validation was performed. Each fold comprised training and validation events. Data augmentation generated 8,000 samples from the training events to train the model, while the validation events assessed the model's ability to learn common patterns across events.
- Experiment 2: Data augmentation was applied to all training events to produce 10,000 samples, which were split into training and validation sets at an 8:2 ratio. The validation set was used solely to monitor the trend of loss values.
Test events remained unseen during both the training and validation stages. Rainfall intensities for Experiments 1 and 2 are provided in Appendix E in Supporting Information S1.
Model Settings
The models were implemented using PyTorch within the Python 3.6 environment and trained on an NVIDIA A100 GPU with 40 GB of memory. Hyperparameters were configured based on baselines established in previous studies, with a kernel size of 3, pooling size of 2, stride of 1, and batch size of 32 (Löwe et al., 2021). The Adam optimizer was employed with an initial learning rate of 0.0001, allowing for adaptive adjustments (Kingma, 2014). Each model was trained for 200 epochs, and the epoch corresponding to the lowest validation loss was evaluated. The study compared four models (Li et al., 2025): the U-Net baseline (U-Net), the Pix2Pix baseline (Pix2Pix), the proposed model with the BCE loss function (BCE), and the proposed model with the masked loss function (MSK). The model settings are shown in Table 1. The discriminator target matrix differs across models: for Pix2Pix, it is an all-zero matrix, following the standard Pix2Pix design, while BCE and MSK adopt the proposed ground-truth-guided discrimination method described in Equation 5. The parameter λ controls the balance of the two objectives (Equation 7). The study set λ = 100 for Pix2Pix, consistent with the standard model, to ensure a fair comparison, as this choice encourages the model to better reflect the input contextual data. For BCE and MSK, the study set λ = 10, as the logic of discrimination was changed by the new target, so similar contributions of the two objectives are expected. Hyperparameter tuning of λ is provided in Appendix A in Supporting Information S1.
Table 1 Setting for Models
| Model name | Generator input | Discriminator output | Discriminator target | λ | Discriminator loss function |
| U-Net | 256*256*10 | Not applicable | Not applicable | Not applicable | Not applicable |
| Pix2Pix | 256*256*10 | 32*32*1 (classification) | All zeros | 100 | BCE |
| BCE | 256*256*10 | 256*256*1 (classification) | Error-based mask | 10 | BCE |
| MSK | 256*256*10 | 256*256*1 (regression) | Error-based mask | 10 | Masked L1 |
Results and Discussion
Model Comparison
In this section, the models were trained on the entire training data set with a randomly split train-validation set. Training losses were recorded at each epoch, while validation losses were evaluated every 10 epochs.
The loss values of models are presented in Figure 3. For U-Net, the training and validation loss followed a similar trend, indicating that the model fitted well. However, a noticeable error was present at each epoch, suggesting that the patterns learned from the training data were partially applicable to the validation set. In the case of Pix2Pix, an unstable training process was observed, characterized by intense adversarial interactions between the generator and the discriminator. There was a significant discrepancy between the trends of the training and validation losses, as well as a large gap between their values, indicating that the patterns learned by the model were limited. The lowest validation loss was observed at the 150th epoch. Extending the training epochs for the Pix2Pix may enhance performance, but this would require additional computational resources.
[IMAGE OMITTED. SEE PDF]
For BCE, the training losses decreased steadily during the first 50 epochs, as the optimization objectives of the two adversaries were similar. Between the 50th and 60th epochs, the discriminator's loss continued to decrease while the generator approached a local optimum and was subsequently corrected by the discriminator. From the 60th to the 80th epoch, the generator and discriminator engaged in intense adversarial interactions, as the generator was still driven by the task of defeating the discriminator, pushing itself to generate higher-quality outputs. After the 80th epoch, the validation loss started to grow, meaning that the discriminator began to overfit because the generator learned patterns to outcompete the discriminator. For MSK, the overall adversarial intensity is higher than that of BCE but lower than that of Pix2Pix. This is because the discriminator loss function was calculated based on flooded cells rather than all cells, that is, it was grounded in the targets but with a different focus from the generator. Both losses exhibited an overall downward trend with mild fluctuations, eventually stabilizing, demonstrating the effectiveness of the proposed approach. Furthermore, neither of the validation losses showed overfitting, indicating the robustness of the model.
From Figure 3, it can be observed that the two models proposed in this study outperform Pix2Pix in terms of stability. This is because the ground-truth-guided discrimination ensures that the two adversaries share similar optimization objectives, which in turn reduces the variability of the generator's outputs. Moreover, compared to U-Net, the generator's training and validation losses follow more similar trends, indicating that most of the patterns learned during training generalize to the validation set. Therefore, the proposed approaches are effective.
Figure 4 illustrates the mean performance of the models. The performance of the Pix2Pix model differed significantly from that of the other three models, indicating that the standard Pix2Pix model is not directly applicable to flood prediction. The BCE and MSK models, with their higher metrics, validated the effectiveness of the proposed modifications, which notably enhanced the Pix2Pix model's results in flood forecasting. The BCE model also dominated the U-Net model, demonstrating that adaptations to the discriminator alone were sufficient to surpass U-Net in performance, although the three models other than Pix2Pix exhibited broadly similar performance overall, with no substantial differences in efficacy. From the 1-hr to 6-hr forecasts of rainfall events, the precision, recall, accuracy, CSI, and R2 metrics for all models showed an initial increase followed by stabilization. Because the study focused on rainfall events lasting 1 to 3 hr, this suggests that all models had limited capability in handling the rising stage of these events, but that their predictive performance improved significantly as the events progressed. Notably, the low precision observed during the first 2 hr reflects the inherent challenges of early flood prediction. During the early stages of rainfall, especially for natural rainfall patterns, flooded areas were rare, resulting in a higher number of true negative samples. This was consistent with expectations and demonstrated that the models correctly identified non-flood conditions even while struggling to detect the few actual flooded cells at this stage. In summary, the proposed enhancements are suitable for flood prediction, demonstrating good predictive performance across the evaluated metrics. Model performance variability across random seeds is provided in Appendix B in Supporting Information S1.
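For reference, the detection metrics reported here (precision, recall, CSI, and accuracy) can all be derived from a binary flood mask at a depth threshold. The sketch below is a plain NumPy version under assumed conventions (cells at or above the threshold count as flooded); the actual evaluation code may differ in edge-case handling.

```python
import numpy as np

def flood_metrics(pred_depth, true_depth, threshold=0.05):
    """Binary flood-detection metrics at a depth threshold (m)."""
    p = np.asarray(pred_depth) >= threshold   # predicted flooded cells
    t = np.asarray(true_depth) >= threshold   # observed flooded cells
    tp = np.sum(p & t)        # hits
    fp = np.sum(p & ~t)       # false alarms
    fn = np.sum(~p & t)       # misses
    tn = np.sum(~p & ~t)      # correct negatives
    precision = tp / max(tp + fp, 1)
    recall = tp / max(tp + fn, 1)
    csi = tp / max(tp + fp + fn, 1)   # Critical Success Index
    accuracy = (tp + tn) / p.size
    return precision, recall, csi, accuracy
```

Because the CSI denominator excludes true negatives, it is far less inflated than accuracy when, as in the early rainfall stages discussed above, most cells are dry.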
Mean ± standard deviation of model scores for maximum water depth prediction across 13 test rainfall events is provided in Appendix G in Supporting Information S1.
[IMAGE OMITTED. SEE PDF]
Figure 5 compares the errors of BCE and Pix2Pix for the 3-hr 100-year rainfall event. Pix2Pix exhibited both substantial overestimation and underestimation, consistent with the observations in Figure 4. Moreover, at any given time and location during the flood event, the model produced highly inconsistent predictions, preventing a clear determination of whether it had effectively learned spatial or temporal patterns. The model also failed to maintain a consistent prediction pattern for certain areas, alternately overestimating and underestimating at successive time steps, making it unsuitable for practical deployment. In contrast, the BCE model exhibited relatively smaller errors. For a small subset of regions, BCE consistently underestimated the depth across all time steps, indicating a systematic bias affecting its accuracy. However, such a bias is more predictable, and potentially correctable, than erratic fluctuations. Overall, the proposed method demonstrated feasibility for flood prediction and significantly improved over standard models.
[IMAGE OMITTED. SEE PDF]
Figure 6 illustrates the pixel-wise NSE differences between the BCE and U-Net. Overall, while BCE shows a slight visual advantage, neither model clearly dominates the other, which is consistent with the conclusion drawn from Figure 4. The proposed discrimination method changes the U-Net's prediction behavior, as reflected in the spatial distribution of NSE difference. Regions where BCE outperforms are more spatially scattered, indicating increased sensitivity to fine-grained spatial features. As seen in subplots A and B of Figure 6, BCE produces more accurate predictions than U-Net for small structures with clear boundaries, such as roads, buildings, and channels, suggesting that the proposed discriminator has a stronger ability to capture spatial structural details. However, in some low-lying areas with flat terrain where runoff tends to accumulate, such as subplots C and D, the proposed model produces less accurate predictions than U-Net. These areas lack distinct spatial structural cues, making it difficult for the discriminator to utilize its advantage. In summary, introducing the proposed discriminator enables the model to focus on more realistic spatial details. However, in regions where such details cannot be effectively exploited for prediction, the model offers limited advantages. Pixel-wise NSE of BCE and U-Net under 3-hr 100-year event is provided in Appendix F in Supporting Information S1.
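The pixel-wise NSE maps compared in Figure 6 can be computed by evaluating the Nash–Sutcliffe efficiency independently for each cell's depth time series. The following is a minimal NumPy sketch (array shapes and the NaN convention for zero-variance cells are assumptions, not the paper's exact post-processing).

```python
import numpy as np

def pixelwise_nse(pred, obs):
    """Nash-Sutcliffe efficiency per pixel over time.

    pred, obs: arrays of shape (T, H, W) of water depths.
    Returns an (H, W) NSE map; cells whose observed series has zero
    temporal variance yield NaN (division by zero is suppressed).
    """
    obs_mean = obs.mean(axis=0)                    # temporal mean per cell
    num = ((obs - pred) ** 2).sum(axis=0)          # squared error
    den = ((obs - obs_mean) ** 2).sum(axis=0)      # observed variance term
    with np.errstate(divide="ignore", invalid="ignore"):
        return 1.0 - num / den
```

Differencing two such maps (NSE of BCE minus NSE of U-Net) yields the spatial comparison shown in Figure 6.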
[IMAGE OMITTED. SEE PDF]
Overall Performance of the BCE
This study applied 5-fold cross-validation to assess the generalization capacity and stability of the model. Figure 7 illustrates the CSI scores of the models trained on the five folds with thresholds of 0.05 and 0.3 m. Overall, the models exhibited similar prediction behavior across folds, that is, low scores in the early phase and stable, high scores during the receding phase, indicating that the model consistently learned the underlying flood dynamics from different events. Among the five folds, Folds 2–5 showed comparable performance, whereas Fold 1 underperformed, which may be attributed to the characteristics of the rainfall events it contained. Natural rainfall patterns are inherently irregular, and in our case, only 18 rainfall events were available. This limited sample size increases the likelihood that one fold includes anomalous or less representative rainfall patterns, leading to performance variations. Given the consistent results from the other four folds, such variability is expected and acceptable in small-sample data sets. Notably, Fold 1 showed a CSI increase of approximately 10% when the threshold increased from 0.05 to 0.3 m, suggesting that the model still captured meaningful patterns for moderate flood depths despite uneven rainfall conditions. In contrast, this growth was less pronounced in the other folds, implying stable performance across different thresholds. In summary, the 5-fold evaluation confirms that the model is both effective and robust across varying rainfall inputs.
[IMAGE OMITTED. SEE PDF]
Figure 8 shows boxplots of precision, recall, CSI, and accuracy for identifying flooded areas (threshold t = 0.05 m) at each time step across seven synthetic events (left) and five natural events (right). The precision, recall, and CSI scores were relatively low in the early stages of the rainfall events because of the limited number of flooded cells. In certain natural events, these scores were 0 in the beginning steps while the accuracy score was 1, indicating that floods had not yet occurred and the models correctly predicted all true negatives (non-flooded areas). As cumulative rainfall increased, the number of flooded cells grew and flood depths rose, gradually improving the scores. However, fluctuations can be observed in the middle stages due to variations in rainfall, especially for natural events. Eventually, the scores stabilized at higher values and remained steady during the receding phase of the flood. In addition, synthetic rainfall events exhibited greater stability than natural rainfall events. The median and mean values for synthetic rainfall were close, indicating that outliers had a limited impact on the average. In contrast, a noticeable difference between the median and mean values for natural rainfall events suggests lower stability, attributable to their irregular temporal patterns. In summary, the overall performance on the four metrics above is acceptable, and the proposed model can effectively learn the spatiotemporal dynamics of pluvial flooding within the trained areas. However, the model performed differently in the rising and receding phases, suggesting that the patterns of these two phases differ: during the receding phase, the model mainly learned the pattern of the underlying surface conditions, whereas the rising phase additionally required consideration of rainfall patterns, which was particularly challenging for irregular natural rainfall events.
[IMAGE OMITTED. SEE PDF]
Figure 9 shows the normalized 2D histogram of maximum flood depth with the 30-min lead time across the test rainfall events. The predictions were reasonable for most cells, and thus high R2 values were observed in these events. Errors were minor for water depths lower than 1.2 m, and accurate prediction within this range can help recognize various levels of impact on the city. The predictions were unstable for water depths over 1.2 m, and different patterns were observed between synthetic and natural rainfall events. The model tended to overestimate water depth for synthetic events, which was acceptable because the impact was similar once the flood depth exceeded the 1.2 m threshold. In contrast, some cells exhibited underestimation during natural rainfall events, as their irregular rainfall patterns made prediction more challenging. In addition, water depths deeper than 1.2 m typically appear in stream channels, where floodwaters converge from a broader spatial extent. However, our model processes inputs as independent 256 × 256 patches, which limits the spatial context available to the network. This patch-wise processing prevents the model from capturing upstream-downstream relationships and large-scale convergence patterns, leading to degraded performance in deep-water areas.
[IMAGE OMITTED. SEE PDF]
Performance of the BCE on Specific Events
The 3-hr 100-year rainfall event was chosen to demonstrate the flood progression in Figure 10. The model effectively captured both rising (1st–3rd hour) and receding (3rd–6th hour) phases, demonstrating its ability to predict flood dynamics across diverse terrain conditions. For instance, the higher-elevation eastern area experienced temporary water accumulation, while the lower southwestern region near the riverbank retained water longer, leading to prolonged inundation.
[IMAGE OMITTED. SEE PDF]
To show spatial variations in flood progression, six flood-prone points in the East and Southwest were selected for comparison under two lead times and two rainfall events in Figure 11. The flood response varied, with points in the East (A, B, C) receding quickly after rainfall ceased, while those in the Southwest (D, E, F) retained water longer. Since rainfall conditions were uniform, catchment characteristics were the primary drivers of these differences, suggesting the model successfully learned spatial patterns. The model performed better for synthetic events than natural rainfall due to the more structured patterns of synthetic events. The NSE for the synthetic event was higher with a 5-min lead time than with 30 min, except at D. The visualized result shows that D exhibits an overall underestimation rather than a completely incorrect pattern, indicating the model captures useful information but retains a noticeable bias, likely due to unresolved hydrodynamic complexities or terrain-driven flow variations. In the natural event, NSE values were similar for both lead times, but the 5-min lead time produced smoother predictions, lacking steep increases. Overall, the 5-min lead time provided more reliable predictions.
[IMAGE OMITTED. SEE PDF]
Computing Time
In this study, CADDIES simulations averaged 18,700 s for a 6-hr (72-frame) simulation using an NVIDIA RTX 3070 GPU. However, the computation time was highly sensitive to rainfall accumulation and duration. For example, simulating a rainfall event with a total rainfall depth of 5 mm on 3 January 2014, which did not cause flooding, took only 90 s. In contrast, a 6-hr rainfall event with a total rainfall depth of 34 mm on 9 May 2023, required 34,635 s to complete due to the increased computational complexity associated with modeling flood dynamics.
On the other hand, the deep learning-based BCE model exhibited consistent computational times across all events, requiring an average of 3,480 s for a 60-frame simulation on the same NVIDIA RTX 3070 GPU. When run on an NVIDIA A100 GPU, the simulation time was reduced significantly to 150–180 s, averaging approximately 3 s per frame. Unlike CADDIES, the BCE model supports direct frame prediction, enabling it to predict the flood depth at any specific time point within a 6-hr event based on the spatiotemporal context available up to the current time, rather than simulating the process sequentially from the initial state. This capability improves efficiency by eliminating the dependency on prior frames for subsequent predictions.
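The contrast between CADDIES' sequential rollout and direct frame prediction can be illustrated schematically. The interfaces below are hypothetical placeholders, not the actual model APIs; they only show why a direct predictor can produce any requested frame in a single call.

```python
def sequential_simulate(step, state0, n_frames):
    """Sequential simulator: frame k cannot be computed without
    first computing frames 0..k-1 (CADDIES-style rollout)."""
    state = state0
    for _ in range(n_frames):
        state = step(state)
    return state

def direct_predict(model, rain_history, static_features, target_frame):
    """Direct frame prediction: the requested frame is produced in one
    forward pass, conditioned on the rainfall context up to the current
    time and static catchment features."""
    return model(rain_history, static_features, target_frame)
```

With the sequential scheme, the cost of reaching frame 60 scales with 60 step evaluations, whereas the direct scheme pays one forward pass per requested frame regardless of its position in the event.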
Discussion
The study showed that the proposed models achieved higher evaluation scores and temporal consistency compared to the standard Pix2Pix. This was attributed to the adaptation of the proposed discrimination approach. The ground-truth-guided discriminator better fit flood-related tasks, whereas the discriminator in Pix2Pix primarily focused on visual smoothness. For tasks governed by physical principles, such as flood prediction, designing discriminators that reflect task-specific semantics could lead to better guidance and improved model performance. Furthermore, the improved temporal consistency achieved by this approach enhanced the reliability of flood forecasts over time, which was crucial for real-world early warning systems.
Through multiple training runs, the proposed BCE model achieved slightly better performance than the baseline U-Net. It also induced noticeable changes in prediction behavior, with the model paying more attention to flood details, indicating the effectiveness of the proposed discrimination approach. This performance improvement may be attributed to the use of a simple FCN-based discriminator. Given its effectiveness even in this basic form, the discriminator remains a promising component for further refinement; more expressive designs may enhance its guidance for the generator.
The proposed model directly predicted future flood depths at fixed lead times based solely on historical rainfall and static catchment features. This design bypassed the need for step-by-step forecasting of hydrological variables, thereby enabling flexible and jump-ahead prediction. Such capability would be advantageous in time-sensitive scenarios where traditional recursive models were constrained by temporal continuity. As a result, the proposed model effectively offers a simplified alternative for practical flood early warning applications.
The results showed consistency with existing research. For example, Wang et al. and Cao et al. employed a single-step sliding window approach to predict floods and observed that model scores were generally lower during the early stages of rainfall, and higher in the later stages (Cao et al., 2025; Wang et al., 2024). They attributed this phenomenon to the different phase characteristics between rainfall and water level response. We further suggest that, from a numerical perspective, data imbalance in the early rainfall stage is another important factor contributing to weaker model performance. In addition, the numerical results exhibited similar trends and error levels to those reported in related studies, even within a broader context of flood modeling (Bentivoglio et al., 2023, 2025; Pianforini et al., 2024; Taghizadeh et al., 2025). While the overall results aligned with prior findings, this study employed a city-scale modeling setup with a higher spatial resolution, which helped capture and effectively discriminate finer variations in flood depth and thus provides a potential advantage.
Compared with multi-step prediction or auto-regressive methods (Bentivoglio et al., 2023, 2025; Pianforini et al., 2024; Taghizadeh et al., 2025), our results exhibited a distinct temporal trend. In previous studies, multi-step models tended to achieve higher accuracy at the initial prediction steps, but their performance degraded over time due to accumulated errors. In contrast, the single-step sliding window approach is not affected by error propagation from previous steps (Cao et al., 2025; Wang et al., 2024). Given its independence from prior predictions, this approach aligns better with our objective of reducing temporal error accumulation in continuous forecasting.
A major limitation of the proposed model lies in the spatial generalization capability, which means the proposed model needs to be trained for each unseen site. This limitation was a recognized challenge across most existing data-driven flood prediction models. Studies attributed this issue to architectural limitations of U-Net (Bentivoglio et al., 2022; do Lago et al., 2023), which relied on patch-based sampling and lacked a global receptive field over the whole catchment, making it unable to perceive incoming runoff from surrounding areas or the outflow beyond the patch. This restricted its capacity to capture large-scale flood propagation patterns. In addition, the limited size of the training data set was a key factor constraining spatial generalization, as it reduced the diversity of hydraulic conditions to which the model was exposed during learning (Bentivoglio et al., 2025; Wang et al., 2024; Xu et al., 2025). Consequently, the model struggled to learn different hydraulic dynamics and failed to generalize to unseen domains. Zero-shot testing revealed such limitations, especially in scenarios involving significant differences in flood volume, terrain size, and elevation, for example, the model failed to handle unseen hydraulic behaviors, such as trapping water without further propagation in sloped basins, which were not represented in the training data (Bentivoglio et al., 2025). From a machine learning perspective, the limited size and coverage of training data set led to a non-i.i.d. (independent and identically distributed) learning problem. Although the labels were consistently generated using conceptual or hydrodynamic models, differences in catchment characteristics and local flood severity still led to distributional shifts between training and unseen domains. 
A commonly adopted solution is transfer learning, where a small amount of data from the target catchment is used to fine-tune the pre-trained model (Cache et al., 2024; Seleem et al., 2023; Xu et al., 2025). It is also worth noting that even physics-based models typically require site-specific calibration prior to deployment. In this context, transfer learning with fine-tuning represents a practical strategy for improving the adaptability of data-driven models.
Future research can build upon this study in several key areas to further enhance flood prediction accuracy and robustness. First, the proposed discrimination method is theoretically compatible with existing advancements in deep learning models. Future studies could integrate it into other models to evaluate whether it brings additional performance improvements. Second, multi-step prediction can be considered depending on the specific application scenarios. For instance, while multi-step prediction introduces a risk of cumulative errors, single-step prediction may become less efficient in large urban areas, as it requires multiple iterations to generate a complete sequence of flood maps. Third, regarding the model's generalization capacity, future work may focus on incorporating features with broader spatial context, such as topographic wetness index (TWI), flow accumulation, or hydrological connectivity. Alternatively, transfer learning could be employed to fine-tune the model for improved adaptability to different regions. In addition, the spatial resolution of rainfall input is an important factor in flood dynamics. In future work, it is necessary to incorporate such inputs to better reflect the spatial variability of real precipitation events. Finally, as the model is trained to learn patterns from CADDIES, a non-physics-based model, its outputs inevitably reflect the limitations of the original simulator. This constrains physical interpretability and may affect generalizability. While suitable for rapid, scenario-based applications, the model is not intended for physically rigorous analyses. Future work will focus on extending the approach to physics-based models to enhance realism and robustness.
Conclusion
The study proposed a discriminator-guided approach that evaluates prediction quality based on the error between model outputs and target water depths.
The main contributions are as follows:
- The discriminative function of the standard Pix2Pix model is redefined by introducing a discriminator-guided approach. It repositions the discriminator as a mentor rather than a competitor, fostering a more stable training process and enabling the generator to produce highly deterministic flood maps. Such a role shift provides valuable insights for tasks governed by physical principles, such as hydrological forecasting and environmental modeling.
- The model exploits the strengths of single-step sliding-window prediction, allowing it to estimate the flood depth at any specific time point within a 6-hr event based on the spatiotemporal context available up to the current time. Such capability is advantageous in time-sensitive scenarios where traditional recursive models are constrained by temporal continuity. As a result, the proposed model offers a simplified alternative for practical flood early warning applications.
- This study uses Exeter as a city-scale case study to provide an initial validation of the proposed method. Comparative experiments with standard Pix2Pix and U-Net demonstrate the effectiveness of our approach. In addition, parameter tuning of key components was conducted, offering useful references for future research.
Experimental results demonstrated that, compared to Pix2Pix, the proposed discriminator effectively suppressed visually plausible but numerically inaccurate outputs, resulting in more physically consistent flood depth estimations. Compared with U-Net, the model also better captured localized flood patterns. Additionally, it demonstrated the ability to predict near-future flood depths without requiring explicit future rainfall as input.
These results suggest that the model may serve as a lightweight alternative for short-term flood forecasting in settings with available input data. Though not designed to replicate full physical processes, it may assist in operational forecasting tasks where efficiency is prioritized.
Acknowledgments
Zhufeng Li is a PhD student sponsored by the China Scholarship Council (202208440028) at the University of Exeter.
Conflict of Interest
The authors declare no conflicts of interest relevant to this study.
Data Availability Statement
The catchment data, proposed model code, and rainfall data are available from Li et al. (2025).
Aderyani, F. R., Jafarzadegan, K., & Moradkhani, H. (2025). A surrogate machine learning modeling approach for enhancing the efficiency of urban flood modeling at metropolitan scales. Sustainable Cities and Society, 123, 106277. https://doi.org/10.1016/j.scs.2025.106277
Bentivoglio, R., Isufi, E., Jonkman, S. N., & Taormina, R. (2022). Deep learning methods for flood mapping: A review of existing applications and future research directions. Hydrology and Earth System Sciences, 26(16), 4345–4378. https://doi.org/10.5194/hess‐26‐4345‐2022
Bentivoglio, R., Isufi, E., Jonkman, S. N., & Taormina, R. (2023). Rapid spatio‐temporal flood modelling via hydraulics‐based graph neural networks. Hydrology and Earth System Sciences, 27(23), 4227–4246. https://doi.org/10.5194/hess‐27‐4227‐2023
Bentivoglio, R., Isufi, E., Jonkman, S. N., & Taormina, R. (2025). Multi‐scale hydraulic graph neural networks for flood modelling. Natural Hazards and Earth System Sciences, 25(1), 335–351. https://doi.org/10.5194/nhess‐25‐335‐2025
Burrichter, B., Hofmann, J., Koltermann da Silva, J., Niemann, A., & Quirmbach, M. (2023). A spatiotemporal deep learning approach for urban pluvial flood forecasting with multi‐source data. Water, 15(9), 1760. https://doi.org/10.3390/w15091760
Cache, T., Gomez, M. S., Beucler, T., Blagojevic, J., Leitao, J. P., & Peleg, N. (2024). Enhancing generalizability of data‐driven urban flood models by incorporating contextual information. Hydrology and Earth System Sciences, 28(24), 5443–5458. https://doi.org/10.5194/hess‐28‐5443‐2024
Cao, X., Wang, B., Yao, Y., Zhang, L., Xing, Y., Mao, J., et al. (2025). U‐RNN high‐resolution spatiotemporal nowcasting of urban flooding. Journal of Hydrology, 659, 133117. https://doi.org/10.1016/j.jhydrol.2025.133117
Department for Environment Food & Rural Affairs. (2024). Flood and coastal erosion risk management report: 1 April 2019 to 31 March 2020.
Devon County Council. (2024). Flood risk management. Retrieved from https://www.devon.gov.uk/floodriskmanagement/
do Lago, C. A. F., Giacomoni, M. H., Bentivoglio, R., Taormina, R., Gomes, M. N., Jr., & Mendiondo, E. M. (2023). Generalizing rapid flood predictions to unseen urban catchments with conditional generative adversarial networks. Journal of Hydrology, 618, 129276. https://doi.org/10.1016/j.jhydrol.2023.129276
Environment Agency. (2019). What is the risk of flooding from surface water map. Retrieved from https://assets.publishing.service.gov.uk/media/5db6ded540f0b6379a7acbb8/What‐is‐the‐Risk‐of‐Flooding‐from‐Surface‐Water‐Map.pdf
Environment Agency. (2023). LIDAR composite digital terrain model (DTM) 2m [Dataset]. Retrieved from https://environment.data.gov.uk/dataset/09ea3b37‐df3a‐4e8b‐ac69‐fb0842227b04
Faulkner, D. (1999). Flood estimation handbook: Rainfall frequency estimation. Institute of Hydrology.
Fowler, H. J., Ali, H., Allan, R. P., Ban, N., Barbero, R., Berg, P., et al. (2021). Towards advancing scientific knowledge of climate change impacts on short‐duration rainfall extremes. Philosophical Transactions of the Royal Society A, 379(2195), 20190542. https://doi.org/10.1098/rsta.2019.0542
Fu, G., Jin, Y., Sun, S., Yuan, Z., & Butler, D. (2022). The role of deep learning in urban water management: A critical review. Water Research, 223, 118973. https://doi.org/10.1016/j.watres.2022.118973
Goyal, M., Goyal, R., Venkatappa Reddy, P., & Lall, B. (2020). Activation functions. In W. Pedrycz & S.‐M. Chen (Eds.), Deep learning: Algorithms and applications (pp. 1–30). Springer International Publishing. Cham. https://doi.org/10.1007/978‐3‐030‐31760‐7_1
Guidolin, M., Chen, A. S., Ghimire, B., Keedwell, E. C., Djordjević, S., & Savić, D. A. (2016). A weighted cellular automata 2D inundation model for rapid flood analysis. Environmental Modelling & Software, 84, 378–394. https://doi.org/10.1016/j.envsoft.2016.07.008
Güneralp, B., Güneralp, İ., & Liu, Y. (2015). Changing global patterns of urban exposure to flood and drought hazards. Global Environmental Change, 31, 217–225. https://doi.org/10.1016/j.gloenvcha.2015.01.002
Guo, K., Guan, M., & Yu, D. (2021). Urban surface water flood modelling—A comprehensive review of current models and future challenges. Hydrology and Earth System Sciences, 25(5), 2843–2860. https://doi.org/10.5194/hess‐25‐2843‐2021
Guo, Z., Leitao, J. P., Simões, N. E., & Moosavi, V. (2021). Data‐driven flood emulation: Speeding up urban flood predictions by deep convolutional neural networks. Journal of Flood Risk Management, 14(1), e12684. https://doi.org/10.1111/jfr3.12684
Hammond, M. J., Chen, A. S., Djordjević, S., Butler, D., & Mark, O. (2015). Urban flood impact assessment: A state‐of‐the‐art review. Urban Water Journal, 12(1), 14–29. https://doi.org/10.1080/1573062X.2013.857421
Hanlon, H. M., Bernie, D., Carigi, G., & Lowe, J. A. (2021). Future changes to high impact weather in the UK. Climatic Change, 166(3), 50. https://doi.org/10.1007/s10584‐021‐03100‐5
Hofmann, J., & Schüttrumpf, H. (2021). floodGAN: Using deep adversarial learning to predict pluvial flooding in real time. Water, 13(16), 2255. https://doi.org/10.3390/w13162255
Innamorati, C., Ritschel, T., Weyrich, T., & Mitra, N. J. (2020). Learning on the edge: Investigating boundary filters in CNNs. International Journal of Computer Vision, 128(4), 773–782. https://doi.org/10.1007/s11263‐019‐01223‐y
Islam, M. A., Kowal, M., Jia, S., Derpanis, K. G., & Bruce, N. D. B. (2024). Position, padding and predictions: A deeper look at position information in CNNs. International Journal of Computer Vision, 132(9), 3889–3910. https://doi.org/10.1007/s11263‐024‐02069‐9
Isola, P., Zhu, J.‐Y., Zhou, T., & Efros, A. A. (2017). Image‐to‐image translation with conditional adversarial networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (pp. 1125–1134). https://doi.org/10.1109/CVPR.2017.632
Ivanov, V. Y., Xu, D., Dwelle, M. C., Sargsyan, K., Wright, D. B., Katopodes, N., et al. (2021). Breaking down the computational barriers to real‐time urban flood forecasting. Geophysical Research Letters, 48(20), e2021GL093585. https://doi.org/10.1029/2021GL093585
Kew, S., McCarthy, M., & Ryan, C. (2024). Autumn and winter storms over UK and Ireland are becoming wetter due to climate change. In World weather attribution. Grantham Institute for Climate Change London, UK.
Khosravi, K., Shahabi, H., Pham, B. T., Adamowski, J., Shirzadi, A., Pradhan, B., et al. (2019). A comparative assessment of flood susceptibility modeling using multi‐criteria decision‐making analysis and machine learning methods. Journal of Hydrology, 573, 311–323. https://doi.org/10.1016/j.jhydrol.2019.03.073
Kingma, D. P., & Ba, J. (2014). Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980.
Li, Z., Fu, Z., Li, Q., & Fu, G. (2025). Discriminator‐Guided GAN [Model]. Zenodo. https://doi.org/10.5281/zenodo.15029471
Li, Z., Liu, H., Luo, C., & Fu, G. (2021). Assessing surface water flood risks in urban areas using machine learning. Water, 13(24), 3520. https://doi.org/10.3390/w13243520
Löwe, R., Böhm, J., Jensen, D. G., Leandro, J., & Rasmussen, S. H. (2021). U‐FLOOD—Topographic deep learning for predicting urban pluvial flood water depth. Journal of Hydrology, 603, 126898. https://doi.org/10.1016/j.jhydrol.2021.126898
Met Office. (2003). Met Office rain radar data from the NIMROD system [Dataset]. Retrieved from http://catalogue.ceda.ac.uk/uuid/82adec1f896af6169112d09cc1174499/
Met Office. (2024). Past weather events [Dataset]. Retrieved from https://www.metoffice.gov.uk/weather/learn‐about/past‐uk‐weather‐events
O'Donnell, E. C., & Thorne, C. R. (2020). Drivers of future urban flood risk. Philosophical Transactions of the Royal Society A, 378(2168), 20190216. https://doi.org/10.1098/rsta.2019.0216
Pianforini, M., Dazzi, S., Pilzer, A., & Vacondio, R. (2024). Real‐time flood maps forecasting for dam‐break scenarios with a transformer‐based deep learning model. Journal of Hydrology, 635, 131169. https://doi.org/10.1016/j.jhydrol.2024.131169
Ronneberger, O., Fischer, P., & Brox, T. (2015). U‐net: Convolutional networks for biomedical image segmentation, medical image computing and computer‐assisted intervention–MICCAI 2015. In 18th International Conference, Munich, Germany, October 5–9, 2015, Proceedings, Part III 18 (pp. 234–241). Springer. https://doi.org/10.1007/978‐3‐319‐24574‐4_28
Rosenzweig, B., Herreros Cantis, P., Kim, Y., Cohn, A., Grove, K., Brock, J., et al. (2021). The value of urban flood modeling. Earth's Future, 9(1), e2020EF001739. https://doi.org/10.1029/2020EF001739
Sajeeda, A., & Hossain, B. M. M. (2022). Exploring generative adversarial networks and adversarial training. International Journal of Cognitive Computing in Engineering, 3, 78–89. https://doi.org/10.1016/j.ijcce.2022.03.002
Saxena, D., & Cao, J. (2021). Generative adversarial networks (GANs): Challenges, solutions, and future directions. ACM Computing Surveys, 54(3), 1–42. https://doi.org/10.1145/3446374
Seleem, O., Ayzel, G., Bronstert, A., & Heistermann, M. (2023). Transferability of data‐driven models to predict urban pluvial flood water depth in Berlin, Germany. Natural Hazards and Earth System Sciences, 23(2), 809–822. https://doi.org/10.5194/nhess‐23‐809‐2023
Semenov, A., Boginski, V., & Pasiliao, E. L. (2019). Neural networks with multidimensional cross‐entropy loss functions. In Computational Data and Social Networks: 8th International Conference, CSoNet 2019, Ho Chi Minh City, Vietnam, November 18–20, 2019, Proceedings (pp. 57–62). Springer. https://doi.org/10.1007/978‐3‐030‐34980‐6_5
Sharma, N. K., & Saharia, M. (2025). DeepSARFlood: Rapid and automated SAR‐based flood inundation mapping using vision transformer‐based deep ensembles with uncertainty estimates. Science of Remote Sensing, 11, 100203. https://doi.org/10.1016/j.srs.2025.100203
Sit, M., Demiray, B. Z., Xiang, Z., Ewing, G. J., Sermet, Y., & Demir, I. (2020). A comprehensive review of deep learning applications in hydrology and water resources. Water Science and Technology, 82(12), 2635–2670. https://doi.org/10.2166/wst.2020.369
Taghizadeh, M., Zandsalimi, Z., Nabian, M. A., Shafiee‐Jood, M., & Alemazkoor, N. (2025). Interpretable physics‐informed graph neural networks for flood forecasting. Computer‐Aided Civil and Infrastructure Engineering, 40(18), 2629–2649. https://doi.org/10.1111/mice.13484
Wang, Z., Lyu, H., Fu, G., & Zhang, C. (2024). Time‐guided convolutional neural networks for spatiotemporal urban flood modelling. Journal of Hydrology, 645, 132250. https://doi.org/10.1016/j.jhydrol.2024.132250
Xu, Q., De Vos, L. F., Shi, Y., Rüther, N., Bronstert, A., & Zhu, X. X. (2025). Urban flood modeling and forecasting with deep neural operator and transfer learning. Journal of Hydrology, 661, 133705. https://doi.org/10.1016/j.jhydrol.2025.133705
Zounemat‐Kermani, M., Matta, E., Cominola, A., Xia, X., Zhang, Q., Liang, Q., & Hinkelmann, R. (2020). Neurocomputing in surface water hydrology and hydraulics: A review of two decades retrospective, current status and future prospects. Journal of Hydrology, 588, 125085. https://doi.org/10.1016/j.jhydrol.2020.125085