Abstract
Measuring flooding through time is crucial for understanding exposure and vulnerability, key components in estimating flood risks and impacts. Yet, historical records of flood inundation are sparse. In this study, we reconstruct flood extents for 78 damaging events in eastern North Carolina between 1996 and 2020 using high‐resolution geospatial data and address‐level National Flood Insurance Program (NFIP) records. We train random forest models on NFIP‐based labeled flood presence and absence data and a suite of geospatial predictors. Then, we predict the probability of flood damage at every 30 m grid cell within our model domain. Our models achieve an average Area Under the Curve of 0.76 and outperform flood extent estimates from process‐based and remote sensing models when evaluated against NFIP data for six events. We find that approximately 90,000 (2.3%) buildings in our study area flooded at least once, of which over 20,000 (0.53% of all buildings) flooded more than once. Our estimate is more than double the number of buildings that filed NFIP claims between 1996 and 2020. Furthermore, 43% of flooded buildings are located outside the Federal Emergency Management Agency (FEMA) Special Flood Hazard Area. Our results illustrate that flood exposure, especially repetitive exposure, is much more widespread than previously recognized. By generating a comprehensive record of past flood extents using address‐level observations of damage, we create a first‐of‐its‐kind geospatial database that can be used to identify locations of repetitive flooding. This represents a crucial first step in examining the dynamic relationships between flood exposure, vulnerability, and risk.
Introduction
Repetitive flooding has dire consequences for communities, raising concerns about infrastructure resilience and economic stability, in addition to public and ecosystem health (IPCC, 2023). Evidence of repetitive flood exposure is prevalent within high-risk communities in the United States (US), where the Federal Emergency Management Agency (FEMA) National Flood Insurance Program (NFIP) has reported increasing numbers of severe repetitive loss properties in recent years (FEMA, 2024e; Weber et al., 2024). This trend exacerbates financial burdens across multiple scales, affecting the NFIP (US Government Accountability Office, 2020), local governments (Gourevitch et al., 2022; Hino et al., 2019), and households (Howell & Elliott, 2019). The resulting financial strain can lead to diminished resources for infrastructure repairs, emergency response, and adaptation efforts. Beyond financial impacts, when floods repeatedly impact the same populations, they can amplify underlying vulnerabilities, compromising the ability of households and communities to effectively respond to future flooding (Cutter, 2018; de Ruiter & van Loon, 2022; Peacock et al., 2014; Van Zandt, 2020).
Despite the growing evidence of repetitive flood exposure, there are few examples of temporally complete data sets of historical flooding and even fewer examples of data sets that can be used to model exposure to repetitive flood hazards or their impacts (de Ruiter et al., 2020; Paprotny et al., 2018). Consequently, most prior studies of flood risk investigate the impacts associated with individual historical events or design floods and typically represent the processes that interact to form risk—such as hazards, exposure, and vulnerability—as static measures. Yet these processes vary over time and across population groups (Faas, 2016; Tate et al., 2021; Wisner et al., 2004), and the inability to account for their dynamic nature has led to a limited understanding of how flood exposure might influence outcomes across different spatial and temporal scales (e.g., from household to community) (Moreira et al., 2021; Terti et al., 2015). Thus, it is crucial to better understand where and how often past flood exposure has occurred, as this information is needed to capture the dynamic nature of flood risk and its impacts.
In this study, we develop a novel database of historical flood events capable of representing repetitive flood exposure. We train random forest models to reconstruct flood extents in eastern North Carolina (NC) for events that occurred between January 1996 and September 2020. For each event, we derive flood presence and absence data from address-level records of Federal Insurance and Mitigation Administration (FIMA) NFIP claims and policies in force obtained from FEMA Region IV for eastern NC. We train the random forest models using the NFIP-based labeled flood presence and absence locations and a suite of high-resolution geospatial data as predictors. Once tuned, we use the models to predict the probability of flood damage exposure at a 30 m resolution across the model domain. For events with other sources of flood inundation information, such as from process-based models and observational models trained on remote sensing data, we compare our modeled flood extents to these estimates based on their ability to predict past exposure. Ultimately, we generate a more comprehensive spatial and temporal estimate of past flood frequency and building exposure across our study area than previously available.
Background
Mapping flood hazards is essential for understanding patterns of exposure over time. Estimating extents of past floods can help identify areas and populations that have experienced multiple flood events and provide insight into the factors that drive repetitive flood exposure and impacts. While records of historical flooding exist, they differ in geographic focus, temporal coverage, and data collection methods (Li et al., 2021). Many records do not provide sufficient spatial resolution to characterize the geographic distribution of exposure or lack the temporal resolution necessary to analyze changes in exposure or vulnerability to flooding over time. For example, hazard events data sets, such as the Emergency Events Database (EM-DAT), the Spatial Hazard Events and Losses Database (SHELDUS), and NOAA NCEI's Storm Events Database, are temporally explicit but aggregate exposure and damage estimates at larger spatial units, like zip codes, census units, or counties (ASU Center for Emergency Management and Homeland Security, 2024; Delforge et al., 2025; NOAA NCEI, 2020). These data sets are often derived from administrative records or news sources and may underrepresent events further in the past and in less densely populated areas (Jäger et al., 2024; Mazhin et al., 2021; Stevens et al., 2016).
Process-based models (i.e., numerical models) can be used to generate high-resolution spatial information on past floods, but seldom at timescales useful for estimating repetitive flood exposure. These models resolve the fundamental equations of fluid dynamics to produce estimates of flood magnitude and intensity such as depth, velocity, extent, and duration (Bates, 2022; Gupta et al., 2015; Teng et al., 2017). In the US, process-based models have been used to hindcast flood events (CERA, 2024; PNNL, 2024; Rose et al., 2024). Hindcasts are useful for generating estimates of potential flood damage, like those reported in FEMA's Hazus Loss Library. Modeled reconstructions of past flood events have also been used to demonstrate that both demographic and economic growth trends since the 1950s have contributed to increasing flood losses (Dottori et al., 2022; Paprotny et al., 2024). However, process-based models often have large computational demands and require substantial observational data to calibrate and validate results, so few are applied to reconstruct floods beyond the largest or most damaging events. As a result, researchers and policymakers in the US rely on hazard products like the FEMA Special Flood Hazard Area (SFHA) to delineate flood risk, even though it has been shown to underestimate exposure across a wide range of historical flood events (Blessing et al., 2017; Brody et al., 2013; Highfield et al., 2013; Kousky & Michel-Kerjan, 2017).
The computational demands of process-based models have also limited their capacity to fully represent multiple flood drivers (e.g., fluvial, pluvial, storm surge) and influencing factors (e.g., land use, local infrastructure) simultaneously. In contrast, observation-based models, trained on either remotely sensed or ground-based observations of flooding, are computationally efficient, flexible, and capable of implicitly representing flooding from multiple processes. These models utilize statistical or machine learning methods (e.g., kriging, random forests, convolutional neural networks) trained on observational data to predict flood extents, exposure, or magnitudes of damage. For example, optical satellite observations of flooding have been used to generate predictions of flood extents (Brakenridge, 2016). However, the resolution of optical satellite imagery is often too coarse to resolve flooding in urban areas with complex local infrastructure and topography, or too infrequent to capture the maximum extents of a flood event (Tellman et al., 2021b). Moreover, there may be imagery interference from cloud cover, especially during heavy precipitation events, even when optical satellites are in the correct position to capture a flood. In contrast, radar satellites can detect flooding through cloud cover and at night, but they also seldom capture short-duration and large flood events, like flash floods (Tarpanelli et al., 2022). Over longer flood events, observations from multiple remote sensing systems can be combined using data fusion techniques to improve temporal coverage (Muñoz et al., 2021; Zhang, 2010).
The use of ground-based observations is relatively new in flood hazard modeling and event-based reconstructions. Examples of ground-based observational data include emergency service requests (e.g., 311 calls; Mobley et al., 2019), volunteered geographic information (e.g., social media posts; de Bruijn et al., 2019), and damage records (e.g., flood insurance claims; Thomson et al., 2023). These data often represent exposure to flooding, producing model outputs that can be interpreted as flood hazard exposure. This differs from process-based models, which require the addition of building data and fragility curves to estimate damage, and from remote sensing observations, which do not distinguish between damaged and undamaged flooded pixels. Even so, ground-based observations may not fully represent where flood exposure or damages have occurred. For example, models that rely on insurance records to model flood outcomes may incorrectly predict exposure, damage, and impacts—particularly among underinsured areas and populations (Choi et al., 2024; Gall et al., 2009; Wagenaar et al., 2020). In the US, only homeowners with federally backed mortgages within the FEMA SFHA are required to purchase flood insurance (FEMA, 2024a). Voluntary purchase of flood insurance among residents outside the SFHA, renters, and homeowners without mortgages is chronically low, and previous studies have shown that wealthier, older, and more educated populations are more likely to purchase insurance (Atreya et al., 2015; Bradt et al., 2021; Dixon et al., 2006).
Machine learning algorithms trained on observational data have strong prediction capabilities, even though, unlike process-based models, they cannot explain the influence of different flood drivers. Support vector machines and random forests are well-suited to flood event reconstruction as they can train on labeled flood data (i.e., observational data that distinguishes between flood presence and absence) to produce out-of-sample flood predictions in areas without observations, improving the model's accuracy and ability to discriminate between prediction instances (Mosavi et al., 2018). Several recent studies have used ground-based observations to estimate flood damage probability based on past flood damage records, showing results that are as good, if not better, than standard regulatory products like the FEMA SFHA (Collins et al., 2022; Mobley et al., 2021; Woznicki et al., 2019). Others have leveraged flood insurance data to reconstruct individual flood extents and estimates of damage (Thomson et al., 2023); however, to our knowledge, no studies have attempted to generate a database of past flood extents using flood insurance records.
Materials and Methods
Model Framework Overview
Figure 1 provides an overview of the model framework used to generate the archive of flood extents, hereafter referred to as the Flood Extent Archive (FLDEX). We use signal processing to identify discrete flood events from address-level NFIP claims filed between January 1996 and September 2020 (Figures 1a and 1b; Section 3.1.2). We then train random forest models for each event using 30 m resolution geospatial data and labeled observations of flood damage presence and absence derived from NFIP claims and policies-in-force data (Figures 1c–1e; Sections 3.1.2–3.1.3). For each event, we create continuous 30 m resolution rasters representing out-of-sample flood damage probability predictions, which we then convert to binary rasters representing the flood damage exposure footprint (i.e., flood extent). We test the sensitivity of flood extent predictions to different cutoff thresholds (Figure 1f; Section 3.1.4), and we evaluate model performance (Figure 1g; Section 3.2.1). When possible, we compare our flood extents to other estimates of flooding derived from process-based models and remote sensing observation-based models (Figure 1h; Section 3.2.3).
[Figure 1 omitted. See PDF.]
Study Area
North Carolina experiences frequent and severe flooding due to its diverse geography and climate. The state ranks fourth among US states in the number of hurricane landfalls (NOAA, 2023) and sixth in the number of NFIP repetitive loss and severe repetitive loss properties (FEMA, 2024e). The study area spans eight USGS Hydrologic Unit Code 6-digit (HUC-6) watersheds (Figure 1b), including the entirety of the Neuse-Pamlico and Cape Fear River watersheds as well as portions of the Chowan-Roanoke and Pee Dee River watersheds. We obtained anonymized records of address-level NFIP policies in force (1974–2020) and claims (1975–2020) from FEMA Region IV for the 78 NC counties overlapping with these watersheds. This amounts to 77% of the state's land area.
Address-Level NFIP Data
Address-level NFIP claims and policies-in-force data were geolocated as described in Thomson et al. (2023). Records that did not match to a building rooftop were excluded. We include records for the period between 1 January 1996 and 30 September 2020, as it aligns with the availability of our geospatial predictor data sets. This period is also associated with increased flood activity in NC (Paerl et al., 2019). Across all claims filed in the study area during this period, total payouts amounted to approximately $2.5 billion (2023 USD) (FEMA, 2024c). The final address-level data set includes 70,947 claims at 40,695 buildings and 957,427 policies-in-force records at 138,889 buildings within the NC Building Footprints (2010) data set (State of North Carolina - Emergency Management, 2012).
We use signal processing techniques from the SciPy.signal package (Virtanen et al., 2020) to identify peaks in time series of NFIP claims corresponding to discrete events based on the recorded date of loss and location, thereby accounting for spatial and temporal clustering of damage during individual flood events. An event is defined when at least 15 claims are recorded (based on "dateOfLoss") within a 7-day period and within the same USGS HUC-6 watershed boundary. We tested the sensitivity of event identification by varying the required number of claims in a 7-day period between 10 and 20 and chose 15 because this threshold captured all 18 flood-related FEMA federal disaster declarations that occurred within our study area between 1996 and 2020 (FEMA, 2024b). The date ranges of discrete events, once identified, were calculated using the height, peak, and volume of claims within each USGS HUC-6 watershed. This allowed events to have different durations in different watersheds. We compared the dates of identified events against other publicly available records, including FEMA disaster declarations (FEMA, 2024b), NOAA NCEI's Storm Events Database (NOAA NCEI, 2020), and HURDAT2 (NOAA, 2024).
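For illustration, the windowing criterion can be expressed compactly in R (the language used for the rest of the analysis). This is a minimal sketch of the rule rather than the SciPy.signal peak detection itself, and the `claims` data frame, with `dateOfLoss` and `huc6` columns, is an assumed input.

```r
# Minimal sketch of the event criterion: flag dates where a trailing 7-day
# window within one HUC-6 watershed contains at least 15 claims. This
# approximates the peak detection described above; the claims data frame
# (one row per claim, with dateOfLoss and huc6 columns) is illustrative.
library(dplyr)

flag_event_days <- function(claims, min_claims = 15, window_days = 7) {
  claims %>%
    count(huc6, dateOfLoss, name = "n_claims") %>%  # daily claim counts
    group_by(huc6) %>%
    arrange(dateOfLoss, .by_group = TRUE) %>%
    mutate(window_total = sapply(dateOfLoss, function(d) {
      sum(n_claims[dateOfLoss > d - window_days & dateOfLoss <= d])
    })) %>%
    ungroup() %>%
    filter(window_total >= min_claims)              # candidate event days
}
```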
In addition, we compare the address-level NFIP claims and policies-in-force records to the FIMA NFIP Redacted Claims and Policies (v2) data sets publicly available through OpenFEMA (FEMA, 2024c, 2024d). The address-level data contain approximately 85% of the claims filed since 1996, but only 46% of policies in force recorded since 2010 (the earliest policy record provided by OpenFEMA) (see Figures S1–S2 in Supporting Information S1). Because flood absence training data by event are defined as policies in force without accompanying claims, having a complete data set is critical. Thus, for events that occurred after 2010, we supplement our policies-in-force records by calculating the total number of missing policies in force in each census tract, both inside and outside the FEMA SFHA, and sampling the missing policies in force from residential buildings with no prior claim or policy record (see Figure S3 in Supporting Information S1). For events prior to 2010, we use an anonymized data set of NFIP policies in force aggregated by zip code and follow the same sampling procedure (see Figure S4 in Supporting Information S1). All of the sampled policies in force are assumed to be points of flood absence. The original address-level policies-in-force record contained 2,876,297 observations across 78 events, with some policies in force across multiple events (see Figure S5a in Supporting Information S1). After supplementing these with additional policies-in-force records available by census tract and zip code, the final data set contains 8,492,316 observations of policies in force between 1996 and 2020. Our final policies-in-force data set covers 791,825 of the 30 m model grid cells, 666,628 more than the original record of policies in force (see Figure S5b in Supporting Information S1). We test the sensitivity of our results to the inclusion of these sampled policies.
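A minimal sketch of this supplemental sampling step is shown below; the `shortfall` and `candidates` tables and their columns are illustrative assumptions standing in for the OpenFEMA-derived counts and the pool of eligible residential buildings.

```r
# Minimal sketch of the absence-data supplement. `shortfall` holds the
# per-stratum gap (tract_id, in_sfha, n_missing) relative to OpenFEMA
# counts; `candidates` holds residential buildings with no prior claim or
# policy record. Both tables and their columns are illustrative.
library(dplyr)
library(purrr)

sample_missing_policies <- function(candidates, shortfall) {
  candidates %>%
    inner_join(shortfall, by = c("tract_id", "in_sfha")) %>%
    group_split(tract_id, in_sfha) %>%
    map_dfr(function(g) {
      # never draw more buildings than the stratum contains
      slice_sample(g, n = min(g$n_missing[1], nrow(g)))
    })
}
```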
Labeled flood presence and absence training data are generated from NFIP claims and policies in force by event at a 30 m grid resolution to match geospatial predictors (Figure 2). Grid cells are labeled as presence locations (i.e., assigned value of one) when they contain at least one NFIP claim during an event and as absence locations if they contain at least one active policy but no claims filed during the event (i.e., assigned value of zero). In cases where multiple claims are recorded within a single 30 m grid cell, as in multi-family dwellings, the grid cell is still assigned a value of one.
[Figure 2 omitted. See PDF.]
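The labeling rule in the preceding paragraph reduces to a small aggregation, sketched below under the assumption that each NFIP record for an event has already been assigned the ID of its 30 m grid cell; column names are illustrative.

```r
# Minimal sketch of presence/absence labeling for one event. `records` has
# one row per NFIP record in the event window, with the ID of its 30 m grid
# cell (cell_id) and a record_type of "claim" or "policy" (illustrative).
library(dplyr)

label_cells <- function(records) {
  records %>%
    group_by(cell_id) %>%
    summarise(
      has_claim  = any(record_type == "claim"),
      has_policy = any(record_type == "policy")
    ) %>%
    mutate(label = case_when(
      has_claim  ~ 1L,          # presence: at least one claim in the cell
      has_policy ~ 0L,          # absence: active policy, no claim
      TRUE       ~ NA_integer_  # no NFIP information: excluded from training
    )) %>%
    filter(!is.na(label))
}
```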
Geospatial Predictors
We compiled 21 geospatial predictor variables at a 30 m grid resolution capturing topographic, hydrologic, and contextual factors influencing flooding. The choice of variables was informed by previous studies using random forests for flood modeling (Collins et al., 2022; Knighton et al., 2020; Mobley et al., 2021; Woznicki et al., 2019). Each variable was derived from publicly available sources and resampled to match the 30 m resolution of the 2019 NLCD Land Cover data for NC (USGS, 2024). Following feature selection, we retain 11 final predictor variables: elevation, topographic wetness index, height above nearest drainage, distance to coast, distance to stream, accumulated precipitation, fractional impervious surface, hydraulic conductivity, SFHA, building density, and urban road density. The same set of predictor variables is used to model each event, although their relative importance is allowed to vary by event. More information about the creation of the geospatial predictor variables can be found in Text S1 and Table S1 in Supporting Information S1.
Several predictor variables change over time between 1996 and 2020 and vary by event. In the case of fractional impervious surface, we use available data from 2001, 2006, 2009, 2011, 2013, 2016, and 2019 from the NLCD (USGS, 2024). All events prior to 2001 use data from 2001, and all subsequent events use data from the most recent available year. Similarly, urban road density is calculated within a 500 m radius of each grid cell using data available annually from 2011 to 2020 via the US Census Bureau Primary and Secondary Roads data set (US Census Bureau, 2021). We calculate accumulated precipitation for the duration of each event using ERA5 hourly precipitation data (Hersbach et al., 2023). Although NC experienced development during this period, we use a single snapshot of building density calculated from the NC Building Footprints (2010) data set, where the value in each grid cell represents the count of buildings in a 9 × 9 window of grid cells (State of North Carolina - Emergency Management, 2012).
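As one example of predictor construction, the sketch below derives a building-density layer with the terra package; the file names are placeholders, and rasterizing building centroids before applying a 9 × 9 focal sum is an assumed implementation consistent with the description above.

```r
# Minimal sketch of the building-density predictor: building counts per
# 30 m cell, summed over a 9 x 9 cell moving window. File names are
# placeholders for the NLCD template grid and 2010 building centroids.
library(terra)

grid  <- rast("nlcd_2019_landcover_30m.tif")
bldgs <- vect("nc_building_footprint_centroids_2010.gpkg")

counts  <- rasterize(bldgs, grid, field = 1, fun = "sum", background = 0)
density <- focal(counts, w = 9, fun = "sum", na.rm = TRUE)  # 9 x 9 window
```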
Random Forest Model Procedure
We use the R ranger package (Wright & Ziegler, 2017) to train a random forest model for each event using 5-fold cross-validation with different 80/20 splits of the generated flood absence and presence data for training and testing. Each fold tunes 70 candidate models, each with a unique combination of hyperparameters and 1,000 trees to avoid overfitting. The final model for a given event is the cross-fold model with the highest Area Under the Curve (AUC), calculated using out-of-sample testing data. This model is saved, along with information on training and testing performance and variable importance. Continuous flood damage probability rasters are created by applying the model, with the input geospatial predictors, at every 30 m grid cell across the model domain. The final prediction represents a continuous probability of class membership (flooded or not flooded), computed as the average of the predictions across all trees in the random forest.
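A stripped-down version of a single event's model fit with ranger is sketched below; the `train` and `test` data frames are assumed inputs, and the hyperparameter values shown are placeholders for the tuned combinations described above.

```r
# Minimal sketch of one probability forest fit with ranger. `train` and
# `test` are illustrative data frames with a factor column `flooded`
# (levels "0"/"1") plus the 11 predictor columns; mtry and min.node.size
# stand in for the 70 tuned hyperparameter combinations per fold.
library(ranger)

rf <- ranger(
  dependent.variable.name = "flooded",
  data          = train,
  num.trees     = 1000,       # fixed at 1,000 trees to limit overfitting
  mtry          = 3,          # placeholder hyperparameter value
  min.node.size = 10,         # placeholder hyperparameter value
  probability   = TRUE,       # return per-class probabilities
  importance    = "impurity"  # store variable importance for the event
)

# Out-of-sample probability of flood damage for held-out grid cells,
# assuming the flooded factor has levels "0" and "1"
p_flood <- predict(rf, data = test)$predictions[, "1"]
```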
For each event, we generate four candidate binary flood exposure footprints representing potential flood extents, each based on a different probability threshold criterion: maximizing accuracy, maximizing F1 score, maximizing geometric mean, and the mid-point of the predicted continuous probability range.
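The four cutoffs can be recovered from the predicted probabilities and labels as in the following sketch; the 0.01-step candidate grid is an assumption, and the mid-point rule follows directly from the predicted probability range.

```r
# Minimal sketch of the four threshold criteria given predicted
# probabilities p and labels y (0/1). Accuracy, F1, and geometric mean are
# maximized over a candidate grid (step size is an assumption); the
# mid-point rule is the center of the predicted probability range.
pick_thresholds <- function(p, y, cuts = seq(0.01, 0.99, by = 0.01)) {
  stats <- t(sapply(cuts, function(ct) {
    pred <- as.integer(p >= ct)
    tp <- sum(pred == 1 & y == 1); tn <- sum(pred == 0 & y == 0)
    fp <- sum(pred == 1 & y == 0); fn <- sum(pred == 0 & y == 1)
    tpr  <- tp / max(tp + fn, 1)           # guard against division by zero
    tnr  <- tn / max(tn + fp, 1)
    prec <- tp / max(tp + fp, 1)
    c(acc = (tp + tn) / length(y),
      f1  = 2 * prec * tpr / max(prec + tpr, 1e-9),
      gm  = sqrt(tpr * tnr))
  }))
  c(max_accuracy = cuts[which.max(stats[, "acc"])],
    max_f1       = cuts[which.max(stats[, "f1"])],
    max_gmean    = cuts[which.max(stats[, "gm"])],
    midpoint     = (min(p) + max(p)) / 2)
}
```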
Model Performance
Performance Metrics
Model performance is calculated per event using the labeled flood presence and absence data generated from the address-level NFIP data. We assess performance using 10 metrics: accuracy, True Positive Rate (TPR), True Negative Rate (TNR), False Positive Rate (FPR), precision, False Alarm Ratio (FAR), F1 score, Critical Success Index (CSI), AUC, and Brier score (see Table S2 in Supporting Information S1 for equations and score interpretations). We create confusion matrices using each of the four probability thresholds to tabulate the number of True Positives (TP), True Negatives (TN), False Negatives (FN), and False Positives (FP) needed for calculating performance metrics. Due to the imbalanced nature of our data—flood presence locations are the minority class across all events—we focus on four metrics commonly used for imbalanced data to report performance relative to the minority class: AUC, CSI, F1 score, and TPR. The AUC measures a model's ability to discriminate between flood presence and absence, with a perfect score of one. The CSI measures a model's ability to correctly predict flood presence while accounting for FP and FN, with a perfect score of one indicating that all flood presence instances are identified with no misclassifications. The TPR is a related metric that measures the proportion of flood presence instances a model correctly captures, though it does not penalize FP. Finally, the F1 score is the harmonic mean of precision and TPR and balances FP and FN. An F1 score of one represents perfect accuracy in identifying true flood presence instances without any misclassifications.
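For reference, the confusion-matrix metrics reduce to the formulas in the sketch below, where `pred` and `y` are 0/1 vectors of predictions and labels and `p` holds the continuous probabilities used for the Brier score; AUC would be computed separately from `p` (e.g., with the pROC package).

```r
# Minimal sketch of the reported metrics from a confusion matrix. pred and
# y are 0/1 vectors; p holds the continuous probabilities for the Brier
# score. AUC is computed from p separately (e.g., with the pROC package).
event_metrics <- function(pred, y, p) {
  tp <- sum(pred == 1 & y == 1); tn <- sum(pred == 0 & y == 0)
  fp <- sum(pred == 1 & y == 0); fn <- sum(pred == 0 & y == 1)
  c(accuracy  = (tp + tn) / length(y),
    TPR       = tp / (tp + fn),        # sensitivity / recall
    TNR       = tn / (tn + fp),        # specificity
    FPR       = fp / (fp + tn),
    precision = tp / (tp + fp),
    FAR       = fp / (tp + fp),        # false alarm ratio
    F1        = 2 * tp / (2 * tp + fp + fn),
    CSI       = tp / (tp + fp + fn),   # critical success index
    Brier     = mean((p - y)^2))       # mean squared probability error
}
```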
Model Sensitivity
In addition to testing the sensitivity of flood extent results to different cutoff thresholds, we perform sensitivity analyses on the placement of sampled policies. We calculate performance metrics for each event's flood extent using the labeled flood absence locations derived from both the original policies and the sampled policies, and separately for events pre- and post-2010, as different data sets are used to generate sampled policies for these time periods. To address potential selection bias concerns in using insurance data to train observation-based flood prediction models, we evaluate model predictions and calculate standard errors within one county that experienced substantial impacts inside and outside the SFHA during Hurricane Florence in 2018 (the largest event by number of claims in our record). Using the NFIP claims and policies data, we investigate whether there are differences in our model performance across the floodplain boundary, where NFIP policy uptake is substantially lower and therefore fewer claims are recorded. We also calculate standard errors for the areas inside and outside of the SFHA and at locations coincident with NFIP policies to examine variability in predictions.
Comparison With Estimates of Flood Extents From Other Sources
Final flood extents are compared to estimates derived from process-based model outputs and remote sensing observation-based model outputs for six flood events. Modeled flood extents for Hurricanes Floyd (1999), Matthew (2016), and Florence (2018) are available from the ADvanced CIRCulation (ADCIRC) and the Super-Fast Inundation of Coasts (SFINCS) models (Grimley, Bunya, et al., 2024). The ADCIRC simulations represent coastal inundation from storm tide, whereas the SFINCS data represent inundation from both coastal and runoff processes. More details are provided in Grimley, Hollinger Beatty, et al. (2024); Grimley et al. (2025); and Ratcliff (2022). We use flood depth rasters to classify locations as flooded or not flooded. Depths from SFINCS were available as a modeled product; depths from ADCIRC were calculated by subtracting our 30 m resolution DEM from the modeled water surface elevations. We use a lower threshold of 0.15 m to classify flooded buildings. Because we did not consider the structural characteristics of buildings, it is possible that we overestimate the number of flooded buildings in cases where modeled flood depths are lower than the first floor elevation (FFE) of the building.
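The depth-to-extent conversion for ADCIRC is a simple raster operation, sketched below with the terra package under placeholder file names.

```r
# Minimal sketch of the ADCIRC post-processing: depth = water surface
# elevation minus ground elevation, classified as flooded at >= 0.15 m.
# File names are placeholders; SFINCS depths arrive pre-computed, so only
# the final thresholding step applies to them.
library(terra)

wse <- rast("adcirc_max_wse_30m.tif")  # modeled water surface elevation (m)
dem <- rast("dem_30m.tif")             # 30 m digital elevation model (m)

depth   <- wse - dem                   # inundation depth above ground
flooded <- depth >= 0.15               # binary flooded / not flooded layer
```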
Remote sensing observation-based model outputs from the MODIS optical satellite are available for Hurricanes Florence (2018), Irene (2011), Ida (2009), and Ernesto (2006) (Tellman et al., 2021a). The MODIS satellite covers the entire globe daily at a 250 m resolution (Lin et al., 2016). During Hurricane Florence, repeat-pass imagery was collected with an Uninhabited Aerial Vehicle Synthetic Aperture Radar (UAVSAR) at a 5 m resolution (Wang et al., 2022). While flood extents from these remote sensing sources represent flooding from multiple drivers and processes, they also do not indicate whether damage occurred.
We evaluate the comparative performance between our estimates of flood extent and available process-based and remote sensing sources using the labeled flood presence and absence data generated from NFIP records per event. Performance is only calculated within the region overlapping with our model domain to maintain the same number of flood presence and absence locations (e.g., FLDEX and ADCIRC model performance metrics are both evaluated within the smaller ADCIRC domain). For performance comparisons, we focus on TPR, TNR, FAR, and FPR metrics to highlight differences in flood presence and absence magnitudes and predictive reliability for each event. The TPR and FAR assess positive predictions, measuring the models' ability to correctly identify flood presence instances and the rate of FP; TNR and FPR focus on negative predictions, measuring the models' skill in correctly identifying non-flood instances and the rate of misclassifying them as flooded. Perfect scores for TPR and TNR are one, while perfect scores for FAR and FPR are zero. Performance is also calculated with and without the sampled policies data (described in Section 3.1.2) and across different land cover types (e.g., developed vs. undeveloped) as defined by the NLCD Land Cover data for the year of each event (USGS, 2024). We also estimate the number of buildings flooded by each source using the NC Building Footprints (2010) data set (State of North Carolina - Emergency Management, 2012). Additional information on the sources of flood extents is available in Table S3 and Figure S6 in Supporting Information S1.
Results
Flood Events
We identify 78 discrete flood events between 1 January 1996 and 30 September 2020 (Figure 3). These events account for 67,259 claim records, or 95% of all claims filed over the study period. The majority (76%) of claims were filed within the SFHA. Of the 78 events, 75 appear in NOAA NCEI's Storm Events Database (NOAA NCEI, 2020), 44 correspond to dates with tropical cyclone (TC) activity based on HURDAT2 (NOAA, 2024), and 18 correspond to flood-related FEMA disaster declarations (FEMA, 2024b). Only two of the events we identified (events 56 and 62 by claim size, see Table S4 in Supporting Information S1) are not present in other data sources.
[Figure 3 omitted. See PDF.]
Model Performance
For each event, we produce an estimated flood damage probability between zero and one, which we then convert to binary flooded and not flooded classifications based on a selected probability threshold. There is significant variation in predicted flood extents and building exposure based on which threshold is chosen (see Tables S5–S6 in Supporting Information S1). Using the geometric mean and F1 score thresholds, we estimate that 99.90% and 82.58% of the model domain, respectively, flooded at least once across the 78 events. In contrast, using the mid-point of the probability range and maximum accuracy thresholds, we estimate 10.48% and 6.86% flooded, respectively. Maximum accuracy minimizes the number of FP and prioritizes correct predictions for flood presence, resulting in the most conservative estimates of flood extents and exposure. Given the imbalanced nature of the training data, maximum accuracy is the most representative threshold. Thus, all subsequent results are reported for flood extents thresholded using maximum accuracy.
Model performance is evaluated across 10 metrics for each of the 78 events (Garcia et al., 2025). Figure 4 shows the variation in model performance across four of the 10 performance metrics measured: AUC, CSI, F1 score, and TPR. The average AUC, CSI, F1 score, and TPR are 0.76, 0.49, 0.60, and 0.51, respectively. The average AUC of 0.76 demonstrates that the models have "good" performance distinguishing between flooded and not flooded locations (de Hond et al., 2022). The average accuracy and average TNR are 0.9973 and 0.9997, respectively, indicating strong model performance in predicting non-flooded areas. The high average accuracy (i.e., optimal value of 1.00) and low average Brier score (i.e., optimal value of 0.00) are due to the imbalance in the training data set, where non-flooded instances dominate. A comparison between average precision (0.90) and average TPR (0.51) demonstrates that while the models correctly predict flooding at 90% of the flooded locations they identify, they tend to underestimate the total number of flooded locations (missing 49% on average). The low average FAR (0.10) suggests that the models are conservative, with few instances of flooding predicted where it did not occur. The average CSI of 0.49 demonstrates that the models correctly identify nearly half of all flooded locations when both false alarms (FP) and misses (FN) are considered. CSI values exceed 0.70 for 45% of the events, indicative of reasonable model performance (Bates et al., 2021). The IQR and medians across the 78 events for each of the 10 performance metrics are provided in Table S7 in Supporting Information S1. Average model performance by probability threshold choice is also available in Table S8 in Supporting Information S1.
[Figure 4 omitted. See PDF.]
Model Sensitivity
We test the sensitivity of the models against three factors: (a) inclusion of sampled policies in the labeled flood absence training data set, (b) event size, and (c) potential bias from differences in the availability of labeled flood training data inside versus outside of the SFHA. The models show consistent performance regardless of inclusion of sampled policies, with events after 2010 and events with more claims exhibiting better performance. Additionally, the models predict more flooded locations than those indicated by NFIP claims, both inside and outside of the SFHA. This suggests that our models capture flood exposure beyond locations recorded as flooded in NFIP claims records, even in areas outside the SFHA, where fewer NFIP claims and policies are present due to lower insurance uptake in these areas.
With the inclusion of sampled policies in the labeled flood absence training data, the average AUC remains the same, while the average CSI increases significantly from 0.49 to 0.51 and the average F1 score increases significantly from 0.60 to 0.61, based on paired t-tests (p < 0.05; see Table S9 in Supporting Information S1). On average, the models perform better for events that occurred after 2010—those for which policy records are available from OpenFEMA (see Table S10 in Supporting Information S1). Based on independent t-tests, the improvements in AUC, CSI, F1 score, and TPR between pre- and post-2010 events are statistically significant (p < 0.05). For the 44 events prior to 2010, the average AUC is 0.72, average CSI is 0.43, average F1 score is 0.53, and average TPR is 0.44. For the 34 events after 2010, the average AUC is 0.80, average CSI is 0.58, average F1 score is 0.68, and average TPR is 0.60. We find small but statistically significant correlations between event size and performance metrics (AUC, CSI, F1 score, and TPR), using Pearson's product-moment correlation (p < 0.05), indicating improved performance with increasing event size (see Table S11 in Supporting Information S1).
In one county with damage during Hurricane Florence, we compare model performance of flood extents between areas inside and outside of the SFHA to assess whether low insurance coverage outside the SFHA leads to underprediction of flooding in our models due to potentially limited training data (see Figure S7 in Supporting Information S1). We find larger AUC, CSI, F1 score, and TPR values inside the SFHA than outside it (see Table S12 in Supporting Information S1), suggesting the model may underpredict flooding outside the SFHA compared to inside the SFHA. However, even though only 14% of NFIP claims were filed outside the SFHA, our model predicts that 37% of the flooding occurs outside of the SFHA. Therefore, although the model may underpredict flooding outside the SFHA relative to inside the SFHA, the model still identifies a larger proportion of flooding outside the SFHA than is present in the labeled flood presence training data. We also calculate standard errors of the continuous flood damage probability predictions across the county to assess how variability in model predictions differs inside and outside of the SFHA and in grid cells coincident with NFIP policies (see Figure S8 in Supporting Information S1). The average standard error of grid cells inside the SFHA is 0.11, compared to 0.09 for grid cells outside the SFHA (see Table S13 in Supporting Information S1). In grid cells both with and without NFIP policies, the average standard error is 0.10 (see Table S14 in Supporting Information S1). These results suggest that our models exhibit similar variation in modeled predictions across the SFHA boundary and in locations with and without NFIP policy coverage.
Performance Comparison of Flood Extent Estimates by Source and Event
Overall, the FLDEX estimates consistently achieve high TNRs and higher TPRs relative to the other sources of flood extent across six events (Figure 5). The FLDEX estimates have lower FPRs compared to all other sources, meaning they misclassify fewer flood absence locations. FLDEX estimates also have lower FARs than both process-based and remote sensing observational model outputs, correctly identifying more flood presence locations as TP than FP. Among the comparison sources, the remote sensing flood extent estimates consistently show the lowest TPRs, indicating that they detect fewer instances of flooding at presence locations. Estimates from process-based models show relatively higher TPRs, indicating that they correctly predict flooding at presence locations. FARs for process-based models fall between those produced by remote sensing flood extent estimates and FLDEX estimates, suggesting they slightly overpredict total flood extent by incorrectly predicting flood presence at absence locations. These findings are consistent when sampled policies are removed and across developed and undeveloped land cover classes (see Tables S15–S16 in Supporting Information S1).
[Figure 5 omitted. See PDF.]
Figure 6 compares predicted flood extents and building exposure across FLDEX events and the comparison sources. The process-based model, SFINCS, predicts the largest flood extents (Figure 6a) and consistently high flood exposure (Figure 6b) relative to all other sources for the events where it is available (Florence, Floyd, and Matthew). The FLDEX and SFINCS model domains are similar (see Figure S6 in Supporting Information S1). In contrast, UAVSAR and FLDEX predict similar flood extents (Figure 6a), even though the UAVSAR domain is much smaller. During Hurricane Florence, UAVSAR flights were commissioned to collect data for multiple days over areas with significant impacts from the storm (Wang et al., 2022), which may explain the high levels of building exposure (Figure 6b) relative to predicted flood extent and domain size (Figure 6a). The MODIS extents predict slightly more flood exposure than FLDEX estimates for the smallest comparison events (Ida and Ernesto); however, MODIS extents also produce more FP (Figure 5b). Across the six events, all sources predict similar flood extents in developed land use areas, whereas there is significant variability in the extent of flooding predicted in undeveloped areas (see Table S17 in Supporting Information S1).
[Figure 6 omitted. See PDF.]
Spatial Extent of Historical Flood Exposure
Across all 78 FLDEX events, we estimate 90,116 buildings flooded at least once, with 23% experiencing repetitive flood exposure (see Figure S9 in Supporting Information S1). Figure 7 shows the locations of buildings that experienced repetitive flood exposure across the 78 FLDEX events. We estimate that 69,719 (1.81%) of all buildings in the study area flooded in exactly one of the 78 events and 20,397 (0.53%) of all buildings flooded more than once. For residential buildings, 49,256 (1.60%) flooded exactly once and 16,514 (0.54%) flooded more than once. Among buildings with repetitive exposure, the average return interval between flood events is 9.6 years. However, the timing varies widely, with 25% of repetitively flooded buildings experiencing a second flood within 3 years. When examining flood exposure across the 44 FLDEX events with TC activity, we estimate that 61,072 (1.58%) of all buildings flooded in exactly one TC event and 19,452 (0.50%) flooded in multiple TC events. For residential buildings across TC events, 46,039 (1.50%) flooded in exactly one TC event, while 16,196 (0.53%) flooded in multiple TC events (see Table S18 in Supporting Information S1).
[Figure 7 omitted. See PDF.]
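The return-interval statistics follow from the gaps between successive flood events at each building, as in the sketch below; the `exposure` data frame, with `building_id` and `event_date` columns, is an assumed input.

```r
# Minimal sketch of the return-interval calculation for repetitively
# flooded buildings. `exposure` is an illustrative data frame with one row
# per (building_id, event_date) pair flagged as flooded.
library(dplyr)

gaps <- exposure %>%
  arrange(building_id, event_date) %>%
  group_by(building_id) %>%
  filter(n() > 1) %>%                  # repetitively flooded buildings only
  mutate(gap_years = as.numeric(event_date - lag(event_date)) / 365.25) %>%
  ungroup() %>%
  filter(!is.na(gap_years))            # drop each building's first event

mean(gaps$gap_years)              # average interval between successive floods
quantile(gaps$gap_years, 0.25)    # share of buildings re-flooded quickly
```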
Figure 8 shows the proportion of flooded buildings inside and outside the SFHA across the 78 FLDEX events. We estimate that 43% of all buildings and 35% of residential buildings that flooded between 1996 and 2020 are located outside of the SFHA. In comparison, only 29% of structures associated with NFIP claims during this period are located outside of the SFHA. We estimate a higher rate of repetitive flooding inside the SFHA than outside of it: 33% of buildings (34% of residential buildings) inside the SFHA flooded more than once, versus 9% of buildings (9% of residential buildings) outside the SFHA. These rates differ from what would be estimated using buildings associated with NFIP claims alone: 45% of buildings inside the SFHA and 27% of buildings outside of the SFHA flooded more than once (see Table S19 in Supporting Information S1). The overall proportion of buildings associated with multiple NFIP claims is 40%, whereas we estimate repetitive flood exposure at 23% of buildings identified as flooded. Additionally, a higher proportion of repetitively flooded buildings associated with NFIP claims are located outside the SFHA—20%, versus our estimate of 17%.
[Figure 8 omitted. See PDF.]
Discussion
We identify and model 78 flood events that occurred in eastern NC between 1 January 1996 and 30 September 2020 to create a Flood Extent Archive (i.e., FLDEX) that is spatially and temporally complete. Our event-based models perform well with most performance scores representing “good” performance when averaged across all models (de Hond et al., 2022). In addition, our models outperform comparison sources of flood extent estimates when evaluated against observed flood damage presence and absence points from six previous flood events. Using FLDEX, we estimate that 90,116 buildings (73% of which are residential) flooded at least once and 20,397 buildings (81% of which are residential) flooded more than once. Additionally, we predict more than twice the number of flooded buildings compared to those at addresses associated with NFIP claims filed between 1996 and 2020.
Our event-based flood reconstructions provide significantly more insight into flood exposure than spatially aggregated records, revealing both the extents and locations of past flood events. Similar to previous studies (Galloway et al., 2018; Wing et al., 2020), we find that a substantial portion of past flood exposure—43% of all buildings and 35% of residential buildings—occurred outside of the SFHA. However, unlike previous studies, the spatially and temporally explicit archive we generate in this study provides an opportunity to analyze relative exposure and its potential influence on insurance uptake. Notably, we predict that a much larger number of buildings have flooded outside the SFHA than evidenced by the NFIP claims data set—38,554 versus 11,833—even though our models underestimate exposure compared to flood extent estimates from process-based models and remote sensing sources. When comparing the rates of repetitive flooding using FLDEX against those reported in our address-level NFIP claims records, we predict a similar number of repetitively flooded buildings outside of the SFHA—3,354 versus 3,165. However, the proportion of repetitively flooded buildings outside of the SFHA estimated using our models is lower than the proportion calculated using buildings associated with NFIP claims—9% versus 27%. This suggests that local knowledge of flood-prone areas, even in places not officially recognized as hazardous (i.e., outside the SFHA), may be driving higher flood insurance uptake in these areas, leading to higher rates of repeat claims. Overall, we predict flooding at 22,700 additional buildings inside of the SFHA and 26,721 additional buildings outside of the SFHA relative to NFIP claims records, indicating that substantial portions of previously unknown flood exposure exist within and beyond the regulatory floodplain. These data could guide the development of targeted insurance subsidy programs, particularly in areas with high rates of repetitive exposure and those that are home to low-income households who may struggle to afford flood insurance.
Data on past flooding is scarce and often not easily accessible to the general public (Fairweather et al., 2024; NRDC, 2024; Wing et al., 2021). Improved access to publicly available data on past flood exposure could empower prospective homeowners and renters with better knowledge of flood risks, potentially encouraging household protection and adaptation behaviors, including through the purchase of insurance. Flood insurance is often viewed as the "first line of defense" against flood-related financial losses (FEMA, 2015; Lightbody, 2017). It plays a crucial role in recovery by covering direct repair costs and protecting households from deeper financial strain that can lead to serious risks like mortgage defaults and property abandonment (Kousky et al., 2020; Thomson et al., 2023; You & Kousky, 2024). While flood insurance purchase is mandatory at properties with a federally backed mortgage within the SFHA (FEMA, 2024a), it is voluntary for properties outside the SFHA and for renters and homeowners without mortgages. Recent estimates find that approximately 48.3% of households within the SFHA held flood insurance in 2019, while the uptake rate among voluntary purchasers was only 2.2% (Bradt et al., 2021). Other research has shown that local knowledge or experience can lead to increases in voluntary insurance purchases (Choi et al., 2024; Gallagher, 2014; Shao et al., 2017). Thus, increasing the availability of data on previous flooding could lead to improved insurance uptake rates.
Flood insurance uptake varies by location, knowledge of local risk, ability to afford coverage, and socio-demographic factors (Atreya et al., 2015; Brody et al., 2017; Dixon et al., 2006), all of which can introduce bias into flood exposure estimates derived from NFIP claims and policies data. We investigate flood exposure in one county to quantify model sensitivity to this bias. We find that while our model slightly underpredicts flooding outside the SFHA, it also predicts a higher proportion of flooding outside the SFHA than the proportion of NFIP claims filed outside the SFHA. This suggests that even though our models likely underestimate exposure, they are still able to predict flooding in areas outside of the SFHA with low, or no, insurance penetration. We also demonstrate that our models outperform all other available process-based and remote sensing model outputs when compared against observations of flood presence and absence, both inside and outside of the SFHA. Notably, flood extent estimates from process-based models tended to predict flood exposure at many locations where no flood damage was reported, but this may be driven in part by the method we used to estimate flood exposure (i.e., we did not incorporate information on FFE). In contrast, estimates from remote sensing observation-based models tended to underpredict flood exposure at many locations where flood damage was reported.
There are other sources of uncertainty that are worth noting. First, our estimates of flood extents are sensitive to the cutoff threshold used to convert continuous flood damage probabilities to binary flood extents. Model performance by event is similar across all thresholds, yet estimates of flood extent and exposure vary substantially. We select a conservative threshold, and therefore likely underestimate flood extents and building exposure, particularly outside of the SFHA, where insurance coverage is lower. Second, we use the same geospatial predictors to train each event model, even though some events are driven by different flood sources (e.g., storm surge, pluvial, fluvial). We include proxy variables to account for these drivers, like distance to coast and accumulated precipitation, but some smaller events still produce visible prediction irregularities in flood extents. Future research could use feature selection to refine the geospatial predictors used to model each event. Third, we did not include building-specific predictors or vulnerability measures, such as FFE, given our focus on predicting flood exposure, including damage that occurs below the first floor, at a 30 m resolution. This is in part because there is limited availability of high-quality data on FFEs for structures outside the SFHA in NC. Fourth, for smaller events, our training data are imbalanced, with many more flood absence than presence locations. Modeling smaller events at smaller scales could reduce this imbalance and provide more insight into model performance metrics like accuracy and TNR. Finally, while our models produce out-of-sample flood exposure predictions, we acknowledge that evaluating our estimates of exposure is not possible in areas with sparse NFIP claims and policies data. Future research should attempt to validate against secondary sources of flood data, like damage surveys, collected after flood events.
Despite these limitations, FLDEX contributes to efforts to better understand and characterize flood risk in eastern NC. Overall, there has been limited focus on exposure outside the SFHA, yet we show that a significant portion of past flooding has occurred in “low-risk” areas. Previous studies have used knowledge of flood exposure during past events to investigate how experiences with flood exposure influence a variety of socioeconomic outcomes (Billings et al., 2022; Deryugina et al., 2018; Frankenberg et al., 2008; Gray et al., 2014; Koslov et al., 2021; Loebach & Korinek, 2019), but the majority of past studies neglect the time-varying nature of exposure—primarily due to a lack of available data—and how trends in exposure over time can shape vulnerability and, thus, long-term outcomes. Because FLDEX captures previous and repetitive flood exposure at high spatial and temporal resolutions, the reconstructed flood extents can be used as measures of exposure to establish causal relationships between exposures and outcomes as diverse as health, decision-making, education, economic security, and displacement. In future research, we plan to leverage this data set to simulate dynamic vulnerability, enabling the measurement of consecutive and cumulative impacts at various scales, throughout the life course, and across different population groups. The causal applications of this data set present a unique opportunity to gain insights into how the dynamic nature of flood exposure can reshape vulnerability, ultimately impacting resilience in a future projected to see more frequent and severe flooding.
Conclusion
Observation-based machine learning models, trained on historical damage data, offer a robust mechanism to estimate past flood hazard extents and building exposures at high spatial and temporal resolutions. These methods are also easily translatable to other areas and could be applied nationwide with access to additional address-level NFIP data or other observations of flood damage. When compared against other available flood extent estimates created using process-based models and observation-based models trained on remote sensing data, we find that flood extents in our archive better capture observed locations of flood damage presence and absence, without generating many FP. We demonstrate that our models are robust against different sensitivity tests and potential biases in our training data. Additionally, we predict significantly more flood exposure than recorded in NFIP claims records between 1996 and 2020. Perhaps most importantly, the detailed record of past flood extents we generate here can be used to investigate the complex interactions between flood exposure, vulnerability, and risk in hotspots of previous and repeat flooding. The Flood Extent Archive (FLDEX) itself, and the information we can gain from its use, can help inform more proactive, strategic, and equitable flood risk mitigation and adaptation efforts.
Acknowledgments
We thank Jillian Evans-Strong for finding and organizing external information on the events identified in this study. We thank members of the Flood Lab at the University of North Carolina at Chapel Hill and the Economics Team at the Environmental Defense Fund for constructive discussions. We thank the University of North Carolina at Chapel Hill Research Computing group for computational support and resources. This research was supported by the National Oceanic and Atmospheric Administration (NOAA) through the NOAA Climate Adaptation Partnerships (CAP) program NA21OAR4310312 and the North Carolina Policy Collaboratory. We are grateful to the North Carolina Department of Emergency Management and FEMA Region IV for access to data.
Data Availability Statement
This analysis was conducted using R version 3.4.1. A data repository for the Flood Extent Archive (FLDEX) is available from Garcia et al. (2025). The data and example scripts that support the findings of this study are publicly available, when possible, in the data repository. Comparison data for the ADCIRC and SFINCS models are available from Grimley, Bunya, et al. (2024). Comparison data for the MODIS remote sensing based models are available from Tellman et al. (2021a). Comparison data for the UAVSAR remote sensing based model are available from Wang (2021). Address-level NFIP claims and policies data contain personally identifiable information and are not publicly available; aggregated records of NFIP insurance data are available from FEMA (2024c, 2024d).
© 2025. This work is published under the Creative Commons Attribution-NonCommercial 4.0 License (http://creativecommons.org/licenses/by-nc/4.0/).