Abstract
Wildfires have increasingly affected human and natural systems across the western United States (WUS) in recent decades. Given that the majority of ignitions are human‐caused and potentially preventable, improving the ability to predict fire occurrence is critical for effective wildfire prevention and risk mitigation. We used over 500,000 wildfire ignition records from 2000 to 2020 to develop machine learning models that predict daily ignition probability across the WUS and incorporate a wide range of physical, biological, social, and administrative variables. A key innovation of this work is the development of novel sampling techniques for representing ignition absence. Traditional purely random sampling and hyper‐sampling do not account for temporally autocorrelated factors (such as droughts, insect outbreaks, and heatwaves) or spatially autocorrelated factors (such as proximity to human settlements, infrastructure presence, and fuel type). In contrast, we introduce spatially and temporally stratified sampling of ignition absence. By drawing absence samples near the location and time of historical ignitions, we better captured the complex environmental and anthropogenic conditions associated with fire occurrence or lack thereof. Models trained without stratified sampling produced ignition probability maps that consistently overestimated fire risk during high fire danger periods, whereas models incorporating stratified fire absence samples more accurately captured the spatial and temporal variability of fire potential and achieved predictive accuracies exceeding 95%. In addition to operational utility for fire prevention and resource allocation, our approach offers insights into the drivers of wildfire ignitions and highlights the value of incorporating spatial and temporal structure in absence sampling for wildfire modeling.
Introduction
Wildfires, hereafter fires, are one class of climate-related extremes that have increasingly impacted several regions globally, including the western United States (WUS), with compound and cascading effects (Higuera et al., 2023; Modaresi Rad, Abatzoglou, Fleishman, et al., 2023; Modaresi Rad, Abatzoglou, Kreitler, et al., 2023). Fires impact a variety of systems and sectors, including the built environment (Seydi et al., 2024; Wibbenmeyer & McDarris, 2021), public health (Buchholz et al., 2022), and land management and firefighting resources (Hosansky, 2023). Fires also affect the environment in numerous ways, such as altering vegetation structure and composition, increasing the risk of post-fire debris flows, modifying snowpack dynamics, and accelerating erosion in burned areas while contributing to sedimentation in downstream channels (Buchholz et al., 2022; Williams et al., 2022).
Increasing fire risks in the WUS have been attributed to intensification of fire weather (Abatzoglou & Williams, 2016; Zhuang et al., 2021), expansion of the wildland-urban interface (Radeloff et al., 2018), and increased fuel loads due to historical fire suppression (Boisramé et al., 2022). Strategies for mitigating increased fire risk include fuel treatments around values-at-risk and in critical ecological areas (Chung, 2015; Finney, 2001), home hardening to enhance the structure's ability to survive fire (Kodur et al., 2020), and fire prevention efforts (Edgeley et al., 2025). Fire prevention is among the least costly and most effective strategies in the WUS (Calkin et al., 2023) given that over 60% of all fire ignitions in the region are human-caused and potentially preventable. Because human-caused ignitions typically occur closer to human settlements and values-at-risk, they tend to be more destructive (Kumar, 2025) and more intense (Hantson et al., 2022).
Studies have focused on understanding the patterns and drivers of fire ignitions (Balch et al., 2017; Mann et al., 2016; Syphard et al., 2007). Weather conditions have been associated with fuel flammability and human outdoor activities, contributing to seasonal and interannual variability in the number and location of fire ignitions (Finney et al., 2011; Littell et al., 2016; Noonan-Wright et al., 2011). Widespread drying and increases in the number of critical fire weather days (Alizadeh et al., 2023; Khorshidi et al., 2020), in conjunction with human factors such as the expansion of the wildland-urban interface (Radeloff et al., 2018), increasing population size, and fire prevention efforts, have been linked with long-term trends in the number of ignitions (Nagy et al., 2018; Noonan-Wright et al., 2011). Although there is no widespread increase in the number of ignitions across the West (some studies show declines in the number of human-ignited fires; Jorge et al., 2025; Pourmohamad, Abatzoglou, et al., 2025, Pourmohamad, Sadegh, & Abatzoglou, 2025; Syphard et al., 2025), suggesting that various fire prevention efforts have been effective, the persistence of high-impact, human-ignited fires suggests that more fire prevention is needed. Essential to effective fire prevention is identifying not only where and when ignitions occur, but also what drives such variability.
In this paper, we develop machine learning models to predict daily fire ignition occurrence and investigate their drivers across the WUS. Such information is essential for understanding fire risk; prioritizing regional fuel management, fire suppression resources, and fire prevention efforts; and preparing communities for fire emergencies (Chen & Jin, 2022; Di Giuseppe et al., 2025). To train these models, we use extensive historical fire ignition records, and propose a novel method to develop robust samples that indicate absence of fire (Jiménez-Ruano et al., 2022). Our absence-of-fire samples not only are randomly distributed across space and time, but reflect conditions generally similar to those of ignition presence, with subtle yet important differences. We also evaluate the relative influence of various drivers on ignition occurrence. Gaining insight into the spatial and temporal patterns of fire ignitions is essential for informing fire and land management strategies aimed at reducing ignition risk (Chen et al., 2021; Faivre et al., 2014). Finally, we investigate the applicability of these models, which are trained on point samples, to develop gridded ignition probability maps for the WUS, and conduct a sensitivity analysis of these models to grid sizes.
Data and Methods
Our fire occurrence prediction framework (Figure 1) integrates a variety of factors identified in the literature as influences on fire ignition probability, including weather, fire danger, climate, land cover, topography, social factors, population density, and management. In the following sections, we first describe the samples representing fire presence and absence, and the attributes used to distinguish between them. We then describe our models and their evaluation, followed by a brief discussion of how our predictive maps of ignition probability are generated.
[IMAGE OMITTED. SEE PDF]
Data
Data to Train, Validate, and Test Models
Presence of Ignition
We obtained fire records from the FPA FOD-Attributes data set (Pourmohamad et al., 2024), which augments the sixth version of the Fire Program Analysis-Fire Occurrence Database (FPA FOD v6; Short, 2014, 2022) with nearly 270 physical (e.g., weather, climate, topography, infrastructure), biological (e.g., land cover), social (e.g., population density, social vulnerability index (SVI)), and administrative (e.g., national preparedness level (NPL), jurisdiction) attributes that coincide with the date and location of each ignition. This data set contains information on the location, discovery time, cause, and final size of >2.3 million fires in the United States from 1992 to 2020. Of these fires, 752,461 occurred in the WUS (Arizona, California, Colorado, Idaho, Montana, Nevada, New Mexico, Oregon, Utah, Washington, and Wyoming), with final sizes from <0.1 acres to 1,068,802 acres. Here, we address fire occurrence without regard for fire size.
Absence of Ignition
We created three sets of samples that represent absence of ignition, and compiled physical, biological, social, and administrative attributes associated with each point-date.
- Hyper-sampling: We created a 4 km grid mesh over the WUS (193,545 grid cells). Because fire occurrence peaks during summer and is minimal during winter in the WUS (Table S1 in Supporting Information S1) (Podschwit & Cullen, 2020), we randomly selected four dates in summer, one date in winter, and two dates each in spring and autumn (nine dates per year for each year from 2000 through 2020; the timeline was dictated by data availability, as detailed later). For each date, we randomly selected one point in each grid cell to ensure a uniform spatial distribution of the samples. This resulted in 36,580,005 (193,545 × 9 × 21) samples of ignition absence (Figure 2).
- Spatially Stratified Sampling (SSS): A majority of previous studies used either entirely random sampling (Zaidi, 2023) or a version of hyper-sampling (Moradi et al., 2024) to develop ignition absence samples. In contrast, we used spatially stratified sampling (SSS) and temporally stratified sampling (TSS) to better capture the nuances in the attributes associated with the location and date of ignition presence and absence. By investigating these stratified samples, one can better understand the exact underlying conditions that lend themselves to a fire occurrence. For SSS, we sampled 9 dates (4 in summer, 2 in fall, 2 in spring, and 1 in winter) at the exact location of each historical fire occurrence but on dates other than the ignition date. This resulted in 6,772,149 (752,461 × 9) SSS ignition absence samples (Figure 2).
- Temporally Stratified Sampling (TSS): We randomly sampled points that coincided with the date of an ignition and in close proximity to the ignition location, but not at the exact location. We randomly sampled points within a 4 km (TSS-4) radius of the ignition location, and then within a 15 km radius (TSS-15). This resulted in 752,461 ignition absence samples for each of the TSS-4 and TSS-15 sets (Figure 2).
Ignitions are more likely to occur in certain locations than others due to a variety of factors, including human activity and infrastructure, fuel type, and moisture regime. Similarly, specific periods—such as the Fourth of July for fireworks ignitions and growing season tails for ignitions caused by debris burning—are more prone to new ignitions. Hyper-sampling methods implicitly assume that ignition probability is uniformly distributed across space and to some extent time, although constraints on temporal distributions try to alleviate this shortcoming. In contrast, SSS and TSS techniques account for the spatial and temporal variability in ignition probability, thereby providing a more representative sample of ignition absence cases.
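The three absence-sampling schemes described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the meteorological season boundaries, the choice to draw SSS dates from the ignition year, and the flat-earth distance approximation are all assumptions made for illustration.

```python
import math
import random
from datetime import date, timedelta

# Assumed meteorological seasons; the text does not define season boundaries.
SEASON_MONTHS = {"winter": (12, 1, 2), "spring": (3, 4, 5),
                 "summer": (6, 7, 8), "autumn": (9, 10, 11)}
# Dates drawn per season, as described in the text (4 + 2 + 2 + 1 = 9).
DATES_PER_SEASON = {"summer": 4, "spring": 2, "autumn": 2, "winter": 1}
KM_PER_DEGREE = 111.32  # approximate length of one degree of latitude


def seasonal_dates(year, rng, exclude=None):
    """Draw nine distinct dates in `year`, weighted by season, skipping `exclude`."""
    days = [date(year, 1, 1) + timedelta(d) for d in range(366)]
    days = [d for d in days if d.year == year and d != exclude]
    picked = []
    for season, n in DATES_PER_SEASON.items():
        pool = [d for d in days if d.month in SEASON_MONTHS[season]]
        picked.extend(rng.sample(pool, n))
    return sorted(picked)


def hyper_sample(cell_bounds, dates, rng):
    """Hyper-sampling: one uniformly random point per grid cell per date."""
    return [(rng.uniform(x0, x1), rng.uniform(y0, y1), d)
            for d in dates
            for (x0, y0, x1, y1) in cell_bounds]


def sss_samples(lat, lon, ignition_date, rng):
    """SSS: the ignition location on nine dates other than the ignition date.

    Drawing the dates from the ignition year is an assumption.
    """
    dates = seasonal_dates(ignition_date.year, rng, exclude=ignition_date)
    return [(lat, lon, d) for d in dates]


def tss_sample(lat, lon, ignition_date, radius_km, rng):
    """TSS: the ignition date at a random nearby point (never the exact point)."""
    # The square root spreads points uniformly over the disc area;
    # 1 - random() lies in (0, 1], so the offset is never exactly zero.
    dist = radius_km * math.sqrt(1.0 - rng.random())
    bearing = rng.uniform(0.0, 2.0 * math.pi)
    # Flat-earth approximation, adequate for offsets of a few kilometres.
    dlat = dist * math.cos(bearing) / KM_PER_DEGREE
    dlon = dist * math.sin(bearing) / (KM_PER_DEGREE * math.cos(math.radians(lat)))
    return (lat + dlat, lon + dlon, ignition_date)
```

Calling `tss_sample(..., radius_km=4.0, ...)` and `tss_sample(..., radius_km=15.0, ...)` once per ignition would mirror the TSS-4 and TSS-15 sets, each with one absence sample per fire record.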
[IMAGE OMITTED. SEE PDF]
Data to Assess Predictive Models
We develop two sets of samples, beyond formal training, validation, and test data, to assess the predictive model. We first develop daily time series of attributes from 2000 to 2020 for a set of points to assess the dynamics of ignition probability over time. Next, we develop gridded maps of attributes at certain dates to assess the spatial distribution of ignition probabilities. The latter case also helps to evaluate whether a point model can develop predictive maps, and if so, to identify the grid resolution that can accurately represent attributes of the point samples used to train the models.
- Time series data: To assess how fire ignition probability changes over time, we selected the locations of 13 historical fires. We ensured that these fires were distributed over the entire WUS (Figure S1 in Supporting Information S1) and that each had a different ignition cause (arson or incendiarism; debris and open burning; equipment and vehicle use; firearms and explosives use; fireworks; data missing, not specified, or undetermined; misuse of fire by a minor; natural; other causes; power generation, transmission, or distribution; railroad operations and maintenance; recreation and ceremony; and smoking). We extracted daily physical, biological, social, and administrative attributes at each location from 2000 to 2020.
- Map data: To assess whether point models can develop ignition probability maps, and if so, to determine the optimal grid size, we developed four sets of gridded data with different grid sizes (4 km, 1 km, 250 m, and 30 m). In each case, we acquired physical, biological, social, and administrative attributes associated with the centroids of the grids (Figure S2 in Supporting Information S1). This exercise aims to strike a balance in grid resolution: although finer resolutions more effectively capture local, bottom-up factors (e.g., land cover, population density), their information content is limited by the uncertainty in the locations of ignitions reported in the FPA FOD data set. We selected 5 September 2019, with 323 reported fires across the WUS, and 4 July 2020, with 250 reported fires across the WUS, to develop fire probability maps.
Attributes Used in the Predictive Model
Following Pourmohamad, Abatzoglou, et al. (2025) and Pourmohamad, Sadegh, and Abatzoglou (2025), we selected 29 attributes with the highest predictive power for fire ignition modeling and the least collinearity. These attributes are listed in Table S2 in Supporting Information S1, and are summarized below:
- General information: Discovery day of year (DOY), fire year, and ignition cause. DOY captures intra-annual seasonality of ignitions, such as an increase in the number of fire ignitions around the Fourth of July, US Independence Day (Balch et al., 2017). Fire year acts as a proxy for long-term trends in fire ignitions, such as those due to fire prevention efforts. Each ignition cause is associated with distinct attributes (Pourmohamad, Abatzoglou, et al., 2025, Pourmohamad, Sadegh, & Abatzoglou, 2025). The National Wildfire Coordinating Group provides a list of 13 causes (National Wildfire Coordinating Group, 2025); here, we classify fires as all fires, naturally ignited fires, and human-ignited fires.
- Weather and fire danger indices: Daily precipitation, wind speed, minimum temperature, 100- and 1000-hr dead fuel moisture, burning index, vapor pressure deficit (VPD), energy release component (ERC), and ERC percentile. Fuels are more receptive to ignitions during dry-hot conditions, and wind enhances the likelihood of certain ignition causes, such as power-started fires (Balch et al., 2024). We note that there is a level of correlation between these attributes, but opted to include them in our model since each conveys nuanced information that can inform our predictive models. For example, ERC and VPD provide complementary information given that ERC evolves slowly whereas VPD evolves quickly. Daily precipitation enables differentiation between dry and wet lightning, which are associated with widely different probabilities of fire ignition (Kalashnikov, 2024).
- Climate attributes: Long-term average annual precipitation, temperature, and reference evapotranspiration. Climatic attributes largely shape fuel availability and type, and therefore the fire regimes in each region (Pausas & Paula, 2012).
- Topography and land cover: Average elevation, aspect, slope, and topographic position index within a 1 km radius of the ignition point, and existing vegetation cover and fire regime group (FRG) at the point of ignition. Topography is highly correlated with the type of ignition; for example, lightning-caused ignitions generally occur at higher elevations and slopes compared to human-caused ignitions (Narayanaraj & Wimberly, 2012). We selected average topographical indices within 1 km of the ignition point to address uncertainties in the reported ignition locations (Short, 2015). The 1-km radius also contributed to generalizing the point model to develop ignition probability maps (details later). Finally, existing vegetation cover helps capture different flammability levels among fuels and represents land-cover types such as roads (Fares et al., 2017).
- Social factors: Annual gross domestic product, global human modification (GHM), average human population density within a 1 km radius of the ignition point, and overall SVI. Various social attributes have been linked with human ignitions of fire (Flanagan et al., 2018). For example, moderate population densities were linked with a higher probability of ignition, whereas lower and higher population densities translate to absence of humans and limited vegetation cover, respectively, and hence were associated with lower ignition probabilities (Syphard et al., 2008).
- Administrative factors: Suppression difficulty index, FRG, management unit, protected area status (GAP Status Code), NPL, and FIPS code. Land and fire management practices impact ignition loading (Keane, 2012). The number of human-caused ignitions is lower in protected areas (Nelson & Chomitz, 2011). NPL represents the national availability of fire suppression resources, with higher values indicating coincidence of multiple large fires across the country and strained suppression resources (Abatzoglou et al., 2021). FIPS code acts as a proxy for local factors such as demographics, suppression resources, and level of recreational activities, among others, which are relevant to ignition modeling but are not readily available as independent data to be included in the model.
Modeling
We developed two ensemble machine learning models (Random Forest and XGBoost) and two deep learning models (one-dimensional convolutional neural networks, with convolution over attributes; CNN-1D with Keras and PyTorch; Figure S3 in Supporting Information S1) to estimate ignition probability in three ignition classes: all ignitions (regardless of reported cause), human-caused ignitions (12 classes of human-caused ignitions combined), and natural ignitions (almost entirely lightning-caused in the WUS). We used data from 2000 to 2018 to train, validate, and test our models, and used data from 2019 to 2020 as additional test cases (extra test). We did not use data before 2000 because certain data attributes (such as population density) were unavailable before 2000. We used dummy encoding to convert string-type attributes to numeric values. Importantly, we removed fire records that met three criteria: (a) reported by county or local agencies, (b) less than 0.1 acre in final size, and (c) associated with developed land cover with low, medium, or high development intensity. We deemed the uncertainty in reporting of these fires unacceptable due to temporal changes in governmental fire reporting procedures (Jorge et al., 2025). This process decreased the total number of ignition records between 1992 and 2020 from 752,461 to 574,522, of which 383,872 occurred between 2000 and 2020 and were used in our modeling effort. We divided the 2000–2018 machine learning-ready ignition records and ignition absence samples into training (∼65%), validation (∼15%), and test (∼20%) data (Table S3 in Supporting Information S1). We used training data to learn model parameters, validation data to learn model hyperparameters, and test data to assess model accuracy. We classified samples with an ignition probability over 50% as ignition incidents. This threshold was selected through fine tuning of hyperparameters.
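The data preparation steps in this paragraph (dummy encoding, the 2000–2018 split with 2019–2020 held out, and the 50% decision threshold) can be sketched as follows. This is an illustrative outline only: the function names and the `year` record key are hypothetical, the random shuffle is an assumption (the text does not say how records were assigned to each split), and dropping the first category level is one common convention for dummy encoding.

```python
import random


def dummy_encode(values):
    """Map string categories to 0/1 indicator columns, dropping the first level."""
    levels = sorted(set(values))
    kept = levels[1:]  # the first level becomes the all-zeros baseline
    rows = [[1 if v == lvl else 0 for lvl in kept] for v in values]
    return rows, kept


def split_records(records, seed=0):
    """Hold out 2019-2020 as extra test data; shuffle 2000-2018 into ~65/15/20."""
    core = [r for r in records if 2000 <= r["year"] <= 2018]
    extra = [r for r in records if r["year"] in (2019, 2020)]
    random.Random(seed).shuffle(core)  # assumed random assignment
    n_train = round(0.65 * len(core))
    n_val = round(0.15 * len(core))
    train = core[:n_train]
    val = core[n_train:n_train + n_val]
    test = core[n_train + n_val:]
    return train, val, test, extra


def classify(probabilities, threshold=0.5):
    """Label a sample as an ignition when its probability exceeds the threshold."""
    return [1 if p > threshold else 0 for p in probabilities]
```

Keeping 2019–2020 entirely outside the shuffle is what makes those years a genuinely separate check on how the model generalizes to unseen years.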
We also performed a sensitivity analysis to examine how the model's performance changed with different cutoff thresholds used to distinguish between ignition presence and absence. We found a higher probability threshold to marginally reduce false positives at the expense of a large increase in false negatives (Figure S4 in Supporting Information S1).
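The trade-off described here can be made concrete with a small threshold sweep. The labels and probabilities below are invented toy values, not results from the study, and the function names are hypothetical; the sketch only shows how false positives and false negatives are counted at each cutoff.

```python
def confusion_at(y_true, probs, threshold):
    """Return (false positives, false negatives) at a probability cutoff."""
    fp = sum(1 for y, p in zip(y_true, probs) if y == 0 and p > threshold)
    fn = sum(1 for y, p in zip(y_true, probs) if y == 1 and p <= threshold)
    return fp, fn


def sweep(y_true, probs, thresholds=(0.3, 0.5, 0.7, 0.9)):
    """Count errors at each cutoff to expose the FP/FN trade-off."""
    return {t: confusion_at(y_true, probs, t) for t in thresholds}


# Toy data for illustration only: 0 = ignition absence, 1 = ignition presence.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_probs = [0.05, 0.10, 0.20, 0.55, 0.45, 0.60, 0.80, 0.95]
errors_by_threshold = sweep(toy_labels, toy_probs)
```

In this toy case, raising the cutoff from 0.5 to 0.9 removes a single false positive while adding two false negatives, which is the same qualitative pattern the sensitivity analysis reports.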
We used a Bayesian optimizer and cross-validation to learn hyperparameters for each model. The training data had a significant class imbalance, with ignition absence incidents vastly outnumbering ignition presence incidents (98.9% vs. 1.1% of all samples, respectively). We developed separate models both with and without class weights to assess the potential impacts of class imbalance on the model performance. We then selected the best model on the basis of precision, recall, F1 score, and overall accuracy. We refrained from additional input data preprocessing, including temporal detrending, smoothing, or using a moving average, and spatial blocking. Specifically, temporal trends in the observed ignitions respond to interannual climatic fluctuations, the chronology of adoption of fire prevention strategies, and other factors that are directly or indirectly included in the modeling framework. Similarly, potential clustering of ignitions can be explained by underlying governing factors that are explicitly provided to the model.
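To give a sense of what the 98.9% versus 1.1% imbalance implies, the sketch below computes inverse-frequency class weights and the accuracy of a trivial classifier that always predicts absence. The weighting formula is a widely used heuristic and is an assumption here; the text does not state how the class weights were derived, and the function names are hypothetical.

```python
def balanced_class_weights(n_absence, n_presence):
    """Weight each class inversely to its frequency (a common heuristic)."""
    total = n_absence + n_presence
    return {
        "absence": total / (2 * n_absence),
        "presence": total / (2 * n_presence),
    }


def majority_baseline_accuracy(n_absence, n_presence):
    """Accuracy of a classifier that always predicts ignition absence."""
    return n_absence / (n_absence + n_presence)


# Class shares reported in the text, scaled to 1,000 samples for illustration.
weights = balanced_class_weights(989, 11)
ratio = weights["presence"] / weights["absence"]  # equals 989 / 11
baseline = majority_baseline_accuracy(989, 11)
```

Because the always-absence baseline already reaches 98.9% accuracy, overall accuracy alone says little about how well ignitions are detected, which is why precision, recall, and F1 score are reported alongside it. In XGBoost, a comparable reweighting is commonly applied through the `scale_pos_weight` parameter; whether that parameter was used in this study is not stated.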
We used the ignition absence samples to develop two categories of models. In the first category, Ignition Model (IM), we included the samples representing absence of fire derived from hyper-sampling (Figure S5 in Supporting Information S1). In the second category, Augmented Ignition Model (AIM), we included the absence of fire samples from hyper-sampling and from SSS and TSS-4 (Figure S6 in Supporting Information S1). In each category, we kept 20% of the fire absence samples for model testing. Furthermore, we withheld the entire group of TSS-15 data from the training process for additional testing. We used Shapley Additive explanations (SHAP) to calculate global feature importance, a measure of each feature's overall contribution to the model's predictive power. This metric is crucial for model interpretation and identifying the underlying relationships between features and the target variable (Nohara et al., 2022).
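Global feature importance is commonly obtained from SHAP by averaging the absolute SHAP value of each feature over all samples. The sketch below illustrates that aggregation on a toy linear model, for which SHAP values have the closed form w_j (x_j − mean(x_j)) when features are treated as independent. The linear model, feature names, and numbers are assumptions for illustration; the study's XGBoost models would require a tree-based explainer instead.

```python
from statistics import mean


def linear_shap(X, weights):
    """Closed-form SHAP values for a linear model with independent features."""
    means = [mean(col) for col in zip(*X)]
    return [[w * (x - m) for x, w, m in zip(row, weights, means)] for row in X]


def global_importance(shap_rows, names):
    """Rank features by mean absolute SHAP value across all samples."""
    scores = {
        name: mean(abs(v) for v in col)
        for name, col in zip(names, zip(*shap_rows))
    }
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


# Toy inputs (hypothetical): two features, three samples.
toy_X = [[1.0, 10.0], [2.0, 10.0], [3.0, 40.0]]
toy_weights = [2.0, 0.05]
toy_names = ["vpd", "elevation"]
toy_shap = linear_shap(toy_X, toy_weights)
toy_ranking = global_importance(toy_shap, toy_names)
```

Each row of SHAP values sums to the difference between that sample's prediction and the average prediction, so the global ranking summarizes how strongly each feature moves predictions away from the baseline.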
Interested readers are referred to Text S1, Table S4, and Figures S7 and S8 in Supporting Information S1, which describe machine learning versus weather-based ignition modeling.
Results and Discussion
Attributes Associated With Ignition Presence and Absence
Kernel density distributions of attributes associated with presence and absence of fire ignitions (Figure 3) have similarities and distinctions that can be used in machine learning models to predict an ignition or lack thereof. Here, ignition absence includes both hyper-sampled and stratified (SSS, TSS-4) samples. Differences were more pronounced for weather-related attributes: ignition presence samples (all ignitions in Figure 3) were associated with drier and hotter conditions than ignition absence samples. For example, daily VPD (Figure 3a) and daily minimum temperature (Figure 3c) distributions shifted toward higher values, and 1,000-hr dead fuel moisture distribution shifted toward lower values (Figure 3b), for samples associated with presence of ignitions. This pattern is consistent with expectations given the higher likelihood of ignitions in drier and hotter conditions. Differences were less pronounced for other attributes, but still apparent for elevation (Figure 3g), aspect (Figure 3i), and global human modification index (GHM; Figure 3j). These differences are to some extent explained by the geography of fire occurrences (mainly human-caused ignitions, but also natural fires following mountain patterns; Pourmohamad et al., 2024) and by our hyper-sampling method, which generated a majority of the ignition absence samples and drew them uniformly across the WUS. Figure S9 in Supporting Information S1 summarizes similarities and differences among the 27 numerical attributes associated with presence and absence of ignitions.
[IMAGE OMITTED. SEE PDF]
Nuanced differences emerged when separating natural from human-caused ignitions (Figures S10 and S11 in Supporting Information S1). Specifically, distributions of weather-related attributes associated with natural ignitions were more distinct from absence of fire samples (Figure S11 in Supporting Information S1) than those associated with human-caused ignitions (Figure S10 in Supporting Information S1). This difference is consistent with the concentration of natural ignitions in summer as compared to the expanded spatial and temporal presence of human-caused ignitions (Balch et al., 2017). These differences were also observed for topographical attributes, with natural ignitions occurring at higher elevations and on steeper slopes than human-caused ignitions. The GHM distribution for natural ignitions, unlike that for human-caused ignitions, closely resembled the distribution of fire absence samples. Human-caused ignitions tend to occur in more developed areas and natural ignitions in less developed areas, and a majority of our fire absence samples were uniformly drawn from across the WUS, much of which is less developed. Finally, the distributions of climate attributes (annual temperature and reference evapotranspiration) for natural ignitions were more closely aligned with those of fire absence samples, as compared to human-caused ignitions (Figures S10 and S11 in Supporting Information S1), which we believe is due to differences in the spatial distributions of human-caused and natural ignitions.
Model Accuracy
We first used ignition presence samples (all causes) and the hyper-sampling-derived fire absence data to train two ensemble machine learning and two CNN-1D deep learning models. XGBoost provided the highest overall accuracy over the 20% out-of-sample test data among all models (Table S5 in Supporting Information S1). Our prior modeling efforts (Pourmohamad, Abatzoglou, et al., 2025, Pourmohamad, Sadegh, & Abatzoglou, 2025; Seydi et al., 2024) and those of others (Shwartz-Ziv & Armon, 2022) also showed superior performance of XGBoost for tabular data. We therefore employed XGBoost for the remainder of our modeling exercises in this paper. We also tested the XGBoost models with and without class weights, which resulted in very similar overall accuracy (with weight: 99.95%, without weight: 99.96%) with marginally superior accuracy for the case without weights; we hence used the XGBoost model without weights in the remainder of our analysis. We also note that within our conditional probability framework—where ignition probability is modeled based on 29 fire-related covariates and drivers—the decision boundaries between ignition presence and absence differ across modulating conditions. The XGBoost model can effectively capture these distinctions, so the imbalance in the data is not expected to fundamentally bias the predictions toward the more frequent label.
The IM framework demonstrated robust performance on the test data, achieving high precision, recall, accuracy, and F1 score across all ignitions, natural ignitions and human-caused ignitions (84%–100%; Table 1). The model performance for the stratified fire absence data (SSS, TSS-4, and TSS-15), however, dropped to 66%–80% (Table 1). We attribute this performance drop to the divergent attributes of ignition absence samples used in training the model and those of the SSS and TSS data. The stratified ignition absence samples are spatially and/or temporally clustered around the ignition presences and hence are associated with attributes more similar to those of ignition presence. Performance of the AIM framework was marginally inferior to IM for the test data, with precision, recall, accuracy, and F1 score ranging between 81% and 100%, but AIM markedly outperformed IM on the stratified data, accurately capturing 96%–100% of fire absence samples for out-of-sample SSS, TSS-4, and TSS-15 data (Table 1). Refer to Table S6 for confusion matrices and to Figure S12 in Supporting Information S1 for Receiver Operating Characteristic curves and the Area Under the Curve for both IM and AIM frameworks and all modeling cases.
Table 1 Performance Metrics for the Ignition Model (IM) and the Augmented Ignition Model (AIM) Over Out-of-Sample and Extra Test Data
| Data set | Metric | IM: All fire causes (%) | IM: Natural causes (%) | IM: Human-caused (%) | AIM: All fire causes (%) | AIM: Natural causes (%) | AIM: Human-caused (%) |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Test data | Precision | 95.54 | 96.19 | 96.75 | 91.53 | 92.35 | 90.85 |
| Test data | Recall | 82.96 | 84.87 | 78.30 | 80.71 | 82.78 | 74.63 |
| Test data | F1 score | 88.81 | 90.18 | 86.55 | 85.78 | 87.30 | 81.94 |
| Test data | Accuracy | 99.77 | 99.91 | 99.84 | 99.71 | 99.89 | 99.79 |
| Accuracy for stratified data | SSS | 76.04 | 76.06 | 79.87 | 96.18 | 99.95 | 99.75 |
| Accuracy for stratified data | TSS-4 | 66.60 | 70.80 | 67.15 | 96.20 | 98.31 | 96.88 |
| Accuracy for stratified data | TSS-15 | 67.71 | 70.90 | 68.09 | 96.18 | 98.26 | 96.94 |
| 2019 | Precision | 86.53 | 91.96 | 85.19 | 81.45 | 85.88 | 78.60 |
| 2019 | Recall | 83.34 | 92.56 | 76.24 | 73.88 | 79.97 | 64.94 |
| 2019 | F1 score | 84.91 | 92.26 | 80.47 | 77.48 | 82.82 | 71.12 |
| 2019 | Accuracy | 99.79 | 99.95 | 99.84 | 99.70 | 99.91 | 99.78 |
| 2020 | Precision | 69.70 | 64.50 | 72.69 | 68.68 | 56.47 | 71.24 |
| 2020 | Recall | 24.04 | 16.09 | 23.12 | 21.77 | 14.36 | 21.82 |
| 2020 | F1 score | 35.75 | 25.75 | 35.08 | 33.06 | 22.90 | 33.41 |
| 2020 | Accuracy | 99.22 | 99.80 | 99.40 | 99.22 | 99.79 | 99.41 |
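As a reading aid for Table 1, the F1 score is the harmonic mean of precision and recall, F1 = 2PR / (P + R). The short sketch below recomputes it from a few of the reported precision and recall values; the dictionary keys are labels chosen here for illustration, and small differences can arise because the published inputs are themselves rounded.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (same units as the inputs)."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F1) in percent, copied from Table 1.
reported = {
    "IM, all causes, test data": (95.54, 82.96, 88.81),
    "AIM, all causes, test data": (91.53, 80.71, 85.78),
    "IM, all causes, 2020": (69.70, 24.04, 35.75),
    "AIM, natural causes, 2020": (56.47, 14.36, 22.90),
}

recomputed = {
    label: round(f1_score(p, r), 2) for label, (p, r, _) in reported.items()
}
```

The 2020 rows show how a low recall pulls the F1 score down sharply even when precision remains moderate and overall accuracy stays above 99%.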
Model performance for both IM and AIM frameworks remained robust when applied to 2019 data, but performance metrics for both cases dropped markedly when applied to 2020 data (Table 1; extra test data). Societal disruptions and stay-at-home orders in 2020 associated with COVID-19 markedly changed human interactions with fire, causing a near record number of human-caused ignitions in 2020 (Jorge et al., 2025). Additionally, lower resource availability for fuel management and infrastructure maintenance likely contributed to the trends and patterns of fire ignitions in 2020. Nevertheless, our models were not trained with data that represent such a societal shock, and hence they were not able to accurately predict the 2020 ignitions.
Gridded Ignition Model
We used the trained models to develop predictive maps of ignition probability. Given that our model was trained on points, we executed the model for the grid centroids. In our analysis of sensitivity to grid sizes, we evaluated the percentage of ignitions that were correctly classified by our point-based model, in essence calculating Recall for the model evaluated on grids. Model performance improved with increasing grid resolution (decreasing grid sizes), achieving 76%, 81%, 83%, and 86% accuracy for 4 km, 1 km, 250 m, and 30 m grid sizes, respectively (Figure S13 in Supporting Information S1). This outcome is expected, as finer resolutions place grid centroids closer to ignition points, resulting in their associated attributes being more similar, or even identical, to those of the ignition locations. Computational costs increase by a factor of 16–70 as grid sizes decrease. Considering trade-offs between accuracy and computational cost, we selected a 1 km grid to map the ignition probability over the WUS. We developed ignition probability maps for two dates in our extra test data, 5 September 2019 (Figure 4) and 4 July 2020 (Figure S14 in Supporting Information S1), both of which were associated with a large number of ignitions.
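One plausible reading of the 16–70 cost factor is that computational cost scales with the number of grid cells, which grows with the inverse square of the grid size. The sketch below works through that arithmetic for the four grid sizes; the proportional-cost assumption and the function name are ours, not statements from the study.

```python
GRID_SIZES_M = [4000, 1000, 250, 30]  # grid sizes evaluated in the text


def refinement_factor(coarse_m, fine_m):
    """Ratio of cell counts when moving from a coarse to a fine square grid."""
    return (coarse_m / fine_m) ** 2


# Cell-count multiplier for each successive refinement step.
step_factors = {
    (coarse, fine): refinement_factor(coarse, fine)
    for coarse, fine in zip(GRID_SIZES_M, GRID_SIZES_M[1:])
}

# Rough cell count at 1 km, scaled from the 193,545 cells of the 4 km mesh.
approx_cells_1km = round(193_545 * refinement_factor(4000, 1000))
```

Under this assumption, the 4 km to 1 km and 1 km to 250 m steps each multiply the cell count by 16, and the 250 m to 30 m step multiplies it by roughly 69, which brackets the reported range.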
[IMAGE OMITTED. SEE PDF]
Although the IM framework almost perfectly identified ignitions in all classes (all ignitions, human-caused ignitions, and natural ignitions), it invariably assigned a high probability to the entire WUS on 5 September 2019 (Figure 4; left column), although small patches of low probability emerged in the human-caused ignitions model (Figure 4c). The IM yielded a low-information map due to the lack of spatial variability in the ignition probability distribution, rendering it not useful for resource allocation purposes. In contrast, the AIM framework generated a more informative ignition probability map (Figure 4; right column). The information content came at the expense of a marginal decline in the correct classification of observed ignition incidents, but the AIM still captured 82% of human-caused ignitions and 93% of natural ignitions.
The AIM of natural ignitions exhibited a distinct spatial pattern, assigning elevated ignition probabilities to mountainous regions and lower probabilities to low elevations (Figure 4f), for example, in California's Central Valley, where irrigation of agricultural land minimizes the chances of lightning-started fires. Lower elevations are also generally associated with fewer ignitions than higher elevations, although our model does not include climatological lightning data and is therefore not affected by this factor. Our AIM of natural ignitions accurately captured the cluster of lightning storm-started fires on 5 September 2019 on the border of California and Oregon (Figure 4f), as well as other scattered ignitions across the WUS, but also assigned a high ignition probability to other mountainous regions that did not experience a fire. We attribute this behavior to the lack of representation of lightning strikes in our model, since our main focus was on modeling human-caused ignitions that potentially could be prevented. In other words, our model only captured the environmental conditions that are receptive to a lightning-ignited fire, not the presence of lightning strikes.
The human-caused ignitions model also captured the geographical differences between high and low ignition likelihoods, assigning higher ignition probabilities to areas close to human settlements, roads, and infrastructure (Figure 4d). For example, foothills surrounding the Central Valley of California were assigned a high ignition probability, whereas the adjacent irrigated agricultural lands were assigned lower ignition likelihoods. High-elevation mountains also received a low probability of human-caused ignition. The AIM trained on all ignitions (Figure 4b) exhibited a more extensive spatial distribution of high ignition probabilities than the models trained on natural and human-caused ignition data separately. This is because the all-ignitions model integrates the predictive patterns of both natural and human-caused ignitions, capturing a more diverse range of factors that contribute to ignition risk. Probability maps for 4 July 2020 yielded insights similar to those discussed here (Figure S14 in Supporting Information S1).
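The trade-off between the two frameworks can be made concrete with two simple map summaries: the share of observed ignitions that fall in high-probability cells, and the share of the whole map that is flagged as high probability. The Python sketch below is illustrative only. The 0.5 threshold, the function names, and the toy numbers are assumptions and are not taken from the study.

```python
from typing import Sequence


def capture_rate(ignition_probs: Sequence[float], threshold: float = 0.5) -> float:
    """Fraction of observed ignitions whose predicted probability meets the threshold."""
    if not ignition_probs:
        raise ValueError("need at least one observed ignition")
    return sum(p >= threshold for p in ignition_probs) / len(ignition_probs)


def flagged_fraction(grid_probs: Sequence[Sequence[float]], threshold: float = 0.5) -> float:
    """Fraction of grid cells in the map flagged as high probability."""
    cells = [p for row in grid_probs for p in row]
    if not cells:
        raise ValueError("empty grid")
    return sum(p >= threshold for p in cells) / len(cells)


# Toy 3x4 probability maps (made-up numbers, not study output).
uniform_map = [[0.97, 0.95, 0.96, 0.98],
               [0.94, 0.99, 0.97, 0.96],
               [0.95, 0.96, 0.98, 0.97]]
varied_map = [[0.91, 0.12, 0.08, 0.85],
              [0.20, 0.88, 0.15, 0.10],
              [0.05, 0.30, 0.93, 0.25]]

# Probabilities each map assigns at five hypothetical observed ignitions.
uniform_at_ignitions = [0.97, 0.99, 0.98, 0.96, 0.95]
varied_at_ignitions = [0.91, 0.88, 0.93, 0.85, 0.30]

for name, grid, at_ignitions in [("uniform", uniform_map, uniform_at_ignitions),
                                 ("varied", varied_map, varied_at_ignitions)]:
    print(name, capture_rate(at_ignitions), round(flagged_fraction(grid), 3))
```

In this toy setting, the spatially uniform map captures every ignition only because it flags every cell, so it cannot help prioritize locations. The spatially varied map misses one ignition but flags a third of the cells, which is the kind of information content that matters for resource allocation.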
Given that the AIM framework was more informative than the IM, we use the AIM in the remainder of this study. We applied the AIM to a finer grid (30 m) in three case-study locations in Siskiyou County, California, on 12 March 2019 (during a period of relatively low ignition activity in the region) and 27 July 2019 (during a period of high ignition activity). The first location (Figure 5g) encompasses agricultural land and wildland vegetated areas. On 12 March, which represents wet conditions in the early spring, the AIM assigned low ignition probabilities to the wildland areas but high ignition probabilities to agricultural lands (Figure 5a). We believe the latter reflects the probability of agricultural residue burns escaping and igniting a fire. Although our model does not include any information on soil moisture or vegetation health (e.g., Normalized Difference Vegetation Index (NDVI)), we believe the model captures agricultural residue burn probabilities through the interaction of seasonality metrics, such as DOY, with land cover. On 27 July, when conditions were hot and dry, our model assigned a high probability of ignition to the wildland areas but a lower probability to the agricultural lands (Figure 5d). By this time, irrigation had started and lowered the likelihood that vegetation would ignite, and residue burns were also less likely; we believe the model captured these conditions through the seasonality metrics.
[IMAGE OMITTED. SEE PDF]
The third case study location in northern California (Figure 5i) encompasses wildland vegetation, agricultural land, and wetlands. Similar to the first case study, the AIM assigned high ignition probability to agricultural land and low ignition probability to wildlands on 12 March 2019. In this case, the model also predicted linear patterns of high ignition probability alongside roads. Although our variables do not include distance to road, our model captured higher ignition probabilities alongside roads using land cover data (Figure 5c). We emphasize that our model predicts the ignition of a fire of any final size (e.g., <0.1 acres); that is, it applies a loose threshold for what counts as an ignition occurrence. On 27 July 2019, ignition probability was high across most of the case study location, except in the wetland on the northeast side and in agricultural land. Our model could have been further improved by incorporating vegetation productivity or greenness indices, such as the NDVI; however, we did not include these metrics among our variables because their coverage was incomplete across the spatial and temporal extent of our study. The results of the second case study (Figures 5b, 5e, and 5h) were similar to those of the first and third cases.
Time Series of Ignition Probability
We used the AIM to develop daily time series of natural or human-caused ignition probabilities (Figure 6, Figure S15 in Supporting Information S1) from 2000 to 2020 for 13 locations across the WUS (Figure S1 in Supporting Information S1) that experienced a natural or human-ignited fire during that time period. The type of ignition we modeled corresponded to that of the reported fire. For a majority of the 13 time series, including that for the location that experienced a naturally ignited fire, the daily time series generally followed weather patterns, where dry and hot days were associated with a higher probability of ignition (Figure 6a). In some cases where the weather conditions were not hot and dry (low ERC), ignition probability was still high due to elevated wind speeds (Figure S15b in Supporting Information S1; fire ignited by debris burning). However, ignition probability dynamics widely diverged from weather conditions in the case of the location that observed an arson fire (Figure 6d, Figure S1 in Supporting Information S1). In that location, social factors dominated ignition probability, and the probability remained high except during some periods that were too wet and cold to be receptive to an ignition (Figure 6d).
[IMAGE OMITTED. SEE PDF]
Driving Factors of Predictions
Shapley value analysis revealed that the top four contributors to our model's classification outcomes were annual temperature, NPL, discovery DOY, and fire year, consistently for all ignitions, natural ignitions, and human ignitions (Figure 7). Annual temperature reflects a range of influences, including background climate conditions, prevailing fire regimes, and the geographic distribution of ignitions, all of which play important roles in shaping ignition patterns. Higher or lower values of annual temperature, however, were not consistently associated with a positive or negative contribution to ignition incidence (Figure 7, right column), indicating that annual temperature works in conjunction with other factors to modulate fire occurrence. Higher values of NPL, which are associated with elevated fire activity across the country and strained firefighting resources, contributed negatively to fire occurrence, lessening the probability of new ignitions. This is probably due to management factors, such as burn bans, public land closures, and enhanced social awareness during high national preparedness levels. Discovery DOY mainly captures certain celebrations, such as 4 July, but also the seasonality of ignitions, for example, natural ignitions occurring in summer months and agricultural residue burns occurring in the two tails of the growing season. Finally, the fire year represents the long-term trends in the number of ignitions due to management, fire prevention, public awareness campaigns, and population dynamics, among others (Pourmohamad, Abatzoglou, et al., 2025; Pourmohamad, Sadegh, & Abatzoglou, 2025).
[IMAGE OMITTED. SEE PDF]
The importance of subsequent drivers diverges among the models, with VPD, FIPS code, and 1000-hr dead fuel moisture (FM1000) being important for all ignitions; FIPS code, FM1000, and GHM index (GHM) for human ignitions; and daily minimum temperature, VPD, and FM1000 for natural ignitions. Weather-related attributes are important for both natural and human ignitions, but social factors represented through FIPS code and GHM are more important for human ignitions. Dry-hot values of weather indices contribute positively to fire occurrence, whereas the impacts of social factors are less direct. High values of GHM generally lead to more ignitions but also prevent ignitions in certain cases, and low values contribute negatively to fire occurrence. Increasing GHM is associated with greater human presence and a higher likelihood of ignition, up to a threshold where human density becomes high enough to limit fuel availability, thereby limiting the potential for fire ignition.
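To illustrate how a Shapley value attributes a single prediction to its inputs, and why the same feature value can contribute positively in one setting and negatively in another, the following Python sketch computes exact Shapley values by enumerating feature coalitions. It is a toy example under stated assumptions: the three-feature scoring function, the baseline, and all names are hypothetical, and brute-force enumeration is used for clarity rather than as the procedure applied in this study.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(predict: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of predict(x) relative to predict(baseline)."""
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Features in the coalition take the instance's values; the rest stay at baseline.
        return predict([x[j] if j in coalition else baseline[j] for j in range(n)])

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for members in combinations(others, size):
                s = frozenset(members)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def toy_score(v: Sequence[float]) -> float:
    # Hypothetical score: temperature (v[0]) matters only through its
    # interaction with dryness (v[1]); v[2] is an independent additive term.
    temperature, dryness, other = v
    return temperature * (dryness - 0.5) + 0.3 * other


baseline = [0.0, 0.5, 0.0]
dry_case = shapley_values(toy_score, [1.0, 1.0, 1.0], baseline)
wet_case = shapley_values(toy_score, [1.0, 0.0, 1.0], baseline)

# Same temperature value, opposite sign of contribution (about +0.25 vs -0.25).
print("dry:", [round(p, 3) for p in dry_case])
print("wet:", [round(p, 3) for p in wet_case])
```

In both cases the contributions sum to the difference between the prediction and the baseline prediction, which is the property that makes Shapley values additive attributions. The sign flip for the first feature mirrors the observation that a high or low value of one variable need not map to a consistently positive or negative contribution when it interacts with other variables.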
Conclusion
Accurate prediction of ignition likelihood is crucial for assessing fire risk, informing targeted fire prevention and fuel management strategies, and enhancing early response and community preparedness efforts (Chen & Jin, 2022). To address this need, we developed machine learning models trained on a comprehensive data set that integrated physical, biological, social, and administrative factors associated with ignition presence and absence samples. We used our models to develop ignition risk maps at daily temporal resolution and multiple spatial resolutions (4 km, 1 km, 250 m, and 30 m), providing a robust framework for ignition risk assessment and management.
A key contribution of our study is its use of spatially and temporally stratified sampling to represent fire absence conditions more accurately than conventional random or hyper-sampling approaches. Whereas random sampling implicitly assumes a uniform probability of ignition across space and time, our stratified approach acknowledges the inherent heterogeneity in ignition likelihood that is driven by factors such as human activity, infrastructure presence, fuel type, moisture regimes, and climate extremes such as prolonged droughts and heatwaves. By aligning the distribution of absence samples with the spatial and temporal patterns of reported ignitions, our method provides a more realistic and informative foundation for modeling fire occurrence. We first used hyper-sampling to generate over 36 million fire absence samples, randomly selecting points in space and time to ensure uniform coverage across the WUS, while aligning the temporal distribution with seasonal ignition patterns (i.e., more samples in summer, fewer in winter). Trained on all historical ignition samples (>500,000, after filtering out uncertain records) and the hyper-sampled ignition absence data, our IM yielded near-perfect performance. Stratified ignition absence samples were then drawn either at historical fire locations but on different dates (to capture spatial structure) or on the same dates as historical fires but at nearby locations (to capture temporal structure). When applied to these more realistic ignition absence samples, which were designed to closely resemble fire-prone conditions, the IM's performance declined substantially. Further analysis revealed that the IM tended to assign high ignition probabilities across much of the WUS on critical fire weather days, thereby limiting its effectiveness for operational use in resource allocation and fire prevention or response.
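The two stratified draws described above can be sketched in a few lines of Python. This is a minimal illustration, not the sampling code used in the study: the `Sample` record, the function names, the 30-day date window, the 0.01 to 0.25 degree spatial offsets, and the example coordinates are all hypothetical choices made for the sketch.

```python
import random
from datetime import date, timedelta
from typing import List, NamedTuple


class Sample(NamedTuple):
    lat: float
    lon: float
    day: date
    ignition: int  # 1 = reported ignition, 0 = absence sample


def same_place_other_date(fire: Sample, rng: random.Random,
                          max_shift_days: int = 30) -> Sample:
    """Absence at the fire's location, shifted to a different date."""
    shift = rng.choice([d for d in range(-max_shift_days, max_shift_days + 1) if d != 0])
    return Sample(fire.lat, fire.lon, fire.day + timedelta(days=shift), 0)


def same_date_nearby_place(fire: Sample, rng: random.Random,
                           min_offset_deg: float = 0.01,
                           max_offset_deg: float = 0.25) -> Sample:
    """Absence on the fire's date, displaced to a nearby location."""
    def offset() -> float:
        magnitude = rng.uniform(min_offset_deg, max_offset_deg)
        return magnitude if rng.random() < 0.5 else -magnitude
    return Sample(fire.lat + offset(), fire.lon + offset(), fire.day, 0)


def stratified_absences(fires: List[Sample], per_fire: int = 2, seed: int = 0) -> List[Sample]:
    """Draw both kinds of absence sample around every reported ignition."""
    rng = random.Random(seed)
    reported = {(f.lat, f.lon, f.day) for f in fires}
    absences = []
    for fire in fires:
        for _ in range(per_fire):
            for draw in (same_place_other_date, same_date_nearby_place):
                candidate = draw(fire, rng)
                # Discard a candidate that coincides with a reported ignition.
                if (candidate.lat, candidate.lon, candidate.day) not in reported:
                    absences.append(candidate)
    return absences


# Two made-up ignition records used only to exercise the functions.
fires = [Sample(41.7, -122.6, date(2019, 7, 27), 1),
         Sample(41.5, -122.3, date(2019, 9, 5), 1)]
absences = stratified_absences(fires, per_fire=3, seed=1)
print(len(absences), "absence samples for", len(fires), "ignitions")
```

Because each absence shares either the location or the date of a reported ignition, its background conditions resemble those of the ignition itself, which is what makes these samples harder to separate than points drawn uniformly in space and time.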
We then developed an AIM that incorporated not only hyper-sampled data but also spatially and temporally stratified samples representing ignition absence. Although this model slightly underperformed the original IM when applied to the test data, its performance substantially improved when applied to absence samples with background conditions that closely resembled those of ignition events. The AIM also produced more reliable ignition risk maps, effectively capturing spatial variability in fire occurrence likelihood across low- and high-risk areas. We used this model to generate time series of ignition probabilities for selected locations across the WUS. The model successfully tracked intra-annual cycles of dry-hot versus wet-cold weather to predict ignition likelihood. Additionally, in certain locations, ignition probabilities remained consistently high throughout the year except during brief cold and wet periods, indicating a dominant influence of consistent human factors. The model also effectively captured seasonal ignition patterns, such as those associated with agricultural debris burning, and was sensitive to the effect of infrastructure proximity, especially roads, on ignition risk.
Future research could build on our development of point-based models and their application to gridded maps of fire ignition likelihood by incorporating computer vision models to better capture spatial patterns in ignition drivers. Our models assume that temporal nonstationarity in ignitions can be represented through proxy variables such as year. Although our models make accurate short-term predictions, they will need periodic retraining as new data become available to account for evolving trends. We also used the FIPS code as a proxy for local factors such as resources, demographics, and community characteristics. Extending this framework to explicitly include these variables could further improve model performance. Lastly, projecting future ignition patterns in response to population dynamics, WUI expansion, and ongoing climate warming remains an active and important area for future research.
Acknowledgments
This research was supported by the Joint Fire Science Program (Grants L21AC10247 and L24AC00277) and the National Science Foundation (Awards #2429021 and #2521103).
Conflict of Interest
The authors declare no conflicts of interest relevant to this study.
Data Availability Statement
All the data used and models developed in this study are publicly available from Pourmohamad, Abatzoglou, et al. (2025) and Pourmohamad, Sadegh, and Abatzoglou (2025).
© 2026. This work is published under http://creativecommons.org/licenses/by-nc/4.0/.