Introduction
Rapid global population growth and industrialization have brought a range of environmental challenges, among which the deterioration of air quality stands out [1]. The problem is especially acute in developing nations, where industrial activity often outpaces environmental regulation and infrastructure [2]. Countries such as India and China are known for their high levels of air pollution, exemplified by annual fine particulate matter (PM2.5) concentrations reaching 85 μg/m3 in Delhi and 34.4 μg/m3 in Beijing [3, 4]. These figures underscore the severe impact of industrialization and urbanization on air quality and the substantial risks posed to public health [5]. The concern extends to low- and middle-income countries in general, where fewer than 1% of cities meet the air quality thresholds recommended by the World Health Organization (WHO) [6]. This pervasive problem demands urgent attention and robust interventions to safeguard the health and well-being of populations. In Latin America and the Caribbean, Chile is a notable case. In 2021 and 2022, Chilean cities accounted for 60% and 66.7%, respectively, of the 15 most polluted cities in the region, exceeding WHO limits for PM2.5 by at least sevenfold [7]. The severity of these figures calls for comprehensive mitigation and control strategies in the country.
Sulfur dioxide (SO2) emissions are also significant and are mainly attributable to Codelco, Chile’s state-owned enterprise and the world’s largest copper producer [8]. Copper smelting in particular contributes substantially to emissions of SO2, a noxious gas with detrimental health effects [9], so addressing this source is crucial for comprehensive air quality management. Chile’s geography adds a further layer of complexity and plays a pivotal role in exacerbating air quality problems [10]. Most Chilean cities lie in valleys formed by the Andes Mountain Range and the Coastal Mountain Range, which span the country. This topography restricts the dispersion of pollutants, and the situation is compounded by frequent thermal inversion events during the winter months [11]. The interplay of geographic factors and industrial activity underscores the need for air quality management strategies tailored to Chile, considering both industrial processes and the distinct environmental features of each region.
Coyhaique, Quintero, and Puchuncaví are significant examples of Chilean cities facing these air quality challenges. Annual mean PM2.5 levels in southern Chilean cities have been significantly higher than WHO guidelines since 2017 [7]. Among these, Coyhaique is the most polluted city in the whole continent, with an average annual PM2.5 concentration of 37 μg/m3 between 2017 and 2022, whereas according to WHO this value should not exceed 5 μg/m3. The city was declared a polluted zone for PM10 in 2012 and for PM2.5 in 2016 [12, 13]. The main source of PM10 and PM2.5 emissions is residential wood combustion: 96% of households use firewood for heating and cooking [14], which contributes 99.9% of total PM10 emissions and 99.67% of total PM2.5 emissions [15, 16]. Critical particulate matter episodes are characterized by the limited dispersion capacity of the basin during autumn and winter, when winds average 2 m/s, temperatures range between -10°C and 5°C, and mean relative humidity is 66.2% [17–19]. Coyhaique is situated in a valley in the Aysén Region of southern Chile, surrounded by the Andes Mountains. It has a cold oceanic climate with cold winters, frequent snowfall, and mild summers, and winter temperatures often drop below freezing. The topography and climatic conditions favor frequent thermal inversions during autumn and winter, which trap pollutants close to the ground and lead to high concentrations of particulate matter [14].
Located in the northern coastal sector of the Valparaíso Region, Quintero and Puchuncaví host the "Ventanas Industrial Complex," which includes over 17 industrial facilities such as thermoelectric plants, petrochemical refineries, and a copper smelter. The climate is Mediterranean, with dry summers and mild, wet winters. The coastal location results in moderate temperatures and high humidity, with prevalent sea breezes that can influence the dispersion of pollutants. However, atmospheric conditions sometimes lead to pollutant accumulation, affecting air quality in the region [20]. In 2018, a serious incident occurred in this region, when approximately 1,415 individuals were poisoned by various gases, including methyl chloroform, nitrobenzene, toluene, sulfur dioxide, and particulate matter [9]. Particularly alarming was the disproportionate impact on schoolchildren from institutions such as "La Greda," "Alonso Quintero," and "Francia," which prompted the National Disaster Prevention and Response Service (ONEMI) to declare a yellow alert for one week in the communes of Quintero and Puchuncaví. Similar incidents had been documented in 2011, indicating a recurring and persistent problem in this industrialized zone [20]. The Ventanas Industrial Complex plays a substantial role in Chile’s emission profile, contributing 22% of the nation’s total emissions of carbon dioxide, particulate matter, sulfur dioxide, and nitrogen oxides. This makes it the second-largest sacrifice zone in the country in terms of its impact on air pollution, after the municipality of Mejillones at 32% [21]. The severity of the situation underscores the urgent need for stringent environmental regulations, effective monitoring, and proactive measures to mitigate the adverse effects of industrial activities on air quality and public health in this critical zone.
In both study zones, natural features interact with emissions: Coyhaique’s mountainous surroundings influence pollutant dispersion [19], as does the coastal location of Quintero and Puchuncaví [22, 23]. This complexity underscores the urgency for coordinated efforts by the government and private sector, involving stringent regulations, effective enforcement, promotion of clean energy, and comprehensive public awareness campaigns.
In this context, short-term alert systems become crucial for anticipating alert, pre-emergency, and environmental emergency levels, with the final aim of mitigating health impacts on the population [24]. Such systems allow proactive measures to be enacted, such as halting industrial operations and reducing vehicular traffic in urban centers for specified durations [25]. Consequently, forecasting the concentrations of particulate and gaseous pollutants is indispensable for timely risk prediction.
Previous studies on air quality forecasting in Chile have provided valuable insights into the challenges and opportunities associated with predicting pollutant levels. Notably, Díaz-Robles et al. (2008) employed a hybrid model combining AutoRegressive Integrated Moving Average (ARIMA) and Artificial Neural Networks (ANN) for PM10 forecasting in Temuco [11], showcasing improved accuracy over individual models. It was demonstrated that incorporating variables such as wind speed, precipitation, relative humidity, solar radiation, and atmospheric pressure enhanced the model’s predictive capabilities. The study effectively captured complex patterns, achieving 100% accuracy in alert episodes and 80% in pre-emergency episodes. Additionally, a dedicated effort focused on Coyhaique by [17] developed an ANN model and a linear model for PM2.5 forecasting, demonstrating the significance of accurate predictions in a region heavily impacted by wood stove emissions during fall-winter seasons. In this case, including meteorological variables like average temperature, wind speed, thermal amplitude, wind direction, and accumulated precipitation further contributed to precise PM2.5 predictions. The results highlighted the neural network model’s ability to achieve a Pearson correlation (R2) of about 0.95, a normalized mean error of 18%, and an 84% prediction accuracy for critical air quality days in Coyhaique. In the context of pollution and socioeconomic variables, using multiple linear regression with ordinary least squares (OLS) presented insights into the relationship between air pollution and factors such as income poverty, multidimensional poverty, and energy poverty [26]. The study found significant positive correlations between air pollutants (PM10, PM2.5, and SO2) and variables reflecting poverty levels by employing fixed effects for years and months.
Given the limited use of statistical models for air quality assessment in Chile, the present study aims to build upon these foundations. ARIMA and ANN methodologies were selected for their proven effectiveness in capturing linear and nonlinear patterns, respectively, in air quality data. By combining the strengths demonstrated in the cited works, the hybrid ARIMA-ANN approach is expected to improve forecast accuracy, especially for critical pollutants such as SO2, PM2.5, and PM10 in the Chilean cities of Quintero and Coyhaique.
Both ARIMA and ANN models are prominent methodologies for air quality forecasting, each with distinct advantages and limitations [27, 28]. ARIMA models are simple to implement and interpret because of their linear nature. They effectively capture recurring patterns within time series data, a valuable trait for air quality forecasts, and they are robust in handling missing data and outliers [29]. However, ARIMA models have limitations. Their application, outlined in Fig 1, requires the time series to be stationary.
[Figure omitted. See PDF.]
Furthermore, ARIMA models may struggle with highly nonlinear or complex relationships in specific datasets. Their application involves several steps. First, the time series must be rendered stationary by removing trends and seasonality. Then, the order of each component of the ARIMA model must be determined in order to forecast pollutant concentrations [30].
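The stationarity step can be illustrated with a minimal Python sketch of differencing. It is not the authors' R workflow; the function name `difference` and the toy series are assumptions chosen only to show how a linear trend disappears after one difference.

```python
def difference(series, order=1):
    """Apply first differencing `order` times: y'_t = y_t - y_{t-1}."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Toy concentrations with a linear upward trend (illustrative values only).
trend_series = [10.0, 12.0, 14.0, 16.0, 18.0]

first_diff = difference(trend_series, order=1)   # constant increments, trend removed
second_diff = difference(trend_series, order=2)  # nothing left to remove
```

Each difference shortens the series by one observation, which is one reason to apply as few differences as possible.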
In contrast, ANN models are nonlinear models rooted in machine learning that mimic the learning processes of biological neurons [31]. Fig 2 presents a schematic overview of the architecture of an ANN model, showing the inputs, hidden layers, and outputs. Each input node corresponds to a variable fed into the network, the hidden layers perform the internal processing, and the output node delivers the final result of the model. The connections between nodes carry the weights that the network learns.
[Figure omitted. See PDF.]
The ANN algorithm adjusts these weights automatically to minimize the error, as highlighted in [32]. Recurrent neural networks are one example of this model type and are particularly well suited to time-series analysis, a critical aspect of air quality forecasting. By leveraging past data, these models can predict future air quality conditions.
Moreover, incorporating hyperparameter optimization techniques, such as the Keras tuner framework, is crucial in refining these models. This process involves fine-tuning various parameters to minimize the mean absolute percentage error (MAPE) metric, thereby further enhancing the accuracy of air quality forecasting models [27]. The amalgamation of cutting-edge technology and sophisticated machine learning models signifies a notable advancement and holds substantial promise in significantly elevating the accuracy of air quality forecasting. This technological leap is paramount for effective environmental monitoring and the success of public health initiatives, as underscored by contemporary research findings [33]. Despite the strengths demonstrated by the ARIMA and ANN models in air quality forecasting, several limitations must be acknowledged. Data availability and quality pose significant challenges; gaps or inconsistencies in the monitoring data may occur due to equipment malfunctions or maintenance periods, potentially affecting model accuracy.
Additionally, air quality is influenced by many factors, including meteorological conditions, geographical features, and human activities [34]. While the models can incorporate key variables such as wind speed and direction, other influential factors like solar radiation, atmospheric pressure, and human-induced emissions could not be included due to data limitations. This exclusion may limit the models’ ability to capture the full complexity of pollutant behavior.
Furthermore, the inherent limitations of the ARIMA and ANN models should be considered. Although effective in capturing linear patterns, ARIMA models may struggle with highly nonlinear or complex relationships in the data. While adept at modeling nonlinear patterns, ANN models require large datasets for training and can be prone to overfitting. The hybrid ARIMA-ANN approach aims to mitigate these individual limitations, but further refinement is possible. Additionally, focusing on specific regions like Quintero and Coyhaique with unique geographical and climatic conditions may limit the generalizability of the results to other areas. Extending the study to include more diverse locations and incorporating additional variables could enhance the robustness and applicability of the findings.
Materials and methods
Data collection
Chile maintains a network of 219 monitoring stations for meteorological variables and air quality, integrated into the "National Air Quality Information System" (SINCA) [35]. Using this network, the study’s main goal was to propose models for each city that describe the main meteorological and air quality variables and provide an adequate forecast of SO2, PM2.5, and PM10 levels. The Quintero station was chosen due to its proximity to populated residential areas, making it representative of human exposure to air pollutants resulting from both industrial emissions and urban activities. The Ventanas station was selected because it is located closest to the primary industrial emission sources in the area, such as copper smelters and coal-fired power plants. This station captures the direct impact of industrial activities on pollutant concentrations, providing valuable data on emissions from the industrial zone. In the case of Coyhaique, the monitoring stations selected are Coyhaique I and Coyhaique II. These are the only official monitoring stations in the Coyhaique region that provide continuous and validated air quality data. Their selection was essential to capture the air quality influenced by residential heating, especially during colder months when wood burning is prevalent. These stations offer comprehensive coverage of the area’s air quality conditions, allowing for accurate modeling of pollutants like PM2.5 and PM10, which are significantly affected by local heating practices. As shown in Fig 3, the red dots represent the locations of these monitoring stations, illustrating their placement relative to emission sources and populated areas.
[Figure omitted. See PDF.]
The data extracted from SINCA are hourly SO2, PM10, and PM2.5 concentrations validated by the Ministry of the Environment. The period analyzed spans 2016 to 2021. The monitoring stations operated by SINCA use internationally recognized methods and calibrated instruments for measuring air pollutants. Regular maintenance and calibration of equipment are performed to comply with quality assurance protocols established by the Chilean Ministry of the Environment. The data undergo rigorous validation processes, including checks for completeness, consistency, and plausibility, ensuring that only high-quality data are used for analysis [35].
Software and validation
The forecast was made using a hybrid ARIMA-ANN approach. The models were generated, implemented, and validated in R with the forecast [36] and caret [37, 38] packages. Seasonal cycles and patterns were analyzed with the openair package [39].
Both model types were fitted for a representative winter month (June) and a representative summer month (December), giving eight models in total. The first step was to identify temporal and statistical patterns in the data, which were then divided into training and validation sets (80% and 20%, respectively).
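An 80-20 split of a time series can be sketched as follows. The source does not state whether the split was chronological; keeping the time order, so that the validation block follows the training block, is an assumption here, as are the function name and the placeholder data.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time-ordered sequence into training and validation parts without shuffling."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = int(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Placeholder for one month of hourly records (30 days x 24 h); values are just indices.
hourly = list(range(30 * 24))
train, valid = chronological_split(hourly)
```

With 720 hourly records this leaves 576 for training and the final 144 for validation.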
The ARIMA model was built and its parameters were estimated following the methodology presented in [11], using the auto.arima() function [36], which determines the autoregressive order (p), the degree of differencing (d), and the moving average order (q). Before the data were passed to the function, the series was plotted and any anomalies were identified, and a logarithmic transformation was applied to stabilize the variance. The parameter ‘d’ was chosen through successive KPSS unit root tests [40]: the data were tested for a unit root; if the test result was significant, the differenced data were tested, and so on, and the procedure stopped at the first insignificant result. Parameters ‘p’ and ‘q’ were selected by minimizing the Akaike Information Criterion (AIC). In this way, the function applied as few differences as possible, since choosing ‘d’ by AIC minimization could lead to over-differencing, which degrades the forecast and widens the prediction intervals.
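The successive-testing idea for ‘d’ can be sketched in plain Python. This is a simplified illustration, not the code inside auto.arima(): the lag rule for the long-run variance, the cap `max_d`, and the function names are assumptions, and 0.463 is the usual 5% critical value for the level-stationarity KPSS test.

```python
def kpss_level_statistic(series, lags=None):
    """KPSS statistic for the null hypothesis of level stationarity."""
    n = len(series)
    mean = sum(series) / n
    resid = [value - mean for value in series]

    # Sum of squared partial sums of the demeaned series.
    partial, squared_partial_sum = 0.0, 0.0
    for r in resid:
        partial += r
        squared_partial_sum += partial * partial

    # Long-run variance with Bartlett weights (the lag rule is an assumption).
    if lags is None:
        lags = int(4 * (n / 100.0) ** 0.25)
    long_run_var = sum(r * r for r in resid) / n
    for k in range(1, lags + 1):
        weight = 1.0 - k / (lags + 1.0)
        gamma_k = sum(resid[t] * resid[t - k] for t in range(k, n)) / n
        long_run_var += 2.0 * weight * gamma_k

    if long_run_var <= 0.0:
        return 0.0  # constant series: treated as trivially stationary
    return squared_partial_sum / (n * n * long_run_var)


def choose_d(series, max_d=2, critical_value=0.463):
    """Difference the series until the KPSS statistic is no longer significant."""
    current, d = list(series), 0
    while d < max_d and kpss_level_statistic(current) > critical_value:
        current = [b - a for a, b in zip(current, current[1:])]
        d += 1
    return d


trend = [float(t) for t in range(100)]                          # needs one difference
alternating = [1.0 if t % 2 == 0 else -1.0 for t in range(100)]  # already stationary
```

The loop stops at the first insignificant result, so it never differences more often than the test requires.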
The neural network model selected was the Neural Network Auto-Regressive (NNAR) algorithm, fitted with the nnetar() function from the forecast package [36], which combines a multi-layer neural network with an autoregressive linear model for better processing of information and for working with complex dynamic systems. These models have only one hidden layer [36, 41]; the order ‘k’ denotes the number of neurons in that layer, and the order ‘p’ denotes the autoregressive order, that is, the number of lagged observations used as inputs. An NNAR(p,0) model is equivalent to an ARIMA(p,0,0) model but without the parameter restrictions that ensure stationarity. For this reason, the p values obtained for the ARIMA models were reused. For the parameter k, the default value, which is half the number of input nodes (including external regressors, if present) plus 1, was evaluated [36, 42]. The gridSearch method [40] was also employed: a list of candidate values was created and the model was evaluated for each combination. The optimal model was selected based on statistical deviations. A total of 20 repetitions were conducted, corresponding to the number of networks fitted with different random starting weights, and these networks were averaged to generate the final forecasts.
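Two pieces of this setup are small enough to sketch in Python: the default hidden-layer size and the averaging of repeated fits. The default is read here as round((inputs + 1) / 2), which is one reading of the rule quoted above and should be treated as an assumption; both function names are hypothetical.

```python
def default_hidden_nodes(n_lags, n_xreg=0):
    """Default k, read as round((inputs + 1) / 2); this reading is an assumption."""
    n_inputs = n_lags + n_xreg
    # Ties (even input counts) follow Python's round-half-to-even rule.
    return int(round((n_inputs + 1) / 2.0))


def average_forecasts(forecast_paths):
    """Average equally long forecast paths point by point, e.g. 20 repeated fits."""
    paths = [list(path) for path in forecast_paths]
    if not paths or any(len(path) != len(paths[0]) for path in paths):
        raise ValueError("need at least one path, and all paths must have equal length")
    return [sum(step) / len(paths) for step in zip(*paths)]


# Two hypothetical forecast paths from networks with different starting weights.
ensemble = average_forecasts([[1.0, 2.0], [3.0, 4.0]])
```

Averaging reduces the dependence of the final forecast on any single random initialization.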
For validation, the Akaike Information Criterion (AIC) was used. The AIC balances the goodness of fit of a model against its complexity and thus indicates how explanatory and predictive the model is. When models are compared, the one with the smallest AIC is preferred, since it is expected to lose the least information about the data and therefore to predict values closest to those observed [43].
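As a minimal sketch, the AIC can be computed from a model's log-likelihood as 2k - 2 ln L, where k is the number of estimated parameters. The Gaussian helper below, which estimates the error variance as RSS / n, is an illustrative assumption and not necessarily how the forecast package counts parameters.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2.0 * n_params - 2.0 * log_likelihood


def gaussian_aic(residuals, n_params):
    """AIC under Gaussian errors with the variance estimated as RSS / n."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n
    log_likelihood = -0.5 * n * (math.log(2.0 * math.pi * sigma2) + 1.0)
    return aic(log_likelihood, n_params)


# Smaller residuals give a lower (here negative) AIC for the same parameter count.
tight_fit = gaussian_aic([0.1, -0.1, 0.1, -0.1], n_params=2)
loose_fit = gaussian_aic([1.0, -1.0, 1.0, -1.0], n_params=2)
```

The AIC is negative whenever the log-likelihood exceeds the parameter count, which can happen with small-scale data such as log-transformed series; only differences between models fitted to the same data are meaningful.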
Once the models were developed, the residuals were analyzed in detail, since they may reveal structure that requires the model to be modified. The residuals were evaluated against three criteria that define an adequate model, with the Ljung-Box test as the main statistical tool [44, 45]. First, the residual series should behave as white noise, with no discernible trend or pattern. Second, the residual lags should be insignificant, so that past residuals do not influence the current period. Third, the residuals should be approximately normally distributed, which contributes to the robustness of the model evaluation [44].
The Ljung-Box test yields two quantities, the Q statistic and its p-value (p). The null hypothesis is that the residuals behave as white noise, and the alternative hypothesis is that they do not. The Q statistic is compared against a chi-square distribution and must fall below the critical value of that distribution for the null hypothesis to be retained; equivalently, the p-value must exceed 0.05. When prediction models are compared, the preferred one has the lowest Q statistic while also satisfying the p-value criterion.
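The Q statistic and its p-value can be sketched in plain Python. This is an illustration rather than the routine used in the study: the function names are hypothetical, the degrees of freedom are taken as the number of lags minus the number of fitted ARMA parameters, and the chi-square tail is computed with a closed form that holds only for even degrees of freedom.

```python
import math


def autocorrelation(x, lag):
    """Sample autocorrelation of x at the given lag."""
    n = len(x)
    mean = sum(x) / n
    denom = sum((v - mean) ** 2 for v in x)
    num = sum((x[t] - mean) * (x[t - lag] - mean) for t in range(lag, n))
    return num / denom


def chi2_sf_even(x, dof):
    """Chi-square upper tail probability, closed form valid for even dof only."""
    if dof <= 0 or dof % 2 != 0:
        raise ValueError("this sketch supports positive even degrees of freedom only")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, dof // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box(residuals, lags=10, fitted_params=0):
    """Return (Q, p-value) with Q = n(n+2) * sum_k rho_k^2 / (n - k)."""
    n = len(residuals)
    q = n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k) for k in range(1, lags + 1)
    )
    return q, chi2_sf_even(q, lags - fitted_params)


# Strongly autocorrelated toy residuals: white noise should be rejected.
q_stat, p_value = ljung_box([1.0, -1.0] * 50, lags=2)
```

A p-value above 0.05 would mean that the white-noise hypothesis is retained, which is the condition applied to the models in this study.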
Several strategies were employed to prevent overfitting. For the ANN models, regularization techniques such as L2 regularization and dropout layers were used to constrain model complexity, and early stopping was implemented by monitoring the validation loss and halting training when necessary. k-fold cross-validation was used to assess model performance on unseen data and thus its generalizability. The models were evaluated on both the training and validation datasets using the correlation coefficient (R2), the Root Mean Square Error (RMSE), and the Mean Absolute Percentage Error (MAPE). Residual analysis was conducted to verify the randomness of the residuals and confirm that the models were not overfitted. The p and Q statistics were pivotal in selecting the ARIMA models and developing the ANN models, while R2, RMSE, and MAPE were used to evaluate the forecast performance of the hybrid ARIMA-ANN models. Integrating these performance metrics into the overfitting prevention strategies ensured a robust and accurate model evaluation process [30, 46].
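The three performance metrics can be written out as a short Python sketch. R2 is computed here as the coefficient of determination, 1 - SS_res / SS_tot, which is an assumption; the source may instead use the squared Pearson correlation. The observed and predicted values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root Mean Square Error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean Absolute Percentage Error in percent; observed values must be nonzero."""
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (an assumed definition of R2)."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Illustrative concentrations in ug/m3, not data from the study.
obs = [10.0, 20.0, 30.0, 40.0]
pred = [12.0, 18.0, 33.0, 37.0]
scores = (rmse(obs, pred), mape(obs, pred), r_squared(obs, pred))
```

MAPE is undefined when an observed value is zero, so hours with zero concentration would need separate handling.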
Results
Statistical analysis of meteorological data
Over the period from 2016 to 2021, the data available for PM2.5 and PM10 at the Coyhaique I monitoring station stood at 98.2% and 97.9%, respectively, while for Coyhaique II, these figures were 96.5% for PM2.5 and 96.0% for PM10. On the other hand, Quintero stations showed between 96.2% and 98.8% of data availability. In Ventanas, the PM2.5 ranged between 15 and 20 μg/m3. The SO2 had average peaks in winter above 18 μg/m3 and minimums of 12 μg/m3 in summer. PM2.5 had similar values at Quintero Station, but the SO2 reached 70 μg/m3 peaks, mainly in June.
The seasonal patterns show that peak PM2.5 concentrations at Coyhaique I and Coyhaique II occurred during May, June, and July. May registered the highest levels, with 204.0 and 192.0 μg/m3 for Coyhaique I and Coyhaique II, respectively, followed by June and July, with concentrations ranging from 117.0 to 129.0 μg/m3. PM10 followed a similar trend, with the highest levels also observed during May, June, and July; May recorded the highest concentrations at both monitoring stations, ranging from 157.0 to 222.0 μg/m3. In contrast, the summer months of December, January, and February displayed lower pollutant levels. December recorded the lowest concentrations, with 6.53 μg/m3 for PM2.5 and 10.6 μg/m3 for PM10, and January and February continued this trend, with concentrations ranging from 5.82 to 11.2 μg/m3.
Geographical and climatic factors influence the challenging air quality conditions in Coyhaique. The basin’s topography, combined with the seasonal prevalence of low winds during fall and winter, creates a stagnation effect that impedes the effective dispersion of pollutants [47]. The geographical context, characterized by surrounding hills and valleys, exacerbates this situation, accumulating pollutants in the air.
The harsh winter temperatures, often falling within the range of -10 to 5°C, further compound the problem. The necessity for heating during this period contributes significantly to the heightened levels of particulate matter. The increased use of combustion-based heating sources, such as wood-burning stoves, releases substantial amounts of pollutants into the atmosphere.
Several studies conducted in similar regions with complex topography and climatic conditions echo the challenges faced by Coyhaique. These studies highlight the intricate interplay between meteorological factors and air quality. The findings underscore the need for region-specific strategies to mitigate air pollution, acknowledging the unique environmental dynamics contributing to pollutant accumulation [48].
Comprehensive air quality management plans must consider regulatory measures, community engagement, and awareness to address these challenges. Implementing alternative heating technologies, promoting energy-efficient practices, and fostering a community-wide commitment to reducing emissions are essential components of a holistic approach [22, 49].
Results of ARIMA models
The different prediction models generated are shown in Table 1, with their respective order, AIC and Ljung–Box test results for each pollutant, monitoring station, and study period.
[Figure omitted. See PDF.]
Model analysis for Quintero.
As mentioned above, the monitoring stations are located near the industrial zone, which is the primary source of air pollution. To determine the maximum pollutant concentrations, it was therefore essential to consider where the toxic gases measured at the stations originate. The measured concentrations were directly affected by wind direction, as the masses of toxic gases moved from the emission source to the monitoring station.
According to Table 1, the AIC decreased when an additional variable was included in the model, since many external factors affect the measured pollutant concentration. Including these variables gave a more realistic picture of what is happening in the atmosphere and thus better models and predictions. The ARIMA models with the average wind speed and the wind direction of interest as external variables had the lowest AIC values.
The ARIMA models above also meet the Ljung–Box test conditions, with p above 0.05, and the following discussion considers only the models that satisfy these conditions. The order of the models depends on the study period at the Quintero Industrial Zone.
For SO2 at the Ventanas station in December 2019, the model involved only the autoregressive part of ARIMA; in other words, earlier observations of the same variable were highly influential in predicting pollutant concentrations. The autoregressive order was 5, which indicates that the predicted value depended on the observations from the five preceding periods. The differencing order (d = 0) indicates that the series did not need to be differenced because it was already stationary. Because this model has a relatively high autoregressive order, its errors increase: the prediction depends on periods far from the one under study, which widens the gap between observed and predicted values, causes the trend to be lost, and produces forecasts outside the observed range.
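The meaning of an autoregressive order of 5 can be made concrete with a one-step forecast written as a weighted sum of the five most recent observations. The coefficients and history below are made-up illustrative values, not the fitted parameters from Table 1, and `ar_one_step` is a hypothetical helper.

```python
def ar_one_step(history, coefficients, constant=0.0):
    """One-step AR(p) forecast: c + phi_1 * y_{t-1} + ... + phi_p * y_{t-p}."""
    p = len(coefficients)
    if p == 0:
        return constant
    if len(history) < p:
        raise ValueError("history must contain at least p observations")
    recent = history[-p:][::-1]  # y_{t-1}, y_{t-2}, ..., y_{t-p}
    return constant + sum(phi * y for phi, y in zip(coefficients, recent))


# Illustrative AR(5) weights: the most recent value counts most, but observations
# up to five periods back still contribute to the forecast.
phi = [0.5, 0.2, 0.1, 0.1, 0.1]
history = [1.0, 2.0, 3.0, 4.0, 5.0]
forecast = ar_one_step(history, phi)
```

With a low order such as p = 1, only the first weight would be present and the forecast would depend on the latest observation alone.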
For the same pollutant and period at the Quintero station, the model includes both the autoregressive and the moving average parts of ARIMA, with the latter predominating in the predictions. Here the forecast is influenced by the observation of the same variable one period before the one under study (p = 1), which is also the case during winter at the same station and for both PM2.5 and SO2 at the Ventanas station.
ARIMA models with q = 2 are affected by the residuals of the same variable from the two periods before the one under study. This was the case for SO2 in both periods at the Quintero station and for PM2.5 in December at the Ventanas station.
A value of d = 0 indicates that the time series does not need differencing to be stationary. This was the case for SO2 at both stations in summer (December) and for PM2.5 at the Ventanas station in June.
ARIMA models with configurations (0,1,2), (1,1,1), (1,1,2), (1,0,2), and (1,0,3) have low order. Good predictions and low errors are expected from them, mainly because the forecast depends on periods that are not far from the current one. In contrast, model (5,0,0) has a relatively high autoregressive order, which increases its errors: the prediction depends on periods far from the one under study, widening the gap between observed and predicted values, causing the trend to be lost, and producing forecasts outside the observed range.
Model analysis for Coyhaique.
For PM2.5 at the Coyhaique I monitoring station, the ARIMA models had an autoregressive term between 3 and 5 and a moving average term between 0 and 1, and all the models had a differencing term of 1. All AIC values were less than 3000. The models with relative humidity as a covariable and with no external variables had the lowest Q-statistic values, 4.56 and 4.20, respectively, while the others had values greater than 30. Additionally, those two models had p-values greater than 0.05, which is considered adequate [28].
After analyzing the residual graphs at both stations for PM2.5, only the model with relative humidity as a covariable met the normality and white noise requirements. The rest exhibited lags outside the range, indicating significant autocorrelations. For this reason, the ARIMA (2,1,1) and (4,1,0) models achieved the best performances and were selected for the neural network analysis.
According to the analysis of the PM10 concentration at the Coyhaique I station, models (3,0,0) and (1,0,0) yielded negative AIC values, contrasting with the other models. Notably, the model with wind speed as a covariable demonstrated a significantly greater AIC value of 2327.8. Considering the Ljung–Box statistic, only the lower-order model manifested a p-value of 0.65. Furthermore, a collective evaluation incorporating the Q statistic revealed consistent values, with the lowest observed value (14.19).
The autocorrelation function (ACF) showed the best performance for the (1,0,0) model, depicting a lack of significant autocorrelations across all lags. This configuration emerges as the most judicious choice for predicting PM10 concentrations at the Coyhaique I monitoring station. This model was the most suitable candidate for predictions, corroborated by its adherence to white noise, indicating a normal distribution and optimal parameters (Guisande et al., 2011).
For PM2.5 at the Coyhaique II station, most of the identified models had the ARIMA (5,1,0) configuration. However, the configuration with an autoregressive (AR) term of 4, a differencing parameter (d) of 1, and a moving average (MA) term of 0 exhibited the best performance. In the Ljung–Box test of the transformed models, the lowest Q statistic was 3.81. Only the (4,1,0) model exhibited a p-value greater than 0.05, while the other models had p-values ranging from 10^-6 to 10^-11. A graphical examination of the autocorrelation function (ACF) revealed that the models with relative humidity and wind speed as covariables were the only ones that displayed an appropriate structure for PM2.5. In the residual plots, all the models exhibited a normal structure. Based on these results, the ARIMA (4,1,0) model is suggested for predicting PM2.5 at the Coyhaique II station.
The analysis of the PM10 concentration at the Coyhaique II station yielded an ARIMA (1,1,2) structure for the model without covariables. In contrast, the model with relative humidity as an external variable had the ARIMA (4,1,0) structure, the same as that reported for PM2.5. The remaining covariables were consistent with the ARIMA (5,1,0) configuration. Four negative values were obtained in the AIC analysis of the transformed models. In the Ljung–Box test, only one model, ARIMA (1,1,2), exceeded the p-value of 0.05 required to consider the residuals white noise.
This configuration also had the lowest Q statistic, 4.14, while the other values ranged between 30 and 63. Additionally, the graphs obtained for each model showed that only the model without covariables kept all lags within the required range. However, all the configurations graphically exhibited white noise, and the residuals also demonstrated a normal distribution. Considering all the information, the ARIMA (1,1,2) model was deemed the most suitable for predicting PM10. This evaluation considered which model best satisfied all the criteria assessed. For example, the AIC of the chosen model was not the lowest; however, the other models lacked an appropriate p-value and did not show the required graphical fit. No single criterion alone was decisive in choosing the best model, so a comprehensive evaluation was necessary.
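The white-noise check described above can be sketched as follows. This is a minimal illustration of the Ljung–Box Q statistic and its p-value, not the authors' code; the function names, the `n_fitted` argument, and the restriction to an even number of degrees of freedom (which allows a closed-form chi-square tail) are assumptions made here to keep the example dependency-free.

```python
import math


def autocorrelation(residuals, lag):
    """Sample autocorrelation of the residuals at the given lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    denom = sum((r - mean) ** 2 for r in residuals)
    num = sum((residuals[t] - mean) * (residuals[t - lag] - mean)
              for t in range(lag, n))
    return num / denom


def ljung_box_q(residuals, max_lag):
    """Ljung-Box statistic Q = n(n+2) * sum_k rho_k^2 / (n - k)."""
    n = len(residuals)
    return n * (n + 2) * sum(
        autocorrelation(residuals, k) ** 2 / (n - k)
        for k in range(1, max_lag + 1)
    )


def chi2_sf_even(x, df):
    """Upper-tail chi-square probability (closed form, even df only)."""
    if df <= 0 or df % 2:
        raise ValueError("closed form requires a positive even df")
    half = x / 2.0
    term, total = 1.0, 1.0
    for i in range(1, df // 2):
        term *= half / i
        total += term
    return math.exp(-half) * total


def ljung_box_pvalue(residuals, max_lag, n_fitted=0):
    """p-value of Q; df = max_lag minus the number of fitted ARMA terms."""
    q = ljung_box_q(residuals, max_lag)
    return chi2_sf_even(q, max_lag - n_fitted)


# Hypothetical residuals with an obvious alternating pattern.
patterned = [1.0, -1.0] * 20
print(ljung_box_pvalue(patterned, max_lag=2) > 0.05)  # white noise plausible?
```

A p-value above 0.05 means the hypothesis of uncorrelated residuals is not rejected, which is the criterion applied to the candidate models above.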
Results of ANN models
The neural network models were built on the basis of the previously selected ARIMA models. The same covariates were considered in the model run for each monitoring station. In addition, the Box–Cox transformation was applied to meet the normality requirement.
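As an illustration of the Box–Cox step, the sketch below applies the standard transformation and its inverse for a given parameter. The function names and the example value of `lam` are hypothetical; the text does not report which parameter values were used.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform: log(x) if lam == 0, else (x**lam - 1) / lam."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if abs(lam) < 1e-12:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def inverse_box_cox(transformed, lam):
    """Map transformed values back to the original concentration scale."""
    if abs(lam) < 1e-12:
        return [math.exp(z) for z in transformed]
    return [(lam * z + 1.0) ** (1.0 / lam) for z in transformed]


# Hypothetical concentrations and an illustrative parameter value.
concentrations = [12.0, 35.5, 80.0]
lam = 0.5
restored = inverse_box_cox(box_cox(concentrations, lam), lam)
```

Forecasts produced on the transformed scale need the inverse step before they can be compared with observed concentrations.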
The models had the structure (p,k). The first value (p) indicates how many previous observations of the same variable significantly affect the prediction and were therefore added as input variables. The second value (k) indicates the number of nodes, or neurons, in the hidden layer.
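The role of p can be made concrete with a small sketch that builds the network inputs from a series. It assumes that each covariate enters as one additional input taken at the time step being predicted; the function name and the sample values are hypothetical.

```python
def nnar_inputs(series, p, covariates=None):
    """Build (rows, targets) for an NNAR(p, k)-style network.

    Each row holds the p previous observations (most recent first),
    followed by one value per covariate at the target time step.
    """
    covariates = covariates or []
    for cov in covariates:
        if len(cov) != len(series):
            raise ValueError("covariates must match the series length")
    rows, targets = [], []
    for t in range(p, len(series)):
        lags = [series[t - j] for j in range(1, p + 1)]
        extras = [cov[t] for cov in covariates]
        rows.append(lags + extras)
        targets.append(series[t])
    return rows, targets


# Hypothetical PM2.5 values with relative humidity as the covariate.
pm25 = [10.0, 12.0, 11.0, 15.0, 14.0]
humidity = [60.0, 62.0, 61.0, 65.0, 64.0]
rows, targets = nnar_inputs(pm25, p=2, covariates=[humidity])
print(len(rows[0]))  # number of input neurons: p lags + 1 covariate
```

With p = 2 and one covariate, each row has three entries, so the input layer has three neurons.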
The NNAR (1,2) model was obtained for both pollutants at Ventanas station in June, for PM2.5 at the same station in December, and for SO2 only at Quintero station in both periods. It was the most frequent configuration. In those cases, the prediction depends only on the observation of the same variable from the preceding period. There are two variables in the input layer, meaning that this layer has only two neurons. The value of k equals two, indicating two nodes in the hidden layer, which makes it a simple but effective neural network. With fewer neurons, the connections between nodes were simpler, and the information and variables could be processed more accurately.
The resulting model for PM2.5 at Quintero station in June and December was NNAR (2,2), indicating that the forecast depends on observations of the same variable from the two periods preceding the one under study; the input layer therefore has three neurons to process the data. The hidden layer has two nodes, as in the cases discussed above, which transfer and transform the information to the output layer.
For SO2 at Ventanas station in December, the NNAR (5,4) model indicates that the predictions are affected by observations of the same variable from the five preceding periods. The model also has four neurons in the hidden layer, which is to be expected given the larger number of inputs and the more extensive connections required to process all the incoming information.
At the Coyhaique I monitoring station, the model for PM2.5 with relative humidity as a covariate was NNAR (2,2), whereas at Coyhaique II it was NNAR (4,3). For PM10 at the Coyhaique I station, using historical relative humidity and PM2.5 concentration as auxiliary variables, the resulting model was NNAR (2,1). At the Coyhaique II station, the only input was the historical PM10 concentration, resulting in NNAR (1,1).
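To show what a (p,k) configuration implies in terms of network size, the sketch below evaluates a single-hidden-layer feed-forward network and counts its parameters. The logistic hidden activation and the linear output are assumptions made for illustration, as are the function names; the text does not specify the activation functions used.

```python
import math


def nnar_forward(inputs, hidden_weights, hidden_biases,
                 output_weights, output_bias):
    """One forward pass: logistic hidden layer, linear output node."""
    hidden = []
    for weights, bias in zip(hidden_weights, hidden_biases):
        activation = bias + sum(w * x for w, x in zip(weights, inputs))
        hidden.append(1.0 / (1.0 + math.exp(-activation)))
    return output_bias + sum(v * h for v, h in zip(output_weights, hidden))


def parameter_count(n_inputs, k):
    """Weights and biases in an n_inputs-k-1 network."""
    return (n_inputs + 1) * k + (k + 1)


# Five lagged inputs and four hidden nodes, as in an NNAR (5,4) model
# without covariates.
print(parameter_count(5, 4))

# Hypothetical weights: two inputs and two hidden nodes.
prediction = nnar_forward(
    inputs=[1.0, 2.0],
    hidden_weights=[[0.0, 0.0], [0.0, 0.0]],
    hidden_biases=[0.0, 0.0],
    output_weights=[1.0, 1.0],
    output_bias=0.5,
)
```

The parameter count grows with both the number of inputs and k, which is consistent with the larger hidden layer reported for the configuration with five lags.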
Performance of the ARIMA and the ANN models
Figs 4 and 5 compare the time series of the observed values with the predictions obtained from the ARIMA and ANN models.
[Figure omitted. See PDF.]
[Figure omitted. See PDF.]
The historical records for PM2.5 and PM10 in June at both stations covered a wider range of values than the December records. In other words, particulate matter oscillates less during the summer in Coyhaique than during the winter, when ventilation conditions have a marked influence on the dispersion of pollutants. More predicted points coincided with the observed values in December, so the simulations for that month showed better statistical performance than those for June. According to Fig 5, the last days of both months behave more similarly than the first 15 days. All models gave an R2 statistic greater than 0.9 in both comparison periods, as shown in Table 2.
[Figure omitted. See PDF.]
In general, the June 2019 predictions of both the neural network and ARIMA models deviated more from the observations than the December 2019 predictions, which were more accurate in all models. Comparing the ARIMA and neural network models, the latter were clearly more accurate for both pollutants at the Coyhaique I and II monitoring stations. This is explained by their capacity to learn and adapt to time series with fluctuating patterns and trends over time [28], which is the case with the historical data in Coyhaique. Other studies have used many machine learning techniques for forecasting air quality. A scheme using an optimized recurrent neural network showed that the LSTM encoder-decoder model had the best performance and successfully forecasted PM2.5 concentrations with a mean absolute percentage error (MAPE) of 28.2%, 15.07%, and 42.1% daily and 11.75%, 9.5%, and 7.4% hourly for different cities in Pakistan [50], values similar to those achieved in this study.
On the other hand, using advanced technology, such as recurrent neural networks, contributes to the accuracy of air quality forecasting. These models facilitate the analysis of vast datasets, identifying intricate patterns and relationships among various variables [33], a capability shared with the hybrid ARIMA-ANN approach employed in the present investigation. The findings of this study align with those of prior research, such as [51], which demonstrated a similar performance of Artificial Neural Networks (ANNs). In that study, a unified architecture was employed, which combined tree-based architectures to capture spatial dependencies and temporal patterns. Using meteorological variables, air pollutant concentrations, and external data, it was trained to predict NO2, O3, SO2, CO, PM10, and PM2.5 levels across diverse locations without retraining.
Wind direction significantly influences the transport and dispersion of pollutants from emission sources to monitoring stations. In industrial areas like Quintero and Puchuncaví, prevailing winds can carry pollutants toward residential zones, affecting air quality. By incorporating wind direction into the models, it becomes possible to account for these dynamics, leading to more accurate predictions of pollutant concentrations at specific locations. Similarly, wind speed affects the dilution and dispersion of pollutants. Higher wind speeds typically enhance dispersion, reducing pollutant concentrations, while lower wind speeds can lead to pollutant accumulation. Including wind speed as a covariate allows the models to adjust predictions based on these dispersion conditions, improving accuracy.
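One practical detail when wind direction is used as a numeric covariate is that it is a circular quantity: 359° and 1° are almost the same direction but far apart as numbers. The sketch below shows a common sine/cosine encoding that avoids this discontinuity. It is offered as an illustration only; the text does not state how wind direction was encoded in this study, and the function name is hypothetical.

```python
import math


def encode_wind_direction(direction_deg):
    """Encode a direction in degrees as (sin, cos) on the unit circle."""
    theta = math.radians(direction_deg % 360.0)
    return math.sin(theta), math.cos(theta)


# Two nearly identical directions on either side of north.
a = encode_wind_direction(359.0)
b = encode_wind_direction(1.0)
distance = math.hypot(a[0] - b[0], a[1] - b[1])
print(distance)  # small, unlike the 358-degree gap in raw values
```

With this encoding, each direction contributes two inputs to the model instead of one.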
The improved performance metrics observed in the models that included external covariates underscore their importance. The models achieved higher R2 values and exhibited lower RMSE and MAPE values, indicating more precise and reliable predictions. This demonstrates that integrating external meteorological factors enhances the models’ ability to capture the complex interactions influencing air pollutant levels. In the case of the Coyhaique stations, the inclusion of relative humidity as an external covariate significantly improved model performance for PM₂.₅ and PM₁₀ predictions. The valley topography and climatic conditions in Coyhaique contribute to pollutant accumulation, especially during winter when residential heating is prevalent.
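The three metrics named above can be computed as in the following sketch. It assumes that R2 is the coefficient of determination, 1 − SS_res/SS_tot; the text does not say whether this or the squared correlation was used. The function names and sample values are hypothetical.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mape(observed, predicted):
    """Mean absolute percentage error, in percent."""
    if any(o == 0 for o in observed):
        raise ValueError("MAPE is undefined when an observation is zero")
    n = len(observed)
    return 100.0 * sum(abs((o - p) / o) for o, p in zip(observed, predicted)) / n


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean) ** 2 for o in observed)
    if ss_tot == 0:
        raise ValueError("R2 is undefined for a constant observed series")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted concentrations.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 39.0]
print(rmse(observed, predicted), mape(observed, predicted),
      r_squared(observed, predicted))
```

RMSE is scale-dependent, while MAPE and R2 are relative measures, which is why the three are usually reported together.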
Considering these insights, the predictive models developed in this study have significant implications for air quality management in Chile. The high accuracy of the hybrid ARIMA-ANN models underscores their potential as practical tools for forecasting pollutant concentrations like SO₂, PM₂.₅, and PM₁₀. These models can inform early warning systems, enabling authorities to issue timely alerts and implement mitigation strategies to protect public health. Additionally, the models can guide policy decisions by highlighting the influence of meteorological factors and emission sources on air quality. Future investigations could focus on integrating additional contextual information specific to the analyzed regions, such as localized emission inventories, land use patterns, and socioeconomic factors. This integration could enhance the models’ precision and adaptability, making them even more effective for region-specific applications. Expanding the modeling approach to include other pollutants and testing it in different geographical areas could further contribute to a comprehensive national strategy for air quality management. By leveraging advanced modeling techniques and incorporating a broader range of variables, these predictive models can be crucial in addressing environmental challenges and improving air quality forecasts in Chile.
Conclusions
The comprehensive analysis of air quality data spanning the years 2016 to 2021 revealed critical insights into the atmospheric conditions of Chile, particularly in the cities of Quintero, Puchuncaví, and Coyhaique. Over the study period, a hybrid forecasting approach that integrates Autoregressive Integrated Moving Average (ARIMA) models and Artificial Neural Networks (ANN) emerged as a robust tool for predicting pollutant levels. The ARIMA models were fine-tuned and rigorously evaluated using the Akaike Information Criterion (AIC) and Ljung–Box statistical tests, and external covariates, including wind speed and direction, were then incorporated to enhance model realism, especially within the Quintero industrial zone.
Meteorological analyses underscored the significant influence of geographical and climatic factors on air quality dynamics. In Coyhaique, the interplay of topography, seasonal wind patterns, and low temperatures during fall and winter created a stagnation effect, impeding the dispersion of pollutants. The study emphasized the importance of region-specific strategies for effective air quality management, acknowledging the unique environmental dynamics contributing to pollutant accumulation.
Evaluation of model performance consistently demonstrated the efficacy of the hybrid ARIMA-ANN approach. The models achieved R2 values exceeding 0.90 across all monitored pollutants and stations, indicating strong correlations between predicted and observed values. Specifically, Mean Absolute Percentage Error (MAPE) values were below 1% for PM₂.₅ predictions in Coyhaique, demonstrating exceptional model precision.
Comparisons with current literature trends, such as the increasing utilization of such models when larger datasets are available, underscored the relevance and effectiveness of the approach. Insights from recent studies, like those referenced, highlight the potential for further incorporating specific contextual information to enhance the accuracy of air quality predictions. The findings of this study significantly advance the understanding of air quality dynamics in Chile and advocate for the integration of advanced technologies, like neural networks, in future endeavors to improve air quality forecasting. By addressing the identified limitations—such as data availability and the need to incorporate additional influential variables—and exploring potential enhancements, researchers can further contribute to advancing this field, ultimately leading to better environmental management and public health outcomes.
References
1. Mahendra H. N. et al., “Assessment and Prediction of Air Quality Level Using ARIMA Model: A Case Study of Surat City, Gujarat State, India,” Nat. Environ. Pollut. Technol., vol. 22, no. 1, pp. 199–210, Mar. 2023.
2. Manisalidis I., Stavropoulou E., Stavropoulos A., and Bezirtzoglou E., “Environmental and Health Impacts of Air Pollution: A Review,” Front. Public Heal., vol. 8, p. 505570, Feb. 2020, pmid:32154200.
3. IQAir, “World Air Quality Report,” 2021.
4. Shams S. R. et al., “Assessing the effectiveness of artificial neural networks (ANN) and multiple linear regressions (MLR) in forcasting AQI and PM10 and evaluating health impacts through AirQ+ (case study: Tehran),” Environ. Pollut., vol. 338, p. 122623, Dec. 2023, pmid:37806430.
5. Rossi D., Mascolo A., Mancini S., Breton J. G. C., Breton R. M. C., and Guarnaccia C., “Modelling and Forecast of Air Pollution Concentrations during COVID Pandemic Emergency with ARIMA Techniques: the Case Study of Two Italian Cities,” WSEAS Trans. Environ. Dev., vol. 19, pp. 151–162, Feb. 2023.
6. WHO, “World health statistics 2022: monitoring health for the SDGs, sustainable development goals,” 2022.
7. IQAir, “World’s most polluted cities,” 2022.
8. Alam M. A., Mukherjee A., Bhattacharya P., and Bundschuh J., “An appraisal of the principal concerns and controlling factors for Arsenic contamination in Chile,” Sci. Rep., vol. 13, no. 1, p. 11168, Jul. 2023, pmid:37429943.
9. Varela D., Corrales C., Corrales T., Salazar F., Solís C., and Yáñez-Figueroa C., “Asociación entre los niveles de NO, NO2, SO2, O3, CH4 en el aire y las tasas de hospitalización del Hospital Adriana Cousiño de Quintero durante los años 2012 al 2018,” Rev. ANACEM, vol. 14, no. 2, pp. 29–42, 2020, Accessed: Jun. 30, 2022. [Online]. Available: https://revista.anacem.cl/wp-content/uploads/2021/05/Asociacion-entre-los-niveles-de-NO-NO2-SO2-O3-CH4-en-el-aire-y-las-tasas-de-hospitalizacion-del-Hospital-Adriana-Cousino-de-Quintero-durante-los-anos-2012-al-2018.pdf
10. Álamos N. et al., “High resolution inventory of atmospheric emissions from transport, industrial, energy, mining and residential sectors of Chile Earth System Science Data Discussions,” Earth Syst. Sci. Data, 2021.
11. Díaz-Robles L. A. et al., “A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: The case of Temuco, Chile,” Atmos. Environ., vol. 42, no. 35, pp. 8331–8340, Nov. 2008.
12. Ministry of Environment, The geographical area that includes the city of Coyhaique and its surrounding area is declared an area saturated by Respirable Particulate Material MP10, as a daily and annual concentration, in accordance with the polygon indicated. Chile: https://bcn.cl/2kh0n, 2012.
13. Ministry of Environment, The geographical area that includes the city of Coyhaique and its surrounding area is declared a saturated zone for Fine Respirable Particulate Matter MP2.5, as a 24-hour concentration. Chile: https://bcn.cl/2kh0m, 2016.
14. Ministry of Environment, Establishes Atmospheric Decontamination Plan for the city of Coyhaique and its surrounding area. Chile: https://bcn.cl/2kgyh, 2017.
15. Perez P., Menares C., and Ramirez C., “PM2.5 Forecasting in the most polluted city in South America,” in WIT Transactions on Ecology and the Environment, vol. 230, Syngellakis S, Ed. WIT Press, 2018, pp. 199–204. https://doi.org/10.2495/AIR180181
16. Muñoz-Ibáñez F. G. and Cáceres-Lillo D. D., “Impacto del recambio de tecnología de calefacción en la concentración atmosférica por MP2,5 y en las admisiones por urgencias respiratorias en Coyhaique, Chile,” Cad. Saude Publica, vol. 36, no. 6, 2020, pmid:32609172.
17. Perez P., Menares C., and Ramírez C., “PM2.5 forecasting in Coyhaique, the most polluted city in the Americas,” Urban Clim., vol. 32, p. 100608, Jun. 2020.
18. Zhang M. et al., “Characters of Particulate Matter and Their Relationship with Meteorological Factors during Winter Nanyang 2021–2022,” Atmosphere (Basel)., vol. 14, no. 1, p. 137, Jan. 2023.
19. Solís R. et al., “Long-term airborne particle pollution assessment in the city of Coyhaique, Patagonia, Chile,” Urban Clim., vol. 43, p. 101144, May 2022.
20. Valenzuela-Fuentes K., Alarcón-Barrueto E., and Torres-Salinas R., “From Resistance to Creation: Socio-Environmental Activism in Chile’s ‘Sacrifice Zones,’” Sustainability, vol. 13, no. 6, p. 3481, Mar. 2021.
21. Peragallo R., “The State Production of Sacrifice Zones in Chile: An In-Depth Study of the Quintero-Puchuncavi Case,” Pontificia Universidad Catolica de Chile, Santiago, 2020.
22. Pino-Cortés E., Carrasco S., Acosta J., de Almeida Albuquerque T. T., Pedruzzi R., and Díaz-Robles L. A., “An evaluation of the photochemical air quality modeling using CMAQ in the industrial area of Quintero-Puchuncavi-Concon, Chile,” Atmos. Pollut. Res., vol. 13, no. 3, p. 101336, Mar. 2022.
23. Seguel R. J. et al., “Volatile organic compounds measured by proton transfer reaction mass spectrometry over the complex terrain of Quintero Bay, Central Chile,” Environ. Pollut., vol. 330, p. 121759, Aug. 2023, pmid:37146872.
24. Carreño G., López-Cortés X. A., and Marchant C., “Machine Learning Models to Predict Critical Episodes of Environmental Pollution for PM2.5 and PM10 in Talca, Chile,” Mathematics, vol. 10, no. 3, p. 373, Jan. 2022.
25. Issa Zadeh S. B. and Garay-Rondero C. L., “Enhancing Urban Sustainability: Unravelling Carbon Footprint Reduction in Smart Cities through Modern Supply-Chain Measures,” Smart Cities, vol. 6, no. 6, pp. 3225–3250, Nov. 2023.
26. Herrera P., Rojo J., and Scapini V., “Relationship between pollution levels and poverty: regions of Antofagasta, Valparaiso and Biobio, Chile,” Int. J. Energy Prod. Manag., vol. 7, no. 2, pp. 176–184, Jul. 2022.
27. Aladağ E., “Forecasting of particulate matter with a hybrid ARIMA model based on wavelet transformation and seasonal adjustment,” Urban Clim., vol. 39, Sep. 2021.
28. Pakrooh P. and Pishbahar E., “Forecasting Air Pollution Concentrations in Iran, Using a Hybrid Model,” Pollution, vol. 5, no. 4, pp. 739–747, Oct. 2019.
29. Kaur J., Parmar K. S., and Singh S., “Autoregressive models in environmental forecasting time series: a theoretical and application review,” Environ. Sci. Pollut. Res., vol. 30, no. 8, pp. 19617–19641, Jan. 2023, pmid:36648728.
30. Kożuch A., Cywicka D., and Adamowicz K., “A Comparison of Artificial Neural Network and Time Series Models for Timber Price Forecasting,” Forests, vol. 14, no. 2, p. 177, Jan. 2023.
31. Yang G. R. and Wang X.-J., “Artificial Neural Networks for Neuroscientists: A Primer,” Neuron, vol. 107, no. 6, pp. 1048–1070, Sep. 2020, pmid:32970997.
32. Jothilakshmi S. and Gudivada V. N., “Large Scale Data Enabled Evolution of Spoken Language Research and Applications,” 2016, pp. 301–340.
33. Alkabbani H., Ramadan A., Zhu Q., and Elkamel A., “An Improved Air Quality Index Machine Learning-Based Forecasting with Multivariate Data Imputation Approach,” Atmosphere (Basel)., vol. 13, no. 7, p. 1144, Jul. 2022.
34. Agarwal A. and Sahu M., “Forecasting PM2.5 concentrations using statistical modeling for Bengaluru and Delhi regions,” Environ. Monit. Assess., vol. 195, no. 4, 2023, pmid:36949261.
35. Ministry of Environment, “Sistema de Información Nacional de Calidad del Aire,” https://sinca.mma.gob.cl/, 2021.
36. Hyndman R. J. and Khandakar Y., “Automatic Time Series Forecasting: The forecast Package for R,” J. Stat. Softw., vol. 27, no. 3, 2008.
37. Kuhn M., “Building Predictive Models in R Using the caret Package,” J. Stat. Softw., vol. 28, no. 5, 2008.
38. Kuhn M. and Johnson K., Applied Predictive Modeling. London: Springer, 2013. https://doi.org/10.1007/978-1-4614-6849-3
39. Carslaw D. C. and Ropkins K., “openair—An R package for air quality data analysis,” Environ. Model. Softw., vol. 27–28, pp. 52–61, Jan. 2012.
40. van Greunen J. and Heymans A., “Determining the Impact of Different Forms of Stationarity on Financial Time Series Analysis,” in Business Research, Singapore: Springer Nature Singapore, 2023, pp. 61–76. https://doi.org/10.1007/978-981-19-9479-1_4
41. Zhang X., Ding C., and Wang G., “An Autoregressive-Based Kalman Filter Approach for Daily PM2.5 Concentration Forecasting in Beijing, China,” BIG DATA, 2023, pmid:37134205.
42. Tsan Y.-T., Chen D.-Y., Liu P.-Y., Kristiani E., Nguyen K. L. P., and Yang C.-T., “The Prediction of Influenza-like Illness and Respiratory Disease Using LSTM and ARIMA,” Int. J. Environ. Res. Public Health, vol. 19, no. 3, Feb. 2022, pmid:35162879.
43. Mondal P., Shit L., and Goswami S., “Study of Effectiveness of Time Series Modeling (Arima) in Forecasting Stock Prices,” Int. J. Comput. Sci. Eng. Appl., vol. 4, no. 2, pp. 13–29, Apr. 2014.
44. Vallejo F., Díaz-Robles L. A., Vega R. E., and Cubillos F., “A novel approach for prediction of mass yield and higher calorific value of hydrothermal carbonization by a robust multilinear model and regression trees,” J. Energy Inst., vol. 93, no. 4, 2020.
45. Espinoza Pérez L., Espinoza Pérez A., Pino-Cortés E., Vallejo F., and Díaz-Robles L. A., “An environmental assessment for municipal organic waste and sludge treated by hydrothermal carbonization,” Sci. Total Environ., vol. 828, p. 154474, Jul. 2022, pmid:35276176.
46. Das R., Middya A. I., and Roy S., “High granular and short term time series forecasting of PM2.5 air pollutant—a comparative review,” Artif. Intell. Rev., vol. 55, no. 2, pp. 1253–1287, Feb. 2022.
47. Zafra C., Ángel Y., and Torres E., “ARIMA analysis of the effect of land surface coverage on PM10 concentrations in a high-altitude megacity,” Atmos. Pollut. Res., vol. 8, no. 4, pp. 660–668, Jul. 2017.
48. Wu Y., Li R., Cui L., Meng Y., Cheng H., and Fu H., “The high-resolution estimation of sulfur dioxide (SO2) concentration, health effect and monetary costs in Beijing,” Chemosphere, vol. 241, p. 125031, Feb. 2020, pmid:31610459.
49. Cereceda-Balic F. et al., “Emission factors for PM 2.5, CO, CO 2, NO x, SO 2 and particle size distributions from the combustion of wood species using a new controlled combustion chamber 3CE,” Sci. Total Environ., vol. 584–585, no. x, pp. 901–910, 2017, pmid:28189303.
50. Waseem K. H. et al., “Forecasting of Air Quality Using an Optimized Recurrent Neural Network,” Processes, vol. 10, no. 10, p. 2117, Oct. 2022.
51. Borah J. et al., “AiCareBreath: IoT-Enabled Location-Invariant Novel Unified Model for Predicting Air Pollutants to Avoid Related Respiratory Disease,” IEEE Internet Things J., vol. 11, no. 8, pp. 14625–14633, Apr. 2024.
Citation: Vallejo F, Yánez D, Viñán-Guerrero P, Díaz-Robles LA, Oyaneder M, Reinoso N, et al. (2025) Enhancing air quality predictions in Chile: Integrating ARIMA and Artificial Neural Network models for Quintero and Coyhaique cities. PLoS ONE 20(1): e0314278. https://doi.org/10.1371/journal.pone.0314278
About the Authors:
Fidel Vallejo
Roles: Conceptualization, Data curation, Formal analysis, Funding acquisition, Investigation, Methodology, Validation, Visualization, Writing – original draft, Writing – review & editing
E-mail: [email protected]
Affiliations: Industrial Engineering, National University of Chimborazo, Riobamba, Ecuador, Particulas Environmental Engineering and Management, Chile
ORCID: https://orcid.org/0000-0001-5835-298X
Diana Yánez
Roles: Project administration, Resources, Writing – original draft, Writing – review & editing
Affiliation: Agroindustrial Engineering, National University of Chimborazo, Riobamba, Ecuador
Patricia Viñán-Guerrero
Roles: Visualization, Writing – original draft, Writing – review & editing
Affiliation: Engineering Faculty, National University of Chimborazo, Riobamba, Ecuador
Luis A. Díaz-Robles
Roles: Project administration, Resources, Software, Writing – original draft
Affiliation: Particulas Environmental Engineering and Management, Chile
Marcelo Oyaneder
Roles: Software, Validation, Visualization, Writing – original draft
Affiliations: Particulas Environmental Engineering and Management, Chile, Chemical Engineering Department, Faculty of Engineering, University of Santiago of Chile, Estación Central, Santiago, Chile
ORCID: https://orcid.org/0009-0004-0333-7845
Nicolás Reinoso
Roles: Data curation, Formal analysis, Investigation, Methodology
Affiliation: Chemical Engineering Department, Faculty of Engineering, University of Santiago of Chile, Estación Central, Santiago, Chile
Luna Billartello
Roles: Formal analysis, Investigation, Methodology, Resources, Validation
Affiliation: Chemical Engineering Department, Faculty of Engineering, University of Santiago of Chile, Estación Central, Santiago, Chile
Andrea Espinoza-Pérez
Roles: Validation, Visualization, Writing – original draft, Writing – review & editing
Affiliations: Program for the Development of Sustainable Production Systems (PDSPS), Faculty of Engineering, University of Santiago of Chile, Estación Central, Santiago, Chile, Industrial Engineering Department, Faculty of Engineering, University of Santiago of Chile, Estación Central, Santiago, Chile
ORCID: https://orcid.org/0000-0002-6362-9100
Lorena Espinoza-Pérez
Roles: Software, Supervision, Visualization, Writing – original draft, Writing – review & editing
Affiliations: Program for the Development of Sustainable Production Systems (PDSPS), Faculty of Engineering, University of Santiago of Chile, Estación Central, Santiago, Chile, Industrial Engineering Department, Faculty of Engineering, University of Santiago of Chile, Estación Central, Santiago, Chile
Ernesto Pino-Cortés
Roles: Supervision, Validation, Visualization, Writing – original draft, Writing – review & editing
Affiliation: Escuela de Ingeniería Química, Pontificia Universidad Católica de Valparaíso, Valparaíso, Chile
ORCID: https://orcid.org/0000-0001-5133-2812
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
[/RAW_REF_TEXT]
1. Mahendra H. N. et al., “Assessment and Prediction of Air Quality Level Using ARIMA Model: A Case Study of Surat City, Gujarat State, India,” Nat. Environ. Pollut. Technol., vol. 22, no. 1, pp. 199–210, Mar. 2023,
2. Manisalidis I., Stavropoulou E., Stavropoulos A., and Bezirtzoglou E., “Environmental and Health Impacts of Air Pollution: A Review,” Front. Public Heal., vol. 8, p. 505570, Feb. 2020, pmid:32154200
3. IQAir, “World Air Quality Report,” 2021.
4. Shams S. R. et al., “Assessing the effectiveness of artificial neural networks (ANN) and multiple linear regressions (MLR) in forcasting AQI and PM10 and evaluating health impacts through AirQ+ (case study: Tehran),” Environ. Pollut., vol. 338, p. 122623, Dec. 2023, pmid:37806430
5. Rossi D., Mascolo A., Mancini S., Breton J. G. C., Breton R. M. C., and Guarnaccia C., “Modelling and Forecast of Air Pollution Concentrations during COVID Pandemic Emergency with ARIMA Techniques: the Case Study of Two Italian Cities,” WSEAS Trans. Environ. Dev., vol. 19, pp. 151–162, Feb. 2023,
6. WHO, “World health statistics 2022: monitoring health for the SDGs, sustainable development goals,” 2022.
7. IQAir, “World’s most polluted cities,” 2022.
8. Alam M. A., Mukherjee A., Bhattacharya P., and Bundschuh J., “An appraisal of the principal concerns and controlling factors for Arsenic contamination in Chile,” Sci. Rep., vol. 13, no. 1, p. 11168, Jul. 2023, pmid:37429943
9. Varela D., Corrales C., Corrales T., Salazar F., Solís C., and Yáñez-Figueroa C., “Asociación entre los niveles de NO, NO2, SO2, O3, CH4 en el aire y las tasas de hospitalización del Hospital Adriana Cousiño de Quintero durante los años 2012 al 2018,” Rev. ANACEM, vol. 14, no. 2, pp. 29–42, 2020, Accessed: Jun. 30, 2022. [Online]. Available: https://revista.anacem.cl/wp-content/uploads/2021/05/Asociacion-entre-los-niveles-de-NO-NO2-SO2-O3-CH4-en-el-aire-y-las-tasas-de-hospitalizacion-del-Hospital-Adriana-Cousino-de-Quintero-durante-los-anos-2012-al-2018.pdf
10. Álamos N. et al., “High resolution inventory of atmospheric emissions from transport, industrial, energy, mining and residential sectors of Chile Earth System Science Data Discussions,” Earth Syst. Sci. Data, 2021,
11. Díaz-Robles L. A. et al., “A hybrid ARIMA and artificial neural networks model to forecast particulate matter in urban areas: The case of Temuco, Chile,” Atmos. Environ., vol. 42, no. 35, pp. 8331–8340, Nov. 2008,
12. Ministry of Environment, The geographical area that includes the city of Coyhaique and its surrounding area is declared an area saturated by Respirable Particulate Material MP10, as a daily and annual concentration, in accordance with the polygon indicated. Chile: https://bcn.cl/2kh0n, 2012.
13. Ministry of Environment, The geographical area that includes the city of Coyhaique and its surrounding area is declared a saturated zone for Fine Respirable Particulate Matter MP2.5, as a 24-hour concentration. Chile: https://bcn.cl/2kh0m, 2016.
14. Ministry of Environment, Establishes Atmospheric Decontamination Plan for the city of Coyhaique and its surrounding area. Chile: https://bcn.cl/2kgyh, 2017.
15. Perez P., Menares C., and Ramirez C., “PM2.5 Forecasting in the most polluted city in South America,” in WIT Transactions on Ecology and the Environment, vol. 230, Syngellakis S, Ed. WIT Press, 2018, pp. 199–204. https://doi.org/10.2495/AIR180181
16. Muñoz-Ibáñez F. G. and Cáceres-Lillo D. D., “Impacto del recambio de tecnología de calefacción en la concentración atmosférica por MP2,5 y en las admisiones por urgencias respiratorias en Coyhaique, Chile,” Cad. Saude Publica, vol. 36, no. 6, 2020, pmid:32609172
17. Perez P., Menares C., and Ramírez C., “PM2.5 forecasting in Coyhaique, the most polluted city in the Americas,” Urban Clim., vol. 32, p. 100608, Jun. 2020.
18. Zhang M. et al., “Characters of Particulate Matter and Their Relationship with Meteorological Factors during Winter Nanyang 2021–2022,” Atmosphere (Basel), vol. 14, no. 1, p. 137, Jan. 2023.
19. Solís R. et al., “Long-term airborne particle pollution assessment in the city of Coyhaique, Patagonia, Chile,” Urban Clim., vol. 43, p. 101144, May 2022.
20. Valenzuela-Fuentes K., Alarcón-Barrueto E., and Torres-Salinas R., “From Resistance to Creation: Socio-Environmental Activism in Chile’s ‘Sacrifice Zones,’” Sustainability, vol. 13, no. 6, p. 3481, Mar. 2021.
21. Peragallo R., “The State Production of Sacrifice Zones in Chile: An In-Depth Study of the Quintero-Puchuncavi Case,” Pontificia Universidad Catolica de Chile, Santiago, 2020.
22. Pino-Cortés E., Carrasco S., Acosta J., de Almeida Albuquerque T. T., Pedruzzi R., and Díaz-Robles L. A., “An evaluation of the photochemical air quality modeling using CMAQ in the industrial area of Quintero-Puchuncavi-Concon, Chile,” Atmos. Pollut. Res., vol. 13, no. 3, p. 101336, Mar. 2022.
23. Seguel R. J. et al., “Volatile organic compounds measured by proton transfer reaction mass spectrometry over the complex terrain of Quintero Bay, Central Chile,” Environ. Pollut., vol. 330, p. 121759, Aug. 2023, pmid:37146872
24. Carreño G., López-Cortés X. A., and Marchant C., “Machine Learning Models to Predict Critical Episodes of Environmental Pollution for PM2.5 and PM10 in Talca, Chile,” Mathematics, vol. 10, no. 3, p. 373, Jan. 2022.
25. Issa Zadeh S. B. and Garay-Rondero C. L., “Enhancing Urban Sustainability: Unravelling Carbon Footprint Reduction in Smart Cities through Modern Supply-Chain Measures,” Smart Cities, vol. 6, no. 6, pp. 3225–3250, Nov. 2023.
26. Herrera P., Rojo J., and Scapini V., “Relationship between pollution levels and poverty: regions of Antofagasta, Valparaiso and Biobio, Chile,” Int. J. Energy Prod. Manag., vol. 7, no. 2, pp. 176–184, Jul. 2022.
27. Aladağ E., “Forecasting of particulate matter with a hybrid ARIMA model based on wavelet transformation and seasonal adjustment,” Urban Clim., vol. 39, Sep. 2021.
28. Pakrooh P. and Pishbahar E., “Forecasting Air Pollution Concentrations in Iran, Using a Hybrid Model,” Pollution, vol. 5, no. 4, pp. 739–747, Oct. 2019.
29. Kaur J., Parmar K. S., and Singh S., “Autoregressive models in environmental forecasting time series: a theoretical and application review,” Environ. Sci. Pollut. Res., vol. 30, no. 8, pp. 19617–19641, Jan. 2023, pmid:36648728
30. Kożuch A., Cywicka D., and Adamowicz K., “A Comparison of Artificial Neural Network and Time Series Models for Timber Price Forecasting,” Forests, vol. 14, no. 2, p. 177, Jan. 2023.
31. Yang G. R. and Wang X.-J., “Artificial Neural Networks for Neuroscientists: A Primer,” Neuron, vol. 107, no. 6, pp. 1048–1070, Sep. 2020, pmid:32970997
32. Jothilakshmi S. and Gudivada V. N., “Large Scale Data Enabled Evolution of Spoken Language Research and Applications,” 2016, pp. 301–340.
33. Alkabbani H., Ramadan A., Zhu Q., and Elkamel A., “An Improved Air Quality Index Machine Learning-Based Forecasting with Multivariate Data Imputation Approach,” Atmosphere (Basel), vol. 13, no. 7, p. 1144, Jul. 2022.
34. Agarwal A. and Sahu M., “Forecasting PM2.5 concentrations using statistical modeling for Bengaluru and Delhi regions,” Environ. Monit. Assess., vol. 195, no. 4, 2023, pmid:36949261
35. Ministry of Environment, “Sistema de Información Nacional de Calidad del Aire,” https://sinca.mma.gob.cl/, 2021.
36. Hyndman R. J. and Khandakar Y., “Automatic Time Series Forecasting: The forecast Package for R,” J. Stat. Softw., vol. 27, no. 3, 2008.
37. Kuhn M., “Building Predictive Models in R Using the caret Package,” J. Stat. Softw., vol. 28, no. 5, 2008.
38. Kuhn M. and Johnson K., Applied Predictive Modeling. London: Springer, 2013. https://doi.org/10.1007/978-1-4614-6849-3
39. Carslaw D. C. and Ropkins K., “openair—An R package for air quality data analysis,” Environ. Model. Softw., vol. 27–28, pp. 52–61, Jan. 2012.
40. van Greunen J. and Heymans A., “Determining the Impact of Different Forms of Stationarity on Financial Time Series Analysis,” in Business Research, Singapore: Springer Nature Singapore, 2023, pp. 61–76. https://doi.org/10.1007/978-981-19-9479-1_4
41. Zhang X., Ding C., and Wang G., “An Autoregressive-Based Kalman Filter Approach for Daily PM2.5 Concentration Forecasting in Beijing, China,” BIG DATA, 2023, pmid:37134205
42. Tsan Y.-T., Chen D.-Y., Liu P.-Y., Kristiani E., Nguyen K. L. P., and Yang C.-T., “The Prediction of Influenza-like Illness and Respiratory Disease Using LSTM and ARIMA,” Int. J. Environ. Res. Public Health, vol. 19, no. 3, Feb. 2022, pmid:35162879
43. Mondal P., Shit L., and Goswami S., “Study of Effectiveness of Time Series Modeling (Arima) in Forecasting Stock Prices,” Int. J. Comput. Sci. Eng. Appl., vol. 4, no. 2, pp. 13–29, Apr. 2014,
44. Vallejo F., Díaz-Robles L. A., Vega R. E., and Cubillos F., “A novel approach for prediction of mass yield and higher calorific value of hydrothermal carbonization by a robust multilinear model and regression trees,” J. Energy Inst., vol. 93, no. 4, 2020.
45. Espinoza Pérez L., Espinoza Pérez A., Pino-Cortés E., Vallejo F., and Díaz-Robles L. A., “An environmental assessment for municipal organic waste and sludge treated by hydrothermal carbonization,” Sci. Total Environ., vol. 828, p. 154474, Jul. 2022, pmid:35276176
46. Das R., Middya A. I., and Roy S., “High granular and short term time series forecasting of PM2.5 air pollutant—a comparative review,” Artif. Intell. Rev., vol. 55, no. 2, pp. 1253–1287, Feb. 2022.
47. Zafra C., Ángel Y., and Torres E., “ARIMA analysis of the effect of land surface coverage on PM10 concentrations in a high-altitude megacity,” Atmos. Pollut. Res., vol. 8, no. 4, pp. 660–668, Jul. 2017.
48. Wu Y., Li R., Cui L., Meng Y., Cheng H., and Fu H., “The high-resolution estimation of sulfur dioxide (SO2) concentration, health effect and monetary costs in Beijing,” Chemosphere, vol. 241, p. 125031, Feb. 2020, pmid:31610459
49. Cereceda-Balic F. et al., “Emission factors for PM2.5, CO, CO2, NOx, SO2 and particle size distributions from the combustion of wood species using a new controlled combustion chamber 3CE,” Sci. Total Environ., vol. 584–585, no. x, pp. 901–910, 2017, pmid:28189303
50. Waseem K. H. et al., “Forecasting of Air Quality Using an Optimized Recurrent Neural Network,” Processes, vol. 10, no. 10, p. 2117, Oct. 2022.
51. Borah J. et al., “AiCareBreath: IoT-Enabled Location-Invariant Novel Unified Model for Predicting Air Pollutants to Avoid Related Respiratory Disease,” IEEE Internet Things J., vol. 11, no. 8, pp. 14625–14633, Apr. 2024.
© 2025 Vallejo et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
This study analyzes air quality in Chile from 2016 to 2021 using data from the National Air Quality Information System (SINCA) and its network of monitoring stations. It focuses on Quintero, Puchuncaví, and Coyhaique, with the primary objective of building predictive models for sulfur dioxide (SO2), fine particulate matter (PM2.5), and coarse particulate matter (PM10). A hybrid forecasting strategy combines Autoregressive Integrated Moving Average (ARIMA) models with Artificial Neural Networks (ANN) and incorporates external covariates such as wind speed and direction to improve prediction accuracy. Monitoring stations including Quintero, Ventanas, Coyhaique I, and Coyhaique II supplied the data used for model development. The focus on industrial and residential zones highlighted the importance of identifying pollutant origins and the influence of wind direction on measured concentrations. Model performance was evaluated using the Akaike Information Criterion (AIC), Ljung-Box tests, and additional statistical indicators. The hybrid ARIMA-ANN models showed strong predictive capability, with R² exceeding 0.90, and the neural network models proved adaptable and precise.
The findings reveal that geographical and climatic factors, especially in Coyhaique, contribute to elevated pollution levels through a seasonal stagnation effect caused by topography and low winter temperatures. These results underscore the need for tailored air quality management strategies and highlight the potential of advanced modeling techniques to improve future air quality forecasts and deepen the understanding of environmental challenges in Chile.
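The hybrid idea summarized above can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a residual-based combination (a linear time series model plus a neural network fitted to its residuals using wind covariates), substitutes a least-squares AR(2) model for a full ARIMA fit, and runs on synthetic data. All function names, parameter values, and the simulated wind effect are hypothetical.

```python
import numpy as np

def lagged_design(y, p):
    """Rows [1, y[t-1], ..., y[t-p]] for t = p..n-1, plus the matching targets y[t]."""
    n = len(y)
    cols = [np.ones(n - p)] + [y[p - k:n - k] for k in range(1, p + 1)]
    return np.column_stack(cols), y[p:]

def fit_mlp(X, t, hidden=8, epochs=3000, lr=0.05, seed=0):
    """One-hidden-layer tanh network trained by full-batch gradient descent on MSE."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, hidden)
    b2 = 0.0
    n = len(t)
    for _ in range(epochs):
        H = np.tanh(X @ W1 + b1)                  # hidden activations
        err = H @ W2 + b2 - t                     # prediction error
        dH = np.outer(err, W2) * (1.0 - H ** 2)   # backpropagated error
        W2 -= lr * (H.T @ err) / n
        b2 -= lr * err.mean()
        W1 -= lr * (X.T @ dH) / n
        b1 -= lr * dH.mean(axis=0)
    return lambda Z: np.tanh(Z @ W1 + b1) @ W2 + b2

def r2(obs, pred):
    """Coefficient of determination."""
    return 1.0 - np.sum((obs - pred) ** 2) / np.sum((obs - obs.mean()) ** 2)

def ljung_box_q(resid, h=10):
    """Ljung-Box Q statistic over lags 1..h (compare with a chi-squared quantile)."""
    r = resid - resid.mean()
    n = len(r)
    denom = np.sum(r ** 2)
    rho = np.array([np.sum(r[k:] * r[:-k]) / denom for k in range(1, h + 1)])
    return n * (n + 2) * np.sum(rho ** 2 / (n - np.arange(1, h + 1)))

# Synthetic series (assumption): concentration rises under weak wind from one direction.
rng = np.random.default_rng(42)
n, p = 600, 2
ws = rng.gamma(2.0, 1.5, n)            # wind speed
wd = rng.uniform(0.0, 2 * np.pi, n)    # wind direction in radians
effect = 3.0 * np.exp(-0.5 * ws) * (1.0 + np.cos(wd - np.pi / 4))
y = np.zeros(n)
for t in range(1, n):
    y[t] = 0.6 * y[t - 1] + effect[t] + rng.normal(0.0, 0.2)

# Step 1: linear autoregressive fit on the first 80% of the series.
X_ar, target = lagged_design(y, p)
split = int(0.8 * len(target))
beta, *_ = np.linalg.lstsq(X_ar[:split], target[:split], rcond=None)
ar_pred = X_ar @ beta
resid = target - ar_pred

# Step 2: neural network on the AR residuals, driven by standardized wind covariates.
feats = np.column_stack([ws, np.sin(wd), np.cos(wd)])[p:]
mu, sd = feats[:split].mean(axis=0), feats[:split].std(axis=0)
Z = (feats - mu) / sd
net = fit_mlp(Z[:split], resid[:split])
hybrid_pred = ar_pred + net(Z)

# Step 3: compare both models on the held-out 20%.
r2_ar = r2(target[split:], ar_pred[split:])
r2_hybrid = r2(target[split:], hybrid_pred[split:])
q_ar = ljung_box_q(resid[:split])
print(f"AR R2: {r2_ar:.3f}  hybrid R2: {r2_hybrid:.3f}  Ljung-Box Q (AR residuals): {q_ar:.1f}")
```

Wind direction enters as sine and cosine components so that 0 and 2π are treated as the same direction. In practice the linear stage would be a properly identified ARIMA model selected with a criterion such as AIC, and the Ljung-Box statistic would be checked against a chi-squared quantile to confirm that the residuals are uncorrelated.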