Air quality models, such as the Community Multiscale Air Quality (CMAQ), Weather Research and Forecasting-Chemistry (WRF-chem), the Comprehensive Air Quality Model with Extensions (CAMx), and the Nested Air Quality Prediction Modeling System (NAQPMS) models, are effective tools for atmospheric pollution analysis. They are widely used in the prediction and analysis of the temporal and spatial distribution of atmospheric pollutants. However, there are uncertainties in terms of emission sources, initial conditions (ICs), boundary conditions (BCs), and chemical processes that affect the accuracy of these models (Bouttier & Courtier, 1999; Hanna et al., 1998; Moore & Londergan, 2001). In particular, the ICs input the atmospheric state at a specified time into the model, playing an important role in air quality prediction. Data assimilation (DA), which can incorporate observations into the model, has been shown to be an effective method to improve the ICs.
Since the assimilation of the initial field is significant for building an accurate analysis field, which is crucial for improving the short-term forecast and analyzing the vertical diffusion and extinction of aerosol, many studies and multiple DA methods have been conducted and have obtained significant assimilation effects (Adhikary et al., 2008; Dai et al.,2019; Gao et al., 2017; Hirtl et al., 2014; Kumar et al., 2019; Peng et al., 2018; Werner et al., 2019; Yin et al., 2016). The variational methods aim to obtain the optimal estimate by establishing the adjoint of forward model and minimizing a cost function (Bocquet et al., 2015). Three-dimensional/four-dimensional variational methods (3D/4D VAR) are widely used in research and operational forecast of air pollution. Liu et al. (2011) simulated a dust storm process in eastern China in 2010 using the WRF-chem model and the gridpoint statistical interpolation (GSI) 3D variational data assimilation system. The total AOD (aerosol optical depth) retrieval products from the moderate resolution imaging spectroradiometer (MODIS) were assimilated. The results indicated that the AOD of the model was improved significantly with assimilation, with the bias (BIAS) and root mean square error (RMSE) being improved by 20% on average compared to the results without assimilation. Li et al. (2013) established a three-dimensional variational (3DVAR) system for the MOSAIC 4bin aerosol scheme in WRF-Chem and improved the prediction of PM2.5 by assimilating the surface observation in Los Angeles basin. The result shows the correlation coefficient of PM2.5 increased from 0.51 to 0.86, and the RMSE of PM2.5 decreased from 11 to 4.21 μg/m3 after assimilation. Yumimoto et al. (2008) developed a 4DVAR system to assimilate the vertical profiles of the dust extinction coefficients derived from NIES Lidar. It is found that the RMSE of dust aerosol optical thickness decrease by 35%–40% and the dust emission improve 57.8% with DA in Seoul, Matsue, and Toyama. The Filtering methods such as extended/ensemble Kalman filter (EKF/EnKF) are the sequential assimilation methods. Pagowski et al. (2012) established an EnKF system for MOSAIC scheme of WRF-chem model and improved the surface aerosol concentrations over the United States. Yumimoto et al. (2016) used Local Ensemble Transform Kalman Filter (LETKF) method to establish a satellite AOD assimilation system; the results show that the assimilation improved the dust emission over the Gobi Desert and Mongolia and the model with DA can reproduced the observed transboundary pollution that was not captured without DA. Wu et al. (2008) compared four kinds of assimilation algorithms (including the OI method, the reduced-rank square root Kalman filter (RRSQRT), and the EnKF, and 4DVAR) on the simulation of ozone by the Polyphemus model. The results showed that all four assimilation schemes improved the accuracy of one-day forecasting. Besides, some hybrid methods combining the variational method and the filtering method have also been implemented. Schwartz et al. (2014) established a hybrid variational-ensemble DA system and used it to assimilate AOD and PM2.5. By comparing it with 3DVAR and EnSRF, they found that with the hybrid data assimilation system, the WRF-chem model improves effectively the forecast of aerosols. Lee et al. (2017) employed Maximum Likelihood Ensemble Filter (MLEF) to establish a hybrid variational-ensemble DA system for WRF-chem and assimilated the meteorological and aerosol observations. The results indicated that the RMSE can decrease 21.6% for the 24-h forecast with DA.
Although there have been some achievements in the field of atmospheric chemical data assimilation, they are insufficient for assimilation studies based on the CMAQ model in China. Most of the studies aimed to assimilate the gas variable or a single aerosol variable. Since the CMAQ model has been demonstrated to be a creditable research tool in Asia (Liu et al. 2010a, 2010b; Wang et al., 2008, 2014), it is essential to research assimilation systems for the CMAQ model. Huang et al. (2014) developed a CO2 DA system under an EnKF framework based on the CMAQ model and carried out East Asia CO2 DA experiments. The results indicate that the assimilation of surface CO2 observations can effectively improve the simulation of the CO2 concentration by CMAQ, with the assimilation effect primarily occurring at the assimilation stations and downwind areas. The BIAS and RMSE of the daily average CO2 concentration at the assimilation stations are reduced by 1.00 and 1.83 ppm, respectively. In terms of aerosol assimilation based on the CMAQ model, some research assimilated PM2.5 by merged aerosol variables in the DA system. Tang et al. (2015) merged aerosol variables from the AERO5 module as PM2.5, and employed the optimal interpolation (OI) method to assimilate surface observation and MODIS AOD data. The results indicate that assimilation effectively reduces the mean bias of aerosols by the model. Feng et al. (2018) combined the organic carbon (OC), elemental carbon (EC), sulfate, nitrate, ammonium, sea salt, and unspeciated aerosols in the CMAQ model as a PM2.5 variable and then assimilated them in the GSI system. By assimilating the surface PM2.5 observations, they improved the ability of the CMAQ model to forecast the PM2.5 concentration in China in the winter of 2015. Lee et al. (2020) considered the uncertainties in anthropogenic emissions and estimated a new background error covariance (BEC) for PM2.5. Compared with the BEC calculated by the traditional NMC (National Meteorological Center) method, they found that the assimilation effect was more significant by using the new BEC assimilating the surface PM2.5 in East China and South Korea. Note that the PM10 concentration is not used in these studies, since there is only one aerosol variable (PM2.5) in the DA. As a significant pollutant in China, it is necessary to add the coarse particle concentration to the data assimilation system as an independent variable.
The Sichuan Basin (SCB) is one of the most air-polluted regions in China, extending in the north to the Yunnan-Guizhou Plateau and east to the Qinghai-Tibet Plateau. Surrounding by high elevations, atmospheric pollution events occur frequently in the Sichuan Basin region due to its special topographic conditions and its meteorological conditions of low wind speed and high humidity (Liu et al., 2016; Luo et al., 2014; Zhao et al., 2018). In addition, the Chengdu and Chongqing in the Sichuan Basin, united as a Chengdu-Chongqing city cluster, are the economic center of and one of the most densely populated regions in Western China. Therefore, it is worthwhile to conduct research on air pollution and air quality forecasting in the Sichuan Basin.
In this study, we combined the aerosol species of the CMAQ model into PM2.5 and PM2.5-10, and established a 3DVAR assimilation system for the two variables. To analyze and evaluate the improvement of the assimilation system, we designed two parallel experiments for a heavy haze episode of January 13–16, 2018 in the Sichuan Basin. The contents of this study are organized as follows. Section 2 describes the configuration of the model and the 3DVAR system. Section 3 introduces a description of the experimental procedure and relevant statistical indicators in this study. Section 4 evaluates the improvement of the data assimilation system for the aerosols. The summary and a discussion of the experimental results are provided in Section 5.
Materials and Data Model ConfigurationThe WRF-CMAQ adopted in this study is a widely used “offline” regional air quality model coupling the WRF (version 3.7) and CMAQ (version 5.1) models. The WRF model is a nonhydrostatic mesoscale model for meteorological forecasting (Skamarock & Klemp, 2008) developed by the U.S. National Centers for Environmental Prediction (NECP) and the U.S. National Center for Atmospheric Research (NCAR). It contains a variety of physical parameterization schemes and is widely used in meteorological research and operational forecasting (Grell et al., 2005). The CMAQ model is a third-generation air quality forecast model developed by the US Environmental Protection Agency (EPA). It comprehensively considers a variety of pollutants in the atmosphere and their complex chemical processes (including various gas-phase chemical processes, liquid-phase chemical processes, heterogeneous chemical processes, and so on). It is suitable for air quality forecasting, and atmospheric chemistry research (Eder & Yu, 2006; Tesche et al., 2006; Wyat et al., 2017). In this study, WRF provides the meteorological field data to drive CMAQ in the simulation of the chemical field with hourly updating of the meteorological field data.
Figure 1 depicts the model domain of the experiment. For WRF, two nested domains are employed, where the outer domain covers most of the East Asia region, and the inner domain covers mainly Southwest China which is centered in the SCB region. As depicted in Figure 1b, the SCB region in the red box includes the total Sichuan Basin and surrounding areas. The observation data of this area are influential for adjusting assimilation in the SCB region. The outer domain is centered on the north of Chengdu (103.3°E, 33.3°N), with a 9 km horizontal resolution (540 × 513 grids). The inner domain is centered on the east of Chengdu (105.086°E, 30.387°N), with a 3 km horizontal resolution (472 × 442 grids). For the CMAQ model, it only runs over the inner domain, and the chemical field is driven by the meteorological data of the WRF forecasts. There are 31 and 16 layers in the vertical directions of WRF and CMAQ, respectively. These layers are uneven, and the vertical resolution decreases as the altitude increases. In this study, the Lin scheme (Gustafson et al., 2007) was employed as the microphysics scheme of WRF, the RRTM scheme (Mlawer et al., 1997) as the longwave radiation scheme, the Goddard scheme (Chou & Suarez, 1994) as the shortwave radiation scheme, the MYJ scheme (Mellor & Yamada. 1982) as the surface-layer scheme, and the Noah scheme (Chen et al., 2010) as the land surface parameterization scheme. For the CMAQ model, the chemical mechanism of cb05tucl_ae6_aq was employed. This mechanism simultaneously considers liquid-phase and aerosol chemistry and uses the QSSA algorithm for solutions (Binkowski & Roselle. 2003; Cheng et al., 2010).
Figure 1. The model domains (The bar represents the terrain altitude, the SCB represent the Sichuan Basin region, the black character is the name of provinces, and the ◎ represents the provincial capital).
In our experiment, the final analysis (FNL) data published by NCAR were employed to provide ICs and BCs for the WRF meteorological model. The FNL data are at a spatial resolution of 1° × 1° and are updated once every 6 h. The emission source data from the MEIC (Multiresolution Emission Inventory for China) of 2012 are used in our experiment (Li et al., 2017; Zheng et al., 2018). This emission inventory provides nationwide emissions data, including those from electric power generation, industry, civil works, transportation, and agriculture, and supports the outputs of five chemical mechanisms, including SAPRC99, SAPRC07, CB05, CBIV, and RADM2 (He, 2012; Zhang et al., 2009).
Three-Dimensional Variational Data Assimilation System Formulation of 3DVARIn this study, we established a 3DVAR system using the scheme of Li et al. (2008) and Zang, Li, et al. (2016) to assimilate the aerosol variables of the CMAQ model. Considering that the ENKF algorithm requires a great number of ensemble members (approximately 30–50 members) to estimate the reasonable mode errors, which consumes large amounts of computer time and computing resources. The 4DVAR algorithm has to write the accompaniment of the whole chemical process, and all the four-dimensional variables in the assimilation window are required in the data assimilation system for iteration; thus, the amount of calculation is also significantly increased. Considering the calculation cost, calculation speed and assimilation effect, we prefer to use the 3DVAR system in our experiment rather than ENKF or 4DVAR in the experiment. By constructing and solving the cost function, the optimal incremental field of the state variables can be obtained in the 3DVAR system. The cost function can be expressed as: [Image Omitted. See PDF]
Here, is the true state variable of the atmosphere, represents the vector of the background field, including PM2.5 and PM2.5-10 in this study. is the observation operator that converts the model vector to the observed variables, represents the observed data to be assimilated, and denote the BEC matrix and the observation error covariance matrix, respectively. In practical application, we consider both the model and observation, iteratively calculating the minimum value of the cost function.
Conversion From Model Variables to Control VariablesIn the CMAQ model, three lognormal distribution modes (Aitken(i), accumulation(j), and coarse(k) modes) are applied to represent the size distributions of aerosols (Byun & Schere, 2006). It is necessary to convert the model variables to PM2.5 and PM2.5-10 before assimilated. Most studies used the sum of particulate matter in the Aitken and accumulation modes (PMi + j) as PM2.5. However, PMi + j would differ significantly from PM2.5 because of the variation in relative humidity (Jiang et al., 2006; Nolte et al., 2015).
To directly compare the concentration of aerosols with the measurement data of PM2.5 and PM2.5-10, three coefficients, namely, PM25AT, PM25AC, and PM25CO, are used to calculate the concentration of PM2.5, which represent the proportion of PM2.5 in the three respective modes in the CMAQ model. Note that these three coefficients are between 0 and 1 and vary with time and position. In this study, a total of 47 aerosol species, including EC, OC, sulfate, and nitrate, from the AERO6 aerosol module are used to calculate the mass concentrations of PM2.5 and PM2.5-10. The aerosol species are shown in Table S1. Similar to Tang et al. (2017), the PM2.5 and PM2.5-10 can be expressed as: [Image Omitted. See PDF] [Image Omitted. See PDF]where , and are the concentrations of Aitken mode, accumulation mode and coarse mode aerosols, respectively. , , and are the proportion coefficients of PM2.5 in Aitken mode, accumulation mode, and coarse mode, respectively. Accordingly, , , and are the proportion coefficients of PM2.5-10 in Aitken mode, accumulation mode, and coarse mode. Equations 2 and 3 are implemented to calculate the PM2.5 and PM2.5-10 of the background field from the variables of the CMAQ model. In contrast, the increments of PM2.5 and PM2.5-10 obtained from the 3DVAR system are distributed proportionally into variables of the CMAQ model using the proportion coefficients of Equations 2 and 3.
Background Error Covariance StatisticsBEC plays an essential role in the 3DVAR, representing the model errors and influencing the assimilation effect. In this study, the BEC statistics are evaluated with the NMC method proposed by Parrish and Derber (1992). The 24-h and 48-h continuous simulations of January 2018 were conducted, with differences between the two simulations at the same time assumed to be the BEC statistics. Since the BEC matrix is too large to handle in practice, we simplified the calculation of the BEC in this experiment based on the scheme of Li et al. (2013) by decomposing it into a standard deviation matrix and a correlation matrix: [Image Omitted. See PDF]where is the background error standard deviation matrix and is the correlation matrix. The complete correlation matrix is the square of the variable size, calculated as: [Image Omitted. See PDF]where , , represent the size of grid points in the three dimensions respectively, and is the number of the variables. In our experiment, the PM2.5 and PM2.5-10 include 47 kinds of aerosol species in the AERO6 aerosol chemical mechanism in the CMAQ model. The calculation cost of multivariate assimilation requires large amounts of computer time and memory as the number of variables increases. Thus, we combined the aerosol species into PM2.5 and PM2.5-10 and established the 3DVAR data assimilation system for the two variables. As two independent control variables, PM2.5 and PM2.5-10 are uncorrelated. Therefore, the correlation matrix can be simplified to a diagonal matrix, given as: [Image Omitted. See PDF]
The correlation matrix of each variable can be further split using the Kronecker product method: [Image Omitted. See PDF]where , , and represent the , and correlation coefficient matrix in the , and directions, respectively. The values , and are the numbers of grid points in the three-dimensional directions. The operator is the tensor product, which is employed to construct a three-dimensional matrix from three one-dimensional matrices. According to Zang, Hao, et al. (2016), the and are assumed to be horizontally isotropic which means that the correlation coefficient is the same in both the X and Y directions. Thus, and are collectively called the horizontal correlation coefficient , it can be expressed as: [Image Omitted. See PDF]
In Equation 8, and represent the two points in the horizontal direction; the correlation between the two points decreases with distance. The correlation length scale is the only parameter that must be estimated in Equation 8, defined as the distance of the two points and between which the correlation decreases to .
To analysis the BEC of this system, the standard deviation of the BEC, as well as the horizontal and vertical length scales of aerosols, was obtained statistically. Figure 2a shows the vertical profiles of the standard deviation of the BEC. It is found that the error standard deviation of BEC of PM2.5 is an order of magnitude greater than that of PM2.5-10 in the low level of the atmosphere. For one thing, the concentration of PM2.5 is much higher than PM2.5-10 in this region. For another, the results also indicate that the forecast accuracy of the former is relatively low, leading to a large standard deviation. In addition, the curves of standard deviations of PM2.5 and PM2.5-10 in Figure 2a are all declining rapidly above the boundary layer (about 1 km). It is because most of aerosol is under the boundary layer and the concentration generally decreases rapidly above the boundary layer (Zang, Li, et al., 2016 ; Ma et al., 2018). It is worth noting that the standard deviation of BEC of PM2.5-10 reached its maximum at ∼1 km. A probable reason for this is that a part of PM2.5-10 originated from dust transported over long distances and accumulated near the boundary layer and above it, leading to poor model prediction skills at these altitudes. Figure 2b depicts the variation in the horizontal autocorrelations of PM2.5 and PM2.5-10, the black line represents the correlation length scale at the position . It shows that the horizontal autocorrelations of PM2.5 and PM2.5-10 decrease with distance, and the horizontal autocorrelation of PM2.5-10 attenuates more rapidly than that of PM2.5. At the position of 100 km, the horizontal autocorrelation of PM2.5-10 has been reduced to 0.2, while that of PM2.5 is still ∼0.4. This is probably because the particle size of PM2.5-10 is larger, and it is more prone to strong local deposition. The length scales are ∼18 and 12 km for PM2.5 and PM2.5-10, respectively, which is similar to those of Li et al. (2013) and Zang et al. (2016). The vertical autocorrelations of PM2.5 and PM2.5-10 are shown in Figures 2c and 2d, respectively. The higher autocorrelation of aerosols is mainly near the surface and the vertical autocorrelations of PM2.5 and PM2.5-10 both decrease with altitude. Compared to the vertical autocorrelation of PM2.5, the vertical autocorrelation of PM2.5-10 decreases faster, which is consistent with the trends of horizontal autocorrelation.
Figure 2. Background error covariation of PM2.5 and PM2.5-10. Panel (a): The vertical profiles of the standard deviation of the background errors for PM2.5 (blue line) and PM2.5-10 (red line). Panel (b): The horizontal autocorrelations of PM2.5 (blue line) and PM2.5-10 (red line), the black line represents the correlation length scale at the position e−1/2. Panel (c) and Panel (d): The vertical autocorrelations of PM2.5 and PM2.5-10, respectively.
The hourly observation data of PM2.5 and PM10 employed in the data assimilation system are from the real-time publishing platform of national urban air quality, the China Meteorological Data Sharing Network (
The observation errors are derived from the measurement error (observed value error) and the representative error (error of observation operator ), which affects the weight of observed data in the data assimilation system. The observation error is defined as: [Image Omitted. See PDF]where is the observation error of the assimilation variable, which is PM2.5 and PM2.5-10 in this experiment, is the measurement error, and is the representative error. The measurement error is the systematic error generated during monitoring by the instrument at each environmental monitoring station, which can be expressed as: [Image Omitted. See PDF]
Here, is the instrument error and is the observed value of particles. Currently, the oscillating balance detector is mainly used at each automatic air quality monitoring station to measure the quality of PM2.5 and PM10, and its instrument precision is = 1.5 μg/m3 (1-h average) (You et al., 2016). In addition to the instrumental error, the representative error needs to be taken into account. It is the error caused by converting the model variable to the observation variable (Schwartz et al., 2012) and can be expressed as: [Image Omitted. See PDF]where is the adjustment coefficient, is the mode resolution, and is the scale parameter of the observation station, which was taken to be 3 km following Feng et al. (2018).
Experimental DesignIn this study, two parallel experiments with and without assimilation, named the DA experiment and the control experiment, were carried out on a heavy haze episode that occurred in the SCB region from January 13 to 16, 2018. Similar to Pagowski et al. (2012), the WRF-CMAQ model was run out to continuous forecasting for 7 days to form a stable initial chemical field beginning from January 6, 2018. The experiments were then conducted from 12:00 UTC on January 12, 2018.
Figure 3 shows the flow chart of the experiments. In the control experiment (blue line), DA was not employed and the prediction of the control experiment was continuous during the pollution process. The CMAQ model was used for daily forecasting for 30 h starting from 00:00 UTC on January 13, 2018, while the WRF run forecast 12 h ahead of the CMAQ forecast to handle the spin-up problem of the meteorological field. The DA experiment (red line) was divided into two stages. First, the cycle assimilation process was performed from 00:00 to 06:00 UTC every day from January 13 to 16, 2018, so that the observed data of this period would effectively propagate as assimilated information in the model space. The concentrations of PM2.5 and PM2.5-10 in the background field were assimilated hourly by using the observed data at the corresponding moment in time to obtain a new analysis field. Second, after the completion of cycle assimilation, the model analysis field after assimilation at 06:00 UTC was taken as the initial field, and then a 24-h prediction was carried out.
Figure 3. Flow chart of control experiment and data assimilation (DA) experiment.
In the experiment, correlation coefficient (CORR), BIAS, and RMSE, were used as statistical indicators to evaluate the improvement of the assimilation on the model. The equations for calculating these statistical indicators are as follows: [Image Omitted. See PDF] [Image Omitted. See PDF] [Image Omitted. See PDF] [Image Omitted. See PDF] [Image Omitted. See PDF] [Image Omitted. See PDF]
In Equations 12–14, is the concentration of observation data; is the concentration of model variables at the observation position obtained by grid-point interpolation; is the covariance of observation data and model variables; are the variances of the observation data and the model variables, respectively; and is the number of samples. Note that the number of assimilated samples for the period of the forecast is the total number of effective observation stations in the D02 region. However, for the evaluation of individual stations, the number of samples is the number of hourly forecasts, that is, 24. In Equations 15–17, the subscript DA denotes the data DA experiment and the subscript control denotes the control experiment. dCORR, dBIAS, and dRMSE represent the differences in CORR, BIAS, and RMSE between the DA experiment and the control experiment, respectively. Note that though the variables in the DA system are PM2.5 and PM2.5-10, but PM2.5 and PM10 are kept as evaluation variables to maintain consistency with the observation data and prior research.
Results Pollution Process AnalysisDuring January 13–16 2018, there were 250–260 effective observation stations in the D02 region after considering the quality control of the observation data. Figure 4 depicts the location and concentration of surface observation of PM2.5 (the triangles represent observation stations and the color means the concentration of PM2.5) and 10-m wind (10 m above the ground) at 06:00 (UTC the same below) every day from January 13 to 16, 2018. It is found that high concentrations of pollution were primarily distributed in the central and northern of the D02 region. The Guanzhong Plain in Shaanxi maintained the highest concentration of PM2.5 during the pollution process, reaching 300 μg/m3. Severe pollution also occurred in part of Hubei, as well as some regions of Henan and Shanxi from January 14, with PM2.5 concentrations reaching ∼200 μg/m3. In the SCB region, the high concentrations of PM2.5 (reaching 200 μg/m3) occurred from January 13 to 15 and decreased on January 16.
Figure 4. Concentrations of PM2.5 observation (µg/m3) on the surface and the wind field (m/s) of 10 m at 06:00 UTC from January 13–16 (a, b, c, d).
Coupled with the wind field, it is found that the pollution in the SCB region is related to the wind and its divergence during the pollution process. At 06:00 UTC on January 13 (Figure 4a), the 10-m wind speed was low and the convergence was weak which would result in the accumulation of PM2.5 in the SCB region. At 06:00 UTC on January 14 (Figure 4b) and January 15 (Figure 4c), PM2.5 accumulated in the northern and western of SCB, respectively because of wind transport. At 06:00 UTC on January 16 (Figure 4d), the PM2.5 concentration in western Sichuan was reduced, affected by strong westerly winds from the clean mountainous area. Meanwhile, in the southeast of Sichuan and the south of Chongqing, there was a weak divergence field that results in a repaid decrease of the PM2.5 concentration. The pollution distribution and evolution trend of PM10 (not shown) were similar to that of PM2.5, with heavy pollution also over the SCB region and affected by the wind field.
Analysis of Initial FieldsFigure 5 shows the average concentration increments of PM2.5 and PM10 at the lowest model level at 06:00 UTC from January 13 to 16. The color bar represents concentration increments, calculated by the analysis field minus the background field. It is obvious that the positive increments for PM2.5 cover most of the D02 region (Figure 5a), which implies that the DA significantly improved the underestimation for the concentration of PM2.5 in the control experiment, especially in the SCB, and areas of Shaanxi and Hubei. The negative increments are primarily distributed in the southern and western of the D02 region; this is related to the low concentration of PM2.5 in these areas. The increment distribution of PM10 (Figure 5b) is similar to to that of PM2.5 in the D02 region, and the value of PM10 increment is only a little bigger than that of PM2.5 increment. It indicates that the relatively small increments of PM2.5-10. The primary reason is that the concentration and the standard deviation of PM2.5-10 are all less than that of PM2.5.
Figure 5. The average analysis increments of PM2.5 (a) and PM10 (b) concentrations on the surface at 0600 UTC.
Figures 6a and 6b respectively present the scatter plots of PM2.5 and PM10 in the SCB at 0600 UTC every day. Similar to Figure 5, it also shows that the CMAQ model underestimated the concentration of PM2.5 and PM10 in the SCB region in the control experiment (blue dots), especially for the high concentrations. In contrast to the control experiment, the initial fields of PM2.5 and PM10 concentrations in DA experiments are close to the observation values (red dots). By calculating the CORR, BIAS, and RMSE of the initial field of PM2.5 and PM10, it can be found that in the DA experiment, the CORR and BIAS increased and the RMSE decreased significantly for both PM2.5 and PM10. Specifically, the CORR of PM2.5 and PM10 increased by 0.59 and 0.65, the BIAS increased by 82.29 and 125.41 μg/m3, and the RMSE declined by 73.69 and 116.30 μg/m3, respectively. It is evident that the adjustments in forecast skills for PM10 are more significant than those for PM2.5, reflecting the inaccuracy of the simulation especially the underestimation of concentration for PM2.5-10 in the CMAQ model.
Figure 6. The Scatter plot of the initial fields of PM2.5 (a) and PM10 (b) from the control and data assimilation (DA) experiments at 0600 UTC.
The forecast skills (including the mean concentration, CORR, BIAS, and RMSE) with time in the SCB region were evaluated using the hourly surface observation from 00:00 UTC January 13 to 00:00 UTC January 17 (Figure 7 and Figure 8). As described in Section 2.3, the control experiment is conducted daily from 00:00 UTC to 06:00 UTC of the next day during the period of forecast (blue line). It is different in that the first 6 h from 00:00 UTC to 06:00 UTC is the period of assimilation (red dashed line), then the model forecasts 24 h from the 06:00 UTC every day (red line) in the DA experiment. For PM2.5 (Figure 7), the improvement resulting from hourly assimilation is significant in the period of assimilation, which can continue at least 24 h in the period of forecast. As shown in Figure 7a, the mean concentration in the DA experiment is closer to the observation data (black line) than that of the control experiment from January 13 to 16. Despite the mean concentration of PM2.5 being overestimated in the DA experiment on January 17, assimilation still improves the three statistical indicators (including CORR, BIAS, RMSE), which are also significantly improved in the DA experiment. Compared to the three statistical indicators in the control experiment, CORR (Figure 7b), and BIAS (Figure 7c) are notably higher and the RMSE (Figure 7d) is lower in the DA experiments within the first 18 h of the forecast. In addition, the improvement of forecast skills for the PM2.5 forecast every 6 h were also calculated (Table S2). It is noted that the improvements in the forecast skills decrease gradually over the forecast time due to the effects of emissions and model processes (Kahnert, 2008). On average, the PM2.5 concentration increases by 53%, CORR increases by 118%, BIAS increases by 63%, and RMSE decreases by 24% within 24 h.
Figure 7. The mean concentration, correlation coefficient (CORR), bias (BIAS), and root mean square error (RMSE) of PM2.5 (a, b, c, d) forecast from the data assimilation (DA) experiment and control experiment in the Sichuan Basin (SCB).
Figure 8. Same as Figure 7, but for the mean concentration, correlation coefficient (CORR), bias (BIAS), and root mean square error (RMSE) of PM10 (a, b, c, d).
Similarly, Figures 7 and 8 illustrates the evolution of the forecast skills of PM10 in the SCB region. The adjustment of assimilation for PM10 is similar to that for PM2.5 and the forecast skills are also improved within 24 h. Note that the improvement of assimilation decreases faster than that of PM2.5 during the period of forecast. This is mainly because of the rapid local deposition of PM2.5-10, which corresponds to the relatively low autocorrelation of PM2.5-10 in the vertical and horizontal directions in Section 2.3.2. Within 24 h, it is noteworthy that on average the BIAS increases by 58% and RMSE decreases by 30%, while PM10 concentration increases by 74% and CORR increases by 193% (Table S3). In consideration of the low concentration and strong localization of PM2.5-10, the significant improvements in mean concentration and CORR of PM10 are probably because of the serious underestimation of PM2.5-10 in the control experiment.
The assimilation effect of individual stations on the lowest level of the model is also evaluated. Similar to Pagowski et al. (2010), the concentrations of PM2.5 and PM10 from the model are interpolated to the location of observation stations, and the differences in forecast skills are compared from stations between the control experiment and DA experiment within 24 h. Figure 9 illustrates the dCORR of PM2.5 and PM10 at each station on the hourly average, which is calculated by Equation 15. The positive value of dCORR of stations represents that assimilation improved the CORR at these stations. Most of the stations that have positive dCORR values are located in the SCB region, where low wind speed and high concentration of aerosols were maintained during the forecast period (Figure 4), leading to a sustained improvement of assimilation. Especially in Chongqing, dCORR values are generally higher than 0.5. For PM2.5 (Figure 9a), there are 84 stations for which dCORR is positive in the SCB region, accounting for 89% of the total of 94 observation stations. Compared with PM2.5, there are fewer PM10 stations for which dCORR is positive (Figure 9b), accounting for 85% of the total observation stations.
Figure 9. The difference in correlation coefficient (dCORR) of PM2.5 (a) and PM10 (b) at observation stations, from average hourly (0–24 h) forecasts of the data assimilation (DA) experiment and the control experiment.
Figure 10 is the same as Figure 9 but for the dBIAS of PM2.5 and PM10, calculated by Equation 16. The positive value of dBIAS also means an improvement in BIAS. Compared to the distribution of dCORR, dBIAS of PM2.5 and PM10 shows positive values in more stations. For both PM2.5 (Figure 10a) and PM10 (Figure 10b), the dBIAS values are positive for all of the stations in the SCB region. This indicates that the model consistently underestimates concentrations of aerosols in this region and that assimilation increases the concentrations, which is consistent with Figures 7c and 8c. In addition, the dBIAS of PM10 is higher than that of PM2.5 at the same station which results from the contribution of PM2.5-10.
Figure 10. Same as Figure 9, but the difference in bias (dBIAS) of PM2.5 (a), PM10 (b).
Figure 11 illustrates the dRMSE of PM2.5 and PM10 at each station on the hourly average, calculated by Equation 17. In contrast with dCORR and dBIAS, the negative values of dRMSE represent the improvement due to assimilation in the observation stations. It can be found that the dRMSE are negative in most stations for PM2.5 and PM10. The stations with negative values of dRMSE of PM2.5 and PM10 account for 87% (Fig. 11a) and 96% (Figure 11b) of the total stations in the SCB region, respectively. In summary, considering the overall performance of all three statistical indicators, the proportion of fully improved (dCORR/dBIAS is positive and dRMSE is negative) stations is 76% for PM2.5 and that for PM10 is 89% in the SCB region.
Figure 11. Same as Figure 9 but the diffrence in root mean square error (dRMSE) of PM2.5 (a) and PM10 (b).
In this study, a 3DVAR system for aerosols of the CMAQ model is established. PM2.5 and PM2.5-10, merged from 47 model variables of the AERO6 aerosol module of the CMAQ model, are employed as control variables in the 3DVAR system that can simultaneously assimilate observations of PM2.5 and PM10 concentrations. For the BEC of control variables, it is found that the standard deviation of PM2.5 is an order of magnitude higher than that of PM2.5-10 in the low level of the atmosphere. The CORR of PM2.5-10 attenuated faster than that of PM2.5 in the horizontal and vertical direction, which was attributable to the rapid local deposition of PM2.5-10.
To evaluate the influence of assimilation on the initial field and forecast field of the aerosol concentrations of the CMAQ model, experiments were conducted on a heavy haze episode in Sichuan Basin of China from January 13 to 16, 2018. Compared to the control experiment that underestimated the aerosol concentrations in the SCB region, the initial fields of PM2.5 and PM10 concentrations were significantly improved in the DA experiment. Relative to the control experiment, the CORR of PM2.5 and PM10 of the DA experiment increased by 0.59 and 0.65, the BIAS increased by 82.29 and 125.41 μg/m3, and the RMSE declined by 73.69 and 116.30 μg/m3, respectively. The forecasting skills of PM2.5 and PM10 are also analyzed in the SCB region from January 13 to 16. For PM2.5 and PM10, the mean concentration increased by 53% and 74%, CORR increased by 118% and 193%, BIAS increased by 63% and 58%, and RMSE decreased by 24% and 30% on a 24-h average, respectively. The improvement due to assimilation is significant and sustainable during the period of forecast. It is similar to previous studies in that the influence of assimilation can continue for 6–24 h and will gradually approach the control experiment (Cheng et al. 2019; Dumitrache et al., 2016; Jiang et al. 2013). The horizontal distribution of the assimilation effect is also analyzed using the three statistical indicators during the forecast period. The results indicate that 76% of the stations are fully improved for PM2.5 and 89% of stations for PM10 in the SCB region.
Furthermore, this study found that the improvement effect of assimilation on PM2.5 was slightly different from the effect on PM10 in terms of spatial and temporal distributions caused by the contribution of PM2.5-10 to PM10. In the period of forecast, the adjustment of the assimilation of PM10 declined faster than that of PM2.5 with time. This results from the rapid local deposition of PM2.5-10, leading to the limited spread of the improvement from assimilation.
In this research, the improvement of assimilation in the SCB region is significant, which is probably related to the special meteorological conditions in this basin. Wang et al. (2018) found that the high altitudes block surface winds, which may cause frequent air stagnation events that affect PM2.5 dispersion in the SCB region. In addition, a strong temperature inversion frequently appears above the basin in the winter; coupled with the low speed of the wind and the high concentration of pollutants, this leads to heavy pollution over the area (Miao et al., 2019; Ning et al., 2018). Under the environment of high concentration and poor diffusion conditions, the benefits of assimilation would be effectively maintained and produce a longer-term influence on the forecast.
Overall, the data assimilation system established in this study can effectively improve the initial field and the forecast field of the concentration of aerosols in the CMAQ model. However, our 3DVAR system still has some shortcomings. First, the control variables for assimilation are only PM2.5 and PM2.5-10. The increment fields of these two variables need to be assigned to model variables according to the original proportions after assimilation; this process may bring uncertainty to the 3DVAR system. Second, the BEC of different model aerosol variables (such as sulfate, nitrate, and ammonium salts) may be different, but the current system is unable to reflect this difference in detail. Third, we only assimilated the data from surface observation stations in this study. For most regions in Western China, surface observation stations are still sparse. In addition, some studies also certified the importance of assimilation of the initial aerosol vertical profile (Bao et al., 2019; Ha et al., 2020; Jiang et al., 2013; Zang et al., 2016). Thus, the development of satellite AOD, Lidar, and other multisource DA methods is necessary in the future. Last but not least, we only adjusted the initial field in this study. Although DA can improve the initial field significantly, the influence of ICs fades with time and the concentration field is primarily driven by emissions in the model (Sandu et al., 2011). Recently, there have been more and more studies for the optimization of emissions. The results have indicated that using the concentration constraint emissions can effectively improve the medium and long-term predictions (Chen et al., 2019; Huang et al., 2014; Peng et al., 2017). In the future, we will use variational data assimilation or EnKF to develop emission source assimilation techniques and further improve our data assimilation system.
AcknowledgmentsThis study was jointly funded by the National Key R & D Pilot Research Projects of China (2017FYC0209803, 2016YFC0203305) and National Natural Science Foundation of China (Grant No. 41775123, No. 41975167, No. 91644223). The authors acknowledge the China National Environmental Monitoring Center and the China Meteorological Data Sharing Network for providing surface observation data. The authors also thank the MEIC team for providing the anthropogenic prior emissions and NCEP FNL reanalysis data. The use of the WRF and CMAQ models, made available and supported by the NCAR and EPA, is gratefully acknowledged.
Data Availability StatementThe surface observation data are available at
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
© 2021. This work is published under http://creativecommons.org/licenses/by-nc-nd/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
A three‐dimensional variational (3DVAR) data assimilation method for the aerosol variables of the community multiscale air quality (CMAQ) model was developed. This 3DVAR system uses PM2.5 and PM2.5‐10 (the difference between PM10 and PM2.5) as control variables and used the AERO6 aerosol chemical mechanism in the CMAQ model. Two parallel experiments (one with and one without data assimilation [DA]) were performed to evaluate the assimilating effects of surface PM2.5 and PM10 during a heavy haze episode from January 13 to 16, 2018 in the Sichuan Basin (SCB) region. The results show that simulations without DA clearly underestimated PM2.5 and PM10 concentrations, and the analysis field with aerosol DA is skillful at fitting the observations and effectively improving subsequent forecasts of PM2.5 and PM10. For the analysis fields of PM2.5 and PM10 after DA comparing with those without DA, the correlation coefficient (CORR) of PM2.5 and PM10 increased by 0.59 and 0.65, the bias (BIAS) increased by 82.29 and 125.41 μg/m3, and the root mean square error (RMSE) declined by 73.69 and 116.30 μg/m3, respectively. Improvement of subsequent 24‐h forecasts of PM2.5 and PM10 with DA is also significant. Statistical results of forecasting improvement with DA indicated that the CORR, BIAS, and RMSE for PM2.5 and PM10 at 78% and 89% of stations in the SCB region are improved, respectively. From the perspective of assimilation duration time, the improvement of PM2.5 and PM10 can be maintained for ∼24 h.
You have requested "on-the-fly" machine translation of selected content from our databases. This functionality is provided solely for your convenience and is in no way intended to replace human translation. Show full disclaimer
Neither ProQuest nor its licensors make any representations or warranties with respect to the translations. The translations are automatically generated "AS IS" and "AS AVAILABLE" and are not retained in our systems. PROQUEST AND ITS LICENSORS SPECIFICALLY DISCLAIM ANY AND ALL EXPRESS OR IMPLIED WARRANTIES, INCLUDING WITHOUT LIMITATION, ANY WARRANTIES FOR AVAILABILITY, ACCURACY, TIMELINESS, COMPLETENESS, NON-INFRINGMENT, MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE. Your use of the translations is subject to all use restrictions contained in your Electronic Products License Agreement and by using the translation functionality you agree to forgo any and all claims against ProQuest or its licensors for your use of the translation functionality and any output derived there from. Hide full disclaimer
Details


1 Collaborative Innovation Center on Forecast and Evaluation of Meteorological Disasters (CIC‐FEMD)/Key Laboratory for Aerosol‐Cloud Precipitation of China Meteorological Administration, School of Atmospheric Physics, Nanjing University of Information Science & Technology, Nanjing, China
2 Institute of Meteorology and Oceanography, National University of Defense Technology, Nanjing, China
3 State Key Lab of Severe Weather, Key Laboratory for Atmospheric Chemistry, Chinese Academy of Meteorological Sciences, Beijing, China
4 Institute of NBC Defense, Beijing, China
5 No. 32145 Unit of PLA, Xinxiang, China
6 No. 93263 Unit of PLA, Jinzhou, China
7 Henan Meteorological Station, Zhengzhou, China