1 Introduction

The direct and indirect climate effects of atmospheric aerosols greatly depend on the particles' spatial distribution in the atmosphere and their climate-relevant properties, including their hygroscopicity, optical properties, and their ability to act as cloud condensation nuclei (CCN) and ice nuclei . These properties, in turn, are closely related to the aerosol mixing state . Aerosol mixing state refers to the way in which different aerosol chemical species are distributed among and within the aerosol particles . As shown in many observational field studies, atmospheric aerosols have complex mixing states , ranging between the two extremes of an “internal mixture”, where the composition of all particles within the population is identical (and equal to the bulk composition of the aerosol), and an “external mixture”, where each particle in a population consists of only a single species (which may be different for each particle).

This poses a unique challenge for the modeling of aerosols in Earth system models, which, for the sake of computational efficiency, represent aerosols by simplifying the true aerosol mixing state using various mixing-state-related assumptions. For example, bulk aerosol models predict the abundance of individual aerosol chemical species by tracking the species' mass concentrations, inherently treating the aerosol as external mixtures of, e.g., sulfate, black carbon, organic carbon, sea salt, and dust . Univariate sectional models are able to represent size-resolved composition but cannot resolve the diversity of the aerosol within a certain size range. For modal models, the ability to resolve mixing state depends on the definition and the placement of the modes. Different approaches for modal models have been developed, ranging from a small number of internally mixed, non-overlapping modes (e.g., three modes in MAM3, ; or CMAQv5.2, ) to a larger number of modes that may overlap in a given size range and separate out different aerosol mixtures (e.g., nine modes in MADE3, ; or 16 modes in MATRIX, ). For these multi-modal models, the aerosol processes of gas–aerosol partitioning and coagulation make it necessary to define rules for how the modes interact . Condensation of secondary aerosol on a mode reserved for a pure species (e.g., black carbon or dust) requires moving mass over to a mixed mode when a critical mass fraction of secondary aerosol is exceeded. Transfer terms due to coagulation of particles in different modes can be calculated analytically , and rules need to be defined regarding the destination mode after coagulation. Generally, the transfer of aerosol mass from smaller modes to larger modes during growth can lead to inaccuracies. The removal of particles due to scavenging by cloud activation is another issue that is difficult to reconcile. 
Hence, the choice of the number of modes, their compositions, and the criteria for transfer between modes are user-defined, which introduces structural uncertainty in aerosol simulations that still needs to be quantified.

Given that modal models are to some extent mixing-state-aware, the following question arises: how well do modal models represent mixing state? Due to the scarcity of relevant observational data, we are not yet at the point where we can comprehensively validate model output of aerosol mixing state as is done for other aerosol-related quantities, such as bulk mass concentrations or aerosol optical depth. However, higher-detail models can serve as benchmarks to perform a verification of simulated aerosol mixing state. This paper aims to verify the global distribution of aerosol mixing state represented by a modal model by using benchmark simulations from the particle-resolved stochastic aerosol model PartMC-MOSAIC . Our usage of the term “aerosol representation” in this paper encompasses the representation of processes that go along with the aerosol representation itself, since the two are in practice tightly coupled.

We used the aerosol mixing state index $χ$ as a metric to quantify aerosol mixing state. The mixing state index $χ$ can be interpreted as a label for particle populations that rigorously characterizes where the population lies on the spectrum from external ($χ = 0$ %) to internal ($χ = 100$ %) mixture. This concept has been successfully applied to observational data and in error quantification studies. Particularly relevant for this work is a previous study showing that assuming an internal mixture when the aerosol is actually not completely internally mixed can result in errors of up to 150 % in CCN predictions.

PartMC-MOSAIC tracks the composition of individual particles and therefore resolves aerosol mixing state explicitly. However, this modeling approach is computationally very expensive and therefore not practical for large-scale simulations covering several months or years. To estimate the global spatial distribution of mixing state, we recently developed a machine-learned (ML) model based on high-detail particle-resolved simulations that uses inputs known from global model simulations to predict $χ$. In this paper, we use this ML model to predict the spatial distribution of the mixing state index $χ$ and then compare the results with $χ$ values derived from the Community Earth System Model version 2 (CESM2, version 2.1.0) using the four-mode version of the Modal Aerosol Module (MAM4).

This paper is organized as follows. In Sect. 2 we introduce the setup of the Earth system model simulations. The definition of mixing state indices and the derivation of aerosol mixing state indices for modal models are given in Sect. 3. Section 4 briefly describes the ML models, built from particle-resolved simulations, that estimate the benchmark aerosol mixing state indices. Section 5 focuses on the comparison of mixing state indices from the particle-resolved and modal models, and the final section summarizes our findings.

2 Global model simulations

Here we employed CESM2 to provide the global model simulation data. Specifically, we used the component set FHIST to set up the global simulations with aerosols. This component set represents a typical historical simulation in the Community Atmosphere Model (CAM6) using an active atmosphere and land with prescribed sea-surface temperatures and sea-ice extent, as well as a 1 $^{\circ}$ finite-volume dycore with the forcing data available from 1979 to 2015.

MAM4 is the default aerosol module of this component set. It represents the aerosol size distribution with four lognormal modes (Aitken, accumulation, coarse, and primary carbon modes). MAM4 tracks six aerosol species, and these are distributed over the four modes as follows. The Aitken mode consists of dust, sulfate, secondary organic aerosol (SOA), and sea salt. The accumulation mode includes sulfate, SOA, sea salt, primary organic matter (POM), black carbon (BC), and dust. The coarse mode contains sulfate, dust, and sea salt. The primary carbon mode contains only BC and POM, which are supplied by primary aerosol emissions.

The choice of modes in MAM4 is motivated by the desire to treat the microphysical aging of the primary carbonaceous aerosols in the atmosphere similar to other modal models used in regional or global models . In MAM4, mass and number concentrations of BC and POM in the primary carbon mode are transferred to the accumulation mode by the processes of intermodal coagulation and condensation of SOA and sulfuric acid onto the primary carbon mode. The accumulation mode then represents aged BC and POM, as these species are internally mixed with other aerosol species. The MAM4 treatment of aging is critical for improving the long-range transport of carbonaceous aerosols to remote regions such as the polar region, which suffered from a low bias in a prior version of the model when only three internally mixed modes were used .

We ran the model for the year 2011 with 6 years (2005–2010) of spinup. The simulation was conducted at a resolution of 0.9 $^{\circ}$ latitude by 1.25 $^{\circ}$ longitude with CMIP6 emission inventories. We stored instantaneous outputs every 3 h during the simulation, which yields 2920 timestamps (365 d $\times$ 8 outputs per day) for each surface-layer grid cell for the entire year of simulation time. The surface layer was chosen to be in line with the PartMC-MOSAIC model scenarios that were used as training data for the ML models of mixing state indices (see Sect. 4) and which were designed to represent conditions in the planetary boundary layer.

3 Aerosol mixing state indices: definition and calculation

3.1 Particle-based aerosol mixing state index

The mixing state index $χ$ quantifies where an aerosol population lies on the continuum from external to internal mixing – that is, how spread out the chemical species are over an aerosol population. We will focus here on the mixing state of submicron aerosols (PM $_{1.0}$ ) due to their relevance for light scattering and absorption and their contribution to CCN formation .

To summarize, the mixing state index $χ$ is given by the affine ratio of the average particle species diversity, $D_{α}$ , and bulk population species diversity, $D_{γ}$ , as

$$χ = \frac{D_{α} - 1}{D_{γ} - 1}. \tag{1}$$

The diversities $D_{α}$ and $D_{γ}$ are calculated as follows. First, the per-particle mixing entropies $H_{i}$ are determined for each particle by

$$H_{i} = \sum_{a = 1}^{A} - p_{i}^{a} \ln p_{i}^{a}. \tag{2}$$

Here, $A$ is the number of distinct aerosol species and $p_{i}^{a}$ is the mass fraction of species $a$ in particle $i$. These values are then averaged (mass-weighted) over the entire population to obtain the average particle species diversity $D_{α}$ by

$$H_{α} = \sum_{i = 1}^{N_{p}} p_{i} H_{i}, \tag{3}$$

$$D_{α} = e^{H_{α}}, \tag{4}$$

where $N_{p}$ is the total number of particles in the population and $p_{i}$ is the mass fraction of particle $i$ in the population. Finally, the bulk diversity $D_{γ}$ is calculated as

$$H_{γ} = \sum_{a = 1}^{A} - p^{a} \ln p^{a}, \tag{5}$$

$$D_{γ} = e^{H_{γ}}, \tag{6}$$

where $p^{a}$ is the bulk mass fraction of species $a$ in the population.
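The particle-based definition can be sketched in a few lines of NumPy. This is a minimal illustration and not the code used in the study: the function name `mixing_state_index` and the input layout (one row per particle, one column per species) are assumptions made here, and the convention $0 \ln 0 = 0$ is applied for species that are absent from a particle.

```python
import numpy as np

def mixing_state_index(masses):
    """Return (D_alpha, D_gamma, chi) for masses of shape (N_p, A).

    masses[i, a] is the mass of species a in particle i (any consistent unit).
    Hypothetical helper for illustration; every particle must have mass > 0.
    """
    masses = np.asarray(masses, dtype=float)
    total = masses.sum()
    particle_mass = masses.sum(axis=1)          # mass of each particle
    p_i = particle_mass / total                 # mass fraction of particle i
    p_ia = masses / particle_mass[:, None]      # species fractions per particle
    p_a = masses.sum(axis=0) / total            # bulk species fractions

    def entropy(p):
        # Shannon entropy along the last axis, with 0 * ln(0) taken as 0.
        safe = np.where(p > 0, p, 1.0)
        return -(p * np.log(safe)).sum(axis=-1)

    h_alpha = (p_i * entropy(p_ia)).sum()       # mass-weighted mean entropy
    d_alpha = np.exp(h_alpha)                   # average particle diversity
    d_gamma = np.exp(entropy(p_a))              # bulk diversity
    chi = (d_alpha - 1.0) / (d_gamma - 1.0)     # undefined if D_gamma == 1
    return d_alpha, d_gamma, chi
```

With two species, a population in which every particle is pure gives $χ = 0$, and a population in which every particle has the bulk composition gives $χ = 1$, matching the two limiting cases described in the Introduction.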

Note that the definition of “species” for calculating $χ$ is based on application needs. It can be based on operationally defined chemical species, elemental composition, or species groups such as volatile and nonvolatile species or hygroscopic and non-hygroscopic species. Other possibilities include the propensity for aerosols to undergo heterogeneous reactions, quantified by the heterogeneous reaction rate coefficient for a specific reaction. In this paper we consider three different definitions of $χ$, which we explain in more detail in Sect. 3.3.

3.2 Mode-based aerosol mixing state index

The framework laid out in Sect. 3.1 can be easily generalized to a modal modeling framework (see Fig. 1). The bulk mixing entropy, $H_{γ}$, and the bulk diversity, $D_{γ}$, can be calculated using the bulk mass fractions, $p^{a}$, of species $a$ from the MAM4 simulation and Eqs. (5) and (6). To calculate the average particle mixing entropy, $H_{α}$, and the average particle species diversity, $D_{α}$, we use

$$H_{m} = \sum_{a = 1}^{A} - p_{m}^{a} \ln p_{m}^{a}, \tag{7}$$

$$H_{α} = \sum_{m = 1}^{M} p_{m} H_{m}, \tag{8}$$

$$D_{α} = e^{H_{α}}, \tag{9}$$

where $p_{m}^{a}$ is the mass fraction of species $a$ in mode $m$, $p_{m}$ is the mass fraction of mode $m$ in the population, $M$ is the number of modes, and $H_{m}$ represents the per-mode mixing entropies. Finally, the mixing state index, $χ$, can be calculated using Eq. (1). Note that Eqs. (7) and (8) are analogous to Eqs. (2) and (3). A detailed derivation of these equations is provided in the Appendix.
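The mode-based calculation has the same structure as the particle-based one, with modes taking the place of particles. The following sketch assumes a hypothetical input array with one row per submicron mode and one column per (surrogate) species. The function name and the example numbers are illustrative only and are not taken from the CESM2 output.

```python
import numpy as np

def modal_mixing_state_index(mode_masses):
    """Mixing state index chi from per-mode species masses, shape (M, A).

    mode_masses[m, a] is the mass concentration of species a in mode m.
    Hypothetical helper for illustration; empty modes are ignored.
    """
    m = np.asarray(mode_masses, dtype=float)
    m = m[m.sum(axis=1) > 0]                       # drop modes without mass
    total = m.sum()
    p_m = m.sum(axis=1) / total                    # mass fraction of each mode
    p_ma = m / m.sum(axis=1, keepdims=True)        # species fractions per mode
    p_a = m.sum(axis=0) / total                    # bulk species fractions

    # Per-mode and bulk entropies, with 0 * ln(0) taken as 0.
    h_m = -np.sum(p_ma * np.log(np.where(p_ma > 0, p_ma, 1.0)), axis=1)
    h_gamma = -np.sum(p_a * np.log(np.where(p_a > 0, p_a, 1.0)))

    d_alpha = np.exp(np.sum(p_m * h_m))            # average "particle" diversity
    d_gamma = np.exp(h_gamma)                      # bulk diversity
    return (d_alpha - 1.0) / (d_gamma - 1.0)

# Illustrative (made-up) masses: rows are Aitken, accumulation, and primary
# carbon modes; columns are two surrogate species.
example = [[0.0, 0.1],
           [0.5, 2.0],
           [0.4, 0.0]]
chi_example = modal_mixing_state_index(example)
```

Because each mode is treated as internally mixed, the only way for this index to fall below 100 % is for the modes to differ in composition, which is why the result depends strongly on how mass is partitioned between the accumulation and primary carbon modes.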

Figure 1

Illustration of the mode-based calculation of the aerosol mixing state index. The coarse mode is removed because only modes dominated by submicron particles are used for calculations. Note that the Aitken mode mass fraction is very low compared to the other modes and the caption does not obscure any data.

[Figure omitted. See PDF]

In this study, we consider the mixing states of submicron aerosols including the Aitken, accumulation, and primary carbon modes, and we do not include the coarse mode because the coarse particles are above 1 $µ$ m. Since the mixing entropies are mass-weighted (rather than number-weighted), the mixing state index is more representative of the modes with the larger particles, i.e., the accumulation and primary carbon modes.

3.3 Grouped surrogate species

Here we compare and contrast the aerosol mixing state indices defined in three different ways, namely based on the mixing of optically absorbing and non-absorbing species ($χ_{o}$), based on the mixing of primary carbonaceous and non-primary carbonaceous species ($χ_{c}$), and based on the mixing of hygroscopic and non-hygroscopic species ($χ_{h}$). Table 1 shows the definitions of these aerosol mixing state indices.

Table 1

Aerosol mixing state index definitions. Six aerosol species (bc: black carbon, dst: dust, ncl: sea salt, pom: primary organic matter, soa: secondary organic aerosol, so4: sulfate) are used in calculating the aerosol mixing state indices based on different species groupings. The mixing state indices $χ_{o}$ , $χ_{c}$ , and $χ_{h}$ are based on two grouped surrogate species.

Aerosol mixing state index (symbol)	Surrogate species 1	Surrogate species 2
Optical property ( $χ_{o}$ )	(bc)	(pom, dst, ncl, soa, so4)
Primary carbon ( $χ_{c}$ )	(bc, pom)	(dst, ncl, soa, so4)
Hygroscopicity ( $χ_{h}$ )	(bc, pom, dst)	(ncl, soa, so4)

For $χ_{o}$ , we considered two surrogate species: black carbon (strongly absorbing, assigned a mass absorption coefficient in CESM2 at 533 nm and 0 % RH of 8.144 m $^{2}$ g $^{- 1}$ ) and the five other aerosol species grouped together (less absorbing or non-absorbing, with mass absorption coefficients in CESM2 at 533 nm and 0 % RH of 0.1442, $9.975 \times 10^{- 2}$ , $4.703 \times 10^{- 2}$ , $2 \times 10^{- 6}$ , and $5 \times 10^{- 7}$ m $^{2}$ g $^{- 1}$ for POM, SOA, dust, sea salt, and sulfate, respectively). Thus, a lower value in $χ_{o}$ refers to the case where the strongly absorbing species black carbon and the sum of the other species (termed “non-absorbing” here for convenience) are more externally mixed.

The index $χ_{c}$ is motivated by the primary carbon treatment of MAM4, where the primary particulate organic matter and black carbon are assigned to a separate primary carbon mode . A lower value in $χ_{c}$ refers to the situation where the primary carbonaceous species and all other species exist separately in different particles.

Similarly, $χ_{h}$ was also calculated from two surrogate species. We combined black carbon, primary organic matter, and dust as one surrogate species, given their comparatively lower hygroscopicities (kappa values of $\sim 0$ , $\sim 0$ , and 0.068, respectively). Accordingly, NaCl (1.16), SOA (0.14), and sulfate (0.507) were grouped as the other surrogate species. Here, a lower value in $χ_{h}$ represents the case where hygroscopic and non-hygroscopic species tend to be present in separate particles.
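The three groupings in Table 1 amount to summing the six species masses into two surrogate species before computing the diversities. A minimal sketch is given below. The species ordering in `SPECIES`, the dictionary `GROUPS`, and the function name are assumptions introduced here for illustration.

```python
import numpy as np

# Assumed column order of the six MAM4 species in the input array.
SPECIES = ["bc", "dst", "ncl", "pom", "soa", "so4"]

# Surrogate-species groupings following Table 1.
GROUPS = {
    "chi_o": (("bc",), ("pom", "dst", "ncl", "soa", "so4")),
    "chi_c": (("bc", "pom"), ("dst", "ncl", "soa", "so4")),
    "chi_h": (("bc", "pom", "dst"), ("ncl", "soa", "so4")),
}

def group_species(masses, index_name):
    """Collapse the last axis (six species) into two surrogate species."""
    masses = np.asarray(masses, dtype=float)
    columns = []
    for members in GROUPS[index_name]:
        idx = [SPECIES.index(name) for name in members]
        columns.append(masses[..., idx].sum(axis=-1))
    return np.stack(columns, axis=-1)

# One entry with made-up masses for bc, dst, ncl, pom, soa, so4.
sample = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
grouped_h = group_species(sample, "chi_h")
```

The grouped array can then be passed to any routine that evaluates Eqs. (1) to (9) with $A = 2$.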

4 Machine-learned models of mixing state indices

Aerosol mixing state indices can be calculated directly using particle-resolved modeling, but this comes with large computational costs. Alternatively, ML models have been developed that integrate machine learning and particle-resolved aerosol simulations to estimate aerosol mixing state indices. To generate the training and testing data sets for developing such ML models, an ensemble of particle-resolved model scenarios was created using the particle-resolved model PartMC-MOSAIC. In brief, PartMC-MOSAIC simulates individual aerosol particles within a representative volume of air, including stochastic coagulation, particle-phase thermodynamics, gas- and particle-phase chemistry, and dynamic gas–particle mass transfer. Thus, the composition of the individual particles within a population evolves dynamically, and assumptions about mixing state are not necessary.

The strategy to generate the data was to vary the input parameters (45 in total) for the PartMC-MOSAIC model, including primary emissions of different aerosol types (e.g., carbonaceous aerosol and dust emissions, including contribution from Aitken mode, accumulation mode, and coarse mode size ranges), primary emissions of gas phase species (e.g., SO $_{2}$ , NO $_{2}$ , and various volatile organic compounds), and meteorological parameters (see Table 1 in , for more information). For instance, to vary the gas emissions, scaling factors were sampled from 0 % to 200 % for different gas species, based on the emission rates in . A Latin hypercube sampling approach was employed to sample the parameter space efficiently for the training and testing data sets. We note that new particle formation and growth was not simulated explicitly, but Aitken mode sulfate particles were introduced into the simulation by emission for a subset of scenarios as a proxy for having particles present that originate from new particle formation. While PartMC-MOSAIC includes the process of new particle formation , the reason for this simplification was that considerable uncertainty exists regarding the subsequent growth of the freshly nucleated particles , which poses a challenge for a highly detailed aerosol model such as PartMC-MOSAIC. Errors in representing this particle type adequately may result in underestimating the abundance of BC-free particles in some regions and thereby overestimating the degree of internal mixture. This would imply that the error in the MAM4 simulations is even larger than currently indicated. Other processes that are not explicitly included in generating training data are aerosol removal by nucleation-scavenging and other cloud processes. 
However, for the purpose of this study, the emphasis is on the aerosol state, i.e., having a sufficiently comprehensive set of aerosol populations that can serve as training data, not necessarily that all the processes are included.
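To illustrate the sampling strategy, the sketch below draws a Latin hypercube sample with NumPy and maps it to emission scaling factors between 0 % and 200 %. It is not the sampling code used to build the scenario library: the function name, the number of samples, the use of four dimensions, and the random seed are all assumptions chosen for a compact example.

```python
import numpy as np

def latin_hypercube(n_samples, n_dims, rng):
    """Return an (n_samples, n_dims) Latin hypercube sample on [0, 1)."""
    # Place one point in each stratum [k/n, (k+1)/n) of every dimension.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_dims))) / n_samples
    # Shuffle the strata independently in each dimension.
    for j in range(n_dims):
        u[:, j] = rng.permutation(u[:, j])
    return u

rng = np.random.default_rng(0)
unit_sample = latin_hypercube(n_samples=10, n_dims=4, rng=rng)

# Map the unit sample to scaling factors from 0 (0 %) to 2 (200 %).
scaling_factors = 2.0 * unit_sample
```

Each column covers its range evenly (exactly one sample per stratum), which is what makes the approach efficient for a high-dimensional parameter space such as the 45 inputs described above.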

The ML models were derived with the machine learning algorithm eXtreme Gradient Boosting (XGBoost) from 45 000 particle populations. Each ML model was a tree-based ensemble model that could handle complex nonlinear interactions and collinearity among features. The hyperparameters were determined by grid search with 10-fold cross-validation. The ML models can be expressed as

$$χ_{S} (x, y, t) = f_{S} (A (x, y, t), G (x, y, t), E (x, y, t)), \tag{10}$$

where $χ_{S} (x, y, t)$ is the mixing state index ($χ_{o}$, $χ_{c}$, or $χ_{h}$) at location $(x, y)$ in the model layer nearest the surface at time $t$, and $f_{S}$ denotes the function for calculating the corresponding mixing state index $χ_{S}$. The set names $A$ (aerosol), $G$ (gas), and $E$ (environmental) represent the predictors (features) used for predicting the mixing state index. The choice of features is determined by the overlap of variables that are present in both PartMC-MOSAIC and CESM2. Aerosol species include black carbon, mineral dust, sea salt, primary organic aerosol, secondary organic aerosol, and sulfate. Of note is that we used the bulk (not the per-mode) concentrations of submicron aerosol species as the features. The gas species include dimethyl sulfide, hydrogen peroxide, sulfuric acid, ozone, semi-volatile organic gas, and sulfur dioxide. The environmental variables are air temperature, relative humidity, and solar zenith angle. Table 2 shows the performance of the ML models when predicting the mixing state indices. The mixing state calculation in this study was based purely on the above six aerosol species (excluding other aerosol species) for a fair comparison with the mode-based aerosol mixing state index, which resulted in slightly different performance of the ML models compared to our earlier work. The average error of the ML models on the hold-out testing samples, measured by the mean absolute error, is about 5 % for $χ_{o}$ and 8 % for $χ_{c}$ and $χ_{h}$.

Table 2

Predictive performance of the ML models using the testing data set. Metrics include the mean absolute error (MAE), root-mean-square error (RMSE), median absolute deviation (MAD), index of agreement ($d$), Pearson correlation coefficient (PCC), and coefficient of determination ($r^{2}$).

$χ$	MAE	RMSE	MAD	$d$	PCC	$r^{2}$
XGBoost ML models
$χ_{o}$	0.048	0.072	0.030	0.974	0.953	0.906
$χ_{c}$	0.079	0.107	0.056	0.955	0.916	0.836
$χ_{h}$	0.082	0.112	0.057	0.955	0.916	0.835
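For readers who want to reproduce this kind of evaluation, the metrics in Table 2 can be computed as in the sketch below. Several choices here are assumptions, since the text does not spell out the formulas: MAD is taken as the median of the absolute errors, $d$ is taken as Willmott's index of agreement, and $r^{2}$ is taken as $1 - SS_{res} / SS_{tot}$. The function name is hypothetical.

```python
import numpy as np

def evaluation_metrics(reference, predicted):
    """Compare predicted mixing state indices against reference values."""
    ref = np.asarray(reference, dtype=float)
    pred = np.asarray(predicted, dtype=float)
    err = pred - ref
    ref_mean = ref.mean()
    ss_res = np.sum(err ** 2)
    # Denominator of Willmott's index of agreement (assumed definition of d).
    potential = np.sum((np.abs(pred - ref_mean) + np.abs(ref - ref_mean)) ** 2)
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAD": np.median(np.abs(err)),            # assumption: median |error|
        "d": 1.0 - ss_res / potential,
        "PCC": np.corrcoef(ref, pred)[0, 1],
        "r2": 1.0 - ss_res / np.sum((ref - ref_mean) ** 2),
    }

# Made-up values: a prediction with a constant offset of 0.1.
reference = np.array([0.2, 0.4, 0.6, 0.8])
metrics = evaluation_metrics(reference, reference + 0.1)
```

In this toy case the correlation is perfect while $r^{2}$ and $d$ are below 1, which shows why the table reports several complementary metrics.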

We would like to emphasize that this ML modeling framework cannot compensate for any biases that the global model (here CESM2) might have in simulating the quantities that serve as the features. Instead, what we can expect from this approach is that it provides the most likely mixing state associated with the species concentrations that CESM2 simulates.

5 Results

5.1 Quantitative comparison of mode-based and particle-based mixing state indices

Let $χ_{S, t}^{ML}$ and $χ_{S, t}^{MAM 4}$ denote the mixing state indices computed by the ML model and by the MAM4 model for each grid cell at timestamp $t$ , respectively. The corresponding time-averaged values for a certain time interval and for each grid cell are $\overline{χ_{S}^{ML}}$ and $\overline{χ_{S}^{MAM 4}}$ . Here we consider the full year as the time-averaging interval. An analysis of the seasonal variation of mixing state indices can be found in .

To compare the annual mean values, we calculated the mean difference ($\overline{Δ χ_{S}}$) and the mean absolute difference ($\overline{| Δ χ_{S} |}$) for each grid cell of the layer closest to the surface:

$$\overline{Δ χ_{S}} = \frac{1}{T} \sum_{t = 1}^{T} (χ_{S, t}^{MAM 4} - χ_{S, t}^{ML}) = \overline{χ_{S}^{MAM 4}} - \overline{χ_{S}^{ML}}, \tag{11}$$

$$\overline{| Δ χ_{S} |} = \frac{1}{T} \sum_{t = 1}^{T} | χ_{S, t}^{MAM 4} - χ_{S, t}^{ML} |, \tag{12}$$

where the subscript $S$ refers to the mixing state index (o, c, or h), and the total number of timestamps is $T = 2920$. Since it only makes sense to quantify mixing state when at least two species are present in a given location, areas where the mass fraction of any one surrogate species was higher than 99 % for $χ_{o}$ (due to the low mass fraction of black carbon) and 97.5 % for $χ_{c}$ and $χ_{h}$ were ignored for the calculation and appear as hatched areas in Fig. 3. We will first discuss the overall probability density functions of these quantities (Fig. 2) and then their spatial distributions (Fig. 3).
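A sketch of the two difference measures, including the exclusion of grid cells dominated by a single surrogate species, is given below. The array layout (time, latitude, longitude), the use of a time-averaged mass fraction for the mask, and all names are assumptions made for illustration.

```python
import numpy as np

def annual_differences(chi_mam4, chi_ml, frac_first, max_fraction):
    """Mean and mean absolute difference per grid cell.

    chi_mam4, chi_ml: arrays of shape (T, nlat, nlon).
    frac_first: assumed time-mean mass fraction of the first surrogate
        species, shape (nlat, nlon); the second one is 1 - frac_first.
    max_fraction: cells where either surrogate species exceeds this
        fraction are masked with NaN (e.g., 0.99 or 0.975).
    """
    diff = np.asarray(chi_mam4, dtype=float) - np.asarray(chi_ml, dtype=float)
    mean_diff = diff.mean(axis=0)
    mean_abs_diff = np.abs(diff).mean(axis=0)

    frac_first = np.asarray(frac_first, dtype=float)
    dominated = np.maximum(frac_first, 1.0 - frac_first) > max_fraction
    mean_diff = np.where(dominated, np.nan, mean_diff)
    mean_abs_diff = np.where(dominated, np.nan, mean_abs_diff)
    return mean_diff, mean_abs_diff

# Made-up data: two timestamps on a 1 x 2 grid.
mam4 = np.array([[[0.8, 0.5]], [[0.4, 0.5]]])
ml = np.array([[[0.6, 0.5]], [[0.6, 0.5]]])
mean_d, mean_abs_d = annual_differences(mam4, ml, [[0.5, 0.995]], 0.99)
```

In the first cell the signed differences cancel while the mean absolute difference does not, which is the reason both measures are reported.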

Figure 2

Probability density functions of annual averaged mixing state indices using the MAM4 model and ML model. The thin black lines refer to their mean values.

[Figure omitted. See PDF]

Figure 3

Global distribution of annually averaged mixing state indices ( $χ_{o}$ , $χ_{c}$ , and $χ_{h}$ ) using the ML model, MAM4 model, their mean difference ( $\overline{Δ χ}$ ), and mean absolute difference ( $\overline{| Δ χ |}$ ). Areas are hatched where the mass fraction of any one surrogate species was higher than 99 % for $χ_{o}$ (due to the low mass fraction of black carbon) and 97.5 % for $χ_{c}$ and $χ_{h}$ .

[Figure omitted. See PDF]

Figure 2 shows the probability density functions of the annually averaged mixing state indices computed by the ML model ($\overline{χ_{S}^{ML}}$), by MAM4 ($\overline{χ_{S}^{MAM 4}}$), their average difference ($\overline{Δ χ_{S}}$), and their average absolute difference ($\overline{| Δ χ_{S} |}$) for each surface-layer grid cell. The results show large discrepancies in mixing state indices between the ML model and the MAM4 model, without a clear relationship between them (see Fig. 2d–f).

The annual average of the mixing state index $χ_{o}$ estimated by the ML model, $\overline{χ_{o}^{ML}}$, ranged between 55 % and 96 %, with a mean of 73 %. Calculated by the MAM4 model, $\overline{χ_{o}^{MAM 4}}$ varied spatially from 46 % to 99.76 %, with a higher mean of 86 %. The similar mean values of $\overline{Δ χ_{o}}$ (14 %) and $\overline{| Δ χ_{o} |}$ (18 %) were caused by higher values in $χ_{o}^{MAM 4}$ compared to $χ_{o}^{ML}$, which is confirmed below with Fig. . The averaged mixing state index $\overline{χ_{c}^{ML}}$ ranged between 31 % and 84 % with a mean of 54 %, while $\overline{χ_{c}^{MAM 4}}$ had a wider range (from 9 % to 99.81 %) with a mean of 58 %. Similarly, $\overline{χ_{h}^{ML}}$ ranged from 21 % to 81 % with a mean of 58 %, while $\overline{χ_{h}^{MAM 4}}$ varied between 10 % and 99.85 % with a mean of 63 %. The large discrepancy between the mean difference (4.8 % for $\overline{Δ χ_{c}}$ and 4.7 % for $\overline{Δ χ_{h}}$) and the mean absolute difference (30 % for $\overline{| Δ χ_{c} |}$ and 38 % for $\overline{| Δ χ_{h} |}$) indicates that the errors in $χ_{c}$ and $χ_{h}$ were large but of both signs, so that they largely cancel in the mean difference. The maximal values of $\overline{| Δ χ_{c} |}$ and $\overline{| Δ χ_{h} |}$ were up to 59 and 76 percentage points, respectively.

The implications of these discrepancies are more easily discussed with Fig. 3, which illustrates the global spatial distribution of annually averaged mixing state indices predicted by the ML model (first column), MAM4 (second column), their mean difference (third column), and their mean absolute difference (fourth column). The differences in mixing state indices between the ML model and MAM4 varied strongly across the globe.

Values of $\overline{χ_{o}^{ML}}$ were higher over continental regions (77 %) than over oceans (69 %). Specifically, the ML model predicted high values for $χ_{o}$ in central Africa (20 $^{\circ}$ S–15 $^{\circ}$ N, 12–30 $^{\circ}$ E), the Arctic (66.5–90 $^{\circ}$ N), and southern Asia (5–38 $^{\circ}$ N, 60–90 $^{\circ}$ E). These are also the regions with relatively larger mass fractions of black carbon ( $\sim 5$ %; see Fig. ). The mixing state index $χ_{o}^{MAM 4}$ showed a higher degree of internal mixing over the globe (with a median of 90 %) compared to the ML model. The only exceptions were oceans in the Northern Hemisphere at the mid-latitudes (45–60 $^{\circ}$ N, dominated by sea salt, sulfate, and secondary organic aerosol in the accumulation mode) and Antarctica (66.5–90 $^{\circ}$ S, dominated by sea salt and sulfate in the accumulation mode as well as sulfate in the Aitken mode), where $\overline{χ_{o}^{MAM 4}}$ was 75 %. Qualitatively, the MAM4 model captured the trend that areas with high black carbon concentration (defined here as concentrations above the 95th percentile) tended to have higher $χ_{o}$ values.

The ML model estimate $\overline{χ_{c}^{ML}}$ suggested a rather homogeneous spatial distribution of the annually averaged mixing state, with values of approximately 50 %. Compared to $\overline{χ_{c}^{ML}}$ , $\overline{χ_{c}^{MAM 4}}$ values were lower (primary carbonaceous aerosol more externally mixed) at high latitudes and higher at low and mid-latitudes (primary carbonaceous aerosol more internally mixed). Note that, while $\overline{χ_{c}^{MAM 4}}$ values were similar in the Arctic and Antarctic, the abundance of primary carbonaceous species was predicted to be higher in the Arctic compared to the Antarctic (see Fig. ).

The spatial distribution of $\overline{χ_{h}^{MAM 4}}$ was similar to that of $\overline{χ_{c}^{MAM 4}}$. That is, the MAM4 model predicted that the hygroscopic and non-hygroscopic species were more externally mixed at high latitudes and more internally mixed at low latitudes. In contrast, the spatial distribution of $\overline{χ_{h}^{ML}}$ shows qualitative differences compared to $\overline{χ_{c}^{ML}}$ in two aspects. First, $\overline{χ_{h}^{ML}}$ was higher than $\overline{χ_{c}^{ML}}$ at high latitudes, meaning that hygroscopic and non-hygroscopic species appeared more internally mixed than primary carbonaceous and non-carbonaceous species in this region. Second, areas over the North Atlantic Ocean (0–20 $^{\circ}$ N, 20–45 $^{\circ}$ W), southern Africa (5–32 $^{\circ}$ S, 5–20 $^{\circ}$ E), and Australia (10–30 $^{\circ}$ S, 100–140 $^{\circ}$ E) appeared rather externally mixed. These are areas where mineral dust is the dominant aerosol species (see Fig. ).

These two facts lead to the overall finding that $χ_{h}$ exhibits the largest differences between the two methods. This applies especially to regions where mineral dust was the dominant aerosol species, which points to an important structural issue of the four-mode setup used in MAM4. While the ML model predicted a more external mixture in these regions (dust externally mixed from sea salt and other species), the MAM4 model could not represent this because the accumulation mode included all six aerosol species in an internal mixture. Figure 4 illustrates the relationship between the mean absolute difference of $χ_{h}$ and the mass fraction of dust for all model grid points. It confirms that grid points with large dust mass fractions were associated with larger mean absolute differences in $χ_{h}$. These results confirm a previously discussed tradeoff: MAM3 (and likewise MAM4) intentionally combines dust and sea salt in the same mode to reduce the computational burden; however, this simplification does not always realistically reflect the aerosol mixing state in the ambient atmosphere.

It is interesting to note that the areas where sea salt is present, but not dust, are not associated with large errors, even though sea salt, just like mineral dust, is a primary aerosol type. The reason for this lies in our surrogate species definitions (Table 1) for computing the mixing state index. Based on our mixing state definitions, sea salt, secondary organic aerosol, and sulfate are always grouped together. Therefore, none of the mixing state indices as defined here tell us how externally mixed sea salt is when it is considered as a single aerosol type.

Figure 4

Dependence of mean absolute difference of $χ_{h}$ on dust mass fractions for all model grid points.

[Figure omitted. See PDF]

Figure 5 shows the zonal mean annual aerosol mixing state indices, highlighting that the differences in $χ_{c}$ and $χ_{h}$ tended to be zonally structured: relative to the ML model, the MAM4 model overestimated these indices at low latitudes and underestimated them at high latitudes. In contrast, the MAM4 model overestimated $χ_{o}$ at all latitudes north of 60 $^{\circ}$ S.
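A zonal mean of this kind can be obtained by averaging the annually averaged field over longitude. The sketch below assumes a (latitude, longitude) array in which masked cells are stored as NaN, and it assumes that the spread is the standard deviation across longitude. Both points are assumptions, and the function name is hypothetical.

```python
import numpy as np

def zonal_mean_and_std(field):
    """Return per-latitude mean and standard deviation of a (nlat, nlon) field.

    NaN entries (masked grid cells) are ignored. A latitude band that is
    entirely NaN yields NaN (NumPy emits a RuntimeWarning in that case).
    """
    field = np.asarray(field, dtype=float)
    return np.nanmean(field, axis=1), np.nanstd(field, axis=1)

# Made-up annual-mean field on a 2 x 2 grid with one masked cell.
chi_field = np.array([[0.2, 0.4],
                      [0.5, np.nan]])
zonal_mean, zonal_std = zonal_mean_and_std(chi_field)
```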

Figure 5

Zonal mean annual aerosol mixing state indices (a) $χ_{o}$ , (b) $χ_{c}$ , and (c) $χ_{h}$ using the MAM4 model and ML model. The bands refer to the standard deviation.

[Figure omitted. See PDF]

5.2 Interpretation of findings

From Sect. 5.1, the following picture emerges: MAM4 overestimates the mixing state index $χ_{o}$ except in regions at high latitudes in the Southern Hemisphere. At the same time, $χ_{c}$ and $χ_{h}$ are overestimated at low latitudes to mid-latitudes and underestimated at high latitudes. These findings point towards too rapid a transfer from the primary carbon mode to the accumulation mode at low latitudes to mid-latitudes and too slow a transfer at high latitudes.

To conceptually illustrate these relationships, here we use $χ_{o}$ and $χ_{c}$ as examples and contrast the conditions for high and low latitudes. Figure a–f show conditions representative of high latitudes. A grid cell sampled from the CESM2/MAM4 simulation (73 $^{\circ}$ N, 151 $^{\circ}$ W) contains 15 % BC and 37 % POM, distributed over the accumulation and primary carbon modes as shown in Fig. a and d. The corresponding value for $χ_{o}$ is 80 %. Figure b depicts a particle population that was sampled from the MAM4 population in Fig. a. All particles, except for the smallest ones (corresponding to Aitken mode particles), contain BC, which results in the relatively high mixing state index value for $χ_{o}$. Note that in MAM4 BC is not included in the Aitken mode by definition. Evaluating the mixing state metric $χ_{c}$, which quantifies the degree of mixing of primary carbon and other species, for the same particle population gives a different picture. The entire primary carbon mode, by definition, consists of POM and BC, which results in an appreciable number of particles that contain only primary carbon (BC $+$ POM), giving a mixing state index $χ_{c}$ of only 27 %.

We now compare the MAM4-sampled particle populations above to particle populations that were sampled from our PartMC scenario library. We searched for populations with similar mass fractions of BC and POM as in the MAM4 populations and that were simulated at a similar latitude as the grid point location of the CESM2/MAM4 model output. Figure c shows that the PartMC results have comparatively more BC-free particles, and Fig. f shows that comparatively more particles are mixtures of primary carbon and other species. Overall, this means that in MAM4 BC appears too internally mixed (because irrespective of whether BC is placed in the primary carbon or accumulation mode, it is by design mixed with other species) and that at high latitudes the primary carbon mode is not transferring mass to the accumulation mode as quickly as is the case in PartMC simulations.

The reason why MAM4 behaves in this way can be explained by the aging process treatment in MAM4. Aging in MAM4 is formulated using a threshold criterion. That is, BC and POA mass is transferred from the primary carbon mode to the accumulation mode when a certain threshold of sulfate and SOA has condensed. In MAM4 this threshold is set to a relatively large value. This is done to prevent BC from being removed too quickly by wet deposition – because the primary carbon mode has a lower hygroscopicity than the accumulation mode and thus a lower wet scavenging efficiency – thereby counteracting a low bias in BC concentrations in the Arctic regions. From we already know that using such a high threshold may not be appropriate. However, the global model also has biases in other processes that contribute to the low BC bias in the Arctic, and setting the threshold to a high value compensates for these errors. Our results are a reflection of this fact. While adjusting the threshold criterion in MAM4 to a lower value may improve the agreement with the ML simulations in some regions, it may deteriorate the overall results in other areas. This is a good example of how structural uncertainty manifests itself, namely by the fact that adjusting a parameter does not fundamentally fix the issue.
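The effect of a threshold criterion can be illustrated with a toy calculation. The sketch below is not the MAM4 aging scheme: it assumes an all-or-nothing transfer, expresses the threshold as a hypothetical ratio of condensed coating mass to primary carbon mass, and uses a constant condensation rate. All names and numbers are illustrative assumptions.

```python
def is_aged(primary_mass, coating_mass, threshold_ratio):
    """Toy criterion: aged once coating/primary mass reaches the threshold."""
    return primary_mass > 0 and coating_mass / primary_mass >= threshold_ratio


def steps_until_transfer(primary_mass, coating_per_step, threshold_ratio,
                         max_steps=1000):
    """Count time steps until the primary carbon mass would be transferred.

    Returns None if the threshold is not reached within `max_steps`.
    """
    for step in range(1, max_steps + 1):
        coating_mass = step * coating_per_step  # constant condensation rate
        if is_aged(primary_mass, coating_mass, threshold_ratio):
            return step
    return None


# Same particles and condensation rate, two hypothetical thresholds.
low = steps_until_transfer(1.0, 0.25, threshold_ratio=0.5)
high = steps_until_transfer(1.0, 0.25, threshold_ratio=2.0)
print(low, high)
```

With a higher threshold the mass stays in the primary carbon mode for more steps, which is the qualitative behavior the paragraph above attributes to a large aging threshold. Where condensation is slow, the threshold may never be reached within the simulated period and the function returns `None`.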

Figure g–l show conditions representative of low latitudes. A grid cell sampled from the CESM2/MAM4 simulation (20 $^{\circ}$ N, 120 $^{\circ}$ E) contains 11 % BC and 24 % POM, distributed over the accumulation and primary carbon mode as shown in Fig. g and j, with most of the mass in the accumulation mode. The corresponding value for $χ_{o}$ is therefore 99 %, an almost complete internal mixture. For the same reason, $χ_{c}$ is also very high. Similarly to the high-latitude case, Fig. i and l show that the comparable PartMC population has comparatively more BC-free particles and more particles that contain very low amounts of primary carbonaceous material, leading to lower values of both $χ_{o}$ and $χ_{c}$ compared to the MAM4 results.

Figure 6

Illustration to explain the differences in mixing state representation between MAM4 and the ML model at high and low latitudes.

[Figure omitted. See PDF]

5.3 Comparison to observational data

The question that arises from Sect. and is of course the following: which spatial distribution of aerosol mixing state reflects reality more closely? The validation of simulated mixing state indices with observational data is still challenging since per-particle mass fractions of species are required for calculating the mixing state indices (see Sect. ). These are in principle obtainable from in situ deployments of single-particle mass spectrometers or by using electron microscopy techniques, but their quantitative derivation comes with challenges and is not routinely done, so that only very few data sets exist that allow for a meaningful comparison . Keeping these limitations in mind, reported a qualitative comparison of available measurements of mixing state metrics in locations in developed countries (Paris, France; Pittsburgh, USA; various locations in Japan) with seasonally averaged results from the ML model based on particle-resolved simulations. This showed that the ML model was able to capture the range of values that is consistent with the observations.

We further compared the ML model estimates using recent observations from China. Specifically, we compared $χ_{o}^{ML}$ and $χ_{o}^{MAM 4}$ with $χ$ values from Taizhou and Beijing derived from Single Particle Soot Photometer (SP2) measurements. For both locations, $χ_{o}^{MAM 4}$ overestimated the observed $χ$ values, while $χ_{o}^{ML}$ was in the range of the observations. Specifically, the $χ$ measured at the suburban site of Taizhou from 26 May to 18 June 2017 ranged from 62 % to 82 %. During the same time period (but in the year 2011), the values of $χ_{o}^{ML}$ were between 63 % and 84 %, while $χ_{o}^{MAM 4}$ was between 84 % and 96 %. The $χ$ values at the urban site of Beijing ranged between 55 % and 70 % in winter (from 10 November to 10 December 2016) and varied between 60 % and 75 % in summer (from 18 May to 25 June 2017). Using our simulations of the year 2011, $χ_{o}^{ML}$ varied from 60 % to 88 % in winter and from 59 % to 83 % in summer. As a comparison, $χ_{o}^{MAM 4}$ ranged from 92 % to 97 % in winter and from 87 % to 95 % in summer. A caveat when comparing $χ_{o}^{ML}$ and $χ_{o}^{MAM 4}$ , respectively, with the observations reported in and is that the definition of $χ_{o}^{ML}$ and $χ_{o}^{MAM 4}$ included BC-free particles, while the $χ$ values in the measurements by and were calculated only considering the subpopulation of BC-containing particles. This might introduce a bias in the mixing state index between the $χ_{o}$ index used in this paper and the observations (depending on the fraction of the BC-free particles present at any given location).

We can also relate our $χ_{o}$ index qualitatively to the SP2 measurements in the Finnish Arctic during winter 2011–2012 . Although this study did not provide quantitative mixing state index calculations, it is an important finding that BC-containing particles (with various amounts of coatings) co-existed with BC-free particles. As we saw in Sect. , this condition can easily be represented with a particle-resolved approach. However, the modal model with modes configured as in MAM4 puts black carbon in all accumulation-sized particles (Fig. ), which is not consistent with the observations.

6 Conclusions

In this paper we present a framework for evaluating the error in submicron aerosol mixing state induced by aerosol representation assumptions, which is one of the important contributors to structural uncertainty in aerosol models. We quantitatively compared mixing state indices for submicron aerosol predicted by the modal model MAM4 within the global model CESM to a machine-learned model based on high-detail particle-resolved simulations. We focused on the mixing of optically absorbing and non-absorbing species ( $χ_{o}$ ), the mixing of primary carbonaceous with other aerosol species ( $χ_{c}$ ), and the mixing of hygroscopic and non-hygroscopic species ( $χ_{h}$ ).

For $χ_{o}$ , the MAM4 modal representation generally overestimated the degree of mixing of BC with other aerosol species. This overestimation is due to the fact that MAM4's choice of modes does not allow for representing BC-free particles in the accumulation and primary carbon modes. This is in contrast to field observations by , which showed that BC and POM may be externally mixed near sources. The implication of this is that, if optical properties are calculated based on the aerosol composition, absorption will be overestimated.

For $χ_{c}$ and $χ_{h}$ , the error tended to be zonally structured, where the MAM4 model overestimated the mixing state indices at low latitudes and underestimated them at high latitudes compared to the ML model. This behavior could be explained by modeling choices in MAM4, in particular that (1) BC is always emitted with POM, (2) no BC-free particles exist in the submicron modes, and (3) dust is always internally mixed with other aerosol species.

Mixing state is an important emergent property that affects the aerosol radiative forcing and aerosol–cloud interactions, but it is not easy to constrain this property globally. To the best of our knowledge, this is the first study that evaluated the spatial distribution of aerosol mixing state as predicted by a global model. Since errors in mixing state predictions propagate into errors in aerosol climate impacts, our findings provide a framework and reference for Earth system model developers and users regarding simulation reliability. For example, this framework can be used to (1) quantify model bias in simulating mixing state in different regions, identifying model structural deficiencies, and (2) provide insights into potential improvements of model process representations for a more realistic simulation of aerosols.

Appendix A Derivation of mode-based aerosol mixing state index

Table details the notation for aerosol mass and mass fractions to calculate $H_{α}$ using modal information.

To explain how to obtain Eqs. () and () from Eqs. () and (), let us assume that each mode $m$ contains $N_{m}$ particles and the number of species in the population is $A$ . The mixing entropy of particle $i$ in mode $m$ , $H_{m, i}$ , is given by

$H_{m, i} = \sum_{a = 1}^{A} - p_{m, i}^{a} \ln p_{m, i}^{a} .$ (A1)

The average particle mixing entropy of the entire population (summed over all modes), $H_{α}$ , is

$\begin{aligned} H_{α} & = \sum_{m = 1}^{M} \sum_{i = 1}^{N_{m}} p_{m, i} H_{m, i} \\ & = \underbrace{p_{1, 1} H_{1, 1} + p_{1, 2} H_{1, 2} + \dots + p_{1, N_{1}} H_{1, N_{1}}}_{m = 1} + \dots + \underbrace{p_{M, 1} H_{M, 1} + p_{M, 2} H_{M, 2} + \dots + p_{M, N_{M}} H_{M, N_{M}}}_{m = M} . \end{aligned}$ (A2)

Given that each mode is assumed to be internally mixed, particles within the same mode have the same composition, and we have

$p_{m, i}^{a} = \frac{μ_{m, i}^{a}}{μ_{m, i}} = \frac{μ_{m}^{a}}{μ_{m}} = p_{m}^{a} .$ (A3)

This results in

$H_{m, i} = \sum_{a = 1}^{A} - p_{m, i}^{a} \ln p_{m, i}^{a} = \sum_{a = 1}^{A} - p_{m}^{a} \ln p_{m}^{a} = H_{m} .$ (A4)

Therefore, based on Eq. () and the fact that $p_{m} = \sum_{i = 1}^{N_{m}} p_{m, i}$ , Eq. () can be rewritten as

$H_{α} = \underbrace{p_{1} H_{1}}_{m = 1} + \dots + \underbrace{p_{M} H_{M}}_{m = M} = \sum_{m = 1}^{M} p_{m} H_{m} .$ (A5)

With the mode-based $H_{α}$ , the other mixing state quantities can be computed as described in Sect. .
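The derivation can be checked numerically with a short sketch. It is an illustration under stated assumptions rather than the code used in the paper: the function names are hypothetical, each "unit" is a list of species masses for either one particle or one internally mixed mode, and the index is computed with the commonly used definition $χ = (D_{α} - 1) / (D_{γ} - 1)$ with $D_{α} = \exp(H_{α})$ and $D_{γ} = \exp(H_{γ})$ , where $H_{γ}$ is the entropy of the bulk mass fractions $p^{a}$ . That definition is assumed here because the section that introduces it is not reproduced in this appendix.

```python
import math


def entropy(fractions):
    """Shannon entropy -sum(p ln p); terms with p = 0 contribute nothing."""
    return -sum(p * math.log(p) for p in fractions if p > 0)


def average_entropy(units):
    """Mass-weighted mean entropy over units (particles or modes).

    Each unit is a list of species masses. With particles as units this is the
    per-particle sum; with internally mixed modes as units it is the modal sum.
    """
    total = sum(sum(unit) for unit in units)
    h_alpha = 0.0
    for unit in units:
        unit_mass = sum(unit)
        if unit_mass > 0:
            h_alpha += (unit_mass / total) * entropy([m / unit_mass for m in unit])
    return h_alpha


def chi_from_units(units):
    """Mixing state index from D_alpha and D_gamma (assumed definition)."""
    total = sum(sum(unit) for unit in units)
    n_species = len(units[0])
    bulk = [sum(unit[a] for unit in units) / total for a in range(n_species)]
    d_alpha = math.exp(average_entropy(units))
    d_gamma = math.exp(entropy(bulk))
    if d_gamma == 1.0:
        return None  # only one species present; the index is undefined
    return (d_alpha - 1.0) / (d_gamma - 1.0)


# Two modes, and the same modes split into particles of identical composition.
modes = [[3.0, 1.0], [0.0, 4.0]]
particles = [[1.5, 0.5], [1.5, 0.5], [0.0, 1.0], [0.0, 3.0]]
print(average_entropy(modes), average_entropy(particles))
print(chi_from_units(modes))
```

The two printed entropies agree to rounding error, which is the statement of the modal formula: when all particles in a mode share one composition, summing over modes gives the same $H_{α}$ as summing over particles. For the indices used in this paper the species would first be grouped into the two surrogate classes of interest (for example BC and non-BC for $χ_{o}$ ).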

Table A1

Aerosol mass and mass fraction definition and notation. The number of modes is $M$ ( $M = 3$ for MAM4 without the coarse mode), the number of particles in mode $m$ is $N_{m}$ , and the number of species is $A$ .

Quantity	Meaning
$μ_{m, i}^{a}$	mass of species $a$ in particle $i$ from mode $m$
$μ_{m, i} = \sum_{a = 1}^{A} μ_{m, i}^{a}$	total mass of particle $i$ from mode $m$
$μ_{m}^{a} = \sum_{i = 1}^{N_{m}} μ_{m, i}^{a}$	total mass of species $a$ from mode $m$
$μ_{m} = \sum_{i = 1}^{N_{m}} \sum_{a = 1}^{A} μ_{m, i}^{a}$	total mass of mode $m$
$μ^{a} = \sum_{m = 1}^{M} \sum_{i = 1}^{N_{m}} μ_{m, i}^{a}$	total mass of species $a$ in population
$μ = \sum_{m = 1}^{M} \sum_{i = 1}^{N_{m}} \sum_{a = 1}^{A} μ_{m, i}^{a}$	total mass of the population
$p_{m, i}^{a} = \frac{μ_{m, i}^{a}}{μ_{m, i}}$	mass fraction of species $a$ in particle $i$ (within mode $m$ )
$p_{m}^{a} = \frac{μ_{m}^{a}}{μ_{m}}$	mass fraction of species $a$ in mode $m$
$p_{m, i} = \frac{μ_{m, i}}{μ}$	mass fraction of particle $i$ from mode $m$ in population
$p_{m} = \frac{μ_{m}}{μ}$	mass fraction of mode $m$ in population
$p^{a} = \frac{μ^{a}}{μ}$	mass fraction of species $a$ in population

Figure A1

Aerosol species mixing ratio ( $µ$ g kg $^{- 1}$ ). Accumulation mode: a1; Aitken mode: a2; primary carbon mode: a4. The coarse mode (a3) is not used in this study and therefore omitted in this figure. Black carbon: bc; dust: dst; sea salt: ncl; primary organic matter: pom; secondary organic aerosol: soa; sulfate: so4.

[Figure omitted. See PDF]

Figure A2

Fraction of aerosol species mixing ratio (%). Accumulation mode: a1; Aitken mode: a2; primary carbon mode: a4. The coarse mode (a3) is not used in this study and therefore omitted in this figure. Black carbon: bc; dust: dst; sea salt: ncl; primary organic matter: pom; secondary organic aerosol: soa; sulfate: so4.

[Figure omitted. See PDF]

Code and data availability

Notebooks and data to reproduce the global mixing state index analysis are available at https://github.com/zzheng93/code_ms_ml_mam4 (last access: 16 November 2021) or https://doi.org/10.5281/zenodo.4731385 .

Author contributions

ZZ, MW, and NR conceptualized the analysis and wrote the manuscript with input from the co-authors. ZZ developed the code, carried out the simulations, and performed the analysis. LZ, PLM, and XL provided scientific suggestions for the manuscript. All authors were involved in helpful discussions and contributed to the manuscript.

Competing interests

Some authors are members of the editorial board of Atmospheric Chemistry and Physics. The peer-review process was guided by an independent editor, and the authors have no other competing interests to declare.

Disclaimer

Publisher’s note: Copernicus Publications remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

We would like to acknowledge high-performance computing support from Cheyenne (10.5065/D6RX99HX) provided by NCAR's Computational and Information Systems Laboratory, sponsored by the National Science Foundation. The CESM project is supported primarily by the National Science Foundation. This research is part of the Blue Waters sustained-petascale computing project, which is supported by the National Science Foundation (awards OCI-0725070 and ACI-1238993), the State of Illinois, and as of December 2019 the National Geospatial-Intelligence Agency. Blue Waters is a joint effort of the University of Illinois at Urbana-Champaign and its National Center for Supercomputing Applications. Po-Lun Ma and Xiaohong Liu were supported by the Enabling Aerosol-cloud interactions at GLobal convection-permitting scalES (EAGLES) project (74358), funded by the U.S. Department of Energy, Office of Science, Office of Biological and Environmental Research, Earth System Model Development program. The Pacific Northwest National Laboratory is operated for the U.S. Department of Energy by Battelle Memorial Institute under contract DE-AC05-76RL01830.

Financial support

This research has been supported by the Office of Biological and Environmental Research (grant no. DE-SC0019192), the NSF Division of Atmospheric and Geospace Sciences (grant no. AGS-1254428), the Office of Biological and Environmental Research (Enabling Aerosol-cloud interactions at GLobal convection-permitting scalES (EAGLES) project (grant no. 74358)), the Office of Advanced Cyberinfrastructure (grant no. OCI-0725070), and the Division of Advanced Cyberinfrastructure (grant no. ACI-1238993).

Review statement

This paper was edited by Qiang Zhang and reviewed by three anonymous referees.

© 2021. This work is published under https://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Aerosol mixing state is an important emergent property that affects aerosol radiative forcing and aerosol–cloud interactions, but it has not been easy to constrain this property globally. This study aims to verify the global distribution of aerosol mixing state represented by modal models. To quantify the aerosol mixing state, we used the aerosol mixing state indices for submicron aerosol based on the mixing of optically absorbing and non-absorbing species ( $χ_{o}$ ), the mixing of primary carbonaceous and non-primary carbonaceous species ( $χ_{c}$ ), and the mixing of hygroscopic and non-hygroscopic species ( $χ_{h}$ ). To achieve a spatiotemporal comparison, we calculated the mixing state indices using output from the Community Earth System Model with the four-mode version of the Modal Aerosol Module (MAM4) and compared the results with the mixing state indices from a benchmark machine-learned model trained on high-detail particle-resolved simulations from the particle-resolved stochastic aerosol model PartMC-MOSAIC. The two methods yielded very different spatial patterns of the mixing state indices. In some regions, the yearly averaged $χ$ value computed by the MAM4 model differed by up to 70 percentage points from the benchmark values. These errors tended to be zonally structured, with the MAM4 model predicting a more internally mixed aerosol at low latitudes and a more externally mixed aerosol at high latitudes compared to the benchmark. Our study quantifies potential model bias in simulating mixing state in different regions and provides insights into potential improvements to model process representation for a more realistic simulation of aerosols towards better quantification of radiative forcing and aerosol–cloud interactions.

Details

Title

Quantifying the structural uncertainty of the aerosol mixing state representation in a modal model

Author

Zheng, Zhonghua¹; West, Matthew²; Zhao, Lei³; Ma, Po-Lun⁴; Liu, Xiaohong⁵; Riemer, Nicole⁶

¹ Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
² Department of Mechanical Science and Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA
³ Department of Civil and Environmental Engineering, University of Illinois at Urbana-Champaign, Urbana, IL, USA; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, Urbana, IL, USA
⁴ Atmospheric Sciences and Global Change Division, Pacific Northwest National Laboratory, Richland, WA, USA
⁵ Department of Atmospheric Sciences, Texas A&M University, College Station, TX, USA
⁶ Department of Atmospheric Sciences, University of Illinois at Urbana-Champaign, Urbana, IL, USA

Pages

17727-17741

Publication year

2021

Publication date

2021

Publisher

Copernicus GmbH

ISSN

16807316

e-ISSN

16807324

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.5194/acp-21-17727-2021

ProQuest document ID

2605599395
