1. Introduction
Atmospheric aerosols have both direct and indirect effects on Earth’s climate because they scatter and absorb solar radiation and affect cloud microphysical properties [1]. These diverse and significant effects play a critical role in radiative forcing, which remains highly uncertain [2]. Aerosol type has been identified as an important parameter for estimating aerosol radiative forcing because aerosol properties, such as radiation absorptivity and particle size, differ among aerosol types [3,4]. In addition, the aerosol type is an input parameter of satellite aerosol retrieval algorithms and affects their accuracy [5,6]. Therefore, accurate aerosol classification is essential for both climate studies and satellite aerosol remote sensing.
Several satellite aerosol-type classification algorithms have been developed based on the threshold approach. Higurashi and Nakajima [3] developed a four-channel algorithm (4CA) to detect aerosol types using four-channel data from the Sea-viewing Wide Field-of-view Sensor (SeaWiFS). Four aerosol types (soil dust, carbonaceous, sulfate, and sea salt) were detected over the ocean in northeastern Asia [3]. Jeong and Li [7] proposed an aerosol classification method using Advanced Very High Resolution Radiometer (AVHRR) and Total Ozone Mapping Spectrometer (TOMS) data. Aerosol optical thickness (AOT) and Ångström exponent (AE) from AVHRR and the aerosol index from TOMS were used to classify aerosols into seven types (biomass burning, dust, sea salt, pollution/sulfate mixtures, biomass/dust mixtures, sulfate/sea salt mixtures, and undefined mixtures) on a global scale. Lee et al. [8] classified aerosols into four major types (dust, sea salt, smoke, and sulfate) and two mixtures of major types using AOT and AE from Moderate Resolution Imaging Spectroradiometer (MODIS) data and aerosol index from Ozone Monitoring Instrument (OMI) data over northeastern Asia. They also investigated the spatial and frequency distribution of aerosol types over northeastern Asia and compared them with those obtained from a global aerosol climate model [8]. Kim et al. [9] suggested a MODIS–OMI algorithm (MOA) with aerosol index from OMI and Fine Mode Fraction (FMF) from MODIS to classify aerosol types. The classification results of the MOA were compared with 4CA over northeastern Asia, and their agreement ranged from 32% to 81% [9]. In addition to channel data and aerosol optical properties, Torres et al. [6] introduced the use of carbon monoxide (CO), a tracer of carbonaceous aerosols, to distinguish between carbonaceous and dust aerosols instead of AE.
Column amounts of CO from the Atmospheric Infrared Sounder (AIRS) and aerosol index were used to classify aerosol types in an operational OMI near-UV aerosol algorithm [6]. Penning de Vries et al. [10] developed a new global aerosol classification algorithm (GACA) using monthly mean aerosol properties (aerosol optical depth (AOD) and AE) and column densities of trace gases (NO2, HCHO, SO2, and CO). The GACA classified aerosol types on a monthly basis, and the results were compared against model-derived aerosol compositions from the global monitoring atmospheric composition and climate model. Mao et al. [11] utilized AOD and aerosol relative optical depth from MODIS to classify aerosol types over eight study regions including major aerosol source regions and downwind of the source regions. The agreement between the satellite-based results and ground-based results ranges from 36% to 91% over the study regions in Mao et al. [11]. In fact, Mao et al. [11] is the only one of the aforementioned satellite aerosol classification studies that attempted such a satellite and ground-based validation. Most of the classification results have been compared only with results from an aerosol climate model and earlier aerosol classification methods. Although the accuracy assessments of aerosol classification methods have rarely been carried out, uncertainties in satellite aerosol optical properties and trace gas products have been reported. The uncertainty of MODIS-derived AOD was found to be ±(0.05 + 0.15 × AOD) according to Chu et al. [12], and that of MODIS AE was up to 30% [13]. The uncertainty in AIRS CO products is 15% as reported in Thrastarson et al. [14]. These uncertainties associated with satellite input variables can lead to misclassifications of aerosol types in threshold-based classification methods.
Aerosol type classification methods have also been developed using Aerosol Robotic NETwork (AERONET) data, which are used to evaluate satellite aerosol products. Previous studies have established an AERONET-based aerosol classification method using several aerosol optical properties, including single scattering albedo (SSA), FMF, AOD, and AE obtained from an AERONET version 2 inversion product [15,16,17]. Recently, the AERONET version 3 inversion product was released, which provides particle depolarization ratios at 440, 675, 870, and 1020 nm and thus is sensitive to aerosol particle shape [18,19]. The AERONET-derived depolarization ratio values at several wavelengths were found to be highly correlated with those obtained from ground-based light detection and ranging (lidar) measurements [20]. Zo and Shin [19] reported the potential of depolarization ratio values for identifying the contribution of aerosol types in aerosol particle mixtures. Shin et al. [21] suggested a new aerosol type classification method that uses SSA to distinguish aerosol absorbance and the dust ratio (Rd), derived from particle linear depolarization ratios (PLDR) in AERONET version 3 inversion products, to account for contributions of non-spherical particles such as dust aerosols. It was reported that Rd is a more suitable parameter than FMF for identifying aerosol types mixed with dust particles, as non-spherical dust aerosols might be present in the fine mode [21]. The aerosol classification method suggested by Shin et al. [21] identified aerosol types mixed with spherical and non-spherical particles in East Asia as well as at other global sites, indicating its potential as an evaluation tool for satellite aerosol type classification methods. However, PLDR is not available from satellite measurements except for those of the Cloud-Aerosol Lidar with Orthogonal Polarization (CALIOP), which requires 16 days for global coverage.
Therefore, it is necessary to investigate new satellite-based parameters that can identify the contribution of non-spherical particles without PLDR.
Recently, machine learning techniques have been applied to satellite remote sensing of aerosol information that is difficult to fully estimate via traditional regression models and physical retrieval approaches. In particular, various machine learning models have been introduced to estimate surface concentrations of particulate matter (PM) using satellite measurements and meteorological data [22,23,24,25,26,27]. In addition to estimates of ground-level PM concentrations, machine-learning techniques have been used to estimate AOD and aerosol height (the altitude of peak aerosol concentration in a vertical profile). Han and Sohn [28] retrieved AOT and dust height using a statistical artificial neural network (NN) approach. The artificial NN-based model was trained by relating brightness temperature, surface elevation, and relative air mass from AIRS measurements to MODIS-derived AOT and dust height from CALIOP data. Chimot et al. [29] developed an aerosol layer height retrieval algorithm based on a multilayer perceptron NN model using an absorption band of an O2–O2 collision pair measured by OMI. Therefore, machine-learning-based techniques have been increasingly and successfully applied to solving challenging problems in aerosol remote sensing. Inspired by this, we attempted to apply the method to satellite-based aerosol type classification using data classified according to an AERONET-based method [21]. Such a trained aerosol classification model may be able to identify contributions of non-spherical particles using only satellite-based variables without the input of AERONET observations.
Accordingly, we propose a new machine-learning method for classifying aerosol types based on satellite observations. Various satellite input variables associated with aerosol properties and their production were introduced. Several aerosol optical properties were obtained from MODIS measurements, which have been used in various aerosol classification methods in previous studies. Aerosol index and trace gas information were obtained from measurements of the TROPOspheric Monitoring Instrument (TROPOMI), a recent environmental sensor similar to OMI but with improved spatial resolution. Among the various satellite input parameters, we aimed to adopt an optimal input variable set such that missing data were minimized and classification accuracy was maximized. Therefore, we adopted a random forest (RF) model, which provides a measure of importance for each input variable. The RF-based model was trained with input variables consisting of satellite data and a target variable of the AERONET-based aerosol type dataset. The performance of the RF-based model was statistically evaluated using an AERONET-based aerosol type dataset, which was excluded from the model training dataset. For the first time, the results were also evaluated using the wavelength dependence of SSA and FMF values from AERONET measurements. In addition, the performance of earlier threshold-based aerosol classification methods was investigated via comparison with the AERONET-based aerosol type dataset.
2. Variables and Data Collection
We classified aerosol types using satellite measurements based on an RF approach (Figure 1). The RF model was trained with a set of observational data consisting of a target variable (i.e., aerosol types) and input variables (i.e., satellite measurements). The target variable dataset was constructed from the AERONET-based classification method suggested by Shin et al. [21]. For the satellite input variables, we selected input variable candidates prior to the determination of the optimal input variable set; this minimized missing data and maximized classification accuracy. Various satellite input variable candidates were selected, including new variables related to aerosol properties and their production as well as previously used satellite input variables.
2.1. Target Variable Dataset
The target variable (aerosol-type) dataset was constructed using an AERONET-based aerosol classification method [21]. Shin et al. [21] suggested a method utilizing PLDR and SSA data at 1020 nm from the AERONET version 3 inversion product. In the work of Shin et al. [21], the Rd was introduced to identify spherical and non-spherical particles (i.e., dust aerosols). The Rd can be defined using the PLDR [30,31]. Shin et al. [21] utilized the PLDR at 1020 nm to calculate the Rd, as AERONET-derived PLDR data at 1020 nm were reported to have the highest correlation with lidar PLDR [20]. The Rd is expressed as:
$$R_d = \frac{(\delta - \delta_{nd})(1 + \delta_d)}{(\delta_d - \delta_{nd})(1 + \delta)} \quad (1)$$
where $\delta_{nd}$ and $\delta_d$ indicate the PLDR ($\delta$) of non-dust and pure dust particles, respectively. The values used for $\delta_{nd}$ and $\delta_d$ were 0.02 and 0.30, respectively, as suggested by Shin et al. [21]. The Rd is set to 0 when PLDR is lower than $\delta_{nd}$, which indicates spherical (e.g., anthropogenic or smoke) particles [21]. In contrast, the Rd is set to 1 when PLDR is higher than 0.30, which shows a higher contribution of dust aerosols.
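As an illustration only, the dust ratio defined above can be sketched in a few lines of Python. The function name `dust_ratio` and its argument names are hypothetical and are not part of the method of Shin et al. [21]; the default values 0.02 and 0.30 are the non-dust and pure-dust PLDR values quoted in the text.

```python
def dust_ratio(pldr, delta_nd=0.02, delta_d=0.30):
    """Dust ratio Rd from the particle linear depolarization ratio (PLDR).

    delta_nd and delta_d are the PLDR values of non-dust and pure dust
    particles (0.02 and 0.30 in the text). The function name is hypothetical.
    """
    if pldr < delta_nd:
        return 0.0  # spherical particles: Rd is set to 0
    if pldr > delta_d:
        return 1.0  # PLDR above the pure-dust value: Rd is set to 1
    numerator = (pldr - delta_nd) * (1.0 + delta_d)
    denominator = (delta_d - delta_nd) * (1.0 + pldr)
    return numerator / denominator


# Example: an intermediate PLDR of 0.16 gives Rd of roughly 0.56
print(round(dust_ratio(0.16), 2))
```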
In the AERONET-based aerosol classification algorithm [21], Rd is utilized to identify the contribution of dust aerosols. Pure dust (PD), dust dominated mixture (DDM), and pollution dominated mixture (PDM) are classified according to Rd thresholds:
(1) PD: 0.89 < Rd
(2) DDM: 0.53 ≤ Rd ≤ 0.89
(3) PDM: 0.17 ≤ Rd < 0.53
When Rd is less than 0.17, an aerosol is classified as a pollution particle. The SSA at 1020 nm is used to distinguish the absorption characteristics of pollution particles:
(1) Non-absorbing (NA): 0.95 < SSA
(2) Weakly absorbing (WA): 0.90 < SSA ≤ 0.95
(3) Moderately absorbing (MA): 0.85 ≤ SSA ≤ 0.90
(4) Strongly absorbing (SA): SSA < 0.85
The NA type represents non-absorbing fine-mode aerosols (e.g., sulfate and nitrate), whereas the SA type indicates strongly absorbing fine-mode aerosols, such as carbonaceous aerosols.
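The Rd and SSA thresholds listed above amount to a simple decision rule, which the following minimal sketch spells out. The function name `classify_aerosol` is hypothetical; the threshold values and the inclusive or exclusive boundaries are taken directly from the two lists in the text.

```python
def classify_aerosol(rd, ssa_1020):
    """Assign one of the seven AERONET-based aerosol types.

    rd: dust ratio; ssa_1020: single scattering albedo at 1020 nm.
    Thresholds follow the lists in the text; the function name is hypothetical.
    """
    # Dust-containing types are separated by the dust ratio alone
    if rd > 0.89:
        return "PD"   # pure dust
    if rd >= 0.53:
        return "DDM"  # dust dominated mixture
    if rd >= 0.17:
        return "PDM"  # pollution dominated mixture
    # Rd < 0.17: pollution particles, separated by absorption (SSA at 1020 nm)
    if ssa_1020 > 0.95:
        return "NA"   # non-absorbing
    if ssa_1020 > 0.90:
        return "WA"   # weakly absorbing
    if ssa_1020 >= 0.85:
        return "MA"   # moderately absorbing
    return "SA"       # strongly absorbing


print(classify_aerosol(0.60, 0.93))  # DDM
print(classify_aerosol(0.05, 0.80))  # SA
```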
The target variable dataset was constructed using AERONET measurements from January 2018 to July 2020, as TROPOMI data are available from 2018. Both AERONET Level 1.5 (cloud-screened) and Level 2.0 (quality-assured) data were collected at the overpass time (13:30 local time (LT)) of TROPOMI aboard the Sentinel-5P satellite and MODIS aboard the Aqua satellite. To minimize uncertainties in AERONET Level 1.5 data, we collected data only when AOD at 440 nm was above 0.4, which is the SSA data processing criterion of AERONET Level 2.0. For AERONET Level 1.5 and 2.0 data, totals of 10,481 and 2232 data points were collected over 300 and 161 sites, respectively. A map of AERONET stations that provided PLDR and SSA data during the study period for Level 1.5 and 2.0 data is shown in Figure 2.
Although AERONET Level 2.0 assures the quality of the data, the dataset includes only 2232 cases, which is too few to train a machine-learning model with seven classes (aerosol types). For example, there were only 108 PD cases in the dataset. In addition, the AERONET Level 1.5 sites were more globally distributed than the Level 2.0 sites (Figure 2). Therefore, we mainly used the AERONET Level 1.5 dataset (AOD > 0.4), which provides more training cases than the AERONET Level 2.0 dataset, and used the Level 2.0 dataset only partially (in Section 4 only).
2.2. Satellite Input Variable Candidates
The input variable candidates were first selected based on previous aerosol classification methods. Of FMF and AE, which are both measures of aerosol size, AE was selected, as in previous studies [7,8], whereas FMF was excluded because it had more missing values than AE. AOD was used to identify the presence of aerosols in previous satellite classification algorithms [7,8,9]; therefore, we also selected AOD as an input variable candidate. The aerosol index was selected as a candidate to account for the presence of absorbing aerosols, as it has been used in several satellite aerosol classification methods [6,7,8,9]. However, SSA values were excluded due to the many missing values in the MODIS aerosol product. In addition to the parameters directly related to aerosol optical properties, Torres et al. [6] introduced column CO amount to identify the presence of carbonaceous particles, which we also adopted. Penning de Vries et al. [10] also utilized trace gas abundances (NO2, SO2, HCHO, and CO) for monthly aerosol-type classification, since AOD is reported to be significantly correlated with trace gas concentrations [32]. These trace gas column densities were used to infer the dominant source of the aerosols in Penning de Vries et al. [10]. In the present study, we also introduced trace gas information to identify aerosol types on a daily basis. CO was selected as an input variable candidate to identify the presence of carbonaceous aerosols, since the TROPOMI CO column amount product is reported to have a stable accuracy of 6.5% [33]. Tropospheric NO2 column density, for which the TROPOMI bias is calculated to be approximately 22% [34], was selected as an input variable candidate to account for the presence of scattering-dominant aerosols. Nitrate aerosols, which are scattering-dominant aerosols, are produced by the oxidation of nitrogen oxides (NOx) [35,36,37,38].
NOx is produced by all combustion sources, including vehicular emissions, coal combustion, biomass burning, and industrial sources [38,39,40,41,42]. TROPOMI SO2 column density in the planetary boundary layer is reported to have an uncertainty of about 50% or more due to surface albedo or the shape of the SO2 vertical profile [43]. Biases of TROPOMI HCHO column density were reported to be positive (26%) for clean areas and negative (−31%) for large emissions [33]. Therefore, both SO2 and HCHO were excluded from our study, as these retrieval errors may decrease aerosol identification accuracy on a daily scale.
Several satellite products thought to be related to the aerosol type were additionally considered as input variable candidates. The solar zenith angle (SZA) was selected to indirectly represent the photochemical reactions of aerosol formation, which depend on the amount of radiation [44]. Top-of-atmosphere (TOA) reflectance was selected because it is known to depend on aerosol type, especially on aerosol absorbance [45]. Finally, land cover type and percent of urban area were selected to account for the effects of different land cover types on aerosol formation and to serve as a proxy for aerosol source information [46,47].
Table 1 summarizes the selected satellite input variable candidates. Aerosol index and SZA (Sentinel-5P TROPOMI Aerosol Index 1-Orbit L2), tropospheric NO2 column density (Sentinel-5P TROPOMI Tropospheric NO2 1-Orbit L2), and CO total column amount (Sentinel-5P TROPOMI Carbon Monoxide CO Column 1-Orbit L2) were obtained from TROPOMI level 2 products. AOD, AE, and TOA reflectance data were obtained from a MODIS aerosol product (MYD04_L2). Land cover type was obtained from a MODIS/Terra+Aqua Combined Land Cover Type product (MCD12C1); from this product, both land cover type and urban area percent were utilized. The dataset of input variable candidates was collected for the target variable dataset (N = 10,481 for Level 1.5 and 2232 for Level 2.0). The number of data points collected for each input variable candidate could be lower than the number of target variable data points, as some satellite variables have missing values. The satellite variable datasets were collocated by selecting the satellite pixel nearest to the AERONET site location.
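The nearest-pixel collocation described above can be illustrated with a short sketch. This is not the processing code used in the study: the function names, the (latitude, longitude, value) tuple layout, and the use of great-circle (haversine) distance are all assumptions made for illustration. A missing satellite value (represented here by `None`) is returned as-is, which is one way the per-variable sample size can fall below the number of target data points.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))


def nearest_pixel(site_lat, site_lon, pixels):
    """Return the (lat, lon, value) pixel closest to the site.

    pixels: non-empty list of (lat, lon, value) tuples (hypothetical layout).
    The value of the nearest pixel may be None if the retrieval is missing.
    """
    return min(pixels, key=lambda p: haversine_km(site_lat, site_lon, p[0], p[1]))


# Hypothetical pixels around a hypothetical site
pixels = [(37.0, 127.0, 0.5), (37.5, 127.0, 0.8), (36.0, 128.0, None)]
print(nearest_pixel(37.46, 126.95, pixels))  # (37.5, 127.0, 0.8)
```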
3. Methods
3.1. Machine Learning Approach and Training Process
An RF model was used to classify aerosol type. The RF is an ensemble model based on classification and regression trees (CART), in which the outputs of multiple trees are aggregated by majority voting in classification tasks and by averaging in regression tasks [48]. The RF improves on individual tree models, which tend to overfit and show a large degree of variability among different data. Breiman [48] suggested an ensemble model based on bagging and randomized node optimization. The RF also provides the importance of each input variable, which was used to determine the optimal input variable set that minimized missing data and maximized classification accuracy among various combinations of variable candidates.
The RF model training was carried out using the “randomForest” package (version 4.6-14) [48] in RStudio (R version 3.6.3, R Studio Inc., Boston, MA, USA). The RF model includes hyperparameters such as ntree (the number of classification trees), mtry (the number of input variables randomly sampled as candidates at each split), and node size (the minimum size of terminal nodes). The accuracy of the RF model is reported to depend on these hyperparameters, so they should be optimized [48,49]. To determine optimal values of ntree and node size in the RF model, we used the “tune.randomForest” function available from the “e1071” package (version 1.7-3, TU Wien, Vienna, Austria). We tuned the model hyperparameters using node sizes of 1–5 with ntree values of 100–1500. The mtry value was set to the square root of the number of input variables, which is a typical mtry value for classification [50].
The collected dataset was divided randomly into training (60%) and test (40%) datasets. The RF-based model was trained by optimizing hyperparameters with a five-fold cross-validation procedure: the training dataset was divided randomly into five sets of the same size (Figure 3), and in each iteration the model was trained on four folds and validated on the remaining (unseen) fold to calculate model performance. This allowed us to choose the hyperparameters that led to the best model performance.
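The data partitioning described above can be sketched as follows. The study itself used R; this Python sketch only illustrates the 60/40 split and the five-fold partition, and the function names, the fixed random seed, and the interleaved fold assignment are assumptions. In practice, each hyperparameter candidate would be scored by its mean validation accuracy over the five folds.

```python
import random


def split_train_test(n_samples, train_fraction=0.6, seed=0):
    """Randomly split sample indices into training and test index lists."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = int(round(n_samples * train_fraction))
    return indices[:n_train], indices[n_train:]


def k_folds(indices, k=5, seed=0):
    """Yield (train, validation) index lists for k-fold cross-validation."""
    shuffled = list(indices)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]  # k folds of (nearly) equal size
    for i in range(k):
        validation = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation


train_idx, test_idx = split_train_test(100)
for fold_train, fold_val in k_folds(train_idx):
    print(len(fold_train), len(fold_val))  # 48 12 for each of the five folds
```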
3.2. Classification Model Assessments
3.2.1. Statistical Assessment
In general, classification accuracy can be statistically quantified via confusion matrix analysis [51,52]. Figure 4 shows an example of a confusion matrix, which consists of N columns indicating target classes and N rows for estimated classes. The diagonal terms (e.g., n11, n22, n33, and nNN) indicate correctly classified instances, whereas off-diagonal terms show incorrectly classified instances. Thus, the confusion matrix provides a measure of classification accuracy as well as insights into misclassification patterns of the classifier. To statistically quantify classification accuracy, we utilized the overall accuracy (OA) and producer’s accuracy (PA) [51,52]. The OA denotes total classification accuracy as the percentage of the number of correctly classified pixels to the total test data (Equation (2)), whereas PAk represents the classification accuracy of specific class k (Equation (3)).
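$$\mathrm{OA} = \frac{\sum_{k=1}^{N} n_{kk}}{n_{total}} \times 100 \quad (2)$$
$$\mathrm{PA}_k = \frac{n_{kk}}{n_{k+}} \times 100 \quad (3)$$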
where ntotal is the total number of test data points, nkk is the number of pixels correctly classified as class k (diagonal term in the error matrix), and nk+ is the total number of test data points for specific class k.
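As a worked illustration of Equations (2) and (3), the sketch below computes OA and PA from a small, made-up confusion matrix. It follows the layout described in the text (rows are estimated classes, columns are target classes), so the total for class k is taken here as the sum of column k; this reading of the class total, the function names, and the example counts are assumptions.

```python
def overall_accuracy(cm):
    """OA (%): correctly classified cases divided by all test cases."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[k][k] for k in range(len(cm)))
    return 100.0 * correct / total


def producers_accuracy(cm, k):
    """PA (%) of class k: correct cases divided by all test cases of target class k.

    cm[i][j] counts cases estimated as class i whose target class is j,
    so the target-class total is the sum over column k.
    """
    target_total = sum(row[k] for row in cm)
    return 100.0 * cm[k][k] / target_total


# Made-up 3-class confusion matrix (rows: estimated, columns: target)
cm = [
    [8, 2, 0],
    [1, 6, 1],
    [1, 2, 9],
]
print(round(overall_accuracy(cm), 1))                # 76.7
print([producers_accuracy(cm, k) for k in range(3)])  # [80.0, 60.0, 90.0]
```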
3.2.2. Assessment Using AERONET Aerosol Optical Properties
Even though the aerosol type method suggested by Shin et al. [21] is reported to be reasonable for the identification of aerosol sphericity, it was necessary to evaluate the aerosol types classified by our RF model using aerosol optical properties from AERONET data. Among several aerosol properties, SSA depends on the complex refractive index (related to extinction coefficient) as well as aerosol size distribution [53,54]. Furthermore, Dubovik et al. [55] found a wavelength dependency of SSA for each aerosol type. For example, desert dust aerosols show an increasing trend of SSA with increasing wavelength, whereas decreasing SSA values were found with increasing wavelength for biomass-burning smoke aerosols. Therefore, we qualitatively investigated the wavelength dependence of SSA for each aerosol type. In addition, we calculated the differences between several aerosol optical properties (SSA, FMF, and Rd) obtained from AERONET-based aerosol types and RF-based aerosol types.
4. Determination of the Optimal Input Variables
We investigated the contribution of each input variable candidate in our RF-based model to determine the optimal input variable set. To evaluate the importance of each satellite variable while minimizing missing values, we considered three sets of input variable candidates (Table 2):
(1) All input variable candidates (N = 4906 for Level 1.5 and 1119 for Level 2.0);
(2) TROPOMI input variable candidates (N = 8693 for Level 1.5 and 1804 for Level 2.0);
(3) MODIS input variable candidates (N = 5714 for Level 1.5 and 1348 for Level 2.0).
Thus, we investigated classification accuracy and the importance of each input variable to the RF model for these three initial input variable sets.
The OAs of the three initial input variable sets were in the ranges 51–59% and 52–58% for AERONET Level 1.5 and Level 2.0 data, respectively (Table 2). The set of all input variable candidates had the highest OA values of 59% and 58% for Level 1.5 and Level 2.0 data, respectively, indicating the benefit of combining TROPOMI and MODIS data for aerosol identification.
The RF provides the importance of each parameter using the mean decrease in accuracy (MDA), which indicates the accuracy lost when a specific variable is excluded from the RF model. Therefore, a variable with a larger MDA is more important for the classification algorithm. Figure 5 shows variable importance determined by the RF-based model. TROPOMI-based variables (aerosol index, 83%; SZA, 76%; CO, 70%; NO2, 64%) generally have higher MDA values than MODIS-based variables, except for MODIS AOD, which has a high MDA value of 82%. The TOA reflectance at 660 nm, land cover type, and urban ratio have higher MDA values (>57%) than other MODIS input variables.
Therefore, we chose an optimal input variable set that would assure the highest classification accuracy while also having high variable importance. All variable candidates from TROPOMI (aerosol index, CO, NO2, and SZA) and MODIS (AOD, AE, TOA reflectance, land-cover type, and percent urban area) were selected to account for the combined effects of TROPOMI trace-gas data and MODIS aerosol optical properties on aerosol classification (Table 3). The optimal input variable set was constructed using AERONET Level 1.5 data, which provide more data points than the Level 2.0 data.
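For readers unfamiliar with this kind of importance measure, the sketch below illustrates the underlying idea on a toy problem. In Breiman's random forest, the accuracy loss for a variable is estimated by randomly permuting that variable's values (in out-of-bag samples) rather than by retraining without it. The sketch mimics this with a hypothetical rule-based classifier instead of a trained forest, so the function names, the toy classifier, and the data are all assumptions and the resulting numbers are not comparable to the MDA values in Figure 5.

```python
import random


def accuracy(predict, rows, labels):
    """Fraction of rows for which the classifier returns the true label."""
    hits = sum(1 for row, label in zip(rows, labels) if predict(row) == label)
    return hits / len(rows)


def permutation_importance(predict, rows, labels, column, seed=0):
    """Accuracy lost when the values of one input column are shuffled."""
    baseline = accuracy(predict, rows, labels)
    shuffled = [row[column] for row in rows]
    random.Random(seed).shuffle(shuffled)
    permuted = [row[:column] + [value] + row[column + 1:]
                for row, value in zip(rows, shuffled)]
    return baseline - accuracy(predict, permuted, labels)


def toy_predict(row):
    """Hypothetical classifier that uses only column 0 and ignores column 1."""
    return "dust" if row[0] > 0.5 else "pollution"


rows = [[0.9, 1.0], [0.8, 2.0], [0.7, 3.0], [0.2, 4.0], [0.1, 5.0], [0.3, 6.0]]
labels = ["dust", "dust", "dust", "pollution", "pollution", "pollution"]

# Column 1 is never used by the classifier, so shuffling it costs no accuracy
print(permutation_importance(toy_predict, rows, labels, column=1))  # 0.0
```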
5. Results
5.1. Statistical Assessment and Classification Sensitivity of the RF Model
We investigated the confusion matrix and classification accuracy for each aerosol type to determine which type is usually confused or classified well. Aerosol types that are mainly confused with other types were merged based on a sensitivity test via confusion matrix analysis.
When aerosols were classified into seven types (PD, DDM, PDM, SA, MA, WA, and NA), the OA of the RF-based model was 59%. The model generally yielded reliable detection accuracies for the SA (74%), PD (74%), and DDM (69%) types, indicating sensitivity to dust and to pollution particles with strong absorption. However, the detection performance for the pollution aerosols MA, WA, and NA was generally <47%. The NA type classification accuracy in particular was low (21%). The confusion matrix (Figure 6) indicates that the main confusion occurs among pollution aerosols, apart from the SA type. Pollution aerosols (SA, MA, WA, and NA) tend to be confused with the PDM type in general. Specifically, MA and NA are generally confused with WA. Our aerosol classifier therefore has difficulty distinguishing the seven aerosol types with suitable classification performance, especially among the pollution-related aerosols (MA, WA, NA, and PDM). We suggest that, except for SA aerosols, it is difficult to discriminate the absorbing features of aerosols with our current model and input variables. Furthermore, it is difficult to identify the PDM type, in which pollution aerosols are slightly mixed with dust aerosols. Thus, although our model classifies the seven aerosol types with an overall accuracy of 59%, we attempted to improve its performance further by merging aerosol classes. Even though the WA class is confused with MA, merging the MA and WA classes may strongly decrease the detection accuracy of NA. As a result, the WA type was merged into NA, and MA was integrated into the SA class. Thus, the initial seven aerosol types (PD, DDM, PDM, SA, MA, WA, and NA) were integrated into five aerosol types (PD, DDM, PDM, SA, and NA).
The training process was repeated for the new merged aerosol types. Our RF-based model showed enhanced aerosol detection performance, with an OA of 67%. As shown in Figure 7, the PAs for each aerosol class of our model range from 57% to 77%. In particular, the PA of the NA-type classification improved greatly, from 21% to 60%. Moreover, our RF-based model retained good classification performance for SA (77%), PD (75%), and DDM (68%). The PDM type was usually confused with pollution aerosols (NA and SA). The omission error for the PDM classification was found to be mainly caused by SA and NA types (31%), showing that a high contribution of pollution aerosols was responsible for the confusion. It is possible that our RF-based model using satellite variables had insufficient sensitivity to fully detect pollution aerosols that were slightly mixed with dust aerosols. Therefore, we decided to integrate the PDM class into pollution aerosols, so aerosols classified as PDM were reclassified into NA or SA types according to the AERONET SSA value at 1020 nm.
Then, the training process was repeated for the newly merged classes (SA, NA, DDM, and PD). The classification performance was also improved from 67% to 73% (Figure 8). Compared with the seven aerosol types (Figure 6), the new and simplified aerosol classes yielded stable classification performances (68% ≤ PA for each aerosol type classification ≤ 74%). In particular, the PA for NA classification in the RF-based model was greatly improved from 60% to 74%. However, there is still potential for improvement of the classification performance of the RF-based model through the adoption of new satellite input variables and increasing the size of the training dataset. In this section, our RF-based model is found to classify aerosols for up to seven types (PD, DDM, PDM, NA, WA, MA, and SA). The performance of our RF-based model was improved by merging aerosol classes into four (PD, DDM, NA (e.g., sulfate and nitrate), and SA (e.g., carbonaceous aerosols)) resulting in overall accuracy of up to 73%.
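The two merging steps described in this section can be summarized as a label mapping, sketched below. The function name is hypothetical. The text states that PDM cases were reassigned to NA or SA according to the AERONET SSA at 1020 nm but does not give the cut-off; the sketch assumes 0.90, which is the boundary between the merged NA (NA and WA) and merged SA (MA and SA) classes implied by the SSA thresholds in Section 2.1.

```python
def merge_to_four_classes(aerosol_type, ssa_1020=None, ssa_cut=0.90):
    """Map the seven AERONET-based types onto PD, DDM, NA, and SA.

    ssa_cut = 0.90 is an assumption: it is the SSA boundary between the
    merged NA (NA + WA) and merged SA (MA + SA) classes.
    """
    if aerosol_type in ("PD", "DDM"):
        return aerosol_type
    if aerosol_type in ("NA", "WA"):
        return "NA"  # WA was merged into NA
    if aerosol_type in ("MA", "SA"):
        return "SA"  # MA was integrated into SA
    if aerosol_type == "PDM":
        if ssa_1020 is None:
            raise ValueError("SSA at 1020 nm is needed to reassign PDM cases")
        return "NA" if ssa_1020 > ssa_cut else "SA"
    raise ValueError(f"unknown aerosol type: {aerosol_type}")


print(merge_to_four_classes("WA"))                  # NA
print(merge_to_four_classes("PDM", ssa_1020=0.88))  # SA
```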
5.2. Evaluation of the RF Model with Aerosol Optical Properties from AERONET Data
The spectral dependence of SSA has been utilized to infer aerosol composition [56]. For example, SSA values of dust aerosols tend to increase with increasing wavelength; however, those of carbonaceous aerosols decrease with increasing wavelength [55,56,57]. As shown in Figure 9a,b, both AERONET-based and RF-based aerosol types showed similar trends in the wavelength dependence of SSA with increasing wavelength for each aerosol type. In particular, the SSAs of PD and DDM tended to increase with wavelength, showing a high contribution of dust aerosols. However, in the case of the DDM type, the increasing rate of SSA with wavelength is lower than that of PD, showing that our RF model was capable of distinguishing PD and dust aerosols slightly mixed with pollution aerosols. In the SA type, SSA values tended to decrease with increasing wavelength, showing that the AERONET-based and our RF-based algorithms reasonably identified the typical wavelength dependence of carbonaceous aerosol types, which is consistent with previous studies [21,55,56,57].
To quantitatively evaluate the performance of our RF-based model using aerosol optical properties for AERONET-based aerosol types, we compared SSA values at several wavelengths (440, 675, 870, and 1020 nm) for AERONET-based aerosol types (Figure 9a) against our RF-based model (Figure 9b). The differences between SSAs for AERONET-based and RF-based aerosol types were calculated (Figure 9c,d). The mean differences in SSA at 440, 675, 870, and 1020 nm were 0.002, 0.004, 0.007, and 0.008, respectively, showing low SSA differences (<0.01 on average).
We also examined the influence of merging aerosol types on the aerosol classification performance in terms of aerosol optical properties, including SSA values at 440, 675, 870, and 1020 nm, FMF, and Rd. Table 4 summarizes the averages and standard deviations of differences in SSA values, FMF, and Rd between AERONET-based and RF-based aerosol types. In general, the differences in all of these parameters tended to decrease when the aerosol types were merged. For example, with seven aerosol types, differences in SSA at 440, 675, 870, and 1020 nm were 0.007, 0.008, 0.010, and 0.012, respectively, whereas with four aerosol classes, the differences were 0.002, 0.004, 0.005, and 0.006, respectively. The difference in FMF decreased from 0.047 to 0.016 by merging aerosol classes. The difference in Rd decreased the most among the aerosol optical properties (0.027 to 0.005), a reduction of 81%. These strongly decreasing differences indicate that merging aerosol classes reduced classification confusion in the satellite aerosol classification model.
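For reference, the decreasing ratio quoted above is the relative change between the seven-type and four-type differences. Using the values given in the text (0.027 to 0.005, and for comparison 0.047 to 0.016), the arithmetic is:

```latex
\frac{0.027 - 0.005}{0.027} \times 100 \approx 81\%,
\qquad
\frac{0.047 - 0.016}{0.047} \times 100 \approx 66\%
```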
6. Evaluation of the Threshold-Based Aerosol Classification Methods
In the present study, we compared the classification results from earlier aerosol classification methods against those from the AERONET-based aerosol classification method [21], which is capable of identifying the contribution of non-spherical particles to the aerosol type. Furthermore, we evaluated the classification methods with aerosol optical properties from AERONET. The comparison was carried out using satellite variables from TROPOMI and MODIS measurements for the period from January 2018 to July 2020, so that each aerosol classification algorithm was evaluated over the same measurement period and with similar data. Two aerosol classification methods were selected because they use satellite variables that can be obtained from TROPOMI and MODIS [6,8]. The method suggested by Kim et al. [9] was excluded because of the small number of MODIS FMF data points. To compare the classification results, we accounted for the differences between the classification schemes of the earlier methods [6,8] and that of the AERONET-based method [21]. Table 5 summarizes the satellite aerosol classification methods, their input variables and classification schemes, and the AERONET-based aerosol types with similar aerosol properties. For simplicity, the PDM type was integrated into pollution aerosols (SA, MA, WA, and NA) in the comparison. We compared the classification results from the earlier satellite methods with the AERONET-based aerosol types by calculating the detection rate, defined as the satellite/AERONET ratio for each of the classified types with similar aerosol properties (Table 5).
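One possible reading of the detection rate can be sketched as follows. This is an assumption, not the authors' stated formula: here the rate is taken as the percentage of collocated cases in an AERONET-based group that the satellite method assigns to the matching type in Table 5. The grouping follows the Lee et al. [8] row of Table 5; the PDM type is left out because its assignment within the pollution types is not specified, and the sample pairs are hypothetical.

```python
# Type groups following the Lee et al. [8] row of Table 5.
GROUPS = {
    "Dust": {"satellite": {"Dust"}, "aeronet": {"PD", "DDM"}},
    "Smoke": {"satellite": {"Smoke"}, "aeronet": {"SA", "MA"}},
    "Sulfate": {"satellite": {"Sulfate", "Seasalt+Sulfate"}, "aeronet": {"WA", "NA"}},
}


def detection_rates(pairs, groups=GROUPS):
    """Return percent detection per group (assumed definition, see lead-in).

    pairs: iterable of (satellite_type, aeronet_type) for collocated cases.
    """
    pairs = list(pairs)
    rates = {}
    for name, group in groups.items():
        # Satellite labels for all cases whose AERONET type falls in this group.
        reference = [sat for sat, aer in pairs if aer in group["aeronet"]]
        if not reference:
            rates[name] = None  # no AERONET cases to compare against
            continue
        hits = sum(1 for sat in reference if sat in group["satellite"])
        rates[name] = 100.0 * hits / len(reference)
    return rates


# Hypothetical collocated classifications, for illustration only.
sample_pairs = [
    ("Dust", "PD"), ("Dust", "DDM"), ("Smoke", "DDM"), ("Dust", "PD"),
    ("Smoke", "SA"), ("Sulfate", "MA"),
    ("Sulfate", "NA"), ("Seasalt+Sulfate", "WA"), ("Smoke", "NA"), ("Dust", "WA"),
]
rates = detection_rates(sample_pairs)
print(rates)
```

Averaging the per-group values would give a single summary figure per threshold setting, comparable in spirit to an average detection rate.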
In Lee et al. [8], the aerosol index from OMI and the AOD and AE from MODIS are used to classify aerosols. Torres et al. [6] used the aerosol index from OMI and CO from AIRS to classify aerosols, whereas in the present study we used the corresponding aerosol index and CO products from TROPOMI. Because of the differences between sensors and retrieval algorithms, the thresholds suggested in the previous studies need to be adjusted. In general, the TROPOMI aerosol index is lower than the OMI aerosol index, with a bias within 1 (TROPOMI aerosol index level 2 README Document, 2020). The detection rate was therefore calculated for aerosol index thresholds ranging from 0.0 to 1.0. For CO, the difference between AIRS CO and TROPOMI CO has not yet been evaluated. Therefore, the detection rate was calculated for CO threshold values adjusted by ±5%, ±10%, and ±15%.
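The threshold sweep described above can be laid out as a simple grid. The sketch below makes several assumptions that the text does not state: an aerosol index step of 0.1, the inclusion of the unadjusted CO threshold (0%) alongside the ±5%, ±10%, and ±15% offsets, and a placeholder base CO threshold. The function names are hypothetical.

```python
# Sketch of the threshold combinations; step size and base value are assumptions.
CO_OFFSETS_PERCENT = [-15, -10, -5, 0, 5, 10, 15]  # 0 = original threshold (assumed included)


def aerosol_index_thresholds(start=0.0, stop=1.0, step=0.1):
    """Aerosol index thresholds from start to stop inclusive (step is assumed)."""
    count = int(round((stop - start) / step)) + 1
    return [round(start + i * step, 10) for i in range(count)]


def co_thresholds(base_threshold, offsets=CO_OFFSETS_PERCENT):
    """Map each percent offset to the adjusted CO threshold."""
    return {offset: base_threshold * (1 + offset / 100.0) for offset in offsets}


def threshold_grid(base_co_threshold):
    """All (aerosol index threshold, CO offset in %, CO threshold) combinations."""
    return [
        (ai, offset, co)
        for ai in aerosol_index_thresholds()
        for offset, co in co_thresholds(base_co_threshold).items()
    ]


# Placeholder base CO threshold in arbitrary units, not a value from the cited studies.
grid = threshold_grid(base_co_threshold=1.0)
print(len(grid))
```

The detection rate would then be evaluated once per grid entry, and the entry with the highest average detection rate retained.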
Figure 10 shows the detection rates for the two threshold-based aerosol classification methods. In general, both methods identify dust aerosols well, with detection rates of more than 74%, which are higher than those for the other aerosol types. For Lee et al. [8], the dust detection rate ranges from 78% to 88% across the aerosol index thresholds, as shown in Figure 10a. The other aerosol classification algorithm [6] also shows a high dust detection rate, ranging from 74% to 100% (Figure 10b). For both methods, the average detection rate tends to increase with decreasing aerosol index threshold. For Lee et al. [8], the maximum average detection rate (43%) was found at an aerosol index threshold of 0.1. For Torres et al. [6], the highest average detection rate (55%) was found when the aerosol index threshold was 0.1 and the CO threshold was 10% higher than the original CO threshold.
Figure 11 and Figure 12 show the evaluation of the satellite aerosol classification methods against the AERONET-based aerosol types using the wavelength dependence of SSA. Consistent with the dust detection rates in Figure 10, for dust aerosols both methods showed good agreement between the wavelength dependence of SSA for the satellite methods and that for the AERONET-based classification method. However, for smoke aerosols detected by the Lee et al. [8] method, we found the largest difference in SSA among the aerosol types (Figure 11c). In general, the wavelength dependence of SSA for smoke and sulfate aerosols appears to be the opposite, indicating confusion in the absorption properties of spherical aerosols. This confusion between scattering and absorbing properties for spherical aerosols was also observed for the Torres et al. [6] algorithm. The spectral SSA values for smoke and sulfate aerosols detected by Torres et al. [6] were found to be similar compared with those for AERONET-based aerosol types.
7. Discussion
Our newly developed aerosol classification method uses an RF-based approach, whereas earlier methods use a threshold-based approach involving input variables from various satellite sensors. Previous studies used a range of satellite input variables, including AOD, AE, FMF, aerosol index, and column amounts of trace gases (CO, NO2, HCHO, and SO2), from various satellite measurements [3,6,7,8,9,10,11]. Here, we used satellite input variables including AE, AOD, aerosol index, and column amounts of trace gases (tropospheric NO2 and CO), as suggested in previous studies. In addition, other satellite input variables (SZA, land-cover type, percentage urban area, and TOA reflectance at 412, 470, and 660 nm) were used in the RF-based aerosol classification method.
In previous studies, aerosol classification methods were evaluated through comparisons with an aerosol climate model and with earlier aerosol classification methods [8,9,10]. Here, we evaluated the accuracy of the classification algorithm using the AERONET aerosol-type dataset and typical aerosol property values obtained from AERONET measurements. The overall accuracy of our model for classifying seven aerosol types was 59%, which improved to 73% when the seven classes were merged into four aerosol types (PD, DDM, SA, and NA). These four simplified aerosol classes yielded a stable classification performance (68% ≤ PA ≤ 74% for each aerosol type). The typical SSA wavelength dependence for the individual aerosol types is consistent with that determined by our method.
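The two accuracy measures quoted above can be computed from a confusion matrix. The sketch below assumes that PA denotes the producer's accuracy (correctly classified cases divided by the number of reference cases of that type), which is the usual meaning of the abbreviation but is an assumption here. The confusion matrix is hypothetical and does not reproduce the results of this study.

```python
def overall_accuracy(confusion):
    """Percent of all cases on the diagonal; confusion[true][predicted] = count."""
    total = sum(sum(row.values()) for row in confusion.values())
    correct = sum(confusion[label].get(label, 0) for label in confusion)
    return 100.0 * correct / total


def producers_accuracy(confusion):
    """Percent of reference (true) cases of each type that were classified correctly."""
    return {
        label: 100.0 * row.get(label, 0) / sum(row.values())
        for label, row in confusion.items()
    }


# Hypothetical four-class confusion matrix (rows: AERONET type, columns: RF type).
confusion = {
    "PD": {"PD": 7, "DDM": 2, "SA": 0, "NA": 1},
    "DDM": {"PD": 2, "DDM": 7, "SA": 1, "NA": 0},
    "SA": {"PD": 0, "DDM": 1, "SA": 14, "NA": 5},
    "NA": {"PD": 0, "DDM": 1, "SA": 4, "NA": 15},
}

oa = overall_accuracy(confusion)
pa = producers_accuracy(confusion)
print(f"OA = {oa:.1f}%")
print(pa)
```

Reporting PA per type alongside OA shows whether a high overall accuracy is driven by one dominant class or is spread evenly across types.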
In future studies, the accuracy of the RF model may be improved by collecting more training data over a longer time period. Considering additional input variables (e.g., meteorological variables) may also enhance the sensitivity of aerosol type classification. This study demonstrates that the RF model can classify aerosol types with sensitivity to the contribution of non-spherical particles. Furthermore, the RF-based algorithm has the potential to improve the retrieval accuracy of satellite aerosol and trace-gas algorithms.
8. Summary and Conclusions
In this study, we propose a new method for identifying aerosol types using an RF model, a machine learning technique. The RF-based model was trained on an AERONET-based aerosol type dataset using MODIS and TROPOMI satellite input variables. Previous and new satellite input variables were adopted in the RF model, and their importance was investigated. The aerosol index, AOD, SZA, and the column amounts of trace gases were found to make substantial contributions to the RF aerosol classification model. The performance of the RF-based model was evaluated using AERONET-based aerosol data and compared with that of previous satellite aerosol classification methods, which use empirically determined threshold values. Our RF-based model was found to have generally better performance than the previous methods. Therefore, our RF-based model can identify up to seven aerosol types, with more aerosol types and greater accuracy than previous studies that use input variables similar to those of this study. The performance of our RF-based model was improved by merging aerosol classes into four types (PD, DDM, NA [e.g., sulfate and nitrate], and SA [e.g., carbonaceous aerosols]). This merging improved aerosol classification accuracy, which indicates that the RF-based method (using satellite variables) has limitations in detailing aerosol compositions.
Thus, this study demonstrates that the RF-based model is capable of satellite aerosol classification with a sensitivity to aerosol identification.
With regard to the classification of the sea salt type, this type is not classified in the AERONET-based classification method [21], which was used to construct the AERONET-based aerosol type dataset in this study. Therefore, our current RF-based model does not classify the sea salt type. The sea salt type would be added to the RF-based model if a new AERONET-based aerosol type classification method could classify this type.
Sensor (Mission) | Product (Level) | Variables | Notes |
---|---|---|---|
TROPOMI (Sentinel-5P) | AI (L2) | Aerosol index | A qualitative measure indicating the presence of absorbing aerosols |
 | | Solar zenith angle | The angle between the zenith and the sun |
 | CO (L2) | CO column amount | The number of molecules of CO from the surface to top of atmosphere per unit area |
 | NO2 (L2) | Tropospheric NO2 column density | The number of molecules of NO2 from the surface to top of the troposphere per unit area |
MODIS (Aqua) | MYD04 (L2) | Aerosol optical depth | A measure of the extinction of the solar radiance by aerosols |
 | | Ångström exponent | A power law relationship with AOD; an indicator of particle size |
 | | TOA reflectance (deep blue; 412, 470 and 660 nm) | A ratio of reflected radiance to the incident solar radiance |
 | MCD12C1 (L3) | Land cover type | Major land cover type among land classes (annual) |
 | | Percent of urban area | A ratio of urban area (annual) |
Initial Input Variable Sets
Dataset Name | Input Variables | AERONET Data Level | Number of Data (Total) | Number of Data (Training, 60%) | Number of Data (Test, 40%) | OA (%) |
---|---|---|---|---|---|---|
All input variable candidates (11 variables) | TROPOMI: aerosol index, solar zenith angle, CO column amount, tropospheric NO2 column density; MODIS: aerosol optical depth, Ångström exponent, TOA reflectance at 412, 470 and 660 nm, land cover type, percent of urban area | Level 1.5 | 4906 | 2946 | 1960 | 59% |
 | | Level 2.0 | 1119 | 674 | 445 | 58% |
TROPOMI input variable candidates (4 variables) | Aerosol index, solar zenith angle, CO column amount, tropospheric NO2 column density | Level 1.5 | 8693 | 5218 | 3475 | 51% |
 | | Level 2.0 | 1804 | 1086 | 718 | 53% |
MODIS input variable candidates (7 variables) | Aerosol optical depth, Ångström exponent, TOA reflectance at 412, 470 and 660 nm, land-cover type, percent of urban area | Level 1.5 | 5714 | 3432 | 2282 | 56% |
 | | Level 2.0 | 1348 | 812 | 536 | 52% |
Optimal Input Variable Set
Input Variables | AERONET Data Level | Number of Data (Total) | Number of Data (Training, 60%) | Number of Data (Test, 40%) | OA (%) |
---|---|---|---|---|---|
TROPOMI: aerosol index, solar zenith angle, CO column amount, tropospheric NO2 column density; MODIS: aerosol optical depth, Ångström exponent, TOA reflectance at 412, 470 and 660 nm, land cover type, percent of urban area | Level 1.5 | 4906 | 2946 | 1960 | 59% |
Seven aerosol classes: PD, DDM, PDM, SA, MA, WA, and NA (sulfate). Four aerosol classes: PD, DDM, SA, and NA (sulfate).
Parameter | Seven Classes: Average | Seven Classes: Standard Deviation | Four Classes: Average | Four Classes: Standard Deviation |
---|---|---|---|---|
SSA440 | 0.007 | 0.006 | 0.002 | 0.003 |
SSA675 | 0.008 | 0.009 | 0.004 | 0.005 |
SSA870 | 0.010 | 0.011 | 0.005 | 0.007 |
SSA1020 | 0.012 | 0.012 | 0.006 | 0.008 |
FMF | 0.027 | 0.020 | 0.005 | 0.006 |
Rd | 0.047 | 0.016 | 0.016 | 0.019 |
Overall accuracy | 59% | | 73% | |
Method | Variables | Classified Aerosol Types | AERONET-Based Aerosol Types |
---|---|---|---|
Lee et al. [8] | Aerosol index, AE, AOD | Smoke | SA and MA |
 | | Dust | PD and DDM |
 | | Sulfate, Seasalt+Sulfate | WA and NA (sulfate) |
 | | Seasalt | Not compared due to the small number of classified cases (less than 10) |
 | | Dust+Smoke | Not compared due to the small number of classified cases (less than 10) |
Torres et al. [6] | Aerosol index, CO | Carbonaceous | SA and MA |
 | | Dust | PD and DDM |
 | | Sulfate | WA and NA (sulfate) |
Author Contributions
H.L. designed and interpreted the entire experiments. W.C. collected AERONET measurement data and also trained and evaluated the RF-based model. J.P. collected satellite measurement data. All authors have read and agreed to the published version of the manuscript.
Funding
This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIP) (No. 2019R1F1A1058295).
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
For the results and data generated during the study, please contact the first author ([email protected]).
Acknowledgments
The authors would like to thank NASA for providing the MODIS Collection 6.1 aerosol product and AERONET data. We also thank ESA for making possible the distribution of TROPOMI data. This work was performed within the framework of the Sentinel 5P Calibration & Validation (S5P Cal/Val) Project.
Conflicts of Interest
The authors declare no conflict of interest.
© 2021. This work is licensed under http://creativecommons.org/licenses/by/3.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Abstract
A new method was developed for classifying aerosol types involving a machine-learning approach to the use of satellite data. An Aerosol Robotic NETwork (AERONET)-based aerosol-type dataset was used as a target variable in a random forest (RF) model. The contributions of satellite input variables to the RF-based model were quantified to determine an optimal set of input variables. The new method, based on inputs of satellite variables, allows the classification of seven aerosol types: pure dust, dust-dominant mixed, pollution-dominant mixed aerosols, and pollution aerosols (strongly, moderately, weakly, and non-absorbing). The performance of the model was statistically evaluated using AERONET data excluded from the model training dataset. Model accuracy for classifying the seven aerosol types was 59%, improving to 72% for four types (pure dust, dust-dominant mixed, strongly absorbing, and non-absorbing). The performance of the model was evaluated against an earlier aerosol classification method based on the wavelength dependence of single-scattering albedo (SSA) and fine-mode-fraction values from AERONET. Typical wavelength dependences of SSA for individual aerosol types are consistent with those obtained for aerosol types by the new method. This study demonstrates that an RF-based model is capable of satellite aerosol classification with sensitivity to the contribution of non-spherical particles.