1. Introduction
Downward shortwave radiation (RS) incident at the earth’s surface plays a vital role in the energy exchange between the land surface and atmosphere; RS drives the significant ecological and biophysical processes on the earth [1,2,3,4]. The RS information acts as an indicator of climate change since its availability on the earth depends on the atmospheric load and sky conditions, and also an essential variable of the earth’s surface radiation budget [5,6,7,8,9]; therefore, it is critical to acquire accurate RS estimates.
Ground measurements provide the most accurate RS data; however, they require special maintenance, and the sparsely distributed stations are still not adequate to deduce the spatial distribution of RS [9,10]. Therefore, the spatial analysis of RS based on the limited ground stations at regional and global scales is always inadequate [4,11]. If the atmosphere is horizontally homogeneous, the measurements at a single station could be extended over a whole homogeneous flat area. Nevertheless, the extrapolation may not be valid if the land surface is heterogeneous or rugged [12]. Remote sensing is an alternative way of estimating the RS on the local, regional, and global scales [13,14,15,16], due to its extensive spatiotemporal coverage of the earth surface. Many studies have estimated the RS using satellite observations based on various methods, which include the empirical statistical methods [14,17], parametrization methods [18,19,20,21], and retrieval methods based on radiative transfer models [3,22,23,24,25,26,27,28,29,30]. These approaches have both advantages and disadvantages. For example, empirical methods are easy to operate, but they are always site-dependent. Compared to empirical methods, the parametrization methods and retrieval approaches based on radiative transfer models have a clear physical basis. On the other hand, these two methods require multiple atmospheric products (e.g., cloud optical depth and aerosol optical depth [31,32,33,34]) as input variables. The errors in cloud and aerosol products may introduce uncertainties into the estimated products [31,35]. Moreover, it is not easy to balance accuracy and efficiency for the retrieval methods based on the radiative transfer models.
In addition to the aforementioned methods, the machine learning method is another feasible way to estimate the RS using satellite observations [3,36,37,38,39,40,41,42,43,44,45]. Ryu et al. [3] applied the artificial neural network (ANN) for computing the RS with the MODIS product as inputs. The validation results indicated that the relative bias was −2.3% for the ANN model. Wang et al. [37] proposed an ANN-based approach to derive the RS based on MODIS products. The validation results exhibited that the maximum root mean square error (RMSE) was less than 45 Wm−2. Ghimire et al. [43] selected the support vector regression (SVR) method to estimate the RS using MODIS data; the result showed that it was also a feasible way to apply the hybrid SVR model for obtaining RS using satellite observations. Machine learning methods are weak in the physical basis; however, previous studies proved that machine learning methods are one effective way to estimate RS using satellite observations [44,45]. Compared to the traditional approaches for estimating RS, machine learning methods have the advantage of catching potential nonlinear relationships between the input variables and the RS, and they can be applied to a variety of remote sensing variables [38]. The random forest (RF) method has been widely applied in the regression and classification analysis within the remote sensing research field owing to the high accuracy and computational efficiency [46,47]. However, it has rarely been used to estimate the RS for the new satellite missions.
Himawari-8 is a new generation of geostationary meteorological satellite with the most advanced optical sensors, showing significant improvements over previously available satellites in the geostationary orbit [48,49]. It provides a scan of full-disk, a hemispheric region including the Pacific Ocean (with central coordinates of 0.0°N, 140.7°E) with 2-km spatial resolution and 10-min temporal resolution [50]. To make widespread application of the enhanced monitoring capabilities of Himawari-8, great efforts have been expended, and a few level-2 physical products (e.g., the sea surface temperature, aerosol properties, and RS) have been officially released from March 2016. Some studies have been conducted to evaluate the performance of the RS product from the Himawari-8, which was generated based on the algorithm proposed by Frouin and Murakami [51]. Lee et al. [29] retrieved the RS at the top of the atmosphere using Himawari-8 Advanced Himawari Imager (AHI) data; the results showed the RMSE was 52.12 Wm−2 compared with Terra, Aqua, and S-NPP/CERES data. Shi et al. [52] evaluated the RS product from the Himawari-8 using the Chinese Ecosystem Research Network (CERN) RS data; the results indicated that the officially released daily RS product had a mean bias error (MBE) value of 13.8 Wm−2 when compared to the CERN RS measurements. Yu et al. [53] evaluated the Himawari-8 RS products using ground measurements collected from five networks with an MBE value of 19.7 Wm−2. Damiani et al. [49] evaluated the Himawari-8 RS product using surface observations from four SKYNET stations in Japan and the Japan Meteorological Agency (JMA) surface network. The comparison results showed that the Himawari-8 RS product was in good agreement with the ground-measured data with the MBE values ranging from 20 to 30 Wm−2. 
Although many studies have been conducted to estimate RS using diverse methods, few studies have investigated the possibility of estimating RS using machine learning methods over China, especially based on the RF method.
In this study, we try to estimate the RS using the Himawari-8 Advanced Himawari Imager (AHI) data based on the RF method. The estimated RS data at the daily time scale and monthly time scale are evaluated against ground-measured RS at 86 Climate Data Center of the Chinese Meteorological Administration (CDC/CMA) stations in China. We also make a comparison between the RF method and the traditional ANN method in estimating the RS. This paper is organized as follows: In Section 2, the datasets applied in this study are introduced. Section 3 introduces the RF machine learning method. Section 4 shows the results and a brief analysis, and Section 5 provides a discussion of the RF method. Finally, a short summary is presented in the last section of this paper.
2. Data
2.1. Himawari-8 AHI Data
Himawari-8 is a geostationary satellite, which is managed by the JMA. It is located at 0.0°N and 140.7°E, about 35,800 km above the earth’s surface. The Himawari-8 carries an advanced optical sensor named AHI. The number of spectral bands and the spectral and spatial resolution have been greatly improved [48,53]. The Himawari-8 AHI has 16 bands (three for visible, three for near-infrared, and ten for infrared) at spatial resolutions of 0.5–2.0 km, which can provide abundant spectral information from visible to infrared [29]. The Himawari-8 AHI is capable of supplying a scan of the full disk of the earth’s surface and the target region at temporal resolutions of 10 min and 2.5 min, respectively. On account of this high temporal resolution, it is important for better understanding the RS spatiotemporal variations in short time scales [54]. The information of Himawari-8 AHI bands is shown in Table 1 [55,56], and more detailed information about the Himawari-8 AHI is described in Bessho et al. [50]. The full-disk scans of Himawari-8 AHI data with 16 observational bands from February to May 2016 were used for analysis in this study.
Clouds and aerosols are the two important parameters affecting the RS [32,33,34]. The accuracy of the estimated RS is dependent on the quality of clouds and aerosols data. Besides clouds and aerosols, water vapor is also a significant parameter affecting the RS [4,14]. Water vapor provided by the Modern-Era Retrospective Analysis for Research and Applications, version 2 (MERRA-2) at an hourly temporal resolution and 0.5 × 0.625-degree spatial resolution [57,58] was used as an ancillary variable. The length of daytime is also suggested to be used for estimating the RS data [59,60]. The impact of solar zenith angle and surface elevation also should be considered in estimating the RS [4,61]. In this study, we presented a direct estimation method to generate the RS data from the Himawari-8 AHI based on the RF method. The Himawari-8 AHI top-of-atmosphere (TOA) radiance of 16 observational bands, water vapor, solar zenith angle, the length of daytime, and elevation data were used as input variables of the RF method to estimate the RS in this study.
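The two geometric inputs named above can be illustrated with a minimal sketch. It assumes Cooper's approximation for the solar declination and the standard sunset-hour-angle relation; the source does not state which formulas were actually used, and the function names are hypothetical.

```python
import math

def solar_declination(day_of_year):
    """Approximate solar declination in radians (Cooper's formula, an assumption)."""
    return math.radians(23.45) * math.sin(2.0 * math.pi * (284 + day_of_year) / 365.0)

def day_length_hours(lat_deg, day_of_year):
    """Length of daytime in hours from latitude and day of year."""
    phi = math.radians(lat_deg)
    delta = solar_declination(day_of_year)
    # Clamp so that polar day / polar night do not break acos().
    x = max(-1.0, min(1.0, -math.tan(phi) * math.tan(delta)))
    sunset_hour_angle_deg = math.degrees(math.acos(x))
    return 2.0 * sunset_hour_angle_deg / 15.0  # the earth rotates 15 degrees per hour

def solar_zenith_deg(lat_deg, day_of_year, solar_hour):
    """Solar zenith angle in degrees at a given local solar time (hours)."""
    phi = math.radians(lat_deg)
    delta = solar_declination(day_of_year)
    omega = math.radians(15.0 * (solar_hour - 12.0))  # hour angle
    cos_z = (math.sin(phi) * math.sin(delta)
             + math.cos(phi) * math.cos(delta) * math.cos(omega))
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_z))))
```

For example, `day_length_hours(40.0, 172)` gives the approximate day length near the June solstice at 40°N, and `solar_zenith_deg(40.0, 172, 12.0)` gives the corresponding noon zenith angle.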
2.2. Ground Measurements
The ground measurements used to build RS estimation models were obtained from the CDC/CMA. The spatial coverage of the collected Himawari-8 AHI data covered 86 CDC/CMA radiation stations in China. Ground-measured RS of daily time scale at the corresponding 86 stations from February to May 2016 were collected from CDC/CMA. The RS measurement by CDC/CMA began in 1957; since 1994, there have been only 96 stations with the records of solar radiation. Before releasing the solar radiation measurements, the CDC/CMA carries out quality control on the radiation data, which includes spatiotemporal consistency checks, and manual correction and adjustment. Previous studies [62,63] indicated that the quality of ground-measured solar radiation data from CDC/CMA should be examined more critically before using them. With the aim of guaranteeing the solar radiation data is reliable, a quality control procedure, proposed by Zhang et al. [10], was conducted in this study. The quality of RS ground measurements from CDC/CMA is controlled based on the reconstructed daily and monthly integrated RS data using daily routine meteorological data from the CDC/CMA [64,65]. More detailed information about the quality control procedure is described in Zhang et al. [10]. Figure 1 displays the spatial distribution of the 86 radiation stations over China.
2.3. CERES–EBAF RS Data
The CERES–EBAF data from February to May 2016 were applied to compare with the estimated RS data from the RF method in this study. The CERES–EBAF data [24,66] are specifically generated for the applications of evaluating and improving climate models and improving the understanding of the variability in the earth’s energy budget. The CERES project offers surface irradiance and TOA irradiance in a variety of spatiotemporal scales [67]. Surface irradiance is calculated with satellite-derived data including cloud, aerosol, temperature, and humidity profiles [68]. The EBAF-surface dataset provides monthly downward shortwave irradiance, which is constrained by the TOA irradiance derived from CERES. The EBAF data are generated for addressing two disadvantages of the CERES level-3 dataset: the requirement of absolute accuracy in quantifying the earth’s energy imbalance (EEI) [69], and the fact that CERES–EBAF RS data combine more accurate cloud information from CERES instruments [24]. It has proven to be one of the most accurate RS datasets [10,70]. The CERES–EBAF RS is a gap-filled product with a temporal resolution of one month and a spatial resolution of one degree from March 2000 to February 2017 [71].
3. Methodology
3.1. Random Forest
Random forest (RF), which consists of many regression trees, is a powerful ensemble-learning algorithm for regression and classification studies [72]. The RF method is a modified method of the bagging regression tree, which belongs to the family of regression trees [73,74,75]. The binary regression tree method iteratively divides the dataset into two separate sets based on some specific rules (e.g., thresholds) [74]; a group of rules on the decision about the predictors is established [76], so that the smaller sets can be divided from the data according to an individual predictor, and the binary split can be performed [75]. Each set is split into two sets, and the final set is named a leaf; then the predictive value can be obtained from the values of the leaf. Each tree firstly grows to the maximum size, and then overfitting trees are pruned into optimal sizes using cross-validation and other techniques. A randomization step was added to bagging by Breiman [72]; the segmentation of each bagged regression tree is built on a random subgroup of the predictors [74]. During the process, around one-third of the training samples are not used in each bootstrap sample; these are known as out-of-bag (OOB) data [72]. For each bootstrap sample, the optimal segmentation is decided by means of a randomly selected subset at each node according to the minimized Gini index. A sufficiently large number of trees are grown, creating a random forest. During the prediction process, the random forest averages the predictions of all regression trees. The residual mean square (RMS) of the OOB data is used to evaluate the prediction accuracy [77]. The RF method was carried out using the scikit-learn toolbox in this study [78]. The structure of the method is displayed in Figure 2.
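The bagging, out-of-bag evaluation, and tree averaging described above can be sketched with scikit-learn's `RandomForestRegressor`. This is an illustration on synthetic data, not the configuration used in this study: the 20 random columns merely stand in for the 16 AHI bands plus four ancillary variables, and the target is an invented function.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic stand-in data (assumption): 20 predictors, one continuous target.
rng = np.random.default_rng(0)
n_samples, n_features = 500, 20
X = rng.uniform(0.0, 1.0, size=(n_samples, n_features))
y = 300.0 * X[:, 0] + 50.0 * np.sin(2.0 * np.pi * X[:, 1]) + rng.normal(0.0, 5.0, n_samples)

# bootstrap=True draws a bootstrap sample per tree; oob_score=True evaluates
# each training sample only with the trees that did not see it.
forest = RandomForestRegressor(
    n_estimators=100, max_features=10, bootstrap=True, oob_score=True, random_state=0
)
forest.fit(X, y)

oob_r2 = forest.oob_score_  # coefficient of determination on out-of-bag data
oob_rmse = float(np.sqrt(np.mean((forest.oob_prediction_ - y) ** 2)))

# The forest prediction is the mean of the individual tree predictions.
per_tree = np.stack([tree.predict(X[:5]) for tree in forest.estimators_])
tree_average = per_tree.mean(axis=0)
forest_prediction = forest.predict(X[:5])
```

Here `tree_average` and `forest_prediction` should agree to within floating-point precision, which is the averaging step described in the text.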
3.2. Model Construction
This study estimated the RS from Himawari-8 AHI data and other ancillary parameters using the RF method. The prediction of the RS started with putting variables in the RF model. To construct the optimal model, we studied the various band intervals for their importance in determining RS. The experiments were tested with different band intervals (1, 2, 3, and 4 bands). The validation results showed that the accuracy of the estimated RS data decreased as the band intervals increased, which indicated that the estimated RS data were the most accurate when all Himawari-8 AHI bands were used. The ground-measured daily RS of 86 stations from February to May 2016 were selected as the object variables. Sixteen variables of the Himawari-8 AHI data, which were TOA radiance of bands 1–16, were the input variables in this study. Besides these sixteen variables, water vapor data from MERRA-2, elevation data, the length of daytime, and the solar zenith angle data were also used as the input variables. The solar zenith angle and the length of daytime were calculated according to the specific geographical location and time. The datasets were randomly divided into two groups to construct the models: 80% for training data and the remaining 20% for validation data. We then applied the k-fold cross validation for selecting the best parameters during the training process. Several main parameters of the RF method were used for adjustment to obtain the optimal RF model. The n-estimator parameter, which is the tree number of the forest, can increase the RF method accuracy but at a higher computational cost when it is larger. Additionally, the underfitting and overfitting may occur when the n-estimators are smaller or larger than the best number [79]. As for the max-features parameter, which is the number of random feature subgroups, it can reduce variance but increase MBE. To improve the performance of the RF method, we also modified the min-samples-leaf and min-samples-split. 
Looping in each parameter threshold as shown in Table 2, these parameters were optimized using the k-fold cross validation approach according to the R, RMSE, and MBE values. It is the optimization that improved the RF method performance for the best fitting. The optimization results indicated that the RF method had the optimum performance when the n-estimators parameter was 400, the max features parameter was 10, the min-samples-split was 2, and the min-samples-leaf was 1. After the optimal parameters were determined, the RF model was applied for estimating the RS.
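The split-then-tune procedure described in this section can be sketched as follows. The sketch assumes scikit-learn's `GridSearchCV` with RMSE as the only selection criterion, whereas the study considered R, RMSE, and MBE together; the grid values are small placeholders rather than the thresholds of Table 2, and the data are synthetic.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, KFold, train_test_split

# Synthetic stand-in data (assumption): 20 predictors, one continuous target.
rng = np.random.default_rng(42)
X = rng.uniform(size=(300, 20))
y = 250.0 * X[:, 0] + 80.0 * X[:, 1] * X[:, 2] + rng.normal(0.0, 5.0, 300)

# 80% of the samples for training, the remaining 20% for validation.
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Placeholder grid; the real search ranges are those listed in Table 2.
param_grid = {
    "n_estimators": [20, 50],
    "max_features": [5, 10],
    "min_samples_split": [2, 4],
    "min_samples_leaf": [1, 2],
}

search = GridSearchCV(
    RandomForestRegressor(random_state=42),
    param_grid,
    scoring="neg_root_mean_squared_error",  # higher is better, so RMSE is negated
    cv=KFold(n_splits=3, shuffle=True, random_state=42),
)
search.fit(X_train, y_train)  # k-fold cross validation runs on the training part only

best_params = search.best_params_
val_pred = search.predict(X_val)  # uses the model refitted with the best parameters
val_rmse = float(np.sqrt(np.mean((val_pred - y_val) ** 2)))
```

Keeping the validation subset out of the cross validation means `val_rmse` is computed on data that played no role in parameter selection.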
3.3. Sensitivity Analysis and Scaling Issue
To further understand the RF method in estimating the RS, a sensitivity analysis was conducted for investigating the influence of the RF method parameters on the accuracy of estimated RS data. In order to conduct the sensitivity analysis of each parameter, we obtained the variations on R, RMSE, and MBE of the model for the validation data by keeping other parameters of the optimal RF model invariant.
As shown in Figure 3, the sensitivity analysis of the RF method parameters illustrated that R values were sensitive to the max-features parameter. When max-features increased from 2 to 10, the R values gradually increased from 0.77 to 0.90. However, the R values decreased to 0.69 as max-features increased from 10 to 19. The variations of n-estimators, min-samples-split, and min-samples-leaf had little impact on R values. As shown in Figure 4, the RMSE and MBE values were more sensitive to the max-features and min-samples-leaf parameters than to the other two parameters according to the range of variations. The RMSE and MBE values increased as the min-samples-leaf and min-samples-split parameters increased, whereas they decreased as n-estimators increased. When max-features increased from 2 to 10, the RMSE values decreased from 41.42 Wm−2 to 36.62 Wm−2; when max-features increased from 10 to 19, the RMSE values increased to 43.14 Wm−2. The MBE values of the RF method were not sensitive to the min-samples-split parameter: almost no variation of the MBE values was observed when min-samples-split varied from 2 to 10.
According to the sensitivity analysis, of the four parameters, the accuracy of the estimated RS (R, RMSE, and MBE) was most sensitive to max-features. The min-samples-split and n-estimators parameters had little impact on the accuracy of the RS data.
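A one-at-a-time sensitivity sweep of this kind can be sketched as below: one parameter is varied while the others stay at a baseline, and R, RMSE, and MBE are recorded on the validation subset. The data are synthetic and the baseline uses fewer trees than the optimal model to keep the example fast, so the numbers it produces are not comparable to Figures 3 and 4.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (assumption).
rng = np.random.default_rng(7)
X = rng.uniform(size=(400, 20))
y = 250.0 * X[:, 0] + 60.0 * X[:, 3] + rng.normal(0.0, 5.0, 400)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, test_size=0.2, random_state=7)

# Baseline parameters held fixed during the sweep (illustrative values).
baseline = dict(n_estimators=50, max_features=10, min_samples_split=2, min_samples_leaf=1)

def validation_scores(params):
    """Fit on the training subset and return (R, RMSE, MBE) on the validation subset."""
    model = RandomForestRegressor(random_state=0, **params).fit(X_tr, y_tr)
    pred = model.predict(X_va)
    r = float(np.corrcoef(pred, y_va)[0, 1])
    rmse = float(np.sqrt(np.mean((pred - y_va) ** 2)))
    mbe = float(np.mean(pred - y_va))
    return r, rmse, mbe

# Vary max_features only; every other parameter keeps its baseline value.
sweep = {}
for max_features in (2, 6, 10, 14, 19):
    sweep[max_features] = validation_scores(dict(baseline, max_features=max_features))
```

The same loop can be repeated for n-estimators, min-samples-split, and min-samples-leaf by changing the key that is overridden in `baseline`.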
A problem of scaling in comparing RS from satellite data and ground measurements was found in previous studies [80,81,82]. The impact of window size of spatial averaging on the consistency of the estimated RS data from the RF method and ground measurements was studied. Figure 5 shows that the RMSE values of the estimated RS data from the RF method decreased with increasing window size when the window size was less than 58 km, then the RMSE values increased with increasing window size. The smallest RMSE appeared when the window size was 58 km. For the monthly data, the optimal window size was 26 km.
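The spatial averaging step can be sketched as a mean over a square pixel window centered on the station. The function name, the 2-km pixel size default, and the choice to round the window to an odd number of pixels are assumptions for illustration; the source does not describe how the window was implemented.

```python
import numpy as np

def window_mean(grid, row, col, window_km, pixel_km=2.0):
    """Mean of a square window centered on pixel (row, col), ignoring NaNs.

    The window is rounded to a whole number of pixels and extends the same
    number of pixels on each side of the center, so its width is always odd.
    It is clipped at the grid edges.
    """
    n_pix = max(1, int(round(window_km / pixel_km)))
    half = n_pix // 2
    r0, r1 = max(row - half, 0), min(row + half + 1, grid.shape[0])
    c0, c1 = max(col - half, 0), min(col + half + 1, grid.shape[1])
    return float(np.nanmean(grid[r0:r1, c0:c1]))

# Example: a linear ramp, for which a symmetric window mean equals the center value.
grid = np.arange(10000, dtype=float).reshape(100, 100)
daily_value = window_mean(grid, 50, 50, 58.0)    # 29 x 29 pixels at 2 km
monthly_value = window_mean(grid, 50, 50, 26.0)  # 13 x 13 pixels at 2 km
```

Repeating the evaluation for a range of `window_km` values and recording the RMSE against the station data would reproduce the kind of curve shown in Figure 5.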
4. Results and Analysis
The estimated RS data based on the RF method were evaluated against ground RS measurements of 86 CDC/CMA stations at the daily time scale, as well as the monthly time scale. The correlation coefficient (R), root mean square errors (RMSE), and MBE were computed for evaluating the accuracy of the estimated RS data. Additionally, the comparison between the estimated RS data and the CERES–EBAF RS data was also conducted.
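For reference, the three statistics can be computed as in the following sketch. It assumes that MBE is defined as estimate minus observation and that the percentages reported in parentheses are normalized by the mean of the ground measurements; neither convention is stated explicitly in the text.

```python
import math

def evaluation_metrics(estimated, observed):
    """Return R, RMSE, MBE and their relative values (percent of the observed mean)."""
    n = len(observed)
    mean_e = sum(estimated) / n
    mean_o = sum(observed) / n
    cov = sum((e - mean_e) * (o - mean_o) for e, o in zip(estimated, observed))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    r = cov / math.sqrt(var_e * var_o)  # Pearson correlation coefficient
    rmse = math.sqrt(sum((e - o) ** 2 for e, o in zip(estimated, observed)) / n)
    mbe = sum(e - o for e, o in zip(estimated, observed)) / n  # positive = overestimate
    return {
        "R": r,
        "RMSE": rmse,
        "MBE": mbe,
        "rRMSE_pct": 100.0 * rmse / mean_o,
        "rMBE_pct": 100.0 * mbe / mean_o,
    }

# Example: estimates that are uniformly 10 Wm-2 too high.
metrics = evaluation_metrics([110.0, 210.0, 310.0], [100.0, 200.0, 300.0])
```

In this example the correlation is perfect while RMSE and MBE both equal the constant offset, which shows why R alone is not sufficient to judge accuracy.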
4.1. Validation Against Ground Measurements
4.1.1. Validation at a Daily Time Scale
The RF method performance for estimating daily RS from the Himawari-8 AHI data is displayed in Figure 6. The daily estimated RS data were averaged from a 58 km × 58 km window centered at the stations. It showed that the daily mean RS estimates from the Himawari-8 AHI data using the RF method agreed well with ground measurements from CDC/CMA. For the training data, the RF method had better performance in estimating daily RS; it showed that the overall R of the estimated RS data based on the RF method was 0.99, the RMSE was 11.16 (5.83%) Wm−2, and the MBE was −0.06 (−0.03%) Wm−2. For the validation data, it showed that the overall R was 0.92, the RMSE was 35.38 (18.40%) Wm−2, and the MBE was 0.01 (0.01%) Wm−2. The results indicated that the RF method performed well in estimating the RS with the Himawari-8 AHI data at the daily time scale.
4.1.2. Validation at a Monthly Time Scale
The monthly mean RS values were obtained by averaging all the daily RS values in a month, including the training and validation data. The monthly RS was not calculated if there were more than nine missing days in a month. The monthly mean estimated RS data from the RF method from February to May were validated using the ground-measured RS data from CDC/CMA. The monthly estimated RS data were averaged from a 26 km × 26 km window centered at the stations. The performance of the RF method for estimating monthly RS is displayed in Figure 7. The figure illustrates that the monthly mean estimated RS data were in good agreement with the ground measurements from CDC/CMA. The overall R was 0.99, the RMSE was 7.74 (4.09%) Wm−2, and the MBE was 0.03 (0.02%) Wm−2. It can be concluded that the RF method could estimate the RS accurately at the monthly time scale.
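The monthly averaging rule can be sketched as follows, with missing days represented as `None` or NaN. The function name and the representation of missing values are assumptions for illustration.

```python
import math

def monthly_mean(daily_values, max_missing_days=9):
    """Mean of the valid daily values, or None if too many days are missing.

    daily_values holds one entry per calendar day of the month; a missing day
    is given as None or NaN. The month is skipped when more than
    max_missing_days entries are missing.
    """
    valid = [v for v in daily_values if v is not None and not math.isnan(v)]
    missing = len(daily_values) - len(valid)
    if missing > max_missing_days or not valid:
        return None
    return sum(valid) / len(valid)

# Example: a 30-day month with nine missing days is kept, with ten it is dropped.
kept = monthly_mean([200.0] * 21 + [None] * 9)
dropped = monthly_mean([200.0] * 20 + [None] * 10)
```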
4.2. Comparison with CERES–EBAF
4.2.1. Validation Against Ground Measurements
It has been demonstrated that the CERES–EBAF RS is one of the most accurate global RS datasets available to date [10,70]. The estimated RS data from the Himawari-8 AHI data using the RF method were compared with the CERES–EBAF RS data in this study.
The monthly mean RS estimates from February to May 2016 at 86 CDC/CMA stations were compared with ground measurements. Figure 7 and Figure 8 show the comparison of RS estimates and RS ground measurements. The estimated RS data from the RF method had relatively better accuracy than those from the CERES–EBAF data at the selected stations over China, with an overall R value of 0.99, an RMSE value of 7.74 (4.09%) Wm−2, and an MBE value of 0.03 (0.02%) Wm−2. For the CERES–EBAF RS data at the monthly time scale, the overall R was 0.89, the RMSE was 24.24 (12.79%) Wm−2, and the MBE was −0.25 (−0.13%) Wm−2. The validation results indicated that the accuracy of the estimated RS data from the RF method was at least comparable to that of the CERES–EBAF RS data at the selected stations over China, and the estimated RS data from the RF method were in good agreement with ground measurements.
In addition, the estimated RS data from the RF method and CERES–EBAF data were analyzed with the two-sample Kolmogorov–Smirnov (K–S) test. The two-sample K–S test is a non-parametric test that requires no assumptions on the distribution. It is designed to test whether two samples are drawn from the same distribution [83]. The null hypothesis was that the two samples were drawn from the same distribution. The resulting H was 1 if the test rejected the null hypothesis at the 5% significance level, and 0 otherwise. As shown in Figure 9, the resulting H was 0, which meant that the null hypothesis could not be rejected. It indicated that the distributions of the estimated RS data from the RF method and the CERES–EBAF data had no statistically significant difference at the 5% significance level. According to the validation results, we concluded that the estimated RS data based on the RF method showed comparable precision to the CERES–EBAF RS data but with higher resolution.
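The decision rule for H can be sketched with SciPy's `ks_2samp`. The helper name and the synthetic samples are assumptions for illustration; the source does not state which software was used for the test.

```python
import numpy as np
from scipy.stats import ks_2samp

def ks_decision(sample_a, sample_b, alpha=0.05):
    """Two-sample K-S test; H is 1 if the null hypothesis is rejected at level alpha."""
    result = ks_2samp(sample_a, sample_b)
    h = int(result.pvalue < alpha)
    return h, float(result.statistic), float(result.pvalue)

# Synthetic stand-in samples (assumption), loosely on the scale of monthly RS in Wm-2.
rng = np.random.default_rng(1)
rf_like = rng.normal(190.0, 40.0, 344)

h_same, d_same, p_same = ks_decision(rf_like, rf_like)           # identical samples
h_diff, d_diff, p_diff = ks_decision(rf_like, rf_like + 1000.0)  # fully separated samples
```

An outcome of H = 0 only means that the test found no evidence against a common distribution; it does not prove that the two samples are identically distributed.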
4.2.2. Mapping RS of China
Remote sensing is one of the feasible ways to generate the RS products involving high temporal and spatial resolution from satellite observations. In this study, the Himawari-8 AHI data were applied to estimate RS with one-hour temporal resolution and 2-km spatial resolution in mainland China from February to May 2016 based on the RF method. The spatial distribution over China of the monthly mean RS generated from the CERES–EBAF data and RS estimates from the Himawari-8 AHI data based on the RF method were compared, as displayed in Figure 10. It illustrates that the spatial distribution and temporal variation of monthly mean estimated RS data from the RF method were in line with those of CERES–EBAF RS data over China. It indicates that the estimated RS data from the RF method were accurate but with higher spatial resolution and time resolution compared to the CERES–EBAF RS data.
The CERES–EBAF RS is one of the most accurate global RS datasets, according to previous studies; however, some studies pointed out that it tends to overestimate the RS in China [10,70]. To quantify the difference, the estimated RS data from the RF method were aggregated to a 1-degree grid to match the CERES–EBAF RS data, and the correlation between them was calculated (Figure 11). The estimated RS data from the RF method agreed reasonably well with CERES–EBAF data with an R value of 0.91, an RMSE of 22.00 (10.83%) Wm−2, and an MBE of −2.93 (−1.44%) Wm−2. The spatial distribution patterns of their discrepancies are shown in Figure 12. It shows that the differences between the estimated RS data from the RF method and the CERES–EBAF were relatively large in February and March, especially in southern China. The differences between them varied from −20 Wm−2 to 20 Wm−2 in April and May. It was also found that the discrepancies between the estimated RS and the CERES–EBAF gradually changed from negative to positive from February to May 2016. As for the monthly mean RS from February to May, the differences in the multi-month mean RS were not as pronounced as those in the individual months. The mean difference values in China were relatively larger in May, and the mean difference values in February, March, and April gradually changed from −11.04 Wm−2 to 8.51 Wm−2. Overall, the RS estimates from Himawari-8 AHI data based on the RF method and from the CERES–EBAF had similar temporal variation and spatial distribution over China. The relatively large differences in southern China between the estimated RS data from the RF method and the CERES–EBAF were probably related to the influences of aerosols, clouds, and their interactions on the RS [32,33,34].
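The aggregation to the 1-degree grid can be sketched as a simple unweighted mean of all fine-resolution pixels whose centers fall inside each cell. The function name is hypothetical, and the unweighted average (no area or latitude weighting) is an assumption; the source does not describe the aggregation scheme.

```python
import numpy as np

def aggregate_to_degree(lat, lon, values, cell_deg=1.0):
    """Average pixel values into cells of cell_deg degrees.

    Returns a dict keyed by the (south, west) corner of each cell.
    NaN pixels are skipped.
    """
    lat = np.asarray(lat, dtype=float).ravel()
    lon = np.asarray(lon, dtype=float).ravel()
    values = np.asarray(values, dtype=float).ravel()
    ok = np.isfinite(values)
    if not ok.any():
        return {}
    rows = np.floor(lat[ok] / cell_deg).astype(int)
    cols = np.floor(lon[ok] / cell_deg).astype(int)
    r0, c0 = rows.min(), cols.min()
    n_cols = cols.max() - c0 + 1
    flat = (rows - r0) * n_cols + (cols - c0)  # one integer id per cell
    cell_ids, inverse = np.unique(flat, return_inverse=True)
    sums = np.bincount(inverse, weights=values[ok])
    counts = np.bincount(inverse)
    means = sums / counts
    return {
        (float((cid // n_cols + r0) * cell_deg), float((cid % n_cols + c0) * cell_deg)): float(m)
        for cid, m in zip(cell_ids, means)
    }

# Example: two pixels share the cell with corner (30, 110); one lies in (31, 110).
coarse = aggregate_to_degree([30.1, 30.9, 31.2], [110.5, 110.6, 110.1], [100.0, 200.0, 300.0])
```

Once both datasets are on the same grid, R, RMSE, and MBE between them can be computed cell by cell.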
4.3. Comparison with ANN
The RF method was applied for estimating the RS using the Himawari-8 AHI data in this study. The ANN method, as the traditional machine learning method, is a non-linear modeling algorithm, which was developed based on the biological structure of the human neuron [84]. The ANN method does not require the knowledge of a specified problem, because the method can learn from examples; moreover, the method can provide flexible mathematical algorithms for different problems. The ANN method has been widely used in various applications including estimating the RS [3,84]. In this study, the RF method was compared with the commonly used ANN method in estimating the RS.
The ANN method performance in estimating the daily RS using Himawari-8 AHI data is displayed in Figure 13. For the training data, the overall R of the estimated RS data based on the ANN method was 0.90, the RMSE was 41.09 (21.49%) Wm−2, and the MBE was 1.46 (0.76%) Wm−2. For the validation data, the overall R was 0.86, the RMSE was 45.96 (23.90%) Wm−2, and the MBE was 1.48 (0.77%) Wm−2. For the monthly time scale, Figure 14 shows that the overall R of the estimated RS data based on the ANN method was 0.93, the RMSE was 20.09 (10.62%) Wm−2, and the MBE was 1.87 (0.99%) Wm−2.
The comparison of the results (Table 3) indicated that the RF method was more accurate than the ANN method in estimating the RS over China at the daily and monthly time scales, and the RF method could estimate the RS with reasonable accuracy at the daily and monthly time scale.
Figure 15 displays the accuracy of spatial distribution of daily mean RS estimated from Himawari-8 AHI data based on the RF and ANN method at 86 CDC/CMA stations. According to Figure 15, RS estimates from Himawari-8 AHI data using the RF method agreed well with ground measurements in the majority of the sites, with R values varying from 0.92 to 0.99, MBE values varying from −12.83 to 10.36 Wm−2, and RMSE values varying from 12.04 to 29.21 Wm−2. It is noted that R values were larger than 0.92 in 84 of 86 stations, and MBE values were less than 10 Wm−2 in 76 of 86 stations. For the ANN method, the R values varied from 0.70 to 0.96, the MBE values varied from −55.71 to 36.70 Wm−2, and the RMSE values varied from 22.23 to 87.03 Wm−2. This showed that the MBE and RMSE values of the ANN model were greater than those of the RF method at the daily time scale. From the results, one may conclude that the RF method performed better than the ANN method at most stations.
5. Discussion
In this study, the RF method was applied to estimate the RS data directly using the Himawari-8 AHI data. Compared to the ground measurements and CERES–EBAF RS data, the validation results showed that the estimated RS data had reasonable accuracy. The results also showed that the differences between the RF method estimates and two other data were relatively large in southern China. This might be related to aerosols and clouds, since previous studies reported that aerosols absorb and scatter the RS and indirectly influence the RS by modifying cloud properties [85,86,87]. Zhang et al. [32] pointed out that southern China includes some well-developed places where heavy pollution occurs. Li et al. [88] pointed out that the aerosol optical depth of southern China in the winter is relatively high. This is related to the monsoon, as well as anthropogenic aerosols, which are caused by pollution due to the rapid economic development in southern China. The aerosol optical depth and aerosol–cloud interaction might make the RS estimation more difficult and cause relatively larger differences between the estimated RS data from the RF method and CERES–EBAF in southern China. Since ground-measured clouds and aerosol data were not available at CDC/CMA stations, the influence of aerosols and clouds were not included in this study. The influence of aerosols, clouds, and their interactions on estimating the RS should be analyzed in future studies. Moreover, urbanization and topographical effects might cause bias in ground measurements [10,89], and their influence on the RS also should be quantified in future studies.
The RF method has been widely applied in many studies. It has some advantages; for example, nonlinear interactions can be found automatically through regression tree learning. It also has some disadvantages; for example, it may be ineffective because regression trees are regarded as a black box [90]; moreover, it is not easily interpretable. Overfitting problems may occur in applications of the RF method because of complicated regression trees, particularly when noisy data are involved [91]. When the network of the ANN grows too large, the ANN method may also suffer from overfitting [92].
6. Conclusions
The RF machine learning method is applied in this study to estimate RS at the daily and monthly time scales and 2-km spatial resolution using the Himawari-8 AHI data and ancillary datasets. The sensitivity analysis of the RF method parameters shows that the max-features parameter has the most significant influence on the accuracy of estimated RS data. The RS estimates from the Himawari-8 AHI data using the RF method are evaluated against ground measurements at 86 CDC/CMA stations from February to May 2016 as a target population. At the daily time scale, the estimated RS data from the RF method agree well with ground-measured RS for the validation data with an overall R of 0.92, an RMSE of 35.38 (18.40%) Wm−2, and an MBE of 0.01 (0.01%) Wm−2, while at the monthly time scale, the estimated RS data from the RF method agree better with ground-measured RS. The results show that the overall R is 0.99, the RMSE is 7.74 (4.09%) Wm−2, and the MBE is 0.03 (0.02%) Wm−2.
The monthly RS estimates from the RF method are compared with the CERES–EBAF data; the results indicate that the estimated RS data from the RF method agree well with the CERES–EBAF RS data. Moreover, the estimated RS data from the RF method are relatively more accurate than the CERES–EBAF RS data when compared with the ground-measured data at the selected stations over China, and the two datasets show similar temporal variation and spatial distribution. The differences between the CERES–EBAF RS data and estimated RS data from the RF method vary with time; relatively larger mean difference values occur in May, and the mean difference values in February, March, and April gradually change from −11.04 Wm−2 to 8.51 Wm−2. In addition, this study also compares the RF method to the ANN method in estimating the RS. The results indicate that the RF method has higher accuracy in estimating the RS from Himawari-8 AHI data at both the daily time scale and monthly time scale than the ANN method.
Overall, the results in this study indicate that the RF method applied to the Himawari-8 AHI data can estimate the RS over China with reasonable accuracy. The resulting RS estimates can be used in other research, such as hydrological and climate change studies.
Author Contributions
Conceptualization, X.Z.; data curation, N.H.; supervision, X.Z.; writing—original draft, N.H.; writing—review and editing, X.Z., W.Z., Y.W., K.J., Y.Y., B.J. and J.C. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded in part by the National Key Research and Development Program of China under Grant 2016YFA0600102, and in part by the National Natural Science Foundation of China under Grant 41571340.
Acknowledgments
The ground-measured RS was collected from CDC/CMA at
Conflicts of Interest
The authors declare no conflict of interest.
Figures and Tables
Figure 1. Spatial distribution of the 86 Climate Data Center of the Chinese Meteorological Administration (CDC/CMA) radiation stations.
Figure 3. Sensitivity analysis of R values to the (a) n-estimators; (b) min-samples-split; (c) min-samples-leaf; (d) max-features of the RF method.
Figure 4. Sensitivity analysis of RMSE and MBE values to the (a) n-estimators; (b) min-samples-split; (c) min-samples-leaf; (d) max-features of the RF method.
Figure 5. Impact of window size on the evaluation results of monthly mean estimated RS data from the RF method against ground measurements from CDC/CMA from February to May 2016. (a) Daily RS; (b) Monthly RS.
Figure 6. Evaluation results of daily estimated RS data from Himawari-8 AHI data based on the RF method against ground measurements from CDC/CMA from February to May 2016: (a) for training data; (b) for validation data. Estimated RS data are averaged from a 58 km × 58 km window centered at the stations. N is the number of data points.
Figure 7. Evaluation results of monthly mean estimated RS data from Himawari-8 AHI data based on the RF method against ground measurements from CDC/CMA from February to May 2016. Estimated RS data are averaged from a 26 km × 26 km window centered at the stations. N is the number of data points.
Figure 8. Evaluation results of monthly mean estimated RS data from CERES–EBAF data against ground measurements from CDC/CMA from February to May 2016. N is the number of data points.
Figure 9. Cumulative density function of two-sample K–S test values for the estimated RS data from the RF method and CERES–EBAF from February to May 2016.
Figure 10. The spatial distribution of the monthly mean estimated RS data over China of CERES–EBAF and the RF method from February to May 2016: panels (a–h) for each month; (i,j) for monthly mean RS from February to May.
Figure 11. Evaluation results of the estimated monthly mean RS based on the RF method using CERES–EBAF RS data from February to May 2016. N is the number of data points.
Figure 12. The differences of the monthly mean estimated RS data between the CERES–EBAF and the RF method (i.e., the CERES–EBAF estimates minus the RF-based estimates) from February to May 2016: panels (a–d) for each month; (e) for the difference of monthly mean RS from February to May.
Figure 13. Evaluation results of daily estimated RS data from Himawari-8 AHI data using the ANN method with ground-measured RS from CDC/CMA from February to May 2016: (a) for training data; (b) for validation data. N is the number of data points.
Figure 14. Results of evaluation for monthly mean estimated RS data from Himawari-8 AHI data using the ANN method with ground-measured RS from CDC/CMA from February to May 2016. N is the number of data points.
Figure 15. Spatial distribution of accuracy of RS estimates from Himawari-8 AHI data based on two machine learning methods at a daily time scale at the CDC/CMA stations from February to May 2016: panels (a,b) are the R between ground measurements and estimates from the RF and ANN methods, respectively; panels (c,d) are the MBE and panels (e,f) are the RMSE for two methods, respectively.
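The captions of Figures 6 and 7 mention averaging the estimated RS over a square window centered at each station. The following Python sketch illustrates one way such an average could be computed on a gridded product. The function names, the list-of-lists grid, and the handling of edges and missing pixels are all assumptions made for illustration; the captions do not specify them.

```python
def window_pixels(window_km, pixel_km=2.0):
    """Number of pixels along one side of a window that is window_km wide."""
    return round(window_km / pixel_km)


def window_mean(grid, row, col, size):
    """Mean of the size x size pixel window centred at (row, col).

    Pixels that fall outside the grid and None (missing) values are skipped.
    This edge handling is an assumption, not something the captions state.
    """
    if size % 2 == 0:
        raise ValueError("size must be odd so the window has a centre pixel")
    half = size // 2
    values = []
    for r in range(row - half, row + half + 1):
        for c in range(col - half, col + half + 1):
            if 0 <= r < len(grid) and 0 <= c < len(grid[r]):
                if grid[r][c] is not None:
                    values.append(grid[r][c])
    if not values:
        return None
    return sum(values) / len(values)


# Made-up 3 x 3 grid, for illustration only (not data from the study).
demo_grid = [[1.0, 2.0, 3.0],
             [4.0, 5.0, 6.0],
             [7.0, 8.0, 9.0]]
centre_mean = window_mean(demo_grid, 1, 1, 3)   # full window
corner_mean = window_mean(demo_grid, 0, 0, 3)   # window clipped at the grid edge
```

At the 2-km resolution stated in the paper, a 58 km × 58 km window corresponds to 29 × 29 pixels and a 26 km × 26 km window to 13 × 13 pixels, so both have a well-defined center pixel.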
Specifications of Himawari-8 Advanced Himawari Imager (AHI) spectral bands.
Band | Descriptive Name | Central Wavelength (μm) | Spatial Resolution (km) | Primary Purpose |
---|---|---|---|---|
1 | Blue | 0.46 | 1.0 | Daytime aerosol over land, coastal water mapping |
2 | Green | 0.51 | 1.0 | Green band to produce color composite imagery |
3 | Red | 0.65 | 0.5 | Daytime vegetation/burn scar and aerosols over water, winds |
4 | Vegetation | 0.86 | 1.0 | Daytime cirrus cloud |
5 | Snow/ice | 1.61 | 2.0 | Daytime cloud-top phase and particle size, snow |
6 | Cloud particle size | 2.26 | 2.0 | Daytime land/cloud properties, particle size, vegetation, snow |
7 | Shortwave window | 3.85 | 2.0 | Surface and cloud, fog at night, fire, and winds |
8 | Upper-level water vapor | 6.25 | 2.0 | High-level atmospheric water vapor, winds, and rainfall |
9 | Mid-level water vapor | 6.95 | 2.0 | Mid-level atmospheric water vapor, winds, and rainfall |
10 | Lower-level/Mid-level water vapor | 7.35 | 2.0 | Lower-level atmospheric water vapor, winds, and SO2 |
11 | Cloud-top phase | 8.60 | 2.0 | Total water for stability, cloud phase, dust, SO2, and rainfall |
12 | O3 | 9.63 | 2.0 | Total ozone, turbulence, and winds |
13 | Clean longwave window | 10.45 | 2.0 | Surface and cloud |
14 | Longwave window | 11.20 | 2.0 | Imagery, sea surface temperature, clouds, and rainfall |
15 | Dirty longwave window | 12.35 | 2.0 | Total water, ash, and sea surface temperature |
16 | CO2 | 13.30 | 2.0 | Air temperature, cloud heights and amounts |
Parameters to set for determining the optimal RF model.
Parameter | Range | Interval |
---|---|---|
n-estimators | 50–400 | 50 |
max-features | 2–19 | 1 |
min-samples-split | 2–10 | 1 |
min-samples-leaf | 1–10 | 1 |
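The parameter ranges in the table above can be enumerated programmatically. The sketch below is only an illustration: the underscore names are an assumption (they match the argument names of scikit-learn's `RandomForestRegressor`, which the text here does not confirm was used), the baseline values in `DEFAULTS` are hypothetical, and it is not stated here whether the parameters were varied one at a time or over the full grid, so both options are shown.

```python
from itertools import product

# Ranges and intervals copied from the table above. The underscore names are
# an assumption; they match scikit-learn's RandomForestRegressor arguments.
PARAM_RANGES = {
    "n_estimators": range(50, 401, 50),
    "max_features": range(2, 20),
    "min_samples_split": range(2, 11),
    "min_samples_leaf": range(1, 11),
}

# Hypothetical baseline values, chosen only for illustration.
DEFAULTS = {
    "n_estimators": 100,
    "max_features": 2,
    "min_samples_split": 2,
    "min_samples_leaf": 1,
}


def one_at_a_time(ranges, defaults):
    """Yield settings that vary one parameter while the others stay at defaults."""
    for name, values in ranges.items():
        for value in values:
            setting = dict(defaults)
            setting[name] = value
            yield setting


def full_grid(ranges):
    """Yield every combination of the listed parameter values."""
    names = list(ranges)
    for combo in product(*(ranges[name] for name in names)):
        yield dict(zip(names, combo))


n_sweep = sum(1 for _ in one_at_a_time(PARAM_RANGES, DEFAULTS))
n_grid = sum(1 for _ in full_grid(PARAM_RANGES))
print(n_sweep, n_grid)  # prints: 45 12960
```

The one-at-a-time sweep needs 45 model fits, whereas the full grid needs 12,960, which is why the choice between the two matters for computational cost.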
Statistical parameters of training and validation data of two models at the daily and monthly time scale.
Time Scale | Data | Method | R | RMSE (Wm−2) | MBE (Wm−2) |
---|---|---|---|---|---|
Daily | Training Data | RF | 0.99 | 11.16 (5.83%) | −0.06 (−0.03%) |
Daily | Training Data | ANN | 0.90 | 41.09 (21.49%) | 1.46 (0.76%) |
Daily | Validation Data | RF | 0.92 | 35.38 (18.40%) | 0.01 (0.01%) |
Daily | Validation Data | ANN | 0.86 | 45.96 (23.90%) | 1.48 (0.77%) |
Monthly | All Data | RF | 0.99 | 7.74 (4.09%) | 0.03 (0.02%) |
Monthly | All Data | ANN | 0.93 | 20.09 (10.62%) | 1.81 (0.99%) |
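The three statistics reported in the table above (R, RMSE, and MBE) can be computed from paired estimates and ground measurements as in the following minimal Python sketch. It assumes that R is the Pearson correlation coefficient, that MBE is the mean of estimate minus measurement, and that the percentages in parentheses are relative to the mean ground measurement; these conventions are common but are assumptions here. The function name `evaluate` and the sample values are hypothetical.

```python
import math


def evaluate(estimated, measured):
    """Return R, RMSE, MBE and the relative RMSE and MBE in percent."""
    if len(estimated) != len(measured) or len(measured) < 2:
        raise ValueError("need two equally long series with at least 2 points")
    n = len(measured)
    mean_est = sum(estimated) / n
    mean_obs = sum(measured) / n
    diffs = [e - o for e, o in zip(estimated, measured)]

    # Mean bias error: average of (estimate minus measurement).
    mbe = sum(diffs) / n
    # Root mean square error.
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    # Pearson correlation coefficient.
    cov = sum((e - mean_est) * (o - mean_obs)
              for e, o in zip(estimated, measured))
    var_est = sum((e - mean_est) ** 2 for e in estimated)
    var_obs = sum((o - mean_obs) ** 2 for o in measured)
    r = cov / math.sqrt(var_est * var_obs)

    return {
        "R": r,
        "RMSE": rmse,
        "MBE": mbe,
        # Assumption: percentages are relative to the mean ground measurement.
        "RMSE_pct": 100.0 * rmse / mean_obs,
        "MBE_pct": 100.0 * mbe / mean_obs,
    }


# Made-up values in W m^-2, for illustration only (not data from the study).
measured = [100.0, 200.0, 300.0]
estimated = [110.0, 190.0, 310.0]
stats = evaluate(estimated, measured)
```

Under these assumptions, a daily validation RMSE of 35.38 Wm−2 reported as 18.40% would imply a mean measured RS of roughly 192 Wm−2.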
© 2020 by the authors.
Abstract
Downward shortwave radiation (RS) drives many processes related to atmosphere–surface interactions and has great influence on the earth’s climate system. However, ground-measured RS is insufficient to represent the land surface, so it remains critical to generate high-accuracy and spatially continuous RS data. This study applies the random forest (RF) method to estimate the RS from the Himawari-8 Advanced Himawari Imager (AHI) data from February to May 2016 with a 2-km spatial resolution and a one-day temporal resolution. The ground-measured RS at 86 stations of the Climate Data Center of the Chinese Meteorological Administration (CDC/CMA) is collected to evaluate the estimated RS data from the RF method. The evaluation results indicate that the RF method is capable of estimating the RS well at both the daily and monthly time scales. For the daily time scale, the evaluation results based on validation data show an overall R value of 0.92, a root mean square error (RMSE) value of 35.38 (18.40%) Wm−2, and a mean bias error (MBE) value of 0.01 (0.01%) Wm−2. For the estimated monthly RS, the overall R is 0.99, the RMSE is 7.74 (4.09%) Wm−2, and the MBE is 0.03 (0.02%) Wm−2 at the selected stations. The estimated RS data over China are also compared with the Clouds and Earth’s Radiant Energy System (CERES) Energy Balanced and Filled (EBAF) RS dataset. The comparison results indicate that the RS estimates from the RF method have comparable accuracy with the CERES–EBAF RS data over China but provide higher spatial and temporal resolution.
Details



1 State Key Laboratory of Remote Sensing Science, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China; Beijing Engineering Research Center for Global Land Remote Sensing Products, Institute of Remote Sensing, Science and Engineering, Faculty of Geographical Science, Beijing Normal University, Beijing 100875, China