This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

1. Introduction

Fine particulate matter (PM2.5) is particulate matter in ambient air with an aerodynamic diameter of less than 2.5 μm [1]. Haze forms when the PM2.5 concentration is too high, which has adverse impacts on human health, traffic, and outdoor activities [2] and also causes indirect economic losses that are difficult to estimate [3]. Therefore, many countries attach great importance to the monitoring and forecasting of PM2.5 concentration, and a large number of ground-based monitoring stations have been established. For example, 1500 monitoring stations have been set up in the United States. In China, around 1500 stations had been set up in 454 cities by 2018, and a new national ambient air quality standard for PM2.5 was introduced in 2012 [1, 2]. High PM2.5 concentration is generally regarded as a prominent challenge for air pollution control in China; it is mainly caused by the industrial combustion of coal and gasoline, traffic emissions, and long-distance transport [4, 5]. The North China Plain, especially the Beijing-Tianjin-Hebei region (Figure 1(a)), is one of the regions most severely affected by hazy weather [4, 6]. To monitor air pollution, many urban environmental stations have been built in this region, and many researchers have recently analyzed the causes and behavior of high PM2.5 concentration [3, 7].

[figures omitted; refer to PDF]

Many methods have been studied for analyzing PM2.5 concentration data, such as spatial interpolation of real-time data from monitoring points, weighted regression models, and mixed models [1, 8]. Most of these methods depend on complete and continuous data provided by local monitoring stations. However, problems arise when the original spatiotemporal PM2.5 concentration data are incomplete, which hinders further analysis and modelling, such as aerosol-related haze control and environmental health risk assessment [9, 10].

In practice, missing values and data gaps almost always exist in original spatiotemporal observation records, for various reasons. For example, in atmospheric research, satellite-based remote sensing may be affected by clouds, rain, aerosols, or incomplete track coverage [11, 12]; in situ observations from land-based stations, shipborne monitoring, offshore buoy stations, and other platforms may suffer from unexpected problems such as instrument malfunction, power supply failure, and Internet outage [10, 13]. Simply discarding incomplete spatiotemporal observation data should be considered carefully, for three reasons: some data acquisition platforms are expensive and irreplaceable (e.g., ocean research vessels and buoy stations), some applications place demanding requirements on data quality (e.g., coastal tidal gauge records), and ignoring missing values may sometimes lead to biased spatial patterns and invalid inferences [10, 13]. Thus, many temporal, spatial, and spatiotemporal data interpolation and imputation methods have been proposed to fill these gaps in records.

Simple methods commonly used to fill gaps in univariate time series include mean value substitution (or median or mode value substitution), polynomial interpolation (linear, piecewise polynomial, and spline interpolation), and last observation carried forward (LOCF), but they may result in large deviations when the time gaps are too large [14–17]. Statistical parametric models based on a Markovian process include autoregressive (AR) models, moving average (MA) models, ARMA models, and linearly or exponentially weighted MA. More complex machine learning techniques, which are computationally intensive, include gradient boosting and artificial neural networks (ANNs) [10, 18].
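As a minimal illustration of the simple gap-filling methods listed above, the following Python sketch applies mean substitution, linear interpolation, and LOCF to a short series with a two-sample gap. The function names and the toy series are illustrative assumptions and are not taken from the cited implementations.

```python
import numpy as np

def fill_mean(x):
    """Replace NaNs with the mean of the observed values."""
    x = np.asarray(x, dtype=float)
    out = x.copy()
    out[np.isnan(x)] = np.nanmean(x)
    return out

def fill_linear(x):
    """Linearly interpolate NaNs between neighbouring observed values."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    ok = ~np.isnan(x)
    return np.interp(idx, idx[ok], x[ok])

def fill_locf(x):
    """Last observation carried forward; leading NaNs stay NaN."""
    x = np.asarray(x, dtype=float)
    ok = ~np.isnan(x)
    # Index of the most recent valid sample at each position (-1 if none yet).
    last = np.maximum.accumulate(np.where(ok, np.arange(x.size), -1))
    return np.where(last >= 0, x[np.maximum(last, 0)], np.nan)

# Toy hourly series (hypothetical values) with a two-sample gap.
series = np.array([10.0, np.nan, np.nan, 40.0, 50.0])
filled_mean = fill_mean(series)      # gap replaced by the mean of 10, 40, 50
filled_linear = fill_linear(series)  # gap replaced by 20 and 30
filled_locf = fill_locf(series)      # gap replaced by 10 and 10
```

The three results differ noticeably even for this short gap, which is consistent with the remark that such methods may deviate strongly when gaps become long.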

There are also numerous spatial interpolation methods. Common simple methods include inverse distance weighting (IDW) interpolation [19], global polynomial interpolation (GPI), local polynomial interpolation (LPI) [20], surface spline (SS) interpolation [21], Cressman interpolation [22], and radial basis function (RBF) interpolation. Depending on the basis function, RBF methods include the thin plate spline (TPS), thin plate spline with tension, regularized spline, multiquadric spline, and inverse multiquadric spline. The TPS method does not require parameters to be set, while the other RBF methods do [23]. Some statistics-based methods (e.g., Kriging interpolation, optimal interpolation (OI), and the Kalman filter) are conventional and classical methods in geoscience [12, 13, 24–27].
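Of the methods above, IDW is the one used later in this paper, so a short sketch may help. The Python function below is a generic textbook form of IDW with a power parameter, assuming planar coordinates and Euclidean distance; the function name, coordinates, and values are hypothetical.

```python
import numpy as np

def idw(xy_obs, values, xy_target, power=2.0):
    """Inverse distance weighted estimate at xy_target from scattered observations."""
    xy_obs = np.asarray(xy_obs, dtype=float)
    values = np.asarray(values, dtype=float)
    d = np.linalg.norm(xy_obs - np.asarray(xy_target, dtype=float), axis=1)
    if np.any(d == 0):
        # Target coincides with an observation point: return that observation.
        return float(values[np.argmin(d)])
    w = d ** (-power)
    return float(np.sum(w * values) / np.sum(w))

# Two hypothetical stations on a line, target closer to the first one.
xy = [[0.0, 0.0], [2.0, 0.0]]
vals = [10.0, 30.0]
est_p1 = idw(xy, vals, [0.5, 0.0], power=1.0)  # weights 3:1
est_p2 = idw(xy, vals, [0.5, 0.0], power=2.0)  # weights 9:1
```

A larger power concentrates the weight on the nearest station, which is why the choice of power can matter differently at different sites.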

Numerous methods have been proposed to deal with spatiotemporal data containing missing values, and a considerable number of them are based on the empirical orthogonal function (EOF) (e.g., [28–31]). Compared with other methods, EOF-based methods have the advantages of ease of implementation and lower computational cost [32, 33].

EOF analysis is based on the theory of matrix eigenvalue decomposition. Its core step is to decompose the spatiotemporal matrix into a sum of space-dependent spatial modes multiplied by the corresponding time-dependent temporal modes. These EOF spatial and temporal modes can reveal inherent characteristics of the data or certain phenomena (e.g., ENSO) [13, 28]. EOF analysis is usually used for spatiotemporal data analysis, but it can also be used to fill missing data gaps.
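The decomposition described above can be sketched with a singular value decomposition, which is equivalent to the eigenvalue decomposition of the covariance matrix. The Python example below is a minimal sketch under assumed conventions: rows are stations, columns are times, and the temporal mean of each station is removed first. The function names and the synthetic data are illustrative only.

```python
import numpy as np

def eof_decompose(X):
    """Split a (stations x times) matrix into spatial and temporal modes."""
    mean = X.mean(axis=1, keepdims=True)          # temporal mean of each station
    U, s, Vt = np.linalg.svd(X - mean, full_matrices=False)
    spatial = U                                   # columns are spatial modes
    temporal = s[:, None] * Vt                    # rows are temporal modes
    var_frac = s**2 / np.sum(s**2)                # variance fraction of each mode
    return mean, spatial, temporal, var_frac

def eof_reconstruct(mean, spatial, temporal, k):
    """Sum of the first k spatial modes times their temporal modes, plus the mean."""
    return mean + spatial[:, :k] @ temporal[:k]

# Synthetic example (hypothetical): one dominant pattern plus small noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 4 * np.pi, 200)
pattern = np.array([1.0, 0.8, 0.5, -0.3])
X = 50 + np.outer(pattern, 20 * np.sin(t)) + rng.normal(0, 1, (4, 200))

mean, F, T, var_frac = eof_decompose(X)
X_all_modes = eof_reconstruct(mean, F, T, k=4)   # all modes: reproduces X
X_one_mode = eof_reconstruct(mean, F, T, k=1)    # leading mode only: smoothed field
```

Retaining all modes returns the original matrix, while truncating to the leading modes keeps the dominant pattern and discards most of the noise.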

One of the earliest applications of EOF interpolation is the reconstruction of global-scale sea surface temperature (SST) [28]. Based on gridded data (1982–1993) processed by OI, EOF decomposition was performed to obtain spatial modes; the temporal modes were then extended to a longer time period (1950–1992), when the data coverage was relatively poor, via the least squares method; finally, the SST data for the longer period were reconstructed. This work can be considered as another form of optimal interpolation [13, 34]. In 2003, Data INterpolating Empirical Orthogonal Functions (DINEOF), an iterated EOF interpolation method, was proposed to fill missing data gaps [30]. Based on the principle of EOF, DINEOF has been successfully used to reconstruct missing data and fill data gaps. Alvera-Azcárate et al. [32] reconstructed missing data of Adriatic sea surface temperature. Sirjacobs et al. [35] used DINEOF to reconstruct complete space-time information for 4 years of surface chlorophyll-a (CHL), total suspended matter, and SST over the Southern North Sea and the English Channel. However, DINEOF may fail if the data gaps are too large.

Similar in principle to DINEOF, EOF interpolation (EOFI) was proposed to reconstruct spatially continuous water levels in the Columbia River Estuary using a limited number of tide gauges along the river [36]. The main steps are as follows. First, the spatiotemporal data matrix of the existing observation stations along the river was decomposed with the EOF method. Then, Pan and Lv adopted one-dimensional linear interpolation and one-dimensional spline interpolation, respectively, to estimate the spatial modes of the missing data station. Finally, the EOFI reconstruction sequence was obtained by multiplying the estimated spatial modes by the corresponding temporal modes, and this reconstruction sequence was in good agreement with that of the NS_TIDE method. NS_TIDE is specially designed for the analysis of river tidal water levels and requires river flow discharge data [37].

Based on the research of Pan and Lv [36], this study attempts to extend the estimation of the missing data station’s EOFI spatial modes from one-dimensional to two-dimensional spatial interpolation. River sites are distributed almost one-dimensionally from upstream to downstream, and there is a strong correlation between the upstream and downstream water level records (e.g., when the water level upstream rises, the water level downstream generally rises). Therefore, it is reasonable to apply one-dimensional interpolation to connect the spatial modes of the observation stations with those of the missing data station. Compared with river water levels, the correlation among PM2.5 stations is not as strong or intuitive because the stations are distributed over a two-dimensional area. A simple way to establish a connection between variables that are distributed in two dimensions is IDW, so EOFI here uses IDW to estimate the spatial modes of the missing data station. Other spatial interpolation methods could also be used to establish the spatial mode relationships, but they are not discussed in this paper; we consider the simple case (IDW) to verify the usability of EOFI. To the best of our knowledge, EOFI has not previously been applied to PM2.5 concentration data reconstruction; therefore, we introduce this method to fill the data gaps and compare the results with IDW, surface spline (SS), and TPS interpolation. The competing methods chosen here are all widely used and easy to implement [38].

Compared with the widely used DINEOF and other EOF-based methods, the novelty of our method is that it deals with the case of sparsely distributed observation stations and a large proportion of missing values in some stations’ records. In this case, the data of a station with too many missing values are not suitable for EOF decomposition (DINEOF fills these gaps with first guess values and then uses the filled data for EOF decomposition), because including them would degrade the accuracy of the temporal and spatial modes. EOFI here uses only the observation data with a small proportion of missing values for decomposition; thus, the decomposed temporal and spatial modes are more accurate. Spatial interpolation is then applied to connect the spatial modes of the observation stations with those of the missing data station, and the reconstruction sequence with the optimal mode number is determined by the root mean square error (RMSE). The EOFI reconstruction sequence can also serve as a reasonable first guess for the missing data station in further EOF decomposition by other methods (e.g., DINEOF). In this way, the spatial mode patterns are taken into account to some extent. A further comparison between DINEOF and EOFI is given in the Discussion.
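The procedure outlined above (decompose the well-observed stations, interpolate the spatial modes to the missing data station with IDW, and pick the mode number by RMSE) can be sketched in Python as follows. This is a simplified sketch and not the authors' code: it assumes the observation matrix has no gaps, planar station coordinates, and that the station means are also interpolated by IDW, which the text does not specify. All names and the synthetic field are hypothetical.

```python
import numpy as np

def idw_weights(xy_obs, xy_target, power=1.0):
    """Normalized inverse distance weights of the observation stations."""
    d = np.linalg.norm(np.asarray(xy_obs, float) - np.asarray(xy_target, float), axis=1)
    w = np.maximum(d, 1e-12) ** (-power)       # guard against zero distance
    return w / w.sum()

def eofi(X_obs, xy_obs, xy_target, y_valid, power=1.0):
    """Return (optimal mode number, RMSE, reconstruction) for the target station."""
    mean = X_obs.mean(axis=1, keepdims=True)
    U, s, Vt = np.linalg.svd(X_obs - mean, full_matrices=False)
    temporal = s[:, None] * Vt                 # temporal modes, shared by all stations
    w = idw_weights(xy_obs, xy_target, power)
    F_tilde = w @ U                            # IDW estimate of the target's spatial modes
    mean_tilde = float(w @ mean[:, 0])         # assumption: means are IDW-interpolated too
    ok = ~np.isnan(y_valid)                    # compare only against valid target records
    best = None
    for k in range(1, s.size + 1):
        rec = mean_tilde + F_tilde[:k] @ temporal[:k]
        err = float(np.sqrt(np.mean((rec[ok] - y_valid[ok]) ** 2)))
        if best is None or err < best[1]:
            best = (k, err, rec)
    return best

# Synthetic field (hypothetical): two patterns that vary smoothly in space.
rng = np.random.default_rng(1)
t = np.linspace(0, 8 * np.pi, 400)

def field(x, y):
    return 60 + (1 + 0.2 * x) * 20 * np.sin(t) + 0.5 * y * 10 * np.cos(3 * t)

xy_obs = np.array([[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]], dtype=float)
X_obs = np.array([field(x, y) for x, y in xy_obs]) + rng.normal(0, 1, (5, 400))
xy_target = np.array([0.5, 0.5])
y_valid = field(0.5, 0.5) + rng.normal(0, 1, 400)
y_valid[50:90] = np.nan                        # a gap in the target record

k_opt, rmse_opt, y_rec = eofi(X_obs, xy_obs, xy_target, y_valid)
```

In this particular sketch, retaining all modes reduces to plain IDW of the station records, so any difference from IDW comes from truncating to the leading modes.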

The paper is arranged as follows: Section 2.1 describes the study area and data. Then, we revisit the principle of EOF decomposition and introduce IDW, EOFI, TPS, and SS. The evaluation indices of these methods are also given in Section 2. Four methods (IDW, EOFI, TPS, and SS) are applied to reconstruct the PM2.5 concentration records of two stations, and the results are compared with the corresponding valid observations in Section 3. The inverse distance weighting power P of EOFI, the impact of the number of sites and the data time length on the EOFI reconstruction, and the comparison between DINEOF and EOFI are discussed and analyzed in Section 4. Finally, we present the advantages and disadvantages of EOFI in Section 5.

2. Materials and Methods

2.1. Study Area and Data

There are 14 monitoring stations (Figure 1(b)) located in Tianjin. These stations are distributed in different regions of the city: some are located in the urban area (e.g., stations 1, 2, and 3), while others are near the Bohai Sea (e.g., stations 10, 11, and 13). The PM2.5 concentration data provided by these monitoring stations come from the China National Environmental Monitoring Center (CNEMC). The CNEMC releases near real-time PM2.5 concentration data online, but there is no direct data download interface [10]. Bai et al. [10] used web crawler technology to obtain PM2.5 concentration data for many cities from 2014 to 2019. Our data source and acquisition method are the same. In this study, some of the stations provided hourly PM2.5 data throughout 2015, except for the first 25 hours, from 0:00 on January 1st to 0:00 on January 2nd. Thus, the total record length is 8735 hours (out of 8760 hours in 2015). The first 25 hours may be missing because the web crawler failed or because the CNEMC did not release the data for that period. Figure 2 shows the original observation records of several stations used in this study. Among them, the PM2.5 concentration data of station (sta) 1 and station (sta) 8 for the first half of the year are reconstructed and compared with the corresponding valid records (Figure 2 (1 and 7)). There are no observed data at sta 1 and sta 8 from 23:00 on June 30th to the end of the year (nearly six months). In addition, Bai et al. [10] mentioned that some monitoring stations across China stopped releasing PM2.5 observations in the middle of 2015, and consequently, observations at these stations for the second half of 2015 are missing. This is exactly the case at sta 1 and sta 8 in Tianjin. At sta 1, 10.70% of the data in the first half of the year are missing, and the percentage of missing data for the nearly whole-year record is 55.86% (Figure 2 (1)). At sta 8, the proportions of missing data for the first half of the year and for the nearly whole-year record are 9.59% and 55.31%, respectively (Figure 2 (7)). Thus, there are still nearly 400 missing values in the first half of the year at both sta 1 and sta 8.
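The record length quoted above follows from simple arithmetic, sketched here in Python. The missing-hour counts are only approximate, because they are derived from the rounded whole-record percentages given in the text; the variable names are illustrative.

```python
hours_2015 = 365 * 24                 # 2015 is not a leap year
missing_head = 25                     # first 25 hours are unavailable
record_length = hours_2015 - missing_head

# Approximate missing-hour counts implied by the quoted whole-record percentages.
missing_sta1 = round(0.5586 * record_length)
missing_sta8 = round(0.5531 * record_length)
```

Under these figures, roughly 4879 and 4831 of the 8735 hourly values are missing at sta 1 and sta 8, respectively.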

[figure omitted; refer to PDF]

Figures 4 and 5 depict the reconstruction sequences of the four interpolation methods and their residuals for sta 1 and sta 8, respectively. Both power parameters (P = 1 and P = 2) were tested for the IDW and EOFI reconstructions at sta 1 and sta 8; the indices show that P = 1 is more accurate for IDW and EOFI at sta 1, while P = 2 is more accurate at sta 8. The optimal mode number for EOF reconstruction is three at both sta 1 and sta 8. We try to explain the reasons for this in the Result Evaluation and Discussion parts. All four methods can roughly reproduce the valid records at sta 1 and sta 8. At sta 1 (Figure 4), the residuals of the four interpolation methods all fluctuate around 0, but at several times the reconstructions differ considerably from the observed values. For example, they all show errors of more than 100 μg/m³ around February 20th and in mid-March. Setting aside instrument failure and other factors, the large errors at these times may indicate that the PM2.5 concentration varies greatly among different regions of the same city, in which case relying only on data from adjacent stations is not accurate. At sta 8 (Figure 5), the situation is similar, but the fluctuation magnitude of the residual sequence is significantly larger than at sta 1, and large residuals occur more frequently. The performance of the four methods at sta 8 is generally worse than at sta 1.

[figure omitted; refer to PDF]

Table 4 compares the performance (in terms of RMSE) of the reconstruction sequences of the four interpolation methods. Among the 21 experiments, the RMSE of the EOFI reconstruction is the smallest in 19 experiments at sta 1 and in 13 experiments at sta 8. In the remaining 8 groups at sta 8, SS performs best in terms of RMSE, and these groups mainly cover the winter months of January, February, and March. We infer that this is due to the large differences in PM2.5 concentration among sites in winter, which make the spatial and temporal modes less accurate than in other seasons.

Table 4

RMSE of four interpolation methods in sta 1 and sta 8.

Group	Month(s)	Sta 1 IDW	Sta 1 EOFI	Sta 1 TPS	Sta 1 SS	Sta 8 IDW	Sta 8 EOFI	Sta 8 TPS	Sta 8 SS
E1	1	14.562	14.547 $^{*}$	23.675	15.878	36.031	35.751	38.931	35.458 $^{*}$
E1	2	14.629	14.613	20.924	13.780 $^{*}$	27.065	26.984	28.915	26.103 $^{*}$
E1	3	11.123 $^{*}$	11.123 $^{*}$	13.989	11.380	26.376	26.040	27.469	25.815 $^{*}$
E1	4	15.001	14.772 $^{*}$	22.446	15.689	22.207 $^{*}$	22.207 $^{*}$	24.905	22.820
E1	5	10.286	10.273 $^{*}$	16.905	10.968	22.399	22.313 $^{*}$	27.000	24.248
E1	6	16.229	16.029 $^{*}$	23.275	16.749	32.065	31.596 $^{*}$	36.881	33.100
E2	1–2	14.596	14.557 $^{*}$	22.308	14.840	31.889	31.688	34.317	31.159 $^{*}$
E2	2–3	12.980	12.908	17.769	12.627 $^{*}$	26.723	26.519	28.203	25.960 $^{*}$
E2	3–4	13.120	12.901 $^{*}$	18.518	13.610	24.371	24.339 $^{*}$	26.212	24.357
E2	4–5	12.732	12.650 $^{*}$	19.716	13.406	22.303 $^{*}$	22.303 $^{*}$	25.965	23.539
E2	5–6	13.545	13.478 $^{*}$	20.296	14.116	27.549	27.215 $^{*}$	32.208	28.913
E3	1–3	13.507	13.459 $^{*}$	19.855	13.753	30.175	29.934	32.211	29.497 $^{*}$
E3	2–4	13.650	13.508 $^{*}$	19.368	13.668	25.301	25.273	27.143	24.953 $^{*}$
E3	3–5	12.209	12.178 $^{*}$	17.973	12.752	23.737	23.692 $^{*}$	26.475	24.321
E3	4–6	14.016	13.913 $^{*}$	20.991	14.625	25.846	25.758 $^{*}$	29.911	26.984
E4	1–4	13.878	13.791 $^{*}$	20.503	14.239	28.390	28.262	30.545	27.974 $^{*}$
E4	2–5	12.858	12.791 $^{*}$	18.757	13.018	24.614	24.594 $^{*}$	27.108	24.780
E4	3–6	13.340	13.262 $^{*}$	19.450	13.872	25.981	25.837 $^{*}$	29.312	26.693
E5	1–5	13.204	13.114 $^{*}$	19.800	13.616	27.310	27.196 $^{*}$	29.878	27.278
E5	2–6	13.607	13.542 $^{*}$	19.754	13.853	26.204	26.142 $^{*}$	29.232	26.574
E6	1–6	13.764	13.691 $^{*}$	20.432	14.197	28.119	27.985 $^{*}$	31.096	28.283

The power P of IDW and EOFI follows the analysis in Section 4.1 (i.e., $P = 1$ for sta 1 and $P = 2$ for sta 8), and the EOFI reconstruction uses the optimal mode number. “ $^{*}$ ” marks the smallest RMSE in each experiment for each station.
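For reference, the RMSE used in Table 4 and the selection of the best method for one experiment can be sketched in Python as below. The `rmse` helper is a standard definition that skips missing samples (an assumption about how gaps are handled), and the two dictionaries simply copy the E6 row (months 1–6) of Table 4.

```python
import numpy as np

def rmse(reconstructed, observed):
    """Root mean square error over samples where both series are valid."""
    r = np.asarray(reconstructed, dtype=float)
    o = np.asarray(observed, dtype=float)
    ok = ~np.isnan(r) & ~np.isnan(o)
    return float(np.sqrt(np.mean((r[ok] - o[ok]) ** 2)))

# RMSE values of experiment E6 (months 1-6), copied from Table 4.
e6_sta1 = {"IDW": 13.764, "EOFI": 13.691, "TPS": 20.432, "SS": 14.197}
e6_sta8 = {"IDW": 28.119, "EOFI": 27.985, "TPS": 31.096, "SS": 28.283}

best_sta1 = min(e6_sta1, key=e6_sta1.get)   # method with the smallest RMSE at sta 1
best_sta8 = min(e6_sta8, key=e6_sta8.get)   # method with the smallest RMSE at sta 8
```

For this experiment, EOFI has the smallest RMSE at both stations, matching the starred entries in the table.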

4.3. Comparison between EOFI and DINEOF

There have been many EOF-based interpolation methods (e.g., DCCEOF in [10], EOFI in [36], and VE-DINEOF in [40]). One of the most widely utilized methods is the iterated EOF method, DINEOF [30]. Therefore, it is necessary to compare DINEOF and EOFI in this study.

First of all, both methods are based on the matrix eigenvalue decomposition theory, and both assume that short gaps in the original spatiotemporal observation records will not significantly affect the dominant temporal and spatial modes. Moreover, first guess values are assigned to the missing values to enable matrix decomposition. The optimal mode number is determined by calculating the RMSE and other indicators, and the corresponding temporal and spatial modes are used for the final reconstruction.

However, the most significant difference between DINEOF and EOFI is the original data used for matrix decomposition. In EOFI, the data of sta 1 and sta 8 (for which the second half of the year is missing) are not included in the decomposed matrix. In DINEOF, the data of sta 1 and sta 8 are included in the EOF decomposition: the missing values are first replaced with first guess values, and matrix decomposition and replacement are then iterated until convergence. This step may not be suitable for data from a small number of stations, because the first guess values at the stations with missing data may greatly affect the accuracy of the temporal and spatial modes. Even if convergent temporal and spatial modes are eventually obtained through iteration, the computational resources consumed may be large. Alvera-Azcárate et al. [32] mentioned that data points with more than 95% of values missing are removed before decomposition because they cannot provide effective information. The number of data points involved in their decomposition is very large; therefore, removing these less informative points has little impact on the final results. DINEOF has been widely used to reconstruct gap-free satellite images, for which remote sensing provides numerous, densely sampled observations. On other platforms (e.g., the land-based PM2.5 stations in this study and offshore buoy station arrays), where observations are relatively few and sparsely sampled, the temporal and spatial modes of iterated EOF methods may not be accurate when the observation data matrix of the few sites contains a large proportion of missing values.
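To make the iterative replacement described above concrete, the following Python sketch fills gaps with a first guess, performs a truncated decomposition, replaces the missing entries with the reconstruction, and repeats until the filled values stop changing. It is a simplified illustration in the spirit of DINEOF, not the reference implementation: the mode number `k` is fixed rather than chosen by cross-validation, a single global mean is removed for simplicity, and all names and data are hypothetical.

```python
import numpy as np

def iterated_eof_fill(X, k, max_iter=200, tol=1e-6):
    """Fill NaNs in a (stations x times) matrix by iterated truncated EOF reconstruction."""
    X = np.asarray(X, dtype=float)
    miss = np.isnan(X)
    mean = np.nanmean(X)                      # global mean of observed values (assumption)
    A = np.where(miss, 0.0, X - mean)         # first guess: zero anomaly at missing points
    for _ in range(max_iter):
        U, s, Vt = np.linalg.svd(A, full_matrices=False)
        rec = (U[:, :k] * s[:k]) @ Vt[:k]     # reconstruction from the first k modes
        change = np.sqrt(np.mean((rec[miss] - A[miss]) ** 2)) if miss.any() else 0.0
        A[miss] = rec[miss]                   # update only the missing entries
        if change < tol:
            break
    return A + mean

# Synthetic test (hypothetical): a single spatial pattern with short gaps.
t = np.linspace(0, 4 * np.pi, 100)
pattern = np.array([1.0, 0.8, 0.6, 0.9, 0.7, 1.1])
X_true = 50 + np.outer(pattern, 20 * np.sin(t))
X_gappy = X_true.copy()
X_gappy[1, 10:15] = np.nan
X_gappy[3, 40:45] = np.nan
X_gappy[4, 70:72] = np.nan

X_filled = iterated_eof_fill(X_gappy, k=1)
max_err = float(np.max(np.abs(X_filled - X_true)))
```

With short gaps and many valid samples the iteration recovers the missing values closely; when a station is missing most of its record, as at sta 1 and sta 8, the first guess carries far more weight, which is the concern raised above.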

Therefore, for observation records from a limited number of stations, if we want to make full use of the data of a station with a large proportion of missing values, EOFI may be more suitable. The advantage of EOFI here is that it obtains more reasonable spatial and temporal modes by excluding the records of stations with a large percentage of missing values before EOF decomposition. All stations share the same time-dependent temporal modes, while the space-dependent spatial modes of the missing data station are estimated by spatial interpolation (IDW is used in this study), so that the spatial mode features and patterns are taken into account. In addition, EOFI can provide more reasonable first guess values for the stations with missing data, after which DINEOF can be used to iterate until convergence. The other differences can be reconciled: DINEOF uses iterative decomposition, which EOFI could also adopt; and DINEOF randomly selects part of the observation data as cross-validation points, while EOFI here uses the valid first-half-year and monthly records of sta 1 and sta 8 as check points.

5. Conclusion

In this paper, two-dimensional EOFI is introduced and applied to reconstruct spatially distributed PM2.5 data, as an extension of the one-dimensional EOFI used in river water level reconstruction. The main step of EOFI here is to calculate the estimated spatial modes $\tilde{F}$ of the missing data station by IDW interpolation of the spatial modes of the observation sites and then to multiply $\tilde{F}$ by the corresponding temporal modes to obtain the EOFI reconstruction sequence; the optimal mode number of the EOFI reconstruction is determined by minimizing the RMSE. Compared with the other three interpolation methods (IDW, TPS, and SS), the quantitative indices show that EOFI can improve the interpolation effect. The conclusions are as follows.

TPS and SS have fixed function forms, and their coefficient matrices are space-dependent. The advantage of EOFI is that the spatiotemporal matrix is decomposed into time-dependent temporal modes and space-dependent spatial modes under the EOF assumption. Observation stations and missing data stations share the same temporal modes, while the spatial modes of the missing data station are estimated by IDW of the observation stations’ spatial modes. The benefit of IDW is that when the missing data station is very close to an observation station, the spatial mode estimated by IDW is very close to that of the observation station; thus, the EOFI reconstruction sequence of the missing data station is also close to the data of that observation station, which is consistent with intuition. More essentially, the IDW weights of neighboring points are generated by a statistical estimate of the covariance between the observation points. TPS and SS weights do not depend on the statistical features of the interpolated fields. EOFI reduces MAE and RMSE compared with the other three methods, and the other indices also show that EOFI performs better. This shows that EOFI with the optimal modes can improve the interpolation effect. The results of several experimental groups with different data lengths show that the dominant spatial modes of the EOF decomposition hardly change with the time length, which is consistent with the EOF assumption that the spatial modes are independent of time. At the same time, the RMSE of the EOFI reconstruction with the optimal mode number still shows an advantage over the other three methods.
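The limiting behavior of the IDW step described above can be written out explicitly. In the sketch below, $\tilde{F}$ and $P$ follow the text, while $d_i$ (distance from observation station $i$ to the missing data station), $w_i$ (IDW weight), $F_{k,i}$ (the $k$th spatial mode at station $i$), $T_k(t)$ (the $k$th temporal mode), $N$ (number of observation stations), and $K$ (number of retained modes) are notation assumed here for illustration; the handling of the temporal mean is omitted.

$$
w_i = \frac{d_i^{-P}}{\sum_{j=1}^{N} d_j^{-P}}, \qquad
\tilde{F}_k = \sum_{i=1}^{N} w_i F_{k,i}, \qquad
\tilde{X}(t) = \sum_{k=1}^{K} \tilde{F}_k T_k(t).
$$

As $d_m \to 0$ for some station $m$, $w_m \to 1$ and all other weights tend to 0, so $\tilde{F}_k \to F_{k,m}$ and the reconstruction $\tilde{X}(t)$ approaches the mode-truncated record of station $m$.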

The proposed method is suitable for interpolation when observations are few and sparsely distributed and some stations’ original records contain a large percentage of missing values. The EOFI reconstruction sequence of a missing data station can serve as a reasonable first guess for further DINEOF (or other iterated EOF-based method) steps.

EOFI has the advantages of low computational cost, few parameter choices, and ease of implementation, and it can be extended to fill missing data gaps in other physical variables distributed in two-dimensional space. The limitation of EOFI is that the temporal and spatial gaps of the missing values should not be too large; otherwise, the accuracy of the spatial and temporal modes will be affected. In addition, the quality of the original data has an impact on the reconstruction results: high-quality and complete observation data produce more accurate spatial and temporal modes, which is conducive to EOFI reconstruction.

Acknowledgments

The authors would like to thank Professor Yang Gao for providing the PM2.5 concentration data. This work was supported by the National Natural Science Foundation of China (Grant no. 41876003) and the National Key Research and Development Program of China (Grant nos. 2017YFA0604101 and 2016YFC1401404).

References

[1] S. Zhai, D. J. Jacob, X. Wang, "Fine particulate matter (PM2.5) trends in China, 2013–2018: separating contributions from anthropogenic emissions and meteorology," Atmospheric Chemistry and Physics, vol. 19, pp. 11031-11041, DOI: 10.5194/acp-19-11031-2019, 2019.

[2] S. Gautam, A. K. Patra, P. Kumar, "Status and chemical characteristics of ambient PM2.5 pollutions in China: a review," Environment, Development and Sustainability, DOI: 10.1007/s10668-018-0123-1, 2018.

[3] H. Shi, S. Wang, J. Li, L. Zhang, "Modeling the impacts of policy measures on resident’s PM2.5 reduction behavior: an agent-based simulation analysis," Environmental Geochemistry and Health, vol. 1, DOI: 10.1007/s10653-019-00397-1, 2019.

[4] Y. Li, J. Wang, C. Chen, Y. Chen, J. Li, "Estimating PM2.5 in the Beijing-Tianjin-Hebei region using MODIS AOD products from 2014 to 2015," The International Archives of the Photogrammetry, Remote Sensing and Spatial Information Sciences, vol. 41, pp. 721-727, DOI: 10.5194/isprsarchives-XLI-B2-721-2016, 2016.

[5] X. Liu, Coauthors, "Fine particulate matter pollution in North China: seasonal-spatial variations, source apportionment, sector and regional transport contributions," Environmental Research, vol. 184, DOI: 10.1016/j.envres.2020.109368, 2020.

[6] J. Feng, J. Quan, H. Liao, Y. Li, X. Zhao, "An air stagnation index to qualify extreme haze events in northern China," Journal of the Atmospheric Sciences, vol. 75, pp. 3489-3505, DOI: 10.1175/jas-d-17-0354.1, 2018.

[7] X. Wu, Y. Chen, J. Guo, G. Wang, Y. Gong, "Spatial concentration, impact factors and prevention-control measures of PM2.5 pollution in China," Natural Hazards, vol. 86, pp. 393-410, DOI: 10.1007/s11069-016-2697-y, 2017.

[8] L. Zhou, C. Zhou, F. Yang, L. Che, B. Wang, D. Sun, "Spatio-temporal evolution and the influencing factors of PM2.5 in China between 2000 and 2015," Journal of Geographical Sciences, vol. 29, pp. 253-270, DOI: 10.1007/s11442-019-1595-0, 2019.

[9] P. Yin, Coauthors, "Higher risk of cardiovascular disease associated with smaller size-fractioned particulate matter," Environmental Science & Technology Letters, vol. 7, pp. 95-101, DOI: 10.1021/acs.estlett.9b00735, 2020.

[10] K. Bai, K. Li, J. Guo, Y. Yang, N.-B. Chang, "Filling the gaps of in situ hourly PM2.5 concentration data with the aid of empirical orthogonal function analysis constrained by diurnal cycles," Atmospheric Measurement Techniques, vol. 13, pp. 1213-1226, DOI: 10.5194/amt-13-1213-2020, 2020.

[11] A. Alvera-Azcárate, A. Barth, D. Sirjacobs, F. Lenartz, J. M. Beckers, "Data interpolating empirical orthogonal functions (DINEOF): a tool for geophysical data analyses," Mediterranean Marine Science, vol. 12, DOI: 10.12681/mms.64, 2011.

[12] D. Kondrashov, M. Ghil, "Spatio-temporal filling of missing points in geophysical data sets," Nonlinear Processes in Geophysics, vol. 13, pp. 151-159, DOI: 10.5194/npg-13-151-2006, 2006.

[13] J. Elken, M. Zujev, J. She, P. Lagemaa, "Reconstruction of large-scale sea surface temperature and salinity fields using sub-regional EOF patterns from models," Frontiers in Earth Science, vol. 7, DOI: 10.3389/feart.2019.00232, 2019.

[14] L. Feng, G. Nowak, T. J. O. Neill, A. H. Welsh, "CUTOFF: a spatio-temporal imputation method," Journal of Hydrology, vol. 519, pp. 3591-3605, DOI: 10.1016/j.jhydrol.2014.11.012, 2014.

[15] S. Moritz, T. Bartz-Beielstein, "ImputeTS: time series missing value imputation in R," The R Journal, vol. 9, pp. 207-218, DOI: 10.32614/rj-2017-009, 2017.

[16] M. W. Beck, N. Bokde, G. Asencio-Cortés, K. Kulat, "R package imputetestbench to compare imputation methods for Univariate time series," The R Journal, vol. 10, pp. 218-233, DOI: 10.32614/rj-2018-024, 2018.

[17] N. Bokde, M. W. Beck, F. Martínez Álvarez, K. Kulat, "A novel imputation methodology for time series based on pattern sequence forecasting," Pattern Recognition Letters, vol. 116, pp. 88-96, DOI: 10.1016/j.patrec.2018.09.020, 2018.

[18] M. Lepot, J. B. Aubin, F. H. L. R. Clemens, "Interpolation in time series: an introductive overview of existing methods, their performance criteria and uncertainty assessment," Water (Switzerland), vol. 9, DOI: 10.3390/w9100796, 2017.

[19] G. Y. Lu, D. W. Wong, "An adaptive inverse-distance weighting spatial interpolation technique," Computers & Geosciences, vol. 34, pp. 1044-1055, DOI: 10.1016/j.cageo.2007.07.010, 2008.

[20] Y. Chen, X. Shan, X. Jin, T. Yang, F. Dai, D. Yang, "A comparative study of spatial interpolation methods for determining fishery resources density in the Yellow Sea," Acta Oceanologica Sinica, vol. 35 no. 12, pp. 65-72, DOI: 10.1007/s13131-016-0966-y, 2016.

[21] X. Zong, M. Xu, J. Xu, X. Lv, "Improvement of the ocean pollutant transport model by using the surface spline interpolation," Tellus A: Dynamic Meteorology and Oceanography, vol. 70, DOI: 10.1080/16000870.2018.1481689, 2018.

[22] G. P. Cressman, "An operational objective analysis system," Monthly Weather Review, vol. 87, pp. 367-374, DOI: 10.1175/1520-0493(1959)087<0367:AOOAS>2.0.CO;2, 1959.

[23] S. Chen, C. F. N. Cowan, P. M. Grant, "Orthogonal least squares learning algorithm for radial basis function networks," IEEE Transactions on Neural Networks and Learning Systems, vol. 2, pp. 302-309, DOI: 10.1109/72.80341, 1991.

[24] J. P. C. Kleijnen, "Kriging metamodeling in simulation: a review," European Journal of Operational Research, vol. 192, pp. 707-716, DOI: 10.1016/j.ejor.2007.10.013, 2009.

[25] Y. C. Fang, T. J. Weingartner, R. A. Potter, P. R. Winsor, H. Statscewich, "Quality assessment of HF radar-derived surface currents using optimal interpolation," Journal of Atmospheric and Oceanic Technology, vol. 32, pp. 282-296, DOI: 10.1175/JTECH-D-14-00109.1, 2015.

[26] Z. H. Liu, R. G. Huang, Y. M. Hu, S. D. Fan, P. H. Feng, "Generating high spatiotemporal resolution LAI based on MODIS/GF-1 data and combined Kriging-Cressman interpolation," International Journal of Agricultural and Biological Engineering, vol. 9, pp. 120-131, DOI: 10.3965/j.ijabe.20160905.1777, 2016.

[27] G. Burgers, P. J. Van Leeuwen, G. Evensen, "Analysis scheme in the ensemble Kalman filter," Monthly Weather Review, vol. 126, pp. 1719-1724, DOI: 10.1175/1520-0493(1998)126<1719, 1998.

[28] T. M. Smith, R. W. Reynolds, R. E. Livezey, D. C. Stokes, "Reconstruction of historical Sea surface temperatures using empirical orthogonal functions," Journal of Climate, vol. 9, pp. 1403-1420, DOI: 10.1175/1520-0442(1996)009<1403:rohsst>2.0.co;2, 1996.

[29] K. Y. Kim, "Statistical interpolation using cyclostationary EOFs," Journal of Climate, vol. 10, pp. 2931-2942, DOI: 10.1175/1520-0442(1997)010<2931:siuce>2.0.co;2, 1997.

[30] J.-M. Beckers, M. Rixen, "EOF calculations and data filling from Incomplete Oceanographic Datasets," Journal of Atmospheric and Oceanic Technology, vol. 20, pp. 1839-1856, DOI: 10.1175/1520-0426(2003)020<1839:ecadff>2.0.co;2, 2003.

[31] C. Jayaram, N. Priyadarshi, J. Pavan Kumar, T. V. S. Udaya Bhaskar, D. Raju, A. J. Kochuparampil, "Analysis of gap-free chlorophyll-a data from MODIS in Arabian Sea, reconstructed using DINEOF," International Journal of Remote Sensing, vol. 39, pp. 7506-7522, DOI: 10.1080/01431161.2018.1471540, 2018.

[32] A. Alvera-Azcárate, A. Barth, M. Rixen, J. M. Beckers, "Reconstruction of incomplete oceanographic data sets using empirical orthogonal functions: application to the Adriatic Sea surface temperature," Ocean Modelling, vol. 9, pp. 325-346, DOI: 10.1016/j.ocemod.2004.08.001, 2005.

[33] Y. C. Liang, M. R. Mazloff, I. Rosso, S. W. Fang, J. Y. Yu, "A multivariate empirical orthogonal function method to construct nitrate maps in the Southern Ocean," Journal of Atmospheric and Oceanic Technology, vol. 35, pp. 1505-1519, DOI: 10.1175/jtech-d-18-0018.1, 2018.

[34] Z. Zhang, X. Yang, H. Li, W. Li, H. Yan, F. Shi, "Application of a novel hybrid method for spatiotemporal data imputation: a case study of the Minqin County groundwater level," Journal of Hydrology, vol. 553, pp. 384-397, DOI: 10.1016/j.jhydrol.2017.07.053, 2017.

[35] D. Sirjacobs, A. Alvera-Azcárate, A. Barth, "Cloud filling of ocean colour and sea surface temperature remote sensing products over the Southern North Sea by the Data Interpolating Empirical Orthogonal Functions methodology," Journal of Sea Research, vol. 65, pp. 114-130, DOI: 10.1016/j.seares.2010.08.002, 2011.

[36] H. Pan, X. Lv, "Reconstruction of spatially continuous water levels in the Columbia River estuary: the method of empirical orthogonal function revisited," Estuarine, Coastal and Shelf Science, vol. 222, pp. 81-90, DOI: 10.1016/j.ecss.2019.04.011, 2019.

[37] P. Matte, D. A. Jay, E. D. Zaron, "Adaptation of classical tidal harmonic analysis to nonstationary tides, with application to river tides," Journal of Atmospheric and Oceanic Technology, vol. 30 no. 3, pp. 569-589, DOI: 10.1175/JTECH-D-12-00016.1, 2013.

[38] J. Li, A. D. Heap, "A review of comparative studies of spatial interpolation methods in environmental sciences: performance and impact factors," Ecological Informatics, vol. 6, pp. 228-241, DOI: 10.1016/j.ecoinf.2010.12.003, 2011.

[39] E. N. Lorenz, Empirical Orthogonal Functions and Statistical Weather Prediction, 1956.

[40] B. Ping, F. Su, Y. Meng, "An improved DINEOF algorithm for filling missing values in spatio-temporal sea surface temperature data," PLoS One, vol. 11, DOI: 10.1371/journal.pone.0155928, 2016.

[41] W. R. Tobler, "A computer movie simulating urban growth in the Detroit region," Economic Geography, vol. 46, pp. 234-240, DOI: 10.2307/143141, 1970.

[42] J. Duchon, "Splines minimizing rotation-invariant semi-norms in Sobolev spaces," Constructive Theory of Functions of Several Variables, pp. 85-100, 1977.

[43] F. L. Bookstein, "Principal warps: thin-plate splines and the decomposition of deformations," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, pp. 567-585, DOI: 10.1109/34.24792, 1989.

[44] Z. Guo, H. Pan, W. Fan, X. Lv, "Application of surface spline interpolation in inversion of bottom friction coefficients," Journal of Atmospheric and Oceanic Technology, vol. 34, pp. 2021-2028, DOI: 10.1175/JTECH-D-17-0012.1, 2017.

[45] J. E. Nash, J. V. Sutcliffe, "River flow forecasting through conceptual models. Part 1 — a discussion of principles," Journal of Hydrology, vol. 10, pp. 282-290, DOI: 10.1016/0022-1694(70)90255-6, 1970.

[46] C. J. Willmott, "On the validation of models," Physical Geography, vol. 2, pp. 184-194, DOI: 10.1080/02723646.1981.10642213, 1981.

[47] X. Wang, R. R. E. Dickinson, L. Su, C. Zhou, K. Wang, "PM2.5 pollution in China and how it has been exacerbated by terrain and meteorological conditions," Bulletin of the American Meteorological Society, vol. 99, pp. 105-120, DOI: 10.1175/BAMS-D-16-0301.1, 2018.


Copyright © 2020 Hongwu Zhou et al. This is an open access article distributed under the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. https://creativecommons.org/licenses/by/4.0/

Abstract

Monitoring the concentration of fine particulate matter with diameters less than 2.5 μm (PM2.5) is closely related to public health, outdoor activities, environmental protection, and other fields. However, the incomplete PM2.5 observation records provided by ground-based monitoring stations pose a challenge to the study of PM2.5 propagation and evolution models. Consequently, PM2.5 concentration data imputation has been widely studied. In this paper, a new spatiotemporal interpolation method based on the empirical orthogonal function (EOF), EOF interpolation (EOFI), is introduced and applied to reconstruct the hourly PM2.5 concentration records of two stations in the first half of the year. EOFI proceeds in three steps. First, the spatiotemporal data matrix of the observation sites is decomposed into mutually orthogonal temporal and spatial modes with the EOF method. Second, the spatial mode at the station with missing data is estimated by inverse distance weighting interpolation of the spatial modes at the observation sites. Third, the records of the station with missing data are reconstructed by multiplying the estimated spatial mode by the corresponding temporal mode. The optimal number of modes for EOFI is determined by minimizing the root mean square error (RMSE) between the reconstructed records and the corresponding valid records. Finally, six evaluation indices are calculated: mean absolute error (MAE), RMSE, correlation coefficient (Corr), deviation rate (bias), Nash–Sutcliffe efficiency (NSE), and index of agreement (IA). The results show that EOFI performs better than three other interpolation methods, namely, inverse distance weighting interpolation, thin plate spline, and surface spline interpolation. EOFI has the advantages of low computational cost, few parameters to select, and ease of implementation. It is an alternative method when observation stations are sparse and the proportion of missing values at some stations is large. Moreover, it can also be applied to the interpolation and imputation of other spatiotemporal variables.
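The three EOFI steps and the RMSE-based choice of mode number summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The function names (`eofi_reconstruct`, `select_n_modes`), the use of a singular value decomposition to obtain the EOF modes, the absence of mean removal, and the default distance power of 2 are all assumptions made for illustration.

```python
import numpy as np


def eofi_reconstruct(X, dists, n_modes, power=2.0):
    """Reconstruct a target station's series from complete neighbouring records.

    X       : (n_times, n_stations) gap-free records at the observation sites.
    dists   : (n_stations,) distances from the target station to each site
              (assumed strictly positive; a zero distance would divide by zero).
    n_modes : number of leading EOF modes retained.
    power   : IDW distance exponent (assumed value, not taken from the paper).
    """
    # Step 1: EOF decomposition via SVD, X = U @ diag(s) @ Vt.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    temporal = U[:, :n_modes] * s[:n_modes]  # temporal modes, (n_times, n_modes)
    spatial = Vt[:n_modes, :]                # spatial modes, (n_modes, n_stations)

    # Step 2: inverse distance weighting of each spatial mode to the target.
    w = 1.0 / np.power(np.asarray(dists, dtype=float), power)
    w /= w.sum()
    spatial_target = spatial @ w             # (n_modes,)

    # Step 3: multiply temporal modes by the estimated spatial mode values.
    return temporal @ spatial_target         # (n_times,)


def rmse(estimate, reference):
    """RMSE over the entries where the reference record is valid (not NaN)."""
    mask = ~np.isnan(reference)
    return float(np.sqrt(np.mean((estimate[mask] - reference[mask]) ** 2)))


def select_n_modes(X, dists, valid, power=2.0):
    """Return the mode number that minimises RMSE against the valid records.

    valid : (n_times,) target-station record with NaN marking missing hours.
    """
    errors = [
        rmse(eofi_reconstruct(X, dists, k, power), valid)
        for k in range(1, X.shape[1] + 1)
    ]
    return int(np.argmin(errors)) + 1, errors
```

In this sketch, retaining every mode reproduces plain inverse distance weighting of the raw records, because the decomposition is then exact. Any difference between the two methods therefore comes from truncating to the leading modes, which is why the choice of mode number matters.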

Details

Title

Application of Empirical Orthogonal Function Interpolation to Reconstruct Hourly Fine Particulate Matter Concentration Data in Tianjin, China

Author

Zhou, Hongwu¹; Pan, Haidong¹; Li, Shuang²; Lv, Xianqing¹

¹ Physical Oceanography Laboratory, Qingdao Collaborative Innovation Center of Marine Science and Technology (CIMST), Ocean University of China, Qingdao, China; Qingdao National Laboratory for Marine Science and Technology, Qingdao, China
² Ocean College, Zhejiang University, Zhoushan, China

Editor

Zhihan Lv

Publication year

2020

Publication date

2020

Publisher

John Wiley & Sons, Inc.

ISSN

10762787

e-ISSN

10990526

Source type

Scholarly Journal

Language of publication

English

DOI

https://doi.org/10.1155/2020/9724367

ProQuest document ID

2458480883
