1. Introduction
Soil is a heterogeneous system, displaying complex processes and mechanisms that are difficult to comprehensively understand. Many conventional analytical techniques have been employed in an attempt to establish a direct relationship between soil and soil properties [1]. Knowledge concerning the soil system, its interactions and quality has been systematically supported through routine analyses which, although reliable, involve the collection of large numbers of samples, as well as laborious analysis processes. A high economic cost is also noted, due to the use of chemical reagents, laboratory equipment, and personnel, with the added generation of hazardous waste [2].
Agile, low-cost, and accurate analytical tools with low environmental impact are key in soil studies, mainly those on a large scale. The development of cleaner and more economically viable technologies that allow for analytical repeatability with a minimum of sample preparation [3,4] has become one of the great demands of the 21st century. Most likely for these reasons, Visible, Near Infrared and Short Wave Infrared (VNIR-SWIR) spectroscopy has been considered a possible alternative to improve or replace conventional laboratory soil analysis methods [5,6]. This technique meets all the desirable characteristics and leads to less onerous soil attribute predictions [7].
VNIR-SWIR spectra are signals containing information concerning chemical, physical, and mineralogical soil characteristics. Models constructed through multivariate statistical regression techniques [8], termed chemometric analyses, can be applied in order to take advantage of this information.
Despite the existence of this sophisticated method and the fact that VNIR-SWIR spectroscopy presents a robust physicochemical basis, the samples used to construct a model must be capable of representing both the chemical and physical attributes of the area to which it will be applied. In the present study, we attempted to calibrate models from spectral libraries containing a high number of samples, which should be representative and encompass the entire expected variability for the study area [9]. However, according to Viscarra Rossel et al. [10] and Wetterlind et al. [11], calibrations obtained from large sample sizes do not always guarantee direct applicability to new environments.
In this case, model application in non-sampled locations and, therefore, without any representatives in the spectral library, can lead to errors during the prediction phase, resulting in low accuracy [12]. Current studies have demonstrated that recalibration of so-called regional, state, or global models with samples from an area to be assessed (spiking) is the best way to overcome this problem [13,14,15].
For this, some of the soil samples from the target area are scanned by spectroscopy and also analyzed in the laboratory using the classic reference method to estimate chemical, physical, and mineralogical attributes. Subsequently, these samples are added to the original calibration matrix, which represents different scales, and the prediction model is recalibrated. The domain of the new model is thereby expanded to contain the variability of the target area. This process can often improve the model accuracy for the remaining soil samples from the target area [16,17]. The key issue is that, although the spiking technique is known to present promising potential for model improvements and soil attribute estimations, it does not always lead to good results. Studies such as those performed by Guerrero et al. [15] and Guy et al. [18] have shown that the achievement of satisfactory results is mainly linked to the characteristics of the target area. Estimates usually present low efficiency when the samples are spectrally very different from each other, as well as very different from those used to generate the regional, state, or global models.
In this case, the use of the so-called hybrid spectra seems to be the best way to fill the spectral space between the model and target area samples. The hybrid spectrum (average spectrum) is an "artificial" spectrum constructed using a sample from the spectral library and a local sample [19]. It is important to highlight that as these hybrid spectra share characteristics of both sets of samples, they could increase the number of spectra in the spectral space located between both sets of samples without additional analytical cost.
With this strategy, it is expected that this spectral space, filled with hybrid samples, will have greater relevance within the recalibrated model, allowing a better estimate for the remaining samples of the target area. Studies such as the one by Guerrero and Zornoza [19] seem to confirm these assumptions, offering positive results, but which should be addressed with a larger set of data in relation to the work done by the researchers.
In this context, the aim of the present study was to assess the effects of the spiking technique and hybridization on the recalibration of a state model and their consequences for organic matter estimations in areas under different uses and management. We chose to work with organic matter because it is an important indicator of the physical and chemical quality of the soil [20,21]. In addition, procedures such as Walkley and Black [22], traditionally used to quantify this attribute, use dangerous chemical reagents such as a chromium solution [23].
Associated with this, there is a strong demand in the State of Paraná for chemical analyses to determine organic matter, since monitoring the content of this attribute is important for the development of management practices that will improve and maintain the productivity of agricultural soils [24].
Given the above, this study is expected to contribute to understanding how the use of VNIR-SWIR spectroscopy can be expanded to the estimation of organic matter in new areas, based on a minimum number of soil samples and spectral data from this environment.
2. Materials and Methods
2.1. Soil Sampling
A total of 425 soil samples were collected from areas under different uses and management in the state of Paraná, Brazil (0–20 cm depth). The sampling points were selected based on the pedological and geological maps of the state of Paraná, at scales of 1:600,000 and 1:650,000, respectively [25,26]. The sites were chosen in such a way that no repeated soil samples were collected on similar source materials. This care was taken so that repeated samples would not influence the spectral models to be adjusted. Soil use and the management practices adopted in agricultural areas were recorded as the field work was being carried out. Subsequently, this information was inserted into a geographic information system for the creation of a georeferenced database and a spectral library of the state of Paraná. The predominant soil classes in the areas are lixisols, cambisols, gleysols, ferralsols, arenosols, nitisols, and histosols [27].
In addition, a total of 200 soil samples (0–20 cm depth) were collected in a specific area located in the Lobato municipality, northwestern Paraná, Brazil, comprising 2500 ha. This area is mainly occupied by forest remnants and sugarcane crops. The soil classes from the area include ferralsols, nitisols, lixisols, cambisols, and arenosols [27]. The location of the target area within the state of Paraná is shown below (Figure 1).
2.2. Organic Matter and Spectral Analyses
The soil samples were oven dried at 45 °C for 24 h and subsequently sieved through a 2 mm mesh before chemical analysis. Total organic carbon was determined following the Walkley and Black methodology [22]. The organic matter content was obtained by multiplying the total organic carbon by 1.724, since carbon is assumed to account for 58% of the average composition of humus [28].
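The conversion from total organic carbon to organic matter described above is a single multiplication. A minimal sketch follows; the function name and the example carbon value are hypothetical, and only the factor 1.724 comes from the text.

```python
# Conversion factor from the text: carbon is assumed to make up 58% of humus,
# and 1 / 0.58 is approximately 1.724.
OC_TO_OM_FACTOR = 1.724


def organic_matter_from_carbon(total_organic_carbon):
    """Convert total organic carbon to organic matter (same units as the input)."""
    return total_organic_carbon * OC_TO_OM_FACTOR


# Hypothetical carbon content, used only to illustrate the call.
example_carbon = 12.0
example_organic_matter = organic_matter_from_carbon(example_carbon)
print(round(example_organic_matter, 3))
```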
The organic matter attribute was chosen for spectral modelling because it is an important indicator of soil quality in Brazil and its traditional determination uses reagents which, without the correct final destination, may contaminate the environment.
For the determination of spectral readings, in addition to the drying process aforementioned, the samples were milled to homogenize soil particle size and reduce roughness effects [29]. The samples were then arranged in a 9 cm diameter and 1.5 cm high Petri dish and spectral readings were carried out using an ASD FieldSpec 3 JR spectroradiometer, which covers the spectral range from 350 to 2500 nm. The equipment was programmed to perform 50 readings per sample, generating an average spectral curve.
A standard white plate with 100% calibrated reflectance was used for data acquisition, according to the Labsphere Reflectance Calibration Laboratory [30]. The fiber optic reader was positioned vertically, 8 cm from the sample support platform. The reading area was approximately 2 cm². The light source was a 650 W lamp with an uncollimated beam directed at the target plane, positioned 35 cm from the platform and at a 30° angle to the horizontal plane [31].
The spectral readings were repeated three times, with successive displacement of the Petri dish 120° clockwise, allowing for a full sample scan. Subsequently, the simple arithmetic mean of the three readings was determined for each sample, as recommended by Nanni and Demattê [32].
2.3. Data Processing and Statistical Analyses
Raw spectral data were preprocessed to improve the stability of the regression models, as described by Lee et al. [33]. Each spectral curve was subjected to baseline and light scattering correction by the multiplicative scatter correction (MSC) method, according to Buddenbaum and Steffens [34]. For noise reduction, Savitzky-Golay smoothing [35] was applied with a first derivative and seven smoothing points. The calibration models were constructed by partial least squares regression (PLSR), using the Unscrambler version 10.3 software package (CAMO, Inc., Oslo, Norway).
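The preprocessing chain above (MSC followed by a seven-point Savitzky-Golay first derivative) can be sketched with NumPy alone. This is an illustration, not the Unscrambler implementation: the second-order polynomial, the use of the set mean as the MSC reference, the unit band spacing, and the trimming of three bands at each edge are all assumptions, and the function names are hypothetical.

```python
import numpy as np


def msc(spectra):
    """Multiplicative scatter correction against the mean spectrum of the set."""
    spectra = np.asarray(spectra, dtype=float)
    reference = spectra.mean(axis=0)
    corrected = np.empty_like(spectra)
    for i, spectrum in enumerate(spectra):
        # Fit spectrum = intercept + slope * reference, then undo both effects.
        slope, intercept = np.polyfit(reference, spectrum, 1)
        corrected[i] = (spectrum - intercept) / slope
    return corrected


def savgol_first_derivative(spectra, window=7, polyorder=2):
    """Savitzky-Golay first derivative (per band step); edges are trimmed."""
    spectra = np.asarray(spectra, dtype=float)
    half = window // 2
    offsets = np.arange(-half, half + 1)
    # Least-squares polynomial fit in the window; row 1 of the pseudo-inverse
    # gives the weights of the linear term, i.e. the slope at the window center.
    design = np.vander(offsets, polyorder + 1, increasing=True)
    weights = np.linalg.pinv(design)[1]
    kernel = weights[::-1]  # np.convolve flips the kernel, so flip it back
    return np.array([np.convolve(s, kernel, mode="valid") for s in spectra])


def preprocess(spectra):
    """MSC followed by the seven-point first derivative, in the order given in the text."""
    return savgol_first_derivative(msc(spectra), window=7, polyorder=2)
```

With a seven-point window, each output spectrum is six bands shorter than the input because no padding is applied at the edges.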
The prediction performance of the models was assessed using the coefficient of determination (r2), root mean square error of prediction (RMSEP), standard error of prediction (SEP), systematic error (BIAS) and ratio of performance to deviation (RPD), as described by D’Acqui et al. [13].
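For reference, the statistics listed above can be computed as in the sketch below. The formulas are common chemometric definitions and are assumptions here: the exact expressions used by D’Acqui et al. [13] and by the Unscrambler software may differ, in particular for r2 (taken as 1 − SSE/SST) and RPD (taken as the standard deviation of the reference values divided by RMSEP). The function name is hypothetical.

```python
import numpy as np


def prediction_metrics(observed, predicted):
    """Return r2, R, RMSEP, SEP, BIAS and RPD for one set of predictions."""
    observed = np.asarray(observed, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    errors = predicted - observed

    bias = errors.mean()                                  # systematic error
    rmsep = np.sqrt(np.mean(errors ** 2))                 # root mean square error
    sep = np.sqrt(np.sum((errors - bias) ** 2) / (len(errors) - 1))  # bias-corrected
    r = np.corrcoef(observed, predicted)[0, 1]            # correlation coefficient
    total_ss = np.sum((observed - observed.mean()) ** 2)
    r2 = 1.0 - np.sum(errors ** 2) / total_ss             # assumed definition
    rpd = observed.std(ddof=1) / rmsep                    # assumed definition

    return {"r2": r2, "R": r, "RMSEP": rmsep, "SEP": sep, "BIAS": bias, "RPD": rpd}
```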
2.4. State Model
The state model (unspiked state model) was generated from a total of 425 soil samples collected in the state of Paraná. Its effectiveness in the estimation of organic matter was tested on 200 samples collected in the target area.
Recalibrated State Model
For this step, the spiking technique was used to recalibrate the state model with selected samples from the target area. Sample selection was performed based on spectral sample characteristics. The criterion of choice was based on the distribution of spectra from the set of samples from the target area within the spectral domain they occupy. In this way, we tried to use spectra that were at the limit of the spectral domain, as well as those that were in the center or even randomly distributed, with the objective of covering the entire spectral space.
The selection based on cluster analysis sought to group samples with spectral similarity into smaller subsets. As it is an unsupervised analysis, a biased selection is ruled out. In addition, recalibration with samples from different clusters may indicate which soil samples from a set are best employed in the spiking technique. Initially, recalibrations were tested with five and 10 samples; however, five samples proved to be insufficient (not presented).
A larger number of samples was not tested in the recalibration, since routine analyses would be required to determine their organic matter content, which would increase costs and run counter to the purpose of applying spectroscopy. The selection criteria are presented below.
A total of 10 samples located at the periphery of the spectral space, comprising the first two main components (subset one), 10 samples located in the center of the spectral space, comprising the first two main components (subset two), 10 samples located along the spectral space consisting of the first two main components (subset three) and 10 samples belonging to different clusters (k-means) (subset four) were chosen, according to Cezar et al. [36].
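The four selection criteria can be sketched as below. This is one possible reading, not the procedure of Cezar et al. [36]: "periphery" and "center" are taken as the largest and smallest distances from the centroid in the PC1-PC2 plane, "along the spectral space" is taken as evenly spaced ranks along PC1, and the cluster subset takes the member closest to each k-means centroid. All function names, the number of iterations, and the random seed are hypothetical.

```python
import numpy as np


def pca_scores(spectra, n_components=2):
    """Scores of the first principal components, computed via SVD."""
    centered = spectra - spectra.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    return u[:, :n_components] * s[:n_components]


def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means; returns labels and centroids."""
    rng = np.random.default_rng(seed)
    centroids = points[rng.choice(len(points), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            members = points[labels == j]
            if len(members):  # an empty cluster keeps its previous centroid
                centroids[j] = members.mean(axis=0)
    dist = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dist.argmin(axis=1), centroids


def select_spiking_subsets(spectra, n_select=10, seed=0):
    """Return sample indices for the four (assumed) selection strategies."""
    scores = pca_scores(np.asarray(spectra, dtype=float))
    distance = np.linalg.norm(scores - scores.mean(axis=0), axis=1)
    order = np.argsort(distance)
    by_pc1 = np.argsort(scores[:, 0])
    spread = by_pc1[np.linspace(0, len(by_pc1) - 1, n_select).astype(int)]

    labels, centroids = kmeans(scores, n_select, seed=seed)
    clustered = []
    for j in range(n_select):
        members = np.flatnonzero(labels == j)
        if members.size:  # one representative per non-empty cluster
            d = np.linalg.norm(scores[members] - centroids[j], axis=1)
            clustered.append(members[d.argmin()])

    return {
        "periphery": order[-n_select:],   # subset one
        "center": order[:n_select],       # subset two
        "spread": spread,                 # subset three
        "clusters": np.array(clustered, dtype=int),  # subset four
    }
```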
In a second step, the state model was recalibrated with hybrid spectra, obtained using the datasets of the target area and of the state of Paraná. In order to obtain these spectra, the four selection criteria mentioned above were applied to both datasets. After subset selection, the simple mean of the corresponding spectra was calculated for each criterion, yielding 10 hybrid spectra per criterion and a total of 40 hybrid spectra. A general state model recalibration scheme is presented in Figure 2.
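A hybrid spectrum as described here is the band-wise mean of one library spectrum and one local spectrum. The sketch below assumes that the i-th spectrum of one subset is paired with the i-th spectrum of the other, which the text does not specify; the function name is hypothetical. It covers the spectra only, since the reference value attached to each hybrid spectrum is not described in this section.

```python
import numpy as np


def hybrid_spectra(library_subset, local_subset):
    """Band-wise mean of paired library and local spectra (rows are samples)."""
    library_subset = np.asarray(library_subset, dtype=float)
    local_subset = np.asarray(local_subset, dtype=float)
    if library_subset.shape != local_subset.shape:
        raise ValueError("both subsets need the same number of spectra and bands")
    # Assumed pairing: row i of the library subset with row i of the local subset.
    return (library_subset + local_subset) / 2.0
```

Applied to two subsets of 10 spectra each, this returns 10 hybrid spectra, so repeating it for the four criteria gives the 40 hybrid spectra mentioned in the text.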
After state model recalibration, the model was applied to an unknown data matrix (the remaining samples of the target area, 95% of the total) for performance and predictive ability testing.
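The recalibration and testing workflow can be sketched as follows: the spiking samples are appended to the state calibration matrix, a PLS model is refitted, and the remaining target samples are predicted. The study used the Unscrambler software; the NumPy PLS1 (NIPALS) routine below is a stand-in, the data are synthetic, and the choice of the first 10 target samples as the spiking subset and of 10 factors is arbitrary.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Single-response PLS regression fitted with the NIPALS algorithm."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yc = X - x_mean, y - y_mean
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yc
        w /= np.linalg.norm(w)          # weight vector
        t = Xk @ w                      # scores
        tt = t @ t
        p = Xk.T @ t / tt               # X loadings
        W.append(w)
        P.append(p)
        q.append(yc @ t / tt)           # y loading
        Xk = Xk - np.outer(t, p)        # deflate X
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)
    return {"coef": coef, "x_mean": x_mean, "y_mean": y_mean}


def pls1_predict(model, X):
    return (X - model["x_mean"]) @ model["coef"] + model["y_mean"]


def spike(X_library, y_library, X_spike, y_spike):
    """Append the spiking samples to the library calibration set."""
    return np.vstack([X_library, X_spike]), np.concatenate([y_library, y_spike])


# Synthetic stand-ins for the 425 state and 200 target samples.
rng = np.random.default_rng(0)
true_coef = rng.normal(size=60)
X_state = rng.normal(size=(425, 60))
y_state = X_state @ true_coef + rng.normal(scale=0.1, size=425)
X_target = rng.normal(loc=0.3, size=(200, 60))
y_target = X_target @ true_coef + rng.normal(scale=0.1, size=200)

spike_idx = np.arange(10)                               # hypothetical selection
remaining = np.setdiff1d(np.arange(200), spike_idx)     # 190 samples, 95%

X_spiked, y_spiked = spike(X_state, y_state, X_target[spike_idx], y_target[spike_idx])
model = pls1_fit(X_spiked, y_spiked, n_components=10)
predicted = pls1_predict(model, X_target[remaining])
```

The spiked calibration matrix has 435 rows, matching the n reported for the recalibrated models.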
3. Results and Discussions
3.1. Statistical Soil Sample Characterization
The results obtained from the descriptive analysis indicate that organic matter content is variable. Considering the entire state, the oscillation between the minimum and maximum values is high, reaching values above 60 g dm−3 (Figure 3). Compared to the set of samples from the target area, the standard deviation of the set of samples collected in the state of Paraná is higher, showing a smaller homogeneity among the data.
The high variability (around 72.23%, not presented) of the state of Paraná samples is mainly due to the existence of differences in the state climatic conditions [37], leading to significant variations in organic matter content in several regions [36].
In addition, the managements applied in the agricultural areas also result in greater variability, with higher or lower organic matter accumulation on the soil surface. Therefore, those variations were expected, since both no-tillage and tillage planting are observed in the state of Paraná, with higher organic matter accumulation over the years in no-tillage planting, agreeing with Martínez et al. [38].
On the other hand, the sample set from the target area presented lower variability, around 59.23% (not presented). In this case, some factors such as climate and management were less relevant, due to the smaller size of the area, 2500 ha, mostly used for sugarcane plantations, which undergo the same management during the crop cycle. To a lesser extent, some forest remnants are also observed.
3.2. Spectral Soil Sample Characterization
The spectral curves representative of the samples used for the model generation also showed inter-state differences, as well as differences between the state and the target area samples (Figure 4).
The average spectral curves for the samples collected in the target area are better defined, with distinct inter-sample spectral differentiation, mainly in wavelengths greater than 700 nm. On the other hand, the spectral curves for the Paraná samples are less differentiated (except for histosols and arenosols that have very different spectral behavior) and are better separated in wavelengths greater than 1900 nm. The reflectance factor of the target area samples presents a lower amplitude, ranging from 0.02 to 0.25, while for the Paraná samples it ranges from 0.01 to 0.70.
This difference occurs mainly as a function of soil variability in Paraná State, as described in Section 2.1. In addition, soil use and management can lead to changes in spectral behavior, mainly due to variations in organic matter content, which can absorb electromagnetic radiation at all wavelengths, masking absorption bands generated by other elements [39]. This behavior can be observed through the evaluation of the spectral curve of samples collected in the remaining forest area, classified as histosols (Figure 4), which do not display absorption bands except in the 1900 nm region, characteristic for the presence of water [40].
Iron occurrence was the same for all target area samples (absorption around 900 nm), except for arenosols. This agrees with one of the materials of origin that form this soil, which displays low iron concentrations. On the other hand, the spectral responses of the Paraná samples indicated the presence of iron only for ferralsols and nitisols, which usually present this element above 150 g kg−1 of soil.
The other classes, besides presenting lower iron levels than the aforementioned soils, displayed impaired iron absorption bands due to the presence of organic matter [41,42]. Organic matter content was higher than 2% for 175 samples, with the potential to influence any spectral curve, as discussed by Baumgardner et al. [43] and Bilgili et al. [44].
3.3. Unspiked State Model
Calibration and Prediction
Table 1 displays the results obtained during model calibration and validation.
Although the BIAS, correlation coefficient, and coefficient of determination results were satisfactory, the RPD value was below the ideal for use in agriculture and is considered to represent average precision, as discussed by Viscarra Rossel et al. [1]. According to these researchers, ideal values for agricultural use are above 3, values from 2 to 3 are considered good, 1.5 to 2 average, and below 1.5 unsatisfactory. This finding is corroborated when the model is used to estimate organic matter content in the target area, where a low ability to accurately predict these values is observed. The RPD in this case, 1.42, is nevertheless noteworthy, being slightly superior to the values obtained by Cezar et al. [36] in a similar study.
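The RPD classes quoted above can be written as a small lookup. The thresholds are those given in the text; how values falling exactly on a boundary (1.5, 2, or 3) are assigned is an assumption, and the function name is hypothetical.

```python
def rpd_class(rpd):
    """Map an RPD value to the classes quoted in the text."""
    if rpd > 3.0:
        return "ideal"
    if rpd >= 2.0:          # boundary handling is an assumption
        return "good"
    if rpd >= 1.5:
        return "average"
    return "unsatisfactory"


# Values reported in Table 1 for the unspiked state model.
print(rpd_class(1.63))  # calibration
print(rpd_class(1.42))  # prediction in the target area
```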
Therefore, it is evident that, even in the case of a large state model composed of 425 soil samples, the entire variability of the target area could not be determined, with no accurate representation of the organic matter contents present in the target area, agreeing with what was described by Viscarra Rossel et al. [45] and Guerrero et al. [15]. The presence of variability can be corroborated by the means of the average spectral curves for the target area, which present differences in terms of absorption bands, as well as oscillations in reflectance values for the different soil classes (Figure 4).
3.4. Spiked State Model/Spiked and Hybridized State Model
3.4.1. Recalibration
The statistical parameters presented a small improvement over the unspiked state model after recalibration of the state model through the spiking technique and spiking associated with hybridization (Table 2).
When assessing the spiked state model, it is noted that the RMSEC is lower, ranging from 9.6 to 9.9 g dm−3, while the correlation coefficient reaches a maximum value of 0.80 for the models recalibrated with subset one and subset four. The RPD values are also relatively better, reaching a maximum of 1.72. These results indicate that the recalibration of the state models with some target area samples can lead to improvements in the statistical parameters, agreeing with Guerrero et al. [46] and Hong et al. [17].
Similar behavior was observed after the use of the spiked state model associated with hybridization for recalibration of the state models. However, no significant improvements were observed after the use of this technique. RMSEC values reached a maximum of 9.9 g dm−3, while RPD reached 1.66. It should be noted that, in both cases, BIAS was low, demonstrating a lack of bias for the generated models.
3.4.2. Prediction
The statistical parameters obtained by the recalibrated model during the prediction phase are presented in Table 3.
A relative improvement in organic matter content estimates is noted after the use of the spiked state model in a new area, in agreement with that described by Brown et al. [9], Sankey et al. [12], and Wetterlind and Stenberg [14]. The RMSEP values were lower than those obtained for the unspiked state model, while the correlation coefficient and determination values were higher.
When compared to the work of Lazzaretti et al. [47], which estimated soil organic matter through NIR spectroscopy associated with an unspiked model in the southern region of Brazil, the results were also slightly higher. Emphasis should be given to the correlation coefficient, which ranged from 0.68 to 0.76 (Table 3), depending on the strategy used for selecting samples used in the recalibration of the state model, against 0.62 of the aforementioned work.
On the other hand, the results were inferior to those obtained by Lazaar et al. [48] who, working in Eastern Morocco with organic matter estimation by means of VNIR-SWIR spectroscopy, obtained r2 equal to 0.93 and RMSE equal to 0.13. It should be noted that in this case, the prediction models as well as their validations were developed with a smaller number of samples (lower variability) when compared to the work carried out in Paraná State.
Similarly, the results were lower than those achieved by Qiao et al. [49], who developed organic matter prediction models for Chinese soils using hyperspectral data. These researchers obtained values of r2 and RPD (prediction) equal to 0.61 and 5.53, whereas we obtained maximum values for these indices equal to 0.43 and 1.36, respectively, considering the spiked state model.
As already explained for Lazaar et al. [48], the number of samples used in the calibration (165) and validation (15) of this work is small when compared to that used for the study of Paraná soils. While we obtained a mean value for organic matter above 20 g dm−3 and a maximum value above 60 g dm−3 (Figure 3), the study by Qiao et al. [49] obtained a mean value of 2.60 g dm−3 and a maximum of 4.33 g dm−3. These results demonstrate the difference between the organic matter content of tropical soils and other parts of the world.
Regarding the use of hybridization, this technique did not allow for improvements in organic matter content estimates when compared to the spiked state model, since lower quality indices were found in most cases (subset one, subset two, and subset three). Likewise, when comparing the recalibrated models using hybrid curves with the unspiked state model, only subset four presented better results, with RMSEP, BIAS, and the correlation coefficient equal to 4.6 g dm−3, 1.55, and 0.71, respectively.
The lack of effectiveness of the hybridization to improve organic matter predictions lies in the fact that, although the spectral curves are different in terms of absorption and reflectance, as displayed in Figure 4, both datasets (from Paraná and the target area) are within the same spectral domain, demonstrating that these samples are not spectrally distant, agreeing with Nawar and Mouazen [50] and Hong et al. [17].
This assertion can be corroborated by means of the similarity map formed by principal component analysis scores. When the PCA model generated with the Paraná soil sample spectra was applied to the target area sample spectra, the scores were calculated and projected to the local site within the spectral space occupied by the state samples (Figure 5), similar to that obtained by Wetterlind and Stenberg [14].
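The projection described above, fitting the PCA on the state spectra and applying it to the target spectra, can be sketched with NumPy. The bounding-box check at the end is only a crude numerical proxy for the visual comparison in Figure 5 and is an assumption, as are the function names.

```python
import numpy as np


def fit_pca(spectra, n_components=2):
    """Return the mean spectrum and loadings of a PCA fitted on `spectra`."""
    spectra = np.asarray(spectra, dtype=float)
    mean = spectra.mean(axis=0)
    _, _, vt = np.linalg.svd(spectra - mean, full_matrices=False)
    return mean, vt[:n_components].T


def project(spectra, mean, loadings):
    """Scores of new spectra in the space of an already fitted PCA."""
    return (np.asarray(spectra, dtype=float) - mean) @ loadings


def fraction_inside(state_scores, target_scores):
    """Share of target scores inside the bounding box of the state scores."""
    low, high = state_scores.min(axis=0), state_scores.max(axis=0)
    inside = np.all((target_scores >= low) & (target_scores <= high), axis=1)
    return inside.mean()
```

The target spectra are centered with the state mean, not their own, so both sets of scores share one coordinate system. A fraction close to 1 would be consistent with both datasets occupying the same spectral domain.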
Therefore, to obtain success with the use of hybrid curves, the regional, state, or national soil samples that generally make up large spectral libraries must be spectrally very different from the samples collected in new areas where organic matter estimates are of interest, i.e., both datasets must be separated within the occupied spectral space. Thus, after recalibration, this space will be filled by the hybrid curves, forcing the recalibrated model to present better estimation power.
The improvements in organic matter predictions noted after recalibration of the state model were due to the use of the spiking technique without hybridization, as demonstrated by Guerrero et al. [15]. The presence of both datasets within the same spectral domain (Figure 5) led to more positive results, as advocated by Kuang and Mouazen [51] and Nawar and Mouazen [52]. After spiking, a slight improvement in the model fit was noted, especially when assessing the spiked state model using subset four (Figure 6).
However, although improvements in the predictions of the aforementioned attribute were noted, surpassing the results obtained by Nanni et al. [31], who worked with VNIR-SWIR spectroscopy in soil from this region of Paraná, these were lower than expected. With the use of the spiking technique and hybrid spectral curves, the results were expected to be higher than those reported by Daniel et al. [53], Wetterlind et al. [11], and Wang et al. [54], which was not the case.
The recalibration of the state model with selected samples from the target area, despite displaying slightly improved organic matter estimates, indicated that the type of sample directly influences the result. The selected samples should be able to transfer the maximum variability concerning organic matter composition and amounts to the models to be recalibrated, to allow them to adequately estimate this attribute when it is applied to the sample collection area. Within this context, sample selection based on the cluster analysis (strategy four) was the most adequate for state model recalibration.
Another point that should be highlighted refers to the number of soil samples used for recalibration. Ten samples alone were not sufficient to represent the organic matter content distribution within the target area, since 59.23% variation was observed. According to Kuang and Mouazen [55], depending on spatial variability, one to two samples per hectare would be sufficient to capture this dissimilarity at a specific site. Nawar and Mouazen [52] suggested the use of four to five samples per ha for recalibration of European spectral libraries in order to provide adequate accuracy in the prediction of soil organic carbon content.
Hong et al. [17] observed improvements in the estimation of soil organic carbon when a larger number of samples (from 10 to 60) were used in the recalibration of the prediction model. However, it was detected that there were no significant gains when more than 30 samples were used for recalibration. Considering the edaphoclimatic conditions of the studied areas, the researchers suggested selecting from 20 to 30 samples to recalibrate the models to balance the relationship between the modeling cost and predictive accuracy. In turn, Gogé et al. [56] obtained satisfactory results in the estimation of organic carbon applying the spiking technique when they used 50 local samples to recalibrate the prediction model, corresponding to 35% of the total number of samples from the target area.
It should be emphasized that increasing the number of samples selected from a target area for recalibration of a prediction model is acceptable only to a certain extent, since analytical results are required for these samples. If the number is high, spectroscopy ceases to be an attractive technique with innovative potential and becomes an expensive technique when applied to the estimation of organic matter or other soil attributes, agreeing with Hong et al. [17].
4. Conclusions
1. The samples selected through a cluster analysis were more effective for state model recalibration, since they were able to transfer more information about the chemical and physical characteristics of the organic matter attribute present in the target area. This fact allowed for a relatively better prediction when compared to the use of other samples selected by other strategies, reaching r2 and RPD equal to 0.43 and 1.36, respectively, in the spiked state model;
2. The use of hybrid spectral curves did not allow for improvements in organic matter content estimations, since the target area and Paraná state samples occupied the same spectral space. As hybrid spectra contain information from both datasets, they will be effective in filling the spectral space only if the groups are in different spectral domains, which was not the case in this work;
3. The spiking technique alone was more effective in state model recalibration than spiking in conjunction with hybridization, generating more satisfactory results. Maximum RMSEP and R equal to 4.9 g dm−3 against 6.2 g dm−3 and 0.76 against 0.71, respectively, were observed. The use of selected samples from the target area to recalibrate the state model allowed the incorporation of minimal, but promising, information to improve the estimation of the organic matter content;
4. The study of the spiking technique in the state of Paraná and in Brazil is new, so further research is recommended to continue the search for the selection of representative samples from new areas, as well as the recalibration of predictive models, not only for organic matter, but also for other important soil attributes, which are currently determined by the use of chemical reagents with potential for soil contamination.
Author Contributions
Conceptualization: E.C., M.R.N. and J.A.M.D.; methodology: E.C., L.G.T.C., M.R.N., R.H.F., M.R., R.N.R.S. and J.A.M.D.; formal analysis: L.G.T.C., M.S.C., R.N.R.S., M.R. and G.F.C.S.; data curation: L.G.T.C., M.R.N. and K.M.d.O.; writing—original draft preparation: E.C.; visualization: L.G.T.C. and M.R.N.; project administration: E.C., M.R.N. and L.S.; funding acquisition: E.C., M.R.N. and L.S. All authors have read and agreed to the published version of the manuscript.
Funding
This work has been funded by the National Council for Scientific and Technological Development, CNPq, through a postdoctoral fellowship grant to the first author and resources to carry out the project analyses foreseen [151658/2012-9]; Coordination of Superior Level Staff Improvement, CAPES; Central Public-Interest Scientific Institution Basal Research Fund [Y2021GH18] and the Talented Young Scientist Program—China Science and Technology Exchange Center [Brazil–19-004].
Institutional Review Board Statement
Not applicable.
Informed Consent Statement
Not applicable.
Data Availability Statement
The data that support the findings of this study are available from the author E.C.
Conflicts of Interest
The authors declare no conflict of interest.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figures and Tables
Figure 1. Location of the state of Paraná and the target area inserted in the northwestern region.
Figure 2. Scheme used to represent the experiment. (a) Unspiked initial calibration (IC) model constructed only with state samples; (b) Initial calibration spiked with a spiking subset (SS) selected from the target site (TS); (c) Initial calibration spiked with a spiking subset and hybridization (SSH) selected from the target site (TS) and state of Paraná. Adapted from Guerrero et al. [15].
Figure 3. A: Sample set collected in state of Paraná (n = 425); B: Sample set collected in target area (n = 200); n = number of samples.
Figure 4. Mean spectral curves obtained for the sample set from the state of Paraná (A) and from the target area (B), separated by soil classes.
Figure 5. Main component (PC) similarity maps, between the Paraná and target site datasets. Blue scores were obtained by the calibration model using state spectra. Green scores were obtained by the calibration model using local spectra.
Figure 6. Scatterplots obtained during the prediction phase. (A) Unspiked State Model; (B) Spiked State Model (subset one); (C) Spiked State Model (subset two); (D) Spiked State Model (subset three); (E) Spiked State Model (subset four); (F) Spiked and Hybridized State Model (subset one); (G) Spiked and Hybridized State Model (subset two); (H) Spiked and Hybridized State Model (subset three) and (I) Spiked and Hybridized State Model (subset four). Line 1:1 (dashed); regression line (solid line).
Table 1. Statistical parameters obtained during the calibration and validation phases of the unspiked state model.
Calibration | |||||||
n | r2 (1) | RMSEC (2) | SEC (3) | BIAS (4) | R (5) | RPD (6) | Nº factors |
425 | 0.86 | 10.2 | 10.2 | 6.05 × 10−4 | 0.77 | 1.63 | 10 |
Prediction | |||||||
n | r2 | RMSEP (7) | SEP (8) | BIAS | R | RPD | Nº factors |
200 | 0.30 | 5.2 | 5.2 | −0.43 | 0.67 | 1.42 | 7 |
(1) Coefficient of determination; (2) root-mean-square error of calibration; (3) standard error of calibration; (4) systematic error (bias); (5) correlation coefficient; (6) ratio of performance to deviation; (7) root-mean-square error of prediction; (8) standard error of prediction; n: number of Paraná (calibration) and target site (prediction) soil samples.
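The statistics listed in the footnote above can be illustrated with a minimal Python sketch. The formulas are common chemometric definitions and are assumptions here, since the text does not state the exact expressions used: r2 is taken as the squared Pearson correlation, the standard error as the bias-corrected error with n − 1 degrees of freedom, and RPD as the standard deviation of the observed values divided by the RMSE. The function name `prediction_metrics` and the sample values are hypothetical.

```python
import math
from statistics import mean, stdev


def prediction_metrics(observed, predicted):
    """Return r2, RMSE, standard error, bias, r, and RPD for paired values.

    Assumed definitions (not confirmed by the source text):
      bias = mean(predicted - observed)
      RMSE = sqrt(mean(error^2))
      SE   = sqrt(sum((error - bias)^2) / (n - 1))
      RPD  = SD(observed) / RMSE
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two equal-length sequences with at least 2 values")

    n = len(observed)
    errors = [p - o for o, p in zip(observed, predicted)]

    bias = mean(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / n)
    se = math.sqrt(sum((e - bias) ** 2 for e in errors) / (n - 1))

    # Pearson correlation between observed and predicted values.
    mean_obs, mean_pred = mean(observed), mean(predicted)
    sxy = sum((o - mean_obs) * (p - mean_pred) for o, p in zip(observed, predicted))
    sxx = sum((o - mean_obs) ** 2 for o in observed)
    syy = sum((p - mean_pred) ** 2 for p in predicted)
    r = sxy / math.sqrt(sxx * syy)

    # Sample standard deviation of the reference values over the RMSE.
    rpd = stdev(observed) / rmse

    return {"r2": r * r, "rmse": rmse, "se": se, "bias": bias, "r": r, "rpd": rpd}


# Hypothetical organic matter values (g dm-3), for illustration only.
observed = [10.0, 14.0, 18.0, 22.0, 26.0]
predicted = [11.0, 13.0, 19.0, 21.0, 27.0]
for name, value in prediction_metrics(observed, predicted).items():
    print(f"{name}: {value:.3f}")
```

Under these assumed definitions the standard error and the RMSE nearly coincide when the bias is close to zero, which matches the calibration rows of the table. The sketch does not guard against constant inputs or a perfect fit, either of which would cause a division by zero.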
Table 2. Statistical parameters obtained by recalibration with samples from the target area (n = 435).
Spiked State Model | |||||||
Subset | r2 (1) | RMSEC (2) | SEC (3) | BIAS (4) | R (5) | RPD (6) | Nº factors |
1 | 0.87 | 9.8 | 9.8 | 6.91 × 10−4 | 0.80 | 1.68 | 14 |
2 | 0.87 | 9.7 | 9.7 | 2.18 × 10−4 | 0.79 | 1.66 | 14 |
3 | 0.86 | 9.9 | 9.9 | −3.25 × 10−4 | 0.78 | 1.65 | 14 |
4 | 0.88 | 9.6 | 9.6 | −3.58 × 10−5 | 0.80 | 1.72 | 14 |
Spiked and Hybridized State Model | |||||||
Subset | r2 (1) | RMSEC (2) | SEC (3) | BIAS (4) | R (5) | RPD (6) | Nº factors |
1 | 0.87 | 9.9 | 9.9 | 2.40 × 10−3 | 0.79 | 1.66 | 14 |
2 | 0.88 | 9.6 | 9.7 | 9.46 × 10−5 | 0.80 | 1.68 | 14 |
3 | 0.87 | 9.8 | 9.8 | 4.75 × 10−4 | 0.78 | 1.66 | 11 |
4 | 0.87 | 9.7 | 9.7 | 2.41 × 10−4 | 0.79 | 1.65 | 11 |
(1) Recalibration Determination Coefficient; (2) Recalibration Root-Mean-Square Error; (3) Recalibration Standard Error; (4) Recalibration Systematic Error; (5) Correlation Coefficient; (6) Ratio of Performance to Deviation; n: number of samples used in the recalibrated model.
Table 3. Statistical parameters obtained during the prediction phase, using the remaining 95% of the samples from the target area (n = 190).
Spiked State Model | |||||||
Subset | r2 (1) | RMSEP (2) | SEP (3) | BIAS (4) | R (5) | RPD (6) | Nº factors |
1 | 0.41 | 4.8 | 4.8 | 0.33 | 0.68 | 1.52 | 7 |
2 | 0.38 | 4.9 | 4.9 | −0.41 | 0.70 | 1.28 | 7 |
3 | 0.37 | 4.9 | 5.3 | 1.20 | 0.68 | 1.37 | 7 |
4 | 0.43 | 4.4 | 4.7 | 0.94 | 0.76 | 1.36 | 6 |
Spiked and Hybridized State Model | |||||||
Subset | r2 (1) | RMSEP (2) | SEP (3) | BIAS (4) | R (5) | RPD (6) | Nº factors |
1 | 0.32 | 6.2 | 6.7 | 1.37 | 0.62 | 1.22 | 10 |
2 | 0.34 | 5.1 | 5.6 | 1.39 | 0.69 | 1.21 | 7 |
3 | 0.34 | 5.0 | 5.6 | 1.47 | 0.71 | 1.23 | 6 |
4 | 0.41 | 4.6 | 4.7 | 0.49 | 0.71 | 1.55 | 7 |
(1) Prediction Determination Coefficient; (2) Prediction Root-Mean-Square Error; (3) Prediction Standard Error; (4) Prediction Systematic Error; (5) Correlation Coefficient; (6) Ratio of Performance to Deviation; n: number of soil samples used in the prediction.
© 2021 by the authors.
Abstract
Visible (V), Near Infrared (NIR) and Short Wave Infrared (SWIR) spectroscopy has been regarded as a promising tool in soil studies, especially over the last decade. Applying this method, however, requires prediction models that capture the intrinsic differences between agricultural areas and incorporate them into the modeling process. High-quality estimates are generally obtained when these models are applied to soil samples with characteristics similar to those of the samples used in their construction, whereas low-quality predictions are observed when they are applied to samples from new areas with different characteristics. One way to solve this problem is to recalibrate the models using selected samples from the area of interest. Based on this premise, the aim of this study was to use the spiking technique, alone and combined with hybridization, to expand prediction models and estimate organic matter content in a target area under different uses and management. A total of 425 soil samples were used to generate the state model, and 200 samples from a target area were used to select the subsets (10 samples each) used for model recalibration. The spectral readings of the samples were obtained in the laboratory using an ASD FieldSpec 3 Jr. sensor from 350 to 2500 nm. The spectral curves were then related to the soil attributes by partial least squares regression (PLSR). The state model obtained better results when recalibrated with samples selected through cluster analysis. The use of hybrid spectral curves did not generate significant improvements, yielding estimates that were, in most cases, poorer than those of the state model applied without recalibration. Spiking alone was more effective than the spiked and hybridized state models, reaching r2, root-mean-square error of prediction (RMSEP), and ratio of performance to deviation (RPD) values of 0.43, 4.4 g dm−3, and 1.36, respectively.
Details
1 Department of Agronomy, State University of Maringá, Maringá, PR 87020-900, Brazil
2 Department of Agronomy, State University of Maringá, Maringá, PR 87020-900, Brazil
3 Key Laboratory of Agricultural Remote Sensing, Ministry of Agriculture/CAAS-CIAT Joint Laboratory in Advanced Technologies for Sustainable Agriculture—Institute of Agricultural Resources and Regional Planning, Chinese Academy of Agricultural Sciences, Beijing 100081, China
4 Mathematician, Statistical Specialist, Londrina, PR 86001-970, Brazil
5 Department of Soil Science, University of São Paulo, Piracicaba, SP 13418-900, Brazil