The processing of near‐infrared (NIR) spectroscopy data is challenged by peak overlap, information redundancy, and background or noise, all of which complicate the evaluation of weak differences among similar samples. Accurately identifying these differences and assessing similarity is therefore essential in practical applications such as sample classification and the replacement of raw materials in product formulations. In this work, 32 preprocessing strategies for NIR data were systematically combined and compared, and 11 similarity analysis methods were evaluated to identify the best‐performing combination. Using the rationality of the similarity evaluation as the assessment criterion, the combination “standard normal variate (SNV) + first‐order derivative by Savitzky–Golay (1D/SG) + maximum–minimum scaling (MMS) + spectral similarity by combinatorial strategy (SS/CS)” was ultimately selected as the most effective for similarity evaluation. The preprocessing steps (SNV scattering correction, 1D/SG, and MMS) eliminate scattering effects, enhance the signal‐to‐noise ratio (SNR), help distinguish overlapping peaks, and improve data comparability. Widely used similarity measures, such as Euclidean distance, the correlation coefficient, and information divergence, were then applied, and the rationality of their results was analyzed and compared. The proposed evaluation strategy effectively distinguishes tobacco samples from 10 different categories. The similarity between typical samples of the same class is above 0.9, whereas the similarity between samples of different classes is below 0.7. In method validation on real applications, carried out with complex tobacco materials for formulation replacement and optimization, the recognition precision for tobacco samples blended with interfering mixtures reaches 5%.
These satisfactory results demonstrate a robust combinatorial strategy (CS) that outperforms traditional single‐method approaches in resolving weak spectral differences, as shown in real‐world tobacco formulation replacement applications. The strategy can be applied broadly to NIR‐based similarity evaluation in areas such as pharmaceuticals, food quality control, and environmental monitoring.
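The preprocessing chain named in the abstract (SNV, then 1D/SG, then MMS) followed by a similarity score can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' implementation: the 5‐point quadratic Savitzky–Golay window, the per‐spectrum min–max scaling, the use of the population standard deviation in SNV, and the 1/(1 + d) mapping from Euclidean distance to a similarity are all choices made here for illustration. The abstract does not define SS/CS, so the correlation coefficient and Euclidean distance it mentions are used as stand‐ins. `synthetic_spectrum` is a hypothetical helper that generates toy data.

```python
import math
from statistics import mean, pstdev

# 5-point quadratic Savitzky-Golay first-derivative coefficients (unit spacing).
# The window size is an assumption; the abstract does not state it.
SG_D1_5PT = (-0.2, -0.1, 0.0, 0.1, 0.2)


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    m = mean(spectrum)
    s = pstdev(spectrum)  # population std; an assumption
    if s == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - m) / s for x in spectrum]


def sg_first_derivative(spectrum):
    """First derivative via the 5-point SG filter; the two edge points on each side are dropped."""
    half = len(SG_D1_5PT) // 2
    return [
        sum(c * spectrum[i + k - half] for k, c in enumerate(SG_D1_5PT))
        for i in range(half, len(spectrum) - half)
    ]


def min_max_scale(values):
    """Maximum-minimum scaling of one spectrum to [0, 1] (per-spectrum scaling is an assumption)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("min-max scaling is undefined for a constant signal")
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(spectrum):
    """Apply SNV, then the SG first derivative, then min-max scaling."""
    return min_max_scale(sg_first_derivative(snv(spectrum)))


def correlation_similarity(a, b):
    """Pearson correlation coefficient between two preprocessed spectra."""
    ma, mb = mean(a), mean(b)
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den


def euclidean_similarity(a, b):
    """Map Euclidean distance d to (0, 1] via 1 / (1 + d); this mapping is an assumption."""
    return 1.0 / (1.0 + math.dist(a, b))


def synthetic_spectrum(center, gain=1.0, offset=0.0, n=100, width=8.0):
    """Hypothetical toy spectrum: one Gaussian band with multiplicative gain and additive offset."""
    return [offset + gain * math.exp(-((i - center) / width) ** 2) for i in range(n)]


if __name__ == "__main__":
    same_a = preprocess(synthetic_spectrum(center=40))
    same_b = preprocess(synthetic_spectrum(center=40, gain=1.5, offset=0.3))
    other = preprocess(synthetic_spectrum(center=60))
    print("same band, different gain/offset:", correlation_similarity(same_a, same_b))
    print("different band position:", correlation_similarity(same_a, other))
```

In this toy setup, two spectra that differ only by a gain and an offset become nearly identical after SNV, so their similarity is close to 1, while a spectrum with a shifted band scores lower. Real NIR data would need window sizes and similarity thresholds tuned to the instrument and sample set.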
Details
Similarity;
Quality standards;
Data processing;
Product quality;
Quality control;
Wavelet transforms;
Spectrum analysis;
Scattering;
Combinatorial analysis;
Data smoothing;
Near infrared radiation;
Environmental monitoring;
Optimization;
Raw materials;
Tobacco;
Correlation coefficients;
Euclidean geometry
; Jiang, Hui 2
; Ling, Jun 1
; Wen, Liliang 2
; Yan, Keliang 1
; Chen, Aiming 2
; Zeng, Zhongda 3
; Wang, Miaomiao 4
; Yang, Qianxu 1
1 R & D Center, China Tobacco Yunnan Industrial Co. Ltd., Kunming, Yunnan, China
2 Department of R & D, Dalian Chem Data Solution Information Technology Co. Ltd., Dalian, Liaoning, China
3 College of Environmental and Chemical Engineering, Dalian University, Dalian, Liaoning, China
4 Xinjiang Science & Technology Resource Sharing Service Center, Xinjiang Key Laboratory of Featured Functional Food Nutrition and Safety Testing, Kexue North Road 374, Urumqi, Xinjiang Uygur, China