Preprocessing of infrared spectra can significantly improve predictive accuracy for protein, carbohydrate, lipid, or other nutritional components, yet selecting the optimal preprocessing is typically empirical, tedious, and dataset-specific. This study introduces a Bayesian optimization-based framework for the automated selection of optimal spectral preprocessing pipelines within a chemometric modeling context. The framework was applied to mid-infrared spectra of milk to predict fat, protein, lactose, and total solids. A total of 385 averaged spectra corresponding to 198 unique samples were split 70/30 (training/test) using a group-aware Kennard-Stone algorithm, resulting in 269 averaged spectra (135 unique samples) for training and 116 spectra (58 unique samples) for testing. Six regression models (Elastic Net, Gradient Boosting Machines (GBM), Partial Least Squares (PLS), RidgeCV Regression, LassoLarsCV, and Support Vector Regression (SVR)) were evaluated under three preprocessing conditions: (1) no preprocessing, (2) literature-derived custom preprocessing (e.g., MSC, SNV, and first and second derivatives), and (3) optimized preprocessing via the proposed Bayesian framework. Optimized preprocessing consistently outperformed the other two conditions, with RidgeCV achieving the best performance for all components except lactose, where PLS slightly outperformed it. Improvements in predictive accuracy, particularly in terms of RMSEP, were observed across all milk components. The best RMSEP results were achieved for protein (RMSEP = 0.054,
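The building blocks named in the abstract can be illustrated with a minimal, standard-library-only sketch. It assumes a spectrum is a plain list of absorbance values. All function and variable names here (`snv`, `first_difference`, `rmsep`, `group_split`, `PIPELINES`) are hypothetical and are not taken from the study. The sketch does not reproduce the group-aware Kennard-Stone selection or the Bayesian optimizer: the split only shows the group-aware constraint (all spectra of one sample stay on the same side), and the plain adjacent-point difference stands in for whatever derivative filter the authors used.

```python
import math
from statistics import mean, stdev


def snv(spectrum):
    """Standard normal variate: center and scale one spectrum by its own mean and std."""
    m = mean(spectrum)
    s = stdev(spectrum)  # sample standard deviation (n - 1 denominator)
    return [(x - m) / s for x in spectrum]


def first_difference(spectrum):
    """Crude first derivative: difference between adjacent points (assumed stand-in)."""
    return [b - a for a, b in zip(spectrum, spectrum[1:])]


def rmsep(y_true, y_pred):
    """Root mean squared error of prediction on held-out samples."""
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(mean(squared_errors))


def group_split(sample_ids, test_ids):
    """Return (train_rows, test_rows) so that every spectrum of a sample lands on one side."""
    train = [i for i, g in enumerate(sample_ids) if g not in test_ids]
    test = [i for i, g in enumerate(sample_ids) if g in test_ids]
    return train, test


def apply_pipeline(spectrum, steps):
    """Apply preprocessing steps in order; an empty list means no preprocessing."""
    for step in steps:
        spectrum = step(spectrum)
    return spectrum


# Hypothetical candidate pipelines; an optimizer would search over a space like this one.
PIPELINES = {
    "raw": [],
    "snv": [snv],
    "snv+d1": [snv, first_difference],
}

if __name__ == "__main__":
    base = [1.0, 2.0, 3.0, 4.0, 5.0]
    shifted = [2.0 * x + 10.0 for x in base]  # same shape, different offset and scale
    for name, steps in PIPELINES.items():
        print(name, apply_pipeline(shifted, steps))
    print(group_split(["a", "a", "b", "c", "c"], {"b"}))
```

In this toy setup, SNV maps `base` and `shifted` to the same values, which is the kind of additive and multiplicative variation such preprocessing is meant to remove. An automated search would score each candidate pipeline by cross-validated error on the training groups and reserve RMSEP on the test groups for the final comparison.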
Details
Datasets;
Regression analysis;
Regression models;
Optimization;
Infrared analysis;
Homogenization;
Training;
Automation;
Peptides;
Oils & fats;
Infrared spectra;
Dietary minerals;
Proteins;
Milk;
Carbohydrates;
Preprocessing;
Bayesian analysis;
Spectrum analysis;
Support vector machines;
Lactose;
Pipelines;
Process controls;
Effectiveness;
Lipids;
Meat quality;
Mathematical models;
Infrared radiation;
Chemometrics;
Uniqueness
; McDougal, Owen M 2
; Andersen, Timothy 1
1 Computer Science, Boise State University, Boise, ID 83725, USA
2 Department of Chemistry and Biochemistry, Boise State University, Boise, ID 83725, USA