
Abstract

Preprocessing infrared spectra can significantly improve predictive accuracy for protein, carbohydrate, lipid, and other nutritional components, yet selecting the optimal preprocessing is typically empirical, tedious, and dataset-specific. This study introduces a Bayesian optimization-based framework for the automated selection of optimal spectral preprocessing pipelines within a chemometric modeling context. The framework was applied to mid-infrared spectra of milk to predict four compositional parameters: fat, protein, lactose, and total solids. A total of 385 averaged spectra corresponding to 198 unique samples were split 70/30 (training/test) using a group-aware Kennard-Stone algorithm, resulting in 269 averaged spectra (135 unique samples) for training and 116 spectra (58 unique samples) for testing. Six regression models, namely Elastic Net, Gradient Boosting Machines (GBM), Partial Least Squares (PLS), RidgeCV Regression, LassoLarsCV, and Support Vector Regression (SVR), were evaluated under three preprocessing conditions: (1) no preprocessing, (2) literature-derived custom preprocessing (e.g., MSC, SNV, and first and second derivatives), and (3) optimized preprocessing via the proposed Bayesian framework. Optimized preprocessing consistently outperformed the other two conditions, with RidgeCV achieving the best performance for all components except lactose, where PLS slightly outperformed it. Improvements in predictive accuracy, particularly in terms of RMSEP, were observed across all milk components. The best RMSEP results were achieved for protein (RMSEP = 0.054, R² = 0.981) and lactose (RMSEP = 0.026, R² = 0.917), followed by fat (RMSEP = 0.139, R² = 0.926) and total solids (RMSEP = 0.154, R² = 0.960). Literature-based pipelines were inconsistently effective, highlighting the limitations of transferring preprocessing methods between datasets.
The Bayesian optimization approach identified relatively simple yet highly effective preprocessing pipelines, typically involving only a few steps. By eliminating manual trial and error, this data-driven strategy offers a robust and generalizable solution that streamlines spectral modeling in dairy analysis and can be readily applied to other types of spectroscopic data across various domains.
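
As a rough illustration of two quantities named in the abstract, the sketch below implements standard normal variate (SNV) scaling of a single spectrum and the root mean square error of prediction (RMSEP) in plain Python. It is not the authors' code: the function names `snv` and `rmsep`, the use of the sample standard deviation, and all numeric values are assumptions made for this example only.

```python
import math


def snv(spectrum):
    """Standard normal variate: center one spectrum and scale it to unit standard deviation."""
    n = len(spectrum)
    mean = sum(spectrum) / n
    # Assumption: sample standard deviation (n - 1 denominator), a common SNV convention.
    std = math.sqrt(sum((x - mean) ** 2 for x in spectrum) / (n - 1))
    if std == 0:
        raise ValueError("SNV is undefined for a constant spectrum")
    return [(x - mean) / std for x in spectrum]


def rmsep(y_true, y_pred):
    """Root mean square error of prediction over a held-out test set."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = ((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(sum(squared_errors) / len(y_true))


# Hypothetical values for illustration only; not data from the study.
raw = [0.10, 0.20, 0.30, 0.40, 0.50]
scaled = snv(raw)

reference = [3.2, 3.4, 3.1]   # e.g. reference protein values
predicted = [3.3, 3.3, 3.1]   # e.g. model predictions for the same samples
error = rmsep(reference, predicted)
```

Because SNV removes each spectrum's own offset and scale, applying it to `raw` and to a shifted and rescaled copy of `raw` should give the same result. This is the kind of additive and multiplicative variation such preprocessing is meant to suppress.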

Details

Title
Automated Spectral Preprocessing via Bayesian Optimization for Chemometric Analysis of Milk Constituents
Author
Babatunde Habeeb Abolaji 1; McDougal, Owen M 2; Andersen, Timothy 1

1 Computer Science, Boise State University, Boise, ID 83725, USA
2 Department of Chemistry and Biochemistry, Boise State University, Boise, ID 83725, USA
Publication title
Foods; Basel
Volume
14
Issue
17
First page
2996
Number of pages
29
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
e-ISSN
2304-8158
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-08-27
Milestone dates
2025-07-14 (Received); 2025-08-19 (Accepted)
   First posting date
27 Aug 2025
ProQuest document ID
3249681403
Document URL
https://www.proquest.com/scholarly-journals/automated-spectral-preprocessing-via-bayesian/docview/3249681403/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-09-12
Database
ProQuest One Academic