© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

This paper comprehensively examines the development and automation of supervised end-to-end machine learning workflows for forecasting and classification. It gives a complete overview of the components (e.g., feature engineering and model selection), principles (e.g., bias–variance decomposition, model complexity, overfitting, model sensitivity to feature assumptions and scaling, and output interpretability), models (e.g., neural networks and regression models), methods (e.g., cross-validation and data augmentation), metrics (e.g., Mean Squared Error and F1-score), and tools that govern most supervised learning applications with numerical and categorical data, as well as their integration, automation, and deployment. The goal and contribution of this paper is to educate and guide the non-AI-expert academic community on complete and rigorous machine learning workflows and data science practices, from problem scoping to design and state-of-the-art automation tools, including the basic principles and reasoning behind the choice of methods. The paper covers the critical stages of supervised machine learning workflow development, many of which researchers often omit, along with the foundational concepts needed to understand and optimize a functional machine learning workflow, giving applied researchers who are not AI experts a holistic view of task-specific application development. It may be of significant value to academic researchers developing and prototyping machine learning workflows for their own research or as customer-tailored solutions for government and industry partners.

Details

Title
Fundamental Components and Principles of Supervised Machine Learning Workflows with Numerical and Categorical Data
Author
Kampezidou, Styliani I 1; Ray, Archana Tikayat 2; Bhat, Anirudh Prabhakara 3; Pinon Fischer, Olivia J 1; Mavris, Dimitri N 1

1 Aerospace Systems Design Laboratory, School of Aerospace Engineering, Georgia Institute of Technology, Atlanta, GA 30332, USA; [email protected] (O.J.P.F.); [email protected] (D.N.M.)
2 AI Fusion Technologies, Toronto, ON M5V 3Z5, Canada; [email protected]
3 Amazon, Toronto, ON M5H 4A9, Canada; [email protected]
First page
384
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
2673-4117
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2998847468