Full text

Turn on search term navigation

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Imbalanced datasets pose pervasive challenges in numerous machine learning (ML) applications, notably in areas such as fraud detection, where fraudulent cases are vastly outnumbered by legitimate transactions. Conventional ML methods often grapple with such imbalances, resulting in models with suboptimal performance concerning the minority class. This study undertakes a thorough examination of strategies for optimizing supervised learning algorithms when confronted with imbalanced datasets, emphasizing resampling techniques. Initially, we explore multiple methodologies, encompassing Gaussian Naive Bayes, linear and quadratic discriminant analysis, K-nearest neighbors (K-NN), support vector machines (SVMs), decision trees, and multi-layer perceptron (MLP). We apply these on a four-class spiral dataset, a notoriously demanding non-linear classification problem, to gauge their effectiveness. Subsequently, we leverage the garnered insights for a real-world credit card fraud detection task on a public dataset, where we achieve a compelling accuracy of 99.937%. In this context, we compare and contrast the performances of undersampling, oversampling, and the synthetic minority oversampling technique (SMOTE). Our findings highlight the potency of resampling strategies in augmenting model performance on the minority class; in particular, oversampling techniques achieve the best performance, resulting in an accuracy of 99.928% with a significantly low number of false negatives (21/227,451).

Details

Title
Optimizing Neural Networks for Imbalanced Data
Author
de Zarzà, I 1   VIAFID ORCID Logo  ; de Curtò, J 1   VIAFID ORCID Logo  ; Calafate, Carlos T 2   VIAFID ORCID Logo 

 Informatik und Mathematik, GOETHE–University Frankfurt am Main, 60323 Frankfurt am Main, Germany; [email protected]; Departamento de Informática de Sistemas y Computadores, Universitat Politècnica de València, 46022 València, Spain; [email protected]; Estudis d’Informàtica, Multimèdia i Telecomunicació, Universitat Oberta de Catalunya, 08018 Barcelona, Spain 
 Departamento de Informática de Sistemas y Computadores, Universitat Politècnica de València, 46022 València, Spain; [email protected] 
First page
2674
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
20799292
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2829796433
Copyright
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.