Full text

Turn on search term navigation

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The world prevalence of the two types of authorized and fraudulent transactions makes it difficult to distinguish between the two operations. The small percentage of fraudulent transactions, in turn, gives rise to the class imbalance problem. Hence, an adequately robust fraud detection mechanism must exist for tax systems to avoid their collapse. It has become significantly difficult to obtain any dataset, specifically a tax return dataset, because of the rising importance of privacy in a society where people generally feel squeamish about sharing personal information. Because of this, we arrive at the decision to synthesize our dataset by employing publicly available data, as well as enhance them through Correlational Generative Adversarial Networks (CGANs) and the Synthetic Minority Oversampling Technique (SMOTE). The proposed method includes a preprocessing stage to denoise the data and identify anomalies, outliers, and dimensionality reduction. Then the data have undergone enhancement using the SMOTE and the proposed CGAN techniques. A unique encoder design has been proposed, which serves the purpose of exposing the hidden patterns among legitimate and fraudulent records. This research found anomalous deductions, income inconsistencies, recurrent transaction manipulations, and irregular filing practices that distinguish fraudulent from valid tax records. These patterns are identified by encoder-based feature extraction and synthetic data augmentation. Several machine learning classifiers, along with a voting ensemble technique, have been used both with and without data augmentation. Experimental results have shown that the proposed Soft-Voting technique outperformed the original without an ensemble method.

Details

Title
Advanced Tax Fraud Detection: A Soft-Voting Ensemble Based on GAN and Encoder Architecture
Author
Alrasheedi, Masad A 1   VIAFID ORCID Logo  ; Ijaz, Samia 2 ; Alrashdi, Ayed M 3   VIAFID ORCID Logo  ; Seung-Won, Lee 4   VIAFID ORCID Logo 

 Department of Management Information Systems, College of Business Administration, Taibah University, Al-Madinah Al-Munawara 42353, Saudi Arabia; [email protected] 
 Department of Computer Science, HITEC University, Taxila 47080, Pakistan 
 Department of Electrical Engineering, College of Engineering, University of Ha’il, Ha’il 81441, Saudi Arabia; [email protected] 
 Department of Precision Medicine, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea; Department of Metabiohealth, Sungkyunkwan University, Suwon 16419, Republic of Korea; Personalized Cancer Immunotherapy Research Center, Sungkyunkwan University School of Medicine, Suwon 16419, Republic of Korea; Department of Artificial Intelligence, Sungkyunkwan University, Suwon 16419, Republic of Korea 
First page
642
Publication year
2025
Publication date
2025
Publisher
MDPI AG
e-ISSN
22277390
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3171097573
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.