Abstract

As financial statement manipulation grows more complex and class imbalance remains severe, conventional fraud detection approaches that rely exclusively on quantitative financial data and traditional machine learning algorithms have shown clear limitations. To overcome these constraints, we propose an enhanced financial fraud detection model that applies advanced ensemble learning classifiers to combined features: textual information extracted from annual reports through natural language processing techniques and structured financial data from corporate statements. Using a dataset of Chinese manufacturing firms listed between 2010 and 2019, we integrate textual topic indicators derived from the latent Dirichlet allocation (LDA) model with raw financial items to construct a comprehensive fraud detection system. Empirical results demonstrate the superiority of the combined textual and financial indicators, which achieve significant improvements: AUC increases by 1.5% for RUSBoost and 1.6% for XGBoost, alongside NDCG@K gains of 4.5% and 3.8% (p < 0.01). Further evaluation using precision, recall, and F1-score confirms the robustness and practical effectiveness of the proposed model under imbalanced class distributions.
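The abstract reports NDCG@K gains without stating how the metric is computed. The following is a minimal sketch of one common formulation, assuming binary relevance (1 = fraudulent firm-year, 0 = non-fraudulent) and a log2 position discount; the function names `dcg_at_k` and `ndcg_at_k` are illustrative, and the paper's exact definition and choice of K may differ.

```python
import math


def dcg_at_k(labels_in_rank_order, k):
    """Discounted cumulative gain over the first k ranked positions."""
    return sum(
        rel / math.log2(pos + 2)  # pos is 0-based, so rank 1 has discount log2(2) = 1
        for pos, rel in enumerate(labels_in_rank_order[:k])
    )


def ndcg_at_k(scores, labels, k):
    """NDCG@k for binary labels (assumption: 1 = fraud, 0 = non-fraud)."""
    if k <= 0:
        raise ValueError("k must be positive")
    # Rank observations from most to least suspicious according to the model.
    ranked = [
        lab
        for _, lab in sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    ]
    # The ideal ranking places every fraud case ahead of every non-fraud case.
    ideal_dcg = dcg_at_k(sorted(labels, reverse=True), k)
    # With no fraud cases at all the metric is undefined; return 0.0 by convention.
    return dcg_at_k(ranked, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

Under this formulation, a model that places every fraud case at the top of its ranking scores 1.0, and the score falls as fraud cases slip below position K, which is why the metric suits settings where only the most suspicious firms can be investigated.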

Details

1009240
Title
Bridging the Semantic Gap: An Ensemble Learning Framework With Textual Topic‐Raw Financial Feature Fusion to Enhance Fraud Detection in Chinese Markets
Author
Wei, Congying 1 ; Qian, Xiyuan 1

1 School of Mathematics, East China University of Science and Technology, Shanghai, China, ecust.edu.cn
Publication title
Volume
2025
Issue
1
Number of pages
17
Publication year
2025
Publication date
2025
Publisher
John Wiley & Sons, Inc.
Place of publication
Cairo
Country of publication
United States
Publication subject
ISSN
23144629
e-ISSN
23144785
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-11-20
Milestone dates
2025-03-14 (manuscriptReceived); 2025-09-23 (manuscriptRevised); 2025-10-24 (manuscriptAccepted); 2025-11-20 (publishedOnlineFinalForm)
   First posting date
20 Nov 2025
ProQuest document ID
3273641315
Document URL
https://www.proquest.com/scholarly-journals/bridging-semantic-gap-ensemble-learning-framework/docview/3273641315/se-2?accountid=208611
Copyright
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-21
Database
ProQuest One Academic