Bridging the Semantic Gap: An Ensemble Learning Framework With Textual Topic‐Raw Financial Feature Fusion to Enhance Fraud Detection in Chinese Markets

Abstract

As financial statement manipulation grows more complex and class imbalance remains severe, conventional fraud detection approaches that rely exclusively on quantitative financial data and traditional machine learning algorithms have shown clear limitations. To overcome these constraints, we propose an enhanced financial fraud detection model that applies advanced ensemble learning classifiers to combined features: textual information extracted from annual reports through natural language processing techniques, and structured financial data from corporate statements. Using a dataset of Chinese manufacturing firms listed between 2010 and 2019, we integrate textual topic indicators derived from the latent Dirichlet allocation (LDA) model with raw financial items to construct a comprehensive fraud detection system. Empirical results demonstrate the superiority of the combined textual and financial indicators, which achieve significant improvements: AUC increases by 1.5% for RUSBoost and 1.6% for XGBoost, alongside NDCG@K gains of 4.5% and 3.8%, respectively (p < 0.01). Further evaluation using precision, recall, and F1‐score confirms the robustness and practical effectiveness of the proposed model under imbalanced class distributions.
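
The abstract reports gains in NDCG@K, a ranking metric that rewards placing true fraud cases near the top of a list sorted by predicted fraud score. The sketch below shows one common binary-relevance formulation of that metric; it is an illustration only, and the abstract does not confirm that the authors use this exact variant. The function name `ndcg_at_k`, the gain `2^rel - 1`, the `log2` discount, and the sample labels and scores are all assumptions introduced here.

```python
import math


def ndcg_at_k(labels, scores, k):
    """NDCG@k for binary labels (1 = fraud, 0 = non-fraud).

    Assumed formulation (not taken from the paper):
    DCG@k = sum over the top-k ranks i of (2^rel_i - 1) / log2(i + 1).
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")

    # Rank firms by predicted fraud score, highest first.
    ranked = [y for _, y in sorted(zip(scores, labels),
                                   key=lambda pair: pair[0],
                                   reverse=True)]

    def dcg(rels):
        # enumerate() is 0-based, so rank i corresponds to log2(i + 2).
        return sum((2 ** r - 1) / math.log2(i + 2)
                   for i, r in enumerate(rels[:k]))

    # Ideal ordering: every true fraud case ranked ahead of every non-fraud case.
    ideal = dcg(sorted(labels, reverse=True))
    return dcg(ranked) / ideal if ideal > 0 else 0.0


# Hypothetical toy data: five firms, two of them fraudulent.
toy_labels = [1, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.1, 0.7, 0.2]
toy_ndcg = ndcg_at_k(toy_labels, toy_scores, k=2)
```

In this toy case one fraudulent firm is ranked first and the other third, so with k = 2 the score falls below 1; a ranking that places both fraudulent firms in the top two would score exactly 1.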

Details

Business indexing term

Subject:

Machine learning;
Annual reports;
Fraud prevention;
Financial statements;
Regulation of financial institutions;
Financial reporting;
Management Discussion & Analysis

Location

United States--US

Identifier / keyword

ensemble learning classifiers; financial fraud detection; latent Dirichlet allocation (LDA); textual topic indicator

Title

Bridging the Semantic Gap: An Ensemble Learning Framework With Textual Topic‐Raw Financial Feature Fusion to Enhance Fraud Detection in Chinese Markets

Author

Wei, Congying¹; Qian, Xiyuan¹

¹ School of Mathematics, East China University of Science and Technology, Shanghai, China, ecust.edu.cn

Publication title

Journal of Mathematics; Cairo

Volume

2025

Issue

Number of pages

Publication year

2025

Publication date

2025

Publisher

John Wiley & Sons, Inc.

Place of publication

Cairo

Country of publication

United States

Publication subject

Mathematics

ISSN

23144629

e-ISSN

23144785

Source type

Scholarly Journal

Language of publication

English

Document type

Journal Article

Publication history

Online publication date

2025-11-20

Milestone dates

2025-09-23 (manuscriptRevised); 2025-11-20 (publishedOnlineFinalForm); 2025-03-14 (manuscriptReceived); 2025-10-24 (manuscriptAccepted)

Publication history

First posting date

20 Nov 2025

DOI

https://doi.org/10.1155/jom/6643152

ProQuest document ID

3273641315

Document URL

https://www.proquest.com/scholarly-journals/bridging-semantic-gap-ensemble-learning-framework/docview/3273641315/se-2?accountid=208611

© 2025. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Last updated

2025-11-21

Database

ProQuest One Academic
