Boosting Arabic text classification using hybrid deep learning approach

Abstract

As a significant natural language processing (NLP) task, Arabic text classification is essential for efficiently processing and analyzing Arabic-language content in various digital applications, such as information retrieval, sentiment analysis, and topic modeling. Deep learning architectures, such as convolutional neural networks (CNNs) and long short-term memory (LSTM) networks, have been widely used to categorize and organize language content accurately, making NLP tasks more autonomous and perceptive. In this paper, we develop a hybrid deep learning framework for Arabic text classification that combines the Inception CNN module (introduced in the GoogLeNet architecture) with LSTM, a variant of the recurrent neural network (RNN). The proposed system was trained and evaluated on two Arabic article datasets, SANAD and NADiA. Several variations of the model architecture were configured, trained, evaluated, and compared to identify the best architecture and hyperparameters. In our best experimental evaluation, the proposed hybrid system (Inception CNN with LSTM) achieved accuracies of 92% and 96% on the Akhbarona and AlKhaleej subsets of SANAD, respectively, and 92% on the entire SANAD dataset. Finally, a comparison with state-of-the-art models showed that our hybrid model outperformed the other architectures in this area of study, improving accuracy by 1% to 30% across the different datasets.
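The abstract describes the hybrid architecture only at a high level: an Inception-style convolutional stage whose output feeds an LSTM classifier. The following is a minimal, framework-free sketch of that data flow, not the authors' implementation. It assumes a GoogLeNet-style branch layout adapted to 1-D text (parallel width-1, width-3, and width-5 convolutions plus a max-pooling branch, concatenated along the channel axis), a single-layer LSTM whose final hidden state is passed to a softmax, and arbitrary small sizes. All function names and sizes are hypothetical, and the weights are random and untrained, so the output only demonstrates shapes and data flow.

```python
import math
import random

# Fixed seed so the random (untrained) weights are reproducible.
random.seed(0)


def rand_matrix(rows, cols, scale=0.1):
    """Random weight matrix as a list of rows (stand-in for trained weights)."""
    return [[random.uniform(-scale, scale) for _ in range(cols)] for _ in range(rows)]


def matvec(m, v):
    """Matrix-vector product for list-of-rows matrices."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def conv1d_same(seq, kernel_size, out_channels):
    """1-D convolution over time (zero 'same' padding, odd kernel) followed by ReLU."""
    pad = kernel_size // 2
    weights = [rand_matrix(out_channels, len(seq[0])) for _ in range(kernel_size)]
    out = []
    for t in range(len(seq)):
        acc = [0.0] * out_channels
        for k in range(kernel_size):
            src = t + k - pad
            if 0 <= src < len(seq):  # positions outside the sequence act as zeros
                for c, val in enumerate(matvec(weights[k], seq[src])):
                    acc[c] += val
        out.append([max(0.0, a) for a in acc])
    return out


def maxpool1d_same(seq, size=3):
    """Element-wise max over a sliding window of time steps (stride 1, same length)."""
    half = size // 2
    return [
        [max(col) for col in zip(*seq[max(0, t - half): t + half + 1])]
        for t in range(len(seq))
    ]


def inception_block(seq, branch_channels=4):
    """Parallel width-1, width-3, width-5 and pooling branches, concatenated per step."""
    b1 = conv1d_same(seq, 1, branch_channels)
    b3 = conv1d_same(conv1d_same(seq, 1, branch_channels), 3, branch_channels)
    b5 = conv1d_same(conv1d_same(seq, 1, branch_channels), 5, branch_channels)
    bp = conv1d_same(maxpool1d_same(seq, 3), 1, branch_channels)
    return [p + q + r + s for p, q, r, s in zip(b1, b3, b5, bp)]


def sigmoid(x):
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def lstm_last_hidden(seq, hidden):
    """Run a single-layer LSTM over seq and return the final hidden state."""
    W = rand_matrix(4 * hidden, len(seq[0]))  # input-to-gates weights
    U = rand_matrix(4 * hidden, hidden)       # hidden-to-gates weights
    b = [0.0] * (4 * hidden)
    h = [0.0] * hidden
    c = [0.0] * hidden
    for x in seq:
        z = [wx + uh + bb for wx, uh, bb in zip(matvec(W, x), matvec(U, h), b)]
        i = [sigmoid(v) for v in z[0:hidden]]                  # input gate
        f = [sigmoid(v) for v in z[hidden:2 * hidden]]         # forget gate
        g = [math.tanh(v) for v in z[2 * hidden:3 * hidden]]   # candidate cell
        o = [sigmoid(v) for v in z[3 * hidden:]]               # output gate
        c = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
        h = [ok * math.tanh(ck) for ok, ck in zip(o, c)]
    return h


def softmax(logits):
    """Stable softmax: subtract the max logit before exponentiating."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def hybrid_forward(token_ids, vocab_size=50, embed_dim=8,
                   branch_channels=4, hidden=6, num_classes=3):
    """Embedding -> Inception-style block -> LSTM -> softmax over classes."""
    embedding = rand_matrix(vocab_size, embed_dim, scale=0.5)
    seq = [embedding[t] for t in token_ids]
    features = inception_block(seq, branch_channels)  # T x (4 * branch_channels)
    h = lstm_last_hidden(features, hidden)
    return softmax(matvec(rand_matrix(num_classes, hidden), h))
```

Calling `hybrid_forward([3, 17, 8, 42, 5])` returns three class probabilities that sum to 1. In this layout, the parallel branches extract n-gram-like features at several widths from the same token window, and the LSTM then models word order over those combined features; a practical implementation would use a deep learning framework and learn the weights from labelled articles.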

Article Highlights

Proposes a hybrid model that combines the Inception module (a CNN architecture) and LSTM for Arabic text classification.

Research conducted on a low-resource language, Arabic, using the SANAD and NADiA datasets.

The proposed model achieved accuracies of 92% on SANAD and 89% on NADiA, outperforming the other architectures compared.

Details

Business indexing term

Subject:

Machine learning

Identifier / keyword

Arabic text classification (ATC); Convolutional neural network (CNN); Long short-term memory (LSTM); Inception module; SANAD dataset; NADiA dataset

Title

Boosting Arabic text classification using hybrid deep learning approach

Publication title

SN Applied Sciences; London

Pages

540

Publication year

2025

Publication date

Jun 2025

Publisher

Springer Nature B.V.

Place of publication

London

Country of publication

Netherlands

Publication subject

Sciences: Comprehensive Works

ISSN

25233963

e-ISSN

25233971

Source type

Scholarly Journal

Language of publication

English

Document type

Journal Article

Publication history

Online publication date

2025-05-25

Milestone dates

2024-12-28 (Received); 2025-04-23 (Accepted); 2025-04-23 (Registration)

First posting date

25 May 2025

DOI

https://doi.org/10.1007/s42452-025-07025-x

ProQuest document ID

3211745890

Document URL

https://www.proquest.com/scholarly-journals/boosting-arabic-text-classification-using-hybrid/docview/3211745890/se-2?accountid=208611

Last updated

2025-05-30

Database

ProQuest One Academic
