© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Social media networks have grown exponentially over the last two decades, giving internet users the opportunity to communicate and exchange ideas on a variety of topics. As a result, opinion mining plays a crucial role in analyzing user opinions and applying them to guide choices, making it one of the most popular research areas in natural language processing. Although several languages, including English, have been studied extensively, comparatively little work has been conducted on Arabic. The morphological complexity and the many dialects of the language make semantic analysis particularly challenging. Moreover, the lack of accurate pre-processing tools and limited resources are constraining factors. This study was motivated by the success of deep learning algorithms and word embeddings in English sentiment analysis. Extensive supervised machine learning experiments were conducted in which word embeddings were exploited to determine the sentiment of Arabic reviews. Three deep learning models were introduced: convolutional neural networks (CNNs), long short-term memory (LSTM), and a hybrid CNN-LSTM. The models used features learned by word embeddings such as Word2Vec and fastText rather than hand-crafted features. The models were tested on two benchmark Arabic datasets, the Hotel Arabic Reviews Dataset (HARD) for hotel reviews and the Large-Scale Arabic Book Reviews (LABR) dataset for book reviews, with different setups. Comparative experiments applied the three models with the two word embeddings to the different dataset setups. The main novelty of this study is its exploration of the effectiveness of various word embeddings and of different setups of the benchmark datasets: balanced versus imbalanced, and binary versus multi-class classification.
Findings showed that, in most cases, the best results were obtained with the fastText word embedding on the HARD 2-imbalance dataset for all three proposed models: CNN, LSTM, and CNN-LSTM. Further, on the benchmark HARD dataset with fastText, the proposed CNN model outperformed the LSTM and CNN-LSTM models, with accuracies of 94.69%, 94.63%, and 94.54% for CNN, LSTM, and CNN-LSTM, respectively. Although the worst results were obtained for the LABR 3-imbalance dataset using both Word2Vec and fastText, they still outperformed other researchers' state-of-the-art results on the same dataset.
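
The dataset setups named above (2-class or 3-class, balanced or imbalanced) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The star-rating thresholds, the choice to drop neutral reviews in the binary setup, the use of random undersampling for balancing, and the names `rating_to_label` and `build_setup` are all assumptions made here for illustration only.

```python
import random

def rating_to_label(rating, n_classes):
    """Map a 1-5 star rating to a sentiment label.

    Assumed thresholds (not taken from the article): 1-2 negative,
    3 neutral, 4-5 positive.
    """
    if n_classes not in (2, 3):
        raise ValueError("n_classes must be 2 or 3")
    if rating == 3:
        # Assumption: neutral reviews are dropped in the binary setup.
        return "neutral" if n_classes == 3 else None
    return "positive" if rating > 3 else "negative"

def build_setup(reviews, n_classes, balanced, seed=0):
    """Return (text, label) pairs for one hypothetical dataset setup."""
    labelled = [(text, rating_to_label(r, n_classes)) for text, r in reviews]
    labelled = [(t, y) for t, y in labelled if y is not None]
    if not balanced:
        return labelled  # imbalanced setup: keep the natural class sizes
    # Balanced setup: undersample every class to the smallest class size.
    by_class = {}
    for t, y in labelled:
        by_class.setdefault(y, []).append((t, y))
    smallest = min(len(items) for items in by_class.values())
    rng = random.Random(seed)
    sampled = []
    for items in by_class.values():
        sampled.extend(rng.sample(items, smallest))
    rng.shuffle(sampled)
    return sampled

# Toy stand-in for a review corpus: (text, star rating) pairs.
reviews = [("review %d" % i, r)
           for i, r in enumerate([5, 5, 5, 5, 4, 4, 3, 3, 2, 1])]
two_imbalanced = build_setup(reviews, n_classes=2, balanced=False)
three_balanced = build_setup(reviews, n_classes=3, balanced=True)
```

In this toy corpus the binary imbalanced setup keeps six positive and two negative reviews, while the three-class balanced setup keeps two reviews per class. Each resulting list of (text, label) pairs would then be tokenized, embedded, and passed to a classifier.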

Details

Title
Arabic Sentiment Analysis Based on Word Embeddings and Deep Learning
Author
Elhassan, Nasrin 1; Varone, Giuseppe 2; Ahmed, Rami 1; Gogate, Mandar 3; Dashtipour, Kia 3; Almoamari, Hani 4; El-Affendi, Mohammed A 5; Bassam Naji Al-Tamimi 6; Albalwy, Faisal 7; Hussain, Amir 3

1 College of Computer Sciences and Information Technology, Sudan University of Science and Technology, Khartoum P.O. Box 407, Sudan; [email protected] (N.E.); [email protected] (R.A.)
2 Department of Physical Therapy, Movement and Rehabilitation Science, Northeastern University, Boston, MA 02115, USA; [email protected]
3 School of Computing, Edinburgh Napier University, Edinburgh EH10 5DT, UK; [email protected] (M.G.); [email protected] (K.D.)
4 Faculty of Computer and Information Systems, Islamic University of Madinah, Medina 42351, Saudi Arabia; [email protected]
5 Department of Computer Science, College of Computer and Information Sciences, Prince Sultan University, Riyadh 12435, Saudi Arabia; [email protected]
6 School of Computing and Digital Technology, Birmingham City University, Birmingham B4 7XG, UK; [email protected]
7 Department of Computer Science, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia; [email protected]; Division of Informatics, Imaging and Data Sciences, School of Health Sciences, Faculty of Biology, Medicine and Health, The University of Manchester, Stopford Building, Oxford Road, Manchester M13 9PL, UK
First page
126
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
2073-431X
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2829787828