
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Arabic is one of the official languages recognized by the United Nations (UN) and is widely used in the Middle East, in parts of Asia and Africa, and elsewhere. Social media activity currently dominates textual communication on the Internet and potentially represents people’s views on specific issues. Opinion mining is an important task for understanding the polarity of public opinion towards an issue, and understanding public opinion leads to better decisions in many fields, such as public services and business. Language background plays a vital role in understanding opinion polarity: variation arises not only from vocabulary but also from cultural background. A sentence is a sequential signal, much like a time series; therefore, word order contributes significantly to the meaning of the text. A recurrent neural network (RNN) is a class of deep learning model that takes this sequence into account. Long short-term memory (LSTM) is a type of RNN whose gates decide which word signals to keep or discard as a sequence of inputs is processed. Text is unstructured data, and a machine cannot process it further until an algorithm transforms it into a machine-readable format: a vector of numerical values. Transformation algorithms range from the Term Frequency–Inverse Document Frequency (TF-IDF) transform to advanced word embeddings; word embedding methods include GloVe, word2vec, BERT, and fastText. This research experimented with these algorithms to transform the Arabic text dataset into vectors. This study implements and compares the GloVe and fastText word embedding algorithms combined with LSTM networks in single-, double-, and triple-layer architectures, and compares their accuracy for opinion mining on an Arabic dataset. It evaluates the proposed approach on the ASAD dataset of 55,000 tweets annotated with three classes.
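To make the LSTM gating and the single-, double-, and triple-layer stacking more concrete, here is a minimal NumPy sketch of a stacked LSTM forward pass. It is not the authors' implementation: the gate ordering, the 300-dimensional input vectors (random stand-ins for GloVe or fastText embeddings), the 64 hidden units, and the untrained three-class softmax head are illustrative assumptions, and `init_lstm`, `lstm_layer`, and `stacked_lstm` are hypothetical names.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def init_lstm(input_dim, hidden_dim, rng):
    """Random (untrained) weights for one LSTM layer.
    Gate blocks are stacked as [input, forget, candidate, output] (an assumed layout)."""
    scale = 1.0 / np.sqrt(hidden_dim)
    W = rng.uniform(-scale, scale, size=(4 * hidden_dim, input_dim))
    U = rng.uniform(-scale, scale, size=(4 * hidden_dim, hidden_dim))
    b = np.zeros(4 * hidden_dim)
    return W, U, b

def lstm_layer(xs, params):
    """Run one LSTM layer over xs of shape (T, input_dim); return all hidden states (T, hidden_dim)."""
    W, U, b = params
    n = U.shape[1]
    h = np.zeros(n)
    c = np.zeros(n)
    outputs = []
    for x in xs:
        z = W @ x + U @ h + b
        i = sigmoid(z[:n])           # input gate: how much of the new word signal to write
        f = sigmoid(z[n:2 * n])      # forget gate: how much of the old memory to keep
        g = np.tanh(z[2 * n:3 * n])  # candidate memory content from the current word
        o = sigmoid(z[3 * n:])       # output gate: how much memory to expose as h
        c = f * c + i * g
        h = o * np.tanh(c)
        outputs.append(h)
    return np.stack(outputs)

def stacked_lstm(xs, layers):
    """Feed each layer's hidden-state sequence into the next layer (1, 2, or 3 layers)."""
    for params in layers:
        xs = lstm_layer(xs, params)
    return xs[-1]  # final hidden state summarizes the whole sequence

# Hypothetical sizes: 300-d word vectors, 64 hidden units, 3 classes, 3 stacked layers.
rng = np.random.default_rng(0)
embed_dim, hidden_dim, n_classes, n_layers = 300, 64, 3, 3
sentence = rng.normal(size=(5, embed_dim))  # random stand-in for 5 embedded tokens
dims = [embed_dim] + [hidden_dim] * n_layers
layers = [init_lstm(dims[k], dims[k + 1], rng) for k in range(n_layers)]
h_final = stacked_lstm(sentence, layers)

# Untrained softmax head over positive / negative / neutral.
W_out = rng.normal(size=(n_classes, hidden_dim))
logits = W_out @ h_final
probs = np.exp(logits - logits.max())
probs /= probs.sum()
```

Setting `n_layers` to 1, 2, or 3 reproduces the three depths compared in the study; in practice the weights would be learned by backpropagation rather than drawn at random.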
The dataset was augmented to achieve equal proportions of positive, negative, and neutral classes. According to the evaluation results, the triple-layer LSTM with fastText word embedding achieved the best testing accuracy, at 90.9%, surpassing all other experimental scenarios.
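The abstract does not say which augmentation technique produced the equal class proportions, so the sketch below uses random oversampling (duplicating minority-class examples) as a simple, clearly different stand-in; text-level augmentation would instead generate new tweet variants. The function name `oversample_to_balance` and the placeholder samples are hypothetical.

```python
import random
from collections import Counter

def oversample_to_balance(samples, seed=0):
    """Random oversampling (assumed stand-in for the paper's augmentation):
    duplicate randomly chosen samples of each smaller class until every class
    matches the size of the largest one. samples: list of (text, label) pairs."""
    rng = random.Random(seed)
    by_label = {}
    for text, label in samples:
        by_label.setdefault(label, []).append((text, label))
    target = max(len(group) for group in by_label.values())
    balanced = []
    for group in by_label.values():
        balanced.extend(group)  # keep every original sample
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    rng.shuffle(balanced)
    return balanced

# Placeholder data: 2 positive, 1 negative, 5 neutral samples.
samples = (
    [(f"pos_{k}", "positive") for k in range(2)]
    + [("neg_0", "negative")]
    + [(f"neu_{k}", "neutral") for k in range(5)]
)
balanced = oversample_to_balance(samples)
counts = Counter(label for _, label in balanced)
print(counts)  # each class now has 5 samples
```

Balancing should be applied only to the training split, so that duplicated examples do not leak into the test set and inflate testing accuracy.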

Details

Title
Arabic Language Opinion Mining Based on Long Short-Term Memory (LSTM)
Author
Setyanto, Arief 1; Laksito, Arif 2; Alarfaj, Fawaz 3; Alreshoodi, Mohammed 4; Kusrini 1; Oyong, Irwan 2; Hayaty, Mardhiya 2; Alomair, Abdullah 3; Almusallam, Naif 3; Kurniasari, Lilis 5

1 Magister of Informatics Engineering, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia
2 Faculty of Computer Science, Universitas Amikom Yogyakarta, Yogyakarta 55281, Indonesia
3 Department of Computer and Information Sciences, Imam Mohammad Ibn Saud Islamic University, Riyadh 11564, Saudi Arabia
4 Department of Natural Applied Science, Applied College, Qassim University, Buraydah 52571, Saudi Arabia
5 Department of Electrical Engineering, Universitas Nahdlatul Ulama Yogyakarta, Yogyakarta 55162, Indonesia
First page
4140
Publication year
2022
Publication date
2022
Publisher
MDPI AG
e-ISSN
20763417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2662926368