Full Text

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Hate speech on social media can spread quickly among online users and may subsequently escalate into local violence and heinous crimes. This paper proposes a hate speech detection model that combines machine learning with text mining feature extraction techniques. The authors collected English-Odia code-mixed hate speech data from a public Facebook page and manually organized it into three classes. To obtain a binary dataset alongside the ternary one, the three-class data were further converted into two classes. Hate speech is modeled by pairing a machine learning algorithm with a feature extraction method. Support vector machine (SVM), naïve Bayes (NB) and random forest (RF) models were trained on the whole dataset, for both the binary and ternary versions, using features based on word unigrams, bigrams, trigrams, combined n-grams, term frequency-inverse document frequency (TF-IDF), combined n-grams weighted by TF-IDF, and word2vec. With the two datasets, two kinds of models were developed for each feature: binary models and ternary models. The SVM models with word2vec features outperformed the NB and RF models in both the binary and ternary categories. The results also show that the ternary models exhibited less confusion between hate and non-hate speech than the binary models.
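One of the feature sets named in the abstract, combined word n-grams weighted by TF-IDF, can be illustrated with a minimal sketch. The function names, the whitespace tokenizer, the unsmoothed formula (tf = count / document length, idf = log(N / df)) and the toy English posts are all assumptions made for illustration; they are not taken from the article or its dataset, and libraries typically use smoothed variants.

```python
import math
from collections import Counter


def word_ngrams(text, n_values=(1, 2, 3)):
    """Return combined word n-grams (unigrams, bigrams, trigrams by default)."""
    tokens = text.lower().split()  # assumption: plain whitespace tokenization
    grams = []
    for n in n_values:
        for i in range(len(tokens) - n + 1):
            grams.append(" ".join(tokens[i:i + n]))
    return grams


def tfidf_vectors(docs, n_values=(1, 2, 3)):
    """Map each document to a sparse {n-gram: TF-IDF weight} dictionary."""
    grams_per_doc = [word_ngrams(doc, n_values) for doc in docs]

    # Document frequency: in how many documents each n-gram occurs.
    df = Counter()
    for grams in grams_per_doc:
        df.update(set(grams))

    n_docs = len(docs)
    vectors = []
    for grams in grams_per_doc:
        counts = Counter(grams)
        total = len(grams) or 1  # guard against empty documents
        vectors.append(
            {g: (c / total) * math.log(n_docs / df[g]) for g, c in counts.items()}
        )
    return vectors


# Hypothetical toy posts, not drawn from the English-Odia dataset.
toy_posts = ["you are good", "you are bad", "good people here"]
vectors = tfidf_vectors(toy_posts)
```

Sparse vectors of this kind would then be passed to a classifier such as an SVM, NB or RF model. An n-gram that occurs in every document receives a weight of zero under this unsmoothed formula, which is one reason smoothed variants are common in practice.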

Details

Title
Automatic Hate Speech Detection in English-Odia Code Mixed Social Media Data Using Machine Learning Techniques
Author
Mohapatra, Sudhir Kumar 1; Prasad, Srinivas 2; Bebarta, Dwiti Krishna 3; Das, Tapan Kumar 4; Srinivasan, Kathiravan 5; Yuh-Chung, Hu 6

1 Faculty of Emerging Technologies, Sri Sri University, Cuttack 754006, India
2 Department of Computer Science and Engineering, GITAM University, Visakhapatnam 530045, India
3 Department of Information Technology, Gayatri Vidya Parishad College of Engineering for Women, Vishakhapatnam 530048, India
4 School of Information Technology and Engineering, Vellore Institute of Technology, Vellore 632014, India
5 School of Computer Science and Engineering, Vellore Institute of Technology, Vellore 632014, India
6 Department of Mechanical and Electromechanical Engineering, National Ilan University, Yilan 26047, Taiwan
First page
8575
Publication year
2021
Publication date
2021
Publisher
MDPI AG
e-ISSN
20763417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2576378705