© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

Currently, a significant portion of published research on online hate speech relies on existing textual corpora. However, when examining a specific context, there is a lack of preexisting datasets that include the particularities associated with various conditions (e.g., geographic and cultural). This issue is evident in the case of online anti-immigrant speech in Mexico, where available data to study this emergent and often overlooked phenomenon are scarce. In light of this situation, we propose a novel methodology wherein three domain experts annotate a certain number of texts related to the subject. We establish a precise control mechanism based on these annotations to evaluate non-expert annotators. The evaluation of the contributors is implemented in a custom annotation platform, enabling us to conduct a controlled crowdsourcing campaign and assess the reliability of the obtained data. Our results demonstrate that a combination of crowdsourced and expert data leads to iterative improvements, not only in the accuracy achieved by various machine learning classification models (reaching 0.8828) but also in the models’ adaptation to the specific characteristics of hate speech in the Mexican Twittersphere context. In addition to these methodological innovations, the most significant contribution of our work is the creation of the first online Mexican anti-immigrant training corpus for machine-learning-based detection tasks.
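
The control mechanism described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the three expert annotations are combined or how contributors are scored, so the majority vote, the accuracy score, the `min_accuracy` threshold of 0.8, and all function names, identifiers, and labels below are assumptions chosen for illustration.

```python
from collections import Counter


def expert_consensus(expert_labels):
    """Return the majority expert label per text (assumed aggregation rule).

    Texts without a strict majority are left out of the control set.
    """
    gold = {}
    for text_id, labels in expert_labels.items():
        label, count = Counter(labels).most_common(1)[0]
        if count > len(labels) / 2:
            gold[text_id] = label
    return gold


def annotator_accuracy(annotations, gold):
    """Share of an annotator's answers on control texts that match the gold label.

    Returns None when the annotator labeled no control texts.
    """
    scored = [(t, label) for t, label in annotations.items() if t in gold]
    if not scored:
        return None
    return sum(gold[t] == label for t, label in scored) / len(scored)


def reliable_annotators(crowd, gold, min_accuracy=0.8):
    """Keep annotators whose control accuracy reaches the (hypothetical) threshold."""
    kept = {}
    for name, annotations in crowd.items():
        accuracy = annotator_accuracy(annotations, gold)
        if accuracy is not None and accuracy >= min_accuracy:
            kept[name] = accuracy
    return kept


# Made-up example: 1 = anti-immigrant speech, 0 = other (labels are illustrative).
expert_labels = {"t1": [1, 1, 0], "t2": [0, 0, 0], "t3": [1, 1, 1]}
crowd = {
    "ann_a": {"t1": 1, "t2": 0, "t3": 1, "t9": 0},  # "t9" is not a control text
    "ann_b": {"t1": 0, "t2": 1, "t3": 1},
}

gold = expert_consensus(expert_labels)
trusted = reliable_annotators(crowd, gold)
print(trusted)
```

In this sketch, only labels from contributors in `trusted` would be passed on to the training corpus; a real campaign might instead use an agreement statistic or per-class scores.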

Details

Title
High-Quality Data from Crowdsourcing towards the Creation of a Mexican Anti-Immigrant Speech Corpus
Author
Molina-Villegas, Alejandro 1; Cattin, Thomas 2; Gazca-Hernandez, Karina 3; Aldana-Bobadilla, Edwin 4

1 CONAHCYT, Mexico City 03940, Mexico; [email protected]; Centro de Investigación en Ciencias de Información Geoespacial, Mexico City 14240, Mexico; [email protected]
2 Centro de Investigación en Ciencias de Información Geoespacial, Mexico City 14240, Mexico; [email protected]; IFG Lab Centre de recherches et d’analyses géopolitiques, Université Paris 8, 93526 Saint-Denis, France
3 Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional—Unidad Tamaulipas, Ciudad Victoria, Tamaulipas 87130, Mexico; [email protected]
4 CONAHCYT, Mexico City 03940, Mexico; [email protected]; Centro de Investigación y de Estudios Avanzados del Instituto Politécnico Nacional—Unidad Tamaulipas, Ciudad Victoria, Tamaulipas 87130, Mexico; [email protected]
First page
8417
Publication year
2023
Publication date
2023
Publisher
MDPI AG
e-ISSN
20763417
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2842935592