© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

We introduce the first study of the automatic detoxification of Russian texts to combat offensive language. This kind of textual style transfer can be used for processing toxic content on social media or for eliminating toxicity in automatically generated texts. While much work has been done for the English language in this field, there are no works on detoxification for the Russian language. We suggest two types of models—an approach based on BERT architecture that performs local corrections and a supervised approach based on a pretrained GPT-2 language model. We compare these methods with several baselines. In addition, we provide the training datasets and describe the evaluation setup and metrics for automatic and manual evaluation. The results show that the tested approaches can be successfully used for detoxification, although there is room for improvement.

Details

Title
Methods for Detoxification of Texts for the Russian Language
Author
Dementieva, Daryna 1; Moskovskiy, Daniil 1; Logacheva, Varvara 1; Dale, David 1; Kozlova, Olga 2; Semenov, Nikita 2; Panchenko, Alexander 1

1 Skolkovo Institute of Science and Technology, 121205 Moscow, Russia; [email protected] (D.M.); [email protected] (V.L.); [email protected] (D.D.); [email protected] (A.P.)
2 Mobile TeleSystems (MTS), 109147 Moscow, Russia; [email protected] (O.K.); [email protected] (N.S.)
First page
54
Publication year
2021
Publication date
2021
Publisher
MDPI AG
e-ISSN
24144088
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2576471144