Abstract

In this paper, we apply an information-theoretic method proposed by Ryabko and Savina (hereafter called the RS-method), which is based on data compression, to recognize a writer’s individual style in four languages from different language groups and families. The method was used to study fiction texts in Russian (East Slavic group of the Indo-European language family), Amharic (South Ethiosemitic group of the Semitic language family), Chinese (Sinitic group of the Sino-Tibetan language family) and English (West Germanic group of the Indo-European language family). It was found that the amount of data necessary for recognizing an author’s style is almost the same for all four languages, i.e., it is approximately invariant across different language groups. The results obtained are of interest to computer science, literary studies, linguistics and, in particular, computational linguistics.

Details

Title
The Amount of Data Required to Recognize a Writer’s Style Is Consistent Across Different Languages of the World
Author
Ryabko, Boris 1; Savina, Nadezhda 2; Getachew, Lulu Yeshewas 2; Han, Yunfei 2

1 Federal Research Center for Information and Computational Technologies, 630090 Novosibirsk, Russia
2 Department of Information Technologies, Novosibirsk State University, 630090 Novosibirsk, Russia; [email protected] (Y.G.L.); [email protected] (Y.H.)
Publication title
Entropy; Basel
Volume
27
Issue
10
First page
1039
Number of pages
13
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
10994300
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
 
 
Online publication date
2025-10-04
Milestone dates
2025-08-25 (Received); 2025-10-01 (Accepted)
First posting date
2025-10-04
ProQuest document ID
3265896178
Document URL
https://www.proquest.com/scholarly-journals/amount-data-required-recognize-writer-s-style-is/docview/3265896178/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-07
Database
ProQuest One Academic