Abstract

Although comments are crucial artifacts for understanding computer programs, relatively few studies examine their form, frequency, or authorship. Code comments are human-readable text that a compiler or interpreter ignores when executing the program. They serve multiple purposes, including describing a program’s functionality, explaining bugs or pending updates, and communicating with other developers. Writing good comments is considered a best practice in software engineering, yet the style and practice of comment writing remain understudied, especially for non-English comments. The Russian Comment Corpus (RCC) was born out of a desire to understand how Russian-speaking programmers write comments in programming code. This project proposes a new methodology for code comment corpus construction and implements it as a Python program that processes, filters, and stores files containing Russian comments. The resulting RCC contains 95,538 code comments from programs written in C#, Java, JavaScript, Kotlin, PHP, Python, Ruby, and SQL. The RCC methodology serves as a blueprint for developing future comment corpora to support studies in code comments, developer cognition, and natural language usage in programming. As a dataset, the Russian Comment Corpus is a foundational work for studying the Russian language as used in the context of computer programming.
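The filtering step mentioned in the abstract can be illustrated with a minimal sketch. This is not the thesis's implementation: the function name `extract_russian_comments`, the two regular expressions, and the rule that a comment counts as Russian when it contains at least one Cyrillic letter are all assumptions made for illustration. The sketch handles only single-line `#` and `//` comments and does not account for comment markers that appear inside string literals or URLs.

```python
import re

# Assumption: any character in the basic Cyrillic Unicode block marks a
# comment as Russian. The thesis may apply a stricter language check.
CYRILLIC = re.compile(r"[\u0400-\u04FF]")

# Assumption: only '#' and '//' line comments are recognized. Block comments
# and SQL '--' comments are out of scope for this sketch.
LINE_COMMENT = re.compile(r"(?:#|//)\s*(.*)$")


def extract_russian_comments(source: str) -> list[str]:
    """Return the text of line comments that contain Cyrillic characters."""
    comments = []
    for line in source.splitlines():
        match = LINE_COMMENT.search(line)
        # Keep the comment only if its text includes a Cyrillic letter.
        if match and CYRILLIC.search(match.group(1)):
            comments.append(match.group(1).strip())
    return comments


# Hypothetical input mixing Russian and English comments.
sample = "x = 1  # счётчик\ny = 2  # counter\n// проверка ввода\n"
russian_comments = extract_russian_comments(sample)
```

A production version would need a real tokenizer per language so that comment markers inside strings are not mistaken for comments.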

Details

1010268
Title
The Russian Comment Corpus
Number of pages
55
Publication year
2025
Degree date
2025
School code
0008
Source
MAI 86/11(E), Masters Abstracts International
ISBN
9798314875025
Advisor
Committee member
Riley, Kathleen; Ceballos, Luis Cerezo
University/institution
American University
Department
Computer Science
University location
United States -- District of Columbia
Degree
M.Sc.C.S.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31932286
ProQuest document ID
3202686343
Document URL
https://www.proquest.com/dissertations-theses/russian-comment-corpus/docview/3202686343/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic