Abstract

This paper addresses the problem of named entity recognition in source code reviews. It provides a comparative analysis of existing approaches and proposes several methods for improving recognition quality. The proposed and implemented improvements include methods for handling data imbalance, improved tokenization of the input data, the use of large volumes of unlabeled data, and the use of additional binary classifiers. To assess quality, a new set of 3000 user code reviews was collected and manually labeled. The proposed improvements are shown to significantly increase performance as measured by quality metrics calculated both at the token level (+22%) and at the level of entire entities (+13%).

Details

Title
Named Entity Recognition for Code Review Comments
Publication title
Volume
50
Issue
7
Pages
511-523
Publication year
2024
Publication date
Dec 2024
Publisher
Springer Nature B.V.
Place of publication
New York
Country of publication
Netherlands
ISSN
03617688
e-ISSN
16083261
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2024-12-04
Milestone dates
2024-12-01 (Registration); 2024-03-05 (Received); 2024-03-15 (Accepted); 2024-03-15 (Rev-Recd)
First posting date
04 Dec 2024
ProQuest document ID
3140795493
Document URL
https://www.proquest.com/scholarly-journals/named-entity-recognition-code-review-comments/docview/3140795493/se-2?accountid=208611
Copyright
Copyright Springer Nature B.V. Dec 2024
Last updated
2024-12-05
Database
ProQuest One Academic