Abstract

Background: Repositories of scholarly articles should provide authoritative information about the materials they distribute and should distribute those materials in keeping with pertinent laws. Doing so requires accurate information about the versions of the articles in a collection.

Analysis: This article presents a simple statistical model that classifies articles as author manuscripts or versions of record, with parameters trained on a collection of articles hand-annotated for version. Under cross-validation, the algorithm achieves about 94 percent accuracy on average.

Conclusion and implications: The average pairwise annotator agreement among a group of experts was 94 percent, indicating that the method developed in this article performs competitively with human experts.

Details

Title
Automatically Determining Versions of Scholarly Articles
Author
Rothchild, Daniel; Shieber, Stuart
Section
Articles
Publication year
2017
Publication date
2017
Publisher
Canadian Institute for Studies in Publishing Press, Simon Fraser University
e-ISSN
1923-0702
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2123671743
Copyright
© 2017. This work is licensed under http://creativecommons.org/licenses/by-nc-nd/2.5/ca (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.