© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

In the realm of digital libraries, efficiently managing and accessing scientific publications necessitates automated bibliographic reference segmentation. This study addresses the challenge of accurately segmenting bibliographic references, a task complicated by the varied formats and styles of references. Focusing on the empirical evaluation of Conditional Random Fields (CRF), Bidirectional Long Short-Term Memory with CRF (BiLSTM + CRF), and Transformer Encoder with CRF (Transformer + CRF) architectures, this research employs Byte Pair Encoding and Character Embeddings for vector representation. The models underwent training on the extensive Giant corpus and subsequent evaluation on the Cora Corpus to ensure a balanced and rigorous comparison, maintaining uniformity across embedding layers, normalization techniques, and Dropout strategies. Results indicate that the BiLSTM + CRF architecture outperforms its counterparts by adeptly handling the syntactic structures prevalent in bibliographic data, achieving an F1-Score of 0.96. This outcome highlights the necessity of aligning model architecture with the specific syntactic demands of bibliographic reference segmentation tasks. Consequently, the study establishes the BiLSTM + CRF model as a superior approach within the current state-of-the-art, offering a robust solution for the challenges faced in digital library management and scholarly communication.
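The abstract frames reference segmentation as a token labeling task scored with F1. As a minimal sketch of what that means, the example below labels each token of one reference string and computes per-label and macro-averaged F1. The label names (`AUTHOR`, `YEAR`, `TITLE`), the sample reference, and the helper `per_label_f1` are illustrative assumptions, not the study's label set or evaluation script, and the averaging scheme behind the reported 0.96 is not stated here.

```python
from collections import Counter


def per_label_f1(gold, pred):
    """Return a dict mapping each label to its token-level F1 score.

    Hypothetical helper for illustration; the study's own evaluation
    protocol is not described in this record.
    """
    if len(gold) != len(pred):
        raise ValueError("gold and pred must have the same length")

    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1   # token labeled correctly
        else:
            fp[p] += 1   # predicted label was wrong for this token
            fn[g] += 1   # gold label was missed for this token

    scores = {}
    for label in set(gold) | set(pred):
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        scores[label] = 2 * precision * recall / total if total else 0.0
    return scores


# One made-up reference, split into tokens (assumed tokenization).
tokens = ["Smith", ",", "J.", "(", "2020", ")", "Deep", "parsing", "."]
gold = ["AUTHOR", "AUTHOR", "AUTHOR", "YEAR", "YEAR", "YEAR",
        "TITLE", "TITLE", "TITLE"]
# A hypothetical model output that mislabels the closing parenthesis.
pred = ["AUTHOR", "AUTHOR", "AUTHOR", "YEAR", "YEAR", "TITLE",
        "TITLE", "TITLE", "TITLE"]

scores = per_label_f1(gold, pred)
macro_f1 = sum(scores.values()) / len(scores)
for label in sorted(scores):
    print(f"{label}: {scores[label]:.3f}")
print(f"macro F1: {macro_f1:.3f}")
```

A single mislabeled token lowers the score of two labels at once: recall for the label that lost the token and precision for the label that gained it. This is why segment boundaries matter so much in this task.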

Details

Title
Neural Architecture Comparison for Bibliographic Reference Segmentation: An Empirical Study
Author
Rodrigo Cuéllar Hidalgo 1; Raúl Pinto Elías 2; Juan-Manuel Torres-Moreno 3; Osslan Osiris Vergara Villegas 4; Gerardo Reyes Salgado 5; Andrea Magadán Salazar 2

1 Biblioteca Daniel Cosío Villegas, El Colegio de México, Carretera Picacho Ajusco 20, Mexico City 14110, Mexico; [email protected]
2 Tecnológico Nacional de México/CENIDET, Cuernavaca 62490, Mexico; [email protected] (R.P.E.); [email protected] (A.M.S.)
3 Laboratoire Informatique d’Avignon, Université d’Avignon, 339 Chemin des Meinajariès, CEDEX 9, 84911 Avignon, France
4 Industrial and Manufacturing Engineering Department, Universidad Autónoma de Ciudad Juárez, Ciudad Juárez 32310, Mexico; [email protected]
5 Departamento de Informática y Estadística, Universidad Rey Juan Carlos, Av. del Alcalde de Móstoles, 28933 Madrid, Spain; [email protected]
First page
71
Publication year
2024
Publication date
2024
Publisher
MDPI AG
e-ISSN
2306-5729
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3059396548