© 2024. This work is published under https://creativecommons.org/licenses/by-nc/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.

Abstract

The Europeana digital library features cultural heritage collections from over 3,000 European institutions, described in 37 languages. However, most records carry textual metadata in a single language: that of the data provider. Improving Europeana's multilingual accessibility is challenging because cultural heritage metadata are often expressed in short phrases and use domain-specific terminology. This work presents the approach and results of the EuropeanaTranslate project, which aimed to translate Europeana metadata records from 23 EU languages into English. Machine translation (MT) engines were trained on a cleaned selection of bilingual and synthetic data from Europeana, including multilingual vocabularies and relevant cultural heritage repositories. The automatic translations were evaluated using standard metrics and human assessment by linguists and cultural heritage domain experts. For most languages, the results showed significant improvements over both the generic engines used before in-domain training and the eTranslation service. The EuropeanaTranslate engines have translated over 29 million metadata records on Europeana.eu. Additionally, the MT engines and training datasets are publicly available via the European Language Grid Catalogue and the ELRC-SHARE repository.

Details

Title
Adapting Machine Translation Engines to the Needs of Cultural Heritage Metadata
Author
Chatzitheodorou, Konstantinos¹; Kaldeli, Eirini²; Isaac, Antoine³; Scalia, Paolo⁴; Lacal, Carmen Grau⁵; Escrivá, Ma Ángeles García

¹ Postdoctoral Researcher, Ionian University
² Research Associate, National Technical University of Athens
³ Research and Development Manager, Europeana Foundation
⁴ Technical Business Analyst, Europeana Foundation
⁵ Computational Linguist, Pangeanic SL
Pages
1-17
Section
ARTICLE
Publication year
2024
Publication date
Sep 2024
Publisher
American Library Association
e-ISSN
2163-5226
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
3114239631