Abstract

This study compares ten language models in an English-Latvian semantic information retrieval setting, in which the indexed collection of documents is written in English and the query documents are written in Latvian. To date, no similar research has been done for the Latvian language. A dataset of 77736 pairs of articles from Latvian and English Wikipedia was created, transformed into embedding vectors, and used for retrieval experiments with brute-force search, the Hierarchical Navigable Small World method, and the Inverted File Indexing method. The LaBSE language model achieved the best performance for short texts, while a version of Sentence-BERT and E5-large performed best for long texts.
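
The brute-force retrieval step mentioned in the abstract can be illustrated with a minimal sketch. It assumes cosine similarity as the ranking score and recall at k as the evaluation measure; neither is stated in the abstract. The three-dimensional vectors, the document ids, and the function names are hypothetical stand-ins for real model embeddings, not the authors' code or data.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def brute_force_search(query, index, k=1):
    """Score the query against every indexed vector and return the top-k ids."""
    q = normalize(query)
    scored = []
    for doc_id, vec in index.items():
        d = normalize(vec)
        scored.append((sum(a * b for a, b in zip(q, d)), doc_id))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


def recall_at_k(queries, index, k=1):
    """Fraction of queries whose paired document appears among the top k."""
    hits = sum(
        1 for doc_id, vec in queries.items()
        if doc_id in brute_force_search(vec, index, k)
    )
    return hits / len(queries)


# Hypothetical toy embeddings: English articles form the index,
# Latvian articles with the same id act as queries.
english_index = {
    "Riga": [0.9, 0.1, 0.0],
    "Daugava": [0.1, 0.9, 0.1],
    "Baltic Sea": [0.0, 0.2, 0.9],
}
latvian_queries = {
    "Riga": [0.8, 0.2, 0.1],
    "Daugava": [0.2, 0.8, 0.0],
    "Baltic Sea": [0.1, 0.1, 0.95],
}

top1 = brute_force_search(latvian_queries["Riga"], english_index, k=1)
score = recall_at_k(latvian_queries, english_index, k=1)
```

Brute-force search compares each query with every indexed vector, so it is exact but scales linearly with collection size. Approximate methods such as Hierarchical Navigable Small World graphs and inverted file indexes trade some accuracy for speed on larger collections.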

Details

Title
Comparison of Language Models for English-Latvian Semantic Search
Author
Kucheravy, Artem 1 ; Jēkabsons, Gints 1

1,2 Institute of Applied Computer Systems, Riga Technical University, Riga, Latvia
Publication title
Volume
30
Issue
1
Pages
34-39
Number of pages
7
Publication year
2025
Publication date
2025
Publisher
De Gruyter Brill Sp. z o.o., Paradigm Publishing Services
Place of publication
Riga
Country of publication
Poland
Publication subject
ISSN
22558683
e-ISSN
22558691
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-02-07
Milestone dates
2024-09-30 (Received); 2025-01-02 (Accepted)
First posting date
07 Feb 2025
ProQuest document ID
3206821911
Document URL
https://www.proquest.com/scholarly-journals/comparison-language-models-english-latvian/docview/3206821911/se-2?accountid=208611
Copyright
© 2025. This work is published under http://creativecommons.org/licenses/by/4.0 (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-12-13
Database
ProQuest One Academic