Abstract

This study investigates the potential of large language models (LLMs) to estimate the familiarity of words and multi-word expressions (MWEs). We validated LLM estimates for isolated words using existing human familiarity ratings and found strong correlations. LLM familiarity estimates performed even better in predicting lexical decision and naming performance in megastudies than the best available word frequency measures. We then applied LLM estimates to MWEs and found that they were also effective in measuring familiarity for these expressions. We have created a list of more than 400,000 English words and MWEs with LLM-generated familiarity estimates, which we hope will be a valuable resource for researchers. We also provide a cleaned-up list of nearly 150,000 entries, excluding lesser-known stimuli, to streamline stimulus selection. Our findings highlight the advantages of LLM-based familiarity estimates, including their better performance than traditional word frequency measures (particularly for predicting word recognition accuracy), their ability to generalize to MWEs, their availability for large lists of words, and the ease of obtaining new estimates for all types of stimuli.

Details

Title
Moving beyond word frequency based on tally counting: AI-generated familiarity estimates of words and phrases are an interesting additional index of language knowledge
Author
Brysbaert, Marc 1 ; Martínez, Gonzalo 2 ; Reviriego, Pedro 3 

1 Ghent University, Department of Experimental Psychology, Ghent, Belgium (GRID:grid.5342.0) (ISNI:0000 0001 2069 7798)
2 Universidad Carlos III de Madrid, Leganés, Spain (GRID:grid.7840.b) (ISNI:0000 0001 2168 9183)
3 ETSI de Telecomunicación, Universidad Politécnica de Madrid, Madrid, Spain (GRID:grid.5690.a) (ISNI:0000 0001 2151 2978)
Publication title
Volume
57
Issue
1
Pages
28
Publication year
2025
Publication date
Jan 2025
Publisher
Springer Nature B.V.
Place of publication
New York
Country of publication
Netherlands
Publication subject
e-ISSN
15543528
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2024-12-28
Milestone dates
2024-11-20 (Registration); 2024-11-18 (Accepted)
First posting date
28 Dec 2024
ProQuest document ID
3287487415
Document URL
https://www.proquest.com/scholarly-journals/moving-beyond-word-frequency-based-on-tally/docview/3287487415/se-2?accountid=208611
Copyright
© The Psychonomic Society, Inc. 2024.
Last updated
2026-01-02
Database
ProQuest One Academic