Abstract

State-of-the-art NLP methods that leverage enormous amounts of digital text are transforming how many people work with computers and access the internet. However, most of the world's languages lack the digital data needed to make recently popular technologies such as large language models (LLMs) possible. Without sufficient data, LLMs are typically ill-suited to these underrepresented languages, which NLP often refers to as low-resource languages. In such settings, simpler language technologies such as dictionaries, morphological analyzers, and text normalizers remain useful. Their value is especially apparent in language documentation life cycles, in building educational tools, and in developing language typology databases. With this in mind, we propose techniques for automatically expanding the coverage of morphological databases and develop methods for building morphological tools for the many languages with few available resources. We then study the generation capabilities of neural network models that learn from these resources. Finally, we propose methods for training neural networks when only small amounts of data are available, taking inspiration from the recent successes of self-supervised pretraining in high-resource NLP.

Details

1010268
Title
Generalizing Low-Resource Morphology: Cognitive and Neural Perspectives on Inflection
Number of pages
214
Publication year
2025
Degree date
2025
School code
0051
Source
DAI-A 86/11(E), Dissertation Abstracts International
ISBN
9798315702948
Committee member
Palmer, Alexis; Palmer, Martha; Pacheco, Maria; Gorman, Kyle
University/institution
University of Colorado at Boulder
Department
Computer Science
University location
United States -- Colorado
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
31940506
ProQuest document ID
3205813730
Document URL
https://www.proquest.com/dissertations-theses/generalizing-low-resource-morphology-cognitive/docview/3205813730/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic