Abstract

State-of-the-art NLP methods that leverage enormous amounts of digital text are transforming how many people work with computers and access the internet. However, for most of the world’s languages, there is insufficient digital data to make recently popular technologies like large language models (LLMs) possible. Such technologies are typically not well-suited to underrepresented languages (often referred to as low-resource languages in NLP) that lack sufficient digital data. In these cases, simpler language technologies such as dictionaries, morphological analyzers, and text normalizers are useful. This is especially apparent in the language documentation life cycle, in building educational tools, and in developing language typology databases. With this in mind, we propose techniques for automatically expanding the coverage of morphological databases and develop methods for building morphological tools for the large set of languages with few available resources. We then study the generation capabilities of neural network models that learn from these resources. Finally, we propose methods for training neural networks when only small amounts of data are available, taking inspiration from the recent successes of self-supervised pretraining in high-resource NLP.

Details

Title
Generalizing Low-Resource Morphology: Cognitive and Neural Perspectives on Inflection
Author
Wiemerslage, Adam J.
Publication year
2025
Publisher
ProQuest Dissertations & Theses
ISBN
9798315702948
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
3205813730
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.