Abstract

Low-resource settings, in which intelligent systems constantly face emerging knowledge beyond what they initially learned, are inevitable in the development of such systems. Only by extracting the true semantics of linguistic inputs can these systems succeed when little knowledge is provided. This persistent challenge hampers their ability to excel at emerging essential tasks. In practice, low-resource conditions can occur at different granularities of textual understanding: (1) coarse-grained, at the sentence level; (2) fine-grained, at the token level; or both.

Throughout this manuscript, we address the issues of low-resource settings in Natural Language Understanding (NLU) across multiple granularities. First, we tackle the challenges of low-resource coarse-grained annotations by introducing dynamic semantic extraction together with multi-perspective matching and aggregation networks. Second, we address the unavailability of fine-grained annotations and explore the potential of inducing such information without token-level supervised training, by extracting and refining the knowledge preserved in general-purpose language models with additional multi-level contrastive learning objectives. Third, we overcome the challenges of low-resource multi-grained annotations by reinforcing the interconnections between granularities via coarse-to-fine chain-of-thought reasoning and structured knowledge from Abstract Meaning Representation (AMR) graphs. Finally, we broaden the scope of low-resource NLU challenges beyond English, focusing on cross-lingual transfer to low-resource languages through a novel integration of phonemic transcriptions alongside textual scripts. Our work leverages publicly available datasets for task-oriented dialogue systems (SNIPS, NLUE, ATIS, MTOP, MASSIVE) together with open-source, general-purpose multilingual NLU benchmarks such as XTREME.

Details

1010268
Business indexing term
Title
Low-Resource Multi-Grained Natural Language Understanding: English and Beyond
Number of pages
134
Publication year
2025
Degree date
2025
School code
0799
Source
DAI-B 87/5(E), Dissertation Abstracts International
ISBN
9798263310301
Committee member
Parde, Natalie; Yadav, Shweta; Zhang, Chenwei; Liu, Ye
University/institution
University of Illinois at Chicago
Department
Computer Science
University location
United States -- Illinois
Degree
Ph.D.
Source type
Dissertation or Thesis
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
32154517
ProQuest document ID
3272331240
Document URL
https://www.proquest.com/dissertations-theses/low-resource-multi-grained-natural-language/docview/3272331240/se-2?accountid=208611
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Database
ProQuest One Academic