Abstract
Understanding the true meaning of a natural language sentence has been a long-standing goal in the fields of Natural Language Processing (NLP) and Artificial Intelligence (AI). A meaning representation (or semantic representation) is often required to express the meaning of a natural language sentence in a machine-understandable way, and the task of mapping natural language sentences into their meaning representations is called Meaning Representation Parsing (or Semantic Parsing). In this dissertation, we focus on parsing one particular form of meaning representation, Abstract Meaning Representation (AMR), a formalism that has been used to annotate a large semantic bank and has shown promise in various downstream NLP tasks such as information extraction, text summarization, and machine comprehension.
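To give a concrete sense of the formalism, AMR represents a sentence as a rooted, directed graph written in PENMAN notation; the widely used example below (from the AMR literature, not specific to this dissertation) shows the sentence "The boy wants to go", where the variable b is reused under both want-01 and go-01, making the structure a reentrant graph rather than a tree:

```
(w / want-01
   :ARG0 (b / boy)
   :ARG1 (g / go-01
            :ARG0 b))
```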
In this dissertation, we address three challenges in AMR parsing: Graph Representation, Abstraction, and Data Sparsity. Graph Representation refers to the fact that AMR is formally a non-tree graph, so new algorithms are needed to parse natural language sentences into graphs. We address this with a transition-based algorithm that builds an AMR graph incrementally from a dependency tree. Abstraction refers to the fact that AMR does not provide alignments between the concepts and relations in an AMR graph and the word tokens in the sentence, yet this connection is needed to model the transformation from the dependency tree of a sentence to an AMR graph. We address this challenge through two efforts: 1) designing a transition system capable of inferring concepts that are not directly aligned to any particular words in the sentence, and 2) building a graph-based string-to-AMR aligner that takes advantage of the structural information in the AMR graph. The Data Sparsity issue is caused by the large label space of AMR and the relatively small size of the AMR Bank at the current stage. We tackle this problem by applying deep learning techniques to build a Bidirectional LSTM based concept identifier over a redesigned concept label set. We also explore the possibility of building an end-to-end neural AMR parser with a sequence-to-sequence model. Finally, we apply our approaches to the Chinese AMR Bank, extending our work to Chinese and discussing problems unique to Chinese AMR parsing.
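As a rough illustration of the transition-based idea, the toy sketch below (a simplification for exposition, not the dissertation's actual transition system or action inventory) rewrites a dependency tree into an AMR-like graph by applying a few actions: deleting function words that carry no concept, relabeling dependency arcs as AMR relations, and adding a reentrant edge. The example sentence, action names, and edge labels are illustrative assumptions.

```python
class Graph:
    """A toy edge-labeled graph manipulated by transition-style actions."""

    def __init__(self, edges):
        # edges: set of (head, label, dependent) triples
        self.edges = set(edges)

    def delete_node(self, node):
        # Drop a word (e.g. a determiner) together with all its edges,
        # since it corresponds to no AMR concept.
        self.edges = {(h, l, d) for (h, l, d) in self.edges
                      if h != node and d != node}

    def relabel_edge(self, head, dep, new_label):
        # Assign an AMR relation label to an existing dependency arc.
        self.edges = {(h, new_label if (h, d) == (head, dep) else l, d)
                      for (h, l, d) in self.edges}

    def add_reentrancy(self, head, label, dep):
        # Add a second incoming edge to a node, turning the tree into a graph.
        self.edges.add((head, label, dep))


# Simplified dependency tree for "The boy wants to go":
dep_edges = {
    ("wants", "nsubj", "boy"),
    ("wants", "xcomp", "go"),
    ("boy",   "det",   "The"),
}

g = Graph(dep_edges)
g.delete_node("The")                      # determiner has no concept
g.relabel_edge("wants", "boy", ":ARG0")   # nsubj -> :ARG0
g.relabel_edge("wants", "go",  ":ARG1")   # xcomp -> :ARG1
g.add_reentrancy("go", ":ARG0", "boy")    # boy is also the goer: reentrancy
print(sorted(g.edges))
```

The reentrant :ARG0 edge from "go" to "boy" is exactly what makes the output a graph rather than a tree, and is the kind of structure that motivates graph-building transition systems over tree parsers.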