This book is dedicated to a heterogeneous set of linguistic constructions that has generated recurrent interest in natural language processing (NLP) and related areas over the years: collocations. These constructions have also been referred to as, for example, multiword expressions and phraseological units, and they include idioms (in 1), phrasal verbs (in 2 and 3) and compound nouns (in 4):
(1) To make ends meet he (2) ended up (3) relying on a (4) monthly bus pass
Collocations are discussed in this book in the context of a framework for their automatic identification using syntactic information. In particular, the author examines the hypothesis that parse-based identification, which relies on the syntactic proximity between words, performs better than standard window-based methods, which rely on linear proximity in the text. A parse-based approach looks for recurrent combinations of words that are syntactically related and satisfy morphological and syntactic constraints (e.g., not being an auxiliary verb and standing in a subject-verb or verb-object relation), such as prepare and exam in a verb-object relation. Window-based methods look for recurrent pairs of words found within a short window of text, including both interrupted and adjacent sequences (n-grams). A parse-based method would, for instance, be able to extract combinations even in the face of morphological and syntactic transformations and long-distance dependencies, and it would be less susceptible to data sparseness, since it collapses syntactically varied occurrences of a combination into a canonical form. The book presents and evaluates this hypothesis across six chapters and six appendices, which include a comprehensive overview of the literature in the area from an NLP perspective.
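To make the contrast between the two approaches concrete, here is a minimal Python sketch. It is an illustration, not the book's implementation. The two toy sentences, their hand-made Universal Dependencies-style annotation (standing in for real parser output), the window size of 3, the content-word tag set `CONTENT_POS` and the label mapping `CANONICAL` are all assumptions chosen for the example.

```python
from collections import Counter

# Hand-made UD-style annotation (hypothetical, NOT real parser output).
# Each token: (position, lemma, UPOS tag, head position (0 = root), dependency relation).
SENTENCES = [
    # "She prepared the very difficult final exam"
    [(1, "she", "PRON", 2, "nsubj"), (2, "prepare", "VERB", 0, "root"),
     (3, "the", "DET", 7, "det"), (4, "very", "ADV", 5, "advmod"),
     (5, "difficult", "ADJ", 7, "amod"), (6, "final", "ADJ", 7, "amod"),
     (7, "exam", "NOUN", 2, "obj")],
    # "The exam was prepared by the teacher" (passive)
    [(1, "the", "DET", 2, "det"), (2, "exam", "NOUN", 4, "nsubj:pass"),
     (3, "be", "AUX", 4, "aux:pass"), (4, "prepare", "VERB", 0, "root"),
     (5, "by", "ADP", 7, "case"), (6, "the", "DET", 7, "det"),
     (7, "teacher", "NOUN", 4, "obl")],
]

# Assumption: which tags count as content words for the window-based method.
CONTENT_POS = {"NOUN", "VERB", "ADJ"}


def window_candidates(sentences, window=3):
    """Count ordered content-word pairs at most `window` positions apart in linear text."""
    counts = Counter()
    for sent in sentences:
        # Keep original positions so that distances reflect the real text.
        content = [(pos, lemma) for pos, lemma, upos, _, _ in sent if upos in CONTENT_POS]
        for i, (p1, l1) in enumerate(content):
            for p2, l2 in content[i + 1:]:
                if p2 - p1 <= window:
                    counts[(l1, l2)] += 1
    return counts


# Assumed mapping from dependency labels to canonical relation types; the passive
# subject is mapped to verb-object so active and passive occurrences collapse together.
CANONICAL = {"nsubj": "subject-verb", "obj": "verb-object", "nsubj:pass": "verb-object"}


def parse_candidates(sentences):
    """Count (head lemma, dependent lemma, relation) triples from syntactic links."""
    counts = Counter()
    for sent in sentences:
        by_pos = {tok[0]: tok for tok in sent}
        for _, lemma, upos, head, rel in sent:
            if head == 0 or rel not in CANONICAL:
                continue
            _, head_lemma, head_upos, _, _ = by_pos[head]
            # Constraint: the head must be a lexical verb and the dependent not an auxiliary.
            if head_upos != "VERB" or upos == "AUX":
                continue
            counts[(head_lemma, lemma, CANONICAL[rel])] += 1
    return counts


if __name__ == "__main__":
    win = window_candidates(SENTENCES)
    dep = parse_candidates(SENTENCES)
    print(win[("prepare", "exam")], win[("exam", "prepare")])
    print(dep[("prepare", "exam", "verb-object")])
```

With these assumptions, the window-based count misses prepare ... exam in the first sentence (the words are five positions apart) and records the passive occurrence only as the reversed pair (exam, prepare). The parse-based count maps both occurrences to the single canonical triple (prepare, exam, verb-object), which is the kind of collapsing of syntactic variants that the hypothesis above relies on.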
In Chapter 1, the book opens with a brief introduction to collocations and motivates the research through examples. It also defines the aims and structure of the book.
Chapter 2 provides a historical overview of the interest in collocations, starting with their role in first and second language learning, where they are seen as language chunks that must be treated as units and memorized. As a result, they are associated with high levels of proficiency and are often seen as the privilege of native speakers, lending fluency to speech. Their importance has long been recognized in lexicographic work, as reflected by the many ongoing initiatives for compiling collocation...





