Copyright Buro van die Woordeboek van die Afrikaanse Taal (Bureau of the WAT) 2015

Abstract

This article focuses on lesser-resourced languages for which only very limited corpora are available, and on how such relatively small, often unbalanced, raw corpora can be maximally utilized for lexicographic purposes to obtain results similar to those from bigger corpora. Sepedi and Afrikaans are studied in this regard. The aim is to determine to what extent enlarging a corpus, e.g. from one million to 10 million words and from 10 million to 100 million words, enhances its potential for (a) macrostructure compilation, (b) sourcing information on the most important microstructural aspects and (c) the creation of lexicographic tools. It is argued that valuable and even sufficient data for the compilation of a specific dictionary can be extracted from a relatively small corpus of approximately one million words, but that "bigger" in some instances indeed means "better".

Details

Title
Corpus-based Lexicography for Lesser-resourced Languages - Maximizing the Limited Corpus
Author
Prinsloo, D J
Pages
285-300
Publication year
2015
Publication date
2015
Publisher
Buro van die Woordeboek van die Afrikaanse Taal (Bureau of the WAT)
ISSN
1684-4904
e-ISSN
2224-0039
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1773272694