Abstract

Kamus Dewan is the authoritative dictionary for Bahasa Malaysia and contains a wealth of linguistic and cultural information about the language. It is currently available in print and as a searchable online dictionary. However, the online dictionary lacks advanced search capabilities that target specific fields within each headword and lemma entry. For computers to target and extract this information efficiently, the macro- and micro-structures of Kamus Dewan entries must first be annotated, or marked up, explicitly. We describe how the TEI-P5 guidelines have been applied to make Kamus Dewan more machine-tractable. We also give examples of how the resulting machine-tractable data can be used for linguistic research and analysis, and for producing other language resources.
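
As a rough illustration of what field-level access to a marked-up entry might look like, the sketch below parses a single dictionary entry encoded with elements from the TEI P5 dictionaries module (entry, form, orth, gramGrp, pos, sense, def, cit, quote) and pulls out individual fields. The sample entry, its English gloss, the SAMPLE constant and the extract_fields helper are all invented for illustration; they are not taken from Kamus Dewan and may not match the encoding scheme the authors adopted.

```python
import xml.etree.ElementTree as ET

# TEI P5 documents use this XML namespace.
TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical entry written for this sketch; NOT actual Kamus Dewan data.
SAMPLE = """
<entry xmlns="http://www.tei-c.org/ns/1.0" xml:id="makan">
  <form type="lemma"><orth>makan</orth></form>
  <gramGrp><pos>verb</pos></gramGrp>
  <sense n="1">
    <def>to eat</def>
    <cit type="example"><quote>makan nasi</quote></cit>
  </sense>
</entry>
"""


def extract_fields(entry_xml):
    """Return selected fields of one TEI <entry> as a plain dictionary."""
    entry = ET.fromstring(entry_xml.strip())
    return {
        # Headword: the orthographic form inside <form>.
        "headword": entry.findtext("tei:form/tei:orth", namespaces=TEI_NS),
        # Part of speech from the grammatical information group.
        "pos": entry.findtext("tei:gramGrp/tei:pos", namespaces=TEI_NS),
        # One definition per <sense>.
        "definitions": [
            d.text for d in entry.findall("tei:sense/tei:def", TEI_NS)
        ],
        # Usage examples, selected by the type attribute on <cit>.
        "examples": [
            q.text
            for q in entry.findall(
                ".//tei:cit[@type='example']/tei:quote", TEI_NS
            )
        ],
    }


if __name__ == "__main__":
    print(extract_fields(SAMPLE))
```

Because each field sits in its own element, a query can address only definitions or only usage examples, which is the kind of targeted search that plain-text lookup does not offer.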

Details

Title
Digitising a machine-tractable version of Kamus Dewan with TEI-P5
Author
Lim, Lian Tze; Chiew, Ruoh Tau; Enya Kong Tang; Rusli Abdul Ghani; Yusof, Naimah
Publication year
2016
Publication date
Jul 1, 2016
Publisher
PeerJ, Inc.
e-ISSN
2167-9843
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
1953945509
Copyright
© 2016 Lim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (the “License”), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Preprints) and either DOI or URL of the article must be cited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.