In this article, we present a new lexical database for Modern Standard Arabic: Aralex. Based on a contemporary text corpus of 40 million words, Aralex provides information about (1) the token frequencies of roots and word patterns, (2) the type frequency, or family size, of roots and word patterns, and (3) the frequency of bigrams and trigrams in orthographic forms, roots, and word patterns. Aralex will be a useful tool for studying the cognitive processing of Arabic through the selection of stimuli on the basis of precise frequency counts. Researchers can use it as a source of information on natural language processing, and it may serve an educational purpose by providing basic vocabulary lists. Aralex is distributed under a GNU-like license, allowing people to interrogate it freely online or to download it from www.mrc-cbu.cam.ac.uk:8081/aralex.online/login.jsp.
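To make the three kinds of counts concrete, the following is a minimal sketch on a toy corpus. It is not the procedure used to build Aralex: the six transliterated tokens, their root and pattern analyses, the function names, and the choice to weight n-gram counts by token occurrence are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus: one (surface form, root, word pattern) triple per token.
# The transliterations and analyses are illustrative only, not drawn from Aralex.
corpus = [
    ("kataba", "ktb", "faEala"),
    ("kataba", "ktb", "faEala"),
    ("kaatib", "ktb", "faaEil"),
    ("maktab", "ktb", "mafEal"),
    ("darasa", "drs", "faEala"),
    ("daaris", "drs", "faaEil"),
]

FORM, ROOT, PATTERN = 0, 1, 2


def token_frequency(tokens, key):
    """Count how many corpus tokens carry each root (or word pattern)."""
    return Counter(t[key] for t in tokens)


def family_size(tokens, key):
    """Type frequency: number of distinct surface forms sharing a root (or pattern)."""
    families = defaultdict(set)
    for t in tokens:
        families[t[key]].add(t[FORM])
    return {unit: len(forms) for unit, forms in families.items()}


def ngram_frequency(strings, n):
    """Count character n-grams; each string contributes once per occurrence
    in the input (token weighting, an assumption of this sketch)."""
    counts = Counter()
    for s in strings:
        counts.update(s[i:i + n] for i in range(len(s) - n + 1))
    return counts


root_tokens = token_frequency(corpus, ROOT)
root_family = family_size(corpus, ROOT)
pattern_family = family_size(corpus, PATTERN)
root_bigrams = ngram_frequency([t[ROOT] for t in corpus], 2)
form_trigrams = ngram_frequency([t[FORM] for t in corpus], 3)
```

In this toy example the root ktb has a higher token frequency than its family size because one of its surface forms occurs twice. That difference is what separates measure (1) from measure (2).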
Psycholinguistic databases, providing statistical information such as word frequency, length, and imageability, have proved to be invaluable tools for the experimental investigation of the cognitive processes underlying language functions and for the design of language assessment tools for both educational and clinical purposes (Lété, Sprenger-Charolles, & Colé, 2004; Stadthagen-Gonzalez & Davis, 2006). Such databases have long been available for European languages such as English, French, German, and Spanish and have contributed to major advances in basic theoretical and translational research in psycholinguistics (Baayen, Piepenbrock, & van Rijn, 1993; Content, Mousty, & Radeau, 1990; New, Pallier, Brysbaert, & Ferrand, 2004; Sebastián-Gallés, Marti, Cuetos, & Carreiras, 2000). The obverse of these achievements is that much of what we know to date about human language understanding and representation is based on the study of a select few languages, with a single language (English) still largely dominant.
This is scientifically unsatisfactory. When broader cross-linguistic studies have been carried out, they have proved enlightening and, sometimes, even revolutionary. Research into Arabic, the most widely spoken Semitic language, has led to major developments in linguistic theory, as attested by the groundbreaking work of John McCarthy on morphophonology (McCarthy, 1981). Ongoing psycholinguistic research into Arabic is also helping to constrain our knowledge of and theorizing about how different linguistic components, such as morphology, phonology, orthography, and semantics, are processed by and represented in the human brain/mind (Boudelaa & Gaskell, 2002; Boudelaa...