Arabic natural language processing (NLP) has attracted considerable attention in recent years, driven by growing demand for automated text processing and Arabic-based intelligent systems and by digital transformation in the Arab world. However, the linguistic characteristics of Arabic, including its rich morphology, diverse dialects, and complex syntax, pose significant challenges for NLP researchers. This paper provides a comprehensive review of the main linguistic challenges inherent in Arabic NLP, such as morphological complexity, diacritics and orthography issues, ambiguity, and dataset limitations. It then surveys the major computational techniques used for tokenisation and normalisation, named entity recognition, part-of-speech tagging, sentiment analysis, text classification, summarisation, question answering, and machine translation. It also discusses the rapid rise of large language models and their transformative impact on Arabic NLP.
Details
Linguistics;
Text categorization;
Datasets;
Deep learning;
Large language models;
Trends;
Morphology;
Sentiment analysis;
Social networks;
Orthography;
Machine translation;
Natural language processing;
Complexity;
Dialects;
Semantics;
Speech recognition;
Summarization;
Orthographic symbols;
Syntactic complexity;
Morphological complexity;
Tagging (Computational linguistics);
Language modeling
