This research proposes a hybrid approach to Named-Entity Recognition (NER) for Setswana, a low-resource language. The approach combines a bidirectional long short-term memory (BiLSTM) network with a transfer learning model and a convolutional neural network (CNN). Setswana, one of the 11 official languages of South Africa, is a morphologically rich language that is underrepresented in deep learning research for natural language processing (NLP). One reason for this gap is that it is a language with limited resources. To help close the gap, the study applies the proposed hybrid transfer learning approach to an open-source Setswana NER dataset from the South African Centre for Digital Language Resources (SADiLaR), which contains approximately 230,000 tokens in total. Five NER models are built and compared with one another to determine which performs best, and the top model is then compared with the baseline models. The first two models operate at word level and the last three at sentence level. Word-level models represent each word as a character sequence or a word embedding, whereas sentence-level models treat the entire sentence as a series of word embeddings. The word-level models are a CNN and a CNN-BiLSTM transfer learning model; the sentence-level models are a CNN, a CNN-BiLSTM transfer learning model, and a CNN-BiLSTM model. The sentence-level CNN-BiLSTM transfer learning model outperforms all other models, reaching 99% accuracy. It also outperforms the state-of-the-art Setswana NER models reported in the literature that were built using the same dataset.
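The distinction between word-level and sentence-level inputs can be illustrated with a minimal, stdlib-only sketch of how one sentence might be encoded both ways before it reaches a CNN or CNN-BiLSTM model. It also includes a token-level accuracy function, on the assumption that accuracy is measured per token. Everything here is hypothetical and not taken from the study: the function names (`build_vocab`, `word_level_inputs`, `sentence_level_inputs`, `token_accuracy`), the reserved `PAD`/`UNK` ids, the padding lengths, the IOB-style tags, and the toy sentence, which is not drawn from the SADiLaR dataset. Only the character-sequence variant of word-level input is shown; the study's actual preprocessing, embeddings, and tag set may differ.

```python
from typing import Dict, List, Sequence

# Hypothetical reserved ids; the study's actual encoding scheme is not described.
PAD = 0  # padding id
UNK = 1  # id for symbols not seen when the vocabulary was built


def build_vocab(items: Sequence[str]) -> Dict[str, int]:
    """Map each distinct symbol to an integer id, starting after PAD and UNK."""
    vocab: Dict[str, int] = {}
    for item in items:
        if item not in vocab:
            vocab[item] = len(vocab) + 2
    return vocab


def pad(ids: List[int], length: int) -> List[int]:
    """Right-pad with PAD, or truncate, so the list has exactly `length` ids."""
    return (ids + [PAD] * length)[:length]


def word_level_inputs(sentence: List[str], char_vocab: Dict[str, int],
                      max_chars: int) -> List[List[int]]:
    """Word-level view (character variant): each word becomes its own
    fixed-length sequence of character ids and is tagged on its own."""
    return [pad([char_vocab.get(c, UNK) for c in word], max_chars)
            for word in sentence]


def sentence_level_inputs(sentence: List[str], word_vocab: Dict[str, int],
                          max_words: int) -> List[int]:
    """Sentence-level view: the whole sentence becomes one sequence of word
    ids, which an embedding layer would map to a series of word embeddings."""
    return pad([word_vocab.get(w, UNK) for w in sentence], max_words)


def token_accuracy(gold: List[List[str]], pred: List[List[str]]) -> float:
    """Fraction of tokens whose predicted tag equals the gold tag."""
    correct = total = 0
    for gold_sent, pred_sent in zip(gold, pred):
        for g, p in zip(gold_sent, pred_sent):
            correct += int(g == p)
            total += 1
    return correct / total if total else 0.0


# Toy example (illustrative only): "Mpho o dula kwa Gaborone".
sentence = ["Mpho", "o", "dula", "kwa", "Gaborone"]
word_vocab = build_vocab(sentence)
char_vocab = build_vocab("".join(sentence))

print(sentence_level_inputs(sentence, word_vocab, max_words=7))
print(word_level_inputs(["kwa"], char_vocab, max_chars=4))
```

In a sentence-level model, a BiLSTM over the word-id sequence can use the context on both sides of each token. In a word-level model, each character sequence is classified without reference to neighbouring words.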
Subject terms
Deep learning;
Machine learning;
Natural language processing;
Words (language);
Recognition;
Artificial neural networks;
Sentences;
Computer science;
Language policy;
Morphology;
Tswana language;
African languages;
Computers;
Bantu languages;
Short term memory;
Neural networks;
Linguistics;
Semantics;
Official languages;
Acknowledgment;
Bidirectionality;
Learning;
Languages;
Language acquisition