Abstract

Automated extraction of disaster-related named entities is crucial for gathering pertinent information during natural or human-made crises. Timely and reliable data is vital for effective disaster management and benefits humanitarian response authorities, law enforcement agencies, and other concerned organizations. Online news media plays a pivotal role in disseminating crisis-related information during emergencies and in facilitating post-hazard disaster response operations. Contextual embedding features are instrumental in extracting the relevant named entities. In this study, we investigate the automatic extraction of disaster-related named entities from an annotated dataset of 1000 online news articles. These articles are annotated with 14 crisis-specific entities obtained from relevant ontologies. To generate contextual vector representations of words, we construct a novel word embedding model inspired by Word2vec. These contextual word embedding features, combined with lexicon features, are encoded using a novel contextualized deep bidirectional LSTM network augmented with self-attention and conditional random field (CRF) layers. We compare the performance of our proposed model with existing word embedding approaches. On an independent test set of 200 articles containing more than 80,000 tokens, our context-sensitive optimized NER model achieves, at the sentence level, a Precision of 92%, Recall of 91%, Accuracy of 87%, and F1-score of 92%. It outperforms models that use general or non-contextual word embeddings, as well as fine-tuned and contextual BERT models.

Full text

© 2025 Hafsa et al. This is an open access article distributed under the terms of the Creative Commons Attribution License: http://creativecommons.org/licenses/by/4.0/ (the “License”), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.