Abstract

Named Entity Recognition (NER) for cyber security aims to identify and classify cyber security terms in large volumes of heterogeneous, multi-source cyber security text. In machine learning, deep neural networks automatically learn text features from large datasets, but this data-driven approach usually handles rare entities poorly. Gasmi et al. proposed a deep learning method for named entity recognition in the field of cyber security and achieved good results, reaching an F1 score of 82.8%. However, accurately identifying rare entities and complex words in the text remains difficult. To cope with this challenge, this paper proposes a new model that combines data-driven deep learning methods with knowledge-driven dictionary methods, building dictionary features to assist in rare entity recognition. In addition, on top of the data-driven deep learning model, an attention mechanism is adopted to enrich the local features of the text, model the context better, and improve the recognition of complex entities. Experimental results show that our method outperforms the baseline model and identifies cyber security entities more effectively, with Precision, Recall, and F1 reaching 90.19%, 86.60%, and 88.36%, respectively.
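
The abstract does not specify how the dictionary features are built. The following is only a minimal sketch of one common approach, assuming a longest-match lookup against a small lexicon that emits BIO-style feature tags per token. The function name `dictionary_features`, the lexicon entries, and the labels `VULN` and `SOFTWARE` are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Tuple


def dictionary_features(
    tokens: List[str], lexicon: Dict[Tuple[str, ...], str]
) -> List[str]:
    """Tag each token with a BIO-style dictionary feature (longest match first).

    Assumption: lexicon keys are lowercased token tuples mapped to entity
    labels. This is an illustrative scheme, not the authors' implementation.
    """
    features = ["O"] * len(tokens)
    lowered = [t.lower() for t in tokens]
    max_len = max((len(key) for key in lexicon), default=0)

    i = 0
    while i < len(tokens):
        matched = False
        # Try the longest candidate span first so multi-word terms win.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = lexicon.get(tuple(lowered[i:i + n]))
            if label is not None:
                features[i] = "B-" + label
                for j in range(i + 1, i + n):
                    features[j] = "I-" + label
                i += n
                matched = True
                break
        if not matched:
            i += 1
    return features


# Hypothetical lexicon and sentence, for illustration only.
lexicon = {("sql", "injection"): "VULN", ("openssl",): "SOFTWARE"}
tokens = "An SQL injection flaw affects OpenSSL".split()
feats = dictionary_features(tokens, lexicon)
print(list(zip(tokens, feats)))
```

In a hybrid model of the kind the abstract describes, such per-token tags would typically be embedded and concatenated with the learned word representations, so that a rare term present in the dictionary still carries a signal even when it has few training examples.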

Details

Title
Data and knowledge-driven named entity recognition for cyber security
Author
Chen, Gao 1 ; Zhang, Xuan 2 ; Liu, Hui 1

1 School of Software, Yunnan University, Yunnan, China (GRID:grid.440773.3) (ISNI:0000 0000 9342 2456)
2 School of Software, Yunnan University, Yunnan, China (GRID:grid.440773.3) (ISNI:0000 0000 9342 2456); Key Laboratory of Software Engineering of Yunnan Province, Yunnan, China (GRID:grid.440773.3); Engineering research center of cyberspace, Yunnan, China (GRID:grid.440773.3)
Publication year
2021
Publication date
May 2021
Publisher
Springer Nature B.V.
e-ISSN
25233246
Source type
Scholarly Journal
Language of publication
English
ProQuest document ID
2520676440
Copyright
© The Author(s) 2021. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.