Abstract

As cyberattacks grow in frequency and sophistication, extracting actionable Cyber Threat Intelligence (CTI) from diverse online sources has become critical for proactive threat detection and defense. However, accurately identifying complex entities in lengthy, heterogeneous threat reports remains challenging because of long-range dependencies and domain-specific terminology. To address this, we propose XLNet-CRF, a hybrid framework that combines permutation-based language modeling with structured prediction using Conditional Random Fields (CRF) to improve Named Entity Recognition (NER) in cybersecurity contexts. XLNet-CRF targets these CTI-NER challenges by modeling bidirectional dependencies and capturing non-contiguous semantic patterns more effectively than traditional approaches. Evaluations on two benchmark cybersecurity corpora support the approach: on the CTI-Reports dataset, XLNet-CRF achieves a precision of 97.41% and an F1-score of 97.43%; on MalwareTextDB, it attains a precision of 85.33% and an F1-score of 88.65%, significantly surpassing strong BERT-based baselines in both accuracy and robustness.

Details

Title
XLNet-CRF: Efficient Named Entity Recognition for Cyber Threat Intelligence with Permutation Language Modeling
Author
Wang, Tianhao 1; Liu, Yang 1; Liang, Chao 1; Wang, Bailing 2; Liu, Hongri 3

1 School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China
2 Shandong Key Laboratory of Industrial Network Security, Weihai 264209, China; Harbin Institute of Technology (Weihai) Qingdao Research Institute, Qingdao 266000, China
3 School of Computer Science and Technology, Harbin Institute of Technology, Weihai 264209, China; Weihai Cyberguard Technologies Co. Ltd., Weihai 264209, China
Publication title
Electronics
Volume
14
Issue
15
First page
3034
Number of pages
16
Publication year
2025
Publication date
2025
Publisher
MDPI AG
Place of publication
Basel
Country of publication
Switzerland
Publication subject
e-ISSN
2079-9292
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-07-30
Milestone dates
2025-06-01 (Received); 2025-07-28 (Accepted)
First posting date
2025-07-30
ProQuest document ID
3239023561
Document URL
https://www.proquest.com/scholarly-journals/xlnet-crf-efficient-named-entity-recognition/docview/3239023561/se-2?accountid=208611
Copyright
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-11-07
Database
ProQuest One Academic