Abstract
System logs record a system's operational status and significant events, and anomaly detection on these logs allows system faults to be identified rapidly and accurately. However, existing anomaly detection methods struggle with features that exhibit complex relationships, which limits detection accuracy. Furthermore, most methods depend on supervised learning, which hinders the detection of abnormal logs in large, unlabeled datasets. To address these limitations, this paper proposes a novel semi-supervised log anomaly detection model, termed LogCTBL (CNN-TCN-Bi-LSTM). First, the model parses raw logs using the Drain3 tool. Second, it applies BERT for semantic embedding, addressing the discreteness of log statements. Third, it employs the HDBSCAN (hierarchical density-based spatial clustering of applications with noise) algorithm to estimate pseudo-labels for unlabeled data in the training set, addressing the shortage of labeled data. Finally, the hybrid model is applied to anomaly detection, and the proposed method is evaluated on the BGL and Thunderbird datasets. The results show that the proposed method outperforms alternative approaches, attaining F1 scores of 99.87% and 99.78% on the two datasets, respectively.
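The pseudo-labeling step can be illustrated with a minimal sketch. The abstract does not specify how cluster assignments are turned into labels, so the following assumes a simple majority vote: each unlabeled sample inherits the most common known label within its cluster, and noise points (cluster id -1, the HDBSCAN convention) are left unlabeled. The function name `propagate_pseudo_labels` and its parameters are hypothetical, and the cluster ids are taken as given rather than computed here.

```python
from collections import Counter, defaultdict

NOISE = -1  # HDBSCAN convention: points assigned to no cluster get id -1


def propagate_pseudo_labels(cluster_ids, known_labels, noise_label=None):
    """Assign a pseudo-label to every sample by majority vote within its cluster.

    cluster_ids  : list of int, one cluster id per sample (-1 marks noise)
    known_labels : dict {sample_index: label} for the labeled subset
    noise_label  : value given to noise points and to clusters that contain
                   no labeled sample (assumption: None means "leave unlabeled")
    """
    # Count the known labels observed inside each non-noise cluster.
    votes = defaultdict(Counter)
    for idx, label in known_labels.items():
        cid = cluster_ids[idx]
        if cid != NOISE:
            votes[cid][label] += 1

    pseudo = []
    for idx, cid in enumerate(cluster_ids):
        if idx in known_labels:
            # Labeled samples keep their original label.
            pseudo.append(known_labels[idx])
        elif cid != NOISE and votes[cid]:
            # Unlabeled sample: take the majority label of its cluster.
            pseudo.append(votes[cid].most_common(1)[0][0])
        else:
            # Noise point, or a cluster with no labeled member.
            pseudo.append(noise_label)
    return pseudo


# Toy example: six log sequences, two of them labeled (0 = normal, 1 = anomalous).
cluster_ids = [0, 0, 0, 1, 1, -1]
known_labels = {0: 0, 3: 1}
pseudo_labels = propagate_pseudo_labels(cluster_ids, known_labels)
```

In this toy example, the samples in cluster 0 inherit label 0, those in cluster 1 inherit label 1, and the noise point stays unlabeled. A real pipeline would obtain `cluster_ids` by clustering the BERT embeddings; whether noise points are discarded or treated as anomalous is a design choice the abstract leaves open.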
Details
1 Sichuan University of Science & Engineering, School of Computer Science and Engineering, Yibin, China (GRID:grid.412605.4) (ISNI:0000 0004 1798 1351); Key Laboratory of Enterprise Informatization and IoT Measurement and Control Technology for Universities in Sichuan Province, Yibin, China (GRID:grid.412605.4)
2 Sichuan University of Science & Engineering, School of Computer Science and Engineering, Yibin, China (GRID:grid.412605.4) (ISNI:0000 0004 1798 1351)





