This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1. Introduction
In the field of network security, current research on situational awareness technology focuses mainly on individual alert events. The usual strategy is to calculate the threat level through statistical analysis, without examining the correlations between security events in depth [1]. In practice, security events detected across multiple sensors are often strongly correlated, for example through causal and sequential relationships; thus, statistics-based situational analysis cannot fully reflect the true state of the network [2, 3]. Network security situational awareness technologies based on multisource fusion can provide complete and in-depth security situational awareness. However, because a security system comprises many different components, it is often difficult to obtain a comprehensive overview of the entire network, even though the available data are rich and diverse [4]. Situational awareness is a process of simplifying and processing a large volume of basic situational data to obtain more accurate information. Situational information comes from various sources and is characterized by multiclassification, multiple sources, and high conflict. The need to process multisource, highly conflicting data quickly places higher demands on situational information processing methods [5, 6]. Situational information fusion is the key to solving this problem. From the perspective of data acquisition, current methods based on single-source data fail to consider the multisource nature and heterogeneity of situational information [7]. Because the low-level primary alert event data to be fused are high in volume but low in precision, they have a significant negative impact on the final outcome of situational awareness systems [8–10].
The main idea of current research on situational information fusion is to transplant classical data fusion methods, such as association analysis and evidence theory, to the field of network security without considering the multiclassification, multisource, and highly conflicting characteristics of the situational information itself, which leads to low efficiency and low accuracy in the fusion process. In this study, a multisource situational information fusion method based on dynamic evidence combination (MSF-DEC) is proposed that involves layer clustering and improved evidence theory. The proposed method refines the evidence through layer clustering and evidence elimination. We show that the MSF-DEC algorithm reduces the amount of input data for situational information fusion, improves the quality of those data, and thus provides more accurate and concise input. Furthermore, a combination rule based on dynamic construction can effectively avoid conflicts in the synthesis of situational information, thus yielding more accurate fusion results and higher fusion efficiency.
The remainder of this paper is organized as follows. Section 2 is a summary of current information fusion methods. Section 3 introduces the basic principles of evidence theory. In Section 4, the proposed method is explained in detail. Section 5 presents the experimental simulation and analysis, and conclusions are reported in Section 6.
2. Related Work
Data fusion technology is widely used in the field of network security. Bass introduced the Joint Directors of Laboratories (JDL) data fusion model to the field of network situational awareness [8]. Gucciardi proposed a data fusion method based on prior information credibility to overcome the limitations of Bayesian theory in the fusion of small data samples [9]. By measuring the differences between groups of experimental data on the basis of prior information credibility, the method achieves good fusion accuracy, but it is less efficient. Alhaj et al. proposed an alert clustering method based on attribute similarity [1]. The Alhaj method first sorts features in descending order of information gain entropy and then performs similarity correlation analysis on the features with high information gain entropy. It achieves good efficiency, but its accuracy is not ideal [11]. Liu proposed a causality-based alert information fusion method in which the causality between alert events is established to form a knowledge base [4]. Associations between alerts are then established using the knowledge base to generate so-called superalerts. In a complex network environment, Liu’s method has poor fusion accuracy, large resource consumption, and low fusion efficiency [12]. Zheng et al. used the Pignistic distance to measure the similarity between pieces of evidence, sampling past evidence with a sliding window [13]. On this basis, they established an uncertain-state evidence model based on a Markov chain and obtained the fusion results using a Murphy combination. However, this method simply averages multiple sets of evidence and does not consider the correlation between the pieces of evidence, so data with large deviations can have a destructive effect on the entire fusion process. Zhao et al. proposed a combination method for conflicting evidence based on inconsistent measurements [14]. This method first measures the degree of conflict between two pieces of evidence and then modifies the conflicting evidence by calculating a discount coefficient in order to improve the accuracy of the results of Dempster’s combination. It may overly weaken the weight of some evaluation information with large deviations and lose part of the decision information [15]. These methods deal effectively with highly contradictory evidence and provide a useful idea for evidence fusion, but they also weaken the weight of some evaluation information with large deviations. Pan et al. proposed a data fusion method based on a measure of uncertainty [16], which divides the evidence into reliable and unreliable types. The method then uses information entropy to measure the amount of information contained in the evidence, which is taken as the weight of each item of evidence. However, this method converges slowly and requires more evidence to be accumulated. Mihai established an evidence combination rule according to the degree of conflict [17]. When the conflict is large, more trust is assigned to the combination rule based on set union; when the conflict is small, more trust is assigned to the combination rule based on set intersection. Mihai’s method is rational; however, it fails to address the highly conflicting evidence often involved in the fusion process. Jing et al. proposed a probability transformation method based on the correlation coefficient of belief functions, which maximizes the correlation coefficient between the given BPA and the transformed probability distribution and thus reflects the original information of the given BPA to the greatest extent [18]. Wu et al. proposed a method that uses belief entropy to calculate the uncertainty of each body of evidence, which can improve the classification or fusion effect [19, 20]. Tang et al. analyzed the relationship between information fusion and the conflict caused by incomplete information and proposed a method for generating generalized basic probability assignments (GBPA) based on the triangular fuzzy number model under the open-world assumption [21]. In general, past research has only transplanted traditional data fusion methods, such as association analysis, Bayesian networks, and evidence theory, to the problem of network security situational information fusion, without comprehensively considering the multiclassification, multisource, and highly conflicting nature of situational information. This results in an inefficient fusion process with low fusion accuracy [22, 23].
To address these problems, we propose a multisource situational information fusion method based on dynamic evidence combination (MSF-DEC). The MSF-DEC algorithm first clusters and classifies the alert information hierarchically, based on features of the information, to reduce the range of situational information fusion. Second, an evidence distance metric and bitmap method are used to identify and eliminate highly conflicting evidence to further reduce the amount of alert data and decrease the number of alerts to be fused. Third, the reduced alerts are used as the fusion evidence of MSF-DEC and the combination rule is dynamically adjusted based on the conflict information. The method not only fully considers the hierarchical and highly conflicting nature of alert data but also makes full use of the complementarity between alert data sources to achieve accurate fusion of multisource situational information.
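The second step can be illustrated with a minimal sketch. It assumes that the evidence distance is the Jousselme distance, a widely used metric that this section does not confirm as the authors' choice, and it replaces the bitmap method with a simple mean-distance threshold. The names `evidence_distance`, `drop_conflicting`, and `max_mean_distance`, as well as the sample masses, are hypothetical.

```python
from math import sqrt


def evidence_distance(m1, m2):
    """Jousselme distance between two mass functions (assumed metric).

    A mass function is a dict mapping a non-empty frozenset of hypotheses
    (a focal element) to its mass; the masses of one function sum to 1.
    """
    focal = list(set(m1) | set(m2))
    diff = [m1.get(f, 0.0) - m2.get(f, 0.0) for f in focal]
    total = 0.0
    for i, a in enumerate(focal):
        for j, b in enumerate(focal):
            # Each pair of focal elements is weighted by its Jaccard index.
            total += diff[i] * diff[j] * len(a & b) / len(a | b)
    return sqrt(max(0.5 * total, 0.0))


def drop_conflicting(evidence, max_mean_distance):
    """Return the indices of evidence whose mean distance to the rest is within the limit.

    This threshold rule is a stand-in for the bitmap method, not a copy of it.
    """
    kept = []
    for i, mi in enumerate(evidence):
        others = [evidence_distance(mi, mj) for j, mj in enumerate(evidence) if j != i]
        if not others or sum(others) / len(others) <= max_mean_distance:
            kept.append(i)
    return kept


ATTACK, NORMAL = frozenset({"attack"}), frozenset({"normal"})
sources = [
    {ATTACK: 0.80, NORMAL: 0.20},
    {ATTACK: 0.70, NORMAL: 0.30},
    {ATTACK: 0.75, NORMAL: 0.25},
    {ATTACK: 0.10, NORMAL: 0.90},  # disagrees with the other three sources
]
kept = drop_conflicting(sources, max_mean_distance=0.5)
print(kept)
```

In this toy input, the fourth source lies far from the other three, so only the first three indices are kept for fusion.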
3. Evidence Theory
As an indeterminate reasoning theory [24], evidence theory was first proposed by Dempster and later improved by Shafer [12]. Evidence theory is an extension of probability theory. In classical probability theory,
Definition 1.
When
The Dempster rule of combination for n items of evidence is as follows [26]:
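In its standard form, the rule reads as follows. The symbols are the conventional ones and are assumed here rather than taken from the authors' notation: m_i is the i-th basic probability assignment, A_i ranges over its focal elements, and K is the conflict coefficient.

```latex
m(A) = \frac{1}{1-K} \sum_{A_1 \cap \cdots \cap A_n = A} \; \prod_{i=1}^{n} m_i(A_i),
\qquad A \neq \emptyset, \qquad m(\emptyset) = 0,

K = \sum_{A_1 \cap \cdots \cap A_n = \emptyset} \; \prod_{i=1}^{n} m_i(A_i).
```

The rule is undefined when K = 1, that is, when the bodies of evidence are in total conflict.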
Whether there are two pieces of evidence or
4. Multisource Information Aggregation Method Based on Dynamic Evidence Combination
4.1. Basic Principles
In view of the problems with the application of evidence theory, many solutions have been put forward to improve the theory’s adaptability to multisource fusion. These fall into three approaches [27]. The first is to estimate the importance of each piece of evidence and modify its weight in the combination of the results. The second is to improve the combination rule. The third is to use methods such as neural networks to optimize the combination rule. Since the security components are affected by factors including the data collection method and the application environment, the security incidents obtained vary greatly in sensitivity [28]. In addition, the evidence provided by the less reliable security components should be discarded because it often conflicts with the evidence generated by more reliable sources [29].
We propose MSF-DEC to address these problems. The MSF-DEC algorithm first eliminates redundant and misleading data using a conflict evidence elimination method. On this basis, the evidence combination rule is constructed dynamically to improve the multisource information fusion capability of evidence theory. The fusion process is divided into three phases: alert aggregation, evidence elimination, and evidence fusion (Figure 1).
(1) Alert aggregation: the original alert data are clustered and classified to reduce the quantity of alerts and improve the efficiency of fusing the alert information
(2) Evidence elimination: the distance between multiple sources of evidence and degree of conflict between pieces of evidence are measured so that the influence of highly conflicting alerts on the fusion is eliminated, thereby improving the accuracy of situational information fusion
(3) Evidence fusion: the evidence is fused based on the dynamically constructed combination rule, which effectively avoids the Zadeh paradox and improves the accuracy of situational information fusion
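To make the Zadeh paradox mentioned in phase (3) concrete, the following minimal sketch applies the classical Dempster rule, not the dynamically constructed rule of MSF-DEC, to two highly conflicting bodies of evidence. The function name `dempster_combine` and the masses are illustrative assumptions.

```python
from itertools import product


def dempster_combine(m1, m2):
    """Combine two mass functions with the classical Dempster rule.

    Each mass function maps a frozenset of hypotheses to its mass.
    Returns the combined mass function and the conflict coefficient K.
    """
    combined = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            # Mass assigned to an empty intersection counts as conflict.
            conflict += wa * wb
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    norm = 1.0 - conflict
    return {h: w / norm for h, w in combined.items()}, conflict


# Zadeh's example: two sources that each almost rule out the other's favourite.
A, B, C = frozenset("A"), frozenset("B"), frozenset("C")
m1 = {A: 0.99, B: 0.01}
m2 = {C: 0.99, B: 0.01}
fused, k = dempster_combine(m1, m2)
print(fused, k)
```

Both sources give hypothesis B a mass of only 0.01, yet after normalization B receives essentially all of the combined belief while the conflict coefficient is about 0.9999. This counterintuitive outcome is the behaviour that conflict elimination and an adjusted combination rule are intended to avoid.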
[figure omitted; refer to PDF]
Here, the RR is shown to be strongly affected by the similarity threshold T. When T is large, the requirement for clustering is high. Consequently, the RR of the clustering results is lower and the number of alerts finally obtained is larger. Conversely, when T is small, the requirement for clustering is relatively low. Thus, the RR increases and the number of output alerts is correspondingly reduced. It can also be seen from the figure that a similarity threshold of 0.6 is an inflection point for the change in RR. When the similarity threshold is less than 0.6, the RR changes relatively slowly, but when the value is greater than 0.6, the RR decreases rapidly. After several tests, a reasonable range for the similarity threshold T was established as [0.5, 0.7]. As shown in Figure 3, when T = 0.5, the RR is 93.8%, which greatly reduces the number of alerts. This benefits the later evidence fusion and lays the basis for accurate alert information fusion.
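The inverse relationship between T and RR can be reproduced with a toy sketch. It makes several assumptions: RR is read as the fraction of input alerts removed by clustering, the similarity is the share of matching attributes, and clustering is a single-layer greedy pass rather than the layered ALC-MFS algorithm. All names and sample alerts are hypothetical.

```python
def similarity(a, b):
    """Fraction of attribute positions on which two alerts agree."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cluster_alerts(alerts, threshold):
    """Greedy single-pass clustering (a simplified stand-in for ALC-MFS).

    An alert joins the first cluster whose representative (its first member)
    is at least `threshold` similar; otherwise it opens a new cluster.
    """
    clusters = []
    for alert in alerts:
        for cluster in clusters:
            if similarity(alert, cluster[0]) >= threshold:
                cluster.append(alert)
                break
        else:
            clusters.append([alert])
    return clusters


def reduction_rate(n_in, n_out):
    """Assumed definition: share of the input alerts removed by clustering."""
    return (n_in - n_out) / n_in


# Hypothetical alerts: (source IP, destination IP, signature, destination port).
alerts = [
    ("10.0.0.5", "10.0.0.9", "ICMP PING", 0),
    ("10.0.0.5", "10.0.0.9", "ICMP Echo Reply", 0),
    ("10.0.0.5", "10.0.0.9", "ICMP PING", 0),
    ("10.0.0.7", "10.0.0.9", "ICMP PING", 0),
    ("10.0.0.8", "10.0.0.3", "SQL injection", 80),
    ("10.0.0.8", "10.0.0.3", "SQL injection", 8080),
]
for t in (0.5, 0.9):
    n_out = len(cluster_alerts(alerts, t))
    print(t, n_out, round(reduction_rate(len(alerts), n_out), 3))
```

With the looser threshold, the six alerts collapse into two clusters; with the stricter one, five clusters remain, so the RR drops as T grows.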
5.2.2. Clustering Stability
Stability is a key index for evaluating the effectiveness of a clustering method. To test the stability of the proposed method, six experiments were carried out on the collected alert data, with the similarity threshold varied from 0.1 to 0.6 in increments of 0.1. The ALC-MFS algorithm proposed in this study was compared with the single-layer clustering based on attribute similarity proposed by Alhaj et al. (the Alhaj method). The results are shown in Figures 4 and 5.
[figure omitted; refer to PDF]
Figure 4 shows the experimental results obtained by the Alhaj method. It can be seen that the number of alerts in each cluster shows no clear trend as the threshold varies. This indicates that particular alerts were not clustered consistently according to their attributes during the allocation process. Figure 5 shows the experimental results obtained by the ALC-MFS algorithm. Here, the number of alerts in each cluster changes steadily. This shows that the alerts within each cluster are highly correlated, according to their attributes, while the correlation between clusters is comparatively weak. Therefore, the clustering results are quite effective.
Further analysis of the clustering results showed that cluster 1 mainly includes two types of alert events, Echo Reply and ping scanning, with 421 and 419 alerts, respectively. Echo Reply is a message generated in reply to ping scanning, so the two types of alerts are strongly correlated with each other. The results for cluster 1 are therefore reasonable and accurately reflect the attack behavior. When T = 0.6, almost all alert events are assigned to the same cluster, which indicates that the cluster size changes monotonically and stably with the cluster granularity. Hence, the clustering process of the method proposed in this study is stable.
5.3. Fusion Accuracy Test
To verify the validity and fusion efficiency of the MSF-DEC algorithm, it was compared with typical alert information processing methods proposed by other researchers [1, 14]. To facilitate the comparison, two detection indicators were used: detection rate (DR) and false detection rate (FDR).
Definition 7.
DR is the ratio of the number of actual attacks to the number of observed attacks.
Definition 8.
FDR is the ratio of the number of false alerts to the total number of alerts.
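The two indicators can be computed as in the sketch below. FDR follows Definition 8 directly. DR is read here as the fraction of actual attacks that were detected, which is the usual meaning of the term; the wording of Definition 7 admits other readings, so this is an assumption. Function names, parameter names, and the sample counts are hypothetical and are not experimental results.

```python
def detection_rate(detected_attacks, actual_attacks):
    """Assumed reading of DR: detected attacks divided by actual attacks."""
    if actual_attacks <= 0:
        raise ValueError("actual_attacks must be positive")
    return detected_attacks / actual_attacks


def false_detection_rate(false_alerts, total_alerts):
    """FDR as in Definition 8: false alerts divided by all alerts."""
    if total_alerts <= 0:
        raise ValueError("total_alerts must be positive")
    return false_alerts / total_alerts


# Illustrative counts only.
dr = detection_rate(detected_attacks=47, actual_attacks=50)
fdr = false_detection_rate(false_alerts=6, total_alerts=120)
print(f"DR = {dr:.2%}, FDR = {fdr:.2%}")
```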
As shown in Figures 6–8, the DS method, Xiao method, Mihai method, and the method proposed in this study have obvious advantages in terms of DR and FDR. The method proposed by Alhaj et al. has low accuracy but high processing efficiency, mainly because it does not apply evidence fusion, whereas the methods based on evidence theory consume more time. The MSF-DEC algorithm has the best overall performance: by removing conflicting pieces of evidence, it greatly reduces the amount of evidence involved in the fusion, which improves the fusion efficiency and brings its fusion time close to that of the Alhaj method. Moreover, the dynamic evidence combination rule adopted by the MSF-DEC algorithm effectively reduces the influence of highly conflicting evidence on the fusion results, improving the overall fusion accuracy.
6. Conclusion
The MSF-DEC algorithm was proposed in this study to overcome the problems associated with current situational information fusion algorithms in terms of fusion efficiency, DR, and FDR. The method divides the situational information fusion process into three stages. First, the ALC-MFS algorithm is used to cluster the original alert information to reduce the volume of alert data. Next, evidence distance and bitmap methods are adopted to eliminate highly conflicting evidence and further reduce the alert data. Finally, with the reduced alerts as the fusion evidence, the multisource situational information is fused accurately and efficiently based on dynamically constructed evidence combination rules.
One limitation of this method is that, during information fusion, the evidence that remains after the conflicting alerts are removed is given equal weight. In other words, each data source is assigned the same belief, so the method cannot reflect the importance of each data source. In future research, the evidence will be evaluated so that more reliable evidence has a greater impact on situational information fusion. Entropy-based measurement of evidence reliability and the conflict caused by incomplete information will be significant research directions. Additionally, future studies will weaken the influence of evidence that has low credibility, further improving the accuracy of multisource situational information fusion.
Acknowledgments
This work was supported by the National Natural Science Foundation of China (Grant no. 61902427).
[1] T. A. Alhaj, M. M. Siraj, A. Zainal, "Feature selection using information gain for improved structural-based alert correlation," PloS One, vol. 11 no. 11, DOI: 10.1371/journal.pone.0166017, 2016.
[2] Q. S. Qassim, A. M. Zin, M. J. A. Aziz, "Anomaly-based network IDS false alarm filter using cluster-based alarm classification approach," International Journal of Security and Networks, vol. 12 no. 1, pp. 13-26, DOI: 10.1504/ijsn.2017.081056, 2017.
[3] A. Javaid, N. Javaid, Z. Wadud, "Machine learning algorithms and fault detection for improved belief function based decision fusion in wireless sensor networks," Sensors, vol. 19 no. 6, DOI: 10.3390/s19061334, 2019.
[4] J. Liu, S. Li, R. Zhang, "Algorithm of reducing the false positives in IDS based on correlation analysis," IOP Conference Series: Materials Science and Engineering, vol. 322 no. 6, DOI: 10.1088/1757-899x/322/6/062016, 2018.
[5] J. Qi, P. Yang, L. Newcombe, X. Peng, Z. Zhao, "An overview of data fusion techniques for Internet of Things enabled physical activity recognition and measure," Information Fusion, vol. 55, pp. 269-280, DOI: 10.1016/j.inffus.2019.09.002, 2020.
[6] P. Zuo, Y. Hua, Y. Sun, "Bandwidth and energy efficient image sharing for situation awareness in disasters," IEEE Transactions on Parallel and Distributed Systems, vol. 30 no. 1, pp. 15-28, DOI: 10.1109/TPDS.2018.2859930, 2018.
[7] L. Snidaro, I. Visentini, K. Bryan, "Fusing uncertain knowledge and evidence for maritime situational awareness via Markov logic networks," Information Fusion, vol. 21, pp. 159-172, DOI: 10.1016/j.inffus.2013.03.004, 2015.
[8] T. Bass, "Intrusion detection systems and multisensor data fusion," Communications of the ACM, vol. 43 no. 4, pp. 99-105, DOI: 10.1145/332051.332079, 2000.
[9] D. F. Gucciardi, B. Jackson, "Understanding sport continuation: an integration of the theories of planned behaviour and basic psychological needs," Journal of Science and Medicine in Sport, vol. 18 no. 1, pp. 31-36, DOI: 10.1016/j.jsams.2013.11.011, 2015.
[10] J. An, M. Hu, L. Fu, J. Zhan, "A novel fuzzy approach for combining uncertain conflict evidences in the dempster-shafer theory," IEEE Access, vol. 7, pp. 7481-7501, DOI: 10.1109/access.2018.2890419, 2019.
[11] X. Fan, Y. Guo, Y. Ju, J. Bao, W. Lyu, "Multisensor fusion method based on the belief entropy and DS evidence theory," Journal of Sensors, vol. 2020 no. 10, DOI: 10.1155/2020/7917512, 2020.
[12] F. Ye, P. Bai, Y. Tian, "An algorithm based on evidence theory and fuzzy entropy to defend against SSDF," Journal of Systems Engineering and Electronics, vol. 31 no. 2, pp. 243-251, DOI: 10.23919/jsee.2020.000002, 2020.
[13] X. Zheng, B. Huang, D. Ni, Q. Xu, "A novel intelligent vehicle risk assessment method combined with multi-sensor fusion in dense traffic environment," Journal of Intelligent and Connected Vehicles, vol. 1 no. 2, pp. 41-54, DOI: 10.1108/jicv-02-2018-0004, 2018.
[14] Y. Zhao, R. Jia, P. Shi, "A novel combination method for conflicting evidence based on inconsistent measurements," Information Sciences, vol. 367-368, pp. 125-142, DOI: 10.1016/j.ins.2016.05.039, 2016.
[15] V. Shah, A. Aggarwal, A. Aggarwal, N. Chaubey, "Alert fusion of intrusion detection systems using Fuzzy Dempster Shafer theory," Journal of Engineering Science and Technology Review, vol. 10 no. 3, pp. 123-127, DOI: 10.25103/jestr.103.17, 2017.
[16] L. Pan, Y. Deng, "A new belief entropy to measure uncertainty of basic probability assignments based on belief function and plausibility function," Entropy, vol. 20 no. 11, DOI: 10.3390/e20110842, 2018.
[17] M. C. Florea, A. L. Jousselme, "Robust combination rules for evidence theory," Information Fusion, vol. 10 no. 2, pp. 183-197, DOI: 10.1016/j.inffus.2008.08.007, 2009.
[18] M. Jing, Y. Tang, "A new base basic probability assignment approach for conflict data fusion in the evidence theory," Applied Intelligence, vol. 51 no. 2, pp. 1056-1068, DOI: 10.1007/s10489-020-01876-0, 2021.
[19] D. Wu, Z. Liu, Y. Tang, "A new classification method based on the negation of a basic probability assignment in the evidence theory," Engineering Applications of Artificial Intelligence, vol. 96, DOI: 10.1016/j.engappai.2020.103985, 2020.
[20] D. Wu, Y. Tang, "An improved failure mode and effects analysis method based on uncertainty measure in the evidence theory," Quality and Reliability Engineering International, vol. 36 no. 5, pp. 1786-1807, DOI: 10.1002/qre.2660, 2020.
[21] Y. Tang, D. Wu, Z. Liu, "A new approach for generation of generalized basic probability assignment in the evidence theory," Pattern Analysis and Applications, vol. 24, 2021.
[22] Y. Song, X. Wang, J. Zhu, L. Lei, "Sensor dynamic reliability evaluation based on evidence theory and intuitionistic fuzzy sets," Applied Intelligence, vol. 48 no. 11, pp. 3950-3962, DOI: 10.1007/s10489-018-1188-0, 2018.
[23] A. Cheng, X. Jiang, Y. Li, C. Zhang, H. Zhu, "Multiple sources and multiple measures based traffic flow prediction using the chaos theory and support vector regression method," Physica A: Statistical Mechanics and its Applications, vol. 466, pp. 422-434, DOI: 10.1016/j.physa.2016.09.041, 2017.
[24] N. Derrick, L. Renfa, W. Yongheng, "Particle swarm optimization and dempster shafer approach to achieve Internet of Things context fusion using quality of context," International Journal of Multimedia and Ubiquitous Engineering, vol. 11 no. 2, pp. 247-264, DOI: 10.14257/ijmue.2016.11.2.25, 2016.
[25] W. Li, B. Wang, "A synthetic method for situation assessment based on fuzzy logic and DS evidential theory," Systems Engineering and Electronics, vol. 10, 2003.
[26] F. Yang, X. Wang, Combination of Conflict for D-S Evidence Theory, 2010.
[27] X. Yu, F. Zhang, L. Zhou, "Novel data fusion algorithm based on event-driven and Dempster-Shafer evidence theory," Wireless Personal Communications, vol. 100, pp. 1377-1391, 2018.
[28] A. Maseleno, M. Mahmud Hasan, "The Dempster-Shafer theory algorithm and its application to insect diseases detection," International Journal of Advanced Science and Technology, vol. 50, pp. 111-120, 2013.
[29] D. Yu, D. Frincke, "Alert confidence fusion in intrusion detection systems with extended Dempster-Shafer theory," Proceedings of the 43rd Annual Southeast Regional Conference, pp. 142-147.
[30] N. Dhanachandra, Y. J. Chanu, "An image segmentation approach based on fuzzy c-means and dynamic particle swarm optimization algorithm," Multimedia Tools and Applications, vol. 79 no. 25-26, pp. 18839-18858, DOI: 10.1007/s11042-020-08699-8, 2020.
[31] A. Mazher, P. Li, T. A. Moughal, H. Xu, "A decision fusion method using an algorithm for fusion of correlated probabilities," International Journal of Remote Sensing, vol. 37 no. 1, pp. 14-25, DOI: 10.1080/2150704x.2015.1109158, 2015.
[32] J. Zhang, B. Yu, J. Li, "Research on IDS alert aggregation based on improved quantum-behaved particle swarm optimization," Proceedings of the International Conference Computer Science and Technology (CST2016), pp. 293-299.
Copyright © 2021 Jing Liu et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/).
Abstract
To address the problems of fusion efficiency, detection rate (DR), and false detection rate (FDR) that are associated with existing information fusion methods, a multisource information fusion method featuring dynamic evidence combination based on layer clustering and improved evidence theory is proposed in this study. First, the original alerts are hierarchically clustered and conflicting evidence is eliminated. Then, dynamic evidence combination is applied to fuse the condensed alerts, thereby improving the efficiency and accuracy of the fusion. The experimental results show that the proposed method is superior to current fusion methods in terms of fusion efficiency, DR, and FDR.