Abstract

Recent advances in Artificial Intelligence (AI) enable computers to analyze big data and make real-time decisions. As a result, machine/deep learning techniques are increasingly used to detect threats in security applications such as anomaly detection and malware classification. AI-based security solutions must achieve high threat detection capability and robustness, because vulnerabilities exploited in such applications could lead to dire consequences. Additionally, interpretability of the machine/deep learning models employed within security applications is highly desirable, because the insights gained could help build better defenses to thwart future attacks. However, much work remains to be done to improve the speed, accuracy, robustness, and explainability of AI-based security solutions. To this end, this dissertation focuses on three aspects of AI-based security solutions, namely real-time responsiveness, adversarial robustness, and explainability.

First, this work introduces RAMP (Real-Time Aggregated Matrix Profile), a real-time anomaly detection model designed to detect misbehaviors in scientific workflows, which are computing paradigms widely used to facilitate scientific collaborations across multiple geographically distributed research sites. RAMP builds upon an existing time series analysis technique called the Matrix Profile to detect, in an online manner, anomalous distances among subsequences of event streams collected from scientific workflows. Using an adaptive uncertainty function, the anomaly detection model is dynamically adjusted to prevent high false alarm rates. RAMP also incorporates user feedback on reported anomalies and modifies model parameters accordingly to improve anomaly detection accuracy.
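To make the Matrix Profile idea behind RAMP concrete, the Python sketch below is a toy streaming detector, not RAMP itself: it scores each new subsequence of the event stream by its z-normalized distance to its nearest non-overlapping past subsequence and flags a discord when that distance exceeds a threshold. The window length, threshold, and feedback rule are illustrative assumptions standing in for RAMP's adaptive uncertainty function and user-feedback mechanism.

```python
import numpy as np

def znorm(x):
    """Z-normalize a subsequence; guard against zero variance."""
    std = x.std()
    return (x - x.mean()) / std if std > 1e-8 else x - x.mean()

class StreamingDiscordDetector:
    """Toy Matrix Profile-style streaming detector: the discord score of the
    newest subsequence is its distance to its nearest non-overlapping
    neighbor among previously seen subsequences."""

    def __init__(self, window=16, threshold=4.0):
        self.window = window        # subsequence length (illustrative)
        self.threshold = threshold  # alarm threshold (illustrative)
        self.values = []            # raw event-stream values
        self.subseqs = []           # (start index, z-normalized subsequence)

    def update(self, value):
        """Ingest one event value; return (is_anomaly, discord_score)."""
        self.values.append(value)
        t = len(self.values)
        if t < self.window:
            return False, 0.0
        start = t - self.window
        sub = znorm(np.array(self.values[start:]))
        # Exclusion zone: ignore neighbors that overlap the current window,
        # otherwise trivial matches push every score toward zero.
        candidates = [s for i, s in self.subseqs if start - i >= self.window]
        self.subseqs.append((start, sub))
        if not candidates:
            return False, 0.0
        score = min(np.linalg.norm(sub - past) for past in candidates)
        return score > self.threshold, score

    def feedback(self, was_false_alarm):
        """Stand-in for RAMP's feedback loop: user-labeled false alarms raise
        the threshold, confirmed anomalies lower it (update rule is made up)."""
        self.threshold *= 1.1 if was_false_alarm else 0.9
```

A caller would feed the event stream one value at a time through update() and report any flagged windows; invoking feedback() on analyst-labeled alarms then loosely mirrors how RAMP folds user feedback back into its model parameters.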

Next, this work proposes LAM (Log Anomaly Mask) to evaluate the robustness of deep learning-based anomaly detection over distributed system logs, which record the states and events that occur during the execution of a distributed system. LAM perturbs streaming logs with minimal modifications in an online fashion so that attacks can evade detection even by state-of-the-art deep learning models. To overcome the challenge of search-space complexity, LAM models the perturber as a reinforcement learning agent that operates in a partially observable environment to predict the best perturbation action.
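As a rough illustration of this reinforcement learning formulation, the sketch below is a tabular Q-learning toy, not LAM's actual agent: the observation is only a short window of recent log keys (mimicking partial observability), the actions are a small set of hypothetical perturbations, and the reward trades off evading a detector against the number of modifications. All names, actions, and the reward shape are assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical perturbation actions over a streaming log of event keys
ACTIONS = ["keep", "drop", "replace_with_common"]

class ToyLogPerturber:
    """Tabular Q-learning sketch of an adversarial log perturber that only
    sees a short window of recent log keys (partial observability)."""

    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.9, window=3):
        self.q = defaultdict(float)   # (observation, action) -> value
        self.epsilon, self.alpha, self.gamma = epsilon, alpha, gamma
        self.window = window

    def observe(self, recent_keys):
        """Partial observation: only the last few log keys."""
        return tuple(recent_keys[-self.window:])

    def act(self, obs):
        """Epsilon-greedy choice of the next perturbation action."""
        if random.random() < self.epsilon:
            return random.choice(ACTIONS)
        return max(ACTIONS, key=lambda a: self.q[(obs, a)])

    def learn(self, obs, action, reward, next_obs):
        """Standard one-step Q-learning update."""
        best_next = max(self.q[(next_obs, a)] for a in ACTIONS)
        target = reward + self.gamma * best_next
        self.q[(obs, action)] += self.alpha * (target - self.q[(obs, action)])

def reward(evaded_detector, modified):
    """Placeholder reward: encourage evading the anomaly detector while
    penalizing each modification to keep perturbations minimal."""
    return (1.0 if evaded_detector else -1.0) - (0.1 if modified else 0.0)
```

The point of the sketch is only the division of labor: the agent picks per-event perturbations from a limited view of the stream, and the reward signal encodes both evasion success and the minimal-modification constraint.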

Finally, this work introduces CFGExplainer, an interpretability solution that explains the malware classification results produced by Graph Neural Networks (GNNs). GNNs that process malware as Control Flow Graphs (CFGs) have shown great promise for malware classification. However, these models are viewed as black boxes, which makes it hard to validate their decisions and identify malicious patterns. CFGExplainer addresses this issue by identifying a subgraph of the malware CFG that contributes most towards the classification and by providing insight into the importance of the nodes (i.e., basic blocks) within it. We compared CFGExplainer against three existing explainers, namely GNNExplainer, SubgraphX, and PGExplainer, and showed that CFGExplainer identifies equally sized top subgraphs with higher classification accuracy than the other three models.
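The notion of an explanatory subgraph can be illustrated with a simple occlusion-style baseline, which is not CFGExplainer's actual algorithm: each basic block is scored by how much the GNN's predicted malware probability drops when that block is removed, and the top-k blocks form the explanation. The predict_malware_prob callable below is a hypothetical wrapper around a trained GNN, and top_k is an illustrative parameter.

```python
import networkx as nx

def explain_cfg(cfg, predict_malware_prob, top_k=10):
    """Occlusion-style explanation sketch (not CFGExplainer's method):
    score each basic block by the drop in malware probability when the
    block is removed, then return the induced subgraph of the top-k blocks.

    cfg: networkx.DiGraph whose nodes are basic blocks
    predict_malware_prob: hypothetical callable wrapping a trained GNN,
                          mapping a graph to a probability in [0, 1]
    """
    base = predict_malware_prob(cfg)
    scores = {}
    for node in cfg.nodes:
        reduced = cfg.copy()
        reduced.remove_node(node)          # occlude one basic block
        scores[node] = base - predict_malware_prob(reduced)
    top_nodes = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return cfg.subgraph(top_nodes).copy(), scores
```

The returned subgraph and per-node scores correspond, respectively, to the explanatory subgraph and the basic-block importance ranking that CFGExplainer provides, though CFGExplainer derives them differently.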

Details

Title: Empowering Artificial Intelligence for Cybersecurity Applications
Author: Herath, Jerome Dinal
Publication year: 2022
Publisher: ProQuest Dissertations & Theses
ISBN: 9798834060765
Source type: Dissertation or Thesis
Language of publication: English
ProQuest document ID: 2692012412