The increasing complexity of cybersecurity threats necessitates the integration of artificial intelligence (AI) agents and large language models (LLMs) into offensive cyber operations. This study focused on adversary emulation and explored how AI agents can assist in correlating tactics, techniques, and procedures (TTPs). Adopting a postpositivist and pragmatic perspective that values the adaptability of AI and human-machine collaboration, the researcher simulated both benign and malicious behaviors using Command and Control (C2) frameworks, namely Caldera, Sliver, and Havoc, in a controlled DetectionLab environment. The study evaluated the feasibility of classifying reconnaissance network activity as benign or malicious using machine learning (ML) models aligned with the MITRE ATT&CK Framework. The results indicate that ML models such as Support Vector Machines and Logistic Regression performed well at classification, particularly on traffic generated by Sliver. Nonetheless, the tools differed noticeably in detectability and operational complexity. These results support the ATT&CK Framework's standing as a reliable, knowledge-based repository. The study also revealed limitations in generalizability, data representation, and the interpretability of AI output. Challenges such as hallucinations in LLMs and the need for contextual validation point to persistent difficulties in applying AI within high-stakes environments. This research underscores the transformative potential of artificial intelligence in cybersecurity; however, ethical oversight is necessary to ensure responsible implementation.
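To make the classification step concrete, the sketch below shows how the two models named above might be trained to label reconnaissance network flows as benign or malicious. It is a minimal illustration, not the study's actual pipeline: the file name, feature columns, and label encoding are hypothetical assumptions.

```python
# Minimal sketch of the benign/malicious classification step described
# in the abstract. The dataset path, feature names, and label encoding
# are hypothetical placeholders, not the study's actual data.
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Hypothetical dataset: one row per network flow captured during emulation.
df = pd.read_csv("recon_flows.csv")  # placeholder path
X = df[["duration", "bytes_sent", "bytes_received", "packet_count"]]  # assumed features
y = df["label"]  # assumed encoding: 0 = benign, 1 = malicious

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# The two classifiers named in the abstract; scaling is standard practice
# for both and is included here for that reason.
models = {
    "Support Vector Machine": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))
```

In practice, per-flow features would be extracted from packet captures or C2 telemetry and the labels derived from which emulation scenario (benign or adversarial) generated the traffic; the holdout split above simply illustrates how classification performance across the two models could be compared.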