In this paper, the authors analyze the applicability of artificial intelligence algorithms for classifying file encryption methods based on statistical features extracted from the binary content of files. The prepared datasets included both unencrypted files and files encrypted using selected cryptographic algorithms in Electronic Codebook (ECB) and Cipher Block Chaining (CBC) modes. These datasets were further diversified by varying the number of encryption keys and the sample sizes. Feature extraction focused solely on basic statistical parameters, excluding an analysis of file headers, keys, or internal structures. The study evaluated the performance of several models, including Random Forest, Bagging, Support Vector Machine, Naive Bayes, K-Nearest Neighbors, and AdaBoost. Among these, Random Forest and Bagging achieved the highest accuracy and demonstrated the most stable results. The classification performance was notably better in ECB mode, where no random initialization vector was used. In contrast, the increased randomness of data in CBC mode resulted in lower classification effectiveness, particularly as the number of encryption keys increased. This paper provides a comprehensive analysis of the classifiers’ performance across various encryption configurations and suggests potential directions for further experiments.
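To make the idea of "basic statistical parameters extracted from the binary content of files" concrete, the following minimal sketch computes a few byte-level statistics from a buffer. The abstract does not list the exact features used, so the feature set here (Shannon entropy, mean, standard deviation, chi-square against a uniform distribution, number of distinct byte values) and the function name `byte_features` are illustrative assumptions, not the authors' method. Random bytes from `os.urandom` serve only as a stand-in for ciphertext; no real encryption is performed.

```python
import math
import os
from collections import Counter


def byte_features(data: bytes) -> dict:
    """Compute simple byte-level statistics (hypothetical feature set)."""
    if not data:
        raise ValueError("data must be non-empty")
    n = len(data)
    counts = Counter(data)  # maps byte value (0..255) to its frequency

    # Shannon entropy in bits per byte, ranging from 0.0 to 8.0.
    entropy = -sum((c / n) * math.log2(c / n) for c in counts.values())

    # Mean and standard deviation of the byte values.
    mean = sum(data) / n
    variance = sum(c * (b - mean) ** 2 for b, c in counts.items()) / n

    # Chi-square statistic against a uniform distribution over 256 values.
    expected = n / 256
    chi_square = sum(
        (counts.get(b, 0) - expected) ** 2 / expected for b in range(256)
    )

    return {
        "entropy": entropy,
        "mean": mean,
        "std": math.sqrt(variance),
        "chi_square": chi_square,
        "distinct_bytes": len(counts),
    }


# Illustrative inputs: repetitive plaintext versus random bytes.
plain = b"The quick brown fox jumps over the lazy dog. " * 200
random_like = os.urandom(len(plain))  # stand-in for ciphertext (assumption)

plain_features = byte_features(plain)
random_features = byte_features(random_like)
```

Feature dictionaries of this kind could be assembled into rows of a table and passed to any of the classifiers named in the abstract. Header parsing, key material, and internal file structure are deliberately left out, in line with the scope described above.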
Details
Cryptography;
Language;
Machine learning;
Accuracy;
Datasets;
Deep learning;
Forensic sciences;
Regression analysis;
Artificial intelligence;
Decision making;
Neural networks;
Malware;
Data encryption;
Natural language processing;
Algorithms;
Decision trees;
Libraries;
Python;
Computer forensics;
Entropy;
Ransomware
