Abstract

Portable Document Format (PDF) files are widely used for information exchange but have become a frequent vector for cyberattacks. Traditional signature-based and heuristic methods often fail against obfuscation and polymorphic malware, highlighting the need for more adaptive detection strategies. This study addresses the problem of PDF malware detection by applying machine learning, focusing on ensemble methods. A Random Forest model was trained on the PDFMal-2022 dataset using both static features (file size, page count, text length, image and JavaScript markers) and engineered features (text-to-size ratio, images-per-page ratio, missing text flag, and enhanced JavaScript count). Stratified cross-validation demonstrated stable performance with a macro F1-score of approximately 0.992. Feature importance analysis further confirmed the dominance of JavaScript-related attributes. The contribution of this work is to demonstrate that a lightweight and interpretable Random Forest framework can deliver state-of-the-art detection while avoiding the computational demands of deep learning.
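
The engineered features and the macro F1 metric named in the abstract can be illustrated with a minimal sketch. The abstract does not give the formulas, so the definitions below are assumptions, and the `PdfStats` field names are hypothetical. The "enhanced JavaScript count" is omitted because its definition is not stated. The sketch uses only the Python standard library and does not train a model.

```python
from dataclasses import dataclass


@dataclass
class PdfStats:
    """Static features named in the abstract (field names are hypothetical)."""
    file_size: int    # size in bytes
    pages: int        # page count
    text_length: int  # number of extracted text characters
    images: int       # count of image markers
    javascript: int   # count of JavaScript markers


def engineer_features(s: PdfStats) -> dict:
    """Derive ratio and flag features; the exact formulas are assumptions."""
    return {
        # Guard against division by zero for empty or malformed files.
        "text_to_size": s.text_length / s.file_size if s.file_size else 0.0,
        "images_per_page": s.images / s.pages if s.pages else 0.0,
        "missing_text": int(s.text_length == 0),
    }


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)
```

Because macro F1 averages the per-class scores without weighting by class size, weak performance on the rarer class lowers the score as much as weak performance on the common one.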

Details

1009240
Business indexing term
Title
Random Forest Approach for PDF Malware Detection
Publication title
Volume
13
Issue
3
Pages
694-719
Number of pages
27
Publication year
2025
Publication date
2025
Publisher
University of Latvia
Place of publication
Riga
Country of publication
Latvia
Publication subject
ISSN
22558942
e-ISSN
22558950
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
ProQuest document ID
3259293989
Document URL
https://www.proquest.com/scholarly-journals/random-forest-approach-pdf-malvare-detection/docview/3259293989/se-2?accountid=208611
Copyright
© 2025. This work is published under https://creativecommons.org/licenses/by-sa/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
Last updated
2025-10-10
Database
ProQuest One Academic