
Abstract

As shopping, communication, and meetings move to digital platforms, users are increasingly exposed to cyber threats such as phishing. These attacks typically rely on fraudulent websites that mimic legitimate ones in order to steal sensitive information such as passwords and credit card details. Attackers use a range of deceptive techniques, including link manipulation, filter evasion, covert redirection, website forgery, and social engineering. This study introduces a phishing detection framework based on machine learning (ML) models. A dataset of 1,353 URLs (702 legitimate, 103 suspicious, and 548 phishing) was compiled, and nine key features were extracted for classification. Four ML classifiers were employed: Categorical Boosting, Random Forest (RF), Decision Tree (DT), and Extreme Gradient Boosting (XGB), with cross-validation used for model evaluation. Feature selection was conducted using SHapley Additive exPlanations (SHAP) and Recursive Feature Elimination (RFE) to improve interpretability and computational efficiency. To further improve classification accuracy across the legitimate, suspicious, and phishing categories, hyperparameters were tuned with four nature-inspired optimization algorithms: Golden Jackal Optimization, Dandelion Optimization, Coati Optimization, and Puma Optimization. These algorithms were chosen for their global search capabilities and their adaptability to complex datasets. The study's main contribution is the integration of these optimization techniques with ML classifiers, which improved phishing detection accuracy while reducing computational complexity. In the experiments, XGB-based models, particularly XGPO, achieved the highest performance in both feature-selection scenarios. In Scenario 1, Accuracy = 0.980, Precision = 0.981, Recall = 0.980, F1-score = 0.980, MCC = 0.965, and AUC = 0.985. In Scenario 2, Accuracy = 0.984, Precision = 0.985, Recall = 0.984, F1-score = 0.985, MCC = 0.973, and AUC = 0.989. These findings highlight the effectiveness of ML-driven phishing detection in strengthening user security, preventing cyber fraud, and fostering trust in online interactions.
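
The abstract reports accuracy, precision, recall, F1-score, and MCC for a three-class problem (legitimate, suspicious, phishing). As a minimal sketch of how such figures can be derived from a confusion matrix, the Python below computes them with the standard library only. Several things here are assumptions and not taken from the study: the function name `summarize`, the use of support-weighted averaging for precision, recall, and F1, and above all the confusion matrix itself, which is hypothetical. Only its row totals (702, 103, 548) follow the class counts stated in the abstract; the cell values are invented for illustration and are not the paper's results.

```python
from math import sqrt


def summarize(confusion):
    """Compute accuracy, support-weighted precision/recall/F1, and multiclass MCC.

    confusion[i][j] is the number of samples whose true class is i
    and whose predicted class is j.
    """
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    true_counts = [sum(row) for row in confusion]                       # per-class support
    pred_counts = [sum(row[j] for row in confusion) for j in range(n)]  # per-class predictions
    correct = sum(confusion[k][k] for k in range(n))

    precision = recall = f1 = 0.0
    for k in range(n):
        tp = confusion[k][k]
        p_k = tp / pred_counts[k] if pred_counts[k] else 0.0
        r_k = tp / true_counts[k] if true_counts[k] else 0.0
        f_k = 2 * p_k * r_k / (p_k + r_k) if (p_k + r_k) else 0.0
        weight = true_counts[k] / total  # assumption: support-weighted averaging
        precision += weight * p_k
        recall += weight * r_k
        f1 += weight * f_k

    # Multiclass Matthews correlation coefficient (generalized form).
    cov_tp = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    cov_tt = total ** 2 - sum(t * t for t in true_counts)
    cov_pp = total ** 2 - sum(p * p for p in pred_counts)
    denom = sqrt(cov_tt * cov_pp)
    mcc = cov_tp / denom if denom else 0.0

    return {
        "accuracy": correct / total,
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "mcc": mcc,
    }


# HYPOTHETICAL confusion matrix, for illustration only (not the study's results).
# Rows: true legitimate, suspicious, phishing; row totals are 702, 103, 548.
example = [
    [690, 5, 7],
    [6, 90, 7],
    [8, 4, 536],
]

metrics = summarize(example)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Under support-weighted averaging, recall equals accuracy by construction, which is consistent with the matching Accuracy and Recall values reported for both scenarios. The abstract does not state which averaging scheme was used, so this should be read as one plausible interpretation.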

Details

1009240
Business indexing term
Title
Leveraging machine learning to proactively identify phishing campaigns before they strike
Publication title
Volume
12
Issue
1
Pages
124
Publication year
2025
Publication date
May 2025
Publisher
Springer Nature B.V.
Place of publication
Heidelberg
Country of publication
Netherlands
e-ISSN
2196-1115
Source type
Scholarly Journal
Language of publication
English
Document type
Journal Article
Publication history
Online publication date
2025-05-20
Milestone dates
2025-04-27 (Registration); 2024-11-26 (Received); 2025-04-27 (Accepted)
First posting date
2025-05-20
ProQuest document ID
3206309445
Document URL
https://www.proquest.com/scholarly-journals/leveraging-machine-learning-proactively-identify/docview/3206309445/se-2?accountid=208611
Copyright
Copyright Springer Nature B.V. May 2025
Last updated
2025-11-14
Database
2 databases
  • Coronavirus Research Database
  • ProQuest One Academic