Social engineering is a manipulation technique that persuades individuals to unintentionally share private data (Alzahrani, 2020). SMS Phishing (Smishing) attacks use social engineering to trick victims into ignoring security protocols and sharing confidential information (Salahdine & Kaabouch, 2019). Phishing attacks account for 90% of data breaches, and 84% of organizations were targeted by at least one Smishing attempt in 2023 (Verizon Business, 2024). These attacks have become a preferred method for adversaries, who exploit victims’ behavior and emotions to gain their trust.
Traditional Smishing detection models that run on limited computational resources have struggled to keep up with the evolving tactics of cybercriminals. This research examines the challenge of detecting and classifying Smishing within a multiclass dataset, focusing on improving the detection of minority classes. Building on the work of Mishra and Soni (2023b), it develops a Deep Learning (DL)-based phishing detection system that extends detection from binary to multiclass classification and uses four different feature types to better identify minority Smishing types.
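To see why minority classes need separate attention, it helps to report precision and recall per class rather than overall accuracy alone. The sketch below is illustrative only: the labels `ham`, `spam`, and `smishing`, the function names, and the toy predictions are hypothetical and are not drawn from the study or its dataset.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return {label: (precision, recall)} for every label in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    predicted = Counter(y_pred)  # how often each label was predicted
    actual = Counter(y_true)     # how often each label truly occurs
    metrics = {}
    for label in labels:
        precision = true_positives[label] / predicted[label] if predicted[label] else 0.0
        recall = true_positives[label] / actual[label] if actual[label] else 0.0
        metrics[label] = (precision, recall)
    return metrics


def accuracy(y_true, y_pred):
    """Fraction of messages whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical toy labels in which legitimate ("ham") messages outnumber the minority classes.
y_true = ["ham"] * 6 + ["spam"] * 2 + ["smishing"] * 2
y_pred = ["ham"] * 6 + ["spam", "ham"] + ["smishing", "spam"]

metrics = per_class_metrics(y_true, y_pred)
# Macro recall weights every class equally, regardless of how many messages it has.
macro_recall = sum(r for _, r in metrics.values()) / len(metrics)

print(f"accuracy     = {accuracy(y_true, y_pred):.2f}")
print(f"macro recall = {macro_recall:.2f}")
for label, (p, r) in metrics.items():
    print(f"{label:9s} precision={p:.2f} recall={r:.2f}")
```

In this toy case overall accuracy is 0.80, yet half of the Smishing messages are missed (recall 0.50). Averaging over classes brings macro recall down to 0.67, exposing a weakness on the minority class that overall accuracy hides.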
This research explores the key features that help differentiate Smishing and Spam from legitimate messages, using the “SMS Phishing Dataset for Machine Learning and Pattern Recognition” (Mishra & Soni, 2023b). The analysis finds that URLs and email addresses are the most important features for this classification. After testing various ensemble models, this study shows that a chained transformer model works best for detecting and classifying phishing attacks, especially rare Smishing examples. The chain uses GPT-2 to generate synthetic data and BERT embeddings for multiclass classification. The resulting MultiChainGuard model achieves over 97% precision in identifying different types of phishing.
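As a rough illustration of the URL and email features highlighted above, the sketch below converts a message into binary presence flags using regular expressions. It is a stand-in under stated assumptions, not the study's extraction method. The patterns, the `extract_flags` name, and the extra phone-number flag are hypothetical, and such simple patterns will miss some obfuscated links.

```python
import re

# Hypothetical, deliberately simple patterns; production detection needs more robust parsing.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
URL_RE = re.compile(
    r"(?:https?://|www\.)\S+"                                  # explicit scheme or www prefix
    r"|\b[a-z0-9-]+\.(?:com|net|org|info|ly|co)\b(?:/\S*)?",  # bare domains such as bit.ly/abc
    re.IGNORECASE,
)
# At least 9 digit/separator characters, starting and ending with a digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def extract_flags(text):
    """Return binary presence flags for URLs, emails, and phone numbers in an SMS."""
    has_email = bool(EMAIL_RE.search(text))
    # Remove emails first so the domain part of an address is not counted as a URL.
    remainder = EMAIL_RE.sub(" ", text)
    return {
        "has_url": bool(URL_RE.search(remainder)),
        "has_email": has_email,
        "has_phone": bool(PHONE_RE.search(remainder)),
    }


# Hypothetical example messages, not taken from the dataset.
messages = [
    "Your parcel is on hold. Pay the fee at http://parcel-fix.example/pay",
    "Reply to support@bank-help.com to unlock your account",
    "Claim your prize at bit.ly/win now",
    "Call 0800 123 4567 today",
    "See you at 6?",
]
for msg in messages:
    print(extract_flags(msg), "<-", msg)
```

Emails are matched and removed before the URL check because a bare-domain URL pattern would otherwise also fire on the domain part of an email address. That overlap would blur the distinction between the two features the analysis found most important.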
This research fills a gap in current phishing detection capabilities by identifying the best deep learning architecture for Smishing detection on a multiclass dataset. The analysis focuses on deploying the model on devices with limited computational resources by using small, open-source models. Integrating multiple deep learning models to improve Smishing detection advances the field of SMS phishing research.
Creating efficient and accurate Smishing detection models can improve user protection against a growing number of cyberattacks. Using open-source resources will make this security measure available to more users, helping create a safer online environment.
