Abstract: Phishing is the art of imitating the website of a credible firm in order to capture a user's private information, such as usernames, passwords and social security numbers. Phishing websites exhibit a variety of cues, both within their content and in the browser-based security indicators that accompany the website. Several solutions have been proposed to tackle phishing. Nevertheless, there is no single magic bullet that can eliminate this threat. One promising technique for predicting phishing attacks is based on data mining, particularly the 'induction of classification rules', since anti-phishing solutions aim to predict the website class accurately, which matches the goal of data mining classification techniques. In this study, the authors shed light on the important features that distinguish phishing websites from legitimate ones, assess how well rule-based data mining classification techniques predict phishing websites, and identify which classification technique proves more reliable.
1 Introduction
A phishing attack classically starts with an email that appears to come from an honest enterprise, asking victims to update or confirm their personal information by visiting a link within the email. Although phishers now employ several techniques to create phishing websites that fool and lure users, they all rely on a common set of features, because without those features they lose the advantage of deception. This makes it possible to differentiate between honest and phishing websites based on features extracted from the visited website.
Overall, two approaches are employed in identifying phishing websites. The first is based on blacklists [1], in which the requested URL is compared with those in the list. The downside of this approach is that a blacklist usually cannot cover all phishing websites, since a new fraudulent website can be launched within seconds. The second approach is known as the heuristic-based method [2], in which several features are collected from the website to classify it as either phishy or legitimate. In contrast to the blacklist method, a heuristic-based solution can recognise freshly created phishing websites. The accuracy of the heuristic-based method depends on picking a set of discriminative features that help in distinguishing the website class [3]. The way in...
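The contrast between the two approaches can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not taken from the study: the blacklist entry, the three URL features (`host_is_ip`, `has_at_symbol`, `long_url`), the 75-character threshold, and the toy rule that flags a site when any feature fires.

```python
import ipaddress
from urllib.parse import urlparse

# Hypothetical blacklist; real lists are far larger and updated continuously.
BLACKLIST = {"http://phish.example/login"}


def is_blacklisted(url: str) -> bool:
    """Blacklist approach: look the requested URL up in a known list."""
    return url in BLACKLIST


def extract_features(url: str) -> dict:
    """Heuristic approach, step 1: collect illustrative URL features."""
    host = urlparse(url).hostname or ""
    try:
        ipaddress.ip_address(host)
        host_is_ip = True
    except ValueError:
        host_is_ip = False
    return {
        "host_is_ip": host_is_ip,       # raw IP address used instead of a domain name
        "has_at_symbol": "@" in url,    # '@' can hide the real destination
        "long_url": len(url) > 75,      # threshold is an assumption
    }


def classify(url: str) -> str:
    """Heuristic approach, step 2: a toy rule over the extracted features."""
    features = extract_features(url)
    return "phishy" if any(features.values()) else "legitimate"


if __name__ == "__main__":
    fresh_url = "http://192.0.2.10/secure/update"  # not yet on any blacklist
    print(is_blacklisted(fresh_url), classify(fresh_url))
```

In this sketch the freshly created URL is missed by the blacklist lookup but is still flagged by the feature-based rule, which is the advantage the text attributes to heuristic methods. A hand-written rule such as this one would in practice be replaced by rules induced from labelled data.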





