Keywords: Credit Card Fraud Detection, Machine Learning, Deep Learning, Streaming Transaction Data, Behavioural Pattern Analysis.
ABSTRACT
Credit cards are easy and attractive targets for fraud. E-commerce and many other online sites have expanded their online payment modes, which increases the risk of online fraud. As fraud rates have risen, researchers have started using different machine learning methods to detect and analyse fraud in online transactions. The main aim of this paper is to design and develop a novel fraud detection method for streaming transaction data, with the objective of analysing customers' past transaction details and extracting their behavioural patterns. Cardholders are first clustered into different groups based on their transaction amounts. A sliding window strategy is then used to aggregate the transactions made by the cardholders in each group, so that the behavioural pattern of each group can be extracted. Different classifiers are then trained on the groups separately, and the classifier with the better rating score can be chosen as one of the best methods for predicting fraud. This is followed by a feedback mechanism that addresses the problem of concept drift. In this paper, we work with the European credit card fraud dataset.
1. INTRODUCTION: Fraud detection involves monitoring the activities of populations of users in order to prevent objectionable behavior such as fraud and defaulting. It is a highly relevant problem that demands the attention of communities such as machine learning and data science, where its solution can be automated. The problem is particularly challenging from a learning perspective because it is characterized by factors such as class imbalance: valid transactions far outnumber fraudulent ones, and the statistical properties of transaction patterns change over the course of time. In real-world settings, massive streams of payment requests are quickly scanned by automatic tools that determine which transactions to authorize, and the performance of fraud detection must be maintained over time. For many banks, retaining highly profitable customers is the number one business goal. Banking fraud poses a significant threat to that goal: beyond substantial financial losses, the loss of trust and credibility is a concern for both banks and customers. In the banking industry, credit card fraud detection using machine learning is not only a trend but a necessity for putting proactive monitoring and fraud prevention mechanisms in place.
Machine learning has helped these institutions reduce time-consuming manual reviews as well as the denial of legitimate transactions. In this project, we detect fraudulent credit card transactions with the help of machine learning models and analyze the customer data that has been collected. Machine learning is a subfield of Artificial Intelligence (AI) that is generally used to understand the structure of data and fit that data into models that people can understand and use. It differs from traditional computing, in which algorithms are sets of explicitly programmed instructions that the computer follows to calculate a result or solve a problem. Machine learning algorithms instead allow the computer to train on data inputs and use statistical analysis to output values that fall within a specific range. Machine learning thus makes it easier for computers to build models from sample data in order to automate decision making based on the data inputs.
Class imbalance is a problem in ML in which the total number of samples of one class (positive) is far smaller than the total number of samples of another class (negative). The classification challenge posed by unbalanced datasets has been the subject of several studies, and this extensive body of work offers several answers. However, to the best of our knowledge, the problem of class imbalance has not yet been solved. We propose to alter the DL algorithm of the CNN model by adding additional layers for feature extraction and for classifying credit card transactions as fraudulent or otherwise. The top attributes from the prepared dataset are ranked using feature selection techniques. After that, credit card fraud (CCF) is classified using several supervised machine learning and deep learning models. The main aim of this study is to detect fraudulent credit card transactions with the help of ML and deep learning algorithms.
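One common way to counter the class imbalance described above is to weight each class inversely to its frequency during training. The following is a minimal sketch under stated assumptions: the function name `balanced_class_weights` and the toy label list are hypothetical, and the formula n_samples / (n_classes * class_count) is the widely used "balanced" heuristic, not necessarily the weighting applied in this study.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Return {class: weight} with weight = n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 0 = legitimate, 1 = fraudulent (1 fraud in 1,000).
labels = [0] * 999 + [1]
weights = balanced_class_weights(labels)

# The rare fraud class receives a much larger weight than the majority class,
# so misclassifying a fraud costs more in a weighted training loss.
print(weights)
```

With weights of this kind, the total weight of each class is equal, which keeps the majority class from dominating the training objective.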
2. LITERATURE SURVEY: Mohamad Zamini proposed an unsupervised fraud detection method using autoencoder-based clustering. The autoencoder is an auto-associator neural network, which they used to lower the dimensionality, extract useful features, and increase the efficiency of learning in a neural network. They used the European dataset of 284,807 transactions, of which 0.17% are fraudulent, and trained their autoencoder-based clustering with the following parameters: number of iterations = 300; number of clusters = 2; clustering initialization = k-means++; divergence tolerance = 0.001; learning rate of the model = 0.1; number of epochs = 200; activation functions = ELU and ReLU. They obtained a training loss of 0.024 and a validation loss of 0.027. They report that the mean for non-fraud data is 75% less than the mean reconstruction error, which is 25%, and that the design of their model is context-free. For the model's predictions, they report 56,257 true positives, 607 false negatives, 18 false positives, and 80 true negatives, so 56,337 predictions (56,257 + 80) are correct. These four counts sum to 56,962 evaluated transactions, a subset of the 284,807 transactions in the full dataset. [Credit Card Fraud Detection using auto encoder-based clustering]
A second study compared two random forests: a random-tree-based random forest and a CART-based random forest. The authors used the two random forest algorithms to learn the behaviour features of normal and abnormal transactions; the algorithms differ in their base classifiers and in their performance. Both were applied to a dataset from an e-commerce company in China, using subsets in which the ratio of legitimate to fraudulent transactions ranges from 1:1 to 10:1. The accuracy of the random-tree-based random forest is 91.96%, whereas that of the CART-based random forest is 96.7%. Because the data comes from a B2C dataset, problems such as unbalanced data arose, so the algorithm can still be improved. [Random Forest for Credit Card Fraud Detection]
3. SYSTEM ANALYSIS
3.1 EXISTING SYSTEM
Credit cards are easy targets for fraud. With little risk, a significant amount can be withdrawn in a short period without the owner's knowledge. Fraudsters always try to make every fraudulent transaction look legitimate, which makes fraud detection a challenging and difficult task. Predictive modeling for credit card fraud analysis using K-Nearest Neighbours (KNN) involves classifying credit card transactions as either legitimate or fraudulent based on their features, such as transaction amount, time, and user behavior. In the context of credit card fraud detection, KNN is a simple yet effective machine learning technique for classification tasks. It works by analysing the features of a given transaction and comparing them with the features of its nearest neighbours in the feature space. When a new transaction is encountered, KNN identifies the 'K' most similar transactions in the historical dataset, and the majority class (fraud or non-fraud) among these neighbours is used to classify the new transaction.
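The KNN procedure described above can be sketched in a few lines. This is an illustrative example only: the function `knn_predict`, the two features (scaled amount and scaled time of day), and the five historical transactions are assumptions made for the sketch, and Euclidean distance is one common choice of similarity measure.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every historical transaction.
    distances = [
        (math.dist(features, query), label)
        for features, label in zip(train_X, train_y)
    ]
    # Keep the k closest neighbours and count their labels.
    nearest = sorted(distances, key=lambda pair: pair[0])[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical, already-scaled features: (amount, time of day).
train_X = [(0.10, 0.50), (0.12, 0.55), (0.15, 0.45), (0.90, 0.05), (0.95, 0.10)]
train_y = ["legit", "legit", "legit", "fraud", "fraud"]

prediction = knn_predict(train_X, train_y, query=(0.92, 0.08), k=3)
print(prediction)
```

Because every prediction compares the query with all stored transactions and relies on raw distances, the sketch also makes the limitations listed below easy to see: noisy features, outliers, and many dimensions all distort the distance ranking.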
LIMITATION OF EXISTING SYSTEM
* High-dimensional data reduces KNN's effectiveness.
* KNN accuracy decreases with noisy data.
* Outliers cause misclassification in fraud detection.
3.2 PROPOSED SYSTEM
The proposed model aims to classify credit card transactions as fraudulent or legitimate using machine learning (ML), deep learning (DL), and support vector machines (SVM). The dataset undergoes preprocessing, addressing missing values and outliers before being split into training and testing sets. NLP techniques like tokenization and named entity recognition extract valuable text features. Machine learning algorithms are trained on structured and unstructured data to improve fraud detection accuracy. The models are evaluated using precision, recall, and F1-score to ensure optimal performance. By integrating multiple techniques, the system enhances fraud detection efficiency, minimizing false positives and improving security.
4. SYSTEM ARCHITECTURE
5. METHODOLOGY
The proposed credit card fraud detection system follows a structured methodology to ensure accurate classification of fraudulent transactions. Data collection is the first step, where a dataset of previous credit card transactions is gathered. This dataset includes both fraudulent and non-fraudulent cases, providing a foundation for model training.
Data preprocessing is then performed to handle missing values, remove outliers, and normalize transaction amounts. The dataset is split into training and testing sets, ensuring that the model learns patterns effectively while being tested on unseen data. Feature engineering techniques, including Natural Language Processing (NLP) methods like tokenization and named entity recognition, are applied to extract valuable insights from unstructured data.
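As a hedged illustration of two of the preprocessing steps named above, the sketch below standardizes transaction amounts and performs a stratified train/test split so that the rare fraud class appears in both sets. The helper names `standardize` and `stratified_split`, the 80/20 ratio, and the toy data are assumptions of the sketch, not details taken from the study. In practice the scaling statistics would normally be computed on the training set only.

```python
import random
import statistics


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Split sample indices into train/test, preserving each class's share."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    train_idx, test_idx = [], []
    for indices in by_class.values():
        rng.shuffle(indices)
        # At least one sample of every class goes to the test set.
        n_test = max(1, round(len(indices) * test_fraction))
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Hypothetical toy data: 95 legitimate (0) and 5 fraudulent (1) transactions.
labels = [0] * 95 + [1] * 5
amounts = [float(10 + i) for i in range(100)]

scaled_amounts = standardize(amounts)
train_idx, test_idx = stratified_split(labels, test_fraction=0.2)
```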
Multiple machine learning (ML) and deep learning (DL) algorithms, including Support Vector Machines (SVM), Decision Trees, and Neural Networks, are implemented to classify transactions. These models are trained using the training dataset and evaluated on the test dataset using performance metrics such as precision, recall, and F1-score.
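The three evaluation metrics named above follow directly from the counts of true positives, false positives, and false negatives. The sketch below computes them for a small, made-up set of predictions; the function name `classification_metrics` and the label vectors are hypothetical, with 1 standing for a fraudulent transaction.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the `positive` class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)

    # Precision: share of flagged transactions that are truly fraudulent.
    precision = tp / (tp + fp) if tp + fp else 0.0
    # Recall: share of fraudulent transactions that were flagged.
    recall = tp / (tp + fn) if tp + fn else 0.0
    # F1-score: harmonic mean of precision and recall.
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for ten test transactions (1 = fraud, 0 = legitimate).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

metrics = classification_metrics(y_true, y_pred)
print(metrics)
```

Unlike plain accuracy, none of these metrics rewards a model for the large number of legitimate transactions it correctly ignores, which is why they suit imbalanced fraud data.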
To further improve accuracy, an ensemble learning approach is considered, combining multiple classifiers to enhance prediction performance. Finally, the trained model is deployed, enabling real-time fraud detection and enhancing the security of online transactions.
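One simple form of the ensemble idea is hard majority voting, in which each classifier casts one vote per transaction. The sketch below is an assumption-laden illustration: the three rule-based functions stand in for trained models, the feature names in the transaction dictionary are hypothetical, and majority voting is only one of several ways to combine classifiers (the source does not state which one is used).

```python
from collections import Counter


def majority_vote(classifiers, transaction):
    """Return the label predicted by most classifiers for one transaction."""
    votes = Counter(clf(transaction) for clf in classifiers)
    return votes.most_common(1)[0][0]


# Hypothetical stand-ins for trained models: simple rules on a feature dict.
def amount_rule(tx):
    return "fraud" if tx["amount"] > 1000 else "legit"


def night_rule(tx):
    return "fraud" if tx["hour"] < 5 else "legit"


def foreign_rule(tx):
    return "fraud" if tx["country"] != tx["home_country"] else "legit"


# An odd number of voters avoids ties between the two classes.
ensemble = [amount_rule, night_rule, foreign_rule]

tx = {"amount": 2500, "hour": 3, "country": "FR", "home_country": "FR"}
label = majority_vote(ensemble, tx)
print(label)
```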
6. MODULES
The implementation of Credit Card Fraud Detection using Machine Learning (ML) and Deep Learning (DL) techniques consists of several essential modules:
1. Data Pre-processing Module: This module includes handling missing data, removing or treating outliers, scaling numerical features, encoding categorical variables, and addressing class imbalance using techniques like SMOTE (Synthetic Minority Over-sampling Technique) or class weighting to ensure the model can accurately detect fraud despite the imbalance in transaction data.
2. Feature Engineering and Selection Module: This step involves creating new features from existing data (e.g., transaction time features, user behavior patterns) and selecting the most relevant features for training, which helps improve model accuracy and efficiency.
3. Model Building Module: This includes training machine learning models and deep learning models. It involves selecting appropriate algorithms based on data characteristics and tuning hyperparameters for optimal performance.
4. Model Evaluation Module: After training, models are evaluated using performance metrics such as accuracy, precision, recall, F1-score, and AUC-ROC to ensure the model correctly identifies fraudulent transactions while minimizing false positives.
5. Deployment and Monitoring Module: This module deploys the trained model in a production environment, integrates it with real-time transaction systems, and continuously monitors its performance. It also includes periodic retraining of the model to adapt to new fraud patterns.
6. Visualization and Reporting Module: Visual tools and dashboards are used to track model performance, interpret results, and report insights, helping stakeholders understand fraud detection effectiveness and areas for improvement.
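The SMOTE technique mentioned in the Data Pre-processing Module creates synthetic minority samples by interpolating between a minority sample and one of its nearest minority neighbours. The sketch below is a simplified illustration of that idea, not the reference algorithm or a library implementation: the function name `smote_like_oversample`, the choice of k, and the three fraud points are assumptions, and the function expects at least two minority samples.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base point (excluding itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(p, base),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(
            tuple(b + gap * (n - b) for b, n in zip(base, neighbour))
        )
    return synthetic


# Hypothetical scaled features of three known fraudulent transactions.
fraud_points = [(0.90, 0.05), (0.95, 0.10), (0.85, 0.12)]
new_points = smote_like_oversample(fraud_points, n_new=4)
```

Each synthetic point lies on a line segment between two real fraud samples, so the oversampled class stays inside the region already occupied by observed fraud. Oversampling of this kind is normally applied to the training set only, so that the test set still reflects the real class ratio.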
7. RESULTS:
In credit card fraud detection using Machine Learning (ML) and Deep Learning (DL) techniques, the results typically show that both approaches can effectively identify fraudulent transactions, though they have different strengths. Machine learning models often perform well in terms of accuracy and interpretability, particularly when tuned for imbalanced datasets. Deep learning models, however, can offer higher performance when trained on large datasets, capturing complex patterns in transaction data that simpler models might miss. Evaluation metrics such as precision, recall, and F1-score are crucial, as they reveal the model's ability to minimize false positives while maximizing true positives (fraudulent transactions correctly identified). The credit card fraud detection system was successfully executed through a series of inputs via a user-friendly web interface. Users entered transaction details such as amount, time, method, transaction ID, card type, location, and bank details. The model then predicted the transaction as "Fraudulent" or "Not Fraudulent".
Fig 2: Model accuracy
The above graph represents the model accuracy for credit card fraud detection. Among various models, the neural network achieved the highest accuracy of 98%, followed by XGBoost at 97%, random forest at 96%, logistic regression at 94%, and decision tree at 92%.
Generally, deep learning methods tend to perform better on larger datasets, whereas traditional ML models are more efficient and interpretable for smaller datasets or real-time applications. The discussion highlights that the choice between ML and DL depends on dataset size, complexity, and the need for model interpretability versus accuracy.
8. CONCLUSION
In conclusion, credit card fraud detection using machine and deep learning techniques offers a highly effective and adaptive approach to identifying and preventing fraudulent activities in real-time. By leveraging advanced algorithms and models, the system can continuously learn from historical and real-time data to detect subtle patterns and behaviours indicative of fraud. The ability to process large volumes of transaction data quickly and accurately reduces the risk of financial losses for both cardholders and financial institutions. Furthermore, the continuous feedback loop and model retraining ensure that the system evolves with emerging fraud tactics, maintaining its relevance and effectiveness. With robust security and compliance measures in place, this approach not only enhances fraud detection but also ensures that user data is safeguarded. Overall, machine and deep learning-based fraud detection systems are essential in today's rapidly evolving financial landscape, providing a reliable and scalable solution to combat credit card fraud.
FUTURE SCOPE
In Credit Card Fraud Detection using Machine Learning and Deep Learning Techniques, applying Convolutional Neural Networks (CNNs) is an innovative approach that has shown promising results, particularly in identifying patterns in transaction data. Traditionally, CNNs are used in image and video recognition tasks, but in fraud detection, they can be used to capture spatial hierarchies and patterns within sequential transaction data by treating the features as "spatial" data. CNNs can automatically learn and extract relevant features from the raw transaction data, such as transaction amount, time, and user behaviour, without the need for manual feature engineering.
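To make the idea of convolving over sequential transaction data concrete, the sketch below applies a single 1D convolution filter, a ReLU activation, and global max pooling to a short sequence of amounts. Everything in it is an assumption for illustration: the function names, the six scaled amounts, and the hand-picked kernel, which a real CNN would learn from data rather than receive by hand.

```python
def conv1d(sequence, kernel):
    """Valid-mode 1D cross-correlation, the operation used in CNN layers."""
    width = len(kernel)
    return [
        sum(sequence[i + j] * kernel[j] for j in range(width))
        for i in range(len(sequence) - width + 1)
    ]


def relu(values):
    """Keep positive activations and zero out negative ones."""
    return [max(0.0, v) for v in values]


def global_max_pool(values):
    """Summarize a feature map by its strongest activation."""
    return max(values)


# Hypothetical scaled amounts of one cardholder's last six transactions.
amounts = [0.1, 0.1, 0.2, 0.1, 0.9, 1.0]
# Hand-picked kernel that responds to a sudden rise between two transactions.
rise_kernel = [-1.0, 1.0]

feature_map = relu(conv1d(amounts, rise_kernel))
score = global_max_pool(feature_map)
print(feature_map, score)
```

The filter slides along the sequence and produces its largest activation where the amount jumps sharply, which is the kind of local pattern a trained convolutional layer could pick up without manual feature engineering.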
Copyright Kohat University of Science and Technology (KUST) 2025