Credit Card Fraud Detection - Machine Learning Methods

18th International Symposium INFOTEH-JAHORINA, 20-22 March 2019

Dejan Varmedja, Mirjana Karanovic, Srdjan Sladojevic, Marko Arsenovic, Andras Anderla
Faculty of Technical Sciences
University of Novi Sad
Novi Sad, Serbia

Abstract: Credit card fraud refers to the physical loss of a credit card or the loss of sensitive credit card information. Many machine-learning algorithms can be used for its detection. This research presents several algorithms that can be used for classifying transactions as fraudulent or genuine. The Credit Card Fraud Detection dataset was used in the research. Because the dataset was highly imbalanced, the SMOTE technique was used for oversampling. Further, feature selection was performed and the dataset was split into two parts, training data and test data. The algorithms used in the experiment were Logistic Regression, Random Forest, Naive Bayes and Multilayer Perceptron. Results show that each algorithm can be used for credit card fraud detection with high accuracy. The proposed model can be used for detection of other irregularities.

Keywords: Fraud; Logistic Regression; Multilayer Perceptron; Naive Bayes; Random Forest

I. INTRODUCTION

A growing number of new companies are being founded all around the world [1]. All of these companies are trying to provide the best service quality for their customers. To succeed in that, companies process a lot of data on a daily basis. This data comes from a vast number of sources and in different formats. Moreover, this data contains some of the key parts of the company's future business. Because of that, companies need to store that data, to process it and, most importantly, to keep it safe. Without securing data, a lot of it can be used by other companies or, even worse, it can be stolen. In most cases, financial information is stolen, which can harm a whole company or an individual.

There are several types of fraud [2]. Check fraud occurs when a person forges a check or pays for something with a check knowing that there is not enough money in the account. Internet sales fraud occurs when a fraudster sells fake or counterfeit items, or takes payment without delivering the item. There are several more, such as charity fraud, identity theft, credit card fraud, debt elimination, insurance fraud and others. Due to the increasing popularity of cashless transactions, credit card fraud is one of the most common types of fraud. Credit card fraud refers to the situation where a fraudster uses a credit card for their own needs while the owner of that credit card is not aware of it. Fraudulent transactions conducted using credit cards acquired worldwide amounted to €1.8 billion in 2016 [3]. Although there is a tremendous volume increase in credit card transactions, the amount of fraud has stayed proportionally the same or has decreased due to sophisticated fraud detection systems. However, fraudsters are constantly coming up with new ways to steal information [4].

There are two types of credit card fraud. One is theft of the physical card, and the other is stealing sensitive information from the card, such as the card number, CVV code, type of card and others. By stealing credit card information, a fraudster can take a large amount of money or make a large number of purchases before the cardholder finds out. Because of that, companies use various machine learning methods to recognize which transactions are fraudulent and which are not.

The purpose of this paper is to analyze various machine-learning algorithms, such as Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB) and Multilayer Perceptron (MLP), in order to determine which algorithm is most suitable for credit card fraud detection.

The rest of the paper is structured as follows: Section II presents research that deals with the specified problem, Section III gives a brief description of the dataset used in the experiment, and Section IV presents the results. Finally, concluding remarks are discussed in Section V, followed by a list of references.

II. RELATED WORK

Fraudulent activities cause major losses, which has motivated researchers to find a solution that would detect and prevent fraud. Several methods have already been proposed and tested. Some of them are briefly reviewed below.

Classical algorithms such as Gradient Boosting (GB), Support Vector Machines (SVM), Decision Tree (DT), LR and RF have proven useful. In paper [5], GB, LR, RF, SVM and a combination of certain classifiers were used, which led to a high recall of over 91% on a European dataset. High precision and recall were achieved only after balancing the dataset by undersampling the data. In paper [6], the European dataset was also used, and a comparison was made between models based on LR, DT and RF. Among the three models, RF proved to be the best, with an accuracy of 95.5%, followed by DT with 94.3% and LR with 90%.

According to [7] and [8], k-nearest neighbors (KNN) and outlier detection techniques can also be efficient in fraud detection. They have proven useful in minimizing false alarm rates and increasing the fraud detection rate. The KNN algorithm also performed well in the experiment in paper [9], where the authors tested and compared it with other classical algorithms.

Unlike the papers mentioned so far, paper [10] compared some classical algorithms with deep learning techniques. All of the tested techniques achieved an accuracy of approximately 80%. The authors of paper [11] compared the following algorithms side by side: RF, GB, LR, SVM, DT, KNN, NB, XGBoost (XGB), MLP and a stacking classifier (a combination of multiple machine learning classifiers), using the European dataset. As a result of thorough data preprocessing, all of the algorithms achieved a high accuracy of over 90%. The stacking classifier was the most successful, with an accuracy of 95% and a recall of 95%. In paper [12], a neural network was tested on the European dataset. The experiment included a back propagation neural network that was optimized with the Whale algorithm. The neural network consisted of 2 input layers, 20 hidden and 2 output layers. Due to the optimization algorithm, they achieved exceptional results on 500 test samples: 96.40% accuracy and 97.83% recall. The authors of papers [13] and [14] used neural networks in order to demonstrate the improvement in results when ensemble techniques are used. In paper [15], three datasets were used for a comparison between Auto-encoder and Restricted Boltzmann Machine algorithms, which led to the conclusion that algorithms like MLP can be suitable for credit card fraud detection.

Numerous papers focus on detecting fraudulent transactions using deep neural networks. However, these models are computationally expensive and perform better on larger datasets [16]. This approach may lead to great results, as we saw in some papers, but what if the same or even better results can be achieved with fewer resources? Our main goal is to show that different machine learning algorithms can give decent results with appropriate preprocessing. The authors of most of the mentioned papers used an undersampling technique, which motivated us to use a different approach, namely oversampling.

Considering these facts, the authors of this paper decided to compare the suitability of LR, RF, NB and MLP for credit card fraud detection. To achieve that, an experiment was conducted.

A. Dataset

In this research, the Credit Card Fraud Detection dataset was used, which can be downloaded from Kaggle [17]. This dataset contains transactions made over two days in September 2013 by European cardholders.

The dataset contains 31 numerical features. Since some of the input variables contain financial information, a PCA transformation of these input variables was performed in order to keep the data anonymous. Three of the given features were not transformed. The feature "Time" shows the time between the first transaction and every other transaction in the dataset. The feature "Amount" is the amount of the transaction made by credit card. The feature "Class" represents the label and takes only 2 values: 1 in case of a fraudulent transaction and 0 otherwise.

The dataset contains 284,807 transactions, of which 492 were frauds and the rest were genuine. Considering the numbers, we can see that this dataset is highly imbalanced, with only 0.173% of transactions labeled as frauds.

Since the distribution ratio of classes plays an important role in model accuracy and precision, preprocessing of the data is crucial.

B. Preprocessing

Feature selection is a fundamental technique that selects the variables that are most relevant in the given dataset. Carefully choosing appropriate features and removing the less important ones can reduce overfitting, improve accuracy and reduce training time. Visualization techniques can be helpful in that process. The Feature selector tool [18] by Will Koehrsen was used in this experiment for that purpose. Using this tool, it was determined which features are the most important. Furthermore, features that do not contribute to the cumulative importance of 95% were removed. After feature selection, 27 features were selected for further experiments.

Machine learning algorithms have trouble learning when the classification categories are not approximately equally distributed. Since the given data is highly imbalanced, it is necessary to perform some kind of balancing so that the model can be trained efficiently. Frequently used methods for adjusting the class distribution include undersampling the majority class, oversampling the minority class, or a combination of the two. Synthetic Minority Oversampling Technique (SMOTE) is a popular oversampling method that has proven useful on imbalanced datasets [19], [20]. SMOTE was proposed as a method to improve on random oversampling (Fig. 1).

Figure 1. Class distribution before and after sampling

Many machine-learning algorithms are sensitive to the scale of the input. Taking into account that the values of time and amount vary widely, scaling was done in order to bring all features to the same order of magnitude.
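The oversampling and scaling steps can be illustrated with a minimal, dependency-free sketch. It is not the authors' code: the experiment relied on the imblearn and sklearn libraries, whereas the helper names `smote` and `standardize`, the toy data and the neighbour count `k` below are assumptions chosen only to show the idea of interpolating between minority samples and rescaling a feature.

```python
import math
import random


def standardize(column):
    """Rescale one feature column to zero mean and unit variance."""
    mean = sum(column) / len(column)
    var = sum((v - mean) ** 2 for v in column) / len(column)
    std = math.sqrt(var) or 1.0  # fall back to 1.0 for a constant column
    return [(v - mean) / std for v in column]


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic rows by interpolating between minority rows."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of the chosen row (excluding itself)
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: math.dist(base, p),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> other
        synthetic.append([b + gap * (o - b) for b, o in zip(base, other)])
    return synthetic


# Toy data (hypothetical, not taken from the paper): 8 genuine and 3 fraud rows.
genuine = [[float(i), float(i % 3)] for i in range(8)]
fraud = [[10.0, 10.0], [11.0, 10.5], [10.5, 11.5]]

# Oversample the minority class until both classes have the same size.
new_fraud = smote(fraud, n_new=len(genuine) - len(fraud))
balanced_fraud = fraud + new_fraud
print(len(genuine), len(balanced_fraud))  # both classes now have 8 rows

# Bring a widely varying feature (e.g. a transaction amount) to a common scale.
amounts = standardize([1.0, 250.0, 3.5, 9000.0])
```

Unlike random oversampling, which duplicates existing minority rows, every synthetic row here lies on a line segment between two real minority rows, which is the core idea behind SMOTE.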
The experiment environment is the Windows 10 operating system, and the software environment is Spyder, a scientific Python development environment that is part of the Anaconda platform. The libraries used include numpy, pandas, matplotlib, sklearn and imblearn.

The previously mentioned algorithms used in the experiment are described in the following section.

C. Experiment

Logistic regression is one of the most popular classification algorithms in machine learning. The logistic regression model describes the relationship between a dependent variable and predictors that can be continuous, binary or categorical. The dependent variable can be binary. Based on some predictors, we predict whether something will happen or not. We estimate the probability of belonging to each category for a given set of predictors.

Naive Bayes is a supervised learning algorithm which assumes that there are no dependencies between attributes. It is based on Bayes' theorem. Depending on the type of distribution, there are the following variants: Gaussian, Multinomial and Bernoulli. In this research, the Bernoulli distribution is used for detecting fraudulent transactions.

Random forest is an algorithm that can be used for both classification and regression problems. It consists of many decision trees. This algorithm gives better results when there is a higher number of trees in the forest, which also helps prevent the model from overfitting. Each decision tree in the forest gives some result. These results are merged together in order to get a more accurate and stable prediction.

Multilayer perceptron is a feedforward artificial neural network that consists of a minimum of 3 layers of nodes: an input layer, a hidden layer and an output layer. Each node computes a weighted sum of its inputs, adds a bias and passes the result through an activation function. The activation function decides whether a neuron is activated, i.e. whether its output is passed on to the following layer.

The ANN used in the experiment consisted of 4 hidden layers with 50, 30, 30 and 50 units, respectively, with the ReLU activation function. It has been shown that deeper networks achieve better results than those with a smaller number of layers [20]. Following this experience, we started with a smaller number of layers and gradually increased them in order to get acceptable results. The best hyper-parameters were therefore chosen based on exhaustive research. Further enlarging the network increased computational time, and the obtained results did not differ much from those of the chosen architecture. Weight optimization was accomplished with Adam, a stochastic gradient-based optimizer.

The train and test sets were split in an 80:20 ratio and the model was updated through multiple epochs, based on the tolerance for the optimization (TOL). When the loss or score does not improve by at least TOL for a specified number of consecutive iterations, convergence is considered to be reached and training stops.

IV. RESULTS AND DISCUSSION

To determine which algorithm is most suitable for the problem of detecting fraudulent transactions, different criteria for algorithm comparison were used. The most commonly used metrics for evaluating the results of machine learning algorithms are accuracy, recall and precision. All of the mentioned metrics can be calculated from a confusion matrix.

Evaluation of each model's performance was made in accordance with these metrics. Models were tested both on original and over-sampled data, and the results have shown that sampling is very important.

Since the test set consists of 20% of the whole dataset, the total number of test samples is 56,962, of which 98 are fraudulent transactions. The LR model (Table 1) achieved:

• precision: 5.88%,
• recall: 91.84%,
• accuracy: 97.46%.

TABLE 1: CONFUSION MATRIX FOR LR

              Predicted 0   Predicted 1
Actual 0          55424          1440
Actual 1              8            90

The NB model obtained the following results (Table 2):

• precision: 16.17%,
• recall: 82.65%,
• accuracy: 99.23%.

TABLE 2: CONFUSION MATRIX FOR NB

              Predicted 0   Predicted 1
Actual 0          56444           420
Actual 1             17            81

The RF model obtained the following results (Table 3):

• precision: 96.38%,
• recall: 81.63%,
• accuracy: 99.96%.
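As an illustration of how these metrics follow from a confusion matrix, the minimal sketch below recomputes the NB figures from the counts in Table 2. The function name `classification_metrics` is a hypothetical helper introduced here, not part of the authors' code; fraud (class 1) is treated as the positive class.

```python
def classification_metrics(tn, fp, fn, tp):
    """Return precision, recall and accuracy from confusion-matrix counts."""
    precision = tp / (tp + fp)  # share of predicted frauds that are real frauds
    recall = tp / (tp + fn)     # share of real frauds that were detected
    accuracy = (tp + tn) / (tn + fp + fn + tp)
    return precision, recall, accuracy


# Counts from the NB confusion matrix (Table 2): rows are actual, columns predicted.
precision, recall, accuracy = classification_metrics(tn=56444, fp=420, fn=17, tp=81)
print(f"precision={precision:.2%} recall={recall:.2%} accuracy={accuracy:.2%}")
# precision=16.17% recall=82.65% accuracy=99.23%
```

The example also shows why accuracy alone can mislead on imbalanced data: the genuine class dominates the total, so accuracy stays above 99% even though only about one in six flagged transactions is an actual fraud.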
TABLE 3: CONFUSION MATRIX FOR RF

              Predicted 0   Predicted 1
Actual 0          56861             3
Actual 1             18            80

The MLP model obtained the following results (Table 4):

• precision: 79.21%,
• recall: 81.63%,
• accuracy: 99.93%.

TABLE 4: CONFUSION MATRIX FOR MLP

              Predicted 0   Predicted 1
Actual 0          56843            21
Actual 1             18            80

Analyzing the obtained results, it is obvious that accuracy is extremely high, although that does not mean that the results are perfect. Accuracy must be taken "with a grain of salt"; ideally, it should be interpreted in combination with other metrics. The results show that a classical algorithm like RF can give results similar to those of a simple neural network.

Comparison of the obtained results with the results achieved in research on the same dataset with classical algorithms, [5] and [8], shows that oversampling the data can improve the fraud detection rate. As in papers [10] and [11], it is shown that classical algorithms can be as successful as deep learning algorithms. Although papers [12] and [15] present deep learning algorithms as optimal for this type of problem, the choice between them should be made according to the situation. For example, deep networks work better with more data and can be adapted to different domains more easily than classical algorithms. On the other hand, if there is not much data, it is probably better to work with classical algorithms. These algorithms are also easier to interpret and cheaper, both in the financial and the computational sense [21].

V. CONCLUSION

Credit card fraud represents a very serious business problem. It can lead to huge losses, both business and personal. Because of that, companies invest more and more money in developing new ideas and ways that will help to detect and prevent fraud.

The main goal of this paper was to compare certain machine learning algorithms for the detection of fraudulent transactions. Hence, a comparison was made and it was established that the Random Forest algorithm gives the best results, i.e. best classifies whether transactions are fraudulent or not. This was established using different metrics, such as recall, accuracy and precision. For this kind of problem, it is important to have a high recall value. Feature selection and balancing of the dataset have been shown to be extremely important in achieving significant results.

Further research should focus on different machine learning algorithms, such as genetic algorithms, and different types of stacked classifiers, alongside extensive feature selection, to get better results.

This work has been funded by the SENSors and Intelligence in BuiLt Environment (SENSIBLE) project with Grant agreement ID: 734331. The authors would also like to thank the PanonIT company for their support.

[1] Global Facts (2019). Topic: Startups worldwide. [online] Available at: [Accessed 10 Jan. 2019].
[2] Legal Dictionary (2019). Fraud - Definition, Meaning, Types, Examples of fraudulent activity. [online] Available at: [Accessed 15 Jan. 2019].
[3] European Central Bank (2018). Fifth report on card fraud, September 2018. [online] Available at: 09.en.html#toc1 [Accessed 21 Jan. 2019].
[4] (2019). Credit card fraud. [online] Available at: [Accessed 24 Jan. 2019].
[5] A. Mishra, C. Ghorpade, "Credit Card Fraud Detection on the Skewed Data Using Various Classification and Ensemble Techniques", 2018 IEEE International Students' Conference on Electrical, Electronics and Computer Science (SCEECS), pp. 1-5. IEEE.
[6] S. V. S. S. Lakshmi, S. D. Kavilla, "Machine Learning For Credit Card Fraud Detection System", unpublished.
[7] N. Malini, Dr. M. Pushpa, "Analysis on Credit Card Fraud Identification Techniques based on KNN and Outlier Detection", Advances in Electrical, Electronics, Information, Communication and Bio-Informatics (AEEICB), 2017 Third International Conference on, pp. 255-258. IEEE.
[8] Mrs. C. Navamani, M. Phil, S. Krishnan, "Credit Card Nearest Neighbor Based Outlier Detection Techniques".
[9] J. O. Awoyemi, A. O. Adentumbi, S. A. Oluwadare, "Credit card fraud detection using Machine Learning Techniques: A Comparative Analysis", Computing Networking and Informatics (ICCNI), 2017 International Conference on, pp. 1-9. IEEE.
[10] Z. Kazemi, H. Zarrabi, "Using deep networks for fraud detection in the credit card transactions", Knowledge-Based Engineering and Innovation (KBEI), 2017 IEEE 4th International Conference on, pp. 630-633. IEEE.
[11] S. Dhankhad, B. Far, E. A. Mohammed, "Supervised Machine Learning Algorithms for Credit Card Fraudulent Transaction Detection: A Comparative Study", 2018 IEEE International Conference on Information Reuse and Integration (IRI), pp. 122-125. IEEE.
[12] C. Wang, Y. Wang, Z. Ye, L. Yan, W. Cai, S. Pan, "Credit card fraud detection based on whale algorithm optimized BP neural network", 2018 13th International Conference on Computer Science & Education (ICCSE), pp. 1-4. IEEE.
[13] N. Kalaiselvi, S. Rajalakshmi, J. Padmavathi, "Credit card fraud detection using learning to rank approach", 2018 International Conference on Computation of Power, Energy, Information and Communication (ICCPEIC), pp. 191-196. IEEE.
[14] F. Ghobadi, M. Rohani, "Cost Sensitive Modeling of Credit Card Fraud using Neural Network strategy", 2016 Signal Processing and Intelligent Systems (ICSPIS), International Conference of, pp. 1-5. IEEE.
[15] A. Pumsirirat, L. Yan, "Credit Card Fraud Detection using Deep Learning based on Auto-Encoder and Restricted Boltzmann Machine", 2018 International Journal of Advanced Computer Science and Applications, 9(1), pp. 18-25.
[16] Learning – Towards Data Science. [online] Available at: learning-9a42c6d48aa [Accessed 19 Jan. 2019].
[17] (2019). Credit Card Fraud Detection. [online] Available at: [Accessed 10 Jan.
[18] Github (2019). Feature selector. [online] Available at: [Accessed 18 Jan. 2019].
[19] García, Salvador and Nitesh V. Chawla. "SMOTE for Learning from Imbalanced Data: Progress and Challenges, Marking the 15-year Anniversary." (2018), Journal of Artificial Intelligence Research, 61, pp. 863-905.
[20] J. Wang, M. Xu, H. Wang and J. Zhang, "Classification of Imbalanced Data by Using the SMOTE Algorithm and Locally Linear Embedding", 2006 8th International Conference on Signal Processing, Beijing, 2006 (Vol. 3). IEEE.
[21] (2019). Deep Learning. [online] Available at: [Accessed 11 Jan. 2019].

