Credit Card Fraud Detection System
Abstract
Over the past few years, e-commerce has grown to a large scale, and the use of
credit cards has grown with it. Many people now use credit cards for online
shopping, e-billing and other online payments. This frequent use of credit cards
is pushing organizations and banks to implement credit card fraud detection
systems that distinguish between illicit and legitimate transactions. These
systems are trained on pre-existing datasets and then applied to new
transactions. Many techniques are used to detect fraudulent transactions, such
as Genetic Algorithm (GA), Support Vector Machine (SVM) and Artificial Immune
System (AIS). In most of these techniques, the classification results are biased
towards the majority class, and because of this bias the False Positive Rate
(FPR) and False Negative Rate (FNR) increase. To overcome this problem, we have
implemented three techniques, i.e., Naive Bayes (NB), Generative Adversarial
Networks (GAN) and Neural Networks (NN). The final results are then compared in
terms of accuracy, precision, recall and f-measure. Our main objectives are to
minimize the FPR and FNR, which ultimately improves the identification of
fraudulent and legitimate transactions. The results show that NN outperforms NB
and GAN in terms of accuracy, precision and f-measure.
Keywords: Credit Card Fraud Detection, Classification, Naive Bayes,
∗ Corresponding author
Email address: [email protected] (Ijaz Hussaina,∗ .)
1. Introduction
Over the last decade, e-commerce and online payments have grown remarkably [1].
Credit cards are now commonly used for online payments, which opens the door to
many types of fraud [2]. For all credit card issuers and online payment
management organizations, an effective fraud detection solution is therefore
very important, in order to simultaneously improve customer confidence and
reduce losses [3]. The purpose of a fraud detection system is to find suspicious
usage patterns in a set of transactions, where legitimate transactions are mixed
with illicit ones, by using data mining techniques and sophisticated analytics
[4]. This means analyzing large datasets and applying machine learning to
discriminate legitimate transactions from illicit ones. Machine learning is
extremely effective for credit card fraud detection, particularly supervised
classification, where a classifier is trained on pre-classified datasets of
labeled transactions to build a detection model. The resulting model can find
anomalous transactions among legitimate ones. The class imbalance problem
affects the supervised classification approach because very few illicit
transactions are available compared with legitimate ones [5]. Because of class
imbalance, the legitimate class is over-represented in binary classification.
The rare class, on the other hand, has so few examples that a learning algorithm
may treat the minority class as noise, ignore it and classify all records as
majority class instances [6]. Standard machine learning algorithms are poorly
suited to imbalanced datasets because they typically aim to maximize accuracy
[7]. In machine learning classification, a bias has been observed towards the
class with the majority of instances in the training data. The classification
error for the majority class in the training dataset is lower than the
classification error for classes with fewer instances. However, the prior
probabilities of the classes are not always reflected by the class frequencies
in the training data. In many practical binary classification systems, such as
network intrusion prevention, fraud detection or medical diagnosis support, the
False Positive Rate (FPR) and False Negative Rate (FNR) may carry very different
benefits or costs [8], and it might be necessary to control the trade-off
between these errors to a certain degree. In the Neyman-Pearson (NP) framework
[9], the goal is to minimize the FNR subject to the constraint that the FPR does
not exceed a maximum value α.
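The Neyman-Pearson constraint can be illustrated with a small sketch. It assumes a classifier that outputs a fraud score per transaction; the scores, the labels and the helper names `rates` and `np_threshold` are hypothetical and do not come from the paper. Among the candidate thresholds, the code keeps the one with the lowest empirical FNR whose empirical FPR stays at or below α.

```python
def rates(scores, labels, threshold):
    """Return (FPR, FNR) when scores >= threshold are flagged as fraud (label 1)."""
    fp = sum(1 for s, y in zip(scores, labels) if y == 0 and s >= threshold)
    fn = sum(1 for s, y in zip(scores, labels) if y == 1 and s < threshold)
    negatives = sum(1 for y in labels if y == 0)
    positives = sum(1 for y in labels if y == 1)
    return fp / negatives, fn / positives


def np_threshold(scores, labels, alpha):
    """Pick the candidate threshold with the lowest FNR subject to FPR <= alpha."""
    best = None
    # A threshold above every score flags nothing, so FPR = 0 is always feasible.
    candidates = sorted(set(scores)) + [max(scores) + 1.0]
    for t in candidates:
        fpr, fnr = rates(scores, labels, t)
        if fpr <= alpha and (best is None or fnr < best[2]):
            best = (t, fpr, fnr)
    return best


# Hypothetical scores and labels (1 = illicit, 0 = legitimate), for illustration only.
scores = [0.05, 0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.80, 0.90, 0.95]
labels = [0, 0, 0, 0, 1, 0, 0, 1, 1, 1]
threshold, fpr, fnr = np_threshold(scores, labels, alpha=0.2)
print(threshold, round(fpr, 4), fnr)
```

Lowering α pushes the chosen threshold up, which trades a lower FPR for a higher FNR.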
This situation occurs quite naturally when the minority class is the class of
interest, as in our case. The FPR and FNR increase because traditional
techniques such as Support Vector Machine (SVM), Artificial Immune System (AIS)
and many other machine learning techniques aim to maximize accuracy, treat the
minority class as noise and ignore its instances.
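Why accuracy alone is misleading under class imbalance can be made concrete with the figures of the dataset used later in this paper (284,807 transactions, 492 of them fraudulent). The sketch below is only a worked calculation for a hypothetical classifier that labels every transaction as legitimate; it is not one of the models evaluated here.

```python
total = 284_807        # transactions in the dataset used in this paper
fraud = 492            # fraudulent transactions
legit = total - fraud

# Hypothetical classifier that labels every transaction as legitimate:
tp, fn = 0, fraud      # every fraud is missed
tn, fp = legit, 0      # no legitimate transaction is flagged

accuracy = (tp + tn) / total
fnr = fn / (fn + tp)
print(f"accuracy={accuracy:.4f}  FNR={fnr:.2f}")
```

The accuracy is above 99.8% even though no fraudulent transaction is detected, which is why error rates per class are tracked in addition to accuracy.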
Our main objectives are to minimize the FPR and FNR and to improve the accuracy
of detecting fraudulent transactions. For this, we used Naive Bayes (NB),
Generative Adversarial Network (GAN) and Neural Networks (NN). GAN is
appropriate for credit card fraud detection because it generates fraudulent
transactions from noise. The generated instances are then combined with the
training set to reduce the class imbalance in the data, and a binary classifier
is trained on the resulting data. The main problem with GAN is stability,
meaning that the generator and the discriminator must be trained at a comparable
pace. If either one outperforms the other, the GAN will not be trained
correctly. The back propagation algorithm is used for training the NN. During
training, the predicted values are compared with the actual values to compute
the error. If the error is greater than a constant threshold, the error is back
propagated, the layer weights are re-assigned and the network is trained again.
By using back propagation in the NN, the accuracy increased and the FPR and FNR
decreased.
The next section of the paper is dedicated to related work. Section 3 describes
the techniques we used. Section 4 discusses the simulations we performed and
their results. Finally, Section 5 concludes the paper.
2. Related Work
Data mining based credit card fraud detection has gained serious attention from
researchers. The large volumes of data made available by many data warehouses
need to be carefully analyzed. The most promising and effective solutions for
credit card fraud detection are supervised approaches based on machine learning.
In [10], credit card fraud detection was proposed for the first time, using
several data-visualization methods with supervised learning. In [11], NB, the
K-Nearest Neighbor algorithm and SVM were applied to credit card fraud
detection. The authors also introduced a bagging ensemble classifier based on
decision trees, which gives more accurate results than traditional decision tree
algorithms. In [12], a model was proposed that addresses the credit card fraud
detection problem by combining the Simple K-Means and Principal Component
Analysis (PCA) algorithms. The geographical positions of the client and the
transaction are added to the traditionally studied data to augment the model.
The proposed model gives the best results for accuracy. However, the execution
time increases because the k-means process is repeated for different initial
clusters. In [13], a hybrid approach of danger theory and SVM was proposed for
credit card fraud detection. Danger theory removes fraudulent transactions and
SVM then classifies the transactions. The accuracy of credit card fraud
detection is improved by danger theory because it removes abnormalities from the
data. However, the execution time increases because this removal takes extra
time.

In [14], an NN was trained using an evolutionary simulated annealing algorithm
to detect credit card fraud in a real-time scenario. The proposed solution
performed well in terms of time and cost for both users and organizations.
However, with this solution many transactions are misclassified, i.e., a
fraudulent transaction is classified as genuine or vice versa. In [15], two
data-driven approaches based on optimal anomaly detection techniques were
proposed for fraud detection in a real-time scenario. The efficiency of the
approaches was checked on real data from European credit card holders. Both
approaches provide good results on real-time data in terms of accuracy and false
alarm rate, and benefit individual users and organizations in terms of time
efficiency and cost. However, when applied to large datasets, these approaches
did not give good results in terms of accuracy.

In [3], a personalized SVM-based system was proposed that protects credit cards
from fraud by using data collected in advance about consumer behavior. In [16],
an algorithm was proposed to successfully apply SVM to class-imbalanced data,
and the results were compared with different algorithms. In [17], models based
on decision trees and SVM were proposed for credit card fraud detection and
their results were compared. In [18], a mechanism was implemented using Neuroph
IDE, which uses an NN to detect credit card fraud. With this mechanism the
classification is very accurate and the errors stay within the maximum error
rate. However, the number of iterations is not limited in advance; the mechanism
trains itself for as many iterations as are required, so the execution time
grows with the number of iterations. In [19], a neural network was designed for
credit card fraud detection with the help of a Genetic Algorithm (GA). The GA is
used to find the best network topology, i.e., the number of nodes and hidden
layers used in the neural network.

In [20], AIS was implemented for credit card fraud detection; GA and exhaustive
search were used for parameter optimization, and the results were compared with
NN, NB and decision trees. In [21], an AIS based fraud detection model was
proposed using AIS and an immune system inspired algorithm. In [22], a case
based genetic artificial immune system for credit card fraud detection was
proposed, which can learn online with limited cost and time. In [7], the
minority class was over-sampled by duplicating minority class instances.
However, this strategy adds no new information to the dataset. "A data mining
with hybrid approach based Transaction Risk Score Generation Model (TRSGM) for
fraud detection of online financial transactions" was implemented in [23]. In
[24] and [25], decision tree classifiers based on modified C4.5 algorithms were
used for rule-based fraud detection systems. In [26], genetic algorithms were
used for real-time fraud detection while also minimizing false alerts. In [27],
GA and scatter search were used to develop a credit card fraud detection system
that minimizes misclassification cost instead of misclassification error.
Like many of the above techniques, we aim to re-balance the training set and
minimize the FPR and FNR. We use GAN and NN for this purpose. In GAN, the
example instances produced by the generator are injected into the training set
to balance the minority class, instead of over-sampling by duplication. In NN,
the predicted results are repeatedly compared with the actual labels to minimize
false predictions. Our techniques, especially GAN and NN, not only increase
accuracy like other machine learning techniques, but also minimize the FPR and
FNR.
[Figure: workflow of the system. Node labels: Dataset; Preprocessing;
Preprocessed Data; Training Set (70%); Testing Set (30%); Cluster Fraudulent
Transactions; Generative Adversarial Network; Generated Data; Combined Data;
XgBoost Classifier (two instances); Neural Network; Naïve Bayes; Predictions.]
process to divide text documents into different categories. NB classifiers are
very scalable: learning requires a number of parameters that is linear in the
number of variables or features, which makes NB well suited to high-dimensional
data. The method of maximum likelihood is used for parameter estimation in NB.
In spite of its oversimplified assumptions, NB often performs well in many
complex real-world situations. The input data is split into training and testing
sets. Then, the prior probabilities are calculated and the model is trained on
the basis of these prior probabilities to produce predictions. To test the
model, the test set is passed through the model and the predictions are compared
with the actual values to compute the accuracy.
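The procedure described above can be sketched in a few lines. The text does not say which NB variant was used, so the sketch assumes a Gaussian likelihood per feature; the function names and the toy data are hypothetical. Priors, means and variances are estimated by maximum likelihood, and the class with the highest log posterior is predicted.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(X, y):
    """Estimate class priors and per-feature mean/variance by maximum likelihood."""
    by_class = defaultdict(list)
    for row, label in zip(X, y):
        by_class[label].append(row)
    model = {}
    for label, rows in by_class.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(columns, means)]
        model[label] = (n / len(X), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one feature vector."""
    def log_posterior(prior, means, variances):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
        return math.log(prior) + log_likelihood
    return max(model, key=lambda label: log_posterior(*model[label]))


# Toy data (hypothetical): two features, label 1 = illicit, label 0 = legitimate.
X_train = [[1.0, 2.1], [0.9, 1.9], [1.1, 2.0], [1.2, 2.2], [5.0, 8.0], [5.2, 7.9]]
y_train = [0, 0, 0, 0, 1, 1]
model = fit_gaussian_nb(X_train, y_train)

X_test, y_test = [[1.0, 2.0], [5.1, 8.1]], [0, 1]
predictions = [predict(model, row) for row in X_test]
accuracy = sum(p == t for p, t in zip(predictions, y_test)) / len(y_test)
```

Working in log space avoids numerical underflow when many per-feature likelihoods are multiplied, which matters for high-dimensional data.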
distribution by mapping n to these distributions and generate new examples
that look like the real data instances. The adversarial discriminator network,
on the other hand, has to correctly differentiate between the generated
artificial examples and the real data instances, in order to defeat the
generator's production of artificial candidates.
The generator's training goal is to trick the discriminator into believing that
the examples produced by the generator are real. The discriminator is trained to
minimize its prediction error, while the generator is trained to maximize the
prediction error of the discriminator. This amounts to a competition between the
generator and the discriminator and can be formalized as a minimax game:
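In the standard GAN formulation, assuming G denotes the generator, D the discriminator, x a real instance and n the input noise, the minimax game is usually written as follows. This is the common textbook form and may differ from the authors' own notation.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{n \sim p_{n}(n)}\big[\log\big(1 - D(G(n))\big)\big]
```

The first term rewards D for assigning high probability to real instances, and the second rewards D for rejecting generated ones; G is trained to make the second term small.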
set.
2. Take all the illicit transactions from the training set T and put them in
another set denoted by F.
3. Train the GAN on the set F, tuning its hyper-parameters.
4. Generate synthetic examples F' from random noise n by using the trained
generator G* of the GAN.
5. Merge the training set with F' and compare the performance of C trained on
the augmented set with that of C trained on the original training set.
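The data flow of steps 2 to 5 can be sketched as below. The sketch does not train a GAN: `gaussian_stand_in` is a hypothetical placeholder that samples from a Gaussian fitted to F, used only so that the example runs, and all function names and the toy data are assumptions.

```python
import random


def augment_with_synthetic(train_X, train_y, make_generator, n_synthetic):
    """Isolate the illicit rows F, fit a generator on F and add its samples."""
    fraud = [row for row, label in zip(train_X, train_y) if label == 1]   # set F
    sample = make_generator(fraud)          # stands in for the trained G*
    synthetic = [sample() for _ in range(n_synthetic)]                    # set F'
    return train_X + synthetic, train_y + [1] * n_synthetic


def gaussian_stand_in(fraud_rows, seed=0):
    """Placeholder generator: draws each feature from a Gaussian fitted to F.
    A trained GAN generator would replace this function."""
    rng = random.Random(seed)
    columns = list(zip(*fraud_rows))
    means = [sum(c) / len(c) for c in columns]
    stds = [(sum((v - m) ** 2 for v in c) / len(c)) ** 0.5
            for c, m in zip(columns, means)]
    return lambda: [rng.gauss(m, s) for m, s in zip(means, stds)]


# Toy training set (hypothetical): label 1 = illicit, label 0 = legitimate.
train_X = [[0.1, 0.2], [0.0, 0.3], [0.2, 0.1], [0.1, 0.1], [3.0, 4.0], [3.2, 3.8]]
train_y = [0, 0, 0, 0, 1, 1]
aug_X, aug_y = augment_with_synthetic(train_X, train_y, gaussian_stand_in, n_synthetic=2)
```

A classifier C would then be trained once on `(train_X, train_y)` and once on `(aug_X, aug_y)`, and both would be evaluated on the same untouched test set.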
the connections between hidden and input units. The behavior of the output unit
depends on the activity of the hidden units and the weights between the output
and hidden units. This simple network is interesting because the hidden units
are free to construct their own representation of the input. The weights between
the hidden and input units determine when each hidden unit is active, so by
modifying these weights a hidden unit can choose what it represents. The first
layer's dimensions are initialized randomly, the weights between layers are
initialized to zero and the data is fed in. For training the NN we use the back
propagation algorithm. During training, the produced results are repeatedly
compared with the actual results and the Mean Squared Error (MSE) is calculated.
If the MSE is less than a constant threshold, the network is returned; otherwise
the error is back propagated to re-assign the weights and the network is trained
again.
initialize weights
repeat
    for each example in training examples do
        prediction = neural-network-output(network, example)
        actual = actual-output(example)
        compute ∆(w_h) from hidden layer to output layer
        compute ∆(w_i) from input layer to hidden layer
        update network weights
until stopping criterion is satisfied
return network
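A runnable version of the loop above might look like the following sketch. It assumes a single hidden layer, sigmoid units, per-example weight updates and an MSE threshold as the stopping criterion; the learning rate, layer size and toy data are hypothetical. The weights start at small random values here, which is an assumption of the sketch, since identical starting weights would keep all hidden units identical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(network, x):
    """Return the hidden activations and the single output for input vector x."""
    w_hidden, w_out = network
    hidden = [sigmoid(sum(w * v for w, v in zip(ws, x + [1.0]))) for ws in w_hidden]
    output = sigmoid(sum(w * h for w, h in zip(w_out, hidden + [1.0])))
    return hidden, output


def mse(network, data):
    return sum((forward(network, x)[1] - y) ** 2 for x, y in data) / len(data)


def train(data, n_hidden=3, lr=0.5, threshold=0.01, max_epochs=5000, seed=0):
    rng = random.Random(seed)
    n_in = len(data[0][0])
    # The last weight of every unit acts as its bias.
    w_hidden = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    w_out = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    network = (w_hidden, w_out)
    for _ in range(max_epochs):
        if mse(network, data) < threshold:            # stopping criterion
            break
        for x, actual in data:
            hidden, prediction = forward(network, x)
            # Error signal at the output unit (squared error through the sigmoid).
            delta_out = (prediction - actual) * prediction * (1 - prediction)
            # Error signals propagated back to the hidden units.
            delta_hidden = [delta_out * w_out[j] * h * (1 - h)
                            for j, h in enumerate(hidden)]
            for j, h in enumerate(hidden + [1.0]):    # hidden -> output weights
                w_out[j] -= lr * delta_out * h
            for j, ws in enumerate(w_hidden):         # input -> hidden weights
                for i, v in enumerate(x + [1.0]):
                    ws[i] -= lr * delta_hidden[j] * v
    return network


# Toy data (hypothetical): the logical OR of two inputs.
data = [([0.0, 0.0], 0.0), ([0.0, 1.0], 1.0), ([1.0, 0.0], 1.0), ([1.0, 1.0], 1.0)]
network = train(data)
final_error = mse(network, data)
```

The `max_epochs` cap bounds the training time even if the MSE threshold is never reached.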
Credit card fraud datasets are difficult to obtain because banks do not share
their data publicly. We performed experiments on a publicly available credit
card dataset [29], which contains 284,807 transactions made by European card
holders over two days in September 2013. The dataset contains 492 fraudulent
transactions, i.e., 0.172% of all transactions. Time is the number of seconds
elapsed between the transaction and the first transaction in the dataset, Amount
is the amount of the transaction and Class is the label of the transaction,
indicating whether it is legitimate or illicit. The numerical features V1 to V28
are principal components, obtained by applying principal component analysis to
the original attributes because of a privacy request from the releasing
institution. We further divided the dataset into training and test sets with a
split ratio of 0.7, i.e., 70% of the data for training and 30% for testing.
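The split can be sketched as follows. The text does not state whether the split was random or stratified, so the sketch assumes a plain shuffled split; the function name, the seed and the stand-in rows (class labels only) are hypothetical.

```python
import random


def train_test_split(rows, train_ratio=0.7, seed=42):
    """Shuffle a copy of rows and cut it at train_ratio."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Stand-in rows: only the class label is kept for this illustration.
total, fraud = 284_807, 492
rows = [1] * fraud + [0] * (total - fraud)
train, test = train_test_split(rows)

fraud_share = fraud / total                  # roughly 0.0017 of all rows
train_fraud, test_fraud = sum(train), sum(test)
```

With so few positive rows, a plain random split can leave the two parts with slightly different fraud shares; a stratified split is a common alternative when that matters.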
Experiments are performed in Spyder using Python 3.6. For NB, we loaded the data
and split it into training and testing sets with a split ratio of 0.7.
Afterwards, we calculated the prior probabilities of the classes, assigned class
labels to new instances on the basis of these prior probabilities and then
calculated the accuracy.
For GAN, we loaded the dataset of credit card transactions, removed duplicates,
performed data exploration and applied an XGBoost classifier to the dataset.
After that, we isolated all fraudulent transactions from the dataset and trained
the generator on that fraudulent set. We set the number of generation steps to
values between 1 and 2000 and generated example instances through the generator.
We then combined the generated examples with the training set and trained
XGBoost on the augmented set. Finally, we compared the performance of the
classifiers trained on the original training set and on the augmented set.
\[ \sigma(z) = \frac{1}{1 + e^{-z}} \tag{2} \]
For NN, we split the loaded data into training and testing sets with a split
ratio of 0.7. We modeled an NN consisting of 5 layers, 3 of which are hidden
layers. The sigmoid function is used to introduce non-linearity into the model:
each NN element computes a linear combination of its input signals and applies
the sigmoid function to the result [30]. The backward propagation algorithm is
used for supervised learning of the NN, with forward propagation used to find
the error of the model. Finally, we compute the accuracy of the model.
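The computation performed by a single NN element, as described above, can be written out directly. The weights, biases and inputs below are hypothetical values chosen so that the linear combination is zero.

```python
import math


def sigmoid(z):
    """Equation (2): maps any real z into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def neuron(inputs, weights, bias):
    """Linear combination of the input signals followed by the sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(z)


def layer(inputs, weight_rows, biases):
    """Apply one neuron per weight row to the same inputs."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]


activation = neuron([1.0, 2.0], weights=[0.5, -0.25], bias=0.0)        # z = 0
hidden = layer([1.0, 2.0], [[0.5, -0.25], [1.0, 1.0]], [0.0, -3.0])    # z = 0 twice
```

Stacking several such layers, with the output of one layer fed as the input of the next, gives the forward propagation pass of a multi-layer network.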
Table 1: Precision and recall of GAN, NB and NN with respect to the number of generations Ng

Ng      Precision                     Recall
        GAN      NB       NN          GAN      NB       NN
1       0.9492   0.6900   0.9878      0.9878   0.9739   0.9876
81      0.9493   0.6920   0.9865      0.9878   0.9739   0.9867
161     0.9493   0.6910   0.9874      0.9883   0.9746   0.9870
301     0.9644   0.7010   0.9876      0.9918   0.9748   0.9871
651     0.9717   0.7030   0.9878      0.9796   0.9747   0.9874
1000    0.9959   0.7100   0.9897      0.9918   0.9746   0.9876
2000    0.9457   0.7130   0.9899      0.9920   0.9742   0.9878
Simulations are performed in Python and the results are presented in Table 1 and
Table 2.
Table 2: Accuracy and f-measure of GAN, NB and NN with respect to the number of generations Ng

Ng      Accuracy                      F-measure
        GAN      NB       NN          GAN      NB       NN
1       0.9710   0.9739   0.9867      0.9681   0.9867   0.9882
81      0.9700   0.9742   0.9982      0.9681   0.9867   0.9884
161     0.9720   0.9744   0.9984      0.9682   0.9869   0.9886
301     0.9810   0.9740   0.9987      0.9779   0.9870   0.9889
651     0.9820   0.9736   0.9979      0.9717   0.9872   0.9887
1000    0.9900   0.9737   0.9980      0.9939   0.9871   0.9888
2000    0.9730   0.9734   0.9985      0.9682   0.9870   0.9884
We implement the credit card fraud detection system not only to improve the
accuracy but also to minimize the FPR and FNR in classification.
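For reference, all of the measures discussed here derive from the four confusion-matrix counts, with fraud taken as the positive class. The sketch below uses the usual definitions; the function name and the counts are hypothetical and are not results from this paper.

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the reported measures from confusion-matrix counts (fraud = positive)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return {
        "accuracy": (tp + tn) / (tp + fp + tn + fn),
        "precision": precision,
        "recall": recall,
        "f_measure": 2 * precision * recall / (precision + recall),
        "fpr": fp / (fp + tn),    # legitimate transactions flagged as fraud
        "fnr": fn / (fn + tp),    # frauds that are missed; equals 1 - recall
    }


# Hypothetical counts for illustration only.
m = classification_metrics(tp=80, fp=20, tn=880, fn=20)
```

Because FNR equals 1 - recall, a higher recall directly means a lower FNR.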
[Figure: Precision of GAN, NB and NN versus number of steps (1, 81, 161, 301, 651, 1000, 2000).]
[Figure: Accuracy of GAN, NB and NN versus number of steps (1, 81, 161, 301, 651, 1000, 2000).]
NN outperforms NB and GAN because the back propagation algorithm is used for
training the NN. In back propagation training, the predictions are compared with
the actual data labels after every iteration.
[Figure 4: Recall of GAN, NB and NN versus number of steps (1, 81, 161, 301, 651, 1000, 2000).]
GAN accuracy is highest at step 1000 because that is the point where the
generator and discriminator are equally trained. The fake examples produced by
the generator look real, and the discriminator also performs well at
distinguishing between real and fake examples. That is why the accuracy
increases. Figure 4 shows the recall of NN, NB and GAN. The recall of GAN is
better than that of NN and NB, which means that the FNR of GAN is lower than
that of NN and NB. The recall of GAN at step 651 is low because the generator is
better trained than the discriminator and fools the discriminator into assigning
real labels to the generated examples, which increases the FNR and decreases the
recall.
[Figure: F-measure of GAN, NB and NN versus number of steps (1, 81, 161, 301, 651, 1000, 2000).]
5. Conclusion
In this paper, we proposed a credit card fraud detection system using different
techniques, namely NB, GAN and NN, to deal with the class imbalance problem.
Because of class imbalance, machine learning algorithms ignore the minority
class and treat it as noise, which affects the accuracy, FPR and FNR. We used
GAN to over-sample the minority class by generating example instances from noise
and combining these examples with the training set to train a binary classifier.
The performance of the classifier improved in terms of accuracy, FPR and FNR. In
NN, the back propagation algorithm is used to train the network; it repeatedly
compares the predictions with the actual labels. By using back propagation for
training the NN, the FPR and FNR are reduced and the accuracy is increased. The
overall performance of NN is better than that of NB and GAN. We used these
techniques for credit card fraud detection, and in future we plan to apply them
in other domains where the class of interest is the minority class.
References
[3] R.-C. Chen, M.-L. Chiu, Y.-L. Huang, L.-T. Chen, Detecting Credit Card
Fraud by Using Questionnaire-Responded Transaction Model Based on
Support Vector Machines, Intelligent Data Engineering and Automated
Learning (2004) 800–806.
[10] B. G. Becker, Using mineSet for knowledge discovery, IEEE Computer
Graphics and Applications 17 (4) (1997) 75–78.
[17] Y. Sahin, E. Duman, Detecting Credit Card Fraud by Decision Trees and
Support Vector Machines, International Multiconference of Engineers and
Computer Scientists, 2011, pp. 442–447.
[19] R. Oberoi, Improving a credit card fraud detection system using genetic
algorithm, International Journal of Computer and Mathematical Sciences
6 (6) (2017) 436–440.
[21] N. Soltani Halvaiee, M. K. Akbari, A novel model for credit card fraud de-
tection using Artificial Immune Systems, Applied Soft Computing Journal
24 (2014) 40–49.
[22] J. Tuo, S. Ren, W. Liu, X. Li, B. Li, L. Lei, Artificial immune system for
fraud detection, Proceedings of IEEE International Conference on Systems,
Man and Cybernetics, 2004, pp. 1407–1411.
[23] A. R. Jyotindra, N.D., A data mining with hybrid approach based
Transaction Risk Score Generation Model (TRSGM) for fraud detection of online
financial transaction, Int. J. Comput. Appl. 16 18–25.
[28] S. Bhattacharyya, S. Jha, K. Tharakunnel, J. C. Westland, Data mining for
credit card fraud: A comparative study, Decision Support Systems 50 (3)
(2011) 602–613.