mlproject


Abstract

Online payments are attractive because they let people pay with ease from
anywhere in the world, and their use has grown over the past few decades.
E-payments benefit consumers and also help businesses earn more revenue.
However, because electronic payments are so simple, there is also a risk of
fraud associated with them. Consumers must ensure that each payment they make
goes only to the intended service provider. Online fraud exposes users to the
possibility of their data being compromised, as well as the inconvenience of
having to report the fraud, block their payment method, and so on.
Fraud also causes problems for businesses; occasionally, they must issue
refunds in order to keep their customers.
Introduction
Online payments have become more popular during the last few decades. This is
partly because it is so simple to send money from anywhere, and the pandemic
has also contributed significantly to the rise in e-payments. Numerous studies
project that e-commerce and online payments will continue to grow in popularity
in the years to come. The risk of online payment fraud has increased along with
this rise in online payments, making it crucial for consumers and service
providers to be aware of these frauds. Users need to be certain that the
payments they make are going to the legitimate recipients; otherwise, they may
have to report fraud, freeze their payment method, and risk having their data
shared with criminals, which can occasionally result in further crimes.

On the other hand, it is crucial for companies to check that their customers
are not giving money to these fraudsters. Businesses may have to repay money to
clients in order to keep their patronage, which puts a strain on them. Even
though firms have created and introduced numerous fraud detection programs,
only a small number of them are effective in identifying online payment fraud.
Although companies make every effort to make the payment method as secure
as possible, fraudsters occasionally manage to circumvent security measures
and commit these online payment scams. According to Zanin et al. (2018), the
cumulative losses from fraudulent bank card transactions increased globally
from 2014 to 2017. Other studies, such as Kalbande et al. (2021), concentrate
on concept drift, which refers to the possibility that the data set's
underlying distribution changes over time. Just as cardholders or users may
alter their purchasing patterns over time, fraudsters may modify their tactics.
Fraudsters closely track customers' payment methods and behavior, but their
tactics can also become outdated as professionals work around the clock to
uncover these scams and shield people from them.
How Machine Learning Works in Fraud Detection
Machine learning leverages transaction data to distinguish between legitimate
and fraudulent activities. The process involves the following steps:
1. Data Collection:
o Transactions include details like payment amount, user location,
device information, and timestamps.
o Data is labeled as either "fraudulent" or "legitimate" for supervised
learning or left unlabeled for unsupervised learning.
2. Data Preprocessing:
o Handling Missing Data: Ensures consistency in features such as
location or device type.
o Feature Encoding: Converts categorical data like payment
methods into numerical values.
o Scaling: Normalizes transaction values for algorithms sensitive to
scale.
3. Training the Model:
o ML algorithms learn patterns from historical data, such as repeated
transactions from new locations or unusual spending habits.
o Common techniques include decision trees, random forests, and
deep learning models.
4. Model Prediction:
o In real-time systems, the trained model evaluates incoming
transactions and flags suspicious ones for further review.
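The preprocessing steps listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the pipeline used in this work: the field names "amount" and "method", the sample records, and the helper preprocess are all hypothetical.

```python
from statistics import median

def preprocess(transactions):
    """Impute missing amounts, one-hot encode the payment method, scale amounts."""
    # Handling missing data: replace a missing amount with the median known amount.
    known = [t["amount"] for t in transactions if t["amount"] is not None]
    fill = median(known)
    amounts = [t["amount"] if t["amount"] is not None else fill for t in transactions]

    # Feature encoding: one indicator column per payment method, in sorted order.
    methods = sorted({t["method"] for t in transactions})

    # Scaling: min-max normalize amounts to [0, 1]; guard against a zero range.
    lo, hi = min(amounts), max(amounts)
    span = (hi - lo) or 1.0

    rows = []
    for t, amount in zip(transactions, amounts):
        one_hot = [1.0 if t["method"] == m else 0.0 for m in methods]
        rows.append([(amount - lo) / span] + one_hot)
    return rows, methods

# Hypothetical records, for illustration only.
sample = [
    {"amount": 20.0, "method": "card"},
    {"amount": None, "method": "wallet"},
    {"amount": 100.0, "method": "card"},
    {"amount": 60.0, "method": "transfer"},
]
features, columns = preprocess(sample)
```

Each output row holds the scaled amount followed by one indicator per payment method. Median imputation and min-max scaling are only one reasonable choice; other imputation or scaling strategies may suit a given data set better.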
Machine Learning Techniques for Fraud Detection
1. Supervised Learning:
o Uses labeled datasets to classify transactions.
o Popular algorithms include logistic regression, decision trees, and
gradient boosting.
o Example: Detecting fraudulent credit card transactions based on
past patterns.
2. Unsupervised Learning:
o Ideal for detecting unknown fraud types where no labeled data is
available.
o Algorithms like clustering (e.g., K-Means) and autoencoders
identify anomalies in transaction behavior.
o Example: Flagging a sudden spike in high-value transactions from
a single account.
3. Hybrid Approaches:
o Combine supervised and unsupervised learning to enhance fraud
detection.
o Example: Clustering is used to group similar transactions, followed
by classification models to label these clusters.
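As a rough illustration of the unsupervised case, the sketch below runs a two-cluster K-Means on the transaction amounts of a single account and treats the smaller cluster as suspicious. The amounts, the helper kmeans_1d, and the rule "the minority cluster is anomalous" are assumptions made for this example; a real system would cluster on many features at once.

```python
def kmeans_1d(values, iterations=20):
    """Two-cluster K-Means on a list of numbers; returns (centroids, labels)."""
    # Deterministic initialization: start the two centroids at the extremes.
    centroids = [min(values), max(values)]
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        labels = [
            0 if abs(v - centroids[0]) <= abs(v - centroids[1]) else 1
            for v in values
        ]
        # Update step: move each centroid to the mean of its members.
        for k in (0, 1):
            members = [v for v, lab in zip(values, labels) if lab == k]
            if members:
                centroids[k] = sum(members) / len(members)
    return centroids, labels

# Hypothetical amounts for one account: mostly small, with two large spikes.
amounts = [12.0, 15.0, 9.0, 14.0, 11.0, 950.0, 13.0, 990.0]
centroids, labels = kmeans_1d(amounts)

# Assumption: fraud is rare, so the smaller cluster is the one to review.
minority = min((0, 1), key=labels.count)
flagged = [a for a, lab in zip(amounts, labels) if lab == minority]
```

In a hybrid setup, the cluster label produced here could be passed as an extra feature to a supervised classifier.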
Methodology

Data analysis is crucial for finding fraudulent online payment transactions.
Banks and other financial institutions can adapt their defences against these
frauds with the aid of machine learning techniques. Many businesses and
organizations are investing heavily in the development of machine learning
systems that determine whether a specific transaction is fraudulent. These
techniques help organizations identify fraud and protect clients who may be at
risk of such fraud and who occasionally sustain losses as a result. The data
set used in this research came from the open platform Kaggle. Due to privacy
concerns, it is challenging to obtain real-world transaction data; therefore, a
data set large enough to conduct the research was chosen. The data set has
1048576 records and 11 columns. It includes attributes such as "type" (type of
payment), "amount", "nameOrig" (customer initiating the transaction),
"oldbalanceOrg" (balance before the transaction), "newbalanceOrig" (balance
after the transaction), "nameDest" (recipient of the transaction),
"oldbalanceDest" (recipient's balance before the transaction),
"newbalanceDest" (recipient's balance after the transaction), and "isFraud"
(0 if the transaction is legitimate and 1 if the transaction is fraudulent).
Figure 2 shows all the features in the dataset. Whether a particular
transaction is fraudulent depends strongly on the type of the transaction.
Figure 2: Dataset
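One way to examine the relationship between transaction type and fraud is to compute the share of fraudulent rows per type. The sketch below uses the "type", "amount", and "isFraud" columns described above. The rows are made up for illustration and are not taken from the real data set, and the helper fraud_rate_by_type is hypothetical.

```python
import csv
import io
from collections import defaultdict

# Made-up rows for illustration only; they are not taken from the real data set.
SAMPLE_CSV = """type,amount,isFraud
PAYMENT,120.50,0
TRANSFER,9000.00,1
CASH_OUT,400.00,0
TRANSFER,50.00,0
CASH_OUT,7000.00,1
PAYMENT,35.00,0
"""

def fraud_rate_by_type(csv_text):
    """Return {transaction type: share of rows with isFraud == 1}."""
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in csv.DictReader(io.StringIO(csv_text)):
        totals[row["type"]] += 1
        frauds[row["type"]] += int(row["isFraud"])
    return {t: frauds[t] / totals[t] for t in totals}

rates = fraud_rate_by_type(SAMPLE_CSV)
```

To run the same function on the real file, the file's contents could be read from disk and passed in as csv_text.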
Data Preparation:
For a machine learning model to give accurate, high-quality results, the data
used to train and test it should be well prepared. Data preparation is one of
the most important steps in data mining. There are many things to consider,
such as handling missing data and duplicate values, removing redundant
features using a correlation matrix and feature selection methods, and dealing
with imbalanced data. The quality of the data preparation has a large effect on
how well machine learning works. If the data is not prepared well, the models
may take a long time to run and cost a lot of money. For all of these reasons,
data preparation is the most difficult and time-consuming part of the data
mining process.
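Two of the preparation steps mentioned above, removing duplicate rows and checking features for redundancy by correlation, can be sketched as follows. The rows, the three-column layout, and the 0.95 threshold are illustrative assumptions, not values from this work.

```python
from math import sqrt

def drop_duplicates(rows):
    """Remove exact duplicate rows while keeping first-seen order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)

# Hypothetical rows: (amount, balance before, balance after).
rows = [
    (100.0, 500.0, 400.0),
    (100.0, 500.0, 400.0),  # exact duplicate of the first row
    (50.0, 300.0, 250.0),
    (200.0, 900.0, 700.0),
    (10.0, 40.0, 30.0),
]
clean = drop_duplicates(rows)
balance_before = [r[1] for r in clean]
balance_after = [r[2] for r in clean]
r = pearson(balance_before, balance_after)

# The 0.95 cut-off is an illustrative assumption, not a value from this work.
redundant = abs(r) > 0.95
```

When two features are this strongly correlated, one of them is a candidate for removal, although the final decision should also rest on model performance.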
Feature Selection
Feature selection is one of the approaches that can help models perform even
better after data cleansing and feature correlation analysis. It eliminates
unnecessary variables, which leads to a smaller feature space and can improve
the performance of the model. In our dataset, two features, "nameDest" and
"nameOrig", were less significant than the others. To measure their effect, we
run the models first without these two features and then with them included.
Handling Class Imbalance
One of the key issues in fraud detection is class imbalance, which was
mentioned above. Machine learning algorithms are designed to work optimally
when trained on sufficient examples from both classes. Because fraudulent
transactions are rare within the overall data, models are prone to skewed
results and overfitting, which can lead to misclassification of the
under-represented class. A number of sampling approaches can help with this
problem, each with its own benefits and drawbacks.
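Random undersampling is one such sampling approach. The sketch below shows it only as an example and does not imply that it is the method used in this work. The helper undersample, the seed, and the synthetic rows are assumptions.

```python
import random

def undersample(rows, label_index, seed=0):
    """Randomly drop majority-class rows until both classes are the same size."""
    rng = random.Random(seed)
    fraud = [r for r in rows if r[label_index] == 1]
    legit = [r for r in rows if r[label_index] == 0]
    minority, majority = sorted((fraud, legit), key=len)
    # Keep a random subset of the majority class, as large as the minority class.
    kept = rng.sample(majority, len(minority))
    balanced = minority + kept
    rng.shuffle(balanced)
    return balanced

# Synthetic data: 95 legitimate and 5 fraudulent rows, as (amount, isFraud).
rows = [(float(i), 0) for i in range(95)] + [(1000.0 + i, 1) for i in range(5)]
balanced = undersample(rows, label_index=1)
```

Undersampling discards legitimate transactions, so it trades information for balance. Oversampling the minority class is the usual alternative, at the cost of a higher risk of overfitting to repeated fraud examples.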
Modelling Approach
Modelling is a very important aspect of machine learning. After the final data
preparation, which includes steps such as handling the class imbalance and
feature selection, the proposed models are applied to the prepared data. This
section explains the proposed models and how they work.
Logistic Regression
Logistic regression is a classification algorithm that assigns an observation
to one of a set of categorical values. It uses multiple independent variables
to predict the outcome of a dependent variable. Logistic regression is similar
to linear regression, except that it predicts a categorical target rather than
a numeric one Zanin et al. (2018). Examples include predicting true or false,
or successful or unsuccessful; in our case, the outcome is fraudulent or
non-fraudulent. The figure below explains the logistic regression.
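To make the idea concrete, here is a minimal from-scratch sketch of binary logistic regression trained by batch gradient descent on the log loss. The two features (a scaled amount and a transfer indicator), the six training rows, the learning rate, and the epoch count are assumptions chosen for illustration. In practice a library implementation would normally be used.

```python
import math

def sigmoid(z):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def score(w, b, x):
    """Linear part of the model: weighted sum of the features plus the bias."""
    return sum(wj * xj for wj, xj in zip(w, x)) + b

def train_logistic(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by batch gradient descent on the average log loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(score(w, b, xi)) - yi  # predicted probability minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    """Return 1 (fraudulent) when the predicted probability reaches the threshold."""
    return 1 if sigmoid(score(w, b, x)) >= threshold else 0

# Hypothetical features: [scaled amount, 1.0 if the payment is a transfer else 0.0].
X = [[0.10, 0.0], [0.20, 0.0], [0.15, 1.0], [0.90, 1.0], [0.80, 1.0], [0.95, 0.0]]
y = [0, 0, 0, 1, 1, 1]
w, b = train_logistic(X, y)
predictions = [predict(w, b, x) for x in X]
```

The 0.5 threshold is a default. In fraud detection it is often lowered so that fewer fraudulent transactions are missed, at the cost of more false alarms.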

Random Forest Classifier


The random forest model is made up of many decision trees whose predictions
are combined to solve classification problems. It uses methods such as bagging
and feature randomization to build each tree, which produces a forest of
largely uncorrelated trees. Every tree in the forest is trained on a bootstrap
sample of the training data, and the number of trees in the forest has a direct
impact on the results Bahnsen et al. (2016).
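The two ingredients named above, bagging and feature randomization, can be illustrated with a toy ensemble. As a deliberate simplification, each "tree" here is a one-level decision stump rather than a full decision tree, so this is not a complete random forest. The data, the helper names, and the tree count are assumptions.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature):
    """Best single-threshold rule on one feature, chosen by training error."""
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        for above in (0, 1):  # label predicted when the value exceeds the threshold
            preds = [above if r[feature] > threshold else 1 - above for r in rows]
            errors = sum(p != y for p, y in zip(preds, labels))
            if best is None or errors < best[0]:
                best = (errors, threshold, above)
    _, threshold, above = best
    return feature, threshold, above

def fit_forest(rows, labels, n_trees=25, seed=0):
    """Train n_trees stumps, each on its own bootstrap sample and random feature."""
    rng = random.Random(seed)
    n, d = len(rows), len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bagging: draw a bootstrap sample (with replacement) of the training rows.
        idx = [rng.randrange(n) for _ in range(n)]
        # Feature randomization: each stump only sees one randomly chosen feature.
        feature = rng.randrange(d)
        sample_rows = [rows[i] for i in idx]
        sample_labels = [labels[i] for i in idx]
        forest.append(fit_stump(sample_rows, sample_labels, feature))
    return forest

def predict_forest(forest, row):
    """Majority vote over all stumps."""
    votes = Counter(above if row[f] > t else 1 - above for f, t, above in forest)
    return votes.most_common(1)[0][0]

# Hypothetical, well-separated data: two numeric features per transaction.
rows = [(1.0, 2.0), (2.0, 1.0), (3.0, 3.5), (4.0, 2.5), (1.5, 4.0),
        (10.0, 11.0), (11.0, 13.0), (12.0, 10.0), (13.0, 12.0), (10.5, 11.5)]
labels = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
```

A call such as predict_forest(forest, (20.0, 20.0)) returns the majority vote of the stumps. Individual stumps can be wrong, for example when a bootstrap sample happens to contain only one class, and the vote is what makes the ensemble more robust than any single stump.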
