EDA Assignment Summary PDF
By: Shikha
The given data has information on loan applications. The data given below
contains information about past loan applicants and whether they
‘defaulted’ or not.
The main objective of the analysis was to determine the conditions and situations that
lead to an applicant defaulting or being charged off.
For this task, we need to predict the “loan_status” column of the dataset, which specifies
the status of the loan.
The dataset initially had 111 columns and 39,716 entries.
The lender is a consumer finance company and the largest online loan marketplace, facilitating
personal loans, business loans, and financing of medical procedures.
Two risks are associated with the bank's decision to approve a loan:
1. If the applicant is likely to repay the loan, then not approving the loan results in a loss
of business to the company.
2. If the applicant is not likely to repay the loan, then approving the loan may lead to a
financial loss for the company.
This task will be done by using univariate and bivariate analysis of different
columns of the dataset.
So, naturally, the first step is to reduce the 111 columns to a manageable number.
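
The criteria used to cut down the columns are not stated here, so the following is only a sketch of one common first pass: drop columns that are mostly missing or that hold a single distinct value. The function name, the 50% missing-value threshold, and the toy columns `all_missing` and `constant` are assumptions made for illustration.

```python
import pandas as pd


def drop_uninformative_columns(df: pd.DataFrame, max_missing: float = 0.5) -> pd.DataFrame:
    """Drop columns that are mostly missing or hold a single distinct value.

    The 0.5 threshold is an assumed default, not a value taken from the analysis.
    """
    # Share of missing values in each column.
    missing_share = df.isna().mean()
    kept = df[missing_share[missing_share <= max_missing].index]
    # A column with one distinct non-null value cannot explain the target.
    n_unique = kept.nunique(dropna=True)
    return kept.loc[:, n_unique > 1]


# Toy data standing in for the real loan file (hypothetical values).
toy = pd.DataFrame({
    "loan_status": ["Fully Paid", "Charged Off", "Fully Paid", "Fully Paid"],
    "grade": ["A", "G", "B", "A"],
    "all_missing": [None, None, None, None],
    "constant": ["INDIVIDUAL"] * 4,
})

reduced = drop_uninformative_columns(toy)
print(list(reduced.columns))
```

On the real data, the remaining columns would still need a manual review, since some columns are uninformative for reasons a rule like this cannot detect.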
Now, of the 28 columns, we need to find the ones which affect
the target variable ‘loan_status’. We’ll do this by comparing it
with other columns and by analyzing each of these columns
on their own.
On analyzing our ‘loan_status’ column, we find that the overall
default rate is about 14%.
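
As a rough sketch of how an overall default rate can be computed with pandas: the status labels "Fully Paid", "Charged Off" and "Current" are assumptions about the values in ‘loan_status’, and excluding loans that are still running is an assumed choice rather than something stated in this summary. The toy values are made up and do not reproduce the figure above.

```python
import pandas as pd


def overall_default_rate(status: pd.Series,
                         bad_label: str = "Charged Off",
                         open_label: str = "Current") -> float:
    """Share of closed loans that ended in default.

    Both labels are assumed values of the loan_status column.
    """
    # Loans still being repaid have no final outcome yet, so leave them out.
    closed = status[status != open_label]
    return float((closed == bad_label).mean())


# Toy data (hypothetical): 3 repaid, 1 charged off, 1 still running.
toy_status = pd.Series(
    ["Fully Paid", "Fully Paid", "Fully Paid", "Charged Off", "Current"]
)
rate = overall_default_rate(toy_status)
print(f"{rate:.1%}")
```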
To start things off, let’s look at all the categorical columns
first.
We will plot them against our target variable ‘loan_status’.
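
One way to prepare such a comparison, sketched here under assumptions, is to compute the share of charged-off loans within each category and then plot that series. The helper name `default_rate_by`, the "Charged Off" label, and the toy rows are all illustrative; only the column names `loan_status` and the grade letters come from this summary.

```python
import pandas as pd


def default_rate_by(df: pd.DataFrame, column: str,
                    target: str = "loan_status",
                    bad_label: str = "Charged Off") -> pd.Series:
    """Default rate within each category of `column`."""
    # True where the loan ended in default; the mean of booleans is a rate.
    is_default = df[target].eq(bad_label)
    return is_default.groupby(df[column]).mean().sort_index()


# Toy data (hypothetical values, not results from the real dataset).
toy = pd.DataFrame({
    "grade": ["A", "A", "A", "A", "G", "G"],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid",
                    "Charged Off", "Charged Off", "Fully Paid"],
})

rates = default_rate_by(toy, "grade")
print(rates)
```

With matplotlib installed, `rates.plot(kind="bar")` would draw the result as a bar chart. The same helper can be reused for other categorical columns by changing the `column` argument.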
It can be clearly seen that the risk of default increases as we go
from grade A to G, which is expected given the LC
guidelines for assigning grades.
From this it can be observed
that 60-month loans tend to
default more often than
36-month loans.
Plotting the purpose of loans
shows that small business, debt
consolidation, educational and
renewable energy loans default
more often than loans in other categories.
After analyzing categorical
variables, let’s now move on to
continuous variables. We will bin
these variables into different
categories to plot them better.
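
A minimal sketch of this binning step, assuming pandas: `pd.cut` turns a continuous column into labelled ranges, which can then be compared against the target like any categorical column. The column name `int_rate`, the bin edges, the bin labels, and the toy values are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy data (hypothetical values, not taken from the real dataset).
toy = pd.DataFrame({
    "int_rate": [6.0, 9.5, 12.0, 14.5, 18.0, 22.0],
    "loan_status": ["Fully Paid", "Fully Paid", "Fully Paid",
                    "Charged Off", "Charged Off", "Charged Off"],
})

# Assumed bin edges; each bin is open on the left and closed on the right,
# so the ranges are (0, 10], (10, 15] and (15, 25].
edges = [0, 10, 15, 25]
labels = ["low", "medium", "high"]
toy["int_rate_bin"] = pd.cut(toy["int_rate"], bins=edges, labels=labels)

# Default rate per bin, computed the same way as for a categorical column.
is_default = toy["loan_status"].eq("Charged Off")
rates = is_default.groupby(toy["int_rate_bin"], observed=True).mean()
print(rates)
```

Fixed edges keep the bins easy to read on a chart. `pd.qcut` is an alternative when bins with roughly equal numbers of loans are preferred.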