Good and Bad Customers For Granting Credit: Genpact Data Science Prodegree Logistic Regression: Problem Statement


Genpact Data Science Prodegree

Logistic Regression: Problem Statement

Good and Bad Customers for Granting Credit


Problem Statement
Banks issuing credit cards initially focused on growing the number of customers using
their credit service. A drawback soon followed: many customers were unable to repay their
credit on time. A system was therefore needed to decide effectively what credit limit to
allow a person, based on that person's previous credit history. You will learn how to
apply Logistic Regression to assess the credibility of a customer, and how to evaluate a
Logistic Regression model using measures such as Accuracy, Sensitivity, Specificity and
area under the ROC curve.
Build a classification model using logistic regression to predict the credibility of the
customer, in order to minimize the risk and maximize the profit of a bank.
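
As a rough illustration of what such a model does, the sketch below fits a logistic regression by batch gradient descent in plain Python. The toy rows, the choice of two features, the learning rate, and the function names (`fit_logistic`, `predict_proba`) are assumptions made for illustration only; they are not taken from the dataset or from any prescribed solution, and a library implementation would normally be used in practice.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights and bias by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for this row: p(default) minus the true label.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(X, w, b):
    """Return the predicted probability of default for each row."""
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) for xi in X]

# Made-up rows: [repayment delay in months, credit amount rescaled to 0-1].
X = [[0, 0.9], [0, 0.5], [1, 0.4], [2, 0.3], [3, 0.2], [4, 0.1]]
y = [0, 0, 0, 1, 1, 1]  # 1 = default next month, 0 = no default
w, b = fit_logistic(X, y)
probs = predict_proba(X, w, b)
```

Each value in `probs` is the estimated probability that `Default_Payment` equals 1; comparing it with a cut-off such as 0.5 turns it into a good/bad classification.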

Data Description
• Customer ID: Unique identification of each customer
• Credit_Amount: Credit amount in dollars
• Gender: 1=Male, 2=Female
• Academic_Qualification: 1=Undergraduate, 2=Graduate, 3=Postgraduate,
  4=Professional, 5=Others, 6=Unknown
• Marital: 1=Married, 2=Single, 3=Do not prefer to say
• Age_Years: Age in years
• Repayment_Status_Jan: Repayment status in Jan (0=Paid on time, 1=Payment delay
  for one month, 2=Payment delay for two months, ... 6=Payment delay for six
  months)
• Repayment_Status_Feb: Repayment status in Feb (Scale same as above)
• Repayment_Status_March: Repayment status in March (Scale same as above)
• Repayment_Status_April: Repayment status in April (Scale same as above)
• Repayment_Status_May: Repayment status in May (Scale same as above)
• Repayment_Status_June: Repayment status in June (Scale same as above)
• Jan_Bill_Amount: Amount of bill statement in Jan (in dollars)
• Feb_Bill_Amount: Amount of bill statement in Feb (in dollars)
• March_Bill_Amount: Amount of bill statement in March (in dollars)
• April_Bill_Amount: Amount of bill statement in April (in dollars)
• May_Bill_Amount: Amount of bill statement in May (in dollars)
• June_Bill_Amount: Amount of bill statement in June (in dollars)
• Previous_Payment_Jan: Amount of previous payment in Jan (in dollars)
• Previous_Payment_Feb: Amount of previous payment in Feb (in dollars)
• Previous_Payment_March: Amount of previous payment in March (in dollars)
• Previous_Payment_April: Amount of previous payment in April (in dollars)
• Previous_Payment_May: Amount of previous payment in May (in dollars)
• Previous_Payment_June: Amount of previous payment in June (in dollars)
• Default_Payment: Default payment of next month (1=yes, 0=no)
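
Gender, Academic_Qualification and Marital are stored as numeric codes, but the codes are labels rather than magnitudes. One common way to feed them to a logistic regression is to turn each code into 0/1 indicator columns. The sketch below shows this under stated assumptions: the helper `encode_row`, the choice of the lowest code as the baseline level, and the sample row are hypothetical and are not part of the problem statement.

```python
# Code-to-label maps copied from the data description.
GENDER = {1: "Male", 2: "Female"}
ACADEMIC_QUALIFICATION = {
    1: "Undergraduate", 2: "Graduate", 3: "Postgraduate",
    4: "Professional", 5: "Others", 6: "Unknown",
}
MARITAL = {1: "Married", 2: "Single", 3: "Do not prefer to say"}

NOMINAL_COLUMNS = {
    "Gender": GENDER,
    "Academic_Qualification": ACADEMIC_QUALIFICATION,
    "Marital": MARITAL,
}

def encode_row(row):
    """Replace each nominal code with 0/1 indicators; the lowest code is the baseline."""
    encoded = {}
    for name, value in row.items():
        mapping = NOMINAL_COLUMNS.get(name)
        if mapping is None:
            encoded[name] = value  # numeric column, kept unchanged
            continue
        for code in sorted(mapping)[1:]:  # skip the baseline level
            encoded[f"{name}_{mapping[code]}"] = int(value == code)
    return encoded

# Hypothetical customer row, for illustration only.
row = {"Gender": 2, "Academic_Qualification": 3, "Marital": 1, "Age_Years": 34}
encoded = encode_row(row)
```

Dropping one level per variable avoids indicator columns that sum to a constant, which would otherwise be perfectly collinear with the intercept.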
Evaluation Parameters

Confidential and restricted. Do not distribute. (c) Imarticus Learning 1


Evaluation will be based on:
• Data Preparation
• Model Comparison
• Model Selection

Data Preparation
Analyze the data statistically and treat the multicollinear variables.
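
One simple way to spot multicollinear variables is to compute pairwise Pearson correlations and flag pairs above a cut-off; consecutive monthly bill amounts are likely candidates. The sketch below assumes a cut-off of 0.9 and uses made-up column values; the function names `pearson` and `correlated_pairs` are illustrative. A variance inflation factor (VIF) check is a common alternative.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlated_pairs(columns, threshold=0.9):
    """Return (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, round(r, 3)))
    return pairs

# Made-up values for five customers, for illustration only.
columns = {
    "Jan_Bill_Amount": [1000, 2500, 400, 8000, 3000],
    "Feb_Bill_Amount": [1100, 2400, 500, 7900, 3200],
    "Age_Years": [25, 41, 33, 29, 52],
}
flagged = correlated_pairs(columns)
```

For each flagged pair, one option is to keep only one of the two variables, or to combine them (for example into an average bill amount), and then refit the model.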

Model Comparison
Fit a logistic regression model after every change made to the dataset and compare the
results.

Model Selection
Select the best model. Model selection is to be based on Accuracy, Sensitivity, Specificity
and area under the ROC curve.
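
The four selection measures can be computed from the true labels and the predicted probabilities as sketched below. The 0.5 cut-off, the function names, and the sample labels and probabilities are assumptions for illustration. The class coded 1 (default) is treated as the positive class, and both classes must be present in the data.

```python
def confusion_counts(y_true, y_prob, threshold=0.5):
    """Count true/false positives and negatives at the given cut-off."""
    tp = fp = tn = fn = 0
    for y, p in zip(y_true, y_prob):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn

def roc_auc(y_true, y_prob):
    """AUC as the chance that a random positive outscores a random negative (ties count half)."""
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))

def evaluate(y_true, y_prob, threshold=0.5):
    tp, fp, tn, fn = confusion_counts(y_true, y_prob, threshold)
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # share of actual defaulters caught
        "specificity": tn / (tn + fp),  # share of non-defaulters cleared
        "auc": roc_auc(y_true, y_prob),
    }

# Made-up labels and predicted probabilities, for illustration only.
y_true = [0, 0, 0, 0, 1, 1, 1, 0]
y_prob = [0.1, 0.2, 0.3, 0.6, 0.4, 0.7, 0.9, 0.05]
metrics = evaluate(y_true, y_prob)
```

Accuracy, Sensitivity and Specificity depend on the chosen cut-off, while the area under the ROC curve summarizes performance across all cut-offs, which is why it is useful to report all four when comparing models.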

Expected Outcome
Higher accuracy in predicting the outcome using test data.
