Good and Bad Customers For Granting Credit: Genpact Data Science Prodegree Logistic Regression: Problem Statement
Data Description
Customer ID: Unique identification of each customer
Credit_Amount: Credit amount in dollars
Gender: 1=Male, 2=Female
Academic_Qualification: 1=Undergraduate, 2=Graduate, 3=Postgraduate, 4=Professional, 5=Others, 6=Unknown
Marital: 1=Married, 2=Single, 3=Do not prefer to say
Age_Years: Age in years
Repayment_Status_Jan: Repayment status in Jan (0=Paid on time, 1=Payment delay for one month, 2=Payment delay for two months, ... 6=Payment delay for six months)
Repayment_Status_Feb: Repayment status in Feb (same scale as above)
Repayment_Status_March: Repayment status in March (same scale as above)
Repayment_Status_April: Repayment status in April (same scale as above)
Repayment_Status_May: Repayment status in May (same scale as above)
Repayment_Status_June: Repayment status in June (same scale as above)
Jan_Bill_Amount: Amount of bill statement in Jan (in dollars)
Feb_Bill_Amount: Amount of bill statement in Feb (in dollars)
March_Bill_Amount: Amount of bill statement in March (in dollars)
April_Bill_Amount: Amount of bill statement in April (in dollars)
May_Bill_Amount: Amount of bill statement in May (in dollars)
June_Bill_Amount: Amount of bill statement in June (in dollars)
Previous_Payment_Jan: Amount of previous payment in Jan (in dollars)
Previous_Payment_Feb: Amount of previous payment in Feb (in dollars)
Previous_Payment_March: Amount of previous payment in March (in dollars)
Previous_Payment_April: Amount of previous payment in April (in dollars)
Previous_Payment_May: Amount of previous payment in May (in dollars)
Previous_Payment_June: Amount of previous payment in June (in dollars)
Default_Payment: Whether the customer defaults on the next month's payment (1=Yes, 0=No); this is the target variable to predict
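Because most of these fields are numeric codes, it can help to check every record against the documented code tables before any modelling. The sketch below assumes each customer record is a Python dict keyed by the column names listed above; the helper `validate_record` is hypothetical and simply reports values that fall outside the documented codes.

```python
# Code tables copied from the data description above.
GENDER = {1: "Male", 2: "Female"}
ACADEMIC_QUALIFICATION = {
    1: "Undergraduate", 2: "Graduate", 3: "Postgraduate",
    4: "Professional", 5: "Others", 6: "Unknown",
}
MARITAL = {1: "Married", 2: "Single", 3: "Do not prefer to say"}
MONTHS = ["Jan", "Feb", "March", "April", "May", "June"]
REPAYMENT_CODES = set(range(0, 7))  # 0 = paid on time, k = payment delayed k months


def validate_record(record: dict) -> list[str]:
    """Hypothetical helper: list every field whose value is not a documented code."""
    problems = []
    for column, codes in (
        ("Gender", GENDER),
        ("Academic_Qualification", ACADEMIC_QUALIFICATION),
        ("Marital", MARITAL),
    ):
        if record.get(column) not in codes:
            problems.append(f"{column}: undocumented code {record.get(column)!r}")
    for month in MONTHS:
        column = f"Repayment_Status_{month}"
        if record.get(column) not in REPAYMENT_CODES:
            problems.append(f"{column}: undocumented code {record.get(column)!r}")
    if record.get("Default_Payment") not in (0, 1):
        problems.append(
            f"Default_Payment: expected 0 or 1, got {record.get('Default_Payment')!r}"
        )
    return problems


# Toy record (not from the project dataset) with one out-of-range repayment code.
example = {"Gender": 2, "Academic_Qualification": 2, "Marital": 1, "Default_Payment": 0,
           **{f"Repayment_Status_{m}": 0 for m in MONTHS}}
example["Repayment_Status_June"] = 9
print(validate_record(example))  # ['Repayment_Status_June: undocumented code 9']
```

Records that fail the check can then be investigated, recoded, or excluded before the data preparation step.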
Evaluation Parameters
Data Preparation
Analyze the data statistically, then identify and treat multicollinear variables (for example, by dropping or combining strongly correlated predictors).
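A standard way to identify multicollinear variables is the variance inflation factor, VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing predictor j on all the other predictors. Equivalently, VIF_j is the j-th diagonal element of the inverse of the predictors' correlation matrix, which is what this dependency-free sketch computes. The helper names are hypothetical. A common rule of thumb treats a VIF above 5 or 10 as a warning sign, but that cut-off is an assumption, not part of this brief. In practice, statsmodels provides `variance_inflation_factor` for the same purpose.

```python
import math


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def invert(matrix: list[list[float]]) -> list[list[float]]:
    """Gauss-Jordan inversion with partial pivoting; raises on a singular matrix."""
    n = len(matrix)
    aug = [row[:] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("correlation matrix is singular: perfect multicollinearity")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif(columns: dict[str, list[float]]) -> dict[str, float]:
    """VIF of each predictor = diagonal of the inverse correlation matrix."""
    names = list(columns)
    corr = [[pearson(columns[a], columns[b]) for b in names] for a in names]
    inv = invert(corr)
    return {name: inv[i][i] for i, name in enumerate(names)}


# Toy columns (not project data): r = 6 / sqrt(60), so r^2 = 0.6 and each VIF is about 2.5.
print(vif({"x": [1, 2, 3, 4, 5], "y": [2, 4, 5, 4, 5]}))
```

Monthly series such as the six bill amounts are natural candidates to check together. A predictor with a high VIF could be dropped, or the related columns could be combined, and the VIFs recomputed after each change.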
Model Comparison
Fit a logistic regression model after each change made to the dataset (for example, after each treatment of multicollinear variables) and compare the results.
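Each dataset variant is fitted with the same kind of model. In practice a library such as scikit-learn (`sklearn.linear_model.LogisticRegression`) or statsmodels would do the fitting. This dependency-free sketch only shows what is being fitted: a logistic regression whose intercept and weights are found by batch gradient descent on the mean log-loss. The learning rate, epoch count, and the helper names `fit_logistic` and `predict_proba` are assumptions. Gradient descent of this kind behaves best when the features are standardized first.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X: list[list[float]], y: list[int],
                 lr: float = 0.1, epochs: int = 2000) -> tuple[float, list[float]]:
    """Fit intercept b and weights w by batch gradient descent on the mean log-loss."""
    n, d = len(X), len(X[0])
    b, w = 0.0, [0.0] * d
    for _ in range(epochs):
        grad_b, grad_w = 0.0, [0.0] * d
        for xi, yi in zip(X, y):
            # Gradient of the log-loss for one row: (predicted probability - label) * feature.
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err
            for j in range(d):
                grad_w[j] += err * xi[j]
        b -= lr * grad_b / n
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
    return b, w


def predict_proba(model: tuple[float, list[float]], X: list[list[float]]) -> list[float]:
    """Predicted probability that Default_Payment = 1 for each row."""
    b, w = model
    return [sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) for xi in X]


# Toy data (not from the project dataset): one standardized feature.
X = [[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]]
y = [0, 0, 1, 0, 1, 1]
model = fit_logistic(X, y)
probs = predict_proba(model, X)
```

For the comparison, each variant (for example, all predictors versus the reduced set left after treating multicollinearity) would be fitted this way and scored on the same test rows.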
Model Selection
Select the best model based on accuracy, sensitivity, specificity, and the area under the ROC curve (AUC).
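The four selection criteria can be computed from the true labels and the predicted probabilities. The sketch below treats Default_Payment = 1 as the positive class. Sensitivity is therefore the share of actual defaulters the model flags, and specificity is the share of non-defaulters it correctly clears. The 0.5 classification threshold and the helper names are assumptions. AUC is computed with the Mann-Whitney rank formulation, which is threshold-free and gives tied scores average ranks. scikit-learn's `sklearn.metrics.roc_auc_score` is a library equivalent.

```python
def auc(y_true: list[int], scores: list[float]) -> float:
    """Area under the ROC curve via the Mann-Whitney U statistic (ties get average ranks)."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tied block
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    rank_sum = sum(r for r, t in zip(ranks, y_true) if t == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def classification_metrics(y_true: list[int], scores: list[float],
                           threshold: float = 0.5) -> dict[str, float]:
    """Accuracy, sensitivity, specificity at a threshold, plus threshold-free AUC."""
    tp = fp = tn = fn = 0
    for t, s in zip(y_true, scores):
        pred = 1 if s >= threshold else 0
        if pred == 1 and t == 1:
            tp += 1
        elif pred == 1 and t == 0:
            fp += 1
        elif pred == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return {
        "accuracy": (tp + tn) / len(y_true),
        "sensitivity": tp / (tp + fn),  # true positive rate: defaulters caught
        "specificity": tn / (tn + fp),  # true negative rate: non-defaulters cleared
        "auc": auc(y_true, scores),
    }


# Toy labels and scores (not project results).
print(classification_metrics([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# {'accuracy': 0.75, 'sensitivity': 0.5, 'specificity': 1.0, 'auc': 0.75}
```

Because accuracy alone can look good when defaulters are a minority, comparing sensitivity and specificity alongside it shows which kind of error each candidate model makes.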
Expected Outcome
Higher accuracy in predicting the outcome (Default_Payment) on held-out test data.
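Judging models on test data requires holding out a test set before any fitting. The sketch below performs a stratified split so that the share of defaulters is roughly the same in the training and test sets. The 70/30 split, the fixed seed, and the helper name `stratified_split` are assumptions. scikit-learn's `train_test_split` with its `stratify` parameter does the same job.

```python
import random


def stratified_split(rows: list, labels: list[int], test_fraction: float = 0.3,
                     seed: int = 42) -> tuple[list, list, list[int], list[int]]:
    """Split rows into train/test sets, keeping each class's share similar in both."""
    rng = random.Random(seed)  # fixed seed so every model variant sees the same split
    train_idx, test_idx = [], []
    for cls in sorted(set(labels)):
        idx = [i for i, lab in enumerate(labels) if lab == cls]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx += idx[:n_test]
        train_idx += idx[n_test:]
    return ([rows[i] for i in train_idx], [rows[i] for i in test_idx],
            [labels[i] for i in train_idx], [labels[i] for i in test_idx])


# Toy example: 100 rows, 20 of which are defaulters.
rows = list(range(100))
labels = [1 if i < 20 else 0 for i in range(100)]
X_train, X_test, y_train, y_test = stratified_split(rows, labels)
print(len(X_test), sum(y_test))  # 30 6
```

Any preprocessing choices, such as which multicollinear variables to drop or how to standardize features, should be made on the training rows only, so that the test accuracy remains an honest estimate.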