Bankruptcy Prevention Project

TEAM MEMBERS
MR. GAURAV PAWAR
MR. ANIKET PRABHALE
MS. AYUSHI
CONTENT
>Business Objective
>Project Architecture
>Data Collection and Details
>Exploratory Data Analysis
>Visualization
>Modeling
>Evaluation
>Deployment
FLOW CHART
Business Problem :
 Businesses go bankrupt.
Business Objective :
 This is a classification project, since the variable to predict is binary (bankruptcy or non-bankruptcy).
 The goal here is to model the probability that a business goes bankrupt from its different features.
DATASET DETAILS :
The data file contains 7 columns (6 features and the target class) for 250 companies.
 Industrial risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.
 Management risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.
 Financial flexibility: 0 = low flexibility, 0.5 = medium flexibility, 1 = high flexibility.
 Credibility: 0 = low credibility, 0.5 = medium credibility, 1 = high credibility.
 Competitiveness: 0 = low competitiveness, 0.5 = medium competitiveness, 1 = high competitiveness.
 Operating risk: 0 = low risk, 0.5 = medium risk, 1 = high risk.
 Class: bankruptcy, non-bankruptcy (target variable).

Exploratory Data Analysis (EDA)
Value counts for each column:
 Industrial_risk: 1.0 = 89, 0.5 = 81, 0.0 = 80
 Management_risk: 1.0 = 119, 0.5 = 69, 0.0 = 62
 Financial_flexibility: 1.0 = 57, 0.5 = 74, 0.0 = 119
 Credibility: 1.0 = 79, 0.5 = 77, 0.0 = 94
 Competitiveness: 1.0 = 91, 0.5 = 56, 0.0 = 103
 Operating_risk: 1.0 = 114, 0.5 = 57, 0.0 = 79
 Class: bankruptcy = 107, non-bankruptcy = 143
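Per-column counts like the ones listed above can be produced with pandas. The sketch below uses a tiny made-up sample, not the project data, and the column names are assumptions.

```python
import pandas as pd

# Tiny made-up sample, NOT the project data; column names are assumed.
data = pd.DataFrame({
    "industrial_risk": [0.0, 0.5, 1.0, 1.0, 0.5, 1.0],
    "class": ["bankruptcy", "non-bankruptcy", "bankruptcy",
              "non-bankruptcy", "non-bankruptcy", "bankruptcy"],
})

# Count how many rows fall into each level of every column.
counts = {col: data[col].value_counts().sort_index() for col in data.columns}

for col, vc in counts.items():
    print(col, vc.to_dict())
```

On the real file, the counts of every column should add up to the number of records (250).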
Data Set Information :
 No. of Columns: 7
 No. of Records: 250
Features of Interest :
 1. Independent variables, X = 6 features
 2. Dependent variable, y = class
Visualization of missing values:
Checking the Missing Values
 data.isnull().sum()
There are no missing values in the dataset.
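As a minimal sketch of this check, the example below counts missing values per column with pandas. The frame is made up and deliberately contains one missing entry so the output is not all zeros; on the project data every count is expected to be 0.

```python
import numpy as np
import pandas as pd

# Made-up frame with one deliberately missing value (NOT the project data).
data = pd.DataFrame({
    "industrial_risk": [0.0, 0.5, np.nan],
    "management_risk": [1.0, 1.0, 0.5],
})

# isnull() is a method, so it needs parentheses before .sum().
missing_per_column = data.isnull().sum()
total_missing = int(missing_per_column.sum())

print(missing_per_column)
print("total missing:", total_missing)
```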

Count Plot
 Industrial risk: every level has a count of 80 or more, and high risk is the most frequent.
 Management risk: high risk has a count of about 120, while low and medium are between 60 and 70.
 Financial flexibility: most companies have low flexibility.
 Credibility: the counts for low, medium and high are almost similar.
 Competitiveness: most companies are either low or high.
 Operating risk: most companies are high risk.
 Class: non-bankruptcy has the higher count.
Correlation Matrix
 Industrial risk and management risk are mostly correlated with each other.
 Financial flexibility is highly correlated with competitiveness and credibility.
 Similarly, competitiveness is correlated with financial flexibility and credibility, and the same holds for credibility.
 Operating risk is correlated with industrial risk and management risk.
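A correlation matrix of this kind can be computed with pandas. The sketch below runs on a small made-up frame with assumed column names, so its numbers say nothing about the project data; it only shows the call and the shape of the result.

```python
import pandas as pd

# Made-up values on the 0 / 0.5 / 1 scale (NOT the project data).
features = pd.DataFrame({
    "financial_flexibility": [0.0, 0.0, 0.5, 1.0, 1.0, 0.5],
    "competitiveness":       [0.0, 0.5, 0.5, 1.0, 1.0, 0.0],
    "operating_risk":        [1.0, 1.0, 0.5, 0.0, 0.5, 1.0],
})

# Pairwise Pearson correlation (the pandas default) between all columns.
corr = features.corr()
print(corr.round(2))
```

The result is a symmetric table with 1.0 on the diagonal, which is what a heatmap of the correlation matrix displays.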
Model building
We use 80% of the data for training and 20% for testing.
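One way to make an 80/20 split is scikit-learn's `train_test_split`. The data below is randomly generated to match only the shape of the dataset (250 rows, 6 features); the `stratify` argument and the random seeds are assumptions, not something stated in the project.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Random stand-in with the same shape as the dataset (NOT the project data).
rng = np.random.default_rng(0)
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
y = rng.integers(0, 2, size=250)  # 1 = bankruptcy, 0 = non-bankruptcy (assumed encoding)

# test_size=0.2 keeps 20% of the rows for testing; stratify=y (an assumption)
# keeps the class proportions similar in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```

With 250 records this gives 200 rows for training and 50 for testing.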
Logistic Regression
• Logistic regression is also known in the literature as logit regression, maximum-entropy classification (MaxEnt) or the log-linear classifier. In this model, the probabilities describing the possible outcomes of a single trial are modeled using a logistic function.
• Logistic regression is commonly used for prediction and classification problems.
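To illustrate how a logistic function turns a linear score into a probability, here is a minimal sketch with scikit-learn's `LogisticRegression`. The data and the labelling rule are invented for the example and are not the project's; default model settings are assumed.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the six features (NOT the project data).
rng = np.random.default_rng(0)
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
# Hypothetical labelling rule, for illustration only: 1 = bankruptcy.
y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(int)

model = LogisticRegression()
model.fit(X, y)

# Linear score z = w.x + b, passed through the logistic function.
z = X[:5] @ model.coef_.ravel() + model.intercept_[0]
manual_proba = 1.0 / (1.0 + np.exp(-z))

# predict_proba returns [P(class 0), P(class 1)] for each row.
proba = model.predict_proba(X[:5])
print(np.round(proba[:, 1], 3))
print(np.round(manual_proba, 3))
```

For a binary target the two printed rows should agree, since the predicted probability of class 1 is the logistic function applied to the linear score.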
DECISION TREE
A decision tree is a non-parametric supervised learning algorithm, which is utilized for both classification and regression tasks. It has a hierarchical tree structure, which consists of a root node, branches, internal nodes and leaf nodes.
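The tree structure can be made visible by fitting a small tree and printing its rules. This is a sketch on synthetic data with an invented labelling rule; the feature names, `max_depth=3` and the seed are assumptions chosen to keep the printout short.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Assumed feature names, in the order listed in the dataset details.
feature_names = [
    "industrial_risk", "management_risk", "financial_flexibility",
    "credibility", "competitiveness", "operating_risk",
]

# Synthetic stand-in (NOT the project data) with a hypothetical label rule.
rng = np.random.default_rng(0)
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(int)  # 1 = bankruptcy

# A shallow tree: the first split is the root node, later splits are
# internal nodes, and lines ending in "class: ..." are leaf nodes.
tree = DecisionTreeClassifier(max_depth=3, random_state=0)
tree.fit(X, y)

rules = export_text(tree, feature_names=feature_names)
print(rules)
```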
Random Forest
• Random forest is a commonly used machine learning algorithm which combines the output of multiple decision trees to reach a single result. Its ease of use and flexibility have fueled its adoption, as it handles both classification and regression problems.
• Since the random forest model is made up of multiple decision trees, it would be helpful to start by describing the decision tree algorithm briefly.
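The idea of combining many trees into a single result can be sketched with scikit-learn's `RandomForestClassifier`, which averages the class probabilities predicted by its individual trees. The data, the labelling rule, the number of trees and the seeds below are assumptions for illustration, so the printed accuracy is not a project result.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in (NOT the project data) with a hypothetical label rule.
rng = np.random.default_rng(0)
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(int)  # 1 = bankruptcy

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, y_train)

# Average the per-tree probabilities by hand and compare with the forest.
tree_average = np.mean(
    [t.predict_proba(X_test) for t in forest.estimators_], axis=0
)
forest_proba = forest.predict_proba(X_test)

accuracy = forest.score(X_test, y_test)
print("trees:", len(forest.estimators_), "test accuracy:", round(accuracy, 3))
```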
MODEL DEPLOYMENT ON STREAMLIT
We tried multiple models above, and random forest gave good accuracy, so we use the random forest model for deployment.
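A deployed app typically loads a saved model and turns user input into a prediction. The sketch below shows only that part, using `pickle` for the round trip; the helper `predict_company`, the class encoding and the training data are hypothetical. In a Streamlit app the six values could come from widgets such as `st.selectbox` and be passed to the same helper.

```python
import pickle

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in (NOT the project data) with a hypothetical label rule.
rng = np.random.default_rng(0)
X = rng.choice([0.0, 0.5, 1.0], size=(250, 6))
y = (X[:, 0] + X[:, 1] > X[:, 2] + X[:, 3]).astype(int)  # 1 = bankruptcy

forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

# Serialize the trained model; an app would normally write this to a file
# once and load it at start-up.
model_bytes = pickle.dumps(forest)
loaded = pickle.loads(model_bytes)


def predict_company(model, answers):
    """Hypothetical helper: map six 0 / 0.5 / 1 answers to a class label."""
    row = np.array([answers], dtype=float)
    return "bankruptcy" if model.predict(row)[0] == 1 else "non-bankruptcy"


label = predict_company(loaded, [1.0, 1.0, 0.0, 0.0, 0.0, 1.0])
print(label)
```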
THANK YOU
