Logistic Regression
Logistic regression is a type of regression analysis used to predict the outcome of a categorical dependent
variable (a dependent variable that can take on a limited number of categories) based on one or more predictor
variables (continuous, ordinal or categorical).
In binary logistic regression, the outcome (dependent) variable is binary, or dichotomous, i.e. 0 or 1.
Logistic regression measures the relationship between a categorical dependent variable and one or more independent
variables by converting the dependent variable into probability scores (P).
A probability score is the probability of the event happening, for example the probability that a customer churns or
responds to a campaign.
How is Logistic Regression different from Linear Regression?
In linear regression, the outcome variable is continuous and the predictor variables can be a mix of numeric and
categorical. Often, however, we wish to evaluate the effects of multiple explanatory variables on a binary outcome
variable.
For example, we may want to measure the effects of a number of factors on whether or not a disease develops. A patient
may be cured or not; a prospect may respond or not; a loan may be granted to a particular person or not; and so on.
When the outcome or dependent variable is binary, and we wish to measure the effects of several independent
variables on it, we use logistic regression.
The probability of each observation does not change linearly with the predictors; it follows an S-shaped (sigmoid)
curve, so the values always stay between 0 and 1.
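A minimal sketch of the sigmoid (logistic) function, assuming the usual form p = 1 / (1 + e^-z), where z is the linear score built from the predictors. It illustrates that whatever the score, the resulting probability stays strictly between 0 and 1; the scores in the loop are arbitrary illustrative values.

```python
import math

def sigmoid(z):
    """Logistic (sigmoid) function: maps any real score z into the open interval (0, 1).

    Note: math.exp(-z) overflows for z below roughly -709; this sketch does not guard against that.
    """
    return 1.0 / (1.0 + math.exp(-z))

# Very negative scores give probabilities near 0, very positive scores give
# probabilities near 1, and a score of exactly 0 gives 0.5.
for z in [-6, -2, 0, 2, 6]:
    print(f"z = {z:>2}  ->  p = {sigmoid(z):.4f}")
```

The curve is also symmetric around z = 0: sigmoid(-z) equals 1 - sigmoid(z).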
The binary outcome variable can be coded as 0 or 1.
The logistic curve is shown in the figure below:
[Figure: Sigmoid function]
Concept of Sigmoid Function in Logistic Regression
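The sigmoid function is bounded: its output always lies between 0 and 1. If

$$p = \frac{1}{1 + e^{-(a + bx)}}$$

then the log of the odds is

$$\ln\left(\frac{p}{1 - p}\right) = a + bx$$

This is also called the logit function.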
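The logit and the sigmoid are inverses of each other: applying the sigmoid to a + bx gives the probability p, and applying the logit ln(p / (1 - p)) to that p gives a + bx back. The sketch below checks this numerically; the intercept a = -1.0 and slope b = 0.8 are hypothetical values chosen only for illustration.

```python
import math

def logit(p):
    """Log of the odds, ln(p / (1 - p)); defined only for 0 < p < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Sigmoid: turns a log-odds value z = a + b*x back into a probability p."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical intercept and slope, for illustration only.
a, b = -1.0, 0.8

for x in [0.0, 1.0, 2.5]:
    z = a + b * x            # linear predictor (the log of the odds)
    p = inverse_logit(z)     # probability on the sigmoid curve
    print(f"x = {x}: a + bx = {z:.2f}, p = {p:.4f}, logit(p) = {logit(p):.2f}")
```

In every row, logit(p) reproduces a + bx (up to floating-point rounding), which is why the model is linear on the log-odds scale even though p itself is not.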
The parameters are estimated by Maximum Likelihood Estimation (MLE), because the model is non-linear in the
probability, unlike linear regression, where the method of Ordinary Least Squares (OLS) is used.
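To make maximum likelihood estimation concrete, here is a minimal sketch for a single predictor, assuming the model p = 1 / (1 + e^-(a + bx)). It maximises the log-likelihood with Newton-Raphson steps, one common choice of optimiser (for logistic regression this is the same update as iteratively reweighted least squares); statistical software may use other settings and stopping rules. The data values and the function names `fit_logistic_mle` and `log_likelihood` are hypothetical, for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def log_likelihood(a, b, x, y):
    """Sum over observations of y*ln(p) + (1 - y)*ln(1 - p)."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = sigmoid(a + b * xi)
        total += yi * math.log(p) + (1 - yi) * math.log(1 - p)
    return total

def fit_logistic_mle(x, y, iterations=25):
    """Fit p = sigmoid(a + b*x) by Newton-Raphson on the log-likelihood.

    Assumes the 0/1 classes are not perfectly separable by x; otherwise b grows without bound.
    """
    a, b = 0.0, 0.0
    for _ in range(iterations):
        p = [sigmoid(a + b * xi) for xi in x]
        # Gradient (score) of the log-likelihood with respect to a and b.
        g_a = sum(yi - pi for yi, pi in zip(y, p))
        g_b = sum((yi - pi) * xi for xi, yi, pi in zip(x, y, p))
        # Information matrix [[A, B], [B, C]] built from weights w = p(1 - p).
        w = [pi * (1 - pi) for pi in p]
        A = sum(w)
        B = sum(wi * xi for wi, xi in zip(w, x))
        C = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = A * C - B * B
        # Newton step: (a, b) += inverse(information) @ gradient.
        a += (C * g_a - B * g_b) / det
        b += (A * g_b - B * g_a) / det
    return a, b

# Hypothetical data: one predictor and a binary outcome.
x = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
y = [0, 0, 1, 0, 1, 0, 1, 1]

a_hat, b_hat = fit_logistic_mle(x, y)
print(f"a = {a_hat:.4f}, b = {b_hat:.4f}, log-likelihood = {log_likelihood(a_hat, b_hat, x, y):.4f}")
```

At the maximum, the gradient is zero, so the fitted probabilities sum to the number of observed 1s; that is a handy sanity check whenever the model includes an intercept.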
Odds Ratio
The odds are calculated as P(Y = 1) / P(Y = 0).
Odds > 1 if Y = 1 is more likely
Odds < 1 if Y = 0 is more likely
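A small sketch of the odds calculation, taking P(Y = 0) as 1 - P(Y = 1); the probabilities 0.2, 0.5 and 0.8 are arbitrary illustrative values.

```python
def odds(p):
    """Odds of Y = 1: P(Y = 1) / P(Y = 0), with P(Y = 0) = 1 - P(Y = 1)."""
    return p / (1.0 - p)

for p in [0.2, 0.5, 0.8]:
    o = odds(p)
    if o > 1:
        verdict = "Y = 1 is more likely"
    elif o < 1:
        verdict = "Y = 0 is more likely"
    else:
        verdict = "both outcomes are equally likely"
    print(f"P(Y = 1) = {p}: odds = {o:.2f} ({verdict})")
```

Odds of exactly 1 correspond to a probability of 0.5, the point where the sigmoid crosses its midpoint.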
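Suppose a logistic regression model with two independent variables has coefficients β0 = -1.5, β1 = 3, β2 = -0.5,
and an observation has x1 = 1 and x2 = 5.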
What is the value of the logit for this observation? Recall that the logit is log(odds).
What is the value of the odds for this observation? Note that you can compute e^x, for some number x, in your
R console by typing exp(x); the function exp() computes the exponential of its argument.
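One way to check the exercise: with two predictors the logit is β0 + β1·x1 + β2·x2, the odds are its exponential (what exp() does in R, and math.exp() in Python), and the probability is odds / (1 + odds). The sketch below plugs in the coefficient and observation values given in the exercise; the logit comes out to -1, so the odds are e^-1 ≈ 0.368 and P(Y = 1) ≈ 0.269.

```python
import math

# Coefficients and observation values from the exercise.
beta0, beta1, beta2 = -1.5, 3.0, -0.5
x1, x2 = 1.0, 5.0

logit_value = beta0 + beta1 * x1 + beta2 * x2   # logit = log(odds) = -1.5 + 3 - 2.5
odds_value = math.exp(logit_value)              # Python's equivalent of exp() in R
probability = odds_value / (1.0 + odds_value)   # P(Y = 1), the same as the sigmoid of the logit

print(f"logit = {logit_value:.4f}")
print(f"odds = {odds_value:.4f}")
print(f"P(Y = 1) = {probability:.4f}")
```

Because the odds are below 1, Y = 0 is the more likely outcome for this observation.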
Example applications of logistic regression:
Response to an E-mail Campaign
Subscriber conversion after a Campaign
Churning of subscribers
Cross Sell / Up Sell Model
Application Risk Model
Behavioral Risk Model
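Confusion matrix on the test set (rows: actual outcome; columns: predicted probability > 0.5):
              FALSE    TRUE
Actual 0       1069       6
Actual 1        187      11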
What is the sensitivity of our logistic regression model on the test set,
using a threshold of 0.5?
What is the specificity of our logistic regression model on the test set,
using a threshold of 0.5?
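A sketch of the two calculations, assuming (as in the layout R's table(actual, predicted > 0.5) produces) that the rows of the confusion matrix are the actual outcomes 0 and 1 and the columns FALSE/TRUE are the predictions at the 0.5 threshold. Under that reading, sensitivity = TP / (TP + FN) = 11 / 198 ≈ 0.056 and specificity = TN / (TN + FP) = 1069 / 1075 ≈ 0.994.

```python
# Counts from the test-set confusion matrix, assuming rows = actual outcome
# and columns = prediction at a 0.5 threshold.
true_negatives = 1069   # actual 0, predicted FALSE
false_positives = 6     # actual 0, predicted TRUE
false_negatives = 187   # actual 1, predicted FALSE
true_positives = 11     # actual 1, predicted TRUE

# Sensitivity (true positive rate): share of actual 1s the model catches.
sensitivity = true_positives / (true_positives + false_negatives)
# Specificity (true negative rate): share of actual 0s the model correctly rejects.
specificity = true_negatives / (true_negatives + false_positives)

print(f"sensitivity = {sensitivity:.4f}")
print(f"specificity = {specificity:.4f}")
```

At this threshold the model almost never flags an observation as 1, so specificity is very high while sensitivity is very low; lowering the threshold would trade some specificity for sensitivity.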