Logistic Regression


Introduction

Logistic regression is a mathematical modeling approach that can be used to describe the relationship of one or several independent variables (X's) to a dichotomous dependent variable (D).

Logistic regression is by far the most popular modeling procedure used to analyze epidemiologic data when the illness measure is dichotomous.

Logistic regression shares the same objective as linear regression: both are statistical methods used to predict the dependent variable from one or more independent variables.

In logistic regression, the dependent variable is dichotomous.


Some Pre-Requisite Concepts


Odds


Odds: Examples
Let's say that the probability of success in a random experiment is .8, thus p = .8.
Then the probability of failure is q = 1 – p = .2.
Odds are determined from probabilities and range between 0 and infinity.
odds(success) = p/(1−p) or p/q = .8/.2 = 4 (or 4:1),
that is, the odds of success are 4 to 1, which means that 4 times out of 5 it will be a success.
odds(failure) = q/p = .2/.8 = .25 (or 1:4),
which means the odds of failure are 1 to 4.
Odds greater than 1 indicate that success is more likely than failure.
Odds less than 1 indicate that failure is more likely than success.
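The arithmetic above can be reproduced in a few lines of Python; this is an illustrative sketch added here, not part of the original slides:

p = 0.8                # probability of success
q = 1 - p              # probability of failure

odds_success = p / q   # 0.8 / 0.2 = 4.0, i.e. odds of 4:1
odds_failure = q / p   # 0.2 / 0.8 = 0.25, i.e. odds of 1:4

print(odds_success, odds_failure)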

Relationship between Probability and Odds

Odds range from 0 to positive infinity, while probability ranges from 0 to 1.


Odds and Odds Ratio: Examples 2

Suppose that seven out of 10 males are admitted to an engineering school while three of 10 females are admitted. The probabilities for admitting a male are,
p = 7/10 = .7, q = 1 – .7 = .3
If you are male, the probability of being admitted is 0.7 and the probability of not being admitted is 0.3.
Here are the same probabilities for females,
p = 3/10 = .3, q = 1 – .3 = .7
If you are female, it is just the opposite: the probability of being admitted is 0.3 and the probability of not being admitted is 0.7.

Odds: Examples 2

Now we can use the probabilities to compute the odds of admission for both males and females,
odds(male) = .7/.3 = 2.3333
odds(female) = .3/.7 = .42857
Next, we compute the odds ratio for admission,
OR = 2.3333/.42857 = 5.44
Thus, for a male, the odds of being admitted are 5.44 times larger than the odds for a female being admitted.
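As an illustrative sketch (not from the slides), the same odds and odds ratio can be computed in Python:

p_male, p_female = 0.7, 0.3

odds_male = p_male / (1 - p_male)        # 0.7 / 0.3 = 2.3333
odds_female = p_female / (1 - p_female)  # 0.3 / 0.7 = 0.42857

odds_ratio = odds_male / odds_female     # about 5.44
print(round(odds_ratio, 2))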


Logit


Logit Example


Relationship between probability, odds and logit

1. (logit = 0) ⇔ (odds = 1) ⇔ (probability = 0.50)
2. (logit < 0) ⇔ (odds < 1) ⇔ (probability < 0.50)
3. (logit > 0) ⇔ (odds > 1) ⇔ (probability > 0.50)
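A small Python sketch, added here for illustration, makes this correspondence concrete:

import math

def logit(p):
    # log-odds of a probability p, defined for 0 < p < 1
    return math.log(p / (1 - p))

for p in (0.25, 0.50, 0.75):
    odds = p / (1 - p)
    # p < .5 -> odds < 1, logit < 0;  p = .5 -> odds = 1, logit = 0;  p > .5 -> odds > 1, logit > 0
    print(p, odds, logit(p))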


The logit function: a graph of ln(p/(1−p)) against p


Logistic Regression Concept

Model the probability of an event rather than measuring it directly.

The event may depend on a number of factors (independent variables). These variables may be qualitative or quantitative.

The probability ranges between 0 and 1.

There is a need for a transformation from the independent variables into a probability, and then into classes.

This is exactly what the logit transformation does (see the sketch below).
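A minimal sketch of that transformation chain, with made-up coefficients a and b purely for illustration:

import math

def inverse_logit(z):
    # maps a linear predictor z = a + bX to a probability in (0, 1)
    return 1 / (1 + math.exp(-z))

a, b = -1.0, 2.0                        # hypothetical coefficients
x = 0.8                                 # hypothetical value of the independent variable
p = inverse_logit(a + b * x)            # independent variable -> probability
predicted_class = 1 if p > 0.5 else 0   # probability -> class
print(p, predicted_class)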
Simple Logistic Regression

We shall discuss a simple example of logistic regression with one independent variable, where the dependent variable has two possible values (yes/no).

This is quite similar to simple linear regression with one independent variable, but the difference is that in simple linear regression the dependent variable is quantitative, while in logistic regression it is qualitative.

Our other goal in this example is to predict the probability of getting a particular value of the dependent variable, given the independent variable.
Simple Logistic Regression
A study was conducted in Japan to investigate the presence of an endangered species of spider on beaches, given the grain size of the sand.

The dataset is shown in the table.

One goal of this study was to determine whether there was a relationship between sand grain size and the presence or absence of the species.
Simple Logistic Regression
Simple logistic regression finds the equation that best predicts the value of the Y variable for each value of the X variable.

What makes logistic regression different from linear regression is that you do not measure the Y variable directly; it is instead the probability of obtaining a particular value of a nominal variable. For the spider example, the values of the nominal variable are "spiders present" and "spiders absent."

The Y variable used in logistic regression would then be the probability of spiders being present on a beach. This probability could take values from 0 to 1.
Simple Logistic Regression

The limited range of this probability would present problems if used directly in a regression, so the odds, Y/(1−Y), are used instead.
Simple Logistic Regression

Taking the natural log of the odds makes the variable more suitable for a regression, so the result of a logistic regression is an equation that looks like this:

ln[Y/(1−Y)] = a + bX

You find the slope (b) and intercept (a) of the best-fitting equation in a logistic regression using the maximum-likelihood method, rather than the least-squares method you use for linear regression.
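As a hedged sketch of maximum-likelihood fitting in Python with statsmodels: the grain sizes and presence/absence values below are invented for illustration, since the slides' data table is not reproduced here.

import numpy as np
import statsmodels.api as sm

grain_size = np.array([0.20, 0.25, 0.35, 0.45, 0.55, 0.60, 0.70, 0.80])  # hypothetical X values
present = np.array([0, 0, 0, 1, 0, 1, 1, 1])                             # hypothetical Y values

X = sm.add_constant(grain_size)          # adds the intercept column for a
fit = sm.Logit(present, X).fit(disp=0)   # maximum-likelihood estimation
print(fit.params)                        # estimated [a, b]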
Simple Logistic Regression
For the spider example, the equation is:

ln[Y/(1−Y)] = −1.6476 + 5.1215(grain size)

Rearranging to solve for Y (the probability of spiders on a beach) yields:

Y = e^(−1.6476 + 5.1215(grain size)) / (1 + e^(−1.6476 + 5.1215(grain size)))

In order to predict the probability that spiders would live there, you could measure the sand grain size, plug it into the equation, and get an estimate of Y, the probability of spiders being on the beach.
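A small Python sketch of that calculation (the grain size of 0.5 is just a made-up input):

import math

def spider_probability(grain_size):
    # fitted equation from the slide: ln[Y/(1-Y)] = -1.6476 + 5.1215 * grain_size
    z = -1.6476 + 5.1215 * grain_size
    return math.exp(z) / (1 + math.exp(z))

print(spider_probability(0.5))   # estimated probability of spiders being present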
Simple Logistic Regression
Furthermore, a prediction about presence or absence can be made easily by applying a threshold (say 0.5) to the probability.

For example, if the probability is greater than 0.5, the prediction is that spiders are present on a beach with the given grain size; otherwise, that they are absent.
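For completeness, a sketch of that decision rule (again with a made-up grain size):

import math

z = -1.6476 + 5.1215 * 0.6            # hypothetical grain size of 0.6
p = math.exp(z) / (1 + math.exp(z))   # predicted probability of presence
print("present" if p > 0.5 else "absent", round(p, 3))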
