(BA ZG524/MBA ZG538/PDBA ZG538) Advanced Statistical Methods Lecture No: 11 (13-04-24)

BITS Pilani
Pilani Campus
Logistic regression is a statistical method for predicting a binary outcome, such as
yes or no, from observed values of one or more independent variables.
For example, logistic regression could be used to predict whether a political
candidate will win or lose an election, or whether a high school student will
pass or fail an exam.
Because the outcome is binary, the model supports straightforward decisions
between two alternatives.

Logistic Regression Equation
If the dependent variable (DV) y is coded as 0 or 1, the value of E(y) in the
logistic regression equation gives the probability that y = 1 for a given set of
values of the independent variables.
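
For reference, the standard form of the logistic regression equation with p independent variables can be written as below. The symbols \beta_0, ..., \beta_p and x_1, ..., x_p are the usual textbook notation and are an assumption here, since the slide's own equation did not survive.

```latex
E(y) = P(y = 1 \mid x_1, \dots, x_p)
     = \frac{e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}
            {1 + e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}}
```

Because the right-hand side always lies between 0 and 1, E(y) can be read as a probability.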

To get a better understanding of the logistic regression equation, consider the
following example.

Example :
Let us consider an application of logistic regression involving a direct mail
promotion used by Simmons Stores.
Simmons owns and operates a national chain of women’s apparel stores.
5000 copies of an expensive four-color sales catalog have been printed, and each
catalog includes a coupon that provides a $50 discount on purchases of
$200 or more.
The catalogs are expensive, and Simmons would like to send them only to
those customers who have the highest probability of using the coupon.
Source : David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D.
Camm and James J. Cochran, Statistics for Business and Economics,
Twelfth edition, Cengage Learning, 2014, pp. 771-779.

Variables
• Management thinks that annual spending at Simmons Stores
and whether a customer has a Simmons credit card are two
variables that might be helpful in predicting whether a
customer who receives the catalog will use the coupon.
• Simmons conducted a pilot study using a random sample of
50 customers who have a Simmons credit card and 50
customers who do not have the card.
• Simmons sent the catalog to each of the 100 customers.
• At the end of the study, Simmons noted whether or not each
customer used the coupon.

Dataset
The data is available in simmons.csv(webfile)
Source : Simmons data file

Explanation of Variables
The amount each customer spent last year at Simmons is shown
in thousands of dollars, and the credit card information is
coded as 1 if the customer has a Simmons credit card and 0 if
not.
In the Coupon column, a 1 is recorded if the sampled customer
used the coupon and 0 if not.

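Estimating the Logistic Regression Equation
Output and Interpretation:

# Read the Simmons data; pick simmons.csv in the file dialog
buy <- read.csv(file.choose(), header = TRUE)
buy
# Fit coupon use (0/1) on annual spending and credit card status;
# the column names follow the Call line of the output
fit <- glm(Coupon ~ Spending + Card, data = buy, family = "binomial")
summary(fit)
Observe the output and interpret.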

Call:
glm(formula = Coupon ~ Spending + Card, family = "binomial",
    data = buy)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.6839  -1.0140  -0.6503   1.1216   1.8794

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -2.1464     0.5772  -3.718 0.000201 ***
Spending      0.3416     0.1287   2.655 0.007928 **
Card          1.0987     0.4447   2.471 0.013483 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 134.60  on 99  degrees of freedom
Residual deviance: 120.97  on 97  degrees of freedom
AIC: 126.97

Number of Fisher Scoring iterations: 4
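
Reading the coefficient estimates off this output, the estimated logistic regression equation can be written as below. Here x_1 stands for Spending (in thousands of dollars) and x_2 for Card, following the X1/X2 naming used later in the lecture; that mapping is an assumption about notation.

```latex
\hat{y} = \frac{e^{-2.1464 + 0.3416\,x_1 + 1.0987\,x_2}}
               {1 + e^{-2.1464 + 0.3416\,x_1 + 1.0987\,x_2}}
```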
Tests of Significance :
Using the summary(fit) output shown above:
• Discuss the overall significance of the model.
• Discuss the individual significance of Spending (z = 2.655, p = 0.007928)
and Card (z = 2.471, p = 0.013483).
• Conduct a test of significance using the G statistic (chi-square test
statistic), based on the null deviance (134.60 on 99 degrees of freedom) and
the residual deviance (120.97 on 97 degrees of freedom). Use a 0.05 level of
significance.
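
As a rough sketch of the overall test, the G statistic can be taken as the drop from the null deviance to the residual deviance and compared with a chi-square distribution whose degrees of freedom equal the number of independent variables. The variable names below are illustrative, and the deviances are copied from the output rather than recomputed from the data file.

```r
# Deviances copied from the summary(fit) output in the lecture
null_deviance     <- 134.60   # intercept-only model, 99 df
residual_deviance <- 120.97   # model with Spending and Card, 97 df

# G statistic: reduction in deviance when the independent variables are added
G  <- null_deviance - residual_deviance
df <- 99L - 97L               # number of independent variables

alpha          <- 0.05
critical_value <- qchisq(1 - alpha, df = df)
p_value        <- pchisq(G, df = df, lower.tail = FALSE)

# H0: all slope coefficients are zero
reject_null <- G > critical_value

cat(sprintf("G = %.2f on %d df, critical value = %.3f, p-value = %.4f\n",
            G, df, critical_value, p_value))
cat("Reject H0 at the 0.05 level:", reject_null, "\n")
```

If the fitted object is available, the same quantities should be obtainable from fit$null.deviance and fit$deviance instead of typing them in.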
We have described how to develop the estimated logistic regression
equation and how to test it for significance.
Let us now use it to make decision recommendations.
How can Simmons use this information to better target
customers for the new promotion?
Suppose Simmons wants to send the promotional catalog only to
customers who have a 0.40 or higher probability of using the
coupon.
Based on the estimated probabilities, the promotion strategy is:
Customers who have a Simmons credit card: send the catalog to
every customer who spent $2000 or more last year.
Customers who do not have a Simmons credit card: send the
catalog to every customer who spent $6000 or more last year.
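
The two spending cut-offs can be checked with a small sketch that evaluates the estimated equation on a grid of spending levels. The coefficients are copied from the output; the function name p_use, the grid of $1000 to $7000, and the other variable names are illustrative assumptions.

```r
# Coefficient estimates copied from the summary(fit) output
b0 <- -2.1464   # intercept
b1 <-  0.3416   # Spending (thousands of dollars)
b2 <-  1.0987   # Card (1 = has Simmons credit card, 0 = does not)

# Estimated probability of using the coupon; plogis(z) = exp(z) / (1 + exp(z))
p_use <- function(spending, card) plogis(b0 + b1 * spending + b2 * card)

spending <- 1:7  # hypothetical grid of annual spending levels, in $1000s
prob_table <- data.frame(
  Spending = spending,
  Card     = round(p_use(spending, 1), 4),
  NoCard   = round(p_use(spending, 0), 4)
)
print(prob_table)

# Smallest spending level on the grid with probability of at least 0.40
cutoff <- 0.40
min_spend_card    <- min(spending[p_use(spending, 1) >= cutoff])
min_spend_no_card <- min(spending[p_use(spending, 0) >= cutoff])
cat("Card holders: send from $", min_spend_card * 1000, "\n", sep = "")
cat("Non-card holders: send from $", min_spend_no_card * 1000, "\n", sep = "")
```

On this grid the cut-offs come out at $2000 for card holders and $6000 for customers without the card, matching the strategy stated above.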
With logistic regression, it is difficult to interpret directly the
relationship between the independent variables and the
probability that y = 1.
However, statisticians have shown that the relationship can
be interpreted indirectly using a concept called the
odds ratio.

The odds in favor of an event occurring are defined as the
probability that the event will occur divided by the probability
that the event will not occur.
Note:
In logistic regression the event of interest is always y = 1.
Given a particular set of values for the independent variables,
the odds in favor of y = 1 can be computed as
odds = P(y = 1 | x1, ..., xp) / P(y = 0 | x1, ..., xp).
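
Under the logistic regression equation this ratio simplifies considerably. The following is a short worked form in standard notation (the \beta and x symbols are the usual textbook ones, assumed here rather than taken from the slide):

```latex
\text{odds}
  = \frac{P(y = 1 \mid x_1, \dots, x_p)}{1 - P(y = 1 \mid x_1, \dots, x_p)}
  = e^{\beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p}
```

The denominator 1 + e^{\beta_0 + \cdots + \beta_p x_p} appears in both P(y = 1) and P(y = 0) and cancels, which leaves only the exponential term.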

The odds ratio measures
the impact on the odds of a one-unit increase in only one of
the independent variables.
That is,
the odds ratio is the odds that y = 1 given that one of the
independent variables has been increased by one
unit (odds1) divided by the odds that y = 1 given no change
in the values of the independent variables (odds0):
odds ratio = odds1 / odds0.
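
For the logistic model, a brief derivation shows why the odds ratio for variable x_i depends only on its own coefficient. This is a sketch in standard notation, with odds_1 taken at x_i + 1 and odds_0 at x_i while every other variable is held fixed:

```latex
\text{odds ratio}
  = \frac{\text{odds}_1}{\text{odds}_0}
  = \frac{e^{\beta_0 + \cdots + \beta_i (x_i + 1) + \cdots + \beta_p x_p}}
         {e^{\beta_0 + \cdots + \beta_i x_i + \cdots + \beta_p x_p}}
  = e^{\beta_i}
```

All terms other than \beta_i cancel, so the result does not depend on the values at which the other variables are held.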

Interpretation
Suppose we want to compare the odds of using the
coupon for customers who spend $2000 annually and have a Simmons
credit card (X1 = 2, X2 = 1) to the odds of using the coupon for
customers who spend $2000 annually and do not have a
Simmons credit card (X1 = 2, X2 = 0).
We are interested in interpreting the effect of a one-unit
increase in the independent variable X2.
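
The comparison can be worked numerically as follows. This is a minimal sketch that plugs the coefficient estimates from the output into the odds formula; the function and variable names are illustrative.

```r
# Coefficient estimates copied from the summary(fit) output
b0 <- -2.1464   # intercept
b1 <-  0.3416   # X1: annual spending, in thousands of dollars
b2 <-  1.0987   # X2: Simmons credit card (1 = yes, 0 = no)

# Estimated odds in favor of using the coupon: exp of the linear predictor
odds <- function(x1, x2) exp(b0 + b1 * x1 + b2 * x2)

odds1 <- odds(2, 1)   # spends $2000, has the card
odds0 <- odds(2, 0)   # spends $2000, no card

odds_ratio <- odds1 / odds0

cat(sprintf("odds1 = %.4f, odds0 = %.4f, odds ratio = %.2f\n",
            odds1, odds0, odds_ratio))
```

The ratio equals exp(b2), so it can also be read directly off the Card coefficient.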

Conclusion:
“The estimated odds in favor of using the coupon for customers who spent
$2000 last year and have a Simmons credit card are 3 times greater than the
estimated odds in favor of using the coupon for customers who spent
$2000 last year and do not have a Simmons credit card.”

Note:
The odds ratio for each independent variable is computed while holding
all the other independent variables constant.
It does not matter what constant values are used for the other independent variables.
For instance, if we computed the odds ratio for the Simmons credit card variable (X2)
using $3000 instead of $2000 as the value of the annual spending variable (X1), we
would still obtain the same odds ratio (3.00).
“Thus, we can conclude that the estimated odds in favor of using the
coupon for customers who have a Simmons credit card are 3 times greater than the
estimated odds in favor of using the coupon for customers who do not
have a Simmons credit card.”
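
A quick numerical check of this note: the sketch below recomputes the credit card odds ratio at several spending levels. The coefficients are copied from the output, while the chosen spending levels and variable names are illustrative assumptions.

```r
# Coefficient estimates copied from the summary(fit) output
b0 <- -2.1464; b1 <- 0.3416; b2 <- 1.0987

# Estimated odds in favor of using the coupon
odds <- function(x1, x2) exp(b0 + b1 * x1 + b2 * x2)

# Hypothetical spending levels, in thousands of dollars
spending_levels <- c(1, 2, 3, 5, 7)

# Odds ratio for the card variable (X2) at each spending level (X1)
card_odds_ratio <- odds(spending_levels, 1) / odds(spending_levels, 0)

print(data.frame(Spending = spending_levels,
                 OddsRatio = round(card_odds_ratio, 2)))
```

Every row shows the same ratio, because the spending term cancels between numerator and denominator.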


BITS Pilani, Deemed to be University under Section 3 of UGC Act, 1956
• In this day and age, researchers often find
themselves with dozens or even hundreds of
different variables entering into their analyses.
• Whenever the data set becomes
unwieldy (in terms of the number of variables), the
process is further complicated by the fact that
there is often substantial redundancy among
dimensions, leading to high levels of
correlation and multicollinearity.

LDA Concept

Objectives:
PCA finds the most accurate representation of the data in a
lower-dimensional space by projecting the data in the
direction of maximum variance.
LDA finds a projection to a line such that samples from
different classes are well separated.
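
To make the contrast concrete, here is a small sketch that applies both methods to R's built-in iris data. The choice of iris is purely an illustrative assumption (the lecture does not use it), and lda() comes from the MASS package, which ships with standard R installations.

```r
library(MASS)  # provides lda()

# Illustrative data set (not from the lecture): 4 measurements, 3 species
X <- iris[, 1:4]

# PCA: ignores the class labels and finds directions of maximum variance
pca <- prcomp(X, center = TRUE, scale. = TRUE)
pc1_scores <- pca$x[, 1]            # projection on the first principal component

# LDA: uses the class labels and finds directions that separate the classes
lda_fit <- lda(Species ~ ., data = iris)
ld1_scores <- predict(lda_fit)$x[, 1]   # projection on the first discriminant

# Compare how the classes spread along each one-dimensional projection
print(tapply(pc1_scores, iris$Species, mean))
print(tapply(ld1_scores, iris$Species, mean))
```

PCA never sees the Species column, whereas LDA chooses its projection specifically to pull the class means apart relative to the within-class spread.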

Q. Refer to the Simmons Stores example introduced in this section. The DV is coded as y = 1 if the
customer used the coupon and 0 if not. Suppose that the only information available to help
predict whether the customer will use the coupon is the customer's credit card status, coded as
x = 1 if the customer has a Simmons credit card and x = 0 if not.
1. Write the logistic regression equation relating x to y.
2. What is the estimated odds ratio, and what is its interpretation?
3. Conduct a test of significance using the G statistic (chi-square test statistic). Use a 0.05 level of significance.
Call:
glm(formula = Y ~ X1, family = "binomial", data = LR)

Deviance Residuals:
    Min       1Q   Median       3Q      Max
-1.2116  -0.8106  -0.8106   1.1436   1.5956

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  -0.9445     0.3150  -2.999  0.00271 **
X1            1.0245     0.4235   2.419  0.01555 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)
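
One way to approach parts 1 and 2 from this output is sketched below, assuming X1 in the output is the credit card indicator x from the question. The coefficients are copied from the output and the variable names are illustrative. Part 3 is not attempted here because the deviances for this model are not shown.

```r
# Coefficient estimates copied from the output for Y ~ X1
b0 <- -0.9445   # intercept
b1 <-  1.0245   # X1, assumed to be the credit card indicator (1 = yes, 0 = no)

# Part 1: estimated equation, y_hat = exp(b0 + b1*x) / (1 + exp(b0 + b1*x))
p_card    <- plogis(b0 + b1 * 1)   # estimated P(coupon used | has card)
p_no_card <- plogis(b0 + b1 * 0)   # estimated P(coupon used | no card)

# Part 2: estimated odds ratio for a one-unit increase in x
odds_ratio <- exp(b1)

cat(sprintf("P(y = 1 | x = 1) = %.2f\n", p_card))
cat(sprintf("P(y = 1 | x = 0) = %.2f\n", p_no_card))
cat(sprintf("Estimated odds ratio = %.4f\n", odds_ratio))
```

Read this way, the estimated odds of using the coupon are roughly 2.79 times as large for customers with a Simmons credit card as for customers without one.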


Call:
glm(formula = Y ~ ., family = "binomial", data = LR)

Deviance Residuals:
-2.4174  -0.7444  -0.5674   0.8416   1.9893

Coefficients:
            Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.63348    0.79851  -3.298 0.000974 ***
X1           0.22018    0.09001   2.446 0.014441 *
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

Odds ratio Interpretation:
“Estimated odds for signing up for payroll direct deposit for customers
that have an average monthly balance of $600 is 1.2463 times
greater than estimated odds for signing up for payroll direct deposit
for customers that have an average monthly balance of $500.”

##########
Factor analysis is a way to take a mass of data and
shrink it to a smaller data set that is more manageable
and more understandable. It is a way to find hidden
patterns.
It is also used to group similar items in the set into a
smaller set of variables and label them.
It can be a very useful tool for complex sets of data
involving psychological studies, socioeconomic status and
other involved concepts.

##########
Dataset : L1

Interpretation :
1.
2.

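Datafile : L
# Read the data; pick the data file in the file dialog
L <- read.csv(file.choose(), header = TRUE)
# PCA on centered, unit-variance variables (prcomp's argument is named scale.)
datpca <- prcomp(L, center = TRUE, scale. = TRUE)
summary(datpca)
R output :
Importance of components:
                          PC1     PC2
Standard deviation     1.4097 0.11304
Proportion of Variance 0.99   0.01
Cumulative Proportion  0.99   1.00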
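
The proportions in this output follow from the standard deviations: each component's variance is its squared standard deviation, divided by the total. A minimal sketch using the two reported values (variable names are illustrative):

```r
# Standard deviations of the principal components, copied from the output
sdev <- c(PC1 = 1.4097, PC2 = 0.11304)

# Variance explained by each component is the squared standard deviation
eigenvalues <- sdev^2

# With two standardized variables the variances sum to (about) 2
total_variance <- sum(eigenvalues)

prop_var <- eigenvalues / total_variance
cum_prop <- cumsum(prop_var)

print(round(rbind(Variance = eigenvalues,
                  Proportion = prop_var,
                  Cumulative = cum_prop), 4))
```

PC1 alone carries about 99% of the total variance, so a single component summarizes the two highly correlated variables almost completely.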
