Binary Logistic Regression with SPSS



Logistic regression is used to predict a categorical (usually dichotomous) variable from a set of
predictor variables. With a categorical dependent variable, discriminant function analysis is usually
employed if all of the predictors are continuous and nicely distributed; logit analysis is usually
employed if all of the predictors are categorical; and logistic regression is often chosen if the predictor
variables are a mix of continuous and categorical variables and/or if they are not nicely distributed
(logistic regression makes no assumptions about the distributions of the predictor variables). Logistic
regression has been especially popular in medical research, where the dependent variable is
whether or not a patient has a disease.
For a logistic regression, the predicted dependent variable is a function of the probability that a
particular subject will be in one of the categories (for example, the probability that Suzie Cue has the
disease, given her set of scores on the predictor variables).
Description of the Research Used to Generate Our Data
As an example of the use of logistic regression in psychological research, consider the
research done by Wuensch and Poteat and published in the Journal of Social Behavior and
Personality, 1998, 13, 139-150. College students (N = 315) were asked to pretend that they were
serving on a university research committee hearing a complaint against animal research being
conducted by a member of the university faculty. The complaint included a description of the
research in simple but emotional language. Cats were being subjected to stereotaxic surgery in
which a cannula was implanted into their brains. Chemicals were then introduced into the cats' brains
via the cannula, and the cats were given various psychological tests. Following completion of testing, the
cats' brains were subjected to histological analysis. The complaint asked that the researcher's
authorization to conduct this research be withdrawn and the cats turned over to the animal rights
group that was filing the complaint. It was suggested that the research could just as well be done
with computer simulations.
In defense of his research, the researcher provided an explanation of how steps had been
taken to assure that no animal felt much pain at any time, an explanation that computer simulation
was not an adequate substitute for animal research, and an explanation of what the benefits of the
research were. Each participant read one of five different scenarios which described the goals and
benefits of the research. They were:
COSMETIC -- testing the toxicity of chemicals to be used in new lines of hair care products.
THEORY -- evaluating two competing theories about the function of a particular nucleus in the
brain.
MEAT -- testing a synthetic growth hormone said to have the potential of increasing meat
production.
VETERINARY -- attempting to find a cure for a brain disease that is killing both domestic cats
and endangered species of wild cats.
MEDICAL -- evaluating a potential cure for a debilitating disease that afflicts many young adult
humans.
After reading the case materials, each participant was asked to decide whether or not to
withdraw Dr. Wissen's authorization to conduct the research and, among other things, to fill out D. R.
Forsyth's Ethics Position Questionnaire (Journal of Personality and Social Psychology, 1980, 39, 175-
184), which consists of 20 Likert-type items, each with a 9-point response scale from "completely
disagree" to "completely agree". Persons who score high on the relativism dimension of this
instrument reject the notion of universal moral principles, preferring personal and situational analysis
of behavior. Persons who score high on the idealism dimension believe that ethical behavior will
always lead only to good consequences, never to bad consequences, and never to a mixture of good
and bad consequences.
Having committed the common error of projecting myself onto others, I once assumed that all
persons make ethical decisions by weighing good consequences against bad consequences -- but for
the idealist the presence of any bad consequences may make a behavior unethical, regardless of
good consequences. Research by Hal Herzog and his students at Western Carolina has shown that
animal rights activists tend to be high in idealism and low in relativism (see me for references if
interested). Are idealism and relativism (and gender and purpose of the research) related to attitudes
towards animal research in college students? Let's run the logistic regression and see.
Using a Single Dichotomous Predictor, Gender of Subject
Let us first consider a simple (bivariate) logistic regression, using subjects' decisions as the
dichotomous criterion variable and their gender as a dichotomous predictor variable. I have coded
gender with 0 = Female, 1 = Male, and decision with 0 = "Stop the Research" and 1 = "Continue the
Research".
Our regression model will be predicting the logit, that is, the natural log of the odds of having
made one or the other decision. That is,

\[ \ln(\mathrm{ODDS}) = \ln\!\left(\frac{\hat{Y}}{1-\hat{Y}}\right) = a + bX, \]

where \(\hat{Y}\) is the predicted probability of the event which is coded with 1 (continue the
research) rather than with 0 (stop the research), \(1-\hat{Y}\) is the predicted probability of the other
decision, and X is our predictor variable, gender. Some statistical programs (such as SAS) predict
the event which is coded with the smaller of the two numeric codes. By the way, if you have ever
wondered what is "natural" about the natural log, you can find an answer of sorts at
http://www.math.toronto.edu/mathnet/answers/answers_13.html.
Our model will be constructed by an iterative maximum likelihood procedure. The program
will start with arbitrary values of the regression coefficients and will construct an initial model for
predicting the observed data. It will then evaluate errors in such prediction and change the
regression coefficients so as to make the likelihood of the observed data greater under the new model.
This procedure is repeated until the model converges -- that is, until the differences between the
newest model and the previous model are trivial.
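To make that iteration concrete, here is a minimal Newton-Raphson (equivalently, iteratively reweighted least squares) sketch in Python. The function name and the convergence rule (stop when no coefficient changes by more than .001, mirroring SPSS's criterion) are my own choices, not anything SPSS exposes.

import numpy as np

def fit_logistic(X, y, tol=.001, max_iter=20):
    """Newton-Raphson maximum likelihood for binary logistic regression."""
    X = np.column_stack([np.ones(len(y)), X])   # prepend the constant
    beta = np.zeros(X.shape[1])                 # arbitrary starting values
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))     # predicted probabilities
        W = p * (1.0 - p)                       # weights
        step = np.linalg.solve(X.T @ (X * W[:, None]), X.T @ (y - p))
        beta += step
        if np.max(np.abs(step)) < tol:          # converged: changes are trivial
            break
    return beta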
Open the data file at http://core.ecu.edu/psyc/wuenschk/SPSS/Logistic.sav. Click Analyze,
Regression, Binary Logistic. Scoot the decision variable into the Dependent box and the gender
variable into the Covariates box. The dialog box should now look like this:


Click OK.
Look at the statistical output. We see that there are 315 cases used in the analysis.
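If you would like to check the SPSS results outside of SPSS, a minimal equivalent fit in Python might look like the sketch below. I am assuming the data have been exported to a file named logistic.csv with the decision and gender columns coded as described above; the file name is hypothetical.

import numpy as np
import pandas as pd
import statsmodels.api as sm

df = pd.read_csv("logistic.csv")           # hypothetical export of Logistic.sav
X = sm.add_constant(df[["gender"]])        # constant plus the single predictor
model = sm.Logit(df["decision"], X).fit()  # maximum likelihood fit
print(model.summary())                     # coefficients, standard errors, etc.
print(np.exp(model.params))                # Exp(B), the odds ratios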


The Block 0 output is for a model that includes only the intercept (which SPSS calls the
constant). Given the base rates of the two decision options (187/315 = 59% decided to stop the
research, 41% decided to allow it to continue), and no other information, the best strategy is to
predict, for every case, that the subject will decide to stop the research. Using that strategy, you
would be correct 59% of the time.


Under Variables in the Equation you see that the intercept-only model is ln(odds) = -.379. If
we exponentiate both sides of this expression we find that our predicted odds [Exp(B)] = .684. That
is, the predicted odds of deciding to continue the research are .684. Since 128 of our subjects decided
to continue the research and 187 decided to stop the research, our observed odds are 128/187 =
.684.
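As a quick sanity check on that arithmetic (a sketch, not part of the SPSS output):

import numpy as np
print(np.exp(-0.379))   # predicted odds from the intercept-only model, about .684
print(128 / 187)        # observed odds of deciding to continue, about .684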


Now look at the Block 1 output. Here SPSS has added the gender variable as a predictor.
Omnibus Tests of Model Coefficients gives us a Chi-Square of 25.653 on 1 df, significant beyond
.001. This is a test of the null hypothesis that adding the gender variable to the model has not
significantly increased our ability to predict the decisions made by our subjects.
Case Processing Summary

  Unweighted Cases(a)                        N     Percent
  Selected Cases    Included in Analysis    315     100.0
                    Missing Cases             0        .0
                    Total                   315     100.0
  Unselected Cases                            0        .0
  Total                                     315     100.0

  a. If weight is in effect, see classification table for the total number of cases.
Classification Table(a,b)

                                     Predicted
                               decision            Percentage
  Observed                     stop   continue     Correct
  Step 0  decision  stop        187       0         100.0
                    continue    128       0            .0
          Overall Percentage                          59.4

  a. Constant is included in the model.
  b. The cut value is .500

Variables in the Equation

                      B      S.E.     Wald    df   Sig.   Exp(B)
  Step 0  Constant  -.379    .115   10.919     1   .001    .684


Under Model Summary we see that the -2 Log Likelihood statistic is 399.913. This statistic
measures how poorly the model predicts the decisions -- the smaller the statistic, the better the
model. Although SPSS does not give us this statistic for the model that had only the intercept, I know
it to be 425.666. Adding the gender variable reduced the -2 Log Likelihood statistic by 425.666 -
399.913 = 25.653, the χ² statistic we just discussed in the previous paragraph. The Cox & Snell R²
can be interpreted like R² in a multiple regression, but cannot reach a maximum value of 1. The
Nagelkerke R² can reach a maximum of 1.
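For reference, the standard definitions of these two statistics (they are not printed by SPSS) show why the Cox & Snell statistic is bounded below 1:

\[ R^2_{CS} = 1 - \left(\frac{L_0}{L_M}\right)^{2/n}, \qquad R^2_{N} = \frac{R^2_{CS}}{1 - L_0^{\,2/n}}, \]

where \(L_0\) is the likelihood of the intercept-only model, \(L_M\) the likelihood of the fitted model, and n the sample size. Because \(L_0^{2/n} > 0\), the maximum possible \(R^2_{CS}\) is \(1 - L_0^{2/n} < 1\); Nagelkerke's statistic simply divides by that maximum.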


The Variables in the Equation output shows us that the regression equation is

\[ \ln(\mathrm{ODDS}) = -.847 + 1.217 \times \mathrm{Gender}. \]

We can now use this model to predict the odds that a subject of a given gender will decide to
continue the research. The odds prediction equation is \( \mathrm{ODDS} = e^{a+bX} \). If our subject is a woman
(gender = 0), then \( \mathrm{ODDS} = e^{-.847 + 1.217(0)} = e^{-.847} = 0.429 \). That is, a woman is only .429 times as likely
to decide to continue the research as she is to decide to stop the research. If our subject is a man
(gender = 1), then \( \mathrm{ODDS} = e^{-.847 + 1.217(1)} = e^{.37} = 1.448 \). That is, a man is 1.448 times as likely to
decide to continue the research as to decide to stop the research.
We can easily convert odds to probabilities. For our women,

\[ \hat{Y} = \frac{\mathrm{ODDS}}{1+\mathrm{ODDS}} = \frac{0.429}{1.429} = 0.30. \]

That is, our model predicts that 30% of women will decide to continue the research. For our men,

\[ \hat{Y} = \frac{1.448}{2.448} = 0.59. \]

That is, our model predicts that 59% of men will decide to continue the research.
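The same arithmetic as a quick Python sketch, with the coefficients copied from the output above:

import numpy as np

a, b = -0.847, 1.217                  # intercept and gender slope from SPSS
for gender in (0, 1):                 # 0 = female, 1 = male
    odds = np.exp(a + b * gender)     # odds of deciding to continue
    prob = odds / (1 + odds)          # convert the odds to a probability
    print(gender, round(odds, 3), round(prob, 2))   # .429/.30 and 1.448/.59
print(np.exp(b))                      # Exp(B) for gender, about 3.376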
The Variables in the Equation output also gives us Exp(B), better known as the
odds ratio predicted by the model. This odds ratio can be computed by raising the base of the
natural log to the bth power, where b is the slope from our logistic regression equation. For
our model, \( e^{1.217} = 3.376 \).
Omnibus Tests of Model Coefficients

                   Chi-square   df   Sig.
  Step 1   Step        25.653    1   .000
           Block       25.653    1   .000
           Model       25.653    1   .000

Model Summary

          -2 Log         Cox & Snell   Nagelkerke
  Step    likelihood     R Square      R Square
  1       399.913(a)     .078          .106

  a. Estimation terminated at iteration number 3 because parameter estimates changed by less than .001.

Variables in the Equation

                        B      S.E.     Wald    df   Sig.   Exp(B)
  Step 1(a)  gender   1.217    .245   24.757     1   .000   3.376
             Constant  -.847   .154   30.152     1   .000    .429

  a. Variable(s) entered on step 1: gender.
That tells us that the model predicts that the odds of deciding to
continue the research are 3.376 times higher for men than they are for women. For the men, the
odds are 1.448, and for the women they are 0.429. The odds ratio is
1.448 / 0.429 = 3.376.
The results of our logistic regression can be used to classify subjects with respect to what
decision we think they will make. As noted earlier, our model leads to the prediction that the
probability of deciding to continue the research is 30% for women and 59% for men. Before we can
use this information to classify subjects, we need to have a decision rule. Our decision rule will take
the following form: If the probability of the event is greater than or equal to some threshold, we shall
predict that the event will take place. By default, SPSS sets this threshold to .5. While that seems
reasonable, in many cases we may want to set it higher or lower than .5. More on this later. Using
the default threshold, SPSS will classify a subject into the "Continue the Research" category if the
estimated probability is .5 or more, which it is for every male subject. SPSS will classify a subject into
the "Stop the Research" category if the estimated probability is less than .5, which it is for every
female subject.
The Classification Table shows us that this rule allows us to correctly classify 68 / 128 = 53%
of the subjects where the predicted event (deciding to continue the research) was observed. This is
known as the sensitivity of prediction, the P(correct | event did occur), that is, the percentage of
occurrences correctly predicted. We also see that this rule allows us to correctly classify 140 / 187 =
75% of the subjects where the predicted event was not observed. This is known as the specificity of
prediction, the P(correct | event did not occur), that is, the percentage of nonoccurrences correctly
predicted. Overall our predictions were correct 208 out of 315 times, for an overall success rate of
66%. Recall that it was only 59% for the model with intercept only.

We could focus on error rates in classification. A false positive would be predicting that the
event would occur when, in fact, it did not. Our decision rule predicted a decision to continue the
research 115 times. That prediction was wrong 47 times, for a false positive rate of 47 / 115 = 41%.
A false negative would be predicting that the event would not occur when, in fact, it did occur. Our
decision rule predicted a decision not to continue the research 200 times. That prediction was wrong
60 times, for a false negative rate of 60 / 200 = 30%.
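These four rates follow directly from the 2 x 2 classification table; here is a sketch with the counts copied from above.

# rows: observed decision; columns: predicted decision
stop_stop, stop_cont = 140, 47     # observed stop, predicted stop / continue
cont_stop, cont_cont = 60, 68      # observed continue, predicted stop / continue

sensitivity = cont_cont / (cont_stop + cont_cont)   # 68/128, about 53%
specificity = stop_stop / (stop_stop + stop_cont)   # 140/187, about 75%
false_pos = stop_cont / (stop_cont + cont_cont)     # 47/115, about 41%
false_neg = cont_stop / (cont_stop + stop_stop)     # 60/200, about 30%
print(sensitivity, specificity, false_pos, false_neg)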
It has probably occurred to you that you could have used a simple Pearson Chi-Square
Contingency Table Analysis to answer the question of whether or not there is a significant
relationship between gender and decision about the animal research. Let us take a quick look at
such an analysis. In SPSS click Analyze, Descriptive Statistics, Crosstabs. Scoot gender into the
rows box and decision into the columns box. The dialog box should look like this:
Classification Table(a)

                                     Predicted
                               decision            Percentage
  Observed                     stop   continue     Correct
  Step 1  decision  stop        140      47          74.9
                    continue     60      68          53.1
          Overall Percentage                         66.0

  a. The cut value is .500

Now click the Statistics box. Check Chi-Square and then click Continue.

Now click the Cells box. Check Observed Counts and Row Percentages and then click
Continue.

Back on the initial page, click OK.
In the Crosstabulation output you will see that 59% of the men and 30% of the women
decided to continue the research, just as predicted by our logistic regression.


You will also notice that the Likelihood Ratio Chi-Square is 25.653 on 1 df, the same test of
significance we got from our logistic regression, and the Pearson Chi-Square is almost the same
(25.685). If you are thinking, "Hey, this logistic regression is nearly equivalent to a simple Pearson
Chi-Square," you are correct, in this simple case. Remember, however, that we can add additional
predictor variables, and those additional predictors can be either categorical or continuous -- you
can't do that with a simple Pearson Chi-Square.
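For comparison, both chi-square statistics can be obtained in Python from the 2 x 2 table; passing lambda_="log-likelihood" to scipy's chi2_contingency yields the likelihood-ratio version.

import numpy as np
from scipy.stats import chi2_contingency

table = np.array([[140, 60],   # Female: stop, continue
                  [47, 68]])   # Male: stop, continue
pearson, p, dof, _ = chi2_contingency(table, correction=False)
print(pearson, p)              # Pearson chi-square, about 25.685
lr, p_lr, _, _ = chi2_contingency(table, correction=False,
                                  lambda_="log-likelihood")
print(lr, p_lr)                # likelihood-ratio chi-square, about 25.653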


Multiple Predictors, Both Categorical and Continuous
Now let us conduct an analysis that will better tap the strengths of logistic regression. Click
Analyze, Regression, Binary Logistic. Scoot the decision variable into the Dependent box and
gender, idealism, and relatvsm into the Covariates box.

gender * decision Crosstabulation

                                     decision
                               stop      continue    Total
  gender  Female  Count         140         60        200
                  % within     70.0%      30.0%     100.0%
          Male    Count          47         68        115
                  % within     40.9%      59.1%     100.0%
  Total           Count         187        128        315
                  % within     59.4%      40.6%     100.0%

Chi-Square Tests

                          Value       df   Asymp. Sig. (2-sided)
  Pearson Chi-Square     25.685(b)     1        .000
  Likelihood Ratio       25.653        1        .000
  N of Valid Cases          315

  a. Computed only for a 2x2 table
  b. 0 cells (.0%) have expected count less than 5. The minimum expected count is 46.73.
Click Options and check Hosmer-Lemeshow goodness of fit and CI for exp(B) 95%.

Continue, OK. Look at the output.
In the Block 1 output, notice that the -2 Log Likelihood statistic has dropped to 346.503,
indicating that our expanded model is doing a better job at predicting decisions than was our one-
predictor model. The R² statistics have also increased.

We can test the significance of the difference between any two models, as long as one model
is nested within the other. Our one-predictor model had a -2 Log Likelihood statistic of 399.913.
Adding the ethical ideology variables (idealism and relatvsm) produced a decrease of 53.41. This
difference is a χ² on 2 df (one df for each predictor variable).
To determine the p value associated with this χ², just click Transform, Compute. Enter the
letter p in the Target Variable box. In the Numeric Expression box, type 1-CDF.CHISQ(53.41,2).
The dialog box should look like this:

Click OK and then go to the SPSS Data Editor, Data View. You will find a new column, p,
with the value of .00 in every cell. If you go to the Variable View and set the number of decimal
points to 5 for the p variable, you will see that the value of p is .00000. We conclude that adding the
ethical ideology variables significantly improved the model, χ²(2, N = 315) = 53.41, p < .001.
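Outside SPSS, the same p value is one line of Python; chi2.sf is the complement of the chi-square CDF, that is, the equivalent of 1-CDF.CHISQ:

from scipy.stats import chi2
print(chi2.sf(53.41, 2))   # about 2.5e-12, hence p < .001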
Note that our overall success rate in classification has improved from 66% to 71%.
Model Summary

          -2 Log         Cox & Snell   Nagelkerke
  Step    likelihood     R Square      R Square
  1       346.503(a)     .222          .300

  a. Estimation terminated at iteration number 4 because parameter estimates changed by less than .001.

The Hosmer-Lemeshow test evaluates the null hypothesis that predictions made by the model fit
perfectly with observed group memberships. Cases are arranged in order by their predicted
probability on the criterion variable and then divided into ten (usually) groups of equal or near equal
size, ordered with respect to the predicted probability of the target event. For each of these groups
we then obtain the predicted group memberships and the actual group memberships. This results in
a 2 x 10 contingency table, as shown below. A chi-square statistic is computed comparing the
observed frequencies with those expected under the model. A nonsignificant chi-square indicates
that the data fit the model well.
This procedure suffers from several problems, one of which is that it relies on a test of
significance. With large sample sizes, the test may be significant even when the fit is good. With
small sample sizes it may not be significant even with poor fit. Even Hosmer and Lemeshow no
longer recommend its use.
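statsmodels does not provide a Hosmer-Lemeshow test, so if you want it in Python you must roll your own. A minimal sketch under the usual definition follows; y and p_hat are assumed to be arrays of 0/1 outcomes and fitted probabilities.

import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p_hat, groups=10):
    """Chi-square comparing observed and expected events across risk groups."""
    d = pd.DataFrame({"y": y, "p": p_hat})
    d["g"] = pd.qcut(d["p"], groups, labels=False, duplicates="drop")
    events = d.groupby("g")["y"].sum()            # observed "continue"
    n = d.groupby("g")["y"].count()
    expected = d.groupby("g")["p"].sum()          # expected "continue"
    stat = (((events - expected) ** 2 / expected)
            + ((expected - events) ** 2 / (n - expected))).sum()
    k = n.shape[0]
    return stat, chi2.sf(stat, k - 2)             # df = number of groups - 2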




Box-Tidwell Test. Although logistic regression is often thought of as having no assumptions,
we do assume that the relationship between each continuous predictor and the logit (log odds) is
linear. This assumption can be tested by including in the model interactions between the continuous
predictors and their logs. If such an interaction is significant, then the assumption has been violated.
I should caution you that sample size is a factor here too, so you should not be very concerned with a
just-significant interaction when sample sizes are large.
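A sketch of the Box-Tidwell check in Python, assuming df holds the decision, gender, idealism, and relatvsm columns and that all idealism and relatvsm scores are positive (otherwise add a constant first, as noted below):

import numpy as np
import statsmodels.api as sm

for v in ("idealism", "relatvsm"):
    df[v + "_x_ln"] = df[v] * np.log(df[v])   # predictor times its natural log

cols = ["gender", "idealism", "relatvsm", "idealism_x_ln", "relatvsm_x_ln"]
bt = sm.Logit(df["decision"], sm.add_constant(df[cols])).fit(disp=0)
print(bt.summary())   # nonsignificant x*ln(x) terms support linearity of the logit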

Classification Table(a)

                                     Predicted
                               decision            Percentage
  Observed                     stop   continue     Correct
  Step 1  decision  stop        151      36          80.7
                    continue     55      73          57.0
          Overall Percentage                         71.1

  a. The cut value is .500
Hosmer and Lemeshow Test

          Chi-square   df   Sig.
  Step 1     8.810      8   .359

Contingency Table for Hosmer and Lemeshow Test

            decision = stop         decision = continue
  Group   Observed   Expected     Observed   Expected     Total
   1         29       29.331          3        2.669        32
   2         30       27.673          2        4.327        32
   3         28       25.669          4        6.331        32
   4         20       23.265         12        8.735        32
   5         22       20.693         10       11.307        32
   6         15       18.058         17       13.942        32
   7         15       15.830         17       16.170        32
   8         10       12.920         22       19.080        32
   9         12        9.319         20       22.681        32
  10          6        4.241         21       22.759        27
Below I show how to create the natural log of a predictor. If the predictor has values of 0 or
less, first add to each score a constant such that no value will be zero or less. Also shown below is
how to enter the interaction terms. In the pane on the left, select both of the predictors to be included
in the interaction and then click the >a*b> button.





Variables in the Equation

                                 B       S.E.     Wald    df   Sig.   Exp(B)
  Step 1(a)
    gender                     1.147    .269    18.129    1    .000    3.148
    idealism                   1.130   1.921      .346    1    .556    3.097
    relatvsm                   1.656   2.637      .394    1    .530    5.240
    idealism by idealism_LN    -.652    .690      .893    1    .345     .521
    relatvsm by relatvsm_LN    -.479    .949      .254    1    .614     .620
    Constant                  -5.015   5.877      .728    1    .393     .007

  a. Variable(s) entered on step 1: gender, idealism, relatvsm, idealism * idealism_LN, relatvsm * relatvsm_LN.
Bravo, neither of the interaction terms is significant. If one were significant, I would try
adding to the model powers of the predictor (that is, going polynomial). For an example, see my
document Independent Samples T Tests versus Binary Logistic Regression.

Using a K > 2 Categorical Predictor
We can use a categorical predictor that has more than two levels. For our data, the stated
purpose of the research is such a predictor. While SPSS can dummy code such a predictor for you, I
prefer to set up my own dummy variables. You will need K-1 dummy variables to represent K
groups. Since we have five levels of purpose of the research, we shall need 4 dummy variables.
Each of the subjects will have a score of either 0 or 1 on each of the dummy variables. For each
dummy variable a score of 0 will indicate that the subject does not belong to the group represented by
that dummy variable and a score of 1 will indicate that the subject does belong to the group
represented by that dummy variable. One of the groups will not be represented by a dummy variable.
If it is reasonable to consider one of your groups as a reference group to which each other group
should be compared, make that group the one which is not represented by a dummy variable.
I decided that I wanted to compare each of the cosmetic, theory, meat, and veterinary groups
with the medical group, so I set up a dummy variable for each of the groups except the medical
group. Take a look at our data in the data editor. Notice that the first subject has a score of 1 for the
cosmetic dummy variable and 0 for the other three dummy variables. That subject was told that the
purpose of the research was to test the safety of a new ingredient in hair care products. Now scoot to
the bottom of the data file. The last subject has a score of 0 for each of the four dummy variables.
That subject was told that the purpose of the research was to evaluate a treatment for a debilitating
disease that afflicts humans of college age.
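If you were building the dummy variables in Python rather than in the data editor, a sketch might look like this (assuming a column named purpose holding the five scenario labels; the column name is hypothetical):

import pandas as pd

dummies = pd.get_dummies(df["purpose"], dtype=int)   # one 0/1 column per scenario
for g in ("cosmetic", "theory", "meat", "veterin"):
    df[g] = dummies[g]
# medical is the reference group: it is coded 0 on all four dummy variables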
Click Analyze, Regression, Binary Logistic and add to the list of covariates the four dummy
variables. You should now have the decision variable in the Dependent box and all of the other
variables (but not the p value column) in the Covariates box. Click OK.
The Block 0 Variables not in the Equation output shows how much the -2LL would drop if a single
predictor were added to the model (which already has the intercept).


Look at the output, Block 1. Under Omnibus Tests of Model Coefficients we see that our
latest model is significantly better than a model with only the intercept.
Variables not in the Equation

                         Score    df   Sig.
  Step 0   gender       25.685     1   .000
           idealism     47.679     1   .000
           relatvsm      7.239     1   .007
           cosmetic       .003     1   .955
           theory        2.933     1   .087
           meat           .556     1   .456
           veterin        .013     1   .909
  Overall Statistics    77.665     7   .000

Under Model Summary we see that our R² statistics have increased again and the -2 Log
Likelihood statistic has dropped from 346.503 to 338.060. Is this drop statistically significant? The
χ² is the difference between the two -2 log likelihood values, 8.443, on 4 df (one df for each dummy
variable). Using 1-CDF.CHISQ(8.443,4), we obtain an upper-tailed p of .0766, short of the usual
standard of statistical significance. I shall, however, retain these dummy variables, since I have an a
priori interest in the comparison made by each dummy variable.

In the Classification Table, we see a small increase in our overall success rate, from 71% to
72%.


I would like you to compute the values for Sensitivity, Specificity, False Positive Rate, and
False Negative Rate for this model, using the default .5 cutoff.
Sensitivity = percentage of occurrences correctly predicted
Specificity = percentage of nonoccurrences correctly predicted
False Positive Rate = percentage of predicted occurrences which are incorrect
False Negative Rate = percentage of predicted nonoccurrences which are incorrect
Remember that the predicted event was a decision to continue the research.
Under Variables in the Equation we are given regression coefficients and odds ratios.
Omnibus Tests of Model Coefficients

                   Chi-square   df   Sig.
  Step 1   Step        87.506    7   .000
           Block       87.506    7   .000
           Model       87.506    7   .000

Model Summary

          -2 Log         Cox & Snell   Nagelkerke
  Step    likelihood     R Square      R Square
  1       338.060(a)     .243          .327

  a. Estimation terminated at iteration number 5 because parameter estimates changed by less than .001.
Classification Table(a)

                                     Predicted
                               decision            Percentage
  Observed                     stop   continue     Correct
  Step 1  decision  stop        152      35          81.3
                    continue     54      74          57.8
          Overall Percentage                         71.7

  a. The cut value is .500

We are also given a statistic I have ignored so far, the Wald Chi-Square statistic, which tests
the unique contribution of each predictor in the context of the other predictors -- that is, holding
constant the other predictors -- that is, eliminating any overlap between predictors. Notice that each
predictor meets the conventional .05 standard for statistical significance, except the dummy
variables for cosmetic research and veterinary research. I should note that the Wald χ² has been
criticized for being too conservative, that is, lacking adequate power. An alternative would be to test
the significance of each predictor by eliminating it from the full model and testing the significance of
the increase in the -2 log likelihood statistic for the reduced model. That would, of course, require
that you construct p+1 models, where p is the number of predictor variables.
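A sketch of that drop-one-predictor strategy in Python, again assuming the data frame df from earlier:

import statsmodels.api as sm
from scipy.stats import chi2

predictors = ["gender", "idealism", "relatvsm",
              "cosmetic", "theory", "meat", "veterin"]
full = sm.Logit(df["decision"], sm.add_constant(df[predictors])).fit(disp=0)
for drop in predictors:
    kept = [v for v in predictors if v != drop]
    reduced = sm.Logit(df["decision"], sm.add_constant(df[kept])).fit(disp=0)
    lr = 2 * (full.llf - reduced.llf)           # increase in -2 log likelihood
    print(drop, round(lr, 3), chi2.sf(lr, 1))   # 1-df likelihood-ratio test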
Let us now interpret the odds ratios.
The .496 odds ratio for idealism indicates that the odds of approval are more than cut in half for
each one-point increase in the respondent's idealism score. Inverting this odds ratio for easier
interpretation, for each one-point increase on the idealism scale there was a doubling of the odds
that the respondent would not approve the research.
Relativism's effect is smaller, and in the opposite direction, with a one-point increase on the nine-
point relativism scale being associated with the odds of approving the research increasing by a
multiplicative factor of 1.39.
The odds ratios of the scenario dummy variables compare each scenario except medical to the
medical scenario. For the theory dummy variable, the .314 odds ratio means that the odds of
approval of theory-testing research are only .314 times those of medical research.
Inverted odds ratios for the dummy variables coding the effect of the scenario variable indicated
that the odds of approval for the medical scenario were 2.38 times higher than for the meat
scenario and 3.22 times higher than for the theory scenario.
Let us now revisit the issue of the decision rule used to determine into which group to classify
a subject given that subject's estimated probability of group membership. While the most obvious
decision rule would be to classify the subject into the target group if p > .5 and into the other group if p
< .5, you may well want to choose a different decision rule given the relative seriousness of making
one sort of error (for example, declaring a patient to have breast cancer when she does not) or the
other sort of error (declaring the patient not to have breast cancer when she does).
Repeat our analysis with classification done with a different decision rule. Click Analyze,
Regression, Binary Logistic, Options. In the resulting dialog window, change the Classification
Cutoff from .5 to .4. The window should look like this:
Variables in the Equation

                                                           95.0% C.I. for EXP(B)
                   B       Wald     df   Sig.   Exp(B)     Lower     Upper
  Step 1(a)
    gender        1.255   20.586     1   .000    3.508     2.040     6.033
    idealism      -.701   37.891     1   .000     .496      .397      .620
    relatvsm       .326    6.634     1   .010    1.386     1.081     1.777
    cosmetic      -.709    2.850     1   .091     .492      .216     1.121
    theory       -1.160    7.346     1   .007     .314      .136      .725
    meat          -.866    4.164     1   .041     .421      .183      .966
    veterin       -.542    1.751     1   .186     .581      .260     1.298
    Constant      2.279    4.867     1   .027    9.766

  a. Variable(s) entered on step 1: gender, idealism, relatvsm, cosmetic, theory, meat, veterin.

Click Continue, OK.
Now SPSS will classify a subject into the "Continue the Research" group if the estimated
probability of membership in that group is .4 or higher, and into the "Stop the Research" group
otherwise. Take a look at the classification output and see how the change in cutoff has changed the
classification results. Fill in the table below to compare the two models with respect to classification
statistics.
  Value                    Cutoff = .5    Cutoff = .4
  Sensitivity
  Specificity
  False Positive Rate
  False Negative Rate
  Overall % Correct
SAS makes it much easier to see the effects of the decision rule on sensitivity etc. Using the
ctable option, one gets output like this:
------------------------------------------------------------------------------------
The LOGISTIC Procedure

Classification Table

Correct Incorrect Percentages
Prob Non- Non- Sensi- Speci- False False
Level Event Event Event Event Correct tivity ficity POS NEG

0.160 123 56 131 5 56.8 96.1 29.9 51.6 8.2
0.180 122 65 122 6 59.4 95.3 34.8 50.0 8.5
0.200 120 72 115 8 61.0 93.8 38.5 48.9 10.0
0.220 116 84 103 12 63.5 90.6 44.9 47.0 12.5
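A comparable sweep is easy to produce in Python from any fitted model; a sketch, assuming the statsmodels result model and the data frame df from the earlier fits:

import numpy as np

p_hat = model.predict()                    # fitted probabilities
y = df["decision"].to_numpy()
for cut in np.arange(0.16, 0.24, 0.02):    # probability levels, as in the SAS table
    pred = (p_hat >= cut).astype(int)
    sens = 100 * np.mean(pred[y == 1])     # percent of actual events predicted
    spec = 100 * np.mean(1 - pred[y == 0]) # percent of non-events predicted
    print(round(cut, 2), round(sens, 1), round(spec, 1))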
