Logistic Regression (Peng Et Al)


An Introduction to Logistic Regression Analysis and Reporting


CHAO-YING JOANNE PENG
KUK LIDA LEE
GARY M. INGERSOLL
Indiana University-Bloomington

ABSTRACT The purpose of this article is to provide researchers, editors, and readers with a set of guidelines for what to expect in an article using logistic regression techniques. Tables, figures, and charts that should be included to comprehensively assess the results and assumptions to be verified are discussed. This article demonstrates the preferred pattern for the application of logistic methods with an illustration of logistic regression applied to a data set in testing a research hypothesis. Recommendations are also offered for appropriate reporting formats of logistic regression results and the minimum observation-to-predictor ratio. The authors evaluated the use and interpretation of logistic regression presented in 8 articles published in The Journal of Educational Research between 1990 and 2000. They found that all 8 studies met or exceeded recommended criteria.

Key words: binary data analysis, categorical variables, dichotomous outcome, logistic modeling, logistic regression

Address correspondence to Chao-Ying Joanne Peng, Department of Counseling and Educational Psychology, School of Education, Room 4050, 201 N. Rose Ave., Indiana University, Bloomington, IN 47405–1006. (E-mail: [email protected])

Many educational research problems call for the analysis and prediction of a dichotomous outcome: whether a student will succeed in college, whether a child should be classified as learning disabled (LD), whether a teenager is prone to engage in risky behaviors, and so on. Traditionally, these research questions were addressed by either ordinary least squares (OLS) regression or linear discriminant function analysis. Both techniques were subsequently found to be less than ideal for handling dichotomous outcomes due to their strict statistical assumptions, i.e., linearity, normality, and continuity for OLS regression and multivariate normality with equal variances and covariances for discriminant analysis (Cabrera, 1994; Cleary & Angel, 1984; Cox & Snell, 1989; Efron, 1975; Lei & Koehly, 2000; Press & Wilson, 1978; Tabachnick & Fidell, 2001, p. 521). Logistic regression was proposed as an alternative in the late 1960s and early 1970s (Cabrera, 1994), and it became routinely available in statistical packages in the early 1980s.

Since that time, the use of logistic regression has increased in the social sciences (e.g., Chuang, 1997; Janik & Kravitz, 1994; Tolman & Weisz, 1995) and in educational research, especially in higher education (Austin, Yaffee, & Hinkle, 1992; Cabrera, 1994; Peng & So, 2002a; Peng, So, Stage, & St. John, 2002). With the wide availability of sophisticated statistical software for high-speed computers, the use of logistic regression is increasing. This expanded use demands that researchers, editors, and readers be attuned to what to expect in an article that uses logistic regression techniques. What tables, figures, or charts should be included to comprehensively assess the results? What assumptions should be verified? In this article, we address these questions with an illustration of logistic regression applied to a data set in testing a research hypothesis. Recommendations are also offered for appropriate reporting formats of logistic regression results and the minimum observation-to-predictor ratio. The remainder of this article is divided into five sections: (1) Logistic Regression Models, (2) Illustration of Logistic Regression Analysis and Reporting, (3) Guidelines and Recommendations, (4) Evaluations of Eight Articles Using Logistic Regression, and (5) Summary.

Logistic Regression Models

The central mathematical concept that underlies logistic regression is the logit, that is, the natural logarithm of an odds ratio. The simplest example of a logit derives from a 2 × 2 contingency table. Consider an instance in which the distribution of a dichotomous outcome variable (a child from an inner city school who is recommended for remedial reading classes) is paired with a dichotomous predictor variable (gender). Example data are included in Table 1. A test of independence using chi-square could be applied. The results yield χ²(1) = 3.43. Alternatively, one might prefer to assess
4 The Journal of Educational Research

a boy's odds of being recommended for remedial reading instruction relative to a girl's odds. The result is an odds ratio of 2.33, which suggests that boys are 2.33 times more likely, than not, to be recommended for remedial reading classes compared with girls. The odds ratio is derived from two odds (73/23 for boys and 15/11 for girls); its natural logarithm [i.e., ln(2.33)] is a logit, which equals 0.85. The value of 0.85 would be the regression coefficient of the gender predictor if logistic regression were used to model the two outcomes of a remedial recommendation as it relates to gender.

Table 1. Sample Data for Gender and Recommendation for Remedial Reading Instruction

                                      Gender
Remedial reading instruction      Boys   Girls   Total
Recommended (coded as 1)            73      15      88
Not recommended (coded as 0)        23      11      34
Total                               96      26     122

Generally, logistic regression is well suited for describing and testing hypotheses about relationships between a categorical outcome variable and one or more categorical or continuous predictor variables. In the simplest case of linear regression for one continuous predictor X (a child's reading score on a standardized test) and one dichotomous outcome variable Y (the child being recommended for remedial reading classes), the plot of such data results in two parallel lines, each corresponding to a value of the dichotomous outcome (Figure 1). Because the two parallel lines are difficult to describe with an ordinary least squares regression equation due to the dichotomy of outcomes, one may instead create categories for the predictor and compute the mean of the outcome variable for the respective categories. The resultant plot of categories' means will appear linear in the middle, much like what one would expect to see on an ordinary scatter plot, but curved at the ends (Figure 1, the S-shaped curve). Such a shape, often referred to as sigmoidal or S-shaped, is difficult to describe with a linear equation for two reasons. First, the extremes do not follow a linear trend. Second, the errors are neither normally distributed nor constant across the entire range of data (Peng, Manz, & Keck, 2001). Logistic regression solves these problems by applying the logit transformation to the dependent variable. In essence, the logistic model predicts the logit of Y from X. As stated earlier, the logit is the natural logarithm (ln) of odds of Y, and odds are ratios of probabilities (π) of Y happening (i.e., a student is recommended for remedial reading instruction) to probabilities (1 – π) of Y not happening (i.e., a student is not recommended for remedial reading instruction). Although logistic regression can accommodate categorical outcomes that are polytomous, in this article we focus on dichotomous outcomes only. The illustration presented in this article can be extended easily to polytomous variables with ordered (i.e., ordinal-scaled) or unordered (i.e., nominal-scaled) outcomes.

Figure 1. Relationship of a Dichotomous Outcome Variable, Y (1 = Remedial Reading Recommended, 0 = Remedial Reading Not Recommended) With a Continuous Predictor, Reading Scores

[Figure not reproduced. Vertical axis: 0.0 to 1.0. Horizontal axis: Reading Score, 40 to 160.]

The simple logistic model has the form

$$\operatorname{logit}(Y) = \text{natural log(odds)} = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta X. \tag{1}$$

For the data in Table 1, the regression coefficient (β) is the logit (0.85) previously explained. Taking the antilog of Equation 1 on both sides, one derives an equation to predict the probability of the occurrence of the outcome of interest as follows:

$$\pi = \text{Probability}(Y = \text{outcome of interest} \mid X = x, \text{ a specific value of } X) = \frac{e^{\alpha+\beta x}}{1+e^{\alpha+\beta x}}, \tag{2}$$

where π is the probability of the outcome of interest or "event," such as a child's referral for remedial reading classes, α is the Y intercept, β is the regression coefficient, and e = 2.71828 is the base of the system of natural logarithms. X can be categorical or continuous, but Y is always categorical. According to Equation 1, the relationship between logit (Y) and X is linear. Yet, according to Equation 2, the relationship between the probability of Y and X is nonlinear. For this reason, the natural log transformation of the odds in Equation 1 is necessary to make the relationship between a categorical outcome variable and its predictor(s) linear.

The value of the coefficient β determines the direction of the relationship between X and the logit of Y. When β is greater than zero, larger (or smaller) X values are associated with larger (or smaller) logits of Y. Conversely, if β is less than zero, larger (or smaller) X values are associated with smaller (or larger) logits of Y. Within the framework of inferential statistics, the null hypothesis states that β equals zero, or there is no linear relationship in the population. Rejecting such a null hypothesis implies that a linear relationship exists between X and the logit of Y. If a predictor is binary, as in the Table 1 example, then the odds ratio is equal to e, the natural logarithm base, raised to the exponent of the slope β (e^β).
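
The arithmetic behind the Table 1 example can be retraced with a short sketch. The Python below is an illustration only and is not the authors' SAS analysis; the function and variable names are assumptions introduced here. It computes the two odds, the odds ratio, its logit, the Pearson chi-square (without continuity correction, an assumption that happens to reproduce the reported 3.43), and then checks that Equation 2 returns the observed proportion of boys recommended.

```python
import math

# Counts from Table 1 (rows: recommended / not recommended; columns: boys / girls).
boys_yes, girls_yes = 73, 15
boys_no, girls_no = 23, 11

def logit(p):
    """Natural log of the odds p / (1 - p), as in Equation 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Probability recovered from a logit, as in Equation 2."""
    return math.exp(z) / (1.0 + math.exp(z))

odds_boys = boys_yes / boys_no          # 73/23
odds_girls = girls_yes / girls_no       # 15/11
odds_ratio = odds_boys / odds_girls
beta_gender = math.log(odds_ratio)      # slope when gender is coded 1 = boy, 0 = girl
alpha = math.log(odds_girls)            # intercept: the logit for girls (X = 0)

# Pearson chi-square for a 2 x 2 table, without continuity correction.
n = boys_yes + girls_yes + boys_no + girls_no
chi_square = (n * (boys_yes * girls_no - girls_yes * boys_no) ** 2) / (
    (boys_yes + girls_yes) * (boys_no + girls_no)
    * (boys_yes + boys_no) * (girls_yes + girls_no)
)

# Equation 2 evaluated at X = 1 gives back the observed proportion for boys.
p_boys = inverse_logit(alpha + beta_gender * 1)

print(round(odds_ratio, 2), round(beta_gender, 3), round(chi_square, 2))
print(round(p_boys, 4), round(boys_yes / (boys_yes + boys_no), 4))
```

Computed from the unrounded odds, the logit is about 0.845; the 0.85 quoted in the text corresponds to ln(2.33), that is, to the odds ratio after rounding.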
September/October 2002 [Vol. 96(No. 1)] 5

Extending the logic of the simple logistic regression to multiple predictors (say X1 = reading score and X2 = gender), one can construct a complex logistic regression for Y (recommendation for remedial reading programs) as follows:

$$\operatorname{logit}(Y) = \ln\left(\frac{\pi}{1-\pi}\right) = \alpha + \beta_1 X_1 + \beta_2 X_2. \tag{3}$$

Therefore,

$$\pi = \text{Probability}(Y = \text{outcome of interest} \mid X_1 = x_1, X_2 = x_2) = \frac{e^{\alpha+\beta_1 X_1+\beta_2 X_2}}{1+e^{\alpha+\beta_1 X_1+\beta_2 X_2}}, \tag{4}$$

where π is once again the probability of the event, α is the Y intercept, βs are regression coefficients, and Xs are a set of predictors. α and βs are typically estimated by the maximum likelihood (ML) method, which is preferred over the weighted least squares approach by several authors, such as Haberman (1978) and Schlesselman (1982). The ML method is designed to maximize the likelihood of reproducing the data given the parameter estimates. Data are entered into the analysis as 0 or 1 coding for the dichotomous outcome, continuous values for continuous predictors, and dummy codings (e.g., 0 or 1) for categorical predictors.

The null hypothesis underlying the overall model states that all βs equal zero. A rejection of this null hypothesis implies that at least one β does not equal zero in the population, which means that the logistic regression equation predicts the probability of the outcome better than the mean of the dependent variable Y. The interpretation of results is rendered using the odds ratio for both categorical and continuous predictors.

Illustration of Logistic Regression Analysis and Reporting

For the sake of illustration, we constructed a hypothetical data set to which logistic regression was applied, and we interpreted its results. The hypothetical data consisted of reading scores and genders of 189 inner city school children (Appendix A). Of these children, 59 (31.22%) were recommended for remedial reading classes and 130 (68.78%) were not. A legitimate research hypothesis posed to the data was that "the likelihood that an inner city school child is recommended for remedial reading instruction is related to both his/her reading score and gender." Thus, the outcome variable, remedial, was students being recommended for remedial reading instruction (1 = yes, 0 = no), and the two predictors were students' reading score on a standardized test (X1 = the reading variable) and gender (X2 = gender). The reading scores ranged from 40 to 125 points, with a mean of 64.91 points and standard deviation of 15.29 points (Table 2). The gender predictor was coded as 1 = boy and 0 = girl. The gender distribution was nearly even with 49.21% (n = 93) boys and 50.79% (n = 96) girls.

Table 2. Description of a Hypothetical Data Set for Logistic Regression

Remedial reading   Total sample   Boys   Girls   Reading score
recommended?       (N)            (n1)   (n2)    M        SD
Yes                 59             36     23     61.07    13.28
No                 130             57     73     66.65    15.86
Summary            189             93     96     64.91    15.29

Logistic Regression Analysis

A two-predictor logistic model was fitted to the data to test the research hypothesis regarding the relationship between the likelihood that an inner city child is recommended for remedial reading instruction and his or her reading score and gender. The logistic regression analysis was carried out by the Logistic procedure in SAS version 8 (SAS Institute Inc., 1999) in the Windows 2000 environment (SAS programming codes are found in Table 3). The result showed that

Predicted logit of (REMEDIAL) = 0.5340 + (−0.0261)*READING + (0.6477)*GENDER. (5)

According to the model, the log of the odds of a child being recommended for remedial reading instruction was negatively related to reading scores (p < .05) and positively related to gender (p < .05; Table 3). In other words, the higher the reading score, the less likely it is that a child would be recommended for remedial reading classes. Given the same reading score, boys were more likely to be recommended for remedial reading classes than girls because boys were coded to be 1 and girls 0. In fact, the odds of a boy being recommended for remedial reading programs were 1.9111 (= e^0.6477; Table 3) times greater than the odds for a girl.

The differences between boys and girls are depicted in Figure 2, in which predicted probabilities of recommendations are plotted for each gender group against various reading scores. From this figure, it may be inferred that for a given score on the reading test (e.g., 60 points), the probability of a boy being recommended for remedial reading programs is higher than that of a girl. This statement is also confirmed by the positive coefficient (0.6477) associated with the gender predictor.

Evaluations of the Logistic Regression Model

How effective is the model expressed in Equation 5? How can an educational researcher assess the soundness of a logistic regression model? To answer these questions, one must attend to (a) overall model evaluation, (b) statistical tests of individual predictors, (c) goodness-of-fit statistics, and (d) validations of predicted probabilities. These evaluations are illustrated below for the model based on Equation 5, also referred to as Model 5.

Overall model evaluation. A logistic model is said to provide a better fit to the data if it demonstrates an improvement over the intercept-only model (also called the null model). An
over the intercept-only model (also called the null model). An

Table 3. Logistic Regression Analysis of 189 Children's Referrals for Remedial Reading Programs by SAS PROC LOGISTIC (Version 8)

Predictor                        β         SE β     Wald's χ²   df   p       e^β (odds ratio)
Constant                          0.5340   0.8109   0.4337      1    .5102   NA
Reading                          −0.0261   0.0122   4.5648      1    .0326   0.9742
Gender (1 = boys, 0 = girls)      0.6477   0.3248   3.9759      1    .0462   1.9111

Test                             χ²        df   p
Overall model evaluation
  Likelihood ratio test          10.0195   2    .0067
  Score test                      9.5177   2    .0086
  Wald test                       9.0626   2    .0108
Goodness-of-fit test
  Hosmer & Lemeshow               7.7646   8    .4568

Note. SAS programming codes: [PROC LOGISTIC; MODEL REMEDIAL=READING GENDER/CTABLE PPROB=(0.1 TO 1.0 BY 0.1) LACKFIT RSQ;]. Cox and Snell R² = .0516. Nagelkerke R² (Max rescaled R²) = .0726. Kendall's Tau-a = .1180. Goodman-Kruskal Gamma = .2760. Somers's Dxy = .2730. c-statistic = 63.60%. All statistics reported herein use 4 decimal places in order to maintain statistical precision. NA = not applicable.
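
Several entries in Table 3 can be retraced from the printed coefficients and standard errors. The sketch below is an illustration in Python rather than the authors' SAS run, and its helper names are assumptions made here. It uses the usual Wald form (β/SE)², closed-form upper-tail chi-square probabilities for 1 and 2 degrees of freedom, and e^β for the odds ratio. Because the inputs are the rounded 4-decimal values, the recomputed Wald statistics differ slightly from those printed by SAS.

```python
import math

# Estimates and standard errors as printed in Table 3 (rounded to 4 decimals).
estimates = {
    "Constant": (0.5340, 0.8109),
    "Reading": (-0.0261, 0.0122),
    "Gender": (0.6477, 0.3248),
}

def chi2_pvalue_df1(x):
    """Upper-tail p value of a chi-square statistic with 1 degree of freedom."""
    return math.erfc(math.sqrt(x / 2.0))

def chi2_pvalue_df2(x):
    """Upper-tail p value of a chi-square statistic with 2 degrees of freedom."""
    return math.exp(-x / 2.0)

rows = {}
for name, (beta, se) in estimates.items():
    wald = (beta / se) ** 2
    rows[name] = {
        "wald": wald,
        "p": chi2_pvalue_df1(wald),
        "odds_ratio": math.exp(beta),
    }

# Overall model tests reported in Table 3, each with 2 degrees of freedom.
overall = {"Likelihood ratio": 10.0195, "Score": 9.5177, "Wald": 9.0626}
overall_p = {test: chi2_pvalue_df2(x) for test, x in overall.items()}

for name, r in rows.items():
    print(f"{name:10s} Wald={r['wald']:.4f} p={r['p']:.4f} OR={r['odds_ratio']:.4f}")
for test, p in overall_p.items():
    print(f"{test:17s} p={p:.4f}")
```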

intercept-only model serves as a good baseline because it contains no predictors. Consequently, according to this model, all observations would be predicted to belong in the largest outcome category. An improvement over this baseline is examined by using three inferential statistical tests: the likelihood ratio, score, and Wald tests. All three tests yield similar conclusions for the present data (Table 3), namely, that the logistic Model 5 was more effective than the null model. For other data sets, these three tests may not lead to similar conclusions. When this happens, readers are advised to rely on the likelihood ratio and score tests only (Menard, 1995).

Statistical tests of individual predictors. The statistical significance of individual regression coefficients (i.e., βs) is tested using the Wald chi-square statistic (Table 3). According to Table 3, both reading score and gender were significant predictors of inner city school children's referrals for remedial reading programs (p < .05). The test of the intercept (i.e., the constant in Table 3) merely suggests whether an intercept should be included in the model. For the present data set, the test result (p > .05) suggested that an alternative model without the intercept might be applied to the data.

Goodness-of-fit statistics. Goodness-of-fit statistics assess the fit of a logistic model against actual outcomes (i.e., whether a referral is made for remedial reading programs). One inferential test and two descriptive measures are presented in Table 3. The inferential goodness-of-fit test is the Hosmer–Lemeshow (H–L) test that yielded a χ²(8) of 7.7646 and was insignificant (p > .05), suggesting that the model fit the data well. In other words, the null hypothesis of a good model fit to data was tenable.

The H–L statistic is a Pearson chi-square statistic, calculated from a 2 × g table of observed and estimated expected frequencies, where g is the number of groups formed from the estimated probabilities. Ideally, each group should have an equal number of observations, the number of groups should exceed 5, and expected frequencies should be at least 5. For the present data, the number of observations in each group was mostly 19 (3 groups) or 20 (5 groups); 1 group had 21 observations and another had 11 observations. The number of groups was 10, and the expected frequencies were at or exceeded 5 in 90% of cells. Thus, it was concluded that the conditions were met for reporting the H–L test statistic.

Two additional descriptive measures of goodness-of-fit presented in Table 3 are R² indices, defined by Cox and Snell (1989) and Nagelkerke (1991), respectively. These indices are variations of the R² concept defined for the OLS regression model. In linear regression, R² has a clear definition: It is the proportion of the variation in the dependent variable that can be explained by predictors in the model. Attempts have been devised to yield an equivalent of this concept for the logistic model. None, however, renders the meaning of variance explained (Long, 1997, pp. 104–109; Menard, 2000). Furthermore, none corresponds to predictive efficiency or can be tested in an inferential framework (Menard). For these reasons, a researcher can treat these two R² indices as supplementary to other, more useful evaluative indices, such as the overall evaluation of the model, tests of individual regression coefficients, and the goodness-of-fit test statistic.

Validations of predicted probabilities. As we explained earlier, logistic regression predicts the logit of an event outcome from a set of predictors. Because the logit is the natural log of the odds (or probability/[1–probability]), it can be transformed back to the probability scale. The resultant predicted probabilities can then be revalidated with the actual outcome to determine if high probabilities are indeed associated with events and low probabilities with nonevents. The degree to which predicted probabilities agree with actual outcomes is expressed as either a measure of association or a classification table. There are four measures

Figure 2. Predicted Probability of Being Referred for Remedial Reading Instructions Versus Reading Scores

[Figure not reproduced. Vertical axis: Estimated Probability, 0.0 to 0.6. Horizontal axis: Reading Score, 40 to 140. Two sets of plotted points, labeled boys and girls.]

Note. Plotting symbols A = 1 observation, B = 2 observations, C = 3 observations, and so forth.
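
The two curves in Figure 2 can be approximated directly from Equation 5. The following Python sketch is an illustration under that assumption; it is not the code used to draw the figure, and the function name and the 5-point score grid are choices made here. It evaluates the predicted probability for boys and girls across the observed score range and solves for the reading score at which each predicted probability would equal .5 (that is, where the predicted logit is zero).

```python
import math

# Coefficients of Equation 5 (Table 3).
ALPHA, B_READING, B_GENDER = 0.5340, -0.0261, 0.6477

def predicted_probability(reading, gender):
    """Equation 4 evaluated at the Equation 5 estimates (gender: 1 = boy, 0 = girl)."""
    z = ALPHA + B_READING * reading + B_GENDER * gender
    return 1.0 / (1.0 + math.exp(-z))

scores = list(range(40, 130, 5))    # 40, 45, ..., 125: the observed score range
boys = [predicted_probability(s, 1) for s in scores]
girls = [predicted_probability(s, 0) for s in scores]

# Reading score at which the predicted probability equals 0.5 (logit = 0).
crossing_boys = -(ALPHA + B_GENDER) / B_READING
crossing_girls = -ALPHA / B_READING

for s, pb, pg in zip(scores, boys, girls):
    print(f"{s:4d}  boys={pb:.4f}  girls={pg:.4f}")
print(f"0.5 crossing: boys={crossing_boys:.1f}, girls={crossing_girls:.1f}")
```

Under these estimates the boys' curve lies above the girls' curve at every score, and both decline as the reading score rises. The .5 crossing for girls falls below the lowest observed score of 40, so a cutoff of .5 would classify no girl in the observed range as an event.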

of association and one classification table that are provided by SAS (Version 8).

The four measures of association are Kendall's Tau-a, Goodman-Kruskal's Gamma, Somers's D statistic, and the c statistic (Table 3). The Tau-a statistic is Kendall's rank-order correlation coefficient without adjustments for ties. The Gamma statistic is based on Kendall's coefficient but adjusts for ties. Gamma is more useful and appropriate than Tau-a when there are ties on both outcomes and predicted probabilities, as was the case with the present data (see Appendix A). The Gamma statistic for Model 5 is 0.2760 (Table 3). It is interpreted as 27.60% fewer errors made in predicting which of two children would be recommended for remedial reading programs by using the estimated probabilities than by chance alone (Demaris, 1992). Some caution is advised in using the Gamma statistic because (a) it

has a tendency to overstate the strength of association between estimated probabilities and outcomes (Demaris), and (b) a value of zero does not necessarily imply independence when the data structure exceeds a 2 × 2 format (Siegel & Castellan, 1988).

Somers's D is a preferred extension of Gamma whereby one variable is designated as the dependent variable and the other the independent variable (Siegel & Castellan, 1988). There are two asymmetric forms of Somers's D statistic: Dxy and Dyx. Only Dyx correctly represents the degree of association between the outcome (y), designated as the dependent variable, and the estimated probability (x), designated as the independent variable (Demaris, 1992). Unfortunately, SAS computes only Dxy (Table 3), although this index can be corrected to Dyx in SAS (Peng & So, 1998).

The c statistic represents the proportion of student pairs with different observed outcomes for which the model correctly predicts a higher probability for observations with the event outcome than the probability for nonevent observations. For the present model, the c statistic is 0.6360 (Table 3). This means that for 63.60% of all possible pairs of children (one recommended for remedial reading programs and the other not), the model correctly assigned a higher probability to those who were recommended. The c statistic ranges from 0.5 to 1. A 0.5 value means that the model is no better than assigning observations randomly into outcome categories. A value of 1 means that the model assigns higher probabilities to all observations with the event outcome, compared with nonevent observations. If several models were fitted to the same data set, the model chosen as the best model should be associated with the highest c statistic. Thus, the c statistic provides a basis for comparing different models fitted to the same data or the same model fitted to different data sets.

In addition to these measures of association, SAS output includes a classification table that documents the validity of predicted probabilities (Table 4). The first two rows in Table 4 represent the two possible outcomes, and the two columns under the heading "Predicted" are for high and low probabilities, based on a cutoff point. The cutoff point may be specified by researchers or set at 0.5 by SAS. According to Table 4, with the cutoff set at 0.5, the prediction for children who were not recommended for remedial reading programs was more accurate than that for those who were. This observation was also supported by the magnitude of sensitivity (3.39%) compared to that of specificity (99.23%). Sensitivity measures the proportion of correctly classified events (i.e., those recommended for remedial reading programs), whereas specificity measures the proportion of correctly classified nonevents (those not recommended for remedial reading programs). Both false positive and false negative rates were a little more than 30%. The false positive rate measures the proportion of observations misclassified as events over all of those classified as events. The false negative rate likewise measures the proportion of observations misclassified as nonevents over all of those classified as nonevents. The overall correct prediction was 69.31%, an improvement over the chance level. In the opinion of Hosmer and Lemeshow (2000, p. 160), "the classification table is most appropriate when classification is a stated goal of the analysis; otherwise it should only supplement more rigorous methods of assessment of fit."

Table 4. The Observed and the Predicted Frequencies for Remedial Reading Instructions by Logistic Regression With the Cutoff of 0.50

                        Predicted
Observed                Yes     No     % Correct
Yes                       2     57       3.39
No                        1    129      99.23
Overall % correct                       69.31

Note. Sensitivity = 2/(2+57)% = 3.39%. Specificity = 129/(1+129)% = 99.23%. False positive = 1/(1+2)% = 33.33%. False negative = 57/(57+129)% = 30.65%.

Table 4 was prepared with SAS using a reduced-bias algorithm. The algorithm minimizes the bias of using the same observations both for model fitting and for predicting probabilities (SAS Institute Inc., 1999). According to a recent comparative study of six statistical packages that can be used for logistic regression (Peng & So, 2002b), SAS is the only package that uses this algorithm. Thus, entries in Table 4 would be slightly different if other software (such as SPSS) was used to prepare it.

Reporting and Interpreting Logistic Regression Results

In addition to the data presented in Tables 3 and 4 and Figure 2, it is helpful to demonstrate the relationship between the predicted outcome and certain characteristics found in observations. For the present data, this relationship is demonstrated in Table 5 for four cases (1–4) extracted from Appendix A, as well as for four observations (5–8) for whom reading scores were hypothesized at two levels for both genders. For the first four cases, the predicted probabilities of referrals for remedial reading programs were calculated using Equation 5. Even though these four cases were not perfectly predicted, the correct prediction rate was better than chance.

The last four hypothetical cases show the descending predicted probabilities of referrals for remedial reading programs as the reading scores increase for children of both genders. For each point increase on the reading score, the odds of being recommended for remedial reading programs decrease from 1.0 to 0.9742 (= e^−0.0261; Table 3). If the increase on the reading score was 10 points, the odds decreased from 1.0 to 0.7703 (= e^(10*[−0.0261])). However, when the reading score was held as a constant, boys were predicted to be referred for remedial reading instructions with greater probability than girls. The differences between boys and girls are graphically shown in Figure 2 and confirmed previously by the positive coefficient (0.6477) of the gender predictor in Equation 5.
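
The classification rates in the note to Table 4 and the pairwise measures of association discussed in this section can be sketched in a few lines. The Python below is an illustration, not SAS output. The pairwise formulas follow the definitions commonly documented for PROC LOGISTIC (concordant, discordant, and tied pairs among observations with different outcomes), which is an assumption about how the Table 3 values were produced. The toy outcomes and probabilities are hypothetical and are not taken from Appendix A.

```python
# Counts from Table 4 (cutoff = 0.5).
true_pos, false_neg = 2, 57      # observed Yes: predicted Yes / predicted No
false_pos, true_neg = 1, 129     # observed No:  predicted Yes / predicted No

sensitivity = true_pos / (true_pos + false_neg)
specificity = true_neg / (false_pos + true_neg)
# Rates as defined in the text: misclassified cases over all cases classified that way.
false_positive_rate = false_pos / (false_pos + true_pos)
false_negative_rate = false_neg / (false_neg + true_neg)
overall_correct = (true_pos + true_neg) / (true_pos + false_neg + false_pos + true_neg)

def association_measures(outcomes, probabilities):
    """Pairwise association between 0/1 outcomes and predicted probabilities."""
    events = [p for y, p in zip(outcomes, probabilities) if y == 1]
    nonevents = [p for y, p in zip(outcomes, probabilities) if y == 0]
    concordant = sum(pe > pn for pe in events for pn in nonevents)
    discordant = sum(pe < pn for pe in events for pn in nonevents)
    pairs = len(events) * len(nonevents)     # pairs with different outcomes
    tied = pairs - concordant - discordant
    n = len(outcomes)
    return {
        "c": (concordant + 0.5 * tied) / pairs,
        "somers_d": (concordant - discordant) / pairs,
        "gamma": (concordant - discordant) / (concordant + discordant),
        "tau_a": (concordant - discordant) / (0.5 * n * (n - 1)),
    }

# Hypothetical toy data for illustration only (not from Appendix A).
toy_outcomes = [1, 1, 1, 0, 0, 0, 0]
toy_probabilities = [0.45, 0.30, 0.20, 0.40, 0.20, 0.15, 0.10]
measures = association_measures(toy_outcomes, toy_probabilities)

print(f"sensitivity={sensitivity:.2%} specificity={specificity:.2%}")
print(f"false positive={false_positive_rate:.2%} false negative={false_negative_rate:.2%}")
print(f"overall correct={overall_correct:.2%}")
print(measures)
```

With these definitions, c equals (1 + D)/2, so the c statistic and Somers's D carry the same information on different scales. Gamma divides by concordant plus discordant pairs only, which is why it is never smaller in magnitude than D when ties are present.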

Table 5. Predicted Probability of Being Referred for Remedial Reading Instructions for 8 Children

Case     Reading score   Gender       Intercept   Predicted probability of being        Actual outcome
number   β = −0.0261     β = 0.6477   = 0.5340    referred for remedial reading program   1 = Yes, 0 = No
1         52.5           Boy          0.5340      0.4530                                  1
2         85             Boy          0.5340      0.2618                                  0
3         75             Girl         0.5340      0.1941                                  1
4         95             Girl         0.5340      0.1250                                  0
5         60             Boy          0.5340      0.4051                                  —
6         60             Girl         0.5340      0.2627                                  —
7        100             Boy          0.5340      0.1934                                  —
8        100             Girl         0.5340      0.1115                                  —
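
The hypothesized profiles in Table 5 (reading scores at two levels, 60 and 100, for each gender) and the odds changes quoted in the text follow from Equation 5. The Python sketch below is an illustration with names chosen here, not part of the original SAS analysis. It back-transforms the predicted logit to a probability, computes the factor by which the odds change for a 1-point and a 10-point increase in reading score, and checks that the boy-to-girl odds ratio is the same at both score levels.

```python
import math

ALPHA, B_READING, B_GENDER = 0.5340, -0.0261, 0.6477   # Equation 5 / Table 3

def predicted_probability(reading, gender):
    """Back-transform the predicted logit of Equation 5 to a probability."""
    z = ALPHA + B_READING * reading + B_GENDER * gender
    return math.exp(z) / (1.0 + math.exp(z))

def odds(p):
    """Odds corresponding to a probability."""
    return p / (1.0 - p)

def odds_multiplier(points):
    """Factor by which the odds change for a given increase in reading score."""
    return math.exp(B_READING * points)

# Hypothesized profiles: two reading levels for each gender (1 = boy, 0 = girl).
profiles = [(60, 1), (60, 0), (100, 1), (100, 0)]
probabilities = [predicted_probability(r, g) for r, g in profiles]

# Boy-to-girl odds ratio at each reading level; it does not depend on the score.
ratio_at_60 = odds(probabilities[0]) / odds(probabilities[1])
ratio_at_100 = odds(probabilities[2]) / odds(probabilities[3])

for (reading, gender), p in zip(profiles, probabilities):
    label = "Boy" if gender == 1 else "Girl"
    print(f"reading={reading:3d} {label:4s} p={p:.4f}")
print(f"odds multiplier: 1 point={odds_multiplier(1):.4f}, 10 points={odds_multiplier(10):.4f}")
print(f"boy/girl odds ratio: {ratio_at_60:.4f} at 60, {ratio_at_100:.4f} at 100")
```

The gap in predicted probability between boys and girls narrows as scores rise, but the odds ratio stays at e^0.6477 throughout. This is one reason the article interprets results on the odds scale.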

The odds of a boy being recommended for remedial reading programs were 1.9111 (= e^0.6477; Table 3) times greater than the odds for a girl.

In terms of the research hypothesis posed earlier to the hypothetical data, that "the likelihood that an inner city school child is recommended for remedial reading instruction is related to both his/her reading score and gender," logistic regression results supported this proposition. Specifically, the likelihood of a child being recommended for remedial reading instruction was negatively related to his or her reading scores. However, given the same reading score, boys were more likely to be recommended for remedial reading classes than girls. We reached this conclusion with multiple pieces of evidence: the significant test result of the logistic model, statistically significant test results of both predictors, insignificant H–L test of goodness-of-fit, and several descriptive measures of associations between predicted probabilities and data.

Guidelines and Recommendations

What Tables, Figures, or Charts Should Be Included to Comprehensively Assess the Result?

In presenting the assessment of logistic regression results, researchers should include sufficient information to address the following:

• an overall evaluation of the logistic model
• statistical tests of individual predictors
• goodness-of-fit statistics
• an assessment of the predicted probabilities

Table 3 illustrates the presentation of the first three types of information and Table 4 the fourth. To illustrate the impact of a statistically significant categorical predictor (e.g., gender in our example) on the dichotomous dependent variable (e.g., recommendation for remedial reading programs), it is helpful to include a figure such as Figure 2. It is our recommendation that logistic regression results be reported, similar to those in Tables 3 and 4 and Figure 2, to help communicate findings to readers.

A model's adequacy should be justified by multiple indicators, including an overall test of all parameters, a statistical significance test of each predictor, the goodness-of-fit statistics, the predictive power of the model, and the interpretability of the model. Furthermore, researchers should pay attention to mathematical definitions of statistics (such as Dxy) generated by the statistical package of choice. Among the packages that perform logistic regression, none was found to be error free (Peng & So, 2002b). A reference to the software should inform readers of programming mistakes and limitations, and help researchers verify results with another statistical package. A recent review of six statistical software programs, conducted by Peng and So (2002b, pp. 55–56) for performing logistic regression, concluded that

The versatile SAS LOGISTIC and BMDP LR [were recommended] for researchers experienced with logistic regression techniques and programming. . . . Several unique goodness-of-fit indices and selection methods are provided in SAS. Its ability to fit a broad class of binary response models, plus its provision to correct for over-sampling, over-dispersion, and bias introduced into predicted probabilities, sets it apart from the other five. . . . If either SPSS LOGISTIC REGRESSION or SYSTAT LOGIT is the only package available, researchers must be aware that both compute the goodness-of-fit and diagnostic statistics from individual observations. Consequently, these statistics are inappropriate for statistical tests. With dazzling graphic interfaces, both packages are user-friendly.

MINITAB BLOGISTIC is the simplest to use. It adopts the hierarchical modeling restriction in direct modeling. . . . A substantial number of goodness-of-fit indices are available including the unique Brown statistic. However, the absence of predictor selection methods may make it less appealing to some researchers. . . . STATA LOGISTIC provides the most detailed information on parameter estimates, yet its goodness-of-fit indices are limited. We recommend MINITAB and STATA for beginners, although experienced researchers may also employ them for logistic regression.

What Assumptions Should Be Verified?

Unlike discriminant function analysis, logistic regression does not assume that predictor variables are distributed as a multivariate normal distribution with equal covariance matrix. Instead, it assumes that the binomial distribution describes the distribution of the errors that equal the actual Y minus the predicted Y. The binomial distribution is also the assumed distribution for the conditional mean of the dichotomous outcome. This assumption implies that the same probability is maintained across the range of predictor
10 The Journal of Educational Research

values. The binomial assumption may be tested by the normal z test (Siegel & Castellan, 1988) or may be taken to be robust as long as the sample is random; thus, observations are independent from each other.

Recommended Reporting Formats of Logistic Regression

In terms of reporting logistic regression results, we recommend presenting the complete logistic regression model including the Y-intercept (similar to Equation 5), odds ratios, and a table such as Table 5 to illustrate the relationship between outcomes and observations with profiles of certain characteristics. Odds ratios are directly derived from regression coefficients in a logistic model. If β_j represents the regression coefficient for predictor X_j, then exponentiating β_j yields the odds ratio. When all other predictors are held constant, the odds ratio is the multiplicative change in the odds of Y for a one-unit change in X_j. It is one of three epidemiological measures of effect that have been recently recommended by psychologists for informing public policy makers (Scott, Mason, & Chapman, 1999). Three conditions must be met before odds ratios can be interpreted sensibly: (a) the predictor X_j must not interact with another predictor; (b) the predictor X_j must be represented by a single term in the model; and (c) a one-unit change in the predictor X_j must be meaningful and relevant. It is worth noting that odds ratios and odds are two different concepts. They are related but not in a linear fashion. Likewise, the relationship between the predicted probability and odds, though positive, is not linear either.

Recommended Minimum Observation-to-Predictor Ratio

In terms of the adequacy of sample sizes, the literature has not offered specific rules applicable to logistic regression (Peng et al., 2002). However, several authors on multivariate statistics (Lawley & Maxwell, 1971; Marascuilo & Levin, 1983; Tabachnick & Fidell, 1996, 2001) have recommended a minimum ratio of 10 to 1, with a minimum sample size of 100 or 50, plus a variable number that is a function of the number of predictors.

Evaluations of Eight Articles Using Logistic Regression

To help understand how logistic regression has been applied by authors of articles published in The Journal of Educational Research (JER), we reviewed articles that used this technique between 1990 and 2000. During this period, eight articles were found to have used logistic regression. The criterion used in selecting articles was simple: at least one empirical analysis in the article must have been conducted to derive the logistic model and its regression coefficients. This criterion excluded any article that relied on others’ work to derive the model or merely performed a logarithm or logit transformation of the dependent or the independent variable. A complete list of these eight articles is found in Appendix B.

A breakdown of the articles by year showed that, prior to 1993, there was no article that used logistic regression. In 1993, 1994, 1996, and 1997, one article per year applied logistic regression; in 1998 and 2000, there were two per year. This trend mirrors the pattern that was found in higher education journals (Peng et al., 2002), except that the rise of logistic regression began a year earlier, in 1992, in higher education journals.

The research questions addressed in the eight articles included American Indian adolescents’ educational commitment (Trusty, 2000), school performance and activities (Alexander, Dauber, & Entwisle, 1996; McNeal, 1998; Smith, 1997), at-risk students (Meisels & Liaw, 1993; Rush & Vitale, 1994), family connectedness (Machamer & Gruber, 1998), and parents’ conceptions of kindergarten readiness (Diamond, Reagan, & Bandyk, 2000). One central theme shared by all was education-related adjustment and performance. The dependent variable was dichotomous, whether it was retention in school, dropping out of high school, or readiness for kindergarten. The predictors typically included a combination of demographic characteristics (such as age, gender, and ethnicity) and cognitive, affective, or personality-related measures. The objective of each study was to predict or to distinguish the outcome categories on the basis of predictors.

To test pertinent research hypotheses, the authors of these eight articles used three modeling approaches: direct, sequential, and stepwise modeling. Of these three, only direct and sequential models were controlled and implemented by researchers (Peng & So, 2002a). Three studies investigated interactions among predictors (Alexander, Dauber, & Entwisle, 1996; Meisels & Liaw, 1993; Trusty, 2000); the others did not. Although not all of these studies followed the guidelines and recommendations outlined in the previous section, all authors are credited for making substantive contributions as well as for introducing logistic regression into the field of educational research.

The Assessment of Logistic Regression Results

Four groups of authors (Alexander, Dauber, & Entwisle, 1996; Diamond, Reagan, & Bandyk, 2000; McNeal, 1998; Rush & Vitale, 1994) evaluated the overall logistic model; all reported tests of individual predictors, such as those shown in Table 3. Evidence of the goodness-of-fit of logistic models was provided by the R² index for either the entire model or for each predictor (Alexander, Dauber, & Entwisle, 1996; Diamond, Reagan, & Bandyk, 2000; Rush & Vitale, 1994; Trusty, 2000). None reported the HL test. Only one study (Rush & Vitale, 1994) validated predicted probabilities against data in the Table 4 format. Our review, however, uncovered two minor discrepancies in Rush and Vitale’s (1994) classification table (Table 5, p. 331). In Table 5, the hit rate was reported to be 90.6%, and misclassifications were 223 for at-risk children and 112 for non-at-risk children. The text on page 332 reported a hit rate of
September/October 2002 [Vol. 96(No. 1)] 11

90.71%, and the misclassifications were 223 versus 115, written on page 329. None reported measures of association such as Kendall’s Tau-a, Goodman-Kruskal’s Gamma, Somers’s D statistic, or the c statistic. None mentioned the statistical package that performed the logistic analysis, although Rush and Vitale (1994) used SPSS-X to perform factor analysis, and those results were subsequently incorporated into logistic regression.

Verification of the Binomial Assumption

As stated earlier, logistic regression has only one assumption: The binomial distribution is the assumed distribution for the conditional mean of the dichotomous outcome. This assumption implies that the same probability is maintained across the range of predictor values. Though none of the eight studies verified or tested this assumption, the binomial assumption is known to be robust as long as the sample is random; thus, observations are independent from each other. Samples used in the eight studies did not appear to be nonrandom, nor did they have inherent dependence among observations. Thus, the binomial assumption appeared to hold for all logistic analyses conducted in these eight studies.

Reporting Formats of Logistic Regression Results

Five of the articles (Alexander, Dauber, & Entwisle, 1996; Diamond, Reagan, & Bandyk, 2000; Machamer & Gruber, 1998) did present the logistic model. Of those five, three (Meisels & Liaw, 1993; Smith, 1997; Trusty, 2000) did not include intercepts in the logistic model. Odds ratios were reported in three studies (McNeal, 1998; Meisels & Liaw, 1993; Rush & Vitale, 1994), and odds were reported in one (Trusty, 2000).

One study presented results in terms of marginal probabilities (McNeal, 1998). The use of marginal probabilities has been criticized by Long (1997, pp. 74–75) and Peng et al. (2002) because marginal probabilities do not correspond to a fixed change in the predicted probabilities that will occur if there is a discrete change in one predictor (e.g., reading), while other predictors are held constant. In other words, the marginal probability corresponding to a change in reading from 50 points to 60 points is different from that associated with another 10-point change from, say, 60 to 70 points. Furthermore, if other predictors (e.g., age) are held at their respective means, the corresponding marginal probability for reading is different from that computed at other values (e.g., the mode). One study did not explain how a categorical predictor was coded in the data (Diamond, Reagan, & Bandyk, 2000). These reporting formats make it difficult for readers to verify results with another sample or at another time or place.

One study (Trusty, 2000) coded a dichotomous predictor as 1 (do not have a computer in the home) and 2 (do have a computer), instead of the recommended 0 and 1, or –1/2 and +1/2 (Peng & So, 2002b). This practice is not necessarily incorrect; it simply makes the interpretation of the regression coefficient awkward and less direct.

Observation to Predictor Ratio

As stated earlier, the literature has not offered specific rules that are applicable to logistic regression (Peng et al., 2002). On the basis of the general rule of a minimum ratio of 10 to 1, with a minimum sample size of 100, all eight studies met and even exceeded our recommendation. Therefore, the results reported in these studies were considered stable.

Summary

In this paper, we demonstrate that logistic regression can be a powerful analytical technique for use when the outcome variable is dichotomous. The effectiveness of the logistic model was shown to be supported by (a) significance tests of the model against the null model, (b) the significance test of each predictor, (c) descriptive and inferential goodness-of-fit indices, and (d) predicted probabilities.

During the last decade, logistic regression has been gaining popularity. The trend is evident in the JER and higher education journals. Such popularity can be attributed to researchers’ easy access to sophisticated statistical software that performs comprehensive analyses of this technique. The application of the logistic regression technique is likely to increase. This potential expanded usage demands that researchers, editors, and readers be coached in what to expect from an article that uses the logistic regression technique. What tables, charts, or figures should be included? What assumptions should be verified? And how comprehensive should the presentation of logistic regression results be? It is hoped that this article has answered these questions with an illustration of logistic regression applied to a data set and with guidelines and recommendations offered on a preferred pattern of application of logistic methods.

ACKNOWLEDGMENTS

We wish to thank James D. Raths and one anonymous consulting editor for their very helpful comments on earlier drafts of this article.

REFERENCES

Austin, J. T., Yaffee, R. A., & Hinkle, D. E. (1992). Logistic regression for research in higher education. Higher Education: Handbook of Theory and Research, 8, 379–410.

Cabrera, A. F. (1994). Logistic regression analysis in higher education: An applied perspective. Higher Education: Handbook of Theory and Research, 10, 225–256.

Chuang, H. L. (1997). High school youth’s dropout and re-enrollment behavior. Economics of Education Review, 16(2), 171–186.

Cleary, P. D., & Angel, R. (1984). The analysis of relationships involving dichotomous dependent variables. Journal of Health and Social Behavior, 25, 334–348.

Cox, D. R., & Snell, E. J. (1989). The analysis of binary data (2nd ed.). London: Chapman and Hall.

Demaris, A. (1992). Logit modeling: Practical applications. Newbury Park, CA: Sage.

Efron, B. (1975). The efficiency of logistic regression compared to normal discriminant analysis. Journal of the American Statistical Association, 70, 892–898.

Haberman, S. (1978). Analysis of qualitative data (Vol. 1). New York: Academic Press.

Hosmer, D. W., Jr., & Lemeshow, S. (2000). Applied logistic regression (2nd ed.). New York: Wiley.

Janik, J., & Kravitz, H. M. (1994). Linking work and domestic problems with police suicide. Suicide and Life Threatening Behavior, 24(3), 267–274.

Lawley, D. N., & Maxwell, A. E. (1971). Factor analysis as a statistical method. London: Butterworth & Co.

Lei, P.-W., & Koehly, L. M. (2000, April). Linear discriminant analysis versus logistic regression: A comparison of classification errors. Paper presented at the annual meeting of the American Educational Research Association, New Orleans, LA.

Long, J. S. (1997). Regression models for categorical and limited dependent variables. Thousand Oaks, CA: Sage.

Marascuilo, L. A., & Levin, J. R. (1983). Multivariate statistics in the social sciences: A researcher’s guide. Monterey, CA: Brooks/Cole.

Menard, S. (1995). Applied logistic regression analysis (Sage University Paper Series on Quantitative Applications in the Social Sciences, 07–106). Thousand Oaks, CA: Sage.

Menard, S. (2000). Coefficients of determination for multiple logistic regression analysis. The American Statistician, 54(1), 17–24.

Nagelkerke, N. J. D. (1991). A note on a general definition of the coefficient of determination. Biometrika, 78, 691–692.

Peng, C. Y., Manz, B. D., & Keck, J. (2001). Modeling categorical variables by logistic regression. American Journal of Health Behavior, 25(3), 278–284.

Peng, C. Y., & So, T. S. (1998). If there is a will, there is a way: Getting around defaults of PROC LOGISTIC in SAS. Proceedings of the MidWest SAS Users Group 1998 Conference (pp. 243–252). Retrieved from http://php.indiana.edu/~tso/articles/mwsug98.pdf

Peng, C. Y., & So, T. S. H. (2002a). Modeling strategies in logistic regression. Journal of Modern Applied Statistical Methods, 14, 147–156.

Peng, C. Y., & So, T. S. H. (2002b). Logistic regression analysis and reporting: A primer. Understanding Statistics, 1(1), 31–70.

Peng, C. Y., So, T. S., Stage, F. K., & St. John, E. P. (2002). The use and interpretation of logistic regression in higher education journals: 1988–1999. Research in Higher Education, 43, 259–293.

Peterson, T. (1984). A comment on presenting results from logit and probit models. American Sociological Review, 50(1), 130–131.

Press, S. J., & Wilson, S. (1978). Choosing between logistic regression and discriminant analysis. Journal of the American Statistical Association, 73, 699–705.

Ryan, T. P. (1997). Modern regression methods. New York: Wiley.

SAS Institute Inc. (1999). SAS/STAT® user’s guide (Version 8, Vol. 2). Cary, NC: Author.

Schlesselman, J. J. (1982). Case control studies: Design, control, analysis. New York: Oxford University Press.

Scott, K. G., Mason, C. A., & Chapman, D. A. (1999). The use of epidemiological methodology as a means of influencing public policy. Child Development, 70(5), 1263–1272.

Siegel, S., & Castellan, N. J. (1988). Nonparametric statistics for the behavioral sciences (2nd ed.). New York: McGraw-Hill.

Tabachnick, B. G., & Fidell, L. S. (1996). Using multivariate statistics (3rd ed.). New York: Harper Collins.

Tabachnick, B. G., & Fidell, L. S. (2001). Using multivariate statistics (4th ed.). Needham Heights, MA: Allyn & Bacon.

Tolman, R. M., & Weisz, A. (1995). Coordinated community intervention for domestic violence: The effects of arrest and prosecution on recidivism of woman abuse perpetrators. Crime and Delinquency, 41(4), 481–495.

APPENDIX A
Hypothetical Data for Logistic Regression

ID  Gender  Reading score  Remedial reading recommended?
1  Boy  91.0  No
2  Boy  77.5  No
3  Girl  52.5  No
4  Girl  54.0  No
5  Girl  53.5  No
6  Boy  62.0  No
7  Girl  59.0  No
8  Boy  51.5  No
9  Girl  61.5  No
10  Girl  56.5  No
11  Boy  47.5  No
12  Boy  75.0  No
13  Boy  47.5  No
14  Boy  53.5  No
15  Girl  50.0  No
16  Girl  50.0  No
17  Boy  49.0  No
18  Girl  59.0  No
19  Boy  60.0  No
20  Girl  60.0  No
21  Boy  60.5  No
22  Girl  50.0  No
23  Girl  101.0  No
24  Boy  60.0  No
25  Boy  60.0  No
26  Girl  83.5  No
27  Girl  61.0  No
28  Girl  75.0  No
29  Boy  84.0  No
30  Boy  56.5  No
31  Boy  56.5  No
32  Girl  45.0  No
33  Boy  60.5  No
34  Girl  77.5  No
35  Boy  62.5  No
36  Girl  70.0  No
37  Girl  69.0  No
38  Girl  62.0  No
39  Girl  107.5  No
40  Girl  54.5  No
41  Boy  92.5  No
42  Girl  94.5  No
43  Boy  65.0  No
44  Girl  80.0  No
45  Girl  45.0  No
46  Girl  45.0  No
47  Girl  66.0  No
48  Boy  66.0  No
49  Girl  57.5  No
50  Boy  42.5  No
51  Girl  60.0  No
52  Boy  64.0  No
53  Girl  65.0  No
54  Girl  47.5  No
55  Boy  57.5  No
56  Boy  55.0  No
57  Boy  55.0  No
58  Boy  76.5  No
59  Boy  51.5  No
60  Boy  59.5  No
61  Boy  59.5  No
62  Boy  59.5  No

(Appendix continues)

APPENDIX A—continued

ID  Gender  Reading score  Remedial reading recommended?
63  Boy  55.0  No
64  Girl  70.0  No
65  Boy  66.5  No
66  Boy  84.5  No
67  Boy  57.5  No
68  Boy  125.0  No
69  Girl  70.5  No
70  Boy  79.0  No
71  Girl  56.0  No
72  Boy  75.0  No
73  Boy  57.5  No
74  Boy  56.0  No
75  Girl  67.5  No
76  Boy  114.5  No
77  Girl  70.0  No
78  Girl  67.0  No
79  Boy  60.5  No
80  Girl  95.0  No
81  Girl  65.5  No
82  Girl  85.0  No
83  Boy  55.0  No
84  Boy  63.5  No
85  Boy  61.5  No
86  Boy  60.0  No
87  Boy  52.5  No
88  Girl  65.0  No
89  Girl  87.5  No
90  Girl  62.5  No
91  Girl  66.5  No
92  Boy  67.0  No
93  Girl  117.5  No
94  Girl  47.5  No
95  Girl  67.5  No
96  Girl  67.5  No
97  Girl  77.0  No
98  Girl  73.5  No
99  Girl  73.5  No
100  Girl  68.5  No
101  Girl  55.0  No
102  Girl  92.0  No
103  Boy  55.0  No
104  Girl  55.0  No
105  Boy  60.0  No
106  Boy  120.5  No
107  Girl  56.0  No
108  Girl  84.5  No
109  Girl  60.0  No
110  Boy  85.0  No
111  Girl  93.0  No
112  Boy  60.0  No
113  Girl  65.0  No
114  Girl  58.5  No
115  Girl  85.0  No
116  Boy  67.0  No
117  Girl  67.5  No
118  Boy  65.0  No
119  Girl  60.0  No
120  Boy  47.5  No
121  Girl  79.0  No
122  Boy  80.0  No
123  Girl  57.5  No
124  Girl  64.5  No
125  Girl  65.0  No
126  Girl  60.0  No
127  Girl  85.0  No
128  Girl  60.0  No
129  Girl  58.0  No
130  Girl  61.5  No
131  Boy  60.0  Yes
132  Girl  65.0  Yes
133  Boy  93.5  Yes
134  Boy  52.5  Yes
135  Boy  42.5  Yes
136  Boy  75.0  Yes
137  Boy  48.5  Yes
138  Boy  64.0  Yes
139  Boy  66.0  Yes
140  Girl  82.5  Yes
141  Girl  52.5  Yes
142  Girl  45.5  Yes
143  Boy  57.5  Yes
144  Boy  65.0  Yes
145  Girl  46.0  Yes
146  Girl  75.0  Yes
147  Boy  100.0  Yes
148  Girl  77.5  Yes
149  Boy  51.5  Yes
150  Boy  62.5  Yes
151  Boy  44.5  Yes
152  Girl  51.0  Yes
153  Girl  56.0  Yes
154  Girl  58.5  Yes
155  Girl  69.0  Yes
156  Boy  65.0  Yes
157  Boy  60.0  Yes
158  Girl  65.0  Yes
159  Boy  65.0  Yes
160  Boy  40.0  Yes
161  Girl  55.0  Yes
162  Boy  52.5  Yes
163  Boy  54.5  Yes
164  Boy  74.0  Yes
165  Boy  55.0  Yes
166  Girl  60.5  Yes
167  Boy  50.0  Yes
168  Boy  48.0  Yes
169  Girl  51.0  Yes
170  Girl  55.0  Yes
171  Boy  93.5  Yes
172  Boy  61.0  Yes
173  Boy  52.5  Yes
174  Boy  57.5  Yes
175  Boy  60.0  Yes
176  Girl  71.0  Yes
177  Girl  65.0  Yes
178  Girl  60.0  Yes
179  Girl  55.0  Yes
180  Boy  60.0  Yes

(Appendix continues)

APPENDIX A—continued

ID  Gender  Reading score  Remedial reading recommended?
181  Boy  77.0  Yes
182  Boy  52.5  Yes
183  Girl  95.0  Yes
184  Boy  50.0  Yes
185  Girl  47.5  Yes
186  Boy  50.0  Yes
187  Boy  47.0  Yes
188  Boy  71.0  Yes
189  Girl  65.0  Yes
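
As a minimal sketch of how records like those in Appendix A might be prepared, the Python below dummy-codes a dichotomous predictor as 0 and 1 (the coding the article recommends) and shows that exponentiating a logistic coefficient gives an odds ratio. Several things here are assumptions made for illustration and are not taken from the article: the direction of the coding (Boy = 1, Girl = 0; Yes = 1, No = 0), the helper names, and the choice of a 12-row excerpt (IDs 1 to 6 and 131 to 136). Because the excerpt is arbitrary, the numbers it produces are not the article's results.

```python
import math

# Excerpt of 12 of the 189 Appendix A records:
# (ID, gender, reading score, remedial reading recommended?).
# The excerpt is arbitrary, so every number computed below is illustrative only.
RECORDS = [
    (1, "Boy", 91.0, "No"), (2, "Boy", 77.5, "No"), (3, "Girl", 52.5, "No"),
    (4, "Girl", 54.0, "No"), (5, "Girl", 53.5, "No"), (6, "Boy", 62.0, "No"),
    (131, "Boy", 60.0, "Yes"), (132, "Girl", 65.0, "Yes"), (133, "Boy", 93.5, "Yes"),
    (134, "Boy", 52.5, "Yes"), (135, "Boy", 42.5, "Yes"), (136, "Boy", 75.0, "Yes"),
]


def encode(gender, recommended):
    """Dummy-code one record as 0/1 (assumed: Boy = 1, Girl = 0; Yes = 1, No = 0)."""
    return (1 if gender == "Boy" else 0, 1 if recommended == "Yes" else 0)


def odds(p):
    """Convert a probability into odds, p / (1 - p)."""
    return p / (1.0 - p)


def proportion_recommended(coded, gender_code):
    """Observed proportion of 'Yes' outcomes within one gender group."""
    outcomes = [y for x, y in coded if x == gender_code]
    return sum(outcomes) / len(outcomes)


coded = [encode(gender, rec) for _, gender, _, rec in RECORDS]
p_boy = proportion_recommended(coded, 1)    # 5 of the 8 boys in the excerpt
p_girl = proportion_recommended(coded, 0)   # 1 of the 4 girls in the excerpt

odds_ratio = odds(p_boy) / odds(p_girl)
# With gender as the only predictor, the fitted logistic coefficient is the
# log of this odds ratio, so exponentiating it gives the odds ratio back.
beta_gender = math.log(odds_ratio)

print(f"odds (boys) = {odds(p_boy):.3f}, odds (girls) = {odds(p_girl):.3f}")
print(f"odds ratio = {odds_ratio:.3f}, exp(beta) = {math.exp(beta_gender):.3f}")

# Probability and odds rise together, but not linearly.
for p in (0.25, 0.50, 0.75):
    print(f"p = {p:.2f} -> odds = {odds(p):.3f}")
```

The last loop illustrates the article's remark that predicted probability and odds are positively but not linearly related: equal steps in probability produce unequal steps in odds. A model with both gender and reading score would need an iterative fit in a statistical package, and its coefficients would generally differ from the single-predictor value computed here.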

APPENDIX B
List of JER Articles Reviewed

1. Alexander, K. L., Dauber, S. L., & Entwisle, D. R. (1996). Children in motion: School transfers and elementary school performance. The Journal of Educational Research, 90(1), 3–11.
2. Diamond, K. E., Reagan, A. J., & Bandyk, J. E. (2000). Parents’ conceptions of kindergarten readiness: Relationships with race, ethnicity, and development. The Journal of Educational Research, 94(2), 93–100.
3. Machamer, A. M., & Gruber, E. (1998). Secondary school, family, and educational risk: Comparing American Indian adolescents and their peers. The Journal of Educational Research, 91(6), 357–369.
4. McNeal, R. B., Jr. (1998). High school extracurricular activities: Closed structures and stratifying patterns of participation. The Journal of Educational Research, 91(3), 183–191.
5. Meisels, S. J., & Liaw, F.-R. (1993). Failure in grade: Do retained students catch up? The Journal of Educational Research, 87(2), 69–77.
6. Rush, S., & Vitale, P. A. (1994). Analysis for determining factors that place elementary students at risk. The Journal of Educational Research, 87(6), 325–333.
7. Smith, J. B. (1997). Effects of eighth-grade transition programs on high school retention and experiences. The Journal of Educational Research, 90(3), 144–152.
8. Trusty, J. (2000). High educational expectations and low achievement: Stability of educational goals across adolescence. The Journal of Educational Research, 93(6), 356–365.
