Inferring Transition Probabilities from Repeated
Cross Sections: A Cross-level Inference
Approach to US Presidential Voting
Ben Pelzer
Research Technical Department, University of Nijmegen,
P.O. Box 9104, 6500 HE Nijmegen, The Netherlands
email:
[email protected]
Rob Eisinga
Department of Social Science Research Methods, University of Nijmegen,
P.O. Box 9104, 6500 HE Nijmegen, The Netherlands
email:
[email protected]
Philip Hans Franses
Econometric Institute, Erasmus University Rotterdam,
P.O. Box 1738, 3000 DR Rotterdam, The Netherlands
email:
[email protected]
Econometric Institute EI 2001-21
Abstract
This paper outlines a nonstationary, heterogeneous Markov model designed to
estimate entry and exit transition probabilities at the micro-level from a time
series of independent cross-sectional samples with a binary outcome variable.
The model has its origins in the work of Moffitt (1993) and shares features with
standard statistical methods for ecological inference. We show how ML estimates
of the parameters can be obtained by the method-of-scoring, how to estimate
time-varying covariate effects, and how to include non-backcastable variables in
the model. The latter extension of the basic model is an important one as it
strongly increases its potential application in a wide array of research contexts.
The example illustration uses survey data on American presidential vote
intentions from a five-wave panel study conducted by Patterson (1980) in 1976.
We treat the panel data as independent cross sections and compare the estimates
of the Markov model with the observations in the panel. Directions for future
work are discussed.
______________________________
Authors’ note: The data utilized in this paper were made available by the Inter-university
Consortium for Political and Social Research (ICPSR). The data for Presidential Campaign
Impact on Voters: 1976 Panel, Erie, Pennsylvania, and Los Angeles were originally collected by
Thomas E. Patterson. Neither the collector of the original data nor the Consortium bear any
responsibility for the analysis or interpretation presented in this paper. The program CrossMark,
which performs the ML estimation reported here, is available upon request, although it is not yet
completely documented.
1 Introduction
Surveys that trace the same units across occasions provide the most powerful sorts of data
for dynamic electoral analysis. However, on many political issues repeated observations
are simply unavailable and those panel data sets that do exist are typically of limited time
coverage. This shortcoming combined with potential drawbacks like nonrandom attrition
and conditioning limits the use of panel data for the analysis of long-term political change.
In the absence of suitable panel data, repeated cross-sectional (RCS) surveys may
provide a viable alternative. There exists an abundance of high-quality RCS data and many
are available for relatively long time periods. Given the importance of dynamics in
electoral studies and the paucity of panel data, it would be of great advantage if such data
could be used for the estimation of longitudinal models with a dynamic structure. The
objective of this paper is to explore those possibilities. Specifically, our purpose here is to
discuss a nonstationary, heterogeneous Markov model for the analysis of a binary
dependent variable in a time series of independent cross-sectional samples. The model has
its origins in the work of Moffitt (1993) and shares features with standard statistical
methods for ecological or cross-level inference as outlined, for example, by Achen and
Shively (1995) and King (1997). It offers the opportunity to estimate individual-level entry
and exit transition rates and to examine the effects of time-constant and time-varying
covariates on the hazards. Previous brief discussions of specific versions of the model
include Mebane and Wand (1997) and Pelzer, Eisinga, and Franses (2001).
The following section presents the basic Markov model for RCS data along with
parameter estimation and various extensions of Moffitt’s approach. Section 3 provides an
example application using panel data on American presidential vote intentions from a five-wave survey conducted by Patterson (1980) in 1976. We treat these data as independent
cross sections and compare the predictions of the Markov model for RCS data with the
actual transitions in the panel. The calibration results suggest that the model can provide a
useful tool for inferring individual-level transition probability estimates in the absence of
transition data in cross-sectional samples. A discussion of intended extensions of the
model concludes the paper.
2 Estimating transition probabilities with RCS data
It is assumed in the sequel that the population is closed with respect to in- and out-migration, that the responses are observed at evenly spaced discrete time intervals
$t = 1, 2, \ldots$, and that the samples at periods $t_j$ and $t_k$ are independent if $j \neq k$. The symbol
$it$ is commonly used to indicate repeated observations on the same sample element $i$. To
simplify notation, this paper uses the symbol $it$ to also index individuals in RCS samples.
Suppose we have a two-state Markov matrix of transition rates in which the cell
probabilities sum to unity across rows. For this 2x2 table, we define the following three
terms, where $Y_{it}$ denotes the value of the binary random variable $Y$ for unit $i$ at time point
$t$:
$$p_{it} = P(Y_{it} = 1), \quad \mu_{it} = P(Y_{it} = 1 \mid Y_{it-1} = 0) \quad \text{and} \quad \lambda_{it} = P(Y_{it} = 0 \mid Y_{it-1} = 1).$$
These marginal and conditional probabilities respectively give rise to the well-known flow
equation
$$E(Y_{it}) = p_{it} = \mu_{it}(1 - p_{it-1}) + (1 - \lambda_{it})\,p_{it-1} = \mu_{it} + \eta_{it}\,p_{it-1}, \qquad (1)$$
where ηit = 1 − λit − µit . This accounting identity is the elemental equation for estimating
dynamic models with repeated cross-sections as it relates the marginal probabilities pi at
t and t − 1 to the entry ( µit ) and exit ( λit ) transition probabilities. Clearly, the major
difficulty with using RCS data for dynamic analysis is that the surveys are ‘incomplete’ in
the sense that they do not assess directly the state-to-state transitions over time for each
individual unit. In RCS data one only observes at each of a number of occasions a different
sample of units and their current states, that is yit is observed but yit −1 is not. This
information gap implies that some identifying restrictions over i and/or t must be
imposed to estimate the unobserved transitions.
A rather restrictive approach frequently applied in the statistical literature is to a
priori assume that the transition probabilities are time-stationary and unit-homogeneous,
hence µit = µ and λit = λ for all i and t . It is easy to show that in this case the long-run
outcome of pit is pit = µ /( µ + λ ) as t goes to infinity. Some early references relating to
this steady-state model include those that estimate transition rates from aggregate
frequency data (Kalbfleisch and Lawless 1984, 1985, Lawless and McLeish 1984, Lee,
Judge, and Zellner 1970). The formulation has also been applied in various economic
studies (Topel 1983, McCall 1971), in the famous mover-stayer model of intragenerational job mobility (Bartholomew 1996, Goodman 1961), and in electoral studies on
voter transitions (Firth 1982). However, the assumption that individual differences in
transitions are not present in the population lacks plausibility in many applications.
Consequently, as noted by Hawkins and Han (2000), studies that assume a time-homogeneous Markov evolution with a common transition probability matrix have found
their estimates to be extremely inefficient.
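The long-run outcome stated above can be checked numerically. The following Python sketch iterates the flow equation (1) with constant hazards; the function name and the values chosen for µ and λ are illustrative assumptions, not estimates from this paper.

```python
def iterate_flow(mu, lam, p0=0.0, steps=200):
    """Iterate p_t = mu + (1 - lam - mu) * p_{t-1} with constant hazards."""
    p = p0
    for _ in range(steps):
        p = mu + (1.0 - lam - mu) * p
    return p


# Illustrative values only (hypothetical, not taken from the data).
mu, lam = 0.2, 0.1
steady_state = mu / (mu + lam)

# Both starting points should approach the same steady state.
print(iterate_flow(mu, lam, p0=0.0), iterate_flow(mu, lam, p0=0.9), steady_state)
```

Because |1 − λ − µ| < 1 for hazards strictly between 0 and 1, the influence of the starting value dies out geometrically, which is why the limit does not depend on the initial probability.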
A flexible approach that facilitates a more accurate representation of the transition
probabilities without imposing some presumed structure is provided by the reduced-form
dynamic version of eqn. (1). If we let the initial probability $p_{i0} = 0$ (or $t \to \infty$), it is
straightforward to show that the reduced form for $p_{it}$ is
$$p_{it} = \mu_{it} + \sum_{\tau=1}^{t-1} \mu_{i\tau} \prod_{s=\tau+1}^{t} \eta_{is}, \qquad (2)$$
where $\eta_{is} = 1 - \lambda_{is} - \mu_{is}$. By explicitly allowing for time-dependence and unit-heterogeneity, this reduced-form dynamic model is better suited to yield an informative
representation of the transition probability estimates. It will therefore be maintained in the
ensuing approach.
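As a check on the algebra, the reduced form in eqn. (2) can be compared with the recursion in eqn. (1) started at an initial probability of zero. The sketch below does this in Python for one unit; the function names and the hazard values are hypothetical and serve only as an illustration.

```python
def p_recursive(mu, lam):
    """Apply eqn. (1) period by period, starting from p_0 = 0.

    mu[k] and lam[k] hold the entry and exit probabilities of period k + 1.
    """
    p = 0.0
    for m, l in zip(mu, lam):
        p = m + (1.0 - l - m) * p
    return p


def p_reduced_form(mu, lam):
    """Evaluate eqn. (2): mu_t plus the sum over tau of mu_tau times the eta product."""
    t = len(mu)
    eta = [1.0 - l - m for m, l in zip(mu, lam)]
    total = mu[t - 1]
    for tau in range(1, t):                 # tau = 1, ..., t - 1
        prod = 1.0
        for s in range(tau + 1, t + 1):     # s = tau + 1, ..., t
            prod *= eta[s - 1]
        total += mu[tau - 1] * prod
    return total


# Hypothetical time-varying hazards for one unit over four periods.
mu = [0.30, 0.25, 0.40, 0.10]
lam = [0.20, 0.15, 0.05, 0.35]
print(p_recursive(mu, lam), p_reduced_form(mu, lam))
```

Note that the exit probability of the first period never enters either computation, because it is multiplied by the zero initial probability.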
The framework Moffitt (1993) proposed to estimate eqn. (2) is based on a simple
observation. While RCS data lack direct information on transitions in opinions, preferences,
choices and other individual behaviors, they often do provide a set of time-invariant and
time-varying covariates $X_{it}$ that affect the hazards. If so, the history of the covariates (i.e.,
$X_{it}, X_{it-1}, \ldots, X_{i1}$) can be employed to generate backward predictions for the transition
probabilities ($\mu_{it}, \mu_{it-1}, \ldots, \mu_{i1}$ and $\lambda_{it}, \lambda_{it-1}, \ldots, \lambda_{i2}$) and thus for the marginal
probabilities ($p_{it}, p_{it-1}, \ldots, p_{i1}$). Hence the key idea is to model the current and past $\mu_{it}$
and $\lambda_{it}$ in a regression setting as functions of current and backcasted values of time-invariant and time-varying covariates $X_{it}$. Parameter estimates for the covariates are
obtained by substituting the hazard functions into eqn. (2).
The hazard functions themselves are specified as $\mu_{it} = F(X_{it}\beta)$ and
$\lambda_{it} = 1 - F(X_{it}\beta^*)$, where $F$ - in the current paper - is the logistic link function. Hence, it
is assumed that
$$\operatorname{logit}(\mu_{it}) = X_{it}\beta \quad \text{and} \quad \operatorname{logit}(1 - \lambda_{it}) = X_{it}\beta^*,$$
where β and β * are two potentially different sets of parameters associated with two
potentially different sets of covariates X it . This regression setup offers the opportunity to
estimate transition probabilities that vary across both individuals and - if the model
includes time-varying covariates - time periods. Maximum likelihood estimates of β and
β * can be obtained by maximization of the log likelihood function
$$LL = \sum_{t=1}^{T} \sum_{i=1}^{n_t} \left[ y_{it} \log(p_{it}) + (1 - y_{it}) \log(1 - p_{it}) \right],$$
with respect to the parameters, where nt is the number of observations of cross section t
and T is the number of cross sections. As Moffitt (1993) notes, obtaining pit by means of
eqn. (2) is equivalent to ‘integrating out’ over all possible transition histories for each
individual i at time t to derive an expression for the marginal probability estimates. To
convey this idea, compare the contribution to the likelihood by the i th case at time point t
in panel data with the likelihood contribution in RCS data. For a first-order transition model
of binary recurrent events the contribution can be written as
$$L_{it}(\beta, \beta^*) = \mu_{it}^{\,y_{it}(1 - y_{it-1})} \, (1 - \lambda_{it})^{\,y_{it} y_{it-1}} \, (1 - \mu_{it})^{\,(1 - y_{it})(1 - y_{it-1})} \, \lambda_{it}^{\,(1 - y_{it}) y_{it-1}}$$
(e.g., Stott 1997). Hence, conditional on $y_t$ and $y_{t-1}$, the likelihood contribution
simplifies to a single transition probability estimate. For RCS data with a binary outcome,
however, the contribution from the $i$th case is given by
$$L_{it}(\beta, \beta^*) = \left[ \mu_{it}(1 - p_{it-1}) + (1 - \lambda_{it})\,p_{it-1} \right]^{y_{it}} \left[ (1 - \mu_{it})(1 - p_{it-1}) + \lambda_{it}\,p_{it-1} \right]^{1 - y_{it}}.$$
In this formulation the likelihood contribution does not collapse to a single rate estimate
but rather to a weighted sum of two hazards. Also note from this comparison that estimates
of the parameters of the hazard functions in RCS data are likely to be less efficient than they
would be in a comparable panel data set. To summarize the model a graphical presentation
is given in Figure 1, omitting the subscript i for clarity.
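The comparison between the panel and the RCS likelihood contributions can be made concrete with a small Python sketch. It shows that the RCS contribution is the panel contribution averaged over the unobserved previous state, weighted by the previous marginal probability. The function names and the numerical values are illustrative assumptions.

```python
import math


def panel_contribution(y, y_prev, mu, lam):
    """Contribution when the previous state y_prev is observed (panel data)."""
    return (mu ** (y * (1 - y_prev))
            * (1 - lam) ** (y * y_prev)
            * (1 - mu) ** ((1 - y) * (1 - y_prev))
            * lam ** ((1 - y) * y_prev))


def rcs_contribution(y, p_prev, mu, lam):
    """Contribution when only the previous marginal probability p_prev is available."""
    p = mu * (1 - p_prev) + (1 - lam) * p_prev      # eqn. (1)
    return p ** y * (1 - p) ** (1 - y)


def rcs_loglik(cases):
    """Sum of log contributions; cases holds (y, p_prev, mu, lam) tuples."""
    return sum(math.log(rcs_contribution(*case)) for case in cases)


# Hypothetical values for one unit.
mu, lam, p_prev = 0.3, 0.2, 0.6
for y in (0, 1):
    mixture = ((1 - p_prev) * panel_contribution(y, 0, mu, lam)
               + p_prev * panel_contribution(y, 1, mu, lam))
    print(y, rcs_contribution(y, p_prev, mu, lam), mixture)
```

The two printed columns agree for each outcome, which is the sense in which the unobserved transition history is integrated out.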
Figure 1 about here
The marginal probability pit depends on the set of all possible transition histories for each
individual i up to time t . The unobserved transition probabilities in their turn are modeled
as functions of current and backcasted values of time-invariant and time-varying covariates
X it . As Mebane and Wand (1997) point out, an important characteristic of the model is
that the transition probabilities are estimated as a function of all the available cross-sectional samples rather than simply the observations from the current time period. This
full information strategy expresses the notion that in RCS data different groups of
individuals are observed over time, but individuals with similar covariate values are
exchangeable in the sense that their transition histories are assumed to be identical.
2.2 Extensions and modifications of the basic model
ML estimation. Moffitt (1993) offers no discussion of the computation of the maximum
likelihood parameter estimates. A convenient optimization technique, implemented in our
program CrossMark, is Fisher’s method-of-scoring (Amemiya 1981). If we suppress the
subscript i for the moment to avoid cumbersome notation and define p0 = 0 , the first
order partial derivatives of LL with respect to the parameters β and β * are easily
established as
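$$\frac{\partial LL}{\partial \beta} = \frac{\partial LL}{\partial p_t} \cdot \frac{\partial p_t}{\partial \beta} = \frac{y_t - p_t}{p_t(1 - p_t)} \cdot \left[ \frac{\partial p_{t-1}}{\partial \beta}\,\eta_t + \frac{\partial \mu_t}{\partial \beta}\,(1 - p_{t-1}) \right]$$
and
$$\frac{\partial LL}{\partial \beta^*} = \frac{\partial LL}{\partial p_t} \cdot \frac{\partial p_t}{\partial \beta^*} = \frac{y_t - p_t}{p_t(1 - p_t)} \cdot \left[ \frac{\partial p_{t-1}}{\partial \beta^*}\,\eta_t - \frac{\partial \lambda_t}{\partial \beta^*}\,p_{t-1} \right],$$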
where $\partial \mu_t / \partial \beta = x_t \mu_t (1 - \mu_t)$ and $\partial \lambda_t / \partial \beta^* = -x_t \lambda_t (1 - \lambda_t)$. The ML estimators are the
values of the parameters for which the efficient scores are zero, i.e.,
$\partial LL / \partial \beta = \partial LL / \partial \beta^* = 0$. Let $\theta$ denote the stacked column vectors $\beta$ and $\beta^*$, then the
method-of-scoring uses the iterative estimation algorithm
$$\hat{\theta}^{(k+1)} = \hat{\theta}^{(k)} + \varepsilon\,[\hat{I}(\hat{\theta}^{(k)})]^{-1}\,\bigl(\partial LL(\hat{\theta}^{(k)}) / \partial \theta\bigr).$$
The parameter $\varepsilon$ denotes an appropriate
step length that scales the parameter increments and $\hat{I}(\hat{\theta}^{(k)})$ is an estimate of the Fisher
information matrix $I(\theta) = -E[\partial^2 LL(\theta) / \partial \theta\, \partial \theta']$ evaluated at $\theta = \hat{\theta}^{(k)}$, where $\partial^2 LL(\theta) / \partial \theta\, \partial \theta'$ is
the Hessian. The method-of-scoring also provides, by design, an estimate of the asymptotic
variance-covariance matrix of the model parameters, given by the inverse of the estimated
Fisher information matrix evaluated at the values of the maximum likelihood estimates.
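The score recursion and one scoring update can be sketched as follows. This is a minimal Python illustration under simplifying assumptions, not the CrossMark implementation: each hazard has a single scalar covariate and a single coefficient, the initial probability is zero, and the information matrix is the expected information for independent Bernoulli outcomes, i.e., the sum of the outer products of the derivatives of p divided by p(1 − p). All function names are hypothetical.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_and_grad(x_hist, beta, beta_star):
    """Return p_t and its derivatives w.r.t. beta and beta_star for one unit.

    x_hist holds the scalar covariate values x_1, ..., x_t; p_0 = 0.
    """
    p, dp_b, dp_bs = 0.0, 0.0, 0.0
    for x in x_hist:
        mu = logistic(x * beta)
        lam = 1.0 - logistic(x * beta_star)
        eta = 1.0 - lam - mu
        dmu = x * mu * (1.0 - mu)            # d mu_t / d beta
        dlam = -x * lam * (1.0 - lam)        # d lambda_t / d beta*
        # The right-hand sides use p_{t-1} and its derivatives.
        new_dp_b = dp_b * eta + dmu * (1.0 - p)
        new_dp_bs = dp_bs * eta - dlam * p
        p = mu + eta * p
        dp_b, dp_bs = new_dp_b, new_dp_bs
    return p, dp_b, dp_bs


def score_and_information(data, beta, beta_star):
    """data: list of (y, x_hist). Returns the score vector and the information matrix."""
    g = [0.0, 0.0]
    info = [[0.0, 0.0], [0.0, 0.0]]
    for y, x_hist in data:
        p, db, dbs = p_and_grad(x_hist, beta, beta_star)
        w = 1.0 / (p * (1.0 - p))
        d = (db, dbs)
        for a in range(2):
            g[a] += (y - p) * w * d[a]
            for b in range(2):
                info[a][b] += w * d[a] * d[b]
    return g, info


def scoring_step(data, beta, beta_star, eps=1.0):
    """One update theta + eps * inverse(information) * score, using a 2x2 inverse."""
    g, info = score_and_information(data, beta, beta_star)
    det = info[0][0] * info[1][1] - info[0][1] * info[1][0]
    step_b = (info[1][1] * g[0] - info[0][1] * g[1]) / det
    step_bs = (-info[1][0] * g[0] + info[0][0] * g[1]) / det
    return beta + eps * step_b, beta_star + eps * step_bs
```

In practice the update would be repeated until the score is numerically zero, with the step length reduced whenever the log likelihood fails to increase. Units observed only in the first period carry no information on the exit parameter, so the information matrix is invertible only if later cross sections are present.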
Non-backcastable covariates. The estimation strategy proposed by Moffitt (1993) involves
searching the cross-sectional data files for variables taking known values in the past.
Clearly, time-invariant characteristics such as sex, race, cohort, completed education,
etcetera are candidates and time-specific aggregates measurable in the past may also enter
the model. But variables like age are usable too, as are age-related variables such as the
number of children at different ages, since knowledge of the current age implies
knowledge of age in any past year. Given current information, the values of age and of the
time-invariant variables in preceding years are therefore known.
In many application settings, however, we have time-dependent covariates that the
basic model would omit because the past histories are unknown. To incorporate these
‘non-backcastable’ variables, we adopt a model with two different sets of parameters for
both $\mu_{it}$ and $\lambda_{it}$, i.e., one for the current transition probability estimates and a separate
one for the preceding ones. Define $Z_{it}$ as a vector of non-backcastable variables that takes
its observed value for cross section $t$ and is set to $Z_{it} = 0$ for the cross sections $t-1, \ldots, 1$, and $\zeta$ as the
associated parameter vector. One can then write
$$\operatorname{logit}(\mu_{it}) = \begin{cases} X_{it}\beta^{**} + Z_{it}\zeta & \text{for } t \\ X_{it}\beta & \text{for } t-1, \ldots, 1, \end{cases}$$
where $\beta^{**} = \beta + \beta^{+}$. A similar model with non-backcastable covariate effects on the exit
rates may be specified for $\lambda_{it}$. This specification offers the opportunity to express the
current transition probability estimates as a logistic function of both backcastable and non-backcastable variables. The expression obviously also affords a test, useful for efficiency
gains, of the hypothesis $H_0: \beta^{+} = 0$, using the restriction $\beta^{**} = \beta$.
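The two-regime specification for the entry rate can be sketched in Python as follows. The function and argument names are hypothetical, and the covariate and coefficient values are made up for illustration.

```python
import math


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


def dot(a, b):
    return sum(u * v for u, v in zip(a, b))


def entry_probability(x, beta, current, z=None, beta_plus=None, zeta=None):
    """Entry probability for one unit in one period.

    current=True : logit(mu) = x (beta + beta_plus) + z zeta   (cross section t)
    current=False: logit(mu) = x beta                          (periods t-1, ..., 1)
    """
    if not current:
        return logistic(dot(x, beta))
    beta_current = [b + bp for b, bp in zip(beta, beta_plus)]
    return logistic(dot(x, beta_current) + dot(z, zeta))


# Hypothetical unit: intercept plus one backcastable covariate, one non-backcastable covariate.
x = [1.0, 0.5]
beta = [-0.4, 0.8]
mu_past = entry_probability(x, beta, current=False)
mu_now = entry_probability(x, beta, current=True, z=[2.0],
                           beta_plus=[0.0, 0.0], zeta=[0.6])
print(mu_past, mu_now)
```

With the additional coefficients and the non-backcastable effects all set to zero, the current and past specifications coincide, which is the restriction tested by the hypothesis above.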
Time-varying covariate effects. Another potential drawback of the basic model is that it
assumes that the effects of the covariates are fixed over time. This restriction may not be
valid for long time periods and thus potentially biases the estimated effects. Of course,
modifying continually the values of the parameters - so as to allow the model to adapt itself
to ‘local’ conditions - produces problems of overparametrization. We aim to avoid such
problems by assuming the parameters to be constant across a limited number of time
periods. An alternative specification, not pursued in this paper, is to allow the regression
coefficients to become polynomials in time using the expression
$\beta_t = \gamma_0 + \gamma_1 t + \gamma_2 t^2 + \cdots + \gamma_d t^d$, where $d$ is a positive integer specifying the degree of the
polynomial. For this parametric setup, too, it will be desirable to have models with low
degree polynomials that avoid nonexistence of unique ML estimates.
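Both ways of letting a coefficient vary over time can be written compactly. The Python sketch below uses hypothetical function names; the polynomial coefficients are made up, while the piecewise-constant values are the effects of favorable feelings towards Carter quoted in Section 3.1 (.35 at t = 2, 3 and 1.14 at t = 4, 5).

```python
def beta_polynomial(t, gammas):
    """beta_t = gamma_0 + gamma_1 t + ... + gamma_d t^d, evaluated by Horner's rule."""
    value = 0.0
    for g in reversed(gammas):
        value = value * t + g
    return value


def beta_piecewise(t, segments):
    """Coefficient held constant within each group of periods.

    segments is a list of (periods, value) pairs, e.g. ((2, 3), 0.35).
    """
    for periods, value in segments:
        if t in periods:
            return value
    raise ValueError("no coefficient defined for period %r" % (t,))


# Hypothetical quadratic trend in a coefficient.
print(beta_polynomial(2, [1.0, 0.5, 0.25]))

# Piecewise-constant effect as reported for feelings towards Carter.
carter = [((2, 3), 0.35), ((4, 5), 1.14)]
print([beta_piecewise(t, carter) for t in (2, 3, 4, 5)])
```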
First observed outcome. Moffitt (1993) defines the first observed outcome of the process,
P(Yi1 = 1) , to equal the transition probability µi1 . However, in many applications it will be
more plausible to take P(Yi1 = 1) to equal the state probability pi1 . That is, one
conveniently assumes that the Yi1 's are random variables with a probability distribution
$P(Y_{i1} = 1) = F(X_{it}\delta)$, where $\delta$ is a set of parameters to be estimated and $F$ is the logistic
link function. The $\delta$-parameters for the first observed outcome at $t = 1$ are estimated
simultaneously with the entry and exit parameters of interest at $t = 2, \ldots, T$. Note once
again that the probability vector at the beginning of the Markov chain is estimated as a
function of all cross-sectional data, rather than simply the observations at t = 1.
Unequal sample sizes. We may also relax the implicit assumption that the cross-sections at
each time t are of the same sample size. To ensure a potentially equal contribution of the
cross-sectional samples to the likelihood, we use the weighted log likelihood function
$$LL^{*} = \sum_{t=1}^{T} \sum_{i=1}^{n_t} w_i \left[ y_{it} \log(p_{it}) + (1 - y_{it}) \log(1 - p_{it}) \right],$$
where $w_i = n / n_t$, with $n = \sum_{t=1}^{T} n_t / T$, $n_t$ is the number of observations of cross section $t$ and $T$ is the number
of cross sections.
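The effect of the weights is easy to verify: every cross section receives the same total weight, equal to the average sample size. A minimal Python sketch, with a hypothetical function name and made-up sample sizes:

```python
def section_weights(sizes):
    """Weight n / n_t for each cross section, where n is the mean sample size."""
    n = sum(sizes) / len(sizes)
    return [n / n_t for n_t in sizes]


sizes = [100, 200, 300]                 # hypothetical n_t
weights = section_weights(sizes)
totals = [n_t * w for n_t, w in zip(sizes, weights)]
print(weights, totals)
```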
Shrinking logical bounds. The partition equation (1) implies the familiar restriction,
customarily attributed to Duncan and Davis (1953),
$$\mu_{it} = \frac{p_{it}}{1 - p_{it-1}} - \frac{p_{it-1}}{1 - p_{it-1}}\,\kappa_{it} \quad \text{and} \quad \kappa_{it} = \frac{p_{it}}{p_{it-1}} - \frac{1 - p_{it-1}}{p_{it-1}}\,\mu_{it},$$
where κ it = 1 − λit . These identities were used by King (1997) to construct a so-called
tomography plot. The axes of this plot represent the parameters κ it and µit and the linear
constraint on each individual i inherent in eqn. (1) is represented by a tomography line
with intercept $p_{it}/(1 - p_{it-1})$ and slope $-p_{it-1}/(1 - p_{it-1})$ that goes through the point
$(\kappa_{it}, \mu_{it})$. The lines have a limited range of angles (i.e., all have a negative slope) and they
all intersect the 45° line of $\mu_{it} = \kappa_{it}$ at $(p_{it}, p_{it})$. Since the estimated probabilities are
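guaranteed to lie in the $(0,1)$ range, we have that $\mu_{it} \in (L\mu_{it}, U\mu_{it})$ and
$\kappa_{it} \in (L\kappa_{it}, U\kappa_{it})$, where the lower ($L$) and upper ($U$) bounds of these intervals are
defined by the min and max operators
$$L\mu_{it} = \max\left(0, \frac{p_{it} - p_{it-1}}{1 - p_{it-1}}\right) \le \mu_{it} \le \min\left(\frac{p_{it}}{1 - p_{it-1}}, 1\right) = U\mu_{it}$$
and
$$L\kappa_{it} = \max\left(0, \frac{p_{it} - (1 - p_{it-1})}{p_{it-1}}\right) \le \kappa_{it} \le \min\left(\frac{p_{it}}{p_{it-1}}, 1\right) = U\kappa_{it}$$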
(King 1997). Hence the estimated values of µit and κ it are constrained to lie on that part
of the tomography line that intersects the feasible region defined by the logical boundary
points. Since the limits are related by
$$U\kappa_{it} = \frac{p_{it}}{p_{it-1}} - \frac{1 - p_{it-1}}{p_{it-1}}\,L\mu_{it} \quad \text{and} \quad L\kappa_{it} = \frac{p_{it}}{p_{it-1}} - \frac{1 - p_{it-1}}{p_{it-1}}\,U\mu_{it},$$
the tomography lines correspond to the main diagonal of the rectangular region defined by
the lower and upper bounds. Because the estimates produced are restricted to lie on the
diagonal they satisfy $\kappa_{it} = a_{it} - b_{it}\mu_{it}$, where $a_{it} = (U\mu_{it}U\kappa_{it} - L\mu_{it}L\kappa_{it})(U\mu_{it} - L\mu_{it})^{-1}$
and $b_{it} = (U\kappa_{it} - L\kappa_{it})(U\mu_{it} - L\mu_{it})^{-1}$ (Chambers and Steel 2001).
Our estimation procedure implicitly takes into account the bounds and thereby
restricts the range of feasible estimates of µit and κ it . This is accomplished simply by
constraining the individual probabilities to lie within the admissible range (0,1) . Clearly,
explicit assumptions about the relative magnitude of µit and κ it would allow one to
narrow the bounds beyond the logical limits. For example, in studies of US interparty
electoral transition it may be assumed, in the spirit of Shively (1991), that the probability
that a Democrat at t − 1 repeats a vote for that party at t is greater than the probability that
a non-Democrat at t − 1 shifts to the Democrats at t . This assumption translates into the
restriction that κ it > µit (i.e., ηit > 0 ). Such a restriction is difficult to justify in general,
however, and we would not expect it to be the case for every single voter. Because there is
also no algebraic requirement in eqn. (1) that ηit > 0 , we would not recommend using this
assumption universally. Also note that if the entry and 1-exit transitions are equal to each
other (i.e., µit = κ it ), identity (1) reduces to pit = µit .
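The logical bounds and the diagonal representation can be computed directly from the two marginal probabilities. The Python sketch below uses hypothetical function names and made-up marginals; it also confirms that the diagonal coefficients reproduce the intercept and slope implied by eqn. (1) when solved for the probability of remaining in state 1.

```python
def logical_bounds(p_t, p_prev):
    """Lower and upper bounds on mu and kappa implied by eqn. (1)."""
    l_mu = max(0.0, (p_t - p_prev) / (1.0 - p_prev))
    u_mu = min(p_t / (1.0 - p_prev), 1.0)
    l_kappa = max(0.0, (p_t - (1.0 - p_prev)) / p_prev)
    u_kappa = min(p_t / p_prev, 1.0)
    return l_mu, u_mu, l_kappa, u_kappa


def diagonal(p_t, p_prev):
    """Coefficients a and b of kappa = a - b * mu on the feasible rectangle."""
    l_mu, u_mu, l_kappa, u_kappa = logical_bounds(p_t, p_prev)
    a = (u_mu * u_kappa - l_mu * l_kappa) / (u_mu - l_mu)
    b = (u_kappa - l_kappa) / (u_mu - l_mu)
    return a, b


# Hypothetical marginal probabilities at t and t - 1.
p_t, p_prev = 0.5, 0.4
print(logical_bounds(p_t, p_prev))
print(diagonal(p_t, p_prev), (p_t / p_prev, (1.0 - p_prev) / p_prev))
```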
Quantities of interest. The model presented above may be used for different purposes. One
is to understand the individual level relation between covariate effects and transitions in a
binary response variate, under Markov assumptions. Another potential goal is to estimate
transition probabilities when individual sequence information is not available. The
empirical application below illustrates how the model can be used to provide information
on individual electoral transitions and the role of voting-related covariates when exact
voting sequences are unknown. While our illustrative example uses bimonthly panel data,
the model is designed for estimating transition probabilities from repeated cross
sections covering relatively long-term periods. An example in which such a formulation is
most relevant is the analysis of the labor force participation decisions of Dutch
women over the 1986-1995 period by Pelzer, Eisinga and Franses (2001).
3 Application
Our empirical illustration employs election-year panel data on US presidential vote
intention drawn from the campaign study conducted by Patterson (1980) in Erie, PA, and
Los Angeles, CA, in 1976. These five-wave bimonthly panel data were also used by
Sigelman (1991) in his panel ecological inference study. Obviously, the purpose of this
example is to illustrate the model rather than to provide a definitive analysis of the data.
The panel data were treated as if they were a temporal sequence of cross sections of the
electorate. That is, no information on the cov ( yt , y t −1 ) is available in the data file used for
analysis. The application uses panel data because they provide a check of the ability of the
Markov approach to recover known party-switching transitions. Some caution is warranted
in interpreting the results, however, as the individual transition probability estimates are
based on observations that are not independent. Consequently, in this particular application
the variance-covariance matrix of the first derivatives may not be a consistent estimator of
the Hessian and hence the parameter standard errors. The binary outcome variable y it is
defined to equal 1 if the voter i prefers the Democratic party or candidate (i.e., Carter) at
time period t and 0 otherwise (i.e., Republican party or candidate (Ford) and others).
Table 1 about here
Table 1 provides some summary descriptive statistics. It gives the number of observations
including panel inflow and outflow, the marginal distribution of y it over time, and the
observed entry and exit transition rates in the panel. The table shows that, despite
substantial bimonthly turnover with values ranging from .138 to .248, almost half of the
respondents continue to prefer the Democratic presidential candidate. The bottom part of
the table presents the (non)participation patterns across the five waves of data and the
number of sample members attriting from the panel. Because some nonrespondents from
one wave are recruited back into the sample at subsequent waves, both monotone and
nonmonotone attrition patterns arise. It is important to note that the analysis includes both
attritors and nonattritors.
Next to voting intention, the survey provides information on socio-demographic
characteristics and attitudes towards presidential candidates. The analysis presented here
uses only covariates that would generally be available in repeated cross-sectional surveys.
As backcastable variables, the analysis employs vote choice at the preceding election (i.e.,
whether the respondent voted for either Nixon or McGovern in 1972), race, education, age, and
sex. All these covariates are assumed to be fixed over the surveys’ duration. In addition to
these time-constant variables the analysis includes several non-backcastable covariates.
These include (i) whether the respondents identify themselves as Democrat or not, (ii)
responses to the statements “It doesn’t make much difference whether a Republican or a
Democrat is elected President” and “All in all, Gerald Ford has done a good job as
President”, (iii) measures of (un)favorable feelings towards the candidates Ford and Carter,
and (iv) opinions about their specific qualities, i.e., very (un)trustworthy, excellent/poor
leader, and great deal of/almost no ability. The responses to the two statements and the
candidate images were all registered on seven-point Likert-type scales, running from
“strongly disagree” to “strongly agree”, from “unfavorable” to “favorable”, etcetera.
3.1 Model estimation
First a time-stationary Markov model with constant terms only was applied to the data.
This model produced the parameters β ( µ t >1 ) = -.238 and β * (λt >1 ) = .034 and a
corresponding maximum log likelihood value of LL* = -2643.56. These estimates imply
constant transition rates of µ = .44 and λ = .51; hence implausibly high values that amply
exceed the observed rates reported in Table 1. The model was thereupon extended to a
nonstationary, heterogeneous Markov model (model 1) by including the backcastable
covariates reported above. The results are shown in Table 2.
Table 2 about here
The parameters in the first column show the effects of the backcastable variables on the
probability of a Democratic vote at t = 1 , pi1 , estimated for all observations. As can be
seen, the parameters are well determined with a Democratic preference positively affected
by being black and a vote for McGovern in 1972 and negatively by education and a vote
for Nixon at the 1972 election. The second column of Table 2 presents the effects of the
backcastable variables on the transitions from non-Democratic (i.e., Republican and
others) to Democratic. Whereas a previous vote for McGovern is significant in
encouraging entry into a Democratic preference, education, age, and a 1972 vote for Nixon
negatively affect the entry decision. The third column gives the effects on the transitions
into non-Democratic. We find that the exit rates are negatively affected by a vote for
McGovern in 1972, being black and age and positively by sex (female).
The right-hand side of the table (model 2) reports the regression estimates of a
transition model that also includes the non-backcastable variables with unknown covariate
history. Wald and likelihood ratio tests revealed no significant difference between the
effects of the backcastable variables on the current transitions and their effects on the past
transitions. The table therefore presents a single parameter for the backcastable covariates.
Further, because there are substantive arguments to anticipate that the effects of the non-backcastable covariates may vary over the period leading up to the election, several tests
with different time-varying-coefficient models of varying degrees of simplicity were
applied to the data. The model shown in Table 2 describes the data best in terms of
goodness-of-fit. The likelihood-ratio statistic may also be computed to assess the statistical
significance of the improvement in fit that results from including the non-backcastable
variables. But it is clear from the log likelihood values in Table 2 that the enlarged model
provides a much better fit.
The columns pertaining to model 2 again show the estimated effects on the state
probability pi1 . Whereas the effects of a 1972 vote for McGovern and identification with
the Democrats turn out to be positive, the effects of a vote for Nixon, favorable feelings
towards Ford and indifference towards the future president’s leaning are negative. The last
two columns of Table 2 provide the effects on the entry and exit rates respectively with
respect to a Democratic vote. The columns labeled t indicate the time periods pertaining
to the (time-varying) parameters. For example, favorable feelings towards Carter have an
effect of .35 at t = 2, 3 and an effect of 1.14 at t = 4, 5. Most of the parameters are again
well determined and consistent with those commonly reported in the literature. In short,
positive attitudes towards the Republican (Democratic) candidate Ford (Carter) decrease
(increase) the entry rates and increase (decrease) the exit rates. The more strongly respondents
think of themselves as being Democrat, the higher (lower) their entry (exit) transition rates.
The tomography lines for one time period are singled out for discussion purposes.
Figure 2 shows for all $i$ at $t = 5$ the lines $\mu_{i5} = p_{i5}/(1 - p_{i4}) - [p_{i4}/(1 - p_{i4})]\,\kappa_{i5}$, where
$\kappa_{i5} = 1 - \lambda_{i5}$.
Figure 2 about here
The 691 lines all have a negative slope and they all intersect the 45° line of $\mu_{i5} = \kappa_{i5}$ at
$(p_{i5}, p_{i5})$. The permissible range of the parameters for an individual can be obtained by
projecting each line onto the horizontal (for $\kappa_{i5}$) and vertical (for $\mu_{i5}$) axes. Note that
while most of the point estimates are below the 45° line, for a substantial number of the
estimates µi 5 exceeds κ i 5 . In fact, almost 25% of the observations fail to conform to the
restriction that κ it > µit . Hence incorporating the external substantive assumption that
party loyalty rates are greater than defection rates would most likely lead to incorrect
conclusions. Visual inspection of the figure also suggests a strong relationship between
µi 5 and κ i 5 , with low (high) entry rates corresponding with high (low) exit rates. Also
note that most of the predictions tend to the basically ideal situation of either extremely
high or extremely low transition probability estimates.
3.2 Model validation
To understand how well the model reproduces the panel observations we may examine its
efficacy in a variety of ways. One is to assess the fit of the model in terms of prediction
errors, using the panel data and various summary measures, i.e., the mean squared error
(MSE), the mean value of minus log likelihood error (MML), and the mean probability of
correct allocation (MCA). Details are given in Table 3.
Table 3 about here
The prediction error measures can be seen as analogues to the R-squared measure in OLS
regression. The MSE and MML tend to zero if $\mu_{it}$ ($\lambda_{it}$) tends to 0 or 1 and the smaller the
error rate, the better the model predicts. Table 3 indicates that the mean squared errors and
the mean minus log likelihoods are remarkably low and gradually lean to the ideal situation
of perfect separation between the yit = 0 and yit = 1 groups. The average probability of
correct allocation also reveals that the ability of the model to recover the observed
transitions is very good, ranging from a low of .736 to a high of .899. Note that the
summary measures suggest that the model does somewhat better in terms of predicting
entry than it does in predicting exit.
Another way to examine the performance of the model is to compare the actual
sample frequency of all possible bimonthly (0,1) voting sequences with the estimated
expected frequency of each sequence. The latter were computed as follows. With T sample
periods, we have $\sum_{t=1}^{T} 2^{t}$ different (0,1) sequences (which in the present application equals
62) ranging in length from 1 (e.g., ‘0’) to T (e.g., ‘11111’). We define the probability of a
sequence of length t for each observation i of cross section t as

$$\tilde{p}_i(\tilde{y}_1,\ldots,\tilde{y}_t) = P(Y_{i1}=\tilde{y}_1 \cap \cdots \cap Y_{it}=\tilde{y}_t),$$

where $\tilde{y}_1,\ldots,\tilde{y}_t = 0,1$. Hence

$$\tilde{p}_i(\tilde{y}_1) = P(Y_{i1}=\tilde{y}_1) = \tilde{y}_1 p_{i1} + (1-\tilde{y}_1)(1-p_{i1}),$$

where $p_{i1}$ is $P(Y_{i1}=1)$. For $t>1$, we have

$$\tilde{p}_i(\tilde{y}_1,\ldots,\tilde{y}_t) = \tilde{p}_i(\tilde{y}_1)\prod_{\tau=2}^{t}(p_{00}+p_{01}+p_{10}+p_{11}),$$

where
$p_{00} = (1-\tilde{y}_{\tau-1})(1-\tilde{y}_{\tau})(1-\mu_{i\tau})$,
$p_{01} = (1-\tilde{y}_{\tau-1})\,\tilde{y}_{\tau}\,\mu_{i\tau}$,
$p_{10} = \tilde{y}_{\tau-1}(1-\tilde{y}_{\tau})\,\lambda_{i\tau}$, and
$p_{11} = \tilde{y}_{\tau-1}\,\tilde{y}_{\tau}(1-\lambda_{i\tau})$.
The mean value of $\tilde{p}_i(\tilde{y}_1,\ldots,\tilde{y}_t)$ for all observations of cross section t
was obtained as

$$\tilde{p}(\tilde{y}_1,\ldots,\tilde{y}_t) = \sum_{i=1}^{n_t}\tilde{p}_i(\tilde{y}_1,\ldots,\tilde{y}_t)\,/\,n_t .$$

The estimated expected absolute frequency $\tilde{f}(\tilde{y}_1,\ldots,\tilde{y}_t)$ of each participation
sequence was thereupon computed by evaluating

$$\tilde{f}(\tilde{y}_1,\ldots,\tilde{y}_t) = \tilde{p}(\tilde{y}_1,\ldots,\tilde{y}_t)\,n_t .$$
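The computation described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the dictionary layout for the entry probabilities µ and exit probabilities λ, and the two toy respondents are all hypothetical. For each respondent the sketch starts from the initial probability and multiplies in the transition probability that applies at each later period, then averages over respondents and scales by the sample size.

```python
from itertools import product

def sequence_probability(seq, p1, mu, lam):
    """Probability of the 0/1 sequence `seq` for one respondent.

    p1  : P(Y_i1 = 1)
    mu  : dict, period tau -> entry probability  P(Y_tau = 1 | Y_{tau-1} = 0)
    lam : dict, period tau -> exit probability   P(Y_tau = 0 | Y_{tau-1} = 1)
    """
    prob = p1 if seq[0] == 1 else 1.0 - p1
    for tau in range(2, len(seq) + 1):
        prev, cur = seq[tau - 2], seq[tau - 1]
        if prev == 0:
            # From state 0: enter with mu, stay out with 1 - mu.
            prob *= mu[tau] if cur == 1 else 1.0 - mu[tau]
        else:
            # From state 1: exit with lam, stay in with 1 - lam.
            prob *= lam[tau] if cur == 0 else 1.0 - lam[tau]
    return prob

def expected_frequencies(respondents, t):
    """Expected absolute frequency of every length-t sequence, given a
    list of (p1, mu, lam) tuples for the respondents of cross section t."""
    n_t = len(respondents)
    freqs = {}
    for seq in product((0, 1), repeat=t):
        mean_p = sum(sequence_probability(seq, *r) for r in respondents) / n_t
        freqs[seq] = mean_p * n_t
    return freqs

# Number of distinct sequences of length 1 to T for T = 5 sample periods.
n_sequences = sum(2 ** t for t in range(1, 6))

# Hypothetical toy data: two respondents observed at cross section t = 2.
toy = [(0.5, {2: 0.2}, {2: 0.1}),
       (0.3, {2: 0.4}, {2: 0.25})]
freqs = expected_frequencies(toy, t=2)
```

Because the probabilities of all sequences of a given length sum to one for each respondent, the expected frequencies for length t sum to the sample size of cross section t.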
An initial examination of the frequencies is to compare the expected with the
observed first-order transitions (i.e., yt −1 , yt ) over the time period of our data. Before
embarking on the findings it is important to note that while model 2 predicts the current
probabilities at time point t (i.e., pit , µit and λit ) very well, it does not in general
reproduce the past probabilities at t − 1 , t − 2 , etc. equally well. The reason is that the
past probabilities are predicted by the backcastable variables only and these are not very
good predictors. This obviously hampers the estimation of the expected frequencies. We
therefore decided to ‘backcast’ the nonbackcastable variables a single time period, by
assuming them to be constant for the two consecutive cross sections at t − 1 and t, and
subsequently computed the expected frequencies. Table 4 shows the frequencies of
the observed and the estimated expected first-order voter transitions between parties.
Table 4 about here
As can be seen, both the observed and the predicted frequencies are concentrated in the
continuous Democratic vote (11) and the continuous non-Democrat vote (00) categories.
Also note that the partisan changes seem to decline over time leading up to the presidential
election. Further, the discrepancies between the predicted and the observed frequencies are
all relatively small and not significant at the .05 level. This implies that both loyal and
defection categories are predicted well.
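As a rough check on these discrepancies, the observed and estimated expected counts in Table 4 can be compared with a Pearson chi-square statistic. The sketch below is an assumption about how such a statistic might be computed; the paper does not state the exact formula or the degrees of freedom used. The counts are the rounded frequencies read from Table 4, in the order (00), (01), (10), (11), and the variable names are hypothetical.

```python
# Observed and estimated expected transition counts read from Table 4,
# keyed by sample period t, in the order (00), (01), (10), (11).
table4 = {
    2: ([309, 102, 46, 213], [296, 104, 51, 219]),
    3: ([279, 57, 54, 253], [270, 73, 47, 253]),
    4: ([271, 69, 69, 233], [269, 70, 64, 239]),
    5: ([288, 47, 39, 243], [305, 42, 34, 236]),
}

def pearson_chi2(obs, exp):
    """Pearson statistic: sum over cells of (obs - exp)^2 / exp."""
    return sum((o - e) ** 2 / e for o, e in zip(obs, exp))

chi2 = {t: pearson_chi2(obs, exp) for t, (obs, exp) in table4.items()}

# Assumption: 3 degrees of freedom (four cells constrained by their total),
# for which the .05 critical value of the chi-square distribution is 7.815.
CRITICAL_05_DF3 = 7.815
all_below = all(value < CRITICAL_05_DF3 for value in chi2.values())
```

Computed from the rounded counts, the statistics are approximately 1.3, 4.8, 0.6, and 2.5 for t = 2 to 5. They are close to, but not identical with, the values of 1.2, 5.0, 0.7, and 2.7 reported in Table 4, presumably because the table frequencies are rounded. Under the assumed 3 degrees of freedom, all remain below the critical value.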
A final examination of the goodness-of-fit reported here is to compare the
estimated expected and actually observed absolute frequencies of all 62 (0,1) voting
sequences. They are tabulated in Table 5.
Table 5 about here
The longitudinal voting profiles indicate that most voters remain loyal to their initial
preference and that proportionally few change their vote intention frequently. What is
encouraging is the ability of the model to recover sequence membership, even in the
presence of relatively extreme patterns of vote switching. Table 5 indicates quite clearly
that for most sequences the estimated expected frequencies predicted by the RCS transition
model match the observed frequencies in the panel data well. The only notable exceptions
are the highly populated consecutive Democratic vote categories (i.e., the sequences of
1’s). However, even for these sequences model performance is quite good. Hence overall
our findings illustrate that the model is well able to recover the actual transitions in the
panel.
4. Conclusion
The Markov model for cross-level inference presented here can help us better understand
binary transitions when it is either impossible or impractical to collect panel information
on the exact sequences. Our example application shows that the model captures voters
with very different entry and exit transition probabilities. More important, it yields
transition frequency estimates remarkably consistent with the observations in the panel.
The results thus demonstrate that the proposed model can be used to accurately identify
transition probabilities solely on the basis of repeated cross sections and hence to coax
panel conclusions out of non-panel data.
Although the above model promises to be useful in different settings, there are
some extensions that we are currently exploring that may further enhance its applicability.
One next step is to allow for unobserved heterogeneity. The model specification assumes
that individual heterogeneity is due to the observed variables. It is likely, however, that
unobserved and possibly unobservable variables are also a source of heterogeneity.
Ignoring this over-dispersion is unlikely to change point estimates in any radical way, but
standard errors will be underestimated and tests will be in error. It is thus
important to try to account for it. Another extension of interest is to use Bayesian methods,
in the spirit of King, Rosen, and Tanner (1999) and Rosen, Jiang, King, and Tanner
(2001), next to ML estimation. A limitation of ML is that it is basically a large-sample
inferential approach. With small or moderate-sized data sets, the likelihood may have a
nonnormal shape and asymptotic theory may not work well. It is unknown, however, how
large the sample should be for the standard errors based on the information matrix of the
present model to yield reliable inferences. One approach to study this small sample
problem is to analyse the data by MCMC using Gibbs or Metropolis sampling.
Finally, it has frequently been argued that King’s ecological inference solution can
fruitfully be adapted to repeated cross sections (Penubarti and Schuessler 1998; King,
Rosen, and Tanner 1999; Davies Withers 2001). Despite the steady development in
ecological analysis in the direction of more sophisticated statistical modeling, little has
been done to date on developing models that draw panel inference from non-panel data
(Sigelman 1991 and Penubarti and Schuessler 1998 are notable exceptions). It is our
belief that the approach presented here has the potential to make a significant
contribution to political (and other) inquiry.
References
Achen, Christopher H., and W. Phillips Shively. 1995. Cross-level Inference. Chicago:
University of Chicago Press.
Amemiya, Takeshi. 1981. “Qualitative Response Models: A Survey.” Journal of
Economic Literature 19:1483-1536.
Bartholomew, David J. 1996. The Statistical Approach to Social Measurement. San Diego:
Academic Press.
Chambers, R.L., and D.G. Steel. 2001. “Simple Methods for Ecological Inference in 2x2
Tables.” Journal of the Royal Statistical Society. Series A 164:175-192.
Davies Withers, Suzanne. 2001. “Quantitative Methods: Advancement in Ecological
Inference.” Progress in Human Geography 25:87-96.
Duncan, Otis Dudley, and Beverly Davis. 1953. “An Alternative to Ecological
Correlation.” American Sociological Review 18:665-666.
Firth, David. 1982. Estimation of Voter Transition Matrices from Election Data. M.Sc.
thesis. London: Department of Mathematics, Imperial College London.
Goodman, L.A. 1961. “Statistical Methods for the Mover-Stayer Model.” Journal of the
American Statistical Association 56:841-868.
Hawkins, D.L., and C.P. Han. 2000. “Estimating Transition Probabilities from Aggregate
Samples Plus Partial Transition Data.” Biometrics 56:848-854.
Kalbfleisch, J.D., and J.F. Lawless. 1984. “Least Squares Estimation of Transition
Probabilities from Aggregate Data.” Canadian Journal of Statistics 12:169-182.
Kalbfleisch, J.D., and J.F. Lawless. 1985. “The Analysis of Panel Data under a Markovian
Assumption.” Journal of the American Statistical Association 80:863-871.
Kay, Richard, and Sarah Little. 1986. “Assessing the Fit of the Logistic Model.” Applied
Statistics 35:16-30.
King, Gary. 1997. A Solution to the Ecological Inference Problem: Reconstructing
Individual Behavior from Aggregate Data. Cambridge: Cambridge University Press.
King, Gary, Ori Rosen, and Martin Tanner. 1999. “Binomial-beta Hierarchical Models for
Ecological Inference.” Sociological Methods and Research 28:61-90.
Lawless, J.F., and D.L. McLeish. 1984. “The Information in Aggregate Data from Markov
Chains.” Biometrika 71:419-430.
Lee, T.C., G.G. Judge, and A. Zellner. 1970. Estimating the Parameters of the Markov
Probability Model from Aggregate Time Series Data. Amsterdam: North-Holland.
McCall, John J. 1971. “A Markovian Model of Income Dynamics.” Journal of the
American Statistical Association 66:439-447.
Mebane, Walter R., and Jonathan Wand. 1997. Markov Chain Models for Rolling Cross-Section
Data: How Campaign Events and Political Awareness Affect Vote Intentions
and Partisanship in the United States and Canada. Paper presented at the 1997 Annual
Meeting of the Midwest Political Science Association, Chicago, IL.
Moffitt, Robert. 1993. “Identification and Estimation of Dynamic Models with a Time
Series of Repeated Cross-sections.” Journal of Econometrics 59:99-123.
Patterson, Thomas E. 1980. The Mass Media Election: How Americans Choose Their
President. New York: Praeger.
Pelzer, Ben, Rob Eisinga, and Philip H. Franses. 2001. “Estimating Transition
Probabilities from a Time Series of Repeated Cross Sections.” Statistica Neerlandica
55:248-261.
Penubarti, Mohan, and Alexander A. Schuessler. 1998. Inferring Micro- from Macrolevel
Change: Ecological Panel Inference in Surveys. Los Angeles: University of California
LA.
Rosen, Ori, Wenxin Jiang, Gary King, and Martin Tanner. 2001. “Bayesian and
Frequentist Inference for Ecological Inference: the R x C case.” Statistica Neerlandica
55:133-155.
Stott, David. 1997. “Sabre 3.0: Software for the Analysis of Binary Recurrent Events.”
http://www.cas.lancs.ac.uk:80/software/ (June 2001) .
Shively, W. Phillips. 1991. “A General Extension of the Methods of Bounds, with Special
Application to Studies of Electoral Transition.” Historical Methods 24:81-94.
Sigelman, Lee. 1991. “Turning Cross Sections into a Panel: A Simple Procedure for
Ecological Inference.” Social Science Research 20:150-170.
Topel, Robert H. 1983. “On Layoffs and Unemployment Insurance.” American Economic
Review 73:541-559.
Van Houwelingen, J.C., and S. Le Cessie. 1990. “Predictive Value of Statistical Models.”
Statistics in Medicine 9:1303-1325.
Table 1: Marginal fraction of Democratic vote intention, observed entry and exit
transition rates and panel attrition
————————————————————————————————————
year   month   n_t   inflow   outflow   y_t    entry rate             exit rate
                                               (y_t | y_{t−1} = 0)    (y_t | y_{t−1} = 1)
————————————————————————————————————
1976   02      856                      .384
       04      790   142      208       .460   .248                   .178
       06      792   153      151       .471   .170                   .176
       08      727    90      155       .465   .203                   .229
       10      691    80      116       .457   .140                   .138

panel attrition patterns and number of observations across waves*
11111   412     10111    26     01111    57     00111    56
11110    50     10110     6     01110     8     00110    22
11101    33     10101     2     01101     9     00101     8
11100    56     10100    13     01100    14     00100    20
11011    31     10011    10     01011    12     00011     7
11010    10     10010     7     01010     6     00010     7
11001     9     10001     6     01001     1     00001    12
11000    47     10000   138     01000    35
————————————————————————————————————
*1=observed, 0=missing. The figures were obtained after listwise deletion (for each time point separately) of
respondents who exhibit item nonresponse.
Table 2: Markov repeated cross-section estimates for transitions into and out of Democratic vote intention *
——————————————————————————————————————————————————————————
model 1
δ ( pt =1 )
Backcastable variables
Voted Nixon in 1972
Voted McGovern in 1972
Black
Education
Age
Female
Constant
-1.14
1.30
.96
-.29
-.01
model 2
β ( µt )
(.00)
(.00)
(.00)
(.00)
(.01)
-1.36 (.00)
1.58 (.00)
.82 (.00)
3.47 (.00)
-.23 (.00)
-.08 (.00)
- β ( λt )
*
-.56 (.04)
-2.29 (.00)
-0.92 (.00)
0.57 (.01)
β ( µt )
t
-0.51 (.04) 2,4
0.88 (.00) 2
1.29 (.00) 2
*
- β ( λt )
t
1.59 (.02) 2,4
0.71 (.00) 2,3,4
-.10 (.00)
.73 (.00)
2.67 (.00)
Non-backcastable variables
Self-identification as Democrat
-1.37 (.00)
1.87 (.00)
Indifferent towards Democratic or Republican president
Ford:
- good job as president
- favorable feelings
-1.11 (.00) 2,3,4,5 -4.60 (.00) 2,3,4,5
2.38 (.00) 2,3
1.44 (.01) 5
-0.19 (.00)
-0.28 (.00)
- trustworthiness
- leadership
- ability
Carter:
- favorable feelings
-0.36
-0.29
-1.21
-1.04
-0.35
(.02)
(.00)
(.00)
(.00)
(.00)
4,5
2
4
5
3
-3.09 (.00) 3
-2.95 (.00) 4
0.45 (.00) 2,3,4
0.64 (.00) 2,3,4
0.99 (.00) 5
1.43 (.00) 4
1.34 (.00) 2,5
0.35 (.00) 2,3
1.14 (.00) 4,5
1.26 (.00) 4,5
- trustworthiness
- leadership
- ability
Constant
Log likelihood (LL*)
δ ( pt =1 )
-1.12 (.00) 2,3
-1.85 (.01) 4
-1.57 (.04) 5
-2142.48
-0.69 (.00) 3,4
-1.81 (.00) 5
-0.78
-1.28
3.28
2.82
(.01)
(.02)
(.00)
(.00)
4
2
3
4
-1431.17
——————————————————————————————————————————————————————————
* p -values in parentheses. The β -parameters represent the effect on µ t , β * the effect on (1 − λt ) and thus - β * the effect on λt . The columns labeled t
indicate the discrete time periods pertaining to the parameters.
Table 3: Prediction error measures*
———————————————————————————————————————
t                                                                                      2      3      4      5
MSE
  µ : $n_t^{-1}\sum_{i=1}^{n_t}\big((y_{it} \mid y_{t-1}=0)-\mu_{it}\big)^2$                           .146   .123   .068   .049
  λ : $n_t^{-1}\sum_{i=1}^{n_t}\big((y_{it} \mid y_{t-1}=1)-\lambda_{it}\big)^2$                        .155   .121   .126   .069
MML
  µ : $-n_t^{-1}\sum_{i=1}^{n_t}\big[(y_{it} \mid y_{t-1}=0)\ln\mu_{it}+(1-(y_{it} \mid y_{t-1}=0))\ln(1-\mu_{it})\big]$          .437   .378   .235   .162
  λ : $-n_t^{-1}\sum_{i=1}^{n_t}\big[(y_{it} \mid y_{t-1}=1)\ln\lambda_{it}+(1-(y_{it} \mid y_{t-1}=1))\ln(1-\lambda_{it})\big]$  .607   .412   .495   .250
MCA
  µ : $n_t^{-1}\sum_{i=1}^{n_t}\big[(y_{it} \mid y_{t-1}=0)\,\mu_{it}+(1-(y_{it} \mid y_{t-1}=0))(1-\mu_{it})\big]$               .736   .756   .879   .899
  λ : $n_t^{-1}\sum_{i=1}^{n_t}\big[(y_{it} \mid y_{t-1}=1)\,\lambda_{it}+(1-(y_{it} \mid y_{t-1}=1))(1-\lambda_{it})\big]$       .788   .803   .817   .867
———————————————————————————————————————
* MSE is the mean squared error, MML is the mean value of minus log likelihood error (Van Houwelingen
and Le Cessie 1990), and MCA is the mean probability of correct allocation (Kay and Little 1986).
Table 4: Frequencies of observed (obs) and estimated expected (exp) (non-)Democratic
vote transitions ( yt −1 yt ) at sample period t *
———————————————————————————————————
             (00)         (01)         (11)         (10)
t    n_t     obs   exp    obs   exp    obs   exp    obs   exp    χ2
2    670     309   296    102   104    213   219     46    51    1.2
3    643     279   270     57    73    253   253     54    47    5.0
4    642     271   269     69    70    233   239     69    64    0.7
5    617     288   305     47    42    243   236     39    34    2.7
*1=Democratic, 0=non-Democratic. The frequencies were only obtained for respondents with a valid
score on both y t and y t −1 .
Table 5: Frequencies of observed (obs) and estimated expected (exp) (non-)Democratic vote
intention sequences *
———————————————————————————————————————
sequence   obs   exp     ∆      sequence   obs   exp     ∆      sequence   obs   exp     ∆
0          527   524    -3      0110        15    12    -3      01011        4     2    -2
1          329   332     3      0111        43    40    -3      01100       10     7    -3
00         309   296   -13      1000        12    19     7      01101        4     3    -1
01         102   104     2      1001         5     2    -3      01110        4     5     1
10          46    50     4      1010         5     6     1      01111       33    29    -4
11         213   219     6      1011         4     5     1      10000        9    18     9
000        223   207   -16      1100        12    11    -1      10001        3     2    -1
001         37    40     3      1101         4     6     2      10010        1     1     0
010         26    20    -6      1110        23    13   -10      10011        4     1    -3
011         66    69     3      1111       114   132    18      10100        3     5     2
100         25    26     1      00000      140   138    -2      10101        2     1    -1
101         13    14     1      00001        7     9     2      10110        0     2     2
110         20    20     0      00010        9     3    -6      10111        4     2    -2
111        160   174    14      00011       14    13    -1      11000        9     6    -3
0000       160   157    -3      00100        9    13     4      11001        0     1     1
0001        30    24    -6      00101        2     2     0      11010        1     0    -1
0010        12    18     6      00110        2     3     1      11011        3     3     0
0011        14    14     0      00111       11     8    -3      11100        9     7    -2
0100        13    13     0      01000        8    10     2      11101       11     3    -8
0101        10     4    -6      01001        5     1    -4      11110        9    12     3
                                01010        3     0    -3      11111       91   114    23
———————————————————————————————————————
* A binary digit represents a spell occurring over the sample periods t , where 1 refers to Democrat and
0 to non-Democrat. The first spell starts at t = 1 and the sequences end at the observation period t . The
frequencies were only obtained for respondents with a valid score on y1 through yt in the panel.
[Diagram: marginal probabilities p1, p2, p3 at successive periods, linked by entry probabilities µt and exit probabilities λt, with covariates X1, X2, X3 and parameters β and β*.]
Figure 1: Graphical illustration of Markov model for RCS data
Figure 2: Tomography plot for current entry and 1-exit transitions at sample period t = 5