Research Designs
Gerardo L. Munck
University of Southern California, Los Angeles, California, USA
Jay Verkuilen
University of Illinois at Urbana-Champaign, Champaign,
Illinois, USA
Glossary
case study A study in which one unit is analyzed, typically in
an intensive manner that is attentive to time and process.
Although, strictly speaking, in a case study the N = 1,
frequently the effective number of observations is considerably higher.
cross-sectional design A study in which observations on
a variable or multiple variables are collected across units at
the same point in time.
experimental design A study in which the treatment is
consciously manipulated by the researcher and in which
units are randomly assigned to treatment and control groups.
To distinguish experimental from quasi-experimental
designs, the former are sometimes called randomized
experiments.
external validity Concept originally introduced by Donald T.
Campbell to refer to the generalizability of the finding of
a causal relationship between variables beyond the domain
of the actual units, spatial and temporal setting, and specific
treatments that are examined.
internal validity Concept originally introduced by Donald T.
Campbell to refer to a causal relationship between variables
in the actual units, spatial and temporal setting, and specific
treatments that are examined.
large N study A study in which observations are made across
a large number of units. Such studies, however, vary
significantly in terms of their N, with typical cross-national
studies in the field of comparative politics and international
relations ranging between 30 and 100 and those using
opinion surveys reaching into the thousands.
longitudinal design A study in which multiple observations
on variables are collected across time for the same unit.
Also known as time series design.
observational studies Nonexperimental studies, also called
correlational studies, in which the treatment is not
consciously manipulated by the researcher. Rather,
researchers simply record the values of variables as they
naturally occur. These studies include natural experiments
in which particularly sharp and obvious changes in the
value of a variable are held to offer an analogy to the
introduction of a treatment.
pooled time series, cross-sectional design A design that
combines a cross-sectional design and a longitudinal design;
includes panel studies and repeated measures design.
quasi-experimental design Design in which the treatment is
consciously manipulated by the researcher, as in an
experiment, but in which, unlike in experiments, units are
not randomly assigned to treatment and control groups.
research design A key aspect of the research process that
revolves around the direct impact on the prospects of causal
inference of four core questions: How is the value of the
independent variable(s) assigned? How are units selected?
How many units are selected? and How are comparisons
organized (i.e., whether temporally and/or spatially)? In
addition, research designs can be evaluated in terms of
their indirect impact on causal inference in light of their
requirements and contributions vis-à-vis theory and data.
small N study A study in which observations are made across
a small number of units. Typically, each unit is treated
as a case study so that multiple observations on each unit
are made.
The methodology of research design hinges on the choices
made with regard to four core questions: How is the value
of the independent variable(s) assigned? How are units
selected? How many units are selected? and How are
comparisons organized (i.e., whether temporally and/or
spatially)? These choices can be assessed in terms of their direct impact on the prospects of making causal inferences, but also in terms of their indirect impact, which stems from their requirements and contributions vis-à-vis theory and data.
Three research traditions—experimental and quasi-experimental, quantitative, and qualitative—represent
distinct responses to these methodological choices and
each has important strengths but also significant
weaknesses. Thus, the need for choices about research
design to be explicitly addressed and justified, and the
need to actively construct bridges across research traditions, are emphasized.
Introduction: Goals, Problems,
and Options
The pioneering work on research design by Donald T.
Campbell and associates has made such a major contribution that it is difficult to think about research design
without, in one way or another, drawing on their insightful
discussions. They provide a valuable template for thinking
about research design, helpfully framing the discussion in
terms of the basic goal of validity, a range of problems or
threats to validity, and a set of options or design choices
that can be pursued as a way to guard against these threats.
At the same time, their work displays some limitations and
biases and ultimately fails to offer a clear, encompassing,
and balanced understanding of the challenges involved
in research design. This assessment and comparison of
research designs thus adopts Campbell et al.’s basic
template and some of their key ideas about the goals,
problems, and options of research design but also parts
company with them in some significant ways.
First, Campbell et al.’s discussion of the goals of research design in terms of the concept of validity is both
somewhat confusing and biased. Initially, Campbell introduced the concepts of internal and external validity, which
disaggregated the ultimate goal of research design—to
increase the prospects of making causal inferences—
and aptly distinguished different problems of causal
inference that are affected and potentially solved by different design choices. Over time, however, these two
concepts have been awkwardly relabeled and, more important, defined in different ways in different texts. In
addition, two other types of validity—statistical conclusion
validity and construct validity—that pertain in part to research design but spill over into questions of data
analysis and measurement, respectively, have been introduced into the discussion, further complicating matters.
Another problem with this typology of validity is that it is
somewhat biased. Thus, Lee Cronbach has argued that
Campbell et al. gave undue primacy to internal over
external validity and did not recognize the importance
of generalizability. This is a critique that they have acknowledged and sought to address in their latest statement.
However, they still do not fully incorporate the difficulties
of generalizing on the basis of experimental and quasi-experimental designs in their overall assessment of the
strengths and weaknesses of design options.
To avoid these problems, we consider only internal and external validity to be the core goals directly relevant to a discussion of research design, and we both retain these classic labels and follow the original definitions offered by Campbell and Julian Stanley in 1966. As they argue, the establishment of the internal
validity of a causal proposition involves showing that
a factor is the cause of an effect or, more modestly, probing alternative hypotheses and opting for those that stand
up best to attempts at disconfirmation. In contrast, the
verification of the external validity of a causal proposition
entails demonstrating that a causal proposition can be
generalized beyond the domain of the actual units, spatial
and temporal setting, and specific treatments that are
actually studied. In turn, both internal and external
validity can be viewed as distinct and equally essential
goals of research design.
Second, Campbell et al.’s discussion of the problems of
causal inference is organized around a fairly ad hoc and
cumbersome list of threats to validity. To be sure, the
listed threats to validity are all important, and their analysis of the way different designs get around or fail to get
around these threats to validity is exemplary. Indeed, they
offer many specific recommendations that creatively respond to thorny and complex problems routinely encountered in the conduct of substantive research. However,
they do little to present their list of threats in a logically
explicit manner, to distinguish threats that are relevant
to experimental designs from those that pertain to nonexperimental designs, and to offer a sense of the extent
to which the threats to validity they discuss constitute a
complete list. In place of their list, we propose a scheme
whereby research design choices are evaluated in light
of a set of problems of causal inference that are directly
and indirectly affected by design choices.
Figures 1 and 2 provide a graphic representation of
the assumptions and potential problems of causal inference that design options impact in a direct manner. With
regard to the analysis of single units, the core concerns
are the need to ascertain that the posited direction of
causality is correctly described and that the proposed
model of the link among causal variables is fully and
correctly specified. With regard to the analysis of multiple
units, the key issues are the need to validate the assumptions of unit independence and unit homogeneity. In turn,
Fig. 3 draws attention to the need to place the discussion
of research design in the broader context of the research
process and to consider how design options have an indirect but important impact on causal inference as a result of their requirements and contributions vis-à-vis theory and data.

Figure 1. Problems of causal inference: Issues in the analysis of single units. [The figure pairs each assumption with its potential problems; DV, dependent variable; IV, independent variable. Assumption: the direction of causality runs from the IV to the DV. Potential problems: reverse causation and endogeneity (e.g., reciprocal, nonrecursive causation). Assumption: the model of the link among causal variables is fully and correctly specified, that is, all variables affecting the DV are included and correctly specified as (i) an antecedent variable, (ii) an interacting variable, (iii) an intervening variable, or (iv) another independent variable. Potential problems: spuriousness; omitted variables (failure to identify a variable that affects the DV); and misspecified variables (failure to correctly specify the link between identified variables, e.g., an interacting variable modeled as another independent variable).]

Figure 2. Problems of causal inference: Issues in the analysis of multiple units and single units across time. [Assumption: independence of units, i.e., units are not affected by each other. Potential problems: diffusion across units (Galton's problem) and diffusion within units across time (the problem of history). Assumption: unit homogeneity, i.e., changes in the dependent variable are produced by the same causal model. Potential problem: heterogeneity of units within a sample or a population.]
Third, Campbell et al.’s discussion of design options is
fairly limited and their assessment biased toward experimental designs. Indeed, they place such strong emphasis
on certain design options—the ability of the researcher to
consciously manipulate independent variables and randomly assign units to treatment and control groups—that
they offer a narrow and unbalanced optic on questions of
research design. They do highlight the difficulties of conducting experiments and emphasize that even experiments can never guarantee that all threats to validity are eliminated. Moreover, they occasionally offer an exemplary display of pluralism.
However, they tend to overlook some significant shortcomings associated with experimental data and downplay
the significant potential virtues of nonexperimental data,
and they ignore both the role played by design choices
other than those that are defining elements of experiments and the way in which design choices have an indirect impact on causal inference. Thus, to assess the
Figure 3. Research design in context. [The diagram links theory, data, and research design to causal inference.]
Although research is frequently portrayed as proceeding in
a linear fashion from theory to data collection and causal inference, as a matter of practice research is an interactive and iterative
process. Thus, it is important to recognize that design options are
(i) affected by the state of available theory and can also affect
theory building and (ii) limited by the availability of data and also
have an impact on the quality of data that are generated.
strengths and weaknesses of different research designs
in a systematic and balanced manner, it is important to
recognize that design options revolve around at least four
core questions: How is the value of the independent
variable(s) assigned? How are units selected? How
many units are selected? and How are comparisons organized (i.e., whether temporally and/or spatially) (Table I)?

Table I. Research Designs: Classification Criteria and Options

How is the value of the independent variable(s) assigned?
  - Manipulation, with random assignment of cases to treatment and control groups → Experimental (randomized experiment)
  - Manipulation, with nonrandom assignment of cases to treatment and control groups → Quasi-experimental
  - Nature → Observational

How are units selected?
  - Random sample → Representative
  - Purposive (deliberate or intentional) sample → Typical (mode, mean, or median) or Heterogeneous (extreme and typical)
  - Entire population → Census

How many units are selected?
  - Many → Large-N
  - Few → Small-N
  - One → Case study

How are comparisons organized?
  - Across units → Cross-sectional
  - Across time → Longitudinal
Moreover, it is necessary to address the manner in which
design choices have both a direct and an indirect impact
on causal inference.
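To convey how large this design space is, one can simply enumerate the cross-product of the main options in Table I. The short Python sketch below does exactly that; the option lists come from Table I, while the single compatibility rule it applies (treating a random sample as sensible only in a large N study) is our own illustrative assumption, not part of the table.

```python
from itertools import product

# Main options for the four classification criteria in Table I.
assignment = ["experimental", "quasi-experimental", "observational"]
selection = ["random sample", "purposive sample", "entire population"]
number_of_units = ["large-N", "small-N", "case study"]
comparison = ["cross-sectional", "longitudinal"]

designs = list(product(assignment, selection, number_of_units, comparison))
print(len(designs))  # 3 * 3 * 3 * 2 = 54 candidate designs

# Toy filter: a random sample only makes sense when many units are studied
# (an illustrative assumption; Table I itself imposes no such rule).
sensible = [d for d in designs
            if not (d[1] == "random sample" and d[2] != "large-N")]
print(len(sensible))  # 42 combinations survive this single rule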
Recognizing the seminal nature of the work by
Campbell et al. but also seeking to overcome its limitations,
this article provides an overview of the current state of
knowledge on research design. The number of alternative
research designs is, in principle, very large. Indeed, focusing only on the main options related to the four criteria
presented in Table I, it is clear that the range of possible
designs is much greater than usually believed, although not
all will be sensible. However, the following discussion is
organized in terms of research traditions, given that this is
how much research and discussion about design options
is carried out, focusing first on experimental and quasi-experimental designs, turning next to large N or
quantitative designs, and completing the discussion with
case study and small N or qualitative designs. In each case,
the focus is on the prototypical designs associated with
each tradition and an assessment and comparison of the
strengths and weaknesses of these designs. The need for
choices about research design to be explicitly addressed
and justified, and the need to actively construct bridges
across research traditions, are emphasized.
Experimental and Quasi-Experimental Designs
Experimental designs, also called randomized experiments, are characterized by two distinguishing features:
(i) the conscious manipulation by the researcher of
a treatment or, more generically, an independent variable
of interest, and (ii) the random assignment of units to
treatment and control groups (Fig. 4). Experiments
also draw upon other design elements, and thus it is possible to distinguish a variety of experimental designs.
However, these two features are the defining features
of the prototypical experimental design.
The strengths of this design are undeniable. Given that
the treatment is administered before a posttest measure
on the dependent variable is registered, the direction of
causality is clearly established. In addition, the random
assignment of units to treatment and control groups
ensures that these groups are equivalent; that is, they
do not vary systematically on any variable except the manipulated variable. In this manner, multiple unexamined
variables are held to not vary in any patterned manner
and, hence, control over these alternative hypotheses is
ensured by turning other variation into noise. Thus, the
beauty of experimental designs is that the data generated
lend themselves to a particularly simple form of analysis,
in which the differences in posttest measures of the treatment and control groups can be interpreted as a measure
of the causal effect of the treatment. Indeed, experiments
are the most powerful means of gaining control and establishing the internal validity of a causal claim (Table II).
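The logic of this comparison can be made concrete with a small simulation. The following Python sketch is ours and purely illustrative: the sample size, the true effect of 2.0, and the unobserved background attribute are invented, but the estimator, the difference in posttest means between randomly assigned groups, is the simple analysis just described.

```python
import random

random.seed(1)

def run_experiment(n=1000, true_effect=2.0):
    """Simulate a randomized experiment; return the difference in means."""
    treated, control = [], []
    for _ in range(n):
        background = random.gauss(0, 1)   # unobserved unit attribute
        assigned = random.random() < 0.5  # random assignment to groups
        outcome = (background
                   + (true_effect if assigned else 0.0)
                   + random.gauss(0, 1))  # posttest measure of the DV
        (treated if assigned else control).append(outcome)
    return sum(treated) / len(treated) - sum(control) / len(control)

# Randomization turns the background attribute into noise, so the
# difference in posttest means recovers the causal effect (about 2.0).
print(round(run_experiment(), 2))
```

Because random assignment balances the unobserved background attribute across the two groups, no explicit control for it is needed; it simply inflates the noise around the estimate.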
A quasi-experimental design differs from an experimental design in that in the former the treatment is consciously manipulated by the researcher but the units are
not randomly assigned to treatment and control groups.
Thus, this design offers less of a basis for assuming the
initial equivalence of treatment and control groups and
requires researchers to consider how third extraneous
variables might confound efforts at ascertaining causal
effects and to be explicit about alternative hypotheses.
Figure 4. Experimental designs: Defining features.

The units are randomly divided into two groups—the treatment group and the control group—and a treatment is applied to the treatment group and not to the control group. Measures are taken of the values of the dependent or outcome variable for the treatment and control groups. The generated data are organized as follows:

                                      Treatment group          Control group
                                      Units 1, 2, 3, ..., n    Units 1, 2, 3, ..., n
  Independent variable (treatment)    Yes                      No
  Dependent variable                  Value                    Value

Alternatively, after the units have been randomly divided into two groups, a pretest measure of the value of the dependent variable for both the treatment group and the control group is made. The generated data are organized as follows:

                                      Treatment group          Control group
                                      Units 1, 2, 3, ..., n    Units 1, 2, 3, ..., n
                                      T1        T2             T1        T2
  Independent variable (treatment)    No        Yes            No        No
  Dependent variable                  Value     Value          Value     Value

In both instances, the analysis of the data focuses on the difference in values on the dependent variable between treatment and control groups at the posttest, a difference that can be interpreted as a measure of the causal effect of the treatment. When a pretest measure on the dependent variable is obtained in addition to a posttest measure, a comparison of the values of units of the treatment and control groups at T1 serves to double-check that all units are "equivalent," given that this goal is already ensured by the random assignment of units.

Table II. Experimental Designs: An Assessment

Design element: Conscious manipulation of independent variables
  Strengths: Establishes causal direction (internal validity)
  Weaknesses: Lack of viability, due to practical and/or ethical reasons, to study many important questions; unsuitable for the study of action and lack of attention to causal mechanisms (internal validity); unsuitable for an assessment of complex causes; requires a priori knowledge of plausible independent variables and a well-specified causal model, and thus not useful in theory generation; tends to generate obtrusive, reactive measurements

Design element: Random assignment of units to treatment and control groups
  Strengths: Establishes equivalence of units; helps guard against third, extraneous variables (internal validity)
  Weaknesses: Lack of viability, due to practical and/or ethical reasons, to study many important questions

Design element: Nonrandom selection of units, setting, and treatment
  Strengths: None listed
  Weaknesses: Difficult to generalize from sample to population (external validity)

The difficulties this difference between experiments and quasi-experiments introduces are central to the work of Campbell et al. Indeed, their work can be understood essentially as an effort to alert researchers to the various ways in which factors other than the treatment may be responsible for the observed posttest variance between treatment and control groups. Thus, in large part due to the significant effort by Campbell et al. to anticipate potential threats to the validity of quasi-experiments—and to offer a range of designs that minimize and help guard against such threats—researchers have a sophisticated road map for consciously taking into consideration how third, extraneous variables may exercise a confounding effect. In summary, even though quasi-experiments are more complicated than experiments, both designs offer a powerful basis for making claims
about causality, and it is thus hardly surprising that these
designs have been used often in psychology and economics and are increasingly being adopted by political
scientists.
The considerable strengths of experimental and quasi-experimental designs notwithstanding, it is important to
recognize the serious weaknesses associated with these
designs. One standard limitation concerns the viability
of conducting experiments, whether for practical and/or
ethical reasons, to study the sort of questions that are of
interest to many social scientists. The fact that we cannot
or would not want to manipulate some variables and/or
randomly assign subjects to control and treatment groups
has vast implications. Indeed, as Hubert Blalock (1991)
argues, ‘‘If we were to confine our analyses to experimental
and quasi-experimental designs, virtually all of sociology
and political science would have to go to the wayside.’’
(p. 332) However, the problem is deeper than the standard
concern with viability would indicate and actually derives
from the very substance of the subject matter of the social
sciences.
The core difference between the social and natural
sciences is that the former are first and foremost about
agents and actions. However, a basic feature of experimental and quasi-experimental designs—the conscious
manipulation of the treatment—is founded on a notable
asymmetry, which places the experimenter in an active
role and relegates the experimentee to a passive, reactive
role. In effect, experiments and quasi-experiments treat
subjects as objects and embody an implicit behaviorist
perspective that renders them ineffectual instruments
for the study of the causal significance of action and, relatedly, forces them to be silent on the critical question of
causal mechanisms. Indeed, as William Shadish et al.
recognize, although experimental and quasi-experimental
designs can be used to predict what the effect of a factor is,
they are less useful for explaining why and how such
effects are generated. This is a significant limitation. Indeed, it is not far-fetched to argue that the internal validity
of a causal argument is not fully established until the
causal mechanisms that generate the causal effect are
properly identified and tested.
Experiments are also of limited use with regard to the
study of complex causal relationships. This failure is in
part due to the inability to deal squarely with agency
through experiments and quasi-experiments. Consequently, these designs are not suitable means for getting
at a core theoretical concern in social theory: the interaction between structures and agents, or macro and micro
causal factors. In addition, because experiments and
quasi-experiments are in essence instruments geared to
the study of short-term effects, they are not useful for
studying causes that work themselves out over an extended period of time or for assessing the interaction
between long- and short-term causal factors. Moreover,
because experimenters, as a way to manipulate the
treatment, must rigidly assign variables to the status of
independent and dependent variables, they do not
constitute a means of assessing feedback effects or reciprocal and nonrecursive causation. In short, a range of
theories simply cannot be tested through experimental
and quasi-experimental designs.
Another significant limitation concerns the generalizability of results derived from experiments. The reason for
this is that although a defining feature of experiments is
that units are randomly assigned to treatment and control
groups, the units, settings, and treatments are usually not
randomly selected. Indeed, inasmuch as control is gained
through the manipulation of the treatment and the random assignment of units to treatment and control groups,
it is practically inevitable that the ability to randomly select these units will decline. The purposive or intentional
selection of units that researchers have to use is not without merits, and it can certainly be carried out with an eye
to the relationship between the studied sample and the
universe. However, even when carefully practiced, purposive selection tends to lead to biased results. Indeed, in
many instances the gap between the conditions of experimental research (whether in the laboratory or in field
settings) and the phenomenon being studied can be substantial; hence, the ability to generalize beyond the domain of the actual units, spatial and temporal setting, and
specific treatments that are examined is compromised.
Thus, it is important to recognize that both internal
and external validity are critical aspects of knowledge,
and that there are good reasons for the standard view
that the gains made by experiments in terms of internal
validity tend to come at the cost of a loss in external
validity.
Finally, two other limitations are indirect consequences of the manipulation of treatments that characterizes experimental and quasi-experimental designs.
First, because of this design element, experiments and
quasi-experiments approach the two-sided issue of causal
theorizing from one side: They focus on the effect of
causes rather than on the causes of effects. Thus, they
require a priori knowledge about plausible independent
variables and presume that all that needs to be determined is the effect of preselected causes. Hence, experiments and quasi-experiments are of less use during the
early, exploratory stage in the research process, when
a typical challenge is to uncover potential independent
variables by working backward from a dependent variable. Furthermore, experiments and quasi-experiments
assume a well-specified causal model and thus require
that the state of theory building already be fairly advanced. Second, the manipulation of treatments makes
experiments and quasi-experiments a particularly obtrusive and reactive form of generating data. The gain made
by manipulating treatments comes at the cost of the
quality of the data generated for analysis. In summary, as
shown in Table II, although experiments especially, but
also quasi-experiments, have some important strengths,
they are also associated with a number of significant
weaknesses.
Observational Designs
The distinction between experimental and observational
studies is a fundamental and deep one. The key difference
is that in observational studies, control of possible third
variables is not attained ‘‘automatically’’ through random
assignment. Rather, in observational studies third
variables have to be formulated and measured explicitly,
and control is sought through the analysis of the data.
However, the distinction between large N studies, on
the one hand, and case and small N studies, on the
other hand, is equally profound and probably more pervasive. This second distinction is not unique to observational studies. Indeed, the quantitative vs qualitative
distinction runs through both the experimental and the
observational research communities. However, the discussion of this distinction is developed only in the context
of observational studies and focuses primarily on the prototypical quantitative and qualitative studies: the large N,
cross-sectional study and the small N study based on the
longitudinal case study, respectively.
Large N Studies
A large N study has some considerable strengths that
make it, in some regards, superior to an experimental
study. First, because it uses data generated through the
natural course of events, it is a viable design for studying
important questions that involve nonmanipulable
variables that cannot be addressed with an experimental
method. Second, because a large N study is not constrained by the requirement to randomly assign units to
treatment and control groups, it is more likely to entail
a randomly selected sample. This is a major benefit that
gives large N researchers the ability to generalize beyond
the domain of the actual units, spatial and temporal setting, and specific treatments that are actually studied and
to establish the generalizability of their findings (external
validity), a core weakness of experimental methods.
However, much as is the case with regard to random
assignment in the context of experimental designs, the
beauty of random selection is tarnished by the difficulty
of applying this design element to many units, settings,
and variables (Table III).
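A brief simulation suggests why random selection underwrites generalization whereas purposive selection can undermine it. Everything in the sketch below, the population, the trait, and the deliberately extreme purposive rule, is hypothetical and chosen only to make the contrast visible.

```python
import random

random.seed(7)

# Hypothetical population of 10,000 units with a trait of interest.
population = [random.gauss(50, 10) for _ in range(10_000)]
pop_mean = sum(population) / len(population)

# A random sample of 100 units: unbiased for the population mean.
random_sample = random.sample(population, 100)

# A purposive sample under a deliberately bad rule: the 100 highest scorers.
purposive_sample = sorted(population)[-100:]

print(round(pop_mean, 1))                     # about 50
print(round(sum(random_sample) / 100, 1))     # close to the population mean
print(round(sum(purposive_sample) / 100, 1))  # far above it: biased
```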
Another important strength of large N studies, due to
their tendency to study quite large samples, is their ability
to use statistical analysis to establish patterns of association with a high degree of precision and confidence. Such
results, however, differ significantly from those that can
be obtained using experimental data. Indeed, although
experimental data offer strong grounds for making claims
about causality, because large N studies are observational
it is crucial to remember the simple but profound point
that ‘‘association is not causation’’ and, moreover, that
even ‘‘a lack of correlation does not disprove causation’’
(Bollen, 1989: p. 52). This does not mean that claims about
causality cannot be made on the basis of large N studies.
Indeed, such claims can be made legitimately if researchers verify the causal assumptions of their statistical
models, including nonspuriousness, the lack of omitted
variables, independence of cases, and unit homogeneity.
However, it is extremely difficult to substantiate that patterns of association establish causality (internal validity).
In short, claims about causality derived from large N studies should be treated with great caution.

Table III. Observational Designs: Large-N Studies

Design element: Assignment of value of the independent variable(s) by nature
  Strengths: Viability of studying important questions that involve nonmanipulable variables
  Weaknesses: Association does not establish causation (internal validity), and it is difficult to verify the causal assumptions in statistical models

Design element: Random selection of units or selection of an entire population
  Strengths: Establishes generalizability (external validity)
  Weaknesses: Lack of viability for many units, settings, and variables

Design element: Selection of many units that are compared cross-sectionally and/or longitudinally
  Strengths: Establishes patterns of association with a high degree of precision and confidence; constitutes a tool for theory generation
  Weaknesses: Associations are more interpretable when guided by a strong theory, i.e., a theory with few variables and detailed predictions; measurement validity is harder to establish the larger the N and the larger the number of variables; lack of attention to causal mechanisms (internal validity)
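The dictum that association is not causation can be illustrated with a toy omitted-variable simulation; the variable names, coefficients, and sample size below are invented for the purpose. Here x has no causal effect on y, yet the two are strongly correlated because an unmeasured z drives both.

```python
import random

random.seed(3)

n = 5000
z = [random.gauss(0, 1) for _ in range(n)]       # omitted common cause
x = [zi + random.gauss(0, 1) for zi in z]        # x has NO effect on y
y = [2.0 * zi + random.gauss(0, 1) for zi in z]  # y is driven only by z

def corr(a, b):
    """Pearson correlation, computed from scratch."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    va = sum((ai - ma) ** 2 for ai in a)
    vb = sum((bi - mb) ** 2 for bi in b)
    return cov / (va * vb) ** 0.5

# A strong observed association (about 0.63) despite a zero causal effect,
# produced entirely by the confounder z omitted from the comparison.
print(round(corr(x, y), 2))
```

An observational analysis that failed to model z would misread this association as a causal effect of x.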
Regarding the indirect consequences of design
choices, it bears emphasizing that large N studies are
quite useful for exploratory work. In an experiment, treatments need to be planned in advance and need to be
relatively few in number, a particularly difficult and restrictive requirement for many areas of the social sciences.
In contrast, at least with regard to the kinds of questions
that are common to macro-oriented inquiries in sociology,
political science, and economics, it is possible to obtain
more data after examining some relationships. This practice runs the risk of capitalizing on chance if not rigorously
tested on new data, but it is potentially an important part
of any study.
This benefit notwithstanding, a core problem regarding the demands that large N research puts on theory and
data should be highlighted. On the one hand, associations
are more interpretable when guided by a strong theory—
that is, one with few variables and detailed predictions. In
other words, good large N research not only requires
a causal model specified prior to testing but also puts
a heavy burden on the ability of theory builders to reduce
the number of potential explanatory factors. On the other
hand, to ensure that the causal model is fully specified, it is
important that potential independent variables are not
omitted. This fact complicates the interpretation of
results. Moreover, it makes a heavy, sometimes practically
impossible, demand concerning data. Indeed, because
potentially confounding factors are not controlled ‘‘automatically’’ through random assignment in large N studies
but rather must be modeled and measured explicitly, the
need to collect data on numerous variables across a large
number of cases and/or time entails some serious costs.
This demand opens the door to well-founded charges
concerning the validity of the data. In addition, the
data requirements of large N designs make it extremely
difficult to offer a quantitative study of causal mechanisms, a crucial shortcoming that further weakens the
claims about causality (internal validity) that can be
made on the basis of a large N study. Gains in terms of
generalizability and findings about patterns of association
are heavily dependent on good theory and good data—
valuable resources that are not always available.
Some advances that go beyond the cross-sectional design typically used in large N studies help get around some
of these limitations. Especially noteworthy are time series
and panel studies, event history analysis, and hierarchical
modeling. These methods offer fruitful ways of addressing
the problems associated with the assumptions of unit independence and unit homogeneity. However, these potential gains tend to make even more imposing demands in the
area of data collection than cross-sectional designs and thus
are achieved at a cost. In summary, as Table III highlights,
much as in the case of experimental and quasi-experimental
designs, it is only fair to note the strengths of quantitative
or large N designs but also to recognize their weaknesses.
Case and Small N Studies
The status of the case and small N studies tradition in the
social sciences has been marked by a notable disjuncture
between the high number of its practitioners and its low
standing in the broader methodological community. This
odd situation, however, has been changing over time.
After famously stating that ‘‘one-shot case studies [are]
of almost no scientific value,’’ Campbell retracted this
harsh critique of a staple of qualitative research. In
turn, statistician David Freedman has argued that case
studies, when well designed, can establish causality in
a more powerful manner than is standard in most quantitative research. Finally, a flourishing debate on case and
small N research in recent years has increasingly made
explicit the rationale for choosing a qualitative research
design and the methodological foundations of rigorous
qualitative research. As a result, the reconstructed logic
of qualitative methods is catching up with the logic of
qualitative research, and a clear sense of the strengths
and weaknesses of this tradition is emerging.
One well-established and important strength of case
and small N studies is that they represent a viable design to
address important questions that involve nonmanipulable
variables. This is a virtue shared by small and large N
studies, but in this regard small N researchers have an
advantage, particularly in light of their ability to gain access to various types of data and form a complex picture of
their cases, including a keen sense of developments over
time. For this reason, much of the analysis in the social
sciences on a range of critical questions, especially during
the initial stages in the research process, is done by qualitative researchers (Table IV).
A feature of case and small N studies setting them apart
from both large N and experimental studies and accounting for a key comparative advantage is the manner in
which these studies can be used to analyze the role of
agency and hence to establish causal mechanisms. This
is a critical point. Indeed, as philosophers of science and
methodologists have increasingly insisted, it is not enough
to focus on causal effects. Rather, it is necessary to go
beyond statements about what the effect of a factor is
and to consider how and why the effect operates. This
calls for the specification of causal mechanisms, which
requires, in most areas of the social sciences, considering
agency. In this regard, it is probably not an overstatement
to suggest that a qualitative design is the method par excellence to study agency and hence to empirically ground
the analysis of causal mechanisms.
Table IV. Observational Designs: Case and Small-N Studies

Design element: Assignment of value of the independent variable(s) by nature
  Strengths: Viability of studying important questions that involve nonmanipulable variables
  Weaknesses: In the absence of strong theory, it is difficult to establish control and eliminate potential variables (internal validity)

Design element: Selection of one or a few units that are compared longitudinally and/or cross-sectionally
  Strengths: Addresses agency and establishes causal mechanisms; the study of causal mechanisms, and within- and cross-case analysis, offers a basis for assessing causality (internal validity); constitutes a powerful tool for theory generation; measurement validity is easier to establish the smaller the N

Design element: Purposive selection of a few units
  Weaknesses: Difficult to generalize from sample to population (external validity)

A related strength of case and small N studies is that they offer a basis for assessing causality (internal validity).
This is done, most fundamentally, through a within-case
form of analysis that uses empirical evidence about causal
mechanisms as a way to check expectations concerning
the direction of causal processes, to eliminate potential
third variables, and to verify the assumption of unit independence and unit homogeneity. Moreover, this goal is
also frequently advanced by combining the prototypical
longitudinal within-case design with a cross-sectional design, such as a traditional cross-case study or a cross-sectional within-case study either focused on different
implications of a theory or cast at a different level of
aggregation. Through these means, which capitalize on
knowledge about process and the ways in which case
studies lend themselves to more observations than are
suggested by the strict definition of a case study as an
N = 1 study, small N researchers can make valuable contributions that are important to highlight.
However, the weaknesses of qualitative methods as
a tool for establishing causality should also be duly recognized. To use case studies to assess causality, it is necessary
to be explicit about the posited causal model—including all
its variables, the relationship among the variables, and the
form of the effect of each variable or combination of
variables—as well as about the causal mechanisms that
are considered to be in play. This is a demanding task,
but failing to specify these things in advance makes it
easy for researchers to simply focus on confirming evidence and to disregard alternative interpretations. Indeed,
a drawback of most analyses of causal mechanisms is that
they fail to specify formally what mechanisms are posited
and what plausible alternative mechanisms should be considered and also to set up the study of mechanisms as
a standard test among competing hypotheses.
Moreover, even if such steps are taken, it is necessary to recognize the limits of small N analysis, given the small number of observations such studies entail, as a tool for causal
assessment. Some important exceptions to this general
principle exist. First, occasionally qualitative researchers
may be able to design their research in a way that resembles a natural experiment, in which the hypothesized
causal factor changes markedly while other factors remain
the same. Second, it is possible to establish patterns of
association with precision even with a relatively small
sample using techniques of analysis such as exact tests,
permutation tests, resampling, or Bayesian methods that
do not rely on asymptotics.
Third, presuming a powerful theory is being tested,
a few observations could very well serve as the basis for
clear results. However, frequently qualitative researchers
seek to assess complex causal relationships and, when this
is the case, small N designs tend to constitute weak means
of controlling for alternative hypotheses and for ruling out
chance. Indeed, claims to test complex causes using small
N designs rely on the quite unreasonable assumptions that
(i) the world operates in a deterministic fashion, (ii) the
proposed causal model is complete, and (iii) there is no
measurement error. In short, the use of case studies and
small N studies to assess causality tends to rely on some
very stringent assumptions concerning the state of theory
and data.
Another significant limitation concerns the difficulty of
using case studies to make generalizations. When studying a small number of cases, random selection is not
advisable. Indeed, random selection offers a basis for
generalization only when a large number of cases are
selected and hence the law of large numbers takes
force. Thus, qualitative researchers have to resort to
the purposive selection of their cases. When this is
done, it is important to avoid drawing a sample of convenience that bears an unclear relationship to the broader
population. Moreover, it is important that researchers be
aware of how their choice of cases might introduce bias.
Such efforts to select cases in a conscious and careful
manner, however, should not be mistaken as steps providing a sufficient basis for making claims about generalizations (external validity).
Finally, with regard to the indirect consequences of
design choices, two strengths of case and small N studies
deserve mention. One is that this type of design constitutes a powerful tool for theory generation. Indeed, one of
the clear benefits of qualitative research is its fruitfulness
as a tool for generating ideas about causal variables and
mechanisms and for theorizing that is closely informed by
knowledge of the substantive problem of interest. Another related benefit is that the detailed knowledge of
context that is associated with case studies plays
a critical role in helping researchers establish measurement validity. In summary, as is the case with the experimental and quantitative traditions, the qualitative
tradition is characterized by a mix of strengths and
weaknesses (Table IV) that must be considered in any
balanced assessment of the potentials of different
research designs.
Conclusion: Choices and Bridges
The broad message concerning research design we have
sought to convey is that making causal inferences about
the complex realities of interest to social scientists is probably more difficult than is generally believed and that
questions of research design play a key role in determining
whether researchers have a solid basis for making claims
about causal relationships. Technical fixes cannot, in general, get around design problems, but more attention goes
to the former than the latter. Indeed, given the centrality
of research design to the research process as a whole, it is
probably fair to say that research design is a relatively
unappreciated aspect of methodology and that it deserves
more attention from methodologists and practicing researchers alike. Beyond this generic admonition, two further points that build on but go beyond the previous
discussion offer material for further consideration.
One point is the need for choices about research design
to be explicitly addressed and justified. There is a tendency
for researchers to work within distinct research traditions
and to simply opt for certain designs as a matter of default.
Such a tendency is understandable in that different designs require different skills and training, and in a practical
sense researchers are thus not free to choose among research designs. Nonetheless, short of justifying their design choices in light of the range of possible options, at the
very least researchers should address the impact of their
choices on the certainty of their conclusions. As this article
shows, this entails a consideration of the direct impact on
the prospects of causal inference of the four core choices
involved in research design and of the indirect impact,
due to their requirements and contributions vis-à-vis theory and data, of these choices.
A second point is the need to creatively construct
bridges across research traditions. Although it is common
for certain traditions to be presented as inherently superior to others and the standard against which other traditions should be measured, this article has shown that all
traditions are characterized by certain strengths and
weaknesses and that it is thus more accurate and useful
to think in terms of the tradeoffs involved in working
within different traditions. An implication of this assessment, then, is that greater effort should be made to capitalize on what are clearly the complementary strengths of
different traditions (compare Tables II–IV).
Efforts at bridging, whether carried out through multiple studies on the same question or mixed designs that
combine multiple designs within a single study, are very
demanding. Thus, although it is common to point out that
multiple studies in the context of a shared research program offer a way of combining different designs, such
combinations are only effective inasmuch as research
programs are organized around clearly specified concepts
and questions and are advanced, at least to a certain extent, through explicitly coordinated teamwork. In turn,
the effective use of mixed designs requires a level of
methodological sophistication, as well as theoretical and
substantive knowledge, that is rare. Nonetheless, the high
payoffs associated with the use of mixed methods make
these options strongly recommendable.
See Also the Following Articles
Experiments, Overview; Explore, Explain, Design; Longitudinal Cohort Designs; Observational Studies; Quasi-Experiment; Sample Design; Survey Design; Time-Series–Cross-Section Data; Validity, Data Sources
Further Reading
Blalock, H. (1991). Are there really any constructive alternatives to causal modeling? Sociol. Methodol. 21, 325–335.
Bollen, K. A. (1989). Structural Equations with Latent
Variables. Wiley, New York.
Brady, H. E., and Collier, D. (eds.) (2004). Rethinking Social
Inquiry: Diverse Tools, Shared Standards. Rowman &
Littlefield/Berkeley Public Policy Press, Lanham, MD.
Campbell, D. T. (1988a). Factors relevant to the validity of
experiments in social settings. In Methodology and
Epistemology for Social Science: Selected Papers. Donald
T. Campbell (E. Samuel Overman, ed.), pp. 151–166.
University of Chicago Press, Chicago [Original work
published 1957].
Campbell, D. T. (1988b). Degrees of freedom and the case
study. In Methodology and Epistemology for Social Science:
Selected Papers. Donald T. Campbell (E. Samuel Overman,
ed.), pp. 377–388. University of Chicago Press, Chicago
[Original work published 1975].
Campbell, D. T. (1999). Relabeling internal and external validity for applied social scientists. In Social Experimentation
(D. T. Campbell and M. J. Russo, eds.), pp. 111–122. Sage,
Thousand Oaks, CA [Original work published 1986].
Collier, D. (1993). The comparative method. In Political
Science: The State of the Discipline II (A. W. Finifter, ed.),
pp. 105–119. American Political Science Association,
Washington, DC.
Cronbach, L. J. (1982). Designing Evaluations of Educational
and Social Programs. Jossey-Bass, San Francisco.
Freedman, D. A. (1991). Statistical analysis and shoe leather.
Sociol. Methodol. 21, 291–313.
Goldthorpe, J. H. (2000). On Sociology: Numbers, Narratives,
and the Integration of Research and Theory. Oxford
University Press, Oxford, UK.
Good, P. I. (2000). Permutation Tests: A Practical Guide to
Resampling Methods for Testing Hypotheses. Springer,
Berlin.
Kagel, J. H., and Roth, A. E. (eds.) (1995). The Handbook
of Experimental Economics. Princeton University Press,
Princeton, NJ.
King, G., Keohane, R. O., and Verba, S. (1994). Designing Social Inquiry: Scientific Inference in Qualitative
Research. Princeton University Press, Princeton, NJ.
Mahoney, J. (2000). Strategies of causal inference in small-N
research. Sociol. Methods Res. 28(4), 387–424.
McDermott, R. (2002). Experimental methods in political
science. Annu. Rev. Political Sci. 5, 31–61.
Oehlert, G. W. (2000). A First Course in Design and Analysis
of Experiments. Freeman, New York.
Shadish, W. R., Cook, T. D., and Campbell, D. T. (2002).
Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston.
Smith, H. L. (1990). Specification problems in experimental and
nonexperimental social research. Sociol. Methodol. 20, 59–91.
Tashakkori, A., and Teddlie, C. (1998). Mixed Methodology:
Combining Qualitative and Quantitative Approaches. Sage,
Thousand Oaks, CA.
Webb, E. J., Campbell, D. T., Schwartz R. D., and Sechrest, L.
(2000). Unobtrusive Measures. Revised Edition. Sage,
Thousand Oaks, CA.
Western, B., and Jackman, S. (1994). Bayesian inference
for comparative research. Am. Political Sci. Rev. 88(2),
412–423.