Research Designs

Gerardo L. Munck, University of Southern California, Los Angeles, California, USA
Jay Verkuilen, University of Illinois at Urbana-Champaign, Champaign, Illinois, USA

Glossary

case study: A study in which one unit is analyzed, typically in an intensive manner that is attentive to time and process. Although, strictly speaking, in a case study the N = 1, frequently the effective number of observations is considerably higher.

cross-sectional design: A study in which observations on one or more variables are collected across units at the same point in time.

experimental design: A study in which the treatment is consciously manipulated by the researcher and in which units are randomly assigned to treatment and control groups. To distinguish experimental from quasi-experimental designs, the former are sometimes called randomized experiments.

external validity: Concept originally introduced by Donald T. Campbell to refer to the generalizability of the finding of a causal relationship between variables beyond the domain of the actual units, spatial and temporal setting, and specific treatments that are examined.

internal validity: Concept originally introduced by Donald T. Campbell to refer to a causal relationship between variables in the actual units, spatial and temporal setting, and specific treatments that are examined.

large N study: A study in which observations are made across a large number of units. Such studies vary significantly in terms of their N, with typical cross-national studies in the fields of comparative politics and international relations oscillating between 30 and 100 and those using opinion surveys reaching into the thousands.

longitudinal design: A study in which multiple observations on variables are collected across time for the same unit. Also known as a time series design.

observational studies: Nonexperimental studies, also called correlational studies, in which the treatment is not consciously manipulated by the researcher. Rather, researchers simply record the values of variables as they naturally occur. These studies include natural experiments, in which particularly sharp and obvious changes in the value of a variable are held to offer an analogy to the introduction of a treatment.

pooled time series, cross-sectional design: A design that combines a cross-sectional design and a longitudinal design; includes panel studies and repeated measures designs.

quasi-experimental design: A design in which the treatment is consciously manipulated by the researcher, as in an experiment, but in which, unlike in experiments, units are not randomly assigned to treatment and control groups.

research design: A key aspect of the research process that revolves around the direct impact on the prospects of causal inference of four core questions: How is the value of the independent variable(s) assigned? How are units selected? How many units are selected? And how are comparisons organized (i.e., temporally and/or spatially)? In addition, research designs can be evaluated in terms of their indirect impact on causal inference in light of their requirements and contributions vis-à-vis theory and data.

small N study: A study in which observations are made across a small number of units. Typically, each unit is treated as a case study, so that multiple observations on each unit are made.
The methodology of research design hinges on the choices made with regard to four core questions: How is the value of the independent variable(s) assigned? How are units selected? How many units are selected? And how are comparisons organized (i.e., temporally and/or spatially)? These choices can be assessed in terms of their direct impact, but also their indirect impact—due to their requirements and contributions vis-à-vis theory and data—on the prospects of making causal inferences. Three research traditions—experimental and quasi-experimental, quantitative, and qualitative—represent distinct responses to these methodological choices, and each has important strengths but also significant weaknesses. Thus, the need for choices about research design to be explicitly addressed and justified, and the need to actively construct bridges across research traditions, is emphasized.

Introduction: Goals, Problems, and Options

The pioneering work on research design by Donald T. Campbell and associates has made such a major contribution that it is difficult to think about research design without, in one way or another, drawing on their insightful discussions. They provide a valuable template for thinking about research design, helpfully framing the discussion in terms of the basic goal of validity, a range of problems or threats to validity, and a set of options or design choices that can be pursued as a way to guard against these threats. At the same time, their work displays some limitations and biases and ultimately fails to offer a clear, encompassing, and balanced understanding of the challenges involved in research design. This assessment and comparison of research designs thus adopts Campbell et al.'s basic template and some of their key ideas about the goals, problems, and options of research design but also parts company with them in some significant ways.

First, Campbell et al.'s discussion of the goals of research design in terms of the concept of validity is both somewhat confusing and biased. Initially, Campbell introduced the concepts of internal and external validity, which disaggregated the ultimate goal of research design—to increase the prospects of making causal inferences—and aptly distinguished different problems of causal inference that are affected and potentially solved by different design choices. Over time, however, these two concepts have been awkwardly relabeled and, more important, defined in different ways in different texts. In addition, two other types of validity—statistical conclusion validity and construct validity—that pertain in part to research design but spill over into questions of data analysis and measurement, respectively, have been introduced into the discussion, further complicating matters.

Another problem with this typology of validity is that it is somewhat biased. Thus, Lee Cronbach has argued that Campbell et al. gave undue primacy to internal over external validity and did not recognize the importance of generalizability. This is a critique that they have acknowledged and sought to address in their latest statement. However, they still do not fully incorporate the difficulties of generalizing on the basis of experimental and quasi-experimental designs in their overall assessment of the strengths and weaknesses of design options.
To avoid these problems, we consider only internal and external validity to be the core goals directly relevant to a discussion of research design, and we both retain the classic labels of internal and external validity and follow the original definitions offered by Campbell and Julian Stanley in 1966. As they argue, the establishment of the internal validity of a causal proposition involves showing that a factor is the cause of an effect or, more modestly, probing alternative hypotheses and opting for those that stand up best to attempts at disconfirmation. In contrast, the verification of the external validity of a causal proposition entails demonstrating that a causal proposition can be generalized beyond the domain of the actual units, spatial and temporal setting, and specific treatments that are actually studied. In turn, both internal and external validity can be viewed as distinct and equally essential goals of research design.

Second, Campbell et al.'s discussion of the problems of causal inference is organized around a fairly ad hoc and cumbersome list of threats to validity. To be sure, the listed threats to validity are all important, and their analysis of the way different designs get around or fail to get around these threats is exemplary. Indeed, they offer many specific recommendations that creatively respond to thorny and complex problems routinely encountered in the conduct of substantive research. However, they do little to present their list of threats in a logically explicit manner, to distinguish threats that are relevant to experimental designs from those that pertain to nonexperimental designs, or to offer a sense of the extent to which the threats to validity they discuss constitute a complete list.

In place of their list, we propose a scheme whereby research design choices are evaluated in light of a set of problems of causal inference that are directly and indirectly affected by design choices. Figures 1 and 2 provide a graphic representation of the assumptions and potential problems of causal inference that design options impact in a direct manner. With regard to the analysis of single units, the core concerns are the need to ascertain that the posited direction of causality is correctly described and that the proposed model of the link among causal variables is fully and correctly specified. With regard to the analysis of multiple units, the key issues are the need to validate the assumptions of unit independence and unit homogeneity. In turn, Fig. 3 draws attention to the need to place the discussion of research design in the broader context of the research process and to consider how design options have an indirect but important impact on causal inference as a result of their requirements and contributions vis-à-vis theory and data.

[Figure 1. Problems of causal inference: issues in the analysis of single units. The figure pairs each assumption with its potential problems. The assumption about the direction of causality is threatened by reverse causation and by endogeneity (e.g., reciprocal, nonrecursive causation). The assumption of a fully and correctly specified model of the link among causal variables, in which every variable affecting the dependent variable (DV) is included and correctly specified as an antecedent, interacting, intervening, or additional independent variable (IV), is threatened by spuriousness, by omitted variables (failure to identify a variable that affects the DV), and by misspecified variables (failure to correctly specify the link between identified variables, e.g., an interacting variable modeled as another independent variable).]

[Figure 2. Problems of causal inference: issues in the analysis of multiple units and single units across time. The figure pairs each assumption with its potential problems. The assumption of the independence of units (i.e., units are not affected by each other) is threatened by diffusion across units (Galton's problem) and by diffusion within units across time (the problem of history). The assumption of unit homogeneity (i.e., changes in the dependent variable are produced by the same causal model) is threatened by the heterogeneity of units within a sample or a population.]

[Figure 3. Research design in context: theory and data feed into research design, which in turn shapes causal inference. Although research is frequently portrayed as proceeding in a linear fashion from theory to data collection and causal inference, as a matter of practice research is an interactive and iterative process. Thus, it is important to recognize that design options are (i) affected by the state of available theory and can also affect theory building and (ii) limited by the availability of data and also have an impact on the quality of data that are generated.]
Third, Campbell et al.'s discussion of design options is fairly limited and their assessment biased toward experimental designs. Indeed, they place such strong emphasis on certain design options—the ability of the researcher to consciously manipulate independent variables and randomly assign units to treatment and control groups—that they offer a narrow and unbalanced optic on questions of research design. They do highlight the difficulties of conducting experiments and emphasize the ways in which even experiments can never guarantee that all threats to validity are eliminated. Moreover, occasionally they offer an exemplary display of pluralism. However, they tend to overlook some significant shortcomings associated with experimental data and downplay the significant potential virtues of nonexperimental data, and they ignore both the role played by design choices other than those that are defining elements of experiments and the way in which design choices have an indirect impact on causal inference. Thus, to assess the strengths and weaknesses of different research designs in a systematic and balanced manner, it is important to recognize that design options revolve around at least four core questions: How is the value of the independent variable(s) assigned? How are units selected? How many units are selected? And how are comparisons organized (i.e., temporally and/or spatially) (Table I)? Moreover, it is necessary to address the manner in which design choices have both a direct and an indirect impact on causal inference.
Table I  Research Designs: Classification Criteria and Options
(Each classification criterion is listed with its main options and, after the colon, the corresponding disaggregate designs.)

How is the value of the independent variable(s) assigned?
  Manipulation, with random assignment of cases to treatment and control groups: experimental (randomized experiment)
  Manipulation, with nonrandom assignment of cases to treatment and control groups: quasi-experimental
  Nature: observational

How are units selected?
  Random sample: representative
  Purposive (deliberate or intentional) sample: typical (mode, mean, or median); heterogeneous (extreme and typical)
  Entire population: census

How many units are selected?
  Many: large-N
  Few: small-N
  One: case study

How are comparisons organized?
  Across units: cross-sectional
  Across time: longitudinal

Recognizing the seminal nature of the work by Campbell et al. but also seeking to overcome its limitations, this article provides an overview of the current state of knowledge on research design. The number of alternative research designs is, in principle, very large. Indeed, focusing only on the main options related to the four criteria presented in Table I, it is clear that the range of possible designs is much greater than usually believed, although not all will be sensible. However, the following discussion is organized in terms of research traditions, given that this is how much research and discussion about design options is carried out, focusing first on experimental and quasi-experimental designs, turning next to large N or quantitative designs, and completing the discussion with case study and small N or qualitative designs. In each case, the focus is on the prototypical designs associated with each tradition and an assessment and comparison of the strengths and weaknesses of these designs. The need for choices about research design to be explicitly addressed and justified, and the need to actively construct bridges across research traditions, is emphasized.

Experimental and Quasi-Experimental Designs

Experimental designs, also called randomized experiments, are characterized by two distinguishing features: (i) the conscious manipulation by the researcher of a treatment or, more generically, an independent variable of interest, and (ii) the random assignment of units to treatment and control groups (Fig. 4). Experiments also draw upon other design elements, and thus it is possible to distinguish a variety of experimental designs. However, these two features are the defining features of the prototypical experimental design.

[Figure 4. Experimental designs: defining features. The units are randomly divided into two groups—the treatment group and the control group—and a treatment is applied to the treatment group and not to the control group. Measures are taken of the values of the dependent or outcome variable for the treatment and control groups. The generated data are organized as follows:

                                      Treatment group         Control group
                                      Units 1, 2, 3, ..., n   Units 1, 2, 3, ..., n
  Independent variable (treatment)    Yes                     No
  Dependent variable                  Value                   Value

Alternatively, after the units have been randomly divided into two groups, a pretest measure of the value of the dependent variable for both the treatment group and the control group is taken. The generated data are organized as follows:

                                      Treatment group         Control group
                                      T1         T2           T1         T2
  Independent variable (treatment)    No         Yes          No         No
  Dependent variable                  Value      Value        Value      Value

In both instances, the analysis of the data focuses on the difference in values on the dependent variable between treatment and control groups (the posttest values), a difference that can be interpreted as a measure of the causal effect of the treatment. When a pretest measure on the dependent variable is obtained in addition to a posttest measure, a comparison of the values of units of the treatment and control groups at T1 serves to double-check that all units are "equivalent," given that this goal is already ensured by the random assignment of units.]

The strengths of this design are undeniable. Given that the treatment is administered before a posttest measure on the dependent variable is registered, the direction of causality is clearly established. In addition, the random assignment of units to treatment and control groups ensures that these groups are equivalent; that is, they do not vary systematically on any variable except the manipulated variable. In this manner, multiple unexamined variables are held not to vary in any patterned manner and, hence, control over these alternative hypotheses is ensured by turning other variation into noise. Thus, the beauty of experimental designs is that the data generated lend themselves to a particularly simple form of analysis, in which the differences in posttest measures of the treatment and control groups can be interpreted as a measure of the causal effect of the treatment. Indeed, experiments are the most powerful means of gaining control and establishing the internal validity of a causal claim (Table II).
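The logic of this analysis can be made concrete with a minimal simulation, sketched here in Python; the sample size, effect size, and variable names are invented for illustration and are not drawn from any particular study.

    import numpy as np

    rng = np.random.default_rng(42)
    n = 1000  # hypothetical number of units

    # Random assignment: half the units to treatment, half to control.
    treated = rng.permutation(np.repeat([True, False], n // 2))

    # An unexamined background variable; randomization turns it into noise.
    background = rng.normal(size=n)

    # Posttest outcome, with an assumed true treatment effect of 2.0.
    outcome = 2.0 * treated + background + rng.normal(size=n)

    # The difference in posttest means estimates the causal effect.
    effect = outcome[treated].mean() - outcome[~treated].mean()
    print(f"estimated effect: {effect:.2f} (true effect: 2.0)")

Because assignment is random, the background variable is balanced across the two groups in expectation, so the simple difference in means recovers the treatment effect without any explicit modeling of extraneous variables.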
A quasi-experimental design differs from an experimental design in that in the former the treatment is consciously manipulated by the researcher but the units are not randomly assigned to treatment and control groups. Thus, this design offers less of a basis for assuming the initial equivalence of treatment and control groups, and it requires researchers to consider how third, extraneous variables might confound efforts at ascertaining causal effects and to be explicit about alternative hypotheses. The difficulties this difference between experiments and quasi-experiments introduces are central to the work of Campbell et al. Indeed, their work can be understood essentially as an effort to alert researchers to the various ways in which factors other than the treatment may be responsible for the observed posttest variance between treatment and control groups. Thus, in large part due to the significant effort by Campbell et al. to anticipate potential threats to the validity of quasi-experiments—and to offer a range of designs that minimize and help guard against such threats—researchers have a sophisticated road map for consciously taking into consideration how third, extraneous variables may exercise a confounding effect. In summary, even though quasi-experiments are more complicated than experiments, both designs offer a powerful basis for making claims about causality, and it is thus hardly surprising that these designs have been used often in psychology and economics and are increasingly being adopted by political scientists.

Table II  Experimental Designs: An Assessment

Design element: Conscious manipulation of independent variables
  Strengths: Establishes causal direction (internal validity).
  Weaknesses: Lack of viability, due to practical and/or ethical reasons, to study many important questions; unsuitable for the study of action and lack of attention to causal mechanisms (internal validity); unsuitable for an assessment of complex causes; requires a priori knowledge of plausible independent variables and a well-specified causal model, and not useful in theory generation; tends to generate obtrusive, reactive measurements.

Design element: Random assignment of units to treatment and control groups
  Strengths: Establishes equivalence of units and helps guard against third, extraneous variables (internal validity).
  Weaknesses: Lack of viability, due to practical and/or ethical reasons, to study many important questions.

Design element: Nonrandom selection of units, setting, and treatment
  Weaknesses: Difficult to generalize from sample to population (external validity).
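The nonequivalence problem that this road map addresses can be illustrated with a companion sketch, again in Python with invented quantities: when assignment is nonrandom, the naive difference in means mixes the treatment effect with preexisting group differences.

    import numpy as np

    rng = np.random.default_rng(1)
    n = 10_000  # hypothetical number of units

    # A third, extraneous variable (say, motivation) that drives both
    # selection into treatment and the outcome itself.
    motivation = rng.normal(size=n)

    # Nonrandom assignment: more motivated units opt into treatment.
    treated = motivation + rng.normal(size=n) > 0

    # Outcome, with an assumed true treatment effect of 1.0.
    outcome = 1.0 * treated + 2.0 * motivation + rng.normal(size=n)

    naive = outcome[treated].mean() - outcome[~treated].mean()
    print(f"naive difference in means: {naive:.2f} (true effect: 1.0)")

The naive difference (roughly 3.3 in this setup) overstates the true effect because the treatment group was more motivated to begin with; a quasi-experimental analysis must therefore measure and model such extraneous variables explicitly.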
The considerable strengths of experimental and quasi-experimental designs notwithstanding, it is important to recognize the serious weaknesses associated with these designs. One standard limitation concerns the viability of conducting experiments, whether for practical and/or ethical reasons, to study the sort of questions that are of interest to many social scientists. The fact that we cannot, or would not want to, manipulate some variables and/or randomly assign subjects to control and treatment groups has vast implications. Indeed, as Hubert Blalock (1991, p. 332) argues, "If we were to confine our analyses to experimental and quasi-experimental designs, virtually all of sociology and political science would have to go to the wayside."

However, the problem is deeper than the standard concern with viability would indicate and actually derives from the very substance of the subject matter of the social sciences. The core difference between the social and natural sciences is that the former are first and foremost about agents and actions. However, a basic feature of experimental and quasi-experimental designs—the conscious manipulation of the treatment—is founded on a notable asymmetry, which places the experimenter in an active role and relegates the experimentee to a passive, reactive role. In effect, experiments and quasi-experiments treat subjects as objects and embody an implicit behaviorist perspective that renders them ineffectual instruments for the study of the causal significance of action and, relatedly, forces them to be silent on the critical question of causal mechanisms. Indeed, as William Shadish et al. recognize, although experimental and quasi-experimental designs can be used to predict what the effect of a factor is, they are less useful for explaining why and how such effects are generated. This is a significant limitation. Indeed, it is not far-fetched to argue that the internal validity of a causal argument is not fully established until the causal mechanisms that generate the causal effect are properly identified and tested.

Experiments are also of limited use with regard to the study of complex causal relationships. This failure is in part due to the inability to deal squarely with agency through experiments and quasi-experiments. Consequently, these designs are not suitable means for getting at a core theoretical concern in social theory: the interaction between structures and agents, or macro and micro causal factors. In addition, because experiments and quasi-experiments are in essence instruments geared to the study of short-term effects, they are not useful for studying causes that work themselves out over an extended period of time or for assessing the interaction between long- and short-term causal factors. Moreover, because experimenters, as a way to manipulate the treatment, must rigidly assign variables to the status of independent and dependent variables, they do not constitute a means of assessing feedback effects or reciprocal and nonrecursive causation. In short, a range of theories simply cannot be tested through experimental and quasi-experimental designs.

Another significant limitation concerns the generalizability of results derived from experiments. The reason for this is that although a defining feature of experiments is that units are randomly assigned to treatment and control groups, the units, settings, and treatments are usually not randomly selected. Indeed, inasmuch as control is gained through the manipulation of the treatment and the random assignment of units to treatment and control groups, it is practically inevitable that the ability to randomly select these units will decline. The purposive or intentional selection of units that researchers have to use is not without merits, and it can certainly be carried out with an eye to the relationship between the studied sample and the universe. However, even when carefully practiced, purposive selection tends to lead to biased results. Indeed, in many instances the gap between the conditions of experimental research (whether in the laboratory or in field settings) and the phenomenon being studied can be substantial; hence, the ability to generalize beyond the domain of the actual units, spatial and temporal setting, and specific treatments that are examined is compromised. Thus, it is important to recognize that both internal and external validity are critical aspects of knowledge, and that there are good reasons for the standard view that the gains made by experiments in terms of internal validity tend to come at the cost of a loss in external validity.

Finally, two other limitations are indirect consequences of the manipulation of treatments that characterizes experimental and quasi-experimental designs. First, because of this design element, experiments and quasi-experiments approach the two-sided issue of causal theorizing from one side: They focus on the effects of causes rather than on the causes of effects. Thus, they require a priori knowledge about plausible independent variables and presume that all that needs to be determined is the effect of preselected causes. Hence, experiments and quasi-experiments are of less use during the early, exploratory stage of the research process, when a typical challenge is to uncover potential independent variables by working backward from a dependent variable. Furthermore, experiments and quasi-experiments assume a well-specified causal model and thus require that the state of theory building already be fairly advanced. Second, the manipulation of treatments makes experiments and quasi-experiments a particularly obtrusive and reactive form of generating data. The gain made by manipulating treatments comes at the cost of the quality of the data generated for analysis.
In summary, as shown in Table II, although experiments especially, but also quasi-experiments, have some important strengths, they are also associated with a number of significant weaknesses.

Observational Designs

The distinction between experimental and observational studies is a fundamental and deep one. The key difference is that in observational studies, control of possible third variables is not attained "automatically" through random assignment. Rather, in observational studies third variables have to be formulated and measured explicitly, and control is sought through the analysis of the data. However, the distinction between large N studies, on the one hand, and case and small N studies, on the other hand, is equally profound and probably more pervasive. This second distinction is not unique to observational studies. Indeed, the quantitative vs. qualitative distinction runs through both the experimental and the observational research communities. However, the discussion of this distinction is developed here only in the context of observational studies and focuses primarily on the prototypical quantitative and qualitative studies: the large N, cross-sectional study and the small N study based on the longitudinal case study, respectively.

Large N Studies

A large N study has some considerable strengths that make it, in some regards, superior to an experimental study. First, because it uses data generated through the natural course of events, it is a viable design for studying important questions that involve nonmanipulable variables and that cannot be addressed with an experimental method. Second, because a large N study is not constrained by the requirement to randomly assign units to treatment and control groups, it is more likely to entail a randomly selected sample. This is a major benefit that gives large N researchers the ability to generalize beyond the domain of the actual units, spatial and temporal setting, and specific treatments that are actually studied and to establish the generalizability of their findings (external validity), a core weakness of experimental methods. However, much as is the case with regard to random assignment in the context of experimental designs, the beauty of random selection is tarnished by the difficulty of applying this design element to many units, settings, and variables (Table III).

Another important strength of large N studies, due to their tendency to study quite large samples, is their ability to use statistical analysis to establish patterns of association with a high degree of precision and confidence. Such results, however, differ significantly from those that can be obtained using experimental data. Indeed, although experimental data offer strong grounds for making claims about causality, because large N studies are observational it is crucial to remember the simple but profound point that "association is not causation" and, moreover, that even "a lack of correlation does not disprove causation" (Bollen, 1989, p. 52). This does not mean that claims about causality cannot be made on the basis of large N studies. Indeed, such claims can be made legitimately if researchers verify the causal assumptions of their statistical models, including nonspuriousness, the lack of omitted variables, independence of cases, and unit homogeneity. However, it is extremely difficult to substantiate that patterns of association establish causality (internal validity).
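The point that association is not causation can be illustrated with a small simulation, again in Python with invented quantities: a lurking common cause induces a strong correlation between two variables even though one has no effect on the other, and the association vanishes once the common cause is modeled.

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5000  # hypothetical number of units

    z = rng.normal(size=n)            # common cause (confounder)
    x = z + rng.normal(size=n)        # x is driven by z; x does not affect y
    y = 2.0 * z + rng.normal(size=n)  # y is driven by z alone

    # Strong spurious association (correlation around 0.6).
    print(f"correlation of x and y: {np.corrcoef(x, y)[0, 1]:.2f}")

    # Regressing y on x while controlling for z: the coefficient on x
    # falls to roughly zero once the confounder is modeled.
    X = np.column_stack([np.ones(n), x, z])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    print(f"coefficient on x, controlling for z: {beta[1]:.2f}")

The catch, of course, is that the analyst must know to measure z in the first place; nothing in the observational data itself flags the omitted variable.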
Table III  Observational Designs: Large-N Studies

Design element: Assignment of the value of the independent variable(s) by nature
  Strengths: Viability of studying important questions that involve nonmanipulable variables.

Design element: Random selection of units or selection of an entire population
  Strengths: Establishes generalizability (external validity).
  Weaknesses: Lack of viability for many units, settings, and variables.

Design element: Selection of many units that are compared cross-sectionally and/or longitudinally
  Strengths: Establishes patterns of association with a high degree of precision and confidence; constitutes a tool for theory generation.
  Weaknesses: Association does not establish causation (internal validity), and it is difficult to verify the causal assumptions in statistical models; associations are more interpretable when guided by a strong theory, i.e., a theory with few variables and detailed predictions; measurement validity is harder to establish the larger the N and the larger the number of variables; lack of attention to causal mechanisms (internal validity).

In short, claims about causality derived from large N studies should be treated with great caution. Regarding the indirect consequences of design choices, it bears emphasizing that large N studies are quite useful for exploratory work. In an experiment, treatments need to be planned in advance and need to be relatively few in number, a particularly difficult and restrictive requirement for many areas of the social sciences. In contrast, at least with regard to the kinds of questions that are common to macro-oriented inquiries in sociology, political science, and economics, it is possible to obtain more data after examining some relationships. This practice runs the risk of capitalizing on chance if the results are not rigorously tested on new data, but it is potentially an important part of any study.

This benefit notwithstanding, a core problem regarding the demands that large N research puts on theory and data should be highlighted. On the one hand, associations are more interpretable when guided by a strong theory—that is, one with few variables and detailed predictions. In other words, good large N research not only requires a causal model specified prior to testing but also puts a heavy burden on the ability of theory builders to reduce the number of potential explanatory factors. On the other hand, to ensure that the causal model is fully specified, it is important that potential independent variables not be omitted. This fact complicates the interpretation of results. Moreover, it makes a heavy, sometimes practically impossible, demand concerning data. Indeed, because potentially confounding factors are not controlled "automatically" through random assignment in large N studies but rather must be modeled and measured explicitly, the need to collect data on numerous variables across a large number of cases and/or time entails some serious costs. This demand opens the door to well-founded charges concerning the validity of the data. In addition, the data requirements of large N designs make it extremely difficult to offer a quantitative study of causal mechanisms, a crucial shortcoming that further weakens the claims about causality (internal validity) that can be made on the basis of a large N study. Gains in terms of generalizability and findings about patterns of association are heavily dependent on good theory and good data—valuable resources that are not always available.
Some advances that go beyond the cross-sectional design typically used in large N studies help get around some of these limitations. Especially noteworthy are time series and panel studies, event history analysis, and hierarchical modeling. These methods offer fruitful ways of addressing the problems associated with the assumptions of unit independence and unit homogeneity. However, these potential gains tend to make even more imposing demands in the area of data collection than cross-sectional designs do, and thus are achieved at a cost. In summary, as Table III highlights, much as in the case of experimental and quasi-experimental designs, it is only fair to note the strengths of quantitative or large N designs but also to recognize their weaknesses.

Case and Small N Studies

The status of the case and small N studies tradition in the social sciences has been marked by a notable disjuncture between the high number of its practitioners and its low standing in the broader methodological community. This odd situation, however, has been changing over time. After famously stating that "one-shot case studies [are] of almost no scientific value," Campbell retracted this harsh critique of a staple of qualitative research. In turn, statistician David Freedman has argued that case studies, when well designed, can establish causality in a more powerful manner than is standard in most quantitative research. Finally, a flourishing debate on case and small N research in recent years has increasingly made explicit the rationale for choosing a qualitative research design and the methodological foundations of rigorous qualitative research. As a result, the reconstructed logic of qualitative methods is catching up with the logic of qualitative research, and a clear sense of the strengths and weaknesses of this tradition is emerging.

One well-established and important strength of case and small N studies is that they represent a viable design for addressing important questions that involve nonmanipulable variables. This is a virtue shared by small and large N studies, but in this regard small N researchers have an advantage, particularly in light of their ability to gain access to various types of data and form a complex picture of their cases, including a keen sense of developments over time. For this reason, much of the analysis in the social sciences on a range of critical questions, especially during the initial stages of the research process, is done by qualitative researchers (Table IV).

A feature of case and small N studies that sets them apart from both large N and experimental studies, and that accounts for a key comparative advantage, is the manner in which these studies can be used to analyze the role of agency and hence to establish causal mechanisms. This is a critical point. Indeed, as philosophers of science and methodologists have increasingly insisted, it is not enough to focus on causal effects. Rather, it is necessary to go beyond statements about what the effect of a factor is and to consider how and why the effect operates. This calls for the specification of causal mechanisms, which requires, in most areas of the social sciences, considering agency. In this regard, it is probably not an overstatement to suggest that a qualitative design is the method par excellence to study agency and hence to empirically ground the analysis of causal mechanisms.
Table IV  Observational Designs: Case and Small-N Studies

Design element: Assignment of the value of the independent variable(s) by nature
  Strengths: Viability of studying important questions that involve nonmanipulable variables.

Design element: Selection of one or a few units that are compared longitudinally and/or cross-sectionally
  Strengths: Addresses agency and establishes causal mechanisms; the study of causal mechanisms, and within- and cross-case analysis, offers a basis for assessing causality (internal validity); constitutes a powerful tool for theory generation; measurement validity is easier to establish the smaller the N.
  Weaknesses: In the absence of strong theory, it is difficult to establish control and eliminate potential variables (internal validity).

Design element: Purposive selection of a few units
  Weaknesses: Difficult to generalize from sample to population (external validity).

A related strength of case and small N studies is that they offer a basis for assessing causality (internal validity). This is done, most fundamentally, through a within-case form of analysis that uses empirical evidence about causal mechanisms as a way to check expectations concerning the direction of causal processes, to eliminate potential third variables, and to verify the assumptions of unit independence and unit homogeneity. Moreover, this goal is also frequently advanced by combining the prototypical longitudinal within-case design with a cross-sectional design, such as a traditional cross-case study or a cross-sectional within-case study either focused on different implications of a theory or cast at a different level of aggregation. Through these means, which capitalize on knowledge about process and on the ways in which case studies lend themselves to more observations than are suggested by the strict definition of a case study as an N = 1 study, small N researchers can make valuable contributions that are important to highlight.

However, the weaknesses of qualitative methods as a tool for establishing causality should also be duly recognized. To use case studies to assess causality, it is necessary to be explicit about the posited causal model—including all its variables, the relationship among the variables, and the form of the effect of each variable or combination of variables—as well as about the causal mechanisms that are considered to be in play. This is a demanding task, and failing to specify these things in advance makes it easy for researchers to simply focus on confirming evidence and to disregard alternative interpretations. Indeed, a drawback of most analyses of causal mechanisms is that they fail to specify formally what mechanisms are posited and what plausible alternative mechanisms should be considered, and also fail to set up the study of mechanisms as a standard test among competing hypotheses. Moreover, even if such steps are taken, it is necessary to recognize the limits of small N analysis, in light of the limited number of observations such studies entail, as a tool for causal assessment.

Some important exceptions to this general principle exist. First, occasionally qualitative researchers may be able to design their research in a way that resembles a natural experiment, in which the hypothesized causal factor changes markedly while other factors remain the same. Second, it is possible to establish patterns of association with precision even with a relatively small sample, using techniques of analysis such as exact tests, permutation tests, resampling, or Bayesian methods that do not rely on asymptotics. Third, presuming a powerful theory is being tested, a few observations could very well serve as the basis for clear results.
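To illustrate the second of these exceptions, the following sketch runs an exact permutation test, in Python, on a hypothetical small-N comparison of five "treated" and four "untreated" cases; the scores are invented for illustration.

    from itertools import combinations

    # Hypothetical outcome scores for a small-N comparison.
    treated = [7.1, 6.8, 7.4, 6.9, 7.2]
    control = [6.2, 6.5, 6.0, 6.6]
    pooled = treated + control
    observed = sum(treated) / len(treated) - sum(control) / len(control)

    # Exact test: enumerate all C(9, 5) = 126 relabelings of the cases.
    extreme = total = 0
    for idx in combinations(range(len(pooled)), len(treated)):
        t = [pooled[i] for i in idx]
        c = [pooled[i] for i in range(len(pooled)) if i not in idx]
        diff = sum(t) / len(t) - sum(c) / len(c)
        total += 1
        if diff >= observed:
            extreme += 1

    print(f"one-sided exact p-value: {extreme / total:.3f}")

Because every relabeling is enumerated, the p-value is exact and relies on no large-sample approximation; with nine cases, however, the smallest attainable p-value is bounded below by one over the number of relabelings, here 1/126.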
However, qualitative researchers frequently seek to assess complex causal relationships and, when this is the case, small N designs tend to constitute weak means of controlling for alternative hypotheses and of ruling out chance. Indeed, claims to test complex causes using small N designs rely on the quite unreasonable assumptions that (i) the world operates in a deterministic fashion, (ii) the proposed causal model is complete, and (iii) there is no measurement error. In short, the use of case and small N studies to assess causality tends to rely on some very stringent assumptions concerning the state of theory and data.

Another significant limitation concerns the difficulty of using case studies to make generalizations. When studying a small number of cases, random selection is not advisable. Indeed, random selection offers a basis for generalization only when a large number of cases is selected and hence the law of large numbers takes force. Thus, qualitative researchers have to resort to the purposive selection of their cases. When this is done, it is important to avoid drawing a sample of convenience that bears an unclear relationship to the broader population. Moreover, it is important that researchers be aware of how their choice of cases might introduce bias. Such efforts to select cases in a conscious and careful manner, however, should not be mistaken for steps providing a sufficient basis for making claims about generalizations (external validity).
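The dependence of random selection on sample size can be illustrated with one final sketch, using an invented population of 100,000 units: the means of small random samples scatter widely around the population mean, whereas the means of large samples cluster tightly.

    import numpy as np

    rng = np.random.default_rng(7)
    population = rng.normal(loc=50.0, scale=10.0, size=100_000)

    for n in (5, 500):
        means = [rng.choice(population, size=n, replace=False).mean()
                 for _ in range(2000)]
        print(f"n = {n:3d}: spread (sd) of sample means = {np.std(means):.2f}")

With n = 5, the standard deviation of the sample means is roughly 4.5; with n = 500, it is roughly 0.45. Random selection thus licenses generalization only as the number of cases grows, which is why purposive selection is the realistic option for small-N research.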
Finally, with regard to the indirect consequences of design choices, two strengths of case and small N studies deserve mention. One is that this type of design constitutes a powerful tool for theory generation. Indeed, one of the clear benefits of qualitative research is its fruitfulness as a tool for generating ideas about causal variables and mechanisms and for theorizing that is closely informed by knowledge of the substantive problem of interest. Another, related benefit is that the detailed knowledge of context that is associated with case studies plays a critical role in helping researchers establish measurement validity. In summary, as is the case with the experimental and quantitative traditions, the qualitative tradition is characterized by a mix of strengths and weaknesses (Table IV) that must be considered in any balanced assessment of the potential of different research designs.

Conclusion: Choices and Bridges

The broad message concerning research design we have sought to convey is that making causal inferences about the complex realities of interest to social scientists is probably more difficult than is generally believed and that questions of research design play a key role in determining whether researchers have a solid basis for making claims about causal relationships. Technical fixes cannot, in general, get around design problems, yet technical fixes tend to receive more attention than questions of design. Indeed, given the centrality of research design to the research process as a whole, it is probably fair to say that research design is a relatively unappreciated aspect of methodology and that it deserves more attention from methodologists and practicing researchers alike. Beyond this generic admonition, two further points that build on but go beyond the previous discussion offer material for further consideration.

One point is the need for choices about research design to be explicitly addressed and justified. There is a tendency for researchers to work within distinct research traditions and to simply opt for certain designs by default. Such a tendency is understandable in that different designs require different skills and training, and in a practical sense researchers are thus not entirely free to choose among research designs. Nonetheless, short of justifying their design choices in light of the range of possible options, at the very least researchers should address the impact of their choices on the certainty of their conclusions. As this article shows, this entails a consideration of the direct impact on the prospects of causal inference of the four core choices involved in research design and of the indirect impact, due to their requirements and contributions vis-à-vis theory and data, of these choices.

A second point is the need to creatively construct bridges across research traditions. Although it is common for certain traditions to be presented as inherently superior to others and as the standard against which other traditions should be measured, this article has shown that all traditions are characterized by certain strengths and weaknesses and that it is thus more accurate and useful to think in terms of the trade-offs involved in working within different traditions. An implication of this assessment, then, is that greater effort should be made to capitalize on what are clearly the complementary strengths of different traditions (compare Tables II-IV). Efforts at bridging, whether carried out through multiple studies on the same question or through mixed designs that combine multiple designs within a single study, are very demanding. Thus, although it is common to point out that multiple studies in the context of a shared research program offer a way of combining different designs, such combinations are only effective inasmuch as research programs are organized around clearly specified concepts and questions and are advanced, at least to a certain extent, through explicitly coordinated teamwork. In turn, the effective use of mixed designs requires a level of methodological sophistication, as well as theoretical and substantive knowledge, that is rare. Nonetheless, the high payoffs associated with the use of mixed methods make these options strongly recommendable.

See Also the Following Articles

Experiments, Overview; Explore, Explain, Design; Longitudinal Cohort Designs; Observational Studies; Quasi-Experiment; Sample Design; Survey Design; Time-Series Cross-Section Data; Validity, Data Sources

Further Reading

Blalock, H. (1991). Are there really any constructive alternatives to causal modeling? Sociol. Methodol. 21, 325-335.
Bollen, K. A. (1989). Structural Equations with Latent Variables. Wiley, New York.
Brady, H. E., and Collier, D. (eds.) (2004). Rethinking Social Inquiry: Diverse Tools, Shared Standards. Rowman & Littlefield/Berkeley Public Policy Press, Lanham, MD.
Campbell, D. T. (1988a). Factors relevant to the validity of experiments in social settings. In Methodology and Epistemology for Social Science: Selected Papers. Donald T. Campbell (E. Samuel Overman, ed.), pp. 151-166. University of Chicago Press, Chicago. [Original work published 1957]
Campbell, D. T. (1988b). Degrees of freedom and the case study. In Methodology and Epistemology for Social Science: Selected Papers. Donald T. Campbell (E. Samuel Overman, ed.), pp. 377-388. University of Chicago Press, Chicago. [Original work published 1975]
Campbell, D. T. (1999). Relabeling internal and external validity for applied social scientists. In Social Experimentation (D. T. Campbell and M. J. Russo, eds.), pp. 111-122. Sage, Thousand Oaks, CA. [Original work published 1986]
Collier, D. (1993). The comparative method. In Political Science: The State of the Discipline II (A. W. Finifter, ed.), pp. 105-119. American Political Science Association, Washington, DC.
Cronbach, L. J. (1982). Designing Evaluations of Educational and Social Programs. Jossey-Bass, San Francisco.
Freedman, D. A. (1991). Statistical analysis and shoe leather. Sociol. Methodol. 21, 291-313.
Goldthorpe, J. H. (2000). On Sociology: Numbers, Narratives, and the Integration of Research and Theory. Oxford University Press, Oxford, UK.
Good, P. I. (2000). Permutation Tests: A Practical Guide to Resampling Methods for Testing Hypotheses. Springer, Berlin.
Kagel, J. H., and Roth, A. E. (eds.) (1995). The Handbook of Experimental Economics. Princeton University Press, Princeton, NJ.
King, G., Keohane, R. O., and Verba, S. (1994). Designing Social Inquiry: Scientific Inference in Qualitative Research. Princeton University Press, Princeton, NJ.
Mahoney, J. (2000). Strategies of causal inference in small-N research. Sociol. Methods Res. 28(4), 387-424.
McDermott, R. (2002). Experimental methods in political science. Annu. Rev. Political Sci. 5, 31-61.
Oehlert, G. W. (2000). A First Course in Design and Analysis of Experiments. Freeman, New York.
Shadish, W. R., Cook, T. D., and Campbell, D. T. (2002). Experimental and Quasi-Experimental Designs for Generalized Causal Inference. Houghton Mifflin, Boston.
Smith, H. L. (1990). Specification problems in experimental and nonexperimental social research. Sociol. Methodol. 20, 59-91.
Tashakkori, A., and Teddlie, C. (1998). Mixed Methodology: Combining Qualitative and Quantitative Approaches. Sage, Thousand Oaks, CA.
Webb, E. J., Campbell, D. T., Schwartz, R. D., and Sechrest, L. (2000). Unobtrusive Measures, Revised Edition. Sage, Thousand Oaks, CA.
Western, B., and Jackman, S. (1994). Bayesian inference for comparative research. Am. Political Sci. Rev. 88(2), 412-423.