Research Sampling Techniques


CHAPTER 7 THE LOGIC OF SAMPLING


What You’ll Learn in This Chapter

We’ll see how social scientists can select a few people for study and discover things that apply to hundreds of millions of people not studied.
In this chapter . . .

Introduction
A Brief History of Sampling
  President Alf Landon
  President Thomas E. Dewey
  Two Types of Sampling Methods
Nonprobability Sampling
  Reliance on Available Subjects
  Purposive or Judgmental Sampling
  Snowball Sampling
  Quota Sampling
  Selecting Informants
The Theory and Logic of Probability Sampling
  Conscious and Unconscious Sampling Bias
  Representativeness and Probability of Selection
  Random Selection
  Probability Theory, Sampling Distributions, and Estimates of Sample Error
Populations and Sampling Frames
Types of Sampling Designs
  Simple Random Sampling
  Systematic Sampling
  Stratified Sampling
  Implicit Stratification in Systematic Sampling
  Illustration: Sampling University Students
  Sample Modification
Multistage Cluster Sampling
  Multistage Designs and Sampling Error
  Stratification in Multistage Cluster Sampling
  Probability Proportionate to Size (PPS) Sampling
  Disproportionate Sampling and Weighting
Probability Sampling in Review
The Ethics of Sampling

WHAT DO YOU THINK?

In 1936 the Literary Digest collected the voting intentions of two million voters in order to predict whether Franklin D. Roosevelt or Alf Landon would be elected president of the United States. During more recent election campaigns, with many more voters going to the polls, national polling firms have typically sampled around 2,000 voters across the country. Which technique do you think is the most effective? Why?

See the “What Do You Think? Revisited” box toward the end of the chapter.

INTRODUCTION

One of the most visible uses of survey sampling lies in the political polling that the election results subsequently test. Whereas some people doubt the accuracy of sample surveys, others complain that political polls take all the suspense out of campaigns by foretelling the result. In recent presidential elections, however, the polls have not removed the suspense.

Going into the 2004 presidential elections, pollsters generally agreed that the election was “too close to call,” a repeat of their experience four years earlier. Table 7-1 reports polls conducted during the few days preceding the election. Despite some variations, the overall picture they present is
amazingly consistent and pretty well matched the election results.

TABLE 7-1 Election Eve Polls Reporting Percentage of Population Voting for U.S. Presidential Candidates, 2004

Poll                     Date Begun   Bush   Kerry
Fox/OpinDynamics         Oct 28       50     50
TIPP                     Oct 28       53     47
CBS/NYT                  Oct 28       52     48
ARG                      Oct 28       50     50
ABC                      Oct 28       51     49
Fox/OpinDynamics         Oct 29       49     51
Gallup/CNN/USA           Oct 29       49     51
NBC/WSJ                  Oct 29       51     49
TIPP                     Oct 29       51     49
Harris                   Oct 29       52     48
Democracy Corps          Oct 29       49     51
Harris                   Oct 29       51     49
CBS                      Oct 29       51     49
Fox/OpinDynamics         Oct 30       49     52
TIPP                     Oct 30       51     49
Marist                   Oct 31       50     50
GWU Battleground 2004    Oct 31       52     48
Actual vote              Nov 2        52     48

Source: Poll data adapted from the Roper Center, Election 2004 (http://www.ropercenter.uconn.edu/elect_2004/pres_trial_heats.html). Accessed November 16, 2004. I’ve apportioned the undecided and other votes according to the percentages saying they were voting for Bush or Kerry.

Now, how many interviews do you suppose it took each of these pollsters to come within a couple of percentage points in estimating the behavior of more than 115 million voters? Often fewer than 2,000! In this chapter, we’re going to find out how social researchers can achieve such wizardry.

For another powerful illustration of the potency of sampling, look at this graphic portrayal of President George W. Bush’s approval ratings prior to and following the September 11, 2001, terrorist attack on the United States (see Figure 7-1). The data reported by several different polling agencies describe the same pattern.

FIGURE 7-1 Bush Approval: Raw Poll Data. [Graph: approval rating (percent, 40 to 100) plotted by date, 2001 to 2002, before and after the September 11th attack. Polls shown: ABC/Post, CBS, Harris, Ipsos-Reid, Pew, Bloomberg, Fox, IBD/CSM, NBC/WSJ, AmResGp, CNN/Time, Gallup, Zogby, Newsweek.] This graph demonstrates how independent polls produce the same picture of reality. This also shows the impact of a national crisis on the president’s popularity: in this case, the 9/11 terrorist attack and President George W. Bush’s popularity.
Source: Copyright © 2001, 2002 by drlimerick.com (http://www.pollkatz.homestead.com/files/MyHTML2.gif). All rights reserved.

Political polling, like other forms of social research, rests on observations. But neither pollsters nor other social researchers can observe everything that might be relevant to their interests. A critical part of social research, then, is deciding what to observe and what not. If you want to study voters, for example, which voters should you study?

The process of selecting observations is called sampling. Although sampling can mean any procedure for selecting units of observation (for example, interviewing every tenth passerby on a busy street), the key to generalizing from a sample to a larger population is probability sampling, which involves the important idea of random selection.

Much of this chapter is devoted to the logic and skills of probability sampling. This topic is more rigorous and precise than some of the other topics in this book. Whereas social research as a whole is both art and science, sampling leans toward science. Although this subject is somewhat technical, the basic logic of sampling is not difficult to understand. In fact, the logical neatness of this topic can make it easier to comprehend than, say, conceptualization.

Although probability sampling is central to social research today, we’ll also examine a variety of nonprobability methods. These methods have their own logic and can provide useful samples for social inquiry.

Before we discuss the two major types of sampling, I’ll introduce you to some basic ideas by way of a brief history of sampling. As you’ll see, the pollsters who correctly predicted the election cliffhangers of 2000 and 2004 did so in part because researchers had learned to avoid some pitfalls that earlier pollsters had fallen into.

A BRIEF HISTORY OF SAMPLING

Sampling in social research has developed hand in hand with political polling. This is the case, no doubt, because political polling is one of the few
opportunities social researchers have to discover the accuracy of their estimates. On election day, they find out how well or how poorly they did.

President Alf Landon

President Alf Landon? Who’s he? Did you sleep through an entire presidency in your U.S. history class? No, but Alf Landon would have been president if a famous poll conducted by the Literary Digest had proved to be accurate. The Literary Digest was a popular newsmagazine published between 1890 and 1938. In 1920 Digest editors mailed postcards to people in six states, asking them whom they were planning to vote for in the presidential campaign between Warren Harding and James Cox. Names were selected for the poll from telephone directories and automobile registration lists. Based on the postcards sent back, the Digest correctly predicted that Harding would be elected. In the elections that followed, the Literary Digest expanded the size of its poll and made correct predictions in 1924, 1928, and 1932.

In 1936 the Digest conducted its most ambitious poll: Ten million ballots were sent to people listed in telephone directories and on lists of automobile owners. Over two million people responded, giving the Republican contender, Alf Landon, a stunning 57 to 43 percent landslide over the incumbent, President Franklin Roosevelt. The editors modestly cautioned,

    We make no claim to infallibility. We did not coin the phrase “uncanny accuracy” which has been so freely applied to our Polls. We know only too well the limitations of every straw vote, however enormous the sample gathered, however scientific the method. It would be a miracle if every State of the forty-eight behaved on Election Day exactly as forecast by the Poll. (LITERARY DIGEST 1936a:6)

Two weeks later, the Digest editors knew the limitations of straw polls even better: The voters gave Roosevelt a second term in office by the largest landslide in history, with 61 percent of the vote. Landon won only 8 electoral votes to Roosevelt’s 523.

The editors were puzzled by their unfortunate turn of luck. Part of the problem surely lay in the 22 percent return rate garnered by the poll. The editors asked,

    Why did only one in five voters in Chicago to whom the Digest sent ballots take the trouble to reply? And why was there a preponderance of Republicans in the one-fifth that did reply? . . . We were getting better cooperation in what we have always regarded as a public service from Republicans than we were getting from Democrats. Do Republicans live nearer to mailboxes? Do Democrats generally disapprove of straw polls? (LITERARY DIGEST 1936b:7)

Actually, there was a better explanation: what is technically called the sampling frame used by the Digest. In this case the sampling frame consisted of telephone subscribers and automobile owners. In the context of 1936, this design selected a disproportionately wealthy sample of the voting population, especially coming on the tail end of the worst economic depression in the nation’s history. The sample effectively excluded poor people, and the poor voted predominantly for Roosevelt’s New Deal recovery program. The Digest’s poll may or may not have correctly represented the voting intentions of telephone subscribers and automobile owners. Unfortunately for the editors, it decidedly did not represent the voting intentions of the population as a whole.

You may be able to find the Literary Digest in your library. You can find traces of it by searching the web. As an alternative, go to http://www.eBay.com and see how many old issues are available for sale.*

*Each time the Internet icon appears, you’ll be given helpful leads for searching the World Wide Web.

President Thomas E. Dewey

The 1936 election also saw the emergence of a young pollster whose name would become synonymous with public opinion. In contrast to the Literary Digest, George Gallup correctly predicted that Roosevelt would beat Landon. Gallup’s success in 1936 hinged on his use of something called quota sampling, which we’ll examine later in the chapter. For now, it’s enough to know that quota sampling is based on a knowledge of the characteristics of the population being sampled: what proportion are men, what proportion are women, what proportions are of various incomes, ages, and so on. Quota sampling selects people to match a set of these characteristics: the right number of poor, white, rural men; the right number of rich, African American, urban women; and so on. The quotas are based on those variables most relevant to the study. In the case of Gallup’s poll, the sample selection was based on levels of income; the selection procedure ensured the right proportion of respondents at each income level.

Gallup and his American Institute of Public Opinion used quota sampling to good effect in 1936, 1940, and 1944, correctly picking the presidential winner each time. Then, in 1948, Gallup and most political pollsters suffered the embarrassment of picking Governor Thomas Dewey of New York over the incumbent, President Harry Truman. The pollsters’ embarrassing miscue continued right up to election night. A famous photograph shows a jubilant Truman (whose followers’ battle cry was “Give ’em hell, Harry!”) holding aloft a newspaper with the banner headline “Dewey Defeats Truman.”

Several factors accounted for the pollsters’ failure in 1948. First, most pollsters stopped polling in early October despite a steady trend toward Truman during the campaign. In addition, many voters were undecided throughout the campaign, and they went disproportionately for Truman when they stepped into the voting booth.

More important, Gallup’s failure rested on the unrepresentativeness of his samples. Quota sampling, which had been effective in earlier years, was Gallup’s undoing in 1948. This technique requires that the researcher know something about the total population (of voters in this instance). For national political polls, such information came primarily from census data. By 1948, however, World War II had produced a massive movement from the country to cities, radically changing the character of the U.S. population from what the 1940 census showed, and Gallup relied on 1940 census data. City dwellers, moreover, tended to vote Democratic; hence, the overrepresentation of rural voters in his poll had the effect of underestimating the number of Democratic votes.

Two Types of Sampling Methods

By 1948 some academic researchers had already been experimenting with a form of sampling based on probability theory. This technique involves the selection of a “random sample” from a list containing the names of everyone in the population being sampled. By and large, the probability-sampling methods used in 1948 were far more accurate than quota-sampling techniques.

Today, probability sampling remains the primary method of selecting large, representative samples for social research, including national political polls. At the same time, probability sampling can be impossible or inappropriate in many research situations. Accordingly, before turning to the logic and techniques of probability sampling, we’ll first take a look at techniques for nonprobability sampling and how they’re used in social research.

NONPROBABILITY SAMPLING

Social research is often conducted in situations that do not permit the kinds of probability samples used in large-scale social surveys. Suppose you wanted to study homelessness: There is no list of all homeless individuals, nor are you likely to create such a list. Moreover, as you’ll see, there are times when probability sampling would not be appropriate even if it were possible. Many such situations call for nonprobability sampling.

nonprobability sampling: Any technique in which samples are selected in some way not suggested by probability theory. Examples include reliance on available subjects as well as purposive (judgmental), snowball, and quota sampling.

In this section, we’ll examine four types of nonprobability sampling: reliance on available subjects, purposive or judgmental sampling, snowball sampling, and quota sampling. We’ll conclude with a brief discussion of techniques for obtaining information about social groups through the use of informants.

Reliance on Available Subjects

Relying on available subjects, such as stopping people at a street corner or some other location, is an extremely risky sampling method; even so, it’s used all too frequently. Clearly, this method does
not permit any control over the representativeness of a sample. It’s justified only if the researcher wants to study the characteristics of people passing the sampling point at specified times or if less risky sampling methods are not feasible. Even when this method is justified on grounds of feasibility, researchers must exercise great caution in generalizing from their data. Also, they should alert readers to the risks associated with this method.

University researchers frequently conduct surveys among the students enrolled in large lecture classes. The ease and frugality of this method explain its popularity, but it seldom produces data of any general value. It may be useful for pretesting a questionnaire, but such a sampling method should not be used for a study purportedly describing students as a whole.

Consider this report on the sampling design in an examination of knowledge and opinions about nutrition and cancer among medical students and family physicians:

    The fourth-year medical students of the University of Minnesota Medical School in Minneapolis comprised the student population in this study. The physician population consisted of all physicians attending a “Family Practice Review and Update” course sponsored by the University of Minnesota Department of Continuing Medical Education. (COOPER-STEPHENSON AND THEOLOGIDES 1981:472)

After all is said and done, what will the results of this study represent? They do not provide a meaningful comparison of medical students and family physicians in the United States or even in Minnesota. Who were the physicians who attended the course? We can guess that they were probably more concerned about their continuing education than were other physicians, but we can’t say for sure. Although such studies can provide useful insights, we must take care not to overgeneralize from them.

Purposive or Judgmental Sampling

Sometimes it’s appropriate to select a sample on the basis of knowledge of a population, its elements, and the purpose of the study. This type of sampling is called purposive sampling (or judgmental sampling). In the initial design of a questionnaire, for example, you might wish to select the widest variety of respondents to test the broad applicability of questions. Although the study findings would not represent any meaningful population, the test run might effectively uncover any peculiar defects in your questionnaire. This situation would be considered a pretest, however, rather than a final study.

purposive sampling: A type of nonprobability sampling in which the units to be observed are selected on the basis of the researcher’s judgment about which ones will be the most useful or representative. Also called judgmental sampling.

In some instances, you may wish to study a small subset of a larger population in which many members of the subset are easily identified, but the enumeration of them all would be nearly impossible. For example, you might want to study the leadership of a student protest movement; many of the leaders are visible, but it would not be feasible to define and sample all leaders. In studying all or a sample of the most visible leaders, you may collect data sufficient for your purposes.

Or let’s say you want to compare left-wing and right-wing students. Because you may not be able to enumerate and sample from all such students, you might decide to sample the memberships of left- and right-leaning groups, such as the Green Party and the Young Americans for Freedom. Although such a sample design would not provide a good description of either left-wing or right-wing students as a whole, it might suffice for general comparative purposes.

Field researchers are often particularly interested in studying deviant cases (cases that do not fit into patterns of mainstream attitudes and behaviors) in order to improve their understanding of the more usual pattern. For example, you might
gain important insights into the nature of school spirit, as exhibited at a pep rally, by interviewing people who did not appear to be caught up in the emotions of the crowd or by interviewing students who did not attend the rally at all. Selecting deviant cases for study is another example of purposive study.

Snowball Sampling

Another nonprobability-sampling technique, which some consider to be a form of accidental sampling, is called snowball sampling. This procedure is appropriate when the members of a special population are difficult to locate, such as homeless individuals, migrant workers, or undocumented immigrants. In snowball sampling, the researcher collects data on the few members of the target population he or she can locate, then asks those individuals to provide the information needed to locate other members of that population whom they happen to know. “Snowball” refers to the process of accumulation as each located subject suggests other subjects. Because this procedure also results in samples with questionable representativeness, it’s used primarily for exploratory purposes.

snowball sampling: A nonprobability-sampling method, often employed in field research, whereby each person interviewed may be asked to suggest additional people for interviewing.

Suppose you wish to learn a community organization’s pattern of recruitment over time. You might begin by interviewing fairly recent recruits, asking them who introduced them to the group. You might then interview the people named, asking them who introduced them to the group. You might then interview those people named, asking, in part, who introduced them. Or, in studying a loosely structured political group, you might ask one of the participants who he or she believes to be the most influential members of the group. You might interview those people and, in the course of the interviews, ask who they believe to be the most influential. In each of these examples, your sample would “snowball” as each of your interviewees suggested other people to interview.

In another example, Karen Farquharson (2005) provides a detailed discussion of how she used snowball sampling to discover a network of tobacco policy makers in Australia: both those at the core of the network and those on the periphery.

Quota Sampling

Quota sampling is the method that helped George Gallup avoid disaster in 1936, and set up the disaster of 1948. Like probability sampling, quota sampling addresses the issue of representativeness, although the two methods approach the issue quite differently.

quota sampling: A type of nonprobability sampling in which units are selected into a sample on the basis of prespecified characteristics, so that the total sample will have the same distribution of characteristics assumed to exist in the population being studied.

Quota sampling begins with a matrix, or table, describing the characteristics of the target population. Depending on your research purposes, you may need to know what proportion of the population is male and what proportion female as well as what proportions of each gender fall into various age categories, educational levels, ethnic groups, and so forth. In establishing a national quota sample, you might need to know what proportion of the national population is urban, Eastern, male, under 25, white, working class, and the like, and all the possible combinations of these attributes.

Once you’ve created such a matrix and assigned a relative proportion to each cell in the matrix, you proceed to collect data from people having all the characteristics of a given cell. You then assign to all the people in a given cell a weight appropriate to their portion of the total population. When all the sample elements are so weighted, the overall data should provide a reasonable representation of the total population.

Although quota sampling resembles probability sampling, it has several inherent problems. First,
the quota frame (the proportions that different cells represent) must be accurate, and it is often difficult to get up-to-date information for this purpose. The Gallup failure to predict Truman as the presidential victor in 1948 stemmed partly from this problem. Second, the selection of sample elements within a given cell may be biased even though its proportion of the population is accurately estimated. Instructed to interview five people who meet a given, complex set of characteristics, an interviewer may still avoid people living at the top of seven-story walk-ups, having particularly run-down homes, or owning vicious dogs.

In recent years, some researchers have attempted to combine probability and quota sampling methods, but the effectiveness of this effort remains to be seen. At present, you should treat quota sampling warily if your purpose is statistical description.

At the same time, the logic of quota sampling can sometimes be applied usefully to a field research project. In the study of a formal group, for example, you might wish to interview both leaders and nonleaders. In studying a student political organization, you might want to interview radical, moderate, and conservative members of that group. You may be able to achieve sufficient representativeness in such cases by using quota sampling to ensure that you interview both men and women, both younger and older people, and so forth.

Selecting Informants

When field research involves the researcher’s attempt to understand some social setting (a juvenile gang or local neighborhood, for example), much of that understanding will come from a collaboration with some members of the group being studied. Whereas social researchers speak of respondents as people who provide information about themselves, allowing the researcher to construct a composite picture of the group those respondents represent, an informant is a member of the group who can talk directly about the group per se.

informant: Someone well versed in the social phenomenon that you wish to study and who is willing to tell you what he or she knows about it. Not to be confused with a respondent.

Especially important to anthropologists, informants are important to other social researchers as well. If you wanted to learn about informal social networks in a local public housing project, for example, you would do well to locate individuals who could understand what you were looking for and help you find it.

When Jeffrey Johnson (1990) set out to study a salmon-fishing community in North Carolina, he used several criteria to evaluate potential informants. Did their positions allow them to interact regularly with other members of the camp, for example, or were they isolated? (He found that the carpenter had a wider range of interactions than did the boat captain.) Was their information about the camp limited to their specific jobs, or did it cover many aspects of the operation? These and other criteria helped determine how useful the potential informants might be.

Usually, you’ll want to select informants who are somewhat typical of the groups you’re studying. Otherwise, their observations and opinions may be misleading. Interviewing only physicians will not give you a well-rounded view of how a community medical clinic is working, for example. Along the same lines, an anthropologist who interviews only men in a society where women are sheltered from outsiders will get a biased view. Similarly, although informants fluent in English are convenient for English-speaking researchers from the United States, they do not typify the members of many societies or even many subgroups within English-speaking countries.

Simply because they’re the ones willing to work with outside investigators, informants will almost always be somewhat “marginal” or atypical within their group. Sometimes this is obvious. Other times, however, you’ll learn about their marginality only in the course of your research.
[Photograph by Earl Babbie. Caption: With so many possible informants, how can the researcher begin to choose?]

In Johnson’s study, the county agent identified one fisherman who seemed squarely in the mainstream of the community. Moreover, he was cooperative and helpful to Johnson’s research. The more Johnson worked with the fisherman, however, the more he found the man to be a marginal member of the fishing community.

    First, he was a Yankee in a southern town. Second, he had a pension from the Navy [so he was not seen as a “serious fisherman” by others in the community]. . . . Third, he was a major Republican activist in a mostly Democratic village. Finally, he kept his boat in an isolated anchorage, far from the community harbor. (JOHNSON 1990:56)

Informants’ marginality may not only bias the view you get but also limit their access (and hence yours) to the different sectors of the community you wish to study.

These comments should give you some sense of the concerns involved in nonprobability sampling, typically used in qualitative research projects. I conclude with the following injunction:

    Your overall goal is to collect the richest possible data. By rich data, we mean a wide and diverse range of information collected over a relatively prolonged period of time in a persistent and systematic manner. Ideally, such data enable you to grasp the meanings associated with the actions of those you are studying and to understand the contexts in which those actions are embedded. (LOFLAND ET AL. 2006:15)

In other words, nonprobability sampling does have its uses, particularly in qualitative research projects. But researchers must take care to acknowledge the limitations of nonprobability sampling, especially regarding accurate and precise representations of populations. This point will become clearer as we discuss the logic and techniques of probability sampling.

To see some practical implications of choosing and using informants, visit the website of Canada’s Community Adaptation and Sustainable Livelihoods (CASL) Program: http://www.iisd.ca/casl/CASLGuide/KeyInformEx.htm.

THE THEORY AND LOGIC OF PROBABILITY SAMPLING

Although appropriate to some research purposes, nonprobability-sampling methods cannot guarantee that the sample we observed is representative of the whole population. When researchers want precise, statistical descriptions of large populations (for example, the percentage of the population who are unemployed, plan to vote for Candidate X, or feel a rape victim should have the right to an abortion), they turn to probability sampling. All large-scale surveys use probability-sampling methods.

probability sampling: The general term for samples selected in accord with probability theory, typically involving some random-selection mechanism. Specific types of probability sampling include EPSEM, PPS, simple random sampling, and systematic sampling.

Although the application of probability sampling involves a somewhat sophisticated use of statistics, the basic logic of probability sampling is not difficult to understand. If all members of a
208 CHAPTER 7 THE LOGIC OF SAMPLING

population were identical in all respects (all demographic characteristics, attitudes, experiences, behaviors, and so on), there would be no need for careful sampling procedures. In this extreme case of perfect homogeneity, in fact, any single case would suffice as a sample to study characteristics of the whole population.

In fact, of course, the human beings who compose any real population are quite heterogeneous, varying in many ways. Figure 7-2 offers a simplified illustration of a heterogeneous population: The 100 members of this small population differ by gender and race. We’ll use this hypothetical micropopulation to illustrate various aspects of probability sampling.

[Figure 7-2: bar chart of number of people by group: white women 44, African American women 6, white men 44, African American men 6.]

FIGURE 7-2 A Population of 100 Folks. Typically, sampling aims to reflect the characteristics and dynamics of large populations. For the purpose of some simple illustrations, let’s assume our total population has only 100 members.

The fundamental idea behind probability sampling is this: In order to provide useful descriptions of the total population, a sample of individuals from a population must contain essentially the same variations that exist in the population. This isn’t as simple as it might seem, however. Let’s take a minute to look at some of the ways researchers might go astray. Then, we’ll see how probability sampling provides an efficient method for selecting a sample that should adequately reflect variations that exist in the population.

Conscious and Unconscious Sampling Bias

At first glance, it may look as though sampling is pretty straightforward. To select a sample of 100 university students, you might simply interview the first 100 students you find walking around campus. Although untrained researchers often use this kind of sampling method, it runs a high risk of introducing biases into the samples.

In connection with sampling, bias simply means that those selected are not typical or representative of the larger populations they have been chosen from. This kind of bias does not have to be intentional. In fact, it’s virtually inevitable when you pick people by the seat of your pants.

Figure 7-3 illustrates what can happen when researchers simply select people who are convenient for study. Although women make up 50 percent of our micropopulation, the people closest to the researcher (in the lower right corner) happen to be 70 percent women, and although the population is 12 percent African American, none were selected into the sample.

Beyond the risks inherent in simply studying people who are convenient, other problems can arise. To begin, the researcher’s personal leanings may affect the sample to the point where it does not truly represent the student population. Suppose you’re a little intimidated by students who look particularly “cool,” feeling they might ridicule your research effort. You might consciously or unconsciously avoid interviewing such people. Or, you might feel that the attitudes of “super-straight-looking” students would be irrelevant to your research purposes and so avoid interviewing them.

Even if you sought to interview a “balanced” group of students, you wouldn’t know the exact proportions of different types of students making up such a balance, and you wouldn’t always be able to identify the different types just by watching them walk by.

[Figure 7-3: the micropopulation of 100 people with the convenience sample marked. Legend: White, African American, Men, Women, The Sample.]

FIGURE 7-3 A Sample of Convenience: Easy, but Not Representative. Selecting and observing those people who are most readily at hand is the simplest method, perhaps, but it’s unlikely to provide a sample that accurately reflects the total population.

Further, even if you made a conscientious effort to interview, say, every tenth student entering the university library, you could not be sure of a representative sample, because different types of students visit the library with different frequencies. Your sample would overrepresent students who visit the library more often than do others.

Similarly, the “public opinion” call-in polls, in which radio stations or newspapers ask people to call specified telephone numbers to register their opinions, cannot be trusted to represent general populations. At the very least, not everyone in the population will even be aware of the poll. This problem also invalidates polls by magazines and newspapers that publish coupons for readers to complete and mail in. Even among those who are aware of such polls, not all will express an opinion, especially if doing so will cost them a stamp, an envelope, or a telephone charge. Similar considerations apply to polls taken over the Internet.

Ironically, the failure of such polls to represent all opinions equally was inadvertently acknowledged by Phillip Perinelli (1986), a staff manager of AT&T Communications’ DIAL-IT 900 Service, which offers a call-in poll facility to organizations. Perinelli attempted to counter criticisms by saying, “The 50-cent charge assures that only interested parties respond and helps assure also that no individual ‘stuffs’ the ballot box.” We cannot determine general public opinion while considering “only interested parties.” This excludes those who don’t care 50-cents’ worth, as well as those who recognize that such polls are not valid. Both types of people may have opinions and may even vote on election day. Perinelli’s assertion that the 50-cent charge will prevent ballot stuffing actually means that only those who can afford it will engage in ballot stuffing.

The possibilities for inadvertent sampling bias are endless and not always obvious. Fortunately, several techniques can help us avoid bias.
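The library example above can be made concrete with a little arithmetic. The sketch below is only an illustration: the two groups of students and their visit frequencies are invented numbers, and the function name `share_of_visits` is hypothetical. It shows that interviewing every tenth person who enters the library samples visits rather than students, so each group's share of the interviews tracks its share of visits.

```python
# Hypothetical illustration: sampling people as they enter the library samples
# *visits*, not *students*, so frequent visitors are overrepresented.
# All numbers below are assumptions made up for this example.

def share_of_visits(groups):
    """Return each group's share of all library visits.

    groups maps a group name to (share_of_students, visits_per_week).
    """
    total_visits = sum(share * visits for share, visits in groups.values())
    return {
        name: (share * visits) / total_visits
        for name, (share, visits) in groups.items()
    }

# Assumed population: 80% of students visit once a week, 20% visit five times.
groups = {"occasional": (0.80, 1), "frequent": (0.20, 5)}

for name, visit_share in share_of_visits(groups).items():
    student_share = groups[name][0]
    print(f"{name}: {student_share:.0%} of students, "
          f"{visit_share:.0%} of library entries")
```

Under these assumed numbers, the frequent visitors make up a much larger share of library entries than of the student body, which is the overrepresentation the text describes.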

Representativeness and Probability of Selection

Although the term representativeness has no precise, scientific meaning, it carries a commonsense meaning that makes it useful here. For our purpose, a sample is representative of the population from which it is selected if the aggregate characteristics of the sample closely approximate those same aggregate characteristics in the population. If, for example, the population contains 50 percent women, then a sample must contain “close to” 50 percent women to be representative. Later, we’ll discuss “how close” in detail. See the box “Representative Sampling” for more on this.

[Box: IN THE REAL WORLD: REPRESENTATIVE SAMPLING]

Representativeness applies to many areas of life, not just survey sampling. Consider quality control, for example. Imagine running a company that makes light bulbs. You want to be sure that they actually light up, but you can’t test them all. You could, however, devise a method of selecting a sample of bulbs drawn from different times in the production day, on different machines, in different factories, and so forth.

Sometimes the concept of representative sampling serves as a protection against overgeneralization, discussed in Chapter 1. Suppose you go to a particular restaurant and don’t like the food or service. You’re ready to cross it off your list of dining possibilities, but then you think about it: perhaps you hit them on a bad night. Perhaps the chef had just discovered her boyfriend in bed with that “witch” from the Saturday wait staff and her mind wasn’t on her cooking. Or perhaps the “witch” was serving your table and kept looking over her shoulder to see if anyone with a meat cleaver was bursting out of the kitchen. In short, your first experience might not have been representative.

[End of box]

Note that samples need not be representative in all respects; representativeness concerns only those characteristics that are relevant to the substantive interests of the study. However, you may not know in advance which characteristics are relevant.

A basic principle of probability sampling is that a sample will be representative of the population from which it is selected if all members of the population have an equal chance of being selected in the sample. (We’ll see shortly that the size of the sample selected also affects the degree of representativeness.) Samples that have this quality are often labeled EPSEM samples (EPSEM stands for “equal probability of selection method”). Later we’ll discuss variations of this principle, which forms the basis of probability sampling.

representativeness: That quality of a sample of having the same distribution of characteristics as the population from which it was selected. By implication, descriptions and explanations derived from an analysis of the sample may be assumed to represent similar ones in the population. Representativeness is enhanced by probability sampling and provides for generalizability and the use of inferential statistics.

EPSEM (equal probability of selection method): A sample design in which each member of a population has the same chance of being selected into the sample.

Moving beyond this basic principle, we must realize that samples, even carefully selected EPSEM samples, seldom if ever perfectly represent the populations from which they are drawn. Nevertheless, probability sampling offers two special advantages.

First, probability samples, although never perfectly representative, are typically more representative than other types of samples, because the biases previously discussed are avoided. In practice, a probability sample is more likely than a nonprobability sample to be representative of the population from which it is drawn.

Second, and more important, probability theory permits us to estimate the accuracy or representativeness of the sample. Conceivably, an uninformed researcher might, through wholly haphazard means, select a sample that nearly perfectly represents the larger population. The odds are against doing so, however, and we would be unable to estimate the likelihood that he or she has achieved representativeness. The probability sampler, on the other hand, can provide an accurate estimate of success or failure. Shortly we’ll see exactly how this estimate can be achieved.

I’ve said that probability sampling ensures that samples are representative of the population we wish to study. As we’ll see in a moment, probability sampling rests on the use of a random-selection procedure. To develop this idea, though, we need to give more-precise meaning to two important terms: element and population.*

*I would like to acknowledge a debt to Leslie Kish and his excellent textbook Survey Sampling. Although I’ve modified some of the conventions used by Kish, his presentation is easily the most important source of this discussion.

An element is that unit about which information is collected and that provides the basis of analysis. Typically, in survey research, elements are people or certain types of people. However, other kinds of units can constitute the elements of social research: Families, social clubs, or corporations might be the elements of a study. In a given study, elements are often the same as units of analysis, though the former are used in sample selection and the latter in data analysis.

Up to now we’ve used the term population to mean the group or collection that we’re interested in generalizing about. More formally, a population is the theoretically specified aggregation of study elements. Whereas the vague term Americans might be the target for a study, the delineation of the population would include the definition of the element Americans (for example, citizenship, residence) and the time referent for the study (Americans as of when?). Translating the abstract “adult New Yorkers” into a workable population would require a specification of the age defining adult and the boundaries of New York. Specifying the term college student would include a consideration of full- and part-time students, degree candidates and nondegree candidates, undergraduate and graduate students, and so forth.

A study population is that aggregation of elements from which the sample is actually selected. As a practical matter, researchers are seldom in a position to guarantee that every element meeting the theoretical definitions laid down actually has a chance of being selected in the sample. Even where lists of elements exist for sampling purposes, the lists are usually somewhat incomplete. Some students are always inadvertently omitted from student rosters. Some telephone subscribers have unlisted numbers.

Often, researchers decide to limit their study populations more severely than indicated in the preceding examples. National polling firms may limit their national samples to the 48 adjacent states, omitting Alaska and Hawaii for practical reasons. A researcher wishing to sample psychology professors may limit the study population to those in psychology departments, omitting those in other departments. Whenever the population under examination is altered in such fashion, you must make the revisions clear to your readers.

element: That unit of which a population is composed and which is selected in a sample. Distinguished from units of analysis, which are used in data analysis.

population: The theoretically specified aggregation of the elements in a study.

study population: That aggregation of elements from which a sample is actually selected.

Random Selection

With these definitions in hand, we can define the ultimate purpose of sampling: to select a set of elements from a population in such a way that

descriptions of those elements accurately portray the total population from which the elements are selected. Probability sampling enhances the likelihood of accomplishing this aim and also provides methods for estimating the degree of probable success.

Random selection is the key to this process. In random selection, each element has an equal chance of selection independent of any other event in the selection process. Flipping a coin is the most frequently cited example: Provided that the coin is perfect (that is, not biased in terms of coming up heads or tails), the “selection” of a head or a tail is independent of previous selections of heads or tails. No matter how many heads turn up in a row, the chance that the next flip will produce “heads” is exactly 50–50. Rolling a perfect set of dice is another example.

Such images of random selection, though useful, seldom apply directly to sampling methods in social research. More typically, social researchers use tables of random numbers or computer programs that provide a random selection of sampling units. A sampling unit is that element or set of elements considered for selection in some stage of sampling. In Chapter 9, on survey research, we’ll see how computers are used to select random telephone numbers for interviewing, a technique called random-digit dialing.

There are two reasons for using random-selection methods. First, this procedure serves as a check on conscious or unconscious bias on the part of the researcher. The researcher who selects cases on an intuitive basis might very well select cases that would support his or her research expectations or hypotheses. Random selection erases this danger. Second, and more important, random selection offers access to the body of probability theory, which provides the basis for estimating the characteristics of the population as well as estimates of the accuracy of samples. Let’s now examine probability theory in greater detail.

Probability Theory, Sampling Distributions, and Estimates of Sample Error

Probability theory is a branch of mathematics that provides the tools researchers need to devise sampling techniques that produce representative samples and to statistically analyze the results of their sampling. More formally, probability theory provides the basis for estimating the parameters of a population. A parameter is the summary description of a given variable in a population. The mean income of all families in a city is a parameter; so is the age distribution of the city’s population. When researchers generalize from a sample, they’re using sample observations to estimate population parameters. Probability theory enables them both to make these estimates and to arrive at a judgment of how likely the estimates are to accurately represent the actual parameters in the population. So, for example, probability theory allows pollsters to infer from a sample of 2,000 voters how a population of 100 million voters is likely to vote, and to specify exactly what the probable margin of error in the estimates is.

Probability theory accomplishes these seemingly magical feats by way of the concept of sampling distributions. A single sample selected from a population will give an estimate of the population parameter. Other samples would give the same or slightly different estimates. Probability theory tells us about the distribution of estimates that would be produced by a large number of such samples. The logic of sampling error can be applied to different kinds of measurements: mean income or mean age, for example. Measurements expressed as percentages, however, provide the simplest introduction to this general concept.

random selection: A sampling method in which each element has an equal chance of selection independent of any other event in the selection process.

sampling unit: That element or set of elements considered for selection in some stage of sampling.

parameter: The summary description of a given variable in a population.
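The computer-based random selection mentioned in this section can be sketched in a few lines of Python. This is a minimal illustration, not a description of any particular survey package: the function name `simple_random_sample` and the numbered frame are assumptions made for the example. It relies on the standard library's `random.Random.sample`, which draws distinct elements so that every element of the frame has the same chance of ending up in the sample.

```python
import random

def simple_random_sample(sampling_frame, n, seed=None):
    """Select n distinct elements so that every element of the frame
    has the same chance of selection (an EPSEM design)."""
    rng = random.Random(seed)   # the seed only makes the example repeatable
    return rng.sample(sampling_frame, n)

# Hypothetical sampling frame: ID numbers 1..20000 standing in for a list
# of elements (for example, a roster).
frame = list(range(1, 20001))
sample = simple_random_sample(frame, 100, seed=7)

print(len(sample), "elements selected; five lowest IDs:", sorted(sample)[:5])
```

Because the computer, not the researcher, decides which IDs are drawn, the researcher's conscious or unconscious preferences cannot influence who is selected.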

[Photo: Earl Babbie] How would researchers conduct a random sample of this subdivision? What are the pitfalls they would need to avoid?

To see how this works, we’ll look at two examples of sampling distributions, beginning with a simple example in which our population consists of just ten cases.

The Sampling Distribution of Ten Cases

Suppose there are ten people in a group, and each has a certain amount of money in his or her pocket. To simplify, let’s assume that one person has no money, another has one dollar, another has two dollars, and so forth up to the person with nine dollars. Figure 7-4 presents the population of ten people.*

*I want to thank Hanan Selvin for suggesting this method of introducing probability sampling.

Our task is to determine the average amount of money one person has: specifically, the mean number of dollars. If you simply add up the money shown in Figure 7-4, you’ll find that the total is $45, so the mean is $4.50. Our purpose in the rest of this exercise is to estimate that mean without actually observing all ten individuals. We’ll do that by selecting random samples from the population and using the means of those samples to estimate the mean of the whole population.

To start, suppose we were to select, at random, a sample of only one person from the ten. Our ten possible samples thus consist of the ten cases shown in Figure 7-4.

The ten dots shown on the graph in Figure 7-5 represent these ten samples. Because we’re taking samples of only one, they also represent the “means” we would get as estimates of the population. The distribution of the dots on the graph is called the sampling distribution. Obviously, it wouldn’t be a very good idea to select a sample of only one, because we’ll very likely miss the true mean of $4.50 by quite a bit.

Now suppose we take a sample of two. As shown in Figure 7-6, increasing the sample size improves our estimations. There are now 45 possible samples: [$0 $1], [$0 $2], . . . [$7 $8], [$8 $9]. Moreover, some of those samples produce the same means. For example, [$0 $6], [$1 $5], and [$2 $4] all produce means of $3. In Figure 7-6, the three dots shown above the $3 mean represent those three samples.

Moreover, the 45 samples are not evenly distributed, as they were when the sample size was only one. Rather, they cluster somewhat around the true value of $4.50. Only two possible samples deviate by as much as $4 from the true value ([$0 $1] and [$8 $9]), whereas five of the samples give the true estimate of $4.50; another eight samples miss the mark by only 50 cents (plus or minus).

Now suppose we select even larger samples. What do you suppose that will do to our estimates of the mean? Figure 7-7 presents the sampling distributions of samples of 3, 4, 5, and 6.

The progression of sampling distributions is clear. Every increase in sample size improves the distribution of estimates of the mean. The limiting case in this procedure, of course, is to select a sample of ten. There would be only one possible sample (everyone) and it would give us the true mean of $4.50. As we’ll see shortly, this principle applies to actual sampling of meaningful populations. The larger the sample selected, the more accurate it is, as an estimation of the population from which it was drawn.
FIGURE 7-4 A Population of 10 People with $0–$9. Let’s imagine a population of only 10 people with
differing amounts of money in their pockets—ranging from $0 to $9.

[Figures 7-5 and 7-6: dot plots of the number of samples (vertical axis, 0 to 10) yielding each estimate of the mean (horizontal axis, $0 to $9), with the true mean of $4.50 marked. Figure 7-5: sample size = 1, total = 10 samples. Figure 7-6: sample size = 2, total = 45 samples.]

FIGURE 7-5 The Sampling Distribution of Samples of 1. In this simple example the mean amount of money these people have is $4.50 ($45/10). If we picked 10 different samples of 1 person each, our “estimates” of the mean would range all across the board.

FIGURE 7-6 The Sampling Distribution of Samples of 2. After merely increasing our sample size to 2, the possible samples provide somewhat better estimates of the mean. We couldn’t get either $0 or $9, and the estimates are beginning to cluster around the true value of the mean: $4.50.
[Figure 7-7: four dot plots of the number of samples (vertical axis, 0 to 20) yielding each estimate of the mean (horizontal axis, $0 to $9), with the true mean of $4.50 marked. a. Samples of 3 (total = 120). b. Samples of 4 (total = 210). c. Samples of 5 (total = 252). d. Samples of 6 (total = 210).]

FIGURE 7-7 The Sampling Distributions of Samples of 3, 4, 5, and 6. As we increase the sample size,
the possible samples cluster ever more tightly around the true value of the mean. The chance of extremely
inaccurate estimates is reduced at the two ends of the distribution, and the percentage of the samples near
the true value keeps increasing.
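The sampling distributions in the ten-person example are small enough to reproduce by listing every possible sample. The following is a sketch under the example's own assumptions (ten people holding $0 through $9); the function name `sampling_distribution` is introduced here for illustration. It uses `itertools.combinations` to enumerate all samples of a given size and `fractions.Fraction` to keep the means exact.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

population = range(10)   # dollars held by each of the ten people: $0..$9
true_mean = Fraction(sum(population), len(population))   # 45/10, i.e. $4.50

def sampling_distribution(sample_size):
    """Count how many possible samples of the given size yield each mean."""
    means = (Fraction(sum(s), sample_size)
             for s in combinations(population, sample_size))
    return Counter(means)

for size in range(1, 7):
    dist = sampling_distribution(size)
    total = sum(dist.values())
    # Share of possible samples whose mean lies within $1 of the true mean.
    near = sum(count for mean, count in dist.items()
               if abs(mean - true_mean) <= 1)
    print(f"sample size {size}: {total} possible samples, "
          f"{near / total:.0%} within $1 of $4.50")
```

For samples of two, the counts can be checked against the text directly: `sampling_distribution(2)` should contain 45 samples in all, five of them with a mean of exactly $4.50.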

Sampling Distribution and Estimates of Sampling Error

Let’s turn now to a more realistic sampling situation involving a much larger population and see how the notion of sampling distribution applies. Assume that we wish to study the student population of State University (SU) to determine the percentage of students who approve or disapprove of a student conduct code proposed by the administration. The study population will be the aggregation of, say, 20,000 students contained in a student roster: the sampling frame. The elements will be the individual students at SU. We’ll select a random sample of, say, 100 students for the purposes of estimating attitudes in the entire student body. The variable under consideration will be attitudes toward the code, a binomial variable comprising the attributes approve and disapprove. (The logic of probability sampling applies to the examination of other types of variables, such as mean income, but the computations are somewhat more complicated. Consequently, this introduction focuses on binomials.)

The horizontal axis of Figure 7-8 presents all possible values of this parameter in the population, from 0 percent to 100 percent approval. The midpoint of the axis (50 percent) represents half the students approving of the code and the other half disapproving.

[Figure 7-8: horizontal axis from 0 to 100, labeled “Percent of students approving of the student code.”]

FIGURE 7-8 Range of Possible Sample Study Results. Shifting to a more realistic example, let’s assume that we want to sample student attitudes concerning a proposed conduct code. Let’s assume that 50 percent of the whole student body approves and 50 percent disapproves, though the researcher doesn’t know that.

To choose our sample, we give each student on the student roster a number and select 100 random numbers from a table of random numbers. Then we interview the 100 students whose numbers have been selected and ask whether they approve or disapprove of the student code. Suppose this operation gives us 48 students who approve of the code and 52 who disapprove. This summary description of a variable in a sample is called a statistic. We present this statistic by placing a dot on the x axis at the point representing 48 percent.

Now let’s suppose we select another sample of 100 students in exactly the same fashion and measure their approval or disapproval of the student code. Perhaps 51 students in the second sample approve of the code. We place another dot in the appropriate place on the x axis. Repeating this process once more, we may discover that 52 students in the third sample approve of the code.

[Figure 7-9: the same axis (percent of students approving of the student code, 0 to 100) with three dots: Sample 1 (48%), Sample 2 (51%), Sample 3 (52%).]

FIGURE 7-9 Results Produced by Three Hypothetical Studies. Assuming a large student body, let’s suppose that we selected three different samples, each of substantial size. We would not necessarily expect those samples to perfectly reflect attitudes in the whole student body, but they should come reasonably close.

Figure 7-9 presents the three different sample statistics representing the percentages of students in each of the three random samples who approved of the student code. The basic rule of random sampling is that such samples, drawn from a population, give estimates of the parameter that exists in the total population. Each of the random samples, then, gives us an estimate of the percentage of students in the total student body who approve of the student code. Unhappily, however, we have selected three samples and now have three separate estimates.

statistic: The summary description of a variable in a sample, used to estimate a population parameter.
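The three-sample thought experiment can be imitated with a short simulation. What follows is a sketch under the text's assumptions (a roster of 20,000 students, exactly half of whom approve, and samples of 100); the roster representation, the seed, and the function name `percent_approving` are choices made for this example, and the percentages it prints will not match the 48, 51, and 52 used in the text.

```python
import random

rng = random.Random(2024)   # fixed seed so the sketch is repeatable

# Hypothetical roster of 20,000 students; True means "approves of the code".
# Exactly half approve, although a real researcher would not know this.
roster = [True] * 10_000 + [False] * 10_000

def percent_approving(sample_size):
    """Draw one simple random sample and return its statistic:
    the percentage of sampled students who approve."""
    sample = rng.sample(roster, sample_size)
    return 100 * sum(sample) / sample_size

for i in range(1, 4):
    print(f"Sample {i}: {percent_approving(100):.0f}% approve")
```

Each call draws a fresh sample, so the three statistics will usually differ from one another while staying in the general neighborhood of the parameter.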

[Figure 7-10: dot plot of many sample statistics. Horizontal axis: percent of students approving of the student code (0 to 100). Vertical axis: number of samples.]

FIGURE 7-10 The Sampling Distribution. If we were to select a large number of good samples, we would expect them to cluster around the true value (50 percent), but given enough such samples, a few would fall far from the mark.

To retrieve ourselves from this problem, let’s draw more and more samples of 100 students each, question each of the samples concerning their approval or disapproval of the code, and plot the new sample statistics on our summary graph. In drawing many such samples, we discover that some of the new samples provide duplicate estimates, as in the illustration of ten cases. Figure 7-10 shows the sampling distribution of, say, hundreds of samples. This is often referred to as a normal curve.

Note that by increasing the number of samples selected and interviewed, we have also increased the range of estimates provided by the sampling operation. In one sense we have increased our dilemma in attempting to guess the parameter in the population. Probability theory, however, provides certain important rules regarding the sampling distribution presented in Figure 7-10.

First, if many independent random samples are selected from a population, the sample statistics provided by those samples will be distributed around the population parameter in a known way. Thus, although Figure 7-10 shows a wide range of estimates, more of them fall near 50 percent than elsewhere in the graph. Probability theory tells us, then, that the true value is in the vicinity of 50 percent.

Second, probability theory gives us a formula for estimating how closely the sample statistics are clustered around the true value. To put it another way, probability theory enables us to estimate the sampling error: the degree of error to be expected for a given sample design. This formula contains three factors: the parameter, the sample size, and the standard error (a measure of sampling error):

$$ s = \sqrt{\frac{P \times Q}{n}} $$

The symbols P and Q in the formula equal the population parameters for the binomial: If 60 percent of the student body approve of the code and 40 percent disapprove, P and Q are 60 percent and 40 percent, respectively, or 0.6 and 0.4. Note that Q = 1 - P and P = 1 - Q. The symbol n equals the number of cases in each sample, and s is the standard error.

Let’s assume that the population parameter in the student example is 50 percent approving of the code and 50 percent disapproving. Recall that we’ve been selecting samples of 100 cases each. When these numbers are put into the formula,

sampling error: The degree of error to be expected in probability sampling. The formula for determining sampling error contains three factors: the parameter, the sample size, and the standard error.
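Putting numbers into the standard-error formula is a one-line calculation. The snippet below is a minimal sketch: the function name `standard_error` is introduced here for illustration, and the (P, n) pairs are simply values that can be plugged in, with P = 0.5 and n = 100 being the student-code example.

```python
import math

def standard_error(p, n):
    """Standard error of a sample proportion: s = sqrt(P * Q / n),
    where Q = 1 - P and n is the number of cases in each sample."""
    q = 1 - p
    return math.sqrt(p * q / n)

# (P, n) pairs to evaluate; the first is the student-code example.
for p, n in [(0.5, 100), (0.6, 100), (0.5, 400)]:
    print(f"P = {p}, n = {n}: standard error = {standard_error(p, n):.4f}")
```

Comparing the rows shows how the result responds to each factor: moving P away from an even split or enlarging the sample both shrink the standard error.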

we find that the standard error equals 0.05, or ment, we note that the standard error will increase
5 percent. as a function of an increase in the quantity P times
In probability theory, the standard error is a Q. Note further that this quantity reaches its
valuable piece of information because it indicates maximum in the situation of an even split in the
the extent to which the sample estimates will be population. If P = 0.5, PQ = 0.25; if P = 0.6, PQ = 0.24;
distributed around the population parameter. (If if P = 0.8, PQ = 0.16; if P = 0.99, PQ = 0.0099. By
you’re familiar with the standard deviation in sta- extension, if P is either 0.0 or 1.0 (either 0 percent
tistics, you may recognize that the standard error, or 100 percent approve of the student code), the
in this case, is the standard deviation of the sam- standard error will be 0. If everyone in the popula-
pling distribution.) Specifically, probability theory tion has the same attitude (no variation), then ev-
indicates that certain proportions of the sample es- ery sample will give exactly that estimate.
timates will fall within specified increments—each The standard error is also a function of the
equal to one standard error—from the population sample size—an inverse function. As the sample
parameter. Approximately 34 percent (0.3413) size increases, the standard error decreases. As
of the sample estimates will fall within one stan- the sample size increases, more and more sam-
dard error increment above the population pa- ples will be clustered nearer to the true value. An-
rameter, and another 34 percent will fall within other general guideline is evident in the formula:
one standard error below the parameter. In our Because of the square root formula, the standard
example, the standard error increment is 5 per- error is reduced by half if the sample size is qua-
cent, so we know that 34 percent of our samples drupled. In our present example, samples of 100
will give estimates of student approval between produce a standard error of 5 percent; to reduce
50 percent (the parameter) and 55 percent (one the standard error to 2.5 percent, we must increase
standard error above); another 34 percent of the the sample size to 400.
samples will give estimates between 50 percent All of this information is provided by established
and 45 percent (one standard error below the probability theory in reference to the selection of
parameter). Taken together, then, we know that large numbers of random samples. (If you’ve taken
roughly two-thirds (68 percent) of the samples will a statistics course, you may know this as the Cen-
give estimates within 5 percentage points of the tral Tendency Theorem.) If the population param-
parameter. eter is known and many random samples are se-
Moreover, probability theory dictates that lected, we can predict how many of the sampling
roughly 95 percent of the samples will fall within estimates will fall within specified intervals from
plus or minus two standard errors of the true value, the parameter.
and 99.9 percent of the samples will fall within plus Recognize that this discussion illustrates only
or minus three standard errors. In our present ex- the logic of probability sampling; it does not de-
ample, then, we know that only one sample out of scribe the way research is actually conducted.
a thousand would give an estimate lower than 35 Usually, we don’t know the parameter: The very
percent approval or higher than 65 percent. reason we conduct a sample survey is to estimate
The proportion of samples falling within one, that value. Moreover, we don’t actually select large
two, or three standard errors of the parameter numbers of samples: We select only one sample.
is constant for any random sampling procedure Nevertheless, the preceding discussion of prob-
such as the one just described, providing that a ability theory provides the basis for inferences
large number of samples are selected. The size about the typical social research situation. Know-
of the standard error in any given case, however, ing what it would be like to select thousands of
is a function of the population parameter and the samples allows us to make assumptions about the
sample size. If we return to the formula for a mo- one sample we do select and study.
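The arithmetic in this discussion can be checked with a few lines of code. The following is a minimal sketch, assuming the standard error of a percentage is the square root of P times Q divided by the sample size n (with Q = 1 - P), which is consistent with the figures quoted above. The function names are illustrative assumptions, not part of any library.

```python
import math

def standard_error(p, n):
    """Standard error of a sample percentage: sqrt(P * Q / n), where Q = 1 - P."""
    return math.sqrt(p * (1.0 - p) / n)

def sample_size_for(p, target_se):
    """Sample size needed to bring the standard error down to target_se."""
    n = p * (1.0 - p) / target_se ** 2
    return math.ceil(n - 1e-9)  # small tolerance guards against float round-off

def band(p, n, k):
    """Range lying within k standard errors of the parameter p."""
    se = standard_error(p, n)
    return (p - k * se, p + k * se)

# P * Q peaks at an even split and shrinks toward zero at the extremes.
for p in (0.5, 0.6, 0.8, 0.99, 1.0):
    print(f"P = {p:.2f}  PQ = {p * (1 - p):.4f}  SE (n = 100) = {standard_error(p, 100):.4f}")

# Quadrupling the sample size halves the standard error.
print(standard_error(0.5, 100), standard_error(0.5, 400))

# Bands of one, two, and three standard errors around a parameter of 50 percent.
for k in (1, 2, 3):
    low, high = band(0.5, 100, k)
    print(f"within {k} SE: {low:.2f} to {high:.2f}")

# Sample size required for a standard error of 2.5 percent at an even split.
print(sample_size_for(0.5, 0.025))
```

Solving the same formula for n, as sample_size_for does, is what lets a researcher work backward from a tolerable standard error to a required sample size.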

Confidence Levels and Confidence Intervals

Whereas probability theory specifies that 68 percent of that fictitious large number of samples would produce estimates falling within one standard error of the parameter, we can turn the logic around and infer that any single random sample has a 68 percent chance of falling within that range. This observation leads us to the two key components of sampling-error estimates: confidence level and confidence interval. We express the accuracy of our sample statistics in terms of a level of confidence that the statistics fall within a specified interval from the parameter. For example, we may say we are 95 percent confident that our sample statistics (for example, 50 percent favor the new student code) are within plus or minus 5 percentage points of the population parameter. As the confidence interval is expanded for a given statistic, our confidence increases. For example, we may say that we are 99.7 percent confident that our statistic falls within three standard errors of the true value.

Although we may be confident (at some level) of being within a certain range of the parameter, we’ve already noted that we seldom know what the parameter is. To resolve this problem, we substitute our sample estimate for the parameter in the formula; that is, lacking the true value, we substitute the best available guess.

The result of these inferences and estimations is that we can estimate a population parameter and also the expected degree of error on the basis of one sample drawn from a population. Beginning with the question “What percentage of the student body approves of the student code?” you could select a random sample of 100 students and interview them. You might then report that your best estimate is that 50 percent of the student body approves of the code and that you are 95 percent confident that between 40 and 60 percent (plus or minus two standard errors) approve. The range from 40 to 60 percent is the confidence interval. (At the 68 percent confidence level, the confidence interval would be 45–55 percent.)

confidence level: The estimated probability that a population parameter lies within a given confidence interval. Thus, we might be 95 percent confident that between 35 and 45 percent of all voters favor Candidate A.

confidence interval: The range of values within which a population parameter is estimated to lie.

The logic of confidence levels and confidence intervals also provides the basis for determining the appropriate sample size for a study. Once you’ve decided on the degree of sampling error you can tolerate, you’ll be able to calculate the number of cases needed in your sample. Thus, for example, if you want to be 95 percent confident that your study findings are accurate within ±5 percentage points of the population parameters, you should select a sample of at least 400. (Appendix E is a convenient guide in this regard.)

This, then, is the basic logic of probability sampling. Random selection permits the researcher to link findings from a sample to the body of probability theory so as to estimate the accuracy of those findings. All statements of accuracy in sampling must specify both a confidence level and a confidence interval. The researcher must report that he or she is x percent confident that the population parameter lies between two specific values. In this example, I have demonstrated the logic of sampling error using a variable analyzed in percentages. A different statistical procedure would be required to calculate the standard error for a mean, for example, but the overall logic is the same.

Notice that nowhere in this discussion of sample size and accuracy of estimates did we consider the size of the population being studied. This is because the population size is almost always irrelevant. A sample of 2,000 respondents drawn properly to represent Vermont voters will be no more accurate than a sample of 2,000 drawn properly to represent all voters in the United States, even though the Vermont sample would be a substantially larger proportion of that small state’s voters than would the same number chosen to represent the nation’s voters. The reason for this counterintuitive fact is that the equations for calculating sampling error all assume that the populations being sampled are infinitely large, so every sample would equal 0 percent of the whole.

Of course, this is not literally true in practice. A sample of 2,000 represents only 0.68 percent of the Vermonters who voted for president in the 2000 election, and a sample of 2,000 U.S. voters represents a mere 0.002 percent of the national electorate. Both of these proportions are small enough to approach the situation with infinitely large populations.

Further, unless a sample represents, say, 5 percent or more of the population it is drawn from, that proportion is irrelevant. In those rare cases of large proportions being selected, a “finite population correction” can be calculated to adjust the confidence intervals. Simply subtract the proportion from 1.0, take the square root of the result, and multiply it by the standard error. As you can see, with proportions close to zero, this will make no difference. If, on the other hand, your sample were half of the population, the sampling error would be cut by about 29 percent by this procedure. In the extreme, if you included the whole population in your sample, the sample-to-population proportion would be 1.0, and you would multiply the calculated standard error by 0.0, suggesting there was no sampling error, which would, of course, be the case. (How cool is that?) See the box “Media Success with the Concept of Sampling Error” for more on this topic.

IN THE REAL WORLD
MEDIA SUCCESS WITH THE CONCEPT OF SAMPLING ERROR

The mass media have become increasingly sophisticated in reporting the concept of sampling error over the years. Notice how this report by CNN (2006) distinguishes margins of error within the whole sample and subsamples:

    A CNN poll released Monday found Bush’s approval rating was 34 percent, an uptick of 2 percentage points from the most recent CNN poll in late April. The president’s disapproval rating was 58 percent, down 2 points from the previous poll.

    The poll, also done by Opinion Research Corp., was based on interviews of 1,021 adults. Both shifts are within the poll’s sampling error of plus or minus 3 percentage points.

    More than half of those who disapproved of Bush’s job performance (56 percent) said the war in Iraq was the reason.

    Thirteen percent said the recent increase in gas prices had fueled their displeasure. Twenty-six percent gave other reasons.

    Because that question was asked only of those who disapproved, it had a different sampling error: 4 percentage points.

Source: CNN, 2006 (http://www.cnn.com/2006/POLITICS/05/10/congress.poll/index.html). Accessed May 11, 2006.

Two cautions are in order before we conclude this discussion of the basic logic of probability sampling. First, the survey uses of probability theory as discussed here are technically not wholly justified. The theory of sampling distributions makes assumptions that almost never apply in survey conditions. The exact proportion of samples contained within specified increments of standard errors, for example, mathematically assumes an infinitely large population, an infinite number of samples, and sampling with replacement; that is, every sampling unit selected is “thrown back into the pot” and could be selected again. Second, our discussion has greatly oversimplified the inferential jump from the distribution of several samples to the probable characteristics of one sample.

I offer these cautions to provide perspective on the uses of probability theory in sampling. Social researchers often appear to overestimate the precision of estimates produced by the use of probability theory. As I’ll mention elsewhere in this chapter

and throughout the book, variations in sampling techniques and nonsampling factors may further reduce the legitimacy of such estimates. For example, those selected in a sample who fail or refuse to participate further detract from the representativeness of the sample.

Nevertheless, the calculations discussed in this section can be extremely valuable to you in understanding and evaluating your data. Although the calculations do not provide estimates as precise as some researchers might assume, they can be quite valid for practical purposes. They are unquestionably more valid than less rigorously derived estimates based on less rigorous sampling methods. Most important, being familiar with the basic logic underlying the calculations can help you react sensibly both to your own data and to those reported by others.

POPULATIONS AND SAMPLING FRAMES

The preceding section introduced the theoretical model for social research sampling. Although as students, research consumers, and researchers we need to understand that theory, appreciating the less-than-perfect conditions that exist in the field is no less important. In this section we’ll look at one aspect of field conditions that requires a compromise with idealized theoretical conditions and assumptions: the congruence of or disparity between populations and sampling frames.

Simply put, a sampling frame is the list or quasi list of elements from which a probability sample is selected. If a sample of students is selected from a student roster, the roster is the sampling frame. If the primary sampling unit for a complex population sample is the census block, the list of census blocks composes the sampling frame, whether in the form of a printed booklet, a magnetic tape file, a CD-ROM, or some other medium. Here are some reports of sampling frames appearing in research journals:

    The data for this research were obtained from a random sample of parents of children in the third grade in public and parochial schools in Yakima County, Washington. (Petersen and Maynard 1981:92)

    The sample at Time 1 consisted of 160 names drawn randomly from the telephone directory of Lubbock, Texas. (Tan 1980:242)

    The data reported in this paper . . . were gathered from a probability sample of adults aged 18 and over residing in households in the 48 contiguous United States. Personal interviews with 1,914 respondents were conducted by the Survey Research Center of the University of Michigan during the fall of 1975. (Jackman and Senter 1980:345)

In each example I’ve italicized the actual sampling frames.

sampling frame: That list or quasi list of units composing a population from which a sample is selected. If the sample is to be representative of the population, it is essential that the sampling frame include all (or nearly all) members of the population.

Properly drawn samples provide information appropriate for describing the population of elements composing the sampling frame, and nothing more. I emphasize this point in view of the all-too-common tendency for researchers to select samples from a given sampling frame and then make assertions about a population similar to, but not identical to, the population defined by the sampling frame.

For example, take a look at this report, which discusses the drugs most frequently prescribed by U.S. physicians:

    Information on prescription drug sales is not easy to obtain. But Rinaldo V. DeNuzzo, a professor of pharmacy at the Albany College of Pharmacy, Union University, Albany, NY, has been tracking prescription drug sales for 25 years by polling nearby drugstores. He publishes the results in an industry trade magazine, MM&M.

    DeNuzzo’s latest survey, covering 1980, is based on reports from 66 pharmacies in 48 communities in New York and New Jersey. Unless

    there is something peculiar about that part of the country, his findings can be taken as representative of what happens across the country. (Moskowitz 1981:33)

What is striking in the excerpt is the casual comment about whether there is anything peculiar about New York and New Jersey. There is. The lifestyle in these two states hardly typifies the lifestyles in the other 48. We cannot assume that residents in these large, urbanized, Eastern seaboard states necessarily have the same drug-use patterns that residents of Mississippi, Nebraska, or Vermont have.

Does the survey even represent prescription patterns in New York and New Jersey? To determine that, we would have to know something about the way the 48 communities and the 66 pharmacies were selected. We should be wary in this regard, in view of the reference to “polling nearby drugstores.” As we’ll see, there are several methods for selecting samples that ensure representativeness, and unless they’re used, we shouldn’t generalize from the study findings.

A sampling frame, then, must be consonant with the population we wish to study. In the simplest sample design, the sampling frame is a list of the elements composing the study population. In practice, though, existing sampling frames often define the study population rather than the other way around. That is, we often begin with a population in mind for our study; then we search for possible sampling frames. Having examined and evaluated the frames available for our use, we decide which frame presents a study population most appropriate to our needs.

Studies of organizations are often the simplest from a sampling standpoint because organizations typically have membership lists. In such cases, the list of members constitutes an excellent sampling frame. If a random sample is selected from a membership list, the data collected from that sample may be taken as representative of all members, provided that all members are included in the list.

Populations that can be sampled from good organizational lists include elementary school, high school, and university students and faculty; church members; factory workers; fraternity or sorority members; members of social, service, or political clubs; and members of professional associations.

The preceding comments apply primarily to local organizations. Often, statewide or national organizations do not have a single membership list. There is, for example, no single list of Episcopalian church members. However, a slightly more complex sample design could take advantage of local church membership lists by first sampling churches and then subsampling the membership lists of those churches selected. (More about that later.)

Other lists of individuals may be especially relevant to the research needs of a particular study. Government agencies maintain lists of registered voters, for example, that might be used if you wanted to conduct a preelection poll or an in-depth examination of voting behavior, but you must ensure that the list is up-to-date. Similar lists contain the names of automobile owners, welfare recipients, taxpayers, business permit holders, licensed professionals, and so forth. Although it may be difficult to gain access to some of these lists, they provide excellent sampling frames for specialized research purposes.

The sampling elements in a study need not be individual persons. Lists of other types of elements also exist: universities, businesses of various types, cities, academic journals, newspapers, unions, political clubs, professional associations, and so forth.

Telephone directories are frequently used for “quick and dirty” public opinion polls. Undeniably they’re easy and inexpensive to use, which is no doubt the reason for their popularity. And, if you want to make assertions about telephone subscribers, the directory is a fairly good sampling frame. (Realize, of course, that a given directory will include neither new subscribers nor those who have requested unlisted numbers. Sampling is further complicated by the directories’ inclusion of nonresidential listings.) Unfortunately, telephone directories are all too often used as a listing of a city’s population or of its voters. Of the many defects in this reasoning, the chief one involves a social-class bias. Poor people are less likely to have telephones; rich people may have more than one line.
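The class bias just described can be made concrete with a small calculation. The sketch below is purely illustrative: the two groups, their sizes, their directory-listing rates, and their levels of support for a candidate are invented numbers, not data from any study. It compares the true level of support in a hypothetical city with the level among people who appear in the directory, which is the figure that even a perfectly random sample drawn from that directory would be estimating.

```python
# Hypothetical city: every figure below is an invented assumption for illustration.
groups = {
    # name: (residents, share listed in the directory, share favoring Candidate A)
    "lower income": (6000, 0.50, 0.70),
    "higher income": (4000, 0.95, 0.40),
}

def support_in_population(groups):
    """Share favoring the candidate among all residents."""
    total = sum(size for size, _, _ in groups.values())
    favor = sum(size * support for size, _, support in groups.values())
    return favor / total

def support_in_frame(groups):
    """Share favoring the candidate among residents listed in the directory."""
    listed = sum(size * coverage for size, coverage, _ in groups.values())
    favor = sum(size * coverage * support for size, coverage, support in groups.values())
    return favor / listed

population_share = support_in_population(groups)
frame_share = support_in_frame(groups)
print(f"support among all residents:     {population_share:.3f}")
print(f"support among directory entries: {frame_share:.3f}")
print(f"gap caused by the frame:         {population_share - frame_share:.3f}")
```

Under these assumed figures the directory understates support for the candidate, and no increase in sample size would close the gap, because the sample can describe only the elements that compose the frame.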

The class bias inherent in telephone directory samples often remains hidden. Preelection polls conducted in this fashion are sometimes quite accurate, perhaps because of the class bias evident in voting itself: Poor people are less likely than rich people to vote. Frequently, then, these two biases nearly coincide, so that the results of a telephone poll may come very close to the final election outcome. Unhappily, you never know for sure until after the election. Sometimes, as in the case of the 1936 Literary Digest poll, you may discover that the voters have not acted according to the expected class biases. The ultimate disadvantage of this method, then, is the researcher’s inability to estimate the degree of error to be expected in the sample findings.

The growth in popularity of cell phones has become a new source of concern for survey researchers, because cell phone numbers are typically not included in phone surveys. Those who use cell phones exclusively, moreover, tend to be younger and, in 2004, they were more likely to vote for John Kerry than were older voters. Scott Keeter (2006) found, however, that researchers who weighted their results in terms of age avoided bias in this respect.

Street directories and tax maps are often used for easy samples of households, but they may also suffer from incompleteness and possible bias. For example, in strictly zoned urban regions, illegal housing units tend not to appear on official records. As a result, such units might not be selected, and sample findings might not be representative of those units, which are often poorer and more overcrowded than average.

Though the preceding comments apply to the United States, the situation is quite different in some other countries. In Japan, for example, the government maintains quite accurate population registration lists. Moreover, citizens are required by law to update their information, such as changes in residence or births and deaths in the household. As a consequence, you can select simple random samples of the Japanese population more easily than in the United States. Such a registration list in the United States would conflict directly with this country’s norms regarding individual privacy.

Because social research literature gives surprisingly little attention to the issues of populations and sampling frames, I’ve devoted special attention to them here by providing a summary of the main guidelines to remember:

1. Findings based on a sample can be taken as representing only the aggregation of elements that compose the sampling frame.
2. Often, sampling frames do not truly include all the elements their names might imply. Omissions are almost inevitable. Thus, a first concern of the researcher must be to assess the extent of the omissions and to correct them if possible. (Of course, the researcher may feel that he or she can safely ignore a small number of omissions that cannot easily be corrected.)
3. To be generalized even to the population composing the sampling frame, all elements must have equal representation in the frame. Typically, each element should appear only once. Elements that appear more than once will have a greater probability of selection, and the sample will, overall, overrepresent those elements.

Other, more practical matters relating to populations and sampling frames will be treated elsewhere in this book. For example, the form of the sampling frame (such as a list in a publication, 3-by-5 card file, CD-ROM, or magnetic tape) can affect how easy it is to use. And ease of use may often take priority over scientific considerations: An “easier” list may be chosen over a “harder” one, even though the latter is more appropriate to the target population. Every researcher should carefully weigh the relative advantages and disadvantages of such alternatives.

TYPES OF SAMPLING DESIGNS

Up to this point, we’ve focused on simple random sampling (SRS). Indeed, the body of statistics typically used by social researchers assumes such a sample. As you’ll see shortly, however, you have several options in choosing your sampling method, and you’ll seldom if ever choose simple random sampling. There are two reasons for this.

First, with all but the simplest sampling frame, simple random sampling is not feasible. Second, and probably surprisingly, simple random sampling may not be the most accurate method available. Let’s turn now to a discussion of simple random sampling and the other available options.

Simple Random Sampling

As noted, simple random sampling is the basic sampling method assumed in the statistical computations of social research. Because the mathematics of random sampling are especially complex, we’ll detour around them in favor of describing the ways of employing this method in the field.

Once a sampling frame has been properly established, to use simple random sampling the researcher assigns a single number to each element in the list, not skipping any number in the process. A table of random numbers (Appendix B) is then used to select elements for the sample. The box “Using a Table of Random Numbers” explains its use.

If your sampling frame is in a machine-readable form, such as computer disk or magnetic tape, a simple random sample can be selected automatically by computer. (In effect, the computer program numbers the elements in the sampling frame, generates its own series of random numbers, and prints out the list of elements selected.)

Figure 7-11 offers a graphic illustration of simple random sampling. Note that the members of our hypothetical micropopulation have been numbered from 1 to 100. Moving to Appendix B, we decide to use the last two digits of the first column and to begin with the third number from the top. This yields person number 30 as the first one selected into the sample. Number 67 is next, and so forth. (Person 100 would have been selected if “00” had come up in the list.)

FIGURE 7-11 A Simple Random Sample. Having numbered everyone in the population, we can use a table of random numbers to select a representative sample from the overall population. Anyone whose number is chosen from the table is in the sample. (The figure shows a micropopulation numbered 1 to 100, an excerpt from the Appendix B table of random numbers, and the resulting sample: 30, 67, 70, 21, 62, 1, 79, 75, 18, 53.)

simple random sampling: A type of probability sampling in which the units composing a population are assigned numbers. A set of random numbers is then generated, and the units having those numbers are included in the sample.

Systematic Sampling

Simple random sampling is seldom used in practice. As you’ll see, it’s not usually the most efficient method, and it can be laborious if done manually. Typically, simple random sampling requires a list of elements. When such a list is available, researchers usually employ systematic sampling instead.

In systematic sampling, every kth element in the total list is chosen (systematically) for inclusion in the sample. If the list contained 10,000 elements and you wanted a sample of 1,000, you would select every tenth element for your sample. To ensure against any possible human bias in using this method, you should select the first element at random. Thus, in the preceding example, you would begin by selecting a random number from one to ten. The element having that number is included in the sample, plus every tenth element following it. This method is technically referred to as a systematic sample with a random start. Two terms are frequently used in connection with systematic sampling. The sampling interval is the standard distance between elements selected in the sample: ten in the preceding example. The sampling ratio is the proportion of elements in the population that are selected: 1/10 in the example.

sampling interval = population size / sample size

sampling ratio = sample size / population size

systematic sampling: A type of probability sampling in which every kth unit in a list is selected for inclusion in the sample (for example, every 25th student in the college directory of students). You compute k by dividing the size of the population by the desired sample size; k is called the sampling interval. Within certain constraints, systematic sampling is a functional equivalent of simple random sampling and usually easier to do. Typically, the first unit is selected at random.

sampling interval: The standard distance (k) between elements selected from a population for a sample.

sampling ratio: The proportion of elements in the population that are selected to be in a sample.

In practice, systematic sampling is virtually identical to simple random sampling. If the list of elements is indeed randomized before sampling, one might argue that a systematic sample drawn from that list is in fact a simple random sample. By now, debates over the relative merits of simple random sampling and systematic sampling have been resolved largely in favor of the latter, simpler method. Empirically, the results are virtually identical. And, as you’ll see in a later section, systematic sampling, in some instances, is slightly more accurate than simple random sampling.

There is one danger involved in systematic sampling. The arrangement of elements in the list can make systematic sampling unwise. Such an arrangement is usually called periodicity. If the list of elements is arranged in a cyclical pattern that coincides with the sampling interval, a grossly biased sample can be drawn. Here are two examples that illustrate this danger.

ISSUES AND INSIGHTS
USING A TABLE OF RANDOM NUMBERS

In social research, it’s often appropriate to select a set of random numbers from a table such as the one in Appendix B. Here’s how to do that. Suppose you want to select a simple random sample of 100 people (or other units) out of a population totaling 980.

1. To begin, number the members of the population: in this case, from 1 to 980. Now the problem is to select 100 random numbers. Once you’ve done that, your sample will consist of the people having the numbers you’ve selected. (Note: It’s not essential to actually number them, as long as you’re sure of the total. If you have them in a list, for example, you can always count through the list after you’ve selected the numbers.)
2. The next step is to determine the number of digits you’ll need in the random numbers you select. In our example, there are 980 members of the population, so you’ll need three-digit numbers to give everyone a chance of selection. (If there were 11,825 members of the population, you’d need to select five-digit numbers.) Thus, we want to select 100 random numbers in the range from 001 to 980.
3. Now turn to the first page of Appendix B. Notice there are several rows and columns of five-digit numbers, and there are several pages. The table represents a series of random numbers in the range from 00001 to 99999. To use the table for your hypothetical sample, you have to answer these questions:
   a. How will you create three-digit numbers out of five-digit numbers?
   b. What pattern will you follow in moving through the table to select your numbers?
   c. Where will you start?
   Each of these questions has several satisfactory answers. The key is to create a plan and follow it. Here’s an example.
4. To create three-digit numbers from five-digit numbers, let’s agree to select five-digit numbers from the table but consider only the left-most three digits in each case. If we picked the first number on the first page (10480), we would only consider the 104. (We could agree to take the digits farthest to the right, 480, or the middle three digits, 048, and any of these plans would work.) The

In a classic study of soldiers during World War II, the researchers selected a systematic sample from unit rosters. Every tenth soldier on the roster was selected for the study. The rosters, however, were arranged in a table of organizations: sergeants first, then corporals and privates, squad by squad. Each squad had ten members. As a result, every tenth person on the roster was a squad sergeant. The systematic sample selected contained only sergeants. It could, of course, have been the case that no sergeants were selected for the same reason.

As another example, suppose we select a sample of apartments in an apartment building. If the sample is drawn from a list of apartments arranged in numerical order (for example, 101, 102, 103, 104, 201, 202, and so on), there is a danger of the sampling interval coinciding with the number of apartments on a floor or some multiple thereof. Then the samples might include only northwest-corner apartments or only apartments near the elevator. If these types of apartments have some other particular characteristic in common (for example, higher rent), the sample will be biased. The same danger would appear in a systematic sample of houses in a subdivision arranged with the same number of houses on a block.
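The roster problem can be reproduced in a few lines. The sketch below is only an illustration under stated assumptions: the roster of 20 squads is invented, everyone below sergeant is labeled a private for simplicity, and `systematic_sample` is a hypothetical helper rather than anything used in the original study.

```python
def systematic_sample(items, interval, start):
    """Take every `interval`-th item, beginning at 1-based position `start`."""
    return items[start - 1::interval]

# Hypothetical roster: 20 squads of ten members, each squad listed sergeant first.
roster = []
for squad in range(1, 21):
    roster.append(f"sergeant-{squad}")
    roster.extend(f"private-{squad}-{n}" for n in range(1, 10))

# The sampling interval (10) coincides with the squad size (10).
sample_start_1 = systematic_sample(roster, interval=10, start=1)
sample_start_4 = systematic_sample(roster, interval=10, start=4)

only_sergeants = all(name.startswith("sergeant") for name in sample_start_1)
any_sergeants = any(name.startswith("sergeant") for name in sample_start_4)
print(only_sergeants, any_sergeants)
```

With a start of 1 the sample is made up entirely of sergeants, and with a start of 4 it contains none. Either way the sample misrepresents the roster, because the interval matches the cycle in the list.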

key is to make a plan and stick with it. For convenience, let’s use the left-most three digits.
5. We can also choose to progress through the tables any way we want: down the columns, up them, across to the right or to the left, or diagonally. Again, any of these plans will work just fine so long as we stick to it. For convenience, let’s agree to move down the columns. When we get to the bottom of one column, we’ll go to the top of the next.
6. Now, where do we start? You can close your eyes and stick a pencil into the table and start wherever the pencil point lands. (I know it doesn’t sound scientific, but it works.) Or, if you’re afraid you’ll hurt the book or miss it altogether, close your eyes and make up a column number and a row number. (“I’ll pick the number in the fifth row of column 2.”) Start with that number.
7. Let’s suppose we decide to start with the fifth number in column 2. If you look on the first page of Appendix B, you’ll see that the starting number is 39975. We have selected 399 as our first random number, and we have 99 more to go. Moving down the second column, we select 069, 729, 919, 143, 368, 695, 409, 939, and so forth, continuing in the same column onto the next page. At the bottom of column 2 (the second page of the table), we select number 017 and continue to the top of column 3: 015, 255, and so on.
8. See how easy it is? But trouble could lie ahead. Say we need more than our 100 numbers. When we reach column 5, we are speeding along, selecting 816, 309, 763, 078, 061, 277, 988 . . . Wait a minute! There are only 980 students in the senior class. How can we pick number 988? The solution is simple: Ignore it. Any time you come across a number that lies outside your range, skip it and continue on your way: 188, 174, and so forth. The same solution applies if the same number comes up more than once. If you select 399 again, for example, just ignore it the second time.
9. That’s it. You keep up the procedure until you’ve selected 100 random numbers. Returning to your list, your sample consists of person number 399, person number 69, person number 729, and so forth.
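The steps in this box amount to a small procedure, sketched here in Python. The function name `select_from_table` and the list `entries` are assumptions made for illustration. Only the first three digits of each entry follow the walk-through above; the trailing "00" digits are placeholders, because the text quotes just one full entry from this column (39975).

```python
def select_from_table(entries, population_size, sample_size):
    """Pick `sample_size` distinct numbers in 1..population_size from table entries."""
    chosen = []
    for entry in entries:
        number = int(entry[:3])  # keep only the left-most three digits
        if not 1 <= number <= population_size:
            continue  # out of range (988 when there are only 980 students): skip it
        if number in chosen:
            continue  # already selected: ignore it the second time
        chosen.append(number)
        if len(chosen) == sample_size:
            break
    return chosen

# Placeholder entries: the last two digits are invented, except for 39975.
entries = ["39975", "06900", "72900", "98800", "39900", "18800", "17400"]
sample = select_from_table(entries, population_size=980, sample_size=5)
print(sample)
```

Here 988 is skipped as out of range and the second 399 is skipped as a repeat, leaving persons 399, 69, 729, 188, and 174. When no printed table is required, Python's standard library call `random.sample(range(1, 981), 100)` draws 100 distinct numbers from 1 to 980 in one step.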

In considering a systematic sample from a list, then, you should carefully examine the nature of that list. If the elements are arranged in any particular order, you should figure out whether that order will bias the sample to be selected, then you should take steps to counteract any possible bias (for example, take a simple random sample from cyclical portions).

Usually, however, systematic sampling is superior to simple random sampling, in convenience if nothing else. Problems in the ordering of elements in the sampling frame can usually be remedied quite easily.

Stratified Sampling

So far we have discussed two methods of sample selection from a list: random and systematic. Stratification is not an alternative to these methods; rather, it represents a possible modification of their use.

stratification The grouping of the units composing a population into homogeneous groups (or strata) before sampling. This procedure, which may be used in conjunction with simple random, systematic, or cluster sampling, improves the representativeness of a sample, at least in terms of the variables used for stratification.

Simple random sampling and systematic sampling both ensure a degree of representativeness and permit an estimate of the error present. Stratified sampling is a method for obtaining a greater degree of representativeness by decreasing the probable sampling error. To understand this method, we must return briefly to the basic theory of sampling distribution.

Recall that sampling error is reduced by two factors in the sample design. First, a large sample produces a smaller sampling error than does a small sample. Second, a homogeneous population produces samples with smaller sampling errors than does a heterogeneous population. If 99 percent of the population agrees with a certain statement, it’s extremely unlikely that any probability sample will greatly misrepresent the extent of agreement. If the population is split 50–50 on the statement, then the sampling error will be much greater.

Stratified sampling is based on this second factor in sampling theory. Rather than selecting a sample from the total population at large, the researcher ensures that appropriate numbers of elements are drawn from homogeneous subsets of that population. To get a stratified sample of university students, for example, you would first organize your population by college class and then draw appropriate numbers of freshmen, sophomores, juniors, and seniors. In a nonstratified sample, representation by class would be subject to the same sampling error as would other variables. In a sample stratified by class, the sampling error on this variable is reduced to zero.

More-complex stratification methods are also possible. In addition to stratifying by class, you might also stratify by gender, by GPA, and so forth. In this fashion you might be able to ensure that your sample would contain the proper numbers of male sophomores with a 3.5 average, of female sophomores with a 4.0 average, and so forth.

The ultimate function of stratification, then, is to organize the population into homogeneous subsets (with heterogeneity between subsets) and to select the appropriate number of elements from each. To the extent that the subsets are homogeneous on the stratification variables, they may be homogeneous on other variables as well. Because age is related to college class, a sample stratified by class will be more representative in terms of age as well, compared with an unstratified sample. Because occupational aspirations still seem to be related to gender, a sample stratified by gender will be more representative in terms of occupational aspirations.

The choice of stratification variables typically depends on what variables are available. Gender can often be determined in a list of names. University lists are typically arranged by class. Lists of faculty members may indicate their departmental affiliation. Government agency files may be arranged by geographic region. Voter registration lists are arranged according to precinct.

In selecting stratification variables from among those available, however, you should be concerned primarily with those that are presumably related to variables you want to represent accurately. Because gender is related to many variables and is often available for stratification, it is often used. Education is related to many variables, but it is often not available for stratification. Geographic location within a city, state, or nation is related to many things. Within a city, stratification by geographic location usually increases representativeness in social class, ethnic group, and so forth. Within a nation, it increases representativeness in a broad range of attitudes as well as in social class and ethnicity.

When you’re working with a simple list of all elements in the population, two methods of stratification predominate. In one method, you sort the population elements into discrete groups based on whatever stratification variables are being used. On the basis of the relative proportion of the population represented by a given group, you select (randomly or systematically) several elements from that group constituting the same proportion of your desired sample size. For example, if sophomore men with a 4.0 average compose 1 percent of the student population and you desire a sample of 1,000 students, you would select 10 sophomore men with a 4.0 average.
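The first method, sorting elements into groups and drawing from each group in proportion to its share of the population, might be sketched as follows. The function `proportionate_stratified_sample`, the student records, and the class sizes are all hypothetical; they are chosen so that the quotas come out as whole numbers.

```python
import random
from collections import defaultdict

def proportionate_stratified_sample(population, stratum_of, sample_size, seed=None):
    """Draw from each stratum the same share it holds in the population."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for element in population:
        strata[stratum_of(element)].append(element)
    sample = []
    for members in strata.values():
        # A stratum holding 1 percent of the population gets 1 percent of the sample.
        quota = round(sample_size * len(members) / len(population))
        sample.extend(rng.sample(members, quota))  # random selection within the stratum
    return sample

# Hypothetical student body of 10,000, described only by college class.
class_sizes = {"freshman": 4000, "sophomore": 3000, "junior": 2000, "senior": 1000}
students = [(cls, n) for cls, size in class_sizes.items() for n in range(size)]

sample = proportionate_stratified_sample(students, lambda s: s[0], 1000, seed=1)
counts = {cls: sum(1 for s in sample if s[0] == cls) for cls in class_sizes}
print(counts)
```

Because each quota is fixed in advance, the class makeup of the sample matches the population exactly, which is the sense in which sampling error on the stratification variable falls to zero. When quotas are not whole numbers, rounding can leave the total a case or two away from the target, so a real design would need a rule for settling the remainder.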

Figure 7-12 illustration (image not reproduced): the 100 members of the population are numbered 1 through 100 in their stratified order, with a random start at 3.

The Sample: 3, 13, 23, 33, 43, 53, 63, 73, 83, 93

FIGURE 7-12 A Stratified, Systematic Sample with a Random Start. A stratified, systematic sample
involves two stages. First the members of the population are gathered into homogeneous strata; this simple
example merely uses gender as a stratification variable but more could be used. Then every kth (in this case,
every 10th) person in the stratified arrangement is selected into the sample.

The other method is to group students as described and then put those groups together in a continuous list, beginning with all male freshmen with a 4.0 average and ending with all female seniors with a 1.0 or below. You would then select a systematic sample, with a random start, from the entire list. Given the arrangement of the list, a systematic sample would select proper numbers (within an error range of 1 or 2) from each subgroup. (Note: A simple random sample drawn from such a composite list would cancel out the stratification.)

Figure 7-12 offers a graphic illustration of stratified, systematic sampling. As you can see, we lined up our micropopulation according to gender and race. Then, beginning with a random start of “3,” we’ve taken every tenth person thereafter, resulting in a list of 3, 13, 23, . . . , 93.

Stratified sampling ensures the proper representation of the stratification variables; this, in turn, enhances the representation of other variables related to them. Taken as a whole, then, a stratified sample is more likely than a simple random sample to be more representative on several variables. Although the simple random sample is still regarded as somewhat sacred, it should now be clear that you can often do better.
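The second method, ordering the list by stratum and then sampling systematically from a random start, can be sketched as below. The micropopulation of 40 men and 60 women is invented for illustration, as are the function name and its parameters.

```python
import random

def stratified_systematic_sample(population, stratum_key, interval, start=None, seed=None):
    """Sort by stratum, then take every `interval`-th element from a random start."""
    ordered = sorted(population, key=stratum_key)  # stable sort keeps each stratum together
    if start is None:
        start = random.Random(seed).randint(1, interval)  # random start in 1..interval
    return start, ordered[start - 1::interval]

# Hypothetical micropopulation of 100 people: 40 men and 60 women, in mixed order.
people = [{"id": i, "gender": "man" if i % 5 < 2 else "woman"} for i in range(1, 101)]

start, sample = stratified_systematic_sample(
    people, lambda person: person["gender"], interval=10, seed=7
)
counts = {g: sum(1 for p in sample if p["gender"] == g) for g in ("man", "woman")}
print(start, counts)
```

Whatever the random start, this sample contains 4 men and 6 women, mirroring the 40/60 split. The match is exact here only because each stratum size is a multiple of the interval; otherwise the counts can be off by one or two, as the text notes.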

Implicit Stratification in Systematic Sampling

I mentioned that systematic sampling can, under certain conditions, be more accurate than simple random sampling. This is the case whenever the arrangement of the list creates an implicit stratification. As already noted, if a list of university students is arranged by class, then a systematic sample provides a stratification by class whereas a simple random sample would not.

In a study of students at the University of Hawaii, after stratification by school class, the students were arranged by their student identification numbers. These numbers, however, were their social security numbers. The first three digits of the social security number indicate the state in which the number was issued. As a result, within a class, students were arranged by the state in which they were issued a social security number, providing a rough stratification by geographic origin.

An ordered list of elements, therefore, may be more useful to you than an unordered, randomized list. I’ve stressed this point in view of the unfortunate belief that lists should be randomized before systematic sampling. Only if the arrangement presents the problems discussed earlier should the list be rearranged.

Illustration: Sampling University Students

Let’s put these principles into practice by looking at an actual sampling design used to select a sample of university students. The purpose of the study was to survey, with a mail-out questionnaire, a representative cross section of students attending the main campus of the University of Hawaii. The following sections describe the steps and decisions involved in selecting that sample.

Study Population and Sampling Frame The obvious sampling frame available for use in this sample selection was the computerized file maintained by the university administration. The file contained students’ names, local and permanent addresses, social security numbers, and a variety of other information such as field of study, class, age, and gender.

The computer database, however, contained information on all people who could, by any conceivable definition, be called students, many of whom seemed inappropriate to the purposes of the study. As a result, researchers needed to define the study population in a somewhat more restricted fashion. The final definition included those 15,225 day-program degree candidates registered for the fall semester on the Manoa campus of the university, including all colleges and departments, both undergraduate and graduate students, and both U.S. and foreign students. The computer program used for sampling, therefore, limited consideration to students fitting this definition.

Stratification The sampling program also permitted stratification of students before sample selection. The researchers decided that stratification by college class would be sufficient, although the students might have been further stratified within class, if desired, by gender, college, major, and so forth.

Sample Selection Once the students had been arranged by class, a systematic sample was selected across the entire rearranged list. The sample size for the study was initially set at 1,100. To achieve this sample, the sampling program was set for a 1/14 sampling ratio. The program generated a random number between 1 and 14; the student having that number and every 14th student thereafter was selected in the sample.

Once the sample had been selected, the computer was instructed to print students’ names and mailing addresses on self-adhesive mailing labels. These labels were then simply transferred to envelopes for mailing the questionnaires.

Sample Modification

This initial design of the sample had to be modified. Before the mailing of questionnaires, the researchers discovered that unexpected expenses in the production of the questionnaires made it impossible to cover the costs of mailing to all 1,100 students. As a result, one-third of the mailing labels were systematically selected (with a random start) for exclusion from the sample. The final sample for the study was thereby reduced to 733 students.

I mention this modification in order to illustrate the frequent need to alter a study plan in midstream. Because the excluded students were systematically omitted from the initial systematic sample, the remaining 733 students could still be taken as reasonably representing the study population. The reduction in sample size did, of course, increase the range of sampling error.

MULTISTAGE CLUSTER SAMPLING

The preceding sections have dealt with reasonably simple procedures for sampling from lists of elements. Such a situation is ideal. Unfortunately, however, much interesting social research requires the selection of samples from populations that cannot easily be listed for sampling purposes: the population of a city, state, or nation; all university students in the United States; and so forth. In such cases the sample design must be much more complex. Such a design typically involves the initial sampling of groups of elements (clusters), followed by the selection of elements within each of the selected clusters.

Cluster sampling may be used when it’s either impossible or impractical to compile an exhaustive list of the elements composing the target population, such as all church members in the United States. Often, however, the population elements are already grouped into subpopulations, and a list of those subpopulations either exists or can be created practically. For example, church members in the United States belong to discrete churches, which are either listed or could be. Following a cluster-sample format, then, researchers would sample the list of churches in some manner (for example, a stratified, systematic sample). Next, they would obtain lists of members from each of the selected churches. Each of the lists would then be sampled, to provide samples of church members for study. (For an example, see Glock, Ringer, and Babbie 1967.)

Another typical situation concerns sampling among population areas such as a city. Although there is no single list of a city’s population, citizens reside on discrete city blocks or census blocks. Researchers can therefore select a sample of blocks initially, create a list of people living on each of the selected blocks, and take a subsample of the people on each block.

In a more complex design, researchers might (1) sample blocks, (2) list the households on each selected block, (3) sample the households, (4) list the people residing in each household, and (5) sample the people within each selected household. This multistage sample design leads ultimately to a selection of a sample of individuals but does not require the initial listing of all individuals in the city’s population.

Multistage cluster sampling, then, involves the repetition of two basic steps: listing and sampling. The list of primary sampling units (churches, blocks) is compiled and, perhaps, stratified for sampling. Then a sample of those units is selected. The selected primary sampling units are then listed and perhaps stratified. The list of secondary sampling units is then sampled, and so forth.

The listing of households on even the selected blocks is, of course, a labor-intensive and costly activity, one of the elements making face-to-face household surveys quite expensive. Vincent Iannacchione, Jennifer Staab, and David Redden (2003) report some initial success using postal mailing lists for this purpose. Although the lists are not perfect, they may be close enough to warrant the significant savings in cost.

Multistage cluster sampling makes possible those studies that would otherwise be impossible.

cluster sampling A multistage sampling in which natural groups (clusters) are sampled initially, with the members of each selected group being subsampled afterward. For example, you might select a sample of U.S. colleges and universities from a directory, get lists of the students at all the selected schools, then draw samples of students from each.
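The repeated cycle of listing and sampling described above can be sketched for the two-stage case of blocks and households. Everything named here is hypothetical: the city of 100 blocks with 20 households each, the `multistage_sample` function, and the `list_households` stand-in for the field work of listing a block.

```python
import random

def multistage_sample(block_ids, list_households, n_blocks, n_per_block, seed=None):
    """Two-stage cluster sample: sample blocks, list them, then sample households."""
    rng = random.Random(seed)
    selected_blocks = rng.sample(block_ids, n_blocks)        # stage one: sample the clusters
    households = []
    for block in selected_blocks:
        listing = list_households(block)                     # list only the selected blocks
        households.extend(rng.sample(listing, n_per_block))  # stage two: sample within each
    return selected_blocks, households

# `listed` records which blocks a field worker would actually have to list.
listed = []

def list_households(block):
    listed.append(block)
    return [f"block {block}, household {n}" for n in range(1, 21)]

blocks, sample = multistage_sample(
    list(range(1, 101)), list_households, n_blocks=10, n_per_block=5, seed=42
)
print(len(listed), "blocks listed;", len(sample), "households selected")
```

Only the 10 selected blocks are ever listed, so no household listing is needed for the other 90. That saving is what makes the design practical when no list of the whole population exists.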

ISSUES AND INSIGHTS

SAMPLING IRAN

Whereas most of the examples given in this textbook are taken from its country of origin, the United States, the basic methods of sampling would apply in other national settings as well. At the same time, researchers may need to make modifications appropriate to local conditions. In selecting a national sample of Iran, for example, Hamid Abdollahyan and Taghi Azadarmaki (2000:21) from the University of Tehran began by stratifying the nation on the basis of cultural differences, dividing the country into nine cultural zones as follows:

1. Tehran
2. Central region including Isfahan, Arak, Qum, Yazd and Kerman
3. The southern provinces including Hormozgan, Khuzistan, Bushehr and Fars
4. The marginal western region including Lorestan, Charmahal and Bakhtiari, Kogiluyeh and Eelam
5. The western provinces including western and eastern Azarbaijan, Zanjan, Ghazvin and Ardebil
6. The eastern provinces including Khorasan and Semnan
7. The northern provinces including Gilan, Mazandran and Golestan
8. Systan
9. Kurdistan

Within each of these cultural areas, the researchers selected samples of census blocks and, on each selected block, a sample of households. Their sample design made provisions for getting the proper numbers of men and women as respondents within households and provisions for replacing those households where no one was at home.

Source: Hamid Abdollahyan and Taghi Azadarmaki, Sampling Design in a Survey Research: The Sampling Practice in Iran, paper presented to the meetings of the American Sociological Association, August 12–16, 2000, Washington, DC.

Specific research circumstances often call for special designs, as the box “Sampling Iran” demonstrates.

Multistage Designs and Sampling Error

Although cluster sampling is highly efficient, the price of that efficiency is a less accurate sample. A simple random sample drawn from a population list is subject to a single sampling error, but a two-stage cluster sample is subject to two sampling errors. First, the initial sample of clusters will represent the population of clusters only within a range of sampling error. Second, the sample of elements selected within a given cluster will represent all the elements in that cluster only within a range of sampling error. Thus, for example, a researcher runs a certain risk of selecting a sample of disproportionately wealthy city blocks, plus a sample of disproportionately wealthy households within those blocks. The best solution to this problem lies in the number of clusters selected initially and the number of elements within each cluster.

Typically, researchers are restricted to a total sample size; for example, you may be limited to conducting 2,000 interviews in a city. Given this broad limitation, however, you have several options in designing your cluster sample. At the extremes you could choose one cluster and select 2,000 elements within that cluster, or you could select 2,000 clusters with one element selected within each. Of course, neither approach is advisable, but a broad range of choices lies between them. Fortunately, the logic of sampling distributions provides a general guideline for this task.

Recall that sampling error is reduced by two factors: an increase in the sample size and increased homogeneity of the elements being sampled. These factors operate at each level of a multistage sample design. A sample of clusters will best represent all clusters if a large number are selected and if all clusters are very much alike. A sample of elements will best represent all elements in a given cluster if a large number are selected from the cluster and if all the elements in the cluster are very much alike.

With a given total sample size, however, if the number of clusters is increased, the number of elements within a cluster must be decreased, and vice versa. In the first case, the representativeness of the clusters is increased at the expense of more poorly representing the elements composing each cluster. Fortunately, homogeneity can be used to ease this dilemma.

Typically, the elements composing a given natural cluster within a population are more homogeneous than are all elements composing the total population. The members of a given church are more alike than are all church members; the residents of a given city block are more alike than are the residents of a whole city. As a result, relatively few elements may be needed to represent a given natural cluster adequately, although a larger number of clusters may be needed to represent adequately the diversity found among the clusters. This fact is most clearly seen in the extreme case of very different clusters composed of identical elements within each. In such a situation, a large number of clusters would adequately represent all its members. Although this extreme situation never exists in reality, it’s closer to the truth in most cases than its opposite: identical clusters composed of grossly divergent elements.

The general guideline for cluster design, then, is to maximize the number of clusters selected while decreasing the number of elements within each cluster. However, this scientific guideline must be balanced against an administrative constraint. The efficiency of cluster sampling is based on the ability to minimize the listing of population elements. By initially selecting clusters, you need only list the elements composing the selected clusters, not all elements in the entire population. Increasing the number of clusters, however, goes directly against this efficiency factor. A small number of clusters may be listed more quickly and more cheaply than a large number. (Remember that all the elements in a selected cluster must be listed even if only a few are to be chosen in the sample.)

The final sample design will reflect these two constraints. In effect, you’ll probably select as many clusters as you can afford. Lest this issue be left too open-ended at this point, here is one general guideline. Population researchers conventionally aim at the selection of 5 households per census block. If a total of 2,000 households are to be interviewed, you would aim at 400 blocks with 5 household interviews on each. Figure 7-13 presents a graphic overview of this process.

Before turning to other, more detailed procedures available to cluster sampling, let me reiterate that this method almost inevitably involves a loss of accuracy. The manner in which this appears, however, is somewhat complex. First, as noted earlier, a multistage sample design is subject to a sampling error at each stage. Because the sample size is necessarily smaller at each stage than the total sample size, the sampling error at each stage will be greater than would be the case for a single-stage random sample of elements. Second, sampling error is estimated on the basis of observed variance among the sample elements. When those elements are drawn from among relatively homogeneous clusters, the estimated sampling error will be too optimistic and must be corrected in the light of the cluster sample design.
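A small simulation can make the guideline concrete. It is a sketch under invented assumptions, not a result from any study: the city has 1,000 blocks of 100 households, blocks differ widely in their average on some measured variable, and households within a block are much alike. Two designs with the same total of 2,000 interviews are compared: 20 blocks with 100 households each, and 400 blocks with 5 households each.

```python
import random
import statistics

def build_city(n_blocks=1000, households_per_block=100, seed=0):
    """Hypothetical city: block averages vary a lot, households within a block vary little."""
    rng = random.Random(seed)
    city = []
    for _ in range(n_blocks):
        block_mean = rng.gauss(50, 10)
        city.append([rng.gauss(block_mean, 2) for _ in range(households_per_block)])
    return city

def cluster_sample_mean(city, n_blocks, per_block, rng):
    """Mean of one two-stage sample: random blocks, then random households in each."""
    blocks = rng.sample(city, n_blocks)
    values = [v for block in blocks for v in rng.sample(block, per_block)]
    return statistics.fmean(values)

def spread_of_estimates(city, n_blocks, per_block, repeats=200, seed=1):
    """Standard deviation of the sample mean over repeated samples."""
    rng = random.Random(seed)
    means = [cluster_sample_mean(city, n_blocks, per_block, rng) for _ in range(repeats)]
    return statistics.pstdev(means)

city = build_city()
few_large = spread_of_estimates(city, n_blocks=20, per_block=100)   # 20 x 100 = 2,000
many_small = spread_of_estimates(city, n_blocks=400, per_block=5)   # 400 x 5 = 2,000
print(round(few_large, 2), round(many_small, 2))
```

Under these assumptions the estimates from the 400-block design vary far less from sample to sample than those from the 20-block design, even though both use 2,000 interviews. The simulation leaves out the administrative side of the trade-off: the second design requires listing 400 blocks rather than 20.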

Stage One: Identify blocks and select a sample. (Selected blocks are shaded.)

Street map (image not reproduced): a grid of blocks bounded by 1st St. through 5th St. and by Rosemary Ave., Robinson Ave., Parsley Ave., Thyme Ave., Bridge Ave., Boxer Ave., and Sage Ave.

Stage Two: Go to each selected block and list all households in order. (Example of one listed block.)

1. 491 Rosemary Ave.
2. 487 Rosemary Ave.
3. 473 Rosemary Ave.
4. 455 Rosemary Ave.
5. 437 Rosemary Ave.
6. 423 Rosemary Ave.
7. 411 Rosemary Ave.
8. 403 Rosemary Ave.
9. 1101 4th St.
10. 1123 4th St.
11. 1137 4th St.
12. 1157 4th St.
13. 1169 4th St.
14. 1187 4th St.
15. 402 Thyme Ave.
16. 408 Thyme Ave.
17. 424 Thyme Ave.
18. 446 Thyme Ave.
19. 458 Thyme Ave.
20. 480 Thyme Ave.
21. 498 Thyme Ave.
22. 1186 5th St.
23. 1174 5th St.
24. 1160 5th St.
25. 1140 5th St.
26. 1122 5th St.
27. 1118 5th St.
28. 1116 5th St.
29. 1104 5th St.
30. 1102 5th St.

Stage Three: For each list, select sample of households. (In this example, every sixth household has been selected starting with #5, which was selected at random.)

FIGURE 7-13 Multistage Cluster Sampling. In multistage cluster sampling, we begin by selecting a
sample of the clusters (in this case, city blocks). Then, we make a list of the elements (households, in this
case) and select a sample of elements from each of the selected clusters.

Stratification in Multistage Cluster Sampling

Thus far, we’ve looked at cluster sampling as though a simple random sample were selected at each stage of the design. In fact, stratification techniques can be used to refine and improve the sample being selected.

The basic options here are essentially the same as those in single-stage sampling from a list. In selecting a national sample of churches, for example, you might initially stratify your list of churches by denomination, geographic region, size, rural or urban location, and perhaps by some measure of social class.

Once the primary sampling units (churches, blocks) have been grouped according to the relevant, available stratification variables, either simple random or systematic sampling techniques can be used to select the sample. You might select a specified number of units from each group, or stratum, or you might arrange the stratified clusters in a continuous list and systematically sample that list.

To the extent that clusters are combined into homogeneous strata, the sampling error at this stage will be reduced. The primary goal of stratification, as before, is homogeneity.

Stratification could, of course, take place at each level of sampling. The elements listed within a selected cluster might be stratified before the next stage of sampling. Typically, however, this is not done. (Recall the assumption of relative homogeneity within clusters.)

Probability Proportionate to Size (PPS) Sampling

This section introduces you to a more sophisticated form of cluster sampling, one that is used in many large-scale survey sampling projects. In the preceding discussion, I talked about selecting a random or systematic sample of clusters and then a random or systematic sample of elements within each cluster selected. Notice that this produces an overall sampling scheme in which every element in the whole population has the same probability of selection.

Let’s say we’re selecting households within a city. If there are 1,000 city blocks and we initially select a sample of 100, that means that each block has a 100/1,000 or 0.1 chance of being selected. If we next select 1 household in 10 from those residing on the selected blocks, each household has a 0.1 chance of selection within its block. To calculate the overall probability of a household being selected, we simply multiply the probabilities at the individual steps in sampling. That is, each household has a 1/10 chance of its block being selected and a 1/10 chance of that specific household being selected if the block is one of those chosen. Each household, in this case, has a 1/10 × 1/10 = 1/100 chance of selection overall. Because each household would have the same chance of selection, the sample so selected should be representative of all households in the city.

There are dangers in this procedure, however. In particular, the variation in the size of blocks (measured in numbers of households) presents a problem. Let’s suppose that half the city’s population resides in 10 densely packed blocks filled with high-rise apartment buildings, and suppose that the rest of the population lives in single-family dwellings spread out over the remaining 990 blocks. When we first select our sample of 1/10 of the blocks, it’s quite possible that we’ll miss all of the 10 densely packed high-rise blocks. No matter what happens in the second stage of sampling, our final sample of households will be grossly unrepresentative of the city, comprising only single-family dwellings.

Whenever the clusters sampled are of greatly differing sizes, it’s appropriate to use a modified sampling design called PPS (probability proportionate to size). This design guards against the problem I’ve just described and still produces a final sample in which each element has the same chance of selection.

As the name suggests, each cluster is given a chance of selection proportionate to its size. Thus, a city block with 200 households has twice the chance of selection as one with only 100 households. Within each cluster, however, a fixed number of elements is selected, say, 5 households per block. Notice how this procedure results in each household having the same probability of selection overall.

PPS (probability proportionate to size) This refers to a type of multistage cluster sample in which clusters are selected, not with equal probabilities (see EPSEM) but with probabilities proportionate to their sizes, as measured by the number of units to be subsampled.
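The claim that PPS gives every household the same overall chance can be checked with exact fractions. The sketch below rests on assumptions not found in the text: a city of 100,000 households, 400 blocks drawn, and a block's first-stage chance taken as the number of blocks drawn times its share of all households (which only makes sense while that product stays at or below 1). The function name is hypothetical, and the mechanics of actually drawing blocks with unequal probabilities are not covered.

```python
from fractions import Fraction

def pps_overall_probability(block_size, total_households, n_blocks_drawn, n_per_block):
    """Overall chance that one household on a block of `block_size` is selected."""
    # Stage one: the block's chance is proportionate to its size (assumed <= 1).
    p_block = Fraction(n_blocks_drawn * block_size, total_households)
    # Stage two: a fixed number of households is taken from each selected block.
    p_within = Fraction(n_per_block, block_size)
    return p_block * p_within

TOTAL_HOUSEHOLDS = 100_000   # assumed city size
BLOCKS_DRAWN = 400           # assumed number of blocks selected
PER_BLOCK = 5                # households per selected block, as in the text

overall = {
    size: pps_overall_probability(size, TOTAL_HOUSEHOLDS, BLOCKS_DRAWN, PER_BLOCK)
    for size in (50, 100, 200)
}
print(overall)

# The equal-probability design described earlier: 1/10 of blocks, then 1/10 of households.
equal_design = Fraction(1, 10) * Fraction(1, 10)
print(equal_design)
```

The block size cancels out of the product, so households on 50-, 100-, and 200-household blocks all end up with the same overall chance (1/50 under these assumed figures). A large block's better chance of being picked is offset exactly by each household's smaller chance within it.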

Let’s look at households of two different city blocks. Block A has 100 households, whereas Block B has only 10. In PPS sampling, we would give Block A ten times as good a chance of being selected as Block B. So if, in the overall sample design, Block A has a 1/20 chance of being selected, that means Block B would only have a 1/200 chance. Notice that this means that all the households on Block A would have a 1/20 chance of having their block selected; Block B households have only a 1/200 chance.

If Block A is selected and we’re taking 5 households from each selected block, then the households on Block A have a 5/100 chance of being selected into the block’s sample. Because we can multiply probabilities in a case like this, we see that every household on Block A had an overall chance of selection equal to 1/20 × 5/100 = 5/2000 = 1/400.

If Block B happens to be selected, on the other hand, its households stand a much better chance of being among the 5 chosen there: 5/10. When this is combined with their relatively poorer chance of having their block selected in the first place, however, they end up with the same chance of selection as those on Block A: 1/200 × 5/10 = 5/2000 = 1/400.

Further refinements to this design make it a very efficient and effective method for selecting large cluster samples. For now, however, it’s enough to understand the basic logic involved.

Disproportionate Sampling and Weighting

Ultimately, a probability sample is representative of a population if all elements in the population have an equal chance of selection in that sample. Thus, in each of the preceding discussions, we’ve noted that the various sampling procedures result in an equal chance of selection, even though the ultimate selection probability is the product of several partial probabilities.

More generally, however, a probability sample is one in which each population element has a known nonzero probability of selection, even though different elements may have different probabilities. If controlled probability-sampling procedures have been used, any such sample may be representative of the population from which it is drawn if each sample element is assigned a weight equal to the inverse of its probability of selection. Thus, where all sample elements have had the same chance of selection, each is given the same weight: 1. This is called a self-weighting sample.

Sometimes it’s appropriate to give some cases more weight than others, a process called weighting. Disproportionate sampling and weighting come into play in two basic ways. First, you may sample subpopulations disproportionately to ensure sufficient numbers of cases from each for analysis. For example, a given city may have a suburban area containing one-fourth of its total population. Yet you might be especially interested in a detailed analysis of households in that area and may feel that one-fourth of this total sample size would be too few. As a result, you might decide to select the same number of households from the suburban area as from the remainder of the city. Households in the suburban area, then, are given a disproportionately better chance of selection than those located elsewhere in the city.

weighting Assigning different weights to cases that were selected into a sample with different probabilities of selection. In the simplest scenario, each case is given a weight equal to the inverse of its probability of selection. When all cases have the same chance of selection, no weighting is necessary.

As long as you analyze the two area samples separately or comparatively, you need not worry about the differential sampling. If you want to combine the two samples to create a composite picture of the entire city, however, you must take the disproportionate sampling into account. If n is the number of households selected from each area, then the households in the suburban area had a chance of selection equal to n divided by one-fourth of the total city population. Because the total city population and the sample size are the same for both areas, the suburban-area households should be given a weight of 1/4n, and the
remaining households should be given a weight of 3/4n. This weighting procedure could be simplified by merely giving a weight of 3 to each of the households selected outside the suburban area. (This procedure gives a proportionate representation to each sample element. The population figure would have to be included in the weighting if population estimates were desired.)

Here’s an example of the problems that can be created when disproportionate sampling is not accompanied by a weighting scheme. When the Harvard Business Review decided to survey its subscribers on the issue of sexual harassment at work, it seemed appropriate to oversample women because female subscribers were vastly outnumbered by male subscribers. Here’s how G. C. Collins and Timothy Blodgett explained the matter:

We also skewed the sample another way: to ensure a representative response from women, we mailed a questionnaire to virtually every female subscriber, for a male/female ratio of 68% to 32%. This bias resulted in a response of 52% male and 44% female (and 4% who gave no indication of gender), compared to HBR’s U.S. subscriber proportion of 93% male and 7% female. (1981:78)

Notice a couple of things in this quotation. First, it would be nice to know a little more about what “virtually every female” means. Evidently, the authors of the study didn’t send questionnaires to all female subscribers, but there’s no indication of who was omitted and why. Second, they didn’t use the term representative in its normal social science usage. What they mean, of course, is that they wanted to get a substantial or “large enough” response from women, and oversampling is a perfectly acceptable way of accomplishing that.

By sampling more women than a straightforward probability sample would have produced, the authors were able to “select” enough women (812) to compare with the men (960). Thus, when they report, for example, that 32 percent of the women and 66 percent of the men agree that “the amount of sexual harassment at work is greatly exaggerated,” we know that the female response is based on a substantial number of cases. That’s good. There are problems, however.

To begin with, subscriber surveys are always problematic. In this case, the best the researchers can hope to talk about is “what subscribers to Harvard Business Review think.” In a loose way, it might make sense to think of that population as representing the more sophisticated portion of corporate management. Unfortunately, the overall response rate was 25 percent. Although that’s quite good for subscriber surveys, it’s a low response rate in terms of generalizing from probability samples.

Beyond that, however, the disproportionate sample design creates a further problem. When the authors state that 73 percent of respondents favor company policies against harassment (Collins and Blodgett, 1981:78), that figure is undoubtedly too high, because the sample contains a disproportionately high percentage of women, who are more likely to favor such policies. Further, when the researchers report that top managers are more likely to feel that claims of sexual harassment are exaggerated than are middle- and lower-level managers (1981:81), that finding is also suspect. As the researchers report, women are disproportionately represented in lower management. That alone might account for the apparent differences among levels of management regarding harassment. In short, the failure to take account of the oversampling of women confounds all survey results that do not separate the findings by gender. The solution to this problem would have been to weight the responses by gender, as described earlier in this section.

In the 2000 and 2004 election campaign polls, survey weighting became a controversial topic, as some polling agencies weighted their results on the basis of party affiliation and other variables, whereas others did not. Weighting in this instance involved assumptions regarding the differential participation of Republicans and Democrats in opinion polls and on election day, plus a determination of how many Republicans and Democrats there were. This will likely remain a topic of debate among pollsters and politicians in the years to come.
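To see how much the missing weights could matter in the Harvard Business Review example, here is a minimal sketch that reweights one reported result by gender. The respondent counts (812 women, 960 men), the agreement rates (32 and 66 percent), and the subscriber proportions (7 and 93 percent) are taken from the text. Everything else is an assumption made for illustration: respondents who gave no gender are ignored, nonresponse is not modeled, and each weight is simply the population share divided by the sample share.

```python
# Respondent counts and agreement rates as reported in the text.
respondents = {"women": 812, "men": 960}
agree_rate = {"women": 0.32, "men": 0.66}

# HBR's U.S. subscriber proportions, as quoted from Collins and Blodgett.
population_share = {"women": 0.07, "men": 0.93}


def unweighted_estimate(counts, rates):
    """Pooled proportion agreeing, treating every respondent alike."""
    total = sum(counts.values())
    return sum(counts[g] * rates[g] for g in counts) / total


def weighted_estimate(counts, rates, shares):
    """Pooled proportion after weighting each group by
    population share / sample share (an illustrative weighting scheme)."""
    total = sum(counts.values())
    weights = {g: shares[g] / (counts[g] / total) for g in counts}
    numerator = sum(weights[g] * counts[g] * rates[g] for g in counts)
    denominator = sum(weights[g] * counts[g] for g in counts)
    return numerator / denominator


unweighted = unweighted_estimate(respondents, agree_rate)
weighted = weighted_estimate(respondents, agree_rate, population_share)

print(round(unweighted, 3))  # 0.504
print(round(weighted, 3))    # 0.636
```

Under these assumptions, about half of the pooled respondents agree that harassment is "greatly exaggerated," whereas the weighted figure is closer to two-thirds. The gap shows why unweighted totals from an oversampled design can mislead.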
Alan Reifman has created a website devoted to a discussion of weighting: http://www.hs.ttu.edu/hdfs3390/weighting.htm.

PROBABILITY SAMPLING IN REVIEW

Much of this chapter has been devoted to the key sampling method used in controlled survey research: probability sampling. In each of the variations examined, we’ve seen that elements are chosen for study from a population on a basis of random selection with known nonzero probabilities.

Depending on the field situation, probability sampling can be either very simple or extremely difficult, time consuming, and expensive. Whatever the situation, however, it remains the most effective method for the selection of study elements. There are two reasons for that.

First, probability sampling avoids researchers’ conscious or unconscious biases in element selection. If all elements in the population have an equal (or unequal and subsequently weighted) chance of selection, there is an excellent chance that the sample so selected will closely represent the population of all elements.

Second, probability sampling permits estimates of sampling error. Although no probability sample will be perfectly representative in all respects, controlled selection methods permit the researcher to estimate the degree of expected error.

In this lengthy chapter, we’ve taken on a basic issue in much social research: selecting observations that will tell us something more general than the specifics we’ve actually observed. This issue confronts field researchers, who face more action and more actors than they can observe and record fully, as well as political pollsters who want to predict an election but can’t interview all voters. As we proceed through the book, we’ll see in greater detail how social researchers have found ways to deal with this issue.

THE ETHICS OF SAMPLING

The key purpose of the sampling techniques discussed in this chapter is to allow researchers to make relatively few observations but gain an accurate picture of a large population. Quantitative studies using probability sampling should result in a statistical profile, based on the sample, that closely mirrors the profile that would have been gained from observing the whole population. In addition to using legitimate sampling techniques, researchers should be careful to point out the possibility of errors: sampling error, flaws in the sampling frame, nonresponse error, or anything else that might make the results misleading.

Sometimes, more typically in qualitative studies, the purpose of sampling may be to tap into the breadth of variation within a population rather than focusing on the “average” or “typical” member of that population. Although this is a legitimate and valuable approach, readers may mistake the display of differences for the distribution of characteristics in the population. As such, the researcher should ensure that the reader is not misled.

WHAT DO YOU THINK? REVISITED

Contrary to common sense, we have seen that the number of people selected in a sample, while important, is less important than how people are selected. The Literary Digest mailed ballots to ten million people and received two million from voters around the country. However, the people they selected for their enormous sample (auto owners and telephone subscribers) were not representative of the population in 1936, in the aftermath of the Great Depression. Overall, the sample was wealthier than was the voting population at large. Because rich people are more likely than the general public to vote Republican, the Literary Digest tallied the voting intentions of a disproportionate number of Republicans.

The probability-sampling techniques used today allow researchers to select smaller, more representative samples. Even a couple of thousand respondents, properly selected, can accurately predict the behavior of a hundred million voters.
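The claim that a couple of thousand properly selected respondents can stand in for a very large electorate can be illustrated with the usual margin-of-error formula for a proportion. This is a rough sketch under stated assumptions: simple random sampling, a 95 percent confidence level (z = 1.96), the most conservative case of an evenly split population (p = 0.5), and no finite-population correction. The function name is hypothetical.

```python
import math


def margin_of_error(p, n, z=1.96):
    """Approximate margin of error for a sample proportion p from a simple
    random sample of size n, at the confidence level implied by z."""
    return z * math.sqrt(p * (1 - p) / n)


# A modern national poll of about 2,000 respondents.
print(round(margin_of_error(0.5, 2000), 3))       # 0.022

# A sample the size of the Literary Digest's two million returns.
print(round(margin_of_error(0.5, 2_000_000), 4))  # 0.0007
```

Population size does not appear in the formula, which is why roughly 2,000 respondents can yield an estimate within about two percentage points. The second figure is a reminder that a tiny sampling error offers no protection against a biased selection procedure such as the one the Literary Digest used.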

Main Points

Introduction
❏ Social researchers must select observations that will allow them to generalize to people and events not observed. Often this involves sampling, a selection of people to observe.
❏ Understanding the logic of sampling is essential to doing social research.

A Brief History of Sampling
❏ Sometimes you can and should select probability samples using precise statistical techniques, but at other times nonprobability techniques are more appropriate.

Nonprobability Sampling
❏ Nonprobability-sampling techniques include reliance on available subjects, purposive (judgmental) sampling, snowball sampling, and quota sampling. In addition, researchers studying a social group may make use of informants. Each of these techniques has its uses, but none of them ensures that the resulting sample is representative of the population being sampled.

The Theory and Logic of Probability Sampling
❏ Probability-sampling methods provide an excellent way of selecting representative samples from large, known populations. These methods counter the problems of conscious and unconscious sampling bias by giving each element in the population a known (nonzero) probability of selection.
❏ The key to probability sampling is random selection.
❏ The most carefully selected sample will never provide a perfect representation of the population from which it was selected. There will always be some degree of sampling error.
❏ By predicting the distribution of samples with respect to the target parameter, probability-sampling methods make it possible to estimate the amount of sampling error expected in a given sample.
❏ The expected error in a sample is expressed in terms of confidence levels and confidence intervals.

Populations and Sampling Frames
❏ A sampling frame is a list or quasi list of the members of a population. It is the resource used in the selection of a sample. A sample’s representativeness depends directly on the extent to which a sampling frame contains all the members of the total population that the sample is intended to represent.

Types of Sampling Designs
❏ Several sampling designs are available to researchers.
❏ Simple random sampling is logically the most fundamental technique in probability sampling, but it is seldom used in practice.
❏ Systematic sampling involves the selection of every kth member from a sampling frame. This method is more practical than simple random sampling and, with a few exceptions, is functionally equivalent.
❏ Stratification, the process of grouping the members of a population into relatively homogeneous strata before sampling, improves the
representativeness of a sample by reducing the degree of sampling error.

Multistage Cluster Sampling
❏ Multistage cluster sampling is a relatively complex sampling technique that is frequently used when a list of all the members of a population does not exist. Typically, researchers must balance the number of clusters and the size of each cluster to achieve a given sample size. Stratification can be used to reduce the sampling error involved in multistage cluster sampling.
❏ Probability proportionate to size (PPS) is a special, efficient method for multistage cluster sampling.
❏ If the members of a population have unequal probabilities of selection into the sample, researchers must assign weights to the different observations made, in order to provide a representative picture of the total population. Basically, the weight assigned to a particular sample member should be the inverse of its probability of selection.

Probability Sampling in Review
❏ Probability sampling remains the most effective method for the selection of study elements for two reasons: It allows researchers to avoid biases in element selection and it permits estimates of error.

The Ethics of Sampling
❏ Probability sampling always carries a risk of error; researchers must inform readers of any errors that might make results misleading.
❏ When nonprobability-sampling methods are used to obtain the breadth of variations in a population, researchers must take care not to mislead readers into confusing variations with what’s typical in the population.

Key Terms

cluster sampling
confidence interval
confidence level
element
EPSEM (equal probability of selection method)
informant
nonprobability sampling
parameter
population
PPS (probability proportionate to size)
probability sampling
purposive sampling
quota sampling
random selection
representativeness
sampling error
sampling frame
sampling interval
sampling ratio
sampling unit
simple random sampling
snowball sampling
statistic
stratification
study population
systematic sampling
weighting

Review Questions

1. Review the discussion of the 1948 Gallup poll that predicted that Thomas Dewey would defeat Harry Truman for president. What are some ways Gallup could have modified his quota-sample design to avoid the error?
2. Using Appendix B of this book, select a simple random sample of 10 numbers in the range from 1 to 9,876. What is each step in the process?
3. What are the steps involved in selecting a multistage cluster sample of students taking first-year English in U.S. colleges and universities?
4. In Chapter 9, we’ll discuss surveys conducted on the Internet. Can you anticipate possible problems concerning sampling frames, representativeness, and the like? Do you see any solutions?

Online Study Resources

Go to http://sociology.wadsworth.com/babbie_basics4e and click on ThomsonNow for access to this powerful online study tool. You will get a personalized study plan based on your responses to a diagnostic pretest. Once you have mastered the material with the help of interactive learning tools, you can take a posttest to confirm that you are ready to move on to the next chapter.
Website for The Basics of Social Research, 4th edition

At the book companion website (http://sociology.wadsworth.com/babbie_basics4e) you will find many resources in addition to ThomsonNow to aid you in studying for your exams. For example, you will find Tutorial Quizzes with feedback, Internet Exercises, Flashcards, and Chapter Tutorials, as well as Extended Projects, InfoTrac College Edition search terms, Social Research in Cyberspace, GSS Data, Web Links, and primers for using various data analysis software such as SPSS and NVivo.

Additional Readings

Frankfort-Nachmias, Chava, and Anna Leon-Guerrero. 1997. Social Statistics for a Diverse Society. Thousand Oaks, CA: Pine Forge Press. See Chapter 11 especially. This statistics textbook covers many of the topics we’ve discussed in this chapter but in a more statistical context. It demonstrates the links between probability sampling and statistical analyses.

Kalton, Graham. 1983. Introduction to Survey Sampling. Thousand Oaks, CA: Sage. Kalton goes into more of the mathematical details of sampling than the present chapter does but without attempting to be as definitive as Kish, described next.

Kish, Leslie. 1965. Survey Sampling. New York: Wiley. Unquestionably the definitive work on sampling in social research. Kish’s coverage ranges from the simplest matters to the most complex and mathematical, both highly theoretical and downright practical. Easily readable and difficult passages intermingle as Kish exhausts everything you could want or need to know about each aspect of sampling.

Sudman, Seymour. 1983. “Applied Sampling.” Pp. 145–94 in Handbook of Survey Research, edited by Peter H. Rossi, James D. Wright, and Andy B. Anderson. New York: Academic Press. An excellent, practical guide to survey sampling.
