Chapter Three

Sampling and Sample Size Determination

Learning Objectives
At the end of this lesson, students will be able to:
• Define the terms population, sample, sampling frame, and sampling error,
• Identify sampling techniques for qualitative research,
• Identify sampling techniques for quantitative research, and
• Determine a representative sample for their research topic.

Key Terminology

Sampling is the selection of a given number of units of analysis (people, households, firms, etc.), called cases,
from a population of interest. Generally, the sample (of size n) is chosen so as to reproduce, on a small scale,
some characteristics of the whole population (of size N). Sampling is a key issue in social research designs. The
advantages of sampling are evident:
• Feasibility of the research,
• Lower costs and economy of time, and
• Better organization of the work.
However, there is an important problem to deal with, namely sampling error: a sample is a model of reality (like a
map, a doll, or an MP3 recording) and not reality itself. The sampling error measures this inevitable distance
between the model and reality. The smaller it is, the closer the estimates are to reality. Unfortunately, in some
cases the sampling error is unknowable.

Population vs. Sample
All the items under consideration in any field of inquiry constitute a ‘universe’ or ‘population’. A complete
enumeration of all the items in the ‘population’ is known as a census inquiry. It can be presumed that in such an
inquiry, when all the items are covered, no element of chance is left and the highest accuracy is obtained. In
practice, however, this may not be true. Even the slightest element of bias in such an inquiry will grow larger and
larger as the number of observations increases. Moreover, there is no way of checking the element of bias or its
extent except through a resurvey or the use of sample checks. Besides, this type of inquiry involves a great deal of
time, money, and energy. Furthermore, a census inquiry is not possible in practice under many circumstances. For
instance, blood testing can be done only on a sample basis. Hence, quite often we select only a few items from the
universe for our study purposes. The items so selected constitute what is technically called a sample.
The sample size of a survey most typically refers to the number of units that were chosen and from which data
were gathered. However, sample size can be defined in various ways. There is the designated sample size, which is
the number of sample units selected for contact or data collection. There is also the final sample size, which is the
number of completed interviews or units for which data are actually collected. The final sample size may be
much smaller than the designated sample size if there is considerable non-response, ineligibility, or both.
Conversely, if interviews are completed much more productively than anticipated, not all the units in the
designated sample may need to be processed to achieve the final sample size. However, this assumes that units
have been activated from the designated sample in a random fashion. Survey researchers may also be interested in
the sample size for subgroups of the full sample.
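The link between the designated and the final sample size can be illustrated with a little arithmetic. The sketch below is a minimal illustration, assuming (hypothetically) that the eligibility rate and the response rate are known in advance and act independently; the function names and the rates used are illustrative only.

```python
import math

def expected_completes(designated, eligibility_rate, response_rate):
    """Expected final sample size: designated units that prove eligible and respond.
    Assumes eligibility and response are independent (an illustrative assumption)."""
    return designated * eligibility_rate * response_rate

def designated_size_needed(final_target, eligibility_rate, response_rate):
    """Smallest designated sample expected to yield final_target completed cases."""
    return math.ceil(final_target / (eligibility_rate * response_rate))

# Hypothetical rates: 90% of units eligible, 60% of eligible units respond.
needed = designated_size_needed(400, eligibility_rate=0.9, response_rate=0.6)
print(needed)                                          # 741
print(round(expected_completes(needed, 0.9, 0.6), 2))  # 400.14
```

In this example, obtaining about 400 completed interviews requires designating roughly 741 units, which shows how non-response and ineligibility open a gap between the two sample sizes.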
Sampling Fraction
A sampling fraction, denoted by $f$, is the proportion of a universe that is selected for a sample. The sampling
fraction is important for survey estimation because, in sampling without replacement, the variance of sample
estimates is reduced by a factor of $(1 - f)$, called the finite population correction or adjustment.
In a simple survey design, if a sample of $n$ units is selected with equal probability from a universe of $N$ units,
the sampling fraction is defined as $f = n/N$. In this case, the sampling fraction equals the probability of selection.
In the case of systematic sampling, $f = 1/k$, where $k$ is the sampling interval. The sampling fraction can also be
computed for stratified and multi-stage samples. In a stratified (single-stage) sample, a separate sampling fraction
$f_h$ is computed for each stratum $h$: $f_h = n_h/N_h$, where $n_h$ is the sample size for stratum $h$ and $N_h$ is
the number of units (in the universe of $N$) that belong to stratum $h$. Because many samples use stratification to
facilitate oversampling, the probabilities of selection may differ among strata, in which case the $f_h$ values will
not be equal. For multi-stage samples, the sampling fraction can be computed at each stage, assuming sampling is
with equal probability within the stage.
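The quantities defined above can be computed directly. The following is a minimal sketch, assuming a simple random sample without replacement and the simple variance approximation $p(1-p)/n$ for a proportion; the stratum names and counts are hypothetical.

```python
import math

def sampling_fraction(n, N):
    """f = n / N: the share of the universe selected (equal-probability design)."""
    return n / N

def se_proportion(p, n, N=None):
    """Approximate standard error of a proportion, p(1 - p)/n, multiplied by the
    finite population correction (1 - f) when the universe size N is supplied."""
    variance = p * (1 - p) / n
    if N is not None:
        variance *= 1 - sampling_fraction(n, N)
    return math.sqrt(variance)

def stratum_fractions(n_h, N_h):
    """f_h = n_h / N_h for every stratum h."""
    return {h: n_h[h] / N_h[h] for h in N_h}

print(sampling_fraction(100, 1_000))               # 0.1
print(round(se_proportion(0.5, 100), 4))           # 0.05
print(round(se_proportion(0.5, 100, N=1_000), 4))  # 0.0474
# Hypothetical strata: the urban stratum is oversampled, so the f_h values differ.
print(stratum_fractions({"urban": 60, "rural": 40}, {"urban": 400, "rural": 600}))
```

With $f = 0.1$, the correction shrinks the standard error only slightly; it matters more as $f$ grows.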
Sampling Frame
A survey may be a census of the universe (the study population) or may be conducted with a sample that
represents the universe. Either a census or a sample survey requires a sampling frame. For a census, the frame
will consist of a list of all the known units in the universe, and each unit will need to be surveyed. For a sample
survey, the sampling frame is a list of the target population from which the sample is selected. Major
categories of sampling frames are area frames for in-person interviews, random-digit dialing (RDD) frames for
telephone survey samples, and a variety of lists used for all types of surveys. Few lists that are used as sampling
frames were created specifically for that use; commercially available RDD frames are an exception.

Sampling Interval
In systematic sampling, the length of the run of consecutive integers (unit labels in the frame) from which each
selection is made is commonly referred to as the sampling interval. If the size of the population or universe is $N$
and $n$ is the size of the sample, then the smallest integer at least as large as $N/n$ is called the sampling interval
(often denoted by $k$). Used in conjunction with systematic sampling, the sampling interval partitions the universe
into $n$ zones, or strata, each consisting of $k$ units. In general, systematic sampling is operationalized by selecting
a random start $r$ between 1 and the sampling interval $k$. This random start and every subsequent $k$th integer
are then included in the sample (i.e., $r, r+k, r+2k$, etc.), creating $k$ possible cluster samples, each containing $n$
population units. The probability of selecting any one population unit, and consequently the probability of
selecting any one of the $k$ cluster samples, is $1/k$. The sampling interval and its role in the systematic sample
selection process are illustrated in Figure 1.

For example, suppose that 100 households are to be selected for interviews within a neighborhood containing
1,000 households (labeled 1, 2, 3, . . . , 1,000 for reference). Then the sampling interval, $k = 1000/100 = 10$,
partitions the population of 1,000 households into 100 strata, each having $k = 10$ households. A random start
of 1 would then yield the cluster sample of households {1, 11, 21, 31, 41, . . ., 971, 981, 991} under systematic
random sampling.
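The selection rule in the household example can be sketched in a few lines. This is a minimal illustration, assuming the frame is labeled 1 to N and that k is N/n rounded up, as described above; when N is not a multiple of n, some random starts yield slightly fewer than n units.

```python
import math
import random

def systematic_sample(N, n, start=None, rng=None):
    """Select labels r, r+k, r+2k, ... from a frame labeled 1..N."""
    k = math.ceil(N / n)                # sampling interval
    if start is None:
        rng = rng or random.Random()
        start = rng.randint(1, k)       # random start between 1 and k (inclusive)
    return list(range(start, N + 1, k))

# Reproduces the worked example: random start 1, k = 1000/100 = 10.
households = systematic_sample(1_000, 100, start=1)
print(households[:4], households[-1], len(households))  # [1, 11, 21, 31] 991 100
```

Passing no `start` draws the random start, so each of the k possible cluster samples is chosen with probability 1/k.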

Sample Design
The researcher must decide how to select a sample, which is popularly known as the sample design. In other
words, a sample design is a definite plan, determined before any data are actually collected, for obtaining a
sample from a given population. Thus, the plan to select 120 of a city’s 2,000 private banks in a certain way
constitutes a sample design. In planning to conduct a survey, the survey researcher must decide on the sample
design. Sample size is one aspect of the sample design. It is inversely related to the variance, or standard error, of
survey estimates and is a determining factor in the cost of the survey. In the simplest situation, the variance is a
direct function of the sample size. For example, if a researcher is taking a simple random sample and is interested
in estimating a proportion $p$, then:
$\mathrm{Var}(\hat{p}) = p(1-p)/n$, where $n$ is the sample size.
More generally, $\mathrm{Var}(\hat{p}) = \mathrm{deff} \cdot p(1-p)/n$, where $\mathrm{deff}$ is the design effect, which reflects the
effect of the sample design and weighting on the variance.
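The inverse relationship between sample size and variance can be checked numerically. This minimal sketch evaluates the variance formula for a proportion, with the design effect written as `deff` (deff = 1 corresponds to simple random sampling); the values of p, n, and deff are illustrative.

```python
import math

def variance_of_proportion(p, n, deff=1.0):
    """Var = deff * p(1 - p) / n; deff = 1 corresponds to simple random sampling."""
    return deff * p * (1 - p) / n

def standard_error(p, n, deff=1.0):
    return math.sqrt(variance_of_proportion(p, n, deff))

# Quadrupling n halves the standard error.
for n in (100, 400, 1_600):
    print(n, round(standard_error(0.5, n), 4))
# 100 0.05
# 400 0.025
# 1600 0.0125

# A design effect of 2 (e.g. from clustering) inflates the variance twofold.
print(round(standard_error(0.5, 400, deff=2.0), 4))  # 0.0354
```

Because the standard error falls only with the square root of n, precision gains become expensive as the sample grows.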

There are two main families of sampling methods: probability (random) sampling and non-probability sampling,
typical of (but not exclusive to) quantitative and qualitative research, respectively. Probability samples are
considered representative of reality: what can be said about the sample can be extended, by statistical inference,
to the reality of what is sampled. Another advantage is that the sampling error, which is a crucial datum for
assessing the validity of the sample, is calculable; this is possible only for probability samples. The main problem,
however, is that to extract the sample researchers need the complete list of the target population (i.e., the sampling
frame), though sometimes the exact size of the population is sufficient, and such a list is often impossible to obtain
(e.g., when a researcher wants to study the audience of a movie). With probability samples, each element has a
known probability of being included in the sample, whereas non-probability samples do not allow the researcher to
determine this probability. Probability samples are those based on simple random sampling, systematic sampling,
stratified sampling, and cluster/area sampling, whereas non-probability samples are those based on convenience
sampling, judgment sampling, and quota sampling techniques.
In most cases, the size of a probability sample is determined by the following formula:

$$n = \frac{z^2 \, pq \, N}{E^2 (N - 1) + z^2 \, pq}$$

where:
• $z$ is the standard normal value for the chosen confidence level (cl) of the estimate (often fixed at 1.96, corresponding to a 95% cl),
• $pq$ is the variance, with $q = 1 - p$ (it is unknown and is therefore fixed at its maximum value, 0.25),
• $N$ is the size of the population,
• $E$ is the sampling error, i.e., the acceptable margin of error (often ≤ 0.04), and
• $n$ is the required sample size.
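The formula can be evaluated directly. The sketch below is a minimal illustration using the defaults given in the text (z = 1.96, pq = 0.25, E = 0.04); rounding the result up to a whole unit is an assumption, though it is the usual practice.

```python
import math

def probability_sample_size(N, E=0.04, z=1.96, pq=0.25):
    """n = z^2 pq N / (E^2 (N - 1) + z^2 pq), rounded up to a whole unit."""
    numerator = z ** 2 * pq * N
    denominator = E ** 2 * (N - 1) + z ** 2 * pq
    return math.ceil(numerator / denominator)

print(probability_sample_size(1_000, E=0.05))  # 278
print(probability_sample_size(1_000))          # 376
print(probability_sample_size(1_000_000))      # 600
```

As N grows, the result approaches $z^2 pq / E^2$ (about 600 for E = 0.04), so beyond a certain point the required sample size barely depends on the population size.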

Non-probability samples, on the other hand, are generally purposive or theory-driven. This means they are
gathered following a criterion that the researcher believes will achieve typological representativeness. The
latter is achieved when the researcher has enough members of all the main categories of interest to be able to
describe with confidence their patterned similarities and differences.

Because they are purposive, non-probability samples are rather heterogeneous in design. Up to 16 different
qualitative sampling strategies are available for choosing a non-probability sample. It is almost impossible to give
an exhaustive list, because the set remains open to combinations and new solutions. However, quota, snowball,
purposive, theoretical, and accidental sampling are among the most common non-probability sampling techniques.
The main problem with non-probability samples is that the researcher has only loose criteria for assessing their
validity: the sampling error is unknowable, so researchers cannot say whether the results are representative, and
the risk of non-sampling errors is large.

Sampling in Qualitative Research (Non-probability Sampling)

1. Deliberate sampling: Deliberate sampling is also known as purposive or non-probability sampling. This
sampling method involves the purposive or deliberate selection of particular units of the universe to constitute a
sample that represents the universe.
2. Convenience sampling: When population elements are selected for inclusion in the sample based on ease of
access, the method is called convenience sampling. If a researcher wishes to secure data from, say, gasoline
buyers, he or she may select a fixed number of petrol stations and conduct interviews at these stations. This would
be an example of a convenience sample of gasoline buyers.
3. Judgment sampling: The convenience sampling procedure may give very biased results, particularly when the
population is not homogeneous. In judgment sampling, by contrast, the researcher’s judgment is used to select
items that he or she considers representative of the population. For example, a judgment sample of college
students might be taken to secure reactions to a new method of teaching. Judgment sampling is used quite
frequently in qualitative research, where the aim is to develop hypotheses rather than to generalize to larger
populations.
4. Purposive sampling: Purposive sampling, one of the most common sampling strategies,
groups participants according to preselected criteria relevant to a particular research question (for example,
HIV-positive women in Capital City). Sample sizes, which may or may not be fixed prior to data collection, depend
on the resources and time available, as well as the study’s objectives. Purposive sample sizes are often
determined on the basis of theoretical saturation (the point in data collection when new data no longer bring
additional insights to the research questions). Purposive sampling is therefore most successful when data review
and analysis are done in conjunction with data collection.

5. Quota sampling: Quota sampling, sometimes considered a type of purposive sampling, is also common. In
quota sampling, we decide while designing the study how many people with which characteristics to include as
participants. Characteristics might include age, place of residence, gender, class, profession, marital status, use of
a particular contraceptive method, HIV status, etc. The criteria we choose allow us to focus on people we think
would be most likely to experience, know about, or have insights into the research topic. Then we go into the
community and – using recruitment strategies appropriate to the location, culture, and study population – find
people who fit these criteria, until we meet the prescribed quotas. (See the section in this module on Recruitment
in Qualitative Research, page 6.)

6. Snowball sampling: Snowballing, also known as chain referral sampling, is also considered a type of purposive
sampling. In this method, participants or informants with whom contact has already been made use their social
networks to refer the researcher to other people who could potentially participate in or contribute to the study.
Snowball sampling is often used to find and recruit “hidden populations,” that is, groups not easily accessible to
researchers through other sampling strategies.
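The referral chain described above can be pictured as a walk through a social network. The following is a minimal sketch, assuming recruitment proceeds breadth-first (all of one wave's referrals before the next wave) and that the referral network is known; both the network and the breadth-first order are hypothetical simplifications of real fieldwork.

```python
from collections import deque

def snowball_sample(seeds, referrals, target_size):
    """Chain-referral recruitment, breadth-first from the initial contacts (seeds).
    referrals maps each person to the people he or she refers (hypothetical data)."""
    sample, seen = [], set(seeds)
    queue = deque(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)                 # recruit this person
        for referred in referrals.get(person, []):
            if referred not in seen:          # skip people already referred
                seen.add(referred)
                queue.append(referred)
    return sample

referrals = {"A": ["B", "C"], "B": ["D"], "C": ["D", "E"], "D": [], "E": ["F"]}
print(snowball_sample(["A"], referrals, target_size=4))  # ['A', 'B', 'C', 'D']
```

Who ends up in the sample depends entirely on the seeds and on who refers whom, which is why no probability of inclusion can be attached to any member.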

Sampling in Quantitative Research (Random Sampling)
1. Simple random sampling: This type of sampling is also known as chance sampling or probability sampling.
Each and every item in the population has an equal chance of inclusion in the sample and, in the case of a finite
universe, each of the possible samples has the same probability of being selected. For example, if we have to
select a sample of 300 items from a universe of 15,000 items, we can put the names or numbers of all 15,000
items on slips of paper and conduct a lottery. Using random number tables is another method of random
sampling. To select the sample, each item is assigned a number from 1 to 15,000. Then, 300 five-digit random
numbers are selected from the table. To do this, we select some random starting point and then proceed through
the table in a systematic pattern. We might start in the fourth row, second column, proceed down the column to the
bottom of the table, and then move to the top of the next column to the right. When a number exceeds the limit of
the numbers in the frame, in our case 15,000, it is simply passed over and the next number that does fall within the
relevant range is selected. Since the numbers were placed in the table in a completely random fashion, the resulting
sample is random. This procedure gives each item an equal probability of being selected. In the case of an infinite
population, each item in a random sample is selected with the same probability, and successive selections are
independent of one another.
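Both selection methods in the 300-from-15,000 example can be imitated with Python's standard `random` module. This is a minimal sketch: the pseudo-random generator stands in for slips of paper and for a printed random number table, and skipping repeated numbers is an assumption made because the sample is drawn without replacement.

```python
import random

def srs_lottery(N, n, rng):
    """Lottery method: draw n distinct labels from 1..N, each equally likely."""
    return sorted(rng.sample(range(1, N + 1), n))

def srs_random_numbers(N, n, rng):
    """Emulates reading five-digit random numbers (00000-99999) from a table.
    Numbers outside 1..N are passed over; repeats are also skipped (assumption:
    sampling without replacement)."""
    chosen = set()
    while len(chosen) < n:
        number = rng.randint(0, 99999)
        if 1 <= number <= N:
            chosen.add(number)
    return sorted(chosen)

rng = random.Random(2024)  # fixed seed so the draw can be reproduced
lottery = srs_lottery(15_000, 300, rng)
table = srs_random_numbers(15_000, 300, rng)
print(len(lottery), len(table))  # 300 300
```

The second function discards about 85% of the numbers it reads, just as the table method passes over every number above 15,000.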
2. Systematic sampling: In some instances, the most practical way of sampling is to select every 15th name on a
list, every 10th house on one side of a street, and so on. Sampling of this type is known as systematic sampling. An
element of randomness is usually introduced into this kind of sampling by using random numbers to pick the unit
with which to start. This procedure is useful when the sampling frame is available in the form of a list. In such a
design, the selection process starts by picking some random point in the list, and then every kth element (k being
the sampling interval) is selected until the desired number is secured.
3. Stratified sampling: If the population from which a sample is to be drawn does not constitute a homogeneous
group, the stratified sampling technique is applied so as to obtain a representative sample. In this technique, the
population is divided into a number of non-overlapping subpopulations, or strata, and sample items are selected
from each stratum. If the items are selected from each stratum by simple random sampling, the entire procedure,
first stratification and then simple random sampling, is known as stratified random sampling.
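Stratified random sampling can be sketched in two steps: allocate the sample across strata, then draw a simple random sample within each. The text does not prescribe an allocation rule, so this sketch assumes proportional allocation (with largest-remainder rounding so the stratum sizes add up to n); the strata and their sizes are hypothetical.

```python
import math
import random

def proportional_allocation(stratum_sizes, n):
    """Split n across strata in proportion to N_h / N (largest-remainder rounding)."""
    N = sum(stratum_sizes.values())
    exact = {h: n * N_h / N for h, N_h in stratum_sizes.items()}
    alloc = {h: math.floor(x) for h, x in exact.items()}
    leftover = n - sum(alloc.values())
    # Hand the leftover units to the strata with the largest fractional parts.
    by_remainder = sorted(exact, key=lambda h: exact[h] - alloc[h], reverse=True)
    for h in by_remainder[:leftover]:
        alloc[h] += 1
    return alloc

def stratified_random_sample(strata, n, rng):
    """strata: dict mapping stratum name -> list of unit IDs (non-overlapping)."""
    sizes = {h: len(units) for h, units in strata.items()}
    alloc = proportional_allocation(sizes, n)
    # Simple random sampling (without replacement) inside each stratum.
    return {h: rng.sample(units, alloc[h]) for h, units in strata.items()}

strata = {
    "urban": [f"U{i}" for i in range(500)],
    "peri-urban": [f"P{i}" for i in range(300)],
    "rural": [f"R{i}" for i in range(200)],
}
sample = stratified_random_sample(strata, 50, random.Random(1))
print({h: len(units) for h, units in sample.items()})
# {'urban': 25, 'peri-urban': 15, 'rural': 10}
```

Other allocations (for example, oversampling a small stratum) only change the `alloc` step; the within-stratum random draw stays the same.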

4. Quota sampling: In stratified sampling, the cost of taking random samples from individual strata is often so
high that interviewers are simply given quotas to be filled from the different strata, the actual selection of items
for the sample being left to the interviewer’s judgment. This is called quota sampling. The size of the quota for each
stratum is generally proportionate to the size of that stratum in the population. Quota sampling is thus an
important form of non-probability sampling. Quota samples generally happen to be judgment samples rather
than random samples.
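The difference from stratified random sampling shows up clearly when quota filling is written out. This minimal sketch assumes a hypothetical stream of people in the order the interviewer happens to meet them; each person is accepted while his or her category's quota is still open, and nothing in the procedure is random.

```python
def fill_quotas(arrivals, quotas):
    """arrivals: (person_id, category) pairs in the order the interviewer meets people.
    quotas: number of people wanted per category. Whom to approach is left to the
    interviewer, so the selection is not random."""
    remaining = dict(quotas)
    sample = []
    for person_id, category in arrivals:
        if remaining.get(category, 0) > 0:
            sample.append(person_id)
            remaining[category] -= 1
        if all(left == 0 for left in remaining.values()):
            break                           # every quota is filled
    return sample, remaining

arrivals = [(1, "male"), (2, "male"), (3, "female"), (4, "male"), (5, "female"), (6, "female")]
print(fill_quotas(arrivals, {"male": 2, "female": 2}))
# ([1, 2, 3, 5], {'male': 0, 'female': 0})
```

Whoever the interviewer meets first fills the quota, which is why quota samples behave like judgment or convenience samples within each stratum.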
5. Cluster sampling and area sampling: Cluster sampling involves grouping the population and then selecting
the groups, or clusters, rather than individual elements for inclusion in the sample. Suppose a department store
wishes to sample its credit card holders. It has issued its cards to 15,000 customers, and the sample size is to be
kept at, say, 450. For cluster sampling, this list of 15,000 card holders could be formed into 100 clusters of 150
card holders each. Three clusters might then be selected for the sample at random. The sample size must often be
larger than that of a simple random sample to ensure the same level of accuracy, because in cluster sampling the
potential for order bias and other sources of error is usually accentuated. The clustering approach can, however,
make the sampling procedure relatively easier and increase the efficiency of field work, especially in the case of
personal interviews.
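The card-holder example can be sketched as follows. This is a minimal illustration, assuming the 15,000 card holders are labeled 1 to 15,000 and that clusters are formed from consecutive blocks of 150 in the list; the clustering rule is an assumption, since the text does not say how the clusters are formed.

```python
import random

def cluster_sample(units, cluster_size, n_clusters, rng):
    """Form consecutive clusters from the list, then pick whole clusters at random."""
    clusters = [units[i:i + cluster_size] for i in range(0, len(units), cluster_size)]
    picked = sorted(rng.sample(range(len(clusters)), n_clusters))
    # Every unit in a selected cluster enters the sample.
    return [unit for c in picked for unit in clusters[c]]

card_holders = list(range(1, 15_001))  # 15,000 card holders -> 100 clusters of 150
sample = cluster_sample(card_holders, 150, 3, random.Random(7))
print(len(sample))  # 450
```

Only 3 random draws are made instead of 450, which is what makes the approach cheap in the field, and also why units within a cluster (often similar to one another) add less information than independently drawn units.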

Area sampling is quite close to cluster sampling and is often used when the total geographical area of interest
is large. Under area sampling, we first divide the total area into a number of smaller non-overlapping areas,
generally called geographical clusters; then a number of these smaller areas are randomly selected, and all units
in these small areas are included in the sample. Area sampling is especially helpful where we do not have a list of
the population concerned. It also makes field interviewing more efficient, since an interviewer can conduct many
interviews at each location.
6. Multi-stage sampling: This is a further development of the idea of cluster sampling. This technique is meant
for big inquiries extending over a considerably large geographical area, such as an entire country. Under
multi-stage sampling, the first stage may be to select large primary sampling units such as states, then districts,
then towns, and finally certain families within towns. If random sampling is applied at all stages, the sampling
procedure is described as multi-stage random sampling.
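Multi-stage random sampling applies a random draw at each level of a nested frame. The sketch below is a minimal illustration using a hypothetical three-stage frame (states, districts, families), with towns omitted for brevity; `picks` gives the number of units drawn at each stage and must have one entry per level.

```python
import random

def multistage_sample(frame, picks, rng):
    """frame: nested dicts (e.g. state -> district -> list of families).
    picks: how many units to draw at each stage, outermost first (one per level)."""
    k, rest = picks[0], picks[1:]
    if isinstance(frame, dict):
        chosen = rng.sample(sorted(frame), k)     # random selection at this stage
        result = []
        for name in chosen:
            result.extend(multistage_sample(frame[name], rest, rng))
        return result
    return rng.sample(frame, k)                   # final stage: the families themselves

# Hypothetical frame: 4 states x 5 districts x 10 families.
frame = {
    f"state{s}": {
        f"district{s}.{d}": [f"family{s}.{d}.{i}" for i in range(10)]
        for d in range(5)
    }
    for s in range(4)
}
families = multistage_sample(frame, [2, 2, 3], random.Random(3))
print(len(families))  # 12
```

Only the selected states and districts need detailed lists, which is the practical advantage of the design for country-wide inquiries.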
7. Sequential sampling: This is a somewhat complex sample design in which the ultimate size of the sample is not
fixed in advance but is determined by mathematical decision rules on the basis of the information yielded as the
survey progresses. This design is usually adopted in acceptance sampling plans in the context of statistical
quality control. In practice, several of the sampling methods described above may well be used in the same
study, in which case it can be called mixed sampling. It should be pointed out that normally one should resort
to random sampling so that bias can be eliminated and sampling error can be estimated. But purposive sampling
is considered desirable when the universe happens to be small and a known characteristic of it is to be studied
intensively. In addition, there are conditions under which sample designs other than random sampling may be
considered better for reasons like convenience and low costs. The sample design to be used must be decided by the
researcher, taking into consideration the nature of the inquiry and other related factors.
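The sequential design in item 7 can be made concrete with one standard decision rule from acceptance sampling, Wald's sequential probability ratio test (SPRT). This is a minimal sketch of that one technique, not the only possible sequential plan; the acceptable and unacceptable defect rates (p0, p1) and the error risks (alpha, beta) are illustrative choices.

```python
import math

def sprt_lot(items, p0, p1, alpha=0.05, beta=0.10):
    """Wald's SPRT for a lot's defect rate, inspecting items one at a time.
    items: 1 = defective, 0 = good. H0: rate = p0 (accept lot); H1: rate = p1 (reject lot)."""
    upper = math.log((1 - beta) / alpha)   # crossing it -> reject the lot
    lower = math.log(beta / (1 - alpha))   # crossing it -> accept the lot
    llr = 0.0                              # running log-likelihood ratio
    inspected = 0
    for x in items:
        inspected += 1
        llr += math.log(p1 / p0) if x else math.log((1 - p1) / (1 - p0))
        if llr >= upper:
            return "reject", inspected
        if llr <= lower:
            return "accept", inspected
    return "continue", inspected           # items ran out before a decision

print(sprt_lot([0] * 100, p0=0.01, p1=0.10))  # ('accept', 24)
print(sprt_lot([1] * 100, p0=0.01, p1=0.10))  # ('reject', 2)
```

The number of items inspected is not fixed in advance: a run of defects stops inspection almost immediately, while a clean lot is accepted only after enough evidence accumulates.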


