Sampling English


SAMPLING

First: Introduction to Sampling Surveys:


There are two main approaches to data collection: complete enumeration (census)
and sampling.
 Complete enumeration means collecting information about all individual
elements in the studied population. Example: population censuses.
 Sampling, on the other hand, means gathering information about only a fraction
or a subset of the studied population. Example: sampling surveys.

Advantages of Sampling
1) Low cost
Each extra element added to the study increases the cost of data collection and data
handling (including editing, coding, data entry, description, and data analysis), so
studying a sample costs less than studying the whole population.
2) Higher accuracy
Since research based on sampling requires fewer employees and less time for data
collection and data handling, it is easier to control both processes strictly.

3) Wider applicability
The low cost of sampling research facilitates performing a large number of studies
that would otherwise have been precluded by the high cost of complete enumeration.
Probability and Non-Probability Sampling
 Probability sampling is defined as any selection technique that gives
each population element a determined (known or estimable)
probability of being selected into the sample. These probabilities of
selection are not necessarily equal.
 Unequal probabilities of selection imply biased probability sampling.
Some population elements may be given a zero probability of selection,
in which case we have non-coverage probability sampling. Neither biased
nor non-coverage probability sampling is representative of the
sampled population.
 All probability sampling techniques that assign equal probabilities of
selection to all elements of the sampled population are said to be
representative sampling techniques.
  The main advantage of probability sampling is that it follows the
laws of probability. Therefore, results from a sample selected by a
probability sampling technique can be generalized to the population
through statistical inference.

 All sample selection techniques that do not satisfy the necessary condition
of probability sampling are considered non-probability sampling
techniques. There are several circumstances when it is useful to use non-
probability sampling.
Examples of Non-Probability Sampling:
1) Purposive sampling
It is any sample selection technique that depends on expert opinion
in choosing a typical sample.
It may be preferred to probability sampling techniques in some
circumstances, mainly when the desired sample size is small.

2) Quota Sampling
Experts do not specify which elements are to be chosen, but rather the
main characteristics the chosen elements must have, so that the sample
resembles the population.
Quota sampling is frequently used in marketing studies and in opinion
polls where quick and inexpensive results are required. However,
quota sampling is plagued by personal interference in the selection
process, which makes its results non-generalizable. The resulting sample
can be highly non-representative of the population if the study topic
depends on characteristics other than those used to set the selection
constraints.
3) Convenience Sampling
 The main selection criterion in this sampling technique is how easy it is to reach the
selected population elements.
 This technique is not suitable for descriptive studies because it yields samples that
are not representative of any specific population, so their results cannot be generalized.
 The usefulness of the technique depends on the validity of the underlying assumption
that all population elements are similar with respect to the basic structure of variable
relationships.
 Convenience sampling is only appropriate in exploratory studies.
4) Haphazard Sampling
 This type includes several selection techniques that do not apply any objective
criteria. Examples are selecting people who happen to be present in a specific place at
a specific time, without any objective selection of the place and time (e.g.
around-the-corner surveys).
 Haphazard sampling is the worst and most biased type of sampling. Results from a
haphazard sample are non-generalizable. They cannot even be taken as indicative,
like the results of purposive or convenience samples, because haphazard samples are
usually atypical, i.e. very different from the population. It is known, for instance, that
media and internet polls tend to represent the minority-held extreme opinions rather
than the majority-held moderate ones because those with strong extreme opinions are
keener to participate in such polls.
5) Snowball sampling
It is mainly used in studies of hard-to-reach/enter populations (such as criminals and
illegal migrants). The idea is to start with a small number of individuals belonging to
that population. Each initial member is then used to bring in or recruit other members,
and so on.
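
The recruitment process described above can be sketched as a breadth-first walk over
a referral network. This is only an illustrative sketch: the `referrals` dictionary, the
names in it, and the `max_size` stopping rule are hypothetical and are not part of the
original notes.

```python
from collections import deque


def snowball_sample(referrals, seeds, max_size):
    """Recruit members wave by wave, starting from the initial seeds.

    referrals: dict mapping each member to the members he or she can bring in
               (hypothetical data structure, assumed for illustration).
    seeds:     the small number of initial members.
    max_size:  stop once this many members have been recruited.
    """
    recruited = []
    seen = set()
    queue = deque()
    for seed in seeds:
        if seed not in seen:
            seen.add(seed)
            queue.append(seed)
    while queue and len(recruited) < max_size:
        member = queue.popleft()
        recruited.append(member)
        # Each recruited member brings in contacts who are not yet known.
        for contact in referrals.get(member, []):
            if contact not in seen:
                seen.add(contact)
                queue.append(contact)
    return recruited


# Hypothetical referral network.
referrals = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D", "E"],
    "D": [],
    "E": ["F"],
}
sample = snowball_sample(referrals, seeds=["A"], max_size=5)
print(sample)
```

Because each new member is reached only through earlier members, the resulting sample
depends heavily on who the initial members are.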
The Population and the Frame
The population is the entire set of individual elements to which the researcher wants to
generalize (or extrapolate) the findings of the research.
 In the first stage, the researcher should determine what is called the target
population, which is the population from which the sample will be selected. The target
population is usually smaller than the population to which the researcher wants to
generalize.
The most important factors to be considered in the process of determining the target
population are generalizability to the original population and the feasibility of
sampling and data collection from the chosen target population.
 In the second stage, the researcher has to get a frame for the target population.
The frame is a mapping of the target population that can be used for sample selection.
The frame contains a list of sampling units, which are usually given numerical codes
such that a probability sample could be selected using random number generators. The
key is to have a clear mapping between the population elements and the sampling units
in the frame.
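
As a minimal sketch of how a numbered frame can be combined with a random number
generator, the Python example below draws an equal-probability sample of sampling units
without replacement. The frame contents, the function name and the seed are assumptions
made for illustration only.

```python
import random


def select_simple_random_sample(frame, n, seed=None):
    """Draw n distinct sampling units from the frame with equal probabilities.

    frame: dict mapping numerical codes to sampling units (hypothetical layout).
    seed:  optional seed so that the selection can be reproduced.
    """
    rng = random.Random(seed)
    codes = sorted(frame)               # numerical codes of the sampling units
    chosen = rng.sample(codes, n)       # selection without replacement
    return {code: frame[code] for code in sorted(chosen)}


# Hypothetical frame: a numbered list of households.
frame = {
    1: "Household A",
    2: "Household B",
    3: "Household C",
    4: "Household D",
    5: "Household E",
    6: "Household F",
}
sample = select_simple_random_sample(frame, n=3, seed=42)
print(sample)
```

Under this scheme every sampling unit has the same selection probability, n/N (here 3/6).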
The most straightforward examples of frames are name lists, but frames also exist in
other forms. For example, the frame could be a map in which specific delimited
locations constitute the sampling units.
There are two criteria for choosing the frame. The first is the correspondence
between the sampling units in the frame and the elements of the target population. The
second criterion is the availability of the proposed frame.
Unconventional frames usually depend on specifying times or places (or both) where
the elements of the target population can be located.
Characteristics of Perfect Frames
1- Coverage
Every element in the target population has to be mapped to at least one
sampling unit in the frame.
2- Non-repetition
Every element in the target population should correspond to a single
unit in the frame.
3- No empty or foreign units
Every sampling unit in the frame should correspond to at least one
element in the target population.
4- Individuality
Every sampling unit in the frame should correspond to only one
element in the target population.
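
The four characteristics above can be checked mechanically when the links between
sampling units and population elements are known. The sketch below assumes a
hypothetical representation in which the links are given as (unit, element) pairs; the
function name and the example data are illustrative only.

```python
from collections import defaultdict


def check_frame(frame_units, target_elements, links):
    """Check the four characteristics of a perfect frame.

    frame_units:     sampling units listed in the frame.
    target_elements: elements of the target population.
    links:           (unit, element) pairs saying which element a unit refers to
                     (hypothetical representation, assumed for illustration).
    """
    target = set(target_elements)
    units_of = defaultdict(set)      # target element -> frame units pointing to it
    elements_of = defaultdict(set)   # frame unit -> target elements it points to
    for unit, element in links:
        if element in target:        # links to foreign elements are ignored
            units_of[element].add(unit)
            elements_of[unit].add(element)
    return {
        "coverage": all(len(units_of[e]) >= 1 for e in target),
        "non_repetition": all(len(units_of[e]) <= 1 for e in target),
        "no_empty_or_foreign_units": all(len(elements_of[u]) >= 1 for u in frame_units),
        "individuality": all(len(elements_of[u]) <= 1 for u in frame_units),
    }


# Hypothetical flawed frame: Omar is missing, Mona is listed twice,
# and unit 4 refers to someone outside the target population.
flawed = check_frame(
    frame_units=[1, 2, 3, 4],
    target_elements=["Ali", "Mona", "Omar"],
    links=[(1, "Ali"), (2, "Mona"), (3, "Mona"), (4, "Zed")],
)
print(flawed)
```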
Sampling Distributions
 Let us assume that the target population contains N elements and that we
are interested in estimating a particular measure that summarizes the
values of a specific variable among the elements of the target population. Let
V denote the value of the measure we want to estimate, which we will get if
we study the target population using complete enumeration. Since we are
not going to do a complete enumeration, the population value V is unknown.
 If we select a sample of size n from the target population using a probability
sampling approach and estimate the measure under study, we will get what
we will call the sample value, denoted by v. Since the selected sample
contains only a subset of the elements of the target population, the sample
value is typically different from the population value. The difference (e = v –
V) is called the sampling error. The sampling error is hence defined as the
error in estimating the measure under study that is attributable to the fact
that sampling rather than complete enumeration is used in data collection.
Since the population value V is unknown, the sampling error e is also
unknown.

 Many different samples of size n could be selected from the target population, and
the estimate v will typically differ among these potential samples. Since the
sampling approach is probabilistic, the sample value v is a random variable
that will vary among potential samples according to a specific probability
distribution.
 We can construct a table containing each possible value for the sample value
v along with the probability of getting that particular value, i.e. the
probability distribution of the sample value, which is called the sampling
distribution.
 It should be noted that the sampling distribution depends on the
properties of the target population and also on the sampling design
(i.e. the sample size and the technique of selecting the sampled
elements from the target population).
 Also note that the sampling distribution of the sample value v can
easily be translated to the sampling distribution of the sampling error
e, since the difference between the two random variables v and e is a
constant V.
 Statistical inference techniques can be employed to estimate the
characteristics of the sampling distribution using one observed
representative sample. If the sample is large enough, many sample
values (estimators) are approximately normally distributed.
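
For a very small population, the sampling distribution can be tabulated exactly by
listing every possible sample. The sketch below assumes a hypothetical population of
four values, simple random sampling without replacement (so all samples are equally
likely), and the mean as the measure under study; none of these choices come from the
original notes.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

# Hypothetical target population (N = 4) and sample size.
population = [2, 4, 6, 8]
n = 2

# Population value V: the mean obtained under complete enumeration.
V = Fraction(sum(population), len(population))

# Every possible sample of size n; under the assumed design each is equally likely.
samples = list(combinations(population, n))
counts = Counter(Fraction(sum(s), n) for s in samples)

# Sampling distribution of v: each possible sample value with its probability.
sampling_distribution = {
    v: Fraction(c, len(samples)) for v, c in sorted(counts.items())
}

# Sampling distribution of the error e = v - V: same probabilities, shifted by V.
error_distribution = {v - V: p for v, p in sampling_distribution.items()}

for v, p in sampling_distribution.items():
    print(f"v = {v}, e = {v - V}, probability = {p}")
```

In this example V = 5, and the sample value v equals V in only two of the six possible
samples, so most samples carry a non-zero sampling error.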
Sources of Errors in Surveys:
Survey errors can be classified into two main groups: non-measurement
errors and measurement errors.
Non-measurement errors mean that information is not available about
some population elements.
Measurement errors mean that some of the information available is
different from the true values of the variable under study.
Whether the true values for some elements are missing because of a
non-measurement error or misrecorded because of a measurement error, the
value computed for the measure under study will be different from its true
value V*. If the
type of error results in a tendency for the calculated value to be higher (or
lower) than the true value (i.e. to be biased), it is said to be a systematic
error. Otherwise, it is said to be a random error.
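
The distinction between the two kinds of error can be illustrated with a small
simulation. Everything in this sketch is assumed for illustration: the true value, the
size of the systematic shift, and the uniform noise used to imitate random error.

```python
import random
from statistics import mean

rng = random.Random(0)            # fixed seed so the run can be reproduced

true_value = 50.0                 # hypothetical true value
systematic_shift = 3.0            # hypothetical tendency to over-report
n_measurements = 10_000

# Random errors: equally likely to push the result up or down.
random_errors = [rng.uniform(-5, 5) for _ in range(n_measurements)]

with_random_error = [true_value + e for e in random_errors]
with_systematic_error = [true_value + systematic_shift + e for e in random_errors]

mean_random = mean(with_random_error)
mean_systematic = mean(with_systematic_error)

print(f"random error only:     mean = {mean_random:.2f}")
print(f"with systematic error: mean = {mean_systematic:.2f}")
```

The random errors largely cancel out when averaged, whereas the systematic shift
remains in the average no matter how many measurements are taken.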
Non-Measurement Errors:
There are three sources for non-measurement errors.
1) Sampling
When sampling is used instead of complete enumeration, the
results are always subject to sampling errors. The main advantage of the
sampling error under representative sampling techniques is that it is
random.
2) Non-coverage
This error happens when a part of the target population is excluded from
the studied population from which data are collected (in case of complete
enumeration) or from the sampling frame. Non-coverage typically leads
to biased results.

3) Non-response
This means the unavailability of information about some elements covered
by the study, either because efforts to collect this information failed or
because the collected information was lost or deemed unusable. Refusal is the
most prevalent cause of non-response, and it usually introduces bias in
the results. This is because refusal is typically related to the variable
under study.
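
The way refusal can bias results is easy to see in a small numerical sketch. The
incomes below and the rule that people above a certain income refuse to answer are
hypothetical and serve only to show refusal being related to the variable under study.

```python
from statistics import mean

# Hypothetical incomes of the elements selected into the sample.
sampled_incomes = [1000, 1200, 1500, 1800, 2500, 4000, 9000, 15000]

# Assumption for illustration: anyone earning more than this refuses to answer.
refusal_threshold = 5000

respondents = [x for x in sampled_incomes if x <= refusal_threshold]

full_sample_mean = mean(sampled_incomes)   # what we would get with full response
respondent_mean = mean(respondents)        # what we actually observe
nonresponse_bias = respondent_mean - full_sample_mean

print(f"mean with full response:  {full_sample_mean}")
print(f"mean among respondents:   {respondent_mean}")
print(f"bias due to non-response: {nonresponse_bias}")
```

Here the observed mean (2000) is far below the mean that full response would have
given (4500), because the refusals are concentrated among high incomes.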
Measurement Errors
There are many sources for measurement errors (also called response
errors).
1) Interviewers
Unless effectively trained and monitored, interviewers can introduce a
multitude of response errors. This might happen due to negligence in
asking the survey questions or in writing down the answers they get.
At worst, some interviewers could fabricate the answers to whole
questionnaires or parts of them.
2) Respondents
The respondent may give untrue answers because he or she does not know
or has forgotten the true answer, or has misunderstood the
question. The respondent may also give an untrue answer deliberately
to avoid a real or imagined harm, or to gain a real or imagined benefit.
3) Questionnaires
Complicated, unclear and suggestive questions induce respondents to
give untrue answers.
4) Data handling
In addition to the measurement errors occurring during the interview or
questionnaire filling process, errors can occur while editing the
questionnaire. Extra errors can be induced during coding and data entry
processes.

The sampling error is the only type of error that is unique to sampling
surveys, while the other types of error can affect both complete enumerations
and sampling surveys. In addition, except for non-coverage errors, all other
types of error are more likely to affect complete enumerations, which are
more complicated and, hence, more difficult to monitor efficiently.
Criteria for Good Sampling Designs:
1) Goal orientation
Focusing on other criteria for good designs may distract the researcher
from focusing on the most important criterion, which is satisfying the
goals of the planned study.
2) Feasibility
This essential criterion is especially relevant when determining the
target population, preparing the sampling frame and choosing the
sampling technique compatible with the available frame.
3) Generalizability
This criterion is the main reason why probability sampling is preferred
to non-probability sampling. However, even with probability sampling,
further conditions have to be met to enable generalizability.
4) Efficiency
Efficiency means achieving a specified level of precision at the
least cost or, alternatively, achieving the highest precision within a
specified cost.
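
The two readings of efficiency can be sketched as two simple selection rules over
candidate designs. The designs, their costs, and the use of the standard error as the
measure of precision (smaller is more precise) are all assumptions made for
illustration.

```python
# Hypothetical candidate designs with assumed costs and standard errors.
designs = [
    {"name": "Design A", "cost": 10_000, "standard_error": 2.0},
    {"name": "Design B", "cost": 15_000, "standard_error": 1.2},
    {"name": "Design C", "cost": 22_000, "standard_error": 1.0},
    {"name": "Design D", "cost": 18_000, "standard_error": 1.4},
]


def cheapest_meeting_precision(designs, max_standard_error):
    """Least costly design that reaches the specified precision, or None."""
    eligible = [d for d in designs if d["standard_error"] <= max_standard_error]
    return min(eligible, key=lambda d: d["cost"]) if eligible else None


def most_precise_within_budget(designs, budget):
    """Most precise design that stays within the specified cost, or None."""
    affordable = [d for d in designs if d["cost"] <= budget]
    return min(affordable, key=lambda d: d["standard_error"]) if affordable else None


print(cheapest_meeting_precision(designs, max_standard_error=1.5)["name"])
print(most_precise_within_budget(designs, budget=25_000)["name"])
```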
