SAMPLING and Sampling Distribution


Sampling and Sampling Distribution

Dr. Shweta Mogre


 The researcher has to decide on the number of respondents from
whom the data must be collected.
Definition : Population & Sample
 Population means the entire
spectrum of a system of interest
 Example: The population of account holders of a bank consists
of all individuals who hold accounts with that bank.
 Sample is that part of universe
which we select for the purpose of
Bases for defining population

 Geographical area
 Demographics
 Usage/lifestyle
 Awareness
Sampling: Meaning

 Sampling is the process of learning about the population on the
basis of a sample drawn from it.
 Process of selecting a subset of
randomized number of members of
the population of a study and
collecting data about their
Sampling Units

 The limited number of members of the population selected for
sampling are called sampling units.
 The number of sampling units is small relative to the size of
the population.
 This reduces the time and cost of data collection.
Sampling Frame

 The complete list of all members/units of the population from
which each sampling unit is selected is known as the sampling frame.
o Sampling frame should be free from
o The sampling frame should contain each unit only once.
Essentials of Sampling Frame

 All valid units in a frame makes it

 The absence of non-existent units in the frame improves its
accuracy.
 The structure of the frame should make it adequate for covering
the entire population.
Classification of Sampling Techniques

Sampling Techniques
 Non-probability Sampling Techniques
o Convenience Sampling
o Judgmental Sampling
o Quota Sampling
o Snowball Sampling
 Probability Sampling Techniques
o Simple Random Sampling
o Systematic Sampling
o Stratified Sampling
o Cluster Sampling
o Multistage Sampling

 In probability sampling, each unit of the population has a known,
non-zero probability of being selected as a unit of the sample.
 Probability sampling is more rigorous and less prone to selection
bias than non-probability sampling.
Methods of Probability Sampling

1. Simple random sampling

2. Systematic sampling
3. Stratified Sampling
4. Cluster sampling
5. Multi-stage sampling
1. Simple Random Sampling
 Refers to the sampling technique in which every unit of the
population has an equal opportunity of being selected in the sample.
 Every unit has an equal, non-zero chance of being selected.
 Two ways of performing simple random sampling:
i) Random sampling with replacement
ii) Random sampling without replacement
Procedures for Drawing Probability

1. Select a suitable sampling frame

2. Each element is assigned a number
from 1 to N (pop. size)
3. Generate n (sample size) different
random numbers between 1 and N
4. The numbers generated denote the
elements that should be included in
the sample.
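The four steps above can be sketched in Python as follows. This is a minimal illustration, not a prescribed procedure: the function name `simple_random_sample`, the `seed` parameter, and the 20-unit frame are assumptions made for the example. It draws without replacement, since step 3 asks for n different random numbers.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct elements from a sampling frame (without replacement)."""
    rng = random.Random(seed)
    N = len(frame)
    # Steps 2-3: number the elements 1..N and generate n different
    # random numbers between 1 and N.
    chosen_numbers = rng.sample(range(1, N + 1), n)
    # Step 4: the generated numbers denote the elements to include.
    return [frame[k - 1] for k in chosen_numbers]

# Step 1: a hypothetical sampling frame of N = 20 units.
frame = [f"unit_{i}" for i in range(1, 21)]
sample = simple_random_sample(frame, n=5, seed=42)
print(sample)
```

Sampling with replacement would instead use `rng.choices(frame, k=n)`, which allows the same unit to appear more than once.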
Systematic Sampling
 Systematic random sampling is a method of probability sampling in
which the defined target population is ordered and the sample is
selected according to position using a skip interval.
 The sample is chosen by selecting a random starting point and then
picking every i-th element in succession from the sampling frame.
 The sampling interval, i, is determined by dividing the population
size N by the sample size n and rounding to the nearest integer.
 For example, suppose there are 100,000 elements in the population
and a sample of 1,000 is desired. In this case the sampling
interval, i, is 100. A random number between 1 and 100 is selected.
If, for example, this number is 23, the sample consists of elements
23, 123, 223, 323, 423, 523, and so on.
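The example can be reproduced with a short Python sketch. The function name `systematic_positions` and its `start` and `seed` parameters are assumptions made for illustration; passing `start=23` fixes the random start so that the output matches the example.

```python
import random

def systematic_positions(N, n, start=None, seed=None):
    """Return the 1-based frame positions chosen by systematic sampling."""
    # Sampling interval: N / n rounded to the nearest integer.
    i = max(1, round(N / n))
    if start is None:
        # Random starting point between 1 and i.
        start = random.Random(seed).randint(1, i)
    # Take the start, then every i-th element after it.
    return list(range(start, N + 1, i))

positions = systematic_positions(N=100_000, n=1_000, start=23)
print(positions[:6])   # [23, 123, 223, 323, 423, 523]
print(len(positions))  # 1000
```

When N / n is not a whole number, the number of positions returned can differ slightly from n, because the interval has been rounded.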
Stratified Sampling

 The population is divided into a specific set of strata such that
the members within each stratum have similar attributes but the
members of different strata have dissimilar attributes.
 Each stratum is homogeneous when compared to the population.
 The strata should be mutually exclusive
and collectively exhaustive in that every
population element should be assigned to
one and only one stratum and no
population elements should be omitted.
 In proportionate stratified sampling, the
size of the sample drawn from each
stratum is proportionate to the relative
size of that stratum in the total population
 In disproportionate stratified
sampling, the size of the sample
from each stratum is proportionate
to the relative size of that stratum
and to the standard deviation of the
distribution of the characteristic of
interest among all the elements in
that stratum.
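The two allocation rules can be compared with a small Python sketch. The strata names, sizes, and standard deviations below are hypothetical, and the function names are assumptions made for illustration. The disproportionate rule shown weights each stratum by size times standard deviation, which is commonly known as Neyman (optimum) allocation.

```python
def proportionate_allocation(n, sizes):
    """Allocate n in proportion to each stratum's size N_h."""
    total = sum(sizes.values())
    return {h: n * N_h / total for h, N_h in sizes.items()}

def disproportionate_allocation(n, sizes, sds):
    """Allocate n in proportion to N_h * sigma_h for each stratum."""
    weights = {h: sizes[h] * sds[h] for h in sizes}
    total = sum(weights.values())
    return {h: n * w / total for h, w in weights.items()}

# Hypothetical strata: sizes N_h and standard deviations sigma_h.
sizes = {"small": 6000, "medium": 3000, "large": 1000}
sds = {"small": 10, "medium": 20, "large": 60}

prop = proportionate_allocation(500, sizes)
disp = disproportionate_allocation(500, sizes, sds)
print(prop)  # small stratum receives the most units
print(disp)  # high-variability strata receive relatively more units
```

In this made-up case the small but highly variable "large" stratum gets 50 units under proportionate allocation, but the same share as the other strata under the disproportionate rule. The allocations are left unrounded; in practice they would be rounded to whole units.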
Cluster Sampling
 The target population is first divided into
mutually exclusive and collectively
exhaustive subpopulations, or clusters.
 Then a random sample of clusters is
selected, based on a probability sampling
technique such as SRS.
 For each selected cluster, either all the
elements are included in the sample (one-
stage) or a sample of elements is drawn
probabilistically (two-stage).
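A minimal Python sketch of one-stage and two-stage cluster sampling follows. The function name `cluster_sample`, its parameters `m` (clusters to select) and `k` (elements per selected cluster), and the blocks-of-houses data are all assumptions made for illustration. Simple random sampling is used at both stages.

```python
import random

def cluster_sample(clusters, m, k=None, seed=None):
    """Select m clusters by SRS, then all elements (one-stage, k=None)
    or an SRS of up to k elements per selected cluster (two-stage)."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), m)  # stage 1: sample clusters
    sample = []
    for c in chosen:
        elements = clusters[c]
        if k is None:
            sample.extend(elements)           # one-stage: take everything
        else:
            # two-stage: sample elements within the selected cluster
            sample.extend(rng.sample(elements, min(k, len(elements))))
    return chosen, sample

# Hypothetical population: 6 blocks (clusters) of 10 houses each.
clusters = {
    f"block_{b}": [f"block_{b}_house_{h}" for h in range(1, 11)]
    for b in range(1, 7)
}

_, one_stage = cluster_sample(clusters, m=2, seed=1)
_, two_stage = cluster_sample(clusters, m=2, k=3, seed=1)
print(len(one_stage), len(two_stage))  # 20 6
```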
Cluster Sampling
 Elements within a cluster should be as
heterogeneous as possible, but clusters
themselves should be as homogeneous as
possible. Ideally, each cluster should be a small-
scale representation of the population.
 In probability proportionate to size sampling,
the clusters are sampled with probability
proportional to size. In the second stage, the
probability of selecting a sampling unit in a
selected cluster varies inversely with the size of
the cluster
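One way to see why the second-stage probability varies inversely with cluster size is the following sketch. The symbols are assumptions introduced for illustration: $M_i$ is the size of cluster $i$, $M$ the total population size, $m$ the number of clusters selected, and $k$ a fixed number of units drawn from each selected cluster (with $m M_i \le M$ for every cluster).

$$P(\text{cluster } i \text{ selected}) = \frac{m M_i}{M}, \qquad P(\text{unit selected} \mid \text{cluster } i \text{ selected}) = \frac{k}{M_i}$$

$$P(\text{unit selected}) = \frac{m M_i}{M} \cdot \frac{k}{M_i} = \frac{m k}{M}$$

Under these assumptions the cluster size cancels, so every unit in the population has the same overall chance of selection.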
Methods of Non-probability Sampling

1. Convenience Sampling
2. Judgment Sampling
3. Quota Sampling
4. Snow-ball Sampling
Convenience Sampling
Convenience sampling attempts to obtain a
sample of convenient elements. Often,
respondents are selected because they happen to
be in the right place at the right time.

 use of students, and members of social

 mall intercept interviews without qualifying the
 department stores using charge account lists
 “people on the street” interviews
Judgment Sampling
 Judgmental sampling is a form of convenience sampling in which
the population elements are selected based on the judgment of the
researcher.
 The researcher applies his/her intuitive judgment and previous
experience in selecting the sampling units.
 Also called purposive sampling.
 There is a greater chance of personal bias.
Quota Sampling

 The population is classified into a number of groups based on
some criterion, such as the age of its members (for example, old,
middle-aged, and young).
Snow-ball Sampling
In snowball sampling, an initial group of
respondents is selected, usually at random.

 After being interviewed, these respondents are

asked to identify others who belong to the
target population of interest.
 Subsequent respondents are selected based on
the referrals.
 The method is appropriate where developing a sampling frame is a
difficult and time-consuming task.
Sampling Errors

 Sampling errors are statistical errors that arise when a sample
does not represent the whole population. They are the difference
between the real values of the population and the values derived
by using samples from the population.
Non-sampling Error

 Non-sampling error refers to an error that arises during data
collection and causes the data to differ from the true values.
 Non-sampling errors can introduce bias into polls, surveys, or
samples.
Choosing Non-probability vs. Probability Sampling

                                    Conditions Favoring the Use of
Factors                             Nonprobability sampling        Probability sampling
Nature of research                  Exploratory                    Conclusive
Relative magnitude of sampling      Nonsampling errors are larger  Sampling errors are larger
and nonsampling errors
Statistical considerations          Unfavorable                    Favorable
Operational considerations          Favorable                      Unfavorable

Sampling Distribution
 Researchers often use a sample to draw inferences about the
population from which the sample is drawn.
 To do that, they make use of a probability distribution that is
very important in statistics: the sampling distribution.
 A sampling distribution is the distribution of all possible values
of a statistic computed from all the distinct possible samples of
equal size drawn from a population.
 It is a theoretical distribution.
Sampling Distribution of Sample Mean

Suppose that a small, finite population contains only N = 8 numbers:

54 55 59 63 64 68 69 70
• [Figure: distribution of the population data]

• Suppose that all possible samples of size n = 2 are taken from this
population (with replacement, so a value can be paired with itself).
Sampling Distribution of Sample Mean

54 55 59 63 64 68 69 70

All possible samples of n = 2:
(54, 54) (55, 54) (59, 54) (63, 54) (64, 54) (68, 54) (69, 54) (70, 54)
(54, 55) (55, 55) (59, 55) (63, 55) (64, 55) (68, 55) (69, 55) (70, 55)
(54, 59) (55, 59) (59, 59) (63, 59) (64, 59) (68, 59) (69, 59) (70, 59)
(54, 63) (55, 63) (59, 63) (63, 63) (64, 63) (68, 63) (69, 63) (70, 63)
(54, 64) (55, 64) (59, 64) (63, 64) (64, 64) (68, 64) (69, 64) (70, 64)
(54, 68) (55, 68) (59, 68) (63, 68) (64, 68) (68, 68) (69, 68) (70, 68)
(54, 69) (55, 69) (59, 69) (63, 69) (64, 69) (68, 69) (69, 69) (70, 69)
(54, 70) (55, 70) (59, 70) (63, 70) (64, 70) (68, 70) (69, 70) (70, 70)

• Then take the means of all of the samples

Sampling Distribution of Sample Mean
Means of the samples:
54 54.5 56.5 58.5 59 61 61.5 62
54.5 55 57 59 59.5 61.5 62 62.5
56.5 57 59 61 61.5 63.5 64 64.5
58.5 59 61 63 63.5 65.5 66 66.5
59 59.5 61.5 63.5 64 66 66.5 67
61 61.5 63.5 65.5 66 68 68.5 69
61.5 62 64 66 66.5 68.5 69 69.5
62 62.5 64.5 66.5 67 69 69.5 70
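The enumeration above can be reproduced with a short Python sketch. The variable names are assumptions made for illustration. It lists all 64 ordered samples of size 2 drawn with replacement, computes their means, and compares the mean and variance of the sample means with the population mean μ and with σ²/n.

```python
from collections import Counter
from itertools import product
from statistics import mean, pvariance

population = [54, 55, 59, 63, 64, 68, 69, 70]
n = 2

# All ordered samples of size n drawn with replacement: 8 * 8 = 64.
samples = list(product(population, repeat=n))
sample_means = [mean(s) for s in samples]

# Frequency of each distinct sample mean: the sampling distribution.
distribution = Counter(sample_means)

mu = mean(population)          # population mean
sigma2 = pvariance(population) # population variance

print(len(samples))
print(mu, mean(sample_means))                # the two means coincide
print(sigma2 / n, pvariance(sample_means))   # variance of the mean is sigma^2 / n
```

The mean of the sample means equals the population mean, and their variance equals the population variance divided by n, which is what the standard error formula σ/√n expresses.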

Sampling Distribution of Sample Mean

[Figure: distribution of the means of the samples]

The distribution of the means of the samples looks different from
the original distribution.

Central limit theorem
 When random samples of observations are drawn from a non-normal
population with finite mean μ and standard deviation σ, and as the
sample size n is increased, the sampling distribution of the sample
mean x̄ is approximately normally distributed, with mean and
standard deviation as:

$$\mu_{\bar{x}} = \mu, \qquad \sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$

$$z = \frac{\bar{x} - \mu_{\bar{x}}}{\sigma_{\bar{x}}} = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$


Central limit theorem
 The central limit theorem says that, provided the sample size is
sufficiently large, the sampling distribution of the sample mean
X-bar is approximately normal, even if the variable of interest is
not normally distributed in the population.
 However a variable is distributed in the population (provided its
mean and standard deviation are finite), the sampling distribution
of the sample mean is approximately normal, as long as the sample
size is large enough.
 As a guideline for ‘large enough’, a sample size of 30 or larger is
often used.
 The central limit theorem is useful in statistical inference.
When the sample size is sufficiently large, estimators such as the
‘average’ or ‘proportion’ that are used to make inferences about
population parameters are expected to have a sampling distribution
that is approximately normal.
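A small simulation can illustrate the theorem. This is a sketch under stated assumptions: the population is taken to be exponential with rate 1 (strongly skewed, with mean 1 and standard deviation 1), and the function name `simulate_sample_means`, the number of repetitions, and the seed are choices made for the example.

```python
import random
from math import sqrt
from statistics import mean, pstdev

def simulate_sample_means(n, reps, seed=0):
    """Draw `reps` samples of size n from an exponential(1) population
    and return the mean of each sample."""
    rng = random.Random(seed)
    return [
        sum(rng.expovariate(1.0) for _ in range(n)) / n
        for _ in range(reps)
    ]

n = 30
means = simulate_sample_means(n=n, reps=2000)

# The centre of the sample means should be close to mu = 1 ...
print(round(mean(means), 3))
# ... and their spread close to sigma / sqrt(n) = 1 / sqrt(30).
print(round(pstdev(means), 3), round(1 / sqrt(n), 3))
```

Plotting a histogram of `means` for n = 2, 10, and 30 would show the shape moving from skewed towards a bell curve as n grows.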
Point Estimation
 A sample statistic (such as x̄, s, or p̄) that is calculated using
sample data to estimate the most likely value of the corresponding
unknown population parameter (such as μ, σ, or p) is termed a point
estimator, and the numerical value of the estimator is termed a
point estimate.
 For example, if we calculate that 10 per cent of the items in a
random sample taken from a day’s production are defective, then the
result ‘10 per cent’ is a point estimate of the percentage of items
in the whole lot that are defective. Thus, until the next sample of
items is drawn and examined, we may proceed with manufacturing on
the assumption that any day’s production contains 10 per cent
defective items.
Drawback of Point Estimates
 The drawback of a point estimate is that no information is
available regarding its reliability, i.e. how close it is to the
true population parameter. In fact, the probability that a single
sample statistic exactly equals the population parameter is
extremely small.
 For this reason, point estimates are rarely used alone to estimate
population parameters. It is better to offer a range of values
within which the population parameter is expected to fall, so that
the reliability (probability) of the estimate can be measured. This
is the purpose of interval estimation.
Interval Estimation
 Interval estimation is the calculation of the interval (range of
values) within which the population parameter is expected to fall.
 It is calculated from a sample and is expected to include the
corresponding population parameter.
 It is calculated with the help of the confidence interval and the
margin of error.
Confidence Interval and Margin of Error
 Margin of error: The value added or subtracted from a
point estimate in order to develop an interval estimate of
a population parameter.
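Margin of error = $z \cdot \dfrac{\sigma}{\sqrt{n}}$
 Or = $t \cdot \dfrac{s}{\sqrt{n}}$
where z (or t) is the critical value for the chosen confidence
level, σ is the population standard deviation, s is the sample
standard deviation, and n is the sample size.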
 Confidence interval: The interval within which the
population parameter is expected to lie.
 Confidence interval= Point estimate ± Margin of error
 Confidence limits: The boundaries (both upper and
lower) of a confidence interval.
Excel Formula

 Margin of Error
= CONFIDENCE.NORM(alpha, standard_dev, size)
= CONFIDENCE.T(alpha, standard_dev, size)

Alpha: level of significance

 A survey conducted by a shopping mall group showed that a family
in a metro city spends an average of Rs 2000 on clothes every month.
Suppose a sample of 81 families resulted in a sample mean of Rs 2040
per month and a sample standard deviation of Rs 150. Develop a 95
per cent confidence interval estimate of the mean amount spent per
month by a family.
 Confidence interval = 2040 ± 1.96 (150/√81) = 2040 ± 1.96 (150/9)
 = 2040 ± 32.67
 = 2040 + 32.67 and 2040 – 32.67
 = 2072.67 and 2007.33
 Hence, at the 95% level of confidence, the population mean is
likely to fall between Rs 2007.33 and Rs 2072.67.
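The same calculation can be checked in Python. This is a minimal sketch: the function name `mean_confidence_interval` is an assumption made for illustration, and it uses the normal critical value z, as the worked example does for this large sample (n = 81).

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(x_bar, s, n, confidence=0.95):
    """Return (lower, upper, margin) for the mean using a z critical value."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95%
    margin = z * s / sqrt(n)                 # margin of error
    return x_bar - margin, x_bar + margin, margin

lower, upper, margin = mean_confidence_interval(x_bar=2040, s=150, n=81)
print(round(margin, 2), round(lower, 2), round(upper, 2))
# 32.67 2007.33 2072.67
```

The interval does not contain the survey figure of Rs 2000, which is the kind of comparison an interval estimate makes possible.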
Estimating Sample Size

when estimating mean

 $n = \left(\dfrac{Z \sigma}{E}\right)^2$

when estimating proportion

 $n = \dfrac{Z^2 \, p \, q}{E^2}$, where $q = 1 - p$
Estimating Sample Size
 Question:
Assuming the heights of students on a college campus are normally
distributed with a standard deviation of 5 in, find the minimum
sample size required to construct a 95% confidence interval for the
mean with a maximum error of 0.5 in.
 Solution:
 Given: E = 0.5 in, σ = 5 in, and α = 1 – 0.95 = 0.05
 Therefore, Z at the 0.05 level = 1.96
 The formula to find the minimum sample size is n = (σ * Z / E)^2
 Substituting the given values in the sample size formula, we get
 n = (1.96 * 5 / 0.5)^2 = 384.16
 Therefore, rounding this value up to the next integer, the
minimum sample size required is 385.
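The calculation can be sketched in Python. The function name `sample_size_for_mean` is an assumption made for illustration. It derives Z from the standard normal distribution rather than using the rounded value 1.96, so the unrounded result differs slightly from 384.16, but the minimum sample size is the same.

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_mean(sigma, E, confidence=0.95):
    """Unrounded n = (Z * sigma / E)^2 for estimating a mean."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return (z * sigma / E) ** 2

n_raw = sample_size_for_mean(sigma=5, E=0.5)
n_min = ceil(n_raw)  # always round up to guarantee the maximum error
print(round(n_raw, 2), n_min)
```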

 A business analyst wants to estimate what proportion of refinery
workers in the US are contract workers. The analyst wants to be 99%
confident of her results and be within 0.05 of the actual
proportion.
 Note: Z = 2.576 at the 1% level of significance.
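A possible solution sketch in Python follows. The problem gives no prior estimate of the proportion, so the code assumes p = 0.5, which is the conservative choice because it maximizes p·q. That value, and the function name `sample_size_for_proportion`, are assumptions made for illustration. Z is derived from the standard normal distribution (about 2.576 for 99% confidence).

```python
from math import ceil
from statistics import NormalDist

def sample_size_for_proportion(E, confidence, p=0.5):
    """Unrounded n = Z^2 * p * q / E^2 for estimating a proportion."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    return z ** 2 * p * q / E ** 2

n_raw = sample_size_for_proportion(E=0.05, confidence=0.99)  # p = 0.5 assumed
n_min = ceil(n_raw)
print(round(n_raw, 2), n_min)
```

Under the p = 0.5 assumption the analyst would need 664 workers. A credible prior estimate of p further from 0.5 would reduce this number.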
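Correction factor for finite population

 $n = \dfrac{n_0 N}{n_0 + (N - 1)}$, where $n_0$ is the sample size
computed without the correction and $N$ is the population size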

 For a population of 1000, what sample size is necessary to
estimate the population mean at 95 per cent confidence with a
sampling error of 5 and a standard deviation equal to 20?
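One way to work this question is sketched below in Python. It first computes the uncorrected size n0 = (Z·σ/E)² with Z = 1.96, then applies the finite population correction n = n0·N / (n0 + (N − 1)). The function name `corrected_sample_size` is an assumption made for illustration.

```python
from math import ceil

def corrected_sample_size(n0, N):
    """Finite population correction: n = n0 * N / (n0 + (N - 1))."""
    return n0 * N / (n0 + (N - 1))

z, sigma, E, N = 1.96, 20, 5, 1000

n0 = (z * sigma / E) ** 2          # uncorrected sample size
n = corrected_sample_size(n0, N)   # corrected for N = 1000
print(round(n0, 2), round(n, 2), ceil(n))
# 61.47 57.96 58
```

The correction lowers the requirement from 62 to 58 here. The effect grows as the uncorrected sample size becomes a larger fraction of the population.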
