
SAMPLING THEORY

CMS 311: Business Statistics. Dr. Abraham Kiruga, Department of Finance & Accounting, CUEA

Sampling Theory

Introduction

Sampling theory is a branch of statistics that focuses on selecting and analyzing a subset (sample)
of data from a larger population to make inferences about the whole population. Its primary aim is
to understand how samples relate to populations, allowing researchers to make informed
conclusions about a population without needing to study every individual in it. Sampling theory
helps ensure that the sample is representative of the population, which is crucial for the accuracy
and reliability of the inferences.

Key components of sampling theory include:

Population and Sample: The population is the complete set of elements or observations relevant
to a particular study, while a sample is a subset of this population used to draw inferences about it.

Sampling Methods: Various methods can be used to select a sample, such as simple random
sampling, stratified sampling, cluster sampling, and systematic sampling. The choice of method
depends on factors like the study’s objectives, the nature of the population, and resource
constraints.

Sampling Distribution: This is the probability distribution of a given statistic (e.g., mean or
proportion) based on all possible samples of a particular size from the population. Sampling
distributions help estimate the variability of a statistic and form the foundation for statistical
inference.

Sampling Error: This is the difference between a sample statistic and the corresponding population
parameter, which arises because only part of the population is observed. Sampling error tends to
decrease with larger sample sizes and more representative sampling methods.

Central Limit Theorem (CLT): CLT is fundamental to sampling theory. It states that the sampling
distribution of the sample mean approaches a normal distribution as the sample size increases,
regardless of the shape of the population's original distribution (provided the population has a
finite variance). This property allows researchers to make probabilistic statements about
population parameters using sample data.
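To make the ideas of a sampling distribution and sampling error concrete, here is a minimal Python sketch that enumerates every possible sample of size 2, drawn without replacement, from a tiny hypothetical population of five values. The population values are illustrative assumptions, not data from these notes.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 5 values (illustrative only)
population = [2, 4, 6, 8, 10]
mu = mean(population)  # population parameter (the population mean)

n = 2
# Every possible sample of size n drawn without replacement
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

# Sampling distribution: each possible sample mean and how many samples produce it
distribution = {}
for m in sample_means:
    distribution[m] = distribution.get(m, 0) + 1

# Sampling error of each sample = sample statistic - population parameter
errors = [m - mu for m in sample_means]

print("number of possible samples:", len(samples))
print("sampling distribution:", sorted(distribution.items()))
print("mean of the sample means:", mean(sample_means), "population mean:", mu)
```

Individual samples miss µ by as much as 3 in either direction, yet the average of all possible sample means equals the population mean exactly.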

Sampling theory is widely used across various fields, including market research, opinion polling,
and scientific studies, where it enables cost-effective and time-efficient data collection while
preserving the ability to make accurate generalizations.

Important Terms

❖ Sampling design = a set of decisions that must be made before the data are collected.
❖ Population = the set or collection of all possible observations of some specific characteristic.
❖ Elementary unit or element = a person or object on which a measurement is taken.
❖ Frame = a listing of all elementary units in a given problem.
○ Since a sample is only a part of a population, any inference made about the population
characteristics based on the sample may be erroneous.
○ Despite this possibility, there are various reasons for taking a sample rather than a census of
the entire population.

Sample
By sample we mean the aggregate of objects, persons or elements selected from the universe. It
is a portion or subpart of the total population.

The following two methods are used to collect information about the population:
❖ Census; and
❖ Sampling

Census: When each and every element or unit of the population is studied
Sampling: When a small part of the population is selected for study.

Why Sampling?

Advantages
❖ Helps to collect vital information more quickly. Even small samples, when properly
selected, help to make estimates of the characteristics of the population in a shorter time.
❖ The modern world is highly dynamic; therefore, any study must be completed in a short
time, otherwise by the time the survey is completed the situation, characteristics etc. may
have changed.
❖ It cuts costs; enumeration of the total population is much more costly than a sample study.
❖ Sampling techniques often increase the accuracy of data. With a small sample, it becomes
easier to check the accuracy of the data. Some sampling techniques/methods make it
possible to measure the reliability of the sample estimates from the sample itself.
❖ From the administrative point of view, sampling is also easier, because it involves
less staff, equipment etc.
Disadvantages
❖ Sampling is not feasible where knowledge about each element or unit of a statistical
universe is needed.
❖ The sampling procedures must be correctly designed and followed; otherwise a so-called
"wild sample" may result, giving misleading results.
❖ Each type of sampling has its own limitations.
❖ There are numerous situations in which the units to be measured are highly variable. Here a
very large sample is required in order to yield enough cases for achieving statistically
reliable information.
❖ To know certain population characteristics, like population growth rate, population density
etc., a census of the population at regular intervals is more appropriate than studying by
sampling.

STEPS IN THE SAMPLING PROCESS


❖ Define the population from which the sample is to be drawn.
❖ Specify the population frame from which the sample will be taken.
❖ Choose the sampling method for selecting the sample.
❖ Determine the sample size required for the study.
❖ Select the actual sample.

SAMPLING METHODS/DESIGNS
❖ According to Element Selection:
○ Unrestricted = any element from the population has the chance to become a
sample
○ Restricted = certain elements are given the chance to become a sample given
certain qualifications

❖ According to Representation Basis:


○ Probability = every element has a known, non-zero chance of becoming a respondent (in
simple random sampling, an equal chance).
(Simple Random, Complex Random, Systematic, Stratified, Cluster, Area, Double-stage,
Multi-stage)
○ Non-probability = elements are chosen by judgment or convenience, so not everyone has a
known chance of becoming a respondent.
(Quota, Judgmental, Cluster, Convenience, Accidental, Snowball, Purposive)

Probability sampling methods

Simple Random Sampling


This refers to the sampling technique in which each and every item of the population is given an
equal chance of being included in the sample. Since selection of items in the sample depends
entirely on chance, this method is also called chance selection or representative sampling.

It is assumed that if the sample is chosen at random and if the size of the sample is sufficiently
large, it will represent all groups in the population.

Random sampling is of two types: sampling with replacement and sampling without replacement.

Sampling is said to be with replacement when, from a finite population, a sampling unit is drawn,
observed and then returned to the population before another unit is drawn. The population in
this case remains the same, and a sampling unit might be selected more than once.

If, on the other hand, a sampling unit is chosen and not returned to the population after it has
been observed, the sampling is said to be without replacement.
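The difference between the two types can be sketched with Python's standard `random` module, assuming a hypothetical population of ten numbered units: `Random.choices` draws with replacement (a unit may repeat), while `Random.sample` draws without replacement (no unit repeats).

```python
import random

# Hypothetical population of 10 numbered units
population = list(range(1, 11))
rng = random.Random(42)  # fixed seed so the draws are reproducible

# With replacement: each unit is "returned" before the next draw, so repeats are possible
with_repl = rng.choices(population, k=5)

# Without replacement: a unit, once drawn, cannot be drawn again
without_repl = rng.sample(population, k=5)

# Only with replacement can the sample be larger than the population itself
big = rng.choices(population, k=20)

print("with replacement:   ", with_repl)
print("without replacement:", without_repl)
```

Asking `rng.sample(population, k=20)` would raise a `ValueError`, because a population of 10 cannot supply 20 distinct units.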


Random samples may be selected with the help of the lottery method or a table of random numbers
(such as Tippett's table of random numbers, Fisher and Yates' numbers, or Kendall and Babington
Smith's numbers).

Random sampling, therefore, involves careful planning and orderly procedure.

Steps of Simple Random Sampling

❖ List or catalogue all the elements in the population and assign them consecutive numbers.
❖ Decide upon the desired sample size.
❖ Using a random selection method (e.g. the lottery method or a table of random numbers),
select that number of elements from the list.
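The three steps above can be sketched in Python as follows. The function name `simple_random_sample` and the frame of shop names are hypothetical, and the standard library's `random.Random.sample` stands in for a table of random numbers.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Number the frame, check the sample size, then draw n distinct numbers at random."""
    # Step 1: list the elements and assign them consecutive numbers 1..N
    numbered = {i: element for i, element in enumerate(frame, start=1)}
    # Step 2: the desired sample size n is supplied by the caller
    if not 0 < n <= len(numbered):
        raise ValueError("n must be between 1 and the population size")
    # Step 3: draw n distinct numbers at random and look up the matching elements
    rng = random.Random(seed)
    chosen_numbers = sorted(rng.sample(list(numbered), n))
    return [numbered[i] for i in chosen_numbers]

# Hypothetical frame of 8 shops
frame = ["Shop A", "Shop B", "Shop C", "Shop D", "Shop E", "Shop F", "Shop G", "Shop H"]
print(simple_random_sample(frame, 3, seed=1))
```

Every shop has the same chance of selection, and because numbers are drawn without replacement no shop can appear twice.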
Advantages of Random Sampling Technique
❖ Most basic, simple and easy method.
❖ Provides a representative sample.
Disadvantages
❖ In most cases it is difficult to obtain a complete list of all units of the population to be sampled.
❖ The task of numbering every unit before the sample is chosen is time-consuming and
expensive. The units must not only be numbered but also arranged in a specified order.
❖ The possibility of obtaining a poor or misleading sample is always present when random
selection is used.

Methods of Drawing a Sample in the Random Method

Lottery Method: The numbers of all the elements of the universe are written on different tickets
or pieces of paper of equal size, shape and colour, which are then shuffled thoroughly in a box
or container. Tickets are then drawn at random, their numbers are noted, and the
corresponding individuals or objects are studied.
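A minimal Python sketch of the lottery method, assuming a hypothetical universe of 20 numbered tickets: shuffling the whole box and taking the first five tickets is equivalent to drawing five tickets one at a time without putting them back.

```python
import random

# Hypothetical universe of 20 elements, each written on a numbered ticket
tickets = list(range(1, 21))
rng = random.Random(7)  # fixed seed for a reproducible "draw"

rng.shuffle(tickets)   # shuffle the tickets thoroughly in the box
drawn = tickets[:5]    # draw 5 tickets; drawn tickets are not returned to the box

print("Tickets drawn:", drawn)
```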

Tippett's Numbers: This table was first developed by Prof. L. H. C. Tippett and has since been
known by his name. He compiled a list of 10,400 random numbers, each of four digits. These
numbers are written on several pages in no systematic order.

Grid Method: This method is applied in the selection of areas. Suppose we have to select a
number of areas from a town, or a number of towns from a province, for a survey. First, a map
of the whole area is prepared, and the area is divided into different blocks. A transparent plate
of the same size as the map is made, with several squared holes in it, each carrying a different
number. Random sampling is then used to decide which numbers are to be included in the sample.

Stratified sampling
In this case the population is divided into groups in such a way that units within each group are
as similar as possible, in a process called stratification. The groups are called strata. Simple
random samples from each of the strata are collected and combined into a single sample. This
technique of collecting a sample from a population is called stratified sampling. Relevant criteria
for stratification are selected according to the nature of the problem. Possible stratifying criteria
include age, sex, family income, number of years of education, occupation, religion, race, place of
residence etc. On the basis of such characteristics the universe can be divided into different strata,
each of which should be homogeneous within itself. Such a division can be done on the basis of any
single criterion, e.g. on the basis of age we can divide people into below-25 and above-25 groups,
and on the basis of education into matriculates and non-matriculates.
Stratification can also be done on the basis of a combination of two or more criteria; e.g. on
the basis of sex and education, we can divide people into four groups:
❖ Educated women
❖ Uneducated women
❖ Educated men
❖ Uneducated men
Elements are then selected from each stratum through simple random sampling. An
estimate is made for each stratum separately, and these estimates are combined to provide an
estimate for the entire population.
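The procedure just described (sample each stratum, estimate each stratum separately, combine) can be sketched in Python. The stratum values below are hypothetical, and weighting each stratum estimate by its share of the population is an assumption made here for the combining step.

```python
import random
from statistics import mean

# Hypothetical population values grouped into the four strata above (illustrative numbers only)
strata = {
    "educated women": [52, 48, 55, 50, 47, 53],
    "uneducated women": [30, 28, 33, 31],
    "educated men": [60, 58, 62, 65, 59],
    "uneducated men": [35, 32, 36, 34, 38],
}
all_values = [x for units in strata.values() for x in units]
N = len(all_values)          # population size
rng = random.Random(3)       # fixed seed for a reproducible draw

estimate = 0.0
for name, units in strata.items():
    sample = rng.sample(units, 2)                  # simple random sample within this stratum
    stratum_estimate = mean(sample)                # separate estimate for the stratum
    estimate += len(units) / N * stratum_estimate  # weight by the stratum's share and combine

print(f"stratified estimate: {estimate:.2f}, true mean: {mean(all_values):.2f}")
```

Because every stratum contributes, even the small "uneducated women" group (4 of 20 units) is guaranteed a place in the sample.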

Purpose: The primary purpose is to increase the representativeness of the sample without
increasing its size, by making use of greater knowledge of the population characteristics.
Advantages
❖ The population is first stratified into different groups and then the elements of the sample
are selected from each group. Therefore, the different groups are sure to be represented
in the sample. In a simple random sample, there is a possibility that bigger groups receive
greater representation while smaller groups are under-represented or left out.
❖ Because each stratum is more homogeneous, greater precision can be achieved with fewer
cases. This saves time in collecting and processing the data, and the method is more
effective when a detailed study of population characteristics is wanted.
❖ As compared to random samples, stratified samples are geographically more
concentrated and thus save time, money and energy in moving from one address to
another.
Disadvantages
❖ Unless there are marked differences between the strata, the gain from stratification is
small; a simple random sample may give a nearly proportional representation anyway.
❖ Even after stratification, the sample is selected from each stratum either by simple
random sampling or by systematic sampling; as such, the drawbacks of both methods
can be present.
❖ To apply the stratified method, one must know the characteristics of the population in
which the study is to be made. The researcher must also know which characteristics are
related to the subject under investigation and can therefore be considered relevant for
stratification.
❖ The process of stratification becomes more and more complicated and difficult as the
number of characteristics used for stratification increases.


Types of Stratified Sampling


Stratified random sampling can further be subdivided into two types:
❖ Disproportionate stratified sampling
❖ Proportionate stratified sampling
Disproportionate stratified sampling: Disproportionate stratified sampling is also known as
equal-size stratified sampling. In this method, an "equal number" of cases is selected from each
stratum irrespective of the size of the stratum in the universe. The number of cases drawn from
each stratum is restricted to the number predesignated in the plan. This is also called "controlled
sampling" because the number of cases to be selected in the various strata is limited.
Advantages
❖ When equal numbers of cases are taken from each stratum, comparisons of
different strata are facilitated.
❖ Economy of procedure.
❖ The controlled sample prevents the investigators from securing an unnecessarily
large number of schedules for the most prevalent groups of the population.
Disadvantages
❖ It requires weighting of the results stratum by stratum; the relative frequency of each
stratum in the universe must be known or estimated in order to determine the
weights.
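Why the weighting matters can be shown with a small Python sketch. The two strata ("urban" with 8,000 units, "rural" with 2,000) and the sample values are hypothetical; each stratum contributes an equal sample of 3 cases.

```python
from statistics import mean

# Hypothetical strata of very different sizes, each sampled with the same number of cases (3)
stratum_sizes = {"urban": 8000, "rural": 2000}             # known population counts
samples = {"urban": [40, 42, 38], "rural": [20, 22, 18]}   # equal allocation per stratum

N = sum(stratum_sizes.values())

# Unweighted: pools the cases and so treats both strata as if they were equally large
unweighted = mean(x for s in samples.values() for x in s)

# Weighted: each stratum mean is weighted by its relative frequency in the universe
weighted = sum(stratum_sizes[h] / N * mean(samples[h]) for h in samples)

print("unweighted:", unweighted)
print("weighted:  ", round(weighted, 2))
```

The unweighted mean of 30 understates the population mean, because the rural stratum (20% of the universe) supplies half the cases; the weights 0.8 and 0.2 restore each stratum's true share and give 36.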
Proportionate stratified sampling: In this method cases are drawn from each stratum in the same
proportion as they occur in the universe. To apply this method, we first need a list of all strata
and must know their proportionate size in the total population. Since the sizes of the strata vary,
the number of persons coming from each stratum in the sample, when a given percentage of people
is selected, will also vary.

Advantage
❖ The definiteness of proportional representation.
Disadvantage
❖ The researcher may have poor judgment or inadequate information upon which to base
the stratification. The greater the number of characteristics on which we base our
stratification, and the more strata there are, the more complicated the problem of
securing proportional representation of each stratum becomes.

Systematic Sampling
This sampling is a variant of simple random sampling in which items are taken from a list arranged
in ascending or descending order. In systematic sampling a sample is drawn according to some
predetermined rule. Suppose a population consists of 1,000 units; then every 10th, 20th or 50th
item may be selected. This method is very easy and economical, and it saves a lot of time.

Advantages
❖ It is frequently used because it is simple, direct and inexpensive.
❖ When a list of names or items is available, systematic sampling is often an efficient
approach.
Disadvantages
❖ Systematic sampling should not be used when exploring unfamiliar areas, where listing
of elements is not possible.
❖ When there is a periodic fluctuation in the characteristic under examination in relation to
the order in which the items appear, the method is ineffective.

Multistage sampling
This is similar to stratified sampling except that the division is done on a geographical/location
basis, e.g. a country can be divided into counties and the survey then carried out in 4 towns in
each county. This helps to cut traveling costs for a surveyor.
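A minimal Python sketch of the two stages, under stated assumptions: the county and town names are hypothetical placeholders, and the first stage (choosing counties at random) is a common form of multistage sampling. Choosing every county reproduces the "4 towns in each county" example above.

```python
import random

def multistage_sample(frame, n_counties, towns_per_county, seed=None):
    """Stage 1: choose counties at random; stage 2: choose towns at random within each."""
    rng = random.Random(seed)
    chosen_counties = rng.sample(sorted(frame), n_counties)       # first-stage units
    return {county: rng.sample(frame[county], towns_per_county)  # second-stage units
            for county in chosen_counties}

# Hypothetical frame: counties and the towns in each
counties = {
    "County A": ["A1", "A2", "A3", "A4", "A5", "A6"],
    "County B": ["B1", "B2", "B3", "B4", "B5"],
    "County C": ["C1", "C2", "C3", "C4", "C5", "C6", "C7"],
}

# Survey 4 towns in each of 2 randomly chosen counties
print(multistage_sample(counties, 2, 4, seed=11))
# Taking every county (n_counties=3) matches the "4 towns in each county" example
print(multistage_sample(counties, 3, 4, seed=11))
```

Only towns inside the selected counties are ever visited, which is where the saving in travel comes from.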

Double stage sampling


❖ Getting a smaller sample from the initial large sample. This design is sometimes called
Sample within a Sample.
❖ It is usually done when the researcher intends to gather more in-depth and focused data
on the topic of investigation. The initial larger sample provides preliminary information
which helps in determining the second sample set to be drawn from the same sample
group

Area sampling
❖ It pertains to the grouping of the population into geographical divisions before selecting
the respondents. This sampling can be done if there is a clear delineation of
communities where the respondents can be found.

Non-probability sampling methods

1. Clustered sampling
This is where a few geographical regions, e.g. a location, town or village, are selected at random
and, say, every single household or shop in that area is interviewed. This again cuts costs.

❖ It involves the grouping or division of the elements of the population into heterogeneous
groups. It should be noted that each cluster sample is composed of respondents with
different perspectives and interests.

Difference between Stratified and Clustered Sampling:

○ In STRATIFIED sampling, the grouping is done per department; that is why each group is
homogeneous.
○ In CLUSTERED sampling, each group is composed of members from different departments;
that is why each group is heterogeneous.

Not all the elements in the selected clusters need be included in the sample; the ultimate
selection from within the clusters may also be carried out on a simple or stratified random basis.
Purpose: The purpose of a cluster sample is to reduce cost, not necessarily to increase
precision.
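The whole-cluster version described above (villages chosen at random, every household in them interviewed) can be sketched in Python. The villages, households and the function name `cluster_sample` are hypothetical.

```python
import random

# Hypothetical frame: villages (clusters) and the households in each
villages = {
    "V1": ["h1", "h2", "h3"],
    "V2": ["h4", "h5"],
    "V3": ["h6", "h7", "h8", "h9"],
    "V4": ["h10", "h11"],
    "V5": ["h12", "h13", "h14"],
}

def cluster_sample(clusters, n_clusters, seed=None):
    """Select whole clusters at random; every unit in a chosen cluster is interviewed."""
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    return chosen, [unit for c in chosen for unit in clusters[c]]

chosen, households = cluster_sample(villages, 2, seed=5)
print("villages:", chosen)
print("households to interview:", households)
```

Randomness enters only at the cluster level; within a chosen village there is no further selection, which keeps fieldwork cheap but makes the estimate depend heavily on which villages happen to be drawn.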
Advantage
❖ In cluster sampling the cost per element is greatly reduced.
❖ It becomes possible to take a larger sample and so regain the lost precision.
❖ It can be used in situations where it is impossible to obtain a sample by other methods.
Disadvantage
❖ It is a complicated sample design; the researcher has to be highly skilled in sampling.
❖ Its standard errors are almost inevitably larger than those of simple random sampling for the
same sample size.

2. Judgment Sampling
Here the interviewer selects whom to interview, believing that their views are more important
because they might be directly affected; e.g. to find out the effects of public transport, one may
choose to interview only people who do not own cars and travel frequently to work.

3. Quota Sampling
Definition: Quota sampling involves dividing the population into subgroups (strata) based on
certain characteristics, and then selecting a specified number of participants from each subgroup.
The goal is to ensure the sample reflects the population in terms of specific characteristics (e.g.,
gender, age), but the participants within the quota are chosen non-randomly.

Example: A researcher wants to study the opinions of university students on a new online learning
system. They decide that the sample should include 50 male and 50 female students. Once they
meet the quota for each gender, they stop selecting more participants, even if more people want to
participate.
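The stopping rule in this example can be sketched in Python. The quotas are scaled down from 50/50 to 2/2, and the stream of arriving students and the function name `quota_sample` are hypothetical. Note that people are accepted in the order they happen to arrive, not at random.

```python
# Hypothetical stream of students in the order they happen to approach the researcher
arrivals = [("S1", "F"), ("S2", "F"), ("S3", "M"), ("S4", "F"), ("S5", "M"), ("S6", "M")]

def quota_sample(stream, quotas):
    """Accept people in arrival order (non-random) until every group's quota is full."""
    filled = {group: [] for group in quotas}
    for person, group in stream:
        if group in filled and len(filled[group]) < quotas[group]:
            filled[group].append(person)
        if all(len(filled[g]) == quotas[g] for g in quotas):
            break  # every quota met: stop, even if more people want to take part
    return filled

print(quota_sample(arrivals, {"M": 2, "F": 2}))
# {'M': ['S3', 'S5'], 'F': ['S1', 'S2']}
```

S4 is turned away because the female quota is already full, and S6 is never asked because selection stops once both quotas are met.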
Advantage
❖ If properly planned and executed, a quota sample is likely to give a highly
representative sample of the population.

❖ In purposive sampling one picks the cases that are considered to be typical of the
population in which one is interested.
❖ The cases are judged to be typical on the basis of the needs of the researcher.
❖ Since the selection of elements is based upon the judgment of the researcher,
purposive sampling is also called judgment sampling.
❖ The researcher tries to match the sample to the universe in some of the
important known characteristics.
Disadvantage
❖ The defect with this method is that the researcher can easily make errors in judging
which cases are typical.

4. Convenience Sampling
Definition: In convenience sampling, participants are selected based on their availability and
willingness to participate. It is often used when quick or easy access to participants is necessary,
but it can lead to bias since the sample may not represent the broader population.


Example: A researcher standing outside a mall asks the first 100 people who walk by to fill out a
survey about their shopping preferences. The sample is based on convenience, as those selected
are nearby and available at that moment.

May be used when:


❖ Universe is not clearly defined
❖ Sampling units are not clear
❖ Complete source list is not available
5. Accidental Sampling
Definition: Accidental sampling is a form of convenience sampling where participants are chosen
by chance or accident, without any structured approach. This occurs when the researcher uses
whatever individuals are conveniently available.

Example: A researcher conducting a survey on public transport usage happens to meet people at a
bus stop and asks them to participate in the study. These participants are selected "accidentally,"
as they were available at the time and location of the study.

6. Snowball Sampling
Definition: Snowball sampling is used when participants are difficult to locate. In this technique,
the researcher asks initial participants to refer others who meet the criteria for the study, and this
process continues until the sample size is sufficient.

Example: A study on people recovering from rare diseases could start with a few individuals found
through a support group. The researcher then asks these individuals to refer others who are also
recovering from the same condition, expanding the sample through referrals.
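The referral process can be sketched in Python as a breadth-first walk over a referral network. The network of participants `P1` to `P6` and the function name `snowball_sample` are hypothetical.

```python
from collections import deque

# Hypothetical referral network: whom each participant can refer
referrals = {
    "P1": ["P2", "P3"],
    "P2": ["P4"],
    "P3": ["P4", "P5"],
    "P4": ["P6"],
    "P5": [],
    "P6": ["P1"],
}

def snowball_sample(seeds, network, target_size):
    """Start from initial participants and follow referrals until the sample is large enough."""
    sample, queue = [], deque(seeds)
    seen = set(seeds)
    while queue and len(sample) < target_size:
        person = queue.popleft()
        sample.append(person)
        for referred in network.get(person, []):
            if referred not in seen:  # never recruit the same person twice
                seen.add(referred)
                queue.append(referred)
    return sample

print(snowball_sample(["P1"], referrals, 5))
# ['P1', 'P2', 'P3', 'P4', 'P5']
```

The sample can only reach people connected to the initial participants, which is why snowball samples are useful for hidden groups but are not probability samples.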

7. Purposive Sampling
Definition: Purposive sampling (also called judgmental sampling) involves selecting participants
based on specific characteristics or criteria that are relevant to the research question. The researcher
uses their judgment to choose participants who are most appropriate for the study.

Example: A study investigating the effectiveness of a leadership program may specifically select
participants who have held leadership positions for at least five years, as they are most likely to
provide relevant insights into the program’s impact.
Advantage
❖ Quota sampling is a stratified-cum-purposive sampling and thus enjoys the
benefits of both.
❖ If proper controls or checks are imposed, it is likely to give accurate results.
❖ It is especially useful when no sampling frame is available.

THE CENTRAL LIMIT THEOREM

The theorem was introduced by Abraham de Moivre (1733); Pierre-Simon Laplace (1810)
formulated a proof of the theorem. According to it:

if we select a large number of simple random samples from any population (with finite variance)
and determine the mean of each sample, the distribution of these sample means will tend to be
described by the normal probability distribution with mean µ and variance σ²/n, where µ and σ²
are the population mean and variance and n is the sample size.

This is true even if the population itself is not normally distributed. In other words, the sampling
distribution of sample means approaches a normal distribution irrespective of the distribution of
the population from which the sample is taken, and the approximation to the normal distribution
becomes increasingly close as the sample size increases.

The Central Limit Theorem (CLT) is a statistical concept which states that the distribution of the
sample mean of a random variable will be approximately normal if the sample size is large enough.
In simple terms, the theorem states that the sampling distribution of the mean approaches a normal
distribution as the size of the sample increases, regardless of the shape of the original population
distribution.

As the sample size increases to 30, 40, 50, etc., the graph of the sample means moves towards a
normal distribution. A sample size of 30 or more is a common rule of thumb for the approximation
to be adequate; populations that are strongly skewed may need larger samples.

One of the most important components of the theorem is that the mean of the sampling distribution
equals the mean of the entire population. If you calculate the means of many samples from the
population, add them up and find their average, the result is an estimate of the population mean.

A similar idea applies to the standard deviation: averaging the standard deviations of many samples
gives an estimate of the population standard deviation, although for small samples this estimate
tends to be slightly too low.

How Does the Central Limit Theorem Work?


The central limit theorem forms the basis of the sampling distribution of the mean. It makes it
easy to understand how population estimates behave when subjected to repeated sampling. When
plotted on a graph, the theorem shows the shape of the distribution formed by the means of
repeated population samples.

As the sample sizes get bigger, the distribution of the means from the repeated samples tends to
normalize and resemble a normal distribution. The result remains the same regardless of what the
original shape of the distribution was. It can be illustrated in the figure below:

From the figure above, we can deduce that despite the fact that the original shape of the distribution
was uniform, it tends towards a normal distribution as the value of n (sample size) increases.

Apart from showing the shape that the sample means will take, the central limit theorem also gives
an overview of the mean and variance of the distribution. The mean of the distribution of sample
means is the actual mean of the population from which the samples were taken.

The variance of the sampling distribution, on the other hand, is the variance of the population
divided by n. Therefore, the larger the sample size, the smaller the variance of the sample
mean.
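These claims can be checked with a short simulation, sketched here in Python under stated assumptions: the population is uniform on [0, 1) (clearly not normal, with µ = 0.5 and σ² = 1/12), and 5,000 samples are drawn for each sample size. The function name `sample_means` is hypothetical.

```python
import random
from statistics import fmean, pvariance

rng = random.Random(2024)  # fixed seed so the simulation is reproducible

# Population: uniform on [0, 1), which is not normal; mu = 0.5, sigma^2 = 1/12
MU, SIGMA2 = 0.5, 1 / 12

def sample_means(n, repetitions=5_000):
    """Draw `repetitions` random samples of size n and return the mean of each."""
    return [fmean(rng.random() for _ in range(n)) for _ in range(repetitions)]

results = {}
for n in (2, 10, 30):
    means = sample_means(n)
    results[n] = means
    print(f"n={n:>2}: mean of means={fmean(means):.4f} (mu={MU}), "
          f"variance={pvariance(means):.5f} (sigma^2/n={SIGMA2 / n:.5f})")
```

For every n the mean of the sample means stays close to 0.5 while their variance shrinks roughly like σ²/n. For n = 30, about 95% of the sample means fall within µ ± 1.96·σ/√n, as a normal distribution would predict.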

Example of Central Limit Theorem

An investor is interested in estimating the return of the ABC stock market index, which comprises
100,000 stocks. Due to the large size of the index, the investor is unable to analyze each stock
independently and instead chooses to use random sampling to get an estimate of the overall return
of the index.

The investor picks random samples of the stocks, with each sample comprising at least 30 stocks.
The samples must be random, and the stocks selected in one sample must be returned to the pool
before the next sample is drawn, to avoid bias.

If the first sample produces an average return of 7.5%, the next sample may produce an average
return of 7.8%. Given the nature of random sampling, each sample will produce a different
result. As the sample size increases with each sample you pick, the sample means start forming
their own distribution.

The distribution of the sample means moves toward normal as the value of n increases. The
average return of the stocks in a sample estimates the return of the whole index of 100,000
stocks, and the sample average returns are approximately normally distributed.

SAMPLING: THEORETICAL BASIS


Theoretical Basis of Sampling
❖ On the basis of a sample study we can predict and generalize the behavior of mass
phenomena.
❖ This is possible because there is no statistical population whose elements would
vary from each other without limit.
❖ Though we find that diversity is a universal quality of mass data, every population has
characteristic properties with limited variation.
❖ This makes it possible to select a relatively small, unbiased random sample that can
portray the population fairly well. There are two important laws on which the theory of
sampling is based:

Law of 'Statistical Regularity'; and

Law of 'Inertia of Large Numbers'.

Law of 'Statistical Regularity'


This law says that if a sample is taken, at random, from a population, it is likely to possess
almost the same characteristics as that of the population. The size of sample should be
'moderately large'.

Law of Inertia of Large Numbers

This law is a corollary (result or supplement) of the law of statistical regularity. It states that,
other things being equal, the larger the size of the sample, the more accurate the results are likely
to be. This is because large numbers are more stable than small ones. The difference in the
aggregate result is likely to be insignificant when the number in the sample is large.
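The law can be illustrated with a small Python experiment, assuming a hypothetical population of 10,000 values drawn from a normal distribution with mean 50 and standard deviation 10. For each sample size it measures how far, on average, a random sample's mean lands from the true population mean. The function name `mean_abs_error` is hypothetical.

```python
import random

rng = random.Random(99)  # fixed seed for reproducibility
# Hypothetical population of 10,000 values; its true mean is the target of estimation
population = [rng.gauss(50, 10) for _ in range(10_000)]
true_mean = sum(population) / len(population)

def mean_abs_error(n, trials=500):
    """Average distance between a random sample's mean and the population mean."""
    total = 0.0
    for _ in range(trials):
        sample = rng.sample(population, n)  # simple random sample without replacement
        total += abs(sum(sample) / n - true_mean)
    return total / trials

errors = {n: mean_abs_error(n) for n in (10, 100, 1000)}
for n, err in errors.items():
    print(f"n={n:>4}: average error = {err:.3f}")
```

The average error falls steadily as n grows (roughly in proportion to 1/√n), which is the "stability of large numbers" the law describes.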

Essentials of Sampling
If the sample results are to have any worthwhile meaning, it should possess the following
essentials.
• Representativeness: A sample should be so selected that it truly represents
the universe; otherwise the results obtained may be misleading.
• Adequacy: The size of the sample should be adequate; otherwise it may not
represent the characteristics of the universe.
• Independence: All the items of the sample should be selected independently of
one another, and all the items of the universe should have the same chance of being
selected in the sample.
• Homogeneity: The term homogeneity means that there is no basic difference in the
nature of the universe and that of the sample. If two samples from the same universe
are taken, they should give more or less the same results.
Probability Sampling Methods
Simple or Unrestricted Random Sampling: Simple random sampling refers to the sampling
technique in which each and every unit of the population has an equal opportunity of being
selected in the sample. In simple random sampling, which item gets selected in the sample is just a
matter of chance; the personal bias of the investigator does not influence the selection. It must be
noted that random does not mean 'haphazard' or 'hit-or-miss'; rather, it means that the selection
process is such that chance alone determines which items shall be included in the sample.
Lottery Method: This is a very popular method of taking a random sample. Under this method,
all items of the universe are numbered or named on separate slips of paper of identical shape
and size. These slips are then folded and mixed up in a container or drum. A blindfold selection
is then made of the number of slips required to constitute the desired sample size. The selection
of items thus depends entirely on chance.

Restricted Random Sampling


Stratified Sampling: Stratified random sampling or simply stratified sampling is one of the
random methods which, by using the available information concerning the population, attempts
to design a more efficient sample than obtained by simple random procedure.
Proportionate and Disproportionate Stratified Sample: In a proportionate stratified sampling
plan, the number of items drawn from each stratum is proportional to the size of the stratum. For
example, suppose the population is divided into five strata whose respective sizes are 10, 15,
20, 30 and 25 percent of the population, and a sample of 5,000 is drawn. The desired proportional
sample may be obtained as follows:

From stratum one: 5,000 × 0.10 = 500 items
From stratum two: 5,000 × 0.15 = 750 items
From stratum three: 5,000 × 0.20 = 1,000 items
From stratum four: 5,000 × 0.30 = 1,500 items
From stratum five: 5,000 × 0.25 = 1,250 items
Total: 5,000 items
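The allocation above is a single multiplication per stratum, sketched here in Python. The function name `proportional_allocation` is hypothetical; the shares and total sample size are the ones from the worked example.

```python
def proportional_allocation(sample_size, shares):
    """Allocate a total sample across strata in proportion to each stratum's share."""
    return [round(sample_size * share) for share in shares]

shares = [0.10, 0.15, 0.20, 0.30, 0.25]  # stratum sizes as fractions of the population
allocation = proportional_allocation(5_000, shares)
print(allocation, sum(allocation))
# [500, 750, 1000, 1500, 1250] 5000
```

Here every product is a whole number. With other shares, rounding each stratum separately can make the total differ from the intended sample size by one or two, and the allocation then has to be adjusted by hand.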

Systematic Sampling: A systematic sample is formed by selecting one unit at random and
then selecting additional units at evenly spaced intervals until the sample has been formed.
This method is popularly used in those cases where a complete list of the population from which
the sample is to be drawn is available. The list may be prepared alphabetically, geographically,
numerically etc. The items are serially numbered. The first item is selected at random from the
first k items, generally by following the lottery method. Subsequent items are selected by taking
every kth item from the list, where 'k' refers to the sampling interval or sampling ratio:
k = N / n,
where N = size of universe
n = size of sample
k = sampling interval
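A minimal Python sketch of systematic selection with k = N / n, assuming a hypothetical numbered list of N = 1,000 units and a sample of n = 100 (so k = 10). When N is not an exact multiple of n, this sketch assumes the interval is the integer part of N / n. The function name `systematic_sample` is hypothetical.

```python
import random

def systematic_sample(frame, n, seed=None):
    """Random start among the first k units, then every k-th unit, with k = N // n."""
    N = len(frame)
    k = N // n                                # sampling interval
    start = random.Random(seed).randrange(k)  # random start: position 0 .. k-1
    return [frame[start + i * k] for i in range(n)]

# Hypothetical numbered list of N = 1000 units; n = 100 gives k = 10
frame = list(range(1, 1001))
sample = systematic_sample(frame, 100, seed=4)
print(sample[:5], "...", len(sample))
```

Only the starting point is random; after it, every selection is fixed. This is why a periodic pattern in the list that lines up with k can bias the sample, as noted in the disadvantages above.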

Size of Sample
❖ An important decision that has to be taken in adopting a sampling technique is about the
size of the sample. Size of the sample means the number of sampling units selected from
the population to be investigated.
❖ Different opinions have been expressed by experts on this point. Some suggest that the
sample size should be 5 percent of the size of population while others are of the opinion
that the sample size should be at least 10 percent. However, these views are of little use
in practice because no hard and fast rule can be laid down that sample size should be 5
percent, 10 percent or 25 percent of the universe size.
❖ It may be pointed out that mere size alone does not ensure representativeness. A smaller
but well-selected sample may be superior to a larger but badly selected sample.
Similarly, if the size of the sample is too small, it may not represent the universe and the
inferences drawn about the universe may be misleading. On the other hand, if the size of the
sample is very large, it may be too burdensome financially, require a lot of time and create
serious problems of management.
❖ Hence the sample size should be neither too small nor too large. It should be optimum.
The optimum size is the one that fulfils the requirements of 'efficiency', 'representativeness',
'reliability' and 'flexibility'. The following factors should be considered while deciding
the size of the sample:

The Size of the Universe
• The larger the universe, the bigger the sample size should be.
The Availability of Resources
• If the resources available are vast, a large sample size could be taken. However, in
most cases resources constitute a big constraint on sample size.
Degree of Accuracy or Precision Desired
• The greater the degree of accuracy desired, the larger the sample size should be.
However, this does not mean that bigger samples always ensure greater accuracy. A small
sample selected by experts following a scientific method may give better results than a
large sample selected by inexperienced people.

Homogeneity or Heterogeneity of the Universe
• If the universe consists of homogeneous units, a small sample may serve the purpose,
but if the universe consists of heterogeneous units, a large sample may be inevitable.

Nature of Study
• For an intensive and continuous study, a small sample may be suitable. But for
studies that are not likely to be repeated and are quite extensive in nature, a large
sample size may be required.

Method of Sampling Adopted
• The size of the sample is also influenced by the type of sampling plan adopted. For
example, a simple random sample may necessitate a bigger sample size, whereas in a
properly drawn stratified sampling plan even a small sample may give a better result.
Nature of Respondent
• Where it is expected that a large number of respondents will not co-operate and send
back the questionnaire, a large sample should be selected.

Determination of Sample Size
• A number of formulae have been devised for determining the sample size, depending
upon the information available. When the population standard deviation is known, the
sample size needed to estimate a mean is

$$n = \left(\frac{Z\sigma}{d}\right)^2$$

where
n = sample size
Z = standard normal value at the specified level of confidence
σ = standard deviation of the population
d = desired precision, i.e. the maximum acceptable difference between the population mean
and the sample mean.
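The formula can be evaluated directly. A short Python sketch, assuming σ is known and that the result is rounded up because a fractional unit cannot be sampled; the inputs (σ = 200 kg, d = 50 kg) and the name `required_sample_size` are illustrative only:

```python
import math


def required_sample_size(z, sigma, d):
    """n = (Z * sigma / d) ** 2, rounded up to the next whole unit."""
    return math.ceil((z * sigma / d) ** 2)


# Hypothetical inputs: 95% confidence (Z = 1.96), sigma = 200 kg,
# and a tolerable error of d = 50 kg.
print(required_sample_size(1.96, 200, 50))  # 62
print(required_sample_size(1.96, 200, 25))  # 246
```

Halving the tolerable error d roughly quadruples the required sample size, because d enters the formula squared.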

SAMPLING AND NON-SAMPLING ERRORS

❖ The error arising from drawing inferences about a population on the basis of a few
observations (a sample) is termed 'sampling error'.

❖ In a complete enumeration survey, since the whole population is surveyed, sampling error in
this sense is non-existent. However, errors arising mainly at the stage of ascertainment and
processing of data, which are termed non-sampling errors, are common to both complete
enumeration and sample surveys.

Sampling Errors: Even if the utmost care has been taken in selecting a sample, the results derived
from a sample study may not be exactly equal to the true value in the population. The reason is
that the estimate is based on a part and not on the whole, and samples are seldom, if ever, perfect
miniatures of the population. Hence sampling gives rise to certain errors known as sampling
errors. However, these errors can be controlled: modern sampling theory helps in designing
the survey in such a manner that the sampling errors can be made small.
Sampling errors are of two types:
❖ biased, and
❖ un-biased
Biased Errors: These errors arise from any bias in selection, estimation, etc. For example, if
deliberate sampling has been used in place of simple random sampling in a particular case, some
bias is introduced in the result, and hence such errors are called biased sampling errors.
Un-biased Errors: These errors arise due to "chance" differences between the members of the
population included in the sample and those not included. An error in statistics is the difference
between the value of a statistic and that of the corresponding parameter.
❖ Thus the total sampling error is made up of the error due to bias, if any, and the random
sampling error.
❖ The bias error forms a constant component of error that does not decrease as the sample
size increases. Such error is also known as cumulative or non-compensating error. The random
sampling error, on the other hand, decreases on average as the size of the sample increases.
Such errors are, therefore, known as non-cumulative or compensating errors.
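The contrast between the two components can be seen in a small simulation. This is a sketch under stated assumptions: a hypothetical normal population (mean 100, standard deviation 15), and a "faulty selection" modelled by letting only the larger half of the units ever enter the sample. The random error shrinks as n grows; the bias does not.

```python
import random
import statistics


def mean_absolute_error(population, n, reps, rng, biased=False):
    """Average |sample mean - population mean| over repeated samples.

    With biased=True, a hypothetical faulty selection is simulated: only the
    larger half of the units can ever enter the sample.
    """
    true_mean = statistics.fmean(population)
    frame = sorted(population)[len(population) // 2:] if biased else population
    errors = [abs(statistics.fmean(rng.sample(frame, n)) - true_mean)
              for _ in range(reps)]
    return statistics.fmean(errors)


rng = random.Random(42)
population = [rng.gauss(100, 15) for _ in range(10_000)]
results = {}
for n in (25, 400):
    # (random sampling error, error of the biased selection)
    results[n] = (mean_absolute_error(population, n, 200, rng),
                  mean_absolute_error(population, n, 200, rng, biased=True))
    print(n, [round(e, 2) for e in results[n]])
```

The first number in each row falls sharply from n = 25 to n = 400; the second stays near the same large value, which is the constant, non-compensating bias.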
Causes of Bias: Bias may arise due to:
❖ Faulty process of selection;
❖ Faulty work during the collection; and
❖ Faulty methods of analysis

Faulty Selection: Deliberate selection of a 'representative' sample.

Substitution: Substituting another item in place of one chosen in a random sample sometimes leads
to bias.
Non-response: If all the items to be included in the sample are not covered, there will be bias
even though no substitution has been attempted.
An appeal to the vanity of the person questioned may give rise to yet another kind of bias. For
example, the question "Are you a good student?" is such that most students would succumb to
vanity and answer 'Yes'.
Bias Due to Faulty Collection of Data: Any consistent error in measurement will give rise to
bias, whether the measurements are carried out on a sample or on all units of the population. The
danger of error is, however, likely to be greater in sampling work. Bias may arise from improper
formulation of the decision problem, wrongly defining the population, etc. Biased observations
may result from a poorly designed questionnaire, an ill-trained interviewer, or failure of a
respondent's memory.

Bias in Analysis: In addition to bias, which arises from faulty process of selection and faulty
collection of information, faulty methods of analysis may also introduce such bias. Such bias
can be avoided by adopting the proper method of analysis.
Avoidance of Bias: If the possibility of bias exists, fully objective conclusions cannot be drawn.
The first essential of any sampling or census procedure must, therefore, be the elimination of all
sources of bias.

Method of Reducing Sampling Errors

Once the absence of bias has been ensured, attention should be given to the random
sampling errors. Such errors must be reduced to the minimum so as to attain the desired
accuracy.

Apart from reducing errors of bias, the simplest way of increasing the accuracy of a
sample is to increase its size. The sampling error usually decreases with increase in
sample size; in fact, in many situations the sampling error is inversely proportional to the
square root of the sample size.
[Figure: sampling error plotted against sample size]

From this diagram it is clear that, though the reduction in sampling error is substantial for
initial increases in sample size, it becomes marginal after a certain stage. In other words,
considerably greater effort is needed after a certain stage to decrease the sampling error than
in the initial stages.
From this point of view, there is a strong case for resorting to a sample survey to provide
estimates within permissible margins of error instead of a complete enumeration survey.
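The diminishing returns follow from the square-root law. A minimal Python sketch, assuming the error is measured by the standard error σ/√n with an illustrative σ = 200: each halving of the error needs four times the sample size.

```python
import math


def standard_error(sigma, n):
    """Sampling error of the mean, sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


sigma = 200  # illustrative population standard deviation
for n in (25, 100, 400, 1600):
    print(n, standard_error(sigma, n))
# 25 40.0
# 100 20.0
# 400 10.0
# 1600 5.0
```

Going from 25 to 100 units removes 20 units of error; going from 400 to 1,600 removes only 5.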

Non-Sampling Errors
As regards non-sampling errors, they are likely to be greater in a complete enumeration survey
than in a sample survey. When a complete enumeration of units in the universe is made, one would
expect it to give rise to data free from errors. In practice, however, this is not so. For example,
it is difficult to completely avoid errors of observation or ascertainment. Similarly, tabulation
errors may be committed in the processing of data, affecting the final result. Errors arising in
this manner are termed non-sampling errors. Non-sampling errors can occur at every stage of the
planning and execution of a census or survey. They can arise from a number of causes, such as
defective methods of data collection and tabulation, faulty definitions, incomplete coverage, etc.
More specifically, non-sampling errors may arise from one or more of the following factors:

❖ Data specification may be inadequate and inconsistent with respect to the objectives of the study.
❖ Inaccurate or inappropriate methods of interview, observation or measurement, with
inadequate or ambiguous schedules.
❖ Lack of trained and experienced investigators.
❖ Lack of adequate inspection and supervision of primary staff.
❖ Errors due to non-response.
❖ Errors in data processing operations.
❖ Errors committed during presentation and printing of tabulated results.

Control of Non-Sampling Errors: In some situations, non-sampling errors may be large and
deserve greater attention than sampling errors. While, in general, sampling error decreases
with increase in sample size, non-sampling error tends to increase with the sample size.
In the case of complete enumeration, non-sampling errors, and in the case of sample surveys,
both sampling and non-sampling errors, need to be controlled and reduced to a level at which
their presence does not vitiate the use of the final results.

Reliability of Samples: The reliability of samples can be tested in the following ways.
❖ Further samples of the same size should be taken from the same universe and their results
compared. If the results are similar, the sample can be regarded as reliable.
❖ If the measurements of the universe are known, they should be compared with the
corresponding measurements of the sample. If the measurements are similar, the sample can be
regarded as reliable.

Types of distribution

Population distribution
It refers to the distribution of the individual values of the population. Its mean is denoted by µ.

Sample distribution
It is the distribution of the individual values of a single sample. Its mean is generally written as
x̄; it is not usually the same as µ.

Distribution of Sample Means or sampling distribution
A sample of size n is taken from the parent population and the mean of the sample is calculated.
This is repeated for a number of samples, so that we have a distribution of sample means, which
approaches a normal distribution as the sample size increases (the central limit theorem).
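The idea can be checked by simulation. A sketch in Python, assuming a hypothetical, strongly skewed (exponential) parent population: the means of repeated samples of size 36 cluster around µ, and their standard deviation is close to σ/√n, even though the parent distribution is far from normal.

```python
import math
import random
import statistics

rng = random.Random(1)
# A hypothetical, strongly skewed parent population (exponential, mean about 50).
population = [rng.expovariate(1 / 50) for _ in range(20_000)]
mu = statistics.fmean(population)
sigma = statistics.pstdev(population)

n, reps = 36, 2_000
sample_means = [statistics.fmean(rng.sample(population, n)) for _ in range(reps)]

# Mean of the sample means versus the population mean
print(round(mu, 2), round(statistics.fmean(sample_means), 2))
# sigma / sqrt(n) versus the observed spread of the sample means
print(round(sigma / math.sqrt(n), 2), round(statistics.stdev(sample_means), 2))
```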

Standard errors of the mean
The series of sample means x̄₁, x̄₂, x̄₃, … is normally distributed, or nearly so (according to
the central limit theorem). It can be described by its mean and its standard deviation. This
standard deviation is known as the standard error.

$$\text{Standard error of the mean} = S_{\bar{x}} = \frac{s}{\sqrt{n}}$$

Note: this formula is satisfactory for large samples drawn from a large population, i.e. n > 30
and n less than 5% of N (otherwise the finite population correction factor is applied).
- The word 'error' is used in place of 'deviation' to emphasize that variation among sample means
is due to sampling errors.
- The smaller the standard error, the greater the precision of the sample value.

Statistical inference
It is the process of drawing conclusions about attributes of a population based upon information
contained in a sample taken from that population.
It is divided into estimation of parameters and testing of hypotheses. The symbols used for sample
statistics and the corresponding population parameters are as follows:

                      Sample (statistic)   Population (parameter)
Arithmetic mean       x̄                    µ
Standard deviation    s                    σ
Number of items       n                    N

Statistical estimation

It is the procedure of using a statistic to estimate a population parameter.

It is divided into point estimation (where the estimate of a population parameter is given by a
single number) and interval estimation (where the estimate of a population parameter is given by a
range within which the parameter may be considered to lie).

For example, a bus meant to take a class of 100 students (population, N = 100) on a trip has a
maximum load limit of 600 kg. The teacher realizes he has to find out the weight of the class but,
without enough time to weigh everyone, he picks 25 students at random (sample, n = 25).

These students are weighed and their average weight is recorded as 64 kg (x̄, the sample mean),
together with a standard deviation s. Using these sample statistics, x̄ and s, the teacher intends
to estimate the average weight of the whole class (µ, the population mean).

Characteristics of a good estimator

(i) Unbiasedness: the expected value of the statistic is equal to the population parameter,
e.g. the expected value of the sample mean is equal to the population mean.
(ii) Consistency: the estimator yields values more closely approaching the population
parameter as the sample size increases.
(iii) Efficiency: the estimator has a smaller variance on repeated sampling than competing
estimators.
(iv) Sufficiency: the estimator uses all the information available in the data concerning
the parameter.
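Unbiasedness is easy to check by simulation. A sketch, assuming a hypothetical normal population with σ = 10 (so σ² = 100): averaged over many samples of size 5, the variance computed with divisor n − 1 is close to 100, while the divisor n is systematically low (about 100 × 4/5 = 80). This is why the small-sample formula for s later in these notes divides by n − 1.

```python
import random

rng = random.Random(3)
SIGMA = 10.0           # hypothetical population standard deviation (variance 100)
n, reps = 5, 20_000

with_n_minus_1 = with_n = 0.0
for _ in range(reps):
    sample = [rng.gauss(0, SIGMA) for _ in range(n)]
    mean = sum(sample) / n
    ss = sum((x - mean) ** 2 for x in sample)  # sum of squared deviations
    with_n_minus_1 += ss / (n - 1)
    with_n += ss / n

avg_unbiased = with_n_minus_1 / reps  # close to 100
avg_biased = with_n / reps            # close to 100 * (n - 1) / n = 80
print(round(avg_unbiased, 1), round(avg_biased, 1))
```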

Confidence Interval
An interval estimate, or 'confidence interval', consists of a range (between a lower and an upper
confidence limit) within which we are confident that a population parameter lies, and we assign a
probability that this interval contains the true population value.

The confidence limits are the outer limits of a confidence interval; the confidence interval is
the interval between the confidence limits. The higher the confidence level, the wider the
confidence interval. For example, a normal distribution has the following characteristics:

i. Mean ± 1.960σ includes 95% of the distribution
ii. Mean ± 2.576σ includes 99% of the distribution (often rounded to 2.58)
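Rather than reading normal tables, the critical values can be computed. A sketch using the standard library's `statistics.NormalDist` (Python 3.8 or later); the helper name `z_value` is hypothetical. Rounded, it reproduces the 1.96 and 2.58 used in these notes and the 1.04 used for 70% confidence.

```python
from statistics import NormalDist


def z_value(confidence):
    """Two-sided critical value: the z with `confidence` of the area within +/- z."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)


for level in (0.70, 0.95, 0.99):
    print(level, round(z_value(level), 3))
# 0.7 1.036
# 0.95 1.96
# 0.99 2.576
```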

1. LARGE SAMPLES
These are samples whose size is greater than 30 (i.e. n > 30).

(a) Estimation of population mean
Here we assume that, if a large sample is taken from a population, the mean of the sample will be
very close to the mean of the population.

The steps to follow to estimate the population mean are:
i. Take a random sample of n items (n > 30).
ii. Compute the sample mean x̄ and the standard deviation s.
iii. Compute the standard error of the mean using the following formula:

$$S_{\bar{x}} = \frac{s}{\sqrt{n}}$$

where $S_{\bar{x}}$ = standard error of the mean
s = standard deviation of the sample
n = sample size
iv. Choose a confidence level, e.g. 95% or 99%.
v. Estimate the population mean as follows:

$$\mu = \bar{x} \pm Z\, S_{\bar{x}}$$

where Z, the 'appropriate number', is the value corresponding to the chosen confidence level
(e.g. 1.96 at the 95% level); it is obtained from the normal tables.

Example
The quality department of a wire manufacturing company periodically selects a sample of wire
specimens in order to test their breaking strength. Past experience has shown that the breaking
strengths of a certain type of wire are normally distributed with a standard deviation of 200 kg. A
random sample of 64 specimens gave a mean of 6,200 kg. Estimate the population mean at the 95%
level of confidence.

Solution

$$\mu = \bar{x} \pm 1.96\, S_{\bar{x}}$$

Note that the sample size is already n > 30, and s, x̄ and the confidence level are given, so
steps (i), (ii) and (iv) are already provided.
Here x̄ = 6,200 kg and

$$S_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{200}{\sqrt{64}} = \frac{200}{8} = 25$$

$$\mu = 6200 \pm 1.96(25) = 6200 \pm 49 = 6151 \text{ to } 6249$$

At the 95% level of confidence, the population mean lies between 6,151 kg and 6,249 kg.
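Steps (iii) to (v) translate directly into code. A minimal sketch in Python, assuming Z is computed with `statistics.NormalDist` instead of being read from tables (so it is 1.95996 rather than 1.96); `large_sample_mean_interval` is a hypothetical name. It reproduces the wire-strength interval:

```python
import math
from statistics import NormalDist


def large_sample_mean_interval(xbar, s, n, confidence=0.95):
    """Steps (iii)-(v): xbar +/- Z * s / sqrt(n) for a large (n > 30) random sample."""
    se = s / math.sqrt(n)                           # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    return xbar - z * se, xbar + z * se


low, high = large_sample_mean_interval(6200, 200, 64)
print(round(low), round(high))  # 6151 6249
```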

FINITE POPULATION CORRECTION FACTOR
If a given population is relatively small and the sample size is more than 5% of the population,
the standard error should be adjusted by multiplying it by the finite population correction
factor (FPCF):

$$\text{FPCF} = \sqrt{\frac{N - n}{N - 1}}$$

where N = population size
n = sample size

Example
A manager wants an estimate of the sales of salesmen in his company. A random sample of 100 out of
500 salesmen is selected and average sales are found to be Shs. 75,000. If the sample standard
deviation is Shs. 15,000, estimate the population mean at the 99% level of confidence.

Solution
Here N = 500, n = 100, x̄ = 75,000 and s = 15,000.
The standard error of the mean is

$$S_{\bar{x}} = \frac{s}{\sqrt{n}} \times \sqrt{\frac{N - n}{N - 1}} = \frac{15000}{\sqrt{100}} \times \sqrt{\frac{500 - 100}{500 - 1}} = \frac{15000}{10} \times \sqrt{\frac{400}{499}} = 1500\,(0.895) = 1342.50$$

At the 99% level of confidence (Z = 2.58):

$$\mu = \bar{x} \pm 2.58\, S_{\bar{x}} = 75000 \pm 2.58(1342.50) = 75000 \pm 3464$$

Population mean = Shs 71,536 to Shs 78,464
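A sketch of the corrected standard error in Python, assuming the caller decides when the correction applies (the text's rule of thumb is n > 5% of N); `standard_error_mean` is a hypothetical name:

```python
import math


def standard_error_mean(s, n, N=None):
    """s / sqrt(n), multiplied by the finite population correction
    sqrt((N - n) / (N - 1)) when a population size N is supplied."""
    se = s / math.sqrt(n)
    if N is not None:
        se *= math.sqrt((N - n) / (N - 1))
    return se


se = standard_error_mean(15_000, 100, N=500)
low, high = 75_000 - 2.58 * se, 75_000 + 2.58 * se
print(round(se, 2), round(low), round(high))  # 1342.98 71535 78465
```

Carrying the correction factor at full precision (0.8953 rather than 0.895) gives a standard error of about 1342.98, so the limits differ from the hand calculation by about one shilling.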

(b) Estimation of difference between two means
We know that the standard error of a sample mean is given by the standard deviation divided by
the square root of the number of items in the sample (√n). When two independent samples are
given, the standard error of the difference between their means is

$$S_{(\bar{x}_A - \bar{x}_B)} = \sqrt{\frac{s_A^2}{n_A} + \frac{s_B^2}{n_B}}$$

Note also that the interval is centred not on a single mean but on the difference between the two
sample means, i.e. (x̄_A − x̄_B).
The appropriate number for a given confidence level does not change.
Thus the confidence interval is given by

$$(\bar{x}_A - \bar{x}_B) \pm Z\, S_{(\bar{x}_A - \bar{x}_B)}$$

Example
Given two samples A and B of 100 and 400 items respectively, with means x̄₁ = 7 and x̄₂ = 10 and
standard deviations of 2 and 3 respectively, construct a confidence interval for the difference
between the means at the 70% confidence level.

Solution
Sample    A           B
Mean      x̄₁ = 7      x̄₂ = 10
Size      n₁ = 100    n₂ = 400
S.D.      s₁ = 2      s₂ = 3

The standard error of the difference between the means of samples A and B is given by

$$S_{(\bar{x}_A - \bar{x}_B)} = \sqrt{\frac{4}{100} + \frac{9}{400}} = \sqrt{\frac{25}{400}} = \frac{5}{20} = 0.25$$

At the 70% confidence level, the appropriate number is 1.04 (as read from the normal tables: a z
value of 1.04 encloses a central area of about 0.70).
x̄₁ − x̄₂ = 7 − 10 = −3
We take the absolute value of the difference between the means, i.e. |−3| = 3.
The confidence interval is therefore given by

$$3 \pm 1.04(0.25) = 3 \pm 0.26 = 2.74 \text{ to } 3.26$$

Thus 2.74 ≤ |µ₁ − µ₂| ≤ 3.26.

Example 2
A comparison of the wearing quality of two types of tyres was obtained by road testing. A sample
of 100 tyres of each type was collected, the miles travelled until wear-out were recorded, and the
results were as follows:

Tyres       T1                       T2
Mean        x̄₁ = 26,400 miles        x̄₂ = 25,000 miles
Variance    s₁² = 1,440,000 miles²   s₂² = 1,960,000 miles²

Find a confidence interval for the difference between the means at the 70% confidence level.

Solution
x̄₁ = 26,400 and x̄₂ = 25,000
Difference between the two means:
x̄₁ − x̄₂ = 26,400 − 25,000 = 1,400
Again we take the absolute value of the difference between the two means.
We calculate the standard error as follows:

$$S_{(\bar{x}_1 - \bar{x}_2)} = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{1{,}440{,}000}{100} + \frac{1{,}960{,}000}{100}} = \sqrt{34{,}000} = 184.39$$

The appropriate number at the 70% confidence level is read from the normal tables as 1.04 (Z = 1.04).
Thus the confidence interval is calculated as follows:

$$1400 \pm 1.04(184.39) = 1400 \pm 191.77$$

i.e. (1400 − 191.77) to (1400 + 191.77), or
1,208.23 ≤ |µ₁ − µ₂| ≤ 1,591.77
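Both worked examples can be reproduced with one helper. A Python sketch, assuming Z is supplied from tables (1.04 for 70%) and following the notes' convention of taking the absolute difference; `diff_means_interval` is a hypothetical name. For the tyres, the standard deviations 1,200 and 1,400 are the square roots of the given variances.

```python
import math


def diff_means_interval(x1, s1, n1, x2, s2, n2, z):
    """|x1 - x2| +/- z * sqrt(s1^2/n1 + s2^2/n2), using the absolute difference."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = abs(x1 - x2)
    return diff - z * se, diff + z * se


# Example 1: samples A and B, 70% confidence (Z = 1.04)
print([round(v, 2) for v in diff_means_interval(7, 2, 100, 10, 3, 400, 1.04)])
# Example 2: tyres, with s1 = 1200 and s2 = 1400
print([round(v, 2) for v in diff_means_interval(26400, 1200, 100, 25000, 1400, 100, 1.04)])
```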

(c) Estimation of population proportions
This type of estimation applies when the information cannot be expressed as a mean or other
measurement, but only as a fraction or percentage.
Sampling theory states that if repeated large random samples are taken from a population, the
sample proportion p will be approximately normally distributed, with mean equal to the population
proportion and standard error equal to

$$S_p = \sqrt{\frac{pq}{n}}$$

where $S_p$ = standard error of the sample proportion, n is the sample size and q = 1 − p.
The procedure for estimating a proportion is similar to that for estimating a mean; only the
formula for the standard error is different.

Example 1
In a sample of 800 candidates, 560 were male. Estimate the population proportion of male
candidates at the 95% confidence level.

Solution
Here

$$p = \frac{560}{800} = 0.70, \qquad q = 1 - p = 1 - 0.70 = 0.30, \qquad n = 800$$

$$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.70)(0.30)}{800}} = 0.016$$

Population proportion

$$= p \pm 1.96\, S_p = 0.70 \pm 1.96(0.016) = 0.70 \pm 0.03 = 0.67 \text{ to } 0.73$$

where 1.96 = Z at the 95% level, i.e. between 67% and 73%.

Example 2
A sample of 600 accounts was taken to test the accuracy of posting and balancing of accounts, and
45 mistakes were found. Estimate the population proportion of mistakes at the 99% level of confidence.

Solution
Here

$$n = 600, \qquad p = \frac{45}{600} = 0.075, \qquad q = 1 - 0.075 = 0.925$$

$$S_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.075)(0.925)}{600}} = 0.011$$

Population proportion

$$= p \pm 2.58\, S_p = 0.075 \pm 2.58(0.011) = 0.075 \pm 0.028 = 0.047 \text{ to } 0.103$$

i.e. between 4.7% and 10.3%.
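A short Python sketch of the proportion interval, assuming Z is taken from tables as in the examples; `proportion_interval` is a hypothetical name. It reproduces both worked examples:

```python
import math


def proportion_interval(successes, n, z):
    """p +/- z * sqrt(pq / n) for a large sample, with q = 1 - p."""
    p = successes / n
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se


low1, high1 = proportion_interval(560, 800, 1.96)  # Example 1
low2, high2 = proportion_interval(45, 600, 2.58)   # Example 2
print(round(low1, 2), round(high1, 2))  # 0.67 0.73
print(round(low2, 3), round(high2, 3))  # 0.047 0.103
```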

(d) Estimation of difference between population proportions
Let the two sample proportions be p₁ and p₂, from samples of sizes n₁ and n₂ respectively.
The (absolute) difference between the two proportions is then |p₁ − p₂|.
The standard error is given by

$$S_{(p_1 - p_2)} = \sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}, \qquad \text{where } p = \frac{p_1 n_1 + p_2 n_2}{n_1 + n_2} \text{ and } q = 1 - p$$

Then, given the confidence level, the confidence interval for the difference between the two
population proportions is

$$(p_1 - p_2) \pm Z\, S_{(p_1 - p_2)} = (p_1 - p_2) \pm Z\sqrt{\frac{pq}{n_1} + \frac{pq}{n_2}}$$

Always remember to combine p₁ and p₂ into the pooled proportion p before computing the standard
error.
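A sketch in Python following the pooled standard error given in this section. Note that many references use the unpooled form √(p₁q₁/n₁ + p₂q₂/n₂) for confidence intervals and reserve pooling for hypothesis tests. The figures (60% of 200 against 50% of 300, at 95% confidence) are hypothetical, since the notes give no worked example here, and `diff_proportions_interval` is a hypothetical name.

```python
import math


def diff_proportions_interval(p1, n1, p2, n2, z):
    """|p1 - p2| +/- z * sqrt(pq/n1 + pq/n2), using the pooled p of these notes."""
    p = (p1 * n1 + p2 * n2) / (n1 + n2)   # pooled proportion
    q = 1 - p
    se = math.sqrt(p * q / n1 + p * q / n2)
    diff = abs(p1 - p2)
    return diff - z * se, diff + z * se


# Hypothetical data: 60% of 200 versus 50% of 300, at 95% confidence (Z = 1.96).
low, high = diff_proportions_interval(0.60, 200, 0.50, 300, 1.96)
print(round(low, 3), round(high, 3))  # 0.011 0.189
```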

2. SMALL SAMPLES
(a) Estimation of population mean
If the sample size is small (n < 30) and the population standard deviation has to be estimated
from the sample, the standardized sample mean does not follow the normal distribution. In such
circumstances, Student's t distribution must be used to estimate the population mean (this
assumes the population itself is approximately normal).
In this case

$$\mu = \bar{x} \pm t\, S_{\bar{x}}, \qquad S_{\bar{x}} = \frac{s}{\sqrt{n}}, \qquad s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$

where x̄ = sample mean
s = standard deviation of the sample (computed with n − 1 for small samples)
n = sample size
v = n − 1 degrees of freedom.
The value of t is obtained from Student's t distribution tables for the required confidence level
and v degrees of freedom.

Example
A random sample of 12 items is taken and is found to have a mean weight of 50 grams and a
standard deviation of 9 grams. Estimate the mean weight of the population
a) with 95% confidence
b) with 99% confidence

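Solution

$$\bar{x} = 50, \qquad s = 9, \qquad v = n - 1 = 12 - 1 = 11, \qquad S_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{9}{\sqrt{12}}$$

$$\mu = \bar{x} \pm t\, S_{\bar{x}}$$

At the 95% confidence level (t = 2.201 for 11 degrees of freedom)

$$\mu = 50 \pm 2.201\left(\frac{9}{\sqrt{12}}\right) = 50 \pm 5.72 \text{ grams}$$

Therefore we can state with 95% confidence that the population mean is between 44.28 and
55.72 grams.

At the 99% confidence level (t = 3.106 for 11 degrees of freedom)

$$\mu = 50 \pm 3.106\left(\frac{9}{\sqrt{12}}\right) = 50 \pm 8.07 \text{ grams}$$

Therefore we can state with 99% confidence that the population mean is between 41.93 and
58.07 grams.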

Note: To use the t distribution tables it is important to find the degrees of freedom (v = n − 1). In
the example above v = 12 − 1 = 11.
From the tables, at the 95% confidence level (the 0.05 column) and 11 degrees of freedom, t = 2.201;
at the 99% confidence level (the 0.01 column), t = 3.106.
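A minimal Python sketch of the small-sample interval, assuming the t values are read from tables as above (the standard library has no t distribution; if SciPy is available, `scipy.stats.t.ppf(0.975, 11)` returns the same 2.201). `t_interval` is a hypothetical name, and the raw weights at the end are made-up values used only to show that `statistics.stdev` already divides by n − 1.

```python
import math
import statistics


def t_interval(xbar, s, n, t):
    """xbar +/- t * s / sqrt(n), with t read from Student's t tables at n - 1 df."""
    se = s / math.sqrt(n)
    return xbar - t * se, xbar + t * se


# Values from the example: n = 12, xbar = 50 g, s = 9 g, v = 11.
for level, t in ((95, 2.201), (99, 3.106)):
    low, high = t_interval(50, 9, 12, t)
    print(level, round(low, 2), round(high, 2))
# 95 44.28 55.72
# 99 41.93 58.07

# From raw data, statistics.stdev uses the n - 1 divisor of the formula for s.
weights = [48, 52, 50, 47, 53]  # hypothetical measurements
print(round(statistics.stdev(weights), 2))  # 2.55
```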
