Measuring Public Opinion - Unit 6 Estimation


Unit 6: Estimation

If a probability sample was drawn, valid estimates can be computed,
and the accuracy of the estimates can also be computed (e.g., in the
form of margins of error).
To compute an estimate, an estimator is required.
An estimator is a procedure, a recipe, describing how to compute an
estimate.
The recipe also makes clear which ingredients are required. Of course,
the researcher must have the sample values of the target variables. An
estimator is only meaningful if it produces estimates that are close to
the population characteristics to be estimated.
Therefore, a good estimator must satisfy the following two conditions:
1) An estimator must be unbiased
Suppose the process of drawing a sample and computing an estimate is repeated a
large number of times. Because the samples are drawn at random, each repetition
yields a different sample and therefore, in general, a different estimate. The
estimator is unbiased if the average of all these estimates is equal to the population
characteristic to be estimated (the true population value). In other words, an
unbiased estimator does not systematically over- or underestimate the true
population value.

The term valid is related to the term unbiased.
A measuring instrument is called valid if it measures what it intends to measure.
Hence, an unbiased estimator is a valid measuring instrument, because it estimates
what it intends to estimate: the true population value.
2) An estimator must be precise.
All estimates (the values of the estimator over all samples that could
be drawn) obtained by repeatedly selecting a sample must be close to
each other. In other words: the variance of the estimator,
Var(estimator), i.e. the variation in its outcomes (the estimates),
must be as small as possible.
The term reliable is related to the term precise.
A measuring instrument is called reliable if repeated use leads to
(approximately) the same value. Hence, a precise estimator is a
reliable measurement instrument, because all possible estimates are
close to each other.
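The two conditions can be illustrated with a small simulation. The sketch below is an illustration only: the population, the sample sizes, and the three estimators (including a deliberately shifted one, `shifted_mean`) are hypothetical choices made here, not part of the course material.

```python
import random
import statistics

def sample_mean(sample):
    return sum(sample) / len(sample)

def shifted_mean(sample):
    # Hypothetical biased estimator: always overshoots by 2 units.
    return sample_mean(sample) + 2.0

def bias_and_variance(population, estimator, n, repetitions, rng):
    """Repeat 'draw a sample, compute an estimate' and summarise the estimates."""
    true_value = sample_mean(population)
    estimates = [estimator(rng.sample(population, n)) for _ in range(repetitions)]
    bias = sample_mean(estimates) - true_value   # close to 0 if unbiased
    variance = statistics.pvariance(estimates)   # small if precise
    return bias, variance

rng = random.Random(42)  # arbitrary seed, only for reproducibility
population = [rng.gauss(50, 10) for _ in range(10_000)]  # fictitious population

results = {
    "sample mean, n=100": bias_and_variance(population, sample_mean, 100, 2000, rng),
    "shifted mean, n=100": bias_and_variance(population, shifted_mean, 100, 2000, rng),
    "sample mean, n=10": bias_and_variance(population, sample_mean, 10, 2000, rng),
}
for name, (bias, variance) in results.items():
    print(f"{name:20s} bias = {bias:6.2f}  variance = {variance:6.2f}")
```

In this setup the shifted estimator is as precise as the sample mean but biased, while the sample mean with n = 10 is unbiased but less precise than with n = 100.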
Case study:
A poll was conducted to find out whether government employees
prefer to receive their salaries via ATM or not.
The population consists of 100,000 government employees living in
greater Cairo. A simple random sample of 100 was drawn from the
population.
Can we be sure that the percentage preferring ATM calculated from the
sample equals the true percentage in the population?

Not necessarily!
Let’s do an experiment to evaluate the results.
Assume that the true percentage of employees who would like to receive
their salaries via ATM is 62.5% of the population [note that in practice
this value is unknown].
Step 1: Draw a simple random sample of 100 employees and
calculate the percentage; assume that the result is 66.8%.
Step 2: Repeat step 1 many times (here 50 times). You will end up with
50 estimates that may differ from one another.
Estimated percentages from the 50 simulated simple random samples:

66.8 66.8 64.8 70.9 67.6
61.0 63.4 63.5 57.6 58.4
63.0 63.2 59.8 62.8 61.3
62.3 59.9 59.9 57.0 58.0
73.7 78.1 75.1 70.2 75.9
59.7 62.2 61.0 62.2 61.4
58.0 60.1 59.2 64.9 59.8
61.0 65.5 66.7 58.2 61.0
56.6 60.8 64.5 61.8 59.4
60.6 56.8 63.7 63.8 63.8
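The experiment can be repeated on a computer. The sketch below assumes the case-study figures (a population of 100,000, a true percentage of 62.5%, samples of 100, and 50 repetitions). The seed and the variable names are arbitrary choices made here. It also summarises the 50 estimates listed in the table, copied as they appear.

```python
import random
import statistics

TRUE_PERCENT = 62.5   # assumed true percentage (unknown in practice)
n = 100               # sample size stated in the case study
REPETITIONS = 50

# Fictitious population of 100,000: 1 = prefers ATM, 0 = does not.
population = [1] * 62_500 + [0] * 37_500

rng = random.Random(2024)  # arbitrary seed, only for reproducibility

def sample_percentage(rng, population, n):
    """Draw one simple random sample without replacement and return the percentage."""
    sample = rng.sample(population, n)
    return 100 * sum(sample) / n

simulated = [sample_percentage(rng, population, n) for _ in range(REPETITIONS)]

# The 50 estimates listed in the table.
reported = [
    66.8, 66.8, 64.8, 70.9, 67.6,
    61.0, 63.4, 63.5, 57.6, 58.4,
    63.0, 63.2, 59.8, 62.8, 61.3,
    62.3, 59.9, 59.9, 57.0, 58.0,
    73.7, 78.1, 75.1, 70.2, 75.9,
    59.7, 62.2, 61.0, 62.2, 61.4,
    58.0, 60.1, 59.2, 64.9, 59.8,
    61.0, 65.5, 66.7, 58.2, 61.0,
    56.6, 60.8, 64.5, 61.8, 59.4,
    60.6, 56.8, 63.7, 63.8, 63.8,
]

for label, values in (("simulated", simulated), ("reported", reported)):
    print(f"{label}: mean = {statistics.mean(values):.1f}, "
          f"min = {min(values):.1f}, max = {max(values):.1f}")
```

Individual estimates vary from sample to sample, but their average stays close to the true value of 62.5%, which is what unbiasedness means.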
Note: Everything that follows assumes that we draw a simple random sample (i.e.
probability sampling), without replacement (i.e. a unit that has been selected once is not
selected again), and that the population is infinite, i.e. "N" is very large.
Generally:
If the sample is a probability sample, and people are selected with equal probabilities
(i.e. if a simple random sample is drawn), then a population characteristic can be
estimated by computing the corresponding sample characteristic.

Estimating a population percentage (Proportion):


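If a simple random sample is drawn, then the sample percentage “p” is a good estimator
(i.e. unbiased and precise) for the population percentage “P”.

Under simple random sampling without replacement, the sample percentage “p” is an
unbiased estimator of the population percentage “P”; this can be proved mathematically.
The variance of “p” depends on the unknown “P”, so in practice it is estimated by
replacing “P” by “p”.
Because the estimates approximately follow a normal distribution, about 95% of them lie
within 1.96 standard errors of “P” (the value 1.96 comes from the standard normal
distribution). The margin of error “e” is therefore 1.96 times the estimated standard error.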
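For reference, the standard results behind these statements can be written as follows. This is a sketch in the usual textbook notation, assuming percentages on a 0 to 100 scale, population size N and sample size n; the hat notation and the symbol e for the margin of error are labels chosen here.

```latex
% Variance of the sample percentage p under simple random sampling without replacement
\operatorname{Var}(p) = \left(\frac{1}{n} - \frac{1}{N}\right)\frac{N}{N-1}\,P\,(100 - P)

% Estimated variance: replace P by p
\widehat{\operatorname{Var}}(p) = \left(\frac{1}{n} - \frac{1}{N}\right)\frac{n}{n-1}\,p\,(100 - p)
  \approx \frac{p\,(100 - p)}{n - 1} \quad \text{for very large } N

% Margin of error and 95% confidence interval
e = 1.96\,\sqrt{\widehat{\operatorname{Var}}(p)}, \qquad (p - e,\; p + e)
```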
A margin of error of 4.4 percentage points means that the percentage of university
students who prefer to have afternoon exams is expected to differ by no more than 4.4
points from its estimate of 40%, i.e., it lies somewhere between 35.6% and 44.4%.

This interval is called the confidence interval.
Its lower bound is obtained by subtracting the margin of error “e” from the estimate “p”;
its upper bound is obtained by adding the margin of error ”e” to the estimate “p”.
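As a sketch of this calculation, the snippet below computes the margin of error and the interval bounds for an estimate of 40%. It assumes a very large population and the usual estimate p(100 - p)/(n - 1) for the variance. The sample size n = 476 is a hypothetical value chosen here so that the margin comes out near 4.4; the text itself does not state the sample size.

```python
import math

def margin_of_error(p, n, z=1.96):
    """Margin of error in percentage points for a sample percentage p (0-100).

    Assumes simple random sampling from a very large population.
    """
    return z * math.sqrt(p * (100 - p) / (n - 1))

def confidence_interval(p, n, z=1.96):
    """Lower and upper bound: estimate minus / plus the margin of error."""
    e = margin_of_error(p, n, z)
    return p - e, p + e

p = 40.0   # estimate from the example
n = 476    # hypothetical sample size (not given in the text)

e = margin_of_error(p, n)
lower, upper = confidence_interval(p, n)
print(f"margin of error = {e:.1f}, interval = [{lower:.1f}, {upper:.1f}]")
```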

Statements about the margin of error or the confidence interval always have an element
of uncertainty.

The correct statement is that the confidence interval [lower limit, upper limit] contains
the true population percentage (or proportion) with a high probability, the confidence
level. Here this probability is equal to 95%. Therefore, it is better to call this interval
the 95% confidence interval (i.e. a confidence interval with a confidence level of 95%
and a significance level of 5%).
Use of a 95% confidence interval means that the confidence interval
contains the true population value with a probability of 95%.
This implies that in 5% of the cases, a wrong conclusion is drawn from
the confidence interval (one concludes that the true value lies in the
interval while this is not the case).
If a higher confidence level is required, one can, for example, compute
a 99% confidence interval. This interval is obtained by replacing the
value 1.96 in the above-mentioned formulae by the value 2.58.
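The multipliers 1.96 and 2.58 are quantiles of the standard normal distribution. A minimal sketch using Python's standard library follows; the helper name `z_value` is chosen here for illustration.

```python
from statistics import NormalDist

def z_value(confidence_level):
    """Two-sided standard normal multiplier for a given confidence level."""
    alpha = 1 - confidence_level              # significance level
    return NormalDist().inv_cdf(1 - alpha / 2)

z95 = z_value(0.95)
z99 = z_value(0.99)
print(f"95%: {z95:.2f}   99%: {z99:.2f}")
```

A higher confidence level gives a larger multiplier and therefore a wider interval for the same sample.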
Estimating a population mean:

If a simple random sample was drawn, the sample mean (X-Bar) is a good estimator (i.e.
precise and unbiased) for the population mean (MU).

The sample mean (X-bar) is an unbiased estimator of the population mean (MU) (under
simple random sampling without replacement). This can be proved mathematically, but it
can also be shown by carrying out a simulation:
construct a fictitious population,
select many samples,
compute the estimate for each sample, and,
see how the estimates behave.
Assuming an infinite population (i.e. “N” is very large), 1/N is approximately zero.
That is why the estimated variance of X-bar, (1/n - 1/N) * S^2, reduces to (1/n) * S^2.
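The simulation steps and the variance formula can be checked together. In the sketch below, the fictitious population, the seed, and the sample size are arbitrary choices made here. It compares the variance of the simulated sample means with (1/n - 1/N) * S^2 and with the large-population approximation S^2 / n, where S^2 is the population variance computed with divisor N - 1.

```python
import random
import statistics

rng = random.Random(7)  # arbitrary seed, only for reproducibility

# 1) Construct a fictitious population.
N = 5_000
population = [rng.uniform(0, 100) for _ in range(N)]
mu = sum(population) / N
S2 = statistics.variance(population)   # divisor N - 1

# 2) Select many samples and 3) compute the estimate for each sample.
n = 50
estimates = []
for _ in range(4_000):
    sample = rng.sample(population, n)  # simple random sample without replacement
    estimates.append(sum(sample) / n)

# 4) See how the estimates behave.
simulated_mean = sum(estimates) / len(estimates)
simulated_var = statistics.pvariance(estimates)
exact_var = (1 / n - 1 / N) * S2        # finite-population formula
approx_var = S2 / n                     # approximation for very large N

print(f"population mean {mu:.2f}, average of estimates {simulated_mean:.2f}")
print(f"simulated variance {simulated_var:.2f}, "
      f"formula {exact_var:.2f}, approximation {approx_var:.2f}")
```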
Determining the sample size:

It is not so simple to determine the sample size as it depends on a
number of different factors. It was already shown that there is a
relationship between the precision of estimators and the sample size:
the larger the sample, the more precise the estimators. Therefore, the
question about the sample size can only be answered if it is clear how
precise the estimators must be. Once the precision has been
specified, the sample size can be computed. A very high precision
nearly always requires a large sample. However, a large poll will also
be costly and time consuming. Therefore, the sample size will in many
practical situations be a compromise between costs and precision.
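Once the required precision is fixed, the sample size for estimating a percentage follows by solving the margin-of-error formula for n. The sketch below uses the standard large-population result n = (z / e)^2 * P(100 - P), with an optional correction for a finite population of size N. This formula is not stated in the text above, and the function name and the default P = 50 (the most conservative choice) are assumptions made here.

```python
import math

def required_sample_size(e, p=50.0, z=1.96, N=None):
    """Smallest n giving a margin of error of at most e percentage points.

    e : desired margin of error (percentage points)
    p : anticipated population percentage (50 gives the largest n)
    z : multiplier for the confidence level (1.96 for 95%)
    N : population size; None means 'very large'
    """
    n0 = (z / e) ** 2 * p * (100 - p)
    if N is not None:
        # Finite-population correction for sampling without replacement.
        n0 = n0 / (1 + (n0 - 1) / N)
    return math.ceil(n0)

print(required_sample_size(3))              # very large population
print(required_sample_size(3, N=100_000))   # population of 100,000
print(required_sample_size(1.5))            # halving the margin of error
```

Halving the margin of error roughly quadruples the required sample size, which is why very high precision quickly becomes costly.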
Sample size should be determined given the following points:
1) Nature of universe: The universe may be either homogeneous or
heterogeneous in nature. If the items of the universe are
homogeneous, a small sample can serve the purpose. But if the
items are heterogeneous, a large sample would be required.
Technically, this can be termed the dispersion factor.
2) Number of classes proposed: If many class-groups (groups and
sub-groups) are to be formed, a large sample would be required
because a small sample might not be able to give a reasonable
number of items in each class-group.
3) Nature of study: If items are to be intensively and continuously
studied, the sample should be small. For a general survey the size
of the sample should be large, but a small sample is considered
appropriate in technical surveys.
4) Type of sampling: Sampling technique plays an important part in
determining the size of the sample. A small random sample is apt
to be much superior to a larger but badly selected sample.
5) Standard of accuracy and acceptable confidence level: If the
standard of accuracy or the level of precision is to be kept high, we
shall require a relatively larger sample.
6) Availability of finance: In practice, the size of the sample depends
upon the amount of money available for the study. This
factor should be kept in view while determining the size of the sample,
because large samples increase the cost of obtaining the
estimates.
7) Other considerations: The nature of the units, the size of the population,
the size of the questionnaire, the availability of trained investigators, the
conditions under which the survey is being conducted, and the time
available for completion of the study are a few other
considerations to which a researcher must pay attention while
selecting the size of the sample.
Examples of free sample size calculator tools
▪ ClinCalc LLC. Sample Size Calculator: https://clincalc.com/stats/SampleSize.aspx
• A free online sample size calculator
▪ Epi Info™ is free software that can be downloaded from the Centers for Disease Control and
Prevention (CDC) website at: https://www.cdc.gov/epiinfo.
• Watch the Epi Info™ 7 Tutorial Videos.
▪ OpenEpi.com: https://www.openepi.com
• An open-source web tool that provides epidemiologic statistics.
▪ PS: Power and Sample Size Calculation:
https://biostat.app.vumc.org/wiki/Main/PowerSampleSize
• A free interactive program for performing power and sample size calculations
▪ StatCalc: https://www.cdc.gov/epiinfo/user-guide/statcalc/statcalcintro.html
• A utility tool in Epi Info™ and statistical calculator that produces summary epidemiologic information.
• Six types of calculations are available including Sample Size and Power calculations for Population Survey, Cohort
or Cross-Sectional, and Unmatched Case-Control.
▪ STEPS Sample Size Calculator and Sampling Spreadsheet
References:
Bethlehem (2017) Chapter 6.

Sampling Distributions: Introduction to the Concept
https://www.youtube.com/watch?v=Zbw-YvELsaM

The Sampling Distribution of the Sample Proportion
https://www.youtube.com/watch?v=fuGwbG9_W1c
Thank you
