CH Ii Business Stat
CHAPTER TWO
STATISTICAL ESTIMATIONS
Introduction
The sampling distribution of the mean shows how far sample means could be from a known
population mean. Similarly, the sampling distribution of the proportion shows how far sample
proportions could be from a known population proportion. In estimation, our aim is to determine
how far an unknown population mean could be from the mean of a simple random sample
selected from that population; or how far an unknown population proportion could be from a
sample proportion. These are the concerns of statistical inference, in which a statement about an
unknown population parameter is derived from information contained in a random sample
selected from the population.
One aspect of inferential statistics is estimation, which is the process of estimating the value of a
parameter from information obtained from a sample. Because the populations of interest are
usually large, the values obtained are only estimates of the true parameters and are
derived from data collected from samples. An important question in estimation is that of sample
size. How large should the sample be in order to make an accurate estimate? This question is not
easy to answer since the size of the sample depends on several factors, such as the accuracy
desired and the probability of making a correct estimate. Inferential statistical techniques have
various assumptions that must be met before valid conclusions can be obtained. One common
assumption is that the samples must be randomly selected. The other common assumption is that
either the sample size must be greater than or equal to 30 or the population must be normally or
approximately normally distributed if the sample size is less than 30. To check this assumption,
you can draw a histogram to see whether it is approximately bell-shaped, check for
outliers, and, if possible, generate a normal quantile plot and see whether the points fall close to a
straight line. (Note: An area of statistics called nonparametric statistics does not require the
variable to be normally distributed.) Some statistical techniques are called robust. This means
that the distribution of the variable can depart somewhat from normality, and valid conclusions
can still be obtained. The statistical procedures for estimating the population mean, proportion,
variance, and standard deviation, and for determining how large a sample is needed, are explained
in this chapter.
Estimation is the process of using statistics as estimates of parameters. It is any procedure
in which sample information is used to estimate or predict the numerical value of some
population measure (called a parameter).
Statistical inference is the process of using limited information, a sample, for the purpose
of reaching conclusions about a larger set of data, the population.
A statistic is a summary measure that is computed to describe a characteristic of only a
sample of the population.
An estimator is any sample statistic that is used to estimate a population parameter,
e.g. $\bar{x}$ for $\mu$, $\hat{p}$ for $p$.
An estimate is a specific numerical value of our estimator, e.g. $\bar{x}$ = 9, 2, 5.
AMU Department of Accounting and Finance Page 1 of 26
BUSINESS Statistics chapter TWO 2014
$\bar{x}$, $\hat{p}$, $s^2$, $s$ ............ estimators
$\mu$, $p$, $\sigma^2$, $\sigma$ ............ items being estimated (parameters)
1, 0.5, 9, 3 ............ estimates
In inferential statistics, μ is called the true population mean and p is called the true population
proportion. There are many other population parameters, such as the median, mode, variance,
and standard deviation.
Examples of estimation:
Mean fuel consumption for a particular model of a car
Average time taken by new employees to learn a job
Mean housing expenditure per month incurred by households
If we could conduct a census each time we wanted to find the value of a population parameter,
estimation procedures would not be needed.
For example, if the Ethiopian Census Bureau could contact every household in Ethiopia to find
the mean housing expenditure of households, the survey would actually be a census.
However, conducting a census:
is too expensive,
is very time consuming, and
is virtually impossible, since every member of a population would have to be contacted.
The estimation procedure involves the following steps.
Select a sample.
Collect the required information from the members of the sample.
Calculate the value of the sample statistic.
Assign value(s) to the corresponding population parameter.
Types of estimates:
We can make two types of estimates about a population: a point estimate and an interval
estimate.
A. Point estimate
A point estimate is a specific (single) numerical value used to estimate a parameter. It is a
single value that is measured from a sample and used as an estimate of the corresponding
population parameter. The best point estimate of the population mean $\mu$ is the sample mean $\bar{X}$.
The most important point estimates (given that they are single values) are:
o Sample mean ($\bar{x}$) for population mean ($\mu$);
o Sample proportion ($\hat{p}$) for population proportion ($p$);
o Sample variance ($s^2$) for population variance ($\sigma^2$).
For the random sample 1, 2, 4, 5, 7, 11, the sample mean is
$$\bar{X} = \frac{\sum X}{n} = \frac{30}{6} = 5$$
The estimator is $\bar{X}$, and 5 is the point estimate of the unknown population mean.
Point estimate of the population proportion
The sample proportion $\hat{p}$ is an estimator of the population proportion $p$. The population
proportion is equal to the number of elements in the population belonging to the category of
interest, $X$, divided by the total number of elements in the population, $N$:
$$p = \frac{X}{N}, \qquad \hat{p} = \frac{x}{n}$$
where $x$ is the number of elements in the sample found to belong to the category of interest and
$n$ is the sample size.
The above sample contains two even numbers, 2 and 4. Calling the even numbers successes, the
sample proportion of success is
$$\hat{p} = \frac{2}{6} = \frac{1}{3}$$
The statistic $\hat{p}$ is an estimator of the unknown population proportion of success, and
$\frac{1}{3}$ is a point estimate of the population proportion.
Point estimate of the unknown population standard deviation
We will use the symbol $S$ to mean an estimate of the unknown population standard deviation $\sigma$.
The estimator, called the sample standard deviation, is defined by the formula
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
where $\bar{X}$ = sample mean and $n$ = sample size.
For the random sample 1, 2, 4, 5, 7, 11, write the symbol for and compute the sample standard
deviation.
Solution
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{(1-5)^2 + (2-5)^2 + (4-5)^2 + (5-5)^2 + (7-5)^2 + (11-5)^2}{6 - 1}} = \sqrt{\frac{66}{5}} = 3.633$$
Point Estimator of Standard Error of the Mean
The standard error of the mean is computed by the formula $\sigma_{\bar{X}} = \sigma / \sqrt{n}$ when the
sample size is less than 5% of the population size. In our case, the total size of the population
is unknown; therefore it is safer to assume that the sample is less than 5% of the entire
population. Hence, we will use the estimator $S / \sqrt{n}$ to estimate the standard error
$\sigma_{\bar{X}}$. The symbol $S_{\bar{X}}$ is called the sample standard error of the mean. The formula for
$S_{\bar{X}}$ is
$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
where $S$ = sample standard deviation and $n$ = sample size.
Thus, $S$ is the estimator for $\sigma$, and $S_{\bar{X}}$ is the estimator for $\sigma_{\bar{X}}$.
We have calculated $S = 3.633$ for the random sample 1, 2, 4, 5, 7, 11. The sample standard
error can be obtained using the formula
$$S_{\bar{X}} = \frac{S}{\sqrt{n}} = \frac{3.633}{\sqrt{6}} = 1.483$$
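The point estimates computed by hand above can be checked with a short script. This is a minimal sketch using only the Python standard library; the variable names are illustrative and not part of the original notes.

```python
import math
import statistics

# The random sample used in the worked examples above.
sample = [1, 2, 4, 5, 7, 11]

n = len(sample)
x_bar = statistics.mean(sample)      # point estimate of the population mean
s = statistics.stdev(sample)         # sample standard deviation (divides by n - 1)
se_mean = s / math.sqrt(n)           # sample standard error of the mean
# Sample proportion of "successes", where a success is an even number.
p_hat = sum(1 for x in sample if x % 2 == 0) / n

print(x_bar, round(s, 3), round(se_mean, 3), round(p_hat, 3))
```

Note that `statistics.stdev` uses the n - 1 divisor, which matches the formula for S given in the text; `statistics.pstdev` would divide by n instead.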
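A Point Estimate of the Sample Standard Error of the Proportion
The standard error of the proportion answers how far an unknown population proportion could be
from a sample proportion. It is estimated by
$$S_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
where $\hat{p}$ = sample proportion of success, $\hat{q} = 1 - \hat{p}$, and $n$ = sample size.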
Example 2.2
Let an even number be a success, and suppose a random sample of 200 numbers is selected from a
population and is found to contain 120 even numbers. Write the symbol for and compute the value
of the point estimator of the standard error of the proportion.
Here $\hat{p} = 120/200 = 0.6$ and $\hat{q} = 1 - 0.6 = 0.4$, so
$$S_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}} = \sqrt{\frac{0.6 \times 0.4}{200}} = 0.0346$$
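The same calculation can be written as a small helper. This is a sketch only; the function name `se_proportion` is a hypothetical name chosen for illustration.

```python
import math

def se_proportion(successes: int, n: int) -> float:
    """Estimated standard error of a sample proportion, sqrt(p_hat * q_hat / n)."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    return math.sqrt(p_hat * q_hat / n)

# Example 2.2: 120 even numbers in a sample of 200.
se = se_proportion(120, 200)
print(round(se, 4))
```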
The following table shows some population parameters and their estimators.
Quantity                      Population parameter    Sample statistic
Mean                          $\mu$                   $\bar{X}$
Standard deviation            $\sigma$                $S$
Variance                      $\sigma^2$              $S^2$
Proportion                    $p$                     $\hat{p}$
Standard error of the mean    $\sigma_{\bar{X}}$      $S_{\bar{X}}$
B. Interval Estimate
As stated above, the sample mean will be, for the most part, somewhat different from the
population mean due to sampling error. Therefore, you might ask a second question: How good
is a point estimate? The answer is that there is no way of knowing how close a particular point
estimate is to the population mean. This answer places some doubt on the accuracy of point
estimates. For this reason, statisticians prefer another type of estimate, called an interval
estimate.
An interval estimate of a parameter is an interval or a range of values used to estimate the
parameter. It is a range of values that conveys the fact that estimation is an uncertain
process. This estimate may or may not contain the value of the parameter being estimated.
An interval estimate states the range within which a population parameter probably lies. Stated
differently, an interval estimate is a range of values within which the analyst can declare with
some confidence that the population parameter will fall. The interval within which a population
parameter is expected to lie is usually referred to as the confidence interval. The probability that
a parameter lies within the specified interval estimate of the parameter is called the confidence level.
A confidence interval is a specific interval estimate of a parameter determined by using data
obtained from a sample and the specific confidence level of the estimate. The confidence interval
for the population mean is the interval that has a high probability of containing the population
mean.
In an interval estimate, the parameter is specified as being between two values. For example,
for the earlier sample with $\bar{x} = 5$, if we state that the mean $\mu$ lies within $\bar{x} \pm 2$,
the range of values from 3 (5 - 2) to 7 (5 + 2) is an interval estimate.
Either the interval contains the parameter or it does not. A degree of confidence (usually a
percent) can be assigned before an interval estimate is made. Three common confidence intervals
are used: the 90%, the 95%, and the 99% confidence intervals.
A 95% confidence interval means that about 95% of the similarly constructed intervals will
contain the parameter being estimated. If we use the 99% confidence interval we expect about
99% of the intervals to contain the parameter being estimated.
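The statement that about 95% of similarly constructed intervals contain the parameter can be illustrated by simulation. The sketch below is an illustration only: the population values (mean 50, standard deviation 10), the sample size, and the number of trials are assumptions made for the example, and the interval used is the known-σ interval $\bar{X} \pm z_{\alpha/2}(\sigma/\sqrt{n})$.

```python
import math
import random
from statistics import NormalDist

def coverage(mu, sigma, n, level, trials, seed=0):
    """Fraction of simulated confidence intervals that contain the true mean mu."""
    rng = random.Random(seed)                      # fixed seed for repeatability
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)  # critical value z_{alpha/2}
    margin = z * sigma / math.sqrt(n)              # margin of error, sigma known
    hits = 0
    for _ in range(trials):
        # Draw one random sample from a normal population and compute its mean.
        x_bar = sum(rng.gauss(mu, sigma) for _ in range(n)) / n
        if x_bar - margin < mu < x_bar + margin:
            hits += 1
    return hits / trials

# Hypothetical population: mean 50, standard deviation 10 (assumed values).
rate = coverage(mu=50.0, sigma=10.0, n=30, level=0.95, trials=2000)
print(rate)
```

The printed rate should be close to 0.95 but will rarely equal it exactly, because each run covers only a finite number of samples.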
[Figure: distribution of sample means $\bar{X}$, with a margin of $z_{\alpha/2}(\sigma/\sqrt{n})$ marked on each side of the center.]
When n ≥ 30, s can be substituted for σ, but a different distribution is used.
The margin of error, also called the maximum error of the estimate, is the maximum likely
difference between the point estimate of a parameter and the actual value of the parameter. The
examples that follow explain the margin of error in more detail and illustrate the computation
of confidence intervals.
Assumptions for Finding a Confidence Interval for a Mean When σ is Known
1. The sample is a random sample.
2. Either n ≥ 30 or the population is normally distributed if n <30.
Example 2.3
A researcher wishes to estimate the number of days it takes an automobile dealer to sell a
Chevrolet Aveo. A sample of 50 cars had a mean time on the dealer’s lot of 54 days. Assume the
population standard deviation to be 6.0 days. Find the best point estimate of the population mean
and the 95% confidence interval of the population mean.
Solution
The best point estimate of the mean is 54 days. For the 95% confidence interval use z =1.96.
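$$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
$$54 - 1.96\left(\frac{6.0}{\sqrt{50}}\right) < \mu < 54 + 1.96\left(\frac{6.0}{\sqrt{50}}\right)$$
$$54 - 1.7 < \mu < 54 + 1.7$$
$$52.3 < \mu < 55.7 \quad \text{or} \quad 54 \pm 1.7$$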
Hence one can say with 95% confidence that the interval between 52.3 and 55.7 days does
contain the population mean, based on a sample of 50 automobiles.
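The interval in Example 2.3 can be reproduced with a few lines of code. This is a minimal sketch: the function name `z_interval` is an assumption for illustration, and the critical value comes from `statistics.NormalDist` rather than a printed table, so it is 1.95996... instead of the rounded 1.96.

```python
import math
from statistics import NormalDist

def z_interval(x_bar, sigma, n, level):
    """Confidence interval for a mean when the population sigma is known."""
    z = NormalDist().inv_cdf(1 - (1 - level) / 2)   # z_{alpha/2}
    margin = z * sigma / math.sqrt(n)               # margin of error
    return x_bar - margin, x_bar + margin

# Example 2.3: n = 50 cars, sample mean 54 days, sigma = 6.0 days, 95% level.
low, high = z_interval(x_bar=54, sigma=6.0, n=50, level=0.95)
print(round(low, 1), round(high, 1))
```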
Example 2.4
A survey of 30 emergency room patients found that the average waiting time for treatment was
174.3 minutes. Assuming that the population standard deviation is 46.5 minutes, find the best
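point estimate of the population mean and the 99% confidence interval of the population mean.
Solution
The best point estimate is 174.3 minutes. The 99% confidence interval is
$$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
$$174.3 - 2.58\left(\frac{46.5}{\sqrt{30}}\right) < \mu < 174.3 + 2.58\left(\frac{46.5}{\sqrt{30}}\right)$$
$$174.3 - 21.9 < \mu < 174.3 + 21.9$$
$$152.4 < \mu < 196.2$$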
Step 3 Find $z_{\alpha/2}$. Subtract 0.05 from 1.000 to get 0.9500. The corresponding z value obtained
from the table is 1.65. (Note: This value is found by using the z value for an area between
0.9495 and 0.9505. A more precise z value obtained mathematically is 1.645 and is
sometimes used; however, 1.65 will be used.)
Step 4 Substitute in the formula
$$\bar{X} - z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right) < \mu < \bar{X} + z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$$
$$11.091 - 1.65\left(\frac{14.405}{\sqrt{30}}\right) < \mu < 11.091 + 1.65\left(\frac{14.405}{\sqrt{30}}\right)$$
$$11.091 - 4.339 < \mu < 11.091 + 4.339$$
$$6.752 < \mu < 15.430 \quad \text{or} \quad 11.091 \pm 4.339$$
Hence, one can be 90% confident that the population mean of the assets of all credit unions is
between $6.752 million and $15.430 million, based on a sample of 30 credit unions. Another
way of looking at a confidence interval is shown below.
According to the central limit theorem, approximately 95% of the sample means fall within 1.96
standard errors of the population mean if the sample size is 30 or more, or if σ is known
when n is less than 30 and the population is normally distributed. If it were possible to build a
confidence interval about each sample mean for µ, 95% of these intervals would contain the
population mean. Hence, you can be 95% confident that an interval built around a
specific sample mean would contain the population mean. If you desire to be 99% confident, you
must enlarge the confidence intervals so that 99 out of every 100 intervals contain the population
mean.
Since other confidence intervals (besides 90, 95, and 99%) are sometimes used in statistics, an
explanation of how to find the values for Zα/2 is necessary. As stated previously, the Greek letter
α represents the total of the areas in both tails of the normal distribution. The value for α is found
by subtracting the decimal equivalent for the desired confidence level from 1. For example, if
you wanted to find the 98% confidence interval, you would change 98% to 0.98 and find α = 1 -
0.98, or 0.02. Then α/2 is obtained by dividing α by 2. So α/2 is 0.02/2, or 0.01. Finally, Z0.01 is
the Z value that will give an area of 0.01 in the right tail of the standard normal distribution
curve. See the following curves.
[Figure: standard normal curve with $-z_{\alpha/2}$ and $z_{\alpha/2}$ marked. Caption: Finding α/2 for a 98% Confidence Interval]
In the cumulative standard normal distribution table, the area corresponding to z = 2.33
is 0.9901. Once α/2 is determined, the corresponding Zα/2 value can be found by using the
following procedure. To get the Zα/2 value for a 98% confidence interval, subtract 0.01 from
1.0000 to get 0.9900. Next, locate the area that is closest to 0.9900 (in this case, 0.9901) in the
table and then find the corresponding Z value. In this example, it is 2.33. For confidence intervals,
only the positive z value is used in the formula.
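The table lookup described above can also be done numerically. The sketch below assumes Python 3.8 or later, where `statistics.NormalDist` provides an inverse cumulative distribution function; the helper name `z_critical` is illustrative.

```python
from statistics import NormalDist

def z_critical(level: float) -> float:
    """Positive critical value z_{alpha/2} for a two-sided confidence level."""
    alpha = 1 - level                 # total area in both tails
    # Area to the left of z_{alpha/2} is 1 - alpha/2.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.95, 0.98, 0.99):
    print(level, round(z_critical(level), 2))
```

Values computed this way can differ slightly from table values in the last digit. For example, the 90% value is 1.645 to three decimals, while the table-based value used in this chapter is 1.65.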
When the original variable is normally distributed and σ is known, the standard normal
distribution can be used to find confidence intervals regardless of the size of the sample. When
n ≥ 30, the distribution of means will be approximately normal even if the original distribution of
the variable departs from normality.
When σ is unknown, s can be used as an estimate of σ, but a different distribution is used for the
critical values. This method is explained in the following section.
Confidence estimate of µ, normal population, σ unknown
When σ is known and the sample size is 30 or more, or the population is normally distributed if
the sample size is less than 30, the confidence interval for the mean can be found by using the z
distribution: we look up the Z value for α/2 and use the formula
$$\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
to estimate the interval within which the population mean lies with confidence coefficient C.
However, most of the time, the value of σ is not known, so it must be estimated by using s, namely,
the standard deviation of the sample.
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
After calculating the standard deviation, the standard error must be computed using the following
formula.
$$S_{\bar{X}} = \frac{S}{\sqrt{n}}$$
When the population standard deviation is known, the interval estimate is based on the statistic
$$Z = \frac{\bar{X} - \mu}{\sigma_{\bar{X}}}$$
When s is used, especially when the sample size is small, critical values greater than the values
for $z_{\alpha/2}$ are used in confidence intervals in order to keep the interval at a given level, such
as 95%. These values are taken from the Student t distribution, most often called the t
distribution, which was first described by W. S. Gosset in the early 1900s.
To use this method, the samples must be simple random samples, and the population from which
the samples were taken must be normally or approximately normally distributed, or the sample
size must be 30 or more. Some important characteristics of the t distribution are described below.
Characteristics of the t Distribution
The t distribution shares some characteristics of the normal distribution and differs from it in
others. The t distribution is similar to the standard normal distribution in the following ways:
1. It is bell-shaped.
2. It is symmetric about the mean.
3. The mean, median, and mode are equal to 0 and are located at the center of the distribution.
4. The curve never touches the x axis.
The t distribution differs from the standard normal distribution in the following ways:
1. The variance is greater than 1.
2. The t distribution is actually a family of curves based on the concept of degrees of freedom,
which is related to sample size.
3. As the sample size increases, the t distribution approaches the standard normal distribution.
When the population standard deviation is unknown, we estimate it with the sample standard
deviation, and the resulting statistic follows a Student's t distribution rather than the normal
distribution. There is a different t distribution for each sample size.
The t distribution is discussed in greater detail under hypothesis testing. Tail areas for the t
distribution are presented according to a parameter called degrees of freedom. Many statistical distributions use
the concept of degrees of freedom, and the formulas for finding the degrees of freedom vary for
different statistical tests. The degrees of freedom are the number of values that are free to vary
after a sample statistic has been computed, and they tell the researcher which specific curve to
use when a distribution consists of a family of curves.
For example, if the mean of 5 values is 10, then 4 of the 5 values are free to vary. But once 4
values are selected, the fifth value must be a specific number to get a sum of 50, since 50 ÷5 =10.
Hence, the degrees of freedom are 5-1=4, and this value tells the researcher which t curve to use.
See the following curve
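Solution
$$n = 10, \qquad \bar{X} = \frac{\sum X}{n} = \frac{95}{10} = 9.5$$
$$S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{94.5}{9}} = 3.24$$
The confidence level is 95%. Therefore, the significance level is α = 1 - C = 1 - 0.95 = 0.05 and
α/2 = 0.025.
Next, we calculate the degrees of freedom for the observations, which is given as ν = n - 1 =
10 - 1 = 9.
We can now calculate the interval as
$$\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} \le \mu \le \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}$$
where $t_{\alpha/2,\nu}$ in this specific situation means $t_{0.025,\,9} = 2.26$.
Therefore the interval can be calculated as:
$$9.5 - 2.26\left(\frac{3.24}{\sqrt{10}}\right) \le \mu \le 9.5 + 2.26\left(\frac{3.24}{\sqrt{10}}\right)$$
$$7.2 \le \mu \le 11.8$$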
Example 2.8
Ten randomly selected people were asked how long they slept at night. The mean time was 7.1
hours, and the standard deviation was 0.78 hour. Find the 95% confidence interval of the mean
time. Assume the variable is normally distributed.
Solution
Since σ is unknown and s must replace it, the t distribution table must be used for the confidence
interval. Hence, with 9 degrees of freedom, $t_{\alpha/2} = 2.262$. The 95% confidence interval can be
found by substituting in the formula.
$$\bar{X} - t_{\alpha/2,\nu}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2,\nu}\frac{S}{\sqrt{n}}$$
$$7.1 - 2.262\left(\frac{0.78}{\sqrt{10}}\right) < \mu < 7.1 + 2.262\left(\frac{0.78}{\sqrt{10}}\right)$$
$$6.54 < \mu < 7.66$$
Therefore, one can be 95% confident that the population mean is between 6.54 and 7.66 hours.
Example 2.9
The data represent a sample of the number of home fires started by candles for the past several
years. (Data are from the National Fire Protection Association.) Find the 99% confidence
interval for the mean number of home fires started by candles each year.
5460 5900 6090 6310 7160 8440 9930
Solution
Step 1 Find the mean and standard deviation for the data. The mean X = 7041.4. The standard
deviation s =1610.3.
Step 2 Find tα/2 in table. Use the 99% confidence interval with d.f. =6. It is 3.707.
Step 3 Substitute in the formula and solve.
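$$\bar{X} - t_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + t_{\alpha/2}\frac{S}{\sqrt{n}}$$
$$7041.4 - 3.707\left(\frac{1610.3}{\sqrt{7}}\right) < \mu < 7041.4 + 3.707\left(\frac{1610.3}{\sqrt{7}}\right)$$
$$4785.2 < \mu < 9297.6$$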
One can be 99% confident that the population mean number of home fires started by candles
each year is between 4785.2 and 9297.6, based on a sample of home fires occurring over a period
of 7 years.
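The steps of Example 2.9 can be sketched in code as follows. The Python standard library has no t distribution, so the critical value 3.707 is passed in as read from the t table in the text; the function name `t_interval` is an assumption for illustration.

```python
import math
import statistics

# Number of home fires started by candles, from Example 2.9.
fires = [5460, 5900, 6090, 6310, 7160, 8440, 9930]

# Critical value t_{alpha/2} for 99% confidence and 6 degrees of freedom,
# taken from the t table as in the text (not computed here).
T_CRIT = 3.707

def t_interval(data, t_crit):
    """Confidence interval for a mean when sigma is unknown (uses s)."""
    n = len(data)
    x_bar = statistics.mean(data)
    s = statistics.stdev(data)            # sample standard deviation, n - 1 divisor
    margin = t_crit * s / math.sqrt(n)    # margin of error
    return x_bar - margin, x_bar + margin

low, high = t_interval(fires, T_CRIT)
print(round(low, 1), round(high, 1))
```

Because the code carries full precision for the mean and standard deviation, the endpoints may differ in the first decimal place from the hand calculation, which uses rounded intermediate values.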
Students sometimes have difficulty deciding whether to use Zα/2 or tα/2 values when finding
confidence intervals for the mean. As stated previously, when σ is known, Zα/2 values can be
used no matter what the sample size is, as long as the variable is normally distributed or n ≥ 30.
When σ is unknown and n ≥ 30, then s can be used in the formula and tα/2 values can be used.
Finally, when σ is unknown and n < 30, s is used in the formula and tα/2 values are used, as long
as the variable is approximately normally distributed.
It may sometimes be difficult to know whether the population is normally distributed. Hence,
we may need to use an approximation. Recall the central limit theorem, which states that as the
sample size increases, the sampling distribution of the mean approaches the normal
distribution. In practice, for n greater than or equal to 30, statisticians use the normal
distribution. Hence, we can use the central limit theorem to construct an interval estimate for a
mean when the sample size is 30 or more.
A one-sided confidence interval can be found for a mean by using
$$\mu > \bar{x} - t_{\alpha}\frac{S}{\sqrt{n}} \quad \text{or} \quad \mu < \bar{x} + t_{\alpha}\frac{S}{\sqrt{n}}$$
where $t_{\alpha}$ is the value found under the row labeled One tail.
Determination of Sample Size
Sample size determination is closely related to statistical estimation. One reason behind sampling
is to reduce the cost of data collection. If we conduct a census, the cost we incur to collect
data will be prohibitively high. Therefore, we have to take a small sample to hold costs down. On
the other hand, we want the sample to be large enough to provide a good estimate of the population
parameter. Consequently, the issue is how large the sample should be. The size of the
sample depends on three factors:
How precise or narrow we want the interval estimate to be
How confident we want to be that the interval estimate is correct
How variable is the population being sampled
The higher the desired precision or level of confidence, the larger will be the sample; also for a
given precision and level of confidence, the larger the population variability is, the larger will be
the sample. Quite often you ask, how large a sample is necessary to make an accurate estimate?
The answer is not simple, since it depends on three things: the margin of error, the population
standard deviation, and the degree of confidence. For example, how close to the true mean do
you want to be (2 units, 5 units, etc.), and how confident do you wish to be (90, 95, 99%, etc.)?
For the purposes of this section, it will be assumed that the population standard deviation of the
variable is known or has been estimated from a previous study.
Sample Size for Estimating a Population Mean
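The confidence interval estimate of μ,
$$\bar{X} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{X} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
can be rewritten as $\bar{X} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$, which can be expressed as $\bar{X} \pm e$.
Therefore, the formula for sample size is derived from the margin of error formula
$$e = z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
and this formula is solved for n as follows:
$$e\sqrt{n} = z_{\alpha/2}\,\sigma$$
$$\sqrt{n} = \frac{z_{\alpha/2}\,\sigma}{e}$$
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2 = \frac{z_{\alpha/2}^2\,\sigma^2}{e^2}$$
Minimum Sample Size Needed for an Interval Estimate of the Population Mean
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2 = \frac{z_{\alpha/2}^2\,\sigma^2}{e^2}$$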
Where e is the margin of error. If necessary, round the answer up to obtain a whole number.
That is, if there is any fraction or decimal portion in the answer, use the next whole number
for sample size n
Example 2.10
A scientist wishes to estimate the average depth of a river. He wants to be 99% confident that the
estimate is accurate within 2 feet. From a previous study, the standard deviation of the depths
measured was 4.33 feet.
Solution Since α = 0.01 (or 1 - 0.99), $z_{\alpha/2} = 2.58$ and e = 2. Substituting in the formula,
$$n = \left(\frac{z_{\alpha/2}\,\sigma}{e}\right)^2 = \left(\frac{(2.58)(4.33)}{2}\right)^2 = 31.2$$
Round the value 31.2 up to 32. Therefore, to be 99% confident that the estimate is within 2 feet
of the true mean depth, the scientist needs at least a sample of 32 measurements. In most cases in
statistics, we round off. However, when determining sample size, we always round up to the next
whole number.
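The sample size rule, including the round-up step, can be sketched as a short function. The name `sample_size_mean` is a hypothetical helper, and the critical value is passed in as read from the z table.

```python
import math

def sample_size_mean(z: float, sigma: float, e: float) -> int:
    """Minimum sample size for estimating a mean to within margin of error e."""
    n = (z * sigma / e) ** 2
    return math.ceil(n)        # always round up to the next whole number

# River depth example: 99% confidence (z = 2.58), sigma = 4.33 feet, e = 2 feet.
n = sample_size_mean(z=2.58, sigma=4.33, e=2)
print(n)
```

Using `math.ceil` rather than `round` reflects the rule stated above: rounding 31.2 down to 31 would give a margin of error slightly larger than the 2 feet requested.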
As can be seen from the above formula, there is a direct relationship between sample size and
variation in the population: the greater the variability, the larger the sample size.
The variation of the population, however, is usually not known before sampling. If there is
historical evidence of the variance, it can be used. But most of the time neither the population
variance nor the sample variance is known. In that case we estimate σ from the range, using the
formula:
$$\sigma \approx \frac{\text{official's high value} - \text{official's low value}}{4}$$
Example 2.11
A sample is to be taken to estimate the mean salary of plumbers to within birr 500 with a
confidence coefficient of 0.99. A plumbers' union official states that birr 40,000 and birr 26,000
would be unusually large and small salaries for plumbers in the union. What should the sample
size be?
Solution
The sample size formula is
$$n = \frac{z_{\alpha/2}^2\,\sigma^2}{e^2}$$
but we first need to find σ.
$$\sigma \approx \frac{\text{official's high value} - \text{official's low value}}{4} = \frac{40{,}000 - 26{,}000}{4} = 3500$$
Therefore, with $z_{\alpha/2} = 2.58$ for a confidence coefficient of 0.99,
$$n = \left(\frac{2.58(3500)}{500}\right)^2 = 326.16$$
which is rounded up to 327.
Confidence interval estimate for population proportion
A USA TODAY Snapshots feature stated that 12% of the pleasure boats in the United States
were named Serenity. The parameter 12% is called a proportion. It means that of all the pleasure
boats in the United States, 12 out of every 100 are named Serenity. A proportion represents a
part of a whole. It can be expressed as a fraction, decimal, or percentage. In this case, 12% =
0.12 =12/100 or 3/25. Proportions can also represent probabilities. In this case, if a pleasure boat
is selected at random, the probability that it is called Serenity is 0.12.
Proportions can be obtained from samples or populations. The following symbols will be used.
$p$ = population proportion and $q = 1 - p$
$\hat{p}$ (read “p hat”) = sample proportion
The sample proportion is
$$\hat{p} = \frac{\text{number of successes in a sample of size } n}{n}, \qquad \hat{q} = 1 - \hat{p}$$
For a point estimate of $p$ (the population proportion), $\hat{p}$ (the sample
proportion) is used. On the basis of the three properties of a good estimator, $\hat{p}$ is unbiased,
consistent, and relatively efficient. But as with means, one is not able to decide how good the
point estimate of $p$ is. Therefore, statisticians also use an interval estimate for a proportion, and
they can assign a probability that the interval will contain the population proportion.
The confidence interval for a particular $p$ is based on the sampling distribution of $\hat{p}$. When the
sample size n is no more than 5% of the population size, the sampling distribution of $\hat{p}$ is
approximately normal, provided n is sufficiently large.
To construct an interval estimate for a population proportion we use the normal distribution. You may
recall from the discussion of probability distributions that the number of successes in a sample
follows a binomial distribution. However, the binomial distribution is difficult to construct
intervals with. Hence, we will use the normal distribution, with the assumption that if the sample
size is sufficiently large, the distribution of $\hat{p}$ is nearly normal. The rule of thumb is that
n is considered to be large if both np and nq are at least 5. The rule assumes that we know the
population proportions of successes and failures. In reality, however, these are not known, so we
estimate them with the sample proportions of successes and failures. The rule of thumb we shall
follow is that both $n\hat{p} \ge 5$ and $n\hat{q} \ge 5$.
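To construct a confidence interval about a proportion, you must use the margin of error, which is
$$E = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
The sample standard error of the proportion is
$$S_{\hat{p}} = \sqrt{\frac{\hat{p}\hat{q}}{n}}$$
Then, using $S_{\hat{p}}$ as an estimator of $\sigma_{\hat{p}}$, we can calculate the interval estimate as:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
provided that $n\hat{p}$ and $n\hat{q}$ are each at least 5.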
$$0.23 - 1.65\sqrt{\frac{(0.23)(0.77)}{1404}} < p < 0.23 + 1.65\sqrt{\frac{(0.23)(0.77)}{1404}}$$
$$0.23 - 0.019 < p < 0.23 + 0.019$$
$$0.211 < p < 0.249$$
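$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} < p < \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.159 - 1.96\sqrt{\frac{(0.159)(0.841)}{1721}} < p < 0.159 + 1.96\sqrt{\frac{(0.159)(0.841)}{1721}}$$
$$0.142 < p < 0.176$$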
Hence, you can say with 95% confidence that the true percentage is between 14.2% and 17.6%.
Example 2.15
A random sample of 400 members of labour force in a five state region showed that 32 were
unemployed. Construct the 95% confidence interval for the proportion unemployed in the region.
Solution
$$\hat{p} = \frac{32}{400} = 0.08$$
With C = 95%, α = 0.05 and α/2 = 0.025.
Find $z_{0.025}$ from the statistical table. To find the Z value, search for the probability in the
main body of the Z table and read off the corresponding Z score. In our case that will be 1.96.
Therefore, the interval estimate can be calculated as:
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
$$0.08 - 1.96\sqrt{\frac{(0.08)(0.92)}{400}} \le p \le 0.08 + 1.96\sqrt{\frac{(0.08)(0.92)}{400}}$$
$$0.053 \le p \le 0.107$$
Consequently, with 95% confidence, we state the population proportion to be between 0.053 and
0.107, that is, between 5.3% and 10.7%.
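The calculation in Example 2.15 can be sketched as follows. The function name `proportion_interval` is an assumption for illustration; it also checks the rule of thumb that $n\hat{p}$ and $n\hat{q}$ are each at least 5 before using the normal approximation.

```python
import math

def proportion_interval(successes: int, n: int, z: float):
    """Large-sample confidence interval for a population proportion."""
    p_hat = successes / n
    q_hat = 1 - p_hat
    # Rule of thumb from the text: normal approximation needs n*p_hat, n*q_hat >= 5.
    if n * p_hat < 5 or n * q_hat < 5:
        raise ValueError("sample too small for the normal approximation")
    margin = z * math.sqrt(p_hat * q_hat / n)   # margin of error
    return p_hat - margin, p_hat + margin

# Example 2.15: 32 unemployed out of 400, 95% confidence (z = 1.96 from the table).
low, high = proportion_interval(successes=32, n=400, z=1.96)
print(round(low, 3), round(high, 3))
```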
Sample Size for Estimating Population Proportions
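To find the sample size needed to determine a confidence interval about a proportion, start from
the confidence interval for $p$,
$$\hat{p} - z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}} \le p \le \hat{p} + z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
which shows that the interval extends from $\hat{p} - z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$ to
$\hat{p} + z_{\alpha/2}\sqrt{\hat{p}\hat{q}/n}$, so we can express this as:
$$\hat{p} \pm z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
The interval will be more precise, or narrower, the smaller the term that follows ±. This term is
called the margin of error and is indicated by e.
$$e = z_{\alpha/2}\sqrt{\frac{\hat{p}\hat{q}}{n}}$$
If we solve for n, we get the following formula:
$$n = \hat{p}\hat{q}\left(\frac{z_{\alpha/2}}{e}\right)^2 = \frac{z_{\alpha/2}^2\,\hat{p}\hat{q}}{e^2}$$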
There are two situations to consider. First, if some approximation of $\hat{p}$ is known (e.g., from a
previous study), that value can be used in the formula. Second, if no approximation of $\hat{p}$ is known,
you should use $\hat{p} = 0.5$. This value will give a sample size sufficiently large to guarantee an
accurate prediction, given the confidence level and the error of estimate. The reason is that
when $\hat{p}$ and $\hat{q}$ are each 0.5, the product $\hat{p}\hat{q}$ is at its maximum, as shown in the
table below. If the existing information leads to the belief that the population proportion is
between two values, proceed as follows: if both values are on the same side of 0.5, choose as
$\hat{p}$ the value closer to 0.5; if 0.5 is between the two values, use 0.5 for $\hat{p}$.
$\hat{p}$     $\hat{q}$     $\hat{p}\hat{q}$
0.1    0.9    0.09
0.2    0.8    0.16
0.3    0.7    0.21
0.4    0.6    0.24
0.5    0.5    0.25
0.6    0.4    0.24
0.7    0.3    0.21
0.8    0.2    0.16
0.9    0.1    0.09
Example 2.16
A researcher wishes to estimate, with 95% confidence, the proportion of people who own a home
computer. A previous study shows that 40% of those interviewed had a computer at home. The
researcher wishes to be accurate within 2% of the true proportion. Find the minimum sample size
necessary.
Solution
Since Zα/2 =1.96, E = 0.02, P =0.40, and q =0.60, then
$$n = \hat{p}\hat{q}\left(\frac{z_{\alpha/2}}{E}\right)^2 = (0.40)(0.60)\left(\frac{1.96}{0.02}\right)^2 = 2304.96$$
which, when rounded up, is 2305 people to interview.
Example 2.17
A researcher wishes to estimate the percentage of M&M’s colors that are brown. He wants to be
95% confident and be accurate within 3% of the true proportion. How large a sample size would
be necessary?
Solution
Since no prior knowledge of P is known, assign a value of 0.5 and then q =1 - 0.5= 0.5.
Substitute in the formula, using E = 0.03.
$$n = pq\left(\frac{Z_{\alpha/2}}{E}\right)^2 = (0.50)(0.50)\left(\frac{1.96}{0.03}\right)^2 = 1067.11$$
Rounding up, a sample size of 1068 would be needed.
N.B. In determining the sample size with this formula, the size of the population does not enter the calculation. Only the degree of confidence, the margin of error, and an estimate of p (or 0.5 when none is available) are needed to make the determination.
Example 2.18
Suppose we want to estimate a population proportion to within 0.04, and we want a confidence coefficient of C = 0.90. How large a sample should we take?
Solution
We are given the confidence coefficient and the error. The population proportion that yields the safest (largest) sample size is 0.5. Therefore, the sample size can be calculated using the formula:
$$n = \frac{Z_{\alpha/2}^2\,pq}{e^2}$$
Next, we need to read the Z value for α/2, where α = 1 - 0.9 = 0.1. Therefore, α/2 = 0.05 and
$$Z_{\alpha/2} = Z_{0.05} = 1.64$$
$$n = \frac{(1.64)^2(0.5)(0.5)}{(0.04)^2} = 420.25$$
Therefore, rounding up, a sample of at least 421 is needed.
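The sample-size calculations in Examples 2.16 to 2.18 all follow the same formula, n = pq(Z/e)², rounded up. The sketch below is illustrative only: the function name sample_size_for_proportion is hypothetical, and the critical value z is passed in as read from the table (1.96 or 1.64) rather than computed, so that the results match the rounded values used in the text.

```python
import math


def sample_size_for_proportion(z, e, p=0.5):
    """Minimum sample size for estimating a proportion within margin e.

    z : critical value Z_{alpha/2} as read from the normal table
    e : desired margin of error (e.g. 0.02 for 2%)
    p : prior estimate of the proportion; 0.5 is the conservative default
    """
    if not 0 < p < 1:
        raise ValueError("p must be strictly between 0 and 1")
    if e <= 0:
        raise ValueError("e must be positive")
    q = 1 - p
    n = p * q * (z / e) ** 2
    # Always round up: rounding down would give less than the required precision.
    return math.ceil(n)


print(sample_size_for_proportion(1.96, 0.02, p=0.40))  # Example 2.16
print(sample_size_for_proportion(1.96, 0.03))          # Example 2.17
print(sample_size_for_proportion(1.64, 0.04))          # Example 2.18
```

A more precise table value such as 1.645 for 90% confidence would give a slightly larger n in Example 2.18, so the answer depends on how the critical value is rounded.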
Interval Estimation of the Difference between two independent Means
The unbiased point estimate of the difference between the means of two populations, $\mu_1 - \mu_2$, is the difference between the two sample means, $\bar{x}_1 - \bar{x}_2$, where each sample is a random sample taken from the respective target population. The confidence interval is constructed by adding to and subtracting from this point estimate a margin determined by the standard error of the difference between means and the desired confidence level.
Interval Estimation of μ1 − μ2: populations normal, σ known
If the two parent populations are normal, then the sampling distribution of the difference between two means is normally distributed regardless of the sample sizes. We can therefore estimate $\mu_1 - \mu_2$ (regardless of $n_1$ and $n_2$) using the following formula, given that $\sigma_1$ and $\sigma_2$ are known.
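$$\mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\,\sigma_{\bar{X}_1-\bar{X}_2}$$
where
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\sigma_{\bar{X}_1}^2 + \sigma_{\bar{X}_2}^2} = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$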
When $\sigma_1$ and $\sigma_2$ are not known, the standard error of the difference between two sample means, $\sigma_{\bar{x}_1-\bar{x}_2}$, is estimated by the sample standard error of the difference between two sample means,
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{S_{\bar{X}_1}^2 + S_{\bar{X}_2}^2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}},$$
and the interval estimation takes the following form:
$$\mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\,S_{\bar{X}_1-\bar{X}_2},$$
given that the sample sizes are large.
Example 2.19
In a sex discrimination case, an employee alleged that a large corporation paid men more than
women for comparable work. Let population 1 represent all male employees performing certain
jobs and population 2 represent all female employees performing comparable jobs at the
corporation. Independent samples are taken of $n_1 = 100$ males and $n_2 = 100$ females; the sample means are $\bar{x}_1$ = Birr 20,600 and $\bar{x}_2$ = Birr 19,700, and the sample standard deviations are $s_1$ = Birr 3,000 and $s_2$ = Birr 2,500. Construct a 95% confidence interval for $\mu_1 - \mu_2$. What do you conclude from this?
Solution:
Male employees            Female employees
n1 = 100                  n2 = 100
x̄1 = Birr 20,600          x̄2 = Birr 19,700
s1 = Birr 3,000           s2 = Birr 2,500
C = 0.95
Steps:
i. Calculate the (sample) standard error of the difference between two means
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = \sqrt{\frac{(3{,}000)^2}{100} + \frac{(2{,}500)^2}{100}} = \sqrt{152{,}500} = 390.51$$
ii. Compute α /2
α = 1-C = 1- 0.95 = 0.05
α/2 = 0.05/2 = 0.025
iii. Look up $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. Construct the confidence interval
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm Z_{\alpha/2}\,S_{\bar{X}_1-\bar{X}_2} = (20{,}600 - 19{,}700) \pm 1.96(390.51) = 900 \pm 765.40$$
$$134.60 \le \mu_1 - \mu_2 \le 1{,}665.40$$
We state with 95% confidence that the mean salary difference between the male and female workers lies between Birr 134.60 and Birr 1,665.40.
Because this interval contains only positive values, we can be quite confident that $\mu_1 - \mu_2 > 0$. Thus, it is reasonable to conclude that the mean salary for males exceeds the mean salary for females.
Note: The Z-based interval used here also applies when both sample sizes are large ($n_1 \ge 30$ and $n_2 \ge 30$), even if the parent populations are not normally distributed.
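The steps of Example 2.19 can be sketched in a few lines of code. This is an illustrative sketch, not part of the original notes: the function name diff_means_z_interval is hypothetical, and the critical value is computed with Python's statistics.NormalDist instead of being read from a table, so the limits can differ from the hand calculation in the second decimal place.

```python
import math
from statistics import NormalDist


def diff_means_z_interval(x1, x2, s1, s2, n1, n2, confidence=0.95):
    """Large-sample confidence interval for mu1 - mu2.

    Returns (lower, upper, standard_error). Uses the sample standard
    deviations s1 and s2 in place of the unknown sigmas, which assumes
    both samples are large (n >= 30).
    """
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    alpha = 1 - confidence
    # Z_{alpha/2}: the point with area alpha/2 in the upper tail.
    z = NormalDist().inv_cdf(1 - alpha / 2)
    diff = x1 - x2
    margin = z * se
    return diff - margin, diff + margin, se


lower, upper, se = diff_means_z_interval(20_600, 19_700, 3_000, 2_500, 100, 100)
print(f"standard error = {se:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

If both limits have the same sign, the interval excludes zero, which is the basis for the conclusion drawn in the example.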
Interval estimation of μ1 − μ2: populations normal, σ1 and σ2 unknown, n1 and n2 < 30
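When the sample sizes are small, the population standard deviations are unknown, and the population distributions are normal, we use the t-distribution to construct a confidence interval for $\mu_1 - \mu_2$. Moreover, to use a t-distribution we have to assume that the two variances (standard deviations) are equal. In short, to use a t-distribution for constructing a confidence interval for $\mu_1 - \mu_2$, we assume that both populations are normally distributed and that they share a common variance, $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Under this assumption the standard error of the difference between the two sample means can be written as
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}} = \sqrt{\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}.$$
If the variance $\sigma^2$ of the populations is known, $\mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm Z_{\alpha/2}\,\sigma_{\bar{X}_1-\bar{X}_2}$ can be used to develop the interval estimate of $\mu_1 - \mu_2$.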
However, in most cases, $\sigma^2$ is unknown; thus the two sample variances $s_1^2$ and $s_2^2$ must be used to develop the estimate of $\sigma^2$. Since
$$\sigma_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}}$$
is based on the assumption that $\sigma_1^2 = \sigma_2^2 = \sigma^2$, we do not need separate estimates of $\sigma_1^2$ and $\sigma_2^2$. In fact, we can combine the data from the two samples to provide the best single estimate of $\sigma^2$. The process of combining the results of two independent simple random samples to provide one estimate of $\sigma^2$ is referred to as pooling. The pooled estimator of the variance $\sigma^2$, denoted by $s_p^2$, is the weighted average of the two sample variances, $s_1^2$ and $s_2^2$, with the degrees of freedom associated with each sample being used as the weights. The formula for the pooled estimator of $\sigma^2$ is:
$$S_P^2 = \frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2} = \frac{\sum (X_{i1}-\bar{X}_1)^2 + \sum (X_{i2}-\bar{X}_2)^2}{n_1+n_2-2}$$
Where:
$S_P^2$ = pooled estimate of the variance
$n_1$ = sample size drawn from population 1
$n_2$ = sample size drawn from population 2
$S_1^2$ = sample variance of the sample drawn from population 1
$S_2^2$ = sample variance of the sample drawn from population 2
$n_1 + n_2 - 2$ = pooled degrees of freedom
Based on the assumption that the population standard deviations are equal, the standard error of the difference between means is estimated by the sample standard error of the difference between means:
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_P^2}{n_1} + \frac{S_P^2}{n_2}}$$
The confidence interval for $\mu_1 - \mu_2$ when the common standard deviation $\sigma_1 = \sigma_2 = \sigma$ is not known is based on the t-distribution, and is given by:
$$\mu_1 - \mu_2 = \bar{X}_1 - \bar{X}_2 \pm t_{\alpha/2,\,\nu}\,S_{\bar{X}_1-\bar{X}_2}$$
Where:
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{S_P^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$\nu$ = pooled degrees of freedom ($n_1 + n_2 - 2$)
Example 2.20
Two manufacturing companies produce drill tips that are used to cut holes in steel sheets. A customer wishing to know which drill tips have the longer life purchases independent samples of $n_1 = 20$ drill tips from company 1 and $n_2 = 15$ drill tips from company 2. The mean lives of the drill tips are $\bar{x}_1 = 78$ minutes and $\bar{x}_2 = 84$ minutes. The population variances are unknown but assumed to be equal. The sample variances are $s_1^2 = 41$ and $s_2^2 = 36$. Construct a 95% confidence interval for $\mu_1 - \mu_2$, assuming that the two populations are normally distributed.
Solution:
Company One               Company Two
n1 = 20 drill tips        n2 = 15 drill tips
x̄1 = 78 minutes           x̄2 = 84 minutes
S1² = 41                  S2² = 36
C = 0.95
i. Calculate the sample standard error of the difference between two means and the pooled degrees of freedom
$$S_{\bar{X}_1-\bar{X}_2} = \sqrt{\frac{S_P^2}{n_1} + \frac{S_P^2}{n_2}} = \sqrt{\frac{(n_1-1)S_1^2 + (n_2-1)S_2^2}{n_1+n_2-2}\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
$$= \sqrt{\frac{(20-1)41 + (15-1)36}{20+15-2}\left(\frac{1}{20} + \frac{1}{15}\right)} = \sqrt{\frac{1{,}283}{33}\left(\frac{1}{20} + \frac{1}{15}\right)} = 2.13$$
$$\nu = n_1 + n_2 - 2 = 20 + 15 - 2 = 33$$
ii. Compute α/2 and look up $t_{\alpha/2,\,\nu}$
α = 1 - C = 1 - 0.95 = 0.05
α/2 = 0.05/2 = 0.025
$t_{\alpha/2,\,\nu} = t_{0.025,\,33} = 2.04$
iii. Construct the confidence interval
$$\mu_1 - \mu_2 = (\bar{X}_1 - \bar{X}_2) \pm t_{\alpha/2,\,\nu}\,S_{\bar{X}_1-\bar{X}_2} = (78 - 84) \pm 2.04(2.13) = -6 \pm 4.34$$
$$-10.34 \le \mu_1 - \mu_2 \le -1.66$$
The 95% confidence interval is -10.34 to -1.66. This interval contains only negative values, indicating that the drill tips made by company 1 do not last as long, on average, as those made by company 2.
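The pooled-variance calculation in Example 2.20 can be sketched as follows. This is an illustration under stated assumptions: the function name pooled_t_interval is hypothetical, and the critical value t is passed in as read from the t table (2.04 in the example), since the Python standard library has no t-distribution.

```python
import math


def pooled_t_interval(x1, x2, var1, var2, n1, n2, t_crit):
    """Small-sample confidence interval for mu1 - mu2, equal variances assumed.

    var1, var2 : sample variances S1^2 and S2^2
    t_crit     : t_{alpha/2, nu} read from a t table, nu = n1 + n2 - 2
    Returns (lower, upper, standard_error, degrees_of_freedom).
    """
    df = n1 + n2 - 2
    # Pooled variance: the sample variances weighted by their degrees of freedom.
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    diff = x1 - x2
    margin = t_crit * se
    return diff - margin, diff + margin, se, df


lower, upper, se, df = pooled_t_interval(78, 84, 41, 36, 20, 15, t_crit=2.04)
print(f"df = {df}, standard error = {se:.2f}")
print(f"95% interval: {lower:.2f} to {upper:.2f}")
```

The sketch takes variances rather than standard deviations because that is how the example states its data.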
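For two population proportions, the point estimate of the difference $P_1 - P_2$ is the difference between the two sample proportions, $p_1 - p_2$, where each sample is a random sample taken from the respective target population. Moreover, based on the CLT, if $n_1 p_1$, $n_1 q_1$, $n_2 p_2$ and $n_2 q_2$ are all greater than 5, the sampling distribution of $p_1 - p_2$ is normal with
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{P_1 Q_1}{n_1} + \dfrac{P_2 Q_2}{n_2}}}$$
However, here $P_1$ and $P_2$ are unknown, and we estimate them by $p_1$ and $p_2$ respectively, and hence Z becomes:
$$Z = \frac{(p_1 - p_2) - (P_1 - P_2)}{\sqrt{\dfrac{p_1 q_1}{n_1} + \dfrac{p_2 q_2}{n_2}}}$$
That is, $\sigma_{p_1-p_2}$ is substituted by $S_{p_1-p_2}$. Solving for $P_1 - P_2$ gives
$$P_1 - P_2 = (p_1 - p_2) - Z\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}},$$
and since Z can assume both positive and negative values, it becomes:
$$P_1 - P_2 = (p_1 - p_2) \pm Z\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$
Since Z is determined by the confidence level, we write it as
$$P_1 - P_2 = (p_1 - p_2) \pm Z_{\alpha/2}\,S_{p_1-p_2}$$
Where:
$$S_{p_1-p_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$$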
Example 2.21
A TV executive is interested in determining whether the proportion of people who watch a late-night talk show is higher with the regular host or with a guest host. In a random sample of 400 people, 175 watch the show when the regular host is on. In an independent random sample of 500 people, 185 watch the show when a guest host is on. Calculate a 95% confidence interval for $P_1 - P_2$. What do you conclude?
Solution:
Regular host              Guest host
n1 = 400                  n2 = 500
X1 = 175                  X2 = 185
p1 = 0.4375               p2 = 0.37
q1 = 0.5625               q2 = 0.63
C = 0.95
i. Calculate the sample standard error of the difference between two proportions
$$S_{p_1-p_2} = \sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = \sqrt{\frac{(0.4375)(0.5625)}{400} + \frac{(0.37)(0.63)}{500}} = 0.033$$
ii. Compute α /2
α = 1-C = 1- 0.95 = 0.05
α/2 = 0.05/2 = 0.025
iii. Look up $Z_{\alpha/2} = Z_{0.025} = 1.96$
iv. Construct the confidence interval
$$P_1 - P_2 = (p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = (0.4375 - 0.37) \pm 1.96(0.033) = 0.0675 \pm 0.065$$
$$0.0025 \le P_1 - P_2 \le 0.1325$$
We state with 95% confidence that the true difference $P_1 - P_2$ is between 0.0025 and 0.1325. Since this interval contains only positive values, it is reasonable to say that the proportion of people who watch the show when the regular host is on is greater than when a guest host is on.
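The calculation in Example 2.21 can be sketched in code as follows. This is illustrative only: the function name diff_proportions_interval is hypothetical, and the critical value comes from Python's statistics.NormalDist rather than a table. Because the code does not round the standard error to 0.033, its limits differ slightly from the hand-computed ones.

```python
import math
from statistics import NormalDist


def diff_proportions_interval(x1, n1, x2, n2, confidence=0.95):
    """Large-sample confidence interval for P1 - P2.

    x1, x2 : number of "successes" observed in each sample
    n1, n2 : sample sizes
    Returns (lower, upper, standard_error).
    """
    p1, p2 = x1 / n1, x2 / n2
    q1, q2 = 1 - p1, 1 - p2
    se = math.sqrt(p1 * q1 / n1 + p2 * q2 / n2)
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # Z_{alpha/2}
    diff = p1 - p2
    margin = z * se
    return diff - margin, diff + margin, se


lower, upper, se = diff_proportions_interval(175, 400, 185, 500)
print(f"standard error = {se:.3f}")
print(f"95% interval: {lower:.4f} to {upper:.4f}")
```

The lower limit is close to zero, so the conclusion is sensitive to rounding in the intermediate steps.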