Chapter 2


Statistical Method

May 30, 2023


2. Estimation and sampling distribution of mean, proportion
and variance

Statistical inference refers to drawing conclusions about a population parameter from sample information.
A sample statistic summarizes the information contained in a sample and can be used to make inferences about a population parameter.
Statistical inference can be made through either estimation or hypothesis testing.
Estimation: the procedure of estimating a population parameter by a corresponding statistic computed from a sample.
The two approaches to estimation are:
i. Point estimation: a procedure for estimating a parameter by a single numerical value computed from a sample.



ii. Interval estimation: any point estimator may under- or overestimate the true value of the parameter, so the parameter is estimated by the point estimate plus or minus an allowance for error.
Parameter = statistic ± E, where E is a measure of error.
Sampling with and without replacement
i. Sampling with replacement: selection of n units from a population of N units in such a way that each selected unit is returned to the population before the next draw.
The selected units are independent of each other.
There are $N^n$ possible samples of size n from a population of size N in sampling with replacement.



ii. Sampling without replacement: selection of n units from a population of size N in such a way that a selected unit is not returned to the population before the next draw.
There are $\binom{N}{n}$ possible samples of size n from a population of size N in sampling without replacement (when the order of selection is ignored).
Example: Select a sample of 2 units from a population consisting of 3, 4 and 7.
Solution: Sampling with replacement gives $N^n = 3^2 = 9$ possible samples: (3, 3), (3, 4), (3, 7), (4, 3), (4, 4), (4, 7), (7, 3), (7, 4), (7, 7).
Sampling without replacement gives $\binom{N}{n} = \binom{3}{2} = 3$ possible samples: (3, 4), (3, 7), (4, 7).
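The two counting rules can be checked by listing the samples directly. The following is a minimal Python sketch, assuming ordered draws for sampling with replacement and unordered draws for sampling without replacement, as in the example; the variable names are illustrative only.

```python
from itertools import combinations, product

population = [3, 4, 7]
n = 2

# With replacement: ordered draws, a unit may repeat -> N**n samples.
with_replacement = list(product(population, repeat=n))

# Without replacement, order ignored -> C(N, n) samples.
without_replacement = list(combinations(population, n))

print(len(with_replacement), with_replacement)
print(len(without_replacement), without_replacement)
```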



If we compute the sample means of the samples above, we get different values.
Therefore the sample mean is a random variable.
Sampling distribution
A sampling distribution is the probability distribution of all possible values of a statistic for samples of a given size drawn from the same population.
That is, the distribution of all possible values of a given statistic, computed from all possible samples of the same size randomly drawn from the same population, is called the sampling distribution of that statistic.



The three important characteristics of a sampling distribution are:
Its mean ($\mu_{\bar{x}}$),
Its variance ($\sigma_{\bar{x}}^2$), and
Its functional form, which describes how it looks when graphed (i.e. normally distributed or not).
Sampling distribution of the sample mean
It is the probability distribution of all possible sample means for samples of a given size n from the same population.
Example: The weights in kg of the three judges of a certain court are 79, 81 and 86 for judges A, B and C respectively. Select a sample of 2 judges from the population of three and construct the sampling distribution of the sample mean.
Note: The population mean body weight is µ.



$\mu = \frac{79+81+86}{3} = 82$. The population variance is $\sigma^2 = \frac{\sum_{i=1}^{N}(x_i-\mu)^2}{N} = \frac{26}{3} = 8.667$.
i. Sampling with replacement
Sample   Weights (kg)   Sample mean (kg)
A,A      79,79          79
A,B      79,81          80
B,A      81,79          80
B,B      81,81          81
B,C      81,86          83.5
C,B      86,81          83.5
A,C      79,86          82.5
C,A      86,79          82.5
C,C      86,86          86



Sampling distribution of the sample mean ($\bar{x}$)
Sample mean ($\bar{x}$)   79    80    81    82.5   83.5   86
P($\bar{x}$)              1/9   2/9   1/9   2/9    2/9    1/9
$E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = 79(1/9) + 80(2/9) + 81(1/9) + \dots + 86(1/9) = 82$
$E(\bar{x}^2) = \sum \bar{x}^2\,p(\bar{x}) = 79^2(1/9) + 80^2(2/9) + 81^2(1/9) + \dots + 86^2(1/9) = 6728.333$
$\mathrm{Var}(\bar{x}) = E(\bar{x}^2) - [E(\bar{x})]^2 = 6728.333 - 82^2 = 4.333$
For sampling with replacement:
$E(\bar{x}) = \mu = 82$ and $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n} = \frac{8.667}{2} = 4.333$
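The agreement between the enumerated variance and $\sigma^2/n$ can be reproduced with exact fractions. This is a small sketch, assuming all nine ordered samples are equally likely; the names `weights`, `e_mean` and `var_mean` are illustrative.

```python
from fractions import Fraction
from itertools import product

weights = {"A": 79, "B": 81, "C": 86}
n = 2
N = len(weights)

# Population mean and variance (divisor N, as in the text).
mu = Fraction(sum(weights.values()), N)
sigma2 = sum((w - mu) ** 2 for w in weights.values()) / N

# Means of all N**n ordered samples drawn with replacement.
means = [Fraction(sum(weights[j] for j in s), n)
         for s in product(weights, repeat=n)]
e_mean = sum(means) / len(means)
var_mean = sum(m ** 2 for m in means) / len(means) - e_mean ** 2

print(e_mean, var_mean, sigma2 / n)
```

The exact variance is 13/3, which is the 4.333 shown above.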



ii. Sampling without replacement
Sample   Weights (kg)   Sample mean ($\bar{x}$)
A,B      79,81          80
B,C      81,86          83.5
A,C      79,86          82.5
Sampling distribution of the sample mean ($\bar{x}$)
Sample mean ($\bar{x}$)   80    82.5   83.5
P($\bar{x}$)              1/3   1/3    1/3
For sampling without replacement:
$E(\bar{x}) = \sum \bar{x}\,p(\bar{x}) = \frac{1}{3}(80) + \frac{1}{3}(82.5) + \frac{1}{3}(83.5) = \frac{246}{3} = 82$
$E(\bar{x}^2) = \sum \bar{x}^2\,p(\bar{x}) = \frac{1}{3}(80^2) + \frac{1}{3}(82.5^2) + \frac{1}{3}(83.5^2) = \frac{20178.5}{3} = 6726.1667$



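$\mathrm{Var}(\bar{x}) = E(\bar{x}^2) - [E(\bar{x})]^2 = 6726.1667 - 82^2 = 2.1667$
For sampling without replacement from a finite population, we have:
$E(\bar{x}) = \mu = 82$ and $\mathrm{Var}(\bar{x}) = \frac{N-n}{N-1}\cdot\frac{\sigma^2}{n} = \frac{3-2}{3-1}\cdot\frac{8.667}{2} = 2.1667$
Note: The term $\frac{N-n}{N-1}$ is called the finite population correction factor.
If the population size N is very large relative to n, then $\frac{N-n}{N-1} \approx 1$. In that case $\mathrm{Var}(\bar{x}) = \frac{\sigma^2}{n}$.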
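The finite population correction can be checked the same way for the three judges. A minimal sketch, assuming the three unordered samples are equally likely; `fpc` is an illustrative name for the correction factor.

```python
from fractions import Fraction
from itertools import combinations

weights = [79, 81, 86]
N, n = len(weights), 2

mu = Fraction(sum(weights), N)
sigma2 = sum((w - mu) ** 2 for w in weights) / N

# Means of all C(N, n) unordered samples drawn without replacement.
means = [Fraction(sum(s), n) for s in combinations(weights, n)]
e_mean = sum(means) / len(means)
var_mean = sum(m ** 2 for m in means) / len(means) - e_mean ** 2

# Finite population correction applied to sigma^2 / n.
fpc = Fraction(N - n, N - 1)
print(e_mean, var_mean, fpc * sigma2 / n)
```

Both routes give 13/6, the 2.1667 obtained above.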
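Suppose $x_1, x_2, \dots, x_n$ is a random (independent) sample from a population with mean µ and variance σ². Then
$E(\bar{x}) = E\left[\frac{\sum_{i=1}^{n} x_i}{n}\right] = \frac{1}{n}E\left(\sum_{i=1}^{n} x_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(x_i) = \frac{1}{n}\sum_{i=1}^{n}\mu = \frac{1}{n}\cdot n\mu = \mu$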



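If the expected value of an estimator equals the population parameter, the estimator is called an unbiased estimator.
Since $E(\bar{x}) = \mu$, $\bar{x}$ is an unbiased estimator of µ.
The variance of $\bar{x}$ is:
$\mathrm{Var}(\bar{x}) = \mathrm{Var}\left[\frac{\sum_{i=1}^{n} x_i}{n}\right] = \frac{1}{n^2}\mathrm{Var}\left(\sum_{i=1}^{n} x_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\mathrm{Var}(x_i)$,
because the $x_i$'s are independent, i.e. $\mathrm{Cov}(x_i, x_j) = 0$ for $i \neq j$,
$= \frac{1}{n^2}\sum_{i=1}^{n}\sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}$
The standard deviation of the sample mean ($\bar{x}$) is called the standard error.
$S.E(\bar{x}) = \sqrt{\frac{\sigma^2}{n}} = \frac{\sigma}{\sqrt{n}}$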



Generally, we can summarize the sampling distribution of the sample mean under the following three conditions:
Sampling from a normally distributed population with a known population variance σ²
The sampling distribution will be normal with mean µ and variance σ²/n, i.e. $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$ and
$Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \sim N(0,1)$
Sampling from a normally distributed population with unknown variance
The means will have an approximately normal distribution with mean µ and variance $S^2/n$ when the sample size is large.
The standardized mean $\frac{\bar{x}-\mu}{S/\sqrt{n}}$ will have a t-distribution with df = n − 1 when the sample size is small.



Sampling from a population with a non-normal or unknown distribution
The means will have an approximately normal distribution, as stated by the central limit theorem.
Suppose a random sample $x_1, x_2, \dots, x_n$ is drawn from a population with mean µ and variance σ²;
then the sampling distribution of the sample mean $\bar{x}$ is approximated by a normal distribution with mean µ and variance σ²/n as n gets larger (n > 30): $\bar{x} \sim N(\mu, \frac{\sigma^2}{n})$
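A quick simulation can illustrate the central limit theorem for a clearly non-normal population. The sketch below is only an illustration under assumed settings: an exponential population with mean 1 and variance 1, sample size 50, and 2000 repeated samples; none of these values come from the text.

```python
import random
import statistics

random.seed(0)   # fixed seed so the sketch is reproducible

n = 50           # sample size (assumed for illustration)
reps = 2000      # number of repeated samples (assumed)

# Exponential population with rate 1: mean 1, variance 1, skewed.
sample_means = [
    statistics.fmean(random.expovariate(1.0) for _ in range(n))
    for _ in range(reps)
]

print(statistics.fmean(sample_means))      # should be close to mu = 1
print(statistics.variance(sample_means))   # should be close to sigma^2/n = 0.02
```

A histogram of `sample_means` would look roughly bell-shaped even though the population itself is skewed.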



Example: A sample of size 100 is selected from a population having a mean of 200 and a standard deviation of 50.
a) What is the probability that the sample mean will be within ± 5 of the population mean?
b) What is the probability that the sample mean will be greater than 190?
Solution: a) Since n = 100 is large, by the central limit theorem $\bar{x} \sim N(200, \frac{2500}{100})$, so the standard error is $\sqrt{25} = 5$.
$P(\mu - 5 \le \bar{x} \le \mu + 5) = P\left(\frac{(200-5)-200}{5} \le Z \le \frac{(200+5)-200}{5}\right) = P(-1 \le Z \le 1) = 2P(0 \le Z \le 1) = 2(0.3413) = 0.6826$
b) Similarly,
$P(\bar{x} > 190) = P\left(Z > \frac{190-200}{5}\right) = P(Z > -2) = P(-2 \le Z \le 0) + 0.5 = P(0 \le Z \le 2) + 0.5 = 0.4772 + 0.5 = 0.9772$
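The two table look-ups can be cross-checked with Python's standard library. A minimal sketch, assuming the normal approximation from the central limit theorem; `xbar_dist` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 200, 50, 100
xbar_dist = NormalDist(mu, sigma / sqrt(n))   # N(200, 5^2) by the CLT

p_within_5 = xbar_dist.cdf(mu + 5) - xbar_dist.cdf(mu - 5)
p_above_190 = 1 - xbar_dist.cdf(190)

print(round(p_within_5, 4), round(p_above_190, 4))
```

The first value differs from the table-based 0.6826 only in the fourth decimal, because the table entry 0.3413 is itself rounded.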



Point estimation of population mean
The best unbiased estimator of the population mean µ is the sample mean $\bar{x}$, given by: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
Example: The ages in years of 10 randomly selected regular students of JU are: 20, 19, 23, 18, 22, 19, 20, 21, 22 and 26. Estimate the mean age of all regular students in JU.
Solution: The population mean is estimated by the sample mean.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{20+19+\dots+26}{10} = 21$
Therefore the average age of regular students in JU is estimated to be 21 years. In point estimation, we have the following terms.



Estimator: a formula that uses sample information to estimate a parameter.
Estimate: a specific numerical value obtained from the information in a specific sample.
So, in the above example, $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$ is an estimator of µ and $\bar{x} = 21$ is an estimate of µ.
Properties of point estimators
A good point estimator should have the following properties:
✓ Unbiasedness
✓ Consistency
✓ Efficiency



✠ A given estimator is unbiased if its expected value (the mean of its sampling distribution) is equal to the parameter it tries to estimate.
That is, if $\hat{\theta}$ is a point estimator of θ, then $\hat{\theta}$ is an unbiased estimator of θ if $E(\hat{\theta}) = \theta$.
However, if $E(\hat{\theta}) \neq \theta$, then $\hat{\theta}$ is biased.
✠ An unbiased estimator is called efficient if it has the minimum variance compared with any other unbiased estimator.
That is, if $\hat{\theta}_1$ and $\hat{\theta}_2$ are unbiased estimators of θ, then $\hat{\theta}_1$ is the more efficient estimator if $\mathrm{Var}(\hat{\theta}_1) \le \mathrm{Var}(\hat{\theta}_2)$.
✠ An unbiased estimator is called consistent if its value tends to the value of the parameter as n increases.
That is, $\hat{\theta}$ is a consistent estimator of θ if $\hat{\theta} \to \theta$ as $n \to \infty$.



Interval estimation of population mean
i. When the population variance is known
The 100(1 − α)% CI for µ is given by:
µ = x̄ ± E. To determine E, we use the sampling distribution of x̄. If the parent population is normally distributed with known variance σ², we have:
$Z = \frac{\bar{x}-\mu}{\sigma/\sqrt{n}}$
Z can be either positive or negative, so $\mu = \bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n}$.
In this case the allowance for error is $Z_{\alpha/2}\,\sigma/\sqrt{n}$.
The determinants of the error term are:
✓ Sample size
✓ Population variance
✓ Z-value



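If α is the probability that µ lies outside $(\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n})$, then we have:
$P(-Z_{\alpha/2} \le Z \le Z_{\alpha/2}) = 1-\alpha$
$\Rightarrow P\left(-Z_{\alpha/2} \le \frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le Z_{\alpha/2}\right) = 1-\alpha$
$\Rightarrow P(-Z_{\alpha/2}\,\sigma/\sqrt{n} \le \bar{x}-\mu \le Z_{\alpha/2}\,\sigma/\sqrt{n}) = 1-\alpha$
$\Rightarrow P(-Z_{\alpha/2}\,\sigma/\sqrt{n} - \bar{x} \le -\mu \le Z_{\alpha/2}\,\sigma/\sqrt{n} - \bar{x}) = 1-\alpha$
$\Rightarrow P(\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n} \ge \mu \ge \bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n}) = 1-\alpha$
$\Rightarrow P(\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n} \le \mu \le \bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n}) = 1-\alpha$
This shows that the 100(1 − α)% CI for µ is
$(\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n},\ \bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n})$, where $\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n}$ is the lower confidence limit and $\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n}$ is the upper confidence limit.
The value $Z_{\alpha/2}$ is read from the standard normal table using $P(0 \le Z \le Z_{\alpha/2}) = 0.5 - \alpha/2$.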



Example: A random sample of 25 packets of sugar was drawn from the production line of a certain sugar factory. Assume that the distribution of the weight of packets is normal with variance 81 gm². The mean weight of the selected packets is 2015 gm. Find a 90% CI for the mean weight.
Solution: Given n = 25, σ² = 81, x̄ = 2015, and $\frac{\sigma}{\sqrt{n}} = \frac{9}{5} = 1.8$ gm.
100(1 − α)% = 90%
1 − α = 0.9
α = 0.1 ⇒ α/2 = 0.05
$P(0 \le Z \le Z_{\alpha/2}) = 0.5 - 0.05 = 0.45$
This implies $Z_{\alpha/2} = \frac{1.64+1.65}{2} = 1.645$
⇒ The lower confidence limit is $\bar{x} - Z_{\alpha/2}\,\sigma/\sqrt{n} = 2015 - 1.645(1.8) = 2012.04$
⇒ The upper confidence limit is $\bar{x} + Z_{\alpha/2}\,\sigma/\sqrt{n} = 2015 + 1.645(1.8) = 2017.96$



Therefore the 90% CI for µ is (2012.04, 2017.96).
Interpretation: We are 90% confident that the true mean weight of packets of sugar produced by the factory is between 2012.04 gm and 2017.96 gm.
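The interval can be reproduced programmatically. The sketch below assumes a normal population with known σ; `z_interval` is a hypothetical helper name, and the critical value comes from the standard normal quantile function rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

def z_interval(xbar, sigma, n, confidence):
    """Two-sided CI for the mean when the population sigma is known."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)   # Z_{alpha/2}
    margin = z * sigma / sqrt(n)
    return xbar - margin, xbar + margin

lower, upper = z_interval(xbar=2015, sigma=9, n=25, confidence=0.90)
print(round(lower, 2), round(upper, 2))
```

Calling the same function with `confidence=0.95` gives a wider interval, since the critical value increases.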
Note:
✓ As the confidence level gets larger, the CI gets wider, and vice versa.
✓ As the level of significance gets larger, the CI gets narrower, and vice versa.
Exercise: Construct a 95% CI for µ for the above example.



ii. When the population variance σ² is unknown
If σ² is unknown, it can be estimated by the sample variance $S^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$.
In this case, if the sample size is small (n ≤ 30), then $\frac{\bar{x}-\mu}{S/\sqrt{n}} \sim t_{n-1}$.
Therefore, when the parent population is normally distributed with unknown variance and the sample size is small, the 100(1 − α)% CI for µ is
$(\bar{x} - t_{\alpha/2}(n-1)\,s/\sqrt{n},\ \bar{x} + t_{\alpha/2}(n-1)\,s/\sqrt{n})$
Example: Suppose that the following are the weights (in pounds) of 13 seventy-year-old men selected randomly from a certain community: 140, 143, 148, 155, 158, 159, 159, 162, 163, 167, 171, 173 and 195. It is assumed that the distribution of weight is normal. Construct a 95% CI for µ.
Solution: Given $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = 161$ and $s^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{2428}{12} = 202.33$, so s = 14.2
⇒ $S.E(\bar{x}) = s/\sqrt{n} = 14.2/\sqrt{13} = 3.94$



Here we have:
✓ The population is normally distributed.
✓ The population variance is unknown.
✓ The sample size n = 13 is small.
So we have to use the t-distribution.
1 − α = 0.95 ⇒ α = 0.05 ⇒ α/2 = 0.025 ⇒ $t_{0.025}(12) = 2.179$
⇒ The lower confidence limit is
$\bar{x} - t_{\alpha/2}(12)\,s/\sqrt{n} = 161 - 2.179(3.94) = 152.41$
⇒ The upper confidence limit is
$\bar{x} + t_{\alpha/2}(12)\,s/\sqrt{n} = 161 + 2.179(3.94) = 169.59$
⇒ The 95% CI for µ is (152.41, 169.59).
Interpretation: We are 95% confident that the mean body weight of 70-year-old men in that community is between 152.41 lb and 169.59 lb.
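The same interval can be computed from the raw data. This is a sketch that assumes the critical value 2.179 is supplied from a t table, since the Python standard library has no t quantile function; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, stdev

weights = [140, 143, 148, 155, 158, 159, 159, 162, 163, 167, 171, 173, 195]
t_crit = 2.179   # t_{0.025}(12), read from a t table (not computed here)

n = len(weights)
xbar = mean(weights)
s = stdev(weights)     # sample standard deviation, divisor n - 1
se = s / sqrt(n)

lower, upper = xbar - t_crit * se, xbar + t_crit * se
print(xbar, round(s, 2), round(se, 2), round(lower, 2), round(upper, 2))
```

Because the code keeps the unrounded standard error, its limits can differ from the hand calculation in the second decimal place.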
Exercise: Construct a 99% CI for µ using the above example.
⇒ Irrespective of whether the population is normal or not, and whether the population variance is known or not, if the sample size n is large, we use the normal distribution (z).
Note: So far, we have seen how to construct a two-sided confidence interval, in which the probability α that the mean is outside the interval is split between both sides. But sometimes we may be interested in constructing a one-sided CI, in which µ is estimated to be above or below some value.
In that case we give all of α to one side.
$P(Z \le z_{\alpha}) = 1-\alpha$
$P\left(\frac{\bar{x}-\mu}{\sigma/\sqrt{n}} \le z_{\alpha}\right) = 1-\alpha$
$P(\bar{x}-\mu \le z_{\alpha}\,\sigma/\sqrt{n}) = 1-\alpha$
$P(-\mu \le z_{\alpha}\,\sigma/\sqrt{n} - \bar{x}) = 1-\alpha$
$P(\mu \ge \bar{x} - z_{\alpha}\,\sigma/\sqrt{n}) = 1-\alpha$
$P(\bar{x} - z_{\alpha}\,\sigma/\sqrt{n} \le \mu) = 1-\alpha$




A 100(1 − α)% lower CI for µ is $(\bar{x} - z_{\alpha}\,\sigma/\sqrt{n},\ \infty)$.
Similarly, the upper one-sided CI for µ is $(-\infty,\ \bar{x} + z_{\alpha}\,\sigma/\sqrt{n})$.
Example: The lifetimes of 16 randomly selected light bulbs were observed to have an average of 1300 hrs. Assume that this is a random sample from a normal population with a standard deviation of 500 hrs. Construct the
a. 95% upper CI for µ.
b. 95% lower CI for µ.
Solution: a. Given n = 16, x̄ = 1300, σ = 500 and the population is normal. Since the population standard deviation is known, we use the normal distribution.
$S.E(\bar{x}) = \sigma/\sqrt{n} = 500/4 = 125$
The upper confidence limit is $\bar{x} + z_{\alpha}\,\sigma/\sqrt{n} = 1300 + 1.645(125) = 1505.625$



Therefore a 95% upper CI for µ is (−∞, 1505.625). But lifetime cannot be negative. ⇒ A 95% upper CI for the mean lifetime is (0, 1505.625).
b. The lower confidence limit is $\bar{x} - z_{\alpha}\,\sigma/\sqrt{n} = 1300 - 1.645(125) = 1094.375$
Therefore a 95% lower CI for the mean lifetime is (1094.375, ∞).
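For the one-sided limits, the only change from the two-sided case is that all of α goes into one tail. A minimal sketch for the light bulb data, assuming known σ and using the exact normal quantile instead of the rounded table value 1.645:

```python
from math import sqrt
from statistics import NormalDist

xbar, sigma, n, confidence = 1300, 500, 16, 0.95

z_alpha = NormalDist().inv_cdf(confidence)   # all of alpha in one tail
margin = z_alpha * sigma / sqrt(n)

upper_limit = xbar + margin   # upper CI: (-inf, upper_limit)
lower_limit = xbar - margin   # lower CI: (lower_limit, +inf)
print(round(upper_limit, 1), round(lower_limit, 1))
```

The results agree with the hand calculation up to the rounding of the critical value.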
Sampling distribution of the sample proportion
Suppose a variable x has values $x_1, x_2, \dots, x_N$ in the population. Let $x_i = 1$ if the i-th unit has a certain characteristic and $x_i = 0$ if it does not.
$\frac{\sum_{i=1}^{N} x_i}{N} = \frac{\text{total number of individuals with the characteristic}}{N} = P$



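P is called the population proportion of those possessing that characteristic.
P can be estimated by the sample proportion $\hat{P} = \frac{X}{n}$.
Since $x_i$ takes only the values 0 or 1, it has a Bernoulli distribution.
Hence $X = \sum_{i=1}^{n} x_i$ has a binomial distribution with parameters n and P.
$\Rightarrow E(\hat{P}) = E\left(\frac{X}{n}\right) = \frac{1}{n}E(X) = \frac{1}{n}\,nP = P$
$\mathrm{Var}(\hat{P}) = \mathrm{Var}\left(\frac{X}{n}\right) = \frac{1}{n^2}\mathrm{Var}(X) = \frac{1}{n^2}\,nP(1-P) = \frac{P(1-P)}{n}$
Since $\hat{P}$ is a sample mean of the 0/1 values $x_i$, and so is similar to $\bar{x}$, we can apply the central limit theorem to determine the distribution of $\hat{P}$.
Therefore, by the CLT, $\hat{P}$ has an approximately normal distribution with mean P and variance $\frac{P(1-P)}{n}$ as n gets larger.
$\hat{P} \sim N\left(P, \frac{P(1-P)}{n}\right)$ and $Z = \frac{\hat{P}-P}{\sqrt{P(1-P)/n}} \sim N(0,1)$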



Example 1: Suppose 30% of JU students are non-cafe. If 150 students are randomly selected, what is the probability that the sample proportion of non-cafe students is less than 0.25?
Solution: $\hat{P} \sim N\left(0.3, \frac{0.3 \times 0.7}{150}\right) = N(0.3, 0.0014)$
$P(\hat{P} \le 0.25) = P\left(\frac{\hat{P}-0.3}{\sqrt{0.0014}} \le \frac{0.25-0.3}{\sqrt{0.0014}}\right) = P(Z \le -1.34) = 0.5 - P(0 \le Z \le 1.34) = 0.5 - 0.4099 = 0.0901$
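This probability can also be computed without rounding z to two decimals. A minimal sketch, assuming the normal approximation to the sampling distribution of the sample proportion; `phat_dist` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist

p, n = 0.30, 150
phat_dist = NormalDist(p, sqrt(p * (1 - p) / n))   # normal approximation

prob = phat_dist.cdf(0.25)
print(round(prob, 4))
```

The result differs slightly from the table-based 0.0901 because the table calculation rounds z to −1.34.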
Example 2: A manufacturer of a certain food reports that 76% of consumers read the ingredients listed on the product's label. If a random sample of 400 consumers is selected, what is the probability that the sample proportion is within ± 0.3 of the population proportion?
Solution: Given P = 0.76 and n = 400.



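$P(p - 0.3 \le \hat{p} \le p + 0.3) = P\left(\frac{(p-0.3)-p}{\sqrt{p(1-p)/n}} \le \frac{\hat{p}-p}{\sqrt{p(1-p)/n}} \le \frac{(p+0.3)-p}{\sqrt{p(1-p)/n}}\right)$
$= P\left(\frac{-0.3}{\sqrt{0.76(1-0.76)/400}} \le Z \le \frac{0.3}{\sqrt{0.76(1-0.76)/400}}\right)$
$= P(-14.05 \le Z \le 14.05)$, since $\sqrt{0.76(0.24)/400} = 0.0214$
$= 2P(0 \le Z \le 14.05) \approx 2(0.5) = 1$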

To confirm that $E(\hat{p}) = p$ and $\mathrm{Var}(\hat{p}) = \frac{p(1-p)}{n}$, consider the example of the judges.
Suppose that 2 of the 3 judges are female. Select a random sample of 2 judges (with replacement) and construct the sampling distribution of the sample proportion of females.
Let A and C be the female judges and B the male judge.
Note: $p = \frac{2}{3}$ is the population proportion of females.



Sample   Gender (1 = female)   Sample proportion of females ($\hat{p}$)
A,A      1,1                   1
A,B      1,0                   0.5
B,A      0,1                   0.5
B,B      0,0                   0
B,C      0,1                   0.5
C,B      1,0                   0.5
A,C      1,1                   1
C,A      1,1                   1
C,C      1,1                   1
⇒ The sampling distribution of $\hat{p}$ is:
Sample proportion ($\hat{p}$)   0     0.5   1
P($\hat{p}$)                    1/9   4/9   4/9



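$E(\hat{p}) = \sum \hat{p}\,p(\hat{p}) = 0(1/9) + 0.5(4/9) + 1(4/9) = 2/3$
$E(\hat{p}^2) = \sum \hat{p}^2\,p(\hat{p}) = 0^2(1/9) + 0.5^2(4/9) + 1^2(4/9) = 5/9$
$\mathrm{Var}(\hat{p}) = E(\hat{p}^2) - [E(\hat{p})]^2 = 5/9 - (2/3)^2 = 1/9$, which agrees with $\frac{p(1-p)}{n} = \frac{(2/3)(1/3)}{2} = \frac{1}{9}$,
and $S.E(\hat{p}) = \sqrt{1/9} = 1/3$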
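The confirmation can be carried out by enumerating the nine samples. A small sketch with exact fractions, assuming sampling with replacement and the 0/1 coding of the example; the dictionary name `is_female` is illustrative.

```python
from fractions import Fraction
from itertools import product

is_female = {"A": 1, "B": 0, "C": 1}   # coding taken from the example
n = 2
p = Fraction(sum(is_female.values()), len(is_female))

# Sample proportion for each of the 9 equally likely ordered samples.
phats = [Fraction(sum(is_female[j] for j in s), n)
         for s in product(is_female, repeat=n)]
e_phat = sum(phats) / len(phats)
var_phat = sum(ph ** 2 for ph in phats) / len(phats) - e_phat ** 2

print(e_phat, var_phat, p * (1 - p) / n)
```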
Point estimation of population proportion
The best unbiased estimator of the population proportion p is the sample proportion $\hat{p}$, given by: $\hat{p} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{a}{n}$, where $x_i = 1$ if the i-th unit possesses the characteristic of interest and $x_i = 0$ otherwise, so that a is the number of sample units with the characteristic.
Example: In a sample of 10 regular JU students, 3 are female. Estimate the proportion of female regular students at JU.
Solution: The best estimator of the population proportion is the sample proportion, $\hat{p} = \frac{a}{n} = \frac{3}{10} = 0.3$
About 30% of regular students at JU are estimated to be female.



Interval estimation of population proportion
We know that $\hat{p} \sim N\left(p, \frac{p(1-p)}{n}\right)$
⇒ A 100(1 − α)% CI for p is: $\hat{p} \pm Z_{\alpha/2}\sqrt{\frac{p(1-p)}{n}}$
⇒ Since p is unknown, we estimate it by $\hat{p}$ in the standard error.
⇒ The 100(1 − α)% CI for p is
$\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right)$
Example: When 121 patients at JUMC were surveyed, it was found that 70% of them had been referred by other health institutions. Construct a 95% CI for the true proportion of patients referred by other health institutions.
Solution: Given n = 121, $\hat{p}$ = 0.7, and α = 0.05 ⇒ $Z_{\alpha/2} = Z_{0.025} = 1.96$



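The CI is given by: $\left(\hat{p} - Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}},\ \hat{p} + Z_{\alpha/2}\sqrt{\frac{\hat{p}(1-\hat{p})}{n}}\right) = \left(0.7 - 1.96\sqrt{\frac{0.7(0.3)}{121}},\ 0.7 + 1.96\sqrt{\frac{0.7(0.3)}{121}}\right) = (0.618, 0.782)$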
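The interval for the patient survey can be reproduced as follows. This is a sketch of the large-sample interval described above, with the sample proportion plugged into the standard error; `proportion_interval` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(phat, n, confidence):
    """Large-sample CI for a proportion, using p-hat in the standard error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sqrt(phat * (1 - phat) / n)
    return phat - margin, phat + margin

lower, upper = proportion_interval(phat=0.7, n=121, confidence=0.95)
print(round(lower, 3), round(upper, 3))
```

The same function with `confidence=0.90` can be used to check the exercise that follows.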
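Exercise: Construct a 90% CI for the true proportion of patients referred by other institutions for the above example.
Sampling distribution of sample variances
The sample variance is given by:
$S^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
To obtain the sampling distribution of $S^2$, the sample should be taken from a normally distributed population with mean µ and variance σ².
Note: The sum of two independent chi-square variables is a chi-square variable whose degrees of freedom are the sum of the two, that is $\chi^2(n_1) + \chi^2(n_2) = \chi^2(n_1 + n_2)$. The corresponding difference rule, $\chi^2(n_1) - \chi^2(n_2) = \chi^2(n_1 - n_2)$, holds only in special decompositions such as the one behind the distribution of $S^2$, not for arbitrary chi-square variables.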



⇒ If the sample is taken from a normally distributed population with mean µ and variance σ², then $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$
Proof: $S^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$



Example: Suppose that a random sample of 25 units is drawn from a normally distributed population with variance 81. Find the probability that the variance of this sample will not exceed 90.
Solution:
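One way to work this example is to convert the event $S^2 \le 90$ into a statement about $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$ and evaluate the chi-square probability. The sketch below is not the table look-up the text relies on: it assumes a closed-form chi-square CDF that is valid only for even degrees of freedom (here df = 24), and `chi2_cdf_even_df` is a hypothetical helper.

```python
from math import exp

def chi2_cdf_even_df(x, df):
    """P(chi-square(df) <= x) for even df, via the Poisson-sum identity."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    if x <= 0:
        return 0.0
    half_x = x / 2
    term, total = 1.0, 1.0            # j = 0 term of sum (x/2)^j / j!
    for j in range(1, df // 2):
        term *= half_x / j
        total += term
    return 1.0 - exp(-half_x) * total

n, sigma2, s2_limit = 25, 81, 90
chi2_stat = (n - 1) * s2_limit / sigma2     # (n-1) S^2 / sigma^2 at S^2 = 90
prob = chi2_cdf_even_df(chi2_stat, n - 1)   # df = 24 is even
print(round(chi2_stat, 2), round(prob, 3))
```

With a printed chi-square table, the same statistic would be compared against the tabulated values for 24 degrees of freedom.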



Note: χ² values are given in the table for small degrees of freedom (df ≤ 100).
For problems with higher df, we use the following approximation.

i. Point Estimation of a Population Variance
The best unbiased point estimator of the population variance σ² is the sample variance $S^2$, given by
$S^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1}$
Example: The ages in years of 10 randomly selected regular students in JU are: 20, 19, 23, 18, 22, 19, 20, 21, 22 and 26. Estimate the variance of the age distribution of all regular students at JU.
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{20+19+\dots+26}{10} = 21$
$S^2 = \frac{\sum_{i=1}^{n}(x_i-\bar{x})^2}{n-1} = \frac{(20-21)^2+(19-21)^2+\dots+(26-21)^2}{10-1} = \frac{50}{9} = 5.556$



Therefore, the variance of the age distribution of regular students at JU is estimated to be 5.556 square years.
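The point estimate can be checked with the standard library, whose `variance` function uses the n − 1 divisor. A minimal sketch:

```python
from statistics import mean, variance

ages = [20, 19, 23, 18, 22, 19, 20, 21, 22, 26]

xbar = mean(ages)
s2 = variance(ages)   # sample variance: divides by n - 1
print(xbar, round(s2, 3))
```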
ii. Interval Estimation of a Population Variance
If a sample of size n is taken from a normal population with mean µ and variance σ², then $\frac{(n-1)S^2}{\sigma^2} \sim \chi^2(n-1)$



Note: Once you have the CI for σ², the CI for σ can be obtained by taking the positive square roots of the confidence limits for σ². ⇒ A 100(1 − α)% Confidence Interval (CI) for the true value of σ is:
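The standard chi-square interval can be written out as a worked derivation. The notation is an assumption here: $\chi^2_{\alpha/2}(n-1)$ denotes the value with upper-tail area α/2 under the chi-square distribution with n − 1 degrees of freedom, and the population is assumed normal.

```latex
P\left(\chi^2_{1-\alpha/2}(n-1) \le \frac{(n-1)S^2}{\sigma^2} \le \chi^2_{\alpha/2}(n-1)\right) = 1-\alpha
\;\Rightarrow\;
\frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)} \le \sigma^2 \le \frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)}
\;\Rightarrow\;
\sqrt{\frac{(n-1)S^2}{\chi^2_{\alpha/2}(n-1)}} \le \sigma \le \sqrt{\frac{(n-1)S^2}{\chi^2_{1-\alpha/2}(n-1)}}
```

The larger critical value goes in the denominator of the lower limit, because dividing by a larger number gives a smaller bound.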



Example: Plastic sheets produced by a certain machine are periodically monitored for possible fluctuation in thickness. If the true standard deviation of thickness exceeds 1.5 mm, there is a quality problem. The thicknesses, measured in mm, of 15 sheets produced by this machine are: 225, 227, 230, 224, 228, 227, 226, 229, 232, 227, 228, 223, 226, 232 and 228. Using these sample data, construct a 95% CI for the true population standard deviation of thickness.
Solution:
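A possible way to carry out this calculation is sketched below. It assumes a normal population and takes the two chi-square critical values for 14 degrees of freedom as inputs read from a table (26.119 for upper-tail area 0.025 and 5.629 for upper-tail area 0.975), since the standard library does not provide chi-square quantiles.

```python
from math import sqrt
from statistics import variance

thickness = [225, 227, 230, 224, 228, 227, 226, 229, 232, 227,
             228, 223, 226, 232, 228]

# Chi-square critical values for df = 14, read from a table (assumed inputs).
chi2_upper = 26.119   # upper-tail area 0.025
chi2_lower = 5.629    # upper-tail area 0.975

n = len(thickness)
s2 = variance(thickness)   # sample variance, divisor n - 1
var_ci = ((n - 1) * s2 / chi2_upper, (n - 1) * s2 / chi2_lower)
sd_ci = tuple(sqrt(limit) for limit in var_ci)

print(round(s2, 3), [round(v, 2) for v in var_ci], [round(v, 2) for v in sd_ci])
```

Comparing the limits in `sd_ci` with the 1.5 mm threshold then indicates whether the data point to a quality problem.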



Estimation about the difference between two populations

