Chapter 4 Inferential
6/21/2017 Teresa K. 2
Basic terms
Inferential Statistics cont’d…
Properties of Sampling Distribution
The Central Limit Theorem
The central limit theorem cont’d...
Sampling Distribution of the proportion
Standard deviation and Standard error
Parameter Estimations
Point Estimation
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$$
Point Estimation cont’d …
Point estimations cont’d …
Point estimations cont’d …
Example
Interval Estimation
Meaning of confidence Interval
Interval Estimation cont’d …
Interval Estimation cont’d …
The term $z_{\alpha/2}\left(\frac{\sigma}{\sqrt{n}}\right)$ is called the maximum error of
the estimate.
Interval estimation cont’d…
Confidence interval cont’d…
On the other hand, a 99% CI will be wider than a 95% CI; the
extra width means that we can be more certain that
the interval contains the population parameter. But
to obtain higher confidence from the same sample,
we must be willing to accept a larger margin of error (a
wider interval).
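The trade-off between confidence and width can be checked numerically. The sketch below is only an illustration: the helper names `z_critical` and `ci_width`, and the sample values (standard deviation 12, n = 225), are assumptions chosen for the demonstration.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a given confidence level."""
    alpha = 1 - confidence
    return NormalDist().inv_cdf(1 - alpha / 2)

def ci_width(sd, n, confidence):
    """Full width of the z-based interval: 2 * z_{alpha/2} * sd / sqrt(n)."""
    return 2 * z_critical(confidence) * sd / n ** 0.5

# Hypothetical sample: standard deviation 12, n = 225 (illustrative values only).
widths = {level: ci_width(12, 225, level) for level in (0.90, 0.95, 0.99)}
for level, width in widths.items():
    print(f"{level:.0%} CI width: {width:.3f}")
```

With the sample held fixed, the printed widths grow as the confidence level rises from 90% to 99%.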
Confidence interval cont’d…
The t-distribution
The t-distribution cont’d…
Characteristics of the t distribution
Degrees of Freedom
As explained earlier, the t-distribution involves the
degrees of freedom (df).
It is defined as the number of values which are free
to vary after imposing a certain restriction on your
data.
Example: If the weights of three individuals have a mean
of 40 kg, how many of the weights can be freely
chosen?
The t-distribution cont’d…
Solution
The first and the second individuals could be chosen
freely (e.g., 20 kg and 30 kg, 15 kg and 25 kg, 50 kg and
40 kg, etc.). But the third individual is then fixed (i.e., 70 kg,
80 kg, 30 kg, respectively), because the three weights must
sum to 3 × 40 = 120 kg.
Hence, there are two degrees of freedom.
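A small sketch of the same idea: once the mean is fixed, the last value is fully determined by the ones chosen freely. The function name `last_value` is hypothetical.

```python
def last_value(mean, free_values):
    """Value forced on the final observation once the others are chosen."""
    n = len(free_values) + 1          # total number of observations
    return mean * n - sum(free_values)

# Freely chosen pairs of weights (kg); the mean of all three must be 40 kg.
for pair in [(20, 30), (15, 25), (50, 40)]:
    print(pair, "->", last_value(40, pair))
```

Only two of the three weights can vary; the third is computed, which is what "two degrees of freedom" means here.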
The t-distribution cont’d…
Table of t-distributions
The table of t-distribution shows values of t for
selected areas under the t curve.
Different values of df appear in the first column. The
table is adapted for efficient use for either one or
two-tailed tests.
Example 1. If df = 8, 5% of t scores are above what
value?
Example 2. Find t0 if n = 13 and 95% of t scores are
between –t0 and +t0.
Example 3. If df = 5, what is the probability that a t
score is above 2.02 or below –2.02?
The t-distribution cont’d…
Solutions
1. Look at the t-distribution table. Look along the row
labeled “one tail” to the value 0.05; the intersection
of the 0.05 column and the row with 8 in the df
column gives the value of t = 1.86.
2. df = 13 – 1 = 12. If 95% of t scores are between –t0 and
+t0, then 5% are in the two tails. Look at the table
along the row labeled “two tail” to the value 0.05;
the intersection of this 0.05 column and the row
with 12 in the df column gives t0 = 2.179.
3. Two tails are implied. Look along the “df = 5” row to
find the entry 2.02. The probability is 0.10.
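The table lookups can be cross-checked without a table. The following is a rough sketch that integrates the t density numerically with Simpson's rule, using only the standard library; the function names `t_pdf` and `two_tail_prob` are assumptions, and a statistics package would normally be used instead.

```python
import math

def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_tail_prob(t0, df, steps=2000):
    """P(|T| > t0), via Simpson's rule on the density over [-t0, t0]."""
    h = 2 * t0 / steps                      # steps must be even for Simpson's rule
    total = t_pdf(-t0, df) + t_pdf(t0, df)
    for i in range(1, steps):
        weight = 4 if i % 2 else 2
        total += weight * t_pdf(-t0 + i * h, df)
    central_area = total * h / 3
    return 1 - central_area

print(two_tail_prob(1.86, 8) / 2)   # Example 1, one tail: close to 0.05
print(two_tail_prob(2.179, 12))     # Example 2, two tails: close to 0.05
print(two_tail_prob(2.02, 5))       # Example 3, two tails: close to 0.10
```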
Confidence interval for a single mean (for continuous
variable )
$$CI = \left( \bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}},\ \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \right)$$
Confidence interval for a single mean cont’d…
$$CI = \left( \bar{x} - t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}},\ \bar{x} + t_{\alpha/2,\,n-1}\frac{s}{\sqrt{n}} \right)$$
Where n – 1 = degrees of freedom for Student’s t-distribution
and s = sample standard deviation.
Confidence interval for a single mean cont’d…
No: use $t_{\alpha/2}$ values and $s$ in the formula.
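$$CI = \left( (\bar{x}_1 - \bar{x}_2) - z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} \right)$$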
Interval estimation for difference of mean cont’d…
• Where $\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$ is the standard error of the
difference between the two sample means.
Interval estimation for difference of mean cont’d…
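$$CI = \left( (\bar{x}_1 - \bar{x}_2) - t_{\alpha/2,\,n_1+n_2-2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}},\ (\bar{x}_1 - \bar{x}_2) + t_{\alpha/2,\,n_1+n_2-2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} \right)$$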
$$CI = \left( p - Z_{\alpha/2}\sqrt{\pi(1-\pi)/n},\ p + Z_{\alpha/2}\sqrt{\pi(1-\pi)/n} \right)$$
When np and nq are greater than or equal to 5
Confidence interval for the difference of two
proportions
The point estimate for the difference of two
population proportions, π1 – π2, is given by p1 – p2.
A (1 – α)100% confidence interval estimate for the
difference of population proportions, π1 – π2, is given
by:
$$CI = \left( (p_1 - p_2) - Z_{\alpha/2}\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}},\ (p_1 - p_2) + Z_{\alpha/2}\sqrt{\frac{\pi_1(1-\pi_1)}{n_1} + \frac{\pi_2(1-\pi_2)}{n_2}} \right)$$
In general the width of confidence interval depends
on:
Sample size,
Level of confidence and
The standard error.
Examples
Example 1
A simple random sample (SRS) of 36 apparently healthy subjects yielded the
following values of urine excreted (milligram per day):
0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008,
0.012, 0.03, 0.05, 0.009, 0.008, 0.007, 0.006, 0.02,
0.034, 0.007, 0.008, 0.036, 0.007, 0.023, 0.011, 0.012,
0.022, 0.03, 0.04, 0.04
Compute the point estimate of the population mean.
Example 1 cont’d…
If $x_1, x_2, \ldots, x_n$ are n observed values, then
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{0.707}{36} = 0.0196$$
Construct 90% and 95% confidence intervals for the
mean (the intervals use a standard deviation of 0.0123 and √n = 6):
90% CI = (0.0196 – 1.65 × 0.0123/6, 0.0196 + 1.65 × 0.0123/6)
= (0.0162, 0.0230)
95% CI = (0.0196 – 1.96 × 0.0123/6, 0.0196 + 1.96 × 0.0123/6)
= (0.0156, 0.0236)
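The point estimate in Example 1 can be recomputed directly from the 36 listed values. This is a sketch using the standard library; the variable names are assumptions, and the sample standard deviation computed from the listed values need not match the 0.0123 quoted in the interval calculation exactly.

```python
from statistics import mean, stdev

# The 36 observations listed in Example 1 (milligram per day).
urine_mg_per_day = [
    0.007, 0.03, 0.025, 0.008, 0.03, 0.038, 0.007, 0.005,
    0.032, 0.04, 0.009, 0.014, 0.011, 0.022, 0.009, 0.008,
    0.012, 0.03, 0.05, 0.009, 0.008, 0.007, 0.006, 0.02,
    0.034, 0.007, 0.008, 0.036, 0.007, 0.023, 0.011, 0.012,
    0.022, 0.03, 0.04, 0.04,
]

n = len(urine_mg_per_day)
x_bar = mean(urine_mg_per_day)          # point estimate of the population mean
s = stdev(urine_mg_per_day)             # sample standard deviation (n - 1 divisor)
standard_error = s / n ** 0.5

print(n, round(x_bar, 4), round(s, 4), round(standard_error, 4))
```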
Example 2
The mean diastolic blood pressure for 225 randomly
selected individuals is 75 mmHg with a standard
deviation of 12.0 mmHg. Construct a 95% confidence
interval for the mean
Solution
n = 225
mean = 75 mmHg
standard deviation = 12 mmHg
confidence level = 95%
The 95% confidence interval for the unknown population mean is
given by
95% CI = (75 ± 1.96 × 12/15) = (73.432, 76.568)
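The calculation in Example 2 can be sketched as a small function. The name `z_confidence_interval` is hypothetical; the critical value comes from the standard library's normal distribution rather than a printed table, so it is 1.95996 rather than the rounded 1.96.

```python
from statistics import NormalDist

def z_confidence_interval(x_bar, sd, n, confidence=0.95):
    """z-based CI for a mean: x_bar +/- z_{alpha/2} * sd / sqrt(n)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z * sd / n ** 0.5
    return x_bar - margin, x_bar + margin

# Example 2: n = 225, mean 75 mmHg, standard deviation 12 mmHg.
low, high = z_confidence_interval(75, 12.0, 225)
print(round(low, 2), round(high, 2))
```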
Example 3
In a survey of 300 automobile drivers in one city, 123
reported that they wear seat belts regularly. Estimate
the seat belt use rate of the city and the 95% confidence
interval for the true population proportion.
Solution
The point estimate of p = 123/300
=0.41 (41%)
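The interval part of Example 3 follows from the proportion formula given earlier, with the sample proportion p standing in for π in the standard error. The sketch below is one way to carry it out; `proportion_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_ci(successes, n, confidence=0.95):
    """Normal-approximation CI for a proportion (needs np >= 5 and nq >= 5)."""
    p = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p * (1 - p) / n)          # p used as the estimate of pi
    return p, p - z * se, p + z * se

# Example 3: 123 of 300 drivers report regular seat belt use.
p, low, high = proportion_ci(123, 300)
print(round(p, 2), round(low, 4), round(high, 4))
```

Under these assumptions the interval runs from roughly 35% to 47%.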
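Given
Population 1 (non-smokers): $n_1 = 50$, $\bar{x}_1 = 76$, $s_1 = 8$
Population 2 (smokers): $n_2 = 65$, $\bar{x}_2 = 68$, $s_2 = 9$
$\delta_1^2 = s_1^2/n_1 = 64/50 = 1.28$
$\delta_2^2 = s_2^2/n_2 = 81/65 = 1.246$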
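Example 4
Solution
A. The point estimate of the difference of means is:
$\bar{x}_1 - \bar{x}_2$
= 76 – 68
= 8
B. 95% CI
$$CI = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}$$
With $s_1^2/n_1 = 1.28$ and $s_2^2/n_2 = 1.246$, the standard error is $\sqrt{1.28 + 1.246} = 1.589$, so
CI = (8 – 1.96 × 1.589, 8 + 1.96 × 1.589)
= (4.885, 11.115)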
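As a check on the arithmetic, here is a sketch of the interval for the difference of two means. The function name `diff_means_ci` is hypothetical; the smoker figures (n = 65, mean 68, standard deviation 9) are taken from the statement of the same example in the hypothesis-testing part of this chapter. Note that the margin uses the square root of the summed variance terms, not the variance terms themselves.

```python
from math import sqrt
from statistics import NormalDist

def diff_means_ci(mean1, sd1, n1, mean2, sd2, n2, confidence=0.95):
    """Large-sample z interval for mu1 - mu2 with unpooled variances."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)   # standard error of the difference
    diff = mean1 - mean2
    return diff - z * se, diff + z * se, se

# Non-smokers: n = 50, mean 76, SD 8. Smokers: n = 65, mean 68, SD 9.
low, high, se = diff_means_ci(76, 8, 50, 68, 9, 65)
print(round(se, 3), round(low, 3), round(high, 3))
```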
Example 5
Each of two groups consists of 100 patients who
have leukemia. A new drug is given to the first group
but not to the second (the control group). It is found
that in the first group 75 people have remission for 2
years; but only 60 in the second group.
Example 5 cont’d ….
Solution
Note that
n1p1=100*0.75=75>5
n1q1 = 100*0.25=25>5
n2p2 = 100*0.60=60>5
n2q2 =100*0.40=40>5
p1 = 0.75, q1 = 0.25, n1=100
p2 = 0.60, q2= 0.40, n2=100
$\delta_1^2 = p_1 q_1/n_1 = 0.75 \times 0.25/100 = 0.001875$
$\delta_2^2 = p_2 q_2/n_2 = 0.60 \times 0.40/100 = 0.0024$
Hence, $\delta_{(1-2)}^2 = 0.001875 + 0.0024 = 0.004275$
$\delta_{(1-2)} = \sqrt{0.004275} = 0.0654$
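The standard error above is the ingredient needed for an interval on the difference in remission rates. The sketch below assumes the example goes on to build a 95% confidence interval (the question itself is not shown here), and `diff_proportions_ci` is a hypothetical helper name.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence=0.95):
    """Normal-approximation CI for pi1 - pi2 from two independent samples."""
    p1, p2 = x1 / n1, x2 / n2
    se = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    diff = p1 - p2
    return diff - z * se, diff + z * se, se

# Example 5: 75 of 100 remissions with the drug, 60 of 100 in the control group.
low, high, se = diff_proportions_ci(75, 100, 60, 100)
print(round(se, 4), round(low, 4), round(high, 4))
```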
Example 5 cont’d ….
Assignment
1. In a hospital, the mean noise level in the 170 ward
areas was 58.0 decibels and the standard deviation
was 4.8. Find the 95% confidence interval for the true
mean.
2. In Addis Ababa, a survey of 350 students showed that
28% carried their lunch to school. Find the 95% CI for
the true population proportion of students who carried
their lunch to school.
3. A recent study of 100 people in Gondar found that
22 were obese. Find the 95% confidence interval for
the true population proportion.
6. The standard hemoglobin reading for normal males of
adult age is 15 g/100 ml. The standard deviation is
about 2.5 g/100 ml. For a group of 36 male
construction workers, the sample mean was 16 g/100
ml.
– Construct a 95% confidence interval for the mean hemoglobin
reading of the male construction workers. What is your
interpretation of this interval relative to the normal adult male
population?
– What would the confidence interval have been if the
above results were obtained based on 49
construction workers?
HYPOTHESIS TESTING
Hypothesis Testing cont’d…
Example of Hypothesis?
A hypothesis is an assumption about the population parameter.
– A parameter is a characteristic of the population, like its mean or
variance.
– The parameter must be identified before analysis.
Example of hypothesis: I assume the mean GPA of this class is 3.5!
Types of hypothesis:
1. The null hypothesis:
The null hypothesis (represented by H0, pronounced “H nought”) is the
statement about the value of the population
parameter. That is, the null hypothesis postulates
that ‘there is no association between factor and
outcome’ or ‘there is no intervention effect’.
It is the main hypothesis which we wish to test.
Hypothesis Testing cont’d…
Possible choices of HA:
If H0 is                                   Then HA is
µ = A (single mean)                        µ ≠ A or µ < A or µ > A
p = B (single proportion)                  p ≠ B or p < B or p > B
µ1 – µ2 = C (difference of means)          µ1 – µ2 ≠ C or µ1 – µ2 < C or µ1 – µ2 > C
p1 – p2 = D (difference of proportions)    p1 – p2 ≠ D or p1 – p2 < D or p1 – p2 > D
Hypothesis Testing cont’d…
Exercises
State HA and HO for each of the following
1. A researcher thinks that if expectant mothers use vitamin
pills, the birth weight of the babies will increase. The
average of the birth weight of the population is 8.6
pounds.
2. A psychologist feels that if he plays soft music during a test,
the result of the test will be changed. He is not sure whether
the grade will be higher or lower. In the past the mean of
the scores was 73.
3. Is the average height of the CHS students 1.63 m or is it
something different?
4. There is a belief that 10% of the smokers develop lung
cancer in country x.
5. Are men and women infected by malaria in equal
proportions, or does a higher proportion of men get malaria in
Ethiopia?
Hypothesis Testing cont’d…
Hypothesis Testing cont’d…
Level of significance
A method for making a decision must be agreed
upon.
If HO is rejected, then HA is accepted.
How is a “significant” difference defined?
A null hypothesis is either true or false, and it is
either rejected or not rejected.
No error is made if it is true and we fail to reject it, or
if it is false and rejected.
An error is made, however, if it is true but rejected,
or if it is false and we fail to reject it.
Hypothesis Testing cont’d…
Hypothesis Testing cont’d…
Null hypothesis
Hypothesis Testing cont’d…
Hypothesis Testing cont’d…
Hypothesis Testing cont’d…
Level of Significance (α), critical value and the Rejection
Region
[Figure: rejection regions under a curve centered at 0. One-tailed test: H0: µ = µ0 vs. H1: µ > µ0. Two-tailed test: H0: µ = µ0 vs. H1: µ ≠ µ0, with α/2 in each tail.]
Alpha (α ) vs. critical value.
The α-level is represented by the shaded areas.
Sample results in this area lead to rejection of H0.
[Figure: curve with a region of doubt (rejection region) in each tail, the acceptance region in the middle, and the critical values at the boundaries.]
Alpha (α ) vs. critical value cont’d...
[Figure: curve with a region of doubt (rejection region) in each tail and the acceptance region between them.]
What Do We Test
Hypothesis testing for population mean
Test procedure for two tailed test
1. State the null hypothesis: H0: µ = µ0
2. State the alternative hypothesis: H1: µ ≠ µ0
3. Fix the level of significance (α) and compute the
test statistic under the null (assuming the null
hypothesis is true) as:
$$z = \frac{\bar{x} - \mu_0}{se}$$
Note that this is not the only test statistic.
Depending on the type of data and sample size, we
may need to compute a z-score, t-score or χ²-score.
For large samples (n ≥ 30), the test statistic has a
standard normal distribution: z ~ N(0, 1).
Steps for two tailed Test cont’d…
4. Find the critical value corresponding to the given alpha
(α) from the distribution table.
5. Decision rule: for a two-tailed hypothesis,
the decision is defined by:
Reject the null hypothesis if:
$$|z_{cal}| = \left|\frac{\bar{x} - \mu_0}{SE}\right| > z_{tab} = z_{\alpha/2}$$
Do not reject the null hypothesis if:
$$|z_{cal}| = \left|\frac{\bar{x} - \mu_0}{SE}\right| < z_{tab} = z_{\alpha/2}$$
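The two-tailed procedure can be sketched as a single function. The name `two_tailed_z_test` is hypothetical. The inputs reuse the sample mean 27 and null value 30 from the worked example in this chapter; reading its standard error as √(20/10) is an assumption made to match the reported z of –2.12.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_z_test(x_bar, mu0, se, alpha=0.05):
    """Return z_cal, the critical value, the p-value and the reject decision."""
    z_cal = (x_bar - mu0) / se
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)         # z_{alpha/2}
    p_value = 2 * (1 - NormalDist().cdf(abs(z_cal)))    # both tails
    return z_cal, z_tab, p_value, abs(z_cal) > z_tab

# Assumed inputs: x_bar = 27, mu0 = 30, SE = sqrt(20 / 10).
z_cal, z_tab, p_value, reject = two_tailed_z_test(27, 30, sqrt(20 / 10))
print(round(z_cal, 2), round(z_tab, 2), round(p_value, 4), reject)
```

The decision compares the absolute value of the computed statistic with the critical value, so a large negative z leads to rejection just as a large positive one does.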
Example:
Example: cont’d…
3. Test statistic:
$$Z = \frac{\bar{x} - \mu_0}{SE} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$$
Example: cont’d…
5. Decision rule:
Test procedure for one tailed test
Test procedure for one tailed test cont’d…
Test procedure for one tailed test cont’d…
$$Z = \frac{\bar{x} - \mu_0}{SE}$$
Test procedure for one tailed test cont’d…
$$Z_{cal} = \frac{\bar{x} - \mu_0}{SE} < z_{\alpha}$$
Test procedure for one tailed test cont’d…
Example: cont’d …
3. Test statistic:
$$Z = \frac{\bar{x} - \mu_0}{SE} = \frac{27 - 30}{\sqrt{20/10}} = -2.12$$
[Figure: critical value marked on the curve.]
Comparison of two Means
The relevant null hypothesis is that the means are
identical, i.e.,
HO: µt = µc or HO : µt- µc = 0
The rationale for the test of significance is as before.
Assuming the null hypothesis is true (i.e., that there is
no difference in the population means), one determines
the chance of obtaining differences in sample means as
discrepant as or more discrepant than that observed.
If this chance is sufficiently small, there is reasonable
evidence to doubt the validity of the null hypothesis;
hence, one concludes there is a statistically significant
difference between the means of the two populations
(i.e., one rejects the null hypothesis).
Example
If a random sample of 50 non-smokers has a mean life
of 76 years with a standard deviation of 8 years, and a
random sample of 65 smokers has a mean life of 68 years with a
standard deviation of 9 years,
test the hypothesis that there is no difference
between the mean lifetimes of non-smokers and
smokers at a 0.01 level of significance.
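$\delta_1^2 = s_1^2/n_1 = 8^2/50 = 1.28$, $\delta_2^2 = s_2^2/n_2 = 9^2/65 = 1.246$
$\delta_{(n-s)} = \sqrt{\delta_1^2 + \delta_2^2} = \sqrt{1.28 + 1.246} = 1.589$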
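A sketch of the corresponding two-sample test follows. It assumes a two-tailed alternative (the example asks about "no difference") and large samples, so a z statistic is used; `two_sample_z_test` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_test(mean1, sd1, n1, mean2, sd2, n2, alpha=0.01):
    """Two-tailed z test of H0: mu1 - mu2 = 0 for large independent samples."""
    se = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z_cal = (mean1 - mean2) / se
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)
    return z_cal, z_tab, abs(z_cal) > z_tab

# Non-smokers: n = 50, mean 76, SD 8. Smokers: n = 65, mean 68, SD 9.
z_cal, z_tab, reject = two_sample_z_test(76, 8, 50, 68, 9, 65)
print(round(z_cal, 2), round(z_tab, 2), reject)
```

With these inputs the computed statistic is well beyond the 0.01 critical value, so the null hypothesis of equal mean lifetimes would be rejected.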
$$|z_{cal}| = \left|\frac{p - \pi}{SE}\right| < z_{tab} = z_{\alpha/2}$$
Example:
• The national institute of mental health published an
article stating that in any one year period,
approximately 9.5 percent of American adults suffer
from depression or a depressive illness. Suppose that
in a survey of 100 people in a certain town, seven of
them suffered from depression or a depressive
illness. Conduct a hypothesis test to determine if the
true proportion of the people in that town suffering
from depression or depressive illness is different
from the percent in the general adult American
population.
Solution:
Hypothesis: H0: π = 0.095
HA: π ≠ 0.095
The level of significance for the test is α = 0.05. Because the
question asks whether the proportion is different, the test is
two-tailed and the critical value is the Z value corresponding
to α/2, i.e. zα/2 = z0.025 = 1.96.
Test statistic:
$$z_{cal} = \frac{p - \pi}{SE} = \frac{0.07 - 0.095}{\sqrt{0.095(1 - 0.095)/100}} = \frac{-0.025}{0.0293} = -0.85$$
Decision: since |zcal| = 0.85 is less than the tabulated value of
Z (1.96), the decision is: do not reject the null
hypothesis.
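The test for this example can be sketched as follows. It treats the alternative as two-sided, since the question asks whether the town's proportion is different, and uses the null value π in the standard error; `one_proportion_z_test` is a hypothetical name.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z_test(successes, n, pi0, alpha=0.05):
    """Two-tailed z test of H0: pi = pi0 using the normal approximation."""
    p = successes / n
    se = sqrt(pi0 * (1 - pi0) / n)      # standard error under the null
    z_cal = (p - pi0) / se
    z_tab = NormalDist().inv_cdf(1 - alpha / 2)
    return z_cal, z_tab, abs(z_cal) > z_tab

# 7 of 100 people surveyed; national figure 9.5%.
z_cal, z_tab, reject = one_proportion_z_test(7, 100, 0.095)
print(round(z_cal, 3), round(z_tab, 2), reject)
```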
Test statistic:
$$z_{cal} = \frac{(p_1 - p_2) - (\pi_1 - \pi_2)}{\sqrt{\frac{p_1(1 - p_1)}{n_1} + \frac{p_2(1 - p_2)}{n_2}}}$$
where the denominator is Se(p1 – p2).
zcal = 0.018/0.0021 = 8.571
Decision: reject Ho
Because Z calc > Z tab; in other words, the p- value is
less than the level of significance (i.e., α= 0.01)
Hypothesis testing for means and proportion of small
samples
Solution:
Following the steps of hypothesis testing, the first step is
stating the null and alternative hypotheses, but before
that let us see the observed difference using the normal
curve:
[Figure: curve centered at µ = 15. Is 15.76 far enough to the right
of µ = 15 to be in the critical area (rejection region)?]
Confidence Interval
1. Provides the information that a p-value gives.
– If the null value is included in a 95% confidence interval,
by definition the corresponding p-value is > 0.05.
2. Indicates the amount of variability (effect of sample size)
by the width of the confidence interval.
– This information cannot be obtained from the p-value.
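The link between the interval and the p-value can be illustrated with a short sketch. The sample mean (75) and standard error (0.8) echo Example 2; the two null values tried, 74 and 77, are hypothetical and chosen so that one falls inside the interval and one outside. `ci_and_p_value` is an assumed helper name.

```python
from statistics import NormalDist

def ci_and_p_value(x_bar, mu0, se, alpha=0.05):
    """Return the (1 - alpha) z interval and the two-tailed p-value for H0: mu = mu0."""
    nd = NormalDist()
    z_crit = nd.inv_cdf(1 - alpha / 2)
    ci = (x_bar - z_crit * se, x_bar + z_crit * se)
    p_value = 2 * (1 - nd.cdf(abs((x_bar - mu0) / se)))
    return ci, p_value

results = {}
for mu0 in (74, 77):                       # hypothetical null values
    (low, high), p_value = ci_and_p_value(75, mu0, 0.8)
    inside = low <= mu0 <= high
    results[mu0] = (inside, p_value)
    print(mu0, "inside 95% CI:", inside, "p-value:", round(p_value, 4))
```

A null value inside the 95% interval gives a p-value above 0.05, and one outside gives a p-value below 0.05; the width of the interval additionally shows how precise the estimate is.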