Hypothesis Testing
By
Hirbo Shore (MPH, Assistant Professor)
School of Public Health, CHMS-HU
Contact: [email protected]
Objectives:
•Calculate p-values and test statistics
•Understand the role of significance and the difference between Type I and Type II
errors
•Explain the connection between hypothesis testing and confidence intervals
•Perform parametric hypothesis tests relating to:
a) A sample mean
b) The difference between two sample means (independent samples)
c) The mean difference between two dependent samples.
d) A sample proportion
e) The difference between two sample proportions (independent samples)
• A research question is typically posed as a hypothesis
• Sometimes based on prior research, preliminary observations, or an
“educated guess”
– e.g. in a randomised controlled trial the hypothesis might be “the
proportion of patients with disease X who survive after receiving
a drug treatment is greater than the proportion of patients with
disease X who survive after receiving a placebo”
• Hypothesis: a statement about one or more populations
Steps involved in testing a hypothesis
1. State the research question in terms of statistical hypotheses.
• Example
Null hypothesis: things are what they are said to be (the status quo).
H0: μ = μ0, e.g. H0: μ = 1.6 m
Alternative hypothesis: Ha: μ ≠ μ0
2. Select a sample and collect data
3. Decide on the appropriate test statistic for the hypothesis (Z,
t, χ2, F, etc.)
– A test statistic is a function of the data that uses estimates of the
parameters we are interested in and whose sampling distribution
is known when we assume the null hypothesis is true.
4. Select the level of significance for the statistical test (α=0.05,
0.01, 0.001, etc.)
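– Determine the critical value: the value the test statistic must attain
to be declared significant.
[Figure not reproduced: sampling distribution of the test statistic when H0 is true, centred at μ0, with μ1 and locations (2) and (3) marked.]
• What would you conclude if the calculated value of the test statistic
fell at location (3)?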
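The link between the significance level and the critical value can be sketched in a few lines of Python. This is only an illustration for a Z test, assuming a standard normal sampling distribution under H0; the function names `z_critical` and `reject_h0` are hypothetical and not part of the original slides.

```python
from statistics import NormalDist


def z_critical(alpha: float, two_sided: bool = True) -> float:
    """Critical value of the standard normal for significance level alpha."""
    tail = alpha / 2 if two_sided else alpha
    return NormalDist().inv_cdf(1 - tail)


def reject_h0(z: float, alpha: float = 0.05, two_sided: bool = True) -> bool:
    """Reject H0 when the test statistic lies beyond the critical value."""
    crit = z_critical(alpha, two_sided)
    return abs(z) > crit if two_sided else z > crit


# Two-sided critical values for the significance levels listed in step 4
for alpha in (0.05, 0.01, 0.001):
    print(alpha, round(z_critical(alpha), 3))
```

A smaller α pushes the critical value further into the tail, so stronger evidence is needed before H0 is rejected.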
6. Perform the calculation
• Another way to make a decision
Level of Significance
Example
• Suppose we are interested in finding evidence that there
is a difference in average height between adult
Ethiopian men and women
Types of Errors
Why does this happen?
• Due to natural random variation among subjects in a
population, sometimes the sample data will lead us to
an incorrect conclusion
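• There are four possible situations:
– H0 is true and we do not reject it (correct decision)
– H0 is true and we reject it (Type I error)
– H0 is false and we reject it (correct decision)
– H0 is false and we do not reject it (Type II error)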
Type I Error
• Type I error occurs if, based on the sample data, it is decided
to reject the null hypothesis when in fact (i.e., in the
population) the null hypothesis is true
Type II Error
• A Type II Error occurs if, based on the sample data, we do not
reject the null hypothesis when in fact (i.e., in the population) the
null hypothesis is false
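A small simulation can make the Type I error rate concrete. The sketch below is an assumption-laden illustration, not part of the original slides: it repeatedly draws samples from a population in which H0 is true (mean 0, known standard deviation 1), runs a two-sided Z test each time, and counts how often H0 is wrongly rejected. The function name and all parameter values are hypothetical.

```python
import random
from statistics import NormalDist


def type1_error_rate(n_sims=4000, n=30, alpha=0.05, seed=1):
    """Share of simulated samples that reject a TRUE null hypothesis (mu = 0)."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(n_sims):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        sample_mean = sum(sample) / n
        z = sample_mean / (1.0 / n ** 0.5)  # Z test with known sigma = 1
        if abs(z) > crit:
            rejections += 1
    return rejections / n_sims


rate = type1_error_rate()
print(rate)  # should be close to alpha
```

The rejection rate settles near α because random variation alone pushes the test statistic past the critical value in about that share of samples.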
CI’s and Hypothesis testing
There is a strong link between (two-tailed) hypothesis testing and
confidence intervals:
• Suppose 𝐻0 is that a treatment effect is zero and the significance is
set at 𝛼 = 0.05
• Then 𝐻0 can be rejected if the 95% confidence interval for the
treatment effect does not include zero.
• Equivalently, the hypothesised value (zero) lies outside the range of
values that are compatible with the data at the 95% level
• i.e. if 𝐻0 were true, the probability of observing a test statistic at least
this extreme would be less than 𝛼 = 0.05
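The equivalence between a two-sided test and a confidence interval can be checked numerically. The sketch below assumes a normally distributed estimate with a known standard error; the function name `ci_and_test` and the numbers passed to it are hypothetical illustrations.

```python
from statistics import NormalDist


def ci_and_test(estimate, se, null_value=0.0, alpha=0.05):
    """Return the (1 - alpha) CI, the z statistic and the two-sided p-value."""
    norm = NormalDist()
    crit = norm.inv_cdf(1 - alpha / 2)
    ci = (estimate - crit * se, estimate + crit * se)
    z = (estimate - null_value) / se
    p_value = 2 * norm.cdf(-abs(z))
    return ci, z, p_value


# Hypothetical treatment effect of 2.0 with standard error 0.8
ci, z, p_value = ci_and_test(estimate=2.0, se=0.8)
zero_outside_ci = not (ci[0] <= 0.0 <= ci[1])
print(ci, round(z, 2), round(p_value, 4), zero_outside_ci)
```

Whenever zero falls outside the 95% interval the two-sided p-value is below 0.05, and vice versa, because both decisions use the same critical value.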
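Hypothesis tests
A. A sample mean
B. The difference between two sample means (independent samples)
C. The mean difference between two dependent samples
D. A sample proportion
E. The difference between two sample proportions (independent samples)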
a) Hypothesis test for a single mean EXAMPLE
• Researchers collected serum amylase values from a random
sample of 50 apparently healthy subjects.
• They want to know whether they can conclude that the mean
of the population from which the sample was drawn is
different from 120 units/100 ml, at a significance level of 0.05.
• The mean and standard deviation computed from the sample
are 96 and 35 units/100 ml respectively.
• We don’t have the data but will assume a normal distribution
for sample values
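The test statistic for this example can be computed directly from the summary figures given above. This is a sketch rather than the slides' own worked solution; the critical value is an approximate t-table lookup for 49 degrees of freedom and is an assumption added here.

```python
import math

# Summary statistics from the serum amylase example
n, xbar, s, mu0 = 50, 96.0, 35.0, 120.0

se = s / math.sqrt(n)   # standard error of the sample mean
t = (xbar - mu0) / se   # one-sample t statistic
df = n - 1

# Assumption: approximate two-sided 5% critical value for df = 49 from t tables
T_CRIT = 2.010

print(round(t, 3), df, abs(t) > T_CRIT)
```

Because |t| is well beyond the critical value, H0: μ = 120 would be rejected at the 0.05 level.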
[Stata output not reproduced; callouts marked the test statistic, the degrees of freedom (df) and the p-value for the two-sided test.]
Hypothesis tests
b) Hypothesis test for difference between two means (independent
samples)
Assumptions:
1. The two populations are normally distributed.
Notes about the normality assumption:
• Hypothesis tests for means are fairly robust to departures from the assumption that
the underlying population distributions are approximately normal.
Notes about the equality of variance assumption:
• Stata performs the unequal-variances version of the t-test when you specify the unequal option after the comma.
• The degrees of freedom for this test are not the usual n1 + n2 - 2 and may not even
be an integer.
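To see why the degrees of freedom need not be an integer, the sketch below evaluates the Satterthwaite approximation. It is an assumption that this is the formula behind the unequal-variances test referred to above, and the group sizes and standard deviations are illustrative values only.

```python
def satterthwaite_df(s1: float, n1: int, s2: float, n2: int) -> float:
    """Approximate degrees of freedom for a t-test with unequal variances."""
    v1 = s1 ** 2 / n1  # squared standard error contributed by group 1
    v2 = s2 ** 2 / n2  # squared standard error contributed by group 2
    return (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))


# Illustrative values: 10 observations with SD 2.5 and 9 observations with SD 2.0
df_unequal = satterthwaite_df(2.5, 10, 2.0, 9)
df_pooled = 10 + 9 - 2

print(round(df_unequal, 2), df_pooled)
```

The approximate df is fractional and no larger than n1 + n2 - 2, which is why the unequal-variances test is slightly more conservative.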
Checking assumptions of Normality:
histogram pulse1, by(sex) norm
graph hbox pulse1, by(sex)
[Figures not reproduced: histograms of pulse1 with a normal density overlay and horizontal box plots of pulse1, each by sex, followed by normal probability plots of pulse1. Annotation: the variances are similar.]
Hypothesis tests
b) Hypothesis test for difference between two means (independent
samples) EXAMPLE
Step 4: using Stata to calculate the test statistic and p-value
Allowing unequal variances affects:
• the standard error for the difference in means and the CI
• the test statistic t
• the p-value
• the df
Hypothesis tests
c) Hypothesis test for difference between two means (dependent
samples)
Assumptions
1. The population of difference scores is normally distributed.
2. The two samples are dependent (e.g. before and after, pairs of knees)
Just like for confidence intervals, we start by calculating the difference scores
(e.g. BP_after - BP_before) for each pair. Then perform a t-test on the new
variable.
Hypothesis tests
c) Hypothesis test for difference between two means (dependent
samples) EXAMPLE
• 13 apnea patients received aminophylline treatment
• The number of apneic episodes per hour was measured 24 hours
before treatment and 16 hours after treatment
• Test the difference in mean apneic episodes per hour before and
after the treatment
Checking normality of diff = before - after:
hist diff, bin(8)
pnorm diff
[Figures not reproduced: histogram of diff and normal probability plot of diff.]
Hypothesis tests
c) Hypothesis test for difference between two means (dependent
samples) EXAMPLE
Step 4: calculate test statistic and p-value
Or, using the Stata dataset: ttest varname == 0 (where varname is the variable holding the differences)
Step 5: conclusion
The test statistic is t=5.278 compared to a t-distribution with df=12.
The p-value is p=0.0002 which is less than the pre-specified alpha level 0.05. We
therefore reject H0 and conclude that there is a significant difference between
apneic episodes before and after treatment.
Because the mean difference of the “before – after” scores was positive, this means that
on average the number of apneic episodes was higher before the treatment.
That is, treatment with Aminophylline seems to influence (reduce) the frequency of
apneic episodes.
d) Hypothesis test for a sample proportion
[Stata output not reproduced; callouts marked the Z-statistic and the p-value for the two-sided test.]
Note: don’t worry that the estimate is labelled “mean” in the Stata output; it is still a test of a proportion.
Hypothesis tests
e) Hypothesis test for difference between two proportions
(independent samples) EXAMPLE
• In a field trial of the Salk poliomyelitis vaccine, 200,745 children received the vaccine,
of whom 33 developed paralytic polio.
• Placebo was given to 201,229 children, of whom 115 contracted paralytic polio.
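The counts above are enough to sketch a two-sample test of proportions. The code below uses the usual pooled-proportion Z statistic; this is an assumption about the method, since the slides' own calculation is not reproduced here, and the function name `two_proportion_z` is hypothetical.

```python
import math
from statistics import NormalDist


def two_proportion_z(x1: int, n1: int, x2: int, n2: int):
    """Z test for H0: p1 = p2 using the pooled proportion for the standard error."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided p-value
    return p1, p2, z, p_value


# Vaccine group: 33 of 200,745; placebo group: 115 of 201,229
p_vaccine, p_placebo, z, p_value = two_proportion_z(33, 200745, 115, 201229)
print(p_vaccine, p_placebo, round(z, 2), p_value)
```

The negative Z statistic reflects the lower proportion of paralytic polio in the vaccine group, and the p-value is far below 0.05.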
[Stata output not reproduced; callouts marked the test statistic and the proportions (row percentages) in the vaccine group and the placebo group.]
CI’s and Hypothesis testing
Reporting considerations
• Reporting the results of hypothesis tests as “significant” or “not significant”
is not very informative
• The results of a medical study don’t simplify into a “yes” or “no” answer
• It is better to provide the actual p-value for the hypothesis test. Many medical
journals require that hypothesis tests in submitted papers be reported with
p-values and not as “S” or “NS”.
Statistical vs. Clinical significance
• Reporting results as “significant” or “not significant” encourages the
possibly erroneous interpretation that statistical significance is the same as
medical or practical importance.
Power considerations
• If the null hypothesis is not rejected, it does not mean that it should be
accepted as true.
• We may have made a Type II error (failed to detect a difference when one
really exists).
• In this case one should consider the power of the test (i.e., its ability to
detect a difference that would be considered to be clinically important or
worth detecting).
Exercise 1a
single proportion – one tail test
• In a sample of 1500 residents of an inner city
neighbourhood who participated in a health screening
program, 125 tests yielded positive results for sickle-
cell anaemia.
• Let α = 0.05
Ho: The proportion of individuals in the population with sickle cell
anaemia is less than or equal to 0.06.
(P ≤ 0.06)
Ha: The proportion of individuals in the population with sickle cell
anaemia is greater than 0.06.
(P > 0.06) one-sided test
Assumptions:
i) must assume random sample, independent observations.
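ii) nP0 = 90, n(1 − P0) = 1410, both > 5, so the normal approximation
to the binomial is appropriate (P0 = 0.06).
Set α = 0.05
\[ z = \frac{\hat{p} - P_0}{\sqrt{P_0 (1 - P_0) / n}} \]
Where:
\[ \hat{p} = \frac{125}{1500} = 0.0833, \qquad P_0 = 0.06, \qquad n = 1500 \]
The hand calculation with rounded intermediate values gives z ≈ 3.82: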
display 1-normprob(3.82)
0.00006673 (one-sided test)
display 2*(1-normprob(3.82))
.00013345 (two-sided test)
prtesti 1500 125 0.06, count
One-sample test of proportion x: Number of obs = 1500
------------------------------------------------------------------------------
Variable | Mean Std. Err. [95% Conf. Interval]
-------------+----------------------------------------------------------------
x | .0833333 .0071362 .0693466 .0973201
------------------------------------------------------------------------------
p = proportion(x) z = 3.8052
Ho: p = 0.06
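The z statistic in the output above can be reproduced from the counts. The sketch below uses the standard error under H0 (based on P0 rather than on the sample proportion); the function name `one_proportion_z` is a hypothetical helper, not a Stata or library function.

```python
import math
from statistics import NormalDist


def one_proportion_z(x: int, n: int, p0: float):
    """One-sample Z test of a proportion against the hypothesised value p0."""
    p_hat = x / n
    se0 = math.sqrt(p0 * (1 - p0) / n)  # standard error assuming H0 is true
    z = (p_hat - p0) / se0
    p_upper = NormalDist().cdf(-z)      # one-sided p-value for Ha: P > p0
    return p_hat, z, p_upper


p_hat, z, p_upper = one_proportion_z(125, 1500, 0.06)
print(round(p_hat, 4), round(z, 4), p_upper)
```

The unrounded calculation agrees with the z value reported by prtesti, and the one-sided p-value is well below α = 0.05, so Ho would be rejected.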
Set the level of significance (α)=1% and thus also consider 99% confidence intervals
Exercise 1b
Comparing two means (independent)
Test the hypothesis that the vaccines differ in effectiveness with α = 0.05
(assume that antibody responses are normally distributed).
Ho: The mean antibody responses of the two vaccines are the same (μ1 = μ2)
Ha: The mean antibody responses of the two vaccines are different
(two-tailed test) (μ1 ≠ μ2)
Assumptions:
We are told in the question that the populations are normally distributed.
The sample standard deviations are similar.
We would need to check with the investigator that the two groups were chosen at random
and that the observations in the two groups are independent.
Set α = 0.05
\[ \bar{x}_1 - \bar{x}_2 = 4.5 - 2.5 = 2.0 \]
\[ s_p^2 = \frac{s_1^2 (n_1 - 1) + s_2^2 (n_2 - 1)}{n_1 + n_2 - 2} = \frac{2.5^2 \times 9 + 2.0^2 \times 8}{10 + 9 - 2} = 5.1912 \]
\[ t = \frac{2.0}{\sqrt{5.1912 \left( \frac{1}{10} + \frac{1}{9} \right)}} = 1.91 \]
ttesti 10 4.5 2.5 9 2.5 2.0
Two-sample t test with equal variances
------------------------------------------------------------------------------
| Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
---------+--------------------------------------------------------------------
x | 10 4.5 .7905694 2.5 2.711608 6.288392
y | 9 2.5 .6666667 2 .9626639 4.037336
---------+--------------------------------------------------------------------
combined | 19 3.552632 .5598594 2.440371 2.376411 4.728853
---------+--------------------------------------------------------------------
diff | 2 1.04686 -.2086807 4.208681
------------------------------------------------------------------------------
diff = mean(x) - mean(y) t = 1.9105
Ho: diff = 0 degrees of freedom = 17
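The figures in the ttesti output above can be reproduced from the summary statistics alone. The sketch below is an illustrative pooled-variance calculation; the function name `pooled_t` is hypothetical, and the critical value is an approximate t-table lookup for 17 degrees of freedom added here as an assumption.

```python
import math


def pooled_t(n1, m1, s1, n2, m2, s2):
    """Two-sample t statistic assuming equal population variances."""
    df = n1 + n2 - 2
    sp2 = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df  # pooled variance
    se = math.sqrt(sp2 * (1 / n1 + 1 / n2))               # SE of the difference
    return (m1 - m2) / se, df, sp2, se


t, df, sp2, se = pooled_t(10, 4.5, 2.5, 9, 2.5, 2.0)

# Assumption: approximate two-sided 5% critical value for df = 17 from t tables
T_CRIT = 2.110

print(round(t, 4), df, round(sp2, 4), abs(t) > T_CRIT)
```

Since |t| is below the critical value, the two-sided test does not reject Ho at the 0.05 level.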
Conclusion:
There is no statistically significant difference in the mean antibody response
between the two vaccines, since the p-value is greater than 0.05 (p = 0.073).
Exercise 1c- Paired t-test
Ho: The mean difference in dexterity scores is less than or equal
to zero (μD ≤ 0)
Ha: The mean difference in dexterity scores is greater than zero (μD > 0)
• Assumptions:
• that the two samples are dependent (we are told that we have matched pairs)
• that the pairs were obtained by independent random sampling (need to check with
investigator)
• that individuals within pairs were randomly allocated to new or standard treatment (we
are told that this is so)
• Set α = 0.05
• You could calculate the mean and standard deviations of the within pair differences by
hand.
• Alternatively you could enter the data into Stata and get Stata to do this for you:
gen diff = new - standard
sum diff
Variable | Obs Mean Std.Dev. Min Max
---------+--------------------------------
diff | 24 5.375 5.64772 -5 17
One-sample t test
--------------------------------------------------------------------
Obs Mean Std. Err. Std. Dev. [95% Conf. Interval]
--------+----------------------------------------------------------
x | 24 5.375 1.152836 5.64772 2.990177 7.759823
--------------------------------------------------------------------
Degrees of freedom: 23
Ho: mean(x) = 0
Ha: mean < 0 Ha: mean != 0 Ha: mean > 0
t = 4.6624 t = 4.6624 t = 4.6624
P < t = 0.9999 P > |t| = 0.0001 P > t = 0.0001
Note: Stata labels this output a one-sample t test.
A paired t-test is a one-sample t-test on the within-pair differences with Ho: mean difference = 0,
hence the output has the same form as the one-sample test.
Conclusion:
• Since p< 0.05, we reject Ho and conclude that the
dexterity scores are statistically significantly higher
on the new therapy as compared to the standard
treatment.
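The paired t statistic in this exercise follows directly from the summary of the within-pair differences. The sketch below recomputes it from the mean, standard deviation and number of pairs reported by Stata; the one-sided critical value is an approximate t-table lookup for 23 degrees of freedom and is an assumption added here.

```python
import math

# Summary of diff = new - standard, as reported by Stata
n, mean_diff, sd_diff = 24, 5.375, 5.64772

se = sd_diff / math.sqrt(n)  # standard error of the mean difference
t = mean_diff / se           # paired t statistic for Ho: mean difference = 0
df = n - 1

# Assumption: approximate one-sided 5% critical value for df = 23 from t tables
T_CRIT_ONE_SIDED = 1.714

print(round(se, 6), round(t, 4), df, t > T_CRIT_ONE_SIDED)
```

The statistic exceeds the one-sided critical value, which matches the decision to reject Ho.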
THANK YOU FOR YOUR ATTENTION!