
Hypothesis testing

Luca Aguilar

Mathematics Department, School of Technology, University of Extremadura


[email protected]

May 2017

Overview

1 Introduction

2 Hypothesis tests for the parameters of key distributions

3 Relation between confidence intervals and hypothesis tests

4 Classical approach to hypothesis testing



1 Introduction

In the previous lecture you studied point and interval
estimation for the unknown parameter values of
distributions. Both are techniques employed in the
inferential phase of statistical analysis.
In this course we consider hypothesis testing, the third
main inferential technique.
As we shall see, hypothesis testing has much in common
with confidence interval construction, and we will draw
heavily on the results already presented for the sampling
distributions of the random variables underpinning such
constructions.

1 Introduction

We first introduce and illustrate hypothesis testing
procedures for the unknown parameter values of the main
distributions.
In the practical session we will introduce a test used to
explore whether a data set was drawn from a normal
population.

2 Hypothesis tests for the parameters of key distributions

In this section we consider procedures designed to test
whether the parameters of the Bernoulli, binomial, Poisson,
exponential and normal distributions take specified values.
We assume that a single variable, X, is of interest and we
measure its value for each element of a simple random
sample of size n drawn from the population under
consideration.
Due to the use of simple random sampling, it is reasonable
to assume that the values obtained, X1, X2, ..., Xn, are
independent and identically distributed (iid) with a common
distribution which is that of X in the population.

2.1 Bernoulli distribution

As we know, the Bernoulli distribution has a single
parameter: p, the probability of success.
Suppose X1, X2, ..., Xn are iid Bernoulli random variables
and we are interested in testing whether p takes a specific
value, p0 say.
We say that the null hypothesis is that p = p0, denoted as

H0 : p = p0.

2.1 Bernoulli distribution

The null hypothesis is tested against an alternative
hypothesis, denoted as H1, which in this case could be any
one of:
1 H1 : p ≠ p0;
2 H1 : p < p0;
3 H1 : p > p0.
The first is referred to as being two-sided because under it
p could be less than p0 or greater than p0.
The other two are referred to as being one-sided
alternative hypotheses.
Unless we have reasons for investigating one of the
one-sided alternatives, it is usual to test H0 against the
two-sided alternative hypothesis.

2.1 Bernoulli distribution

To make things more concrete, we will consider an
example, Example 1. A large computing company wants
to know what proportion of its workers are in their offices
after 20:00. On a typical working day they chose 50
members of staff at random and found that the proportion
of the 50 who were in their offices after 8 o'clock was
0.72.
So our point estimate of p is p̂ = 0.72.
We can calculate a 95% confidence interval for p,
obtaining the interval (0.60, 0.84).
Suppose now that we wanted to test the null hypothesis
that p = 0.75, i.e. H0 : p = 0.75, against the two-sided
alternative hypothesis that p ≠ 0.75, i.e. H1 : p ≠ 0.75.

2.1 Bernoulli distribution

Having identified the null and alternative hypotheses we
next calculate the value of a test statistic.
For the problem under consideration, and sufficiently large
n, this is none other than the value of the random variable
(p̂ − p)/√(p(1 − p)/n), with p replaced by its value under
the null hypothesis, namely p0.
Implicitly, then, we are assuming the null hypothesis to be
true. And we carry on assuming it to be true until we have
sufficient evidence to the contrary.
So we calculate the value of

Z0 = (p̂ − p0) / √(p0(1 − p0)/n).

2.1 Bernoulli distribution

Z0 = (p̂ − p0) / √(p0(1 − p0)/n).

If p = p0 we would expect p̂ to have a value close to p0,
and hence Z0 to have a value close to 0.
If p were actually less than p0 then we would expect p̂ to be
less than p0 and hence the value of Z0 to be negative.
If p were actually greater than p0 then we would expect p̂ to
be greater than p0 and hence the value of Z0 to be positive.

2.1 Bernoulli distribution

Z0 = (p̂ − p0) / √(p0(1 − p0)/n).

Returning to the data of Example 1, under H0 : p = 0.75
the observed value of Z0 is

z0 = (0.72 − 0.75)/√(0.75(1 − 0.75)/50) = −0.49.

Does this value provide evidence for or against the null
hypothesis?

2.1 Bernoulli distribution

Z0 = (p̂ − p0) / √(p0(1 − p0)/n).

Being an estimator, p̂ is a random variable and, as Z0 is a
function of it, Z0 is a random variable too.
According to the Central Limit Theorem, for large enough
n, the distribution of Z0 should be well approximated by the
standard normal distribution if p = p0.
To decide whether the calculated value of Z0 is sufficiently
far from 0 for us to reject the null hypothesis, we calculate
what is referred to as the p-value of the test, defined as:

p-value

Definition
The probability, under the null hypothesis, of observing a value
of the test statistic that is at least as extreme as its value for the
data.

In this definition, what "extreme" means depends on the
alternative hypothesis under consideration.
For the problem under consideration, the test statistic is Z0.
Values of Z0 close to 0 tend to support the null hypothesis.

p-value

Z0 = (p̂ − p0) / √(p0(1 − p0)/n).

For a two-sided alternative hypothesis, large values of Z0,
either positive or negative, tend to indicate that the
alternative hypothesis is true.
For H1 : p < p0, large negative values of Z0 tend to indicate
that the alternative hypothesis is true.
For H1 : p > p0, large positive values of Z0 tend to indicate
that the alternative hypothesis is true.

p-value

So, for the two-sided alternative hypothesis H1 : p ≠ p0 the
p-value is
P(Z0 ≤ −|z0| or Z0 ≥ |z0|).
As the standard normal distribution is symmetric, this
probability is

2P(Z0 ≤ −|z0|) = 2Φ(−|z0|),

where Φ denotes the distribution function of the standard
normal distribution.

p-value

For the one-sided alternative hypothesis H1 : p < p0 the
p-value is
P(Z0 ≤ z0) = Φ(z0).
For the one-sided alternative hypothesis H1 : p > p0 the
p-value is
P(Z0 ≥ z0) = 1 − Φ(z0).
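All three p-values are simple functions of the observed z0 and the standard normal distribution function. A minimal sketch of this mapping in R (the helper name p_value is illustrative, not part of the lecture):

# p-value of the z test for an observed z0, for each alternative hypothesis
p_value <- function(z0, alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  switch(alternative,
         two.sided = 2 * pnorm(-abs(z0)),  # 2 * Phi(-|z0|)
         less      = pnorm(z0),            # Phi(z0)
         greater   = 1 - pnorm(z0))        # 1 - Phi(z0)
}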

p-value

Returning to the hypothesis testing problem for the data
of Example 1, for the two-sided alternative hypothesis
H1 : p ≠ 0.75, the p-value is 2Φ(−|−0.49|) = 2Φ(−0.49).
In R this probability can be calculated using
2*pnorm(-0.49)
The value returned by R is 0.62.
If the alternative hypothesis had been H1 : p < 0.75, the
p-value would have been Φ(z0) = Φ(−0.49) = 0.31.
If the alternative hypothesis had been H1 : p > 0.75, the
p-value would have been
1 − Φ(−0.49) = 1 − 0.31 = 0.69.
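The whole calculation for Example 1 can be reproduced in a few lines of R; a minimal sketch using the figures of the example (p̂ = 0.72, p0 = 0.75, n = 50):

phat <- 0.72; p0 <- 0.75; n <- 50            # Example 1 figures
z0 <- (phat - p0) / sqrt(p0 * (1 - p0) / n)  # observed test statistic, about -0.49
2 * pnorm(-abs(z0))                          # two-sided p-value, about 0.62
pnorm(z0)                                    # p-value against H1: p < 0.75, about 0.31
1 - pnorm(z0)                                # p-value against H1: p > 0.75, about 0.69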

p-value

Given its definition, a p-value is a probability and must
therefore take a value in the interval [0, 1].
Large p-values tend to suggest that values of the test
statistic like that observed for the data are highly likely
under the null hypothesis.
Small p-values suggest that values of the test statistic like
that observed for the data are improbable under the null
hypothesis.
So, a large p-value does not provide evidence against the
null hypothesis.
A small p-value does provide evidence against the null
hypothesis and in favour of the alternative hypothesis.

p-value

But how small is small, I hear you ask.
It has become standard practice in many disciplines to take
p-values less than 0.05 (5%, or 1 in 20) as significant
statistical evidence for the rejection of the null hypothesis
and acceptance of the alternative hypothesis.

p-value

For our testing problem for the data of Example 1, none of
the p-values for the different alternative hypotheses are
less than 0.05.
So there is no statistical evidence to reject the null
hypothesis H0 : p = 0.75 in favour of any of the three
alternative hypotheses at the 5% level of significance.
In practice, we would only be interested in testing the null
hypothesis against one of the three alternative
hypotheses, not all three.

p-value

But what if we had wanted to test the null hypothesis
H0 : p = 0.55?

Now, z0 = (0.72 − 0.55)/√(0.55(1 − 0.55)/50) = 2.42,
and:
1 2Φ(−|z0|) = 2Φ(−2.42) = 0.016;
2 Φ(z0) = Φ(2.42) = 0.992;
3 1 − Φ(z0) = 1 − Φ(2.42) = 0.008.
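A quick check of these three numbers in R, using the same figures:

z0 <- (0.72 - 0.55) / sqrt(0.55 * (1 - 0.55) / 50)  # about 2.42
2 * pnorm(-abs(z0))                                 # two-sided p-value, about 0.016
pnorm(z0)                                           # p-value against H1: p < 0.55, about 0.992
1 - pnorm(z0)                                       # p-value against H1: p > 0.55, about 0.008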

p-value
So, if the alternative hypothesis had been H1 : p ≠ 0.55,
we would have rejected the null hypothesis H0 : p = 0.55
in favour of the alternative hypothesis at the 5%
significance level because 0.016 is less than 0.05.
If the alternative hypothesis had been H1 : p < 0.55 we
would not have rejected the null hypothesis in favour of
the alternative hypothesis at the 5% significance level
because 0.992 is (much) greater than 0.05.
If the alternative hypothesis had been H1 : p > 0.55 we
would have rejected the null hypothesis in favour of the
alternative hypothesis at the 5% significance level
because 0.008 is smaller than 0.05.
So, clearly, the result of a hypothesis test very much
depends on the alternative hypothesis being investigated,
not just the null hypothesis.

2.1 Bernoulli distribution

The steps in this hypothesis testing procedure are:
1 Identify the value of p0 for the null hypothesis H0 : p = p0.
2 Identify the alternative hypothesis H1 of interest (one of
p ≠ p0, p < p0 or p > p0).
3 Calculate the value of the test statistic
z0 = (p̂ − p0) / √(p0(1 − p0)/n) for the data.
4 Calculate the p-value for the chosen alternative hypothesis
(either 2Φ(−|z0|), Φ(z0) or 1 − Φ(z0)).
5 If the p-value is less than 0.05, reject the null hypothesis in
favour of the alternative hypothesis (at the 5% significance
level).
6 If not, there is no statistically significant evidence against
the null hypothesis and there is no reason to reject it in
favour of the alternative hypothesis.
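As a sketch, these six steps can be wrapped up in a small R function; the function name and arguments below are illustrative, not part of the lecture:

# z test for the success probability of a Bernoulli distribution
# phat: sample proportion; n: sample size; p0: value under H0;
# alternative: which alternative hypothesis is of interest
prop_z_test <- function(phat, n, p0,
                        alternative = c("two.sided", "less", "greater")) {
  alternative <- match.arg(alternative)
  z0 <- (phat - p0) / sqrt(p0 * (1 - p0) / n)      # step 3
  p_value <- switch(alternative,                   # step 4
                    two.sided = 2 * pnorm(-abs(z0)),
                    less      = pnorm(z0),
                    greater   = 1 - pnorm(z0))
  list(z0 = z0, p_value = p_value,
       reject_at_5pct = p_value < 0.05)            # steps 5 and 6
}

# Example 1: H0: p = 0.75 against H1: p != 0.75; p-value about 0.62, so H0 is not rejected
prop_z_test(phat = 0.72, n = 50, p0 = 0.75)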

2.2 Binomial distribution B(k, p)

We will assume that the number of Bernoulli trials, k, is
known, and concentrate on hypothesis testing for p, the
probability of success in a Bernoulli trial.
Proceeding as in Section 2.1, the steps for this hypothesis
testing procedure are the same as those at the end of the
previous subsection apart from the third, which changes to:
3. Calculate the value of the test statistic
z0 = (p̂ − p0) / √(p0(1 − p0)/(kn)) for the data.
The only difference between the test statistic here and that
for the test in Section 2.1 is the inclusion of the number of
Bernoulli trials, k.

Example 2

45 Computer Engineering and 52 Software Engineering
students attempted to complete 6 simple programming
tasks in two hours. It was found that, for the 45 Computer
Engineering students, the proportion of successfully
completed programming tasks was 0.65, and for the 52
Software Engineering students the proportion was 0.78.
We can calculate 95% confidence intervals for the value
of p in the two populations of students, obtaining the
interval (0.59, 0.71) for the Computer Engineering
students and (0.73, 0.83) for the Software Engineering
students.
Here we consider testing the null hypothesis
H0 : p = 0.68 against the alternative hypothesis
H1 : p > 0.68 in each of the two populations.

Example 2

For the Computer Engineering students,
z0 = (0.65 − 0.68)/√(0.68(1 − 0.68)/(6 × 45)) = −1.06,
and for the Software Engineering students,
z0 = (0.78 − 0.68)/√(0.68(1 − 0.68)/(6 × 52)) = 3.79.
For the Computer Engineering students,
1 − Φ(z0) = 1 − Φ(−1.06) = 0.86,
and for the Software Engineering students,
1 − Φ(z0) = 1 − Φ(3.79) = 0.00008.
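A sketch of these two calculations in R, using the figures stated in Example 2:

# Computer Engineering students: phat = 0.65, k*n = 6*45 trials
z0_ce <- (0.65 - 0.68) / sqrt(0.68 * (1 - 0.68) / (6 * 45))
1 - pnorm(z0_ce)   # p-value against H1: p > 0.68, about 0.86

# Software Engineering students: phat = 0.78, k*n = 6*52 trials
z0_se <- (0.78 - 0.68) / sqrt(0.68 * (1 - 0.68) / (6 * 52))
1 - pnorm(z0_se)   # p-value against H1: p > 0.68, about 0.00008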

Example 2

As the p-value for the Computer Engineering students is
greater than 0.05, we have no reason to reject the null
hypothesis H0 : p = 0.68 in favour of the alternative
hypothesis H1 : p > 0.68 at the 5% significance level.
However, the p-value for the Software Engineering
students is far less than 0.05 and so we reject the null
hypothesis H0 : p = 0.68 in favour of the alternative
hypothesis H1 : p > 0.68 at the 5% significance level.

2.3 Poisson distribution

Here we consider hypothesis testing for the parameter λ of
a Poisson distribution.
Proceeding along analogous lines to those in Section 2.1,
the steps for this hypothesis testing procedure are the
same as those at the end of Section 2.1 apart from the first
three, which change to:
1 Identify the value of λ0 for the null hypothesis H0 : λ = λ0.
2 Identify the alternative hypothesis H1 of interest (one of
λ ≠ λ0, λ < λ0 or λ > λ0).
3 Calculate the value of the test statistic
z0 = (x̄ − λ0) / √(λ0/n) for the data.

Example 3

Number of users connecting to the network each minute
during a period of an hour:
4 5 7 8 7 6 7 4 4 7
9 7 3 5 10 8 7 4 7 10
6 6 6 6 9 8 8 7 4 4
9 6 11 5 12 11 7 4 4 10
8 11 6 8 8 10 7 12 6 6
7 10 8 12 3 6 3 8 4 4

The point estimate of λ, λ̂ = x̄, is 6.98.
We now test the null hypothesis H0 : λ = 8 against the
alternative hypothesis H1 : λ < 8.
The value of the test statistic is
z0 = (6.98 − 8)/√(8/60) = −2.79.

Example 3

The p-value is Φ(z0) = Φ(−2.79) = 0.003.
As the p-value is considerably less than 0.05, we reject
the null hypothesis H0 : λ = 8 in favour of the alternative
hypothesis H1 : λ < 8 at the 5% significance level.
We therefore have significant statistical evidence that the
mean number of connections to the network per minute is
less than 8.
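A sketch of the whole calculation in R, starting from the data in the table above:

# Connections per minute over one hour (the 60 values of Example 3)
x <- c(4, 5, 7, 8, 7, 6, 7, 4, 4, 7,
       9, 7, 3, 5, 10, 8, 7, 4, 7, 10,
       6, 6, 6, 6, 9, 8, 8, 7, 4, 4,
       9, 6, 11, 5, 12, 11, 7, 4, 4, 10,
       8, 11, 6, 8, 8, 10, 7, 12, 6, 6,
       7, 10, 8, 12, 3, 6, 3, 8, 4, 4)
lambda0 <- 8
z0 <- (mean(x) - lambda0) / sqrt(lambda0 / length(x))  # about -2.79
pnorm(z0)                                              # p-value against H1: lambda < 8, about 0.003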

Table 1: Summary of hypothesis tests, with the column headings denoting: D, distribution; NH, null hypothesis; AH,
alternative hypothesis; TS, test statistic; SD, sampling distribution under the null hypothesis; PV, p-value.

D            NH          AH          TS                                  SD         PV
Bernoulli    p = p0      p ≠ p0      Z0 = (p̂ − p0)/√(p0(1 − p0)/n)       N(0, 1)    2Φ(−|z0|)
                         p < p0                                                     Φ(z0)
                         p > p0                                                     1 − Φ(z0)
Binomial     p = p0      p ≠ p0      Z0 = (p̂ − p0)/√(p0(1 − p0)/(kn))    N(0, 1)    2Φ(−|z0|)
                         p < p0                                                     Φ(z0)
                         p > p0                                                     1 − Φ(z0)
Poisson      λ = λ0      λ ≠ λ0      Z0 = (X̄ − λ0)/√(λ0/n)               N(0, 1)    2Φ(−|z0|)
                         λ < λ0                                                     Φ(z0)
                         λ > λ0                                                     1 − Φ(z0)
Exponential  λ = λ0      λ ≠ λ0      Z0 = (λ0 X̄ − 1)√n                   N(0, 1)    2Φ(−|z0|)
                         λ < λ0                                                     1 − Φ(z0)
                         λ > λ0                                                     Φ(z0)
Normal       μ = μ0      μ ≠ μ0      T0 = (X̄ − μ0)/(S/√n)                t_{n−1}    2F_{t,n−1}(−|t0|)
                         μ < μ0                                                     F_{t,n−1}(t0)
                         μ > μ0                                                     1 − F_{t,n−1}(t0)
             σ² = σ0²    σ² ≠ σ0²    C0 = (n − 1)S²/σ0²                  χ²_{n−1}   2 min(F_{χ²,n−1}(c0), 1 − F_{χ²,n−1}(c0))
                         σ² < σ0²                                                   F_{χ²,n−1}(c0)
                         σ² > σ0²                                                   1 − F_{χ²,n−1}(c0)
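The two normal-distribution rows of Table 1 are not worked through in an example above, so here is a minimal sketch of both tests in R; the data vector x and the null values mu0 and sigma0_sq are illustrative, not from the lecture:

x <- c(10.2, 9.8, 11.1, 10.5, 9.6, 10.9, 10.4, 9.9)   # illustrative sample
n <- length(x)

# Test of the mean: H0: mu = mu0 against H1: mu != mu0
mu0 <- 10
t0 <- (mean(x) - mu0) / (sd(x) / sqrt(n))
2 * pt(-abs(t0), df = n - 1)                          # two-sided p-value
# t.test(x, mu = mu0) reports the same statistic and p-value

# Test of the variance: H0: sigma^2 = sigma0_sq against H1: sigma^2 != sigma0_sq
sigma0_sq <- 0.25
c0 <- (n - 1) * var(x) / sigma0_sq
2 * min(pchisq(c0, df = n - 1), 1 - pchisq(c0, df = n - 1))  # two-sided p-value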

3 Relation between confidence intervals and hypothesis tests

There is clearly much in common between the confidence
interval constructions and the hypothesis testing
procedures.
More generally, a 100(1 − α)% confidence interval for a
parameter, θ say, contains all those values of θ0 for which
the null hypothesis H0 : θ = θ0 would not be rejected at the
100α% significance level against the two-sided alternative
hypothesis H1 : θ ≠ θ0.

3 Relation between confidence intervals and hypothesis tests

So, a 95% confidence interval for θ identifies all those
values of θ0 for which the null hypothesis H0 : θ = θ0
would not be rejected at the 5% significance level against
the alternative hypothesis H1 : θ ≠ θ0.
As the significance level most often applied in hypothesis
testing is 5%, this is the main reason why 95%
confidence intervals are usually quoted.
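This duality is easy to see with R's built-in one-sample t test, where the 95% confidence interval and the two-sided 5%-level tests come from the same output; a sketch with an illustrative data vector:

x <- c(10.2, 9.8, 11.1, 10.5, 9.6, 10.9, 10.4, 9.9)   # illustrative sample

t.test(x)$conf.int             # 95% confidence interval for the mean

# Any mu0 inside that interval gives a two-sided p-value above 0.05,
# and any mu0 outside it gives a two-sided p-value below 0.05:
t.test(x, mu = 10.3)$p.value   # mu0 inside the interval  -> p-value > 0.05
t.test(x, mu = 12.0)$p.value   # mu0 outside the interval -> p-value < 0.05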

3 Relation between confidence intervals and hypothesis tests

As we have seen, a confidence interval gives us an idea of
the range of possible values the parameter of interest, θ,
might take (for a given confidence level).
On the other hand, the p-value of a test gives us an idea of
how likely, or unlikely, it is that θ = θ0 against a specified
alternative hypothesis.

4 Classical approach to hypothesis testing

We have based our approach to hypothesis testing on the
calculation of a p-value and its comparison with α = 0.05,
or the significance level of 100α% = 5%.
In the classical approach to hypothesis testing, a
significance level, 100α%, is chosen and the acceptance
region of values of the test statistic is identified for the
chosen significance level and the alternative hypothesis
under consideration.
The value of the test statistic is calculated and, if it falls
outside the acceptance region, and hence inside the
so-called critical region, the null hypothesis is rejected.
If not, the null hypothesis is not rejected.

4 Classical approach to hypothesis testing

Instead of identifying the acceptance and critical regions,
one can equivalently calculate the p-value of the test and
reject the null hypothesis if that p-value is less than α,
corresponding to the significance level of 100α%.
This is what we have done, employing the commonly used
significance level of 5%.
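As a sketch, the two decision rules applied to Example 1 (z0 ≈ −0.49, two-sided alternative, α = 0.05) give the same answer:

alpha <- 0.05
z0 <- -0.49                     # observed statistic from Example 1

# Classical approach: reject H0 if z0 falls inside the critical region
z_crit <- qnorm(1 - alpha / 2)  # critical value, about 1.96
abs(z0) > z_crit                # FALSE: z0 is in the acceptance region, so do not reject

# p-value approach: reject H0 if the p-value is below alpha
2 * pnorm(-abs(z0)) < alpha     # FALSE: the same decision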

Table 2: Summary of the two types of errors that can occur when
performing an hypothesis test.

              Accept H0          Reject H0
H0 true       correct decision   Type I error
H0 false      Type II error      correct decision

The significance level of a test is the probability of
committing a type I error.
We commit a type I error when we reject the null
hypothesis when it is in fact true.
This type of error is always possible, unless, of course, we
never reject the null hypothesis!
An argument for not doing so is based on a consideration
of the type II error.
We commit a type II error when we accept the null
hypothesis when in fact the alternative hypothesis is true
(i.e. we do not reject the null hypothesis when it is false).

Type I and Type II Errors

Clearly, we would like to reduce the probabilities of
committing such errors as much as possible.
However, it is difficult to control both of them
simultaneously and the classical approach is to fix one of
them: the probability of committing a type I error, α, or
equivalently the significance level 100α%.
For a given significance level, 100α%, the power of a test
is 1 − β, where β denotes the probability of committing a
type II error.
For a given significance level, it is natural to seek tests with
high power: so-called powerful tests.
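As a sketch, the power of the one-sided test for a Bernoulli proportion from Section 2.1 can be approximated directly under an assumed true value of p; the function name and the figures below are illustrative, not from the lecture:

# Approximate power of the test of H0: p = p0 against H1: p > p0 at level alpha,
# when the true success probability is p_true and the sample size is n
power_prop_test_upper <- function(p0, p_true, n, alpha = 0.05) {
  z_crit <- qnorm(1 - alpha)              # reject when Z0 > z_crit
  se0 <- sqrt(p0 * (1 - p0) / n)          # standard error used by the test statistic
  se1 <- sqrt(p_true * (1 - p_true) / n)  # standard error under the true p
  # P(phat > p0 + z_crit * se0) when phat is approximately N(p_true, se1^2)
  1 - pnorm((p0 + z_crit * se0 - p_true) / se1)
}

power_prop_test_upper(p0 = 0.55, p_true = 0.70, n = 50)   # power about 0.70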
