Hypothesis Testing

Introduction to Hypothesis Testing
Copyright © 2005 Brooks/Cole, a division of Thomson Learning, Inc. 11.1

Hypothesis Testing
Hypothesis
• An informed guess or assumption about a certain
problem or set of circumstances
• Accepted or rejected hypotheses act as conclusions
for the research effort
Abstracted from Brooks/Cole,. 11.2

Nonstatistical Hypothesis Testing…
A criminal trial is an example of hypothesis testing without
the statistics.
In a trial a jury must decide between two hypotheses. The
null hypothesis is
H0: The defendant is innocent
The alternative hypothesis or research hypothesis is

H1: The defendant is guilty
The jury does not know which hypothesis is true. They must
make a decision on the basis of evidence presented.
In the language of statistics, convicting the defendant is called rejecting
the null hypothesis in favor of the alternative hypothesis. That is, the
jury is saying that there is enough evidence to conclude that the
defendant is guilty (i.e., there is enough evidence to support the
alternative hypothesis).
If the jury acquits, it is stating that there is not enough evidence to
support the alternative hypothesis. Notice that the jury is not saying
that the defendant is innocent, only that there is not enough evidence to
support the alternative hypothesis. That is why we never say that we
accept the null hypothesis, although most people in industry will say
“We accept the null hypothesis.”

There are two possible errors.
A Type I error occurs when we reject a true null hypothesis.
That is, a Type I error occurs when the jury convicts an
innocent person. We would want the probability of this type
of error [maybe 0.001 – beyond a reasonable doubt] to be
very small for a criminal trial where a conviction results in
the death penalty, whereas for a civil trial, where a verdict
might result in someone having to “pay for damages to a
wrecked auto”, we would be willing for the probability to be
larger [0.49 – preponderance of the evidence].
P(Type I error) = α [usually 0.05 or 0.01]

A Type II error occurs when we don’t reject a false null
hypothesis [accept the null hypothesis]. That occurs when a
guilty defendant is acquitted.
In practice, this type of error is by far the most serious
mistake we normally make. For example, suppose we test the
hypothesis that the amount of medication in a heart pill is
equal to a value which will cure your heart problem, and we
“accept the null hypothesis that the amount is ok”. If we later
find out that the average amount is WAY too large and
people die from “too much medication” [I wish we had
rejected the hypothesis and thrown the pills in the trash can],
it’s too late because we already shipped the pills to the public.
The probability of a Type I error is denoted as α (Greek letter
alpha). The probability of a Type II error is β (Greek letter
beta).
The two probabilities are inversely related. Decreasing one
increases the other, for a fixed sample size.
In other words, you can’t have α and β both real small for
any old sample size. You may have to take a much larger
sample size, or in the court example, you need much more
evidence.
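
The trade-off can be checked numerically. The sketch below is an illustration, not part of the original slides: it borrows μ0 = 350, σ = 75 and n = 25 from the demand example used later in these notes, and it assumes a hypothetical true mean of 380 so that β can be computed. The names `beta_two_tailed` and `mu_true` are made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def beta_two_tailed(mu0, mu_true, sigma, n, alpha):
    """P(Type II error) of a two-tailed z-test when the true mean is mu_true."""
    se = sigma / sqrt(n)                          # standard error of the sample mean
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # critical z for a two-tailed test
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    # We fail to reject H0 whenever the sample mean lands between the critical values.
    sampling = NormalDist(mu_true, se)
    return sampling.cdf(upper) - sampling.cdf(lower)

# mu_true = 380 is a hypothetical value chosen only for illustration.
betas = {}
for alpha in (0.10, 0.05, 0.01):
    betas[alpha] = beta_two_tailed(mu0=350, mu_true=380, sigma=75, n=25, alpha=alpha)
    print(f"alpha = {alpha:.2f}  ->  beta = {betas[alpha]:.3f}")
```

With the sample size held at 25, each reduction in α produces a larger β. Raising `n` in the call is the only way to shrink both at once.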

Types of Errors…
A Type I error occurs when we reject a true null hypothesis
(i.e. Reject H0 when it is TRUE)
A Type II error occurs when we don’t reject a false null
hypothesis (i.e. Do NOT reject H0 when it is FALSE)

                     H0 is TRUE          H0 is FALSE
Reject H0            Type I error        Correct decision
Do not reject H0     Correct decision    Type II error

The critical concepts are these:
1. There are two hypotheses, the null and the alternative hypotheses.
2. The procedure begins with the assumption that the null hypothesis is
true.
3. The goal is to determine whether there is enough evidence to infer
that the alternative hypothesis is true, or the null is not likely to be
true.
4. There are two possible decisions:
• Conclude that there is enough evidence to support the alternative
hypothesis. Reject the null.
• Conclude that there is not enough evidence to support the
alternative hypothesis. Fail to reject the null.

Concepts of Hypothesis Testing (1)…
There are two hypotheses. One is called the null hypothesis and the
other the alternative or research hypothesis. The usual
notation is:
H0: the ‘null’ hypothesis (pronounced H “nought”)
H1: the ‘alternative’ or ‘research’ hypothesis
The null hypothesis (H0) will always state that the parameter
equals the value specified in the alternative hypothesis (H1).
Concepts of Hypothesis Testing…
Consider mean demand for computers during assembly lead
time. Rather than estimate the mean demand, our operations
manager wants to know whether the mean is different from
350 units. In other words, someone is claiming that the mean
demand is 350 units and we want to check this claim out to see
if it appears reasonable. We can rephrase this request into a
test of the hypothesis:
H0: μ = 350
Thus, our research hypothesis becomes:
H1: μ ≠ 350
Recall that the standard deviation [σ] was assumed to be 75,
the sample size [n] was 25, and the sample mean [x̄] was
calculated to be 370.16.
Concepts of Hypothesis Testing…
For example, if we’re trying to decide whether the mean is
not equal to 350, a large value of x̄ (say, 600) would provide
enough evidence.
If x̄ is close to 350 (say, 355) we could not say that this
provides a great deal of evidence to infer that the population
mean is different from 350.

The two possible decisions that can be made:
• Conclude that there is enough evidence to support the alternative
hypothesis
(also stated as: reject the null hypothesis in favor of the alternative)
• Conclude that there is not enough evidence to support the
alternative hypothesis
(also stated as: failing to reject the null hypothesis in favor of the
alternative)
NOTE: we do not say that we accept the null hypothesis if a
statistician is around…

The testing procedure begins with the assumption that the
null hypothesis is true.
Thus, until we have further statistical evidence, we will
assume:
H0: μ = 350 (assumed to be TRUE)

The next step will be to determine the sampling distribution
of the sample mean x̄ assuming the true mean is 350.
x̄ is normal with mean μ = 350 and standard deviation
σ/SQRT(n) = 75/SQRT(25) = 15
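
As a quick check of that arithmetic, the sketch below builds the sampling distribution of the sample mean under H0 with Python's standard-library `statistics.NormalDist`. The helper name `standard_error` is introduced here for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def standard_error(sigma, n):
    """Standard deviation of the sample mean for samples of size n."""
    return sigma / sqrt(n)

se = standard_error(sigma=75, n=25)           # 75 / 5 = 15.0
sampling_dist = NormalDist(mu=350, sigma=se)  # distribution of the sample mean if H0 is true

# Share of sample means expected within one standard error of 350.
within_one_se = sampling_dist.cdf(350 + se) - sampling_dist.cdf(350 - se)
print(se, round(within_one_se, 4))
```

Every later calculation in the demand example (critical values, Z-score, p-value) is measured against this distribution.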
Hypothesis Testing
The general goal of a hypothesis test is to rule out chance
(sampling error) as a plausible explanation for the results
from a research study.
Hypothesis testing is a technique to help determine whether
a specific treatment has an effect on the individuals in a
population.

Hypothesis Testing
The hypothesis test is used to evaluate the results from a
research study in which
1. A sample is selected from the population.
2. The treatment is administered to the sample.
3. After treatment, the individuals in the sample are measured.

Hypothesis Testing (cont.)
If the individuals in the sample are noticeably different from
the individuals in the original population, we have evidence
that the treatment has an effect.
However, it is also possible that the difference between the
sample and the population is simply sampling error.

Hypothesis Testing (cont.)
The purpose of the hypothesis test is to decide between two
explanations:
1. The difference between the sample and the population can be
explained by sampling error (there does not appear to be a
treatment effect).
2. The difference between the sample and the population is too
large to be explained by sampling error (there does appear to be
a treatment effect).

The Null Hypothesis, the Alpha Level, the Critical Region, and the Test Statistic
The following four steps outline the process of
hypothesis testing and introduce some of the new
terminology:

Step 1
State the hypotheses and select an α level. The null
hypothesis, H0, always states that the treatment has no
effect (no change, no difference). According to the null
hypothesis, the population mean after treatment is the same
as it was before treatment. The α level establishes a
criterion, or "cut-off", for making a decision about the null
hypothesis. The alpha level also determines the risk of a
Type I error.

Step 2
Locate the critical region. The critical region consists
of outcomes that are very unlikely to occur if the null
hypothesis is true. That is, the critical region is defined by
sample means that are almost impossible to obtain if the
treatment has no effect. The phrase “almost impossible”
means that these samples have a probability (p) that is less
than the alpha level.

Step 3
Compute the test statistic. The test statistic (in this
chapter a z-score) forms a ratio comparing the obtained
difference between the sample mean and the hypothesized
population mean versus the amount of difference we would
expect without any treatment effect (the standard error).

Step 4
Make a decision. A large value for the test statistic shows
that the obtained mean difference is more than would be
expected if there is no treatment effect. If it is large
enough to be in the critical region, we conclude that the
difference is significant or that the treatment has a
significant effect. In this case we reject the null
hypothesis. If the mean difference is relatively small,
then the test statistic will have a low value. In this case,
we conclude that the evidence from the sample is not
sufficient, and the decision is to fail to reject the null
hypothesis.
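
The four steps can be sketched as one small function. This is a minimal illustration for a two-tailed z-test with known σ, not the textbook's own code. The function name `z_test_two_tailed` is hypothetical, and the numbers in the usage lines are the computer-demand values given earlier in these notes (x̄ = 370.16, μ = 350, σ = 75, n = 25).

```python
from math import sqrt
from statistics import NormalDist

def z_test_two_tailed(sample_mean, mu0, sigma, n, alpha=0.05):
    """Run the four steps for H0: mu = mu0 against H1: mu != mu0."""
    # Step 1: the hypotheses are fixed by mu0; alpha is the chosen cut-off.
    # Step 2: locate the critical region (|z| beyond z_crit).
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Step 3: compute the test statistic (obtained difference / standard error).
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    # Step 4: decide. Reject H0 only if z falls in the critical region.
    reject_h0 = abs(z) > z_crit
    return z, z_crit, reject_h0

z, z_crit, reject_h0 = z_test_two_tailed(370.16, 350, 75, 25)
decision = "reject H0" if reject_h0 else "fail to reject H0"
print(f"z = {z:.3f}, critical z = {z_crit:.2f}, decision: {decision}")
```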

Errors in Hypothesis Tests
Just because the sample mean (following treatment) is
different from the original population mean does not
necessarily indicate that the treatment has caused a change.
You should recall that there usually is some discrepancy
between a sample mean and the population mean simply as a
result of sampling error.

Errors in Hypothesis Tests (cont.)
Because the hypothesis test relies on sample data, and
because sample data are not completely reliable, there is
always the risk that misleading data will cause the
hypothesis test to reach a wrong conclusion.
Two types of error are possible.

Type I Errors
A Type I error occurs when the sample data appear to show a
treatment effect when, in fact, there is none.
In this case the researcher will reject the null hypothesis and
falsely conclude that the treatment has an effect.
Type I errors are caused by unusual, unrepresentative samples.
Just by chance the researcher selects an extreme sample with
the result that the sample falls in the critical region even though
the treatment has no effect.
The hypothesis test is structured so that Type I errors are very
unlikely; specifically, the probability of a Type I error is equal
to the alpha level.
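
A small simulation can make that last claim concrete. The sketch below is an assumption-laden illustration: it draws many samples from a normal population in which H0 is actually true (μ = 350, σ = 75, n = 25, borrowed from the demand example), runs a two-tailed z-test on each, and counts how often H0 is wrongly rejected. The function name, the seed and the number of trials are arbitrary choices.

```python
import random
from math import sqrt
from statistics import NormalDist, fmean

def type_one_error_rate(mu0=350, sigma=75, n=25, alpha=0.05, trials=10_000, seed=1):
    """Estimate how often a two-tailed z-test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)
    rejections = 0
    for _ in range(trials):
        # The population mean really is mu0, so every rejection is a Type I error.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        z = (fmean(sample) - mu0) / se
        if abs(z) > z_crit:
            rejections += 1
    return rejections / trials

rate = type_one_error_rate()
print(f"Estimated P(Type I error) = {rate:.3f}")
```

The estimated rate should land close to the chosen α of 0.05, with some run-to-run variation that depends on the seed and the number of trials.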

Type II Errors
A Type II error occurs when the sample does not
appear to have been affected by the treatment when, in
fact, the treatment does have an effect.
In this case, the researcher will fail to reject the null
hypothesis and falsely conclude that the treatment does
not have an effect.
Type II errors are commonly the result of a very small
treatment effect. Although the treatment does have an
effect, it is not large enough to show up in the research
study.

Directional Tests
When a research study predicts a specific direction for
the treatment effect (increase or decrease), it is possible
to incorporate the directional prediction into the
hypothesis test.
The result is called a directional test or a one-tailed
test. A directional test includes the directional
prediction in the statement of the hypotheses and in the
location of the critical region.

Directional Tests (cont.)
For example, if the original population has a mean of μ = 80
and the treatment is predicted to increase the scores, then the
null hypothesis would state that after treatment:
H0: μ ≤ 80 (there is no increase)
In this case, the entire critical region would be located in the
right-hand tail of the distribution because large values for M
(the sample mean) would demonstrate that there is an increase
and would lead us to reject the null hypothesis.

Measuring Effect Size
A hypothesis test evaluates the statistical significance of
the results from a research study.
That is, the test determines whether or not it is likely
that the obtained sample mean occurred without any
contribution from a treatment effect.
The hypothesis test is influenced not only by the size of
the treatment effect but also by the size of the sample.
Thus, even a very small effect can be significant if it is
observed in a very large sample.
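
The effect of sample size can be illustrated with a short sketch. The numbers are hypothetical: a sample mean of 352 against a hypothesized mean of 350 with σ = 75, so the difference is tiny relative to the spread. Only the sample size changes between runs. The function name `two_tailed_p_value` is introduced here for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_tailed_p_value(sample_mean, mu0, sigma, n):
    """Two-tailed p-value of a z-test with known sigma."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))

# Hypothetical data: the same 2-unit difference observed at three sample sizes.
p_values = {n: two_tailed_p_value(352, 350, 75, n) for n in (25, 400, 10_000)}
for n, p in p_values.items():
    verdict = "significant" if p < 0.05 else "not significant"
    print(f"n = {n:>6}: p-value = {p:.4f} ({verdict} at alpha = 0.05)")
```

The same small difference is nowhere near significant with 25 observations but becomes significant with 10,000, which is why a significant result alone says little about how large the effect is.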

Measuring Effect Size
Because a significant effect does not necessarily mean a
large effect, it is recommended that the hypothesis test be
accompanied by a measure of the effect size.
We use Cohen’s d as a standardized measure of effect size.
Much like a z-score, Cohen’s d measures the size of the
mean difference in terms of the standard deviation.
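
As a minimal sketch of that definition (mean difference divided by the standard deviation), the snippet below computes Cohen's d for the computer-demand figures given earlier in these notes. Applying d to that example is an illustration added here, and the function name `cohens_d` is hypothetical.

```python
def cohens_d(sample_mean, population_mean, sigma):
    """Mean difference expressed in standard-deviation units."""
    return (sample_mean - population_mean) / sigma

# Demand example: sample mean 370.16, hypothesized mean 350, sigma 75.
d = cohens_d(370.16, 350, 75)
print(round(d, 4))
```

Unlike the z-score, d divides by σ rather than by the standard error, so it does not grow as the sample size increases.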

Power of a Hypothesis Test
The power of a hypothesis test is defined as the probability
that the test will reject the null hypothesis when the
treatment does have an effect.
The power of a test depends on a variety of factors including
the size of the treatment effect and the size of the sample.
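
Both dependencies can be seen in a short calculation. The sketch below assumes a two-tailed z-test with known σ and uses hypothetical true means (360 and 380) against a hypothesized mean of 350 with σ = 75. None of these power figures come from the original slides, and `power_two_tailed` is a name made up for this sketch.

```python
from math import sqrt
from statistics import NormalDist

def power_two_tailed(mu0, mu_true, sigma, n, alpha=0.05):
    """Probability of rejecting H0: mu = mu0 when the true mean is mu_true."""
    se = sigma / sqrt(n)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    lower, upper = mu0 - z_crit * se, mu0 + z_crit * se
    sampling = NormalDist(mu_true, se)
    # Power is the chance that the sample mean falls outside (lower, upper).
    return sampling.cdf(lower) + (1 - sampling.cdf(upper))

# Hypothetical treatment effects (true means of 360 and 380) and sample sizes.
for mu_true in (360, 380):
    for n in (25, 100):
        power = power_two_tailed(350, mu_true, 75, n)
        print(f"true mean = {mu_true}, n = {n:>3}: power = {power:.3f}")
```

Power rises with a larger treatment effect and with a larger sample, and it always equals 1 minus the probability of a Type II error.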

Three ways to determine this: First way
1. Unstandardized test statistic: Is x̄ in the guts of the
sampling distribution? Depends on what you define as
the “guts” of the sampling distribution.
If we define the guts as the center 95% of the distribution
[this means α = 0.05], then the critical values that define
the guts will be 1.96 standard deviations of x̄ on
either side of the mean of the sampling distribution
[350], or
UCV = 350 + 1.96*15 = 350 + 29.4 = 379.4
LCV = 350 – 1.96*15 = 350 – 29.4 = 320.6
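
The two critical values can be reproduced in a few lines. This sketch uses the inverse normal CDF from Python's standard library instead of the rounded table value 1.96, so the results agree with the slide to one decimal place. The variable names are chosen here for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 350, 75, 25, 0.05
se = sigma / sqrt(n)                          # 15.0
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96

lcv = mu0 - z_crit * se   # lower critical value
ucv = mu0 + z_crit * se   # upper critical value

sample_mean = 370.16
in_guts = lcv < sample_mean < ucv  # True means we fail to reject H0
print(round(lcv, 1), round(ucv, 1), in_guts)
```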

1. Unstandardized Test Statistic Approach

Three ways to determine this: Second way
2. Standardized test statistic: Since we defined the “guts” of
the sampling distribution to be the center 95% [α = 0.05],
• If the Z-Score for the sample mean is greater than
1.96, we know that x̄ will be in the reject region on the right
side, or
• If the Z-Score for the sample mean is less than -1.96,
we know that x̄ will be in the reject region on the left side.
Z = (x̄ – μ)/(σ/SQRT(n)) = (370.16 – 350)/15 = 1.344
Is this Z-Score in the guts of the sampling distribution???

2. Standardized Test Statistic Approach

Three ways to determine this: Third way
3. The p-value approach (which is generally used with a computer and
statistical software): Increase the “Rejection Region” until it “captures”
the sample mean.
For this example, since x̄ is to the right of the mean, calculate
P(x̄ > 370.16) = P(Z > 1.344) = 0.0901
Since this is a two tailed test, you must double this area for the p-value.
p-value = 2*(0.0901) = 0.1802
Since we defined the guts as the center 95% [α = 0.05], the reject
region is the other 5%. Since our sample mean, x̄, is in the 18.02%
region, it cannot be in our 5% rejection region [α = 0.05].
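
Since the p-value approach is usually done in software, here is a minimal sketch using only Python's standard library. The tail area 0.0901 quoted above corresponds to a table lookup at Z = 1.34, so the software result, computed at Z = 1.344 without rounding, comes out slightly smaller (about 0.179 rather than 0.1802). The conclusion is the same either way.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, sample_mean, alpha = 350, 75, 25, 370.16, 0.05

z = (sample_mean - mu0) / (sigma / sqrt(n))  # 1.344
upper_tail = 1 - NormalDist().cdf(z)         # P(Z > 1.344)
p_value = 2 * upper_tail                     # doubled for a two-tailed test

decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, p-value = {p_value:.4f}, decision: {decision}")
```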

3. p-value approach

Statistical Conclusions:
Unstandardized Test Statistic:
Since LCV (320.6) < x̄ (370.16) < UCV (379.4), we fail to
reject the null hypothesis at a 5% level of significance.
Standardized Test Statistic:
Since -Zα/2 (-1.96) < Z (1.344) < Zα/2 (1.96), we fail to
reject the null hypothesis at a 5% level of significance.
P-value:
Since p-value (0.1802) > 0.05 [α], we fail to reject the
null hypothesis at a 5% level of significance.
Example 11.1…
A department store manager determines that a new billing
system will be cost-effective only if the mean monthly
account is more than $170.
A random sample of 400 monthly accounts is drawn, for
which the sample mean is $178. The accounts are
approximately normally distributed with a standard deviation
of $65.
Can we conclude that the new system will be cost-effective?

Example 11.1…
The system will be cost effective if the mean account
balance for all customers is greater than $170.
We express this belief as our research hypothesis, that is:
H1: μ > 170 (this is what we want to determine)
Thus, our null hypothesis becomes:
H0: μ = 170 (this specifies a single value for the
parameter of interest) – Actually H0: μ ≤ 170
Example 11.1…
What we want to show:
H1: μ > 170
H0: μ ≤ 170 (we’ll assume this is true)
Normally we put H0 first.
We know:
n = 400,
x̄ = 178, and
σ = 65
σ/SQRT(n) = 65/SQRT(400) = 3.25
α = 0.05

Example 11.1… Rejection Region…
The rejection region is a range of values such that if the test
statistic falls into that range, we decide to reject the null
hypothesis in favor of the alternative hypothesis.
UCV is the critical value of x̄ to reject H0.

Example 11.1…
At a 5% significance level (i.e. α = 0.05), we get [all α in one tail]
Zα = Z0.05 = 1.645
Therefore, UCV = 170 + 1.645*3.25 = 175.35
Since our sample mean (178) is greater than the critical value we
calculated (175.35), we reject the null hypothesis in favor of H1
OR
Z = (178 – 170)/3.25 = 2.46 (> 1.645) Reject null
OR
p-value = P(x̄ > 178) = P(Z > 2.46) = 0.0069 < 0.05 Reject null
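
All three routes to the decision in Example 11.1 can be reproduced with a short script. This is a sketch using Python's standard library. The variable names are illustrative, and the critical z comes from the inverse normal CDF rather than the rounded table value 1.645.

```python
from math import sqrt
from statistics import NormalDist

mu0, sample_mean, sigma, n, alpha = 170, 178, 65, 400, 0.05
se = sigma / sqrt(n)                       # 65 / 20 = 3.25

# Right-tail test: all of alpha sits in the upper tail.
z_crit = NormalDist().inv_cdf(1 - alpha)   # about 1.645
ucv = mu0 + z_crit * se                    # unstandardized critical value
z = (sample_mean - mu0) / se               # standardized test statistic
p_value = 1 - NormalDist().cdf(z)          # one-tailed p-value

reject_h0 = z > z_crit
print(f"UCV = {ucv:.2f}, z = {z:.2f}, p-value = {p_value:.4f}, reject H0: {reject_h0}")
```

The three comparisons (178 against UCV, z against 1.645, p-value against 0.05) are equivalent, so they always lead to the same decision.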

Example 11.1… The Big Picture…
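[Figure: sampling distribution of x̄ under H0: μ = 170, with the critical value 175.35 marking the start of the rejection region in the right tail. The sample mean x̄ = 178 falls in the rejection region, so we reject H0 in favor of H1: μ > 170.]
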
Interpreting the p-value…
The smaller the p-value, the more statistical evidence exists
to support the alternative hypothesis.
•If the p-value is less than 1%, there is overwhelming
evidence that supports the alternative hypothesis.
•If the p-value is between 1% and 5%, there is strong
evidence that supports the alternative hypothesis.
•If the p-value is between 5% and 10%, there is weak
evidence that supports the alternative hypothesis.
•If the p-value exceeds 10%, there is no evidence that
supports the alternative hypothesis.
We observe a p-value of .0069, hence there is
overwhelming evidence to support H1: μ > 170.
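
The bands above translate directly into a small lookup function. This is only a sketch of the rule of thumb stated in these notes. The function name `describe_evidence` is hypothetical, and how to treat a p-value that lands exactly on a boundary (0.01, 0.05 or 0.10) is an assumption, since the notes do not say.

```python
def describe_evidence(p_value):
    """Map a p-value to the evidence labels used in these notes."""
    if p_value < 0.01:
        return "overwhelming evidence (highly significant)"
    if p_value < 0.05:
        return "strong evidence (significant)"
    if p_value < 0.10:
        return "weak evidence (not significant)"
    return "no evidence (not significant)"

# The p-values from Example 11.1 and from the computer-demand example.
for p in (0.0069, 0.1802):
    print(p, "->", describe_evidence(p))
```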

Interpreting the p-value…
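p-value range     Evidence for H1          Significance
0 to .01          Overwhelming Evidence    Highly Significant
.01 to .05        Strong Evidence          Significant
.05 to .10        Weak Evidence            Not Significant
above .10         No Evidence              Not Significant
Our p-value of .0069 falls in the first range.
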
Conclusions of a Test of Hypothesis…
If we reject the null hypothesis, we conclude that there is
enough evidence to infer that the alternative hypothesis is
true.
If we fail to reject the null hypothesis, we conclude that there
is not enough statistical evidence to infer that the alternative
hypothesis is true. This does not mean that we have proven
that the null hypothesis is true!
Keep in mind that committing a Type I error OR a Type II
error can be VERY bad depending on the problem.

One tail test with rejection region on right
The last example was a one tail test, because the rejection
region is located in only one tail of the sampling distribution.
More correctly, this was an example of a right tail test.

H1: μ > 170
H0: μ ≤ 170

One tail test with rejection region on left
The rejection region will be in the left tail.

Two tail test with rejection region in both tails
The rejection region is split equally between the two tails.

Example 11.2… Students work
AT&T argues that its rates are such that customers won’t
see a difference in their phone bills between AT&T and its
competitors. They calculate the mean and standard deviation
for all their customers at $17.09 and $3.87 (respectively).
Note: We don’t know the true value for σ, so we estimate σ from
the data [σ ~ s = 3.87] – large sample so don’t worry.
They then sample 100 customers at random and recalculate a
monthly phone bill based on competitor’s rates.
What we want to show is
H1: μ ≠ 17.09. We do this by assuming that:
H0: μ = 17.09

Example 11.2…
The rejection region is set up so we can reject the null
hypothesis when the test statistic is large or when it is small.
[Figure: sampling distribution with a rejection region in each tail, labeled “stat is small” on the left and “stat is large” on the right.]
That is, we set up a two-tail rejection region. The total area
in the rejection region must sum to α, so we divide α by 2.

Example 11.2…
At a 5% significance level (i.e. α = .05), we have
α/2 = .025. Thus, z.025 = 1.96 and our rejection region is:
z < –1.96 -or- z > 1.96
[Figure: standard normal curve with the z axis marked at -z.025, 0, and +z.025.]

Example 11.2…
From the data, we calculate x̄ = 17.55
Using our standardized test statistic:
z = (x̄ – μ)/(σ/SQRT(n))
We find that:
z = (17.55 – 17.09)/(3.87/SQRT(100)) = 1.19
Since z = 1.19 is not greater than 1.96, nor less than –1.96
we cannot reject the null hypothesis in favor of H1. That is
“there is insufficient evidence to infer that there is a
difference between the bills of AT&T and the competitor.”
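
Example 11.2 can be checked the same way. The sketch below is an illustration using Python's standard library. It treats s = 3.87 as if it were σ, as the note in the example does, and the two-tailed p-value it prints is an addition that the slides do not report.

```python
from math import sqrt
from statistics import NormalDist

mu0, sample_mean, s, n, alpha = 17.09, 17.55, 3.87, 100, 0.05

se = s / sqrt(n)                              # 3.87 / 10 = 0.387
z = (sample_mean - mu0) / se                  # standardized test statistic
z_crit = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96, alpha split across two tails
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-tailed p-value

reject_h0 = abs(z) > z_crit
print(f"z = {z:.2f}, critical z = {z_crit:.2f}, p-value = {p_value:.3f}, reject H0: {reject_h0}")
```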
Summary of One- and Two-Tail Tests…
One-Tail Test (left tail) | Two-Tail Test | One-Tail Test (right tail)
