Week 9-1 - H0 and H1 (Updated)


Hypothesis testing

1. Introducing H0 and H1
Statistical model X ~ N(?, ?), from which we draw a random sample

1. Confidence interval
Use the random sample to estimate the “?” by an interval [ , ]
E.g., using a random sample and a 90% CI, we estimate that
the average # of steps that an NTU student takes per day is
between 4767.383 and 5232.617 steps, i.e., in [4767.383, 5232.617]
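A minimal sketch of how such an interval could be computed. The slides do not give the inputs, so the sample mean (5000), the known population standard deviation (1000) and the sample size (50) below are assumptions, chosen because they reproduce the interval quoted above under a normal-based (z) interval.

```python
from math import sqrt
from statistics import NormalDist

# Assumed inputs (hypothetical, not stated in the slides).
sample_mean = 5000.0   # average steps per day in the sample
sigma = 1000.0         # assumed known population standard deviation
n = 50                 # assumed sample size
confidence = 0.90

# Two-sided critical value: 5% in each tail for a 90% interval.
z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

# Margin of error = z * (standard error of the sample mean).
margin = z * sigma / sqrt(n)
ci_low, ci_high = sample_mean - margin, sample_mean + margin

print(f"{confidence:.0%} CI: [{ci_low:.3f}, {ci_high:.3f}]")
```

With a different sample mean, standard deviation or sample size, the interval would shift or change width accordingly.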
Statistical model X ~ N(?, ?), from which we draw a random sample

2. Hypothesis testing (HT)


Use the random sample to test a statement (H0) about the value of “?”
Let me test whether “the average # of steps that an NTU student takes per day is equal
to 5000” is plausible or not
Mathematically, we want to verify “H0: μ = 5000,” using a sample as evidence
HT describes a method for converting a random sample into a yes-no conclusion
about the statement
Boston marathon

The average finishing time is 3.97 hours, based on real data from
the 2017 race (data file on the course website)
Understand H0 and H1
• The “null” hypothesis (H0) can be the status quo, or our current knowledge of the
parameter (i.e., μ = 3.97)
• Our target parameter is the population mean of finishing time (μ), among all
runners of a marathon race like this one
“I want to test whether the average finishing time has changed since 2017 or not”
H0: μ = 3.97 hr.
“Before putting up the advertisement, sales per day are $1,000 (H0). I want to test
whether the sales have changed in the presence of the advertisement”
H0: μ = $1000 / day
• H1 is the alternative hypothesis that encompasses all the other possible values not
in H0 (i.e., “all else”)
H1 is auto-generated based on H0
Two types of H0
Depending on the purpose of your study, your H0 may take the form of
an equality or an inequality

H0: Average finishing time has not changed since 2017 (μ = 3.97 hr)
H1: Average finishing time has changed since 2017 (μ ≠ 3.97 hr)

H0: Average finishing time has not improved since 2017 (μ ≥ 3.97 hr)
H1: Average finishing time has improved since 2017 (μ < 3.97 hr)
Hypothesis testing (HT): key concept 1
H0: μ = 3.97 (hours)
H1: μ ≠ 3.97
HT is a rule for deciding whether to support H0 or not, based on a sample (from
sample to conclusion)
The objective is to use the sample to choose between these two opposing
conclusions:
a) “I do not find sufficient evidence against H0”
b) “I find sufficient evidence against H0”
Hypothesis testing (HT): key concept 2
H0: μ = 3.97    H1: μ ≠ 3.97

A: Reject H0
B: Do not reject H0

To decide a) or b), I collected a random sample of 50 runners of the
2019 marathon (n = 50). The sample mean is x̄ = 4.0 hours.
Since 4.0 is different from 3.97, is it right to reject H0? If not, why not?
Let’s analyze this.
When H0 is true, i.e., μ = 3.97 (which we don’t know is true or not):
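My evidence: the sample mean of 50 runners may take this form
(by CLT):

X̄ ~ N(3.97, σ²/50), approximately (given μ = 3.97, by CLT),
where σ is the population standard deviation of finishing times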

The sample mean could deviate from 3.97 even when H0 is true: this
distribution, or uncertainty, is due to “sampling” (instead of a census)
So… it is not wise to reject H0 when x̄ differs only slightly from 3.97 (hr)
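The point above can be illustrated with a small simulation: draw many samples of 50 finishing times from a population in which H0 is true, and look at how much the sample mean moves around. This is only a sketch. The population standard deviation (0.8 hours) is a hypothetical value not given in the slides, and finishing times are drawn from a normal distribution purely for simplicity.

```python
import random
from math import sqrt
from statistics import fmean, stdev

random.seed(0)   # fixed seed so the run is repeatable

MU_0 = 3.97      # population mean under H0 (from the slides)
SIGMA = 0.8      # hypothetical population standard deviation, in hours
N = 50           # sample size (from the slides)
REPS = 5000      # number of simulated samples

def sample_mean_under_h0():
    """Mean finishing time of one random sample of N runners when H0 is true."""
    return sum(random.gauss(MU_0, SIGMA) for _ in range(N)) / N

sample_means = [sample_mean_under_h0() for _ in range(REPS)]

avg_of_means = fmean(sample_means)   # should sit close to MU_0
spread = stdev(sample_means)         # should sit close to SIGMA / sqrt(N)

print(f"average of sample means: {avg_of_means:.3f}")
print(f"spread of sample means:  {spread:.3f} (theory: {SIGMA / sqrt(N):.3f})")
print(f"smallest / largest mean: {min(sample_means):.3f} / {max(sample_means):.3f}")
```

Even though every simulated sample comes from a population with mean exactly 3.97, individual sample means differ from 3.97, which is why a sample mean of 4.0 alone is weak evidence against H0.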
Hypothesis testing (HT): key concept 2
H0: μ = 3.97
H1: μ ≠ 3.97

X̄ ~ N(3.97, σ²/50), approximately (given μ = 3.97, by CLT)

When μ = 3.97, it is “quite likely” to get x̄ near 3.97, say 4.0. But it’s quite
unlikely to get x̄ = 5.0 or 3.0
We will set a “buffer zone” around 3.97, within which we consider the sample mean normal
(pun unintended)
Concept 2: We will allow some “buffer” in sample information,
before rejecting H0
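One way the “buffer zone” idea could be turned into a decision rule is sketched below. The slides do not say how wide the buffer should be, so the 95% coverage and the population standard deviation of 0.8 hours are both hypothetical choices made only for illustration; the function name `decide` is likewise made up for this example.

```python
from math import sqrt
from statistics import NormalDist

MU_0 = 3.97       # value of the mean under H0 (from the slides)
SIGMA = 0.8       # hypothetical population standard deviation, in hours
N = 50            # sample size (from the slides)
COVERAGE = 0.95   # hypothetical: share of sample means the buffer covers under H0

# Standard error of the sample mean under H0 (by CLT).
se = SIGMA / sqrt(N)

# Buffer zone: MU_0 plus or minus z standard errors.
z = NormalDist().inv_cdf(1 - (1 - COVERAGE) / 2)
buffer_low, buffer_high = MU_0 - z * se, MU_0 + z * se

def decide(x_bar):
    """Return the yes-no conclusion for an observed sample mean."""
    if buffer_low <= x_bar <= buffer_high:
        return "do not reject H0"
    return "reject H0"

print(f"buffer zone: [{buffer_low:.3f}, {buffer_high:.3f}]")
for x_bar in (4.0, 5.0, 3.0):
    print(f"sample mean {x_bar}: {decide(x_bar)}")
```

Under these assumed numbers, a sample mean of 4.0 falls inside the buffer, while 5.0 and 3.0 fall outside it, matching the “quite likely” versus “quite unlikely” contrast above.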
Summary of hypothesis testing
• Analyse the problem and formulate H0 and H1
- H0: Null hypothesis (no change; no effect). H1: Alternative hypothesis
(there is a change, or an effect on status quo)
• Key concept #1:
We use the sample, either to reject H0, or NOT reject H0
• Key concept #2:
The sampling process induces uncertainty, so, rationally, we create a
“buffer” in making the rejection decision
