Chap 9


QUANTITATIVE METHODS

(QM)

Vijay Aggarwal

SESSION # 9
Statistical Inference:
INTERVAL ESTIMATION &
TESTING OF HYPOTHESES
QM – Inferential Topics
Topic                                                   Readings
Estimation & Hypotheses Testing                         Ch 8
Hypothesis Testing for Single Populations:
  Population Means, Proportions and Variances           Ch 9
Statistical Inference for Two Populations               Ch 10
Analysis of Variance (ANOVA)                            Ch 11
Simple Regression and Correlation                       Ch 12
Multiple Regression                                     Ch 13
Analysis of Categorical Data & Non-parametric           Ch 16 & 17


TAXONOMY (Tree Diagram):
INFERENTIAL STATISTICS
Outline for the Session
• Understand the logic of statistical hypothesis testing:
– Know how to establish null and alternative hypotheses.
– Understand Type I and Type II errors.
• Know how to implement the (HTAB) system to test hypotheses
(that is, formulating Hypothesis, Testing, taking Action,
determining Business implication)
• Understand how to test hypotheses about a single population
parameter:
– Mean
• when σ is known (using z-statistic)
• when σ is unknown (using t-statistic)
– Proportion (using z-statistic)
– Variance (using χ²-statistic)
Types of Hypotheses
1. Research Hypothesis - a statement of what the
researcher believes will be the outcome of an
experiment or a study.
2. Statistical Hypotheses - a more formal structure
derived from the scientific method.
– Composed of two parts
• Null hypothesis (H0) – the assumed value of the parameter
if there is no effect/impact. We retain it as plausible
unless, assuming it is true, the chance of getting a sample
statistic (mean/proportion/variance) as extreme as or more
extreme than the one observed is small (small p-value).
• Alternative hypothesis (Ha) – a statement of whether the
true population parameter is higher, lower, or not equal to
that hypothesized in the null hypothesis.
Types of Hypotheses
3. Substantive Hypotheses - a statistically
significant difference does not imply or mean a
material, substantive difference.
– If the null hypothesis is rejected and the alternative
hypothesis is accepted, then one can say that a
statistically significant result has been obtained
– With “significant” results, you reject the null
hypothesis
Using the HTAB System

• H – Hypotheses
– Establish the hypotheses
• T – Test
– Conduct the test
• A – Action
– Take statistical action
• B – Business Implications
– Determine the business implications
CPA Salary Example
• Example: A survey of CPAs in the U.S., done 15
years ago, found that their average salary was
$74,914. An accounting researcher would like
to test whether this average has changed over
the years. A random sample of 112 CPAs
produced a mean salary of $78,695. Assume
that the population standard deviation of
salaries is σ = $14,530 (note: this value is
typically not known, but we will assume it for
mathematical simplicity. Later, we will remove
this assumption).
Step 1: Hypotheses

• Set up the null and alternative hypotheses


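H0: μ = $74,914   (the null always contains "=")
Ha: μ ≠ $74,914   (the alternative contains >, <, or ≠; here ≠, because the researcher asks whether the average has changed)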
Null and Alternative Hypotheses
• The null and alternative hypotheses are mutually
exclusive.
– Only one of them can be selected.
• The null hypothesis is assumed to be true. It is
compared to the observed data via either a
critical value (critical value method) or by
calculating a p-value (p-value method)
• The burden of proof falls on the alternative
hypothesis. Thus, you either reject the null in
favor of the alternative or you fail to reject the
null. Failing to reject does not imply that the
null is true.
Examples of One- and Two-tailed Tests
• One-tailed Tests
– Means
H0: μ = 40
Ha: μ < 40 (lower tail) or Ha: μ > 40 (upper tail)
– Proportions
H0: p = 0.18
Ha: p > 0.18 (upper tail) or Ha: p < 0.18 (lower tail)
• Two-tailed Test
H0: μ = 12
Ha: μ ≠ 12
Step 2: Determine Appropriate Test
• The z-statistic can be used to test μ when the
following three conditions are met:
– The data are a random sample from the population
– The population standard deviation (σ) is known
– At least one of the following conditions is met:
• The sample size (n) is at least 30 OR
• the underlying distribution is normal

• Thus,
$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$
Step 3: Set significance level (α)
• Significance level (α) or Type I error rate
– A Type I error is committed by rejecting a true null hypothesis
– If the null hypothesis is true, any sample result that falls in a
rejection region leads to a Type I error.
– The probability of committing a Type I error is
referred to as α, the level of significance.
• The significance level is usually set at 0.05.
Other common values are 0.1, 0.01, or 0.001.
Type II Errors
• Type II Error
– Committed when a researcher fails to reject a false null
hypothesis
– The probability of committing a Type II error is referred
to as β. Some refer to power, or 1 − β (the chance of
rejecting the null when it’s false), instead.
• In practice, we don’t know whether the null is true.
• For a fixed sample size, Type I and Type II error rates are
inversely related: if you reduce one, you increase the other.
• One way of reducing both Type I and Type II error
rates is to increase the sample size, but that
requires more time and money.
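The trade-off between β and sample size can be sketched numerically. The snippet below is a minimal illustration, not part of the original slides: it reuses the CPA figures (μ0 = $74,914, σ = $14,530) and assumes a hypothetical true mean of $78,000, chosen only for demonstration. The function name `type_ii_error` is likewise an assumption.

```python
from math import sqrt
from statistics import NormalDist

def type_ii_error(mu0, mu_true, sigma, n, alpha=0.05):
    """Beta for a two-tailed z-test of H0: mu = mu0 when the true mean is mu_true."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / sqrt(n)                 # standard error of the sample mean
    lower = mu0 - z_crit * se            # lower critical sample mean
    upper = mu0 + z_crit * se            # upper critical sample mean
    # Sampling distribution of x-bar if the true mean is mu_true
    sampling = NormalDist(mu_true, se)
    # Beta = P(x-bar falls in the nonrejection region | true mean = mu_true)
    return sampling.cdf(upper) - sampling.cdf(lower)

mu0, sigma = 74_914, 14_530   # values from the CPA example
mu_true = 78_000              # hypothetical true mean (assumption for illustration)

for n in (112, 400):
    beta = type_ii_error(mu0, mu_true, sigma, n)
    print(f"n={n}: beta={beta:.3f}, power={1 - beta:.3f}")
```

Under these assumptions, raising n lowers β at the same α, while lowering α at a fixed n raises β.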
Decision Table
for Hypothesis Testing
                           State of Null (Truth)
Decision              Null True            Null False
Fail to reject null   Correct decision     Type II error (β)
Reject null           Type I error (α)     Correct decision
Step 4: Decision Rule
• A decision rule has to be made about when the
difference between the sample and hypothesized
population mean (under the null hypothesis) is small
or large.
• The rejection region is the area on the curve where
the null hypothesis is rejected. Here the value of the
sample mean is too far from the hypothesized
population mean to conclude that they are the same.
• The nonrejection region is the area where the null
hypothesis is not rejected. Here the sample mean is
close enough to the hypothesized population mean to
conclude that the null hypothesis could be true.
Rejection and Nonrejection Regions

[Figure: sampling distribution of the sample mean, centered at the hypothesized mean under H0 (μ = $74,914). The nonrejection region lies between the two critical values; a rejection region lies in each tail.]
Decision rule – CPA Example

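[Figure: the same distribution on the z scale, centered at z = 0 (μ = $74,914). Critical values: z(α/2) = z(0.025) = −1.96 and z(1 − α/2) = z(0.975) = +1.96. The rejection regions lie beyond the critical values; the nonrejection region lies between them.]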
Decision Rule – CPA Example

• Thus, we will reject H0 if Z > 1.96 or Z < -1.96


– or reject H0 if |Z| > 1.96
Critical Value Based on Sample Mean
• Alternatively, you could calculate a critical value
based not on Z, but on x-bar.

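$$z_c = \frac{\bar{x}_c - \mu}{\sigma/\sqrt{n}} \quad\text{so}\quad \pm 1.96 = \frac{\bar{x}_c - 74{,}914}{14{,}530/\sqrt{112}}$$

or
$$\bar{x}_c = 74{,}914 \pm 1.96 \cdot \frac{14{,}530}{\sqrt{112}} = 74{,}914 \pm 2{,}691$$

lower critical x-bar = $72,223 and upper critical x-bar = $77,605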
• Thus, we would reject the null if the sample mean
is above $77,605 or below $72,223.
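As a quick check of the arithmetic, the critical values on both the z scale and the x-bar scale can be computed with Python's standard library. This is a minimal sketch using the CPA figures from the slides; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu0, sigma, n, alpha = 74_914, 14_530, 112, 0.05   # CPA example values

# Two-tailed critical z: the point with alpha/2 in the upper tail
z_crit = NormalDist().inv_cdf(1 - alpha / 2)

# Translate the critical z into critical sample means
margin = z_crit * sigma / sqrt(n)
lower_xc = mu0 - margin
upper_xc = mu0 + margin

print(f"z_c = ±{z_crit:.2f}")
print(f"reject H0 if x-bar < {lower_xc:,.0f} or x-bar > {upper_xc:,.0f}")
```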
One-tailed Tests
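Depending on the problem, one-tailed tests are sometimes appropriate.

H0: μ = 40 vs. Ha: μ < 40          H0: μ = 40 vs. Ha: μ > 40

[Figure: two sampling distributions centered at μ = 40 oz. For Ha: μ < 40 the rejection region is the lower tail, below a single critical value; for Ha: μ > 40 it is the upper tail, above a single critical value. The rest of each curve is the nonrejection region.]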


Step 5 (Gather Data) and
Step 6 (Compute Test Statistic)
• Step 5: Gather the data
– Suppose that all 112 CPAs responded to the survey.
– The following summary statistics are calculated:
x-bar = $78,695, n = 112, σ = $14,530 (given)
• Step 6: Compute the test statistic.

$$z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{78{,}695 - 74{,}914}{14{,}530/\sqrt{112}} = 2.75$$
Step 7: Statistical Action (Decision)
• The decision rule was:
If |z| > zc = 1.96, reject H0.
If |z| ≤ zc = 1.96, do not reject H0.

• We calculated z to be 2.75 from our data.

|z| = 2.75 > zc = 1.96, so you reject H0.
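The test statistic and the resulting decision for the CPA example can be reproduced in a few lines. This is a sketch only; the helper name `z_statistic` is an assumption, and the critical value 1.96 is taken from the decision rule stated in the slides.

```python
from math import sqrt

def z_statistic(x_bar, mu0, sigma, n):
    """z = (x-bar - mu0) / (sigma / sqrt(n)), with known population sigma."""
    return (x_bar - mu0) / (sigma / sqrt(n))

z = z_statistic(x_bar=78_695, mu0=74_914, sigma=14_530, n=112)
z_crit = 1.96   # two-tailed critical value at alpha = 0.05 (from the slides)

# Two-tailed rule: reject when |z| exceeds the critical value
decision = "reject H0" if abs(z) > z_crit else "do not reject H0"
print(f"z = {z:.2f} -> {decision}")
```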


Step 8: Business Decision

• Statistically, the researcher has enough evidence
to reject the figure of $74,914 as the true
average salary for CPAs. The evidence suggests
that the average has increased over the
15-year period.
Alternative Method: the p-value
• p-value – another way to reach statistical conclusion
in hypothesis testing
– If the null hypothesis is true, the p value is the
probability of getting a sample mean as extreme or more
extreme than what you observed.
– If the sample mean is in the rejection region, the p-value
will be small. These two methods are always consistent.
• p-value < α ⇒ reject H0; p-value ≥ α ⇒ do not reject H0
• For a two-tailed test, α/2 is used in each rejection region
– The one-tail p-value is then compared to α/2 instead of α to
determine statistical significance.
– Some statisticians instead double the one-tail p-value for a
two-sided test and compare it to α.
p-value for CPA Example

• Recall in the CPA example, x-bar was $78,695


• p-value = P(x-bar ≥ $78,695 | H0 true):

$$P(\bar{x} \ge 78{,}695 \mid \mu = 74{,}914) = P\left(Z \ge \frac{78{,}695 - 74{,}914}{14{,}530/\sqrt{112}}\right) = P(Z \ge 2.75) = 0.003$$

In Excel: =1-NORMSDIST(2.75)

• Since 0.003 < 0.025 (α/2 for this two-tailed test), we reject the null
hypothesis
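The same tail probability can be obtained outside Excel. The sketch below uses `statistics.NormalDist` from Python's standard library and shows both conventions for a two-tailed test (comparing the one-tail p-value with α/2, or the doubled p-value with α). The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# CPA example: z computed from the sample mean under H0
z = (78_695 - 74_914) / (14_530 / sqrt(112))

p_one_tail = 1 - NormalDist().cdf(z)   # P(Z >= z), the upper-tail area
p_two_tail = 2 * p_one_tail            # doubled for a two-sided test
alpha = 0.05

# Both conventions lead to the same decision
reject_by_half_alpha = p_one_tail < alpha / 2
reject_by_doubling = p_two_tail < alpha
print(f"one-tail p = {p_one_tail:.3f}, two-tail p = {p_two_tail:.3f}")
print(reject_by_half_alpha, reject_by_doubling)
```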
Review of 8 Steps of Hypothesis Testing

• 1 – Establish the null and alternative hypotheses (H)
• 2 – Determine the appropriate statistical test (T)
• 3 – Set α, the type I error rate / significance level (T)
• 4 – Establish the decision rule (T)
• 5 – Gather sample data (T)
• 6 – Analyze the data (T)
• 7 – Reach a statistical conclusion (A)
• 8 – Make a business decision (B)
Hypothesis Test of μ with unknown σ
(from p.309)
The U.S. Farmers’ Production Company (USFPC)
builds large harvesters. For a harvester to be
properly balanced when operating, a 25-pound
plate is installed on its side. The machine that
produces these plates is set to yield plates that
average 25 pounds. The distribution of plates
produced from the machine is normal. However,
the shop supervisor is worried that the machine is
out of adjustment and is producing plates that do
not average 25 pounds. To test this, he randomly
selects 20 of the plates from the day before and
weighs them.
Hypothesis Test of μ with unknown σ
• 1 – Establish the null and alternative hypotheses
– H0: μ = 25 pounds (where μ = mean weight of all plates)
– Ha: μ ≠ 25
• 2 – Determine the appropriate statistical test
– Recall from chapter 8, the conditions for the t-distribution:
• The sample was randomly selected from the population (as
stated in the problem)
• The population standard deviation (σ) is unknown
• At least one of these conditions is met:
– The sample size (n) is at least 30 OR
– the underlying distribution is normal
– These conditions are met, so the test statistic is
$$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}}$$
– The degrees of freedom are n-1 = 20-1 = 19 in this example
Hypothesis Test of μ with unknown σ
• 3 – Set α, the type I error rate / significance level
– Set α = 0.05 (the common default)
• 4 – Establish the decision rule
– Using the critical value method, we will reject the null if
$$|t_{19}| > t_{\alpha/2,\,19} = t_{0.025,\,19} = 2.093$$

that is, tc = ±2.093
• 5 – Gather sample data
– From the sample data, x-bar = 25.51 and s = 2.1933
• 6 – Analyze the data

$$t_{n-1} = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{25.51 - 25}{2.1933/\sqrt{20}} = 1.04$$
Hypothesis Test of μ with unknown σ

• 7 – Reach a statistical conclusion
– Since |t| = 1.04 < tc = 2.093, do not reject H0
• 8 – Make a business decision
– There is not enough evidence to show that the
plates average something other than 25 pounds. (Note: Is this
because the true population mean is close to 25
pounds, or is there a large chance that we have
committed a Type II error? Answering this would entail
calculating the Type II error rate under various
alternative values of the mean.)
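The plate example can be checked with a short calculation. This is a minimal sketch from the summary statistics given in the slides. The critical value 2.093 is taken from the slides rather than computed, because Python's standard library has no t-distribution quantile function.

```python
from math import sqrt

x_bar, mu0, s, n = 25.51, 25, 2.1933, 20   # summary statistics from the slides

t = (x_bar - mu0) / (s / sqrt(n))   # t-statistic with the sample std. dev.
df = n - 1                          # degrees of freedom
t_crit = 2.093                      # t(0.025, 19), as quoted in the slides

# Two-tailed rule: reject only when |t| exceeds the critical value
decision = "reject H0" if abs(t) > t_crit else "do not reject H0"
print(f"t({df}) = {t:.2f} -> {decision}")
```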
Hypothesis Test of p

(from p.316)
Suppose a company held 26% of market share for
several years. Due to a massive marketing
effort and improved product quality, company
officials believe that the market share has
increased, and they want to prove it
statistically. In a random sample of 140 users,
48 used their product. Does this present
evidence that their market share has
increased? Test it at the 5% level of
significance.
Hypothesis Test of p
• 1 – Establish the null and alternative hypotheses
– H0: p = 0.26 vs. Ha: p > 0.26
• 2 – Determine the appropriate statistical test
– Z-test for proportions:
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}}, \quad q = 1 - p$$
– Appropriate if the following two conditions are met:
• The sample was randomly selected from the population
• np ≥ 5 and nq ≥ 5. For our data, 140(0.26) = 36.4 > 5
and 140(0.74) = 103.6 > 5, so this condition is met.
• 3 – Set α, the type I error rate / significance level
– Choose the common value of α = 0.05
Hypothesis Test of p
• 4 – Establish the decision rule
– Critical value method: Reject H0 if z > zc = 1.645
– P-value method: Reject H0 if p-value < α = 0.05
• 5 – Gather sample data
– p-hat = 48/140 = 0.343
• 6 – Analyze the data
$$z = \frac{\hat{p} - p}{\sqrt{pq/n}} = \frac{0.343 - 0.26}{\sqrt{(0.26)(1 - 0.26)/140}} = \frac{0.083}{0.037} = 2.24$$
Hypothesis Test of p

• 7 – Reach a statistical conclusion


– Since z = 2.24 > zc = 1.645, reject the null
hypothesis
• 8 – Make a business decision
– There is evidence that the market share is now
above 26%.
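Both decision methods for the market-share example can be sketched together. The snippet is illustrative only and uses the unrounded p-hat = 48/140, so intermediate values may differ slightly from the slide's hand calculation, which rounds p-hat to 0.343; z still comes out at roughly 2.24.

```python
from math import sqrt
from statistics import NormalDist

x, n, p0, alpha = 48, 140, 0.26, 0.05   # market-share example values

# Check the sample-size condition: n*p >= 5 and n*q >= 5
condition_ok = n * p0 >= 5 and n * (1 - p0) >= 5

p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# One-tailed (upper) test: all of alpha sits in the upper tail
z_crit = NormalDist().inv_cdf(1 - alpha)
p_value = 1 - NormalDist().cdf(z)

print(f"z = {z:.2f}, z_c = {z_crit:.3f}, p-value = {p_value:.4f}")
print("reject H0" if z > z_crit and p_value < alpha else "do not reject H0")
```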
Hypothesis Test of σ²
(Demonstration Problem 9.4)
A small business has 37 employees. Because of the uncertain
demand for its product, the company usually pays overtime
on any given week. The company assumed that about 50 total
hours of overtime per week is required and that the variance
on this figure is about 25. Company officials want to know
whether the variance of overtime hours has changed. The
data below are a random sample of 16 weeks of overtime in
hours per week. Assume hours of overtime are normally
distributed. Use these data to test the null hypothesis that
the variance of overtime data is 25.
57 56 52 44
46 53 44 44
48 51 55 48
63 53 51 50
Hypothesis Test of σ²

• Step 1: Hypotheses
H0: σ² = 25
Ha: σ² ≠ 25

• Step 2: For a normally distributed population, the statistic
(n − 1)s²/σ² follows a chi-square distribution with n − 1
degrees of freedom. The test statistic is:
$$\chi^2_{n-1} = \frac{(n-1)s^2}{\sigma^2}$$
Hypothesis Test of σ²
• Step 3: Choose α = 0.10 [so α/2 = 0.05].
• Step 4: The degrees of freedom are 16 – 1 = 15.
The lower and upper critical chi-square values
are χ²(1 – 0.05, 15) = χ²(0.95, 15) = 7.3 and χ²(0.05, 15) =
25.0.
• Step 5: The data are listed in the text.
• Step 6: The sample variance is s² = 28.1.
The observed chi-square value is calculated as
χ² = (n – 1)s² / σ² = (15)(28.1) / 25 = 16.9.
Hypothesis Test of σ²
• Step 7: The observed chi-square value is in the
nonrejection region because χ²(0.95, 15) = 7.3
< χ²observed = 16.9 < χ²(0.05, 15) = 25.0.
• Step 8: This result indicates to the company
managers that the variance of weekly overtime
hours is about what they expected.
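The overtime example can be recomputed directly from the 16 weekly observations. This is a minimal sketch; the critical values 7.3 and 25.0 are copied from the slides, not computed, because Python's standard library has no chi-square quantile function. The unrounded results differ slightly from the slide's rounded figures.

```python
from statistics import variance

# Weekly overtime hours from the slides (16 weeks)
hours = [57, 56, 52, 44, 46, 53, 44, 44,
         48, 51, 55, 48, 63, 53, 51, 50]

sigma0_sq = 25                   # hypothesized variance under H0
n = len(hours)
s_sq = variance(hours)           # sample variance (divides by n - 1)
chi_sq = (n - 1) * s_sq / sigma0_sq

lower_crit, upper_crit = 7.3, 25.0   # chi-square(0.95, 15) and (0.05, 15), from the slides

# Two-tailed rule: reject when the statistic falls outside the critical values
in_nonrejection = lower_crit < chi_sq < upper_crit
decision = "do not reject H0" if in_nonrejection else "reject H0"
print(f"s^2 = {s_sq:.2f}, chi-square = {chi_sq:.2f} -> {decision}")
```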
