Chapter 5


Chapter 5: Statistical Inference
Contents

• One-Sample Hypothesis Tests (mean and proportion)
  • Hypothesis Development
  • Type I error and Type II error
  • Z-statistic or t-statistic
  • One-tailed and two-tailed test (critical value and p-value methods)
• Two-Sample Hypothesis Tests
  • Hypothesis Development
  • Z-statistic, t-statistic
  • Two-Sample test for means
    • σ Known
    • σ Unknown, assumed equal variances
    • σ Unknown, assumed unequal variances
  • Paired samples
• ANOVA
  • Hypothesis Development
  • F-statistic
• Chi-Square Test
  • Hypothesis Development
  • Computing Expected Frequencies
  • Conducting the Chi-Square Test
Introduction

Statistical Inference: drawing conclusions about populations from samples.

Hypothesis Testing: a technique that allows researchers to draw valid statistical conclusions about the value of population parameters or differences among them.
Hypothesis Testing

Hypothesis testing involves drawing inferences about two contrasting propositions (each called a hypothesis) relating to the value of one or more population parameters, such as the mean, proportion, standard deviation, or variance.

One of these propositions (called the null hypothesis, H0) describes the existing theory or a belief that is accepted as valid unless strong statistical evidence exists to the contrary.

The second proposition (called the alternative hypothesis, H1) is the complement of the null hypothesis; it must be true if the null hypothesis is false.
Hypothesis-Testing Procedure

1. Identifying the population parameter of interest and formulating the hypotheses to test.
2. Selecting a level of significance, which defines the risk of drawing an incorrect conclusion when the assumed hypothesis is actually true.
3. Determining a decision rule on which to base a conclusion.
4. Collecting data and calculating a test statistic.
5. Applying the decision rule to the test statistic and drawing a conclusion.
One-Sample Hypothesis
Tests
One-Sample Hypothesis Test
H0: population parameter ≥ constant vs H1: population parameter < constant
H0: population parameter ≤ constant vs H1: population parameter > constant
H0: population parameter = constant vs H1: population parameter ≠ constant

For one-sample tests, the statements of the null hypotheses are expressed as ≥, ≤, or =. It is not correct to formulate a null hypothesis using >, <, or ≠.

Hypothesis testing always assumes that H0 is true and uses sample data to determine whether H1 is more likely to be true. Statistically, we cannot “prove” that H0 is true; we can only fail to reject it. Thus, if we cannot reject the null hypothesis, we have shown only that there is insufficient evidence to conclude that the alternative hypothesis is true.
Example: One-Tailed Test for the
Mean

How do we formulate the hypotheses?

H0: 𝜇 ≥ 25
H1: 𝜇 < 25
Hypothesis Testing Results
• The null hypothesis is actually true, and the test correctly fails to reject it.
• The null hypothesis is actually false, and the hypothesis test correctly reaches this conclusion.
• The null hypothesis is actually true, but the hypothesis test incorrectly rejects it (called Type I error).
• The null hypothesis is actually false, but the hypothesis test incorrectly fails to reject it (called Type II error).
Example
We assume the null hypothesis to be true. The p-value is the probability of observing a sample mean that is as extreme as, or more extreme than, the one observed.

Two-tailed test (the sum of the areas in both tails is the p-value):
p-value = P(Z < -|z0| or Z > |z0|) = 2P(Z < -|z0|)

Right-tailed test (the area in the right tail is the p-value):
p-value = P(Z > z0)

Left-tailed test (the area in the left tail is the p-value):
p-value = P(Z < z0)

[Figure: standard normal curves centered at μ = 0, showing rejection regions of area 𝛼/2 in each tail for the two-tailed test and area 𝛼 in one tail for the one-tailed tests.]
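As a minimal illustration (not part of the original slides), these p-value calculations can be reproduced in Python with scipy.stats.norm; the test statistic z0 = -1.05 below is an assumed value used only for illustration.

```python
from scipy.stats import norm

z0 = -1.05  # assumed z-test statistic, for illustration only

# Two-tailed test: total area in both tails beyond |z0|
p_two_tailed = 2 * norm.cdf(-abs(z0))

# Right-tailed test: area to the right of z0
p_right = norm.sf(z0)        # equivalent to 1 - norm.cdf(z0)

# Left-tailed test: area to the left of z0
p_left = norm.cdf(z0)

print(round(p_two_tailed, 4), round(p_right, 4), round(p_left, 4))
```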
Type I Error
• The probability of making a Type I error, that is,
P(rejecting H0|H0 is true), is denoted by 𝛼 and is
called the level of significance.
• This defines the risk you are willing to take of making the incorrect conclusion that the alternative hypothesis is true when, in fact, the null hypothesis is true.
• The value of 𝛼 can be controlled by the decision maker and is selected before the test is conducted.
• Commonly used levels for 𝛼 are 0.10, 0.05, and 0.01.
Type II Error
• The probability of correctly failing to reject the null hypothesis, P(not rejecting H0|H0 is true), is called the
confidence coefficient and is calculated as 1 - 𝛼.
• For a confidence coefficient of 0.95, we mean that we expect 95 out of 100 samples to support the null
hypothesis rather than the alternative hypothesis when H0 is actually true.
• Unfortunately, we cannot control the probability of a Type II error, P(not rejecting H0|H0 is false), which is
denoted by 𝛽.
• Unlike 𝛼, 𝛽 cannot be specified in advance but depends on the true value of the (unknown) population
parameter.
Power of the Test
The value 1 - 𝛽 is called the power of the test and represents the probability of
correctly rejecting the null hypothesis when it is indeed false, or P(rejecting H0|H0
is false). We would like the power of the test to be high (equivalently, we would
like the probability of a Type II error to be low) to allow us to make a valid
conclusion. The power of the test is sensitive to the sample size; small sample sizes
generally result in a low value of 1 - 𝛽. The power of the test can be increased by
taking larger samples, which enable us to detect small differences between the
sample statistics and population parameters with more accuracy.
Selecting the Test Statistic
• The next step is to collect sample data and use the data
to draw a conclusion.
• The decision to reject or fail to reject a null hypothesis is
based on computing a test statistic from the sample data.
• The test statistic used depends on the type of hypothesis
test.
• Different types of hypothesis tests use different test
statistics, and it is important to use the correct one.
• The proper test statistic often depends on certain
assumptions about the population—for example,
whether or not the standard deviation is known.
Which Test Statistic? z-statistic or t-statistic?
If the population standard deviation σ is known, the test statistic is z = (x̄ - 𝜇0)/(σ/√n), which follows the standard normal distribution. If σ is unknown and estimated by the sample standard deviation s, the test statistic is t = (x̄ - 𝜇0)/(s/√n), which has a t-distribution with n - 1 degrees of freedom.
Example

The t-test statistic of -1.05 indicates that the sample mean of 21.91 is 1.05 standard errors below the hypothesized mean of 25 minutes.
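A rough Python sketch of this calculation follows. Only the sample mean (21.91) and hypothesized mean (25) appear on the slide; the sample standard deviation and sample size below are assumed values chosen so that the statistic comes out near -1.05.

```python
from math import sqrt
from scipy import stats

x_bar, mu0 = 21.91, 25       # sample mean and hypothesized mean (from the slide)
s, n = 19.5, 44              # assumed sample standard deviation and sample size

t_stat = (x_bar - mu0) / (s / sqrt(n))    # one-sample t-statistic
p_value = stats.t.cdf(t_stat, df=n - 1)   # lower-tailed p-value for H1: mu < 25

print(round(t_stat, 2), round(p_value, 4))
```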
Conclusion: To Reject or Fail to
Reject H0
The conclusion to reject or fail to reject H0 is based on comparing the value of the test statistic to a “critical value” at the chosen level of significance, α.
• The critical value divides the sampling
distribution into two parts, a rejection region
and a non-rejection region.
• If the test statistic falls into the rejection region,
we reject the null hypothesis; otherwise, we fail
to reject it.
Rejection Region

Lower-tailed test: H0: parameter ≥ constant vs. H1: parameter < constant
Upper-tailed test: H0: parameter ≤ constant vs. H1: parameter > constant
Example

• By comparing the value of the t-test statistic with this


critical value, we see that the test statistic does not
fall below the critical value (i.e., −1.05 > −1.68) and is
not in the rejection region.
• Therefore, we cannot reject H0 and cannot conclude
that the mean response time has improved to less
than 25 minutes.
• The figure on the right illustrates the conclusion we reached.
• Even though the sample mean is less than 25, we cannot conclude that the population mean response time is less than 25 because of the large amount of sampling error. (Critical value = -1.68)
Basically, all hypothesis tests are similar; you just
have to ensure that you select the correct test
statistic, critical value, and rejection region,
depending on the type of hypothesis.
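For reference, a small Python sketch of the critical-value comparison in the example above; the degrees of freedom are an assumed value, since the slide gives only the critical value of about -1.68 and the test statistic -1.05.

```python
from scipy import stats

alpha = 0.05
df = 43                             # assumed degrees of freedom (n - 1); not given on the slide

t_crit = stats.t.ppf(alpha, df)     # lower-tail critical value, roughly -1.68
t_stat = -1.05                      # t-test statistic from the example

print(round(t_crit, 2), t_stat < t_crit)   # False: not in the rejection region, fail to reject H0
```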
Example: Two-Tailed Test for the
Mean
Example: Using p-values
Refer to slide no 15 & 18

A p-value (observed significance level) is the probability of obtaining a


test statistic value equal to or more extreme than that obtained from the
sample data when the null hypothesis is true.
An alternative approach to Step 3 of a hypothesis test uses the p-value
rather than the critical value:
Reject H0 if p-value < α
Testing for Proportions
Many important business measures, such as market share or the fraction of deliveries received on time, are expressed as proportions. We may conduct a test of hypothesis about a population proportion in a similar fashion as we did for means. The test statistic for a one-sample test for proportions is

  z = (p̂ - 𝜋0) / √(𝜋0(1 - 𝜋0)/n)

where 𝜋0 is the hypothesized value and p̂ is the sample proportion. Similar to the test statistic for means, the z-test statistic shows the number of standard errors that the sample proportion is from the hypothesized value. The sampling distribution of this test statistic has a standard normal distribution.
Example: One-Sample Test for Proportions
(Refer to the formula on the previous slide.)

For a lower-tailed test, the p-value is computed as the area to the left of the test statistic; that is:
NORM.S.DIST(z, TRUE)

If we had a two-tailed test, the p-value is:

2*NORM.S.DIST(z, TRUE) if z < 0

otherwise, the p-value is:

2*(1 - NORM.S.DIST(z, TRUE)) if z > 0
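The same proportion test can be sketched in Python; the sample proportion, hypothesized proportion, and sample size below are hypothetical values used only for illustration.

```python
from math import sqrt
from scipy.stats import norm

p_hat, pi0, n = 0.42, 0.50, 100   # assumed sample proportion, hypothesized proportion, sample size

z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)   # one-sample z-statistic for a proportion

p_lower = norm.cdf(z)              # lower-tailed p-value (area to the left of z)
p_two = 2 * norm.cdf(-abs(z))      # two-tailed p-value (covers both signs of z)

print(round(z, 2), round(p_lower, 4), round(p_two, 4))
```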
Confidence Interval and
Hypothesis Tests
• A close relationship exists between confidence intervals and hypothesis tests.
• E.g., suppose we construct a 95% confidence interval for the mean. We wish to test the hypotheses
  H0: 𝜇 = 𝜇0 vs. H1: 𝜇 ≠ 𝜇0
• At a 5% level of significance, we check whether the hypothesized value 𝜇0 falls within the confidence interval. If not, we reject H0; if it does, then we cannot reject H0.
For one-tailed tests, we need to examine on which side of the hypothesized
value the confidence interval falls. For a lower-tailed test, if the confidence
interval falls entirely below the hypothesized value, we reject the null
hypothesis. For an upper-tailed test, if the confidence interval falls entirely
above the hypothesized value, we also reject the null hypothesis.
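A minimal sketch of this idea in Python, using simulated data and an assumed hypothesized mean purely for illustration:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
sample = rng.normal(loc=24, scale=5, size=40)   # simulated data, for illustration only
mu0 = 25                                        # assumed hypothesized mean

# 95% t-confidence interval for the population mean
ci_low, ci_high = stats.t.interval(0.95, len(sample) - 1,
                                   loc=sample.mean(), scale=stats.sem(sample))

# Two-tailed test at the 5% level: reject H0 if mu0 falls outside the interval
reject_h0 = not (ci_low <= mu0 <= ci_high)
print(round(ci_low, 2), round(ci_high, 2), reject_h0)
```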
Let’s Recap
• The Excel file One Sample Hypothesis Tests provides
template worksheets for conducting hypothesis
tests for means and proportions.
Two Sample Hypothesis
Tests
Two-Sample Hypothesis Tests

H0: population parameter (1) - population parameter (2) ≥ D0 vs. H1: population parameter (1) - population parameter (2) < D0
• When D0 = 0, the test simply seeks to conclude whether population parameter (1) is smaller than population parameter (2).

H0: population parameter (1) - population parameter (2) ≤ D0 vs. H1: population parameter (1) - population parameter (2) > D0
• When D0 = 0, the test simply seeks to conclude whether population parameter (1) is larger than population parameter (2).

H0: population parameter (1) - population parameter (2) = D0 vs. H1: population parameter (1) - population parameter (2) ≠ D0
• When D0 = 0, we are seeking evidence that population parameter (1) differs from parameter (2).

Many practical applications of hypothesis testing involve comparing two populations for differences in means, proportions, or other population parameters. Such tests can confirm differences between suppliers, performance at two different factory locations, new and old work methods, or reward and recognition programs, among many other situations. Similar to one-sample tests, two-sample hypothesis tests for differences in population parameters follow the same basic procedure.

In most applications D0 = 0, and we are simply seeking to compare the population parameters. However, there are situations when we might want to determine if the parameters differ by some non-zero amount; for example, “job classification A makes at least $5,000 more than job classification B.”
Two Sample Tests for Differences
in Means
• In a two-sample test for differences in means, we always test hypotheses of the form
  H0: 𝜇1 - 𝜇2 {≥, ≤, =} D0 vs. H1: 𝜇1 - 𝜇2 {<, >, ≠} D0
Selection of Test Statistic
1. Population variance is known. From the Data Analysis
menu, choose z-test: Two-Sample for Means. This test
uses a test statistic that is based on the standard
normal distribution.
2. Population variance is unknown and assumed unequal.
From the Data Analysis menu, choose t-test: Two-
Sample Assuming Unequal Variances. The test statistic
for this case has a t-distribution.
3. Population variance unknown but assumed equal. In Excel, choose t-test: Two-Sample Assuming Equal Variances. The test statistic also has a t-distribution, but it is different from the unequal variance case.
Caution!!!
You must be very careful in interpreting the output
information from these Excel tools and apply the
following rules:
1. If the test statistic is negative, the one-tailed p-value is the
correct p-value for a lower-tail test; however, for an upper-
tail test, you must subtract this number from 1.0 to get the
correct p-value.
2. If the test statistic is non-negative (positive or zero), then
the p-value in the output is the correct p-value for an
upper-tail test; but for a lower-tail test, you must subtract
this number from 1.0 to get the correct p-value.
3. For a lower-tail test, you must change the sign of the one-tailed critical value.

Only rarely are the population variances known; also, it is often difficult to justify the assumption that the variances of each population are equal. Therefore, in most practical situations, we use the t-test: Two-Sample Assuming Unequal Variances. This procedure also works well with small sample sizes if the populations are approximately normal.
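Outside Excel, the unequal-variance (Welch) test can be sketched in Python as below; the two lead-time samples are hypothetical values used only for illustration. Halving the two-sided p-value mirrors the one-tailed rules described above.

```python
from scipy import stats

# Hypothetical lead times (days) for two suppliers, for illustration only
supplier_1 = [7.5, 8.0, 6.8, 9.1, 7.7, 8.4]
supplier_2 = [6.2, 5.9, 6.5, 6.0, 6.8, 6.1]

# Two-sample t-test assuming unequal variances (Welch's test)
t_stat, p_two_sided = stats.ttest_ind(supplier_1, supplier_2, equal_var=False)

# Upper-tail test (H1: mean 1 > mean 2): halve the two-sided p-value when t > 0,
# otherwise subtract the halved value from 1 (compare with Rules 1 and 2 above)
p_upper = p_two_sided / 2 if t_stat > 0 else 1 - p_two_sided / 2

print(round(t_stat, 2), round(p_upper, 4))
```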
Example: Comparing Supplier’s
Performance
Example: Testing the Hypothesis for
Supplier Lead-Time Performance
Use data from Purchase Order Database
Example: Testing the Hypothesis for
Supplier Lead-Time Performance cont.
• Determine whether the mean lead time for Alum Sheeting (𝜇1) is greater than the mean lead time for Durrable Products (𝜇2).
H0: 𝜇1 - 𝜇2 ≤ 0
H1: 𝜇1 - 𝜇2 > 0
• t-Test: Two-Sample Assuming Unequal Variances
• Variable 1 Range: Alum Sheeting data
• Variable 2 Range: Durrable Products data
Example: Testing the Hypothesis for
Supplier Lead-Time Performance cont.
• Results: Rule 2 – If the test statistic is nonnegative
(positive or zero), then the p-value in the output is
the correct p-value for an upper-tail test.
• t = 3.83
• Critical value = 1.81
• p-value = 0.00166
• Reject H0
Two-Sample Test for Means with
Paired Samples
• In the previous example for testing differences in
the mean supplier lead times, we used
independent samples i.e. the orders in each
supplier’s sample were not related to each other.
• In many situations, data from two samples are
naturally paired or matched.
Two-Sample Test for Means with
Paired Samples
Example 1
• Suppose that a sample of assembly-line workers perform a task using two different types of work methods, and the plant manager wants to determine if any differences exist between the two methods.
• In collecting the data, each worker will have performed the task using each method.
• Had we used independent samples, we would have randomly selected two different groups of employees and assigned one work method to one group and the alternative method to the second group. Each worker would have performed the task using only one of the methods.

Example 2
• Suppose that we wish to compare retail prices of grocery items between two competing grocery stores.
• It makes little sense to compare different samples of items from each store.
• Instead, we would select a sample of grocery items and find the price charged for the same items by each store.
• In this case, the samples are paired because each item would have a price from each of the two stores.
Paired t-test
• When paired samples are used, a paired t-test is more accurate
than assuming that the data come from independent populations.
The null hypothesis we test revolves around the mean difference (𝜇D) between the paired samples; that is,
  H0: 𝜇D = 0 vs. H1: 𝜇D ≠ 0
• The test uses the average difference between the paired data and the standard deviation of the differences, similar to a one-sample test.
• Excel has a Data Analysis tool, t-Test: Paired Two-Sample for
Means for conducting this type of test. In the dialog, you need to
enter only the variable ranges and hypothesized mean difference.
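Outside Excel, the paired test can be sketched in Python with scipy.stats.ttest_rel; the paired prices below are hypothetical values used only for illustration.

```python
from scipy import stats

# Hypothetical prices for the same grocery items at two stores (paired observations)
store_a = [2.49, 3.99, 1.89, 5.25, 4.10, 2.75]
store_b = [2.59, 4.15, 1.85, 5.40, 4.35, 2.80]

# Paired t-test on the differences between matched observations (H0: mean difference = 0)
t_stat, p_value = stats.ttest_rel(store_a, store_b)

print(round(t_stat, 2), round(p_value, 4))
```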
Example: Using the Paired Two-
Sample Test for Means
Test for Equality of Variances
• Understanding variation in business processes is
very important.
• E.g. does one location or group of employees show
higher variability than others?
• We can test for equality of variances between two
samples using a new type of test, the F-test.
• To use this test, we must assume that both samples are drawn from normal populations. The hypotheses we test are
  H0: 𝜎1² - 𝜎2² = 0 vs. H1: 𝜎1² - 𝜎2² ≠ 0
Test for Equality of Variances
• To test these hypotheses, we collect samples of n1 observations
from population 1 and n2 observations from population 2.
• The test uses an F-test statistic, which is the ratio of the variances of the two samples:
  F = s1² / s2²
• The sampling distribution of this statistic is called the F-


distribution.
• Similar to the t-distribution, it is characterized by degrees of freedom; however, the F-distribution has two degrees of freedom, one associated with the numerator of the F-statistic, n1 - 1, and one associated with the denominator of the F-statistic, n2 - 1.
Test for Equality of Variances
• If the variances differ significantly from each other, we
would expect F to be much larger than 1; the closer F is to
1, the more likely it is that the variances are the same.
• Therefore, we need only to compare F to the upper-tail
critical value.
• Hence, for a level of significance 𝛼, we find the critical value
F𝛼/2,df1,df2 of the F-distribution, and then we reject the null
hypothesis if the F-test statistic exceeds the critical value.
• Note that we are using 𝛼/2 to find the critical value, not 𝛼.
• This is because we are using only the upper tail information
on which to base our conclusion.
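A minimal Python sketch of the F-test using hypothetical samples; placing the larger sample variance in the numerator is a common convention that makes the upper-tail comparison with the 𝛼/2 critical value sufficient.

```python
import numpy as np
from scipy import stats

# Hypothetical measurements from two groups, for illustration only
sample_1 = [12.1, 11.8, 13.4, 12.9, 11.5, 12.7, 13.1]
sample_2 = [12.3, 12.4, 12.2, 12.6, 12.5, 12.1, 12.4]

s1_sq = np.var(sample_1, ddof=1)    # sample variances
s2_sq = np.var(sample_2, ddof=1)

# Put the larger variance in the numerator so only the upper tail is needed
if s1_sq >= s2_sq:
    F, df1, df2 = s1_sq / s2_sq, len(sample_1) - 1, len(sample_2) - 1
else:
    F, df1, df2 = s2_sq / s1_sq, len(sample_2) - 1, len(sample_1) - 1

alpha = 0.05
F_crit = stats.f.ppf(1 - alpha / 2, df1, df2)   # upper-tail critical value at alpha/2

print(round(F, 2), round(F_crit, 2), F > F_crit)   # reject H0 if F exceeds the critical value
```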
Example: Applying the F-test for
Equality of Variances

The F-test for equality of variances is often used


before testing for the difference in means so that
the proper test (population variance is unknown
and assumed unequal or population variance is
unknown and assumed equal, which we discussed
earlier in this chapter) is selected.
Analysis of Variance
(ANOVA)
To compare the means of several different groups to
determine if all are equal or if any are significantly different
from the rest.
Example: Differences in
Insurance Survey Data
Analysis of Variance (ANOVA)
• In statistical terminology, the variable of interest is called a factor.
• In Example 7.13, the factor is educational level; there are three categorical levels: college graduate, graduate degree, and some college.
• Thus, it would appear that we will have to perform three different
pairwise tests to establish whether any significant differences exist among
them.
• As the number of factor levels increases, you can easily see that the
number of pairwise tests grows large very quickly.
• Fortunately, statistical tools exist that eliminate the need for such a
tedious approach.
• Analysis of variance (ANOVA) is one of them.
• The null hypothesis for ANOVA is that the population means of all groups are equal; the alternative hypothesis is that at least one mean differs from the rest:
  H0: 𝜇1 = 𝜇2 = ... = 𝜇m
  H1: at least one mean is different from the others
What is ANOVA?
• ANOVA derives its name from the fact that we are analyzing
variances in the data; essentially, ANOVA computes a
measure of the variance between the means of each group
and a measure of the variance within the groups and
examines a test statistic that is the ratio of these measures.
• This test statistic can be shown to have an F-distribution
(similar to the test for equality of variances).
• If the F-statistic is large enough based on the level of
significance chosen and exceeds a critical value, we would
reject the null hypothesis.
• Excel provides a Data Analysis tool, ANOVA: Single Factor to
conduct analysis of variance.
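As an alternative to the Excel tool, a single-factor ANOVA can be sketched in Python; the three groups of scores below are hypothetical values used only for illustration.

```python
from scipy import stats

# Hypothetical scores for three educational levels, for illustration only
college_grad = [4.2, 3.8, 4.5, 4.0, 3.9, 4.3]
graduate_deg = [4.6, 4.8, 4.4, 4.7, 4.5, 4.9]
some_college = [3.5, 3.9, 3.6, 4.0, 3.7, 3.8]

# One-way (single-factor) ANOVA: F-statistic and p-value
f_stat, p_value = stats.f_oneway(college_grad, graduate_deg, some_college)

print(round(f_stat, 2), round(p_value, 4))   # reject H0 if p_value < alpha
```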
Example: Applying the EXCEL
ANOVA tool
Example: Applying the EXCEL
ANOVA tool
Do you reject or fail to reject the H0?
Assumptions of ANOVA
• ANOVA requires assumptions that the m groups or factor levels being studied
represent populations whose outcome measures
1. Are randomly and independently obtained,
2. Are normally distributed, and
3. Have equal variances.
• If these assumptions are violated, then the level of significance and the power
of the test can be affected.
• Usually, the first assumption is easily validated when random samples are
chosen for the data.
• ANOVA is fairly robust to departures from normality, so in most cases this isn’t a
serious issue.
• If sample sizes are equal, violation of the third assumption does not have
serious effects on the statistical conclusions; however, with unequal sample
sizes, it can.
• When the assumptions underlying ANOVA are violated, you may use a
nonparametric test that does not require these assumptions.
Chi-Square Test for
Independence
The chi-square test is an example of a nonparametric test; that is,
one that does not depend on restrictive statistical assumptions, as
ANOVA does. This makes it a widely applicable and popular tool for
understanding relationships among categorical data.
Introduction
• A common problem in business is to determine whether two
categorical variables are independent.
• We introduced the concept of independent events in the energy
drink survey example where we used conditional probabilities to
determine whether brand preference was independent of
gender.
• However, with sample data, sampling error can make it difficult
to properly assess the independence of categorical variables.
• We would never expect the joint probabilities to be exactly the
same as the product of the marginal probabilities because of
sampling error even if the two variables are statistically
independent.
• Testing for independence is important in marketing applications.
Example: Independence and
Marketing Strategy

• A key marketing question is whether the proportion of males who prefer a particular brand is no
different from the proportion of females.
• If gender and brand preference are indeed independent, we would expect that about the same
proportion of the sample of female students would also prefer brand 1. If they are not
independent, then advertising should be targeted differently to males and females, whereas if
they are independent, it would not matter.
• We can test for independence by using a hypothesis test called the chi-square test for
independence. The chi-square test for independence tests the following hypotheses:
H0: the two categorical variables are independent
H1: the two categorical variables are dependent
Example: Independence and
Marketing Strategy cont.

Step 1: compute the expected frequency in each cell of the cross-tabulation if the two
variables are independent.
Example: Computing Expected
Frequency
Computing the Chi-Square
Statistic
• Step 2: Compute the chi-square statistic, which is the sum of the squares of the differences between observed frequency, fo, and expected frequency, fe, divided by the expected frequency in each cell:

  𝜒² = Σ (fo - fe)² / fe

• The sampling distribution of 𝜒² is a special distribution called the chi-square (𝜒²) distribution.
Chi-Square Calculations
• Compare the chi-square statistic for a specified level of significance 𝛼 to the critical value from a chi-square distribution with (r - 1)(c - 1) degrees of freedom, where r and c are the number of rows and columns in the cross-tabulation table, respectively.
• The Excel function CHISQ.INV.RT(probability, deg_freedom) returns the value of 𝜒² that has a right-tail area equal to probability for a specified degree of freedom.
• By setting probability equal to the level of significance, we can obtain
the critical value for the hypothesis test.
• If the test statistic exceeds the critical value for a specified level of
significance, we reject H0.
• The Excel function CHISQ.TEST(actual_range, expected_range)
computes the p-value for the chi-square test.
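Outside Excel, the whole chi-square test can be sketched in Python with scipy.stats.chi2_contingency; the 2 x 3 table of counts below is hypothetical and used only for illustration.

```python
import numpy as np
from scipy import stats

# Hypothetical cross-tabulation: gender (rows) vs. brand preference (columns)
observed = np.array([[25, 10, 15],
                     [14, 16, 20]])

# Expected frequencies, chi-square statistic, degrees of freedom (r-1)(c-1), and p-value
chi2, p_value, dof, expected = stats.chi2_contingency(observed)

alpha = 0.05
chi2_crit = stats.chi2.ppf(1 - alpha, dof)   # plays the role of CHISQ.INV.RT(alpha, dof)

print(round(chi2, 2), dof, round(chi2_crit, 2), round(p_value, 4))
```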
Results from the Chi-Square Test
• Test statistic = 6.49
• Deg. of freedom = (2 – 1)(3 – 1) =
2
• Critical value =
CHISQ.INV.RT(0.05, 2) = 5.99
• P-Value = CHISQ.TEST(F6:H7,
F12:H13)
• Decision: Reject H0
Cautions in Using the Chi-Square
Test
• The chi-square test assumes adequate expected cell
frequencies.
• A rule of thumb is that there be no more than 20% of
cells with expected frequencies smaller than 5, and no
expected frequencies of zero.
• Consider aggregating some of the rows or columns
in a logical fashion to enforce this assumption.
Example: Violations of Chi-
Square Assumptions
• A survey of 100 students at a university queried their
beverage preferences at a local coffee shop.

• Of the 16 cells, five, or over 30%, have frequencies


smaller than 5. Four of them are in the Cappuccino,
Latte, and Mocha columns; these can be aggregated
into one column called Hot Specialty beverages.
Example: Violations of Chi-
Square Assumptions
• Now only 2 of 12 cells have an expected frequency
less than 5; this now meets the assumptions of the
chi-square test.
End of Chapter Exercises
1. A business school has a goal that the average number of
years of work experience of MBA applicants is at least 3
years. Based on last year’s applicants, it was found that
among a sample of 47, the average number of years of
work experience is 2.57 with a standard deviation of 3.67.
What conclusion can the school reach?
2. A bank has historically found that the average monthly
charges in recent years on its credit card were $1,350. With
an improving economy, they suspect that this has
increased. A sample of 42 customers resulted in an average
monthly charge of $1,376.54 with a standard deviation of
$183.89. Does this data provide statistical evidence that the
average monthly charges have increased?
End of Chapter Exercises
3. A retailer believes that its new advertising strategy will
increase sales. Previously, the mean spending in 15
categories of consumer items in both the 18–34 and 35+ age
groups was $70.00.
a. Formulate a hypothesis test to determine if the mean spending in
these categories has statistically increased.
b. After the new advertising campaign was launched, a marketing
study found that the mean spending for 300 respondents in the 18–
34 age group was $75.86, with a standard deviation of $50.90. Is
there sufficient evidence to conclude that the advertising strategy
significantly increased sales in this age group?
c. For 700 respondents in the 35 + age group, the mean and standard
deviation were $68.53 and $45.29, respectively. Is there sufficient
evidence to conclude that the advertising strategy significantly
increased sales in this age group?
Any Questions
End of Chapter 5
