Unit 2: Data Analytics
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc. Chap 9-1
What is a Hypothesis?
▪ A hypothesis is a claim
(assumption) about a
population parameter:
▪ population mean
Example: The mean monthly cell phone bill
of this city is μ = $42
▪ population proportion
Example: The proportion of adults in this
city with cell phones is π = 0.68
The Null Hypothesis, H0
▪ Is always stated about a population parameter, not about a sample statistic: H0: μ = 3 is valid, H0: X̄ = 3 is not
The Alternative Hypothesis, H1
▪ Is the opposite of the null hypothesis
▪ e.g., The average number of TV sets in U.S.
homes is not equal to 3 ( H1: μ ≠ 3 )
▪ Challenges the status quo
▪ Never contains the “=”, “≤” or “≥” sign
▪ May or may not be proven
▪ Is generally the hypothesis that the
researcher is trying to prove
Hypothesis Testing Process
▪ Claim: the population mean age is 50 (null hypothesis H0: μ = 50)
▪ Now select a random sample from the population
▪ Suppose the sample mean age is 20: X̄ = 20
▪ Is X̄ = 20 likely if μ = 50? If not likely, REJECT the null hypothesis
Reason for Rejecting H0
[Figure: sampling distribution of X̄, centered at μ = 50 if H0 is true, with the observed sample mean X̄ = 20 far out in the lower tail]
If it is unlikely that we would get a sample mean of this value if in fact this were the population mean, then we reject the null hypothesis that μ = 50.
Level of Significance, α
Level of Significance
and the Rejection Region
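▪ Level of significance = α
▪ The critical value marks the boundary of the rejection region (shaded in the figure)
▪ Two-tail test: H0: μ = 3, H1: μ ≠ 3; rejection region of area α/2 in each tail
▪ Upper-tail test: H0: μ ≤ 3, H1: μ > 3; rejection region of area α in the upper tail
▪ Lower-tail test: H0: μ ≥ 3, H1: μ < 3; rejection region of area α in the lower tail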
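The three rejection regions above can be located numerically. The following is a minimal sketch, assuming a standardized normal (Z) test statistic; the function name `critical_values` and the choice α = 0.05 are illustrative and not part of the slides.

```python
from statistics import NormalDist


def critical_values(alpha, tail):
    """Return the Z critical value(s) that bound the rejection region.

    tail is "two", "upper", or "lower" (hypothetical labels for this sketch).
    """
    z = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "two":
        # alpha is split equally between the two tails
        upper = z.inv_cdf(1 - alpha / 2)
        return (-upper, upper)
    if tail == "upper":
        # all of alpha sits in the upper tail
        return (z.inv_cdf(1 - alpha),)
    if tail == "lower":
        # all of alpha sits in the lower tail
        return (z.inv_cdf(alpha),)
    raise ValueError("tail must be 'two', 'upper', or 'lower'")


for tail in ("two", "upper", "lower"):
    print(tail, [round(c, 2) for c in critical_values(0.05, tail)])
```

For the same α, a two-tail test has critical values farther from 0 than a one-tail test, because each tail holds only α/2.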
Errors in Making Decisions
▪ Type I Error
▪ Reject a true null hypothesis
▪ Considered a serious type of error
Errors in Making Decisions
(continued)
▪ Type II Error
▪ Fail to reject a false null hypothesis
Outcomes and Probabilities
Key: Outcome (Probability)

Decision            Actual Situation: H0 True    Actual Situation: H0 False
Do Not Reject H0    No error (1 - α)             Type II Error (β)
Reject H0           Type I Error (α)             No Error (1 - β)
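The Type I error probability can be checked by simulation. The sketch below is illustrative only: it assumes a two-tail Z test with known σ, and the values μ = 3, σ = 1, n = 25, the number of trials, and the function name are all hypothetical choices. Samples are drawn with H0 true, so every rejection is a Type I error and the rejection rate should land near α.

```python
import random
from statistics import NormalDist


def type_one_error_rate(mu0=3.0, sigma=1.0, n=25, alpha=0.05,
                        trials=20000, seed=0):
    """Estimate how often a two-tail Z test rejects a TRUE null hypothesis."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mu = mu0) really holds
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / n ** 0.5)
        if abs(z) > z_crit:
            rejections += 1  # rejecting here is a Type I error
    return rejections / trials


print(type_one_error_rate())  # expected to be close to alpha = 0.05
```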
Hypothesis Tests for the Mean
Hypothesis tests for μ:
▪ σ Known: Z test
▪ σ Unknown: t test
One-Tail Tests
Lower-Tail Tests
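H0: μ ≥ 3
H1: μ < 3
▪ There is only one critical value, since the rejection area is in only one tail
[Figure: distribution of X̄ centered at μ, with the rejection region of area α in the lower tail, below the critical value]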
Upper-Tail Tests
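H0: μ ≤ 3
H1: μ > 3
▪ There is only one critical value, since the rejection area is in only one tail
[Figure: distribution of X̄ centered at μ, with the rejection region of area α in the upper tail, above the critical value]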
Example: Upper-Tail Z Test
for Mean (σ Known)
A phone industry manager thinks that
customer monthly cell phone bills have
increased, and now average over $52 per
month. The company wishes to test this
claim. (Assume σ = 10 is known)
Example: Find Rejection Region
(continued)
▪ Suppose that α = 0.10 is chosen for this test
[Figure: upper-tail rejection region of area α = 0.10]
Review:
One-Tail Critical Value
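What is Z given α = 0.10?

Standardized Normal Distribution Table (Portion)
Z      .07     .08     .09
1.1    .8790   .8810   .8830
1.2    .8980   .8997   .9015
1.3    .9147   .9162   .9177

The area to the left of the critical value is 0.90 and the upper-tail area is α = 0.10. The table entry closest to 0.90 is .8997, at Z = 1.28.
Critical Value = 1.28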
Example: Test Statistic
(continued)
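With sample mean X̄ = 53.1, sample size n = 64, hypothesized mean μ = 52 and σ = 10:
$$Z = \frac{\bar{X} - \mu}{\sigma / \sqrt{n}} = \frac{53.1 - 52}{10 / \sqrt{64}} = 0.88$$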
Example: Decision
(continued)
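Reach a decision and interpret the result:
▪ With α = 0.10 the critical value is 1.28: reject H0 if Z > 1.28, otherwise do not reject H0
▪ Z = 0.88 does not exceed 1.28, so do not reject H0
▪ The same decision follows from the p-value:
$$P(\bar{X} \ge 53.1) = P\left(Z \ge \frac{53.1 - 52.0}{10 / \sqrt{64}}\right) = P(Z \ge 0.88) = 1 - 0.8106 = 0.1894$$
▪ Since the p-value 0.1894 is greater than α = 0.10, do not reject H0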
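The cell phone bill example can be reproduced in a few lines. This is a minimal sketch using the numbers from the slides (X̄ = 53.1, μ = 52, σ = 10, n = 64, α = 0.10); the function name `upper_tail_z_test` is an illustrative choice.

```python
from statistics import NormalDist


def upper_tail_z_test(x_bar, mu0, sigma, n, alpha):
    """Upper-tail Z test for the mean when sigma is known."""
    z = (x_bar - mu0) / (sigma / n ** 0.5)      # test statistic
    z_crit = NormalDist().inv_cdf(1 - alpha)    # upper-tail critical value
    p_value = 1 - NormalDist().cdf(z)           # P(Z >= z) under H0
    return z, z_crit, p_value, z > z_crit


z, z_crit, p_value, reject = upper_tail_z_test(53.1, 52, 10, 64, 0.10)
print(f"Z = {z:.2f}, critical value = {z_crit:.2f}, p-value = {p_value:.4f}")
print("Reject H0" if reject else "Do not reject H0")
```

Both routes agree: the statistic falls short of the critical value and the p-value exceeds α, so H0 is not rejected.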
▪ σ Known: Z test
▪ σ Unknown: t test
When σ is unknown, the test statistic is:
$$t_{n-1} = \frac{\bar{X} - \mu}{S / \sqrt{n}}$$
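The t statistic can be computed directly from raw data. The sketch below assumes a small made-up sample and a hypothesized mean of 50; neither comes from the slides. Python's standard library has no t distribution, so the critical value for n - 1 degrees of freedom would still be read from a t table.

```python
from statistics import mean, stdev


def t_statistic(sample, mu0):
    """Return (t, degrees of freedom) for a one-sample t test."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation S (divides by n - 1)
    t = (mean(sample) - mu0) / (s / n ** 0.5)
    return t, n - 1


# Hypothetical data, for illustration only
sample = [48, 52, 55, 47, 51, 53, 50, 49]
t, df = t_statistic(sample, mu0=50)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

In a two-tail test, H0 is rejected when |t| exceeds the table value for α/2 and the given degrees of freedom.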
Example: Two-Tail Test
(σ Unknown)
Example Solution:
Two-Tail Test
What is an ANOVA?
Definition:
An ANOVA test is a type of statistical test used to
determine if there is a statistically significant
difference between two or more categorical
groups by testing for differences of means using
variance.
What is an ANOVA? (contd.)
▪ ANOVA tests for significance use the F-test
for statistical significance. The F-test is a
groupwise comparison test, which means it
compares the variance in each group mean to
the overall variance in the dependent variable.
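The formula for ANOVA is:
$$F = \frac{MST}{MSE}$$
▪ MST is the mean sum of squares due to treatment (between groups)
▪ MSE is the mean sum of squares due to error (within groups)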
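The F ratio for a one-way layout can be computed by hand. This is a minimal sketch, assuming the usual between-group and within-group mean squares; the function name and the three small groups are made up for illustration.

```python
def one_way_anova_f(groups):
    """Return the F statistic (between-group / within-group mean square)."""
    k = len(groups)                               # number of groups
    n_total = sum(len(g) for g in groups)         # total observations
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Variation of observations around their own group mean
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    ms_between = ss_between / (k - 1)         # k - 1 degrees of freedom
    ms_within = ss_within / (n_total - k)     # n_total - k degrees of freedom
    return ms_between / ms_within


# Hypothetical measurements for three groups
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(groups))
```

A large F means the group means differ by more than the within-group scatter would explain; the cutoff comes from the F distribution with (k - 1, N - k) degrees of freedom.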
Assumptions of ANOVA
▪ Definition:
▪ A one-way ANOVA is used to compare the means of two or more
independent (unrelated) groups using the F-distribution.
▪ The null hypothesis for the test is that all of the
group means are equal.
Example:
▪ Situation 1: You have a group of individuals randomly
split into smaller groups and completing different tasks.
For example, you might be studying the effects
of tea on weight loss and form three groups:
green tea, black tea, and no tea.
▪ Situation 2: Similar to situation 1, but in this case the
individuals are split into groups based on an attribute
they possess.
For example, you might be studying leg strength of people
according to weight. You could split participants into
weight categories (obese, overweight and normal) and
measure their leg strength on a weight machine.
Limitation of one way ANOVA
▪ Definition:
▪ A two-way ANOVA is used to estimate how
the mean of a quantitative variable changes
according to the levels of two categorical variables.
▪ Use a two-way ANOVA when you want to know how
two independent variables, in combination, affect a
dependent variable.
Assumptions for Two Way
ANOVA
▪ An objective function:
The method seeks to minimize the number of features
in a subset drawn from the set of all features
while improving the results.
▪ A sequential search algorithm:
This search algorithm adds or removes candidate
features from the candidate subset while
evaluating the objective function or criterion.
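The two ingredients above, an objective function and a sequential search, can be sketched as a forward selection loop. Everything here is hypothetical: the function name, the feature names, the made-up "gain" values, and the per-feature penalty stand in for a real criterion such as validation accuracy.

```python
def sequential_forward_selection(features, score, max_features):
    """Greedily add the feature that most improves the objective function."""
    selected = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and len(selected) < max_features:
        # Evaluate the objective with each remaining candidate added
        candidates = [(score(selected + [f]), f) for f in remaining]
        top_score, top_feature = max(candidates, key=lambda pair: pair[0])
        if top_score <= best_score:
            break  # no candidate improves the criterion, so stop
        selected.append(top_feature)
        remaining.remove(top_feature)
        best_score = top_score
    return selected, best_score


# Hypothetical objective: usefulness of each feature minus a size penalty
gains = {"temperature": 5.0, "time_of_year": 4.0,
         "day_of_week": 0.5, "humidity": 0.2}


def toy_score(subset):
    return sum(gains[f] for f in subset) - 1.0 * len(subset)


print(sequential_forward_selection(list(gains), toy_score, max_features=4))
```

Backward selection runs the same loop in reverse: start with all features and remove the one whose removal hurts the criterion least.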
Sequential Feature Selection
▪ Definition:
▪ Stepwise regression is the step-by-step iterative
construction of a regression model that involves the
selection of independent variables to be used in a
final model. It involves adding or removing potential
explanatory variables in succession and testing for
statistical significance after each iteration.
Stepwise Regression
▪ In the end, the model might show that time of year and
temperatures are most significant, possibly suggesting
the peak energy consumption at the factory is when air
conditioner usage is at its highest.
Dummy Variables
▪ Definition:
▪ Numeric variables used in regression analysis to
represent categorical data that can only take on one
of two values: zero or one.
When to use dummy variable?
▪ When using categorical variables, it doesn’t make sense
to just assign values like 1, 2, 3, to values like “blue”,
“green”, and “brown”, because that would imply that
green is twice as colorful as blue or that brown is
three times as colorful as blue.
We could then
use Age and Gender_Dummy as predictor
variables in a regression model.
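Dummy coding can be sketched in plain Python. The helper name `add_dummies`, the sample rows, and the choice of baseline levels are assumptions made for illustration. A variable with k levels gets k - 1 dummy columns, with the baseline level represented by all zeros.

```python
def add_dummies(rows, column, baseline):
    """Replace a categorical column with 0/1 dummy columns (baseline omitted)."""
    levels = sorted({row[column] for row in rows} - {baseline})
    encoded = []
    for row in rows:
        new_row = {k: v for k, v in row.items() if k != column}
        for level in levels:
            # 1 if this row belongs to the level, otherwise 0
            new_row[f"{column}_{level}"] = 1 if row[column] == level else 0
        encoded.append(new_row)
    return encoded


# Hypothetical data: two levels need one dummy column
people = [{"Age": 30, "Gender": "Male"}, {"Age": 25, "Gender": "Female"}]
print(add_dummies(people, "Gender", baseline="Female"))

# Hypothetical data: three levels need two dummy columns
eyes = [{"Color": "blue"}, {"Color": "green"}, {"Color": "brown"}]
print(add_dummies(eyes, "Color", baseline="blue"))
```

Omitting the baseline column avoids perfect collinearity with the intercept in a regression model.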
Logistic Regression
▪ Definition:
▪ Logistic regression is a process of modeling the
probability of a discrete outcome given an input
variable. The most common logistic regression
models a binary outcome; something that can take
two values such as true/false, yes/no, and so on.
▪ Logistic regression, despite its name, is
a classification model rather than a regression model.
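A binary logistic model can be sketched with one input variable. This is an illustrative sketch, not a production fit: the data (hours studied versus pass/fail), the learning rate, the number of epochs, and the function names are all assumptions. The model outputs a probability through the sigmoid function, and a 0.5 threshold turns that probability into a class.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.1, epochs=5000):
    """Fit weight w and intercept b by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        grad_w = sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(x, w, b, threshold=0.5):
    """Return (probability of class 1, predicted class)."""
    p = sigmoid(w * x + b)
    return p, int(p >= threshold)


# Hypothetical data: hours studied and whether the exam was passed (1) or not (0)
hours = [1, 2, 3, 4, 5, 6]
passed = [0, 0, 0, 1, 1, 1]
w, b = fit_logistic(hours, passed)
for x in (1, 6):
    print(x, predict(x, w, b))
```

Linear regression would return an unbounded number for each input; the sigmoid keeps the output between 0 and 1 so it can be read as a probability.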
Linear VS Logistic Regression