Chi Square Test PDF
Reporter: VANESSA O. NACAR
Objectives:
❑ Define and describe chi-square test.
❑ Describe uses of chi-square test.
❑ Perform chi-square goodness of fit test.
❑ Perform chi-square test of independence
and test of homogeneity.
Chi-Square Tests
01 Goodness-of-Fit Test
typically used to determine if data fits a particular distribution.
02 Test of Independence
use of a contingency table to determine the independence of two factors.
03 Test of Homogeneity
determines whether two populations come from the same distribution, even if
this distribution is unknown.
What is Chi-Square Test?
❑ Common statistical test used for nominal
data
❑ Non-parametric or distribution-free test
❑ Developed by Karl Pearson in 1900
A. Nominal Variables
- variables that have two or more
categories, but which do not have
an intrinsic order.
Example:
❑ Classifying type of property into distinct
categories such as houses, condos, co-ops, or
bungalows.
❑ Classifying where people live in the USA by
states
B. Dichotomous Variables
- nominal variables which have only
two categories or levels.
Example:
❑ Gender: classifying somebody as either “male”
or “female”
❑ Do you own a mobile phone?
Ownership as either “Yes” or “No”
❑ Type of property, when classified
as either residential or commercial.
When should you use a
chi-square test?
The chi-square test is appropriate when the following
conditions are met:
❑ the sampling method is simple random sampling
❑ the variable under study is categorical
❑ the expected value of the number of sample
observations in each level of the variable is at
least 5.
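The third condition can be checked mechanically before running a test. The following is a minimal sketch, assuming the expected counts have already been computed; the function name `expected_counts_ok` and its `minimum` parameter are illustrative and do not come from the slides.

```python
def expected_counts_ok(expected, minimum=5):
    """Return True if every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)

# Hypothetical inputs, used only to show both outcomes.
print(expected_counts_ok([12, 12, 12, 12, 12]))  # True
print(expected_counts_ok([12, 3, 12]))           # False
```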
01 GOODNESS-OF-FIT TEST
❑ used to test whether a frequency
distribution fits an expected
distribution
❑ to calculate the test statistic for
the chi-square goodness-of-fit
test, the observed frequencies
and the expected frequencies are
used.
➢ The observed frequency (O) of a
category is the frequency for the
category observed in the sample data.
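In symbols, the statistic these bullets describe can be sketched as follows. The notation is an assumption introduced here for illustration: k categories, sample size n, hypothesized category probabilities p_i, observed frequencies O_i, and expected frequencies E_i.

```latex
E_i = n\,p_i, \qquad
\chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i}
```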
α = 0.05
Step 3: Find the degrees of freedom
α = 0.05
df = k - 1
df = 3 - 1
df = 2
Tabular chi-square value is 5.99
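The tabular value for df = 2 can be cross-checked without a table. For df = 2 only, the upper-tail probability of the chi-square distribution is exp(-x/2), so the critical value has the closed form -2 ln(α). A small sketch (the function name is illustrative, not from the slides):

```python
import math

def chi2_critical_df2(alpha):
    """Critical value of the chi-square distribution with df = 2.

    For df = 2 the upper-tail probability is exp(-x / 2),
    so solving exp(-x / 2) = alpha gives x = -2 * ln(alpha).
    """
    return -2.0 * math.log(alpha)

print(round(chi2_critical_df2(0.05), 2))  # 5.99
```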
Step 4: Set up the decision rule
Try This!
Employers want to know which days of the week employees are absent in a
five-day work week. Most employers would like to believe that employees
are absent equally often on each day of the week. Suppose a random sample
of 60 managers was asked on which day of the week they had the highest
number of employee absences. The results were distributed as in the table
below. For the population of employees, do the days with the highest
number of absences occur with equal frequencies during a five-day work
week? Test at a 5% significance level.
Step 1. Formulate the hypotheses
α = 0.05
Step 3: Find the degrees of freedom
α = 0.05
df = k - 1
df = 5 - 1
df = 4
Tabular chi-square value is 9.49
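The tabular value for df = 4 can also be reproduced numerically. The sketch below relies on the fact that, for df = 4 only, the upper-tail probability is exp(-x/2)(1 + x/2), and solves for the critical value by bisection. The function names and the search bounds are assumptions made for this illustration.

```python
import math

def chi2_sf_df4(x):
    """Upper-tail probability P(X > x) for chi-square with df = 4."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

def chi2_critical_df4(alpha, lo=0.0, hi=100.0, tol=1e-10):
    """Find x with P(X > x) = alpha by bisection.

    The tail probability decreases as x grows, so the search
    moves right while the tail is still larger than alpha.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if chi2_sf_df4(mid) > alpha:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

print(round(chi2_critical_df4(0.05), 2))  # 9.49
```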
Step 4: Set up the decision rule
If the absent days occur with equal frequencies, then, out of 60 absent days (the
total in the sample: 15 + 12 + 9 + 9 + 15 = 60), there would be 12 absences on
Monday, 12 on Tuesday, 12 on Wednesday, 12 on Thursday, and 12 on Friday.
Number of Absences   Monday   Tuesday   Wednesday   Thursday   Friday
Observed Values      15       12        9           9          15
Expected Values      12       12        12          12         12
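\[ \chi^2 = \sum \frac{(O - E)^2}{E} \]

\[ \chi^2 = \frac{(15 - 12)^2}{12} + \frac{(12 - 12)^2}{12} + \frac{(9 - 12)^2}{12} + \frac{(9 - 12)^2}{12} + \frac{(15 - 12)^2}{12} \]

\[ \chi^2 = \frac{(3)^2}{12} + \frac{(0)^2}{12} + \frac{(-3)^2}{12} + \frac{(-3)^2}{12} + \frac{(3)^2}{12} \]

\[ \chi^2 = 3 \]

Tabular chi-square value is 9.49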
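The whole goodness-of-fit calculation for the absences example can be reproduced in a few lines. This is a sketch using only the observed counts and the tabular value quoted in the slides; the variable names are illustrative.

```python
observed = [15, 12, 9, 9, 15]   # Monday to Friday, from the table
total = sum(observed)            # 60 managers
# Under the null hypothesis every day is equally likely: 60 / 5 = 12.
expected = [total / len(observed)] * len(observed)

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
critical_value = 9.49            # tabular value for df = 4, alpha = 0.05

print(chi_square)                # 3.0
print("reject H0" if chi_square > critical_value else "fail to reject H0")
# fail to reject H0
```

Since 3 is smaller than 9.49, the hypothesis of equal frequencies across the five days is not rejected at the 5% level.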
Step 6: Interpret the results
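Expected Frequencies

\[ \text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{grand total}} \]

Non-Jogger (row total 98):
\[ LNJ = \frac{98 \times 49}{210} = 22.87 \qquad MNJ = \frac{98 \times 120}{210} = 56 \qquad HNJ = \frac{98 \times 41}{210} = 19.13 \]

Column Totals   49   120   41   (grand total 210)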
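The expected-frequency formula can be applied to every cell at once. The sketch below assumes the observed counts listed in the O column of this example's tables (Jogger: 34, 57, 21; Non-Jogger: 15, 63, 20); the dictionary layout and variable names are illustrative.

```python
# Observed counts; columns are Low, Moderate, High.
observed = {
    "Jogger":     [34, 57, 21],
    "Non-Jogger": [15, 63, 20],
}

row_totals = {row: sum(counts) for row, counts in observed.items()}
col_totals = [sum(col) for col in zip(*observed.values())]
grand_total = sum(row_totals.values())

# expected = (row total)(column total) / grand total, cell by cell
expected = {
    row: [row_totals[row] * c / grand_total for c in col_totals]
    for row in observed
}

for row, values in expected.items():
    print(row, [round(v, 2) for v in values])
# Jogger [26.13, 64.0, 21.87]
# Non-Jogger [22.87, 56.0, 19.13]
```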
Step 3: Subtract the expected frequencies
from the observed frequencies.
                             O     E       O − E
LJ (Low/Jogger)              34    26.13    7.87
LNJ (Low/Non-Jogger)         15    22.87   -7.87
MJ (Moderate/Jogger)         57    64      -7
MNJ (Moderate/Non-Jogger)    63    56       7
HJ (High/Jogger)             21    21.87   -0.87
HNJ (High/Non-Jogger)        20    19.13    0.87
Step 4: Square the difference.
                             O     E       O − E    (O − E)²
LJ (Low/Jogger)              34    26.13    7.87    61.94
LNJ (Low/Non-Jogger)         15    22.87   -7.87    61.94
MJ (Moderate/Jogger)         57    64      -7       49
MNJ (Moderate/Non-Jogger)    63    56       7       49
HJ (High/Jogger)             21    21.87   -0.87    0.76
HNJ (High/Non-Jogger)        20    19.13    0.87    0.76
Step 5: Divide the squared difference
by the expected frequencies.
                             O     E       O − E    (O − E)²   (O − E)²/E
LJ (Low/Jogger)              34    26.13    7.87    61.94      2.37
LNJ (Low/Non-Jogger)         15    22.87   -7.87    61.94      2.71
MJ (Moderate/Jogger)         57    64      -7       49         0.77
MNJ (Moderate/Non-Jogger)    63    56       7       49         0.88
HJ (High/Jogger)             21    21.87   -0.87    0.76       0.03
HNJ (High/Non-Jogger)        20    19.13    0.87    0.76       0.04
Step 6: Add the quotients to obtain
the chi-square value.
                             O     E       O − E    (O − E)²   (O − E)²/E
LJ (Low/Jogger)              34    26.13    7.87    61.94      2.37
LNJ (Low/Non-Jogger)         15    22.87   -7.87    61.94      2.71
MJ (Moderate/Jogger)         57    64      -7       49         0.77
MNJ (Moderate/Non-Jogger)    63    56       7       49         0.88
HJ (High/Jogger)             21    21.87   -0.87    0.76       0.03
HNJ (High/Non-Jogger)        20    19.13    0.87    0.76       0.04

\[ \sum \frac{(O - E)^2}{E} = 6.80 \]
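As a cross-check on the hand calculation, the statistic can be computed directly from the observed counts without rounding the intermediate columns. This is a sketch with illustrative variable names. Carrying full precision gives about 6.79; the 6.80 in the table comes from rounding each quotient to two decimals before adding.

```python
observed = [[34, 57, 21],   # Jogger: Low, Moderate, High
            [15, 63, 20]]   # Non-Jogger: Low, Moderate, High

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

chi_square = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / n   # expected frequency
        chi_square += (o - e) ** 2 / e

print(round(chi_square, 2))  # 6.79
```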
Step 7: Find the degrees of freedom
α = 0.05
df = (r - 1)(c - 1)
df = (2 - 1)(3 - 1)
df = (1)(2)
df = 2
Tabular chi-square value is 5.99
Step 8: Compare the obtained chi-square
value with the table value at 0.05 level of
significance.
χ² = 6.80; tabular χ² = 5.99 at α = 0.05 and df = 2
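The comparison in Step 8 amounts to a simple decision rule: reject the null hypothesis when the obtained value exceeds the tabular value. The sketch below applies that rule and, as an extra not shown in the slides, computes the p-value using the df = 2 identity P(X > x) = exp(-x/2).

```python
import math

chi_square = 6.80       # obtained value from Step 6
critical_value = 5.99   # tabular value, df = 2, alpha = 0.05

reject_h0 = chi_square > critical_value
# Valid for df = 2 only: upper-tail probability is exp(-x / 2).
p_value = math.exp(-chi_square / 2.0)

print(reject_h0)           # True
print(round(p_value, 3))   # 0.033
```

Because 6.80 is greater than 5.99 (equivalently, the p-value is below 0.05), the null hypothesis is rejected at the 0.05 level.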
Filipino people
❑ Opening of the second envelope: favor, not favor, or
neutral
❑ To calculate the test statistic for a test for
homogeneity, follow the same procedure as with
the test of independence.
Hypotheses
❑ 𝑯𝒐 : The distributions of the two populations are
the same.
❑ 𝑯𝒂 : The distributions of the two populations are
not the same.
Example
President Arroyo made a nationwide announcement
on television about her conversation with the COMELEC
Commissioner and she asked for public apology. To
determine the opinion of the public (Agree, Disagree, No
Opinion), a survey was conducted in 3 municipalities of La
Union. The following table gives the opinions of 2000 parents
from San Fernando, 1500 parents from Rosario, and 1000
parents from San Juan.
At the 0.01 level of significance, test for homogeneity
of opinion among the 3 municipalities concerning the public
apology of President Arroyo.
Observed Frequencies/Values
                   Municipalities
Opinion    San Fernando    Rosario    San Juan    Total
Agree      650             660        360         1670
Expected Frequencies
                   Municipalities
Opinion    San Fernando    Rosario    San Juan    Total
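For the one row of observed data that is available here (Agree, row total 1670), the expected frequencies follow from the same (row total)(column total) / grand total rule used in the test of independence. The sketch below assumes the column totals are the sample sizes stated in the example (2000, 1500, and 1000 parents); the variable names are illustrative.

```python
# Sample sizes stated in the example serve as column totals.
sample_sizes = {"San Fernando": 2000, "Rosario": 1500, "San Juan": 1000}
grand_total = sum(sample_sizes.values())   # 4500
agree_total = 650 + 660 + 360              # 1670, row total for "Agree"

expected_agree = {
    town: agree_total * n / grand_total for town, n in sample_sizes.items()
}
for town, value in expected_agree.items():
    print(town, round(value, 2))
# San Fernando 742.22
# Rosario 556.67
# San Juan 371.11
```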
Calculate Test Statistic
                             O     E     O − E    (O − E)²   (O − E)²/E
HC (Hypnotized/Correct)      7     12    -5       25         2.08
Hi (Hypnotized/Incorrect)    33    28     5       25         0.89
Cc (Control/Correct)         17    12     5       25         2.08
Ci (Control/Incorrect)       23    28    -5       25         0.89

χ² = 5.94
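For a 2 x 2 table there is a standard shortcut that gives the same statistic without building the expected-frequency columns. It is not the method used in the slides and is shown only as a cross-check; the cell names a, b, c, d are illustrative. With full precision the result is about 5.95; the 5.94 above comes from rounding each quotient to two decimals before adding.

```python
# Observed counts from the O column above.
a, b = 7, 33    # Hypnotized: correct, incorrect
c, d = 17, 23   # Control:    correct, incorrect
n = a + b + c + d

# 2 x 2 shortcut, no continuity correction:
# chi-square = n (ad - bc)^2 / (row1 * row2 * col1 * col2)
chi_square = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

print(round(chi_square, 2))  # 5.95
```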
Perform Step 3 - 6
Step 7: Find the degrees of freedom
α = 0.05
df = (r - 1)(c - 1)
df = (2 - 1)(2 - 1)
df = (1)(1)
df = 1
Tabular chi-square value is 3.84
Step 8: Compare the obtained chi-square
value with the table value at 0.05 level of
significance.
χ² = 5.94; tabular χ² = 3.84 at α = 0.05 and df = 1
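The same comparison can be expressed as a p-value. For df = 1 only, the upper-tail probability of the chi-square distribution equals erfc(sqrt(x/2)), which the Python standard library provides. This is a sketch added for illustration; the slides themselves use only the table lookup.

```python
import math

chi_square = 5.94   # obtained value
alpha = 0.05

# Valid for df = 1 only: P(X > x) = erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2.0))

print(p_value < alpha)  # True
```

The p-value is roughly 0.015, below 0.05, which agrees with 5.94 exceeding the tabular value of 3.84: the null hypothesis is rejected.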
χ² = 50.83
Step 7: Find the degrees of freedom
α = 0.05
df = (r - 1)(c - 1)
df = (2 - 1)(3 - 1)
df = (1)(2)
df = 2
Tabular chi-square value is 5.99
Step 8: Compare the obtained chi-square
value with the table value at 0.05 level of
significance.
χ² = 50.83; tabular χ² = 5.99 at α = 0.05 and df = 2