Chi Square Test PDF


Chi-Square Test
Reporter: VANESSA O. NACAR
Objectives:
❑ Define and describe chi-square test.
❑ Describe uses of chi-square test.
❑ Perform chi-square goodness of fit test.
❑ Perform chi-square test of independence
and test of homogeneity.
Chi-Square Tests
01 Goodness-of-Fit Test
typically used to determine if data fits a particular distribution.
02 Test of Independence
use of a contingency table to determine the independence of two factors
03 Test of Homogeneity
determines whether two populations come from the same distribution, even if
this distribution is unknown
What is Chi-Square Test?
❑ Common statistical test used for nominal
data
❑ Non-parametric or distribution-free test
❑ Developed by Karl Pearson in 1900
A. Nominal Variables
- variables that have two or more
categories, but which do not have
an intrinsic order.

Example:
❑ Classifying type of property into distinct categories
such as houses, condos, co-ops or
bungalows.
❑ Classifying where people live in the USA by
state
B. Dichotomous Variables
- nominal variables which have only
two categories or levels.
Example:
❑ Gender: classifying somebody as either “male”
or “female”
❑ Do you own a mobile phone?
Ownership classified as either “Yes” or “No”
❑ Type of property classified
as either residential or commercial.
When should you use a
chi-square test?
The chi-square test is appropriate when the following
conditions are met:
❑ the sampling method is simple random sampling
❑ the variable under study is categorical
❑ the expected value of the number of sample
observations in each level of the variable is at
least 5.
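The expected-count condition can be checked mechanically before running a test. The following is a minimal sketch, assuming the expected counts have already been computed; the function name `expected_counts_ok` and the sample values are illustrative assumptions, not part of the presentation.

```python
def expected_counts_ok(expected, minimum=5):
    """Return True when every expected count is at least `minimum`."""
    return all(e >= minimum for e in expected)


# Hypothetical expected counts, used only to illustrate the check.
print(expected_counts_ok([82, 50, 30, 20, 18]))  # every count is 5 or more
print(expected_counts_ok([12, 3.5, 9]))          # one count falls below 5
```

When the check fails, a common remedy is to merge sparse categories before testing.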
01
GOODNESS-OF-FIT
TEST
GOODNESS-OF-FIT TEST
❑ used to test whether a frequency
distribution fits an expected
distribution
❑ to calculate the test statistic for
the chi-square goodness-of-fit
test, the observed frequencies
and the expected frequencies are
used.
➢ The observed frequency (O) of a
category is the frequency for the
category observed in the sample data.

➢ The expected frequency (E) of a category is the
calculated frequency for the category. Expected
frequencies are obtained assuming the specified
(or hypothesized) distribution. The expected
frequency for the ith category is
$E_i = n p_i$
where n is the number of trials (the sample size) and
$p_i$ is the assumed probability of the ith category.
Example
200 teenagers are randomly selected and asked what their favorite
pizza topping is. The results are shown below.

Find the observed frequencies and the expected frequencies, using $E_i = n p_i$.

Topping      Results (n = 200)   % of teenagers   Observed Frequency   Expected Frequency
Cheese       78                  41%              78                   200(0.41) = 82
Pepperoni    52                  25%              52                   200(0.25) = 50
Sausage      30                  15%              30                   200(0.15) = 30
Mushrooms    25                  10%              25                   200(0.10) = 20
Onions       15                  9%               15                   200(0.09) = 18
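The expected-frequency column can be reproduced in a few lines. This is a small sketch that applies E_i = n p_i to the pizza-topping percentages from the table; the variable names are illustrative.

```python
# Expected frequency for each category: E_i = n * p_i
n = 200  # sample size from the example
assumed_probs = {
    "Cheese": 0.41,
    "Pepperoni": 0.25,
    "Sausage": 0.15,
    "Mushrooms": 0.10,
    "Onions": 0.09,
}

# Round to two decimals to hide floating-point noise in the products.
expected = {topping: round(n * p, 2) for topping, p in assumed_probs.items()}

for topping, e in expected.items():
    print(f"{topping}: {e}")
```

The expected frequencies sum to the sample size because the assumed probabilities sum to 1.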
FORMULA
$\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}$
where,
$O_i$ is the observed frequency in the ith category;
$E_i$ is the expected frequency in the ith category.
Goodness-of-Fit is typically used to see if the population is
uniform (all outcomes occur with equal frequency), the
population is normal, or the population is the same as
another population with a known distribution. The null and
alternative hypotheses are:

𝐻𝑜 : The population fits the given distribution.


𝐻𝑎 :The population does not fit the given distribution.
Example
In the 2010 Philippine Census, the ages of individuals
in a small town were found to be the following:

Less than 18   18-40   Greater than 40
20%            30%     50%

In 2020, ages of n = 1000 individuals were sampled.
Below are the results:

Less than 18   18-40   Greater than 40
288            571     141

Using α = 0.05, determine whether the population distribution
of ages has changed in the last 10 years.
n = 1000, $E_i = n p_i$

                      Less than 18       18-40              Greater than 40
Expected proportion   20%                30%                50%
Observed              288                571                141
Expected              1000(0.20) = 200   1000(0.30) = 300   1000(0.50) = 500
Step 1. Formulate the hypotheses

𝐻𝒐 : The data meet the expected


distribution.
𝐻𝒂 : The data do not meet the expected
distribution.
Step 2: Set the level of significance
or alpha level

α = 0.05
Step 3: Find the degrees of freedom
α = 0.05
df = k - 1
df = 3 - 1
df = 2
Tabular chi-square value is 5.99
Step 4: Set up the decision rule

Decision Rule: Reject $H_0$ if the chi-square statistic is
greater than the critical value. Otherwise, fail to
reject $H_0$. (If $\chi^2$ is greater than 5.99, reject $H_0$.)
Step 5: Compute the test statistic
                  Less than 18   18-40   Greater than 40
Observed Values   288            571     141
Expected Values   200            300     500

$\chi^2 = \sum \frac{(O - E)^2}{E}$
$\chi^2 = \frac{(288 - 200)^2}{200} + \frac{(571 - 300)^2}{300} + \frac{(141 - 500)^2}{500}$
$\chi^2 = \frac{(88)^2}{200} + \frac{(271)^2}{300} + \frac{(-359)^2}{500}$
$\chi^2 = \frac{7744}{200} + \frac{73441}{300} + \frac{128881}{500} = 541.29$
Step 6: Interpret the results

Since the chi-square statistic value (541.29) is
greater than 5.99, we reject the null hypothesis
($H_0$). Therefore, at the 5% level of significance,
there is sufficient evidence that the observed
distribution does not meet the expected
distribution.
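The six steps of this example can be sketched in code. The helper name `chi_square_gof` is an assumption made for illustration. The critical value is computed from the closed form -2 ln(α), which holds only for df = 2 (where the upper-tail probability of the chi-square distribution is exp(-x/2)); other degrees of freedom need a table or a statistics library.

```python
import math


def chi_square_gof(observed, probs):
    """Return the chi-square statistic and expected counts for a goodness-of-fit test."""
    n = sum(observed)
    expected = [n * p for p in probs]  # E_i = n * p_i
    stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return stat, expected


observed = [288, 571, 141]    # 2020 sample
probs = [0.20, 0.30, 0.50]    # 2010 census proportions
stat, expected = chi_square_gof(observed, probs)

alpha = 0.05
critical = -2 * math.log(alpha)  # closed form valid only for df = 2

print(f"chi-square = {stat:.2f}, critical value = {critical:.2f}")
print("reject H0" if stat > critical else "fail to reject H0")
```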

Try This!
Employers want to know which days of the week employees are absent in a
five-day work week. Most employers would like to believe that employees
are absent equally during the week. Suppose a random sample of 60
managers were asked on which day of the week they had the highest
number of employee absences. The results were distributed as in the table
below. For the population of employees, do the days for the highest
number of absences occur with equal frequencies during a five-day work
week? Test at a 5% significance level.
Step 1. Formulate the hypotheses

H0: The absent days occur with equal frequencies,


that is, they fit a uniform distribution.
Ha: The absent days occur with unequal
frequencies, that is, they do not fit a uniform
distribution.
Step 2: Set the level of significance
or alpha level

α = 0.05
Step 3: Find the degrees of freedom
α = 0.05
df = k - 1
df = 5 - 1
df = 4
Tabular chi-square value is 9.49
Step 4: Set up the decision rule

Decision Rule: Reject $H_0$ if the chi-square statistic is
greater than the critical value. Otherwise, fail to
reject $H_0$. (If $\chi^2$ is greater than 9.49, reject $H_0$.)
Step 5: Compute the test statistic
Number of Absences   Monday   Tuesday   Wednesday   Thursday   Friday
Observed Values      15       12        9           9          15
Expected Values      12       12        12          12         12

If the absent days occur with equal frequencies, then, out of 60 absent days (the
total in the sample: 15 + 12 + 9 + 9 + 15 = 60), there would be 12 absences on
Monday, 12 on Tuesday, 12 on Wednesday, 12 on Thursday, and 12 on Friday.

$\chi^2 = \sum \frac{(O - E)^2}{E}$
$\chi^2 = \frac{(15 - 12)^2}{12} + \frac{(12 - 12)^2}{12} + \frac{(9 - 12)^2}{12} + \frac{(9 - 12)^2}{12} + \frac{(15 - 12)^2}{12}$
$\chi^2 = \frac{(3)^2}{12} + \frac{(0)^2}{12} + \frac{(-3)^2}{12} + \frac{(-3)^2}{12} + \frac{(3)^2}{12}$
$\chi^2 = 3$
Tabular chi-square value is 9.49
Step 6: Interpret the results

Since the chi-square statistic value (3) is less
than 9.49, we fail to reject the null hypothesis
($H_0$). Therefore, at the 5% level of significance,
the sample data do not provide sufficient
evidence to conclude that the absent days
occur with unequal frequencies.
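As a cross-check on the table lookup, the p-value for this example can be computed directly. The sketch below assumes a helper named `chi2_sf_even_df` (not from the presentation) that uses the closed-form upper-tail probability of the chi-square distribution, which exists only when the degrees of freedom are even.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail probability P(X > x) for a chi-square variable with even df.

    Closed form: exp(-x/2) * sum over k < df/2 of (x/2)^k / k!
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form only applies to positive even df")
    half = x / 2
    terms = (half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * sum(terms)


observed = [15, 12, 9, 9, 15]  # Monday through Friday
# Uniform null hypothesis: every day gets the same expected count.
expected = [sum(observed) / len(observed)] * len(observed)

stat = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1
p_value = chi2_sf_even_df(stat, df)

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
```

The p-value comes out near 0.56, well above 0.05, which agrees with the decision not to reject the null hypothesis. Evaluating the same function at the tabular value 9.49 returns approximately 0.05, as expected for a 5% critical value.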
CONTINGENCY TABLE
❑ A contingency table is a type of table in a
matrix format that displays the
frequency distribution of the variables.
❑ It provides a basic picture of the
interrelation between two variables
and can help find interaction between
them.
         Column 1   Column 2   Totals
Row 1    A          B          R1
Row 2    C          D          R2
Totals   C1         C2         N

The chi-square statistic compares the observed


count in each table cell to the count which would
be expected under the assumption of no
association between the row and column
classifications.
Tests using Contingency Table
The test of independence is used to determine
whether two variables are independent of or
related to each other when a single sample is
selected.

The test of homogeneity is used to determine


whether the proportions for a variable are
equal when several samples are selected from
different populations.
02 TEST OF
INDEPENDENCE
Test of Independence

A chi-square independence test is used to test


the independence of two variables. Using a chi-
square test, you can determine whether the
occurrence of one variable affects the probability
of the occurrence of the other variable.
Test of Independence
To test whether two categorical variables are associated
with each other, the formula employed is:
$\chi^2 = \sum_{\text{rows}} \sum_{\text{columns}} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$
where,
$O_{ij}$ is the observed frequency in the ith row and jth column;
$E_{ij}$ is the expected frequency in the ith row and jth column.
For a contingency table that has r rows and c
columns, the Chi-square test can be
generalized as a test of independence. Thus, as
a test of independence, hypotheses are as
follows:
𝑯𝒐 : There is no relationship between two categorical
variables. (The two variables are independent.)
𝑯𝒂 : There is a relationship between two categorical
variables. (The two variables are not independent.)
Decision Rule
➢ Reject $H_0$ if $\chi^2 \ge \chi^2_{(r-1)(c-1)}$, the tabular value
with (r - 1)(c - 1) degrees of freedom; otherwise fail
to reject $H_0$.
➢ Reject the null hypothesis at a specified
level of significance if the computed value
of chi-square exceeds the table value.
Example
A study is being conducted to determine
whether there is a relationship between
jogging and blood pressure. A random sample
of 210 subjects is selected, and they are
classified as shown in the contingency table.
Using α = 0.05, determine whether a
relationship exists between jogging and blood
pressure.
Observed Frequencies/Values
                 Blood Pressure
Jogging Status   Low   Moderate   High   Total
Jogger           34    57         21     112
Non-Jogger       15    63         20     98
Total            49    120        41     210

Step 1. Formulate the hypotheses

𝑯𝟎 : There is no relationship between


jogging and blood pressure.
𝑯𝒂 : There is a relationship between jogging
and blood pressure.
Step 2: Obtain the expected
frequencies for each cell
The expected frequency in each cell can be determined by
multiplying the row total by the column total and then
dividing the product by the grand total.
$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$
Observed Frequencies
             Low   Moderate   High   Total
Jogger       34    57         21     112   (row total)
Non-Jogger   15    63         20     98    (row total)
Total        49    120        41     210   (grand total)
The bottom row holds the column totals.
Expected Frequencies
$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$

             Low                            Moderate                       High                           Total
Jogger       LJ = (112 × 49)/210 = 26.13    MJ = (112 × 120)/210 = 64      HJ = (112 × 41)/210 = 21.87    112
Non-Jogger   LNJ = (98 × 49)/210 = 22.87    MNJ = (98 × 120)/210 = 56      HNJ = (98 × 41)/210 = 19.13    98
Total        49                             120                            41                             210
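The expected-frequency rule generalizes to a contingency table of any size. A minimal sketch follows, using the jogging data; the function name `expected_frequencies` is an assumption chosen for illustration.

```python
def expected_frequencies(table):
    """Return (row total * column total) / grand total for every cell."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]  # zip(*table) transposes the rows
    grand_total = sum(row_totals)
    return [[r * c / grand_total for c in col_totals] for r in row_totals]


# Rows: Jogger, Non-Jogger. Columns: Low, Moderate, High blood pressure.
observed = [
    [34, 57, 21],
    [15, 63, 20],
]

expected = expected_frequencies(observed)
for label, row in zip(["Jogger", "Non-Jogger"], expected):
    print(label, [round(e, 2) for e in row])
```

Each row and column of the expected table keeps the same total as the observed table, which is a quick way to catch arithmetic slips.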
Step 3: Subtract the expected frequencies
from the observed frequencies.
Step 4: Square the difference.
Step 5: Divide the squared difference
by the expected frequencies.
Step 6: Add the quotients to obtain
the chi-square value.

                              O    E       O - E   (O - E)^2   (O - E)^2 / E
LJ (Low/Jogger)               34   26.13   7.87    61.94       2.37
LNJ (Low/Non-Jogger)          15   22.87   -7.87   61.94       2.71
MJ (Moderate/Jogger)          57   64      -7      49          0.77
MNJ (Moderate/Non-Jogger)     63   56      7       49          0.88
HJ (High/Jogger)              21   21.87   -0.87   0.76        0.03
HNJ (High/Non-Jogger)         20   19.13   0.87    0.76        0.04

$\sum \frac{(O - E)^2}{E} = 6.80$
Step 7: Find the degrees of freedom
α = 0.05
df = (r - 1)(c - 1)
df = (2 - 1)(3 - 1)
df = (1)(2)
df = 2
Tabular chi-square value is 5.99
Step 8: Compare the obtained chi-square
value with the table value at 0.05 level of
significance.
$\chi^2 = 6.80$; Table $\chi^2 = 5.99$ at 0.05 and df = 2

❑ The tabular chi-square value of 5.99 is less
than the computed value of 6.80

Decision Rule: Reject the null hypothesis at a specified level of
significance if the computed value of chi-square exceeds the table value.
Step 9: Make a Conclusion

➢ Since the tabular chi-square value of 5.99 is
less than the computed value of 6.80,
there is sufficient evidence to reject the null
hypothesis and conclude that there is a
significant relationship between jogging and
blood pressure.
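The whole test of independence for the jogging data can be sketched end to end. The function name `chi_square_independence` is an assumption, and the p-value line relies on the closed form exp(-x/2), which is valid only because this table has df = 2. Without intermediate rounding the statistic comes out near 6.79 rather than the hand-computed 6.80; the decision is unchanged.

```python
import math


def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total  # expected count
            stat += (o - e) ** 2 / e

    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df


observed = [
    [34, 57, 21],  # Jogger: Low, Moderate, High
    [15, 63, 20],  # Non-Jogger
]
stat, df = chi_square_independence(observed)
p_value = math.exp(-stat / 2)  # upper-tail probability, valid only for df = 2

print(f"chi-square = {stat:.2f}, df = {df}, p-value = {p_value:.4f}")
print("reject H0" if p_value < 0.05 else "fail to reject H0")
```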
03 TEST OF
HOMOGENEITY
Test of Homogeneity
❑ It is used to test the homogeneity of the
responses of the respondents with
regard to certain issues and opinions;
where responses are put in a
contingency table.
Example:
❑ Impeachment trial of Pres. Estrada: the reactions of the
Filipino people
❑ Opening of the second envelope: in favor, not in favor, or
neutral
❑ To calculate the test statistic for a test for
homogeneity, follow the same procedure as with
the test of independence.

Hypotheses
❑ 𝑯𝒐 : The distributions of the two populations are
the same.
❑ 𝑯𝒂 : The distributions of the two populations are
not the same.
Example
President Arroyo made a nationwide announcement
on television about her conversation with the COMELEC
Commissioner and made a public apology. To
determine the opinion of the public (Agree, Disagree, No
Opinion), a survey was conducted in 3 municipalities of La
Union. The following table gives the opinions of 2000 parents
from San Fernando, 1500 parents from Rosario and 1000
parents from San Juan.
At the 0.01 level of significance, test for homogeneity
of opinion among the 3 municipalities concerning the public
apology of President Arroyo.
Observed Frequencies/Values
Opinion      Municipalities
             San Fernando   Rosario   San Juan   Total
Agree        650            660       360        1670
Disagree     420            300       420        1140
No Opinion   930            540       220        1690
Total        2000           1500      1000       4500


Step 1. Formulate the hypotheses.

𝑯𝟎 : For each opinion, the proportions


of municipalities are the same.
𝑯𝒂 : For at least one opinion the
proportions of the municipalities are
not the same.
Step 2: Determine the expected
frequencies.
The expected frequency in each cell can be determined by
multiplying the row total by the column total and then
dividing the product by the grand total.

$\text{Expected frequency} = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$
Observed Frequencies
Opinion      Municipalities
             San Fernando   Rosario   San Juan   Total
Agree        650            660       360        1670
Disagree     420            300       420        1140
No Opinion   930            540       220        1690
Total        2000           1500      1000       4500

Expected Frequencies, $EF = \frac{(\text{row total})(\text{column total})}{\text{grand total}}$
Opinion      Municipalities
             San Fernando   Rosario   San Juan   Total
Agree        742.22         556.67    371.11     1670
Disagree     506.67         380       253.33     1140
No Opinion   751.11         563.33    375.56     1690
Total        2000           1500      1000       4500
Step 3: Subtract the expected frequencies
from the observed frequencies.
Step 4: Square the difference.
Step 5: Divide the squared difference by
the expected frequencies.
Step 6: Add the quotients to obtain the
chi-square value.

                           O     E        O - E     (O - E)^2   (O - E)^2 / E
San Fernando/Agree         650   742.22   -92.22    8504.52     11.46
San Fernando/Disagree      420   506.67   -86.67    7511.69     14.83
San Fernando/No Opinion    930   751.11   178.89    32001.63    42.61
Rosario/Agree              660   556.67   103.33    10677.09    19.18
Rosario/Disagree           300   380      -80       6400        16.84
Rosario/No Opinion         540   563.33   -23.33    544.29      0.97
San Juan/Agree             360   371.11   -11.11    123.43      0.33
San Juan/Disagree          420   253.33   166.67    27778.89    109.65
San Juan/No Opinion        220   375.56   -155.56   24198.91    64.43

$\sum \frac{(O - E)^2}{E} = 280.3$
Step 7: Find the degrees of freedom
α = 0.01
df = (r - 1)(c - 1)
df = (3 - 1)(3 - 1)
df = (2)(2)
df = 4
Tabular chi-square value is 13.28
Step 8: Compare the obtained chi-square
value with the table value at 0.01 level of
significance.
$\chi^2 = 280.3$; Table $\chi^2 = 13.28$ at 0.01 and df = 4

❑ The tabular chi-square value of 13.28 is less
than the computed value of 280.3

Decision Rule: Reject the null hypothesis at a specified level of
significance if the computed value of chi-square exceeds the table value.
Step 9: Make a Conclusion
➢ Since the tabular chi-square value of 13.28 is
less than the computed value of 280.3,
there is sufficient evidence to reject the null
hypothesis and conclude that, for at least one opinion,
the proportions in the municipalities
are not the same. This means that people in different
municipalities hold different views regarding
the public apology of Pres. Arroyo.
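Beyond the overall statistic, it can help to see which cells drive the result. The sketch below recomputes the statistic from the observed counts used in the calculation tables and reports the cell with the largest contribution; the variable names are illustrative assumptions.

```python
opinions = ["Agree", "Disagree", "No Opinion"]
municipalities = ["San Fernando", "Rosario", "San Juan"]

# Rows follow `opinions`, columns follow `municipalities`.
observed = [
    [650, 660, 360],
    [420, 300, 420],
    [930, 540, 220],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Contribution of each cell to the statistic: (O - E)^2 / E
contributions = {}
for i, opinion in enumerate(opinions):
    for j, town in enumerate(municipalities):
        e = row_totals[i] * col_totals[j] / grand_total
        contributions[(town, opinion)] = (observed[i][j] - e) ** 2 / e

stat = sum(contributions.values())
largest = max(contributions, key=contributions.get)

print(f"chi-square = {stat:.2f}")
print(f"largest contribution: {largest} = {contributions[largest]:.2f}")
```

The San Juan/Disagree cell alone contributes about 110 of the roughly 280 total, so disagreement in San Juan departs most from what homogeneity would predict.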
Try This!
Consider a study in which the effectiveness of
hypnosis as a means of improving the memory of
eyewitnesses to a crime is examined. The results are
shown:
Step 1. Formulate the hypotheses.
Ho: Hypnosis does not affect the
recognition memory of eyewitness to a
crime.
Ha: Hypnosis affects the recognition
memory of eyewitness to a crime.
Step 2: Obtain the expected
frequencies for each cell
Observed Frequencies
             Correct   Incorrect   Total
Hypnotized   7         33          40
Control      17        23          40
Total        24        56          80

Expected Frequencies
             Correct   Incorrect   Total
Hypnotized   12        28          40
Control      12        28          40
Total        24        56          80

Perform Steps 3 - 6: Calculate Test Statistic
                             O    E    O - E   (O - E)^2   (O - E)^2 / E
HC (Hypnotized/Correct)      7    12   -5      25          2.08
Hi (Hypnotized/Incorrect)    33   28   5       25          0.89
Cc (Control/Correct)         17   12   5       25          2.08
Ci (Control/Incorrect)       23   28   -5      25          0.89

$\chi^2 = 5.94$
Step 7: Find the degrees of freedom
α = 0.05
df = (r - 1)(c - 1)
df = (2 - 1)(2 - 1)
df = (1)(1)
df = 1
Tabular chi-square value is 3.84
Step 8: Compare the obtained chi-square
value with the table value at 0.05 level of
significance.
$\chi^2 = 5.94$; Table $\chi^2 = 3.84$ at 0.05 and df = 1

❑ The obtained chi-square value of 5.94 is
greater than the tabular value of 3.84.

Decision Rule: Reject the null hypothesis at a specified level of
significance if the computed value of chi-square exceeds the table value.
Step 9: Make a Conclusion

➢ Since the obtained chi-square value of 5.94 is greater
than the tabular value of 3.84, we have sufficient
evidence to reject the null hypothesis. The result suggests
a significant difference in the ability of hypnotized and
control subjects to identify a thief. The hypnotized
subjects were less, not more, accurate in identifying the
thief.
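For a 2 × 2 table (df = 1) the p-value has a closed form, because a chi-square variable with one degree of freedom is the square of a standard normal variable. The following sketch uses the observed counts from the calculation table; the variable names are assumptions. It also prints the proportion of correct identifications in each group, which shows the direction of the difference.

```python
import math

# Rows: Hypnotized, Control. Columns: Correct, Incorrect.
observed = [
    [7, 33],
    [17, 23],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

stat = 0.0
for i, row in enumerate(observed):
    for j, o in enumerate(row):
        e = row_totals[i] * col_totals[j] / grand_total
        stat += (o - e) ** 2 / e

# For df = 1: P(chi-square > x) = erfc(sqrt(x / 2))
p_value = math.erfc(math.sqrt(stat / 2))

accuracy_hypnotized = observed[0][0] / row_totals[0]
accuracy_control = observed[1][0] / row_totals[1]

print(f"chi-square = {stat:.2f}, p-value = {p_value:.4f}")
print(f"correct: hypnotized {accuracy_hypnotized:.1%}, control {accuracy_control:.1%}")
```

Without intermediate rounding the statistic is about 5.95 instead of 5.94, and the p-value falls between 0.01 and 0.02, consistent with rejecting the null hypothesis at the 0.05 level.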
Try This!
Suppose that 500 randomly selected males
and 450 randomly selected females were
classified by income level: low, middle, or high.
Do males and females have the same
distribution of income?
Use a level of significance of 0.05. The results are shown in
the table below.
         Low Income   Middle Income   High Income
Male     109          365             26
Female   192          249             9
Do males and females have the same
distribution of income?

Step 1. Formulate the hypotheses.


Ho: The income distribution is the same
for the males and females.
Ha: The income distribution is not the
same for the males and females.
Step 2: Obtain the expected frequencies
for each cell
Observed Frequencies   Low Income   Middle Income   High Income   Total
Male                   109          365             26            500
Female                 192          249             9             450
Total                  301          614             35            950

Expected Frequencies   Low Income   Middle Income   High Income   Total
Male                   158          323             18            500
Female                 143          291             17            450
Total                  301          614             35            950
(Expected frequencies are rounded to the nearest whole number.)

Calculate Test Statistic
                         O     E     O - E   (O - E)^2   (O - E)^2 / E
Low Income/Male          109   158   -49     2401        15.20
Low Income/Female        192   143   49      2401        16.79
Middle Income/Male       365   323   42      1764        5.46
Middle Income/Female     249   291   -42     1764        6.06
High Income/Male         26    18    8       64          3.56
High Income/Female       9     17    -8      64          3.76

$\chi^2 = 50.83$
Step 7: Find the degrees of freedom
α = 0.05
df = (r - 1)(c - 1)
df = (2 - 1)(3 - 1)
df = (1)(2)
df = 2
Tabular chi-square value is 5.99
Step 8: Compare the obtained chi-square
value with the table value at 0.05 level of
significance.
$\chi^2 = 50.83$; Table $\chi^2 = 5.99$ at 0.05 and df = 2

❑ The obtained chi-square value of 50.83 is
greater than the tabular value of 5.99.

Decision Rule: Reject the null hypothesis at a specified level of
significance if the computed value of chi-square exceeds the table value.
Step 9: Make a Conclusion

➢ Since the obtained chi-square value of 50.83 is
greater than the tabular value of 5.99, we
must reject the null hypothesis. Therefore, the
income distribution among males and females is
not the same.
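The expected frequencies in this exercise were rounded to whole numbers before the statistic was computed. The sketch below, with an assumed helper named `chi_square_from`, compares the statistic obtained from rounded expected counts with the one obtained from unrounded counts, to show how much the rounding matters.

```python
def chi_square_from(observed, expected):
    """Sum (O - E)^2 / E over all cells of two equally shaped tables."""
    return sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )


# Rows: Male, Female. Columns: Low, Middle, High income.
observed = [
    [109, 365, 26],
    [192, 249, 9],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

exact_expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
rounded_expected = [[round(e) for e in row] for row in exact_expected]

stat_exact = chi_square_from(observed, exact_expected)
stat_rounded = chi_square_from(observed, rounded_expected)

critical = 5.99  # tabular value at alpha = 0.05, df = 2
print(f"rounded expected counts: chi-square = {stat_rounded:.2f}")
print(f"exact expected counts:   chi-square = {stat_exact:.2f}")
print("reject H0" if min(stat_exact, stat_rounded) > critical else "fail to reject H0")
```

Both versions (about 50.83 and 50.57) are far above 5.99, so the rounding does not affect the decision here. With a statistic close to the critical value it could, which is a reason to carry more decimal places.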
THANK YOU
FOR
LISTENING !