Hypothesis Testing 7,8ppt
Tutorial Note: To keep the ratio larger than 1, the larger variance is placed in the
numerator. If the computed value of F is greater than the table value of F, we reject H0
and conclude that the two populations do not have the same variance. If the
computed value of F is less than the table value of F, we accept H0 and conclude that
the two populations have the same variance.
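The rule in the note above can be sketched in a few lines of Python. This is only an illustration: the two samples are made-up values (not from the slides), and the function name `variance_ratio` is hypothetical. The computed F still has to be compared with the table value of F for the returned degrees of freedom.

```python
from statistics import variance

def variance_ratio(sample_1, sample_2):
    """Return (F, numerator df, denominator df) with the larger variance on top."""
    v1, v2 = variance(sample_1), variance(sample_2)  # unbiased: divides by n - 1
    if v1 >= v2:
        return v1 / v2, len(sample_1) - 1, len(sample_2) - 1
    return v2 / v1, len(sample_2) - 1, len(sample_1) - 1

# Hypothetical samples, for illustration only (not from the slides).
sample_1 = [20, 16, 26, 27, 23, 22]
sample_2 = [27, 33, 42, 35, 32, 34, 38]

f_value, df_num, df_den = variance_ratio(sample_1, sample_2)
print(f"F = {f_value:.3f} with ({df_num}, {df_den}) degrees of freedom")
```

Because the larger variance is always placed in the numerator, the ratio can never fall below 1.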
Assumptions of Analysis of Variance or “F-test”
The analysis of variance, or F-test, is based on the
following assumptions:
1. Each sample is drawn randomly from a
normal population and the sample statistics
tend to reflect the characteristics of the
population.
2. The populations from which the samples are
drawn have the same means and variances, i.e.
μ1 = μ2 = μ3 = ⋯ = μk
σ1² = σ2² = σ3² = ⋯ = σk²
Uses of F-Test
F test is used –
For test of hypothesis of equality between two variances.
For test of hypothesis of equality amongst several sample
means.
Properties of F-Test
Range – The value of F ranges from 0 to ∞. The value of F can
never be negative, since both terms of the F-ratio are squared
values.
Classification Model
There may be one way classification model or two
way classification model.
One-way Classification Model
One way classification model is designed to study the effect of one factor in
an experiment. For example, influence of application of one or more types of
fertilizers may be considered on several pieces of land. It is designed to test
the null hypothesis that the arithmetic means of the population from which
the k samples are randomly drawn are equal to one another.
𝐻𝑜 : 𝜇1 = 𝜇2 = 𝜇3 = ⋯ 𝜇𝑘
Practical Steps involved in one factor analysis of variance
Step-1: We set H0: μ1 = μ2 = ⋯ = μk and H1: at least two of the population means are unequal.
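Step-2: Calculate the mean of each sample, i.e. X̄1, X̄2, …, X̄k, and the grand
average X̿ as follows:
X̿ = (X̄1 + X̄2 + ⋯ + X̄k) / k (for samples of equal size)
Step-3: Calculate the difference between the mean of each sample and the grand average.
Step-4: Square these differences and obtain their total, i.e. Σ(X̄1 − X̿)², for
each sample (the square is counted once for every item in the sample).
Step-5: Calculate the sum of squares between the samples (SSB) as follows:
SSB = Σ(X̄1 − X̿)² + Σ(X̄2 − X̿)² + Σ(X̄3 − X̿)² + ⋯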
Step-6: Calculate the difference between the various items in a sample and the mean
value of the respective sample.
Step-7: Square these differences and obtain their total for each sample, i.e.
Σ(X − X̄)²
Step-8: Calculate the sum of squares within the samples (SSW) as follows:
SSW = Σ(X1 − X̄1)² + Σ(X2 − X̄2)² + Σ(X3 − X̄3)² + ⋯
Step-9: Prepare the ANOVA table as follows:
Source of variation   Sum of squares   Degrees of freedom   Mean squares          Computed value of F   Table value of F
Between samples       SSB              c – 1                MSB = SSB / (c – 1)   F = MSB / MSW
Within samples        SSW              n – c                MSW = SSW / (n – c)
Total                 SST              n – 1
(c = number of samples, n = total number of observations)
Step-10: Compare the computed value of F
with the table value of F for the given degrees of
freedom at a given critical level (generally we
take the 5% level of significance) and interpret the
same as follows:
Case (a): The computed value of F is greater than the table value of F.
Interpretation: The difference in the variances is significant; it could not
have arisen due to fluctuations of random sampling, and hence we reject H0.
Case (b): The computed value of F is less than the table value of F.
Interpretation: The difference in the variances is not significant; it could
have arisen due to fluctuations of random sampling, and hence we accept H0.
Case Study-16:
The following table gives the yields on 15
sample fields under three varieties of seeds (viz.
A, B, C):
YIELDS
A B C
5 3 10
6 5 13
8 2 7
1 10 13
5 0 17
Hint:
We have to analyse the variability among three
independent samples A, B, C, where A, B and C are
categories (varieties of seed).
In this table, yield is the dependent variable.
Variety of seed is the factor on which yield
depends. Therefore, it is a one-factor ANOVA.
Discussion of Case Study - 16.
Yields
A B C
5 3 10
6 5 13
8 2 7
1 10 13
5 0 17
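H0: μ1 = μ2 = μ3
H1: At least two of the population means are unequal.
Calculation of Sum of Squares between Samples (SSB)
Sample means: X̄A = 25/5 = 5, X̄B = 20/5 = 4, X̄C = 60/5 = 12; grand mean X̿ = 105/15 = 7.
A: 5 × (5 − 7)² = 20
B: 5 × (4 − 7)² = 45
C: 5 × (12 − 7)² = 125
SSB = 20 + 45 + 125 = 190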
Calculation of Sum of Squares within Samples (SSW/SSE)
A: (x1 − x̄1)²       B: (x2 − x̄2)²       C: (x3 − x̄3)²
(5 − 5)² = 0         (3 − 4)² = 1         (10 − 12)² = 4
(6 − 5)² = 1         (5 − 4)² = 1         (13 − 12)² = 1
(8 − 5)² = 9         (2 − 4)² = 4         (7 − 12)² = 25
(1 − 5)² = 16        (10 − 4)² = 36       (13 − 12)² = 1
(5 − 5)² = 0         (0 − 4)² = 16        (17 − 12)² = 25
Total: 26            Total: 58            Total: 56
SSW = 26 + 58 + 56 = 140
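ANOVA TABLE
One Way ANOVA
Source of variation   Sum of squares   Degrees of freedom   Mean squares        Computed value of F   Table value of F (5%)
Between samples       190              3 – 1 = 2            190 / 2 = 95        95 / 11.67 = 8.14     3.89
Within samples        140              15 – 3 = 12          140 / 12 = 11.67
Total                 330              15 – 1 = 14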
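The direct method (Steps 2 to 9) can be checked with a short Python sketch applied to the Case Study-16 yields. The function name `one_way_anova` and the dictionary layout are illustrative choices, not part of the slides; the table value of F still has to be looked up separately.

```python
# Case Study-16 yields, one list per seed variety.
yields = {
    "A": [5, 6, 8, 1, 5],
    "B": [3, 5, 2, 10, 0],
    "C": [10, 13, 7, 13, 17],
}

def one_way_anova(samples):
    """Direct method: SSB and SSW from deviations about the means."""
    groups = list(samples.values())
    n = sum(len(g) for g in groups)          # total number of observations
    c = len(groups)                          # number of samples
    grand_mean = sum(sum(g) for g in groups) / n
    means = [sum(g) / len(g) for g in groups]
    # Between samples: each squared deviation counted once per item.
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within samples: squared deviations of items from their own sample mean.
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb = ssb / (c - 1)
    msw = ssw / (n - c)
    return {"SSB": ssb, "SSW": ssw, "MSB": msb, "MSW": msw, "F": msb / msw}

result = one_way_anova(yields)
print(result)
```

For these data the sketch reproduces SSB = 190 and SSW = 140 from the worked calculation.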
Note: To simplify the calculations, one may add, subtract, multiply or divide the
given data by any figure. It will not affect the ultimate solution.
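The note can be illustrated with a small Python sketch. The data below are hypothetical values chosen only for the demonstration, and `f_ratio` is an illustrative helper name. The sketch assumes the same figure is applied to every observation and that any multiplier or divisor is non-zero.

```python
def f_ratio(samples):
    """One-way ANOVA F ratio for a list of samples."""
    n = sum(len(s) for s in samples)
    c = len(samples)
    grand_mean = sum(sum(s) for s in samples) / n
    ssb = sum(len(s) * (sum(s) / len(s) - grand_mean) ** 2 for s in samples)
    ssw = sum((x - sum(s) / len(s)) ** 2 for s in samples for x in s)
    return (ssb / (c - 1)) / (ssw / (n - c))

# Hypothetical data, for illustration only.
original = [[105, 106, 108], [103, 105, 102], [110, 113, 107]]
shifted = [[x - 100 for x in s] for s in original]  # subtract 100 from every item
scaled = [[x / 10 for x in s] for s in original]    # divide every item by 10

print(f_ratio(original), f_ratio(shifted), f_ratio(scaled))
```

All three calls give the same F (up to floating-point rounding), because shifting leaves every deviation unchanged and scaling multiplies SSB and SSW by the same factor.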
Step-3: Calculate the Correction Factor (T²/N) as follows:
T²/N = (Sum of all the observations of all samples)² / (Total number of observations of all samples)
Step-4: Calculate total sum of squares (SST) as follows:
SST = Sum of squares of all the observations – Correction Factor
= ΣX1² + ΣX2² + … − T²/N
Step-5: Calculate sum of squares between samples (SSB) as follows:
SSB = (ΣX1)²/N1 + (ΣX2)²/N2 + ⋯ − T²/N
Step-6: Calculate sum of squares within samples (SSW) as follows:
SSW = SST – SSB
Step-7: Prepare the ANOVA table as follows:
ANOVA Table
Source of variation   Sum of squares   Degrees of freedom   Mean squares          Variance Ratio
Between samples       SSB              c – 1                MSB = SSB / (c – 1)   F = MSB / MSW
Within samples        SSW              n – c                MSW = SSW / (n – c)
Discussion of Case Study - 16.
𝐻0 : 𝜇1 = 𝜇2 = 𝜇3
𝐻1 : At least two of the population means are
unequal.
Calculation of sum of observations of each row and
each column and grand total:
Yields
               A    B    C    Row Total
               5    3    10   18
               6    5    13   24
               8    2    7    17
               1    10   13   24
               5    0    17   22
Column Total   25   20   60   T = 105
Cf = Correction factor = T² / N = (105)² / 15 = 735
Sum of squares between columns
SSB = SSC = T1²/N1 + T2²/N2 + T3²/N3 − 735
= (25)²/5 + (20)²/5 + (60)²/5 − 735
= 125 + 80 + 720 − 735
= 190
Sum of the Squares of Total (SST)
SST = ΣX² − Cf
SST =
(5² + 6² + 8² + 1² + 5² + 3² + 5² + 2² + 10² + 0² + 10²
+ 13² + 7² + 13² + 17²) − 735
= (25+36+64+1+25+9+25+4+100+0+100+169+49+169+289)
– 735
= 1065 – 735 = 330
SSW = SST – SSB = 330 – 190 = 140
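The short-cut method (correction factor, SST, SSB, then SSW by subtraction) can be sketched in Python as follows. The function name `shortcut_sums` is an illustrative choice; the data are the Case Study-16 yields.

```python
def shortcut_sums(samples):
    """Short-cut method: correction factor first, then SST, SSB and SSW."""
    all_obs = [x for s in samples for x in s]
    cf = sum(all_obs) ** 2 / len(all_obs)                 # T^2 / N
    sst = sum(x * x for x in all_obs) - cf                # sum of squares - Cf
    ssb = sum(sum(s) ** 2 / len(s) for s in samples) - cf
    ssw = sst - ssb                                       # Step-6
    return cf, sst, ssb, ssw

# Case Study-16 yields for seed varieties A, B and C.
samples = [[5, 6, 8, 1, 5], [3, 5, 2, 10, 0], [10, 13, 7, 13, 17]]
cf, sst, ssb, ssw = shortcut_sums(samples)
print(cf, sst, ssb, ssw)
```

The results agree with the direct method: the same SSB (190) and SSW (140) are obtained without computing any deviation from a mean.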
Source of variation          Sum of squares   Degrees of freedom   Mean squares
Residual/Error (within)      SSE              (c – 1)(r – 1)       MSE = SSE / ((c – 1)(r – 1))
Total                        SST              rc – 1
Step-9: Compare the computed value of F with
the table value of F for the given degrees of freedom
at a given critical level (generally we take the 5% level of
significance) and interpret the same as follows:
Case (a): The computed value of F is greater than the table value of F.
Interpretation: The difference in the variances is significant; it could not
have arisen due to fluctuations of random sampling, and hence we reject H0.
Hint:
We have to analyse the variability among three
categories of wheat (A, B, C) and among five
categories of plot of land (X, Y, Z, P, Q).
In this table, yield is the dependent variable.
Variety of wheat and type of land are the two factors on which yield depends.
A B C Row Total
X 5 3 10 𝑅1 =18
Y 6 5 13 𝑅2 =24
Z 8 2 7 𝑅3 =17
P 1 10 13 𝑅4 =24
Q 5 0 17 𝑅5 =22
Col Total T1 = C1 = 25 T2 = C2 = 20 T3 = C3 = 60 T = 105
T = T1 + T2 + T3 = 25 + 20 + 60 = 105
Cf = Correction factor = T² / N = (105)² / 15 = 735
Sum of squares between columns
SSB = SSC = T1²/N1 + T2²/N2 + T3²/N3 − 735
= (25)²/5 + (20)²/5 + (60)²/5 − 735
= 190
Sum of Squares between Rows (SSR)
SSR = (18)²/3 + (24)²/3 + (17)²/3 + (24)²/3 + (22)²/3 − 735
= 749.67 − 735
= 14.67
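Sum of the Squares of Total (SST)
SST = ΣX² − Cf
SST =
(5² + 6² + 8² + 1² + 5² + 3² + 5² + 2² + 10² + 0² + 10²
+ 13² + 7² + 13² + 17²) − 735
= (25+36+64+1+25+9+25+4+100+0+100+169+49+169+289) – 735
= 1065 – 735 = 330
N = c × r = 3 × 5 = 15
Residual sum of squares: SSE = SST − SSC − SSR = 330 − 190 − 14.67 = 125.33
F (between columns) = MSC / MSE = (190 / 2) / (125.33 / 8) = 95 / 15.67 = 6.06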
Since the computed value of F (6.06) is greater
than the tabular value of F (4.46), H0 is rejected.
It is concluded that there is a significant
difference between the varieties of wheat.
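The two-way calculation above can be reproduced with a short Python sketch. Variable names such as `f_columns` and `f_rows` are illustrative; the sketch assumes one observation per cell, as in the table of wheat varieties (columns) and plots (rows).

```python
# Yields: rows are plots X, Y, Z, P, Q; columns are wheat varieties A, B, C.
table = [
    [5, 3, 10],
    [6, 5, 13],
    [8, 2, 7],
    [1, 10, 13],
    [5, 0, 17],
]
r, c = len(table), len(table[0])
n = r * c

row_totals = [sum(row) for row in table]
col_totals = [sum(row[j] for row in table) for j in range(c)]
cf = sum(row_totals) ** 2 / n                        # correction factor T^2 / N

ssc = sum(t ** 2 / r for t in col_totals) - cf       # between columns (varieties)
ssr = sum(t ** 2 / c for t in row_totals) - cf       # between rows (plots)
sst = sum(x * x for row in table for x in row) - cf  # total
sse = sst - ssc - ssr                                # residual / error

mse = sse / ((c - 1) * (r - 1))
f_columns = (ssc / (c - 1)) / mse
f_rows = (ssr / (r - 1)) / mse
print(round(f_columns, 2), round(f_rows, 2))
```

The column F ratio comes out at about 6.06, matching the value compared with the table value 4.46 above.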
• Mann-Whitney U test
Students           A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
Words per Minute   81  76  53  71  66  59  88  73  80  66  58  70  60  56  55
Hint of CASE STUDY-19
The median/mean of a single random sample is to be
compared with a given value.
Sign Test
Students   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O
Sign       +   +   -   +   +   -   +   +   +   +   -   +   0   -   -
X = number of '+' signs = 9
H0: μ = 60
H1: μ ≠ 60
p = 1/2, q = 1/2, n = 15
E = np = 15 × 1/2 = 7.5, X = 9
Z = (X − np) / √(npq) = (9 − 7.5) / √(15 × 1/2 × 1/2) = 1.5 / √(15/4) = 3 / √15 = 3 / 3.87 (approx.)
Z_cal = 0.775 < Z_tab = Z_0.05 = 1.96
Hence, the null hypothesis is accepted. Hence Median = 60.
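A minimal Python sketch of this one-sample sign test follows. The function name `sign_test_z` and its `drop_ties` option are illustrative additions; by default the sketch follows the slides and keeps n = 15 even though one student scores exactly 60 (a common alternative, not used here, is to drop such ties).

```python
from math import sqrt

# Words per minute for students A to O.
wpm = [81, 76, 53, 71, 66, 59, 88, 73, 80, 66, 58, 70, 60, 56, 55]

def sign_test_z(values, hypothesized, drop_ties=False):
    """Normal approximation to the sign test with p = q = 1/2."""
    plus = sum(1 for v in values if v > hypothesized)
    ties = sum(1 for v in values if v == hypothesized)
    n = len(values) - ties if drop_ties else len(values)
    z = (plus - n * 0.5) / sqrt(n * 0.5 * 0.5)
    return plus, n, z

plus, n, z = sign_test_z(wpm, 60)
print(plus, n, round(z, 3))
print("reject H0" if abs(z) > 1.96 else "accept H0")
```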
CASE STUDY-20
Use the sign test to see if there is a difference
between the number of days until collection of
an account receivable, before and after a new
collection policy. Take α = 0.05.
Before   30  28  34  35  40  42  33  38  34  45  28  27  25  41  36
After    32  29  33  32  37  43  40  41  37  44  27  33  30  38  36
Hint: CASE STUDY-20
Sign Test
No information about distribution.
Z = (X − np) / √(npq) (Matched pair, Non-parametric test)
Discussion of CASE STUDY-20
X = No. of ‘+’ signs (pairs in which Before exceeds After) = 6
Z = (X − np) / √(npq) = (6 − 15 × 1/2) / √(15 × 1/2 × 1/2) = −1.5 / 1.936 ≅ −0.775 (approx.)
|Z| = 0.775 < Z_0.05 = 1.96. H0 is accepted. There is no significant difference.
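The matched-pair version can be sketched in Python as below. The sign convention (a '+' when the Before value exceeds the After value) is inferred from X = 6 in the discussion, and, as in the slides, the one tied pair is kept so that n = 15.

```python
from math import sqrt

before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]

# '+' when Before > After, '-' when Before < After, '0' for a tie.
signs = ["+" if b > a else "-" if b < a else "0" for b, a in zip(before, after)]
plus = signs.count("+")
n = len(signs)                      # tied pair kept, following the slides

z = (plus - n * 0.5) / sqrt(n * 0.5 * 0.5)
reject_h0 = abs(z) > 1.96           # two-tailed test at alpha = 0.05
print(plus, round(z, 3), reject_h0)
```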
CASE STUDY-21
Rank in Training (Rx)   4   6   1   3   9   7   10   2   8    5
Rank in Field (Ry)      5   8   3   1   7   6   9    2   10   4
Hint of CASE STUDY-21
Ranks are given,
Spearman’s Rank Test
𝑯𝟎 : r = 0 (There is no correlation)
𝑯𝟏 : r ≠0 (There is correlation)
Rank in Training (Rx)   4    6    1    3   9   7   10   2   8    5
Rank in Field (Ry)      5    8    3    1   7   6   9    2   10   4
d = Rx − Ry             -1   -2   -2   2   2   1   1    0   -2   1
d²                      1    4    4    4   4   1   1    0   4    1
Σd² = 24
r = 1 − 6Σd² / (n(n² − 1)) = 1 − (6 × 24) / (10 × (100 − 1)) = 1 − 144 / (10 × 99)
⇒ r = 0.8545 ≅ 0.85 (approx.)
S.E.(r) = 1 / √(n − 1) = 1 / √(10 − 1) = 1 / √9 = 1/3 = 0.33
Z = (r − 0) / S.E.(r) = (0.85 − 0) / 0.33 ≅ 2.58 (approx.)
𝑯𝟎 is rejected.
There is a correlation between training and performance.
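The rank-correlation arithmetic can be checked with a short Python sketch; variable names are illustrative. Without intermediate rounding the Z value is about 2.56 rather than 2.58, which leads to the same decision against the 1.96 critical value (assuming a two-tailed test at the 5% level, which this case study does not state explicitly).

```python
from math import sqrt

rank_training = [4, 6, 1, 3, 9, 7, 10, 2, 8, 5]   # Rx
rank_field = [5, 8, 3, 1, 7, 6, 9, 2, 10, 4]      # Ry

n = len(rank_training)
sum_d2 = sum((x - y) ** 2 for x, y in zip(rank_training, rank_field))

r = 1 - 6 * sum_d2 / (n * (n ** 2 - 1))   # Spearman's rank correlation
se_r = 1 / sqrt(n - 1)                    # standard error under H0: r = 0
z = (r - 0) / se_r
print(sum_d2, round(r, 4), round(z, 2))
```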
CASE STUDY-22
A large hospital hires most of its doctors from two major universities. Over the last
year, the hospital has been conducting a test for newly recruited doctors to determine which
school educates better. Based on the following scores, help the human resource department
of the hospital decide whether the universities differ in quality. (α = 0.10)
Test Score
University A 99 83 89 64 98 85 61 79 91 87 88
University B 96 90 97 94 86 95 68 78 93 56 76 84
Hint of CASE STUDY-22
• Two samples are independent.
• No Information about distribution
• Nonparametric Test
• U Test
Given
University A   99  83  89  64  98  85  61  79  91  87  88  -
University B   96  90  97  94  86  95  68  78  93  56  76  84
Ranking of the combined scores (rank 1 = highest score)
Score        99  98  97  96  95  94  93  91  90  89  88  87  86
Rank         1   2   3   4   5   6   7   8   9   10  11  12  13
University   A   A   B   B   B   B   B   A   B   A   A   A   B
Score        85  84  83  79  78  76  68  64  61  56
Rank         14  15  16  17  18  19  20  21  22  23
University   A   B   A   A   B   B   B   A   A   B
Calculation of Sum of Ranks
University A   99  98  91  89  88  87  85  83  79  64  61  -
Rank           1   2   8   10  11  12  14  16  17  21  22
R1 = 134
University B   97  96  95  94  93  90  86  84  78  76  68  56
Rank           3   4   5   6   7   9   13  15  18  19  20  23
R2 = 142
S.E.(U) = σU = √(n1 n2 (n1 + n2 + 1) / 12) = √(132 × 24 / 12)
= 16.2481
Z = (U − μU) / S.E.(U) = (64 − 66) / 16.2481 = −2 / 16.2481 = −0.1231
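The ranking and the Z statistic can be reproduced with the following Python sketch. The formula U = n1·n2 + n1(n1 + 1)/2 − R1 and the mean μU = n1·n2/2 are the standard ones and are assumed here because they give the U = 64 and μU = 66 used above; the dictionary-based ranking works only because no score is repeated.

```python
from math import sqrt

univ_a = [99, 83, 89, 64, 98, 85, 61, 79, 91, 87, 88]
univ_b = [96, 90, 97, 94, 86, 95, 68, 78, 93, 56, 76, 84]

# Rank the combined scores, rank 1 = highest (valid here: no tied scores).
combined = sorted(univ_a + univ_b, reverse=True)
rank = {score: i + 1 for i, score in enumerate(combined)}

r1 = sum(rank[s] for s in univ_a)
r2 = sum(rank[s] for s in univ_b)
n1, n2 = len(univ_a), len(univ_b)

u = n1 * n2 + n1 * (n1 + 1) / 2 - r1        # assumed standard U formula
mu_u = n1 * n2 / 2
se_u = sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
z = (u - mu_u) / se_u
print(r1, r2, u, round(se_u, 4), round(z, 4))
```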