U-3 Notes

UNIT III − TESTING OF HYPOTHESIS – PARAMETRIC TESTS
Introduction
Population in statistics means the whole of the information which comes under the purview of a statistical
investigation. A part of the population selected for study is called a sample. When the sample is drawn properly, it
closely resembles its population in almost all respects.
Inferential statistics uses the behavior measured in samples to learn about the behavior of populations that
are very large or inaccessible. Samples are used because a properly drawn sample is related to its population in a
known way. For example, to get an idea of the average income of the people of a country, we would have to collect
data on every earning individual in the country, which is a very difficult task. Hence samples are used.
Mean, median, mode and standard deviation are some examples of statistical measures. They can be evaluated for
both populations and samples. A numerical measure of a sample is called a statistic. A numerical measure of a
population is called a parameter. Population parameters are estimated by sample statistics. When a sample
statistic is used to estimate a population parameter, the statistic is called an estimator of the parameter.
Sampling Theory
The process of selecting a sample from a population (universe) is called sampling. The theory of sampling is the study
of the relationship existing between a population and samples drawn from the population. The aim is to get
information about the population by examining a sample of it.
A random sample is one in which each element of the population has an equal chance of inclusion in the sample. More
generally, each element of the population has some pre-assigned probability of being selected in the sample.
Sampling Distribution
The sampling distribution of a statistic is the frequency distribution formed by the various values of the statistic
calculated from all samples of the same size, i.e. for any sample $(x_1, x_2, \dots, x_n)$ of a given finite population, we can
compute a statistic $t(x_1, x_2, \dots, x_n)$ such as the mean, variance etc. The set of all such values, one for each sample, is
called the sampling distribution of the statistic.
Standard Error of a Statistic
The standard deviation of the sampling distribution of a statistic is known as standard error. It is used to
measure the variability of the values of a statistic.
The standard errors of some well-known statistics, for large samples, are given below, where $n$ is the
sample size, $\sigma^2$ is the population variance, $P$ is the population proportion and $Q = 1 - P$.

Statistic                          Standard Error
Sample mean $\bar{x}$              $\dfrac{\sigma}{\sqrt{n}}$
Observed sample proportion $p$     $\sqrt{\dfrac{PQ}{n}}$
Sample standard deviation $s$      $\sqrt{\dfrac{\sigma^2}{2n}}$
Sample variance $s^2$              $\sigma^2\sqrt{\dfrac{2}{n}}$
Example: A telephone tower monitored for an hour was found to have an estimated mean of 20 signals
transmitted per minute. The variance is known to be 4. Find the standard error for mean.
Department of Mathematics, SJCET

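Solution: Given $\sigma^2 = 4$, so $\sigma = 2$. Also $\bar{x} = 20$ per minute and $n = 60$ (one hour = 60 one-minute observations).
Standard Error $= \dfrac{\sigma}{\sqrt{n}} = \dfrac{2}{\sqrt{60}} = 0.2582$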
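The arithmetic of this example can be checked with a few lines of Python. This is only a minimal sketch; the function name `standard_error_of_mean` is an illustrative choice and not part of the notes.

```python
import math


def standard_error_of_mean(variance: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return math.sqrt(variance) / math.sqrt(n)


# Telephone tower example: population variance 4, n = 60 one-minute observations.
se = standard_error_of_mean(variance=4, n=60)
print(round(se, 4))
```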
Test of Significance
Sampling theory deals with the problem of testing hypotheses. A hypothesis is a statement about a
population parameter, i.e. a conclusion tentatively drawn on a logical basis.
The method in which we select samples to learn more about the characteristics of a given population is called
hypothesis testing. Hypothesis testing is a systematic way to test claims about a population, i.e. it enables us to
decide, on the basis of the results of the sample, whether (i) the deviation between the observed sample
statistic and the hypothetical parameter value or (ii) the deviation between two sample statistics is
significant.
Procedure for testing a hypothesis
Step 1: Setting up the Null Hypothesis $H_0$: It is a definite statement about the population parameter, set up in order
to decide whether to accept or reject it. It states that there is no difference between the sample statistic and the
population parameter. To test a statement about the population, we begin by assuming that it is true.
Step 2: Setting up the Alternative Hypothesis $H_1$: It is the statement complementary to the null hypothesis. It is set up
in such a way that the rejection of the null hypothesis implies the acceptance of the alternative hypothesis.
Step 3: Computation of test statistic:

For large samples ($n \ge 30$), the $Z$-statistic is used. It is defined as $Z = \dfrac{t - E(t)}{S.E.(t)} \sim N(0,1)$ as $n \to \infty$.
For small samples, Student's $t$-statistic is used. It is defined as $t = \dfrac{\text{Difference of means}}{S.E.(\bar{x})}$ with $n - 1$ degrees
of freedom.
Types of Errors in Hypothesis Testing: There are two possible types of errors which may arise in testing a
hypothesis.
Type I Error : Rejecting the null hypothesis when it is true.
Type II Error : Accepting the null hypothesis when it is false.
Step 4: Level of significance: The probability of making a Type I error is denoted by $\alpha$, the level of significance.
It is the probability level below which we reject the null hypothesis. In other words, the level of significance is
the size of the Type I error. If the level of significance is 5%, then the probability of committing a Type I
error is 0.05, i.e. a true null hypothesis is correctly accepted with probability 0.95.
Step 5: Critical or Rejection Region:

A region of the sample space, corresponding to a test statistic, which leads to the rejection of $H_0$ is called the critical
region or region of rejection. The value of the test statistic that separates the region of rejection from the
acceptance region is known as the critical value $z_\alpha$.

Step 6: Two tailed and One tailed test: For large samples, the probability curve of the sampling distribution is a normal
curve. The rejection region may be represented by an area on both sides or on one side of the normal curve, and the
corresponding test is known as two tailed or one tailed respectively. In a two tailed test, the alternative hypothesis is
of the form $\mu \ne \mu_0$. In a one tailed test it is of the form $\mu > \mu_0$ (right tailed) or $\mu < \mu_0$ (left tailed).
Step 7: Conclusion: If $|Z| < z_\alpha$, then we accept the null hypothesis. If $|Z| > z_\alpha$, then we reject the null
hypothesis.
Note:
1. Compare the calculated value of $z$ with the critical value $z_\alpha$ at level of significance $\alpha$. The critical value $z_\alpha$ of
the test statistic for a two tailed test is given by $p(|z| > z_\alpha) = \alpha$. By symmetry of the normal curve,
$p(z > z_\alpha) + p(z < -z_\alpha) = \alpha$
$2\,p(z > z_\alpha) = \alpha$
$p(z > z_\alpha) = \dfrac{\alpha}{2}$
In the case of a one tailed test, $p(z > z_\alpha) = \alpha$ if it is a right tailed test; $p(z < -z_\alpha) = \alpha$ if it is left tailed.
2. The critical value of $z$ for a one tailed test at level of significance $\alpha$ is the same as the critical value of $z$ for
a two tailed test at level of significance $2\alpha$.
From the normal table, the critical values $z_\alpha$ at different levels of significance are listed below:

Level of significance    1%                  5%                   10%
Two tailed test          $|z_\alpha| = 2.58$   $|z_\alpha| = 1.96$    $|z_\alpha| = 1.645$
Right tailed test        $z_\alpha = 2.33$     $z_\alpha = 1.645$     $z_\alpha = 1.28$
Left tailed test         $z_\alpha = -2.33$    $z_\alpha = -1.645$    $z_\alpha = -1.28$

3. The value of the test statistic which separates the critical region and the acceptance region is called the
critical value. This value depends on the level of significance and on the alternative hypothesis.
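The tabulated critical values can be reproduced from the standard normal distribution. The following is a minimal sketch using Python's standard library; the helper name `critical_z` and its `tail` argument are illustrative assumptions, not notation from the notes.

```python
from statistics import NormalDist


def critical_z(alpha: float, tail: str = "two") -> float:
    """Critical value of z at level alpha for a 'two', 'right' or 'left' tailed test."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if tail == "two":
        # p(z > z_alpha) = alpha / 2 in each tail
        return std_normal.inv_cdf(1 - alpha / 2)
    if tail == "right":
        return std_normal.inv_cdf(1 - alpha)
    if tail == "left":
        return std_normal.inv_cdf(alpha)
    raise ValueError("tail must be 'two', 'right' or 'left'")


for level in (0.01, 0.05, 0.10):
    print(level,
          round(critical_z(level, "two"), 3),
          round(critical_z(level, "right"), 3),
          round(critical_z(level, "left"), 3))
```

The loop also illustrates Note 2: the right tailed value at level $\alpha$ coincides with the two tailed value at level $2\alpha$.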
Degrees of freedom
The number of independent variates used to compute the test statistic is known as the number of degrees of
freedom. In general, the number of degrees of freedom is given by v = n − k , where n is the number of
observations in the sample and k is the number of constraints imposed on them.
Testing of Significance for Single Mean – Large Sample
To Test whether sample mean differs from the hypothetical population mean
Working Rule
Given: sample size $n$, sample mean $\bar{x}$, sample SD $s$, and population mean $\mu$ and SD $\sigma$.
• Set up null hypothesis $H_0: \mu =$ given value (the sample is drawn from the given population)

• Set up alternative hypothesis $H_1$. This will determine whether we have to use right tailed or left tailed
or two tailed test.
• Compute $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$ (or) $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$ if the population SD $\sigma$ is known.
• Choose appropriate level of significance and find table value of $z$ (critical value $z_\alpha$)
• Compare calculated value of $|z|$ with the tabulated value.
• Conclusion: If $|z| < z_\alpha$ then accept $H_0$. If $|z| > z_\alpha$ then reject $H_0$.
Solved Problems
1. For the following case, specify which probability distribution to use in a hypothesis test.
(a). H 0 :  = 98, H1 :   98, x = 65, s = 12, n = 42
(a) Test of significance for single mean, large sample, one tailed test
2. A standard sample of 200 tins of coconut oil gave an average weight of 4.95 kgs with a standard
deviation of 0.21 kg. Do we accept that the net weight is 5kgs per tin at 5% level of significance?
Given $n = 200$, $\bar{x} = 4.95$, $s = 0.21$, $\mu = 5$
Null Hypothesis $H_0: \mu = 5$ (no significant difference between sample mean and population mean)
Alternative Hypothesis $H_1: \mu \ne 5$ (two tailed test)
Test statistic $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{4.95 - 5}{0.21/\sqrt{200}} = -3.37$
Table value of $z$ for 5% level of significance is 1.96
Since the calculated value of $|z| = 3.37 >$ the tabulated value, we reject the null hypothesis,
i.e. we cannot accept that the net weight is 5 kg per tin.
3. A random sample of 100 recorded deaths in India during the past year showed an average life span
of 71.8 years. Assuming a population standard deviation of 8.9 years, does this seem to indicate that
the mean life span today is greater than 70 years? Use a 0.05 level of significance.
Given $n = 100$, $\bar{x} = 71.8$, $\mu = 70$, $\sigma = 8.9$

1. $H_0: \mu = 70$
2. $H_1: \mu > 70$ [use right tailed test]
3. $\alpha = 5\%$
4. The test statistic $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{71.8 - 70}{8.9/\sqrt{100}} = 2.02$
5. Tabulated value of $z$ at 5% level (right tailed) is $z_\alpha = 1.645$
6. Conclusion: Since the calculated value $Z = 2.02 >$ the tabulated value, we reject the null hypothesis $H_0$,
i.e. the mean life span today is significantly greater than 70 years.
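The working rule for a single mean (large sample) can be sketched in Python as follows. The function name `z_single_mean` is an illustrative assumption; the critical values 1.96 and 1.645 are the table values quoted in the two solved problems.

```python
import math


def z_single_mean(sample_mean: float, pop_mean: float, sd: float, n: int) -> float:
    """z = (x_bar - mu) / (sd / sqrt(n)); sd is s, or sigma when it is known."""
    return (sample_mean - pop_mean) / (sd / math.sqrt(n))


# Coconut oil tins: two tailed test at the 5% level (critical value 1.96).
z_oil = z_single_mean(4.95, 5, 0.21, 200)
print(round(z_oil, 2), "reject H0" if abs(z_oil) > 1.96 else "accept H0")

# Life span: right tailed test at the 5% level (critical value 1.645).
z_life = z_single_mean(71.8, 70, 8.9, 100)
print(round(z_life, 2), "reject H0" if z_life > 1.645 else "accept H0")
```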
Test of significance for difference of means − Large Sample
Working Rule
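When $n_1, \bar{x}_1, s_1$ are the sample size, mean and SD of the first sample, $n_2, \bar{x}_2, s_2$ are the sample size, mean
and SD of the second sample, and the population SD $\sigma$ is given (where known):
• Set up null hypothesis $H_0: \mu_1 = \mu_2$ (samples are drawn from populations with the same mean)
• Set up alternative hypothesis $H_1$. This will determine whether we have to use right tailed or left tailed
or two tailed test.

• Compute test statistic
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ (if the population SDs are not known and distinct) or
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ (if samples are drawn from the same population) (or)
$z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}}$ (if samples are drawn from two populations with the same SD)
• Choose appropriate level of significance and find table value of $z$ (critical value)
• Compare calculated value of $|z|$ with the tabulated value.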
Solved Problems
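1. Write down the formula of the test statistic to test the significance of the difference between the means (large
samples).
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$ or $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$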

2. The sales manager of a large company conducted a sample survey in states A and B taking 400
samples in each case. The results were
State A State B
Average Sales Rs.2,500 Rs.2,200
S.D. Rs.400 Rs.550
Test whether the average sales is the same in the 2 states at 1% level of significance.
Here $n_1 = 400$, $\bar{x}_1 = 2500$, $s_1 = 400$ and $n_2 = 400$, $\bar{x}_2 = 2200$, $s_2 = 550$

Null Hypothesis $H_0: \mu_1 = \mu_2$ (i.e. the average sales in the two states are equal)
Alternative Hypothesis: $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{2500 - 2200}{\sqrt{\dfrac{400^2}{400} + \dfrac{550^2}{400}}} = 8.82$
Taking the level of significance as 1%, the table value is $z_{0.01} = 2.58$
Since the calculated $|z|$ is greater than the tabulated value, we reject the null hypothesis $H_0$,
i.e. there is a significant difference in the average sales of the two states.
3. A Mathematics test was given to 50 girls and 75 boys. The girls made an average grade of 76 with an
SD of 6 and the boys mad an average grade of 82 with an SD of 2. Test whether there is any difference
between the performance of boys and girls.
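Here $n_1 = 50$, $\bar{x}_1 = 76$, $s_1 = 6$ and $n_2 = 75$, $\bar{x}_2 = 82$, $s_2 = 2$

Null Hypothesis $H_0: \mu_1 = \mu_2$ (i.e. the performance of boys and girls is equal)
Alternative Hypothesis: $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{76 - 82}{\sqrt{\dfrac{6^2}{50} + \dfrac{2^2}{75}}} = \dfrac{-6}{0.879} = -6.82$
Taking the level of significance as 5%, the table value is 1.96
Since the calculated $|z| = 6.82$ is greater than the tabulated value, we reject the null hypothesis $H_0$,
i.e. there is a significant difference between the performance of boys and girls.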
4. In a random sample of size 500, the mean is found to be 20. In another independent sample of size
400, mean is 15. Could the samples have been drawn from the same population with S.D. 4. Use 1%
level of significance.
Here $n_1 = 500$, $\bar{x}_1 = 20$, $n_2 = 400$, $\bar{x}_2 = 15$, $\sigma = 4$
Null Hypothesis $H_0: \mu_1 = \mu_2$ (i.e. the samples have been taken from the same population)
Alternative Hypothesis: $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{20 - 15}{4\sqrt{\dfrac{1}{500} + \dfrac{1}{400}}} = 18.6$
Taking the level of significance as 1%, the table value is $z_\alpha = 2.58$
Since the calculated $|z|$ is greater than the tabulated value, we conclude that the difference between
$\bar{x}_1$ and $\bar{x}_2$ is significant at the 1% level of significance.
Hence we reject the null hypothesis $H_0$, i.e. the samples could not have been drawn from the same
population.
5. The mean height of two samples of 1000 and 2000 members are respectively 67.5 and 68 inches. Can
they be regarded as drawn from the same population with standard deviation 2.5inches at 5% level
of significance?
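Here $n_1 = 1000$, $\bar{x}_1 = 67.5$, $n_2 = 2000$, $\bar{x}_2 = 68$, $\sigma = 2.5$
Null Hypothesis $H_0: \mu_1 = \mu_2$ (i.e. the samples have been taken from the same population)
Alternative Hypothesis: $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{67.5 - 68}{2.5\sqrt{\dfrac{1}{1000} + \dfrac{1}{2000}}} = -5.16$
At the 5% level of significance, the table value is 1.96
Since the calculated $|z| = 5.16$ is greater than the tabulated value, we reject the null hypothesis $H_0$,
i.e. the samples could not have been drawn from the same population.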
6. A random sample of 100 bulbs from a company P shows a mean life 1300 hours and standard
deviation of 82 hrs. Another random sample of 100 bulbs from company Q showed a mean life of
1248 hours and standard deviation of 93 hours. Are the bulbs of company P superior to bulbs of
company Q at 5% level of significance.
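Here $n_1 = 100$, $\bar{x}_1 = 1300$, $s_1 = 82$ and $n_2 = 100$, $\bar{x}_2 = 1248$, $s_2 = 93$

Null Hypothesis $H_0: \mu_1 = \mu_2$ (i.e. the bulbs of both companies are equally good)
Alternative Hypothesis: $H_1: \mu_1 > \mu_2$ (one tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{1300 - 1248}{\sqrt{\dfrac{82^2}{100} + \dfrac{93^2}{100}}} = \dfrac{52}{12.39} = 4.19$
Taking the level of significance as 5%, the table value for a one tailed test is $z_\alpha = 1.645$
Since the calculated $z$ is greater than the tabulated value, we reject the null hypothesis $H_0$,
i.e. the bulbs of company P are superior to the bulbs of company Q.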

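7. Given $n_1 = 32$, $n_2 = 36$, $\bar{x}_1 = 72$, $\bar{x}_2 = 74$, $s_1 = 8$, $s_2 = 6$.
Test if the difference between the means is significant.
Here $n_1 = 32$, $\bar{x}_1 = 72$, $s_1 = 8$ and $n_2 = 36$, $\bar{x}_2 = 74$, $s_2 = 6$

Null Hypothesis $H_0: \mu_1 = \mu_2$
Alternative Hypothesis: $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{72 - 74}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = \dfrac{-2}{\sqrt{3}} = -1.15$
Taking the level of significance as 5%, the table value is 1.96
Since the calculated $|z| = 1.15$ is less than the tabulated value, we accept the null hypothesis $H_0$,
i.e. there is no significant difference between the sample means (both samples may come from the same population).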
8. The mean height of 50 male students who showed above average participation in college athletics
was 68.2 inches with a standard deviation of 2.5 inches; while 50 male students who showed no
interest in such participation had a mean height of 67.5 inches with a standard deviation of 2.8 inches.
a. Test the hypothesis that male students who participate in college athletics are taller than other
male students.
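b. By how much should the sample size of each of the two groups be increased in order that the
observed difference of 0.7 inches in the mean height be significant at the 5% level of significance?
Here $n_1 = 50$, $\bar{x}_1 = 68.2$, $s_1 = 2.5$; $n_2 = 50$, $\bar{x}_2 = 67.5$, $s_2 = 2.8$

Null Hypothesis $H_0: \mu_1 = \mu_2$ (i.e. there is no difference between the means of the populations)
Alternative Hypothesis: $H_1: \mu_1 > \mu_2$ (one tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{68.2 - 67.5}{\sqrt{\dfrac{2.5^2}{50} + \dfrac{2.8^2}{50}}} = \dfrac{0.7}{0.531} = 1.32$
At the 5% level of significance, the table value for a one tailed test is 1.645
Since the calculated $z$ is less than the tabulated value, we accept the null hypothesis $H_0$,
i.e. the mean heights of male students who participate in college athletics and of other male students do not differ significantly.
To find the sample size for which the observed difference becomes significant, let each sample have the common size $n$.
The difference is significant at the 5% level (one tailed) if $z \ge 1.645$:
$\dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n} + \dfrac{s_2^2}{n}}} \ge 1.645$
$\dfrac{68.2 - 67.5}{\sqrt{\dfrac{2.5^2}{n} + \dfrac{2.8^2}{n}}} \ge 1.645$
$\dfrac{0.7}{\dfrac{1}{\sqrt{n}}(3.7536)} \ge 1.645$
$\sqrt{n} \ge \dfrac{1.645 \times 3.7536}{0.7}$
$n \ge \left(\dfrac{1.645 \times 3.7536}{0.7}\right)^2 = 77.8$
$n \ge 78$
i.e. each sample should be increased by at least $78 - 50 = 28$ students.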
9. Test the significance of the difference between the means of the samples, drawn from two normal
populations with same SD using the following data:
Size Mean SD
Sample 1 100 61 4
Sample 2 200 63 6
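Here $n_1 = 100$, $\bar{x}_1 = 61$, $s_1 = 4$; $n_2 = 200$, $\bar{x}_2 = 63$, $s_2 = 6$

Null Hypothesis $H_0: \mu_1 = \mu_2$ (i.e. there is no difference between the means of the populations)
Alternative Hypothesis: $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \dfrac{61 - 63}{\sqrt{\dfrac{4^2}{200} + \dfrac{6^2}{100}}} = \dfrac{-2}{0.663} = -3.02$
(Note the formula: the samples are drawn from two populations with the same SD.)
Taking the level of significance as 5%, the table value is 1.96
Since the calculated $|z| = 3.02$ is greater than the tabulated value, we reject the null hypothesis $H_0$,
i.e. the populations from which the samples are drawn may not have the same mean.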
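The large-sample formulas for the difference of means used in the problems above can be sketched in Python. The function names are illustrative assumptions; `min_common_sample_size` mirrors the algebra of Problem 8(b) under the assumption that both groups have the same size $n$.

```python
import math


def z_diff_means(x1, x2, s1, s2, n1, n2):
    """z when the population SDs are unknown and distinct (sample SDs s1, s2 are used)."""
    return (x1 - x2) / math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)


def z_diff_means_common_sd(x1, x2, sigma, n1, n2):
    """z when both samples come from a population with known SD sigma."""
    return (x1 - x2) / (sigma * math.sqrt(1 / n1 + 1 / n2))


def min_common_sample_size(diff, s1, s2, z_crit):
    """Smallest common n for which diff / sqrt((s1^2 + s2^2) / n) >= z_crit."""
    root_n = z_crit * math.sqrt(s1 ** 2 + s2 ** 2) / diff
    return math.ceil(root_n ** 2)


z_sales = z_diff_means(2500, 2200, 400, 550, 400, 400)            # Problem 2
z_heights = z_diff_means_common_sd(67.5, 68, 2.5, 1000, 2000)     # Problem 5
n_needed = min_common_sample_size(0.7, 2.5, 2.8, 1.645)           # Problem 8(b)
print(round(z_sales, 2), round(z_heights, 2), n_needed)
```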
Test of significance for Single Proportion – Large Sample
Test of significance of the difference between sample proportion and population proportion
Working Rule
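When the sample proportion $p$ and the population proportion $P$ are given
• Set up null hypothesis $H_0: p = P$ (or) $H_0: P =$ given value
• Set up alternative hypothesis $H_1$. This will determine whether we have to use right tailed or left tailed
or two tailed test.

• Compute test statistic $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}}$ where $Q = 1 - P$, $n$ = sample size.
• Choose appropriate level of significance and table value of $z$ (critical value)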
Solved Problems
1. A coin is tossed 144 times and head appeared 80 times. Can we say that the coin is unbiased?
Probability of getting a head in a toss $P = \dfrac{1}{2}$, hence $Q = 1 - P = \dfrac{1}{2}$. Given $n = 144$
Given sample proportion $p = \dfrac{80}{144}$
Null hypothesis $H_0: P = \dfrac{1}{2}$ (the coin is unbiased)
Alternative hypothesis $H_1: P \ne \dfrac{1}{2}$ (the coin is biased)
Test statistic: $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{\dfrac{80}{144} - \dfrac{1}{2}}{\sqrt{\dfrac{\frac{1}{2} \cdot \frac{1}{2}}{144}}} = 1.333$
We choose 5% level of significance and hence the table value $z_{0.05} = 1.96$
Since the calculated value of $|z| < z_\alpha$, we accept $H_0$,
i.e. the coin is unbiased.

2. A coin is tossed 800 times and head appeared 350 times. Can we say that he has made a random tossing
each time? (equivalently, can we say that the coin is unbiased?)
Probability of getting a head in a toss $P = \dfrac{1}{2}$, hence $Q = 1 - P = \dfrac{1}{2}$. Given $n = 800$
Given sample proportion $p = \dfrac{350}{800}$
Null hypothesis $H_0: P = \dfrac{1}{2}$ (random tossing is made)
Alternative hypothesis $H_1: P \ne \dfrac{1}{2}$ (the coin is not randomly tossed)
Test statistic: $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{\dfrac{350}{800} - \dfrac{1}{2}}{\sqrt{\dfrac{\frac{1}{2} \cdot \frac{1}{2}}{800}}} = -3.54$
We choose 5% level of significance and hence the table value $z_{0.05} = 1.96$
Since the calculated value of $|z| > z_\alpha$, we reject $H_0$,
i.e. the coin is not tossed randomly (equivalently, the coin is not unbiased).

3. Experience has shown that 20% of a manufactured product is of top quality. In one day's production of
400 articles, only 50 are of top quality. Show that either the production of the day chosen was not a
representative sample or the hypothesis of 20% was wrong.
Population proportion of top quality product $P = \dfrac{20}{100}$, hence $Q = 1 - P = \dfrac{80}{100}$. Given $n = 400$
Given sample proportion $p = \dfrac{50}{400}$
Null hypothesis $H_0: P = \dfrac{1}{5}$ (20% of the products manufactured are of top quality)
Alternative hypothesis $H_1: P \ne \dfrac{1}{5}$ (the proportion of top quality products is not 20%)
Test statistic: $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{\dfrac{50}{400} - \dfrac{1}{5}}{\sqrt{\dfrac{\frac{1}{5} \cdot \frac{4}{5}}{400}}} = \dfrac{-0.075}{0.02} = -3.75$
We choose 5% level of significance and hence the table value $z_{0.05} = 1.96$
Since the calculated value of $|z| > z_\alpha$, we reject $H_0$,
i.e. either the day's production was not a representative sample or the hypothesis of 20% top quality is wrong.

4. In a city, in a sample of 500 people, 280 are tea drinkers and the rest are coffee drinkers. Can we
assume that both coffee and tea are equally popular in this city at 5% level of significance?
Population proportion of tea drinkers $P = \dfrac{1}{2}$, hence $Q = 1 - P = \dfrac{1}{2}$. Given $n = 500$
Given sample proportion $p = \dfrac{280}{500}$
Null hypothesis $H_0: P = \dfrac{1}{2}$ (tea and coffee are equally popular)
Alternative hypothesis $H_1: P \ne \dfrac{1}{2}$ (tea and coffee are not equally popular)
Test statistic: $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{\dfrac{280}{500} - \dfrac{1}{2}}{\sqrt{\dfrac{\frac{1}{2} \cdot \frac{1}{2}}{500}}} = \dfrac{0.06}{0.022} = 2.68$
We choose 5% level of significance and hence the table value $z_{0.05} = 1.96$
Since the calculated value of $|z| > z_\alpha$, we reject $H_0$,
i.e. tea and coffee are not equally popular.


5. In a city 325 men out of 600 men were found to be smokers. Does this information support the conclusion
that the majority of men in this city are smokers?
Population proportion of smokers in the city $P = \dfrac{1}{2}$, hence $Q = 1 - P = \dfrac{1}{2}$. Given $n = 600$
Given sample proportion $p = \dfrac{325}{600}$
Null hypothesis $H_0: P = \dfrac{1}{2}$ (smokers and non smokers are equal in number in the city)
Alternative hypothesis $H_1: P > \dfrac{1}{2}$ (right tailed test)
Test statistic: $z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \dfrac{\dfrac{325}{600} - \dfrac{1}{2}}{\sqrt{\dfrac{\frac{1}{2} \cdot \frac{1}{2}}{600}}} = 2.04$
We choose 5% level of significance and hence the table value for a right tailed test is $z_{0.05} = 1.645$
Since the calculated value of $z > z_\alpha$, we reject $H_0$,
i.e. the majority of men in the city are smokers.
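The single-proportion test used in Problems 1 to 5 can be sketched in Python. The function name `z_single_proportion` and its arguments are illustrative assumptions; the critical values are the table values quoted in the problems.

```python
import math


def z_single_proportion(successes: int, n: int, P: float) -> float:
    """z = (p - P) / sqrt(P * Q / n), with p = successes / n and Q = 1 - P."""
    p = successes / n
    Q = 1 - P
    return (p - P) / math.sqrt(P * Q / n)


z_coin_144 = z_single_proportion(80, 144, 0.5)    # Problem 1, two tailed
z_coin_800 = z_single_proportion(350, 800, 0.5)   # Problem 2, two tailed
z_quality = z_single_proportion(50, 400, 0.2)     # Problem 3, two tailed
z_tea = z_single_proportion(280, 500, 0.5)        # Problem 4, two tailed
z_smokers = z_single_proportion(325, 600, 0.5)    # Problem 5, right tailed

for name, z, crit in [("coin 144", abs(z_coin_144), 1.96),
                      ("coin 800", abs(z_coin_800), 1.96),
                      ("quality", abs(z_quality), 1.96),
                      ("tea", abs(z_tea), 1.96),
                      ("smokers", z_smokers, 1.645)]:
    print(name, round(z, 3), "reject H0" if z > crit else "accept H0")
```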
Test of significance for Difference of Proportions – Large Samples
Working Rule
When $p_1, p_2$ are two sample proportions drawn from the same population, or from two populations with the
same proportion $P$,
• Set up null hypothesis $H_0: P_1 = P_2$ (population proportions are equal)
• Set up alternative hypothesis $H_1$. This will determine whether we have to use right tailed or left tailed
or two tailed test.
• Compute test statistic $z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$ where $Q = 1 - P$ and $n_1, n_2$ = sample sizes.
If $P$ is not known, then $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
• Choose appropriate level of significance and table value of $z$ (critical value)
Note: Suppose we want to test $H_0: P_1 - P_2 = d_0$ against $H_1: P_1 - P_2 \ne d_0$. Now $z = \dfrac{(p_1 - p_2) - d_0}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$.


Solved Problems
1. In a large city A 20% of a random sample of 900 school boys had a slight physical defect. In another
city B 18.5% of a random sample of 1600 school boys had the same defect. Is the difference between
the proportions significant?
Null Hypothesis: $H_0: P_1 = P_2$, the difference between the two proportions is not significant
Alternative Hypothesis: $H_1: P_1 \ne P_2$
Given $p_1 = \dfrac{20}{100} = 0.2$ and $p_2 = \dfrac{18.5}{100} = 0.185$. Also $n_1 = 900$, $n_2 = 1600$
Hence $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{900(0.2) + 1600(0.185)}{900 + 1600} = 0.19$ and $Q = 1 - P = 0.81$
Therefore test statistic $z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.20 - 0.185}{\sqrt{(0.19)(0.81)\left(\dfrac{1}{900} + \dfrac{1}{1600}\right)}} = 0.918$
Table value of $z$ at 5% level of significance is 1.96
Since the calculated value of $|z| <$ tabulated value, we accept the null hypothesis. Therefore the difference between
the proportions is not significant.
2. 400 men and 600 women were asked whether they would like to have a flyover near their residence.
200 men and 325 women were in favour of the proposal. Test whether these two proportions are
same.
Null Hypothesis: $H_0: P_1 = P_2$, the difference between the attitude of men and women as far as the
proposal is concerned is not significant.

Alternative Hypothesis: $H_1: P_1 \ne P_2$
Given $p_1 = \dfrac{200}{400} = 0.5$ and $p_2 = \dfrac{325}{600} = 0.542$. Also $n_1 = 400$, $n_2 = 600$
Hence $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{400(0.5) + 600(0.542)}{400 + 600} = 0.525$ and $Q = 1 - P = 0.475$
Therefore test statistic $z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.5 - 0.542}{\sqrt{(0.525)(0.475)\left(\dfrac{1}{400} + \dfrac{1}{600}\right)}} = \dfrac{-0.042}{0.032234} = -1.30$
Table value of $z$ at 5% level of significance is 1.96
Since the calculated value of $|z| <$ tabulated value, we accept the null hypothesis. Therefore the difference
between the attitude of men and women as far as the proposal is concerned is not significant.
3. A cigarette manufacturing firm claims that its brand A outsells its brand B by 8%. It is found that 42
out of a sample of 200 smokers prefer brand A and 18 out of another sample of 100 smokers prefer
brand B. Test whether the 8% difference is a valid claim.

Null Hypothesis: $H_0: P_1 - P_2 = 0.08$, the difference between the sale of brand A and brand B is 8%.
Alternative Hypothesis: $H_1: P_1 - P_2 \ne 0.08$
Proportion of preference of brand A $p_1 = \dfrac{42}{200} = 0.21$
Proportion of preference of brand B $p_2 = \dfrac{18}{100} = 0.18$. Also $n_1 = 200$, $n_2 = 100$
Hence $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{200(0.21) + 100(0.18)}{200 + 100} = 0.2$ and $Q = 1 - P = 0.8$
Therefore test statistic $z = \dfrac{(p_1 - p_2) - d_0}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} = \dfrac{0.03 - 0.08}{\sqrt{(0.2)(0.8)\left(\dfrac{1}{200} + \dfrac{1}{100}\right)}} = \dfrac{-0.05}{0.0489} = -1.02$
Table value of $z$ at 5% level of significance is 1.96
Since the calculated value of $|z| <$ tabulated value, we accept the null hypothesis. Therefore the difference of
8% in the sale of brand A and brand B is a valid claim.
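The test for the difference of proportions, including the variant with a hypothesised difference $d_0$, can be sketched in Python. The function name `z_two_proportions` is an illustrative assumption; it takes the observed counts rather than rounded proportions, so the flyover result comes out slightly closer to zero than the hand computation, which rounds $p_2$ to 0.542.

```python
import math


def z_two_proportions(x1: int, n1: int, x2: int, n2: int, d0: float = 0.0) -> float:
    """z = ((p1 - p2) - d0) / sqrt(P * Q * (1/n1 + 1/n2)) with pooled proportion P."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)  # same as (n1*p1 + n2*p2) / (n1 + n2)
    Q = 1 - P
    standard_error = math.sqrt(P * Q * (1 / n1 + 1 / n2))
    return ((p1 - p2) - d0) / standard_error


z_flyover = z_two_proportions(200, 400, 325, 600)          # Problem 2
z_brands = z_two_proportions(42, 200, 18, 100, d0=0.08)    # Problem 3
print(round(z_flyover, 3), round(z_brands, 3))
# Both magnitudes are below the two tailed 5% critical value 1.96, so H0 is accepted.
```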
Test of Significance of the Mean (t-test) – Small Sample
The $t$-distribution is used when the sample size is small ($n < 30$) and the population SD is unknown.
To test whether the sample mean differs from the hypothetical population mean
Working Rule
Sample size $n$, mean $\bar{x}$, SD $s$ and population mean $\mu$ are given.
• Set up null hypothesis $H_0: \mu =$ given value.
• Set up alternative hypothesis $H_1$. This will determine whether we have to use right tailed or left tailed
or two tailed test.
• Compute $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}}$.
• If a set of sample values is given, find $\bar{x} = \dfrac{\sum x}{n}$ and $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}}$, then $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$.
• Choose appropriate level of significance and degrees of freedom $(n - 1)$ and find table value (critical
value) of $t$.
• Compare calculated value of $|t|$ with the tabulated value.
• Conclusion: If $|t| < t_\alpha$ then accept $H_0$. If $|t| > t_\alpha$ then reject $H_0$.


Solved Problems
1. What are the applications of the t-distribution?
To test if the sample mean differs significantly from the population mean.
To test the significance of the difference between two sample means.
2. For the following case, specify which probability distribution to use in a hypothesis test.
(a) $H_0: \mu = 27$, $H_1: \mu \ne 27$, $\bar{x} = 20.1$, $\sigma = 5$, $n = 12$
Answer: (a) Test of significance for single mean, small sample, two tailed test.
3. A company claims that a vacuum cleaner uses an average of 46 kilowatt hours per year. If a random
sample of 12 homes included in a planned study indicates that vacuum cleaners use an average 42
kilowatt hours per year with a standard deviation of 11.9 kilowatt hours, does this suggest at the 0.05
level of significance that vacuum cleaners use, on average, less than 46 kilowatt hours annually?
Assume the population of kilowatt hours to be normal.
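Given $n = 12$, $\mu = 46$ kWh, $s = 11.9$ and $\bar{x} = 42$
1. $H_0: \mu = 46$
2. $H_1: \mu < 46$ (left tailed test)
3. $\alpha = 5\%$, d.f. $= n - 1 = 12 - 1 = 11$
4. The test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \dfrac{42 - 46}{11.9/\sqrt{12 - 1}} = -1.11$
5. For 5% level of significance, the tabulated value for a one tailed test at 11 degrees of freedom is $t_\alpha = 1.796$
6. Conclusion: Since the calculated value $|t| = 1.11 <$ the tabulated value, we accept $H_0$,
i.e. the sample does not suggest that vacuum cleaners use less than 46 kilowatt hours annually.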
4. Machinist is making engine parts with axle diameters of 0.7 inch. A random sample of 10 parts shows
a mean diameter of 0.742 inch with a standard deviation of 0.04 inch. Compute the statistic to test
the work is meeting the specification.
Given $\bar{x} = 0.742$, $n = 10$, $s = 0.04$ and population mean $\mu = 0.7$

Null Hypothesis $H_0: \mu = 0.7$ (the product is conforming to the specification)
Alternative Hypothesis $H_1: \mu \ne 0.7$ (two tailed test)
Test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \dfrac{0.742 - 0.7}{0.04/\sqrt{9}} = \dfrac{0.042}{0.0133} = 3.15$
Table value of $t$ for $n - 1 = 9$ degrees of freedom at 5% level of significance is 2.26
Since the calculated value of $|t| >$ the tabulated value, we reject the null hypothesis,
i.e. the work is not meeting the specification.


5. Given a sample mean of 83, a sample standard deviation of 12.5 and a sample size of 22, test the
hypothesis that the value of the population mean is 70 against the alternative that is more than 70.
Use the 0.025 significance level.
Given $\bar{x} = 83$, $n = 22$, $s = 12.5$ and population mean $\mu = 70$
Null Hypothesis $H_0: \mu = 70$ (sample mean is not different from the population mean)
Alternative Hypothesis $H_1: \mu > 70$ (one tailed test)
Test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n - 1}} = \dfrac{83 - 70}{12.5/\sqrt{21}} = \dfrac{13}{2.728} = 4.77$
Table value of $t$ for $n - 1 = 21$ degrees of freedom at 0.025 level of significance (one tailed) is 2.08
Since the calculated value of $t >$ the tabulated value, we reject the null hypothesis,
i.e. the population mean is more than 70.
6. A certain pesticide is packed into bags by a machine. A random sample of 10 bags is chosen and the
contents of the bags is found to have the following weights (in kgs) 50, 49, 52, 44, 45, 48, 46, 45, 49
and 45. Test if the average quantity packed be taken as 50 kg.
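Let us tabulate the values as follows:

$x$                  50    49    52     44     45    48    46    45    49    45     Total: 473
$(x - \bar{x})^2$    7.29  2.89  22.09  10.89  5.29  0.49  1.69  5.29  2.89  5.29   Total: 64.1

$\bar{x} = \dfrac{\sum x}{n} = \dfrac{473}{10} = 47.3$ and $s = \sqrt{\dfrac{\sum (x - \bar{x})^2}{n - 1}} = \sqrt{\dfrac{64.1}{9}} = 2.67$
Null Hypothesis $H_0: \mu = 50$ (sample mean weight is not different from the expected weight)
Alternative Hypothesis $H_1: \mu \ne 50$ (two tailed test)
Test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{47.3 - 50}{2.67/\sqrt{10}} = \dfrac{-2.7}{0.844} = -3.2$
Table value of $t$ for $n - 1 = 9$ degrees of freedom at 5% level of significance is 2.26
Since the calculated value of $|t| >$ the tabulated value, we reject the null hypothesis,
i.e. the average quantity packed cannot be taken as 50 kg.
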
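When raw sample values are given, as in the pesticide problem, the statistic can be computed directly. This is a minimal sketch using only the standard library; the function name `t_single_mean` is an illustrative assumption, and the critical value 2.262 is taken from a t table (9 degrees of freedom, 5%, two tailed) because the standard library has no t-distribution.

```python
import math
from statistics import mean, stdev


def t_single_mean(sample, pop_mean):
    """t = (x_bar - mu) / (s / sqrt(n)), where s uses the divisor n - 1."""
    n = len(sample)
    s = stdev(sample)  # sample standard deviation, divisor n - 1
    t = (mean(sample) - pop_mean) / (s / math.sqrt(n))
    return t, n - 1    # statistic and degrees of freedom


weights = [50, 49, 52, 44, 45, 48, 46, 45, 49, 45]
t_value, dof = t_single_mean(weights, 50)

T_TABLE_9_DF_5_PERCENT = 2.262  # table value, two tailed
print(round(t_value, 2), dof,
      "reject H0" if abs(t_value) > T_TABLE_9_DF_5_PERCENT else "accept H0")
```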

Test for Difference of means of two samples – t test − Small Samples
Test of significance of the difference between two sample means
Working Rule
When $n_1, \bar{x}, s_1$ are the sample size, mean and SD of the first sample and $n_2, \bar{y}, s_2$ are the sample size, mean
and SD of the second sample,
• Set up null hypothesis $H_0: \mu_1 = \mu_2$ (samples are drawn from populations with the same mean)
• Set up alternative hypothesis $H_1$. This will determine whether we have to use right tailed or left tailed
or two tailed test.

• Compute test statistic $t = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$ where the estimated population variance is $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
(here $s_1^2 = \dfrac{1}{n_1}\sum (x - \bar{x})^2$ and $s_2^2 = \dfrac{1}{n_2}\sum (y - \bar{y})^2$).
• Choose appropriate level of significance, degrees of freedom $n_1 + n_2 - 2$ and table value of $t$ (critical
value)
• Compare calculated value of $|t|$ with the tabulated value.
• Conclusion: If $|t| < t_\alpha$ then accept $H_0$. If $|t| > t_\alpha$ then reject $H_0$.
Note 1: If we are asked to test whether both the samples come from the same normal population, we have to apply
both the t and F tests.
Note 2: Instead of the sample values $x_i, y_i$, sometimes the differences between them, say $X = x_i - y_i$, will be given.
In that case the test statistic is $t = \dfrac{\bar{X}}{S/\sqrt{n}}$, where $S$ is the SD of the differences, and we proceed as in the test of
hypothesis for a single mean (small sample).
Note 3: Instead of two different samples, pairs of values which are correlated may be given. Then the test
statistic is $t = \dfrac{\bar{d}}{S/\sqrt{n}}$ where $d = x_i - y_i$ and $S$ is the SD of the differences $d$.
Solved Problems
1. Two independent samples of sizes 8 and 7 contained the following values:
Sample I : 19 17 15 21 16 18 16 14
Sample II: 15 14 15 19 15 18 16
Is the difference between the sample means significant? Use 5% level of significance.

Given $n_1 = 8$ and $n_2 = 7$

$x_1$      19   17   15   21   16   18   16   14     $\sum x_1 = 136$
$x_1^2$    361  289  225  441  256  324  256  196    $\sum x_1^2 = 2348$
$x_2$      15   14   15   19   15   18   16          $\sum x_2 = 112$
$x_2^2$    225  196  225  361  225  324  256         $\sum x_2^2 = 1812$

Mean of first sample $\bar{x}_1 = \dfrac{\sum x_1}{n_1} = \dfrac{136}{8} = 17$
Variance of sample I $s_1^2 = \dfrac{\sum x_1^2}{n_1} - (\bar{x}_1)^2 = \dfrac{2348}{8} - 17^2 = 4.5$
Mean of second sample $\bar{x}_2 = \dfrac{\sum x_2}{n_2} = \dfrac{112}{7} = 16$
Variance of sample II $s_2^2 = \dfrac{\sum x_2^2}{n_2} - (\bar{x}_2)^2 = \dfrac{1812}{7} - 16^2 = 2.857$
Population variance $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{8 \times 4.5 + 7 \times 2.857}{8 + 7 - 2} = 4.308$ and hence $S = 2.075$
Null hypothesis $H_0: \mu_1 = \mu_2$ (no significant difference between the means of samples I and II)
Alternative hypothesis $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{17 - 16}{2.075\sqrt{\dfrac{1}{8} + \dfrac{1}{7}}} = \dfrac{1}{1.074} = 0.93$
Table value of $t$ at 5% level of significance for $v = n_1 + n_2 - 2 = 13$ degrees of freedom is $t_{0.05} = 2.16$
Since the calculated value of $|t| <$ the tabulated value, we accept $H_0$,
i.e. the two sample means do not differ significantly.
2. Two random samples gave the following results:
Sample Size Sample Mean Sum of squares of deviation from the mean
1 10 15 90
2 12 14 108
Test whether the samples come from the same normal population at 5% level of significance
(given $F_{0.05}(9, 11) = 2.9$, $F_{0.05}(11, 9) = 3.1$, $t_{0.05}(20) = 2.086$, $t_{0.05}(22) = 2.07$ approximately)

Here we have to apply both the t test and the F test.
To test the mean
Given sample sizes $n_1 = 10$ and $n_2 = 12$. Sample means $\bar{x}_1 = 15$ and $\bar{x}_2 = 14$
Also given that $\sum (x_1 - \bar{x}_1)^2 = 90$ and $\sum (x_2 - \bar{x}_2)^2 = 108$, i.e. $n_1 s_1^2 = 90$ and $n_2 s_2^2 = 108$
Population variance $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \dfrac{90 + 108}{10 + 12 - 2} = 9.9$ and hence $S = 3.146$
Null hypothesis $H_0: \mu_1 = \mu_2$ (no significant difference between the means of samples 1 and 2)
Alternative hypothesis $H_1: \mu_1 \ne \mu_2$ (two tailed test)
Test statistic $t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{15 - 14}{3.146\sqrt{\dfrac{1}{10} + \dfrac{1}{12}}} = \dfrac{1}{1.347} = 0.74$
Table value of $t$ at 5% level of significance for $v = n_1 + n_2 - 2 = 20$ degrees of freedom is $t_{0.05} = 2.086$
Since the calculated value of $|t| <$ the tabulated value, we accept $H_0$,
i.e. the two sample means do not differ significantly.
To test the variance
Null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$ (no significant difference between the variances of samples 1 and 2)
Alternative Hypothesis: $H_1: \sigma_1^2 \ne \sigma_2^2$
Unbiased estimates of the population variances are $S_1^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = \dfrac{90}{9} = 10$ and $S_2^2 = \dfrac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} = \dfrac{108}{11} = 9.82$
Test statistic $F = \dfrac{S_1^2}{S_2^2} = \dfrac{10}{9.82} = 1.02$
Degrees of freedom $(n_1 - 1, n_2 - 1) = (9, 11)$
Table value of $F$ for $(9, 11)$ degrees of freedom at 5% level of significance is 2.9
Since the calculated value of $F <$ the tabulated value, we accept $H_0$,
i.e. the two sample variances do not differ significantly.

Since neither the means nor the variances differ significantly, we conclude that both the samples come from the same
normal population.


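3. A certain medicine administered to each of 10 patients resulted in the following increases in the B.P.:
8, 8, 7, 5, 4, 1, 0, 0, −1, −1. Can it be concluded that the medicine was responsible for the increase
in B.P. at the 5% level of significance?
We are given the increments in blood pressure, i.e. the differences X between the B.P. after and before the medicine.
Let $\mu_1$ and $\mu_2$ denote the mean B.P. before and after the medicine.
Null hypothesis $H_0 : \mu_1 = \mu_2$ (no significant difference in the B.P. before and after the medicine)
Alternative hypothesis $H_1 : \mu_2 > \mu_1$ (one tailed test)
$X$                 :  8      8      7      5     4     1     0     0     −1     −1      $\sum X = 31$
$(X - \bar{X})^2$   :  24.01  24.01  15.21  3.61  0.81  4.41  9.61  9.61  16.81  16.81   $\sum (X - \bar{X})^2 = 124.9$
$\bar{X} = \frac{\sum X}{n} = \frac{31}{10} = 3.1$ and $S = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{124.9}{9}} = 3.725$
Test statistic $t = \frac{\bar{X}}{S / \sqrt{n}} = \frac{3.1}{3.725 / \sqrt{10}} = \frac{3.1}{1.178} = 2.63$
Table value of t at 5% level of significance (one tailed) for $n - 1 = 9$ degrees of freedom is 1.833
Since the calculated value of t > the tabulated value, we reject the null hypothesis
(i.e. the medicine was responsible for the increase in B.P.)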
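The arithmetic of this test can be checked with a few lines of Python. This is only a sketch using the standard library; the variable names are illustrative, and the critical value 1.833 is the one tailed 5% value for 9 degrees of freedom read from a t table.

```python
import math
import statistics

# Increases in blood pressure for the 10 patients (from the problem statement).
increases = [8, 8, 7, 5, 4, 1, 0, 0, -1, -1]

n = len(increases)
mean_increase = statistics.mean(increases)    # sample mean of the increases
s = statistics.stdev(increases)               # standard deviation with divisor n - 1
t_stat = mean_increase / (s / math.sqrt(n))   # t statistic under H0: mean increase = 0

# One tailed 5% critical value for n - 1 = 9 degrees of freedom (from a t table).
T_CRITICAL = 1.833

print(f"mean = {mean_increase:.2f}, S = {s:.3f}, t = {t_stat:.2f}")
print("reject H0" if t_stat > T_CRITICAL else "accept H0")
```

Because the increases are differences between paired readings, the same computation serves for any paired t-test once the differences have been formed.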
4. Memory capacity of 9 students was tested before and after a meditation treatment for a month. State
whether the treatment was effective or not from the following data:
Before treatment : 10 15 9 3 7 12 16 17 4
After treatment : 12 17 8 5 6 11 18 20 3
The values are paired, since the same set of students is tested before and after the treatment.
Null hypothesis $H_0 : \mu_1 = \mu_2$ (the treatment was not effective)
Alternative hypothesis $H_1 : \mu_1 < \mu_2$ (one tailed test)

Before treatment :  10  15   9   3   7  12  16  17   4
After treatment  :  12  17   8   5   6  11  18  20   3
Difference $d$   :   2   2  −1   2  −1  −1   2   3  −1     $\sum d = 7$
$d^2$            :   4   4   1   4   1   1   4   9   1     $\sum d^2 = 29$
$\bar{d} = \frac{\sum d}{n} = \frac{7}{9} = 0.7778$ and $S = \sqrt{\frac{\sum d^2 - n \bar{d}^2}{n - 1}} = \sqrt{\frac{29 - 5.444}{8}} = 1.716$
Test statistic $t = \frac{\bar{d}}{S / \sqrt{n}} = \frac{0.7778}{1.716 / \sqrt{9}} = \frac{0.7778}{0.572} = 1.36$
Taking the 5% level of significance, the table value of t (one tailed) for $n - 1 = 8$ degrees of freedom is 1.86
Since the calculated value of t < the tabulated value, we accept the null hypothesis, i.e. the treatment did not
improve the memory capacity.
F-Test : Test of Significance for Equality of Population Variances (Small Sample)
This test is used to check whether two sample estimates of the population variance differ significantly. Under the
null hypothesis that the population variances are equal, the test statistic is given by
$F = \frac{S_1^2}{S_2^2}$, assuming $S_1^2 > S_2^2$,
where $S_1^2 = \frac{1}{n_1 - 1} \sum (x - \bar{x})^2$ and $S_2^2 = \frac{1}{n_2 - 1} \sum (y - \bar{y})^2$
are unbiased estimates of the common population variance $\sigma^2$ obtained from two independent samples.
The test statistic follows the F-distribution with $(n_1 - 1, n_2 - 1)$ degrees of freedom. By comparing the calculated
value with the tabulated value for these degrees of freedom at the chosen level of significance, the null
hypothesis is either accepted or rejected.
1. Write any two important uses of the normal curve.
• The distributions of many sample statistics tend to normality for large samples, so they can best
be studied with the help of normal curves.
• The theory of normal curves can be applied to the graduation of curves which are not normal.
Test of significance for equality of population variances

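Working Rule
Let A and B be two samples with sizes $n_1$ and $n_2$ and standard deviations $s_1$ and $s_2$.
• Set up the null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ (the population variances are the same).
• Set up the alternative hypothesis $H_1$ for a one tailed or two tailed test, e.g. $H_1 : \sigma_1^2 \neq \sigma_2^2$ (two tailed).
• Compute the estimated population variances $S_1^2 = \frac{1}{n_1 - 1} \sum (x - \bar{x})^2 = \frac{n_1 s_1^2}{n_1 - 1}$ and
$S_2^2 = \frac{1}{n_2 - 1} \sum (y - \bar{y})^2 = \frac{n_2 s_2^2}{n_2 - 1}$.
• Compute the test statistic $F = \frac{S_1^2}{S_2^2}$, assuming $S_1^2 > S_2^2$ (otherwise interchange the two samples).
• Choose an appropriate level of significance $\alpha$ and the degrees of freedom $(n_1 - 1, n_2 - 1)$ and find the table value
$F_\alpha$ (critical value).
• Compare the calculated value of F with the tabulated value.
• Conclusion: If $F < F_\alpha$ then accept $H_0$. If $F > F_\alpha$ then reject $H_0$.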
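Solved Problems
1. Test if the variances are significantly different for:
x1 : 24 27 26 21 25
x2 : 27 30 32 36 28 23
Here we have to apply the F-test. Given $n_1 = 5$ and $n_2 = 6$.
$x_1$    :  24   27   26    21    25          $\sum x_1 = 123$
$x_1^2$  :  576  729  676   441   625         $\sum x_1^2 = 3047$
$x_2$    :  27   30   32    36    28    23    $\sum x_2 = 176$
$x_2^2$  :  729  900  1024  1296  784   529   $\sum x_2^2 = 5262$
Mean of sample 1 : $\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{123}{5} = 24.6$    Mean of sample 2 : $\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{176}{6} = 29.33$
Variance of sample 1 : $s_1^2 = \frac{\sum x_1^2}{n_1} - (\bar{x}_1)^2 = \frac{3047}{5} - 24.6^2 = 4.24$
Variance of sample 2 : $s_2^2 = \frac{\sum x_2^2}{n_2} - (\bar{x}_2)^2 = \frac{5262}{6} - \left(\frac{176}{6}\right)^2 = 877 - 860.44 = 16.56$
Estimated population variances are $S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{5 \times 4.24}{4} = 5.3$ and $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{6 \times 16.56}{5} = 19.87$
Null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ (the population variances are equal)
Test statistic $F = \frac{S_2^2}{S_1^2} = \frac{19.87}{5.3} = 3.75$ (since $S_2^2 > S_1^2$)
Degrees of freedom $(n_2 - 1, n_1 - 1) = (5, 4)$
Table value of F for (5, 4) degrees of freedom at 5% level of significance is 6.26
Since the calculated value of F < the tabulated value, we accept $H_0$,
i.e. the variances do not differ significantly.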
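2. Pumpkins were grown under two experimental conditions. Two random samples of 11 and 9
pumpkins show the sample standard deviations of their weights as 0.8 and 0.5 respectively.
Assuming that the weight distributions are normal, test the hypothesis that the true variances are
equal, against the alternative hypothesis that they are not, at the 10% level of significance.
Given $n_1 = 11$, $n_2 = 9$, $s_1 = 0.8$, $s_2 = 0.5$
Null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ (population variances are equal)
Alternative hypothesis $H_1 : \sigma_1^2 \neq \sigma_2^2$ (two tailed test)
Estimated population variances are $S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{11 \times 0.8^2}{10} = 0.704$ and $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{9 \times 0.5^2}{8} = 0.281$
Test statistic $F = \frac{S_1^2}{S_2^2} = \frac{0.704}{0.281} \approx 2.5$
Degrees of freedom $(n_1 - 1, n_2 - 1) = (10, 8)$
Since the test is two tailed at the 10% level, the table value of F for (10, 8) degrees of freedom is $F_{0.05}(10, 8) = 3.35$
Since the calculated value of F < the tabulated value, we accept $H_0$,
i.e. the difference between the population variances is not significant.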
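When only the sample sizes and sample standard deviations are given, as in the pumpkin problem, the F statistic can be computed as in the sketch below. The helper name `estimated_population_variance` is hypothetical; it simply applies the relation S² = n s² / (n − 1) used in these notes, and the larger estimate is placed in the numerator.

```python
def estimated_population_variance(n, s):
    """Return n * s**2 / (n - 1), where s is the sample SD computed with divisor n."""
    return n * s ** 2 / (n - 1)

# Summary data from the pumpkin problem.
n1, s1 = 11, 0.8
n2, s2 = 9, 0.5

S1_sq = estimated_population_variance(n1, s1)
S2_sq = estimated_population_variance(n2, s2)

# The larger estimate goes in the numerator; the degrees of freedom follow the same order.
if S1_sq >= S2_sq:
    f_stat, dof = S1_sq / S2_sq, (n1 - 1, n2 - 1)
else:
    f_stat, dof = S2_sq / S1_sq, (n2 - 1, n1 - 1)

print(f"S1^2 = {S1_sq:.3f}, S2^2 = {S2_sq:.3f}, F = {f_stat:.2f}, dof = {dof}")
```

The computed F is then compared with the table value for the printed degrees of freedom.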

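3. A group of 10 rats fed on diet A and another group of 8 rats fed on diet B recorded the following
increase in weight.
Diet A : 5 6 8 1 12 4 3 9 6 10
Diet B : 2 3 6 8 10 1 2 8
Test whether the variances are significantly different.
$x_1$    :  5    6    8    1    12    4    3    9    6    10     $\sum x_1 = 64$
$x_1^2$  :  25   36   64   1    144   16   9    81   36   100    $\sum x_1^2 = 512$
$x_2$    :  2    3    6    8    10    1    2    8                $\sum x_2 = 40$
$x_2^2$  :  4    9    36   64   100   1    4    64               $\sum x_2^2 = 282$

Mean of Diet A : $\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{64}{10} = 6.4$    Mean of Diet B : $\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{40}{8} = 5$
Variance of Diet A : $s_1^2 = \frac{\sum x_1^2}{n_1} - (\bar{x}_1)^2 = \frac{512}{10} - 6.4^2 = 10.24$
Variance of Diet B : $s_2^2 = \frac{\sum x_2^2}{n_2} - (\bar{x}_2)^2 = \frac{282}{8} - 5^2 = 10.25$

Estimated population variances are $S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{10 \times 10.24}{9} = 11.38$ and $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{8 \times 10.25}{7} = 11.71$
Null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ (population variances are equal)
Test statistic $F = \frac{S_2^2}{S_1^2} = \frac{11.71}{11.38} = 1.03$
Degrees of freedom $(n_2 - 1, n_1 - 1) = (7, 9)$
Table value of F for (7, 9) degrees of freedom at 5% level of significance is 3.29
Since the calculated value of F < the tabulated value, we accept $H_0$,
i.e. the two population variances do not differ significantly.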
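4. Time taken by workers in performing a job is given below:
Method 1 : 20 16 26 27 23 22
Method 2 : 27 33 42 35 34 38
Test whether there is any significant difference between the variances of the time distribution at
5% level of significance.
$x_1$    :  20    16     26     27     23     22      $\sum x_1 = 134$
$x_1^2$  :  400   256    676    729    529    484     $\sum x_1^2 = 3074$
$x_2$    :  27    33     42     35     34     38      $\sum x_2 = 209$
$x_2^2$  :  729   1089   1764   1225   1156   1444    $\sum x_2^2 = 7407$
Mean of method 1 : $\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{134}{6} = 22.33$    Mean of method 2 : $\bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{209}{6} = 34.83$
Variance of method 1 : $s_1^2 = \frac{\sum x_1^2}{n_1} - (\bar{x}_1)^2 = \frac{3074}{6} - \left(\frac{134}{6}\right)^2 = 512.33 - 498.78 = 13.56$
Variance of method 2 : $s_2^2 = \frac{\sum x_2^2}{n_2} - (\bar{x}_2)^2 = \frac{7407}{6} - \left(\frac{209}{6}\right)^2 = 1234.5 - 1213.36 = 21.14$
Estimated population variances are $S_1^2 = \frac{n_1 s_1^2}{n_1 - 1} = \frac{6 \times 13.56}{5} = 16.27$ and $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1} = \frac{6 \times 21.14}{5} = 25.37$
Null hypothesis $H_0 : \sigma_1^2 = \sigma_2^2$ (population variances are equal)
Test statistic $F = \frac{S_2^2}{S_1^2} = \frac{25.37}{16.27} = 1.56$
Degrees of freedom $(n_2 - 1, n_1 - 1) = (5, 5)$
Table value of F for (5, 5) degrees of freedom at 5% level of significance is 5.05
Since the calculated value of F < the tabulated value, we accept $H_0$,
i.e. the variances of the two time distributions do not differ significantly.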
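For raw data such as the Method 1 and Method 2 times, the estimated population variances can be obtained directly, which avoids rounding the means by hand. The following is a sketch with the standard library only; `statistics.variance` uses the divisor n − 1, so it returns S² as defined in these notes, and the table value 5.05 for (5, 5) degrees of freedom at the 5% level is taken from an F table.

```python
import statistics

# Times taken by the workers under each method (from the problem statement).
method_1 = [20, 16, 26, 27, 23, 22]
method_2 = [27, 33, 42, 35, 34, 38]

# Unbiased estimates of the population variances (divisor n - 1).
S1_sq = statistics.variance(method_1)
S2_sq = statistics.variance(method_2)

# Larger estimate in the numerator, as the working rule requires.
larger, smaller = max(S1_sq, S2_sq), min(S1_sq, S2_sq)
f_stat = larger / smaller

F_TABLE = 5.05  # F(5, 5) at the 5% level, from an F table

print(f"S1^2 = {S1_sq:.2f}, S2^2 = {S2_sq:.2f}, F = {f_stat:.2f}")
print("accept H0" if f_stat < F_TABLE else "reject H0")
```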
DESIGN OF EXPERIMENTS
The design of experiments is a logical construction of the experiment in which the degree of uncertainty with
which the inferences are drawn may be well defined. Here we consider some aspects of experimental design and
the analysis of data from such experiments using ANOVA techniques.
A statistical experiment is conducted to verify the truth of a hypothesis. Consider an agricultural experiment to test
whether a particular manure increases the yield of a grain. Here the quantity of manure used and the quantity of yield
are the two experimental variables. In addition, other variables such as the nature of the soil, proper watering and the
quality of seeds also affect the yield; these are called extraneous variables.
So the main aim of the design of an experiment is to control the extraneous variables and hence to minimize the
experimental error, so that the results of the experiment can be attributed only to the experimental variables.
The purpose of experimental design is to obtain maximum information with the minimum cost and labour.
With respect to an agricultural experiment, the terms treatments, experimental unit, blocks and experimental
error used in this design have the following meanings:
Treatments: Types of crops, varieties of manure, methods of cultivation
Experimental Unit: The plot of land
Blocks: Relatively homogeneous divisions into which the land is separated
Error: Variation in the yield due to extraneous variables
Basic Principles of Experimental Design
The basic principles of experimental design are (i) randomization, (ii) replication, (iii) local control and (iv)
ANOVA.

Randomization controls the effect of extraneous variables. It is done by selecting the plots for the experimental groups
and the control groups in a random manner.
Replication means repetition. In our example, the manure is used in more than one plot so that its effect may be
identified precisely.
Local control controls the effect of extraneous variables by using methods such as grouping, blocking and
balancing.
ANOVA is a test of the homogeneity of a set of data. It is defined as the separation of the variance ascribable to
one group of causes from the variance ascribable to other groups.
It enables us to find the variability due to each factor, and by comparing these variations the homogeneity of the
observations may be tested, i.e. whether all the observations are drawn from the same normal population.
Assumptions for ANOVA test
• The individual samples are drawn randomly from the population
• The populations from which the samples are drawn have the same variance
• The sampled population is normal
• Experimental errors are homogeneous and independent
Experimental Error
The unexplained random part of the variation in any experiment is termed as experimental error. An estimate of
experimental error can be obtained by replication.
Basic Designs of Experiments

The basic designs of experiments are
(i) One way classification (Completely Randomized Design)
(ii) Two way classification (Randomized Block Design)
Completely Randomized Design
The term CRD or one way classification refers to the fact that a single factor of interest is controlled
and its effect on the elementary units is observed. Suppose we wish to compare h treatments (say
manures) and there are n plots available for the experiment. Let the $i$-th treatment be replicated $n_i$ times, so
that $n_1 + n_2 + ... + n_h = n$.
In this design the treatments are assigned to the experimental units at random, the units being divided into
groups at random as follows.
The plots are numbered from 1 to n serially. n identical cards are taken, numbered from 1 to n and shuffled
thoroughly. The numbers on the first $n_1$ cards drawn at random give the numbers of the plots to which the first
treatment is to be given. The numbers on the next $n_2$ cards drawn at random give the numbers of the plots to which
the second treatment is to be given, and so on.
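The card-shuffling procedure described above can be sketched in Python as follows. The function name `completely_randomized_layout`, the treatment labels and the replication counts are hypothetical and serve only as an illustration; shuffling the list of plot numbers plays the role of shuffling the cards.

```python
import random

def completely_randomized_layout(replications, seed=None):
    """Assign plots 1..n to treatments at random.

    replications maps each treatment label to its number of plots n_i.
    Returns a dict mapping each treatment label to a sorted list of plot numbers.
    """
    rng = random.Random(seed)
    n = sum(replications.values())
    cards = list(range(1, n + 1))   # one card per plot
    rng.shuffle(cards)              # shuffle the cards thoroughly

    layout = {}
    start = 0
    for treatment, n_i in replications.items():
        # The next n_i cards give the plots for this treatment.
        layout[treatment] = sorted(cards[start:start + n_i])
        start += n_i
    return layout

# Hypothetical example: three treatments replicated 4, 3 and 5 times on 12 plots.
replications = {"T1": 4, "T2": 3, "T3": 5}
layout = completely_randomized_layout(replications, seed=1)
for treatment, plots in layout.items():
    print(treatment, plots)
```

Fixing the seed makes the layout reproducible; leaving it as `None` gives a fresh random allocation on each run.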
This design is called a Completely Randomized Design. It is used only when the number of treatments is
small and the experimental material is homogeneous.

For example, in agricultural field experiments where several varieties of wheat are to be tested on each of several
blocks of land, it is necessary to assign the varieties at random to the several plots in each block.
One Way Classification
Here the data are classified on the basis of one criterion as follows
Treatment     Values
1             $x_{11}$  $x_{12}$  ......  $x_{1i}$  ......  $x_{1 n_1}$
2             $x_{21}$  $x_{22}$  ......  $x_{2i}$  ......  $x_{2 n_2}$
:             :
k             $x_{k1}$  $x_{k2}$  ......  $x_{ki}$  ......  $x_{k n_k}$
Then $\sum_{i=1}^{k} n_i = N$
Here we wish to test the null hypothesis that there is no significant difference between the treatments
under consideration, i.e. $H_0 : \mu_1 = \mu_2 = ..... = \mu_k$. The alternative hypothesis $H_1$ is that
the treatment means $\mu_1, \mu_2, ....., \mu_k$ are not all equal.
Computational formulae for the various sums of squares:
Total sum of squares $V = \sum_i \sum_j x_{ij}^2 - \frac{T^2}{N}$, where $T = \sum_i \sum_j x_{ij}$
Sum of squares between samples $V_1 = \sum_i \frac{T_i^2}{n_i} - \frac{T^2}{N}$, where $T_i$ is the total of the $i$-th treatment
Sum of squares within samples (Error) $V_2 = V - V_1$
ANOVA table
Source of     Sum of      Degrees of    Mean square                  Calculated F
variation     squares     freedom
Treatment     $V_1$       $k - 1$       $S_T^2 = \frac{V_1}{k - 1}$  $F = \frac{S_T^2}{S_E^2}$  (if $S_T^2 > S_E^2$)
Error         $V_2$       $N - k$       $S_E^2 = \frac{V_2}{N - k}$
Total         $V$         $N - 1$
Here the calculated ratio follows the F distribution with $(k - 1, N - k)$ degrees of freedom. If $S_E^2 > S_T^2$, the
ratio is taken as $F = \frac{S_E^2}{S_T^2}$ with $(N - k, k - 1)$ degrees of freedom. If the calculated
value of F is less than the tabulated value, then the null hypothesis is accepted. Otherwise it is rejected.

1. Write two advantages of completely randomized experimental design.
• CRD results in the maximum use of the experimental units since all the experimental materials can
be used.
• The design is very flexible and easy to layout
• Any number of replicates and treatments may be used
• It provides with the maximum number of degrees of freedom
• It is most useful for laboratory techniques and methodological studies
2. What are the basic elements of an ANOVA table for one way classification?
ANOVA table
Source of     Sum of      Degrees of    Mean square                   Calculated F
variation     squares     freedom
Treatment     SST         $k - 1$       $S_T^2 = \frac{SST}{k - 1}$   $F = \frac{S_T^2}{S_E^2}$  (if $S_T^2 > S_E^2$)
Error         SSE         $N - k$       $S_E^2 = \frac{SSE}{N - k}$
Total                     $N - 1$
Solved Problems
3. The following table gives the yields of 15 sample plots under three varieties of seed.
A 20 21 23 16 20
B 18 20 17 15 25
C 25 28 22 28 32
Test using analysis of variance whether there is a significant difference in the average yield of seeds.
This is one way classification. Let us tabulate the data:
Varieties          Plots
of seeds     1    2    3    4    5     Total    $x_1^2$   $x_2^2$   $x_3^2$   $x_4^2$   $x_5^2$
A            20   21   23   16   20    100      400       441       529       256       400
B            18   20   17   15   25    95       324       400       289       225       625
C            25   28   22   28   32    135      625       784       484       784       1024
Total                                  330      1349      1625      1302      1265      2049
$H_0$ : There is no difference between the varieties of seeds in respect of yield.
$H_1$ : There is a significant difference between the varieties of seeds in respect of yield.
Step 1. Number of data N = 15
Step 2. Total T = 330
Step 3. Correction Factor $= \frac{T^2}{N} = \frac{(330)^2}{15} = 7260$
Step 4. Total Sum of Squares $V = \sum x_1^2 + \sum x_2^2 + \sum x_3^2 + \sum x_4^2 + \sum x_5^2 - \frac{T^2}{N}$
= 1349 + 1625 + 1302 + 1265 + 2049 − 7260
= 330
Step 5. Sum of Squares between varieties of seeds $V_1 = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} - \frac{T^2}{N}$
$V_1 = \frac{(100)^2}{5} + \frac{(95)^2}{5} + \frac{(135)^2}{5} - 7260$
= 190
Step 6. Sum of Squares within varieties of seeds $V_2 = V - V_1 = 330 - 190 = 140$
Step 7. ANOVA table
Source of             Sum of     Degrees of           Mean square              Calculated F                 Table value of F
variation             squares    freedom                                                                    at 5% level
Between varieties     190        k − 1 = 3 − 1 = 2    $\frac{190}{2} = 95$     $\frac{95}{11.67} = 8.14$    $F_{0.05}(2, 12) = 3.89$
of seeds
Within varieties      140        N − k = 15 − 3 = 12  $\frac{140}{12} = 11.67$
of seeds
Total                 330        N − 1 = 14
Step 8. Conclusion : Here the calculated value is greater than the tabulated value.
Therefore the null hypothesis is rejected, i.e. there is a significant difference between the varieties of seeds in respect
of yield.
4. The accompanying data resulted from an experiment comparing the degree of soiling for fabric
copolymerized with 3 different mixtures of methacrylic acid. Analyse the classification.
Mixture 1 0.56 1.12 0.90 1.07 0.94
Mixture 2 0.72 0.69 0.87 0.78 0.91
Mixture 3 0.62 1.08 1.07 0.99 0.93
This is one way classification. Let us tabulate the data:
             Degree of Soiling
Mixture    1      2      3      4      5       Total    $x_1^2$   $x_2^2$   $x_3^2$   $x_4^2$   $x_5^2$
M1         0.56   1.12   0.90   1.07   0.94    4.59     0.314     1.254     0.810     1.145     0.884
M2         0.72   0.69   0.87   0.78   0.91    3.97     0.518     0.476     0.757     0.608     0.828
M3         0.62   1.08   1.07   0.99   0.93    4.69     0.384     1.166     1.145     0.980     0.865
Total                                          13.25    1.216     2.897     2.712     2.733     2.577

$H_0$ : There is no difference between the degree of soiling with respect to the mixtures
$H_1$ : There is a significant difference between the degree of soiling with respect to the mixtures
Step 1. Number of data N = 15
Step 2. Total T = 13.25
Step 3. Correction Factor $= \frac{T^2}{N} = \frac{(13.25)^2}{15} = 11.704$
Step 4. Total Sum of Squares $V = \sum x_1^2 + \sum x_2^2 + \sum x_3^2 + \sum x_4^2 + \sum x_5^2 - \frac{T^2}{N}$
= 1.216 + 2.897 + 2.712 + 2.733 + 2.577 − 11.704
= 0.431
Step 5. Sum of Squares between mixtures $V_1 = \frac{T_1^2}{n_1} + \frac{T_2^2}{n_2} + \frac{T_3^2}{n_3} - \frac{T^2}{N}$
$V_1 = \frac{4.59^2}{5} + \frac{3.97^2}{5} + \frac{4.69^2}{5} - 11.704$
= 0.061
Step 6. Sum of Squares within mixtures $V_2 = V - V_1 = 0.431 - 0.061 = 0.370$
Step 7. ANOVA table
Source of           Sum of     Degrees of             Mean square                   Calculated F                        Table value of F
variation           squares    freedom                                                                                  at 5% level
Between mixtures    0.061      k − 1 = 3 − 1 = 2      $\frac{0.061}{2} = 0.0305$    $\frac{0.0308}{0.0305} = 1.01$      $F_{0.05}(12, 2) = 19.41$
Within mixtures     0.370      N − k = 15 − 3 = 12    $\frac{0.370}{12} = 0.0308$
Total               0.431      N − 1 = 14
Step 8. Conclusion : Here the calculated value is less than the tabulated value.
Therefore the null hypothesis is accepted, i.e. there is no difference between the degree of soiling with
respect to the mixtures
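The one way computation for the soiling data can be reproduced with a short function. This is a sketch, not a library routine: the name `one_way_anova` and the returned keys are hypothetical, and the function follows the convention of these notes of putting the larger mean square in the numerator of F.

```python
def one_way_anova(groups):
    """Sums of squares and F ratio for a one way classification.

    groups is a list of samples, one list of observations per treatment.
    """
    N = sum(len(g) for g in groups)
    k = len(groups)
    T = sum(sum(g) for g in groups)
    cf = T ** 2 / N                                        # correction factor T^2 / N

    V = sum(x * x for g in groups for x in g) - cf         # total sum of squares
    V1 = sum(sum(g) ** 2 / len(g) for g in groups) - cf    # between treatments
    V2 = V - V1                                            # within treatments (error)

    ms_treatment = V1 / (k - 1)
    ms_error = V2 / (N - k)

    # Larger mean square in the numerator; degrees of freedom follow the same order.
    if ms_treatment >= ms_error:
        F, dof = ms_treatment / ms_error, (k - 1, N - k)
    else:
        F, dof = ms_error / ms_treatment, (N - k, k - 1)
    return {"V": V, "V1": V1, "V2": V2, "F": F, "dof": dof}

# Degree of soiling for the three mixtures (from the problem statement).
mixtures = [
    [0.56, 1.12, 0.90, 1.07, 0.94],
    [0.72, 0.69, 0.87, 0.78, 0.91],
    [0.62, 1.08, 1.07, 0.99, 0.93],
]
result = one_way_anova(mixtures)
print({key: (round(val, 4) if isinstance(val, float) else val) for key, val in result.items()})
```

Because the function accepts samples of unequal size, it can be reused for any one way classification in this unit.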

Randomised Block Design
Consider an agricultural experiment in which we wish to test the effect of k fertilizing treatments on the yield
of a crop. We assume that the soil fertility of the plots is known. Then we divide the plots into h blocks according
to the soil fertility, each block containing k plots. Thus the plots in each block will be of homogeneous fertility.
Within each block, the k treatments are given to the k plots in a perfectly random manner, such that each
treatment occurs only once in any block. But the same k treatments are repeated from block to block. This
design is called Randomized Block Design.
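The allocation rule of this design, where every treatment appears exactly once in each block in a random order, can be sketched as below. The function name `randomized_block_layout` and the fertilizer labels are hypothetical; the only assumption is that each block has as many plots as there are treatments.

```python
import random

def randomized_block_layout(treatments, n_blocks, seed=None):
    """Return one random ordering of the treatments for each block."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        order = list(treatments)   # every treatment occurs once in the block
        rng.shuffle(order)         # random order of plots within the block
        layout.append(order)
    return layout

# Hypothetical example: k = 3 fertilizing treatments in h = 4 blocks.
treatments = ["F1", "F2", "F3"]
layout = randomized_block_layout(treatments, n_blocks=4, seed=7)
for block_number, order in enumerate(layout, start=1):
    print(f"Block {block_number}: {order}")
```

The randomization is restricted to within each block, which is what distinguishes this layout from a completely randomized one.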
Advantages of RBD
This is more accurate than the completely randomized design
Any number of treatments and any number of replicates may be used
Statistical analysis is simple and fast.
Note: It is not suitable (i) for a large number of treatments (ii) if the blocks are not homogeneous
Comparison of RBD and CRD

RBD is more efficient than CRD
The experimental error of RBD is much smaller than that of CRD
In RBD, treatments are allocated at random within the units of each stratum (block), but this stratification is not done in CRD.
RBD is more flexible than CRD because there are no restrictions on the number of treatments or replications
Two Way Classification
The data collected from experiments with RBD form a two way classification, i.e. they are classified according to
two factors, say blocks ( r ) and treatments ( k ).
Here the data are classified on the basis of two criteria as follows
                 Treatments
Blocks     1          2          ......   j          ......   k
1          $x_{11}$   $x_{12}$   ......   $x_{1j}$   ......   $x_{1k}$
2          $x_{21}$   $x_{22}$   ......   $x_{2j}$   ......   $x_{2k}$
:          :
r          $x_{r1}$   $x_{r2}$   ......   $x_{rj}$   ......   $x_{rk}$
Then $rk = N$
Here we wish to test the null hypotheses that there is no significant difference between the blocks and
no significant difference between the treatments under consideration, i.e.
$H_{01}$ : the r block means are equal
and $H_{02}$ : the k treatment means are equal
Computational formulae for the various sums of squares:
Total sum of squares $V = \sum_i \sum_j x_{ij}^2 - \frac{T^2}{N}$, where $T = \sum_i \sum_j x_{ij}$
Sum of squares between blocks $V_1 = \sum_i \frac{T_i^2}{k} - \frac{T^2}{N}$, where $T_i$ is the total of the $i$-th block (k observations)
Sum of squares between treatments $V_2 = \sum_j \frac{T_j^2}{r} - \frac{T^2}{N}$, where $T_j$ is the total of the $j$-th treatment (r observations)
Sum of squares within samples (Error) $V_3 = V - V_1 - V_2$
ANOVA table
Source of     Sum of     Degrees of          Mean square                               Calculated F
variation     squares    freedom
Blocks        $V_1$      $r - 1$             $S_R^2 = \frac{V_1}{r - 1}$               $F_1 = \frac{S_R^2}{S_E^2}$
Treatments    $V_2$      $k - 1$             $S_C^2 = \frac{V_2}{k - 1}$               $F_2 = \frac{S_C^2}{S_E^2}$
Error         $V_3$      $(r - 1)(k - 1)$    $S_E^2 = \frac{V_3}{(r - 1)(k - 1)}$
Total         $V$        $N - 1$
Here the calculated ratios $F_1$, $F_2$ follow the F distribution with $(r - 1, (r - 1)(k - 1))$ and
$(k - 1, (r - 1)(k - 1))$ degrees of freedom respectively. If $S_E^2$ exceeds the mean square it is compared with, the
ratio is inverted so that the larger mean square is in the numerator, and the degrees of freedom are interchanged.
If the calculated value of F is less than the tabulated value, then the null hypothesis is accepted. Otherwise
it is rejected.
Thus a two way analysis is used to measure how two independent variables (factors), in combination, affect a
dependent variable. For example the agricultural output may be classified on the basis of different
varieties of seeds and also on the basis of different varieties of fertilizers used.
Solved Problems
1. Perform a two-way ANOVA on the data given below:
                     Treatment-I
                     1     2     3
Treatment-II   1     30    26    38
               2     24    29    28
               3     33    24    35
               4     36    31    30
               5     27    35    33
Use the coding method, subtracting 30 from the given numbers.
This is two way classification. Calculation table (30 is subtracted from all the values):
                     Treatment-I          Row Total
Treatment-II         1     2     3        $T_R$       $x_1^2$   $x_2^2$   $x_3^2$
1                    0     −4    8        4           0         16        64
2                    −6    −1    −2       −9          36        1         4
3                    3     −6    5        2           9         36        25
4                    6     1     0        7           36        1         0
5                    −3    5     3        5           9         25        9
Column Total $T_C$   0     −5    14       9           90        79        102
$\sum \sum x_{ij}^2 = 90 + 79 + 102 = 271$
$H_{01}$ : There is no difference between the levels of Treatment-II.
$H_{02}$ : There is no difference between the levels of Treatment-I.
Step 1. Number of data N = 15
Step 2. Total T = 9
Step 3. Correction Factor $= \frac{T^2}{N} = \frac{(9)^2}{15} = 5.4$
Step 4. Total Sum of Squares $V = \sum \sum x_{ij}^2 - \frac{T^2}{N} = 271 - 5.4 = 265.6$
Step 5. Sum of Squares between Treatment-II
$V_1 = \frac{T_{R1}^2}{n_1} + \frac{T_{R2}^2}{n_2} + \frac{T_{R3}^2}{n_3} + \frac{T_{R4}^2}{n_4} + \frac{T_{R5}^2}{n_5} - \frac{T^2}{N}$
$V_1 = \frac{(4)^2}{3} + \frac{(-9)^2}{3} + \frac{2^2}{3} + \frac{7^2}{3} + \frac{5^2}{3} - 5.4 = 52.9$
Step 6. Sum of Squares between Treatment-I $V_2 = \frac{T_{C1}^2}{n_1} + \frac{T_{C2}^2}{n_2} + \frac{T_{C3}^2}{n_3} - \frac{T^2}{N}$
$V_2 = \frac{0^2}{5} + \frac{(-5)^2}{5} + \frac{(14)^2}{5} - 5.4 = 38.8$
Step 7. Error Sum of Squares $V_3 = V - V_1 - V_2 = 265.6 - 52.9 - 38.8 = 173.9$
Step 8. ANOVA table
Source of               Sum of     Degrees of             Mean square                  Calculated F                     Table value of F
variation               squares    freedom                                                                             at 5% level
Between Treatment-II    52.9       m − 1 = 5 − 1 = 4      $\frac{52.9}{4} = 13.22$     $\frac{21.7}{13.22} = 1.64$      $F_{0.05}(8, 4) = 6.04$
Between Treatment-I     38.8       n − 1 = 3 − 1 = 2      $\frac{38.8}{2} = 19.4$      $\frac{21.7}{19.4} = 1.12$       $F_{0.05}(8, 2) = 19.37$
Error                   173.9      (m − 1)(n − 1) = 8     $\frac{173.9}{8} = 21.7$
Total                   265.6      N − 1 = 14
Step 9. Considering the difference between the levels of Treatment-II, we find that the calculated value of F = 1.64 <
tabulated value of $F_{5\%} = 6.04$, so we accept $H_{01}$ (the levels of Treatment-II do not differ significantly).
Considering the difference between the levels of Treatment-I, we find that the calculated value of F = 1.12 < tabulated
value of $F_{5\%} = 19.37$, so we accept $H_{02}$ (the levels of Treatment-I do not differ significantly).
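The sums of squares of this problem, and the fact that the coding method (subtracting 30) leaves them unchanged, can be verified with the sketch below. The function name `two_way_sums_of_squares` is hypothetical; rows are the levels of Treatment-II and columns are the levels of Treatment-I.

```python
def two_way_sums_of_squares(table):
    """Return (V, V_rows, V_cols, V_error) for a two way table with one value per cell."""
    r = len(table)          # number of rows
    k = len(table[0])       # number of columns
    N = r * k
    T = sum(sum(row) for row in table)
    cf = T ** 2 / N                                             # correction factor T^2 / N

    V = sum(x * x for row in table for x in row) - cf           # total sum of squares
    V_rows = sum(sum(row) ** 2 for row in table) / k - cf       # each row total has k values
    V_cols = sum(sum(col) ** 2 for col in zip(*table)) / r - cf # each column total has r values
    V_error = V - V_rows - V_cols
    return V, V_rows, V_cols, V_error

# Original data: rows are Treatment-II levels 1..5, columns are Treatment-I levels 1..3.
raw = [
    [30, 26, 38],
    [24, 29, 28],
    [33, 24, 35],
    [36, 31, 30],
    [27, 35, 33],
]
coded = [[x - 30 for x in row] for row in raw]   # coding method: subtract 30

raw_result = two_way_sums_of_squares(raw)
coded_result = two_way_sums_of_squares(coded)
print([round(v, 2) for v in raw_result])
print([round(v, 2) for v in coded_result])
```

Both calls print the same four values, which is why subtracting a constant is a safe way to simplify the hand calculation.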
2. Three varieties of coal were analysed by 4 chemists and the ash content is tabulated
here. Perform an analysis of variance.
              Chemists
Coal     A    B    C    D
I        8    5    5    7
II       7    6    4    4
III      3    6    5    4
This is two way classification. Calculation table.
                      Chemists              Row Total
Coal                  A     B     C     D   $T_R$       $x_1^2$   $x_2^2$   $x_3^2$   $x_4^2$
I                     8     5     5     7   25          64        25        25        49
II                    7     6     4     4   21          49        36        16        16
III                   3     6     5     4   18          9         36        25        16
Column Total $T_C$    18    17    14    15  64          122       97        66        81
$\sum \sum x_{ij}^2 = 122 + 97 + 66 + 81 = 366$
$H_{01}$ : There is no difference between coals with respect to ash content.

$H_{02}$ : There is no difference between chemists with respect to ash content.

Step 1. Number of data N = 12
Step 2. Total T = 64
Step 3. Correction Factor $= \frac{T^2}{N} = \frac{(64)^2}{12} = 341.33$
Step 4. Total Sum of Squares $V = \sum \sum x_{ij}^2 - \frac{T^2}{N} = 366 - 341.33 = 24.67$
Step 5. Sum of Squares between coals $V_1 = \frac{T_{R1}^2}{n_1} + \frac{T_{R2}^2}{n_2} + \frac{T_{R3}^2}{n_3} - \frac{T^2}{N}$
$V_1 = \frac{(25)^2}{4} + \frac{21^2}{4} + \frac{18^2}{4} - 341.33 = 6.17$
Step 6. Sum of Squares between chemists $V_2 = \frac{T_{C1}^2}{n_1} + \frac{T_{C2}^2}{n_2} + \frac{T_{C3}^2}{n_3} + \frac{T_{C4}^2}{n_4} - \frac{T^2}{N}$
$V_2 = \frac{18^2}{3} + \frac{17^2}{3} + \frac{(14)^2}{3} + \frac{(15)^2}{3} - 341.33 = 3.33$
Step 7. Error Sum of Squares $V_3 = V - V_1 - V_2 = 24.67 - 6.17 - 3.33 = 15.17$
Step 8. ANOVA table
Source of           Sum of     Degrees of             Mean square                  Calculated F                    Table value of F
variation           squares    freedom                                                                            at 5% level
Between Coals       6.17       m − 1 = 3 − 1 = 2      $\frac{6.17}{2} = 3.08$      $\frac{3.08}{2.53} = 1.22$      $F_{0.05}(2, 6) = 5.14$
Between Chemists    3.33       n − 1 = 4 − 1 = 3      $\frac{3.33}{3} = 1.11$      $\frac{2.53}{1.11} = 2.28$      $F_{0.05}(6, 3) = 8.94$
Error               15.17      (m − 1)(n − 1) = 6     $\frac{15.17}{6} = 2.53$
Total               24.67      N − 1 = 11
Step 9. Considering the difference between coals, we find that the calculated value of F = 1.22 < tabulated
value of $F_{5\%} = 5.14$, so we accept $H_{01}$ (the coals do not differ significantly).
Considering the difference between chemists, we find that the calculated value of F = 2.28 < tabulated value of
$F_{5\%} = 8.94$, so we accept $H_{02}$ (the chemists do not differ significantly).

3. The following data represent the time taken by a certain person to get to work from Monday to Friday by
four different routes.
                 Days
Routes     Mon   Tue   Wed   Thu   Fri
1          22    26    25    25    31
2          25    27    28    26    29
3          26    29    33    30    33
4          26    28    27    30    30
Test at 5% level of significance whether the differences among the means obtained for the different
routes are significant and also whether the differences among the means obtained for the different
days of the week are significant.
This is two way classification. Let us arrange the data by subtracting 26 from each value.
                      Days                          Row Total
Routes                1     2     3     4     5     $T_R$       $x_1^2$   $x_2^2$   $x_3^2$   $x_4^2$   $x_5^2$
1                     −4    0     −1    −1    5     −1          16        0         1         1         25
2                     −1    1     2     0     3     5           1         1         4         0         9
3                     0     3     7     4     7     21          0         9         49        16        49
4                     0     2     1     4     4     11          0         4         1         16        16
Column Total $T_C$    −5    6     9     7     19    36          17        14        55        33        99
$\sum \sum x_{ij}^2 = 17 + 14 + 55 + 33 + 99 = 218$
$H_{01}$ : There is no difference between routes with respect to the time taken.

$H_{02}$ : There is no difference between days with respect to the time taken.

Step 1. Number of data N = 20
Step 2. Total T = 36
Step 3. Correction Factor $= \frac{T^2}{N} = \frac{(36)^2}{20} = 64.8$
Step 4. Total Sum of Squares $V = \sum \sum x_{ij}^2 - \frac{T^2}{N} = 218 - 64.8 = 153.2$
Step 5. Sum of Squares between routes $V_1 = \frac{T_{R1}^2}{n_1} + \frac{T_{R2}^2}{n_2} + \frac{T_{R3}^2}{n_3} + \frac{T_{R4}^2}{n_4} - \frac{T^2}{N}$
$V_1 = \frac{(-1)^2}{5} + \frac{5^2}{5} + \frac{21^2}{5} + \frac{11^2}{5} - 64.8 = 52.8$
Step 6. Sum of Squares between days $V_2 = \frac{T_{C1}^2}{n_1} + \frac{T_{C2}^2}{n_2} + \frac{T_{C3}^2}{n_3} + \frac{T_{C4}^2}{n_4} + \frac{T_{C5}^2}{n_5} - \frac{T^2}{N}$
$V_2 = \frac{(-5)^2}{4} + \frac{6^2}{4} + \frac{9^2}{4} + \frac{7^2}{4} + \frac{19^2}{4} - 64.8 = 73.2$
Step 7. Error Sum of Squares $V_3 = V - V_1 - V_2 = 153.2 - 52.8 - 73.2 = 27.2$
Step 8. ANOVA table
Source of         Sum of     Degrees of              Mean square                   Calculated F                     Table value of F
variation         squares    freedom                                                                               at 5% level
Between Routes    52.8       m − 1 = 4 − 1 = 3       $\frac{52.8}{3} = 17.6$       $\frac{17.6}{2.267} = 7.76$      $F_{0.05}(3, 12) = 3.49$
Between Days      73.2       n − 1 = 5 − 1 = 4       $\frac{73.2}{4} = 18.3$       $\frac{18.3}{2.267} = 8.07$      $F_{0.05}(4, 12) = 3.26$
Error             27.2       (m − 1)(n − 1) = 12     $\frac{27.2}{12} = 2.267$
Total             153.2      N − 1 = 19
Step 9. Considering the difference between routes, we find that the calculated value of F = 7.76 >
tabulated value of $F_{5\%} = 3.49$, so we reject $H_{01}$ (the routes differ significantly).
Considering the difference between days, we find that the calculated value of F = 8.07 > tabulated value of
$F_{5\%} = 3.26$, so we reject $H_{02}$ (the days differ significantly).
4. The following table gives the number of refrigerators sold by 4 salesmen in 3 months May, June, July.
Month     Salesman
          1     2     3     4
May       50    40    48    39
June      46    48    50    45
July      39    44    40    39
Is there a significant difference in the sales made by the 4 salesmen?

Is there a significant difference in the sales during the different months?

This is two way classification. Let us arrange the data by subtracting 40 from each value.
                      Salesman                 Row Total
Month                 1     2     3     4      $T_R$       $x_1^2$   $x_2^2$   $x_3^2$   $x_4^2$
May                   10    0     8     −1     17          100       0         64        1
June                  6     8     10    5      29          36        64        100       25
July                  −1    4     0     −1     2           1         16        0         1
Column Total $T_C$    15    12    18    3      48          137       80        164       27
$\sum \sum x_{ij}^2 = 137 + 80 + 164 + 27 = 408$
$H_{01}$ : There is no difference between months with respect to sales.

$H_{02}$ : There is no difference between salesmen with respect to sales.
Step 1. Number of data N = 12
Step 2. Total T = 48
Step 3. Correction Factor $= \frac{T^2}{N} = \frac{(48)^2}{12} = 192$
Step 4. Total Sum of Squares $V = \sum \sum x_{ij}^2 - \frac{T^2}{N} = 408 - 192 = 216$
Step 5. Sum of Squares between months $V_1 = \frac{T_{R1}^2}{n_1} + \frac{T_{R2}^2}{n_2} + \frac{T_{R3}^2}{n_3} - \frac{T^2}{N}$
$V_1 = \frac{17^2}{4} + \frac{29^2}{4} + \frac{2^2}{4} - 192 = 283.5 - 192$
= 91.5
Step 6. Sum of Squares between salesmen $V_2 = \frac{T_{C1}^2}{n_1} + \frac{T_{C2}^2}{n_2} + \frac{T_{C3}^2}{n_3} + \frac{T_{C4}^2}{n_4} - \frac{T^2}{N}$
$V_2 = \frac{15^2}{3} + \frac{12^2}{3} + \frac{18^2}{3} + \frac{3^2}{3} - 192 = 234 - 192 = 42$
Step 7. Error Sum of Squares $V_3 = V - V_1 - V_2 = 216 - 91.5 - 42 = 82.5$
Step 8. ANOVA table
Source of           Sum of     Degrees of             Mean square                    Calculated F                      Table value of F
variation           squares    freedom                                                                                at 5% level
Between Months      91.5       m − 1 = 3 − 1 = 2      $\frac{91.5}{2} = 45.75$       $\frac{45.75}{13.75} = 3.33$      $F_{0.05}(2, 6) = 5.14$
Between Salesmen    42         n − 1 = 4 − 1 = 3      $\frac{42}{3} = 14$            $\frac{14}{13.75} = 1.02$         $F_{0.05}(3, 6) = 4.76$
Error               82.5       (m − 1)(n − 1) = 6     $\frac{82.5}{6} = 13.75$
Total               216        N − 1 = 11
Step 9. Considering the difference between months, we find that the calculated value of F = 3.33 <
tabulated value of $F_{5\%} = 5.14$, so we accept $H_{01}$ (the months do not differ significantly).
Considering the difference between salesmen, we find that the calculated value of F = 1.02 < tabulated value
of $F_{5\%} = 4.76$, so we accept $H_{02}$ (the salesmen do not differ significantly).
5. Perform ANOVA and test at 0.05 level of significance whether there are differences in
the detergents or in the engines for the given data.
                Engine
Detergent    A     B     C
I            45    31    51
II           47    46    52
III          48    50    55
IV           42    37    49
This is two way classification. Let us arrange the data by subtracting 40 from each value.
                      Engine              Row Total
Detergent             A     B     C       $T_R$       $x_1^2$   $x_2^2$   $x_3^2$
I                     5     −9    11      7           25        81        121
II                    7     6     12      25          49        36        144
III                   8     10    15      33          64        100       225
IV                    2     −3    9       8           4         9         81
Column Total $T_C$    22    4     47      73          142       226       571
$\sum \sum x_{ij}^2 = 142 + 226 + 571 = 939$
$H_{01}$ : There is no difference between detergents with respect to washing.

$H_{02}$ : There is no difference between engines with respect to washing.

Step 1. Number of data N = 12
Step 2. Total T = 73
Step 3. Correction Factor $= \frac{T^2}{N} = \frac{(73)^2}{12} = 444.1$
Step 4. Total Sum of Squares $V = \sum \sum x_{ij}^2 - \frac{T^2}{N} = 939 - 444.1 = 494.9$
Step 5. Sum of Squares between detergents $V_1 = \frac{T_{R1}^2}{n_1} + \frac{T_{R2}^2}{n_2} + \frac{T_{R3}^2}{n_3} + \frac{T_{R4}^2}{n_4} - \frac{T^2}{N}$
$V_1 = \frac{7^2}{3} + \frac{25^2}{3} + \frac{33^2}{3} + \frac{8^2}{3} - 444.1 = 164.9$
Step 6. Sum of Squares between engines $V_2 = \frac{T_{C1}^2}{n_1} + \frac{T_{C2}^2}{n_2} + \frac{T_{C3}^2}{n_3} - \frac{T^2}{N}$
$V_2 = \frac{22^2}{4} + \frac{4^2}{4} + \frac{47^2}{4} - 444.1 = 233.15$
Step 7. Error Sum of Squares $V_3 = V - V_1 - V_2 = 494.9 - 164.9 - 233.15 = 96.85$
Step 8. ANOVA table
Source of             Sum of     Degrees of             Mean square                      Calculated F                      Table value of F
variation             squares    freedom                                                                                  at 5% level
Between Detergents    164.9      m − 1 = 4 − 1 = 3      $\frac{164.9}{3} = 54.96$        $\frac{54.96}{16.14} = 3.4$       $F_{0.05}(3, 6) = 4.76$
Between Engines       233.15     n − 1 = 3 − 1 = 2      $\frac{233.15}{2} = 116.58$      $\frac{116.58}{16.14} = 7.2$      $F_{0.05}(2, 6) = 5.14$
Error                 96.85      (m − 1)(n − 1) = 6     $\frac{96.85}{6} = 16.14$
Total                 494.9      N − 1 = 11
Step 9. Considering the difference between detergents, we find that the calculated value of F = 3.4 <
tabulated value of $F_{5\%} = 4.76$.
Therefore we accept $H_{01}$ (the detergents do not differ significantly).
Considering the difference between engines, we find that the calculated value of F = 7.2 > tabulated value of
$F_{5\%} = 5.14$.
Therefore we reject $H_{02}$ (the engines differ significantly).

6. Three varieties A, B, C of a crop are tested in a randomized block design with 4 replications. The plot
yields in pounds are as follows:
A 6    C 5    A 8     B 9
C 8    A 4    B 6     C 9
B 7    B 6    C 10    A 6
Analyze the experimental yield and state your conclusion.
$H_{01}$ : (the varieties of crops do not differ significantly with respect to yield)
$H_{02}$ : (the blocks do not differ significantly with respect to yield)
Rewriting the data such that the rows represent the blocks and the columns represent the varieties of
crops, we have
           Variety of Crops
Block      A     B     C
1          6     7     8
2          4     6     5
3          8     6     10
4          6     9     9
Calculation table (h = 4 blocks, k = 3 varieties):
                            Crops
Blocks                      A      B      C      $T_i$       $\frac{T_i^2}{k}$                 $\sum_j x_{ij}^2$
1                           6      7      8      21          $\frac{(21)^2}{3} = 147$          149
2                           4      6      5      15          $\frac{(15)^2}{3} = 75$           77
3                           8      6      10     24          $\frac{(24)^2}{3} = 192$          200
4                           6      9      9      24          $\frac{(24)^2}{3} = 192$          198
$T_j$                       24     28     32     T = 84      $\sum \frac{T_i^2}{k} = 606$      $\sum \sum x_{ij}^2 = 624$
$\frac{T_j^2}{h}$           144    196    256    $\sum \frac{T_j^2}{h} = 596$
$\sum_i x_{ij}^2$           152    202    270    $\sum \sum x_{ij}^2 = 624$
Correction Factor $= \frac{T^2}{N} = \frac{(84)^2}{12} = 588$

Total sum of squares $Q = \sum \sum x_{ij}^2 - \frac{T^2}{N} = 624 - 588 = 36$
Sum of squares between blocks $Q_1 = \sum \frac{T_i^2}{k} - \frac{T^2}{N} = 606 - 588 = 18$
Sum of squares between crops $Q_2 = \sum \frac{T_j^2}{h} - \frac{T^2}{N} = 596 - 588 = 8$
Error sum of squares $Q_3 = Q - Q_1 - Q_2 = 36 - 18 - 8 = 10$
ANOVA Table
Source of Variation        Sum of Squares    Degrees of Freedom      Mean Square               Calculated F Value
Between Rows (Blocks)      $Q_1 = 18$        h − 1 = 3               $\frac{18}{3} = 6$        $\frac{6}{1.67} = 3.6$
Between Columns (Crops)    $Q_2 = 8$         k − 1 = 2               $\frac{8}{2} = 4$         $\frac{4}{1.67} = 2.4$
Error                      $Q_3 = 10$        (h − 1)(k − 1) = 6      $\frac{10}{6} = 1.67$     -
Total                      $Q = 36$          hk − 1 = 11             -                         -
From the F-table, $F_{5\%}(v_1 = 3, v_2 = 6) = 4.76$ and $F_{5\%}(v_1 = 2, v_2 = 6) = 5.14$

Considering the difference between rows, we find that the calculated value of F = 3.6 < tabulated value of
$F_{5\%} = 4.76$, so we accept $H_{02}$ (the blocks do not differ significantly with respect to yield).
Considering the difference between columns, we find that the calculated value of F = 2.4 < tabulated value
of $F_{5\%} = 5.14$, so we accept $H_{01}$ (the varieties of crops do not differ significantly with respect to yield).
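The rearrangement of the field layout into a blocks by varieties table, followed by the sums of squares, can be sketched as below. The sketch assumes that each column of the printed layout is one block (replication), which is the reading that reproduces the rearranged table of this problem; the variable names are illustrative.

```python
# Field layout from the problem: each entry is (variety, yield in pounds).
layout = [
    [("A", 6), ("C", 5), ("A", 8), ("B", 9)],
    [("C", 8), ("A", 4), ("B", 6), ("C", 9)],
    [("B", 7), ("B", 6), ("C", 10), ("A", 6)],
]
varieties = ["A", "B", "C"]

# Assumption: each column of the layout is one block.
blocks = []
for column in zip(*layout):
    yields = dict(column)                       # variety -> yield within this block
    blocks.append([yields[v] for v in varieties])

h, k = len(blocks), len(varieties)              # 4 blocks, 3 varieties
N = h * k
T = sum(sum(row) for row in blocks)
cf = T ** 2 / N                                 # correction factor T^2 / N

Q = sum(x * x for row in blocks for x in row) - cf          # total sum of squares
Q1 = sum(sum(row) ** 2 for row in blocks) / k - cf          # between blocks
Q2 = sum(sum(col) ** 2 for col in zip(*blocks)) / h - cf    # between varieties
Q3 = Q - Q1 - Q2                                            # error

ms_error = Q3 / ((h - 1) * (k - 1))
F_blocks = (Q1 / (h - 1)) / ms_error
F_crops = (Q2 / (k - 1)) / ms_error

print(blocks)
print(Q, Q1, Q2, Q3, round(F_blocks, 2), round(F_crops, 2))
```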
7. Four air conditioning compressor designs were tested in four different regions of India. The test
was repeated by installing additional air conditioners in a second cooling season. The following are
the times to failure (to the nearest month) of each compressor tested.
                 Replicate 1 (Designs)       Replicate 2 (Designs)
Region           A     B     C     D         A     B     C     D
Northeast        58    35    72    61        49    24    60    64
Southeast        40    18    54    38        38    22    64    50
Northwest        63    44    81    52        59    16    60    48
Southwest        36    9     47    30        29    13    52    41
Test at the 0.05 level of significance whether the difference among the means determined for designs,
for regions, and for replicates are significant and for significance of the interaction between
compressor designs and regions.
Add the respective values and continue as a two way analysis.
