XII STD - Statistics English Medium
STATISTICS
• Statisticians
• Business Analyst
• Mathematician
• Professor
• Risk Analyst
• Data Analyst
• Content Analyst
• Statistics Trainer
• Data Scientist
• Consultant
• Biostatistician
• Econometrician

• Census
• Ecology
• Medicine
• Election
• Crime
• Economics
• Education
• Film
• Sports
• Tourism
• Strong Foundation in Mathematical Statistics
• Logical Thinking & Ability to Comprehend Key Facts
• Ability to Interact with people from various fields to understand the problems
• Strong Background in Statistical Computing
• Ability to stay updated on recent literature & statistical software
• Versatility in solving problems
1 TESTS OF SIGNIFICANCE – BASIC CONCEPTS AND LARGE SAMPLE TESTS
Jerzy Neyman (1894-1981) was born into a Polish family in Russia. He is one of the principal architects of Modern Statistics. He developed the idea of confidence interval estimation during 1937. He also contributed to other branches of Statistics, which include Design of Experiments, Theory of Sampling and Contagious Distributions. He established the Department of Statistics in University of California at Berkeley, which is one of the preeminent centres for statistical research worldwide.
Egon Sharpe Pearson (1895-1980) was the son of Prof. Karl Pearson. He was the Editor of Biometrika, which is still one of the premier journals in Statistics. He was instrumental in publishing the two volumes of Biometrika Tables for Statisticians, which has been a significant contribution to the world of Statistical Data Analysis till the invention of modern computing facilities.
Neyman and Pearson worked together for about a decade, from 1928 to 1938, and developed the
theory of testing statistical hypotheses. The Neyman-Pearson Fundamental Lemma is a milestone
work, which forms the basis for the present theory of testing statistical hypotheses. In spite
of severe criticism of their theory, in those days, by the leading authorities, especially
Prof. R.A. Fisher, their theory survived and is currently in use.
“Statistics is the servant to all sciences” – Jerzy Neyman
LEARNING OBJECTIVES
Recall: The unknown constants which appear in the probability density function or probability
mass function of the random variable X, are also called parameters of the corresponding
distribution/population.
The parameters are commonly denoted by Greek letters. In Statistical Inference, some or
all the parameters of a population are assumed to be unknown.
Statistic: Any statistical quantity calculated on the basis of the random sample is called
a statistic. The sample mean, sample standard deviation, sample proportion etc., are called
statistics (plural form of statistic). They will be denoted by Roman letters.
Let (x1, x2, …, xn) be an observed value of (X1, X2, ..., Xn). The collection of all possible values of
(x1, x2, …, xn) is known as the sample space, which will be denoted by ‘S’.
Note 1:
A set of n sample observations can be made on X, say, x1, x2, …, xn for making inferences on the unknown
parameters. It is to be noted that these n values may vary from sample to sample. Thus, these values can
be considered as the realizations of the random variables X1, X2, ..., Xn, which are assumed to be
independent and have the same distribution as that of X. These are also called independently and
identically distributed (iid) random variables.
NOTE: The statistic itself is a random variable and has a probability distribution.
Note 2:
In Statistical Inference, the sample standard deviation is defined as
$$S = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2},$$
where $\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i$. It may be noted that the divisor is n – 1 instead of n.
Note 3:
The statistic itself is a random variable, until the numerical values of X1, X2, ..., Xn are observed,
and hence it has a probability distribution.
Notations to denote various population parameters and their corresponding sample
statistics are listed in Table 1.1. The notations will be used in the first four chapters of this book
with the same meaning for the sake of uniformity.
Table 1.1 Notations for Parameters and Statistics
The set of pairs (x1, x2) listed in column 2 constitute the sample space of samples of size 2 each.
Hence, the sample space is:
S = {(4,4), (4,8), (4,12), (4,16), (8,4), (8,8), (8,12), (8,16), (12,4), (12,8), (12,12), (12,16),
(16,4), (16,8), (16,12), (16,16)}
The sampling distribution of X, the sample mean, is determined and is presented in Table 1.3.
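The construction of this sampling distribution can be sketched in a few lines of Python. This is only an illustration, not the textbook's table: it assumes the population values 4, 8, 12 and 16 that appear in the sample space S above, treats all 16 ordered pairs as equally likely (sampling with replacement), and uses variable names chosen here for convenience.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 8, 12, 16]  # values appearing in the sample space S

# All ordered samples of size 2 drawn with replacement (16 equally likely pairs).
samples = list(product(population, repeat=2))

# Tally the sample mean of every pair and convert the counts to probabilities.
counts = Counter(Fraction(x1 + x2, 2) for x1, x2 in samples)
sampling_dist = {
    mean: Fraction(count, len(samples)) for mean, count in sorted(counts.items())
}

for mean, prob in sampling_dist.items():
    print(f"x-bar = {mean}: probability = {prob}")

# Mean and variance of the sampling distribution of the sample mean.
mean_of_xbar = sum(m * p for m, p in sampling_dist.items())
var_of_xbar = sum((m - mean_of_xbar) ** 2 * p for m, p in sampling_dist.items())
print("E(X-bar) =", mean_of_xbar, "and V(X-bar) =", var_of_xbar)
```

Exact fractions are used so that the probabilities add up to 1 without rounding error.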
Note 4: The sample obtained under sampling with replacement from a finite population satisfies
the conditions for a random sample as described earlier.
Note 5: If the sample values are selected under a without-replacement scheme, the independence
property of X1, X2, ..., Xn will be violated. Hence, it will not be a random sample.
Note 6: When the sample size is greater than or equal to 30, in most of the text books, the sample
is termed as a large sample. Also, the sample of size less than 30 is termed as small sample.
However, in practice, there is no rigidity in this number i.e., 30, and that depends on the nature
of the population and the sample.
Note 7: The learners may recall from XI Standard Textbook that some of the probability distributions
possess the additive property. For example, if X1, X2, ..., Xn are iid N(μ, σ²) random variables, then the
probability distributions of X1 + X2 + ... + Xn and $\bar{X}$ are respectively the N(nμ, nσ²) and N(μ, σ²/n). These
two distributions, from the statistical inference point of view, can be considered respectively as the sampling
distributions of the sample total and sample mean of a random sample drawn from the N(μ, σ²)
distribution. The notation N(μ, σ²) refers to the normal distribution having mean μ and variance σ².
Solution:
Here, the population is {4, 8, 12, 16}.
$$V(\bar{X}) = \sum (\bar{x} - \mu)^2 \, P(\bar{X} = \bar{x})$$
Statistic: Difference between the proportions pX and pY of two independent samples, (pX – pY)
Standard error: $\sqrt{\frac{P_X Q_X}{m} + \frac{P_Y Q_Y}{n}}$, where m and n are sizes of the samples drawn from the populations whose proportions are respectively $P_X$ and $P_Y$; $Q_X = 1 - P_X$, $Q_Y = 1 - P_Y$.
Standard error when PX and PY are unknown: $\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}$, where $\hat{p} = \frac{m p_X + n p_Y}{m + n}$, $\hat{q} = 1 - \hat{p}$, m and n are sample sizes.
Null Hypothesis:
A hypothesis which is to be actually tested for possible rejection based on a random sample
is termed as null hypothesis, which will be denoted by H0.
Alternative Hypothesis:
A statement about the population, which contradicts the null hypothesis, depending upon
the situation, is called alternative hypothesis, which will be denoted by H1.
For example, if we test whether the population mean has a specified value μ0, then the null
hypothesis would be expressed as:
H0: μ = μ0
The alternative hypothesis may be formulated suitably as any one of the following:
(i) H1: μ ≠ μ0
(ii) H1: μ > μ0
(iii) H1: μ < μ0
Example 1.3
A soft drink manufacturing company makes a new kind of soft drink. Daily sales of the new soft
drink, in a city, is assumed to be distributed with mean sales of ₹40,000 and standard deviation of
₹2,500 per day. The Advertising Manager of the company considers placing advertisements in local
TV Channels. He does this on 10 random days and tests to see whether or not sales have increased.
Formulate suitable null and alternative hypotheses. What would be type I and type II errors?
Solution:
The Advertising Manager is testing whether or not sales increased more than ₹40,000.
Let μ be the average amount of sales, if the advertisement does appear.
The null and alternative hypotheses can be framed based on the given information as
follows:
Null hypothesis: H0: μ = 40000
i.e., The mean sales due to the advertisement is not significantly different from ₹40,000.
Alternative hypothesis: H1: μ > 40000
i.e., Increase in the mean sales due to the advertisement is significant.
(i) If type I error occurs, then it will be concluded that the advertisement has improved
the sales. But, really, it has not.
(ii) If type II error occurs, then it will be concluded that the advertisement has not
improved the sales. But, really, the advertisement has improved the sales.
1.6 LEVEL OF SIGNIFICANCE, CRITICAL REGION AND CRITICAL VALUE(S)
In a given hypotheses testing problem, the maximum probability with which we would be
willing to tolerate the occurrence of type I error is called level of significance of the test. This
probability is usually denoted by ‘α’. Level of significance is specified before samples are drawn
to test the hypothesis.
The level of significance normally chosen in a hypotheses testing problem is 0.05 (5%)
or 0.01 (1%). If, for example, the level of significance is chosen as 5%, then it means that, when
H0 is actually true and the test is applied to 100 random samples, H0 would be wrongly rejected
on the basis of at most about 5 of them. It is emphasized that the 100 random samples are drawn under
identical and independent conditions. Equivalently, when H0 is actually true, the test leads to the
correct decision of not rejecting H0 for about 95% of such samples.
Critical region in a hypotheses testing problem is a subset of the sample space whose
elements lead to rejection of H0. Hence, its elements have the same dimension as the sample size,
say, n (n > 1). That is, the critical region consists of those points (x1, x2, …, xn) of S for which H0 is rejected.
Then, X1 and X2 are iid random variables and they have the Bernoulli (P) distribution.
Let $H_0: P = \frac{1}{3}$ and $H_1: P = \frac{2}{3}$.
The sample space is S = {(0,0),(0,1),(1,0),(1,1)}
If T(X1, X2) represents the number of defective screws, in each random sample, then the
statistic T(X1, X2) = X1 + X2 is a random variable distributed according to the Binomial (2, P)
distribution. The possible values of T(X1, X2) are 0, 1 and 2. The values of T(X1, X2) which lead
to rejection of H0 constitute the set {1,2}.
But, the critical region is defined by the elements of S corresponding to T(X1,X2) = 1 or 2.
Thus, the critical region is {(0,1), (1,0), (1,1)} whose dimension is 2.
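The distinction between the values of the statistic and the critical region itself can be checked by enumeration. The sketch below is illustrative only: it lists the sample space, picks out the points with T = 1 or 2, and then, going one step beyond the text, computes the probabilities of type I and type II errors implied by this rejection rule for the two hypothesised values of P. The helper name `prob_of_sample` is a choice made here, not something from the textbook.

```python
from fractions import Fraction
from itertools import product

def prob_of_sample(sample, p):
    """Probability of one outcome (x1, x2) for iid Bernoulli(p) variables."""
    prob = Fraction(1)
    for x in sample:
        prob *= p if x == 1 else 1 - p
    return prob

# Sample space for two screws: 1 = defective, 0 = non-defective.
sample_space = list(product([0, 1], repeat=2))

# Critical region: points of S whose statistic T = x1 + x2 equals 1 or 2.
critical_region = [s for s in sample_space if sum(s) in {1, 2}]
print("Critical region:", critical_region)

p_under_h0 = Fraction(1, 3)
p_under_h1 = Fraction(2, 3)

# P(type I error) = P(sample falls in the critical region | H0 true).
alpha = sum(prob_of_sample(s, p_under_h0) for s in critical_region)

# P(type II error) = P(sample falls outside the critical region | H1 true).
beta = sum(
    prob_of_sample(s, p_under_h1) for s in sample_space if s not in critical_region
)
print("P(type I error) =", alpha, " P(type II error) =", beta)
```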
Note 8: When the sampling distribution is continuous, the set of values of t(X) corresponding to
the rejection rule will be an interval or union of intervals depending on the alternative hypothesis.
It is emphasized that these intervals identify the elements of the critical region, but they do not
constitute the critical region.
When the sampling distribution of the test statistic Z is a normal distribution, the critical
values for testing H0 against the possible alternative hypothesis at two different levels of
significance, say 5% and 1% are displayed in Table 1.6.
Table 1.6 Critical values of the Z statistic
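The critical values quoted in the worked examples of this chapter (1.645, 1.96, 2.33 and 2.58) can be recomputed from the standard normal distribution. The following is a minimal sketch using Python's standard library; the function name `critical_value` and the labels "two-sided", "right" and "left" are illustrative choices, and the printed values agree with the quoted ones only after rounding to two or three decimals.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the N(0, 1) distribution

def critical_value(alpha, alternative):
    """Critical value of Z at level alpha for the given type of H1."""
    if alternative == "two-sided":
        return std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    if alternative == "right":
        return std_normal.inv_cdf(1 - alpha)       # z_{alpha}
    if alternative == "left":
        return std_normal.inv_cdf(alpha)           # -z_{alpha}
    raise ValueError("alternative must be 'two-sided', 'right' or 'left'")

for alpha in (0.05, 0.01):
    for alternative in ("two-sided", "right", "left"):
        z_e = critical_value(alpha, alternative)
        print(f"alpha = {alpha:.2f}, {alternative:9s}: z_e = {z_e:.3f}")
```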
Example 1.5
Suppose a pizza restaurant claims its average pizza delivery time is 30 minutes. But you
believe that the restaurant takes more than 30 minutes. Now, the null and the alternate hypotheses
can be formulated as
H0 : μ = 30 minutes and H1: μ > 30 minutes
Suppose that the decision is taken based on the delivery times of 4 randomly chosen pizza
deliveries of the restaurant. Let X1, X2, X3, and X4 represent the delivery times on four such
occasions. Also, let H0 be rejected, when the sample mean exceeds 31. Then, the critical region is
$$\text{Critical Region} = \left\{ (x_1, x_2, x_3, x_4) \;\middle|\; \bar{x} = \frac{x_1 + x_2 + x_3 + x_4}{4} > 31 \right\}$$
In this case, $P(\bar{X} > 31)$ will be the area, which falls at the right end under the curve representing
the sampling distribution of $\bar{X}$. Hence, this test can be categorized as a right-tailed test.
H0 : μ = 5 and H1: μ ≠ 5.
Suppose that the decision on H0 is made based on the diameter of 10 randomly selected
ball-bearings. Let Xi, i = 1, 2, …, 10 represent the diameter of the randomly chosen ball bearings.
Then, the critical region is
$$\text{Critical Region} = \left\{ (x_1, x_2, \ldots, x_{10}) \;\middle|\; \bar{x} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} < 4.75 \text{ or } \bar{x} > 5.10 \right\}$$
In this case, $P(\bar{X} < 4.75)$ is the area, which will fall at the left end and $P(\bar{X} > 5.10)$ is the
area, which will fall at the right end under the curve representing the sampling distribution of $\bar{X}$.
This kind of test can be categorized as a two-tailed test (see Figure 1.3).
Step 1 : Describe the population and its parameter(s). Frame the null hypothesis (H0) and
alternative hypothesis (H1).
Step 2 : Describe the sample i.e., data.
Step 3 : Specify the desired level of significance, α.
Step 4 : Specify the test statistic and its sampling distribution under H0.
Step 5 : Calculate the value of the test statistic under H0 for given sample.
Step 6 : Find the critical value(s) (table value(s)) from the statistical table generated from the
sampling distribution of the test statistic under H0 corresponding to α.
Step 7 : Decide on rejecting or not rejecting the null hypothesis based on the rejection rule
which compares the calculated value(s) of the test statistic with the table value(s).
Now, let us see some of the large sample tests, which apply the above general procedure.
As mentioned in Note-6, for large samples, the size of the sample is greater than or equal to 30.
In the case of two samples considered for a hypotheses testing problem, the test is a large sample
test, when the sizes of both the samples are greater than or equal to 30.
Procedure:
Step 1 : Let µ and σ² be respectively the mean and the variance of the population under study,
where σ² is known. If µ0 is an admissible value of µ, then frame the null hypothesis
as H0: µ = µ0 and choose the suitable alternative hypothesis from
(i) H1: µ ≠ µ0 (ii) H1: µ > µ0 (iii) H1: µ < µ0
Step 2 : Let (X1, X2, …, Xn) be a random sample of n observations drawn from the population,
where n is large (n ≥ 30).
Step 3 : Let the level of significance be α.
Step 4 : Consider the test statistic $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ under H0. Here, $\bar{X}$ represents the sample
mean, which is defined in Note 2. The approximate sampling distribution of the
test statistic under H0 is the N(0,1) distribution.
Step 5 : Calculate the value of Z for the given sample (x1, x2, ..., xn) as
$$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$$
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Step 7 : Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Example 1.7
A company producing LED bulbs finds that mean life span of the population of its bulbs
is 2000 hours with a standard deviation of 150 hours. A sample of 100 bulbs randomly chosen
is found to have the mean life span of 1950 hours. Test, at 5% level of significance, whether the
mean life span of the bulbs is significantly different from 2000 hours.
$$z_0 = \frac{1950 - 2000}{150/\sqrt{100}} = -3.33$$
Thus, |z0| = 3.33
Step 6 : Critical value
Since H1 is a two-sided alternative, the critical value at α = 0.05 is ze = z0.025 = 1.96.
(see Table 1.6).
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. For the given sample
information, the rejection rule holds i.e., |z0| = 3.33 > ze = 1.96. Hence, H0 is rejected
in favour of H1: μ ≠ 2000. Thus, the mean life span of the LED bulbs is significantly
different from 2000 hours.
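Steps 5 to 7 of this test lend themselves to a short computation. The sketch below applies them to the figures of Example 1.7; the function name `one_sample_z_test` and its argument names are illustrative, and the rejection rules coded for the one-sided cases are the ones used in the other worked examples of this chapter.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha, alternative="two-sided"):
    """Large sample test for a population mean when sigma is known.

    Returns the calculated value z0, the critical value z_e and the decision.
    """
    std_normal = NormalDist()
    z0 = (x_bar - mu0) / (sigma / sqrt(n))          # Step 5
    if alternative == "two-sided":                  # H1: mu != mu0
        z_e = std_normal.inv_cdf(1 - alpha / 2)     # Step 6
        reject = abs(z0) >= z_e                     # Step 7
    elif alternative == "right":                    # H1: mu > mu0
        z_e = std_normal.inv_cdf(1 - alpha)
        reject = z0 > z_e
    elif alternative == "left":                     # H1: mu < mu0
        z_e = std_normal.inv_cdf(alpha)
        reject = z0 < z_e
    else:
        raise ValueError("alternative must be 'two-sided', 'right' or 'left'")
    return z0, z_e, reject

# Example 1.7: n = 100 bulbs, sample mean 1950, mu0 = 2000, sigma = 150, alpha = 5%.
z0, z_e, reject = one_sample_z_test(1950, 2000, 150, 100, 0.05)
print(f"z0 = {z0:.2f}, critical value = {z_e:.2f}, reject H0: {reject}")
```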
Solution:
Step 1 : Let μ and σ represent respectively the mean and standard deviation of the probability
distribution of the breaking strength of the cables. It is given that σ = 120 N/m². The
null and alternative hypotheses are
Null hypothesis H0: μ = 1900
i.e., the mean breaking strength of the cables is not significantly different from
1900 N/m².
Alternative hypothesis: H1: μ > 1900
i.e., the mean breaking strength of the cables is significantly more than 1900 N/m².
It may be noted that it is a one-sided (right) alternative hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 60. Hence, it is a large sample.
Sample mean ($\bar{x}$) = 1960
Step 3 : Level of significance
α = 1%
Step 4 : Test statistic
The test statistic is $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, under H0.
Since n is large, under the null hypothesis, the sampling distribution of Z is the
N(0,1) distribution.
Step 5 : Calculation of test statistic
The value of Z under H0 is calculated from
$$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1960 - 1900}{120/\sqrt{60}}$$
Thus, z0 = 3.87
Step 6 : Critical value
Since H1 is a one-sided (right) alternative hypothesis, the critical value at α = 0.01
level of significance is ze = z0.01= 2.33 (see Table 1.6)
1.10 TEST OF HYPOTHESES FOR POPULATION MEAN
(POPULATION VARIANCE IS UNKNOWN)
Procedure:
Step 1 : Let µ and σ² be respectively the mean and the variance of the population under study,
where σ² is unknown. If µ0 is an admissible value of µ, then frame the null hypothesis
as H0: µ = µ0 and choose the suitable alternative hypothesis from
(i) H1: µ ≠ µ0 (ii) H1: µ > µ0 (iii) H1: µ < µ0
Step 2 : Let (X1, X2, …, Xn) be a random sample of n observations drawn from the population,
where n is large (n ≥ 30).
Step 3 : Specify the level of significance, α.
Step 4 : Consider the test statistic $Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$ under H0, where $\bar{X}$ and S are the sample
mean and sample standard deviation respectively. It may be noted that the above
test statistic is obtained from Z considered in the test described in Section 1.9 by
substituting S for σ.
The approximate sampling distribution of the test statistic under H0 is the N(0,1)
distribution.
YOU WILL KNOW
It is important to note that the exact sampling distribution of Z is the Student’s ‘t’
distribution with (n – 1) degrees of freedom, when n is small (n < 30) and the population is
normal. This hypotheses testing problem, when n is small, is discussed, in detail, in Chapter 2.
When n is large, the Student’s ‘t’ distribution converges to the N(0,1) distribution.
Step 5 : Calculate the value of Z for the given sample (x1, x2, ..., xn) as $z_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. Here, $\bar{x}$
and s are respectively the values of $\bar{X}$ and S calculated for the given sample.
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Example 1.9
A motor vehicle manufacturing company desires to introduce a new model motor vehicle.
The company claims that the mean fuel consumption of its new model vehicle is lower than that
of the existing model of the motor vehicle, which is 27 kms/litre. A sample of 100 vehicles of the
new model vehicle is selected randomly and their fuel consumptions are observed. It is found that
the mean fuel consumption of the 100 new model motor vehicles is 30 kms/litre with a standard
deviation of 3 kms/litre. Test the claim of the company at 5% level of significance.
Solution:
Step 1 : Let the fuel consumption of the new model motor vehicle be assumed to be distributed
according to a distribution with mean and standard deviation respectively μ and σ.
The null and alternative hypotheses are
Null hypothesis H0: μ = 27
i.e., the average fuel consumption of the company’s new model motor vehicle is not
significantly different from that of the existing model.
Alternative hypothesis H1: μ > 27
i.e., the average fuel consumption of the company’s new model motor vehicle is
significantly lower than that of the existing model. In other words, the distance
covered per litre by the new model motor vehicle is significantly more than that of
the existing model motor vehicle.
Step 2 : Data:
The given sample information are
Size of the sample (n) = 100. Hence, it is a large sample.
Sample mean ($\bar{x}$) = 30
Sample standard deviation(s) = 3
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic under H0 is
$$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}.$$
Since n is large, the sampling distribution of Z under H0 is the N(0,1) distribution.
Step 7 : Decision
Since H1 is a one-sided (right) alternative, elements of the critical region are defined by
the rejection rule z0 > ze = z0.05. Thus, it is a right-tailed test. Since, for the given sample
information, z0 = 10 > ze = 1.645, H0 is rejected.
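The calculation and the critical value behind this decision can be reproduced as follows. This is a minimal sketch using only the figures quoted in Example 1.9; the variable names are illustrative, and the sample standard deviation s takes the place of the unknown σ, as described in the procedure of this section.

```python
from math import sqrt
from statistics import NormalDist

# Figures quoted in Example 1.9.
n = 100        # sample size
x_bar = 30.0   # sample mean
s = 3.0        # sample standard deviation
mu0 = 27.0     # value of the mean under H0
alpha = 0.05   # level of significance

# Step 5: calculated value of the test statistic, with s replacing sigma.
z0 = (x_bar - mu0) / (s / sqrt(n))

# Step 6: critical value for the one-sided (right) alternative.
z_e = NormalDist().inv_cdf(1 - alpha)

# Step 7: right-tailed rejection rule z0 > z_e.
reject_h0 = z0 > z_e
print(f"z0 = {z0:.2f}, z_e = {z_e:.3f}, reject H0: {reject_h0}")
```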
1.11 TEST OF HYPOTHESES FOR EQUALITY OF MEANS OF TWO
POPULATIONS (Population variances are known)
Procedure:
Step 1 : Let µX and $\sigma_X^2$ be respectively the mean and the variance of Population-1. Also, let
µY and $\sigma_Y^2$ be respectively the mean and the variance of Population-2 under study.
Here $\sigma_X^2$ and $\sigma_Y^2$ are known admissible values.
Frame the null hypothesis as H0: µX = µY and choose the suitable alternative hypothesis
from
(i) H1: µX ≠ µY (ii) H1: µX> µY (iii) H1: µX< µY
Step 2 : Let (X1, X2, …, Xm) be a random sample of m observations drawn from Population-1 and
(Y1, Y2, …, Yn) be a random sample of n observations drawn from Population-2, where
m and n are large(i.e., m ≥ 30 and n ≥ 30). Further, these two samples are assumed to be
independent.
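Step 4 : Consider the test statistic
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}}}$$
under H0, where $\bar{X}$ and $\bar{Y}$ are respectively the means of the two samples described in Step 2.
It may be noted that the test statistic, when $\sigma_X^2 = \sigma_Y^2 = \sigma^2$, is
$$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}}.$$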
Step 5 : Calculate the value of Z for the given samples (x1, x2, ..., xm) and (y1, y2, …, yn) as
$$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}}}.$$
Here, $\bar{x}$ and $\bar{y}$ are respectively the values of $\bar{X}$ and $\bar{Y}$ for the given samples.
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Step 7 : Make decision on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Example 1.10
Performance of students of X Standard in a national level talent search examination was studied.
The scores secured by randomly selected students from two districts, viz., D1 and D2 of a State were
analyzed. The number of students randomly selected from D1 and D2 are respectively 500 and 800.
Average scores secured by the students selected from D1 and D2 are respectively 58 and 57. Can the
samples be regarded as drawn from the identical populations having common standard deviation 2?
Test at 5% level of significance.
Solution:
Step 1 : Let μX and μY be respectively the mean scores secured in the national level talent
search examination by all the students from the districts D1 and D2 considered for the
study. It is given that the populations of the scores of the students of these districts
have the common standard deviation σ = 2. The null and alternative hypotheses are
Null hypothesis: H0: µX = µY
i.e., average scores secured by the students from the study districts are not significantly
different.
Alternative hypothesis: H1: µX ≠ µY
i.e., average scores secured by the students from the study districts are significantly
different. It is a two-sided alternative.
$$z_0 = \frac{58 - 57}{2\sqrt{\frac{1}{500} + \frac{1}{800}}}$$
z0 = 8.77
Step-6 : Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.05 is
ze = z0.025 = 1.96.
Step-7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are defined by the
rejection rule |z0| ≥ ze = z0.025. For the given sample information, |z0| = 8.77 > ze = 1.96.
It indicates that the given sample contains sufficient evidence to reject H0. Thus,
H0 is rejected. Therefore, the average performances of the
students in the districts D1 and D2 in the national level talent search examination
are significantly different. Thus the given samples are not drawn from identical
populations.
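The arithmetic of Example 1.10 can be verified with a short sketch of the two-sample statistic for known variances. The function name `two_sample_z_known` and its argument names are illustrative choices; the formula is the one given in Step 5 of the procedure of this section.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z_known(x_bar, y_bar, sigma_x, sigma_y, m, n):
    """Calculated value z0 for testing H0: mu_X = mu_Y with known variances."""
    standard_error = sqrt(sigma_x ** 2 / m + sigma_y ** 2 / n)
    return (x_bar - y_bar) / standard_error

# Example 1.10: D1 has m = 500, mean 58; D2 has n = 800, mean 57; common sigma = 2.
z0 = two_sample_z_known(58, 57, 2, 2, 500, 800)

# Two-sided alternative at alpha = 0.05: reject H0 when |z0| >= z_{alpha/2}.
alpha = 0.05
z_e = NormalDist().inv_cdf(1 - alpha / 2)
reject_h0 = abs(z0) >= z_e
print(f"z0 = {z0:.2f}, z_e = {z_e:.2f}, reject H0: {reject_h0}")
```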
Procedure:
Step 1 : Let µX and $\sigma_X^2$ be respectively the mean and the variance of Population-1. Also, let
µY and $\sigma_Y^2$ be respectively the mean and the variance of Population-2 under study.
Here $\sigma_X^2$ and $\sigma_Y^2$ are assumed to be unknown.
Frame the null hypothesis as H0: µX = µY and choose the suitable alternative hypothesis
from
(i) H1: µX ≠ µY (ii) H1: µX> µY (iii) H1: µX< µY
Step 2 : Let (X1, X2, …, Xm) be a random sample of m observations drawn from Population-1
and (Y1, Y2, …, Yn) be a random sample of n observations drawn from Population-2,
where m and n are large (m ≥30 and n ≥30). Here, these two samples are assumed to
be independent.
Step 5 : Calculate the value of Z for the given samples (x1, x2, ..., xm) and (y1, y2, …, yn) as
$$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}}.$$
Here $\bar{x}$ and $\bar{y}$ are respectively the values of $\bar{X}$ and $\bar{Y}$ for the given samples.
Also, $s_X^2$ and $s_Y^2$ are respectively the values of $S_X^2$ and $S_Y^2$ for the given samples.
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Example 1.11
A Model Examination was conducted to XII Standard students in the subject of Statistics.
A District Educational Officer wanted to analyze the Gender-wise performance of the students
using the marks secured by randomly selected boys and girls. Sample measures were calculated
and the details are presented below:
Solution:
Step 1 : Let μX and μY denote respectively the average marks secured by boys and girls in
the Model Examination conducted to the XII Standard students in the subject of
Statistics. Then, the null and the alternative hypotheses are
Null hypothesis: H0 : µX = µY
i.e., there is no significant difference in the performance of the students with respect to
their gender.
Alternative hypothesis: H1 : µ X ≠ µY
i.e., the performance of the students differs significantly with respect to gender. It is
a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are
Gender of the Students    Sample Size    Sample Mean         Sample Standard Deviation
Boys                      m = 100        $\bar{x}$ = 50      sX = 4
Girls                     n = 150        $\bar{y}$ = 51      sY = 5
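$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_X^2}{m} + \frac{S_Y^2}{n}}}.$$
The sampling distribution of Z under H0 is the N(0,1) distribution.
$$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}} = \frac{50 - 51}{\sqrt{\frac{4^2}{100} + \frac{5^2}{150}}}$$
Thus, z0 = −1.75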
Step 6 : Critical value
Since H1 is a two-sided alternative, the critical value at 5% level of significance is
ze = z0.025 = 1.96.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined by
the rejection rule |z0| ≥ ze. Thus it is a two-tailed test. But, |z0| = 1.75 is less than
the critical value ze = 1.96. Hence, it may be inferred that the given sample information
does not provide sufficient evidence to reject H0. Therefore, it may be decided that
there is no sufficient evidence in the given sample to conclude that the performances of
boys and girls in the Model Examination conducted in the subject of Statistics differ
significantly.
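The figures of Example 1.11 can be checked with the sketch below, which mirrors Steps 5 to 7 with the sample standard deviations standing in for the unknown population values. The variable names are illustrative only.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.11.
m, x_bar, s_x = 100, 50.0, 4.0   # boys
n, y_bar, s_y = 150, 51.0, 5.0   # girls
alpha = 0.05

# Step 5: calculated value of Z, using the sample variances.
z0 = (x_bar - y_bar) / sqrt(s_x ** 2 / m + s_y ** 2 / n)

# Step 6: critical value for the two-sided alternative.
z_e = NormalDist().inv_cdf(1 - alpha / 2)

# Step 7: two-tailed rejection rule |z0| >= z_e.
reject_h0 = abs(z0) >= z_e
print(f"z0 = {z0:.2f}, |z0| = {abs(z0):.2f}, z_e = {z_e:.2f}, reject H0: {reject_h0}")
```

Here the rule does not hold, which matches the decision stated in Step 7 of the example.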
Procedure:
Step 1 : Let P denote the proportion of the population possessing the qualitative characteristic
(attribute) under study. If p0 is an admissible value of P, then frame the null hypothesis
as H0:P = p0 and choose the suitable alternative hypothesis from
(i) H1: P ≠ p0 (ii) H1: P > p0 (iii) H1: P < p0
Step 2 : Let p be proportion of the sample observations possessing the attribute, where n is
large, np > 5 and n(1 – p) > 5.
Step 5 : Calculate the value of Z under H0 for the given data as $z_0 = \frac{p - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$, where $q_0 = 1 - p_0$.
Step 6 : Choose the critical value, ze, corresponding to α and H1 from the following table
Alternative Hypothesis (H1)    P ≠ p0    P > p0    P < p0
Critical Value (ze)            zα/2      zα        −zα
Step 7 : Make decision on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Example 1.12
A survey was conducted among the citizens of a city to study their preference towards
consumption of tea and coffee. Among 1000 randomly selected persons, it is found that 560 are tea-
drinkers and the remaining are coffee-drinkers. Can we conclude at 1% level of significance from
this information that both tea and coffee are equally preferred among the citizens in the city?
Solution:
Step 1 : Let P denote the proportion of people in the city who preferred to consume tea.
Then, the null and the alternative hypotheses are
Null hypothesis: H0 : P = 0.5
i.e., tea and coffee are equally preferred in the city.
Alternative hypothesis: H1 : P ≠ 0.5
i.e., the preferences for tea and coffee differ significantly. It is a two-sided alternative
hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 1000. Hence, it is a large sample.
No. of tea-drinkers = 560
Sample proportion (p) = 560/1000 = 0.56
Step 3 : Level of significance
α = 1%
Thus, z0 = 3.79
Step 6 : Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at 1% level of
significance is zα/2 = z0.005 = 2.58.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined by
the rejection rule |z0| ≥ ze. Thus it is a two-tailed test. Since |z0| = 3.79 > ze = 2.58,
reject H0 at 1% level of significance. Therefore, there is significant evidence to
conclude that the preference of tea and coffee are different.
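The calculated value z0 = 3.79 in Example 1.12 follows from the formula in Step 5 of the procedure. A minimal sketch of the computation is given below; it also checks the large-sample conditions of Step 2. The variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.12.
n = 1000           # sample size
tea_drinkers = 560
p0 = 0.5           # value of P under H0
alpha = 0.01

p = tea_drinkers / n   # sample proportion
q0 = 1 - p0

# Conditions of Step 2 for the large sample test.
conditions_hold = n >= 30 and n * p > 5 and n * (1 - p) > 5

# Step 5: calculated value of Z under H0.
z0 = (p - p0) / sqrt(p0 * q0 / n)

# Step 6: critical value for the two-sided alternative.
z_e = NormalDist().inv_cdf(1 - alpha / 2)

# Step 7: two-tailed rejection rule |z0| >= z_e.
reject_h0 = abs(z0) >= z_e
print(f"conditions hold: {conditions_hold}")
print(f"z0 = {z0:.2f}, z_e = {z_e:.2f}, reject H0: {reject_h0}")
```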
Procedure:
Step 1 : Let PX and PY denote respectively the proportions of Population-1 and Population-2
possessing the qualitative characteristic (attribute) under study. Frame the null
hypothesis as H0: PX=PY and choose the suitable alternative hypothesis from
(i) H1: PX≠ PY (ii) H1: PX>PY (iii) H1: PX<PY
Step 2 : Let pX and pY denote respectively the proportions of the samples of sizes m and n
drawn from Population-1 and Population-2 possessing the attribute, where m and n are
large (i.e., m ≥ 30 and n ≥ 30). Also, mpX > 5, m(1 − pX) > 5, npY > 5 and n(1 − pY) > 5.
Here, these two samples are assumed to be independent.
Step 3 : Specify the level of significance, α.
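Step 5 : Calculate the value of Z for the given data as
$$z_0 = \frac{p_X - p_Y}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}},$$
where $\hat{p} = \frac{m p_X + n p_Y}{m + n}$ and $\hat{q} = 1 - \hat{p}$.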
Step 6 : Choose the critical value, ze, corresponding to α and H1 from the following table
Step 7 : Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Example 1.13
A study was conducted to investigate the interest of people living in cities towards self-
employment. Among randomly selected 500 persons from City-1, 400 persons were found to be
self-employed. From City-2, 800 persons were selected randomly and among them 600 persons
are self-employed. Do the data indicate that the two cities are significantly different with respect
to prevalence of self-employment among the persons? Choose the level of significance as
α = 0.05.
Solution:
Step 1 : Let PX and PY be respectively the proportions of self-employed people in City-1 and
City-2. Then, the null and alternative hypotheses are
Null hypothesis: H0 : PX = PY
i.e., there is no significant difference between the proportions of self-employed
people in City-1 and City-2.
Alternative hypothesis: H1 : PX ≠ PY
i.e., difference between the proportions of self-employed people in City-1 and City-2
is significant. It is a two-sided alternative hypothesis.
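The remaining steps for Example 1.13 can be sketched numerically from the data given in the example and the formula in Step 5 of the procedure. The textbook's own working for these steps is not reproduced in this section, so the numbers printed here are simply what that formula yields; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.13.
m, self_employed_1 = 500, 400   # City-1
n, self_employed_2 = 800, 600   # City-2
alpha = 0.05

p_x = self_employed_1 / m       # sample proportion, City-1
p_y = self_employed_2 / n       # sample proportion, City-2

# Pooled estimate of the common proportion under H0: P_X = P_Y.
p_hat = (m * p_x + n * p_y) / (m + n)
q_hat = 1 - p_hat

# Step 5: calculated value of Z.
z0 = (p_x - p_y) / sqrt(p_hat * q_hat * (1 / m + 1 / n))

# Step 6: critical value for the two-sided alternative.
z_e = NormalDist().inv_cdf(1 - alpha / 2)

# Step 7: two-tailed rejection rule |z0| >= z_e.
reject_h0 = abs(z0) >= z_e
print(f"pX = {p_x:.2f}, pY = {p_y:.2f}, pooled p = {p_hat:.4f}")
print(f"z0 = {z0:.2f}, z_e = {z_e:.2f}, reject H0: {reject_h0}")
```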
• A statistic is a random variable and its probability distribution is called its sampling distribution.
• Generally, the random sample used in Statistical Inference is drawn under sampling with replacement from a finite population.
• In each statistical hypotheses testing problem, there is one null hypothesis and one alternative hypothesis.
• Critical region is a subset of the sample space defined by the rejection rule.
• If the number of sample observations is greater than or equal to 30, it is called a large sample.
• For hypotheses testing based on two samples, if the sizes of both the samples are greater than or equal to 30, they are called large samples.
• For testing population proportion, the sampling distribution of the test statistic is approximately N(0, 1), only when n ≥ 30, np > 5 and n(1 − p) > 5.
• For testing equality of two population proportions, the sampling distribution of the test statistic is approximately N(0, 1), only when m ≥ 30, n ≥ 30, mpX > 5, m(1 − pX) > 5, npY > 5 and n(1 − pY) > 5.
EXERCISE 1
n
σ n
(c) (d)
n σ
2. When n is large and σ is unknown, σ is replaced in the test statistic by
2 2
4. The critical value (table value) of the test statistic at the level of significance α for a two-tailed
large sample test is
(a) zα/2 (b) zα
(c) −zα (d) −zα/2
8. For testing H0: μ = μ0 against H1 : μ < μ0, what is the critical value at α = 0.01?
(a) 1.645 (b) –1.645
(c) –2.33 (d) 2.33
10. When the alternative hypothesis is H1 : µ ≠ µ0 , the critical region will be determined by
(a) both right and left tails (b) neither right nor left tail
(c) right tail (d) left tail
12. What is the standard error of the sample proportion under H0?
PQ pq
(a) ( )
n n
PQ
(c) (d) pq
n n
(c) X Y (d) X −Y
2
2
s 2X sY2
X
Y
+
m n m n
14. When the population variances are known and equal, the statistic $Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}}$ is used to
test the null hypothesis
(a) H0 : µ = µ0 (b) H0 : µX = µY
(c) H0 : PX = PY (d) H0 : P = p0
15. What is the standard error of the difference between two sample proportions, (PX – PY)?
1 1 pq 1 + 1
(a) pq + (b)
m n m n
1 1 1 1
(c) pq (d) pq
ˆ ˆ +
m n m n
16. What is level of significance?
(a) P(Type I Error) (b) P(Type II Error)
(c) upper limit of P(Type I Error) (d) upper limit of P(Type II Error)
17. Large sample test for testing population proportion is valid, when
(a) n is large, n ≥ 30, np > 5, n(1 – p) > 5 (b) n is large, n ≥ 30, np > 5, n(1 – p) < 5
(c) n is large, n ≥ 30, np > 5, np (1 – p) > 5 (d) n is large, n ≥ 30, np > 5, np(1 – p) < 5
18. Large sample test can be applied for two-sample problems, for testing the equality of two
population means, when
(a) m + n ≥ 30 (b) m ≥ 30, n ≥ 30
(c) m ≥ 30, m + n ≥ 30 (d) mn ≥ 30
20. What is the rejection rule, based on large sample, for testing H0 against one-sided (left)
alternative hypothesis?
(a) z0 ≥ zα /2 (b) z0 < − zα
(c) z0 ≤ –zα/2 (d) z0 > zα
Does the above information ensure at 1% level of significance that the difference between the
performances of the athletes of the two States is significant?
65. A District Administration conducted awareness campaign on a contagious disease utilizing the
services of school students. Among 64 randomly selected households, 50 of them appreciated
the involvement of students. Can the District Administration decide whether more than 90%
success could be achieved in these kinds of programmes by involving the students? Fix the
level of significance as 1%.
66. A coin is tossed 10, 000 times and head turned up 5,195 times. Test the hypothesis, at 5%
level of significance, that the coin is unbiased.
67. A study was conducted among randomly selected families who are living in two locations of
a district, and parents were asked “Whether watching TV programmes by parents affects the
studies of their children?” Details are presented hereunder:
No. of Families
Locality
Contacted Agreed
A 200 48
B 600 96
Test whether the difference between proportions of school students who prefer Modern
Gymnasium to do their exercises in the two States is significant at 5% level of significance.
70. Interest of XII Students on Residential Schooling was investigated among randomly selected
students from two regions. Among 300 students selected from Region A, 34 students
expressed their interest. Among 200 students selected from Region B, 28 students expressed
their interest. Does this information provide sufficient evidence to conclude at 5% level of
significance that students in Region A are more interested in Residential Schooling than the
students in Region B?
ACTIVITIES
1. In your institution, collect data from your Physical Education department about the height/weight
of the students in a particular class (Standard 9). Take a random sample of 50 students from
your school and find the mean and standard deviation of their height/weight. Using a suitable
test of significance, examine whether there is a significant difference between the sample mean
and the population mean. Give your comments.
2. Consider two groups of students of the same class (Standard 10) taking an examination in a
particular subject in Tamil medium and in English medium in your school. Get samples from
the two mediums and find their means and standard deviations. Is there any significant difference
between the performance of the students taking their examination in Tamil medium and in
English medium? Discuss with your friends.
Note: The teacher and students may discuss and create their own problems similar
to the above problems and expand these exercises.
Steps:
• Open the browser and type the URL given (or) scan the QR code. The GeoGebra workbook called “Tests of
Significance – Basic Concepts and Large Sample Tests” will appear.
• In that, tick the square boxes for “Show Rejection Region”, “Show Type I Error probability” and “Show Alternative
Distribution”.
• Now, if you press and drag the orange and blue buttons, you will get different results.
URL:
https://www.geogebra.org/m/VzUpY5M
2 TESTS BASED ON SAMPLING DISTRIBUTIONS I
LEARNING OBJECTIVES
Introduction
In the earlier chapter, we have discussed various problems related to tests of significance based
on large samples by applying the standard normal distribution. However, if the sample size is small
(n < 30) the sampling distributions of test statistics are far from normal and the procedures discussed
in Chapter-1 cannot be applied, except the general procedure (Section 1.8). But in this case, there
exists a probability distribution called t-distribution which may be used instead of standard normal
distribution to study the problems based on small samples.
Note 1: The degrees of freedom of t is the same as the degrees of freedom of the corresponding
chi-square random variable.
Note 2: The t-distribution is used as the sampling distribution(s) of the statistics(s) defined based on
random sample(s) drawn from normal population(s).
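i) If (X1 , X2 , …, Xn) is a random sample drawn from a N (μ , σ2) population, then
$\dfrac{\bar{X} - \mu}{\sigma/\sqrt{n}} \sim N(0, 1)$ and $\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{\sigma^2} \sim \chi^2_{n-1}$ are independent.
Hence,
$T_1 = \dfrac{\bar{X} - \mu}{S/\sqrt{n}} \sim t_{n-1}$, where $S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$.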
ii) If (X1 , X2 , …, Xm) and (Y1 , Y2 , …, Yn) are independent random samples drawn from N (μX , σ2)
and N (μY , σ2) populations respectively, then
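$\dfrac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim N(0, 1)$ and $\dfrac{\sum_{i=1}^{m} (X_i - \bar{X})^2 + \sum_{j=1}^{n} (Y_j - \bar{Y})^2}{\sigma^2} \sim \chi^2_{m+n-2}$
are independent.
Then,
$T_2 = \dfrac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{S_p\sqrt{\frac{1}{m} + \frac{1}{n}}} \sim t_{m+n-2}$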
3. If (X1 , Y1), (X2 , Y2), …, (Xn , Yn) is a random sample of n paired observations drawn from a
bivariate normal population, then Di = Xi – Yi , i = 1, 2, …, n is a random sample drawn from
N (μD , σD2). Here μD = μX – μY.
Hence,
$T_3 = \dfrac{\bar{D} - \mu_D}{S_D/\sqrt{n}} \sim t_{n-1}$
ii. The t–distribution has a greater spread sideways than the normal distribution
curve. It means that there is more area in the tails of the t–distribution.
The t-distribution has few more applications but they are not considered in this Chapter. You
will study these applications in higher classes.
Procedure:
Step 1 : Let µ and σ2 be respectively the mean and variance of the population under study, where
σ2 is unknown. If µ0 is an admissible value of µ, then frame the null hypothesis as
H0: µ = µ0 and choose the suitable alternative hypothesis from
Step 2 : Describe the sample/data and its descriptive measures. Let (X1, X2, …, Xn) be a random
sample of n observations drawn from the population, where n is small (n < 30).
Step 6 : Choose the critical value, te, corresponding to α and H1 from the following table
Step 7 : Decide on H0 choosing the suitable rejection rule from the following table corresponding
to H1.
Step 7 : Decision
Since it is right-tailed test, elements of critical region are defined by the rejection rule
t0 >te = tn-1, α =t25,0.05 = 1.708. For the given sample information t0 = 10.20 > te =1.708. It
indicates that given sample contains sufficient evidence to reject H0. Hence, the campaign
has helped in promoting the increase in sales of a particular brand of tooth paste.
Step 2 : Data
The given sample information are
Size of the sample (n) = 10. Hence, it is a small sample.
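$\sum_{i=1}^{10} u_i = 0$, $\sum_{i=1}^{10} u_i^2 = 690$
Sample mean
$\bar{x}$ = 85 + 0 = 85
Sample standard deviation
$s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{10} u_i^2} = \sqrt{\dfrac{1}{9} \times 690} = \sqrt{76.67} = 8.756$
Hence,
$t_0 = \dfrac{\bar{x} - \mu_0}{s/\sqrt{n}} = \dfrac{85 - 90}{8.756/\sqrt{10}} = \dfrac{-5}{2.77} = -1.806$ and
$|t_0| = 1.806$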
Step 7 : Decision
Since it is a two-tailed test, elements of the critical region are defined by the rejection rule
|t0| > te = tn-1, α/2 = t9, 0.025 = 2.262. For the given sample information |t0| = 1.806 < te = 2.262.
It indicates that the given sample does not provide sufficient evidence to reject H0. Hence, we
conclude that the class average score is 90.
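The calculation in this example can be reproduced with a short Python sketch. It uses only the summary figures quoted in the solution (n = 10, sample mean 85, s = 8.756, µ0 = 90) and the table value 2.262; the variable names are our own.

```python
from math import sqrt

# Summary figures taken from the worked solution
n, x_bar, s, mu0 = 10, 85.0, 8.756, 90.0
t_crit = 2.262                      # table value t_{9, 0.025}

t0 = (x_bar - mu0) / (s / sqrt(n))  # calculated value of the test statistic
reject_h0 = abs(t0) > t_crit        # two-tailed rejection rule

print(round(t0, 3), reject_h0)
```

Since |t0| falls below the table value, the sketch reaches the same decision as the text: H0 is not rejected.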
2.1.5 Test of Hypotheses for Equality of Means of Two Normal Populations
(Independent Random Samples)
Procedure:
Step 1 : Let μX and μY be respectively the means of population-1 and population-2 under study. The
variances of the population-1 and population-2 are assumed to be equal and unknown
given by σ2.
Frame the null hypothesis as H 0 : µ X = µY and choose the suitable alternative hypothesis
from (i) H1 : μX ≠ μY (ii) H1 : μX > μY (iii) H1 : μX < μY
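$T = \dfrac{\bar{X} - \bar{Y}}{S_p\sqrt{\frac{1}{m} + \frac{1}{n}}}$ under H0, where
$S_p = \sqrt{\dfrac{(m-1)s_X^2 + (n-1)s_Y^2}{m+n-2}}$;
$s_X^2 = \dfrac{1}{m-1}\sum_{i=1}^{m} (X_i - \bar{X})^2$
and
$s_Y^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$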
Step 5 : Calculate the value of T for the given samples (x1, x2, …, xm) and (y1, y2, …, yn) as
$t_0 = \dfrac{\bar{x} - \bar{y}}{s_p\sqrt{\frac{1}{m} + \frac{1}{n}}}$.
Here $\bar{x}$ and $\bar{y}$ are the values of $\bar{X}$ and $\bar{Y}$ for the samples. Also
$s_x^2 = \dfrac{1}{m-1}\sum_{i=1}^{m} (x_i - \bar{x})^2$, $s_y^2 = \dfrac{1}{n-1}\sum_{i=1}^{n} (y_i - \bar{y})^2$ are the sample variances and
$s_p = \sqrt{\dfrac{(m-1)s_x^2 + (n-1)s_y^2}{m+n-2}}$.
Step 6 : Choose the critical value, te, corresponding to α and H1 from the following table
Example 2.3
The following table gives the scores (out of 15) of two batches of students in an examination.
Batch I 6 7 9 2 13 3 4 8 7 11
Batch II 5 6 5 7 1 7 2 7
Test at 1% level of significance whether the average performance of the students in Batch I and
Batch II is equal.
Solution:
Step 1 : Hypotheses: Let µ X and µY denote respectively the average performance of students in
Batch I and Batch II. Then the null and alternative hypotheses are :
Null Hypothesis H 0 : µ X = µY
i.e., the average performance of the students in Batch I and Batch II are equal.
Alternative Hypothesis H1 : µ X ≠ µ Y
i.e., the average performance of the students in Batch I and Batch II are not equal.
Step 2 : Data
xi     ui = xi − x̄     ui²     yi     vi = yi − ȳ     vi²
       (x̄ = 7)                        (ȳ = 5)
6      -1              1       5      0               0
7      0               0       6      1               1
9      2               4       5      0               0
2      -5              25      7      2               4
13     6               36      1      -4              16
3      -4              16      7      2               4
4      -3              9       2      -3              9
8      1               1       7      2               4
7      0               0
11     4               16
$\sum_{i=1}^{10} x_i = 70$, $\sum_{i=1}^{10} u_i = 0$, $\sum_{i=1}^{10} u_i^2 = 108$, $\sum_{i=1}^{8} y_i = 40$, $\sum_{i=1}^{8} v_i = 0$, $\sum_{i=1}^{8} v_i^2 = 38$
$\bar{x} = \dfrac{\sum_{i=1}^{10} x_i}{10} = \dfrac{70}{10} = 7$
$\bar{y} = \dfrac{\sum_{i=1}^{8} y_i}{8} = \dfrac{40}{8} = 5$
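To find combined sample standard deviation:
$s_X^2 = \dfrac{1}{9}\sum_{i=1}^{10} (x_i - \bar{x})^2 = \dfrac{1}{9}\sum_{i=1}^{10} u_i^2 = \dfrac{108}{9} = 12$
$s_Y^2 = \dfrac{1}{7}\sum_{i=1}^{8} (y_i - \bar{y})^2 = \dfrac{1}{7}\sum_{i=1}^{8} v_i^2 = \dfrac{38}{7} = 5.4$
$s_p = \sqrt{\dfrac{(m-1)s_x^2 + (n-1)s_y^2}{m+n-2}} = \sqrt{\dfrac{108 + 38}{10 + 8 - 2}} = \sqrt{9.125} = 3.021$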
Step 7 : Decision
Since it is a two-tailed test, elements of the critical region are defined by the rejection rule
|t0| > te = tm+n-2, α/2 = t16, 0.005 = 2.921. For the given sample information |t0| = 1.3957 < te = 2.921.
It indicates that the given sample contains insufficient evidence to reject H0. Hence, the mean
performance of the students in these batches can be regarded as equal.
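As a check on Example 2.3, the pooled standard deviation and the test statistic can be computed directly from the raw scores. The sketch below uses only the Python standard library; the variable names are ours, and the table value 2.921 is the one quoted in the decision step.

```python
from math import sqrt
from statistics import mean

batch1 = [6, 7, 9, 2, 13, 3, 4, 8, 7, 11]
batch2 = [5, 6, 5, 7, 1, 7, 2, 7]

m, n = len(batch1), len(batch2)
x_bar, y_bar = mean(batch1), mean(batch2)

# Sums of squared deviations: (m - 1) * s_x^2 and (n - 1) * s_y^2
ss_x = sum((x - x_bar) ** 2 for x in batch1)
ss_y = sum((y - y_bar) ** 2 for y in batch2)

sp = sqrt((ss_x + ss_y) / (m + n - 2))              # pooled standard deviation
t0 = (x_bar - y_bar) / (sp * sqrt(1 / m + 1 / n))   # calculated t
df = m + n - 2

reject_h0 = abs(t0) > 2.921                         # table value t_{16, 0.005}
print(round(sp, 3), round(t0, 4), df, reject_h0)
```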
Example 2.4
Two types of batteries are tested for their length of life (in hours). The following data is the
summary descriptive statistics.
Type Number of batteries Average life (in hours) Sample standard deviation
A 14 94 16
B 13 86 20
Is there any significant difference between the average life of the two batteries at 5% level of
significance?
Solution:
Step 1 : Hypotheses
Null Hypothesis H0 : μX = μY
i.e., there is no significant difference in average life of two types of batteries A and B.
Alternative Hypothesis H1 : μX ≠ μY
i.e., there is significant difference in average life of two types of batteries A and B. It is a
two-sided alternative hypothesis
Step 2 : Data
$s_p = \sqrt{\dfrac{(m-1)s_X^2 + (n-1)s_Y^2}{m+n-2}}$
Step 7 : Decision
Since it is a two-tailed test, elements of the critical region are defined by the rejection rule
|t0| > te = tm+n-2, α/2 = t25, 0.025 = 2.060. For the given sample information |t0| = 1.15 < te = 2.060.
It indicates that the given sample contains insufficient evidence to reject H0. Hence, there is
no significant difference between the average life of the two types of batteries.
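When only summary statistics are available, as in Example 2.4, the same test can be carried out from the sample sizes, means and standard deviations. The helper `pooled_t_from_summary` below is a hypothetical name introduced for this sketch; it assumes the quoted standard deviations use the divisor (size − 1), as in the formula for sp.

```python
from math import sqrt

def pooled_t_from_summary(m, x_bar, s_x, n, y_bar, s_y):
    """Pooled two-sample t statistic computed from summary statistics."""
    sp = sqrt(((m - 1) * s_x ** 2 + (n - 1) * s_y ** 2) / (m + n - 2))
    t0 = (x_bar - y_bar) / (sp * sqrt(1 / m + 1 / n))
    return sp, t0, m + n - 2

# Battery data of Example 2.4: type A and type B
sp, t0, df = pooled_t_from_summary(14, 94, 16, 13, 86, 20)
reject_h0 = abs(t0) > 2.060          # table value t_{25, 0.025}
print(round(sp, 2), round(t0, 2), df, reject_h0)
```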
Procedure:
Step 1 : Let X and Y be two correlated random variables having the distributions respectively
N(μX , σX2 ) (Population-1) and N(μY , σY2 ) (Population-2). Let D = X–Y, then it has normal
distribution N(μD = μX – μY , σD2 ).
H0 : μD= 0
Step 2 : Describe the sample/data. Let (X1, X2, …, Xn) be a random sample of n observations
drawn from Population-1 and (Y1, Y2, …, Yn) be a random sample of n observations
drawn from Population-2. Here, these two samples are correlated in pairs.
where $\bar{D} = \dfrac{\sum_{i=1}^{n} D_i}{n}$; Di = Xi – Yi and $S = \sqrt{\dfrac{\sum_{i=1}^{n} (D_i - \bar{D})^2}{n-1}}$.
The approximate sampling distribution of the test statistic T under H0 is t - distribution
where $\bar{d} = \dfrac{\sum_{i=1}^{n} d_i}{n}$ (sample mean); $d_i = x_i - y_i$ and $s = \sqrt{\dfrac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}}$.
Step 6 : Choose the critical value, te, corresponding to α and H1 from the following table
Example 2.5
A company gave an intensive training to its salesmen to increase the sales. A random sample of
10 salesmen was selected and the value (in lakhs of Rupees) of their sales per month, made before and
after the training is recorded in the following table. Test whether there is any increase in mean sales at
5% level of significance.
Salesman 1 2 3 4 5 6 7 8 9 10
Before 15 22 6 17 12 20 18 14 10 16
After 17 23 16 20 14 21 18 20 10 11
Solution:
Step 1 : Hypotheses
i.e., there is no significant increase in the mean sales after the training.
i.e., there is significant increase in the mean sales after the training. It is a one-sided
alternative hypothesis.
Step 2 : Data
Sample size n = 10
$T = \dfrac{\bar{D}}{S/\sqrt{n}}$
To find d and s:
Let x denote sales before training and y denote sales after training
Salesmen   xi    yi    di = yi – xi    di − d̄    (di − d̄)²
1          15    17    2               0          0
2          22    23    1               -1         1
3          6     16    10              8          64
4          17    20    3               1          1
5          12    14    2               0          0
6          20    21    1               -1         1
7          18    18    0               -2         4
8          14    20    6               4          16
9          10    10    0               -2         4
10         16    11    -5              -7         49
Total: $\sum_{i=1}^{n} d_i = 20$, $\sum_{i=1}^{n} (d_i - \bar{d}) = 0$, $\sum_{i=1}^{n} (d_i - \bar{d})^2 = 140$
$\bar{d} = \dfrac{\sum_{i=1}^{n} d_i}{n} = \dfrac{20}{10} = 2$
$s = \sqrt{\dfrac{1}{n-1}\sum_{i=1}^{n} (d_i - \bar{d})^2} = \sqrt{\dfrac{140}{9}} = \sqrt{15.56} = 3.94$
Step 7 : Decision
It is a one-tailed test. Since |t0| = 1.6052 < te = tn-1, α = t9,0.05 = 1.833, H0 is not rejected.
Hence, there is no evidence that the mean sales has increased after the training.
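The paired t-test of Example 2.5 can be checked with the sketch below, which works from the before/after sales figures. Variable names are ours. Because the code does not round s to 3.94 before dividing, its value of t0 may differ from the hand calculation in the third decimal place.

```python
from math import sqrt

before = [15, 22, 6, 17, 12, 20, 18, 14, 10, 16]
after = [17, 23, 16, 20, 14, 21, 18, 20, 10, 11]

d = [y - x for x, y in zip(before, after)]   # d_i = y_i - x_i
n = len(d)
d_bar = sum(d) / n                           # mean difference
s = sqrt(sum((di - d_bar) ** 2 for di in d) / (n - 1))

t0 = d_bar / (s / sqrt(n))                   # calculated t under H0: mu_D = 0
reject_h0 = t0 > 1.833                       # right-tailed rule, table value t_{9, 0.05}
print(d_bar, round(s, 2), round(t0, 4), reject_h0)
```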
X X
n 2
n i 1 2
and n2 respectively, then their sum U + V has the same χ2 distribution with d.f n1 + n2.
Step 2 : Describe the sample/data and its descriptive measures. Let (X1, X2, …, Xn) be a random
sample of n observations drawn from the population, where n is small (n < 30).
Step 3 : Fix the desired level of significance α.
Step 4 : Consider the test statistic $\chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2}$ under H0. The approximate sampling distribution of
the test statistic under H0 is the chi-square distribution with (n–1) degrees of freedom.
Step 5 : Calculate the value of χ2 for the given sample as $\chi_0^2 = \dfrac{(n-1)s^2}{\sigma_0^2}$.
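Step 6 : Choose the critical value of χe2 corresponding to α and H1 from the following table.
Alternative Hypothesis (H1)    σ2 > σ02                   σ2 < σ02                     σ2 ≠ σ02
Critical value (χe2)           $\chi^2_{n-1,\,\alpha}$    $\chi^2_{n-1,\,1-\alpha}$    $\chi^2_{n-1,\,\alpha/2}$ and $\chi^2_{n-1,\,1-\alpha/2}$
Step 7 : Decide on H0 choosing the suitable rejection rule from the following table corresponding
to H1.
Alternative Hypothesis (H1)    σ2 > σ02                              σ2 < σ02                                σ2 ≠ σ02
Rejection Rule                 $\chi_0^2 \ge \chi^2_{n-1,\,\alpha}$  $\chi_0^2 \le \chi^2_{n-1,\,1-\alpha}$  $\chi_0^2 \ge \chi^2_{n-1,\,\alpha/2}$ or $\chi_0^2 \le \chi^2_{n-1,\,1-\alpha/2}$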
NOTE
If the population mean µ is known then for testing H0 : σ2 = σ02 against any of the alternatives, we
use $\chi_0^2 = \dfrac{\sum_{i=1}^{n} (x_i - \mu)^2}{\sigma_0^2}$ with n d.f.
Step 2 :
The given sample information is
Sample size (n)= 8
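$\chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2}$
xi      (xi − 46)    (xi − 46)²
38      -8           64
42      -4           16
43      -3           9
50      4            16
48      2            4
45      -1           1
52      6            36
50      4            16
Total: $\sum_{i=1}^{8} x_i = 368$, $\sum_{i=1}^{8} (x_i - \bar{x}) = 0$, $\sum_{i=1}^{8} (x_i - \bar{x})^2 = 162$
$\bar{x} = \dfrac{\sum_{i=1}^{8} x_i}{n} = \dfrac{368}{8} = 46$
$s^2 = \dfrac{\sum_{i=1}^{8} (x_i - \bar{x})^2}{n-1} = \dfrac{\sum_{i=1}^{8} (x_i - 46)^2}{8-1} = \dfrac{162}{7} = 23.143$
The calculated value of chi-square is $\chi_0^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{\sum_{i=1}^{8} (x_i - \bar{x})^2}{\sigma_0^2} = \dfrac{162}{48} = 3.375$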
Step 7 : Decision
Since it is a two-tailed test, the elements of the critical region are determined by the
rejection rule $\chi_0^2 \ge \chi^2_{n-1,\,\alpha/2}$ or $\chi_0^2 \le \chi^2_{n-1,\,1-\alpha/2}$.
For the given sample information, the rejection rule does not hold.
Hence, H0 is not rejected in favour of H1. Thus, the population variance can be regarded as 48 kg².
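The chi-square statistic for a specified variance can be computed from the raw observations as sketched below. The data and σ0² = 48 are those used in the solution above; the variable names are ours, and the comparison with table values is left to the reader because it depends on the chosen level of significance.

```python
from statistics import mean

weights = [38, 42, 43, 50, 48, 45, 52, 50]   # sample observations
sigma0_sq = 48                               # variance specified under H0

n = len(weights)
x_bar = mean(weights)
ss = sum((x - x_bar) ** 2 for x in weights)  # sum of squared deviations = (n - 1) s^2

s_sq = ss / (n - 1)                          # sample variance
chi0 = ss / sigma0_sq                        # calculated chi-square
df = n - 1
print(x_bar, round(s_sq, 3), chi0, df)
```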
Example 2.7
A normal population has mean µ (unknown) and variance 9. A sample of size 9 observations
has been taken and its variance is found to be 5.4. Test the null hypothesis H0: σ2 = 9 against
H1: σ2 > 9 at 5% level of significance.
Solution:
Step 1 : Null Hypothesis H0: σ2 = 9.
i.e., the population variance is regarded as 9.
Alternative hypothesis H1: σ2 > 9.
i.e., the population variance is regarded as greater than 9.
Step 2 : Data
Sample size (n) = 9
Sample variance (s2) = 5.4
Example 2.8
A normal population has mean µ (unknown) and variance 0.018. A random sample of
size 20 observations has been taken and its variance is found to be 0.024. Test the null hypothesis
H0: σ2 = 0.018 against H1: σ2 < 0.018 at 5% level of significance.
Solution:
Step 1 : Null Hypothesis H0: σ2 = 0.018.
i.e., the population variance is regarded as 0.018.
Alternative hypothesis H1: σ2 < 0.018.
i.e., the population variance is regarded as less than 0.018.
Step 2 : Data
Sample size (n) = 20
Sample variance (s2) = 0.024
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
Under null hypothesis H0
$\chi^2 = \dfrac{(n-1)S^2}{\sigma_0^2}$
$\chi_0^2 = \dfrac{(n-1)s^2}{\sigma_0^2} = \dfrac{19 \times 0.024}{0.018} = 25.3$
Step 2 : Data
The data set is given in the form of a contingency table as under. Compute expected frequencies
Eij corresponding to each cell of the contingency table, using the formula
$E_{ij} = \dfrac{R_i \times C_j}{N}$; i = 1, 2, …, m; j = 1, 2, …, n
where,
N = Total sample size
Ri = Row sum corresponding to ith row
C j = Column sum corresponding to jth column
                       Attribute B
Attribute A    B1     B2     …    Bj     …    Bn     Total
A1             O11    O12    …    O1j    …    O1n    R1
A2             O21    O22    …    O2j    …    O2n    R2
:              :      :      :    :      :    :      :
Ai             Oi1    Oi2    …    Oij    …    Oin    Ri
:              :      :      :    :      :    :      :
Am             Om1    Om2    …    Omj    …    Omn    Rm
Total          C1     C2     …    Cj     …    Cn     N
Step 4 : Calculation
Calculate the value of the test statistic as
$\chi_0^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
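The expected frequencies and the test statistic for an m × n contingency table can be computed as in the following sketch. The function name `chi_square_independence` and the 2 × 3 table of counts are hypothetical, chosen only to illustrate the formulas Eij = Ri × Cj / N and χ0² = ΣΣ (Oij − Eij)² / Eij.

```python
def chi_square_independence(table):
    """Return (chi-square value, degrees of freedom) for a table of observed counts."""
    row_totals = [sum(row) for row in table]          # R_i
    col_totals = [sum(col) for col in zip(*table)]    # C_j
    total = sum(row_totals)                           # N
    chi0 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total   # E_ij
            chi0 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)       # (m - 1)(n - 1)
    return chi0, df

# Hypothetical 2 x 3 table of observed frequencies
chi0, df = chi_square_independence([[20, 30, 50], [30, 20, 50]])
print(chi0, df)
```

The calculated value is then compared with the table value of χ² for (m − 1)(n − 1) degrees of freedom at the chosen level of significance.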
Example 2.9
The following table gives the performance of 500 students classified according to age in a
computer test. Test whether the attributes age and performance are independent at 5% level of significance.
Solution:
Step 1 :
Null hypothesis H0: The attributes age and performance are independent.
Alternative hypothesis H1: The attributes age and performance are not independent.
Step 2 : Data
Compute expected frequencies Eij corresponding to each cell of the contingency table,
using the formula
$E_{ij} = \dfrac{R_i \times C_j}{N}$; i = 1, 2; j = 1, 2, 3
where,
N = Total sample size
Ri = Row sum corresponding to ith row
Cj = Column sum corresponding to jth column
Step 4 : Calculation
Calculate the value of the test statistic as
$\chi_0^2 = \sum_{i=1}^{2}\sum_{j=1}^{3} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Step 5 :
Critical value
From the chi-square table the critical value at 5% level of significance is
$\chi^2_{(2-1)(3-1),\,0.05} = \chi^2_{2,\,0.05} = 5.991$.
, the null hypothesis H0 is rejected. Hence, the performance and age of students are not
independent.
NOTE
If the contingency table is 2 × 2, then the value of χ2 can be calculated as given below:
A not A Total
B a b a+b
not B c d c+d
Total a+c b+d N=a+b+c+d
$\chi_0^2 = \dfrac{N(ad - bc)^2}{(a+b)(c+d)(a+c)(b+d)} \sim \chi^2$ (1 d.f.)
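The shortcut formula for a 2 × 2 table gives the same value as the general formula Σ (O − E)² / E. The sketch below checks this numerically; the cell counts a, b, c, d are hypothetical and are not taken from any example in the text.

```python
# Hypothetical 2 x 2 table:   A    not A
#                      B      a    b
#                      not B  c    d
a, b, c, d = 10, 20, 30, 40
N = a + b + c + d

# Shortcut formula for a 2 x 2 table
shortcut = N * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# General formula using expected frequencies E = (row total x column total) / N
cells = [
    (a, (a + b) * (a + c) / N),
    (b, (a + b) * (b + d) / N),
    (c, (c + d) * (a + c) / N),
    (d, (c + d) * (b + d) / N),
]
general = sum((o - e) ** 2 / e for o, e in cells)

print(round(shortcut, 4), round(general, 4))
```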
Example 2.10
A survey was conducted with 500 female students of which 60% were intelligent, 40% had
uneducated fathers, while 30 % of the not intelligent female students had educated fathers. Test the
hypothesis that the education of fathers and intelligence of female students are independent.
Solution:
Step 1 :
Null hypothesis H0: The attributes are independent i.e. No association between education
fathers and intelligence of female students
Alternative hypothesis H1: The attributes are not independent i.e there is association
between education of fathers and intelligence of female students
Step 2 : Data
The observed frequencies (O) have been computed from the given information as under.
                      Intelligent females       Not intelligent females    Total
Educated fathers      300 − 120 = 180           (30/100) × 200 = 60        240
Uneducated fathers    (40/100) × 300 = 120      200 − 60 = 140             260
Total                 (60/100) × 500 = 300      500 − 300 = 200            N = 500
N (ad bc)2
0 2
Step 6 : Decision
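The calculated value $\chi_0^2 = \dfrac{500\,(180 \times 140 - 60 \times 120)^2}{240 \times 260 \times 300 \times 200} = 43.269$ is greater than the critical value $\chi^2_{1,\,0.05} = 3.841$; hence the
null hypothesis H0 is rejected.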
Step 2 : Data
Calculate the expected frequencies (Ei) using appropriate theoretical distribution such as
Binomial or Poisson.
$\chi^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
where k = number of classes and
$\sum_{i=1}^{k} O_i = \sum_{i=1}^{k} E_i$.
If any of Ei is found less than 5, the corresponding class frequency may be pooled
with preceding or succeeding classes such that Ei's of all classes are greater than or equal
to 5. It may be noted that the value of k may be determined after pooling the classes.
The approximate sampling distribution of the test statistic under H0 is the chi-square
distribution with k-1-s d.f., s being the number of parameters to be estimated.
Step 5 : Calculation
Calculate the value of chi-square as
$\chi_0^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
The above steps in calculating the chi-square can be summarized in the form of the table as
follows:
Step 7 : Decision
Decide on rejecting or not rejecting the null hypothesis by comparing the calculated
value of the test statistic with the table value, at the desired level of significance.
Example 2.11
Five coins are tossed 640 times and the following results were obtained.
Number of heads 0 1 2 3 4 5
Solution:
Step 1 : Null hypothesis H0: Fitting of binomial distribution is appropriate for the given data.
Alternative hypothesis H1: Fitting of binomial distribution is not appropriate to the
given data.
Step 2 : Data
Compute the expected frequencies:
n = number of coins tossed at a time = 5
x f fx
0 19 0
1 99 99
2 197 394
3 198 594
4 105 420
5 22 110
Total 640 1617
$\hat{q} = 1 - \hat{p} = 0.5$
N × P(X = 0) = 640 × 0.03125 = 20
Step 5 : Calculation
The test statistic is computed as under:
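Oi      Ei      Oi – Ei    (Oi – Ei)²    (Oi – Ei)²/Ei
19      20      –1         1             0.050
99      100     –1         1             0.010
197     200     –3         9             0.045
198     200     –2         4             0.020
105     100     5          25            0.250
22      20      2          4             0.200
Total                                    0.575
$\chi_0^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i} = 0.575$
Critical value for d.f 4 at 5% level of significance is 9.488 i.e., $\chi^2_{4,\,0.05} = 9.488$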
Step 7 : Decision
As the calculated $\chi_0^2$ (= 0.575) is less than the critical value $\chi^2_{4,\,0.05} = 9.488$, we do not reject H0.
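The expected frequencies and the chi-square value of Example 2.11 can be reproduced with the sketch below. It assumes, as the solution does, that the estimate of p is rounded to 0.5; the variable names are ours.

```python
from math import comb

observed = [19, 99, 197, 198, 105, 22]   # frequencies for 0, 1, ..., 5 heads
n_coins, p = 5, 0.5                      # p-hat rounded to 0.5 as in the solution
N = sum(observed)                        # 640 tosses

# Expected frequency at x: N * C(n, x) * p^x * q^(n - x)
expected = [
    N * comb(n_coins, x) * p ** x * (1 - p) ** (n_coins - x)
    for x in range(n_coins + 1)
]

chi0 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1 - 1               # k - 1 - s, with s = 1 estimated parameter
print(expected, round(chi0, 3), df)
```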
Examine whether Poisson distribution is appropriate for the above data at 5% level of
significance.
Solution:
Step 1 : Null hypothesis H0: Fitting of Poisson distribution is appropriate for the given data.
Alternative hypothesis H1: Fitting of Poisson distribution is not appropriate for the
given data.
Step 2 : Data
The expected frequencies are computed as under:
To find the mean of the distribution.
x f fx
0 61 0
1 14 14
2 10 20
3 7 21
4 5 20
5 3 15
Total 100 90
$\bar{x} = \dfrac{\sum fx}{\sum f} = \dfrac{90}{100} = 0.9$
Probability mass function of Poisson distribution is:
$p(x) = \dfrac{e^{-m} m^x}{x!}$; x = 0, 1, ... (2.2)
In the case of Poisson distribution mean (m) = x = 0.9.
At x = 0, equation (2.2) becomes
$p(0) = \dfrac{e^{-m} m^0}{0!} = e^{-m} = e^{-0.9} = 0.4066$.
The expected frequency at x is N P(x)
x     m/(x + 1)    Expected frequency at x = N P(x)
0     0.9/1        40.66
1     0.9/2        36.594
2     0.9/3        16.4673
3     0.9/4        4.94019
4     0.9/5        1.1115
5     0.9/6        0.20007
x 0 1 2 3 4 5
Expected frequency 41 37 16 5 1 0
Oi                Ei               Oi – Ei    (Oi – Ei)²    (Oi – Ei)²/Ei
61                41               20         400           9.756
14                37               -23        529           14.297
10                16               -6         36            2.250
7 + 5 + 3 = 15    5 + 1 + 0 = 6    9          81            13.500
Total                                                       39.803
Note: In the above table, the expected frequencies 1 and 0 are less than 5. Hence, these
classes are combined (pooled) with the preceding class, whose expected frequency is 5,
so that the pooled expected frequency is greater than 5. Correspondingly, the
observed frequencies of these classes are pooled.
$\chi_0^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i} = 39.803$
Step 7 : Decision
The calculated $\chi_0^2$ (= 39.803) is greater than the critical value (5.991) at 5% level of
significance. Hence, we reject H0. i.e., fitting of Poisson distribution is not appropriate
for the given data.
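The Poisson fit in this example, including the recurrence P(x + 1) = m/(x + 1) · P(x) and the pooling of the last three classes, can be sketched as follows. The variable names are ours; expected frequencies are rounded to whole numbers as in the solution, and the pooling of classes 3, 4 and 5 is hard-coded to mirror the text rather than decided automatically.

```python
from math import exp

observed = [61, 14, 10, 7, 5, 3]                         # frequencies for x = 0, 1, ..., 5
N = sum(observed)
m = sum(x * f for x, f in enumerate(observed)) / N       # sample mean used as Poisson mean

# Poisson probabilities by the recurrence P(x + 1) = m / (x + 1) * P(x)
probs = [exp(-m)]
for x in range(len(observed) - 1):
    probs.append(probs[-1] * m / (x + 1))
expected = [round(N * p) for p in probs]                 # rounded expected frequencies

# Pool classes x = 3, 4, 5 because their expected frequencies are small
pooled_obs = observed[:3] + [sum(observed[3:])]
pooled_exp = expected[:3] + [sum(expected[3:])]

chi0 = sum((o - e) ** 2 / e for o, e in zip(pooled_obs, pooled_exp))
df = len(pooled_obs) - 1 - 1                             # k - 1 - s after pooling, s = 1
print(expected, pooled_obs, pooled_exp, round(chi0, 3), df)
```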
Example 2.13
A sample of 800 students appeared for a competitive examination. It was found that 320 students
have failed, 270 have secured a third grade, 190 have secured a second grade and the remaining students
qualified in first grade. The general opinion is that the above grades are in the ratio 4:3:2:1 respectively.
Test the hypothesis that the general opinion about the grades is appropriate at 5% level of significance.
Step 1 : Null hypothesis H0: The result in four grades follows the ratio 4:3:2:1
Alternative hypothesis H1: The result in four grades does not follow the ratio 4:3:2:1
Oi      Ei      Oi – Ei    (Oi – Ei)²    (Oi – Ei)²/Ei
320     320     0          0             0
270     240     30         900           3.75
190     160     30         900           5.625
20      80      -60        3600          45
Total                                    54.375
$\chi_0^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i} = 54.375$
Step 4 : Critical value
The critical value of χ2 for 3 d.f. at 5% level of significance is 7.81 i.e., $\chi^2_{3,\,0.05} = 7.81$
Step 5 : Decision
As the calculated value of $\chi_0^2$ (= 54.375) is greater than the critical value $\chi^2_{3,\,0.05} = 7.81$,
reject H0. Hence, the results of the four grades do not follow the ratio 4:3:2:1.
Example 2.14
The following table shows the distribution of digits in numbers chosen at random from a
telephone directory.
Digit 0 1 2 3 4 5 6 7 8 9
Frequency 1026 1107 997 966 1075 933 1107 972 964 853
Test whether the digits occur equally often in the directory at 5% level of significance.
Step 1 : Null hypothesis H0: The occurrence of the digits are equal in the directory.
Alternative hypothesis H1: The occurrence of the digits are not equal in the directory.
$\chi_0^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i} = 58.542$
Step 5 : Decision
Since the calculated $\chi_0^2$ (= 58.542) is greater than the critical value $\chi^2_{9,\,0.05} = 16.919$, reject
H0. Hence, the digits are not uniformly distributed in the directory.
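For Example 2.14, every digit has the same expected frequency under H0, so the calculation reduces to a few lines. The sketch below uses the frequencies given in the example and the table value 16.919; the variable names are ours.

```python
frequency = [1026, 1107, 997, 966, 1075, 933, 1107, 972, 964, 853]  # digits 0 to 9

N = sum(frequency)
expected = N / len(frequency)            # equal expected frequency for each digit

chi0 = sum((o - expected) ** 2 / expected for o in frequency)
df = len(frequency) - 1                  # no parameter is estimated
reject_h0 = chi0 > 16.919                # table value of chi-square for 9 d.f. at 5%
print(N, expected, round(chi0, 3), df, reject_h0)
```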
POINTS TO REMEMBER
If the number of elements in the sample is less than 30, it is called a small sample.
For conducting t-tests the parent population(s) should be normal and the sample(s)
should be small.
In case of two sample problems based on t-distribution the sizes of both samples must
be less than 30.
When the degrees of freedom is large the t-distribution converges to N(0,1) distribution.
The sampling distribution of the test statistic for testing hypothesis about normal
population mean is tn–1, when n is small and σ is unknown.
The sampling distribution of the test statistic for testing equality of two normal population
mean is tm+n–2 when m, n < 30 and the common population variance σ2 is unknown.
The uses of χ2 – distribution are (i) testing the specified variance of a normal
population (ii) testing goodness of fit and (iii) testing independence of attributes.
When the expected frequency for a cell is less than 5, it should be clubbed with the
adjacent cells such that the expected frequency in the resultant cell is at least 5.
The degrees of freedom for the χ2 – statistic used for the independence of attributes is
(m–1) × (n–1), where m and n are respectively the number of rows and columns in a
contingency table.
EXERCISE 2
a) $\dfrac{m s_1^2 + n s_2^2}{m+n}$ b) $\dfrac{(m-1) s_1^2 + (n-1) s_2^2}{m+n}$
c) $\dfrac{m s_1^2 + n s_1^2}{m+n+2}$ d) $\dfrac{(m-1) s_1^2 + (n-1) s_2^2}{m+n-2}$
10. A company gave an intensive training to its salesmen to increase the sales. A random sample
of 6 salesmen was selected and the value of their sales made before and after the training is
recorded. Which test will be more appropriate to test whether there is an increase in mean sales?
a) normal test b) paired t-test c) χ2-test d) F-test
11. If the order of the contingency table is (5 × 4), then the degrees of freedom of the corresponding
chi-square test statistic is
a) 18 b) 17 c) 12 d) 25
12. For testing the hypothesis concerning variance of a normal population _______ is used.
a) t-test b) F-test c) Z-test d) χ2-test
13. If σ2 is the variance of normal population, then the degrees of freedom of the sampling
distribution of the test statistic for testing H0 : σ2 = σ02 is:
a) n–1 b) n+1 c) n d) n–2
15. If a chi-square test is performed for testing goodness of fit to data with k classes after estimating ‘s’
parameters, then the degrees of freedom of the test statistic is
a) k–s b) (k–1)(s–1)
c) k–1–s d) k–1
16. The statistic χ2, with usual notations, in case of contingency table of order (m × n) is given by
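a) $\chi_0^2 = \sum_{i=1}^{m}\sum_{j=1}^{n} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ b) $\chi_0^2 = \sum_{i=1}^{k} \dfrac{(O_i - E_i)^2}{E_i}$
c) $\chi_0^2 = \sum_{i=1}^{k} \dfrac{O_i - E_i}{E_i}$ d) $\chi^2 = \sum_{i=1}^{k} \dfrac{O_i}{E_i}$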
TYPE-I TYPE-II
Number of units 8 7
Mean of the samples (in hrs.) 1134 1024
Standard deviation of the samples (in hrs) 35 40
Test the hypothesis that the population means are equal at 5% level of significance.
43. The number of pages typed by 5 DTP – operators for 1 hour in the morning sessions are 10,
12, 13, 8, 9 and the number of pages typed by them in the afternoon are 11, 15, 12, 10, 8. Is
there any significant difference in the mean number of pages typed?
44. An IQ test was conducted to 5 persons before and after they were trained. The results are
given below:
Candidates I II III IV V
IQ before training 110 120 123 132 125
IQ after training 120 118 125 136 121
47. An experiment was conducted 144 times with tossing of four coins and the number of heads
appeared at each throw are recorded.
No. of heads 0 1 2 3 4
frequency 10 34 56 36 8
Test whether the number of defective blades follows a Poisson distribution with mean = 0.44. Use
α = 0.05.
49. The quality grade of electric components produced in two factories is given in the table given
below.
Quality of grade Total
Factory
Poor Medium Good Excellent
A 136 165 151 148 600
B 31 58 55 36 180
Total 167 223 206 184 780
Test whether there is any association between factories and quality of grades.
50. The eyesight was tested among 2000 randomly selected patients from a city and the following
details are obtained.
Eye-sight
Gender Total
Poor Good
Male 620 380 1000
Female 550 450 1000
Total 1170 830 2000
Can we conclude that there is an association between gender and quality of eye-sight at 5%
level of significance?
51. The weights (in kg) of 10 students from a school are 38, 40, 45, 53, 47, 43, 55, 48, 52, 49. Can we
say that the variance of the distribution of weights of all students from the above school is equal
to 20 kg²?
52. In a sample of 200 households in a colony, two brands of hair oils A and B are applied by 90
females. Further, 60 females and 70 males are using brand A. Test whether there is any
association between the gender and brand of hair oil used, by constructing a contingency table.
45. t = 4.4898,reject H0
46. |t| = 0.7304, we do not reject H0
47. χ02 = 0.407407, we do not reject H0 at 5% level with 4 d.f.
48. χ02 = 35.10855, reject H0 at 5% level with 4 d.f.
49. χ02 = 5.798, we do not reject H0 at 5% level with 3 d.f.
50. χ02 = 10.09165, reject H0 at 5% level with 1 d.f.
51. χ02 =14, we do not reject H0 at 5% level with 9 d.f.
52.
Gender Hair Oil Brands TOTAL
A B
Male 70 40 110
Female 60 30 90
Total 130 70 200
χ02 = 0.1998, we do not reject H0 at 5% level with 1 d.f.
Steps:
• This is an Android app activity. Open the browser and type the URL given (or) scan the QR code (or)
search for Probability Statistical Distributions Calculator in the Google Play Store.
• (i) Install the app and open it, (ii) click “Menu”, (iii) in the menu page click the “Students Distribution”
menu.
• Input the degrees of freedom and the t-score or cumulative probability to get the output.
URL:
http://play.google.com/store/apps/details?id=net.eaglepeak.distributions_calculator
https://www.geogebra.org/m/wfencemf
3 TESTS BASED ON SAMPLING DISTRIBUTIONS – II
Sir Ronald Aylmer Fisher (1890–1962) was a British statistician and geneticist.
His work in statistics made him popularly known as “a genius who almost
single-handedly created the foundations for modern statistical science” and “the
single most important figure in 20th century Statistics”. In genetics, his work
used Mathematics to combine Mendelian Genetics and natural selection and
this contributed to the revival of Darwinism in the early 20th century revision
of the Theory of Evolution.
R. A. Fisher
“Natural selection is a mechanism for generating an exceedingly high degree of improbability”
“The Best time to plan an experiment is after you have done it”
“The analysis of variance is not a mathematical theorem, but rather a convenient method of
arranging the arithmetic”
LEARNING OBJECTIVES
Introduction
In the previous chapters, we have discussed various concepts used in testing of hypotheses
and problems relating to means of the populations. Although many practical problems involve
inferences about population means or proportions, the inference about population variances
is important and needs to be studied. In this chapter we will study (i) testing equality of two
population variances (ii) one-way ANOVA and (iii) two-way ANOVA, using F-distribution.
Definition: F -Distribution
Let X and Y be two independent χ2 random variates with m and n degrees of freedom respectively.
Then $F = \dfrac{X/m}{Y/n}$ is said to follow F-distribution with (m, n) degrees of freedom. This F-distribution
is named after the famous statistician R.A. Fisher (1890 to 1962).
Definition: F-Statistic
Let (X1, X2, …, Xm) and (Y1, Y2, …, Yn) be two independent random samples drawn from
N(μX, σX2) and N(μY, σY2) populations respectively. Then,
$\dfrac{1}{\sigma_X^2}\sum_{i=1}^{m} (X_i - \bar{X})^2 \sim \chi^2_{m-1}$ and $\dfrac{1}{\sigma_Y^2}\sum_{j=1}^{n} (Y_j - \bar{Y})^2 \sim \chi^2_{n-1}$
are independent.
CARE
If the populations are not normal, F – test may not be used.
Assumptions for testing the ratio of two normal population variances
i) The population from which the samples were obtained must be normally distributed.
(2) F-Statistic is also defined as the ratio of two mean square errors.
Applications of F-distribution
The following are some of the important applications where the sampling distribution of
the respective statistic under H0 is F–distribution.
(i) Testing the equality of variances of two normal populations. [Using (1)]
(ii) Testing the equality of means of k (>2) normal populations. [Using (2)]
(iii) Carrying out analysis of variance for two-way classified data. [Using (2)]
Test procedure:
This test compares the variances of two independent normal populations, viz., N(μX, σX2)
and N(μY, σY2).
Step 1 : Null Hypothesis: H0 : σX2 = σY2
That is, there is no significant difference between the variances of the two normal
populations.
The alternative hypothesis can be chosen suitably from any one of the following
(i) H1 : σX2 < σY2 (ii) H1 : σX2 > σY2 (iii) H1 : σX2 ≠ σY2
Step 2 : Data
Let X1, X2,. . ., Xm and Y1, Y2,. . ., Yn be two independent samples drawn from two normal
populations respectively.
Step 3 : Level of significance α
Step 7 : Decision
Alternative hypothesis (H1)     Rejection rule
σX2 < σY2                       F0 ≤ f(m–1, n–1), 1–α
σX2 > σY2                       F0 ≥ f(m–1, n–1), α
σX2 ≠ σY2                       F0 ≤ f(m–1, n–1), 1–α/2 or F0 ≥ f(m–1, n–1), α/2
Note 1: Since f(m–1, n–1), 1–α is not available in the given F-table, it is computed as the reciprocal of
f(n–1, m–1), α.
i.e., $f_{(m-1,\,n-1),\,1-\alpha} = \dfrac{1}{f_{(n-1,\,m-1),\,\alpha}}$
Note 2: Since the F-test is based on the ratio of variances, it is also known as the Variance Ratio Test.
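The rejection rules and the reciprocal rule of Note 1 can be sketched in a few lines of Python. This is only an illustration: the function name `rejects_h0` and its parameters are hypothetical, and the critical values are assumed to be read from an F-table by the user (at level α for one-sided tests, α/2 for the two-sided test).

```python
def rejects_h0(f0, alternative, upper_crit=None, upper_crit_swapped=None):
    """Return True when H0 (equal variances) is rejected.

    f0                 : calculated variance ratio
    alternative        : "less", "greater" or "two-sided"
    upper_crit         : tabulated f(m-1, n-1) upper critical value
    upper_crit_swapped : tabulated f(n-1, m-1) upper critical value,
                         used to obtain the lower critical value (Note 1)
    """
    if alternative == "greater":
        return f0 >= upper_crit
    # Lower critical value as the reciprocal of the swapped upper value
    lower_crit = 1.0 / upper_crit_swapped
    if alternative == "less":
        return f0 <= lower_crit
    if alternative == "two-sided":
        return f0 <= lower_crit or f0 >= upper_crit
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Illustrative values only: a table value of 3.44 gives a lower
# critical value of 1/3.44, roughly 0.291.
print(rejects_h0(0.25, "less", upper_crit_swapped=3.44))
```

The caller supplies the table values, so the sketch does not depend on any statistical library.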
Example 3.1
Two samples of sizes 9 and 8 give the sum of squares of deviations from their respective means as
160 inches square and 91 inches square respectively. Test the hypothesis that the variances of the two
populations from which the samples are drawn are equal at 10% level of significance.
Solution:
Step 1 : Null Hypothesis: H0 : σX2 = σY2
That is there is no significant difference between the two population variances.
Alternative Hypothesis: H1 : σX2 ≠ σY2
That is there is significant difference between the two population variances.
Step 2 : Data
m = 9, n = 8
$\sum_{i=1}^{9}(x_i-\bar{x})^2 = 160$, $\sum_{j=1}^{8}(y_j-\bar{y})^2 = 91$
Step 5 : Calculation
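$s_X^2 = \dfrac{1}{m-1}\sum_{i=1}^{m}(x_i-\bar{x})^2$ and $s_Y^2 = \dfrac{1}{n-1}\sum_{j=1}^{n}(y_j-\bar{y})^2$
$s_X^2 = \dfrac{160}{8} = 20$, $s_Y^2 = \dfrac{91}{7} = 13$
$F_0 = \dfrac{s_X^2}{s_Y^2} = \dfrac{20}{13} = 1.54$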
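The arithmetic of this calculation step can be checked with a short Python sketch. The variable names are illustrative; the numbers are those given in Example 3.1.

```python
# Example 3.1: sample sizes and sums of squared deviations from the means
m, n = 9, 8
ss_x, ss_y = 160.0, 91.0

# Sample variances use the divisors m - 1 and n - 1
s2_x = ss_x / (m - 1)
s2_y = ss_y / (n - 1)

# Variance ratio
f0 = s2_x / s2_y
print(s2_x, s2_y, round(f0, 2))
```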
Example 3.2
A medical researcher claims that the variance of the heart rates (in beats per minute) of
smokers is greater than the variance of heart rates of people who do not smoke. Samples from
two groups are selected and the data are given below. Using α = 0.05, test whether there is enough
evidence to support the claim.
Smokers Non Smokers
m = 25 n = 18
s12 = 36 s22 = 10
Solution:
Step 1 : Null Hypothesis: H0 : σ12 = σ22
That is there is no significant difference between the two population variances.
Alternative Hypothesis: H1 : σ12 > σ22
That is, the variance of heart rates of smokers is greater than that of non-smokers.
Step 2 : Data
Smokers Non Smokers
m = 25 n = 18
s12 = 36 s22 = 10
Step 7 : Decision
Since F0 = 3.6 > f (24,17),0.05 = 2.19, the null hypothesis is rejected and we conclude that
the variance of heart beats for smokers seems to be considerably higher compared to
that of the non-smokers.
Solution:
Let X1, X2, …, Xm represent sample values for school A and let Y1, Y2, …, Yn represent
sample values for school B.
Step 1 : Null Hypothesis: H0 : σX2 = σY2
That is, there is no significant difference between the two population variances.
Alternative Hypothesis: H1 : σX2 < σY2
That is, the variance of marks in school A is significantly less than that of school B.
Step 2 : Data
X1, X2 ,…, Xm are sample from school A
Y1, Y2, ..., Yn are sample from school B
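Step 3 : Test statistic
$F = \dfrac{s_X^2}{s_Y^2}$, where
$s_X^2 = \dfrac{1}{m-1}\sum_{i=1}^{m}(x_i-\bar{x})^2$ and $s_Y^2 = \dfrac{1}{n-1}\sum_{j=1}^{n}(y_j-\bar{y})^2$
Step 4 : Calculations
$\bar{x} = \dfrac{\sum_{i=1}^{m}x_i}{m} = \dfrac{666}{9} = 74$
$\bar{y} = \dfrac{\sum_{j=1}^{n}y_j}{n} = \dfrac{693}{9} = 77$
$s_X^2 = \dfrac{1}{9-1}\times 628 = \dfrac{628}{8} = 78.5$
$s_Y^2 = \dfrac{1}{9-1}\times 1024 = \dfrac{1024}{8} = 128$
$F_0 = \dfrac{78.5}{128} = 0.613$
Step 7 : Decision
Since F0 = 0.613 > f(8,8),0.95 = 0.291, the null hypothesis is not rejected and we conclude
that there is not sufficient evidence that the variance of marks in school A is less than that in school B.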
According to R.A. Fisher ANOVA is the “Separation of variance, ascribable to one group
of causes from the variance ascribable to other groups”.
The data may be classified with respect to different levels of a single factor/or different
levels of two factors.
…
Treatment k xk1 xk2 … xkn xk.
k
xij - the j th sample value from the ith treatment, j = 1, 2, …,ni, i =1, 2,…,k
k - number of treatments compared.
xi. - the sample total of i th treatment.
ni - the number of observations in the ith treatment.
$\sum_{i=1}^{k} n_i = n$
The total variation in the observations xij can be split into the following two components
i) variation between the levels or the variation due to different bases of classification,
commonly known as treatments.
ii) The variation within the treatments, i.e., inherent variation among the observations
within levels.
Causes involved in the first type of variation are known as assignable causes. The causes
leading to the second type of variation are known as chance or random causes.
The first type of variation that is due to assignable causes, can be detected and controlled by
human endeavor and the second type of variation that is due to chance causes, is beyond the human
control.
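(ii) Total Sum of Squares: $TSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i} x_{ij}^2 - C.F$
(iii) Sum of Squares between Treatments: $SST = \sum_{i=1}^{k}\dfrac{x_{i.}^2}{n_i} - C.F$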
Step 7 : Decision
Demerits
• Population variances of experimental units for different treatments need to be equal.
• Verification of normality assumption may be difficult.
Example 3.4
Three different techniques, namely medication, exercises and special diet, are randomly
assigned to individuals diagnosed with high blood pressure in order to lower the blood pressure. After four
weeks the reduction in each person’s blood pressure is recorded. Test at 5% level, whether there is
a significant difference in mean reduction of blood pressure among the three techniques.
Medication 10 12 9 15 13
Exercise 6 8 3 0 2
Diet 5 9 12 8 4
Solution:
Step 1 : Hypotheses
Null Hypothesis: H0: µ1 = µ 2 = µ3
That is, there is no significant difference among the three groups on the average reduction
in blood pressure.
That is, there is a significant difference in the average reduction in blood pressure in at least
one pair of treatments.
Step 2 : Data
Medication 10 12 9 15 13
Exercise 6 8 3 0 2
Diet 5 9 12 8 4
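             Observations           Total xi.    Square xi.2
Medication   10  12   9  15  13        59           3481
Exercise      6   8   3   0   2        19            361
Diet          5   9  12   8   4        38           1444
                                    G = 116         5286
Individual squares: $\sum\sum x_{ij}^2 = 1162$
1. Correction Factor: $CF = \dfrac{G^2}{n} = \dfrac{(116)^2}{15} = \dfrac{13456}{15} = 897.06$
2. Total Sum of Squares: $TSS = \sum\sum x_{ij}^2 - C.F = 1162 - 897.06 = 264.94$
3. Sum of Squares between Treatments: $SST = \sum \dfrac{x_{i.}^2}{n_i} - C.F = \dfrac{5286}{5} - 897.06 = 1057.2 - 897.06 = 160.14$
4. Sum of Squares due to Error: SSE = TSS – SST
= 264.94 – 160.14 = 104.8
ANOVA table (Total row): sum of squares = 264.94, degrees of freedom = n – 1 = 15 – 1 = 14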
Step 7 : Decision
As F0 = 9.17 > f(2, 12),0.05 = 3.8853, the null hypothesis is rejected. Hence, we conclude that
there exists a significant difference in the average reduction of blood pressure in at least
one pair of techniques.
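The one-way ANOVA computations above can be reproduced with a small Python sketch. The function name `one_way_anova` and the dictionary keys are illustrative choices, not part of the text; the formulas are the correction factor, TSS, SST and SSE used in this example, and group sizes are allowed to differ.

```python
def one_way_anova(groups):
    """Sums of squares and F-ratio for one-way classified data.

    groups : list of lists, one list of observations per treatment.
    """
    n = sum(len(g) for g in groups)           # total number of observations
    k = len(groups)                           # number of treatments
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n                 # correction factor G^2 / n
    tss = sum(x * x for g in groups for x in g) - cf
    sst = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = tss - sst
    mst = sst / (k - 1)
    mse = sse / (n - k)
    return {"CF": cf, "TSS": tss, "SST": sst, "SSE": sse,
            "MST": mst, "MSE": mse, "F0": mst / mse}


# Data of Example 3.4
blood_pressure = [
    [10, 12, 9, 15, 13],   # Medication
    [6, 8, 3, 0, 2],       # Exercise
    [5, 9, 12, 8, 4],      # Diet
]
result = one_way_anova(blood_pressure)
print({name: round(value, 2) for name, value in result.items()})
```

Unrounded intermediate values are carried through, so the second decimal place may differ slightly from the hand calculation, which rounds the correction factor first.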
Example 3.5
Three composition instructors recorded the number of spelling errors which their students
made on a research paper. At 1% level of significance test whether there is significant difference
in the average number of errors in the three classes of students.
Instructor 1 2 3 5 0 8
Instructor 2 4 6 8 4 9 0 2
Instructor 3 5 2 3 2 3 3
Solution:
Step 1 : Hypotheses
Null Hypothesis: H 0 : µ1 = µ 2 = µ3
That is there is no significant difference among the mean number of errors in the three
classes of students.
Alternative Hypothesis
Instructor 1 2 3 5 0 8
Instructor 2 4 6 8 4 9 0 2
Instructor 3 5 2 3 2 3 3
               Observations                Total xi.    Square xi.2
Instructor 1   2  3  5  0  8                  18           324
Instructor 2   4  6  8  4  9  0  2            33          1089
Instructor 3   5  2  3  2  3  3               18           324
                                            G = 69
Individual squares
Instructor 1    4   9  25   0  64
Instructor 2   16  36  64  16  81   0   4
Instructor 3   25   4   9   4   9   9
$\sum\sum x_{ij}^2 = 379$
Correction Factor: $CF = \dfrac{G^2}{n} = \dfrac{(69)^2}{18} = \dfrac{4761}{18} = 264.5$
Total Sum of Squares: $TSS = \sum\sum x_{ij}^2 - C.F = 379 - 264.5 = 114.5$
Source of variation    Sum of squares    Degrees of freedom    Mean sum of squares    F-ratio
Between treatments          9.9             3 – 1 = 2           9.9/2 = 4.95          F0 = 4.95/6.97 = 0.710
Error                     104.6                15               104.6/15 = 6.97
Total                     114.5          n – 1 = 18 – 1 = 17
Step 7 : Decision
As F0 = 0.710 < f(2, 15),0.05 = 3.6823, the null hypothesis is not rejected. There is not enough
evidence to reject the null hypothesis and hence we conclude that the mean numbers of
errors made by these three classes of students do not differ significantly.
ii) Total Sum of Squares: $TSS = \sum_{i=1}^{k}\sum_{j=1}^{m} x_{ij}^2 - C.F$
iii) Sum of Squares between Treatments: $SST = \sum_{i=1}^{k}\dfrac{x_{i.}^2}{m} - C.F$
Blocks    SSB    m-1    MSB    F0b = MSB/MSE
Step 7 : Decision
For Treatments: If the calculated F0t value is greater than the corresponding critical
value, then we reject the null hypothesis and conclude that there is a significant
difference among the treatment means, in at least one pair.
Demerits
• If the number of treatments is large enough, then it becomes difficult to maintain the
homogeneity of the blocks.
• If there is a missing value, it cannot be ignored. It has to be replaced with some function of
the existing values and certain adjustments have to be made in the analysis. This makes the
analysis slightly complex.
Comparison between one-way ANOVA and two-way ANOVA
Basis of comparison       One-way ANOVA                               Two-way ANOVA
Independent variable      One                                         Two
Compares                  Three or more levels of one factor          Three or more levels of two factors, simultaneously
Number of observations    Need not be same in each treatment group    Need to be equal in each treatment group
Example 3.6
A reputed marketing agency in India has three different training programs for its salesmen.
The three programs are Method – A, B, C. To assess the success of the programs, 4 salesmen from
each of the programs were sent to the field. Their performances in terms of sales are given in the
following table.
Methods
Salesmen
A B C
1 4 6 2
2 6 10 6
3 5 7 4
4 7 5 4
Test whether there is significant difference among methods and among salesmen.
Alternative Hypotheses:
H11 : At least one average is different from the other, among the three programs.
H12 : At least one average is different from the other, among the four salesmen.
Step 2 : Data
Salesmen Methods
A B C
1 4 6 2
2 6 10 6
3 5 7 4
4 7 5 4
Salesmen       A     B     C    Total xi.    xi.2
1              4     6     2       12         144
2              6    10     6       22         484
3              5     7     4       16         256
4              7     5     4       16         256
Total x.j     22    28    16    G = 66       1140
x.j2         484   784   256    Sum = 1524
Squares
16   36    4
36  100   36
25   49   16
49   25   16
$\sum\sum x_{ij}^2 = 408$
Total Sum of Squares: $TSS = \sum\sum x_{ij}^2 - C.F = 408 - 363 = 45$
Source of variation    Sum of squares    Degrees of freedom    Mean sum of squares    F-ratio
Between methods             18                  2                     9               F0 = 9/1.67 = 5.39
Error                       10                  6                     1.67
Total                       45                 11
Example 3.7
The illness caused by a virus in a city concerning some restaurant inspectors is not consistent
with their evaluations of cleanliness of restaurants. In order to investigate this possibility, the director
asks five restaurant inspectors to grade the cleanliness of three restaurants. The results are shown below.
Restaurants
Inspectors
I II III
1 71 55 84
2 65 57 86
3 70 65 77
4 72 69 70
5 76 64 85
Solution:
Step 1 :
Null hypotheses
H0I : µ1 = µ2 = µ3 = µ4 = µ5 (For inspectors - Treatments)
That is, there is no significant difference among the five inspectors over their mean
cleanliness scores
H0R : μI = μII = μIII (For restaurants - Blocks)
That is, there is no significant difference among the three restaurants over their mean
cleanliness scores
Alternative Hypotheses
H1I: At least one mean is different from the other among the Inspectors
H1R: At least one mean is different from the other among the Restaurants.
Restaurants
Inspectors
I II III
1 71 55 84
2 65 57 86
3 70 65 77
4 72 69 70
5 76 64 85
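Squares
$\sum\sum x_{ij}^2 = 76988$
Correction Factor: $CF = \dfrac{G^2}{n} = \dfrac{(1066)^2}{15} = \dfrac{1136356}{15} = 75757.07$
Sum of Squares between inspectors: $\sum \dfrac{x_{i.}^2}{l} - C.F = \dfrac{227454}{3} - 75757.07 = 75818 - 75757.07 = 60.93$
Source of variation     Sum of squares    Degrees of freedom    Mean sum of squares    F-ratio
Between inspectors           60.93               4                   15.23             F0I = 15.23/40.38 = 0.377
Between restaurants         846.93               2                  423.47             F0R = 423.47/40.38 = 10.49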
Step 7 : Decision
i) As F0I = 0.377 < f(4, 8),0.05 = 3.838, the null hypothesis is not rejected and we conclude
that there is no significant difference among the mean cleanliness scores of inspectors.
ii) As F0R = 10.49 > f(2, 8),0.05 = 4.459, the null hypothesis is rejected and we conclude that
there exists a significant difference in at least one pair of restaurants over their mean
cleanliness scores.
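The two-way ANOVA arithmetic of Example 3.7 can be checked with the following Python sketch. The function name `two_way_anova` and its return keys are hypothetical; rows are taken as inspectors and columns as restaurants, with one observation per cell as in the example.

```python
def two_way_anova(table):
    """Sums of squares and F-ratios for two-way classified data
    with one observation per cell (rows and columns as two factors)."""
    k = len(table)          # number of rows
    m = len(table[0])       # number of columns
    n = k * m
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                                   # G^2 / n
    tss = sum(x * x for row in table for x in row) - cf
    ss_rows = sum(sum(row) ** 2 for row in table) / m - cf      # row totals
    ss_cols = sum(sum(col) ** 2 for col in zip(*table)) / k - cf
    sse = tss - ss_rows - ss_cols
    mse = sse / ((k - 1) * (m - 1))
    return {
        "SS_rows": ss_rows, "SS_cols": ss_cols, "SSE": sse, "MSE": mse,
        "F_rows": (ss_rows / (k - 1)) / mse,
        "F_cols": (ss_cols / (m - 1)) / mse,
    }


# Data of Example 3.7: rows = inspectors 1 to 5, columns = restaurants I, II, III
scores = [
    [71, 55, 84],
    [65, 57, 86],
    [70, 65, 77],
    [72, 69, 70],
    [76, 64, 85],
]
anova = two_way_anova(scores)
print({name: round(value, 3) for name, value in anova.items()})
```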
Two independent random samples of size m and n are taken from Normal populations.
Then the statistic $F = \dfrac{s_X^2}{s_Y^2}$ is a random variable following the F-distribution with m–1 and
n–1 degrees of freedom.
According to R.A. Fisher, ANOVA is the “Separation of variance, ascribable to one group
of causes from the variance ascribable to other groups”.
One-way ANOVA is used to compare means in more than two groups.
Two-way ANOVA is used to compare means in more than two groups, controlling another
variable.
Assumptions required for ANOVA are:
• The observations follow normal distribution.
• Experimental units assigned to treatments are random.
• The sample observations are independent.
• The population variances of the groups are unknown but are assumed to be equal.
EXERCISE 3
3. One of the assumptions of ANOVA is that the population from which the samples are drawn is
6. In one-way classification with 30 observation and 5 treatments the degrees of freedom for error is
9. In two-way classification with 5 treatments and 4 blocks the degrees of freedom due to error
is
10. The formula for comparing three or more means in one-way analysis of variance is
(a) F = MST/MSE (b) F = TSS/SST
(c) F = MSB/MST (d) F = MST/MSB
11. ______ test is used to compare three or more means.
12. When there is no significant difference among three or more means the value of F will be
close to
14. The Analysis of Variance procedure is appropriate for testing the equivalence of three or
more population
16. If the calculated value of F is greater than the critical value at the given level of significance
then the H0 is
17. ______ and ______ causes are present in Analysis of Variance techniques
19. The correction factor is ______ in ANOVA (with the usual notations).
(a) $\dfrac{\sum T_{ij}^2}{n}$ (b) $\dfrac{\sum T_{i.}^2}{n}$
(c) $\dfrac{G^2}{n}$ (d) $\dfrac{\sum T_{i.}}{n}$
Sample Size Sample mean Sum of squares of deviations from the mean
I 10 15 90
II 12 14 108
38. The following data refer to the yield of wheat in quintals on plots of equal area in two
agricultural blocks A and B.
39. The calories contained in 1/2 cup servings of ice-creams selected randomly from two national
brands are listed here. At 5% level of significance, is there sufficient evidence to conclude that
the variance of calories is less for brand A than brand B?
Brand A 330 310 300 310 300 350 380 300 300
Brand B 300 300 270 290 310 370 300 310 250
40. The carbohydrates contained in servings of some randomly selected chocolate and non-chocolate
candies are listed below. Is there sufficient evidence to conclude that the variance in carbohydrates
varies between chocolate and non-chocolate candies? Use α = 2%.
41. A home gardener wishes to determine the effects of four fertilizers on the average number of
tomatoes produced. Test at 5% level of significance the hypothesis that the fertilizers A, B, C
and D have equal average yields.
A 14 10 12 16 17
B 9 11 12 8 10
C 16 15 14 10 18
D 10 11 11 13 8
42. Three processes X, Y and Z are tested to see whether their outputs are equivalent. The
following observations on outputs were made.
X 10 13 12 11 10 14 15 13
Y 9 11 10 12 13
Z 11 10 15 14 12 13
Carry out the one-way analysis of variance and state your conclusion.
43. A test was given to five students taken at random from XII class of three schools of a town.
The individual scores are
School I 9 7 6 5 8
School II 7 4 5 4 5
School III 6 5 6 7 6
44. A farmer applies three types of fertilizers on four separate plots. The figures on yield per acre
are tabulated below.
Plots
Fertilizer
A B C D
Nitrogen 6 4 8 6
Potash 7 6 6 9
Phosphate 8 5 10 9
Test whether there is any significant difference among mean yields of different plots and among
different fertilizers.
Machine types
Operators
A B C D E
I 8 10 7 12 6
II 12 13 8 9 12
III 7 8 6 8 8
IV 5 5 3 5 14
ANSWERS
I. 1. (c) 2. (b) 3. (d) 4. (a) 5. (a)
6. (c) 7. (a) 8. (c) 9. (a) 10. (a)
11. (c) 12. (c) 13. (b) 14. (c) 15. (d)
16. (a) 17. (c) 18. (b) 19. (c) 20. (c)
Steps:
• Open the browser and type the URL given (or) scan the QR code. GeoGebra work book called
“F-Distribution” will appear.
• In this, several work sheets for statistics are given; open the worksheet named “F-Distribution”.
• Drag and move the Red colour and Blue colour button or type the values in the left side box for result
Step-1 Step-2
Step-3 Step-4
URL:
https://www.geogebra.org/m/A45YdMfJ
CORRELATION
4 ANALYSIS
“When the relationship is of a quantitative nature, the appropriate statistical tool for discovering
the existence of relation and measuring the intensity of relationship is known as correlation”
—CROXTON AND COWDEN
LEARNING OBJECTIVES
The statistical techniques discussed so far are for only one variable. In many research
situations one has to consider two variables simultaneously to know whether these two variables
are related linearly and, if so, what type of relationship exists between them. This leads to
bivariate (two variables) data analysis, namely correlation analysis. If two quantities vary in such a
way that movements (upward or downward) in one are accompanied by movements (upward
or downward) in the other, these quantities are said to be co-related or correlated.
The correlation concept will help to answer the following types of questions.
• Whether study time in hours is related with marks scored in the examination?
• Is it worth spending on advertisement for the promotion of sales?
• Whether a woman’s age and her systolic blood pressure are related?
• Is age of husband and age of wife related?
• Whether price of a commodity and demand related?
• Is there any relationship between rainfall and production of rice?
• Investigates the type and strength of the relationship that exists between the two variables.
• Progressive development in the methods of science and philosophy has been characterized by
the rich knowledge of relationship.
In this chapter, we study simple correlation only; multiple correlation and partial correlation
involving three or more variables will be studied in higher classes.
3. Uncorrelated
In other words, if one variable increases, the other variable (on an average) also increases or if one
variable decreases, the other (on an average)variable also decreases.
For example,
i) Income and savings
ii) Marks in Mathematics and Marks in Statistics. (i.e.,Direct relationship pattern exists).
[Figure: X - Height of goods. The height of the lift increases / decreases according as the height of goods increases / decreases.]
[Figure: The starting position of writing depends on the height of the writer.]
For example,
i) Price and demand
ii) Unemployment and purchasing power
3) Uncorrelated:
The variables are said to be uncorrelated if smaller values of x are associated with smaller
or larger values of y and larger values of x are associated with larger or smaller values of y. If the
two variables do not associate linearly, they are said to be uncorrelated. Here r = 0.
Important note: Uncorrelated does not
imply independence. That is, do not interpret r = 0
as meaning that the two variables are independent; instead,
interpret it as meaning that no specific linear pattern exists,
although there may be a non-linear relationship.
4) Perfect Positive Correlation
If the values of x and y increase or decrease proportionately then they are said to have
perfect positive correlation.
5) Perfect Negative Correlation
The purpose of correlation analysis is to find the existence of linear relationship between
the variables. However, the method of calculating correlation coefficient depends on the types of
measurement scale, namely, ratio scale or ordinal scale or nominal scale.
1) Positive correlation
If the plotted points in the plane form a band and they show the
rising trend from the lower left hand corner to the upper right hand corner,
the two variables are positively correlated. In this case 0 < r < 1
2) Negative correlation
If the plotted points in the plane form a band and they show the falling
trend from the upper left hand corner to the lower right hand corner, the two
variables are negatively correlated. In this case -1 < r < 0
3) Uncorrelated
If the plotted points spread over in the plane then the two variables
are uncorrelated. In this case r = 0
4) Perfect positive correlation
If all the plotted points lie on a straight line from lower left hand
corner to the upper right hand corner then the two variables have perfect
positive correlation. In this case r = +1
5) Perfect negative correlation
If all the plotted points lie on a straight line falling from upper left
hand corner to lower right hand corner, the two variables have perfect
negative correlation. In this case r = -1
4.4.2 Properties
1. The correlation coefficient between X and Y is same as the correlation coefficient between Y and
X (i.e, rxy = ryx ).
2. The correlation coefficient is free from the units of measurements of X and Y
3. The correlation coefficient is unaffected by change of scale and origin.
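Thus, if $u_i = \dfrac{x_i - A}{c}$ and $v_i = \dfrac{y_i - B}{d}$ with c ≠ 0 and d ≠ 0, i = 1, 2, ..., n, then
$$r = \frac{n\sum_{i=1}^{n}u_iv_i - \sum_{i=1}^{n}u_i\sum_{i=1}^{n}v_i}{\sqrt{n\sum_{i=1}^{n}u_i^2 - \left(\sum_{i=1}^{n}u_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}v_i^2 - \left(\sum_{i=1}^{n}v_i\right)^2}}$$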
Example 4.1
The following data gives the heights(in inches) of father and his eldest son. Compute the
correlation coefficient between the heights of fathers and sons using Karl Pearson’s method.
Height of father 65 66 67 67 68 69 70 72
Height of son 67 68 65 68 72 72 69 71
Calculation
xi yi xi2 yi2 xiyi
65 67 4225 4489 4355
66 68 4356 4624 4488
67 65 4489 4225 4355
67 68 4489 4624 4556
68 72 4624 5184 4896
69 72 4761 5184 4968
70 69 4900 4761 4830
72 71 5184 5041 5112
Total 544 552 37028 38132 37560
Heights of father and son are positively correlated. It means that, on the average, if fathers are
tall then sons will probably be tall, and if fathers are short, sons will probably be short.
Short-cut method
Let A = 68 , B = 69, c = 1 and d = 1
xi   yi   ui = (xi – A)/c = xi – 68   vi = (yi – B)/d = yi – 69   ui2   vi2   uivi
65 67 -3 -2 9 4 6
66 68 -2 -1 4 1 2
67 65 -1 -4 1 16 4
67 68 -1 -1 1 1 1
68 72 0 3 0 9 0
69 72 1 3 1 9 3
70 69 2 0 4 0 0
72 71 4 2 16 4 8
Total 0 0 36 44 24
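$$r = \frac{n\sum_{i=1}^{n}u_iv_i - \sum_{i=1}^{n}u_i\sum_{i=1}^{n}v_i}{\sqrt{n\sum_{i=1}^{n}u_i^2 - \left(\sum_{i=1}^{n}u_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}v_i^2 - \left(\sum_{i=1}^{n}v_i\right)^2}}$$
$r = \dfrac{8 \times 24}{\sqrt{8 \times 36}\,\sqrt{8 \times 44}} = 0.603$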
Note: The correlation coefficient computed by using direct method and short-cut method is the same.
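The agreement between the direct method and the short-cut method can be checked with a short Python sketch. The function name `pearson_r` is an illustrative choice; the data and the constants A = 68, B = 69, c = d = 1 are those of Example 4.1.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Karl Pearson's correlation coefficient from raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = sqrt(n * sum_xx - sum_x ** 2) * sqrt(n * sum_yy - sum_y ** 2)
    return numerator / denominator


fathers = [65, 66, 67, 67, 68, 69, 70, 72]
sons = [67, 68, 65, 68, 72, 72, 69, 71]

# Direct method
r_direct = pearson_r(fathers, sons)

# Short-cut method: change of origin with A = 68, B = 69 and c = d = 1
u = [x - 68 for x in fathers]
v = [y - 69 for y in sons]
r_shortcut = pearson_r(u, v)

print(round(r_direct, 3), round(r_shortcut, 3))
```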
Example 4.2
The following are the marks scored by 7 students in two tests in a subject. Calculate
coefficient of correlation from the following data and interpret.
Marks in test-1 12 9 8 10 11 13 7
Marks in test-2 14 8 6 9 11 12 3
Solution:
Let x denote marks in test-1 and y denote marks in test-2.
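xi      yi      xi2     yi2     xiyi
12      14      144     196     168
9       8       81      64      72
8       6       64      36      48
10      9       100     81      90
11      11      121     121     121
13      12      169     144     156
7       3       49      9       21
Total 70 63 728 651 676
$$r = \frac{n\sum_{i=1}^{n}x_iy_i - \sum_{i=1}^{n}x_i\sum_{i=1}^{n}y_i}{\sqrt{n\sum_{i=1}^{n}x_i^2 - \left(\sum_{i=1}^{n}x_i\right)^2}\;\sqrt{n\sum_{i=1}^{n}y_i^2 - \left(\sum_{i=1}^{n}y_i\right)^2}}$$
$\sum x_i = 70$, $\sum x_i^2 = 728$, $\sum x_iy_i = 676$, $\sum y_i = 63$, $\sum y_i^2 = 651$, n = 7
$r = \dfrac{7 \times 676 - 70 \times 63}{\sqrt{7 \times 728 - 70^2}\,\sqrt{7 \times 651 - 63^2}}$
$= \dfrac{4732 - 4410}{\sqrt{5096 - 4900}\,\sqrt{7 \times 651 - 3969}}$
$= \dfrac{322}{\sqrt{196}\,\sqrt{588}} = \dfrac{322}{14 \times 24.25} = \dfrac{322}{339.5} = 0.95$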
2. Correlation does not imply a causal relationship, that is, it does not imply that a change in one
variable causes a change in another.
NOTE
1. Uncorrelated : Uncorrelated (r = 0) implies no ‘linear relationship’. But there may exist non-
linear relationship (curvilinear relationship).
Example: Age and health care are related. Children and elderly people need much more health
care than middle aged persons as seen from the following graph.
[Figure: Health care need plotted against age (child, adult, old)]
However, if we compute the linear correlation r for such data, it may be zero implying
age and health care are uncorrelated, but non-linear correlation is present.
2. Spurious Correlation : The word ‘spurious’ from Latin means ‘false’ or ‘illegitimate’. Spurious
correlation means an association extracted from correlation coefficient that may not exist in reality.
$\rho = 1 - \dfrac{6\sum_{i=1}^{n} D_i^2}{n(n^2-1)}$
where Di = R1i – R2i
Interpretation
Spearman’s rank correlation coefficient is a statistical measure of the strength of a
monotonic (increasing/decreasing) relationship between paired data. Its interpretation is similar
to that of Pearson’s. That is, the closer to the ±1 means the stronger the monotonic relationship.
0.01 to 0.19: “Very Weak Agreement” (-0.01) to (-0.19): “Very Weak Disagreement”
0.80 to 1.0: “Very Strong Agreement” (-0.80) to (-1.0): “Very Strong Disagreement”
Example 4.3
Two referees in a flower beauty competition rank the 10 types of flowers as follows:
Referee A 1 6 5 10 3 2 4 9 7 8
Referee B 6 4 9 8 1 2 3 10 5 7
Use the rank correlation coefficient and find out what degree of agreement is between the
referees.
Here n = 10 and $\sum_{i=1}^{n} D_i^2 = 60$
$\rho = 1 - \dfrac{6\sum_{i=1}^{n} D_i^2}{n(n^2-1)} = 1 - \dfrac{6 \times 60}{10(10^2-1)} = 1 - \dfrac{360}{10 \times 99} = 1 - \dfrac{360}{990} = 0.636$
Interpretation: Degree of agreement between the referees ‘A’ and ‘B’ is 0.636 and they have “strong
agreement” in evaluating the competitors.
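The calculation in Example 4.3 can be reproduced with a few lines of Python. The function name `spearman_from_ranks` is illustrative, and the sketch assumes that the inputs are already ranks with no ties.

```python
def spearman_from_ranks(ranks_1, ranks_2):
    """Spearman's rank correlation from two lists of untied ranks."""
    n = len(ranks_1)
    # Sum of squared rank differences D_i = R1i - R2i
    sum_d2 = sum((r1 - r2) ** 2 for r1, r2 in zip(ranks_1, ranks_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


referee_a = [1, 6, 5, 10, 3, 2, 4, 9, 7, 8]
referee_b = [6, 4, 9, 8, 1, 2, 3, 10, 5, 7]

rho = spearman_from_ranks(referee_a, referee_b)
print(round(rho, 3))
```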
Example 4.4
Calculate the Spearman’s rank correlation coefficient for the following data.
Candidates 1 2 3 4 5
Marks in Tamil 75 40 52 65 60
Marks in English 25 42 35 29 33
$\sum_{i=1}^{n} D_i^2 = 40$ and n = 5
$\rho = 1 - \dfrac{6\sum_{i=1}^{n} D_i^2}{n(n^2-1)} = 1 - \dfrac{6 \times 40}{5(5^2-1)} = 1 - \dfrac{240}{5 \times 24} = -1$
Interpretation: This perfect negative rank correlation (-1) indicates that the scores in the two subjects
totally disagree. The student who is best in Tamil is weakest in English and vice-versa.
Example 4.5
Quotations of index numbers of equity share prices of a certain joint stock company and
the prices of preference shares are given below.
Years 2013 2014 2015 2016 2017 2008 2009
Equity shares 97.5 99.4 98.6 96.2 95.1 98.4 97.1
Preference shares 75.1 75.9 77.1 78.2 79 74.6 76.2
Using the method of rank correlation determine the relationship between equity shares
and preference shares prices.
Solution:
$\sum_{i=1}^{n} D_i^2 = 90$ and n = 7.
$\rho = 1 - \dfrac{6\sum_{i=1}^{n} D_i^2}{n(n^2-1)} = 1 - \dfrac{6 \times 90}{7(7^2-1)} = 1 - \dfrac{540}{7 \times 48} = 1 - \dfrac{540}{336} = 1 - 1.6071 = -0.6071$
Interpretation: There is a negative correlation between equity shares and preference share prices.
There is a strong disagreement between equity shares and preference share prices.
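$\rho = 1 - 6\,\dfrac{\sum D_i^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \cdots}{n(n^2-1)}$
where mi is the number of repetitions of ith rank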
Example 4.6
Compute the rank correlation coefficient for the following data of the marks obtained by
8 students in the Commerce and Mathematics.
Marks in Commerce 15 20 28 12 40 60 20 80
Marks in Mathematics 40 30 50 30 20 10 30 60
$\rho = 1 - 6\,\dfrac{\sum D_i^2 + \frac{1}{12}\left(m_1^3 - m_1\right) + \frac{1}{12}\left(m_2^3 - m_2\right) + \cdots}{n(n^2-1)}$
Repetitions of ranks
In Commerce (X), 20 is repeated two times corresponding to ranks 3 and 4. Therefore, 3.5
is assigned for ranks 3 and 4 with m1=2.
In Mathematics (Y), 30 is repeated three times corresponding to ranks 3, 4 and 5. Therefore,
4 is assigned for ranks 3,4 and 5 with m2=3.
Therefore,
$\rho = 1 - 6\,\dfrac{81.5 + \frac{1}{12}\left(2^3 - 2\right) + \frac{1}{12}\left(3^3 - 3\right)}{8(8^2-1)} = 1 - 6\,\dfrac{[81.5 + 0.5 + 2]}{504} = 1 - \dfrac{504}{504} = 0$
Interpretation: Marks in Commerce and Mathematics are uncorrelated
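The tied-rank calculation of Example 4.6 can be sketched in Python as follows. The helper names `average_ranks` and `spearman_with_ties` are hypothetical; ranks are assigned in ascending order of marks, which matches the ranks 3.5 and 4 used in the solution, and tied values share the average of the positions they occupy.

```python
from collections import Counter


def average_ranks(values):
    """Ascending ranks; tied values receive the average of their positions."""
    positions = {}
    for position, value in enumerate(sorted(values), start=1):
        positions.setdefault(value, []).append(position)
    return [sum(positions[v]) / len(positions[v]) for v in values]


def spearman_with_ties(xs, ys):
    """Rank correlation with the correction term (m^3 - m)/12 per tied group."""
    n = len(xs)
    sum_d2 = sum((rx - ry) ** 2
                 for rx, ry in zip(average_ranks(xs), average_ranks(ys)))
    correction = sum((m ** 3 - m) / 12
                     for data in (xs, ys)
                     for m in Counter(data).values() if m > 1)
    return 1 - 6 * (sum_d2 + correction) / (n * (n ** 2 - 1))


commerce = [15, 20, 28, 12, 40, 60, 20, 80]
mathematics = [40, 30, 50, 30, 20, 10, 30, 60]

rho_tied = spearman_with_ties(commerce, mathematics)
print(rho_tied)
```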
Yule’s coefficient: $Q = \dfrac{(AB)(\alpha\beta) - (A\beta)(\alpha B)}{(AB)(\alpha\beta) + (A\beta)(\alpha B)}$
Note 1: The usage of the symbol α is not to be confused with level of significance.
Note 2: (AB): Number with attributes AB etc.
This coefficient ranges from –1 to +1. The values between –1 and 0 indicate inverse
relationship (association) between the attributes. The values between 0 and +1 indicate direct
relationship (association) between the attributes.
Example 4.7
Out of 1800 candidates who appeared for a competitive examination, 625 were successful; 300 had
attended a coaching class and of these 180 came out successful. Test for the association of the attributes
attending the coaching class and success in the examination.
Solution:
N = 1800
A: Success in examination α: No success in examination
B: Attended the coaching class β: Not attended the coaching class
(A) = 625, (B) = 300, (AB) = 180
B β Total
A 180 445 625
α 120 1055 1175
Total 300 1500 N = 1800
Remark: Consistency in the data using contingency table may be found as under.
Construct a 2 × 2 contingency table for the given information. If at least one of the cell
frequencies is negative then there is inconsistency in the given data.
Example 4.8
Verify whether the given data: N = 100, (A) = 75, (B) = 60 and (AB) = 15 is consistent.
Solution:
The given information is presented in the following contingency table.
B β Total
A 15 60 75
α 45 -20 25
Total 60 40 N = 100
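The contingency tables of Examples 4.7 and 4.8 can be built and checked with a short Python sketch. The function names are illustrative; the cell frequencies are derived from N, (A), (B) and (AB) by subtraction, the data are treated as inconsistent when any cell is negative, and Yule's Q is computed as the difference of the cross products divided by their sum.

```python
def contingency_cells(n_total, a, b, ab):
    """Return the cells (AB), (A beta), (alpha B), (alpha beta) of a 2 x 2 table."""
    a_beta = a - ab
    alpha_b = b - ab
    alpha_beta = n_total - a - b + ab
    return ab, a_beta, alpha_b, alpha_beta


def is_consistent(cells):
    """Data are consistent only if no cell frequency is negative."""
    return all(frequency >= 0 for frequency in cells)


def yules_q(cells):
    """Yule's coefficient of association from the four cell frequencies."""
    ab, a_beta, alpha_b, alpha_beta = cells
    return ((ab * alpha_beta - a_beta * alpha_b)
            / (ab * alpha_beta + a_beta * alpha_b))


# Example 4.7: N = 1800, (A) = 625, (B) = 300, (AB) = 180
coaching = contingency_cells(1800, 625, 300, 180)
print(coaching, is_consistent(coaching), round(yules_q(coaching), 3))

# Example 4.8: N = 100, (A) = 75, (B) = 60, (AB) = 15
check = contingency_cells(100, 75, 60, 15)
print(check, is_consistent(check))
```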
POINTS TO REMEMBER
Correlation study is about finding the linear relationship between two variables.
Correlation is not causation. Sometimes the correlation may be spurious.
Correlation coefficient lies between –1 and +1.
Pearson’s correlation coefficient provides the type of relationship and intensity of
relationship, for the data in ratio scale measure.
Spearman’s correlation measures the relationship between the two ordinal variables.
Yule’s coefficient of Association measures the association between two dichotomous
attributes.
(a) 6 D (b) i
2
6 D (c)i
2
6 D (d) i
2
1 i 1 1 i 1
1 1
i 1 i 1
n n
3
n n
3
n n
3 n n 1 2
8. If $\sum D^2 = 0$, rank correlation is
45. Find the Karl Pearson’s coefficient of correlation for the following data.
Wages 100 101 102 102 100 99 97 98 96 95
Cost of living 98 99 99 97 95 92 95 94 90 91
How are the wages and cost of living correlated?
46. Calculate the Karl Pearson’s correlation coefficient between the marks (out of 10) in statistics
and mathematics of 6 students.
Student 1 2 3 4 5 6
Statistics 7 4 6 9 3 8
Mathematics 8 5 4 8 3 6
48. Calculate the Spearman’s rank correlation coefficient between price and supply from the
following data.
Price 4 6 8 10 12 14 16 18
Supply 10 15 20 25 30 35 40 45
49. A random sample of 5 college students is selected and their marks in Tamil and English are
found to be:
Tamil 85 60 73 40 90
English 93 75 65 50 80
Calculate Spearman’s rank correlation coefficient.
50. Calculate Spearman’s coefficient of rank correlation for the following data.
x 53 98 95 81 75 71 59 55
y 47 25 32 37 30 40 39 45
51. Calculate the coefficient of correlation for the following data using ranks.
Mark in Tamil 29 24 25 27 30 31
Mark in English 29 19 30 33 37 36
52. From the following data calculate the rank correlation coefficient.
x 49 34 41 10 17 17 66 25 17 58
y 14 14 25 7 16 5 21 10 7 20
Yule’s coefficient
53. Can vaccination be regarded as a preventive measure against Hepatitis B from the data given below?
Of 1500 persons in a locality, 400 were attacked by Hepatitis B. 750 had been vaccinated. Among
them only 75 were attacked.
III 38. n = 10
40. r = 0.85
41. (αβ ) = −50 , The given data is inconsistent
45. r = 0.847 wages and cost of living are highly positively correlated.
46. r = 0.8081. Statistics and mathematics marks are highly positively correlated.
47. ρ = 0.8929 price of tea and coffee are highly positively correlated.
49. ρ = 0.8
52. ρ = +0.733
54. There is a positive association between not attacked and not vaccinated.
Steps:
• This is an android app activity. Open the browser and type the URL given (or) scan the QR code. (Or)
search for “Correlation Coefficient” in google play store.
• (i) Install the app and open it, (ii) To calculate the Correlation Co-efficient, input the values of
X and Y in the given box, (iii) Then click “CALCULATE” to get the result.
Step-1 Step-2
Step-3 Result
URL:
https://play.google.com/store/apps/details?id=com.hiox.CreliCoefficientCalcul
REGRESSION
5 ANALYSIS
Francis Galton (1822-1911) was born in a wealthy family. The youngest of nine
children, he appeared as an intelligent child. Galton’s progress in education was
not smooth. He dabbled in medicine and then studied Mathematics at Cambridge.
In fact he subsequently freely acknowledged his weakness in formal Mathematics,
but this weakness was compensated by an exceptional ability to understand the
meaning of data. Many statistical terms, which are in current usage were coined
by Galton. For example, correlation is due to him, as is regression, and he was the
originator of terms and concepts such as quartile, decile and percentile, and of the use of median as
the midpoint of a distribution.
The concept of regression comes from genetics and was popularized by Sir Francis Galton
during the late 19th century with the publication of regression towards mediocrity in hereditary stature.
Galton observed that extreme characteristics (e.g., height) in parents are not passed on completely to
their offspring. An examination of publications of Sir Francis Galton and Karl Pearson revealed that
Galton's work on inherited characteristics of sweet peas led to the initial conceptualization of linear
regression. Subsequent efforts by Galton and Pearson brought many techniques of multiple regression
and the product-moment correlation coefficient.
LEARNING OBJECTIVES
Introduction
The correlation coefficient is a useful statistical tool for describing the type (positive,
negative or uncorrelated) and intensity (such as moderate or high) of the linear relationship between
two variables. But it fails to give a mathematical functional relationship for prediction purposes.
Regression analysis is a vital statistical method for obtaining functional relationship between a
dependent variable and one or more independent variables. More specifically, regression analysis
helps one to understand how the typical value of the dependent variable (or ‘response variable’)
changes when any one of the independent variables (regressor(s) or predictor(s)) is varied, while
the other independent variables are held fixed. It helps to determine the impact of changes in
the value(s) of the independent variable(s) upon changes in the value of the dependent variable.
Regression analysis is widely used for prediction.
Types of ‘Regression’
Based on the kind of relationship between the dependent variable and the set of independent
variable(s), there arise two broad categories of regression, viz., linear regression and non-linear regression.
If the relationship is linear and there is only one independent variable, then the regression
is called simple linear regression. On the other hand, if the relationship is linear and the
number of independent variables is two or more, then the regression is called multiple linear
regression. If the relationship between the dependent variable and the independent variable(s) is
not linear, then the regression is called non-linear regression.
NOTE
There are many reasons for the presence of the error term in the linear regression model. It is also
known as measurement error. In some situations, it indicates the presence of several variables other
than the present set of regressors.
[Figure: independent variable X, dependent variable Y and the error, with the regression line Y = a + bX + e]
The general form of the simple linear regression equation is Y = a + bX + e, where ‘X’ is the
independent variable, ‘Y’ is the dependent variable, ‘a’ is the intercept, ‘b’ is the slope of the line and ‘e’ is the
error term. This equation can be used to estimate the value of the response variable (Y) based on the
given values of the predictor variable (X) within its domain.
Before going for further study, the following points are to be kept in mind.
•• Both the independent and dependent variables must be measured at the interval scale.
•• There must be linear relationship between independent and dependent variables.
•• Linear regression is very sensitive to outliers (extreme observations). They can affect the
regression line severely and, eventually, the estimated values of Y too.
The method of least squares helps us to find the values of unknowns ‘a’ and ‘b’ in such a
way that the following two conditions are satisfied:
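• Sum of the residuals is zero. That is, $\sum_{i=1}^{n} (y_i - \hat{y}_i) = 0$.
• Sum of the squares of the residuals, $E(a, b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$, is the least.
The quantity to be minimized is
$$E(a, b) = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \text{i.e.,} \quad E(a, b) = \sum_{i=1}^{n} (y_i - a - b x_i)^2 .$$
Here, $\hat{y}_i = a + b x_i$ is the expected (estimated) value of the response variable for the given $x_i$.
[Figure: Simple Linear Regression Model, showing the regression line, an observed point $y_i = a + b x_i + \text{Error}$ and the error]
Equating the partial derivatives of $E(a, b)$ with respect to a and b to zero gives
$$\frac{\partial E(a, b)}{\partial a} = -2\sum_{i=1}^{n} (y_i - a - b x_i) = 0$$
$$\frac{\partial E(a, b)}{\partial b} = -2\sum_{i=1}^{n} x_i (y_i - a - b x_i) = 0$$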
These give
$$n a + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i$$
$$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i$$
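These equations are popularly known as normal equations. Solving these equations for ‘a’
and ‘b’ yields the estimates $\hat{a}$ and $\hat{b}$:
$$\hat{a} = \bar{y} - \hat{b}\,\bar{x}$$
and
$$\hat{b} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - \bar{x}\,\bar{y}}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$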
It may be seen that in the estimate of ‘b’, the numerator and denominator are respectively
the sample covariance between X and Y, and the sample variance of X. Hence, the estimate of ‘b’
may be expressed as
$$\hat{b} = \frac{Cov(X, Y)}{V(X)}$$
Further, it may be noted that for notational convenience the denominator of $\hat{b}$ above is
mentioned as variance of X. But, the definition of sample variance remains valid as defined in
Chapter I, that is, $\frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2$.
From Chapter 4, the above estimate can be expressed using $r_{XY}$, Pearson’s coefficient of
simple correlation between X and Y, as
$$\hat{b} = r_{XY}\,\frac{SD(Y)}{SD(X)}$$
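The estimates above can be sketched in a few lines of Python. This is only an illustration: the function name `fit_simple_linear` and the four data points are hypothetical, and the divisor n is used for both the covariance and the variance, exactly as in the formula for the slope.

```python
from statistics import mean

def fit_simple_linear(x, y):
    """Return (a_hat, b_hat) for the least-squares line y = a + b*x."""
    n = len(x)
    x_bar, y_bar = mean(x), mean(y)
    # Covariance and variance with divisor n, as in the formula for b-hat.
    cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
    var_x = sum(xi * xi for xi in x) / n - x_bar ** 2
    b_hat = cov_xy / var_x
    a_hat = y_bar - b_hat * x_bar
    return a_hat, b_hat

# Hypothetical data lying exactly on the line y = 1 + 2x.
a_hat, b_hat = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(a_hat, b_hat)
```

Because the illustrative points lie exactly on a line, the fitted intercept and slope reproduce that line.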
Example 5.1
Construct the simple linear regression equation of Y on X if n = 7, $\sum_{i=1}^{n} x_i = 113$, $\sum_{i=1}^{n} x_i^2 = 1983$,
$\sum_{i=1}^{n} y_i = 182$ and $\sum_{i=1}^{n} x_i y_i = 3186$.
Solution:
The simple linear regression equation of Y on X to be fitted for given data is of the form
$$\hat{Y} = a + bx \qquad (1)$$
The values of ‘a’ and ‘b’ have to be estimated from the sample data by solving the following
normal equations.
$$n a + b\sum_{i=1}^{n} x_i = \sum_{i=1}^{n} y_i \qquad (2)$$
$$a\sum_{i=1}^{n} x_i + b\sum_{i=1}^{n} x_i^2 = \sum_{i=1}^{n} x_i y_i \qquad (3)$$
Substituting the given sample information in (2) and (3), the above equations can be
expressed as
7 a + 113 b = 182 (4)
113 a + 1983 b = 3186 (5)
(4) × 113 ⇒ 791 a + 12769 b = 20566
(5) × 7 ⇒ 791 a + 13881 b = 22302
(−) (−) (−)
−1112 b = −1736
⇒ b = 1736/1112 = 1.56
Substituting this in (4) it follows that,
7 a + 113 × 1.56 = 182
7 a + 176.28 = 182
7 a = 182 – 176.28
= 5.72
Hence, a = 0.82
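As a cross-check, the two normal equations can be solved directly from the given sums. The sketch below is illustrative only; the helper name `solve_normal_equations` is an assumption, not part of the text. Because it does not round b before computing a, its value of a can differ from the hand working in the second decimal place.

```python
def solve_normal_equations(n, sum_x, sum_y, sum_x2, sum_xy):
    """Solve n*a + sum_x*b = sum_y and sum_x*a + sum_x2*b = sum_xy for (a, b)."""
    det = n * sum_x2 - sum_x ** 2          # determinant of the 2x2 system
    b = (n * sum_xy - sum_x * sum_y) / det
    a = (sum_y - b * sum_x) / n            # back-substitute into the first equation
    return a, b

# Sums given in Example 5.1.
a, b = solve_normal_equations(7, 113, 182, 1983, 3186)
print(round(a, 2), round(b, 2))
```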
Man-hours 3.6 4.8 7.2 6.9 10.7 6.1 7.9 9.5 5.4
Productivity (in units) 9.3 10.2 11.5 12 18.6 13.2 10.8 22.7 12.7
Solution:
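The simple linear regression equation to be fitted for the given data is
$$\hat{Y} = a + bx$$
Here, the estimates of a and b can be calculated using their least squares estimates
$$\hat{a} = \bar{y} - \hat{b}\,\bar{x}, \quad \text{i.e.,} \quad \hat{a} = \frac{1}{n}\sum_{i=1}^{n} y_i - \hat{b}\,\frac{1}{n}\sum_{i=1}^{n} x_i$$
$$\hat{b} = \frac{\frac{1}{n}\sum_{i=1}^{n} x_i y_i - (\bar{x} \times \bar{y})}{\frac{1}{n}\sum_{i=1}^{n} x_i^2 - \bar{x}^2}$$
or equivalently
$$\hat{b} = \frac{n\sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i \times \sum_{i=1}^{n} y_i}{n\sum_{i=1}^{n} x_i^2 - \left(\sum_{i=1}^{n} x_i\right)^2}$$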
From the given data, the following calculations are made with n=9
Man-hours (xi)   Productivity (yi)   xi²      xiyi
3.6              9.3                 12.96    33.48
4.8              10.2                23.04    48.96
7.2              11.5                51.84    82.8
6.9              12                  47.61    82.8
10.7             18.6                114.49   199.02
6.1              13.2                37.21    80.52
7.9              10.8                62.41    85.32
9.5              22.7                90.25    215.65
5.4              12.7                29.16    66.42
Total            ∑xi = 62.1   ∑yi = 121   ∑xi² = 468.97   ∑xiyi = 894.97
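$$\hat{b} = \frac{(9 \times 894.97) - (62.1 \times 121)}{(9 \times 468.97) - (62.1)^2} = \frac{8054.73 - 7514.10}{4220.73 - 3856.41} = \frac{540.63}{364.32}$$
Thus, $\hat{b} = 1.48$.
Now $\hat{a}$ can be calculated using $\hat{b}$ as
$$\hat{a} = \frac{121}{9} - 1.48 \times \frac{62.1}{9} = 13.44 - 10.21$$
Hence, $\hat{a} = 3.23$
Therefore, the required simple linear regression equation fitted to the given data is
$$\hat{Y} = 3.23 + 1.48x$$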
It should be noted that the value of Y can be estimated using the above fitted equation for
the values of x in its range i.e., 3.6 to 10.7.
In the estimated simple linear regression equation of Y on X
$$\hat{Y} = \hat{a} + \hat{b}x$$
we can substitute the estimate $\hat{a} = \bar{y} - \hat{b}\,\bar{x}$. Then, the regression equation becomes
$$\hat{Y} = \bar{y} - \hat{b}\,\bar{x} + \hat{b}x$$
$$\hat{Y} - \bar{y} = \hat{b}(x - \bar{x})$$
It shows that the simple linear regression equation of Y on X has the slope b̂ and the
corresponding straight line passes through the point of averages ( x , y ) . The above representation
of straight line is popularly known in the field of Coordinate Geometry as ‘Slope-Point form’. The
above form can be applied in fitting the regression equation for given regression coefficient b̂
and the averages x and y .
As mentioned in Section 5.3, there may be two simple linear regression equations for each
X and Y. Since the regression coefficients of these regression equations are different, it is essential
to distinguish the coefficients with different symbols. The regression coefficient of the simple
linear regression equation of Y on X may be denoted as bYX and the regression coefficient of the
simple linear regression equation of X on Y may be denoted as bXY.
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y}).$$
Also, the relationships between Karl Pearson’s coefficient of correlation and the
regression coefficients are
$$b_{XY} = r_{XY}\,\frac{SD(X)}{SD(Y)} \quad \text{and} \quad b_{YX} = r_{XY}\,\frac{SD(Y)}{SD(X)}.$$
2. It is clear from property 1 that both regression coefficients must have the same sign,
i.e., either both are positive or both are negative.
3. If one of the regression coefficients is greater than unity, the other must be less than unity.
4. The correlation coefficient will have the same sign as that of the regression coefficients.
5. Arithmetic mean of the regression coefficients is greater than or equal to the correlation coefficient:
$$\frac{b_{XY} + b_{YX}}{2} \geq r_{XY}$$
6. Regression coefficients are independent of the change of origin but not of scale.
3. Angle between the two regression lines is $\theta = \tan^{-1}\left(\frac{m_1 - m_2}{1 + m_1 m_2}\right)$, where $m_1$ and $m_2$ are the
slopes of regression lines X on Y and Y on X respectively.
4. The angle between the regression lines indicates the degree of dependence between the variables.
5. Regression equations intersect at $(\bar{x}, \bar{y})$.
x 12 14 15 14 18 17
y 42 40 45 47 39 45
Estimate the likely demand when the X = 25.
Solution:
xi      ui = xi – 15   ui²   yi     vi = yi – 43   vi²   uivi
12      -3             9     42     -1             1     3
14      -1             1     40     -3             9     3
15      0              0     45     2              4     0
14      -1             1     47     4              16    -4
18      3              9     39     -4             16    -12
17      2              4     45     2              4     4
Total   90    0        24    258    0              50    -6
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{90}{6} = 15, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{258}{6} = 43$$
$$\hat{a} = \bar{u} - \hat{b}_{UV}\,\bar{v} = 0$$
Hence, the regression line of U on V is $\hat{U} = \hat{b}_{UV}\,v + \hat{a} = -0.12\,v$
y = 40.5
Example 5.4
The following data gives the experience of machine operators and their performance
ratings as given by the number of good parts turned out per 50 pieces.
Operators 1 2 3 4 5 6 7 8
Experience (X) 8 11 7 10 12 5 4 6
Ratings (Y) 11 30 25 44 38 25 20 27
Obtain the regression equations and estimate the ratings corresponding to the experience
x=15.
Solution:
xi yi xiyi xi² yi²
8 11 88 64 121
11 30 330 121 900
7 25 175 49 625
10 44 440 100 1936
12 38 456 144 1444
5 25 125 25 625
4 20 80 16 400
6 27 162 36 729
Total 63 220 1856 555 6780
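Regression equation of Y on X,
$$\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$$
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{63}{8} = 7.875, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{220}{8} = 27.5$$
$$b_{YX} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum x_i^2 - \left(\sum x_i\right)^2} = \frac{8 \times 1856 - 63 \times 220}{8 \times 555 - 63 \times 63} = \frac{14848 - 13860}{4440 - 3969} = \frac{988}{471}$$
b_YX = 2.098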
$\hat{Y}$ – 27.5 = 2.098 (x – 7.875)
$\hat{Y}$ – 27.5 = 2.098 x – 16.52
$\hat{Y}$ = 2.098x + 10.98
When x = 15,
$\hat{Y}$ = 2.098 × 15 + 10.98
$\hat{Y}$ = 31.47 + 10.98
= 42.45
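Regression equation of X on Y,
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$$
$$b_{XY} = \frac{n\sum x_i y_i - \sum x_i \sum y_i}{n\sum y_i^2 - \left(\sum y_i\right)^2} = \frac{8 \times 1856 - 63 \times 220}{8 \times 6780 - 220 \times 220} = \frac{14848 - 13860}{54240 - 48400} = \frac{988}{5840}$$
b_XY = 0.169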
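Both regression coefficients for the operator data can be checked with one helper that takes the dependent variable first. The function name `regression_coefficient` is an illustrative assumption; the data are those of Example 5.4.

```python
experience = [8, 11, 7, 10, 12, 5, 4, 6]
ratings = [11, 30, 25, 44, 38, 25, 20, 27]

def regression_coefficient(dep, indep):
    """Regression coefficient of `dep` on `indep` (least squares slope)."""
    n = len(dep)
    s_dep, s_ind = sum(dep), sum(indep)
    s_cross = sum(d * i for d, i in zip(dep, indep))
    s_ind2 = sum(i * i for i in indep)
    return (n * s_cross - s_dep * s_ind) / (n * s_ind2 - s_ind ** 2)

b_yx = regression_coefficient(ratings, experience)   # Y on X
b_xy = regression_coefficient(experience, ratings)   # X on Y

x_bar = sum(experience) / len(experience)
y_bar = sum(ratings) / len(ratings)
# Slope-point form of the Y on X line, evaluated at x = 15.
rating_at_15 = y_bar + b_yx * (15 - x_bar)
print(round(b_yx, 3), round(b_xy, 3), round(rating_at_15, 2))
```

Note that x = 15 lies outside the observed experience values (4 to 12), so the estimate is an extrapolation.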
Example 5.5
The random sample of 5 school students is selected and their marks in statistics and
accountancy are found to be
Statistics 85 60 73 40 90
Accountancy 93 75 65 50 80
Solution:
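The two regression lines are:
Regression equation of Y on X,
$$\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$$
Regression equation of X on Y,
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$$
xi      yi      ui = xi – 60   vi = yi – 75   uivi    ui²     vi²
85      93      25             18             450     625     324
60      75      0              0              0       0       0
73      65      13             -10            -130    169     100
40      50      -20            -25            500     400     625
90      80      30             5              150     900     25
Total   348   363   48         -12            970     2094    1074
$$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{348}{5} = 69.6, \qquad \bar{y} = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{363}{5} = 72.6$$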
Since the mean values are decimals rather than integers and the numbers are large, we shift the
origins for x and y (u = x – 60, v = y – 75) and then solve the problem.
$$\hat{Y} - \bar{y} = b_{YX}(x - \bar{x})$$
Calculation of b_YX
$$b_{YX} = b_{VU} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum u_i^2 - \left(\sum u_i\right)^2} = \frac{5 \times 970 - 48 \times (-12)}{5 \times 2094 - (48)^2} = \frac{4850 + 576}{10470 - 2304} = \frac{5426}{8166} = 0.664$$
b_YX = b_VU = 0.664
$\hat{Y}$ – 72.6 = 0.664 (x – 69.6)
$\hat{Y}$ – 72.6 = 0.664x – 46.214
$\hat{Y}$ = 0.664x + 26.386
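Regression equation of X on Y,
$$\hat{X} - \bar{x} = b_{XY}(y - \bar{y})$$
Calculation of b_XY
$$b_{XY} = b_{UV} = \frac{n\sum u_i v_i - \sum u_i \sum v_i}{n\sum v_i^2 - \left(\sum v_i\right)^2} = \frac{5 \times 970 - 48 \times (-12)}{5 \times 1074 - (-12)^2} = \frac{4850 + 576}{5370 - 144} = \frac{5426}{5226}$$
b_XY = b_UV = 1.038
$\hat{X}$ – 69.6 = 1.038 (y – 72.6)
$\hat{X}$ – 69.6 = 1.038y – 75.359
$\hat{X}$ = 1.038y – 5.759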
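The working above relies on the fact that regression coefficients do not change when the origin is shifted. A small check on the marks data, with an illustrative helper `slope` (a hypothetical name), might look like this:

```python
statistics_marks = [85, 60, 73, 40, 90]    # x
accountancy_marks = [93, 75, 65, 50, 80]   # y

def slope(dep, indep):
    """Least squares regression coefficient of `dep` on `indep`."""
    n = len(dep)
    num = n * sum(d * i for d, i in zip(dep, indep)) - sum(dep) * sum(indep)
    den = n * sum(i * i for i in indep) - sum(indep) ** 2
    return num / den

# Shift the origins: u = x - 60, v = y - 75.
u = [x - 60 for x in statistics_marks]
v = [y - 75 for y in accountancy_marks]

b_yx_raw = slope(accountancy_marks, statistics_marks)
b_vu = slope(v, u)   # same coefficient computed from the shifted values
b_uv = slope(u, v)   # coefficient of X on Y from the shifted values
print(round(b_yx_raw, 3), round(b_vu, 3), round(b_uv, 3))
```

The raw and shifted computations give the same slope, which is why the smaller shifted numbers can be used in hand calculation.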
Solution:
The regression coefficient of Y on X is bYX = –1.5
The regression coefficient of X on Y is bXY = 0.6
The two regression coefficients have opposite signs, which is a contradiction. So the given
equations cannot be regression lines.
Example: 5.7
mean S.D
Yield of wheat (kg. unit area) 10 8
Annual Rainfall (inches) 8 2
Solution:
Let us denote the dependent variable yield by Y and the independent variable rainfall by X.
Regression equation of Y on X is given by
$$\hat{Y} - \bar{y} = r_{XY}\,\frac{SD(Y)}{SD(X)}\,(x - \bar{x})$$
Example 5.8
For 50 students of a class the regression equation of marks in Statistics (X) on marks in
Accountancy (Y) is 3Y – 5X + 180 = 0. The mean marks in Accountancy is 50 and the variance of
marks in Statistics is 16/25 of the variance of marks in Accountancy.
Solution:
We are given that:
n = 50, Regression equation of X on Y as 3Y – 5X + 180 = 0,
$\bar{y} = 50$, $V(X) = \frac{16}{25} V(Y)$, and V(Y) = 25.
We have to find (i) $\bar{x}$ and (ii) $r_{XY}$
(i) Calculation for $\bar{x}$
Since $(\bar{x}, \bar{y})$ is the point of intersection of the two regression lines, it lies on the regression
line 3Y – 5X + 180 = 0
Hence, $3\bar{y} - 5\bar{x} + 180 = 0$
$3(50) - 5\bar{x} + 180 = 0$
$5\bar{x} = 180 + 150 = 330$
$\bar{x} = \frac{330}{5} = 66$
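(ii) Calculation for coefficient of correlation.
3Y – 5X + 180 = 0
5X = 180 + 3Y
X = 36 + 0.6Y
b_XY = 0.6
$$0.6 = r_{XY}\,\frac{SD(X)}{SD(Y)} \quad \Rightarrow \quad r_{XY} = \frac{0.6 \times SD(Y)}{SD(X)}$$
$$r_{XY}^2 = 0.36 \times \frac{V(Y)}{V(X)} \qquad (1)$$
Given that:
V(Y) = 25
$$V(X) = \frac{16}{25}\,V(Y) = \frac{16}{25} \times 25 = 16$$
Substituting in (1),
$$r_{XY} = \sqrt{0.36 \times \frac{25}{16}} = 0.75$$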
Example 5.9
If two regression coefficients are $b_{YX} = \frac{5}{6}$ and $b_{XY} = \frac{9}{20}$, what would be the value of $r_{XY}$?
Solution:
The correlation coefficient is
$$r_{XY} = \pm\sqrt{b_{YX} \times b_{XY}} = \sqrt{\frac{5}{6} \times \frac{9}{20}} = \sqrt{0.375} = 0.6124$$
Since both b_YX and b_XY are positive, the correlation coefficient between X and Y is
positive.
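The rule that the correlation coefficient is the signed square root of the product of the two regression coefficients can be sketched as a small function. The name `correlation_from_regression` and its error messages are illustrative assumptions, not part of the text.

```python
import math

def correlation_from_regression(b_yx, b_xy):
    """Return r_XY from the two regression coefficients."""
    product = b_yx * b_xy
    if product < 0:
        # Coefficients of opposite sign cannot come from valid regression lines.
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        raise ValueError("product of regression coefficients cannot exceed 1")
    r = math.sqrt(product)
    # r carries the common sign of the two coefficients.
    return -r if b_yx < 0 else r

r = correlation_from_regression(5 / 6, 9 / 20)
print(round(r, 4))
```

With the coefficients 5/6 and 9/20 the product is 0.375, so r is its positive square root, about 0.612.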
Correlation versus Regression
1. Correlation: It indicates only the nature and extent of the linear relationship.
   Regression: It is the study of the impact of the independent variable on the dependent
   variable. It is used for predictions.
2. Correlation: If the linear correlation coefficient is positive / negative, then the two variables
   are positively / negatively correlated.
   Regression: If the regression coefficient is positive, then for every unit increase in x, the
   corresponding average increase in y is b_YX. Similarly, if the regression coefficient is negative,
   then for every unit increase in x, the corresponding average decrease in y is |b_YX|.
3. Correlation: One of the variables can be taken as x and the other one can be taken as the variable y.
   Regression: Care must be taken in the choice of independent variable and dependent variable.
   We cannot arbitrarily assign x as independent variable and y as dependent variable.
4. Correlation: It is symmetric in x and y, i.e., r_XY = r_YX.
   Regression: It is not symmetric in x and y, that is, b_XY and b_YX have different meanings and
   interpretations.
POINTS TO REMEMBER
There are several types of regression - simple linear regression, multiple linear
regression and non-linear regression.
In simple linear regression there are two linear regression lines, Y on X and X on Y.
In the linear regression line Y = a + bX + e, ‘X’ is the independent variable, ‘Y’ is the
dependent variable, ‘a’ is the intercept, ‘b’ is the slope of the line and ‘e’ is the error term.
Both regression lines pass through the point $(\bar{x}, \bar{y})$.
The “Method of least squares” gives the line of best fit.
Both the regression coefficients have the same sign, either positive or negative.
The sign of the regression coefficients and the sign of the correlation coefficient are the
same.
4. In regression equation X = a + by + e is
a) correlation coefficient of Y on X b) correlation coefficient of X on Y
c) regression coefficient of Y on X d) regression coefficient of X on Y
5. b_YX =
a) $r_{XY}\frac{SD(X)}{SD(Y)}$ b) $r_{XY}\frac{SD(Y)}{SD(X)}$ c) $\frac{SD(X)}{SD(Y)}$ d) $\frac{SD(Y)}{SD(X)}$
a) cov(X, Y) b) SD(X)
c) correlation coefficient d) coefficient of variance
10. Regression analysis helps in establishing a functional relationship between ______ variables.
a) 2 or more variables b) 2 variables
c) 3 variables d) none of these
13. If the two lines of regression are perpendicular to each other then rXY =
a) 0 b) 1 c) –1 d) 0.5
c) $\tan^{-1}\left(\frac{m_1 - m_2}{1 + m_1 m_2}\right)$ d) none of the above
a) $r_{XY}\frac{SD(Y)}{SD(X)}$ b) $r_{XY}\frac{SD(X)}{SD(Y)}$
c) $r_{XY}\,SD(X)\,SD(Y)$ d) $\frac{1}{b_{YX}}$
17. Regression equation of X on Y is
a) Y = a + bYX x + e b) Y = bXY x + a + e
c) X = a + bXY y + e d) X = bYX y + a + e
18. For the regression equation $2\hat{Y} = 0.605x + 351.58$, the regression coefficient of Y on X is
a) Y = 8 + 0.7 X b) X = 8 + 0.7 Y
c) Y = 0.7 + 8 X d) X = 0.7 + 8 Y
36. Given x = 90, y = 70, bXY = 1.36, bYX = 0.61 when y = 50, Find the most probable value of X.
37. Compute the two regression equations from the following data.
x 1 2 3 4 5
y 3 4 5 6 7
If x = 3.5 what will be the value of $\hat{Y}$?
43. The following table shows the age (X) and systolic blood pressure (Y) of 8 persons.
Age (X) 56 42 60 50 54 49 39 45
Blood pressure (Y) 160 130 125 135 145 115 140 120
Fit a simple linear regression model, Y on X and estimate the blood pressure of a person of
60 years.
44. Find the regression equation of X on Y given that n = 5, ∑x = 30, ∑y = 40, ∑xy = 214, ∑x² = 220,
∑y² = 340.
45. Given the following data, estimate the marks in statistics obtained by a student who has
scored 60 marks in English.
Mean of marks in Statistics = 80, Mean of marks in English = 50, S.D of marks in Statistics =
15, S.D of marks in English = 10 and Coefficient of correlation = 0.4.
46. Find the linear regression equation of percentage worms (Y) on size of the crop (X) based on
the following seven observations.
47. In a correlation analysis, between production (X) and price of a commodity (Y) we get the
following details.
Variance of X = 36.
The regression equations are:
12X – 15Y + 99 = 0 and 60 X – 27 Y =321
Calculate (a) The average value of X and Y.
(b) Coefficient of correlation between X and Y.
6 INDEX NUMBERS
LEARNING OBJECTIVES
Introduction
An index number is a technique of measuring changes in a variable or a group of variables
with respect to time, location or other characteristics. It is one of the most widely used statistical
methods. An index number is a specialized average designed to measure the change in a group
of related variables over a period of time. For example, the price of cotton in 2010 is studied
with reference to its price in 2000. It is used to feel the pulse of the economy and it reveals
the inflationary or deflationary tendencies. Index numbers are viewed as barometers of economic
activity: anyone who wants an idea of what is happening in an economy should
check important indicators such as the index number of agricultural production, the index number
of industrial production and the index number of business activity. There are several types of
index numbers and the students will learn them in this chapter.
6.1.1 Definition
An Index Number is defined as a relative measure to compare and describe the average
change in price, quantity or value of an item or a group of related items with respect to time,
geographic location or other characteristics.
In the words of Maslow “An index number is a numerical value characterizing the change in
complex economic phenomenon over a period of time or space”
Spiegel defines, “An index number is a statistical measure designed to show changes in
a variable or a group of related variables with respect to time, geographical location or other
characteristics”.
According to Croxton and Cowden “Index numbers are devices for measuring differences in
the magnitude of a group of related variables”.
Bowley describes “Index Numbers as a series which reflects in its trend and fluctuations the
movements of some quantity”.
6.1.2 Uses
The various uses of index numbers are:
Economic Parameters
Index numbers are one of the most useful devices to know the pulse of the economy.
They are used as indicators of inflationary or deflationary tendencies.
Measures Trends
Index numbers are widely used for measuring relative changes over successive periods of
time. This enables us to determine the general tendency. For example, changes in levels of prices,
population, production etc. over a period of time are analysed.
Useful in deflating
Price index numbers are used for correcting the original data for changes in prices. Price
indices are used to determine the purchasing power of a monetary unit.
NOTE
The points to be considered and precautions to be taken in constructing index numbers
are:
• determination of the purpose
• selection of the base period
• selection of commodities
• selection of price quotations
• selection of appropriate weight
• selection of an appropriate average
• selection of an appropriate formula
Methods of constructing
Index Numbers
Unweighted Weighted
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100$$
p1 = Current year prices for various commodities
p0 = Base year prices for various commodities
P01 = Price Index number
NOTE
The base period is the period against which comparison is made. Generally a year is taken as base
period. The base period should be free from economic and natural disturbances.
Limitations of the simple aggregative method
(i) Relative importance of the commodities is not taken into account.
(ii) Highly priced items influence the index number.
Example 6.1
Construct the Price Index Number for the year 1997, from the following information
taking 1996 as base year.
Commodities Price in 1996 (`) Price in 1997 (`)
Rice 130 115
Wheat 80 65
Sugar 75 70
Ragi 95 90
Oil 105 105
Dal 35 20
Total ∑p0 = 520 ∑p1 = 465
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{465}{520} \times 100 = 89.42$$
Example 6.2
Calculate Price Index Number for 2016 from the following data by simple aggregate
method, taking 2015 as base year.
Commodities    Price per kg in 2015    Price per kg in 2016
Apple          100                     140
Orange         30                      40
Pomegranate    120                     130
Guava          40                      50
Solution:
∑p0 = 290, ∑p1 = 360
$$P_{01} = \frac{\sum p_1}{\sum p_0} \times 100 = \frac{360}{290} \times 100 = 124.14$$
Example 6.3
Compute price index number by simple average of price relatives method using arithmetic
mean and geometric mean.
$$P_{01} = \text{antilog}\left(\frac{\sum \log p}{N}\right) = \text{antilog}\left(\frac{10.6515}{5}\right) = \text{antilog}(2.1303) = 134.9$$
Hence, the price index number based on arithmetic mean and geometric mean for the year 2002
are 137.34 and 134.9 respectively.
Example 6.4
Construct simple average price relative index number using arithmetic mean for the year 2012
for the following data showing the profit from various categories sold out in departmental stores.
Profit (per week) 2010 2012
Groceries 150600 170800
Cosmetics 70000 82000
Stationery items 12000 10800
Utensils 20000 18600
Solution: Index number using Arithmetic Mean of price relatives
Category           Profit in 2010 (p0)   Profit in 2012 (p1)   P = (p1/p0) × 100
Groceries          150600                170800                (170800/150600) × 100 = 113.41
Cosmetics          70000                 82000                 (82000/70000) × 100 = 117.14
Stationery items   12000                 10800                 (10800/12000) × 100 = 90.00
Utensils           20000                 18600                 (18600/20000) × 100 = 93.00
Total                                                          413.55
$$P_{01} = \frac{\sum P}{N} = \frac{413.55}{4} = 103.3875$$
P01 = 103.39
The average price relative index number using arithmetic mean for the year 2012 is 103.39
Example 6.5
Construct simple average price relative index number using geometric mean for the year
2015 for the data showing the expenditure in education of the children taking different courses.
Expenditure per year   2014     2015
B.Sc                   24000    26000
B.Com                  20000    22000
B.E                    108000   120000
M.B.B.S                150000   168000
Solution:
Expenditure   Year 2014 (p0)   Year 2015 (p1)   P = (p1/p0) × 100                 log P
B.Sc          24000            26000            (26000/24000) × 100 = 108.33      2.0346
B.Com         20000            22000            (22000/20000) × 100 = 110.00      2.0414
B.E           108000           120000           (120000/108000) × 100 = 111.11    2.0457
M.B.B.S       150000           168000           (168000/150000) × 100 = 112.00    2.0492
Total                                                                             ∑ log P = 8.1709
$$P_{01} = \text{antilog}\left(\frac{\sum \log P}{N}\right) = \text{antilog}\left(\frac{8.1709}{4}\right) = \text{antilog}(2.0427) = 110.4$$
The average price relative index number using geometric mean for the year 2015 is 110.4
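The two averages of price relatives can be sketched as follows, using the data of Examples 6.4 and 6.5. The function names are illustrative assumptions, and the B.E figure for 2015 is taken as 120000, the value used in the solution table. Logarithms are computed directly rather than read from tables, so the last digit may differ slightly from table-based working.

```python
import math

def price_relatives(base, current):
    """Price relative of each item: (p1 / p0) * 100."""
    return [c / b * 100 for b, c in zip(base, current)]

def arithmetic_mean_index(base, current):
    rel = price_relatives(base, current)
    return sum(rel) / len(rel)

def geometric_mean_index(base, current):
    rel = price_relatives(base, current)
    # antilog of the mean of log10(P)
    return 10 ** (sum(math.log10(p) for p in rel) / len(rel))

# Example 6.4: weekly profit in 2010 and 2012.
am_2012 = arithmetic_mean_index([150600, 70000, 12000, 20000],
                                [170800, 82000, 10800, 18600])
# Example 6.5: yearly expenditure in 2014 and 2015.
gm_2015 = geometric_mean_index([24000, 20000, 108000, 150000],
                               [26000, 22000, 120000, 168000])
print(round(am_2012, 2), round(gm_2015, 2))
```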
a. Laspeyre’s method
The base period quantities are taken as weights. The Index is
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100$$
b. Paasche’s method
The current year quantities are taken as weights. In this method, we use continuously
revised weights and thus this method is not frequently used when the number of commodities is
large. The Index is
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100$$
c. Dorbish and Bowley’s method
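In order to take into account the impact of both the base and current year, we make use of
the simple arithmetic mean of Laspeyre’s and Paasche’s formulae.
The Index is
$$P_{01}^{DB} = \frac{P_{01}^{L} + P_{01}^{P}}{2} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100$$
d. Fisher’s Ideal Index method
The Index is
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100$$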
e. Marshall-Edgeworth method
In this method also both the current year as well as base year prices and quantities are
considered.
The Index is
$$P_{01}^{ME} = \frac{\sum p_1 (q_0 + q_1)}{\sum p_0 (q_0 + q_1)} \times 100 = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100$$
f. Kelly’s method
The Kelly’s Index is
$$P_{01}^{K} = \frac{\sum p_1 q}{\sum p_0 q} \times 100, \qquad q = \frac{q_0 + q_1}{2}$$
where q refers to quantity of some period, not necessarily of the mean of the base year
and current year. It is also possible to use average quantity of two or more years as weights. This
method is known as fixed weight aggregative index.
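The weighted aggregate formulae share the same four sums of products, so they can be computed together. The following is a minimal sketch; the function name `weighted_indices`, the dictionary keys and the four-commodity data (the prices and quantities worked in Example 6.6) are chosen for illustration.

```python
import math

def weighted_indices(p0, q0, p1, q1):
    """Return the weighted aggregate price indices as a dictionary."""
    s10 = sum(p * q for p, q in zip(p1, q0))   # sum of p1*q0
    s00 = sum(p * q for p, q in zip(p0, q0))   # sum of p0*q0
    s11 = sum(p * q for p, q in zip(p1, q1))   # sum of p1*q1
    s01 = sum(p * q for p, q in zip(p0, q1))   # sum of p0*q1
    laspeyres = s10 / s00 * 100
    paasche = s11 / s01 * 100
    return {
        "laspeyres": laspeyres,
        "paasche": paasche,
        "dorbish_bowley": (laspeyres + paasche) / 2,
        "fisher": math.sqrt(laspeyres * paasche),
        "marshall_edgeworth": (s10 + s11) / (s00 + s01) * 100,
    }

indices = weighted_indices(p0=[2, 5, 4, 2], q0=[8, 10, 14, 19],
                           p1=[4, 6, 5, 2], q1=[6, 5, 10, 13])
for name, value in indices.items():
    print(name, round(value, 2))
```

Fisher's index is the geometric mean of Laspeyre's and Paasche's indices, so it always lies between them.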
Example 6.6
Construct weighted aggregate index numbers of price from the following data by applying
1. Laspeyre’s method
2. Paasche’s method
3. Dorbish and Bowley’s method
4. Fisher’s ideal method
5. Marshall-Edgeworth method
Solution:
Calculation of various indices
Commodity   Price 2016 (p0)   Quantity 2016 (q0)   Price 2017 (p1)   Quantity 2017 (q1)   p1q0   p0q0   p1q1   p0q1
A           2                 8                    4                 6                    32     16     24     12
B           5                 10                   6                 5                    60     50     30     25
C           4                 14                   5                 10                   70     56     50     40
D           2                 19                   2                 13                   38     38     26     26
Total                                                                                     ∑p1q0 = 200   ∑p0q0 = 160   ∑p1q1 = 130   ∑p0q1 = 103
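(1) Laspeyre’s Index
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{200}{160} \times 100 = 125$$
(2) Paasche’s Index
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{130}{103} \times 100 = 126.21$$
(3) Dorbish and Bowley’s Index
$$P_{01}^{DB} = \frac{P_{01}^{L} + P_{01}^{P}}{2} = \frac{125 + 126.21}{2} = 125.61$$
(4) Fisher’s Ideal Index
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{200}{160} \times \frac{130}{103}} \times 100 = 125.61$$
(5) Marshall-Edgeworth method
$$P_{01}^{ME} = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100 = \frac{200 + 130}{160 + 103} \times 100 = 125.48$$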
Example 6.7
Calculate the price indices from the following data by applying (1) Laspeyre’s method
(2) Paasche’s method and (3) Fisher ideal number by taking 2010 as the base year.
2010 2011
Commodity
Prices Quantities Prices Quantities
A 20 10 25 13
B 50 8 60 7
C 35 7 40 6
D 25 5 35 4
Solution: Calculations
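Commodity   p0   q0   p1   q1   p0q0   p0q1   p1q0   p1q1
A           20   10   25   13   200    260    250    325
B           50   8    60   7    400    350    480    420
C           35   7    40   6    245    210    280    240
D           25   5    35   4    125    100    175    140
Total                           970    920    1185   1125
(1) Laspeyre’s Index
$$P_{01}^{L} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{1185}{970} \times 100 = 122.16$$
(2) Paasche’s Index
$$P_{01}^{P} = \frac{\sum p_1 q_1}{\sum p_0 q_1} \times 100 = \frac{1125}{920} \times 100 = 122.28$$
(3) Fisher’s Ideal Index
$$P_{01}^{F} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} \times 100 = \sqrt{\frac{1185}{970} \times \frac{1125}{920}} \times 100$$
$$= \sqrt{1.49387} \times 100 = 1.2222 \times 100 = 122.22$$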
Example 6.8
Calculate the Dorbish and Bowley’s price index number for the following data taking 2014
as base year.
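Items     Price per kg (2014)   Quantity purchased (2014)   Price per kg (2015)   Quantity purchased (2015)
Oil       80                    3                           100                   4
Pulses    35                    2                           45                    3
Sugar     25                    2                           30                    3
Rice      50                    30                          54                    35
Cereals   35                    2                           40                    3
Here ∑p1q0 = 2150, ∑p0q0 = 1930, ∑p1q1 = 2635 and ∑p0q1 = 2355.
$$P_{01}^{DB} = \frac{1}{2}\left[\frac{\sum p_1 q_0}{\sum p_0 q_0} + \frac{\sum p_1 q_1}{\sum p_0 q_1}\right] \times 100 = \frac{1}{2}\left[\frac{2150}{1930} + \frac{2635}{2355}\right] \times 100$$
$$= \frac{1}{2}\left[1.1140 + 1.1189\right] \times 100 = \frac{1}{2}\left[2.2329\right] \times 100$$
= 1.1164 × 100 = 111.64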
Example 6.9
Compute Marshall – Edgeworth price index number for the following data by taking 2016
as base year.
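Items sold in a men’s wear shop   Price (2016)   Quantity (2016)   Price (2017)   Quantity (2017)
Shirts                            700            150               900            175
Pants                             1000           100               1200           150
Sandals                           500            70                600            100
Shoes                             1500           50                1800           60
Belts                             400            100               600            150
Watches                           1200           300               1500           250
From the table, ∑p1q0 = 897000, ∑p1q1 = 970500, ∑p0q0 = 715000 and ∑p0q1 = 772500.
$$P_{01}^{ME} = \frac{\sum p_1 q_0 + \sum p_1 q_1}{\sum p_0 q_0 + \sum p_0 q_1} \times 100 = \frac{897000 + 970500}{715000 + 772500} \times 100 = \frac{1867500}{1487500} \times 100$$
= 125.55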
Example 6.10
Calculate a suitable price index from the following data.
Commodity   Quantity   Price in 2007   Price in 2010
X           25         3               4
Y           12         5               7
Z           10         6               5
Solution:
In this problem, the quantities for both the current year and the base year are the same. Hence, we can
use Kelly’s price index number.
Commodity q p0 p1 p0q p1q
X 25 3 4 75 100
Y 12 5 7 60 84
Z 10 6 5 60 50
195 234
Kelly’s price Index number:
$$P_{01}^{K} = \frac{\sum p_1 q}{\sum p_0 q} \times 100 = \frac{234}{195} \times 100 = 120$$
Example 6.11
Compute price index for the following data by applying weighted average of price relatives
method using (i) Arithmetic mean and (ii) Geometric mean.
Item p0 q0 p1
Wheat 3.0 20 kg 4.0
Flour 1.5 40 kg 1.6
Milk 1.0 10 kg 1.5
Solution:
(i) Computation for the weighted average of price relatives using arithmetic mean.
$$P_{01} = \frac{\sum wP}{\sum w} = \frac{15900}{130} = 122.31$$
This means that there has been a 22.31 % increase in prices over the base year.
(ii) Index number using geometric mean of price relatives is:
$$P_{01} = \text{Antilog}\left(\frac{\sum w \log P}{\sum w}\right) = \text{Antilog}\left(\frac{270.947}{130}\right) = \text{Antilog}(2.0842) = 121.3$$
This means that there has been a 21.3 % increase in prices over the base year.
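The weighted average of price relatives can be sketched as below. The weights are taken as w = p0 × q0, an assumption inferred from ∑w = 130 in the working; the variable names are illustrative. Since logarithms are computed directly, the geometric-mean figure may differ from table-based working in the first decimal place.

```python
import math

p0 = [3.0, 1.5, 1.0]   # base year prices: Wheat, Flour, Milk
q0 = [20, 40, 10]      # base year quantities (kg)
p1 = [4.0, 1.6, 1.5]   # current year prices

weights = [p * q for p, q in zip(p0, q0)]          # w = p0 * q0 (assumed weights)
relatives = [c / b * 100 for b, c in zip(p0, p1)]  # P = (p1 / p0) * 100

# Weighted arithmetic mean of price relatives.
am_index = sum(w * p for w, p in zip(weights, relatives)) / sum(weights)
# Weighted geometric mean: antilog of the weighted mean of log10(P).
gm_index = 10 ** (sum(w * math.log10(p) for w, p in zip(weights, relatives))
                  / sum(weights))
print(round(am_index, 2), round(gm_index, 2))
```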
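$$Q_{01}^{L} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100$$
$$Q_{01}^{F} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100$$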
These formulae represent the quantity index in which quantities of the different
commodities are weighted by their prices.
Example 6.12
Compute the following quantity indices from the data given below:
(i) Laspeyre’s quantity index (ii) Paasche’s quantity index and (iii) Fisher’s quantity index
Commodity   Price (1970)   Total value (1970)   Price (1980)   Total value (1980)
A 10 80 11 110
B 15 90 9 108
C 8 96 17 340
Solution:
Since we are given the value and the prices, the quantity figures can be obtained by dividing
the value by the price for each of the commodities.
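Commodity   p0   q0   p1   q1   p0q0   p1q0   p0q1   p1q1
A           10   8    11   10   80     88     100    110
B           15   6    9    12   90     54     180    108
C           8    12   17   20   96     204    160    340
Total                           266    346    440    558
(i) Laspeyre’s quantity index
$$Q_{01}^{L} = \frac{\sum q_1 p_0}{\sum q_0 p_0} \times 100 = \frac{440}{266} \times 100 = 165.41$$
(ii) Paasche’s quantity index
$$Q_{01}^{P} = \frac{\sum q_1 p_1}{\sum q_0 p_1} \times 100 = \frac{558}{346} \times 100 = 161.27$$
(iii) Fisher’s quantity index
$$Q_{01}^{F} = \sqrt{Q_{01}^{L} \times Q_{01}^{P}} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} \times 100 = \sqrt{\frac{440}{266} \times \frac{558}{346}} \times 100$$
= 1.6333 × 100
= 163.33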
$$P_{01} \times P_{10} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = 1$$
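$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$
Now,
$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$
$$Q_{01} = \sqrt{\frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}}$$
Hence,
$$P_{01} \times Q_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum p_0 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_1 q_0}}$$
$$P_{01} \times Q_{01} = \sqrt{\frac{\sum p_1 q_1}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_0}}$$
$$P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$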
Circular Test
It is an extension of the time reversal test. The time reversal test takes into account only two
years, the current and base years. The circular test requires this property to hold good for
any two years. An index number is said to satisfy the circular test when the three indices
P01, P12 and P20 satisfy P01 × P12 × P20 = 1.
Laspeyre’s, Paasche’s and Fisher’s ideal index numbers do not satisfy this test.
Solution:
Index number by Fisher’s ideal index method
Commodity p0 q0 p1 q1 p0q0 p0q1 p1q0 p1q1
A 4 40 5 60 160 240 200 300
B 5 50 10 70 250 350 500 700
C 8 65 12 80 520 640 780 960
D 6 20 6 90 120 540 120 540
E 7 30 10 75 210 525 300 750
1260 2295 1900 3250
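$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}} = \sqrt{\frac{1900}{1260} \times \frac{3250}{2295}}$$
$$P_{10} = \sqrt{\frac{\sum p_0 q_1}{\sum p_1 q_1} \times \frac{\sum p_0 q_0}{\sum p_1 q_0}} = \sqrt{\frac{2295}{3250} \times \frac{1260}{1900}}$$
Hence,
$$P_{01} \times P_{10} = \sqrt{\frac{1900}{1260} \times \frac{3250}{2295} \times \frac{2295}{3250} \times \frac{1260}{1900}} = \sqrt{1} = 1$$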
Solution
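Factor Reversal test: $P_{01} \times Q_{01} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$
$$P_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1}}$$
$$Q_{01} = \sqrt{\frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}}$$
$$P_{01} \times Q_{01} = \sqrt{\frac{\sum p_1 q_0}{\sum p_0 q_0} \times \frac{\sum p_1 q_1}{\sum p_0 q_1} \times \frac{\sum q_1 p_0}{\sum q_0 p_0} \times \frac{\sum q_1 p_1}{\sum q_0 p_1}} = \frac{\sum p_1 q_1}{\sum p_0 q_0}$$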
Hence, Fisher ideal index number satisfies the factor reversal test
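Both reversal tests can be verified numerically for Fisher's index. The sketch below uses the five-commodity data tabulated earlier in this section (commodities A to E); the helper names `total` and `fisher` are illustrative, and the index is kept as a ratio (not multiplied by 100) so that the tests read exactly as stated.

```python
import math

p0 = [4, 5, 8, 6, 7]
q0 = [40, 50, 65, 20, 30]
p1 = [5, 10, 12, 6, 10]
q1 = [60, 70, 80, 90, 75]

def total(a, b):
    """Sum of products of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))

def fisher(p_base, q_base, p_cur, q_cur):
    """Fisher's ideal index as a ratio (not multiplied by 100)."""
    laspeyres = total(p_cur, q_base) / total(p_base, q_base)
    paasche = total(p_cur, q_cur) / total(p_base, q_cur)
    return math.sqrt(laspeyres * paasche)

p01 = fisher(p0, q0, p1, q1)
p10 = fisher(p1, q1, p0, q0)   # base and current periods interchanged
q01 = fisher(q0, p0, q1, p1)   # roles of price and quantity interchanged
value_ratio = total(p1, q1) / total(p0, q0)

print(math.isclose(p01 * p10, 1.0))          # time reversal test
print(math.isclose(p01 * q01, value_ratio))  # factor reversal test
```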
Note: Consumer price index numbers are also called cost of living index numbers.
Solution
(i) Calculation of cost of living index number on the basis of Aggregate expenditure
method.
$$P_{01} = \frac{\sum p_1 q_0}{\sum p_0 q_0} \times 100 = \frac{1152}{940} \times 100 \approx 122.55$$
(iii) Calculation of consumer price index number according to family budget method or
weighted relative method
Commodity   q0   p0    p1    P = (p1/p0) × 100   w = p0q0   wP
Wheat       20   15    20    400/3               300        40000
Rice        8    20    24    120                 160        19200
Sugar       2    160   200   125                 320        40000
Ghee        4    40    40    100                 160        16000
Total                                            940        115200
Consumer price index number for 2015
$$= \frac{\sum wP}{\sum w} = \frac{115200}{940} \approx 122.55$$
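The two methods can be compared with a short script on the four-commodity data above. The variable names are illustrative. With weights w = p0 × q0, the family budget method reduces algebraically to the aggregate expenditure method, so the two figures agree.

```python
q0 = [20, 8, 2, 4]        # base year quantities: Wheat, Rice, Sugar, Ghee
p0 = [15, 20, 160, 40]    # base year prices
p1 = [20, 24, 200, 40]    # current year prices

# Aggregate expenditure method: (sum p1*q0 / sum p0*q0) * 100.
aggregate_expenditure_index = (
    sum(p * q for p, q in zip(p1, q0)) / sum(p * q for p, q in zip(p0, q0)) * 100
)

# Family budget method: weighted mean of price relatives with w = p0*q0.
weights = [p * q for p, q in zip(p0, q0)]
relatives = [c / b * 100 for b, c in zip(p0, p1)]
family_budget_index = (
    sum(w * r for w, r in zip(weights, relatives)) / sum(weights)
)

print(round(aggregate_expenditure_index, 2), round(family_budget_index, 2))
```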
POINTS TO REMEMBER
EXERCISE 6
2010 2011
Commodity
Price (`) Quantity Price (`) Quantity
A 15 15 22 12
B 20 5 27 4
C 4 10 7 5
What change in the cost of living figures of 2015 has taken place as compared to 2014?
35. Construct the cost of living index of 2014 using family budget method.
Expenses % base year (2000) year 2004
Food 40 150 174
Rent 15 50 60
Clothing 15 100 125
Fuel 10 20 25
Misc 20 60 90
36. Construct the index of 2014 from the following data for the year 2012 taking 2011 as base
year as base using i) arithmetic mean and ii) geometric mean.
Item Price (`) in 2014 Price (`) in 2015
A 6 10
B 2 2
C 4 6
D 10 12
E 8 12
37. Compute price index for the following data by applying weighted average of price relative
method using i) arithmetic mean and ii) geometric mean.
Item Price (`) in 2006 Price (`) in 2007 Quantity in 1996
A 2 2.5 40
B 3 3.25 20
C 1.5 1.75 10
Both Box and Jenkins contributed to Auto regressive moving average models popularly known as
Box-Jenkins Models.
LEARNING OBJECTIVES
The students will be able to
understand the concept of time series
know the upward and downward trends
calculate the trend values using semi - average and moving average methods
estimate the trend values using method of least squares
compute seasonal indices
understand cyclical and irregular variations
understand the forecasting concept
Introduction
In modern times we see data all around. The urge to evaluate the past and to peep into
the future has created the need for forecasting. There are many factors which change with the
passage of time. Sets of observations which vary with the passage of time and whose
measurements are made at equidistant points may be regarded as time series data. Statistical data
which are collected, observed or recorded at successive intervals of time constitute time series
data. In the study of time series, comparison of the past and the present data is made. Two or
more series may also be compared at a time. The purpose of time series analysis is to measure
chronological variations in the observed data.
7.1 DEFINITION
Time series refers to any group of statistical information collected at regular intervals of
time. Time series analysis is used to detect the changes in patterns in these collected data.
1. Secular trend
2. Seasonal variation
3. Cyclical variation
4. Irregular (random) variation
Y=T+C+S+R
where Y = magnitude of a time series
T = Trend,
C =Cyclical component,
S =Seasonal component, and
R = Random component
In additive approach, the unit of measurements remains the same for all the four components.
       Multiplicative                                         Additive
(i)    Four components of time series are interdependent     Four components of time series are independent
(ii)   Logarithms of components are additive                  Components are additive
Merits
• It is simple method of estimating trend.
Demerits
• It is a subjective method
• The values of trend obtained by different statisticians would be different and hence not reliable.
Example 7.1
Annual power consumption per household in a certain locality was reported below.
Years 1991 1992 1993 1994 1995 1996 1997 1998 1999 2000
Power used (units) 15 20 21 25 28 26 30 32 40 38
Merits
• This method is very simple and easy to understand.
• It does not require many calculations.
NOTE
In the semi-average method, if the difference between the semi-averages is negative, then the
trend values will be in decreasing order.
Demerits
• This method is used only when the trend is linear.
• It is based on averages, which are affected by extreme values.
Example 7.2
Calculate the trend values using semi-averages methods for the income from the forest
department. Find the yearly increase.
Example 7.3
Population of India for 7 successive census years are given below. Find the trend values
using semi-averages method.
Solution:
Trend values using semi average method
Census Year   Population (in lakhs)   3-year semi-total   3-year semi-average   Trend values
1951          301.2                                                              278.86
1961          336.9                   1050.1              350.03                 350.03
1971          412.0                                                              421.2
1981          484.1                                                              492.37
1991          558.6                                                              563.54
2001          624.1                   1904.1              634.7                  634.71
2011          721.4                                                              705.88
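The semi-average calculation above can be sketched in Python as follows. This is a minimal illustration, not the textbook's procedure verbatim: the function name `semi_average_trend` is hypothetical, and because the code keeps full precision the results can differ from the table in the second decimal place (the table uses the rounded increase of 71.17 per decade).

```python
def semi_average_trend(values):
    """Trend values by the semi-average method.

    The series is split into two equal halves (the middle value is
    left out when the number of observations is odd). Each half's
    average is placed at the centre of that half, and the straight
    line through the two points gives the trend.
    """
    n = len(values)
    half = n // 2
    avg1 = sum(values[:half]) / half
    avg2 = sum(values[n - half:]) / half
    c1 = (half - 1) / 2                # centre position of first half
    c2 = (n - half) + (half - 1) / 2   # centre position of second half
    step = (avg2 - avg1) / (c2 - c1)   # increase per time period
    return [avg1 + (i - c1) * step for i in range(n)]

years = [1951, 1961, 1971, 1981, 1991, 2001, 2011]
population = [301.2, 336.9, 412.0, 484.1, 558.6, 624.1, 721.4]
trend = semi_average_trend(population)
for year, t in zip(years, trend):
    print(year, round(t, 2))
```

The same function also handles an even number of observations, as in Example 7.4, where the two centres fall midway between years.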
Example 7.4
Find the trend values by semi-average method for the following data.
Solution:
Trend values using semi averages method
Year   Production of bleaching powder   4 year semi-total   4 year semi-average   Trend
1965   7.4                                                                         7.315
1966   10.8                                                                        8.755
                                        37.9                9.475
1967   9.2                                                                         10.195
1968   10.5                                                                        11.635
1969   15.5                                                                        13.075
1970   13.7                                                                        14.515
                                        60.9                15.225
1971   16.7                                                                        15.955
1972   15                                                                          17.395
Merits
• It can be easily applied
• It is useful in case of series with periodic fluctuations.
• It does not show different results when used by different persons
• It can be used to find the figures on either extremes; that is, for the past and future years.
Demerits
• In non-periodic data this method is less effective.
• Selection of proper ‘period’ or ‘time interval’ for computing moving average is difficult.
• Values for the first few years as well as for the last few years cannot be found.
Example 7.5
Calculate the 3-year moving averages for the loans issued by co-operative banks for non-
farm sector/small scale industries based on the values given below.
Year      Loan by District Central Cooperative banks (Rupees in crores)
2004-05   41.82
2005-06   40.05
2006-07   39.12
2007-08   24.72
2008-09   26.69
2009-10   59.66
2010-11   23.65
2011-12   28.36
2012-13   33.31
2013-14   31.60
2014-15   36.48
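As an illustration, the 3-year moving averages for this data can be computed with the short Python sketch below. The helper name `moving_average` is hypothetical. Each average is placed against the middle year of its 3-year window, so no values are obtained for the first and the last year.

```python
def moving_average(values, period):
    """Simple moving averages of the given (odd) period."""
    return [
        sum(values[i:i + period]) / period
        for i in range(len(values) - period + 1)
    ]

years = ["2004-05", "2005-06", "2006-07", "2007-08", "2008-09", "2009-10",
         "2010-11", "2011-12", "2012-13", "2013-14", "2014-15"]
loans = [41.82, 40.05, 39.12, 24.72, 26.69, 59.66,
         23.65, 28.36, 33.31, 31.60, 36.48]

ma3 = moving_average(loans, 3)
# The first average belongs to the second year, and so on.
for year, avg in zip(years[1:-1], ma3):
    print(year, round(avg, 2))
```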
Example 7.6
Compute the trends by the method of moving averages, assuming that 4-year cycle is
present in the following series.
Year 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008
Annual value 154.0 140.5 147.0 148.5 142.9 142.1 136.6 142.7 145.7 145.1 137.8
Year   Annual value   4-year moving total   Centered total   4-year moving average
1998   154.0                                -                -
1999   140.5                                -                -
                      590.0
2000   147.0                                1168.9           146.11
                      578.9
2001   148.5                                1159.4           144.93
                      580.5
2002   142.9                                1150.6           143.83
                      570.1
2003   142.1                                1134.4           141.8
                      564.3
2004   136.6                                1131.4           141.43
                      567.1
2005   142.7                                1137.2           142.15
                      570.1
2006   145.7                                1141.4           142.68
                      571.3
2007   145.1                                -                -
2008   137.8                                -                -
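The centring step in the table above can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name. For an even period, two adjacent moving totals are added and divided by twice the period, so that each average falls against a year rather than between two years.

```python
def centered_moving_average(values, period):
    """Centered moving averages for an even period (e.g. 4 years)."""
    totals = [
        sum(values[i:i + period])
        for i in range(len(values) - period + 1)
    ]
    # Add adjacent moving totals, then divide by 2 * period.
    centered_totals = [totals[i] + totals[i + 1] for i in range(len(totals) - 1)]
    return [c / (2 * period) for c in centered_totals]

years = list(range(1998, 2009))
annual = [154.0, 140.5, 147.0, 148.5, 142.9, 142.1,
          136.6, 142.7, 145.7, 145.1, 137.8]

cma = centered_moving_average(annual, 4)
# The first centered average belongs to the third year (2000).
for year, avg in zip(years[2:], cma):
    print(year, round(avg, 2))
```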
$$\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i \qquad (7.1)$$
$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2 \qquad (7.2)$$
Solving these two equations, we get the values of a and b and hence the trend equation
(line of best fit):
y = a + bx    (7.3)
Substituting the observed values xi in (7.3), we get the trend values yi, i = 1, 2, …, n.
Note: The time unit is usually of uniform duration and occurs in consecutive numbers. Thus,
when the middle period is taken as the point of origin, it reduces the sum of the time variable x
to zero, that is, $\sum_{i=1}^{n} x_i = 0$, and hence, by simplifying (7.1) and (7.2), we get
$$a = \frac{\sum_{i=1}^{n} y_i}{n} \quad \text{and} \quad b = \frac{\sum_{i=1}^{n} x_i y_i}{\sum_{i=1}^{n} x_i^2}$$
The number of time units may be odd or even. The procedure for calculating trend values by
the method of least squares differs slightly between the two cases.
Merits
• The method of least squares completely eliminates personal bias.
• Trend values for all the given time periods can be obtained
• This method enables us to forecast future values.
Demerits
• The calculations for this method are difficult compared to the other methods.
• Addition of new observations requires recalculations.
• It ignores cyclical, seasonal and irregular fluctuations.
• The trend can be estimated only for immediate future and not for distant future.
iii) Find ui = xi – A
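$$\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} u_i$$
$$\sum_{i=1}^{n} u_i y_i = a \sum_{i=1}^{n} u_i + b \sum_{i=1}^{n} u_i^2$$
Find
$$a = \frac{\sum_{i=1}^{n} y_i}{n} \quad \text{and} \quad b = \frac{\sum_{i=1}^{n} u_i y_i}{\sum_{i=1}^{n} u_i^2}$$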
Example 7.7
Fit a straight line trend by the method of least squares for the following consumer price
index numbers of the industrial workers.
Solution:
Year   Index Number (yi)   X = xi – 2010   ui = X – A = X – 2   ui2   ui yi   Trend
2010   166                 0               –2                   4     –332    165
2011   177                 1               –1                   1     –177    181.2
2012   198                 2               0                    0     0       197.4
2013   221                 3               1                    1     221     213.6
2014   225                 4               2                    4     450     229.8
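$$\sum_{i=1}^{5} y_i = 987, \quad \sum_{i=1}^{5} u_i = 0, \quad \sum_{i=1}^{5} u_i^2 = 10, \quad \sum_{i=1}^{5} u_i y_i = 162$$
$$a = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{987}{5} = 197.4$$
$$b = \frac{\sum_{i=1}^{n} u_i y_i}{\sum_{i=1}^{n} u_i^2} = \frac{162}{10} = 16.2$$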
y = 197.4 + 16.2 (X – 2)
= 197.4 + 16.2 X – 32.4
= 16.2 X + 165
That is, y = 165 + 16.2X
To get the required trend values, put X = 0, 1, 2, 3, 4 in the estimated equation.
X = 0, y = 165 + 0 = 165
X = 1, y = 165 + 16.2 = 181.2
X = 2, y = 165 + 32.4 = 197.4
X = 3, y = 165 + 48.6 = 213.6
X = 4, y = 165 + 64.8 = 229.8
Hence, the trend values for 2010, 2011, 2012, 2013 and 2014 are 165, 181.2, 197.4, 213.6 and 229.8
respectively.
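The working of Example 7.7 can be reproduced with the Python sketch below. It is an illustration only, for an odd number of years with the middle year as origin; the function name `fit_linear_trend` is hypothetical.

```python
def fit_linear_trend(y):
    """Least squares straight line with the middle period as origin.

    Intended for an odd number of observations, so that the coded
    time values u are ..., -2, -1, 0, 1, 2, ... and sum(u) = 0.
    """
    n = len(y)
    u = [i - (n - 1) / 2 for i in range(n)]
    a = sum(y) / n
    b = sum(ui * yi for ui, yi in zip(u, y)) / sum(ui * ui for ui in u)
    trend = [a + b * ui for ui in u]
    return a, b, trend

years = [2010, 2011, 2012, 2013, 2014]
index_numbers = [166, 177, 198, 221, 225]

a, b, trend = fit_linear_trend(index_numbers)
print("a =", round(a, 2), "b =", round(b, 2))
for year, t in zip(years, trend):
    print(year, round(t, 1))
```

Because sum(u) = 0, the two normal equations separate, which is why a and b can be found without solving simultaneous equations.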
ii). Find ui = 2X – (n – 1)
Then follow the same procedure used in previous method for odd years
Example 7.8
Tourist arrivals (Foreigners) in Tamil Nadu for 6 consecutive years are given in the following
table. Calculate the trend values by using the method of least squares.
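$$\sum_{i=1}^{6} y_i = 115, \quad \sum_{i=1}^{6} u_i = 0, \quad \sum_{i=1}^{6} u_i^2 = 70, \quad \sum_{i=1}^{6} u_i y_i = 115$$
$$a = \frac{\sum_{i=1}^{n} y_i}{n} = \frac{115}{6} = 19.17$$
$$b = \frac{\sum_{i=1}^{n} u_i y_i}{\sum_{i=1}^{n} u_i^2} = \frac{115}{70} = 1.64$$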
y = a + bu
= 19.17 + 1.64 (2X – 5)
= 19.17 + 3.28X – 8.2
= 3.28X + 10.97
That is, y = 10.97 + 3.28X
To get the required trend values, put X = 0, 1, 2, 3, 4, 5 in the estimated equation. Thus,
X = 0, y = 10.97 + 0 = 10.97
X = 1, y = 10.97 + 3.28 = 14.25
X = 2, y = 10.97 + 6.56 = 17.53
X = 3, y = 10.97 + 9.84 = 20.81
X = 4, y = 10.97 + 13.12 = 24.09
X = 5, y = 10.97 + 16.4 = 27.37
Hence, the trend values for 2005, 2006, 2007, 2008, 2009 and 2010 are 10.97, 14.25, 17.53, 20.81,
24.09 and 27.37 respectively.
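The coding ui = 2X – (n – 1) for an even number of years can be illustrated with the Python sketch below. Only the sums quoted in the solution (Σyi = 115 and Σuiyi = 115) are used, since the data table itself is not reproduced here; the variable names are hypothetical. The code keeps full precision, so the intercept and slope differ slightly from the rounded values 10.97 and 3.28 above.

```python
n = 6                                  # number of years (even)
X = list(range(n))                     # X = 0, 1, ..., 5
u = [2 * x - (n - 1) for x in X]       # -5, -3, -1, 1, 3, 5

sum_y = 115                            # from the solution above
sum_uy = 115                           # from the solution above
sum_u2 = sum(ui * ui for ui in u)      # should be 70

a = sum_y / n
b = sum_uy / sum_u2

# Rewrite y = a + b(2X - (n - 1)) in terms of X.
slope_X = 2 * b
intercept_X = a - b * (n - 1)

print(u, sum(u), sum_u2)
print(round(intercept_X, 2), round(slope_X, 2))
```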
Example 7.9
Calculate the seasonal indices for the rain fall (in mm) data in Tamil Nadu given below by
simple average method
Year   Season I   Season II   Season III   Season IV
2001   118.4      260.0       379.4        70
2002   85.8       185.4       407.1        8.7
2003   129.8      336.5       403.1        12.0
2004   283.4      360.7       472.1        14.3
2005   231.7      308.5       828.8        15.9
Year               Season I   Season II   Season III   Season IV
2001               118.4      260.0       379.4        70
2002               85.8       185.4       407.1        8.7
2003               129.8      336.5       403.1        12.0
2004               283.4      360.7       472.1        14.3
2005               231.7      308.5       828.8        15.9
Seasonal total     849.1      1451.1      2490.5       120.9
Seasonal average   169.82     290.22      498.1        24.18
Seasonal index     69         118         203          10
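The simple average method used in the table can be sketched in Python as follows. This is an illustration with a hypothetical function name: each seasonal average is expressed as a percentage of the grand average of the seasonal averages.

```python
def seasonal_indices(rows):
    """Seasonal indices by the simple average method.

    rows: one list of seasonal values per year.
    """
    seasons = len(rows[0])
    averages = [
        sum(row[j] for row in rows) / len(rows) for j in range(seasons)
    ]
    grand_average = sum(averages) / seasons
    return [avg / grand_average * 100 for avg in averages]

rainfall = [
    [118.4, 260.0, 379.4, 70.0],   # 2001
    [85.8, 185.4, 407.1, 8.7],     # 2002
    [129.8, 336.5, 403.1, 12.0],   # 2003
    [283.4, 360.7, 472.1, 14.3],   # 2004
    [231.7, 308.5, 828.8, 15.9],   # 2005
]

indices = seasonal_indices(rainfall)
print([round(x) for x in indices])
```

The indices add up to 400 for four seasons, which serves as a quick check on the arithmetic.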
Year               Quarter I   Quarter II   Quarter III   Quarter IV
2009               38.2        166.8        612.6         72.2
2010               38.5        250.9        773.1         153.1
2011               55          277.7        717.8         65.8
2012               50.5        197          706.1         101.1
Seasonal total     182.2       892.4        2809.6        392.2
Seasonal average   45.55       223.1        702.4         98.05
Seasonal index     17          83           263           37
The cyclical pattern of a time series reflects the prosperity and recession, ups and downs,
booms and depressions of a business. In most businesses there is an upward trend for some time,
followed by a downfall that touches its lowest level. A rise then starts again and reaches its peak.
This process of prosperity and recession continues and may be considered a natural phenomenon.
7.4 FORECASTING
The importance of statistics lies in the extent to which it serves as the basis for making
reliable forecasts, against arbitrary forecast with no statistical background.
POINTS TO REMEMBER
$$\sum_{i=1}^{n} y_i = na + b \sum_{i=1}^{n} x_i$$
$$\sum_{i=1}^{n} x_i y_i = a \sum_{i=1}^{n} x_i + b \sum_{i=1}^{n} x_i^2$$
4. Data on annual turnover of a company over a period of ten years can be represented by a
(a) a time series (b) an index number
(c) a parameter (d) a statistic
9. The terms prosperity, recession, depression and recovery are in particular attached to
(a) secular trend (b) seasonal fluctuation
(c) cyclical movements (d) irregular variation
34. With what characteristic component of a time series should each of the following be associated.
(i) An upturn in business activity
(ii) Fire loss in a factory
(iii) General increase in the sale of Television sets.
35. The number of units of a product exported during 1990-97 is given below. Draw the trend
line using graphical method.
Year 1990 1991 1992 1993 1994 1995 1996 1997
No. of units exported (in ‘000) 12 13 13 16 19 23 21 23
36. Draw a time series graph relating to the following data and show the trend by free hand
method
Year 1996 1997 1998 1999 2000 2001 2002 2003
Production in Million tonnes 40 44 42 48 51 54 50 56
38. Yield of ground nut in Kharif season in India for the years 2003-04 to 2009-10 are given
below. Calculate 3-year moving averages.
Year 2003-04 2004-05 2005-06 2006-07 2007-08 2008-09 2009-10
Yield (kg/hectare) 1320 909 1097 689 1386 1063 835
41. The following data give the number of ATM centres during 1995 to 2001.
Year 1995 1996 1997 1998 1999 2000 2001
Number of ATM centres 50 63 75 100 109 120 135
Obtain the trend values using semi averages method
42. From the following data estimate the trend values using semi averages method
Year 2003 2004 2005 2006 2007 2008 2009 2010
Consumption of cotton (Thousands of bales) 677 696 747 755 766 777 785 836
43. Following data gives the yield of food grains in India for the years 2000-01 to 2009-10. Find
the trend values using 4 year moving averages.
Year 2000-01 2001-02 2002-03 2003-04 2004-05 2005-06 2006-07 2007-08 2008-09 2009-10
Yield (kg/hectare) 1626 1734 1535 1727 1652 1715 1756 1860 1909 1798
44. Estimate the value of production for the year 1995 by using the method of least squares from
the following data.
Year 1990 1991 1992 1993 1994
Production (1000s tons) 70 72 88 90 92
45. Find the following for the calculation of number of telephones for the year 2000.
(1) Fit a straight line trend by the method of least squares.
(2) Calculate the trend values.
48. Find seasonal Indices for the rainfall data in Tamil Nadu (in mm)
Quarter
Year 2009 2010 2011 2012
I 38.2 38.5 55 50.5
II 166.8 250.9 277.7 197
III 612.6 773.1 717.8 706.1
IV 72.2 153.1 65.8 101.1
49. The following table gives quarterly expenditure over a number of years. Obtain seasonal
correction for the data
Year
Season 2000 2001 2002 2003
I 78 84 92 100
II 62 64 70 81
III 56 61 63 72
IV 71 82 83 96
50. Find the trend values using semi averages method. The following table shows the area covered
for cultivation of Ragi in Tamil Nadu (in ‘000 hectares)
Year 2003 2004 2005 2006 2007 2008 2009 2010
Area (in ‘000 hectares) 118 109 100 95 94 90 82 76
Steps:
• Open the browser and type the URL given (or) scan the QR code.
• GeoGebra work book called “Time Series Plot” will open.
• Move the “seed” slider to select a new sample.
• Put the Tick Mark in the Check box to see an answer to the problem.
URL:
https://www.geogebra.org/m/Cd6dvjMV
8 VITAL STATISTICS AND OFFICIAL STATISTICS
LEARNING OBJECTIVES
Introduction
Demography is a term, generally, concerned with human population and it is also
concerned with the social implications of periodical variations taking place in the population
with reference to geographical location(s).
Vital Statistics is a branch of Demography, which is the science applied to the analysis and
interpretation of numerical facts regarding vital events occurring in human population such as
births, deaths, marriages, divorces, migration etc.
The following are some of the important definitions of Vital Statistics:
“The whole study of mankind by heredity or environment in so far as the results of their study
can be arithmetically stated”
–Arthur Newsholme
“Vital Statistics are conventional numerical records of marriages, births, sickness and deaths
by which the health and growth of a community may be studied”.
–Benjamin
Vital Statistics is the science of numbers applied to the life history of communities or regions.
Civil Registration System is the most common method of collecting information on vital
events. It is an administrative procedure followed by governments, to record various vital events
occurring in their population.
In this method, the occurrence of vital events such as births, deaths, marriages, migration
etc., is registered. Many countries adopt this system. Registration is done with the Authorities
appointed by the respective government. In India, registration of births and deaths is made
compulsory by legislation, through an act viz., “The Registration of Births and Deaths Act, 1969”.
It came into force throughout the country through a gazette notification published in 1970.
Ad hoc surveys are conducted in areas where the recording of births and deaths has not
been done properly and periodically, particularly in those areas where registration offices have
not been established. However, survey records help to provide Vital Statistics for that region only.
Vital rates are required to monitor population growth, especially for the purpose of evaluation
of family planning programmes in terms of their ultimate objective of controlling fertility.
National Level
(a) Infant mortality
(b) Age specific mortality rates in rural areas
(c) Sampling variability of vital rates
State Level
Example 8.1
There were 15,000 persons living in a village during a period and the number of persons
dead during the same period was 98.
Then, the CDR of the village can be calculated from
$$CDR = \frac{D}{P} \times 1000$$
as
$$CDR = \frac{98}{15000} \times 1000 = 6.53 \text{ per thousand.}$$
In some cases, information about the population may be provided in such a manner that
people in the population are grouped according to their age into mutually exclusive and exhaustive
age groups. The CDR can also be calculated based on such kind of information.
Example 8.2
People living in a town are grouped according to their age into five groups. The number of
persons lived during a calendar year and the number of deaths recorded during the same period
are as follows:
Age Group (in years) 0-10 10-30 30-50 50-70 70 and above
No. of Persons 5,000 10,000 15,000 10,000 2,000
No. of Deaths 125 30 30 200 1,000
Solution:
The total number of deaths occurred in the town, irrespective of the age, is 1385 and the
population size is 42,000. Therefore, the CDR of the town can be calculated as
$$CDR = \frac{1385}{42000} \times 1000 = 32.98$$
Thus, the Crude Death Rate of the town is 32.98 per thousand.
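The calculation in Example 8.2 can be checked with the short Python sketch below. It is only an illustration; the function name `crude_death_rate` is hypothetical.

```python
def crude_death_rate(deaths, population):
    """CDR = D / P * 1000 (deaths per thousand persons)."""
    return deaths / population * 1000

# Age groups: 0-10, 10-30, 30-50, 50-70, 70 and above
persons = [5000, 10000, 15000, 10000, 2000]
deaths = [125, 30, 30, 200, 1000]

total_deaths = sum(deaths)       # 1385
total_persons = sum(persons)     # 42000
cdr = crude_death_rate(total_deaths, total_persons)
print(round(cdr, 2))
```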
Mortality pattern may differ in different sections/segments of the population such as age,
gender, occupation etc.
Specific Death Rate (SDR) can be calculated exclusively for a section of the population.
The SDR can be calculated for a group of persons, who are distinguished by age or gender or
occupational class or marital status. The formula to calculate SDR is
$$SDR = \frac{D_S}{P_S} \times 1000,$$
where
D_S refers to the number of deaths in a specific section of population during the given
period, and
P_S refers to the total number of persons in the specific section of population during the
given period.
SDRs can be calculated for any age group, gender, religion, caste or community. If the
death rates are calculated for different age groups, say, 0-5 years, 5-15 years, 50-60 years, they are
called age specific death rates (ASDRs). In the age group (x,x+n), all the persons in the population
or in its section, who have attained the age of x years and the persons with age less than x+n years,
are included in the age group.
If
D(x,n) denotes the number of deaths in the age group (x,x+n) recorded in a given region
during a given period, and
P(x,n) denotes the number of persons in the age group (x,x+n) in the region during the
same period, then
the ASDR for the age group (x,x+n) for the given region during the period is given by
$$ASDR(x,n) = \frac{D(x,n)}{P(x,n)} \times 1000.$$
The death rates calculated separately for persons of each gender are called gender
specific death rates. The SDR can also be used to compare the death rates due to different kinds of
seasonal diseases such as dengue, chikungunya, swine flu.
The SDR helps to measure the death rates for different sections/segments of the population
unlike CDR. ASDR and SDR for gender can be used to compare the death rates of the respective
sections of the given population in different regions.
Find the crude death rates and age specific death rates of Area I and Area II.
Solution:
Age Specific Death Rate can be calculated for each age group using the formula
$$ASDR(x,n) = \frac{D(x,n)}{P(x,n)} \times 1000$$
Calculation of the ASDR for both the areas in each age group is presented in the following table:
                          Area I                                            Area II
Age Group (x)   P(x,n)   D(x,n)   ASDR(x,n) (per thousand)          P(x,n)   D(x,n)   ASDR(x,n) (per thousand)
0-10            3000     55       (55/3000) × 1000 = 18.33          7500     300      (300/7500) × 1000 = 40.00
10-25           4500     30       (30/4500) × 1000 = 6.67           6000     50       (50/6000) × 1000 = 8.33
25-45           6000     40       (40/6000) × 1000 = 6.67           8000     40       (40/8000) × 1000 = 5.00
45 and over     1000     15       (15/1000) × 1000 = 15.00          2000     64       (64/2000) × 1000 = 32.00
Total           14500    140                                        23500    454
$$\text{CDR of Area I} = \frac{140}{14500} \times 1000 = 9.66 \text{ per thousand}$$
$$\text{CDR of Area II} = \frac{454}{23500} \times 1000 = 19.32 \text{ per thousand}$$
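The table above can be reproduced with the following Python sketch, given only as an illustration. The function name and the dictionary layout are hypothetical; the figures are those of Area I and Area II.

```python
def rate_per_thousand(deaths, persons):
    """Deaths per thousand persons in a group."""
    return deaths / persons * 1000

age_groups = ["0-10", "10-25", "25-45", "45 and over"]
areas = {
    "Area I": {"P": [3000, 4500, 6000, 1000], "D": [55, 30, 40, 15]},
    "Area II": {"P": [7500, 6000, 8000, 2000], "D": [300, 50, 40, 64]},
}

results = {}
for name, data in areas.items():
    asdr = [rate_per_thousand(d, p) for d, p in zip(data["D"], data["P"])]
    cdr = rate_per_thousand(sum(data["D"]), sum(data["P"]))
    results[name] = {"ASDR": asdr, "CDR": cdr}
    print(name, "CDR =", round(cdr, 2))
    for group, rate in zip(age_groups, asdr):
        print("  ", group, round(rate, 2))
```

Note that the CDR is not the simple average of the ASDRs; it weights each age group by its share of the population.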
Find the Illness specific death rates for the two districts. Also, compare health conditions
of both the districts with reference to these two causes of death. Assume that a person affected by
Diabetes is not affected by Lung Cancer and vice-versa.
Solution:
The SDR due to the two causes of death are calculated as follows:
SDR of District A
$$SDR_{Diabetes} = \frac{D_{Diabetes}}{P_{Diabetes}} \times 1000 = \frac{325}{20000} \times 1000 = 16.25 \text{ per thousand}$$
$$SDR_{Lung\ Cancer} = \frac{D_{Lung\ Cancer}}{P_{Lung\ Cancer}} \times 1000 = \frac{300}{19500} \times 1000 = 15.38 \text{ per thousand}$$
SDR of District B
$$SDR_{Diabetes} = \frac{400}{22000} \times 1000 = 18.18 \text{ per thousand}$$
$$SDR_{Lung\ Cancer} = \frac{380}{21225} \times 1000 = 17.90 \text{ per thousand}$$
In both districts, the death rate due to Diabetes is higher than that due to Lung Cancer.
Of the two districts, District B has the higher death rate for both Diabetes and Lung
Cancer.
where
DInfant denotes the number of deaths of infants in a population during a period, and
PInfant denotes the number of live births in the population during the period.
Child Mortality Rate: Child mortality is the death of a child before the child’s fifth
birthday, measured as the under-5 child mortality rate (U5MR).
Example 8.5
The number of live births recorded and the number of infants died in a town during a
given period are respectively 400 and 25. Calculate, from these information, the infant mortality
rate of the town for the period.
Solution:
The IMR of the town is given by
$$IMR = \frac{25}{400} \times 1000$$
IMR = 62.50 per thousand.
Example 8.6
A Life Table was constructed for a cohort. The following is a section of the table, wherein
some of the entries are not available. Find the estimates of missing values and complete the Life
Table.
Age (in years) l(x) d(x) p(x) q(x) L(x) T(x) e0(x)
40 10, 645 - - - - 1, 93, 820 -
41 10, 543 169 - - - - -
Solution:
The Life Table can be completed using the relationship among missing terms and other terms.
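The number of persons who die after attaining age 40 years but before reaching age 41 years is
calculated as
d(40) = l(40) – l(41)
= 10645 – 10543
Therefore, d(40) = 102.
Values of q(x) are estimated as
q(40) = d(40)/l(40) = 102/10645
= 0.0095.
q(41) = d(41)/l(41) = 169/10543
= 0.0160.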
Values of p(x) are estimated from the corresponding values of q(x) as
p(40) = 1 – q(40)
= 1 – 0.0095 = 0.9905
p(41) = 1 – q(41)
= 1 – 0.0160 = 0.9840
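Values of L(x) are estimated using its relationship with l(x) and d(x) as follows:
L(40) = [l(40) + l(41)]/2 = (10645 + 10543)/2
= 10,594
L(41) = [l(41) + l(42)]/2 = (10543 + 10374)/2, where l(42) = l(41) – d(41) = 10543 – 169 = 10374
= 10,459 (Approx.)
The value of T(41) is estimated from the given value of T(40) and the estimated value of
L(40) from the relationship
T(41) = T(40) – L(40)
as
T(41) = 193820 – 10594 = 1,83,226.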
Example 8.7
The following is a part of the Life Table constructed for a population, where the contents
are incomplete. Evaluate the missing values using the given data and complete the Life Table.
Solution:
Values of the missing entries can be estimated from the given data applying the respective
formulae as follows:
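The number of persons who die after attaining age 83 years but before reaching age 84 years is
calculated as
d(83) = l(83) × q(83)
= 3560 × 0.16
= 569.6 ≈ 570
The number of survivors at age 84 years is
l(84) = l(83) – d(83)
= 3560 – 570
= 2990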
p(83) = 1 – q(83)
= 1 – 0.16 = 0.84
p(84) = 1 – q(84)
= 1 – 0.17 = 0.83
Values of L(x) are estimated as
L(83) = [l(83) + l(84)]/2 = (3560 + 2990)/2
= 3,275
L(84) = 2736.
The value of T(83) can be estimated from the given value of T(84) and the estimated value
of L(83) from the relationship
T(83) = L(83) + T(84).
The life expectancy of the cohort at the age x = 83 and 84 years is estimated using the
relationship
e0(x) = T(x)/l(x).
Example 8.8
A part of the Life Table of a population is given hereunder with incomplete information.
Find those information from the given data and complete the Life Table.
Solution:
Values of the missing entries can be calculated from the given data applying the respective
formulae as follows:
The number of persons who die before reaching age x = 72 and 73 years can be calculated as
d(72) = l(72) – l(73)
= 4412 – 3724
= 688
d(73) = l(73) – l(74)
= 3724 – 3201
d(73) = 523.
Values of q(x) are estimated as
q(72) = d(72)/l(72) = 688/4412
= 0.1559
q(73) = d(73)/l(73) = 523/3724
q(73) = 0.1404
q(74) = 0.2006.
Values of p(x) are estimated from the corresponding values of q(x) as
p(72) = 1 – q(72)
= 1 – 0.1559 = 0.8441
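p(73) = 1 – q(73)
= 1 – 0.1404 = 0.8596
p(74) = 1 – q(74)
= 1 – 0.2006 = 0.7994.
Values of L(x) are estimated as
L(72) = [l(72) + l(73)]/2 = (4412 + 3724)/2
= 4,068
L(73) = [l(73) + l(74)]/2 = (3724 + 3201)/2
= 3,463 (Approx.)
L(74) = 2880.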
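The value of T(x) is estimated for x = 72 and 73 from the given value of T(74) and the
estimated values of L(72) and L(73) as
T(73) = L(73) + T(74)
= 3463 + 26567 = 30,030.
T(72) = L(72) + T(73)
= 4068 + 30030 = 34,098.
The life expectancy of the cohort at the age x = 72, 73 and 74 years is estimated using the
relationship
e0(x) = T(x)/l(x)
as
e0(72) = 34098/4412 ≈ 7.73 years
e0(73) = 30030/3724 ≈ 8.06 years
e0(74) = 26567/3201 ≈ 8.30 years.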
Example 8.9
Find the missing values in the following Life Table:
Age
l(x) d(x) p(x) q(x) L(x) T(x) e0(x)
(in years)
4 95,000 500 - - - 48,50,300 -
5 - 400 - - - - -
Solution:
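Value of the survivorship function l(x) at x = 5 years can be estimated as
l(5) = l(4) – d(4)
= 95000 – 500
= 94500
Values of q(x) are estimated as
q(4) = d(4)/l(4) = 500/95000
= 0.005
q(5) = d(5)/l(5) = 400/94500
= 0.004.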
p(4) = 1 – q(4)
= 1 – 0.005 = 0.995
p(5) = 1 – q(5)
= 1 – 0.004 = 0.996.
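Values of L(x) are estimated as
L(4) = [l(4) + l(5)]/2 = (95000 + 94500)/2
= 94,750
L(5) = [l(5) + l(6)]/2 = (94500 + 94100)/2, where l(6) = l(5) – d(5) = 94500 – 400 = 94100
= 94,300.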
The value of T(5) is estimated from the given value of T(4) and the estimate of L(4) as
T(5) = T(4) – L(4)
T(5) = 4850300 – 94750 = 47,55,550.
The life expectancy of the cohort at the age x = 4 and 5 years is estimated using the relationship
e0(x) = T(x)/l(x)
as
e0(4) = 4850300/95000 ≈ 51.06 years
e0(5) = 4755550/94500 ≈ 50.32 years.
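The chain of relationships used in Example 8.9 can be sketched in Python as follows. This is an illustration under the usual assumption that deaths are spread uniformly over the year of age, so that L(x) = [l(x) + l(x+1)]/2; the variable names are hypothetical, and the code keeps full precision where the text rounds q(x) to three decimals.

```python
# Given entries of the life table in Example 8.9
l4, d4, T4 = 95000, 500, 4850300
d5 = 400

# Survivors: l(x+1) = l(x) - d(x)
l5 = l4 - d4
l6 = l5 - d5

# Probability of dying and of surviving within the year of age
q4, q5 = d4 / l4, d5 / l5
p4, p5 = 1 - q4, 1 - q5

# Person-years lived between ages x and x+1
L4 = (l4 + l5) / 2
L5 = (l5 + l6) / 2

# Total person-years lived beyond age x: T(x+1) = T(x) - L(x)
T5 = T4 - L4

# Expectation of life at age x: e0(x) = T(x) / l(x)
e4 = T4 / l4
e5 = T5 / l5

print(l5, round(q4, 3), round(q5, 3), L4, L5, T5)
print(round(e4, 2), round(e5, 2))
```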
Example 8.10
The number of children born in a city during a period was 15,628 and the total population
of the city in that period was 80,00,000. Find the crude birth rate of the city.
Solution:
The Crude Birth Rate can be calculated using the formula
$$CBR = \frac{B^t}{P^t} \times 1000$$
The CBR of the city is
$$CBR = \frac{15628}{8000000} \times 1000 = 1.95 \text{ per thousand.}$$
Example 8.11
People living in a town are grouped according to their age into nine groups. Details about
the number of live births are also grouped according to the age group of women. This information
is presented in the following table:
Solution:
The total number of persons in the town during the specified period can be calculated
from the given information as
P t = 1,89,000
where
$P_i^t$ denotes the number of women in the reproductive age i years in the given region/
community during the period t, i = a1 to a2.
In India, generally, a1 = 15 years and a2 = 49 years.
GFR overcomes the disadvantage of CBR considering only the women population at the
child bearing age group, since the denominator in the above formula represents the entire women
population at the reproductive age group. GFR expresses the increase in the women population at
the child bearing age through live births.
However, GFR does not express the age composition of women population at the
reproductive age group. Hence, two different regions/communities cannot be compared with
respect to age of women using GFR.
Solution:
The total number of women, at child bearing age in the district during the study period
can be calculated from the given information as
$$\sum_{i=15}^{49} P_i^t = 1,63,000$$
It is a well known fact that fertility is affected by several factors such as age, marriage,
migration, region etc. But neither CBR nor GFR takes this fact into account. In this respect,
Specific Fertility Rate (SFR) is defined as
$$SFR = \frac{\text{Number of live births to the women population in the reproductive age groups of the specific section in a given period}}{\text{Total number of women in the reproductive age groups of the specific section in the given period}} \times 1000$$
SFR can be calculated separately for various age groups of females who are at child bearing
age such as 15-20, 20-25, and so on. The SFR computed with respect to different reproductive age
of women is known as the Age Specific Fertility Rate (ASFR), which can be calculated using the
formula
$$ASFR(x, x+n) = \frac{B^t(x, x+n)}{P^t(x, x+n)} \times 1000$$
where
$B^t(x, x+n)$ denotes the number of live births to the women in the reproductive age group
(x,x+n) during the period t in the given region, and
$P^t(x, x+n)$ denotes the number of women in the reproductive age group (x,x+n) during the
period t in the given region.
Example 8.13
The female population, at reproductive age, of a country is grouped into six age groups.
The number of women in each group and the number of live births given by them are given the
following table.
Calculate the general fertility rate and the age specific fertility rates of the country.
Solution:
Total number of women in the country at reproductive age during the period of ‘t’ is
$$\sum_{i=15}^{44} P_i^t = 5,64,070$$
and the total number of live births of the country during the same period is
$$B^t = 52,852$$
Therefore, the general fertility rate of the country is
$$GFR = \frac{52852}{564070} \times 1000 = 93.69 \text{ per thousand.}$$
The age specific fertility rates of the country can be calculated for each age group using
the formula
$$ASFR(x, x+n) = \frac{B^t(x, x+n)}{P^t(x, x+n)} \times 1000$$
where
$B^t(x, x+n)$ denotes the number of live births to the women in the reproductive age group
(x,x+n) during the period t in the given region, and
$P^t(x, x+n)$ denotes the number of women in the reproductive age group (x,x+n) during
the period t in the given region.
Age Group (yrs)   ASFR (per thousand)
15-20             (10668/116610) × 1000 = 91.48
20-25             (17183/113810) × 1000 = 150.98
25-30             (12722/103130) × 1000 = 123.36
30-35             (7283/93500) × 1000 = 77.89
35-40             (3656/74120) × 1000 = 49.33
40-45             (1340/62900) × 1000 = 21.30
It can be observed from the ASFRs that the women in the country in the age group of
20-25 years have given relatively more live births, while the women in the age group of 40-45
years have given the fewest live births.
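The GFR and ASFR calculations of Example 8.13 can be checked with the Python sketch below. It is only an illustration; the function name `rate_per_thousand` is hypothetical, and the population and birth figures are those appearing in the fractions above.

```python
def rate_per_thousand(births, women):
    """Live births per thousand women."""
    return births / women * 1000

age_groups = ["15-20", "20-25", "25-30", "30-35", "35-40", "40-45"]
women = [116610, 113810, 103130, 93500, 74120, 62900]
births = [10668, 17183, 12722, 7283, 3656, 1340]

# General fertility rate: all births over all women of reproductive age
gfr = rate_per_thousand(sum(births), sum(women))

# Age specific fertility rate for each group
asfr = [rate_per_thousand(b, w) for b, w in zip(births, women)]

print("Total women:", sum(women), "Total births:", sum(births))
print("GFR:", gfr)
for group, rate in zip(age_groups, asfr):
    print(group, round(rate, 2))

# Age group with the highest fertility rate
peak_group = age_groups[asfr.index(max(asfr))]
print("Highest ASFR in age group", peak_group)
```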
Example 8.14
The following is the data regarding the size of female population in a country at reproductive
age and the live births during a period.
Age Group (yrs) Female Population No. of Live Births
15–20 2,16,410 20,468
20–25 2,13,610 26,983
25–30 2,02,930 22,522
30–35 1,93,300 17,083
35–40 1,73,920 13,456
40–45 1,62,870 11,140
Calculate the general fertility rate and the age specific fertility rates of the country.
Solution:
Size of the women population of the country at reproductive age during the period ‘t’, is
$$\sum_{i=15}^{44} P_i^t = 11,63,040$$
and the total number of live births occurred during the same period in the country is
$$B^t = 1,11,652.$$
Age Group (yrs)   ASFR (per thousand)
15–20             (20468/216410) × 1000 = 94.58
20–25             (26983/213610) × 1000 = 126.32
25–30             (22522/202930) × 1000 = 110.98
30–35             (17083/193300) × 1000 = 88.38
35–40             (13456/173920) × 1000 = 77.37
40–45             (11140/162870) × 1000 = 68.40
It can be observed from the ASFRs that the women of the country in the age group of
20-25 years have given relatively more live births, while the women in the age group of
40-45 years have given the fewest live births.
Positive values of Crude Rate of Natural Increase indicate the net increase in the population.
Similarly, negative values of Crude Rate of Natural Increase indicate the net decrease in the
population.
If Pearl’s Vital Index is greater than 100, then it can be regarded as the population is
growing. On the other hand, if this index is less than 100, it can be regarded as the population is
not growing. The above formula shows that the Vital Index can also provide knowledge on birth-
death ratio of the population.
These two measures are simple and easy to calculate. They indicate whether the number of
births exceeds the number of deaths. However, these two measures suffer from the limitations of
CBR and CDR. They cannot be used for comparing two different populations. Also, information
regarding whether the population has a tendency to increase or decrease cannot be obtained from
these two measures.
The Second Five Year Plan of India followed the model developed by Prof. P.C. Mahalanobis,
which focused on public sector development and rapid industrialization. The Government of
India honoured him with one of the highest civilian awards Padma
Vibhushan. The Government of India released a stamp on June 29, 1993
in commemoration of his 100th birthday. Recently, the Government of
India released a commemorative coin on June 29, 2018 during the
celebration of his 125th birthday.
The headquarters of this division is at Delhi. This division has a network of 6 Zonal Offices,
49 Regional Offices and 118 Sub-Regional Offices spread throughout the country. This division is
responsible for collection of primary data for the surveys undertaken by NSSO.
The Division, with its headquarters at Kolkata and 6 Data Processing Centers at various
places, is responsible for
• selection of sample subjects
• developing relevant software
• processing, validation and tabulation of the data collected through surveys.
Vital Statistics are quantitative measurements on live births, deaths, foetal deaths, infant
deaths, fertility and so on.
Data on vital events are collected adopting the five methods - Civil Registration System,
Census or Complete Enumeration method, Survey method, Sample Registration System
and Analytical method.
Census method normally covers data regarding age, sex, marital status, educational level,
occupation, religion and other factors needed for computing vital statistics. Census is
conducted in most countries at intervals of ten years.
Rates of vital events are usually expressed ‘per thousand’.
$$\text{Specific Death Rate} = \frac{\text{No. of deaths in a specific section of the population during the given period}}{\text{Total number of persons in the specific section of the population during the given period}} \times 1000$$
Crude Rate of Natural Increase = Crude Birth Rate – Crude Death Rate
$$\text{Pearl's Vital Index} = \frac{\text{Crude Birth Rate}}{\text{Crude Death Rate}} \times 100$$
Official statistics are the statistical information collected and compiled on various aspects
including all major areas of citizens’ lives, such as economic and social development, living
conditions, health, education and environment.
An Official Statistical System was established in India by Col. Sykes during 1847 with a
Department of Statistics in India House. The first Census Report of India was published in
1848.
The Central Statistics Office, which has five main divisions, is responsible for coordination
of statistical activities in the country and for evolving and maintaining statistical standards.
National Sample Survey Office (NSSO), headed by a Director General and organised into four
divisions, is responsible for the conduct of national level large scale sample surveys in
diverse fields.
EXERCISE 8
2. When there is no proper system of recording births and deaths, Vital Statistics are collected
through
(a) Registration Method (b) Census Method
(c) Survey Method (d) Analytical Method
3. Compulsory registration of births and deaths was implemented in India, during the year
(a) 1947 (b) 1951 (c) 1969 (d) 1970
12. Department of Statistics in India House was established in India by Col. Sykes during
(a) 1847 (b) 1947 (c) 1857 (d) 1887
17. Celebration of National Statistics Day is one of the responsibilities of ______ Division of
CSO.
(a) Coordination and Publications (b) Training
(c) Social Statistics (d) Economic Statistics
Age Group (in years) 0-10 10-20 20-40 40-60 60 and above
No. of Persons 6500 12,000 24,000 20,000 8,000
No. of Deaths 25 37 30 90 100
58. The number of deaths registered in a district with respect to age during a year and the
population size in each age group are given below. Calculate the crude death rate of the
district for the period.
Age Group (in years) Below 15 15-25 25-40 40-65 65 and above
No. of Persons 40,000 88,000 90,000 60,800 23,000
No. of Deaths 40 62 100 78 20
60. Calculate the specific death rates for each age group of a population, whose size and the
number of deaths are given in the following table:
Age Group (in years) Below 10 10-20 20-40 40-60 60 and above
No. of Persons 36,000 28,000 62,000 52,000 18,000
No. of Deaths 682 204 576 878 725
61. What are the different components and their formulae of Life Table?
62. Find the missing entries in the following Life Table.
Age (in years) l(x) d(x) p(x) q(x) L(x) T(x) e0(x)
25 75818
26 75445 2722331
27 75039 0.009
63. There are missing entries in some of the columns in the following Life Table. Find the values
of the missing entries.
Age (in years) l(x) d(x) p(x) q(x) L(x) T(x) e0(x)
42 64711 1513333
43 63787
44 62821 62310
64. The following is a section of a Life Table with some missing entries. Complete the Life Table.
Age (in years) l(x) d(x) p(x) q(x) L(x) T(x) e0(x)
36 69818
37 69032
38 68212 850 1779254
66. The women population at reproductive age in a State are grouped and the population size in
each group are given hereunder. The number of live births given by the women in each group
are also presented. Find the general fertility rate of the State. Also, calculate the age specific
fertility rate for each group.
Age Group (in years)  15-20     20-25     25-30     30-35     35-40     40-45   45-49
No. of Women          2,12,724  1,89,237  2,45,367  1,32,109  1,29,645  90,708  34,975
No. of Live Births    20,209    23,655    37,787    12,815    9,723     4,898   874
67. The number of live births occurred in a District during a calendar year are classified according
to the age of mother. The female population size at child bearing age are also given.
Age Group (in years)  15-20  20-25  25-30  30-35  35-40  40-45  45-49
No. of Women          4,729  6,236  8,034  9,408  5,907  4,657  2,975
No. of Live Births    356    845    970    1,878  856    608    452
Calculate the general fertility rate of the District. Also, calculate the specific fertility rates for
each of the reproductive age groups.
68. The following are the information registered about the number of live births and the female
population size in a town during a calendar year.
Age Group (in years)  15-20  20-25  25-30  30-35  35-40  40-45  45-49
No. of Women          1,276  3,253  5,628  7,345  6,901  4,253  3,957
No. of Live Births    218    361    693    1,305  1,031  634    390
Calculate from this information the general fertility rate of the town and the age specific
fertility rates for the year.
Age Group (in years)  15-20  20-25   25-30   30-35  35-40  40-45  45-49
ASFR                  95.00  125.00  154.00  97.00  75.00  54.00  24.99
67. GFR = 142.21
Age Group (in years)  15-20  20-25   25-30   30-35   35-40   40-45   45-49
ASFR                  75.28  135.50  120.74  199.62  144.91  130.56  151.93
68. GFR = 142.03
Age Group (in years)  15-20   20-25   25-30   30-35   35-40   40-45   45-49
ASFR                  170.85  110.97  123.13  177.67  149.40  149.07  98.56
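The fertility answers above can be checked with a short Python sketch. It uses the data of question 68 and assumes the usual definitions: the general fertility rate is total live births per thousand women of reproductive age, and the age specific fertility rate is the same ratio within one age group. Variable names are illustrative.

```python
# Data of question 68: female population and live births by age group.
age_groups = ["15-20", "20-25", "25-30", "30-35", "35-40", "40-45", "45-49"]
women = [1276, 3253, 5628, 7345, 6901, 4253, 3957]
live_births = [218, 361, 693, 1305, 1031, 634, 390]

# General fertility rate: all live births per thousand women aged 15-49.
gfr = sum(live_births) / sum(women) * 1000

# Age specific fertility rate: live births per thousand women in each group.
asfr = [b / w * 1000 for b, w in zip(live_births, women)]

print(f"GFR = {gfr:.2f}")
for group, rate in zip(age_groups, asfr):
    print(f"ASFR {group}: {rate:.2f}")
```

Rounded to two decimals, the output agrees with the printed answer for question 68 (GFR = 142.03 and the seven ASFR values).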
PROJECT WORK
9
LEARNING OBJECTIVES
Introduction
A mechanical engineer spends time in an industry as an apprentice, a medical doctor in a hospital
as a house surgeon, an auditor with an accountant, and a budding lawyer as a junior to an
established senior, each to learn the intricacies of the profession through its day-to-day affairs.
Where is the counterpart of this for a student of Statistics? It is against this background that the
inclusion of project work in the curriculum gains importance. Project work helps, to some extent, to
bridge the gap between the theory learned in the classroom and its application.
Project work is an assignment carried out by students during the course, either individually or
in a group, under the supervision of the teacher. A brief written report of this work is an
integral part of the assignment. Thus the project work is a complete assignment: it covers the stages
of planning, execution, analysis and reporting of the work done.
Advantages
1. By formulating a real-world problem from a statistical perspective, the student acquires applied
knowledge through the completion of the project.
2. Project work provides knowledge of the systematic collection of data and the organization of ideas,
and develops the ability to analyze them within a stipulated time. The following intangible benefits
will also be derived on completion of the project work.
3. It paves the way for interaction with the respondents, develops the ability to fit into a team, and
fosters co-operation and interaction among the students and between students and teachers.
The first and foremost stage of any project work is identifying the topic of the project. The
sources for selecting the topic include (i) individual experiences, (ii) personal conversations,
(iii) day-to-day practical experience, (iv) social problems and (v) politics, such as opinion polls.
The criteria for fixing the topic include the availability of data, the time available to complete
the research and the availability of expertise.
Objectives:
The aspects that the project worker wants to probe in the project should be spelt out in clear terms
as objectives. Research should not proceed until the objectives are clearly spelt out.
Hypothesis
A hypothesis is the perception of the researcher. It is stated as a testable proposition subject to
empirical verification. It is sometimes called the research hypothesis.
Planning the project involves choosing the most appropriate (i) method of selecting respondents,
(ii) method of collecting the data, (iii) data gathering instrument, tested for its reliability and
validity, and (iv) techniques to solve the problem under investigation. Designing a research project
can be compared to an architect designing a building. It is a plan or blueprint for the collection,
analysis and interpretation of data.
Once the project plan is ready, the student will know from whom or where to collect the data and
which data gathering instrument to use. The student then proceeds to collect the required data.
Data file preparation involves entering the data in a spreadsheet and checking for errors, if any.
Stage 5: Statistical Data Analysis
This activity is the most important. Here, we select the appropriate statistical tool for data analysis.
This requires good conceptual clarity and an understanding of how the subject is applied. The choice
of statistical tool depends primarily on the research objective, and then on the variables involved,
their measurement scales (nominal, ordinal and scale) and the number of variables considered at a time.
An important step in any project study is preparing the project report. The report records
the purpose, the importance, the procedure, the findings, the limitations and the conclusion of the
project study. It should be prepared in such a way that it is easily understood and is helpful to other
research or project workers in a similar field.
6. From whom can the required data be obtained? (Where to collect? Target group?)
7. What period of time will the study cover? (How many times to collect?)
Questions RESPONSES
1. Name ----------------------------------------
2. Age (in completed years) _______ Years
3. Sex 1. Male 2. Female
Once the researcher has decided on the specific types of questions and the response formats,
the next task is the actual writing of the questions. Wording the individual questions always demands
a significant investment of time from the researcher. The general guidelines are useful to bear in
mind while wording and sequencing each question.
Characteristics of a questionnaire
Once a rough draft of the questionnaire has been designed, the researcher is obliged to take a
step back and evaluate it critically. This phase may seem redundant, given all the careful thought that
went into each question. But recall the crucial role played by the questionnaire. At this point in
questionnaire development, the following items should be considered.
After the questionnaire is prepared, a pre-test is to be done. The process of collecting information
from a small number of relevant respondents using the framed questionnaire is called a Pre-Test.
Sometimes, the questionnaire is circulated among competent investigators, who are asked to
suggest improvements. Once this has been done and the suggestions have been incorporated, the
final form of the questionnaire is ready for the collection of data.
This includes the title page, the preface, the acknowledgement, the table of contents and the list
of tables and figures.
This contains the introduction to the problem, the review of previous research in a similar field,
the details of the procedure, the findings, their analysis, and the conclusions.
The report should be written in simple language. It should be clear, precise and brief. It should
be written in the third person or in the passive voice. Spelling mistakes and colloquial forms
of presentation should be avoided. The spelling of non-English words, if used, should be kept uniform
throughout.
4. When the researcher uses the data of an agency, then the data is called:
a) Quantitative data b) Qualitative data
c) Secondary data d) Primary data
6.
a) Independent variable b) Dependent variable
c) Intervening variable d) Extraneous variable
10. Thanking the people who helped us to complete the project work will come under:
a) Reference b) Conclusion
c) Acknowledgement d) Need not appear in the report.
ANNEXURES
(These Annexures are for understanding purposes only;
questions should not be asked from this portion.)
Hypotheses:
1. Obesity doesn't depend on sex
2. Diet intake is not associated with obesity.
3. The linear model is good fit for the data between height and weight
4. The average weight and the average height do not differ between males and females.
5. Different diet intakes do not affect the average height.
Stage 3: Project work planning
1. Name :
2. Sex 1.Male
2.Female
3. Residence type: 1. Urban 2. Rural.
4. Height in cms
5. Weight in kgs.
Stage 4: Data collection and data file preparation
Data Entry
2 3 1 131.5 30.5
3 1 2 132.8 31.5
4 3 1 139.8 30.5
39 2 1 126 24
40 3 2 128.5 23.5
The appropriate statistical tools are selected based on the four following aspects.
1. Purpose.
2. Variables involved.
3. Types of measurement scales.
4. Number of variables considered at a time.
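As an illustration of matching a tool to these four aspects, hypothesis 3 of the annexure concerns two scale variables (height and weight) considered together, for which Pearson's correlation coefficient and a least-squares line are natural candidates. The Python sketch below uses only the five rows visible in the Data Entry listing and assumes that the last two columns are height (in cm) and weight (in kg). With so few rows, the numbers are illustrative and are not findings of the project.

```python
import math

# Last two columns of the visible Data Entry rows (assumed: height in cm, weight in kg).
height = [131.5, 132.8, 139.8, 126.0, 128.5]
weight = [30.5, 31.5, 30.5, 24.0, 23.5]

n = len(height)
mean_h = sum(height) / n
mean_w = sum(weight) / n

# Corrected sums of squares and products.
s_hh = sum((h - mean_h) ** 2 for h in height)
s_ww = sum((w - mean_w) ** 2 for w in weight)
s_hw = sum((h - mean_h) * (w - mean_w) for h, w in zip(height, weight))

# Pearson's correlation coefficient between height and weight.
r = s_hw / math.sqrt(s_hh * s_ww)

# Least-squares regression line of weight on height: weight = intercept + slope * height.
slope = s_hw / s_hh
intercept = mean_w - slope * mean_h

print(f"r = {r:.4f}")
print(f"weight = {intercept:.2f} + {slope:.4f} * height")
```

For a nominal pair such as sex and obesity (hypothesis 1), the same four aspects would point instead to a chi-square test for independence of attributes.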
Interpretation: While interpreting data, the researcher should give both the findings (inferences)
and the conclusion. It should be remembered that a finding is what you have obtained from the data,
whereas a conclusion relates to answering the research question.
Statistical calculation:
Findings:
Conclusion:
Research Hypothesis:
Test Procedure:
Null Hypothesis H0 :
Alternative Hypothesis H1:
Level of Significance α:
Test Statistic:
Calculation of calculated value using sample data:
Critical value from the table:
Inference:
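The test procedure outlined above can be sketched in Python for one simple case, a two-sided Z-test for a population mean with known standard deviation. The sample figures are hypothetical, the function names are illustrative, and the critical value 1.96 is the two-sided 5% point of the standard normal distribution.

```python
import math

def upper_tail_area(z):
    """P(Z > z) for a standard normal variable, as tabulated in the normal table."""
    return 0.5 * math.erfc(z / math.sqrt(2))

def z_test_for_mean(sample_mean, mu0, sigma, n, critical_value=1.96):
    """Two-sided test of H0: mu = mu0 against H1: mu != mu0."""
    # Test statistic computed from the sample data.
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))
    # Two-sided p-value, reported alongside the critical-value decision.
    p_value = 2 * upper_tail_area(abs(z))
    # Inference: reject H0 when |z| exceeds the critical value.
    decision = "reject H0" if abs(z) > critical_value else "do not reject H0"
    return z, p_value, decision

# Hypothetical sample: mean 52 from n = 100 observations, H0: mu = 50, sigma = 10.
z, p_value, decision = z_test_for_mean(sample_mean=52.0, mu0=50.0, sigma=10.0, n=100)
print(f"z = {z:.2f}, p-value = {p_value:.4f}, inference: {decision}")
```

Here the calculated value 2.00 exceeds the critical value 1.96, so the null hypothesis is rejected at the 5% level of significance.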
Preliminaries,
Chapter I INTRODUCTION
Introduction about the research area : Broad Area
Problem Selection : Selection of the topic
Objectives of the study
Hypothesis of the study
Methodology of the study:
Sample Design
Source of Data - Instrument used for extracting information from the sample units
Pre test- Reliability and validity of the data collection instrument and Pilot study
Description of variables
Frame work of analysis
Significance of the study
Period of study
Scope of Study
Limitations of the study
Scheme or layout or organization of the report.
The syllabus for the 12th standard practical is given below. Problems should be taken from the
textbook examples or exercises, or from relevant real-life situations. The question paper consists
of two sections. Each section contains five questions. The students should answer four questions,
choosing two from each section.
Section A
1. Tests of Significance of a Proportion and Equality of Proportions based on Z-Statistic.
2. Tests of Significance of a Mean and Equality of Means based on Z-Statistic
3. Tests of Significance of a Mean based on t-Statistic.
4. Tests of Significance for equality of Means of two Independent Populations. (Independent
samples ‘t’ test)
5. Paired t-Test for dependent samples.
6. Test of Significance for Equality of Population Variances based on F-Statistic.
7. ANOVA for One Way Classification.
8. ANOVA for Two Way Classification.
9. Chi-square Test for Independence of attributes.
10. Chi-square Test for Goodness of fit.
Section B
1. Computation of Pearson’s Correlation Coefficient.
2. Computation of Spearman’s Rank Correlation.
3. Computation of Yule’s coefficient of association.
4. Construction of Regression Equations.
5. Construction of Index Numbers.
6. Trend by the Method of “Moving Averages” of a Time series data.
7. Trend by the Method of “Least Squares” of a Time series data.
8. Seasonal Indices by the Method of “Simple Averages” of a Time series data.
9. Computation of CBR, ASBR, CDR, ASDR.
10. Construction of Life Table for Vital Statistics.
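As one illustration of the practical items listed above (Section B, item 10), the Python sketch below fills the columns of a life table section. It borrows the known entries of question 64 in Exercise 8 and assumes the usual relations, which are not restated in this part of the text: d(x) = l(x) - l(x+1), q(x) = d(x)/l(x), p(x) = 1 - q(x), L(x) = [l(x) + l(x+1)]/2, T(x) = L(x) + T(x+1) and e0(x) = T(x)/l(x). It also assumes that, in the last row of that question, 850 is d(38) and 1779254 is T(38). The function name is hypothetical.

```python
def fill_life_table(ages, lx, d_last, T_last):
    """Complete a life table section from l(x), the last d(x) and the last T(x)."""
    n = len(ages)
    # l(x+1) for the last row follows from d(x) = l(x) - l(x+1).
    lx_ext = list(lx) + [lx[-1] - d_last]
    d = [lx_ext[i] - lx_ext[i + 1] for i in range(n)]
    q = [d[i] / lx_ext[i] for i in range(n)]        # probability of dying
    p = [1 - qi for qi in q]                        # probability of surviving
    L = [(lx_ext[i] + lx_ext[i + 1]) / 2 for i in range(n)]
    # T(x) is accumulated upwards from the last row: T(x) = L(x) + T(x+1).
    T = [0.0] * n
    T[-1] = T_last
    for i in range(n - 2, -1, -1):
        T[i] = L[i] + T[i + 1]
    e0 = [T[i] / lx_ext[i] for i in range(n)]       # expectation of life
    return {"age": list(ages), "l": list(lx), "d": d, "p": p, "q": q,
            "L": L, "T": T, "e0": e0}

# Known entries of question 64 (Exercise 8).
table = fill_life_table(ages=[36, 37, 38], lx=[69818, 69032, 68212],
                        d_last=850, T_last=1779254)
for i, age in enumerate(table["age"]):
    print(age, table["l"][i], table["d"][i], round(table["p"][i], 4),
          round(table["q"][i], 4), table["L"][i], table["T"][i],
          round(table["e0"][i], 2))
```

Working upwards from the last row is a design choice forced by the data: T(x) is known only for age 38, so the earlier values are obtained by adding L(x) row by row.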
Mean Difference
0 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
1.0 0.0000 0.0043 0.0086 0.0128 0.0170 0.0212 0.0253 0.0294 0.0334 0.0374 4 8 12 17 21 25 29 33 37
1.1 0.0414 0.0453 0.0492 0.0531 0.0569 0.0607 0.0645 0.0682 0.0719 0.0755 4 8 11 15 19 23 26 30 34
1.2 0.0792 0.0828 0.0864 0.0899 0.0934 0.0969 0.1004 0.1038 0.1072 0.1106 3 7 10 14 17 21 24 28 31
1.3 0.1139 0.1173 0.1206 0.1239 0.1271 0.1303 0.1335 0.1367 0.1399 0.1430 3 6 10 13 16 19 23 26 29
1.4 0.1461 0.1492 0.1523 0.1553 0.1584 0.1614 0.1644 0.1673 0.1703 0.1732 3 6 9 12 15 18 21 24 27
1.5 0.1761 0.1790 0.1818 0.1847 0.1875 0.1903 0.1931 0.1959 0.1987 0.2014 3 6 8 11 14 17 20 22 25
1.6 0.2041 0.2068 0.2095 0.2122 0.2148 0.2175 0.2201 0.2227 0.2253 0.2279 3 5 8 11 13 16 18 21 24
1.7 0.2304 0.2330 0.2355 0.2380 0.2405 0.2430 0.2455 0.2480 0.2504 0.2529 2 5 7 10 12 15 17 20 22
1.8 0.2553 0.2577 0.2601 0.2625 0.2648 0.2672 0.2695 0.2718 0.2742 0.2765 2 5 7 9 12 14 16 19 21
1.9 0.2788 0.2810 0.2833 0.2856 0.2878 0.2900 0.2923 0.2945 0.2967 0.2989 2 4 7 9 11 13 16 18 20
2.0 0.3010 0.3032 0.3054 0.3075 0.3096 0.3118 0.3139 0.3160 0.3181 0.3201 2 4 6 8 11 13 15 17 19
2.1 0.3222 0.3243 0.3263 0.3284 0.3304 0.3324 0.3345 0.3365 0.3385 0.3404 2 4 6 8 10 12 14 16 18
2.2 0.3424 0.3444 0.3464 0.3483 0.3502 0.3522 0.3541 0.3560 0.3579 0.3598 2 4 6 8 10 12 14 15 17
2.3 0.3617 0.3636 0.3655 0.3674 0.3692 0.3711 0.3729 0.3747 0.3766 0.3784 2 4 6 7 9 11 13 15 17
2.4 0.3802 0.3820 0.3838 0.3856 0.3874 0.3892 0.3909 0.3927 0.3945 0.3962 2 4 5 7 9 11 12 14 16
2.5 0.3979 0.3997 0.4014 0.4031 0.4048 0.4065 0.4082 0.4099 0.4116 0.4133 2 3 5 7 9 10 12 14 15
2.6 0.4150 0.4166 0.4183 0.4200 0.4216 0.4232 0.4249 0.4265 0.4281 0.4298 2 3 5 7 8 10 11 13 15
2.7 0.4314 0.4330 0.4346 0.4362 0.4378 0.4393 0.4409 0.4425 0.4440 0.4456 2 3 5 6 8 9 11 13 14
2.8 0.4472 0.4487 0.4502 0.4518 0.4533 0.4548 0.4564 0.4579 0.4594 0.4609 2 3 5 6 8 9 11 12 14
2.9 0.4624 0.4639 0.4654 0.4669 0.4683 0.4698 0.4713 0.4728 0.4742 0.4757 1 3 4 6 7 9 10 12 13
3.0 0.4771 0.4786 0.4800 0.4814 0.4829 0.4843 0.4857 0.4871 0.4886 0.4900 1 3 4 6 7 9 10 11 13
3.1 0.4914 0.4928 0.4942 0.4955 0.4969 0.4983 0.4997 0.5011 0.5024 0.5038 1 3 4 6 7 8 10 11 12
3.2 0.5051 0.5065 0.5079 0.5092 0.5105 0.5119 0.5132 0.5145 0.5159 0.5172 1 3 4 5 7 8 9 11 12
3.3 0.5185 0.5198 0.5211 0.5224 0.5237 0.5250 0.5263 0.5276 0.5289 0.5302 1 3 4 5 6 8 9 10 12
3.4 0.5315 0.5328 0.5340 0.5353 0.5366 0.5378 0.5391 0.5403 0.5416 0.5428 1 3 4 5 6 8 9 10 11
3.5 0.5441 0.5453 0.5465 0.5478 0.5490 0.5502 0.5514 0.5527 0.5539 0.5551 1 2 4 5 6 7 9 10 11
3.6 0.5563 0.5575 0.5587 0.5599 0.5611 0.5623 0.5635 0.5647 0.5658 0.5670 1 2 4 5 6 7 8 10 11
3.7 0.5682 0.5694 0.5705 0.5717 0.5729 0.5740 0.5752 0.5763 0.5775 0.5786 1 2 3 5 6 7 8 9 10
3.8 0.5798 0.5809 0.5821 0.5832 0.5843 0.5855 0.5866 0.5877 0.5888 0.5899 1 2 3 5 6 7 8 9 10
3.9 0.5911 0.5922 0.5933 0.5944 0.5955 0.5966 0.5977 0.5988 0.5999 0.6010 1 2 3 4 5 7 8 9 10
4.0 0.6021 0.6031 0.6042 0.6053 0.6064 0.6075 0.6085 0.6096 0.6107 0.6117 1 2 3 4 5 6 8 9 10
4.1 0.6128 0.6138 0.6149 0.6160 0.6170 0.6180 0.6191 0.6201 0.6212 0.6222 1 2 3 4 5 6 7 8 9
4.2 0.6232 0.6243 0.6253 0.6263 0.6274 0.6284 0.6294 0.6304 0.6314 0.6325 1 2 3 4 5 6 7 8 9
4.3 0.6335 0.6345 0.6355 0.6365 0.6375 0.6385 0.6395 0.6405 0.6415 0.6425 1 2 3 4 5 6 7 8 9
4.4 0.6435 0.6444 0.6454 0.6464 0.6474 0.6484 0.6493 0.6503 0.6513 0.6522 1 2 3 4 5 6 7 8 9
4.5 0.6532 0.6542 0.6551 0.6561 0.6571 0.6580 0.6590 0.6599 0.6609 0.6618 1 2 3 4 5 6 7 8 9
4.6 0.6628 0.6637 0.6646 0.6656 0.6665 0.6675 0.6684 0.6693 0.6702 0.6712 1 2 3 4 5 6 7 7 8
4.7 0.6721 0.6730 0.6739 0.6749 0.6758 0.6767 0.6776 0.6785 0.6794 0.6803 1 2 3 4 5 5 6 7 8
4.8 0.6812 0.6821 0.6830 0.6839 0.6848 0.6857 0.6866 0.6875 0.6884 0.6893 1 2 3 4 4 5 6 7 8
4.9 0.6902 0.6911 0.6920 0.6928 0.6937 0.6946 0.6955 0.6964 0.6972 0.6981 1 2 3 4 4 5 6 7 8
5.0 0.6990 0.6998 0.7007 0.7016 0.7024 0.7033 0.7042 0.7050 0.7059 0.7067 1 2 3 3 4 5 6 7 8
5.1 0.7076 0.7084 0.7093 0.7101 0.7110 0.7118 0.7126 0.7135 0.7143 0.7152 1 2 3 3 4 5 6 7 8
5.2 0.7160 0.7168 0.7177 0.7185 0.7193 0.7202 0.7210 0.7218 0.7226 0.7235 1 2 2 3 4 5 6 7 7
5.3 0.7243 0.7251 0.7259 0.7267 0.7275 0.7284 0.7292 0.7300 0.7308 0.7316 1 2 2 3 4 5 6 6 7
5.4 0.7324 0.7332 0.7340 0.7348 0.7356 0.7364 0.7372 0.7380 0.7388 0.7396 1 2 2 3 4 5 6 6 7
5.5 0.7404 0.7412 0.7419 0.7427 0.7435 0.7443 0.7451 0.7459 0.7466 0.7474 1 2 2 3 4 5 5 6 7
5.6 0.7482 0.7490 0.7497 0.7505 0.7513 0.7520 0.7528 0.7536 0.7543 0.7551 1 2 2 3 4 5 5 6 7
5.7 0.7559 0.7566 0.7574 0.7582 0.7589 0.7597 0.7604 0.7612 0.7619 0.7627 1 2 2 3 4 5 5 6 7
5.8 0.7634 0.7642 0.7649 0.7657 0.7664 0.7672 0.7679 0.7686 0.7694 0.7701 1 1 2 3 4 4 5 6 7
5.9 0.7709 0.7716 0.7723 0.7731 0.7738 0.7745 0.7752 0.7760 0.7767 0.7774 1 1 2 3 4 4 5 6 7
6.0 0.7782 0.7789 0.7796 0.7803 0.7810 0.7818 0.7825 0.7832 0.7839 0.7846 1 1 2 3 4 4 5 6 6
6.1 0.7853 0.7860 0.7868 0.7875 0.7882 0.7889 0.7896 0.7903 0.7910 0.7917 1 1 2 3 4 4 5 6 6
6.2 0.7924 0.7931 0.7938 0.7945 0.7952 0.7959 0.7966 0.7973 0.7980 0.7987 1 1 2 3 3 4 5 6 6
6.3 0.7993 0.8000 0.8007 0.8014 0.8021 0.8028 0.8035 0.8041 0.8048 0.8055 1 1 2 3 3 4 5 5 6
6.4 0.8062 0.8069 0.8075 0.8082 0.8089 0.8096 0.8102 0.8109 0.8116 0.8122 1 1 2 3 3 4 5 5 6
6.5 0.8129 0.8136 0.8142 0.8149 0.8156 0.8162 0.8169 0.8176 0.8182 0.8189 1 1 2 3 3 4 5 5 6
6.6 0.8195 0.8202 0.8209 0.8215 0.8222 0.8228 0.8235 0.8241 0.8248 0.8254 1 1 2 3 3 4 5 5 6
6.7 0.8261 0.8267 0.8274 0.8280 0.8287 0.8293 0.8299 0.8306 0.8312 0.8319 1 1 2 3 3 4 5 5 6
6.8 0.8325 0.8331 0.8338 0.8344 0.8351 0.8357 0.8363 0.8370 0.8376 0.8382 1 1 2 3 3 4 4 5 6
6.9 0.8388 0.8395 0.8401 0.8407 0.8414 0.8420 0.8426 0.8432 0.8439 0.8445 1 1 2 2 3 4 4 5 6
7.0 0.8451 0.8457 0.8463 0.8470 0.8476 0.8482 0.8488 0.8494 0.8500 0.8506 1 1 2 2 3 4 4 5 6
7.1 0.8513 0.8519 0.8525 0.8531 0.8537 0.8543 0.8549 0.8555 0.8561 0.8567 1 1 2 2 3 4 4 5 5
7.2 0.8573 0.8579 0.8585 0.8591 0.8597 0.8603 0.8609 0.8615 0.8621 0.8627 1 1 2 2 3 4 4 5 5
7.3 0.8633 0.8639 0.8645 0.8651 0.8657 0.8663 0.8669 0.8675 0.8681 0.8686 1 1 2 2 3 4 4 5 5
7.4 0.8692 0.8698 0.8704 0.8710 0.8716 0.8722 0.8727 0.8733 0.8739 0.8745 1 1 2 2 3 4 4 5 5
7.5 0.8751 0.8756 0.8762 0.8768 0.8774 0.8779 0.8785 0.8791 0.8797 0.8802 1 1 2 2 3 3 4 5 5
7.6 0.8808 0.8814 0.8820 0.8825 0.8831 0.8837 0.8842 0.8848 0.8854 0.8859 1 1 2 2 3 3 4 5 5
7.7 0.8865 0.8871 0.8876 0.8882 0.8887 0.8893 0.8899 0.8904 0.8910 0.8915 1 1 2 2 3 3 4 4 5
7.8 0.8921 0.8927 0.8932 0.8938 0.8943 0.8949 0.8954 0.8960 0.8965 0.8971 1 1 2 2 3 3 4 4 5
7.9 0.8976 0.8982 0.8987 0.8993 0.8998 0.9004 0.9009 0.9015 0.9020 0.9025 1 1 2 2 3 3 4 4 5
8.0 0.9031 0.9036 0.9042 0.9047 0.9053 0.9058 0.9063 0.9069 0.9074 0.9079 1 1 2 2 3 3 4 4 5
8.1 0.9085 0.9090 0.9096 0.9101 0.9106 0.9112 0.9117 0.9122 0.9128 0.9133 1 1 2 2 3 3 4 4 5
8.2 0.9138 0.9143 0.9149 0.9154 0.9159 0.9165 0.9170 0.9175 0.9180 0.9186 1 1 2 2 3 3 4 4 5
8.3 0.9191 0.9196 0.9201 0.9206 0.9212 0.9217 0.9222 0.9227 0.9232 0.9238 1 1 2 2 3 3 4 4 5
8.4 0.9243 0.9248 0.9253 0.9258 0.9263 0.9269 0.9274 0.9279 0.9284 0.9289 1 1 2 2 3 3 4 4 5
8.5 0.9294 0.9299 0.9304 0.9309 0.9315 0.9320 0.9325 0.9330 0.9335 0.9340 1 1 2 2 3 3 4 4 5
8.6 0.9345 0.9350 0.9355 0.9360 0.9365 0.9370 0.9375 0.9380 0.9385 0.9390 1 1 2 2 3 3 4 4 5
8.7 0.9395 0.9400 0.9405 0.9410 0.9415 0.9420 0.9425 0.9430 0.9435 0.9440 0 1 1 2 2 3 3 4 4
8.8 0.9445 0.9450 0.9455 0.9460 0.9465 0.9469 0.9474 0.9479 0.9484 0.9489 0 1 1 2 2 3 3 4 4
8.9 0.9494 0.9499 0.9504 0.9509 0.9513 0.9518 0.9523 0.9528 0.9533 0.9538 0 1 1 2 2 3 3 4 4
9.0 0.9542 0.9547 0.9552 0.9557 0.9562 0.9566 0.9571 0.9576 0.9581 0.9586 0 1 1 2 2 3 3 4 4
9.1 0.9590 0.9595 0.9600 0.9605 0.9609 0.9614 0.9619 0.9624 0.9628 0.9633 0 1 1 2 2 3 3 4 4
9.2 0.9638 0.9643 0.9647 0.9652 0.9657 0.9661 0.9666 0.9671 0.9675 0.9680 0 1 1 2 2 3 3 4 4
9.3 0.9685 0.9689 0.9694 0.9699 0.9703 0.9708 0.9713 0.9717 0.9722 0.9727 0 1 1 2 2 3 3 4 4
9.4 0.9731 0.9736 0.9741 0.9745 0.9750 0.9754 0.9759 0.9763 0.9768 0.9773 0 1 1 2 2 3 3 4 4
9.5 0.9777 0.9782 0.9786 0.9791 0.9795 0.9800 0.9805 0.9809 0.9814 0.9818 0 1 1 2 2 3 3 4 4
9.6 0.9823 0.9827 0.9832 0.9836 0.9841 0.9845 0.9850 0.9854 0.9859 0.9863 0 1 1 2 2 3 3 4 4
9.7 0.9868 0.9872 0.9877 0.9881 0.9886 0.9890 0.9894 0.9899 0.9903 0.9908 0 1 1 2 2 3 3 4 4
9.8 0.9912 0.9917 0.9921 0.9926 0.9930 0.9934 0.9939 0.9943 0.9948 0.9952 0 1 1 2 2 3 3 4 4
9.9 0.9956 0.9961 0.9965 0.9969 0.9974 0.9978 0.9983 0.9987 0.9991 0.9996 0 1 1 2 2 3 3 3 4
Mean Difference
0 1 2 3 4 5 6 7 8 9 1 2 3 4 5 6 7 8 9
0.00 1.000 1.002 1.005 1.007 1.009 1.012 1.014 1.016 1.019 1.021 0 0 1 1 1 1 2 2 2
0.01 1.023 1.026 1.028 1.030 1.033 1.035 1.038 1.040 1.042 1.045 0 0 1 1 1 1 2 2 2
0.02 1.047 1.050 1.052 1.054 1.057 1.059 1.062 1.064 1.067 1.069 0 0 1 1 1 1 2 2 2
0.03 1.072 1.074 1.076 1.079 1.081 1.084 1.086 1.089 1.091 1.094 0 0 1 1 1 1 2 2 2
0.04 1.096 1.099 1.102 1.104 1.107 1.109 1.112 1.114 1.117 1.119 0 1 1 1 1 2 2 2 2
0.05 1.122 1.125 1.127 1.130 1.132 1.135 1.138 1.140 1.143 1.146 0 1 1 1 1 2 2 2 2
0.06 1.148 1.151 1.153 1.156 1.159 1.161 1.164 1.167 1.169 1.172 0 1 1 1 1 2 2 2 2
0.07 1.175 1.178 1.180 1.183 1.186 1.189 1.191 1.194 1.197 1.199 0 1 1 1 1 2 2 2 2
0.08 1.202 1.205 1.208 1.211 1.213 1.216 1.219 1.222 1.225 1.227 0 1 1 1 1 2 2 2 3
0.09 1.230 1.233 1.236 1.239 1.242 1.245 1.247 1.250 1.253 1.256 0 1 1 1 1 2 2 2 3
0.10 1.259 1.262 1.265 1.268 1.271 1.274 1.276 1.279 1.282 1.285 0 1 1 1 1 2 2 2 3
0.11 1.288 1.291 1.294 1.297 1.300 1.303 1.306 1.309 1.312 1.315 0 1 1 1 2 2 2 2 3
0.12 1.318 1.321 1.324 1.327 1.330 1.334 1.337 1.340 1.343 1.346 0 1 1 1 2 2 2 2 3
0.13 1.349 1.352 1.355 1.358 1.361 1.365 1.368 1.371 1.374 1.377 0 1 1 1 2 2 2 3 3
0.14 1.380 1.384 1.387 1.390 1.393 1.396 1.400 1.403 1.406 1.409 0 1 1 1 2 2 2 3 3
0.15 1.413 1.416 1.419 1.422 1.426 1.429 1.432 1.435 1.439 1.442 0 1 1 1 2 2 2 3 3
0.16 1.445 1.449 1.452 1.455 1.459 1.462 1.466 1.469 1.472 1.476 0 1 1 1 2 2 2 3 3
0.17 1.479 1.483 1.486 1.489 1.493 1.496 1.500 1.503 1.507 1.510 0 1 1 1 2 2 2 3 3
0.18 1.514 1.517 1.521 1.524 1.528 1.531 1.535 1.538 1.542 1.545 0 1 1 1 2 2 2 3 3
0.19 1.549 1.552 1.556 1.560 1.563 1.567 1.570 1.574 1.578 1.581 0 1 1 1 2 2 3 3 3
0.20 1.585 1.589 1.592 1.596 1.600 1.603 1.607 1.611 1.614 1.618 0 1 1 1 2 2 3 3 3
0.21 1.622 1.626 1.629 1.633 1.637 1.641 1.644 1.648 1.652 1.656 0 1 1 2 2 2 3 3 3
0.22 1.660 1.663 1.667 1.671 1.675 1.679 1.683 1.687 1.690 1.694 0 1 1 2 2 2 3 3 3
0.23 1.698 1.702 1.706 1.710 1.714 1.718 1.722 1.726 1.730 1.734 0 1 1 2 2 2 3 3 4
0.24 1.738 1.742 1.746 1.750 1.754 1.758 1.762 1.766 1.770 1.774 0 1 1 2 2 2 3 3 4
0.25 1.778 1.782 1.786 1.791 1.795 1.799 1.803 1.807 1.811 1.816 0 1 1 2 2 2 3 3 4
0.26 1.820 1.824 1.828 1.832 1.837 1.841 1.845 1.849 1.854 1.858 0 1 1 2 2 3 3 3 4
0.27 1.862 1.866 1.871 1.875 1.879 1.884 1.888 1.892 1.897 1.901 0 1 1 2 2 3 3 3 4
0.28 1.905 1.910 1.914 1.919 1.923 1.928 1.932 1.936 1.941 1.945 0 1 1 2 2 3 3 4 4
0.29 1.950 1.954 1.959 1.963 1.968 1.972 1.977 1.982 1.986 1.991 0 1 1 2 2 3 3 4 4
0.30 1.995 2.000 2.004 2.009 2.014 2.018 2.023 2.028 2.032 2.037 0 1 1 2 2 3 3 4 4
0.31 2.042 2.046 2.051 2.056 2.061 2.065 2.070 2.075 2.080 2.084 0 1 1 2 2 3 3 4 4
0.32 2.089 2.094 2.099 2.104 2.109 2.113 2.118 2.123 2.128 2.133 0 1 1 2 2 3 3 4 4
0.33 2.138 2.143 2.148 2.153 2.158 2.163 2.168 2.173 2.178 2.183 0 1 1 2 2 3 3 4 4
0.34 2.188 2.193 2.198 2.203 2.208 2.213 2.218 2.223 2.228 2.234 1 1 2 2 3 3 4 4 5
0.35 2.239 2.244 2.249 2.254 2.259 2.265 2.270 2.275 2.280 2.286 1 1 2 2 3 3 4 4 5
0.36 2.291 2.296 2.301 2.307 2.312 2.317 2.323 2.328 2.333 2.339 1 1 2 2 3 3 4 4 5
0.37 2.344 2.350 2.355 2.360 2.366 2.371 2.377 2.382 2.388 2.393 1 1 2 2 3 3 4 4 5
0.38 2.399 2.404 2.410 2.415 2.421 2.427 2.432 2.438 2.443 2.449 1 1 2 2 3 3 4 4 5
0.39 2.455 2.460 2.466 2.472 2.477 2.483 2.489 2.495 2.500 2.506 1 1 2 2 3 3 4 5 5
0.40 2.512 2.518 2.523 2.529 2.535 2.541 2.547 2.553 2.559 2.564 1 1 2 2 3 4 4 5 5
0.41 2.570 2.576 2.582 2.588 2.594 2.600 2.606 2.612 2.618 2.624 1 1 2 2 3 4 4 5 5
0.42 2.630 2.636 2.642 2.649 2.655 2.661 2.667 2.673 2.679 2.685 1 1 2 2 3 4 4 5 6
0.43 2.692 2.698 2.704 2.710 2.716 2.723 2.729 2.735 2.742 2.748 1 1 2 3 3 4 4 5 6
0.44 2.754 2.761 2.767 2.773 2.780 2.786 2.793 2.799 2.805 2.812 1 1 2 3 3 4 4 5 6
0.45 2.818 2.825 2.831 2.838 2.844 2.851 2.858 2.864 2.871 2.877 1 1 2 3 3 4 5 5 6
0.46 2.884 2.891 2.897 2.904 2.911 2.917 2.924 2.931 2.938 2.944 1 1 2 3 3 4 5 5 6
0.47 2.951 2.958 2.965 2.972 2.979 2.985 2.992 2.999 3.006 3.013 1 1 2 3 3 4 5 5 6
0.48 3.020 3.027 3.034 3.041 3.048 3.055 3.062 3.069 3.076 3.083 1 1 2 3 4 4 5 6 6
0.49 3.090 3.097 3.105 3.112 3.119 3.126 3.133 3.141 3.148 3.155 1 1 2 3 4 4 5 6 6
0.50 3.162 3.170 3.177 3.184 3.192 3.199 3.206 3.214 3.221 3.228 1 1 2 3 4 4 5 6 7
0.51 3.236 3.243 3.251 3.258 3.266 3.273 3.281 3.289 3.296 3.304 1 2 2 3 4 5 5 6 7
0.52 3.311 3.319 3.327 3.334 3.342 3.350 3.357 3.365 3.373 3.381 1 2 2 3 4 5 5 6 7
0.53 3.388 3.396 3.404 3.412 3.420 3.428 3.436 3.443 3.451 3.459 1 2 2 3 4 5 6 6 7
0.54 3.467 3.475 3.483 3.491 3.499 3.508 3.516 3.524 3.532 3.540 1 2 2 3 4 5 6 6 7
0.55 3.548 3.556 3.565 3.573 3.581 3.589 3.597 3.606 3.614 3.622 1 2 2 3 4 5 6 7 7
0.56 3.631 3.639 3.648 3.656 3.664 3.673 3.681 3.690 3.698 3.707 1 2 3 3 4 5 6 7 8
0.57 3.715 3.724 3.733 3.741 3.750 3.758 3.767 3.776 3.784 3.793 1 2 3 3 4 5 6 7 8
0.58 3.802 3.811 3.819 3.828 3.837 3.846 3.855 3.864 3.873 3.882 1 2 3 4 4 5 6 7 8
0.59 3.890 3.899 3.908 3.917 3.926 3.936 3.945 3.954 3.963 3.972 1 2 3 4 5 5 6 7 8
0.60 3.981 3.990 3.999 4.009 4.018 4.027 4.036 4.046 4.055 4.064 1 2 3 4 5 6 6 7 8
0.61 4.074 4.083 4.093 4.102 4.111 4.121 4.130 4.140 4.150 4.159 1 2 3 4 5 6 7 8 9
0.62 4.169 4.178 4.188 4.198 4.207 4.217 4.227 4.236 4.246 4.256 1 2 3 4 5 6 7 8 9
0.63 4.266 4.276 4.285 4.295 4.305 4.315 4.325 4.335 4.345 4.355 1 2 3 4 5 6 7 8 9
0.64 4.365 4.375 4.385 4.395 4.406 4.416 4.426 4.436 4.446 4.457 1 2 3 4 5 6 7 8 9
0.65 4.467 4.477 4.487 4.498 4.508 4.519 4.529 4.539 4.550 4.560 1 2 3 4 5 6 7 8 9
0.66 4.571 4.581 4.592 4.603 4.613 4.624 4.634 4.645 4.656 4.667 1 2 3 4 5 6 7 9 10
0.67 4.677 4.688 4.699 4.710 4.721 4.732 4.742 4.753 4.764 4.775 1 2 3 4 5 7 8 9 10
0.68 4.786 4.797 4.808 4.819 4.831 4.842 4.853 4.864 4.875 4.887 1 2 3 4 6 7 8 9 10
0.69 4.898 4.909 4.920 4.932 4.943 4.955 4.966 4.977 4.989 5.000 1 2 3 5 6 7 8 9 10
0.70 5.012 5.023 5.035 5.047 5.058 5.070 5.082 5.093 5.105 5.117 1 2 4 5 6 7 8 9 11
0.71 5.129 5.140 5.152 5.164 5.176 5.188 5.200 5.212 5.224 5.236 1 2 4 5 6 7 8 10 11
0.72 5.248 5.260 5.272 5.284 5.297 5.309 5.321 5.333 5.346 5.358 1 2 4 5 6 7 9 10 11
0.73 5.370 5.383 5.395 5.408 5.420 5.433 5.445 5.458 5.470 5.483 1 3 4 5 6 8 9 10 11
0.74 5.495 5.508 5.521 5.534 5.546 5.559 5.572 5.585 5.598 5.610 1 3 4 5 6 8 9 10 12
0.75 5.623 5.636 5.649 5.662 5.675 5.689 5.702 5.715 5.728 5.741 1 3 4 5 7 8 9 10 12
0.76 5.754 5.768 5.781 5.794 5.808 5.821 5.834 5.848 5.861 5.875 1 3 4 5 7 8 9 11 12
0.77 5.888 5.902 5.916 5.929 5.943 5.957 5.970 5.984 5.998 6.012 1 3 4 5 7 8 10 11 12
0.78 6.026 6.039 6.053 6.067 6.081 6.095 6.109 6.124 6.138 6.152 1 3 4 6 7 8 10 11 13
0.79 6.166 6.180 6.194 6.209 6.223 6.237 6.252 6.266 6.281 6.295 1 3 4 6 7 9 10 11 13
0.80 6.310 6.324 6.339 6.353 6.368 6.383 6.397 6.412 6.427 6.442 1 3 4 6 7 9 10 12 13
0.81 6.457 6.471 6.486 6.501 6.516 6.531 6.546 6.561 6.577 6.592 2 3 5 6 8 9 11 12 14
0.82 6.607 6.622 6.637 6.653 6.668 6.683 6.699 6.714 6.730 6.745 2 3 5 6 8 9 11 12 14
0.83 6.761 6.776 6.792 6.808 6.823 6.839 6.855 6.871 6.887 6.902 2 3 5 6 8 9 11 13 14
0.84 6.918 6.934 6.950 6.966 6.982 6.998 7.015 7.031 7.047 7.063 2 3 5 6 8 10 11 13 15
0.85 7.079 7.096 7.112 7.129 7.145 7.161 7.178 7.194 7.211 7.228 2 3 5 7 8 10 12 13 15
0.86 7.244 7.261 7.278 7.295 7.311 7.328 7.345 7.362 7.379 7.396 2 3 5 7 8 10 12 13 15
0.87 7.413 7.430 7.447 7.464 7.482 7.499 7.516 7.534 7.551 7.568 2 3 5 7 9 10 12 14 16
0.88 7.586 7.603 7.621 7.638 7.656 7.674 7.691 7.709 7.727 7.745 2 4 5 7 9 11 12 14 16
0.89 7.762 7.780 7.798 7.816 7.834 7.852 7.870 7.889 7.907 7.925 2 4 5 7 9 11 13 14 16
0.90 7.943 7.962 7.980 7.998 8.017 8.035 8.054 8.072 8.091 8.110 2 4 6 7 9 11 13 15 17
0.91 8.128 8.147 8.166 8.185 8.204 8.222 8.241 8.260 8.279 8.299 2 4 6 8 9 11 13 15 17
0.92 8.318 8.337 8.356 8.375 8.395 8.414 8.433 8.453 8.472 8.492 2 4 6 8 10 12 14 15 17
0.93 8.511 8.531 8.551 8.570 8.590 8.610 8.630 8.650 8.670 8.690 2 4 6 8 10 12 14 16 18
0.94 8.710 8.730 8.750 8.770 8.790 8.810 8.831 8.851 8.872 8.892 2 4 6 8 10 12 14 16 18
0.95 8.913 8.933 8.954 8.974 8.995 9.016 9.036 9.057 9.078 9.099 2 4 6 8 10 12 15 17 19
0.96 9.120 9.141 9.162 9.183 9.204 9.226 9.247 9.268 9.290 9.311 2 4 6 8 11 13 15 17 19
0.97 9.333 9.354 9.376 9.397 9.419 9.441 9.462 9.484 9.506 9.528 2 4 7 9 11 13 15 17 20
0.98 9.550 9.572 9.594 9.616 9.638 9.661 9.683 9.705 9.727 9.750 2 4 7 9 11 13 16 18 20
0.99 9.772 9.795 9.817 9.840 9.863 9.886 9.908 9.931 9.954 9.977 2 5 7 9 11 14 16 18 20
z 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09 z
0.0 0.5000 0.4960 0.4920 0.4880 0.4841 0.4801 0.4761 0.4721 0.4681 0.4641 0.0
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247 0.1
0.2 0.4207 0.4168 0.4129 0.4091 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859 0.2
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483 0.3
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121 0.4
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776 0.5
0.6 0.2743 0.2709 0.2676 0.2644 0.2611 0.2579 0.2546 0.2514 0.2483 0.2451 0.6
0.7 0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2207 0.2177 0.2148 0.7
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867 0.8
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611 0.9
1.0 0.1587 0.1563 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379 1.0
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170 1.1
1.2 0.1151 0.1131 0.1112 0.1094 0.1075 0.1057 0.1038 0.1020 0.1003 0.0985 1.2
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823 1.3
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0722 0.0708 0.0694 0.0681 1.4
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559 1.5
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455 1.6
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367 1.7
1.8 0.0359 0.0352 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294 1.8
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233 1.9
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183 2.0
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143 2.1
2.2 0.0139 0.0136 0.0132 0.0129 0.0126 0.0122 0.0119 0.0116 0.0113 0.0110 2.2
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084 2.3
2.4 0.0082 0.0080 0.0078 0.0076 0.0073 0.0071 0.0070 0.0068 0.0066 0.0064 2.4
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048 2.5
2.6 0.0047 0.0045 0.0044 0.0043 0.0042 0.0040 0.0039 0.0038 0.0037 0.0036 2.6
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026 2.7
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019 2.8
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014 2.9
3.0 0.0014 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010 3.0
3.1 0.0010 0.0009 0.0009 0.0009 0.0009 0.0008 0.0008 0.0008 0.0007 0.0007 3.1
3.2 0.0007 0.0007 0.0006 0.0006 0.0006 0.0006 0.0006 0.0005 0.0005 0.0005 3.2
3.3 0.0005 0.0005 0.0005 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 0.0004 3.3
3.4 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0003 0.0002 3.4
3.5 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 0.0002 3.5
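The entries of the z table above agree with the upper-tail area P(Z > z) of the standard normal distribution. As a hedged sketch, they can be reproduced from the complementary error function in Python's standard library; the helper name `upper_tail` is an assumption made for illustration.

```python
import math


def upper_tail(z: float) -> float:
    """Upper-tail area P(Z > z) for a standard normal variable Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


# Row 1.0, column 0.00 and row 1.9, column 0.06 of the table.
print(round(upper_tail(1.00), 4))
print(round(upper_tail(1.96), 4))
```

To read the table for z = 1.96, take row 1.9 and column 0.06; the function call `upper_tail(1.96)` corresponds to that lookup.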
α
df 0.995 0.990 0.975 0.950 0.900 0.100 0.050 0.025 0.010 0.005 df
1 0.000 0.000 0.001 0.004 0.016 2.706 3.841 5.024 6.635 7.879 1
2 0.010 0.020 0.051 0.103 0.211 4.605 5.991 7.378 9.210 10.597 2
3 0.072 0.115 0.216 0.352 0.584 6.251 7.815 9.348 11.345 12.838 3
4 0.207 0.297 0.484 0.711 1.064 7.779 9.488 11.143 13.277 14.860 4
5 0.412 0.554 0.831 1.145 1.610 9.236 11.070 12.833 15.086 16.750 5
6 0.676 0.872 1.237 1.635 2.204 10.645 12.592 14.449 16.812 18.548 6
7 0.989 1.239 1.690 2.167 2.833 12.017 14.067 16.013 18.475 20.278 7
8 1.344 1.646 2.180 2.733 3.490 13.362 15.507 17.535 20.090 21.955 8
9 1.735 2.088 2.700 3.325 4.168 14.684 16.919 19.023 21.666 23.589 9
10 2.156 2.558 3.247 3.940 4.865 15.987 18.307 20.483 23.209 25.188 10
11 2.603 3.053 3.816 4.575 5.578 17.275 19.675 21.920 24.725 26.757 11
12 3.074 3.571 4.404 5.226 6.304 18.549 21.026 23.337 26.217 28.300 12
13 3.565 4.107 5.009 5.892 7.042 19.812 22.362 24.736 27.688 29.819 13
14 4.075 4.660 5.629 6.571 7.790 21.064 23.685 26.119 29.141 31.319 14
15 4.601 5.229 6.262 7.261 8.547 22.307 24.996 27.488 30.578 32.801 15
16 5.142 5.812 6.908 7.962 9.312 23.542 26.296 28.845 32.000 34.267 16
17 5.697 6.408 7.564 8.672 10.085 24.769 27.587 30.191 33.409 35.718 17
18 6.265 7.015 8.231 9.390 10.865 25.989 28.869 31.526 34.805 37.156 18
19 6.844 7.633 8.907 10.117 11.651 27.204 30.144 32.852 36.191 38.582 19
20 7.434 8.260 9.591 10.851 12.443 28.412 31.410 34.170 37.566 39.997 20
21 8.034 8.897 10.283 11.591 13.240 29.615 32.671 35.479 38.932 41.401 21
22 8.643 9.542 10.982 12.338 14.041 30.813 33.924 36.781 40.289 42.796 22
23 9.260 10.196 11.689 13.091 14.848 32.007 35.172 38.076 41.638 44.181 23
24 9.886 10.856 12.401 13.848 15.659 33.196 36.415 39.364 42.980 45.559 24
25 10.520 11.524 13.120 14.611 16.473 34.382 37.652 40.646 44.314 46.928 25
26 11.160 12.198 13.844 15.379 17.292 35.563 38.885 41.923 45.642 48.290 26
27 11.808 12.879 14.573 16.151 18.114 36.741 40.113 43.195 46.963 49.645 27
28 12.461 13.565 15.308 16.928 18.939 37.916 41.337 44.461 48.278 50.993 28
29 13.121 14.256 16.047 17.708 19.768 39.087 42.557 45.722 49.588 52.336 29
30 13.787 14.953 16.791 18.493 20.599 40.256 43.773 46.979 50.892 53.672 30
40 20.707 22.164 24.433 26.509 29.051 51.805 55.758 59.342 63.691 66.766 40
50 27.991 29.707 32.357 34.764 37.689 63.167 67.505 71.420 76.154 79.490 50
60 35.534 37.485 40.482 43.188 46.459 74.397 79.082 83.298 88.379 91.952 60
70 43.275 45.442 48.758 51.739 55.329 85.527 90.531 95.023 100.425 104.215 70
80 51.172 53.540 57.153 60.391 64.278 96.578 101.879 106.629 112.329 116.321 80
90 59.196 61.754 65.647 69.126 73.291 107.565 113.145 118.136 124.116 128.299 90
100 67.328 70.065 74.222 77.929 82.358 118.498 124.342 129.561 135.807 140.169 100
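Each entry of the chi-square table above is the value exceeded with probability α for the given degrees of freedom. For even degrees of freedom this upper-tail probability has a closed form, so a few entries can be checked without special software. The sketch below is restricted to even df and uses an illustrative function name; it is not a general-purpose routine.

```python
import math


def chi2_upper_tail_even_df(x: float, df: int) -> float:
    """P(chi-square with df degrees of freedom > x), valid only for even df.

    Uses the identity P = exp(-x/2) * sum_{k=0}^{df/2 - 1} (x/2)**k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles positive even df only")
    half = x / 2.0
    total = sum(half ** k / math.factorial(k) for k in range(df // 2))
    return math.exp(-half) * total


# Tabulated critical values for alpha = 0.050 at df = 2 and df = 4.
print(chi2_upper_tail_even_df(5.991, 2))
print(chi2_upper_tail_even_df(9.488, 4))
```

Both calls should return values close to 0.05, matching the α = 0.050 column.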
[Figure: upper-tail area α to the right of the critical value F(m,n),α]
n \ m 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120
1 4052.181 4999.500 5403.352 5624.583 5763.650 5858.986 5928.356 5981.070 6022.473 6055.847 6106.321 6157.285 6208.730 6234.631 6260.649 6286.782 6313.030 6339.391
2 98.503 99.000 99.166 99.249 99.299 99.333 99.356 99.374 99.388 99.399 99.416 99.433 99.449 99.458 99.466 99.474 99.482 99.491
3 34.116 30.817 29.457 28.710 28.237 27.911 27.672 27.489 27.345 27.229 27.052 26.872 26.690 26.598 26.505 26.411 26.316 26.221
4 21.198 18.000 16.694 15.977 15.522 15.207 14.976 14.799 14.659 14.546 14.374 14.198 14.020 13.929 13.838 13.745 13.652 13.558
5 16.258 13.274 12.060 11.392 10.967 10.672 10.456 10.289 10.158 10.051 9.888 9.722 9.553 9.466 9.379 9.291 9.202 9.112
6 13.745 10.925 9.780 9.148 8.746 8.466 8.260 8.102 7.976 7.874 7.718 7.559 7.396 7.313 7.229 7.143 7.057 6.969
7 12.246 9.547 8.451 7.847 7.460 7.191 6.993 6.840 6.719 6.620 6.469 6.314 6.155 6.074 5.992 5.908 5.824 5.737
8 11.259 8.649 7.591 7.006 6.632 6.371 6.178 6.029 5.911 5.814 5.667 5.515 5.359 5.279 5.198 5.116 5.032 4.946
9 10.561 8.022 6.992 6.422 6.057 5.802 5.613 5.467 5.351 5.257 5.111 4.962 4.808 4.729 4.649 4.567 4.483 4.398
10 10.044 7.559 6.552 5.994 5.636 5.386 5.200 5.057 4.942 4.849 4.706 4.558 4.405 4.327 4.247 4.165 4.082 3.996
11 9.646 7.206 6.217 5.668 5.316 5.069 4.886 4.744 4.632 4.539 4.397 4.251 4.099 4.021 3.941 3.860 3.776 3.690
12 9.330 6.927 5.953 5.412 5.064 4.821 4.640 4.499 4.388 4.296 4.155 4.010 3.858 3.780 3.701 3.619 3.535 3.449
13 9.074 6.701 5.739 5.205 4.862 4.620 4.441 4.302 4.191 4.100 3.960 3.815 3.665 3.587 3.507 3.425 3.341 3.255
14 8.862 6.515 5.564 5.035 4.695 4.456 4.278 4.140 4.030 3.939 3.800 3.656 3.505 3.427 3.348 3.266 3.181 3.094
15 8.683 6.359 5.417 4.893 4.556 4.318 4.142 4.004 3.895 3.805 3.666 3.522 3.372 3.294 3.214 3.132 3.047 2.959
16 8.531 6.226 5.292 4.773 4.437 4.202 4.026 3.890 3.780 3.691 3.553 3.409 3.259 3.181 3.101 3.018 2.933 2.845
17 8.400 6.112 5.185 4.669 4.336 4.102 3.927 3.791 3.682 3.593 3.455 3.312 3.162 3.084 3.003 2.920 2.835 2.746
18 8.285 6.013 5.092 4.579 4.248 4.015 3.841 3.705 3.597 3.508 3.371 3.227 3.077 2.999 2.919 2.835 2.749 2.660
19 8.185 5.926 5.010 4.500 4.171 3.939 3.765 3.631 3.523 3.434 3.297 3.153 3.003 2.925 2.844 2.761 2.674 2.584
20 8.096 5.849 4.938 4.431 4.103 3.871 3.699 3.564 3.457 3.368 3.231 3.088 2.938 2.859 2.778 2.695 2.608 2.517
21 8.017 5.780 4.874 4.369 4.042 3.812 3.640 3.506 3.398 3.310 3.173 3.030 2.880 2.801 2.720 2.636 2.548 2.457
22 7.945 5.719 4.817 4.313 3.988 3.758 3.587 3.453 3.346 3.258 3.121 2.978 2.827 2.749 2.667 2.583 2.495 2.403
23 7.881 5.664 4.765 4.264 3.939 3.710 3.539 3.406 3.299 3.211 3.074 2.931 2.781 2.702 2.620 2.535 2.447 2.354
24 7.823 5.614 4.718 4.218 3.895 3.667 3.496 3.363 3.256 3.168 3.032 2.889 2.738 2.659 2.577 2.492 2.403 2.310
25 7.770 5.568 4.675 4.177 3.855 3.627 3.457 3.324 3.217 3.129 2.993 2.850 2.699 2.620 2.538 2.453 2.364 2.270
26 7.721 5.526 4.637 4.140 3.818 3.591 3.421 3.288 3.182 3.094 2.958 2.815 2.664 2.585 2.503 2.417 2.327 2.233
27 7.677 5.488 4.601 4.106 3.785 3.558 3.388 3.256 3.149 3.062 2.926 2.783 2.632 2.552 2.470 2.384 2.294 2.198
28 7.636 5.453 4.568 4.074 3.754 3.528 3.358 3.226 3.120 3.032 2.896 2.753 2.602 2.522 2.440 2.354 2.263 2.167
29 7.598 5.420 4.538 4.045 3.725 3.499 3.330 3.198 3.092 3.005 2.868 2.726 2.574 2.495 2.412 2.325 2.234 2.138
30 7.562 5.390 4.510 4.018 3.699 3.473 3.304 3.173 3.067 2.979 2.843 2.700 2.549 2.469 2.386 2.299 2.208 2.111
40 7.314 5.179 4.313 3.828 3.514 3.291 3.124 2.993 2.888 2.801 2.665 2.522 2.369 2.288 2.203 2.114 2.019 1.917
60 7.077 4.977 4.126 3.649 3.339 3.119 2.953 2.823 2.718 2.632 2.496 2.352 2.198 2.115 2.028 1.936 1.836 1.726
120 6.851 4.787 3.949 3.480 3.174 2.956 2.792 2.663 2.559 2.472 2.336 2.192 2.035 1.950 1.860 1.763 1.656 1.533
n \ m 1 2 3 4 5 6 7 8 9 10 12 15 20 24 30 40 60 120
1 161.448 199.500 215.707 224.583 230.162 233.986 236.768 238.883 240.543 241.882 243.906 245.950 248.013 249.052 250.095 251.143 252.196 253.253
2 18.513 19.000 19.164 19.247 19.296 19.330 19.353 19.371 19.385 19.396 19.413 19.429 19.446 19.454 19.462 19.471 19.479 19.487
3 10.128 9.552 9.277 9.117 9.013 8.941 8.887 8.845 8.812 8.786 8.745 8.703 8.660 8.639 8.617 8.594 8.572 8.549
4 7.709 6.944 6.591 6.388 6.256 6.163 6.094 6.041 5.999 5.964 5.912 5.858 5.803 5.774 5.746 5.717 5.688 5.658
5 6.608 5.786 5.409 5.192 5.050 4.950 4.876 4.818 4.772 4.735 4.678 4.619 4.558 4.527 4.496 4.464 4.431 4.398
6 5.987 5.143 4.757 4.534 4.387 4.284 4.207 4.147 4.099 4.060 4.000 3.938 3.874 3.841 3.808 3.774 3.740 3.705
7 5.591 4.737 4.347 4.120 3.972 3.866 3.787 3.726 3.677 3.637 3.575 3.511 3.445 3.410 3.376 3.340 3.304 3.267
8 5.318 4.459 4.066 3.838 3.687 3.581 3.500 3.438 3.388 3.347 3.284 3.218 3.150 3.115 3.079 3.043 3.005 2.967
9 5.117 4.256 3.863 3.633 3.482 3.374 3.293 3.230 3.179 3.137 3.073 3.006 2.936 2.900 2.864 2.826 2.787 2.748
10 4.965 4.103 3.708 3.478 3.326 3.217 3.135 3.072 3.020 2.978 2.913 2.845 2.774 2.737 2.700 2.661 2.621 2.580
11 4.844 3.982 3.587 3.357 3.204 3.095 3.012 2.948 2.896 2.854 2.788 2.719 2.646 2.609 2.570 2.531 2.490 2.448
12 4.747 3.885 3.490 3.259 3.106 2.996 2.913 2.849 2.796 2.753 2.687 2.617 2.544 2.505 2.466 2.426 2.384 2.341
13 4.667 3.806 3.411 3.179 3.025 2.915 2.832 2.767 2.714 2.671 2.604 2.533 2.459 2.420 2.380 2.339 2.297 2.252
14 4.600 3.739 3.344 3.112 2.958 2.848 2.764 2.699 2.646 2.602 2.534 2.463 2.388 2.349 2.308 2.266 2.223 2.178
15 4.543 3.682 3.287 3.056 2.901 2.790 2.707 2.641 2.588 2.544 2.475 2.403 2.328 2.288 2.247 2.204 2.160 2.114
16 4.494 3.634 3.239 3.007 2.852 2.741 2.657 2.591 2.538 2.494 2.425 2.352 2.276 2.235 2.194 2.151 2.106 2.059
17 4.451 3.592 3.197 2.965 2.810 2.699 2.614 2.548 2.494 2.450 2.381 2.308 2.230 2.190 2.148 2.104 2.058 2.011
18 4.414 3.555 3.160 2.928 2.773 2.661 2.577 2.510 2.456 2.412 2.342 2.269 2.191 2.150 2.107 2.063 2.017 1.968
19 4.381 3.522 3.127 2.895 2.740 2.628 2.544 2.477 2.423 2.378 2.308 2.234 2.155 2.114 2.071 2.026 1.980 1.930
20 4.351 3.493 3.098 2.866 2.711 2.599 2.514 2.447 2.393 2.348 2.278 2.203 2.124 2.082 2.039 1.994 1.946 1.896
21 4.325 3.467 3.072 2.840 2.685 2.573 2.488 2.420 2.366 2.321 2.250 2.176 2.096 2.054 2.010 1.965 1.916 1.866
22 4.301 3.443 3.049 2.817 2.661 2.549 2.464 2.397 2.342 2.297 2.226 2.151 2.071 2.028 1.984 1.938 1.889 1.838
23 4.279 3.422 3.028 2.796 2.640 2.528 2.442 2.375 2.320 2.275 2.204 2.128 2.048 2.005 1.961 1.914 1.865 1.813
24 4.260 3.403 3.009 2.776 2.621 2.508 2.423 2.355 2.300 2.255 2.183 2.108 2.027 1.984 1.939 1.892 1.842 1.790
25 4.242 3.385 2.991 2.759 2.603 2.490 2.405 2.337 2.282 2.236 2.165 2.089 2.007 1.964 1.919 1.872 1.822 1.768
26 4.225 3.369 2.975 2.743 2.587 2.474 2.388 2.321 2.265 2.220 2.148 2.072 1.990 1.946 1.901 1.853 1.803 1.749
27 4.210 3.354 2.960 2.728 2.572 2.459 2.373 2.305 2.250 2.204 2.132 2.056 1.974 1.930 1.884 1.836 1.785 1.731
28 4.196 3.340 2.947 2.714 2.558 2.445 2.359 2.291 2.236 2.190 2.118 2.041 1.959 1.915 1.869 1.820 1.769 1.714
29 4.183 3.328 2.934 2.701 2.545 2.432 2.346 2.278 2.223 2.177 2.104 2.027 1.945 1.901 1.854 1.806 1.754 1.698
30 4.171 3.316 2.922 2.690 2.534 2.421 2.334 2.266 2.211 2.165 2.092 2.015 1.932 1.887 1.841 1.792 1.740 1.683
40 4.085 3.232 2.839 2.606 2.449 2.336 2.249 2.180 2.124 2.077 2.003 1.924 1.839 1.793 1.744 1.693 1.637 1.577
60 4.001 3.150 2.758 2.525 2.368 2.254 2.167 2.097 2.040 1.993 1.917 1.836 1.748 1.700 1.649 1.594 1.534 1.467
120 3.920 3.072 2.680 2.447 2.290 2.175 2.087 2.016 1.959 1.910 1.834 1.750 1.659 1.608 1.554 1.495 1.429 1.352
Exponential Function Table (Values of e^(-m))
m 0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 1.0000 0.9900 0.9802 0.9704 0.9608 0.9512 0.9418 0.9324 0.9231 0.9139
0.1 0.9048 0.8958 0.8869 0.8781 0.8694 0.8607 0.8521 0.8437 0.8353 0.8270
0.2 0.8187 0.8106 0.8025 0.7945 0.7866 0.7788 0.7711 0.7634 0.7558 0.7483
0.3 0.7408 0.7334 0.7261 0.7189 0.7118 0.7047 0.6977 0.6907 0.6839 0.6771
0.4 0.6703 0.6637 0.6570 0.6505 0.6440 0.6376 0.6313 0.6250 0.6188 0.6126
0.5 0.6065 0.6005 0.5945 0.5886 0.5827 0.5769 0.5712 0.5655 0.5599 0.5543
0.6 0.5488 0.5434 0.5379 0.5326 0.5273 0.5220 0.5169 0.5117 0.5066 0.5016
0.7 0.4966 0.4916 0.4868 0.4819 0.4771 0.4724 0.4677 0.4630 0.4584 0.4538
0.8 0.4493 0.4449 0.4404 0.4360 0.4317 0.4274 0.4232 0.4190 0.4148 0.4107
0.9 0.4066 0.4025 0.3985 0.3946 0.3906 0.3867 0.3829 0.3791 0.3753 0.3716
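The exponential table above can be reproduced directly. This is a minimal sketch, assuming each entry is e raised to the power -m, rounded to four decimals, where m is the row value plus the column value; the helper name `exp_neg` is illustrative.

```python
import math


def exp_neg(m: float) -> float:
    """Return e**(-m) rounded to four decimals, as printed in the table."""
    return round(math.exp(-m), 4)


# Row 0.5, column 0.00 and row 0.9, column 0.09.
print(exp_neg(0.50))
print(exp_neg(0.99))
```

For values of m beyond the range of the table, the same function applies without any interpolation.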
Statistics – Class XII
List of Authors and Reviewers
Domain Experts

Dr. G. Gopal
Professor & Head (Retd.), Dept. of Statistics,
University of Madras, Chennai.

Dr. G. Stephen Vincent
Associate Professor & Head (Retd.), Dept. of Statistics,
St. Joseph's College, Trichy.

Dr. R. Ravanan
Principal, Presidency College, Chennai.

Dr. K. Senthamarai Kannan
Professor, Dept. of Statistics, Manonmaniam
Sundaranar University, Tirunelveli.

Dr. A. Loganathan
Professor, Dept. of Statistics, Manonmaniam
Sundaranar University, Tirunelveli.

Dr. R. Kannan
Professor, Dept. of Statistics,
Annamalai University, Chidambaram.

Dr. N. Viswanathan
Associate Professor, Dept. of Statistics,
Presidency College, Chennai.

Dr. R.K. Radha
Assistant Professor, Dept. of Statistics,
Presidency College, Chennai.

Reviewers

Dr. M.R. Srinivasan
Professor & Head, Dept. of Statistics,
University of Madras, Chennai.

Dr. P. Dhanavanthan
Professor and Dean, Dept. of Statistics,
Pondicherry University, Pondicherry.

Content Writers

G. Gnanasundaram
HM (Retd.), SSV HSS, Parktown, Chennai.

P. Rengarajan
PG Asst., (Retd.), Thiyagarajar HSS, Madurai.

S. John Kennadi
PG Asst., St. Xavier's HSS, Purathakudi, Trichy.

AL.Nagammai
PG Asst., Sevasangam GHSS, Trichy.

M. Rama Lakshmi
PG Asst., Suguni Bai Sanathana Dharma GHSS, Chennai.

Maala Bhaskaran
PG Asst., GGHSS, Nandhivaram, Kanchipuram.

M. Boobalan
PG Asst., Zamindar HSS, Thuraiyur, Trichy.

R. Avoodaiappan
PG Asst., GGHSS, Ashok Pillar, Chennai.

K. Chitra
PG Asst., Tarapore and Loganathan GHSS, Chennai.

Art and Design Team