Statistics for Management SFM - BA4101 - Notes by JeppiaarEC
STUDY MATERIAL
Faculty In charge
Dr. P. SIVAGAMI
https://play.google.com/store/apps/details?id=info.therithal.brainkart.annauniversitynotes&hl=en_IN
www.BrainKart.com
TABLE OF CONTENTS
PEO's/PO's
STUDY MATERIAL
UNIT I
UNIT II
UNIT III
UNIT IV
UNIT V
PART (10 questions from the unit) from the question bank (Unit 1 to Unit 3)
VISION
To build Jeppiaar Engineering College as an institution of academic excellence in technology and
management education, leading to become a world class university.
MISSION
• To excel in teaching and learning, research and innovation by promoting the principles of
scientific analysis and creative thinking.
• To participate in the production, development and dissemination of knowledge and
interact with national and international communities.
• To equip students with values, ethics and life skills needed to enrich their lives and enable
them to contribute for the progress of society.
• To prepare students for higher studies and lifelong learning, enrich them with the practical
skills necessary to excel as future professionals and entrepreneurs for the benefit of
Nation’s economy.
COURSE OBJECTIVE:
To learn the applications of statistics in business decision making.
COURSE OUTCOMES:
C101.1: Facilitate objective solutions in business decision making.
C101.2: Understand and solve business problems.
C101.3: Apply statistical techniques to data sets, and correctly interpret the results.
C101.4: Develop skill-set that is in demand in both the research and business environments.
C101.5: Enable the students to apply the statistical techniques in a work setting.
CO-PO Matrix
CO1 3 3 3 0 0 2
CO2 3 3 3 0 0 2
CO3 3 3 3 0 0 2
CO4 3 3 3 0 0 2
CO5 3 3 3 0 3 2
Average 3 3 3 0 3 2
Syllabus
UNIT I INTRODUCTION
Basic definitions and rules for probability, conditional probability, independence of events,
Bayes' theorem, and random variables. Probability distributions: Binomial, Poisson, Uniform
and Normal distributions.
UNIT II SAMPLING DISTRIBUTION AND ESTIMATION
Introduction to sampling distributions, sampling distribution of mean and proportion, application
of central limit theorem, sampling techniques. Estimation: Point and Interval estimates for
population parameters of large sample and small samples, determining the sample size.
UNIT III TESTING OF HYPOTHESIS – PARAMETRIC TESTS
Hypothesis testing: one sample and two sample tests for means and proportions of large samples
(z-test), one sample and two sample tests for means of small samples (t-test), F-test for two
sample standard deviations. ANOVA one and two way.
UNIT IV NON-PARAMETRIC TESTS
Chi-square test for single sample standard deviation. Chi-square tests for independence of
attributes and goodness of fit. Sign test for paired data. Rank sum test. Kolmogorov-Smirnov
test for goodness of fit, comparing two populations. Mann-Whitney U test and Kruskal-Wallis
test. One sample run test.
UNIT V CORRELATION AND REGRESSION
Correlation – Coefficient of Determination – Rank Correlation – Regression – Estimation of
Regression line – Method of Least Squares – Standard Error of estimate.
REFERENCES:
1. Richard I. Levin, David S. Rubin, Masood H.Siddiqui, Sanjay Rastogi, Statistics for
Management, Pearson Education, 8th Edition, 2017.
2. Prem. S. Mann, Introductory Statistics, Wiley Publications, 9th Edition, 2015.
3. T N Srivastava and Shailaja Rego, Statistics for Management, Tata McGraw Hill, 3rd Edition
2017.
4. Ken Black, Applied Business Statistics, 7th Edition, Wiley India Edition, 2012.
5. David R. Anderson, Dennis J. Sweeney, Thomas A.Williams, Jeffrey D.Camm, James
J.Cochran, Statistics for business and economics, 13th edition, Thomson (South – Western) Asia,
Singapore, 2016.
6. N. D. Vohra, Business Statistics, Tata McGraw Hill, 2017.
UNIT I INTRODUCTION
Basic definitions and rules for probability, conditional probability, independence of
events, Bayes' theorem, and random variables. Probability distributions: Binomial,
Poisson, Uniform and Normal distributions.
Random experiment: An experiment whose possible outcomes are all known in advance, but
whose outcome in any particular trial cannot be predicted.
Example: (i) A fair coin is tossed. (ii) A die is rolled. Both are random experiments,
since we cannot predict the outcome of the experiment in any trial.
Probability:
Let A be an event and S its sample space. The probability of occurrence of A is defined as
$P(A) = \frac{\text{No. of favourable cases}}{\text{Total no. of exhaustive cases}}$.
Axioms of Probability:
(i) $0 \le P(E) \le 1$ (ii) $P(S) = 1$ (iii) $P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i)$ if the $E_i$'s are mutually exclusive
events.
Mutually exclusive:
Two events are said to be mutually exclusive if the occurrence of one of them
excludes the occurrence of the other in a single experiment.
Example: In a single toss of a coin, getting a head and getting a tail are mutually exclusive.
Independent events:
Two (or more) events are independent if the occurrence of one does not affect the
occurrence of the others.
Example: If a coin is tossed twice, the result of the second toss is not affected by the result
of the first toss.
Addition Law of Probability:
If A and B are two events in a sample space S, then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
Conditional Probability:
The conditional probability of an event B, assuming that the event A has happened,
is defined as $P(B \mid A) = \frac{P(A \cap B)}{P(A)}$, $P(A) \ne 0$.
Similarly, $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$, $P(B) \ne 0$.
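The addition law and the definition of conditional probability can be checked by direct counting. The sketch below is only an illustration: the two-dice sample space and the events A (first die even) and B (sum equals 8) are assumptions chosen for the example, not part of the notes.

```python
from fractions import Fraction
from itertools import product

# Illustrative sample space: all ordered outcomes of rolling two fair dice.
space = list(product(range(1, 7), repeat=2))

def prob(event):
    """Classical probability: favourable cases / exhaustive cases."""
    return Fraction(sum(1 for outcome in space if event(outcome)), len(space))

def is_a(outcome):
    # Hypothetical event A: the first die shows an even number.
    return outcome[0] % 2 == 0

def is_b(outcome):
    # Hypothetical event B: the two dice sum to 8.
    return sum(outcome) == 8

p_a = prob(is_a)
p_b = prob(is_b)
p_a_and_b = prob(lambda o: is_a(o) and is_b(o))
p_a_or_b = prob(lambda o: is_a(o) or is_b(o))

# Conditional probability P(B | A) = P(A and B) / P(A).
p_b_given_a = p_a_and_b / p_a

print("P(A or B) counted :", p_a_or_b)
print("P(A)+P(B)-P(A and B):", p_a + p_b - p_a_and_b)
print("P(B | A):", p_b_given_a)
```

Exact fractions are used so that both sides of the addition law can be compared without rounding.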
1. If A and B are independent events, then (a) $A$ and $\bar{B}$ and (b) $\bar{A}$ and $\bar{B}$ are also
independent.
Solution:
Since A and B are independent,
$P(A \cap B) = P(A) P(B)$ --- (1)
a) $P(A \cap \bar{B}) = P(A) - P(A \cap B)$
$= P(A) - P(A) P(B)$ [using (1)]
$= P(A) [1 - P(B)]$
$P(A \cap \bar{B}) = P(A) P(\bar{B})$
$\therefore A$ and $\bar{B}$ are independent events.
b) $P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B})$
$= 1 - P(A \cup B)$
$= 1 - [P(A) + P(B) - P(A \cap B)]$ [by addition theorem]
$= 1 - P(A) - P(B) + P(A \cap B)$
$= 1 - P(A) - P(B) + P(A) P(B)$ [using (1)]
$= [1 - P(A)] - P(B) [1 - P(A)]$
$= [1 - P(A)] [1 - P(B)]$
$P(\bar{A} \cap \bar{B}) = P(\bar{A}) P(\bar{B})$
$\therefore \bar{A}$ and $\bar{B}$ are independent events.
2. A problem in statistics is given to three students A, B and C, whose chances
of solving it are $\frac{1}{2}$, $\frac{1}{3}$ and $\frac{1}{4}$ respectively. What is the probability that the
problem will be solved?
Solution:
Let A, B, C denote the events that the problem is solved by the students A, B, C
respectively.
Then $P(A) = \frac{1}{2}$, $P(B) = \frac{1}{3}$, $P(C) = \frac{1}{4}$
$P(\bar{A}) = 1 - \frac{1}{2} = \frac{1}{2}$
$P(\bar{B}) = 1 - \frac{1}{3} = \frac{2}{3}$
$P(\bar{C}) = 1 - \frac{1}{4} = \frac{3}{4}$
P(none of the three students solves the problem) $= P(\bar{A}) P(\bar{B}) P(\bar{C}) = \frac{1}{2} \cdot \frac{2}{3} \cdot \frac{3}{4} = \frac{1}{4}$ (by independence)
P(the problem is solved by at least one student) $= P(A \cup B \cup C)$
$= 1 - P(\bar{A}) P(\bar{B}) P(\bar{C}) = 1 - \frac{1}{4} = \frac{3}{4}$
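The complement rule used above (probability that at least one of several independent events occurs) is easy to sketch in code. The helper name `prob_at_least_one` is illustrative, and the independence of the solvers is assumed, as in the worked solution.

```python
from fractions import Fraction
from math import prod

def prob_at_least_one(probs):
    """P(at least one event occurs) for independent events.

    Uses 1 - product of the complement probabilities.
    """
    return 1 - prod(1 - p for p in probs)

# Chances of the three students solving the problem: 1/2, 1/3, 1/4.
p_solved = prob_at_least_one([Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)])
print("P(problem is solved) =", p_solved)
```

The same helper applies to any "at least one hits the target" style question, provided the individual events are independent.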
3. Events A and B are such that $P(A + B) = \frac{3}{4}$, $P(AB) = \frac{1}{4}$ and $P(\bar{A}) = \frac{2}{3}$. Find
$P(B)$.
Solution:
Given $P(A + B) = \frac{3}{4}$, $P(AB) = \frac{1}{4}$, $P(\bar{A}) = \frac{2}{3}$
i.e. $P(A) = 1 - P(\bar{A}) = 1 - \frac{2}{3} = \frac{1}{3}$
By addition theorem
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
i.e. $P(B) = P(A \cup B) - P(A) + P(A \cap B)$
$P(B) = \frac{3}{4} - \frac{1}{3} + \frac{1}{4} = \frac{9 - 4 + 3}{12} = \frac{8}{12} = \frac{2}{3}$
4. An integer is chosen at random from the first two hundred positive integers. What is the
probability that the integer is divisible by 6 or 8?
Solution:
The sample space $S = \{1, 2, 3, \ldots, 199, 200\}$
$n(S) = 200$
Let the event A be that the integer chosen is divisible by 6,
i.e. $A = \{6, 12, 18, \ldots, 198\}$
$n(A) = \frac{198}{6} = 33$
$P(A) = \frac{n(A)}{n(S)} = \frac{33}{200}$
Let the event B be that the integer chosen is divisible by 8,
i.e. $B = \{8, 16, 24, \ldots, 200\}$
$n(B) = \frac{200}{8} = 25$
$P(B) = \frac{n(B)}{n(S)} = \frac{25}{200}$
The L.C.M. of 6 and 8 is 24.
Hence, a number that is divisible by both 6 and 8 is divisible by 24.
$A \cap B = \{24, 48, 72, \ldots, 192\}$
$n(A \cap B) = \frac{192}{24} = 8$
$P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{8}{200}$
Hence by the addition theorem on probability
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{33}{200} + \frac{25}{200} - \frac{8}{200} = \frac{58 - 8}{200} = \frac{50}{200} = \frac{1}{4}$
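The counting in this solution can be confirmed by brute force. This is a small sketch: it enumerates the integers 1 to 200 and compares a direct count of the union with the addition-theorem value; all variable names are illustrative.

```python
from fractions import Fraction

N = 200  # size of the sample space {1, 2, ..., 200}

div_by_6 = {k for k in range(1, N + 1) if k % 6 == 0}
div_by_8 = {k for k in range(1, N + 1) if k % 8 == 0}

# Direct count of integers divisible by 6 or 8.
p_union_counted = Fraction(len(div_by_6 | div_by_8), N)

# Addition theorem: P(A) + P(B) - P(A and B).
p_union_formula = (
    Fraction(len(div_by_6), N)
    + Fraction(len(div_by_8), N)
    - Fraction(len(div_by_6 & div_by_8), N)
)

print(len(div_by_6), len(div_by_8), len(div_by_6 & div_by_8))
print(p_union_counted, p_union_formula)
```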
5. A and B throw alternately with a pair of dice. A wins if he throws 6 before
B throws 7, and B wins if he throws 7 before A throws 6. If A begins, show that
A's chance of winning is $\frac{30}{61}$ (so the respective chances of A and B are in the ratio 30:31).
Solution:
Let $A_i$ denote the event of A's throwing 6 in the $i$th throw, $i = 1, 2, 3, \ldots$
'6' can be obtained with two dice in the following ways:
(1,5), (5,1), (2,4), (4,2), (3,3)
i.e. 5 distinct ways
$P(A_i) = \frac{5}{36}$, $P(\bar{A}_i) = 1 - P(A_i) = \frac{31}{36}$, $i = 1, 2, \ldots$
Let $B_i$ denote the event of B's throwing 7 in the $i$th throw, $i = 1, 2, 3, \ldots$
'7' can be obtained with two dice in the following ways:
(1,6), (6,1), (2,5), (5,2), (3,4), (4,3)
i.e. 6 distinct ways
$P(B_i) = \frac{6}{36}$, $P(\bar{B}_i) = 1 - P(B_i) = \frac{5}{6}$, $i = 1, 2, \ldots$
If A starts the game, he will win in the following mutually exclusive ways:
(i) $A_1$ happens (ii) $\bar{A}_1 \bar{B}_2 A_3$ happens
(iii) $\bar{A}_1 \bar{B}_2 \bar{A}_3 \bar{B}_4 A_5$ happens, and so on.
Hence by the addition theorem of probability, the required probability of A winning is
given by $P(A)$,
$P(A) = P(i) + P(ii) + P(iii) + \ldots$
$= P(A_1) + P(\bar{A}_1 \bar{B}_2 A_3) + P(\bar{A}_1 \bar{B}_2 \bar{A}_3 \bar{B}_4 A_5) + \ldots$
$= P(A_1) + P(\bar{A}_1) P(\bar{B}_2) P(A_3) + P(\bar{A}_1) P(\bar{B}_2) P(\bar{A}_3) P(\bar{B}_4) P(A_5) + \ldots$
The series is an infinite geometric series with sum $\frac{a}{1 - r}$, where $a = \frac{5}{36}$ and $r = \frac{31}{36} \cdot \frac{5}{6}$.
$P(A) = \frac{5/36}{1 - \frac{31}{36} \cdot \frac{5}{6}} = \frac{5/36}{61/216} = \frac{30}{61}$.
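The geometric-series argument can be checked both exactly and numerically. The sketch below assumes the same setup as the solution (A throws first, rounds are independent); the variable names are illustrative.

```python
from fractions import Fraction

p_six = Fraction(5, 36)    # P(a pair of dice totals 6)
p_seven = Fraction(6, 36)  # P(a pair of dice totals 7)

# First term and common ratio of the series P(A1) + P(A1' B2' A3) + ...
a = p_six
r = (1 - p_six) * (1 - p_seven)

# Closed form of the infinite geometric series a / (1 - r).
p_a_wins = a / (1 - r)
p_b_wins = 1 - p_a_wins

# Numerical cross-check: sum the first 200 terms in floating point.
partial_sum = sum(float(a) * float(r) ** k for k in range(200))

print("P(A wins) =", p_a_wins, " P(B wins) =", p_b_wins)
print("partial sum ~", partial_sum)
```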
7. The probability that a contractor will get a plumbing contract is $\frac{2}{3}$ and the
probability that he will not get an electric contract is $\frac{5}{9}$. If the probability of
getting at least one contract is $\frac{4}{5}$, what is the probability that he will get both?
Solution:
Let A be the event of getting a plumbing contract and B be the event of getting an electric
contract.
$P(A) = \frac{2}{3}$, $P(\bar{B}) = \frac{5}{9}$, $P(A \cup B) = \frac{4}{5}$
$P(B) = 1 - P(\bar{B}) = 1 - \frac{5}{9} = \frac{4}{9}$
By the addition theorem of probability
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cap B) = \frac{2}{3} + \frac{4}{9} - \frac{4}{5} = \frac{30 + 20 - 36}{45} = \frac{50 - 36}{45} = \frac{14}{45}$
i.e. the probability of getting both contracts is $\frac{14}{45}$.
8. Let $P(A \cup B) = \frac{5}{6}$, $P(A \cap B) = \frac{1}{3}$ and $P(\bar{B}) = \frac{1}{2}$. Are the events A and B
independent? Explain.
Solution:
$P(B) = 1 - P(\bar{B}) = \frac{1}{2}$
$P(A) = P(A \cup B) + P(A \cap B) - P(B)$
$= \frac{5}{6} + \frac{1}{3} - \frac{1}{2} = \frac{5 + 2 - 3}{6} = \frac{4}{6} = \frac{2}{3}$
$P(A) P(B) = \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}$
Since $P(A \cap B) = \frac{1}{3} = P(A) P(B)$,
A and B are independent.
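Bayes' theorem:
If $E_1, E_2, \ldots, E_n$ are mutually exclusive and exhaustive events with $P(E_i) \ne 0$, and A is any event with $P(A) \ne 0$, then
$P(E_i \mid A) = \frac{P(E_i) P(A \mid E_i)}{\sum_{i=1}^{n} P(E_i) P(A \mid E_i)}$, $i = 1, 2, \ldots, n$.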
9. If the probability that A solves a problem is $\frac{1}{2}$ and that for B is $\frac{3}{4}$, and if they
aim at solving a problem independently, what is the probability that the
problem is solved?
Solution:
The probability of A solving the problem is $P(A) = \frac{1}{2}$ and that of B is $P(B) = \frac{3}{4}$.
A and B are independent, so $P(A \cap B) = P(A) P(B) = \frac{1}{2} \cdot \frac{3}{4} = \frac{3}{8}$ --- (1)
Hence the probability that the problem is solved is
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{1}{2} + \frac{3}{4} - \frac{3}{8}$ [using (1)]
$= \frac{4 + 6 - 3}{8} = \frac{10 - 3}{8} = \frac{7}{8}$.
10. In a shooting test, the probability of hitting the target is $\frac{1}{2}$ for A, $\frac{2}{3}$ for B
and $\frac{3}{4}$ for C. If all of them fire at the target, find the probability that (i) none
of them hits the target and (ii) at least one of them hits the target.
Solution:
Given $P(A) = \frac{1}{2}$, $P(B) = \frac{2}{3}$, $P(C) = \frac{3}{4}$
$P(\bar{A}) = \frac{1}{2}$, $P(\bar{B}) = \frac{1}{3}$, $P(\bar{C}) = \frac{1}{4}$
(i) $P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\bar{A}) P(\bar{B}) P(\bar{C})$ (by independence) $= \frac{1}{2} \cdot \frac{1}{3} \cdot \frac{1}{4} = \frac{1}{24}$
(ii) P(at least one hits the target) $= 1 -$ P(none hits the target) $= 1 - \frac{1}{24} = \frac{23}{24}$
11. A bolt is manufactured by 3 machines A, B and C. A turns out twice as many
items as B, and machines B and C produce equal numbers of items. 2% of bolts
produced by A and B are defective and 4% of bolts produced by C are defective.
All bolts are put into one stock pile and one is chosen from this pile. What is the
probability that it is defective?
Solution:
Let A, B and C be the events that the item has been produced by machine A, B and C
respectively.
Let D be the event of the item being defective.
Given $P(A) = \frac{1}{2}$, $P(B) = P(C) = \frac{1}{4}$
$P(D \mid A)$ = P(an item is defective, given that A has produced it)
$= \frac{2}{100} = P(D \mid B)$
$P(D \mid C) = \frac{4}{100}$
By the theorem of total probability,
$P(D) = P(A) P(D \mid A) + P(B) P(D \mid B) + P(C) P(D \mid C)$
$= \frac{1}{2} \cdot \frac{2}{100} + \frac{1}{4} \cdot \frac{2}{100} + \frac{1}{4} \cdot \frac{4}{100} = \frac{1}{100} + \frac{1}{200} + \frac{1}{100}$
$= \frac{2 + 1 + 2}{200} = \frac{5}{200} = \frac{1}{40}$.
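The theorem of total probability used above amounts to a weighted average of the per-machine defect rates. A minimal sketch, with illustrative dictionary names, that derives the machine shares from the stated 2:1:1 output ratio:

```python
from fractions import Fraction

# Relative output of the machines: A makes twice as many items as B, and B equals C.
output_ratio = {"A": 2, "B": 1, "C": 1}
# Conditional defect rates P(D | machine).
defect_rate = {"A": Fraction(2, 100), "B": Fraction(2, 100), "C": Fraction(4, 100)}

total_output = sum(output_ratio.values())
# Prior probability that a randomly chosen bolt came from each machine.
prior = {m: Fraction(n, total_output) for m, n in output_ratio.items()}

# Total probability: P(D) = sum over machines of P(machine) * P(D | machine).
p_defective = sum(prior[m] * defect_rate[m] for m in prior)

print("priors:", prior)
print("P(D) =", p_defective)
```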
12. For a certain binary communication channel, the probability that a
transmitted '0' is received as a '0' is 0.95 and the probability that a transmitted
'1' is received as a '1' is 0.90. If the probability that a '0' is transmitted is 0.4, find the
probability that (i) a '1' is received and (ii) a '1' was transmitted given that a '1' was received.
Solution:
Let A be the event of transmitting '1',
$\bar{A}$ be the event of transmitting '0',
B be the event of receiving '1' and
$\bar{B}$ be the event of receiving '0'.
Given $P(\bar{A}) = 0.4$, $P(B \mid A) = 0.9$ and $P(\bar{B} \mid \bar{A}) = 0.95$
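The two quantities asked for in this problem follow from the theorem of total probability and Bayes' theorem. The sketch below works from the given values $P(\bar{A}) = 0.4$, $P(B \mid A) = 0.9$ and $P(\bar{B} \mid \bar{A}) = 0.95$; the variable names are illustrative, and exact fractions are used so the results are not affected by rounding.

```python
from fractions import Fraction

p_sent0 = Fraction(4, 10)                # P(A-bar): a '0' is transmitted
p_recv1_given_sent1 = Fraction(90, 100)  # P(B | A)
p_recv0_given_sent0 = Fraction(95, 100)  # P(B-bar | A-bar)

p_sent1 = 1 - p_sent0                          # P(A)
p_recv1_given_sent0 = 1 - p_recv0_given_sent0  # P(B | A-bar)

# (i) Total probability that a '1' is received.
p_recv1 = p_sent1 * p_recv1_given_sent1 + p_sent0 * p_recv1_given_sent0

# (ii) Bayes' theorem: P('1' was transmitted | '1' was received).
p_sent1_given_recv1 = p_sent1 * p_recv1_given_sent1 / p_recv1

print("P(1 received) =", p_recv1, "=", float(p_recv1))
print("P(1 sent | 1 received) =", p_sent1_given_recv1)
```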
ii. Continuous Variable:
A distribution function of a continuous random variable X is defined as
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(x)\,dx$.
Mathematical Expectation
The expected value of the random variable X is defined as follows.
i. If X is a discrete random variable, $E(X) = \sum_{i=1}^{\infty} x_i\, p(x_i)$, where $p(x)$ is the probability
function of x.
ii. If X is a continuous random variable, $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, where $f(x)$ is the probability
density function of x.
Properties of Expectation:
1. If C is a constant then $E(C) = C$
Proof:
Let X be a discrete random variable; then $E(x) = \sum x\,p(x)$
Now $E(C) = \sum C\,p(x)$
$= C \sum_{i=1}^{n} p(x_i)$
$= C$, since $\sum_{i=1}^{n} p_i = p_1 + p_2 + \ldots + p_n = 1$
2. If a, b are constants then $E(ax + b) = aE(x) + b$
Proof:
Let X be a discrete random variable; then $E(x) = \sum x\,p(x)$
Now $E(ax + b) = \sum (ax + b)\,p(x)$
$= \sum ax\,p(x) + \sum b\,p(x)$
$= a \sum x\,p(x) + b \sum p(x)$
$= aE(x) + b$, since $\sum_{i=1}^{n} p_i = p_1 + p_2 + \ldots + p_n = 1$
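3. If a and b are constants then $\mathrm{Var}(ax + b) = a^2\,\mathrm{Var}(x)$
Proof:
$\mathrm{Var}(ax + b) = E\left[(ax + b - E(ax + b))^2\right]$
$= E\left[(ax + b - aE(x) - b)^2\right]$
$= E\left[a^2 (x - E(x))^2\right]$
$= a^2 E\left[(x - E(x))^2\right]$
$= a^2\,\mathrm{Var}(x)$.
4. If a is a constant then $\mathrm{Var}(ax) = a^2\,\mathrm{Var}(x)$
Proof:
$\mathrm{Var}(ax) = E\left[(ax - aE(x))^2\right]$
$= E\left[a^2 (x - E(x))^2\right]$
$= a^2 E\left[(x - E(x))^2\right]$
$= a^2\,\mathrm{Var}(x)$.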
5. Prove that $\mathrm{Var}(x) = E(x^2) - [E(x)]^2$
Proof:
Let $\mu = E(x)$.
$\mathrm{Var}(x) = E\left[(x - E(x))^2\right]$
$= E\left[x^2 + (E(x))^2 - 2xE(x)\right]$
$= E\left[x^2 + \mu^2 - 2\mu x\right]$
$= E(x^2) + E(\mu^2) - E(2\mu x)$
$= E(x^2) + \mu^2 - 2\mu E(x)$
$= E(x^2) + \mu^2 - 2\mu^2$
$= E(x^2) - \mu^2$
$\mathrm{Var}(x) = E(x^2) - [E(x)]^2$
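Moment generating function:
The moment generating function (m.g.f.) of a random variable X is
defined as $M_X(t) = E(e^{tX}) = \sum_x e^{tx} p(x)$, if X is discrete, and
$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$, if X is continuous.
Properties:
1. $M_{cx}(t) = M_x(ct)$
Proof:
$M_{cx}(t) = E(e^{(cx)t})$
$= E(e^{x(ct)})$
$= M_x(ct)$
2. $M_{x+c}(t) = e^{ct} M_x(t)$
Proof:
$M_{x+c}(t) = E(e^{(x+c)t})$
$= E(e^{xt} e^{ct})$
$= e^{ct} M_x(t)$
3. $M_{ax+b}(t) = e^{bt} M_x(at)$
Proof:
$M_{ax+b}(t) = E(e^{(ax+b)t})$
$= E(e^{axt} e^{bt})$
$= e^{bt} E(e^{x(at)})$
$= e^{bt} M_x(at)$
4. If X and Y are independent random variables then $M_{x+y}(t) = M_x(t) \cdot M_y(t)$
Proof:
$M_{x+y}(t) = E(e^{(x+y)t})$
$= E(e^{xt} e^{yt})$
$= E(e^{xt})\,E(e^{yt})$ (by independence)
$M_{x+y}(t) = M_x(t)\,M_y(t)$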
Problem.1
If the probability distribution of X is given as
X    : 1    2    3    4
P(X) : 0.4  0.3  0.2  0.1
find $P\left(\frac{1}{2} < X < \frac{7}{2} \mid X > 1\right)$.
Solution:
$P\left(\frac{1}{2} < X < \frac{7}{2} \mid X > 1\right) = \frac{P\left(\left(\frac{1}{2} < X < \frac{7}{2}\right) \cap (X > 1)\right)}{P(X > 1)}$
$= \frac{P(X = 2 \text{ or } 3)}{P(X = 2, 3 \text{ or } 4)} = \frac{P(X = 2) + P(X = 3)}{P(X = 2) + P(X = 3) + P(X = 4)}$
$= \frac{0.3 + 0.2}{0.3 + 0.2 + 0.1} = \frac{0.5}{0.6} = \frac{5}{6}$.
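A conditional probability on a discrete distribution is just a ratio of two sums over the probability table. The helper below is a sketch (the name `conditional_prob` and the use of predicates are illustrative choices) applied to the table in this problem.

```python
from fractions import Fraction

def conditional_prob(pmf, event, given):
    """P(event | given) for a discrete pmf stored as {value: probability}."""
    p_given = sum(p for x, p in pmf.items() if given(x))
    p_joint = sum(p for x, p in pmf.items() if event(x) and given(x))
    return p_joint / p_given

# Probability table of Problem 1.
pmf = {1: Fraction(4, 10), 2: Fraction(3, 10), 3: Fraction(2, 10), 4: Fraction(1, 10)}

answer = conditional_prob(
    pmf,
    event=lambda x: Fraction(1, 2) < x < Fraction(7, 2),
    given=lambda x: x > 1,
)
print("P(1/2 < X < 7/2 | X > 1) =", answer)
```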
Problem.2
A random variable X has the following probability distribution
X    : -2   -1   0    1    2    3
P(X) : 0.1  K    0.2  2K   0.3  3K
a) Find K, b) Evaluate $P(X < 2)$ and $P(-2 < X < 2)$,
c) Find the cdf of X and d) Evaluate the mean of X.
Solution:
a) Since $\sum P(X) = 1$
$0.1 + K + 0.2 + 2K + 0.3 + 3K = 1$
$6K + 0.6 = 1$
$6K = 0.4$
$K = \frac{0.4}{6} = \frac{1}{15}$
b) $P(X < 2) = P(X = -2, -1, 0 \text{ or } 1)$
$= P(X = -2) + P(X = -1) + P(X = 0) + P(X = 1)$
$= \frac{1}{10} + \frac{1}{15} + \frac{1}{5} + \frac{2}{15} = \frac{3 + 2 + 6 + 4}{30} = \frac{15}{30} = \frac{1}{2}$
$P(-2 < X < 2) = P(X = -1, 0 \text{ or } 1)$
$= P(X = -1) + P(X = 0) + P(X = 1) = \frac{1}{15} + \frac{1}{5} + \frac{2}{15} = \frac{1 + 3 + 2}{15} = \frac{6}{15} = \frac{2}{5}$
c) The distribution function of X is given by $F(x) = P(X \le x)$:
X = x    P(X = x)    F(x) = P(X <= x)
-2       1/10        F(-2) = 1/10
-1       1/15        F(-1) = 1/6
0        2/10        F(0) = 11/30
1        2/15        F(1) = 1/2
2        3/10        F(2) = 4/5
3        3/15        F(3) = 1
d) The mean of X is defined by $E(X) = \sum x P(x)$
$E(X) = \left(-2 \times \frac{1}{10}\right) + \left(-1 \times \frac{1}{15}\right) + \left(0 \times \frac{1}{5}\right) + \left(1 \times \frac{2}{15}\right) + \left(2 \times \frac{3}{10}\right) + \left(3 \times \frac{1}{5}\right)$
$= -\frac{1}{5} - \frac{1}{15} + \frac{2}{15} + \frac{3}{5} + \frac{3}{5} = \frac{16}{15}$.
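The steps of this solution (normalise to find K, accumulate for the cdf, weight for the mean) translate directly into a few lines of code. In this sketch each probability is written as a constant part plus a multiple of K; that decomposition and the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate

values = [-2, -1, 0, 1, 2, 3]
# Each P(X = x) is const + coeff * K, read off the probability table.
const_part = [Fraction(1, 10), 0, Fraction(2, 10), 0, Fraction(3, 10), 0]
k_coeff = [0, 1, 0, 2, 0, 3]

# Total probability must be 1:  sum(const) + K * sum(coeff) = 1.
K = (1 - sum(const_part)) / sum(k_coeff)

pmf = [c + m * K for c, m in zip(const_part, k_coeff)]
cdf = list(accumulate(pmf))                      # F(x) = P(X <= x)
mean = sum(x * p for x, p in zip(values, pmf))   # E(X)

print("K =", K)
for x, p, f in zip(values, pmf, cdf):
    print(x, p, f)
print("E(X) =", mean)
```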
Problem.3
A random variable X has the following probability function:
X    : 0   1   2    3    4    5     6      7
P(X) : 0   K   2K   2K   3K   K^2   2K^2   7K^2 + K
Find (i) K, (ii) Evaluate $P(X < 6)$, $P(X \ge 6)$ and $P(0 < X < 5)$
(iii) Determine the distribution function of X.
(iv) $P(1.5 < X < 4.5 \mid X > 2)$
(v) $E(3x - 4)$, $\mathrm{Var}(3x - 4)$
(vi) The smallest value of n for which $P(X \le n) > \frac{1}{2}$.
Solution:
(i) Since $\sum_{x=0}^{7} P(X) = 1$,
$K + 2K + 2K + 3K + K^2 + 2K^2 + 7K^2 + K = 1$
$10K^2 + 9K - 1 = 0$
$K = \frac{1}{10}$ or $K = -1$
As $P(X)$ cannot be negative, $K = \frac{1}{10}$
(ii) $P(X < 6) = P(X = 0) + P(X = 1) + \ldots + P(X = 5)$
$= 0 + \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} + \frac{1}{100} = \frac{81}{100}$
Now $P(X \ge 6) = 1 - P(X < 6)$
$= 1 - \frac{81}{100} = \frac{19}{100}$
Now $P(0 < X < 5) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)$
$= K + 2K + 2K + 3K$
$= 8K = \frac{8}{10} = \frac{4}{5}$.
(iii) The distribution function of X is given by $F(x) = P(X \le x)$:
X = x    P(X = x)    F(x) = P(X <= x)
0        0           F(0) = 0
1        1/10        F(1) = 1/10
2        2/10        F(2) = 3/10
3        2/10        F(3) = 5/10
4        3/10        F(4) = 8/10
5        1/100       F(5) = 81/100
6        2/100       F(6) = 83/100
7        17/100      F(7) = 1
(iv) $P(1.5 < X < 4.5 \mid X > 2) = \frac{P(x = 3) + P(x = 4)}{1 - [P(x = 0) + P(x = 1) + P(x = 2)]} = \frac{5/10}{1 - 3/10} = \frac{5}{7}$
(v) $E(x) = \sum x\,p(x)$
$= 1 \times \frac{1}{10} + 2 \times \frac{2}{10} + 3 \times \frac{2}{10} + 4 \times \frac{3}{10} + 5 \times \frac{1}{100} + 6 \times \frac{2}{100} + 7 \times \frac{17}{100}$
$E(x) = 3.66$
$E(x^2) = \sum x^2 p(x)$
$= 1^2 \times \frac{1}{10} + 2^2 \times \frac{2}{10} + 3^2 \times \frac{2}{10} + 4^2 \times \frac{3}{10} + 5^2 \times \frac{1}{100} + 6^2 \times \frac{2}{100} + 7^2 \times \frac{17}{100}$
$E(x^2) = 16.8$
Mean $= E(x) = 3.66$
Variance $= E(x^2) - [E(x)]^2 = 16.8 - (3.66)^2 = 3.4044$
Hence $E(3x - 4) = 3E(x) - 4 = 6.98$ and $\mathrm{Var}(3x - 4) = 3^2\,\mathrm{Var}(x) = 30.64$ (approximately).
(vi) The smallest value of n for which $P(X \le n) > \frac{1}{2}$ is 4.
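Part (v) relies on the properties $E(ax + b) = aE(x) + b$ and $\mathrm{Var}(ax + b) = a^2\,\mathrm{Var}(x)$. The sketch below checks them on this distribution by building the distribution of $Y = 3X - 4$ explicitly; the helper names `mean` and `variance` are illustrative.

```python
from fractions import Fraction

K = Fraction(1, 10)
# Probability function of Problem 3 with K = 1/10.
pmf = {0: 0 * K, 1: K, 2: 2 * K, 3: 2 * K, 4: 3 * K,
       5: K**2, 6: 2 * K**2, 7: 7 * K**2 + K}

def mean(dist):
    return sum(x * p for x, p in dist.items())

def variance(dist):
    m = mean(dist)
    return sum((x - m) ** 2 * p for x, p in dist.items())

a, b = 3, -4
# Distribution of Y = aX + b: same probabilities, transformed values.
pmf_y = {a * x + b: p for x, p in pmf.items()}

print("E(X) =", float(mean(pmf)), " Var(X) =", float(variance(pmf)))
print("E(3X-4) =", float(mean(pmf_y)), " Var(3X-4) =", float(variance(pmf_y)))
```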
Problem.4
The probability mass function of a random variable X is defined as $P(X = 0) = 3C^2$,
$P(X = 1) = 4C - 10C^2$, $P(X = 2) = 5C - 1$, where $C > 0$, and $P(X = r) = 0$ if $r \ne 0, 1, 2$.
Find (i) The value of C.
(ii) $P(0 < X < 2 \mid X > 0)$.
(iii) The distribution function of X.
(iv) The largest value of x for which $F(x) < \frac{1}{2}$.
Solution:
(i) Since $\sum_{x=0}^{2} p(x) = 1$
$p(0) + p(1) + p(2) = 1$
$3C^2 + 4C - 10C^2 + 5C - 1 = 1$
$7C^2 - 9C + 2 = 0$
$C = 1, \frac{2}{7}$
$C = 1$ is not applicable (it makes $P(X = 1)$ negative), so
$C = \frac{2}{7}$
The probability distribution is
X    : 0      1      2
P(X) : 12/49  16/49  21/49
(ii) $P(0 < X < 2 \mid X > 0) = \frac{P((0 < X < 2) \cap (X > 0))}{P(X > 0)} = \frac{P(0 < X < 2)}{P(X > 0)}$
$= \frac{P(X = 1)}{P(X = 1) + P(X = 2)}$
$= \frac{16/49}{16/49 + 21/49} = \frac{16}{37}$
(iii) The distribution function of X is
X    F(x) = P(X <= x)
0    F(0) = P(X <= 0) = 12/49 = 0.24
1    F(1) = P(X <= 1) = P(X = 0) + P(X = 1) = 12/49 + 16/49 = 0.57
2    F(2) = P(X <= 2) = P(X = 0) + P(X = 1) + P(X = 2) = 12/49 + 16/49 + 21/49 = 1
(iv) The largest value of x for which $F(x) = P(X \le x) < \frac{1}{2}$ is 0.
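Problem.5
If $P(x) = \begin{cases} \frac{x}{15}, & x = 1, 2, 3, 4, 5 \\ 0, & \text{elsewhere} \end{cases}$
find (i) $P(X = 1 \text{ or } 2)$ and (ii) $P\left(\frac{1}{2} < X < \frac{5}{2} \mid X > 1\right)$.
Solution:
i) $P(X = 1 \text{ or } 2) = P(X = 1) + P(X = 2) = \frac{1}{15} + \frac{2}{15} = \frac{3}{15} = \frac{1}{5}$
ii) $P\left(\frac{1}{2} < X < \frac{5}{2} \mid X > 1\right) = \frac{P\left(\left(\frac{1}{2} < X < \frac{5}{2}\right) \cap (X > 1)\right)}{P(X > 1)} = \frac{P((X = 1 \text{ or } 2) \cap (X > 1))}{P(X > 1)}$
$= \frac{P(X = 2)}{1 - P(X = 1)}$
$= \frac{2/15}{1 - (1/15)} = \frac{2/15}{14/15} = \frac{2}{14} = \frac{1}{7}$.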
Problem.6
A continuous random variable X has a probability density function $f(x) = 3x^2$,
$0 \le x \le 1$. Find a such that $P(X \le a) = P(X > a)$.
Solution:
Since $P(X \le a) = P(X > a)$, each must be equal to $\frac{1}{2}$ because the total probability
is always 1.
$P(X \le a) = \frac{1}{2}$
$\int_0^a f(x)\,dx = \frac{1}{2}$
$\int_0^a 3x^2\,dx = \frac{1}{2} \Rightarrow 3\left[\frac{x^3}{3}\right]_0^a = \frac{1}{2} \Rightarrow a^3 = \frac{1}{2}$
$a = \left(\frac{1}{2}\right)^{1/3}$
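The value of a found here is the median of the distribution. When the cdf cannot be inverted by hand, the same equation $F(a) = \frac{1}{2}$ can be solved numerically; the bisection routine below is only a sketch, with the cdf $F(a) = a^3$ taken from the solution above and an illustrative function name.

```python
def cdf(a):
    """F(a) = integral of 3x^2 from 0 to a = a^3, valid for 0 <= a <= 1."""
    return a ** 3

def solve_median(lo=0.0, hi=1.0, tol=1e-12):
    """Bisection on the increasing function F to find a with F(a) = 1/2."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if cdf(mid) < 0.5:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

a = solve_median()
print("a ~", a)
print("closed form (1/2)^(1/3) =", 0.5 ** (1 / 3))
```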
Problem.7
A random variable X has the p.d.f. $f(x)$ given by $f(x) = \begin{cases} Cxe^{-x}, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases}$. Find the value
of C and the cumulative distribution function of X.
Solution:
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_0^{\infty} Cxe^{-x}\,dx = 1$
$C\left[x(-e^{-x}) - (e^{-x})\right]_0^{\infty} = 1$
$C = 1$
$f(x) = \begin{cases} xe^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases}$
The cumulative distribution function of X is
$F(x) = \int_0^x f(x)\,dx = \int_0^x xe^{-x}\,dx = \left[-xe^{-x} - e^{-x}\right]_0^x = -xe^{-x} - e^{-x} + 1$
$= 1 - (1 + x)e^{-x}$, $x > 0$.
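8. If a random variable X has the p.d.f. $f(x) = \begin{cases} \frac{1}{2}(x + 1), & -1 < x < 1 \\ 0, & \text{otherwise} \end{cases}$, find the mean
and variance of X.
Solution:
Mean $= \mu_1' = \int_{-1}^{1} x f(x)\,dx = \frac{1}{2}\int_{-1}^{1} x(x + 1)\,dx = \frac{1}{2}\int_{-1}^{1} (x^2 + x)\,dx = \frac{1}{2}\left[\frac{x^3}{3} + \frac{x^2}{2}\right]_{-1}^{1} = \frac{1}{3}$
$\mu_2' = \int_{-1}^{1} x^2 f(x)\,dx = \frac{1}{2}\int_{-1}^{1} (x^3 + x^2)\,dx = \frac{1}{2}\left[\frac{x^4}{4} + \frac{x^3}{3}\right]_{-1}^{1} = \frac{1}{2}\left[\frac{1}{4} + \frac{1}{3} - \frac{1}{4} + \frac{1}{3}\right] = \frac{1}{2} \cdot \frac{2}{3} = \frac{1}{3}$
Variance $= \mu_2' - (\mu_1')^2 = \frac{1}{3} - \frac{1}{9} = \frac{3 - 1}{9} = \frac{2}{9}$.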
9. A continuous random variable X that can assume any value between X = 2 and
X = 5 has a probability density function given by $f(x) = k(1 + x)$. Find $P(X < 4)$.
Solution:
Given X is a continuous random variable whose pdf is $f(x) = \begin{cases} k(1 + x), & 2 < x < 5 \\ 0, & \text{otherwise} \end{cases}$.
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, $\int_2^5 k(1 + x)\,dx = 1$
$k\left[\frac{(1 + x)^2}{2}\right]_2^5 = 1$
$k\left[\frac{(1 + 5)^2}{2} - \frac{(1 + 2)^2}{2}\right] = 1$
$k\left[18 - \frac{9}{2}\right] = 1$
$k \cdot \frac{27}{2} = 1 \Rightarrow k = \frac{2}{27}$
$f(x) = \begin{cases} \frac{2(1 + x)}{27}, & 2 < x < 5 \\ 0, & \text{otherwise} \end{cases}$
$P(X < 4) = \frac{2}{27}\int_2^4 (1 + x)\,dx$
$= \frac{2}{27}\left[\frac{(1 + x)^2}{2}\right]_2^4 = \frac{2}{27}\left[\frac{(1 + 4)^2}{2} - \frac{(1 + 2)^2}{2}\right] = \frac{2}{27}\left[\frac{25}{2} - \frac{9}{2}\right] = \frac{2}{27} \cdot \frac{16}{2} = \frac{16}{27}$.
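10. A random variable X has density function given by $f(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$. Find the
m.g.f.
Solution:
$M_X(t) = E(e^{tx}) = \int_0^{\infty} e^{tx} f(x)\,dx = \int_0^{\infty} e^{tx}\,2e^{-2x}\,dx$
$= 2\int_0^{\infty} e^{(t - 2)x}\,dx$
$= 2\left[\frac{e^{(t - 2)x}}{t - 2}\right]_0^{\infty} = \frac{2}{2 - t}$, $t < 2$.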
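11. The pdf of a random variable X is given by $f(x) = \begin{cases} 2x, & 0 \le x \le b \\ 0, & \text{otherwise} \end{cases}$. For what value
of b is $f(x)$ a valid pdf? Also find the cdf of the random variable X with the above
pdf.
Solution:
Given $f(x) = \begin{cases} 2x, & 0 \le x \le b \\ 0, & \text{otherwise} \end{cases}$
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$, $\int_0^b 2x\,dx = 1$
$2\left[\frac{x^2}{2}\right]_0^b = 1$
$b^2 - 0 = 1 \Rightarrow b = 1$
$f(x) = \begin{cases} 2x, & 0 \le x \le 1 \\ 0, & \text{otherwise} \end{cases}$
$F(x) = P(X \le x) = \int_0^x f(x)\,dx = \int_0^x 2x\,dx = 2\left[\frac{x^2}{2}\right]_0^x = x^2$, $0 \le x \le 1$
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(x)\,dx = \int_{-\infty}^{x} 0\,dx = 0$, $x < 0$
$F(x) = P(X \le x) = \int_{-\infty}^{0} f(x)\,dx + \int_0^1 f(x)\,dx + \int_1^x f(x)\,dx = \int_{-\infty}^{0} 0\,dx + \int_0^1 2x\,dx + \int_1^x 0\,dx = 2\left[\frac{x^2}{2}\right]_0^1 = 1$, $x > 1$
$F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$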
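12. A random variable X has density function $f(x) = \begin{cases} \frac{K}{1 + x^2}, & -\infty < x < \infty \\ 0, & \text{otherwise} \end{cases}$.
Determine K and the distribution function. Evaluate the probability $P(x \ge 0)$.
Solution:
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_{-\infty}^{\infty} \frac{K}{1 + x^2}\,dx = 1$
$K\int_{-\infty}^{\infty} \frac{dx}{1 + x^2} = 1$
$K\left[\tan^{-1} x\right]_{-\infty}^{\infty} = 1$
$K\left[\frac{\pi}{2} - \left(-\frac{\pi}{2}\right)\right] = 1$
$K\pi = 1$
$K = \frac{1}{\pi}$
$F(x) = \int_{-\infty}^{x} f(x)\,dx = \int_{-\infty}^{x} \frac{K}{1 + x^2}\,dx$
$= \frac{1}{\pi}\left[\tan^{-1} x - \left(-\frac{\pi}{2}\right)\right]$
$F(x) = \frac{1}{\pi}\left[\frac{\pi}{2} + \tan^{-1} x\right]$, $-\infty < x < \infty$
$P(X \ge 0) = \frac{1}{\pi}\int_0^{\infty} \frac{dx}{1 + x^2} = \frac{1}{\pi}\left[\tan^{-1} x\right]_0^{\infty}$
$= \frac{1}{\pi}\left[\frac{\pi}{2} - \tan^{-1} 0\right] = \frac{1}{2}$.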
13. If X has the probability density function $f(x) = \begin{cases} Ke^{-3x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$, find K,
$P(0.5 \le X \le 1)$ and the mean of X.
Solution:
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_0^{\infty} Ke^{-3x}\,dx = 1$
$K\left[\frac{e^{-3x}}{-3}\right]_0^{\infty} = 1$
$\frac{K}{3} = 1$
$K = 3$
$P(0.5 \le X \le 1) = \int_{0.5}^{1} f(x)\,dx = 3\int_{0.5}^{1} e^{-3x}\,dx = 3\left[\frac{e^{-3} - e^{-1.5}}{-3}\right] = e^{-1.5} - e^{-3}$
Mean of X $= E(x) = \int_0^{\infty} x f(x)\,dx = 3\int_0^{\infty} xe^{-3x}\,dx$
$= 3\left[x\left(\frac{-e^{-3x}}{3}\right) - 1 \cdot \left(\frac{e^{-3x}}{9}\right)\right]_0^{\infty} = \frac{3 \times 1}{9} = \frac{1}{3}$
Hence the mean of X is $E(X) = \frac{1}{3}$.
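14. If X is a continuous random variable with pdf given by
$f(x) = \begin{cases} Kx, & 0 \le x \le 2 \\ 2K, & 2 \le x \le 4 \\ 6K - Kx, & 4 \le x \le 6 \\ 0, & \text{elsewhere} \end{cases}$
find the value of K and also the cdf $F(x)$.
Solution:
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$
$\int_0^2 Kx\,dx + \int_2^4 2K\,dx + \int_4^6 (6K - Kx)\,dx = 1$
$K\left\{\left[\frac{x^2}{2}\right]_0^2 + \left[2x\right]_2^4 + \left[6x - \frac{x^2}{2}\right]_4^6\right\} = 1$
$K\left[2 + 8 - 4 + 36 - 18 - 24 + 8\right] = 1$
$8K = 1$
$K = \frac{1}{8}$
We know that $F(x) = \int_{-\infty}^{x} f(x)\,dx$
If $x < 0$, then $F(x) = \int_{-\infty}^{x} f(x)\,dx = 0$
If $x \in (0, 2)$, then $F(x) = \int_{-\infty}^{0} f(x)\,dx + \int_0^x f(x)\,dx$
$= \int_{-\infty}^{0} 0\,dx + \int_0^x Kx\,dx = \frac{1}{8}\int_0^x x\,dx$
$F(x) = \left[\frac{x^2}{16}\right]_0^x = \frac{x^2}{16}$, $0 \le x \le 2$
If $x \in (2, 4)$, then $F(x) = \int_{-\infty}^{0} f(x)\,dx + \int_0^2 f(x)\,dx + \int_2^x f(x)\,dx$
$= \int_{-\infty}^{0} 0\,dx + \int_0^2 Kx\,dx + \int_2^x 2K\,dx$
$= \int_0^2 \frac{x}{8}\,dx + \int_2^x \frac{1}{4}\,dx = \left[\frac{x^2}{16}\right]_0^2 + \left[\frac{x}{4}\right]_2^x$
$= \frac{1}{4} + \frac{x}{4} - \frac{1}{2}$
$F(x) = \frac{x}{4} - \frac{4}{16} = \frac{x - 1}{4}$, $2 \le x \le 4$
If $x \in (4, 6)$, then $F(x) = \int_{-\infty}^{0} 0\,dx + \int_0^2 Kx\,dx + \int_2^4 2K\,dx + \int_4^x K(6 - x)\,dx$
$= \int_0^2 \frac{x}{8}\,dx + \int_2^4 \frac{1}{4}\,dx + \int_4^x \frac{1}{8}(6 - x)\,dx$
$= \left[\frac{x^2}{16}\right]_0^2 + \left[\frac{x}{4}\right]_2^4 + \left[\frac{6x}{8} - \frac{x^2}{16}\right]_4^x$
$= \frac{1}{4} + 1 - \frac{1}{2} + \frac{6x}{8} - \frac{x^2}{16} - 3 + 1$
$= \frac{4 + 16 - 8 + 12x - x^2 - 48 + 16}{16}$
$F(x) = \frac{-x^2 + 12x - 20}{16}$, $4 \le x \le 6$
If $x > 6$, then $F(x) = \int_{-\infty}^{0} 0\,dx + \int_0^2 Kx\,dx + \int_2^4 2K\,dx + \int_4^6 K(6 - x)\,dx + \int_6^x 0\,dx$
$F(x) = 1$, $x \ge 6$
$F(x) = \begin{cases} 0, & x \le 0 \\ \frac{x^2}{16}, & 0 \le x \le 2 \\ \frac{1}{4}(x - 1), & 2 \le x \le 4 \\ -\frac{1}{16}(20 - 12x + x^2), & 4 \le x \le 6 \\ 1, & x \ge 6 \end{cases}$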
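A piecewise cdf like this one is easy to get wrong at the joins, so a numerical cross-check is worthwhile. The sketch below integrates the pdf with a simple midpoint rule (an illustrative choice, not the method of the notes) and compares the result with the closed-form cdf derived above.

```python
K = 1 / 8

def pdf(x):
    """Piecewise density of Problem 14 with K = 1/8."""
    if 0 <= x <= 2:
        return K * x
    if 2 < x <= 4:
        return 2 * K
    if 4 < x <= 6:
        return 6 * K - K * x
    return 0.0

def cdf(x):
    """Closed-form distribution function derived in the solution."""
    if x < 0:
        return 0.0
    if x <= 2:
        return x ** 2 / 16
    if x <= 4:
        return (x - 1) / 4
    if x <= 6:
        return (-x ** 2 + 12 * x - 20) / 16
    return 1.0

def integrate_pdf(upper, steps=100_000):
    """Midpoint-rule approximation of the integral of pdf from 0 to upper."""
    h = upper / steps
    return sum(pdf((i + 0.5) * h) for i in range(steps)) * h

for x in (1, 3, 5, 6):
    print(x, cdf(x), round(integrate_pdf(x), 6))
```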
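15. A random variable X has the p.d.f. $f(x) = \begin{cases} 2x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
Find (i) $P\left(X < \frac{1}{2}\right)$ (ii) $P\left(\frac{1}{4} < x < \frac{1}{2}\right)$ (iii) $P\left(X > \frac{3}{4} \mid X > \frac{1}{2}\right)$
Solution:
(i) $P\left(x < \frac{1}{2}\right) = \int_0^{1/2} f(x)\,dx = \int_0^{1/2} 2x\,dx = 2\left[\frac{x^2}{2}\right]_0^{1/2} = \frac{2 \times 1}{8} = \frac{1}{4}$
(ii) $P\left(\frac{1}{4} < x < \frac{1}{2}\right) = \int_{1/4}^{1/2} f(x)\,dx = \int_{1/4}^{1/2} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{1/4}^{1/2}$
$= 2\left[\frac{1}{8} - \frac{1}{32}\right] = \frac{1}{4} - \frac{1}{16} = \frac{3}{16}$.
(iii) $P\left(X > \frac{3}{4} \mid X > \frac{1}{2}\right) = \frac{P\left(X > \frac{3}{4} \cap X > \frac{1}{2}\right)}{P\left(X > \frac{1}{2}\right)} = \frac{P\left(X > \frac{3}{4}\right)}{P\left(X > \frac{1}{2}\right)}$
$P\left(X > \frac{3}{4}\right) = \int_{3/4}^{1} f(x)\,dx = \int_{3/4}^{1} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{3/4}^{1} = 1 - \frac{9}{16} = \frac{7}{16}$
$P\left(X > \frac{1}{2}\right) = \int_{1/2}^{1} f(x)\,dx = \int_{1/2}^{1} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{1/2}^{1} = 1 - \frac{1}{4} = \frac{3}{4}$
$P\left(X > \frac{3}{4} \mid X > \frac{1}{2}\right) = \frac{7/16}{3/4} = \frac{7}{16} \times \frac{4}{3} = \frac{7}{12}$.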
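16. Let the random variable X have the p.d.f. $f(x) = \begin{cases} \frac{1}{2}e^{-x/2}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$. Find the
moment generating function, mean and variance of X.
Solution:
$M_X(t) = E(e^{tx}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_0^{\infty} e^{tx}\,\frac{1}{2}e^{-x/2}\,dx$
$= \frac{1}{2}\int_0^{\infty} e^{-\left(\frac{1}{2} - t\right)x}\,dx = \frac{1}{2}\left[\frac{e^{-\left(\frac{1}{2} - t\right)x}}{-\left(\frac{1}{2} - t\right)}\right]_0^{\infty} = \frac{1}{1 - 2t}$, if $t < \frac{1}{2}$.
$E(X) = \left[\frac{d}{dt}M_X(t)\right]_{t=0} = \left[\frac{2}{(1 - 2t)^2}\right]_{t=0} = 2$
$E(X^2) = \left[\frac{d^2}{dt^2}M_X(t)\right]_{t=0} = \left[\frac{8}{(1 - 2t)^3}\right]_{t=0} = 8$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = 8 - 4 = 4$.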
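The moments obtained by differentiating the m.g.f. can be sanity-checked with finite differences. This is only a numerical sketch: the step size `h` and the central-difference formulas are illustrative choices, and the m.g.f. $\frac{1}{1 - 2t}$ is the one derived above.

```python
def mgf(t):
    """M_X(t) = 1 / (1 - 2t), valid for t < 1/2."""
    return 1.0 / (1.0 - 2.0 * t)

h = 1e-4  # small step for the finite-difference approximations

# Central differences approximate M'(0) = E(X) and M''(0) = E(X^2).
first_moment = (mgf(h) - mgf(-h)) / (2 * h)
second_moment = (mgf(h) - 2 * mgf(0.0) + mgf(-h)) / h ** 2
variance = second_moment - first_moment ** 2

print("E(X)   ~", first_moment)
print("E(X^2) ~", second_moment)
print("Var(X) ~", variance)
```

The printed values should be close to 2, 8 and 4, up to the discretisation error of the finite differences.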
17. The first four moments of a distribution about x = 4 are 1, 4, 10 and 45
respectively. Show that the mean is 5, variance is 3, $\mu_3 = 0$ and $\mu_4 = 26$.
Solution:
Given $\mu_1' = 1$, $\mu_2' = 4$, $\mu_3' = 10$, $\mu_4' = 45$
$\mu_r'$ = rth moment about the value x = 4
Here A = 4
Here Mean $= A + \mu_1' = 4 + 1 = 5$
Variance $= \mu_2 = \mu_2' - (\mu_1')^2$
$= 4 - 1 = 3$.
$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3$
$= 10 - 3(4)(1) + 2(1)^3 = 0$
$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4$
$= 45 - 4(10)(1) + 6(4)(1)^2 - 3(1)^4$
$\mu_4 = 26$.
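The conversion from moments about an arbitrary point A to the mean and central moments uses the same three formulas every time, so it can be wrapped in a small function. The function name and argument order below are illustrative.

```python
def central_from_raw(m1, m2, m3, m4, about):
    """Convert the first four moments about the point `about` into
    (mean, mu2, mu3, mu4), where mu_k are moments about the mean."""
    mean = about + m1
    mu2 = m2 - m1 ** 2
    mu3 = m3 - 3 * m2 * m1 + 2 * m1 ** 3
    mu4 = m4 - 4 * m3 * m1 + 6 * m2 * m1 ** 2 - 3 * m1 ** 4
    return mean, mu2, mu3, mu4

# Moments about x = 4 given in the problem: 1, 4, 10, 45.
result = central_from_raw(1, 4, 10, 45, about=4)
print("mean, variance, mu3, mu4 =", result)
```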
18. Find the moment generating function and rth moments for the distribution
whose p.d.f. is $f(x) = Ke^{-x}$, $0 \le x \le \infty$. Find also the standard deviation.
Solution:
Total probability = 1
$\int_0^{\infty} ke^{-x}\,dx = 1$
$k\left[\frac{e^{-x}}{-1}\right]_0^{\infty} = 1$
$k = 1$
$M_X(t) = E[e^{tx}] = \int_0^{\infty} e^{tx}e^{-x}\,dx = \int_0^{\infty} e^{(t - 1)x}\,dx$
$= \left[\frac{e^{(t - 1)x}}{t - 1}\right]_0^{\infty} = \frac{1}{1 - t}$, $t < 1$
$= (1 - t)^{-1} = 1 + t + t^2 + \ldots + t^r + \ldots$
$\mu_r' = $ coeff. of $\frac{t^r}{r!} = r!$
When $r = 1$, $\mu_1' = 1! = 1$
$r = 2$, $\mu_2' = 2! = 2$
Variance $= \mu_2' - (\mu_1')^2 = 2 - 1 = 1$
Standard deviation = 1.
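19. A continuous random variable X has the p.d.f. $f(x) = kx^2e^{-x}$, $x \ge 0$. Find the rth
moment of X about the origin. Hence find the mean and variance of X.
Solution:
Since $\int_0^{\infty} Kx^2e^{-x}\,dx = 1$
$K\left[x^2\left(\frac{e^{-x}}{-1}\right) - 2x\left(\frac{e^{-x}}{1}\right) + 2\left(\frac{e^{-x}}{-1}\right)\right]_0^{\infty} = 1$
$2K = 1$
$K = \frac{1}{2}$.
$\mu_r' = \int_0^{\infty} x^r f(x)\,dx$
$= \frac{1}{2}\int_0^{\infty} x^{r+2}e^{-x}\,dx$
$= \frac{1}{2}\int_0^{\infty} e^{-x}x^{(r+3)-1}\,dx = \frac{\Gamma(r + 3)}{2} = \frac{(r + 2)!}{2}$
Putting $r = 1$, $\mu_1' = \frac{3!}{2} = 3$
$r = 2$, $\mu_2' = \frac{4!}{2} = 12$
Mean $= \mu_1' = 3$
Variance $= \mu_2' - (\mu_1')^2$
i.e. $\mu_2 = 12 - (3)^2 = 12 - 9$
$\mu_2 = 3$.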
20. Find the moment generating function of the random variable X, with probability
density function $f(x) = \begin{cases} x, & 0 \le x \le 1 \\ 2 - x, & 1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$. Also find $\mu_1'$, $\mu_2'$.
Solution:
$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx$
$= \int_0^1 e^{tx}x\,dx + \int_1^2 e^{tx}(2 - x)\,dx$
$= \left[\frac{xe^{tx}}{t} - \frac{e^{tx}}{t^2}\right]_0^1 + \left[(2 - x)\frac{e^{tx}}{t} - (-1)\frac{e^{tx}}{t^2}\right]_1^2$
$= \frac{e^t}{t} - \frac{e^t}{t^2} + \frac{1}{t^2} + \frac{e^{2t}}{t^2} - \frac{e^t}{t} - \frac{e^t}{t^2}$
$= \left(\frac{e^t - 1}{t}\right)^2$
$= \left[\frac{1 + \frac{t}{1!} + \frac{t^2}{2!} + \frac{t^3}{3!} + \ldots - 1}{t}\right]^2$
$= \left[1 + \frac{t}{2!} + \frac{t^2}{3!} + \frac{t^3}{4!} + \ldots\right]^2$
$\mu_1' = $ coeff. of $\frac{t}{1!} = 1$
$\mu_2' = $ coeff. of $\frac{t^2}{2!} = \frac{7}{6}$.
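21. The p.d.f. of the r.v. X follows the probability law: $f(x) = \frac{1}{2\theta}e^{-\frac{|x - \theta|}{\theta}}$, $-\infty < x < \infty$.
Find the m.g.f. of X and also find $E(X)$ and $V(X)$.
Solution:
$M_X(t) = E(e^{tx}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{2\theta}e^{-\frac{|x - \theta|}{\theta}}e^{tx}\,dx$
$= \int_{-\infty}^{\theta} \frac{1}{2\theta}e^{\frac{(x - \theta)}{\theta}}e^{tx}\,dx + \int_{\theta}^{\infty} \frac{1}{2\theta}e^{-\frac{(x - \theta)}{\theta}}e^{tx}\,dx$
$M_X(t) = \frac{e^{-1}}{2\theta}\int_{-\infty}^{\theta} e^{x\left(t + \frac{1}{\theta}\right)}\,dx + \frac{e}{2\theta}\int_{\theta}^{\infty} e^{-x\left(\frac{1}{\theta} - t\right)}\,dx$
$= \frac{e^{-1}}{2\theta} \cdot \frac{e^{\theta\left(t + \frac{1}{\theta}\right)}}{t + \frac{1}{\theta}} + \frac{e}{2\theta} \cdot \frac{e^{-\theta\left(\frac{1}{\theta} - t\right)}}{\frac{1}{\theta} - t}$
$= \frac{e^{\theta t}}{2(\theta t + 1)} + \frac{e^{\theta t}}{2(1 - \theta t)} = \frac{e^{\theta t}}{1 - \theta^2t^2} = e^{\theta t}\left[1 - (\theta t)^2\right]^{-1}$
$= \left[1 + \theta t + \frac{\theta^2t^2}{2!} + \ldots\right]\left[1 + \theta^2t^2 + \theta^4t^4 + \ldots\right]$
$= 1 + \theta t + \frac{3\theta^2t^2}{2!} + \ldots$
$E(X) = \mu_1' = $ coeff. of $t$ in $M_X(t) = \theta$
$\mu_2' = $ coeff. of $\frac{t^2}{2!}$ in $M_X(t) = 3\theta^2$
$\mathrm{Var}(X) = \mu_2' - (\mu_1')^2 = 3\theta^2 - \theta^2 = 2\theta^2$.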
22. The elementary probability law of a continuous random variable is
$f(x) = y_0e^{-b(x - a)}$, $a \le x \le \infty$, $b > 0$, where a, b and $y_0$ are constants. Find $y_0$, the rth
moment about the point x = a, and also find the mean and variance.
Solution:
Since the total probability is unity,
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
$y_0\int_a^{\infty} e^{-b(x - a)}\,dx = 1$
$y_0\left[\frac{e^{-b(x - a)}}{-b}\right]_a^{\infty} = 1$
$y_0\left(\frac{1}{b}\right) = 1$
$y_0 = b$.
$\mu_r'$ (rth moment about the point x = a) $= \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx$
$= b\int_a^{\infty} (x - a)^r e^{-b(x - a)}\,dx$
Put $x - a = t$, $dx = dt$; when $x = a$, $t = 0$; when $x = \infty$, $t = \infty$
$= b\int_0^{\infty} t^r e^{-bt}\,dt$
$= b\,\frac{\Gamma(r + 1)}{b^{r+1}} = \frac{r!}{b^r}$
In particular, for $r = 1$ and $r = 2$,
$\mu_1' = \frac{1}{b}$
$\mu_2' = \frac{2}{b^2}$
Mean $= a + \mu_1' = a + \frac{1}{b}$
Variance $= \mu_2' - (\mu_1')^2$
$= \frac{2}{b^2} - \frac{1}{b^2} = \frac{1}{b^2}$.
23. The lifetime (in hours) of a certain piece of equipment is a continuous r.v. having
range $0 < x < \infty$ and p.d.f. $f(x) = \begin{cases} xe^{-kx}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$. Determine the constant K and
evaluate the probability that the lifetime exceeds 2 hours.
Solution:
Let X be the lifetime of a certain piece of equipment.
Then the p.d.f. is $f(x) = \begin{cases} xe^{-kx}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$
To find K, $\int_0^{\infty} f(x)\,dx = 1$
$\int_0^{\infty} e^{-kx}x^{2-1}\,dx = 1$
$\frac{\Gamma(2)}{K^2} = 1 \Rightarrow K^2 = 1 \Rightarrow K = 1$
$f(x) = \begin{cases} xe^{-x}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$
P[Lifetime exceeds 2 hours] $= P[X > 2]$
$= \int_2^{\infty} f(x)\,dx$
$= \int_2^{\infty} xe^{-x}\,dx$
$= \left[x(-e^{-x}) - (e^{-x})\right]_2^{\infty}$
$= 2e^{-2} + e^{-2} = 3e^{-2} = 0.4060$
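If the density function of X is $f(x) = \frac{x}{\alpha^2}e^{-\frac{x^2}{2\alpha^2}}\,U(x)$, find $E(x^n)$ and deduce the values of $E(X)$ and $\mathrm{Var}(X)$.
Solution:
Here $U(x) = \begin{cases} 1, & \text{if } x > 0 \\ 0, & \text{if } x \le 0 \end{cases}$
$E(x^n) = \int_0^{\infty} x^n f(x)\,dx$
$= \int_0^{\infty} x^n\,\frac{x}{\alpha^2}e^{-\frac{x^2}{2\alpha^2}}\,dx$
Put $\frac{x^2}{2\alpha^2} = t$; when $x = 0$, $t = 0$
$\frac{x}{\alpha^2}\,dx = dt$; when $x = \infty$, $t = \infty$
Also $x = \sqrt{2}\,\alpha\sqrt{t}$, so
$E(x^n) = \int_0^{\infty} (2\alpha^2t)^{n/2}e^{-t}\,dt$
$= 2^{n/2}\alpha^n\int_0^{\infty} t^{n/2}e^{-t}\,dt$
$E(x^n) = 2^{n/2}\alpha^n\,\Gamma\left(\frac{n}{2} + 1\right)$ --- (1)
Putting $n = 1$ in (1) we get
$E(x) = 2^{1/2}\alpha\,\Gamma\left(\frac{3}{2}\right) = \sqrt{2}\,\alpha\,\Gamma\left(\frac{1}{2} + 1\right)$
$= \sqrt{2}\,\alpha\,\frac{1}{2}\Gamma\left(\frac{1}{2}\right)$
$= \frac{\alpha\sqrt{\pi}}{\sqrt{2}}$ $\left[\because \Gamma\left(\frac{1}{2}\right) = \sqrt{\pi}\right]$
$E(x) = \alpha\sqrt{\frac{\pi}{2}}$
Putting $n = 2$ in (1), we get
$E(x^2) = 2\alpha^2\,\Gamma(2) = 2\alpha^2$ $[\because \Gamma(2) = 1]$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
$= 2\alpha^2 - \frac{\pi}{2}\alpha^2$
$= \left(2 - \frac{\pi}{2}\right)\alpha^2 = \frac{(4 - \pi)\alpha^2}{2}$.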
Standard Distributions
Discrete type
Binomial distribution:
A random variable $X$ is said to follow binomial distribution if it assumes only
non negative values and its probability mass function is given by
$P(X = x) = p(x) = \begin{cases} nC_x\, p^x q^{n-x}, & x = 0, 1, 2, \dots, n;\ q = 1 - p \\ 0, & \text{otherwise} \end{cases}$
Notation: $X \sim B(n, p)$, read as "$X$ follows the binomial distribution with parameters
$n$ and $p$".
1. Find m.g.f. of Binomial distribution and find its mean and variance.
Solution:
M.G.F. of Binomial distribution:
$M_X(t) = E[e^{tX}] = \sum_{x=0}^{n} e^{tx} P(X = x)$
$= \sum_{x=0}^{n} nC_x\, p^x q^{n-x} e^{tx}$
$= \sum_{x=0}^{n} nC_x\, (pe^t)^x q^{n-x}$
$M_X(t) = (q + pe^t)^n$
Mean and variance:
$M_X'(t) = n(q + pe^t)^{n-1} pe^t$
$E(X) = [M_X'(t)]_{t=0} = np$ (since $q + p = 1$)
$M_X''(t) = n(n-1)(q + pe^t)^{n-2}(pe^t)^2 + n(q + pe^t)^{n-1} pe^t$
$E(X^2) = [M_X''(t)]_{t=0}$
$E(X^2) = n(n-1)p^2 + np$
$= n^2p^2 + np(1 - p) = n^2p^2 + npq$
Variance $= E(X^2) - [E(X)]^2 = npq$
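The results $E(X) = np$ and $\text{Var}(X) = npq$ can be checked numerically. The following is a minimal sketch that sums the p.m.f. directly; the function names and the values $n = 10$, $p = 0.3$ are illustrative assumptions, not part of the notes.

```python
from math import comb

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p), i.e. nCx * p^x * q^(n-x)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def binomial_moments(n: int, p: float) -> tuple[float, float]:
    """Return (mean, variance) computed by summing over the p.m.f."""
    mean = sum(x * binomial_pmf(x, n, p) for x in range(n + 1))
    second_moment = sum(x * x * binomial_pmf(x, n, p) for x in range(n + 1))
    return mean, second_moment - mean**2

# Illustrative values (assumed): n = 10, p = 0.3, so np = 3 and npq = 2.1
mean, var = binomial_moments(n=10, p=0.3)
print(mean, var)
```

Up to floating-point rounding, the two printed values agree with $np$ and $npq$.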
Poisson distribution:
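A random variable $X$ is said to follow Poisson distribution if it assumes only
non negative values and its probability mass function is given by
$P(X = x) = \begin{cases} \frac{e^{-\lambda}\lambda^x}{x!}, & x = 0, 1, 2, \dots;\ \lambda > 0 \\ 0, & \text{otherwise} \end{cases}$
Notation: $X \sim P(\lambda)$, read as "$X$ follows the Poisson distribution with parameter $\lambda$".
Poisson distribution as limiting form of binomial distribution:
Poisson distribution is a limiting case of Binomial distribution under the
following conditions:
(i). $n$, the number of trials, is indefinitely large, (i.e.) $n \to \infty$
(ii). $p$, the constant probability of success in each trial, is very small, (i.e.) $p \to 0$
(iii). $np = \lambda$ is finite.
Proof:
$P(X = x) = p(x) = nC_x\, p^x q^{n-x}$
Let $np = \lambda$, so that
$p = \frac{\lambda}{n}$, $q = 1 - \frac{\lambda}{n}$
$p(x) = nC_x \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$
$= \frac{n!}{x!\,(n-x)!} \left(\frac{\lambda}{n}\right)^x \left(1 - \frac{\lambda}{n}\right)^{n-x}$
$= \frac{n(n-1)\cdots(n-(x-1))\,(n-x)!}{x!\,(n-x)!}\, \frac{\lambda^x}{n^x} \left(1 - \frac{\lambda}{n}\right)^{n-x}$
$= \frac{1 \cdot \left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) n^x}{x!}\, \frac{\lambda^x}{n^x} \left(1 - \frac{\lambda}{n}\right)^{n-x}$
$p(x) = 1 \cdot \left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) \frac{\lambda^x}{x!} \left(1 - \frac{\lambda}{n}\right)^{n-x}$
Taking limit $n \to \infty$ on both sides,
$\lim_{n\to\infty} p(x) = \frac{\lambda^x}{x!} \lim_{n\to\infty} \left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right)\left(1 - \frac{\lambda}{n}\right)^{n-x}$
$= \frac{\lambda^x}{x!} \lim_{n\to\infty} \left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x-1}{n}\right) \cdot \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{-x} \cdot \lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n}$
$= \frac{\lambda^x}{x!} \cdot 1 \cdot 1 \cdot e^{-\lambda}$
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$; $x = 0, 1, 2, \dots$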
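The limiting argument can also be seen numerically. The sketch below holds $np = \lambda$ fixed and lets $n$ grow; the function names, $\lambda = 2$ and the range $x = 0, \dots, 10$ are assumptions chosen only for illustration.

```python
from math import comb, exp, factorial

def binomial_pmf(x: int, n: int, p: float) -> float:
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)

def poisson_pmf(x: int, lam: float) -> float:
    """P(X = x) for X ~ P(lam)."""
    return exp(-lam) * lam**x / factorial(x)

def max_gap(n: int, lam: float, x_max: int = 10) -> float:
    """Largest |binomial - Poisson| over x = 0..x_max when p = lam / n."""
    p = lam / n
    return max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam))
               for x in range(x_max + 1))

# The gap shrinks as n grows with np = lam held fixed (lam = 2 is assumed).
for n in (10, 100, 10_000):
    print(n, max_gap(n, lam=2.0))
```

The printed gaps decrease as $n$ increases, which is what the proof predicts.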
Problem.1 Criticise the following statement: “The mean of a Poisson distribution is 5
while the standard deviation is 4”.
Solution:
For a Poisson distribution the mean and the variance are equal. Here the mean is 5 but the
variance would be $4^2 = 16$. Hence this statement is not true.
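Problem.2 If $X$ is a Poisson variate such that $P(X = 2) = 9P(X = 4) + 90P(X = 6)$, find the mean
and variance of $X$.
Solution:
$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, 2, \dots$
$P(X = 2) = 9P(X = 4) + 90P(X = 6)$
$\frac{e^{-\lambda}\lambda^2}{2!} = 9\,\frac{e^{-\lambda}\lambda^4}{4!} + 90\,\frac{e^{-\lambda}\lambda^6}{6!}$
Dividing by $e^{-\lambda}\lambda^2$,
$\frac{1}{2!} = \frac{9\lambda^2}{4!} + \frac{90\lambda^4}{6!}$
$\frac{1}{2} = \frac{3\lambda^2}{8} + \frac{\lambda^4}{8}$
$1 = \frac{3\lambda^2}{4} + \frac{\lambda^4}{4}$
$\lambda^4 + 3\lambda^2 - 4 = 0$
Put $\lambda^2 = t$: $t^2 + 3t - 4 = 0$
$(t + 4)(t - 1) = 0$
$t = 1, -4$
$\lambda^2 = 1$ or $\lambda^2 = -4$
$\lambda = \pm 1$ or $\lambda = \pm 2i$
Mean $= \lambda = 1$ (since $\lambda > 0$)
Variance $= \lambda = 1$.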
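Problem.3 If $X$ is a Poisson r.v. such that $P(X = 1) = 0.3$ and $P(X = 2) = 0.2$, find
$P(X = 0)$.
Solution:
Given $X$ is a Poisson r.v., $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, \dots$
$P(X = 1) = \frac{e^{-\lambda}\lambda^1}{1!} = 0.3$ --- (1)
$P(X = 2) = \frac{e^{-\lambda}\lambda^2}{2!} = 0.2$ --- (2)
Dividing (1) by (2),
$\frac{e^{-\lambda}\lambda}{e^{-\lambda}\lambda^2 / 2} = \frac{0.3}{0.2}$
$\frac{1}{\lambda} = \frac{0.3}{2(0.2)}$
$\lambda = 1.3333$
$P(X = x) = \frac{e^{-1.3333}(1.3333)^x}{x!}$
$P(X = 0) = \frac{e^{-1.3333}(1)}{1} = 0.2636$.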
Problem.4 Out of 800 families with 4 children each, how many families would be
expected to have (i) 2 boys and 2 girls, (ii) at least 1 boy, (iii) at most 2 girls and (iv)
children of both sexes. Assume equal probabilities for boys and girls.
Solution:
Considering each child as a trial, $n = 4$. Assuming that birth of a boy is a success,
$p = \frac{1}{2}$, $q = \frac{1}{2}$.
Let $X$ denote the number of successes (boys), so that $P(X = x) = 4C_x \left(\frac{1}{2}\right)^x \left(\frac{1}{2}\right)^{4-x}$.
(i) $P(\text{2 boys and 2 girls}) = P(X = 2) = 4C_2 \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^2 = 0.375$
No. of families having 2 boys and 2 girls $= N \cdot P(X = 2)$
$= 800(0.375)$
$= 300$
(ii) $P(\text{at least 1 boy}) = P(X \ge 1)$
$= 1 - P(X < 1)$
$= 1 - P(X = 0)$
$= 1 - 4C_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^4$
$= 0.9375$
No. of families having at least 1 boy $= N \cdot P(X \ge 1)$
$= 800(0.9375)$
$= 750$.
(iii) $P(\text{at most 2 girls}) = P(\text{exactly 0 girls, 1 girl or 2 girls})$
$= P(X = 4) + P(X = 3) + P(X = 2)$
$= 4C_4 \left(\frac{1}{2}\right)^4 \left(\frac{1}{2}\right)^0 + 4C_3 \left(\frac{1}{2}\right)^3 \left(\frac{1}{2}\right)^1 + 4C_2 \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^2$
$= 0.6875$.
No. of families having at most 2 girls $= 800(0.6875) = 550$.
(iv) $P(\text{children of both sexes}) = 1 - P(\text{children of the same sex})$
$= 1 - [P(X = 4) + P(X = 0)]$
$= 1 - \left[4C_4 \left(\frac{1}{2}\right)^4 + 4C_0 \left(\frac{1}{2}\right)^4\right]$
$= 0.875$.
No. of families having children of both sexes $= 800(0.875) = 700$.
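The four expected counts in this problem follow the same pattern (number of families times a binomial probability), so they can be reproduced with a short script. This is only a sketch; the names `pmf_boys` and `expected` are hypothetical.

```python
from math import comb

N_FAMILIES = 800   # families in the problem
N_CHILDREN = 4     # children per family (number of trials)
P_BOY = 0.5        # probability that a child is a boy (success)

def pmf_boys(x: int) -> float:
    """P(X = x): probability that a family has exactly x boys."""
    return comb(N_CHILDREN, x) * P_BOY**x * (1 - P_BOY) ** (N_CHILDREN - x)

expected = {
    "2 boys and 2 girls": N_FAMILIES * pmf_boys(2),
    "at least 1 boy": N_FAMILIES * (1 - pmf_boys(0)),
    "at most 2 girls": N_FAMILIES * (pmf_boys(2) + pmf_boys(3) + pmf_boys(4)),
    "children of both sexes": N_FAMILIES * (1 - pmf_boys(0) - pmf_boys(4)),
}

for case, families in expected.items():
    print(f"{case}: {families:.0f}")
```

The printed counts correspond to parts (i) to (iv) of the problem.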
Continuous type
Uniform (or) Rectangular distribution:
A continuous random variable $X$ is said to have a uniform distribution over
an interval $(a, b)$ if its probability density function is given by
$f(x) = \begin{cases} \frac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$
$\frac{a + b}{2} = 1$ and $\frac{(b - a)^2}{12} = \frac{4}{3}$
$a + b = 2$ and $b - a = 4$. We get $b = 3$, $a = -1$.
With $a = -1$ and $b = 3$, the probability density function of $X$ is
$f(x) = \begin{cases} \frac{1}{4}, & -1 < x < 3 \\ 0, & \text{otherwise} \end{cases}$
$P[X < 0] = \int_{-1}^{0} \frac{1}{4}\,dx = \frac{1}{4}\,[x]_{-1}^{0} = \frac{1}{4}$.
Normal distribution:
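A random variable $X$ is said to have a Normal distribution with parameters
$\mu$ (mean) and $\sigma^2$ (variance) if its probability density function is given by the
probability law
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$
Notation: $X \sim N(\mu, \sigma^2)$, read as "$X$ follows the normal distribution with mean $\mu$ and
variance $\sigma^2$"; $\mu$ and $\sigma^2$ are called the parameters.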
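Problem.1 Prove that for the standard normal distribution $N(0, 1)$, $M_X(t) = e^{t^2/2}$.
Solution:
Moment generating function of Normal distribution:
$M_X(t) = E[e^{tX}]$
$= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tx}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} dx$
Put $z = \frac{x - \mu}{\sigma}$; then $\sigma\,dz = dx$, $-\infty < z < \infty$
$M_X(t) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{t(\sigma z + \mu)}\, e^{-\frac{z^2}{2}} dz$
$= \frac{e^{\mu t}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\left(\frac{z^2}{2} - \sigma t z\right)} dz$
$= \frac{e^{\mu t}}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \sigma t)^2 + \frac{\sigma^2 t^2}{2}} dz$
$= e^{\mu t}\, e^{\frac{\sigma^2 t^2}{2}}\, \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \sigma t)^2} dz$
Since the total area under the normal curve is unity, we have
$\frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \sigma t)^2} dz = 1$
Hence $M_X(t) = e^{\mu t + \frac{\sigma^2 t^2}{2}}$
For the standard normal variable $N(0, 1)$, $\mu = 0$ and $\sigma = 1$, so
$M_X(t) = e^{\frac{t^2}{2}}$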
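$\left(\sum_{i=1}^{n} \mu_i,\ \sum_{i=1}^{n} \sigma_i^2\right)$.
Proof:
We know that, for independent random variables, $M_{X_1 + X_2 + \dots + X_n}(t) = M_{X_1}(t)\, M_{X_2}(t) \cdots M_{X_n}(t)$
But $M_{X_i}(t) = e^{\mu_i t + \frac{t^2 \sigma_i^2}{2}}$, $i = 1, 2, \dots, n$
$M_{X_1 + X_2 + \dots + X_n}(t) = e^{\mu_1 t + \frac{t^2 \sigma_1^2}{2}}\, e^{\mu_2 t + \frac{t^2 \sigma_2^2}{2}} \cdots e^{\mu_n t + \frac{t^2 \sigma_n^2}{2}}$
$= e^{(\mu_1 + \mu_2 + \dots + \mu_n)t + \frac{(\sigma_1^2 + \sigma_2^2 + \dots + \sigma_n^2)t^2}{2}}$
$= e^{\sum_{i=1}^{n} \mu_i t + \frac{\sum_{i=1}^{n} \sigma_i^2 t^2}{2}}$
By the uniqueness of the m.g.f., $X_1 + X_2 + \dots + X_n$ is a normal random variable with
parameters $\left(\sum_{i=1}^{n} \mu_i,\ \sum_{i=1}^{n} \sigma_i^2\right)$.
This proves the property.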
Problem.3 $X$ is a normal variate with mean 30 and S.D. 5. Find
$P[26 \le X \le 40]$.
Solution:
$X \sim N(30, 5^2)$
$\mu = 30$ and $\sigma = 5$
Let $Z = \frac{X - \mu}{\sigma}$ be the standard normal variate.
$P[26 \le X \le 40] = P\left[\frac{26 - 30}{5} \le Z \le \frac{40 - 30}{5}\right]$
$= P[-0.8 \le Z \le 2] = P[-0.8 \le Z \le 0] + P[0 \le Z \le 2]$
$= P[0 \le Z \le 0.8] + P[0 \le Z \le 2]$
$= 0.2881 + 0.4772 = 0.7653$.
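Table look-ups such as the one above can be cross-checked with the error function, since $\Phi(z) = \frac{1}{2}\left[1 + \text{erf}\left(z/\sqrt{2}\right)\right]$. The sketch below uses only Python's standard `math` module; the helper names are hypothetical.

```python
from math import erf, sqrt

def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))

def normal_prob_between(a: float, b: float, mu: float, sigma: float) -> float:
    """P(a <= X <= b) for X ~ N(mu, sigma^2)."""
    return normal_cdf(b, mu, sigma) - normal_cdf(a, mu, sigma)

# Problem 3: mean 30, S.D. 5, interval 26 to 40
p = normal_prob_between(26, 40, mu=30, sigma=5)
print(p)  # agrees with the table-based answer to about three decimals
```

Small differences in the fourth decimal place come from the rounding of the four-figure normal table.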
Problem.4 The average percentage of marks of candidates in an examination is 42
with a standard deviation of 10. The minimum for a pass is 50%. If 1000 candidates
appear for the examination, how many can be expected to pass? If it is required that
double that number should pass, what should be the average percentage of marks?
Solution:
Let $X$ be the marks of the candidates.
Then $X \sim N(42, 10^2)$
Let $z = \frac{X - 42}{10}$
$P[X > 50] = P[Z > 0.8]$
$= 0.5 - P[0 < z < 0.8]$
$= 0.5 - 0.2881 = 0.2119$
Since 1000 students write the test, nearly 212 students would pass the examination.
If double that number should pass, then the number of passes should be 424.
We have to find $z_1$ such that $P[Z > z_1] = 0.424$
$P[0 < z < z_1] = 0.5 - 0.424 = 0.076$
From tables, $z_1 = 0.19$
$z_1 = \frac{50 - x_1}{10} \Rightarrow x_1 = 50 - 10 z_1$
$= 50 - 1.9 = 48.1$
The average mark should be nearly 48.
Problem.5 Given that $X$ is normally distributed with mean 10 and probability
$P[X > 12] = 0.1587$, what is the probability that $X$ will fall in the interval $(9, 11)$?
Solution:
Given $X$ is normally distributed with mean $\mu = 10$.
Let $z = \frac{x - \mu}{\sigma}$ be the standard normal variate.
For $X = 12$, $z = \frac{12 - 10}{\sigma} \Rightarrow z = \frac{2}{\sigma}$
Put $z_1 = \frac{2}{\sigma}$
Then $P[X > 12] = 0.1587$
$P[Z > z_1] = 0.1587$
$0.5 - P[0 < z < z_1] = 0.1587$
$P[0 < z < z_1] = 0.3413$
From the area table, $P[0 < z < 1] = 0.3413$
$z_1 = 1 \Rightarrow \frac{2}{\sigma} = 1 \Rightarrow \sigma = 2$
To find $P[9 < X < 11]$:
For $X = 9$, $z = -\frac{1}{2}$ and for $X = 11$, $z = \frac{1}{2}$
$P[9 < X < 11] = P[-0.5 < z < 0.5]$
$= 2P[0 < z < 0.5]$
$= 2 \times 0.1915 = 0.3830$
Problem.6 In a normal distribution 31% of the items are under 45 and 8% are over
64. Find the mean and standard deviation of the distribution.
Solution:
Let $\mu$ be the mean and $\sigma$ be the standard deviation.
Then $P[X < 45] = 0.31$ and $P[X > 64] = 0.08$
When $X = 45$, $Z = \frac{45 - \mu}{\sigma} = -z_1$, where $P[0 < Z < z_1] = 0.5 - 0.31 = 0.19$
$z_1 = 0.495$
$45 - \mu = -0.495\sigma$ ---(1)
When $X = 64$, $Z = \frac{64 - \mu}{\sigma} = z_2$, where $P[0 < Z < z_2] = 0.5 - 0.08 = 0.42$
$z_2 = 1.405$
$64 - \mu = 1.405\sigma$ ---(2)
Solving (1) & (2), we get $\sigma = 10$ (approx) and $\mu = 50$ (approx).
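Equations (1) and (2) are two linear equations in $\mu$ and $\sigma$, so they can be solved by elimination. A minimal sketch, with a hypothetical helper `solve_mean_sd` and the $z$ values read from the table above:

```python
def solve_mean_sd(x1: float, z1: float, x2: float, z2: float) -> tuple[float, float]:
    """Solve x1 = mu + z1*sigma and x2 = mu + z2*sigma for (mu, sigma)."""
    sigma = (x2 - x1) / (z2 - z1)   # subtract the two equations
    mu = x1 - z1 * sigma            # back-substitute into the first equation
    return mu, sigma

# 31% below 45 gives z = -0.495; 8% above 64 gives z = 1.405 (table values)
mu, sigma = solve_mean_sd(45, -0.495, 64, 1.405)
print(mu, sigma)
```

The result is $\mu \approx 49.95$ and $\sigma \approx 10$, which rounds to the approximate answer quoted in the solution.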
Population:
The group of individuals under study is called population. The population may be finite or infinite.
Sample and Sample Size:
A finite subset of statistical individuals in a population is called Sample. The number of individuals in a
sample is called Sample Size(n).
Parameter and Statistic:
A numerical measure of a population is called a population parameter or simply a parameter.
A numerical measure of the sample is called a sample statistic or simply a statistic.
Sampling distribution:
The sampling distribution of a statistic is the probability distribution of all possible values the statistic
may take, when computed from random samples of same size, drawn from a specified population. Like
any other distribution, a sampling distribution will have its mean, standard deviation and moments of
higher order.
Standard Error:
The standard deviation of the sampling distribution of a statistic is known as its standard error.
Uses of Standard Error:
The magnitude of the standard error gives an index of the reliability of the estimate of the parameter.
The greater the standard error of the estimate, the lower the reliability of the sample.
Standard error is useful for determining the probable limits or confidence limits for an unknown
parameter with a specified confidence coefficient. Standard error is also used for testing of hypotheses.
Type I error and Type II error:
Type I error: If we reject a hypothesis when it should be accepted, we say that type I error has been
made.
Type II error: If we accept a hypothesis when it should be rejected, we say that a type II error has been
made.
Critical region:
A region corresponding to a test statistic in the sample space which leads to the rejection of $H_0$ (the null
hypothesis) is called the critical region or region of rejection.
The region complementary to the critical region is called the region of acceptance.
Level of significance:
The probability $\alpha$ (the probability of making a type I error) that a random value of the test statistic
belongs to the critical region is known as the level of significance. In other words, the level of significance
is the size of the type I error.
The levels of significance usually employed in testing of hypothesis are 5% and 1%.
Critical values or significant values:
The value of test statistic which divides the critical (or rejection) region and acceptance region is called
the critical value or significant value. It depends on the level of significance used and the alternative
hypothesis.
Different types of sampling:
Non probability Samples: Judgment sample, Quota sample, Chunk sample.
Probability samples: Simple random sample, stratified sample, systematic sample, Cluster sample.
One tailed test:
When the hypothesis about the population parameter is rejected only for values of the sample statistic
falling into one of the tails of the sampling distribution, the test is known as a one-tailed test.
If it is the right tail, it is called a right-tailed test or one-sided alternative to the right; if it is the left
tail, it is called a left-tailed test or one-sided alternative to the left.
Two tailed test:
A two-tailed test is one where the hypothesis about the population parameter is rejected for values of the
sample statistic falling into either tail of the sampling distribution.
Systematic sampling:
In a systematic sample, the N items in the population are partitioned into n groups of k items each, where
k = N/n is obtained by dividing the size of the population by the desired sample size n.
Stratified Sampling:
In a stratified sample, the N items in the population are first subdivided into separate subpopulations, or
strata, according to some common characteristic.
Cluster Sampling:
In a cluster sample, the N items in the population are divided into several clusters so that each cluster is
representative of the entire population.
Sampling Error:
Sampling errors have their origin in sampling and arise because only a part of the population
has been used to estimate population parameters and draw inferences about them.
Estimator:
An estimator of a population parameter is a sample statistic used to estimate the parameter. An estimate
of the parameter is a particular numerical value of the estimator obtained by sampling.
Different types of estimation:
There are two types of estimation. They are Point estimation and Interval estimation.
Point estimation:
When a single value is used as an estimate, the estimate is called a point estimate of the population
parameter. For example, the sample mean is the sample statistic used as an estimate of population mean
μ.
Interval estimation:
An estimate of a population parameter given by two numbers between which the parameter may be
considered to lie is called an interval estimate of the parameter.
The interval estimate or a confidence interval consists of an upper confidence limit and lower confidence
limit and we assign a probability that this interval contains the unknown population parameter.
Characteristics of a good estimator:
The important properties of good statistical estimators are (i) unbiasedness (ii) efficiency (iii) consistency
(iv) sufficiency.
Unbiased estimator:
An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.
Consistent estimator:
An estimator is said to be consistent if its probability of being close to the parameter it estimates
increases as the sample size increases.
Efficient estimator:
An estimator is efficient if it has a relatively smaller variance.
Sufficient estimator:
An estimator is said to be sufficient if it contains all the information in the data about the parameter it
estimates.
FORMULAS:
1. Write the confidence interval for the population mean for large samples when $\sigma$ is known.
The confidence interval for μ when $\sigma$ is known and sampling is done from a normal population or with a
large sample, is $\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$.
Here $\bar{x}$ - sample mean, $\sigma$ - standard deviation, n – size of the sample
2. Write the confidence interval for the population mean for small samples when $\sigma$ is unknown.
The confidence interval for μ when $\sigma$ is not known is $\bar{x} \pm t_{\alpha/2}\,\frac{s}{\sqrt{n}}$, where $s^2 = \frac{1}{n - 1}\sum (x_i - \bar{x})^2$.
Here $\bar{x}$ - sample mean, s - standard deviation, n – size of the sample
3. Write the confidence interval for the difference between two population means for large samples
when $\sigma$ is known.
The confidence limits for the difference between two population means are given by
$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$
4. Write the confidence interval for the difference between two population means for small samples
when $\sigma$ is unknown.
The confidence limits for the difference between two population means are given by
$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$, where $S^2 = \frac{\sum (x_i - \bar{x})^2 + \sum (y_i - \bar{y})^2}{n_1 + n_2 - 2}$
5. Write the confidence interval for the population proportion for large samples.
The confidence interval for the population proportion P is $p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}}$, where q = 1 - p.
6. Write the confidence interval for the difference between two population proportions for large
samples.
Confidence limits for the difference between two population proportions are
$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}}$
7. Write the confidence interval for a mean when a finite population N is known.
$\bar{x} \pm z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$
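8. What is the sample size for estimating a population mean when the sample standard deviation
and standard error are known?
$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2$, where $E$ is the permissible error in the estimate.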
PROBLEMS:
1. A machine produces components, which have a standard deviation of 1.6cm in length. A
random sample of 64 parts is selected from the output and this sample has a mean length of 90cm.
The customer will reject the part if it is either less than 88cm or more than 92cm. Does the 95%
confidence interval for the true mean length of all the components produced ensure acceptance by
the customer?
Solution:
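Formula for the confidence interval is $\bar{x} - z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$,
where $\sigma = 1.6$, $z_{\alpha/2} = 1.96$, $\bar{x} = 90$ and $n = 64$.
$89.61 \le \mu \le 90.39$.
This implies that the probability that the true value of the population mean length of the components
will fall in this interval is 95%.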
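The interval above is $\bar{x} \pm z_{\alpha/2}\,\sigma/\sqrt{n}$ with the given numbers substituted. A small sketch of that arithmetic follows; the function name `z_confidence_interval` is an assumption made for illustration.

```python
from math import sqrt

def z_confidence_interval(mean: float, sigma: float, n: int, z: float) -> tuple[float, float]:
    """Confidence interval for a mean when sigma is known: mean +/- z*sigma/sqrt(n)."""
    margin = z * sigma / sqrt(n)
    return mean - margin, mean + margin

# Problem 1: sigma = 1.6 cm, n = 64, sample mean 90 cm, z = 1.96 for 95%
low, high = z_confidence_interval(mean=90, sigma=1.6, n=64, z=1.96)
print(round(low, 2), round(high, 2))
```

Here the margin is $1.96 \times 1.6 / 8 = 0.392$, so the limits are 89.608 and 90.392 before rounding.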
2. A server channel monitored for an hour was found to have an estimated mean of 20 transactions
transmitted per minute. The variance is known to be 4. Find the standard error. Establish an
interval estimate that includes a population mean 95% of the time and 99% of the time.
Solution:
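(i) Standard error $= \frac{\sigma}{\sqrt{n}} = \frac{2}{\sqrt{60}} = 0.2582$
(ii) $Z_{\alpha/2} = 1.96$. The 95% confidence interval is $\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}} = (19.4939, 20.5061)$
(iii) $Z_{\alpha/2} = 2.58$. The 99% confidence interval is $\bar{x} \pm Z_{\alpha/2}\,\sigma_{\bar{x}} = (19.3338, 20.6662)$
$z_{\alpha/2} = 1.96$. The 95% confidence interval is $(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = (8.06, 13.94)$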
5. A sample poll of 100 voters chosen at random from all voters in a given district indicated that
55% of them were in favour of a particular candidate. Find (i) 95% and (ii) 99% confidence limits
for the proportion of all the voters in favour of this candidate.
Solution:
We have $n = 100$. Sample proportion $p = 0.55$ and $q = 0.45$.
(a) $Z_{\alpha/2} = 1.96$: the 95% confidence interval for $P$ is $p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}} = (0.4526, 0.6474)$
(b) $Z_{\alpha/2} = 2.58$: the 99% confidence interval for $P$ is $p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}} = (0.4218, 0.6782)$
6. Two operators perform the same operation of applying a plastic coating to a part. A random
sample of 100 parts from the first operator shows that 6 are non-conforming. A random sample of
200 parts from the second operator shows that 8 are non-conforming. Find a 90% confidence
interval for the difference in the proportion of non-conforming parts produced by the two
operators.
Solution:
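Given $n_1 = 100$, $n_2 = 200$, $p_1 = 0.06$, $q_1 = 0.94$, $p_2 = 0.04$, $q_2 = 0.96$, $Z_{\alpha/2} = 1.645$.
The 90% confidence interval for the difference in the proportion of non-conforming parts produced is
$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = 0.02 \pm 1.645(0.0275) = (-0.0252, 0.0652)$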
7. The operations manager for a large newspaper wants to determine the proportion of newspapers
printed that have a non conforming attribute, such as excessive rub off, missing pages, and
duplicate pages. The operations manager determines that a random sample of 200 newspapers
should be selected for analysis. Suppose that, of this sample of 200, thirty five contain some type of
non conformance. If the operations manager wants to have 90% confidence in estimating the true
population proportion, set up the confidence interval estimate.
Solution:
We have $x = 35$ and $n = 200$.
Sample proportion $p = \frac{x}{n} = 0.175$, $q = 0.825$ and $Z_{\alpha/2} = 1.645$
The 90% confidence interval for $P$ is $p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}} = (0.1308, 0.2192)$
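The proportion intervals in Problems 5 to 7 all use $p \pm Z_{\alpha/2}\sqrt{pq/n}$. A minimal sketch of this calculation, applied to the newspaper data, is given below; the function name is hypothetical and the $z$ value is the table value used in the solution.

```python
from math import sqrt

def proportion_confidence_interval(x: int, n: int, z: float) -> tuple[float, float]:
    """Large-sample interval for a proportion: p +/- z*sqrt(p*q/n), with p = x/n."""
    p = x / n
    q = 1.0 - p
    margin = z * sqrt(p * q / n)
    return p - margin, p + margin

# Problem 7: 35 non-conforming newspapers out of 200, 90% confidence (z = 1.645)
low, high = proportion_confidence_interval(x=35, n=200, z=1.645)
print(round(low, 4), round(high, 4))
```

The same function can be reused for the voter poll by passing `x=55, n=100` with `z=1.96` or `z=2.58`; results may differ from hand calculations in the fourth decimal because of intermediate rounding.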
8. The following are the average weekly losses of worker-hours due to accidents in 10 industrial plants
before and after a certain safety program was put into operation:
Before 45 73 46 124 33 57 83 34 26 17
After 36 60 44 119 35 51 77 29 24 11
Find a 90% confidence interval for the mean improvement in lost worker hours.
Null Hypothesis H0: There is no improvement between before and after the safety program.
Alternative Hypothesis H1: There is an improvement in the performance between before and after the safety
program.
From the given data
Group 1 Group 2
Mean 53.8 48.6
Variance 1027.7333 962.9333
Since the calculated t value is smaller than the critical value (4.0333 < 4.297), we do not reject H0. So the means are not
significantly different.
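Because the same ten plants are measured before and after, the data are paired and the analysis works on the differences. The sketch below recomputes the paired t statistic and a 90% interval for the mean improvement; the critical value $t_{0.05,\,9} = 1.833$ is a standard table value assumed here, not taken from the notes, and the function name is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

def paired_t(before: list[int], after: list[int]) -> tuple[float, float, float]:
    """Return (mean difference, standard error, t statistic) for paired data."""
    diffs = [b - a for b, a in zip(before, after)]
    d_bar = mean(diffs)
    se = stdev(diffs) / sqrt(len(diffs))   # stdev uses the n - 1 divisor
    return d_bar, se, d_bar / se

d_bar, se, t_stat = paired_t(before, after)

T_CRIT_90 = 1.833  # assumed table value of t for 9 d.f., 5% in each tail
ci_90 = (d_bar - T_CRIT_90 * se, d_bar + T_CRIT_90 * se)

print(d_bar, round(t_stat, 4), ci_90)
```

The mean difference is 5.2 worker-hours and the t statistic is about 4.03, matching the calculated value quoted in the solution.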
9. In a test given to two groups of students the marks obtained were as follows
I group 18 20 36 50 49 36 34 49 41
II group 29 28 26 35 30 44 46
Construct a 95% confidence interval on the mean marks secured by students of the above two
groups.
Solution:
Given $n_1 = 9$, $n_2 = 7$, $\bar{x}_1 = 37$, $\bar{x}_2 = 34$
$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum_i (x_i - \bar{x}_1)^2 + \sum_j (x_j - \bar{x}_2)^2\right] = 108.57$, so $S = 10.42$.
$t_{\alpha/2} = 1.76$. The 95% confidence interval is $(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = (-6.24, 12.24)$
sample of 120 stereos of each manufacturer are tested. (i) What is the probability that
manufacturer A's stereos will have a mean lifetime of at least 160 hours more than
manufacturer B's stereos, and (ii) at least 250 hours more than manufacturer B's stereos?
Solution:
(i) $P(\bar{x}_A - \bar{x}_B \ge 160) = P(z \ge -1.95) = 0.5 + 0.4750 = 0.9750$
(ii) $P(\bar{x}_A - \bar{x}_B \ge 250) = P(z \ge 2.45) = 0.5 - 0.4929 = 0.0071$
14. Ten percent of the machines produced by company A are defective and 5% of the machines
produced by company B are defective. A random sample of 250 machines is taken from
company A and a random sample of 300 machines from company B. What is the probability
that the difference in sample proportions is less than or equal to 0.02?
Solution:
$P(p_1 - p_2 \le 0.02) = P(z \le -1.32) = 0.5 - P(0 \le z \le 1.32) = 0.5 - 0.4066 = 0.0934$
15. A random sample of 500 toys was taken from a consignment and 65 were found to be defective.
Find the percentage of defective toys in the consignment.
Solution:
$n = 500$; $X = 65$; $p = \frac{X}{n} = 0.13$
The limits for the population proportion P are given by $p \pm 1.96\sqrt{\frac{pq}{n}} = (0.101, 0.159)$
The percentage of defective toys in the consignment lies between 10.1% and 15.9%.
16. What are the different types of sampling methods? Also write short notes on different types of
sampling?
Solution:
Simple random sampling, Stratified sampling, Cluster sampling, Judgment sampling and Quota
sampling.
Technique: Simple random
Description: Random sample from whole population
Advantages: Highly representative if all subjects participate; the ideal
Disadvantages: Not possible without complete list of population members; potentially uneconomical to achieve; can be disruptive to isolate members from a group; time-scale may be too long, data/sample could change

Technique: Stratified random
Description: Random sample from identifiable groups (strata), subgroups, etc.
Advantages: Can ensure that specific groups are represented, even proportionally, in the sample(s) (e.g., by gender), by selecting individuals from strata list
Disadvantages: More complex, requires greater effort than simple random; strata must be carefully defined

Technique: Purposive
Description: Hand-pick subjects on the basis of specific characteristics
Advantages: Ensures balance of group sizes when multiple groups are to be selected
Disadvantages: Samples are not easily defensible as being representative of populations due to potential subjectivity of researcher

Technique: Quota
Description: Select individuals as they come to fill a quota by characteristics proportional to populations
Advantages: Ensures selection of adequate numbers of subjects with appropriate characteristics
Disadvantages: Not possible to prove that the sample is representative of designated population

Technique: Snowball
Description: Subjects with desired traits or characteristics give names of further appropriate subjects
Advantages: Possible to include members of groups where no lists or identifiable clusters even exist (e.g., drug abusers, criminals)
Disadvantages: No way of knowing whether the sample is representative of the population

Technique: Volunteer, accidental, convenience
Description: Either asking for volunteers, or the consequence of not all those selected finally participating, or a set of subjects who just happen to be available
Advantages: Inexpensive way of ensuring sufficient numbers of a study
Disadvantages: Can be highly unrepresentative
17. Given a sample mean of 83, a sample standard deviation of 12.5 and a sample size of 22, test the
hypothesis that the value of the population mean is 70 against the alternative that it is more than 70. Use
the 0.025 level of significance.
Here the sample size is n = 22 < 30. Hence the sample is a small sample. Given $\bar{x} = 83$, $\mu = 70$, $s = 12.5$
Null Hypothesis H0: $\mu = 70$, i.e. there is no significant difference between the sample mean and the population mean.
Alternative Hypothesis H1: $\mu > 70$.
Degrees of freedom: n – 1 = 21.
The test statistic is $t = \frac{\bar{x} - \mu}{s / \sqrt{n - 1}} = \frac{83 - 70}{12.5 / \sqrt{21}} = 4.7659$
The tabulated value of t at the 0.025 level with 21 degrees of freedom for a single-tailed test is 2.080. Since the calculated value >
tabulated value, we reject H0.
Therefore $\mu > 70$.
18. At the 0.10 level of significance, can we conclude that the following 400 observations follow a Poisson
distribution with $\lambda = 3$?
Unit-III Testing of Hypothesis
Population:
A population in statistics means a set of objects. The population is finite or infinite
according as the number of elements of the set is finite or infinite.
Sampling:
A sample is a finite subset of the population. The number of elements in the sample is
called size of the sample.
Large and small sample:
If the number of elements in a sample is greater than or equal to 30, the sample is called a
large sample; if it is less than 30, the sample is called a small sample.
Parameters:
Statistical constants like mean $\mu$, variance $\sigma^2$, etc., computed from a population are called
parameters of the population.
Statistics:
Statistical constants like mean $\bar{x}$, variance $S^2$, etc., computed from a sample are called sample
statistics or statistics.
Statistical Hypothesis:
In making statistical decisions, we make assumptions, which may be true or false; these are called
statistical hypotheses.
Null Hypothesis( H 0 ):
For applying the test of significance, we first setup a hypothesis which is a statement about the
population parameter. This statement is usually a hypothesis of no true difference between
sample statistics and population parameter under consideration and so it is called null hypothesis
and is denoted by H 0 .
Alternative Hypothesis ( H1 ):
Suppose the null hypothesis is false, then something else must be true. This is called an
alternative hypothesis and is denoted by H1 .
E.g. If $H_0$ is population mean $\mu = 300$, then $H_1$ is $\mu \ne 300$ (i.e. $\mu < 300$ or $\mu > 300$) or
$H_1$ is $\mu > 300$ or $H_1$ is $\mu < 300$. So any of these may be taken as the alternative hypothesis.
Error in sampling:
After applying a test of significance a decision is to be taken to accept or reject the null
hypothesis H 0 .
Type I error: The rejection of the null hypothesis H 0 when it is true is called type I error.
Type II error: The acceptance of the null hypothesis H 0 when it is false is called type II error.
Level of significance:
The probability of type I error is called level of significance of the test and it is denoted by α.
We usually take either α=5% or α=1%.
One tailed and Two tailed test:
If $\theta_0$ is a population parameter and $\theta$ is the corresponding sample statistic and if we set up
the null hypothesis $H_0: \theta = \theta_0$, then the alternative hypothesis, which is complementary to $H_0$, can
be any one of the following:
(i) $H_1: \theta \ne \theta_0$ ($\theta > \theta_0$ or $\theta < \theta_0$) (ii) $H_1: \theta < \theta_0$ (iii) $H_1: \theta > \theta_0$
$H_1$ given in (i) is called a two-tailed alternative hypothesis, whereas $H_1$ given in (ii) is called a left-tailed test and (iii) is called a
right-tailed test.
Critical region:
For a test statistic, the area under the probability curve (which is normal) is divided into two
regions, namely the region of acceptance of $H_0$ and the region of rejection of $H_0$. The region in
which $H_0$ is rejected is called the critical region. The region in which $H_0$ is accepted is called the
acceptance region.
Procedure of Testing of Hypothesis:
(i) State the null hypothesis H 0
(ii) Decide the alternative hypothesis H1 (ie, one tailed or two tailed)
(iii) Choose the level of significance α (α=5% or α=1%).
(iv) Determine a suitable test statistic.
Test statistic $z = \frac{t - E(t)}{\text{S.E. of } (t)}$
(v) Compare the computed value of z with the table value of z and decide the acceptance or the
rejection of $H_0$.
For a single tail test (right tail or left tail) we compare the computed value of $|z|$ with 1.645 (at
5% level) and 2.33 (at 1% level) and accept or reject $H_0$ accordingly.
Test of significance of small sample:
When the size of the sample (n) is less than 30, then that sample is called a small sample.
The following are some important tests for small sample,
(I) Student's t-test
(II) F-test
(III) $\chi^2$-test
I Student t test
(i). Test of significance of the difference between sample mean and population mean
(ii). Test of significance of the difference between means of two small samples
(i) Test of significance of the difference between sample mean and population mean:
The Student's 't' is defined by the statistic $t = \frac{\bar{x} - \mu}{S / \sqrt{n}}$, where $\bar{x}$ = sample mean, $\mu$ = population
mean, S = standard deviation of sample,
n = sample size.
Note:
If the standard deviation of the sample is not given directly, then the statistic is given by $t = \frac{\bar{x} - \mu}{S / \sqrt{n}}$, where
$\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$, $S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$
Confidence Interval:
The confidence interval for the population mean for a small sample is $\bar{x} \pm t_\alpha \frac{s}{\sqrt{n}}$, i.e.
$\left(\bar{x} - t_\alpha \frac{s}{\sqrt{n}},\ \bar{x} + t_\alpha \frac{s}{\sqrt{n}}\right)$
Working Rule:
(i) Let $H_0: \mu = \bar{x}$ (there is no significant difference between sample mean and population
mean)
$H_1: \mu \ne \bar{x}$ (there is a significant difference between sample mean and population
mean) (Two tailed test)
Find $t = \frac{\bar{x} - \mu}{S / \sqrt{n - 1}}$.
Let $t_\alpha$ be the table value of t with v = n - 1 degrees of freedom at $\alpha$% level of significance.
Conclusion:
If $|t| < t_\alpha$, $H_0$ is accepted at $\alpha$% level of significance.
If $|t| > t_\alpha$, $H_0$ is rejected at $\alpha$% level of significance.
Problem:
1. The mean lifetime of a sample of 25 bulbs is found as 1550h, with standard deviation of
120h. The company manufacturing the bulbs claims that the average life of their bulbs is
1600h. Is the claim acceptable at 5% level of significance?
Solution:
Given sample size n = 25, mean $\bar{x}$ = 1550, S.D. (S) = 120, population mean $\mu$ = 1600
Let $H_0: \mu = 1600$ (the claim is acceptable)
$H_1: \mu \ne 1600$ ($\mu \ne \bar{x}$) (two tailed test)
Under $H_0$, the test statistic is $t = \frac{\bar{x} - \mu}{S / \sqrt{n}} = \frac{1550 - 1600}{120 / \sqrt{25}} = -2.0833$
$|t| = 2.0833$
From the table, for v = 24, $t_{0.05} = 2.064$. Since $|t| > t_{0.05}$,
$H_0$ is rejected.
Conclusion: The claim is not acceptable.
2. Test made on the breaking strength of 10 pieces of a metal gave the following results:
578,572,570,568,572,570,570,572,596, and 584kg. Test if the mean breaking strength of the
wire can be assumed as 577kg.
Solution:
Let us first compute the sample mean $\bar{x}$ and sample standard deviation S and then test if $\bar{x}$ differs
significantly from the population mean $\mu$ = 577.
$\bar{x} = \frac{\sum x_i}{n} = \frac{5752}{10} = 575.2$
$S^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1} = \frac{681.6}{9} = 75.733$
Let $H_0: \bar{x} = \mu$,
$H_1: \bar{x} \ne \mu$
Under $H_0$, the test statistic is $t = \frac{\bar{x} - \mu}{S / \sqrt{n}} = \frac{575.2 - 577}{\sqrt{75.733} / \sqrt{10}} = -0.654$
$|t| = 0.654$
Tabulated value of t for v = 9 degrees of freedom: $t_{0.05} = 2.262$
Since $|t| < t_{0.05}$, $H_0$ is accepted.
Conclusion:
The mean breaking strength of the wire can be assumed as 577kg at 5% level of significance.
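The sample mean, sample variance and t statistic for the breaking-strength data can be recomputed directly from the ten readings. The sketch below uses Python's `statistics` module, whose `stdev` uses the $n - 1$ divisor as in the formula for $S^2$; the function name `one_sample_t` is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

strengths = [578, 572, 570, 568, 572, 570, 570, 572, 596, 584]  # kg

def one_sample_t(data: list[int], mu0: float) -> tuple[float, float, float]:
    """Return (sample mean, sample variance, t statistic) for H0: mu = mu0."""
    n = len(data)
    x_bar = mean(data)
    s = stdev(data)                       # sample S.D. with n - 1 divisor
    t = (x_bar - mu0) / (s / sqrt(n))
    return x_bar, s**2, t

x_bar, s_squared, t_stat = one_sample_t(strengths, mu0=577)
print(x_bar, round(s_squared, 3), round(t_stat, 3))
```

With these data the mean is 575.2, the variance is about 75.733 and $|t|$ is about 0.65, well below the tabulated 2.262, so the null hypothesis is accepted.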
3. A random sample of 10 boys had the following I.Q’s: 70, 120, 110, 101, 88, 83, 95,
98,107,100. Do these data support the assumption of a population mean I.Q of 100 ? Find a
reasonable range in which most of the mean I.Q. values of samples of 10 boys lie.
Solution:
Given $\mu = 100$, n = 10
Null Hypothesis:
H0 : $\mu = 100$, i.e., the data are consistent with the assumption of a mean I.Q. of 100 in the population.
Alternate Hypothesis:
H1 : $\mu \neq 100$, i.e., the data are not consistent with the assumption of a mean I.Q. of 100 in the population.
$\bar{x} = \dfrac{\sum x}{n} = \dfrac{70 + 120 + 110 + 101 + 88 + 83 + 95 + 98 + 107 + 100}{10} = \dfrac{972}{10} = 97.2$
$S^2 = \dfrac{1}{10-1}\left[(70-97.2)^2 + (120-97.2)^2 + (110-97.2)^2 + (101-97.2)^2 + (88-97.2)^2 + (83-97.2)^2 + (95-97.2)^2 + (98-97.2)^2 + (107-97.2)^2 + (100-97.2)^2\right]$
$S^2 = \dfrac{1}{9}(1833.6) = 203.73 \Rightarrow S = 14.2734$
$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}} = \dfrac{97.2 - 100}{14.2734/\sqrt{10}} = \dfrac{-2.8}{4.5136} = -0.6203$, so $|t| = 0.6203$
Table value: $t_{\alpha, n-1} = t_{5\%, 10-1} = t_{0.05, 9} = 2.262$ (two-tailed test)
Conclusion:
Here $|t| < t_\alpha$, i.e., the table value > calculated value, so we accept the null hypothesis and conclude that the data are consistent with the assumption of a mean I.Q. of 100 in the population.
To find the confidence limits:
$\bar{x} \pm t_\alpha \dfrac{S}{\sqrt{n}} = 97.2 \pm 2.262 \times \dfrac{14.2734}{\sqrt{10}} = 97.2 \pm (2.262)(4.514) = (86.99, 107.41)$
A reasonable range in which most of the mean I.Q. values of samples of 10 boys lie is (86.99, 107.41).
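When the raw observations are given, as in Problem 3, the mean, the standard deviation, the t statistic and the confidence limits can all be obtained in a few lines. The sketch below uses Python's standard `statistics` module (whose `stdev` uses the divisor n - 1, matching the solution); the table value 2.262 is taken from the text.

```python
import math
import statistics

iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
MU_0 = 100       # hypothesised population mean
T_TABLE = 2.262  # table value quoted in the solution for v = 9, 5% level

n = len(iq)
x_bar = statistics.mean(iq)   # sample mean
s = statistics.stdev(iq)      # sample S.D. with divisor n - 1
std_error = s / math.sqrt(n)

t_value = (x_bar - MU_0) / std_error
lower = x_bar - T_TABLE * std_error
upper = x_bar + T_TABLE * std_error

print(f"mean = {x_bar:.1f}, S = {s:.4f}, t = {t_value:.4f}")
print(f"95% confidence limits: ({lower:.2f}, {upper:.2f})")
```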
4. A random sample of 16 values from a normal population showed a mean of 41.5 inches and
the sum of squares of deviations from this mean equal to 135 square inches. Show that the
assumption of a mean of 43.5 inches for the population is not reasonable. Obtain 95 percent
and 99 percent confidence limits for the same.
Solution:
Given $\bar{x} = 41.5$, $\mu = 43.5$, n = 16
Sum of squares of deviations from the mean: $\sum (x - \bar{x})^2 = 135$
The parameter of interest is $\mu$.
Null Hypothesis H0: $\mu = 43.5$, i.e., the assumption of a mean of 43.5 inches for the population is reasonable.
Alternative Hypothesis H1: $\mu \neq 43.5$, i.e., the assumption of a mean of 43.5 inches for the population is not reasonable.
Level of significance: (i) $\alpha$ = 5% = 0.05, degrees of freedom = 16 - 1 = 15
(ii) $\alpha$ = 1% = 0.01, degrees of freedom = 16 - 1 = 15
Test statistic: $t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$
where $S^2 = \dfrac{1}{n-1}\sum (x - \bar{x})^2 = \dfrac{1}{16-1}(135) = 9 \Rightarrow S = 3$
$t = \dfrac{41.5 - 43.5}{3/\sqrt{16}} = -\dfrac{8}{3} = -2.667$, so $|t| = 2.667$
Conclusion:
(i) Since t =2.667 > 2.131 so we reject H0 at 5% level of significance.
So we conclude that the assumption of mean of 43.5 inches for the population is not
reasonable.
(ii) Since t =2.667 < 2.947 so we accept H0 at 1% level of significance.
So we conclude that the assumption is reasonable.
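95% confidence limits:
$\bar{x} \pm t_{0.05}\dfrac{S}{\sqrt{n}} = 41.5 \pm 2.131 \times \dfrac{3}{4} = 41.5 \pm 1.5983$, i.e., $39.902 \le \mu \le 43.098$
99% confidence limits:
$\bar{x} \pm t_{0.01}\dfrac{S}{\sqrt{n}} = 41.5 \pm 2.947 \times \dfrac{3}{4} = 41.5 \pm 2.2103$, i.e., $39.29 \le \mu \le 43.71$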
5. Ten oil tins are taken at random from an automatic filling machine. The mean weight of the
tins is 15.8 kg and standard deviation is 0.5 kg. Does the sample mean differ significantly
from the intended weight of 16 kg?
Solution:
Given $\bar{x} = 15.8$, $\mu = 16$, s = 0.50, n = 10
Null Hypothesis H0: $\mu = 16$, i.e., the sample mean weight is not different from the intended weight.
Alternative Hypothesis H1: $\mu \neq 16$, i.e., the sample mean weight is different from the intended weight.
Level of significance: $\alpha$ = 5% = 0.05, degrees of freedom = 10 - 1 = 9
Test statistic: $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{15.8 - 16}{0.50/\sqrt{10}} = \dfrac{-0.2}{0.1581} = -1.265$, so $|t| = 1.265$
Critical value: The critical value of t at the 5% level of significance with 9 degrees of freedom is 2.26.
Conclusion:
Here calculated value < table value.
so we accept H0 at 5% level of significance.
Hence the sample mean weight is not different from the intended weight.
(ii) Test of significance of the difference between means of two small samples:
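To test the significance of the difference between the means $\bar{x}_1$ and $\bar{x}_2$ of samples of sizes $n_1$ and $n_2$:
Under H0, the test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$,
where $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$ or $S^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}$ (if $s_1$, $s_2$ are not given directly).
Degrees of freedom (d.f.): $v = n_1 + n_2 - 2$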
Note:
If $n_1 = n_2 = n$ and the pairs of values $x_1$ and $x_2$ are associated in some way (or correlated), the test above is not applicable. In that case the t-test for paired values is applied to the differences $d = x_1 - x_2$, with v = n - 1 degrees of freedom.
2. Two independent samples of sizes 8 and 7 contained the following values:
Sample I    19  17  15  21  16  18  16  14
Sample II   15  14  15  19  15  18  16
Is the difference between the sample means significant?
Solution:
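$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$    $x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
19       2      4      15     -1     1
17       0      0      14     -2     4
15      -2      4      15     -1     1
21       4     16      19      3     9
16      -1      1      15     -1     1
18       1      1      18      2     4
16      -1      1      16      0     0
14      -3      9
136      0     36      112     0    20   (totals)
$\bar{x}_1 = \dfrac{\sum x_1}{n_1} = \dfrac{136}{8} = 17$, $\bar{x}_2 = \dfrac{\sum x_2}{n_2} = \dfrac{112}{7} = 16$
$S^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \dfrac{36 + 20}{8 + 7 - 2} = 4.3076 \Rightarrow S = 2.0754$
Let H0 : $\bar{x}_1 = \bar{x}_2$,
H1 : $\bar{x}_1 \neq \bar{x}_2$ (two-tailed test)
Under H0, the test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{17 - 16}{2.0754\sqrt{\dfrac{1}{8} + \dfrac{1}{7}}} = 0.9309$
Degrees of freedom: $v = n_1 + n_2 - 2 = 13$
From the t table, the value for v = 13 degrees of freedom at the 5% level of significance is $t_{0.05}$ = 2.16
Since $|t| < t_{0.05}$, H0 is accepted.
Conclusion:
The two sample means do not differ significantly at the 5% level of significance.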
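The pooled computation used in this solution can be sketched in Python as follows. The helper name `pooled_t` is illustrative only; it follows the second form of $S^2$ given above (sums of squared deviations divided by $n_1 + n_2 - 2$).

```python
import math


def pooled_t(sample1, sample2):
    """Return (t, degrees of freedom) for two independent small samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1, mean2 = sum(sample1) / n1, sum(sample2) / n2
    # Sums of squared deviations from each sample's own mean.
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    pooled_sd = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (pooled_sd * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2


sample_1 = [19, 17, 15, 21, 16, 18, 16, 14]
sample_2 = [15, 14, 15, 19, 15, 18, 16]
t_value, dof = pooled_t(sample_1, sample_2)
T_TABLE = 2.16  # table value quoted in the solution for v = 13, 5% level

decision = "rejected" if abs(t_value) > T_TABLE else "accepted"
print(f"t = {t_value:.4f} with {dof} d.f., H0 is {decision}")
```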
3. The following data represent the biological values of protein from cow’s milk and buffalo’s
milk:
Cow’s milk 1.82 2.02 1.88 1.61 1.81 1.54
Buffalo’s milk 2.00 1.83 1.86 2.03 2.19 1.88
Examine whether the average values of protein in the two samples significantly differ at
5% level.
Solution:
Given $n_1 = n_2 = 6$
H0 : $\mu_1 = \mu_2$. There is no significant difference between the means of the two samples.
H1 : $\mu_1 \neq \mu_2$. There is a significant difference between the means of the two samples.
Test statistic: $t = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
x       y       $x - \bar{x}$   $(x - \bar{x})^2$   $y - \bar{y}$   $(y - \bar{y})^2$
1.82    2.00     0.04     0.0016     0.035    0.00123
2.02    1.83     0.24     0.0576    -0.135    0.01823
1.88    1.86     0.10     0.0100    -0.105    0.01102
1.61    2.03    -0.17     0.0289     0.065    0.00425
1.81    2.19     0.03     0.0009     0.225    0.0506
1.54    1.88    -0.24     0.0576    -0.085    0.00723
10.68   11.79             0.1566              0.09256   (totals)
$\bar{x} = \dfrac{\sum x}{n_1} = \dfrac{10.68}{6} = 1.78$ ; $\bar{y} = \dfrac{\sum y}{n_2} = \dfrac{11.79}{6} = 1.965$
$S^2 = \dfrac{1}{6 + 6 - 2}(0.1566 + 0.09256) = (0.1)(0.2492) = 0.0249 \Rightarrow S = 0.1578$
$t = \dfrac{1.78 - 1.965}{(0.1578)\sqrt{\dfrac{1}{6} + \dfrac{1}{6}}} = \dfrac{-0.185}{(0.1578)(0.5774)} = \dfrac{-0.185}{0.0911} = -2.03$, so $|t| = 2.03$
Critical value:The critical value of t at 5% level of significance with degrees of freedom 10 is
2.228
Here calculated value < table value, we accept H 0
(i.e,) The difference between the mean protein values of the two varieties of milk is not
significant at 5% level.
4. The following data relate to the marks obtained by 11 students in 2 tests, one held at the beginning of a year and the other at the end of the year after intensive coaching.
Test 1   19  23  16  24  17  18  20  18  21  19  20
Test 2   17  24  20  24  20  22  20  20  18  22  19
Do the data indicate that the students have benefited by coaching?
Solution:
The given data relate to the marks obtained in 2 tests by the same set of students. Hence the marks in the 2 tests can be regarded as correlated.
We use the t-test for paired values.
Let H0 : $\bar{x}_1 = \bar{x}_2$,
H1 : $\bar{x}_1 < \bar{x}_2$ (one-tailed test)
$x_1$    $x_2$    $d = x_1 - x_2$    $d^2$    $d - \bar{d}$    $(d - \bar{d})^2$
19    17     2     4     3     9
23    24    -1     1     0     0
16    20    -4    16    -3     9
24    24     0     0     1     1
17    20    -3     9    -2     4
18    22    -4    16    -3     9
20    20     0     0     1     1
18    20    -2     4    -1     1
21    18     3     9     4    16
19    22    -3     9    -2     4
20    19     1     1     2     4
Total      -11                58
$\bar{d} = \dfrac{\sum d}{n} = \dfrac{-11}{11} = -1$, $S^2 = \dfrac{\sum (d - \bar{d})^2}{n} = \dfrac{58}{11} = 5.272$
The test statistic is $t = \dfrac{\bar{d}}{S/\sqrt{n-1}} = \dfrac{-1}{\sqrt{5.272}/\sqrt{10}} = -1.377$, so $|t| = 1.377$
From the table, for v = n - 1 = 10 d.f., $t_{0.05}$ = 1.812 (one-tailed)
Since $|t| < t_{0.05}$, H0 is accepted.
Conclusion:
The students have not benefitted by coaching.
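The paired computation for Problem 4 can be reproduced with the sketch below. It follows the convention of this solution (divisor n for $S^2$ and $\sqrt{n-1}$ in the test statistic), which gives the same t as using the divisor n - 1 together with $\sqrt{n}$. The table value 1.812 is taken from the text.

```python
import math

test_1 = [19, 23, 16, 24, 17, 18, 20, 18, 21, 19, 20]
test_2 = [17, 24, 20, 24, 20, 22, 20, 20, 18, 22, 19]

# Differences d = x1 - x2 for each student.
d = [a - b for a, b in zip(test_1, test_2)]
n = len(d)
d_bar = sum(d) / n
s_sq = sum((x - d_bar) ** 2 for x in d) / n  # divisor n, as in the solution

t_value = d_bar / (math.sqrt(s_sq) / math.sqrt(n - 1))
T_TABLE = 1.812  # one-tailed table value quoted in the solution for v = 10

decision = "rejected" if abs(t_value) > T_TABLE else "accepted"
print(f"d_bar = {d_bar:.1f}, S^2 = {s_sq:.3f}, t = {t_value:.3f}, H0 is {decision}")
```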
5. Ten Persons were appointed in the officer cadre in an office. Their performance was noted
by giving a test and the marks were recorded out of 100.
Employee A B C D E F G H I J
Before training 80 76 92 60 70 56 74 56 70 56
After training 84 70 96 80 70 52 84 72 72 50
By applying the t-test, can it be concluded that the employees have been benefited by the
training?
Solution:
Null Hypothesis H0: $\mu_1 = \mu_2$, i.e., the employees have not been benefited by the training.
Alternative Hypothesis H1: $\mu_1 < \mu_2$, i.e., the employees have been benefited by the training.
Level of significance: $\alpha$ = 5% = 0.05 (one-tailed test)
Test statistic: $t = \dfrac{\bar{d}}{S/\sqrt{n}}$, where $S^2 = \dfrac{1}{n-1}\sum (d - \bar{d})^2$ and $\bar{d} = \dfrac{\sum d}{n}$
Employee   Before   After   d = Before - After   $(d - \bar{d})^2$
A          80       84       -4          0
B          76       70        6        100
C          92       96       -4          0
D          60       80      -20        256
E          70       70        0         16
F          56       52        4         64
G          74       84      -10         36
H          56       72      -16        144
I          70       72       -2          4
J          56       50        6        100
Total                       -40        720
$\bar{d} = \dfrac{\sum d}{n} = \dfrac{-40}{10} = -4$
$S^2 = \dfrac{1}{n-1}\sum (d - \bar{d})^2 = \dfrac{1}{9}(720) = 80 \Rightarrow S = 8.94$
$t = \dfrac{\bar{d}}{S/\sqrt{n}} = \dfrac{-4}{8.94/\sqrt{10}} = -1.414$, so $|t| = 1.414$
Critical value: The critical value of t at the 5% level of significance with 9 degrees of freedom is 1.83.
Conclusion:
Here calculated value < table value.
so we accept H0
Hence the employees have not been benefited by the training.
6. The weight gains in pounds under two systems of feeding of calves of 10 pairs of identical
twins is given below.
Twin pair                      1    2    3    4    5    6    7    8    9    10
Weight gains under System A    43   39   39   42   46   43   38   44   51   43
Weight gains under System B    37   35   34   41   39   37   37   40   48   36
Discuss whether the difference between the two systems of feeding is significant.
Solution:
Null Hypothesis H0: $\mu_1 = \mu_2$, i.e., there is no significant difference between the two systems of feeding.
Alternative Hypothesis H1: $\mu_1 \neq \mu_2$, i.e., there is a significant difference between the two systems of feeding.
Level of significance: $\alpha$ = 5% = 0.05 (two-tailed test)
Test statistic: $t = \dfrac{\bar{d}}{S/\sqrt{n}}$, where $S^2 = \dfrac{1}{n-1}\sum (d - \bar{d})^2$ and $\bar{d} = \dfrac{\sum d}{n}$
Twin pair   System A (x)   System B (y)   d = x - y   $(d - \bar{d})^2$
1           43             37             6           2.56
2           39             35             4           0.16
3           39             34             5           0.36
4           42             41             1          11.56
5           46             39             7           6.76
6           43             37             6           2.56
7           38             37             1          11.56
8           44             40             4           0.16
9           51             48             3           1.96
10          43             36             7           6.76
Total                                     44          44.4
$\bar{d} = \dfrac{\sum d}{n} = \dfrac{44}{10} = 4.4$
$S^2 = \dfrac{1}{n-1}\sum (d - \bar{d})^2 = \dfrac{1}{9}(44.4) = 4.93 \Rightarrow S = 2.221$
$t = \dfrac{\bar{d}}{S/\sqrt{n}} = \dfrac{4.4}{2.221/\sqrt{10}} = 6.26$
Critical value: The critical value of t at the 5% level of significance with 9 degrees of freedom is 2.262.
Conclusion:
Here calculated value > table value, so we reject H0.
Hence there is a significant difference between the two systems of feeding.
II F-test
(i) To test whether there is any significant difference between two estimates of population variance.
(ii) To test whether two samples have come from the same population.
We use the F-test:
The test statistic is given by $F = \dfrac{S_1^2}{S_2^2}$, if $S_1^2 > S_2^2$,
where $S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1}$ [$n_1$ is the first sample size] and $S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1}$ [$n_2$ is the second sample size].
The degrees of freedom are $(v_1, v_2) = (n_1 - 1, n_2 - 1)$.
Note:
1. If $S_1^2 < S_2^2$, then $F = \dfrac{S_2^2}{S_1^2}$ (always F > 1).
2. To test whether two independent samples have been drawn from the same normal population,
we have to test
i) Equality of population means using t-test or z-test, according to sample size.
ii) Equality of population variances using F-test
Problem:
1. A sample of size 13 gave an estimated population variance of 3.0, while another sample of size 15 gave an estimate of 2.5. Could both samples be from populations with the same variance?
Solution:
Given $n_1$ = 13, $n_2$ = 15, $S_1^2$ = 3.0, $S_2^2$ = 2.5
Let H0 : $\sigma_1^2 = \sigma_2^2$ (the two samples have been drawn from populations with the same variance)
H1 : $\sigma_1^2 \neq \sigma_2^2$
The test statistic is $F = \dfrac{S_1^2}{S_2^2} = \dfrac{3}{2.5} = 1.2$
From the table, with degrees of freedom $v = (n_1 - 1, n_2 - 1) = (12, 14)$, $F_{0.05}$ = 2.53
Since $F < F_{0.05}$, H0 is accepted.
Conclusion:
The two samples could have come from two normal populations with the same variance.
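The rule "larger variance estimate in the numerator" can be sketched in Python as below. The function name `f_statistic` is illustrative; it expects the variance estimates $S_1^2$ and $S_2^2$ (already corrected with the divisor n - 1) and returns F together with the degrees of freedom in the matching order. The table value 2.53 is copied from the solution.

```python
def f_statistic(var1, n1, var2, n2):
    """Return (F, (v1, v2)) with the larger variance estimate on top."""
    if var1 >= var2:
        return var1 / var2, (n1 - 1, n2 - 1)
    # Swap so that F >= 1; the degrees of freedom swap with it.
    return var2 / var1, (n2 - 1, n1 - 1)


# Problem 1: estimates 3.0 (n = 13) and 2.5 (n = 15).
f_value, dof = f_statistic(3.0, 13, 2.5, 15)
F_TABLE = 2.53  # table value quoted in the solution for (12, 14) d.f.

decision = "rejected" if f_value > F_TABLE else "accepted"
print(f"F = {f_value:.3f} with d.f. {dof}, H0 is {decision}")
```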
2. Two samples of sizes 9 and 8 give the sums of squares of deviations from their respective means equal to 160 and 91 respectively. Could both samples be from populations with the same variance?
Solution:
Given $n_1$ = 9, $n_2$ = 8, $\sum (x - \bar{x})^2 = 160$, $\sum (y - \bar{y})^2 = 91$
$S_1^2 = \dfrac{\sum (x - \bar{x})^2}{n_1 - 1} = \dfrac{160}{8} = 20$, $S_2^2 = \dfrac{\sum (y - \bar{y})^2}{n_2 - 1} = \dfrac{91}{7} = 13$
Let H0 : $\sigma_1^2 = \sigma_2^2$ (the two normal populations have the same variance)
H1 : $\sigma_1^2 \neq \sigma_2^2$
The test statistic is $F = \dfrac{S_1^2}{S_2^2} = \dfrac{20}{13} = 1.538$
From the table, with degrees of freedom $v = (n_1 - 1, n_2 - 1) = (8, 7)$, $F_{0.05}$ = 3.73
Since $F < F_{0.05}$, H0 is accepted.
Conclusion:
The two samples could have come from two populations with the same variance.
3. Two random samples gave the following data:
Sample Size Mean Variance
I 8 9.6 1.2
II 11 16.5 2.5
Can we conclude that the two samples have been drawn from the same normal population?
Solution:
To conclude that the two samples have been drawn from the same normal population, we have to check that
(i) the variances of the populations do not differ significantly, by the F-test;
(ii) the sample means do not differ significantly, by the t-test.
(i) F-test:
Given $n_1$ = 8, $n_2$ = 11, $s_1^2$ = 1.2, $s_2^2$ = 2.5, $\bar{x}_1$ = 9.6, $\bar{x}_2$ = 16.5
$S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1} = \dfrac{8(1.2)}{7} = 1.37$, $S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1} = \dfrac{11(2.5)}{10} = 2.75$
Let H0 : $\sigma_1^2 = \sigma_2^2$
H1 : $\sigma_1^2 \neq \sigma_2^2$
The test statistic is $F = \dfrac{S_2^2}{S_1^2}$ (since $S_1^2 < S_2^2$) $= \dfrac{2.75}{1.37} = 2.007$
From the table, $F_{0.05}(n_2 - 1, n_1 - 1) = F_{0.05}(10, 7) = 3.63$
Since $F < F_{0.05}$, H0 is accepted.
(ii) t-test (equality of means):
Let H0 : $\mu_1 = \mu_2$
H1 : $\mu_1 \neq \mu_2$
Under H0, the test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$,
where $S = \sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{8(1.2) + 11(2.5)}{8 + 11 - 2}} = 1.4772$
$t = \dfrac{9.6 - 16.5}{1.4772\sqrt{\dfrac{1}{8} + \dfrac{1}{11}}} = -10.0525$, so $|t| = 10.0525$
From the table, with degrees of freedom $n_1 + n_2 - 2 = 17$, $t_{0.05}$ = 2.110
Since $|t| > t_{0.05}$, H0 is rejected, i.e., $\mu_1 \neq \mu_2$.
Conclusion:
The two samples could not have been drawn from the same normal population.
4. Two independent samples of 5 and 6 items respectively had the following values of the variable:
Sample 1:   21  24  25  26  27
Sample 2:   22  27  28  30  31  36
Can you say that the two samples came from the same population?
Solution:
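Let H0 : $\sigma_1^2 = \sigma_2^2$ and $\mu_1 = \mu_2$ (the two samples have been drawn from the same population)
H1 : $\sigma_1^2 \neq \sigma_2^2$ or $\mu_1 \neq \mu_2$
(i) F-test (equality of variances):
$x_1$    $x_1 - \bar{x}_1$    $(x_1 - \bar{x}_1)^2$    $x_2$    $x_2 - \bar{x}_2$    $(x_2 - \bar{x}_2)^2$
21      -3.6     12.96      22     -7     49
24      -0.6      0.36      27     -2      4
25       0.4      0.16      28     -1      1
26       1.4      1.96      30      1      1
27       2.4      5.76      31      2      4
                            36      7     49
123               21.2      174          108   (totals)
$\bar{x}_1 = \dfrac{\sum x_1}{n_1} = \dfrac{123}{5} = 24.6$, $\bar{x}_2 = \dfrac{\sum x_2}{n_2} = \dfrac{174}{6} = 29$
$S_1^2 = \dfrac{\sum (x_1 - \bar{x}_1)^2}{n_1 - 1} = \dfrac{21.2}{4} = 5.3$, $S_2^2 = \dfrac{\sum (x_2 - \bar{x}_2)^2}{n_2 - 1} = \dfrac{108}{5} = 21.6$
Since these estimates already use the divisors $n_1 - 1$ and $n_2 - 1$, no further correction factor is applied.
The test statistic is $F = \dfrac{S_2^2}{S_1^2}$ (since $S_1^2 < S_2^2$) $= \dfrac{21.6}{5.3} = 4.075$
From the table, $F_{0.05}(n_2 - 1, n_1 - 1) = F_{0.05}(5, 4) = 6.26$
Since $F < F_{0.05}$, H0 is accepted.
(ii) t-test (equality of means):
Under H0, the test statistic is $t = \dfrac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$,
where $S = \sqrt{\dfrac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{21.2 + 108}{5 + 6 - 2}} = 3.789$
$t = \dfrac{24.6 - 29}{3.789\sqrt{\dfrac{1}{5} + \dfrac{1}{6}}} = -1.918$, so $|t| = 1.918$
From the table, with degrees of freedom $n_1 + n_2 - 2 = 9$, $t_{0.05}$ = 2.262
Since $|t| < t_{0.05}$, H0 is accepted, i.e., $\mu_1 = \mu_2$.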
Conclusion: The two samples could have been drawn from the same normal population.
5. Two random samples gave the following results:
Sample   Size   Sample mean   Sum of squares of deviations from the mean
1        10     15            90
2        12     14            108
Test whether the samples come from the same normal population at 5% level of
significance.
Solution:
A normal population has 2 parameters, namely the mean $\mu$ and the variance $\sigma^2$. To test whether independent samples have been drawn from the same normal population, we have to test
1) Equality of population means using t-test or z-test, according to sample size.
2) Equality of population variances using F-test.
Given $\bar{x} = 15$, $\bar{y} = 14$, $n_1 = 10$, $n_2 = 12$, $\sum (x - \bar{x})^2 = 90$, $\sum (y - \bar{y})^2 = 108$
i) t-test to test equality of population means:
Null hypothesis H0 : $\mu_1 = \mu_2$, there is no difference between the two population means.
Alternate hypothesis H1 : $\mu_1 \neq \mu_2$, there is a difference between the two population means.
Level of significance: $\alpha$ = 5% = 0.05 (two-tailed test)
Test statistic: $t = \dfrac{\bar{x} - \bar{y}}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where $S^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar{x})^2 + \sum (y - \bar{y})^2\right] = \dfrac{1}{10 + 12 - 2}(90 + 108) = 9.9$
$S = \sqrt{9.9} = 3.146$
$t = \dfrac{15 - 14}{3.146\sqrt{\dfrac{1}{10} + \dfrac{1}{12}}} = 0.742$
Critical value: The critical value of t at the 5% level of significance with $n_1 + n_2 - 2 = 10 + 12 - 2 = 20$ degrees of freedom is 2.086.
Conclusion: calculated value < table value, so H0 is accepted.
ii) F-test to test equality of populations variances:
Null Hypothesis H0: $\sigma_1^2 = \sigma_2^2$, the population variances are equal.
Alternative Hypothesis H1: $\sigma_1^2 \neq \sigma_2^2$, the population variances are not equal.
Level of significance: $\alpha$ = 5%
Test statistic: $F = \dfrac{S_1^2}{S_2^2}$
where $S_1^2 = \dfrac{1}{n_1 - 1}\sum (x - \bar{x})^2 = \dfrac{1}{10 - 1}(90) = 10$
and $S_2^2 = \dfrac{1}{n_2 - 1}\sum (y - \bar{y})^2 = \dfrac{1}{12 - 1}(108) = 9.818$
Here $S_1^2 > S_2^2$, so $F = \dfrac{S_1^2}{S_2^2} = \dfrac{10}{9.818} = 1.02$
Critical value: The critical value of F at the 5% level of significance with $(n_1 - 1, n_2 - 1) = (9, 11)$ degrees of freedom is 2.90.
Here calculated value < table value, so we accept H0.
Conclusion: Both null hypotheses, $\mu_1 = \mu_2$ and $\sigma_1^2 = \sigma_2^2$, are accepted.
Hence we may conclude that the two samples are drawn from the same normal population.
III $\chi^2$-test:
(i) $\chi^2$-test for a specified population variance.
(ii) $\chi^2$-test is used to test whether differences between observed and expected frequencies are significant (goodness of fit).
(iii) $\chi^2$-test is used to test the independence of attributes.
$\chi^2$-test for a specified population variance:
The test statistic is $\chi^2 = \dfrac{n s^2}{\sigma^2}$,
which follows the $\chi^2$-distribution with (n - 1) degrees of freedom.
Problem:
1. A lapping process used to grind certain silicon wafers to the proper thickness is acceptable only if $\sigma$, the population S.D. of the thickness of dice cut from the wafers, is at most 0.5 mil. Use the 0.05 level of significance to test the null hypothesis $\sigma$ = 0.5 against the alternative hypothesis $\sigma$ > 0.5, if the thickness of 15 dice cut from such wafers has an S.D. of 0.64 mil.
Solution:
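Given n = 15, s = 0.64, $\sigma$ = 0.5
H0 : $\sigma = 0.5$, H1 : $\sigma > 0.5$
Under H0, the test statistic is $\chi^2 = \dfrac{n s^2}{\sigma^2} = \dfrac{15 \times (0.64)^2}{(0.5)^2} = 24.576$
From the $\chi^2$ table, with 14 degrees of freedom, $\chi^2_{0.05} = 23.685$
Since $\chi^2 > \chi^2_{0.05}$, H0 is rejected. Hence $\sigma > 0.5$.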
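A minimal Python check of this variance test is given below. The critical value is entered by hand as the upper 5% point of $\chi^2$ with 14 degrees of freedom (about 23.685 in standard tables); it is an input to the sketch, not something the code derives.

```python
n = 15          # number of dice
s = 0.64        # sample S.D. (mil)
sigma_0 = 0.5   # hypothesised population S.D. (mil)

# Test statistic chi^2 = n * s^2 / sigma^2 with n - 1 degrees of freedom.
chi_sq = n * s ** 2 / sigma_0 ** 2
dof = n - 1
CHI_TABLE = 23.685  # assumed table value for 14 d.f., upper 5% point

# One-tailed (right-tailed) test, since H1 is sigma > 0.5.
decision = "rejected" if chi_sq > CHI_TABLE else "accepted"
print(f"chi-square = {chi_sq:.3f} with {dof} d.f., H0 is {decision}")
```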
$\chi^2$-test to test whether differences between observed and expected frequencies are significant (goodness of fit):
The test statistic is $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$,
where $O_i$ is the observed frequency and $E_i$ is the expected frequency.
If the data are given as a series of n numbers, then degrees of freedom = n - 1.
Note: In the case of a binomial distribution d.f. = n - 1, Poisson distribution d.f. = n - 2, normal distribution d.f. = n - 3.
Problem:
1. The following data give the number of aircraft accidents that occurred during the various days of a week:
Day:                Mon   Tue   Wed   Thu   Fri   Sat
No. of accidents:   15    19    13    12    16    15
Test whether the accidents are uniformly distributed over the week.
Solution:
The expected number of accidents on any day $= \dfrac{90}{6} = 15$
Let H0 : Accidents occur uniformly over the week
H1 : Accidents do not occur uniformly over the week
Day     Observed frequency ($O_i$)   Expected frequency ($E_i$)   $O_i - E_i$   $(O_i - E_i)^2 / E_i$
Mon     15     15      0     0
Tue     19     15      4     1.066
Wed     13     15     -2     0.266
Thu     12     15     -3     0.6
Fri     16     15      1     0.066
Sat     15     15      0     0
Total   90                   1.998
Now, $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i} = 1.998$
Here 6 observations are given, so degrees of freedom = n - 1 = 6 - 1 = 5.
From the $\chi^2$ table, for 5 degrees of freedom, $\chi^2_{0.05} = 11.07$.
Since $\chi^2 < \chi^2_{0.05}$, H0 is accepted.
Conclusion: Accidents occur uniformly over the week.
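The goodness-of-fit computation for the accident data can be sketched as follows. The code keeps full precision, so it gives $\chi^2 = 2.0$ where the table above, which truncates each term to three decimals, gives 1.998. The critical value 11.07 (5 d.f., 5% level) is a standard table value entered by hand.

```python
observed = {"Mon": 15, "Tue": 19, "Wed": 13, "Thu": 12, "Fri": 16, "Sat": 15}

# Under H0 (uniform distribution) every day has the same expected frequency.
expected = sum(observed.values()) / len(observed)

chi_sq = sum((o - expected) ** 2 / expected for o in observed.values())
dof = len(observed) - 1
CHI_TABLE = 11.07  # assumed table value for 5 d.f. at the 5% level

decision = "rejected" if chi_sq > CHI_TABLE else "accepted"
print(f"expected = {expected}, chi-square = {chi_sq:.3f}, d.f. = {dof}, H0 is {decision}")
```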
2. A survey of 320 families with 5 children each revealed the following distribution:
No. of boys:       5    4    3     2    1    0
No. of girls:      0    1    2     3    4    5
No. of families:   14   56   110   88   40   12
Is the result consistent with the hypothesis that male and female births are equally
probable?
Solution:
Let H 0 : Male and female births are equally probable
H1 : Male and female births are not equally probable
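Probability of a male birth $p = \dfrac{1}{2}$, probability of a female birth $q = \dfrac{1}{2}$
The probability of x male births in a family of 5 is $p(x) = {}^5C_x\, p^x q^{5-x}$, x = 0, 1, 2, ..., 5
Expected number of families with x male births $= 320 \times {}^5C_x\, p^x q^{5-x}$, x = 0, 1, 2, ..., 5
$= 320 \times {}^5C_x \left(\dfrac{1}{2}\right)^x \left(\dfrac{1}{2}\right)^{5-x}$
$= 320 \times {}^5C_x \left(\dfrac{1}{2}\right)^5 = 10 \times {}^5C_x$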
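As an illustration of the formula just derived, the sketch below generates the expected frequencies $10 \times {}^5C_x$ and the resulting $\chi^2$ for the survey data. The numerical results printed here are computed by the sketch and are not quoted from these notes; the comparison value 11.07 is the standard table value for 5 d.f. at the 5% level, entered as an assumption.

```python
import math

FAMILIES = 320
CHILDREN = 5
P_BOY = 0.5  # H0: male and female births are equally probable

# Observed number of families, keyed by number of boys in the family.
observed = {5: 14, 4: 56, 3: 110, 2: 88, 1: 40, 0: 12}

# Expected families with x boys: N * 5Cx * p^x * q^(5 - x).
expected = {
    x: FAMILIES * math.comb(CHILDREN, x) * P_BOY ** x * (1 - P_BOY) ** (CHILDREN - x)
    for x in observed
}

chi_sq = sum((observed[x] - expected[x]) ** 2 / expected[x] for x in observed)
CHI_TABLE = 11.07  # assumed table value for 6 - 1 = 5 d.f. at the 5% level

for x in sorted(observed):
    print(f"boys = {x}: observed = {observed[x]}, expected = {expected[x]:.0f}")
print(f"chi-square = {chi_sq:.2f}, table value = {CHI_TABLE}")
```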
The 2 is calculated using the following table:
No. of Observed Expected Oi Ei Oi Ei
2
x: 0 1 2 3 4 5 6
f(x): 275 72 30 7 5 2 1
Solution:
x       Observed frequency ($O_i$)   Expected frequency ($E_i$)   $(O_i - E_i)^2 / E_i$
0       275     242.1     4.471
1       72      116.7     17.122
2       30      28.1      0.128
3       7       4.5
4       5       0.5       (classes 3 to 6 pooled: $O_i$ = 15, $E_i$ = 5.1)   19.218
5       2       0.1
6       1       0
Total   392     392       40.939
$\chi^2 = 40.939$
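                  Attribute B
Attribute A       B1      B2      B3      Total
A1                a11     a12     a13     R1
A2                a21     a22     a23     R2
A3                a31     a32     a33     R3
Total             C1      C2      C3      N
Now, under the null hypothesis H0: the attributes A and B are independent, we calculate the expected frequency $E_{ij}$ for the various cells using the following formula:
$E_{ij} = \dfrac{R_i \times C_j}{N}$, i = 1, 2, ..., r, j = 1, 2, ..., s
$E(a_{11}) = \dfrac{R_1 C_1}{N}$    $E(a_{12}) = \dfrac{R_1 C_2}{N}$    $E(a_{13}) = \dfrac{R_1 C_3}{N}$    $R_1$
$E(a_{21}) = \dfrac{R_2 C_1}{N}$    $E(a_{22}) = \dfrac{R_2 C_2}{N}$    $E(a_{23}) = \dfrac{R_2 C_3}{N}$    $R_2$
$E(a_{31}) = \dfrac{R_3 C_1}{N}$    $E(a_{32}) = \dfrac{R_3 C_2}{N}$    $E(a_{33}) = \dfrac{R_3 C_3}{N}$    $R_3$
$C_1$    $C_2$    $C_3$    N
and we compute $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$,
which follows the $\chi^2$ distribution with n = (r - 1)(s - 1) degrees of freedom at the 5% or 1% level of significance.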
1. Calculate the expected frequencies for the following data presuming two attributes viz.,
conditions of home and condition of child as independent.
                       Condition of home
Condition of child     Clean    Dirty
Clean                  70       50
Fair                   80       20
Dirty                  35       45
Use Chi-Square test at 5% level of significance to state whether the two attributes are
independent.
Solution:
Null hypothesis H 0 : Conditions of home and conditions of child are independent.
Alternate hypothesis H 1 : Conditions of home and conditions of child are not independent.
Level of significance: 0.05
The test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Analysis:
                       Condition of home
Condition of child     Clean    Dirty    Total
Clean                  70       50       120
Fair                   80       20       100
Dirty                  35       45       80
Total                  185      115      300
Expected frequency $= \dfrac{\text{corresponding row total} \times \text{column total}}{\text{grand total}}$
Expected frequency for 70 $= \dfrac{120 \times 185}{300} = 74$, expected frequency for 80 $= \dfrac{100 \times 185}{300} = 61.67$,
expected frequency for 35 $= \dfrac{80 \times 185}{300} = 49.33$, expected frequency for 50 $= \dfrac{120 \times 115}{300} = 46$,
expected frequency for 20 $= \dfrac{100 \times 115}{300} = 38.33$, expected frequency for 45 $= \dfrac{80 \times 115}{300} = 30.67$
$O_{ij}$   $E_{ij}$   $O_{ij} - E_{ij}$   $(O_{ij} - E_{ij})^2$   $(O_{ij} - E_{ij})^2 / E_{ij}$
70      74       -4        16        0.216
50      46        4        16        0.348
80      61.67    18.33     335.99    5.448
20      38.33   -18.33     335.99    8.766
35      49.33   -14.33     205.35    4.163
45      30.67    14.33     205.35    6.695
Total                                25.636
$\chi^2 = 25.636$
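The expected frequencies and the $\chi^2$ value for a contingency table can be generated directly from the row and column totals, as in the sketch below. Because the code does not round the expected frequencies to two decimals, it gives about 25.65 instead of 25.636. The comparison value 5.991 is the standard table value for (3 - 1)(2 - 1) = 2 d.f. at the 5% level, entered by hand as an assumption.

```python
# Rows: condition of child (clean, fair, dirty); columns: home (clean, dirty).
observed = [
    [70, 50],
    [80, 20],
    [35, 45],
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# E_ij = (row total i) * (column total j) / grand total
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_sq = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)
dof = (len(row_totals) - 1) * (len(col_totals) - 1)
CHI_TABLE = 5.991  # assumed table value for 2 d.f. at the 5% level

decision = "rejected" if chi_sq > CHI_TABLE else "accepted"
print(f"chi-square = {chi_sq:.3f} with {dof} d.f., H0 is {decision}")
```

With these inputs the computed value exceeds the table value, so under the stated assumption the sketch reports that independence of the two attributes is rejected.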
2. The following contingency table presents the reactions of legislators to a tax plan according to party affiliation. Test whether party affiliation influences the reaction to the tax plan at the 0.01 level of significance.
Reaction
Party In favour Neutral Opposed Total
Party A 120 20 20 160
Party B 50 30 60 140
Party C 50 10 40 100
Total 220 60 120 400
Solution:
Null hypothesis H 0 : Party affiliation and tax plan are independent.
Alternate hypothesis H 1 : Party affiliation and tax plan are not independent.
Level of significance: 0.01
The test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Analysis:
Reaction
Party Infavour Neutral Opposed Total
Party A 120 20 20 160
Party B 50 30 60 140
Party C 50 10 40 100
Total 220 60 120 400
$O_{ij}$   $E_{ij}$   $O_{ij} - E_{ij}$   $(O_{ij} - E_{ij})^2$   $(O_{ij} - E_{ij})^2 / E_{ij}$
3. From a poll of 800 television viewers, the following data have been accumulated on their levels of education and their preference of television stations. We are interested in determining whether the selection of a TV station is independent of the level of education.
                        Educational Level
                        High School   Bachelor   Graduate   Total
Public Broadcasting     50            150        80         280
Commercial Stations     150           250        120        520
Total                   200           400        200        800
(i) State the null and alternative hypotheses.
(ii) Show the contingency table of the expected frequencies.
(iii) Compute the test statistic.
(iv) The null hypothesis is to be tested at 95% confidence. Determine the critical value for this test.
Solution:
(i)Null Hypothesis: Selection of TV station is independent of level of education
Alternative Hypothesis: Selection of TV station is not independent of level of education
(ii) Level of significance: 0.05
                        Educational Level
                        High School   Bachelor   Graduate   Total
Public Broadcasting     50            150        80         280
Commercial Stations     150           250        120        520
Total                   200           400        200        800
Expected frequency for 50 $= \dfrac{280 \times 200}{800} = 70$, expected frequency for 150 $= \dfrac{280 \times 400}{800} = 140$,
expected frequency for 80 $= \dfrac{280 \times 200}{800} = 70$, expected frequency for 150 $= \dfrac{520 \times 200}{800} = 130$,
expected frequency for 250 $= \dfrac{520 \times 400}{800} = 260$, expected frequency for 120 $= \dfrac{520 \times 200}{800} = 130$
The test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Analysis:
$O_{ij}$   $E_{ij}$   $O_{ij} - E_{ij}$   $(O_{ij} - E_{ij})^2$   $(O_{ij} - E_{ij})^2 / E_{ij}$
Large sample:
If the size of the sample n > 30, then that sample is said to be a large sample. There are four important tests of significance for large samples.
Note:
(i) The sampling distribution of a statistic is approximately normal, irrespective of whether the distribution of the population is normal or not.
(ii) The sample statistics are sufficiently close to the corresponding population parameters and hence may be used to calculate the standard errors of the sampling distribution.
(iii) Critical values of z for some standard LOS's (for large samples):
Nature of test   1% (0.01)   2% (0.02)   5% (0.05)   10% (0.1)
                 (99%)       (98%)       (95%)       (90%)
Two-tailed       2.58        2.33        1.96        1.645
One-tailed       2.33        2.055       1.645       1.28
Note:
If the standard deviation of the population is not known, then the statistic is $z = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$,
where S = standard deviation of the sample.
Confidence Interval:
The confidence interval for $\mu$ when $\sigma$ is known and sampling is done from a normal population or with a large sample is $\bar{x} \pm z_\alpha \dfrac{\sigma}{\sqrt{n}}$, i.e.,
$\left(\bar{x} - z_\alpha \dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_\alpha \dfrac{\sigma}{\sqrt{n}}\right)$
If s is known ($\sigma$ is not known): $\bar{x} \pm z_\alpha \dfrac{s}{\sqrt{n}}$
1. A sample of 100 students is taken from a large population. The mean height in the sample is 160 cm. Can it be reasonably regarded that, in the population, the mean height is 165 cm and the S.D. is 10 cm? Also find the confidence limits. Use the 1% level of significance.
Solution:
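Given n = 100, $\bar{x}$ = 160 cm, $\mu$ = 165 cm, $\sigma$ = 10 cm
Let H0 : $\mu = 165$
H1 : $\mu \neq 165$ (two-tailed test)
Under H0, the test statistic is $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{160 - 165}{10/\sqrt{100}} = -5$, so $|z| = 5$
From the table, $z_{0.01}$ = 2.58. Since $|z| > z_{0.01}$, H0 is rejected. Hence $\mu \neq 165$.
Confidence Interval:
$\left(\bar{x} - z_\alpha \dfrac{\sigma}{\sqrt{n}},\ \bar{x} + z_\alpha \dfrac{\sigma}{\sqrt{n}}\right) = \left(160 - 2.58 \times \dfrac{10}{\sqrt{100}},\ 160 + 2.58 \times \dfrac{10}{\sqrt{100}}\right) = (157.42, 162.58)$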
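The large-sample test and the confidence interval of this problem can be verified with the following sketch. The critical value 2.58 is the two-tailed 1% value used in the solution; the variable names are illustrative.

```python
import math

n = 100        # sample size
x_bar = 160    # sample mean height (cm)
mu_0 = 165     # hypothesised population mean (cm)
sigma = 10     # population S.D. (cm)
Z_TABLE = 2.58 # two-tailed critical value at the 1% level, as in the solution

std_error = sigma / math.sqrt(n)
z = (x_bar - mu_0) / std_error

decision = "rejected" if abs(z) > Z_TABLE else "accepted"
lower = x_bar - Z_TABLE * std_error
upper = x_bar + Z_TABLE * std_error

print(f"z = {z:.2f}, H0 is {decision}")
print(f"99% confidence interval: ({lower:.2f}, {upper:.2f})")
```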
2. The mean breaking strength of the cables supplied by a manufacturer is 1800 with an S.D. of 100. By a new technique in the manufacturing process, it is claimed that the breaking strength of the cables has increased. In order to test this claim, a sample of 50 cables is tested and it is found that the mean breaking strength is 1850. Can we support the claim at the 1% level of significance?
Solution:
Given n = 50, $\bar{x}$ = 1850, $\mu$ = 1800, $\sigma$ = 100
Let H0 : $\bar{x} = \mu$
H1 : $\bar{x} > \mu$ (one-tailed test)
Under H0, the test statistic is $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \dfrac{1850 - 1800}{100/\sqrt{50}} = 3.535$
From the table, $z_{0.01}$ = 2.33. Since $z > z_{0.01}$, H0 is rejected. Hence $\bar{x} > \mu$.
3. A sample of 900 members has a mean of 3.4 cms and s.d is 2.61 cms. Is the sample from a
large population of mean 3.25cm and s.d is 2.61 cms. If the population is normal and its
mean is unknown find the 95% confidence limits of true mean.
Solution:
Given n = 900, $\mu$ = 3.25, $\bar{x}$ = 3.4 cm, $\sigma$ = 2.61, s = 2.61
Null Hypothesis H0: There is no significant difference between the sample mean and the population mean, i.e., $\mu = 3.25$
Alternative Hypothesis H1: There is a significant difference between the sample mean and the population mean, i.e., $\mu \neq 3.25$
Level of significance: $\alpha$ = 5%
Test statistic: $z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{3.4 - 3.25}{2.61/\sqrt{900}} = 1.724$
Critical value: The critical value of z for a two-tailed test at the 5% level of significance is 1.96.
Conclusion:
$|z| = 1.724 < 1.96$, i.e., calculated value < tabulated value.
Therefore we accept the null hypothesis H0,
i.e., the sample has been drawn from the population with mean 3.25.
Solution:
Given n = 121, $\bar{x}$ = 6.08, $\mu$ = 6, S = 0.44
Null Hypothesis H0: $\mu = 6$, i.e., the lathe is in perfect adjustment.
Alternative Hypothesis H1: $\mu \neq 6$, i.e., the lathe is not in perfect adjustment.
Level of significance: $\alpha$ = 0.05
Test statistic: $z = \dfrac{\bar{x} - \mu}{S/\sqrt{n}} = \dfrac{6.08 - 6}{0.44/\sqrt{121}} = \dfrac{0.08}{0.04} = 2$
Table value: The table value at the 5% level of significance is 1.96.
Conclusion:
Here calculated value > tabulated value.
Hence we reject H0.
5. The mean life time of a sample of 100 light tubes produced by a company is found to be
1580 hours with standard deviation of 90 hours. Test the hypothesis that the mean lifetime
of the tubes produced by the company is 1600 hours.
Solution:
Given n = 100, $\bar{x}$ = 1580, $\mu$ = 1600, S = 90
Null Hypothesis H0: $\mu = 1600$, i.e., there is no significant difference between the sample mean and the population mean.
Alternative Hypothesis H1: $\mu \neq 1600$, i.e., there is a significant difference between the sample mean and the population mean.
Level of significance: $\alpha$ = 5% = 0.05
Test statistic: $z = \dfrac{\bar{x} - \mu}{S/\sqrt{n}} = \dfrac{1580 - 1600}{90/\sqrt{100}} = \dfrac{-20}{9} = -2.22$, so $|z| = 2.22$
Table value: The table value at the 5% level of significance is 1.96 (two-tailed test).
Conclusion:
Here calculated value > tabulated value.
Hence we reject H0.
Hence the mean lifetime of the tubes produced by the company may not be 1600 hrs.
Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{20 - 15}{4\sqrt{\dfrac{1}{500} + \dfrac{1}{400}}} = 18.6$
Critical value: The critical value of z at the 1% level of significance is 2.58.
Conclusion: calculated value > table value, so H0 is rejected.
The samples could not have been drawn from the same population.
2. Test the significance of the difference between the means of the samples, drawn from two normal populations with the same S.D., using the following data:
             Size   Mean   Standard Deviation
Sample I     100    61     4
Sample II    200    63     6
Solution:
Given $\bar{x}_1 = 61$, $\bar{x}_2 = 63$, $s_1 = 4$, $s_2 = 6$, $n_1 = 100$, $n_2 = 200$
Null hypothesis H0 : $\mu_1 = \mu_2$, there is no significant difference between the means of the samples.
Alternate hypothesis H1 : $\mu_1 \neq \mu_2$, there is a significant difference between the means of the samples.
Level of significance: $\alpha$ = 5% = 0.05 (two-tailed test)
Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \dfrac{61 - 63}{\sqrt{\dfrac{4^2}{200} + \dfrac{6^2}{100}}} = -3.02$, so $|z| = 3.02$
Critical value: The critical value of z at the 5% level of significance is 1.96.
Conclusion: calculated value > table value, so H0 is rejected. Therefore the two normal populations, from which the samples are drawn, may not have the same mean though they may have the same S.D.
3. A sample of heights of 6400 Englishmen has a mean of 170 cm and an S.D. of 6.4 cm, while a simple sample of heights of 1600 Americans has a mean of 172 cm and an S.D. of 6.3 cm. Do the data indicate that Americans are, on the average, taller than Englishmen?
Solution:
Given $\bar{x}_1 = 170$, $\bar{x}_2 = 172$, $s_1 = 6.4$, $s_2 = 6.3$, $n_1 = 6400$, $n_2 = 1600$
Null hypothesis H0 : $\mu_1 = \mu_2$, there is no significant difference between the heights of Americans and Englishmen.
Alternate hypothesis H1 : $\mu_1 < \mu_2$, Americans are, on the average, taller than Englishmen.
Level of significance: $\alpha$ = 5% = 0.05 (one-tailed test)
Test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{170 - 172}{\sqrt{\dfrac{6.4^2}{6400} + \dfrac{6.3^2}{1600}}} = -11.32$, so $|z| = 11.32$
Critical value: The critical value of z at the 5% level of significance is 1.645.
Conclusion: calculated value > table value
H 0 is rejected. We conclude that the data indicate that Americans are on the average, taller than
Englishmen.
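The two-sample z statistic of this problem can be checked with the sketch below. The helper name `two_sample_z` is illustrative; it uses the form $\sqrt{s_1^2/n_1 + s_2^2/n_2}$ for the standard error, which is what the numbers in this solution correspond to.

```python
import math


def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """Return z = (mean1 - mean2) / sqrt(sd1^2/n1 + sd2^2/n2)."""
    return (mean1 - mean2) / math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)


# 6400 Englishmen (170 cm, S.D. 6.4 cm) and 1600 Americans (172 cm, S.D. 6.3 cm).
z = two_sample_z(170, 6.4, 6400, 172, 6.3, 1600)
Z_TABLE = 1.645  # one-tailed critical value at the 5% level, as in the solution

# Left-tailed alternative (mean1 < mean2): reject H0 when z < -Z_TABLE.
decision = "rejected" if z < -Z_TABLE else "accepted"
print(f"z = {z:.2f}, H0 is {decision}")
```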
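4. The average marks scored by 32 boys is 72 with an S.D. of 8, while that for 36 girls is 70 with an S.D. of 6. Test at the 1% level of significance whether the boys perform better than the girls.
Solution:
Given $\bar{x}_1 = 72$, $\bar{x}_2 = 70$, $s_1 = 8$, $s_2 = 6$, $n_1 = 32$, $n_2 = 36$
H0 : $\mu_1 = \mu_2$ (both perform equally)
H1 : $\mu_1 > \mu_2$ (boys are better than girls) (one-tailed test)
The test statistic: $z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = 1.15$
Critical value: The critical value of z at the 1% level of significance is 2.33.
Conclusion: calculated value < table value, so H0 is accepted. The data do not indicate that the boys perform better than the girls.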
Confidence Interval:
The confidence interval for the population proportion for a large sample is $p \pm z_{\alpha}\sqrt{\dfrac{PQ}{n}}$.
1. In a big city 325 men out of 600 men were found to be smokers. Does this information
support the conclusion that the majority of men in this city are smokers?
Solution:
Given n = 600, number of smokers = 325
p = sample proportion of smokers = 325/600 = 0.5417
P = population proportion of smokers in the city = 1/2 = 0.5, Q = 1 - P = 0.5
Null Hypothesis $H_0$: P = 0.5, i.e., the numbers of smokers and non-smokers are equal in the city.
Alternative Hypothesis $H_1$: P > 0.5 (right-tailed)
Test Statistic:
$$z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.5417 - 0.5}{\sqrt{\dfrac{0.5 \times 0.5}{600}}} = 2.04$$
Critical value:
Tabulated value of z at 5% level of significance for right tail test is 1.645.
Conclusion:
Since Calculated value of z > tabulated value of z.
We reject the null hypothesis. The majority of men in the city are smokers.
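The single-proportion test above reduces to one line of arithmetic. Here is a small sketch of it, assuming a helper named `one_proportion_z` (a name introduced only for this illustration), with the standard error computed from the hypothesised proportion P as in the worked solution.

```python
import math


def one_proportion_z(successes, n, p0):
    """z statistic for testing a sample proportion against a hypothesised P = p0."""
    p_hat = successes / n
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # sqrt(PQ/n) under H0
    return (p_hat - p0) / standard_error


# Smokers problem: 325 smokers out of 600 men, H0: P = 0.5.
z_smokers = one_proportion_z(325, 600, 0.5)
print(round(z_smokers, 2))
```

The same function applies to any large-sample proportion problem in this section by changing the counts and the hypothesised proportion.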
2. 40 people were attacked by a disease and only 36 survived. Will you reject the hypothesis
that the survival rate, if attacked by this disease, is 85% at the 5% level of significance?
Solution:
Given
The sample proportion, $p = \dfrac{36}{40} = 0.90$
Population proportion $P = 0.85$, $Q = 1 - P = 1 - 0.85 = 0.15$
Null Hypothesis $H_0$: $P = 0.85$, i.e., there is no significant difference in the survival rate.
Alternative Hypothesis $H_1$: $P \neq 0.85$,
i.e., there is a significant difference in the survival rate.
Level of Significance: $\alpha = 0.05$
Test Statistic:
$$z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.90 - 0.85}{\sqrt{\dfrac{0.85 \times 0.15}{40}}} = 0.886$$
Table value: The tabulated value of z at the 5% level of significance is 1.96.
Conclusion:
Here calculated value < table value.
So we accept $H_0$. Hence the hypothesis that the survival rate is 85% cannot be rejected.
4. A salesman in a departmental store claims that at most 60 percent of the shoppers entering
the store leave without making a purchase. A random sample of 50 shoppers showed that 35
of them left without making a purchase. Are these sample results consistent with the
claim of the salesman? Use an LOS of 0.05.
Solution:
Let p = sample proportion of shoppers not making a purchase $= \dfrac{35}{50} = 0.7$
P = population proportion of shoppers not making a purchase $= 60\% = \dfrac{60}{100} = 0.6$,
and Q = 1 - P = 0.4
$H_0$: $P = 0.6$, i.e., the claim is accepted
$H_1$: $P \neq 0.6$ (two-tailed test)
The test statistic is
$$z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}} = \frac{0.7 - 0.6}{\sqrt{\dfrac{0.6 \times 0.4}{50}}} = 1.443$$
From the table, $z_{0.05} = 1.96$. Since $|z| < z_{0.05}$, $H_0$ is accepted.
Conclusion:
The sample results are consistent with the claim of the salesman.
Problem based on Test of significance for Two proportions:
To test the significance of the difference between the sample proportions $p_1$ and $p_2$, given the population
proportion P, we use the test statistic
$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}, \quad \text{where } Q = 1 - P$$
If P is not known, then $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$
Confidence Interval:
The confidence interval for the difference between two population proportions for large samples is
$$(p_1 - p_2) \pm z_{\alpha}\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}$$
1. Before an increase in excise duty on tea, 800 people out of a sample of 1000 were consumers
of tea. After the increase in duty, 800 people were consumers of tea in a sample of 1200
persons. Find whether there is a significant decrease in the consumption of tea after the
increase in duty. Also find the confidence limits.
Solution:
Given $n_1 = 1000$, $n_2 = 1200$
$p_1$ = proportion of tea drinkers before the increase in excise duty $= \dfrac{800}{1000} = 0.8$
$p_2$ = proportion of tea drinkers after the increase in excise duty $= \dfrac{800}{1200} = 0.6667$
Null hypothesis: $H_0: P_1 = P_2$, i.e., there is no significant difference in the consumption of tea before and
after the increase in excise duty.
Alternate hypothesis: $H_1: P_1 > P_2$, i.e., there is a significant decrease in the consumption of tea
after the increase in excise duty.
Level of significance: 5% = 0.05 (one-tailed)
Test Statistic:
$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(0.8)(1000) + (0.6667)(1200)}{1000 + 1200} = 0.7273, \qquad Q = 1 - P = 1 - 0.7273 = 0.2727$$
$$z = \frac{0.8 - 0.6667}{\sqrt{(0.7273)(0.2727)\left(\dfrac{1}{1000} + \dfrac{1}{1200}\right)}} = \frac{0.1333}{0.01907} = 6.99$$
Critical value: the critical value of z at the 5% level of significance (one-tailed) is 1.645.
Conclusion:
Here calculated value > table value.
We reject $H_0$.
Hence there is a significant decrease in the consumption of tea after the increase in excise
duty.
Confidence Interval:
The confidence interval for the difference between two population proportions for large samples is
$$(p_1 - p_2) \pm z_{\alpha}\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = (0.8 - 0.667) \pm 1.645\sqrt{(0.7273)(0.2727)\left(\frac{1}{1000} + \frac{1}{1200}\right)}$$
$$= (0.1016,\ 0.1644)$$
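The pooled two-proportion statistic and its interval can be sketched in a few lines. This is an illustrative sketch only; the function name `two_proportion_z` and its return layout are assumptions, and the pooled proportion P is estimated from the two samples as in the formula for the case where P is not known.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Return (z, standard_error) for the difference of two sample proportions.

    The common proportion P is estimated by pooling: P = (x1 + x2) / (n1 + n2).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / standard_error, standard_error


# Tea problem: 800 of 1000 before the duty increase, 800 of 1200 after.
z_tea, se_tea = two_proportion_z(800, 1000, 800, 1200)
difference = 800 / 1000 - 800 / 1200
limits = (difference - 1.645 * se_tea, difference + 1.645 * se_tea)
print(round(z_tea, 2), tuple(round(v, 4) for v in limits))
```

Because the code keeps full precision for $p_1 - p_2$, the limits it prints can differ from hand-rounded limits in the last decimal places.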
2. Random samples of 400 men and 600 women were asked whether they would like to have a
flyover near their residence. 200 men and 325 women were in favor of the proposal. Test the
hypothesis that the proportions of men and women in favor of the proposal are the same against
the alternative that they are not, at the 5% level.
Solution:
Given $n_1 = 400$, $n_2 = 600$
$p_1$ = proportion of men in favor $= \dfrac{200}{400} = 0.5$
$p_2$ = proportion of women in favor $= \dfrac{325}{600} = 0.5417$
Null hypothesis: $H_0: P_1 = P_2$, i.e., there is no significant difference between the
opinions of men and women as far as the proposal of a flyover is concerned.
Alternate hypothesis: $H_1: P_1 \neq P_2$, i.e., there is a significant difference between the
opinions of men and women as far as the proposal of a flyover is concerned.
Level of significance: 5% = 0.05 (two-tailed)
Test Statistic:
$$z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
where
$$P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(400)(0.5) + (600)(0.5417)}{400 + 600} = 0.525, \qquad Q = 1 - P = 1 - 0.525 = 0.475$$
$$z = \frac{0.5 - 0.5417}{\sqrt{(0.525)(0.475)\left(\dfrac{1}{400} + \dfrac{1}{600}\right)}} = \frac{-0.0417}{0.0322} = -1.29, \qquad |z| = 1.29$$
Critical value: the critical value of z at the 5% level of significance is 1.96.
Conclusion:
Here calculated value < table value.
We accept $H_0$ at the 5% level of significance.
Hence there is no difference between the opinions of men and women as far as the proposal
of a flyover is concerned.
3. A machine puts out 16 imperfect articles in a sample of 500. After the machine is
overhauled, it puts out 3 imperfect articles in a batch of 100. Has the machine improved?
Solution:
Hypothesis:
$H_0: P_1 = P_2$
$H_1: P_1 > P_2$
Level of Significance: $\alpha = 0.05$
Test Statistic:
$$Z = \frac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$$
Analysis:
The sample proportions are
$$p_1 = \frac{16}{500} = 0.032, \quad p_2 = \frac{3}{100} = 0.03, \quad P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} \approx 0.032, \quad Q = 1 - P = 0.968$$
$$Z = \frac{0.032 - 0.03}{\sqrt{(0.032)(0.968)\left(\dfrac{1}{500} + \dfrac{1}{100}\right)}} = 0.1037$$
Table value: $Z_{\alpha} = 1.645$
Conclusion:
Calculated value < table value.
Hence we accept the null hypothesis and conclude that the machine has not improved after
overhauling.
UNIT-IV DESIGN OF EXPERIMENT
The sequence of steps taken to ensure a scientific analysis leading to valid inferences about
the hypothesis is called design of experiment. The main aim of the design of experiments is to
control the extraneous variables and hence to minimize the experimental error so that the
results of the experiments could be attributed only to the experimental variables.
The basic principles of design of experiments:
(i) Randomization
(ii) Replication
(iii) Local Control
Basic design of Experiments:
Depending on the number of extraneous variables whose effects are to be controlled, various
design procedures are developed in the study of experimental design. We shall consider here
three important designs.
(1) Completely randomized Design (C.R.D)
(2) Randomized Block Design (R.B.D)
(3) Latin Square Design (LSD)
ANOVA:
Analysis of Variance is a technique that will enable us to test for the significance of the
difference among more than two sample means.
Assumptions of analysis of variance:
(i) The sample observations are independent
(ii) The environmental effects are additive in nature
(iii) The samples have been randomly selected from the population.
(iv) The parent population from which the observations are taken is normal.
One Way Classification (or) Completely randomized Design (C.R.D)
The C.R.D is the simplest of all the designs, based on principles of randomization and
replication. In this design, treatments are allocated at random to the experimental units over the
entire experimental materials.
Advantages of completely randomized design:
The advantages of the completely randomized experimental design are as follows:
(i) Easy to lay out
(ii) Allows flexibility
(iii) Simple statistical analysis
(iv) Loss of information due to missing data is smaller than with any other design
Working Procedure ( One – Way classification )
Analysis:
Step 1: Find N = number of observations
Step 2: Find T = the total value of observations
Step 3: Find the correction factor $C.F = \dfrac{T^2}{N}$
Step 4: Total sum of squares $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \cdots - C.F$
Step 5: Column sum of squares $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_2} + \dfrac{(\sum X_3)^2}{N_3} + \cdots - C.F$
Step 7: Error sum of squares $SSE = TSS - SSC$
ANOVA table:
Source of Variation   Sum of Squares   Degrees of freedom   Mean Square           F-Ratio
Between Columns       SSC              c - 1                MSC = SSC / (c - 1)   F_C = MSC / MSE if MSC > MSE
                                                                                  (or)
Error                 SSE              N - c                MSE = SSE / (N - c)   F_C = MSE / MSC if MSE > MSC
Step 8: Conclusion:
If calculated value < table value, then we accept the null hypothesis $H_0$ (or)
if calculated value > table value, then we reject the null hypothesis $H_0$.
1. The following are the number of mistakes made in 5 successive days by 4 technicians
working for a photographic laboratory. Test whether the difference among the four
sample means can be attributed to chance. (Test at the level of significance $\alpha = 0.01$.)
Technicians
I II III IV
6 14 10 9
14 9 12 12
10 12 7 8
8 10 15 10
11 14 11 11
Solution:
Step 4: $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - C.F = 517 + 717 + 639 + 510 - 2268.45 = 114.55$
Step 5:
$$SSC = \frac{(\sum X_1)^2}{N_1} + \frac{(\sum X_2)^2}{N_1} + \frac{(\sum X_3)^2}{N_1} + \frac{(\sum X_4)^2}{N_1} - C.F = \frac{(49)^2}{5} + \frac{(59)^2}{5} + \frac{(55)^2}{5} + \frac{(50)^2}{5} - 2268.45$$
$$SSC = 480.2 + 696.2 + 605 + 500 - 2268.45 = 12.95$$
Step 7: $SSE = TSS - SSC = 114.55 - 12.95 = 101.6$
Step 8: ANOVA TABLE:
Source of Variation   Sum of Squares   Degrees of freedom     Mean Square                   F-Ratio
Between Columns       SSC = 12.95      C - 1 = 4 - 1 = 3      MSC = SSC / (C - 1) = 4.317   F_C = MSE / MSC = 6.35 / 4.317 = 1.471
Error                 SSE = 101.6      N - C = 20 - 4 = 16    MSE = SSE / (N - C) = 6.35
Cal FC = 1.471
Table value : FC (16,3)=5.29
Conclusion : Cal FC < Tab FC
There is no significant difference between the technicians.
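The correction-factor procedure for one-way classification translates directly into code. The sketch below is an illustration under stated assumptions: the function name `one_way_anova` and the dictionary keys are invented here, and the function returns the sums of squares and mean squares only, leaving the choice of F-ratio direction and the table lookup to the reader.

```python
def one_way_anova(groups):
    """Sums of squares for a one-way classification via the correction factor.

    `groups` is a list of columns (treatments); columns may have unequal sizes.
    """
    n_total = sum(len(g) for g in groups)
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / n_total                       # C.F = T^2 / N
    tss = sum(x * x for g in groups for x in g) - cf      # total sum of squares
    ssc = sum(sum(g) ** 2 / len(g) for g in groups) - cf  # between columns
    sse = tss - ssc                                       # error
    df_columns = len(groups) - 1
    df_error = n_total - len(groups)
    return {
        "CF": cf, "TSS": tss, "SSC": ssc, "SSE": sse,
        "MSC": ssc / df_columns, "MSE": sse / df_error,
    }


# Mistakes made by the four technicians on five successive days.
technicians = [
    [6, 14, 10, 8, 11],
    [14, 9, 12, 10, 14],
    [10, 12, 7, 15, 11],
    [9, 12, 8, 10, 11],
]
result = one_way_anova(technicians)
print({key: round(value, 3) for key, value in result.items()})
```

Passing columns of unequal length, as in a completely randomized design with unequal replication, works without any change.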
2. A completely randomized design experiment with 10 plots and 3 treatments gave the
following results.
Plot No   : 1 2 3 4 5 6 7 8 9 10
Treatment : A B C A C C A B A B
Yield     : 5 4 3 7 5 1 3 4 1 7
Analyse the results for treatment effects.
Solution:
A B C
5 4 3
7 4 5
3 7 1
1
Null Hypothesis H0: There is no significant difference in treatments
Alternate Hypothesis H1: There is a significant difference in treatments
        X1   X2   X3   TOTAL   X1^2   X2^2   X3^2
        5    4    3    12      25     16     9
        7    4    5    16      49     16     25
        3    7    1    11      9      49     1
        1    -    -    1       1      -      -
Total   16   15   9    40      84     81     35
Step 1: N = Total No. of Observations = 10
Step 2: T = Grand Total = 40
Step 3: Correction Factor $= \dfrac{(\text{Grand total})^2}{\text{Total No. of Observations}} = \dfrac{T^2}{N} = \dfrac{40^2}{10} = 160$
Step 4: $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - C.F = 84 + 81 + 35 - 160 = 40$
Step 5: $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_2} + \dfrac{(\sum X_3)^2}{N_3} - C.F = \dfrac{(16)^2}{4} + \dfrac{(15)^2}{3} + \dfrac{(9)^2}{3} - 160$
$SSC = 64 + 75 + 27 - 160 = 6$
ANOVA table:
Source of Variation   Sum of Squares   Degrees of freedom    Mean Square                 F-Ratio
Between Columns       SSC = 6          C - 1 = 3 - 1 = 2     MSC = SSC / (C - 1) = 3     F_C = MSE / MSC = 4.86 / 3 = 1.62
Error                 SSE = 34         N - C = 10 - 3 = 7    MSE = SSE / (N - C) = 4.86
Cal FC = 1.62
Table value : FC (7,2) = 19.35
Conclusion : Cal FC < Tab FC
We accept the Null Hypothesis. There is no significant difference in treatments.
3. As head of the department of a consumer research organization, you have the responsibility of
testing and comparing the lifetimes of 4 brands of electric bulbs. Suppose you test the lifetime of 3 electric
bulbs of each of the 4 brands. The data are given below, each entry representing the lifetime of an electric
bulb, measured in hundreds of hours.
A B C D
20 25 24 23
19 23 20 20
21 21 22 20
Solution:
H0: The population means are equal.
H1: The population means are not all equal.
Step 3: Correction Factor $= \dfrac{(\text{Grand total})^2}{\text{Total No. of Observations}} = \dfrac{T^2}{N} = \dfrac{258^2}{12} = 5547$
Step 4: $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - C.F = 5586 - 5547 = 39$
Step 5: $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_1} + \dfrac{(\sum X_3)^2}{N_1} + \dfrac{(\sum X_4)^2}{N_1} - C.F = 5562 - 5547 = 15$
ANOVA table:
Source of Variation   Sum of Squares           Degrees of freedom    Mean Square               F-Ratio
Between Columns       SSC = 15                 C - 1 = 4 - 1 = 3     MSC = SSC / (C - 1) = 5   F_C = MSC / MSE = 5 / 3 = 1.667
Error                 SSE = 39 - 15 = 24       N - C = 12 - 4 = 8    MSE = SSE / (N - C) = 3
Analysis:
Step 1: Find N = number of observations
Step 2: Find T = the total value of observations
Step 3: Find the correction factor $C.F = \dfrac{T^2}{N}$
Step 5: Find the column sum of squares $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_2} + \dfrac{(\sum X_3)^2}{N_3} + \cdots - C.F$
where $N_i$ = total number of observations in each column ($i = 1, 2, 3, \ldots$)
Step 6: Find the row sum of squares $SSR = \dfrac{(\sum Y_1)^2}{N_2} + \dfrac{(\sum Y_2)^2}{N_2} + \dfrac{(\sum Y_3)^2}{N_2} + \cdots - C.F$
where $N_2$ = number of observations in each row
Step 8: Find the table value for both $F_C$ and $F_R$ (use table)
Step 9: Conclusion:
If calculated value < table value, then we accept the null hypothesis $H_0$ (or)
if calculated value > table value, then we reject the null hypothesis $H_0$.
ii) Is there significant difference between the seasons?
Solution:
Null Hypothesis $H_0$: There is no significant difference between the sales in the 3 seasons and
also between the sales of the 4 salesmen.
Alternate Hypothesis $H_1$: There is a significant difference between the sales in the 3 seasons and
also between the sales of the 4 salesmen.
Test statistic:
To simplify calculations we deduct 30 from each value.
Seasons        A (X1)   B (X2)   C (X3)   D (X4)   Seasons Total   X1^2   X2^2   X3^2   X4^2
Y1 Summer      6        6        -9       5        8               36     36     81     25
Y2 Winter      -2       -1       1        2        0               4      1      1      4
Y3 Monsoon     -4       -2       -1       -1       -8              16     4      1      1
Total          0        3        -9       6        0               56     41     83     30
Step 4: $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - C.F = 56 + 41 + 83 + 30 - 0 = 210$
Step 5: $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_1} + \dfrac{(\sum X_3)^2}{N_1} + \dfrac{(\sum X_4)^2}{N_1} - C.F = \dfrac{0^2}{3} + \dfrac{3^2}{3} + \dfrac{(-9)^2}{3} + \dfrac{6^2}{3} - 0$
$SSC = 0 + 3 + 27 + 12 - 0 = 42$
Step 6: $SSR = \dfrac{(\sum Y_1)^2}{N_2} + \dfrac{(\sum Y_2)^2}{N_2} + \dfrac{(\sum Y_3)^2}{N_2} - C.F = \dfrac{(8)^2}{4} + \dfrac{0^2}{4} + \dfrac{(-8)^2}{4} - 0 = 16 + 0 + 16 - 0 = 32$
Step 7: $SSE = TSS - SSC - SSR = 210 - 42 - 32 = 136$
Step 8: ANOVA TABLE:
Source of Variation           Sum of Squares   Degrees of Freedom    Mean Sum of Squares                     Variance (F) ratio                        Table value
Between Columns (Salesmen)    SSC = 42         c - 1 = 4 - 1 = 3     MSC = SSC / (c - 1) = 42 / 3 = 14       F_C = MSE / MSC = 22.67 / 14 = 1.619      F_C(6, 3) = 8.94
Between Rows (Seasons)        SSR = 32         r - 1 = 3 - 1 = 2     MSR = SSR / (r - 1) = 32 / 2 = 16       F_R = MSE / MSR = 22.67 / 16 = 1.417      F_R(6, 2) = 19.33
Error                         SSE = 136        N - c - r + 1 = 6     MSE = SSE / (N - c - r + 1) = 22.67
Total                         210              11
Table value of F at the 5% level of significance: $F_C(\text{Error d.f.}, \text{Column d.f.}) = F_C(6, 3) = 8.94$, $F_R(\text{Error d.f.}, \text{Row d.f.}) = F_R(6, 2) = 19.33$
Conclusion:
1) Cal $F_C$ < Table $F_{C,\,0.05}(6, 3)$.
Hence we accept $H_0$ and conclude that there is no significant difference between the
sales of the 4 salesmen.
2) Cal $F_R$ < Table $F_{R,\,0.05}(6, 2)$.
Hence we accept $H_0$ and conclude that there is no significant difference between the sales in
the three seasons.
2. The following data represent the number of units of production per day turned out by
different workers using 4 different types of machines.
                 Machine type
Workers      A     B     C     D
   1         44    38    47    36
   2         46    40    52    43
   3         34    36    44    32
   4         43    38    46    33
   5         38    42    49    39
(1) Test whether the five men differ with respect to mean productivity and
(2) Test whether the mean productivity is the same for the four different machine types.
Solution:
Null Hypothesis H0: There is no significant difference between the machine types or between the workers.
Alternate Hypothesis H1: There is a significant difference between the machine types and between the workers.
Test statistic:
To simplify calculations we deduct 46 from each value.
                    Machine type
Workers    A (X1)   B (X2)   C (X3)   D (X4)   Workers Total   X1^2   X2^2   X3^2   X4^2
Y1         -2       -8       1        -10      -19             4      64     1      100
Y2         0        -6       6        -3       -3              0      36     36     9
Y3         -12      -10      -2       -14      -38             144    100    4      196
Y4         -3       -8       0        -13      -24             9      64     0      169
Y5         -8       -4       3        -7       -16             64     16     9      49
Total      -25      -36      8        -47      -100            221    280    50     523
Step 5: $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_1} + \dfrac{(\sum X_3)^2}{N_1} + \dfrac{(\sum X_4)^2}{N_1} - C.F = 338.8$
Step 6:
$$SSR = \frac{(\sum Y_1)^2}{N_2} + \frac{(\sum Y_2)^2}{N_2} + \frac{(\sum Y_3)^2}{N_2} + \frac{(\sum Y_4)^2}{N_2} + \frac{(\sum Y_5)^2}{N_2} - C.F$$
$$= \frac{361}{4} + \frac{9}{4} + \frac{1444}{4} + \frac{576}{4} + \frac{256}{4} - 500 = 661.5 - 500 = 161.5$$
$SSR = 161.5$
where $N_2$ = number of elements in each row = 4
Step 7: $SSE = TSS - SSC - SSR = 574 - 338.8 - 161.5 = 73.7$
Step 8: ANOVA TABLE:
Source of Variation               Sum of Squares   Degrees of Freedom                 Mean Sum of Squares                        Variance (F) ratio                        Table value
Between Columns (Machine types)   SSC = 338.8      c - 1 = 4 - 1 = 3                  MSC = SSC / (c - 1) = 338.8 / 3 = 112.9    F_C = MSC / MSE = 112.9 / 6.14 = 18.39    F_C(3, 12) = 3.49
Between Rows (Workers)            SSR = 161.5      r - 1 = 5 - 1 = 4                  MSR = SSR / (r - 1) = 161.5 / 4 = 40.4     F_R = MSR / MSE = 40.4 / 6.14 = 6.58      F_R(4, 12) = 3.26
Error                             SSE = 73.7       N - c - r + 1 = 20 - 4 - 5 + 1 = 12   MSE = SSE / (N - c - r + 1) = 73.7 / 12 = 6.14
Total                             574              19
Table value of F at the 5% level of significance: $F_C(3, 12) = 3.49$, $F_R(4, 12) = 3.26$
Conclusion:
1) Cal $F_C$ > table value.
Hence we reject $H_0$ and conclude that there is a significant difference between the
machine types.
2) Cal $F_R$ > table value.
Hence we reject $H_0$ and conclude that there is a significant difference between the
workers.
Conclusion: There is a significant difference between the machine types and a significant difference
between the workers.
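The two-way procedure can be sketched as a short function. This is a hedged illustration, not the notes' own method of presentation: the name `two_way_anova` and the returned keys are assumptions, and the F-ratios are returned as MSC/MSE and MSR/MSE without the "larger over smaller" swap used in the tables. It works on the raw data, since deducting a constant such as 46 does not change any sum of squares.

```python
def two_way_anova(table):
    """Two-way classification (one observation per cell) via the correction factor.

    `table` is a list of rows; every row has one value per column.
    """
    n_rows, n_cols = len(table), len(table[0])
    n_total = n_rows * n_cols
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n_total
    tss = sum(x * x for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / n_cols - cf        # between rows
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / n_rows - cf  # between columns
    sse = tss - ssc - ssr
    mse = sse / ((n_rows - 1) * (n_cols - 1))  # equals N - c - r + 1 degrees of freedom
    msc = ssc / (n_cols - 1)
    msr = ssr / (n_rows - 1)
    return {"TSS": tss, "SSC": ssc, "SSR": ssr, "SSE": sse,
            "F_C": msc / mse, "F_R": msr / mse}


# Units produced per day: rows are workers 1-5, columns are machine types A-D.
production = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
print({key: round(value, 2) for key, value in two_way_anova(production).items()})
```

Small differences from hand-computed F-ratios can arise because the code does not round the mean squares before dividing.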
3. A laboratory technician measures the breaking strength of each of 5 kinds of linen threads
by using 4 different measuring instruments, and obtains the following results, in ounces.
I1 I2 I3 I4
Thread 1 20.9 20.4 19.9 21.9
Thread 2 25 26.2 27.0 24.8
Thread 3 25.5 23.1 21.5 24.4
Thread 4 24.8 21.2 23.5 25.7
Thread 5 19.6 21.2 22.1 22.1
Perform a 2 – way ANOVA using the 0.05 level of significance for both tests.
Solution:
Null Hypothesis: There is no significant difference in the breaking strength of the various
threads, $H_{01}: \mu_1 = \mu_2 = \mu_3 = \mu_4 = \mu_5$, and no significant difference between the instruments, $H_{02}: \mu_1 = \mu_2 = \mu_3 = \mu_4$
Alternate Hypothesis: There is a significant difference in the breaking strength of the various
threads, $H_{11}: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4 \neq \mu_5$, and between the instruments, $H_{12}: \mu_1 \neq \mu_2 \neq \mu_3 \neq \mu_4$
ANOVA TABLE
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Sum of Squares   F-ratio
Between Rows          66.393           R - 1 = 4            16.598                F_R = 16.598 / 2.078 = 7.987
Table value of F: $F_{0.05}(4, 12) = 3.26$ and $F_{0.05}(12, 3) = 8.74$
Conclusion:
Cal $F_R$ = 7.987 > Tab $F_R$ = 3.26, so there is a significant difference between threads.
4. Four varieties A, B, C, D of a fertilizer are tested in a randomized block design with 4
replications. The plot yields in pounds are as follows:
Column / Row 1 2 3 4
N = 16
T = Grand Total = 236
Correction Factor $= \dfrac{(\text{Grand total})^2}{\text{Total No. of Observations}} = \dfrac{(236)^2}{16} = 3481$
$SSC = \dfrac{\sum T_{*j}^2}{h} - C.F = 841 + 900 + 930 + 812 - 3481 = 2$
$SSR = \dfrac{\sum T_{i*}^2}{k} - C.F = 729 + 484 + 930.25 + 1482.25 - 3481 = 144.5$
ANOVA Table
Source of Variation   Sum of Squares   Degrees of Freedom     Mean Square    F-ratio       Table value
Between varieties     SSR = 144.5      h - 1 = 3              MSR = 48.17    FR = 39.27    F5%(3, 9) = 3.86
Between blocks        SSC = 2          k - 1 = 3              MSC = 0.67     FC = 0.545    F5%(3, 9) = 3.86
Residual              SSE = 10.5       (h - 1)(k - 1) = 9     MSE = 1.17
Conclusion: Cal FC < Tab FC and Cal FR > Tab FR. Therefore the null hypothesis is rejected for the varieties: the four
varieties are not similar. The blocks, however, do not differ significantly.
Working Procedure (Three-Way classification)
Data from a Latin square experiment result in a three-way classification, say
(i) variety of seeds
(ii) types of spacing (rows)
(iii) the letters for the different manure treatments
H0: There is no difference between columns, between rows and between treatments
H1: Not all are equal.
Step 5: Find the column sum of squares $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_2} + \dfrac{(\sum X_3)^2}{N_3} + \cdots - C.F$
Step 6: Find the row sum of squares $SSR = \dfrac{(\sum Y_1)^2}{N_2} + \dfrac{(\sum Y_2)^2}{N_2} + \dfrac{(\sum Y_3)^2}{N_2} + \cdots - C.F$
ANOVA table:
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square           F-Ratio
Column Treatment      SSC              n - 1                MSC = SSC / (n - 1)   If MSC > MSE, F_C = MSC / MSE; if MSC < MSE, F_C = MSE / MSC
Row Treatments        SSR              n - 1                MSR = SSR / (n - 1)   If MSR > MSE, F_R = MSR / MSE; if MSR < MSE, F_R = MSE / MSR
Between Treatments    SSK              n - 1                MSK = SSK / (n - 1)   If MSK > MSE, F_K = MSK / MSE; if MSK < MSE, F_K = MSE / MSK
1. Analyse the variance in the following latin square of yields (in kgs) of paddy where A, B,
C, D denote the different methods of cultivation.
$SSR = \dfrac{\sum T_{i*}^2}{n} - C.F = 81 - \dfrac{(30)^2}{16} = 24.75$
$SSC = \dfrac{\sum T_{*j}^2}{n} - C.F = 59 - \dfrac{(30)^2}{16} = 2.75$
$SSL = \dfrac{\sum T_{L}^2}{n} - C.F = 60.5 - \dfrac{(30)^2}{16} = 4.25$, where $T_L$ denotes the total for each letter
Source of Variation   Sum of Squares   Degrees of Freedom      Mean Square    F-ratio       Table value
Between Rows          SSR = 24.75      n - 1 = 3               MSR = 8.25     FR = 12.31    FR(3, 6) = 4.76
Between Columns       SSC = 2.75       n - 1 = 3               MSC = 0.92     FC = 1.37     FC(3, 6) = 4.76
Between Letters       SSL = 4.25       n - 1 = 3               MSL = 1.42     FL = 2.12     FL(3, 6) = 4.76
Residual              SSE = 4          (n - 1)(n - 2) = 6      MSE = 0.67
Total                 35.75
Conclusion :
Cal FC< Tab FC , Cal FL< Tab FL and Cal FR> Tab FR There is significant difference between the rows ,
no significant difference between the letters and no significant difference between the columns
2. The following is a Latin square of a design when 4 varieties of seeds are being tested. Set
up the analysis of variance table and state your conclusion. You may carry out suitable
change of origin and scale.
A 105 B 95 C 125 D 115
C 115 D 125 A 105 B 105
D 115 C 95 B 105 A 115
B 95 A 135 D 95 C 115
(APRIL / MAY ‘17)
Solution:
Taking $u = \dfrac{x - 100}{5}$ as the change of origin and scale, the coded values are:
Variety   X1   X2   X3   X4   TOTAL   X1^2   X2^2   X3^2   X4^2
Y1        1    -1   5    3    8       1      1      25     9
Y2        3    5    1    1    10      9      25     1      1
Y3        3    -1   1    3    6       9      1      1      9
Y4        -1   7    -1   3    8       1      49     1      9
Total     6    10   6    10   32      20     76     28     28
N = Total No. of Observations = 16, T = Grand Total = 32
Correction Factor $= \dfrac{(\text{Grand total})^2}{\text{Total No. of Observations}} = \dfrac{32^2}{16} = 64$
$TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - C.F = 20 + 76 + 28 + 28 - 64 = 88$
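$SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_1} + \dfrac{(\sum X_3)^2}{N_1} + \dfrac{(\sum X_4)^2}{N_1} - C.F = \dfrac{(6)^2}{4} + \dfrac{(10)^2}{4} + \dfrac{(6)^2}{4} + \dfrac{(10)^2}{4} - 64 = 4$
$SSR = \dfrac{(\sum Y_1)^2}{N_2} + \dfrac{(\sum Y_2)^2}{N_2} + \dfrac{(\sum Y_3)^2}{N_2} + \dfrac{(\sum Y_4)^2}{N_2} - C.F = \dfrac{(8)^2}{4} + \dfrac{(10)^2}{4} + \dfrac{(6)^2}{4} + \dfrac{(8)^2}{4} - 64 = 2$
To find SSK:
Treatment   1    2    3    4    Total
A           1    1    3    7    12
B           -1   1    1    -1   0
C           5    3    -1   3    10
D           3    5    3    -1   10
$SSK = \dfrac{(\sum Y_1)^2}{K_1} + \dfrac{(\sum Y_2)^2}{K_2} + \dfrac{(\sum Y_3)^2}{K_3} + \dfrac{(\sum Y_4)^2}{K_4} - C.F = \dfrac{(12)^2}{4} + \dfrac{(0)^2}{4} + \dfrac{(10)^2}{4} + \dfrac{(10)^2}{4} - 64 = 22$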
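The row, column and treatment sums of squares of a Latin square can be computed together. The sketch below is illustrative only; `latin_square_ss` is a hypothetical helper, and it takes the coded yields together with a matching grid of treatment letters, both read off the problem above with the coding $(x - 100)/5$.

```python
def latin_square_ss(values, letters):
    """Sums of squares for an n x n Latin square.

    `values[i][j]` is the (coded) yield in row i, column j and
    `letters[i][j]` is the treatment letter applied to that plot.
    """
    n = len(values)
    grand_total = sum(sum(row) for row in values)
    cf = grand_total ** 2 / (n * n)
    tss = sum(x * x for row in values for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in values) / n - cf
    ssc = sum(sum(col) ** 2 for col in zip(*values)) / n - cf
    treatment_totals = {}
    for value_row, letter_row in zip(values, letters):
        for value, letter in zip(value_row, letter_row):
            treatment_totals[letter] = treatment_totals.get(letter, 0) + value
    ssk = sum(t * t for t in treatment_totals.values()) / n - cf
    sse = tss - ssr - ssc - ssk
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SSK": ssk, "SSE": sse}


coded_yields = [
    [1, -1, 5, 3],
    [3, 5, 1, 1],
    [3, -1, 1, 3],
    [-1, 7, -1, 3],
]
layout = ["ABCD", "CDAB", "DCBA", "BADC"]
print(latin_square_ss(coded_yields, layout))
```

The error sum of squares is obtained by subtraction, with $(n - 1)(n - 2)$ degrees of freedom as in the Latin square ANOVA table.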
$2^2$ Factorial Design Experiment:
In a factorial experiment, the effects of several factors of variation are investigated simultaneously, the
treatments being all the combinations of the different factors under study.
Procedure:
1. Find N, T
2. Find the correction factor $C.F = \dfrac{T^2}{N}$
3. For a $2^2$ (or $2 \times 2$) factorial, find
$$SSA = \frac{1}{N}\left[a + ab - b - (1)\right]^2$$
$$SSB = \frac{1}{N}\left[b + ab - a - (1)\right]^2$$
$$SSAB = \frac{1}{N}\left[ab + (1) - a - b\right]^2$$
4. Find $SSE = SST - SSA - SSB - SSAB$
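The three contrast formulas can be sketched as follows. The function name `factorial_2x2_ss` is an assumption, and the treatment totals in the usage lines are hypothetical numbers chosen only to exercise the formulas; they are not taken from any problem in these notes.

```python
def factorial_2x2_ss(total_1, total_a, total_b, total_ab, n_obs):
    """Sums of squares for the main effects and interaction of a 2^2 factorial.

    total_1, total_a, total_b, total_ab are the treatment totals (1), a, b, ab
    and n_obs is N, the total number of observations.
    """
    contrast_a = total_a + total_ab - total_b - total_1
    contrast_b = total_b + total_ab - total_a - total_1
    contrast_ab = total_ab + total_1 - total_a - total_b
    return {
        "SSA": contrast_a ** 2 / n_obs,
        "SSB": contrast_b ** 2 / n_obs,
        "SSAB": contrast_ab ** 2 / n_obs,
    }


# Hypothetical treatment totals from 2 replicates (N = 8), for illustration only.
print(factorial_2x2_ss(total_1=10, total_a=14, total_b=12, total_ab=20, n_obs=8))
```

Each effect has 1 degree of freedom, so each mean square equals its sum of squares.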
ANOVA Table:
Source of Variation   Sum of Squares   Degrees of Freedom   Mean Square           F-Ratio
Between Column        SSC              n - 1                MSC = SSC / (n - 1)   If MSC > MSE, F_C = MSC / MSE; if MSC < MSE, F_C = MSE / MSC
Between Row           SSR              n - 1                MSR = SSR / (n - 1)   If MSR > MSE, F_R = MSR / MSE; if MSR < MSE, F_R = MSE / MSR
A                     SSA              1                    MSA = SSA             If MSA > MSE, F_a = MSA / MSE; if MSA < MSE, F_a = MSE / MSA
B                     SSB              1                    MSB = SSB             If MSB > MSE, F_b = MSB / MSE; if MSB < MSE, F_b = MSE / MSB
AB                    SSAB             1                    MSAB = SSAB           If MSAB > MSE, F_ab = MSAB / MSE; if MSAB < MSE, F_ab = MSE / MSAB
Problem:
1. Analyse the $2^2$ factorial experiment for the following table
Treatment   I    II   III  IV
(1)         64   75   76   75
(k)         25   14   12   33
(p)         30   50   41   25
(kp)        6    33   17   10
Solution:
Treatment   I    II   III  IV
(1)         64   75   76   75
(k)         25   14   12   33
(p)         30   50   41   25
(kp)        6    33   17   10
$SSR = \dfrac{\sum T_{i*}^2}{n} - C.F = 7782 - 4 = 7778$
$SSC = \dfrac{\sum T_{*j}^2}{n} - C.F = 171.5 - 4 = 167.5$
SSE = TSS - SSC - SSR = 8932 - 7778 - 167.5 = 986.5
[k] = [kp] - [p] + [k] - [1] = -300; [p] = [kp] + [p] - [k] - [1] = -148
$S_k = [k]^2 / 4r = 5625$; $S_p = [p]^2 / 4r = 1369$; $S_{kp} = [kp]^2 / 4r = 992.2$
ANOVA Table
Source of Variation   Sum of Squares   Degree of freedom   Mean Square            F-Ratio
k                     SSk = 5625       1                   MSA = SSA = 5625       F_a = MSA / MSE = 51.32
p                     SSp = 1369       1                   MSB = SSB = 1369       F_b = MSB / MSE = 12.49
kp                    SSkp = 992.2     1                   MSAB = SSAB = 992.2    F_ab = MSAB / MSE = 9.05
Error (or) Residual   SSE = 986.5      9                   MSE = SSE / 9 = 109.61
Table value: $F_{0.05}(1, 9) = 5.12$
Conclusion: Cal Fk > Tab Fk, Cal Fp > Tab Fp and Cal Fkp > Tab Fkp
There is a significant difference between the treatments.
UNIT – IV:
NON-PARAMETRIC TESTS
PROBLEMS
1. Calculate the expected frequencies for the following data presuming the two attributes, viz., condition
of home and condition of child, are independent.
                        Condition of home
Condition of child      Clean    Dirty
  Clean                 70       50
  Fair                  80       20
  Dirty                 35       45
Use the Chi-Square test at the 5% level of significance to state whether the two attributes are independent.
(AU NOV/DEC 2013)
Solution:
Null hypothesis: The two attributes are independent.
Alternative hypothesis: The two attributes are not independent.
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
follows the chi-square distribution with (r-1)(s-1) degrees of freedom.
$\chi^2 = 25.633$
Table value at the 5% level of significance (2 degrees of freedom) is 5.991.
Null hypothesis is rejected.
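The expected frequencies and the chi-square statistic can be generated from the observed table alone. The following is a minimal sketch; `chi_square_independence` is a name assumed for this illustration, and each expected frequency is computed as (row total × column total) / grand total.

```python
def chi_square_independence(observed):
    """Chi-square statistic, degrees of freedom and expected table for an r x s table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi_square = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi_square, degrees_of_freedom, expected


# Condition of child (rows: clean, fair, dirty) by condition of home (columns: clean, dirty).
home_child = [[70, 50], [80, 20], [35, 45]]
chi_square, dof, expected = chi_square_independence(home_child)
print(round(chi_square, 2), dof)
print([[round(e, 2) for e in row] for row in expected])
```

The statistic printed by the code may differ slightly in the second decimal place from a hand calculation that rounds the expected frequencies first.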
2. A group of 19 pilots were trained in three different methods video cassette, audio cassette and class
room training. Their scores in written exams were as follows:
Video cassette : 74 88 82 93 55 70
Audio cassette : 78 80 65 57 88
Class room training : 68 50 91 84 77 94 81 92
Test whether there is any difference in the effectiveness of the three methods. Use appropriate rank sum
test.
Solution:
Here three independent populations are given, so
we use the Kruskal-Wallis H-test.
Null Hypothesis: All the populations are identical.
$H_0: \mu_1 = \mu_2 = \mu_3$
Alternate Hypothesis: All the populations are not identical.
$H_1: \mu_1 \neq \mu_2 \neq \mu_3$
Level of significance: $\alpha = 0.05$
The test statistic:
$$H \ (\text{or } W) = \frac{12}{n(n+1)} \sum_{i=1}^{k} \frac{R_i^2}{n_i} - 3(n+1)$$
where
k = 3 (number of populations or samples)
$R_i$ = sum of the ranks of all items in sample i
Analysis:
Ranking the data jointly from 1 to 19, we find that
$R_1$ = sum of the ranks of all items in sample 1
= 7 + 14 + 12 + 18 + 2 + 6
= 59
$R_2$ = sum of the ranks of all items in sample 2
= 9 + 10 + 4 + 3 + 15
= 41
$R_3$ = sum of the ranks of all items in sample 3
= 5 + 1 + 16 + 13 + 8 + 19 + 11 + 17
= 90
$n_1$ = 6 (the no. of items in sample 1)
$n_2$ = 5 (the no. of items in sample 2)
$n_3$ = 8 (the no. of items in sample 3)
$n = \sum n_i = n_1 + n_2 + n_3 + \cdots + n_k = 6 + 5 + 8 = 19$
$$H \ (\text{or } W) = \frac{12}{n(n+1)} \left( \frac{R_1^2}{n_1} + \frac{R_2^2}{n_2} + \frac{R_3^2}{n_3} \right) - 3(n+1) = \frac{12}{19(20)} \left( \frac{59^2}{6} + \frac{41^2}{5} + \frac{90^2}{8} \right) - 3(20) = 0.91$$
The sampling distribution of W can be approximated by a $\chi^2$ distribution with
k - 1 = 3 - 1 = 2 degrees of freedom. For $\alpha = 0.05$, $\chi^2_{\alpha} = 5.991$.
Conclusion: Here $H < \chi^2_{\alpha}$, so we accept the null hypothesis $H_0$.
Hence we conclude that the given three methods are equally effective.
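Once the rank sums are known, the H statistic is a single formula. The sketch below assumes a helper named `kruskal_wallis_h` (introduced here for illustration) that takes the rank sums and sample sizes directly, so it reproduces the hand calculation rather than re-ranking the raw scores or correcting for ties.

```python
def kruskal_wallis_h(rank_sums, sample_sizes):
    """Kruskal-Wallis H from the rank sum R_i and size n_i of each sample."""
    n = sum(sample_sizes)
    weighted = sum(r * r / size for r, size in zip(rank_sums, sample_sizes))
    return 12 / (n * (n + 1)) * weighted - 3 * (n + 1)


# Rank sums and sizes for video cassette, audio cassette and class room training.
h_statistic = kruskal_wallis_h([59, 41, 90], [6, 5, 8])
print(round(h_statistic, 2))
```

A quick sanity check on the inputs is that the rank sums must add up to n(n + 1)/2, which is 190 for n = 19.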
3. Use the sign test to see if there is a difference between the number of days until collection of
an account receivable before and after a new collection policy. Use the 0.05 significance level.
Before: 30 28 34 35 40 42 33 38 34 45 28 27 25 41 36
After : 32 29 33 32 37 43 40 41 37 44 27 33 30 38 36
Solution:
Null Hypothesis: There is no significant difference between the two types of
collections.
(i.e.) $H_0: P = 0.5$
Alternate Hypothesis: There is a significant difference between the two types of
collections.
(i.e.) $H_1: P \neq 0.5$
Level of significance: $\alpha = 0.05$
The test statistic:
Find the sign of $d_i = x_i - y_i$:
= - - + + + - - - - + + - - + 0
By omitting zero differences, we get n = 14
No. of + signs = 6
No. of - signs = 8
p = 6/14 = 0.43, q = 8/14 = 0.57
np = 6 and nq = 8 are both greater than 5, so
we use the normal distribution.
The standard error of the proportion is
$$\sigma_p = \sqrt{\frac{pq}{n}} = \sqrt{\frac{(0.43)(0.57)}{14}} = 0.132$$
For $\alpha = 0.05$, $Z_{\alpha} = 1.96$
The confidence interval is $(P - \sigma_p Z_{\alpha},\ P + \sigma_p Z_{\alpha})$
i.e. $(0.5 - (0.132)(1.96),\ 0.5 + (0.132)(1.96))$
i.e. $(0.5 - 0.26,\ 0.5 + 0.26)$
i.e. $(0.24,\ 0.76)$
Conclusion:
Here the sample proportion p = 0.43 lies within these two limits, so we accept our
null hypothesis $H_0$.
Hence there is no significant difference between the two types of collections.
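The counting of signs and the acceptance limits can be sketched as below. The function name `sign_test_summary` and its return layout are assumptions for this illustration; it follows the worked solution in using the sample-based standard error $\sqrt{pq/n}$ around the hypothesised P = 0.5.

```python
import math


def sign_test_summary(before, after, z_critical=1.96):
    """Count signs of (before - after) and build the large-sample acceptance limits."""
    differences = [b - a for b, a in zip(before, after)]
    plus = sum(1 for d in differences if d > 0)
    minus = sum(1 for d in differences if d < 0)
    n = plus + minus                      # zero differences are omitted
    p = plus / n
    standard_error = math.sqrt(p * (1 - p) / n)
    lower = 0.5 - z_critical * standard_error
    upper = 0.5 + z_critical * standard_error
    return {"plus": plus, "minus": minus, "n": n, "p": p,
            "limits": (lower, upper), "accept_h0": lower <= p <= upper}


before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]
print(sign_test_summary(before, after))
```

Whether the proportion of plus signs or of minus signs is tested makes no difference to the decision, because the limits are symmetric about 0.5.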
4. Write the merits and demerits of non parametric tests.
Solution:
Null Hypothesis: There is no significant difference between the two methods
(i.e.) $H_0: \mu_1 = \mu_2$
Alternate Hypothesis: There is a significant difference between the two methods
(i.e.) $H_1: \mu_1 \neq \mu_2$
Level of significance: $\alpha = 5\%$
The test statistic:
$$Z = \frac{U - \mu_U}{\sigma_U} \quad \text{or} \quad z = \frac{U - E(U)}{\sqrt{\operatorname{var}(U)}}$$
Analysis:
For $\alpha = 5\%$, $Z_{\alpha} = 1.96$
Conclusion:
Since $|Z| < Z_{\alpha}$, we accept the null hypothesis $H_0$.
Hence, there is no significant difference between the two methods.
6. Kevin Morgan, national sales manager of an electronics firm, has collected the following salary
statistics on his field sales force earnings. He has both observed frequencies and expected
frequencies if the distribution of salaries is normal. At the 0.05 level of significance, can Kevin
conclude that the distribution of sales force earnings is normal?
Solution:
Null Hypothesis:
$H_0$: The distribution of sales force earnings is normal.
Alternate hypothesis:
H 1 : The distribution of salesforce earnings is not normal.
Level of significance: 0.05
7. The following contingency table presents the reactions of legislators to a tax plan according to party
affiliation. Test whether party affiliation influences the reaction to the tax plan at the 0.01 level of
significance.
                    Reaction
Party      In favour   Neutral   Opposed   Total
Party A    120         20        20        160
Party B    50          30        60        140
Party C    50          10        40        100
Total      220         60        120       400
Solution:
Given
Null hypothesis $H_0$: Party affiliation and tax plan are independent.
Alternate hypothesis $H_1$: Party affiliation and tax plan are not independent.
Level of significance: $\alpha = 0.01$
The test statistic:
$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Analysis:
Reaction
Party Infavour Neutral Opposed Total
Party A 120 20 20 160
Party B 50 30 60 140
Party C 50 10 40 100
Total 220 60 120 400
$E(120) = \dfrac{160 \times 220}{400} = 88$
$E(20) = \dfrac{160 \times 60}{400} = 24$
$E(20) = \dfrac{160 \times 120}{400} = 48$
$E(50) = \dfrac{140 \times 220}{400} = 77$
$E(30) = \dfrac{140 \times 60}{400} = 21$
$E(60) = \dfrac{140 \times 120}{400} = 42$
$E(50) = \dfrac{100 \times 220}{400} = 55$
$E(10) = \dfrac{100 \times 60}{400} = 15$
$E(40) = \dfrac{100 \times 120}{400} = 30$
O_ij   E_ij   O_ij - E_ij   (O_ij - E_ij)^2   (O_ij - E_ij)^2 / E_ij
120    88     32            1024              11.64
20     24     -4            16                0.67
20     48     -28           784               16.33
50     77     -27           729               9.47
30     21     9             81                3.86
60     42     18            324               7.71
50     55     -5            25                0.45
10     15     -5            25                1.67
40     30     10            100               3.33
                                              $\chi^2 = 55.13$
$\alpha = 0.01$, degrees of freedom = (r - 1)(s - 1) = (3 - 1)(3 - 1) = 4
$\chi^2_{\alpha} = 13.28$
Conclusion:
Since $\chi^2 > \chi^2_{\alpha}$, we reject our null hypothesis $H_0$.
Hence, the party affiliation and the tax plan are dependent.
8. A technician is asked to analyze the results of 22 items made in a preparation run. Each item has
been measured and compared to engineering specifications. The order of acceptances 'a' and
rejections 'r' is
aarrrarraa aaarrarraa ra
Determine whether it is a random sample or not. Use α = 0.05.
Solution:
Analysis:
aa rrr a rr aaaaa rr a rr aa r a
1 2 3 4 5 6 7 8 9 10 11
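R = No. of runs = 11
n1 = 12 (no. of items in the first sample, 'a')
n2 = 10 (no. of items in the second sample, 'r')
$E(R) = \frac{2 n_1 n_2}{n_1 + n_2} + 1 = \frac{2(12)(10)}{12 + 10} + 1 = 11.909$
$\sigma_R = \sqrt{\frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}} = \sqrt{\frac{2(12)(10)\,(2(12)(10) - 12 - 10)}{(12 + 10)^2 (12 + 10 - 1)}} = 2.2688$
$Z = \frac{R - E(R)}{\sigma_R} = \frac{11 - 11.909}{2.2688} = -0.4007$, so $|Z| = 0.4007$
At α = 0.05, $Z_\alpha = 1.96$
Conclusion:
Since $|Z| < Z_\alpha$, we accept the null hypothesis H0.
Hence, the sample is randomly chosen.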
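The run count and the Z value can be verified mechanically. The following is a minimal sketch of the large-sample runs test used above; the function name `runs_test` is an assumption made for illustration.

```python
import math


def runs_test(sequence):
    """Return (runs, n1, n2, expected runs, std. deviation, Z) for a two-symbol sequence."""
    symbols = sorted(set(sequence))
    if len(symbols) != 2:
        raise ValueError("sequence must contain exactly two distinct symbols")
    n1 = sequence.count(symbols[0])
    n2 = sequence.count(symbols[1])
    # A new run starts whenever the symbol changes
    runs = 1 + sum(1 for prev, cur in zip(sequence, sequence[1:]) if prev != cur)
    mean = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    sigma = math.sqrt(variance)
    return runs, n1, n2, mean, sigma, (runs - mean) / sigma


sequence = "aarrrarraa" + "aaarrarraa" + "ra"
runs, n1, n2, mean, sigma, z = runs_test(sequence)
print(runs, n1, n2, round(mean, 3), round(sigma, 4), round(z, 4))
```

With |Z| below 1.96 the hypothesis of randomness is retained, as in the worked solution.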
9. From a poll of 800 television viewers, the following data have been accumulated as to their
levels of education and their preference of television stations. We are interested in determining
if the selection of a TV station is independent of the level of education.
Educational Level
Station High School Bachelor Graduate Total
Public Broadcasting 50 150 80 280
Commercial Stations 150 250 120 520
Total 200 400 200 800
(i) State the null and alternative hypotheses.
(ii) Show the contingency table of the expected frequencies.
(iii) Compute the test statistic.
(iv) The null hypothesis is to be tested at 95% confidence. Determine the critical value for this test.
(AU JAN 2014)
Solution:
(i)Null Hypothesis: Selection of TV station is independent of level of education
Alternative Hypothesis: Selection of TV station is not independent of level of education
(ii)
Educational Level (expected frequencies)
Station High School Bachelor Graduate Total
Public Broadcasting 70 140 70 280
Commercial Stations 130 260 130 520
Total 200 400 200 800
10. The manager of a company believes that differences in sales performance depend upon the
salesperson’s age. Independent samples of salespeople were taken and their weekly sales record
is reported below.
Below 30 years Between 30 and 45 years Over 45 years
No. of Sales No. of Sales No. of Sales
24 23 30
16 17 20
21 22 23
15 25 25
19 18 34
26 29 36
27 28
(i) State the null and alternative hypotheses.
(ii) At 95% confidence, test the hypotheses using Kruskal Wallis Test. (AU JAN 2014)
Solution:
(i) Null Hypothesis: All three populations are identical.
Alternative Hypothesis: Not all populations are identical.
(ii) W = 6.78; critical χ² = 5.991 at the 0.05 level of significance with 2 degrees of freedom.
Hence, reject the null hypothesis.
Solution:
X Y X^2 Y^2 XY
1 9 1 81 9
2 8 4 64 16
3 10 9 100 30
4 12 16 144 48
5 11 25 121 55
6 13 36 169 78
7 14 49 196 98
8 16 64 256 128
9 15 81 225 135
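Total 45 108 285 1356 597
$\bar{x} = \frac{\sum x}{n} = \frac{45}{9} = 5$ and $\bar{y} = \frac{\sum y}{n} = \frac{108}{9} = 12$
$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - (\sum x)^2\right]\left[n\sum y^2 - (\sum y)^2\right]}} = 0.95$
$b_{xy} = \frac{n\sum xy - \sum x \sum y}{n\sum y^2 - (\sum y)^2} = 0.95$ and $b_{yx} = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2} = 0.95$
Regression line of y on x is $y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y = 0.95x + 7.25$
Regression line of x on y is $x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x = 0.95y - 6.4$
The value of y corresponding to x = 6.2 is y = 0.95(6.2) + 7.25 = 13.14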
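The sums in the table and the resulting coefficients can be reproduced as follows. This is a sketch only; `regression_summary` is a hypothetical helper name, and the data are the nine (X, Y) pairs tabulated above.

```python
import math


def regression_summary(xs, ys):
    """Return (r, b_yx, b_xy, mean of x, mean of y) from raw paired data."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xx = sum(x * x for x in xs)
    sum_yy = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    var_x = n * sum_xx - sum_x ** 2
    var_y = n * sum_yy - sum_y ** 2
    r = numerator / math.sqrt(var_x * var_y)
    return r, numerator / var_x, numerator / var_y, sum_x / n, sum_y / n


xs = list(range(1, 10))
ys = [9, 8, 10, 12, 11, 13, 14, 16, 15]
r, b_yx, b_xy, x_bar, y_bar = regression_summary(xs, ys)
# Regression line of y on x, evaluated at x = 6.2
y_at_6_2 = y_bar + b_yx * (6.2 - x_bar)
print(round(r, 2), round(b_yx, 2), round(b_xy, 2), round(y_at_6_2, 2))
```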
2. Explain the basic components of Time series analysis.
Solution:
The Components of a time series are
1. Secular Trend
2. Seasonal Variations
3. Cyclical Variations
4. Irregular Variations
1. Secular Trend
Trend, also called secular or long-term trend, is the basic tendency of a series to grow or decline
over a period of time. The concept of trend does not include short-range oscillations, but rather the
steady movement over a long time. If the values of the time series, plotted on graph paper,
cluster around a straight line, the trend is said to be a linear trend, and the rate of growth is nearly
constant. If the plotted points do not fall in the pattern of a straight line, the trend is said to be a
non-linear or curvilinear trend.
2. Seasonal Variations
Seasonal variations are variations which occur with some degree of regularity within a specific period
of one year or shorter. Seasons could be weekly, monthly, quarterly or half yearly depending on the
nature of the phenomenon. Production, consumption and prices of commodities show seasonal variations.
These variations are periodic and regular.
E.g. prices of agricultural commodities go down at the time of harvest.
Seasonal variations may occur due to the following causes:
(1) Climate and weather conditions
(2) Customs, traditions and habits
A study of seasonal variations is helpful in scheduling purchases, inventory control, personnel
recruitment, advertising, etc. A consumer can gain by purchasing things during the slack season.
3. Cyclical Variations
Cyclical variations in a time series are the recurrent variations whose duration is more than one year.
A business cycle has four phases: boom, recession, depression and recovery. These phases are uniform,
but their duration may vary from cycle to cycle. In spite of the importance of measuring cyclical
variations, they are very difficult to measure for the following reasons.
(i) Business cycles do not show regular periodicity.
(ii) The cyclical variations are associated with erratic, random or irregular forces.
4. Irregular Variations
Irregular variations refer to those variations in business or other activities which do not repeat in a
definite pattern. They are caused by random factors like floods, earthquakes, famines, wars, strikes,
lockouts, etc. Sudden changes in demand or very rapid technological progress may also be responsible
for these variations. No advance preparation can be made to meet the consequences of irregular
variations, and their effects are unpredictable.
3. Given below are the figures of production (in thousand quintals) of a sugar factory.
Year 1974 1975 1976 1977 1978 1979 1980
Production 77 88 94 85 91 98 90
Fit a straight line by the least squares method and tabulate the trend values.
Solution.
Year Production (y) X X^2 XY Trend Values
1974 77 -3 9 -231 83
1975 88 -2 4 -176 85
1976 94 -1 1 -94 87
1977 85 0 0 0 89
1978 91 1 1 91 91
1979 98 2 4 196 93
1980 90 3 9 270 95
Total 623 0 28 56
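The straight line trend equation is Y = a + bX.
Normal equations are
$\sum y = na + b\sum x$
$\sum xy = a\sum x + b\sum x^2$
If the sum of x is zero, then
$a = \frac{\sum y}{n} = \frac{623}{7} = 89$
$b = \frac{\sum xy}{\sum x^2} = \frac{56}{28} = 2$
Hence the trend line is Y = 89 + 2X, with X measured in years from the origin 1977; substituting X = -3, ..., 3 gives the trend values tabulated above.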
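The fit can be reproduced in a few lines. The sketch below codes the years as deviations from the middle year so that the coded X values sum to zero, which is the simplification the normal equations above rely on; the function name `fit_linear_trend` is illustrative.

```python
def fit_linear_trend(years, values):
    """Least-squares line Y = a + bX with X = year - mean year (so that sum of X is 0)."""
    n = len(years)
    origin = sum(years) / n
    xs = [year - origin for year in years]
    a = sum(values) / n
    b = sum(x * v for x, v in zip(xs, values)) / sum(x * x for x in xs)
    return origin, a, b


years = list(range(1974, 1981))
production = [77, 88, 94, 85, 91, 98, 90]
origin, a, b = fit_linear_trend(years, production)
trend_values = [a + b * (year - origin) for year in years]
print(origin, a, b, trend_values)
```

The same function can be used for forecasting by evaluating `a + b * (year - origin)` for a future year, with the usual caution that a straight line fitted to seven points is a rough extrapolation.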
5. The following data give the production (in '000 units) of a commodity for the years 2006-2012. Fit a
straight line trend and forecast for the year 2020.
Year 2006 2007 2008 2009 2010 2011 2012
Production 6 7 5 4 6 7 5
(AU NOV/DEC 2013)
Solution:
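Trend line: Y = a + bX, where X = x − 2009.
Normal equations are $\sum Y = na + b\sum X$
$\sum XY = a\sum X + b\sum X^2$
Here n = 7, ΣX = 0, ΣY = 40, ΣXY = −2 and ΣX² = 28, so
a = ΣY / n = 40 / 7 = 5.71,
b = ΣXY / ΣX² = −2 / 28 = −0.071
Y = 5.71 − 0.071X
When x = 2020, X = 11 and Y = 5.71 − 0.071(11) ≈ 4.93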
6. The monthly water consumption in thousand gallons in a hostel for five years is given below.
Calculate the seasonal indices by the method of simple averages
Year Jan Feb Mar Apr May June July Aug Sep Oct Nov Dec
1979 25 23 21 18 15 20 21 25 22 24 32 35
1980 27 25 23 20 17 22 23 27 24 26 35 33
1981 32 31 30 27 25 27 29 30 30 32 41 38
1982 42 40 38 36 34 37 38 40 38 43 52 48
1983 57 50 52 46 49 46 49 55 50 59 64 63
Solution:
Year Jan Feb Mar Apr May June July Aug Sep Oct Nov Dec
1979 25 23 21 18 15 20 21 25 22 24 32 35
1980 27 25 23 20 17 22 23 27 24 26 35 33
1981 32 31 30 27 25 27 29 30 30 32 41 38
1982 42 40 38 36 34 37 38 40 38 43 52 48
1983 57 50 52 46 49 46 49 55 50 59 64 63
Total 183 169 164 157 140 152 160 177 164 184 224 217
Avg. 36.6 33.8 32.8 31.4 28 30.4 32 35.4 32.8 36.8 44.8 43.4
Seasonal Index 105.02 96.99 94.12 90.10 80.34 87.23 91.82 101.58 94.12 105.60 128.55 124.53
General Average = (sum of the twelve monthly averages) / 12 = 418.2 / 12 = 34.85
Seasonal Index = (Monthly Average / General Average) × 100
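The last step of the method of simple averages can be sketched as below. The inputs are the twelve monthly averages exactly as tabulated in the "Avg." row above; the function name `seasonal_indices` is an assumption for illustration.

```python
def seasonal_indices(monthly_averages):
    """Seasonal index = monthly average / general average * 100."""
    general_average = sum(monthly_averages) / len(monthly_averages)
    indices = [100 * m / general_average for m in monthly_averages]
    return general_average, indices


# "Avg." row of the table, January to December
monthly_averages = [36.6, 33.8, 32.8, 31.4, 28, 30.4, 32, 35.4, 32.8, 36.8, 44.8, 43.4]
general_average, indices = seasonal_indices(monthly_averages)
print(round(general_average, 2), [round(i, 2) for i in indices])
```

By construction the twelve indices average to 100, which is a quick sanity check on any seasonal index table.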
7. The quarterly sales (in thousands of copies) for a specific education software over the past three
years are given in the following table.
2003 2004 2005
Quarter 1 170 180 190
Quarter 2 111 96 120
Quarter 3 270 280 290
Quarter 4 250 220 223
(i) Compute the four seasonal factors (Seasonal Indexes). Show all of your computations.
(ii) The trend for these data is Trend = 174 + 4t (t represents time, where t = 1 for Quarter 1 of 2003
and t = 12 for Quarter 4 of 2005). Forecast sales for the first quarter of 2006 using the trend and
seasonal indexes.
Show all of your computations. (AU JAN 2014)
Solution:
(i)
QUESTION BANK
I SEMESTER
Prepared by
UNIT –I : PROBABILITY
PART – A
1. If A and B are independent events, then prove that A and $\bar{B}$ are independent.
Since A and B are independent,
$P(A \cap B) = P(A)\,P(B)$ (1)
$P(A \cap \bar{B}) = P(A) - P(A \cap B) = P(A) - P(A)\,P(B)$ [using (1)]
$= P(A)\,[1 - P(B)]$
$P(A \cap \bar{B}) = P(A)\,P(\bar{B})$, so A and $\bar{B}$ are independent events.
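2. Let A and B be two events such that P(A) = 1/3, P(B) = 3/4, P(A ∩ B) = 1/4. Compute P(A/B) and
$P(\bar{A} \cap B)$. (May/June 2019)
$P(A/B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = \frac{1}{3}$
$P(\bar{A} \cap B) = P(B) - P(A \cap B) = \frac{3}{4} - \frac{1}{4} = \frac{2}{4} = \frac{1}{2}$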
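3. If A and B are two events such that P(A ∪ B) = 3/4, P(A ∩ B) = 1/4, $P(\bar{A})$ = 2/3, find $P(\bar{A}/B)$.
$P(A) = 1 - P(\bar{A}) = \frac{1}{3}$
$P(A \cup B) = P(A) + P(B) - P(A \cap B) \Rightarrow \frac{3}{4} = \frac{1}{3} + P(B) - \frac{1}{4} \Rightarrow P(B) = \frac{2}{3}$
$P(\bar{A}/B) = \frac{P(\bar{A} \cap B)}{P(B)} = \frac{P(B) - P(A \cap B)}{P(B)} = \frac{\frac{2}{3} - \frac{1}{4}}{\frac{2}{3}} = \frac{5}{8}$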
4. If P(A) = 5/13, P(B) = 3/7, P(A ∩ B) = 12/91, find P(A ∪ B).
P(A ∪ B) = P(A) + P(B) − P(A ∩ B) = 5/13 + 3/7 − 12/91 = 62/91
5. State Bayes' Theorem on Probability. (AU NOV/DEC 2013, APR/MAY 2018)
If E1, E2, …, En are a set of exhaustive and mutually exclusive events associated with a random experiment
and A is any other event associated with the Ei, then
$P(E_i/A) = \frac{P(E_i)\,P(A/E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A/E_j)}$, i = 1, 2, …, n
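A small numerical illustration may help fix the formula. The sketch below uses, purely as illustrative inputs, the car-rental figures from Part B question 2a of this unit (shares 20%, 20%, 60% for agencies D, E, F and bad-tyre rates 10%, 12%, 4%); the function name `bayes_posteriors` is hypothetical.

```python
def bayes_posteriors(priors, likelihoods):
    """Return (P(A), [P(E_i / A)]) for exhaustive, mutually exclusive events E_i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # total probability of A (denominator of Bayes' theorem)
    return total, [j / total for j in joint]


priors = [0.20, 0.20, 0.60]        # P(D), P(E), P(F)
likelihoods = [0.10, 0.12, 0.04]   # P(bad tyres / agency)
p_bad, posteriors = bayes_posteriors(priors, likelihoods)
print(round(p_bad, 3), [round(p, 4) for p in posteriors])
```

The posteriors always sum to 1, since the events Ei are exhaustive and mutually exclusive.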
6. What are mutually exclusive and independent events? (AU JAN 2015)
Two events are said to be mutually exclusive if the occurrence of any one of them excludes the occurrence of
the other in a single experiment. Example: in tossing a coin, head and tail are mutually exclusive.
Two (or more) events are independent if the occurrence of one does not affect the occurrence of the other.
Example: if a coin is tossed twice, the result of the second toss is not affected by the result of the first toss.
7. Define Random variable. (AU JAN 2014)
A random variable is a function that assigns a real number X(s) to every element s in the sample space S
corresponding to a random experiment E,
i.e., X: S → R, where S is the sample space and R is the set of real numbers.
8. Explain discrete and continuous variable with examples. (AU NOV/DEC 2013)
A random variable X is said to be discrete if it takes a finite or countably infinite number of values.
Example: X = 1, 2, 3, 4, 5
A random variable X is said to be continuous if it takes an uncountable number of values.
Example: X takes every value in an interval.
Find K.
Since $\sum P(X) = 1$,
0.1 + K + 0.2 + 2K + 0.3 + 3K = 1
6K + 0.6 = 1 ⟹ 6K = 0.4 ⟹ K = 0.4/6 = 1/15
13. Find the probability of getting a total of 5 at least once in three tosses of a pair of fair dice.
p = 4/36 = 1/9, q = 1 − p = 8/9, n = 3
Let X = number of times a total of 5 is obtained; X ~ B(n, p)
⟹ $P(X = x) = {}^{n}C_{x}\, p^{x} q^{n-x}$
$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - {}^{3}C_{0} \left(\frac{1}{9}\right)^{0} \left(\frac{8}{9}\right)^{3} = 1 - \frac{512}{729} = 0.2977$
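The value of p and the final probability can be confirmed by brute force. The sketch below counts the outcomes of a pair of dice that total 5 and then applies the binomial formula; the helper name `binomial_pmf` is illustrative.

```python
from fractions import Fraction
from math import comb


def binomial_pmf(n, p, x):
    """P(X = x) for X ~ B(n, p)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


# Count the (die1, die2) outcomes whose total is 5: (1,4), (2,3), (3,2), (4,1)
favourable = sum(1 for d1 in range(1, 7) for d2 in range(1, 7) if d1 + d2 == 5)
p = Fraction(favourable, 36)
p_at_least_once = 1 - binomial_pmf(3, p, 0)
print(p, p_at_least_once, round(float(p_at_least_once), 4))
```

Using `Fraction` keeps the result exact (217/729), so rounding only happens when the decimal value is printed.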
14. One percent of jobs arriving at a computer system need to wait until weekends for scheduling, owing
to core-size limitations. Find the probability that among a sample of 200 jobs there are no jobs that
have to wait until weekends.
p = 0.01, n = 200, λ = np = 2, X is the no. of jobs that have to wait
$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-2}(2)^{x}}{x!}$
$P(X = 0) = \frac{e^{-2}(2)^{0}}{0!} = e^{-2} = 0.1353$
15. The number of monthly breakdown of a computer is having a Poisson distribution with mean equal
to 1.8. Find the probability that this computer will function for a month with only one breakdown.
(AU MAY/JUNE 2019)
Mean = λ = np = 1.8, X = no. of breakdowns per month.
$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1.8}(1.8)^{x}}{x!}$
$P(X = 1) = \frac{e^{-1.8}(1.8)^{1}}{1!} = 0.2975$
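Both Poisson answers above (questions 14 and 15) follow from the same formula, which the sketch below evaluates directly; the function name `poisson_pmf` is an illustrative choice.

```python
import math


def poisson_pmf(lam, x):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam ** x / math.factorial(x)


p_no_waiting_jobs = poisson_pmf(200 * 0.01, 0)  # question 14: lambda = np = 2
p_one_breakdown = poisson_pmf(1.8, 1)           # question 15: lambda = 1.8
print(round(p_no_waiting_jobs, 4), round(p_one_breakdown, 4))
```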
16. If X is a uniformly distributed random variable with mean 1 and variance 4/3, find P(X < 0).
Mean = $\frac{a + b}{2} = 1 \Rightarrow a + b = 2$ and variance = $\frac{(b - a)^2}{12} = \frac{4}{3} \Rightarrow b - a = 4$
Solving the above equations, we get a = −1 and b = 3.
$f(x) = \frac{1}{b - a}$ in a < x < b, i.e. $f(x) = \frac{1}{4}$ in −1 < x < 3
⟹ $P(X < 0) = \int_{-1}^{0} f(x)\,dx = \int_{-1}^{0} \frac{1}{4}\,dx = \frac{1}{4}[x]_{-1}^{0} = \frac{1}{4}$
17. If the R.V. X is uniformly distributed over (−3, 3), then compute P(|X − 2| < 2).
$f(x) = \frac{1}{b - a}$ in a < x < b, i.e. $f(x) = \frac{1}{6}$ in −3 < x < 3
$P(|X - 2| < 2) = P(-2 < X - 2 < 2) = P(0 < X < 4) = \int_{0}^{3} f(x)\,dx = \int_{0}^{3} \frac{1}{6}\,dx = \frac{1}{6}[x]_{0}^{3} = \frac{1}{2}$
18. Define Normal distribution.
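A random variable X is said to have a Normal distribution with parameters μ (mean) and σ²
(variance) if its probability density function is given by the probability law
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}$, $-\infty < x < \infty$, $-\infty < \mu < \infty$, $\sigma > 0$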
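$P(26 \le X \le 40) = P\left(\frac{26 - 30}{5} \le Z \le \frac{40 - 30}{5}\right) = P(-0.8 \le Z \le 2)$
$= P(-0.8 \le Z \le 0) + P(0 \le Z \le 2)$
$= P(0 \le Z \le 0.8) + P(0 \le Z \le 2) = 0.2881 + 0.4772 = 0.7653$.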
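The table look-ups in this computation can be cross-checked with the error function from the standard library. The sketch assumes, as the standardisation above implies, a normal variable with mean 30 and standard deviation 5; the function name `normal_cdf` is illustrative.

```python
import math


def normal_cdf(x, mean=0.0, sd=1.0):
    """Cumulative distribution function of N(mean, sd^2) via the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


probability = normal_cdf(40, mean=30, sd=5) - normal_cdf(26, mean=30, sd=5)
print(round(probability, 4))
```

The result differs from the table-based 0.7653 only in the fourth decimal place, because the printed areas 0.2881 and 0.4772 are themselves rounded.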
PART B
1a. A bag contains 3 black and 4 white balls. Two balls are drawn at random one at a time without
replacement. (1) What is the probability that the second ball drawn is white?
(2) What is the conditional probability that the first ball drawn is white if the second ball is known to
be white? (May/June 2019)
1b. An industrial unit has 3 machines, 1, 2 and 3, which produce the same item. It is known that machines
1 and 2 each produce 30% of the total output, while machine 3 produces the remaining 40% of the output.
It is also known that 2% of machine 1 output is defective, while machines 2 and 3 each produce 3%
defective items. All the items are put into one stockpile and then one item is chosen at random. Find
the probability that a defective item was produced by machine 1. (AU JAN 2016)
1c. The first bag contains 3 white balls, 2 red balls and 4 black balls. Second bag contains 2 white, 3 red
and 5 black balls and third bag contains 3 white, 4 red and 2 black balls. One bag is chosen at random
and from it 3 balls are drawn. Out of three balls two balls are white and one is red. What are the
probabilities that they were taken from first bag, second bag, third bag.
2a. A consulting firm rents cars from three rental agencies in the following manner: 20% from agency
D, 20% from agency E and 60% from agency F. If 10% cars from D, 12% of the cars from E and
4% of the cars from F have bad tyres, what is the probability that the firm will get a car with bad
tyres? Find the probability that a car with bad tyres is rented from agency F.
2b. A random variable X has the following probability function:
X : 0 1 2 3 4 5 6 7
P(X) : 0 K 2K 2K 3K K^2 2K^2 7K^2 + K
Find (i) K, (ii) P(X < 6), P(X ≥ 6) and P(0 < X < 5), (iii) the distribution
function of X, (iv) P(1.5 < X < 4.5 / X > 2), (v) E(3X − 4), Var(3X − 4),
(vi) the minimum value of C if P(X ≤ C) > 1/2. (April/May 2015)
3a. A random variable X has the probability mass function $f(x) = \frac{1}{2^{x}}$, x = 1, 2, 3, ...
Find its (i) M.G.F. (ii) Mean (iii) Variance.
3b. If the density function of a continuous random variable X is given by
f(x) = ax, 0 ≤ x ≤ 1
     = a, 1 ≤ x ≤ 2
     = 3a − ax, 2 ≤ x ≤ 3
     = 0, elsewhere
find the value of a, the c.d.f. of X and P(X ≤ 1.5).
4a. In the production of electric bulbs, the quality specification of their life was found to be normally
distributed with an average life of 2100 hours and a standard deviation of 80 hours. In a sample of 1500
bulbs, find the expected number of bulbs likely to burn for
(i) more than 2200 hours,
(ii) less than 1950 hours,
(iii) more than 2000 hours but less than 2150 hours. (AU MAY/JUNE 2019) (APR/MAY 2018)
4b. Trains arrive at a station at 15-minute intervals starting at 4 a.m. If a passenger arrives at the station at
a time that is uniformly distributed between 9.00 a.m. and 9.30 a.m., find the probability that he has
to wait for the train for (i) less than 6 minutes (ii) more than 10 minutes.
5a. In a normal population with mean 15 and standard deviation 3.5, it is found that 647 observations
exceed 16.25. What is the total number of observations in the population?
5b. Derive the MGF, mean and variance of the Binomial and Poisson distributions.
UNIT –II : SAMPLING DISTRIBUTION AND ESTIMATION
PART – A
1. Define Population. (AU JAN 2016)
The group of individuals under study is called population. The population may be finite or infinite.
2. Define Sample and Sample Size. (AU JAN 2018)
A finite subset of statistical individuals in a population is called Sample. The number of individuals in a
sample is called Sample Size (n).
3. Define Parameter and Statistic. (AU JAN 2016)
A numerical measure of a population is called a population parameter or simply a parameter.
A numerical measure of the sample is called a sample statistic or simply a statistic.
4. Define Sampling distribution.
The sampling distribution of a statistic is the probability distribution of all possible values the statistic may
take, when computed from random samples of same size, drawn from a specified population. Like any
other distribution, a sampling distribution will have its mean, standard deviation and moments of higher
order.
5. Distinguish between estimate and estimator.
An estimator of a population parameter is a sample statistic used to estimate the parameter. An estimate of
the parameter is a particular numerical value of the estimator obtained by sampling.
6. What is Standard Error and mention its uses? (AU MAY/JUNE 2019)
The standard deviation of the sampling distribution of a statistic is known as its standard error.
The magnitude of the standard error gives an index of the reliability of the estimate of the parameter. The
greater the standard error of the estimate, the lesser will be the reliability of the sample. Standard error is useful
for determining the probable limits or confidence limits for an unknown parameter with a specified
confidence coefficient. Standard error is also used for testing of hypotheses.
7. Define Type I error and Type II error. (AU MAY/JUNE 2019)
Type I error: If we reject a hypothesis when it should be accepted, we say that a Type I error has been committed.
Type II error: If we accept a hypothesis when it should be rejected, we say that a Type II error has been committed.
8. Define Critical region.
A region corresponding to a test statistic in the sample space which leads to the rejection of H0 (null
hypothesis) is called the critical region or region of rejection.
The region complementary to the critical region is called the region of acceptance.
9. Define level of significance.
The probability 'α' (the probability of making a Type I error) that a random value of the test statistic belongs
to the critical region is known as the level of significance. In other words, the level of significance is the size
of the Type I error.
The levels of significance usually employed in testing of hypothesis are 5% and 1%.
10. Define Critical values or significant values.
The value of test statistic, which divides the critical (or rejection) region and acceptance region, is called
the critical value or significant value. It depends on the level of significance used and the alternative
hypothesis.
11. Write the two properties of the sampling distribution of the mean when the population is normally
distributed. (AU JAN 2016)
1. It has a mean equal to the population mean: $\mu_{\bar{x}} = \mu$.
2. It has a standard deviation equal to the population standard deviation divided by the square root of the
sample size: $\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$, where $\sigma_{\bar{x}}$ is the standard error of the mean.
12. State the characteristics of best estimator.
i)Unbiasedness ii)Efficiency iii)Consistency iv)Sufficiency
13. Define One tailed test and two tailed test. (AU NOV/DEC 2013)
When the hypothesis about the population parameter is rejected only for the value of sample statistic
falling into one of the tails of the sampling distribution, then it is known as one-tailed test.
If it is right tail then it is called right-tailed test or one-sided alternative to the right and if it is on the left
tail, then it is a one-sided alternative to the left and is called a left-tailed test. A two-tailed test is one where the
hypothesis about the population parameter is rejected for values of the sample statistic falling into either
tail of the sampling distribution.
14. Given that $n_1 = 400$, $\bar{x}_1 = 250$, $s_1 = 40$ for one sample and $n_2 = 400$, $\bar{x}_2 = 220$, $s_2 = 55$ for another
S.E. of $\bar{x}_1 - \bar{x}_2 = 2.4074 \sqrt{\frac{1}{400} + \frac{1}{400}} = 0.17023$
n 2 423
0.04
2
E
18. State central limit theorem. (AU JAN 2014, 2015)
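If $\bar{x}$ is the mean of a random sample of size n drawn from a population with mean μ and finite variance σ²,
then, as n becomes large, the sampling distribution of $\bar{x}$ approaches a normal distribution with mean μ and
variance σ²/n, even if the population itself is not normally distributed.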
19. Differentiate between point estimate and interval estimate (AU JAN 2015)
Point estimate: When a single value is used as an estimate, the estimate is called a point estimate of the
population parameter. For example, the sample mean is the sample statistic used as an estimate of the
population mean.
Interval estimate: An estimate of a population parameter given by two numbers between which the parameter
may be considered to lie is called an interval estimate of the parameter.
20. Write the confidence interval for the population mean for large samples when σ is known.
The confidence interval for μ, when σ is known and sampling is done from a normal population or with a
large sample, is $\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$. Here $\bar{x}$ is the sample mean, σ the population standard deviation and n the size of the sample.
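As a rough illustration of this formula, the sketch below computes a 95% interval. The numbers (sample mean 37.25, σ = 5.3, n = 300) are illustrative inputs borrowed from the demographic question in Part B, and the function name `z_confidence_interval` is hypothetical; the critical value comes from `statistics.NormalDist` in the Python standard library.

```python
import math
from statistics import NormalDist


def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Interval x_bar +/- Z_(alpha/2) * sigma / sqrt(n) for known sigma."""
    alpha = 1 - confidence
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for 95% confidence
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


lower, upper = z_confidence_interval(37.25, 5.3, 300, confidence=0.95)
print(round(lower, 2), round(upper, 2))
```

Raising the confidence level widens the interval, while a larger sample narrows it in proportion to 1/√n.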
PART – B
1a. Below you are given the values obtained from a random sample of 4 observations taken from an
infinite population.
32, 34, 35, 39
(i) Find a point estimate for µ. Is this an unbiased estimate of µ? Explain.
(ii) Find a point estimate for σ². Is this an unbiased estimate of σ²? Explain.
(iii) Find a point estimate for σ.
(iv) What can be said about the sampling distribution of x̄? Be sure to discuss the expected value,
the standard deviation and the shape of the sampling distribution of x̄. (AU JAN 2014)
1b. A sample of 25 observations is drawn from a normal distribution with mean 98.6 and standard deviation
17.2. What is P[92 < x̄ < 102]? (AU JAN 2016)
2a. Explain the types of estimation and the qualities of a good estimator. (AU NOV/DEC 2013)
2b. In a random sample of 75 axle shafts, 12 have a surface finish that is rougher than the specifications
will allow. Suppose that a modification is made in the surface finishing process and subsequently a
second random sample of 85 axle shafts is obtained. The number of defective shafts in this second
sample is 10. Obtain an approximate 95% confidence interval on the difference in the proportions of
defectives produced under the two processes. (AU JAN 2016)
3a. In a batch chemical process used for etching printed circuit boards, two different catalysts are being
compared to determine whether they require different immersion times for removal of identical
quantities of photoresist material. Twelve batches were run with catalyst 1, resulting in a sample mean
immersion time of 24.6 minutes and a sample standard deviation of 0.85 minutes. Fifteen batches were
run with catalyst 2, resulting in a mean immersion time of 22.1 minutes and a standard deviation of
0.98 minutes. Find a 95% confidence interval on the difference in means, assuming that σ1² = σ2².
Also find a 90% confidence interval on the ratio of variances. (AU JAN 2016)
3b. From a population of 540, a sample of 60 individuals are taken. From this sample, the mean is found
to be 6.2 and the standard deviation is 1.368.
(1) Find the estimated standard error of the mean.
(2) Construct a 90 percent confidence interval for the mean. (AU MAY/JUNE 2019)
4. Discuss various non – probability sampling methods in use with its applications.
5a. A bank has kept records of the checking balances of its customers and determined that the average
daily balance of its customers is Rs. 300 with a standard deviation of Rs. 48. A random sample of 144
checking accounts is selected.
(i) What is the probability that the sample mean will be more than Rs. 306.60?
(ii) What is the probability that the sample mean will be less than Rs. 308?
(iii) What is the probability that the sample mean will be between Rs.302 and Rs. 308?
(iv) What is the probability that the sample mean will be at least Rs. 296? (AU MAY/JUNE 2014)
5b. A certain city is studied for demographic characteristics. It is found that the age has a standard
deviation of 5.3 years and 60% of the population is female. What should be the sample size if the age
is to be estimated with an error of less than 1 year? What should be the sample size if a similar
estimation is to be done on the proportion of female population if the desired accuracy is to be within
5%? If the sample average age is found to be 37.25 for a sample size of 300, estimate the population
age range with a confidence level of 95%. (AU MAY 2020)
5c. The life time of a certain brand of an electric bulb may be considered as a random variable with
mean 1200h and standard deviation 250h. Find the probability, using central limit theorem, that the
average lifetime of 60 bulbs exceeds 1250hours.
UNIT – III: TESTING OF HYPOTHESIS – PARAMETRIC TESTS.
PART – A
1. Define t-statistic.
The t-distribution is used when the sample size is 30 or less and the population standard deviation is
unknown. The t-statistic is defined as $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, where $s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}$. The t-distribution has been
derived mathematically under the assumption of a normally distributed population.
2. List out the applications of t –distribution. ( AU NOV/DEC 2017) (AU APR/MAY 2018)
1. To test the significant difference between the means of two independent samples.
2.To test the significant difference between the means of two dependent samples or
paired observation.
3. To test the significance of the mean of a random sample.
4 To test the significance of an observed correlation coefficient.
3. Mention the Properties of t – distribution.
1. The t-distribution ranges from −∞ to ∞.
2. The t-distribution, like the standard normal distribution, is bell shaped and symmetrical around mean
zero.
3. The variance of the t-distribution is greater than one and is defined only when ν ≥ 3, where ν is the number of degrees of freedom.
4. What is the purpose of F – test?
The F-test refers to a test of hypothesis concerning two variances derived from two samples. It is used to test
whether the two sample variances are equal or not, that is, $F = \frac{S_1^2}{S_2^2}$, $S_1^2 > S_2^2$. Thus the F statistic is the ratio of
independent estimates of population variances.
5. What are the assumptions on which F-test is based?
1. Normality: The values in each group should be normally distributed.
2. Independence of error: The variation of each value around its own group mean,
i.e. the error, should be independent for each value.
3. Homogeneity: The variances within each group should be equal for all groups.
6. When to use the normal and ‘t’ distribution in making tests of hypothesis about means? (JAN 2016)
When the sample size is greater than 30 we call it a large sample, and when it is 30 or less we call it a small
sample. For large samples we use the normal distribution, and for small samples we use the t-distribution.
7. Estimate the standard error of the difference between the two proportions if $\bar{p}_1 = 0.10$,
$\bar{p}_2 = 0.1333$, $n_1 = 50$ and $n_2 = 75$. (AU JAN 2016)
Let us first calculate the weighted average of $p_1$ and $p_2$, say $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.1199$, so that q = 1 − p = 0.8801.
S.E. is $\sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.05931$.
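The pooled proportion and the standard error above can be checked as follows. This is a sketch only; the function name `se_difference_of_proportions` is an assumption, and small differences in the last decimal place against the printed figures come from intermediate rounding.

```python
import math


def se_difference_of_proportions(p1, n1, p2, n2):
    """Pooled proportion and standard error of (p1 - p2) under H0: equal proportions."""
    pooled = (n1 * p1 + n2 * p2) / (n1 + n2)
    standard_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return pooled, standard_error


pooled, standard_error = se_difference_of_proportions(0.10, 50, 0.1333, 75)
print(round(pooled, 3), round(standard_error, 4))
```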
8. What do you mean by degrees of freedom?
Degrees of freedom are the total number of observations minus the number of independent constraints
imposed on the observations.
9. Define Experimental Error
The estimation of the amount of variations due to each of the independent factors separately and then
comparing these estimates due to assignable factors with the estimate due to chance factor is known as
experimental error.
16. Define one way classification and two way classifications in ANOVA.
If the observations in the experiment are classified according to a single factor only, it is a one way
classification. If they are classified according to two factors, it is a two way classification.
17. What are the basic principles of design of experiments? (APR/MAY 2018)
(i) Randomization (ii) Replication (iii) Local Control
18. What are the usual assumptions made in the analysis of a randomized block Experiment?
(i) All the experimental units are homogenous
(ii) Each treatment replicates ‘ r ’ times.
19. Write down the ANOVA table for One way classification
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio
Between Samples | SSC | K − 1 | MSC = SSC / (K − 1) | F_C = MSC / MSE
Within Samples | SSE | N − K | MSE = SSE / (N − K) |
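The entries of this table can be computed directly from raw data. The sketch below does so for four groups of five observations, taken as illustrative input from the technicians' mistakes question (1a) in Part B of this unit; the function name `one_way_anova` is hypothetical, and no critical F value is looked up here.

```python
def one_way_anova(groups):
    """Return (SSC, SSE, MSC, MSE, F) for a one way classification."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-samples sum of squares
    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-samples (error) sum of squares
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msc = ssc / (k - 1)
    mse = sse / (n_total - k)
    return ssc, sse, msc, mse, msc / mse


technicians = [
    [6, 14, 10, 8, 11],   # Technician I
    [14, 9, 12, 10, 14],  # Technician II
    [10, 12, 7, 15, 11],  # Technician III
    [9, 12, 8, 10, 11],   # Technician IV
]
ssc, sse, msc, mse, f_ratio = one_way_anova(technicians)
print(round(ssc, 2), round(sse, 2), round(msc, 3), round(mse, 3), round(f_ratio, 3))
```

The computed F ratio is then compared with the tabulated F value for (K − 1, N − K) degrees of freedom at the chosen level of significance.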
20. Write down the ANOVA table for Randomized Block Design
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio
Column Treatments | SSC | c − 1 | MSC = SSC / (c − 1) | F_C = MSC / MSE
Row Treatments | SSR | r − 1 | MSR = SSR / (r − 1) | F_R = MSR / MSE
Error (or) Residual | SSE | (r − 1)(c − 1) | MSE = SSE / ((r − 1)(c − 1)) |
PART – B
1a. The following are the numbers of mistakes made in 5 successive days by 4 technicians working for a
photographic laboratory. Test whether the differences among the four sample means can be attributed
to chance. (Test at the level of significance α = 0.01.)
Technicians
I II III IV
6 14 10 9
14 9 12 12
10 12 7 8
8 10 15 10
11 14 11 11
1b. 40 people were attacked by a disease and only 36 survived. Will you reject the hypothesis that the
survival rate, if attacked by this disease, is 85% at the 5% level of significance?
2a. Two independent samples of 8 and 7 items respectively had the following values.
Sample I 9 11 13 11 15 9 12 14
Sample II 10 12 10 14 9 8 10 -----
Is the difference between the means of samples significant? ( AU APR / MAY 2018 )
2b. The following table shows the yields per acre of four different plant crops grown on lots treated with
three different types of fertilizer. Determine at the 5% significance level whether there is a difference
in yield per acre
(i) due to the fertilizers and
(ii) due to the crops
Crop - I Crop - II Crop - III Crop - IV
Fertilizer A 4.5 6.4 7.2 6.7
Fertilizer B 8.8 7.8 9.6 7.0
3a. Given a sample mean of 83, a sample standard deviation of 12.5 and a sample size of 22, test the
hypothesis that the value of the population mean is 70 against alternative that it is more than 70. Use
the 0.025 significance level. (AU JAN 2016)
3b. A random sample of 10 boys had the following I.Q’s: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Do
these data support the assumption of a population mean I.Q of 100? Find the reasonable range in
which most of the mean I.Q values of samples of 10 boys lie.
4a. The following table gives biological values of a protein from cow’s milk and buffalo’s milk at a certain
level. Examine whether the average values of protein in the two samples differ significantly. (APR/MAY 2018)
Cow’s milk 1.82 2.02 1.88 1.61 1.81 1.54
Buffalo’s milk 2.00 1.83 1.86 2.03 2.19 1.88
4b. A lathe is set to cut bars of steel into lengths of 6 centimeters. The lathe is considered to be in perfect
adjustment if the average length of the bars it cuts is 6 centimeters. A sample of 121 bars is selected
randomly and measured. It is determined that the average length of the bars in the sample is 6.08
centimeters with a standard deviation of 0.44 centimeters.
(i) Formulate the hypotheses to determine whether or not the lathe is in perfect adjustment.
(ii) Compute the test statistic. (iii) What is your conclusion? (AU JAN 2014)
5a. The daily production rates for a sample of factory workers before and after a training program are
shown below. Let d=After – Before.
Worker Before After
1 6 9
2 10 12
3 9 10
4 8 11
5 7 9
We want to determine if the training program was effective.
(i) Give the hypotheses for this problem.
(ii) Compute the test statistic.
(iii) At 95 % confidence, test the hypotheses. That is, did the training program actually increase the
production rates? (AU JAN 2019)
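A minimal sketch of the paired t statistic for question 5a, assuming the usual setup H0: mean of d = 0 against H1: mean of d > 0 with d = After - Before. The function name `paired_t` is an illustrative choice.

```python
import math

before = [6, 10, 9, 8, 7]
after = [9, 12, 10, 11, 9]

def paired_t(before, after):
    """Return mean difference, standard deviation of differences and t."""
    d = [a - b for a, b in zip(after, before)]             # d = After - Before
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / (n - 1)    # sample variance of d
    t = mean_d / math.sqrt(var_d / n)
    return mean_d, math.sqrt(var_d), t

mean_d, sd_d, t_stat = paired_t(before, after)
print(round(mean_d, 2), round(sd_d, 4), round(t_stat, 4))
```

The computed t is compared with the tabulated one-tailed t value for n - 1 = 4 degrees of freedom at the 5% level.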
5b. The following table shows the lifetimes in hours of samples from three different types of television
tubes manufactured by a company. Determine whether there is a difference between the three types
at the 0.01 level of significance.
Sample 1 407 411 409
Sample 2 404 406 408 405 402
Sample 3 410 408 406 408
UNIT – IV: NON-PARAMETRIC TESTS
PART – A
1. State any two properties of the χ² distribution.
1. The exact shape of the distribution depends upon the number of degrees of freedom n. In general, when n
is small the curve is skewed to the right, and as n gets larger the distribution becomes more
and more symmetrical.
2. The mean and variance of the distribution are n and 2n respectively.
3. The sum of independent χ² variates is also a χ² variate.
2. Explain the various uses of Chi-square test.
1. Test of goodness of fit
2. Test of independence of attributes
3. Test of homogeneity of independent estimates of the population correlation coefficient.
3. What are the conditions for the validity of Chi-square test? (AU MAY/ JUNE 2016)
1. The experimental data must be independent of each other.
2. The total frequency must be reasonably large, say 50.
3. No individual frequency should be less than 5. If any frequency is less than 5, it is pooled with the
preceding or succeeding frequency so that the pooled frequency is at least 5. Finally, adjust for the
degrees of freedom lost in pooling.
4. Write the formula for chi square test of single standard deviation. (AU MAY/JUNE 2014)
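The formula is $\chi^2 = \dfrac{(n-1)\,s^2}{\sigma^2}$, where n is the sample size, s is the sample standard
deviation and σ is the hypothesised population standard deviation.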
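As a small illustration of the formula in question 4, the sketch below evaluates the statistic. The function name and the sample figures (n = 10, s = 2.0, σ = 1.5) are hypothetical and chosen only to show the arithmetic.

```python
def chi_square_for_variance(n, s, sigma0):
    """Chi-square statistic (n - 1) s^2 / sigma0^2 for testing a single
    standard deviation; it has n - 1 degrees of freedom."""
    return (n - 1) * s ** 2 / sigma0 ** 2

# Hypothetical figures: sample of 10, sample s.d. 2.0, hypothesised s.d. 1.5.
chi2 = chi_square_for_variance(n=10, s=2.0, sigma0=1.5)
print(chi2)
```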
5. What are the uses of the χ² test?
1. To test the homogeneity of independent estimates of the population variances.
2. To test the goodness of fit.
3. To test for independence of attributes.
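6. Explain the Chi-square test as a test of independence.
It is applied to test the association between the attributes when the sample data is presented in the form of a
contingency table with any number of rows or columns. The expected frequency of the cell in row i and column j is
$E_{ij} = \dfrac{R_i \times C_j}{G.T.}$, where $R_i$ = total of row i, $C_j$ = total of column j and G.T. = Grand Total.
The test statistic is $\chi^2 = \sum \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ with (r-1)(c-1) degrees of freedom.
If the calculated value of χ² < the tabulated value of χ², then accept H0 (the attributes are independent).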
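The expected-frequency rule in question 6 can be sketched as follows. The function name `chi_square_independence` is illustrative. The observed counts are the rural/urban voting figures from question 1a of Part B of this unit.

```python
def chi_square_independence(observed):
    """Expected frequencies E_ij = R_i * C_j / G.T., the chi-square
    statistic and its degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(observed, expected)
               for o, e in zip(obs_row, exp_row))
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, chi2, dof

# Rows: rural, urban. Columns: votes for A, votes for B.
votes = [[620, 380],
         [550, 450]]
expected, chi2, dof = chi_square_independence(votes)
print(expected, round(chi2, 4), dof)
```

The calculated value is then compared with the tabulated χ² for the returned degrees of freedom at the chosen level of significance.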
7. What are the disadvantages of Non Parametric tests?
1. They ignore a certain amount of information
2. They are often not as efficient or sharp as parametric tests.
3. The non-parametric tests cannot be used to estimate parameters in the population or the confidence
intervals for such parameters.
8. What is meant by Non Parametric test? (AU NOV/DEC 2013)
A non-parametric test is a test that does not make any assumption regarding the population from which the
sample is drawn. Such tests are often called distribution-free methods.
9. Name any four Non parametric test. (AU JAN 2015)
1. Sign Test for paired data
2. Rank Sum tests
(a) Mann-Whitney U-Test (b) Kruskal-Wallis Test or H test.
3. Rank correlation test
4. One sample run test.
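10. Define the statistic used in the U – test and give its mean.
$U = n_1 n_2 + \dfrac{n_1 (n_1 + 1)}{2} - R_1$, Mean $= \dfrac{n_1 n_2}{2}$,
where $n_1$ and $n_2$ are the sample sizes and $R_1$ is the sum of the ranks of the first sample.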
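A minimal sketch of the U statistic defined in question 10. The helper `average_ranks` (tied values share the mean of their ranks) and the function name `mann_whitney_u` are illustrative. The scores are those of question 3a in Part B of this unit.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1          # mean of positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(sample1, sample2):
    """Return (U, mean of U, R1) using U = n1*n2 + n1(n1+1)/2 - R1."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                     # rank sum of the first sample
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    return u, n1 * n2 / 2, r1

method1 = [70, 90, 82, 64, 86, 77, 84, 79, 82, 89, 73, 81, 83, 66]
method2 = [86, 78, 90, 82, 65, 87, 80, 88, 95, 85, 76, 94]
u, mean_u, r1 = mann_whitney_u(method1, method2)
print(r1, u, mean_u)
```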
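11. Define the statistic used in the H – test. (AU MAY/JUNE 2019)
$H = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$,
where k is the number of samples, $n_i$ is the size of the i-th sample, $R_i$ is the sum of its ranks and $n = \sum n_i$.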
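The H statistic of question 11 can be evaluated as in this sketch. It applies the formula as stated, without a correction for ties. The function and helper names are illustrative. The data are the pit depths from question 1b in Part B of this unit.

```python
def average_ranks(values):
    """Rank values from 1 upward; tied values get the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def kruskal_wallis_h(groups):
    """Return (H, rank sums) with H = 12/(n(n+1)) * sum(R_i^2/n_i) - 3(n+1)."""
    pooled = [x for g in groups for x in g]
    ranks = average_ranks(pooled)
    n = len(pooled)
    rank_sums, start = [], 0
    for g in groups:
        rank_sums.append(sum(ranks[start:start + len(g)]))
        start += len(g)
    h = 12 / (n * (n + 1)) * sum(r * r / len(g)
                                 for r, g in zip(rank_sums, groups)) - 3 * (n + 1)
    return h, rank_sums

method_a = [77, 54, 67, 74, 71, 66]
method_b = [60, 41, 59, 65, 62, 64, 52]
method_c = [49, 52, 69, 47, 56]
h, rank_sums = kruskal_wallis_h([method_a, method_b, method_c])
print(rank_sums, round(h, 4))
```

H is compared with the tabulated χ² value for k - 1 degrees of freedom.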
12. When is the Kruskal-Wallis test used? (NOV/DEC 2017)
The Kruskal-Wallis test is used to test whether three or more populations are identical or not. The K-W
test is based on the analysis of independent random samples from each of the k populations.
13. Define Kolmogorov-Smirnov test.
It is a simple non-parametric test for testing whether there is a significant difference between an observed
frequency distribution and a theoretical frequency distribution. It is another measure of the goodness of
fit, i.e., $D_n = \max |F_e - F_o|$, where $F_e$ and $F_o$ are the expected (theoretical) and observed
cumulative relative frequencies.
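A minimal sketch of the statistic D_n for a test against the uniform distribution on (0, 1), where the theoretical cumulative frequency at x is simply x. The function name is illustrative. It checks the gap on both sides of each step of the observed cumulative distribution, which is the usual way of taking the maximum. The numbers are those of question 2b in Part B of this unit.

```python
def ks_uniform_statistic(sample):
    """Largest gap between the observed cumulative relative frequency
    and the uniform(0, 1) cumulative distribution F_e(x) = x."""
    xs = sorted(sample)
    n = len(xs)
    # Gap just after each observation: F_o = (i + 1) / n.
    d_plus = max((i + 1) / n - x for i, x in enumerate(xs))
    # Gap just before each observation: F_o = i / n.
    d_minus = max(x - i / n for i, x in enumerate(xs))
    return max(d_plus, d_minus)

numbers = [0.44, 0.81, 0.14, 0.05, 0.93]
d_n = ks_uniform_statistic(numbers)
print(round(d_n, 4))
```

D_n is then compared with the tabulated Kolmogorov-Smirnov critical value for n = 5 at the chosen level of significance.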
14. When is the Mann-Whitney U-Test used?
The U test is used to test whether two populations are identical or not. The U test is based on the
analysis of independent random samples from the two populations.
15. What are the advantages of the Kolmogorov-Smirnov test?
(i) It is a more powerful test.
(ii) It is easier to use since it does not require that the data be grouped in any way.
16. Write any two advantages of non-parametric methods over parametric methods.
1. They do not require us to make the assumption that a population is distributed in the shape of a normal
curve or another specific shape.
2. Generally they are easier to do and to understand.
3. Sometimes even formal ordering or ranking is not required.
17. When is the sign test used?
1. When there are pairs of observations on the two things being compared.
2. For any given pair, each of the two observations is made under similar conditions.
3. No assumptions are made regarding the parent population.
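A minimal sketch of the sign test for paired data, assuming an exact two-sided binomial probability with p = 0.5 and ties dropped. The function name `sign_test` is illustrative. The paired figures are those of the collection-policy exercise (question 2a) in Part B of this unit.

```python
from math import comb

def sign_test(before, after):
    """Return (number of + signs, number of - signs, two-sided p-value)."""
    diffs = [a - b for a, b in zip(after, before)]
    plus = sum(1 for d in diffs if d > 0)
    minus = sum(1 for d in diffs if d < 0)
    n = plus + minus                       # ties (zero differences) are ignored
    k = min(plus, minus)
    # P(X <= k) for X ~ Binomial(n, 0.5), doubled for a two-sided test.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return plus, minus, min(1.0, 2 * tail)

before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]
plus, minus, p_value = sign_test(before, after)
print(plus, minus, round(p_value, 4))
```

The null hypothesis of no difference is rejected only if the p-value falls below the chosen level of significance.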
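18. Write the formula for run test. (AU APR/MAY 2018)
Let R be the number of runs, n1 = number of items in the first sample, n2 = number of items in the second
sample.
Here, R is approximated by a normal distribution:
$Z = \dfrac{R - E(R)}{\sqrt{V(R)}} \sim N(0,1)$
where $E(R) = \dfrac{2 n_1 n_2}{n_1 + n_2} + 1$ and $V(R) = \dfrac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$.
That is, $Z = \dfrac{R - \mu_R}{\sigma_R} \sim N(0,1)$ with $\mu_R = E(R)$ and $\sigma_R = \sqrt{V(R)}$.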
19. Find the number of runs for the following series MMFFFMFFMMMM.
Number of runs = R = 5.
20. Find the number of runs for the following series HTHHHHHTTTHTHTHHTTTHHTTTH .
Number of runs = R = 13 .
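The run counts in questions 19 and 20 and the Z statistic of question 18 can be checked with the sketch below. The function names `count_runs` and `runs_test_z` are illustrative. The sketch assumes a series made of exactly two distinct symbols.

```python
import math

def count_runs(series):
    """A new run starts at the first item and wherever the symbol changes."""
    return sum(1 for i, s in enumerate(series) if i == 0 or s != series[i - 1])

def runs_test_z(series):
    """Return (R, E(R), V(R), Z) for a two-symbol series."""
    symbols = sorted(set(series))
    if len(symbols) != 2:
        raise ValueError("series must contain exactly two distinct symbols")
    n1 = series.count(symbols[0])
    n2 = series.count(symbols[1])
    r = count_runs(series)
    expected = 2 * n1 * n2 / (n1 + n2) + 1
    variance = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)
                / ((n1 + n2) ** 2 * (n1 + n2 - 1)))
    z = (r - expected) / math.sqrt(variance)
    return r, expected, variance, z

print(count_runs("MMFFFMFFMMMM"))
print(count_runs("HTHHHHHTTTHTHTHHTTTHHTTTH"))
r, expected, variance, z = runs_test_z("MMFFFMFFMMMM")
print(r, round(expected, 4), round(variance, 4), round(z, 4))
```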
PART – B
1.a Two sample polls of votes for two candidates A and B for a public office were taken, one from
among the residents of rural areas and one from the residents of urban areas . The results are given
in the table . Examine whether the nature of the area is related to voting preference in this election.
( AU NOV /DEC 2017)
Area Votes for A Votes for B TOTAL
Rural 620 380 1000
Urban 550 450 1000
TOTAL 1170 830 2000
1b. An experiment designed to compare three preventive methods against corrosion yielded the following
maximum depths of pits (in thousandths of an inch) in pieces of wire subjected to the respective
treatments:
Method A: 77 54 67 74 71 66
Method B: 60 41 59 65 62 64 52
Method C: 49 52 69 47 56
Use the Kruskal-Wallis test at the 5% level of significance to test the null hypothesis that the three
samples come from identical populations.
2a. Use the sign test to see if there is a difference between the number of days until collection of an
account receivable before and after a new collection policy. Use the 0.05 significance level.
Before: 30 28 34 35 40 42 33 38 34 45 28 27 25 41 36
After : 32 29 33 32 37 43 40 41 37 44 27 33 30 38 36
2b. Test whether the following numbers 0.44, 0.81, 0.14, 0.05, 0.93 are uniformly distributed using the
Kolmogorov-Smirnov test. (AU NOV/DEC 2018)
3a. Two methods of instruction to apprentices are to be evaluated. A director assigns 15 randomly
selected trainees to each of the two methods. Due to drop outs, 14 complete in batch 1 and 12
complete in batch 2. An achievement test was given to these successful candidates. Their scores are
as follows.
Method 1: 70 90 82 64 86 77 84 79 82 89 73 81 83 66
Method 2: 86 78 90 82 65 87 80 88 95 85 76 94
Test whether the two methods have a significant difference in effectiveness. Use the Mann-Whitney test
at the 5% significance level.
3b. Kevin Morgan, national sales manager of an electronics firm, has collected the following salary
statistics on his field sales force earnings. He has both observed frequencies and expected
frequencies if the distribution of salaries is normal. At the 0.05 level of significance, can Kevin
conclude that the distribution of sales force earnings is normal? (AU MAY/JUNE 2019)
Earnings in thousands 25-30 31-36 37-42 43-48 49-54 55-60 61-66
Observed frequency 9 22 25 30 21 12 6
Expected frequency 6 17 32 35 18 13 4
4a. The following contingency table presents the reactions of legislators to a tax plan according to party
affiliation. Test whether party affiliation influences the reaction to the tax plan at the 0.01 level of
significance.
Reaction
Party In favour Neutral Opposed Total
Party A 120 20 20 160
Party B 50 30 60 140
Party C 50 10 40 100
Total 220 60 120 400
4b. A technician is asked to analyze the results of 22 items made in a preparation run. Each item has been
measured and compared to engineering specifications. The order of acceptances ‘a’ and rejections
‘r’ is aarrrarraaaaarrarraara. Determine whether it is a random sample or not. Use α = 0.05.
5a. From a poll of 800 television viewers, the following data have been accumulated as to their levels of
education and their preference of television stations. We are interested in determining if the
selection of a TV station is independent of the level of education. (AU JAN 2016)
                       Educational Level
                       High School   Bachelor   Graduate   Total
Public Broadcasting         50          150         80       280
Commercial Stations        150          250        120       520
Total                      200          400        200       800
(i) State the null and alternative hypotheses.
(ii) Show the contingency table of the expected frequencies. (iii) Compute the test statistic.
(iv) The null hypothesis is to be tested at 95% confidence. Determine the critical value for this test.
5b. The manager of a company believes that differences in sales performance depend upon the
salesperson’s age. Independent samples of salespeople were taken and their weekly sales record is
reported below.
Below 30 years Between 30 and 45 years Over 45 years
No. of Sales No. of Sales No. of Sales
24 23 30
16 17 20
21 22 23
15 25 25
19 18 34
26 29 36
27 28
3. When do you say two regression lines coincide with each other?
When r = ±1, the two regression lines coincide.
4. Differentiate between correlation and regression (APR/MAY 2018)
Correlation analysis:
1. The correlation coefficient r between X and Y is a measure of the linear relationship between X and Y.
2. The correlation coefficient does not reflect upon the nature of the variables.
3. It is a relative measure and is independent of the units of measurement.
Regression analysis:
1. The regression coefficients are mathematical measures expressing the average relationship between the two
variables.
2. The regression coefficients reflect on the nature of the variables.
3. Regression coefficients are absolute measures of the relationship between two or more variables.
17. Briefly explain how a scatter diagram benefits the researcher? (AU MAY/JUNE 2014)
The simplest device for studying correlation between two variables is a special type of dot chart called
scatter diagram. In this method, the given data is plotted on a graph in the form of dots. The more the
plotted points scatter over a chart, the lesser is the degree of relationship between the two variables. The
nearer the points come to the line, the higher the degree of relationship. If the plotted points lie in a
haphazard manner it shows the absence of any relationship between the variables.
18. When do we say the variables are positively correlated, negatively correlated and uncorrelated?
(i) If r > 0 the variables are positively correlated; r = 1 indicates perfect positive correlation.
(ii) If r < 0 the variables are negatively correlated; r = -1 indicates perfect negative correlation.
(iii) If r = 0 the variables are uncorrelated.
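The sign and size of r can be checked numerically with a short sketch of the product-moment (Karl Pearson) formula. The function name `pearson_r` is illustrative. The X and Y values are those of question 1a in Part B of this unit.

```python
import math

def pearson_r(x, y):
    """r = (n*Sxy - Sx*Sy) / sqrt((n*Sxx - Sx^2) * (n*Syy - Sy^2))."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    return (n * sxy - sx * sy) / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))

x = [1, 3, 5, 7, 8, 10]
y = [8, 12, 15, 17, 18, 20]
r = pearson_r(x, y)
print(round(r, 4))                               # positive: X and Y rise together
print(pearson_r([1, 2, 3], [2, 4, 6]))           # perfectly linear, rising
print(pearson_r([1, 2, 3], [6, 4, 2]))           # perfectly linear, falling
```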
19. Mention the two mathematical models for a time series.
1. Additive model: Y = T + S + C + I. This model assumes that the four components of the time series, trend (T),
seasonal (S), cyclical (C) and irregular (I) variations, are independent of each other.
2. Multiplicative model: Y = T × S × C × I. This model assumes that the four components of the time series are
interdependent.
20. State the limitations of Method of Moving averages.
(1) Trend values cannot be calculated for all the years that is, some years will be left out in the beginning
and in the end.
(2) The period of moving average has to be chosen with great care.
(3) This method cannot be used for forecasting.
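The first limitation is easy to see in a small sketch of a centred moving average with an odd period. The function name is illustrative. The figures are the sugar production values of question 2 in Part B of this unit.

```python
def centered_moving_average(values, period=3):
    """Moving average placed against the middle year of each window.
    Years without a full window on both sides get None."""
    if period % 2 == 0:
        raise ValueError("this sketch handles odd periods only")
    half = period // 2
    trend = [None] * len(values)
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        trend[i] = sum(window) / period
    return trend

production = [77, 88, 94, 85, 91, 98, 90]      # 1974 to 1980
trend = centered_moving_average(production, period=3)
print(trend)
```

With a 3-year period the first and last years have no trend value; a 5-year period would leave out two years at each end.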
PART B
1a. Calculate the coefficient of correlation between X and Y , using the following data : (AU APR 2018)
X 1 3 5 7 8 10
Y 8 12 15 17 18 20
1b. Fit a second degree polynomial equation for the following data
X 1976 1977 1978 1979 1980 1981 1982 1983 1984
Y 50 65 70 85 82 75 65 90 95
2. Given below are the figures of production (in thousand quintals) of a sugar factory.
Year 1974 1975 1976 1977 1978 1979 1980
Production 77 88 94 85 91 98 90
Fit a straight line by the least squares method and tabulate the trend values.
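A minimal sketch of the least squares straight line Y = a + bX for question 2, assuming time is measured from the middle year so that the sum of X is zero. Under that assumption a = ΣY/n and b = ΣXY/ΣX². The function name is illustrative.

```python
def fit_straight_line_trend(years, values):
    """Fit Y = a + b*X with X = year - middle year (so that sum of X is 0)."""
    n = len(years)
    mid = sum(years) / n
    x = [yr - mid for yr in years]
    a = sum(values) / n
    b = sum(xi * yi for xi, yi in zip(x, values)) / sum(xi * xi for xi in x)
    trend = [a + b * xi for xi in x]
    return a, b, trend

years = [1974, 1975, 1976, 1977, 1978, 1979, 1980]
production = [77, 88, 94, 85, 91, 98, 90]
a, b, trend = fit_straight_line_trend(years, production)
print(a, b)
print(trend)
```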
3a. Find the two regression lines using the data below: (AU NOV/DEC 2018)
X 7 4 8 6 5
Y 6 5 9 8 2
3b. The following data on production (in ‘000 units) of a commodity from the year 2006-2012. Fit a
straight line trend and forecast for the year 2020 (AU NOV/DEC 2017)
1979 25 23 21 18 15 20 21 25 22 24 32 35
1980 27 25 23 20 17 22 23 27 24 26 35 33
1981 32 31 30 27 25 27 29 30 30 32 41 38
1982 42 40 38 36 34 37 38 40 38 43 52 48
1983 57 50 52 46 49 46 49 55 50 59 64 63
5a. The following table gives the profits of a concern for 5 years ending 1983. Fit an exponential curve
for the following data (AU MAY/JUNE 2019)
Year 1979 1980 1981 1982 1983
Profits 1.6 4.5 13.8 40.2 125.0
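One common way to fit an exponential curve Y = a·b^X is to take logarithms, which turns it into the straight line log Y = log a + X log b. The sketch below follows that approach with X measured from the middle year. This is an assumption about the intended method, and the function name is illustrative.

```python
import math

def fit_exponential(years, values):
    """Fit Y = a * b**X by least squares on log10(Y), X = year - middle year."""
    n = len(years)
    mid = sum(years) / n
    x = [yr - mid for yr in years]
    logs = [math.log10(v) for v in values]
    log_a = sum(logs) / n                                   # since sum of X is 0
    log_b = sum(xi * li for xi, li in zip(x, logs)) / sum(xi * xi for xi in x)
    return 10 ** log_a, 10 ** log_b

years = [1979, 1980, 1981, 1982, 1983]
profits = [1.6, 4.5, 13.8, 40.2, 125.0]
a, b = fit_exponential(years, profits)
print(round(a, 3), round(b, 3))
```

The fitted value for any year is a * b ** (year - 1981).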
5b. The quarterly sales (in thousands of copies) for a specific education software over the past three
years are given in the following table.
2003 2004 2005
Quarter 1 170 180 190
Quarter 2 111 96 120
Quarter 3 270 280 290
Quarter 4 250 220 223
(i) Compute the four seasonal factors (Seasonal Indexes). Show all of your computations.
(ii) The trend for these data is Trend = 174+4t (t represents time, where t=1 for Quarter 1 of 2003
and t=12 for Quarter 4 of 2005). Forecast sales for the first quarter of 2006 using the trend and
seasonal indexes. Show all of your computations.