Statistics for Management SFM - BA4101 - Notes by JeppiaarEC

www.BrainKart.com
DEPARTMENT OF MANAGEMENT STUDIES
II YEAR / III SEMESTER
BA4101: STATISTICS FOR MANAGEMENT
STUDY MATERIAL
Faculty In charge
Dr. P. SIVAGAMI
Anna University Chennai

Regulation 2021
TABLE OF CONTENTS
1. College - Mission / Vision
2. MBA Department - Mission / Vision
   PEOs / POs
   COs / CO-PO matrix
3. Syllabus of the subject
4. Study material
   Unit I
   Unit II
   Unit III
   Unit IV
   Unit V
5. Question bank (unit-wise)
   Part A (30 questions with answers) / Part B (10 questions from each unit), with page numbers from the question bank (Unit 1 to Unit 3)
6. Previous year university questions
Jeppiaar Nagar, OMR Salai, Semmencherry, Chennai - 600119
VISION
To build Jeppiaar Engineering College as an institution of academic excellence in technology and
management education, leading to become a world class university.
MISSION
• To excel in teaching and learning, research and innovation by promoting the principles of
scientific analysis and creative thinking.
• To participate in the production, development and dissemination of knowledge and
interact with national and international communities.
• To equip students with the values, ethics and life skills needed to enrich their lives and enable
them to contribute to the progress of society.
• To prepare students for higher studies and lifelong learning, and to enrich them with the practical
skills necessary to excel as future professionals and entrepreneurs for the benefit of the
nation's economy.
DEPARTMENT OF MANAGEMENT STUDIES
VISION
To be a prominent management institution developing industry-ready managers, entrepreneurs and socially responsible leaders by imparting extensive expertise and competencies.
MISSION
• To provide management education to all groups in the community.
• To practice management through scholarly research and education.
• To advance the best practices of management that enable students to meet global
industry demand.
• To promote higher studies, lifelong learning and entrepreneurial skills, and to develop socially
responsible professionals for empowering the nation's economy.
PROGRAMME EDUCATIONAL OBJECTIVES (PEOs):
The MBA programme curriculum is designed to prepare postgraduate students:
• To have a thorough understanding of the core aspects of the business.
• To provide the learners with the management tools to identify, analyze and create business
opportunities as well as solve business problems.
• To prepare them to have a holistic approach towards management functions.
• To inspire and make them practice ethical standards in business.
PROGRAMME OUTCOMES (POs)
On successful completion of the programme, students will have:
1. Ability to apply the business acumen gained in practice.
2. Ability to understand and solve managerial issues.
3. Ability to communicate and negotiate effectively, to achieve organizational and individual
goals.
4. Ability to understand one’s own ability to set achievable targets and complete them.
5. Ability to fulfill social outreach
6. Ability to take up challenging assignments
COURSE OBJECTIVE:
To learn the applications of statistics in business decision making.
COURSE OUTCOMES:
C101.1: Facilitate objective solutions in business decision making.
C101.2: Understand and solve business problems.
C101.3: Apply statistical techniques to data sets, and correctly interpret the results.
C101.4: Develop skill-set that is in demand in both the research and business environments.
C101.5: Enable the students to apply the statistical techniques in a work setting.
CO-PO Matrix
CO PO1 PO2 PO3 PO4 PO5 PO6
CO1 3 3 3 0 0 2
CO2 3 3 3 0 0 2
CO3 3 3 3 0 0 2
CO4 3 3 3 0 0 2
CO5 3 3 3 0 3 2
Average 3 3 3 0 3 2
BA4101 - STATISTICS FOR MANAGEMENT
Syllabus
COURSE OBJECTIVE:
To learn the applications of statistics in business decision making.
UNIT I INTRODUCTION
Basic definitions and rules for probability, conditional probability, independence of events,
Bayes' theorem, and random variables. Probability distributions: Binomial, Poisson, Uniform
and Normal distributions.
UNIT II SAMPLING DISTRIBUTION AND ESTIMATION
Introduction to sampling distributions, sampling distribution of mean and proportion, application
of central limit theorem, sampling techniques. Estimation: Point and Interval estimates for
population parameters of large sample and small samples, determining the sample size.
UNIT III TESTING OF HYPOTHESIS - PARAMETRIC TESTS
Hypothesis testing: one sample and two sample tests for means and proportions of large samples
(z-test), one sample and two sample tests for means of small samples (t-test), F-test for two
sample standard deviations. ANOVA: one-way and two-way.
UNIT IV NON-PARAMETRIC TESTS
Chi-square test for single sample standard deviation. Chi-square tests for independence of
attributes and goodness of fit. Sign test for paired data. Rank sum test. Kolmogorov-Smirnov
test for goodness of fit and for comparing two populations. Mann-Whitney U test and Kruskal-Wallis
test. One sample run test.
UNIT V CORRELATION AND REGRESSION
Correlation – Coefficient of Determination – Rank Correlation – Regression – Estimation of
Regression line – Method of Least Squares – Standard Error of estimate.
REFERENCES:
1. Richard I. Levin, David S. Rubin, Masood H. Siddiqui and Sanjay Rastogi, Statistics for Management, 8th Edition, Pearson Education, 2017.
2. Prem S. Mann, Introductory Statistics, 9th Edition, Wiley, 2015.
3. T. N. Srivastava and Shailaja Rego, Statistics for Management, 3rd Edition, Tata McGraw Hill, 2017.
4. Ken Black, Applied Business Statistics, 7th Edition, Wiley India, 2012.
5. David R. Anderson, Dennis J. Sweeney, Thomas A. Williams, Jeffrey D. Camm and James J. Cochran, Statistics for Business and Economics, 13th Edition, Thomson (South-Western) Asia, Singapore, 2016.
6. N. D. Vohra, Business Statistics, Tata McGraw Hill, 2017.
UNIT I INTRODUCTION
Basic definitions and rules for probability, conditional probability, independence of
events, Bayes' theorem, and random variables. Probability distributions: Binomial,
Poisson, Uniform and Normal distributions.
Random experiment: An experiment whose possible outcomes are all known in advance, but
whose outcome in any particular trial cannot be predicted.
Probability:
Let A be an event and S its sample space, consisting of equally likely outcomes. Then the
probability of the occurrence of A is defined as
$P(A) = \frac{\text{number of cases favourable to } A}{\text{total number of exhaustive cases}}$.
Axioms of Probability:
(i) $0 \le P(E) \le 1$ (ii) $P(S) = 1$ (iii) $P\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i=1}^{n} P(E_i)$ if the $E_i$'s are mutually exclusive
events.
Example: (i) tossing a fair coin and (ii) rolling a die are random experiments,
since we cannot predict the outcome of the experiment in any trial.
Mutually exclusive:
Two events are said to be mutually exclusive if the occurrence of any one of them
excludes the occurrence of other in a single experiment.
Example: In a single toss of a coin, the events "head" and "tail" are mutually exclusive.
Independent events:
Two (or) more events are independent if the occurrence of one does not affect the
occurrence of the other.
Example: If a coin is tossed twice, the result of the second toss is not affected by the result
of the first toss.
Addition Law of Probability:
If A and B are two events in a sample space S, then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.
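The addition law can be checked by direct counting on a small sample space. The following is a minimal Python sketch. It assumes one roll of a fair die and uses two illustrative events chosen here (A = "even number", B = "multiple of 3"). Neither event comes from the notes.

```python
from fractions import Fraction

# Sample space for one roll of a fair die (assumed equally likely outcomes).
S = {1, 2, 3, 4, 5, 6}

def prob(event):
    """Classical probability: favourable cases / exhaustive cases."""
    return Fraction(len(event & S), len(S))

A = {x for x in S if x % 2 == 0}   # even number: {2, 4, 6}
B = {x for x in S if x % 3 == 0}   # multiple of 3: {3, 6}

lhs = prob(A | B)                       # P(A ∪ B), counted directly
rhs = prob(A) + prob(B) - prob(A & B)   # addition law
print(lhs, rhs)  # 2/3 2/3
```

The outcome 6 belongs to both events, so it is counted twice in P(A) + P(B). Subtracting P(A ∩ B) removes that double count.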
Conditional Probability:
The conditional probability of an event B, assuming that the event A has happened,
is defined as $P(B/A) = \frac{P(A \cap B)}{P(A)}$, $P(A) \neq 0$.
Similarly, $P(A/B) = \frac{P(A \cap B)}{P(B)}$, $P(B) \neq 0$.
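The definition of conditional probability can be evaluated by counting outcomes. The sketch below assumes two fair dice. The events A (first die is even) and B (the two dice sum to 8) are illustrative assumptions and are not taken from the notes.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes of rolling two fair dice (assumed example).
S = list(product(range(1, 7), repeat=2))

def prob(pred):
    """Classical probability of the event defined by the predicate `pred`."""
    return Fraction(sum(1 for s in S if pred(s)), len(S))

A = lambda s: s[0] % 2 == 0      # first die shows an even number
B = lambda s: s[0] + s[1] == 8   # the two dice sum to 8

p_A_and_B = prob(lambda s: A(s) and B(s))   # outcomes (2,6), (4,4), (6,2)
p_B_given_A = p_A_and_B / prob(A)           # P(B/A) = P(A ∩ B) / P(A)
print(p_B_given_A)  # 1/6
```

Equivalently, P(B/A) is the share of B-outcomes within the reduced sample space A: 3 of its 18 outcomes.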
1. If A and B are independent events, prove that (a) $A$ and $\bar{B}$ and (b) $\bar{A}$ and $\bar{B}$ are also
independent.
Solution:
Since A and B are independent,
$P(A \cap B) = P(A)P(B)$ --- (1)
a) $P(A \cap \bar{B}) = P(A) - P(A \cap B)$
$= P(A) - P(A)P(B)$ [using (1)]
$= P(A)[1 - P(B)]$
$= P(A)P(\bar{B})$
Hence $A$ and $\bar{B}$ are independent events.
b) $P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B})$ [De Morgan's law]
$= 1 - P(A \cup B)$
$= 1 - [P(A) + P(B) - P(A \cap B)]$ [by the addition theorem]
$= 1 - P(A) - P(B) + P(A \cap B)$
$= 1 - P(A) - P(B) + P(A)P(B)$ [using (1)]
$= [1 - P(A)] - P(B)[1 - P(A)]$
$= [1 - P(A)][1 - P(B)]$
$= P(\bar{A})P(\bar{B})$
Hence $\bar{A}$ and $\bar{B}$ are independent events.
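The result of Problem 1 can be spot-checked on a concrete pair of independent events. The sketch below assumes a single roll of a fair die and two illustrative events chosen here: A = {2, 4, 6} and B = {1, 2}. It tests the product rule for A and B, for A and the complement of B, and for both complements.

```python
from fractions import Fraction

S = frozenset(range(1, 7))  # one roll of a fair die (illustrative)

def P(event):
    return Fraction(len(event), len(S))

def independent(E, F):
    """True when P(E ∩ F) = P(E) P(F)."""
    return P(E & F) == P(E) * P(F)

A = frozenset({2, 4, 6})     # even number, P(A) = 1/2
B = frozenset({1, 2})        # at most 2,  P(B) = 1/3
A_c, B_c = S - A, S - B      # complements of A and B

print(independent(A, B), independent(A, B_c), independent(A_c, B_c))
# True True True
```

A single numerical example illustrates the theorem but does not prove it. The algebraic proof above covers every pair of independent events.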
2. A problem in statistics is given to three students A, B and C, whose chances
of solving it are $\frac12$, $\frac13$ and $\frac14$ respectively. What is the probability that the
problem will be solved?
Solution:
Let A, B, C denote the events that the problem is solved by the students A, B, C
respectively. Then
$P(A) = \frac12$, $P(B) = \frac13$, $P(C) = \frac14$
$P(\bar{A}) = 1 - \frac12 = \frac12$, $P(\bar{B}) = 1 - \frac13 = \frac23$, $P(\bar{C}) = 1 - \frac14 = \frac34$
Since the students work independently,
P(none of the three students solves the problem) $= P(\bar{A})P(\bar{B})P(\bar{C}) = \frac12 \cdot \frac23 \cdot \frac34 = \frac14$
P(the problem is solved) $= P(A \cup B \cup C) = 1 - P(\bar{A})P(\bar{B})P(\bar{C}) = 1 - \frac14 = \frac34$.
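Problems 2 and 10 both rest on the complement rule for independent events: P(at least one occurs) = 1 - product of P(not E_i). The following is a minimal sketch with exact fractions. The function name `p_at_least_one` is hypothetical, and the events are assumed to be independent, as in the problems.

```python
from fractions import Fraction
from math import prod

def p_at_least_one(probs):
    """P(at least one of several independent events occurs) = 1 - product of P(not E_i)."""
    return 1 - prod(1 - p for p in probs)

chances = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 4)]  # students A, B, C
print(p_at_least_one(chances))  # 3/4
```

With the shooting probabilities 1/2, 2/3 and 3/4 of Problem 10, the same function returns 23/24.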
3. Events A and B are such that $P(A \cup B) = \frac34$, $P(A \cap B) = \frac14$ and $P(\bar{A}) = \frac23$. Find
$P(B)$.
Solution:
Given $P(A \cup B) = \frac34$, $P(A \cap B) = \frac14$, $P(\bar{A}) = \frac23$,
i.e. $P(A) = 1 - P(\bar{A}) = 1 - \frac23 = \frac13$.
By the addition theorem,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
i.e. $P(B) = P(A \cup B) - P(A) + P(A \cap B)$
$P(B) = \frac34 - \frac13 + \frac14 = \frac{9 - 4 + 3}{12} = \frac{8}{12} = \frac23$
4. An integer is chosen at random from the first two hundred positive integers. What is the
probability that the integer is divisible by 6 or 8?
Solution:
The sample space is $S = \{1, 2, 3, \ldots, 199, 200\}$, so $n(S) = 200$.
Let A be the event that the integer chosen is divisible by 6,
i.e. $A = \{6, 12, 18, \ldots, 198\}$
$n(A) = \frac{198}{6} = 33$, so $P(A) = \frac{n(A)}{n(S)} = \frac{33}{200}$
Let B be the event that the integer chosen is divisible by 8,
i.e. $B = \{8, 16, 24, \ldots, 200\}$
$n(B) = \frac{200}{8} = 25$, so $P(B) = \frac{n(B)}{n(S)} = \frac{25}{200}$
The L.C.M. of 6 and 8 is 24.
Hence, a number that is divisible by both 6 and 8 is divisible by 24:
$A \cap B = \{24, 48, 72, \ldots, 192\}$
$n(A \cap B) = \frac{192}{24} = 8$, so $P(A \cap B) = \frac{n(A \cap B)}{n(S)} = \frac{8}{200}$
Hence, by the addition theorem of probability,
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{33}{200} + \frac{25}{200} - \frac{8}{200} = \frac{58 - 8}{200} = \frac{50}{200} = \frac14$
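Problem 4 can be cross-checked by brute force. The sketch counts the multiples directly over 1, ..., 200. It then compares the addition-theorem answer with a direct count of "divisible by 6 or 8".

```python
from fractions import Fraction

N = 200
S = range(1, N + 1)                       # the integers 1, 2, ..., 200

n_A  = sum(1 for k in S if k % 6 == 0)    # divisible by 6  -> 33
n_B  = sum(1 for k in S if k % 8 == 0)    # divisible by 8  -> 25
n_AB = sum(1 for k in S if k % 24 == 0)   # divisible by lcm(6, 8) = 24 -> 8

p = Fraction(n_A + n_B - n_AB, N)         # addition theorem
direct = Fraction(sum(1 for k in S if k % 6 == 0 or k % 8 == 0), N)
print(n_A, n_B, n_AB, p, direct)          # 33 25 8 1/4 1/4
```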
5. A and B throw alternately with a pair of dice. A wins if he throws 6 before
B throws 7, and B wins if he throws 7 before A throws 6. If A begins, show that
their respective chances of winning are in the ratio 30 : 31.
Solution:
Let $A_i$ denote the event of A throwing 6 in the $i$th throw, $i = 1, 2, 3, \ldots$
A sum of 6 can be obtained with two dice in the following ways:
(1,5), (5,1), (2,4), (4,2), (3,3)
i.e. 5 distinct ways out of 36.
$P(A_i) = \frac{5}{36}$, $P(\bar{A}_i) = 1 - P(A_i) = \frac{31}{36}$, $i = 1, 2, \ldots$
Let $B_i$ denote the event of B throwing 7 in the $i$th throw, $i = 1, 2, 3, \ldots$
A sum of 7 can be obtained with two dice in the following ways:
(1,6), (6,1), (2,5), (5,2), (3,4), (4,3)
i.e. 6 distinct ways out of 36.
$P(B_i) = \frac{6}{36} = \frac16$, $P(\bar{B}_i) = 1 - P(B_i) = \frac56$, $i = 1, 2, \ldots$
If A starts the game, he will win in the following mutually exclusive ways:
(i) $A_1$ happens (ii) $\bar{A}_1 \cap \bar{B}_2 \cap A_3$ happens
(iii) $\bar{A}_1 \cap \bar{B}_2 \cap \bar{A}_3 \cap \bar{B}_4 \cap A_5$ happens, and so on.
Hence, by the addition theorem of probability, the required probability of A winning is
$P(A) = P(i) + P(ii) + P(iii) + \cdots$
$= P(A_1) + P(\bar{A}_1 \cap \bar{B}_2 \cap A_3) + P(\bar{A}_1 \cap \bar{B}_2 \cap \bar{A}_3 \cap \bar{B}_4 \cap A_5) + \cdots$
$= P(A_1) + P(\bar{A}_1)P(\bar{B}_2)P(A_3) + P(\bar{A}_1)P(\bar{B}_2)P(\bar{A}_3)P(\bar{B}_4)P(A_5) + \cdots$ [the events are mutually independent]
$= \frac{5}{36} + \left(\frac{31}{36} \cdot \frac56\right)\frac{5}{36} + \left(\frac{31}{36} \cdot \frac56\right)^2 \frac{5}{36} + \cdots$
This is an infinite geometric series with sum $\frac{a}{1 - r}$, where $a = \frac{5}{36}$ and $r = \frac{31}{36} \cdot \frac56 = \frac{155}{216}$:
$P(A) = \frac{5/36}{1 - \frac{155}{216}} = \frac{5/36}{61/216} = \frac{30}{61}$.
Since the game ends with probability 1, $P(B) = 1 - \frac{30}{61} = \frac{31}{61}$, so the chances of A and B
winning are in the ratio 30 : 31.
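The geometric series in Problem 5 can be summed exactly with Python's `fractions` module. As a cross-check, the sketch also adds the first 200 terms explicitly. It assumes, as the solution does, that throws are independent and that A throws first.

```python
from fractions import Fraction

p6 = Fraction(5, 36)   # P(A throws a sum of 6 on one throw)
p7 = Fraction(6, 36)   # P(B throws a sum of 7 on one throw)

# One full round (A then B) with nobody winning has probability r.
r = (1 - p6) * (1 - p7)
p_A = p6 / (1 - r)      # sum of the geometric series a / (1 - r), a = p6
p_B = 1 - p_A

# Cross-check: add up the first 200 terms of the series explicitly.
partial = sum(p6 * r**k for k in range(200))
print(r, p_A, p_B)                                 # 155/216 30/61 31/61
print(abs(float(p_A) - float(partial)) < 1e-12)    # True
```

The exact values give P(A) : P(B) = 30/61 : 31/61, so A's and B's chances stand in the ratio 30 : 31.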
7. The probability that a contractor will get a plumbing contract is $\frac23$ and the
probability that he will not get an electric contract is $\frac59$. If the probability of
getting at least one contract is $\frac45$, what is the probability that he will get both?
Solution:
Let A be the event of getting a plumbing contract and B the event of getting an electric
contract.
$P(A) = \frac23$, $P(\bar{B}) = \frac59$, $P(A \cup B) = \frac45$
$P(B) = 1 - P(\bar{B}) = 1 - \frac59 = \frac49$
By the addition theorem of probability,
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$P(A \cap B) = \frac23 + \frac49 - \frac45 = \frac{30 + 20 - 36}{45} = \frac{50 - 36}{45} = \frac{14}{45}$
i.e. the probability of getting both contracts is $\frac{14}{45}$.
8. Let $P(A \cup B) = \frac56$, $P(A \cap B) = \frac13$ and $P(\bar{B}) = \frac12$. Are the events A and B
independent? Explain.
Solution:
$P(B) = 1 - P(\bar{B}) = \frac12$
$P(A) = P(A \cup B) + P(A \cap B) - P(B) = \frac56 + \frac13 - \frac12 = \frac{5 + 2 - 3}{6} = \frac46 = \frac23$
$P(A)P(B) = \frac23 \cdot \frac12 = \frac13 = P(A \cap B)$
Hence A and B are independent.
Total probability of an event:
If $A_1, A_2, \ldots, A_n$ are mutually exclusive and exhaustive events and B is any event in
S, then $P(B) = P(A_1)P(B/A_1) + P(A_2)P(B/A_2) + \cdots + P(A_n)P(B/A_n)$.
Bayes' theorem:
If $E_1, E_2, \ldots, E_n$ are mutually disjoint events with $P(E_i) \neq 0$ $(i = 1, 2, \ldots, n)$, then for any
arbitrary event A which is a subset of $\bigcup_{i=1}^{n} E_i$ such that $P(A) > 0$, we have
$$P(E_i/A) = \frac{P(E_i)\,P(A/E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A/E_j)}, \quad i = 1, 2, \ldots, n.$$
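Bayes' theorem turns into a few lines of code. Multiply each prior by its likelihood, then divide by the total, which is P(A) from the theorem of total probability. The function name `bayes` is hypothetical. The illustration uses the machine data of example 16 in these notes (output shares 0.30, 0.25, 0.45 and defect rates 1%, 1.2%, 2%).

```python
def bayes(priors, likelihoods):
    """Posterior probabilities P(E_i / A) by Bayes' theorem.

    priors[i]      = P(E_i) for mutually exclusive, exhaustive events E_i
    likelihoods[i] = P(A / E_i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]   # P(E_i) P(A / E_i)
    total = sum(joint)                                     # P(A), total probability
    if total == 0:
        raise ValueError("P(A) must be positive")
    return [j / total for j in joint]

post = bayes([0.30, 0.25, 0.45], [0.01, 0.012, 0.02])
print([round(p, 4) for p in post])  # [0.2, 0.2, 0.6]
```

The denominator is shared by every posterior, so the posteriors always add up to 1.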
9. If the probability that A solves a problem is $\frac12$ and that for B is $\frac34$, and if they
aim at solving the problem independently, what is the probability that the
problem is solved?
Solution:
The probability of A solving the problem is $P(A) = \frac12$ and that of B is $P(B) = \frac34$.
Since A and B are independent, $P(A \cap B) = P(A)P(B) = \frac12 \cdot \frac34 = \frac38$.
Hence the probability that the problem is solved is
$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac12 + \frac34 - \frac38 = \frac{4 + 6 - 3}{8} = \frac78$.
10. In a shooting test, the probability of hitting the target is $\frac12$ for A, $\frac23$ for B
and $\frac34$ for C. If all of them fire at the target, find the probability that (i) none
of them hits the target and (ii) at least one of them hits the target.
Solution:
Given $P(A) = \frac12$, $P(B) = \frac23$, $P(C) = \frac34$,
$P(\bar{A}) = \frac12$, $P(\bar{B}) = \frac13$, $P(\bar{C}) = \frac14$
(i) $P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\bar{A})P(\bar{B})P(\bar{C})$ (by independence) $= \frac12 \cdot \frac13 \cdot \frac14 = \frac{1}{24}$
(ii) P(at least one hits the target) = 1 - P(none hits the target) $= 1 - \frac{1}{24} = \frac{23}{24}$
11. A bolt is manufactured by 3 machines A, B and C. A turns out twice as many
items as B, and machines B and C produce equal numbers of items. 2% of the bolts
produced by A and B are defective and 4% of the bolts produced by C are defective.
All bolts are put into one stock pile and one is chosen from this pile. What is the
probability that it is defective?
Solution:
Let A, B and C be the events that the item has been produced by machine A, B and C
respectively, and let D be the event of the item being defective.
Since A produces twice as many items as B, and B and C produce equal numbers,
$P(A) = \frac12$, $P(B) = P(C) = \frac14$
$P(D/A) = P(\text{an item is defective, given that A has produced it}) = \frac{2}{100} = P(D/B)$
$P(D/C) = \frac{4}{100}$
By the theorem of total probability,
$P(D) = P(A)P(D/A) + P(B)P(D/B) + P(C)P(D/C)$
$= \frac12 \cdot \frac{2}{100} + \frac14 \cdot \frac{2}{100} + \frac14 \cdot \frac{4}{100} = \frac{1}{100} + \frac{1}{200} + \frac{1}{100} = \frac{2}{100} + \frac{1}{200}$
$= \frac{4 + 1}{200} = \frac{5}{200} = \frac{1}{40}$.
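The computation in Problem 11 is the theorem of total probability applied to three machines. The sketch derives the machine shares from the 2 : 1 : 1 production ratio stated in the problem, then weights each defect rate by its share. The dictionary names are illustrative.

```python
from fractions import Fraction

# Machine shares: A makes twice as much as B, and B and C make equal amounts,
# so the shares are in the ratio 2 : 1 : 1.
weights = {"A": 2, "B": 1, "C": 1}
total = sum(weights.values())
share = {m: Fraction(w, total) for m, w in weights.items()}          # P(A), P(B), P(C)
defect_rate = {"A": Fraction(2, 100), "B": Fraction(2, 100), "C": Fraction(4, 100)}

# Theorem of total probability: P(D) = sum of P(machine) * P(D / machine)
p_defective = sum(share[m] * defect_rate[m] for m in share)
print(share["A"], share["B"], p_defective)  # 1/2 1/4 1/40
```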
12. For a certain binary communication channel, the probability that a
transmitted '0' is received as a '0' is 0.95 and the probability that a transmitted
'1' is received as a '1' is 0.90. If the probability that a '0' is transmitted is 0.4, find the
probability that (i) a '1' is received and (ii) a '1' was transmitted given that a '1' was received.
Solution:
Let A be the event of transmitting '1',
$\bar{A}$ the event of transmitting '0',
B the event of receiving '1' and
$\bar{B}$ the event of receiving '0'.
Given $P(\bar{A}) = 0.4$, $P(B/A) = 0.9$ and $P(\bar{B}/\bar{A}) = 0.95$,
hence $P(A) = 1 - P(\bar{A}) = 1 - 0.4 = 0.6$ and $P(B/\bar{A}) = 1 - 0.95 = 0.05$.
(i) By the theorem of total probability,
$P(B) = P(A)P(B/A) + P(\bar{A})P(B/\bar{A}) = 0.6 \times 0.9 + 0.4 \times 0.05 = 0.56$
(ii) By Bayes' theorem,
$P(A/B) = \frac{P(A)P(B/A)}{P(B)} = \frac{0.6 \times 0.9}{0.56} = \frac{27}{28}$.
13. A given lot of IC chips contains 2% defective chips. Each chip is tested
before delivery. The tester itself is not reliable: the probability that the tester says a
chip is good when it is really good is 0.95, and the probability that the tester says a
chip is defective when it is actually defective is 0.94. If a tested device is
indicated to be defective, what is the probability that it is actually defective?
Solution:
Let A be the event that the chip is actually defective and B the event that the chip is
actually good.
Let D be the event that the tester says the chip is defective.
Given $P(A) = 0.02$, $P(B) = 0.98$, $P(\bar{D}/B) = 0.95$ and $P(D/A) = 0.94$,
$P(D/B) = 1 - P(\bar{D}/B) = 1 - 0.95 = 0.05$
$P(D) = P(A)P(D/A) + P(B)P(D/B) = 0.02 \times 0.94 + 0.98 \times 0.05 = 0.0678$
By Bayes' rule,
$P(A/D) = \frac{P(A)P(D/A)}{P(D)} = \frac{0.02 \times 0.94}{0.0678} = \frac{0.0188}{0.0678} \approx 0.2773$.
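A quick Monte Carlo simulation shows why the answer to Problem 13 is so low. Most flagged chips are good chips that were wrongly flagged, because good chips vastly outnumber defective ones. The sketch below is illustrative: the chip count, the seed and the function name `simulate` are arbitrary choices, not from the notes. It is compared against the exact Bayes value.

```python
import random

def simulate(n_chips=200_000, p_defective=0.02, p_flag_if_defective=0.94,
             p_pass_if_good=0.95, seed=1):
    """Estimate P(actually defective / tester flags defective) by simulation."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    flagged = flagged_and_defective = 0
    for _ in range(n_chips):
        defective = rng.random() < p_defective
        if defective:
            says_defective = rng.random() < p_flag_if_defective
        else:
            says_defective = rng.random() >= p_pass_if_good   # good chip wrongly flagged
        if says_defective:
            flagged += 1
            flagged_and_defective += defective                # True counts as 1
    return flagged_and_defective / flagged

exact = 0.02 * 0.94 / (0.02 * 0.94 + 0.98 * 0.05)
estimate = simulate()
print(round(exact, 4))  # 0.2773
print(round(estimate, 3))
```

The simulated fraction should land close to the exact value. The remaining gap is sampling noise, which shrinks as `n_chips` grows.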
14. An urn contains 5 balls. Two balls are drawn and are found to be white.
What is the probability that all the balls are white?
Solution:
Since two white balls were drawn, the urn contains at least 2 white balls. The possible
compositions are:
A1: 2 W, 3 O
A2: 3 W, 2 O
A3: 4 W, 1 O
A4: 5 W, 0 O
where W denotes white balls and O denotes balls of other colours.
Taking these four compositions as equally likely,
$P(A_1) = P(A_2) = P(A_3) = P(A_4) = \frac14$
Let D be the event of drawing 2 white balls.
$P(D/A_1) = \frac{{}^2C_2}{{}^5C_2} = \frac{1}{10}$, $P(D/A_2) = \frac{{}^3C_2}{{}^5C_2} = \frac{3}{10}$, $P(D/A_3) = \frac{{}^4C_2}{{}^5C_2} = \frac{6}{10} = \frac35$, $P(D/A_4) = \frac{{}^5C_2}{{}^5C_2} = 1$
$P(D) = P(A_1)P(D/A_1) + P(A_2)P(D/A_2) + P(A_3)P(D/A_3) + P(A_4)P(D/A_4)$
$= \frac14\left(\frac{1}{10} + \frac{3}{10} + \frac35 + 1\right) = \frac14 \cdot \frac{1 + 3 + 6 + 10}{10} = \frac{20}{40} = \frac12$
By Bayes' theorem,
$P(A_4/D) = \frac{P(A_4)P(D/A_4)}{P(D)} = \frac{1/4}{1/2} = \frac12$.
15. The first bag contains 3 white, 2 red and 4 black balls. The second
bag contains 2 white, 3 red and 5 black balls, and the third bag contains 3 white, 4
red and 2 black balls. One bag is chosen at random and 3 balls are
drawn from it. Of the three balls, two are white and one is red. What are the
probabilities that they were taken from the first bag, the second bag and the third bag?
Solution:
Bag I: 3 W, 2 R, 4 B
Bag II: 2 W, 3 R, 5 B
Bag III: 3 W, 4 R, 2 B
where W, R, B denote white, red and black balls.
Let $A_1, A_2, A_3$ be the events that bag I, II, III respectively is chosen.
$P(A_1) = P(A_2) = P(A_3) = \frac13$
Let D be the event that, of the three balls drawn from the selected bag, 2 are white
and 1 is red.
$P(D/A_1) = \frac{{}^3C_2 \times {}^2C_1}{{}^9C_3} = \frac{6}{84}$
$P(D/A_2) = \frac{{}^2C_2 \times {}^3C_1}{{}^{10}C_3} = \frac{3}{120}$
$P(D/A_3) = \frac{{}^3C_2 \times {}^4C_1}{{}^9C_3} = \frac{12}{84}$
By the theorem of total probability,
$P(D) = P(A_1)P(D/A_1) + P(A_2)P(D/A_2) + P(A_3)P(D/A_3)$
$= \frac13 \cdot \frac{6}{84} + \frac13 \cdot \frac{3}{120} + \frac13 \cdot \frac{12}{84} = \frac{20}{840} + \frac{7}{840} + \frac{40}{840} = \frac{67}{840} \approx 0.0798$
By Bayes' theorem,
$P(A_1/D) = \frac{P(A_1)P(D/A_1)}{P(D)} = \frac{20/840}{67/840} = \frac{20}{67} \approx 0.2985$
$P(A_2/D) = \frac{P(A_2)P(D/A_2)}{P(D)} = \frac{7/840}{67/840} = \frac{7}{67} \approx 0.1045$
$P(A_3/D) = \frac{P(A_3)P(D/A_3)}{P(D)} = \frac{40/840}{67/840} = \frac{40}{67} \approx 0.5970$
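Exact rational arithmetic is a useful check on the three-bag problem (15), where decimal rounding easily creeps in. The sketch uses `math.comb` for the combinations and `Fraction` for exact results. The bag list and the helper name `p_two_white_one_red` are illustrative.

```python
from fractions import Fraction
from math import comb

# (white, red, black) counts in bags I, II and III
bags = [(3, 2, 4), (2, 3, 5), (3, 4, 2)]
prior = Fraction(1, len(bags))          # each bag equally likely to be chosen

def p_two_white_one_red(w, r, b):
    """P(3 balls drawn without replacement are 2 white and 1 red)."""
    return Fraction(comb(w, 2) * comb(r, 1), comb(w + r + b, 3))

likelihoods = [p_two_white_one_red(*bag) for bag in bags]
p_D = sum(prior * l for l in likelihoods)               # total probability
posteriors = [prior * l / p_D for l in likelihoods]     # Bayes' theorem
print(p_D, posteriors)
# 67/840 [Fraction(20, 67), Fraction(7, 67), Fraction(40, 67)]
```

As decimals the posteriors are about 0.2985, 0.1045 and 0.5970, and they add up to exactly 1.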
16. A factory produces a certain type of output on three types of machine. The
respective daily production figures are: Machine I: 3,000 units; Machine II:
2,500 units; Machine III: 4,500 units. Past experience shows that 1 percent of
the output produced by machine I is defective. The corresponding fractions of
defectives for the other two machines are 1.2 percent and 2 percent
respectively. An item is drawn at random from the day's production run and is
found to be defective. What is the probability that it comes from the output of
(i) Machine I, (ii) Machine II, (iii) Machine III?
Solution:
Let $A_1$, $A_2$ and $A_3$ denote the events that the output is produced by machines I, II and III
respectively, and let D denote the event that the output is defective.
The total daily production is 10,000 units. Thus
$P(A_1) = \frac{3000}{10000} = 0.30$, $P(A_2) = \frac{2500}{10000} = 0.25$, $P(A_3) = \frac{4500}{10000} = 0.45$
$P(D/A_1) = 1\% = 0.01$, $P(D/A_2) = 1.2\% = 0.012$, $P(D/A_3) = 2\% = 0.02$
By the theorem of total probability,
$P(D) = P(A_1)P(D/A_1) + P(A_2)P(D/A_2) + P(A_3)P(D/A_3)$
$= (0.30)(0.01) + (0.25)(0.012) + (0.45)(0.02) = 0.003 + 0.003 + 0.009 = 0.015$
By Bayes' rule,
(i) $P(A_1/D) = \frac{P(A_1)P(D/A_1)}{P(D)} = \frac{0.003}{0.015} = \frac15$
(ii) $P(A_2/D) = \frac{P(A_2)P(D/A_2)}{P(D)} = \frac{0.003}{0.015} = \frac15$
(iii) $P(A_3/D) = \frac{P(A_3)P(D/A_3)}{P(D)} = \frac{0.009}{0.015} = \frac35$
17. There are two boxes $B_1$ and $B_2$. $B_1$ contains two red balls and one green
ball. $B_2$ contains one red ball and two green balls.
(i) A ball is drawn at random from one of the boxes and is found to be red. What is the
probability that it is from $B_1$?
(ii) Two balls are drawn at random from one of the boxes without replacement.
One is red and the other is green. What is the probability that they came from
$B_1$?
(iii) A ball drawn from one of the boxes is green. What is the probability that it
came from $B_2$?
(iv) A ball drawn from one of the boxes is white. What is the probability
that it came from $B_2$?
Solution:
Let $B_1$ and $B_2$ be the events that the boxes $B_1$ and $B_2$ are selected respectively.
$P(B_1) = \frac12$, $P(B_2) = \frac12$
Let A be the event that a red ball is selected.
$P(A/B_1) = \frac23$, $P(A/B_2) = \frac13$
(i) By Bayes' theorem,
P(ball is from $B_1$, given it is red)
$= P(B_1/A) = \frac{P(B_1)P(A/B_1)}{P(B_1)P(A/B_1) + P(B_2)P(A/B_2)} = \frac{\frac12 \cdot \frac23}{\frac12 \cdot \frac23 + \frac12 \cdot \frac13} = \frac23$
(ii) Let C be the event that a red ball and a green ball are selected.
$P(C/B_1) = \frac{{}^2C_1 \times {}^1C_1}{{}^3C_2} = \frac23$, $P(C/B_2) = \frac{{}^1C_1 \times {}^2C_1}{{}^3C_2} = \frac23$
P($B_1$ was chosen, given a red ball and a green ball were selected)
$= P(B_1/C) = \frac{P(B_1)P(C/B_1)}{P(B_1)P(C/B_1) + P(B_2)P(C/B_2)} = \frac{\frac12 \cdot \frac23}{\frac12 \cdot \frac23 + \frac12 \cdot \frac23} = \frac12$
(iii) Let D be the event that a green ball is selected.
$P(D/B_1) = \frac13$, $P(D/B_2) = \frac23$
$P(B_2/D) = \frac{P(B_2)P(D/B_2)}{P(B_1)P(D/B_1) + P(B_2)P(D/B_2)} = \frac{\frac12 \cdot \frac23}{\frac12 \cdot \frac13 + \frac12 \cdot \frac23} = \frac23$
(iv) Let E be the event that a white ball is selected.
Neither box contains a white ball, hence the required probability is 0.
Random Variable:
A random variable is a real-valued function whose domain is the sample space
of a random experiment; it takes values on the real line.
Discrete Random Variable:
A discrete random variable is one which can take only a finite or countable
number of values, with a definite probability associated with each of them.
Probability mass function:
Let X be a discrete random variable assuming the values $x_1, x_2, \ldots, x_n$. With
each of these values we associate a number $P(X = x_i) = p(x_i) = p_i$, $i = 1, 2, \ldots, n$,
called the probability of $x_i$, satisfying the following conditions:
i. $p_i \ge 0$ for all $i$, i.e., the $p_i$'s are all non-negative;
ii. $\sum_{i=1}^{n} p_i = p_1 + p_2 + \cdots + p_n = 1$, i.e., the total probability is one.
Continuous random variable:
A continuous random variable is one which can assume every value between
two specified values. Probabilities are attached to intervals of values through a probability
density function; the probability that it takes any single particular value is zero.
Probability Density Function:
A function f is said to be the probability density function of a continuous
random variable X if it satisfies the following properties:
i. $f(x) \ge 0$, $-\infty < x < \infty$
ii. $\int_{-\infty}^{\infty} f(x)\,dx = 1$.
Distribution Function or Cumulative Distribution Function
i. Discrete variable:
The distribution function of a discrete random variable X is defined as
$F(x) = P(X \le x) = \sum_{x_i \le x} p(x_i)$.
ii. Continuous variable:
The distribution function of a continuous random variable X is defined as
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$.
Mathematical Expectation
The expected value of the random variable X is defined as follows:
i. If X is a discrete random variable, $E(X) = \sum_{i=1}^{\infty} x_i\,p(x_i)$, where $p(x)$ is the probability
function of X.
ii. If X is a continuous random variable, $E(X) = \int_{-\infty}^{\infty} x f(x)\,dx$, where $f(x)$ is the probability
density function of X.
Properties of Expectation:
1. If C is a constant, then $E(C) = C$.
Proof:
Let X be a discrete random variable; then $E(X) = \sum x\,p(x)$.
Now $E(C) = \sum C\,p(x) = C \sum_{i=1}^{n} p_i = C$, since $\sum_{i=1}^{n} p_i = p_1 + p_2 + \cdots + p_n = 1$.
2. If a, b are constants, then $E(aX + b) = aE(X) + b$.
Proof:
Let X be a discrete random variable; then $E(X) = \sum x\,p(x)$.
Now $E(aX + b) = \sum (ax + b)\,p(x)$
$= \sum ax\,p(x) + \sum b\,p(x)$
$= a \sum x\,p(x) + b \sum_{i=1}^{n} p_i$, since $\sum_{i=1}^{n} p_i = p_1 + p_2 + \cdots + p_n = 1$
$= aE(X) + b$
3. If a and b are constants, then $Var(aX + b) = a^2\,Var(X)$.
Proof:
$Var(aX + b) = E\left[(aX + b - E(aX + b))^2\right]$
$= E\left[(aX + b - aE(X) - b)^2\right]$
$= E\left[a^2 (X - E(X))^2\right]$
$= a^2 E\left[(X - E(X))^2\right]$
$= a^2\,Var(X)$.
4. If a is a constant, then $Var(aX) = a^2\,Var(X)$.
Proof:
$Var(aX) = E\left[(aX - E(aX))^2\right]$
$= E\left[(aX - aE(X))^2\right]$
$= E\left[a^2 (X - E(X))^2\right]$
$= a^2 E\left[(X - E(X))^2\right]$
$= a^2\,Var(X)$.
5. Prove that $Var(X) = E(X^2) - [E(X)]^2$.
Proof:
Write $\mu = E(X)$. Then
$Var(X) = E\left[(X - \mu)^2\right]$
$= E\left[X^2 + \mu^2 - 2\mu X\right]$
$= E(X^2) + E(\mu^2) - E(2\mu X)$
$= E(X^2) + \mu^2 - 2\mu E(X)$
$= E(X^2) + \mu^2 - 2\mu^2$
$= E(X^2) - \mu^2$
Hence $Var(X) = E(X^2) - [E(X)]^2$.
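The properties of expectation and variance can be confirmed on any small discrete distribution. The sketch below uses an illustrative pmf of its own, {0: 1/4, 1: 1/2, 2: 1/4}, and the assumed constants a = 3, b = -4. It checks E(aX + b) = aE(X) + b and Var(aX + b) = a²Var(X). It also confirms that the shortcut E(X²) - [E(X)]² agrees with the definition E[(X - μ)²]. The helper names `mean`, `var` and `transform` are hypothetical.

```python
from fractions import Fraction

def mean(pmf):
    """E(X) = sum of x p(x) for a pmf given as {x: p(x)}."""
    return sum(x * p for x, p in pmf.items())

def var(pmf):
    """Var(X) = E(X^2) - [E(X)]^2 (property 5)."""
    return sum(x * x * p for x, p in pmf.items()) - mean(pmf) ** 2

def transform(pmf, a, b):
    """pmf of aX + b (assumes a != 0, so the new values stay distinct)."""
    return {a * x + b: p for x, p in pmf.items()}

# Illustrative pmf (an assumption, not from the notes)
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}
a, b = 3, -4
Y = transform(pmf, a, b)

# Variance from the definition E[(X - mu)^2], for comparison with the shortcut
var_def = sum((x - mean(pmf)) ** 2 * p for x, p in pmf.items())

print(mean(pmf), var(pmf), var_def)                              # 1 1/2 1/2
print(mean(Y) == a * mean(pmf) + b, var(Y) == a**2 * var(pmf))   # True True
```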
Moment Generating Function (m.g.f.)
The moment generating function of a random variable X (about the origin) is
defined as
$$M_X(t) = E\left(e^{tX}\right) = \begin{cases} \int_{-\infty}^{\infty} e^{tx} f(x)\,dx, & \text{if } X \text{ is continuous} \\ \sum_{x} e^{tx}\,p(x), & \text{if } X \text{ is discrete} \end{cases}$$
Properties of Moment Generating Function
1. $M_{cX}(t) = M_X(ct)$
Proof:
$M_{cX}(t) = E\left(e^{cXt}\right) = E\left(e^{X(ct)}\right) = M_X(ct)$
2. $M_{X+c}(t) = e^{ct} M_X(t)$
Proof:
$M_{X+c}(t) = E\left(e^{(X+c)t}\right) = E\left(e^{Xt} e^{ct}\right) = e^{ct} E\left(e^{Xt}\right) = e^{ct} M_X(t)$
3. $M_{aX+b}(t) = e^{bt} M_X(at)$
Proof:
$M_{aX+b}(t) = E\left(e^{(aX+b)t}\right) = E\left(e^{aXt} e^{bt}\right) = e^{bt} E\left(e^{X(at)}\right) = e^{bt} M_X(at)$
4. If X and Y are independent random variables, then $M_{X+Y}(t) = M_X(t)\,M_Y(t)$.
Proof:
$M_{X+Y}(t) = E\left(e^{(X+Y)t}\right) = E\left(e^{Xt} e^{Yt}\right) = E\left(e^{Xt}\right) E\left(e^{Yt}\right)$ [by independence]
$M_{X+Y}(t) = M_X(t)\,M_Y(t)$
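Properties 3 and 4 of the m.g.f. can be checked numerically for a discrete distribution. The sketch assumes an illustrative pmf (the number of heads in two tosses of a fair coin) and the arbitrary values a = 3, b = -4, t = 0.2, none of which come from the notes. For property 4, it forms the distribution of X + Y for two independent copies by convolution. The last lines use the standard fact that the slope of M_X at t = 0 equals E(X); this fact is not proved in this section.

```python
import math

def mgf(pmf, t):
    """M_X(t) = E(e^{tX}) = sum of e^{tx} p(x) for a discrete pmf {x: p(x)}."""
    return sum(math.exp(t * x) * p for x, p in pmf.items())

# Illustrative pmf (assumed): number of heads in two tosses of a fair coin.
pmf = {0: 0.25, 1: 0.5, 2: 0.25}
a, b, t = 3.0, -4.0, 0.2

# Property 3: M_{aX+b}(t) = e^{bt} M_X(at)
pmf_ax_b = {a * x + b: p for x, p in pmf.items()}   # distribution of aX + b
lhs = mgf(pmf_ax_b, t)
rhs = math.exp(b * t) * mgf(pmf, a * t)
print(math.isclose(lhs, rhs))                       # True

# Property 4: for independent X and Y (two copies of the same pmf here),
# the pmf of X + Y is a convolution, and M_{X+Y}(t) = M_X(t) M_Y(t).
pmf_sum = {}
for x, px in pmf.items():
    for y, py in pmf.items():
        pmf_sum[x + y] = pmf_sum.get(x + y, 0.0) + px * py
print(math.isclose(mgf(pmf_sum, t), mgf(pmf, t) ** 2))   # True

# The slope of M_X at t = 0 equals E(X) (= 1 here); central-difference estimate.
h = 1e-5
print(round((mgf(pmf, h) - mgf(pmf, -h)) / (2 * h), 6))  # 1.0
```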
Problem.1
If the probability distribution of X is given as
X: 1 2 3 4
P(X): 0.4 0.3 0.2 0.1
find $P\left[\left(\frac12 < X < \frac72\right) / (X > 1)\right]$.
Solution:
$P\left[\left(\frac12 < X < \frac72\right) / (X > 1)\right] = \frac{P\left[\left(\frac12 < X < \frac72\right) \cap (X > 1)\right]}{P(X > 1)}$
$= \frac{P(X = 2 \text{ or } 3)}{P(X = 2, 3 \text{ or } 4)} = \frac{P(X = 2) + P(X = 3)}{P(X = 2) + P(X = 3) + P(X = 4)}$
$= \frac{0.3 + 0.2}{0.3 + 0.2 + 0.1} = \frac{0.5}{0.6} = \frac56$.
Problem.2
A random variable X has the following probability distribution:
X: -2 -1 0 1 2 3
P(X): 0.1 K 0.2 2K 0.3 3K
(a) Find K, (b) evaluate $P(X < 2)$ and $P(-2 < X < 2)$,
(c) find the cdf of X and (d) evaluate the mean of X.
Solution:
(a) Since $\sum P(X) = 1$,
$0.1 + K + 0.2 + 2K + 0.3 + 3K = 1$
$6K + 0.6 = 1$
$6K = 0.4$
$K = \frac{0.4}{6} = \frac{1}{15}$
(b) $P(X < 2) = P(X = -2, -1, 0 \text{ or } 1)$
$= P(X = -2) + P(X = -1) + P(X = 0) + P(X = 1)$
$= \frac{1}{10} + \frac{1}{15} + \frac15 + \frac{2}{15} = \frac{3 + 2 + 6 + 4}{30} = \frac{15}{30} = \frac12$
$P(-2 < X < 2) = P(X = -1, 0 \text{ or } 1)$
$= P(X = -1) + P(X = 0) + P(X = 1) = \frac{1}{15} + \frac15 + \frac{2}{15} = \frac{1 + 3 + 2}{15} = \frac{6}{15} = \frac25$
(c) The distribution function of X is given by $F(x) = P(X \le x)$:
x | P(X = x) | F(x) = P(X ≤ x)
-2 | 1/10 | 1/10
-1 | 1/15 | 1/6
0 | 2/10 | 11/30
1 | 2/15 | 1/2
2 | 3/10 | 4/5
3 | 3/15 | 1
(d) The mean of X is defined by $E(X) = \sum x\,P(x)$:
$E(X) = (-2)\left(\frac{1}{10}\right) + (-1)\left(\frac{1}{15}\right) + (0)\left(\frac15\right) + (1)\left(\frac{2}{15}\right) + (2)\left(\frac{3}{10}\right) + (3)\left(\frac15\right)$
$= -\frac15 - \frac{1}{15} + \frac{2}{15} + \frac35 + \frac35 = \frac{16}{15}$.
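The steps of Problem.2 (the distribution with K) can be reproduced with exact fractions. First solve for K from the condition that the probabilities add up to 1. Then build the cdf as running totals with `itertools.accumulate` and take the weighted sum for the mean. The split of each probability into a constant part and a multiple of K is just a convenient encoding chosen here.

```python
from fractions import Fraction
from itertools import accumulate

xs = [-2, -1, 0, 1, 2, 3]
# Each probability written as const + k_coef * K: 0.1, K, 0.2, 2K, 0.3, 3K
const = [Fraction(1, 10), 0, Fraction(2, 10), 0, Fraction(3, 10), 0]
k_coef = [0, 1, 0, 2, 0, 3]

K = (1 - sum(const)) / sum(k_coef)                 # from sum of probabilities = 1
probs = [c + m * K for c, m in zip(const, k_coef)]
cdf = list(accumulate(probs))                      # F(x) = P(X <= x)
mu = sum(x * p for x, p in zip(xs, probs))         # E(X)

print(K)                           # 1/15
print([str(F) for F in cdf])       # ['1/10', '1/6', '11/30', '1/2', '4/5', '1']
print(mu)                          # 16/15
```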
Problem.3
A random variable X has the following probability function:
X: 0 1 2 3 4 5 6 7
P(X): 0 K 2K 2K 3K K² 2K² 7K² + K
Find (i) K, (ii) evaluate $P(X < 6)$, $P(X \ge 6)$ and $P(0 < X < 5)$,
(iii) determine the distribution function of X,
(iv) $P(1.5 < X < 4.5 \,/\, X > 2)$,
(v) $E(3X - 4)$, $Var(3X - 4)$,
(vi) the smallest value of n for which $P(X \le n) > \frac12$.
Solution:
(i) Since $\sum_{x=0}^{7} P(X = x) = 1$,
$K + 2K + 2K + 3K + K^2 + 2K^2 + 7K^2 + K = 1$
$10K^2 + 9K - 1 = 0$
$(10K - 1)(K + 1) = 0$, so $K = \frac{1}{10}$ or $K = -1$
As P(X) cannot be negative, $K = \frac{1}{10}$.
(ii) $P(X < 6) = P(X = 0) + P(X = 1) + \cdots + P(X = 5)$
$= 0 + \frac{1}{10} + \frac{2}{10} + \frac{2}{10} + \frac{3}{10} + \frac{1}{100} = \frac{81}{100}$
Now $P(X \ge 6) = 1 - P(X < 6) = 1 - \frac{81}{100} = \frac{19}{100}$
Now $P(0 < X < 5) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4)$
$= K + 2K + 2K + 3K = 8K = \frac{8}{10} = \frac45$.
(iii) The distribution function of X is given by $F(x) = P(X \le x)$:
x | P(X = x) | F(x) = P(X ≤ x)
0 | 0 | 0
1 | 1/10 | 1/10
2 | 2/10 | 3/10
3 | 2/10 | 5/10
4 | 3/10 | 8/10
5 | 1/100 | 81/100
6 | 2/100 | 83/100
7 | 17/100 | 1
(iv) $P(1.5 < X < 4.5 \,/\, X > 2) = \frac{P(X = 3) + P(X = 4)}{1 - [P(X = 0) + P(X = 1) + P(X = 2)]} = \frac{5/10}{1 - 3/10} = \frac{5/10}{7/10} = \frac57$
(v) $E(X) = \sum x\,p(x)$
$= 1 \times \frac{1}{10} + 2 \times \frac{2}{10} + 3 \times \frac{2}{10} + 4 \times \frac{3}{10} + 5 \times \frac{1}{100} + 6 \times \frac{2}{100} + 7 \times \frac{17}{100} = 3.66$
$E(X^2) = \sum x^2\,p(x)$
$= 1^2 \times \frac{1}{10} + 2^2 \times \frac{2}{10} + 3^2 \times \frac{2}{10} + 4^2 \times \frac{3}{10} + 5^2 \times \frac{1}{100} + 6^2 \times \frac{2}{100} + 7^2 \times \frac{17}{100} = 16.8$
Mean $= E(X) = 3.66$
Variance $= E(X^2) - [E(X)]^2 = 16.8 - (3.66)^2 = 3.4044$
Hence, by the properties of expectation and variance,
$E(3X - 4) = 3E(X) - 4 = 6.98$ and $Var(3X - 4) = 3^2\,Var(X) = 9 \times 3.4044 = 30.6396$.
(vi) Since $F(3) = \frac{5}{10}$ is not greater than $\frac12$ while $F(4) = \frac{8}{10} > \frac12$, the smallest value of n for which
$P(X \le n) > \frac12$ is 4.
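Every part of Problem.3 can be verified in one pass with exact arithmetic. The sketch solves the quadratic 10K² + 9K - 1 = 0 and keeps the non-negative root. It then computes the moments, applies the linear-transformation rules E(3X - 4) = 3E(X) - 4 and Var(3X - 4) = 9 Var(X), and finds the smallest n with F(n) > 1/2. Variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate
import math

# 10K^2 + 9K - 1 = 0: take the non-negative root of the quadratic.
a, b, c = 10, 9, -1
disc = b * b - 4 * a * c                          # 81 + 40 = 121, a perfect square
roots = [Fraction(-b + s * math.isqrt(disc), 2 * a) for s in (1, -1)]
K = max(r for r in roots if r >= 0)               # K = 1/10 (K = -1 rejected)

probs = [0, K, 2*K, 2*K, 3*K, K**2, 2*K**2, 7*K**2 + K]   # P(X = 0), ..., P(X = 7)
xs = range(8)
assert sum(probs) == 1                            # sanity check on the pmf

EX = sum(x * p for x, p in zip(xs, probs))
EX2 = sum(x * x * p for x, p in zip(xs, probs))
VarX = EX2 - EX**2
cdf = list(accumulate(probs))
n_min = next(n for n, F in zip(xs, cdf) if F > Fraction(1, 2))

print(K, float(EX), float(EX2), float(VarX))   # 1/10 3.66 16.8 3.4044
print(float(3 * EX - 4), float(9 * VarX))      # 6.98 30.6396
print(n_min)                                   # 4
```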
Problem.4
The probability mass function of a random variable X is defined as $P(X = 0) = 3C^2$,
$P(X = 1) = 4C - 10C^2$, $P(X = 2) = 5C - 1$, where $C > 0$, and $P(X = r) = 0$ if $r \neq 0, 1, 2$.
Find (i) the value of C,
(ii) $P(0 < X < 2 \,/\, X > 0)$,
(iii) the distribution function of X,
(iv) the largest value of x for which $F(x) < \frac12$.
Solution:
(i) Since $\sum_{x=0}^{2} p(x) = 1$,
$p(0) + p(1) + p(2) = 1$
$3C^2 + 4C - 10C^2 + 5C - 1 = 1$
$7C^2 - 9C + 2 = 0$
$(C - 1)(7C - 2) = 0$, so $C = 1$ or $C = \frac27$
$C = 1$ is not applicable, since it gives $P(X = 1) = 4 - 10 = -6 < 0$.
Hence $C = \frac27$.
The probability distribution is
X: 0 1 2
P(X): 12/49 16/49 21/49
(ii) $P(0 < X < 2 \,/\, X > 0) = \frac{P[(0 < X < 2) \cap (X > 0)]}{P(X > 0)} = \frac{P(X = 1)}{P(X = 1) + P(X = 2)}$
$= \frac{16/49}{16/49 + 21/49} = \frac{16}{37}$
(iii) The distribution function of X is
x | F(x) = P(X ≤ x)
0 | F(0) = P(X ≤ 0) = 12/49 ≈ 0.24
1 | F(1) = P(X = 0) + P(X = 1) = 12/49 + 16/49 = 28/49 ≈ 0.57
2 | F(2) = P(X = 0) + P(X = 1) + P(X = 2) = 12/49 + 16/49 + 21/49 = 1
(iv) The largest value of x for which $F(x) = P(X \le x) < \frac12$ is 0.
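Problem.5
If $P(x) = \begin{cases} \frac{x}{15}, & x = 1, 2, 3, 4, 5 \\ 0, & \text{elsewhere} \end{cases}$
find (i) $P(X = 1 \text{ or } 2)$ and (ii) $P\left[\left(\frac12 < X < \frac52\right) / (X > 1)\right]$.
Solution:
(i) $P(X = 1 \text{ or } 2) = P(X = 1) + P(X = 2) = \frac{1}{15} + \frac{2}{15} = \frac{3}{15} = \frac15$
(ii) $P\left[\left(\frac12 < X < \frac52\right) / (X > 1)\right] = \frac{P\left[\left(\frac12 < X < \frac52\right) \cap (X > 1)\right]}{P(X > 1)} = \frac{P[(X = 1 \text{ or } 2) \cap (X > 1)]}{P(X > 1)}$
$= \frac{P(X = 2)}{1 - P(X = 1)} = \frac{2/15}{1 - 1/15} = \frac{2/15}{14/15} = \frac{2}{14} = \frac17$.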
Problem.6
A continuous random variable $X$ has the probability density function $f(x) = 3x^2$, $0 \le x \le 1$. Find $a$ such that $P(X \le a) = P(X > a)$.
Solution:
Since $P(X \le a) = P(X > a)$ and the two probabilities add up to 1, each must equal $\frac{1}{2}$.
$$\therefore P(X \le a) = \frac{1}{2} \;\Rightarrow\; \int_0^a f(x)\,dx = \frac{1}{2}$$
$$\int_0^a 3x^2\,dx = 3\left[\frac{x^3}{3}\right]_0^a = a^3 = \frac{1}{2}$$
$$a = \left(\frac{1}{2}\right)^{1/3}$$
Problem.7
A random variable $X$ has the p.d.f. $f(x)$ given by $f(x) = \begin{cases} Cxe^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases}$. Find the value of $C$ and the cumulative distribution function of $X$.
Solution:
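Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
$$\int_0^{\infty} Cxe^{-x}\,dx = 1$$
$$C\left[x(-e^{-x}) - e^{-x}\right]_0^{\infty} = 1 \;\Rightarrow\; C[0 - (-1)] = 1$$
$$C = 1$$
$$\therefore f(x) = \begin{cases} xe^{-x}, & x > 0 \\ 0, & x \le 0 \end{cases}$$
The cumulative distribution function of $X$ is
$$F(x) = \int_0^x f(t)\,dt = \int_0^x te^{-t}\,dt = \left[-te^{-t} - e^{-t}\right]_0^x = -xe^{-x} - e^{-x} + 1$$
$$= 1 - (1 + x)e^{-x}, \quad x > 0,$$
and $F(x) = 0$ for $x \le 0$.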
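One way to check the closed form $F(x) = 1 - (1 + x)e^{-x}$ is to integrate the density numerically and compare the two. The sketch below uses a hand-written composite Simpson rule; `simpson` is an illustrative helper, not a library function.

```python
import math

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b]; n must be even."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

pdf = lambda x: x * math.exp(-x)             # f(x) with C = 1
cdf = lambda x: 1 - (1 + x) * math.exp(-x)   # closed form derived above

print(simpson(pdf, 0.0, 50.0, 10000))        # total probability, close to 1
for x in (0.5, 1.0, 2.0, 5.0):
    print(x, simpson(pdf, 0.0, x), cdf(x))   # numerical vs closed-form F(x)
```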
1
 ( x + 1) ; −1  x  1
8. If a random variable X has the p.d.f f ( x ) =  2 . Find the mean
 0 ; otherwise
and variance of X .
Solution:
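$$\text{Mean} = \mu_1' = \int_{-1}^{1} xf(x)\,dx = \frac{1}{2}\int_{-1}^{1} x(x + 1)\,dx = \frac{1}{2}\int_{-1}^{1} (x^2 + x)\,dx = \frac{1}{2}\left[\frac{x^3}{3} + \frac{x^2}{2}\right]_{-1}^{1} = \frac{1}{3}$$
$$\mu_2' = \int_{-1}^{1} x^2 f(x)\,dx = \frac{1}{2}\int_{-1}^{1} (x^3 + x^2)\,dx = \frac{1}{2}\left[\frac{x^4}{4} + \frac{x^3}{3}\right]_{-1}^{1} = \frac{1}{2}\left[\frac{1}{4} + \frac{1}{3} - \frac{1}{4} + \frac{1}{3}\right] = \frac{1}{2}\cdot\frac{2}{3} = \frac{1}{3}$$
$$\text{Variance} = \mu_2' - (\mu_1')^2 = \frac{1}{3} - \frac{1}{9} = \frac{3 - 1}{9} = \frac{2}{9}$$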
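Because every integrand in this problem is a polynomial, the moments can be computed exactly. This is a minimal sketch; `integrate_poly` is a hypothetical helper that integrates $\sum c_k x^k$ term by term using `Fraction`.

```python
from fractions import Fraction as Fr

def integrate_poly(coeffs, a, b):
    """Exact integral of sum(c_k * x**k) over [a, b]; coeffs[k] = c_k."""
    F = lambda x: sum(Fr(c) * Fr(x) ** (k + 1) / (k + 1) for k, c in enumerate(coeffs))
    return F(b) - F(a)

half = Fr(1, 2)
pdf = [half, half]            # f(x) = (1 + x)/2 on (-1, 1)
x_pdf = [0, half, half]       # x f(x) = (x + x^2)/2
x2_pdf = [0, 0, half, half]   # x^2 f(x) = (x^2 + x^3)/2

total = integrate_poly(pdf, -1, 1)
mean = integrate_poly(x_pdf, -1, 1)
second = integrate_poly(x2_pdf, -1, 1)
variance = second - mean ** 2
print(total, mean, second, variance)
```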
9. A continuous random variable X that can assume any value between X = 2 and
$X = 5$ has a probability density function given by $f(x) = k(1 + x)$. Find $P(X < 4)$.
Solution:
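Given that $X$ is a continuous random variable whose p.d.f. is $f(x) = \begin{cases} k(1 + x), & 2 \le x \le 5 \\ 0, & \text{otherwise} \end{cases}$
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1 \;\Rightarrow\; \int_2^5 k(1 + x)\,dx = 1$,
$$k\left[\frac{(1 + x)^2}{2}\right]_2^5 = 1$$
$$k\left[\frac{(1 + 5)^2}{2} - \frac{(1 + 2)^2}{2}\right] = 1$$
$$k\left[18 - \frac{9}{2}\right] = 1$$
$$k\left(\frac{27}{2}\right) = 1 \;\Rightarrow\; k = \frac{2}{27}$$
$$\therefore f(x) = \begin{cases} \dfrac{2(1 + x)}{27}, & 2 \le x \le 5 \\ 0, & \text{otherwise} \end{cases}$$
$$P(X < 4) = \frac{2}{27}\int_2^4 (1 + x)\,dx = \frac{2}{27}\left[\frac{(1 + x)^2}{2}\right]_2^4 = \frac{2}{27}\left[\frac{(1 + 4)^2}{2} - \frac{(1 + 2)^2}{2}\right] = \frac{2}{27}\left[\frac{25}{2} - \frac{9}{2}\right] = \frac{2}{27}\cdot\frac{16}{2} = \frac{16}{27}$$
10. A random variable $X$ has density function given by $f(x) = \begin{cases} 2e^{-2x}, & x \ge 0 \\ 0, & x < 0 \end{cases}$. Find its m.g.f.
Solution: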
$$M_X(t) = E(e^{tX}) = \int_0^{\infty} e^{tx} f(x)\,dx = \int_0^{\infty} e^{tx}\,2e^{-2x}\,dx = 2\int_0^{\infty} e^{(t - 2)x}\,dx$$
$$= 2\left[\frac{e^{(t - 2)x}}{t - 2}\right]_0^{\infty} = \frac{2}{2 - t}, \quad t < 2.$$
11. The pdf of a random variable $X$ is given by $f(x) = \begin{cases} 2x, & 0 < x < b \\ 0, & \text{otherwise} \end{cases}$. For what value of $b$ is $f(x)$ a valid pdf? Also find the cdf of the random variable $X$ with the above pdf.
Solution:
Given $f(x) = \begin{cases} 2x, & 0 < x < b \\ 0, & \text{otherwise} \end{cases}$
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1 \;\Rightarrow\; \int_0^b 2x\,dx = 1$,
$$2\left[\frac{x^2}{2}\right]_0^b = 1 \;\Rightarrow\; b^2 - 0 = 1 \;\Rightarrow\; b = 1$$
(taking the positive root, since $b > 0$).
$$\therefore f(x) = \begin{cases} 2x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$$
For $x < 0$: $F(x) = P(X \le x) = \int_{-\infty}^{x} 0\,dt = 0$
For $0 \le x \le 1$: $F(x) = P(X \le x) = \int_0^x 2t\,dt = 2\left[\frac{t^2}{2}\right]_0^x = x^2$
For $x > 1$: $F(x) = \int_{-\infty}^{0} 0\,dt + \int_0^1 2t\,dt + \int_1^x 0\,dt = 2\left[\frac{t^2}{2}\right]_0^1 = 1$
$$\therefore F(x) = \begin{cases} 0, & x < 0 \\ x^2, & 0 \le x \le 1 \\ 1, & x > 1 \end{cases}$$
12. A random variable $X$ has density function $f(x) = \begin{cases} \dfrac{K}{1 + x^2}, & -\infty < x < \infty \\ 0, & \text{otherwise} \end{cases}$
Determine $K$ and the distribution function. Evaluate the probability $P(X \ge 0)$.
Solution:
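Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
$$\int_{-\infty}^{\infty} \frac{K}{1 + x^2}\,dx = 1$$
$$K\left[\tan^{-1}x\right]_{-\infty}^{\infty} = 1$$
$$K\left[\frac{\pi}{2} - \left(-\frac{\pi}{2}\right)\right] = 1 \;\Rightarrow\; K\pi = 1 \;\Rightarrow\; K = \frac{1}{\pi}$$
$$F(x) = \int_{-\infty}^{x} f(t)\,dt = \frac{1}{\pi}\int_{-\infty}^{x} \frac{dt}{1 + t^2} = \frac{1}{\pi}\left[\tan^{-1}x - \left(-\frac{\pi}{2}\right)\right]$$
$$F(x) = \frac{1}{\pi}\left[\frac{\pi}{2} + \tan^{-1}x\right], \quad -\infty < x < \infty$$
$$P(X \ge 0) = \frac{1}{\pi}\int_0^{\infty} \frac{dx}{1 + x^2} = \frac{1}{\pi}\left[\tan^{-1}x\right]_0^{\infty} = \frac{1}{\pi}\left[\frac{\pi}{2} - \tan^{-1}0\right] = \frac{1}{2}$$
13. If $X$ has the probability density function $f(x) = \begin{cases} Ke^{-3x}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$, find $K$, $P(0.5 \le X \le 1)$ and the mean of $X$.
Solution:
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
$$\int_0^{\infty} Ke^{-3x}\,dx = 1 \;\Rightarrow\; K\left[\frac{e^{-3x}}{-3}\right]_0^{\infty} = 1 \;\Rightarrow\; \frac{K}{3} = 1 \;\Rightarrow\; K = 3$$
$$P(0.5 \le X \le 1) = \int_{0.5}^{1} f(x)\,dx = 3\int_{0.5}^{1} e^{-3x}\,dx = 3\left[\frac{e^{-3} - e^{-1.5}}{-3}\right] = e^{-1.5} - e^{-3}$$
$$\text{Mean of } X = E(X) = \int_0^{\infty} xf(x)\,dx = 3\int_0^{\infty} xe^{-3x}\,dx = 3\left[x\left(\frac{-e^{-3x}}{3}\right) - 1\cdot\left(\frac{e^{-3x}}{9}\right)\right]_0^{\infty} = 3\cdot\frac{1}{9} = \frac{1}{3}$$
Hence the mean of $X$ is $E(X) = \frac{1}{3}$.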
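The three results for Problem 13 ($K = 3$, $P(0.5 \le X \le 1) = e^{-1.5} - e^{-3}$, $E(X) = \frac{1}{3}$) can be cross-checked by numerical integration. A minimal sketch follows; `simpson` is an illustrative helper, and truncating the infinite range at 20 is an assumption that is harmless here because $e^{-60}$ is negligible.

```python
import math

def simpson(f, a, b, n=2000):
    """Composite Simpson's rule on [a, b] (n even)."""
    h = (b - a) / n
    odd = sum(f(a + i * h) for i in range(1, n, 2))
    even = sum(f(a + i * h) for i in range(2, n, 2))
    return (f(a) + f(b) + 4 * odd + 2 * even) * h / 3

K = 3
pdf = lambda x: K * math.exp(-3 * x)

total = simpson(pdf, 0, 20)                   # total probability, close to 1
prob = simpson(pdf, 0.5, 1)                   # P(0.5 <= X <= 1)
mean = simpson(lambda x: x * pdf(x), 0, 20)   # E(X)
print(total, prob, math.exp(-1.5) - math.exp(-3), mean)
```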
14. If $X$ is a continuous random variable with pdf given by
$$f(x) = \begin{cases} Kx, & 0 \le x \le 2 \\ 2K, & 2 \le x \le 4 \\ 6K - Kx, & 4 \le x \le 6 \\ 0, & \text{elsewhere} \end{cases}$$
find the value of $K$ and also the cdf $F(x)$.
Solution:
Since $\int_{-\infty}^{\infty} f(x)\,dx = 1$,
$$\int_0^2 Kx\,dx + \int_2^4 2K\,dx + \int_4^6 (6K - Kx)\,dx = 1$$
$$K\left\{\left[\frac{x^2}{2}\right]_0^2 + [2x]_2^4 + \left[6x - \frac{x^2}{2}\right]_4^6\right\} = 1$$
$$K[2 + 8 - 4 + 36 - 18 - 24 + 8] = 1$$
$$8K = 1 \;\Rightarrow\; K = \frac{1}{8}$$
We know that $F(x) = \int_{-\infty}^{x} f(t)\,dt$.
If $x < 0$, then $F(x) = \int_{-\infty}^{x} 0\,dt = 0$.
If $0 \le x \le 2$, then
$$F(x) = \int_{-\infty}^{0} 0\,dt + \int_0^x Kt\,dt = \frac{1}{8}\int_0^x t\,dt = \left[\frac{t^2}{16}\right]_0^x = \frac{x^2}{16}$$
If $2 \le x \le 4$, then
$$F(x) = \int_{-\infty}^{0} 0\,dt + \int_0^2 Kt\,dt + \int_2^x 2K\,dt = \int_0^2 \frac{t}{8}\,dt + \int_2^x \frac{1}{4}\,dt = \left[\frac{t^2}{16}\right]_0^2 + \left[\frac{t}{4}\right]_2^x$$
$$= \frac{1}{4} + \frac{x}{4} - \frac{1}{2} = \frac{x}{4} - \frac{1}{4} = \frac{x - 1}{4}$$
If $4 \le x \le 6$, then
$$F(x) = \int_{-\infty}^{0} 0\,dt + \int_0^2 Kt\,dt + \int_2^4 2K\,dt + \int_4^x K(6 - t)\,dt = \int_0^2 \frac{t}{8}\,dt + \int_2^4 \frac{1}{4}\,dt + \int_4^x \frac{6 - t}{8}\,dt$$
$$= \left[\frac{t^2}{16}\right]_0^2 + \left[\frac{t}{4}\right]_2^4 + \left[\frac{6t}{8} - \frac{t^2}{16}\right]_4^x = \frac{1}{4} + 1 - \frac{1}{2} + \frac{6x}{8} - \frac{x^2}{16} - 3 + 1$$
$$= \frac{4 + 16 - 8 + 12x - x^2 - 48 + 16}{16} = \frac{-x^2 + 12x - 20}{16}$$
If $x \ge 6$, then
$$F(x) = \int_{-\infty}^{0} 0\,dt + \int_0^2 Kt\,dt + \int_2^4 2K\,dt + \int_4^6 K(6 - t)\,dt + \int_6^x 0\,dt = 1$$
$$\therefore F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{x^2}{16}, & 0 \le x \le 2 \\ \dfrac{1}{4}(x - 1), & 2 \le x \le 4 \\ -\dfrac{1}{16}(20 - 12x + x^2), & 4 \le x \le 6 \\ 1, & x \ge 6 \end{cases}$$
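A quick consistency test for a piecewise cdf is that neighbouring pieces agree at every break point, and that the function climbs from 0 to 1. The following is a minimal sketch using exact fractions; the list `pieces` simply transcribes the five formulas above.

```python
from fractions import Fraction as Fr

# The five pieces of F(x) from Problem 14 (K = 1/8), in interval order
pieces = [
    lambda x: Fr(0),                       # x < 0
    lambda x: x**2 / 16,                   # 0 <= x <= 2
    lambda x: (x - 1) / 4,                 # 2 <= x <= 4
    lambda x: -(20 - 12 * x + x**2) / 16,  # 4 <= x <= 6
    lambda x: Fr(1),                       # x >= 6
]
breaks = [0, 2, 4, 6]

def F(x):
    x = Fr(x)
    i = sum(x > b for b in breaks)   # index of the interval containing x
    return pieces[i](x)

# neighbouring pieces must agree at each break point (F is continuous)
for i, b in enumerate(breaks):
    print(b, pieces[i](Fr(b)), pieces[i + 1](Fr(b)))
```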
15. A random variable $X$ has the p.d.f. $f(x) = \begin{cases} 2x, & 0 < x < 1 \\ 0, & \text{otherwise} \end{cases}$
Find (i) $P\left(X < \frac{1}{2}\right)$ (ii) $P\left(\frac{1}{4} < X < \frac{1}{2}\right)$ (iii) $P\left(X > \frac{3}{4} \mid X > \frac{1}{2}\right)$.
Solution:
(i)
$$P\left(X < \tfrac{1}{2}\right) = \int_0^{1/2} f(x)\,dx = \int_0^{1/2} 2x\,dx = 2\left[\frac{x^2}{2}\right]_0^{1/2} = 2\cdot\frac{1}{8} = \frac{1}{4}$$
(ii)
$$P\left(\tfrac{1}{4} < X < \tfrac{1}{2}\right) = \int_{1/4}^{1/2} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{1/4}^{1/2} = 2\left(\frac{1}{8} - \frac{1}{32}\right) = \frac{1}{4} - \frac{1}{16} = \frac{3}{16}$$
(iii)
$$P\left(X > \tfrac{3}{4} \,\middle|\, X > \tfrac{1}{2}\right) = \frac{P\left[\left(X > \tfrac{3}{4}\right) \cap \left(X > \tfrac{1}{2}\right)\right]}{P\left(X > \tfrac{1}{2}\right)} = \frac{P\left(X > \tfrac{3}{4}\right)}{P\left(X > \tfrac{1}{2}\right)}$$
$$P\left(X > \tfrac{3}{4}\right) = \int_{3/4}^{1} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{3/4}^{1} = 1 - \frac{9}{16} = \frac{7}{16}$$
$$P\left(X > \tfrac{1}{2}\right) = \int_{1/2}^{1} 2x\,dx = 2\left[\frac{x^2}{2}\right]_{1/2}^{1} = 1 - \frac{1}{4} = \frac{3}{4}$$
$$P\left(X > \tfrac{3}{4} \,\middle|\, X > \tfrac{1}{2}\right) = \frac{7/16}{3/4} = \frac{7}{16}\cdot\frac{4}{3} = \frac{7}{12}$$
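All three answers follow from the cdf $F(x) = x^2$ on $[0, 1]$ (obtained by integrating $2x$, as in Problem 11). A minimal exact check:

```python
from fractions import Fraction as Fr

F = lambda x: Fr(x) ** 2     # cdf of f(x) = 2x, valid for 0 <= x <= 1

p_i = F(Fr(1, 2))                               # P(X < 1/2)
p_ii = F(Fr(1, 2)) - F(Fr(1, 4))                # P(1/4 < X < 1/2)
p_iii = (1 - F(Fr(3, 4))) / (1 - F(Fr(1, 2)))   # P(X > 3/4 | X > 1/2)
print(p_i, p_ii, p_iii)
```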
16. Let the random variable $X$ have the p.d.f. $f(x) = \begin{cases} \frac{1}{2}e^{-x/2}, & x > 0 \\ 0, & \text{otherwise} \end{cases}$. Find the moment generating function, mean and variance of $X$.
Solution:
$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \frac{1}{2}\int_0^{\infty} e^{tx}e^{-x/2}\,dx = \frac{1}{2}\int_0^{\infty} e^{-\left(\frac{1}{2} - t\right)x}\,dx$$
$$= \frac{1}{2}\left[\frac{e^{-\left(\frac{1}{2} - t\right)x}}{-\left(\frac{1}{2} - t\right)}\right]_0^{\infty} = \frac{1}{1 - 2t}, \quad \text{if } t < \frac{1}{2}$$
$$E(X) = \left[\frac{d}{dt}M_X(t)\right]_{t=0} = \left[\frac{2}{(1 - 2t)^2}\right]_{t=0} = 2$$
$$E(X^2) = \left[\frac{d^2}{dt^2}M_X(t)\right]_{t=0} = \left[\frac{8}{(1 - 2t)^3}\right]_{t=0} = 8$$
$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 8 - 4 = 4$$
17. The first four moments of a distribution about $x = 4$ are 1, 4, 10 and 45 respectively. Show that the mean is 5, the variance is 3, $\mu_3 = 0$ and $\mu_4 = 26$.
Solution:
Given $\mu_1' = 1$, $\mu_2' = 4$, $\mu_3' = 10$, $\mu_4' = 45$, where $\mu_r'$ is the $r$th moment about the value $x = 4$.
Here $A = 4$.
Mean $= A + \mu_1' = 4 + 1 = 5$
$$\text{Variance} = \mu_2 = \mu_2' - (\mu_1')^2 = 4 - 1 = 3$$
$$\mu_3 = \mu_3' - 3\mu_2'\mu_1' + 2(\mu_1')^3 = 10 - 3(4)(1) + 2(1)^3 = 0$$
$$\mu_4 = \mu_4' - 4\mu_3'\mu_1' + 6\mu_2'(\mu_1')^2 - 3(\mu_1')^4 = 45 - 4(10)(1) + 6(4)(1)^2 - 3(1)^4 = 26$$
18. Find the moment generating function and the $r$th moments for the distribution whose p.d.f. is $f(x) = Ke^{-x}$, $0 \le x < \infty$. Find also the standard deviation.
Solution:
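Total probability = 1:
$$\int_0^{\infty} Ke^{-x}\,dx = 1 \;\Rightarrow\; K\left[\frac{e^{-x}}{-1}\right]_0^{\infty} = 1 \;\Rightarrow\; K = 1$$
$$M_X(t) = E[e^{tX}] = \int_0^{\infty} e^{tx}e^{-x}\,dx = \int_0^{\infty} e^{(t - 1)x}\,dx = \left[\frac{e^{(t - 1)x}}{t - 1}\right]_0^{\infty} = \frac{1}{1 - t}, \quad t < 1$$
$$= (1 - t)^{-1} = 1 + t + t^2 + \cdots + t^r + \cdots$$
$$\mu_r' = \text{coefficient of } \frac{t^r}{r!} = r!$$
When $r = 1$, $\mu_1' = 1! = 1$; when $r = 2$, $\mu_2' = 2! = 2$.
$$\text{Variance} = \mu_2' - (\mu_1')^2 = 2 - 1 = 1$$
$\therefore$ Standard deviation $= 1$.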
19. A continuous random variable $X$ has the p.d.f. $f(x) = kx^2e^{-x}$, $x \ge 0$. Find the $r$th moment of $X$ about the origin. Hence find the mean and variance of $X$.
Solution:
Since $\int_0^{\infty} Kx^2e^{-x}\,dx = 1$,
$$K\left[x^2\left(\frac{e^{-x}}{-1}\right) - 2x\left(\frac{e^{-x}}{1}\right) + 2\left(\frac{e^{-x}}{-1}\right)\right]_0^{\infty} = 1$$
$$2K = 1 \;\Rightarrow\; K = \frac{1}{2}$$
$$\mu_r' = \int_0^{\infty} x^r f(x)\,dx = \frac{1}{2}\int_0^{\infty} x^{r+2}e^{-x}\,dx = \frac{1}{2}\int_0^{\infty} e^{-x}x^{(r+3)-1}\,dx = \frac{(r + 2)!}{2}$$
Putting $r = 1$: $\mu_1' = \frac{3!}{2} = 3$
Putting $r = 2$: $\mu_2' = \frac{4!}{2} = 12$
$\therefore$ Mean $= \mu_1' = 3$
$$\text{Variance} = \mu_2 = \mu_2' - (\mu_1')^2 = 12 - (3)^2 = 12 - 9 = 3$$
20. Find the moment generating function of the random variable $X$ with probability density function
$$f(x) = \begin{cases} x, & 0 \le x \le 1 \\ 2 - x, & 1 \le x \le 2 \\ 0, & \text{otherwise} \end{cases}$$
Also find $\mu_1'$ and $\mu_2'$.
Solution:
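$$M_X(t) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_0^1 e^{tx}x\,dx + \int_1^2 e^{tx}(2 - x)\,dx$$
$$= \left[\frac{xe^{tx}}{t} - \frac{e^{tx}}{t^2}\right]_0^1 + \left[(2 - x)\frac{e^{tx}}{t} - (-1)\frac{e^{tx}}{t^2}\right]_1^2$$
$$= \frac{e^t}{t} - \frac{e^t}{t^2} + \frac{1}{t^2} + \frac{e^{2t}}{t^2} - \frac{e^t}{t} - \frac{e^t}{t^2} = \left(\frac{e^t - 1}{t}\right)^2$$
$$= \left[\frac{\left(1 + \frac{t}{1!} + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots\right) - 1}{t}\right]^2 = \left(1 + \frac{t}{2!} + \frac{t^2}{3!} + \frac{t^3}{4!} + \cdots\right)^2 = 1 + t + \frac{7}{12}t^2 + \cdots$$
$$\mu_1' = \text{coefficient of } \frac{t}{1!} = 1$$
$$\mu_2' = \text{coefficient of } \frac{t^2}{2!} = 2\cdot\frac{7}{12} = \frac{7}{6}$$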
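21. The p.d.f. of the r.v. $X$ follows the probability law $f(x) = \dfrac{1}{2\lambda}e^{-|x - \mu|/\lambda}$, $-\infty < x < \infty$, $\lambda > 0$.
Find the m.g.f. of $X$ and also find $E[X]$ and $V[X]$.
Solution:
$$M_X(t) = E(e^{tX}) = \int_{-\infty}^{\infty} e^{tx} f(x)\,dx = \int_{-\infty}^{\infty} \frac{1}{2\lambda}e^{-|x - \mu|/\lambda}e^{tx}\,dx$$
$$= \frac{1}{2\lambda}\int_{-\infty}^{\mu} e^{(x - \mu)/\lambda}e^{tx}\,dx + \frac{1}{2\lambda}\int_{\mu}^{\infty} e^{-(x - \mu)/\lambda}e^{tx}\,dx$$
$$M_X(t) = \frac{e^{-\mu/\lambda}}{2\lambda}\int_{-\infty}^{\mu} e^{x\left(t + \frac{1}{\lambda}\right)}\,dx + \frac{e^{\mu/\lambda}}{2\lambda}\int_{\mu}^{\infty} e^{-x\left(\frac{1}{\lambda} - t\right)}\,dx$$
$$= \frac{e^{-\mu/\lambda}}{2\lambda}\cdot\frac{e^{\mu\left(t + \frac{1}{\lambda}\right)}}{t + \frac{1}{\lambda}} + \frac{e^{\mu/\lambda}}{2\lambda}\cdot\frac{e^{-\mu\left(\frac{1}{\lambda} - t\right)}}{\frac{1}{\lambda} - t}$$
$$= \frac{e^{\mu t}}{2(1 + \lambda t)} + \frac{e^{\mu t}}{2(1 - \lambda t)} = \frac{e^{\mu t}}{1 - \lambda^2t^2} = e^{\mu t}\left[1 - (\lambda t)^2\right]^{-1}, \quad |t| < \frac{1}{\lambda}$$
$$= \left[1 + \mu t + \frac{\mu^2t^2}{2!} + \cdots\right]\left[1 + \lambda^2t^2 + \lambda^4t^4 + \cdots\right]$$
$$= 1 + \mu t + (\mu^2 + 2\lambda^2)\frac{t^2}{2!} + \cdots$$
$$E(X) = \mu_1' = \text{coefficient of } t \text{ in } M_X(t) = \mu$$
$$\mu_2' = \text{coefficient of } \frac{t^2}{2!} \text{ in } M_X(t) = \mu^2 + 2\lambda^2$$
$$\text{Var}(X) = \mu_2' - (\mu_1')^2 = \mu^2 + 2\lambda^2 - \mu^2 = 2\lambda^2$$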
22. The elementary probability law of a continuous random variable is $f(x) = y_0e^{-b(x - a)}$, $a \le x < \infty$, $b > 0$, where $a$, $b$ and $y_0$ are constants. Find $y_0$, the $r$th moment about the point $x = a$, and also the mean and variance.
Solution:
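Since the total probability is unity,
$$\int_{-\infty}^{\infty} f(x)\,dx = 1$$
$$y_0\int_a^{\infty} e^{-b(x - a)}\,dx = 1$$
$$y_0\left[\frac{e^{-b(x - a)}}{-b}\right]_a^{\infty} = 1$$
$$y_0\left(\frac{1}{b}\right) = 1 \;\Rightarrow\; y_0 = b$$
$$\mu_r\;(r\text{th moment about the point } x = a) = \int_{-\infty}^{\infty} (x - a)^r f(x)\,dx = b\int_a^{\infty} (x - a)^r e^{-b(x - a)}\,dx$$
Put $x - a = t$, $dx = dt$; when $x = a$, $t = 0$ and when $x = \infty$, $t = \infty$.
$$= b\int_0^{\infty} t^r e^{-bt}\,dt = b\,\frac{\Gamma(r + 1)}{b^{r+1}} = \frac{r!}{b^r}$$
In particular, for $r = 1$ and $r = 2$,
$$\mu_1 = \frac{1}{b}, \qquad \mu_2 = \frac{2}{b^2}$$
$$\text{Mean} = a + \mu_1 = a + \frac{1}{b}$$
$$\text{Variance} = \mu_2 - \mu_1^2 = \frac{2}{b^2} - \frac{1}{b^2} = \frac{1}{b^2}$$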
23. The lifetime (in hours) of a certain piece of equipment is a continuous r.v. having range $0 < x < \infty$ and p.d.f. $f(x) = \begin{cases} xe^{-kx}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$. Determine the constant $k$ and evaluate the probability that the lifetime exceeds 2 hours.
Solution:
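Let $X$ be the lifetime of a certain piece of equipment. Then the p.d.f. is $f(x) = \begin{cases} xe^{-kx}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$
To find $k$: $\int_0^{\infty} f(x)\,dx = 1$
$$\int_0^{\infty} e^{-kx}x^{2-1}\,dx = 1 \;\Rightarrow\; \frac{\Gamma(2)}{k^2} = 1 \;\Rightarrow\; k^2 = 1 \;\Rightarrow\; k = 1 \quad (k > 0)$$
$$\therefore f(x) = \begin{cases} xe^{-x}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases}$$
$$P[\text{lifetime exceeds 2 hours}] = P(X > 2) = \int_2^{\infty} f(x)\,dx = \int_2^{\infty} xe^{-x}\,dx = \left[x(-e^{-x}) - e^{-x}\right]_2^{\infty}$$
$$= 2e^{-2} + e^{-2} = 3e^{-2} = 0.4060$$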
24. If the continuous random variable $X$ has the Rayleigh density
$$f(x) = \frac{x}{\sigma^2}e^{-x^2/(2\sigma^2)}\,U(x),$$
find $E(X^n)$ and deduce the values of $E(X)$ and $\text{Var}(X)$.
Solution:
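Here $U(x) = \begin{cases} 1, & x > 0 \\ 0, & x \le 0 \end{cases}$
$$E(X^n) = \int_0^{\infty} x^n f(x)\,dx = \int_0^{\infty} x^n\,\frac{x}{\sigma^2}e^{-x^2/(2\sigma^2)}\,dx$$
Put $\frac{x^2}{2\sigma^2} = t$, so that $\frac{x\,dx}{\sigma^2} = dt$ and $x = \sigma\sqrt{2t}$; when $x = 0$, $t = 0$ and when $x = \infty$, $t = \infty$.
$$= \int_0^{\infty} \left(2\sigma^2t\right)^{n/2}e^{-t}\,dt = 2^{n/2}\sigma^n\int_0^{\infty} t^{n/2}e^{-t}\,dt$$
$$E(X^n) = 2^{n/2}\sigma^n\,\Gamma\left(\frac{n}{2} + 1\right) \qquad (1)$$
Putting $n = 1$ in (1), we get
$$E(X) = 2^{1/2}\sigma\,\Gamma\left(\frac{3}{2}\right) = \sqrt{2}\,\sigma\,\Gamma\left(\frac{1}{2} + 1\right) = \sqrt{2}\,\sigma\cdot\frac{1}{2}\Gamma\left(\frac{1}{2}\right) = \frac{\sigma}{\sqrt{2}}\sqrt{\pi} \qquad \left[\because \Gamma\left(\tfrac{1}{2}\right) = \sqrt{\pi}\right]$$
$$\therefore E(X) = \sigma\sqrt{\frac{\pi}{2}}$$
Putting $n = 2$ in (1), we get
$$E(X^2) = 2\sigma^2\,\Gamma(2) = 2\sigma^2 \qquad [\because \Gamma(2) = 1]$$
$$\text{Var}(X) = E(X^2) - [E(X)]^2 = 2\sigma^2 - \frac{\pi}{2}\sigma^2 = \left(2 - \frac{\pi}{2}\right)\sigma^2 = \left(\frac{4 - \pi}{2}\right)\sigma^2$$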
Standard Distributions
Discrete type
Binomial distribution:
A random variable $X$ is said to follow the binomial distribution if it assumes only non-negative values and its probability mass function is given by
$$P(X = x) = p(x) = \begin{cases} {}^nC_x\,p^x q^{n-x}, & x = 0, 1, 2, \ldots, n;\; q = 1 - p \\ 0, & \text{otherwise} \end{cases}$$
Notation: $X \sim B(n, p)$, read as "$X$ follows the binomial distribution with parameters $n$ and $p$".
1. Find m.g.f. of Binomial distribution and find its mean and variance.
Solution:
M.G.F. of the binomial distribution:
$$M_X(t) = E[e^{tX}] = \sum_{x=0}^{n} e^{tx}P(X = x) = \sum_{x=0}^{n} {}^nC_x\,p^x q^{n-x}e^{tx} = \sum_{x=0}^{n} {}^nC_x\,(pe^t)^x q^{n-x}$$
$$M_X(t) = (q + pe^t)^n$$
Mean of the binomial distribution:
$$\text{Mean} = E(X) = M_X'(0) = \left[n(q + pe^t)^{n-1}pe^t\right]_{t=0} = np \quad (\text{since } q + p = 1)$$
$$E(X^2) = M_X''(0) = \left[n(n - 1)(q + pe^t)^{n-2}(pe^t)^2 + npe^t(q + pe^t)^{n-1}\right]_{t=0}$$
$$E(X^2) = n(n - 1)p^2 + np = n^2p^2 + np(1 - p) = n^2p^2 + npq$$
$$\text{Variance} = E(X^2) - [E(X)]^2 = npq$$
Mean $= np$; Variance $= npq$
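2. Comment on the following: "The mean of a binomial distribution is 3 and the variance is 4."
Solution:
In a binomial distribution, mean > variance (since variance $= npq = \text{mean} \times q$ with $q < 1$), but here variance > mean.
Since Variance = 4 and Mean = 3, the given statement is wrong.
3. If $X$ and $Y$ are independent binomial variates $B\left(5, \frac{1}{2}\right)$ and $B\left(7, \frac{1}{2}\right)$, find $P[X + Y = 3]$.
Solution:
Since both variates have the same $p$, $X + Y$ is also a binomial variate with parameters $n_1 + n_2 = 12$ and $p = \frac{1}{2}$.
$$\therefore P[X + Y = 3] = {}^{12}C_3\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^9 = \frac{220}{2^{12}} = \frac{55}{2^{10}}$$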
4. (i) Six dice are thrown 729 times. How many times do you expect at least 3 dice to show 5 or 6?
(ii) Six coins are tossed 6400 times. Using the Poisson distribution, what is the approximate probability of getting six heads 10 times?
Solution:
(i) Let $X$ be the number of dice showing 5 or 6 in one throw of six dice.
$$P[5 \text{ or } 6] = \frac{1}{6} + \frac{1}{6} = \frac{1}{3}$$
$$\therefore p = \frac{1}{3} \text{ and } q = \frac{2}{3}$$
Here $n = 6$. By the binomial distribution,
$$P[X = x] = {}^6C_x\left(\frac{1}{3}\right)^x\left(\frac{2}{3}\right)^{6-x}, \quad x = 0, 1, 2, \ldots, 6$$
$$P[X \ge 3] = P(3) + P(4) + P(5) + P(6)$$
$$= {}^6C_3\left(\frac{1}{3}\right)^3\left(\frac{2}{3}\right)^3 + {}^6C_4\left(\frac{1}{3}\right)^4\left(\frac{2}{3}\right)^2 + {}^6C_5\left(\frac{1}{3}\right)^5\left(\frac{2}{3}\right) + {}^6C_6\left(\frac{1}{3}\right)^6 = \frac{233}{729} = 0.3196$$
Expected number of times at least 3 dice show 5 or 6 $= N \times P[X \ge 3] = 729 \times 0.3196 = 233$.
(ii) The probability of getting six heads in one toss of six coins is $p = \left(\frac{1}{2}\right)^6$, so
$$\lambda = np = 6400 \times \left(\frac{1}{2}\right)^6 = 100$$
Let $X$ be the number of times six heads appear. Then
$$P(X = 10) = \frac{e^{-100}(100)^{10}}{10!} = 1.025 \times 10^{-30}$$
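Both parts can be reproduced directly: part (i) exactly with fractions and part (ii) in floating point. The following is a minimal sketch using only the standard library.

```python
from fractions import Fraction as Fr
from math import comb, exp, factorial

# (i) six dice, success = "5 or 6" with p = 1/3
p = Fr(1, 3)
at_least_3 = sum(comb(6, x) * p**x * (1 - p)**(6 - x) for x in range(3, 7))
expected = 729 * at_least_3
print(at_least_3, float(at_least_3), expected)   # 233/729, about 0.3196, and 233

# (ii) Poisson approximation with lambda = 6400 * (1/2)**6 = 100
lam = 6400 * 0.5**6
prob = exp(-lam) * lam**10 / factorial(10)
print(lam, prob)                                  # 100.0 and about 1.025e-30
```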
Poisson distribution:
A random variable $X$ is said to follow the Poisson distribution if it assumes only non-negative values and its probability mass function is given by
$$P(X = x) = \begin{cases} \dfrac{e^{-\lambda}\lambda^x}{x!}, & x = 0, 1, 2, \ldots;\; \lambda > 0 \\ 0, & \text{otherwise} \end{cases}$$
Notation: $X \sim P(\lambda)$, read as "$X$ follows the Poisson distribution with parameter $\lambda$".
Poisson distribution as limiting form of binomial distribution:
Poisson distribution is a limiting case of Binomial distribution under the
following conditions:
(i) $n$, the number of trials, is indefinitely large, i.e. $n \to \infty$.
(ii) $p$, the constant probability of success in each trial, is very small, i.e. $p \to 0$.
(iii) $np = \lambda$ is finite.
Proof:
$$P(X = x) = p(x) = {}^nC_x\,p^x q^{n-x}$$
Let $np = \lambda$, so that $p = \frac{\lambda}{n}$ and $q = 1 - \frac{\lambda}{n}$.
$$\therefore p(x) = {}^nC_x\left(\frac{\lambda}{n}\right)^x\left(1 - \frac{\lambda}{n}\right)^{n-x} = \frac{n!}{x!\,(n - x)!}\left(\frac{\lambda}{n}\right)^x\left(1 - \frac{\lambda}{n}\right)^{n-x}$$
$$= \frac{n(n - 1)\cdots(n - (x - 1))\,(n - x)!}{x!\,(n - x)!}\cdot\frac{\lambda^x}{n^x}\left(1 - \frac{\lambda}{n}\right)^{n-x}$$
$$p(x) = 1\cdot\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x - 1}{n}\right)\frac{\lambda^x}{x!}\left(1 - \frac{\lambda}{n}\right)^{n-x}$$
Taking the limit as $n \to \infty$ on both sides,
$$\lim_{n\to\infty} p(x) = \frac{\lambda^x}{x!}\lim_{n\to\infty}\left[\left(1 - \frac{1}{n}\right)\cdots\left(1 - \frac{x - 1}{n}\right)\right]\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{-x}\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^{n}$$
The first two limits equal 1 and $\lim_{n\to\infty}\left(1 - \frac{\lambda}{n}\right)^n = e^{-\lambda}$, so
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
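The limit can also be seen numerically: hold $np = \lambda$ fixed, let $n$ grow, and watch the binomial probability approach the Poisson one. This is a minimal sketch; $\lambda = 2$ and $x = 3$ are arbitrary illustrative values.

```python
from math import comb, exp, factorial

lam, x = 2.0, 3                      # illustrative lambda and x, not from the notes
poisson = exp(-lam) * lam**x / factorial(x)

errors = []
for n in (10, 100, 1000, 10000):
    p = lam / n                      # shrink p so that np = lambda stays fixed
    binom = comb(n, x) * p**x * (1 - p) ** (n - x)
    errors.append(abs(binom - poisson))
    print(n, binom, poisson)
```

The gap shrinks roughly in proportion to $1/n$, which is consistent with the factors $(1 - \frac{k}{n}) \to 1$ in the proof.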
Problem.1 Criticise the following statement: "The mean of a Poisson distribution is 5 while the standard deviation is 4."
Solution:
For a Poisson distribution the mean and the variance are equal. Here the mean is 5 but the variance is $4^2 = 16$, so the statement is not true.
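Problem.2 If $X$ is a Poisson variate with $P(X = 2) = 9P(X = 4) + 90P(X = 6)$, find the mean and variance of $X$.
Solution:
$$P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}, \quad x = 0, 1, 2, \ldots$$
$$P(X = 2) = 9P(X = 4) + 90P(X = 6)$$
$$\frac{e^{-\lambda}\lambda^2}{2!} = 9\,\frac{e^{-\lambda}\lambda^4}{4!} + 90\,\frac{e^{-\lambda}\lambda^6}{6!}$$
Dividing both sides by $e^{-\lambda}\lambda^2$,
$$\frac{1}{2!} = \frac{9\lambda^2}{4!} + \frac{90\lambda^4}{6!} \;\Rightarrow\; \frac{1}{2} = \frac{3\lambda^2}{8} + \frac{\lambda^4}{8} \;\Rightarrow\; 1 = \frac{3\lambda^2}{4} + \frac{\lambda^4}{4}$$
$$\lambda^4 + 3\lambda^2 - 4 = 0$$
Put $\lambda^2 = t$: $t^2 + 3t - 4 = 0$
$$(t + 4)(t - 1) = 0 \;\Rightarrow\; t = 1, -4$$
$$\lambda^2 = 1 \text{ or } \lambda^2 = -4 \;\Rightarrow\; \lambda = \pm 1 \text{ or } \lambda = \pm 2i$$
$$\therefore \text{Mean} = \lambda = 1 \quad (\because \lambda > 0)$$
Variance $= \lambda = 1$.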
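The substitution $t = \lambda^2$ turns the quartic into a quadratic, which can be solved and then checked against the original condition. This is a minimal sketch; `pois` is an illustrative helper for the Poisson pmf.

```python
import math

# lambda^4 + 3 lambda^2 - 4 = 0 is a quadratic in t = lambda^2: t^2 + 3t - 4 = 0
a, b, c = 1, 3, -4
disc = b * b - 4 * a * c
t_roots = [(-b + math.sqrt(disc)) / (2 * a), (-b - math.sqrt(disc)) / (2 * a)]
lam = math.sqrt(max(t_roots))        # only t = 1 gives a real, positive lambda

pois = lambda k: math.exp(-lam) * lam**k / math.factorial(k)
print(t_roots, lam)
print(pois(2), 9 * pois(4) + 90 * pois(6))   # the two sides of the given condition
```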
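Problem.3 If $X$ is a Poisson r.v. such that $P(X = 1) = 0.3$ and $P(X = 2) = 0.2$, find $P(X = 0)$.
Solution:
Given that $X$ is a Poisson r.v., $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, \ldots$
$$P(X = 1) = \frac{e^{-\lambda}\lambda^1}{1!} = 0.3 \qquad (1)$$
$$P(X = 2) = \frac{e^{-\lambda}\lambda^2}{2!} = 0.2 \qquad (2)$$
$$\frac{(1)}{(2)}:\quad \frac{e^{-\lambda}\lambda}{e^{-\lambda}\lambda^2/2} = \frac{0.3}{0.2} \;\Rightarrow\; \frac{1}{\lambda} = \frac{0.3}{2(0.2)} = 0.75$$
$$\lambda = 1.3333$$
$$\therefore P(X = x) = \frac{e^{-1.3333}(1.3333)^x}{x!}$$
$$\therefore P(X = 0) = e^{-1.3333} = 0.2636$$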
Problem.4 Out of 800 families with 4 children each, how many families would be
expected to have (i) 2 boys and 2 girls, (ii) at least 1 boy, (iii) at most 2 girls and (iv)
children of both sexes. Assume equal probabilities for boys and girls.
Solution:
Considering each child as a trial, $n = 4$. Assuming that the birth of a boy is a success,
$$p = \frac{1}{2}, \quad q = \frac{1}{2}$$
Let $X$ denote the number of successes (boys).
(i)
$$P(2 \text{ boys and } 2 \text{ girls}) = P(X = 2) = {}^4C_2\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^2 = 0.375$$
$\therefore$ No. of families having 2 boys and 2 girls $= N \cdot P(X = 2) = 800(0.375) = 300$
(ii)
$$P(\text{at least 1 boy}) = P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - {}^4C_0\left(\frac{1}{2}\right)^0\left(\frac{1}{2}\right)^4 = 0.9375$$
$\therefore$ No. of families having at least 1 boy $= N \cdot P(X \ge 1) = 800(0.9375) = 750$
(iii)
$$P(\text{at most 2 girls}) = P(\text{exactly 0, 1 or 2 girls}) = P(X = 4) + P(X = 3) + P(X = 2)$$
$$= {}^4C_4\left(\frac{1}{2}\right)^4\left(\frac{1}{2}\right)^0 + {}^4C_3\left(\frac{1}{2}\right)^3\left(\frac{1}{2}\right)^1 + {}^4C_2\left(\frac{1}{2}\right)^2\left(\frac{1}{2}\right)^2 = 0.6875$$
$\therefore$ No. of families having at most 2 girls $= 800(0.6875) = 550$
(iv)
$$P(\text{children of both genders}) = 1 - P(\text{children of the same gender}) = 1 - [P(\text{all boys}) + P(\text{all girls})]$$
$$= 1 - [P(X = 4) + P(X = 0)] = 1 - \left[{}^4C_4\left(\frac{1}{2}\right)^4 + {}^4C_0\left(\frac{1}{2}\right)^4\right] = 0.875$$
$\therefore$ No. of families having children of both genders $= 800(0.875) = 700$
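All four counts follow from the single pmf of $X \sim B(4, \frac{1}{2})$. A minimal sketch follows; because every probability here is a multiple of $1/16$, the floating-point results are exact.

```python
from math import comb

N, n = 800, 4
pmf = [comb(n, x) / 2**n for x in range(n + 1)]   # X = number of boys, p = q = 1/2

two_each = N * pmf[2]                      # (i) 2 boys and 2 girls
at_least_one_boy = N * (1 - pmf[0])        # (ii)
at_most_two_girls = N * sum(pmf[2:])       # (iii) at most 2 girls = at least 2 boys
both_genders = N * (1 - pmf[0] - pmf[4])   # (iv)
print(two_each, at_least_one_boy, at_most_two_girls, both_genders)
```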
Continuous type
Uniform (or) Rectangular distribution:
A continuous random variable $X$ is said to have a uniform distribution over an interval $(a, b)$ if its probability density function is given by
$$f(x) = \begin{cases} \dfrac{1}{b - a}, & a < x < b \\ 0, & \text{otherwise} \end{cases}$$
Problem.1 If $X$ is uniformly distributed with mean 1 and variance $\frac{4}{3}$, find $P[X < 0]$.
Solution:
If $X$ is uniformly distributed over $(a, b)$, then
$$E(X) = \frac{b + a}{2} \quad \text{and} \quad V(X) = \frac{(b - a)^2}{12}$$
$$\frac{b + a}{2} = 1 \;\Rightarrow\; a + b = 2$$
$$\frac{(b - a)^2}{12} = \frac{4}{3} \;\Rightarrow\; (b - a)^2 = 16 \;\Rightarrow\; b - a = 4$$
Solving $a + b = 2$ and $b - a = 4$, we get $b = 3$, $a = -1$.
$\therefore a = -1$, $b = 3$ and the probability density function of $X$ is
$$f(x) = \begin{cases} \dfrac{1}{4}, & -1 < x < 3 \\ 0, & \text{otherwise} \end{cases}$$
$$P[X < 0] = \int_{-1}^{0} \frac{1}{4}\,dx = \frac{1}{4}[x]_{-1}^{0} = \frac{1}{4}$$
Normal distribution:
A random variable $X$ is said to have a normal distribution with parameters $\mu$ (mean) and $\sigma^2$ (variance) if its probability density function is given by the probability law
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}, \quad -\infty < x < \infty,\; -\infty < \mu < \infty,\; \sigma > 0$$
Notation: $X \sim N(\mu, \sigma^2)$, read as "$X$ follows the normal distribution with mean $\mu$ and variance $\sigma^2$"; $\mu$ and $\sigma^2$ are called the parameters.
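Problem.1 Prove that for the standard normal distribution $N(0, 1)$, $M_X(t) = e^{t^2/2}$.
Solution:
Moment generating function of the normal distribution:
$$M_X(t) = E[e^{tX}] = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx}e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}\,dx$$
Put $z = \frac{x - \mu}{\sigma}$; then $\sigma\,dz = dx$ and $-\infty < z < \infty$.
$$\therefore M_X(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{t(\sigma z + \mu)}e^{-\frac{z^2}{2}}\,dz = \frac{e^{\mu t}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\left(\frac{z^2}{2} - t\sigma z\right)}\,dz$$
$$= \frac{e^{\mu t}}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}\left[(z - \sigma t)^2 - \sigma^2t^2\right]}\,dz = e^{\mu t + \frac{\sigma^2t^2}{2}}\,\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \sigma t)^2}\,dz$$
Since the total area under the normal curve is unity, we have
$$\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-\frac{1}{2}(z - \sigma t)^2}\,dz = 1$$
Hence $M_X(t) = e^{\mu t + \frac{\sigma^2t^2}{2}}$.
For the standard normal variable $N(0, 1)$ ($\mu = 0$, $\sigma = 1$),
$$M_X(t) = e^{t^2/2}$$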
Problem.2 State and prove the additive property of the normal distribution.
Solution:
Statement:
If $X_1, X_2, \ldots, X_n$ are $n$ independent normal random variates with parameters $(\mu_1, \sigma_1^2), (\mu_2, \sigma_2^2), \ldots, (\mu_n, \sigma_n^2)$, then $X_1 + X_2 + \cdots + X_n$ is also a normal random variable with parameters
$$\left(\sum_{i=1}^{n}\mu_i,\; \sum_{i=1}^{n}\sigma_i^2\right).$$
Proof:
Since the variates are independent, $M_{X_1 + X_2 + \cdots + X_n}(t) = M_{X_1}(t)\,M_{X_2}(t)\cdots M_{X_n}(t)$.
But $M_{X_i}(t) = e^{\mu_it + \frac{t^2\sigma_i^2}{2}}$, $i = 1, 2, \ldots, n$.
$$M_{X_1 + X_2 + \cdots + X_n}(t) = e^{\mu_1t + \frac{t^2\sigma_1^2}{2}}\,e^{\mu_2t + \frac{t^2\sigma_2^2}{2}}\cdots e^{\mu_nt + \frac{t^2\sigma_n^2}{2}}$$
$$= e^{(\mu_1 + \mu_2 + \cdots + \mu_n)t + \frac{(\sigma_1^2 + \sigma_2^2 + \cdots + \sigma_n^2)t^2}{2}} = e^{\left(\sum_{i=1}^{n}\mu_i\right)t + \frac{\left(\sum_{i=1}^{n}\sigma_i^2\right)t^2}{2}}$$
By the uniqueness of the MGF, $X_1 + X_2 + \cdots + X_n$ follows a normal distribution with parameters $\left(\sum_{i=1}^{n}\mu_i, \sum_{i=1}^{n}\sigma_i^2\right)$.
This proves the property.
Problem.3 $X$ is a normal variate with mean 30 and S.D. 5. Find $P[26 \le X \le 40]$.
Solution:
$X \sim N(30, 5^2)$, so $\mu = 30$ and $\sigma = 5$.
Let $Z = \frac{X - \mu}{\sigma}$ be the standard normal variate.
$$P[26 \le X \le 40] = P\left[\frac{26 - 30}{5} \le Z \le \frac{40 - 30}{5}\right]$$
$$= P[-0.8 \le Z \le 2] = P[-0.8 \le Z \le 0] + P[0 \le Z \le 2]$$
$$= P[0 \le Z \le 0.8] + P[0 \le Z \le 2] \quad \text{(by symmetry)}$$
$$= 0.2881 + 0.4772 = 0.7653$$
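Python's standard library includes `statistics.NormalDist`, which evaluates the normal cdf directly and so can stand in for the area table. A minimal check:

```python
from statistics import NormalDist

X = NormalDist(mu=30, sigma=5)
prob = X.cdf(40) - X.cdf(26)    # P(26 <= X <= 40)
print(round(prob, 4))
```

The result is 0.7654 to four decimals. The table answer, 0.7653, differs only because each area was rounded before the two were added.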
Problem.4 The average percentage of marks of candidates in an examination is 42, with a standard deviation of 10. The minimum for a pass is 50%. If 1000 candidates appear for the examination, how many can be expected to pass? If it is required that double that number should pass, what should the average percentage of marks be?
Solution:
Let $X$ be the marks of the candidates. Then $X \sim N(42, 10^2)$.
Let $z = \frac{X - 42}{10}$.
$$P[X > 50] = P[Z > 0.8] = 0.5 - P[0 \le Z \le 0.8] = 0.5 - 0.2881 = 0.2119$$
Since 1000 students write the test, nearly 212 students would pass the examination.
If double that number should pass, the number of passes should be 424.
We have to find $z_1$ such that $P[Z > z_1] = 0.424$:
$$P[0 \le Z \le z_1] = 0.5 - 0.424 = 0.076$$
From tables, $z_1 = 0.19$.
$$z_1 = \frac{50 - \bar{x}_1}{10} \;\Rightarrow\; \bar{x}_1 = 50 - 10z_1 = 50 - 1.9 = 48.1$$
The average mark should be 48, nearly.
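Both parts of Problem 4 can be redone with `statistics.NormalDist`, using `inv_cdf` instead of reading $z_1$ from a table. This is a minimal sketch under the same assumptions as the solution ($X \sim N(42, 10^2)$, pass mark 50).

```python
from statistics import NormalDist

mu, sigma, pass_mark, N = 42, 10, 50, 1000
marks = NormalDist(mu, sigma)

passes = N * (1 - marks.cdf(pass_mark))   # expected number passing, about 212
target = 2 * round(passes) / N            # pass fraction required: 424/1000
z1 = NormalDist().inv_cdf(1 - target)     # P(Z > z1) = target
new_mean = pass_mark - sigma * z1         # mean that puts 50 at z1 s.d. above it
print(round(passes), round(z1, 3), round(new_mean, 2))
```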
Problem.5 Given that $X$ is normally distributed with mean 10 and $P[X > 12] = 0.1587$, what is the probability that $X$ will fall in the interval $(9, 11)$?
Solution:
Given that $X$ is normally distributed with mean $\mu = 10$.
Let $z = \frac{x - \mu}{\sigma}$ be the standard normal variate.
For $X = 12$, $z = \frac{12 - 10}{\sigma} = \frac{2}{\sigma}$. Put $z_1 = \frac{2}{\sigma}$.
Then $P[X > 12] = 0.1587$
$$P[Z > z_1] = 0.1587 \;\Rightarrow\; 0.5 - P[0 \le Z \le z_1] = 0.1587 \;\Rightarrow\; P[0 \le Z \le z_1] = 0.3413$$
From the area table, $P[0 \le Z \le 1] = 0.3413$.
$$\therefore z_1 = 1 \;\Rightarrow\; \frac{2}{\sigma} = 1 \;\Rightarrow\; \sigma = 2$$
To find $P[9 < X < 11]$: for $X = 9$, $z = -\frac{1}{2}$, and for $X = 11$, $z = \frac{1}{2}$.
$$\therefore P[9 < X < 11] = P[-0.5 < Z < 0.5] = 2P[0 < Z < 0.5] = 2 \times 0.1915 = 0.3830$$
Problem.6 In a normal distribution, 31% of the items are under 45 and 8% are over 64. Find the mean and standard deviation of the distribution.
Solution:
Let $\mu$ be the mean and $\sigma$ the standard deviation.
Then $P[X < 45] = 0.31$ and $P[X > 64] = 0.08$.
When $X = 45$, $Z = \frac{45 - \mu}{\sigma} = -z_1$, where $z_1$ is the value of $z$ corresponding to the area $\int_0^{z_1}\phi(z)\,dz = 0.19$.
$$\therefore z_1 = 0.495 \;\Rightarrow\; 45 - \mu = -0.495\sigma \qquad (1)$$
When $X = 64$, $Z = \frac{64 - \mu}{\sigma} = z_2$, where $z_2$ is the value of $z$ corresponding to the area $\int_0^{z_2}\phi(z)\,dz = 0.42$.
$$\therefore z_2 = 1.405 \;\Rightarrow\; 64 - \mu = 1.405\sigma \qquad (2)$$
Subtracting (1) from (2) gives $19 = 1.9\sigma$, so $\sigma = 10$ (approx.) and $\mu = 45 + 0.495(10) \approx 50$ (approx.).
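The two percentage conditions give two linear equations in $\mu$ and $\sigma$. With `NormalDist.inv_cdf` supplying the $z$ values, the equations can be solved directly. A minimal sketch:

```python
from statistics import NormalDist

Z = NormalDist()
# P(X < 45) = 0.31 and P(X > 64) = 0.08 give
#   45 = mu + sigma * z_a   and   64 = mu + sigma * z_b
z_a = Z.inv_cdf(0.31)        # about -0.496
z_b = Z.inv_cdf(1 - 0.08)    # about 1.405
sigma = (64 - 45) / (z_b - z_a)
mu = 45 - sigma * z_a
print(round(mu, 2), round(sigma, 2))
```

The unrounded quantiles give values very close to the table-based answer of $\mu \approx 50$, $\sigma \approx 10$.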
UNIT –II : SAMPLING DISTRIBUTION & ESTIMATION
Population:
The group of individuals under study is called population. The population may be finite or infinite.
Sample and Sample Size:
A finite subset of statistical individuals in a population is called Sample. The number of individuals in a
sample is called Sample Size(n).
Parameter and Statistic:
A numerical measure of a population is called a population parameter or simply a parameter.
A numerical measure of the sample is called a sample statistic or simply a statistic.
Sampling distribution:
The sampling distribution of a statistic is the probability distribution of all possible values the statistic
may take, when computed from random samples of same size, drawn from a specified population. Like
any other distribution, a sampling distribution will have its mean, standard deviation and moments of
higher order.
Standard Error:
The standard deviation of the sampling distribution of a statistic is known as its standard error.
Uses of Standard Error:
The magnitude of the standard error gives an index of the reliability of the estimate of the parameter: the greater the standard error of the estimate, the less reliable the estimate.
Standard error is useful for determining the probable limits, or confidence limits, for an unknown parameter with a specified confidence coefficient. Standard error is also used for testing of hypothesis.
Type I error and Type II error:
Type I error: If we reject the null hypothesis $H_0$ when it is true (i.e. when it should be accepted), we say that a type I error has been made.
Type II error: If we accept the null hypothesis $H_0$ when it is false (i.e. when it should be rejected), we say that a type II error has been made.
Critical region:
A region of the sample space, defined in terms of a test statistic, that leads to the rejection of $H_0$ (the null hypothesis) is called the critical region or region of rejection.
The region complementary to the critical region is called the region of acceptance.
Level of significance:
The probability $\alpha$ (the probability of making a type I error) that a random value of the test statistic belongs to the critical region is known as the level of significance. In other words, the level of significance is the size of the type I error.
The levels of significance usually employed in testing of hypothesis are 5% and 1%.
Critical values or significant values:
The value of test statistic which divides the critical (or rejection) region and acceptance region is called
the critical value or significant value. It depends on the level of significance used and the alternative
hypothesis.
Different types of sampling:
Non probability Samples: Judgment sample, Quota sample, Chunk sample.
Probability samples: Simple random sample, stratified sample, systematic sample, Cluster sample.
One tailed test:
When the hypothesis about the population parameter is rejected only for the value of sample statistic
falling into one of the tails of the sampling distribution, then it is known as one-tailed test.
If the rejection region lies in the right tail, the test is called a right-tailed test (one-sided alternative to the right); if it lies in the left tail, it is called a left-tailed test (one-sided alternative to the left).
Two tailed test:
A two-tailed test is one where the hypothesis about the population parameter is rejected when the value of the sample statistic falls into either tail of the sampling distribution.
Systematic sampling:
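In a systematic sample, the N items in the population are partitioned into n groups of k items each, where k = N/n is obtained by dividing the size of the population by the desired sample size n. The first item is selected at random from the first group, and every k-th item is selected thereafter.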
Stratified Sampling:
In a stratified sample, the N items in the population are first subdivided into separate subpopulations, or
strata, according to some common characteristic.
Cluster Sampling:
In a cluster sample, the N items in the population are divided into several clusters so that each cluster is
representative of the entire population.
Sampling Error:
Sampling errors have their origin in sampling. They arise because only a part of the population is used to estimate population parameters and draw inferences about them.
Estimator:
An estimator of a population parameter is a sample statistic used to estimate the parameter. An estimate
of the parameter is a particular numerical value of the estimator obtained by sampling.
Different types of estimation:
There are two types of estimation. They are Point estimation and Interval estimation.
Point estimation:
When a single value is used as an estimate, the estimate is called a point estimate of the population parameter. For example, the sample mean $\bar{x}$ is the sample statistic used as a point estimate of the population mean μ.
Interval estimation:
An estimate of a population parameter given by two numbers between which the parameter may be
considered to lie is called an interval estimate of the parameter.
The interval estimate or a confidence interval consists of an upper confidence limit and lower confidence
limit and we assign a probability that this interval contains the unknown population parameter.
Characteristics of a good estimator:
The important properties of good statistical estimators are (i) unbiasedness (ii) efficiency (iii) consistency
(iv) sufficiency.
Unbiased estimator:
An estimator is said to be unbiased if its expected value is equal to the population parameter it estimates.
Consistent estimator:
An estimator is said to be consistent if its probability of being close to the parameter it estimates
increases as the sample size increases.
Efficient estimator:
An estimator is efficient if it has a smaller variance than other estimators of the same parameter.
Sufficient estimator:
An estimator is said to be sufficient if it contains all the information in the data about the parameter it
estimates.
FORMULAS:
1. Write the confidence interval for the population mean for large samples when $\sigma$ is known.
The confidence interval for μ when $\sigma$ is known and sampling is done from a normal population, or with a large sample, is $\bar{x} \pm Z_{\alpha/2}\dfrac{\sigma}{\sqrt{n}}$.
Here $\bar{x}$ is the sample mean, $\sigma$ the standard deviation and $n$ the size of the sample.
2. Write the confidence interval for the population mean for small samples when $\sigma$ is unknown.
The confidence interval for μ when $\sigma$ is not known is $\bar{x} \pm t_{\alpha/2}\dfrac{s}{\sqrt{n}}$, where $s^2 = \dfrac{1}{n - 1}\sum(x_i - \bar{x})^2$ and $t_{\alpha/2}$ has $n - 1$ degrees of freedom.
Here $\bar{x}$ is the sample mean, $s$ the sample standard deviation and $n$ the size of the sample.
3. Write the confidence interval for the difference between two population means for large samples when $\sigma$ is known.
The confidence limits for the difference between two population means are given by
$$(\bar{x}_1 - \bar{x}_2) \pm Z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$
4. Write the confidence interval for the difference between two population means for small samples when $\sigma$ is unknown.
The confidence limits for the difference between two population means are given by
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\,S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}, \quad \text{where } S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum(x_i - \bar{x})^2 + \sum(y_i - \bar{y})^2\right]$$
and $t_{\alpha/2}$ has $n_1 + n_2 - 2$ degrees of freedom.
5. Write the confidence interval for the population proportion for large samples.
The confidence interval for the population proportion $P$ is $p \pm Z_{\alpha/2}\sqrt{\dfrac{pq}{n}}$, where $q = 1 - p$.
6. Write the confidence interval for the difference between two population proportions for large samples.
Confidence limits for the difference between two population proportions are
$$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1q_1}{n_1} + \frac{p_2q_2}{n_2}}$$
7. Write the confidence interval for a mean when the size $N$ of a finite population is known.
$$\mu = \bar{x} \pm z_{\alpha/2}\frac{\sigma}{\sqrt{n}}\sqrt{\frac{N - n}{N - 1}}$$
8. What is the sample size for estimating a population mean when the standard deviation and the permissible error are known?
$$n = \left(\frac{Z_{\alpha/2}\,\sigma}{E}\right)^2, \quad E = \text{permissible (sampling) error}$$
9. What is the sample size for estimating a population proportion when the proportion and the permissible error are known?
$$n = \frac{Z_{\alpha/2}^2\,pq}{E^2}, \quad E = \text{permissible (sampling) error}$$
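The large-sample formulas above translate directly into small functions. The following is a minimal sketch using only Python's standard library, with `statistics.NormalDist` supplying $Z_{\alpha/2}$. The function names are illustrative. The t-based formulas 2 and 4 are omitted because the standard library has no t-quantile function, and formula 8's sample size is rounded up to a whole unit by assumption.

```python
import math
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value Z_{alpha/2} of the standard normal."""
    return NormalDist().inv_cdf(0.5 + confidence / 2)

def mean_ci(xbar, sigma, n, confidence=0.95):
    """Formula 1: xbar +/- Z_{alpha/2} * sigma / sqrt(n)."""
    half = z_critical(confidence) * sigma / math.sqrt(n)
    return xbar - half, xbar + half

def diff_means_ci(x1, x2, s1, s2, n1, n2, confidence=0.95):
    """Formula 3: (x1 - x2) +/- Z_{alpha/2} * sqrt(s1^2/n1 + s2^2/n2)."""
    half = z_critical(confidence) * math.sqrt(s1**2 / n1 + s2**2 / n2)
    return x1 - x2 - half, x1 - x2 + half

def proportion_ci(p, n, confidence=0.95):
    """Formula 5: p +/- Z_{alpha/2} * sqrt(p q / n)."""
    half = z_critical(confidence) * math.sqrt(p * (1 - p) / n)
    return p - half, p + half

def sample_size_mean(sigma, E, confidence=0.95):
    """Formula 8: n = (Z_{alpha/2} sigma / E)^2, rounded up."""
    return math.ceil((z_critical(confidence) * sigma / E) ** 2)

print(mean_ci(90, 1.6, 64))                    # machine-parts problem below
print(diff_means_ci(109, 98, 11, 9, 90, 90))   # I.Q. problem below
```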
PROBLEMS:
1. A machine produces components, which have a standard deviation of 1.6cm in length. A
random sample of 64 parts is selected from the output and this sample has a mean length of 90cm.
The customer will reject the part if it is either less than 88cm or more than 92cm. Does the 95%
confidence interval for the true mean length of all the components produced ensure acceptance by
the customer?
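Solution:
The formula for the confidence interval is
$$\bar{x} - z_{\alpha/2}\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + z_{\alpha/2}\frac{\sigma}{\sqrt{n}}$$
where σ = 1.6, $z_{\alpha/2}$ = 1.96, $\bar{x}$ = 90 and n = 64, so $\sigma/\sqrt{n}$ = 1.6/8 = 0.2 and the margin is 1.96 × 0.2 = 0.392.
∴ 89.61 ≤ μ ≤ 90.39.
We can be 95% confident that the true mean length of the components lies in this interval. Since the whole interval lies inside the acceptable range of 88 cm to 92 cm, the customer will accept the components.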
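A minimal Python sketch of the known-σ interval used in Problems 1 and 2, assuming the critical value z is supplied by the reader (1.96 for 95%, 2.58 for 99%, as in the notes); `z_interval` is an illustrative name, not a library function.

```python
import math

def z_interval(xbar, sigma, n, z=1.96):
    """Interval xbar +/- z * sigma / sqrt(n) for a mean when sigma is known."""
    margin = z * sigma / math.sqrt(n)
    return xbar - margin, xbar + margin

# Problem 1: sigma = 1.6 cm, n = 64, xbar = 90 cm, 95% confidence
low, high = z_interval(90, 1.6, 64)
print(f"({low:.2f}, {high:.2f})")        # (89.61, 90.39)
# The customer accepts when the whole interval lies inside 88 cm to 92 cm
print(88 <= low and high <= 92)          # True

# Problem 2: sigma = 2, n = 60 one-minute counts, xbar = 20
print(z_interval(20, 2, 60))             # 95% interval
print(z_interval(20, 2, 60, z=2.58))     # 99% interval
```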
2. A server channel monitored for an hour was found to have an estimated mean of 20 transactions
transmitted per minute. The variance is known to be 4. Find the standard error. Establish an
interval estimate that includes a population mean 95% of the time and 99% of the time.
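Solution:
The channel was monitored for one hour, so n = 60 one-minute observations; σ² = 4 gives σ = 2.
(i) Standard error = $\sigma/\sqrt{n} = 2/\sqrt{60} = 0.2582$
(ii) $Z_{\alpha/2}$ = 1.96. ∴ 95% confidence interval is $\bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n} = 20 \pm 0.5061 = (19.4939, 20.5061)$
(iii) $Z_{\alpha/2}$ = 2.58. ∴ 99% confidence interval is $\bar{x} \pm Z_{\alpha/2}\,\sigma/\sqrt{n} = 20 \pm 0.6661 = (19.33, 20.67)$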
3. A management consulting agency needs to estimate the average number of years of experience of
executives in a given branch of management. A random sample of 28 executives gives sample
mean as 6.7 years and standard deviation as 2.4 years. Give a 99% confidence interval for the
average number of years of experience for all executives in this branch.
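Solution:
Given n = 28, $\bar{x}$ = 6.7 and s = 2.4. Since n < 30, we use the t distribution with n − 1 = 27 degrees of freedom: $t_{\alpha/2} = t_{0.005,27} = 2.771$ and $s/\sqrt{n} = 2.4/\sqrt{28} = 0.4536$.
∴ 99% confidence interval is
$$\bar{x} \pm t_{\alpha/2}\frac{s}{\sqrt{n}} = 6.7 \pm 2.771 \times 0.4536 = 6.7 \pm 1.257 = (5.443, 7.957)$$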
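For small samples the same calculation uses a t critical value instead of z. The sketch below assumes the reader looks up the critical value in a t table (here 2.771, the two-tailed 1% point for 27 d.f.); `t_interval` is an illustrative helper, not a library function.

```python
import math

def t_interval(xbar, s, n, t_crit):
    """Interval xbar +/- t * s / sqrt(n); t_crit comes from a t table with n - 1 d.f."""
    margin = t_crit * s / math.sqrt(n)
    return xbar - margin, xbar + margin

# Problem 3: n = 28 (27 d.f.); two-tailed 1% point from a standard t table is 2.771
low, high = t_interval(6.7, 2.4, 28, t_crit=2.771)
print(f"({low:.3f}, {high:.3f})")   # (5.443, 7.957)
```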
4. In order to compare the intelligence quotients of students, two schools were selected. A random sample of 90 students was selected from each school. At school A, the mean I.Q. is 109 and the standard deviation is 11. At school B, the mean I.Q. is 98 and the standard deviation is 9. Construct a 95% confidence interval for the difference between the mean I.Q.s of the two schools.
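Solution:
Given $n_1 = n_2 = 90$, $\bar{x}_1 = 109$, $S_1 = 11$, $\bar{x}_2 = 98$, $S_2 = 9$.
$z_{\alpha/2}$ = 1.96. ∴ 95% confidence interval is
$$(\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{S_1^2}{n_1} + \frac{S_2^2}{n_2}} = 11 \pm 1.96\sqrt{\frac{121}{90} + \frac{81}{90}} = 11 \pm 2.94 = (8.06, 13.94)$$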
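A short Python sketch of the large-sample interval for a difference of means, assuming the sample standard deviations stand in for σ₁ and σ₂ as in Problem 4; the function name is illustrative.

```python
import math

def diff_means_interval(x1, s1, n1, x2, s2, n2, z=1.96):
    """Large-sample interval (x1 - x2) +/- z * sqrt(s1^2/n1 + s2^2/n2)."""
    se = math.sqrt(s1 ** 2 / n1 + s2 ** 2 / n2)
    diff = x1 - x2
    return diff - z * se, diff + z * se

# Problem 4: school A (109, 11, 90) vs school B (98, 9, 90), 95% confidence
low, high = diff_means_interval(109, 11, 90, 98, 9, 90)
print(f"({low:.2f}, {high:.2f})")   # (8.06, 13.94)
```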
5. A sample poll of 100 voters chosen at random from all voters in a given district indicated that
55% of them were in favour of a particular candidate. Find (i) 95% and (ii) 99% confidence limits
for the proportion of all the voters in favour of this candidate.
Solution:
We have n = 100, sample proportion p = 0.55 and q = 0.45, so $\sqrt{pq/n} = \sqrt{0.55 \times 0.45/100} = 0.0497$.
(a) $Z_{\alpha/2}$ = 1.96: 95% confidence interval for P is $p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}} = (0.4526, 0.6474)$
(b) $Z_{\alpha/2}$ = 2.58: 99% confidence interval for P is $p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}} = (0.4218, 0.6782)$
6. Two operators perform the same operation of applying a plastic coating to a part. A random
sample of 100 parts from the first operator shows that 6 are non-conforming. A random sample of
200 parts from the second operator shows that 8 are non-conforming. Find a 90% confidence
interval for the difference in the proportion of non-conforming parts produced by the two
operators.
Solution:
Given $n_1 = 100$, $n_2 = 200$, $p_1 = 0.06$, $q_1 = 0.94$, $p_2 = 0.04$, $q_2 = 0.96$, $Z_{\alpha/2} = 1.645$.
The 90% confidence interval for the difference in the proportion of non-conforming parts produced is
$$(p_1 - p_2) \pm Z_{\alpha/2}\sqrt{\frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}} = 0.02 \pm 1.645 \times 0.0275 = (-0.0252, 0.0652)$$
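The proportion intervals of Problems 5 to 7 follow one pattern. This is a minimal sketch assuming large samples (normal approximation) and a critical value supplied by the reader; the helper names are hypothetical.

```python
import math

def proportion_interval(p, n, z):
    """Large-sample interval p +/- z * sqrt(p q / n), with q = 1 - p."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

def diff_proportions_interval(p1, n1, p2, n2, z):
    """Large-sample interval (p1 - p2) +/- z * sqrt(p1 q1 / n1 + p2 q2 / n2)."""
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - z * se, diff + z * se

# Problem 7: 35 of 200 newspapers non-conforming, 90% confidence
print(proportion_interval(35 / 200, 200, 1.645))
# Problem 6: 6 of 100 vs 8 of 200 non-conforming parts, 90% confidence
low, high = diff_proportions_interval(6 / 100, 100, 8 / 200, 200, 1.645)
print(f"({low:.4f}, {high:.4f})")   # (-0.0252, 0.0652)
```

An interval for the difference that contains 0, as here, is consistent with no difference between the two operators.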
7. The operations manager for a large newspaper wants to determine the proportion of newspapers printed that have a non-conforming attribute, such as excessive rub-off, missing pages or duplicate pages. The operations manager determines that a random sample of 200 newspapers should be selected for analysis. Suppose that, of this sample of 200, 35 contain some type of non-conformance. If the operations manager wants to have 90% confidence in estimating the true population proportion, set up the confidence interval estimate.
Solution:
We have x = 35 and n = 200.
Sample proportion p = x/n = 0.175, q = 0.825 and $Z_{\alpha/2}$ = 1.645.
90% confidence interval for P is $p \pm Z_{\alpha/2}\sqrt{\frac{pq}{n}} = 0.175 \pm 1.645 \times 0.0269 = (0.1308, 0.2192)$
8. The following are the average weekly losses of worker-hours due to accidents in 10 industrial plants before and after a certain safety program was put into operation:
Before 45 73 46 124 33 57 83 34 26 17
After 36 60 44 119 35 51 77 29 24 11
Find a 90% confidence interval for the mean improvement in lost worker hours.
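Solution:
The same plants are observed before and after the program, so the observations are paired. Let d = Before − After be the improvement in each plant:
d 9 13 2 5 −2 6 6 5 2 6
Null Hypothesis H0: There is no improvement after the safety program (mean improvement = 0).
Alternative Hypothesis H1: There is an improvement after the safety program (mean improvement > 0).
From the given data
Before After
Mean 53.8 48.6
Variance 1027.7333 962.9333
Stand. Dev. 32.0583 31.0312
n 10 10
Mean improvement $\bar{d}$ = 53.8 − 48.6 = 5.2, $S_d^2 = \frac{\sum (d - \bar{d})^2}{n-1} = \frac{149.6}{9} = 16.6222$, $S_d$ = 4.0770 and $S_d/\sqrt{n}$ = 1.2893.
Degrees of freedom = n − 1 = 9; the tabulated value for 90% confidence is $t_{0.05,9}$ = 1.833.
90% confidence interval for the mean improvement:
$$\bar{d} \pm t_{0.05}\frac{S_d}{\sqrt{n}} = 5.2 \pm 1.833 \times 1.2893 = 5.2 \pm 2.363 = (2.84, 7.56)$$
The paired test statistic is $t = \bar{d}/(S_d/\sqrt{n}) = 5.2/1.2893 = 4.0333$. Since 4.0333 > 1.833 (one-tailed test at 5% level, 9 d.f.), we reject H0. The safety program has produced a significant improvement, which agrees with the confidence interval lying entirely above zero.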
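The paired calculation for the worker-hours data can be sketched with Python's standard `statistics` module. This is a minimal sketch, assuming the critical value 1.833 (t table, 9 d.f., 0.05 in each tail) is supplied by hand rather than computed.

```python
import math
import statistics

before = [45, 73, 46, 124, 33, 57, 83, 34, 26, 17]
after = [36, 60, 44, 119, 35, 51, 77, 29, 24, 11]

d = [b - a for b, a in zip(before, after)]   # improvement in each plant
n = len(d)
d_bar = statistics.mean(d)                   # mean improvement
s_d = statistics.stdev(d)                    # s.d. of differences, divisor n - 1
se = s_d / math.sqrt(n)
t_stat = d_bar / se                          # paired t statistic, n - 1 d.f.

t_crit = 1.833                               # t table: 0.05 in each tail, 9 d.f.
low, high = d_bar - t_crit * se, d_bar + t_crit * se
print(round(t_stat, 4), f"({low:.2f}, {high:.2f})")   # 4.0333 (2.84, 7.56)
```

Working on the differences d, rather than treating the two columns as independent samples, is what makes the plant-to-plant variation drop out.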
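9. In a test given to two groups of students, the marks obtained were as follows:
I group 18 20 36 50 49 36 34 49 41
II group 29 28 26 35 30 44 46
Construct a 95% confidence interval for the difference between the mean marks secured by students of the above two groups.
Solution:
Given $n_1 = 9$, $n_2 = 7$, $\bar{x}_1 = 37$, $\bar{x}_2 = 34$.
$$S^2 = \frac{1}{n_1 + n_2 - 2}\left[\sum (x_i - \bar{x}_1)^2 + \sum (x_j - \bar{x}_2)^2\right] = \frac{1134 + 386}{14} = 108.57, \quad S = 10.42$$
For $n_1 + n_2 - 2 = 14$ degrees of freedom, $t_{\alpha/2} = t_{0.025,14} = 2.145$. ∴ 95% confidence interval is
$$(\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2}\, S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}} = 3 \pm 2.145 \times 10.42 \times 0.504 = 3 \pm 11.26 = (-8.26, 14.26)$$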
10. The diameter of a component produced on a semi-automatic machine is known to be normally distributed with a mean of 10 mm and a standard deviation of 0.1 mm. If we pick a random sample of size 5, what is the probability that the sample mean will be between 9.95 mm and 10.05 mm?
Solution:
Here $\sigma/\sqrt{n} = 0.1/\sqrt{5} = 0.0447$, so z = ±0.05/0.0447 = ±1.12.
$P(9.95 < \bar{x} < 10.05) = P(-1.12 < z < 1.12) = 2P(0 < z < 1.12) = 0.7372$
11. For a particular brand of T.V. picture tube, it is known that the mean operating life of the tubes is 1000 hours with a standard deviation of 250 hours. What is the probability that the mean for a random sample of size 25 will be between 950 and 1050 hours?
Solution:
Here $\sigma/\sqrt{n} = 250/5 = 50$, so z = ±50/50 = ±1.
$P(950 < \bar{x} < 1050) = P(-1 < z < 1) = 2P(0 < z < 1) = 0.6826$
12. The strength of wires produced by company A has a mean of 4500 kg and a standard deviation of 200 kg; that of company B has a mean of 4000 kg and a S.D. of 300 kg. If 50 wires of company A and 100 wires of company B are selected at random and tested for strength, what is the probability that the sample mean strength of A will be at least 600 kg more than that of B?
Solution:
The difference $\bar{x}_A - \bar{x}_B$ has mean 4500 − 4000 = 500 and standard error $\sqrt{200^2/50 + 300^2/100} = \sqrt{1700} = 41.23$, so z = (600 − 500)/41.23 = 2.425.
$P(\bar{x}_A - \bar{x}_B \ge 600) = P(z \ge 2.425) = 0.5 - 0.4925 = 0.0075$
13. Car stereos of manufacturer A have a mean lifetime of 1400 hours with a S.D. of 200 hours, while those of manufacturer B have a mean lifetime of 1200 hours with a S.D. of 100 hours. If a random sample of 120 stereos of each manufacturer is tested, what is the probability that the mean lifetime of A's stereos will be (i) at least 160 hours more than that of B's stereos and (ii) at least 250 hours more than that of B's stereos?
Solution:
The difference $\bar{x}_A - \bar{x}_B$ has mean 200 and standard error $\sqrt{200^2/120 + 100^2/120} = 20.41$.
(i) $P(\bar{x}_A - \bar{x}_B \ge 160) = P(z \ge -1.96) = 0.5 + 0.4750 = 0.9750$
(ii) $P(\bar{x}_A - \bar{x}_B \ge 250) = P(z \ge 2.45) = 0.5 - 0.4929 = 0.0071$
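The sampling-distribution probabilities in Problems 12 and 13 can be checked with `statistics.NormalDist` (Python 3.8+) in place of the printed normal table. This sketch assumes the difference of sample means is normal with variance σ₁²/n₁ + σ₂²/n₂; `prob_diff_at_least` is an illustrative name. The results agree with the table answers to within table rounding.

```python
import math
from statistics import NormalDist

def prob_diff_at_least(mu1, sd1, n1, mu2, sd2, n2, c):
    """P(xbar1 - xbar2 >= c), using xbar1 - xbar2 ~ N(mu1 - mu2, sd1^2/n1 + sd2^2/n2)."""
    se = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    z = (c - (mu1 - mu2)) / se
    return 1 - NormalDist().cdf(z)

p12 = prob_diff_at_least(4500, 200, 50, 4000, 300, 100, 600)    # Problem 12
p13a = prob_diff_at_least(1400, 200, 120, 1200, 100, 120, 160)  # Problem 13 (i)
p13b = prob_diff_at_least(1400, 200, 120, 1200, 100, 120, 250)  # Problem 13 (ii)
print(round(p12, 4), round(p13a, 4), round(p13b, 4))
```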
14. Suppose 10% of the machines produced by company A are defective and 5% of the machines produced by company B are defective. A random sample of 250 machines is taken from company A and a random sample of 300 machines from company B. What is the probability that the difference in sample proportions is less than or equal to 0.02?
Solution:
The difference $p_1 - p_2$ has mean 0.10 − 0.05 = 0.05 and standard error $\sqrt{\frac{0.10 \times 0.90}{250} + \frac{0.05 \times 0.95}{300}} = 0.0228$, so z = (0.02 − 0.05)/0.0228 = −1.32.
$P(p_1 - p_2 \le 0.02) = P(z \le -1.32) = 0.5 - P(0 < z < 1.32) = 0.5 - 0.4066 = 0.0934$
15. A random sample of 500 toys was taken from a consignment and 65 were found to be defective. Find the 95% confidence limits for the percentage of defective toys in the consignment.
Solution:
n = 500, X = 65, p = X/n = 0.13, q = 0.87.
The limits for the population proportion P are given by $p \pm 1.96\sqrt{\frac{pq}{n}} = 0.13 \pm 1.96 \times 0.0150 = (0.101, 0.159)$.
The percentage of defective toys in the consignment lies between 10.1% and 15.9%.
16. What are the different types of sampling methods? Also write short notes on the different types of sampling.
Solution:
Simple random sampling, Stratified sampling, Cluster sampling, Judgment sampling and Quota
sampling.
Simple random
Description: Random sample from whole population.
Advantages: Highly representative if all subjects participate; the ideal.
Disadvantages: Not possible without complete list of population members; potentially uneconomical to achieve; can be disruptive to isolate members from a group; time-scale may be too long, data/sample could change.

Stratified random
Description: Random sample from identifiable groups (strata), subgroups, etc.
Advantages: Can ensure that specific groups are represented, even proportionally, in the sample(s) (e.g., by gender), by selecting individuals from strata list.
Disadvantages: More complex, requires greater effort than simple random; strata must be carefully defined.

Cluster
Description: Random samples of successive clusters of subjects (e.g., by institution) until small groups are chosen as units.
Advantages: Possible to select randomly when no single list of population members exists, but local lists do; data collected on groups may avoid introduction of confounding by isolating members.
Disadvantages: Clusters in a level must be equivalent and some natural ones are not for essential characteristics (e.g., geographic: numbers equal, but unemployment rates differ).

Stage
Description: Combination of cluster (randomly selecting clusters) and random or stratified random sampling of individuals.
Advantages: Can make up probability sample by random at stages and within groups; possible to select random sample when population lists are very localized.
Disadvantages: Complex, combines limitations of cluster and stratified random sampling.

Purposive
Description: Hand-pick subjects on the basis of specific characteristics.
Advantages: Ensures balance of group sizes when multiple groups are to be selected.
Disadvantages: Samples are not easily defensible as being representative of populations due to potential subjectivity of researcher.

Quota
Description: Select individuals as they come to fill a quota by characteristics proportional to populations.
Advantages: Ensures selection of adequate numbers of subjects with appropriate characteristics.
Disadvantages: Not possible to prove that the sample is representative of designated population.

Snowball
Description: Subjects with desired traits or characteristics give names of further appropriate subjects.
Advantages: Possible to include members of groups where no lists or identifiable clusters even exist (e.g., drug abusers, criminals).
Disadvantages: No way of knowing whether the sample is representative of the population.

Volunteer, accidental, convenience
Description: Either asking for volunteers, or the consequence of not all those selected finally participating, or a set of subjects who just happen to be available.
Advantages: Inexpensive way of ensuring sufficient numbers of a study.
Disadvantages: Can be highly unrepresentative.
17. Given a sample mean of 83, a sample standard deviation of 12.5 and a sample size of 22, test the hypothesis that the value of the population mean is 70 against the alternative that it is more than 70. Use the 0.025 level of significance.
Solution:
Here the sample size n = 22 < 30, so the sample is a small sample. Given $\bar{x}$ = 83, μ = 70, s = 12.5.
Null Hypothesis H0: μ = 70 (there is no significant difference between the sample mean and the population mean).
Alternative Hypothesis H1: μ > 70 (right-tailed test).
Degrees of freedom: n − 1 = 21.
The test statistic, with s taken as the sample standard deviation computed with divisor n, is
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n-1}} = \frac{83 - 70}{12.5/\sqrt{21}} = 4.7659$$
The tabulated value of t at the 0.025 level with 21 degrees of freedom for a one-tailed test is 2.080. Since the calculated value exceeds the tabulated value, we reject H0.
Therefore μ > 70.
18. At the 0.10 level of significance, can we conclude that the following 400 observations follow a Poisson distribution with λ = 3?
No. of arrivals per hour 0 1 2 3 4 5 or more
No. of hours 20 57 98 85 78 62
Solution:
Given λ = 3, N = 400. The expected frequency is $E_x = N \cdot P(X = x) = N\frac{e^{-\lambda}\lambda^x}{x!}$ for x = 0, 1, 2, 3, 4, and for the last class $E = N \cdot P(X \ge 5) = N - \sum_{x=0}^{4} E_x$.
Null Hypothesis H0: The given data fit the Poisson distribution with λ = 3.
Alternative Hypothesis H1: The given data do not fit the Poisson distribution with λ = 3.
X O E (O−E)² (O−E)²/E
0 20 19.915 0.01 0.0004
1 57 59.744 7.53 0.1261
2 98 89.617 70.28 0.7842
3 85 89.617 21.31 0.2378
4 78 67.213 116.37 1.7314
5 or more 62 73.895 141.48 1.9147
Total 400 400 4.79
Degrees of freedom = 6 − 1 = 5 (λ is specified, so no parameter is estimated from the data).
Tabulated value of χ² at the 0.10 level for 5 d.f. = 9.236
Since the calculated value 4.79 is less than the tabulated value, we accept H0.
Therefore the given data fit the Poisson distribution with λ = 3.
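The goodness-of-fit calculation can be reproduced in a few lines of Python. This sketch assumes λ = 3 is given (so d.f. = classes − 1) and takes the χ² critical value 9.236 from a table; note that the last class is "5 or more", so its expected frequency uses the tail probability P(X ≥ 5), not P(X = 5).

```python
import math

observed = [20, 57, 98, 85, 78, 62]   # hours with 0, 1, 2, 3, 4, "5 or more" arrivals
lam, N = 3, 400

# Poisson probabilities for x = 0..4; the last class gets the tail P(X >= 5)
probs = [math.exp(-lam) * lam ** x / math.factorial(x) for x in range(5)]
probs.append(1 - sum(probs))
expected = [N * p for p in probs]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
df = len(observed) - 1        # lambda is given, so no parameter is estimated
critical = 9.236              # chi-square table value, 0.10 level, 5 d.f.

for x, (o, e) in enumerate(zip(observed, expected)):   # x = 5 means "5 or more"
    print(x, o, round(e, 3))
print(round(chi2, 4), "reject H0" if chi2 > critical else "accept H0")
```

Using the tail probability makes the expected frequencies add up to N = 400, which the χ² test requires.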
19. If $X_1, X_2, \ldots, X_n$ are Poisson variates with parameter λ = 2, use the central limit theorem to estimate $P(120 < S_n < 160)$, where $S_n = X_1 + X_2 + \cdots + X_n$ and n = 75.
Solution:
Given λ = 2, so μ = 2 and σ² = 2 (for the Poisson distribution, mean = variance = λ).
Now nμ = 75 × 2 = 150 and nσ² = 75 × 2 = 150, so $\sigma\sqrt{n} = \sqrt{150}$.
By the central limit theorem, $S_n \sim N(n\mu, \sigma\sqrt{n}) = N(150, \sqrt{150})$.
To find $P(120 < S_n < 160)$:
Let $z = \frac{S_n - n\mu}{\sigma\sqrt{n}} = \frac{S_n - 150}{\sqrt{150}}$ (z is the standard normal variate).
If $S_n$ = 120, then $z = \frac{120 - 150}{\sqrt{150}} = -2.45$, and if $S_n$ = 160, then $z = \frac{160 - 150}{\sqrt{150}} = 0.82$.
∴ $P(120 < S_n < 160) = P(-2.45 < z < 0.82)$
$= P(-2.45 < z < 0) + P(0 < z < 0.82)$
$= P(0 < z < 2.45) + P(0 < z < 0.82)$
$= 0.4929 + 0.2939$
$= 0.7868$
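The central limit theorem approximation above can be evaluated directly with `statistics.NormalDist`. This sketch applies no continuity correction (as in the worked solution) and does not round z to two decimals, so it gives about 0.786, slightly below the table-based answer.

```python
import math
from statistics import NormalDist

lam, n = 2, 75
mean_sn = n * lam               # Poisson: mean = variance = lambda, so E(Sn) = 150
sd_sn = math.sqrt(n * lam)      # Var(Sn) = 150, so the s.d. is sqrt(150)

approx = NormalDist(mu=mean_sn, sigma=sd_sn)
p = approx.cdf(160) - approx.cdf(120)
print(round(p, 4))
```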
Unit-III Testing of Hypothesis
Population:
A population in statistics means a set of objects. The population is finite or infinite according as the number of elements in the set is finite or infinite.
Sampling:
A sample is a finite subset of the population. The number of elements in the sample is
called size of the sample.
Large and small sample:
If the number of elements in a sample is greater than or equal to 30, the sample is called a large sample; if it is less than 30, the sample is called a small sample.
Parameters:
Statistical constants like the mean μ, the variance σ², etc., computed from a population are called parameters of the population.
Statistics:
Statistical constants like the mean $\bar{x}$, the variance S², etc., computed from a sample are called sample statistics or simply statistics.
POPULATION (PARAMETER) | SAMPLE (STATISTIC)
Population size = N | Sample size = n
Population mean = μ | Sample mean = $\bar{x}$
Population s.d. = σ | Sample s.d. = S
Population proportion = P | Sample proportion = p
Tests of significance or Hypothesis Testing:
Statistical Hypothesis:
In making statistical decisions, we make assumptions about the population, which may be true or false. Such assumptions are called statistical hypotheses.
Null Hypothesis (H0):
For applying a test of significance, we first set up a hypothesis, which is a statement about the population parameter. This statement is usually a hypothesis of no true difference between the sample statistic and the population parameter under consideration, and so it is called the null hypothesis and is denoted by H0.
Alternative Hypothesis (H1):
If the null hypothesis is false, then something else must be true. This is called the alternative hypothesis and is denoted by H1.
E.g. If H0 is: population mean μ = 300, then H1 may be μ ≠ 300 (i.e. μ > 300 or μ < 300), or H1: μ > 300, or H1: μ < 300. Any one of these may be taken as the alternative hypothesis.
Errors in sampling:
After applying a test of significance, a decision is taken to accept or reject the null hypothesis H0.
Type I error: The rejection of the null hypothesis H0 when it is true is called a type I error.
Type II error: The acceptance of the null hypothesis H0 when it is false is called a type II error.
The probability of type I error is called level of significance of the test and it is denoted by α.
We usually take either α=5% or α=1%.
One tailed and Two tailed test:
If $\theta_0$ is a population parameter and θ is the corresponding sample statistic, and if we set up the null hypothesis H0: $\theta = \theta_0$, then the alternative hypothesis, which is complementary to H0, can be any one of the following:
(i) H1: $\theta \ne \theta_0$ ($\theta > \theta_0$ or $\theta < \theta_0$) (ii) H1: $\theta < \theta_0$ (iii) H1: $\theta > \theta_0$
H1 given in (i) is called a two-tailed alternative hypothesis, whereas H1 given in (ii) leads to a left-tailed test and H1 given in (iii) to a right-tailed test.
Critical region:
For a test statistic, the area under the probability curve (here the normal curve) is divided into two regions, namely the region of acceptance of H0 and the region of rejection of H0. The region in which H0 is rejected is called the critical region. The region in which H0 is accepted is called the acceptance region.
Procedure of Testing of Hypothesis:
(i) State the null hypothesis H0.
(ii) Decide the alternative hypothesis H1 (i.e. one-tailed or two-tailed).
(iii) Choose the level of significance α (α = 5% or α = 1%).
(iv) Determine a suitable test statistic:
$$z = \frac{t - E(t)}{\text{S.E. of } t}$$
where t is the sample statistic, E(t) its expected value under H0 and S.E. its standard error.
(v) Compare the computed value of z with the table value of z and decide on the acceptance or rejection of H0.
If |z| < 1.96, H0 may be accepted at 5% level of significance. If |z| > 1.96, H0 may be rejected at 5% level of significance.
If |z| < 2.58, H0 may be accepted at 1% level of significance. If |z| > 2.58, H0 may be rejected at 1% level of significance.
For a single-tail test (right tail or left tail), we compare the computed value of z with 1.645 (at 5% level) and 2.33 (at 1% level) and accept or reject H0 accordingly.
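Step (v) can be sketched in Python, using `statistics.NormalDist().inv_cdf` to produce the critical values that the notes read from the normal table. This is a minimal sketch; `z_critical` and `z_test` are illustrative names, and the tail argument ('two', 'right', 'left') is an assumption of this sketch.

```python
from statistics import NormalDist

def z_critical(alpha, two_tailed=True):
    """Critical value of z at level alpha (alpha split over two tails if two_tailed)."""
    tail_area = alpha / 2 if two_tailed else alpha
    return NormalDist().inv_cdf(1 - tail_area)

def z_test(z, alpha=0.05, tail="two"):
    """Decision rule of step (v); tail is 'two', 'right' or 'left'."""
    if tail == "two":
        return "reject H0" if abs(z) > z_critical(alpha) else "accept H0"
    crit = z_critical(alpha, two_tailed=False)
    if tail == "right":
        return "reject H0" if z > crit else "accept H0"
    return "reject H0" if z < -crit else "accept H0"   # left-tailed

for alpha in (0.05, 0.01):
    print(alpha, round(z_critical(alpha), 3), round(z_critical(alpha, False), 3))
# critical values: about 1.960 and 1.645 at 5%, 2.576 and 2.326 at 1%
print(z_test(2.1), z_test(2.1, alpha=0.01))   # reject H0 accept H0
```

The exact 1% values 2.576 and 2.326 are what the tables round to 2.58 and 2.33.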
Test of significance of small samples:
When the size of the sample (n) is less than 30, the sample is called a small sample.
The following are some important tests for small samples:
(I) Student's t-test
(II) F-test
(III) χ²-test
I. Student's t-test
(i) Test of significance of the difference between sample mean and population mean
(ii) Test of significance of the difference between means of two small samples
(i) Test of significance of the difference between sample mean and population mean:
Student's t is defined by the statistic
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
where $\bar{x}$ = sample mean, μ = population mean, S = standard deviation of the sample, n = sample size.
Note:
If the standard deviation of the sample is not given directly, then the statistic is $t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$, where
$$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$
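Confidence Interval:
The confidence interval for the population mean for a small sample is $\bar{x} \pm t_\alpha \frac{S}{\sqrt{n}}$, i.e.
$$\left(\bar{x} - t_\alpha\frac{S}{\sqrt{n}},\; \bar{x} + t_\alpha\frac{S}{\sqrt{n}}\right)$$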
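Working Rule:
(i) Let H0: μ = $\bar{x}$ (there is no significant difference between the sample mean and the population mean)
H1: μ ≠ $\bar{x}$ (there is a significant difference between the sample mean and the population mean) (two-tailed test)
Find
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n-1}}$$
where S is the sample standard deviation computed with divisor n (this equals $\frac{\bar{x} - \mu}{s/\sqrt{n}}$ when s is computed with divisor n − 1).
Let $t_\alpha$ be the table value of t with v = n − 1 degrees of freedom at α% level of significance.
Conclusion:
If $|t| < t_\alpha$, H0 is accepted at α% level of significance.
If $|t| > t_\alpha$, H0 is rejected at α% level of significance.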
Problem:
1. The mean lifetime of a sample of 25 bulbs is found as 1550h, with standard deviation of
120h. The company manufacturing the bulbs claims that the average life of their bulbs is
1600h. Is the claim acceptable at 5% level of significance?
Solution:
Given sample size n = 25, mean $\bar{x}$ = 1550, S.D. (S) = 120, population mean μ = 1600.
Let H0: μ = 1600 (the claim is acceptable)
H1: μ ≠ 1600 (two-tailed test)
Under H0, the test statistic is
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{1550 - 1600}{120/\sqrt{25}} = -2.0833$$
∴ |t| = 2.0833
From the table, for v = 24, $t_{0.05}$ = 2.064. Since $|t| > t_{0.05}$,
∴ H0 is rejected.
Conclusion: The claim is not acceptable.
2. Tests made on the breaking strength of 10 pieces of a metal wire gave the following results:
578, 572, 570, 568, 572, 570, 570, 572, 596 and 584 kg. Test if the mean breaking strength of the wire can be assumed as 577 kg.
Solution:
Let us first compute the sample mean $\bar{x}$ and the sample standard deviation S, and then test if $\bar{x}$ differs significantly from the population mean μ = 577.
x x − x̄ (x − x̄)²
578 2.8 7.84
572 −3.2 10.24
570 −5.2 27.04
568 −7.2 51.84
572 −3.2 10.24
570 −5.2 27.04
570 −5.2 27.04
572 −3.2 10.24
596 20.8 432.64
584 8.8 77.44
Total 5752 0 681.6
where
$$\bar{x} = \frac{\sum x_i}{n} = \frac{5752}{10} = 575.2, \qquad S^2 = \frac{\sum (x_i - \bar{x})^2}{n-1} = \frac{681.6}{9} = 75.733, \quad S = 8.7025$$
Let H0: μ = 577,
H1: μ ≠ 577 (two-tailed test)
Under H0, the test statistic is
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}} = \frac{575.2 - 577}{8.7025/\sqrt{10}} = -0.654$$
∴ |t| = 0.654
Tabulated value of t for v = 9 degrees of freedom: $t_{0.05}$ = 2.262.
Since $|t| < t_{0.05}$, ∴ H0 is accepted.
Conclusion:
∴ The mean breaking strength of the wire can be assumed as 577 kg at 5% level of significance.
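A minimal Python sketch of the one-sample t-test on the raw breaking-strength data, using `statistics.mean` and `statistics.stdev` (divisor n − 1) from the standard library. The table value 2.262 (two-tailed 5%, 9 d.f.) is entered by hand, as it is read from a t table in the notes.

```python
import math
import statistics

weights = [578, 572, 570, 568, 572, 570, 570, 572, 596, 584]
mu0 = 577                               # hypothesised mean breaking strength (kg)

n = len(weights)
x_bar = statistics.mean(weights)        # 575.2
s = statistics.stdev(weights)           # divisor n - 1, so s^2 = 681.6 / 9
t_stat = (x_bar - mu0) / (s / math.sqrt(n))
t_table = 2.262                         # two-tailed 5% point for 9 d.f. (t table)

print(round(t_stat, 3), "reject H0" if abs(t_stat) > t_table else "accept H0")
```

Computing $\bar{x}$ from the data, rather than copying it by hand, guards against transcription slips in the mean.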
3. A random sample of 10 boys had the following I.Q.s: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Do these data support the assumption of a population mean I.Q. of 100? Find a reasonable range in which most of the mean I.Q. values of samples of 10 boys lie.
Solution:
Given μ = 100, n = 10.
Null Hypothesis:
H0: μ = 100, i.e., the data are consistent with the assumption of a mean I.Q. of 100 in the population.
Alternative Hypothesis:
H1: μ ≠ 100, i.e., the data are not consistent with the assumption of a mean I.Q. of 100 in the population.
Level of Significance: α = 5% = 0.05
Test Statistic:
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}, \qquad \text{where } S^2 = \frac{1}{n-1}\sum (x - \bar{x})^2$$
$$\bar{x} = \frac{\sum x}{n} = \frac{70 + 120 + 110 + 101 + 88 + 83 + 95 + 98 + 107 + 100}{10} = \frac{972}{10} = 97.2$$
$$S^2 = \frac{1}{10 - 1}\left[(70 - 97.2)^2 + (120 - 97.2)^2 + (110 - 97.2)^2 + (101 - 97.2)^2 + (88 - 97.2)^2 + (83 - 97.2)^2 + (95 - 97.2)^2 + (98 - 97.2)^2 + (107 - 97.2)^2 + (100 - 97.2)^2\right]$$
$$S^2 = \frac{1}{9}(1833.6) = 203.73, \quad S = 14.2734$$
$$t = \frac{97.2 - 100}{14.2734/\sqrt{10}} = \frac{-2.8}{4.5136} = -0.6203$$
Table value: $t_{\alpha, n-1} = t_{5\%, 10-1} = t_{0.05, 9} = 2.262$ (two-tailed test)
Conclusion:
Here $|t| < t_\alpha$, i.e., the table value > calculated value.
∴ We accept the null hypothesis and conclude that the data are consistent with the assumption of a mean I.Q. of 100 in the population.
To find the confidence limits:
$$\bar{x} \pm t_\alpha\frac{S}{\sqrt{n}} = 97.2 \pm 2.262\left(\frac{14.2734}{\sqrt{10}}\right) = 97.2 \pm (2.262)(4.514) = (86.99, 107.41)$$
A reasonable range in which most of the mean I.Q. values of samples of 10 boys lie is (86.99, 107.41).
4. A random sample of 16 values from a normal population showed a mean of 41.5 inches and the sum of squares of deviations from this mean equal to 135 square inches. Show that the assumption of a mean of 43.5 inches for the population is not reasonable. Obtain 95 percent and 99 percent confidence limits for the same.
Solution:
Given $\bar{x}$ = 41.5, μ = 43.5, n = 16.
Sum of squares of deviations from the mean = $\sum (x - \bar{x})^2 = 135$.
The parameter of interest is μ.
Null Hypothesis H0: μ = 43.5, i.e., the assumption of a mean of 43.5 inches for the population is reasonable.
Alternative Hypothesis H1: μ ≠ 43.5, i.e., the assumption of a mean of 43.5 inches for the population is not reasonable.
Level of significance: (i) α = 5% = 0.05, degrees of freedom = 16 − 1 = 15
(ii) α = 1% = 0.01, degrees of freedom = 16 − 1 = 15
Test Statistic: $t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$, where
$$S^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 = \frac{135}{16 - 1} = 9, \quad S = 3$$
$$t = \frac{41.5 - 43.5}{3/\sqrt{16}} = -\frac{8}{3} = -2.667, \quad |t| = 2.667$$
Conclusion:
(i) Since |t| = 2.667 > 2.131 ($t_{0.05}$ for 15 d.f.), we reject H0 at 5% level of significance.
So we conclude that the assumption of a mean of 43.5 inches for the population is not reasonable.
(ii) Since |t| = 2.667 < 2.947 ($t_{0.01}$ for 15 d.f.), we accept H0 at 1% level of significance.
So at the 1% level the assumption is reasonable.
95% confidence limits:
$$\bar{x} \pm t_{0.05}\frac{S}{\sqrt{n}} = 41.5 \pm 2.131\left(\frac{3}{4}\right) = 41.5 \pm 1.598 = (39.90, 43.10)$$
i.e. 39.90 < μ < 43.10
99% confidence limits:
$$\bar{x} \pm t_{0.01}\frac{S}{\sqrt{n}} = 41.5 \pm 2.947\left(\frac{3}{4}\right) = 41.5 \pm 2.210 = (39.29, 43.71)$$
i.e. 39.29 < μ < 43.71
5. Ten oil tins are taken at random from an automatic filling machine. The mean weight of the tins is 15.8 kg and the standard deviation is 0.5 kg. Does the sample mean differ significantly from the intended weight of 16 kg?
Solution:
Given $\bar{x}$ = 15.8, μ = 16, s = 0.50, n = 10.
Null Hypothesis H0: μ = 16, i.e., the sample mean weight is not different from the intended weight.
Alternative Hypothesis H1: μ ≠ 16, i.e., the sample mean weight is different from the intended weight.
Level of significance: α = 5% = 0.05, degrees of freedom = 10 − 1 = 9
Test statistic:
$$t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{15.8 - 16}{0.50/\sqrt{10}} = \frac{-0.2}{0.1581} = -1.26, \quad |t| = 1.26$$
Critical value: The critical value of t at 5% level of significance with 9 degrees of freedom is 2.262.
Conclusion:
Here calculated value < table value, so we accept H0 at 5% level of significance.
Hence the sample mean weight is not different from the intended weight.
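(ii) Test of significance of the difference between means of two small samples:
To test the significance of the difference between the means $\bar{x}_1$ and $\bar{x}_2$ of samples of sizes $n_1$ and $n_2$:
Under H0, the test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
where
$$S^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} \quad \text{or} \quad S^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} \ \text{(if } s_1, s_2 \text{ are not given directly)}$$
Here $s_1, s_2$ are the sample standard deviations computed with divisors $n_1, n_2$, so $n_1 s_1^2 = \sum (x_1 - \bar{x}_1)^2$ and the two forms agree.
Degrees of freedom (d.f.) v = $n_1 + n_2 - 2$.
Note:
If $n_1 = n_2 = n$ and the pairs of values $x_1$ and $x_2$ are associated in some way (or correlated), then we use the statistic
$$t = \frac{\bar{d}}{S/\sqrt{n}}, \quad \text{where } d = x_1 - x_2,\ \bar{d} = \frac{\sum d}{n} \text{ and } S^2 = \frac{\sum (d - \bar{d})^2}{n-1}$$
Degrees of freedom v = n − 1.
Confidence Interval:
The confidence interval for the difference between two population means for small samples is
$$(\bar{x}_1 - \bar{x}_2) \pm t_\alpha S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$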
Problem:
1. Samples of two types of electric bulbs were tested for length of life and the following data were obtained.
Sample Size Mean S.D.
I 8 1234 h 36 h
II 7 1036 h 40 h
Is the difference in the means sufficient to warrant that type I bulbs are superior to type II bulbs?
Solution:
Here $\bar{x}_1$ = 1234, $\bar{x}_2$ = 1036, $n_1$ = 8, $n_2$ = 7, $s_1$ = 36, $s_2$ = 40.
Let H0: $\mu_1 = \mu_2$,
H1: $\mu_1 > \mu_2$ (i.e. type I bulbs are superior to type II bulbs) (one-tailed test)
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}, \quad \text{where } S = \sqrt{\frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\frac{8(36)^2 + 7(40)^2}{13}} = 40.73$$
$$t = \frac{1234 - 1036}{40.73\sqrt{\frac{1}{8} + \frac{1}{7}}} = 9.39$$
Degrees of freedom v = $n_1 + n_2 - 2$ = 13.
Tabulated value of t for 13 d.f. at 5% level of significance (one-tailed) is $t_{0.05}$ = 1.77.
Since $t > t_{0.05}$, ∴ H0 is rejected and H1 is accepted.
Conclusion:
Type I bulbs may be regarded as superior to type II bulbs at 5% level of significance.
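A minimal Python sketch of the pooled two-sample t statistic from summary figures, assuming (as in these notes) that $s_1, s_2$ were computed with divisor n, so the pooled variance is $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2 - 2)$; `pooled_t_from_summary` is an illustrative name.

```python
import math

def pooled_t_from_summary(x1, s1, n1, x2, s2, n2):
    """Pooled two-sample t with s1, s2 computed using divisor n, as in these notes:
    S^2 = (n1*s1^2 + n2*s2^2) / (n1 + n2 - 2)."""
    S = math.sqrt((n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2))
    t = (x1 - x2) / (S * math.sqrt(1 / n1 + 1 / n2))
    return t, S, n1 + n2 - 2

t_stat, S, df = pooled_t_from_summary(1234, 36, 8, 1036, 40, 7)   # bulb data
print(round(S, 2), round(t_stat, 2), df)   # 40.73 9.39 13
```

If the standard deviations had been computed with divisor n − 1 instead, the weights $n_1, n_2$ in S² would become $n_1 - 1, n_2 - 1$.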
2. Two independent samples of sizes 8 and 7 contained the following values:
Sample I 19 17 15 21 16 18 16 14
Sample II 15 14 15 19 15 18 16
Is the difference between the sample means significant?
Solution:
x1 x1 − x̄1 (x1 − x̄1)² x2 x2 − x̄2 (x2 − x̄2)²
19 2 4 15 −1 1
17 0 0 14 −2 4
15 −2 4 15 −1 1
21 4 16 19 3 9
16 −1 1 15 −1 1
18 1 1 18 2 4
16 −1 1 16 0 0
14 −3 9
Total 136 0 36 112 0 20
$$\bar{x}_1 = \frac{\sum x_1}{n_1} = \frac{136}{8} = 17, \qquad \bar{x}_2 = \frac{\sum x_2}{n_2} = \frac{112}{7} = 16$$
$$S^2 = \frac{\sum (x_1 - \bar{x}_1)^2 + \sum (x_2 - \bar{x}_2)^2}{n_1 + n_2 - 2} = \frac{36 + 20}{8 + 7 - 2} = 4.3077, \quad S = 2.0755$$
Let H0: $\mu_1 = \mu_2$,
H1: $\mu_1 \ne \mu_2$ (two-tailed test)
Under H0, the test statistic is
$$t = \frac{\bar{x}_1 - \bar{x}_2}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{17 - 16}{2.0755\sqrt{\frac{1}{8} + \frac{1}{7}}} = 0.931$$
Degrees of freedom v = $n_1 + n_2 - 2$ = 13.
From the t table, for v = 13 degrees of freedom at 5% level of significance, $t_{0.05}$ = 2.16.
Since $|t| < t_{0.05}$, ∴ H0 is accepted.
Conclusion:
The two sample means do not differ significantly at 5% level of significance.
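The same pooled test can be run directly on raw data. This is a minimal sketch using sums of squared deviations, as in the second form of S² above; `pooled_t_from_data` is an illustrative helper, not a library function.

```python
import math
import statistics

sample1 = [19, 17, 15, 21, 16, 18, 16, 14]
sample2 = [15, 14, 15, 19, 15, 18, 16]

def pooled_t_from_data(a, b):
    """Pooled two-sample t: S^2 = (sum of squared deviations of both samples) / (n1 + n2 - 2)."""
    n1, n2 = len(a), len(b)
    m1, m2 = statistics.mean(a), statistics.mean(b)
    ss1 = sum((x - m1) ** 2 for x in a)
    ss2 = sum((x - m2) ** 2 for x in b)
    S = math.sqrt((ss1 + ss2) / (n1 + n2 - 2))
    t = (m1 - m2) / (S * math.sqrt(1 / n1 + 1 / n2))
    return t, S, n1 + n2 - 2

t_stat, S, df = pooled_t_from_data(sample1, sample2)
print(round(S ** 2, 4), round(t_stat, 3), df)   # 4.3077 0.931 13
```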
3. The following data represent the biological values of protein from cow's milk and buffalo's milk:
Cow's milk 1.82 2.02 1.88 1.61 1.81 1.54
Buffalo's milk 2.00 1.83 1.86 2.03 2.19 1.88
Examine whether the average values of protein in the two samples differ significantly at 5% level.
Solution:
Given $n_1 = n_2 = 6$.
H0: $\mu_1 = \mu_2$. There is no significant difference between the means of the two samples.
H1: $\mu_1 \ne \mu_2$. There is a significant difference between the means of the two samples.
Test Statistic:
$$t = \frac{\bar{x} - \bar{y}}{S\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$
x y x − x̄ (x − x̄)² y − ȳ (y − ȳ)²
1.82 2.00 0.04 0.0016 0.035 0.001225
2.02 1.83 0.24 0.0576 −0.135 0.018225
1.88 1.86 0.10 0.0100 −0.105 0.011025
1.61 2.03 −0.17 0.0289 0.065 0.004225
1.81 2.19 0.03 0.0009 0.225 0.050625
1.54 1.88 −0.24 0.0576 −0.085 0.007225
Total 10.68 11.79 0 0.1566 0 0.09255
$$\bar{x} = \frac{\sum x}{n_1} = \frac{10.68}{6} = 1.78, \qquad \bar{y} = \frac{\sum y}{n_2} = \frac{11.79}{6} = 1.965$$
$$S^2 = \frac{1}{6 + 6 - 2}(0.1566 + 0.09255) = (0.1)(0.2492) = 0.0249, \quad S = 0.1578$$
$$t = \frac{1.78 - 1.965}{0.1578\sqrt{\frac{1}{6} + \frac{1}{6}}} = \frac{-0.185}{(0.1578)(0.5774)} = \frac{-0.185}{0.0911} = -2.03, \quad |t| = 2.03$$
Critical value: The critical value of t at 5% level of significance with 10 degrees of freedom is 2.228.
Here calculated value < table value, so we accept H0,
i.e., the difference between the mean protein values of the two varieties of milk is not significant at 5% level.
4. The following data relate to the marks obtained by 11 students in 2 tests, one held at the beginning of a year and the other at the end of the year after intensive coaching.
Test 1 19 23 16 24 17 18 20 18 21 19 20
Test 2 17 24 20 24 20 22 20 20 18 22 19
Do the data indicate that the students have benefited by coaching?
Solution:
The given data relate to the marks obtained in 2 tests by the same set of students. Hence the marks in the 2 tests can be regarded as correlated.
We use the t-test for paired values.
Let H0: $\mu_1 = \mu_2$,
H1: $\mu_1 < \mu_2$ (one-tailed test)
x1 x2 d = x1 − x2 d² d − d̄ (d − d̄)²
19 17 2 4 3 9
23 24 −1 1 0 0
16 20 −4 16 −3 9
24 24 0 0 1 1
17 20 −3 9 −2 4
18 22 −4 16 −3 9
20 20 0 0 1 1
18 20 −2 4 −1 1
21 18 3 9 4 16
19 22 −3 9 −2 4
20 19 1 1 2 4
Total −11 69 58
$$\bar{d} = \frac{\sum d}{n} = \frac{-11}{11} = -1, \qquad S^2 = \frac{\sum (d - \bar{d})^2}{n} = \frac{58}{11} = 5.272$$
Since S here is computed with divisor n, the statistic uses $\sqrt{n-1}$:
$$t = \frac{\bar{d}}{S/\sqrt{n-1}} = \frac{-1}{\sqrt{5.272/10}} = -1.377, \quad |t| = 1.377$$
From the table, for v = n − 1 = 10 d.f., $t_{0.05}$ = 1.812.
Since $|t| < t_{0.05}$, ∴ H0 is accepted.
Conclusion:
The students have not benefited by coaching.
5. Ten persons were appointed in the officer cadre in an office. Their performance was noted by giving a test and the marks were recorded out of 100.
Employee A B C D E F G H I J
Before training 80 76 92 60 70 56 74 56 70 56
After training 84 70 96 80 70 52 84 72 72 50
By applying the t-test, can it be concluded that the employees have benefited from the training?
Solution:
Null Hypothesis H0: $\mu_1 = \mu_2$, i.e., the employees have not benefited from the training.
Alternative Hypothesis H1: $\mu_1 < \mu_2$, i.e., the employees have benefited from the training.
Level of significance: α = 5% = 0.05 (one-tailed test)
Test statistic: $t = \frac{\bar{d}}{S/\sqrt{n}}$, where $S^2 = \frac{1}{n-1}\sum (d - \bar{d})^2$ and $\bar{d} = \frac{\sum d}{n}$, with d = Before − After.
Employee Before After d (d − d̄)²
A 80 84 −4 0
B 76 70 6 100
C 92 96 −4 0
D 60 80 −20 256
E 70 70 0 16
F 56 52 4 64
G 74 84 −10 36
H 56 72 −16 144
I 70 72 −2 4
J 56 50 6 100
Total −40 720
$$\bar{d} = \frac{\sum d}{n} = \frac{-40}{10} = -4$$
$$S^2 = \frac{1}{n-1}\sum (d - \bar{d})^2 = \frac{720}{9} = 80, \quad S = 8.94$$
$$t = \frac{\bar{d}}{S/\sqrt{n}} = \frac{-4}{8.94/\sqrt{10}} = -1.414, \quad |t| = 1.414$$
Critical value: The critical value of t at 5% level of significance with 9 degrees of freedom (one-tailed) is 1.83.
Conclusion:
Since |t| = 1.414 < 1.83, we accept H0.
Hence the employees have not benefited from the training.
6. The weight gains in pounds under two systems of feeding of calves of 10 pairs of identical twins are given below.
Twin pair 1 2 3 4 5 6 7 8 9 10
Weight gains under System A 43 39 39 42 46 43 38 44 51 43
Weight gains under System B 37 35 34 41 39 37 37 40 48 36
Discuss whether the difference between the two systems of feeding is significant.
Solution:
Null Hypothesis H0: $\mu_1 = \mu_2$, i.e., there is no significant difference between the two systems of feeding.
Alternative Hypothesis H1: $\mu_1 \ne \mu_2$, i.e., there is a significant difference between the two systems of feeding.
Level of significance: α = 5% = 0.05 (two-tailed test)
Test statistic: $t = \frac{\bar{d}}{S/\sqrt{n}}$, where $S^2 = \frac{1}{n-1}\sum (d - \bar{d})^2$ and $\bar{d} = \frac{\sum d}{n}$.
Twin pair System A (x) System B (y) d = x − y (d − d̄)²
1 43 37 6 2.56
2 39 35 4 0.16
3 39 34 5 0.36
4 42 41 1 11.56
5 46 39 7 6.76
6 43 37 6 2.56
7 38 37 1 11.56
8 44 40 4 0.16
9 51 48 3 1.96
10 43 36 7 6.76
Total 44 44.4
$$\bar{d} = \frac{\sum d}{n} = \frac{44}{10} = 4.4$$
$$S^2 = \frac{1}{n-1}\sum (d - \bar{d})^2 = \frac{44.4}{9} = 4.93, \quad S = 2.221$$
$$t = \frac{\bar{d}}{S/\sqrt{n}} = \frac{4.4}{2.221/\sqrt{10}} = 6.26$$
Critical value: The critical value of t at 5% level of significance with 9 degrees of freedom (two-tailed) is 2.262.
Conclusion:
Since t = 6.26 > 2.262, we reject H0.
Hence there is a significant difference between the two systems of feeding.
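A minimal Python sketch of the paired t-test for the twin data, computing the standard deviation of the differences with `statistics.stdev` (divisor n − 1). The critical value 2.262 (two-tailed 5%, 9 d.f.) is entered by hand from a t table.

```python
import math
import statistics

system_a = [43, 39, 39, 42, 46, 43, 38, 44, 51, 43]
system_b = [37, 35, 34, 41, 39, 37, 37, 40, 48, 36]

d = [a - b for a, b in zip(system_a, system_b)]   # difference within each twin pair
n = len(d)
d_bar = statistics.mean(d)                        # 4.4
s_d = statistics.stdev(d)                         # sqrt(44.4 / 9), divisor n - 1
t_stat = d_bar / (s_d / math.sqrt(n))
t_table = 2.262                                   # two-tailed 5% point, 9 d.f. (t table)

print(round(s_d, 3), round(t_stat, 2))            # 2.221 6.26
print("reject H0" if abs(t_stat) > t_table else "accept H0")   # reject H0
```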
II F-test
The F-test is used
(i) to test whether there is any significant difference between two estimates of the population variance, and
(ii) to test whether the two samples have come from the same population.
The test statistic is given by
$$F = \frac{S_1^2}{S_2^2}, \quad \text{if } S_1^2 > S_2^2$$
where $S_1^2 = \frac{n_1 s_1^2}{n_1 - 1}$ ($n_1$ is the first sample size) and $S_2^2 = \frac{n_2 s_2^2}{n_2 - 1}$ ($n_2$ is the second sample size).
The degrees of freedom are $(v_1, v_2) = (n_1 - 1, n_2 - 1)$.
Note:
1. If $S_2^2 > S_1^2$, then $F = \frac{S_2^2}{S_1^2}$ with $(v_1, v_2) = (n_2 - 1, n_1 - 1)$, so that F ≥ 1 always.
2. To test whether two independent samples have been drawn from the same normal population, we have to test
i) equality of population means using the t-test or z-test, according to sample size;
ii) equality of population variances using the F-test.
Problem:
1. A sample of size 13 gave an estimated population variance of 3.0, while another sample of size 15 gave an estimate of 2.5. Could both samples be from populations with the same variance?
Solution:
Given $n_1$ = 13, $n_2$ = 15, $S_1^2$ = 3.0, $S_2^2$ = 2.5.
Let H0: $\sigma_1^2 = \sigma_2^2$ (the two samples have been drawn from populations with the same variance)
H1: $\sigma_1^2 \ne \sigma_2^2$
The test statistic is
$$F = \frac{S_1^2}{S_2^2} = \frac{3}{2.5} = 1.2$$
From the table, with degrees of freedom $v = (n_1 - 1, n_2 - 1) = (12, 14)$, $F_{0.05}$ = 2.53. Since $F < F_{0.05}$, ∴ H0 is accepted.
Conclusion:
The two samples could have come from two normal populations with the same variance.
14
www.BrainKart.com
2. Two samples of size 9 and 8 give sums of squares of deviations from their respective means equal to 160 and 91 respectively. Could both samples be from populations with the same variance?
Solution:
Given $n_1 = 9$, $n_2 = 8$, $\sum (x - \bar x)^2 = 160$, $\sum (y - \bar y)^2 = 91$
$S_1^2 = \dfrac{\sum (x - \bar x)^2}{n_1 - 1} = \dfrac{160}{8} = 20$, $S_2^2 = \dfrac{\sum (y - \bar y)^2}{n_2 - 1} = \dfrac{91}{7} = 13$
Let $H_0: \sigma_1^2 = \sigma_2^2$ (the two normal populations have the same variance)
$H_1: \sigma_1^2 \neq \sigma_2^2$
The test statistic is $F = \dfrac{S_1^2}{S_2^2} = \dfrac{20}{13} = 1.538$
From the table, with degrees of freedom $\nu = (n_1 - 1, n_2 - 1) = (8, 7)$, $F_{0.05} = 3.73$. Since $F < F_{0.05}$, $H_0$ is accepted.
Conclusion:
The two samples could have come from two populations with the same variance.
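A minimal Python sketch of the F ratio built from sums of squared deviations, as in this problem. The helper `f_test_from_ss` is a hypothetical name; it puts the larger variance estimate in the numerator (so $F \ge 1$) and orders the degrees of freedom to match. The 5% table value 3.73 is copied from the solution rather than computed.

```python
def f_test_from_ss(ss1, n1, ss2, n2):
    """F statistic from sums of squared deviations.

    S_i^2 = ss_i / (n_i - 1); the larger estimate goes in the numerator,
    and the degrees of freedom are returned in the same order."""
    var1 = ss1 / (n1 - 1)
    var2 = ss2 / (n2 - 1)
    if var1 >= var2:
        return var1 / var2, (n1 - 1, n2 - 1)
    return var2 / var1, (n2 - 1, n1 - 1)

F, df = f_test_from_ss(160, 9, 91, 8)
F_CRIT = 3.73  # F-table value at 5% for (8, 7) d.f., as quoted in the solution
print(round(F, 3), df)                          # 1.538 (8, 7)
print("accept H0" if F < F_CRIT else "reject H0")  # accept H0
```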
3. Two random samples gave the following data:
Sample   Size   Mean   Variance
I        8      9.6    1.2
II       11     16.5   2.5
Can we conclude that the two samples have been drawn from the same normal population?
Solution:
To decide whether the two samples have been drawn from the same normal population, we have to check that
(i) the population variances do not differ significantly, by the F-test, and
(ii) the population means do not differ significantly, by the t-test.
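(i) F-test:
Given $n_1 = 8$, $n_2 = 11$, $s_1^2 = 1.2$, $s_2^2 = 2.5$, $\bar x_1 = 9.6$, $\bar x_2 = 16.5$
$S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1} = \dfrac{8(1.2)}{7} = 1.371$, $S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1} = \dfrac{11(2.5)}{10} = 2.75$
Let $H_0: \sigma_1^2 = \sigma_2^2$
$H_1: \sigma_1^2 \neq \sigma_2^2$
Since $S_2^2 > S_1^2$, the test statistic is $F = \dfrac{S_2^2}{S_1^2} = \dfrac{2.75}{1.371} = 2.005$
From the table, $F_{0.05}(n_2 - 1, n_1 - 1) = F_{0.05}(10, 7) = 3.64$
Since $F < F_{0.05}$, $H_0$ is accepted.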
(ii) t-test (equality of means):
Let $H_0: \mu_1 = \mu_2$
$H_1: \mu_1 \neq \mu_2$
The test statistic is $t = \dfrac{\bar x_1 - \bar x_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$,
where $S = \sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{8(1.2) + 11(2.5)}{8 + 11 - 2}} = 1.4772$
$t = \dfrac{9.6 - 16.5}{1.4772\sqrt{\dfrac{1}{8} + \dfrac{1}{11}}} = -10.05$, so $|t| = 10.05$
From the table, with $n_1 + n_2 - 2 = 17$ degrees of freedom, $t_{0.05} = 2.110$
Since $|t| > t_{0.05}$, $H_0$ is rejected, i.e., $\mu_1 \neq \mu_2$.
Conclusion:
The two samples could not have been drawn from the same normal population.
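The two steps of this solution (F-test, then pooled t-test) can be sketched in Python from the summary figures alone. This assumes, as the solution does, that the given variances $s^2$ were computed with divisor $n$; the function names `unbiased` and `pooled_t` are hypothetical, and the table values 3.64 and 2.110 still come from the tables.

```python
import math

def unbiased(n, s_sq):
    """S^2 = n s^2 / (n - 1), where s^2 was computed with divisor n."""
    return n * s_sq / (n - 1)

def pooled_t(n1, mean1, s1_sq, n2, mean2, s2_sq):
    """Two-sample t with pooled S^2 = (n1 s1^2 + n2 s2^2) / (n1 + n2 - 2)."""
    S = math.sqrt((n1 * s1_sq + n2 * s2_sq) / (n1 + n2 - 2))
    t = (mean1 - mean2) / (S * math.sqrt(1 / n1 + 1 / n2))
    return t, n1 + n2 - 2

S1_sq, S2_sq = unbiased(8, 1.2), unbiased(11, 2.5)
F = max(S1_sq, S2_sq) / min(S1_sq, S2_sq)   # larger estimate on top
t, df = pooled_t(8, 9.6, 1.2, 11, 16.5, 2.5)
print(round(F, 3), round(t, 2), df)  # 2.005 -10.05 17
```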
4. Two independent samples of 5 and 6 items respectively had the following values of the variable:
Sample 1: 21 24 25 26 27
Sample 2: 22 27 28 30 31 36
Can you say that the two samples came from the same population?
Solution:
Let $H_0: \sigma_1^2 = \sigma_2^2$ and $\mu_1 = \mu_2$ (the two samples have been drawn from the same population)
$H_1: \sigma_1^2 \neq \sigma_2^2$ or $\mu_1 \neq \mu_2$
(i) F-test (equality of variances):
$x_1$   $x_1 - \bar x_1$   $(x_1 - \bar x_1)^2$   $x_2$   $x_2 - \bar x_2$   $(x_2 - \bar x_2)^2$
21      -3.6               12.96                  22      -7                 49
24      -0.6               0.36                   27      -2                 4
25      0.4                0.16                   28      -1                 1
26      1.4                1.96                   30      1                  1
27      2.4                5.76                   31      2                  4
                                                  36      7                  49
123                        21.2                   174                        108
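$\bar x_1 = \dfrac{\sum x_1}{n_1} = \dfrac{123}{5} = 24.6$, $\bar x_2 = \dfrac{\sum x_2}{n_2} = \dfrac{174}{6} = 29$
Sample variances (divisor $n$): $s_1^2 = \dfrac{\sum (x_1 - \bar x_1)^2}{n_1} = \dfrac{21.2}{5} = 4.24$, $s_2^2 = \dfrac{\sum (x_2 - \bar x_2)^2}{n_2} = \dfrac{108}{6} = 18$
$S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1} = \dfrac{5(4.24)}{4} = 5.3$, $S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1} = \dfrac{6(18)}{5} = 21.6$
Since $S_2^2 > S_1^2$, the test statistic is $F = \dfrac{S_2^2}{S_1^2} = \dfrac{21.6}{5.3} = 4.075$
From the table, $F_{0.05}(n_2 - 1, n_1 - 1) = F_{0.05}(5, 4) = 6.26$
Since $F < F_{0.05}$, $H_0$ is accepted.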
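(ii) t-test (equality of means):
$t = \dfrac{\bar x_1 - \bar x_2}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$,
where $S = \sqrt{\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}} = \sqrt{\dfrac{5(4.24) + 6(18)}{5 + 6 - 2}} = \sqrt{\dfrac{129.2}{9}} = 3.789$
$t = \dfrac{24.6 - 29}{3.789\sqrt{\dfrac{1}{5} + \dfrac{1}{6}}} = -1.918$, so $|t| = 1.918$
From the table, with $n_1 + n_2 - 2 = 9$ degrees of freedom, $t_{0.05} = 2.262$
Since $|t| < t_{0.05}$, $H_0$ is accepted, i.e., $\mu_1 = \mu_2$.
Conclusion: The two samples could have been drawn from the same normal population.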
5. Two random samples gave the following results:
Sample   Size   Sample mean   Sum of squares of deviations from the mean
1        10     15            90
2        12     14            108
Test whether the samples come from the same normal population at 5% level of significance.
Solution:
A normal population has two parameters, namely the mean $\mu$ and the variance $\sigma^2$. To test whether independent samples have been drawn from the same normal population, we have to test
1) equality of population means, using the t-test or z-test according to sample size;
2) equality of population variances, using the F-test.
Given $\bar x = 15$, $\bar y = 14$, $n_1 = 10$, $n_2 = 12$, $\sum (x - \bar x)^2 = 90$, $\sum (y - \bar y)^2 = 108$
i) t-test to test equality of population means:
Null hypothesis $H_0: \mu_1 = \mu_2$, there is no difference between the two population means.
Alternative hypothesis $H_1: \mu_1 \neq \mu_2$, there is a difference between the two population means.
Level of significance: $\alpha = 5\% = 0.05$ (two-tailed test)
Test statistic: $t = \dfrac{\bar x - \bar y}{S\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
where $S^2 = \dfrac{1}{n_1 + n_2 - 2}\left[\sum (x - \bar x)^2 + \sum (y - \bar y)^2\right] = \dfrac{90 + 108}{10 + 12 - 2} = 9.9$, so $S = \sqrt{9.9} = 3.146$
$t = \dfrac{15 - 14}{3.146\sqrt{\dfrac{1}{10} + \dfrac{1}{12}}} = 0.742$
Critical value: The critical value of $t$ at 5% level of significance with $n_1 + n_2 - 2 = 10 + 12 - 2 = 20$ degrees of freedom is 2.086.
Conclusion: calculated value < table value, so $H_0$ is accepted.
ii) F-test to test equality of population variances:
Null hypothesis $H_0: \sigma_1^2 = \sigma_2^2$, the population variances are equal.
Alternative hypothesis $H_1: \sigma_1^2 \neq \sigma_2^2$, the population variances are not equal.
Level of significance: $\alpha = 5\%$
Test statistic: $F = \dfrac{S_1^2}{S_2^2}$
where $S_1^2 = \dfrac{1}{n_1 - 1}\sum (x - \bar x)^2 = \dfrac{90}{10 - 1} = 10$ and $S_2^2 = \dfrac{1}{n_2 - 1}\sum (y - \bar y)^2 = \dfrac{108}{12 - 1} = 9.818$
Here $S_1^2 > S_2^2$, so $F = \dfrac{S_1^2}{S_2^2} = \dfrac{10}{9.818} = 1.02$
Critical value: The critical value of $F$ at 5% level of significance with $(n_1 - 1, n_2 - 1) = (9, 11)$ degrees of freedom is 2.90.
Here calculated value < table value, so we accept $H_0$.
Conclusion: Both null hypotheses $\mu_1 = \mu_2$ and $\sigma_1^2 = \sigma_2^2$ are accepted.
Hence we may conclude that the two samples are drawn from the same normal population.
III $\chi^2$-test:
(i) $\chi^2$-test for a specified population variance.
(ii) $\chi^2$-test to test whether differences between observed and expected frequencies are significant (goodness of fit).
(iii) $\chi^2$-test to test the independence of attributes.
$\chi^2$-test for a specified population variance:
The test statistic is $\chi^2 = \dfrac{n s^2}{\sigma^2}$, where $s$ is the sample S.D. and $\sigma$ is the specified population S.D.,
which follows the $\chi^2$-distribution with $(n - 1)$ degrees of freedom.
Problem:
1. The lapping process which is used to grind certain silicon wafers to the proper thickness is acceptable only if $\sigma$, the population S.D. of the thickness of dice cut from the wafers, is at most 0.5 mil. Use the 0.05 level of significance to test the null hypothesis $\sigma = 0.5$ against the alternative hypothesis $\sigma > 0.5$, if the thickness of 15 dice cut from such wafers has an S.D. of 0.64 mil.
Solution:
Given $n = 15$, $s = 0.64$, $\sigma = 0.5$
$H_0: \sigma = 0.5$, $H_1: \sigma > 0.5$ (right-tailed test)
Under $H_0$, the test statistic is $\chi^2 = \dfrac{n s^2}{\sigma^2} = \dfrac{15(0.64)^2}{(0.5)^2} = 24.576$
From the $\chi^2$ table, with 14 degrees of freedom, $\chi^2_{0.05} = 23.685$
Since $\chi^2 > \chi^2_{0.05}$, $H_0$ is rejected. Hence $\sigma > 0.5$.
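A minimal Python sketch of the specified-variance statistic $\chi^2 = ns^2/\sigma^2$, assuming (as in the notes) that $s$ is the sample S.D. computed with divisor $n$. The 5% right-tail value 23.685 for 14 d.f. is hard-coded from a $\chi^2$ table, and the helper name is hypothetical.

```python
def chi_square_variance(n, s, sigma0):
    """chi^2 = n s^2 / sigma0^2 with n - 1 d.f. (s computed with divisor n)."""
    return n * s ** 2 / sigma0 ** 2, n - 1

chi2, df = chi_square_variance(15, 0.64, 0.5)
CHI2_CRIT = 23.685  # chi-square table: 5% right tail, 14 d.f.
print(round(chi2, 3), df)                               # 24.576 14
print("reject H0" if chi2 > CHI2_CRIT else "accept H0")  # reject H0
```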
$\chi^2$-test to test whether differences between observed and expected frequencies are
significant (goodness of fit):
The test statistic is $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i}$,
where $O_i$ is the observed frequency and $E_i$ is the expected frequency.
If the data are given as a series of $n$ numbers (classes), then the degrees of freedom = $n - 1$.
Note: When fitting a binomial distribution d.f. = $n - 1$, a Poisson distribution d.f. = $n - 2$, and a normal distribution d.f. = $n - 3$, where $n$ is the number of classes after pooling.
Problem:
1. The following data give the number of aircraft accidents that occurred during the various days of a week:
Days:               Mon   Tue   Wed   Thu   Fri   Sat
No. of accidents:   15    19    13    12    16    15
Test whether the accidents are uniformly distributed over the week.
Solution:
The expected number of accidents on any day $= \dfrac{90}{6} = 15$
Let $H_0$: Accidents occur uniformly over the week
$H_1$: Accidents do not occur uniformly over the week
Days    Observed frequency ($O_i$)   Expected frequency ($E_i$)   $O_i - E_i$   $(O_i - E_i)^2/E_i$
Mon     15                           15                           0             0
Tue     19                           15                           4             1.067
Wed     13                           15                           -2            0.267
Thu     12                           15                           -3            0.6
Fri     16                           15                           1             0.067
Sat     15                           15                           0             0
Total   90                           90                                         2.0
Now, $\chi^2 = \sum_i \dfrac{(O_i - E_i)^2}{E_i} = 2.0$
Here 6 observations are given, so degrees of freedom = $n - 1 = 6 - 1 = 5$.
From the $\chi^2$ table, with 5 degrees of freedom, $\chi^2_{0.05} = 11.07$
Since $\chi^2 < \chi^2_{0.05}$, $H_0$ is accepted.
Conclusion: Accidents occur uniformly over the week.
2. A survey of 320 families with 5 children each revealed the following distribution:
No. of boys:       5    4    3     2    1    0
No. of girls:      0    1    2     3    4    5
No. of families:   14   56   110   88   40   12
Is the result consistent with the hypothesis that male and female births are equally probable?
Solution:
Let $H_0$: Male and female births are equally probable
$H_1$: Male and female births are not equally probable
Probability of a male birth $p = \dfrac{1}{2}$, probability of a female birth $q = \dfrac{1}{2}$
The probability of $x$ male births in a family of 5 is $p(x) = {}^5C_x\, p^x q^{5-x}$, $x = 0, 1, 2, \dots, 5$
Expected number of families with $x$ male births $= 320 \times {}^5C_x\, p^x q^{5-x} = 320 \times {}^5C_x \left(\dfrac{1}{2}\right)^x \left(\dfrac{1}{2}\right)^{5-x} = 320 \times {}^5C_x \left(\dfrac{1}{2}\right)^5 = 10 \times {}^5C_x$
The $\chi^2$ is calculated using the following table:
No. of boys   Observed frequency ($O_i$)   Expected frequency $E_i = 10 \times {}^5C_x$   $O_i - E_i$   $(O_i - E_i)^2/E_i$
5             14                           10                                             4             1.6
4             56                           50                                             6             0.72
3             110                          100                                            10            1
2             88                           100                                            -12           1.44
1             40                           50                                             -10           2
0             12                           10                                             2             0.4
Total         320                          320                                                          7.16
$\chi^2 = 7.16$
The tabulated value of $\chi^2$ for $n - 1 = 6 - 1 = 5$ degrees of freedom at 5% level of significance is $\chi^2_{0.05} = 11.07$.
Since $\chi^2 < \chi^2_{0.05}$, we accept $H_0$.
Conclusion: Male and female births are equally probable.
3. Fit a Poisson distribution to the following data and test the goodness of fit.
x:      0     1    2    3   4   5   6
f(x):   275   72   30   7   5   2   1
Solution:
Mean of the given distribution $\bar x = \dfrac{\sum f_i x_i}{\sum f_i} = \dfrac{189}{392} = 0.482$
To fit a Poisson distribution to the given data, we take the parameter of the Poisson distribution equal to the mean of the given distribution: $\lambda = \bar x = 0.482$.
The Poisson distribution is given by $P(X = x) = \dfrac{e^{-\lambda}\lambda^x}{x!}$, $x = 0, 1, 2, \dots$
and the expected frequencies are obtained from $f(x) = N \cdot P(X = x) = 392 \times \dfrac{e^{-0.482}(0.482)^x}{x!}$.
We get $f(0) = 392 \times \dfrac{e^{-0.482}(0.482)^0}{0!} = 242.1$, $f(1) = 392 \times \dfrac{e^{-0.482}(0.482)^1}{1!} = 116.69$, $f(2) = 28.12$,
$f(3) = 4.518$, $f(4) = 0.544$, $f(5) = 0.052 \approx 0.1$, $f(6) = 0.004 \approx 0$
x:                    0       1        2       3       4       5       6       Total
Expected frequency:   242.1   116.69   28.12   4.518   0.544   0.052   0.004   392
$H_0$: The Poisson distribution fits the data well.
$H_1$: The Poisson distribution does not fit the data well.
The $\chi^2$ is calculated using the following table (the classes $x = 3, 4, 5, 6$ have small expected frequencies, so they are pooled into one class):
x                     Observed frequency ($O_i$)   Expected frequency ($E_i$)   $(O_i - E_i)^2/E_i$
0                     275                          242.1                        4.471
1                     72                           116.7                        17.122
2                     30                           28.1                         0.128
3, 4, 5, 6 (pooled)   7 + 5 + 2 + 1 = 15           4.5 + 0.5 + 0.1 + 0 = 5.1    19.218
Total                 392                          392                          40.939
$\chi^2 = 40.939$
The tabulated value of $\chi^2$ for $7 - 1 - 1 - 3 = 2$ degrees of freedom (7 classes, less 1, less 1 for the estimated parameter $\lambda$, less 3 for the classes lost in pooling) at 5% level of significance is $\chi^2_{0.05} = 5.991$.
Since $\chi^2 > \chi^2_{0.05}$, we reject $H_0$.
Conclusion:
The Poisson distribution is not a good fit to the given data.
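A sketch of the Poisson fit in Python, assuming the same pooling of the classes $x \ge 3$ used in the table. Because it keeps the expected frequencies unrounded (and takes the pooled expected count as $N$ minus the first three), the statistic it prints is close to, but not exactly, the hand-computed value; the conclusion at 2 d.f. is unchanged. Variable names are illustrative.

```python
import math

x_vals = [0, 1, 2, 3, 4, 5, 6]
freq = [275, 72, 30, 7, 5, 2, 1]
N = sum(freq)                                          # 392
lam = sum(x * f for x, f in zip(x_vals, freq)) / N     # 189 / 392
expected = [N * math.exp(-lam) * lam ** x / math.factorial(x) for x in x_vals]

# Pool x >= 3 into one class; the pooled expected count is N minus the
# first three expected counts, so the expected column also sums to N.
obs_pooled = freq[:3] + [sum(freq[3:])]
exp_pooled = expected[:3] + [N - sum(expected[:3])]
chi2 = sum((o - e) ** 2 / e for o, e in zip(obs_pooled, exp_pooled))
df = len(obs_pooled) - 1 - 1   # classes - 1 - (one estimated parameter, lambda)
print(round(lam, 3), [round(e, 1) for e in expected[:3]])
print(round(chi2, 2), df)
```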
$\chi^2$-test to test the independence of attributes:
An attribute means a quality or characteristic. The $\chi^2$-test is used to test whether two attributes are associated or independent. Let us consider two attributes A and B, where A is divided into three classes and B is divided into three classes.
                      Attribute B
                      B1          B2          B3          Total
Attribute A   A1      $a_{11}$    $a_{12}$    $a_{13}$    $R_1$
              A2      $a_{21}$    $a_{22}$    $a_{23}$    $R_2$
              A3      $a_{31}$    $a_{32}$    $a_{33}$    $R_3$
Total                 $C_1$       $C_2$       $C_3$       $N$
Now, under the null hypothesis $H_0$: the attributes A and B are independent, we calculate the expected frequency $E_{ij}$ for the various cells using the formula
$E_{ij} = \dfrac{R_i \times C_j}{N}$, $i = 1, 2, \dots, r$, $j = 1, 2, \dots, s$
Thus
$E(a_{11}) = \dfrac{R_1 C_1}{N}$, $E(a_{12}) = \dfrac{R_1 C_2}{N}$, $E(a_{13}) = \dfrac{R_1 C_3}{N}$
$E(a_{21}) = \dfrac{R_2 C_1}{N}$, $E(a_{22}) = \dfrac{R_2 C_2}{N}$, $E(a_{23}) = \dfrac{R_2 C_3}{N}$
$E(a_{31}) = \dfrac{R_3 C_1}{N}$, $E(a_{32}) = \dfrac{R_3 C_2}{N}$, $E(a_{33}) = \dfrac{R_3 C_3}{N}$
and we compute $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$,
which follows the $\chi^2$-distribution with $(r - 1)(s - 1)$ degrees of freedom; the calculated value is compared with the table value at the 5% or 1% level of significance.
1. Calculate the expected frequencies for the following data, presuming the two attributes, viz., condition of home and condition of child, to be independent.
                            Condition of home
                            Clean   Dirty
Condition of child  Clean   70      50
                    Fair    80      20
                    Dirty   35      45
Use the chi-square test at 5% level of significance to state whether the two attributes are independent.
Solution:
Null hypothesis $H_0$: Condition of home and condition of child are independent.
Alternative hypothesis $H_1$: Condition of home and condition of child are not independent.
Level of significance: $\alpha = 0.05$
The test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Analysis:
                            Condition of home
                            Clean   Dirty   Total
Condition of child  Clean   70      50      120
                    Fair    80      20      100
                    Dirty   35      45      80
Total                       185     115     300
Expected frequency = (corresponding row total × column total) / grand total
Expected frequency for 70 = (120 × 185)/300 = 74, Expected frequency for 80 = (100 × 185)/300 = 61.67,
Expected frequency for 35 = (80 × 185)/300 = 49.33, Expected frequency for 50 = (120 × 115)/300 = 46,
Expected frequency for 20 = (100 × 115)/300 = 38.33, Expected frequency for 45 = (80 × 115)/300 = 30.67
$O_{ij}$   $E_{ij}$   $O_{ij} - E_{ij}$   $(O_{ij} - E_{ij})^2$   $(O_{ij} - E_{ij})^2/E_{ij}$
70         74         -4                  16                      0.216
50         46         4                   16                      0.348
80         61.67      18.33               335.99                  5.448
20         38.33      -18.33              335.99                  8.766
35         49.33      -14.33              205.35                  4.163
45         30.67      14.33               205.35                  6.695
Total                                                             25.636
$\chi^2 = 25.636$
Degrees of freedom = $(r - 1)(c - 1) = (3 - 1)(2 - 1) = 2$, so $\chi^2_{0.05} = 5.991$
Conclusion:
Since $\chi^2 > \chi^2_{0.05}$, we reject the null hypothesis $H_0$. Hence, condition of home and condition of child are not independent.
2. The following contingency table presents the reactions of legislators to a tax plan according to party affiliation. Test whether party affiliation influences the reaction to the tax plan at 0.01 level of significance.
                    Reaction
Party      In favour   Neutral   Opposed   Total
Party A    120         20        20        160
Party B    50          30        60        140
Party C    50          10        40        100
Total      220         60        120       400
Solution:
Null hypothesis $H_0$: Party affiliation and reaction to the tax plan are independent.
Alternative hypothesis $H_1$: Party affiliation and reaction to the tax plan are not independent.
The test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Analysis (observed frequencies as in the table above):
E(120) = (160 × 220)/400 = 88; E(20) = (160 × 60)/400 = 24; E(20) = (160 × 120)/400 = 48
E(50) = (140 × 220)/400 = 77; E(30) = (140 × 60)/400 = 21; E(60) = (140 × 120)/400 = 42
E(50) = (100 × 220)/400 = 55; E(10) = (100 × 60)/400 = 15; E(40) = (100 × 120)/400 = 30
$O_{ij}$   $E_{ij}$   $O_{ij} - E_{ij}$   $(O_{ij} - E_{ij})^2$   $(O_{ij} - E_{ij})^2/E_{ij}$
120        88         32                  1024                    11.64
20         24         -4                  16                      0.67
20         48         -28                 784                     16.33
50         77         -27                 729                     9.47
30         21         9                   81                      3.86
60         42         18                  324                     7.71
50         55         -5                  25                      0.45
10         15         -5                  25                      1.67
40         30         10                  100                     3.33
Total                                                             55.13
$\chi^2 = 55.13$
Level of significance $\alpha = 0.01$; degrees of freedom = $(r - 1)(s - 1) = (3 - 1)(3 - 1) = 4$, so $\chi^2_{0.01} = 13.28$
Conclusion: Since $\chi^2 > \chi^2_{0.01}$, we reject the null hypothesis $H_0$.
Hence, party affiliation and reaction to the tax plan are dependent.
3. From a poll of 800 television viewers, the following data have been accumulated as to their levels of education and their preference of television stations. We are interested in determining whether the selection of a TV station is independent of the level of education.
                                  Educational level
                       High School   Bachelor   Graduate   Total
Public Broadcasting    50            150        80         280
Commercial Stations    150           250        120        520
Total                  200           400        200        800
(i) State the null and alternative hypotheses.
(ii) Show the contingency table of the expected frequencies.
(iii) Compute the test statistic.
(iv) The null hypothesis is to be tested at 95% confidence. Determine the critical value for this test.
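Solution:
(i) Null hypothesis $H_0$: Selection of TV station is independent of level of education.
Alternative hypothesis $H_1$: Selection of TV station is not independent of level of education.
Level of significance: $\alpha = 0.05$
(ii) To find the expected frequencies:
Expected frequency = (corresponding row total × column total) / grand total
Expected frequency for 50 = (280 × 200)/800 = 70, Expected frequency for 150 = (280 × 400)/800 = 140,
Expected frequency for 80 = (280 × 200)/800 = 70, Expected frequency for 150 = (520 × 200)/800 = 130,
Expected frequency for 250 = (520 × 400)/800 = 260, Expected frequency for 120 = (520 × 200)/800 = 130
Table of expected frequencies:
                       High School   Bachelor   Graduate   Total
Public Broadcasting    70            140        70         280
Commercial Stations    130           260        130        520
Total                  200           400        200        800
The test statistic: $\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$
Analysis:
$O_{ij}$   $E_{ij}$   $O_{ij} - E_{ij}$   $(O_{ij} - E_{ij})^2$   $(O_{ij} - E_{ij})^2/E_{ij}$
50         70         -20                 400                     5.714
150        140        10                  100                     0.714
80         70         10                  100                     1.429
150        130        20                  400                     3.077
250        260        -10                 100                     0.385
120        130        -10                 100                     0.769
Total                                                             12.088
(iii) Test statistic $\chi^2 = 12.088$
(iv) Degrees of freedom = $(2 - 1)(3 - 1) = 2$, so the critical value is $\chi^2_{0.05} = 5.991$.
Conclusion: Calculated value > table value.
Hence, we reject the null hypothesis: the selection of TV station is not independent of the level of education.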
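The three contingency-table problems follow the same recipe ($E_{ij} = R_i C_j / N$, then summing $(O_{ij} - E_{ij})^2/E_{ij}$), so one helper covers all of them. A minimal standard-library Python sketch; the function name `chi_square_independence` is hypothetical, and the critical values still have to be read from a $\chi^2$ table. It prints roughly 25.65, 55.13 and 12.09 for the three tables.

```python
def chi_square_independence(table):
    """Chi-square test of independence for an r x s table of observed counts.

    E_ij = R_i * C_j / N and d.f. = (r - 1)(s - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    N = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / N
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df

home_child = [[70, 50], [80, 20], [35, 45]]               # problem 1
party_tax = [[120, 20, 20], [50, 30, 60], [50, 10, 40]]   # problem 2
tv_education = [[50, 150, 80], [150, 250, 120]]           # problem 3
for name, table in [("home/child", home_child), ("party/tax", party_tax),
                    ("TV/education", tv_education)]:
    chi2, df = chi_square_independence(table)
    print(name, round(chi2, 2), df)
# home/child 25.65 2
# party/tax 55.13 4
# TV/education 12.09 2
```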
Large samples:
If the sample size $n > 30$, the sample is said to be a large sample. There are four important tests of significance for large samples:
(i) Test of significance for a single mean.
(ii) Test of significance for the difference of two means.
(iii) Test of significance for a single proportion.
(iv) Test of significance for the difference of two proportions.
Note:
(i) The sampling distribution of a statistic is approximately normal, irrespective of whether the distribution of the population is normal or not.
(ii) The sample statistics are sufficiently close to the corresponding population parameters and hence may be used to calculate the standard errors of the sampling distribution.
(iii) Critical values of $z$ for some standard levels of significance (for large samples):
Nature of test                   1% (0.01)     2% (0.02)      5% (0.05)      10% (0.1)
                                 (99%)         (98%)          (95%)          (90%)
Two-tailed test                  |z| = 2.58    |z| = 2.33     |z| = 1.96     |z| = 1.645
One-tailed test (right-tailed)   z = 2.33      z = 2.055      z = 1.645      z = 1.28
One-tailed test (left-tailed)    z = -2.33     z = -2.055     z = -1.645     z = -1.28
Problems based on test of significance for a single mean:
The test statistic is $z = \dfrac{\bar x - \mu}{\sigma/\sqrt{n}}$, where $\bar x$ = sample mean, $\mu$ = population mean, $\sigma$ = standard deviation of the population, $n$ = sample size.
Note:
If the standard deviation of the population is not known, then the statistic is $z = \dfrac{\bar x - \mu}{S/\sqrt{n}}$, where $S$ = standard deviation of the sample.
Confidence interval:
The confidence interval for $\mu$ when $\sigma$ is known and sampling is done from a normal population or with a large sample is $\bar x \pm z_\alpha \dfrac{\sigma}{\sqrt{n}}$, i.e., $\left(\bar x - z_\alpha \dfrac{\sigma}{\sqrt{n}},\ \bar x + z_\alpha \dfrac{\sigma}{\sqrt{n}}\right)$.
If $s$ is known ($\sigma$ is not known): $\bar x \pm z_\alpha \dfrac{s}{\sqrt{n}}$
1. A sample of 100 students is taken from a large population. The mean height in the sample is 160 cm. Can it be reasonably regarded that in the population the mean height is 165 cm and the S.D. is 10 cm? Also find the confidence limits. Use a 1% level of significance.
Solution:
Given $n = 100$, $\bar x = 160$ cm, $\mu = 165$ cm, $\sigma = 10$ cm
Let $H_0: \mu = 165$
$H_1: \mu \neq 165$ (two-tailed test)
Under $H_0$, the test statistic is $z = \dfrac{\bar x - \mu}{\sigma/\sqrt{n}} = \dfrac{160 - 165}{10/\sqrt{100}} = -5$, so $|z| = 5$
From the table, $z_{0.01} = 2.58$. Since $|z| > z_{0.01}$, $H_0$ is rejected; hence $\mu \neq 165$.
Confidence interval (99%):
$\left(\bar x - z\dfrac{\sigma}{\sqrt{n}},\ \bar x + z\dfrac{\sigma}{\sqrt{n}}\right) = \left(160 - 2.58\dfrac{10}{\sqrt{100}},\ 160 + 2.58\dfrac{10}{\sqrt{100}}\right) = (157.42, 162.58)$
2. The mean breaking strength of the cables supplied by a manufacturer is 1800 with an S.D. of 100. By a new technique in the manufacturing process, it is claimed that the breaking strength of the cable has increased. In order to test this claim, a sample of 50 cables is tested and it is found that the mean breaking strength is 1850. Can we support the claim at 1% level of significance?
Solution:
Given $n = 50$, $\bar x = 1850$, $\mu = 1800$, $\sigma = 100$
Let $H_0: \mu = 1800$ (no increase in breaking strength)
$H_1: \mu > 1800$ (one-tailed test)
Under $H_0$, the test statistic is $z = \dfrac{\bar x - \mu}{\sigma/\sqrt{n}} = \dfrac{1850 - 1800}{100/\sqrt{50}} = 3.535$
From the table, $z_{0.01} = 2.33$ (right-tailed). Since $z > z_{0.01}$, $H_0$ is rejected; hence the claim that the breaking strength has increased is supported.
3. A sample of 900 members has a mean of 3.4 cm and an S.D. of 2.61 cm. Is the sample from a large population of mean 3.25 cm and S.D. 2.61 cm? If the population is normal and its mean is unknown, find the 95% confidence limits of the true mean.
Solution:
Given $n = 900$, $\mu = 3.25$, $\bar x = 3.4$ cm, $\sigma = 2.61$, $s = 2.61$
Null hypothesis $H_0$: There is no significant difference between the sample mean and the population mean, i.e., $\mu = 3.25$.
Alternative hypothesis $H_1$: There is a significant difference between the sample mean and the population mean, i.e., $\mu \neq 3.25$.
Level of significance: $\alpha = 5\%$
Test statistic: $z = \dfrac{\bar x - \mu}{s/\sqrt{n}} = \dfrac{3.4 - 3.25}{2.61/\sqrt{900}} = 1.724$
Critical value: The critical value of $z$ for a two-tailed test at 5% level of significance is 1.96.
Conclusion:
$|z| = 1.724 < 1.96$, i.e., calculated value < tabulated value.
Therefore we accept the null hypothesis $H_0$, i.e., the sample has been drawn from the population with mean $\mu = 3.25$.
To find the confidence limits:
The 95% confidence limits are $\bar x \pm 1.96\dfrac{\sigma}{\sqrt{n}} = 3.4 \pm 1.96\left(\dfrac{2.61}{\sqrt{900}}\right) = 3.4 \pm 0.1705 = (3.2295, 3.5705)$
4. A lathe is set to cut bars of steel into lengths of 6 centimeters. The lathe is considered to be
in perfect adjustment if the average length of the bars it cuts is 6 centimeters. A sample of
121 bars is selected randomly and measured. It is determined that the average length of the
bars in the sample is 6.08 centimeters with a standard deviation of 0.44 centimeters.
(i) Formulate the hypotheses to determine whether or not the lathe is in perfect adjustment.
(ii) Compute the test statistic.
(iii) What is your conclusion?
Solution:
Given $n = 121$, $\bar x = 6.08$, $\mu = 6$, $S = 0.44$
(i) Null hypothesis $H_0: \mu = 6$, i.e., the lathe is in perfect adjustment.
Alternative hypothesis $H_1: \mu \neq 6$, i.e., the lathe is not in perfect adjustment.
Level of significance: $\alpha = 0.05$
(ii) Test statistic: $z = \dfrac{\bar x - \mu}{S/\sqrt{n}} = \dfrac{6.08 - 6}{0.44/\sqrt{121}} = \dfrac{0.08}{0.04} = 2$
Table value: The table value at 5% level of significance (two-tailed) is 1.96.
(iii) Conclusion:
Here calculated value > tabulated value. Hence we reject $H_0$: the lathe is not in perfect adjustment.
5. The mean lifetime of a sample of 100 light tubes produced by a company is found to be 1580 hours with a standard deviation of 90 hours. Test the hypothesis that the mean lifetime of the tubes produced by the company is 1600 hours.
Solution:
Given $n = 100$, $\bar x = 1580$, $\mu = 1600$, $S = 90$
Null hypothesis $H_0: \mu = 1600$, i.e., there is no significant difference between the sample mean and the population mean.
Alternative hypothesis $H_1: \mu \neq 1600$, i.e., there is a significant difference between the sample mean and the population mean.
Level of significance: $\alpha = 5\% = 0.05$
Test statistic: $z = \dfrac{\bar x - \mu}{S/\sqrt{n}} = \dfrac{1580 - 1600}{90/\sqrt{100}} = \dfrac{-20}{9} = -2.22$, so $|z| = 2.22$
Table value: The table value at 5% level of significance is 1.96 (two-tailed test).
Conclusion:
Here calculated value > tabulated value. Hence we reject $H_0$.
Hence the mean lifetime of the tubes produced by the company may not be 1600 hours.
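The five single-mean problems above use the same statistic $z = (\bar x - \mu)/(\sigma/\sqrt{n})$, with the sample S.D. standing in for $\sigma$ when it is unknown. A minimal Python sketch (the names `z_mean` and `mean_ci` are hypothetical) that recomputes each $z$ and the two confidence intervals; the critical values 2.58 and 1.96 are passed in from the table rather than computed.

```python
import math

def z_mean(xbar, mu0, sd, n):
    """Large-sample z = (xbar - mu0) / (sd / sqrt(n))."""
    return (xbar - mu0) / (sd / math.sqrt(n))

def mean_ci(xbar, sd, n, z_crit):
    """Interval xbar -/+ z_crit * sd / sqrt(n)."""
    half = z_crit * sd / math.sqrt(n)
    return xbar - half, xbar + half

problems = {                      # (xbar, mu0, sd, n)
    "heights": (160, 165, 10, 100),
    "cables": (1850, 1800, 100, 50),
    "members": (3.4, 3.25, 2.61, 900),
    "lathe": (6.08, 6, 0.44, 121),
    "tubes": (1580, 1600, 90, 100),
}
for name, args in problems.items():
    print(name, round(z_mean(*args), 3))
# heights -5.0, cables 3.536, members 1.724, lathe 2.0, tubes -2.222

lo, hi = mean_ci(160, 10, 100, 2.58)        # 99% limits, problem 1
print(round(lo, 2), round(hi, 2))           # 157.42 162.58
lo, hi = mean_ci(3.4, 2.61, 900, 1.96)      # 95% limits, problem 3
print(round(lo, 4), round(hi, 4))           # 3.2295 3.5705
```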
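Problems based on test of significance for the difference of two means:
The test statistic is $z = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$, where $\sigma_1, \sigma_2$ are the S.D.s of the populations.
Test statistic:
i) $z = \dfrac{\bar x_1 - \bar x_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$, if $\sigma$ is known and $\sigma_1 = \sigma_2 = \sigma$.
ii) $z = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$, if $\sigma_1, \sigma_2$ are not known and $\sigma_1 \neq \sigma_2$; the sample variances $s_1^2, s_2^2$ are used in their place.
Confidence interval:
The confidence interval for the difference between two population means for large samples is
(1) when $\sigma_1, \sigma_2$ are known: $(\bar x_1 - \bar x_2) \pm z\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
(2) when only $s_1, s_2$ are known: $(\bar x_1 - \bar x_2) \pm z\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}$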
1. In a random sample of size 500, the mean is found to be 20. In another independent sample of size 400, the mean is 15. Could the samples have been drawn from the same population with S.D. 4?
Solution:
Given $\bar x_1 = 20$, $\bar x_2 = 15$, $n_1 = 500$, $n_2 = 400$, $\sigma = 4$
Null hypothesis $H_0: \mu_1 = \mu_2$, the samples have been drawn from the same population.
Alternative hypothesis $H_1: \mu_1 \neq \mu_2$, the samples could not have been drawn from the same population.
Level of significance: $\alpha = 5\% = 0.05$ (two-tailed test)
Test statistic: $z = \dfrac{\bar x_1 - \bar x_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} = \dfrac{20 - 15}{4\sqrt{\dfrac{1}{500} + \dfrac{1}{400}}} = 18.6$
Critical value: The critical value of $z$ at 5% level of significance (two-tailed) is 1.96.
Conclusion: calculated value > table value, so $H_0$ is rejected.
The samples could not have been drawn from the same population.
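2. Test the significance of the difference between the means of the samples, drawn from two normal populations with the same S.D., using the following data:
            Size   Mean   Standard deviation
Sample I    100    61     4
Sample II   200    63     6
Solution:
Given $\bar x_1 = 61$, $\bar x_2 = 63$, $s_1 = 4$, $s_2 = 6$, $n_1 = 100$, $n_2 = 200$
Null hypothesis $H_0: \mu_1 = \mu_2$, there is no significant difference between the means of the samples.
Alternative hypothesis $H_1: \mu_1 \neq \mu_2$, there is a significant difference between the means of the samples.
Level of significance: $\alpha = 5\% = 0.05$ (two-tailed test)
Since the populations have the same S.D., the common variance is estimated by $\hat\sigma^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2}$, and then $\hat\sigma^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right) = \dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}$.
Test statistic: $z = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{s_1^2}{n_2} + \dfrac{s_2^2}{n_1}}} = \dfrac{61 - 63}{\sqrt{\dfrac{4^2}{200} + \dfrac{6^2}{100}}} = -3.02$, so $|z| = 3.02$
Since $|z| > 1.96$, $H_0$ is rejected. Therefore the two normal populations from which the samples are drawn may not have the same mean, though they may have the same S.D.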
3. A sample of heights of 6400 Englishmen has a mean of 170 cm and an S.D. of 6.4 cm, while a simple sample of heights of 1600 Americans has a mean of 172 cm and an S.D. of 6.3 cm. Do the data indicate that Americans are, on the average, taller than Englishmen?
Solution:
Given $\bar x_1 = 170$, $\bar x_2 = 172$, $s_1 = 6.4$, $s_2 = 6.3$, $n_1 = 6400$, $n_2 = 1600$
Null hypothesis $H_0: \mu_1 = \mu_2$, there is no significant difference between the heights of Americans and Englishmen.
Alternative hypothesis $H_1: \mu_1 < \mu_2$, Americans are, on the average, taller than Englishmen.
Level of significance: $\alpha = 5\% = 0.05$ (one-tailed test)
Test statistic: $z = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{170 - 172}{\sqrt{\dfrac{6.4^2}{6400} + \dfrac{6.3^2}{1600}}} = -11.32$, so $|z| = 11.32$
Critical value: 1.645 (one-tailed, 5%). Since $|z| > 1.645$, $H_0$ is rejected. We conclude that the data indicate that Americans are, on the average, taller than Englishmen.
4. The average marks scored by 32 boys is 72 with an S.D. of 8, while that for 36 girls is 70 with an S.D. of 6. Test at 1% level of significance whether the boys perform better than the girls.
Solution:
Given $\bar x_1 = 72$, $\bar x_2 = 70$, $s_1 = 8$, $s_2 = 6$, $n_1 = 32$, $n_2 = 36$
$H_0: \mu_1 = \mu_2$ (both perform equally)
$H_1: \mu_1 > \mu_2$ (boys perform better than girls) (one-tailed test)
The test statistic: $z = \dfrac{\bar x_1 - \bar x_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}} = \dfrac{72 - 70}{\sqrt{\dfrac{8^2}{32} + \dfrac{6^2}{36}}} = 1.15$
Critical value: 2.33 (one-tailed, 1%).
Conclusion: calculated value < table value, so $H_0$ is accepted. Hence there is no evidence that boys perform better than girls; both perform equally.
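A minimal Python sketch of the two large-sample statistics for the difference of means used in these four problems: the general form $\sqrt{s_1^2/n_1 + s_2^2/n_2}$, and the same-S.D. form of problem 2, whose pooled variance $(n_1 s_1^2 + n_2 s_2^2)/(n_1 + n_2)$ makes the denominator equal to $\sqrt{s_1^2/n_2 + s_2^2/n_1}$. Function names are hypothetical.

```python
import math

def z_two_means(x1, x2, var1, n1, var2, n2):
    """z = (x1 - x2) / sqrt(var1/n1 + var2/n2)."""
    return (x1 - x2) / math.sqrt(var1 / n1 + var2 / n2)

def z_two_means_equal_sd(x1, x2, s1, n1, s2, n2):
    """Same-S.D. case: common variance (n1 s1^2 + n2 s2^2) / (n1 + n2),
    which makes the denominator sqrt(s1^2/n2 + s2^2/n1)."""
    common_var = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)
    return (x1 - x2) / math.sqrt(common_var * (1 / n1 + 1 / n2))

z1 = z_two_means(20, 15, 4 ** 2, 500, 4 ** 2, 400)            # problem 1, sigma = 4
z2 = z_two_means_equal_sd(61, 63, 4, 100, 6, 200)             # problem 2
z3 = z_two_means(170, 172, 6.4 ** 2, 6400, 6.3 ** 2, 1600)    # problem 3
z4 = z_two_means(72, 70, 8 ** 2, 32, 6 ** 2, 36)              # problem 4
print([round(z, 2) for z in (z1, z2, z3, z4)])  # [18.63, -3.02, -11.32, 1.15]
```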
Problems based on test of significance for a single proportion:
To test the significance of the difference between the sample proportion $p$ and the population proportion $P$, we use the test statistic
$z = \dfrac{p - P}{\sqrt{\dfrac{PQ}{n}}}$, where $Q = 1 - P$
Confidence interval:
The confidence interval for the population proportion for a large sample is $p \pm z\sqrt{\dfrac{PQ}{n}}$
1. In a big city, 325 men out of 600 men were found to be smokers. Does this information support the conclusion that the majority of men in this city are smokers?
Solution:
Given $n = 600$, number of smokers = 325
$p$ = sample proportion of smokers = 325/600 = 0.5417
$P$ = population proportion of smokers in the city = 1/2 = 0.5, $Q = 0.5$
Null hypothesis $H_0: P = 0.5$, the numbers of smokers and non-smokers in the city are equal.
Alternative hypothesis $H_1: P > 0.5$ (right-tailed test)
Test statistic: $z = \dfrac{p - P}{\sqrt{PQ/n}} = \dfrac{0.5417 - 0.5}{\sqrt{0.5 \times 0.5/600}} = 2.04$
Critical value: The tabulated value of $z$ at 5% level of significance for a right-tailed test is 1.645.
Conclusion:
Since the calculated value of $z$ > the tabulated value of $z$, we reject the null hypothesis. The majority of men in the city are smokers.
2. 40 people were attacked by a disease and only 36 survived. Will you reject the hypothesis that the survival rate, if attacked by this disease, is 85%, at 5% level of significance?
Solution:
Given $n = 40$
The sample proportion $p = \dfrac{36}{40} = 0.90$
Population proportion $P = 0.85$, so $Q = 1 - P = 1 - 0.85 = 0.15$
Null hypothesis $H_0: P = 0.85$, i.e., there is no significant difference in survival rate.
Alternative hypothesis $H_1: P \neq 0.85$, i.e., there is a significant difference in survival rate.
Test statistic: $z = \dfrac{p - P}{\sqrt{PQ/n}} = \dfrac{0.90 - 0.85}{\sqrt{0.85 \times 0.15/40}} = 0.886$
Table value: The tabulated value of $z$ at 5% level of significance is 1.96.
Conclusion: The table value > calculated value.
Hence we accept the null hypothesis and conclude that the survival rate may be taken as 85%.
3. A manufacturer of light bulbs claims that on average 2% of the bulbs manufactured by his firm are defective. A random sample of 400 bulbs contained 13 defective bulbs. On the basis of this sample, can you support the manufacturer's claim at 5% level of significance?
Solution:
Given $n = 400$, $X = 13$
$p$ = sample proportion of defectives $= \dfrac{X}{n} = \dfrac{13}{400} = 0.0325$
Null hypothesis $H_0: P = 2\% = 0.02$, i.e., 2% of the bulbs are defective.
Alternative hypothesis $H_1: P > 0.02$, i.e., more than 2% of the bulbs are defective (right-tailed test).
Level of significance: $\alpha = 5\% = 0.05$
Test statistic: $z = \dfrac{p - P}{\sqrt{PQ/n}} = \dfrac{0.0325 - 0.02}{\sqrt{0.02 \times 0.98/400}} = \dfrac{0.0125}{0.007} = 1.7857$
Critical value: The critical value of $z$ at 5% level of significance is 1.645 (one-tailed test).
Conclusion:
Here calculated value > table value, so we reject $H_0$. Hence the manufacturer's claim cannot be supported.
4. A salesman in a departmental store claims that at most 60 percent of the shoppers entering the store leave without making a purchase. A random sample of 50 shoppers showed that 35 of them left without making a purchase. Are these sample results consistent with the claim of the salesman? Use an LOS of 0.05.
Solution:
Let $p$ = sample proportion of shoppers not making a purchase $= \dfrac{35}{50} = 0.7$
$P$ = population proportion of shoppers not making a purchase = 60% = 0.6, and $Q = 1 - P = 0.4$
$H_0: P = 0.6$, i.e., the claim is accepted
$H_1: P \neq 0.6$ (two-tailed test)
The test statistic is $z = \dfrac{p - P}{\sqrt{PQ/n}} = \dfrac{0.7 - 0.6}{\sqrt{0.6 \times 0.4/50}} = 1.443$
From the table, $z_{0.05} = 1.96$. Since $|z| < z_{0.05}$, $H_0$ is accepted.
Conclusion:
The sample results are consistent with the claim of the salesman.
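All four single-proportion problems use $z = (p - P)/\sqrt{PQ/n}$. A minimal Python sketch (the helper name `z_proportion` is hypothetical) that recomputes the four statistics; comparing them with 1.645 or 1.96 is done as in the worked solutions.

```python
import math

def z_proportion(successes, n, P):
    """z = (p - P) / sqrt(P Q / n), with p = successes / n and Q = 1 - P."""
    p = successes / n
    return (p - P) / math.sqrt(P * (1 - P) / n)

cases = {                               # (successes, n, hypothesised P)
    "smokers": (325, 600, 0.5),
    "survival": (36, 40, 0.85),
    "defective bulbs": (13, 400, 0.02),
    "no purchase": (35, 50, 0.6),
}
for name, args in cases.items():
    print(name, round(z_proportion(*args), 3))
# smokers 2.041
# survival 0.886
# defective bulbs 1.786
# no purchase 1.443
```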
Problems based on test of significance for two proportions:
To test the significance of the difference between two sample proportions $p_1$ and $p_2$, we use the test statistic
$z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $Q = 1 - P$.
If $P$ is not known, then $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2}$.
Confidence interval:
The confidence interval for the difference between two population proportions for large samples is
$(p_1 - p_2) \pm z\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}$
1. Before an increase in excise duty on tea, 800 people out of a sample of 1000 were consumers of tea. After the increase in duty, 800 people were consumers of tea in a sample of 1200 persons. Find whether there is a significant decrease in the consumption of tea after the increase in duty. Also find the confidence limits.
Solution:
Given $n_1 = 1000$, $n_2 = 1200$
$p_1$ = proportion of tea drinkers before the increase in excise duty $= \dfrac{800}{1000} = 0.8$
$p_2$ = proportion of tea drinkers after the increase in excise duty $= \dfrac{800}{1200} = 0.6667$
Null hypothesis $H_0: P_1 = P_2$, there is no significant difference in the consumption of tea before and after the increase in excise duty.
Alternative hypothesis $H_1: P_1 > P_2$, there is a significant decrease in the consumption of tea after the increase in excise duty (one-tailed test).
Level of significance: $\alpha = 5\% = 0.05$
Test statistic: $z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{(0.8)(1000) + (0.6667)(1200)}{1000 + 1200} = 0.7273$, so $Q = 1 - P = 1 - 0.7273 = 0.2727$
$z = \dfrac{0.8 - 0.6667}{\sqrt{(0.7273)(0.2727)\left(\dfrac{1}{1000} + \dfrac{1}{1200}\right)}} = \dfrac{0.1333}{0.01907} = 6.99$
Critical value: The critical value of $z$ at 5% level of significance (one-tailed) is 1.645.
Conclusion:
Here calculated value > table value, so we reject $H_0$.
Hence there is a significant decrease in the consumption of tea after the increase in excise duty.
Confidence interval:
The confidence interval for the difference between two population proportions for large samples is
$(p_1 - p_2) \pm z\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)} = (0.8 - 0.667) \pm 1.645\sqrt{0.7273 \times 0.2727\left(\dfrac{1}{1000} + \dfrac{1}{1200}\right)} = (0.1016, 0.1644)$
2. Random samples of 400 men and 600 women were asked whether they would like to have a flyover near their residence. 200 men and 325 women were in favour of the proposal. Test the hypothesis that the proportions of men and women in favour of the proposal are the same, against the alternative that they are not, at 5% level.
Solution:
Given $n_1 = 400$, $n_2 = 600$
$p_1$ = proportion of men in favour $= \dfrac{200}{400} = 0.5$
$p_2$ = proportion of women in favour $= \dfrac{325}{600} = 0.5417$
Null hypothesis $H_0: P_1 = P_2$, there is no significant difference between the opinions of men and women as far as the proposal of the flyover is concerned.
Alternative hypothesis $H_1: P_1 \neq P_2$, there is a significant difference between the opinions of men and women as far as the proposal of the flyover is concerned.
Level of significance: $\alpha = 5\% = 0.05$ (two-tailed)
Test statistic: $z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
where $P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{(400)(0.5) + (600)(0.5417)}{400 + 600} = 0.525$, so $Q = 1 - P = 1 - 0.525 = 0.475$
$z = \dfrac{0.5 - 0.5417}{\sqrt{(0.525)(0.475)\left(\dfrac{1}{400} + \dfrac{1}{600}\right)}} = \dfrac{-0.0417}{0.0322} = -1.29$, so $|z| = 1.29$
Critical value: The critical value of $z$ at 5% level of significance is 1.96.
Conclusion:
Here calculated value < table value, so we accept $H_0$ at 5% level of significance.
Hence there is no significant difference between the opinions of men and women as far as the proposal of the flyover is concerned.
3. A machine puts out 16 imperfect articles in a sample of 500. After the machine is overhauled, it puts out 3 imperfect articles in a batch of 100. Has the machine improved?
Solution:
Hypothesis:
H0 : P1 = P2 (the machine has not improved)
H1 : P1 > P2 (the proportion of imperfect articles has fallen, i.e. the machine has improved; one-tailed test)
Test statistic: $Z = \dfrac{p_1 - p_2}{\sqrt{PQ\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$
Analysis:
The sample proportions are $p_1 = \dfrac{16}{500} = 0.032$ and $p_2 = \dfrac{3}{100} = 0.03$, so
$P = \dfrac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \dfrac{16 + 3}{600} = 0.0317$ and $Q = 1 - P = 0.9683$
$Z = \dfrac{0.032 - 0.03}{\sqrt{(0.0317)(0.9683)\left(\dfrac{1}{500} + \dfrac{1}{100}\right)}} = \dfrac{0.002}{0.0192} = 0.104$
Table value: $Z_\alpha = 1.645$ (5% level, one-tailed)
Conclusion:
Calculated value < table value.
Hence we accept the null hypothesis and conclude that the machine has not improved after overhauling.
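The pooled two-proportion z statistic used in the three examples above can be checked with a short Python sketch. It assumes only the standard library; `two_proportion_z` and `confidence_limits` are hypothetical helper names, and the tea counts (800 of 1000 and 800 of 1200) are recovered from the proportions 0.8 and 0.6667 given in that example.

```python
import math


def two_proportion_z(x1, n1, x2, n2):
    """Large-sample z statistic for H0: P1 = P2 using the pooled proportion P."""
    p1, p2 = x1 / n1, x2 / n2
    P = (x1 + x2) / (n1 + n2)          # pooled proportion (n1*p1 + n2*p2) / (n1 + n2)
    Q = 1 - P
    se = math.sqrt(P * Q * (1 / n1 + 1 / n2))  # standard error of p1 - p2
    return (p1 - p2) / se, se


def confidence_limits(x1, n1, x2, n2, z_crit):
    """(p1 - p2) +/- z_crit * pooled standard error, as in the tea example."""
    _, se = two_proportion_z(x1, n1, x2, n2)
    diff = x1 / n1 - x2 / n2
    return diff - z_crit * se, diff + z_crit * se


if __name__ == "__main__":
    print(two_proportion_z(800, 1000, 800, 1200))   # tea
    print(two_proportion_z(200, 400, 325, 600))     # flyover
    print(two_proportion_z(16, 500, 3, 100))        # machine
    print(confidence_limits(800, 1000, 800, 1200, 1.645))
```

The sketch only computes the statistic; the decision still comes from comparing it with the one-tailed (1.645) or two-tailed (1.96) critical value, as in the notes.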
UNIT-IV DESIGN OF EXPERIMENT
The sequence of steps taken to ensure a scientific analysis leading to valid inferences about
the hypothesis is called design of experiment. The main aim of the design of experiments is to
control the extraneous variables and hence to minimize the experimental error so that the
results of the experiments could be attributed only to the experimental variables.
The basic principles of design of experiments:
(i) Randomization
(ii) Replication
(iii) Local Control
Basic design of Experiments:
Depending on the number of extraneous variables whose effects are to be controlled, various
design procedures are developed in the study of experimental design. We shall consider here
three important designs.
(1) Completely randomized Design (C.R.D)
(2) Randomized Block Design (R.B.D)
(3) Latin Square Design (LSD)
ANOVA:
Analysis of Variance is a technique that will enable us to test for the significance of the
difference among more than two sample means.
Assumptions of analysis of variance:
(i) The sample observations are independent
(ii) The environmental effects are additive in nature
(iii) The samples have been randomly selected from the population.
(iv) The parent population from which the observations are taken is normal.
One Way Classification (or) Completely randomized Design (C.R.D)
The C.R.D is the simplest of all the designs, based on principles of randomization and
replication. In this design, treatments are allocated at random to the experimental units over the
entire experimental materials.
Advantages of completely randomized design:
The advantages of the completely randomized experimental design are as follows:
(i) Easy to lay out.
(ii) Allows flexibility.
(iii) Simple statistical analysis.
(iv) The loss of information due to missing data is smaller than with any other design.
Working Procedure (One-Way Classification)
Null Hypothesis H0: There is no significant difference between the treatments.
Alternate Hypothesis H1: There is a significant difference between the treatments.
Analysis:
Step 1: Find N = total number of observations.
Step 2: Find T = grand total of all observations.
Step 3: Find the correction factor $C.F. = \dfrac{T^2}{N}$
Step 4: Find the total sum of squares $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \dots - C.F.$
Step 5: Find the sum of squares between columns $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_2} + \dfrac{(\sum X_3)^2}{N_3} + \dots - C.F.$
where $N_i$ = number of observations in column i (i = 1, 2, 3, ...). The error sum of squares is SSE = TSS - SSC.
Step 6: Prepare the ANOVA table to calculate the F-ratio.
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns | SSC | c - 1 | MSC = SSC/(c - 1) | F_C = MSC/MSE if MSC > MSE (or) F_C = MSE/MSC if MSE > MSC |
| Error | SSE | N - c | MSE = SSE/(N - c) | |
| Total | TSS | N - 1 | | |
Step 7: Find the table value of F at level α (use the F table), with the degrees of freedom taken in the same order as the F-ratio (numerator, denominator).
Step 8: Conclusion:
If calculated value < table value, we accept the null hypothesis H0 (or)
If calculated value > table value, we reject the null hypothesis H0.
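Steps 1 to 6 of the one-way procedure translate directly into code. The following is a minimal Python sketch (standard library only; `one_way_anova` is a hypothetical name) that follows the correction-factor method and, as in these notes, places the larger mean square in the numerator. Shifting the origin, as done in the worked examples, does not change any sum of squares, so raw data can be passed in.

```python
def one_way_anova(columns):
    """One-way (CRD) ANOVA by the correction-factor method.

    columns: one list of observations per treatment (lengths may differ).
    Returns the sums of squares, mean squares and the F-ratio with its
    degrees of freedom, larger mean square in the numerator.
    """
    obs = [x for col in columns for x in col]
    N, T = len(obs), sum(obs)                                      # Steps 1-2
    cf = T * T / N                                                 # Step 3
    tss = sum(x * x for x in obs) - cf                             # Step 4
    ssc = sum(sum(col) ** 2 / len(col) for col in columns) - cf    # Step 5
    sse = tss - ssc
    c = len(columns)
    msc, mse = ssc / (c - 1), sse / (N - c)
    if msc >= mse:
        f, df = msc / mse, (c - 1, N - c)
    else:
        f, df = mse / msc, (N - c, c - 1)
    return {"TSS": tss, "SSC": ssc, "SSE": sse, "MSC": msc, "MSE": mse,
            "F": f, "df": df}


technicians = [[6, 14, 10, 8, 11], [14, 9, 12, 10, 14],
               [10, 12, 7, 15, 11], [9, 12, 8, 10, 11]]
print(one_way_anova(technicians))
```

For the technicians' data this should give SSC = 12.95, SSE = 101.6 and F = 1.471 with (16, 3) degrees of freedom; the table value of F still has to be looked up separately.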
1. The following are the numbers of mistakes made in 5 successive days by 4 technicians working for a photographic laboratory. Test whether the differences among the four sample means can be attributed to chance. (Test at the level of significance α = 0.01.)
Technicians
| I | II | III | IV |
| 6 | 14 | 10 | 9 |
| 14 | 9 | 12 | 12 |
| 10 | 12 | 7 | 8 |
| 8 | 10 | 15 | 10 |
| 11 | 14 | 11 | 11 |
Solution:
H0: There is no significant difference between the technicians.
H1: There is a significant difference between the technicians.
We shift the origin to 10 (subtract 10 from each value).
| | X1 | X2 | X3 | X4 | Total | X1² | X2² | X3² | X4² |
| | -4 | 4 | 0 | -1 | -1 | 16 | 16 | 0 | 1 |
| | 4 | -1 | 2 | 2 | 7 | 16 | 1 | 4 | 4 |
| | 0 | 2 | -3 | -2 | -3 | 0 | 4 | 9 | 4 |
| | -2 | 0 | 5 | 0 | 3 | 4 | 0 | 25 | 0 |
| | 1 | 4 | 1 | 1 | 7 | 1 | 16 | 1 | 1 |
| Total | -1 | 9 | 5 | 0 | 13 | 37 | 37 | 39 | 10 |
Step 1: N = total number of observations = 20
Step 2: T = grand total = 13
Step 3: Correction factor $C.F. = \dfrac{T^2}{N} = \dfrac{13^2}{20} = 8.45$
Step 4: $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - C.F. = 37 + 37 + 39 + 10 - 8.45 = 114.55$
Step 5: $SSC = \dfrac{(-1)^2}{5} + \dfrac{9^2}{5} + \dfrac{5^2}{5} + \dfrac{0^2}{5} - 8.45 = 0.2 + 16.2 + 5 + 0 - 8.45 = 12.95$, where each column has 5 observations.
Step 6: SSE = TSS - SSC = 114.55 - 12.95 = 101.6
Step 7: ANOVA table:
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns | SSC = 12.95 | c - 1 = 4 - 1 = 3 | MSC = 12.95/3 = 4.317 | F_C = MSE/MSC = 6.35/4.317 = 1.471 |
| Error | SSE = 101.6 | N - c = 20 - 4 = 16 | MSE = 101.6/16 = 6.35 | |
| Total | TSS = 114.55 | N - 1 = 19 | | |
Calculated F_C = 1.471
Table value: F0.01(16, 3) ≈ 26.8
Conclusion: Cal F_C < Tab F_C, so we accept H0. There is no significant difference between the technicians; the differences among the sample means can be attributed to chance.
2. A completely randomized design experiment with 10 plots and 3 treatments gave the following results.
Plot No : 1 2 3 4 5 6 7 8 9 10
Treatment : A B C A C C A B A B
Yield : 5 4 3 7 5 1 3 4 1 7
Analyse the results for treatment effects.
Solution:
Arranging the yields by treatment:
| A | B | C |
| 5 | 4 | 3 |
| 7 | 4 | 5 |
| 3 | 7 | 1 |
| 1 | | |
Null Hypothesis H0: There is no significant difference between the treatments.
Alternate Hypothesis H1: There is a significant difference between the treatments.
| | X1 (A) | X2 (B) | X3 (C) | Total | X1² | X2² | X3² |
| | 5 | 4 | 3 | 12 | 25 | 16 | 9 |
| | 7 | 4 | 5 | 16 | 49 | 16 | 25 |
| | 3 | 7 | 1 | 11 | 9 | 49 | 1 |
| | 1 | | | 1 | 1 | | |
| Total | 16 | 15 | 9 | 40 | 84 | 81 | 35 |
Step 1: N = 10
Step 2: T = grand total = 40
Step 3: Correction factor $C.F. = \dfrac{T^2}{N} = \dfrac{40^2}{10} = 160$
Step 4: $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 - C.F. = 84 + 81 + 35 - 160 = 40$
Step 5: $SSC = \dfrac{(16)^2}{4} + \dfrac{(15)^2}{3} + \dfrac{(9)^2}{3} - 160 = 64 + 75 + 27 - 160 = 6$, where N1 = 4, N2 = 3 and N3 = 3 are the numbers of observations in the three columns.
Step 6: SSE = TSS - SSC = 40 - 6 = 34
Step 7: ANOVA table:
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns | SSC = 6 | c - 1 = 3 - 1 = 2 | MSC = 6/2 = 3 | F_C = MSE/MSC = 4.86/3 = 1.62 |
| Error | SSE = 34 | N - c = 10 - 3 = 7 | MSE = 34/7 = 4.86 | |
| Total | TSS = 40 | N - 1 = 9 | | |
Calculated F_C = 1.62
Table value: F0.05(7, 2) = 19.35
Conclusion: Cal F_C < Tab F_C, so we accept the null hypothesis: there is no significant difference between the treatments.
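3. As head of the department of a consumers' research organization, you have the responsibility of testing and comparing the lifetimes of 4 brands of electric bulbs. Suppose you test the lifetimes of 3 electric bulbs of each of the 4 brands. The data are given below; each entry represents the lifetime of an electric bulb, measured in hundreds of hours.
| A | B | C | D |
| 20 | 25 | 24 | 23 |
| 19 | 23 | 20 | 20 |
| 21 | 21 | 22 | 20 |
Solution:
H0: The population mean lifetimes of the four brands are equal.
H1: The population mean lifetimes are not all equal.
| | X1 | X2 | X3 | X4 | X1² | X2² | X3² | X4² |
| | 20 | 25 | 24 | 23 | 400 | 625 | 576 | 529 |
| | 19 | 23 | 20 | 20 | 361 | 529 | 400 | 400 |
| | 21 | 21 | 22 | 20 | 441 | 441 | 484 | 400 |
| Total | 60 | 69 | 66 | 63 | 1202 | 1595 | 1460 | 1329 |
Step 1: N = 12
Step 2: T = 60 + 69 + 66 + 63 = 258
Step 3: Correction factor $C.F. = \dfrac{258^2}{12} = \dfrac{66564}{12} = 5547$
Step 4: $TSS = 1202 + 1595 + 1460 + 1329 - 5547 = 5586 - 5547 = 39$
Step 5: $SSC = \dfrac{60^2 + 69^2 + 66^2 + 63^2}{3} - 5547 = \dfrac{16686}{3} - 5547 = 5562 - 5547 = 15$
Step 6: SSE = TSS - SSC = 39 - 15 = 24
Step 7: ANOVA table:
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns | SSC = 15 | c - 1 = 4 - 1 = 3 | MSC = 15/3 = 5 | F_C = MSC/MSE = 5/3 = 1.67 |
| Error | SSE = 24 | N - c = 12 - 4 = 8 | MSE = 24/8 = 3 | |
| Total | TSS = 39 | N - 1 = 11 | | |
Cal F_C = 1.67 and Tab F0.05(3, 8) = 4.07
Conclusion: Cal F_C < Tab F_C, hence we accept H0: the mean lifetimes of the four brands do not differ significantly.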
Two Way Classification (or) Randomized Block Design (R.B.D):
When the experiment is influenced by only two factors (for example, treatments in the columns and blocks in the rows), the analysis is a two-way classification.
Working Procedure (Two-Way Classification)
Null Hypothesis H0: There is no significant difference between the treatments (columns), and no significant difference between the blocks (rows).
Alternate Hypothesis H1: There is a significant difference between the treatments (columns), or between the blocks (rows).
Analysis:
Step 1: Find N = total number of observations.
Step 2: Find T = grand total of all observations.
Step 3: Find the correction factor $C.F. = \dfrac{T^2}{N}$
Step 4: Find the total sum of squares $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \dots - C.F.$
Step 5: Find the column sum of squares $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_2} + \dfrac{(\sum X_3)^2}{N_3} + \dots - C.F.$
where $N_i$ = number of observations in column i (i = 1, 2, 3, ...).
Step 6: Find the row sum of squares $SSR = \dfrac{(\sum Y_1)^2}{N_1} + \dfrac{(\sum Y_2)^2}{N_2} + \dfrac{(\sum Y_3)^2}{N_3} + \dots - C.F.$
where $N_j$ = number of observations in row j (j = 1, 2, 3, ...).
Step 7: Find SSE = TSS - SSC - SSR and prepare the ANOVA table to calculate the F-ratios.
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns | SSC | c - 1 | MSC = SSC/(c - 1) | F_C = MSC/MSE if MSC > MSE (or) F_C = MSE/MSC if MSE > MSC |
| Between rows | SSR | r - 1 | MSR = SSR/(r - 1) | F_R = MSR/MSE if MSR > MSE (or) F_R = MSE/MSR if MSE > MSR |
| Error | SSE | (c - 1)(r - 1) = N - c - r + 1 | MSE = SSE/(N - c - r + 1) | |
| Total | TSS | rc - 1 | | |
Step 8: Find the table values for both F_C and F_R (use the F table).
Step 9: Conclusion:
If calculated value < table value, we accept the null hypothesis H0 (or)
If calculated value > table value, we reject the null hypothesis H0.
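The two-way procedure above can be sketched in Python for the case of one observation per cell. This is an illustrative sketch only (standard library; `two_way_anova` is a hypothetical name); it takes rows as blocks and columns as treatments and, like the notes, puts the larger mean square in the numerator of each F-ratio.

```python
def two_way_anova(table):
    """Two-way (RBD) ANOVA with one observation per cell.

    table: list of rows (blocks), each holding one value per column (treatment).
    Returns SSC, SSR, SSE and, for columns and rows, (F-ratio, (df1, df2)).
    """
    r, c = len(table), len(table[0])
    cf = sum(map(sum, table)) ** 2 / (r * c)                  # T^2 / N
    tss = sum(x * x for row in table for x in row) - cf
    col_totals = [sum(row[j] for row in table) for j in range(c)]
    ssc = sum(t * t for t in col_totals) / r - cf             # r values per column
    ssr = sum(sum(row) ** 2 for row in table) / c - cf        # c values per row
    sse = tss - ssc - ssr
    df_c, df_r, df_e = c - 1, r - 1, (c - 1) * (r - 1)
    mse = sse / df_e

    def ratio(ms, df_ms):
        # Larger mean square in the numerator; degrees of freedom follow it.
        if ms >= mse:
            return ms / mse, (df_ms, df_e)
        return mse / ms, (df_e, df_ms)

    return {"SSC": ssc, "SSR": ssr, "SSE": sse,
            "F_C": ratio(ssc / df_c, df_c), "F_R": ratio(ssr / df_r, df_r)}


sales = [[36, 36, 21, 35],   # summer: salesmen A, B, C, D
         [28, 29, 31, 32],   # winter
         [26, 28, 29, 29]]   # monsoon
print(two_way_anova(sales))
```

For the salesmen's data this should reproduce SSC = 42, SSR = 32 and SSE = 136, with F_C of about 1.62 on (6, 3) and F_R of about 1.42 on (6, 2) degrees of freedom.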
1. A company appoints four salesmen A, B, C and D and observes their sales in 3 seasons: summer, winter and monsoon. The figures (in lakhs of Rs.) are given in the following table:
| Season | A | B | C | D |
| Summer | 36 | 36 | 21 | 35 |
| Winter | 28 | 29 | 31 | 32 |
| Monsoon | 26 | 28 | 29 | 29 |
i) Do the salesmen significantly differ in performance?
ii) Is there a significant difference between the seasons?
Solution:
Null Hypothesis H0: There is no significant difference between the sales in the 3 seasons, and also none between the sales of the 4 salesmen.
Alternate Hypothesis H1: There is a significant difference between the sales in the 3 seasons, or between the sales of the 4 salesmen.
Test statistic:
To simplify calculations we deduct 30 from each value.
| Seasons | X1 (A) | X2 (B) | X3 (C) | X4 (D) | Season total | X1² | X2² | X3² | X4² |
| Y1 Summer | 6 | 6 | -9 | 5 | 8 | 36 | 36 | 81 | 25 |
| Y2 Winter | -2 | -1 | 1 | 2 | 0 | 4 | 1 | 1 | 4 |
| Y3 Monsoon | -4 | -2 | -1 | -1 | -8 | 16 | 4 | 1 | 1 |
| Total | 0 | 3 | -9 | 6 | 0 | 56 | 41 | 83 | 30 |
Step 1: N = 12
Step 2: T = grand total = 0
Step 3: Correction factor $C.F. = \dfrac{0^2}{12} = 0$
Step 4: $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - C.F. = 56 + 41 + 83 + 30 - 0 = 210$
Step 5: $SSC = \dfrac{0^2}{3} + \dfrac{3^2}{3} + \dfrac{(-9)^2}{3} + \dfrac{6^2}{3} - 0 = 0 + 3 + 27 + 12 = 42$
Step 6: $SSR = \dfrac{8^2}{4} + \dfrac{0^2}{4} + \dfrac{(-8)^2}{4} - 0 = 16 + 0 + 16 = 32$, where each row has 4 observations.
Step 7: SSE = TSS - SSC - SSR = 210 - 42 - 32 = 136
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio | Table F (5%) |
| Between columns (salesmen) | SSC = 42 | c - 1 = 4 - 1 = 3 | MSC = 42/3 = 14 | F_C = MSE/MSC = 22.67/14 = 1.619 | F0.05(6, 3) = 8.94 |
| Between rows (seasons) | SSR = 32 | r - 1 = 3 - 1 = 2 | MSR = 32/2 = 16 | F_R = MSE/MSR = 22.67/16 = 1.417 | F0.05(6, 2) = 19.33 |
| Error | SSE = 136 | N - c - r + 1 = 6 | MSE = 136/6 = 22.67 | | |
| Total | TSS = 210 | 11 | | | |
Conclusion:
1) Cal F_C = 1.62 < Table F0.05(6, 3) = 8.94. Hence we accept H0 and conclude that there is no significant difference between the sales of the 4 salesmen.
2) Cal F_R = 1.42 < Table F0.05(6, 2) = 19.33. Hence we accept H0 and conclude that there is no significant difference between the sales in the three seasons.
2. The following data represent the number of units of production per day turned out by different workers using 4 different types of machines.
| Workers | Machine A | Machine B | Machine C | Machine D |
| 1 | 44 | 38 | 47 | 36 |
| 2 | 46 | 40 | 52 | 43 |
| 3 | 34 | 36 | 44 | 32 |
| 4 | 43 | 38 | 46 | 33 |
| 5 | 38 | 42 | 49 | 39 |
(1) Test whether the five men differ with respect to mean productivity and
(2) Test whether the mean productivity is the same for the four different machine types.
Solution:
Null Hypothesis H0: There is no significant difference between the machine types, and none between the workers.
Alternate Hypothesis H1: There is a significant difference between the machine types, or between the workers.
Test statistic:
To simplify calculations we deduct 46 from each value.
| Workers | X1 (A) | X2 (B) | X3 (C) | X4 (D) | Total | X1² | X2² | X3² | X4² |
| Y1 | -2 | -8 | 1 | -10 | -19 | 4 | 64 | 1 | 100 |
| Y2 | 0 | -6 | 6 | -3 | -3 | 0 | 36 | 36 | 9 |
| Y3 | -12 | -10 | -2 | -14 | -38 | 144 | 100 | 4 | 196 |
| Y4 | -3 | -8 | 0 | -13 | -24 | 9 | 64 | 0 | 169 |
| Y5 | -8 | -4 | 3 | -7 | -16 | 64 | 16 | 9 | 49 |
| Total | -25 | -36 | 8 | -47 | -100 | 221 | 280 | 50 | 523 |
Step 1: N = 20
Step 2: T = grand total = -100
Step 3: Correction factor $C.F. = \dfrac{(-100)^2}{20} = \dfrac{10000}{20} = 500$
Step 4: $TSS = 221 + 280 + 50 + 523 - 500 = 574$
Step 5: $SSC = \dfrac{(-25)^2 + (-36)^2 + 8^2 + (-47)^2}{5} - 500 = \dfrac{625 + 1296 + 64 + 2209}{5} - 500 = 838.8 - 500 = 338.8$
Step 6: $SSR = \dfrac{(-19)^2 + (-3)^2 + (-38)^2 + (-24)^2 + (-16)^2}{4} - 500 = \dfrac{361 + 9 + 1444 + 576 + 256}{4} - 500 = 661.5 - 500 = 161.5$, where each row has 4 observations.
Step 7: SSE = TSS - SSC - SSR = 574 - 338.8 - 161.5 = 73.7
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio | Table F (5%) |
| Between columns (machine types) | SSC = 338.8 | c - 1 = 4 - 1 = 3 | MSC = 338.8/3 = 112.93 | F_C = MSC/MSE = 112.93/6.14 = 18.39 | F0.05(3, 12) = 3.49 |
| Between rows (workers) | SSR = 161.5 | r - 1 = 5 - 1 = 4 | MSR = 161.5/4 = 40.38 | F_R = MSR/MSE = 40.38/6.14 = 6.57 | F0.05(4, 12) = 3.26 |
| Error | SSE = 73.7 | N - c - r + 1 = 20 - 4 - 5 + 1 = 12 | MSE = 73.7/12 = 6.14 | | |
| Total | TSS = 574 | 19 | | | |
Conclusion:
1) Cal F_C = 18.39 > Table F0.05(3, 12) = 3.49. Hence we reject H0 and conclude that there is a significant difference between the machine types.
2) Cal F_R = 6.57 > Table F0.05(4, 12) = 3.26. Hence we reject H0 and conclude that the five workers differ significantly in mean productivity.
3. A laboratory technician measures the breaking strength of each of 5 kinds of linen threads by using 4 different measuring instruments, and obtains the following results, in ounces.
| | I1 | I2 | I3 | I4 |
| Thread 1 | 20.9 | 20.4 | 19.9 | 21.9 |
| Thread 2 | 25.0 | 26.2 | 27.0 | 24.8 |
| Thread 3 | 25.5 | 23.1 | 21.5 | 24.4 |
| Thread 4 | 24.8 | 21.2 | 23.5 | 25.7 |
| Thread 5 | 19.6 | 21.2 | 22.1 | 22.1 |
Perform a two-way ANOVA using the 0.05 level of significance for both tests.
Solution:
Null Hypotheses: H01: there is no significant difference in mean breaking strength between the five threads, and H02: there is no significant difference between the mean readings of the four instruments.
Alternate Hypotheses: H11: the mean breaking strengths of the threads are not all equal, and H12: the mean readings of the instruments are not all equal.
Here N = 20, the thread (row) totals are 83.1, 103.0, 94.5, 95.2 and 85.0, the instrument (column) totals are 115.8, 112.1, 114.0 and 118.9, T = 460.8 and C.F. = 460.8²/20 = 10616.832.
ANOVA TABLE
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between rows (threads) | 66.393 | r - 1 = 4 | 16.598 | F_R = 16.598/2.078 = 7.987 |
| Between columns (instruments) | 5.020 | c - 1 = 3 | 1.673 | F_C = MSE/MSC = 2.078/1.673 = 1.242 |
| Error | 24.935 | (c - 1)(r - 1) = 12 | 2.078 | |
| Total | 96.348 | N - 1 = 19 | | |
Table values of F: F0.05(4, 12) = 3.26 and F0.05(12, 3) = 8.74
Conclusion:
1) F_R = 7.987 > 3.26. Hence we reject H01 and conclude that there is a significant difference between the threads.
2) F_C = 1.242 < 8.74. Hence we accept H02 and conclude that there is no significant difference between the instruments.
4. Four varieties A, B, C, D of a fertilizer are tested in a randomized block design with 4 replications. The plot yields in pounds are as follows:
| Column / Row | 1 | 2 | 3 | 4 |
| 1 | A(12) | D(20) | C(16) | B(10) |
| 2 | D(18) | A(14) | B(11) | C(14) |
| 3 | B(12) | C(15) | D(19) | A(13) |
| 4 | C(16) | B(11) | A(15) | D(20) |
Analyse the experimental yield.
Solution:
H0: There is no significant difference between the fertilizer varieties, and none between the replications (blocks).
H1: There is a significant difference between the fertilizer varieties, or between the replications (blocks).
Taking columns 1 to 4 as the blocks, the yields arranged by variety are:
| Variety | Block 1 (X1) | Block 2 (X2) | Block 3 (X3) | Block 4 (X4) | Total | X1² | X2² | X3² | X4² |
| A | 12 | 14 | 15 | 13 | 54 | 144 | 196 | 225 | 169 |
| B | 12 | 11 | 11 | 10 | 44 | 144 | 121 | 121 | 100 |
| C | 16 | 15 | 16 | 14 | 61 | 256 | 225 | 256 | 196 |
| D | 18 | 20 | 19 | 20 | 77 | 324 | 400 | 361 | 400 |
| Total | 58 | 60 | 61 | 57 | 236 | 868 | 942 | 963 | 865 |
N = 16, T = grand total = 236; h = 4 varieties and k = 4 blocks.
Correction factor $C.F. = \dfrac{236^2}{16} = 3481$
$TSS = \sum_i \sum_j X_{ij}^2 - C.F. = 868 + 942 + 963 + 865 - 3481 = 157$
$SSC = \sum_j \dfrac{T_{*j}^2}{h} - C.F. = 841 + 900 + 930.25 + 812.25 - 3481 = 2.5$ (blocks)
$SSR = \sum_i \dfrac{T_{i*}^2}{k} - C.F. = 729 + 484 + 930.25 + 1482.25 - 3481 = 144.5$ (varieties)
SSE = TSS - SSC - SSR = 157 - 2.5 - 144.5 = 10
ANOVA Table
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio | Table F (5%) |
| Between varieties | SSR = 144.5 | h - 1 = 3 | MSR = 48.17 | F_R = MSR/MSE = 48.17/1.11 = 43.35 | F0.05(3, 9) = 3.86 |
| Between blocks | SSC = 2.5 | k - 1 = 3 | MSC = 0.83 | F_C = MSE/MSC = 1.11/0.83 = 1.33 | F0.05(9, 3) = 8.81 |
| Residual | SSE = 10 | (h - 1)(k - 1) = 9 | MSE = 1.11 | | |
| Total | TSS = 157 | 15 | | | |
Conclusion: Cal F_R > Tab F_R, so the null hypothesis for varieties is rejected: the four varieties are not similar. Cal F_C < Tab F_C, so the blocks (replications) do not differ significantly.
Latin Square Design:
The Latin square design controls variation in two directions of the experimental material, as rows and columns, resulting in a reduction of the experimental error. The analysis of the design results in a three-way classification of analysis of variance: data from Latin square experiments are classified according to three factors, namely rows, columns and treatments.
| CRD | RBD | LSD |
| Controls the influence of one factor | Controls the influence of two factors | Controls the influence of more than two factors |
| No restriction on treatments and replications | No restriction on treatments and replications | The number of replications of each treatment is equal to the number of treatments |
| - | Uses a rectangular or square field | Uses only a square field |
The advantages of the Latin square design over other designs are:
(i) With a two-way stratification or grouping, the Latin square controls more of the variation than the CRD or the randomized complete block design. The two-way elimination of variation often results in a smaller error mean square.
(ii) The analysis is simple.
(iii) Even with missing data the analysis remains relatively simple.
Working Procedure (Three-Way Classification)
As seen above, data from a Latin square experiment result in a three-way classification, say according to
(i) variety of seeds
(ii) types of spacing (rows)
(iii) the letters, for the different manure treatments.
H0: There is no difference between columns, between rows and between treatments.
H1: Not all are equal.
Step 1: Find N = total number of observations (N = n² for an n × n square).
Step 2: Find T = grand total of all observations.
Step 3: Find the correction factor $C.F. = \dfrac{T^2}{N}$
Step 4: Find the total sum of squares $TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \dots - C.F.$
Step 5: Find the column sum of squares $SSC = \dfrac{(\sum X_1)^2}{N_1} + \dfrac{(\sum X_2)^2}{N_2} + \dfrac{(\sum X_3)^2}{N_3} + \dots - C.F.$
where $N_i$ = number of observations in column i (i = 1, 2, 3, ...).
Step 6: Find the row sum of squares $SSR = \dfrac{(\sum Y_1)^2}{N_1} + \dfrac{(\sum Y_2)^2}{N_2} + \dfrac{(\sum Y_3)^2}{N_3} + \dots - C.F.$
where $N_j$ = number of observations in row j (j = 1, 2, 3, ...).
Step 7: Find SSK for treatments: SSK = sum of (total for each letter)²/n - C.F., and then SSE = TSS - SSC - SSR - SSK.
ANOVA Table for three-way classification (n × n Latin square):
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns | SSC | n - 1 | MSC = SSC/(n - 1) | F_C = MSC/MSE if MSC > MSE (or) MSE/MSC if MSC < MSE |
| Between rows | SSR | n - 1 | MSR = SSR/(n - 1) | F_R = MSR/MSE if MSR > MSE (or) MSE/MSR if MSR < MSE |
| Between treatments | SSK | n - 1 | MSK = SSK/(n - 1) | F_K = MSK/MSE if MSK > MSE (or) MSE/MSK if MSK < MSE |
| Error (or) Residual | SSE | (n - 1)(n - 2) | MSE = SSE/((n - 1)(n - 2)) | |
| Total | TSS | n² - 1 | | |
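The three-way (Latin square) procedure above can be sketched as follows. This is a minimal Python illustration using only the standard library; `latin_square_anova` is a hypothetical name, the layout is passed as a grid of letters plus a grid of values, and for simplicity every F-ratio is returned as MS/MSE (the notes invert a ratio that falls below 1).

```python
def latin_square_anova(letters, values):
    """ANOVA for an n x n Latin square (rows, columns and treatments).

    letters: n strings (or lists) giving the treatment letter in each cell.
    values:  n lists of the observations in the same layout.
    F-ratios are returned as MS / MSE.
    """
    n = len(values)
    cf = sum(map(sum, values)) ** 2 / (n * n)                 # correction factor T^2/N
    tss = sum(x * x for row in values for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in values) / n - cf       # rows
    col_totals = [sum(row[j] for row in values) for j in range(n)]
    ssc = sum(t * t for t in col_totals) / n - cf             # columns
    letter_totals = {}
    for i in range(n):
        for j in range(n):
            key = letters[i][j]
            letter_totals[key] = letter_totals.get(key, 0) + values[i][j]
    ssk = sum(t * t for t in letter_totals.values()) / n - cf  # treatments
    sse = tss - ssr - ssc - ssk
    mse = sse / ((n - 1) * (n - 2))
    df = n - 1
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SSK": ssk, "SSE": sse,
            "MSE": mse, "F_R": ssr / df / mse, "F_C": ssc / df / mse,
            "F_K": ssk / df / mse}


paddy_letters = ["DACB", "BCAD", "ABDC", "CDBA"]
paddy_values = [[122, 121, 123, 122], [124, 123, 122, 125],
                [120, 119, 120, 121], [122, 123, 121, 122]]
print(latin_square_anova(paddy_letters, paddy_values))
```

With the paddy data of the example that follows, the sketch should give TSS = 35.75, SSR = 24.75, SSC = 2.75, SSK = 4.25 and SSE = 4. A change of origin or scale leaves every F-ratio unchanged, so coded or raw values may be used.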
1. Analyse the variance in the following Latin square of yields (in kg) of paddy, where A, B, C, D denote the different methods of cultivation.
| D 122 | A 121 | C 123 | B 122 |
| B 124 | C 123 | A 122 | D 125 |
| A 120 | B 119 | D 120 | C 121 |
| C 122 | D 123 | B 121 | A 122 |
Examine whether the different methods of cultivation have given significantly different yields.
Solution:
H0: There is no difference between columns, between rows and between treatments.
H1: Not all are equal.
We shift the origin: X_ij = x_ij - 120; n = 4; N = 16
| Row | I | II | III | IV | Total T_i* | T_i*²/n | Sum of X_ij² |
| 1 | 2 | 1 | 3 | 2 | 8 | 16 | 18 |
| 2 | 4 | 3 | 2 | 5 | 14 | 49 | 54 |
| 3 | 0 | -1 | 0 | 1 | 0 | 0 | 2 |
| 4 | 2 | 3 | 1 | 2 | 8 | 16 | 18 |
| Total T_*j | 8 | 6 | 6 | 10 | 30 | 81 | 92 |
| T_*j²/n | 16 | 9 | 9 | 25 | 59 | | |
| Column sum of X_ij² | 24 | 20 | 14 | 34 | 92 | | |
Letter (treatment) totals:
| Letter | Coded values | Total | Total²/n |
| A | 1, 2, 0, 2 | 5 | 6.25 |
| B | 2, 4, -1, 1 | 6 | 9 |
| C | 3, 3, 1, 2 | 9 | 20.25 |
| D | 2, 5, 0, 3 | 10 | 25 |
| Total | | 30 | 60.5 |
T = grand total = 30; correction factor $C.F. = \dfrac{30^2}{16} = 56.25$
$TSS = \sum_i \sum_j X_{ij}^2 - C.F. = 92 - 56.25 = 35.75$
$SSR = \sum_i \dfrac{T_{i*}^2}{n} - C.F. = 81 - 56.25 = 24.75$
$SSC = \sum_j \dfrac{T_{*j}^2}{n} - C.F. = 59 - 56.25 = 2.75$
$SSL = \sum \dfrac{(\text{letter total})^2}{n} - C.F. = 60.5 - 56.25 = 4.25$
SSE = TSS - SSR - SSC - SSL = 35.75 - 24.75 - 2.75 - 4.25 = 4
ANOVA Table
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio | Table F (5%) |
| Between rows | SSR = 24.75 | n - 1 = 3 | MSR = 8.25 | F_R = 8.25/0.667 = 12.375 | F0.05(3, 6) = 4.76 |
| Between columns | SSC = 2.75 | n - 1 = 3 | MSC = 0.92 | F_C = 0.92/0.667 = 1.375 | F0.05(3, 6) = 4.76 |
| Between letters | SSL = 4.25 | n - 1 = 3 | MSL = 1.42 | F_L = 1.42/0.667 = 2.125 | F0.05(3, 6) = 4.76 |
| Residual | SSE = 4 | (n - 1)(n - 2) = 6 | MSE = 0.667 | | |
| Total | 35.75 | 15 | | | |
Conclusion:
Cal F_R > Tab F_R, Cal F_C < Tab F_C and Cal F_L < Tab F_L. There is a significant difference between the rows, but no significant difference between the columns or between the letters; that is, the different methods of cultivation have not given significantly different yields.
2. The following is a Latin square of a design when 4 varieties of seeds are being tested. Set up the analysis of variance table and state your conclusion. You may carry out a suitable change of origin and scale.
| A 105 | B 95 | C 125 | D 115 |
| C 115 | D 125 | A 105 | B 105 |
| D 115 | C 95 | B 105 | A 115 |
| B 95 | A 135 | D 95 | C 115 |
(APRIL / MAY '17)
Solution:
H0: The four varieties are similar.
H1: The four varieties are not similar.
Let us take 100 as origin and divide by 5, i.e. X = (x - 100)/5, to simplify the calculation.
| Row | X1 | X2 | X3 | X4 | Total | X1² | X2² | X3² | X4² |
| Y1 | 1 | -1 | 5 | 3 | 8 | 1 | 1 | 25 | 9 |
| Y2 | 3 | 5 | 1 | 1 | 10 | 9 | 25 | 1 | 1 |
| Y3 | 3 | -1 | 1 | 3 | 6 | 9 | 1 | 1 | 9 |
| Y4 | -1 | 7 | -1 | 3 | 8 | 1 | 49 | 1 | 9 |
| Total | 6 | 10 | 6 | 10 | 32 | 20 | 76 | 28 | 28 |
N = total number of observations = 16; T = grand total = 32
Correction factor $C.F. = \dfrac{32^2}{16} = 64$
$TSS = \sum X_1^2 + \sum X_2^2 + \sum X_3^2 + \sum X_4^2 - C.F. = 20 + 76 + 28 + 28 - 64 = 88$
$SSC = \dfrac{6^2 + 10^2 + 6^2 + 10^2}{4} - 64 = 68 - 64 = 4$
$SSR = \dfrac{8^2 + 10^2 + 6^2 + 8^2}{4} - 64 = 66 - 64 = 2$
To find SSK, the coded values for each variety (treatment) are:
| Treatment | Coded values | Total |
| A | 1, 1, 3, 7 | 12 |
| B | -1, 1, 1, -1 | 0 |
| C | 5, 3, -1, 3 | 10 |
| D | 3, 5, 3, -1 | 10 |
$SSK = \dfrac{12^2 + 0^2 + 10^2 + 10^2}{4} - 64 = 86 - 64 = 22$
SSE = TSS - SSC - SSR - SSK = 88 - 4 - 2 - 22 = 60
ANOVA Table
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns | SSC = 4 | n - 1 = 3 | MSC = 4/3 = 1.33 | F_C = MSE/MSC = 10/1.33 = 7.5 |
| Between rows | SSR = 2 | n - 1 = 3 | MSR = 2/3 = 0.67 | F_R = MSE/MSR = 10/0.67 = 15 |
| Between treatments | SSK = 22 | n - 1 = 3 | MSK = 22/3 = 7.33 | F_K = MSE/MSK = 10/7.33 = 1.36 |
| Error (or) Residual | SSE = 60 | (n - 1)(n - 2) = 6 | MSE = 60/6 = 10 | |
| Total | TSS = 88 | 15 | | |
Table value: F0.05(6, 3) = 8.94
Conclusion: F_K = 1.36 < 8.94, so we accept H0: the four varieties are similar (no significant difference between treatments). F_C = 7.5 < 8.94, so the columns do not differ significantly, while F_R = 15 > 8.94 shows a significant difference between the rows.
2² Factorial Design Experiment:
In a factorial experiment, the effects of several factors of variation are investigated simultaneously, the treatments being all the combinations of the different factors under study.
Note: A 2² factorial has 4 treatment combinations, say (1), a, b, ab.
Procedure:
1. Find N and T.
2. Find the correction factor $C.F. = \dfrac{T^2}{N}$
3. Proceed as in the two-way classification between treatments and blocks (find TSS, SSC for blocks and SSR for treatments).
4. For a 2 × 2 (or 2²) factorial, with (1), a, b, ab denoting the treatment totals over r blocks (so N = 4r), find
$SSA = \dfrac{[a + ab - b - (1)]^2}{N}$, $SSB = \dfrac{[b + ab - a - (1)]^2}{N}$, $SSAB = \dfrac{[ab + (1) - a - b]^2}{N}$
The treatment sum of squares splits as SSR = SSA + SSB + SSAB.
5. Find SSE = TSS - SSC - SSA - SSB - SSAB (= TSS - SSC - SSR).
ANOVA Table:
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio |
| Between columns (blocks) | SSC | r - 1 | MSC = SSC/(r - 1) | F_C = MSC/MSE if MSC > MSE (or) MSE/MSC if MSC < MSE |
| Between rows (treatments) | SSR | 4 - 1 = 3 | MSR = SSR/3 | F_R = MSR/MSE if MSR > MSE (or) MSE/MSR if MSR < MSE |
| A | SSA | 1 | MSA = SSA | F_a = MSA/MSE if MSA > MSE (or) MSE/MSA if MSA < MSE |
| B | SSB | 1 | MSB = SSB | F_b = MSB/MSE if MSB > MSE (or) MSE/MSB if MSB < MSE |
| AB | SSAB | 1 | MSAB = SSAB | F_ab = MSAB/MSE if MSAB > MSE (or) MSE/MSAB if MSAB < MSE |
| Error (or) Residual | SSE | 3(r - 1) | MSE = SSE/(3(r - 1)) | |
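The factorial effect totals and their sums of squares in step 4 can be sketched in a few lines of Python. This is an illustrative helper only (`factorial_2x2` is a hypothetical name), taking the four treatment totals over r blocks and returning each effect total together with its sum of squares, effect²/(4r).

```python
def factorial_2x2(one, a, b, ab, r):
    """Effect totals and sums of squares for a 2 x 2 factorial in r blocks.

    one, a, b, ab: treatment totals of (1), a, b and ab over the r blocks.
    Each sum of squares is (effect total)^2 / (4r) and has 1 degree of freedom.
    """
    eff_a = ab + a - b - one
    eff_b = ab + b - a - one
    eff_ab = ab + one - a - b
    d = 4 * r
    return {"[A]": eff_a, "[B]": eff_b, "[AB]": eff_ab,
            "SSA": eff_a ** 2 / d, "SSB": eff_b ** 2 / d, "SSAB": eff_ab ** 2 / d}


# Treatment totals of the problem below, taking a = k and b = p, r = 4 blocks.
print(factorial_2x2(one=290, a=84, b=146, ab=66, r=4))
```

Because the effects are contrasts, shifting the origin of the data before totalling does not change them.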
Problem:
1. Analyse the 2² factorial experiment given in the following table.
| Treatment | I | II | III | IV |
| (1) | 64 | 75 | 76 | 75 |
| k | 25 | 14 | 12 | 33 |
| p | 30 | 50 | 41 | 25 |
| kp | 6 | 33 | 17 | 10 |
Solution:
Here the factors are k and p, and I to IV are the blocks (r = 4 replicates). We shift the origin: X_ij = x_ij - 37.
| Treatment | I | II | III | IV | Total T_i* | T_i*²/n | Sum of X_ij² |
| (1) | 27 | 38 | 39 | 38 | 142 | 5041 | 5138 |
| k | -12 | -23 | -25 | -4 | -64 | 1024 | 1314 |
| p | -7 | 13 | 4 | -12 | -2 | 1 | 378 |
| kp | -31 | -4 | -20 | -27 | -82 | 1681 | 2106 |
| Total T_*j | -23 | 24 | -2 | -5 | -6 | 7747 | 8936 |
| T_*j²/n | 132.25 | 144 | 1 | 6.25 | 283.5 | | |
T = grand total = -6; N = 16
Correction factor $C.F. = \dfrac{(-6)^2}{16} = 2.25$
$TSS = \sum_i \sum_j X_{ij}^2 - C.F. = 8936 - 2.25 = 8933.75$
$SSR = \sum_i \dfrac{T_{i*}^2}{n} - C.F. = 7747 - 2.25 = 7744.75$ (treatments)
$SSC = \sum_j \dfrac{T_{*j}^2}{n} - C.F. = 283.5 - 2.25 = 281.25$ (blocks)
SSE = TSS - SSC - SSR = 8933.75 - 281.25 - 7744.75 = 907.75
Factorial effect totals (the shift of origin cancels in these contrasts):
[k] = [kp] - [p] + [k] - [1] = -82 - (-2) + (-64) - 142 = -286
[p] = [kp] + [p] - [k] - [1] = -82 + (-2) - (-64) - 142 = -162
[kp] = [kp] - [p] - [k] + [1] = -82 - (-2) - (-64) + 142 = 126
S_k = [k]²/4r = 286²/16 = 5112.25; S_p = [p]²/4r = 162²/16 = 1640.25; S_kp = [kp]²/4r = 126²/16 = 992.25 (these add up to SSR = 7744.75).
ANOVA Table
| Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio | Table F (5%) |
| Between columns (blocks) | SSC = 281.25 | 3 | MSC = 93.75 | F_C = MSE/MSC = 100.86/93.75 = 1.08 | F0.05(9, 3) = 8.81 |
| Between rows (treatments) | SSR = 7744.75 | 3 | MSR = 2581.58 | F_R = MSR/MSE = 2581.58/100.86 = 25.60 | F0.05(3, 9) = 3.86 |
| k | S_k = 5112.25 | 1 | 5112.25 | F_k = 5112.25/100.86 = 50.69 | F0.05(1, 9) = 5.12 |
| p | S_p = 1640.25 | 1 | 1640.25 | F_p = 1640.25/100.86 = 16.26 | F0.05(1, 9) = 5.12 |
| kp | S_kp = 992.25 | 1 | 992.25 | F_kp = 992.25/100.86 = 9.84 | F0.05(1, 9) = 5.12 |
| Error (or) Residual | SSE = 907.75 | 9 | MSE = 100.86 | | |
| Total | TSS = 8933.75 | 15 | | | |
Conclusion: Cal F_k > Tab F_k, Cal F_p > Tab F_p and Cal F_kp > Tab F_kp, so the main effects of k and p and the interaction kp are all significant: there is a significant difference between the treatments. The blocks do not differ significantly.
UNIT – IV:
NON-PARAMETRIC TESTS
PROBLEMS
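1. Calculate the expected frequencies for the following data, presuming the two attributes, viz. condition of home and condition of child, to be independent.
| Condition of child | Home clean | Home dirty | Total |
| Clean | 70 | 50 | 120 |
| Fair | 80 | 20 | 100 |
| Dirty | 35 | 45 | 80 |
| Total | 185 | 115 | 300 |
Use the chi-square test at 5% level of significance to state whether the two attributes are independent.
(AU NOV/DEC 2013)
Solution:
Null hypothesis: The two attributes are independent.
Alternative hypothesis: The two attributes are not independent.
$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{s} \dfrac{(O_{ij} - E_{ij})^2}{E_{ij}}$ follows a chi-square distribution with (r - 1)(s - 1) degrees of freedom, where $E_{ij} = \dfrac{(\text{row } i \text{ total})(\text{column } j \text{ total})}{N}$.
Expected frequencies:
| Condition of child | Home clean | Home dirty |
| Clean | 120 × 185/300 = 74 | 120 × 115/300 = 46 |
| Fair | 100 × 185/300 = 61.67 | 100 × 115/300 = 38.33 |
| Dirty | 80 × 185/300 = 49.33 | 80 × 115/300 = 30.67 |
χ² = 25.65, with (3 - 1)(2 - 1) = 2 degrees of freedom.
Table value at 5% level of significance is 5.991.
Since the calculated value exceeds the table value, the null hypothesis is rejected: the condition of the home and the condition of the child are not independent.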
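The expected frequencies and the χ² statistic for an r × s table can be computed with the short Python sketch below. It is an illustration using only the standard library; `chi_square_independence` is a hypothetical helper name, and the critical value (5.991 here) still comes from the χ² table.

```python
def chi_square_independence(observed):
    """Chi-square statistic for an r x s contingency table.

    E_ij = (row total i) * (column total j) / grand total.
    Returns (chi_square, degrees_of_freedom, expected_table).
    """
    row_tot = [sum(row) for row in observed]
    col_tot = [sum(col) for col in zip(*observed)]
    n = sum(row_tot)
    expected = [[ri * cj / n for cj in col_tot] for ri in row_tot]
    chi2 = sum((o - e) ** 2 / e
               for orow, erow in zip(observed, expected)
               for o, e in zip(orow, erow))
    df = (len(observed) - 1) * (len(col_tot) - 1)
    return chi2, df, expected


# Rows: child clean, fair, dirty; columns: home clean, home dirty.
print(chi_square_independence([[70, 50], [80, 20], [35, 45]]))
```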
2. A group of 19 pilots were trained by three different methods: video cassette, audio cassette and classroom training. Their scores in written exams were as follows:
Video cassette: 74 88 82 93 55 70
Audio cassette: 78 80 65 57 88
Classroom training: 68 50 91 84 77 94 81 92
Test whether there is any difference in the effectiveness of the three methods. Use an appropriate rank sum test.
Solution:
Here three independent samples are given, so we use the Kruskal–Wallis H-test.
Null Hypothesis: All the populations are identical.
H0: μ1 = μ2 = μ3
Alternate Hypothesis: The populations are not all identical.
H1: μ1, μ2, μ3 are not all equal.
The test statistic:
$H \text{ (or } W) = \dfrac{12}{n(n+1)} \sum_{i=1}^{k} \dfrac{R_i^2}{n_i} - 3(n+1)$
where
k = 3 (number of populations or samples)
R_i = sum of the ranks of all items in sample i
n_i = number of items in sample i, and n = Σ n_i
Analysis:
Ranking the data jointly from 1 to 19 (the two scores of 88 tie for ranks 14 and 15, so each receives 14.5), we find that
R1 = sum of the ranks of all items in sample 1 = 7 + 14.5 + 12 + 18 + 2 + 6 = 59.5
R2 = sum of the ranks of all items in sample 2 = 9 + 10 + 4 + 3 + 14.5 = 40.5
R3 = sum of the ranks of all items in sample 3 = 5 + 1 + 16 + 13 + 8 + 19 + 11 + 17 = 90
n1 = 6, n2 = 5, n3 = 8, so $n = \sum n_i = n_1 + n_2 + n_3 = 6 + 5 + 8 = 19$
$H = \dfrac{12}{n(n+1)}\left(\dfrac{R_1^2}{n_1} + \dfrac{R_2^2}{n_2} + \dfrac{R_3^2}{n_3}\right) - 3(n+1) = \dfrac{12}{19(20)}\left(\dfrac{59.5^2}{6} + \dfrac{40.5^2}{5} + \dfrac{90^2}{8}\right) - 3(20) = 0.97$
The sampling distribution of H can be approximated by a χ² distribution with k - 1 = 3 - 1 = 2 degrees of freedom. At α = 0.05, χ²_α = 5.991.
Conclusion: Here H < χ²_α, so we accept the null hypothesis H0.
Hence we conclude that the given three methods are equally effective.
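The Kruskal–Wallis computation can be sketched in Python as below (standard library only; `average_ranks` and `kruskal_wallis_h` are hypothetical helper names). Tied values receive the mean of the positions they occupy, so the two scores of 88 each get 14.5, and, like the formula above, no correction for ties is applied.

```python
def average_ranks(values):
    """Map each distinct value to its rank in the pooled order; ties share the mean rank."""
    pooled = sorted(values)
    return {v: pooled.index(v) + 1 + (pooled.count(v) - 1) / 2 for v in set(pooled)}


def kruskal_wallis_h(samples):
    """H = 12 / (n(n+1)) * sum(R_i^2 / n_i) - 3(n+1), without a tie correction."""
    pooled = [x for s in samples for x in s]
    rank_of = average_ranks(pooled)
    n = len(pooled)
    rank_sums = [sum(rank_of[x] for x in s) for s in samples]
    total = sum(r * r / len(s) for r, s in zip(rank_sums, samples))
    return 12 / (n * (n + 1)) * total - 3 * (n + 1), rank_sums


video = [74, 88, 82, 93, 55, 70]
audio = [78, 80, 65, 57, 88]
classroom = [68, 50, 91, 84, 77, 94, 81, 92]
print(kruskal_wallis_h([video, audio, classroom]))
```

The resulting H is then compared with the χ² table value for k - 1 degrees of freedom.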
3. Use the sign test to see if there is a difference between the number of days until collection of an account receivable before and after a new collection policy. Use the 0.05 significance level.
Before: 30 28 34 35 40 42 33 38 34 45 28 27 25 41 36
After : 32 29 33 32 37 43 40 41 37 44 27 33 30 38 36
Solution:
Null Hypothesis: There is no significant difference between the two types of collections,
i.e. H0 : P = 0.5
Alternate Hypothesis: There is a significant difference between the two types of collections,
i.e. H1 : P ≠ 0.5
Find the signs of d_i = x_i - y_i (before - after):
- - + + + - - - - + + - - + 0
Omitting the zero difference, we get n = 14.
No. of + signs = 6
No. of - signs = 8
Sample proportion of + signs = 6/14 = 0.43 (and of - signs = 0.57).
Under H0, np = nq = 14 × 0.5 = 7, which is greater than 5, so we use the normal distribution.
The standard error of the proportion under H0 is $\sigma_p = \sqrt{\dfrac{pq}{n}} = \sqrt{\dfrac{(0.5)(0.5)}{14}} = 0.134$
α = 0.05, Z_α = 1.96
The acceptance limits are $(P - \sigma_p Z_\alpha,\ P + \sigma_p Z_\alpha)$
i.e. (0.5 - (0.134)(1.96), 0.5 + (0.134)(1.96))
i.e. (0.5 - 0.26, 0.5 + 0.26)
i.e. (0.238, 0.762)
Conclusion:
Here the sample proportion of + signs, 0.43, lies within these two limits, so we accept the null hypothesis H0.
Hence there is no significant difference between the two types of collections.
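The large-sample sign test can be sketched in Python as follows (standard library only; `sign_test` is a hypothetical name). The sketch drops zero differences and uses the hypothesised p = q = 0.5 in the standard error, so it returns the acceptance limits against which the observed proportion of + signs is compared.

```python
import math


def sign_test(before, after, z_crit=1.96):
    """Large-sample sign test for paired data with H0: P(+) = 0.5.

    Zero differences are dropped. Returns n, the numbers of + and - signs,
    the acceptance limits 0.5 -/+ z_crit * sqrt(0.25 / n), and the sample
    proportion of + signs.
    """
    diffs = [b - a for b, a in zip(before, after) if b != a]
    n = len(diffs)
    plus = sum(1 for d in diffs if d > 0)
    se = math.sqrt(0.5 * 0.5 / n)          # standard error under H0
    return n, plus, n - plus, 0.5 - z_crit * se, 0.5 + z_crit * se, plus / n


before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]
print(sign_test(before, after))
```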
4. Write the merits and demerits of non-parametric tests.
Merits and demerits of non-parametric tests:
Merits:
1. They do not require us to make the assumption that a population is distributed in the shape of a normal curve or another specific shape.
2. Generally they are easier to do and to understand.
3. Sometimes even formal ordering or ranking is not required.
Demerits:
1. They ignore a certain amount of information.
2. They are often not as efficient or "sharp" as parametric tests.
3. Non-parametric tests cannot be used to estimate parameters of the populations, or confidence intervals for such parameters.
4. It is not possible to solve certain statistical problems by using non-parametric tests. A good example is the type of problem dealt with in the analysis of variance.
5. Two methods of instruction to apprentices are to be evaluated. A director assigns 15 randomly
selected trainees to each of the two methods. Due to drop outs, 14 complete in batch 1 and 12
complete in batch 2. An achievement test was given to these successful candidates. Their
scores are as follows.
Method 1: 70 90 82 64 86 77 84 79 82 89 73 81 83 66
Method 2: 86 78 90 82 65 87 80 88 95 85 76 94
Test whether the two methods have significant difference in effectiveness. Use Mann-Whitney
test at 5% significance level.
Solution:
Null Hypothesis: There is no significant difference between the two methods
(i.e) H 0 : 1   2
Alternate Hypothesis: There is a significant difference between the two methods
(i.e) H 0 : 1   2
www.BrainKart.com
Level of significance:   5%
U  U  E (U )
Z or z 
 var(U )
Analysis:
Method I Rank R1 Method II Rank R1

70 4 86 18.5
90 23.5 78 8
82 13 90 23.5
64 1 82 13
86 18.5 65 2
77 7 87 20
84 16 80 10
79 9 88 21
82 13 95 26
89 22 85 17
73 5 76 6
81 11 94 25
83 15
66 3
Total 161 190
Here,
n₁ = 14 (number of values in sample I)
n₂ = 12 (number of values in sample II)
R₁ = 161 (sum of the ranks of the first sample)
R₂ = 190 (sum of the ranks of the second sample)
$$U = n_1 n_2 + \frac{n_1(n_1 + 1)}{2} - R_1 = (14)(12) + \frac{14(14 + 1)}{2} - 161 = 112$$
$$E(U) = \frac{n_1 n_2}{2} = \frac{(14)(12)}{2} = 84$$
$$\sigma_U = \sqrt{\frac{n_1 n_2 (n_1 + n_2 + 1)}{12}} = \sqrt{\frac{(14)(12)(14 + 12 + 1)}{12}} = 19.44$$
Now,
$$Z = \frac{U - E(U)}{\sigma_U} = \frac{112 - 84}{19.44} = 1.44$$
At α = 5%, the critical value is Z_α = 1.96.
Conclusion:
Since |Z| = 1.44 < Z_α = 1.96, we accept the null hypothesis H₀.
Hence, there is no significant difference between the two methods.
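For readers who want to check the arithmetic, here is a minimal Python sketch of the same Mann-Whitney computation, using only the standard library. The function names `average_ranks` and `mann_whitney_z` are illustrative, not from any package, and, like the worked solution, the sketch uses the plain normal approximation without a tie correction to var(U).

```python
import math

def average_ranks(values):
    """Rank pooled values from smallest to largest; tied values share the mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over the block of values tied with values[order[i]]
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_z(sample1, sample2):
    """Return R1, U and the normal-approximation Z (no tie correction)."""
    n1, n2 = len(sample1), len(sample2)
    ranks = average_ranks(list(sample1) + list(sample2))
    r1 = sum(ranks[:n1])                      # rank sum of the first sample
    u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    return r1, u, (u - mean_u) / sd_u

method1 = [70, 90, 82, 64, 86, 77, 84, 79, 82, 89, 73, 81, 83, 66]
method2 = [86, 78, 90, 82, 65, 87, 80, 88, 95, 85, 76, 94]
r1, u, z = mann_whitney_z(method1, method2)
print(r1, u, round(z, 2))
```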
6. Kevin Morgan, national sales manager of an electronics firm, has collected the following salary
statistics on his field sales force earnings. He has both observed frequencies and expected
frequencies if the distribution of salaries is normal. At the 0.05 level of significance, can Kevin
conclude that the distribution of sales force earnings is normal?
Earnings (in thousands)   25-30   31-36   37-42   43-48   49-54   55-60   61-66
Observed frequency        9       22      25      30      21      12      6
Expected frequency        6       17      32      35      18      13      4
Solution:
Null Hypothesis:
H₀: The distribution of sales force earnings is normal.
Alternative Hypothesis:
H₁: The distribution of sales force earnings is not normal.
The Test statistic:
$$D_n = \max \lvert F_e - F_o \rvert$$
where F_e = expected cumulative relative frequency and F_o = observed cumulative relative frequency.
Analysis:
Obs. freq   Obs. cum. freq   F_o     Exp. freq   Exp. cum. freq   F_e     D = |F_e − F_o|
9           9                0.072   6           6                0.048   0.024
22          31               0.248   17          23               0.184   0.064
25          56               0.448   32          55               0.440   0.008
30          86               0.688   35          90               0.720   0.032
21          107              0.856   18          108              0.864   0.008
12          119              0.952   13          121              0.968   0.016
6           125              1       4           125              1       0
(F_o and F_e are the cumulative frequencies divided by the total of 125.)
Here n = 125 observations and α = 0.05. For large n, the critical value is approximately
$$D_{\alpha} = \frac{1.36}{\sqrt{n}} = \frac{1.36}{\sqrt{125}} \approx 0.122$$
$$D_n = \max \lvert F_e - F_o \rvert = 0.064$$
Conclusion:
Since the calculated value of D_n is less than the table value of D_n,
we accept the null hypothesis H₀. Hence, the distribution of sales force earnings is normal.
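A short Python sketch of the Kolmogorov-Smirnov statistic for grouped data follows. It is an illustration under stated assumptions: `ks_statistic` is a hypothetical helper, F_o and F_e are taken as cumulative relative frequencies, n is taken as the total number of observations (125), and the 5% critical value uses the common large-sample approximation 1.36/√n instead of a table lookup.

```python
import math

observed = [9, 22, 25, 30, 21, 12, 6]
expected = [6, 17, 32, 35, 18, 13, 4]

def ks_statistic(obs, exp):
    """Largest gap between expected and observed cumulative relative frequencies."""
    n_obs, n_exp = sum(obs), sum(exp)
    cum_o = cum_e = 0
    d_max = 0.0
    for o, e in zip(obs, exp):
        cum_o += o
        cum_e += e
        d_max = max(d_max, abs(cum_e / n_exp - cum_o / n_obs))
    return d_max

d = ks_statistic(observed, expected)
n = sum(observed)                 # total number of observations (125)
critical = 1.36 / math.sqrt(n)    # large-sample approximation at alpha = 0.05 (assumption)
print(round(d, 3), round(critical, 3), d < critical)
```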
7. The following contingency table presents the reactions of legislators to a tax plan according to party
affiliation. Test whether party affiliation influences the reaction to the tax plan at the 0.01 level of
significance.
Party      Reaction (three categories)   Total
Party A    120    20    20               160
Party B    50     30    60               140
Party C    50     10    40               100
Total      220    60    120              400
Solution:
Given
Null hypothesis H 0 : Party affiliation and tax plan are independent.
Alternate hypothesis H 1 : Party affiliation and tax plan are not independent.
The test statistic:
$$\chi^2 = \sum_{i=1}^{r}\sum_{j=1}^{c}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
Analysis:
Expected frequency of each cell = (row total × column total) / grand total:
E(120) = (160 × 220)/400 = 88
E(20) = (160 × 60)/400 = 24
E(20) = (160 × 120)/400 = 48
E(50) = (140 × 220)/400 = 77
E(30) = (140 × 60)/400 = 21
E(60) = (140 × 120)/400 = 42
E(50) = (100 × 220)/400 = 55
E(10) = (100 × 60)/400 = 15
E(40) = (100 × 120)/400 = 30
O_ij   E_ij   O_ij − E_ij   (O_ij − E_ij)²   (O_ij − E_ij)²/E_ij
120    88     32            1024             11.64
20     24     −4            16               0.67
20     48     −28           784              16.33
50     77     −27           729              9.47
30     21     9             81               3.86
60     42     18            324              7.71
50     55     −5            25               0.45
10     15     −5            25               1.67
40     30     10            100              3.33
                                             χ² = 55.13
At α = 0.01, degrees of freedom = (r − 1)(c − 1) = (3 − 1)(3 − 1) = 4
Table value: χ²_α = 13.28
Conclusion:
Since χ² = 55.13 > χ²_α = 13.28, we reject the null hypothesis H₀.
Hence, party affiliation and reaction to the tax plan are dependent.
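The expected frequencies and the χ² statistic can be reproduced with the sketch below (standard-library Python; `chi_square_independence` is an illustrative name). It computes E_ij = row total × column total / grand total for every cell and then sums (O − E)²/E; the critical value still has to be read from a χ² table.

```python
observed = [
    [120, 20, 20],  # Party A
    [50, 30, 60],   # Party B
    [50, 10, 40],   # Party C
]

def chi_square_independence(table):
    """Return the chi-square statistic and degrees of freedom for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand  # expected frequency E_ij
            chi2 += (o - e) ** 2 / e
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

chi2, dof = chi_square_independence(observed)
print(round(chi2, 2), dof)
```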
8. A technician is asked to analyze the results of 22 items made in a preparation run. Each item has
been measured and compared to engineering specifications. The order of acceptances (a) and
rejections (r) is
aarrrarraa aaarrarraa ra
Determine whether it is a random sample or not. Use α = 0.05.
Solution:
Null Hypothesis H 0 : The Observations are randomly generated.

Alternate hypothesis H 1 : The Observations are not randomly generated (two tailed)
The Test statistic:
$$Z = \frac{R - E(R)}{\sigma_R}, \quad \text{where } E(R) = \frac{2n_1 n_2}{n_1 + n_2} + 1, \quad \sigma_R = \sqrt{\frac{2n_1 n_2(2n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2(n_1 + n_2 - 1)}}$$
Analysis:
aa | rrr | a | rr | aaaaa | rr | a | rr | aa | r | a
Runs: 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11
R = number of runs = 11
n₁ = 12 (number of acceptances, a)
n₂ = 10 (number of rejections, r)
$$E(R) = \frac{2(12)(10)}{12 + 10} + 1 = 11.909$$
$$\sigma_R = \sqrt{\frac{2(12)(10)\,[2(12)(10) - 12 - 10]}{(12 + 10)^2(12 + 10 - 1)}} = 2.2688$$
$$Z = \frac{R - E(R)}{\sigma_R} = \frac{11 - 11.909}{2.2688} = -0.4007, \qquad |Z| = 0.4007$$
At α = 0.05, Z_α = 1.96.
Conclusion:
Since |Z| = 0.4007 < Z_α = 1.96, we accept the null hypothesis H₀.
Hence, the sequence of acceptances and rejections is random.
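The run count and the normal approximation can be checked with this minimal Python sketch. It assumes the 22 symbols are read left to right with the spaces removed; `runs_test` is a hypothetical helper name, and `itertools.groupby` is used to split the sequence into runs of identical symbols.

```python
import math
from itertools import groupby

sequence = "aarrrarraa" "aaarrarraa" "ra"   # the 22 symbols, spaces removed

def runs_test(seq):
    """Return runs, n1 (a's), n2 (r's), E(R) and Z for a two-symbol sequence."""
    runs = sum(1 for _ in groupby(seq))      # each block of identical symbols is one run
    n1, n2 = seq.count("a"), seq.count("r")
    mean_r = 2 * n1 * n2 / (n1 + n2) + 1
    var_r = (2 * n1 * n2 * (2 * n1 * n2 - n1 - n2)) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
    z = (runs - mean_r) / math.sqrt(var_r)
    return runs, n1, n2, mean_r, z

runs, n1, n2, mean_r, z = runs_test(sequence)
print(runs, n1, n2, round(mean_r, 3), round(z, 4))
```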
9. From a poll of 800 television viewers, the following data have been accumulated as to their
levels of education and their preference of television stations. We are interested in determining
whether the selection of a TV station is independent of the level of education.
Educational Level
Total 200 400 200 800
(ii) Show the contingency table of the expected frequencies.
(iii) Compute the test statistic.
(iv) The null hypothesis is to be tested at 95% confidence. Determine the critical value for this test.
(AU JAN 2014)
Solution:
(i)Null Hypothesis: Selection of TV station is independent of level of education
Alternative Hypothesis: Selection of TV station is not independent of level of education
(ii)
Educational Level
Total 200 400 200 800
(iii) Test statistic: χ² = 12.088
(iv) Critical χ² = 5.991 (95% confidence). Since 12.088 > 5.991, reject the null hypothesis.
10. The manager of a company believes that differences in sales performance depend upon the
salesperson’s age. Independent samples of salespeople were taken and their weekly sales record
is reported below.
Below 30 years Between 30 and 45 years Over 45 years
No. of Sales No. of Sales No. of Sales
24 23 30
16 17 20
21 22 23
15 25 25
19 18 34
26 29 36
27 28
(ii) At 95% confidence, test the hypotheses using Kruskal Wallis Test. (AU JAN 2014)
Solution:
(i) Null Hypothesis: All three populations are identical.
Alternative Hypothesis: Not all populations are identical.
(ii) W = 6.78; critical χ² = 5.991 at the 0.05 level of significance with 2 degrees of freedom.
Since 6.78 > 5.991, reject the null hypothesis.
UNIT 5 CORRELATION & REGRESSION

PART B
1. Calculate the coefficient of correlation and obtain the lines of regression for the following:
X 1 2 3 4 5 6 7 8 9
Y 9 8 10 12 11 13 14 16 15
Obtain an estimate of Y which should correspond to the value of X = 6.2
Solution:
X Y X² Y² XY
1 9 1 81 9
2 8 4 64 16
3 10 9 100 30
4 12 16 144 48
5 11 25 121 55
6 13 36 169 78
7 14 49 196 98
8 16 64 256 128
9 15 81 225 135
Total 45 108 285 1356 597
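Since
$$\bar{x} = \frac{\sum x}{n} = \frac{45}{9} = 5, \qquad \bar{y} = \frac{\sum y}{n} = \frac{108}{9} = 12$$
$$r = \frac{n\sum xy - \sum x\sum y}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} = \frac{9(597) - (45)(108)}{\sqrt{9(285) - 45^2}\,\sqrt{9(1356) - 108^2}} = \frac{513}{540} = 0.95$$
$$b_{xy} = \frac{n\sum xy - \sum x\sum y}{n\sum y^2 - (\sum y)^2} = \frac{513}{540} = 0.95, \qquad b_{yx} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = \frac{513}{540} = 0.95$$
Regression line of y on x:
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 12 = 0.95(x - 5) \Rightarrow y = 0.95x + 7.25$$
Regression line of x on y:
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 5 = 0.95(y - 12) \Rightarrow x = 0.95y - 6.4$$
The value of y corresponding to x = 6.2 is y = 0.95(6.2) + 7.25 = 13.14.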
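As a check on the hand computation, the sketch below computes r, both regression coefficients and the estimate at x = 6.2 from the raw sums, using the same computational formulas as above. It is plain Python; `regression_summary` is an illustrative name.

```python
import math

x = [1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [9, 8, 10, 12, 11, 13, 14, 16, 15]

def regression_summary(x, y):
    """Return r, b_yx, b_xy and the two means using the raw-sum formulas."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxx = sum(v * v for v in x)
    syy = sum(v * v for v in y)
    sxy = sum(a * b for a, b in zip(x, y))
    num = n * sxy - sx * sy
    r = num / math.sqrt((n * sxx - sx ** 2) * (n * syy - sy ** 2))
    byx = num / (n * sxx - sx ** 2)   # slope of the regression of y on x
    bxy = num / (n * syy - sy ** 2)   # slope of the regression of x on y
    return r, byx, bxy, sx / n, sy / n

r, byx, bxy, xbar, ybar = regression_summary(x, y)
y_at_6_2 = ybar + byx * (6.2 - xbar)  # estimate from the line of y on x
print(round(r, 2), round(byx, 2), round(bxy, 2), round(y_at_6_2, 2))
```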
2. Explain the basic components of Time series analysis.
Solution:
The Components of a time series are
1. Secular Trend
2. Seasonal Variations
3. Cyclical Variations
4. Irregular Variations
1. Secular Trend
Trend, also called secular or long-term trend, is the basic tendency of a series to grow or decline
over a period of time. The concept of trend does not include short-range oscillations, but rather the
steady movement over a long time. If the values of the time series, plotted on graph paper,
cluster around a straight line, the trend is said to be a linear trend; the rate of growth is then nearly
constant. If the plotted points do not fall in the pattern of a straight line, the trend is said to be a
non-linear or curvilinear trend.
2. Seasonal Variations
Seasonal variations are variations which occur with some degree of regularity within a specific period
of one year or shorter. Seasons could be weekly, monthly, quarterly or half yearly depending on the
nature of the phenomenon. Production, consumption and prices of commodities show seasonal variations.
These variations are periodic and regular.
E.g., prices of agricultural commodities go down at the time of harvest.
Seasonal variations may occur due to the following causes:
(1) Climate and weather conditions
(2) Customs, traditions and habits
A study of seasonal variations is helpful in scheduling purchases, inventory control, personnel
recruitment, advertising, etc. A consumer can gain by purchasing things during the slack season.
3. Cyclical Variations
Cyclical variations in a time series are the recurrent variations whose duration is more than one year.
A business cycle has four phases: boom, recession, depression and recovery. The sequence of these phases
is uniform, but their duration may vary from cycle to cycle. In spite of the importance of measuring cyclical
variations, they are very difficult to measure for the following reasons:
(i) Business cycles do not show regular periodicity.
(ii) Cyclical variations are associated with erratic, random or irregular forces.
4. Irregular Variations
Irregular variations refer to those variations in business or other activities which do not repeat in a
definite pattern. They are caused by random factors like floods, earthquakes, famines, wars, strikes,
lockouts, etc. Sudden changes in demand or very rapid technological progress may also be responsible
for these variations. No advance preparation can be made to meet the consequences of irregular
variations, as their effects are unpredictable and irregular.
3. Given below are the figures of production (in thousand quintals) of a sugar factory.
Year 1974 1975 1976 1977 1978 1979 1980
Production 77 88 94 85 91 98 90
Fit a straight line by the least squares method and tabulate the trend values.
Solution.
Year    Production (y)   X = Year − 1977   X²   XY     Trend values
1974    77               −3                9    −231   83
1975    88               −2                4    −176   85
1976    94               −1                1    −94    87
1977    85               0                 0    0      89
1978    91               1                 1    91     91
1979    98               2                 4    196    93
1980    90               3                 9    270    95
Total   623              0                 28   56
The straight line trend equation is Y = a + bX.
Normal equations are
$$\sum y = na + b\sum X, \qquad \sum XY = a\sum X + b\sum X^2$$
Since ΣX = 0,
$$a = \frac{\sum y}{n} = \frac{623}{7} = 89, \qquad b = \frac{\sum XY}{\sum X^2} = \frac{56}{28} = 2$$
Therefore the Trend equation is Y = 89 + 2x
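The shortcut used here (a = Σy/n, b = ΣXY/ΣX²) relies on ΣX = 0, which holds when an odd number of consecutive years is coded from the middle year. A minimal Python sketch under that assumption; `least_squares_trend` is a hypothetical helper.

```python
def least_squares_trend(years, values):
    """Fit Y = a + bX with X = year - middle year (assumes odd n, consecutive years)."""
    n = len(years)
    origin = years[n // 2]                    # middle year, so sum of X is zero
    xs = [year - origin for year in years]
    a = sum(values) / n                       # valid only because sum(xs) == 0
    b = sum(x * v for x, v in zip(xs, values)) / sum(x * x for x in xs)
    trend = [a + b * x for x in xs]
    return origin, a, b, trend

years = list(range(1974, 1981))
production = [77, 88, 94, 85, 91, 98, 90]
origin, a, b, trend = least_squares_trend(years, production)
print(origin, a, b, trend)
```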

4. Find the two regression lines using the data below: (AU NOV/DEC 2013)
X 7 4 8 6 5
Y 6 5 9 8 2
Solution:
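ΣX = 30, ΣY = 30, ΣX² = 190, ΣY² = 210, ΣXY = 192, n = 5
X̄ = 6, Ȳ = 6
$$b_{xy} = \frac{n\sum xy - \sum x\sum y}{n\sum y^2 - (\sum y)^2} = \frac{5(192) - (30)(30)}{5(210) - 30^2} = \frac{60}{150} = 0.4$$
$$b_{yx} = \frac{n\sum xy - \sum x\sum y}{n\sum x^2 - (\sum x)^2} = \frac{5(192) - (30)(30)}{5(190) - 30^2} = \frac{60}{50} = 1.2$$
Regression line of y on x:
$$y - \bar{y} = b_{yx}(x - \bar{x}) \Rightarrow y - 6 = 1.2(x - 6) \Rightarrow y = 1.2x - 1.2$$
Regression line of x on y:
$$x - \bar{x} = b_{xy}(y - \bar{y}) \Rightarrow x - 6 = 0.4(y - 6) \Rightarrow x = 0.4y + 3.6$$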
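The two regression lines can be verified with the sketch below, which applies the same formulas for b_yx and b_xy and then solves for each intercept from the means. Plain Python; `regression_lines` is an illustrative name.

```python
def regression_lines(x, y):
    """Return (b_yx, c1, b_xy, c2) for y = b_yx*x + c1 and x = b_xy*y + c2."""
    n = len(x)
    sx, sy = sum(x), sum(y)
    sxy = sum(a * b for a, b in zip(x, y))
    sxx = sum(a * a for a in x)
    syy = sum(b * b for b in y)
    xbar, ybar = sx / n, sy / n
    byx = (n * sxy - sx * sy) / (n * sxx - sx ** 2)
    bxy = (n * sxy - sx * sy) / (n * syy - sy ** 2)
    c1 = ybar - byx * xbar   # intercept of the line of y on x
    c2 = xbar - bxy * ybar   # intercept of the line of x on y
    return byx, c1, bxy, c2

byx, c1, bxy, c2 = regression_lines([7, 4, 8, 6, 5], [6, 5, 9, 8, 2])
print(f"y = {byx:.1f}x + ({c1:.1f})")
print(f"x = {bxy:.1f}y + ({c2:.1f})")
```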
5. The following data give the production (in ‘000 units) of a commodity from 2006 to 2012. Fit a
straight line trend and forecast the production for the year 2020.
Year 2006 2007 2008 2009 2010 2011 2012
Production 6 7 5 4 6 7 5
(AU NOV/DEC 2013)
Solution:
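Trend line Y = a + bX, where X = year − 2009 (so that ΣX = 0).
Normal equations are ΣY = na + bΣX
ΣXY = aΣX + bΣX²
From the data, n = 7, ΣY = 40, ΣX = 0, ΣX² = 28 and
ΣXY = (−3)(6) + (−2)(7) + (−1)(5) + (0)(4) + (1)(6) + (2)(7) + (3)(5) = −2
a = ΣY/n = 40/7 = 5.714
b = ΣXY/ΣX² = −2/28 = −0.071
Y = 5.714 − 0.071X, X = year − 2009
When year = 2020, X = 11 and Y = 40/7 − (2/28)(11) ≈ 4.93 (thousand units)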
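A minimal Python check of this trend fit, computed directly from the seven production figures as printed in the question, with the origin at 2009 so that ΣX = 0. The variable names are illustrative.

```python
years = list(range(2006, 2013))
production = [6, 7, 5, 4, 6, 7, 5]

origin = 2009                            # middle year, so the X values sum to zero
xs = [year - origin for year in years]
a = sum(production) / len(production)    # a = sum(Y)/n, valid because sum(xs) == 0
b = sum(x * p for x, p in zip(xs, production)) / sum(x * x for x in xs)
forecast_2020 = a + b * (2020 - origin)  # X = 11 for the year 2020
print(round(a, 3), round(b, 4), round(forecast_2020, 2))
```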
6. The monthly water consumption (in thousand gallons) of a hostel for five years is given below.
Calculate the seasonal indices by the method of simple averages.
Year Jan Feb Mar Apr May June July Aug Sep Oct Nov Dec
1979 25 23 21 18 15 20 21 25 22 24 32 35
1980 27 25 23 20 17 22 23 27 24 26 35 33
1981 32 31 30 27 25 27 29 30 30 32 41 38
1982 42 40 38 36 34 37 38 40 38 43 52 48
1983 57 50 52 46 49 46 49 55 50 59 64 63
Solution:
Year     Jan    Feb    Mar    Apr    May    June   July   Aug     Sep    Oct     Nov     Dec
1979     25     23     21     18     15     20     21     25      22     24      32      35
1980     27     25     23     20     17     22     23     27      24     26      35      33
1981     32     31     30     27     25     27     29     30      30     32      41      38
1982     42     40     38     36     34     37     38     40      38     43      52      48
1983     57     50     52     46     49     46     49     55      50     59      64      63
Total    183    169    164    147    140    152    160    177     164    184     224     217
Avg.     36.6   33.8   32.8   29.4   28     30.4   32     35.4    32.8   36.8    44.8    43.4
Seasonal
Index    105.53 97.45  94.57  84.77  80.73  87.65  92.26  102.07  94.57  106.10  129.17  125.13
General Average = (sum of the 12 monthly averages)/12 = 416.2/12 = 34.68
$$\text{Seasonal Index} = \frac{\text{Monthly Average}}{\text{General Average}} \times 100$$
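The method of simple averages is easy to reproduce in code. The sketch below (plain Python; `simple_average_indices` is an illustrative name) totals each month over the five years, averages, divides by the general average (the mean of the twelve monthly averages) and multiplies by 100, working directly from the five rows of data in the question.

```python
data = {
    1979: [25, 23, 21, 18, 15, 20, 21, 25, 22, 24, 32, 35],
    1980: [27, 25, 23, 20, 17, 22, 23, 27, 24, 26, 35, 33],
    1981: [32, 31, 30, 27, 25, 27, 29, 30, 30, 32, 41, 38],
    1982: [42, 40, 38, 36, 34, 37, 38, 40, 38, 43, 52, 48],
    1983: [57, 50, 52, 46, 49, 46, 49, 55, 50, 59, 64, 63],
}

def simple_average_indices(rows):
    """Seasonal indices by simple averages: 100 * monthly average / general average."""
    columns = list(zip(*rows))                  # one tuple of five values per month
    totals = [sum(col) for col in columns]
    averages = [t / len(rows) for t in totals]
    general_average = sum(averages) / len(averages)
    indices = [100 * avg / general_average for avg in averages]
    return totals, averages, general_average, indices

totals, averages, general, indices = simple_average_indices(list(data.values()))
print(totals)
print(round(general, 2), [round(i, 2) for i in indices])
```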
7. The quarterly sales (in thousands of copies) for a specific education software over the past three
years are given in the following table.
2003 2004 2005
Quarter 1 170 180 190
Quarter 2 111 96 120
Quarter 3 270 280 290
Quarter 4 250 220 223
(i) Compute the four seasonal factors (Seasonal Indexes). Show all of your computations.
(ii) The trend for these data is Trend = 174 + 4t (t represents time, where t = 1 for Quarter 1 of 2003
and t = 12 for Quarter 4 of 2005). Forecast sales for the first quarter of 2006 using the trend and
seasonal indexes.
Show all of your computations. (AU JAN 2014)
Solution:
(i)
Quarter     2003   2004   2005   Quarter total   Quarter average   Seasonal index
Quarter 1   170    180    190    540             180               0.900
Quarter 2   111    96     120    327             109               0.545
Quarter 3   270    280    290    840             280               1.400
Quarter 4   250    220    223    693             231               1.155
Overall average = (180 + 109 + 280 + 231)/4 = 200
Seasonal index = Quarter average / Overall average (e.g., 180/200 = 0.900)

(ii) For Quarter 1 of 2006, t = 13, so Trend = 174 + 4t = 174 + 4(13) = 226.
Forecast = Trend × SI(Quarter 1) = (226)(0.9) = 203.40 (thousand copies).
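The same steps, seasonal indexes as quarter average over overall average and then trend times index, are sketched below in Python. The trend equation 174 + 4t is taken from the question; the variable names are illustrative.

```python
sales = {
    "Q1": [170, 180, 190],
    "Q2": [111, 96, 120],
    "Q3": [270, 280, 290],
    "Q4": [250, 220, 223],
}

quarter_avg = {q: sum(v) / len(v) for q, v in sales.items()}
overall_avg = sum(quarter_avg.values()) / len(quarter_avg)
seasonal_index = {q: avg / overall_avg for q, avg in quarter_avg.items()}

t = 13                      # Quarter 1 of 2006, continuing t = 1..12 for 2003-2005
trend = 174 + 4 * t         # trend equation given in the question
forecast = trend * seasonal_index["Q1"]
print(seasonal_index, trend, round(forecast, 2))
```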
Jeppiaar Nagar, Rajiv Gandhi Salai – 600 119
DEPARTMENT OF MASTER OF BUSINESS ADMINISTRATION
QUESTION BANK
I SEMESTER
BA 4101 – STATISTICS FOR MANAGEMENT
Regulation – 2021 (Batch: 2023 -2025)
Academic Year 2023 – 2024
Prepared by
Dr. P. SIVAGAMI, Assistant Professor/MATHEMATICS [S&H]
UNIT –I : PROBABILITY
PART – A
1. If A and B are independent events, then prove that A and $\overline{B}$ are independent.
Since A and B are independent,
$$P(A \cap B) = P(A)\,P(B) \qquad (1)$$
$$P(A \cap \overline{B}) = P(A) - P(A \cap B) = P(A) - P(A)\,P(B) \quad [\text{using } (1)]$$
$$= P(A)\,[1 - P(B)] = P(A)\,P(\overline{B})$$
$$\therefore P(A \cap \overline{B}) = P(A)\,P(\overline{B}) \Rightarrow A \text{ and } \overline{B} \text{ are independent events.}$$
2. Let A and B be two events such that P(A) = 1/3, P(B) = 3/4 and P(A ∩ B) = 1/4. Compute P(A | B) and
P($\overline{A}$ ∩ B). (May/June 2019)
$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{1/4}{3/4} = \frac{1}{3}$$
$$P(\overline{A} \cap B) = P(B) - P(A \cap B) = \frac{3}{4} - \frac{1}{4} = \frac{2}{4} = \frac{1}{2}$$
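3. If A and B are two events such that P(A ∪ B) = 3/4, P(A ∩ B) = 1/4 and P($\overline{A}$) = 2/3, find P($\overline{A}$ | B).
Here P(A) = 1 − P($\overline{A}$) = 1/3.
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) \Rightarrow \frac{3}{4} = \frac{1}{3} + P(B) - \frac{1}{4} \Rightarrow P(B) = \frac{2}{3}$$
$$P(\overline{A} \mid B) = \frac{P(\overline{A} \cap B)}{P(B)} = \frac{P(B) - P(A \cap B)}{P(B)} = \frac{\frac{2}{3} - \frac{1}{4}}{\frac{2}{3}} = \frac{5}{8}$$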
4. If P(A) = 5/13, P(B) = 3/7 and P(A ∩ B) = 12/91, find P(A ∪ B).
$$P(A \cup B) = P(A) + P(B) - P(A \cap B) = \frac{5}{13} + \frac{3}{7} - \frac{12}{91} = \frac{35 + 39 - 12}{91} = \frac{62}{91}$$
5. State Bayes' Theorem on Probability. (AU NOV/DEC 2013, APR/MAY 2018)
If E₁, E₂, …, Eₙ are a set of exhaustive and mutually exclusive events associated with a random experiment
and A is any other event associated with the Eᵢ, then
$$P(E_i \mid A) = \frac{P(E_i)\,P(A \mid E_i)}{\sum_{j=1}^{n} P(E_j)\,P(A \mid E_j)}, \qquad i = 1, 2, \ldots, n$$
6. What are mutually exclusive and independent events? (AU JAN 2015)
Two events are said to be mutually exclusive if the occurrence of either one of them excludes the occurrence
of the other in a single experiment. Example: getting a head and getting a tail in a single toss of a coin.
Two (or more) events are independent if the occurrence of one does not affect the occurrence of the other.
Example: if a coin is tossed twice, the result of the second toss is not affected by the result of the first toss.
7. Define Random variable. (AU JAN 2014)
A random variable is a function that assigns a real number X(S) to every element S in the sample space
corresponding to a random experiment E.
i.e., X: S → ℝ, where S is the sample space and ℝ is the set of real numbers.
8. Explain discrete and continuous variable with examples. (AU NOV/DEC 2013)
A random variable X is said to be discrete if it takes a finite or countably infinite number of values.
Example: X = 1, 2, 3, 4, 5
A random variable X is said to be continuous if it takes an uncountable number of values.
Example: X takes all values in an interval.
9. Let X be a discrete R.V. with probability mass function
$$P(X = x) = \begin{cases} \dfrac{x}{10}, & x = 1, 2, 3, 4 \\ 0, & \text{otherwise} \end{cases}$$
Compute P(X < 3) and E(X/2). (May/June 2016)
X = x      1      2      3      4
P(X=x)     1/10   2/10   3/10   4/10
$$P(X < 3) = P(X = 1) + P(X = 2) = \frac{1}{10} + \frac{2}{10} = \frac{3}{10}$$
$$E\left(\frac{X}{2}\right) = \frac{1}{2}E(X) = \frac{1}{2}\sum_{x=1}^{4} x\,P(X = x) = \frac{1}{2}\left(1\cdot\frac{1}{10} + 2\cdot\frac{2}{10} + 3\cdot\frac{3}{10} + 4\cdot\frac{4}{10}\right) = \frac{1}{2}\cdot\frac{1 + 4 + 9 + 16}{10} = \frac{1}{2}\cdot\frac{30}{10} = \frac{3}{2}$$
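With exact fractions, the two answers above can be confirmed in a few lines of Python (a sketch; the dictionary `pmf` simply encodes P(X = x) = x/10 for x = 1, 2, 3, 4).

```python
from fractions import Fraction

pmf = {x: Fraction(x, 10) for x in range(1, 5)}   # P(X = x) = x/10, x = 1..4

assert sum(pmf.values()) == 1                     # a valid PMF sums to one
p_less_than_3 = sum(p for x, p in pmf.items() if x < 3)
mean = sum(x * p for x, p in pmf.items())          # E(X)
mean_half = mean / 2                               # E(X/2) = E(X)/2 by linearity
print(p_less_than_3, mean_half)
```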

10. The CDF of a continuous random variable is given by
$$F(x) = \begin{cases} 0, & x < 0 \\ 1 - e^{-x/5}, & x \ge 0 \end{cases}$$
Find the PDF of X and the mean of X.
$$f(x) = \frac{d}{dx}F(x) = \begin{cases} 0, & x < 0 \\ \dfrac{1}{5}e^{-x/5}, & x \ge 0 \end{cases}$$
$$E(X) = \int_{0}^{\infty} x f(x)\,dx = \frac{1}{5}\int_{0}^{\infty} x e^{-x/5}\,dx = \frac{1}{5}\left[x\,\frac{e^{-x/5}}{-1/5} - (1)\,\frac{e^{-x/5}}{1/25}\right]_{0}^{\infty} = \frac{1}{5}\,[0 - (-25)] = 5$$
11. The mean of a Binomial distribution is 20 and its standard deviation is 4. Find the parameters of the
distribution.
$$np = 20, \quad \sqrt{npq} = 4 \Rightarrow npq = 16 \Rightarrow 20q = 16 \Rightarrow q = \frac{4}{5}, \quad p = 1 - q = \frac{1}{5}, \quad n = \frac{20}{p} = 100$$
∴ n = 100 and p = 1/5 are the parameters.
12. A random variable X has the following probability distribution:
X:      −2    −1    0     1     2     3
P(X):   0.1   K     0.2   2K    0.3   3K
Find K.
Since ΣP(X) = 1,
$$0.1 + K + 0.2 + 2K + 0.3 + 3K = 1 \Rightarrow 6K + 0.6 = 1 \Rightarrow 6K = 0.4 \Rightarrow K = \frac{0.4}{6} = \frac{1}{15}$$
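13. Find the probability of getting a total of 5 at least once in three tosses of a pair of fair dice.
A total of 5 arises from (1, 4), (2, 3), (3, 2) and (4, 1), so
$$p = \frac{4}{36} = \frac{1}{9}, \quad q = 1 - p = \frac{8}{9}, \quad n = 3$$
Let X = number of times a total of 5 is obtained; X ~ B(n, p), so P(X = x) = nCx pˣ qⁿ⁻ˣ.
$$P(X \ge 1) = 1 - P(X < 1) = 1 - P(X = 0) = 1 - {}^{3}C_{0}\left(\frac{1}{9}\right)^{0}\left(\frac{8}{9}\right)^{3} = 1 - \frac{512}{729} = \frac{217}{729} \approx 0.2977$$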
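A brief Python check of this binomial calculation, using exact fractions and counting the favourable dice outcomes by brute force. Note that `math.comb` requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

# Count ordered outcomes of two fair dice whose total is 5
favourable = sum(1 for a in range(1, 7) for b in range(1, 7) if a + b == 5)
p = Fraction(favourable, 36)          # probability of a total of 5 in one toss
q = 1 - p
n = 3                                 # three tosses of the pair
p_none = comb(n, 0) * p**0 * q**n     # P(X = 0)
p_at_least_once = 1 - p_none
print(p, q, p_at_least_once, float(p_at_least_once))
```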
14. One percent of jobs arriving at a computer system need to wait until weekends for scheduling, owing
to core-size limitations. Find the probability that among a sample of 200 jobs there are no jobs that
have to wait until weekends.
p = 0.01, n = 200, λ = np = 2, X = number of jobs that have to wait.
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-2}\,2^{x}}{x!} \Rightarrow P(X = 0) = \frac{e^{-2}\,2^{0}}{0!} = e^{-2} = 0.1353$$
15. The number of monthly breakdown of a computer is having a Poisson distribution with mean equal
to 1.8. Find the probability that this computer will function for a month with only one breakdown.
(AU MAY/JUNE 2019)
Mean = λ = 1.8, X = number of breakdowns per month.
$$P(X = x) = \frac{e^{-\lambda}\lambda^{x}}{x!} = \frac{e^{-1.8}(1.8)^{x}}{x!} \Rightarrow P(X = 1) = \frac{e^{-1.8}(1.8)^{1}}{1!} = 0.2975$$
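Questions 14 and 15 both use the Poisson probability law, which is short enough to evaluate directly. A minimal sketch; `poisson_pmf` is an illustrative helper, not a library function.

```python
import math

def poisson_pmf(k, lam):
    """P(X = k) = e^(-lam) * lam^k / k! for a Poisson random variable."""
    return math.exp(-lam) * lam**k / math.factorial(k)

p_no_wait = poisson_pmf(0, 200 * 0.01)    # Question 14: lambda = np = 2
p_one_breakdown = poisson_pmf(1, 1.8)     # Question 15: lambda = 1.8
print(round(p_no_wait, 4), round(p_one_breakdown, 4))
```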
16. If X is a uniformly distributed random variable with mean 1 and variance 4/3, find P(X < 0).
$$\text{Mean} = \frac{a + b}{2} = 1 \Rightarrow a + b = 2, \qquad \text{Variance} = \frac{(b - a)^2}{12} = \frac{4}{3} \Rightarrow (b - a)^2 = 16 \Rightarrow b - a = 4$$
Solving these equations, we get a = −1 and b = 3.
$$f(x) = \frac{1}{b - a} = \frac{1}{4}, \quad -1 < x < 3$$
$$P(X < 0) = \int_{-1}^{0} f(x)\,dx = \int_{-1}^{0} \frac{1}{4}\,dx = \frac{1}{4}\,[x]_{-1}^{0} = \frac{1}{4}$$
17. If the R.V. X is uniformly distributed over (−3, 3), compute P(|X − 2| < 2).
$$f(x) = \frac{1}{b - a} = \frac{1}{6}, \quad -3 < x < 3$$
Since f(x) = 0 for x ≥ 3,
$$P(|X - 2| < 2) = P(-2 < X - 2 < 2) = P(0 < X < 4) = P(0 < X < 3) = \int_{0}^{3} \frac{1}{6}\,dx = \frac{1}{6}\,[x]_{0}^{3} = \frac{1}{2}$$
18. Define Normal distribution.
A random variable X is said to have a Normal distribution with parameters μ (mean) and σ² (variance) if its
probability density function is given by the probability law
$$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^{2}}, \quad -\infty < x < \infty,\; -\infty < \mu < \infty,\; \sigma > 0$$
19. State Poisson distribution as limiting form of binomial distribution.

Poisson distribution is a limiting case of Binomial distribution under the following conditions:
(i) n, the number of trials, is indefinitely large, i.e., n → ∞.
(ii) p, the constant probability of success in each trial, is very small, i.e., p → 0.
(iii) np = λ is finite.
20. X is a normal variate with mean 30 and S.D. 5. Find P(26 < X < 40).
X follows N(30, 5), so μ = 30 and σ = 5.
Let Z = (X − μ)/σ be the standard normal variate.
$$P(26 < X < 40) = P\left(\frac{26 - 30}{5} < Z < \frac{40 - 30}{5}\right) = P(-0.8 < Z < 2)$$
$$= P(-0.8 < Z < 0) + P(0 < Z < 2) = P(0 < Z < 0.8) + P(0 < Z < 2) = 0.2881 + 0.4772 = 0.7653$$
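The table look-up can be cross-checked with the standard normal CDF, written here via `math.erf` (Φ(z) = ½[1 + erf(z/√2)]). Because the printed table values are rounded to four decimals, the sketch agrees with 0.7653 only to within about 0.0001; `normal_cdf` is an illustrative helper.

```python
import math

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via the error function."""
    return 0.5 * (1 + math.erf((x - mu) / (sigma * math.sqrt(2))))

prob = normal_cdf(40, 30, 5) - normal_cdf(26, 30, 5)   # P(26 < X < 40)
print(round(prob, 4))
```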
PART B
1a. A bag contains 3 black and 4 white balls. Two balls are drawn at random one at a time without
replacement. (1) What is the probability that the second ball drawn is white?
(2) What is the conditional probability that the first ball drawn is white if the second ball is known to
be white? (May/June 2019)
1b. An industrial unit has 3 machines, 1, 2 and 3, which produce the same item. It is known that machines
1 and 2 each produce 30% of the total output, while machine 3 produces the remaining 40%.
It is also known that 2% of machine 1's output is defective, while machines 2 and 3 each produce 3%
defective items. All the items are put into one stockpile and then one item is chosen at random. Find
the probability that a defective item was produced by machine 1. (AU JAN 2016)
1c. The first bag contains 3 white balls, 2 red balls and 4 black balls. Second bag contains 2 white, 3 red
and 5 black balls and third bag contains 3 white, 4 red and 2 black balls. One bag is chosen at random
and from it 3 balls are drawn. Out of the three balls, two are white and one is red. What are the
probabilities that they were taken from the first bag, the second bag and the third bag?
2a. A consulting firm rents cars from three rental agencies in the following manner: 20% from agency
D, 20% from agency E and 60% from agency F. If 10% cars from D, 12% of the cars from E and
4% of the cars from F have bad tyres, what is the probability that the firm will get a car with bad
tyres? Find the probability that a car with bad tyres is rented from agency F.
2b. A random variable X has the following probability function:
X:      0   1   2    3    4    5    6     7
P(X):   0   K   2K   2K   3K   K²   2K²   7K² + K
Find (i) K; (ii) P(X < 6), P(X ≥ 6) and P(0 < X < 5); (iii) the distribution function of X;
(iv) P(1.5 < X < 4.5 | X > 2); (v) E(3X − 4) and Var(3X − 4);
(vi) if P(X ≤ C) > 1/2, the minimum value of C. (April/May 2015)
3a. A random variable X has the probability mass function
$$f(x) = \frac{1}{2^{x}}, \quad x = 1, 2, 3, \ldots$$
Find its (i) M.G.F. (ii) Mean (iii) Variance.
3b. If the density function of a continuous random variable X is given by
$$f(x) = \begin{cases} ax, & 0 \le x \le 1 \\ a, & 1 \le x \le 2 \\ 3a - ax, & 2 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases}$$
Find the value of a and find the c.d.f of X , P(X  1.5 ).
4a. In the production of electric bulbs, the life of the bulbs was found to be normally distributed with an
average life of 2100 hours and a standard deviation of 80 hours. In a sample of 1500 bulbs, find the
expected number of bulbs likely to burn for
(i) more than 2200 hours,
(ii) less than 1950 hours,
(iii) more than 2000 hours but less than 2150 hours. (AU MAY/JUNE 2019) (APR/MAY 2018)
4b. Trains arrive at a station at 15-minute intervals starting at 4 a.m. If a passenger arrives at the station at
a time that is uniformly distributed between 9.00 a.m. and 9.30 a.m., find the probability that he has
to wait for the train for (i) less than 6 minutes (ii) more than 10 minutes.
5a. In a normal population with mean 15 and standard deviation 3.5, it is found that 647 observations
exceed 16.25. What is the total number of observations in the population?
5b. Derive the MGF, mean and variance of the Binomial and Poisson distributions.
UNIT –II : SAMPLING DISTRIBUTION AND ESTIMATION
PART – A
1. Define Population. (AU JAN 2016)
The group of individuals under study is called population. The population may be finite or infinite.
2. Define Sample and Sample Size. (AU JAN 2018)
A finite subset of statistical individuals in a population is called Sample. The number of individuals in a
sample is called Sample Size (n).
3. Define Parameter and Statistic. (AU JAN 2016)
A numerical measure of a population is called a population parameter or simply a parameter.
A numerical measure of the sample is called a sample statistic or simply a statistic.
4. Define Sampling distribution.
The sampling distribution of a statistic is the probability distribution of all possible values the statistic may
take, when computed from random samples of same size, drawn from a specified population. Like any
other distribution, a sampling distribution will have its mean, standard deviation and moments of higher
order.
5. Distinguish between estimate and estimator.
An estimator of a population parameter is a sample statistic used to estimate the parameter. An estimate of
the parameter is a particular numerical value of the estimator obtained by sampling.
6. What is Standard Error and mention its uses? (AU MAY/JUNE 2019)
The standard deviation of the sampling distribution of a statistic is known as its standard error.
The magnitude of the standard error gives an index of the reliability of the estimate of the parameter: the
greater the standard error, the less reliable the estimate. Standard error is useful for determining the
probable limits or confidence limits for an unknown parameter with a specified confidence coefficient.
Standard error is also used in testing of hypotheses.
7. Define Type I error and Type II error. (AU MAY/JUNE 2019)
Type I error: rejecting the null hypothesis H₀ when it is true (i.e., when it should be accepted).
Type II error: accepting the null hypothesis H₀ when it is false (i.e., when it should be rejected).
8. Define Critical region.
A region corresponding to a test statistic in the sample space which leads to the rejection of H₀ (the null
hypothesis) is called the critical region or region of rejection.
The region complementary to the critical region is called the region of acceptance.
9. Define level of significance.
The probability α (the probability of making a type I error) that a random value of the test statistic belongs
to the critical region is known as the level of significance. In other words, the level of significance is the size
of the type I error.
The levels of significance usually employed in testing of hypothesis are 5% and 1%.
10. Define Critical values or significant values.
The value of test statistic, which divides the critical (or rejection) region and acceptance region, is called
the critical value or significant value. It depends on the level of significance used and the alternative
hypothesis.
11. Write the two properties of the sampling distribution of the mean when the population is normally
distributed. (AU JAN 2016)
1. It has a mean equal to the population mean: μ_x̄ = μ.
2. It has a standard deviation equal to the population standard deviation divided by the square root of the
sample size:
$$\sigma_{\bar{x}} = \frac{\sigma}{\sqrt{n}}$$
where σ_x̄ is the standard error of the mean.
12. State the characteristics of best estimator.
i)Unbiasedness ii)Efficiency iii)Consistency iv)Sufficiency
13. Define One tailed test and two tailed test. (AU NOV/DEC 2013)
When the hypothesis about the population parameter is rejected only for the value of sample statistic
falling into one of the tails of the sampling distribution, then it is known as one-tailed test.
If the rejection region is in the right tail, the test is called a right-tailed test (one-sided alternative to the
right); if it is in the left tail, it is called a left-tailed test (one-sided alternative to the left). A two-tailed
test is one where the hypothesis about the population parameter is rejected for a value of the sample
statistic falling into either tail of the sampling distribution.
14. Given that n₁ = 400, x̄₁ = 250, s₁ = 40 for one sample and n₂ = 400, x̄₂ = 220, s₂ = 55 for another
sample, find the standard error of x̄₁ − x̄₂.
For large samples, the standard error of x̄₁ − x̄₂ is
$$S.E.(\bar{x}_1 - \bar{x}_2) = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = \sqrt{\frac{40^2}{400} + \frac{55^2}{400}} = \sqrt{4 + 7.5625} = \sqrt{11.5625} \approx 3.40$$
Therefore the standard error of x̄₁ − x̄₂ is approximately 3.40.
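For large samples the standard error of a difference of means is commonly computed as √(s₁²/n₁ + s₂²/n₂); the sketch below evaluates that formula for the given figures. `se_difference_of_means` is an illustrative name, and the large-sample formula is the assumption of this sketch.

```python
import math

def se_difference_of_means(s1, n1, s2, n2):
    """Large-sample standard error of (x1_bar - x2_bar): sqrt(s1^2/n1 + s2^2/n2)."""
    return math.sqrt(s1**2 / n1 + s2**2 / n2)

se = se_difference_of_means(40, 400, 55, 400)
print(round(se, 4))
```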

15. Write 95% confidence interval of the population mean. (AU MAY/JUNE 2014)
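$$\bar{x} - t_{0.05}\,\frac{S}{\sqrt{n - 1}} < \mu < \bar{x} + t_{0.05}\,\frac{S}{\sqrt{n - 1}}$$
where t₀.₀₅ is the 5% (two-tailed) table value of t with n − 1 degrees of freedom.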
16. You want to determine whether the mean of the population from which this sample was taken is
significantly different from 48. State the null and the alternate hypothesis. (AU MAY/JUNE 2014)
Null hypothesis 𝐻0 : 𝜇 = 48 and Alternate hypothesis 𝐻1 : 𝜇 ≠ 48.
17. For a test market, find the sample size needed to estimate the true proportion of consumers satisfied
with a certain new product within 0.04 at the 90% confidence level.
If the proportion is not given, take p = q = 0.5. E = 0.04, Z = 1.645.
$$n = \frac{Z^2\,pq}{E^2} = \frac{(1.645)^2(0.5)(0.5)}{(0.04)^2} = 422.8 \approx 423$$
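The sample-size formula can be wrapped in a small helper. This is a sketch: `sample_size_for_proportion` is an illustrative name, it defaults to the conservative p = 0.5, and it always rounds up because a fractional sample size is not possible.

```python
import math

def sample_size_for_proportion(z, margin, p=0.5):
    """Smallest n with z * sqrt(p(1-p)/n) <= margin; p = 0.5 is the conservative choice."""
    return math.ceil(z**2 * p * (1 - p) / margin**2)

print(sample_size_for_proportion(1.645, 0.04))   # 90% confidence, error 0.04
```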
18. State central limit theorem. (AU JAN 2014, 2015)
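If random samples of size n are drawn from any population with mean μ and finite standard deviation σ, then
as n becomes large the sampling distribution of the sample mean x̄ approaches a normal distribution with mean
μ and standard deviation σ/√n, even if the population itself is not normally distributed.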
19. Differentiate between point estimate and interval estimate (AU JAN 2015)
Point estimate: When a single value is used as an estimate, the estimate is called a point estimate of the
population parameter. For example, the sample mean is the sample statistic used as an estimate of the
population mean.
Interval estimate: An estimate of a population parameter given by two numbers between which the parameter
may be considered to lie is called an interval estimate of the parameter.
20. Write the confidence interval for the population mean for large samples when σ is known.
The confidence interval for μ when σ is known and sampling is done from a normal population, or with a
large sample, is
$$\bar{x} \pm Z_{\alpha/2}\,\frac{\sigma}{\sqrt{n}}$$
Here x̄ is the sample mean, σ the population standard deviation and n the size of the sample.
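A minimal sketch of this interval in Python. The inputs (x̄ = 50, σ = 10, n = 100) are hypothetical, chosen only to show the arithmetic, and z = 1.96 is the usual value for 95% confidence; `z_interval` is an illustrative name.

```python
import math

def z_interval(mean, sigma, n, z=1.96):
    """Return (x_bar - z*sigma/sqrt(n), x_bar + z*sigma/sqrt(n)); z = 1.96 gives 95%."""
    margin = z * sigma / math.sqrt(n)
    return mean - margin, mean + margin

# Hypothetical inputs, for illustration only
low, high = z_interval(mean=50.0, sigma=10.0, n=100)
print(round(low, 2), round(high, 2))
```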
PART – B
1a. Below you are given the values obtained from a random sample of 4 observations taken from an
infinite population.
32, 34, 35, 39
(i) Find a point estimate for µ. Is this an unbiased estimate of µ? Explain.
(ii) Find a point estimate for σ². Is this an unbiased estimate of σ²? Explain.
(iii) Find a point estimate for σ.
(iv) What can be said about the sampling distribution of x̄? Be sure to discuss the expected value,
the standard deviation and the shape of the sampling distribution of x̄. (AU JAN 2014)
1b. A sample of 25 observations is taken from a normal distribution with mean 98.6 and standard deviation
17.2. What is P(92 < x̄ < 102)? (AU JAN 2016)
2a. Explain the types of estimation and the qualities of a good estimator. (AU NOV/DEC 2013)
2b. In a random sample of 75 axle shafts, 12 have a surface finish that is rougher than the specifications
will allow. Suppose that a modification is made in the surface finishing process and subsequently a
second random sample of 85 axle shafts is obtained. The number of defective shafts in this second
sample is 10. Obtain an approximate 95% confidence interval on the difference in the proportions of
defectives produced under the two processes. (AU JAN 2016)
3a. In a batch chemical process used for etching printed circuit boards, two different catalysts are being
compared to determine whether they require different immersion times for removal of identical
quantities of photoresist material. Twelve batches were run with catalyst 1, resulting in a sample mean
immersion time of 24.6 minutes and a sample standard deviation of 0.85 minutes. Fifteen batches were
run with catalyst 2, resulting in a mean immersion time of 22.1 minutes and a standard deviation of
0.98 minutes. Find a 95% confidence interval on the difference in means, assuming that σ₁² = σ₂².
Also find a 90% confidence interval on the ratio of variances. (AU JAN 2016)
3b. From a population of 540, a sample of 60 individuals is taken. From this sample, the mean is found
to be 6.2 and the standard deviation 1.368.
(1) Find the estimated standard error of the mean.
(2) Construct a 90 percent confidence interval for the mean. (AU MAY/JUNE 2019)
4. Discuss the various non-probability sampling methods in use, with their applications.
5a. A bank has kept records of the checking balances of its customers and determined that the average
daily balance of its customers is Rs. 300 with a standard deviation of Rs. 48. A random sample of 144
checking accounts is selected.
(i) What is the probability that the sample mean will be more than Rs. 306.60?
(ii) What is the probability that the sample mean will be less than Rs. 308?
(iii) What is the probability that the sample mean will be between Rs.302 and Rs. 308?
(iv) What is the probability that the sample mean will be at least Rs. 296? (AU MAY/JUNE 2014)
5b. A certain city is studied for demographic characteristics. It is found that the age has a standard
deviation of 5.3 years and 60% of the population is female. What should be the sample size if the age
is to be estimated with an error of less than 1 year? What should be the sample size if a similar
estimation is to be done on the proportion of female population if the desired accuracy is to be within
5%? If the sample average age is found to be 37.25 for a sample size of 300, estimate the population
age range with a confidence level of 95%. (AU MAY 2020)
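A sketch of the three calculations in 5b. The question states 95% confidence only for the interval, so using z = 1.96 for the two sample-size parts is an assumption, as is reading "within 5%" as an absolute error of 0.05 in the proportion.

```python
import math

z = 1.96   # assumed 95% confidence for the sample-size parts (not stated in the question)

# Age: n = (z * sigma / E)^2, rounded up
sigma_age, e_age = 5.3, 1.0
n_age = math.ceil((z * sigma_age / e_age) ** 2)

# Proportion female: n = z^2 * p * q / E^2, rounded up
p, e_prop = 0.60, 0.05
n_prop = math.ceil(z ** 2 * p * (1 - p) / e_prop ** 2)

# 95% interval for the mean age from a sample of 300
xbar, n = 37.25, 300
margin = z * sigma_age / math.sqrt(n)
print(n_age, n_prop, f"({xbar - margin:.2f}, {xbar + margin:.2f})")
```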
5c. The lifetime of a certain brand of electric bulb may be considered as a random variable with
mean 1200 h and standard deviation 250 h. Find the probability, using the central limit theorem, that the
average lifetime of 60 bulbs exceeds 1250 hours.
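By the central limit theorem the mean lifetime of 60 bulbs is approximately normal with standard error 250/√60. A minimal sketch of the resulting tail probability:

```python
import math
from statistics import NormalDist

mu, sigma, n = 1200, 250, 60
se = sigma / math.sqrt(n)                 # standard error of the mean lifetime
z = (1250 - mu) / se
p = 1 - NormalDist().cdf(z)               # P(sample mean > 1250)
print(f"z = {z:.2f}, P = {p:.4f}")
```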
UNIT – III: TESTING OF HYPOTHESIS – PARAMETRIC TESTS.
PART – A
1. Define t-statistic.
The t-distribution is used when the sample size is 30 or less and the population standard deviation is
unknown. The t-statistic is defined as $t = \frac{\bar{x} - \mu}{s/\sqrt{n}}$, where $s^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$. The t-distribution has been
derived mathematically under the assumption of a normally distributed population.
2. List out the applications of t-distribution. (AU NOV/DEC 2017) (AU APR/MAY 2018)
1. To test the significant difference between the means of two independent samples.
2. To test the significant difference between the means of two dependent samples or
paired observations.
3. To test the significance of the mean of a random sample.
4. To test the significance of an observed correlation coefficient.
3. Mention the Properties of t – distribution.
1. The t-distribution ranges from −∞ to +∞.
2. Like the standard normal distribution, the t-distribution is bell-shaped and symmetrical about mean
zero.
3. The variance of the t-distribution, v/(v − 2), is greater than one and is defined only when v ≥ 3 (that is, v > 2), where v is the number of degrees of freedom.
4. What is the purpose of F – test?
The F-test refers to a test of hypothesis concerning two variances derived from two samples. It is used to test
whether the two sample variances are equal or not, that is, $F = \frac{S_1^2}{S_2^2}$ with $S_1^2 > S_2^2$. Thus the F-statistic is the ratio of
independent estimates of population variances.
5. What are the assumptions on which F-test is based?
1. Normality: The values in each group should be normally distributed.
2. Independence of errors: The variation of each value around its own group mean (the error) should be
independent of the errors of the other values.
3. Homogeneity: The variances within each group should be equal for all groups.
6. When to use the normal and ‘t’ distribution in making tests of hypothesis about means? (JAN 2016)
When the sample size is greater than 30, the sample is said to be large; when it is 30 or less, it is said to be small.
For large samples we use the normal distribution (z-test), and for small samples we use the t-test.
7. Estimate the standard error of the difference between the two proportions if $\bar{p}_1 = 0.10$, $\bar{p}_2 = 0.1333$, $n_1 = 50$ and $n_2 = 75$. (AU JAN 2016)
Let us first calculate the weighted average of $p_1$ and $p_2$: $p = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = 0.1199$, and $q = 1 - p$.
S.E. $= \sqrt{pq\left(\frac{1}{n_1} + \frac{1}{n_2}\right)} = 0.05931$.
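A minimal sketch of the pooled standard-error calculation above. Keeping full precision in p (0.11998 rather than 0.1199) gives about 0.0593, which agrees with the hand answer to the precision shown.

```python
import math

p1, n1 = 0.10, 50
p2, n2 = 0.1333, 75

p = (n1 * p1 + n2 * p2) / (n1 + n2)     # pooled (weighted) proportion
q = 1 - p
se = math.sqrt(p * q * (1 / n1 + 1 / n2))
print(f"p = {p:.4f}, S.E. = {se:.4f}")
```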
8. What do you mean by degrees of freedom?
Degrees of freedom are the total number of observations minus the number of independent constraints
imposed on the observations.
9. Define Experimental Error
Experimental error is the variation in the observations that is due to chance (uncontrolled) factors. In an
experiment, the variation due to each of the independent (assignable) factors is estimated separately and
compared with this estimate of the variation due to chance.
10. State the assumptions of Student’s t – test.
1. The sample observations are independent.
2. The parent population from which the sample is drawn is normal.
3. The population standard deviation σ is unknown.
11. Define Local Control.
When the number of treatments becomes large, it may not be possible to accommodate all the treatments in
one block because that will increase heterogeneity within blocks. The process of making the experimental
units homogeneous and reducing the experimental error is known as local control.
12. What are the Properties of F- distribution? (AU MAY/JUNE 2019)
1. The value of F is always positive or zero, since variances are squares and can never assume
negative values. Its value always lies between 0 and ∞.
2. The shape of the F-distribution depends upon the numerator and denominator degrees of freedom.
3. The F-distribution is positively skewed.
13. Define Analysis of Variance.
Analysis of Variance is a technique that will enable us to test for the significance of the difference among
more than two sample means.
14. What are the assumptions of analysis of variance?
(i) The sample observations are independent.
(ii) The environmental effects are additive in nature.
(iii) The sample observations come from normal populations.
15. Distinguish between z-test and t-test. (AU JAN 2015)
z-test | t-test
(i) Used for large samples | (i) Used for small samples
(ii) Follows the normal distribution | (ii) Follows Student's t-distribution
16. Define one way classification and two way classifications in ANOVA.
In a one-way classification, the observations are classified according to a single factor. In a two-way
classification, the observations are classified according to two factors.
17. What are the basic principles of design of experiments? (APR/MAY 2018)
(i) Randomization (ii) Replication (iii) Local Control
18. What are the usual assumptions made in the analysis of a randomized block Experiment?
(i) The experimental units within each block are homogeneous.
(ii) Each treatment is replicated r times (once in each block).
19. Write down the ANOVA table for One way classification
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio
Between Samples | SSC | K − 1 | MSC = SSC/(K − 1) | F_C = MSC/MSE
Within Samples | SSE | N − K | MSE = SSE/(N − K) |
20. Write down the ANOVA table for Randomized Block Design
Source of Variation | Sum of Squares | Degrees of Freedom | Mean Square | F-Ratio
Column Treatments | SSC | c − 1 | MSC = SSC/(c − 1) | F_C = MSC/MSE
Row Treatments | SSR | r − 1 | MSR = SSR/(r − 1) | F_R = MSR/MSE
Error (or) Residual | SSE | (r − 1)(c − 1) | MSE = SSE/[(r − 1)(c − 1)] |
PART – B
1a. The following are the numbers of mistakes made on 5 successive days by 4 technicians working for a
photographic laboratory. Test whether the differences among the four sample means can be attributed
to chance. (Test at a level of significance α = 0.01.)
Technicians
I II III IV
6 14 10 9
14 9 12 12
10 12 7 8
8 10 15 10
11 14 11 11
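A minimal one-way ANOVA sketch for 1a, following the one-way classification table in Part A (technicians as samples, days as replicates). The helper name `one_way_anova` is illustrative, and the critical value F(0.01; 3, 16) = 5.29 is a table value, not computed.

```python
def one_way_anova(groups):
    """Illustrative helper (hypothetical name): return (SSC, SSE, F) for a one-way classification."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    # Between-sample sum of squares
    ssc = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-sample sum of squares
    sse = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    f = (ssc / (k - 1)) / (sse / (n_total - k))
    return ssc, sse, f

technicians = [
    [6, 14, 10, 8, 11],    # I
    [14, 9, 12, 10, 14],   # II
    [10, 12, 7, 15, 11],   # III
    [9, 12, 8, 10, 11],    # IV
]
ssc, sse, f = one_way_anova(technicians)
F_CRIT = 5.29   # F(0.01; 3, 16) from a standard F-table
print(f"SSC = {ssc:.2f}, SSE = {sse:.2f}, F = {f:.2f}")
print("Reject H0" if f > F_CRIT else "Do not reject H0")
```

Here F is well below the critical value, so the differences can be attributed to chance.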
1b. Of 40 people attacked by a disease, only 36 survived. Will you reject the hypothesis that the
survival rate, if attacked by this disease, is 85% at the 5% level of significance?
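A short sketch of the large-sample test for a single proportion, with the standard error computed under the hypothesised rate P = 0.85 and a two-tailed 5% critical value of 1.96.

```python
import math

n, survived = 40, 36
P0 = 0.85                                  # hypothesised survival rate
p_hat = survived / n
se = math.sqrt(P0 * (1 - P0) / n)          # standard error under H0
z = (p_hat - P0) / se
print(f"z = {z:.2f}")
print("Reject H0" if abs(z) > 1.96 else "Do not reject H0")
```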
2a. Two independent samples of 8 and 7 items respectively had the following values.
Sample I 9 11 13 11 15 9 12 14
Sample II 10 12 10 14 9 8 10 -----
Is the difference between the means of the samples significant? (AU APR/MAY 2018)
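A minimal sketch of the two-sample t-test with a pooled variance, working directly from the raw values. The critical value t(0.025, 13) = 2.160 is a table value and the 5% level is assumed, since the question does not state one.

```python
import math
from statistics import mean, variance

sample1 = [9, 11, 13, 11, 15, 9, 12, 14]
sample2 = [10, 12, 10, 14, 9, 8, 10]
n1, n2 = len(sample1), len(sample2)

# Pooled variance from the (n - 1)-divisor sample variances of each group
sp2 = ((n1 - 1) * variance(sample1) + (n2 - 1) * variance(sample2)) / (n1 + n2 - 2)
t = (mean(sample1) - mean(sample2)) / math.sqrt(sp2 * (1 / n1 + 1 / n2))

T_CRIT = 2.160   # two-tailed 5% value of t with 13 df, from a t-table
print(f"t = {t:.2f}, df = {n1 + n2 - 2}")
print("Significant" if abs(t) > T_CRIT else "Not significant")
```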
2b. The following table shows the yields per acre of four different plant crops grown on lots treated with
three different types of fertilizer. Determine at the 5% significance level whether there is a difference
in yield per acre
(i) due to the fertilizers and
(ii) due to the crops
Crop - I Crop - II Crop - III Crop - IV
Fertilizer A 4.5 6.4 7.2 6.7
Fertilizer B 8.8 7.8 9.6 7.0
Fertilizer C 5.9 6.8 5.7 5.2
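A sketch of the two-way ANOVA (one observation per cell) that 2b calls for, using the correction-factor shortcut. The helper name `two_way_anova` is illustrative, and the 5% critical values F(2, 6) = 5.14 and F(3, 6) = 4.76 are table values.

```python
def two_way_anova(table):
    """Illustrative helper (hypothetical name): two-way ANOVA, one value per cell.
    Returns (F_rows, F_columns)."""
    r, c = len(table), len(table[0])
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / (r * c)                      # correction factor
    tss = sum(x ** 2 for row in table for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in table) / c - cf    # between rows (fertilizers)
    ssc = sum(sum(row[j] for row in table) ** 2 for j in range(c)) / r - cf  # between columns (crops)
    sse = tss - ssr - ssc
    mse = sse / ((r - 1) * (c - 1))
    return (ssr / (r - 1)) / mse, (ssc / (c - 1)) / mse

yields = [
    [4.5, 6.4, 7.2, 6.7],   # Fertilizer A
    [8.8, 7.8, 9.6, 7.0],   # Fertilizer B
    [5.9, 6.8, 5.7, 5.2],   # Fertilizer C
]
f_fert, f_crop = two_way_anova(yields)
print(f"F (fertilizers) = {f_fert:.2f}  vs F(0.05; 2, 6) = 5.14")
print(f"F (crops)       = {f_crop:.2f}  vs F(0.05; 3, 6) = 4.76")
```

On these figures the fertilizer effect is significant at 5% and the crop effect is not.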
3a. Given a sample mean of 83, a sample standard deviation of 12.5 and a sample size of 22, test the
hypothesis that the value of the population mean is 70 against alternative that it is more than 70. Use
the 0.025 significance level. (AU JAN 2016)
3b. A random sample of 10 boys had the following I.Q’s: 70, 120, 110, 101, 88, 83, 95, 98, 107, 100. Do
these data support the assumption of a population mean I.Q of 100? Find the reasonable range in
which most of the mean I.Q values of samples of 10 boys lie.
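A sketch of 3b: a one-sample t-test of μ = 100, and the "reasonable range" read as the 95% interval x̄ ± t·s/√n. That reading, and the table value t(0.025, 9) = 2.262, are assumptions.

```python
import math
from statistics import mean, stdev

iq = [70, 120, 110, 101, 88, 83, 95, 98, 107, 100]
n = len(iq)
xbar, s = mean(iq), stdev(iq)          # stdev uses the (n - 1) divisor
se = s / math.sqrt(n)
t = (xbar - 100) / se

T_CRIT = 2.262   # t(0.025, 9) from a t-table
low, high = xbar - T_CRIT * se, xbar + T_CRIT * se
print(f"t = {t:.2f}; 95% range for the mean: ({low:.2f}, {high:.2f})")
```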
4a. The following table gives biological values of a protein from cow’s milk and buffalo’s milk at a certain
level. Examine whether the average values of protein in the 2 samples differ significantly. (APR/MAY 2018)
Cow’s milk 1.82 2.02 1.88 1.61 1.81 1.54
Buffalo’s milk 2.00 1.83 1.86 2.03 2.19 1.88
4b. A lathe is set to cut bars of steel into lengths of 6 centimeters. The lathe is considered to be in perfect
adjustment if the average length of the bars it cuts is 6 centimeters. A sample of 121 bars is selected
randomly and measured. It is determined that the average length of the bars in the sample is 6.08
centimeters with a standard deviation of 0.44 centimeters.
(i) Formulate the hypotheses to determine whether or not the lathe is in perfect adjustment.
(ii) Compute the test statistic. (iii) What is your conclusion? (AU JAN 2014)
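Because n = 121 is large, the z-test applies. This minimal sketch assumes a two-tailed test at the 5% level (critical value 1.96), since the question does not state a level.

```python
import math

mu0, n, xbar, s = 6.0, 121, 6.08, 0.44
z = (xbar - mu0) / (s / math.sqrt(n))     # large sample, so the z-test is used
print(f"z = {z:.2f}")
print("Reject H0: lathe out of adjustment" if abs(z) > 1.96 else "Do not reject H0")
```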
5a. The daily production rates for a sample of factory workers before and after a training program are
shown below. Let d=After – Before.
Worker Before After
1 6 9
2 10 12
3 9 10
4 8 11
5 7 9
We want to determine if the training program was effective.
(i) Give the hypotheses for this problem.
(ii) Compute the test statistic.
(iii) At 95 % confidence, test the hypotheses. That is, did the training program actually increase the
production rates? (AU JAN 2019)
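A sketch of the paired t-test on d = After − Before, with H0: μ_d = 0 against H1: μ_d > 0. The one-tailed critical value t(0.05, 4) = 2.132 is a table value.

```python
import math
from statistics import mean, stdev

before = [6, 10, 9, 8, 7]
after = [9, 12, 10, 11, 9]
d = [a - b for a, b in zip(after, before)]   # d = After - Before

t = mean(d) / (stdev(d) / math.sqrt(len(d)))
T_CRIT = 2.132   # one-tailed t(0.05, 4) from a t-table
print(f"d = {d}, t = {t:.2f}")
print("Reject H0: training increased output" if t > T_CRIT else "Do not reject H0")
```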
5b. The following table shows the lifetimes in hours of samples from three different types of television
tubes manufactured by a company. Determine whether there is a difference between the three types
at the 0.01 significance level.
Sample 1 407 411 409
Sample 2 404 406 408 405 402
Sample 3 410 408 406 408
UNIT – IV: NON-PARAMETRIC TESTS
PART – A
1. State any two properties of χ² distribution.
1. The exact shape of the distribution depends upon the number of degrees of freedom n. In general, when n
is small, the curve is skewed to the right, and as n gets larger, the distribution becomes more
and more symmetrical.
2. The mean and variance of the distribution are n and 2n respectively.
3. The sum of independent χ² variates is also a χ² variate.
2. Explain the various uses of Chi-square test.
1. Test of goodness of fit
2. Test of independence of attributes
3. Test of homogeneity of independent estimates of the population correlation coefficient
3. What are the conditions for the validity of Chi-square test? (AU MAY/ JUNE 2016)
1. The experimental data must be independent of each other.
2. The total frequency must be reasonably large, say at least 50.
3. No individual (expected) frequency should be less than 5. If any frequency is less than 5, it is pooled
with the preceding or succeeding frequency so that the pooled frequency is at least 5. Finally, adjust the
degrees of freedom for those lost in pooling.
4. Write the formula for chi square test of single standard deviation. (AU MAY/JUNE 2014)
The formula is $\chi^2 = \frac{(n-1)s^2}{\sigma^2}$.
5. What are the uses of χ² test?
1. To test the homogeneity of independent estimates of the population variances.
2. To test the goodness of fit.
3. To test for independence of attributes.
6. Explain the Chi – square test as a test of independence.
It is applied to test the association between the attributes when the sample data is presented in the form of a
contingency table with any number of rows or columns.
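The expected frequency of the cell in row i and column j is $E_{ij} = \frac{R_i \times C_j}{G.T.}$, where $R_i$ and $C_j$ are the
row and column totals and G.T. = Grand Total.
If the calculated value of χ² is less than the tabulated value, then H₀ (independence) is accepted.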
7. What are the disadvantages of Non Parametric tests?
1. They ignore a certain amount of information
2. They are often not as efficient or sharp as parametric tests.
3. The non-parametric tests cannot be used to estimate parameters in the population or the confidence
intervals for such parameters.
8. What is meant by Non Parametric test? (AU NOV/DEC 2013)
A non-parametric test is a test that does not make any assumption regarding the form of the population from
which the sample is drawn. Such tests are often called distribution-free methods.
9. Name any four Non parametric test. (AU JAN 2015)
1. Sign Test for paired data
2. Rank Sum tests
(a) Mann-Whitney U-test (b) Kruskal-Wallis test or H test
3. Rank correlation test
4. One sample run test.
10. Define the statistic used in the U-test and give its mean.
$U = n_1 n_2 + \frac{n_1(n_1+1)}{2} - R_1$, Mean $= \frac{n_1 n_2}{2}$, where $R_1$ is the sum of the ranks of the first sample.
11. Define the statistic used in the H-test. (AU MAY/JUNE 2019)
$H = \frac{12}{n(n+1)}\sum_{i=1}^{k}\frac{R_i^2}{n_i} - 3(n+1)$
12. When is the Kruskal-Wallis test used? (NOV/DEC 2017)
The Kruskal-Wallis test is used to test whether three or more populations are identical or not. The K-W
test is based on the analysis of independent random samples from each of the k populations.
13. Define the Kolmogorov-Smirnov test.
It is a simple non-parametric test for testing whether there is a significant difference between an observed
frequency distribution and a theoretical frequency distribution. It is another measure of goodness of
fit, based on the statistic $D_n = \max|F_e - F_o|$, where $F_e$ and $F_o$ are the expected and observed cumulative relative frequencies.
14. When is the Mann-Whitney U-test used?
The U-test is used to test whether two populations are identical or not. The U-test is based on the
analysis of independent random samples from the two populations.
15. What are the advantages of the Kolmogorov-Smirnov test?
(i) It is a more powerful test than the χ² goodness-of-fit test.
(ii) It is easier to use since it does not require that the data be grouped in any way.
16. Write any two advantages of non-parametric methods over parametric methods.
1. They do not require us to make the assumption that a population is distributed in the shape of a normal
curve or another specific shape.
2. Generally they are easier to do and to understand.
3. Sometimes even formal ordering or ranking is not required.
17. When is the sign test used?
1. When there are pairs of observations on the two things being compared.
2. For any given pair, each of two observations is made under similar conditions.
3. No assumptions are made regarding the parent population.
18. Write the formula for the run test. (AU APR/MAY 2018)
Let R be the number of runs, n1 = number of items in the first sample and n2 = number of items in the second
sample.
Here, R is approximated by the normal distribution:
$Z = \frac{R - E(R)}{\sqrt{V(R)}} \sim N(0, 1)$,
where $E(R) = \frac{2 n_1 n_2}{n_1 + n_2} + 1$ and $V(R) = \frac{2 n_1 n_2 (2 n_1 n_2 - n_1 - n_2)}{(n_1 + n_2)^2 (n_1 + n_2 - 1)}$.
19. Find the number of runs for the following series MMFFFMFFMMMM.
Number of runs = R = 5.
20. Find the number of runs for the following series HTHHHHHTTTHTHTHHTTTHHTTTH .
Number of runs = R = 13 .
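Counting runs amounts to counting maximal blocks of identical symbols. A small sketch using `itertools.groupby` (the helper name `count_runs` is illustrative):

```python
from itertools import groupby

def count_runs(sequence):
    """Illustrative helper: number of runs = number of maximal blocks of identical symbols."""
    return sum(1 for _ in groupby(sequence))

print(count_runs("MMFFFMFFMMMM"))                 # 5
print(count_runs("HTHHHHHTTTHTHTHHTTTHHTTTH"))    # 13
```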
PART – B
1a. Two sample polls of votes for two candidates A and B for a public office were taken, one from
among the residents of rural areas and one from the residents of urban areas. The results are given
in the table. Examine whether the nature of the area is related to voting preference in this election.
(AU NOV/DEC 2017)
Area | Votes for A | Votes for B | Total
Rural | 620 | 380 | 1000
Urban | 550 | 450 | 1000
Total | 1170 | 830 | 2000
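A sketch of the χ² test of independence, using the expected-frequency rule E_ij = R_i × C_j / G.T. from Part A. The helper name is illustrative, no Yates continuity correction is applied (an assumption), and the 5% critical value for 1 df, 3.841, is a table value.

```python
def chi_square_independence(observed):
    """Illustrative helper: chi-square statistic and df for an r x c contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(observed):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / grand_total   # E_ij = R_i * C_j / G.T.
            chi2 += (o - e) ** 2 / e
    df = (len(observed) - 1) * (len(observed[0]) - 1)
    return chi2, df

votes = [[620, 380],    # Rural: A, B
         [550, 450]]    # Urban: A, B
chi2, df = chi_square_independence(votes)
print(f"chi-square = {chi2:.2f}, df = {df}")   # compare with 3.841 at 5%, df = 1
```

The same function accepts larger tables, such as the party-affiliation table in 4a.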
1b. An experiment designed to compare three preventive methods against corrosion yielded the following
maximum depths of pits (in thousandths of an inch) in pieces of wire subjected to the respective
treatments:
Method A: 77 54 67 74 71 66
Method B: 60 41 59 65 62 64 52
Method C: 49 52 69 47 56
Use the Kruskal-Wallis test at the 5% level of significance to test the null hypothesis that the three
samples come from identical populations.
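A sketch of the H statistic from Part A, ranking the pooled observations and giving tied values the average rank. No correction for ties is applied (an assumption), and χ²(0.05, 2) = 5.991 is a table value. The helper names are illustrative.

```python
def average_ranks(values):
    """Illustrative helper: map each distinct value to its pooled rank, averaging ties."""
    ordered = sorted(values)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + j) / 2 + 1   # mean of 1-based positions i+1 .. j+1
        i = j + 1
    return ranks

def kruskal_wallis_h(samples):
    """Illustrative helper: H = 12/(n(n+1)) * sum(R_i^2 / n_i) - 3(n+1)."""
    pooled = [x for s in samples for x in s]
    n = len(pooled)
    rank = average_ranks(pooled)
    total = sum(sum(rank[x] for x in s) ** 2 / len(s) for s in samples)
    return 12 / (n * (n + 1)) * total - 3 * (n + 1)

method_a = [77, 54, 67, 74, 71, 66]
method_b = [60, 41, 59, 65, 62, 64, 52]
method_c = [49, 52, 69, 47, 56]
h = kruskal_wallis_h([method_a, method_b, method_c])
print(f"H = {h:.2f}")   # compare with chi-square(0.05, 2) = 5.991
```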
2a. Use the sign test to see if there is a difference between the number of days until collection of an
account receivable before and after a new collection policy. Use the 0.05 significance level.
Before: 30 28 34 35 40 42 33 38 34 45 28 27 25 41 36
After : 32 29 33 32 37 43 40 41 37 44 27 33 30 38 36
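A sketch of the sign test on the signs of After − Before, dropping zero differences. It prints both the normal-approximation z (mean n/2, variance n/4) and, for comparison, an exact two-sided binomial p-value. `math.comb` requires Python 3.8 or later.

```python
import math

before = [30, 28, 34, 35, 40, 42, 33, 38, 34, 45, 28, 27, 25, 41, 36]
after  = [32, 29, 33, 32, 37, 43, 40, 41, 37, 44, 27, 33, 30, 38, 36]

diffs = [a - b for a, b in zip(after, before)]
plus = sum(d > 0 for d in diffs)
minus = sum(d < 0 for d in diffs)
n = plus + minus                     # ties (d = 0) are dropped

# Normal approximation: under H0 the number of + signs has mean n/2 and variance n/4
z = (plus - n / 2) / math.sqrt(n / 4)
# Exact two-sided binomial p-value, for comparison
k = max(plus, minus)
p_exact = min(1.0, 2 * sum(math.comb(n, i) for i in range(k, n + 1)) / 2 ** n)
print(f"+ = {plus}, - = {minus}, z = {z:.2f}, exact p = {p_exact:.3f}")
```

Either way, the evidence of a change in collection time is weak at the 0.05 level.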
2b. Test whether the following numbers 0.44, 0.81, 0.14, 0.05, 0.93 are uniformly distributed, using the
Kolmogorov-Smirnov test. (AU NOV/DEC 2018)
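A sketch of the one-sample K-S statistic against U(0, 1), whose CDF is F(x) = x. The critical value 0.565 (n = 5, 5% level) is taken from a K-S table.

```python
data = sorted([0.44, 0.81, 0.14, 0.05, 0.93])
n = len(data)

# Largest gap between the empirical CDF (steps of 1/n) and F(x) = x, checked on both sides of each step
d_plus = max((i + 1) / n - x for i, x in enumerate(data))
d_minus = max(x - i / n for i, x in enumerate(data))
d = max(d_plus, d_minus)
D_CRIT = 0.565   # K-S table value for n = 5 at the 5% level
print(f"D = {d:.2f}")
print("Reject uniformity" if d > D_CRIT else "Do not reject uniformity")
```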
3a. Two methods of instruction to apprentices are to be evaluated. A director assigns 15 randomly
selected trainees to each of the two methods. Due to dropouts, 14 complete in batch 1 and 12
complete in batch 2. An achievement test was given to these successful candidates. Their scores are
as follows.
Method 1: 70 90 82 64 86 77 84 79 82 89 73 81 83 66
Method 2: 86 78 90 82 65 87 80 88 95 85 76 94
Test whether the two methods have a significant difference in effectiveness. Use the Mann-Whitney test
at the 5% significance level.
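A sketch of the Mann-Whitney test using the U formula from Part A and the large-sample normal approximation, with σ_U = √(n₁n₂(n₁ + n₂ + 1)/12). Tied scores get average ranks; the helper name is illustrative.

```python
import math

def average_ranks(values):
    """Illustrative helper: map each distinct value to its pooled rank, averaging ties."""
    ordered = sorted(values)
    ranks, i = {}, 0
    while i < len(ordered):
        j = i
        while j + 1 < len(ordered) and ordered[j + 1] == ordered[i]:
            j += 1
        ranks[ordered[i]] = (i + j) / 2 + 1   # mean of 1-based positions i+1 .. j+1
        i = j + 1
    return ranks

method1 = [70, 90, 82, 64, 86, 77, 84, 79, 82, 89, 73, 81, 83, 66]
method2 = [86, 78, 90, 82, 65, 87, 80, 88, 95, 85, 76, 94]
n1, n2 = len(method1), len(method2)

rank = average_ranks(method1 + method2)
r1 = sum(rank[x] for x in method1)                  # rank sum of the first sample
u = n1 * n2 + n1 * (n1 + 1) / 2 - r1
mean_u = n1 * n2 / 2
sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)      # large-sample standard deviation of U
z = (u - mean_u) / sd_u
print(f"R1 = {r1}, U = {u}, z = {z:.2f}")
print("Significant difference" if abs(z) > 1.96 else "No significant difference")
```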
3b. Kevin Morgan, national sales manager of an electronics firm, has collected the following salary
statistics on his field sales force earnings. He has both observed frequencies and expected
frequencies if the distribution of salaries is normal. At the 0.05 level of significance, can Kevin
conclude that the distribution of sales force earnings is normal? (AU MAY/JUNE 2019)
Earnings in thousands 25-30 31-36 37-42 43-48 49-54 55-60 61-66
Observed frequency 9 22 25 30 21 12 6
Expected frequency 6 17 32 35 18 13 4
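A sketch of the χ² goodness-of-fit statistic for 3b. The degrees of freedom depend on how the expected frequencies were obtained. If the mean and standard deviation were estimated from the data, df = 7 − 1 − 2 = 4 (table value 9.488); otherwise df = 6 (table value 12.592). These are assumptions, and the computed statistic is below either value.

```python
observed = [9, 22, 25, 30, 21, 12, 6]
expected = [6, 17, 32, 35, 18, 13, 4]

# Sum of (O - E)^2 / E over all classes
chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(f"chi-square = {chi2:.2f}")
```

Note that the last expected frequency (4) is below 5; pooling it with the preceding class, as Part A recommends, would change the statistic slightly.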
4a. The following contingency table presents the reactions of legislators to a tax plan according to party
affiliation. Test whether party affiliation influences the reaction to the tax plan at the 0.01 level of
significance.
Reaction
Party A 120 20 20 160
Party B 50 30 60 140
Party C 50 10 40 100
Total 220 60 120 400
4b. A technician is asked to analyze the results of 22 items made in a preparation run. Each item has been
measured and compared to engineering specifications. The order of acceptances (a) and rejections (r)
is aarrrarraaaaarrarraara. Determine whether it is a random sample or not. Use α = 0.05.
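A sketch of the runs test using E(R) and V(R) from the Part A formula, with the normal approximation. The 5% two-tailed critical value 1.96 is assumed.

```python
import math
from itertools import groupby

seq = "aarrrarraaaaarrarraara"
n1, n2 = seq.count("a"), seq.count("r")
runs = sum(1 for _ in groupby(seq))          # number of runs R

mean_r = 2 * n1 * n2 / (n1 + n2) + 1
var_r = 2 * n1 * n2 * (2 * n1 * n2 - n1 - n2) / ((n1 + n2) ** 2 * (n1 + n2 - 1))
z = (runs - mean_r) / math.sqrt(var_r)
print(f"n1 = {n1}, n2 = {n2}, R = {runs}, z = {z:.2f}")
print("Not random" if abs(z) > 1.96 else "Random")
```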
5a. From a poll of 800 television viewers, the following data have been accumulated on their levels of
education and their preferred television stations. We are interested in determining whether the
selection of a TV station is independent of the level of education. (AU JAN 2016)
Educational Level
Total 200 400 200 800
(ii) Show the contingency table of the expected frequencies. (iii) Compute the test statistic.
(iv) The null hypothesis is to be tested at 95% confidence. Determine the critical value for this test.
5b. The manager of a company believes that differences in sales performance depend upon the
salesperson’s age. Independent samples of salespeople were taken and their weekly sales record is
reported below.
Below 30 years Between 30 and 45 years Over 45 years
No. of Sales No. of Sales No. of Sales
24 23 30
16 17 20
21 22 23
15 25 25
19 18 34
26 29 36
27 28
(ii) At 95% confidence, test the hypotheses using Kruskal Wallis Test. (AU JAN 2018)
UNIT V: CORRELATION, REGRESSION AND TIME SERIES ANALYSIS
PART – A
1. Define time series.
A time series is an arrangement of statistical data in accordance with the time of occurrence in
chronological order.
2. Write the formula for the angle between the regression lines.
$\tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$
3. When do you say two regression lines coincide with each other?
When r = ±1, the two regression lines coincide.
4. Differentiate between correlation and regression. (APR/MAY 2018)
Correlation analysis | Regression analysis
1. The correlation coefficient r between X and Y is a measure of the linear relationship between X and Y. | 1. The regression coefficients are mathematical measures expressing the average relationship between the two variables.
2. The correlation coefficient does not reflect upon the nature of the variables (which is dependent and which is independent). | 2. Regression coefficients reflect the nature of the variables.
3. It is a relative measure and is independent of the units of measurement. | 3. Regression coefficients are absolute measures of the relationship between two or more variables.
5. Write any two properties of regression co-efficient
The coefficient of correlation is the geometric mean of the coefficients of regression.
If one of the regression coefficients is numerically greater than unity, then the other must be less than unity.
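6. Find the means of x and y, given that the two regression lines are x + 6y = 4 and 2x + 3y = −1.
Both regression lines pass through (x̄, ȳ), so solve x + 6y = 4 and 2x + 3y = −1 simultaneously.
Multiplying the first by 2 gives 2x + 12y = 8; subtracting 2x + 3y = −1 gives 9y = 9, so y = 1 and x = −2.
Hence x̄ = −2 and ȳ = 1.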
7. The equations of the regression lines are given by 3x + y = 10 and
3x + 4y = 12. Find the correlation coefficient between x and y.
x on y: x = 10/3 − (1/3)y, so b_xy = −1/3
y on x: y = 12/4 − (3/4)x, so b_yx = −3/4
r² = b_xy · b_yx = (−1/3)(−3/4) = 1/4 = 0.25. Since both regression coefficients are negative, r = −0.5.
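A small sketch of the last step: r is the geometric mean of the two regression coefficients and takes their common sign.

```python
import math

b_xy = -1 / 3   # from 3x + y = 10 written as x = 10/3 - y/3
b_yx = -3 / 4   # from 3x + 4y = 12 written as y = 3 - 3x/4

r2 = b_xy * b_yx
assert r2 <= 1, "product of regression coefficients must not exceed 1"
r = math.copysign(math.sqrt(r2), b_yx)   # r takes the common sign of the coefficients
print(f"r^2 = {r2:.2f}, r = {r:.2f}")
```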
8. What are the basic components of Time series analysis? ( AU JAN 2015)
1.Secular Trend 2.Seasonal Variations 3.Cyclical Variations 4.Irregular Variations
9. Define regression.
Regression is the measure of the average relationship between two or more variables in terms of the
original units of the data.
10. Write the normal equations for the method of fitting a parabolic curve.
$y = a + bx + cx^2$ is the trend equation and the normal equations are
$\sum y = na + b\sum x + c\sum x^2$, $\sum xy = a\sum x + b\sum x^2 + c\sum x^3$, $\sum x^2 y = a\sum x^2 + b\sum x^3 + c\sum x^4$
11. What are the uses of regression analysis (AU MAY/JUNE 2019)
1. It is useful in economic analysis; for example, a regression equation can estimate the increase in the
cost-of-living index for a given increase in the general price level.
2. It enables us to study the nature of the relationship between the variables.
12. Write the merits of the least squares method.
1. This method gives trend values for the entire time period.
2. This method is completely objective in character.
3. This method can be used to forecast future trends, because the trend line establishes a functional
relationship between the value and time.
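13. If the tangent of the angle between the lines of regression of y on x and x on y is 0.6 and $\sigma_x = \frac{1}{2}\sigma_y$,
find the correlation coefficient.
$\tan\theta = \left(\frac{1 - r^2}{r}\right)\frac{\sigma_x \sigma_y}{\sigma_x^2 + \sigma_y^2}$, so $0.6 = \left(\frac{1 - r^2}{r}\right)\frac{\frac{1}{2}\sigma_y^2}{\frac{1}{4}\sigma_y^2 + \sigma_y^2} = \frac{2}{5}\left(\frac{1 - r^2}{r}\right)$
$\Rightarrow 1.5r = 1 - r^2 \Rightarrow r^2 + 1.5r - 1 = 0 \Rightarrow r = -2$ or $r = 0.5$.
Since −1 ≤ r ≤ 1, r = 0.5.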
14. State any two properties of correlation coefficient.
(i) The coefficient of correlation lies between -1 and +1.
(ii) The coefficient of correlation is independent of change of scale and origin of the variables X & Y.
15. What are the various methods of studying trend?
1. Graphic method 2. Method of semi-averages 3. Method of Moving averages
4. Method of Least squares.
16. Write down the formula to calculate the rank correlation coefficient.
$\rho_s = 1 - \frac{6\sum d_i^2}{n(n^2 - 1)}$, where $d_i = x_i - y_i$
17. Briefly explain how a scatter diagram benefits the researcher? (AU MAY/JUNE 2014)
The simplest device for studying correlation between two variables is a special type of dot chart called
scatter diagram. In this method, the given data is plotted on a graph in the form of dots. The more the
plotted points scatter over a chart, the lesser is the degree of relationship between the two variables. The
nearer the points come to the line, the higher the degree of relationship. If the plotted points lie in a
haphazard manner it shows the absence of any relationship between the variables.
18. When do we say the variables are positively correlated, negatively correlated and uncorrelated.
(i) If r > 0, the variables are positively correlated; if r = 1, there is perfect positive correlation.
(ii) If r < 0, the variables are negatively correlated; if r = −1, there is perfect negative correlation.
(iii) If r = 0, the variables are uncorrelated.
19. Mention the two mathematical models for a time series.
1. Additive model, Y = T + S + C + I: this model assumes that the four components of the time series
(trend, seasonal, cyclical and irregular variations) are independent of each other.
2. Multiplicative model, Y = T × S × C × I: this model assumes that the four components of the time series are
interdependent.
20. State the limitations of Method of Moving averages.
(1) Trend values cannot be calculated for all the years that is, some years will be left out in the beginning
and in the end.
(2) The period of moving average has to be chosen with great care.
(3) This method cannot be used for forecasting.
PART B
1a. Calculate the coefficient of correlation between X and Y , using the following data : (AU APR 2018)
X 1 3 5 7 8 10
Y 8 12 15 17 18 20
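A minimal sketch of Karl Pearson's coefficient using the computational form r = Sxy / √(Sxx·Syy), written out by hand so it runs on any Python 3 version.

```python
import math

x = [1, 3, 5, 7, 8, 10]
y = [8, 12, 15, 17, 18, 20]
n = len(x)
sxy = sum(a * b for a, b in zip(x, y)) - sum(x) * sum(y) / n   # corrected sum of products
sxx = sum(a * a for a in x) - sum(x) ** 2 / n
syy = sum(b * b for b in y) - sum(y) ** 2 / n
r = sxy / math.sqrt(sxx * syy)
print(f"r = {r:.3f}")
```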
1b. Fit a second degree polynomial equation for the following data
X 1976 1977 1978 1979 1980 1981 1982 1983 1984
Y 50 65 70 85 82 75 65 90 95
2. Given below are the figures of production (in thousand quintals) of a sugar factory.
Year 1974 1975 1976 1977 1978 1979 1980
Production 77 88 94 85 91 98 90
Fit a straight line by the least squares method and tabulate the trend values.
3a. Find the two regression lines using the data below: (AU NOV/DEC 2018)
X 7 4 8 6 5
Y 6 5 9 8 2
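A sketch of the two regression lines from deviations about the means: b_yx = Σxy/Σx² and b_xy = Σxy/Σy², each line passing through (x̄, ȳ).

```python
x = [7, 4, 8, 6, 5]
y = [6, 5, 9, 8, 2]
n = len(x)
mx, my = sum(x) / n, sum(y) / n
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
sxx = sum((a - mx) ** 2 for a in x)
syy = sum((b - my) ** 2 for b in y)

b_yx, b_xy = sxy / sxx, sxy / syy
a_yx = my - b_yx * mx        # intercept of Y on X
a_xy = mx - b_xy * my        # intercept of X on Y
print(f"Y on X: Y = {b_yx:.2f}X {a_yx:+.2f}")
print(f"X on Y: X = {b_xy:.2f}Y {a_xy:+.2f}")
```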
3b. The following data give the production (in ‘000 units) of a commodity for the years 2006-2012. Fit a
straight line trend and forecast the production for the year 2020. (AU NOV/DEC 2017)
Year 2006 2007 2008 2009 2010 2011 2012
Production 6 7 5 4 6 7 5
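Questions 2 and 3b both ask for a least-squares straight-line trend. With an odd number of years, coding x = year − middle year makes Σx = 0, so a = Σy/n and b = Σxy/Σx². This is a sketch under that assumption; the helper name `linear_trend` is illustrative.

```python
def linear_trend(years, values):
    """Illustrative helper: least-squares y = a + b*x, with x = year - middle year (odd n)."""
    mid = years[len(years) // 2]
    xs = [yr - mid for yr in years]           # sum(xs) == 0, so a and b separate
    a = sum(values) / len(values)
    b = sum(x * v for x, v in zip(xs, values)) / sum(x * x for x in xs)
    return a, b, mid

# Question 2: sugar production, 1974-1980
a, b, mid = linear_trend(list(range(1974, 1981)), [77, 88, 94, 85, 91, 98, 90])
print(f"y = {a:.0f} + {b:.0f}x (x = 0 at {mid}):", [a + b * (yr - mid) for yr in range(1974, 1981)])

# Question 3b: production, 2006-2012, forecast for 2020
a2, b2, mid2 = linear_trend(list(range(2006, 2013)), [6, 7, 5, 4, 6, 7, 5])
forecast = a2 + b2 * (2020 - mid2)
print(f"Forecast for 2020: {forecast:.2f} thousand units")
```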
4a. Explain the basic components of Time series analysis.
4b. The monthly water consumption in thousand gallons in a hostel for five years is given below. Calculate
the seasonal indices by the method of simple averages.
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
1979 25 23 21 18 15 20 21 25 22 24 32 35
1980 27 25 23 20 17 22 23 27 24 26 35 33
1981 32 31 30 27 25 27 29 30 30 32 41 38
1982 42 40 38 36 34 37 38 40 38 43 52 48
1983 57 50 52 46 49 46 49 55 50 59 64 63
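A sketch of the method of simple averages: average each month over the five years, then express each monthly average as a percentage of the overall average. Reading the twelve columns as January to December is an assumption.

```python
consumption = {
    1979: [25, 23, 21, 18, 15, 20, 21, 25, 22, 24, 32, 35],
    1980: [27, 25, 23, 20, 17, 22, 23, 27, 24, 26, 35, 33],
    1981: [32, 31, 30, 27, 25, 27, 29, 30, 30, 32, 41, 38],
    1982: [42, 40, 38, 36, 34, 37, 38, 40, 38, 43, 52, 48],
    1983: [57, 50, 52, 46, 49, 46, 49, 55, 50, 59, 64, 63],
}
# Average of each month across the years
monthly_avg = [sum(col) / len(col) for col in zip(*consumption.values())]
overall = sum(monthly_avg) / len(monthly_avg)           # average of the monthly averages
indices = [100 * m / overall for m in monthly_avg]      # seasonal index for each month
months = "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split()
for name, idx in zip(months, indices):
    print(f"{name}: {idx:.2f}")
```

The twelve indices sum to 1200 by construction.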
5a. The following table gives the profits of a concern for 5 years ending 1983. Fit an exponential curve
for the following data (AU MAY/JUNE 2019)
Year 1979 1980 1981 1982 1983
Profits 1.6 4.5 13.8 40.2 125.0
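A sketch of fitting y = a·bˣ by least squares on log y, i.e. log y = log a + x log b, with x = year − 1981 so that Σx = 0. Taking base-10 logarithms follows the usual hand method; any base gives the same a and b.

```python
import math

years = [1979, 1980, 1981, 1982, 1983]
profits = [1.6, 4.5, 13.8, 40.2, 125.0]
xs = [yr - 1981 for yr in years]                  # x = -2..2, so sum(x) = 0
logs = [math.log10(p) for p in profits]

log_a = sum(logs) / len(logs)                     # log a = sum(log y) / n
log_b = sum(x * ly for x, ly in zip(xs, logs)) / sum(x * x for x in xs)
a, b = 10 ** log_a, 10 ** log_b
print(f"y = {a:.2f} * {b:.3f}^x  (x = 0 at 1981)")
```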
5b. The quarterly sales (in thousands of copies) for a specific education software over the past three
years are given in the following table.
2003 2004 2005
Quarter 1 170 180 190
Quarter 2 111 96 120
Quarter 3 270 280 290
Quarter 4 250 220 223
(i) Compute the four seasonal factors (Seasonal Indexes). Show all of your computations.
(ii) The trend for these data is Trend = 174+4t (t represents time, where t=1 for Quarter 1 of 2003
and t=12 for Quarter 4 of 2005). Forecast sales for the first quarter of 2006 using the trend and
seasonal indexes. Show all of your computations.