Course Material - I MCA
I MCA
UNIT I
Probability Spaces:
If a random experiment has n exhaustive, mutually exclusive and equally likely outcomes, of which m are favourable to an event A, then the probability of the happening of A is defined as the ratio m/n:
$P(A) = \frac{m}{n} = \frac{\text{Number of cases favourable to } A}{\text{Exhaustive number of cases in } S}$
Let a random experiment be repeated n times and let an event A occur $n_A$ times out of the n trials. The ratio $\frac{n_A}{n}$ is called the relative frequency of the event A. As n increases, $\frac{n_A}{n}$ shows a tendency to stabilize and to approach a constant value. This value, denoted by P(A), is called the probability of the event A.
$P(A) = \lim_{n \to \infty} \frac{n_A}{n}$
When the occurrence of one event precludes the occurrence of all other events, such a set of events is said to be mutually exclusive.
Example: On tossing a coin, either head or tail can occur but not both, i.e., the occurrence of head excludes the occurrence of tail. The events of occurrence of head and tail are mutually exclusive.
Two events are said to be equally likely events if each one of them has an equal
chance of occurrence.
In tossing an unbiased coin the occurrence of head or tail are equally likely.
If A and B are any two events, and are not disjoint, then
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Proof:
From the Venn diagram, the events A and $\bar{A} \cap B$ are disjoint.
Therefore $A \cup B = A \cup (\bar{A} \cap B)$
$P(A \cup B) = P[A \cup (\bar{A} \cap B)]$
$= P(A) + P(\bar{A} \cap B)$
Adding and subtracting $P(A \cap B)$,
$P(A \cup B) = P(A) + P(\bar{A} \cap B) + P(A \cap B) - P(A \cap B)$
$= P(A) + P[(\bar{A} \cap B) \cup (A \cap B)] - P(A \cap B)$
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
Conditional Probability
Proof:
The inner circle represents the event A.
A can occur along with B1 , B2 ,...., Bn that are exhaustive and mutually exclusive.
Therefore AB1 , AB2 ,...., ABn are also mutually exclusive.
Bayes theorem
Proof:
$P(B_i \cap A) = P(B_i) P(A / B_i)$
$P(B_i \cap A) = P(A) P(B_i / A)$
$P(B_i / A) = \frac{P(B_i) P(A / B_i)}{P(A)}$
$= \frac{P(B_i) P(A / B_i)}{\sum_{i=1}^{n} P(B_i) P(A / B_i)}$
Random Variable:
A real-valued function defined on the outcome of a probability experiment is called a random variable.
Example: Suppose that a coin is tossed twice so that the sample space is S = {HH, HT, TH, TT}. Let X represent the number of heads that can come up. With each sample point we can associate a number for X as shown in the Table. Thus, for example, in the case of HH (i.e., 2 heads), X = 2 while for TH (1 head), X = 1. It follows that X is a random variable.
Table
Sample point    HH   HT   TH   TT
X               2    1    1    0
It should be noted that many other random variables could also be defined on this
sample space, for example, the square of the number of heads or the number of
heads minus the number of tails.
A random variable whose set of possible values is either finite or countably infinite is
called discrete random variable.
Example: Number of transmitted bits received in error.
Let X be a discrete random variable assuming values $x_1, x_2, \ldots, x_n$ with corresponding probabilities $p_1, p_2, \ldots, p_n$. Then
$E(X) = \sum_i x_i\, p(x_i)$ is called the expected value of X.
E(X) is also commonly called the mean or the expectation of X. A useful identity states that for a function g,
$E[g(X)] = \sum_i g(x_i)\, p(x_i)$
Suppose X is a continuous random variable with probability density function f(x). The mean or expected value of X, denoted as $\mu$ or E(X), is
$\mu = E(X) = \int_{-\infty}^{\infty} x f(x)\, dx$
A useful identity is that for any function g,
$E[g(X)] = \int_{-\infty}^{\infty} g(x) f(x)\, dx$
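Variance of X:
$\mathrm{Var}(X) = E[(X - \mu)^2] = E[X^2] - [E(X)]^2$
The moment generating function of X is
$M_X(t) = E[e^{tX}] = \sum_x e^{tx} p(x)$, if X is discrete
$M_X(t) = E[e^{tX}] = \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx$, if X is continuous.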
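Tchebyshev's Inequality:
Let X be a random variable with mean $E(X) = \mu$ and variance $\mathrm{var}(X) = \sigma^2$. Then Tchebyshev's inequality states that, for any t > 0,
$P(|X - \mu| \ge t) \le \frac{\sigma^2}{t^2}$
$P(|X - \mu| < t) \ge 1 - \frac{\sigma^2}{t^2}$
$P(|X - \mu| \ge n\sigma) \le \frac{1}{n^2}$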
Problems:
1. Find the chance of throwing (a) a four (b) an even number with an ordinary six-faced die.
P(throwing four) = 1/6
P(getting an even number) = 3/6 = 1/2.
2. A bag contains 8 white balls and 6 red balls. Find the probability of drawing two balls
of the same colour.
Two balls out of 14 balls can be drawn in ${}^{14}C_2 = 91$ ways.
Two white balls out of 8 can be drawn in ${}^{8}C_2 = 28$ ways.
P(drawing two white balls) = $\frac{{}^{8}C_2}{{}^{14}C_2} = \frac{28}{91}$
P(drawing two red balls) = $\frac{{}^{6}C_2}{{}^{14}C_2} = \frac{15}{91}$
Probability of drawing 2 balls of same colour (either both white or both red)
$= \frac{28}{91} + \frac{15}{91} = \frac{43}{91}$
3. Find the probability of drawing an ace or a spade or both from a deck of cards.
Probability of drawing an ace, event (A) = 4/52.
Probability of drawing a spade, event (B) = 13/52.
Probability of drawing the ace of spades, $(A \cap B)$ = 1/52.
The events drawing an ace and drawing a spade are not mutually exclusive, therefore
$P(A \cup B) = P(A) + P(B) - P(A \cap B)$
$= \frac{4}{52} + \frac{13}{52} - \frac{1}{52} = \frac{16}{52}$
$= \frac{4}{13}$.
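The ace-or-spade result can be checked by brute-force counting over a 52-card deck. The sketch below is only an illustration; the helper name `prob` and the rank and suit labels are assumptions introduced here, not part of the course material.

```python
from fractions import Fraction
from itertools import product

# Hypothetical labels for a standard 52-card deck.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["spades", "hearts", "diamonds", "clubs"]
deck = list(product(RANKS, SUITS))


def prob(event):
    """Classical probability: favourable cases / exhaustive cases."""
    favourable = sum(1 for card in deck if event(card))
    return Fraction(favourable, len(deck))


p_ace = prob(lambda c: c[0] == "A")
p_spade = prob(lambda c: c[1] == "spades")
p_both = prob(lambda c: c[0] == "A" and c[1] == "spades")
p_either = prob(lambda c: c[0] == "A" or c[1] == "spades")

# Direct count and the addition theorem should agree.
print(p_either, p_ace + p_spade - p_both)
```

Exact fractions are used so that the comparison with the addition theorem does not depend on floating-point rounding.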
4. What is the chance that a leap year selected at random will contain 53 Sundays?
A leap year contains 52 full weeks and extra two days (total of 366 days).
5. When A and B are 2 mutually exclusive events such that P(A) = ½ and
P(B) = 1/3 , find P( AUB) and P(A∩ 𝐵).
n(S) = $2^5$ = 32
n(A) = at least one head = 31
P(A) = 31/32
7. Given that P(A) = 0.31, P(B) = 0.47, and A and B are mutually exclusive, find $P(A \cap \bar{B})$.
$P(\bar{A} \cup \bar{B}) = P(\overline{A \cap B}) = 1 - P(A \cap B) = 0.86$.
9. Given P(A) =1/3 , P(B) =1/4 , P(A∩ 𝐵) = 1/6 find the following probability
𝑃(𝐴̅), 𝑃(𝐴̅ ∩ 𝐵̅ ).
10. What is the probability of obtaining 2 heads in two throws of a single coin?
12. A box contains tags marked 1, 2, …, n. Two tags are chosen at random without replacement. Find the probability that the numbers on the tags will be consecutive integers.
Soln:
There are (n − 1) pairs of consecutive integers: (1, 2), (2, 3), …, (n − 1, n).
No. of ways of choosing any one pair from the (n − 1) pairs = ${}^{n-1}C_1 = n - 1$
Total no. of ways of choosing 2 tags from the n tags = ${}^{n}C_2 = \frac{n(n-1)}{2}$
Therefore the required probability = $\frac{n-1}{n(n-1)/2} = \frac{2}{n}$
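As a sanity check on the 2/n result, the pairs of tags can be enumerated directly for small n. This is a minimal sketch; the function name `prob_consecutive` is an assumption made for illustration.

```python
from fractions import Fraction
from itertools import combinations


def prob_consecutive(n):
    """Probability that two tags drawn without replacement from 1..n are consecutive."""
    pairs = list(combinations(range(1, n + 1), 2))  # all nC2 unordered pairs
    favourable = sum(1 for a, b in pairs if b - a == 1)
    return Fraction(favourable, len(pairs))


for n in range(2, 8):
    # Enumerated probability next to the closed form 2/n.
    print(n, prob_consecutive(n), Fraction(2, n))
```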
13. Among the workers in a factory only 30% receive a bonus. Among those receiving the bonus only 20% are skilled. What is the probability that a randomly selected worker is skilled and receives a bonus?
Soln:
Let A be the event that the worker receives a bonus and B the event that the worker is skilled.
P(A) = 0.3
P(B/A) = 0.2
$P(A \cap B) = P(A) P(B / A)$
= 0.3 × 0.2 = 0.06
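14. Prove that if the events A and B are independent, then $\bar{A}$ and $\bar{B}$ are also independent.
Proof: $P(\bar{A} \cap \bar{B}) = P(\overline{A \cup B})$
$= 1 - P(A \cup B)$
Using addition and multiplication theorems,
$= 1 - P(A) - P(B) + P(A) P(B) = [1 - P(A)][1 - P(B)]$
$P(\bar{A} \cap \bar{B}) = P(\bar{A}) P(\bar{B})$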
15. A and B alternately throw a pair of dice. A wins if he throws 6 before B throws 7 and
B wins if he throws 7 before A throws 6. If A begins, show that his chance of winning
is 30/61.
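Soln: A – Event of A throwing 6.
B – Event of B throwing 7.
$P(A) = \frac{5}{36}$, $P(B) = \frac{1}{6}$, so $P(\bar{A}) = \frac{31}{36}$ and $P(\bar{B}) = \frac{5}{6}$
$P(A \text{ wins}) = P(A \text{ or } \bar{A}\bar{B}A \text{ or } \bar{A}\bar{B}\bar{A}\bar{B}A \text{ or } \ldots)$
$= P(A) + P(\bar{A}\bar{B}A) + P(\bar{A}\bar{B}\bar{A}\bar{B}A) + \ldots$
$= \frac{5}{36}\left[1 + \frac{31}{36}\cdot\frac{5}{6} + \left(\frac{31}{36}\cdot\frac{5}{6}\right)^2 + \ldots\right]$
$= \frac{5/36}{1 - 155/216} = \frac{30}{61}$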
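The infinite series for A's chance of winning is geometric, so it can be summed in closed form. The sketch below does this with exact fractions; the variable names are illustrative assumptions.

```python
from fractions import Fraction

p_a = Fraction(5, 36)  # a pair of dice shows a total of 6 in 5 of 36 cases
p_b = Fraction(6, 36)  # a pair of dice shows a total of 7 in 6 of 36 cases

# Common ratio of the series: both players fail once, then A throws again.
ratio = (1 - p_a) * (1 - p_b)

# Sum of p_a * (1 + ratio + ratio**2 + ...) = p_a / (1 - ratio).
p_a_wins = p_a / (1 - ratio)
print(p_a_wins)
```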
16. In a coin tossing experiment, if the coin shows head, 1 dice is thrown and the result is
recorded. But if the coin shows tail, 2 dice are thrown and their sum is recorded. What
is the probability that the recorded number will be 2?
Soln:
When a single die is thrown, P(2) = 1/6.
When 2 dice are thrown, the sum will be 2 only if each die shows 1.
Therefore P(getting 2 as sum with 2 dice) = 1/6 × 1/6 = 1/36
By the theorem of total probability,
$P(2) = P(H) P(2 / H) + P(T) P(2 / T)$
$= \frac{1}{2}\cdot\frac{1}{6} + \frac{1}{2}\cdot\frac{1}{36} = \frac{7}{72}$.
17. If atleast 1 child in a family with 2 children is a boy, what is the probability that both
children are boys?
Soln:
p = probability that a child is a boy = 1/2
q = 1/2
P(at least one boy) = P(exactly 1 boy) + P(exactly 2 boys)
= 3/4
P(both are boys / at least one is a boy) = $\frac{1/4}{3/4} = \frac{1}{3}$
18. In a shooting test, the probability of hitting the target is ½ for A, 2/3 for B and ¾ for
C. If all of them fire at the target, find the probability that none of them hits the target.
Soln:
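Let A, B and C be the events of A, B and C hitting the target respectively.
P(A) = 1/2; P(B) = 2/3; P(C) = 3/4
$P(\bar{A} \cap \bar{B} \cap \bar{C}) = P(\bar{A}) P(\bar{B}) P(\bar{C})$
$= \frac{1}{2}\cdot\frac{1}{3}\cdot\frac{1}{4} = \frac{1}{24}$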
19. If $\bar{A}$ is the complementary event of A, prove that $P(\bar{A}) = 1 - P(A) \le 1$.
Proof: A and $\bar{A}$ are mutually exclusive events such that $A \cup \bar{A} = S$.
$P(A \cup \bar{A}) = P(S)$
$P(A) + P(\bar{A}) = 1$
$P(\bar{A}) = 1 - P(A)$
Since $P(A) \ge 0$, it follows that $P(\bar{A}) \le 1$.
20. Two fair dice are thrown independently. Three events A, B and C are defined as follows.
(i) A: Odd face with the first die
(ii) B: Odd face with the second die
(iii) C: Sum of the numbers on the 2 dice is odd.
Are the events A, B and C mutually independent?
Soln:
P(A) = 1/2; P(B) = 1/2; P(C) = 1/2
$P(A \cap B) = P(B \cap C) = P(A \cap C) = 1/4$
$P(A \cap B \cap C) = 0$, since C cannot happen when A and B both occur. Therefore
$P(A \cap B \cap C) \ne P(A) P(B) P(C)$
Therefore the events are pairwise independent, but not mutually independent.
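Pairwise versus mutual independence in the two-dice problem can be verified by listing all 36 equally likely outcomes. The following is a sketch under that assumption; the helper names are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # (first die, second die)


def prob(event):
    """Probability of an event given as a predicate on an outcome."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))


def a(o):
    return o[0] % 2 == 1           # odd face on the first die


def b(o):
    return o[1] % 2 == 1           # odd face on the second die


def c(o):
    return (o[0] + o[1]) % 2 == 1  # odd sum


p_ab = prob(lambda o: a(o) and b(o))
p_bc = prob(lambda o: b(o) and c(o))
p_ac = prob(lambda o: a(o) and c(o))
p_abc = prob(lambda o: a(o) and b(o) and c(o))

print(p_ab, p_bc, p_ac)                   # each pair compared with 1/2 * 1/2
print(p_abc, prob(a) * prob(b) * prob(c))  # triple product comparison
```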
21. Two defective tubes get mixed up with 2 good ones. The tubes are tested, one by one,
until both defectives are found. What is the probability that the last defective tube is
obtained on (i) the second test (ii) the third test and (iii) the fourth test.
Soln:
Let D represent defective and N represent non-defective tube.
(i) P(second D in the II test) = P(D in the I test and D in the II test)
$= P(D_1 \cap D_2)$
$= P(D_1) P(D_2 / D_1) = \frac{2}{4} \times \frac{1}{3} = \frac{1}{6}$
26. An urn contains 10 white and 3 black balls. Another urn contains 3 white and 5 black
balls. Two balls are drawn at random from the first urn and placed in the second urn
and then 1 ball is taken at random from the latter. What is the probability that it is a
white ball?
Soln:
The two balls transferred may be both white or both black or 1 white and 1
black.
Let B1 be the event of drawing 2 white balls from the first urn and B2 be the
event of drawing 2 black balls from it and B3 be the event of drawing 1 white and
1black ball from it.
Let A be the event of drawing a white ball from the second urn after transfer.
P(B1) = 15/26, P(B2) = 1/26, P(B3) = 10/26
P(A/B1) = P(drawing a white ball / 2 white balls have been transferred)
= 5/10.
Similarly, P(A/B2) = 3/10 and P(A/B3) = 4/10
Therefore $P(A) = P(B_1) P(A / B_1) + P(B_2) P(A / B_2) + P(B_3) P(A / B_3)$
= 59/130
27. A bag contains 5 balls and it is not known how many of them are white. Two balls are
drawn at random from the bag and they are noted to be white. What is the chance that
all the balls in the bag are white?
Soln:
since 2 white balls have been drawn out, the bag must have contained 2, 3, 4
or 5 white balls.
Let B1 be the event of the bag containing 2 white balls, B2 be the event of the
bag containing 3 white balls, B3 be the event of the bag containing 4 white balls and
B4 be the event of the bag containing 5 white balls.
Let A be the event of drawing 2 white balls.
P(A/B1)=1/10; P(A/B2)=3/10; P(A/B3)=3/5; P(A/B4)=1
P(B1)= P(B2)= P(B3)= P(B4)=1/4
By Bayes' theorem,
$P(B_4 / A) = \frac{P(B_4) P(A / B_4)}{\sum_{i=1}^{4} P(B_i) P(A / B_i)}$, i = 1, 2, 3, 4
= 1/2.
28. In a bolt factory, machines A, B and C produce 25, 35 and 40% of the total output respectively. Of their outputs 5, 4 and 2% respectively are defective bolts. If a bolt is chosen at random from the combined output, what is the probability that it is defective? If a bolt chosen at random is found to be defective, what is the probability that it was produced by B?
Soln:
Let $E_1$, $E_2$, $E_3$ be the events that the bolt was produced by machines A, B and C respectively.
$P(E_1) = 0.25$; $P(E_2) = 0.35$; $P(E_3) = 0.40$
Let X be the event of drawing a defective bolt.
$P(X / E_1) = 0.05$
$P(X / E_2) = 0.04$
$P(X / E_3) = 0.02$
By the theorem of total probability,
$P(X) = P(E_1) P(X / E_1) + P(E_2) P(X / E_2) + P(E_3) P(X / E_3) = 0.0345$
By Bayes' theorem,
$P(E_2 / X) = \frac{P(E_2) P(X / E_2)}{P(E_1) P(X / E_1) + P(E_2) P(X / E_2) + P(E_3) P(X / E_3)}$
= 0.406.
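The same total-probability and Bayes computation recurs in several of these problems, so it can be wrapped in a small helper. The sketch below applies it to the bolt factory data; the function name `posterior` is a hypothetical name chosen for illustration.

```python
def posterior(priors, likelihoods):
    """Return (total probability, list of posterior probabilities) by Bayes' theorem.

    priors[i]      = P(B_i)
    likelihoods[i] = P(A / B_i)
    """
    joint = [p * l for p, l in zip(priors, likelihoods)]
    total = sum(joint)  # theorem of total probability: P(A)
    return total, [j / total for j in joint]


# Bolt factory: machines A, B, C and their defective rates.
priors = [0.25, 0.35, 0.40]
likelihoods = [0.05, 0.04, 0.02]
p_defective, post = posterior(priors, likelihoods)

print(round(p_defective, 4))  # probability that a random bolt is defective
print(round(post[1], 3))      # probability that a defective bolt came from B
```

Swapping in the priors and likelihoods of the urn, coin or IC chip problems reuses the same function unchanged.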
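29. The contents of three urns I, II and III are as follows:
Urn     White   Black   Red
I       1       2       3
II      2       3       1
III     3       1       2
An urn is chosen at random and from it two balls are drawn at random. The two balls are one red and one white. What is the probability that they come from the second urn?
Soln:
Let $B_1$, $B_2$, $B_3$ be the events of choosing urns I, II and III, and let A be the event of drawing one red and one white ball. Each urn holds 6 balls, so two balls can be drawn in ${}^{6}C_2 = 15$ ways.
$P(B_1) = P(B_2) = P(B_3) = \frac{1}{3}$
$P(A / B_1) = \frac{1 \times 3}{15} = \frac{1}{5}$
$P(A / B_2) = \frac{2 \times 1}{15} = \frac{2}{15}$
$P(A / B_3) = \frac{3 \times 2}{15} = \frac{2}{5}$
By Bayes' theorem,
$P(B_i / A) = \frac{P(B_i) P(A / B_i)}{\sum_{i=1}^{n} P(B_i) P(A / B_i)}$, i = 1, 2, ..., n
$P(B_2 / A) = \frac{P(B_2) P(A / B_2)}{\sum_{i=1}^{3} P(B_i) P(A / B_i)} = \frac{2/15}{3/15 + 2/15 + 6/15}$
= 2/11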
30. A given lot of IC chips contains 2% defective chips. Each chip is tested before delivery. The tester itself is not totally reliable. The probability that the tester says the chip is good when it is really good is 0.95, and the probability that the tester says the chip is defective when it is actually defective is 0.94. If a tested device is indicated to be defective, what is the probability that it is actually defective?
Soln:
Let E be the event that the chip is actually good and D be the event that the tester says it is good.
$P(\bar{E}) = 0.02$
$P(E) = 1 - P(\bar{E}) = 0.98$
Given that the probability that the tester says the chip is good when it is really good is 0.95,
$P(D / E) = 0.95$
$P(\bar{D} / E) = 1 - P(D / E) = 0.05$
$P(\bar{D} / \bar{E}) = 0.94$
The probability that the chip is actually defective, given that the tester says so, is, by Bayes' theorem,
$P(\bar{E} / \bar{D}) = \frac{P(\bar{D} / \bar{E}) P(\bar{E})}{P(\bar{D} / \bar{E}) P(\bar{E}) + P(\bar{D} / E) P(E)}$
$= \frac{0.94 \times 0.02}{0.94 \times 0.02 + 0.05 \times 0.98}$ = 0.2773.
31. A certain firm has plants A, B and C producing IC chips. Plant A produces twice the output of B, and B produces twice the output of C. The probabilities of a non-defective product produced by A, B and C are respectively 0.85, 0.75 and 0.95. A customer receives a defective product. Find the probability that it came from plant B.
Soln:
The outputs of A, B and C are in the ratio 1 : 0.5 : 0.25, i.e., 4 : 2 : 1, so
P(A) = 4/7; P(B) = 2/7; P(C) = 1/7
Let E be the event that the product is non-defective.
P(E/A) = 0.85; P(E/B) = 0.75; P(E/C) = 0.95
$P(\bar{E} / A) = 0.15$
$P(\bar{E} / B) = 0.25$
$P(\bar{E} / C) = 0.05$
The probability that a defective product received by the customer came from plant B is
$P(B / \bar{E}) = \frac{P(B) P(\bar{E} / B)}{P(A) P(\bar{E} / A) + P(B) P(\bar{E} / B) + P(C) P(\bar{E} / C)}$ = 0.4348
32. There are 3 true coins and 1 false coin with ‘head’ on both sides. A coin is chosen at
random and tossed 4 times. If ‘head’ occurs all the 4 times, what is the probability that
the false coin has been chosen and used?
Soln:
P(T)=P(the coin is a true coin)=3/4
P(F)=P(the coin is a false coin)=1/4
Let A be the event of getting all heads in 4 tosses.
$P(A / T) = \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} \times \frac{1}{2} = \frac{1}{16}$
$P(A / F) = 1$
By Bayes' theorem,
$P(F / A) = \frac{P(F) P(A / F)}{P(F) P(A / F) + P(T) P(A / T)}$
= 16/19
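33. A coin with probability p of showing head (and q = 1 − p of showing tail) is tossed n times. Show that the probability that the number of heads obtained is even is $0.5[1 + (q - p)^n]$.
Soln:
P(even no. of heads are obtained) = P(0 heads or 2 heads or 4 heads or …)
$= {}^{n}C_0 q^n p^0 + {}^{n}C_2 q^{n-2} p^2 + {}^{n}C_4 q^{n-4} p^4 + \ldots$ --------(1)
Also, $(q + p)^n = {}^{n}C_0 q^n + {}^{n}C_1 q^{n-1} p + {}^{n}C_2 q^{n-2} p^2 + \ldots = 1$ --------(2)
and $(q - p)^n = {}^{n}C_0 q^n - {}^{n}C_1 q^{n-1} p + {}^{n}C_2 q^{n-2} p^2 - \ldots$ --------(3)
Adding (2) and (3), the odd terms cancel and $1 + (q - p)^n$ equals twice the sum in (1). Hence the required probability is $0.5[1 + (q - p)^n]$.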
P(− 5 X 4 ) =
1
6
X: 0 -3 6 10
F(X): 0 1/6 1/2 1
P(X): 0 1/6 2/6 1/2
35. The monthly demand for Titan watches is known to have the following probability
distribution.
Demand        1     2     3     4     5     6     7     8
Probability   0.08  0.12  0.19  0.24  0.16  0.10  0.07  0.04
Determine the expected demand for watches. Also compute the variance.
Soln:
$E(X) = \sum_i x_i\, p(x_i)$
= 1(0.08) + 2(0.12) + 3(0.19) + 4(0.24) + 5(0.16) + 6(0.10) + 7(0.07) + 8(0.04)
E(X) = 4.06
$E(X^2) = \sum_i x_i^2\, p(x_i)$
= $1^2$(0.08) + $2^2$(0.12) + $3^2$(0.19) + $4^2$(0.24) + $5^2$(0.16) + $6^2$(0.10) + $7^2$(0.07) + $8^2$(0.04)
= 19.7
$V(X) = E(X^2) - [E(X)]^2$
= 19.7 − $(4.06)^2$ = 3.2164
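The expected demand and variance can be recomputed from the table in a few lines. This is a minimal sketch; the variable names are illustrative assumptions.

```python
# Monthly demand distribution from the problem statement.
demand = [1, 2, 3, 4, 5, 6, 7, 8]
probability = [0.08, 0.12, 0.19, 0.24, 0.16, 0.10, 0.07, 0.04]

# E(X) = sum of x * p(x); E(X^2) = sum of x^2 * p(x).
mean = sum(x * p for x, p in zip(demand, probability))
second_moment = sum(x * x * p for x, p in zip(demand, probability))

# V(X) = E(X^2) - [E(X)]^2
variance = second_moment - mean ** 2

print(round(mean, 2), round(second_moment, 2), round(variance, 4))
```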
$E(X^2) = \sum_i x_i^2\, p(x_i)$
= 154/6
$\frac{190}{6} - \frac{784}{36} = \frac{356}{36} = \frac{89}{9}$
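37. If $P(X = x) = Kx$ for x = 1, 2, 3, 4, 5, and 0 otherwise, represents a p.m.f.,
(i) Find 'K'
(ii) Find P(X being a prime number)
(iii) Find $P\left(\frac{1}{2} < X < \frac{5}{2} \,/\, X > 1\right)$
(iv) Find the distribution function.
Soln:
(i) K + 2K + 3K + 4K + 5K = 1
15K = 1
K = 1/15
(ii) P(X being a prime number) = P(X = 2) + P(X = 3) + P(X = 5)
$= \frac{2}{15} + \frac{3}{15} + \frac{5}{15} = \frac{10}{15} = \frac{2}{3}$
(iii) $P\left(\frac{1}{2} < X < \frac{5}{2} \,/\, X > 1\right) = \frac{P\left(\frac{1}{2} < X < \frac{5}{2} \text{ and } X > 1\right)}{P(X > 1)} = \frac{P(X = 2)}{P(X > 1)}$
$= \frac{2/15}{2/15 + 3/15 + 4/15 + 5/15}$
= 1/7
(iv) F(x) = 0 ; x < 1
= 1/15 ; 1 ≤ x < 2
= 3/15 ; 2 ≤ x < 3
= 6/15 ; 3 ≤ x < 4
= 10/15 ; 4 ≤ x < 5
= 1 ; x ≥ 5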
38. a) A fair coin is tossed three times. Let X be the number of tails appearing. Find the probability distribution of X, and also calculate E(X).
X       0     1     2     3
P(X)    1/8   3/8   3/8   1/8
E(X) = 3/2
b) $\int_k^1 f(x)\, dx = 0.05$
$k = (0.95)^{1/3} = 0.9830$
39. a) A continuous random variable X that can assume any value between x = 2 and x = 5 has a density function given by f(x) = k(1 + x). Find P(X < 4).
b) Find the value of (a) C and (b) the mean of the following distribution:
$f(x) = C(x - x^2)$ for 0 < x < 1
f(x) = 0 otherwise
Soln:
a) $P(X < 4) = \int_2^4 f(x)\, dx$
k = 2/27
P[X < 4] = 16/27
b) C = 6
Mean = E[X] = 1/2
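40. A continuous r.v. has the pdf $f(x) = kx^4$; −1 < x < 0. Find the value of k and also
$P\left(X > -\frac{1}{2} \,/\, X < -\frac{1}{4}\right)$
Soln:
k = 5
$P\left(X > -\frac{1}{2} \,/\, X < -\frac{1}{4}\right) = \frac{P\left(-\frac{1}{2} < X < -\frac{1}{4}\right)}{P\left(X < -\frac{1}{4}\right)} = \frac{31}{1023} = 0.0303$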
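41. A random variable X has density function given by $f(x) = \frac{1}{k}$ for 0 < x < k, and f(x) = 0 otherwise.
Find (i) m.g.f. (ii) r-th moment (iii) mean (iv) variance.
Soln:
$M_X(t) = 1 + \frac{kt}{2!} + \ldots + \frac{(kt)^r}{(r+1)!} + \ldots$
Coefficient of $t^r$ = $\frac{k^r}{(r+1)!}$, so the r-th moment is $r! \cdot \frac{k^r}{(r+1)!} = \frac{k^r}{r+1}$
Mean = $\frac{k}{2}$
Variance = $\frac{k^2}{12}$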
43. Find the probability distribution of the total number of heads obtained in four tosses of a balanced coin. Hence find the MGF of X, the mean of X and the variance of X.
Soln:
$M_X(t) = E[e^{tX}] = \sum_x e^{tx} p(x)$, if X is discrete
$M_X(t) = \frac{1}{16}\left[1 + 4e^{t} + 6e^{2t} + 4e^{3t} + e^{4t}\right]$
E[X] = 2
PROBABILITY DISTRIBUTIONS
Introduction
DISCRETE DISTRIBUTIONS
Let A be an event (trial) associated with a random experiment such that P(A) remains the same for the repetitions of that random experiment; then the trials are called Bernoulli trials.
A random variable X which takes only two values, either 1 (success) or 0 (failure), with probability p and q respectively, i.e., P(X = 1) = p, P(X = 0) = q, p + q = 1, is called a Bernoulli variate and is said to have a Bernoulli distribution.
Definition.
A random variable X is said to follow a binomial distribution if its probability mass function is
$p(x) = P(X = x) = {}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, …, n
= 0, otherwise
For N sets of n trials, the expected frequency of x successes is
$N p(x) = N\, {}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, ..., n
1. Each trial results in two mutually disjoint outcomes, termed success and failure.
Mean $= E(X) = \sum_x x\, p(x)$
$= \sum_{x=0}^{n} x\, {}^{n}C_x\, p^x q^{n-x}$
$= \sum_{x=0}^{n} x \frac{n!}{x!(n-x)!} p^x q^{n-x}$
$= \sum_{x=1}^{n} \frac{n(n-1)!}{(x-1)!(n-x)!}\, p\, p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!(n-x)!} p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} {}^{n-1}C_{x-1}\, p^{x-1} q^{n-x}$
$= np \sum_{x=1}^{n} {}^{n-1}C_{x-1}\, p^{x-1} q^{(n-1)-(x-1)}$
$= np(q + p)^{n-1}$
Mean = np
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
$P(X = x) = p(x) = {}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, ..., n
$E(X^2) = \sum_{x=0}^{n} x^2 p(x)$
$= \sum_{x=0}^{n} x^2\, {}^{n}C_x\, p^x q^{n-x}$
$= \sum_{x=0}^{n} x^2 \frac{n!}{x!(n-x)!} p^x q^{n-x}$
$= \sum_{x=0}^{n} [x(x-1) + x] \frac{n!}{x!(n-x)!} p^x q^{n-x}$
$= \sum_{x=0}^{n} x(x-1) \frac{n!}{x!(n-x)!} p^x q^{n-x} + \sum_{x=0}^{n} x \frac{n!}{x!(n-x)!} p^x q^{n-x}$
$= \sum_{x=2}^{n} \frac{n(n-1)(n-2)!}{(x-2)!(n-x)!}\, p^2 p^{x-2} q^{n-x} + E(X)$
$= n(n-1) p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!(n-x)!} p^{x-2} q^{n-x} + np$
$= n(n-1) p^2 (q + p)^{n-2} + np$
$E(X^2) = n(n-1) p^2 + np$
$\mathrm{Var}(X) = n(n-1) p^2 + np - n^2 p^2$
$= p^2 [n^2 - n - n^2] + np$
$= np - np^2 = np(1 - p)$
$= npq$
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$, x = 0, 1, 2, ..., n
$M_X(t) = E(e^{tX})$
$= \sum_{x=0}^{n} e^{tx}\, {}^{n}C_x\, p^x q^{n-x}$
$= \sum_{x=0}^{n} {}^{n}C_x\, (pe^t)^x q^{n-x}$
$= (q + pe^t)^n$
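The three binomial results derived above (mean np, variance npq, and the m.g.f.) can be checked numerically by summing over the probability mass function. The sketch below uses n = 16 and p = 1/4 purely as illustrative values; the function name `binomial_pmf` and the evaluation point t = 0.3 are assumptions made for this example.

```python
from math import comb, exp


def binomial_pmf(x, n, p):
    """P(X = x) = nCx * p^x * q^(n - x), with q = 1 - p."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 16, 0.25  # illustrative parameters
q = 1 - p
pmf = [binomial_pmf(x, n, p) for x in range(n + 1)]

mean = sum(x * px for x, px in enumerate(pmf))
variance = sum(x * x * px for x, px in enumerate(pmf)) - mean ** 2

t = 0.3  # arbitrary point at which to compare the two m.g.f. expressions
mgf_by_sum = sum(exp(t * x) * px for x, px in enumerate(pmf))
mgf_closed_form = (q + p * exp(t)) ** n

print(mean, n * p)
print(variance, n * p * q)
print(mgf_by_sum, mgf_closed_form)
```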
Examples
1. The mean and variance of a binomial distribution are 4 and 4/3 respectively. Find P(X≥1)
if n=6.
Solution
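$\frac{npq}{np} = \frac{4/3}{4}$
$q = \frac{1}{3}$, so $p = \frac{2}{3}$
Given n = 6
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$
$P(X \ge 1) = 1 - P[X < 1]$
$= 1 - P[X = 0]$
$= 1 - {}^{6}C_0\, p^0 q^{6-0}$
$= 1 - q^6$
$= 1 - \left(\frac{1}{3}\right)^6$
$= 1 - \frac{1}{729}$
$= \frac{728}{729}$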
2. The mean and variance of binomial distributions are 4 and 3 respectively. Find P(X=0),
P(X=1) and P(X≥2).
Solution
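Mean of binomial distribution = np = 4
$\frac{npq}{np} = \frac{3}{4}$
$q = \frac{3}{4}$, so $p = \frac{1}{4}$
Since Mean = np = 4
n(1/4) = 4
n = 16
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$
$P(X = 0) = {}^{n}C_0\, p^0 q^n$
$= {}^{16}C_0 \left(\frac{3}{4}\right)^{16}$
$= \left(\frac{3}{4}\right)^{16} = 0.01$
$P(X = 1) = {}^{n}C_1\, p^1 q^{n-1}$
$= {}^{16}C_1\, p^1 q^{15}$
$= 16 \left(\frac{1}{4}\right) \left(\frac{3}{4}\right)^{15} = 0.053$
$P(X \ge 2) = 1 - P(X < 2)$
$= 1 - [P(X = 0) + P(X = 1)]$
= 0.937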
3. If the mean is 3 and the variance is 4 of a random variable X, check whether X follows a binomial distribution.
Solution
No. For a binomial distribution, variance = npq = (mean) × q with q < 1, so the mean must be greater than the variance. Here the variance (4) exceeds the mean (3), so X cannot follow a binomial distribution.
3. A binomial variate X satisfies the relation 9P(X=4) = P(X=2) when n=6. Find the parameter p of the binomial distribution.
Solution
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$
$P(X = 4) = {}^{6}C_4\, p^4 q^{6-4} = {}^{6}C_4\, p^4 q^2$
$P(X = 2) = {}^{6}C_2\, p^2 q^4$
$9 \cdot {}^{6}C_4\, p^4 q^2 = {}^{6}C_2\, p^2 q^4$
$135 p^2 = 15 q^2$
$9p^2 = q^2$
$9p^2 - q^2 = 0$
$9p^2 - (1 - p)^2 = 0$
$9p^2 - (1 + p^2 - 2p) = 0$
$9p^2 - 1 - p^2 + 2p = 0$
$8p^2 + 2p - 1 = 0$
$p = \frac{-2 \pm \sqrt{4 + 32}}{16}$
$p = \frac{-2 \pm 6}{16} = \frac{4}{16}, \frac{-8}{16}$
$p = \frac{1}{4}, \frac{-1}{2}$
Since a probability cannot be negative, p = 1/4.
4. Out of 800 families with 4 children each, how many families would be expected to have
Solution
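Considering each child as a trial, n = 4. Assuming that the birth of a boy is a success, p = 1/2 and q = 1/2.
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$
$P(X = 2) = {}^{4}C_2 \left(\frac{1}{2}\right)^2 \left(\frac{1}{2}\right)^{4-2}$
$= 6 \left(\frac{1}{2}\right)^4 = \frac{3}{8}$
Expected number of families = 800(3/8) = 100 × 3
= 300
$P(X \ge 1) = 1 - P[X = 0]$
$P(X = 0) = {}^{4}C_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^{4-0}$
$1 - P(X = 0) = 1 - \left(\frac{1}{2}\right)^4 = \frac{15}{16}$
$1 - [P(X = 0) + P(X = 1)] = 1 - \left[{}^{4}C_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^{4-0} + {}^{4}C_1 \left(\frac{1}{2}\right)^1 \left(\frac{1}{2}\right)^{4-1}\right]$
$= 1 - \left[\left(\frac{1}{2}\right)^4 + 4\left(\frac{1}{2}\right)^4\right] = 1 - \left(\frac{1}{16} + \frac{4}{16}\right) = 1 - \frac{5}{16}$
$= \frac{11}{16}$
$1 - [P(X = 4) + P(X = 0)] = 1 - \left[{}^{4}C_4 \left(\frac{1}{2}\right)^4 + {}^{4}C_0 \left(\frac{1}{2}\right)^0 \left(\frac{1}{2}\right)^4\right] = 1 - \left[\left(\frac{1}{2}\right)^4 + \left(\frac{1}{2}\right)^4\right]$
= 1 − 2/16 = 7/8
Expected number of families = 800 × 7/8 = 700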
5. An irregular 6 faced die is such that the probability that it gives 3 even numbers in 5 throws
is twice the probability that it gives 2 even numbers in 5 throws. How many sets of exactly 5
trials can be expected to give no even number out of 2500 sets.
Solution
Let the probability of getting an even number with the unfair die be p.
${}^{5}C_3\, p^3 q^2 = 2 \cdot {}^{5}C_2\, p^2 q^3$
p = 2q
p = 2(1 − p)
3p = 2
p = 2/3
q = 1 − p = 1/3
$P(X = 0) = {}^{5}C_0\, p^0 q^5 = \left(\frac{1}{3}\right)^5 = \frac{1}{243}$
Therefore number of sets having no success (even number) out of N sets = N[P(X = 0)]
= 2500 × 1/243
= 10 nearly
7. Assuming that half of the population is vegetarian and that 100 investigators each take 10 individuals to see whether they are vegetarians, how many would you expect to report that 3 people or less were vegetarians?
Solution
Here n = 10 and p = q = 1/2.
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$
$= {}^{10}C_x \left(\frac{1}{2}\right)^x \left(\frac{1}{2}\right)^{10-x}$
$= {}^{10}C_x \left(\frac{1}{2}\right)^{10}$
$P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= {}^{10}C_0 \left(\frac{1}{2}\right)^{10} + {}^{10}C_1 \left(\frac{1}{2}\right)^{10} + {}^{10}C_2 \left(\frac{1}{2}\right)^{10} + {}^{10}C_3 \left(\frac{1}{2}\right)^{10}$
$= \left(\frac{1}{2}\right)^{10} [1 + 10 + 45 + 120]$
$= \left(\frac{1}{2}\right)^{10} [176] = \frac{176}{1024} = 0.1719$
Among 100 investigators, the number of investigators who report that 3 or less were vegetarians
= 100 × 0.1719
= 17 investigators
14. A factory produces 10 articles daily. It may be assumed that there is a constant probability
p= 0.1 of producing a defective article. Before the articles are stored, they are inspected and
the defective ones are set aside. Suppose that there is a constant probability r = 0.1, that a
defective article is misclassified. If X denote the number of articles classified as defective at
the end of a production day, find a) P(X=3) and b) P(X>3)
Solution
Let X be the random variable representing the number of articles classified as defective.
P[an article is classified as defective] = P(an article produced is defective) × P(it is correctly classified as defective)
= 0.1 × 0.9
p = 0.09
q = 1 − p = 0.91
n = 10
$P(X = x) = {}^{n}C_x\, p^x q^{n-x}$
$P(X = 3) = {}^{10}C_3 (0.09)^3 (0.91)^7$
= 0.0452
$P(X > 3) = 1 - P(X \le 3)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$
$= 1 - [{}^{10}C_0 (0.09)^0 (0.91)^{10} + {}^{10}C_1 (0.09)^1 (0.91)^9 + {}^{10}C_2 (0.09)^2 (0.91)^8 + {}^{10}C_3 (0.09)^3 (0.91)^7]$
= 0.0088
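The two probabilities in the factory problem involve long products that are easy to mis-key, so a short numerical check is useful. This sketch assumes, as the solution does, that an article is counted only if it is defective and correctly classified; the names are illustrative.

```python
from math import comb

n = 10          # articles produced per day
p = 0.1 * 0.9   # defective AND correctly classified as defective
q = 1 - p


def pmf(x):
    """Binomial probability of exactly x articles classified as defective."""
    return comb(n, x) * p ** x * q ** (n - x)


p_exactly_3 = pmf(3)
p_more_than_3 = 1 - sum(pmf(x) for x in range(4))  # 1 - P(X <= 3)

print(round(p_exactly_3, 4), round(p_more_than_3, 4))
```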
POISSON DISTRIBUTION
Definition
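If X is a discrete random variable that assumes only non-negative values such that its probability mass function is given by
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$, x = 0, 1, 2, 3, ... where $\lambda > 0$
= 0, otherwise
then X is said to follow a Poisson distribution with parameter $\lambda$.
3. np (= $\lambda$) is finite and $p = \frac{\lambda}{n}$, $q = 1 - p = 1 - \frac{\lambda}{n}$, where $\lambda$ is a positive constant.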
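Mean $= E(X) = \sum_{x=0}^{\infty} x P(X = x)$
$= \sum_{x=0}^{\infty} x \frac{e^{-\lambda} \lambda^x}{x!}$
$= e^{-\lambda} \sum_{x=1}^{\infty} \frac{x\, \lambda\, \lambda^{x-1}}{x!}$
$= \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}$
Mean $= \lambda e^{-\lambda} e^{\lambda}$
$= \lambda$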
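$\mathrm{Var}(X) = E(X^2) - [E(X)]^2$
$E(X^2) = \sum_{x=0}^{\infty} x^2 p(x)$
$= \sum_{x=0}^{\infty} x^2 \frac{e^{-\lambda} \lambda^x}{x!}$
$= \sum_{x=0}^{\infty} (x^2 + x - x) \frac{e^{-\lambda} \lambda^x}{x!}$
$= \sum_{x=0}^{\infty} (x(x-1) + x) \frac{e^{-\lambda} \lambda^x}{x!}$
$= \sum_{x=0}^{\infty} x(x-1) \frac{e^{-\lambda} \lambda^x}{x!} + \sum_{x=0}^{\infty} x \frac{e^{-\lambda} \lambda^x}{x!}$
$= \sum_{x=2}^{\infty} \frac{e^{-\lambda} \lambda^x}{(x-2)!} + \sum_{x=1}^{\infty} \frac{e^{-\lambda} \lambda^x}{(x-1)!}$
$= \lambda^2 e^{-\lambda} \sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} + \lambda e^{-\lambda} \sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}$
$= \lambda^2 e^{-\lambda} e^{\lambda} + \lambda e^{-\lambda} e^{\lambda}$
$E(X^2) = \lambda^2 + \lambda$
$\mathrm{Var}(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$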
Examples:
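1. If X is a Poisson variate such that P(X=1) = 3/10 and P(X=2) = 1/5, find P(X=0) and P(X=3).
Solution
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$
$P(X = 1) = \frac{e^{-\lambda} \lambda^1}{1!} = \frac{3}{10}$   (1)
$P(X = 2) = \frac{e^{-\lambda} \lambda^2}{2!} = \frac{1}{5}$   (2)
Dividing (2) by (1):
$\frac{\lambda}{2} = \frac{1/5}{3/10} = \frac{2}{3}$, so $\lambda = \frac{4}{3}$
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-4/3} (4/3)^x}{x!}$
$P(X = 0) = \frac{e^{-4/3} (4/3)^0}{0!} = e^{-4/3} = 0.2636$
$P(X = 3) = \frac{e^{-4/3} (4/3)^3}{3!} = 0.1041$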
2. In a certain factory producing razor blades, there is a small chance 1/500 for any blade to
be defective. The blades are supplied in packets of 10.Use Poisson distribution to calculate
the approximate number of packets containing
Solution
λ = np = 10/500 = 1/50 = 0.02
$P(X = 0) = \frac{e^{-0.02} (0.02)^0}{0!} = 0.9802$
Therefore the number of packets containing no defective blade = 10000 × 0.9802
= 9802
P(X ≥ 1) = 1 − P(X < 1)
= 1 − P(X = 0)
= 1 − 0.9802 = 0.0198
Therefore the number of packets containing at least one defective = 10000 × 0.0198
= 198
P(X ≤ 1) = P(X = 0) + P(X = 1)
= 0.9998
Therefore the number of packets containing at most 1 defective blade = 10000 × 0.9998
= 9998
3. An insurance company has discovered that only about 0.1% of the population is involved in a certain type of accident each year. If its 10000 policy holders were randomly selected from the population, what is the probability that not more than 5 of its clients are involved in such an accident next year?
Solution
n = 10000, p = 0.001, so λ = np = 10
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-10} (10)^x}{x!}$
$P(X \le 5) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) + P(X = 5)$
$= \frac{e^{-10} (10)^0}{0!} + \frac{e^{-10} (10)^1}{1!} + \frac{e^{-10} (10)^2}{2!} + \frac{e^{-10} (10)^3}{3!} + \frac{e^{-10} (10)^4}{4!} + \frac{e^{-10} (10)^5}{5!}$
$= e^{-10} \left[1 + \frac{10}{1} + \frac{100}{2} + \frac{1000}{6} + \frac{10000}{24} + \frac{100000}{120}\right]$
= 0.0671
4. In a given city 4% of all licenced drivers will be involved in at least 1 road accident in any given year. Determine the probability that among 150 licenced drivers randomly chosen in this city
Solution
$\lambda = np = 150 \times \frac{4}{100} = 6$
i) $P(X = 5) = \frac{e^{-6} 6^5}{5!} = 0.1606$
ii) $P(X \le 3) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)$
$= e^{-6} + \frac{e^{-6}\, 6}{1!} + \frac{e^{-6}\, 6^2}{2!} + \frac{e^{-6}\, 6^3}{3!} = 0.1512$
7. Messages arrive at a switch board in a Poisson manner at an average rate of six per hour.
Find the probability for each of the following events
Solution
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-6} 6^x}{x!}$
$P(X = 2) = \frac{e^{-6} 6^2}{2!} = 0.0446$
$P(X = 0) = \frac{e^{-6} 6^0}{0!} = 0.0025$
$P(X \ge 3) = 1 - P(X < 3) = 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
$= 1 - e^{-6}(1 + 6 + 18) = 0.9380$
8. A car hire firm has 2 cars which it hires out day by day. The number of demands for a car
on each day follows a Poisson distribution with mean 1.5. Calculate the proportion of days on
which
Solution
P(x demands in a day) $= P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!}$
Given: λ = 1.5
Now $P(X = x) = \frac{e^{-1.5} (1.5)^x}{x!}$
$P(X = 0) = \frac{e^{-1.5} (1.5)^0}{0!} = e^{-1.5} = 0.2231$
$P(X > 2) = 1 - P(X \le 2)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2)]$
= 0.1912
9. The proofs of a 500 page book contains 500 misprints. Find the probability that there are at
least 4 misprints in a randomly chosen page.
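Solution
Average number of misprints per page, λ = 500/500 = 1
$P(X = x) = \frac{e^{-\lambda} \lambda^x}{x!} = \frac{e^{-1} 1^x}{x!}$
P(at least 4 mistakes) $= P(X \ge 4)$
$= 1 - P(X < 4)$
$= 1 - [P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3)]$
$= 1 - \left[\frac{e^{-1}}{0!} + \frac{e^{-1}}{1!} + \frac{e^{-1}}{2!} + \frac{e^{-1}}{3!}\right]$
$= 1 - e^{-1}\left[1 + 1 + \frac{1}{2} + \frac{1}{6}\right]$
= 0.0190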
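Most of the Poisson examples above reduce to a cumulative sum of the probability mass function, which can be sketched as below. The function names `poisson_pmf` and `poisson_cdf` are assumptions introduced for illustration; the λ values are those used in the insurance, switchboard and misprint examples.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x!"""
    return exp(-lam) * lam ** x / factorial(x)


def poisson_cdf(k, lam):
    """P(X <= k), summing the p.m.f. from 0 to k."""
    return sum(poisson_pmf(x, lam) for x in range(k + 1))


p_at_most_5 = poisson_cdf(5, 10)      # insurance example, lam = 10
p_at_least_3 = 1 - poisson_cdf(2, 6)  # switchboard example, lam = 6
p_at_least_4 = 1 - poisson_cdf(3, 1)  # misprint example, lam = 1

print(round(p_at_most_5, 4), round(p_at_least_3, 4), round(p_at_least_4, 4))
```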
NORMAL DISTRIBUTION
Definition
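A normal distribution is a continuous distribution given by
$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$
where X is a continuous normal variate distributed with density function
$f(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2}$ with mean $\mu$ and standard deviation $\sigma$.
$y = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2}\left(\frac{x - m}{\sigma}\right)^2}$ is the standard form of the normal curve with origin at (m, 0).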
1. The normal distribution is a symmetrical distribution and the graph of the normal
distribution is bell shaped.
2. The curve has a single peak point (i.e.,) the distribution is unimodal
3. The mean of the normal distribution lies at the centre of normal curve.
4.Because of the symmetry of the normal curve, the median and mode are also at the centre of
the normal curve. Hence in a normal distribution the mean, median and mode coincide.
5. The tails of the normal distribution extend indefinitely and never touch the horizontal axis. That is, the normal curve approaches the horizontal axis asymptotically on either side.
6. The normal distribution is a two-parameter probability distribution. The parameters mean and standard deviation (μ, σ) completely determine the distribution.
7. Area property:
In a normal distribution about 68% of the observations lie between mean ± S.D, i.e., (μ ± σ). About 95% of the observations lie between mean ± 2 S.D, i.e., (μ ± 2σ). About 99.7% of the observations lie between mean ± 3 S.D, i.e., (μ ± 3σ).
If X is a normally distributed random variable, and μ and σ are respectively its mean and standard deviation, then $Z = \frac{X - \mu}{\sigma}$ is called the standard normal random variable.
Normal table
A special table, called the table of areas under the normal curve, is available to determine the probability that the random variable lies in a given range of values. Using the table, we can determine the probability of X taking a value less than x, P(X < x), and also, for a given probability, the value x such that P(X < x) equals that probability.
,…,(mn, σn) respectively then X1+X2+…..+ Xn is also a normal variate with parameter (m, σ)
Examples
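1) X is a normal variate with mean 30 and standard deviation 5. Find the probability that
i) $26 \le X \le 40$; ii) $X \ge 45$; iii) $|X - 30| > 5$.
Solution
Given μ = 30; σ = 5
$z = \frac{X - \mu}{\sigma}$
i) When X = 26, $z = \frac{26 - 30}{5} = -0.8$
When X = 40, $z = \frac{40 - 30}{5} = 2$
$P(26 \le X \le 40) = P(-0.8 \le z \le 2)$
$= P(-0.8 \le z \le 0) + P(0 \le z \le 2)$
= 0.2881 + 0.4772
= 0.7653.
ii) When X = 45, $z = \frac{45 - 30}{5} = 3$
$P(X \ge 45) = P(z \ge 3)$
$= 0.5 - P(0 \le z \le 3)$
= 0.5 − 0.4987 = 0.0013.
iii) To find $P(|X - 30| > 5)$, first find
$P(|X - 30| \le 5) = P(25 \le X \le 35)$
When X = 25, $z = \frac{25 - 30}{5} = -1$
When X = 35, $z = \frac{35 - 30}{5} = 1$
$P(25 \le X \le 35) = P(-1 \le z \le 1) = 2P(0 \le z \le 1)$
= 2(0.3413)
= 0.6826.
$P(|X - 30| > 5) = 1 - P(|X - 30| \le 5)$
= 1 − 0.6826
= 0.3174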
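When a normal table is not at hand, the same areas can be computed from the error function, since $\Phi(z) = \frac{1}{2}\left[1 + \mathrm{erf}(z/\sqrt{2})\right]$. The sketch below recomputes the three probabilities for mean 30 and standard deviation 5; the function name `normal_cdf` is an assumption made for illustration, and small differences from table values come from table rounding.

```python
from math import erf, sqrt


def normal_cdf(x, mu=0.0, sigma=1.0):
    """P(X <= x) for a normal variate with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2.0))))


mu, sigma = 30, 5

p_i = normal_cdf(40, mu, sigma) - normal_cdf(26, mu, sigma)          # P(26 <= X <= 40)
p_ii = 1 - normal_cdf(45, mu, sigma)                                 # P(X >= 45)
p_iii = 1 - (normal_cdf(35, mu, sigma) - normal_cdf(25, mu, sigma))  # P(|X - 30| > 5)

print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4))
```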
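$P(15 \le X \le 40)$.
Solution
Given μ = 20 and σ = 10
We know that $z = \frac{X - \mu}{\sigma}$.
When X = 15, $z = \frac{15 - 20}{10} = -0.5$ and
when X = 40, $z = \frac{40 - 20}{10} = 2$
$P(15 \le X \le 40) = P(-0.5 \le z \le 2) = P(0 \le z \le 2) + P(0 \le z \le 0.5)$
= 0.4772 + 0.1915
= 0.6687.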
3. The average seasonal rainfall in a place is 16 inches with a standard deviation of 4 inches.
What is the probability that in a year the rainfall in that place will be between 20 and 24
inches?
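Solution
Given μ = 16, σ = 4
$z = \frac{X - \mu}{\sigma}$
When X = 20, $z = \frac{20 - 16}{4} = 1$
When X = 24, $z = \frac{24 - 16}{4} = 2$
$P(20 \le X \le 24) = P(1 \le z \le 2)$
$= P(0 \le z \le 2) - P(0 \le z \le 1)$
= 0.4772 − 0.3413
= 0.1359.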
Note
E(aX + bY) = aE(X) + bE(Y)
Var(aX + bY) = $a^2$V(X) + $b^2$V(Y), when X and Y are independent
Var(a) = 0
E(a) = a
4. X is a normal variate with mean 1 and variance 4. Y is another normal variate independent
of X with mean 2 and variance 3. What is the distribution of X+2Y?
Solution
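Since X and Y are independent normal variates, X + 2Y will also be a normal variate by the additive property, and
Mean of X + 2Y = E(X + 2Y) = E(X) + 2E(Y)
= 1 + 2(2) = 5
Variance of X + 2Y = V(X + 2Y) = V(X) + $2^2$V(Y)
= 4 + 4(3) = 16.
Hence X + 2Y is a normal variate with mean 5 and variance 16.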
5. The saving bank account of a customer showed an average balance of Rs.150 and a
standard deviation of Rs.50. Assuming that the account balances are normally distributed.
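Solution
Given μ = 150, σ = 50
1) To find P(X > 200)
We know that $z = \frac{X - \mu}{\sigma}$
When X = 200, $z = \frac{200 - 150}{50} = 1$
$P(X > 200) = P(z > 1) = 0.5 - P(0 \le z \le 1)$
= 0.5 − 0.3413
= 0.1587.
2) To find P(120 < X < 170)
When X = 120, $z = \frac{120 - 150}{50} = -0.6$
When X = 170, $z = \frac{170 - 150}{50} = 0.4$
$P(120 < X < 170) = P(-0.6 < z < 0.4) = P(0 \le z \le 0.6) + P(0 \le z \le 0.4)$
= 0.2257 + 0.1554 = 0.3811
3) To find P(X < 75)
When X = 75, $z = \frac{75 - 150}{50} = -1.5$
$P(X < 75) = P(z < -1.5) = 0.5 - P(0 \le z \le 1.5)$
= 0.5 − 0.4332 = 0.0668.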
6. The mean yield for a one-acre plot is 662 kilos with standard deviation 32 kilos. Assuming normal distribution, how many one-acre plots in a patch of 1000 plots would you expect to have yield (i) over 700 kilos (ii) below 650 kilos?
Solution
Given μ = 662, σ = 32
$z = \frac{X - \mu}{\sigma} = \frac{X - 662}{32}$
When X = 700, $z = \frac{700 - 662}{32} = 1.19$
When X = 650, $z = \frac{650 - 662}{32} = -0.375 \approx -0.38$
Descriptive Statistics
Types of Data:
155 7
156 4
157 2
158 5
159 1
160 1
-----
20
----
0-10 0
10-20 0
20-30 1
30-40 1
40-50 2
50-60 1
60-70 5
70-80 4
80-90 4
90-100 1
20
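Individual Observations:
Mean $= \frac{\sum x}{n}$
Mean $= A + \frac{\sum d}{n}$ where d = x − A, A = assumed mean
Median $= \left(\frac{n + 1}{2}\right)$th item
Discrete Series:
Mean $= \frac{\sum fx}{N}$ where $N = \sum f$
Mean $= A + \frac{\sum fd}{N}$ where d = x − A, A = assumed mean
Mean $= A + \frac{\sum fd}{N} \times i$ where $d = \frac{x - A}{i}$, A = assumed mean, i = class interval
Median $= \left(\frac{N + 1}{2}\right)$th item
Continuous Series:
Mean $= \frac{\sum fm}{N}$ where $N = \sum f$ and m = mid-value of the class
Mean $= A + \frac{\sum fd}{N}$ where d = m − A, A = assumed mean
Mean $= A + \frac{\sum fd}{N} \times i$ where $d = \frac{m - A}{i}$, A = assumed mean, i = class interval
Median $= \left(\frac{N}{2}\right)$th item
Median $= L + \frac{\frac{N}{2} - cf}{f} \times i$
where $N = \sum f$ and i = class interval
Mode $= M_0 = L + \frac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i$
where i = class interval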
Empirical relation:
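Mean $= \frac{\sum x}{n}$
= (25+32+28+34+24+31+36+27+29+30)/10 = 296/10
= 29.6
Arranging the data in ascending order: 24, 25, 27, 28, 29, 30, 31, 32, 34, 36
Median = ((n+1)/2)th item
= ((10+1)/2)th item = 5.5th item
5.5th item = (5th item + 6th item)/2 = (29+30)/2 = 29.5
There is no mode, since no value is repeated.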
2. Find the mode of following data: 2,3,2,1,3,2,3,3,2,1,3,3,3,2,2,1,1,3,3,3
x f
155 30
156 20
157 5
158 5
159 10
160 15
161 5
162 10
Solution:
x               f      fx      cf (cumulative frequency)
155 (Mode)      30     4650    30
156             20     3120    50
157 (Median)    5      785     55
158             5      790     60
159             10     1590    70
160             15     2400    85
161             5      805     90
162             10     1620    100
Total           N = Σf = 100   Σfx = 15760
N = 100
Mean $= \frac{\sum fx}{N}$
= 15760/100 = 157.60
$\bar{X}$ = 157.60
Median = 157
Mode = 155
4.Find the mean, median and mode for the following data:
Class(x) frequency(f)
0-10 20
10-20 5
20-30 3
30-40 8
40-50 10
50-60 35
60-70 10
70-80 4
80-90 3
90-100 2
N = f =100
Solution:
Class(x)                          m     frequency(f)   fm      cf
0-10                              5     20             100     20
10-20                             15    5              75      25
20-30                             25    3              75      28
30-40                             35    8              280     36
40-50                             45    10 (f0)        450     46
50-60 (median and modal class)    55    35 (f1)        1925    81
60-70                             65    10 (f2)        650     91
70-80                             75    4              300     95
80-90                             85    3              255     98
90-100                            95    2              190     100
Total                             N = Σf = 100, Σfm = 4300
Mean $= \dfrac{\sum fm}{N}$, where $N = \sum f$
= 4300/100 = 43
Median = (N/2)th item = (100/2)th item = 50th item, which lies in the class 50-60
Median $= L + \dfrac{\frac{N}{2} - cf}{f} \times i = 50 + \dfrac{50 - 46}{35} \times 10$
= 50 + 1.1428 = 51.1428
$M_0 = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 50 + \dfrac{35 - 10}{2(35) - 10 - 10} \times 10$
= 50 + 5 = 55
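The three grouped-data formulas used in this solution can be sketched in a few lines. This is an illustrative sketch only: the variable names are assumptions, and the median class is taken as the first class whose cumulative frequency reaches N/2.

```python
lowers = [0, 10, 20, 30, 40, 50, 60, 70, 80, 90]   # lower class boundaries L
freqs = [20, 5, 3, 8, 10, 35, 10, 4, 3, 2]         # class frequencies f
width = 10                                          # class interval i

N = sum(freqs)
mids = [lo + width / 2 for lo in lowers]

# Mean = sum(f * m) / N
grouped_mean = sum(f * m for f, m in zip(freqs, mids)) / N

# Median = L + ((N/2 - cf) / f) * i, with cf the cumulative frequency before the median class
cum = 0
for k, f in enumerate(freqs):
    if cum + f >= N / 2:
        grouped_median = lowers[k] + (N / 2 - cum) / f * width
        break
    cum += f

# Mode = L + (f1 - f0) / (2*f1 - f0 - f2) * i, using the class with the largest frequency
k = freqs.index(max(freqs))
f1 = freqs[k]
f0 = freqs[k - 1] if k > 0 else 0
f2 = freqs[k + 1] if k + 1 < len(freqs) else 0
grouped_mode = lowers[k] + (f1 - f0) / (2 * f1 - f0 - f2) * width

print(grouped_mean, round(grouped_median, 4), grouped_mode)
```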
Another method:
Class(x)   m     frequency(f)   d = (m − 55)/10   fd
0-10       5     20             -5                -100
10-20      15    5              -4                -20
20-30      25    3              -3                -9
30-40      35    8              -2                -16
40-50      45    10             -1                -10
50-60      55    35             0                 0
60-70      65    10             1                 10
70-80      75    4              2                 8
80-90      85    3              3                 9
90-100     95    2              4                 8
Total      N = Σf = 100, Σfd = -120
Mean $= A + \dfrac{\sum fd}{N} \times i$, where $d = \dfrac{m - A}{i}$, A is the assumed mean and i is the class interval
A = 55, i = 10
Mean $= 55 + \dfrac{-120}{100} \times 10$
= 55 − 12 = 43
MEASURES OF DISPERSION
Range = L − S, where L is the largest value and S is the smallest value
Coefficient of Range $= \dfrac{L - S}{L + S}$
Problems:
Example 2: Calculate the coefficient of range separately for the two sets of data
given below:
Set 1 8 10 20 9 15 10 13 28
Set 2 30 35 42 50 32 49 39 33
Solution: It can be seen that the range in both the sets of data is the same:
Set 1: 28 − 8 = 20
Set 2: 50 − 30 = 20
Coefficient of range in set 1 is:
(28 − 8)/(28 + 8) = 20/36 = 0.56
Coefficient of range in set 2 is:
(50 − 30)/(50 + 30) = 20/80 = 0.25
3.
X f
158 15
159 20
160 32
161 35
162 33
163 22
164 20
165 10
166 8
N=195
Range = 166-158=8
20- 40 7
40- 60 11
60- 80 30
80-100 17
100-120 5
Total 70
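Quartile Deviation:
$QD = \dfrac{Q_3 - Q_1}{2}$
where Q1 is the 1st quartile and Q3 is the 3rd quartile
Coefficient of quartile deviation $= \dfrac{Q_3 - Q_1}{Q_3 + Q_1}$
To find Q1:
Q1 = value of the $\left(\dfrac{N+1}{4}\right)$th item (Individual observations)
Q1 = value of the $\left(\dfrac{N+1}{4}\right)$th item (Discrete series)
Q1 class = class containing the $\left(\dfrac{N}{4}\right)$th item (Continuous series)
$Q_1 = L + \dfrac{\frac{N}{4} - cf}{f} \times i$
To find Q3:
Q3 = value of the $\left(\dfrac{3(N+1)}{4}\right)$th item (Individual observations)
Q3 = value of the $\left(\dfrac{3(N+1)}{4}\right)$th item (Discrete series)
Q3 class = class containing the $\left(\dfrac{3N}{4}\right)$th item (Continuous series)
$Q_3 = L + \dfrac{\frac{3N}{4} - cf}{f} \times i$
1. Find the quartile deviation and the coefficient of QD: 3, 8, 6, 10, 12, 9, 11, 10, 12, 7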
Solution:
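Arranged in ascending order: 3, 6, 7, 8, 9, 10, 10, 11, 12, 12
N = 10
Q1 = ((N+1)/4)th item = 2.75th item = 6 + 0.75(7 − 6) = 6.75
Q3 = (3(N+1)/4)th item = 8.25th item = 11 + 0.25(12 − 11) = 11.25
QD = (Q3 − Q1)/2
QD = (11.25 − 6.75)/2 = 2.25
Coefficient of QD = (11.25 − 6.75)/(11.25 + 6.75) = 0.25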
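The fractional item positions used for Q1 and Q3 can be sketched as follows. The helper item_at is a hypothetical name introduced for illustration; it interpolates linearly between neighbouring sorted values, which is the convention this solution relies on.

```python
def item_at(sorted_values, position):
    """Value at a 1-based, possibly fractional, position (linear interpolation)."""
    lower = int(position)
    frac = position - lower
    value = sorted_values[lower - 1]
    if frac > 0:
        value += frac * (sorted_values[lower] - sorted_values[lower - 1])
    return value

values = sorted([3, 8, 6, 10, 12, 9, 11, 10, 12, 7])
n = len(values)

q1 = item_at(values, (n + 1) / 4)       # 2.75th item
q3 = item_at(values, 3 * (n + 1) / 4)   # 8.25th item

qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(q1, q3, qd, coeff_qd)
```

Other software may use a different quartile convention and return slightly different values for the same data.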
2. Compute the Quartile deviation and its relative measure.
X f
158 15
159 20
160 32
161 35
162 33
163 22
164 20
165 10
166 8
N=195
Solution:
X f cf
158 15 15
159 20 35
160Q1 32 67
161 35 102
162 33 135
163Q3 22 157
164 20 177
165 10 187
166 8 195
N=195
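Q1 = ((N+1)/4)th item (Discrete series)
= ((195+1)/4)th item = 49th item
Q1 = 160
Q3 = (3(N+1)/4)th item (Discrete series)
= (3(195+1)/4)th item = 147th item
Q3 = 163
QD = (Q3 − Q1)/2
= (163 − 160)/2 = 1.5
CQD = (163 − 160)/(163 + 160) = 0.0093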
3. Find the quartile deviation and its relative measure.
Class(x) frequency(f)
0-10 20
10-20 5
20-30 3
30-40 8
40-50 10
50-60 35
60-70 10
70-80 4
80-90 3
90-100 2
N = f =100
Solution:
Class(x) frequency(f) cf
0-10 20 20
10-20 5 25
20-30 Q1 3 28
30-40 8 36
40-50 10 46
50-60Q3 35 81
60-70 10 91
70-80 4 95
80-90 3 98
90-100 2 100
N = f =100
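Q1 = (N/4)th item = 25th item
$Q_1 = L + \dfrac{\frac{N}{4} - cf}{f} \times i = 20 + \dfrac{25 - 25}{3} \times 10 = 20$
Q3 = (3N/4)th item = 75th item, which lies in the class 50-60
$Q_3 = L + \dfrac{\frac{3N}{4} - cf}{f} \times i = 50 + \dfrac{75 - 46}{35} \times 10 = 58.29$
QD = (Q3 − Q1)/2
= (58.29 − 20)/2
= 19.15
CQD = (58.29 − 20)/(58.29 + 20)
= 0.4890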
Mean Deviation:
$MD = \dfrac{\sum |D|}{N}$, where $D = x - \bar{x}$ (Individual Observations)
(OR)
$MD = \dfrac{\sum |D|}{N}$, where D = x − Median (Individual Observations)
$MD = \dfrac{\sum f|D|}{N}$, where $D = x - \bar{x}$ (Discrete series)
(OR)
$MD = \dfrac{\sum f|D|}{N}$, where D = x − Median (Discrete series)
$MD = \dfrac{\sum f|D|}{N}$, where $D = m - \bar{x}$ (Continuous series)
(OR)
$MD = \dfrac{\sum f|D|}{N}$, where D = m − Median (Continuous series)
1. Find the MD of the set of numbers 3, 8, 6, 10, 12, 9, 11, 10, 12, 7
Soln:
X |D|=|x-8.8|
3 5.8
8 0.8
6 2.8
10 1.2
12 3.2
9 0.2
11 2.2
10 1.2
12 3.2
7 1.8
Total: $\sum |D| = 22.4$
$\sum X = 88$
N = 10
Mean $= \bar{X} = 88/10 = 8.8$
$MD = \dfrac{\sum |D|}{N} = 22.4/10 = 2.24$
Solution:
X              f     cf     |D| = |x − 161|   f|D|
158            15    15     3                 45
159            20    35     2                 40
160            32    67     1                 32
161 (Median)   35    102    0                 0
162            33    135    1                 33
163            22    157    2                 44
164            20    177    3                 60
165            10    187    4                 40
166            8     195    5                 40
N = 195, $\sum f|D| = 334$
Median = ((N+1)/2)th item = ((195+1)/2)th item = 98th item
Median = 161
$MD = \dfrac{\sum f|D|}{N}$, where D = x − Median (Discrete series)
Mean deviation = 334/195 = 1.7128
2-4 20
4-6 40
6-8 30
8-10 10
Solution:
X       m    f     fm     |D| = |m − 5.6|   f|D|
2-4     3    20    60     2.6               52
4-6     5    40    200    0.6               24
6-8     7    30    210    1.4               42
8-10    9    10    90     3.4               34
N = 100, $\sum fm = 560$, $\sum f|D| = 152$
Mean $= \dfrac{\sum fm}{N}$ = 560/100 = 5.6
$MD = \dfrac{\sum f|D|}{N}$, where $D = m - \bar{x}$ (Continuous series)
MD = 152/100 = 1.52
CMD = MD/Mean
= 1.52/5.6 = 0.271
Standard Deviation:
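Individual observations:
$\sigma = \sqrt{\dfrac{\sum x^2}{n}}$, where $x = X - \bar{X}$
$\sigma = \sqrt{\dfrac{\sum d^2}{n} - \left(\dfrac{\sum d}{n}\right)^2}$, where d = X − A
Discrete series:
$\sigma = \sqrt{\dfrac{\sum fx^2}{N}}$, where $x = X - \bar{X}$
$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2}$, where d = X − A
Continuous series:
$\sigma = \sqrt{\dfrac{\sum fx^2}{N}}$, where $x = m - \bar{X}$
$\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} \times i$, where $d = \dfrac{m - A}{i}$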
1. Find the Standard deviation of the set of numbers 3, 8, 6, 10, 12, 9, 11, 10, 12, 7
Soln:
X x = X − 8.8 x2
3 -5.8 33.64
8 -0.8 0.64
6 -2.8 7.84
10 1.2 1.44
12 3.2 10.24
9 0.2 0.04
11 2.2 4.84
10 1.2 1.44
12 3.2 10.24
7 -1.8 3.24
Total: $\sum x^2 = 73.6$
$\sum X = 88$
N = 10
Mean $= \bar{X} = 88/10 = 8.8$
Standard deviation $\sigma = \sqrt{\dfrac{\sum x^2}{n}}$, where $x = X - \bar{X}$
$= \sqrt{\dfrac{73.6}{10}} = \sqrt{7.36} = 2.712$
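The same computation can be sketched directly from the definition. Note that the divisor is n (the population form used throughout these notes), not n − 1; the variable names are illustrative.

```python
from math import sqrt

values = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]
n = len(values)

mean = sum(values) / n

# Sum of squared deviations from the mean (the x^2 column of the table)
sum_sq_dev = sum((v - mean) ** 2 for v in values)

# Standard deviation with divisor n
sigma = sqrt(sum_sq_dev / n)

print(mean, round(sum_sq_dev, 2), round(sigma, 3))
```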
2. Compute the Standard deviation.
X      f     fx      x = X − 161.5128   x²         fx²
158    15    2370    -3.5128            12.3398    185.0965
159    20    3180    -2.5128            6.3142     126.2833
160    32    5120    -1.5128            2.2886     73.2340
161    35    5635    -0.5128            0.2630     9.2037
162    33    5346    0.4872             0.2374     7.8330
163    22    3586    1.4872             2.2118     48.6588
164    20    3280    2.4872             6.1862     123.7233
165    10    1650    3.4872             12.1606    121.6060
166    8     1328    4.4872             20.1350    161.0800
N = 195, $\sum fx = 31495$, $\sum fx^2 \approx 856.7055$
Mean $= \dfrac{\sum fx}{N}$ = 31495/195 = 161.5128
$\sigma = \sqrt{\dfrac{\sum fx^2}{N}} = \sqrt{\dfrac{856.7055}{195}}$
= 2.09603
2-4 20
4-6 40
6-8 30
8-10 10
Coefficient of Variation:
$CV = \dfrac{\sigma}{\bar{X}} \times 100$
4. Find the Range, Quartile deviation , Mean deviation and standard deviation for the
following data:
2-4 20
4-6 40
6-8 30
8-10 10
Solution:
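Range = 10 − 2 = 8
Table columns: Size of item, Mid-point (m), Frequency (f), fm, cf, |D| = |m − 5.6|, f|D|, d = (m − A)/2, fd, fd²
Here N = 100
Mean = 5.6
$MD = \dfrac{\sum f|D|}{N}$, where $D = m - \bar{x}$
$MD = \dfrac{152}{100}$
MD = 1.52
Q1 = (N/4)th item
= 25th item
Q1 lies in 4-6
$Q_1 = L + \dfrac{\frac{N}{4} - cf}{f} \times i = 4 + \dfrac{25 - 20}{40} \times 2$
Q1 = 4.25
Q3 = (3N/4)th item
= 75th item
Q3 lies in 6-8
$Q_3 = L + \dfrac{\frac{3N}{4} - cf}{f} \times i = 6 + \dfrac{75 - 60}{30} \times 2$
Q3 = 7
$QD = \dfrac{Q_3 - Q_1}{2}$
= (7 − 4.25)/2 = 1.375
Coefficient of QD = (7 − 4.25)/(7 + 4.25) = 0.244
$\sigma = \sqrt{\dfrac{\sum fx^2}{N}}$, where $x = m - \bar{X}$
or $\sigma = \sqrt{\dfrac{\sum fd^2}{N} - \left(\dfrac{\sum fd}{N}\right)^2} \times i$, where $d = \dfrac{m - A}{2}$
$\sigma$ = 1.8
CV = 32.1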
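All of the dispersion measures in this problem can be recomputed from the four class frequencies. The sketch below is illustrative; the helper name quartile and the variable names are assumptions, and the quartile class is taken as the first class whose cumulative frequency reaches the target position.

```python
from math import sqrt

lowers = [2, 4, 6, 8]      # lower class boundaries
freqs = [20, 40, 30, 10]   # class frequencies
width = 2                  # class interval i

N = sum(freqs)
mids = [lo + width / 2 for lo in lowers]

value_range = (lowers[-1] + width) - lowers[0]
mean = sum(f * m for f, m in zip(freqs, mids)) / N
md = sum(f * abs(m - mean) for f, m in zip(freqs, mids)) / N
sd = sqrt(sum(f * (m - mean) ** 2 for f, m in zip(freqs, mids)) / N)
cv = sd / mean * 100

def quartile(target):
    """L + ((target - cf) / f) * i for the class containing the target-th item."""
    cum = 0
    for lo, f in zip(lowers, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f

q1 = quartile(N / 4)
q3 = quartile(3 * N / 4)
qd = (q3 - q1) / 2
coeff_qd = (q3 - q1) / (q3 + q1)

print(value_range, mean, md, q1, q3, qd, round(coeff_qd, 3), round(sd, 2), round(cv, 1))
```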
Mean = 161.5128
QD = 1.5
CQD = 0.0093
MD about mean = 1.736
MD about median = 1.7128
$SD = \sqrt{\dfrac{856.7055}{195}} = 2.096$
CV = (2.096/161.5128) × 100 = 1.2977
Range = 8
A: 12 115 6 73 7 19 119 36 84 29
B: 47 12 16 42 4 51 37 48 13 0
Solution:
X x=X-mean x2 Y y=Y-mean y2
12 -38 1444 47 20 400
115 65 4225 12 -15 225
6 -44 1936 16 -11 121
73 23 529 42 15 225
7 -43 1849 4 -23 529
19 -31 961 51 24 576
119 69 4761 37 10 100
36 -14 196 48 21 441
84 34 1156 13 -14 196
29 -21 441 0 -27 729
total=500 17498 270 3542
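N = 10
$\bar{X} = \dfrac{\sum X}{n}$ = 500/10 = 50
$\bar{Y} = \dfrac{\sum Y}{n}$ = 270/10 = 27
$\sigma_x = \sqrt{\dfrac{\sum x^2}{n}} = \sqrt{\dfrac{17498}{10}} = \sqrt{1749.8} = 41.8306$
$\sigma_y = \sqrt{\dfrac{\sum y^2}{n}} = \sqrt{\dfrac{3542}{10}} = \sqrt{354.2} = 18.8202$
CV for A:
$CV = \dfrac{\sigma_x}{\bar{X}} \times 100 = \dfrac{41.8306}{50} \times 100$ = 83.6612%
CV for B:
$CV = \dfrac{\sigma_y}{\bar{Y}} \times 100 = \dfrac{18.8202}{27} \times 100$ = 69.70%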
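The two coefficients of variation can be recomputed with the standard library. This is a sketch under the assumption that the population standard deviation (divisor n) is intended, as in the formulas above; the function name cv is illustrative.

```python
from statistics import mean, pstdev

series_a = [12, 115, 6, 73, 7, 19, 119, 36, 84, 29]
series_b = [47, 12, 16, 42, 4, 51, 37, 48, 13, 0]

def cv(values):
    """Coefficient of variation in percent: (sigma / mean) * 100, sigma with divisor n."""
    return pstdev(values) / mean(values) * 100

cv_a = cv(series_a)
cv_b = cv(series_b)

# The series with the smaller CV shows less variability relative to its mean
less_variable = "B" if cv_b < cv_a else "A"

print(round(cv_a, 2), round(cv_b, 2), less_variable)
```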
MEASURES OF SKEWNESS
Skewness
The literal meaning of skewness is lack of symmetry. It measures the degree of departure of a distribution from
symmetry and reveals the direction of scatter of the items.
A frequency distribution is said to be symmetrical when values of the variable equidistant from the mean
have equal frequencies. If a frequency distribution is not symmetrical, it is said to be asymmetrical or skewed.
Any deviation from symmetry is called skewness.
According to Morris Hamburg, skewness refers to the asymmetry or lack of symmetry in the shape of a
frequency distribution.
According to Croxton & Cowden, when a series is not symmetrical it is said to be asymmetrical or skewed.
According to Simpson & Kafka, measures of skewness tell us the direction and the extent of skewness. In a
symmetrical distribution the mean, median and mode are identical. The more the mean moves away from the mode,
the larger the asymmetry or skewness.
Symmetrical curve
The figure , given below, presents the shape of a symmetrical curve which is bell shaped having no skewness.
The value of mean (M), median (Md) and mode (Mo) for such a curve would be identical.
Fig. 5.1 Symmetrical distribution
In a symmetrical distribution the values of mean, median and mode coincide. The spread of the frequencies is
the same on both sides of the centre point of the curve. For a symmetrical distribution Mean = Median =
Mode.
Positively skewed curve
A positively skewed curve has a longer tail towards the higher values of X, i.e. the frequency curve gradually
slopes down towards the higher values of X. In a positively skewed distribution the mean is greater than the
median, which in turn is greater than the mode, so the median lies between the mean and the mode. The frequencies are spread over a
greater range of values on the high-value end of the curve (the right-hand side).
This measure is based on the fact that the mean and the mode are drawn widely apart. Skewness will be
positive if mean > mode and negative if mean < mode. There is no limit to this measure in theory, and this is a
slight drawback. But in practice the value given by this formula is rarely very high, and it usually lies
between -1 and +1.
Problems:
1. Calculate the Karl Pearson’s coefficient of Skewness :
3, 8, 6, 10, 12, 9, 11, 10, 12, 7
Solution:
Ascending order : 3, 6, 7,8, 9,10,10,11,12,12
Mean $= \dfrac{\sum x}{n}$ = 88/10 = 8.8
We can’t define Mode.
Median = ((n+1)/2)th item = ((10+1)/2)th item
= 5.5th item
Median = (5th item + 6th item)/2 = (9+10)/2 = 9.5
X x = X − 8.8 x2
3 -5.8 33.64
8 -0.8 0.64
6 -2.8 7.84
10 1.2 1.44
12 3.2 10.24
9 0.2 0.04
11 2.2 4.84
10 1.2 1.44
12 3.2 10.24
7 -1.8 3.24
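Total: $\sum x^2 = 73.6$
$\sum X = 88$
Standard deviation $\sigma = \sqrt{\dfrac{\sum x^2}{n}}$, where $x = X - \bar{X}$
$= \sqrt{\dfrac{73.6}{10}} = \sqrt{7.36} = 2.712$
Karl Pearson’s coefficient of skewness $= \dfrac{3(\bar{X} - \text{Median})}{\sigma}$
= 3(8.8 − 9.5)/2.712 = -0.77433
Negatively skewed.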
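The median-based form of Karl Pearson's coefficient used above can be sketched as follows. The variable names are illustrative; the standard deviation uses divisor n, matching the solution.

```python
from statistics import mean, median, pstdev

values = [3, 8, 6, 10, 12, 9, 11, 10, 12, 7]

x_bar = mean(values)
med = median(values)
sigma = pstdev(values)   # population standard deviation (divisor n)

# Karl Pearson's coefficient of skewness, median form: 3 * (mean - median) / sigma
pearson_sk = 3 * (x_bar - med) / sigma

direction = "negative" if pearson_sk < 0 else "positive"
print(round(pearson_sk, 3), direction)
```

The code keeps the unrounded standard deviation, so its result can differ from the hand calculation in the fourth decimal place.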
X             f     fX      x = X − mean   x²       fx²
155 (mode)    30    4650    -2.60          6.76     202.80
156           20    3120    -1.60          2.56     51.20
157           5     785     -0.60          0.36     1.80
158           5     790     0.40           0.16     0.80
159           10    1590    1.40           1.96     19.60
160           15    2400    2.40           5.76     86.40
161           5     805     3.40           11.56    57.80
162           10    1620    4.40           19.36    193.60
N = Σf = 100, $\sum fX = 15760$, $\sum fx^2 = 614.00$
Mean $= \dfrac{\sum fX}{N}$ = 15760/100 = 157.60
$\bar{X}$ = 157.60
Mode = 155
Standard deviation $\sigma = \sqrt{\dfrac{\sum fx^2}{N}}$, where x = X − Mean
$= \sqrt{\dfrac{614}{100}} = \sqrt{6.14} = 2.478$
Skewness $= \dfrac{\bar{X} - \text{Mode}}{\sigma}$
= (157.60 − 155)/2.478 = 1.049
Positively skewed.
Class(x) frequency(f)
0-10 20
10-20 5
20-30 3
30-40 8
40-50 10
50-60 35
60-70 10
70-80 4
80-90 3
90-100 2
N = f =100
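Solution:
Mean $= \dfrac{\sum fm}{N}$, where $N = \sum f$
= 4300/100 = 43
$M_0 = L + \dfrac{f_1 - f_0}{2f_1 - f_0 - f_2} \times i = 50 + \dfrac{35 - 10}{2(35) - 10 - 10} \times 10$
= 50 + 5 = 55
Standard deviation $\sigma = \sqrt{\dfrac{\sum fx^2}{N}}$, where x = m − Mean
$= \sqrt{\dfrac{59000}{100}} = \sqrt{590}$
= 24.29
Skewness = (43 − 55)/24.29 = -0.4940
Negatively skewed.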
A = 22.5
X             f     m       cf    d = (m − A)/5   fd     d²    fd²
0-5           2     2.5     2     -4              -8     16    32
5-10          5     7.5     7     -3              -15    9     45
10-15         7     12.5    14    -2              -14    4     28
15-20 (Q1)    13    17.5    27    -1              -13    1     13
20-25 (M)     21    22.5    48    0               0      0     0
25-30 (Q3)    16    27.5    64    1               16     1     16
30-35         8     32.5    72    2               16     4     32
35-40         3     37.5    75    3               9      9     27
Total         75                                  -9           193
$Q_1 = 15 + \dfrac{18.75 - 14}{13} \times 5 = 16.827$
Median $= 20 + \dfrac{37.5 - 27}{21} \times 5 = 22.5$
$Q_3 = 25 + \dfrac{56.25 - 48}{16} \times 5 = 27.58$
$Sk = \dfrac{Q_3 + Q_1 - 2\,\text{Median}}{Q_3 - Q_1} = \dfrac{27.58 + 16.827 - 45}{27.58 - 16.827} = -0.055$
Negatively skewed.
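The quartile-based (Bowley) coefficient in this solution can be sketched from the frequency table. The helper name position_value is an assumption for illustration; it applies the L + ((target − cf)/f) × i formula to whichever class contains the target item.

```python
lowers = [0, 5, 10, 15, 20, 25, 30, 35]   # lower class boundaries
freqs = [2, 5, 7, 13, 21, 16, 8, 3]       # class frequencies
width = 5

N = sum(freqs)

def position_value(target):
    """Interpolated value of the target-th item in the grouped data."""
    cum = 0
    for lo, f in zip(lowers, freqs):
        if cum + f >= target:
            return lo + (target - cum) / f * width
        cum += f

q1 = position_value(N / 4)         # 18.75th item
med = position_value(N / 2)        # 37.5th item
q3 = position_value(3 * N / 4)     # 56.25th item

# Quartile coefficient of skewness: (Q3 + Q1 - 2*Median) / (Q3 - Q1)
bowley_sk = (q3 + q1 - 2 * med) / (q3 - q1)

print(round(q1, 3), med, round(q3, 3), round(bowley_sk, 3))
```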
UNIT III
Multivariate Analysis
Regression:
1. Find the regression lines
Solution:
X Y dx=X-12 dy=Y-43 dx2 dy2 dxdy
10 40 -2 -3 4 9 6
12 38 0 -5 0 25 0
13 43 1 0 1 0 0
12 45 0 2 0 4 0
16 37 4 -6 16 36 -24
15 43 3 0 9 0 0
78 246 6 -12 30 74 -18
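$\bar{X} = \dfrac{\sum X}{N}$ = 78/6 = 13
$\bar{Y} = \dfrac{\sum Y}{N}$ = 246/6 = 41
$b_{xy} = \dfrac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dy^2 - \left(\sum dy\right)^2} = \dfrac{6(-18) - (6)(-12)}{6(74) - (-12)^2}$
bxy = -36/300 = -0.12
$b_{yx} = \dfrac{N\sum dx\,dy - \sum dx \sum dy}{N\sum dx^2 - \left(\sum dx\right)^2} = \dfrac{6(-18) - (6)(-12)}{6(30) - (6)^2}$
byx = -36/144 = -0.25
The regression equation of X on Y is
$(X - \bar{X}) = b_{xy}(Y - \bar{Y})$
X − 13 = -0.12(Y − 41)
X − 13 = -0.12Y + 4.92
X = -0.12Y + 4.92 + 13
X = -0.12Y + 17.92
The regression equation of Y on X is
$(Y - \bar{Y}) = b_{yx}(X - \bar{X})$
Y − 41 = -0.25(X − 13)
Y = -0.25X + 3.25 + 41
Y = -0.25X + 44.25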
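The regression coefficients can be recomputed from the raw data with the same assumed-mean formulas. This is an illustrative sketch; the variable names are assumptions, and the assumed means 12 and 43 are the ones used in the table.

```python
xs = [10, 12, 13, 12, 16, 15]
ys = [40, 38, 43, 45, 37, 43]
n = len(xs)

# Deviations from the assumed means used in the table
dx = [x - 12 for x in xs]
dy = [y - 43 for y in ys]

s_dx, s_dy = sum(dx), sum(dy)
s_dxdy = sum(a * b for a, b in zip(dx, dy))
s_dx2 = sum(a * a for a in dx)
s_dy2 = sum(b * b for b in dy)

numerator = n * s_dxdy - s_dx * s_dy
b_xy = numerator / (n * s_dy2 - s_dy ** 2)   # regression coefficient of X on Y
b_yx = numerator / (n * s_dx2 - s_dx ** 2)   # regression coefficient of Y on X

x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Intercepts of the two lines: X = b_xy * Y + c1 and Y = b_yx * X + c2
c1 = x_bar - b_xy * y_bar
c2 = y_bar - b_yx * x_bar

print(b_xy, b_yx, round(c1, 2), round(c2, 2))
```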
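2Y − X = 50 -----(1)
3Y − 2X = 10 -----(2)
The two regression lines intersect at the means, so (1) and (2) are solved simultaneously.
2 × (1): -2X + 4Y = 100
(2): -2X + 3Y = 10
Subtracting, Y = 90
i.e., $\bar{Y} = 90$
Substituting in (1): 2(90) − X = 50
180 − X = 50
X = 180 − 50 = 130
i.e., $\bar{X} = 130$
Suppose (1) is the regression line of X on Y and (2) is the regression line of Y on X:
2Y − X = 50
X = 2Y − 50
bxy = 2
3Y − 2X = 10
3Y = 2X + 10
Y = (2/3)X + (10/3)
byx = 2/3
$r = \sqrt{b_{xy}\,b_{yx}} = \sqrt{2 \times \dfrac{2}{3}} = 1.155$
Since r cannot exceed 1, this assumption is wrong. Hence (1) is the regression line of Y on X and (2) is the regression line of X on Y:
2Y − X = 50
2Y = X + 50
Y = (1/2)X + 25
byx = 1/2
2X = 3Y − 10
X = (3/2)Y − 5
bxy = 3/2
$r = \sqrt{b_{xy}\,b_{yx}} = \sqrt{\dfrac{3}{2} \times \dfrac{1}{2}} = 0.8660$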
$f(\chi^2) = \dfrac{1}{2^{v/2}\,\Gamma\!\left(\frac{v}{2}\right)}\,(\chi^2)^{\frac{v}{2} - 1}\,e^{-\chi^2/2}$, $0 < \chi^2 < \infty$, where v is the degrees of freedom.
(i). As the degrees of freedom v increase, the curve becomes more and more
symmetrical. As v decreases, the curve is skewed more and more to the right.
Uses:
i. The $\chi^2$-distribution is used to test goodness of fit, i.e. it is used to judge whether the
sample is from the hypothetical population.
ii. It is used to test the independence of attributes, i.e. if a population is known to
have two attributes, then the $\chi^2$-distribution is used to test whether the attributes are
independent.
Definition: $\chi^2$ – Test
Karl Pearson developed a test for testing the significance of discrepancy between
experimental values and the theoretical values obtained under some theory or
hypothesis. This test is known as the $\chi^2$ test of goodness of fit. Let o1, o2, ..., on be the
observed frequencies and e1, e2, ..., en be the corresponding expected frequencies
such that $\sum_{i=1}^{n} o_i = N = \sum_{i=1}^{n} e_i$. Then
$\chi^2 = \sum \dfrac{(o_i - e_i)^2}{e_i}$
i. The number of observations N in the sample must be reasonably large, say ≥ 50.
ii. Individual frequencies must not be too small.
iii. The number of classes n must be neither too small nor too large, i.e. 4 ≤ n ≤ 16.
Problems
On the basis of this data can it be concluded that there is a significant difference in the
effect of the drug and sugar pills?
Soln :
H0 = There is no difference between the effect of the drug and sugar pills.
O     E     (O − E)   (O − E)²/E
52    48    4         0.333
10    11    -1        0.091
20    23    -3        0.391
44    48    -4        0.333
12    11    1         0.091
26    23    3         0.391
Total                 1.630
2. The number of automobile accidents per week in a certain community was as follows: 12
8 20 2 14 10 15 6 9 4. Are these frequencies in
agreement with the belief that accident conditions were the same during this 10 week
period?
Soln :
H0 : The given frequencies are consistent with the belief that accident conditions were
the same during the 10 week period.
Expected frequency for each week = 100/10 = 10
O     E     (O − E)²/E
12    10    0.4
8     10    0.4
20    10    10.0
2     10    6.4
14    10    1.6
10    10    0
15    10    2.5
6     10    1.6
9     10    0.1
4     10    3.6
Total       26.6
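The table above can be recomputed in a few lines. This sketch assumes, as H0 states, that the 100 accidents are spread equally over the 10 weeks; the variable names are illustrative.

```python
observed = [12, 8, 20, 2, 14, 10, 15, 6, 9, 4]

# Under H0 every week has the same expected count: total / number of weeks
expected = sum(observed) / len(observed)

# Chi-square statistic: sum of (O - E)^2 / E over all classes
terms = [(o - expected) ** 2 / expected for o in observed]
chi_square = sum(terms)

degrees_of_freedom = len(observed) - 1

print(terms, round(chi_square, 1), degrees_of_freedom)
```

The statistic is then compared with the tabulated chi-square value for 9 degrees of freedom at the chosen level of significance.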
3. The theory predicts the proportion of beans in the four groups A,B,C and D should be
9:3:3:1. In a experiment with 1600 beans the number in the four groups were 882,313,287
and 118. Does the experiment result support the theory?
Soln :
H0 : The experimental results support the theory.
Expected Frequencies (1600 divided in the ratio 9:3:3:1)
A = 900 , B = 300 , C = 300 , D = 100.
O      E      (O − E)²/E
882    900    0.360
313    300    0.563
287    300    0.563
118    100    3.240
Total         4.726
4. A set of 5 coins is tossed 3200 times and the number of heads appearing each time is noted.
The results are given below:
No of heads 0 1 2 3 4 5
Frequency 80 570 1100 900 500 50
Test the hypothesis that the coins are unbiased.
Soln :
H0 : The coins are unbiased.
Under this assumption the theoretical frequencies follow the binomial law and can be
obtained from $3200(p + q)^5$.
Probability of occurrence of a head in a single throw $= \dfrac{1}{2}$
Expected frequencies can be obtained from $3200\left(\dfrac{1}{2} + \dfrac{1}{2}\right)^5$
Theoretical frequencies are 100, 500, 1000, 1000, 500, 100 respectively.
O       E       (O − E)²/E
80      100     4
570     500     9.8
1100    1000    10
900     1000    10
500     500     0
50      100     25
Total           58.80
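The theoretical frequencies come from the terms of the binomial expansion, which the following sketch reproduces before computing the statistic. The variable names are illustrative.

```python
from math import comb

observed = [80, 570, 1100, 900, 500, 50]   # tosses showing 0..5 heads
coins, tosses, p = 5, 3200, 0.5

# Expected frequency for k heads: tosses * C(5, k) * p^k * (1 - p)^(5 - k)
expected = [tosses * comb(coins, k) * p ** k * (1 - p) ** (coins - k)
            for k in range(coins + 1)]

chi_square = sum((o - e) ** 2 / e for o, e in zip(observed, expected))

print(expected, round(chi_square, 2))
```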
O      E         (O − E)²/E
211    209.43    0.012
90     92.15     0.050
19     20.27
5      2.97      0.008 (last three classes pooled: O = 24, E = 23.57)
0      0.33
Total            0.07
6. The following table gives the classification of 100 workers according to sex and the nature
of work. Test whether nature of work is independent of the sex of the worker.
Skilled Unskilled
Males 40 20
Females 10 30
Soln :
H0 : Nature of work is independent of the sex of the worker.
Expected frequencies
Skilled Unskilled
Males 30 30
Females 20 20
O     E     (O − E)²/E
40    30    3.333
10    20    5
20    30    3.333
30    20    5
Total       16.666
7. From the adult male population of four large cities, random sample of sizes given below
are taken and the number of married and single men recorded. Do the data indicate any
significant variation among the cities in the tendency of men to marry?
City A B C D Total
Married 137 164 152 147 600
Single 32 57 56 35 180
Total 169 221 208 182 780
Soln :
H0 : There is no significant difference in the tendency for marriage in the 4 towns.
Expected values
City A B C D Total
Married 130 170 160 140 600
Single 39 51 48 42 180
Total 169 221 208 182 780
O      E      (O − E)²/E
137    130    0.377
164    170    0.212
152    160    0.400
147    140    0.350
32     39     1.256
57     51     0.706
56     48     1.333
35     42     1.167
Total         5.801
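The expected values in a contingency table are (row total × column total)/grand total, which the sketch below applies to the city data. The variable names are illustrative.

```python
observed = [
    [137, 164, 152, 147],   # married, cities A-D
    [32, 57, 56, 35],       # single, cities A-D
]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
grand_total = sum(row_totals)

# Expected frequency of each cell under independence
expected = [[r * c / grand_total for c in col_totals] for r in row_totals]

chi_square = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

# Degrees of freedom for an r x c table: (r - 1)(c - 1)
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected, round(chi_square, 3), degrees_of_freedom)
```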
If the size of the sample n>30 then that sample is called large sample.
Type I: Test of significance for single Mean
Type II: Test of significance for Difference of means
Type III: Test of significance for single proportion
Type IV: Test of significance for difference of proportions
TYPE V: Test of significance for difference of standard deviations.
Symbols for populations and samples:
Population size = N
Population mean = µ
Population std.deviation = σ
Population Proportion = P
Sample size = n
Sample mean = x
Sample std.deviation = s
Sample proportion = p
Table value:
Level of significance
Critical value
1% 5% 10%
(i) $Z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
where $\bar{x}$ is the sample mean,
$\mu$ is the population mean,
$\sigma$ is the population std.deviation,
n is the sample size.
(ii) $Z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}}$
where $\bar{x}$ is the sample mean,
$\mu$ is the population mean,
s is the sample std.deviation,
n is the sample size.
Null Hypothesis:
H0: $\mu = \mu_0$
Alternative Hypothesis:
H1: $\mu \neq \mu_0$
Problem 1
A random sample of 200 tins of coconut oil gave an average weight of 4.95
kgs with a standard deviation of 0.21 kg. Do we accept the hypothesis of net weight 5
kgs per tin at 1% level?
Solution:
Given
Sample size n = 200
Sample mean x = 4.95
Sample std.deviation s = 0.21
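Null hypothesis H0: $\mu = 5$
Calculated value:
$Z = \dfrac{\bar{x} - \mu}{s/\sqrt{n}} = \dfrac{4.95 - 5}{0.21/\sqrt{200}} = -3.37$, so |Z| = 3.37
Table value:
The table value of Z at 1% level of significance is 2.58
Conclusion:
Cal |Z| > Tab Z
Reject H0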
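The large-sample test for a single mean can be sketched as a small function. The function name z_single_mean is an assumption for illustration; the critical value 2.58 is the two-tailed 1% table value quoted in this problem.

```python
from math import sqrt

def z_single_mean(sample_mean, hypothesised_mean, std_dev, n):
    """Z = (x_bar - mu) / (s / sqrt(n)) for a large sample."""
    return (sample_mean - hypothesised_mean) / (std_dev / sqrt(n))

z = z_single_mean(sample_mean=4.95, hypothesised_mean=5.0, std_dev=0.21, n=200)

critical_value = 2.58            # two-tailed table value at the 1% level
reject_h0 = abs(z) > critical_value

print(round(z, 3), reject_h0)
```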
Problem 2:
A manufacturer of ball pens claims that a certain pen he manufactures has a
mean writing life of 400 pages with a standard deviation of 20 pages. A purchasing
agent selects a sample of 100 pens and puts them to the test. The mean writing life for
the sample was 390 pages. Should the purchasing agent reject the manufacturer's claim
at the 5% level? The table value of z at the 5% level is 1.96 for a two-tailed test and 1.64
approximately for a one-tailed test.
Solution
Given
Sample size n = 100
Population mean µ = 400
Population std.deviation σ = 20
Sample mean x = 390
Calculated value:
Z=5
Table value:
The table value of Z at 5% level of significance is 1.96
Conclusion:
Cal Z > Tab Z
Reject H0
Problem 3:
A sample of 900 members has a mean of 3.4 cm and an SD of 2.61 cm. Is the
sample from a large population with mean 3.25 cm and SD 2.61 cm? If the
population is normal and its mean is unknown, find the 95% confidence limits of the true
mean.
Solution
Given
Sample size n = 900
Population mean µ = 3.25
Population std.deviation σ = 2.61
Sample mean x = 3.4
Calculated value:
Z = 1.724
Table value:
The table value of Z at 5% level of significance is 1.96
Conclusion:
Cal Z < Tab Z
Accept H0
Type – II Test of significance for Difference of means
Consider two different normal populations with means $\mu_1$ and $\mu_2$ and std.
deviations $\sigma_1$ and $\sigma_2$ respectively. Let a sample of size n1 be drawn from the first
population and an independent sample of size n2 be drawn from the second population.
Let $\bar{x}_1$ be the mean of the first sample and $\bar{x}_2$ be the mean of the second
sample.
Formula:
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}}$
Where
$\bar{x}_1$ = mean of the first sample
$\bar{x}_2$ = mean of the second sample
$\sigma_1$ = std. deviation of the first population
$\sigma_2$ = std. deviation of the second population
n1 = first sample size
n2 = second sample size
Note 1:
If the samples have been drawn from two populations with a common
std.deviation, i.e. $\sigma_1 = \sigma_2 = \sigma$ (say),
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}}$
Note 2:
If the common std. deviation is not known,
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}}}$
where
$\bar{x}_1$ = mean of the first sample
$\bar{x}_2$ = mean of the second sample
s1 = std. deviation of the first sample
s2 = std. deviation of the second sample
n1 = first sample size
n2 = second sample size
Note 3:
Null hypothesis H0 : $\mu_1 = \mu_2$
Alternative hypothesis H1 : $\mu_1 \neq \mu_2$
Problem 1:
A simple sample of heights of 6400 Englishmen has a mean of 67.85 inches
and an SD of 2.56 inches, while a sample of heights of 1600 Australians has a mean of
68.55 inches and an SD of 2.52 inches. Do the data indicate that Australians are, on the
average, taller than Englishmen?
Solutions:
Given
first sample size n1 = 6400, second sample size n2 = 1600
mean of first sample $\bar{x}_1$ = 67.85, mean of 2nd sample $\bar{x}_2$ = 68.55
std. deviation of 1st population $\sigma_1$ = 2.56, std. deviation of 2nd population $\sigma_2$ = 2.52
Null hypothesis H0 : $\mu_1 = \mu_2$
Alternative hypothesis H1 : $\mu_1 \neq \mu_2$ (two tailed test)
The test Statistic :
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{67.85 - 68.55}{\sqrt{\dfrac{(2.56)^2}{6400} + \dfrac{(2.52)^2}{1600}}}$
$Z \approx -9.9$
Calculated value:
$|Z| \approx 9.9$
Table value:
Table value of Z at 5% of level of significance is 1.96
Conclusion:
Cal Z > tab Z
Reject H0
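The difference-of-means statistic can be sketched as a function of the two sample summaries. The function name z_two_means is illustrative; the sample standard deviations stand in for the population values, as in the solution.

```python
from math import sqrt

def z_two_means(mean1, sd1, n1, mean2, sd2, n2):
    """Z = (x1_bar - x2_bar) / sqrt(sd1^2 / n1 + sd2^2 / n2) for two large samples."""
    standard_error = sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error

z = z_two_means(mean1=67.85, sd1=2.56, n1=6400,
                mean2=68.55, sd2=2.52, n2=1600)

critical_value = 1.96            # two-tailed table value at the 5% level
reject_h0 = abs(z) > critical_value

print(round(z, 2), reject_h0)
```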
Problem 2:
The sales manager of a large company conducted a sample survey in states A
and B taking 400 samples in each case. The results were in the following table.
State A State B
Average sales Rs. 2,500 Rs. 2,200
S.D. Rs. 400 Rs. 550
Test whether the average sales are the same in the 2 states at the 1% level.
Solution:
Table value:
Table value of Z at 1% of level of significance is 2.58
Conclusion:
Cal Z > tab Z
Reject H0
Problem3:
A college conducts both day and night classes intended to be identical. A
sample of 100 day students yields examination results as x = 72.4, σ = 14.8, and a
sample of 200 night students as x = 73.9, σ = 17.9. Are the two means statistically
equal at 10% level?
Solution:
Given
first sample size n1 = 100, second sample size n2 = 200
mean of first sample $\bar{x}_1$ = 72.4, mean of 2nd sample $\bar{x}_2$ = 73.9
std. deviation of 1st population $\sigma_1$ = 14.8, std. deviation of 2nd population $\sigma_2$ = 17.9
Null hypothesis H0 : $\mu_1 = \mu_2$
Alternative hypothesis H1 : $\mu_1 \neq \mu_2$ (two tailed test)
The test Statistic :
$Z = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} = \dfrac{72.4 - 73.9}{\sqrt{\dfrac{(14.8)^2}{100} + \dfrac{(17.9)^2}{200}}}$
Z = −0.77
Calculated value:
|Z| = 0.77
Table value:
Table value of Z at 10% of level of significance is 1.645
Conclusion:
Cal Z < tab Z
Accept H0
Test of Significance(Small samples)
Test of significance based on t – distribution
Definition: t - Test
Consider a normal population with mean µ and s.d. σ. Let $x_1, x_2, \ldots, x_n$ be a random
sample of size n with mean $\bar{x}$ and standard deviation s. We know that $z = \dfrac{\bar{x} - \mu}{\sigma/\sqrt{n}}$.
Now let us define $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}}$. This follows Student's t-distribution with n − 1 degrees of freedom.
I. Test for the difference between the mean of a sample and that of a population
II. Test for the difference between the means of two samples
II A. Let $\bar{x}_1$ and $\bar{x}_2$ be the means of two independent samples of sizes $n_1$ and $n_2$ from a normal
population with mean µ and standard deviation σ. It is found that
$\dfrac{\bar{x}_1 - \bar{x}_2}{\sigma\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}} \sim N(0,1)$
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(\dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}\right)\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, which follows a t-distribution with degrees of freedom
$\vartheta = n_1 + n_2 - 2$
II B. When the sample sizes are equal, i.e. n1 = n2 = n, we have n pairs of values. Further, if we
assume that the n pairs are independent, then the test statistic t becomes
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\left(\dfrac{n(s_1^2 + s_2^2)}{2n - 2}\right)\left(\dfrac{2}{n}\right)}}$
$\therefore t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{s_1^2 + s_2^2}{n - 1}}}$ is a Student t-variate with degrees of freedom $\vartheta = 2n - 2$
II C. Suppose that the sample sizes are equal and the n pairs of values are not
independent.
The test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}}$, where $\bar{x}$ and s are the mean and standard deviation of the n differences and µ = 0, is used to test whether the mean of the differences is significantly different
from zero. In this case the degrees of freedom is n − 1.
Problems:
1. A sample of 10 house owners is drawn and the following values of their incomes are
obtained. Mean Rs 6000, standard deviation Rs 650. Test the hypothesis that the
average income of the house owners of the town is Rs 5500.
Soln :
Sample size n = 10
Sample mean 𝑥̅ = 6000.
Population mean 𝜇 = 5500
Standard deviation 𝜎 = s = 650.
H0 : 𝜇 = 5500.
H1: 𝜇 ≠ 5500 (Two Tailed test).
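Level of significance = 5%
$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$, where $S = \sqrt{\dfrac{n}{n-1}}\,s = \sqrt{\dfrac{10}{9}} \times 650 = 685.16$
Therefore
$t = \dfrac{6000 - 5500}{685.16/\sqrt{10}} = \dfrac{500}{216.67} = 2.308$
|t| = 2.308 > 2.262 (table value for 9 degrees of freedom at the 5% level).
H0 is rejected.
The average income of the house owners differs significantly from Rs 5500.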
2. A machinist is expected to make engine parts with axle diameter of 1.75 cm. A
random sample of 10 parts shows a mean diameter of 1.85 cm with S.D of 0.1 cm. On
the basis of this sample, would you say that the work of the machinist is inferior?
Sample size n = 10
Sample mean 𝑥̅ = 1.85.
Population mean 𝜇 = 1.75
Standard deviation s = 0.1.
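H0 : $\mu = 1.75$.
H1 : $\mu \neq 1.75$ (Two Tailed test).
Level of significance = 5%
$t = \dfrac{\bar{x} - \mu}{S/\sqrt{n}}$, where $S = \sqrt{\dfrac{n}{n-1}}\,s$, so that $S/\sqrt{n} = s/\sqrt{n-1}$
Therefore
$t = \dfrac{1.85 - 1.75}{0.1/\sqrt{9}} = 3$
|t| = 3 > 2.262.
H0 is rejected.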
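The small-sample statistic for a single mean can be sketched as below. The function name t_single_mean is an assumption; s is taken to be the sample standard deviation computed with divisor n, which is why the denominator uses n − 1.

```python
from math import sqrt

def t_single_mean(sample_mean, hypothesised_mean, s, n):
    """t = (x_bar - mu) / (s / sqrt(n - 1)), with s computed using divisor n."""
    return (sample_mean - hypothesised_mean) / (s / sqrt(n - 1))

t = t_single_mean(sample_mean=1.85, hypothesised_mean=1.75, s=0.1, n=10)

table_value = 2.262              # two-tailed 5% value for 9 degrees of freedom
reject_h0 = abs(t) > table_value

print(round(t, 3), reject_h0)
```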
3. Samples of two types of electric bulbs were tested for length of life and the following
data were obtained.
Size Mean S.D
Sample I 8 1234 hrs 36 hrs
Sample II 7 1036 hrs 40 hrs
Is the difference in the mean sufficient to warrant that type I bulbs are superior to
type II bulbs?
Soln :
Sample I size n1 = 8 mean 𝑥̅1 = 1234 hrs 𝑠1 = 36 hrs
Sample II size n2 = 7 mean 𝑥̅2 = 1036 hrs 𝑠2 = 40 hrs.
H0 : $\mu_1 = \mu_2$
H1 : $\mu_1 > \mu_2$ (Right tailed test)
Level of significance 5 %
$t = \dfrac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}}$, where $S^2 = \dfrac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2}$
t = 9.39
Degree of freedom v = 13
Table value = 1.77
|𝑡| = 9.39 > 1.77.
H0 is rejected.
Type one bulbs may be regarded superior to type II bulbs.
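The pooled two-sample statistic used for the bulbs can be sketched as follows. The function name t_two_means is illustrative; the sample standard deviations are taken to use divisor n, so the pooled variance is (n1·s1² + n2·s2²)/(n1 + n2 − 2).

```python
from math import sqrt

def t_two_means(mean1, s1, n1, mean2, s2, n2):
    """Pooled t statistic for two small independent samples."""
    pooled_var = (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2 - 2)
    return (mean1 - mean2) / sqrt(pooled_var * (1 / n1 + 1 / n2))

t = t_two_means(mean1=1234, s1=36, n1=8, mean2=1036, s2=40, n2=7)

degrees_of_freedom = 8 + 7 - 2
table_value = 1.77               # one-tailed 5% value for 13 degrees of freedom
reject_h0 = t > table_value

print(round(t, 2), degrees_of_freedom, reject_h0)
```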
Solution:
Under H0 (µ = 100), the test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}} \sim t_{n-1}$, where $\bar{x}$ and s can be calculated from the
sample data as $\bar{x}$ = 972/10 = 97.2 and
$s^2 = \dfrac{\sum (x_i - \bar{x})^2}{n} = \dfrac{1833.60}{10} = 183.36$
Hence s = 13.54
$\therefore |t| = \dfrac{|97.2 - 100|}{13.54/\sqrt{9}} = 0.62 < t_{0.05}$. Hence the difference is not significant at the 5% level, and H0
may be accepted at the 5% level. The data support the assumption of a population
mean of 100.
5. It was found that a machine produces pipes having a thickness of 0.50
mm. To determine whether the machine is in proper working order, a
sample of 10 pipes is chosen, for which the mean thickness is 0.53 mm
and the s.d. is 0.03 mm. Test the hypothesis that the machine is in proper
working order using a level of significance of (1) 0.05 (2) 0.01.
Solution :
Under H0 (µ = 0.50), the test statistic $t = \dfrac{\bar{x} - \mu}{s/\sqrt{n-1}} \sim t_{n-1}$, where $\bar{x}$ and s are taken from the
sample data:
$t = \dfrac{0.53 - 0.50}{0.03} \times \sqrt{9} = 3$
(i). The table value for v = 9 d.f. at the 5% level of significance is t0.05 = 2.26. Since |t| = 3 > 2.26, H0 is rejected at the 5% level.
(ii). The table value for v = 9 d.f. at the 1% level of significance is t0.01 = 3.25. Since |t| = 3 < 3.25, H0 is accepted at the 1% level.
Soldiers 1 2 3 4 5 6 7 8 9 10
Score 67 24 57 55 63 54 56 68 33 43
before(x)
Score 70 38 58 58 56 67 68 75 42 38
after(y)
Do the data indicate that the soldiers have benefited from the training?
Solution:
Here we are concerned with the same set of soldiers in the 2 competitions, and their scores
are related to each other because of the intensive training. We compute the difference in
their scores $z = y - x$ and calculate the mean $\bar{z}$ and the s.d. of z as follows:
x     y     z = y − x   z − z̄    (z − z̄)²
67    70    3           -2       4
24    38    14          9        81
57    58    1           -4       16
55    58    3           -2       4
63    56    -7          -12      144
54    67    13          8        64
56    68    12          7        49
68    75    7           2        4
33    42    9           4        16
43    38    -5          -10      100
Total       50                   482
Given n = 10,
Under H0, the test statistic $t = \dfrac{\bar{z} - \mu}{s/\sqrt{n-1}} \sim t_{n-1}$ with µ = 0, where $\bar{z}$ and s can be calculated from
the sample data
as $\bar{z}$ = 50/10 = 5 and
$s^2 = \dfrac{\sum (z_i - \bar{z})^2}{n} = \dfrac{482}{10} = 48.2$
Hence s = 6.94 and $t = \dfrac{5}{6.94/\sqrt{9}} = 2.16 < t_{0.05} = 2.262$ for 9 degrees of freedom.
Hence the null hypothesis is accepted. We can conclude that there is
no significant improvement due to the training.
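The paired comparison can be sketched from the two score lists. The variable names are illustrative; the variance of the differences uses divisor n, so the statistic divides by sqrt(n − 1), and 2.262 is assumed as the two-tailed 5% table value for 9 degrees of freedom.

```python
from math import sqrt

before = [67, 24, 57, 55, 63, 54, 56, 68, 33, 43]
after = [70, 38, 58, 58, 56, 67, 68, 75, 42, 38]

# Differences z = y - x for each soldier
diffs = [y - x for x, y in zip(before, after)]
n = len(diffs)

mean_diff = sum(diffs) / n
var_diff = sum((d - mean_diff) ** 2 for d in diffs) / n   # divisor n

# Under H0 the mean difference is zero: t = z_bar / (s / sqrt(n - 1))
t_stat = mean_diff / (sqrt(var_diff) / sqrt(n - 1))

table_value = 2.262
accept_h0 = abs(t_stat) < table_value

print(sum(diffs), round(var_diff, 1), round(t_stat, 2), accept_h0)
```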
F – Test
Definition: F – Test
Let $\bar{X} = \dfrac{1}{n}\sum_{i=1}^{n} X_i$ and $\bar{Y} = \dfrac{1}{m}\sum_{i=1}^{m} Y_i$ be the sample means. Let $S_X^2 = \dfrac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2$
and $S_Y^2 = \dfrac{1}{m-1}\sum_{i=1}^{m}(Y_i - \bar{Y})^2$ be the sample variances. Then the test statistic
$F = \dfrac{S_X^2}{S_Y^2}$ has an F-distribution with n − 1 and m − 1 degrees of freedom.
Important properties:
Uses:
F- distribution is used to test the equality of the variance of the populations from
which two small samples have been drawn.
Assumptions of F – Test:
1. In one sample of 10 observations, the sum of the squares of the deviation of the
sample values from the sample mean was 120 and in the other sample of 12
observations it was 314. Test whether this difference is significant at 5 % level of
significance.
Soln :
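n1 = 10, n2 = 12
$\sum (X_1 - \bar{X}_1)^2 = 120$, $\sum (X_2 - \bar{X}_2)^2 = 314$
$S_1^2 = \dfrac{\sum (X_1 - \bar{X}_1)^2}{n_1 - 1} = \dfrac{120}{9} = 13.33$
$S_2^2 = \dfrac{\sum (X_2 - \bar{X}_2)^2}{n_2 - 1} = \dfrac{314}{11} = 28.55$
H0: $\sigma_1^2 = \sigma_2^2$
$F = \dfrac{S_2^2}{S_1^2} = \dfrac{28.55}{13.33} = 2.14$ (the larger variance is placed in the numerator)
F = 2.14 < 3.10
H0 is accepted.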
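The variance-ratio computation can be sketched as below. The variable names are illustrative; following the solution, the larger variance estimate goes in the numerator, and 3.10 is the 5% table value quoted above.

```python
n1, n2 = 10, 12
sum_sq_dev_1 = 120   # sum of squared deviations from the mean, sample 1
sum_sq_dev_2 = 314   # sum of squared deviations from the mean, sample 2

# Unbiased variance estimates: divide by (sample size - 1)
var_1 = sum_sq_dev_1 / (n1 - 1)
var_2 = sum_sq_dev_2 / (n2 - 1)

# F is the ratio of the larger estimate to the smaller one
f_stat = max(var_1, var_2) / min(var_1, var_2)

table_value = 3.10
accept_h0 = f_stat < table_value

print(round(var_1, 2), round(var_2, 2), round(f_stat, 2), accept_h0)
```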
2. Two independent samples of sizes 9 and 7 from a normal population had the
following values of the variables
Sample I : 18 13 12 15 12 14 16 14 15
Sample II :16 19 13 16 18 13 15
Do the estimates of the population variance differ significantly at 5% level.
Soln :
n1 = 9 and n2 = 7
$S_1^2 = \dfrac{n_1 s_1^2}{n_1 - 1} = 3.751$
$S_2^2 = \dfrac{n_2 s_2^2}{n_2 - 1} = 5.2376$
H0: $\sigma_1^2 = \sigma_2^2$
H1 : $\sigma_1^2 \neq \sigma_2^2$ (Two tailed test)
$F = \dfrac{S_2^2}{S_1^2} = 1.3963$
F = 1.3963 < 3.58
H0 is accepted.
3. In comparing the variability of family income in two areas, a survey yielded the
following data,
Sample I size n1 = 100 𝑠1 2 = 25
Sample II size n2 = 110 𝑠2 2 = 10.
Assuming that the populations are normal, test the hypothesis H0: 𝜎1 2 = 𝜎2 2
and H1 : 𝜎1 2 > 𝜎2 2 at 5% level of significance.
Soln :
Sample I size n1 = 100 𝑠1 2 = 25
Sample II size n2 = 110 𝑠2 2 = 10.
H0: 𝜎1 2 = 𝜎2 2
H1 : $\sigma_1^2 > \sigma_2^2$ (right tailed test)
$F = \dfrac{s_1^2}{s_2^2} = \dfrac{25}{10} = 2.5$
F = 2.5 > 1.38
H0 is rejected.