UNIT-II

PROBABILITY AND DISTRIBUTION

Introduction

Probability Model:- A probability model is a convenient way to describe the distribution of the outcomes of an experiment. It consists of all possible outcomes of an experiment and their corresponding probabilities.

Statistical model:- A statistical model is a collection of probability distributions on the set of all possible outcomes of an experiment.

Statistical models are:

(i) Linear Regression

(ii) Classification

(iii) Re-sampling

(iv) Non-linear methods

When we perform experiments in science and engineering repeatedly under identical conditions, we get almost the same result. There also exist experiments in which the outcome may be different even if the experiment is performed under identical conditions; in such experiments, the outcome depends on chance.

Also a random experiment is defined as an experiment in which all the possible outcomes are
known in advance and no personal bias is exercised.

Throwing an unbiased coin is a random experiment, as either of the two faces, head or tail, may come up. Similarly, throwing an unbiased die is a random experiment, as any of the six faces (1, 2, 3, 4, 5 or 6) may come up.

Below are certain terms which will be used frequently.

Trial: Performing of an experiment is called trial.

Cases: Various possible outcomes of a trial are termed as cases.


Event: It is used to represent the aim with which the experiment is performed.

Sample space: It is the set of all possible outcomes of an experiment.

Event is a subset of sample space and cases are its members i.e. subsets consisting of single
members.

Equally likely cases (Events): Cases are said to be equally likely when there is no reason to expect any one of them in preference to the others.

Mutually exclusive events: Two events are said to be mutually exclusive events when the
occurrence of one of them, stops the occurrence of the other.

Independent events: Two events are said to be independent events if occurrence of one event
does not affect the occurrence of other.

Exhaustive Cases: A set of cases is said to be exhaustive if it includes all possible outcomes of a
trial.

Favourable cases: The cases which entail the happening of an event are said to be favourable
to an event.

Odds in favour of or against the trial: If an experiment can succeed in 𝑚 ways and fail in 𝑛
ways, each of these ways being equally likely, then the odds are 𝑚 to 𝑛 in favour or 𝑛 to 𝑚
against the trial.
Odds in favour of an event = (Probability of happening)/(Probability of non-happening)

Odds against an event = (Probability of non-happening)/(Probability of happening)

Definition:

Let an event A happen in m ways and fail in n ways, where all ways are equally likely to occur. Then the probability of the happening of event A is defined as

P(A) = (Number of favourable cases)/(Total number of mutually exclusive and equally likely cases) = m/(m + n) = p (say),

while that of its failing is defined as P(not A), i.e.

P(A-bar) = n/(m + n) = q (say).

Thus P(A) + P(A-bar) = p + q = m/(m + n) + n/(m + n) = 1.

From this it is noted that P(A) = p satisfies 0 ≤ p ≤ 1. A-bar is called the complementary event, and its probability q also satisfies 0 ≤ q ≤ 1.
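As an illustration of these formulas, here is a minimal sketch in Python (the counts m = 3 and n = 2 are hypothetical):

```python
from fractions import Fraction

m, n = 3, 2                      # hypothetical: event A can happen in 3 ways, fail in 2

p = Fraction(m, m + n)           # P(A) = m/(m+n)
q = Fraction(n, m + n)           # P(A-bar) = n/(m+n)

print(p, q, p + q)               # 3/5 2/5 1  (p + q = 1 always)
print(f"odds in favour = {m}:{n}, odds against = {n}:{m}")
```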

Conditional probability: The probability of an event A, when event B has already occurred is
known as conditional probability of event A and denoted as P(A/B). This is in the case when
events A and B are dependent.

Example1. The chance of an event happening is the square of the chance of a second event but
the odds against the first are the cube of the odds against the second. Find the chance of each.

Solution: Let p and p′ be the chances of happening of the two events; then p = p′^2.

Odds against the first event = (1 − p)/p; odds against the second event = (1 − p′)/p′.

Therefore (1 − p)/p = ((1 − p′)/p′)^3, which implies (1 − p′^2)/p′^2 = (1 − p′)^3/p′^3, which gives p′ = 1/3, so p = 1/9.

Example 2. In an experiment there are n outcomes w1, w2, …, wn, and outcome w(j+1) is twice as likely as wj (j = 1, 2, …, n − 1). Find P(Ak) where Ak = {w1, w2, …, wk}.

Solution: Let P(wj) = pj, j = 1, 2, …, n.

Then p(j+1) = 2pj, therefore

p2 = 2p1, p3 = 2p2 = 2·2p1 = 2^2 p1, p4 = 2p3 = 2^3 p1, …, pn = 2^(n−1) p1.

Since w1, w2, …, wn are exhaustive,

p1 + p2 + ⋯ + pn = 1

p1(1 + 2 + 2^2 + ⋯ + 2^(n−1)) = 1, which implies p1 (2^n − 1)/(2 − 1) = 1, so p1 = 1/(2^n − 1). Therefore

P(Ak) = p1 + p2 + ⋯ + pk = p1(1 + 2 + 2^2 + ⋯ + 2^(k−1)) = p1(2^k − 1) = (2^k − 1)/(2^n − 1).

Example 3. The sum of two positive quantities is equal to 2n. Find the chance that their product is not less than 3/4 times their greatest product.

Solution: Let x be one quantity; then the other quantity is (2n − x).

Let the product be y = x(2n − x), so dy/dx = 2n − 2x.

For a maximum or minimum, putting dy/dx = 0 gives 2n − 2x = 0, which implies x = n. Therefore the greatest product is n^2.

According to the given condition, x must be such that

x(2n − x) ≥ (3/4)n^2, which implies 3n^2 − 8nx + 4x^2 ≤ 0, i.e. (3n − 2x)(n − 2x) ≤ 0,

which implies n/2 ≤ x ≤ 3n/2.

Therefore, number of favourable cases = 3n/2 − n/2 = n, and total number of cases = 2n. Therefore,

Required probability = n/2n = 1/2.

There are two important theorems of probability, namely

1. The addition theorem or the theorem on total probability.

2. The multiplication theorem or theorem on compound probability.

Theorem of total probability:

Statement: It states that the probability of the happening of any one of the several mutually
exclusive events is the sum of the probabilities of the happening of separate events i.e.

𝑃(𝐴1 + 𝐴2 + ⋯ + 𝐴𝑛 ) = 𝑃(𝐴1 ) + 𝑃(𝐴2 ) + ⋯ + 𝑃(𝐴𝑛 ) OR

𝑃(𝐴1 𝑜𝑟 𝐴2 𝑜𝑟 … 𝑜𝑟 𝐴𝑛 ) = 𝑃(𝐴1 ) + 𝑃(𝐴2 ) + ⋯ + 𝑃(𝐴𝑛 ) , where 𝐴1 , 𝐴2 , … , 𝐴𝑛 are mutually


exclusive events.

Proof: Let 𝐴1 , 𝐴2 , … , 𝐴𝑛 be 𝑛 mutually exclusive events. Then we are to show that

𝑃(𝐴1 + 𝐴2 + ⋯ + 𝐴𝑛 ) = 𝑃(𝐴1 ) + 𝑃(𝐴2 ) + ⋯ + 𝑃(𝐴𝑛 )

Let 𝑁 be the number of cases which are equally likely, mutually exclusive and exhaustive. Out
of these let

Number of cases favourable to 𝐴1 =𝑚1


Number of cases favourable to 𝐴2 =𝑚2

……………………………………………………………….

……………………………………………………………….

Number of cases favourable to 𝐴𝑛 =𝑚𝑛

Since A1, A2, …, An are mutually exclusive, the cases m1, m2, …, mn are quite distinct and non-overlapping.

Therefore, number of cases which are favourable to (𝐴1 + 𝐴2 + ⋯ + 𝐴𝑛 ) ( i.e. occurrence of


any of the events 𝐴1 , 𝐴2 , … , 𝐴𝑛 ) = 𝑚1 + 𝑚2 + ⋯ + 𝑚𝑛
Therefore, P(A1 + A2 + ⋯ + An) = (m1 + m2 + ⋯ + mn)/N = m1/N + m2/N + ⋯ + mn/N

P(A1 + A2 + ⋯ + An) = P(A1) + P(A2) + ⋯ + P(An).

Note:- 1. When two events 𝐴 and 𝐵 are not mutually exclusive, then there will be some
outcomes, or cases which favour both 𝐴 and 𝐵 together and suppose this happens in 𝑚𝑘 ways

( this is included in both 𝑚1 and 𝑚2 favourable to both 𝐴 and 𝐵 respectively ). Thus the total
number of cases favouring either 𝐴 or 𝐵 or both is 𝑚1 + 𝑚2 − 𝑚𝑘 . Hence the probability of
occurrence of 𝐴 or 𝐵 or both is given by
P(A + B) = (m1 + m2 − mk)/n = m1/n + m2/n − mk/n = P(A) + P(B) − P(AB)

Where 𝑃(𝐴𝐵) represent the probability of both 𝐴 and 𝐵 happening together. It may be noted
that when 𝐴 and 𝐵 are mutually exclusive then 𝑃(𝐴𝐵) = 0.

2. When three events 𝐴, 𝐵 and 𝐶 are non mutually exclusive events, then

𝑃(𝐴 + 𝐵 + 𝐶) = 𝑃(𝐴) + 𝑃(𝐵) + 𝑃(𝐶) − 𝑃(𝐴𝐵) − 𝑃(𝐵𝐶) − 𝑃(𝐶𝐴) + 𝑃(𝐴𝐵𝐶).
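Both forms of the addition rule can be verified by brute-force enumeration over equally likely cases. A small sketch in Python (the two events chosen here, "first die shows 6" and "sum is 10", are illustrative assumptions):

```python
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))   # 36 equally likely cases
N = len(outcomes)

A = {o for o in outcomes if o[0] == 6}            # first die shows 6
B = {o for o in outcomes if sum(o) == 10}         # sum of the two dice is 10

def P(event):
    return len(event) / N

# P(A + B) = P(A) + P(B) - P(AB)
lhs = P(A | B)
rhs = P(A) + P(B) - P(A & B)
print(lhs, rhs)                                    # both 8/36 = 0.2222...
```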

Theorem on multiplication of probabilities or Theorem of compound probability:

Statement: The probability of the occurrence of two independent events is the product of their
separate probabilities i.e. 𝑃(𝐴𝐵) = 𝑃(𝐴). 𝑃(𝐵).

Proof: Let the two independent events be 𝐴 and 𝐵. Let the event 𝐴 succeed in 𝑚1 ways and fail
in 𝑛1 ways, and the event 𝐵 succeed in 𝑚2 ways and fail in 𝑛2 ways, all the ways in both the
events 𝐴 and 𝐵 being equally likely. Now there are (𝑚1 + 𝑛1 ) ways in event 𝐴 and (𝑚2 + 𝑛2 )
ways in event 𝐵. Each of the (𝑚1 + 𝑛1 ) ways can be associated with each of (𝑚2 + 𝑛2 ) ways.
Thus in their simultaneous happening, the total number of ways are (𝑚1 + 𝑛1 ) × (𝑚2 + 𝑛2 ).
Out of these total number of ways, we have (𝑚1 . 𝑚2 ) ways in which both the events succeed.

Hence the probability that both the events succeed is

m1 m2 / [(m1 + n1)(m2 + n2)] = m1/(m1 + n1) × m2/(m2 + n2) = product of the probabilities of success of the two events.

OR 𝑃(𝐴𝐵) = 𝑃(𝐴). 𝑃(𝐵).

This is also true for any number of events 𝐴, 𝐵, 𝐶, 𝐷, ….

Thus, 𝑃(𝐴𝐵𝐶𝐷 … . ) = 𝑃(𝐴). 𝑃(𝐵). 𝑃(𝐶)𝑃(𝐷) ….

Note:- If two events 𝐴 and 𝐵 are dependent, then 𝑃(𝐴 ∩ 𝐵) = 𝑃(𝐴). 𝑃(𝐵/A), where 𝑃(𝐵/A) is
known as conditional probability of event 𝐵 when event 𝐴 has already occurred.

Repeated trials

If the probability that an event happens in a single trial is p and the probability that it fails is q, then p + q = 1. We would like to know the probability of its happening exactly r times in n trials.

By multiplication theorem, ( as the trials are independent ) probability for exactly 𝑟 consecutive
successes followed by (𝑛 − 𝑟) failures will be

(𝑝 × 𝑝 × 𝑝 × … 𝑟 𝑡𝑖𝑚𝑒𝑠)(𝑞 × 𝑞 × 𝑞 × … (𝑛 − 𝑟)𝑡𝑖𝑚𝑒𝑠) = 𝑝𝑟 𝑞 𝑛−𝑟 .

Obviously the probability for 𝑟 successes and (𝑛 − 𝑟) failures remain the same as above in
whatever order events happen. Now the total probability for 𝑟 successes and (𝑛 − 𝑟) failures,
irrespective of the order of their occurrence can be found by finding the total number of
possible ways in which this particular event can happen. The total number of possible ways is simply the number of permutations of n things taken all at a time, of which r are alike (each equal to p) and (n − r) are alike (each equal to q).

Therefore, possible number of ways = n!/(r!(n − r)!)

Hence the number of times p^r q^(n−r) is to be added is n!/(r!(n − r)!).

The probability for r successes and (n − r) failures is therefore given by

[n!/(r!(n − r)!)] p^r q^(n−r) = nCr p^r q^(n−r).

By putting 𝑟 = 0,1,2, … , 𝑛 we get probabilities for 0,1,2, … , 𝑛 successes respectively.


Cor.1. The probability that the event happens at least 𝑟 times in 𝑛 trials is

𝑛𝐶𝑟 𝑝𝑟 𝑞 𝑛−𝑟 + 𝑛𝐶𝑟+1 𝑝𝑟+1 𝑞 𝑛−𝑟−1 + 𝑛𝐶𝑟+2 𝑝𝑟+2 𝑞 𝑛−𝑟−2 + ⋯ + 𝑛𝐶𝑛 𝑝𝑛 𝑞 𝑛−𝑛 , since the event can
happen 𝑟, 𝑟 + 1, 𝑟 + 2, … , 𝑛 ways.

Cor.2. The probability that the event happens at least once in 𝑛 trials is (1 − 𝑞 𝑛 ) = 1 −
probability of zero success.

Since the total probability is 1, and the probability of zero success is 𝑛𝐶0 𝑝0 𝑞 𝑛−0 = 𝑞 𝑛 .
Subtracting 𝑞 𝑛 from 1 we get the probability for at least one success.
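These results translate directly into code; a minimal sketch in Python of the exact, at-least-r and at-least-one probabilities:

```python
from math import comb

def p_exact(n, r, p):
    """P(exactly r successes in n trials) = nCr p^r q^(n-r)."""
    return comb(n, r) * p**r * (1 - p)**(n - r)

def p_at_least(n, r, p):
    """Cor. 1: add the exact probabilities for r, r+1, ..., n successes."""
    return sum(p_exact(n, k, p) for k in range(r, n + 1))

def p_at_least_one(n, p):
    """Cor. 2: 1 - q^n."""
    return 1 - (1 - p)**n

# numbers from Example 1 below: a die thrown 8 times, success = a 6 showing
print(p_exact(8, 7, 1/6))        # 40/6^8
print(p_at_least(8, 7, 1/6))     # 41/6^8
print(p_at_least_one(8, 1/6))    # 1 - (5/6)^8
```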

Example1. A die is thrown 8 times and it is required to find the probability that 6 will show

(i) exactly 7 times, (ii) at least 7 times, (iii) at least once.

Solution: The chance that a 6 shows in a single throw is 1/6, and the chance that it fails is 1 − 1/6 = 5/6.

(i) The chance of exactly 7 successes in 8 trials is 8C7 p^7 q^(8−7) = 8C7 (1/6)^7 (5/6) = 40/6^8.

(ii) The chance of at least 7 successes, where r denotes the number of successes, is P(r = 7) + P(r = 8) = 8C7 (1/6)^7 (5/6) + 8C8 (1/6)^8 (5/6)^0 = 41/6^8.

(iii) The chance of at least one success = 1 − (5/6)^8.

Example2. In a given race, the odds in favour of four horses 𝐴, 𝐵, 𝐶, 𝐷 are 1:3, 1:4, 1:5, 1:6
respectively. Assuming that a dead heat is impossible; find the chance that one of them wins
the race.

Solution: Let 𝑝1 , 𝑝2 , 𝑝3 , 𝑝4 be the probabilities of winning of the horses 𝐴, 𝐵, 𝐶, 𝐷 respectively.

Since a dead heat (in which all four horses cover the same distance in the same time) is not possible, the events are mutually exclusive.

Odds in favour of A are 1:3, therefore p1 = 1/(1 + 3) = 1/4.

Similarly p2 = 1/(1 + 4) = 1/5, p3 = 1/(1 + 5) = 1/6, p4 = 1/(1 + 6) = 1/7.

Then, the probability that one of them wins is p = p1 + p2 + p3 + p4 = 1/4 + 1/5 + 1/6 + 1/7 = 319/420.
Example3. A committee of 12 students consists of 3 representatives from first year, 4 from
second year and 5 from third year classes. Out of 12 members three are to be removed by
drawing lots. What is the chance that

(i) the three students belong to different classes.

(ii) two belong to one class and the third to the different class.

(iii) the three belong to the Same class.

Solution: The total number of ways of choosing 3 students out of 12 is 12𝐶3 = 220.

(i) The number of ways of choosing one student from each of the three groups is 3C1 · 4C1 · 5C1 = 3·4·5 = 60.

Therefore, required probability = 60/220 = 3/11.

(ii) 2 from first year and one from the others: 3C2 · 9C1 = 3·9 = 27

2 from second year and one from the others: 4C2 · 8C1 = 6·8 = 48

2 from third year and one from the others: 5C2 · 7C1 = 10·7 = 70

Therefore, total number of ways = 27 + 48 + 70 = 145

The required probability = 145/220 = 29/44.

(iii) Three from first year: 3C3 = 1; three from second year: 4C3 = 4; three from third year: 5C3 = 10.

Total number of ways = 1 + 4 + 10 = 15.

The required probability = 15/220 = 3/44.

Example 4. An apparatus contains 6 electronic tubes. It will not work unless all tubes are
working. If the probability of failure of each tube is 0.05, what is the probability of failure of
apparatus?

Solution: Probability of failure of tube = 0.05.

Therefore, the probability that tube works = 0.95.

Probability that apparatus works = Probability that all 6 tubes work =(0.95)6

Hence probability that apparatus fails = 1 − (0.95)6 = 1 − 0.73509 = 0.26491.


Bayes theorem

Statement: If an event 𝐸, can only occur in combination with one of the mutually exclusive
events 𝐸1 , 𝐸2 , … , 𝐸𝑛 , then
P(Ek/E) = P(Ek) P(E/Ek) / Σ_{i=1}^{n} P(Ei) P(E/Ei) , k = 1, 2, …, n

Proof: Since the event 𝐸 can occur only with the events 𝐸1 , 𝐸2 , … , 𝐸𝑛 , the possible forms in
which 𝐸 can occur are

𝐸𝐸1 , 𝐸𝐸2 , … , 𝐸𝐸𝑛

These forms are mutually exclusive as the events E1, E2, …, En are mutually exclusive.

Therefore, by total probability theorem

𝑃(𝐸) = 𝑃(𝐸𝐸1 ) + 𝑃( 𝐸𝐸2 ) + ⋯ + 𝑃(𝐸𝐸𝑛 ) = ∑𝑛𝑖=1 𝑃(𝐸𝐸𝑖 ) = ∑𝑛𝑖=1 𝑃(𝐸𝑖 )𝑃(𝐸/𝐸𝑖 )

On using compound probability theorem

𝑃(𝐸𝐸𝑘 ) = 𝑃(𝐸)𝑃(𝐸𝑘 /𝐸) = 𝑃(𝐸𝑘 )𝑃(𝐸/𝐸𝑘 )


Therefore, P(Ek/E) = P(Ek) P(E/Ek) / P(E) = P(Ek) P(E/Ek) / Σ_{i=1}^{n} P(Ei) P(E/Ei) , k = 1, 2, 3, …, n.

Example1. In a bolt factory, machines 𝐴, 𝐵 and 𝐶 manufacture respectively 25%, 35% and 40%
of the total. Of their output 5%, 4% and 2% are defective bolts. A bolt is drawn at random from
the product and is found to be defective. What is the probability that it was manufactured by
machine 𝐵?

Solution: Let 𝐸1 , 𝐸2 and 𝐸3 denote the event that a bolt at random is manufacture by the
machines 𝐴, 𝐵 and 𝐶 respectively and let 𝐸 denote the event of its being defective. Then
P(E1) = 25% = 1/4 , P(E2) = 35% = 7/20 , P(E3) = 40% = 2/5

The probability of drawing a defective bolt manufactured by machine A is P(E/E1) = 5% = 1/20.

Similarly, P(E/E2) = 4% = 1/25 , P(E/E3) = 2% = 1/50

Therefore, by Bayes' theorem,

P(E2/E) = P(E2)P(E/E2) / [P(E1)P(E/E1) + P(E2)P(E/E2) + P(E3)P(E/E3)]

= (7/20 × 1/25) / (1/4 × 1/20 + 7/20 × 1/25 + 2/5 × 1/50) = 0.41.
Example2. The contents of urns I, II and III are as follows:

1 white, 2 black and 3 red balls; 2 white, 1 black and 1 red balls and 4 white, 5 black and 3 red
balls. One urn is chosen at random and two balls drawn. They happen to be white and red.
What is the probability that they come from urns I, II and III?

Solution: Let 𝐸1 : urn I is chosen; 𝐸2 : urn II is chosen; 𝐸3 : urn III is chosen and

𝐴 : two balls are white and red

We have to find 𝑃(𝐸1 /𝐴), 𝑃(𝐸2 /𝐴) and 𝑃(𝐸3 /𝐴).


Now P(E1) = P(E2) = P(E3) = 1/3

P(A/E1) = P(a white and a red ball are drawn from urn I) = (1C1 × 3C1)/6C2 = 3/15 = 1/5

P(A/E2) = (2C1 × 1C1)/4C2 = 2/6 = 1/3 ; P(A/E3) = (4C1 × 3C1)/12C2 = 12/66 = 2/11

Therefore, by Bayes' theorem,

P(E1/A) = P(E1)P(A/E1) / [P(E1)P(A/E1) + P(E2)P(A/E2) + P(E3)P(A/E3)] = 33/118

Similarly, P(E2/A) = 55/118 , P(E3/A) = 15/59.

Example3. 𝐴 and 𝐵 take turns in throwing two dice, the first to throw 10 being awarded the
prize, show that if 𝐴 has the first throw, their chances of winning are in the ratio 12:11.

Solution: The number 10 can be thrown in three ways: (6,4), (4,6), (5,5). Therefore, the probability of throwing 10 is p = 3/36 = 1/12.

Therefore, the probability of failure is q = 1 − p = 1 − 1/12 = 11/12.

If 𝐴 is to win, he should throw 10 in either the first, the third, the fifth,…throws.

The respective probabilities are 1/12 ; (11/12)^2 × 1/12 ; (11/12)^4 × 1/12 ; …

Therefore, A's total chance of winning = 1/12 + (11/12)^2 × 1/12 + (11/12)^4 × 1/12 + ⋯

= (1/12) / (1 − (11/12)^2) = 12/23 (using the sum of an infinite G.P.)

B can win on the second, fourth, sixth, … throws.

Similarly, B's total chance of winning = (11/12) × 1/12 + (11/12)^3 × 1/12 + (11/12)^5 × 1/12 + ⋯

= (11/12 × 1/12) / (1 − (11/12)^2) = 11/23.

Hence A's chance : B's chance = 12/23 : 11/23 = 12 : 11.

RANDOM VARIABLE AND PROBABILITY DISTRIBUTION

If an experiment is conducted under identical conditions, values so obtained may not be similar.
Observations are always taken about a factor or character under study, which can take different
values. This factor or character is termed as variable. The observations may be the number of
certain objects or items or their measurements. These observations vary even though the
experiment is conducted under identical conditions. Hence we have a set of outcomes of a
random experiment. A rule that assigns a real number to each outcome is called a random
variable. The rule is nothing but a function of the variable, say, 𝑋 that assigns a unique value to
each outcome of the random experiment. It is clear that there is a value for each outcome,
which it takes with certain probability. Thus when a variable 𝑋 takes the value 𝑥𝑖 with
probability 𝑝𝑖 (𝑖 = 1,2,3, … , 𝑛), then 𝑋 is called random variable or stochastic variable or a
variate.

Random variables are of two types

(i) Discrete random variable

(ii) Continuous random variable

A random variable X which can take only a finite number of values in an interval of the domain is called a discrete random variable. For example, if we throw a pair of dice and note the sum which turns up, the sum must be an integer between 2 and 12; thus this discrete random variable takes the finite set of values 2, 3, …, 12. On the other hand, a random variable X which can take every value in the domain, i.e. whose range R is an interval, is called a continuous random variable. In this case the random variable can take infinitely many values in the domain or interval. For a continuous random variable, probability is assigned to the interval (x − dx/2, x + dx/2), not to the single point X = x.

Note that the probability of any single 𝑥, a value of 𝑋, is zero i.e. 𝑃(𝑋 = 𝑥) = 0.
For example, the height of students in a country lies between 100 cms and 200 cms. The
continuous random variable:

𝑋(𝑥) = {𝑥 ∶ 100 ≤ 𝑥 ≤ 200}

Another example: the maximum life of electric bulbs is 2000 hours. The continuous random variable:

𝑋(𝑥) = {𝑥 ∶ 0 ≤ 𝑥 ≤ 2000}

The values 𝑥1 , 𝑥2 , … , 𝑥𝑛 of the discrete random variable 𝑋 with their respective probabilities
𝑝1 , 𝑝2 , … , 𝑝𝑛 constitute a probability distribution which is called discrete probability distribution
of the discrete random variable 𝑋. It may be noted that 𝑝1 + 𝑝2 + … + 𝑝𝑛 = 1.

In a throw of a pair of dice the sum (𝑋) is discrete random variable which is an integer between
2 and 12 with the probabilities 𝑃(𝑋) given as:

X = xi : 2     3     4     5     6     7     8     9     10    11    12
P(xi)  : 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

This constitutes a discrete probability distribution.

P(X ≤ x) = Σ_{xi ≤ x} p(xi), where x is an integer, is defined as the distribution function or cumulative distribution function and is denoted by F(x).

In case of continuous random variable, the variate 𝑋 can have infinite values, and it is not
possible to have a finite probability associated with each possible point or value of the variate
and yet to have the sum of these probabilities equal to unity. In this case we associate
probabilities with intervals. Let the probability of the continuous random variable X falling in the interval (x − dx/2, x + dx/2) be given by f(x)dx, where f(x) is a continuous function of x and is called the probability density function. The curve y = f(x) is called the probability density curve.

Therefore, the probability of a continuous random variable X lying in a given small interval is

P(x − dx/2 ≤ X ≤ x + dx/2) = f(x)dx

The probability that the variate X falls in the interval (a, b) is given by

P(a ≤ X ≤ b) = ∫_a^b f(x)dx

Since the total probability is unity,

∫_{−∞}^{∞} f(x)dx = 1.

If X lies only inside the interval (α, β), then ∫_α^β f(x)dx = 1.

Here the range (α, β) of x may be finite or infinite.

Note:-

1. P(a ≤ X ≤ b) = ∫_a^b f(x)dx = area between the curve y = f(x), the x-axis and the ordinates x = a and x = b, as shown in the figure.

2. P(−∞ ≤ X ≤ ∞) = ∫_{−∞}^{∞} f(x)dx = area between the curve y = f(x), the x-axis and the ordinates x = −∞ and x = ∞, which is unity.

3. P(X ≤ x) = ∫_{−∞}^{x} f(x)dx = F(x) (say), where F(x) is called the probability distribution function or cumulative distribution function. Thus F(x) gives the probability of the variable X taking values up to x.

Also P(X = −∞) = 0, F(∞) = 1, and P(a ≤ X ≤ b) = F(b) − F(a).

OR

If, corresponding to the exhaustive and mutually exclusive cases that may arise from an experiment, a variate x assumes n values xi (i = 1, 2, …, n) with probabilities pi (i = 1, 2, …, n), then the assemblage of the values xi with their probabilities pi defines the probability distribution function of the variate x. Since the number x is associated with the outcome of a random experiment, it is called a random variable or stochastic variable or, more commonly, a variate. Most of the concepts discussed for frequency distributions apply equally well to distribution functions. Thus the mean value x-bar of the discrete distribution is given by

x-bar = Σ pi xi / Σ pi = Σ pi xi , because for all the mutually exclusive and exhaustive cases, Σ pi = 1.

OR

If a real variable 𝑋 be associated with the outcome of a random experiment, then since the
values which 𝑋 takes depend on chance, it is called random variable or a stochastic variable or
simply a variate.

For example, if a random experiment 𝐸 consists of tossing a pair of dice, the sum 𝑋 of the two
numbers which turn up have the value 2,3,4,…,12 depending on chance. Then 𝑋 is the random
variable. It is a function whose values are real numbers and depend on chance.

If in a random experiment, the event corresponding to a number 𝑎 occurs, then the


corresponding random variable 𝑋 is said to assume the value 𝑎 and the probability of the event
is denoted by 𝑃(𝑋 = 𝑎). Similarly the probability of the event 𝑋 assuming any value in the
interval 𝑎 < 𝑋 < 𝑏 is denoted by 𝑃(𝑎 < 𝑋 < 𝑏). The probability of the event 𝑋 ≤ 𝑐 is written
as 𝑃(𝑋 ≤ 𝑐).

If a random variable takes a finite set of values, it is called a discrete variate. On the other
hand, if it assumes an infinite number of uncountable values, it is called a continuous variate.

Discrete probability distribution

Suppose a discrete variate 𝑋 is the outcome of some experiment. If the probability that 𝑋 takes
the values 𝑥𝑖 , is 𝑝𝑖 , then

𝑃(𝑋 = 𝑥𝑖 ) = 𝑝𝑖 𝑜𝑟 𝑝(𝑥𝑖 )𝑓𝑜𝑟 𝑖 = 1,2, …

Where (i) 𝑝(𝑥𝑖 ) ≥ 0 for all values of 𝑖, (ii) ∑ 𝑝(𝑥𝑖 ) = 1.


The set of values 𝑥𝑖 with their probabilities 𝑝𝑖 constitute a discrete probability distribution of
the discrete variate 𝑋.

For example,

The discrete probability distribution for 𝑋, the sum of the numbers which turn on tossing a pair
of dice is given by the following table:

X = xi : 2     3     4     5     6     7     8     9     10    11    12
p(xi)  : 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Distribution function:

The distribution function 𝐹(𝑥) of the discrete variate 𝑋 is defined by

F(x) = P(X ≤ x) = Σ_{xi ≤ x} p(xi), where x is any integer.

The distribution function is also sometimes called cumulative distribution function.
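A short sketch in Python builds the table above and evaluates the distribution function F(x) for the sum on a pair of dice:

```python
from itertools import product
from collections import Counter

counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
pmf = {x: c / 36 for x, c in sorted(counts.items())}     # p(x_i) for x_i = 2..12

def F(x):
    """F(x) = P(X <= x) = sum of p(x_i) over x_i <= x."""
    return sum(p for xi, p in pmf.items() if xi <= x)

print(pmf[7])    # 6/36 = 0.1666...
print(F(4))      # (1 + 2 + 3)/36 = 0.1666...
print(F(12))     # 1.0 (up to float rounding)
```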

Example 1. The probability distribution of a variate X is

X = xi : 0    1    2    3    4    5     6
p(xi)  : k    3k   5k   7k   9k   11k   13k

(i) Find P(X < 4), P(X ≥ 5), P(3 < X ≤ 6).

(ii) What is the minimum value of k so that P(X ≤ 2) > 0.3?

Solution: (i) Since X is a random variable, Σ_{i=0}^{6} p(xi) = 1, i.e. k + 3k + 5k + 7k + 9k + 11k + 13k = 1, which implies k = 1/49.

Therefore, P(X < 4) = k + 3k + 5k + 7k = 16k = 16/49

P(X ≥ 5) = 11k + 13k = 24k = 24/49

P(3 < X ≤ 6) = 9k + 11k + 13k = 33k = 33/49

(ii) P(X ≤ 2) = k + 3k + 5k = 9k, and P(X ≤ 2) > 0.3 requires

9k > 0.3, which implies k > 1/30.
Continuous probability distribution

When a variate 𝑋 takes every value in an interval, it gives rise to continuous distribution of 𝑋.
The distributions defined by the variates like heights and weights are continuous distributions.

A major conceptual difference, however, exists between discrete and continuous probabilities. When thinking in discrete terms, the probability associated with a single event is meaningful. With continuous events, however, where the number of possible events is infinitely large, the probability that one specific event will occur is practically zero. For this reason, continuous probability statements must be worded somewhat differently from discrete ones. Instead of finding the probability that x equals some value, we find the probability of x falling in a small interval.

Thus the probability distribution of a continuous variate x is defined by a function f(x) such that the probability of the variate falling in the small interval from x − dx/2 to x + dx/2 is f(x)dx. Symbolically, P(x − dx/2 ≤ x ≤ x + dx/2) = f(x)dx. Then f(x) is called the probability density function and the continuous curve y = f(x) is called the probability curve.

The range of the variable may be finite or infinite. But even when the range is finite, it is convenient to consider it as infinite by taking the density function to be zero outside the given range. Thus if f(x) = φ(x) is the density function for the variate x in the interval (a, b), then it can be written as

f(x) = { 0 for x < a ; φ(x) for a ≤ x ≤ b ; 0 for x > b }

The density function f(x) is always positive and ∫_{−∞}^{∞} f(x)dx = 1 (i.e. the total area under the probability curve above the x-axis is unity, which corresponds to the requirement that the total probability of the happening of an event is unity).

Distribution function
If F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(x)dx, then F(x) is defined as the cumulative distribution function, or simply the distribution function, of the continuous variate X. It is the probability that the value of the variate X will be ≤ x.

The distribution function 𝐹(𝑥) has the following properties:

(i) 𝐹 ′ (𝑥) = 𝑓(𝑥) ≥ 0, so that 𝐹(𝑥) is a non-decreasing function.

(𝑖𝑖) 𝐹(−∞) = 0
(iii) 𝐹(∞) = 1.
(iv) P(a ≤ x ≤ b) = ∫_a^b f(x)dx = ∫_{−∞}^{b} f(x)dx − ∫_{−∞}^{a} f(x)dx = F(b) − F(a).

Example 1. (i) Is the function defined as follows a density function?

𝑒 −𝑥 , 𝑥≥0
𝑓(𝑥) = {
0 , 𝑥<0

(ii) If so, determine the probability that a variate having this density will fall in the interval (1, 2).

(iii) Also find the cumulative probability function F(2).

Solution: (i) f(x) is clearly ≥ 0 for every x, and

∫_{−∞}^{∞} f(x)dx = ∫_{−∞}^{0} 0 dx + ∫_0^{∞} e^(−x) dx = 1.

Hence the function f(x) satisfies the requirements for a density function.

(ii) Required probability = P(1 ≤ x ≤ 2) = ∫_1^2 e^(−x) dx = e^(−1) − e^(−2) = 0.368 − 0.135 = 0.233.

This probability is equal to the shaded area in figure (a).

(iii) Cumulative probability function

F(2) = ∫_{−∞}^{2} f(x)dx = ∫_{−∞}^{0} 0 dx + ∫_0^{2} e^(−x) dx = 1 − e^(−2) = 1 − 0.135 = 0.865,

which is shown in figure (b).

Figure (a) Figure (b)

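The two probabilities computed above can be confirmed numerically from the closed-form distribution function F(x) = 1 − e^(−x); a quick sketch in Python:

```python
from math import exp

def F(x):
    """Distribution function of f(x) = e^(-x), x >= 0."""
    return 1 - exp(-x) if x >= 0 else 0.0

print(round(F(2) - F(1), 3))   # P(1 <= x <= 2) = e^-1 - e^-2 = 0.233
print(round(F(2), 3))          # cumulative probability F(2) = 0.865
```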

Mean or Expected value and variance:

(1) Let a discrete random variable 𝑋 assume the values 𝑥1 , 𝑥2 , 𝑥3 , … … 𝑥𝑛 with probabilities
𝑝1 , 𝑝2 , 𝑝3 , … … . , 𝑝𝑛 then the mean or expected value is defined as

E(X) = μ = (x1 p1 + x2 p2 + x3 p3 + ⋯ + xn pn)/(p1 + p2 + p3 + ⋯ + pn) = Σ pi xi / Σ pi = Σ pi xi = Σ px , because Σ pi = 1.

The variance is defined as σ^2 = Σ pi (xi − μ)^2 = Σ p(x − μ)^2 = Σ p(x^2 + μ^2 − 2μx)

σ^2 = Σ px^2 + μ^2 Σ p − 2μ Σ px

σ^2 = Σ px^2 + μ^2 − 2μ·μ = Σ px^2 − μ^2.

(2) Let a continuous random variable X have probability f(x)dx in the interval x − dx/2 ≤ X ≤ x + dx/2. If the variate X lies only in the interval a ≤ X ≤ b, we have the expected or mean value

E(X) = μ = ∫_a^b x f(x)dx / ∫_a^b f(x)dx = ∫_a^b x f(x)dx , because ∫_a^b f(x)dx = 1.

The variance in this case is defined as σ^2 = ∫_a^b (x − μ)^2 f(x)dx,

where σ = standard deviation = √variance.

Also σ^2 = ∫_a^b x^2 f(x)dx + μ^2 ∫_a^b f(x)dx − 2μ ∫_a^b x f(x)dx

σ^2 = ∫_a^b x^2 f(x)dx + μ^2 − 2μ·μ = ∫_a^b x^2 f(x)dx − μ^2.

(3) The r-th moment about the mean (denoted by μr) is defined by

For a discrete distribution, μr = Σ (xi − μ)^r f(xi)

For a continuous distribution, μr = ∫_{−∞}^{∞} (x − μ)^r f(x)dx

(4) Mean deviation from the mean is defined as

For a discrete distribution, Σ |xi − μ| f(xi).

For a continuous distribution, ∫_{−∞}^{∞} |x − μ| f(x)dx.
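These definitions mechanize directly; a minimal sketch in Python for the discrete case, checked against the two-coin distribution of Example 1 below:

```python
def mean_var(xs, ps):
    """mu = sum p*x ; sigma^2 = sum p*x^2 - mu^2 (discrete distribution)."""
    mu = sum(p * x for x, p in zip(xs, ps))
    var = sum(p * x * x for x, p in zip(xs, ps)) - mu**2
    return mu, var

xs, ps = [0, 1, 2], [0.25, 0.5, 0.25]   # number of heads when two coins are tossed
print(mean_var(xs, ps))                  # (1.0, 0.5)
```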

Example 1. Two coins are tossed together; what is the expected value and variance of the number of heads?

Solution: In tossing two coins, the probability distribution for the number of heads (X) is

X    : 0    1    2
P(X) : 1/4  1/2  1/4

Expected value or mean value = E(X) = μ = Σ px = (1/4)(0) + (1/2)(1) + (1/4)(2) = 1

The variance σ^2 = Σ px^2 − μ^2 = (1/4)(0)^2 + (1/2)(1)^2 + (1/4)(2)^2 − (1)^2 = 3/2 − 1 = 1/2.

Example 2. A pair of dice is thrown together; find the expected value and variance for sum of
numbers.

Solution: Let 𝑋 denotes the sum of numbers on pair of dice, then probability distribution is

X = xi    : 2     3     4     5     6     7     8     9     10    11    12
P(X = xi) : 1/36  2/36  3/36  4/36  5/36  6/36  5/36  4/36  3/36  2/36  1/36

Therefore, expected value E(X) = μ = Σ px

E(X) = μ = (1/36)(2) + (2/36)(3) + (3/36)(4) + (4/36)(5) + (5/36)(6) + (6/36)(7) + (5/36)(8) + (4/36)(9) + (3/36)(10) + (2/36)(11) + (1/36)(12) = 252/36 = 7.

Variance σ^2 = Σ px^2 − μ^2 = (1/36)(2)^2 + (2/36)(3)^2 + (3/36)(4)^2 + (4/36)(5)^2 + (5/36)(6)^2 + (6/36)(7)^2 + (5/36)(8)^2 + (4/36)(9)^2 + (3/36)(10)^2 + (2/36)(11)^2 + (1/36)(12)^2 − (7)^2 = 1974/36 − 49 = 329/6 − 49 = 35/6.

Standard deviation = σ = √(35/6).

Example 3. A bag contains 8 items of which 2 are defective. A man selects 3 items at random.
Find the expected number of defective items he has drawn. Also find variance.

Solution: The number of defective items drawn can be 0, 1 or 2; thus X = 0, 1, 2.

Now p1 = P(X = 0) = C(2,0) × C(6,3)/C(8,3) = 20/56

p2 = P(X = 1) = C(2,1) × C(6,2)/C(8,3) = 30/56

p3 = P(X = 2) = C(2,2) × C(6,1)/C(8,3) = 6/56

X    : 0      1      2
P(X) : 20/56  30/56  6/56

Hence the expected number of defective items drawn is

E(X) = μ = Σ px = p1 x1 + p2 x2 + p3 x3 = (20/56)(0) + (30/56)(1) + (6/56)(2) = 3/4.

Variance = σ^2 = Σ px^2 − μ^2 = (20/56)(0)^2 + (30/56)(1)^2 + (6/56)(2)^2 − (3/4)^2 = 45/112

Example 4. The continuous random variable 𝑋 lies only inside the interval (0,2) and its density
function is given as: 𝑓(𝑥) = 𝑘𝑥(2 − 𝑥); 0 ≤ 𝑥 ≤ 2, find the expected value and variance.
Solution: Since x lies only inside the interval (0, 2), we have ∫_0^2 f(x)dx = 1; therefore

∫_0^2 kx(2 − x)dx = 1, which implies k = 3/4.

Now E(X) = μ = ∫_0^2 x f(x)dx = ∫_0^2 x · (3/4)x(2 − x)dx = 1

Variance = σ^2 = ∫_0^2 x^2 f(x)dx − μ^2 = ∫_0^2 x^2 · (3/4)x(2 − x)dx − (1)^2 = 6/5 − 1 = 1/5.

Example 5. The life in hours of a certain kind of radio tube has the probability density function

f(x) = 100/x^2 for x ≥ 100 ; f(x) = 0 for x < 100.

Find (i) the distribution function

(ii) the probability that the life of the tube is not more than 150 hours

(iii) the probability that the life of the tube is more than 150 hours.

Solution: (i) The distribution function is given by

F(x) = P(X ≤ x) = ∫_{−∞}^{x} f(x)dx = ∫_{−∞}^{100} f(x)dx + ∫_{100}^{x} f(x)dx = 0 + ∫_{100}^{x} (100/x^2)dx = 1 − 100/x , for x ≥ 100.

(ii) Probability that the tube will have a life of at most 150 hours

= F(150) = P(X ≤ 150) = 1 − 100/150 = 1/3

(iii) Probability that the tube will have a life of more than 150 hours = 1 − 1/3 = 2/3.
Example 6. Find the moment generating function of the exponential distribution

f(x) = (1/c) e^(−x/c) , 0 ≤ x < ∞ , c > 0. Hence find its mean and S.D.

Solution: The moment generating function about the origin is

M0(t) = ∫_0^{∞} e^(tx) (1/c) e^(−x/c) dx = (1/c) ∫_0^{∞} e^((t − 1/c)x) dx , for |t| < 1/c

= (1/c) [e^((t − 1/c)x)/(t − 1/c)]_0^{∞} = (1 − ct)^(−1) = 1 + ct + (ct)^2 + (ct)^3 + ⋯

Therefore, μ′1 = [d/dt M0(t)]_{t=0} = [c + 2c^2 t + 3c^3 t^2 + ⋯]_{t=0} = c

μ′2 = [d^2/dt^2 M0(t)]_{t=0} = [2c^2 + 6c^3 t + ⋯]_{t=0} = 2c^2

And μ2 = μ′2 − (μ′1)^2 = 2c^2 − c^2 = c^2.

Hence the mean is 𝑐 and S.D. is also 𝑐.

BINOMIAL DISTRIBUTION

Binomial distribution is a discrete probability distribution which is obtained when the


probability 𝑝 of the happening of an event is same in all the trials, and there are only two
events in each trial. For example, the probability of getting a head, when a coin is tossed a
number of times, must remain same in each toss, i.e. ½ .

Let an experiment consisting of 𝑛 trials be performed and let the occurrence of an event in any
trial be called a success and its non-occurrence a failure. Let 𝑝 be the probability of success and
𝑞 be the probability of the failure in a single trial, where 𝑞 = 1 − 𝑝, so that 𝑝 + 𝑞 = 1.

Let us assume that the trials are independent and that the probability of success is the same in each trial. If we have n trials, then the probability of an event happening r times and failing (n − r) times in any specified order is p^r q^(n−r) (by the theorem on multiplication of probability). But the total number of ways in which the event can happen exactly r times in n trials is C(n, r). These C(n, r) ways are equally likely, mutually exclusive and exhaustive.

Therefore, the probability of 𝑟 successes and (𝑛 − 𝑟) failures in 𝑛 trials in any order,


whatsoever is, 𝐶(𝑛, 𝑟)𝑝𝑟 𝑞 𝑛−𝑟 .

It can also be expressed in the form


𝑃(𝑋 = 𝑟) = 𝑃(𝑟) = 𝐶(𝑛, 𝑟)𝑝𝑟 𝑞 𝑛−𝑟 ; 𝑟 = 0, 1, 2, … , 𝑛.

Where P(X = r) or P(r) is the probability distribution of the random variable X, the number of successes. Giving different values to r, i.e. putting r = 0, 1, 2, …, n, we get the corresponding
probabilities 𝐶(𝑛, 0)𝑝0 𝑞 𝑛 , 𝐶(𝑛, 1)𝑝1 𝑞 𝑛−1 , 𝐶(𝑛, 2)𝑝2 𝑞 𝑛−2 , … . , 𝐶(𝑛, 𝑛)𝑝𝑛 𝑞 0 , which are the
different terms in the Binomial expansion of (𝑞 + 𝑝)𝑛 .

As a result of it, the distribution 𝑃(𝑟) = 𝐶(𝑛, 𝑟)𝑝𝑟 𝑞 𝑛−𝑟 is called Binomial probability
distribution. The two independent constants, 𝑛 and 𝑝 in the distribution are called the
parameters of the distribution.

Again if the experiment (each consisting of 𝑛 trials) be repeated 𝑁 times, the frequency
function of the Binomial distribution is given by

𝑓(𝑟) = 𝑁𝑃(𝑟) = 𝑁𝐶(𝑛, 𝑟)𝑝𝑟 𝑞 𝑛−𝑟

The expected frequencies of 0, 1, 2, 3, …,𝑛 successes in the above set of experiment are the
successive terms in the Binomial expansion of 𝑁(𝑞 + 𝑝)𝑛 ; 𝑤ℎ𝑒𝑟𝑒 𝑝 + 𝑞 = 1, which is also
called Binomial frequency distribution.

Properties of Binomial distribution

1. It is a discrete distribution which gives the theoretical probabilities.

2. It depends on the parameters 𝑝 or 𝑞, the probability of success or failure and 𝑛 (the number
of trials). The parameter 𝑛 is always a positive integer.

3. The distribution will be symmetrical if 𝑝 = 𝑞.

4. The statistics of the Binomial distribution are mean = np, variance = npq and standard deviation = √(npq).

5. The mode of the binomial distribution is equal to that value of 𝑋 which has the largest
frequency.

6. The shape and location of a Binomial distribution changes as 𝑝 changes for a given 𝑛 or 𝑛
changes for a given 𝑝.

Mean and variance of Binomial distribution

(i) 𝜇 =Mean or expected value = ∑ 𝑝𝑥


The discrete probability distribution for the Binomial distribution can be displayed as follows:

X    : 0              1                  2                  ⋯  n
P(X) : C(n,0)p^0 q^n  C(n,1)p^1 q^(n−1)  C(n,2)p^2 q^(n−2)  ⋯  C(n,n)p^n q^0

Therefore,

μ = C(n,0)p^0 q^n × 0 + C(n,1)p^1 q^(n−1) × 1 + C(n,2)p^2 q^(n−2) × 2 + ⋯ + C(n,n)p^n q^0 × n

= 0 + npq^(n−1) + n(n−1)p^2 q^(n−2) + ⋯ + np^n

= np (q^(n−1) + (n−1)pq^(n−2) + ((n−1)(n−2)/2) p^2 q^(n−3) + ⋯ + p^(n−1)) = np(q + p)^(n−1) = np

Mean = μ = np.

(ii) Variance = Σ px^2 − μ^2

σ^2 = Σ [x(x − 1) + x] C(n,x) p^x q^(n−x) − μ^2

= Σ x(x − 1) C(n,x) p^x q^(n−x) + Σ x C(n,x) p^x q^(n−x) − μ^2

= Σ x(x − 1) C(n,x) p^x q^(n−x) + μ − μ^2      (1)

First taking Σ x(x − 1) C(n,x) p^x q^(n−x)

= 0 + 0 + 2·1·C(n,2)p^2 q^(n−2) + 3·2·C(n,3)p^3 q^(n−3) + ⋯ + n(n−1)·C(n,n)p^n q^0

= 2·1·(n(n−1)/2·1) p^2 q^(n−2) + 3·2·(n(n−1)(n−2)/3·2·1) p^3 q^(n−3) + ⋯ + n(n−1)p^n q^0

= n(n−1)p^2 [q^(n−2) + C(n−2,1)pq^(n−3) + ⋯ + p^(n−2)] = n(n−1)p^2 (q + p)^(n−2) = n(n−1)p^2

= n^2 p^2 − np^2

From (1),

Variance = σ^2 = n^2 p^2 − np^2 + np − n^2 p^2 = np(1 − p) = npq.

Standard deviation = √(npq)

Recurrence formula for Binomial distribution

By the Binomial distribution, P(r) = C(n,r)p^r q^(n−r) and P(r+1) = C(n,r+1)p^(r+1) q^(n−r−1), so

P(r+1)/P(r) = [C(n,r+1)p^(r+1) q^(n−r−1)] / [C(n,r)p^r q^(n−r)] = [n!/((n−r−1)!(r+1)!)] × [(n−r)! r!/n!] × p/q = (n−r)p/((r+1)q)

Therefore, P(r+1) = [(n−r)p/((r+1)q)] P(r).
This is the recurrence formula for binomial distribution.
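The recurrence generates the whole distribution from P(0) = q^n without recomputing factorials. A minimal sketch in Python:

```python
def binomial_pmf(n, p):
    """Build P(0..n) from P(0) = q^n and P(r+1) = (n-r)p/((r+1)q) * P(r)."""
    q = 1 - p
    probs = [q**n]
    for r in range(n):
        probs.append(probs[-1] * (n - r) * p / ((r + 1) * q))
    return probs

pmf = binomial_pmf(8, 0.5)                 # 8 tosses of a fair coin
print([round(v * 256) for v in pmf])       # [1, 8, 28, 56, 70, 56, 28, 8, 1]
print(round(sum(pmf), 10))                 # 1.0
```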

Example 1. Find the probability that in five tosses of a fair die a 6 appears

(i) twice (ii) at least two times.

Solution: The probability of a 6 in a single throw is 1/6, and the probability of a non-6 face is 5/6.

(i) Hence the probability of a 6 appearing twice in 5 throws = C(5,2)(1/6)^2 (5/6)^3 = 10 × (1/36) × (125/216) = 625/3888

(ii) Probability of a 6 appearing at least twice = P(2) + P(3) + P(4) + P(5)

= 1 − P(0) − P(1) = 1 − C(5,0)(1/6)^0 (5/6)^5 − C(5,1)(1/6)^1 (5/6)^4 = 763/3888.

Example 2. Six dice are thrown 729 times. How many times do you expect at least three dice to
show a five or a six?
Solution: Here p = 2/6 = 1/3 ; q = 1 − p = 1 − 1/3 = 2/3 ; n = 6.

Hence, by the Binomial distribution, N × P(r ≥ 3) = 729[1 − P(0) − P(1) − P(2)]

= 729[1 − C(6,0)(2/3)^6 − C(6,1)(1/3)(2/3)^5 − C(6,2)(1/3)^2 (2/3)^4] = 729[1 − 64/729 − 192/729 − 240/729]

= 233.

Example 3. Eight coins are tossed at a time for 256 times. Number of heads are observed at
each throw and is recorded as tabulated below. Find the expected frequencies by Binomial
distribution. Compare the theoretical and experimental values of mean and standard deviation.

No. of heads (𝑋) 0 1 2 3 4 5 6 7 8


Frequency (𝑓) 2 6 30 52 67 56 32 10 1

Solution: The probability of a head (success) in a single trial is p = 1/2, therefore q = 1 − p = 1 − 1/2 = 1/2.

By the Binomial distribution, P(r successes) = C(n,r)p^r q^(n−r), and the frequency of r heads = N × P(r heads).

Expected frequency of 0 heads = C(8,0)(1/2)^0 (1/2)^8 × 256 = 1 × (1/256) × 256 = 1

Expected frequency of 1 head = C(8,1)(1/2)^1 (1/2)^7 × 256 = 8 × (1/256) × 256 = 8

Expected frequency of 2 heads = C(8,2)(1/2)^2 (1/2)^6 × 256 = 28 × (1/256) × 256 = 28

Expected frequency of 3 heads = C(8,3)(1/2)^3 (1/2)^5 × 256 = 56 × (1/256) × 256 = 56

Expected frequency of 4 heads = C(8,4)(1/2)^4 (1/2)^4 × 256 = 70 × (1/256) × 256 = 70

Expected frequency of 5 heads = C(8,5)(1/2)^5 (1/2)^3 × 256 = 56 × (1/256) × 256 = 56

Expected frequency of 6 heads = C(8,6)(1/2)^6 (1/2)^2 × 256 = 28 × (1/256) × 256 = 28

Expected frequency of 7 heads = C(8,7)(1/2)^7 (1/2)^1 × 256 = 8 × (1/256) × 256 = 8

Expected frequency of 8 heads = C(8,8)(1/2)^8 (1/2)^0 × 256 = 1 × (1/256) × 256 = 1

Mean of theoretical values

= (0×1 + 1×8 + 2×28 + 3×56 + 4×70 + 5×56 + 6×28 + 7×8 + 8×1)/256 = 4

Comparison table

Given frequency:

x : 0  1  2   3   4   5   6   7   8
f : 2  6  30  52  67  56  32  10  1

Σf = 256 , Σfx = 1040 , Σfx^2 = 4772

Mean = Σfx/Σf = 1040/256 = 4.06

S.D. = √(Σfx^2/Σf − (Σfx/Σf)^2) = √(4772/256 − (1040/256)^2) = 1.4618

Expected frequency:

x : 0  1  2   3   4   5   6   7   8
f : 1  8  28  56  70  56  28  8   1

Σf = 256 , Σfx = 1024 , Σfx^2 = 4608

Mean = Σfx/Σf = 1024/256 = 4

S.D. = √(Σfx^2/Σf − (Σfx/Σf)^2) = √(4608/256 − (1024/256)^2) = 1.4142

Example 4. If the sum of the mean and the variance of a Binomial distribution of 5 trials is 4.8,
find the distribution.

Solution: Let the required Binomial distribution be 𝐶(𝑛, 𝑟)𝑝𝑟 𝑞 𝑛−𝑟 where 𝑛 = number of trials =
5.

Mean of the distribution = 𝑛𝑝 and the variance of the distribution = 𝑛𝑝𝑞.

By the given condition, np + npq = 4.8, which implies 5p + 5pq = 4.8

5p(1 + q) = 4.8, which implies 5(1 − q)(1 + q) = 4.8, i.e. 5(1 − q^2) = 4.8, which implies

q = 1/5, therefore p = 1 − 1/5 = 4/5.

Hence the required Binomial distribution is C(5,r)(4/5)^r (1/5)^(5−r).

POISSON DISTRIBUTION

The Poisson distribution is a discrete probability distribution which has the following
characteristics:

(i) It is the limiting form of Binomial distribution as 𝑛 becomes infinitely large i.e. 𝑛 → ∞ and 𝑝,
the constant probability of success for each trial becomes indefinitely small i.e. 𝑝 → 0 in such a
manner that 𝑛𝑝 = 𝑚 remains a finite number.

(ii) It consists of a single parameter 𝑚 only. The entire distribution can be obtained once 𝑚 is
known.

It has wide applications in physical, engineering and management sciences as well as in


economics, operational research and reliability.
Binomial distribution to Poisson distribution:

We will derive the Poisson distribution as a limiting case of the Binomial distribution when p → 0 and n → ∞ such that np = m (a finite quantity). We know that in a Binomial distribution the probability of r successes is given by

P(r) = C(n,r)p^r q^(n−r) = [n(n−1)(n−2)⋯(n−(r−1))/r!] p^r (1 − p)^(n−r)

Since np = m, i.e. p = m/n,

P(r) = [n(n−1)(n−2)⋯(n−(r−1))/r!] (m/n)^r (1 − m/n)^(n−r)

= [n(n−1)(n−2)⋯(n−(r−1))/(r! n^r)] m^r (1 − m/n)^n (1 − m/n)^(−r)

= (n/n)((n−1)/n)((n−2)/n)⋯((n−r+1)/n) · (m^r/r!) · (1 − m/n)^n (1 − m/n)^(−r)

= (1 − 1/n)(1 − 2/n)⋯(1 − (r−1)/n) · (m^r/r!) · (1 − m/n)^n (1 − m/n)^(−r)

Now for a given value of r, when n → ∞, we have

lim_{n→∞} (1 − 1/n)(1 − 2/n)⋯(1 − (r−1)/n) = 1 , lim_{n→∞} (1 − m/n)^(−r) = 1 , lim_{n→∞} (1 − m/n)^n = e^(−m)

As such, in the limiting form

P(r) = e^(−m) m^r / r!

This is the probability of r successes for the Poisson distribution. For r = 0, 1, 2, … we get the probabilities of 0, 1, 2, … successes as

P(0) = e^(−m) , P(1) = m e^(−m) , P(2) = e^(−m) m^2/2! , P(3) = e^(−m) m^3/3! and so on.

Note:- The sum of the probabilities P(r) for r = 0, 1, 2, 3, … is 1.

Proof: Σ P(r) = P(0) + P(1) + P(2) + ⋯ = e^(−m) + m e^(−m) + e^(−m) m^2/2! + ⋯

= e^(−m) (1 + m + m^2/2! + m^3/3! + ⋯) = e^(−m) · e^m = 1.

Mean and Variance of Poisson distribution:

We know that the Poisson distribution is P(r) = e^(−m) m^r / r! , r = 0, 1, 2, 3, …

Mean (μ) = Σ r P(r) = 0 + 1·P(1) + 2·P(2) + 3·P(3) + ⋯

= 1·m e^(−m)/1! + 2·e^(−m) m^2/2! + 3·e^(−m) m^3/3! + ⋯

= m e^(−m) (1 + m + m^2/2! + m^3/3! + ⋯) = m e^(−m) e^m = m

Also Variance σ^2 = Σ r^2 P(r) − μ^2 = Σ r^2 P(r) − m^2      (1)

Taking Σ r^2 P(r) = Σ [r(r−1) + r] P(r) = Σ r(r−1) P(r) + Σ r P(r)

= 0 + 0 + 2·1·P(2) + 3·2·P(3) + 4·3·P(4) + ⋯ + μ

= 2·e^(−m) m^2/2! + 3·2·e^(−m) m^3/3! + 4·3·e^(−m) m^4/4! + ⋯ + m , because μ = m

= m^2 e^(−m) (1 + m + m^2/2! + ⋯) + m = m^2 e^(−m) e^m + m = m^2 + m

Therefore, from (1), we get

σ^2 = m^2 + m − m^2 = m

Standard deviation (S.D.) = √m .

Constants of Poisson distribution

μ1 = 0 , μ2 = m , μ3 = m , μ4 = m + 3m^2

β1 = μ3^2/μ2^3 = m^2/m^3 = 1/m , β2 = μ4/μ2^2 = (m + 3m^2)/m^2 = 3 + 1/m.

Recurrence formula for the Poisson distribution:

We have P(r) = e^(−m) m^r / r! and P(r+1) = e^(−m) m^(r+1) / (r+1)! , therefore

P(r+1)/P(r) = [e^(−m) m^(r+1)/(r+1)!] × [r!/(e^(−m) m^r)] = m/(r+1) , which implies P(r+1) = [m/(r+1)] P(r) , r = 0, 1, 2, 3, …

Which is the required recurrence formula for Poisson distribution. With this formula we can
find 𝑃(1), 𝑃(2), 𝑃(3), … if 𝑃(0) is given.
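A minimal sketch in Python of this recurrence, starting from P(0) = e^(−m):

```python
from math import exp

def poisson_pmf(m, r_max):
    """Build P(0..r_max) from P(0) = e^-m and P(r+1) = m/(r+1) * P(r)."""
    probs = [exp(-m)]
    for r in range(r_max):
        probs.append(probs[-1] * m / (r + 1))
    return probs

print([round(p, 4) for p in poisson_pmf(2, 4)])
# [0.1353, 0.2707, 0.2707, 0.1804, 0.0902] -- compare Example 3 below
```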

Example 1. Suppose a book of 585 pages contains 43 typographical errors. If these errors are
randomly distributed throughout the book, what is the probability that 10 pages, selected at
random, will be free from errors? ( Use 𝑒 −0.735 = 0.4795 )
Solution: Here p = 43/585 = 0.0735 and n = 10, therefore m = np = 10 × 0.0735 = 0.735. Clearly, p is very small and n is large, so it is a case of the Poisson distribution.

Let X denote the number of errors in 10 pages.

Then P(X = r) = e^(−m) m^r / r! = e^(−0.735) (0.735)^r / r!

Therefore, P(no error) = P(X = 0) = e^(−0.735) (0.735)^0 / 0! = e^(−0.735) = 0.4795.

Hence the required probability is 0.4795.

Example 2. In a certain factory turning out razor blades, there is a small chance of 0.002 for any
blade to be defective. The blades are supplied in packets of 10. Calculate the approximate
number of packets containing no defective, one defective and two defective blades in a
consignment of 10000 packets. ( Given 𝑒 −0.02 = 0.9802 )

Solution: Here 𝑁 = 10000 , 𝑝 = 0.002 , 𝑛 = 10 therefore, 𝑚 = 𝑛𝑝 = 10 × 0.002 = 0.02

Let 𝑟 be the number of defective blades in a packet.

Let P(r) be the number of packets containing r defective blades; then

P(r) = N × e^(−m) m^r / r!

(i) P(0) = number of packets with no defective blade

= 10000 × e^(−0.02) (0.02)^0/0! = 10000 × 0.9802 = 9802.

(ii) P(1) = number of packets with one defective blade

= 10000 × e^(−0.02) (0.02)^1/1! = 10000 × 0.9802 × 0.02 = 196.04.

Therefore, number of packets with one defective blade = 196.

(iii) P(2) = number of packets with two defective blades

= 10000 × e^(−0.02) (0.02)^2/2! = 10000 × 0.9802 × 0.0002 = 1.96.

Therefore, number of packets with two defective blades = 2.

Example3. If the variance of a Poisson distribution is 2, find the probabilities for 𝑟 = 1,2,3,4
from the recurrence relation of the Poisson distribution. Also find 𝑃(𝑋 ≥ 4).(𝑒 −2 = 0.1353)

Solution: Here variance = m = 2, and P(0) = e^(−2)(2)^0/0! = e^(−2) = 0.1353.

We know that P(r+1) = [m/(r+1)] P(r) = [2/(r+1)] P(r)      (1)

Putting r = 0, 1, 2, 3 in (1), we get

P(1) = (2/1) P(0) = 2 × 0.1353 = 0.2706

P(2) = (2/2) P(1) = 0.2706 , P(3) = (2/3) P(2) = (2/3) × 0.2706 = 0.1804

P(4) = (2/4) P(3) = (1/2) × 0.1804 = 0.0902

Now, P(X ≥ 4) = 1 − [P(0) + P(1) + P(2) + P(3)]

= 1 − [0.1353 + 0.2706 + 0.2706 + 0.1804] = 0.1431

Example 4. Fit a Poisson distribution to the following:

x 0 1 2 3 4
𝑓 192 100 24 3 1

Solution: P(r) = e^(−m) m^r / r!

m = mean of the distribution = (0×192 + 1×100 + 2×24 + 3×3 + 4×1)/(192 + 100 + 24 + 3 + 1) = 161/320 ≈ 0.5

P(0) = e^(−0.5)(0.5)^0/0! = 0.6065, therefore theoretical freq. f = 320 × 0.6065 = 194 (approx.)

P(1) = e^(−0.5)(0.5)^1/1! = 0.30325, therefore theoretical freq. f = 320 × 0.30325 = 97 (approx.)

P(2) = e^(−0.5)(0.5)^2/2! = 0.0758, therefore theoretical freq. f = 320 × 0.0758 = 24 (approx.)

P(3) = e^(−0.5)(0.5)^3/3! = 0.0126, therefore theoretical freq. f = 320 × 0.0126 = 4 (approx.)

P(4) = e^(−0.5)(0.5)^4/4! = 0.0016, therefore theoretical freq. f = 320 × 0.0016 = 0.512, i.e. 1 (approx.)

The total number of trials = 320.

We have the approximate values as obtained by Poisson distribution as:

x 0 1 2 3 4
𝑓 194 97 24 4 1
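The fitting procedure of Example 4 — estimate m by the sample mean, then scale the Poisson probabilities by N — can be sketched in Python:

```python
from math import exp, factorial

x = [0, 1, 2, 3, 4]
f = [192, 100, 24, 3, 1]

N = sum(f)                                     # 320 trials
m = sum(xi * fi for xi, fi in zip(x, f)) / N   # 161/320 = 0.503...
m = 0.5                                        # rounded value used in the worked example

poisson = lambda r: exp(-m) * m**r / factorial(r)
expected = [round(N * poisson(r)) for r in x]
print(expected)                                # [194, 97, 24, 4, 1]
```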
NORMAL DISTRIBUTION

The normal distribution is the most popular and commonly used distribution. It was discovered by De Moivre in 1733, about 20 years after Bernoulli gave the Binomial distribution. This distribution is a limiting case of the Binomial distribution when neither p nor q is too small and n, the number of trials, becomes infinitely large, i.e. n → ∞. In fact, any quantity whose variation depends on random causes will be distributed according to the normal distribution. In the Binomial and Poisson distributions, X assumes values like 0, 1, 2, …, so those distributions are discrete; cases where the variable can assume any value, for example between 0 and 1 or between 1 and 2, are classified under the continuous variate, as with heights, weights etc.

The continuous random variable x is said to have a normal distribution if its probability density function is defined as

f(x) = k e^(−(1/2)((x−a)/b)^2) for −∞ < x < ∞ , where k is a constant and a and b are two parameters.

Now P(−∞ < x < ∞) = ∫_{−∞}^{∞} f(x)dx = 1.

Therefore, 1 = k ∫_{−∞}^{∞} e^(−(1/2)((x−a)/b)^2) dx      (1)

Taking y = (x − a)/(√2 b), so that dy = dx/(√2 b), i.e. dx = √2 b dy, from (1)

1 = k ∫_{−∞}^{∞} e^(−y^2) √2 b dy = k√2 b ∫_{−∞}^{∞} e^(−y^2) dy = 2k√2 b ∫_0^{∞} e^(−y^2) dy      (2)

Now taking y^2 = t, so y = √t and dy = (1/(2√t))dt, from (2)

1 = 2k√2 b ∫_0^{∞} e^(−t) (1/(2√t)) dt = √2 kb ∫_0^{∞} e^(−t) t^(−1/2) dt = √2 kb Γ(1/2) = √2 kb √π.

Therefore k = 1/(b√(2π)).

Thus f(x) = (1/(b√(2π))) e^(−(1/2)((x−a)/b)^2) for −∞ < x < ∞.

Mean of normal distribution

The mean (μ) of the normal distribution is given by

μ = ∫_{−∞}^{∞} x f(x)dx / ∫_{−∞}^{∞} f(x)dx = ∫_{−∞}^{∞} x f(x)dx = (1/(b√(2π))) ∫_{−∞}^{∞} x e^(−(1/2)((x−a)/b)^2) dx , because ∫_{−∞}^{∞} f(x)dx = 1

Putting (x − a)/b = z, so that x = a + bz and dx = b dz,

μ = (1/√(2π)) ∫_{−∞}^{∞} (a + bz) e^(−z^2/2) dz = (a/√(2π)) ∫_{−∞}^{∞} e^(−z^2/2) dz + (b/√(2π)) ∫_{−∞}^{∞} z e^(−z^2/2) dz

μ = a + (b/√(2π)) [−e^(−z^2/2)]_{−∞}^{∞} = a + 0 = a , because ∫_{−∞}^{∞} e^(−z^2/2) dz = √(2π).

Therefore, Mean (μ) = a.

Variance of Normal distribution

Variance (σ^2) = ∫_{−∞}^{∞} (x − μ)^2 f(x)dx = (1/(b√(2π))) ∫_{−∞}^{∞} (x − μ)^2 e^(−(1/2)((x−μ)/b)^2) dx

Putting y = (1/2)((x − μ)/b)^2, so that dx = (b/(√2 √y)) dy, we get

σ^2 = (2b^2/√π) ∫_0^{∞} y^(1/2) e^(−y) dy = (2b^2/√π) Γ(3/2) , because Γ(n) = ∫_0^{∞} e^(−x) x^(n−1) dx

σ^2 = (2b^2/√π) · (1/2)Γ(1/2) = (2b^2/√π) · (√π/2) = b^2 , therefore standard deviation σ = b.

Hence the probability density function of the normal distribution becomes

f(x) = (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)^2) , −∞ < x < ∞.

The curve y = f(x) is called the normal probability curve.

The variate z = (x − μ)/σ is called the standard normal variate, whose mean and variance are 0 and 1; the distribution of z is called the standard normal distribution and is written as

φ(z) = (1/√(2π)) e^(−z^2/2) , −∞ < z < ∞.

The curve y = f(x) = (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)^2) , −∞ < x < ∞, where μ and σ are the mean and the standard deviation respectively, is shown in the figure.

The shaded area under the curve from x = a to x = b gives the probability of the variable x lying between the values a and b. Thus

P(a ≤ x ≤ b) = (1/(σ√(2π))) ∫_a^b e^(−(1/2)((x−μ)/σ)^2) dx

If the total frequency is N, the normal frequency distribution is given by N (1/(σ√(2π))) e^(−(1/2)((x−μ)/σ)^2).

The frequency of the variable x between a and b is given by N (1/(σ√(2π))) ∫_a^b e^(−(1/2)((x−μ)/σ)^2) dx.

Notes.

1. The normal probability curve extends from −∞ to ∞.

2. Total area under the curve 𝑦 = 𝑓(𝑥), above x-axis, from −∞ to ∞ is unity.

3. The area of the normal probability curve between 𝜇 − 𝜎 and 𝜇 + 𝜎 is 68.27% or 0.6827.

4. The area of the normal probability curve between 𝜇 − 2𝜎 and 𝜇 + 2𝜎 is 95.45% or 0.9545.

5. The area of the normal probability curve between 𝜇 − 3𝜎 and 𝜇 + 3𝜎 is 99.73% or 0.9973.
𝑥−𝜇
6. By using the transformation 𝑧 = , we get the standard normal curve and the total area
𝜎
under this curve is unity. The line 𝑧 = 0 divides the whole area in two equal parts. The left side
area of 𝑧 = 0 is 0.5 whereas right side area 𝑧 = 0 is also 0.5. The area in between the ordinates
𝑧 = 0 and 𝑧 = 𝑧1 , can be computed from the standard table.
7. The mean deviation from the mean = (1/(σ√(2π))) ∫_{−∞}^{∞} |x − μ| e^(−(1/2)((x−μ)/σ)^2) dx

Putting (x − μ)/σ = z, so that x = μ + σz and dx = σ dz, we get

the mean deviation from the mean = (1/(σ√(2π))) ∫_{−∞}^{∞} |σz| e^(−z^2/2) σ dz = (σ/√(2π)) ∫_{−∞}^{∞} |z| e^(−z^2/2) dz

= (σ/√(2π)) ∫_{−∞}^{0} (−z) e^(−z^2/2) dz + (σ/√(2π)) ∫_0^{∞} z e^(−z^2/2) dz = (σ/√(2π))(0 + 1) + (σ/√(2π))(0 + 1)

= σ√(2/π) = σ√(2/3.1416) = 0.80σ (approx.) = (4/5)σ.

Therefore, the mean deviation from the mean in the normal distribution = (4/5)σ.

In other words, the standard deviation σ = (5/4) times the mean deviation from the mean.

Therefore, σ is 25% more than the mean deviation from the mean.
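Areas under the normal curve can also be computed without tables, via the error function. A minimal sketch in Python:

```python
from math import erf, sqrt

def Phi(z):
    """Standard normal distribution function P(Z <= z)."""
    return 0.5 * (1 + erf(z / sqrt(2)))

def normal_prob(a, b, mu, sigma):
    """P(a <= x <= b) for a normal variate with mean mu and s.d. sigma."""
    return Phi((b - mu) / sigma) - Phi((a - mu) / sigma)

# sanity checks against the areas quoted in notes 3-5 above
print(round(normal_prob(-1, 1, 0, 1), 4))   # 0.6827
print(round(normal_prob(-2, 2, 0, 1), 4))   # 0.9545
print(round(normal_prob(-3, 3, 0, 1), 4))   # 0.9973
```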

Constants of Normal Distribution

1. The mean of the normal distribution is x-bar (= μ).

2. The S.D. of the normal distribution is σ; μ2 = σ^2 , μ3 = 0 , μ4 = 3σ^4.

3. β1, the moment coefficient of skewness, = (μ3)^2/(μ2)^3 = 0.

4. β2, the moment coefficient of kurtosis, = μ4/(μ2)^2 = 3σ^4/σ^4 = 3.

Example 1. In a normal distribution with μ = 50 and σ = 10, find (i) P(50 ≤ x ≤ 80), (ii) P(60 ≤ x ≤ 70), (iii) P(30 ≤ x ≤ 40), (iv) P(40 ≤ x ≤ 60).

Solution: The standard normal variate is z = (x − μ)/σ = (x − 50)/10.

(i) z = (50 − 50)/10 = 0 when x = 50 and z = (80 − 50)/10 = 3 when x = 80.

Hence, P(50 ≤ x ≤ 80) = P(0 ≤ z ≤ 3) = 0.4987.

(ii) 𝑃(60 ≤ 𝑥 ≤ 70) = 𝑃(1 ≤ 𝑧 ≤ 2) = Area from 𝑧 = 1 to 𝑧 = 2.

= (Area from 𝑧 = 0 to 𝑧 = 2) –( Area from 𝑧 = 0 to 𝑧 = 1) = 0.4772 − 0.3413 = 0.1359.


(iii) 𝑃(30 ≤ 𝑥 ≤ 40) = 𝑃(−2 ≤ 𝑧 ≤ −1)

Due to symmetry, area between 𝑧 = −1 to 𝑧 = −2 will be the same as between 𝑧 = 1 to 𝑧 = 2

Which is the same as in (ii) i.e. 0.1359.

(iv) 𝑃(40 ≤ 𝑥 ≤ 60) = 𝑃(−1 ≤ 𝑧 ≤ 1) = Area from 𝑧 = −1 to 𝑧 = 1

=Twice the area between 𝑧 = 0 to 𝑧 = 1 = 2 × 0.3413 = 0.6826.

Example2. In a normal distribution 31% of items are under 45 and 8% are over 64. Find the
mean and standard deviation of the distribution.

Solution: Let 𝜇 be the mean and 𝜎 be the standard deviation of the distribution.
The normal variate is z = (x − μ)/σ.

As 31% of the items are under 45 and 8% are over 64,


at x = 64 we have z = (64 − μ)/σ = z1 (say).

The area from z = 0 to z = z1 is 42%, i.e. 0.42. The value of z1 corresponding to area 0.42 from the table is 1.405. Therefore,

1.405 = (64 − μ)/σ      (1)

Similarly, at x = 45, |z| = |(45 − μ)/σ| = |z2|.

The area between x = 45 (i.e. z = z2) and x = μ (z = 0) is numerically the same as between z = 0 and z = |z2|, which is 19%.

Now the value of the normal variate z2 corresponding to area 0.19 is 0.495.

Therefore, |(45 − μ)/σ| = 0.495, i.e. (μ − 45)/σ = 0.495      (2)

From (1) and (2), μ = 50 and σ = 10 (approx.).

Example3. A sample of 100 dry battery cells tested to find the length of life produced the
following results: 𝑋̅ = 12 ℎ𝑜𝑢𝑟𝑠, 𝜎 = 3 ℎ𝑜𝑢𝑟𝑠. Assuming the data to be normally distributed,
what %age of battery cells are expected to have life

(a) more than 15 hours. (b) less than 6 hours (c) between 10 and 14 hours.
Solution: Let X denote the length of life of the dry battery cells, and let z = (X − X-bar)/σ = (X − 12)/3.

(a) When X = 15, z = (15 − 12)/3 = 1.

Therefore, P(X > 15) = P(z > 1) = area to the right of z = 1 = 0.5 − 0.3413 = 0.1587.

Therefore, the percentage of battery cells having life more than 15 hours = 0.1587 × 100 = 15.87%.

(b) When X = 6, z = (6 − 12)/3 = −2.

Therefore, P(X < 6) = P(z < −2) = area to the left of z = −2 = 0.5 − 0.4772 = 0.0228.

Therefore, the percentage of battery cells having life less than 6 hours = 0.0228 × 100 = 2.28%.

(c) When X = 10, z = (10 − 12)/3 = −0.67 and when X = 14, z = (14 − 12)/3 = 0.67.

Therefore, P(10 < X < 14) = P(−0.67 < z < 0.67) = 2 × P(0 < z < 0.67)

= 2 × 0.2487 = 0.4974.

Therefore, the percentage of battery cells having life between 10 hours and 14 hours is 49.74%.

Example 4. The distribution of a random variable is given by

f(x) = C e^(−(1/50)(9x^2 − 30x)) , −∞ < x < ∞

Find the constant C, the mean and the variance of the random variable. Find also the upper 5% value of the random variable.

Solution: P(−∞ < x < ∞) = 1 = the unit area under the curve y = f(x) above the x-axis.

Therefore, 1 = ∫_{−∞}^{∞} f(x)dx = ∫_{−∞}^{∞} C e^(−(1/50)(9x^2 − 30x)) dx      (1)

Now −(1/50)(9x^2 − 30x) = −(9/50)(x^2 − (10/3)x) = −(9/50)[(x − 5/3)^2 − 25/9] = −(9/50)(x − 5/3)^2 + 1/2.

Therefore, from (1),

1 = C ∫_{−∞}^{∞} e^(−(9/50)(x − 5/3)^2 + 1/2) dx = C e^(1/2) ∫_{−∞}^{∞} e^(−(1/2)((x − 5/3)/(5/3))^2) dx

Comparing this with the normal distribution

1 = (1/(σ√(2π))) ∫_{−∞}^{∞} e^(−(1/2)((x−μ)/σ)^2) dx , we get μ = 5/3 = 1.667 , σ = 5/3 , C = 1/(√e √(2π) σ) = 0.145.

For the standard normal variate z = (x − μ)/σ = (x − 5/3)/(5/3), corresponding to 95% area we have

1.96 = (x − 5/3)/(5/3) , which implies x = 4.933.
Exponential distribution

A random variable X is said to have an exponential distribution with parameter γ > 0 if its probability density function f(x) is defined as

𝛾𝑒 −𝛾𝑥 , 𝑥 ≥ 0
𝑓(𝑥) = {
0, 𝑥<0

Cumulative distribution function

The cumulative distribution function 𝐹(𝑥) is defined as


F(x) = P(X ≤ x) = ∫_0^x γ e^(−γt) dt for x ≥ 0, which implies

F(x) = 1 − e^(−γx) for x ≥ 0 , and F(x) = 0 for x < 0.

Mean of exponential distribution


μ = ∫_{−∞}^{∞} x f(x)dx = ∫_{−∞}^{0} x·0 dx + ∫_0^{∞} x γ e^(−γx) dx

= γ [x e^(−γx)/(−γ) − e^(−γx)/(−γ)^2]_0^{∞} = 1/γ

Variance of exponential distribution



𝜎 2 = ∫−∞ 𝑥 2 𝑓(𝑥)𝑑𝑥 − (𝜇)2

0 ∞ 1 2 1
= ∫−∞ 𝑥 2 . 0𝑑𝑥 + ∫0 𝑥 2 𝛾𝑒 −𝛾𝑥 𝑑𝑥 − (𝛾) = 𝛾2

Example1. A power supply unit for a computer component is assumed to follow an exponential
distribution with a mean life of 1200 hours. What is the probability that the component will

(i) fail in first 300 hours.

(ii) survive more than 150 hours.

(iii) last between 1200 hours and 1500 hours.

Solution: Given that mean life is 1200, we know that


1
Mean = 𝛾 , therefore

1 1
1200 = 𝛾 , implies that 𝛾 = 1200

(i) The probability that the component will fall in the first 300 hours is

𝑥 𝑒 −𝛾𝑥 𝑥
𝑃(𝑋 ≤ 𝑥) = ∫0 𝛾𝑒 −𝛾𝑥 𝑑𝑥 = 𝛾 [ ] = 1 − 𝑒 −𝛾𝑥
−𝛾 0

But here 𝑥 = 300


300
𝑃(𝑋 ≤ 300) = 1 − 𝑒 −1200 = 1 − 𝑒 −0.25 = 1 − 0.7788 = 0.2212

(ii) The probability that the component will survive more than 1500 hours is

∞ 𝑒 −𝛾𝑥 ∞
𝑃(𝑋 ≥ 𝑥) = ∫𝑥 𝛾𝑒 −𝛾𝑥 𝑑𝑥 = 𝛾 [ ] = 𝑒 −𝛾𝑥
−𝛾 𝑥

But here 𝑥 = 1500, therefore


150
𝑃(𝑋 ≥ 1500) = 𝑒 −1200 = 0.2865

(iii) The probability that the component will lost between 120 hours and 1500 hours is given as

𝑃(1200 ≤ 𝑋 ≤ 1500), implies that

𝑃(1200 ≤ 𝑋 ≤ 1500) = 1 − [𝑃(𝑋 ≤ 1200) + 𝑃(𝑋 ≥ 1500)

= 1 − [(1 − 𝑒 −𝛾𝑥 ) + (𝑒 −𝛾𝑥 )] = 1 − [(1 − 𝑒 −1 ) + (𝑒 −1.25 )]

= 1 − [0.6321 + 0.2865] = 0.0814
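Note:- A direct evaluation of (i)-(iii) needs only the standard library; the sketch below is illustrative.

import math

rate = 1 / 1200   # gamma = 1 / (mean life)

p_i = 1 - math.exp(-rate * 300)     # (i)  P(X <= 300)
p_ii = math.exp(-rate * 1500)       # (ii) P(X >= 1500)
p_iii = 1 - ((1 - math.exp(-rate * 1200)) + p_ii)   # (iii) P(1200 <= X <= 1500)

print(round(p_i, 4), round(p_ii, 4), round(p_iii, 4))   # 0.2212 0.2865 0.0814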

Example2. A random variable has an exponential distribution with probability density function
given by

𝑓(𝑥) = { 3𝑒^{−3𝑥} , 𝑥 > 0 ; 0 , 𝑥 ≤ 0 }

What is the probability that 𝑋 is not less than 4? Find the mean and standard deviation, and show that the coefficient of variation is 1.

Solution: 𝑃(𝑋 𝑖𝑠 𝑛𝑜𝑡 𝑙𝑒𝑠𝑠 𝑡ℎ𝑎𝑛 4) = 𝑃(𝑋 ≥ 4) = ∫_{4}^{∞} 3𝑒^{−3𝑥} 𝑑𝑥 = 𝑒^{−12}

Mean (𝜇) = 𝐸(𝑋) = ∫_{0}^{∞} 𝑥𝑓(𝑥) 𝑑𝑥 = 3 ∫_{0}^{∞} 𝑥𝑒^{−3𝑥} 𝑑𝑥 = 1/3

Variance 𝜎² = 𝐸[𝑋²] − [𝐸(𝑋)]² = ∫_{0}^{∞} 𝑥² · 3𝑒^{−3𝑥} 𝑑𝑥 − (1/3)² = 2/9 − 1/9 = 1/9

Standard deviation (𝜎) = 1/3

We know that

Coefficient of variation = 𝑆𝑡𝑎𝑛𝑑𝑎𝑟𝑑 𝑑𝑒𝑣𝑖𝑎𝑡𝑖𝑜𝑛 / 𝑀𝑒𝑎𝑛 = (1/3)/(1/3) = 1
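Note:- The same three quantities can be obtained symbolically; the sketch below is illustrative and assumes the third-party sympy package is installed.

import sympy as sp

x = sp.symbols('x', nonnegative=True)
f = 3 * sp.exp(-3 * x)                                   # the given density

p = sp.integrate(f, (x, 4, sp.oo))                       # P(X >= 4)
mean = sp.integrate(x * f, (x, 0, sp.oo))                # E[X]
var = sp.integrate(x**2 * f, (x, 0, sp.oo)) - mean**2    # Var(X)

print(p, mean, var, sp.sqrt(var) / mean)                 # exp(-12) 1/3 1/9 1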

Gamma Distribution

A continuous random variable 𝑋 is said to have a gamma distribution with parameters 𝛼 and 𝛾 if its probability density function is defined as

𝑓(𝑥) = { (𝛼^𝛾 𝑒^{−𝛼𝑥} 𝑥^{𝛾−1}) / Γ(𝛾) , 𝑥 ≥ 0 , 𝛼, 𝛾 > 0 ; 0 , 𝑥 < 0 }

where Γ(𝛾) = ∫_{0}^{∞} 𝑒^{−𝑦} 𝑦^{𝛾−1} 𝑑𝑦 denotes the gamma function.

Cumulative distribution function

The cumulative distribution function for the gamma distribution is

𝐹(𝑥) = ∫_{0}^{𝑥} (𝛼^𝛾 𝑒^{−𝛼𝑡} 𝑡^{𝛾−1} / Γ(𝛾)) 𝑑𝑡 = (𝛼^𝛾/Γ(𝛾)) ∫_{0}^{𝑥} 𝑡^{𝛾−1} 𝑒^{−𝛼𝑡} 𝑑𝑡

Putting 𝛼𝑡 = 𝑦, then 𝑑𝑡 = (1/𝛼) 𝑑𝑦, and the limits 𝑡 = 0 to 𝑡 = 𝑥 become 𝑦 = 0 to 𝑦 = 𝛼𝑥, so that

𝐹(𝑥) = (1/Γ(𝛾)) ∫_{0}^{𝛼𝑥} 𝑦^{𝛾−1} 𝑒^{−𝑦} 𝑑𝑦 , 𝑥 ≥ 0

Remarks:

The function 𝑓(𝑥) represents a probability density function since

∫_{−∞}^{∞} 𝑓(𝑥) 𝑑𝑥 = ∫_{−∞}^{0} 0 𝑑𝑥 + ∫_{0}^{∞} (𝛼^𝛾 𝑒^{−𝛼𝑥} 𝑥^{𝛾−1} / Γ(𝛾)) 𝑑𝑥

= (𝛼^𝛾/Γ(𝛾)) · (Γ(𝛾)/𝛼^𝛾) = 1

Mean of gamma distribution

As we know, Mean (𝜇) = 𝐸(𝑋) = ∫_{−∞}^{∞} 𝑥𝑓(𝑥) 𝑑𝑥 = ∫_{0}^{∞} 𝑥 · (𝛼^𝛾 𝑒^{−𝛼𝑥} 𝑥^{𝛾−1} / Γ(𝛾)) 𝑑𝑥

= (𝛼^𝛾/Γ(𝛾)) ∫_{0}^{∞} 𝑒^{−𝛼𝑥} 𝑥^{𝛾} 𝑑𝑥 = (𝛼^𝛾/Γ(𝛾)) · (Γ(𝛾+1)/𝛼^{𝛾+1}) = 𝛾/𝛼

Variance of gamma distribution

𝑉𝑎𝑟(𝑋) = 𝜎² = 𝐸[𝑋²] − [𝐸(𝑋)]² = ∫_{0}^{∞} 𝑥² · (𝛼^𝛾 𝑒^{−𝛼𝑥} 𝑥^{𝛾−1} / Γ(𝛾)) 𝑑𝑥 − (𝛾/𝛼)²

= (𝛼^𝛾/Γ(𝛾)) ∫_{0}^{∞} 𝑥^{𝛾+1} 𝑒^{−𝛼𝑥} 𝑑𝑥 − (𝛾/𝛼)²

Putting 𝛼𝑥 = 𝑡, therefore 𝑑𝑥 = (1/𝛼) 𝑑𝑡, we get

= (1/(𝛼² Γ(𝛾))) ∫_{0}^{∞} 𝑡^{𝛾+1} 𝑒^{−𝑡} 𝑑𝑡 − 𝛾²/𝛼² = Γ(𝛾+2)/(𝛼² Γ(𝛾)) − 𝛾²/𝛼² = 𝛾(𝛾+1)/𝛼² − 𝛾²/𝛼² = 𝛾/𝛼²

Moment generating function of gamma distribution



Moment generating function about the origin: 𝑀0(𝑡) = 𝐸(𝑒^{𝑡𝑥}) = ∫_{0}^{∞} 𝑒^{𝑡𝑥} 𝑓(𝑥) 𝑑𝑥

𝑀0(𝑡) = (𝛼^𝛾/Γ(𝛾)) ∫_{0}^{∞} 𝑒^{−𝛼𝑥} 𝑥^{𝛾−1} 𝑒^{𝑡𝑥} 𝑑𝑥 = (𝛼^𝛾/Γ(𝛾)) ∫_{0}^{∞} 𝑒^{−(𝛼−𝑡)𝑥} 𝑥^{𝛾−1} 𝑑𝑥

Putting (𝛼 − 𝑡)𝑥 = 𝑦, so that 𝑥 = 𝑦/(𝛼 − 𝑡) and 𝑑𝑥 = 𝑑𝑦/(𝛼 − 𝑡), we get

𝑀0(𝑡) = (𝛼^𝛾/Γ(𝛾)) ∫_{0}^{∞} (𝑦/(𝛼−𝑡))^{𝛾−1} 𝑒^{−𝑦} · 𝑑𝑦/(𝛼−𝑡)

𝑀0(𝑡) = (𝛼^𝛾 / ((𝛼−𝑡)^𝛾 Γ(𝛾))) ∫_{0}^{∞} 𝑦^{𝛾−1} 𝑒^{−𝑦} 𝑑𝑦 = (𝛼^𝛾 / ((𝛼−𝑡)^𝛾 Γ(𝛾))) · Γ(𝛾) = 𝛼^𝛾/(𝛼−𝑡)^𝛾 = (𝛼/(𝛼−𝑡))^𝛾 , 𝑡 < 𝛼

OR

𝑀0(𝑡) = (1 − 𝑡/𝛼)^{−𝛾} , 𝑡 < 𝛼
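Note:- The mean 𝛾/𝛼 and variance 𝛾/𝛼² can be recovered from this m.g.f. by differentiating at 𝑡 = 0; the sketch below is illustrative and assumes the third-party sympy package.

import sympy as sp

t = sp.symbols('t')
alpha, g = sp.symbols('alpha gamma', positive=True)

M = (1 - t / alpha) ** (-g)          # m.g.f. derived above

m1 = sp.diff(M, t).subs(t, 0)        # E[X]
m2 = sp.diff(M, t, 2).subs(t, 0)     # E[X^2]

print(sp.simplify(m1))               # gamma/alpha
print(sp.simplify(m2 - m1**2))       # gamma/alpha**2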

Geometric Distribution

If 𝑝 is the probability of success and 𝑘 is the number of failures preceding the first success, then the geometric distribution is

𝑝(𝑘) = 𝑞^𝑘 𝑝 , 𝑘 = 0, 1, 2, … , where 𝑞 = 1 − 𝑝

Obviously,

∑_{𝑘=0}^{∞} 𝑝(𝑘) = 𝑝 ∑_{𝑘=0}^{∞} 𝑞^𝑘 = 𝑝(1 + 𝑞 + 𝑞² + 𝑞³ + ⋯)

= 𝑝(1 − 𝑞)^{−1} = 𝑝/(1 − 𝑞) = 𝑝/𝑝 = 1

Mean of geometric distribution

Mean (𝜇) = ∑_{𝑘=0}^{∞} 𝑘 𝑝(𝑘) = ∑_{𝑘=0}^{∞} 𝑘 𝑞^𝑘 𝑝

= 0 + 1·𝑞𝑝 + 2·𝑞²𝑝 + 3·𝑞³𝑝 + ⋯

= 𝑞𝑝(1 + 2𝑞 + 3𝑞² + ⋯) = 𝑞𝑝(1 − 𝑞)^{−2}

= 𝑞𝑝/(1 − 𝑞)² = 𝑞𝑝/𝑝² = 𝑞/𝑝

Variance of geometric distribution

𝑉𝑎𝑟(𝜎²) = ∑_{𝑘=0}^{∞} 𝑘² 𝑝(𝑘) − 𝜇²

= ∑_{𝑘=0}^{∞} 𝑘² 𝑞^𝑘 𝑝 − 𝜇²

= [0 + 1²·𝑞𝑝 + 2²·𝑞²𝑝 + 3²·𝑞³𝑝 + ⋯] − (𝑞/𝑝)²

= 𝑞𝑝(1 + 𝑞)(1 − 𝑞)^{−3} − 𝑞²/𝑝² = 𝑞(1 + 𝑞)/𝑝² − 𝑞²/𝑝² = 𝑞/𝑝²

Moment generating function of geometric distribution

𝑀0(𝑡) = 𝐸(𝑒^{𝑡𝑘}) = ∑_{𝑘=0}^{∞} 𝑒^{𝑡𝑘} 𝑞^𝑘 𝑝

= ∑_{𝑘=0}^{∞} 𝑝(𝑞𝑒^𝑡)^𝑘 = 𝑝(1 + 𝑞𝑒^𝑡 + (𝑞𝑒^𝑡)² + (𝑞𝑒^𝑡)³ + ⋯)

= 𝑝(1 − 𝑞𝑒^𝑡)^{−1} = 𝑝/(1 − 𝑞𝑒^𝑡) , valid for 𝑞𝑒^𝑡 < 1
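Note:- The formulae 𝜇 = 𝑞/𝑝 and 𝜎² = 𝑞/𝑝² can also be checked by simulation. The sketch below is illustrative (standard library only; 𝑝 = 0.3 is an arbitrary choice).

import random
from statistics import fmean, pvariance

p = 0.3
q = 1 - p

def failures_before_first_success() -> int:
    k = 0
    while random.random() >= p:   # each trial succeeds with probability p
        k += 1
    return k

random.seed(1)
sample = [failures_before_first_success() for _ in range(200_000)]

print(fmean(sample), q / p)           # both approximately 2.333
print(pvariance(sample), q / p**2)    # both approximately 7.778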

Weibull distribution

A continuous random variable 𝑋 has a Weibull distribution if its probability density function is
defined as
𝑓(𝑥) = (𝛼/𝑐) 𝑥^{𝛼−1} 𝑒^{−𝑥^𝛼/𝑐} , 𝑥 > 0 , 𝑐 > 0 , 𝛼 > 0

where 𝑐 is a scale parameter and 𝛼 is a shape parameter.

Uniform (or Rectangular) distribution


A random variable 𝑋 is said to have a uniform distribution over the interval (𝑎, 𝑏), where −∞ < 𝑎 < 𝑏 < ∞, if its probability density function is defined as

𝑓(𝑥) = { 1/(𝑏−𝑎) , 𝑎 < 𝑥 < 𝑏 ; 0 , otherwise }

Mean of uniform distribution


Mean (𝜇) = ∫_{𝑎}^{𝑏} 𝑥 𝑓(𝑥) 𝑑𝑥 = ∫_{𝑎}^{𝑏} 𝑥/(𝑏−𝑎) 𝑑𝑥 = (1/(𝑏−𝑎)) [𝑥²/2]_{𝑎}^{𝑏} = (𝑎+𝑏)/2

Variance of uniform distribution

𝑉𝑎𝑟(𝜎²) = ∫_{𝑎}^{𝑏} 𝑥²/(𝑏−𝑎) 𝑑𝑥 − 𝜇² = (𝑎² + 𝑎𝑏 + 𝑏²)/3 − ((𝑎+𝑏)/2)² = (𝑏−𝑎)²/12

Moment generating function of uniform distribution


𝑀0(𝑡) = 𝐸(𝑒^{𝑡𝑥}) = ∫_{𝑎}^{𝑏} 𝑒^{𝑡𝑥}/(𝑏−𝑎) 𝑑𝑥 = (1/(𝑏−𝑎)) [𝑒^{𝑡𝑥}/𝑡]_{𝑎}^{𝑏} = (𝑒^{𝑡𝑏} − 𝑒^{𝑡𝑎})/(𝑡(𝑏−𝑎))

Example1. A random variable 𝑋 has a uniform distribution over (−3, 3). Find 𝑘 for which 𝑃(𝑋 > 𝑘) = 1/3. Also evaluate 𝑃(𝑋 < 2) and 𝑃(|𝑋 − 2| < 2).

Solution: We know that probability density function for uniform distribution is


𝑓(𝑥) = 1/(𝑏 − 𝑎) = 1/(3 − (−3)) = 1/6 , −3 < 𝑥 < 3

Therefore,
𝑃(𝑋 > 𝑘) = 1 − 𝑃(𝑋 ≤ 𝑘) = 1 − ∫_{−3}^{𝑘} 𝑓(𝑥) 𝑑𝑥 = 1 − ∫_{−3}^{𝑘} (1/6) 𝑑𝑥 = 1/2 − 𝑘/6

But we are given that 𝑃(𝑋 > 𝑘) = 1/3

Therefore,
1/2 − 𝑘/6 = 1/3 , which implies that 𝑘 = 1

Now
𝑃(𝑋 < 2) = ∫_{−3}^{2} 𝑓(𝑥) 𝑑𝑥 = ∫_{−3}^{2} (1/6) 𝑑𝑥 = 5/6

And
𝑃(|𝑋 − 2| < 2) = 𝑃(−2 < 𝑋 − 2 < 2) = 𝑃(0 < 𝑋 < 4) = ∫_{0}^{3} 𝑓(𝑥) 𝑑𝑥 = ∫_{0}^{3} (1/6) 𝑑𝑥 = 1/2 , since 𝑓(𝑥) = 0 for 𝑥 ≥ 3
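Note:- For a uniform density every probability is just (length of the interval inside the support) × 1/6, which the illustrative sketch below exploits (standard library only).

a, b = -3.0, 3.0
density = 1 / (b - a)    # f(x) = 1/6 on (-3, 3)

def prob(lo: float, hi: float) -> float:
    # probability of an interval = (length clipped to the support) * density
    lo, hi = max(lo, a), min(hi, b)
    return max(hi - lo, 0.0) * density

print(prob(1, b))     # P(X > 1)     = 1/3, so k = 1
print(prob(a, 2))     # P(X < 2)     = 5/6
print(prob(0, 4))     # P(0 < X < 4) = 1/2, since the support ends at 3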

Example2. A die is cast until a 6 appears. What is the probability that it must be cast more than 5 times?
Solution: Here the probability of getting a 6 is 𝑝 = 1/6. Therefore, 𝑞 = 1 − 1/6 = 5/6

If 𝑋 is the number of tosses required for the first success, then

𝑃(𝑋 = 𝑥) = 𝑞^{𝑥−1} 𝑝 for 𝑥 = 1, 2, 3, …

Therefore,

Required probability = 𝑃(𝑋 > 5) = 1 − 𝑃(𝑋 ≤ 5)

= 1 − ∑_{𝑥=1}^{5} (5/6)^{𝑥−1} (1/6) = 1 − (1/6)[1 + 5/6 + (5/6)² + (5/6)³ + (5/6)⁴] = (5/6)⁵.
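Note:- Numerically (5/6)⁵ ≈ 0.4019; the illustrative sketch below confirms that the complement sum gives the same value (standard library only).

p, q = 1 / 6, 5 / 6

direct = q ** 5    # the first five casts all fail to show a 6
complement = 1 - sum(q ** (x - 1) * p for x in range(1, 6))

print(round(direct, 4), round(complement, 4))   # 0.4019 0.4019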

Negative Binomial Distribution

This distribution gives the probability that the event occurs for the 𝑘th time on the 𝑟th trial (𝑟 ≥ 𝑘). If 𝑝 is the probability of occurrence of the event, then
𝑃(𝑘, 𝑟) = 𝐶(𝑟−1, 𝑘−1) 𝑝^𝑘 𝑞^{𝑟−𝑘}

It contains two parameters 𝑝(0 < 𝑝 < 1) and 𝑘 (a positive integer). If 𝑘 = 1, the negative
binomial distribution reduces to the geometric distribution.

Hypergeometric Distribution

Suppose a bag contains 𝑚 white and 𝑛 black balls. If 𝑟 balls are drawn one at a time (without replacement), then the probability that 𝑘 of them will be white is

𝑃(𝑘) = 𝐶(𝑚, 𝑘) 𝐶(𝑛, 𝑟−𝑘) / 𝐶(𝑚+𝑛, 𝑟) , 𝑘 = 0, 1, 2, … , 𝑟 , 𝑟 ≤ 𝑚 , 𝑟 ≤ 𝑛.

This is known as the hypergeometric distribution.

Further, ∑_{𝑘=0}^{𝑟} 𝑝(𝑘) = 1, since ∑_{𝑘=0}^{𝑟} 𝐶(𝑚, 𝑘) 𝐶(𝑛, 𝑟−𝑘) = 𝐶(𝑚+𝑛, 𝑟).

This can be proved by equating the coefficients of 𝑡^𝑟 in

(1 + 𝑡)𝑚 (𝑡 + 1)𝑛 = (1 + 𝑡)𝑚+𝑛 .
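Note:- The identity is easy to verify for particular values; the illustrative sketch below checks one case with math.comb (Python 3.8+).

from math import comb

m, n, r = 5, 7, 4   # arbitrary small values

lhs = sum(comb(m, k) * comb(n, r - k) for k in range(r + 1))
print(lhs, comb(m + n, r))   # 495 495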


MOMENTS

1. The 𝑟th moment of a variable 𝑥 about the mean 𝑥̅, usually denoted by 𝜇𝑟, is given by

𝝁𝒓 = (𝟏/𝑵) ∑ 𝒇𝒊 (𝒙𝒊 − 𝒙̅)^𝒓 , ∑ 𝒇𝒊 = 𝑵

2. The 𝑟th moment of a variable 𝑥 about any point 𝑎, usually denoted by 𝜇′𝑟, is given by

𝝁′𝒓 = (𝟏/𝑵) ∑ 𝒇𝒊 (𝒙𝒊 − 𝒂)^𝒓 , ∑ 𝒇𝒊 = 𝑵

3. Moment about mean:


Let 𝑥̅ be the arithmetic mean, then 𝜇𝑟 = (1/𝑁) ∑_{𝑖=1}^{𝑛} 𝑓𝑖 (𝑥𝑖 − 𝑥̅)^𝑟 , 𝑟 = 0, 1, 2, 3, …, where 𝑁 = ∑_{𝑖=1}^{𝑛} 𝑓𝑖

If 𝑟 = 0 , 𝜇0 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑥̅)⁰ = (1/𝑁) ∑ 𝑓𝑖 = 𝑁/𝑁 = 1

If 𝑟 = 1 , 𝜇1 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑥̅)¹ = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖 − 𝑥̅ (1/𝑁) ∑ 𝑓𝑖 = 𝑥̅ − 𝑥̅ = 0

If 𝑟 = 2 , 𝜇2 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑥̅)² = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖² − 2𝑥̅ (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖 + 𝑥̅² (1/𝑁) ∑ 𝑓𝑖

= (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖² − 2𝑥̅·𝑥̅ + 𝑥̅² = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖² − 𝑥̅² = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖² − ((1/𝑁) ∑ 𝑓𝑖 𝑥𝑖)²

= 𝜎² (the variance)

If 𝑟 = 3 , 𝜇3 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑥̅)³

If 𝑟 = 4 , 𝜇4 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑥̅)⁴

Moments about any number (Raw moments)


Let 𝑎 be any arbitrary number, then 𝜇′𝑟 = (1/𝑁) ∑_{𝑖=1}^{𝑛} 𝑓𝑖 (𝑥𝑖 − 𝑎)^𝑟 , 𝑟 = 0, 1, 2, …, where 𝑁 = ∑_{𝑖=1}^{𝑛} 𝑓𝑖

If 𝑟 = 0 , 𝜇′0 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑎)⁰ = (1/𝑁) ∑ 𝑓𝑖 = 𝑁/𝑁 = 1

If 𝑟 = 1 , 𝜇′1 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑎)¹ = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖 − 𝑎 (1/𝑁) ∑ 𝑓𝑖 = 𝑥̅ − 𝑎

If 𝑟 = 2 , 𝜇′2 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑎)²

If 𝑟 = 3 , 𝜇′3 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑎)³

If 𝑟 = 4 , 𝜇′4 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑎)⁴
Moments about origin
𝑣𝑟 = (1/𝑁) ∑_{𝑖=1}^{𝑛} 𝑓𝑖 𝑥𝑖^𝑟 , 𝑟 = 0, 1, 2, …, where 𝑁 = ∑_{𝑖=1}^{𝑛} 𝑓𝑖

If 𝑟 = 0 , 𝑣0 = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖⁰ = (1/𝑁) ∑ 𝑓𝑖 = 𝑁/𝑁 = 1

If 𝑟 = 1 , 𝑣1 = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖 = 𝑥̅

If 𝑟 = 2 , 𝑣2 = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖²

If 𝑟 = 3 , 𝑣3 = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖³

If 𝑟 = 4 , 𝑣4 = (1/𝑁) ∑ 𝑓𝑖 𝑥𝑖⁴

Relation between 𝝁𝒓 and 𝝁′𝒓 :

We have

𝜇𝑟 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑥̅)^𝑟 = (1/𝑁) ∑ 𝑓𝑖 ([𝑥𝑖 − 𝑎] − [𝑥̅ − 𝑎])^𝑟 = (1/𝑁) ∑ 𝑓𝑖 ([𝑥𝑖 − 𝑎] − 𝜇′1)^𝑟

On expanding by the Binomial theorem,

𝜇𝑟 = (1/𝑁) ∑ 𝑓𝑖 [(𝑥𝑖 − 𝑎)^𝑟 − 𝐶(𝑟, 1)(𝑥𝑖 − 𝑎)^{𝑟−1} 𝜇′1 + 𝐶(𝑟, 2)(𝑥𝑖 − 𝑎)^{𝑟−2} (𝜇′1)² − ⋯ + (−1)^𝑟 (𝜇′1)^𝑟]

𝜇𝑟 = 𝜇′𝑟 − 𝐶(𝑟, 1)𝜇′_{𝑟−1} 𝜇′1 + 𝐶(𝑟, 2)𝜇′_{𝑟−2} (𝜇′1)² − ⋯ + (−1)^𝑟 (𝜇′1)^𝑟

Putting 𝑟 = 2, 3, 4, …, we get

𝜇2 = 𝜇′2 − 2(𝜇′1)² + (𝜇′1)² = 𝜇′2 − (𝜇′1)² , because 𝜇′0 = 1

𝜇3 = 𝜇′3 − 3𝜇′2 𝜇′1 + 3(𝜇′1)³ − (𝜇′1)³ = 𝜇′3 − 3𝜇′2 𝜇′1 + 2(𝜇′1)³

𝜇4 = 𝜇′4 − 4𝜇′3 𝜇′1 + 6𝜇′2 (𝜇′1)² − 3(𝜇′1)⁴

Thus we have the following relations:

𝝁𝟏 = 𝟎 ; 𝝁𝟐 = 𝝁′𝟐 − (𝝁′𝟏)² ; 𝝁𝟑 = 𝝁′𝟑 − 𝟑𝝁′𝟐 𝝁′𝟏 + 𝟐(𝝁′𝟏)³ ;

𝝁𝟒 = 𝝁′𝟒 − 𝟒𝝁′𝟑 𝝁′𝟏 + 𝟔𝝁′𝟐 (𝝁′𝟏)² − 𝟑(𝝁′𝟏)⁴


Conversely,

𝜇′𝑟 = (1/𝑁) ∑ 𝑓𝑖 (𝑥𝑖 − 𝑎)^𝑟 = (1/𝑁) ∑ 𝑓𝑖 ([𝑥𝑖 − 𝑥̅] + [𝑥̅ − 𝑎])^𝑟

= (1/𝑁) ∑ [𝑓𝑖 (𝑥𝑖 − 𝑥̅)^𝑟 + 𝐶(𝑟, 1)𝑓𝑖 (𝑥𝑖 − 𝑥̅)^{𝑟−1} (𝑥̅ − 𝑎) + 𝐶(𝑟, 2)𝑓𝑖 (𝑥𝑖 − 𝑥̅)^{𝑟−2} (𝑥̅ − 𝑎)² + ⋯ + 𝐶(𝑟, 𝑟−1)𝑓𝑖 (𝑥𝑖 − 𝑥̅)¹ (𝑥̅ − 𝑎)^{𝑟−1} + 𝑓𝑖 (𝑥̅ − 𝑎)^𝑟]

𝜇′𝑟 = 𝜇𝑟 + 𝑟𝜇_{𝑟−1} 𝜇′1 + 𝐶(𝑟, 2)𝜇_{𝑟−2} (𝜇′1)² + ⋯ + 𝑟𝜇1 (𝜇′1)^{𝑟−1} + (𝜇′1)^𝑟 , where 𝜇′1 = 𝑥̅ − 𝑎

Putting 𝑟 = 1, 2, 3, 4, …, we get

If 𝑟 = 1 , 𝜇′1 = 𝜇1 + (𝑥̅ − 𝑎) = 𝑥̅ − 𝑎 , because 𝜇1 = 0

If 𝑟 = 2 , 𝜇′2 = 𝜇2 + 2𝜇1 𝜇′1 + (𝜇′1)² = 𝜇2 + (𝜇′1)²

If 𝑟 = 3 , 𝜇′3 = 𝜇3 + 3𝜇2 𝜇′1 + 3𝜇1 (𝜇′1)² + (𝜇′1)³ = 𝜇3 + 3𝜇2 𝜇′1 + (𝜇′1)³

If 𝑟 = 4 , 𝜇′4 = 𝜇4 + 4𝜇3 𝜇′1 + 6𝜇2 (𝜇′1)² + 4𝜇1 (𝜇′1)³ + (𝜇′1)⁴ = 𝜇4 + 4𝜇3 𝜇′1 + 6𝜇2 (𝜇′1)² + (𝜇′1)⁴

𝝁′𝟏 = 𝒙̅ − 𝒂 ; 𝝁′𝟐 = 𝝁𝟐 + (𝝁′𝟏)² ; 𝝁′𝟑 = 𝝁𝟑 + 𝟑𝝁𝟐 𝝁′𝟏 + (𝝁′𝟏)³ ;

𝝁′𝟒 = 𝝁𝟒 + 𝟒𝝁𝟑 𝝁′𝟏 + 𝟔𝝁𝟐 (𝝁′𝟏)² + (𝝁′𝟏)⁴

Relation between 𝑣𝑟 and 𝜇𝑟


𝑣𝑟 = (1/𝑁) ∑_{𝑖=1}^{𝑛} 𝑓𝑖 𝑥𝑖^𝑟 , 𝑟 = 0, 1, 2, …

𝑣𝑟 = (1/𝑁) ∑ 𝑓𝑖 ((𝑥𝑖 − 𝑎) + 𝑎)^𝑟 = (1/𝑁) ∑ 𝑓𝑖 [(𝑥𝑖 − 𝑎)^𝑟 + 𝐶(𝑟, 1)(𝑥𝑖 − 𝑎)^{𝑟−1} 𝑎 + ⋯ + 𝑎^𝑟]

On taking 𝑎 = 𝑥̅, the moments about 𝑎 become the central moments, and we get

𝑣𝑟 = 𝜇𝑟 + 𝐶(𝑟, 1)𝜇_{𝑟−1} 𝑥̅ + 𝐶(𝑟, 2)𝜇_{𝑟−2} 𝑥̅² + ⋯ + 𝑥̅^𝑟    (1)

Putting 𝑟 = 1, 2, 3, 4 in (1), we get

If 𝑟 = 1 , 𝑣1 = 𝜇1 + 𝜇0 𝑥̅ = 𝑥̅ , because 𝜇1 = 0 , 𝜇0 = 1

If 𝑟 = 2 , 𝑣2 = 𝜇2 + 𝐶(2, 1)𝜇1 𝑥̅ + 𝜇0 𝑥̅² = 𝜇2 + 𝑥̅²

If 𝑟 = 3 , 𝑣3 = 𝜇3 + 𝐶(3, 1)𝜇2 𝑥̅ + 𝐶(3, 2)𝜇1 𝑥̅² + 𝜇0 𝑥̅³ = 𝜇3 + 3𝜇2 𝑥̅ + 𝑥̅³

If 𝑟 = 4 , 𝑣4 = 𝜇4 + 𝐶(4, 1)𝜇3 𝑥̅ + 𝐶(4, 2)𝜇2 𝑥̅² + 𝐶(4, 3)𝜇1 𝑥̅³ + 𝜇0 𝑥̅⁴

= 𝜇4 + 4𝜇3 𝑥̅ + 6𝜇2 𝑥̅² + 𝑥̅⁴

Hence, 𝒗𝟏 = 𝒙̅ ; 𝒗𝟐 = 𝝁𝟐 + 𝒙̅² ; 𝒗𝟑 = 𝝁𝟑 + 𝟑𝝁𝟐 𝒙̅ + 𝒙̅³ ; 𝒗𝟒 = 𝝁𝟒 + 𝟒𝝁𝟑 𝒙̅ + 𝟔𝝁𝟐 𝒙̅² + 𝒙̅⁴
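Note:- These relations can be verified numerically for any frequency table; the illustrative sketch below (standard library only, with an arbitrary small table) computes both sides of each relation.

xs = [2, 4, 6, 8]    # values x_i
fs = [1, 3, 4, 2]    # frequencies f_i
N = sum(fs)

def v(r):            # r-th moment about the origin
    return sum(f * x**r for x, f in zip(xs, fs)) / N

xbar = v(1)

def mu(r):           # r-th moment about the mean
    return sum(f * (x - xbar)**r for x, f in zip(xs, fs)) / N

print(v(2), mu(2) + xbar**2)
print(v(3), mu(3) + 3*mu(2)*xbar + xbar**3)
print(v(4), mu(4) + 4*mu(3)*xbar + 6*mu(2)*xbar**2 + xbar**4)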
Moment generating function:

(1) The moment generating function (m.g.f.) is a function that generates moments. In the case
of discrete probability distributions it is defined as

𝑀𝑎(𝑡) = ∑ 𝑝𝑖 𝑒^{𝑡(𝑥𝑖 − 𝑎)} = 𝐸{𝑒^{𝑡(𝑥−𝑎)}}    (1)

Where 𝑀𝑎 (𝑡) is the moment generating function (m.g.f.) of the discrete probability distribution
of 𝑥 about the point 𝑎 and is a function of the parameter 𝑡.

Expanding the exponential in (1), we get

𝑀𝑎(𝑡) = ∑ 𝑝𝑖 [1 + 𝑡(𝑥𝑖 − 𝑎) + (𝑡²/2!)(𝑥𝑖 − 𝑎)² + ⋯ + (𝑡^𝑟/𝑟!)(𝑥𝑖 − 𝑎)^𝑟 + ⋯]

= ∑ 𝑝𝑖 + 𝑡 ∑ 𝑝𝑖 (𝑥𝑖 − 𝑎) + (𝑡²/2!) ∑ 𝑝𝑖 (𝑥𝑖 − 𝑎)² + ⋯ + (𝑡^𝑟/𝑟!) ∑ 𝑝𝑖 (𝑥𝑖 − 𝑎)^𝑟 + ⋯

= 1 + 𝑡 ∑ 𝑝𝑖 (𝑥𝑖 − 𝑎) + (𝑡²/2!) ∑ 𝑝𝑖 (𝑥𝑖 − 𝑎)² + ⋯ + (𝑡^𝑟/𝑟!) ∑ 𝑝𝑖 (𝑥𝑖 − 𝑎)^𝑟 + ⋯    (2)

OR

𝑀𝑎(𝑡) = 1 + 𝑡𝜇′1 + (𝑡²/2!) 𝜇′2 + ⋯ + (𝑡^𝑟/𝑟!) 𝜇′𝑟 + ⋯    (3)

From equation (3) we notice that 𝜇′𝑟 is the coefficient of 𝑡^𝑟/𝑟! in the expansion of 𝑀𝑎(𝑡). For this reason the function 𝑀𝑎(𝑡) is called the moment generating function (m.g.f.).

Alternatively, 𝜇′𝑟 can also be obtained by differentiating 𝑀𝑎(𝑡) 𝑟 times with respect to 𝑡 and putting 𝑡 = 0 in the differentiated result, i.e.

[𝑑^𝑟 𝑀𝑎(𝑡)/𝑑𝑡^𝑟]_{𝑡=0} = 𝜇′𝑟    (4)

Thus the moment about any point 𝑥 = 𝑎 can either be computed from equation (3) or more
easily from formula (4).

On rewriting equation (1), we get

𝑀𝑎(𝑡) = ∑ 𝑝𝑖 𝑒^{𝑡(𝑥𝑖 − 𝑎)} = ∑ 𝑝𝑖 𝑒^{𝑡𝑥𝑖} 𝑒^{−𝑎𝑡} = 𝑒^{−𝑎𝑡} ∑ 𝑝𝑖 𝑒^{𝑡𝑥𝑖}

OR

𝑀𝑎(𝑡) = 𝑒^{−𝑎𝑡} 𝑀0(𝑡)    (5)

Equation (5) shows that the m.g.f. about the point 𝑎 is 𝑒^{−𝑎𝑡} times the m.g.f. about the origin.
Note:- (1) The m.g.f. of the sum of two independent variables is the product of their m.g.fs.

(2) If 𝑓(𝑥) is the density function of a continuous variable 𝑋, then the moment generating function of this continuous probability distribution about 𝑥 = 𝑎 is defined as

𝑀𝑎(𝑡) = ∫_{−∞}^{∞} 𝑒^{𝑡(𝑥−𝑎)} 𝑓(𝑥) 𝑑𝑥.

Example1. Find the moment generating function of the discrete binomial distribution given by
𝑓(𝑥) = 𝐶(𝑛, 𝑥)𝑝 𝑥 𝑞 𝑛−𝑥 . Also, find the first and second moment about the mean and standard
deviation.

Solution: Here we have 𝑓(𝑥) = 𝐶(𝑛, 𝑥)𝑝 𝑥 𝑞 𝑛−𝑥

Moment generating function about the origin: 𝑀0(𝑡) = ∑ 𝑒^{𝑡𝑥} 𝑓(𝑥) = ∑ 𝑒^{𝑡𝑥} 𝐶(𝑛, 𝑥)𝑝^𝑥 𝑞^{𝑛−𝑥}

= ∑ 𝐶(𝑛, 𝑥)(𝑝𝑒^𝑡)^𝑥 𝑞^{𝑛−𝑥} = 𝑞^𝑛 + 𝐶(𝑛, 1)𝑞^{𝑛−1}(𝑝𝑒^𝑡) + 𝐶(𝑛, 2)𝑞^{𝑛−2}(𝑝𝑒^𝑡)² + ⋯ + (𝑝𝑒^𝑡)^𝑛

= (𝑞 + 𝑝𝑒^𝑡)^𝑛
𝑣1 = [𝑑𝑀0(𝑡)/𝑑𝑡]_{𝑡=0} = [𝑛(𝑞 + 𝑝𝑒^𝑡)^{𝑛−1} 𝑝𝑒^𝑡]_{𝑡=0} = 𝑛(𝑞 + 𝑝)^{𝑛−1} 𝑝 = 𝑛𝑝 , because 𝑞 + 𝑝 = 1.

𝑣2 = [𝑑²𝑀0(𝑡)/𝑑𝑡²]_{𝑡=0} = [𝑑/𝑑𝑡 {𝑛(𝑞 + 𝑝𝑒^𝑡)^{𝑛−1} 𝑝𝑒^𝑡}]_{𝑡=0}

= [𝑛(𝑛−1)(𝑞 + 𝑝𝑒^𝑡)^{𝑛−2}(𝑝𝑒^𝑡)² + 𝑛(𝑞 + 𝑝𝑒^𝑡)^{𝑛−1}(𝑝𝑒^𝑡)]_{𝑡=0}

= 𝑛(𝑛−1)(𝑞 + 𝑝)^{𝑛−2}𝑝² + 𝑛(𝑞 + 𝑝)^{𝑛−1}𝑝 = 𝑛(𝑛−1)𝑝² + 𝑛𝑝 = 𝑛𝑝[(𝑛−1)𝑝 + 1]

= 𝑛𝑝[𝑛𝑝 + (1 − 𝑝)] = 𝑛𝑝[𝑛𝑝 + 𝑞] = 𝑛²𝑝² + 𝑛𝑝𝑞

𝜇1 = 𝑥̅ = 𝑣1 = 𝑛𝑝

𝜇2 = 𝜇′2 − (𝑥̅ )2 = 𝑣2 − (𝑣1 )2 = (𝑛2 𝑝2 + 𝑛𝑝𝑞) − (𝑛𝑝)2 = 𝑛𝑝𝑞

S.D. = √𝑛𝑝𝑞 , Mean = 𝑛𝑝 .
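Note:- The two moments can also be recovered symbolically from 𝑀0(𝑡) = (𝑞 + 𝑝𝑒^𝑡)^𝑛; the sketch below is illustrative and assumes the third-party sympy package.

import sympy as sp

t = sp.symbols('t')
n, p = sp.symbols('n p', positive=True)
q = 1 - p

M = (q + p * sp.exp(t)) ** n      # m.g.f. found above

v1 = sp.diff(M, t).subs(t, 0)     # first raw moment
v2 = sp.diff(M, t, 2).subs(t, 0)  # second raw moment

print(sp.simplify(v1))            # n*p
print(sp.simplify(v2 - v1**2))    # n*p*(1 - p), i.e. npq (sympy may print an equivalent form)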

Example2. Find the moment generating function of the discrete distribution given by 𝑓(𝑥) = 𝑒^{−𝑚} 𝑚^𝑥 / 𝑥!. Also, find the first and second moments about the mean, and the variance.

Solution: Here we have 𝑓(𝑥) = 𝑒^{−𝑚} 𝑚^𝑥 / 𝑥!.

Moment generating function about the origin: 𝑀0(𝑡) = ∑ 𝑒^{𝑡𝑥} 𝑒^{−𝑚} 𝑚^𝑥/𝑥! = 𝑒^{−𝑚} ∑ (𝑚𝑒^𝑡)^𝑥/𝑥!

= 𝑒^{−𝑚} [1 + 𝑚𝑒^𝑡 + (𝑚𝑒^𝑡)²/2! + (𝑚𝑒^𝑡)³/3! + ⋯] = 𝑒^{−𝑚} 𝑒^{𝑚𝑒^𝑡} = 𝑒^{𝑚(𝑒^𝑡 − 1)}
2! 3!
𝑣1 = [𝑑𝑀0(𝑡)/𝑑𝑡]_{𝑡=0} = [𝑒^{𝑚(𝑒^𝑡 − 1)} 𝑚𝑒^𝑡]_{𝑡=0} = 𝑒^{𝑚(1−1)} · 𝑚 = 𝑚

𝑣2 = [𝑑²𝑀0(𝑡)/𝑑𝑡²]_{𝑡=0} = [𝑑/𝑑𝑡 {𝑒^{𝑚(𝑒^𝑡 − 1)} 𝑚𝑒^𝑡}]_{𝑡=0} = [𝑒^{𝑚(𝑒^𝑡 − 1)} (𝑚𝑒^𝑡)² + 𝑒^{𝑚(𝑒^𝑡 − 1)} 𝑚𝑒^𝑡]_{𝑡=0}

= 𝑒^{𝑚(1−1)} 𝑚² + 𝑒^{𝑚(1−1)} 𝑚 = 𝑚² + 𝑚

𝜇1 = 𝑥̅ = 𝑣1 = 𝑚

𝜇2 = 𝜇′2 − (𝑥̅ )2 = 𝑣2 − (𝑣1 )2 = 𝑚2 + 𝑚 − 𝑚2 = 𝑚

Hence Mean = 𝑚 and Variance = 𝑚 .

Example3. Find the moment generating function of the exponential distribution


𝑓(𝑥) = (1/𝑐) 𝑒^{−𝑥/𝑐} , 0 ≤ 𝑥 < ∞ , 𝑐 > 0. Hence, find its mean and standard deviation.

Solution: The moment generating function about the origin is


𝑀0(𝑡) = ∫_{0}^{∞} 𝑒^{𝑡𝑥} 𝑓(𝑥) 𝑑𝑥 = ∫_{0}^{∞} 𝑒^{𝑡𝑥} (1/𝑐) 𝑒^{−𝑥/𝑐} 𝑑𝑥 = (1/𝑐) ∫_{0}^{∞} 𝑒^{(𝑡 − 1/𝑐)𝑥} 𝑑𝑥 = (1/𝑐) [𝑒^{(𝑡 − 1/𝑐)𝑥}/(𝑡 − 1/𝑐)]_{0}^{∞}

= (1/𝑐) · (1/(𝑡 − 1/𝑐)) · [0 − 1] = 1/(1 − 𝑐𝑡) = (1 − 𝑐𝑡)^{−1} = 1 + 𝑐𝑡 + (𝑐𝑡)² + (𝑐𝑡)³ + ⋯ , for 𝑡 < 1/𝑐

Moment about origin


𝑣1 = [𝑑𝑀0(𝑡)/𝑑𝑡]_{𝑡=0} = [𝑑/𝑑𝑡 {1 + 𝑐𝑡 + (𝑐𝑡)² + (𝑐𝑡)³ + ⋯}]_{𝑡=0} = 𝑐

𝜇1 = 𝑥̅ = 𝑣1 = 𝑐.

𝑣2 = [𝑑²𝑀0(𝑡)/𝑑𝑡²]_{𝑡=0} = [𝑑/𝑑𝑡 {𝑐 + 2𝑐²𝑡 + 3𝑐³𝑡² + ⋯}]_{𝑡=0} = [2𝑐² + 6𝑐³𝑡 + ⋯]_{𝑡=0} = 2𝑐²

𝜇2 = 𝜇′2 − 𝑥̅² = 𝑣2 − (𝑣1)² = 2𝑐² − 𝑐² = 𝑐²

Standard deviation = √𝜇2 = √𝑐² = 𝑐.

Example4. Find the moment generating function of the continuous normal distribution given by

𝑓(𝑥) = (1/(𝜎√2𝜋)) 𝑒^{−(1/2)((𝑥−𝜇)/𝜎)²} ; −∞ < 𝑥 < ∞ .

Solution: Here we have 𝑓(𝑥) = (1/(𝜎√2𝜋)) 𝑒^{−(1/2)((𝑥−𝜇)/𝜎)²}
The moment generating function about the origin is

𝑀0(𝑡) = ∫_{−∞}^{∞} 𝑒^{𝑡𝑥} 𝑓(𝑥) 𝑑𝑥 = ∫_{−∞}^{∞} 𝑒^{𝑡𝑥} (1/(𝜎√2𝜋)) 𝑒^{−(1/2)((𝑥−𝜇)/𝜎)²} 𝑑𝑥    (1)

Putting (𝑥 − 𝜇)/𝜎 = 𝑧 so that 𝑑𝑥 = 𝜎𝑑𝑧 in (1), we get

𝑀0(𝑡) = (1/(𝜎√2𝜋)) ∫_{−∞}^{∞} 𝑒^{𝑡(𝜎𝑧 + 𝜇)} 𝑒^{−𝑧²/2} (𝜎 𝑑𝑧) = (𝑒^{𝜇𝑡}/√2𝜋) ∫_{−∞}^{∞} 𝑒^{𝑡𝜎𝑧 − 𝑧²/2} 𝑑𝑧

= (𝑒^{𝜇𝑡 + 𝜎²𝑡²/2}/√2𝜋) ∫_{−∞}^{∞} 𝑒^{𝑡𝜎𝑧 − 𝜎²𝑡²/2 − 𝑧²/2} 𝑑𝑧 = (𝑒^{𝜇𝑡 + 𝜎²𝑡²/2}/√2𝜋) ∫_{−∞}^{∞} 𝑒^{−(1/2)(𝑧² − 2𝑡𝜎𝑧 + 𝜎²𝑡²)} 𝑑𝑧

= (𝑒^{𝜇𝑡 + 𝜎²𝑡²/2}/√2𝜋) ∫_{−∞}^{∞} 𝑒^{−(1/2)(𝑧 − 𝑡𝜎)²} 𝑑𝑧 = (𝑒^{𝜇𝑡 + 𝜎²𝑡²/2}/√2𝜋) × √2𝜋 , because ∫_{−∞}^{∞} 𝑒^{−(1/2)(𝑧 − 𝑡𝜎)²} 𝑑𝑧 = √2𝜋

= 𝑒^{𝜇𝑡 + 𝜎²𝑡²/2}.
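Note:- The result can be confirmed by symbolic integration; the sketch below is illustrative and assumes the third-party sympy package (which can evaluate this Gaussian integral for real 𝑡).

import sympy as sp

x, t, mu = sp.symbols('x t mu', real=True)
sigma = sp.symbols('sigma', positive=True)

f = sp.exp(-((x - mu)**2) / (2 * sigma**2)) / (sigma * sp.sqrt(2 * sp.pi))

M = sp.integrate(sp.exp(t * x) * f, (x, -sp.oo, sp.oo))
print(sp.simplify(M))   # exp(mu*t + sigma**2*t**2/2)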
