Week 2 Notes
2. Week 2
Example 2.1. A box contains 3 red balls and 2 green balls. Balls of the same colour are assumed
to be identical. Draw a ball at random. If A denotes the event that the ball drawn is red, then
P(A) = 3/5.
Note 2.2. Consider a random experiment E with the sample space Ω being the set of all natural
numbers, i.e. Ω = {1, 2, 3, ⋯}. Then for any probability function/measure P on F = 2^Ω, we have
1 = P(Ω) = Σ_{n=1}^∞ p_n, where p_n := P({n}). Consequently, lim_n p_n = 0, and so the p_n's
cannot all be equal. Hence we cannot have natural numbers drawn at random, i.e. with all outcomes
equally likely. By a similar argument, the outcomes of a random experiment cannot be equally
likely if the sample space is countably infinite.
Remark 2.3. When multiple draws from a box are involved in a single trial of a random experiment,
then there are two broad categories of problems, viz. sampling with replacement and sampling
without replacement. In the first case, the outcome of each draw is returned to the box before the
next draw. In the second case, the outcome is removed from the possibilities in the next draw.
Following examples illustrate these concepts.
Example 2.4. Example 1.14, where we throw/roll a die thrice, is an example of sampling with
replacement. The cardinality of the sample space is 6 × 6 × 6 = 6³. If A denotes the event that all
the rolls result in an even number, then the number of outcomes favourable to A is 3 × 3 × 3 = 3³.
Thus, P(A) = 3³/6³ = 1/8.
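The count in Example 2.4 can be verified by brute-force enumeration; the following Python sketch (not part of the original notes) lists all equally likely outcomes of three rolls and counts the favourable ones.

```python
from itertools import product

# Enumerate all 6^3 equally likely outcomes of three die rolls and count
# those in which every roll is even.
outcomes = list(product(range(1, 7), repeat=3))
favourable = [w for w in outcomes if all(x % 2 == 0 for x in w)]
p = len(favourable) / len(outcomes)
print(len(outcomes), len(favourable), p)  # 216 27 0.125
```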
Example 2.5. Draw 2 cards at random from a standard deck of 52 cards. Here, the cardinality
of the sample space is C(52, 2). Since we are looking at the 2 cards in hand together, the order in
which they have been obtained does not matter. Consider the event A that both cards are from
the Club (♣) suit. Since a standard deck of cards contains 13 cards from the Club suit, we have
P(A) = C(13, 2)/C(52, 2). This is an example of sampling without replacement.
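The binomial-coefficient ratio above is easy to evaluate; a quick Python check (a supplement to the notes, not part of them) using the standard library:

```python
from math import comb

# P(both cards are Clubs) when 2 cards are drawn without replacement:
# C(13, 2) favourable hands out of C(52, 2) equally likely hands.
p_both_clubs = comb(13, 2) / comb(52, 2)
print(p_both_clubs)  # 78/1326 = 1/17 ≈ 0.0588
```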
Example 2.6 (Placing r balls in m bins). Fix two positive integers r and m. Suppose that there
are r labelled balls and m labelled bins/boxes/urns. Assume that each bin can hold all the balls,
if required. One by one, we put the balls into the bins 'at random'. Then, by letting ω_i be the
bin-number into which the i-th ball is placed, we can capture the full configuration by the vector
ω = (ω_1, …, ω_r). Let Ω be the set of all configurations. Therefore, Ω is the sample space of this
random experiment. The cardinality of Ω is m^r (since each ball may be placed in any one of the
m bins). Since the experiment has been performed at random, we have P({ω}) = p_ω = m^{−r},
∀ω ∈ Ω. We now consider the probabilities of the following events.
(a) Let A be the event that the r-th ball is placed in the first bin. Then A = {ω ∈ Ω : ω_r = 1}.
Here, balls numbered 1 to r − 1 can be placed in any of the m bins. Therefore, the number
of outcomes ω favourable to A is m^{r−1}. Hence, P(A) = m^{r−1}/m^r = 1/m.
(b) Let B be the event that the first bin is empty. Then B = {ω ∈ Ω : ω_i ≠ 1, ∀i = 1, 2, ⋯, r}.
Here, each ball can be placed in any of the remaining bins numbered 2 to m. Since there
are m − 1 choices for each ball, the number of outcomes ω favourable to B is (m − 1)^r.
Hence P(B) = (m − 1)^r/m^r.
(c) Consider r ≤ m and let C be the event that all the balls are placed in distinct bins, i.e.
no bin contains more than one ball (a bin may remain empty). Then C = {ω ∈ Ω : ω_i ≠
ω_j, ∀i ≠ j}. Here, we are choosing/sampling bins for each ball and the sampling is being
done without replacement. Hence, the number of outcomes ω favourable to C is mPr =
m(m − 1)⋯(m − r + 1). Hence P(C) = mPr/m^r = m(m − 1)⋯(m − r + 1)/m^r =
(m − 1)⋯(m − r + 1)/m^{r−1}.
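For small r and m, the three probabilities above can be confirmed by enumerating every configuration; the following Python sketch (an illustration, with r = 3 and m = 4 chosen arbitrarily) does exactly that.

```python
from itertools import product
from math import perm

# Brute-force check of Example 2.6 with r = 3 balls and m = 4 bins.
# A configuration is a vector (w_1, ..., w_r) of bin numbers, all m^r equally likely.
r, m = 3, 4
configs = list(product(range(1, m + 1), repeat=r))
p_A = sum(w[r - 1] == 1 for w in configs) / len(configs)           # r-th ball in bin 1
p_B = sum(all(x != 1 for x in w) for w in configs) / len(configs)  # bin 1 empty
p_C = sum(len(set(w)) == r for w in configs) / len(configs)        # all balls in distinct bins
print(p_A, p_B, p_C)  # 0.25 0.421875 0.375
```

The printed values agree with 1/m, (m − 1)^r/m^r = 27/64 and mPr/m^r = 24/64.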
Example 2.7 (Birthday Paradox). There are n people at a party. What is the chance that two of
them have the same birthday? Assume that none of them was born in a leap year and that all days
are equally likely to be the birthday of a person. The problem structure remains the same as in the
previous balls-in-bins problem, where the bins are labelled as 1, 2, …, 365 (days of the year), and
the balls are labelled as 1, 2, …, n (people). In the notation of the previous example, r = n and
m = 365. Here, we wish to find the probability of the event that at least two people share a
birthday, i.e. the complement C^c of the event C from Example 2.6(c). For n ≤ 365, we obtain
P(C^c) = 1 − P(C) = 1 − 365 × 364 × ⋯ × (365 − n + 1)/365^n.
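The birthday probability is easy to evaluate numerically; the following Python sketch (a supplement to the notes) computes the complement of the all-distinct probability and illustrates the famous threshold at n = 23.

```python
# Probability that at least two of n people share a birthday:
# complement of "all birthdays distinct", i.e. 1 - 365*364*...*(365-n+1)/365^n.
def p_shared_birthday(n: int) -> float:
    p_distinct = 1.0
    for k in range(n):
        p_distinct *= (365 - k) / 365
    return 1 - p_distinct

print(p_shared_birthday(23))  # ≈ 0.5073: already more likely than not with 23 people
```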
We now discuss a generalization of the Inclusion-Exclusion principle for two events discussed in
Proposition 1.37.
Proposition 2.8 (Inclusion-Exclusion Principle). Let (Ω, F, P) be a probability space and let
A_1, …, A_n be events. Then,
P(∪_{i=1}^n A_i) = S_{1,n} − S_{2,n} + S_{3,n} − ⋯ + (−1)^{n−1} S_{n,n},
where
S_{k,n} := Σ_{1 ≤ i_1 < i_2 < ⋯ < i_k ≤ n} P(A_{i_1} ∩ A_{i_2} ∩ ⋯ ∩ A_{i_k}).
Proof. For the case n = 2, the result has already been discussed in Proposition 1.37. We prove the
result for general n by an application of the principle of Mathematical Induction.
Suppose the result is true for n = 2, 3, ⋯, k. We want to establish the result for n = k + 1.
Using the result for n = 2, we have
P(∪_{i=1}^{k+1} A_i) = P((∪_{i=1}^k A_i) ∪ A_{k+1}) = P(∪_{i=1}^k A_i) + P(A_{k+1}) − P((∪_{i=1}^k A_i) ∩ A_{k+1}).
Consider
T_{j,k} := Σ_{1 ≤ i_1 < i_2 < ⋯ < i_j ≤ k} P(A_{i_1} ∩ A_{i_2} ∩ ⋯ ∩ A_{i_j} ∩ A_{k+1}), j = 1, 2, ⋯, k.
Applying the result for n = k to the union ∪_{i=1}^k (A_i ∩ A_{k+1}), we have
P((∪_{i=1}^k A_i) ∩ A_{k+1}) = Σ_{j=1}^k (−1)^{j−1} T_{j,k}.
Then,
P(∪_{i=1}^{k+1} A_i)
= (S_{1,k} + P(A_{k+1})) − (S_{2,k} + T_{1,k}) + (S_{3,k} + T_{2,k}) − ⋯ + (−1)^{k−1} (S_{k,k} + T_{k−1,k}) + (−1)^k T_{k,k}
= Σ_{j=1}^{k+1} (−1)^{j−1} S_{j,k+1},
since S_{1,k+1} = S_{1,k} + P(A_{k+1}), S_{j,k+1} = S_{j,k} + T_{j−1,k} for j = 2, ⋯, k, and S_{k+1,k+1} = T_{k,k}.
Hence the result is true for the case n = k + 1. Applying the principle of Mathematical Induction,
the result is true for any positive integer n.
Example 2.9. Consider placing r labelled balls in m labelled bins at random (Example 2.6). Let
E denote the event that at least one bin is empty. Now, for j = 1, 2, ⋯, m, consider the event E_j
that none of the balls is placed in the j-th bin, i.e. the j-th bin is empty. Then
E_j = {ω ∈ Ω : ω_i ≠ j, ∀i = 1, 2, ⋯, r}, ∀j = 1, 2, ⋯, m
and E = ∪_{j=1}^m E_j. Not all the bins can be empty and hence ∩_{j=1}^m E_j = ∅. For 1 ≤ k ≤ m − 1,
E_{j_1} ∩ E_{j_2} ∩ ⋯ ∩ E_{j_k} denotes the event that the bins numbered j_1 < j_2 < ⋯ < j_k are empty. Here,
each ball can be placed in any of the remaining m − k bins. Therefore,
P(E_{j_1} ∩ E_{j_2} ∩ ⋯ ∩ E_{j_k}) = (m − k)^r/m^r.
By the Inclusion-Exclusion Principle,
P(E) = P(∪_{j=1}^m E_j) = Σ_{k=1}^{m−1} (−1)^{k−1} C(m, k) (m − k)^r/m^r.
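The inclusion-exclusion formula for the empty-bin probability can be cross-checked against direct enumeration; the following Python sketch (with a few small (r, m) pairs chosen for illustration) compares the two computations.

```python
from itertools import product
from math import comb

# P(at least one bin empty) for r labelled balls in m labelled bins,
# by brute-force enumeration and by the inclusion-exclusion formula of Example 2.9.
def p_empty_brute(r: int, m: int) -> float:
    configs = list(product(range(1, m + 1), repeat=r))
    return sum(len(set(w)) < m for w in configs) / len(configs)

def p_empty_incl_excl(r: int, m: int) -> float:
    return sum((-1) ** (k - 1) * comb(m, k) * (m - k) ** r
               for k in range(1, m)) / m ** r

for r, m in [(3, 3), (4, 3), (5, 4)]:
    assert abs(p_empty_brute(r, m) - p_empty_incl_excl(r, m)) < 1e-12
```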
Remark 2.10 (Bonferroni's inequalities). In the notation of Proposition 2.8, it can be shown that
S_{1,n} − S_{2,n} ≤ P(∪_{i=1}^n A_i) ≤ S_{1,n}. More generally, for 1 ≤ m ≤ n,
P(∪_{i=1}^n A_i) ≤ S_{1,n} − S_{2,n} + ⋯ + S_{m,n} if m is odd,
P(∪_{i=1}^n A_i) ≥ S_{1,n} − S_{2,n} + ⋯ − S_{m,n} if m is even.
We do not discuss the proof. These inequalities are sometimes referred to as Bonferroni's inequal-
ities in the literature.
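The alternating bounds in Bonferroni's inequalities can be observed numerically; the following Python sketch (the sample space and events are arbitrary choices for illustration) checks that the odd partial sums overestimate and the even partial sums underestimate the union probability.

```python
from itertools import combinations

# Check Bonferroni's inequalities on a small probability space:
# Ω = {0, ..., 9} with equally likely points and a few overlapping events.
omega = set(range(10))
events = [{0, 1, 2, 3}, {2, 3, 4, 5}, {4, 5, 6, 7}, {1, 3, 5, 7, 9}]
n = len(events)

def prob(s):  # probability of a subset of Ω under the uniform measure
    return len(s) / len(omega)

# S[k-1] = S_{k,n}: sum of P of all k-fold intersections.
S = [sum(prob(set.intersection(*c)) for c in combinations(events, k))
     for k in range(1, n + 1)]
p_union = prob(set.union(*events))

partial = 0.0
for m, s in enumerate(S, start=1):
    partial += (-1) ** (m - 1) * s
    if m % 2 == 1:
        assert p_union <= partial + 1e-12   # odd truncations overestimate
    else:
        assert p_union >= partial - 1e-12   # even truncations underestimate
```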
Note 2.11. During the performance of a random experiment, if an event A is observed, then it
may also provide some information regarding other events. The next concept attempts to formalize
this information.
Definition 2.12 (Conditional Probability). Let (Ω, F, P) be a probability space and let A be an
event with P(A) > 0. For any event B, we define
P(B | A) := P(B ∩ A)/P(A)
to be the conditional probability of B given A.
Example 2.13. Consider rolling a fair die twice. The sample space is Ω = {(i, j) : i, j ∈
{1, 2, 3, 4, 5, 6}} with P({(i, j)}) = 1/36, ∀(i, j) ∈ Ω. Consider the events A = {(i, j) : i is odd}
and B = {(i, j) : i + j = 3} = {(1, 2), (2, 1)}. Here, the number of outcomes favourable to A is
3 × 6 = 18 and hence P(A) = 18/36 = 1/2 > 0. The event A ∩ B = {(1, 2)} and hence
P(A ∩ B) = 1/36. Then
P(B | A) = P(B ∩ A)/P(A) = (1/36)/(1/2) = 1/18.
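The conditional probability in Example 2.13 follows directly from counting; a short Python sketch (a supplement, not part of the notes) reproduces it.

```python
from itertools import product

# Example 2.13 by direct counting: two rolls of a fair die,
# A = "first roll odd", B = "sum equals 3".
omega = list(product(range(1, 7), repeat=2))
A = [(i, j) for (i, j) in omega if i % 2 == 1]
AB = [(i, j) for (i, j) in omega if i % 2 == 1 and i + j == 3]
# P(B | A) = P(B ∩ A) / P(A)
p_B_given_A = (len(AB) / len(omega)) / (len(A) / len(omega))
print(p_B_given_A)  # 1/18 ≈ 0.0556
```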
Note 2.14. We note some basic properties of conditional probability. If P(A) > 0, then
(a) P(Ω | A) = P(Ω ∩ A)/P(A) = P(A)/P(A) = 1.
(b) P(A | A) = 1.
(c) B ↦ P(B | A) defines a non-negative set function on F. In fact, it defines a probability
function/measure on F, as we now verify.
Proof. We verify the axioms in Definition 1.33. We have already observed P(Ω | A) = 1 and
non-negativity in the above note.
To establish the countable additivity, let {E_n}_n be a sequence of mutually exclusive events;
then so are {E_n ∩ A}_n. By the countable additivity of P, we have
P(∪_{n=1}^∞ E_n | A) = P((∪_{n=1}^∞ E_n) ∩ A)/P(A) = P(∪_{n=1}^∞ (E_n ∩ A))/P(A) = Σ_{n=1}^∞ P(E_n ∩ A)/P(A) = Σ_{n=1}^∞ P(E_n | A).
Note (Multiplication rule). If E_1, E_2, ⋯, E_n are events with P(E_1 ∩ E_2 ∩ ⋯ ∩ E_{n−1}) > 0, then
P(E_1 ∩ E_2 ∩ ⋯ ∩ E_n) = P(E_1) P(E_2 | E_1) P(E_3 | E_1 ∩ E_2) ⋯ P(E_n | E_1 ∩ ⋯ ∩ E_{n−1}).
Example 2.17. Suppose that an urn contains 3 red balls and 5 green balls. All balls of the
same colour are identical. Suppose 2 balls are drawn successively at random from the urn without
replacement. Let A and B denote the events that the first ball is red and the second ball is green,
respectively. Then P(A) = 3/8. If the event A has already happened, then at the time of the
second draw the urn contains 2 red balls and 5 green balls. As such, P(B | A) = 5/7. By the
Multiplication rule, the probability that the first ball drawn is red and the second ball drawn is
green is P(A ∩ B) = P(A) P(B | A) = 3/8 × 5/7 = 15/56.
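The Multiplication rule computation of Example 2.17 can be written out exactly with rational arithmetic; a brief Python sketch:

```python
from fractions import Fraction

# Example 2.17: urn with 3 red and 5 green balls, two draws without replacement.
# P(first red and second green) = P(A) P(B | A).
p_A = Fraction(3, 8)            # 3 red out of 8 balls on the first draw
p_B_given_A = Fraction(5, 7)    # 5 green among the 7 balls remaining after a red is drawn
p_AB = p_A * p_B_given_A
print(p_AB)  # 15/56
```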
Definition 2.18 (Exhaustive events). Let I be an indexing set. A collection of events {Ei : i ∈ I}
is said to be exhaustive if ∪i∈I Ei = Ω.
Theorem 2.19 (Theorem of Total Probability). Let I be a finite or countably infinite indexing
set. Let {E_i : i ∈ I} be a collection of mutually exclusive and exhaustive events in a probability
space (Ω, F, P) such that P(E_i) > 0, ∀i ∈ I. Then for any event E ∈ F,
P(E) = Σ_{i∈I} P(E_i) P(E | E_i).
Proof. Since the E_i's are exhaustive, we have P(∪_{i∈I} E_i) = P(Ω) = 1. Then P(E) = P(E ∩ (∪_{i∈I} E_i))
(see practice problem set 1). Since the E_i's are mutually exclusive, so are the E ∩ E_i's. Hence, by the
finite or countable additivity of P (depending on whether I is finite or countably infinite), we have
P(E) = P(E ∩ (∪_{i∈I} E_i)) = Σ_{i∈I} P(E ∩ E_i) = Σ_{i∈I} P(E_i) P(E | E_i).
Remark 2.20. The practice problem referred to in the above proof does not require that
∪_{i∈I} E_i = Ω; rather, we only need P(∪_{i∈I} E_i) = 1. In the hypothesis of the previous theorem, we
may therefore replace the exhaustiveness of the E_i's by the condition P(∪_{i∈I} E_i) = 1.
Example 2.21. Suppose we perform a random experiment with the following steps.
(a) Suppose that there are two urns. The first urn contains 3 red balls and 5 green balls. The
second urn contains 6 red balls and 3 green balls. All balls of the same colour are identical.
(b) Suppose a fair die is rolled and if the outcome is 1 or 6, then the first urn is chosen.
Otherwise, the second urn is chosen.
(c) Finally, 2 balls are drawn at random from the chosen urn.
We want to find the probability that both the balls drawn are red. Let E denote this event.
Suppose U_1 and U_2 denote the events that the first urn and the second urn is chosen, respectively.
Then the events U_i, i = 1, 2 are mutually exclusive and exhaustive. Moreover, P(U_1) = 2/6 = 1/3
and P(U_2) = 4/6 = 2/3. Further,
P(E | U_1) = C(3, 2)/C(8, 2) = 3/28, P(E | U_2) = C(6, 2)/C(9, 2) = 15/36 = 5/12.
Then the required probability can be computed as an application of the Theorem of Total Probability
as
P(E) = P(U_1) P(E | U_1) + P(U_2) P(E | U_2) = (1/3)(3/28) + (2/3)(5/12) = 1/28 + 5/18 = 79/252.
Theorem 2.22 (Bayes' Theorem). Let I be a finite or countably infinite indexing set. Let {E_i :
i ∈ I} be a collection of mutually exclusive and exhaustive events in a probability space (Ω, F, P)
such that P(E_i) > 0, ∀i. Then for any event E ∈ F with P(E) > 0, we have
P(E_j | E) = P(E_j) P(E | E_j) / Σ_{i∈I} P(E_i) P(E | E_i), ∀j ∈ I.
Proof. By the definition of conditional probability,
P(E_j | E) = P(E_j ∩ E)/P(E) = P(E_j) P(E | E_j) / Σ_{i∈I} P(E_i) P(E | E_i).
In the last step of the calculation above, we have used the Multiplication rule and the Theorem of
Total Probability.
Remark 2.23. As discussed in Remark 2.20, in the statement of Theorem 2.22, we may replace the
exhaustiveness of Ei ’s by the condition P(∪i∈I Ei ) = 1.
Remark 2.24. In the setup of Theorem 2.19 and Theorem 2.22, information about the 'standard'
events E_i may be known beforehand, and we want to understand the probability of occurrence of
an arbitrary event E, treated as an 'effect' caused by the E_i's. These two results, therefore, allow
us to understand/quantify the relationship between 'cause' and 'effect' in terms of conditional
probability.
Notation 2.25. In the setup of the Bayes’ Theorem, we shall refer to P(Ei ), i ∈ I as prior
probabilities and P(Ei | E), i ∈ I as posterior probabilities.
Example 2.26. Consider a rare disease X that affects one in a million people. A medical test is
used to test for the presence of the disease. The test is 99% accurate in the sense that if a person
does not have this disease, the chance that the test shows positive is 1%, and if the person has this
disease, the chance that the test shows negative is also 1%. Suppose a person is tested for the
disease and the test result is positive. Let A be the event that the person has the disease X. Let
B be the event that the test shows positive. The given data may be summarized as
P(A) = 10^{−6}, P(B | A^c) = 0.01, P(B^c | A) = 0.01.
Then
P(A^c) = 1 − P(A) = 1 − 10^{−6}, P(B | A) = 1 − P(B^c | A) = 0.99.
We are interested in the conditional probability that the person has the disease, given that the test
result is positive. Here, A and A^c are mutually exclusive and exhaustive. By Bayes' Theorem,
we have
P(A | B) = P(B | A) P(A) / (P(B | A) P(A) + P(B | A^c) P(A^c)) = 0.99 × 10^{−6} / (0.99 × 10^{−6} + 0.01 × (1 − 10^{−6})) ≈ 0.000099.
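The striking posterior probability in Example 2.26 is quickly confirmed numerically; a short Python sketch of the Bayes computation:

```python
# Bayes' Theorem for the rare-disease example: prevalence 10^-6, test 99% accurate.
p_A = 1e-6                 # P(person has the disease)
p_B_given_A = 0.99         # P(test positive | disease)
p_B_given_Ac = 0.01        # P(test positive | no disease)
p_A_given_B = (p_B_given_A * p_A) / (p_B_given_A * p_A + p_B_given_Ac * (1 - p_A))
print(p_A_given_B)  # ≈ 0.000099: even after a positive test, the disease remains very unlikely
```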
Definition 2.27 (Independence of Two events). Let A, B be events in a common probability space
(Ω, F, P). We say that A and B are independent if P(A ∩ B) = P(A)P(B).
Note 2.28. If A = ∅, then A is independent of any other event B. To see this, observe that
P(A ∩ B) = P(∅) = 0 = P(∅) P(B) = P(A) P(B).
Remark 2.29 (Independence and Mutual Exclusivity (Pairwise Disjointness) of two events).
Independence should not be confused with pairwise disjointness! If A and B are disjoint, then
P(A ∩ B) = P(∅) = 0, and hence A and B can be independent if and only if at least one of P(A)
or P(B) equals 0. If A and B are disjoint, then A ⊆ B^c and B ⊆ A^c. If we know that A has
occurred, then we immediately conclude that B did not occur. Independence is not to be expected
in such situations. On the other hand, if P(A) > 0, P(B) > 0 and A, B are independent, then
P(A ∩ B) > 0. In this situation, A and B cannot be mutually exclusive.
Remark 2.30 (Conditional probability and Independence of two events). If A, B are independent
and P(A) > 0, then P(B | A) = P(A ∩ B)/P(A) = P(B).
Example 2.31. Recall Example 2.13, where we considered rolling a standard six-sided die twice
at random. The sample space is Ω = {(i, j) : i, j ∈ {1, 2, 3, 4, 5, 6}} with P({(i, j)}) = 1/36,
∀(i, j) ∈ Ω. Consider the events A = {(i, j) : i is odd}, B = {(i, j) : i + j = 4} and
C = {(i, j) : j = 2}. Observe that
P(A) = 1/2, P(B) = 1/12, P(A ∩ B) = 1/18.
Here, A and B are not independent. Again,
P(A) = 1/2, P(C) = 1/6, P(A ∩ C) = 1/12.
Here, A and C are independent.
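Both independence checks in Example 2.31 can be carried out by direct counting; the following Python sketch does so.

```python
from itertools import product

# Independence checks for two rolls of a fair die (Example 2.31).
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    return sum(1 for w in omega if event(w)) / len(omega)

A = lambda w: w[0] % 2 == 1          # first roll odd
B = lambda w: w[0] + w[1] == 4       # sum equals 4
C = lambda w: w[1] == 2              # second roll is 2

p_AB = prob(lambda w: A(w) and B(w))
p_AC = prob(lambda w: A(w) and C(w))
print(p_AB == prob(A) * prob(B))  # False: A and B are not independent
print(p_AC == prob(A) * prob(C))  # True: A and C are independent
```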
Definition 2.32 (Mutual Independence of a collection of events).
(a) Events E_1, E_2, ⋯, E_n in a probability space (Ω, F, P) are said to be mutually independent
if for every subcollection {E_{i_1}, E_{i_2}, ⋯, E_{i_k}}, 2 ≤ k ≤ n, we have
P(E_{i_1} ∩ E_{i_2} ∩ ⋯ ∩ E_{i_k}) = P(E_{i_1}) P(E_{i_2}) ⋯ P(E_{i_k}).
(b) Let I be any indexing set and let {E_i : i ∈ I} be a collection of events in a probability space
(Ω, F, P). We say that this collection of events is mutually independent or equivalently, the
events are mutually independent, if all finite subcollections {E_{i_1}, E_{i_2}, ⋯, E_{i_k}} are mutually
independent.
Definition 2.33 (Pairwise Independence of a collection of events). Let I be any indexing set
and let {Ei : i ∈ I} be a collection of events in a probability space (Ω, F, P). We say that this
collection of events is pairwise independent or equivalently, the events are pairwise independent, if
for all distinct indices i and j, the events Ei and Ej are independent.
Note 2.34. To check that events E_1, E_2, ⋯, E_n are mutually independent, we need to check
C(n, 2) + C(n, 3) + ⋯ + C(n, n) = 2^n − n − 1 conditions. However, to check that these events are
pairwise independent, we need to check only C(n, 2) conditions.
Remark 2.35 (Mutual independence and Pairwise independence of a collection of events). If a collec-
tion of events is mutually independent, then by definition the events are also pairwise independent.
We consider an example to see that the converse need not be true. Consider a random experiment E
with sample space Ω = {1, 2, 3, 4} and event space F = 2^Ω. If the outcomes are equally likely, then
we have the probability function/measure P determined by the information P({ω}) = 1/4, ∀ω ∈ Ω.
Consider the events A = {1, 4}, B = {2, 4}, C = {3, 4}. Then P(A) = P(B) = P(C) = 1/2. More-
over, A ∩ B = B ∩ C = C ∩ A = {4} and hence P(A ∩ B) = P(B ∩ C) = P(C ∩ A) = 1/4.
Therefore, the collection of events {A, B, C} is pairwise independent. However, A ∩ B ∩ C = {4}
and hence P(A ∩ B ∩ C) = 1/4 ≠ 1/8 = P(A) P(B) P(C). Here, the collection {A, B, C} is not
mutually independent.
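The counterexample of Remark 2.35 is small enough to verify mechanically; the following Python sketch checks the pairwise conditions and the three-way condition with exact arithmetic.

```python
from fractions import Fraction

# Pairwise but not mutually independent: Ω = {1, 2, 3, 4} with equally likely
# outcomes, A = {1, 4}, B = {2, 4}, C = {3, 4} (Remark 2.35).
omega = {1, 2, 3, 4}
A, B, C = {1, 4}, {2, 4}, {3, 4}

def prob(s):
    return Fraction(len(s), len(omega))

pairwise = all(prob(X & Y) == prob(X) * prob(Y)
               for X, Y in [(A, B), (B, C), (A, C)])
mutual = prob(A & B & C) == prob(A) * prob(B) * prob(C)
print(pairwise, mutual)  # True False
```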
Notation 2.36. We say that a collection of events is independent or equivalently, some events are
independent to mean that the collection is (equivalently, the events are) mutually independent.
Remark 2.37. Consider random experiments where we roll a standard six-sided die thrice or draw
two balls from a bin/box/urn. In such experiments multiple draws/trials of the same operations
are being performed. In such situations, if the events depending purely on different trials are
independent, then we say that the trials are being performed independently. In Example 2.31, the
two throws/rolls of the die are independent. Here, the events A and C which depend on the first and
the second throws respectively, are independent. If we toss a fair coin twice independently, then the
sample space is Ω = {HH, HT, TH, TT} with P({HH}) = P({HT}) = P({TH}) = P({TT}) = 1/4.
Note 2.38 (Functions defined on sample spaces). While studying a random phenomenon within
the framework of a random experiment, in most situations we shall be interested in numerical
quantities associated with the outcomes. To elaborate, consider the following two examples.
(a) Consider the random experiment of tossing a coin once. As discussed earlier, the sample
space is Ω = {H, T}. Suppose we think of the occurrence of a head as winning a rupee and
the occurrence of a tail as losing a rupee. This information may be captured by a function
X : Ω → R given by
X(H) := 1, X(T) := −1.
(b) Consider the random experiment of tossing a coin until a head appears. The sample space
may be written as Ω = {H, TH, TTH, TTTH, ⋯}. If we are interested in the number
of tosses required to obtain the first head, then the information can be captured by the
function X : Ω → R given by
X(H) := 1, X(TH) := 2, X(TTH) := 3, and in general, X(ω) := the number of tosses in the outcome ω.
Now, we focus on analysis of such functions X defined on the sample space Ω of some random
experiment E.
Notation 2.39 (Pre-image of a set under a function). Let Ω be a non-empty set and let X : Ω → R
be a function. Given any subset A of R, we consider the subset X^{−1}(A) of Ω defined by
X^{−1}(A) := {ω ∈ Ω : X(ω) ∈ A}.
The set X^{−1}(A) shall be referred to as the pre-image of A under the function X.
Remark 2.40. In Notation 2.39, we do not know whether the function X is bijective. As such, we
cannot identify X −1 as the ‘inverse’ function of X. To avoid any confusion, treat X −1 (A) as one
symbol referring to the set as defined above and do not identify it as a combination of symbols
X −1 and A.