Module 1


Stat 123: Probability and Statistics Module 1

1 Introduction to Probability

Learning objectives: At the end of this module, the student should be able to:

1. solve problems using the counting principle, permutations and combinations,

2. illustrate random experiment, its sample space and events,

3. state the definitions of probability of an event,

4. find probabilities by applying permutations/combinations and the classical definition
of probability, and

5. solve problems in Computer Science involving conditional probability, Law of Total
Probability, Bayes Rule and Multiplication Rule.

Together with statistics, probability theory is a branch of mathematics that deals with
chance and uncertainty. Classical mathematical theory had been successful in describing
the world as a series of fixed and real observable events, yet before the seventeenth century
it was largely inadequate in coping with processes or experiments that involved uncertain
or random outcomes. Spurred initially by the mathematician’s desire to analyze gambling
games and later by the scientific analysis of mortality tables within the medical profession,
the theory of probability has been developed as a scientific tool dealing with chance.
Probability can also be described as something that concerns a numerical description of
how likely an event is to occur or how likely it is that a proposition is true. The higher the
probability of an event, the more likely it is that the event will happen. A simple example is
the tossing of a fair (unbiased) die. Since the die is fair, the six outcomes {1, 2, 3, 4, 5, 6} are
all equally probable, that is, the probability of 1 is equal to the probability of 2, of 3, and
so on. And since no other outcomes are possible, the probability of each outcome is 1/6 or
roughly 0.1667.
Nowadays, probability theory is recognized as one of the most interesting and also one
of the most useful areas of mathematics. It provides the basis for the science of statistical
inference through experimentation and data analysis – an area of crucial importance in an
increasingly quantitative world.
The following are some real-life examples of uncertainty where probability concepts
could be applied:

1. The stock market has several ups and downs, which may be caused by new contracts
being made, financial reports being released, and other events of this sort. Many turns
of stock prices remain unexplained. Clearly, nobody would have ever lost a cent in
stock trading had the market contained no uncertainty.

2. A launch of a space shuttle was postponed, possibly because of weather
conditions. Why did they not know of it in advance, when the event was scheduled?
Forecasting weather precisely, with no error, is not a solvable problem, again due to
uncertainty.

3. To support these words, a meteorologist predicts, say, a 60% chance of rain. Why
cannot she let us know exactly whether it will rain or not, so we’ll know whether or
not to take our umbrellas? Again, because of uncertainty: she cannot always
know the situation with future precipitation for sure.

4. Suppose a heavily favored home team unexpectedly lost to an outsider, and a young
tennis player won against expectations. The existence and popularity of totalizators, where
participants place bets on sports results, show that uncertainty enters sports, the results of
each game, and even the final standings.

5. We may also hear reports of traffic accidents, crimes, and convictions. Of course, if
that driver knew about the accident ahead of time, he would have stayed at home.

Uncertainty is a condition in which the situation cannot be predetermined or predicted for
sure with no error. Uncertainty exists in computer science, software engineering, in many
aspects of science, business and our everyday life. It is an objective reality, and one has to
be able to deal with it. We are forced to make decisions under uncertainty.
Here are some links to YouTube videos related to this section:

1. How Is Probability Used in Real Life?

2. Practical Applications of Probability

Exercise: Do the following and submit your output in the VLE on or before the due date.

1. List 10 situations involving uncertainty that happened to you in the past 5 days.

2. For each uncertainty listed, list all the possible outcomes.

3. Concepts in probability are also applied to several branches of computer science. Discuss
at least 2 applications of probability from 2 branches of computer science. Each discussion
should have at least 200 words.

4. From 3, choose 1 discussion of application of probability and look for 3 articles pub-
lished in peer-reviewed journals related to the discussion that you chose.


1.1 The Counting Principle

Learning objectives: At the end of this section, the student should be able to:

1. identify different principles of counting; and

2. solve word problems involving principles of counting.

Counting? You already know how to count or you wouldn’t be taking a college-level
math class, right? Well yes, but what we’ll really be investigating here are ways of counting
efficiently. When we get to the probability situations a bit later in this chapter we will need
to count some very large numbers, like the number of possible winning lottery tickets. One
way to do this would be to write down every possible set of numbers that might show up
on a lottery ticket, but believe me: you don’t want to do this. We will start, however, with
some more reasonable sorts of counting problems in order to develop the ideas that we will
soon need.
As an example, you are in a restaurant and you are given an option to choose between a
burger, a pizza, or a burrito and for beverage, you can choose between a soda or a glass of
iced tea. Given these options, how many possible meal combinations are there?
One way to solve this is to systematically list each possible meal:

burger and soda        burger and iced tea

pizza and soda         pizza and iced tea

burrito and soda       burrito and iced tea

Assuming that we did this systematically and that we neither missed nor listed any possibility
more than once, the answer would be 6. Thus you could go to the restaurant 6 nights in a
row and have a different meal each night.
Another way to solve this problem would be to list all the possibilities in a table:

            burger                 pizza                 burrito

soda        burger and soda        pizza and soda        burrito and soda

iced tea    burger and iced tea    pizza and iced tea    burrito and iced tea

In this manner, we are more certain that we did not miss anything. But either way, we
arrive at the same answer, which is 6 possible meal combinations. At this point, you might
have noticed already why we keep on getting 6. You guessed
right: it comes from 3 × 2, that is, 3 choices of food and 2 choices of beverage.

Definition 1.1 (Multiplication Principle). Suppose an experiment \( E_1 \) has \( n_1 \) possible
outcomes, \( E_2 \) has \( n_2 \), and so on up to \( E_k \), which has \( n_k \). Then the composite
experiment \( E_1 E_2 \cdots E_k \) will have \( n_1 \times n_2 \times \cdots \times n_k \) outcomes.

Relating the definition to the example, we can let \( E_1 \) be the choice of food with \( n_1 = 3 \)
outcomes and \( E_2 \) be the choice of beverage with \( n_2 = 2 \) outcomes. Then, the composite
experiment \( E_1 E_2 \) is the choice of a meal, with \( n_1 \times n_2 = 6 \) possible
outcomes.
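As a quick check of the Multiplication Principle, here is a minimal Python sketch that lists every meal from the restaurant example and compares the count with the product of the numbers of choices. The variable names are only illustrative and are not part of the module.

```python
from itertools import product
from math import prod

foods = ["burger", "pizza", "burrito"]   # experiment E1, n1 = 3 outcomes
beverages = ["soda", "iced tea"]         # experiment E2, n2 = 2 outcomes

# Every (food, beverage) pair is one outcome of the composite experiment E1 E2.
meals = list(product(foods, beverages))

# The Multiplication Principle predicts n1 * n2 outcomes.
predicted = prod([len(foods), len(beverages)])

for food, beverage in meals:
    print(f"{food} and {beverage}")
print(len(meals), predicted)
```

Listing the outcomes and multiplying the counts give the same number, which is the whole point of the principle: we can count without listing.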
Let’s have more examples!

1. The South Shore line runs from South Bend Airport to Randolph St. Station in
Chicago. There are 20 stations at which it stops along the line. How many one way
tickets could be printed, showing a point of departure and a destination? (Assume
you cannot depart and arrive at the same station.)

Answer: You can start at any of twenty stations. Once this is picked, you can pick
any of 19 destinations. The answer is 20 × 19 = 380. If you can get on and off at the
same station, the answer is 20 × 20 = 400.

2. You want to design a 30 minute workout. For the first 15 minutes, you will choose
an aerobic exercise from running, kickboxing, skipping or circuit training. For the
second 15 minutes, you will work on strength and/or balance, choosing from weight
training, TRX, Bosu, resistance bands or your core routine. How many such workouts
are possible?

Answer: There are 4 things you can do for your first 15 minutes. There are 5 things
that you can do for the next 15 minutes. The answer is 4 × 5 = 20.
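The two answers above can also be confirmed by brute-force enumeration. The sketch below is only an illustration: the stations are labelled 1 to 20 because the module does not name them.

```python
from itertools import permutations, product

stations = range(1, 21)  # 20 stations; the labels 1..20 are an assumption for illustration

# Tickets with different departure and destination: ordered pairs of distinct stations.
tickets_distinct = sum(1 for _ in permutations(stations, 2))

# If you may get on and off at the same station, every ordered pair is allowed.
tickets_any = sum(1 for _ in product(stations, repeat=2))

aerobic = ["running", "kickboxing", "skipping", "circuit training"]
strength = ["weight training", "TRX", "Bosu", "resistance bands", "core routine"]

# One aerobic exercise followed by one strength/balance exercise.
workouts = list(product(aerobic, strength))

print(tickets_distinct, tickets_any, len(workouts))
```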

Exercise: Use the tabular method to solve the following, and confirm your answers using the
Multiplication Principle.

1. Your closet contains 3 hats, 2 coats and 2 scarves. Assuming you are comfortable
with wearing any combination of hat, coat and scarf (and you need a hat, coat and
scarf today), how many different outfits could you select from your closet?

2. A multiple choice quiz has 10 questions each with 4 different possible answers. In how
many ways can one fill out the quiz?


1.1.1 Permutation

In this section we will develop an even faster way to solve some of the problems we have
already learned to solve by other means. Let’s start with a couple of examples.

1. How many different ways can the letters of the word MATH be rearranged to form a
four-letter code word?

Answer: This problem is a bit different. Instead of choosing one item from each of
several different categories, we are repeatedly choosing items from the same category
(the category is: the letters of the word MATH) and each time we choose an item we
do not replace it, so there is one fewer choice at the next stage: we have 4 choices for
the first letter (say we choose A), then 3 choices for the second (M, T and H; say we
choose H), then 2 choices for the next letter (M and T; say we choose M) and only one
choice at the last stage (T). Thus there are 4 × 3 × 2 × 1 = 24 ways to spell a code
word with the letters MATH.

This problem can also be answered by using what we call the “pigeonhole technique”.
In this technique, we start with n pigeonholes

__ × __ × ··· × __     (n pigeonholes)

and we fill each pigeonhole with the number of outcomes available for it, then
multiply these numbers together. Now let’s apply this technique to solve this problem; we
will arrive at the same answer. Initially, there will be 4 pigeonholes since it’s
explicitly stated that the code is four-lettered.

__ × __ × __ × __

Take note that the pigeonholes should be filled from left to right. For the first pigeonhole, we
still have 4 choices from the letters of MATH, hence, we have

4 × __ × __ × __.

Then for the second pigeonhole, we have to put 3 since there are only 3 remaining
choices, that is, a letter is already used up in the first pigeonhole. So we have

4 × 3 × __ × __.

Continuing this process, we’ll eventually come up with

4 × 3 × 2 × 1

which is equal to 24.

In this example, we actually needed to calculate n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1.
This calculation shows up often in mathematics; it is called the factorial of n and is denoted
by n!.

Definition 1.2 (Factorial). The factorial of a positive integer n is defined to be

n! := n × (n − 1) × (n − 2) × · · · × 3 × 2 × 1.

By convention, 0! = 1.

2. How many ways can five different door prizes be distributed among five people?

Answer: There are 5 choices of prize for the first person, 4 choices for the second, and
so on. The number of ways the prizes can be distributed will be

5! = 5 × 4 × 3 × 2 × 1 = 120

ways. Now we will consider some slightly different examples.

3. A charity benefit is attended by 25 people and three gift certificates are given away as
door prizes: one gift certificate is in the amount of $100, the second is worth $25 and
the third is worth $10. Assuming that no person receives more than one prize, how
many different ways can the three gift certificates be awarded?

Answer: Note that there are 3 prizes, to be won by 3 of the 25 people.
Hence, we will have 3 pigeonholes.

__ × __ × __

Let’s just assume that the first pigeonhole is for the number of people who could
possibly win the $10 prize, the second pigeonhole is for the $25 prize, and the third is
for the $100 prize. For the first draw, there are 25 possible people who might win the
$10 prize. For the second draw, there are 24 possible people left who might win the
$25 prize since the one who won the $10 prize is no longer eligible. Finally, there are only 23
people left who might possibly win the $100 prize. So, our pigeonholes will now look
like

25 × 24 × 23

and therefore, the number of ways in which the prizes could be awarded is 13800.

4. Eight sprinters have made it to the Olympic finals in the 100-meter race. In how many
different ways can the gold, silver and bronze medals be awarded?

Answer: We will use the same method as in Example 3. There are 8 choices for the gold
medal winner, 7 remaining choices for the silver, and 6 for the bronze, so there are
8 × 7 × 6 = 336 ways the three medals can be awarded to the 8 runners.
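The four examples above can be checked with a short Python sketch. The helper name `fill_pigeonholes` is made up for this illustration; it simply multiplies the shrinking number of choices, exactly as in the pigeonhole technique.

```python
from itertools import permutations
from math import factorial


def fill_pigeonholes(n, r):
    """Multiply r factors n, n - 1, n - 2, ... (r pigeonholes, no replacement)."""
    result = 1
    for used in range(r):
        result *= n - used  # one fewer choice at each succeeding stage
    return result


# Example 1: list every four-letter code word made from the letters of MATH.
code_words = ["".join(p) for p in permutations("MATH")]
print(len(code_words), factorial(4))

# Examples 2 to 4: door prizes for 5 people, gift certificates, Olympic medals.
print(fill_pigeonholes(5, 5), fill_pigeonholes(25, 3), fill_pigeonholes(8, 3))
```

Note that filling all n pigeonholes gives n!, while filling only r of them stops the product early.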

Note that in these preceding examples, the gift certificates and the Olympic medals were
awarded without replacement; that is, once we have chosen a winner of the first door prize
or the gold medal, they are not eligible for the other prizes. Thus, at each succeeding stage
of the solution there is one fewer choice (25, then 24, then 23 in the first example; 8, then 7,
then 6 in the second). Contrast this with the situation of a multiple choice test, where there
might be five possible answers – A, B, C, D or E – for each question on the test. There, every
answer remains available for each question, so the number of choices stays at five at every stage.
Note also that the order of selection was important in each example: for the three door
prizes, being chosen first means that you receive substantially more money; in the Olympics
example, coming in first means that you get the gold medal instead of the silver or bronze.
In each case, if we had chosen the same three people in a different order there might have
been a different person who received the $100 prize, or a different gold medalist. (Contrast
this with the situation where we might draw three names out of a hat to each receive a $10
gift certificate; in this case the order of selection is not important since each of the three
people receives the same prize. Situations where the order is not important will be discussed
in the next section.)
We can generalize the situation in the two examples above to any problem without
replacement where the order of selection is important.

Definition 1.3 (Permutation). If only r positions are to be filled from a selection of n
different objects where r ⩽ n, then the number of possible (ordered) arrangements is given
by

\[ {}_nP_r := \frac{n!}{(n-r)!}. \]

As an extension of the definition, if n = r, then (n − r)! = 0! = 1 and so

\[ {}_nP_r = n!. \]

Moreover, if repetition is allowed, then the number of arrangements is

\[ n^r. \]

Here, “repetition is allowed” means that an object already used in one pigeonhole
may be used again in another pigeonhole.
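Definition 1.3 translates almost directly into code. The function below is a minimal sketch with a made-up name, `arrangements`; Python's standard library already provides `math.perm` (Python 3.8 or later) for the case without repetition, and the sketch compares against it.

```python
from math import factorial, perm


def arrangements(n, r, repetition=False):
    """Number of ordered arrangements of r objects chosen from n different objects."""
    if repetition:
        return n ** r  # each of the r pigeonholes has all n choices
    return factorial(n) // factorial(n - r)  # n! / (n - r)!


print(arrangements(25, 3), perm(25, 3))      # three different gift certificates
print(arrangements(8, 3), perm(8, 3))        # gold, silver and bronze medals
print(arrangements(5, 5))                    # n = r gives n!
print(arrangements(5, 3, repetition=True))   # e.g. 3 questions with answers A to E (illustrative)
```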
Exercise:

1. How many different license plates are possible if LTO uses 2 upper case letters followed
by a 4-digit number if:

(a) leading zeros are permitted and digits/letters may be repeated?

(b) the digits and letters are not repeated and leading zeros are not allowed?

2. In how many ways can a president, a treasurer and a secretary be chosen from among
7 candidates?

3. A zip code contains 5 digits. How many different zip codes can be made with the digits
0–9 if no digit is used more than once and the first digit is not 0?


1.1.2 Combination

In the previous section we considered the situation where we chose r items out of n possibil-
ities without replacement and where the order of selection was important. We now consider
a similar situation in which the order of selection is not important. Let’s start with a simple
example.

1. A charity benefit is attended by 25 people at which three $50 gift certificates are given
away as door prizes. Assuming no person receives more than one prize, how many
different ways can the gift certificates be awarded?

Answer: There are 25 choices for the first person, 24 remaining choices for the second
person and 23 for the third, so there are 25 × 24 × 23 = 13,800 ways to choose three
people. Suppose for a moment that Abe is chosen first, Bea second and Cindy third;
this is one of the 13,800 possible outcomes. Another way to award the prizes would be
to choose Abe first, Cindy second and Bea third; this is another of the 13,800 possible
outcomes. But either way Abe, Bea and Cindy each get $50, so it doesn’t really matter
the order in which we select them. In how many different orders can Abe, Bea and
Cindy be selected? It turns out there are 6:

ABC ACB BAC BCA CAB CBA

How can we be sure that we have counted them all? We are really just choosing 3
people out of 3, so there are 3 × 2 × 1 = 6 ways to do this; we didn’t really need to list
them all, we can just use permutations!

So, out of the 13800 ways to select 3 people out of 25, six of them involve Abe, Bea
and Cindy. The same argument works for any other group of three people (say Abe,
Bea and David or Frank, Gloria and Hildy) so each three-person group is counted six
times. Thus the 13800 figure is six times too big. The number of distinct three-person
groups will be 13800/6 = 2300.


We can generalize the situation in this example above to any problem of choosing a
collection of items without replacement where the order of selection is not important.

Definition 1.4 (Combination). Now suppose the order of selection is not important. Then
the number of ways of choosing r objects from n objects (without replacement) is given by

\[ {}_nC_r := \frac{n!}{r!\,(n-r)!}. \]

The formula in the definition reads as “n taken r”, also denoted by \( \binom{n}{r} \),
which means that

\[ \binom{n}{r} = \frac{n!}{r!\,(n-r)!}. \]

To have a really clear distinction between permutation and combination, let’s have this
simple example. Suppose in a fish bowl there are 3 colored balls; red, blue, and yellow, and
you are asked to pick 2. In how many ways can you pick 2 colored balls from 3? Now it
depends on the context of the situation. It could be that (1) you pick one at a time or (2)
you pick 2 all at once. For (1), we use permutation because, for example, picking a red
ball first and yellow ball second is not the same as picking the yellow ball first and red ball
second. On the other hand, for (2), we use combination since you are picking 2 balls all at
once and the placement of the balls in your hands does not matter. Therefore, for (1), there
are \( {}_3P_2 = 6 \) ways and for (2), there are \( {}_3C_2 = 3 \) ways.
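To see the difference between the two counts concretely, the following Python sketch lists the ordered and unordered picks of 2 balls from 3, and then repeats the "divide by the number of orders" argument for the three $50 gift certificates. It relies only on the standard `itertools` and `math` modules.

```python
from itertools import combinations, permutations
from math import comb, factorial, perm

balls = ["red", "blue", "yellow"]

# (1) Picking one at a time: order matters, so (red, yellow) differs from (yellow, red).
ordered_picks = list(permutations(balls, 2))

# (2) Picking 2 at once: order does not matter.
unordered_picks = list(combinations(balls, 2))

print(ordered_picks)
print(unordered_picks)

# Each group of 3 winners is counted 3! = 6 times among the ordered selections.
groups_of_three = perm(25, 3) // factorial(3)
print(groups_of_three, comb(25, 3))
```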
Let’s have more examples!
Example:


1. A group of four students is to be chosen from a 35-member class to represent the class
on the student council. How many ways can this be done?

Answer: Since we are choosing 4 people out of 35 without replacement where the order
of selection is not important, there are \( {}_{35}C_4 = \frac{35!}{(35-4)!\,4!} = 52360 \) combinations.

2. The United States Senate Appropriations Committee consists of 29 members, 15 Republicans
and 14 Democrats. The Defense Subcommittee consists of 19 members,
10 Republicans and 9 Democrats. How many different ways can the members of the
Defense Subcommittee be chosen from among the 29 Senators on the Appropriations
Committee?

Answer: In this case we need to choose 10 of the 15 Republicans and 9 of the 14
Democrats. There are \( {}_{15}C_{10} = 3003 \) ways to choose the 10 Republicans and \( {}_{14}C_9 = 2002 \)
ways to choose the 9 Democrats. But now what? How do we finish the problem?

Suppose we listed all of the possible 10-member Republican groups on 3003 slips of red
paper and all of the possible 9-member Democratic groups on 2002 slips of blue paper.
How many ways can we choose one red slip and one blue slip? This is a job for the
Multiplication Principle. We are simply making one choice from the first category and
one choice from the second category, just like in the restaurant menu problems from
earlier.

There must be 3003 × 2002 = 6012006 possible ways of selecting the members of the
Defense Subcommittee.

3. A box of candies contains 52 identically shaped mints, of which 19 are white, 10 are
brown, 7 are pink, 3 are purple, 5 are yellow, 2 are orange, and 6 are green. If 9 candies
are selected randomly without replacement, in how many ways can you select:

(a) 3 white?

Answer: Note that out of the 9 that will be chosen, exactly 3 are white and it
does not matter what the other 6 are, as long as they are not white. By the Multiplication
Principle, we have to
multiply the number of ways 3 white mints can be chosen from the 19 white mints
and the number of ways 6 mints can be chosen from the other 52 − 19 = 33 mints. Hence, there
are

\[ {}_{19}C_3 \times {}_{33}C_6 = 1073233392 \]

ways of choosing 3 white mints.

(b) 3 white, 2 brown, 1 pink, 1 yellow, and 2 green?

Answer: Using the same formulation above, we get

\[ {}_{19}C_3 \times {}_{10}C_2 \times {}_7C_1 \times {}_5C_1 \times {}_6C_2 = 22892625. \]
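The subcommittee and candy counts combine combinations with the Multiplication Principle, so they are easy to verify numerically. In the sketch below, `ways_to_select` is a hypothetical helper written for this illustration; `math.comb` and `math.prod` are standard (Python 3.8 or later).

```python
from math import comb, prod

# Example 2: choose 10 of 15 Republicans and 9 of 14 Democrats.
subcommittees = comb(15, 10) * comb(14, 9)

# Example 3: the box of 52 mints, grouped by color.
mints = {"white": 19, "brown": 10, "pink": 7, "purple": 3,
         "yellow": 5, "orange": 2, "green": 6}


def ways_to_select(wanted):
    """Multiply the number of ways to choose the wanted count within each color."""
    return prod(comb(mints[color], count) for color, count in wanted.items())


# (a) exactly 3 white mints, and any 6 of the 33 non-white mints.
three_white = comb(mints["white"], 3) * comb(sum(mints.values()) - mints["white"], 6)

# (b) 3 white, 2 brown, 1 pink, 1 yellow and 2 green.
part_b = ways_to_select({"white": 3, "brown": 2, "pink": 1, "yellow": 1, "green": 2})

print(subcommittees, three_white, part_b)
```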

In our previous examples, we have been finding the number of ways of taking r
objects from n without replacement. What if we want to know the number
of ways we can pick r objects from n with replacement? How does this even make
sense? To start, I know you already know the difference between finite and infinite, right?
In combination without replacement, we actually assume that the hypothetical “container”
from which the objects come is finite, which means that the number of objects in
that container is known. From the previous example (3), the number of mints in a box is
known, which is 52. From the previous example (2), it is known that there are 29 members
of the United States Senate Appropriations Committee, and so on. So from the context of
(3), although mints are grouped by colors, we cannot consider a white mint to be the same
as another white mint from the group of 19 white mints, so each white mint, though the same
in color, is considered to be different. Hence, the concept of without replacement.
How about the concept of combination with replacement? As an example, suppose in a
bakery, you are presented a promotion where you can buy 2 flavored muffins at a discounted
price. Now, let’s say there are 5 different flavors of muffin that the bakery offers. The
thing is, you really don’t know how many muffins are being baked in that bakery. So we
just consider the indefinite (infinite) number of blueberry muffins as a group of blueberry
muffins. Meaning, if I pick a blueberry muffin, and then pick another one, I’ll just consider
them to be “the same” since they both come from the group of blueberry muffins.
Let’s have some examples.
Let’s have some examples.

1. Let us say there are five flavors of ice cream: banana, chocolate, lemon, strawberry,
and vanilla. We can have three scoops. How many variations will there be?

Answer: Here, the number of scoops for each flavor is indefinite. All we know is we
can have 3 scoops given that there are 5 flavors. For instance, if you want to have 3
scoops of strawberry ice cream, the first scoop is just considered to be the same as the
other 2 scoops. Let us denote the flavors by

b c l s v

and imagine you are ordering the 3 scoops from a robot and those are buttons. Then,
the robot will have a series of actions consisting of ➔ and ●, where ➔ means move
to the next button based on the arrangement of the buttons and ● means scoop. For
example, if you want to order 2 scoops of chocolate and 1 scoop of strawberry, then it
will take 1 move from the b button to have 2 scoops of chocolate and 2 moves from
c to get a scoop of strawberry. One last move takes the robot to the final button, v.
Therefore, the command of the robot is

➔●●➔➔●➔

or if you want to order 3 scoops of lemon, then the command is given by

➔➔●●●➔➔

Notice how there are 7 positions, of which 4 are ➔ and 3 are ●. Now the question is,
in how many ways can we place three ● in 7 available positions? Since the order
of the ● does not matter, there are \( {}_7C_3 = 35 \) ways of choosing three scoops from 5
available flavors.


The next task is to generalize this. So given the quantities r = 3 and n = 5, how do
we arrive at 7? Spoiler alert, it’s just r + n − 1: the r scoops plus the n − 1 moves
between the n buttons. Therefore, in combination terms, it is
r + n − 1 taken r. Hence, our formula for combination with replacement is

\[
\begin{aligned}
\binom{r+n-1}{r} &= \frac{(r+n-1)!}{r!\,(r+n-1-r)!} \\
&= \frac{(r+n-1)!}{r!\,(n-1)!}.
\end{aligned}
\]

For the first line, we simply use Definition 1.4, and for the second line, the r and −r
in the denominator cancel out.

Definition 1.5 (Combination with replacement). Now suppose the order of selection
is not important. Then the number of ways of choosing r objects from n kinds of objects
with replacement is given by

\[ \frac{(r+n-1)!}{r!\,(n-1)!}. \]

Using the formula in Definition 1.5 to answer the problem, we get

\[ \frac{(3+5-1)!}{3!\,(5-1)!} = 35. \]

2. In how many ways can you choose 2 sodas from Pepsi, Coca-Cola, and Dr. Pepper?

Answer:

\[ \frac{(2+3-1)!}{2!\,(3-1)!} = 6 \text{ ways.} \]
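The robot picture can be reproduced in a few lines of Python. This is only a sketch: `multichoose` and `robot_command` are names invented here, and the button order b, c, l, s, v is taken from the ice cream example. The standard function `itertools.combinations_with_replacement` lists the selections themselves.

```python
from collections import Counter
from itertools import combinations_with_replacement
from math import comb

FLAVORS = ["banana", "chocolate", "lemon", "strawberry", "vanilla"]  # buttons b c l s v


def multichoose(n, r):
    """Number of ways to choose r objects from n kinds with replacement."""
    return comb(r + n - 1, r)


def robot_command(order, flavors=FLAVORS):
    """Translate an order into moves (➔) between buttons and scoops (●)."""
    counts = Counter(order)
    scoops_per_button = ["●" * counts[flavor] for flavor in flavors]
    return "➔".join(scoops_per_button)  # n - 1 moves separate the n buttons


orders = list(combinations_with_replacement(FLAVORS, 3))
commands = {robot_command(order) for order in orders}

print(robot_command(["chocolate", "chocolate", "strawberry"]))
print(robot_command(["lemon"] * 3))
print(len(orders), len(commands), multichoose(5, 3))
print(multichoose(3, 2))  # the soda example
```

Every order corresponds to exactly one command of 4 moves and 3 scoops, which is why counting commands counts orders.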

Exercise:

1. You just got a free ticket for a boat ride, and you can bring along 2 friends! Unfor-
tunately, you have 5 friends who want to come along. How many different groups of
friends could you take with you?


2. On a circle there are 9 points selected. How many triangles with vertices at these points
exist?

3. A teacher has prepared 20 arithmetic tasks and 30 geometry tasks. For a test he’d
like to use:

(a) 3 arithmetic and 2 geometry tasks or

(b) 1 arithmetic and 2 geometry tasks.

How many ways are there to build the test?

4. A bridge hand is defined to be a set of 13 cards from an ordinary deck. Suppose you
draw 13 cards without replacement, in how many ways can you have a bridge hand if

(a) 6 are royalty?

(b) 5 are red and 3 are clubs?

5. 16 teams enter a competition. They are divided up into four Pools (A, B, C and D)
of four teams each. Every team plays one match against the other teams in its Pool.
After the Pool matches are completed:

• the winner of Pool A plays the second placed team of Pool B

• the winner of Pool B plays the second placed team of Pool A

• the winner of Pool C plays the second placed team of Pool D

• the winner of Pool D plays the second placed team of Pool C

The winners of these four matches then play semi-finals, and the winners of the semi-
finals play in the final. How many matches are played altogether?

To know more about counting principles, you can check out the following YouTube links:

1. The Fundamental Counting Principle


2. Permutation Formula

3. Combination Formula


1.2 The Building Blocks of the Probability Structure

Learning objectives: At the end of this section, the student should be able to:

1. state clearly the definition of a set;

2. solve problems involving set operations; and

3. differentiate between outcomes, events, and sample space.

In this section, we are going to begin with the idea of sets and how we operate on sets.
The idea of sets will transition to the concept of events, outcomes, and sample space. These
are key terms that we need to understand in studying probability.

1.2.1 Sets

It is safe to say that it is inherent for us human beings to have a collection of some sort. I
want you to pause for a while and think of something that you may have collected before or
maybe are still collecting now. Whatever that collection is, we always think consciously
that the collection should have a common characteristic or a set of characteristics. I don’t
think that we ever collect things randomly and put them all together in one place. Well,
actually, there is one such collection: a pile of rubbish put in a trash bin.
Sets are one of the most fundamental concepts in mathematics. Developed at the end of
the 19th century, set theory is now a ubiquitous part of mathematics, and can be used as a
foundation from which nearly all of mathematics can be derived. In mathematics education,
elementary topics such as Venn diagrams are taught at a young age, while more advanced
concepts are taught as part of a university degree.

Definition 1.6 (Set). A set is a well-defined collection of distinct objects.

When we say well-defined, it simply means that you will have no difficulty in describing
the collection as a whole, and by distinct objects, we mean that no two objects are the same
despite them belonging to one group.

The objects that make up a set are called elements or members. For example, let’s have
a set of primary colors. Then we can say that red is a member of the set of primary colors.
Next, sets are conventionally denoted by capital Latin or Greek letters, although Latin
letters are mostly preferred. For instance, we can denote by P the set of primary colors.
Now what if I have a set C that contains red, blue, and yellow? Isn’t it just the same
as P? Yes! They’re the same set. Hence, we call P and C equal sets.

Definition 1.7 (Equal sets). Sets A and B are said to be equal if and only if they have
precisely the same elements.

There are two ways in which we can describe sets: intensionally or extensionally. We
describe a set intensionally if we write an accurate semantic description of the set.
On the other hand, a set is described extensionally if we list all the elements of the set,
separated by commas, and enclosed in curly braces. Below is an illustration of how we can
describe a set.
Intensional: P is the set containing the primary colors
Extensional: P = {red, blue, yellow}
In our example, we know that yellow is an element or a member of the set P. In math,
we use the symbol ∈ to denote membership. Hence, we write yellow ∈ P. Otherwise, if an
object is not a member of a set, we use the symbol ∉. For instance, purple ∉ P.

Definition 1.8. If B is a set and x is one of the objects of B, then this is denoted by x ∈ B,
and is read as “x is an element of B”. Otherwise, if x is not an element of B, then we denote
it by x ∉ B.
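Python's built-in `set` type behaves much like the sets described so far, so it makes a convenient playground. The sketch below reuses the primary colors example; writing a duplicate and a different order in `C` is deliberate, to show that neither matters.

```python
# Extensional description: list the elements inside curly braces.
P = {"red", "blue", "yellow"}

# Same elements, different order, one duplicate: duplicates collapse in a set.
C = {"yellow", "red", "blue", "red"}

print(P == C)            # equal sets: precisely the same elements
print(len(C))            # the repeated "red" is counted once
print("yellow" in P)     # membership, yellow ∈ P
print("purple" not in P) # non-membership, purple ∉ P
```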

Now that you already know how to describe sets, let’s get to know some special sets that
appear throughout mathematics.

1. Universal set Ω. A universal set is simply the set that contains all the objects
under consideration in a given context.


2. Empty set ∅. If there is a set that contains everything, then there’s also a set that
contains nothing. An empty set, also called the null set, is a set that contains no
elements. In other references, an empty set is denoted by {}.

3. Set of real numbers R. The set of real numbers is the set that contains all of the
numbers from negative infinity to positive infinity. It is often denoted by the open
interval (−∞, ∞).

4. Set of natural numbers N. Often called the set of counting numbers, the set of
natural numbers is the set whose elements are {1, 2, 3, . . . }.

5. Set of integers Z. When we add 0 and the negatives of the natural
numbers to the set of natural numbers, we get the set of integers, whose
elements are {. . . , −3, −2, −1, 0, 1, 2, 3, . . . }.

6. Set of rational numbers Q. The set of rational numbers is the set of all p/q such that p
and q are integers and q ≠ 0. We write the set as follows:

\[ \mathbb{Q} = \left\{ \frac{p}{q} \;\middle|\; p, q \in \mathbb{Z},\ q \neq 0 \right\}. \]

Real numbers that do not belong to Q are called irrational numbers. For example, the
numbers π, √2, e can never be expressed as a ratio of two integers, hence they
are irrational.

From the examples of special sets given above, notice how one set is “contained” in
another set. For instance, N is contained in Z, that is, all elements of N are found in Z.
Then we say that N is a subset of Z.

Definition 1.9. If all of the elements of A are found in B, then we say that A is a subset
of B.

More specifically, all of the elements of N are found in Z, but not all elements in Z are
found in N. Then we say that N is a proper subset of Z.

Definition 1.10. If all elements of A are found in B, but not all elements of B are found
in A, then we say that A is a proper subset of B, denoted by A ⊂ B.

Note, however, that any set is a subset of itself. For instance, we can say that N is a
subset of N since all elements of N are found in N, so Definition 1.9 is not violated. In
this case, we say that N is an improper subset of N.

Definition 1.11. If a subset A has all the elements of the original set B, that is, A = B,
then we say that A is an improper subset of B. In general, A ⊆ B denotes that A is a
subset of B, whether proper or improper.

Let’s have a proposition about the concept of subsets which may be obvious but actually
is not.

Proposition 1.1. An empty set is always a subset of any set.

Proof. This is an example of a vacuously true statement. In logic, a conditional statement
is vacuously true if its antecedent can never be satisfied. By Definition 1.9, a set A is
a subset of another set B only if all the elements of A are found in B. Here, the
proposition states that all elements of ∅ are found in any set, say A. However, ∅ has no
elements, so there is no element of ∅ that could fail to be in A. Therefore, it is vacuously
true that ∅ is a subset of any set.

There are also binary operations that we can perform given two or more sets, and these
operations will be used all throughout the course. For illustration purposes, let’s have
Ω = {1, 2, . . . , 9}, A = {1, 2, 3, 4}, B = {2, 4, 6, 8}, C = {3, 5, 7, 9}.

1. Union. Two sets can be “added” together. The union of sets X and Y , denoted by
X ∪ Y , is the set of all things that are members of either X or Y .

Example: A∪B = {1, 2, 3, 4, 6, 8}, B∪C = {2, 3, 4, 5, 6, 7, 8, 9}, A∪C = {1, 2, 3, 4, 5, 7, 9}

The union operation is “commutative”, meaning, X ∪ Y = Y ∪ X.

2. Intersection. A new set can also be constructed by determining which members two
sets have “in common”. The intersection of X and Y , denoted by X ∩ Y , is the set of
all things that are members of both X and Y . If X ∩ Y = ∅, then X and Y are
said to be disjoint.

Example: A ∩ B = {2, 4}, B ∩ C = ∅, A ∩ C = {3}

The intersection operation is also commutative.

3. Complement. Given a set X, its complement X^c is the set of all elements of Ω that
are not found in X.

Example: A^c = {5, 6, 7, 8, 9}, B^c = {1, 3, 5, 7, 9}, C^c = {1, 2, 4, 6, 8}

4. Set Difference. Given two sets X and Y , a new set can be generated by getting the
elements of X that are not in Y , denoted by X − Y .

Example: A − B = {1, 3}, B − C = B, A − C = {1, 2, 4}

Set difference is not commutative, meaning that, in general, X − Y ≠ Y − X.
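The four operations above map directly onto Python's built-in set type. The following is a minimal sketch using the sets Ω, A, B and C from the illustration; the helper name `complement` and the variable names are our own choices, not part of the module.

```python
# Sets from the running illustration in the text.
omega = set(range(1, 10))   # Ω = {1, 2, ..., 9}
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}
C = {3, 5, 7, 9}


def complement(x, universe=omega):
    """Return the complement of x relative to the universal set."""
    return universe - x


union_ab = A | B                 # A ∪ B
intersection_ab = A & B          # A ∩ B
disjoint_bc = B.isdisjoint(C)    # True exactly when B ∩ C = ∅
complement_a = complement(A)     # A^c
difference_ab = A - B            # A − B
difference_ba = B - A            # B − A, which differs from A − B
```

The same operators can be combined to check your answers to the exercise that follows.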

Exercise: Using the same given sets from the example above, find the following:

1. A^c ∪ (B ∩ C)^c

2. B ∩ (A ∪ C^c)

3. (A ∪ B^c) ∪ (A^c ∩ C)^c

4. (B − C^c)^c ∪ (A^c ∩ C)

1.2.2 Sample Spaces

Definition 1.12 (Experiment). An experiment can in general be thought of as any process
or procedure for which more than one outcome is possible.

In simpler terms, an experiment is a mechanism that generates outcomes. Here are some
examples of experiments:

1. tossing a coin

2. tossing a die

3. drawing a card from an ordinary deck

The goal of studying probability is to provide a mathematical structure for understanding
or explaining the chances or likelihoods of the various outcomes actually occurring. A first
step in the development of this theory is the construction of a list of the possible experimental
outcomes. The collection of outcomes is called the sample space, denoted by Ω.

Definition 1.13. A sample space Ω is a collection of all possible outcomes of an experiment.

The experiments above give us the following sample spaces:

1. Ω1 = {H, T }

2. Ω2 = {1, 2, 3, 4, 5, 6}

3. Ω3 = {A♣, A♦, A♥, A♠, . . . , K♣, K♦, K♥, K♠}

The following examples help illustrate the concept of a sample space:

1. An engineer in charge of the maintenance of a particular machine notices that its
breakdowns can be characterized as due to an electrical failure within the machine, a
mechanical failure of some component of the machine, or operator misuse. When
the machine is running, the engineer is uncertain what will be the cause of the next
breakdown. The problem can be thought of as an experiment with the sample space

Ω = {electrical, mechanical, misuse}.

2. A company sells computer chips in boxes of 500, and each chip can be classified as
either satisfactory or defective. The number of defective chips in a particular box is
uncertain, and the sample space is

Ω = {0 defectives, 1 defective, 2 defectives, . . . , 500 defectives}.

3. The control of errors in computer software products is obviously of great importance.
The number of separate errors in a particular piece of software can be viewed as having
a sample space

Ω = {0 errors, 1 error, 2 errors, . . . }.

In practice there will be an upper bound on the possible number of errors in the
software, although conceptually it is all right to allow the sample space to consist of all
of the nonnegative integers.

4. A manager supervises the operation of 3 power plants, plant X, plant Y, and plant Z.
At any given time, each of the 3 plants can be classified as either generating electricity
(1) or being idle (0). With the notation (0, 1, 0) used to represent the situation where
plant Y is generating electricity but plants X and Z are both idle, the sample space of
the 3 plants at a particular time is

Ω = {(0, 0, 0), (0, 0, 1), (0, 1, 0), (0, 1, 1), (1, 0, 0), (1, 0, 1), (1, 1, 0), (1, 1, 1)}.
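When every outcome is a tuple of independent choices, as in the power plant example, the sample space can be listed mechanically. Here is a small sketch using `itertools.product` from the Python standard library; the variable names are our own.

```python
from itertools import product

# Each of the three plants (X, Y, Z) is either idle (0) or generating electricity (1).
plant_states = (0, 1)

# All ordered triples (x, y, z), listed in the same order as in the text.
omega_plants = list(product(plant_states, repeat=3))

# The number of outcomes agrees with the counting principle: 2 * 2 * 2.
number_of_outcomes = len(omega_plants)
```

Changing `repeat` or the tuple of states gives the sample space for a different number of plants or a different set of states.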

Exercise:

1. What is the sample space when a coin is tossed three times?

2. What is the sample space for counting the number of females in a group of n people?

3. What is the sample space for the number of aces in a hand of 13 playing cards?

4. What is the sample space for a person’s birthday?

5. A car repair is performed either on time or late and either satisfactorily or
unsatisfactorily. What is the sample space for a car repair?

1.2.3 Events

Interest is often centered not so much on the individual elements of a sample space, but
rather on collections of individual outcomes. These collections of outcomes are called events.

Definition 1.14 (Event). An event is any set of outcomes. Thus, any subset of a sample
space Ω is an event.

An event is said to occur if one of the outcomes within the event occurs. For illustration,
let’s use the sample spaces used earlier:

1. E1 = Ω1 (event of tossing a coin)

2. E2 = {2, 3, 5} (event of tossing a die whose outcome is prime)

3. E3 = {K♣, K♦, K♥, K♠} (event of drawing a king)

4. Suppose we have an experiment of tossing a coin twice. Then the sample space is given
by Ω4 = {HH, HT, T H, T T } and one possible event is the event of tossing a coin twice
whose faces are the same, given by E4 = {HH, T T }.

Since we can already draw events from a sample space, we should now be interested in
how to find the probability of an event.

Definition 1.15. Let E ⊆ Ω, where the outcomes in Ω are equally likely. Let n(·) be the
cardinality function, a function that simply counts the number of outcomes. Then the
probability of an event E, denoted by p(E), is given by

p(E) = n(E) / n(Ω),

true for all E ⊆ Ω.

The definition simply states that to get the probability of an event, you have to get the
ratio between the cardinality of the event and the cardinality of the sample space where the
event comes from.

For example, let’s use E3, the event of drawing a king from an ordinary deck. Note
that this event comes from the sample space of all cards in an ordinary deck. Moreover,
n(E3) = 4 and n(Ω3) = 52. Therefore, the probability of drawing a king from an ordinary
deck is 4/52 ≈ 0.0769. Next, let’s have E1, the event of tossing a coin. The probability
of that event is 1 since n(E1) = 2 and n(Ω1) = 2. A probability of 1 means that the
event will always happen, which makes sense in our example: the experiment is tossing a
coin, so the event that a coin will be tossed always happens. If a probability of 1 means
that an event will always happen, then there are also events that will never
happen, and their probability is zero. With this, let’s have some propositions.
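The ratio n(E)/n(Ω) is easy to compute once the event and the sample space are written as sets. The sketch below assumes equally likely outcomes; the function name `classical_probability` and the string encoding of the cards are our own choices, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction


def classical_probability(event, sample_space):
    """Return n(E) / n(Ω), assuming equally likely outcomes and E ⊆ Ω."""
    event, sample_space = set(event), set(sample_space)
    if not event <= sample_space:
        raise ValueError("the event must be a subset of the sample space")
    return Fraction(len(event), len(sample_space))


ranks = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
suits = ["♣", "♦", "♥", "♠"]
deck = {rank + suit for rank in ranks for suit in suits}   # Ω3, 52 cards
kings = {"K" + suit for suit in suits}                     # E3

p_king = classical_probability(kings, deck)                # 4/52, reduced to 1/13
p_coin = classical_probability({"H", "T"}, {"H", "T"})     # the event E1 = Ω1
p_empty = classical_probability(set(), deck)               # the empty event
```

Calling `float(p_king)` gives the decimal form, roughly 0.0769.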

Proposition 1.2. Suppose E = Ω ⊆ Ω. Then p(E) = 1.

Proposition 1.3. Suppose E = ∅ ⊆ Ω. Then p(E) = 0.

Since the events ∅ and Ω are the two extreme cases, then intuitively,
values for probability should be bounded between 0 and 1. This leads us to another
proposition.

Proposition 1.4. Suppose E ⊆ Ω. Then 0 ⩽ p(E) ⩽ 1 for all E.

Moreover, the likelihoods of particular experimental outcomes actually occurring are
found by assigning a set of probability values to each of the outcomes of the sample space.
Specifically, each outcome in the sample space is assigned a probability value that is a number
between 0 and 1. The probabilities are chosen so that the sum of the probability values over
all the outcomes in the sample space is 1.

Definition 1.16. A set of probability values for an experiment with a sample space

Ω = {ω1, ω2, . . . , ωn}

consists of some probabilities p1, p2, . . . , pn that satisfy

0 ⩽ p1 ⩽ 1, 0 ⩽ p2 ⩽ 1, . . . , 0 ⩽ pn ⩽ 1

and

p1 + p2 + · · · + pn = 1.

The probability of outcome ωi occurring is said to be pi, and this is written as p(ωi) = pi.
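The two conditions in Definition 1.16 can be checked directly for any finite list of candidate values. This is only a sketch; the function name `is_valid_probability_values` and the sample lists are hypothetical, and exact fractions are used because floating-point sums may miss 1 by a rounding error.

```python
from fractions import Fraction


def is_valid_probability_values(values):
    """Check that every value lies in [0, 1] and that the values sum to 1."""
    in_range = all(0 <= value <= 1 for value in values)
    return in_range and sum(values) == 1


fair_die = [Fraction(1, 6)] * 6                    # six equally likely outcomes
not_normalized = [Fraction(1, 2), Fraction(1, 3)]  # sums to 5/6, not 1
out_of_range = [Fraction(3, 2), Fraction(-1, 2)]   # sums to 1 but leaves [0, 1]

fair_die_ok = is_valid_probability_values(fair_die)
not_normalized_ok = is_valid_probability_values(not_normalized)
out_of_range_ok = is_valid_probability_values(out_of_range)
```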

An intuitive interpretation of a set of probability values is that the larger the probability
value of a particular outcome, the more likely it is to happen. If two outcomes have identical
probability values assigned to them, then they can be thought of as being equally likely to
occur. On the other hand, if one outcome has a larger probability value assigned to it than
another outcome, then the first outcome can be thought of as being more likely to occur.
Let’s have some examples:

1. Suppose we toss a die twice. Find the probability that the sum of the outcomes is at
least 7.

Answer: The experiment is tossing a die twice, hence, we have a sample space given
by

Ω = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (1, 6), (2, 1), (2, 2), (2, 3), (2, 4), (2, 5), (2, 6),

(3, 1), (3, 2), (3, 3), (3, 4), (3, 5), (3, 6), (4, 1), (4, 2), (4, 3), (4, 4), (4, 5), (4, 6),

(5, 1), (5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}.

However, we are only interested in finding the probability of the event that the sum of
the outcomes is at least 7. Let S be the name of this event. Then, S is given by

S = {(1, 6), (2, 5), (2, 6), (3, 4), (3, 5), (3, 6), (4, 3), (4, 4), (4, 5), (4, 6),

(5, 2), (5, 3), (5, 4), (5, 5), (5, 6), (6, 1), (6, 2), (6, 3), (6, 4), (6, 5), (6, 6)}.

Since n(Ω) = 36 and n(S) = 21, then the probability that the sum of the outcomes is
at least 7 is 21/36 ≈ 0.5833.

2. Two dice are tossed. Find the probability that the sum of the outcomes is less than 7.

Answer: We will have the same sample space as the previous one. However, note that
what one person sees as (2, 3) might be (3, 2) from another person’s perspective.
Therefore, any toss (d1, d2) can be treated as the same as (d2, d1). It follows
that if d1 ≠ d2, then the outcome (d1, d2) is twice as probable as
an outcome (d1, d2) where d1 = d2. Let T be the event that the sum of the outcomes
is less than 7. Listing each such outcome once, T is given by

T = {(1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (3, 3)}.

The three outcomes with d1 = d2 each count once and the six outcomes with d1 ≠ d2 each
count twice, for a total of 3 + 2(6) = 15 of the 36 ordered pairs in Ω.
Hence, p(T) = 15/36 ≈ 0.4167.

3. Find the probability of drawing 3 kings and 2 aces in a bridge hand.

Answer: We also use the same principle for computing the probability, ratio between
the cardinality of the event and the cardinality of the sample space. For the cardinality
of the event, we have to count the number of ways we can draw 3 kings and 2 aces in
a bridge hand. Recall that a bridge hand is simply 13 cards. For the cardinality of the
sample space, it’s just the number of ways we can draw 13 cards in an ordinary deck.
Let U be the event of drawing 3 kings and 2 aces in a bridge hand. Therefore,

p(U) = (4C3 × 4C2 × 44C8) / 52C13 ≈ 0.0067.

4. Suppose a (fair) coin is tossed 3 times. What is the probability that there are exactly
2 heads from the 3 tosses?

Answer: The sample space for this experiment is given by

Ω = {(H, H, H), (H, H, T ), (H, T, H), (H, T, T ), (T, H, H), (T, H, T ), (T, T, H), (T, T, T )}.

Let V be the event of having exactly 2 heads, so V = {(H, H, T), (H, T, H), (T, H, H)}.
Hence, p(V) = 3/8 = 0.3750.
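The four worked examples above can be double-checked by brute-force enumeration and by the combination formula. The following sketch assumes Python 3.8 or later for `math.comb`; the variable names are our own.

```python
from fractions import Fraction
from itertools import product
from math import comb

# Examples 1 and 2: two tosses of a die give 36 equally likely ordered pairs.
two_dice = list(product(range(1, 7), repeat=2))
p_sum_at_least_7 = Fraction(sum(1 for a, b in two_dice if a + b >= 7), len(two_dice))
p_sum_less_than_7 = Fraction(sum(1 for a, b in two_dice if a + b < 7), len(two_dice))

# Example 3: choose 3 of the 4 kings, 2 of the 4 aces, and 8 of the other 44 cards,
# out of all 13-card hands from a 52-card deck.
p_bridge = Fraction(comb(4, 3) * comb(4, 2) * comb(44, 8), comb(52, 13))

# Example 4: exactly 2 heads in 3 tosses of a fair coin.
three_tosses = list(product("HT", repeat=3))
p_two_heads = Fraction(sum(1 for toss in three_tosses if toss.count("H") == 2),
                       len(three_tosses))
```

Note that the two dice probabilities add up to 1, as they should, since "at least 7" and "less than 7" are complementary events.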

Now it’s time for you to practice!


Exercise:

1. Find the probability of winning in a 6/42 lotto of PCSO.

2. Suppose you draw 2 cards from an ordinary deck. Find the probability that

(a) the cards have the same suit.

(b) the cards are of the same color.

(c) one card is a royalty and the other is an ace.

3. Three letters are being drawn from the word APPLE. Find the probability of not
having a letter P.

1.2.4 Event Spaces

By now, we already have a rudimentary knowledge and skill in finding the probability of a
given event. Hence, it is just fitting to discuss the axioms of probability but we cannot do so
until we know the concept of an event space. This is a collection of events whose probabilities
we can consider in our problem.

Definition 1.17. A collection F of events is an event space on a sample space Ω if

a. it includes the sample space, i.e., Ω ∈ F,

b. every event in F is contained along with its complement; that is, if E ∈ F, then
E^c ∈ F, and

c. if {Ei} is a finite or countable collection of events in F, then ∪_i Ei ∈ F and ∩_i Ei ∈ F.

Here, ∪_i Ei = E1 ∪ E2 ∪ . . . and ∩_i Ei = E1 ∩ E2 ∩ . . . . Here are a few examples
of event spaces:

1. (DEGENERATE EVENT SPACE). By conditions (a) and (b) in Definition 1.17, every
event space has to contain the sample space Ω and the empty set ∅. This minimal
collection
F = {Ω, ∅}

forms an event space that is called degenerate. Condition (c) is straightforward.

2. (POWER SET). On the other extreme, what is the richest event space on a sample
space Ω? It is the collection of all the events, in other words, it is the set of all subsets
of Ω,
F = {Ei |Ei ⊆ Ω}.

To illustrate what a power set is, suppose we have Ω = {1, 2, 3}. Then its power set is
given by
{∅, Ω, {1}, {2}, {3}, {1, 2}, {1, 3}, {2, 3}}.
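For a small finite sample space, both examples can be generated and tested by machine. The sketch below builds the power set with `itertools` and checks conditions (a) to (c) of Definition 1.17 for a finite collection; the function names `power_set` and `is_event_space` and the collection `F_bad` are hypothetical. For a finite collection, closure under pairwise unions and intersections is enough to give closure under all finite ones.

```python
from itertools import chain, combinations


def power_set(sample_space):
    """Return all subsets of a finite sample space as a set of frozensets."""
    items = list(sample_space)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}


def is_event_space(collection, sample_space):
    """Check conditions (a), (b) and (c) of Definition 1.17 for a finite collection."""
    omega = frozenset(sample_space)
    events = {frozenset(event) for event in collection}
    has_omega = omega in events                                         # (a)
    closed_complement = all(omega - e in events for e in events)        # (b)
    closed_union = all(e | f in events for e in events for f in events)         # (c)
    closed_intersection = all(e & f in events for e in events for f in events)  # (c)
    return has_omega and closed_complement and closed_union and closed_intersection


omega = {1, 2, 3}
F_power = power_set(omega)        # the richest event space on Ω
F_degenerate = [set(), omega]     # the degenerate event space {Ω, ∅}
F_bad = [omega, {1}]              # lacks ∅ and {2, 3}, so it is not an event space
```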

1.2.5 Kolmogorov Axioms of Probability

It’s time to discuss the axioms of probability. These axioms are called the Kolmogorov axioms
of probability, considered to be the foundations of probability theory introduced by Andrey
Kolmogorov in 1933.

Definition 1.18 (Kolmogorov Axioms of Probability). Assume a sample space Ω and an
event space F on it. Then probability is a function

p : F → [0, 1]

with the domain F and the range [0, 1] that satisfies the following conditions:

a. (Non-negativity) The probability of an event E is a non-negative real number, i.e.,
p(E) ∈ R and p(E) ⩾ 0 for all E ∈ F.

b. (Unit measure) The sample space has unit probability, i.e., p(Ω) = 1.

c. (Sigma-additivity) For any finite or countable collection of mutually exclusive (disjoint)
events E1, E2, · · · ∈ F,

p(∪_i Ei) = Σ_i p(Ei).

From these axioms, rules of probability can be derived.

1.2.6 Some Properties of Probability

In this section, we discuss the consequences of the Kolmogorov axioms. These consequences
are properties of probability that will be useful in solving probability problems.
To establish the properties, we will prove them, and for illustration, we’ll provide examples.

Property 1.1. If A ⊆ B, then p(A) ⩽ p(B).

Proof. Since A ⊆ B, the events A and B − A are disjoint and their union is B. Hence,

B = A ∪ (B − A)

p(B) = p(A ∪ (B − A))

= p(A) + p(B − A) (Sigma-additivity)

⩾ p(A). (Non-negativity)

Property 1.2 (Complement Law). Given A ⊆ Ω, then p(A) = 1 − p(A^c).

Proof. Note that A and A^c are disjoint and their union is the sample space Ω.

Ω = A ∪ A^c

p(Ω) = p(A ∪ A^c)

= p(A) + p(A^c) (Sigma-additivity)

1 = p(A) + p(A^c) (Unit measure)

1 − p(A^c) = p(A).

Example:

1. According to the weather forecast, there is a 45% chance of rain tomorrow. What is
the probability that it will not rain tomorrow?

Answer: Let R be the event of raining tomorrow. It is given that p(R) = 0.45.
Therefore, the probability that it will not rain tomorrow is p(R^c) = 1 − 0.45 = 0.55.

2. There is a 0.2112 chance that I will fail Stat 123. What is the probability that I will
pass?

Answer: Let F be the event of failing Stat 123. Then the event F^c denotes the event
of passing. Therefore, the probability that I will pass Stat 123 is p(F^c) = 1 − 0.2112 =
0.7888.

Property 1.3. Given A, B ⊆ Ω, then p(A ∪ B) = p(A) + p(B) − p(A ∩ B).

Proof. The union of A and B which is A ∪ B can be expressed as the union of two disjoint
events A and B − A. Hence, we’ll have

A ∪ B = A ∪ (B − A)

p(A ∪ B) = p(A ∪ (B − A))

= p(A) + p(B − A) (Sigma-additivity)

p(B − A) = p(A ∪ B) − p(A). (∗)

Moreover, note that B can be expressed as the union of disjoint events A ∩ B and B − A.

Therefore,

B = (A ∩ B) ∪ (B − A)

p(B) = p((A ∩ B) ∪ (B − A))

= p(A ∩ B) + p(B − A) (Sigma-additivity)

p(B − A) = p(B) − p(A ∩ B) (∗∗).

Since (∗) and (∗∗) both give p(B − A), then

p(A ∪ B) − p(A) = p(B) − p(A ∩ B)

p(A ∪ B) = p(A) + p(B) − p(A ∩ B).
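Properties 1.1 to 1.3 can be spot-checked numerically on a small sample space. The sketch below reuses Ω = {1, 2, . . . , 9}, A and B from the set operations illustration and assumes, purely for the sake of the check, that all nine outcomes are equally likely. A numerical check on one example is of course not a proof.

```python
from fractions import Fraction

omega = set(range(1, 10))   # Ω = {1, 2, ..., 9}, outcomes assumed equally likely
A = {1, 2, 3, 4}
B = {2, 4, 6, 8}


def p(event):
    """Classical probability of an event within Ω."""
    return Fraction(len(event & omega), len(omega))


# Property 1.3: p(A ∪ B) = p(A) + p(B) − p(A ∩ B).
lhs = p(A | B)
rhs = p(A) + p(B) - p(A & B)

# Property 1.2: p(A) = 1 − p(A^c).
complement_law_holds = p(A) == 1 - p(omega - A)

# Property 1.1 applied to A ∩ B ⊆ A.
monotonicity_holds = p(A & B) <= p(A)
```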

Example:

1. According to the weather forecast, there is a 0.65 chance that it will rain tomorrow.
Also, there is a 0.15 chance that the school will announce class suspension. If there is
a 0.36 probability that it will either rain tomorrow or that the class will be suspended,
what is the probability that it will rain tomorrow and also the school will announce
class suspension?

Answer: Let R be the event of raining tomorrow and S the event of class suspension.
It is given that p(R) = 0.65, p(S) = 0.15 and p(R ∪ S) = 0.36. Just a tip: “either or” is
a union of events in probability while “and” is an intersection. Then, the probability that
it will rain tomorrow and the school will announce class suspension is given by

p(R ∪ S) = p(R) + p(S) − p(R ∩ S)

0.36 = 0.65 + 0.15 − p(R ∩ S)

p(R ∩ S) = 0.65 + 0.15 − 0.36

= 0.44

2. Given p((A ∩ B)^c) = 0.74, p(A ∪ B) = 0.33 and p(B) = 0.11, find p(A^c).

Answer:

p(A ∪ B) = p(A) + p(B) − p(A ∩ B)

p(A) = p(A ∪ B) − p(B) + p(A ∩ B)

= p(A ∪ B) − p(B) + (1 − p((A ∩ B)^c))

= 0.33 − 0.11 + (1 − 0.74)

= 0.48.

Therefore,

p(A^c) = 1 − p(A)

= 1 − 0.48

= 0.52.

1.2.7 Conditional Probability

Suppose you are meeting someone at the airport. The flight is likely to arrive on time; the
probability of that is 0.8. Suddenly, it is announced that the flight departed an hour behind
the supposed schedule. Now, it has the probability of only 0.05 to arrive on time. New
information affected the probability of meeting this flight on time. The new probability is
called conditional probability, where the new information, that the flight departed late, is a
condition.

Definition 1.19 (Conditional Probability). Conditional probability of event A given event
B is the probability that A occurs when B is known to occur, denoted by p(A|B).

The formula for finding the conditional probability of A given B is the ratio between the
probability that A and B will happen and the probability that B will happen, that is,

p(A|B) = p(A ∩ B) / p(B),

provided that p(B) > 0.

Example:

1. Suppose a 6-sided die is thrown. We assume that the die is unbiased. Find the
probability of getting a 4 given that it landed on an even number.

Answer: There are 2 events: getting a 4 and getting an even number. Therefore,
the probability of getting a 4 given that the die landed on an even number is

p({4}|{2, 4, 6}) = p({4} ∩ {2, 4, 6}) / p({2, 4, 6})

= p({4}) / p({2, 4, 6})

= (1/6) / (3/6)

= 1/3.

2. The probability of passing Stat 123 is 0.90 and the probability of passing CS 11 is
0.80. Moreover, the probability of passing either one of the subjects is 0.95. Find the
probability of passing Stat 123 given that you passed CS 11.

Answer: Let S be the event of passing Stat 123 and C the event of passing CS 11. We
need to find p(S|C); however, we don’t have p(S ∩ C) yet. We can obtain it by
rearranging Property 1.3.

p(S ∪ C) = p(S) + p(C) − p(S ∩ C)

p(S ∩ C) = p(S) + p(C) − p(S ∪ C)

= 0.90 + 0.80 − 0.95

= 0.75.

Therefore,

p(S|C) = p(S ∩ C) / p(C)

= 0.75 / 0.80

= 0.9375.
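Both examples above can be reproduced with a few lines of code. This is a minimal sketch using exact fractions; the function name `conditional` and the variable names are our own, and the guard reflects the requirement that the conditioning event must have positive probability.

```python
from fractions import Fraction


def conditional(p_joint, p_given):
    """Return p(A|B) = p(A ∩ B) / p(B); defined only when p(B) > 0."""
    if p_given == 0:
        raise ValueError("cannot condition on an event with probability 0")
    return p_joint / p_given


# Example 1: a fair six-sided die.
omega = set(range(1, 7))


def p(event):
    """Classical probability on the die's sample space."""
    return Fraction(len(event), len(omega))


four, even = {4}, {2, 4, 6}
p_four_given_even = conditional(p(four & even), p(even))

# Example 2: passing Stat 123 (S) and CS 11 (C).
p_s, p_c, p_s_or_c = Fraction(90, 100), Fraction(80, 100), Fraction(95, 100)
p_s_and_c = p_s + p_c - p_s_or_c          # Property 1.3 rearranged
p_s_given_c = conditional(p_s_and_c, p_c)
```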

Property 1.4 (Multiplication Rule). For all A, B ∈ F, (a) p(A ∩ B) = p(A|B)p(B) and (b)
p(A ∩ B) = p(B|A)p(A).

Proof. This is simply a straightforward consequence of the formula for conditional probability
from Definition 1.19.

Property 1.5 (Law of Total Probability). Let {Bi} be a collection of mutually exclusive
events such that ∪_i Bi = Ω and Bi ≠ ∅ for all i. Then for all A ∈ F,

p(A) = Σ_i p(A|Bi)p(Bi).

Proof. We illustrate the argument with n = 3; the general case is identical. Imagine Ω
being sliced into 3 parts, and call the parts B1, B2 and B3. Take any event A ∈ F, which
may overlap B1, B2 and B3. Then the union of the events A ∩ B1, A ∩ B2 and A ∩ B3
is A, and these events are mutually exclusive. Therefore,

A = (A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3)

p(A) = p((A ∩ B1) ∪ (A ∩ B2) ∪ (A ∩ B3))

= p(A ∩ B1) + p(A ∩ B2) + p(A ∩ B3) (Sigma-additivity)

= p(A|B1)p(B1) + p(A|B2)p(B2) + p(A|B3)p(B3) (Multiplication Rule)

= Σ_{i=1}^{3} p(A|Bi)p(Bi).

Example: I have three bags that each contain 100 marbles.

• Bag 1 has 75 red and 25 blue marbles;

• Bag 2 has 60 red and 40 blue marbles;

• Bag 3 has 45 red and 55 blue marbles.

I choose one of the bags at random and then pick a marble from the chosen bag, also at
random. What is the probability that the chosen marble is red?
Answer: Let R be the event that the chosen marble is red. Let Bi be the event that I choose
Bag i. We already know that p(R|B1) = 0.75, p(R|B2) = 0.60, and p(R|B3) = 0.45. Since
the bag is chosen at random, p(B1) = p(B2) = p(B3) = 1/3. We
choose the partition B1, B2 and B3. Note that this is a valid partition because, firstly, the Bi’s
are mutually exclusive (only one of them can happen), and secondly, because their union is
the entire sample space as one of the bags will be chosen for sure, i.e., p(B1 ∪ B2 ∪ B3) = 1.

Using the Law of Total Probability, the probability that the chosen marble is red is,

p(R) = p(R|B1)p(B1) + p(R|B2)p(B2) + p(R|B3)p(B3)

= (0.75)(1/3) + (0.60)(1/3) + (0.45)(1/3)

= 0.60.
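The marble computation is a weighted sum, which is straightforward to code. In this sketch the function name `total_probability` is our own; it takes the conditional probabilities p(A|Bi) and the partition probabilities p(Bi) as two lists of equal length.

```python
from fractions import Fraction


def total_probability(likelihoods, priors):
    """Return the sum of p(A|B_i) p(B_i) over a partition B_1, ..., B_n."""
    if len(likelihoods) != len(priors):
        raise ValueError("need one p(A|B_i) for each p(B_i)")
    if sum(priors) != 1:
        raise ValueError("the probabilities p(B_i) of a partition must sum to 1")
    return sum(like * prior for like, prior in zip(likelihoods, priors))


# p(R|B1), p(R|B2), p(R|B3) from the three bags of 100 marbles.
p_red_given_bag = [Fraction(75, 100), Fraction(60, 100), Fraction(45, 100)]

# Each bag is equally likely to be chosen.
p_bag = [Fraction(1, 3)] * 3

p_red = total_probability(p_red_given_bag, p_bag)
```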

Now we are ready to state one of the most useful results in conditional probability: Bayes’
rule. Suppose that we know p(A|B), but we are interested in the probability p(B|A). To
start, from Property 1.4, we see that (a) = (b). Hence,

p(A|B)p(B) = p(B|A)p(A)

p(A|B)p(B) / p(A) = p(B|A)p(A) / p(A)

p(B|A) = p(A|B)p(B) / p(A),

which is the Bayes’ formula. Often, in order to find p(A) in Bayes’ formula, we need to use
the Law of Total Probability given B1 , B2 , . . . where they form a partition of the sample
space.

Property 1.6. Let {Bi} be a collection of non-empty, mutually exclusive events such that
∪_i Bi = Ω. Then for all A ∈ F,

p(Bj|A) = p(A|Bj)p(Bj) / Σ_i p(A|Bi)p(Bi).

Proof. By Definition 1.19, p(Bj|A) = p(Bj ∩ A)/p(A). By Property 1.4, p(Bj ∩ A) =
p(A|Bj)p(Bj) and by Property 1.5, p(A) = Σ_i p(A|Bi)p(Bi).

Example: A manufacturing company gets 60% of its parts from Factory 1
and the other 40% from Factory 2. The parts delivered by both factories
contain defective and non-defective parts. Factories 1 and 2 usually deliver parts that are
5% and 8% defective, respectively. If a part is chosen at random,

1. what is the probability that the part is from Factory 1 given that it is not defective?

Answer: Let

F1 = event that a part is from Factory 1,

F2 = event that a part is from Factory 2,

D = event that a part is defective, and

D^c = event that a part is not defective.

Then

p(F1|D^c) = p(D^c|F1)p(F1) / [p(D^c|F1)p(F1) + p(D^c|F2)p(F2)]

= (0.95)(0.60) / [(0.95)(0.60) + (0.92)(0.40)]

= 285/469

≈ 0.6077.

2. what is the probability that it is from Factory 2 given that it is defective?

Answer:

p(F2|D) = p(D|F2)p(F2) / [p(D|F1)p(F1) + p(D|F2)p(F2)]

= (0.08)(0.40) / [(0.05)(0.60) + (0.08)(0.40)]

= 16/31

≈ 0.5161.
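Property 1.6 translates into a short function: the numerator is one term of the Law of Total Probability and the denominator is the whole sum. The sketch below reproduces both answers of the factory example with exact fractions; the function name `bayes` and its zero-based index `j` are our own conventions.

```python
from fractions import Fraction


def bayes(j, likelihoods, priors):
    """Return p(B_j | A) given lists of p(A | B_i) and p(B_i) over a partition."""
    evidence = sum(like * prior for like, prior in zip(likelihoods, priors))  # p(A)
    if evidence == 0:
        raise ValueError("p(A) is 0, so the conditional probability is undefined")
    return likelihoods[j] * priors[j] / evidence


p_factory = [Fraction(60, 100), Fraction(40, 100)]                # p(F1), p(F2)
p_defective_given_factory = [Fraction(5, 100), Fraction(8, 100)]  # p(D|F1), p(D|F2)
p_ok_given_factory = [1 - d for d in p_defective_given_factory]   # p(D^c|F1), p(D^c|F2)

p_f1_given_ok = bayes(0, p_ok_given_factory, p_factory)                 # p(F1|D^c)
p_f2_given_defective = bayes(1, p_defective_given_factory, p_factory)   # p(F2|D)
```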

Exercise:

1. There are three boxes, each containing a different number of light bulbs. The first box
has 10 bulbs, of which four are dead, the second has six bulbs, of which one is dead,
and the third box has eight bulbs of which three are dead. What is the probability
of a dead bulb being selected when a bulb is chosen at random from one of the three
boxes?

2. Eighty percent of people see their primary care physician regularly; 35% of those
people have no health problems crop up during the following year. Out of the 20% of
people who don’t see their doctor regularly, only 5% have no health issues during the
following year. What is the probability a random person will have no health problems
in the following year?

3. A bag contains red and blue marbles. Two marbles are drawn without replacement.
The probability of selecting a red marble and then a blue marble is 0.28. The prob-
ability of selecting a red marble on the first draw is 0.5. What is the probability of
selecting a blue marble on the second draw, given that the first marble drawn was red?

4. What is the probability that the total of two dice will be greater than 9, given that
the first die is a 5?

5. A person has undertaken a mining job. The probabilities of completion of job on time
with and without rain are 0.42 and 0.90 respectively. If the probability that it will rain
is 0.45, then determine the probability that the mining job will be completed on time.

1.2.8 Independence

Suppose A is the event that it rains tomorrow, and suppose that p(A) = 1/3. Also suppose
that I toss a fair coin; let B be the event that it lands heads up. We have p(B) = 1/2.
Now I ask you, what is p(A|B)? What is your guess? You probably guessed that p(A|B) =
p(A) = 1/3. You are right! The result of my coin toss does not have anything to do with
tomorrow’s weather. Thus, no matter if B happens or not, the probability of A should not
change. This is an example of two independent events. Two events are independent if one
does not convey any information about the other. Let us now provide a formal definition of
independence.

Definition 1.20. Two events A and B are independent if and only if p(A ∩ B) = p(A)p(B).

Now, let’s first reconcile this definition with what we mentioned earlier, p(A|B) = p(A).
If two events are independent, then p(A ∩ B) = p(A)p(B), so

p(A|B) = p(A ∩ B) / p(B)

= p(A)p(B) / p(B)

= p(A).

Thus, if two events A and B are independent where p(B) ≠ 0, then p(A|B) = p(A).
To summarize, we can say “independence means we can multiply the probabilities of events
to obtain the probability of their intersection”, or equivalently, “independence means that
the conditional probability of one event given another is the same as the original (prior)
probability”.
Sometimes the independence of two events is quite clear because the two events seem not
to have any physical interaction with each other (such as the two events discussed above).
At other times, it is not as clear and we need to check if they satisfy the independence
condition. Let’s look at an example.

Example: I pick a number from {1, 2, . . . , 10}, and call it N . Suppose that all outcomes
are equally likely. Let A be the event that N is less than 7, and let B be the event that
N is an even number. Are A and B independent?
Answer: We have A = {1, 2, 3, 4, 5, 6}, B = {2, 4, 6, 8, 10}, and A ∩ B = {2, 4, 6}. Then
p(A) = 0.6, p(B) = 0.5, and p(A ∩ B) = 0.3. Therefore, p(A ∩ B) = p(A)p(B), so A and B
are independent. This means that knowing B has occurred does not change our belief about
the probability of A. In this problem the two events are about the same random number,
but they are still independent because they satisfy the definition.
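Checking the independence condition for events on a small sample space is mechanical. The sketch below verifies the example and then changes A slightly to show that independence can be lost; the event `A_alt` (N less than 6) is a hypothetical variation that does not appear in the example.

```python
from fractions import Fraction

omega = set(range(1, 11))   # N is picked from {1, 2, ..., 10}, equally likely


def p(event):
    """Classical probability on the sample space {1, ..., 10}."""
    return Fraction(len(event), len(omega))


def independent(a, b):
    """Definition 1.20: A and B are independent iff p(A ∩ B) = p(A) p(B)."""
    return p(a & b) == p(a) * p(b)


A = {n for n in omega if n < 7}        # N is less than 7
B = {n for n in omega if n % 2 == 0}   # N is even
A_alt = {n for n in omega if n < 6}    # hypothetical variation: N is less than 6

a_and_b_independent = independent(A, B)          # 3/10 equals (6/10)(5/10)
a_alt_and_b_independent = independent(A_alt, B)  # 2/10 differs from (5/10)(5/10)
```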
The definition of independence can be extended to the case of three or more events.

Definition 1.21. Three events A, B, and C are independent if all of the following conditions
hold:

• p(A ∩ B) = p(A)p(B)

• p(A ∩ C) = p(A)p(C)

• p(B ∩ C) = p(B)p(C)

• p(A ∩ B ∩ C) = p(A)p(B)p(C).

Note that all four of the stated conditions must hold for three events to be independent.
In particular, you can find situations in which three of them hold, but the fourth one does
not. In general, for n events A1 , A2 , . . . , An to be independent we must have

p(Ai ∩ Aj) = p(Ai)p(Aj), for all distinct i, j ∈ {1, 2, . . . , n}

p(Ai ∩ Aj ∩ Ak) = p(Ai)p(Aj)p(Ak), for all distinct i, j, k ∈ {1, 2, . . . , n}

...

p(A1 ∩ A2 ∩ · · · ∩ An) = p(A1)p(A2) . . . p(An).
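Returning to the three-event case of Definition 1.21, here is a standard textbook situation (not taken from this module) in which the three pairwise conditions hold but the fourth one fails. Toss a fair coin twice and let A be "the first toss is heads", B be "the second toss is heads", and C be "both tosses show the same face". The sketch below checks all four conditions.

```python
from fractions import Fraction
from itertools import product

omega = {"".join(toss) for toss in product("HT", repeat=2)}   # {HH, HT, TH, TT}


def p(event):
    """Classical probability for two tosses of a fair coin."""
    return Fraction(len(event), len(omega))


A = {w for w in omega if w[0] == "H"}     # first toss is heads
B = {w for w in omega if w[1] == "H"}     # second toss is heads
C = {w for w in omega if w[0] == w[1]}    # both tosses show the same face

pairwise_conditions = [
    p(A & B) == p(A) * p(B),
    p(A & C) == p(A) * p(C),
    p(B & C) == p(B) * p(C),
]
triple_condition = p(A & B & C) == p(A) * p(B) * p(C)   # 1/4 versus 1/8
```

Here any two of the events are independent, yet knowing that A and B both occurred settles C completely, so the three events together are not independent.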

This might look like a difficult definition, but we can usually argue that the events are
independent in a much easier way. For example, we might be able to justify independence by
looking at the way the random experiment is performed. A simple example of independent
events arises when you toss a coin repeatedly. In such an experiment, the results of any subset
of the coin tosses do not have any impact on the other ones.
Example: I toss a coin repeatedly until I observe the first tails at which point I stop. Let
X be the total number of coin tosses. Find p(X = 5).
Answer: Here, the outcome of the random experiment is a number X. The goal is to find
p(X = 5). But what does X = 5 mean? It means that the first 4 coin tosses result in
heads and the fifth one results in tails. Thus the problem is to find the probability of the
sequence HHHHT when tossing a coin 5 times. Note that HHHHT is a shorthand for the
event “(The first coin toss results in heads) and (The second coin toss results in heads) and
(The third coin toss results in heads) and (The fourth coin toss results in heads) and (The
fifth coin toss results in tails).” Since all the coin tosses are independent and the coin is
fair, we can write

p(HHHHT) = p(H)p(H)p(H)p(H)p(T)

= (1/2)(1/2)(1/2)(1/2)(1/2)

= 1/32.

Some people find the problem easier to understand when it is viewed in the following
way. Suppose I never stop tossing the coin, so the outcome of the experiment is always an infinite
sequence of heads and tails. The value X (which we are interested in) is just a function of the
beginning part of the sequence, up to the first tails. If you think about the problem this
way, you need not worry about the stopping time. For this problem it might not make
a big difference conceptually, but for some similar problems this way of thinking might be
beneficial.
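
A minimal simulation sketch of this viewpoint follows: each trial generates a long sequence of tosses in advance, and X is read off as the position of the first tails. The helper names, the sequence length of 20 (a finite stand-in for the infinite sequence), the number of trials and the seed are all assumptions made for illustration.

```python
import random

def first_tail_position(tosses):
    """Return X: the 1-based position of the first 'T' in the sequence."""
    for position, toss in enumerate(tosses, start=1):
        if toss == "T":
            return position
    return None  # no tail in the portion examined

def estimate_p_x_equals(k, trials=20_000, length=20, seed=0):
    """Estimate p(X = k) by generating `trials` fair-coin sequences."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        tosses = [rng.choice("HT") for _ in range(length)]
        if first_tail_position(tosses) == k:
            hits += 1
    return hits / trials

print(first_tail_position("HHHHTHT"))  # 5
print(estimate_p_x_equals(5))          # should be close to 1/32 = 0.03125
```

Note that the tosses after the first tails are generated but never used, which mirrors the remark that X depends only on the beginning of the sequence.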
So far, we have seen that two events A and B are independent if p(A ∩ B) = p(A)p(B).
In the next two results, we examine what independence can tell us about other set operations
such as complements and unions.

Property 1.7. If A and B are independent, then

• A and B^c are independent,

• A^c and B are independent,

• A^c and B^c are independent.

Proof. We prove the first one; the second follows by exchanging the roles of A and B, and
the third follows by applying the first result to the independent events A^c and B. We have

p(A ∩ B^c) = p(A − B)

= p(A) − p(A ∩ B)

= p(A) − p(A)p(B) (since A and B are independent)

= p(A)(1 − p(B))

= p(A)p(B^c).

Thus, A and B^c are independent.
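
As a numerical illustration of Property 1.7, the sketch below checks all four pairs on one roll of a fair die. The sample space, the events A (even number) and B (at most 4) and the helper names are assumptions chosen for this example.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed: one roll of a fair die

def prob(event):
    """Classical probability on the equally likely outcomes 1..6."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def independent(E, F):
    """True when p(E ∩ F) = p(E)p(F)."""
    return prob(E & F) == prob(E) * prob(F)

A = frozenset({2, 4, 6})      # the roll is even
B = frozenset({1, 2, 3, 4})   # the roll is at most 4
A_c, B_c = OMEGA - A, OMEGA - B

checks = [
    independent(A, B),
    independent(A, B_c),
    independent(A_c, B),
    independent(A_c, B_c),
]
print(checks)  # [True, True, True, True]
```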

A word of warning: one common mistake is to confuse independence with
being mutually exclusive. These are completely different concepts. When two events A and
B are mutually exclusive it means that if one of them occurs, the other one cannot occur,
i.e., A ∩ B = ∅. Thus, the occurrence of A gives a lot of information about B (namely, that
B did not occur), which means that they cannot be independent. Let’s make it precise.

Property 1.8. Consider two events A and B, with p(A) ≠ 0 and p(B) ≠ 0. If A and B are
mutually exclusive, then they are not independent.

Proof. Since A and B are mutually exclusive, we have p(A ∩ B) = 0. On the other hand,
p(A)p(B) ≠ 0 because p(A) ≠ 0 and p(B) ≠ 0. Hence p(A ∩ B) ≠ p(A)p(B), and A
and B are not independent.

The following table summarizes the difference between disjointness and independence.

Concept       Meaning                                    Formulas

Disjoint      A and B cannot occur at the same time      A ∩ B = ∅
                                                         p(A ∪ B) = p(A) + p(B)

Independent   A does not give any information about B    p(A|B) = p(A), p(B|A) = p(B)
                                                         p(A ∩ B) = p(A)p(B)
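
The contrast between the two concepts can be checked on a small case. The sketch below assumes one roll of a fair die and three illustrative events; the names prob and cond_prob are hypothetical helpers, not notation from this module.

```python
from fractions import Fraction

OMEGA = frozenset(range(1, 7))  # assumed: one roll of a fair die

def prob(event):
    """Classical probability on the equally likely outcomes 1..6."""
    return Fraction(len(event & OMEGA), len(OMEGA))

def cond_prob(E, F):
    """p(E | F), defined only when p(F) > 0."""
    return prob(E & F) / prob(F)

A = frozenset({1, 2})
B = frozenset({3, 4})       # disjoint from A
C = frozenset({2, 4, 6})    # independent of A

# Disjoint: the union rule holds, but the product rule fails.
print(prob(A | B) == prob(A) + prob(B))   # True
print(prob(A & B), prob(A) * prob(B))     # 0 1/9
print(cond_prob(A, B), prob(A))           # 0 1/3

# Independent: conditioning on C does not change the probability of A.
print(cond_prob(A, C), prob(A))           # 1/3 1/3
```

Knowing that B occurred drops the probability of A from 1/3 to 0, whereas knowing that C occurred leaves it at 1/3.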

Example:

1. Suppose there are two biased coins, each with heads twice as likely to come up as
tails. If both coins are tossed, find the probability of getting a head on the first
coin and a tail on the second coin.

Answer: The two tosses are independent, since the result of the toss of the first coin
does not affect the toss of the second coin, and vice versa. For each coin,
p(H) = 2p(T) and p(H) + p(T) = 1, so p(H) = 2/3 and p(T) = 1/3.
Let H1 be the event of getting a head on the first coin and T2 the event of getting a
tail on the second coin. Then, the probability of getting a head on the first coin
and a tail on the second coin is

p(H1 ∩ T2) = p(H1)p(T2)
           = (2/3)(1/3)
           = 2/9
           ≈ 0.2222.

2. Two sets of cards with a letter on each card as follows are placed into separate bags.

• Bag 1: {I, L, J, A, U }

• Bag 2: {L, R, H, E, C, A}

Sara randomly picked one card from each bag. Find the probability that:

(a) she picked letters J and R.

Answer: (1/5)(1/6) = 1/30

(b) both letters are L.

Answer: (1/5)(1/6) = 1/30

(c) both are vowels.

Answer: (3/5)(2/6) = 1/5
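
Both worked examples can be verified with a short Python sketch. It assumes that every card in a bag is equally likely to be drawn and that the two draws are independent, so all 30 (bag 1, bag 2) pairs are equally likely; the variable and function names are introduced here for illustration only.

```python
from fractions import Fraction
from itertools import product

# Example 1: heads twice as likely as tails, so p(H) = 2/3 and p(T) = 1/3.
p_tail = Fraction(1, 3)
p_head = 2 * p_tail
p_h1_and_t2 = p_head * p_tail

# Example 2: every (card from bag 1, card from bag 2) pair is equally likely.
bag1 = ["I", "L", "J", "A", "U"]
bag2 = ["L", "R", "H", "E", "C", "A"]
pairs = list(product(bag1, bag2))
VOWELS = set("AEIOU")

def prob(condition):
    """Fraction of the equally likely pairs that satisfy the condition."""
    favourable = [pair for pair in pairs if condition(*pair)]
    return Fraction(len(favourable), len(pairs))

p_j_and_r = prob(lambda first, second: (first, second) == ("J", "R"))
p_both_l = prob(lambda first, second: first == second == "L")
p_both_vowels = prob(lambda first, second: first in VOWELS and second in VOWELS)

print(p_h1_and_t2, p_j_and_r, p_both_l, p_both_vowels)  # 2/9 1/30 1/30 1/5
```

Counting pairs directly gives the same values as multiplying the per-bag probabilities, as expected for independent draws.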
