Probability Lecture Notes: 1 Definitions


Probability Lecture Notes

Zohair Raza Hassan

Friday, 4th October, 2019

1 Definitions
This section includes some basic definitions that we must go over to
understand the lecture.
• An experiment is a random process that leads to a set of
outcomes. e.g. flipping a coin is an experiment
• The entire set of outcomes is known as the sample space of said
experiment. e.g. the sample space of a coin flip is {Head, Tail}
• An event is a subset of the sample space. e.g. if you roll a
die, some possible events include: an even number is rolled, a
prime number is rolled, and a number between two and four
(inclusive) is rolled. In these cases, the sample space is
{1, 2, 3, 4, 5, 6}, and the corresponding subsets are {2, 4, 6},
{2, 3, 5}, and {2, 3, 4}, respectively
• The probability of an event is the likelihood of the event
occurring; it quantifies the chances of the event happening.
Probabilities must lie between 0 and 1 (inclusive). e.g. for a fair
die there are six possible outcomes, and since each outcome is
equally likely, each outcome has a probability of 1/6. The
probability of an event, A, is denoted as P(A)
• When all outcomes in a sample space are equally likely, we can
use the following formula to calculate the probability of an event,
A, occurring:
$$P(A) = \frac{\text{number of outcomes in } A}{\text{number of outcomes in the sample space}}$$

• The probability that an event A occurs, given that an event B
occurs is denoted as P (A|B)
• Two events are disjoint if the outcomes in their sets do not
overlap. Example of disjoint events: getting different numbers on a
dice roll. Probabilities of disjoint events/outcomes can be added.
Example of non-disjoint events: the event that the outcome of a die
is prime and the event that it is odd. The “additive” rule for any two
events can be stated as P(A or B) = P(A) + P(B) − P(A and B); for
disjoint events, P(A and B) = 0, so the rule reduces to a plain sum
• Two events are independent if the occurrence of one event does
not affect the occurrence of the other. For two independent events
A, B: P(A|B) = P(A), P(B|A) = P(B), and P(A and B) =
P(A) × P(B). e.g. rolling multiple dice; flipping multiple coins;
rolling a die and then flipping a coin; the same example as before,
but this time if the coin is heads the outcome of the die is doubled
• In general, P (A and B) = P (A|B) × P (B) = P (B|A) × P (A)
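
The definitions above can be checked on a single roll of a fair die. The following is a minimal sketch in Python, assuming events are represented as sets of outcomes; the helper names `prob` and `cond_prob` are illustrative and do not come from the lecture.

```python
from fractions import Fraction

# Sample space of one roll of a fair die (all outcomes equally likely).
SAMPLE_SPACE = frozenset({1, 2, 3, 4, 5, 6})

def prob(event):
    """P(A) = (outcomes in A) / (outcomes in the sample space)."""
    return Fraction(len(event & SAMPLE_SPACE), len(SAMPLE_SPACE))

def cond_prob(a, b):
    """P(A|B), obtained by rearranging P(A and B) = P(A|B) x P(B)."""
    return prob(a & b) / prob(b)

even = {2, 4, 6}
prime = {2, 3, 5}
odd = {1, 3, 5}

# Additive rule: P(A or B) = P(A) + P(B) - P(A and B)
p_prime_or_odd = prob(prime) + prob(odd) - prob(prime & odd)
assert p_prime_or_odd == prob(prime | odd)

print(prob(even))             # 1/2
print(p_prime_or_odd)         # 2/3
print(cond_prob(prime, odd))  # 2/3
```

Since P(prime|odd) = 2/3 differs from P(prime) = 1/2, the events "prime" and "odd" on the same roll are not independent.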

Figure 1: Why conditional probability matters

2 Application 1: Risk Minimization in Ludo


Let’s play some Ludo. We will explore the settings in Sections 2.1
and 2.2. You are the blue player, and your adversary is red. Which
move will minimize your chance of being killed in the next turn?
Remember, in Ludo, you roll again if you roll a six. However, rolling
a six three times in a row forfeits your turn. In these examples, we
are assuming that after a token is killed, an extra turn is not given.

2.1 Example 1

Figure 2: The current state is shown on the top, while the two possible
moves are shown at the bottom.

It’s currently your turn and you’ve rolled a 1.


Moving the first token. In this case, both of your tokens are in
a safe spot and you can’t be killed in the next turn:

P (Red kills blue) = 0

Moving the second token. In this case, your tokens are not
in a safe spot and you can be killed in the next turn:

$$P(\text{Red kills blue}) = 2 \times \left(\frac{1}{6} + \frac{1}{6^2} + \frac{1}{6^3}\right) \approx 0.3981$$

2.2 Example 2
It’s currently your turn and you’ve rolled a 3.
Moving the first token. In this case, your tokens are
three and five spaces away from the red token. We can use the same
calculation as in the latter case of the last example to show that:

P (Red kills blue) ≈ 0.3981

Moving the second token. In this case, your tokens are
two and six spaces away from the red token:

Figure 3: The current state is shown on the top, while the two possible
moves are shown at the bottom.

$$P(\text{Red kills blue}) = \frac{1}{6} + \frac{5}{6^2} + \frac{5}{6^3} \approx 0.3287$$
Therefore, it is better to move the second token.
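
The arithmetic in the two examples can be double-checked with exact fractions. This is only a sketch that evaluates the expressions as they are written in these notes; it does not model the Ludo board, and the variable names are illustrative.

```python
from fractions import Fraction

sixth = Fraction(1, 6)

# Example 1, moving the second token: 2 x (1/6 + 1/6^2 + 1/6^3)
p_example1 = 2 * (sixth + sixth**2 + sixth**3)

# Example 2, moving the second token: 1/6 + 5/6^2 + 5/6^3
p_example2 = sixth + 5 * sixth**2 + 5 * sixth**3

print(p_example1, round(float(p_example1), 4))  # 43/108 0.3981
print(p_example2, round(float(p_example2), 4))  # 71/216 0.3287
```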

3 Application 2: Appreciating Music

Figure 4: An excerpt from Beethoven’s Fifth Symphony.

The following experiment illustrates how difficult it would be
to write the kind of music Beethoven composed; we show that there
must be some methodology behind his art, as a completely random
process is unlikely to produce his famous symphony.
Imagine there is a monkey playing the piano, randomly. There
are a total of 88 keys on the piano. However, one can choose not
to play a note as well (this is called a rest), giving us a total of 89
musical possibilities for each note. Furthermore, there are multiple
beats one can play too, i.e. the selected note can be played
for different amounts of time. We assume that the tempo is fixed, and
that there are six possibilities for the beats (see Figure 5).

Figure 5: Different beats of musical notes and rests.

Assuming that all keys and beats are equally probable, the
probability that a specific note is played for a specific amount of
time is:
$$\frac{1}{89} \times \frac{1}{6} = \frac{1}{534}$$
The probability of getting one note right is already quite low. Given
that each note is played independently, one can say that for n notes
the probability of playing them all right is:
$$\left(\frac{1}{534}\right)^{n}$$
There are 359 notes in Beethoven’s Fifth Symphony, meaning that
the probability that a monkey playing randomly on the piano is able
to stumble on Beethoven’s Fifth is:
$$\left(\frac{1}{534}\right)^{359}$$
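
To get a feel for how small this number is, it helps to look at its order of magnitude. The sketch below, in Python, uses the figures quoted in the text (89 possibilities per note, a factor of 6 for the beats, 359 notes); the constant names are illustrative. It works with logarithms because the probability itself is far too small to store as an ordinary floating-point number.

```python
import math

KEYS_AND_REST = 89  # 88 piano keys plus a rest
DURATIONS = 6       # the factor 1/6 used in the text
NOTES = 359         # number of notes quoted in the text

# log10 of (1/534)^359; computing the power directly would underflow to 0.0.
log10_p = -NOTES * math.log10(KEYS_AND_REST * DURATIONS)

print(f"probability is about 10^{log10_p:.1f}")
```

The result is on the order of 10^-979, that is, a decimal point followed by nearly a thousand zeros before the first non-zero digit.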

4 More Definitions
• A random variable is a variable whose value is determined
by a random event. e.g. Let X be a random variable dependent
on the outcome of rolling a die:
X = the outcome of the dice roll

• The expectation of a random variable is the “average” value of
the random variable. e.g. the expected value of a dice roll is
(1 + 2 + 3 + 4 + 5 + 6)/6 = 3.5

• The Law of Large Numbers states that if a random experiment
is repeated enough times, the average of the observed values of
the random variable will approach the expectation
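
The expectation and the Law of Large Numbers can be illustrated with simulated dice rolls. This is a minimal sketch in Python; the function name `average_roll` and the fixed seed are arbitrary choices made for illustration.

```python
import random
from fractions import Fraction

# Expectation of a fair dice roll: each face weighted by its probability 1/6.
expectation = sum(Fraction(face, 6) for face in range(1, 7))
print(expectation)  # 7/2

def average_roll(trials, rng):
    """Average of `trials` simulated rolls of a fair die."""
    return sum(rng.randint(1, 6) for _ in range(trials)) / trials

rng = random.Random(0)  # fixed seed (arbitrary) so runs are repeatable
for trials in (10, 1_000, 100_000):
    # The average should drift towards 3.5 as the number of trials grows.
    print(trials, average_roll(trials, rng))
```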

5 Application 3: Estimating Pi
Let’s say you have a unit circle inscribed within a 2 × 2 square. We will
now throw a dart at the square such that it is equally likely to land on
each point on the square. The probability that it will land in the circle
is the ratio of the area of the circle to the area of the square:

$$P(\text{Dart lands in circle}) = \frac{\pi \times 1^2}{2^2} = \frac{\pi}{4}$$
Now, let’s define a random variable:

X = 1 if the dart lands in the circle, 0 otherwise


The expectation of this variable is π/4. Due to the Law of Large
Numbers, we know that if we repeatedly throw darts, the fraction of
darts that land in the circle approaches this expectation:
$$\frac{X_1 + X_2 + \cdots + X_k}{k} \approx \frac{\pi}{4}$$
where X_i is the random variable X at the i-th trial. We can now
estimate the value of π:
$$\pi \approx 4 \times \frac{\text{number of darts inside the circle}}{\text{total number of darts thrown}}$$
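
The dart experiment is easy to simulate. The following is a minimal sketch in Python, assuming the square is [-1, 1] × [-1, 1] with the unit circle centred at the origin; the function name `estimate_pi` and the seed are illustrative choices.

```python
import random

def estimate_pi(darts, rng):
    """Estimate pi from `darts` uniformly random points in the 2 x 2 square."""
    inside = 0
    for _ in range(darts):
        x = rng.uniform(-1.0, 1.0)
        y = rng.uniform(-1.0, 1.0)
        if x * x + y * y <= 1.0:  # X = 1: the dart landed in the circle
            inside += 1
    return 4 * inside / darts

rng = random.Random(0)  # fixed seed (arbitrary) so runs are repeatable
for darts in (100, 10_000, 100_000):
    print(darts, estimate_pi(darts, rng))
```

The estimate is noisy for a small number of darts and settles near 3.14 as more are thrown, which is the Law of Large Numbers at work.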

6 Application 4: Artificial Intelligence


Probability theory provides the basis for Machine Learning – without
it we cannot prove that learning is possible. For a more direct exam-
ple of the impact of probability theory, one can use the example of
Reinforcement Learning.
A neural network is simply a magic box that takes input and shoots
out output. The network is told what it does right and what it does
wrong, and it adjusts itself accordingly so that it gets the correct
output for each input more often.

Similarly, reinforcement learning via neural networks is done by
telling the network what state the game is in, and asking it to
output one of multiple valid moves to make within the game. The
equation behind this process is given in Figure 6. (Source:
medium.com/@jonathan_hui/rl-model-based-reinforcement-learning-3c2b6f0aa323).
This equation simply tells the network to increase the probability of
more favourable moves based on rewards given to the network.

Figure 6: Reinforcement learning objective function.

7 Bonus Application: Fair Games


This application was skipped in class due to a lack of time.
Say that you’re playing the following game with someone: you flip
a coin; if the outcome is heads, you give him 10 rs., but if the outcome
is tails, he has to give you 10 rs. Is this a fair game? Yes, because the
probability of you losing is equal to the probability of you winning, i.e.
the game is not biased towards a specific person and depends entirely
on luck.
Now let’s consider another game, with a new, more devious
adversary. You flip three coins, one after the other. If the sequence
Head, Head, Head shows up, you get 10 rs. On the other hand, if the
sequence Tail, Head, Head shows up, she gets 10 rs. You agree because
this seems like a fair game: both of you have a 1/8 chance of winning.
Let’s say you start flipping coins and end up with the following
sequence: Head, Tail, Head. This doesn’t match either of your sequences.
Now what? She suggests that instead of stopping, you just continue
flipping coins till one of your sequences shows up. Without thinking too
much, you agree. But after many turns, you realize you’ve made a
mistake; the game is biased. You can only win if three heads show up
in the first three flips, a probability of 1/8. However, if a fourth flip is
required, you will always lose because a tails would always precede a
sequence of three heads (actually two, since your adversary would win
at the second heads).
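
The bias in the second game can be seen by simulating it. This is a minimal sketch in Python, assuming a fair coin that is flipped until one of the two sequences appears; the function names `play` and `win_rate` and the seed are illustrative.

```python
import random

def play(rng):
    """Flip until HHH (you) or THH (adversary) appears; return the winner."""
    last_three = ""
    while True:
        # Keep only the three most recent flips.
        last_three = (last_three + rng.choice("HT"))[-3:]
        if last_three == "HHH":
            return "you"
        if last_three == "THH":
            return "adversary"

def win_rate(games, rng):
    """Fraction of simulated games won by the Head, Head, Head player."""
    return sum(play(rng) == "you" for _ in range(games)) / games

# Should come out close to 1/8 = 0.125, as argued in the text.
print(win_rate(100_000, random.Random(0)))
```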
