Elements of Probability
O. S. Makinde
1 Introduction
Life is filled with uncertainty. Some examples of events involving uncertainty are:
• What is the chance that your cowpea seedlings will grow to maturity if it rains every day?
• What is the chance that your car will be towed away if you park illegally in the visitors’
parking lot?
• What is the likelihood of scoring an A in STA 224 if you start reading two weeks before the
examination?
Definition 1 Probability is a numerical measure of the likelihood that an event will occur.
Thus, probabilities can be used as measures of the degree of uncertainty associated with the three
events previously listed. If probabilities are available, we can determine the likelihood of each
event occurring. Probability values are always assigned on a scale from 0 to 1. A probability near
zero indicates an event is unlikely to occur; a probability near 1 indicates an event is almost certain
to occur. Other probabilities between 0 and 1 represent degrees of likelihood that an event will
occur. For example, if we consider the event rain tomorrow, we understand that when the weather
report indicates a near-zero probability of rain, it means almost no chance of rain. However, if
a 0.90 probability of rain is reported, we know that rain is likely to occur. A 0.50 probability
indicates that rain is just as likely to occur as not.
Table 1: Some experiments and their outcomes.
Definition 2 The sample space for an experiment is the set of all experimental outcomes.
An experimental outcome, also called a sample point, is an element of the sample space.
Consider the experiment of tossing a coin. Let S denote the sample space; then we can define
S mathematically for this experiment as S = {head, tail}. For the experiment of rolling a die,
the sample space is S = {1, 2, 3, 4, 5, 6}. Also, consider an experiment of tossing two coins one
after the other. The sample space is S = {(H, H), (H, T), (T, H), (T, T)}.
The counting rule for multiple-step experiments makes it possible to determine the number of
experimental outcomes without listing them. If an experiment can be described as a sequence of
k steps with n1 possible outcomes on the first step, n2 possible outcomes on the second step, and
so on, then the total number of experimental outcomes is n1 × n2 × . . . × nk.
Looking at the experiment of tossing two coins as a sequence of first tossing one coin (n1 = 2)
and then tossing the other coin (n2 = 2), the counting rule shows that (2)(2) = 4 distinct
experimental outcomes are possible: S = {(H, H), (H, T), (T, H), (T, T)}.
A useful counting rule allows one to count the number of experimental outcomes when the experiment
involves selecting n objects from a (usually larger) set of N objects. It is called the counting
rule for combinations:

C_n^N = \binom{N}{n} = \frac{N!}{n!(N − n)!}    (1)

where N! = N × (N − 1) × (N − 2) × . . . × 2 × 1, n! = n × (n − 1) × (n − 2) × . . . × 2 × 1 and, by
definition, 0! = 1. The notation ! means factorial; for example, 3 factorial is 3! = (3)(2)(1) = 6.
The number of ways that n objects can be grouped into r classes with n_i in the ith class,
i = 1, 2, . . . , r, and \sum_{i=1}^{r} n_i = n is

\binom{n}{n_1\, n_2\, \cdots\, n_r} = \frac{n!}{n_1!\, n_2!\, \cdots\, n_r!}.
As an illustration of the counting rule for combinations, consider a quality control procedure in
which an inspector randomly selects two of five parts to test for defects. In a group of five parts,
how many combinations of two parts can be selected? The counting rule in equation (1) shows
that with N = 5 and n = 2, we have
C_2^5 = \binom{5}{2} = \frac{5!}{2!(5 − 2)!} = \frac{(5)(4)(3)(2)(1)}{(2)(1)(3)(2)(1)} = \frac{120}{12} = 10.
Thus, 10 outcomes are possible for the experiment of randomly selecting two parts from a group
of five.
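These counting rules are easy to check numerically. The following is a minimal sketch in Python; the multinomial helper is our own illustration, since the standard library provides factorials, combinations, and permutations directly but not multinomial coefficients.

    import math

    # Counting rule for combinations, equation (1): C(5, 2) = 10.
    print(math.comb(5, 2))  # 10

    # Grouping n objects into r classes: n! / (n1! n2! ... nr!).
    def multinomial(*group_sizes):
        """Number of ways to group sum(group_sizes) objects into the given classes."""
        n = sum(group_sizes)
        result = math.factorial(n)
        for k in group_sizes:
            result //= math.factorial(k)  # exact: intermediate quotients are integers
        return result

    print(multinomial(2, 3))  # 10, the same as C(5, 2) when there are two classes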
Another counting rule that is sometimes useful is the counting rule for permutations. It allows
one to compute the number of experimental outcomes when n objects are to be selected from a set
of N objects where the order of selection is important. A permutation is an ordered arrangement
of objects.
The number of permutations of N objects taken n at a time is

P_n^N = n!\binom{N}{n} = \frac{N!}{(N − n)!}.

For example, in selecting two of the five objects A, B, C, D,
and E, the 20 permutations are AB, BA, AC, CA, AD, DA, AE, EA, BC, CB, BD, DB, BE, EB,
CD, DC, CE, EC, DE, and ED.
In sampling from a finite population of size N , the counting rule for combinations is used to
find the number of different samples of size n that can be selected.
Example
How many ways can four children be lined up?
Solution: This corresponds to sampling without replacement. By Definition 4, there are 4! =
4 × 3 × 2 × 1 = 24 different line-ups.
Example
In Nigeria some license plates have eight characters: three letters followed by three numbers, and
then two letters (for example, LSD 422 AA). How many distinct such plates are possible?
Solution: There are 26^5 different ways to choose the five letters (a, b, . . . , y, z; 26 choices each) and
10^3 = 1000 ways to choose the three numbers (0, 1, . . . , 9; 10 choices each). Using the multiplication
rule, there are 26^5 × 10^3 = 11,881,376,000 different plates.
Example
Suppose out of 15 objective questions in STA 122, eight are to be answered. How many different
sets of questions are possible?
Solution: Since the order of the questions does not matter, the counting rule for combinations gives

\binom{15}{8} = \frac{15 × 14 × 13 × 12 × 11 × 10 × 9 × 8}{8!} = 6435

different sets of questions.
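The three worked examples above can be verified in the same way; a short Python check, where the printed values follow directly from the counting formulas:

    import math

    print(math.factorial(4))   # 24 ways to line up four children
    print(26**5 * 10**3)       # 11881376000 distinct license plates
    print(math.comb(15, 8))    # 6435 different sets of 8 questions out of 15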
A tree diagram is a graphical representation that helps in visualizing a multiple-step experiment. The counting
rule and tree diagram help a project manager identify the experimental outcomes and determine
the possible project completion times.
2 Assigning Probabilities
2.1 Basic Requirement for Assigning Probabilities
1. The probability assigned to each experimental outcome must be between 0 and 1, inclusive.
If we let E_i denote the ith experimental outcome and P(E_i) its probability, then this
requirement can be written as

0 ≤ P(E_i) ≤ 1 for all i.    (3)
2. The sum of the probabilities for all the experimental outcomes must equal 1.0. For n experimental
outcomes, this requirement can be written as

P(E_1) + P(E_2) + . . . + P(E_n) = 1.    (4)
2.2 Methods of Assigning Probabilities
2.2.1 Classical method
The classical method of assigning probabilities is appropriate when all the experimental outcomes
are equally likely. If n experimental outcomes are possible, a probability of 1/n is assigned to
each experimental outcome. When using this approach, the two basic requirements for assigning
probabilities are automatically satisfied.
Example
In rolling a die, six possible outcomes are equally likely, and hence each outcome is assigned a
probability of 1/6. Let P(1) denote the probability that one dot appears on the upward face
of the die, then P (1) = 1/6. Similarly, P (2) = 1/6, P (3) = 1/6, P (4) = 1/6, P (5) = 1/6, and
P (6) = 1/6. Note that these probabilities satisfy the two basic requirements of equations (3) and
(4) because each of the probabilities is greater than or equal to zero and they sum to 1.0.
Example
For the experiment involving tossing two coins, there are four equally likely experimental outcomes,
so each is assigned probability 1/4:
P (H, H) = 1/4, P (H, T ) = 1/4, P (T, H) = 1/4, P (T, T ) = 1/4
Recall that the sample space is S = {(H, H), (H, T ), (T, H), (T, T )}.
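To make the classical method concrete, the two-coin sample space and its equally likely probabilities can be enumerated directly; a minimal Python sketch:

    from itertools import product

    # Sample space for tossing two coins one after the other.
    sample_space = list(product(["H", "T"], repeat=2))
    print(sample_space)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

    # Classical method: each of the four equally likely outcomes gets probability 1/4.
    probs = {outcome: 1 / len(sample_space) for outcome in sample_space}
    print(probs[("H", "T")])  # 0.25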
2.2.2 Relative frequency method
The relative frequency method of assigning probabilities is appropriate when data are available
to estimate the proportion of the time the experimental outcome will occur if the experiment is
repeated a large number of times.
Example:
In testing market evaluation of a new product, 400 potential customers were contacted; 100 actually
purchased the product but 300 did not. In effect, the experiment of contacting customers
was repeated 400 times, and the product was purchased 100 times. Using
the relative frequency approach, we can assign a probability of 100/400 = 0.25 to the experimental
outcome of a potential customer purchasing the product. Similarly, 300/400 = 0.75 is assigned to
the experimental outcome of a potential customer not purchasing the product.
2.2.3 Subjective method
The subjective method of assigning probabilities is most appropriate when one cannot realistically
assume that the experimental outcomes are equally likely and when little relevant data are avail-
able. When the subjective method is used to assign probabilities to the experimental outcomes,
we may use any information available, such as our experience or intuition. After considering all
available information, a probability value that expresses our degree of belief (on a scale from 0 to
1) that the experimental outcome will occur is specified. Because subjective probability expresses
a person's degree of belief, it is personal.
Using the subjective method, different people can be expected to assign different probabilities
to the same experimental outcome. The subjective method requires extra care to ensure that the
two basic requirements of equations (3) and (4) are satisfied.
Example:
Tom and Judy Elsbernd make an offer to purchase a house. Two outcomes are possible: E1 = their
offer is accepted, and E2 = their offer is rejected.
Judy believes that the probability their offer will be accepted is 0.8; thus, Judy would set P (E1 ) =
0.8 and P (E2 ) = 0.2. Tom, however, believes that the probability that their offer will be accepted
is 0.6; hence, Tom would set P(E1) = 0.6 and P(E2) = 0.4. Note that Tom's probability estimate
for E1 reflects a greater pessimism that their offer will be accepted.
Example
Suppose that a room contains n people. What is the probability that at least two of them have a
common birthday, assuming that every day of the year is equally likely to be a birthday and
disregarding leap years?
Solution: Let A denote the event that there are at least two people with a common birthday. It is
easier to find P(A^c) than to find P(A) in this case because A can happen in many ways, whereas
A^c is simpler. There are 365^n possible outcomes, and A^c can occur in 365 × 364 × . . . × (365 − n + 1)
ways. Thus

P(A^c) = \frac{365 × 364 × \cdots × (365 − n + 1)}{365^n}

and

P(A) = 1 − P(A^c) = 1 − \frac{365 × 364 × \cdots × (365 − n + 1)}{365^n}.
The table below exhibits the probability of at least two people having a common birthday for
various values of n.

n     P(A)
4     0.016
23    0.507
40    0.891
56    0.988
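The table is straightforward to reproduce; a minimal Python sketch of the computation:

    def p_shared_birthday(n):
        """P(at least two of n people share a birthday), disregarding leap years."""
        p_all_distinct = 1.0
        for k in range(n):
            p_all_distinct *= (365 - k) / 365
        return 1 - p_all_distinct

    for n in (4, 23, 40, 56):
        print(n, round(p_shared_birthday(n), 3))  # 0.016, 0.507, 0.891, 0.988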
Example
How many people must you ask in order to have a 50:50 chance of finding someone who shares your birthday?
Solution: Suppose n people are asked, and let A denote the event that someone's birthday is the
same as yours. The total number of outcomes is 365^n, and the total number of ways that A^c can
happen is 364^n. Thus
P(A^c) = \frac{364^n}{365^n} = \left(\frac{364}{365}\right)^n

and

P(A) = 1 − \frac{364^n}{365^n} = 1 − \left(1 − \frac{1}{365}\right)^n.

To find n, we set P(A) = 0.5 in order to have a 50:50 chance of finding someone who shares your
birthday. That is,

1 − \frac{364^n}{365^n} = 0.5,

which implies

\frac{364^n}{365^n} = 0.5.

Taking logarithms of both sides, we obtain

n \log\frac{364}{365} = \log 0.5,

so that

n = \frac{\log 0.5}{\log\frac{364}{365}} = \frac{−0.30103}{−0.0011915} = 252.65.    (5)
Hence you need to ask 253 people in order to have a 50:50 chance of finding someone who shares your
birthday.
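The same answer can be obtained numerically; a one-line check in Python:

    import math

    # Smallest n with P(A) >= 0.5, where P(A) = 1 - (364/365)**n.
    n = math.log(0.5) / math.log(364 / 365)
    print(n)  # 252.65..., so 253 people are needed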
Complement of an event
Given an event A, the complement of A is defined to be the event consisting of all sample points
that are not in A. The complement of A is denoted by A^c. The probability of A can be computed
using the complement:

P(A) = 1 − P(A^c).

This equation shows that the probability of an event A can be computed easily if the
probability of its complement, P(A^c), is known.
Definition 7 The union of A and B is the event containing all sample points belonging to A or
B or both. The union is denoted by A ∪ B. Given two events A and B, the intersection of A
and B is the event containing the sample points belonging to both A and B. The intersection is
denoted by A ∩ B.
Definition 8 Two events are said to be mutually exclusive if the events have no sample points in
common.
In general, the addition law is P(A ∪ B) = P(A) + P(B) − P(A ∩ B). For mutually exclusive
events, P(A ∩ B) = 0, so the addition law reduces to P(A ∪ B) = P(A) + P(B).
Definition 9 Prior probabilities are initial estimates of the probabilities of events.
Posterior probabilities are revised probabilities of events based on additional information.
Bayes theorem is a method used to compute posterior probabilities.
Computing the ratio of a joint probability to a marginal probability provides the following general
formula for posterior probability calculations for two events A and B:

P(A|B) = \frac{P(A ∩ B)}{P(B)}  and  P(B|A) = \frac{P(A ∩ B)}{P(A)}.
Independent events
Two events A and B are independent if P(A|B) = P(A) or, equivalently, P(B|A) = P(B); that is,
the events have no influence on each other. Otherwise, the events are dependent. For independent
events, the multiplication law gives

P(A ∩ B) = P(A) × P(B).
Given n mutually exclusive events A_1, A_2, . . . , A_n whose union is the entire sample space, Bayes
theorem can be used to compute any posterior probability P(A_i|B) as

P(A_i|B) = \frac{P(A_i)P(B|A_i)}{P(A_1)P(B|A_1) + P(A_2)P(B|A_2) + . . . + P(A_n)P(B|A_n)}.
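As an illustration, Bayes theorem translates directly into code. The sketch below is generic; the prior and likelihood numbers in the usage lines are hypothetical, chosen only to show the call.

    def posterior(priors, likelihoods):
        """P(Ai | B) for mutually exclusive events Ai that cover the sample space.

        priors[i] = P(Ai); likelihoods[i] = P(B | Ai).
        """
        joints = [p * l for p, l in zip(priors, likelihoods)]
        total = sum(joints)  # P(B), the denominator in Bayes theorem
        return [j / total for j in joints]

    # Hypothetical two-event illustration: P(A1) = 0.65, P(A2) = 0.35,
    # P(B | A1) = 0.02, P(B | A2) = 0.05.
    print(posterior([0.65, 0.35], [0.02, 0.05]))  # about [0.426, 0.574]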
Example
Suppose that we have two events, A and B, with P (A) = 0.70, P (B) = 0.40, and P (A ∩ B) = 0.30.
Find P (A ∪ B), P (A|B) and P (B|A).
Solution: P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.70 + 0.40 − 0.30 = 0.80
P(A|B) = \frac{P(A ∩ B)}{P(B)} = \frac{0.30}{0.40} = 0.75  and  P(B|A) = \frac{P(A ∩ B)}{P(A)} = \frac{0.30}{0.70} = 0.429.
2. When drawing a random sample without replacement from a population of size N , the
counting rule for combinations is used to find the number of different samples of size n that
can be selected.
3. The sample space, S, is an event. Because it contains all the experimental outcomes, it has
a probability of 1; that is, P (S) = 1.
4. When the classical method is used to assign probabilities, the assumption is that the ex-
perimental outcomes are equally likely. In such cases, the probability of an event can be
computed by counting the number of experimental outcomes in the event and dividing the
result by the total number of experimental outcomes.
5. Do not confuse the notion of mutually exclusive events with that of independent events. Two
events with nonzero probabilities cannot be both mutually exclusive and independent. If one
mutually exclusive event is known to occur, the other cannot occur; thus, the probability of
the other event occurring is reduced to zero. They are therefore dependent.
6. If the union of events is the entire sample space, the events are said to be collectively
exhaustive.
7. Bayes theorem is used extensively in decision analysis. The prior probabilities are often
subjective estimates provided by a decision maker. Sample information is obtained and
posterior probabilities are computed for use in choosing the best decision.
8. An event and its complement are mutually exclusive, and their union is the entire sample
space. Thus, Bayes theorem is always applicable for computing posterior probabilities of an
event and its complement.
5 Probability Distributions
Definition 10 Given a random experiment with sample space S, a random variable X is a set
function that assigns one and only one real number to each element s that belongs to the sample
space S.
Definition 11 The set of all possible values of the random variable X, denoted x, is called the
support, or space, of X.
Definition 12 A random variable is a discrete random variable if there is a finite or countably
infinite number of possible outcomes of X. The probability distribution of a discrete random
variable is called a discrete probability distribution.
Example
A student is selected at random from a class of STA 202 students with CGPA ≥ 3.0 (denoted by
U) and CGPA < 3.0 (denoted by L). Once selected, the academic performance (U or L) of the
student is noted. Thus the sample space is {U, L}. We may define the random variable X as

X = 1 if the student's performance is U, and X = 2 if the student's performance is L.

Here the random variable X assigns one and only one real number (1 and 2) to each element of
the sample space (U and L). The support, or space, of X is {1, 2}. Also, there is a finite (two, to
be exact) number of possible outcomes.
Definition 13 A random variable is a continuous random variable if there is an uncountably
infinite number of possible outcomes of X. The probability distribution of a continuous random
variable is called a continuous probability distribution.
A continuous random variable differs from a discrete random variable in that it takes on an
uncountably infinite number of possible outcomes. For example, if we let X denote the height (in
meters) of a randomly selected orange tree, then X is a continuous random variable.
A probability distribution assigns a probability value to each measurable subset of the possible
outcomes of a random experiment, survey, or procedure of statistical inference. A probability
distribution can be discrete or continuous.

For a discrete random variable X, the probability function is written

P(X = x) = P(x).

The sum of the probability function P(X = x) over the support of the random variable is one;
that is,

\sum_j P_j = 1,

where j runs over all possible values x_j that X can take and P_j is the probability at x_j.

For a continuous random variable X with probability density function f(x), the integral of the
density over the support of the random variable is one; that is,

\int_{−∞}^{∞} f(x)\, dx = 1.
5.1 Binomial Distribution
Suppose that n independent experiments or trials are performed, where n is a fixed number, and
that each experiment results in a “success” with probability p and a “failure” with probability
1 − p. The total number of successes, X, is a binomial random variable with parameters n and p.
The probability mass function of a binomial random variable X is defined as
P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x},  x = 0, 1, 2, . . . , n.
It should be noted that the probability of a success, denoted by p, remains constant from trial to
trial and that the repeated trials are independent. Each trial results in an outcome that may be
classified as a success or a failure (hence the name, binomial). The mean of a binomial random
variable is

E[X] = np,

and its variance is Var(X) = np(1 − p).
Example
Tay-Sachs disease is a rare but fatal disease of genetic origin occurring chiefly in infants and
children, especially those of eastern European extraction. If a couple are both carriers of Tay-Sachs
disease, a child of theirs has probability 0.25 of being born with the disease. If such a couple
has four children, what is the probability that

1. none of the children has the disease?
2. exactly one child has the disease?
3. all four children have the disease?
Solution
Let X be the number of children with the disease. The total number of children is n = 4 and the
probability of disease is p = 0.25.

1. P(X = 0) = \binom{4}{0}(0.25)^0(1 − 0.25)^{4−0} = (0.75)^4 = 0.316

2. P(X = 1) = \binom{4}{1}(0.25)^1(1 − 0.25)^{4−1} = 4 × 0.25 × 0.75^3 = 0.422

3. P(X = 4) = \binom{4}{4}(0.25)^4(1 − 0.25)^{4−4} = 0.25^4 = 0.004
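The three probabilities above can be checked with a short Python function implementing the binomial probability mass function:

    import math

    def binom_pmf(x, n, p):
        """P(X = x) for a binomial random variable with parameters n and p."""
        return math.comb(n, x) * p**x * (1 - p)**(n - x)

    # Tay-Sachs example: n = 4 children, p = 0.25.
    for x in (0, 1, 4):
        print(x, round(binom_pmf(x, 4, 0.25), 3))  # 0.316, 0.422, 0.004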
Example
If a single bit (0 or 1) is transmitted over a noisy communications channel, it has probability p
of being incorrectly transmitted. To overcome this problem, the bit is transmitted n times. A
decoder at the receiving end, called a majority decoder, decides that the correct message is that
carried by a majority of the received bits. Under a simple noise model, each bit is independently
subject to being corrupted with the same probability p. Suppose n = 5 and p = 0.1; find the
probability that the message is correctly received.
Solution
The probability that the message is correctly received is the probability of two or fewer errors.
The word success is used in a generic sense; here a success is an error. Denote the number of
bits that are in error by X. Then X is a binomial random variable with n trials and probability
p of success on each trial.
Probability that the message is correctly received = P(two or fewer errors)
= P(X = 0) + P(X = 1) + P(X = 2)
= \binom{5}{0}(0.1)^0(1 − 0.1)^{5−0} + \binom{5}{1}(0.1)^1(1 − 0.1)^{5−1} + \binom{5}{2}(0.1)^2(1 − 0.1)^{5−2}
= 0.9^5 + 5 × 0.1 × 0.9^4 + 10 × 0.1^2 × 0.9^3 = 0.99144.
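The same calculation in Python, summing the binomial probabilities of at most two errors:

    import math

    # P(at most 2 of 5 bits in error) with per-bit error probability p = 0.1.
    p_correct = sum(math.comb(5, x) * 0.1**x * 0.9**(5 - x) for x in range(3))
    print(round(p_correct, 5))  # 0.99144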
5.2 Poisson Distribution
A random variable X has a Poisson distribution with parameter λ > 0 if its probability mass
function is

P(X = x) = \frac{λ^x e^{−λ}}{x!},  x = 0, 1, 2, . . .

The Poisson distribution can be derived as the limit of a binomial distribution as the number of
trials n approaches infinity and the probability of success on each trial p approaches zero in
such a way that np = λ.
For the Poisson distribution, the rate at which events occur is constant and events are independent.
5.2.1 Basic properties of the Poisson distribution
E[X] = λ.
Var(X) = λ.
Example
Let X be the number of boreholes a company drills in a town, with a mean of 3 boreholes per
town in Nigeria. What is the probability that the company drills

3. The probability that the company drills at least one borehole in a randomly selected town is
P(X ≥ 1). So,
P(X ≥ 1) = 1 − P(X = 0) = 1 − e^{−3} = 1 − 0.0498 = 0.9502.
Example
Suppose that an office receives telephone calls as a Poisson process with λ = 0.5 calls per minute.
The number of calls in a 5-minute interval follows a Poisson distribution with parameter
ω = 5λ = 2.5. What is the probability of the following?
1. no call
2. exactly one call
Solution:
P(X = x) = \frac{ω^x e^{−ω}}{x!},  ω = 2.5

1. P(X = 0) = \frac{2.5^0}{0!} e^{−2.5} = e^{−2.5} = 0.082

2. P(X = 1) = \frac{2.5^1}{1!} e^{−2.5} = 2.5e^{−2.5} = 0.205
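Both Poisson examples can be verified with a few lines of Python:

    import math

    def poisson_pmf(x, lam):
        """P(X = x) for a Poisson random variable with mean lam."""
        return lam**x * math.exp(-lam) / math.factorial(x)

    print(round(poisson_pmf(0, 2.5), 3))    # 0.082: no call in 5 minutes
    print(round(poisson_pmf(1, 2.5), 3))    # 0.205: exactly one call
    print(round(1 - poisson_pmf(0, 3), 4))  # 0.9502: at least one borehole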
5.3 Normal Distribution
Example
Given a normal distribution of a set of measurements for which the mean is 70 and the standard
deviation is 4.5, determine the probability that a measurement is

2. between 65 and 80, inclusive.
Example
Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard
deviation of 10, what is the probability that a person who takes the test will score
Solution: In this example, µ = 100 and σ = 10. Let X denote the score on an IQ test.
Example
Let X ∼ N(µ, σ^2). Find the probability that X is within a distance σ of µ; that is, find
P(|X − µ| < σ).
Solution

P(|X − µ| < σ) = P(−σ < X − µ < σ) = P\left(−1 < \frac{X − µ}{σ} < 1\right) = P(−1 < Z < 1) = 0.6826,

where the final value comes from the standard normal tables.
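Normal probabilities like these can also be computed without tables; a minimal Python sketch using the error function, with the measurement example (µ = 70, σ = 4.5) from above included for illustration:

    import math

    def phi(z):
        """Standard normal cumulative distribution function, via the error function."""
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # P(X within one standard deviation of the mean), for any normal X:
    print(phi(1) - phi(-1))  # about 0.6827

    # Measurement example: P(65 <= X <= 80) with mean 70 and standard deviation 4.5.
    mu, sigma = 70, 4.5
    print(phi((80 - mu) / sigma) - phi((65 - mu) / sigma))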
5.4 Exponential Distribution
A continuous random variable X has an exponential distribution with rate parameter λ > 0 if
its probability density function is

f(x) = λe^{−λx},  x ≥ 0.

It should be noted that λ is the rate parameter; that is, λ is the average number of events per
unit of time, and 1/λ is the average time until an event occurs. The exponential distribution is
memoryless: if the lifetime T of a component, which has already lasted a length of time s, is an
exponential random variable, we can calculate the probability that it will last at least t more
time units as P(T > t + s|T > s) = P(T > t).
Example
The time required to repair a machine is an exponential random variable with rate parameter
λ = 0.5 per hour.

1. What is the probability that a repair time exceeds 2 hours?
2. What is the probability that the repair time will take at least 4 hours given that the repair
man has been working on the machine for 3 hours?
Solution
Let X be the repair time. The cumulative distribution function of X is

P(X ≤ x) = F(x) = 1 − e^{−λx}.
1. The probability that a repair time exceeds 2 hours is P(X > 2). From the definition above,
P (X > x) = 1 − P (X ≤ x) = 1 − (1 − e−λx ) = e−λx .
So,
P (X > 2) = e−0.5×2 = e−1 = 0.36788.
2. By the memoryless property,

P(X ≥ 4|X ≥ 3) = P(X ≥ 1) = e^{−0.5×1} = 0.6065.
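Both parts of this example, including the memoryless property, can be confirmed in Python:

    import math

    lam = 0.5  # repair rate per hour

    def surv(x, lam):
        """P(X > x) for an exponential random variable with rate lam."""
        return math.exp(-lam * x)

    print(round(surv(2, lam), 5))                 # 0.36788: repair exceeds 2 hours
    # Memoryless property: P(X >= 4 | X >= 3) = P(X >= 1).
    print(round(surv(4, lam) / surv(3, lam), 4))  # 0.6065
    print(round(surv(1, lam), 4))                 # 0.6065, the same value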
Example
Suppose the time between campus shuttle buses arriving at the SAAT building bus stop has an
exponential distribution with rate parameter λ = 4 buses/hour.
1. If you arrive at the SAAT building bus stop at 9:00 am, what is the expected time of the
next bus?
2. Assume you asked one of the students waiting for the bus about the arrival time of the last
bus and he told you that the last bus left at 8:43 am. What is the expected time of the next
bus?
Solution
Let Y be the time between buses. The probability density function of Y is f(y) = λe^{−λy} = 4e^{−4y},
y ≥ 0, and the mean time between buses is E[Y] = 1/λ = 1/4 hour = 15 min.

1. E[Time until next bus arrives] = 1/4 hour, so the expected time of the next bus is
9:00 + 15 min = 9:15 am.

2. By the memoryless property, the time since the last bus left is irrelevant:
E[Time until next bus arrives | last bus arrived at 8:43 and you arrived at 9:00 am] = 1/4 hour,
so the expected time of the next bus is again 9:00 + 15 min = 9:15 am.
Example
Find the median η of an exponential distribution with parameter λ.
Solution
F(x) = \int_0^x λe^{−λu}\, du = 1 − e^{−λx},  x ≥ 0.

The median η satisfies F(η) = 1/2; that is, 1 − e^{−λη} = 1/2, so e^{−λη} = 1/2.
Taking the natural logarithm of both sides, we obtain

−λη = −log_e 2,

so that

η = \frac{log_e 2}{λ}.
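A quick numerical confirmation that η = log_e 2 / λ is indeed the median, for an arbitrary choice of λ:

    import math

    lam = 0.5                        # any positive rate gives the same conclusion
    eta = math.log(2) / lam          # candidate median
    print(1 - math.exp(-lam * eta))  # 0.5, confirming F(eta) = 1/2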