Elements of Probability
O. S. Makinde
1 Introduction
Life is filled with uncertainty. Some examples of events involving uncertainty are:
• What is the chance that your cowpea seedlings will grow to maturity if it rains every day?
• What is the chance that your car will be towed away if you park illegally in the visitors’
parking lot?
• What is the likelihood of scoring an A in STA 224 if you start reading two weeks before the
examination?
Definition 1 Probability is a numerical measure of the likelihood that an event will occur.
Thus, probabilities can be used as measures of the degree of uncertainty associated with the three
events previously listed. If probabilities are available, we can determine the likelihood of each
event occurring. Probability values are always assigned on a scale from 0 to 1. A probability near
zero indicates an event is unlikely to occur; a probability near 1 indicates an event is almost certain
to occur. Other probabilities between 0 and 1 represent degrees of likelihood that an event will
occur. For example, if we consider the event rain tomorrow, we understand that when the weather
report indicates a near-zero probability of rain, it means almost no chance of rain. However, if
a 0.90 probability of rain is reported, we know that rain is likely to occur. A 0.50 probability
indicates that rain is just as likely to occur as not.
Table 1: Some experiments and their outcomes.
Definition 2 The sample space for an experiment is the set of all experimental outcomes.
An experimental outcome, also called a sample point, is an element of the sample space.
Consider the experiment of tossing a coin. Let S denote the sample space; then we can define
S mathematically for this experiment as S = {head, tail}. For the experiment of rolling a die,
the sample space is S = {1, 2, 3, 4, 5, 6}. Also, consider an experiment of tossing two coins one
after the other. The sample space is S = {(H, H), (H, T), (T, H), (T, T)}.
The counting rule for multiple-step experiments makes it possible to determine the number of
experimental outcomes without listing them. If an experiment can be described as a sequence of
k steps with n1 possible outcomes on the first step, n2 possible outcomes on the second step, and
so on, then the total number of experimental outcomes is n1 × n2 × . . . × nk.
Looking at the experiment of tossing two coins as a sequence of first tossing one coin (n1 = 2)
and then tossing the other coin (n2 = 2), the counting rule shows that (2)(2) = 4 distinct
experimental outcomes are possible: S = {(H, H), (H, T), (T, H), (T, T)}.
A useful counting rule allows one to count the number of experimental outcomes when the experiment
involves selecting n objects from a (usually larger) set of N objects. It is called the counting
rule for combinations:

C_n^N = \binom{N}{n} = \frac{N!}{n!(N − n)!}    (1)

where N! = N × (N − 1) × (N − 2) × . . . × 2 × 1, n! = n × (n − 1) × (n − 2) × . . . × 2 × 1 and, by
definition, 0! = 1. The notation ! means factorial; for example, 3 factorial is 3! = (3)(2)(1) = 6.
The number of ways that n objects can be grouped into r classes with n_i in the ith class,
i = 1, 2, . . . , r, and \sum_{i=1}^{r} n_i = n is

\binom{n}{n_1\, n_2\, \cdots\, n_r} = \frac{n!}{n_1!\, n_2!\, \cdots\, n_r!}.
As an illustration of the counting rule for combinations, consider a quality control procedure in
which an inspector randomly selects two of five parts to test for defects. In a group of five parts,
how many combinations of two parts can be selected? The counting rule in equation (1) shows
that with N = 5 and n = 2, we have
C_2^5 = \binom{5}{2} = \frac{5!}{2!(5 − 2)!} = \frac{(5)(4)(3)(2)(1)}{(2)(1)(3)(2)(1)} = \frac{120}{12} = 10.
Thus, 10 outcomes are possible for the experiment of randomly selecting two parts from a group
of five.
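These counting rules are easy to check numerically. The following is a minimal sketch in Python; the multinomial helper is our own illustration, since the standard library provides factorials, combinations, and permutations directly but not multinomial coefficients.

    import math

    # Counting rule for combinations, equation (1): C(5, 2) = 10.
    print(math.comb(5, 2))  # 10

    # Grouping n objects into r classes: n! / (n1! n2! ... nr!).
    def multinomial(*group_sizes):
        """Number of ways to group sum(group_sizes) objects into the given classes."""
        n = sum(group_sizes)
        result = math.factorial(n)
        for k in group_sizes:
            result //= math.factorial(k)  # exact: intermediate quotients are integers
        return result

    print(multinomial(2, 3))  # 10, the same as C(5, 2) when there are two classes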
Another counting rule that is sometimes useful is the counting rule for permutations. It allows
one to compute the number of experimental outcomes when n objects are to be selected from a set
of N objects where the order of selection is important. A permutation is an ordered arrangement
of objects.
The number of permutations of N objects taken n at a time is

P_n^N = n!\binom{N}{n} = \frac{N!}{(N − n)!}.

For example, in selecting two of the five objects A, B, C, D,
and E, the 20 permutations are AB, BA, AC, CA, AD, DA, AE, EA, BC, CB, BD, DB, BE, EB,
CD, DC, CE, EC, DE, and ED.
In sampling from a finite population of size N , the counting rule for combinations is used to
find the number of different samples of size n that can be selected.
Example
How many ways can four children be lined up?
Solution: This corresponds to sampling without replacement. By Definition 4, there are 4! =
4 × 3 × 2 × 1 = 24 different line-ups.
Example
In Nigeria some license plates have eight characters: three letters followed by three numbers, and
then two letters (for example, LSD 422 AA). How many distinct such plates are possible?
Solution: There are 26^5 different ways to choose the five letters (a, b, . . . , y, z; 26 choices each) and
10^3 = 1000 ways to choose the three numbers (0, 1, . . . , 9; 10 choices each). Using the multiplication
rule, there are 26^5 × 10^3 = 11,881,376,000 different plates.
Example
Suppose out of 15 objective questions in STA 122, eight are to be answered. How many different
sets of questions are possible?
Solution: Since the order of the questions does not matter, the counting rule for combinations gives

\binom{15}{8} = \frac{15 × 14 × 13 × 12 × 11 × 10 × 9 × 8}{8!} = 6435

different sets of questions.
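The three worked examples above can be verified in the same way; a short Python check, where the printed values follow directly from the counting formulas:

    import math

    print(math.factorial(4))   # 24 ways to line up four children
    print(26**5 * 10**3)       # 11881376000 distinct license plates
    print(math.comb(15, 8))    # 6435 different sets of 8 questions out of 15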
A tree diagram is a graphical representation that helps in visualizing a multiple-step experiment. The counting
rule and tree diagram help a project manager identify the experimental outcomes and determine
the possible project completion times.
2 Assigning Probabilities
2.1 Basic Requirement for Assigning Probabilities
1. The probability assigned to each experimental outcome must be between 0 and 1, inclusive.
If we let E_i denote the ith experimental outcome and P(E_i) its probability, then this
requirement can be written as

0 ≤ P(E_i) ≤ 1 for all i.    (3)
2. The sum of the probabilities for all the experimental outcomes must equal 1.0. For n experimental
outcomes, this requirement can be written as

P(E_1) + P(E_2) + . . . + P(E_n) = 1.    (4)
2.2 Methods of Assigning Probabilities
2.2.1 Classical method
The classical method of assigning probabilities is appropriate when all the experimental outcomes
are equally likely. If n experimental outcomes are possible, a probability of 1/n is assigned to
each experimental outcome. When using this approach, the two basic requirements for assigning
probabilities are automatically satisfied.
Example
In rolling a die, six possible outcomes are equally likely, and hence each outcome is assigned a
probability of 1/6. Let P(1) denote the probability that one dot appears on the upward face
of the die, then P (1) = 1/6. Similarly, P (2) = 1/6, P (3) = 1/6, P (4) = 1/6, P (5) = 1/6, and
P (6) = 1/6. Note that these probabilities satisfy the two basic requirements of equations (3) and
(4) because each of the probabilities is greater than or equal to zero and they sum to 1.0.
Example
For the experiment involving tossing two coins, there are four equally likely experimental outcomes,
so each is assigned probability 1/4:
P (H, H) = 1/4, P (H, T ) = 1/4, P (T, H) = 1/4, P (T, T ) = 1/4
Recall that the sample space is S = {(H, H), (H, T ), (T, H), (T, T )}.
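To make the classical method concrete, the two-coin sample space and its equally likely probabilities can be enumerated directly; a minimal Python sketch:

    from itertools import product

    # Sample space for tossing two coins one after the other.
    sample_space = list(product(["H", "T"], repeat=2))
    print(sample_space)  # [('H', 'H'), ('H', 'T'), ('T', 'H'), ('T', 'T')]

    # Classical method: each of the four equally likely outcomes gets probability 1/4.
    probs = {outcome: 1 / len(sample_space) for outcome in sample_space}
    print(probs[("H", "T")])  # 0.25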
2.2.2 Relative frequency method
The relative frequency method of assigning probabilities is appropriate when data are available
to estimate the proportion of the time the experimental outcome will occur if the experiment is
repeated a large number of times.
Example:
In testing market evaluation of a new product, 400 potential customers were contacted; 100 actually
purchased the product but 300 did not. In effect, the experiment of contacting customers
was repeated 400 times, and the product was purchased 100 times. Using
the relative frequency approach, we can assign a probability of 100/400 = 0.25 to the experimental
outcome of a potential customer purchasing the product. Similarly, 300/400 = 0.75 is assigned to
the experimental outcome of a potential customer not purchasing the product.
2.2.3 Subjective method
The subjective method of assigning probabilities is most appropriate when one cannot realistically
assume that the experimental outcomes are equally likely and when little relevant data are avail-
able. When the subjective method is used to assign probabilities to the experimental outcomes,
we may use any information available, such as our experience or intuition. After considering all
available information, a probability value that expresses our degree of belief (on a scale from 0 to
1) that the experimental outcome will occur is specified. Because subjective probability expresses
a person's degree of belief, it is personal.
Using the subjective method, different people can be expected to assign different probabilities
to the same experimental outcome. The subjective method requires extra care to ensure that the
two basic requirements of equations (3) and (4) are satisfied.
Example:
Tom and Judy Elsbernd make an offer to purchase a house. Two outcomes are possible: E1 = their
offer is accepted, and E2 = their offer is rejected.
Judy believes that the probability their offer will be accepted is 0.8; thus, Judy would set P (E1 ) =
0.8 and P (E2 ) = 0.2. Tom, however, believes that the probability that their offer will be accepted
is 0.6; hence, Tom would set P(E1) = 0.6 and P(E2) = 0.4. Note that Tom's probability estimate
for E1 reflects a greater pessimism that their offer will be accepted.
Example
Suppose that a room contains n people. What is the probability that at least two of them have a
common birthday, assuming that every day of the year is equally likely to be a birthday and
disregarding leap years?
Solution: Let A denote the event that there are at least two people with a common birthday. It is
easier to find P(A^c) than to find P(A) in this case because A can happen in many ways, whereas
A^c is simpler. There are 365^n possible outcomes, and A^c can occur in 365 × 364 × . . . × (365 − n + 1)
ways. Thus

P(A^c) = \frac{365 × 364 × \cdots × (365 − n + 1)}{365^n}

and

P(A) = 1 − P(A^c) = 1 − \frac{365 × 364 × \cdots × (365 − n + 1)}{365^n}.
The table below exhibits the probability of at least two people having a common birthday for
various values of n.

n     P(A)
4     0.016
23    0.507
40    0.891
56    0.988
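The table is straightforward to reproduce; a minimal Python sketch of the computation:

    def p_shared_birthday(n):
        """P(at least two of n people share a birthday), disregarding leap years."""
        p_all_distinct = 1.0
        for k in range(n):
            p_all_distinct *= (365 - k) / 365
        return 1 - p_all_distinct

    for n in (4, 23, 40, 56):
        print(n, round(p_shared_birthday(n), 3))  # 0.016, 0.507, 0.891, 0.988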
Example
How many people must you ask in order to have a 50:50 chance of finding someone who shares your birthday?
Solution: Suppose n people are asked, and let A denote the event that someone's birthday is the
same as yours. The total number of outcomes is 365^n, and the total number of ways that A^c can
happen is 364^n. Thus
P(A^c) = \frac{364^n}{365^n} = \left(\frac{364}{365}\right)^n

and

P(A) = 1 − \frac{364^n}{365^n} = 1 − \left(1 − \frac{1}{365}\right)^n.

To find n, we set P(A) = 0.5 in order to have a 50:50 chance of finding someone who shares your
birthday. That is,

1 − \frac{364^n}{365^n} = 0.5,

which implies

\frac{364^n}{365^n} = 0.5.

Taking logarithms of both sides, we obtain

n \log\frac{364}{365} = \log 0.5,

so that

n = \frac{\log 0.5}{\log\frac{364}{365}} = \frac{−0.30103}{−0.0011915} = 252.65.    (5)
Hence you need to ask 253 people in order to have a 50:50 chance of finding someone who shares your
birthday.
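The same answer can be obtained numerically; a one-line check in Python:

    import math

    # Smallest n with P(A) >= 0.5, where P(A) = 1 - (364/365)**n.
    n = math.log(0.5) / math.log(364 / 365)
    print(n)  # 252.65..., so 253 people are needed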
Complement of an event
Given an event A, the complement of A is defined to be the event consisting of all sample points
that are not in A. The complement of A is denoted by A^c. The probability of A can be computed
using the complement:

P(A) = 1 − P(A^c).

This equation shows that the probability of an event A can be computed easily if the
probability of its complement, P(A^c), is known.
Definition 7 The union of A and B is the event containing all sample points belonging to A or
B or both. The union is denoted by A ∪ B. Given two events A and B, the intersection of A
and B is the event containing the sample points belonging to both A and B. The intersection is
denoted by A ∩ B.
Definition 8 Two events are said to be mutually exclusive if the events have no sample points in
common.
In general, the addition law is P(A ∪ B) = P(A) + P(B) − P(A ∩ B). For mutually exclusive
events, P(A ∩ B) = 0, so the addition law reduces to P(A ∪ B) = P(A) + P(B).
Definition 9 Prior probabilities are initial estimates of the probabilities of events.
Posterior probabilities are revised probabilities of events based on additional information.
Bayes theorem is a method used to compute posterior probabilities.
Computing the ratio of a joint probability to a marginal probability provides the following general
formula for posterior probability calculations for two events A and B:

P(A|B) = \frac{P(A ∩ B)}{P(B)}  and  P(B|A) = \frac{P(A ∩ B)}{P(A)}.
Independent events
Two events A and B are independent if P(A|B) = P(A) or, equivalently, P(B|A) = P(B); that is,
the events have no influence on each other. Otherwise, the events are dependent. For independent
events, the multiplication law gives

P(A ∩ B) = P(A) × P(B).
Given n mutually exclusive events A_1, A_2, . . . , A_n whose union is the entire sample space, Bayes
theorem can be used to compute any posterior probability P(A_i|B) as

P(A_i|B) = \frac{P(A_i)P(B|A_i)}{P(A_1)P(B|A_1) + P(A_2)P(B|A_2) + . . . + P(A_n)P(B|A_n)}.
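As an illustration, Bayes theorem translates directly into code. The sketch below is generic; the prior and likelihood numbers in the usage lines are hypothetical, chosen only to show the call.

    def posterior(priors, likelihoods):
        """P(Ai | B) for mutually exclusive events Ai that cover the sample space.

        priors[i] = P(Ai); likelihoods[i] = P(B | Ai).
        """
        joints = [p * l for p, l in zip(priors, likelihoods)]
        total = sum(joints)  # P(B), the denominator in Bayes theorem
        return [j / total for j in joints]

    # Hypothetical two-event illustration: P(A1) = 0.65, P(A2) = 0.35,
    # P(B | A1) = 0.02, P(B | A2) = 0.05.
    print(posterior([0.65, 0.35], [0.02, 0.05]))  # about [0.426, 0.574]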
Example
Suppose that we have two events, A and B, with P (A) = 0.70, P (B) = 0.40, and P (A ∩ B) = 0.30.
Find P (A ∪ B), P (A|B) and P (B|A).
Solution: P (A ∪ B) = P (A) + P (B) − P (A ∩ B) = 0.70 + 0.40 − 0.30 = 0.80
P(A|B) = \frac{P(A ∩ B)}{P(B)} = \frac{0.30}{0.40} = 0.75  and  P(B|A) = \frac{P(A ∩ B)}{P(A)} = \frac{0.30}{0.70} = 0.429.
2. When drawing a random sample without replacement from a population of size N , the
counting rule for combinations is used to find the number of different samples of size n that
can be selected.
3. The sample space, S, is an event. Because it contains all the experimental outcomes, it has
a probability of 1; that is, P (S) = 1.
4. When the classical method is used to assign probabilities, the assumption is that the ex-
perimental outcomes are equally likely. In such cases, the probability of an event can be
computed by counting the number of experimental outcomes in the event and dividing the
result by the total number of experimental outcomes.
5. Do not confuse the notion of mutually exclusive events with that of independent events. Two
events with nonzero probabilities cannot be both mutually exclusive and independent. If one
mutually exclusive event is known to occur, the other cannot occur; thus, the probability of
the other event occurring is reduced to zero. They are therefore dependent.
6. If the union of events is the entire sample space, the events are said to be collectively
exhaustive.
7. Bayes theorem is used extensively in decision analysis. The prior probabilities are often
subjective estimates provided by a decision maker. Sample information is obtained and
posterior probabilities are computed for use in choosing the best decision.
8. An event and its complement are mutually exclusive, and their union is the entire sample
space. Thus, Bayes theorem is always applicable for computing posterior probabilities of an
event and its complement.
5 Probability Distributions
Definition 10 Given a random experiment with sample space S, a random variable X is a set
function that assigns one and only one real number to each element s that belongs to the sample
space S.
Definition 11 The set of all possible values of the random variable X, denoted x, is called the
support, or space, of X.
Definition 12 A random variable is a discrete random variable if there is a finite or countably
infinite number of possible outcomes of X. The probability distribution of a discrete random
variable is called a discrete probability distribution.
Example
A student is selected at random from a class of STA 202 students with CGPA ≥ 3.0 (denoted by
U) and CGPA < 3.0 (denoted by L). Once selected, the academic performance (U or L) of the
student is noted. Thus the sample space is {U, L}. We may define the random variable X as

X = 1 if the student's performance is U, and X = 2 if the student's performance is L.

Here the random variable X assigns one and only one real number (1 and 2) to each element of
the sample space (U and L). The support, or space, of X is {1, 2}. Also, there is a finite (two, to
be exact) number of possible outcomes.
Definition 13 A random variable is a continuous random variable if there is an uncountably
infinite number of possible outcomes of X. The probability distribution of a continuous random
variable is called a continuous probability distribution.
A continuous random variable differs from a discrete random variable in that it takes on an
uncountably infinite number of possible outcomes. For example, if we let X denote the height (in
meters) of a randomly selected orange tree, then X is a continuous random variable.
A probability distribution assigns a probability value to each measurable subset of the possible
outcomes of a random experiment, survey, or procedure of statistical inference. A probability
distribution can be discrete or continuous.

For a discrete random variable X, the probability function is written

P(X = x) = P(x).

The sum of the probability function P(X = x) over the support of the random variable is one;
that is,

\sum_j P_j = 1,

where j runs over all possible values x_j that X can take and P_j is the probability at x_j.

For a continuous random variable X with probability density function f(x), the integral of the
density over the support of the random variable is one; that is,

\int_{−∞}^{∞} f(x)\, dx = 1.
5.1 Binomial Distribution
Suppose that n independent experiments or trials are performed, where n is a fixed number, and
that each experiment results in a “success” with probability p and a “failure” with probability
1 − p. The total number of successes, X, is a binomial random variable with parameters n and p.
The probability mass function of a binomial random variable X is defined as
P(X = x) = \binom{n}{x} p^x (1 − p)^{n−x},  x = 0, 1, 2, . . . , n.
It should be noted that the probability of a success, denoted by p, remains constant from trial to
trial and that the repeated trials are independent. Each trial results in an outcome that may be
classified as a success or a failure (hence the name, binomial). The mean of a binomial random
variable is

E[X] = np,

and its variance is Var(X) = np(1 − p).
Example
Tay-Sachs disease is a rare but fatal disease of genetic origin occurring chiefly in infants and
children, especially those of eastern European extraction. If a couple are both carriers of Tay-Sachs
disease, a child of theirs has probability 0.25 of being born with the disease. If such a couple
has four children, what is the probability that

1. none of the children has the disease?
2. exactly one child has the disease?
3. all four children have the disease?
Solution
Let X be the number of children with the disease. The total number of children is n = 4 and the
probability of disease is p = 0.25.

1. P(X = 0) = \binom{4}{0}(0.25)^0(1 − 0.25)^{4−0} = (0.75)^4 = 0.316

2. P(X = 1) = \binom{4}{1}(0.25)^1(1 − 0.25)^{4−1} = 4 × 0.25 × 0.75^3 = 0.422

3. P(X = 4) = \binom{4}{4}(0.25)^4(1 − 0.25)^{4−4} = 0.25^4 = 0.004
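The three probabilities above can be checked with a short Python function implementing the binomial probability mass function:

    import math

    def binom_pmf(x, n, p):
        """P(X = x) for a binomial random variable with parameters n and p."""
        return math.comb(n, x) * p**x * (1 - p)**(n - x)

    # Tay-Sachs example: n = 4 children, p = 0.25.
    for x in (0, 1, 4):
        print(x, round(binom_pmf(x, 4, 0.25), 3))  # 0.316, 0.422, 0.004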
Example
If a single bit (0 or 1) is transmitted over a noisy communications channel, it has probability p
of being incorrectly transmitted. To overcome this problem, the bit is transmitted n times. A
decoder at the receiving end, called a majority decoder, decides that the correct message is that
carried by a majority of the received bits. Under a simple noise model, each bit is independently
subject to being corrupted with the same probability p. Suppose n = 5 and p = 0.1; find the
probability that the message is correctly received.
Solution
The probability that the message is correctly received is the probability of two or fewer errors.
The word success is used in a generic sense; here a success is an error. Denote the number of
bits that are in error by X. Then X is a binomial random variable with n trials and probability
p of success on each trial.
Probability that the message is correctly received = P(two or fewer errors)
= P(X = 0) + P(X = 1) + P(X = 2)
= \binom{5}{0}(0.1)^0(1 − 0.1)^{5−0} + \binom{5}{1}(0.1)^1(1 − 0.1)^{5−1} + \binom{5}{2}(0.1)^2(1 − 0.1)^{5−2}
= 0.9^5 + 5 × 0.1 × 0.9^4 + 10 × 0.1^2 × 0.9^3 = 0.99144.
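The same calculation in Python, summing the binomial probabilities of at most two errors:

    import math

    # P(at most 2 of 5 bits in error) with per-bit error probability p = 0.1.
    p_correct = sum(math.comb(5, x) * 0.1**x * 0.9**(5 - x) for x in range(3))
    print(round(p_correct, 5))  # 0.99144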
5.2 Poisson Distribution
A random variable X has a Poisson distribution with parameter λ > 0 if its probability mass
function is

P(X = x) = \frac{λ^x e^{−λ}}{x!},  x = 0, 1, 2, . . .

The Poisson distribution can be derived as the limit of a binomial distribution as the number of
trials n approaches infinity and the probability of success on each trial p approaches zero in
such a way that np = λ.
For the Poisson distribution, the rate at which events occur is constant and events are independent.
5.2.1 Basic properties of the Poisson distribution
E[X] = λ.
Var(X) = λ.
Example
Let X be the number of boreholes a company drills in a town, with a mean of 3 boreholes per
town in Nigeria. What is the probability that the company drills

3. The probability that the company drills at least one borehole in a randomly selected town is
P(X ≥ 1). So,
P(X ≥ 1) = 1 − P(X = 0) = 1 − e^{−3} = 1 − 0.0498 = 0.9502.
Example
Suppose that an office receives telephone calls as a Poisson process with λ = 0.5 calls per minute.
The number of calls in a 5-minute interval follows a Poisson distribution with parameter
ω = 5λ = 2.5. What is the probability of the following?
1. no call
2. exactly one call
Solution:
P(X = x) = \frac{ω^x e^{−ω}}{x!},  ω = 2.5

1. P(X = 0) = \frac{2.5^0}{0!} e^{−2.5} = e^{−2.5} = 0.082

2. P(X = 1) = \frac{2.5^1}{1!} e^{−2.5} = 2.5e^{−2.5} = 0.205
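Both Poisson examples can be verified with a few lines of Python:

    import math

    def poisson_pmf(x, lam):
        """P(X = x) for a Poisson random variable with mean lam."""
        return lam**x * math.exp(-lam) / math.factorial(x)

    print(round(poisson_pmf(0, 2.5), 3))    # 0.082: no call in 5 minutes
    print(round(poisson_pmf(1, 2.5), 3))    # 0.205: exactly one call
    print(round(1 - poisson_pmf(0, 3), 4))  # 0.9502: at least one borehole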
5.3 Normal Distribution
Example
Given a normal distribution of a set of measurements for which the mean is 70 and the standard
deviation is 4.5, determine the probability that a measurement is

2. between 65 and 80, inclusive.
Example
Suppose scores on an IQ test are normally distributed. If the test has a mean of 100 and a standard
deviation of 10, what is the probability that a person who takes the test will score
Solution: In this example, µ = 100 and σ = 10. Let X denote the score on an IQ test.
Example
Let X ∼ N(µ, σ^2). Find the probability that X is within a distance σ of µ; that is, find
P(|X − µ| < σ).
Solution

P(|X − µ| < σ) = P(−σ < X − µ < σ) = P\left(−1 < \frac{X − µ}{σ} < 1\right) = P(−1 < Z < 1) = 0.6826,

where the final value comes from the standard normal tables.
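Normal probabilities like these can also be computed without tables; a minimal Python sketch using the error function, with the measurement example (µ = 70, σ = 4.5) from above included for illustration:

    import math

    def phi(z):
        """Standard normal cumulative distribution function, via the error function."""
        return 0.5 * (1 + math.erf(z / math.sqrt(2)))

    # P(X within one standard deviation of the mean), for any normal X:
    print(phi(1) - phi(-1))  # about 0.6827

    # Measurement example: P(65 <= X <= 80) with mean 70 and standard deviation 4.5.
    mu, sigma = 70, 4.5
    print(phi((80 - mu) / sigma) - phi((65 - mu) / sigma))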
5.4 Exponential Distribution
A continuous random variable X has an exponential distribution with rate parameter λ > 0 if
its probability density function is

f(x) = λe^{−λx},  x ≥ 0.

It should be noted that λ is the rate parameter; that is, λ is the average number of events per
unit of time, and 1/λ is the average time until an event occurs. The exponential distribution is
memoryless: if the lifetime T of a component, which has already lasted a length of time s, is an
exponential random variable, we can calculate the probability that it will last at least t more
time units as P(T > t + s|T > s) = P(T > t).
Example
The time required to repair a machine is an exponential random variable with rate parameter
λ = 0.5 per hour.

1. What is the probability that a repair time exceeds 2 hours?
2. What is the probability that the repair time will take at least 4 hours given that the repair
man has been working on the machine for 3 hours?
Solution
Let X be the repair time. The cumulative distribution function of X is

P(X ≤ x) = F(x) = 1 − e^{−λx}.
1. The probability that a repair time exceeds 2 hours is P(X > 2). From the definition above,
P (X > x) = 1 − P (X ≤ x) = 1 − (1 − e−λx ) = e−λx .
So,
P (X > 2) = e−0.5×2 = e−1 = 0.36788.
2. By the memoryless property,

P(X ≥ 4|X ≥ 3) = P(X ≥ 1) = e^{−0.5×1} = 0.6065.
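Both parts of this example, including the memoryless property, can be confirmed in Python:

    import math

    lam = 0.5  # repair rate per hour

    def surv(x, lam):
        """P(X > x) for an exponential random variable with rate lam."""
        return math.exp(-lam * x)

    print(round(surv(2, lam), 5))                 # 0.36788: repair exceeds 2 hours
    # Memoryless property: P(X >= 4 | X >= 3) = P(X >= 1).
    print(round(surv(4, lam) / surv(3, lam), 4))  # 0.6065
    print(round(surv(1, lam), 4))                 # 0.6065, the same value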
Example
Suppose the time between campus shuttle buses arriving at the SAAT building bus stop has an
exponential distribution with rate parameter λ = 4 buses/hour.
1. If you arrive at the SAAT building bus stop at 9:00 am, what is the expected time of the
next bus?
2. Assume you asked one of the students waiting for the bus about the arrival time of the last
bus and he told you that the last bus left at 8:43 am. What is the expected time of the next
bus?
Solution
Let Y be the time between buses. The probability density function of Y is f(y) = λe^{−λy} = 4e^{−4y},
y ≥ 0, and the mean time between buses is E[Y] = 1/λ = 1/4 hour = 15 min.

1. E[Time until next bus arrives] = 1/4 hour, so the expected time of the next bus is
9:00 + 15 min = 9:15 am.

2. By the memoryless property, the time since the last bus left is irrelevant:
E[Time until next bus arrives | last bus arrived at 8:43 and you arrived at 9:00 am] = 1/4 hour,
so the expected time of the next bus is again 9:00 + 15 min = 9:15 am.
Example
Find the median η of an exponential distribution with parameter λ.
Solution
F(x) = \int_0^x λe^{−λu}\, du = 1 − e^{−λx},  x ≥ 0.

The median η satisfies F(η) = 1/2; that is, 1 − e^{−λη} = 1/2, so e^{−λη} = 1/2.
Taking the natural logarithm of both sides, we obtain

−λη = −log_e 2,

so that

η = \frac{log_e 2}{λ}.
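A quick numerical confirmation that η = log_e 2 / λ is indeed the median, for an arbitrary choice of λ:

    import math

    lam = 0.5                        # any positive rate gives the same conclusion
    eta = math.log(2) / lam          # candidate median
    print(1 - math.exp(-lam * eta))  # 0.5, confirming F(eta) = 1/2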