
PROBABILITY AND STATISTICS

with Applications to Reliability and R

MTH 2401
Lecture Notes
Instructor: Eugene Dshalalow
Department of Mathematical Sciences
Florida Institute of Technology
Melbourne, FL 32901
[email protected]
CHAPTER I. FOUNDATIONS OF PROBABILISTIC MODELING

1. The Probability Space


A probability model includes three major components: a sample space, a collection of events (referred to as a σ-algebra of events), and a probability measure defined on the σ-algebra of these events.

$\Omega$ - sample space (a carrier set from which all pertinent events are to be drawn)

$\mathcal{F}(\Omega)$ - a collection of some specified subsets of $\Omega$ referred to as events. By default, $\Omega \in \mathcal{F}(\Omega)$ and $\varnothing$ (the empty set) $\in \mathcal{F}(\Omega)$, so that the smallest σ-algebra is $\mathcal{F}(\Omega) = \{\Omega, \varnothing\}$.

We define a probability measure after introducing some set operations.

Basic set operations:

$A \cup B$ (union) of two events $A$ and $B$ means: $A$ or $B$ occurs.

$A \cap B$ (intersection) of events $A$ and $B$ means: both occur.

$A^c$ (complement) of event $A$ means: $A$ does not occur.

$A - B = A \cap B^c$ (difference): $A$ occurs, but $B$ does not.

Two subsets $A$ and $B$ of $\Omega$ are called disjoint if $A \cap B = \varnothing$.

Set-Theoretical Laws
1) Intersection and union are commutative and associative:

$A \cup B = B \cup A$, $A \cap B = B \cap A$,

$(A \cup B) \cup C = A \cup (B \cup C)$, $(A \cap B) \cap C = A \cap (B \cap C)$

2) Distributive Laws

$A \cap (B \cup C) = (A \cap B) \cup (A \cap C)$

$A \cup (B \cap C) = (A \cup B) \cap (A \cup C)$

3) De Morgan's Laws

$(A \cup B)^c = A^c \cap B^c$

$(A \cap B)^c = A^c \cup B^c$.

Probability (measure) is a function whose domain is $\mathcal{F}(\Omega)$, valued in $[0,1]$, and which satisfies two main axioms. More specifically,

$P: \mathcal{F}(\Omega) \to [0,1]$ is a function on $\mathcal{F}(\Omega)$. It is called a probability if

a) $P(\Omega) = 1$ (Axiom 1)

b) For any two mutually exclusive (i.e., non-overlapping) events $A, B$,

$P(A \cup B) = P(A) + P(B)$. (Axiom 2)

b') For any sequence $A_1, A_2, \ldots$ of mutually exclusive events,

$P\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} P(A_i)$. (Axiom 2')

Note that Axiom 2 is a special case of Axiom 2' (if we let $A_3 = A_4 = \cdots = \varnothing$), and Axiom 2' is the original second axiom of probability. However, for convenience, its special case, Axiom 2, is separately formulated. Axiom 2 is called the additivity axiom, whereas Axiom 2' is referred to as the σ-additivity axiom.

A random experiment can be modeled by $(\Omega, \mathcal{F}(\Omega), P)$, called a probability space - a conglomerate of a sample space, a collection of pertinent events, and a measure of the likelihood for each of these events to occur. [Note that $\mathcal{F}(\Omega)$ is not just an arbitrary collection of pertinent events but a family closed under all finite or countable set operations; it is then called a σ-algebra of events.]

Example 1.1. Suppose we are to roll a die and are interested in the event of obtaining an even number of dots. A reasonable and fairly compact model would be one with $\mathcal{F}(\Omega) = \{\Omega, \varnothing, A, A^c\}$, where

$\Omega = \{d_1, \ldots, d_6\}$, $A = \{d_2, d_4, d_6\}$, $A^c = \{d_1, d_3, d_5\}$, and $d_i$ denotes the face with $i$ dots.

With no further information about the die, we assume that it is homogeneous, with each outcome equally likely to occur. We then postulate $P(A) = P(A^c) = \frac{1}{2}$ along with $P(\Omega) = 1$ and $P(\varnothing) = 0$. (This model may turn out to be unrealistic if the die is biased. The latter can be further investigated by conducting statistical experiments.) It is readily seen that the model is consistent and the probability measure satisfies conditions (or rather "axioms") (a) and (b). □
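
As a numerical sanity check, one can estimate $P(A)$ by simulation in R (a minimal sketch; the seed and the number of rolls are arbitrary choices):

set.seed(1)                                 # arbitrary seed, for reproducibility
rolls <- sample(1:6, 10000, replace=TRUE)   # 10,000 rolls of a fair die
mean(rolls %% 2 == 0)                       # relative frequency of "even number of dots"

The printed relative frequency should come out close to 0.5.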

In Example 1.1, suppose we roll a biased die so that $P(A) = p$ and thus $P(A^c) = 1 - p$, for some $p \in [0,1]$. Then the probability measure on $\mathcal{F}(\Omega)$,

$P: \mathcal{F}(\Omega) \to \{1, 0, p, 1-p\}$,

is called Bernoulli.

Definition 1.1. A generic probability space $(\Omega, \mathcal{F}(\Omega), P)$ is called a Bernoulli probability space if $\mathcal{F}(\Omega) = \{\Omega, \varnothing, A, A^c\}$ for some nonempty subset $A$ of $\Omega$ and if $P$ is the Bernoulli probability measure on $\mathcal{F}(\Omega)$, i.e., $P: \mathcal{F}(\Omega) \to \{1, 0, p, 1-p\}$. □

A few more properties of every probability measure, which can be easily proved:

1) If $A \subseteq B$ ("$A$ is a subset of $B$" or "$A$ implies $B$"),

$P(B - A) = P(B) - P(A)$,  (1.1)

where $B - A$ in general is the rest of $B$ after all points of $B$ belonging to $A$ (if any) are removed. In particular, it follows that $P(A) \le P(B)$ (monotonicity). Taking $B = \Omega$, we obtain from (1.1) that

2) $P(A^c) = 1 - P(A)$.  (1.2)

3) For any two events $A, B$ it holds true that

$P(A \cup B) = P(A) + P(B) - P(A \cap B)$.  (1.3)

The latter is due to the partition of $A \cup B$ into $A$ and $[B - (A \cap B)]$ and the use of the additivity Axiom 2.

Laplacian Space. A probability space (which is a probabilistic model of an experiment) $(\Omega, \mathcal{F}(\Omega), P)$ is Laplacian if

1) $\Omega$ is finite, say $\Omega = \{\omega_1, \ldots, \omega_n\}$, i.e., with $n$ elements.

2) For each $i = 1, \ldots, n$, $\{\omega_i\} \in \mathcal{F}(\Omega)$. This means that singletons are events, called elementary events. The latter immediately yields that all subsets of $\Omega$ are events. Indeed, all we need to do is to include all unions of singletons in $\mathcal{F}(\Omega)$ and thus run over all subsets of $\Omega$. (In set theory, such a collection is referred to as the power set, in notation $\mathcal{P}(\Omega)$.)

3) Elementary events are equally likely. In order for this to happen, we need $P(\{\omega_i\}) = \frac{1}{n}$ for each $i = 1, \ldots, n$.

Remark 1.1. In Example 1.1, rolling a homogeneous die can be alternatively modeled by a Laplacian space, because the outcomes of rolling are equally likely. However, we still need to formally include the singletons $\{d_i\}$ in the σ-algebra $\mathcal{F}$ of the pertinent events. Just having this in mind is insufficient. Then,

$P(\{d_i\}) = \frac{1}{6}$, for each of the six faces of the die.

Now we can readily show that $P(A) = \frac{1}{2}$. Indeed,

$P(A) = P(\{d_2, d_4, d_6\}) = P(\{d_2\}) + P(\{d_4\}) + P(\{d_6\})$ (by the additivity axiom) $= 3 \cdot \frac{1}{6} = \frac{1}{2}$.

In general, for an arbitrary Laplacian space,

$P(A) = \frac{|A|}{|\Omega|}$, where $|\cdot|$ denotes the number of elements.  (1.4)

Indeed, if $A = \{\omega_1, \ldots, \omega_k\}$ (for example) and $\Omega = \{\omega_1, \ldots, \omega_n\}$, then by the additivity axiom,

$P(A) = P(\{\omega_1\}) + \cdots + P(\{\omega_k\}) = k \cdot \frac{1}{n}$.

Remark 1.2. In Example 1.1, of course, we need not include all singletons in $\mathcal{F}(\Omega)$ as far as our original model is concerned. We can change the setting as follows. Introduce a new sample space $\widetilde{\Omega} = \{\widetilde{\omega}_1, \widetilde{\omega}_2\}$, defining $\{\widetilde{\omega}_1\} := A$ and $\{\widetilde{\omega}_2\} := A^c$. Physically speaking, we partition the sample space into the events "even number of dots" and "odd number of dots". In this case, the die turns into a "coin" with two faces. □

PROBLEMS

1.1. A pair of fair dice is tossed. What is the probability of the event $A \sim$ "getting a total of 7"? Construct a relevant probability space and then determine $P(A)$. Justify your calculation.

1.2. In an experiment, a busy intersection is observed at which vehicles can turn left, right, or go straight. Suppose two vehicles are moving through the intersection. (i) Describe the sample space and each of the sample points. (ii) If all sample points are equally likely, what is the probability that at least one car goes straight? (iii) Given that all points are equally likely, what is the probability that at most one car turns?

1.3. Let $(\Omega, \mathcal{F}(\Omega), P)$ be a probability space and $A$ and $B$ two events such that $P(A) = \frac{1}{2}$ and $P(A \cap B) = \frac{1}{4}$. Find $P(A \cap B^c)$. Hint: Show that $A \cap B^c = A - (A \cap B)$ and then use Property 1, equation (1.1).

1.4. The probability that an American computer company will outsource its technical support to
China is 0.6; the probability that it will outsource its support to India is 0.5, and the probability
that it will outsource its support to either China or India or both is 0.9. What is the probability
that the company will outsource its technical support (a) to both countries, (b) to neither
country?

1.5. Let $A$, $B$, and $C$ be three events. Find the expression for the events

(a) only $A$ occurs

(b) both $A$ and $B$ occur, but $C$ does not

1.6. Suppose that $A$ and $B$ are mutually exclusive events for which $P(A) = 0.2$ and $P(B) = 0.4$. What is the probability that: (a) either $A$ or $B$ occurs; (b) $A$ occurs, but $B$ does not; (c) both $A$ and $B$ occur.

1.7. Suppose $A$ and $B$ are two events such that $P(A) = 0.9$ and $P(B) = 0.6$. Can $P(A \cap B) = 0.2$?

1.8. Under the conditions of Problem 1.7, what is the smallest possible value of $P(A \cap B)$? What is the largest possible value of $P(A \cap B)$?

1.9. Prove that:

a) $(A - B)^c = A^c \cup B$.
b) $[(A^c \cup B)^c \cup (A \cup B^c)]^c = B - A$.
c) $(A \cap B) \cup (A \cap B^c) \cup (A^c \cap B) = A \cup B$.

2. Combinatorial Probability

MODEL 1. (Multiplication). Suppose there are two boxes filled with balls. Box 1 has $m$ balls and Box 2 has $n$ balls. In how many different ways can two balls, one from Box 1 and another from Box 2, be chosen? The answer is simple. The pairs $(b_1, b_2)$ total $m \cdot n$, which is the content of the Cartesian product of the two sets of balls.

Now, if we have $r$ boxes, filled with $m_1, \ldots, m_r$ balls, respectively, then clearly the total number of different ordered $r$-tuples of balls chosen from the boxes equals the content of the Cartesian product and thus is $m_1 \cdots m_r$.

Example 2.1. The Florida license test is a multiple-choice test of 20 questions. Each of the questions is supplied with four different answers, of which only one is correct. Suppose a candidate needs to answer all questions correctly in order to pass. What is the probability that the candidate will pass the test?

Solution. In the absence of any other conditions, we assume that the candidate's choices of answers form a random sequence of trials. The sample space $\Omega$ will contain $4^{20}$ elementary points, corresponding to the experiment with 20 boxes, each containing four balls. So, the $i$th elementary event will contain a 20-tuple of chosen responses: $\omega_i = \{r_{i_1}, \ldots, r_{i_{20}}\}$, $i = 1, \ldots, 4^{20}$. The space is Laplacian, and passing the test means choosing just one of the $4^{20} \approx 1.1 \times 10^{12}$ elementary events. By (1.4) this yields

$P(\{\omega_k\}) = \frac{1}{4^{20}} \approx 0.91 \times 10^{-12}$. □
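
A quick R check of these magnitudes:

> 4^20
[1] 1.099512e+12
> 1/4^20
[1] 9.094947e-13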

MODEL 2. (Permutations). We will begin with an example.

Example 2.2. Suppose four students enter an elevator which serves the six upper floors of the Crawford Science Building. What is the probability that all four students will get off the elevator singly, i.e., that no more than one of them leaves the elevator at any floor?

Solution. First we need to identify the combinatorial problem we are dealing with. Suppose there is a box with $n$ numbered empty cells and $k$ ($\le n$) numbered balls. Assume that a cell contains a maximum of one ball at a time. The experiment consists of randomly distributing the $k$ balls among the cells. The question arises: how many different placements can be rendered?

[Figure 2.1: numbered balls occupying some of the numbered cells of a box]

To perform the experiment, we drop the first ball into the box and see that it has $n$ different placements. After the first ball is situated, the second ball has only $n-1$ different possibilities. Using the multiplication theorem, we find that the first two balls have $n(n-1)$ different placements. It is like pairing two balls out of two boxes, with $n$ and $n-1$ balls, respectively, in light of the multiplication theorem.

Continuing the process with the rest of the balls, we obtain that the total number of placements of $k$ numbered balls in $n$ numbered cells is

$P(n,k) = n(n-1)\cdots(n-k+1) = \frac{n!}{(n-k)!}$.  (2.2)

Formula (2.2) gives the number of permutations of $k$ out of $n$.

Now, back to the elevator problem. First, we identify the sample space $\Omega = \{\omega_1, \ldots, \omega_N\}$, where $N$ is the total number of all placements of four students among six floors without restrictions. In this case, each of the four students can be identified as a box, and the six floors can be put into each of the students as balls into boxes. Thus we have four boxes (students), each filled with six balls (floors), so that if we happen to draw floor 1 from student 1, floor 1 from student 2, floor 3 from student 3, and floor 3 from student 4, we have them get off on floors 1 and 3, two on each. By the multiplication theorem, we thus have

$N = 6^4$

different placements of the students without restrictions. Now, of the $N$ elementary placements, we need to pick out those $\omega_i$'s where the students get off singly. So, $A = \{\omega_{i_1}, \ldots, \omega_{i_p}\}$, where $p$ is the number of permutations from (2.2). The space is Laplacian and thus

$P(A) = \frac{|A|}{|\Omega|}$,

so all we need is to determine $|A|$, which by (2.2) is $\frac{6!}{(6-4)!}$. The final answer is

$P(A) = \frac{6!}{(6-4)! \, 6^4} = \frac{5}{18}$. □

MODEL 3. (Combinations). Assume that in the permutation box the balls are not numbered, so that only the location of the balls in the cells matters, but not how the balls permute among themselves. If the total number of such placements, referred to as combinations of $k$ out of $n$, is denoted by $\binom{n}{k}$, it is obvious that each of the $\binom{n}{k}$ combinations is to be multiplied by $k!$ in order to restore the permutations, and thus

$P(n,k) = k! \binom{n}{k}$,

or

$\binom{n}{k} = \frac{n!}{(n-k)! \, k!}$.  (2.3)

Combinations occur more frequently in applications and play a more important role than permutations. In particular,

$(a+b)^n = \sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k}$  (2.3a)

is one of the major formulas in mathematics, known as the binomial formula. One important application of this formula is when $a = b = 1$. Then we have

$\sum_{k=0}^{n} \binom{n}{k} = 2^n$.  (2.3b)

For instance, if $A$ is an $n$-element set, $A = \{a_1, \ldots, a_n\}$, in how many different ways can we select one-element, two-element, etc., subsets of $A$, including the empty set? In other words, what is the cardinal number of the power set $\mathcal{P}(A)$, i.e., the quantity of all subsets of $A$, including the empty set and $A$ itself? Since the order within a set does not matter, a selection of a $k$-element subset is equivalent to a placement of $k$ balls in $n$ numbered cells. The occupied cells, say 2, 5, 7, will correspond to the selection of the subset with elements $a_2, a_5, a_7$. So, we have $\binom{n}{k}$ $k$-element subsets of $A$. And the total number of subsets of $A$ is $2^n$.
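
Formula (2.3b) is easy to verify numerically in R; the choice $n = 8$ below is arbitrary:

> sum(choose(8, 0:8))    # left-hand side of (2.3b)
[1] 256
> 2^8                    # right-hand side
[1] 256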

Example 2.3 (Lottery Game). In a typical 6-from-49 lottery, 6 numbers (in the form of balls) are drawn from 49. If the 6 numbers on a ticket match the numbers drawn, the ticket holder is a jackpot winner, regardless of the order in which the numbers are drawn. On the day of the drawing, exactly six numbers are randomly generated. Any participant (of 18 years of age or older) who buys a state-issued ticket fills it out trying to "guess" a forthcoming sequence of six numbers. As mentioned, the order in which the six numbers appear during the drawing is irrelevant. Hence we identify the sample space as

$\Omega = \{\omega_1, \ldots, \omega_N\}$,

where $N = \binom{49}{6} = 13{,}983{,}816$ and an elementary event is a combination of six numbers in increasing order. For instance,

$\{\omega_1\} = \{(1,2,3,4,5,6)\}$.

Since all outcomes are regarded as equally likely, the related probability space $(\Omega, \mathcal{F}(\Omega), P)$ is Laplacian, and thus all singletons are measurable (or elementary events, as noted). Therefore, the probability of being a jackpot winner is $P(J) = 1/N = 1/13{,}983{,}816$, and thus it is very small.

On the other hand, there are also smaller awards for guessing just 4 (or 5) out of the 6 right numbers, which are clearly much more likely to win. So we wonder what the probability is of the event $A \sim$ "guess exactly 4 out of 6". Since the pertinent probability space is Laplacian, $P(A) = |A|/|\Omega| = |A|/N$. To find $|A|$ we note that $A$ contains any combination of 4 out of the 6 right numbers, i.e., $\binom{6}{4}$ quadruples, combined with $\binom{43}{2}$ pairs of wrong numbers. The two named combinations ought to be multiplied, due to the multiplication theorem. So,

$P(A) = \frac{\binom{6}{4}\binom{43}{2}}{\binom{49}{6}} \approx 0.000968619$.  (2.3c)

Note that you can use R operators to calculate it:

> (choose(6,4)*choose(43,2))/choose(49,6)
[1] 0.0009686197

Analogously, the probability of $B \sim$ "guessing exactly 5 out of 6 right" is

$P(B) = \frac{\binom{6}{5}\binom{43}{1}}{\binom{49}{6}} \approx 0.000018449$.  (2.3d)

Finally, the probability of guessing at least 4 out of 6 right is

$P(A) + P(B) + P(J) \approx 0.000987141$. □
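
The final sum can also be obtained in one line in R (keeping full precision rather than adding the rounded values):

> (choose(6,4)*choose(43,2) + choose(6,5)*choose(43,1) + 1)/choose(49,6)
[1] 0.0009871411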

rectangle !ß < ‚ !ß ?. The particle moves only to the right or upward as shown on the figure
Example 2.4. Consider a “random walk” of a particle along an integer lattice (grid) within a

below.

(r , u )

(0,0)
In how many different ways can the particle move from point !ß ! to point <ß ?? In other
words, how many different paths are there connecting points !ß ! and <ß ??

Solution. Obviously, however the particle moves, it needs to run exactly < right steps and ?

Given a particular path of the particle starting from !ß !, let us put a ball in cell 1 if the particle
upward steps. Now, consider the associate one-box-model of <  ? numbered cells with < balls.

moves to the right and leave cell 1 empty if the particle moves upward. Following the path as in
the figure above we then have the associated placement of the balls.

MTH 2401, LECTURE NOTES, Page 11, Version 54


CHAPTER I. FOUNDATIONS OF PROBABILISTIC MODELING

...

with <  ? numbered cells and it is  <? .


Hence, the total number of paths in question equals the number of placements of balls in the box
< 

!ß ! to <ß Aß so that on its way it passes through the point +ß ,?
Example 2.5. Under the condition of Example 2.4, in how many way can the particle move from

(r , u )

( a, b)

(0,0)
Solution. The number of all paths through +ß , is  +,  and the number of all paths from +ß ,
to <ß ? is  <+?, . The total number of pertinent paths is therefore the product
+

 +,  <+?,  if we apply the multiplication model.


<+
+ <+ 

Example 2.6. Under the conditions of Example 2.5, find the probability that when the particle moves from point $(0,0)$ to point $(r,u)$ it passes through the point $(a,b)$.

Solution. Let $(\Omega, \mathcal{F}(\Omega), P)$ be the probability space describing the model. Then $\Omega = \{\omega_1, \ldots, \omega_N\}$, where $\omega_i$ is the $i$th path of the particle and $N$ is the total number of different paths. E.g.,

$\omega_i = ((0,0), (0,1), (1,1), (1,2), \ldots, (r,u))$,

as in the figure above. Now, $N = \binom{r+u}{r}$. Furthermore, we postulate that the space is Laplacian, because all paths are equally likely. Therefore, the probability of the event $A$ that when the particle moves from point $(0,0)$ to point $(r,u)$ it passes through the point $(a,b)$ equals

$P(A) = \frac{|A|}{|\Omega|} = \frac{\binom{a+b}{a} \binom{r-a+u-b}{r-a}}{\binom{r+u}{r}}$. □

MODEL 4. (Hypergeometric Model). Consider the following generalization of the lottery game. Suppose a box is filled with a total of $N$ balls of two colors: $r$ red and $w$ white, so that $r + w = N$. In an experiment one takes a sample of $n$ ($\le N$) balls. What is the probability that $k$ of them are red?

Solution. First identify the sample space as $\Omega = \{\omega_1, \ldots, \omega_{\binom{N}{n}}\}$, whose cardinality $|\Omega|$ equals the number of combinations of $n$ (different samples of balls) out of $N$, at first without regard to the colors. Take, as usual, $\mathcal{F}(\Omega) = \mathcal{P}(\Omega)$, including the singletons $\{\omega_j\}$, and postulate that all of them (now elementary events) have equal measure, i.e., $P(\{\omega_j\}) = \frac{1}{|\Omega|} = \frac{1}{\binom{N}{n}}$. Thus we have $(\Omega, \mathcal{F}(\Omega), P)$ Laplacian.

Now, we need to figure out the event $A$ that includes all those elementary events with exactly $k$ red balls out of $r$ (whose number is $\binom{r}{k}$), the remaining $n-k$ being white (whose number is $\binom{w}{n-k}$). Thus, by the multiplication theorem, $|A| = \binom{r}{k}\binom{w}{n-k} = \binom{r}{k}\binom{N-r}{n-k}$. Finally,

$P(A) = \frac{|A|}{|\Omega|} = \frac{\binom{r}{k}\binom{w}{n-k}}{\binom{N}{n}} = \frac{\binom{r}{k}\binom{N-r}{n-k}}{\binom{N}{n}}$.  (*)

We must restrict $k$ to running from $\max\{n-w, 0\}$ to $\min\{r, n\}$.

To reduce this model to the lottery game we take $N = 49$, $r = n = 6$, and $w = 43$. Then (*) fully reduces to (2.3c) if we take $k = 4$. □
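
Note that R has the hypergeometric distribution built in: dhyper(k, r, N-r, n) returns (*) directly. For instance, for the lottery values above,

> dhyper(4, 6, 43, 6)    # same as (2.3c)
[1] 0.0009686197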

MODEL 5. (Generalized Placements. The Multinomial Theorem.) We generalize the combinations in Model 3 by selecting $n$ balls of $r$ different colors and placing them in the same box of $n$ cells as in Model 3. So, we wonder how many different placements are possible.

To solve this problem, we first rephrase the problem in Model 3 (with $k$ balls) by considering $n$ balls of which $k$ are red and $n-k$ are white.

Then, during the process of placements, we identify the empty cells in Model 3 with those occupied by white balls, to arrive at the same number

$\binom{n}{k} = \frac{n!}{(n-k)! \, k!}$  (2.6)

of different placements of red and white balls (which are distinguishable by colors only).

Equation (2.6) can be rewritten as

$n! = k! \, (n-k)! \binom{n}{k}$.  (2.6a)

The latter can be interpreted as follows. To have $n!$ different permutations of $n$ balls (regardless of their colors), we can combine $k!$ different permutations of $k$ red balls with $(n-k)!$ different permutations of $n-k$ white balls, provided that they occupy $k$ and $n-k$ fixed cells. Then, multiplying $k!(n-k)!$ by $\binom{n}{k}$, we take into account all different placements of the balls, and, as follows from (2.6a), we restore $n!$.

Now, suppose we have a set of $n$ balls of $r$ different colors, so that $n_i$ balls are of color $c_i$ and $n_1 + \cdots + n_r = n$.

Emulating the similar experiment with $n$ such balls and denoting by $\binom{n}{n_1 \, \cdots \, n_r}$ the number of different placements of $n$ balls distinguished by colors only, we deduce that

$n!$ (= the number of permutations with no regard to colors) $= n_1! \cdots n_r! \binom{n}{n_1 \, \cdots \, n_r}$,

and thus, from the last equation,

$\binom{n}{n_1 \, \cdots \, n_r} = \frac{n!}{n_1! \cdots n_r!}$.  (2.6b)

The following is a useful generalization of the binomial theorem:

$(a_1 + \cdots + a_r)^n = \sum_{\{(n_1,\ldots,n_r):\; n_1+\cdots+n_r = n\}} \binom{n}{n_1 \, \cdots \, n_r} a_1^{n_1} \cdots a_r^{n_r}$.  (2.6c)

Explanation. For $a_1 = a$, $a_2 = b$, and $a_3 = c$, we have the following special case:

$(a+b+c)^n = \sum_{i=0}^{n} \sum_{j=0}^{n-i} \binom{n}{i \;\, j \;\, n-i-j} a^i b^j c^{n-i-j}$.

Next,

$(a+b+c+d)^n = \sum_{i=0}^{n} \sum_{j=0}^{n-i} \sum_{k=0}^{n-i-j} \binom{n}{i \;\, j \;\, k \;\, n-i-j-k} a^i b^j c^k d^{n-i-j-k}$.

Finally, (2.6c) can be rewritten as

$(a_1 + \cdots + a_r)^n = \sum_{n_1=0}^{n} \sum_{n_2=0}^{n-n_1} \cdots \sum_{n_{r-1}=0}^{n-n_1-\cdots-n_{r-2}} \binom{n}{n_1 \, \cdots \, n_r} a_1^{n_1} \cdots a_r^{n_r}$,

where the last index is determined by $n_r = n - n_1 - \cdots - n_{r-1}$.

Example 2.7. In how many different ways can 9 security officers patrol three different sites on the FIT campus such that in Areas 1, 2, and 3 there must be 2, 3, and 4 officers, respectively?

The answer is obviously $\binom{9}{2 \, 3 \, 4}$.

We can explain the above result as follows. Suppose we identify the 9 officers as $O_1, \ldots, O_9$ prior to the task. Now, we assume that the officers patrolling Area 1 will wear blue uniforms, those patrolling Area 2 will wear red uniforms, and those patrolling Area 3 will wear green uniforms. Consider the following distribution of the officers:

$O_1, O_2, O_3, O_4, O_5, O_6, O_7, O_8, O_9$

The associated box model will carry 9 cells filled with 9 balls, of which, say, the first, fifth, and eighth are red, the fourth and sixth are blue, and the rest are green. The total number of different patrolling arrangements equals the number of placements of nine balls of which two are blue, three are red, and four are green. The latter equals $\binom{9}{2 \, 3 \, 4}$, as per formula (2.6b). □
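
A quick R check of (2.6b) for this example:

> factorial(9)/(factorial(2)*factorial(3)*factorial(4))
[1] 1260
> choose(9,2)*choose(7,3)    # sequential selection gives the same count
[1] 1260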


PROBLEMS

2.1. How many different seven-digit phone numbers can be generated, provided that the first digit cannot be zero?

2.2. Prove that for any $n \ge 1$,

$\binom{n}{0} - \binom{n}{1} + \binom{n}{2} - \cdots + (-1)^n \binom{n}{n} = 0$.

2.3. Show that $\binom{n}{k} = \binom{n}{n-k}$.

2.4. Show that $\binom{n+1}{k} = \binom{n}{k} + \binom{n}{k-1}$.

In the problems below, identify the associated sample space $\Omega$, the elementary points $\omega_1, \omega_2, \ldots$, and the event $A$ in question, and calculate $P(A)$.

2.5. Suppose three friends enter a train together at one of the stops. Suppose that the train needs to make five more stops until its final destination. Considering that we have no prior knowledge of where the three friends are going to get off the train, what is the probability that all three of them get off at different stops?

2.6. In the above lottery game (Example 2.3), find the probability that a single ticket will match exactly three out of six winning numbers.

Answer. $0.0176504$. Note that you can use R operators to calculate it:

> (choose(6,3)*choose(43,3))/choose(49,6)
[1] 0.0176504

2.7. If 5 books are picked at random from a shelf containing 7 novels, 4 books of poems, and 3 dictionaries, what is the probability of the event $A$ that 3 novels and 2 books of poems are selected? Prior to the calculation of $P(A)$, construct the sample space and explain how the probability of $A$ is calculated based on a Laplacian space argument.

2.8. If $n$ students are randomly seated in a row, what is the probability that two of them, say A and B, will sit next to each other?

2.9. In the context of the lottery game (Example 2.3), what is the minimal number of lottery tickets one must buy in order that at least one of them gives at least four right numbers with probability one, provided that all of them are filled out differently, but with no particular strategy?

2.10. In the context of the lottery game (Example 2.3), one buys 1000 lottery tickets and fills them out differently, but with no particular strategy. What is the probability that at least one of them gives at least four right numbers?


2.11. Suppose that a deck of 30 cards, containing 10 red cards, 10 blue cards, and 10 green cards, is distributed at random among three people so that each gets 10 cards. What is the probability that each person receives cards of the same color?

2.12. Suppose a particle randomly moves in a three-dimensional integer lattice bounded by a three-dimensional rectangle, starting from point $(0,0,0)$ and terminating its walk at $(P,Q,R)$.

[Figure: three-dimensional lattice paths from $(0,0,0)$ to $(P,Q,R)$]

How many different paths are there, provided that the particle can move only in the positive direction along any of the three axes? Explain your steps and justify your answer.

Answer. $\binom{P+Q+R}{P \, Q \, R}$.

2.13. Calculate the number of different arrangements of all letters in the word

SOCIOLOGICAL

Solution. We use Model 5. If necessary, we can associate letters with balls of different colors. For instance, letter O can be associated with three red balls, etc. Altogether we have three O's, two C's, two I's, two L's, one S, one A, and one G. The answer is

$\binom{12}{3 \, 2 \, 2 \, 2 \, 1 \, 1 \, 1}$. □
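
In R:

> factorial(12)/(factorial(3)*factorial(2)^3)    # 12!/(3! 2! 2! 2!)
[1] 9979200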

from !ß !ß ! to T ß Uß V ß it passes through a point :ß ;ß <.


2.14. Under the condition of Problem 2.12, find the probability that when the particle moves

MTH 2401, LECTURE NOTES, Page 17, Version 54


CHAPTER I. FOUNDATIONS OF PROBABILISTIC MODELING

( P, Q, R )

( p , q, r )

(0,0,0)


3. Conditional Probability and Bayes Formula

A Background of Conditional Probability. Suppose a population of $N$ people living in a town T consists of $m$ males and $w$ females. Also assume that of the $N$ people, $c$ are colorblind. Suppose a randomly selected person is a female. What is the probability that she is colorblind?

Obviously, this probability is $c_w/w$, where $c_w$ is the number of colorblind females. We will formalize it. If $W$ and $C$ are the sets of all females and of all colorblind people, then the above probability $c_w/w$ can be expressed as $|C \cap W|/|W|$, where $|A|$ is the number of elements in a finite set $A$. This, as the reader sees, agrees with the axioms of a Laplacian probability space. In this case, of course, $W$ represents a sample space and the event $C \cap W$ is the trace of $C$ on $W$; thus it is in $W$.

So we see here that $(W, \mathcal{F}(W), P_W)$ is a probability space formed from the original probability space $(\Omega, \mathcal{F}(\Omega), P)$, where $\Omega$ represents the population of town T. Also, $P_W(C) = |C \cap W|/|W|$, which holds for $C$ and any other measurable subset $L$ of $\Omega$, with its trace on $W$.

We would like to express the measure $P_W$ through the original measure $P$, acting on $\mathcal{F}(\Omega)$. Dividing the numerator and the denominator in the last fraction by $N$, we arrive at the same result,

$c_w/w = |C \cap W|/|W| = \frac{|C \cap W|/N}{|W|/N} = \frac{|C \cap W|/|\Omega|}{|W|/|\Omega|}$,  (3.1)

with a different interpretation. Obviously, the new numerator of (3.1) represents the probability that a randomly selected individual is a colorblind female (i.e., simultaneously a female and colorblind), while the denominator is the probability that a randomly chosen person is a female. Altogether it becomes a ratio of two probabilities, $P(C \cap W)$ over $P(W)$, thus giving

$c_w/w = \frac{|C \cap W|/N}{|W|/N} = \frac{P(C \cap W)}{P(W)}$.  (3.2)

Notice that the sets $C$ and $W$ have now turned into events from $\mathcal{F}(\Omega)$. Most importantly, we can say that the ratio in (3.2) is the conditional probability that a randomly chosen person is colorblind given that this person is a female, in notation $P(C|W)$. So we have

$P_W(C) = P(C|W) = \frac{P(C \cap W)}{P(W)}$.  (3.3)

Among the very important applications of the conditional probability formula are the total probability formula and the Bayes formula.

The Total Probability Formula. If the sample space $\Omega$ is partitioned into $n$ disjoint subsets, $\Omega = H_1 \cup \cdots \cup H_n$, referred to as hypotheses, and if $A \subseteq \Omega$ is another set (all are events), then $A$ (by the distributive law) will also be partitioned as

$A = \Omega \cap A = (H_1 \cup \cdots \cup H_n) \cap A = (A \cap H_1) \cup \cdots \cup (A \cap H_n)$.

In other words, $A \cap H_i$ is the trace of $A$ on $H_i$, and all these traces are disjoint. Applying the probability measure $P$ to the left- and right-hand sides of the last equation and using the additivity axiom (b), we have

$P(A) = \sum_{i=1}^{n} P(A \cap H_i) = \sum_{i=1}^{n} P(A|H_i) P(H_i)$  (3.4)

after applying the conditional probability formula to each of the $n$ summands. The formula we arrived at is called the total probability formula, and it can be extended in the same way to an infinite sum (series).

The Bayes Formula. Now we turn to the celebrated Bayes formula, which is foundational in probability and statistics, notably in Bayesian statistics. [Thomas Bayes, ca. 1702 - April 17, 1761, was a British mathematician who is the author of the formula.]

Suppose the probability of some hypothesis, say $H_k$, is known and equals $P(H_k)$. It is referred to as the prior probability of $H_k$. Suppose an event $A$ related to $H_k$ has occurred and we want to reevaluate $P(H_k)$ after event $A$ occurred, since additional information on $H_k$ has entered via $A$. More formally, we need to calculate $P(H_k|A)$, called the posterior probability. Using the conditional probability formula (3.3) twice and the total probability formula (3.4) gives

$P(H_k|A) = \frac{P(H_k \cap A)}{P(A)} = \frac{P(A|H_k) P(H_k)}{\sum_{i=1}^{n} P(A|H_i) P(H_i)}$,  (3.5)

known as the Bayes formula.

Example 3.1. Suppose that, based on the symptoms a patient has, his doctor is 60% certain that the patient has a particular disease. If the doctor's suspicions were overwhelming, say at least 85%, then he would recommend surgery. Under these circumstances, the doctor opts for quite an invasive and expensive test, which unfortunately is not 100% reliable. In particular, the test can show positive even if the patient does not have the disease (a false positive), because of his diabetes; the chance of this is 30%. On the other hand, the test can show negative even if the patient does have the disease (a false negative) in 10% of all cases. The question is: in the event the test shows positive, by how much will the prior estimate of 60% increase, and is that enough to make the test worth rendering? Can we accurately predict this before running the test, in order to see if a positive test will elevate the prior from 60% to 85% or higher?

Solution. We start by identifying the reference event $A$ and the related hypotheses:

$A \sim$ test shows positive

$H_1 \sim$ patient has the disease, with the prior $P(H_1) = 0.6$

$H_2 \sim$ patient does not have the disease, with the prior $P(H_2) = 0.4$

Further identification of the related conditional probabilities from (3.5) gives

$P(A^c|H_1) = 0.1$ and thus $P(A|H_1) = 0.9$,

$P(A|H_2) = 0.3$.

Using the Bayes posterior probability formula (3.5), we have

$P(H_1|A) = \frac{0.9 \cdot 0.6}{0.9 \cdot 0.6 + 0.3 \cdot 0.4} = 0.818$.  (E3.1)

As we see from (E3.1), even if the result of the test turns out to be positive, the (posterior) probability that the patient has the disease would rise from 0.6 to 0.818, not high enough to warrant surgery. Consequently, the test will not be recommended. □
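
The posterior (E3.1) is a one-liner in R (the variable names are ours):

> prior <- 0.6; pos.dis <- 0.9; pos.nodis <- 0.3    # P(H1), P(A|H1), P(A|H2)
> pos.dis*prior/(pos.dis*prior + pos.nodis*(1-prior))
[1] 0.8181818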


PROBLEMS

3.1. Two fair dice are rolled. What is the conditional probability that at least one lands on 3, given that the dice land on different numbers?

3.2. Let $A \subset B$. Using the conditional probability formula, express the following probabilities as simply as possible:

(a) $P(A|B^c)$ (b) $P(B|A)$.

3.3. Peter tries to avoid going to a party to which he was invited. To justify his absence, he flips a coin, and if the coin shows heads, he goes. Otherwise, he rolls a die to give the party yet another chance. If the die lands on 6, he goes. Otherwise, he stays home. If Peter ends up being at the party, what is the probability that the coin he flipped showed heads?

Solution. Step 1. We need to identify the event $A$ and the related hypotheses $H_1, H_2$:

$A \sim$ Peter goes to the party

$H_1 \sim$ coin shows heads; $P(H_1) = \frac{1}{2}$ (prior)

$H_2 \sim$ coin shows tails; $P(H_2) = \frac{1}{2}$ (prior)

Step 2. Find the conditional probabilities of the event $A$ under $H_1$ and $H_2$:

$P(A|H_1) = 1$

$P(A|H_2) = \frac{1}{6}$ (because this is the probability of rolling 6 dots with a die)

Step 3. Find $P(A) = 1 \cdot \frac{1}{2} + \frac{1}{6} \cdot \frac{1}{2} = \frac{7}{12}$. (By the way, in spite of all the seemingly formidable obstacles, the chances of being at the party are more than 1/2.)

Step 4. Find the posterior probability using Bayes: $P(H_1|A) = \frac{1 \cdot \frac{1}{2}}{7/12} = \frac{6}{7}$. □

3.4. A gambler has in his pocket a fair coin and a biased coin (with chances for heads 9 out of 10). He selects one of the coins at random and flips it twice. Suppose that in both cases the coin shows heads. What is the probability that the fair coin was selected?

Solution. Step 1. We need to identify the event $A$ and the related hypotheses $H_1, H_2$:

$A \sim$ the coin is flipped twice and in both cases shows heads

$H_1 \sim$ the fair coin is selected; $P(H_1) = \frac{1}{2}$ (prior)

$H_2 \sim$ the biased coin is selected; $P(H_2) = \frac{1}{2}$ (prior)

Step 2. Find the conditional probabilities of the event $A$ under $H_1$ and $H_2$:

$P(A|H_1) = \frac{1}{2} \cdot \frac{1}{2} = \frac{1}{4}$

$P(A|H_2) = 0.9 \cdot 0.9 = 0.81$

Step 3. Find $P(A) = \frac{1}{2} \cdot \frac{1}{4} + \frac{1}{2} \cdot 0.81 = 0.53$

Step 4. Find the posterior probability using Bayes:

$P(H_1|A) = \frac{\frac{1}{2} \cdot \frac{1}{4}}{0.53} \approx 0.235849$. □
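
Verification in R:

> (0.5*0.25)/(0.5*0.25 + 0.5*0.81)
[1] 0.2358491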

3.5. A sports team of 18 archers includes five who hit the target with probability $0.8$, seven with probability $0.7$, four with probability $0.6$, and two with probability $0.5$. A randomly selected archer shoots a bow and misses the target. To which group does he most likely belong?

Answer: The second group.

3.6. A student forgot the last digit of a phone number she was about to dial. She decided to dial the last digit at random. What is the probability that the student needs at most three trials?

Answer: $0.3$.

3.7. Urn A contains 2 white and 8 red balls, whereas urn B contains 7 white and 2 red balls. A ball is drawn at random from urn A and placed in urn B, after which a ball drawn from urn B happened to be red. What is the probability that the first ball, drawn from urn A, was also red?


4. Independent Events

From (3.3), $P(C|W) = \frac{P(C \cap W)}{P(W)}$. If $W$ does not affect $C$ (now just two generic events), we define $P(C|W) = P(C)$, thus yielding from this special case of (3.3) that

$P(C)P(W) = P(C \cap W)$.  (4.1)

From (4.1) it follows that

$\frac{P(W \cap C)}{P(C)} = P(W)$, which is also $P(W|C)$.  (4.2)

Hence, if $W$ does not affect $C$, then $C$ does not affect $W$ either, and we see that the two events $W$ and $C$ are independent in a mutual way. (4.1) can be used as an equivalent definition of independence of two events.

A very similar notion of independence can be used for more than two events. Say, for three events $A, B, C$, independence means (a) pairwise independence as in (4.1) and (b) all three at once:

$P(A \cap B \cap C) = P(A)P(B)P(C)$.  (4.3)

Example 4.1 (Bernstein Tetrahedron). A tetrahedron is a pyramid with four faces, each being a perfect triangle. Suppose we have a homogeneous tetrahedron three of whose faces are painted red, yellow, and green, while the fourth face is painted in all three colors. Suppose the events $R, Y, G$ mean the appearance of red, yellow, and green, respectively, on the face of the tetrahedron landing down on the surface. Are the events $R, Y, G$ pairwise independent? Are these events independent (altogether)?

Solution. Check that $P(RY) = 1/4$, because red and yellow appear together only on the one face that contains all three colors. On the other hand, $P(R) = P(Y) = 2/4 = 1/2$ and thus $P(RY) = P(R)P(Y)$, meaning that the appearances of the colors are pairwise independent. (The other combinations of colors obviously give the same results.) However, $P(RYG) = 1/4$, while $P(R)P(Y)P(G) = \left(\frac{1}{2}\right)^3$, and thus, as a triple, $R, Y, G$ are not independent.

Notice that without checking, it would be hard to impossible to figure this out by mere intuition. □

The independence of any family of events is defined as independence of any finite subfamily, and the latter, in turn, requires independence of any combination of the involved events.


PROBLEMS

4.1. Suppose $A$ and $B$ are independent events such that $P(A) = \frac{1}{2}$ and $P(B) = \frac{3}{4}$. Determine the probability that neither of these events occurs. (Justify your actions.)

Solution. Step 1. We first need to show that if $A$ and $B$ are independent, then so are $A^c$ and $B^c$.

Step 1a) Show that $A$ and $B^c$ are independent:

$A \cap B^c = A - B = A - (A \cap B)$

$\Rightarrow P(A \cap B^c) = P(A) - P(A \cap B) = P(A) - P(A)P(B) = P(A)(1 - P(B)) = P(A)P(B^c)$

Step 1b) Show that $A^c$ and $B^c$ are independent:

If $A$ and $B$ are independent, then by Step 1a, $A$ and $B^c$ are independent; applying Step 1a once more to the independent pair $B^c$ and $A$ shows that $B^c$ and $A^c$ are independent.

Step 2. $P(A^c \cap B^c) = P(A^c)P(B^c) = (1 - P(A))(1 - P(B)) = \frac{1}{2} \cdot \frac{1}{4} = \frac{1}{8}$. □

4.2. If three balanced dice are rolled, what is the probability that all three numbers will be the same?

Solution. The three dice are independent. Let

$P(A_i) =$ probability that the first die shows $i$ dots $= \frac{1}{6}$,

$P(B_i) =$ probability that the second die shows $i$ dots $= \frac{1}{6}$,

and, similarly, $P(C_i) = \frac{1}{6}$, the probability that the third die shows $i$ dots. Then

$P(\text{all three dice show the same number}) = \sum_{i=1}^{6} P(A_i \cap B_i \cap C_i) = \sum_{i=1}^{6} P(A_i)P(B_i)P(C_i) = 6 \cdot \frac{1}{6} \cdot \frac{1}{6} \cdot \frac{1}{6} = \frac{1}{36}$. □

4.3. Consider two independent tosses of a fair coin. Let $A$ be the event that the first toss lands heads, $B$ the event that the second toss lands heads, and $C$ the event that both coins land on the same side. (a) Show that the events $A$ and $C$ are independent. (b) Show that $A, B, C$ are not independent.

4.4. A break in an electric circuit occurs if at least one of three independent components connected in series is out of order. Calculate the probability of the event $A$ that the break occurs if the components fail with respective probabilities $0.1$, $0.3$, and $0.6$.

Answer: $P(A) = 0.748$.


5. Discrete Random Variables

In probabilistic modeling, a very useful tool is functions, which transfer abstract sample spaces into much handier numeric sets. Consider the following example.

Example 5.1. If we toss a coin twice, with

$\Omega = \{(T_1,T_2), (H_1,T_2), (T_1,H_2), (H_1,H_2)\}$  (5.1)

and we are interested in the number of heads in these two trials, the relevant function will be

$X: \Omega \to \{0, 1, 2\}$.  (5.2)

If the coin is fair, the probability defined on "elementary events" such as $\{(H_1,H_2)\}$ is "uniform", with $\frac{1}{4}$ for each. Correspondingly, declaring $\{0\}, \{1\}, \{2\}$ elementary events in $\mathcal{F}(\{0,1,2\})$, we find that

$P_X(\{0\}) = P\{X = 0\} = \frac{1}{4}$ (because $0$ corresponds to $(T_1,T_2)$)

$P_X(\{1\}) = P\{X = 1\} = \frac{1}{2}$

$P_X(\{2\}) = P\{X = 2\} = \frac{1}{4}$.

So, the function $X$ induces a new probability measure $P_X$ on $\mathcal{F}(\{0,1,2\})$, called the probability distribution of $X$. The function $X$ is called a random variable (r.v.).

Notice that

$X^{-1}(0) = \{X = 0\} = \{(T_1,T_2)\}$
$X^{-1}(1) = \{X = 1\} = \{(H_1,T_2), (T_1,H_2)\}$
$X^{-1}(2) = \{X = 2\} = \{(H_1,H_2)\}$;

the sets above must be measurable, i.e., more precisely, they must belong to $\mathcal{F}(\Omega)$. If they do not, then $X$ is not a r.v. □
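
Anticipating the binomial r.v. below (here $X$ is binomial with $n = 2$ and $p = \frac{1}{2}$), the three probabilities can be reproduced in R:

> dbinom(0:2, 2, 0.5)    # P{X=0}, P{X=1}, P{X=2}
[1] 0.25 0.50 0.25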

Example 5.2. Consider the general case of a function

$X: \Omega \to \Omega' = \{x_1, x_2, \ldots\}$

(whose range is $\Omega'$). In other words, $x_i = X(\omega)$ for some $\omega \in \Omega$. We will see a graphical illustration of what makes $X$ a r.v., as opposed to being just a function on $\Omega$.

Recall that $X^{-1}$ is the inverse of the function $X$. Furthermore, $X^{-1}(x)$ need not be a point in $\Omega$, but rather a set, as in the above example with the tossing of a coin.

Figure 5.1 below depicts how the inverse of a discrete r.v.

$X: \Omega \to \Omega' = E = \{x_1, x_2, \ldots\}$

partitions the sample space $\Omega$ into the subsets

$X^{-1}(x_1) = \{X = x_1\}, \; X^{-1}(x_2) = \{X = x_2\}, \ldots$  (5.3)

These subsets of $\Omega$ must be events (i.e., belong to $\mathcal{F}(\Omega)$), or else $X$ is not a r.v. If they are, then they are measured by $P$, and their measures form the probability distribution of $X$.

[Figure 5.1: the inverse images $X^{-1}(x_i)$ partitioning $\Omega$]

This distribution, $P X^{-1} = P_X$, acting on the numerical family $\mathcal{F}(\{x_1\}, \{x_2\}, \ldots)$ of (elementary) events, is part of a new "numerical" probability space $(E, \mathcal{F}(E), P_X)$. Thus, the r.v. $X$ generates a new probability measure on $\mathcal{F}(E)$ called the probability distribution of $X$. □

Bernoulli R.V. One of the most rudimentary r.v.'s is

$X: \Omega \to \Omega' = \{0, 1\}$.

For example, the r.v. that models tossing a biased coin, with $\Omega = \{H, T\}$ and $P$ defined on the σ-algebra $\{\Omega, \varnothing, \{H\}, \{T\}\}$ as $\{1, 0, p, 1-p\}$, respectively. The probability distribution $P_X = P X^{-1}$ is defined on

$\mathcal{F}(\Omega') = \mathcal{F}(\{0,1\}) = \{\Omega', \varnothing, \{1\}, \{0\}\}$  (5.4)

as

$P_X: \{\Omega' = \{0,1\}, \varnothing, \{1\}, \{0\}\} \to \{1, 0, p, 1-p\}$.  (5.5)

Such a r.v. is called Bernoulli. More precisely, any r.v. which lands at $\Omega' = \{0,1\}$ with the above distribution, regardless of the nature of the underlying model, is Bernoulli.

Most typically, a generic Bernoulli r.v. $X$ is associated with a single trial which can manifest a success or a failure. The latter in turn is mapped by $X$ to $1$ or $0$, respectively.

We can also formalize a Bernoulli r.v. as follows. Given the σ-algebra $\mathcal{F}(\Omega) = \{\Omega, \varnothing, A, A^c\}$, we can define the following function,

$X(\omega) = \mathbf{1}_A(\omega) = \begin{cases} 1, & \omega \in A \\ 0, & \omega \notin A \end{cases}$  (5.6)

called the indicator function. We recall that the probability space $(\Omega, \mathcal{F}(\Omega), P)$, with $P(\Omega) = 1$, $P(\varnothing) = 0$, $P(A) = p$, and $P(A^c) = 1-p$, is called a Bernoulli space. Now, associated with the Bernoulli space is the above function $X = \mathbf{1}_A$, which is a Bernoulli r.v., because

$\{X = 1\} = A$ and $\{X = 0\} = A^c$

and $P\{X = 1\} = P(A) =: p$ and $P\{X = 0\} = P(A^c) = 1 - p$. □

Binomial R.V. A Bernoulli r.v. is foundational for the formation of two very important classes of r.v.'s: binomial and geometric. Both r.v.'s appear in a series of independent Bernoulli trials (like flipping a coin) manifesting a sequence of successes and failures. In other words, a series of Bernoulli trials is a sequence $X_1, X_2, \ldots$ of independent Bernoulli r.v.'s, each from the equivalence class $[\mathcal{B}(1,p)]$ of Bernoulli r.v.'s with the same parameter $p \in [0,1]$.

In the binomial case, the above sequence is terminated after the $n$th term (trial). The r.v. $Y$ defined as

$Y = X_1 + \cdots + X_n$  (5.7)

is called binomial, in notation $Y \in [\mathcal{B}(n,p)]$. Clearly, the range of $Y$ is $\{0, \ldots, n\}$.

Consider the event $\{Y = k\}$. Obviously,

$E_1 := S_1 \cap \cdots \cap S_k \cap F_{k+1} \cap \cdots \cap F_n \subseteq \{Y = k\}$  (5.8)

(where $S_i$ is the $i$th success and $F_j$ is the $j$th failure) is an elementary event included in $\{Y = k\}$. Any permutation of any two or more successes and failures matters, and it gives us yet another elementary event from $\{Y = k\}$, for instance,

$E_2 = F_1 \cap S_2 \cap \cdots \cap S_{k+1} \cap F_{k+2} \cap \cdots \cap F_n \subseteq \{Y = k\}$.  (5.9)

To find the quantity of all elements of $\{Y = k\}$, we notice that each event $E_i$ can be associated with some placement of $k$ balls among $n$ numbered cells as in Model 3 of Section 2, with the total number of such placements being $\binom{n}{k}$. In other words,


$|\{Y = k\}| = \binom{n}{k}$.  (5.10)

Now, $P(E_1) = p^k (1-p)^{n-k}$ if we take into account the independence of the events $S_1, \ldots, S_k, F_{k+1}, \ldots, F_n$. Consequently,

$P\{Y = k\} = b(n,p;k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, \ldots, n$.  (5.11)

To verify that the above probabilities form a probability distribution, we sum them up:

$\sum_{k=0}^{n} P\{Y = k\} = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = (p + 1 - p)^n = 1$.

The latter is due to the binomial formula

$\sum_{k=0}^{n} \binom{n}{k} a^k b^{n-k} = (a+b)^n$  (5.12)

holding for any two real numbers $a$ and $b$. □

Example 5.3. In many complex parallel reliability systems and parallel subsystems, an $n$-component system stays intact (over a time period) if at least $k$ out of the $n$ components do so. For example, in many power-generating systems with at least two generators, $k$ generators are sufficient to provide the power requirements. Also, in a typical wire cable for cranes and bridges, the cable may contain thousands of wires, and only a fraction of them is required to carry the desired load.

Assuming that all units have identical and independent life distributions and that the probability that a unit is functioning is $p$, the probability that exactly $k$ out of a total of $n$ units function is

$b(n,p;k) = \binom{n}{k} p^k q^{n-k}, \quad k = 0, \ldots, n$.

In the context of a $k$-out-of-$n$ system, at least $k$ of them have to function (over a time period), and this probability is

$R(n,p;k) = \sum_{j=k}^{n} \binom{n}{j} p^j q^{n-j}$.  (5.13)

Here $p$ stands for the reliability of one component (in fact, of each of the $n$ components) and $R(n)$ is the system reliability.

Consider a $k$-out-of-$n$ reliability system whose components can each fail independently of the others with probability $q = 0.75$. Suppose that at least 20 components must function for the system to stay intact (so $k = 20$). What is the minimal number of components $N$ needed so that the reliability $R$ of this system is at least $0.95$?

Solution. We will use an R program to calculate $N$. Note that the pertinent command in R to calculate the sum $\sum_{j=0}^{k} \binom{n}{j} p^j q^{n-j}$ of $k+1$ binomial probabilities of $X: \Omega \to \{0, 1, \ldots, n\}$ is

pbinom(k, n, p)

The system reliability is calculated using formula (5.13):

$R(n) = \sum_{j=k}^{n} \binom{n}{j} p^j q^{n-j} = 1 - \sum_{j=0}^{k-1} \binom{n}{j} p^j q^{n-j} = 1 - \text{pbinom}(k-1, n, p)$,

with $p = 0.25$, $q = 0.75$, and $k = 20$. However, $n$ is undefined. More precisely, $n = 20, 21, \ldots, N$, where $N = \min\{n \ge 20 : R(n) \ge 0.95\}$. Therefore, we compute $N$ using the R program:

N=19;                             # start just below the smallest admissible n = 20
R<-0;
while (R<0.95) {                  # increase N until the reliability reaches 0.95
N=N+1
R<-1-pbinom(19,N,p=0.25);         # R(N) = P{at least 20 of N components function}
}
print(R);
print(N)

And the answer is

> print(R);
[1] 0.9510116
> print(N)
[1] 107

Here 107 is the minimum number of components for which the system reliability $R$ is $0.9510116$. □

Example 5.4. Using an R operator we can make a bar plot of a binomial distribution. Take, for example, $X \in [\mathcal{B}(12, 0.3)]$ and plot the distribution

$b(12, 0.3; k), \quad k = 0, \ldots, 12$.

The R command for this is

barplot(dbinom(0:12,12,0.3),names=as.character(0:12),xlab="x",ylab="f(x)")

[Bar plot of the $\mathcal{B}(12, 0.3)$ probabilities $f(x)$ for $x = 0, 1, \ldots, 12$]

In another variant, $X \in [\mathcal{B}(12, 0.5)]$, we make a bar plot of the distribution of $X$ using the R command

barplot(dbinom(0:12,12,0.5),names=as.character(0:12),xlab="x",ylab="f(x)",col="lightblue2")

enhancing its color:

[Bar plot of the $\mathcal{B}(12, 0.5)$ probabilities $f(x)$ for $x = 0, 1, \ldots, 12$]

Here, with $p = 0.5$, the plot is symmetric about $k = 6$. □

Example 5.5. Suppose it is known that a person over the age of 40 can develop hypertension (a systolic blood pressure reading of 130 or higher) with probability $p$. Let $Y$ be a r.v. recording the number of people in a sample of $n$ individuals that have hypertension. Then $Y: \Omega \to \{0, 1, \ldots, n\}$ has the binomial probability distribution

$P\{Y = k\} = b(n,p;k) = \binom{n}{k} p^k (1-p)^{n-k}, \quad k = 0, \ldots, n$.  (5.14)

Here $Y = X_1 + \cdots + X_n$, where $\{X_i = 1\}$ corresponds to the event that the $i$th individual in the sample has hypertension.

Geometric R.V. We return to the series of independent Bernoulli trials $X_1, X_2, \ldots$ on the same probability space $(\Omega, \mathcal{F}(\Omega), P)$, each mapping $\Omega = \{\text{failure}, \text{success}\}$ onto $\Omega' = \{0,1\}$, with the same Bernoulli probability distribution

$P_X: \{\Omega' = \{0,1\}, \varnothing, \{1\}, \{0\}\} \to \{1, 0, p, 1-p\}$  (5.15)

as above. The series of trials now continues until a first success occurs. For instance, if the first success takes place at the $k$th trial, we have

$X_1 = 0, \ldots, X_{k-1} = 0, X_k = 1$.

(As an example, we can flip a coin up until we obtain heads for the first time.) Then we stop the series. Let $Z$ be the r.v. that gives the number of trials needed to attain the first success. Then the event $\{Z = k\}$ equals

$\{Z = k\} = F_1 \cap \cdots \cap F_{k-1} \cap S_k$,

and thus the distribution of $Z$, by the independence of the trials, is

$p_k = P\{Z = k\} = p q^{k-1}, \quad k = 1, 2, \ldots,$ where $q = 1 - p$.

The r.v. $Z$ is called geometric of type I, or just geometric, because of its association with the terms of a geometric series. It is easy to show that the $p_k$'s, $k = 1, 2, \ldots$, sum up to 1. We will say that $Z \in \text{Geo}(p)$.

The r.v. $W = Z - 1$ counts the number of failures prior to the first success in a series of Bernoulli trials. It is readily seen that the distribution of $W$ is similar to that of $Z$ and is

$P\{W = k\} = p q^k, \quad k = 0, 1, \ldots$

The r.v. $W$ is called geometric of type II, in notation $W \in \text{Geo2}(p)$.

Using an R operator we can make a bar plot of a Geo2 distribution. Take, for example, $W \in \text{Geo2}(0.2)$. The corresponding R command is

fx<-dgeom(0:20,0.2)
barplot(fx,names=as.character(0:20),xlab="x",ylab="f(x)")

[Bar plot of the Geo2(0.2) probabilities $f(x)$ for $x = 0, \ldots, 20$]

Another variant, $W \in \text{Geo2}(0.3)$, yields

[Bar plot of the Geo2(0.3) probabilities $f(x)$ for $x = 0, \ldots, 20$]

For $k, n \ge 0$, we have the following surprising property of the geometric distribution:

$P\{W = k+n \mid W \ge n\} = P(\{W = k+n\} \cap \{W \ge n\})/P\{W \ge n\}$

$= P\{W = k+n\} / \sum_{i \ge n} P\{W = i\}$

$= p q^{k+n} \Big/ \left(p \frac{q^n}{1-q}\right) = p q^{k+n}/q^n = p q^k$.

This means that if a time process, like a telephone conversation, is observed from some moment prior to which the conversation has lasted $n$ or more units of time (seconds), the probability that it will last $k$ units of time longer thereafter is the same as the probability that a conversation lasts $k$ seconds when observed from the very beginning. (In this process, we do not count the very last second, so as to associate it with the type II geometric r.v.) The type I geometric r.v. has a similar property. We call it the memoryless property. It can be proved that if a discrete-valued r.v. has the memoryless property, then it is geometric. Thus the geometric r.v. is the only discrete r.v. with the memoryless property. □
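
The memoryless property is easy to confirm numerically in R, whose dgeom and pgeom implement the type II geometric r.v.; the values of $p$, $k$, and $n$ below are arbitrary:

> p <- 0.2; k <- 3; n <- 5
> dgeom(k+n, p)/(1 - pgeom(n-1, p))    # P{W = k+n | W >= n}
[1] 0.1024
> dgeom(k, p)                          # P{W = k}
[1] 0.1024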

Example 5.6. Under the conditions of Example 5.5, if the sample size $n$ is large (say over 500) and $p$ is small (we can replace the above age group by individuals 25 years of age and younger), then (5.14) can be approximated by

$P\{X = k\} = e^{-\lambda} \frac{\lambda^k}{k!}, \quad k = 0, 1, \ldots$  (5.16)

($np$ converges to some $\lambda > 0$ as $n \to \infty$ and $p \to 0$). Indeed, let $\lambda = np$ for a large $n$ and small $p$. Then, for a fixed $k$, we can represent $b(n,p;k)$ as

$\binom{n}{k} p^k (1-p)^{n-k} = \frac{n(n-1)\cdots(n-k+1)}{n^k} \cdot \frac{\lambda^k}{k!} \cdot \left[\left(1 - \frac{\lambda}{n}\right)^{n/\lambda}\right]^{\lambda} \cdot (1-p)^{-k}$,

of which the first factor approaches $1$ as $n \to \infty$ and the fourth factor approaches $1$ as $p \to 0$. The third factor converges to $e^{-\lambda}$ as $n \to \infty$.

Ideally, $k$ runs over the countable set of nonnegative integers to qualify for a distribution (since only the infinite series $\sum_{k=0}^{\infty} \frac{\lambda^k}{k!}$ equals $e^{\lambda}$), but because the convergence is very rapid, the probabilities in (5.16) for $k$ larger than, say, 300 are negligibly small. The distribution defined in (5.16) is called Poisson with parameter $\lambda$. Poisson r.v.'s are among the most important r.v.'s in biostatistics and stochastic (i.e., random) processes. A very prominent class of stochastic processes (with applications including genetics and ecology) is the Poisson process, which is related to the Poisson r.v. We will denote $X \in [\mathcal{P}(\lambda)]$.

Using an R operator we can make a bar plot of a Poisson distribution. Take, for example, $X \in [\mathcal{P}(5)]$. The corresponding R command is

fx<-dpois(0:15,5)
barplot(fx,names=as.character(0:15),col="lightgoldenrod3")

[Bar plot of the $\mathcal{P}(5)$ probabilities for $k = 0, \ldots, 15$]

Remark 5.1. If $Y$ is a binomial r.v. with parameters $(n,p)$, then, as we know from (5.7), $Y = X_1 + \cdots + X_n$, where the $X_i$'s are Bernoulli r.v.'s valued $0$ or $1$ each, with $X_i = 1$ with probability $p$. The "average" or "expected value" of $X_i$, in notation $EX_i$, will be $p$, calculated as $EX_i = p \cdot 1 + (1-p) \cdot 0$. For instance, when flipping a fair coin, the average value of one outcome is $\frac{1}{2}$. Informally, $np$ is then the average value of the r.v. $Y$, as the sum of the expected values of $n$ Bernoulli r.v.'s. (More about this in Section 6.)

Consequently, in the Poisson case, $\lambda$ has the meaning of the average value of the Poisson r.v. $X$, as the limiting value of $np$ in the binomial case. In most situations we will be concerned with, the Poisson distribution will serve as an approximation to any binomial distribution qualified for the approximation by having a large $n$ and a small $p$. □

Example 5.7 (tutorials on some discrete r.v.'s).

(i) It is known that the Norton Antivirus software can identify and eliminate 90% of all current and past viruses. Suppose a sample of 10 personal computers was selected for testing, and they were all infected with various viruses. What is the probability that exactly 8 of them will be detected by Norton?

Solution. We model the above situation by a binomial r.v. with parameters $n = 10$ and $p = 0.9$. The reason for this is that $p = 0.9$ comes from large statistics on an unidentified population, from which a small sample was randomly drawn. Thus, if the r.v. $Y$ describes the number of detected viruses (i.e., the number of machines on which Norton detects them), we need to find the probability that exactly 8 out of the 10 PCs will be detected by Norton:

$P\{Y = 8\} = \binom{10}{8} 0.9^8 \, 0.1^2 = 45 \cdot 0.43046721 \cdot 10^{-2} = 0.1937102445$.  (5.17)

(Alternatively, the same answer can be found in the attached Table of Binomial Distributions, page T-7. However, the above formula needs to be presented.)
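
Alternatively, in R:

> dbinom(8, 10, 0.9)
[1] 0.1937102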

(ii) Suppose a typist makes 1 1/2 typos per page on average when typing a manuscript. What is the probability that on a given page she will make three or more typos?

Solution. A typical page of a manuscript can contain between 500 and 1000 characters, say 750. Due to Remark 5.1, we interpret the phrase "1 1/2 typos per page on average" as np = 1 1/2. With n = 750 (large) we then have p = 1.5/750 = 0.002, which is small. Therefore this is ideally a binomial model, but it allows a good Poisson approximation with λ = 1 1/2. Thus, the corresponding r.v. X is "approximately" Poisson with λ = 1 1/2 and we need to find

P{X ≥ 3} = 1 − P{X ≤ 2}

= 1 − [P{X = 0} + P{X = 1} + P{X = 2}]

= 1 − e^{−λ}(1 + λ/1! + λ²/2!)  (with λ = 1 1/2).   (5.18)
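Numerically, (5.18) can be evaluated in R with ppois, the Poisson CDF:

> 1 - ppois(2, 1.5)
[1] 0.1911532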

(iii) A student is about to submit a draft of her master's thesis. It is known that there is a 70% chance that her advisor will accept the draft. If the draft needs a revision, then the new submission will still have the same probability of being accepted. Assume that submissions continue with the same statistics. What is the probability that the student will need fewer than five submissions before her thesis is accepted?

Solution. In this case we observe a series of independent Bernoulli trials X_1, X_2, ..., which ends with the first "success," and with P{X_i = 1} = p = 0.7. Therefore, the r.v. X counting the trials up to and including the one ending the series is geometric with parameter p = 0.7. What we need to find is

P{X ≤ 4} = ∑_{i=1}^{4} p q^{i−1} = p (1 − q^4)/(1 − q) = 1 − q^4 = 1 − 0.3^4 = 1 − 0.0081

= 0.9919.   (5.19)
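In R, pgeom counts failures before the first success, so P{X ≤ 4} for the trial count X corresponds to pgeom(3, 0.7):

> pgeom(3, 0.7)
[1] 0.9919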

Example 5.8. It is known that of the 313.6 million US population, about 200 million have high-speed Internet access at home. Thus the population proportion of high-speed Internet users is 0.6378. Suppose 1500 people are randomly selected. What is the probability that at least 1000 of those responding to the survey have high-speed Internet access?


Solution. We interpret the proportion 0.6378 as p and the sample of 1500 people selected for the survey as 1500 independent Bernoulli trials, so that X = X_1 + ... + X_1500 ∈ [B(1500, 0.6378)] gives the number of those in the surveyed sample who have high-speed Internet access. Thus, we are interested to find

P{X ≥ 1000} = ∑_{k=1000}^{1500} C(1500, k) (0.6378)^k (0.3622)^{1500−k}.

Now the above is a computational challenge. The normal approximation, due to the Central Limit Theorem (to be explored in Chapter V, Section 2), would be one remedy. Alternatively, we can use the R language to compute the above probability precisely. The procedure is as follows:

P{X ≥ 1000} = 1 − P{X < 1000} = 1 − P{X ≤ 999},

implying

> 1-pbinom(999,1500,.6378)
[1] 0.01042895

which reads

P{X ≥ 1000} = 0.01042895.

Here pbinom(999,1500,.6378) calculates

∑_{k=0}^{999} C(1500, k) (0.6378)^k (0.3622)^{1500−k}.

Now if we use the Poisson approximation with λ = 956.7 (why?), we arrive at

    > 1-ppois(999,956.7)
    [1] 0.08395288

The result differs significantly from that of the direct binomial computation. The student needs to explain the discrepancy. (See Problem 5.6.) □

Hypergeometric R.V. A r.v. X: Ω → {0, ..., min{r, n}} with parameters (N, r, n) is called hypergeometric if its distribution is

P{X = k} = C(r, k) C(N − r, n − k) / C(N, n),   k = max{n − (N − r), 0}, ..., min{r, n}.   (5.20)

For an illustration, see Example 2.4 (Hypergeometric Model). □


Example 5.9 (An example from ecology). Suppose an unknown number N of animals or insects inhabit a certain region. In order to obtain some information about the population size (and make a decision on whether this population is endangered or overpopulated), ecologists often perform the following test. They catch a number of animals, say r, and mark them in some manner; then they release them, trying to disperse them throughout the habitat. After a while, a new catch of size n is rendered. If X is the r.v. describing the number of marked animals in the catch, then, as we know from Model 5, X is a hypergeometric r.v. with parameters (N, r, n).

Suppose X is observed to be equal to x. (We assume that the number of animals did not change between the two catches.) From Example 5.5, we recall that

h(N, r, n; x) = P{X = x} = C(r, x) C(N − r, n − x) / C(N, n).   (5.21)

Now, we need to estimate the unknown parameter N using the maximum likelihood principle. Namely, we take as our estimate the value of N that maximizes distribution (5.21). In other words, we need to find an N* (the m.l.e.) such that

h(N*, r, n; x) = max{h(N, r, n; x): N = 1, 2, ...}.   (5.22)

The maximization of h(N, r, n; x) can be done by first noting that

h(N, r, n; x) / h(N − 1, r, n; x) = (N − r)(N − n) / [N(N − r − n + x)].   (5.23)

From (5.23) we find that h(N, r, n; x) > h(N − 1, r, n; x) if and only if the above ratio is greater than 1, and thus if and only if

N < rn/x.

From similar calculations, h(N, r, n; x) < h(N − 1, r, n; x) if and only if

N > rn/x.

Altogether,

rn/x − 1 < N* ≤ rn/x.

Therefore, h(N, r, n; x) reaches its maximum at the largest integer value of N not exceeding rn/x. Roughly speaking, we have

N* ≈ rn/x,

the ratio at which the likelihood function h(N, r, n; x) attains its maximum.

For example, if the initial catch consisted of r = 50 animals, which were marked and then released, and the second catch of n = 100 animals showed x = 5 animals marked, then we will estimate that there are some 1000 animals in the habitat. □
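The search for N* can also be carried out numerically in R; below is a minimal sketch (the search range 200:5000 is an arbitrary assumption) using dhyper, whose arguments here are (x, number marked, number unmarked, sample size):

r <- 50; n <- 100; x <- 5
N <- 200:5000                   # candidate population sizes (assumed range)
lik <- dhyper(x, r, N - r, n)   # the likelihood h(N, r, n; x)
N[which.max(lik)]               # 999 here: h(999) = h(1000), since rn/x = 1000 is an integer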


PROBLEMS

5.1. The monthly worldwide average number of airplane crashes of commercial airlines is 3. What is the probability that there will be at least 2 such accidents in the next month? Justify your choice of a relevant r.v. and its approximation.

5.2. Two chess players of equivalent strength play against each other. Disregarding draws, which is more probable: to win two games out of four, or three games out of six?

5.3. Suppose that X is a type 2 geometric r.v. with parameter p, i.e. P{X = n} = p q^n, n = 0, 1, ... . Determine the probability that the value of X will be one of the even integers 0, 2, 4, ...

5.4. The interval [0, 15] is partitioned into two subintervals, A = [0, 10] and B = (10, 15]. Suppose four points were randomly selected. Assuming that the probability of a point landing in a subinterval is proportional to its length, find the probability that exactly two points fall into A and the other two into B.

5.5. Let the r.v.'s X_1, ..., X_n form n independent Bernoulli trials with parameter p. Determine the conditional probability that X_1 = 1, given that the sum ∑_{i=1}^{n} X_i = k (k = 1, ..., n).

5.6. Explain how λ was obtained in Example 5.8 and what went wrong with the Poisson approximation.

5.7. Given the plot below of a geometric distribution, conclude what type of geometric distribution (type 1 or type 2) it is and give its parameter:

[Bar plot of f(x) for x = 0, ..., 49, starting near 0.10 at x = 0 and decaying geometrically toward 0.]

5.8. In the context of Problem 5.7 write an R-program to plot the above figure.


5.9. Suppose a parallel reliability system consists of a cable of wires for a bridge and that this cable must have at least 50 wires. If the reliability of a single wire is 0.2, how many wires should the cable have to guarantee the reliability of the system to be at least 0.97?


6. Moments and the Probability Generating Function


Let X: Ω → {x_1, x_2, ...} be a r.v. with distribution P{X = x_k} = p_k. We formally denote the value

μ = EX = ∑_k x_k p_k   (6.1)

and call it the expected value of the r.v. X. We also call μ the mean or the first moment of the r.v. X. If X is Poisson with parameter λ, we can easily show that the parameter λ is also the mean of X. Indeed,

EX = ∑_{k≥0} k e^{−λ} λ^k/k! = λ e^{−λ} ∑_{k≥1} λ^{k−1}/(k−1)! = λ e^{−λ} e^{λ} = λ.   (6.2)

For the binomial and geometric r.v.'s, calculation of their means is somewhat more cumbersome.
We therefore introduce a transform that works for nonnegative integer-valued r.v.'s.

Before that, let h be a real-valued function such that the composition h(X) is also a r.v., valued in {h(x_1), h(x_2), ...}. We assume that P{h(X) = h(x_k)} = p_k, i.e. the values of h(X) carry the same probabilities as the corresponding values of X. We can regard h(X) as a new r.v. with the distribution {p_1, p_2, ...}. Thus the mean of h(X) is

E h(X) = ∑_k h(x_k) p_k.   (6.3)

We can interpret h(X) as, say, the capital gain corresponding to the values of X: for instance, X denotes the contents of some inventory and h(X) is its US dollar value.

Suppose X: Ω → {0, 1, ...} (i.e. x_k = k) is distributed as {p_0, p_1, ...} and let h(x) := z^x. Then the expectation of this function of X is the power series

E h(X) = g(z) := E z^X = ∑_{k=0}^∞ z^k p_k,   (6.4)

called the probability generating function (pgf). The pgf g(z) is absolutely convergent on the boundary of the interval (−1, 1) (or, more exactly, on the boundary of the unit disk B(0, 1) in the complex plane). Indeed, for |z| = 1,

E|z^X| ≤ ∑_{k=0}^∞ p_k = 1.   (6.5)

Therefore, g(z) is analytic everywhere for |z| < 1 and continuous for |z| = 1. In particular, g(z) can be expanded in a Taylor series at zero,


[T g, 0](z) = ∑_{k=0}^∞ g^{(k)}(0) z^k/k!.   (6.6)

From the comparison of the series (6.4) and (6.6) and the uniqueness of the power series representation, we conclude that

p_k = g^{(k)}(0)/k!,   (6.7)

which can serve as the inverse transform of g(z) of (6.4).

Remark 6.1 (The existence of expectation). Since in many circumstances the expectation is a series, it may sound as if the expectation exists if and only if the series EX converges. Despite the common sense of mathematical analysis, this is not true. We know that if a series converges absolutely, it converges in the usual sense; not so for the expectation, which comes from abstract analysis and integration. We agree from now on that the expected value μ = EX exists if and only if the series E|X| converges. In this case, μ is the value of the expectation of X.

As an example, consider the r.v. X: Ω → {(−1)^n n, n = 1, 2, ...} with the distribution P{X = (−1)^n n} = 6/(π²n²). It can easily be shown that

∑_{n=1}^∞ 6/(π²n²) = 1.   (6.8)

Then the expectation of X is formally

EX = ∑_{n=1}^∞ (−1)^n 6/(π²n),   (6.9)

which converges as a Leibniz series (being sign-alternating, with terms 6/(π²n) → 0 monotonically). Nevertheless, EX does not exist, since the series obviously does not converge absolutely: the absolute values form a divergent harmonic series. □

Below are some useful applications of pgf's. If we formally proceed with differentiating the series,

(d/dz) g(z) = (d/dz) E z^X = ∑_{k=1}^∞ k z^{k−1} p_k,   (6.10)

and assume that the limit as z → 1− exists (this takes place if and only if the expectation EX exists), then we have

lim_{z→1−} (d/dz) g(z) = g′(1) = μ = EX.   (6.11)

This is a more convenient way of obtaining the mean in many particular cases.


Example 6.1 (Poisson r.v.). We easily find

g(z) = ∑_{k=0}^∞ z^k e^{−λ} λ^k/k! = e^{−λ} ∑_{k=0}^∞ (λz)^k/k! = e^{−λ} e^{λz} = e^{λ(z−1)}.   (6.12)

Then,

μ = EX = λ e^{λ(z−1)} |_{z=1} = λ.   (6.13)
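As a quick numerical sanity check (a sketch; the truncation at k = 200, where the Poisson tail is negligible, is an arbitrary choice):

k <- 0:200
sum(k * dpois(k, 5))   # approximately 5: the mean of a Poisson r.v. equals its parameter lambda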


Example 6.2 (Binomial r.v.). For X: Ω → {0, ..., n}:

g(z) = ∑_{k=0}^{n} C(n, k) p^k q^{n−k} z^k = ∑_{k=0}^{n} C(n, k) (pz)^k q^{n−k}   (6.14)

(using the binomial formula (a + b)^n = ∑_{k=0}^{n} C(n, k) a^k b^{n−k})

= (pz + q)^n.   (6.15)

Again, to find the mean of X we differentiate g(z), arriving at

g′(1) = EX = n(pz + q)^{n−1} p |_{z=1} = np.   (6.16)
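A similar check of the binomial mean (with hypothetical parameters n = 10, p = 0.3):

sum(0:10 * dbinom(0:10, 10, 0.3))   # 3 = np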


Example 6.3 (Poisson r.v. revisited). Suppose X is a binomial r.v. with the pgf g(n, p; z) = (pz + q)^n. Assume that n → ∞ and p → 0 so that

np → λ > 0.

Also assume that z ≠ 1. Consider n very large and p very small, so that np ≈ λ, i.e. p ≈ λ/n. For simplicity, we assume that np = λ exactly for very large n and very small p. Then

g(n, p; z) = ((λ/n)z + 1 − λ/n)^n = [1 + (λ/n)(z − 1)]^n

= {[1 + (λ/n)(z − 1)]^{n/(λ(z−1))}}^{λ(z−1)} → e^{λ(z−1)}.

This is due to the following. Setting x = n/(λ(z − 1)), we observe that x → ∞ if and only if n → ∞, provided that z ≠ 1. Then we recognize that

(1 + 1/x)^x → e, as x → ∞,

a known result from calculus. The rest is due to the continuity of the exponential function. Hence we have proved that

g(n, p; z) → e^{λ(z−1)}.   (6.17)

For z = 1, g(n, p; 1) = 1, and this is a trivial result. Now the Taylor series expansion of e^{λ(z−1)} is

e^{λ(z−1)} = e^{−λ} e^{λz} = e^{−λ} ∑_{k≥0} (λz)^k/k! = ∑_{k≥0} (e^{−λ} λ^k/k!) z^k,

concluding that P{X = k} = e^{−λ} λ^k/k!, k = 0, 1, ... □


PROBLEMS

6.1. Find the pgf's of type I and type II geometric r.v.'s and then, using procedures like those in Examples 6.1 and 6.2, find the means in both cases.

6.2. Let X be a Bernoulli-type r.v. specified by P{X = 1} = p and P{X = −1} = 1 − p. Give the set of all values z ≠ 1 that solve the equation E z^X = 1.

6.3*. Let X be a binomial r.v. with parameters (n, p) (0 < p < 1). Evaluate E[1/(X + 1)]. Hint: You can use the probability generating function g(z) of X and integrate it from zero to 1.

6.4. Let X: {ω_1, ..., ω_n} → {1, ..., n} be a discrete uniform r.v., i.e. such that P{X = k} = 1/n for k = 1, ..., n. Find the pgf, the mean, and the variance of X. Hint: You can use the formula

∑_{k=1}^{n} k² = (1/6) n(n + 1)(2n + 1).


7. The Calculation of the Expectation and Variance


Property 1. Under an affine transformation, the expectation satisfies

E(aX + b) = a EX + b.   (7.1)

Indeed, using h(x) = ax + b, we have

E(aX + b) = ∑_k (a x_k + b) p_k = a ∑_k x_k p_k + b ∑_k p_k = a EX + b · 1.

In particular, for a = 0 we have that the expected value of a constant is the constant itself, i.e. Eb = b.

A more general property than Property 1 reads:

Property 2. For two functions g and h,

E[g(X) + h(X)] = E g(X) + E h(X).

Indeed,

E[g(X) + h(X)] = ∑_k [g(x_k) + h(x_k)] p_k = ∑_k g(x_k) p_k + ∑_k h(x_k) p_k

(provided each series converges). □

Another useful measure of the volatility of a r.v. is its variance:

Var X = σ² := E(X − μ)².

It shows the dispersion of X about its mean.

Applying Property 2, we have a useful computational formula for the variance:

σ² = E[X² − 2μX + μ²] = EX² − 2μ EX + μ² = μ_2 − μ²,   (7.2)

where μ_2 = ∑_k x_k² p_k is the second moment of X.

Observe that Property 2 easily yields Property 1 with g(X) = aX and h(X) = b.
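As an illustration of (7.2), here is a numerical check for a Poisson(5) r.v. (a sketch; the series is truncated at k = 200, where the tail is negligible):

k <- 0:200; p <- dpois(k, 5)
mu  <- sum(k * p)      # first moment
mu2 <- sum(k^2 * p)    # second moment
mu2 - mu^2             # approximately 5, the variance of Poisson(5)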


PROBLEMS

7.1. Show that Var(aX + b) = a² Var X.

Solution. From (7.2) we have

Var(aX + b) = E[(aX + b)²] − (E[aX + b])²

= E[a²X² + 2abX + b²] − (aμ + b)²   (using Property 2)

= a²μ_2 + 2abμ + b² − a²μ² − 2abμ − b²

= a²μ_2 − a²μ² = a² Var X. □

7.2. Let X be a r.v. with mean μ and variance σ². Find the expected value and variance of the random variable Y = (X − μ)/σ.


8. The Moment Generating Function


Even more useful than the pgf is the moment generating function (mgf), formally defined as

m(θ) = E e^{θX}.   (8.1)

If the pgf of a r.v. is known and equals g(z), then, as is easily seen,

m(θ) = g(e^θ).   (8.2)

The term "moment generating function" (mgf) stems from the following expansion:

e^{θX} = ∑_{n=0}^∞ θ^n X^n/n!,   (8.3)

which is the Taylor series expansion of the exponential function at zero. Formally applying the expectation to both sides of the last equation and using Fubini's Theorem (allowing us to interchange two series, two integrals, or a combination, in a variety of special cases), we have

m(θ) = ∑_{n=0}^∞ (θ^n/n!) EX^n,   (8.4)

where EX^n =: μ_n is the nth moment of the r.v. X (assuming it exists). In the event all moments μ_n of X exist, the mgf of X also exists and, according to (8.4), m(θ) can be expanded in a Taylor series at zero:

[T m, 0](θ) = ∑_{n=0}^∞ m^{(n)}(0) θ^n/n!.   (8.5)

Comparing (8.4) and (8.5) and using the uniqueness of the power series representation, we conclude that

μ_n = m^{(n)}(0), n = 0, 1, 2, ...   (8.6)

Notice that

μ_0 = m^{(0)}(0) = m(0) = 1.   (8.7)

Example 8.1 (Binomial r.v.). Recall that g(z) = (pz + q)^n. Thus,

m_n(θ) = (pe^θ + q)^n.   (8.8)

Taking the first derivative of m_n(θ), we arrive at

(d/dθ) m_n(θ) = np e^θ m_{n−1}(θ).   (8.9)

The second derivative yields

(d²/dθ²) m_n(θ) = np [e^θ m_{n−1}(θ) + e^θ m′_{n−1}(θ)].   (8.10)

Therefore,

μ = μ_1 = m′_n(0) = np

and

μ_2 = m″_n(0) = np[1 + (n − 1)p] = npq + n²p².   (8.11)

The variance of X, Var X = σ², will be (μ_2 − μ²)

σ² = npq.   (8.12)
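A numerical check of (8.12) (with hypothetical parameters n = 10, p = 0.3, so npq = 2.1):

k <- 0:10; p <- dbinom(k, 10, 0.3)
sum(k^2 * p) - sum(k * p)^2   # 2.1 = npq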


PROBLEMS

8.1. Using the mgf technique, find the mean and variance of a Poisson r.v. with parameter λ.

8.2. Using the mgf technique, find the mean and variance of a type II geometric r.v.

8.3. Using the mgf technique, find the mean and variance of a type I geometric r.v.

8.4. If X is a binomial r.v. with expected value 6 and variance 2.4, find P{X = 2}.

8.5. Find the mean and variance of a r.v. X whose mgf is

m(θ) = (1/4)(3e^θ + e^{−θ}), −∞ < θ < ∞.

8.6. Suppose that X is a r.v. whose mgf is

m(θ) = (1/5)e^{θ} + (2/5)e^{4θ} + (2/5)e^{8θ}, −∞ < θ < ∞.

Find the values (range) of X and its probability distribution.


CHAPTER II. CONTINUOUS DISTRIBUTIONS

1. The Probability Distribution Function (PDF)


To form a bridge between discrete and continuous r.v.'s (as well as for other applications), we introduce an interpolation of the probability distribution {p_1, p_2, ...} of a r.v. X: Ω → E := {x_1, x_2, ...}. For simplicity, we assume that the set E has a minimal element, that it is x_1, and that E is ordered as x_1 < x_2 < ...

Define F(x) := P{X ≤ x} and plot this function in order to understand how it works. Obviously, if x < x_1, then F(x) = 0. For x = x_1 we have F(x_1) = P{X < x_1} + P{X = x_1} = 0 + p_1. Hence, F(x) equals zero for all x < x_1 and at x = x_1 it jumps to p_1.

Now, let x_1 ≤ x < x_2. Then

F(x) = P{X ≤ x_1} + P{x_1 < X ≤ x} = p_1 + 0.   (1.1)

Therefore F(x), after it increases to p_1, continues to equal p_1 for all x < x_2. When x = x_2,

F(x_2) = P{X < x_2} + P{X = x_2} = p_1 + p_2,   (1.2)

whereby F(x) jumps to its second level p_1 + p_2, keeping on its trend as a step function. Continuing with the same method, we see that between x_i and x_{i+1}, F(x) is constant, while at x_i it picks up p_i and adds it to the already accumulated sum p_1 + ... + p_{i−1}. Consequently, F is a piecewise constant step function with jumps at x_1, x_2, ..., of respective magnitudes p_1, p_2, ... (as in Figure 1.1).

Figure 1.1


The function F is referred to as the probability distribution function (PDF). Notice that if the state space E of X is finite, say |E| = n, then p_1 + ... + p_n = 1. If E is countably infinite, then for large x, F(x) asymptotically approaches the line y = 1. Formally,

lim_{x→∞} F(x) = 1,   (1.3)

which is easy to justify because, as x grows, the set {X ≤ x} approaches Ω.

Notice from the definition of F and its plot that F is right-continuous; that is, whenever the function has a discontinuity at x_i, the right limit at x_i is assigned as the value of F.

We can use F to give a different formula for the mean

EX = ∑_k x_k p_k,   (1.4)

if we first replace p_k with ΔF(x_k) := F(x_k) − F(x_k −), where F(x_k −) is the left limit of F at x_k (see Figure 1.3). Now, considering that ΔF(x) = 0 everywhere on ℝ except at the x_k's, we can rewrite (1.4) in the form

EX = ∑_x x ΔF(x)   (1.5)

or rather

EX = ∑_{x=−∞}^{∞} x ΔF(x),   (1.6)

since the sum formally deals with at most countably many terms. Perhaps even more consistent would be to replace ΔF(x) with dF(x): since the function F is piecewise constant, its differential is zero everywhere except at the points x_k, where dF(x_k) does not exist. Without much formality, we can let dF(x_k) equal ΔF(x_k) = p_k, thereby arriving at the (Stieltjes) integral

EX = ∫_{−∞}^{∞} x dF(x),   (1.7)

which is nothing but a sum or series.

Now, if in the plot of F the points x_1, x_2, ... (at which X is concentrated) become more and more dense in ℝ, the PDF F becomes strictly monotone increasing (Figure 1.2):


Figure 1.2

Here the function F looks smooth, and if so, the differential of F exists (almost) everywhere; in this case dF(x) = F′(x) dx, where F′(x) is strictly positive; it is denoted by f(x) and called the pdf (probability density function). Expression (1.7) can then be formally rewritten as

EX = ∫_{−∞}^{∞} x f(x) dx.   (1.7′)

Furthermore, representing F(x) as

F(x) = ∑_{j: x_j ≤ x} p_j

(see Figure 1.3), or using the same principles as in (1.5), we have

F(x) = ∑_{y≤x} ΔF(y) = ∑_{y≤x} dF(y) = ∫_{y≤x} dF(y) = ∫_{y=−∞}^{x} dF(y),   (1.8)

as the Stieltjes integral representation of F(x).

Figure 1.3


Assuming F is strictly monotone increasing and smooth as in (1.7′), we have

F(x) = ∫_{y=−∞}^{x} f(y) dy.   (1.9)

Example 1.1. Find P{X ∈ A}, where

a) A = (a, b];  b) A = [a, b);  c) A = (a, b);  d) A = {a}.

Solution.

a) P{X ∈ (a, b]} = P{a < X ≤ b} = P{X ∈ (−∞, b] − (−∞, a]} = P{X ≤ b} − P{X ≤ a}

(recall: A ⊆ B ⟹ P(B − A) = P(B) − P(A))

= F(b) − F(a).

b) P{X ∈ [a, b)} = P{a ≤ X < b} = P{X ∈ (−∞, b) − (−∞, a)} = P{X < b} − P{X < a} = F(b−) − F(a−).

c) P{X ∈ (a, b)} = F(b−) − F(a).

d) P{X = a} = P{X ≤ a} − P{X < a} = F(a) − F(a−)

(since {a} = (−∞, a] − (−∞, a)). □

Remark 1.1. One can readily generalize the expectation to a function h of the r.v. X as

E h(X) = ∫_{−∞}^{∞} h(x) dF(x),   (1.10)

or, in the event F has a density f,

E h(X) = ∫_{−∞}^{∞} h(x) f(x) dx.   (1.11)

In particular, if h(x) = 1_A(x) = {1 for x ∈ A; 0 for x ∈ A^c} is the indicator function of a set A,

E 1_A(X) = ∫_{−∞}^{∞} 1_A(x) dF(x) = ∫_A dF(x) = P{X ∈ A}.   (1.12)


When F has a density f, which takes place when the r.v. X is valued in a "continuum" set (as opposed to a discrete set), the r.v. X is called continuous. The continuity of X has nothing to do with the conventional notion of a continuous function in analysis.


PROBLEMS

1.1. Let X = c almost surely (i.e., with probability 1, X is a constant and equals c). Plot the PDF of X.

1.2. Let X be the number of heads in a single toss of a biased coin, assuming that it shows Heads with probability p. Plot the PDF of X.

1.3. Under the condition of Problem 1.2, let Y give the number of Heads in two tosses of this coin. Plot the PDF of Y.

1.4. An investment firm offers its customers treasury bills that mature after varying numbers of years. Given that the PDF (probability distribution function) of T (the number of years to maturity for a randomly selected bond) is

F(t) = 0 for t < 1;  0.2 for 1 ≤ t < 2;  0.5 for 2 ≤ t < 4;  0.7 for 4 ≤ t < 8;  1 for t ≥ 8,

find (a) P{T = 4}, (b) P{T = 5}, (c) P{T > 2}, (d) P{1 ≤ T < 4}. Also sketch F.


2. Continuous Random Variables


A r.v. X: Ω → ℝ is continuous if X(Ω) (the range of X) is a continuum set, like an interval, a semiaxis (e.g. ℝ₊ = [0, ∞)), or ℝ itself. In short, it is not possible to describe the probability distribution of X as in the discrete case by P{X = x}, because this is zero at any point of the range of X. One instead operates with intervals like

P{X ∈ (−∞, x]} = P{X ≤ x} =: F(x),   (2.1)

a function of the variable x. F(x) is the PDF of the r.v. X introduced in Section 1. The PDF of X is almost everywhere smooth and thus has a pdf (probability density function) related to F by f(x) = (d/dx) F(x) ≥ 0 (nonnegative, since F(x) is everywhere monotone nondecreasing). Conversely (cf. (1.9)),

F(x) = ∫_{y=−∞}^{x} f(y) dy.   (2.2)

Also, lim_{x→∞} F(x) = 1 and lim_{x→−∞} F(x) = 0.   (2.3)

The Uniform R.V. A r.v. U: Ω → (0, 1) (range) is called standard uniform if its pdf is

f(x) = {1 for x ∈ (0, 1); 0 for x ∈ (0, 1)^c} =: 1_{(0,1)}(x) (in notation).   (2.4)

In general,

1_A(x) = {1 for x ∈ A; 0 for x ∈ A^c}   (2.5)

is called the indicator function of the set A (A being a parameter). There are a number of very interesting properties of indicator functions, making them more special than a mere shorthand.

The chief property of a continuous uniform r.v. is that P{U ∈ A} is constant wherever the set A is picked from (0, 1), as long as the measure |A| of A (its length, if A is an interval) is constant. This is easy to prove by integration of f(x) over different intervals of identical lengths.

More generally, a r.v. X is uniform on an interval (a, b) if its pdf is

f(x) = {1/(b − a) for x ∈ (a, b); 0 for x ∈ (a, b)^c} = (1/(b − a)) 1_{(a,b)}(x).   (2.6)

Integration of f yields the PDF F of such a uniform r.v., as depicted in Figure 2.1 below:


Figure 2.1

Integration of (2.6) yields

F(x) = 0 for x ≤ a;  (x − a)/(b − a) for a < x < b;  1 for x ≥ b.   (2.7)

Simulation. Let U be a standard uniform r.v. and let X be an arbitrarily distributed continuous r.v. with PDF F_X. We show how to simulate the r.v. X using a simulation of U (which is supposed to be a simple procedure). If we obtain U by simulation, it takes a value between 0 and 1. We prove that F_X^{−1}(U) ∈ [X], i.e. it has the same distribution as X. Indeed,

P{F_X^{−1}(U) ≤ x} = P{F_X(F_X^{−1}(U)) ≤ F_X(x)}

(which holds true because F_X is monotone increasing)

= P{U ≤ F_X(x)} = F_X(x),

since P{U ≤ y} = y for all y ∈ (0, 1) (see Figure 2.1, for a = 0 and b = 1). This can be utilized for the simulation of any r.v. X whose inverse PDF can easily be found (cf. the exponential r.v. in Example 2.4). □
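For an exponential r.v. with parameter λ, F(x) = 1 − e^{−λx}, so F^{−1}(u) = −ln(1 − u)/λ, and the method can be sketched in R as follows (λ = 0.5 and the sample size are arbitrary choices):

set.seed(1)              # for reproducibility
u <- runif(10000)        # standard uniform sample
x <- -log(1 - u) / 0.5   # exponential sample with lambda = 0.5
mean(x)                  # approximately 2 = 1/lambda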

Exponential R.V. A r.v. X: Ω → ℝ₊ is called exponential with parameter λ (> 0) if its pdf is

f(x) = λ e^{−λx} 1_{[0,∞)}(x).   (2.14)

Below is the plot of an exponential density function with parameter λ = 0.5.

x<-seq(-4,12,0.01)
y<-dexp(x,0.5)
plot(x,y,type="l",col="blue")


[Plot: the exponential(0.5) density, zero for x < 0 and decaying from 0.5 at x = 0.]

The density is zero on the negative real axis; for positive values of x, f is a negative exponential: monotone decreasing, concave up, and asymptotically approaching the horizontal axis. The integral of f gives

F(x) = (1 − e^{−λx}) 1_{[0,∞)}(x).   (2.15)

F is monotone increasing for nonnegative x, concave down, and asymptotically approaches the line y = 1. Note that

P{X > x} = e^{−λx}, x ≥ 0.   (2.16)

Using (2.16) we prove one interesting property of exponential r.v.'s. Suppose we began to observe an exponentially distributed r.v. X at some point in time when X has already been running for a while; for instance, a telephone conversation. What is the probability that X will continue some time longer, i.e. what is P{X > t + s | X > s}? We have, by the conditional probability formula,

P{X > t + s | X > s} = P({X > t + s} ∩ {X > s}) / P{X > s}

(for brevity we rewrite the intersection in the numerator with a comma)

= P{X > t + s, X > s} / P{X > s}.

Obviously, {X > t + s} is a subset of {X > s}, and as such the intersection of the two gives the smaller set {X > t + s}, yielding

P{X > t + s | X > s} = P{X > t + s} / P{X > s}   (2.17)


and, by (2.16),

= e^{−λ(t+s)} / e^{−λs} = e^{−λt} = P{X > t}.   (2.18)

The latter means that the residual lifetime (if this is a time process) does not depend on s (i.e. on how long the process has lasted) and, furthermore, it is as if the process were observed from its beginning. This property is called the memoryless property of the exponential r.v. One can show that the exponential r.v. is the only representative of the class of continuous r.v.'s with the memoryless property.

Example 2.1. Insurance companies collect accident records (called histories) of drivers. A driver is considered to be in a "stable" category if his/her probability of having an accident remains approximately constant, independent of time. One interpretation of this feature is that if X is the time up to the first accident of the driver in question, then

P{X > t + s | X > s} = P{X > t}.

For such a safe driver, the insurance company does not bother estimating the risk of the driver but rather sets the amount of the premium based on the driver's record.

Now, from Problem 2.1, we know that the only continuous r.v. with the above memoryless property is the exponential one. □

Arbitrary Continuous Random Variables. Let X: Ω → ℝ be a continuous r.v. with pdf f. As in the case of a discrete r.v., the expectation of X is defined as μ = EX = ∫_{x=−∞}^{∞} x f(x) dx, and it exists if and only if the integral E|X| = ∫_{x=−∞}^{∞} |x| f(x) dx converges.

The variance of X is defined as E[(X − μ)²] and is determined by the formula

Var X = μ_2 − μ².   (2.19)

The same properties of the expectation and variance w.r.t. an affine transformation apply to continuous r.v.'s, namely:

E[aX + b] = a EX + b,   (2.20)

Var(aX + b) = a² Var X,   (2.21)

which can be directly proved by calculation of the associated integrals.


PROBLEMS

2.1. Show that if a continuous r.v. X has the memoryless property, i.e. P{X > t + s | X > s} = P{X > t} for all nonnegative s and t, then X is exponential. [Hint: From (2.17) and (2.18),

g(t + s) = P{X > t + s} = g(t) g(s).

Thus,

g(t + s) − g(t) = −g(t)[1 − g(s)].

Then divide the latter by s and let s run to zero.]

2.2. Let X be a r.v. with the pdf f(x) = c√x · 1_{(0,1)}(x). Find the constant value of c. Find the PDF F(x) and then, using F, find the probabilities P{−1 < X ≤ 1 1/2} and P{1/2 < X ≤ 2}.

Solution. First draw the graph of the density f.

Figure 2.1

To find c, we use the equation

1 = ∫_{x=−∞}^{∞} f(x) dx = ∫_{x=0}^{1} c√x dx = c (2/3) x^{3/2} |_{x=0}^{1} = c (2/3),

yielding c = 3/2. Furthermore, by integrating we have

F(x) = 0 for x ≤ 0;  x^{3/2} for 0 < x < 1;  1 for x ≥ 1.   (2.22)

Finally, P{−1 < X ≤ 1 1/2} = F(1 1/2) − F(−1) = 1 − 0 = 1, and
P{1/2 < X ≤ 2} = 1 − (1/2)^{3/2} ≈ 1 − 0.35 = 0.65. □

2.3. Let X be a r.v. with the pdf f(x) = c√x · 1_{(0,2)}(x).

(a) Find the constant value of c.

(b) Find the PDF (probability distribution function) F.

(c) Using F, find the probabilities P{−1 1/2 < X ≤ 2 1/2} and P{−1/2 < X ≤ 1}.

2.4. Find the mean and variance of a uniform r.v. on an interval (a, b).

Solution. A r.v. X is uniform on an interval (a, b) if its pdf is

f(x) = {1/(b − a) for x ∈ (a, b); 0 for x ∈ (a, b)^c} = (1/(b − a)) 1_{(a,b)}(x).

The first moment (expectation) of X is

EX = ∫_{x=−∞}^{∞} x f(x) dx = ∫_{x=a}^{b} x/(b − a) dx = (1/(b − a)) (1/2) x² |_{x=a}^{b}

= (b² − a²)/(2(b − a)) = (a + b)/2 = μ.

The second moment is

EX² = ∫_{x=−∞}^{∞} x² f(x) dx = ∫_{x=a}^{b} x²/(b − a) dx = (1/(b − a)) (1/3) x³ |_{x=a}^{b}

= (b³ − a³)/(3(b − a)) = (a² + ab + b²)/3.

Therefore,

Var X = μ_2 − μ² = (a² + ab + b²)/3 − (a + b)²/4 = [4(a² + ab + b²) − 3(a + b)²]/12

= (a² − 2ab + b²)/12 = (b − a)²/12. □
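A Monte Carlo check of these formulas (with the hypothetical interval a = 2, b = 5):

set.seed(1)
x <- runif(100000, 2, 5)
mean(x)   # approximately 3.5 = (a + b)/2
var(x)    # approximately 0.75 = (b - a)^2/12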


3. The Exponential Random Variable Revisited. The Gamma R.V.

The Moment Generating Function of the Exponential R.V. Let X be an exponential r.v. with parameter λ. The mgf of X can be calculated as

m(θ) = E e^{θX} = ∫_{x=0}^{∞} e^{θx} λ e^{−λx} dx = λ/(λ − θ), Re(θ) < λ.   (3.1)

The nth derivative of m(θ) is

m^{(n)}(θ) = n! λ/(λ − θ)^{n+1}, n = 0, 1, 2, ...,   (3.2)

which can be proved by induction. From (3.2) we thus have

μ_n = n!/λ^n, n = 0, 1, 2, ...   (3.3)

In particular,

μ = 1/λ and μ_2 = 2/λ² ⟹ Var X = σ² = 1/λ² ⟹ σ = μ.   (3.4)

The Gamma R.V. A r.v. X is called gamma if its pdf is

f(x; α, β) = β e^{−βx} (βx)^{α−1} / Γ(α) · 1_{ℝ₊}(x), α > 0, β > 0,   (3.6)

where

Γ(α) = ∫_0^∞ e^{−x} x^{α−1} dx,   (3.7)

referred to as the gamma function. (See Figure 3.1, where Γ(α) is depicted for α > 0.)


Figure 3.1

From (3.7) it can easily be shown that Γ(α) = (α − 1)Γ(α − 1). If α is a positive integer, then, with Γ(1) = 1, we get

Γ(α) = (α − 1)!, α = 1, 2, ...   (3.8)

In this case the gamma pdf turns into

f(x; α, β) = β e^{−βx} (βx)^{α−1}/(α − 1)! · 1_{ℝ₊}(x), α = 1, 2, ..., β > 0.   (3.9)

This is the pdf of the so-called Erlang r.v. Furthermore, if α = 1, (3.9) reduces to the exponential density.

Using R code we can plot the gamma density function with parameters α = 0.8, β = 0.5:

x<-seq(0.01,4,0.01)
y<-dgamma(x,0.8,0.5)
plot(x,y,type="l",col="blue")


[Plot: the gamma(α = 0.8, β = 0.5) density on (0, 4), decreasing from a vertical asymptote at x = 0.]

Here is another gamma density, with parameters α = β = 2. Since α = 2 is an integer, it is Erlang.

x<-seq(0.01,4,0.01)
y<-dgamma(x,2,2)
plot(x,y,type="l",col="blue")
[Plot: the Erlang(α = 2, β = 2) density on (0, 4), rising from 0 to its mode near x = 0.5 and decaying.]

It is easily seen that the gamma pdf (3.6) looks almost identical to the integrand of the gamma function (3.7). Consequently,

∫_{x=0}^{∞} e^{−βx} x^{α−1} dx = (1/β^α) ∫_{x=0}^{∞} e^{−βx} (βx)^{α−1} d(βx) = Γ(α)/β^α,   (3.10)

and thus the integral of f in (3.6) equals one, which proves that f is indeed a pdf. The MGF of the gamma r.v. is

M(θ) = ∫_{x=0}^{∞} β e^{−βx} (βx)^{α−1}/Γ(α) · e^{θx} dx = (β^α/Γ(α)) ∫_{x=0}^{∞} e^{−(β−θ)x} x^{α−1} dx;

using (3.10) with β replaced by β − θ, we get

M(θ) = (β^α/Γ(α)) · Γ(α)/(β − θ)^α.

Thus,

M(θ) = (β/(β − θ))^α, Re(θ) < β.   (3.11)
Denote m(θ) = β/(β − θ). Then M(θ) = m^α(θ). It can easily be shown that

M′(θ) = (α/β) m^{α+1}(θ)   (3.12)

and also that

M″(θ) = (α(α + 1)/β²) m^{α+2}(θ).   (3.13)

In general,

M^{(k)}(θ) = (α(α + 1)⋯(α + k − 1)/β^k) m^{α+k}(θ),   (3.14)

which can be proved by induction. From (3.14) we arrive at

μ_k = M^{(k)}(0) = α(α + 1)⋯(α + k − 1)/β^k,   (3.15)

and therefore

EX = α/β   (3.16)

and

Var X = α/β².   (3.17)

In particular, for α = 1 the numerator of (3.15) reduces to k!, and thus (3.15) reduces to formula (3.3) for the kth moment of the exponential r.v.
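A simulation check of (3.16)-(3.17) (with hypothetical parameters α = 2, β = 2; note that R's rgamma takes the shape α and the rate β):

set.seed(1)
x <- rgamma(100000, shape = 2, rate = 2)
mean(x)   # approximately 1 = alpha/beta
var(x)    # approximately 0.5 = alpha/beta^2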


PROBLEMS

3.1. For an exponential r.v. with parameter λ, find

P{|X − μ| < nσ}, n = 1, 2, 3.   (3.5)

3.2. If X is an exponential r.v. with parameter λ, identify the r.v. cX, where c is a positive real constant.

3.3. Prove formulas (3.12) and (3.13).

3.4. Prove formula (3.14).

3.5. Let X be a gamma r.v. with parameters α and β. Identify the r.v. cX, where c is a positive real constant.

3.6. Using R, plot the gamma density function with parameters α = β = 10.

3.7. Using R, plot the exponential density function with parameter λ = 2.


4. Gaussian (Normal) Random Variable


A r.v. X: Ω → ℝ is called Gaussian or normal with parameters μ and σ² if its pdf is given by

f(x) = f(x; μ, σ²) = (1/√(2πσ²)) e^{−(x−μ)²/(2σ²)}, x ∈ ℝ.   (4.1)

We will say that X ∈ [N(μ, σ²)], i.e. X belongs to the class of normal r.v.'s with parameters μ and σ². The parameter μ can be any real number, while the parameter σ² is strictly positive. We also denote σ := √(σ²) and assume σ > 0.

Graphically, f is the famous bell-shaped curve, symmetric about the line x = μ and having its maximum value at x = μ, approximately equal to 0.399/σ, where σ = √(σ²). (See Figure 4.1 below.)

Figure 4.1

The integral of f belongs to the class of so-called transcendental functions which, except for some particular values, can only be calculated numerically. We will discuss below how to reduce the burden of tabulating F(x) = ∫_{u=−∞}^{x} f(u) du.

A special case of X with μ = 0 and σ² = 1 is referred to as standard normal. Its pdf is denoted by

φ(x) = (1/√(2π)) e^{−x²/2}.   (4.2)

Using R we plot a normal density (here with μ = 2 and σ = 1.5, rather than the standard one):

curve(dnorm(x,2,1.5),from=-4,to=8)


[Plot: the density dnorm(x, 2, 1.5) on (−4, 8), a bell curve centered at 2 with maximum ≈ 0.27.]

It is common to denote any such standard normal r.v. by the letter Z, i.e. Z ∈ [N(0, 1)].

The following two plots are of Gaussian densities with parameters (μ = 10, σ² = 1) and (μ = 10, σ² = 4), produced by the source code

x<-seq(4,16,len=101)
y<-cbind(dnorm(x,10,1),dnorm(x,10,2))
matplot(x,y,type="l",ylab="f(x)")
text(7.5,.3,"X~N(10,1)")
text(14,.05,"X~N(10,4)")
[Plot: two bell curves centered at 10: X~N(10,1), taller with maximum ≈ 0.4, and X~N(10,4), flatter and wider.]


Notice that in R the normal commands such as dnorm(x, a, b) take a = μ and b = σ, not σ².

Consider the affine transformation

X = σZ + μ, with σ > 0.   (4.3)

Then,

F(x) = P{X ≤ x} = P{Z ≤ (x − μ)/σ} = Φ((x − μ)/σ),   (4.4)

where Φ denotes the PDF of Z. To find the pdf of X we differentiate F:

f(x) = (d/dx) F(x) = (1/σ) Φ′((x − μ)/σ) = (1/(σ√(2π))) e^{−((x−μ)/σ)²/2} = f(x; μ, σ²),   (4.5)

i.e. X ∈ [N(μ, σ²)] if X is the above affine transformation of Z. This shows how to tabulate Gaussian r.v.'s by reducing them to the standard normal.

For instance, suppose we need to find P{X ≤ x} for some X ∈ [N(μ, σ²)]. We use the inverse affine transformation of (4.3):

Z = (X − μ)/σ.   (4.6)

Thus,

P{X ≤ x} = P{Z ≤ (x − μ)/σ} = Φ((x − μ)/σ).   (4.7)

Tables of Φ can be found in all standard statistical software packages and in textbooks on probability and statistics, although the latter can be quite crude.
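For instance, if X ∈ [N(10, 4)] (so σ = 2), P{X ≤ 13} can be computed in R either directly or through the standardization (4.7):

pnorm(13, mean = 10, sd = 2)   # 0.9331928
pnorm((13 - 10) / 2)           # the same value via the standard normal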

The Moment Generating Function of the Gaussian R.V. If X ∈ [N(μ, σ²)], then we can represent X = σZ + μ and thus

M(θ) := E e^{θX} = E[e^{θσZ}] e^{θμ} = e^{θμ} m(θσ),   (4.8)

using the linearity of the expectation and denoting by m(θ) the MGF of the standard normal. From (4.8) it is thus sufficient to calculate m(θ), which we do below.

m(θ) = ∫_{x=−∞}^{∞} e^{θx} (1/√(2π)) e^{−x²/2} dx.   (4.9)

The exponent of the integrand is

−(1/2)(x² − 2θx),   (4.10)

which is short of a perfect square (x² − 2θx + θ²). After standard maneuvering with (4.10) we arrive at

−(1/2)[(x² − 2θx + θ²) − θ²] = −(1/2)(x − θ)² + (1/2)θ².   (4.11)

Substituting (4.11) into (4.9), we have

m(θ) = e^{θ²/2} ∫_{x=−∞}^{∞} (1/√(2π)) e^{−(x−θ)²/2} dx = e^{θ²/2} · 1,

since the second factor is the integral of a normal density with parameters μ = θ and σ² = 1. Thus we have

m(θ) = e^{θ²/2}.   (4.12)

Using (4.12) we have

M(θ) = e^{θμ} e^{(θσ)²/2} = exp(μθ + (1/2)σ²θ²).   (4.13)

The Mean and Variance of a Gaussian R.V. To find the mean and variance of the Gaussian r.v. we use the MGF technique. From (4.13),

M′(θ) = (μ + σ²θ) M(θ)   (4.14)

⟹ M′(0) = μ.   (4.15)

Furthermore,

M″(θ) = σ² M(θ) + (μ + σ²θ) M′(θ)

⟹ μ_2 = σ² + μ · μ

⟹ Var X = μ_2 − (EX)² = σ² + μ² − μ² = σ².   (4.16)

It turns out that the parameters μ and σ² are in fact the mean and variance of a Gaussian r.v. □

The Affine Transformation of a Gaussian R.V. Let X ∈ [N(μ, σ²)]. We wonder about the distribution of the r.v.

Y = aX + b,

where a > 0 and b ∈ ℝ. We identify the distribution of Y by using the MGF:

m_Y(θ) = E e^{θY} = E[e^{θaX}] e^{θb} = e^{bθ} m_X(aθ)

= e^{bθ} e^{μaθ + (1/2)σ²(aθ)²} = e^{(aμ+b)θ + (1/2)(aσ)²θ²}.

Therefore, Y ∈ [N(aμ + b, a²σ²)]. □


PROBLEMS

4.1. Let Y = aZ + b be an affine transformation of the standard Gaussian r.v., with a ≠ 0. Use a procedure similar to (4.4)-(4.5) to find the pdf of Y.

4.2. Let X ∈ [N(μ, σ²)]. Calculate P{|X − μ| < nσ} for n = 1, 2, 3.

4.3. If X ∈ [N(1, 4)], find the mgf of the r.v. Y = 3X + 2.

4.4. Under the condition of Problem 4.3, give the pgf of Y.

4.5. Let X be a r.v. with the MGF m(θ) = e^{θ²}, θ ∈ ℝ. What is the pdf of X?

Let X ∈ [N(μ, σ²)]. The r.v. Y = e^X is called lognormal with parameters μ and σ².

4.6. Find the pdf of Y.

4.7. Find the pdf of Z = 1/Y.

Solution. F_Z(x) = P{1/Y ≤ x} = P{e^{−X} ≤ x} = P{−X ≤ ln x} = P{X ≥ −ln x} = 1 − F_X(−ln x).

Differentiating w.r.t. x and using the chain rule gives

f_Z(x) = f_X(−ln x) · (1/x) = (1/(x√(2πσ²))) e^{−(ln x + μ)²/(2σ²)} · 1_{(0,∞)}(x).

4.8. Suppose it is known that a certain bridge can hold a maximum of 100 vehicles at a time. However, the total weight of all cars passing over the bridge can vary. It is known that the weight of a vehicle is a Gaussian random variable (r.v.) with mean μ = 4 and standard deviation σ = 0.4, measured in 1000-pound units. It can be shown that the total weight of 100 vehicles is a Gaussian r.v. with mean μ_100 = 400 and standard deviation σ_100 = 4. [The latter can be rigorously proved.] Civil engineers worry that 100 vehicles can exceed the threshold of 410 units and thus cause some structural damage to the bridge. What is the probability that this will ever take place?

4.9. Below are two Gaussian curves along with their R source codes. Looking at the curves, give the parameters of the respective Gaussian pdf's and interpret the areas below the curves.

x<-seq(-4,8,0.01)
y<-dnorm(x,2,1.5)
plot(x,y,type="l")
polygon(c(x[x>4],4),c(y[x>4],y[x==-4]),col="honeydew2")


[Plot: the N(2, 1.5²) density with the right-tail region x > 4 shaded.]

x<-seq(-4,8,0.01)
y<-dnorm(x,2,1.5)
plot(x,y,type="l")
polygon(c(x[x<0],0),c(y[x<0],y[x==-4]),col="honeydew2")

[Plot: the N(2, 1.5²) density with the left-tail region x < 0 shaded.]

4.10. Using R, plot the Gaussian density function with parameters μ = 2.5 and σ² = 5.

4.11. Under the condition of Problem 4.10, plot the integral area under the left half of the curve.


CHAPTER III. JOINTLY DISTRIBUTED


RANDOM VARIABLES
1. Discrete Random Vectors
We can have two or more r.v.'s involved in an experiment. In all cases they are defined on the same probability space (Ω, F, P). Suppose there are two r.v.'s X and Y defined on Ω and valued in E_1 and E_2, respectively. Thus we organize X and Y as a random vector

(X, Y): Ω → E_1 × E_2.

We assume that both E_1 and E_2 are discrete (at most countable) spaces. While in the one-dimensional case we sought P{X = x} for some x ∈ E_1, in the two-dimensional case it is natural to seek P{(X, Y) = (x, y)}, where (x, y) ∈ E_1 × E_2. Notice that (x, y) as a pair is literally the Cartesian product of {x} and {y}, and thus the latter can be rewritten in the form P{(X, Y) ∈ {x} × {y}} or in the equivalent form

P({X = x} ∩ {Y = y})

(in notation, P{X = x, Y = y}). The latter is less obvious; one can, however, easily see that the sets {(X, Y) = (x, y)} and {X = x} ∩ {Y = y} are identical. We can use the pick-a-point process to show it.

Example 1.1. If A′ and B′ are subsets of E_1 and E_2, respectively, we will operate with A = {X ∈ A′} and B = {Y ∈ B′} and measure the event A ∩ B, i.e. we will be interested in

P{X ∈ A′, Y ∈ B′} = P(A ∩ B),   (1.1)

where the comma separating the events abbreviates the intersection and braces. Suppose B_1′, B_2′, ... is a measurable partition of E_2. Then B_1, B_2, ..., being {Y ∈ B_1′}, {Y ∈ B_2′}, ..., form a measurable partition of Ω, as depicted in Figure 1.1. Therefore,

P{X ∈ A′} = P(A) = P(∪_k (A ∩ B_k)) = ∑_k P(A ∩ B_k) = ∑_k P{X ∈ A′, Y ∈ B_k′}.   (1.2)


Figure 1.1

So, P{X ∈ A′} is obtained from the joint distribution

P{X ∈ A′, Y ∈ B_k′}, k = 1, 2, ...   (1.3)

and is called the "marginal" distribution of the r.v. X. More generally, if A_1′, A_2′, ... is a measurable partition of E_1, then, proceeding with each A_i′ as with A′, we obtain the marginal distribution of the r.v. X:

P{X ∈ A_i′} = ∑_k P{X ∈ A_i′, Y ∈ B_k′}.   (1.4)

Likewise, we obtain the marginal distribution of the r.v. Y. □

Example 1.2. Suppose we flip a fair coin twice. Let X be the number of heads in the second flip and Y the total number of tails in the two flips. Thus we have

X: Ω → E_1 = {0, 1},   Y: Ω → E_2 = {0, 1, 2},

Ω = {(H_1, H_2), (H_1, T_2), (T_1, H_2), (T_1, T_2)}.

P{X = 0, Y = 0} = 0

P{X = 1, Y = 0} = P({(H_1, H_2), (T_1, H_2)} ∩ {(H_1, H_2)}) = 1/4

P{X = 0, Y = 1} = P({(H_1, T_2), (T_1, T_2)} ∩ {(H_1, T_2), (T_1, H_2)}) = 1/4

P{X = 1, Y = 1} = P({(H_1, H_2), (T_1, H_2)} ∩ {(H_1, T_2), (T_1, H_2)}) = 1/4

P{X = 0, Y = 2} = P({(H_1, T_2), (T_1, T_2)} ∩ {(T_1, T_2)}) = 1/4

P{X = 1, Y = 2} = P({(H_1, H_2), (T_1, H_2)} ∩ {(T_1, T_2)}) = 0.

The results are summarized in Table 1.2:

X\Y                      0      1      2     Marginal X: P{X = i}
0                        0     1/4    1/4          1/2
1                       1/4    1/4     0           1/2
Marginal Y: P{Y = j}    1/4    1/2    1/4

Table 1.2

The r.v.'s X and Y would be independent if

P{X = i, Y = j} = P{X = i} P{Y = j}

for all i and j, i.e. if the above probability of the intersection were the product of the marginal probabilities, consistent with the definition of independence, except that now we are talking about the probability of pairs of events. For example,

P{X = 1, Y = 1} = 1/4 = P{X = 1} P{Y = 1} = 1/2 · 1/2,

so these two events are independent. However,

P{X = 1, Y = 2} = 0 ≠ P{X = 1} P{Y = 2} = 1/2 · 1/4 = 1/8,

indicating that X and Y are not independent, since independence for some of the combinations does not hold. □
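Table 1.2 can be verified by brute-force enumeration of the four equally likely outcomes; below is a minimal sketch (the names f1, f2 for the two flips are ad hoc):

omega <- expand.grid(f1 = c("H","T"), f2 = c("H","T"))   # the sample space
X <- as.integer(omega$f2 == "H")             # heads on the second flip
Y <- (omega$f1 == "T") + (omega$f2 == "T")   # total number of tails
table(X, Y) / 4                              # reproduces Table 1.2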

For two discrete r.v.'s X and Y, with P{Y = j} ≠ 0, we can define the conditional distribution

P{X = i | Y = j} = P{X = i, Y = j} / P{Y = j},   (1.5)

in which we notice that the distribution in the denominator is the marginal of Y.

Recall that events A and B are independent if and only if

P(A ∩ B) = P(A)P(B).

Analogously:

Definition 1.1. R.v.'s X: Ω → {x_1, x_2, ...} and Y: Ω → {y_1, y_2, ...} are called independent if, for any x_i and y_j,

P{X = x_i, Y = y_j} = P{X = x_i} P{Y = y_j},   (1.6)

i.e. the joint distribution of X and Y is the product of their marginal distributions. Independence is similarly defined for n r.v.'s. □

Theorem 1.1. Suppose the r.v.'s X and Y are independent and let g and h be some real-valued functions. Then g(X) and h(Y) are also independent and

E[g(X)h(Y)] = E[g(X)] E[h(Y)].   (1.7)

Proof. We prove only the second part of the statement, namely the validity of (1.7). By the definition of independence (eq. (1.6)),

E[g(X)h(Y)] = ∑_{i} ∑_{j} g(x_i) h(y_j) P{X = x_i} P{Y = y_j}.

The statement follows after iterating the two series. □

Theorem 1.1 extends easily to any n-tuple of r.v.'s, and it finds a very important utility with transforms of sums of r.v.'s. Let X_1, ..., X_n be iid (independent and identically distributed) r.v.'s with common pgf

g(z) = E z^{X_1}.

Then, by Theorem 1.1,

E z^{X_1 + ... + X_n} = E[z^{X_1} ⋯ z^{X_n}] = g(z)^n.   (1.8)

Now, replacing z with e^θ, we have

E e^{θ(X_1 + ... + X_n)} = [m(θ)]^n,   (1.9)

where m(θ) is the common MGF of X_1, ..., X_n. □


2. The General Case. Continuous and Mixed Distributions


In the general case (without the special assumption that X and Y are discrete or continuous), let again

(X, Y): Ω → E_1 × E_2.

If both X and Y were discrete, then we would deal with

P{(X, Y) = (i, j)} = P{X = i, Y = j}.   (2.1)

In the general case, and specifically if both X and Y are continuous r.v.'s, we will be most interested in P{(X, Y) ∈ D}, in analogy to P{X ∈ A} for a single r.v.

We start, however, with D being a rectangle A × B. It can be shown that

P{(X, Y) ∈ A × B} = P({X ∈ A} ∩ {Y ∈ B}),

also in notation

= P{X ∈ A, Y ∈ B}.   (2.2)

In particular, if A = (−∞, x] and B = (−∞, y] for some x ∈ E_1 and y ∈ E_2, we have

P{X ≤ x, Y ≤ y},   (2.3)

which we reasonably denote by F(x, y) and call the joint PDF of X and Y.

Now let us assume that both X and Y are continuous and that E_1 × E_2 = ℝ × ℝ = ℝ².

The joint PDF of X and Y in this case has a resemblance to the univariate case, in the sense that there is a unique joint pdf f(x, y) ≥ 0 defined on ℝ² such that

F(x, y) = P{X ≤ x, Y ≤ y} = ∫_{u=−∞}^{x} ∫_{v=−∞}^{y} f(u, v) dv du,   (2.4)

i.e. the PDF of (X, Y) can be expressed as the integral of a unique nonnegative function f.

The three-dimensional plot of the probability density function f(x, y) = (1/(2π)) e^{−(1/2)(x²+y²)} is as follows.


[Plot: the bivariate standard normal density surface over −3 ≤ x, y ≤ 3, peaking at 1/(2π) ≈ 0.16 at the origin.]

The R source code for this plot is

f<-function(x,y) {
  z<-(1/(2*pi))*exp(-0.5*(x^2+y^2))
}
y<-x<-seq(-3,3,length=50)
z<-outer(x,y,f)
persp(x,y,z)
persp(x,y,z,theta=45,phi=30,expand=0.6,ltheta=120,shade=0.75,ticktype="detailed",xlab="X",ylab="Y",zlab="f(x,y)",col="honeydew2")

In a more general case, a set D to which (X, Y) may belong need not be a rectangle. In this case (2.4) is generalized as follows:

P{(X, Y) ∈ D} = ∬_D f(x, y) d(x, y).   (2.5)

The latter is the volume of the cylinder enclosed between the flat set D in the (X,Y)-plane and the surface f, surrounded by the lateral surface generated by a line orthogonal to D and running along its boundary ∂D. (See Figure 2.2 below.)

In many cases the boundary ∂D can be smoothly parametrized. Suppose that the projection of D on the Y-axis is an interval [a, b] and that ∂D = C_1 ∪ C_2, such that the projections of both C_1 and C_2 on the Y-axis are [a, b]. Suppose C_1 and C_2 are parametrized as φ_1 and φ_2, respectively. This is depicted in Figure 2.1 below.


Figure 2.1

Then the integration of f over the region D is rendered according to the following informal procedure. First, for y ∈ [a, b], we draw the y-section of the solid (the cylinder enclosed between f and D): the intersection of the solid with the plane perpendicular to the XY-plane and parallel to the XZ-plane. This plane crosses the boundary of D at φ_1(y) and φ_2(y). See Figure 2.2 below.

[Figure 2.2: the surface f(·, y) over D, showing the y-section and its projection.]

Hence the y-section looks like a curved trapezoid enclosed between the curve f(·, y) and the segment [φ_1(y), φ_2(y)]. The area of the y-section is obviously ∫_{x=φ_1(y)}^{φ_2(y)} f(x, y) dx. Assuming that the y-section changes negligibly from y to y + dy, the volume of the infinitesimal layer of the solid enclosed between y and y + dy is V_y = [∫_{x=φ_1(y)}^{φ_2(y)} f(x, y) dx] dy. To find the volume of the whole solid requires y to run from a to b and the summation of all such V_y's. Therefore, integral (2.5) equals

P{(X, Y) ∈ D} = ∬_D f(x, y) d(x, y) = ∫_{y=a}^{b} ∫_{x=φ_1(y)}^{φ_2(y)} f(x, y) dx dy.   (2.5a)


Furthermore, we can also encounter cases like P{g(X, Y) ∈ R}. This can be shown to equal

P{g(X, Y) ∈ R} = ∬_{{g(x,y) ∈ R}} f(x, y) d(x, y).   (2.6)

Example 2.1. Let (X, Y) be a random vector with the pdf

f(x, y) = e^{−(x+y)} 1_{ℝ₊²}(x, y) = {e^{−(x+y)} for (x, y) ∈ ℝ₊²; 0 otherwise}.   (2.7)

Using the R code

f<-function(x,y) {
  z<-exp(-(x+y))
}
y<-x<-seq(0,6,length=50)
z<-outer(x,y,f)
persp(x,y,z)
persp(x,y,z,theta=45,phi=30,expand=0.6,ltheta=120,shade=0.75,ticktype="detailed",xlab="X",ylab="Y",zlab="f(x,y)")

we plot the density function f:

[Plot: the surface f(x, y) = e^{−(x+y)} over 0 ≤ x, y ≤ 6, decaying from 1 at the origin.]

Now we need to find P{X/Y ≤ t}. In other words, we need to find the PDF of the r.v. X/Y. From (2.6)-(2.7),

P{X/Y ≤ t} = ∬_{{x/y ≤ t}} f(x, y) d(x, y).

The region over which we need to integrate is {x/y ≤ t} = {y ≥ (1/t)x}:

[Figure: the region above the line y = (1/t)x in the first quadrant.]

Thus,

P{X/Y ≤ t} = ∬_{{y ≥ (1/t)x} ∩ ℝ₊²} e^{−(x+y)} d(x, y) = ∫_{x=0}^{∞} e^{−x} ∫_{y=x/t}^{∞} e^{−y} dy dx

= ∫_{x=0}^{∞} e^{−x} e^{−x/t} dx = ∫_{x=0}^{∞} e^{−x(1+t)/t} dx = t/(1 + t).   (2.8)

Notice that since X and Y are both positive, their ratio is also positive, and thus the above probability is zero for all negative values of t. In conclusion,

P{X/Y ≤ t} = (t/(1 + t)) 1_{ℝ₊}(t).   (2.9)

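A Monte Carlo check of (2.9) (a sketch with the arbitrary choice t = 2, so t/(1 + t) = 2/3; here X and Y are independent standard exponentials, consistent with (2.7)):

set.seed(1)
x <- rexp(100000); y <- rexp(100000)
mean(x / y <= 2)   # approximately 0.667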
Finally, we remark from (2.4) that by applying the second-order partial derivatives ∂²/∂x∂y or ∂²/∂y∂x, depending on the order in which we iterate the integration, we extract the joint pdf:

(∂²/∂y∂x) F(x, y) = f(x, y).   (2.10)


PROBLEMS

2.1. Let X and Y be two r.v.'s with the joint pdf

f(x, y) = {x/2 + cy for 0 < x < 1, 2 < y < 10; 0 elsewhere}.

(i) Determine the value of the constant c.

(ii) Find P{X + Y < 5}.

2.2. Let X and Y be two r.v.'s with the joint pdf

f(x, y) = e^{−(x+y)} 1_{ℝ₊²}(x, y) = {e^{−(x+y)} for (x, y) ∈ ℝ₊²; 0 otherwise}.

Find P{X + Y ≤ t}, t ∈ ℝ.

Solution. We proceed as follows. Let Z = X + Y. Then, for t > 0,

F_Z(t) = P{X + Y ≤ t} = ∬_{{x+y ≤ t}} f(x, y) d(x, y)

(the region of integration is the triangle {x ≥ 0, y ≥ 0, x + y ≤ t})

= ∫_{x=0}^{t} ∫_{y=0}^{t−x} f(x, y) dy dx = ∫_{x=0}^{t} ∫_{y=0}^{t−x} e^{−(x+y)} dy dx

= ∫_{x=0}^{t} e^{−x} [∫_{y=0}^{t−x} e^{−y} dy] dx   (the inner integral equals 1 − e^{−(t−x)})

= ∫_{x=0}^{t} e^{−x} dx − e^{−t} ∫_{x=0}^{t} dx

= 1 − e^{−t} − t e^{−t}, for t ≥ 0,

and F_Z(t) = 0 when t < 0, or in a more compact form

F_Z(t) = (1 − e^{−t} − t e^{−t}) 1_{[0,∞)}(t). □

2.3. Let X and Y be two r.v.'s with the joint pdf

f(x, y) = μν e^{−(μx+νy)} 1_{ℝ₊²}(x, y) = {μν e^{−(μx+νy)} for (x, y) ∈ ℝ₊²; 0 otherwise},

where μ and ν are positive reals. Find

(i) P{X + Y ≤ t}, t ∈ ℝ;

(ii) P{X − Y ≤ t}, t ∈ ℝ;

(iii) P{0 < X − Y ≤ t}, t ∈ ℝ₊.

Solution. 3 We will proceed as followsÞ Let ^ œ \  ] Þ Then, if >  !ß

J^ > œ T \  ] Ÿ > œ  0 Bß C.ÐBß CÑ


ÖB  C Ÿ >×

(see the figure below)

œ Bœ! Cœ! 0 Bß C.C.B


> >B

{x y t}

œ Bœ! Cœ! ./ /Ð.B/ CÑ .C.B


> >B

MTH 2401, LECTURE NOTES, Page 88, Version 54


CHAPTER III. JOINTLY DISTRIBUTED RANDOM VARIABLES

œ Bœ! ./.B Cœ! / // C .C .B



> >B

"  //>B

œ Bœ! ./.B .B  // > Bœ! ./Ð./ ÑB .B


> >

/ > 
œ "  /.>  .
./ / "  /./ > 

. .
œ "  /.>  ./ /
- >
 ./ /
.>

. /
œ" ./ /
/ >
 ./ /
.>
ß with >   !ß

and J^ > œ ! when >  !, or in a more compact form

J^ > œ "  .
./ /
/ >
 /
./ /
.>
1Ò!ß∞Ñ >. 

2.4. Using the pattern of Example 2.1, the R source code below, and Problem 2.3, find the density function f of the vector (X, Y) and identify the parameters μ and ν.

f<-function(x,y) {
  z<-0.256*exp(-0.6*(2*x+0.3*y))
}
y<-x<-seq(0,3,length=50)
z<-outer(x,y,f)
persp(x,y,z)
persp(x,y,z,theta=45,phi=30,expand=0.6,ltheta=120,shade=0.75,ticktype="detailed",xlab="X",ylab="Y",zlab="f(x,y)",col="lightskyblue")


[Plot: the surface f(x, y) over 0 ≤ x, y ≤ 3, decaying from about 0.256 at the origin.]

2.5. Under the condition of Problem 2.3, write the program and plot the density f of the vector (X, Y) with parameters μ = 0.6 and ν = 1.

2.6. Suppose a point is located in the unit square and that the location of the point within the unit square obeys the uniform distribution; i.e., if X and Y are the coordinates of the point, then their joint pdf is

f(x, y) = {1 for 0 < x < 1, 0 < y < 1; 0 elsewhere}.

a) Sketch the probability density surface.

b) If F(x, y) = P{X ≤ x, Y ≤ y}, find F(0.2, 0.4).

c) Calculate P{0.1 ≤ X ≤ 0.3, 0 < Y ≤ 0.5}.


3. Marginal PDF's and Marginal Densities


From the definition of the joint PDF,

F(x, y) = P{X ≤ x, Y ≤ y},

it follows that as y → ∞ the set {Y ≤ y} approaches Ω. This is literally because Y(ω) < ∞ for all ω. Then,

lim_{y→∞} F(x, y) = P({X ≤ x} ∩ Ω) = P{X ≤ x} = F_X(x) (the marginal PDF of X).   (3.1)

On the other hand,

F_X(x) = lim_{y→∞} F(x, y) = ∫_{u=−∞}^{x} ∫_{v=−∞}^{∞} f(u, v) dv du.   (3.2)

Similarly,

lim_{x→∞} F(x, y) = F_Y(y) (the marginal PDF of Y).   (3.3)

The corresponding marginal densities (pdf's) can be obtained by differentiating the marginal PDF's. So, from (3.2) we have

f_X(x) := (d/dx) F_X(x) = ∫_{v=−∞}^{∞} f(x, v) dv.   (3.4)

The latter is due to the Newton-Leibniz formula

(d/dx) ∫_{v=a}^{x} φ(v) dv = φ(x), for any a.   (3.5)

Example 3.1. Let X and Y be two r.v.'s with the given joint pdf

f(x, y) = {(15/4) x² for 0 ≤ y ≤ 1 − x²; 0 elsewhere}.   (3.6)

Find the marginal pdf's of X and Y.

Solution. As we see from Figure 3.1, f(x, y) > 0 between the line y_1(x) = 0 and the parabola y_2(x) = 1 − x².


Figure 3.1

To find the marginal pdf f_X(x) we use the formula:

f_X(x) = ∫_{y=-∞}^{∞} f(x,y) dy = ∫_{y=0}^{1-x²} (15/4) x² dy · 1_{[-1,1]}(x)

= (15/4) x² (1 - x²) 1_{[-1,1]}(x).   (3.7)

Now, to find the marginal pdf f_Y we need to integrate f w.r.t. x. From y ≤ 1 - x² we find that
x² ≤ 1 - y, which means that we will use the positive value of f(x,y) for
-√(1-y) ≤ x ≤ √(1-y). In other words, given y fixed on [0,1], the integration of f w.r.t. x
will run along the segment of the line through y, parallel to the X-axis, between x = -√(1-y)
and x = √(1-y). (See Figure 3.2 below.)

Figure 3.2

Consequently, we have

f_Y(y) = ∫_{x=-√(1-y)}^{√(1-y)} (15/4) x² dx · 1_{[0,1]}(y)

= (5/2) (1-y)^{3/2} 1_{[0,1]}(y).   (3.8)
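As a quick sanity check, both marginal pdf's should integrate to 1; the following short R snippet verifies this numerically:

fX <- function(x) (15/4) * x^2 * (1 - x^2)   # marginal pdf (3.7) on [-1,1]
fY <- function(y) (5/2) * (1 - y)^(3/2)      # marginal pdf (3.8) on [0,1]
integrate(fX, -1, 1)$value                   # returns 1
integrate(fY,  0, 1)$value                   # returns 1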


Property 3.1. It holds true that

E[aX + bY] = aE[X] + bE[Y].   (3.9)


Indeed, E[aX + bY] = ∫∫_{x,y} (ax + by) f(x,y) d(x,y)

= a ∫∫_{x,y} x f(x,y) d(x,y) + b ∫∫_{x,y} y f(x,y) d(x,y)

= a ∫_x x [ ∫_y f(x,y) dy ] dx + b ∫_y y [ ∫_x f(x,y) dx ] dy

(since ∫_y f(x,y) dy = f_X(x) and ∫_x f(x,y) dx = f_Y(y))

= a ∫_x x f_X(x) dx + b ∫_y y f_Y(y) dy = aE[X] + bE[Y],

which proves the assertion. □

An immediate utility of Property 3.1 is the following.

Example 3.2. Let (X_1, …, X_n) be a random sample such that X_i ∈ [X] and X is a r.v. with
mean μ. Denote

X̄_n := (1/n)(X_1 + … + X_n)   (3.10)

and call it a sample mean of population [X]. From Property 3.1, we easily find that

E[X̄_n] = (1/n) Σ_{i=1}^{n} E[X_i] = (1/n) nμ = μ.   (3.11)

Thus, the mean of the sample mean is μ (i.e. the mean of the population) regardless of the
sample size. □


PROBLEMS

3.1. Generalize Property 3.1 as follows. Let g and h be two functions and let X and Y be r.v.'s.
Show that

E[g(X) + h(Y)] = E[g(X)] + E[h(Y)].   (3.12)

3.2. Let X and Y be two r.v.'s with the joint pdf

f(x,y) = { 8xy,  0 < x < y < 1          (3.13)
         { 0,    elsewhere.

Draw the region where 0  ! and find the marginal pdf's of \ and ] .

Solution. First off, the region where f > 0 is the triangle {(x,y): 0 < x < y < 1}, i.e. the part of
the unit square above the line y = x.

(a) f_X(x) = ∫_{y=x}^{1} 8xy dy · 1_{(0,1)}(x)

= 8 · (1/2) x y² |_{y=x}^{1} · 1_{(0,1)}(x) = 4x(1 - x²) 1_{(0,1)}(x)

(b) f_Y(y) = ∫_{x=0}^{y} 8xy dx · 1_{(0,1)}(y)

= 8 · (1/2) y x² |_{x=0}^{y} · 1_{(0,1)}(y) = 4y³ 1_{(0,1)}(y). □

3.3. Under the condition of Problem 3.2, find P(X > Y).

3.4. Under the condition of Problem 3.2, find

(i) E[1/(2X) + 3Y² + 4]

(ii) E[1/(XY)].


4. Independent Random Variables


As in Example 1.2, two r.v.'s are independent if

P{X ∈ A, Y ∈ B} = P{X ∈ A} P{Y ∈ B}   (4.1)

holds for all Borel sets A and B. This definition makes sense and it follows from the
independence of two events in basic probability. However, for continuous r.v.'s, condition (4.1)
is difficult to verify for all A and B. If we use sets like (-∞, x] for A and (-∞, y] for B,
expression (4.1) reduces to an easily verifiable product of the marginal PDF's:

P{X ≤ x, Y ≤ y} = F(x,y) = F_X(x) F_Y(y).   (4.2)

Fortunately, if (4.2) holds, then (4.1) also holds for all Borel subsets of ℝ, in particular, all
semi-infinite closed intervals. This is not easy to prove, but it is contained in advanced textbooks
on probability. From (4.2), we readily obtain that

f(x,y) = f_X(x) f_Y(y)   (4.3)

if and only if X and Y are independent. From (4.3), it follows that a straightforward
independence test is to calculate the marginal densities and then check whether (4.3) holds. However,
there is yet another way to test for independence without the necessity of integrating for the
marginal densities.

Property 4.1. Let X and Y be two continuous r.v.'s with joint pdf f(x,y). X and Y are
independent if and only if

f(x,y) = g(x) h(y),   (4.4)

where g and h are nonnegative functions.

Proof. Suppose (4.4) holds true. Integrating f(x,y) in x and then in y gives

f_Y(y) = ∫_{x=-∞}^{∞} f(x,y) dx = a h(y),

where a = ∫_{x=-∞}^{∞} g(x) dx. Similarly,

f_X(x) = b g(x),

where b = ∫_{y=-∞}^{∞} h(y) dy. On the other hand,

1 = ∫_{x=-∞}^{∞} ∫_{y=-∞}^{∞} f(x,y) dy dx = ab.

Therefore, g and h differ from the marginal densities by multiplicative constants a and b such
that ab = 1. Consequently,

f(x,y) = 1 · g(x) h(y) = ab g(x) h(y) = f_X(x) f_Y(y),

which proves that X and Y are independent. The converse holds obviously true. □

Remark 4.1. To completely utilize Property 4.1, we notice that in many applications, a joint pdf
is positive on some proper subset of ℝ². It often reads

f(x,y) = p(x,y) 1_D(x,y).   (4.5)

Now, two things must take place in order for X and Y to be independent. Firstly, p must be
factorizable. Secondly, D must be a rectangle, i.e. D must be a Cartesian product A × B. The
latter is due to the property of the indicator function:

1_{A×B}(x,y) = 1_A(x) 1_B(y),   (4.6)

which is easy to verify by seeing that the left- and right-hand sides of (4.6) are simultaneously
equal to 1 or 0. This adds to the full factorization of f(x,y). It is readily seen that only if D is a
rectangle can 1_D be factorized, and thus f be factorizable.

In a nutshell, the pdf f in (4.5) is factorizable (and thus X and Y are independent) if and only
if p is factorizable and D is a rectangle. □

Example 4.1. Let X and Y have the joint pdf f given by

f(x,y) = { 15 e^{-3x-5y},  (x,y) ∈ ℝ₊²          (4.7)
         { 0,              otherwise.

In this case the algebraic part of f is clearly factorizable and, in addition, f > 0 on ℝ₊², which is
a rectangle [0,∞) × [0,∞). Therefore, X and Y are independent. □

Example 4.2. Let X and Y have the joint pdf f given by

f(x,y) = { c x e^{-x} y,  0 < x < y < 1
         { 0,             otherwise.

In this case, the algebraic part is factorizable, but f > 0 on the upper triangle of the unit square
(0,1) × (0,1) and thus X and Y are not independent. □

Example 4.3. In the condition of Example 3.1, with

f(x,y) = { (15/4) x²,  0 ≤ y ≤ 1 - x²
         { 0,          elsewhere

we found that

f_X(x) = (15/4) x² (1 - x²) 1_{[-1,1]}(x)

and

f_Y(y) = (5/2) (1 - y)^{3/2} 1_{[0,1]}(y).

From Figure 4.1 below we see that the regions where f(x,y) > 0 and where f_X(x) f_Y(y) > 0 do
not coincide, so X and Y are not independent. In addition, f ≠ f_X f_Y, which is obvious.
However, the first argument is crucial in seeing that f > 0 not on a rectangle, as we learn
(indirectly) from Property 4.1. This is a vivid illustration of the independence principle in
Remark 4.1.

Figure 4.1

Property 4.2. Let X and Y be independent r.v.'s and let G and H be two functions. Then,

E[G(X) H(Y)] = E[G(X)] E[H(Y)].   (4.8)

Proof. Without loss of generality we prove this for continuous r.v.'s.

E[G(X) H(Y)] = ∫∫_{x,y} G(x) H(y) f(x,y) d(x,y)

(since by independence, f(x,y) = f_X(x) f_Y(y), and by subsequent integral iteration)

= [ ∫_x G(x) f_X(x) dx ] [ ∫_y H(y) f_Y(y) dy ] = E[G(X)] E[H(Y)]. □

Remark 4.2. Furthermore, we can show that if X and Y are independent r.v.'s and G and H are
two functions, then the r.v.'s G(X) and H(Y) are independent. □
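Property 4.2 is easy to see at work by Monte Carlo simulation. In the R sketch below, the distributions Exp(3) and Exp(5) and the functions G(x) = x² and H(y) = e^{-y} are arbitrary illustrative choices:

set.seed(2)
x <- rexp(1e5, rate = 3)    # X ~ Exp(3)
y <- rexp(1e5, rate = 5)    # Y ~ Exp(5), independent of X
G <- function(x) x^2
H <- function(y) exp(-y)
mean(G(x) * H(y))           # estimates E[G(X)H(Y)]
mean(G(x)) * mean(H(y))     # estimates E[G(X)]E[H(Y)] -- nearly the same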


PROBLEMS

4.1. Let X and Y be two r.v.'s with the joint pdf

f(x,y) = { 2x e^{-y},  0 ≤ x ≤ 1, 0 < y < ∞
         { 0,          elsewhere.

Investigate if X and Y are independent.

4.2. Let X and Y be two r.v.'s with the joint pdf

f(x,y) = { c e^{-(x²+y²)},  0 ≤ x < ∞, 0 < y < ∞
         { 0,               elsewhere,

where c is a positive constant. Are the events {X ≥ 1} and {1/2 < Y ≤ 2} independent?

Solution. Firstly, we easily conclude that X and Y are independent. This is because f is clearly
factorizable and because f > 0 on the rectangle [0,∞) × (0,∞). Secondly, because X and Y are
independent, they generate independent events. Namely, recall that X and Y are by the original
definition independent if and only if

P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B),

good for all Borel sets A, B ⊆ ℝ, in particular, for A = [1,∞) (in X ≥ 1) and for B = (1/2, 2]
(in 1/2 < Y ≤ 2). □

- and ., respectively. Find the distribution of ^ œ min\ß ] .


4.3. Suppose \ and ] are two independent r.v.'s, each exponentially distributed with parameters

Solution. It can be readily shown that ^  > œ \  > ∩ ]  >. Indeed, ^ =  > if and
only if \ =  > and ] =  >, good for all = − H. Then, by the independence of \ and ] ß

T ^  > œ T \  >T ]  > œ /-> /.> œ /-.> .

Therefore, T ^ Ÿ > œ "  /-.> and we conclude that ^ is exponential with parameter
-  .. 

4.4. Suppose X_1, …, X_k are independent r.v.'s, each exponentially distributed, with parameters
β_1, …, β_k, respectively. Show that the r.v. Y := min{X_1, …, X_k} is exponentially distributed
with parameter β_1 + … + β_k.

4.5. Suppose X and Y are two independent positive r.v.'s with marginal densities f_X and f_Y.
Find the distribution P(X + Y ≤ t) of their sum.

Solution. We will proceed as follows:

P(X + Y ≤ t) = ∫∫_{{x+y ≤ t}} f(x,y) d(x,y)

(the integration runs over the triangle {x ≥ 0, y ≥ 0, x + y ≤ t}; by independence)

= ∫_{x=0}^{t} f_X(x) ∫_{y=0}^{t-x} f_Y(y) dy dx

or

= ∫_{y=0}^{t} f_Y(y) ∫_{x=0}^{t-y} f_X(x) dx dy = ∫_{y=0}^{t} f_Y(y) F_X(t-y) dy.

Solution. From Problem 4.5,

F_Z(t) = P(X + Y ≤ t) = ∫_{y=0}^{t} f_Y(y) F_X(t-y) dy

= ∫_{y=0}^{t} μ e^{-μy} (1 - e^{-λ(t-y)}) dy

= ∫_{y=0}^{t} μ e^{-μy} dy - μ e^{-λt} ∫_{y=0}^{t} e^{-(μ-λ)y} dy

= 1 - e^{-μt} - μ/(μ-λ) e^{-λt} (1 - e^{-(μ-λ)t})

= 1 - μ/(μ-λ) e^{-λt} + λ/(μ-λ) e^{-μt},  with t ≥ 0,

and F_Z(t) = 0 when t < 0, or in a more compact form

F_Z(t) = [1 - μ/(μ-λ) e^{-λt} + λ/(μ-λ) e^{-μt}] 1_{[0,∞)}(t). □

4.7. Under the condition of Problem 4.6, assuming that λ = μ, find the probability density
function of Z = X + Y and identify Z.

Solution. We have

F_Z(t) = 1 - (μ e^{-λt} - λ e^{-μt})/(μ - λ) = 1 - e^{-λt} (ν - e^{-λ(ν-1)t})/(ν - 1),

where ν = μ/λ.

Now, μ → λ if and only if ν → 1. We thus apply L'Hôpital's rule to the fraction
(ν - e^{-λ(ν-1)t})/(ν - 1):

⟹ lim_{μ→λ} (ν - e^{-λ(ν-1)t})/(ν - 1) = lim_{ν→1} (ν - e^{-λ(ν-1)t})/(ν - 1)   (differentiate the fraction w.r.t. ν)

= lim_{ν→1} (1 + λt e^{-λ(ν-1)t})/1 = 1 + λt.

Thus, F_Z(t) = [1 - (1 + λt) e^{-λt}] 1_{[0,∞)}(t), and

f_Z(t) = [λ e^{-λt} (1 + λt) - λ e^{-λt}] 1_{[0,∞)}(t) = λ² t e^{-λt} 1_{[0,∞)}(t).

Comparing this expression with the gamma density

f(x; α, β) = β e^{-βx} (βx)^{α-1}/(α-1)! · 1_{ℝ₊}(x)

we conclude that f_Z is gamma with parameters α = 2 and β = λ. □

4.8. Let X and Y be two independent exponentially distributed r.v.'s with common parameter λ.
Find the probability distribution of Z = X + Y, the pdf of Z, and identify Z.


5. Sums of Independent Random Variables


Property 4.2 leads to further important applications, especially about the sums of independent
r.v.'s. We start with the MGF's. Let X and Y be independent r.v.'s with MGF's m_X(θ) and
m_Y(θ). We are interested in the MGF m_{X+Y}(θ) of the sum. Using (4.8) we have

m_{X+Y}(θ) = E[e^{θ(X+Y)}] = E[e^{θX} e^{θY}] = m_X(θ) m_Y(θ).   (5.1)

Equation (5.1) can be extended to any finite sum of independent r.v.'s:

m_{X_1+…+X_n}(θ) = m_{X_1}(θ) ⋯ m_{X_n}(θ).   (5.2)

Example 5.1. Let X ∈ [N(μ, σ²)] and Y ∈ [N(ν, δ²)] be independent r.v.'s. We are interested in
the nature of X + Y. Using (5.1),

m_{X+Y}(θ) = m_X(θ) m_Y(θ)

= e^{μθ + (1/2)σ²θ²} e^{νθ + (1/2)δ²θ²} = e^{(μ+ν)θ + (1/2)(σ²+δ²)θ²}.   (5.3)

Therefore, X + Y ∈ [N(μ + ν, σ² + δ²)]. □

Example 5.2. Let X_1, …, X_n be iid exponential r.v.'s, each with parameter λ. From (5.2),

m_{X_1+…+X_n}(θ) = [m_{X_1}(θ)]ⁿ = (λ/(λ-θ))ⁿ,   (5.4)

which is the MGF of an Erlang r.v. with parameters (n, λ) (see (3.9)), which is a special case of
a gamma r.v. with parameters (α = n, β = λ). As an interesting interpretation of (5.4), we see
that an Erlang r.v. contains n independent exponential phases and is often used to describe time-
related processes. For example, a visit to a supermarket can be statistically adapted to an Erlang
distribution, considering that one spends an exponential time at every department on the list,
visiting exactly n different departments. □
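The Erlang identification in (5.4) is easy to verify empirically; in the R sketch below, n = 4, λ = 2, and the evaluation point 3 are illustrative choices:

set.seed(4)
s <- replicate(1e4, sum(rexp(4, rate = 2)))   # sums of 4 iid Exp(2) r.v.'s
mean(s <= 3)                                  # empirical P(S <= 3)
pgamma(3, shape = 4, rate = 2)                # Erlang(4,2) CDF, about 0.849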

Example 5.3. Let X_1, …, X_n be iid Bernoulli r.v.'s with parameter p. Recall that X is a
Bernoulli r.v. with parameter p if X: Ω → {0, 1} such that P{X = 1} = p and
P{X = 0} = q = 1 - p. The pgf of X is

E[z^X] = z⁰ q + z¹ p = pz + q.   (5.5)

We once mentioned that the sum of n independent Bernoulli r.v.'s is binomial with parameters
(n, p). Now we see that by independence,

E[z^{X_1+…+X_n}] = (pz + q)ⁿ,   (5.6)

which by (3.15) is the pgf of a binomial r.v. with parameters (n, p). The corresponding MGF of
the sum will be


m(θ) = E[e^{θ(X_1+…+X_n)}] = (p e^θ + q)ⁿ.   (5.7)



PROBLEMS

5.1. Under the condition of Example 5.1, identify the r.v. aX + bY, where a, b ≠ 0.

5.2. Let X_1, …, X_n be iid (independent, identically distributed) r.v.'s, each Gaussian with
parameters μ and σ². Find the distribution of the r.v. X_1 + … + X_n.

5.3. Under the condition of Problem 5.2, find the distribution of the sample mean
X̄_n = (1/n)(X_1 + … + X_n) (earlier introduced in Example 3.2).

5.4. Let X and Y be two independent Poisson r.v.'s with parameters λ_1 and λ_2. Using the MGF
technique find the distribution of the r.v. X + Y.

5.5. Let X_1, …, X_n be independent r.v.'s, each Gaussian with parameters μ_i and
σ_i², i = 1, …, n. Find the distribution of the r.v. a_1 X_1 + … + a_n X_n, where a_i ∈ ℝ and
a_1² + … + a_n² > 0.

5.6. Let X and Y be two independent r.v.'s with their respective MGF's

m_X(θ) = e^{3(e^θ - 1)} and

m_Y(θ) = ((1/5) e^θ + 4/5)³.

Calculate E[XY].

Solution. Firstly, we identify m_X and m_Y as the MGF's of a Poisson and a binomial r.v., with
parameters λ = 3 and (n = 3, p = 1/5), respectively. Then the mean μ_X of X is 3 and the mean
μ_Y of Y is 3 · (1/5).

Since X and Y are independent, E[XY] = E[X] E[Y] = 3 · 3 · (1/5) = 9/5. □

5.7. Let X and Y be two independent r.v.'s with their respective MGF's

m_X(θ) = e^{e^θ - 1} and

m_Y(θ) = ((2/3) e^θ + 1/3)⁵.

Calculate

(i) P{XY = 0}

(ii) E[XY]

(iii) P{X + Y = 1}.


5.8. Let X_1, …, X_n be independent r.v.'s, each gamma with parameters α_i and β_i, i = 1, …, n.
Find the distribution of the r.v. a_1 X_1 + … + a_n X_n, where a_i > 0, i = 1, …, n.


6. Correlation
If r.v.'s X and Y are not independent, we are interested in how dependent they are. We would
like to introduce, in some sense, a measure of their dependence, referred to as the covariance:

Cov(X,Y) = E[(X - μ_X)(Y - μ_Y)],   (6.1)

where μ_X and μ_Y are the means of X and Y. The motivation for this measure of dependence is
as follows. If X and Y are independent, then so are their functions X - μ_X and Y - μ_Y. In this
case,

Cov(X,Y) = E[X - μ_X] E[Y - μ_Y] = 0.   (‡)

The converse is not true, as we will see, but obviously, for X and Y independent, Cov(X,Y)
vanishes.

Now, using Property 3.1 on the linearity of the expectation, after some simple algebra, we arrive
at a computationally friendlier formula:

Cov(X,Y) = E[XY] - μ_X μ_Y = μ_{XY} - μ_X μ_Y.   (6.2)

As mentioned, the converse of (‡) does not hold (i.e., if the covariance of X and Y is zero, X
and Y need not be independent), as we learn from the example below.

Example 6.1. Let X: Ω → {-1, 0, 1} be uniformly distributed, with probability 1/3 for each
value. Let

Y = { 0,  X ≠ 0
    { 1,  X = 0.

Obviously, X and Y are dependent. Now, since XY = 0, μ_{XY} = 0. Furthermore, because
μ_X = 0, from (6.2) we have that Cov(X,Y) = 0. Yet X and Y are not independent. □

If Cov(X,Y) = 0, the r.v.'s X and Y are called uncorrelated. Hence, if Cov(X,Y) = 0, X and
Y are not necessarily independent. In any event, they are uncorrelated.

Next, we study the covariance more thoroughly and work on several of its important properties.

Properties of the Covariance. Let X, Y, X_1, …, X_n, Y_1, …, Y_m be r.v.'s. Then the following
hold true:

(i) Cov(X,Y) = Cov(Y,X)

(ii) Cov(X,X) = Var X

(iii) Cov(aX, bY) = ab Cov(X,Y)


CovÐ \3 ß ] Ñ œ  CovÐ\3 ß ] Ñ
8 8
Ð3@Ñ
3œ" 3œ"

CovÐ \3 ß  ]4 Ñ œ   CovÐ\3 ß ]4 Ñ.
8 7 8 7
Ð@Ñ
3œ" 4œ" 3œ" 4œ"

Ð@3Ñ If ] œ - is almost surely (a.s.) constant, then CovÐ\ß -Ñ œ !. Thus, any r.v.
is uncorrelated with any constant.

Ð@33Ñ lCovÐ\ß ] Ñl Ÿ Var\ † Var]

Corollary 6.1. From Properties Ð33Ñ and Ð@Ñ,

Var  \3  œ  Var\3  #CovÐ\3 ß \4 Ñ.


8 8
(6.3)
3œ" 3œ" 34

Proof. CovÐ \3 ß  \4 Ñ œ Var  \3  œ   CovÐ\3 ß \4 Ñ


8 8 8 8 8

3œ" 4œ" 3œ" 3œ" 4œ"

(due to property Ð3Ñ)

œ  CovÐ\3 ß \3 Ñ  #CovÐ\3 ß \4 Ñ
8

3œ" 34

œ  Var\3  #CovÐ\3 ß \4 Ñ.
8

3œ" 34

Corollary 6.2. Let X_1, …, X_n be pairwise uncorrelated r.v.'s. Then the following Bienaymé
equality holds:

Var(Σ_{i=1}^{n} X_i) = Σ_{i=1}^{n} Var X_i.   (6.4)

Remark 6.1. Given an n-tuple of r.v.'s X_1, …, X_n, with their variances σ_1², …, σ_n², denote
their covariances σ_ij, i, j = 1, …, n. We can place all covariances in a so-called covariance
matrix,

Cov T_n := K = ( σ_1²  σ_12  σ_13  …  σ_1n )
               ( σ_21  σ_2²  σ_23  …  σ_2n )
               ( …     …     …     …  …    )   (6.5)
               ( σ_n1  σ_n2  σ_n3  …  σ_n² )

Notice that K is a symmetric matrix (i.e. such that K′ = K) and that the main diagonal consists
of the variances of the r.v.'s X_1, …, X_n. □

Remark 6.2. A common problem in statistics is the parameter estimation of a r.v. (such as its
mean or variance). In this case, one assumes that some population is represented by an
equivalence class of r.v.'s having the same particular distribution. Such an equivalence class will
be denoted by [X], where X is a generic r.v. being one of them. An experiment consists of
drawing a random sample T_n = (X_1, …, X_n) from the population [X], which is an n-tuple (or
vector) of n iid r.v.'s. To estimate an unknown parameter one forms a statistic, being a Borel
function of the sample. One of them, the sample mean X̄_n, has been introduced in Example 3.2,
except that there we did not assume that X_1, …, X_n were independent. We recall that the
sample mean is

X̄_n = (1/n)(X_1 + … + X_n).   (6.6)

If the r.v. X has mean μ and variance σ², then the mean of X̄_n is

E[X̄_n] = (1/n) n E[X_1] = μ,   (6.7)

which is equal to the mean of the population, as per Example 3.2. (This result did not require
X_1, …, X_n to be independent.) The variance of X̄_n is, by the Bienaymé equality (6.4),

Var X̄_n = (1/n²) n σ² = (1/n) σ².   (6.8)

It shows that the variance of the sample mean is n times smaller than that of the population, and
it seems like by increasing the sample size n we reduce the variance, thereby making the sample
mean X̄_n approach a constant, apparently μ itself. Thus, for large samples, the sample
mean can be a good estimator for the unknown mean of the population.

That the sample mean indeed converges to μ follows from the so-called Law of Large
Numbers. □
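Formula (6.8) is easy to observe empirically. The R sketch below (with arbitrary illustrative choices of the population Exp(1), for which σ² = 1, and of the replication count) estimates Var X̄_n for several n:

set.seed(5)
for (n in c(10, 100, 1000)) {
  xbar <- replicate(2000, mean(rexp(n)))   # 2000 sample means of size n
  cat("n =", n, " Var(sample mean) =", var(xbar), " 1/n =", 1/n, "\n")
}

The printed variances track 1/n closely, as (6.8) predicts.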


PROBLEMS

6.1. Let X and Y be two independent Gaussian r.v.'s with parameters (-1, 2) and (3, 5). Find
Var(2X - 7Y - 5).

Hint. Step 1. We start by noticing that the constant -5 in 2X - 7Y - 5 does not have any
impact on the variance, and thus as such it can be discarded. Therefore,

Var(2X - 7Y - 5) = Var(2X - 7Y).

Step 2. Since X and Y are independent, so are 2X and -7Y as linear functions of X and Y. In
particular, 2X and -7Y are uncorrelated. Therefore, we can use the Bienaymé equality (6.4):

Var(2X - 7Y) = Var(2X) + Var(-7Y).

Next,

= 2² Var X + (-7)² Var Y = 4 · 2 + 49 · 5 = 253. □

6.2. Suppose a fair die is rolled twice and let X and Y denote the sum and the difference between
the first and second outcome, respectively. Find Cov(X,Y).

6.3. Under the condition of Problem 6.2, find Var(2X + 3Y - 4).

6.4. Show the validity of Property (v).

6.5. Show the validity of Property (vi).

6.6. Let X and Y be two identically distributed r.v.'s. Show that the r.v.'s X + Y and X - Y are
uncorrelated.

6.7. Suppose two fair dice are rolled and let X_1 and X_2 denote the outcome of the first and
second die, respectively. Prove that the r.v.'s X = X_1 + 2X_2 and Y = 2X_1 - X_2 are
uncorrelated.

"!ß_!Þ$ (i.e., with parameters "!ß !Þ$). Find the mean and variance of the sample
6.8. Let \" ß á ß \"!! be a sample of iid r.v.'s drawn from a binomial population

mean \ "!! .

Solution.

E[X̄_100] = (1/100) E[X_1 + … + X_100] = (1/100)(E[X_1] + … + E[X_100])

= (1/100) · 100 μ = μ,

where μ is the mean of each X_i. Since the expectation of a binomial r.v. from [b(m, p)]
is mp, we have

μ = 10 · 0.3 = 3,

and so is E[X̄_100].

Because the X_i's are uncorrelated, by the Bienaymé equality (6.4), the variance is

Var X̄_100 = Var((1/100) Σ_{i=1}^{100} X_i) = (1/100²) Σ_{i=1}^{100} Var X_i

= (1/100²) · 100 · 10pq = 10pq/100 = 10 · 0.3 · 0.7/100 = 0.021.


7. The Markov and Chebyshev's Inequalities.


The Laws of Large Numbers

Proposition 7.1. Markov Inequality. Let Y ≥ 0 a.s. (i.e. with probability 1) and let a > 0.
Then,

a 1_{[a,∞)}(Y) ≤ Y a.s.   (7.1)

Indeed, if Y(ω) < a, the left-hand side equals 0 while Y(ω) ≥ 0 by the assumption; if Y(ω) ≥ a,
the left-hand side equals a ≤ Y(ω). In either case the inequality holds true.

Now, taking the expectation on both sides of (7.1) and recollecting that the expectation is a
monotone functional, we have

P{Y ≥ a} ≤ μ/a,  where μ = E[Y].   (7.2)

(7.2) is known as the Markov inequality. Notice that if μ = ∞, then (7.2) holds trivially. □

Proposition 7.2. Chebyshev's Inequality. Let X be a r.v. with mean μ. Denote Y = (X - μ)²
and let a = ε² for some ε > 0. Then, using the Markov inequality we have

P{|X - μ| ≥ ε} ≤ Var X/ε² = σ²/ε²,   (7.3)

known as Chebyshev's inequality. One of its noteworthy applications: if Var X = 0, then from
(7.3) it follows that X is almost surely a constant. If σ² = ∞, then (7.3) holds trivially. □

Chebyshev's inequality is instrumental in various applications in probability and statistics, in
particular, in the parameter estimation of r.v.'s. Various methods in probability and statistics
relate to asymptotic properties of sequences of r.v.'s. We begin with the following notions.

Definition 7.1. Types of Convergence for a sequence of r.v.'s. Let Z, Z_1, Z_2, … be a sequence
of r.v.'s on a probability space (Ω, 𝔉, P). We say that the sequence (Z_n) converges to the r.v. Z

(i) in probability (in notation Z_n →_P Z) if

lim_{n→∞} P(|Z_n - Z| ≥ ε) = 0, for each ε > 0;

(ii) in the mean square (or in the square mean) (in notation Z_n →_{L²} Z) if

lim_{n→∞} E[(Z_n - Z)²] = 0;

(iii) almost surely (a.s.) (in notation Z_n →_{a.s.} Z) if

P(ω ∈ Ω: lim_{n→∞} (Z_n(ω) - Z(ω)) = 0) = 1.

The latter means that Z_n converges to Z pointwise for almost all ω ∈ Ω. □

Hß Y ß T . Denote by
Definition 7.2. Let \" ß \# ß á be a sequence of independent r.v.'s on probability space

\ 8 À œ 8" 83œ" \3 ß 8 œ "ß #ß á ß


_

the associated sequence of sample means and by

.8 À œ I  8" 83œ" \3  œ 8" 83œ" I\3 ß 8 œ "ß #ß á

The sequence \8  is said to obey the

3 Weak Law of Large Numbers, if Ð\ 8  .8 Ñ Ä !ß


_ T

33 Strong Law of Large Numbers, if Ð\ 8  .8 Ñ Ä !Þ


_ a.s.

Remark 7.1. Since almost sure convergence of a sequence (X_n) implies convergence in
probability (we provide no proof of this assertion), it is obvious that the strong law of large
numbers is stronger than the weak law of large numbers. Note that in the literature, the weak law
of large numbers is often referred to as just the law of large numbers. □

\" ß \# ß á be a sequence of pairwise uncorrelated r.v.'s on probability space Hß Y ß T . In


Proposition 7.3. The Weak Law of Large Numbers. In the context of Definition 7.2, let

addition, we assume that the associated sequence of variances 58# œ Var\8  has the property
that lim8Ä∞ 8"# 83œ" 53# œ !. Then, the sequence \8  satisfies the weak law of large numbers.

Proof. By Chebyshev's inequality (7.3) and Bienamé equation (6.4),

T \ 8  .8    & Ÿ
_ _
5"# á58#
Var\ 8
&# œ 8# &# Ä !ß as 8 Ä ∞. (7.4)

Remark 7.2. The condition lim_{n→∞} (1/n²) Σ_{i=1}^{n} σ_i² = 0 can be replaced by a stronger but
more easily verifiable condition that σ_n² ≤ M, n = 1, 2, …, for some positive real number M,
i.e. that the sequence (σ_n²) is bounded. □

Theorem 7.4. Kolmogorov's Strong Law of Large Numbers. Under the assumptions of
Definition 7.2, suppose that X_1, X_2, … are independent and the sequence σ_1², σ_2², … of their
variances satisfies the condition Σ_{n=1}^{∞} σ_n²/n² < ∞. Then (X_n) satisfies the strong law of
large numbers. □


Corollary 7.5. A Special Case of the Weak Law of Large Numbers. Let X_1, X_2, … ∈ [X] be
a sequence of independent and identically distributed r.v.'s with a common mean μ and variance
σ² < ∞. Then, from (7.4),

P{|X̄_n - μ| ≥ ε} → 0 as n → ∞.   (7.5)

Thus, for large n, the sample mean X̄_n will become approximately its mean. (More about this
will be in Chapter V.) □

Corollary 7.6. A Special Case of the Strong Law of Large Numbers. Under the assumptions of
Corollary 7.5 on (X_n) and σ², Kolmogorov's conditions are obviously met. Indeed,

Σ_{n=1}^{∞} σ_n²/n² = σ² Σ_{n=1}^{∞} 1/n² = σ² π²/6 < ∞.

Therefore, such a sequence also obeys the strong law of large numbers. □
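The law of large numbers is easy to visualize. The R sketch below (population Exp(1) with μ = 1, sample size, and seed are illustrative choices) plots the running sample mean drifting toward μ:

set.seed(7)
x <- rexp(1e4)                            # iid Exp(1), mu = 1
running_mean <- cumsum(x) / seq_along(x)  # sample mean after n observations
running_mean[c(10, 100, 1000, 10000)]     # values settling near 1
plot(running_mean, type = "l", xlab = "n", ylab = "sample mean")
abline(h = 1, lty = 2)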


PROBLEMS

7.1. Let (X_n) be a sequence of independent r.v.'s such that X_n: Ω → {-n, 0, n} with
the distribution (1/n², 1 - 2/n², 1/n²). Does the sequence obey the strong law of large numbers?

Solution. Since E[X_n] = 0, Var X_n = σ_n² = E[X_n²] = 1 + 0 + 1 = 2, which implies that

Σ_{n=1}^{∞} σ_n²/n² = 2 · π²/6 = π²/3 < ∞.

Thus the conditions of Kolmogorov's strong law of large numbers (Theorem 7.4) are met. □

7.2. Under the conditions of Problem 7.1, let (X_n) be pairwise uncorrelated. Does (X_n) satisfy
the weak law of large numbers?

7.3. Show, as in Remark 7.2, that if the sequence (σ_n²) is bounded, the sequence (X_n) of r.v.'s
satisfies the weak law of large numbers.

Solution. If {σ_n² = Var X_n} is bounded, i.e., σ_n² ≤ M, n = 1, 2, …, then
(1/n²) Σ_{i=1}^{n} σ_i² ≤ M/n, and thus by Chebyshev's inequality (7.3) and the Bienaymé
equality (6.4),

P(|X̄_n - μ_n| ≥ ε) ≤ Var X̄_n/ε² = (σ_1² + … + σ_n²)/(n²ε²) ≤ nM/(n²ε²) → 0, as n → ∞.   (P7.2)

7.4. Suppose a continuous r.v. X has mean 1 and standard deviation σ = 0.2. Using
Chebyshev's inequality, estimate the probability of the event {0.5 < X ≤ 1.5}.

7.5. It is known that the probability that a Gaussian r.v. lies within plus-minus three standard
deviations from its mean is 0.997. Estimate the probability that an arbitrary continuous r.v. X
lies within plus-minus three standard deviations from its mean.


CHAPTER IV. RELIABILITY ANALYSIS

1. Reliability Measures
Reliability and Hazard Functions. We will focus on various measures representing lifetimes of
mechanical or biological components, or amounts of demands or claims. Let X be a nonnegative
r.v. denoting the "time-to-failure" of a component, with PDF F(t) = P(X ≤ t) being the
failure probability distribution function.

Define

R(t) = F̄(t) = P(X > t) = 1 - F(t)   (1.1)

and call it the reliability (or survival) function of X.

For a t > 0, the r.v. X - t is referred to as the residual lifetime of a component until its failure,
given X > t (i.e. that it was sustained until time t). The conditional distribution function of the
residual life is defined as

F(x|t) = P(X - t ≤ x | X > t) = P(X ≤ t + x | X > t),

that is, given that the component sustained until t, it is the probability that the component fails
not later than in x units of time. Explicitly it is

F(x|t) = P(X - t ≤ x | X > t)

= P(t < X ≤ t + x)/P(X > t) = (F(t + x) - F(t))/R(t).   (1.2)

In particular, with x = Δt, we have

F(Δt|t) = (F(t + Δt) - F(t))/R(t).   (1.3)

The latter yields

F(Δt|t)/Δt = (F(t + Δt) - F(t))/(Δt · R(t)),  which converges to f(t)/R(t) as Δt → 0.

We denote

h(t) = f(t)/R(t).   (1.4)


h(t) can be interpreted as the conditional (instant) failure rate of a component that otherwise has
sustained until time t. In the reliability literature h is referred to as the hazard rate function of X
(which agrees with the physical notion of a rate).

Remark 1.1. If h(t) exists, from formula (1.4), h(t) can also be written as

h(t) = -(d/dt) ln R(t),  R(0) = 1,   (1.5)

which in turn yields

∫_{u=0}^{t} h(u) du = -ln R(t)   (1.6)

and thus

R(t) = e^{-∫₀ᵗ h(u) du}.   (1.7)

If for R(t) > 0 we define the conditional reliability function as

R(x|t) = P(X > t + x | X > t) = P(X > t + x)/P(X > t) = R(t + x)/R(t),   (1.8)

then using (1.7) we can express it as

R(x|t) = e^{-∫_{u=t}^{t+x} h(u) du}.   (1.9)

Mean and Mean Residual Time. Suppose X ≥ 0 a.s. We obtain another expression for E[X] in
terms of the reliability function:

E[X] = ∫_{x=0}^{∞} x dF(x) = ∫_{x=0}^{∞} ∫_{u=0}^{x} du dF(x) = ∫_{u=0}^{∞} ∫_{x=u}^{∞} dF(x) du

= ∫_{u=0}^{∞} (1 - F(u)) du = ∫_{u=0}^{∞} R(u) du.   (1.10)

From (1.2), the mean residual life until failure is

μ(t) := E[X - t | X > t] = ∫_{x=0}^{∞} x dF(x|t)

= ∫_{x=0}^{∞} ∫_{u=0}^{x} du f(x|t) dx

= (1/R(t)) ∫_{x=0}^{∞} ∫_{u=0}^{x} du f(t + x) dx

= (1/R(t)) ∫_{u=0}^{∞} ∫_{x=u}^{∞} f(t + x) dx du = (1/R(t)) ∫_{u=0}^{∞} R(u + t) du
= (1/R(t)) ∫_{u=t}^{∞} R(u) du.

So we obtained

μ(t) = (1/R(t)) ∫_{u=t}^{∞} R(u) du.   (1.11)


2. Reliability Measures of Special Distributions


1. Exponential Distribution. Recall (section 2, Chapter II) that a r.v. X is exponential with
parameter λ if its pdf is

f(x) = λ e^{-λx} 1_{[0,∞)}(x).

We will drop the indicator function for now, because in most cases we deal with positive
r.v.'s. From expression (1.8) for the conditional reliability function,

R(x|t) = P(X > t + x | X > t) = P(X > t + x)/P(X > t) = R(t + x)/R(t),

and (2.17-2.18), Chapter II,

R(x|t) = e^{-λ(x+t)}/e^{-λt} = e^{-λx} = P{X > x} = R(x)   (2.1)

or also

R(t + x) = R(t) R(x).   (2.2)

Furthermore, from (1.4),

h(t) = f(t)/R(t) = λ e^{-λt}/e^{-λt} = λ,   (2.3)

so we see that the exponential r.v. has a constant hazard rate function.

Now, suppose X is a r.v. with a constant hazard rate function λ. From (1.7),

R(t) = e^{-∫₀ᵗ h(u) du} = e^{-∫₀ᵗ λ du} = e^{-λt},

implying that X is exponential. The latter lets us conclude that the exponential r.v. is the only
r.v. that has a constant hazard rate.

Now let \" ß á ß \8 − Exp- be a sample of independent r.v.'s. We are interested in the
distribution of the minimum value of the sample, min\" ß á ß \8 . The latter is of importance
in calculating the distribution of 8 unreliable machines connected in series (a serial system).
Obviously,

min\" ß á ß \8   B œ  \3  Bß


8

3œ"

which is easy to prove by the pick-a-point method. Thus, by independence,

MTH 2401, LECTURE NOTES, Page 117, Version 54


CHAPTER IV. RELIABILITY ANALYSIS

T min\" ß á ß \8   B œ T   \3  B


8

3œ"

œ  T \3  B œ /-8B ß
8
(2.4)
3œ"

which shows that min\" ß á ß \8  is exponential with parameter -8. [Obviously, the r.v.
8min\" ß á ß \8  is exponential with parameter -.]

Example 2.1. Consider a serial system of 10 identical components working independently. Such
a system fails as soon as one item fails. Thus the lifetime of the system is

X = min(X_1, …, X_10),

where X_i stands for the lifetime of the i-th component. Suppose X_i ∈ [Exp(0.2)]. Thus, from
(2.4), the reliability function of the system is

R_X(x) = e^{-0.2·10·x} = e^{-2x},  x ≥ 0,

and the hazard rate function of the system is

h_X(x) = λ · n = 0.2 · 10 = 2.

By Problem 2.1, the mean residual lifetime of the system is

μ_X(t) = E[X - t | X > t] = 1/2. □
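A brief simulation of this serial system (seed and sample size are illustrative choices) agrees with the Exp(2) lifetime derived above:

set.seed(8)
life <- replicate(1e4, min(rexp(10, rate = 0.2)))   # system lifetime
mean(life)       # about 1/2, the mean of Exp(2)
mean(life > 1)   # about exp(-2) = 0.135, system reliability at x = 1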

Example 2.2. Unlike a serial system, a parallel system works as long as at least one component
does. If we have a parallel system of n components with lifetimes X_1, …, X_n independent of
each other, then the lifetime X of such a system can be defined as

X := max(X_1, …, X_n)

and

{X ≤ x} = ∩_{i=1}^{n} {X_i ≤ x}.   (2.5)

The latter is easy to see when noticing that

X = max(X_1, …, X_n) ∈ [0, x] ⟺ X_1 ∈ [0, x], …, X_n ∈ [0, x].

Now, if X_i ∈ [Exp(λ)], we have from (2.5) that

F_X(x) = ∏_{i=1}^{n} (1 - e^{-λx}) = (1 - e^{-λx})ⁿ,

implying that

R_X(x) = 1 - (1 - e^{-λx})ⁿ.

Furthermore,

f_X(x) = nλ e^{-λx} (1 - e^{-λx})^{n-1}.

In particular, for n = 2,

R_X(x) = 1 - (1 - e^{-λx})² = e^{-λx} (2 - e^{-λx})

and thus the hazard rate of the system is

h_X(x) = f_X(x)/R_X(x) = 2λ (1 - e^{-λx})/(2 - e^{-λx}) = 2λ [1 - 1/(2 - e^{-λx})].

Remark 2.1. For the applications, suppose X and Y are two independent positive r.v.'s and
suppose we need to find the distribution P(X + Y ≤ t) of their sum. We will proceed as follows.
As in Chapter III,

P(X + Y ≤ t) = ∫∫_{{x+y ≤ t}} f(x,y) d(x,y)

(the integration runs over the triangle {x ≥ 0, y ≥ 0, x + y ≤ t})

= ∫_{x=0}^{t} f_X(x) ∫_{y=0}^{t-x} f_Y(y) dy dx

or

= ∫_{y=0}^{t} f_Y(y) ∫_{x=0}^{t-y} f_X(x) dx dy = ∫_{y=0}^{t} f_Y(y) F_X(t-y) dy.   (2.6)


Example 2.3. (Cold-Redundant System.) Consider a system of two units in which the second
one is in cold standby: it will be put in operation as soon as the first unit breaks down. If X_i is
the lifetime of the i-th unit with X_i ∈ [Exp(λ_i)], and assuming that X_1 and X_2 are
independent, what is the reliability function of the system?

Solution. Obviously, R(t) = 1 - F_{X_1+X_2}(t). Then, using (2.6),

R(t) = 1 - ∫_{y=0}^{t} F_{X_1}(t-y) f_{X_2}(y) dy

= 1 - ∫_{y=0}^{t} λ_2 e^{-λ_2 y} (1 - e^{-λ_1(t-y)}) dy

= 1 - (1 - e^{-λ_2 t}) + λ_2/(λ_2-λ_1) e^{-λ_1 t} (1 - e^{-(λ_2-λ_1)t})

= e^{-λ_2 t} + λ_2/(λ_2-λ_1) e^{-λ_1 t} (1 - e^{-(λ_2-λ_1)t}).   (2.7)

Example 2.4. In this example we evaluate the reliability of a four-engine jet.

The jet will continue to fly as long as at least one engine on each wing keeps functioning.
Obviously, the jet crashes if both engines fail on any one of the wings.

The system can be depicted by a "Reliability Block Diagram" representing a combination of
serial and parallel subsystems.


Left Wing Right Wing

A C

B D

Suppose all four engines work independently with exponential lifetimes, so that if X_i is the
lifetime of engine i, then X_i ∈ [Exp(λ)]. Since each subsystem (the Left and the Right Wing) is
parallel, from Example 2.2 the reliability function of either subsystem is

R_wing(x) = e^{-λx} (2 - e^{-λx}),

whereas the PDF of the failure of either wing (Example 2.2) is

F_wing(x) = (1 - e^{-λx})².

Now, the system fails if at least one of the wings fails, which agrees with the way a serial
system operates. Thus, if W_l and W_r denote the lifetimes of the wings, respectively, the system
sustains working (i.e. the jet continues flying) beyond x in accordance with the event

{min(W_l, W_r) > x} = {W_l > x} ∩ {W_r > x}.

Consequently, the reliability function of the jet is

R_jet(x) = R_left-wing(x) R_right-wing(x)

= e^{-λx} (2 - e^{-λx}) e^{-λx} (2 - e^{-λx}) = e^{-2λx} (2 - e^{-λx})². □

Remark 2.2. In general, the main principle of modeling reliability block diagrams with
arbitrary distributions of the block components' reliability works as follows. Suppose we have a
serial system of independent components with lifetimes X_1, …, X_n. Recall that

{X := min(X_1, …, X_n) > x} = ∩_{i=1}^{n} {X_i > x}   (2.8)

and thus

R_X(x) = ∏_{i=1}^{n} R_{X_i}(x).   (2.9)

In the event of a parallel system,

{Y := max(X_1, …, X_n) ≤ x} = ∩_{i=1}^{n} {X_i ≤ x},   (2.10)

implying that

F_Y(x) = ∏_{i=1}^{n} F_{X_i}(x)   (2.11)

and thus

R_Y(x) = 1 - F_Y(x) = 1 - ∏_{i=1}^{n} F_{X_i}(x).   (2.12)

2. Weibull and Rayleigh Distributions. A r.v. X is said to have a Weibull distribution with
parameters μ and α if its PDF is

F(t) = F(t|μ, α) = (1 - e^{-(μt)^α}) 1_{[0,∞)}(t),  μ, α > 0.   (2.13)

We will say that X ∈ [Wei(μ, α)]. The pdf of X is

f(t) = f(t|μ, α) = αμ (μt)^{α-1} e^{-(μt)^α} 1_{[0,∞)}(t).   (2.14)

Obviously, the hazard rate of the Weibull distribution is

h(t) = αμ (μt)^{α-1} e^{-(μt)^α} / e^{-(μt)^α} = α μ^α t^{α-1} 1_{[0,∞)}(t).   (2.15)

In the sequel we drop the indicator function. For α = 1 the Weibull distribution reduces to the
exponential distribution, while for α = 2 the special case is referred to as the Rayleigh
distribution. That is,

f(t) = f(t|μ, 2) = 2μ² t e^{-μ²t²}.

In the reliability literature 2μ² is commonly replaced with λ to have the Rayleigh density and
reliability in the form

f(t) = λt e^{-(1/2)λt²},  R(t) = e^{-(1/2)λt²}.   (2.16)

From (2.16), the hazard rate of the Rayleigh distribution is then

h(t) = λt   (2.17)

under the new meaning of (2μ²) as λ. We will say that the underlying Rayleigh r.v.
X ∈ [Ray(λ)].


say 2> œ ->ß then from (1.7)


Conversely, if 2 is a linear hazard rate (often referred to as a linearly increasing hazard rate),

V > œ /! 2?.? œ / # -> .


> " #

Therefore, it follows that a probability distribution is Rayleigh if and only if its hazard rate
function is linear.

Remark 2.3. Very often, in reliability, one characterizes a distribution by its hazard rate alone.
We saw that if the hazard rate is constant, the associate density is exponential, and if its hazard
rate is linear, its associated density is Rayleigh.

If a hazard rate is affine, 2 > œ +  ,> (with +ß ,   ! and +  ,  !), then it is easy to see that

V > œ /+> # ,> ß


" #

which is a “combination” of exponential and Rayleigh. 

Now, we are back to the Weibull distribution. Because

F(t|μ, α) = F(μt|1, α),   (2.18)

F(t|1, α) is called the standard Weibull distribution.

Theorem 2.1. Let X ∈ [Wei(μ, α)]. Then

E[X] = (1/μ) Γ(1 + 1/α)   (2.19)

Var X = (1/μ²) [Γ(1 + 2/α) - Γ²(1 + 1/α)]   (2.20)

E[Xⁿ] = (1/μⁿ) Γ(1 + n/α),   (2.21)

where Γ(x) is the gamma function defined as

Γ(x) = ∫_{u=0}^{∞} u^{x-1} e^{-u} du,  x > 0.   (2.22)

(See Figure 3.1, Chapter II.) □

The Weibull distribution has been used to describe fatigue failure, vacuum tube failure, and
ball-bearing failure. It is the most popular parametric family of failure distributions in the
reliability of electronic and mechanical systems.

With the results of Theorem 2.1, we can get similar parameters for the Rayleigh distribution
expressed in the more common notation of (2.16).


Corollary 2.2. Let X ∈ [Ray(λ)]. Then

E[X] = √(π/(2λ))   (2.23)

Var X = (2/λ)(1 - π/4)   (2.24)

E[Xⁿ] = (2^{n/2}/λ^{n/2}) Γ(1 + n/2)

     = { (2^{n/2}/λ^{n/2}) · k!,                          n = 2k,  k = 1, 2, …
       { (2^{n/2}/λ^{n/2}) · √π (2k+1)!/(2 · 4^k k!),     n = 2k+1,  k = 0, 1, …   (2.25)

In particular,

E[X²] = 2/λ,   (2.26)

E[X⁴] = 8/λ²,   (2.27)

Var X² = 4/λ².   (2.28)

Example 2.5. Consider a parallel system of two independent components. What is the
probability that component 1 fails before component 2 if the lifetime X_i of component i is
Rayleigh with parameter λ_i?

Solution. What we are looking for is

P(X_2 > X_1) = ∫∫_{{y > x}} f(x,y) d(x,y)

(the integration runs over the region {y > x} above the diagonal)

= ∫_{x=0}^{∞} f_{X_1}(x) ∫_{y=x}^{∞} f_{X_2}(y) dy dx

= ∫_{x=0}^{∞} f_{X_1}(x) R_{X_2}(x) dx.   (2.29)

Since both r.v.'s are Rayleigh, we have

f_{X_1}(x) = λ_1 x e^{-(1/2)λ_1 x²} and R_{X_2}(x) = e^{-(1/2)λ_2 x²}.   (2.30)

Inserting (2.30) into (2.29) yields

P(X_2 > X_1) = ∫_{x=0}^{∞} λ_1 x e^{-(1/2)λ_1 x²} e^{-(1/2)λ_2 x²} dx

= (λ_1/(λ_1+λ_2)) ∫_{x=0}^{∞} (λ_1+λ_2) x e^{-(1/2)(λ_1+λ_2)x²} dx.   (2.31)

Now, since (λ_1+λ_2) x e^{-(1/2)(λ_1+λ_2)x²} is the pdf of a Rayleigh r.v. with parameter
λ_1+λ_2, the integral

∫_{x=0}^{∞} (λ_1+λ_2) x e^{-(1/2)(λ_1+λ_2)x²} dx = 1.

Therefore, (2.31) concludes with

P(X_2 > X_1) = λ_1/(λ_1+λ_2). □


PROBLEMS

2.1. For X ∈ [Exp(λ)], find the mean residual time from (1.11) and show that it is constant:

μ(t) = E[X - t | X > t] = 1/λ.

Solution.

μ(t) = (1/R(t)) ∫_{u=t}^{∞} R(u) du = e^{λt} · (1/λ) e^{-λt} = 1/λ. □

2.2. Show that if X is a r.v. such that its mean residual time is constant, then X is exponential.

Solution. Let

1/λ = μ(t) = (1/R(t)) ∫_{u=t}^{∞} R(u) du.

Then, multiplying by R(t) and differentiating the equation, we get

(1/λ) R′(t) = -R(t),  R(0) = 1.

Solving the above linear homogeneous differential equation gives

R(t) = e^{-λt}. □

with lifetimes \" ß \# ß \$ such that \3 − Exp-3 ß 3 œ "ß #ß $Þ Find the reliability function
2.3. Consider a three-unit serial reliability system consisting of three independent components

V\ B of the systemß where \ À œ min\" ß \# ß \$ . Also give the hazard rate and the mean
residual time of the system.

2.4. Find the mean residual time of a parallel system of two independent and identically
distributed lifetimes of its components, with the common exponential distribution with
parameter λ.

2.5. Under the condition of Example 2.3, find R(t) when λ_1 = λ_2 = λ.

Answer. e^{-λt} (1 + λt).

2.6. In the context of Example 2.4 about the reliability of a jet, find the hazard rate of the system.

2.7. Under the conditions of Example 2.4, suppose that X_i ∈ [Exp(λ_i)], where i = 1, 2, 3, 4
corresponds to engines A, B, C, D. Assuming that λ_1 = λ_4 and λ_2 = λ_3, find the reliability
function of the system.

2.8. Show the validity of Theorem 2.1.


Solution. E[Xⁿ] = ∫_{x=0}^{∞} xⁿ α μ^α x^{α-1} e^{-(μx)^α} dx

(substituting u = (μx)^α)

= ∫_{u=0}^{∞} ((1/μ) u^{1/α})ⁿ e^{-u} du = (1/μⁿ) ∫_{u=0}^{∞} u^{n/α} e^{-u} du

= (1/μⁿ) Γ(1 + n/α).

The rest is obvious. □

2.9. Show the validity of formulas (2.23-2.24) of Corollary 2.2.

2.10. Show the validity of formula (2.25) of Corollary 2.2. Hint. Recall the formula
Γ(x + 1) = x Γ(x).

2.11. Show the validity of formulas (2.26-2.28) of Corollary 2.2.

2.12. Consider a parallel system of two independent components. What is the probability that
component 1 fails before component 2 if the lifetime X_i of component i is exponential with
parameter λ_i?


3. Reliability of 5 -out-of-8 Systems


In many complex reliability systems and parallel subsystems, an n-component system functions
as long as at least k out of its n components function. For example, in many power-generating
systems with several generators, k generators are sufficient to provide the power requirements.
Also, in a typical wire cable for cranes and bridges, the cable may contain thousands of wires
and only a fraction of them is required to carry the desired load.

Assuming that all units have identical and independent life distributions and that the probability
that a unit is functioning is p, recall that the probability that exactly k out of a total of n units
function is

b(n, p; k) = C(n,k) p^k q^{n-k},  k = 0, …, n.   (3.1)

In the context of a k-out-of-n system, at least k of them have to function, and such probability is

R(n, p; k) = Σ_{j=k}^{n} C(n,j) p^j q^{n-j}.   (3.2)

(See Example 5.3, Chapter I.)

In a typical k-out-of-n system with components having equal constant hazard rates λ, p stands
for the reliability of one component and it is

p = r(t) = e^{-λt}.   (3.3)

Therefore, from (3.2), the reliability R_S(t) of the system is

R_S(t) = Σ_{j=k}^{n} C(n,j) p^j (1-p)^{n-j}.   (3.4)

With a linearly increasing hazard rate (h(t) = λt), the corresponding system reliability is

R_S(t) = Σ_{j=k}^{n} C(n,j) p^j (1-p)^{n-j},   (3.5)

where p = e^{-(1/2)λt²} is the Rayleigh reliability function.

As in Chapter I, section 5 (Example 5.3), if we need to compute the reliability function of the
system, we will have to use the R-command

pbinom(k-1, size=n, prob=p)


which gives the sum of the first k probabilities, Σ_{j=0}^{k-1} C(n,j) p^j q^{n-j}, of a binomial
r.v. X ∈ [b(n, p)]. Now, we know that this is the value of the binomial PDF F(k-1). Thus, to
calculate the system reliability R_S(t) we will use the command

1-pbinom(k-1, size=n, prob=p)

which agrees with the formula

R_S(t) = Σ_{j=k}^{n} C(n,j) p^j q^{n-j} = 1 - Σ_{j=0}^{k-1} C(n,j) p^j q^{n-j}.   (3.6)

Example 3.1. Consider a parallel 2-out-of-3 system with components that exhibit constant
hazard rates with parameter λ. What is the reliability of the system? If λ = 3 · 10⁻⁵ failures per
hour, what is the reliability at time t = 1000 hours?

Solution. From equation (3.6),

R_S(t) = Σ_{j=2}^{3} C(3,j) e^{-jλt} (1 - e^{-λt})^{3-j}

= 1 - Σ_{j=0}^{1} C(3,j) e^{-jλt} (1 - e^{-λt})^{3-j} = 3e^{-2λt} - 2e^{-3λt}.

With λ = 3 · 10⁻⁵ and t = 1000,

R_S(t) = 0.9974. □

Example 3.2. Consider a k-out-of-n parallel system having a constant hazard rate λ for each
component. Find a minimum number N (≥ k) of components so that a minimum reliability R
will be reached for some fixed t. Let k = 2, t = 100 hours, and λ = 0.01 failures per hour.
(a) R = 0.75 (b) R = 0.5.

Solution. From (3.4),

R_S(n, t) = Σ_{j=k}^{n} C(n,j) e^{-jλt} (1 - e^{-λt})^{n-j} ≥ R.

So we need to find an N such that

N = min{n ≥ k : R_S(n, t) ≥ R}.

Let k = 2, t = 100, and λ = 0.01. Then, using (3.6), the above inequality reads

R_S(n, t) = 1 - Σ_{j=0}^{1} C(n,j) e^{-0.01j·100} (1 - e^{-0.01·100})^{n-j} ≥ R.


To calculate N we can use the R-program introduced in Chapter I, section 5, Example 5.3. Here
we identify p = e^{-1}. Thus from formula (3.6) we have

R_S(n, t) = 1 - Σ_{j=0}^{1} C(n,j) p^j (1 - p)^{n-j}.

The R-source code reads

p<-exp(-1)
N<-1;
R<-0;
while (R<0.75) {
N=N+1
R<-1-pbinom(1,N,p);
}
print(R);
print(N)

and we get

> print(R);
[1] 0.7953857
> print(N)
[1] 7

Let us verify this result "manually." We have

R_S(n, t) = 1 - Σ_{j=0}^{1} C(n,j) e^{-0.01j·100} (1 - e^{-0.01·100})^{n-j}

= 1 - [1 · (1 - e^{-1})ⁿ + n e^{-1} (1 - e^{-1})^{n-1}]

= 1 - (1 - e^{-1})^{n-1} [(1 - e^{-1}) + n e^{-1}]

= 1 - (1 - e^{-1})^{n-1} [1 + (n-1) e^{-1}] ≥ R

or

(1 - e^{-1})^{n-1} [1 + (n-1) e^{-1}] ≤ 1 - R

g(n) := 0.63212^{n-1} [1 + (n-1) · 0.36788] ≤ 1 - R.

So, we need to find

min{n ≥ 2 : g(n) ≤ 1 - R}.

First off, notice that with x := n - 1, r := ln(1 - R), a := 1 - e^{-1}, and since ln is a
monotone increasing function, we have

ln g(n) = x ln a + ln(1 + x(1 - a)) ≤ r.

Since a < 1, ln a < 0, and thus the dominating part of the above expression is negative, implying
that ln g(n) is monotone decreasing in n, and so is g(n). Consequently, we need to find the first
n at which g(n) ≤ 1 - R. So we have

n = 2 ⟹ g(2) = 0.8647

n = 3 ⟹ g(3) = 0.69

n = 4 ⟹ g(4) = 0.5313

n = 5 ⟹ g(5) = 0.3954

n = 6 ⟹ g(6) = 0.2866

n = 7 ⟹ g(7) = 0.2046

For R = 0.75 ⟹ 1 - R = 0.25 ⟹ g(6) > 0.25 but g(7) < 0.25. Thus

N = 7 is the minimum number of components needed to sustain
the reliability of at least 0.75 within 100 hours.

It agrees with the previously calculated value of N using R.

With R = 0.5 we have

p<-exp(-1)
N<-2;
R<-0;
while (R<0.5) {
N=N+1
R<-1-pbinom(1,N,p);
}
print(R);
print(N)

> print(R);


[1] 0.6053943
> print(N)
[1] 5

Example 3.3. Under the condition of Example 3.2, consider a k-out-of-n parallel system having
a constant hazard rate λ for each component. Find a minimum number N (≥ k) of components
so that a minimum reliability R will be reached for some fixed t. Let k = 25, t = 100 hours,
and λ = 0.005 failures per hour. R = 0.95.

Solution. From (3.6),

R_S(n, t) = Σ_{j=k}^{n} C(n,j) e^{-jλt} (1 - e^{-λt})^{n-j} ≥ R.

Here we identify p = e^{-100·0.005} = e^{-0.5}, and with this p we have

R_S(n, t) = 1 - Σ_{j=0}^{24} C(n,j) p^j (1 - p)^{n-j}.

So we need to find an N such that

N = min{n ≥ k : R_S(n, t) ≥ R}.

To calculate N we can use the R-program of Example 5.3, Chapter I, section 5. Thus we have

p<-exp(-0.5);
N<-24;
R<-0;
while (R<0.95) {
N=N+1
R<-1-pbinom(24,N,p);
}
print(R);
print(N)

and we get

> print(R);
[1] 0.9528804
> print(N)
[1] 50


PROBLEMS

3.1. Consider a parallel 2-out-of-4 system with components that exhibit linearly increasing
hazard rates with parameter λ. What is the reliability of the system? If λ = 2 · 10⁻³ failures per
hour, what is the reliability at time t = 100 hours?

3.2. Under the condition of Example 3.1, let k = 2, t = 100, and λ = 0.005 failures per hour.
Find a minimum number N (≥ 2) of components so that a minimum reliability R will be
reached with R = 0.75 and R = 0.5.

Solution. The inequality of Example 3.2 now reads

R_S(n, t) = 1 - Σ_{j=0}^{1} C(n,j) e^{-0.005j·100} (1 - e^{-0.005·100})^{n-j}

= 1 - [1 · (1 - e^{-0.5})ⁿ + n e^{-0.5} (1 - e^{-0.5})^{n-1}]

= 1 - (1 - e^{-0.5})^{n-1} [(1 - e^{-0.5}) + n e^{-0.5}]

= 1 - (1 - e^{-0.5})^{n-1} [1 + (n-1) e^{-0.5}] ≥ R

or

(1 - e^{-0.5})^{n-1} [1 + (n-1) e^{-0.5}] ≤ 1 - R

g(n) := 0.3935^{n-1} [1 + (n-1) · 0.6065] ≤ 1 - R.

So, we need to find

min{n ≥ 2 : g(n) ≤ 1 - R}.

As in the example, we need to find the first n at which g(n) ≤ 1 - R. So we have

n = 2 ⟹ g(2) = 0.632

n = 3 ⟹ g(3) = 0.3425

n = 4 ⟹ g(4) = 0.1718

For R = 0.75 ⟹ 1 - R = 0.25 ⟹ g(3) > 0.25 but g(4) < 0.25. Thus

4 is the minimum number of components needed to sustain the reliability of at least 0.75 within
100 hours.

With R = 0.5 ⟹ 1 - R = 0.5 ⟹ g(2) > 0.5 but g(3) < 0.5; thus 3 is the minimum number of
components needed to sustain the reliability of the system of at least 0.5 within 100 hours. □


3.3. Under the condition of Example 3.2, consider a k-out-of-n parallel system having a constant
hazard rate λ for each component. Find a minimum number N (≥ 45) of components so that a
minimum reliability R will be reached for t = 100 and λ = 0.003 failures per hour and
R = 0.88.

3.4. Consider a k-out-of-n parallel system having a linear hazard rate λt for each component.
Find a minimum number N (≥ k) of components so that a minimum reliability R will be
reached for some fixed t. Let k = 45, t = 100 hours, and λ = 0.0004 failures per hour.
R = 0.92.


CHAPTER V. ESTIMATION

1. Point Estimation. Maximum Likelihood Estimators.


Estimation of Reliability Parameters
In this section we will learn how to estimate a parameter of some distribution. A random
variable typically depends on one or more parameters. For instance, an exponential r.v. depends
on the parameter λ. A Gaussian r.v. depends upon two parameters, μ and σ².

We will denote by θ a generic parameter of a r.v. X, which can be one- or multidimensional. We
assume that θ takes on values in some parameter set Θ, which is a subset of ℝ or ℝᵏ. We will
consider some basic methods of estimation of θ for various classes of r.v.'s. In order to estimate
θ, we take a sample.

Let [X] be some population from which we draw a sample X_1, …, X_n. ([X] is the equivalence
class of all r.v.'s sharing the same distribution with X.) In most applications, we assume that
X_1, …, X_n are independent. A function δ of the sample T_n = (X_1, …, X_n), δ(T_n), known
as a statistic, is called an estimator of an unknown parameter θ, also in notation θ̂_n.
Correspondingly, δ(x_n) (where x_n = (x_1, …, x_n) are the observed values of the sample) is
called an estimate of θ and denoted by the lower case ϑ̂_n.

How do we choose an estimator or estimate of an unknown parameter θ? There are various
methods of choosing δ. For example,

δ(T_n) = X̄_n = (1/n)(X_1 + … + X_n),

known as the sample mean. This is a common estimator of a parameter θ that is the mean of an
underlying r.v., and such an estimator makes perfect sense. Another example of an estimator is

δ(T_n) = (1/n) Σ_{i=1}^{n} (X_i - X̄_n)²

(called a biased sample variance), which commonly estimates an unknown variance of a
Gaussian population.

One can propose a function δ of the sample to serve as an estimator of θ, but how reasonable can
it be? That is, how well does such an estimator estimate θ? There are several common
"credibility criteria" for estimators. One of them is known as the maximum likelihood estimator
(MLE for short). Roughly speaking, we take the joint density function f_n(x_1, …, x_n) of an
n-sample and find δ(x_1, …, x_n) that maximizes f_n. If such a function δ exists, then δ(x_n) is
referred to as a maximum likelihood estimate (m.l.e.) of the sample and the associated version
θ̂_n = δ(T_n) is then an MLE. So, such an estimator seems to be pretty credible.


There are some additional goodness criteria for an estimator, such as the property that
E[θ̂_n] = θ. Such an estimator is called unbiased. Another attractive (and seemingly most
valuable) property of an estimator is consistency. The latter means that θ̂_n → θ as n → ∞ in
some sense, and thus θ̂_n well approximates θ, even though we cannot often afford collecting a
large sample.

However, we need to differentiate the properties of an estimator θ̂_n from a method of obtaining
θ̂_n. The method we study in this section is called the maximum likelihood method. This
technique is common for many discrete and continuous r.v.'s alike, and as mentioned above it is
based on maximization of the sample pdf (known as the likelihood function). Another method of
obtaining δ or θ̂_n is to assume that the parameter θ we are interested in is a r.v. and we pretend
to know its pdf f (called the prior). The knowledge of such a prior pdf can be arbitrarily crude,
but then f can be calibrated and improved after taking values of a sample, to obtain a new pdf,
called the posterior, using Bayes principles. The posterior pdf can yield the conditional mean of
an associated r.v. (that owns the posterior pdf), called the Bayes estimator of θ. (It can be shown
that every Bayes estimator is unbiased.) Such a method is called Bayes analysis, which we will
study in forthcoming sections.

Definition 1.1. Let T_n = (X_1, …, X_n) be a random sample from the equivalence class [X] of
r.v.'s, with the joint density

f_n(x_n|θ) = f_n(x_1, …, x_n|θ) = f(x_1|θ) ⋯ f(x_n|θ),   (1.1)

called the likelihood function of the sample. This function is regarded as a function of θ, with
x_1, …, x_n being "fixed values." □

An estimate ϑ̂_n = δ(x_n) of θ is called a maximum likelihood estimate (m.l.e.) of θ if the
likelihood function θ ↦ f_n(x_n|θ) of the sample attains its maximum value at ϑ̂_n = δ(x_n).
The corresponding statistic value θ̂_n = δ(T_n) is called a maximum likelihood estimator (MLE).

In the situations below we will develop very common techniques for obtaining an MLE for )
from different distributions.

Example 1.1. Sampling from a Bernoulli population. Assume that we need to test a proportion
of defective items with no prior data. Such problems arise in reliability analysis, quality control,
exit polls, biotechnology, and pharmaceuticals, to name a few.

Draw a sample T_n = (X_1, …, X_n) from a Bernoulli population with an unknown θ ∈ (0,1).
[Note that a Bernoulli r.v. was previously meant to be with parameter p ∈ (0,1). Now we change
the character p to the character θ.] We can write the density f(x|θ) of each r.v. X_k as

f(x|θ) = { θˣ (1-θ)^{1-x} 1_{(0,1)}(θ),  x = 0, 1          (1.2)
         { 0,                            otherwise.

Thus, the likelihood function is

f_n(x_n|θ) = θᵏ (1-θ)^{n-k} 1_{(0,1)}(θ),   (1.3)

where

k := x_1 + … + x_n.   (1.4)

The value of ϑ̂_n that maximizes the likelihood function is the same value that maximizes the
log of f_n(x_n|θ), since the log function is strictly monotone. [It can be rigorously proved for a
composition f∘g of any monotone function f and any continuous function g.]

So, let

L(θ) := ln f_n(x_n|θ) = k ln θ + (n-k) ln(1-θ).   (1.7)

Then,

L′(θ) = n[x̄_n (1/θ) - (1-x̄_n)/(1-θ)]

= n (x̄_n(1-θ) - θ(1-x̄_n))/(θ(1-θ)) = n (x̄_n - θ)/(θ(1-θ)),   (1.8)

where x̄_n is the sample mean of x_1, …, x_n.

It is easily seen that L′(θ) changes its sign from positive to negative by passing through x̄_n,
which of course is a value from [0,1]. Thus, the empirical sample mean x̄_n is the m.l.e. of θ.
Notice that a more elaborate analysis, in which we distinguish the cases k = 0 and k = n
from all other values of k, yields the same result. Thus,

θ̂_n = X̄_n.   (1.9)

Example 1.2. To estimate the probability $\theta$ that a timber joist delivered to a construction site from a particular source is below specification, an engineer randomly selects 100 joists of timber and inspects them. It turns out that 5 of them are below specification. Therefore $5/100 = 0.05$ is the m.l.e. of $\theta$.
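A minimal sketch in R (the 0/1 sample below is hypothetical, built to match Example 1.2's counts) verifying numerically that the maximizer of the log-likelihood (1.7) is the sample mean, as (1.9) asserts:

x <- c(rep(1, 5), rep(0, 95))   # hypothetical 0/1 sample: 5 "defectives" out of 100
loglik <- function(theta) sum(x) * log(theta) + (length(x) - sum(x)) * log(1 - theta)
opt <- optimize(loglik, interval = c(0.001, 0.999), maximum = TRUE)
opt$maximum   # numerical maximizer, close to 0.05
mean(x)       # the closed-form m.l.e. from (1.9)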

Remarks 1.1. (Unbiasedness and Consistency of the Sample Mean.)

(i) As we know from (6.11), Chapter II, $E\bar{X}_n = EX_1$, i.e. the expectation of the sample mean statistic equals the population mean, which in the case of Example 1.1 is the unknown value of $\theta$ itself, meaning that $E\bar{X}_n = \theta$. Thus $\hat\theta_n = \bar{X}_n$ is an unbiased estimator of the parameter $\theta$.

(ii) There is another interesting fact about $\hat\theta_n$ in Example 1.1. According to (6.8) of Chapter III, $\mathrm{Var}\,\bar{X}_n = \frac{1}{n}\sigma^2$, where $\sigma^2$ is the variance of the population. If $\sigma^2 < \infty$, as it is in our case, since $\sigma^2 = \theta(1-\theta) \le 1$, the variance of $\bar{X}_n$ becomes smaller and smaller as $n$ gets large. Eventually, $\mathrm{Var}\,\bar{X}_n \to 0$ as $n \to \infty$.

Thus,

$$\mathrm{Var}\,\bar{X}_n = E\big(\bar{X}_n - \mu\big)^2 = \big\|\bar{X}_n - \mu\big\|_{L^2}^2 \to 0$$

means that $\bar{X}_n \to \mu$ in the so-called $L^2$-norm. Since $E\hat\theta_n = \theta$, i.e. $\hat\theta_n$ is unbiased, in our case $\mu = \theta$. The latter implies that $\hat\theta_n \to \theta$ under this norm. It is also known as mean square convergence.

On the other hand, from Chebyshev's inequality,

$$P\big\{\big|\bar{X}_n - \theta\big| > \varepsilon\big\} \le \mathrm{Var}\,\bar{X}_n \big/ \varepsilon^2 \to 0 \text{ as } n \to \infty \tag{1.10}$$

for every positive $\varepsilon$. Thus $\bar{X}_n$ converges to $\theta$, but in a different sense, called convergence in probability. In the statistics literature, the property

$$P\big\{\big|\hat\theta_n - \theta\big| > \varepsilon\big\} \to 0, \text{ as } n \to \infty, \tag{1.11}$$

of $\hat\theta_n$ is called consistency. To tell (1.11) from the stronger form of convergence of $\hat\theta_n$ in the $L^2$-norm, we will refer to the former as consistency in probability and to the latter as consistency in the square mean (or mean square consistency).

Therefore, the estimator $\bar{X}_n$ is consistent (both in probability and in the square mean).

333 In the general case, an estimator )^8 of a parameter ) is called consistent in the square
mean if

)^8  ) Ä ! as 8 Ä ∞. (1.12)
P#

The convergence is in the sense of the P# -norm which is the most common norm in probability
theory. In other words, we say in this case that

)^8 Ä ) in the P# -norm.

Now if )^8 is also an unbiased estimator, then )^8  ) # can be written as )^8  I )^8  #
# #

P P
which is the variance of )^8 . Therefore, if )^8 is an unbiased estimator of )ß it is consistent if and
only if Var )^ Ä ! as 8 Ä ∞. The latter is often easier to verify than other forms of conver-
8
gence.

MTH 2401, LECTURE NOTES, Page 138, Version 54


CHAPTER V. ESTIMATION

Thus if )^8 is unbiased and consistent, from Chebyshev's inequality it also follows that )^8
converges to ) in probability, which is yet another form of consistency of )^8 . 
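A minimal simulation sketch of mean square consistency for the Bernoulli MLE of (1.9), assuming an arbitrary seed and true $\theta = 0.3$: the empirical variance of $\bar{X}_n$ tracks $\theta(1-\theta)/n$ and shrinks toward 0.

set.seed(1)    # assumption: any seed works
theta <- 0.3
for (n in c(10, 100, 1000)) {
  xbar <- replicate(5000, mean(rbinom(n, size = 1, prob = theta)))
  cat(n, var(xbar), theta * (1 - theta) / n, "\n")   # the two columns agree
}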

Example 1.3. Sampling from a Gaussian population. Here both $\mu$ and $\sigma^2$ are unknown.

Let $\theta = (\mu, \sigma^2) \in \Theta = \mathbb{R} \times (0,\infty)$. The likelihood function of the sample is

$$f_n(\mathbf{x} \mid \mu, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\Big\{-\frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2\Big\}. \tag{1.13}$$

To maximize $f_n$ we maximize $\ln f_n$ (in notation, $L(\mu,\sigma^2)$), since the logarithm is a monotone increasing function. Now,

$$L(\mu,\sigma^2) = -\frac{n}{2}\ln 2\pi - \frac{n}{2}\ln\sigma^2 - \frac{1}{2\sigma^2}\sum_{i=1}^n (x_i - \mu)^2. \tag{1.14}$$

We calculate

$$\frac{\partial}{\partial\mu}L = L_\mu \quad\text{and}\quad \frac{\partial}{\partial\sigma^2}L = L_{\sigma^2}, \text{ each set equal to } 0. \tag{1.15}$$

From (1.14) we find that

$$L_\mu = \frac{1}{\sigma^2}\sum_{i=1}^n (x_i - \mu) = \frac{1}{\sigma^2}\,n(\bar{x}_n - \mu) = 0 \iff \mu = \bar{x}_n, \tag{1.16}$$

the sample mean.

We denote $\hat{m}_n := \bar{x}_n$. From the second equation of (1.15), taking $\bar{x}_n$ for $\mu$, we have

$$L_{\sigma^2} = -\frac{n}{2}\,\frac{1}{\sigma^2} + \frac{1}{2(\sigma^2)^2}\sum_{i=1}^n (x_i - \bar{x}_n)^2 = 0,$$

which yields

$$\hat{s}_n^2 = \frac{1}{n}\sum_{i=1}^n (x_i - \bar{x}_n)^2, \tag{1.17}$$

called the sample variance estimate of $\sigma^2$. To verify that the critical point

$$\hat{\vartheta}_n = (\hat{m}_n, \hat{s}_n^2) \tag{1.18}$$

is a relative maximum point of $f_n$, we need to see that the function

$$D(\theta) = \big(L_{\mu\mu}\,L_{\sigma^2\sigma^2} - L_{\mu\sigma^2}^2\big)(\theta) \tag{1.19}$$

is positive at $\theta = \hat{\vartheta}_n$, with $L_{\mu\mu}(\hat{\vartheta}_n) < 0$.

We get

$$L_{\mu\mu}(\hat{\vartheta}_n) = -\frac{n}{\hat{s}_n^2} < 0, \tag{1.20}$$

then

$$L_{\sigma^2\sigma^2}(\hat{\vartheta}_n) = -\frac{n}{2\,(\hat{s}_n^2)^2} < 0, \tag{1.21}$$

and finally,

$$L_{\mu\sigma^2}(\hat{\vartheta}_n) = \frac{1}{(\hat{s}_n^2)^2}\Big(\sum_{i=1}^n x_i - n\bar{x}_n\Big) = 0. \tag{1.22}$$

Hence $D(\hat{\vartheta}_n) > 0$ while $L_{\mu\mu}(\hat{\vartheta}_n) < 0$, so the critical point $\hat{\vartheta}_n$ is indeed a maximum point of $f_n$.
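A sketch verifying the Gaussian m.l.e.'s (1.16)-(1.17) by direct numerical maximization of the log-likelihood (1.14) in R, on simulated data (the sample, seed, and starting point are assumptions of the sketch):

set.seed(2)
x <- rnorm(200, mean = 10, sd = 3)
negL <- function(par) {                  # par = (mu, sigma^2); negative of (1.14)
  mu <- par[1]; s2 <- par[2]
  0.5 * length(x) * log(2 * pi * s2) + sum((x - mu)^2) / (2 * s2)
}
opt <- optim(c(0, 1), negL, method = "L-BFGS-B", lower = c(-Inf, 1e-6))
opt$par                                  # numerical maximizer
c(mean(x), mean((x - mean(x))^2))        # closed-form m.l.e.'s (1.16)-(1.17)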

Remark 1.2. As per the last situation, the MLE of the unknown mean $\mu$ of a normal population is the sample mean $\hat\mu_n = \bar{X}_n$. In addition, $E\hat\mu_n = \mu$ (the latter is true in general). The estimator $\hat\mu_n$ of $\mu$ was called (Remark 1.1) unbiased. Now, $\hat\mu_n = \bar{X}_n$ is a function, say $\delta$, of the sample $X_1,\dots,X_n$, i.e.

$$\hat\mu_n = \delta(X_1,\dots,X_n). \tag{1.23}$$

We previously said that an estimator $\delta(X_1,\dots,X_n)$ of an unknown parameter $\theta$ is unbiased if

$$\theta = E\,\delta(X_1,\dots,X_n). \tag{1.24}$$

Suppose $X_1,\dots,X_n$ is a random sample from a population with unknown variance $\sigma^2$ (not necessarily Gaussian). Due to Example 1.3, the statistic

$$\hat\sigma_n^2 = \frac{1}{n}\sum_{i=1}^n \big(X_i - \bar{X}_n\big)^2 \tag{1.25}$$

is the sample variance estimator of $\sigma^2$. Expanding

$$\sum_{i=1}^n \big(X_i - \bar{X}_n\big)^2 = \sum_{i=1}^n \big((X_i - \mu) + (\mu - \bar{X}_n)\big)^2$$
$$= \sum_{i=1}^n (X_i - \mu)^2 + \sum_{i=1}^n \big(\bar{X}_n - \mu\big)^2 - 2\sum_{i=1}^n (X_i - \mu)\big(\bar{X}_n - \mu\big)$$
$$= \sum_{i=1}^n (X_i - \mu)^2 + n\big(\bar{X}_n - \mu\big)^2 - 2n\big(\bar{X}_n - \mu\big)^2$$
$$= \sum_{i=1}^n (X_i - \mu)^2 - n\big(\bar{X}_n - \mu\big)^2, \tag{1.26}$$

dividing by $n$, and then taking expectations, we have

$$E\hat\sigma_n^2 = \frac{1}{n}\,E\sum_{i=1}^n (X_i - \mu)^2 - E\big(\bar{X}_n - \mu\big)^2. \tag{1.27}$$

The first term on the right of (1.27) is $\sigma^2$. The second term (since $\mu$ is the mean of $\bar{X}_n$) is the variance of $\bar{X}_n$, and it equals $\sigma^2/n$. So we have

$$E\hat\sigma_n^2 = \sigma^2 - \sigma^2/n = \frac{n-1}{n}\,\sigma^2 \ne \sigma^2. \tag{1.28}$$
Á 5# . (1.28)

Consequently, $\hat\sigma_n^2$ is not an unbiased estimator of $\sigma^2$. A simple correction to $\hat\sigma_n^2$ is to multiply $\hat\sigma_n^2$ by the reciprocal of the factor $\frac{n-1}{n}$ appearing in (1.28). Then the estimator

$$K_n := \frac{n}{n-1}\,\hat\sigma_n^2 = \frac{1}{n-1}\sum_{i=1}^n \big(X_i - \bar{X}_n\big)^2 \tag{1.29}$$

is unbiased. Note that no assumption about the nature of the population has been made.
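A minimal sketch contrasting the biased estimator (1.25) with the corrected estimator (1.29) on simulated Gaussian samples (the sample size, seed, and true variance below are assumptions); note that R's var() already uses the $1/(n-1)$ form:

set.seed(3)
est <- replicate(10000, {
  x <- rnorm(8, mean = 0, sd = 2)        # true sigma^2 = 4
  c(mle = mean((x - mean(x))^2), corrected = var(x))
})
rowMeans(est)   # mle averages near (7/8)*4 = 3.5; corrected averages near 4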

Example 1.4. Sampling from a Poisson population. We lay out the MLE method through the following common situation. The number of connections to a wrong phone number is often modeled by a Poisson distribution. Suppose we need to estimate the parameter $\lambda$ of that distribution (the mean number of wrong connections) by observing a sample $x_1,\dots,x_n$ of wrong connections on $n$ different days. Assuming that $k = x_1 + \dots + x_n > 0$, find the m.l.e. and MLE of $\lambda$.

Solution. If $f(x \mid \lambda) = e^{-\lambda}\frac{\lambda^x}{x!}$, then the likelihood function is

$$f_n(\mathbf{x}_n \mid \lambda) = e^{-n\lambda}\lambda^k \Big/ \prod_{j=1}^n x_j! \qquad (k = x_1 + \dots + x_n),$$

$$L(\lambda) = \ln f_n(\mathbf{x}_n \mid \lambda) = -n\lambda + k\ln\lambda - \sum_{j=1}^n \ln x_j!,$$

$$\frac{\partial}{\partial\lambda}L(\lambda) = -n + k/\lambda := 0 \iff \lambda = k/n = \bar{x}_n. \tag{1.30}$$

Now, $k/n$ is a critical point of $L(\lambda)$. But $L'(\lambda) = \frac{n}{\lambda}(\bar{x}_n - \lambda)$, showing that $L'$ is positive for $\lambda < \bar{x}_n$, equal to zero when $\lambda = \bar{x}_n$, and negative when $\lambda > \bar{x}_n$. This proves that $\bar{x}_n$ is a local, and hence the global, maximum point of $L$, and therefore of $f_n(\mathbf{x}_n \mid \lambda)$.

Thus $\hat\ell_n = \bar{x}_n$ is the m.l.e. of $\lambda$, and $\hat\Lambda_n = \bar{X}_n$ is the MLE of $\lambda$.


Example 1.5. Atmospheric dust particles around construction sites cause a serious environmental problem. It is assumed that the number of particles in a unit volume is Poisson, whose parameter $\lambda$ needs to be estimated. To do so, small randomly selected portions of air were observed by focusing a powerful microscope on the particles and making counts. Suppose 50 such samples of air were collected, giving a total of 2500 particles. From (1.30), we have $2500/50 = 50$ as the m.l.e. of the unknown $\lambda$. In conclusion, the number of particles around the construction site is modeled as Poisson with parameter $\lambda = 50$ per unit volume of air.
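A quick simulated check of (1.30) in R (the seed and counts are assumptions of the sketch): the Poisson m.l.e. is just the sample mean of the counts.

set.seed(4)
x <- rpois(50, lambda = 50)   # stand-in for the 50 unit-volume counts
sum(x) / length(x)            # k/n, near the true lambda = 50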

Example 1.6. Sampling from an Exponential population. In traffic engineering one is concerned with the length of time between vehicles passing a given point. If the intervals are too short, there will be halts and interruptions, which the engineers attempt to minimize by imposing longer waiting times at the associated lights. The time intervals between vehicles passing a point of observation are known to be exponentially distributed with an unknown parameter $\lambda$.

In the context of the above situation, let $X_1,\dots,X_n \in [X]$, where $X$ is an exponential r.v. with parameter $\lambda$ unknown. Find the m.l.e. and MLE of $\lambda$. Is this MLE unbiased?

Solution. $f(x \mid \lambda) = \lambda e^{-\lambda x}$ for $x \ge 0$, and $f(x \mid \lambda) = 0$ if $x < 0$. (The value of $f(x \mid \lambda)$ for $x < 0$ will be ignored for convenience.) Thus the likelihood function of the sample is

$$f_n(x_1,\dots,x_n \mid \lambda) = \lambda^n e^{-\lambda k}, \quad k = x_1 + \dots + x_n. \tag{1.31}$$

$$\Rightarrow\quad L(\lambda) = \ln f_n(\mathbf{x}_n \mid \lambda) = n\ln\lambda - \lambda k. \tag{1.32}$$

Then $L'(\lambda) = \frac{n}{\lambda} - k = n\big(\frac{1}{\lambda} - \bar{x}_n\big)$. From the alternative representation

$$L'(\lambda) = \frac{n\bar{x}_n}{\lambda}\Big(\frac{1}{\bar{x}_n} - \lambda\Big) \tag{1.33}$$

it follows that

$$L'(\lambda) \text{ is } \begin{cases} > 0, & \lambda < \dfrac{1}{\bar{x}_n},\\[4pt] = 0, & \lambda = \dfrac{1}{\bar{x}_n},\\[4pt] < 0, & \lambda > \dfrac{1}{\bar{x}_n}, \end{cases} \tag{1.34}$$

proving that $\lambda = \frac{1}{\bar{x}_n}$ is the global maximum of the likelihood function on the parameter set $\Theta = (0,\infty)$. Hence $\frac{1}{\bar{x}_n}$ is the m.l.e. of $\lambda$ for the exponential distribution.

Is $\frac{1}{\bar{X}_n}$ an unbiased estimator of $\lambda$? Formally, it is not, because $E\big(\frac{1}{\bar{X}_n}\big) \ne \lambda$. However, since $\frac{1}{\hat\lambda} = \bar{X}_n$ and $\frac{1}{\lambda}$ is the mean of the exponential r.v., $E\bar{X}_n = \frac{1}{\lambda}$, and thus $\bar{X}_n$ is an unbiased estimator of $\frac{1}{\lambda}$. Furthermore, $\bar{X}_n$ is consistent, as any sample mean is; that is, $\bar{X}_n \to \frac{1}{\lambda}$ as $n \to \infty$.


Example 1.7. The traffic engineering problem revisited. Suppose 30 intervals between passings of vehicular traffic are observed, giving a mean time interval of 0.55 minute. In other words, $\bar{x}_{30} = 0.55$ is the observed sample mean, and thus $\hat\lambda = 1/0.55 = 1.82$.

Note that the exponential distribution has wide applications in many areas of science and reliability. The times between earthquakes fit an exponential distribution, as do the times to failure of various components.
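A minimal sketch of Example 1.7 in R: the exponential m.l.e. is the reciprocal of the sample mean, here also checked on simulated intervals (the simulated sample and seed are assumptions).

1 / 0.55                      # lambda-hat from the observed mean, 1.82
set.seed(5)
x <- rexp(30, rate = 1.82)    # hypothetical 30 observed intervals
1 / mean(x)                   # recovers a value near 1.82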

Example 1.8. Sampling from a Rayleigh population. Suppose we need to estimate the parameter $\lambda$ in the Rayleigh density given by

$$f(x) = \lambda x\,e^{-\frac{1}{2}\lambda x^2}. \tag{1.35}$$

So the likelihood function of the sample is

$$f(x_1,\dots,x_n \mid \lambda) = \lambda^n \Big(\prod_{i=1}^n x_i\Big)\,e^{-\frac{1}{2}\lambda(x_1^2 + \dots + x_n^2)}$$

and its logarithm is

$$L(\lambda) = n\ln\lambda + \ln\prod_{i=1}^n x_i - \frac{1}{2}\lambda\sum_{i=1}^n x_i^2. \tag{1.36}$$

Its derivative is

$$L'(\lambda) = \frac{n}{\lambda} - \frac{1}{2}\sum_{i=1}^n x_i^2. \tag{1.37}$$

From the alternative representation

$$L'(\lambda) = \frac{1}{\lambda}\Big(n - \frac{\lambda}{2}\sum_{i=1}^n x_i^2\Big) \tag{1.38}$$

it follows that

$$L'(\lambda) \text{ is } \begin{cases} > 0, & \lambda < \dfrac{2n}{\sum_{i=1}^n x_i^2},\\[4pt] = 0, & \lambda = \dfrac{2n}{\sum_{i=1}^n x_i^2},\\[4pt] < 0, & \lambda > \dfrac{2n}{\sum_{i=1}^n x_i^2}, \end{cases} \tag{1.39}$$

proving that $\lambda = \frac{2n}{\sum_{i=1}^n x_i^2}$ is the global maximum of the likelihood function on the parameter set $\Theta = (0,\infty)$. Hence $\frac{2n}{\sum_{i=1}^n x_i^2}$ is the m.l.e. $\hat\lambda_n$ of $\lambda$ for the Rayleigh distribution.

In summary,

$$\hat\lambda_n = \frac{2n}{\sum_{i=1}^n x_i^2} \tag{1.40}$$

is the m.l.e. of the parameter $\lambda$ in the Rayleigh population. Consequently,

$$\hat\Lambda_n = \frac{2n}{\sum_{i=1}^n X_i^2} \tag{1.41}$$

is the MLE of $\lambda$. As in the case of the exponential distribution, we will work with its reciprocal

$$\hat\Lambda_n^{-1} = \frac{1}{2n}\sum_{i=1}^n X_i^2, \tag{1.42}$$

hoping to prove that it is an unbiased and consistent estimator of the parameter $\frac{1}{\lambda}$.

Proposition 1.1. Let $X_1,\dots,X_n \in \mathrm{Ray}(\lambda)$. Then $\hat\Lambda_n^{-1}$ is an unbiased and consistent estimator of $\frac{1}{\lambda}$.

Proof. Using formula (2.26), Chapter IV, we get

$$E\hat\Lambda_n^{-1} = \frac{1}{2n}\sum_{i=1}^n \frac{2}{\lambda} = \frac{1}{\lambda}, \tag{1.43}$$

which shows that $\hat\Lambda_n^{-1}$ is an unbiased estimator of $\frac{1}{\lambda}$.

Regarding consistency, in light of Remark 1.1 (iii), since $\hat\Lambda_n^{-1}$ is an unbiased estimator of $\frac{1}{\lambda}$, we need only show that $\mathrm{Var}\,\hat\Lambda_n^{-1} \to 0$ as $n \to \infty$.

Since $X_1^2,\dots,X_n^2$ are independent r.v.'s, by Bienaymé's equation (6.4), Chapter IV, and then equation (2.28), Chapter IV,

$$\mathrm{Var}\,\hat\Lambda_n^{-1} = \frac{1}{4n^2}\sum_{i=1}^n \mathrm{Var}\,X_i^2 = \frac{1}{4n^2}\sum_{i=1}^n \frac{4}{\lambda^2} = \frac{1}{\lambda^2}\,\frac{1}{n} \to 0. \;\blacksquare$$

Even though $\hat\Lambda_n^{-1}$ is an unbiased and consistent estimator of $\frac{1}{\lambda}$, we can still use $\hat\Lambda_n$ as an MLE of $\lambda$, although we cannot claim that it is unbiased or consistent for $\lambda$. Furthermore, we can apply an observed value $\hat\lambda_n$ of $\lambda$ to the mean and variance of $X$, calling them "estimates" of the mean and variance, respectively. Recall that according to formulas (2.23-2.24), Chapter IV,

$$\mu := EX = \sqrt{\frac{\pi}{2\lambda}} \tag{1.44}$$

$$\sigma^2 := \mathrm{Var}\,X = \frac{2}{\lambda}\Big(1 - \frac{\pi}{4}\Big). \tag{1.45}$$

So, the corresponding estimates will be

$$\hat\mu_n = \sqrt{\frac{\pi}{2\hat\lambda_n}} \tag{1.46}$$

$$\hat\sigma_n^2 = \frac{2}{\hat\lambda_n}\Big(1 - \frac{\pi}{4}\Big). \tag{1.47}$$

Example 1.9. A manufacturer of an automotive speed sensor subjects 10 sensors to a reliability test that simulates the environmental conditions (such as temperature and speed) at which the sensors normally operate. A sensor is regarded as failed when its output falls outside a 5% tolerance. The miles accumulated before the failures of the sensors are (in thousands)

110  130  150  155  159  163  166  168  169  170

Assuming that the miles to failure follow a Rayleigh distribution, determine the m.l.e. of the parameter of the distribution and the corresponding estimate of the mean.

Solution. The value of the parameter used is the m.l.e. computed from the above sample of 10 mileage values. Applying formula (1.40), $\hat\lambda_n = \frac{2n}{\sum_{i=1}^n x_i^2}$, we have

$$\hat\lambda_{10} = \frac{2 \cdot 10}{240616 \cdot 10^6} = 8.31199 \cdot 10^{-11}.$$

Using (1.46) we find

$$\hat\mu_{10} = 137{,}470 \text{ miles.}$$


PROBLEMS

1.1. Let $X_1,\dots,X_n \in [X]$, where $X$ is a r.v. with pdf $\theta x^{\theta-1}\mathbf{1}_{(0,1)}(x)$ with respect to the unknown parameter $\theta > 0$. Find the m.l.e. and MLE of $\theta$.

Solution. The likelihood function of the sample is

$$f_n(x_1,\dots,x_n \mid \theta) = \theta^n\Big(\prod_{i=1}^n x_i\Big)^{\theta-1}\mathbf{1}_{(0,1)^n}(x_1,\dots,x_n).$$

Denote $L(\theta) = \ln f_n(x_1,\dots,x_n \mid \theta)$. Then

$$L(\theta) = n\ln\theta + (\theta - 1)\sum_{i=1}^n \ln x_i$$

(we drop $\mathbf{1}_{(0,1)^n}(x_1,\dots,x_n)$ for convenience). Thus

$$\frac{\partial}{\partial\theta}L(\theta) = \frac{n}{\theta} + \sum_{i=1}^n \ln x_i, \text{ and we set it equal to } 0,$$

finding $\theta = -n\big/\sum_{i=1}^n \ln x_i$ as a critical point of $L$. (Notice that this $\theta$ is positive.) It is readily seen that $-n\big/\sum_{i=1}^n \ln x_i$ is a maximum point of $L$ by checking the signs of $L'$ on the left (positive) and right (negative) of $-n\big/\sum_{i=1}^n \ln x_i$. Therefore

$$\hat\vartheta_n = -n\Big/\sum_{i=1}^n \ln x_i$$

and

$$\hat\theta_n = -n\Big/\sum_{i=1}^n \ln X_i.$$

1.2. Suppose we need to estimate the time to failure of a water plant that has an exponential distribution with an unknown parameter $\lambda$. The past 10 failures of the plant took place after $2, 10, 12, 6, 7, 9, 14, 8, 3, 4$ days. Find the maximum likelihood estimate of $\lambda$.

Solution. From the given data $x_1,\dots,x_{10}$ we find that $\bar{x}_{10} = 75/10 = 7.5$. From Example 1.6 we have $\hat\lambda = 1/7.5 = 0.133$.

1.3. Suppose some purchases of a certain cell phone brand are made by men and some by women, and their proportions are unknown except that the proportion $p$ of purchases made by males satisfies $\frac{1}{2} \le p \le \frac{2}{3}$. In a random sample of 70 phones of a particular brand it was found that 58 were purchased by women and 12 by men. Find the m.l.e. of $p$.

1.4. Let $X_1,\dots,X_n \in \mathrm{Exp}(\lambda)$. Show that $\bar{X}_n$ is a consistent estimator of the parameter $\frac{1}{\lambda}$.


2. The Central Limit Theorem


The Central Limit Theorem's Basics. Let $X_1, X_2, \dots$ be a sequence of iid r.v.'s, each from $N(\mu,\sigma^2)$, and let $\bar{X}_n$ be the sample mean. Then $\bar{X}_n \in N\big(\mu, \frac{\sigma^2}{n}\big)$, and $Y_n$ defined as

$$Y_n := \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$$

is an element of $N(0,1)$.

If $X_1, X_2, \dots$ is a sequence of iid r.v.'s with common mean $\mu$ and variance $\sigma^2$, but not necessarily normal, then the r.v. $Y_n$ need not be standard normal. However, $Y_n \to Z \in N(0,1)$ in distribution (i.e. $P\{Y_n \le x\} \to \Phi(x)$ as $n \to \infty$), due to a key result in probability known as the Central Limit Theorem.

Theorem 2.1 (The Central Limit Theorem (CLT)). Let $X_1, X_2, \dots$ be a sequence of iid r.v.'s, each with mean $\mu$ and variance $\sigma^2$. Then the standardized r.v.

$$Y_n = \frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \tag{2.1}$$

or

$$Y_n = \frac{X_1 + \dots + X_n - n\mu}{\sigma\sqrt{n}} \tag{2.2}$$

converges to the standard Gaussian r.v. $Z \in N(0,1)$ in distribution.

Example 2.1. Let $X$ be a binomial r.v. with parameters $(n,p)$. Then, as we recall, $X = X_1 + \dots + X_n$ is the sum of $n$ independent Bernoulli r.v.'s. By the CLT, $X$ can be approximated by a Gaussian r.v. if $n$ is large enough. [Recall that the Poisson approximation to the binomial required $n$ to be large and $p$ small. The normal approximation does not restrict $p$.] Using (2.2) with $\sigma^2 = pq$, we have that the r.v.

$$\frac{X_1 + \dots + X_n - n\mu}{\sigma\sqrt{n}} = \frac{X - np}{\sqrt{npq}} \to Z \in [N(0,1)] \text{ in distribution, as } n \to \infty. \tag{2.3}$$

Example 2.2. Suppose a fair coin is tossed 900 times. Find the probability of obtaining more than 495 heads. In this case the r.v. $X$ giving the number of heads in 900 trials is binomial with parameters $(900, \frac{1}{2})$. Therefore $np = 450$, $npq = 225$, and

$$P\{X > 495\} = P\Big\{\frac{X - 450}{15} > \frac{495 - 450}{15}\Big\} = P\{Z > 3\} = 1 - \Phi(3) = 0.0013. \tag{2.4}$$


Remark 2.1. In the general case, if $X_1, X_2, \dots$ are iid r.v.'s with common parameters $(\mu, \sigma^2)$ and $X = X_1 + \dots + X_n$ is their sum (like the binomial r.v. in Example 2.1), then we have

$$\mu_X = E(X_1 + \dots + X_n) = n\mu$$

and

$$\sigma_X^2 = \mathrm{Var}\,X = n\sigma^2, \quad\text{or}\quad \sigma_X = \sigma\sqrt{n},$$

implying that (2.2) can be rewritten as

$$\frac{X - \mu_X}{\sigma_X} \approx Z \in [N(0,1)] \text{ as } n \to \infty. \tag{2.5}$$

When using the normal approximation we assume that $n$ is large and replace a r.v. such as a binomial, a sum, or a sample mean with a Gaussian after a corresponding standardization. In some cases we even calculate the value of $n$ needed to attain a given accuracy of estimating an unknown mean by the sample mean, assuming that the sample mean is already normal. But how accurate is the normal approximation itself? The following theorem partially addresses this question.

Theorem 2.2 (Berry-Esseen). In the context of the CLT, the following estimate holds:

$$\sup_{x \in \mathbb{R}}\Big|P\Big\{\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}} \le x\Big\} - \Phi(x)\Big| \le \frac{C}{\sqrt{n}}\,\frac{E|X_1 - \mu|^3}{\sigma^3}, \tag{2.6}$$

where

$$C \in \big[1/\sqrt{2\pi},\; 0.8\big). \tag{2.7}$$

The Berry-Esseen theorem thus gives the speed of convergence of a (standardized) sum of r.v.'s to the Gaussian r.v.


PROBLEMS

2.1. The number of students enrolled in calculus classes at FIT is a Poisson r.v. with parameter $\lambda = 100$. Use the CLT approximation to find the probability that the new enrollment is going to be 120 or more students.

Solution. The exact solution, without the approximation, is

$$P\{X \ge 120\} = 1 - e^{-100}\sum_{k=0}^{119}\frac{100^k}{k!}, \tag{2.6}$$

which is numerically cumbersome. Recalling that a Poisson distribution is "infinitely divisible," we can represent $X$ as a sum of iid Poisson r.v.'s, each with parameter $\lambda/n$. For instance, we can use $n = 100$. Hence we can apply the CLT directly to $X$, treating it as such a sum, as in Remark 2.1 or Example 2.1. Recalling that the mean of $X$ is 100, and so is the variance, we have

$$P\Big\{\frac{X - 100}{10} \ge \frac{120 - 100}{10}\Big\} \approx 1 - \Phi(2) = 0.0228. \tag{2.7}$$

2.2 If $X$ is a gamma r.v. with parameters $(n,1)$ (i.e. an $n$-Erlang r.v. with $\lambda = 1$), approximately how large need $n$ be so that $P\big\{\big|\frac{X}{n} - 1\big| > 0.01\big\} < 0.01$?

Hint: As an $n$-Erlang r.v., $X$ is the sum of $n$ independent exponential r.v.'s, each with parameter $\lambda = 1$. Now we need to evaluate

$$P\Big\{\Big|\frac{X_1 + \dots + X_n}{n} - \frac{1}{\lambda}\Big| > \varepsilon\Big\} < \alpha$$

or, in its equivalent form,

$$P\Big\{\Big|\bar{X}_n - \frac{1}{\lambda}\Big| \le \varepsilon\Big\} \ge 1 - \alpha.$$

By the CLT we have

$$n \ge \frac{\sigma^2}{\varepsilon^2}\Big[\Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big)\Big]^2,$$

with $\sigma = \frac{1}{\lambda} = 1$. Now, identifying $\varepsilon = \alpha = 0.01$ and using $\Phi^{-1}(0.995) \approx 2.58$, we have

$$\min n = 258^2 = 66{,}564.$$

2.3. Civil engineers believe that $W$, the amount of weight (in units of 1000 pounds) that a certain span of a bridge can withstand without structural damage resulting, is normally distributed with mean 400 and standard deviation 40. Suppose that the weight of a car is a r.v. with mean 3 and standard deviation 0.3. Approximately how many cars would have to be on the bridge span for the probability of structural damage to exceed 0.1?

Hint: Let $W_n$ be the total weight of $n$ cars and let $W$ be the total weight the bridge can withstand. We need to estimate the minimal value of $n$ such that $P\{W_n - W \ge 0\} \ge 0.1$. Since $W_n$ and $W$ are independent normal r.v.'s, $W_n - W$ is also normal (why?).

Under the given conditions,

$$W_n \in N(3n,\; 0.09\,n) \quad\text{and}\quad W \in N(400,\; 40^2).$$

Thus $W_n - W \in N(3n - 400,\; 0.09\,n + 1600)$ and we have

$$P\{W_n - W \ge 0\} = P\Big\{\frac{W_n - W - (3n - 400)}{\sqrt{0.09n + 1600}} \ge -\frac{3n - 400}{\sqrt{0.09n + 1600}}\Big\} = P\Big\{Z \ge \frac{400 - 3n}{\sqrt{0.09n + 1600}}\Big\} \ge 0.1$$

or

$$\Phi\Big(\frac{400 - 3n}{\sqrt{0.09n + 1600}}\Big) \le 0.9,$$

which yields

$$\frac{400 - 3n}{\sqrt{0.09n + 1600}} \le \Phi^{-1}(0.9) = 1.28.$$

Solving the latter inequality gives

$$n \ge 117.$$


3. Confidence Intervals

Preliminaries. Suppose $Z \in N(0,1)$, and let $a > 0$ and $\alpha \in (0,1)$. Consider the equation

$$1 - \alpha = P\{|Z| < a\} = P\{-a < Z < a\} = \Phi(a) - \Phi(-a) = \Phi(a) - [1 - \Phi(a)] = 2\Phi(a) - 1.$$

Thus we can find $a$ from the latter equation as follows:

$$a = \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big) = z_{\alpha/2}. \tag{3.1}$$

We denoted $a$ by $z_{\alpha/2}$, thus having the identity

$$P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha. \tag{3.2}$$

Here $z_{\alpha/2} = \Phi^{-1}(1 - \frac{\alpha}{2})$ is the reference point of the $\alpha/2$ tail, i.e., the tail area of the standard Gaussian pdf from the point $z_{\alpha/2}$ all the way to the right. [Figure: the area to the left of $z_{\alpha/2}$ is $P\{Z \le z_{\alpha/2}\} = 1 - \frac{\alpha}{2}$; the area to the right is $P\{Z > z_{\alpha/2}\} = \frac{\alpha}{2}$.]

It follows from the equation $P\{-z_{\alpha/2} < Z < z_{\alpha/2}\} = 1 - \alpha$ that $1 - \alpha$ is the area enclosed between the two tail areas, each valued $\alpha/2$, located beyond the two reference points $-z_{\alpha/2}$ and $z_{\alpha/2}$.

Estimator Interval. Now suppose we want to estimate an unknown mean $\mu$ of a Gaussian population $N(\mu,\sigma^2)$ by the sample mean $\bar{X}_n$, using $\bar{X}_n$ as an estimator of $\mu$ (in section 1 it was proved to be the MLE of $\mu$ in the Gaussian case) within a prescribed measure of accuracy $\varepsilon$. This can be formalized as

$$P\big\{\big|\bar{X}_n - \mu\big| < \varepsilon\big\} = 1 - \alpha, \tag{3.3}$$

where $\alpha$ is referred to as the significance level (often taken to be $0.05$, $0.10$, or $0.025$). Equation (3.3) tells us that $\bar{X}_n$ deviates from $\mu$ by less than $\varepsilon$ with probability $1 - \alpha$.

If we rewrite (3.3) as

$$P\big\{\bar{X}_n - \varepsilon < \mu < \bar{X}_n + \varepsilon\big\} = 1 - \alpha, \tag{3.4}$$

we see that the expression $|\bar{X}_n - \mu| < \varepsilon$ forms the estimator interval $(\bar{X}_n - \varepsilon,\; \bar{X}_n + \varepsilon)$ of an unknown mean $\mu$.

The Explicit Form of the Estimator Interval of $\mu$ with $\sigma^2$ known. We can also rewrite $P\{|\bar{X}_n - \mu| < \varepsilon\}$ as

$$P\big\{\big|\bar{X}_n - \mu\big| < \varepsilon\big\} = P\Big\{\Big|\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}\Big| < \frac{\varepsilon\sqrt{n}}{\sigma}\Big\}. \tag{3.5}$$

According to section 5, Chapter III, the statistic $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ of the sample $X_1,\dots,X_n$ is Gaussian (in notation, $Z$). Indeed, $\bar{X}_n$ is a linear combination of $X_1,\dots,X_n$ and as such is Gaussian, with parameters $(\mu, \sigma^2/n)$. Thus the r.v. $\frac{\bar{X}_n - \mu}{\sigma/\sqrt{n}}$ is the standardized version of $\bar{X}_n$.

Consequently, from (3.4-3.5) and (3.2) we have

$$1 - \alpha = P\big\{\big|\bar{X}_n - \mu\big| < \varepsilon\big\} = P\Big\{|Z| < \frac{\varepsilon\sqrt{n}}{\sigma}\Big\} = P\{-z_{\alpha/2} < Z < z_{\alpha/2}\}.$$

Here now

$$z_{\alpha/2} = \frac{\varepsilon\sqrt{n}}{\sigma}, \tag{3.6}$$

implying that

$$\varepsilon = \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2}. \tag{3.7}$$

In particular, for $\alpha = 0.05$ we find from the Gaussian tables

$$z_{0.025} = \Phi^{-1}(0.975) \approx 1.96 \tag{3.8}$$

and thus we have

$$\varepsilon = \frac{\sigma}{\sqrt{n}}\,1.96. \tag{3.9}$$

Then, assuming that $\sigma$ is known, the estimator interval of an unknown mean $\mu$, $(\bar{X}_n - \varepsilon,\; \bar{X}_n + \varepsilon)$, will have the explicit form

$$(A, B) = \Big(\bar{X}_n - \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2},\; \bar{X}_n + \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2}\Big) \tag{3.10}$$

or, with $\alpha = 0.05$,

$$(A, B) = \Big(\bar{X}_n - \frac{\sigma}{\sqrt{n}}\,1.96,\; \bar{X}_n + \frac{\sigma}{\sqrt{n}}\,1.96\Big). \tag{3.11}$$

According to (3.11), the unknown parameter $\mu$ thus lies between the r.v.'s $A$ and $B$ with probability $1 - \alpha = 0.95$.

The Confidence Interval. Recall that we assume $\sigma$ to be known, which may be problematic in many cases. Furthermore, in (3.10-3.11) $\bar{X}_n$ is not yet observed. But once it is observed, we obtain the empirical sample mean $\bar{x}_n = \frac{1}{n}(x_1 + \dots + x_n)$ and thus have $\mu \in (a,b)$, where

$$a = \bar{x}_n - \frac{\sigma}{\sqrt{n}}\,1.96 \tag{3.12}$$

and

$$b = \bar{x}_n + \frac{\sigma}{\sqrt{n}}\,1.96. \tag{3.13}$$

The predicament with an empirical substitute is that we can no longer claim that $\mu \in (a,b)$ with probability $1 - \alpha = 0.95$, since $a$ and $b$ are not r.v.'s, but mere realizations of $A$ and $B$. In fact, there is nothing random in this empirical interval to induce an event, and thereby to warrant the use of the word probability. Instead we say that $\mu \in (a,b)$ with confidence $1 - \alpha = 0.95$.

The interval

$$(a,b) = \Big(\bar{x}_n - \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2},\; \bar{x}_n + \frac{\sigma}{\sqrt{n}}\,z_{\alpha/2}\Big) \tag{3.14}$$

is referred to as the $100(1-\alpha)\%$ confidence interval for $\mu$.

In our particular case, the interval

$$(a,b) = \Big(\bar{x}_n - \frac{\sigma}{\sqrt{n}}\,1.96,\; \bar{x}_n + \frac{\sigma}{\sqrt{n}}\,1.96\Big) \tag{3.15}$$

is the 95% confidence interval for $\mu$.

Example 3.1. Suppose that when a signal having value $\mu$ is transmitted from node A, the value received at node B is normally distributed with mean $\mu$ and variance 4. In other words, when the signal is sent, the value received is $\mu + W$, where $W$ represents Gaussian noise with parameters $(0,4)$. To reduce the error, suppose 9 signals of the same value $\mu$ are sent. Upon their receipt at node B, their values were recorded as

5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5

We need to construct a 95% confidence interval for $\mu$. Denoting $s = 5 + 8.5 + 12 + 15 + 7 + 9 + 7.5 + 6.5 + 10.5$, we obtain

$$\bar{x}_9 = s/9 = 81/9 = 9.$$

From formula (3.15) we have

$$(a,b) = \Big(\bar{x}_n - \frac{\sigma}{\sqrt{n}}\,1.96,\; \bar{x}_n + \frac{\sigma}{\sqrt{n}}\,1.96\Big).$$

Thus, substituting 9 for $\bar{x}_n$, 9 for $n$, and 2 for $\sigma$, we arrive at

$$(a,b) = (7.69,\; 10.31).$$

Remark 3.1. If $X_1, X_2, \dots$ are Gaussian r.v.'s and $\mu$ is to be estimated while $\sigma$ is unknown, then $\sigma$ should be replaced with the unbiased sample standard deviation

$$\sigma' = \sqrt{\frac{1}{n-1}\sum_{i=1}^n \big(X_i - \bar{X}_n\big)^2}. \tag{3.16}$$

The rest of the calculations are very similar, except that

$$\frac{\bar{X}_n - \mu}{\sigma'/\sqrt{n}} = Z' = U_{n-1} \tag{3.17}$$

turns into a so-called $t$-r.v. with $n-1$ degrees of freedom. Consequently,

$$P\Big\{\Big|\frac{\bar{X}_n - \mu}{\sigma'/\sqrt{n}}\Big| < \frac{\varepsilon\sqrt{n}}{\sigma'}\Big\} = P\Big\{|U_{n-1}| < \frac{\varepsilon\sqrt{n}}{\sigma'}\Big\} = 1 - \alpha. \tag{3.18}$$

As per (3.7),

$$\frac{\varepsilon\sqrt{n}}{\sigma'} = \tau_{n-1}^{-1}\Big(1 - \frac{\alpha}{2}\Big) = t_{\alpha/2}^{n-1}, \tag{3.19}$$

where $\tau_{n-1}$ denotes the CDF of the $t$-distribution with $n-1$ degrees of freedom. Thus the confidence interval for the unknown mean $\mu$ of a Gaussian population, with an unknown variance $\sigma^2$, is

$$(a', b') = \Big(\bar{x}_n - \frac{\sigma'}{\sqrt{n}}\,t_{\alpha/2}^{n-1},\; \bar{x}_n + \frac{\sigma'}{\sqrt{n}}\,t_{\alpha/2}^{n-1}\Big). \tag{3.20}$$

So, the new confidence interval $(a', b')$ is like $(a,b)$, with $\sigma$ replaced by $\sigma'$ and $z_{\alpha/2}$ replaced by $t_{\alpha/2}^{n-1}$.

Now, $t_{\alpha/2}^{n-1}$ can be found from the table of the $t$-distribution, just as $z_{\alpha/2}$ is found from the Gaussian table. Only now $t_{\alpha/2}^{n-1}$ also depends upon one more parameter, $n$. Notice that for $n$ relatively large (say 31 or more), $t_{\alpha/2}^{n-1} \approx z_{\alpha/2}$.

Example 3.2. In the context of Example 3.1, assume now that the variance $\sigma^2$ of a transmitted signal is unknown. Calculation of $\sigma'^2$ gives

$$\sigma'^2 = \frac{1}{8}\sum_{i=1}^9 \big(x_i - \bar{x}_9\big)^2 = 9.5 \;\Rightarrow\; \sigma' = 3.082.$$

From the $t$-table we find $t_{0.025}^{8} = 2.306$. Therefore,

$$(a', b') = \Big(9 - 2.306\,\frac{3.082}{3},\; 9 + 2.306\,\frac{3.082}{3}\Big) = (6.63,\; 11.37).$$

Remark 3.2. To ease computations one can use R, MS Excel, MatLab, or Mathematica, to name a few. For instance, in Mathematica one can use commands like

Sample = {5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5}

V = Variance[Sample] // N

to get $V = 9.5$, followed by $\sigma' = \mathrm{Sqrt}[V] = 3.082207$.

Users of MS Excel are warned not to use the STDEV.P function, as it gives the square root of the biased sample variance (the population variance). Likewise, VAR.P in Excel gives the population variance. So either of them needs to be adjusted. (See Problem 3.6.)

In R, one can use the following subroutine

sample=c(5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5)
mean(sample)
var(sample)
sd(sample)

to yield

> sample=c(5, 8.5, 12, 15, 7, 9, 7.5, 6.5, 10.5)
> mean(sample)
[1] 9
> var(sample)
[1] 9.5
> sd(sample)
[1] 3.082207
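
Continuing in R, the $t$-based interval of Example 3.2 can be obtained directly; a sketch reusing the `sample` vector defined above:

mean(sample) + c(-1, 1) * qt(0.975, df = 8) * sd(sample) / sqrt(9)
# (6.63, 11.37); equivalently, t.test(sample, conf.level = 0.95)$conf.int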


PROBLEMS

3.1 Suppose that when a signal having an unknown constant value $\mu$ is transmitted from node A, the value received at node B is normally distributed with mean $\mu$ and variance 2. That is, when the signal is sent, the value received is $\mu + W$, where $W$ is Gaussian noise with parameters $(0,2)$. To reduce error, 16 signals of the same value $\mu$ are sent. Upon their receipt at node B, their values were recorded as

2, 4, 7, 3, 12, 11, 5, 6, 16, 14, 1, 3, 9, 2, 4, 1

Construct a 95% confidence interval for $\mu$.

Solution. First we obtain

$$s = 2 + 4 + 7 + 3 + 12 + 11 + 5 + 6 + 16 + 14 + 1 + 3 + 9 + 2 + 4 + 1 = 100.$$

Then,

$$\bar{x}_{16} = s/16 = 100/16 = 6.25.$$

From formula (3.15) we have

$$(a,b) = \Big(\bar{x}_{16} - \frac{\sigma}{\sqrt{16}}\,1.96,\; \bar{x}_{16} + \frac{\sigma}{\sqrt{16}}\,1.96\Big).$$

Thus, substituting 6.25 for $\bar{x}_{16}$ and $\sqrt{2} \approx 1.4$ for $\sigma$, we arrive at

$$(a,b) = \Big(6.25 - \frac{1.4}{4}\,1.96,\; 6.25 + \frac{1.4}{4}\,1.96\Big) = (5.56,\; 6.94).$$

3.2 In the context of Problem 3.1, assume that the variance $\sigma^2$ of the transmitted signal $\mu$ is unknown. Construct a 95% confidence interval for $\mu$.

Solution. Calculation of $\sigma'$ gives

$$\sigma' = \sqrt{\frac{1}{15}\sum_{i=1}^{16} \big(x_i - \bar{x}_{16}\big)^2} = 4.63.$$

From the $t$-table we find $t_{0.025}^{15} = 2.131$. Therefore,

$$(a', b') = \Big(6.25 - 2.131\,\frac{4.63}{4},\; 6.25 + 2.131\,\frac{4.63}{4}\Big) = (3.78,\; 8.72).$$
3.3 In the context of Problem 3.2, write a program in R to calculate $\bar{x}_{16}$ and $\sigma'$.


3.4 Suppose that when a signal having an unknown constant value $\mu$ is transmitted from node A, the value received at node B is normally distributed with mean $\mu$ and an unknown variance. That is, when the signal is sent, the value received is $\mu + W$, where $W$ is Gaussian noise with parameters $(0,\sigma^2)$. To reduce error, 16 signals of the same value $\mu$ are sent. Upon their receipt at node B, their values were recorded as

4, 3, 9, 6, 11, 3, 5, 7, 12, 8, 8, 4, 6, 1, 7, 2

Construct a 95% confidence interval for $\mu$.

Answer: $(a', b') = (4.327,\; 7.673)$.

3.5 In the context of Problem 3.4, write a program in R to calculate the sample mean, the unbiased sample variance, and the confidence interval.

3.6 How can the population variance and standard deviation operators of MS Excel, mentioned in Remark 3.2, be adjusted?


4. Approximate Confidence Intervals and Other Ramifications of the Central Limit Theorem

Ideally, only normal samples should be used for confidence intervals regarding unknown means and known standard deviations, especially when sample sizes are not large. If sample sizes are very large, we can proceed with normally approximated confidence intervals, but with caution. In this case it would be correct to say that such an interval is an approximate confidence interval.

Example 4.1. Suppose we need to estimate the proportion of defective items in a large population, or the proportion of HIV-infected individuals, the percentage of speed limit violations, of smokers, of obese people, of viewers of a TV program, or the number of corrupted signals. Unlike the parameter estimation in section 1, here we discuss how to obtain a confidence interval for the unknown parameter.

Suppose a large sample $X_1,\dots,X_n$ is drawn from a Bernoulli population with parameters $\mu = p$ and $\sigma^2 = p(1-p)$. The value of $p$ is unknown and will be estimated by the sample mean $\bar{X}_n$. Because the value of $\sigma$ is generally unknown, we can replace $\sigma$ with $\max\sigma$, which ends up giving us a larger interval than the true one.

Since $\sigma^2(p) = p(1-p)$ is a second degree polynomial with roots at 0 and 1, its graph is a parabola with vertex at $p = \frac{1}{2}$, where $\sigma^2$ attains its maximal value $\frac{1}{4}$.

If we use the maximal value $\sigma = \sqrt{\frac{1}{4}} = \frac{1}{2}$ and substitute it into $(a,b)$ of (3.15), we get

$$(a,b) = \Big(\bar{x}_n - \frac{1}{2}\,\frac{1}{\sqrt{n}}\,1.96,\; \bar{x}_n + \frac{1}{2}\,\frac{1}{\sqrt{n}}\,1.96\Big). \tag{4.1}$$

Now, if we utilize the same idea with a more reasonable $\sigma$ for the confidence interval $(a,b)$ based on historical data, say $0.3$ for $\sigma$, and then collect empirical data, the size of the confidence interval will shrink. For example, if we know from past observations that $p$ cannot exceed $0.1$, then

$$\max\big\{\sigma = \sqrt{p(1-p)} : p \in [0,\,0.1]\big\} = \sqrt{0.1 \cdot 0.9} = 0.3,$$

since $p(1-p)$ is increasing on $[0,\,0.1]$ and attains the value $0.09$ at $p = 0.1$.

Example 4.2. Suppose that in a sample of 400 drivers, 15 drove above the speed limit on highway I-95. Thus we have $\bar{x}_{400} = 15/400 = 0.0375$. Continuing with the rest of $(a,b)$, and assuming that the true proportion of drivers going over the speed limit never exceeds $0.1$, we take $0.3$ as $\max\sigma$ and thus

$$a = 0.0375 - \frac{0.3}{\sqrt{400}}\,1.96 = 0.0375 - 0.0294 = 0.0081,$$

$$b = 0.0375 + \frac{0.3}{20}\,1.96 = 0.0375 + 0.0294 = 0.0669.$$
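A one-line R sketch of Example 4.2, using the conservative $\max\sigma = 0.3$ (assuming, as in the text, that $p$ never exceeds 0.1):

xbar <- 15 / 400
xbar + c(-1, 1) * 1.96 * 0.3 / sqrt(400)   # (0.0081, 0.0669)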

The Sample Size

There are some other applications of the CLT.

Example 4.3. From (3.7), namely

$$\frac{\varepsilon\sqrt{n}}{\sigma} = \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big) = z_{\alpha/2},$$

we can find the minimal sample size $n$ to be taken in order that the estimator $\bar{X}_n$ of $\mu$ provides the $\varepsilon$-interval for $\mu$ with probability $1 - \alpha$ or higher. But first we modify (3.4) to have the inequality

$$P\big\{\big|\bar{X}_n - \mu\big| < \varepsilon\big\} \ge 1 - \alpha. \tag{4.2}$$

Now, notice that $\sigma$ is not specified, but is assumed to be known. Then, as in (3.1), we have

$$2\Phi\Big(\frac{\varepsilon\sqrt{n}}{\sigma}\Big) - 1 \ge 1 - \alpha$$

or

$$\Phi\Big(\frac{\varepsilon\sqrt{n}}{\sigma}\Big) \ge 1 - \frac{\alpha}{2}. \tag{4.3}$$


Since $\Phi$ is strictly monotone increasing, so is $\Phi^{-1}$, and thus we have

$$\frac{\varepsilon\sqrt{n}}{\sigma} \ge \Phi^{-1}\Big(1 - \frac{\alpha}{2}\Big) = z_{\alpha/2} \tag{4.4}$$

or

$$n \ge \frac{\sigma^2}{\varepsilon^2}\,z_{\alpha/2}^2. \tag{4.5}$$

In particular, for $\alpha = 0.05$, $z_{\alpha/2} = 1.96$ and thus we have

$$n \ge \frac{\sigma^2}{\varepsilon^2}\,3.84. \tag{4.6}$$

Example 4.4. As in Example 4.1, when $\sigma$ is unknown, we can replace $\sigma$ with $\max\sigma$ in (4.6), ensuring that the requested accuracy is met at the expense of a possibly much larger value of $n$:

$$n \ge \frac{\max\sigma^2}{\varepsilon^2}\,3.84. \tag{4.7}$$

Suppose we need to estimate the proportion of defective items in a large population, or the proportion of HIV-infected individuals, the percentage of speed limit violations, of smokers, of obese people, of TV program viewers, or the number of corrupted signals. Consequently, a sample $X_1,\dots,X_n$ is drawn from a Bernoulli population with parameters $\mu = p$ and $\sigma^2 = p(1-p)$. The value of $p$ is unknown and will be estimated by the sample mean $\bar{X}_n$. In this case, we can readily find $\max\sigma^2$ as follows.

Since $\sigma^2(p) = p(1-p)$ is a second degree polynomial with roots at 0 and 1, $\sigma^2(p)$ is a parabola with vertex at $p = \frac{1}{2}$, where it attains the value $\frac{1}{4}$. Thus, from (4.7),

$$n \ge 0.96\,\frac{1}{\varepsilon^2}. \tag{4.8}$$

Consequently, if we can allow a 5% deviation (as $\varepsilon$) of $p$ from the estimator $\bar{X}_n$, we need to test 384 items. Remember that $\max\sigma^2 = \frac{1}{4}$, attained when $p = \frac{1}{2}$. If we know that $p$ must be much smaller than that, say surely less than 0.1, then $\max\sigma^2 \le 0.09$ and the minimal $n$ is 139, a much smaller sample size than 384.
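A two-line R sketch of Example 4.4, computing the minimal sample sizes from (4.7) for $\varepsilon = 0.05$:

eps <- 0.05
ceiling(3.84 * 0.25 / eps^2)   # 384, worst case p = 1/2
ceiling(3.84 * 0.09 / eps^2)   # 139, if p is known not to exceed 0.1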


5. The Difference in Means of Gaussian Populations

In various applications in engineering and the biomedical sciences one is interested in whether two different "treatments" are similar, in the sense that they have approximately the same means. For example, if we want to claim that two products of different production costs have the same reliability, or that two similar medications made by two different pharmaceutical companies have the same effect, we would like to find an estimator of the difference of their means and show empirically that the estimates are small. Even more interesting is to find a corresponding confidence interval for the difference of their means.

Let $X_1,\dots,X_n \in N(\mu,\sigma^2)$ and $Y_1,\dots,Y_m \in N(\nu,\delta^2)$ be two independent Gaussian samples. In some applications one needs to estimate the difference $\mu - \nu$. Since we know from section 1 that $\bar{X}_n$ and $\bar{Y}_m$ are the respective MLE's for $X_1,\dots,X_n$ and $Y_1,\dots,Y_m$, it stands to reason that $\bar{X}_n - \bar{Y}_m$ is an MLE of $\mu - \nu$ for the joint sample $X_1,\dots,Y_m$. This is not obvious (since the $X$'s and $Y$'s are not identically distributed), but it can be proved.

Now, since $\bar{X}_n \in N(\mu,\, \sigma^2/n)$ and $\bar{Y}_m \in N(\nu,\, \delta^2/m)$ and they are independent, from section 8, Chapter II, it follows that

$$\bar{X}_n - \bar{Y}_m \in N\Big(\mu - \nu,\; \frac{\sigma^2}{n} + \frac{\delta^2}{m}\Big).$$

We assume that both $\sigma^2$ and $\delta^2$ are known. Then

$$Z := \frac{\bar{X}_n - \bar{Y}_m - (\mu - \nu)}{\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}}} \in N(0,1)$$

and we have that

$$1 - \alpha = P\{-z_{\alpha/2} < Z < z_{\alpha/2}\}$$
$$= P\Bigg\{-z_{\alpha/2} < \frac{\mu - \nu - (\bar{X}_n - \bar{Y}_m)}{\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}}} < z_{\alpha/2}\Bigg\}$$
$$= P\Bigg\{\bar{X}_n - \bar{Y}_m - z_{\alpha/2}\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}} < \mu - \nu < \bar{X}_n - \bar{Y}_m + z_{\alpha/2}\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}}\Bigg\}.$$

As in the case of a single sample, we call the interval

$$\Bigg(\bar{X}_n - \bar{Y}_m - z_{\alpha/2}\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}},\;\; \bar{X}_n - \bar{Y}_m + z_{\alpha/2}\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}}\Bigg)$$

the estimator interval for $\mu - \nu$. Correspondingly, the observed interval

$$(a,b) = \Bigg(\bar{x}_n - \bar{y}_m - z_{\alpha/2}\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}},\;\; \bar{x}_n - \bar{y}_m + z_{\alpha/2}\sqrt{\frac{\sigma^2}{n} + \frac{\delta^2}{m}}\Bigg) \tag{5.1}$$

is a $100(1-\alpha)\%$ (two-sided) confidence interval for $\mu - \nu$.

Example 5.1. Two different types of electrical cable insulation have recently been tested to determine the voltage level at which failures tend to occur. When specimens were subjected to an increasing voltage stress in a laboratory experiment, failures of the two types of cable insulation occurred at the following voltages:

Type A:  36  54  44  52  41  37  53  51  38  44  36  35  34  44
Type B:  52  60  64  44  38  48  68  46  66  70  52  62

Suppose it is known that the amount of voltage that cables having type A insulation can withstand is normally distributed with an unknown mean $\mu$ and known variance $\sigma^2 = 40$, whereas the corresponding distribution for type B insulation is normal with unknown mean $\nu$ and variance $\delta^2 = 100$. We need to find the 95% confidence interval for $\mu - \nu$.

Using formula (5.1) with $z_{0.025} = 1.96$, we obtain

$$(a,b) = (-19.61,\; -6.49).$$


CHAPTER VI. NONPARAMETRIC METHODS


1. The Goodness-of-Fit $\chi^2$-Test

Suppose we need to find out which class of distributions a particular population $[X]$ belongs to. Unlike the statistical methods we previously encountered (which were aimed at finding unknown parameters), here we discuss a nonparametric method of hypothesis testing, in which we answer the question of whether or not the unknown distribution of $X$ equals, or is close to, some hypothetical distribution of a r.v. $X^0$. In this section we learn the procedure known as the $\chi^2$-test.

We start with some preliminaries.

The $\chi^2$-Distribution. It is a special case of the gamma distribution with parameters $\alpha$ and $\beta$, whose pdf is

$$f(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)}\,x^{\alpha-1}e^{-\beta x}\,\mathbf{1}_{\mathbb{R}_+}(x), \tag{1.1}$$

where $\Gamma(\alpha)$ is the gamma function. Taking $\alpha = \frac{n}{2}$, where $n$ is a positive integer, and $\beta = \frac{1}{2}$, we arrive at the special case of the gamma pdf

$$f(x; n) = \frac{1}{2^{n/2}\,\Gamma(\frac{n}{2})}\,x^{\frac{n}{2}-1}e^{-x/2}\,\mathbf{1}_{\mathbb{R}_+}(x), \tag{1.2}$$

called the pdf of a $\chi^2$ r.v. with $n$ degrees of freedom (in notation, $\chi^2_n$).

The Chi-Square Test. It is formally the following hypothesis test:

$$H_0: X \in [X^0] \text{ (hypothetical)},$$
$$H_1: X \notin [X^0].$$

We develop a test procedure that will either reject or not reject the null hypothesis.

Suppose we have $n$ observations $x_1,\dots,x_n$ of a r.v. $X$ whose distribution is to be tested, and suppose that in this sample only $k$ of the $n$ values are identified as distinct, say $y_1,\dots,y_k$.

Then, let

$$n_i = \#\text{ occurrences of the value } y_i \text{ in the observed sample } x_1,\dots,x_n.$$

Now, if $X_1,\dots,X_n$ is an $n$-sample of r.v.'s from population $X$, then define

$$N_i = \#\text{ occurrences of the value } y_i \text{ in the "pre-observed" sample } X_1,\dots,X_n.$$


Furthermore, define

$$p_i^0 = P\{X^0 = y_i\} \text{ as the hypothetical frequency of } y_i.$$

A reasonable measure of the deviation of the hypothetical frequencies $p_i^0$ from the to-be-tested frequencies $\frac{N_i}{n}$, $i = 1,\dots,k$, would be the r.v.

$$\sum_{i=1}^k \Big(\frac{N_i}{n} - p_i^0\Big)^2.$$

In 1900, Karl Pearson suggested the similar test statistic

$$Q = \sum_{i=1}^k \Big(\frac{N_i}{n} - p_i^0\Big)^2 \frac{n}{p_i^0} = \sum_{i=1}^k \frac{\big(N_i - np_i^0\big)^2}{np_i^0} \tag{1.3}$$

and proved that if $H_0$ is true, then $Q$ converges in distribution to $\chi^2_{k-1}$ as $n \to \infty$.


The Procedure: Given a significance level α we obtain the critical region Ð- ,∞Ñ in accordance
with

T Ö;#5"  -× œ α or T ;#5" Ÿ -  œ "  α. (1.4)

Then find

- œ ;#5"  Ð"  αÑ œ Tail;#5"  ÐαÑ


" "
(1.5)

(from a table for ;# -distribution); - is called the critical value.

The Interpretation: Given that H! is true, a genuine chi-square r.v. U is unlikely to cross - (of
a chance less than α). The opposite of this would mean that H! is not true.

Practical Application. Define the observed value of U denoted by ; in one of the frequently
used forms:

;À œ œ 8"  :!3  8.
83 8:3! 
5 # 5 #
8
8:3!
(1.6)
3
3œ" 3œ"

Here ; is an observed value of U. The test will consist of checking on whether or not ;  - ,
where - is previously obtained from (1.5). If ;   - , then deviations of \ from \ ! are not
negligible and H! must be rejected at α.

Remark 1.1. Given a fixed 5 , the higher is the significance α, the smaller is - (i.e. the larger is
the critical region). Thus, the higher is α, the more likely it is to reject H! , because we go
"stricter" with moderate deviations and less confidence. 


Example 1.1. It is conjectured that the number of wrong telephone connections is Poisson with parameter $\lambda = 0.5$. A total of 120 days of observations produced the following results:

# wrong connections   days observed   true Poisson distr. with $\lambda = 0.5$
0                     71              0.6065
1                     37              0.3033
2                     9               0.0758
3                     2               0.0126
4                     1               0.0016
5                     0               0.0002
total                 120             1.0

Test the null hypothesis $H_0$ that the daily number of wrong telephone connections is indeed Poisson with $\lambda = 0.5$ at significance level $\alpha = 0.05$, against the alternative hypothesis $H_1$ that it is not.

Solution. We start with the above table, expanding it and giving it a more formal interpretation:

group   # wrong connections   sets of $y_i$'s   days          $p_i^0$
1       0                     {0}               $n_0 = 71$    0.6065
2       1                     {1}               $n_1 = 37$    0.3033
3       2                     {2}               $n_2 = 9$     0.0758
4       3                     {3}               $n_3 = 2$     0.0126
5       4                     {4}               $n_4 = 1$     0.0016
6       5                     {5}               $n_5 = 0$     0.0002
total                                           $n = 120$     1.0

So, we have altogether 6 groups ($k = 6$) made of the numbers of observed occurrences $n_0,\dots,n_5$, with $n = 120$ days. To calculate $q$ we use the formula

$$q = \frac{1}{120}\sum_{i=0}^5 \frac{n_i^2}{p_i^0} - 120 = 3.6364$$

with the corresponding $n_i$'s from the table. Now, because $k = 6$, we select

$$c = \mathrm{Tail}_{\chi^2_5}^{-1}(0.05) = 11.071$$

from the table of the $\chi^2$-distribution with $6 - 1 = 5$ degrees of freedom, making the critical region $C = (11.071, \infty)$. Since $q \notin C$, we do not reject $H_0$ that the daily number of wrong telephone connections is Poisson with parameter $\lambda = 0.5$ at the significance level $\alpha = 0.05$.
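A sketch of the same test in R via chisq.test(), with the hypothesized Poisson(0.5) cell probabilities (the last cell is taken as "5 or more" so the probabilities sum to 1):

days <- c(71, 37, 9, 2, 1, 0)
p0 <- c(dpois(0:4, 0.5), 1 - ppois(4, 0.5))
chisq.test(days, p = p0)   # X-squared about 3.64; R warns that some
                           # expected counts are small, as in the last cells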

Remark 1.2 (The P-Value). The P-value is a more common way in statistics to make an inference in such hypothesis testing. It is defined as

$$p = P\{\chi^2_{k-1} > q\}, \tag{1.7}$$

where $q$ is the observed value of $Q$ in formula (1.6). If the P-value $p$ turns out to be very small, then we say it is highly unlikely that a true $\chi^2_{k-1}$ r.v. is larger than $q$, and thus we reject $H_0$. However, if $p$ is not small, we say it is quite possible for $\chi^2_{k-1}$ to be that "large" (i.e. as large as $q$ or greater). Therefore, we do not reject $H_0$.

A good P-value calculator is available on the website

http://stattrek.com/online-calculator/chi-square.aspx

One should take into account, however, that the result provided there is $1 - p$ and not $p$.

In Example 1.1, from the same table, $P\{\chi^2_{k-1} \le 3.64\} = 0.4$, and thus the P-value is

$$p = P\{\chi^2_{k-1} > 3.64\} = 0.6,$$

which is large, thereby indicating that the Poisson model with $\lambda = 0.5$ provides a reasonable fit for the collected data. In other words, we do not reject $H_0$ that the daily number of wrong telephone connections obeys the Poisson law with parameter $\lambda = 0.5$.

Example 1.2. In our next example we want to investigate whether the number of fatalities in automobile accidents obeys a Poisson distribution. Suppose we have a record of 340 fatal automobile crashes, observed per hour during 72 consecutive hours. Some hours gave zero, one, two, etc. crashes, which we categorize into eight groups: group 1 includes only zero or one crash, group 2 includes only 2 crashes, etc.; finally, group 8 includes 8 or more crashes. We place the data into the following table:

Group   # crashes    sets of $y_i$'s   # hours for the value   $p_i^0$
1       0 or 1       {0,1}             $n_1 = 5$               0.04
2       2            {2}               $n_2 = 8$               0.08
3       3            {3}               $n_3 = 10$              0.14
4       4            {4}               $n_4 = 11$              0.18
5       5            {5}               $n_5 = 11$              0.18
6       6            {6}               $n_6 = 9$               0.15
7       7            {7}               $n_7 = 8$               0.10
8       8 or more    {8,...}           $n_8 = 10$              0.13
total   340                            $n = 72$

Notice that the number of groups ($k = 8$) made the numbers of observed occurrences $n_1,\dots,n_8$ approximately uniform, which suggests how we should group the incidences. Consequently, we have 72 "observations" (i.e. hours) of an unknown r.v. $X: \Omega \to \{y_1,\dots,y_8\}$. For instance, take the set {5 crashes} and see that, according to our records, 5 crashes occurred in the first hour, the third hour, the 23rd hour, etc., altogether making 11 hours out of 72 during which exactly 5 crashes took place. Consequently, the observations with this value are

$$\{x_1, x_3, x_{23}, \dots\}, \text{ with } n_5 = 11.$$

Furthermore, the m.l.e. of the observed data is $340/72 = 4.72$.

We would like to test that the above data come from the class of Poisson r.v.'s with parameter $\lambda = 5$. In the formula

$$q = \frac{1}{n}\sum_{i=1}^k \frac{n_i^2}{p_i^0} - n$$

we will use the hypothetical frequencies, which we can take from the Poisson table for $\lambda = 5$.

According to the procedure specified in (1.4-1.5), given the significance level $\alpha = 0.05$ and with $k = 8$, we find the critical value $c$ from the table for the $\chi^2_7$ distribution: $c = \mathrm{Tail}_{\chi^2_7}^{-1}(0.05) = 14.07$. Now the empirical $q$ is

$$q = 3.46 < c = 14.07.$$

Therefore, we do not reject $H_0$ that the proposed distribution is indeed Poisson with $\lambda = 5$.

Example 1.3 (Calculation of the P-Value). Recall that the P-value is defined as

$$p = P\{\chi^2_{k-1} = Q > q\},$$

where $q = 3.46$ in Example 1.2. With $k - 1 = 7$ degrees of freedom we find that $P\{Q \le 3.46\} \approx 0.16$, and thus the P-value is

$$p = P\{Q > 3.46\} \approx 0.84,$$

which is very large, thereby indicating that the Poisson model with $\lambda = 5$ provides a very good fit for the collected data. Thus we do not reject $H_0$ that the number of fatalities obeys the Poisson law with $\lambda = 5$.
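A sketch in R: the P-values of Examples 1.1 and 1.3 come straight from pchisq(), with no table or website needed:

1 - pchisq(3.64, df = 5)   # Example 1.1, about 0.60
1 - pchisq(3.46, df = 7)   # Example 1.3, about 0.83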

Example 1.4. Suppose there are two sets of measurements of aluminum oxide taken from Roman-era potteries at two different archeological sites. Do these findings come from the same period? The 10 measurements from each site are placed in two tables:

Site 1:  10  10  11  12  12  13  13  13  14  14
Site 2:  10  10  10  11  11  11  12  13  13  14

We interpret the measurements of site 1 as hypothetical and calculate their $n_i$'s as hypothetical frequencies. The measurements from site 2 we take as observed occurrences, all placed into five groups:

Group   Value   Occurrences $n_i$ from site 2   $p_i^0$ (site 1)
1       10      3                               0.2
2       11      3                               0.1
3       12      1                               0.2
4       13      2                               0.3
5       14      1                               0.2
total           10                              1.0

Notice that the test formally works only when none of the $p_i^0 = 0$, which happens when some values of site 2 do not occur in site 1. In that case we switch the sites, making the other site hypothetical, and allow some $n_i = 0$ instead. The same adjustment can be made when the numbers of tested values are different. Ultimately, the Kolmogorov-Smirnov procedure (of section 3) works better than the $\chi^2$ method anyway, with no need for adjustments.

Now we substitute the data into formula (1.6) to get:

$$q = \frac{1}{10}\sum_{i=1}^{k=5} \frac{n_i^2}{p_i^0} - 10 = \frac{1}{10}\Big(\frac{9}{0.2} + \frac{9}{0.1} + \frac{1}{0.2} + \frac{4}{0.3} + \frac{1}{0.2}\Big) - 10 = 5.83.$$

The P-value is

$$p = P\{Q_4 > 5.83\} = 0.21.$$

Because the P-value is fairly large, we do not reject the null hypothesis that both findings come from the same period of the Roman era.
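A sketch of Example 1.4 in R, computing the statistic (1.6) and its P-value, with the site-1 relative frequencies as the hypothetical $p_i^0$:

n.obs <- c(3, 3, 1, 2, 1)           # site-2 counts of the values 10, 11, 12, 13, 14
p0    <- c(0.2, 0.1, 0.2, 0.3, 0.2) # site-1 relative frequencies
q <- sum(n.obs^2 / p0) / 10 - 10    # 5.83, as in (1.6)
1 - pchisq(q, df = 4)               # P-value, about 0.21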


PROBLEMS

1.1. A coin is tossed until a head occurs and the number $X$ of tosses is recorded. After repeating the experiment 256 times, the following results were obtained:

$x_i$:   1    2   3   4   5  6  7  8
$n_i$:   136  60  34  12  9  1  3  1

where $x_i$ is the number of tosses needed to obtain the first head and $n_i$ is the number of experiments in which $x_i$ occurs. Test the hypothesis, at the 0.05 significance level, that the observed distribution of $X$ is geometric with parameter $p = \frac{1}{2}$.

1.2. According to the Mendelian theory of genetics, a certain garden pea plant should produce either white, pink, or red flowers, with respective probabilities $\frac{1}{4}, \frac{1}{2}, \frac{1}{4}$. To test this theory, a sample of 564 peas was studied, producing 141 white, 291 pink, and 132 red flowers. Test the hypothesis, at the 0.05 significance level, that the observed sample agrees with the Mendelian theory. Also give the P-value. Answer: $q = 0.8617$ and P-value $= 0.648$.

1.3. To check the claim that a certain die is fair, 1000 rolls of the die were recorded, with the following results:

Outcome   # Occurrences
1         158
2         172
3         164
4         181
5         160
6         165

Test the hypothesis that the die is fair at the 0.05 significance level. Answer: $q = 2.1796$, P-value $= 0.824$.

Solution. We expand the above table as follows (some optional columns are added for convenience):

Group   $n_i$   $n_i^2$   $p_i^0$
1       158     24964     1/6
2       172     29584     1/6
3       164     26896     1/6
4       181     32761     1/6
5       160     25600     1/6
6       165     27225     1/6
total   1000    167030    1

$$q = \frac{1}{n}\sum_{i=1}^k \frac{n_i^2}{p_i^0} - n = \frac{167030 \cdot 6}{1000} - 1000 = 1002.18 - 1000 = 2.18,$$

$$c = \mathrm{Tail}_{\chi^2_5}^{-1}(0.05) = 11.071.$$

So, $q = 2.18 < c = 11.071 \Rightarrow$ do not reject $H_0$ that the die is balanced.

$$p = P\{Q_5 > 2.18\} = 0.82,$$

which is large, meaning that a true chi-square r.v. is very likely to be 2.18 or larger.

1.4. It is conjectured that the daily number of electrical power failures in a certain city obeys the Poisson law with mean 4.2. A total of 150 days of observations produced the following results:

# Failures   Days Observed
0            0
1            5
2            22
3            23
4            32
5            22
6            19
7            13
8            6
9            4
10           4
11           0

Test the $H_0$ that the number of power outages is indeed Poisson with $\lambda = 4.2$ at $\alpha = 0.05$ and give the P-value. Answer: $q = 15.955$, P-value $= 0.143$.

1.5. A contractor who purchases a large number of fluorescent light bulbs has been told by the manufacturer that these bulbs are not of uniform quality, and that each bulb produced will, independently, be of quality level 1, 2, 3, 4, or 5, with respective probabilities $p_1 = 0.15$, $p_2 = 0.25$, $p_3 = 0.35$, $p_4 = 0.20$, $p_5 = 0.05$. However, the contractor feels that he is receiving too many type 5 (the lowest quality) bulbs, and so he decides to challenge the manufacturer's claim by taking the time and expense to ascertain the quality of 30 such bulbs.

Suppose that he discovers that of the 30 bulbs, 3 are of quality level 1, 6 are of quality level 2, 9 are of quality level 3, 7 are of quality level 4, and 5 are of quality level 5. Do these data, at the 5% level of significance, enable the contractor to reject the manufacturer's claim? Find the P-value and interpret the result.

Hint: Identify the claimed frequencies $p_1,\dots,p_5$ as hypothetical, thus having $p_1^0 = 0.15$, $p_2^0 = 0.25$, $p_3^0 = 0.35$, $p_4^0 = 0.20$, $p_5^0 = 0.05$. Then take $n_1 = 3$, $n_2 = 6$, $n_3 = 9$, $n_4 = 7$, $n_5 = 5$ and substitute the data into formula (1.6):

$$q = \frac{1}{30}\sum_{i=1}^{k=5} \frac{n_i^2}{p_i^0} - 30 = 9.348.$$

Now, find the P-value:

$$p = P\{Q_4 > 9.348\} = 1 - 0.947 = 0.053.$$

Thus the hypothesis would not be rejected at the 5 percent level of significance. However, it would be rejected at all significance levels above 0.053.


2. Testing Independence in Contingency Tables

We start with the following example.

Example 2.1. Table 2.1 below is a $2 \times 3$ contingency table obtained from a report on the relationship between aspirin use and heart attacks by the Research Study Group at Harvard Medical School, among two groups (of 11,034 and 11,037) of participating physicians.

                $B_1$ Fatal Heart Attack   $B_2$ Nonfatal Attack   $B_3$ No Attack
$A_1$ Placebo   18                         171                     10,845
$A_2$ Aspirin   5                          99                      10,933

Table 2.1

The above data came from a 5-year randomized study of whether regular aspirin intake reduces mortality from cardiovascular disease. The study was "double-blind": those in the study did not know whether they were taking aspirin or placebo.

In Table 2.1 we have attributes A and B, with their respective categories:

A: medication, with two categories: $A_1$ = placebo and $A_2$ = aspirin

B: cardiovascular condition (or outcome), with three categories: $B_1$ = fatal attack, $B_2$ = nonfatal attack, and $B_3$ = no attack.

Define

$$X: \{\text{physician}: A_1 = \text{in placebo group},\; A_2 = \text{in aspirin group}\} \to \{1,2\},$$

$$Y: \{B_1 = \text{fatal attack},\; B_2 = \text{nonfatal},\; B_3 = \text{no attack}\} \to \{1,2,3\},$$

with the distribution

$$p_{ij} = P\{X = i,\, Y = j\}, \quad i = 1,2 \text{ and } j = 1,2,3, \tag{2.1}$$

and the marginal distributions

$$p_{i\cdot} := P\{X = i\}, \quad p_{\cdot j} := P\{Y = j\}. \tag{2.2}$$

The r.v.'s $X$ and $Y$ are independent if and only if

$$p_{ij} = p_{i\cdot}\,p_{\cdot j} \text{ for all } i = 1,2 \text{ and } j = 1,2,3. \tag{2.3}$$



The Formalism. In general, let $A_1,\dots,A_R$ and $B_1,\dots,B_C$ be $R$ (rows) and $C$ (columns) exclusive categories or classifications of attributes A and B. Then a representative random vector $(X,Y)$ of the population $(X,Y)$ is

$$(X,Y): \Omega_X \times \Omega_Y = \{A_1,\dots,A_R\} \times \{B_1,\dots,B_C\} \to \{1,\dots,R\} \times \{1,\dots,C\},$$

with the distribution

$$p_{ij} = P\{X = i,\, Y = j\} \tag{2.1a}$$

and the marginal distributions

$$p_{i\cdot} := P\{X = i\}, \quad p_{\cdot j} := P\{Y = j\}. \tag{2.2a}$$

The r.v.'s $X$ and $Y$ are independent if and only if

$$p_{ij} = p_{i\cdot}\,p_{\cdot j} \text{ for all } i = 1,\dots,R \text{ and } j = 1,\dots,C. \tag{2.3a}$$

The associated hypotheses read:

$$H_0: p_{ij} = p_{i\cdot}\,p_{\cdot j} \text{ for all } i = 1,\dots,R \text{ and } j = 1,\dots,C,$$

or equivalently,

$$H_0: \text{ATTRIBUTE A is independent of ATTRIBUTE B;}$$

$$H_1: \text{on at least one occasion, equations (2.3a) do not hold.}$$

In the context of Example 2.1, the veracity of $H_0$ would mean that aspirin has no impact on (brings no improvement to) the underlying cardiovascular condition.

The Method. We conduct $n$ independent trials $(X_1,Y_1),\dots,(X_n,Y_n)$ (in the context of Example 2.1, $n = 22{,}071$) of the random vector $(X,Y)$. Then introduce the r.v.'s

$$N_{ij} = \#\text{ pairs among } (X_1,Y_1),\dots,(X_n,Y_n) \text{ in which } X_k = i \text{ and } Y_k = j,$$

$$i = 1,\dots,R,\quad j = 1,\dots,C.$$

For example, from Table 2.1, $n_{21}$ (the pre-observed $N_{21}$) $= \#$ physicians of the total $n = 22{,}071$ who took aspirin and had fatal heart attacks, which is 5.

Since the real values of the $p_{ij}$'s in (2.1a) are unknown, we use the following proxies (i.e. estimators) for $p_{ij}$:

$$\hat{P}_{ij} := \frac{N_{ij}}{n},$$

and for the marginal distributions $p_{i\cdot}$ and $p_{\cdot j}$:

$$\hat{P}_{i\cdot} := \frac{N_{i\cdot}}{n} \quad\text{and}\quad \hat{P}_{\cdot j} := \frac{N_{\cdot j}}{n},$$

respectively, where

$$N_{i\cdot} := \sum_{j=1}^C N_{ij} \quad\text{and}\quad N_{\cdot j} := \sum_{i=1}^R N_{ij}.$$

It makes sense not to reject $H_0$ if

$$\hat{P}_{ij} \approx \hat{P}_{i\cdot}\hat{P}_{\cdot j}. \tag{2.4}$$

Since they are never exactly equal, we need to figure out how far they may deviate for $H_0$ to still hold in a reasonable way. For example, we can consider the statistic

$$\sum_{i=1}^R \sum_{j=1}^C \big(\hat{P}_{ij} - \hat{P}_{i\cdot}\hat{P}_{\cdot j}\big)^2$$

and see if it is reasonably small. Pearson and Fisher proposed the statistic

$$Q = Q_n = \sum_{i=1}^R \sum_{j=1}^C \big(\hat{P}_{ij} - \hat{P}_{i\cdot}\hat{P}_{\cdot j}\big)^2 \Big/ \big(\hat{P}_{i\cdot}\hat{P}_{\cdot j}/n\big),$$

reducible to

$$Q = \sum_{i=1}^R \sum_{j=1}^C \frac{\big(N_{ij} - \frac{1}{n}N_{i\cdot}N_{\cdot j}\big)^2}{\frac{1}{n}N_{i\cdot}N_{\cdot j}}, \tag{2.5}$$

which they claimed to be asymptotically chi-square with $(R-1)(C-1)$ degrees of freedom.

The empirical analog of $Q$ is the observed number

$$q = \sum_{i=1}^R \sum_{j=1}^C \frac{\big(n_{ij} - \frac{1}{n}n_{i\cdot}n_{\cdot j}\big)^2}{\frac{1}{n}n_{i\cdot}n_{\cdot j}} = n\Bigg(\sum_{i=1}^R \sum_{j=1}^C \frac{n_{ij}^2}{n_{i\cdot}\,n_{\cdot j}} - 1\Bigg). \tag{2.6}$$
3œ" 4œ" 3œ" 4œ"

Example 2.2. (Example 2.1 revisited.) We expand Table 2.1 in accordance with the above speci-
fications:

MTH 2401, LECTURE NOTES, Page 176, Version 54


CHAPTER VI. NONPARAMETRIC METHODS

3Ï4 Category " Category # Category $ A marginals


med\cond Fatal Nonfatal No attack
Cat " Placebo 8"" œ ") 8"# œ "(" 8"$ œ "!)%& 8"† œ ""!$%
Cat # Aspirin 8#" œ & 8## œ ** 8#$ œ "!*$$ 8#† œ ""!$(
B mar 8†" œ #$ 8†# œ #(! 8†$ œ #"(() 8 œ ##!("
Table 2.2

By formula (2.6) we get

q = n [Σ_{i=1}^2 Σ_{j=1}^3 n_ij² / (n_i· n_·j) − 1]

  = 22071 [18²/(11034·23) + 171²/(11034·270) + 10845²/(11034·21778)

    + 5²/(11037·23) + 99²/(11037·270) + 10933²/(11037·21778) − 1]

  = 22071 [0.001276686 + 0.009815116 + 0.489449781

    + (1/11037)(1.086956521 + 36.3 + 5488.588891) − 1]

  = 26.90293781.

The P-value of a χ²_2 r.v. (with (2 − 1)·(3 − 1) = 2 d.f.), taken from the table of tails of chi-square PDF's, satisfies

P{χ²_2 > 26.90293781} < 0.005,

which is very small. We therefore reject the null hypothesis that taking placebo or aspirin does not impact a cardiovascular condition. □
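
For readers who want to reproduce this on a computer, here is a minimal R sketch (the matrix layout and names are ours): the built-in chisq.test() computes the same Pearson statistic with (R − 1)(C − 1) degrees of freedom.

tab <- matrix(c(18, 171, 10845,
                 5,  99, 10933),
              nrow = 2, byrow = TRUE,
              dimnames = list(c("Placebo", "Aspirin"),
                              c("Fatal", "Nonfatal", "No attack")))
chisq.test(tab)   # X-squared = 26.903, df = 2, p-value on the order of 1e-06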

Example 2.3. Can mobile devices (such as phones and tablets) and radio transmitters interfere with airplane instruments? Several independent studies registered a total of 370 incidents in which airplane instruments malfunctioned while personal mobile devices were or were not in use. Two communication frequencies used by such devices, 450 and 800 MHz, are banned by the FCC in flight (since they are believed to interfere with aircraft communications). The incidents were categorized in the following table:

affected instruments\use of phones    450-MHz    800-MHz    No use

Galvanometers                          5          30          10
Navigation systems                    30          45         100
Pilot communication noises            20          50          80

Test the hypothesis that the use of mobile phones does not interfere with airplane instruments at significance α = 0.05 and also find the P-value.

Answer: R = C = 3 ⇒ c = 9.5 (the 0.95-quantile of χ²_4, 9.488, rounded). q = 27.7 > 9.5 ⇒ H_0 is rejected ⇒ mobile phones do interfere with plane instruments.

Solution. We expand the above table as follows:

Attrib                             B1          B2           B3
instruments\use of phones          450-MHz     800-MHz      No use       A-marginals
A1 Galvanometers                   n_11 = 5    n_12 = 30    n_13 = 10    n_1· = 45
A2 Navigation systems              n_21 = 30   n_22 = 45    n_23 = 100   n_2· = 175
A3 Pilot communication noises      n_31 = 20   n_32 = 50    n_33 = 80    n_3· = 150
B-marginals                        n_·1 = 55   n_·2 = 125   n_·3 = 190   total 370

Furthermore, R = C = 3, n = 370, and

q = n [Σ_{i=1}^3 Σ_{j=1}^3 n_ij² / (n_i· n_·j) − 1]

  = 370 [(1/45)(5²/55 + 30²/125 + 10²/190) + … − 1] = 27.7.

From the table for chi-square, the 0.05-tail (critical value) of χ²_4 is 9.488 < 27.7 ⇒ reject H_0 that attributes A and B are independent. In other words, the use of mobile phones and malfunctions of airplane instruments are related.

Now, from the site

http://stattrek.com/online-calculator/chi-square.aspx

we find that P{χ²_4 ≤ 27.7} ≈ 1, implying P{χ²_4 > 27.7} is very small, so that the true chi-square r.v. (representing the deviation of the joint distribution from the product of the marginal distributions) being so large is highly unlikely. Therefore, we reject the null hypothesis. □

The General Test Procedure. If we have a contingency table


Attribute A\Attribute B    …    Category B_j    …    A-marginals

 ⋮
Category A_i               …    n_ij            …    n_i· = Σ_{j=1}^C n_ij
 ⋮
B-marginals                     n_·j = Σ_{i=1}^R n_ij          total n

Table 2.3

we need to calculate the marginal quantities of outcomes in each category to get the value of q subject to the formula

q = n [Σ_{i=1}^R Σ_{j=1}^C n_ij² / (n_i· n_·j) − 1].

As in the hypotheses testing of section 1 we set

P{χ²_{(R−1)(C−1)} > c} = α (2.7)

and find the value c, which is the (1 − α)-quantile of the χ²_{(R−1)(C−1)} r.v., to obtain the critical region (c, ∞). Then, we count the observed occurrences of pairs (i, j),

n_ij = Σ_{k=1}^n 1_{{(i,j)}}(x_k, y_k), (2.8)

and their marginals

n_i· := Σ_{j=1}^C n_ij and n_·j := Σ_{i=1}^R n_ij, (2.9)

directly from the contingency table. Finally, we reject the null hypothesis H_0 if q > c.

Alternatively, we calculate the P-value

p = P{χ²_{(R−1)(C−1)} > q} (2.10)

and reject H_0 if p turns out to be small.
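
The whole procedure fits in a few lines of R. The function below is our own sketch (its name and interface are ours, not from the text); it implements formula (2.6) together with (2.7) and (2.10).

contingency.q <- function(tab, alpha = 0.05) {
  # tab is an R x C matrix of counts n_ij
  n  <- sum(tab)
  q  <- n * (sum(tab^2 / outer(rowSums(tab), colSums(tab))) - 1)   # formula (2.6)
  df <- (nrow(tab) - 1) * (ncol(tab) - 1)
  c(q = q, crit = qchisq(1 - alpha, df),
    p.value = pchisq(q, df, lower.tail = FALSE))
}
# For the aspirin table of Example 2.2 this returns q = 26.903 and crit = 5.991.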


PROBLEMS

2.1. Show that formula (2.6),

q = n Σ_{i=1}^R Σ_{j=1}^C (n_ij − (1/n) n_i· n_·j)² / (n_i· n_·j),

can be reduced to

q = n [Σ_{i=1}^R Σ_{j=1}^C n_ij² / (n_i· n_·j) − 1].

2.2. A random sample of 795 individuals was collected to investigate whether smoking and drinking alcohol are related. The results were as follows:

                    Heavy Smoker    Moderate Smoker    Nonsmoker

Heavy Drinker       20              62                   6
Moderate Drinker    40               8                 159
Nondrinker          70              90                 340

Test the hypothesis that drinking alcohol and smoking are independent at α = 0.05.

2.3. Suppose 332 people were selected at random and each person in the sample is classified according to blood type, O, A, B, AB. They were also classified according to Rh factor, positive or negative. The observed data are put in the following table:

               O     A     B     AB
Rh positive    92    89    66    19
Rh negative    13    37     7     9

Test the hypothesis that the two classifications of blood types are independent at α = 0.05. Explain your steps, interpret the result, and make a conclusion. Also find the P-value.

Hint: q = 18.322 and P{χ²_3 > 18.322} ≪ 0.005.

2.4. Suppose 300 people were selected at random and each person in the sample is classified according to blood type, O, A, B, AB. They were also classified according to Rh factor, positive or negative. The observed data are put in the following table:

               O     A     B     AB
Rh positive    82    89    54    19
Rh negative    13    27     7     9


Test the hypothesis that the two classifications of blood types are independent at α = 0.05 and also find the P-value.

2.5. A random sample of 2100 death certificates of adults was examined in a large metropolitan area and showed the following results:

Cause of Death         Heavy Smoker    Moderate Smoker    Nonsmoker

Respiratory Disease    55              120                162
Heart Disease          49              388                315
Other                  61              300                650

Test the hypothesis that the cause of death is independent of a person's smoking habit. Make your judgment based on the P-value.

Solution. We expand the above table as follows:

Attrib                 B1              B2                 B3
death cause\habit      Heavy Smoker    Moderate           Non           A-marginals
A1 Respiratory         n_11 = 55       n_12 = 120         n_13 = 162    n_1· = 337
A2 Heart               n_21 = 49       n_22 = 388         n_23 = 315    n_2· = 752
A3 Other               n_31 = 61       n_32 = 300         n_33 = 650    n_3· = 1011
B-marginals            n_·1 = 165      n_·2 = 808         n_·3 = 1127   2100

Furthermore, R = C = 3, n = 2100, and

q = n [Σ_{i=1}^3 Σ_{j=1}^3 n_ij² / (n_i· n_·j) − 1]

  = 2100 [(1/337)(55²/165 + 120²/808 + 162²/1127) + … − 1]

  = 134.12.

From the table for chi-square, the 0.05-tail (critical value) of χ²_4 is 9.488 < 134.12 ⇒ reject H_0 that attributes A and B are independent. In other words, smoking and death caused by respiratory, heart, and other conditions are related.

Now, P{χ²_4 > 134.12} is very small, so that the true chi-square r.v. (representing the deviation of the joint distribution from the product of the marginal distributions) being so large is highly unlikely. Therefore, we reject the null hypothesis. □

2.6. A sample of 300 people was randomly chosen, and each person was identified by gender and political affiliation: Democrat, Republican, or Independent. The results were placed in the following table:


             Democrat    Republican    Independent

Women        68          56            32
Men          52          72            20

Test the hypothesis that gender and political affiliation are independent at α = 0.05. Also find the P-value and interpret the result.

Hint. The expanded table is as follows:

A\B          Democrat    Republican    Independent    Marginals

Women        68          56            32             156
Men          52          72            20             144
Marginals    120         128           52             total 300

This is an R × C = 2 × 3 contingency table. Here q gives the value 6.433, and the critical value c of the chi-square r.v. with 2 d.f. at 5% is 5.991. Since q > c, we reject at 5% significance the null hypothesis that gender and political affiliation are independent.

The P-value is calculated as P{χ²_2 > 6.433} ≈ 0.04. The latter means that we would not reject the null hypothesis at significance levels lower than 4%. □

2.7. A company operates four machines on three separate shifts daily. The following contingency table presents the data, collected during a 6-month time period, concerning the machine breakdowns that resulted.

Shift\Machine    A     B     C     D
1                10    12     6     7
2                10    24     9    10
3                13    20     7    10

Determine whether machine breakdowns are independent of the particular shift, using the P-value argument.

Answer: q = 1.8148. The P-value is p = P{χ²_6 > 1.8148} = 0.9359. Since the P-value is very large, we do not reject the null hypothesis that machine breakdowns are independent of the shifts. □


3. Kolmogorov-Smirnov Goodness-of-Fit Test

In section 1 we studied the chi-square goodness-of-fit test. In this section we will study yet another goodness-of-fit test, which is better suited to continuous distributions, although not as convenient for discrete distributions as the chi-square test.

Suppose we need to figure out what class of distributions a particular population [X] (either continuous or discrete) belongs to. Here we will discuss a hypothesis test on whether or not the unknown PDF (probability distribution function) F of X equals or is close to some hypothetical PDF F⁰.

The idea is to collect and order a sample x_1 < … < x_n, then form an empirical discrete PDF F_n with successive increments 1/n, and compare it with the hypothetical PDF F⁰ through the largest deviation between the two. Andrey Kolmogorov suggested a test statistic which evaluates the goodness of F⁰.

Let X_1, …, X_n ∈ [X] be a sample of continuous r.v.'s drawn from population [X] with a common PDF F. Suppose that after observation, their values are x_1, …, x_n (which are supposed to be all different). We can also assume that

x_1 < … < x_n (3.1)

or just reorder them. We now construct the associated sample PDF, also referred to as the empirical distribution function (EDF) relative to the ordered sample (3.1):

F_n(x) = k/n for x_k ≤ x < x_{k+1}, k = 1, …, n, with x_{n+1} = ∞; F_n(x) = 0 for x < x_1. (3.2)

Based on F_n(x) define the r.v. (random PDF)

F̂_n(x, ω) := (1/n) Σ_{k=1}^n Y_k, (3.3)

where Y_k := 1 if X_k ∈ (−∞, x] and Y_k := 0 if X_k ∉ (−∞, x]; that is, F̂_n(x, ·) is the sample mean Ȳ_n of the r.v.'s Y_1, …, Y_n. The expectation of Ȳ_n is easily shown to equal F(x):

E Ȳ_n = (1/n) Σ_{k=1}^n P{X_k ≤ x} = (1/n) Σ_{k=1}^n F(x) = F(x). (3.5)

By Kolmogorov's version of the Law of Large Numbers,

Ȳ_n → E Y_1 = F(x), for each x, as n → ∞. (3.6)


Thus it makes sense to choose F̂_n as an estimator of F. However, we do not estimate the unknown PDF F, but rather test whether some hypothetical PDF F⁰ fits it.

Therefore, we test the hypotheses:

H_0: F(x) = F⁰(x) (for some PDF called hypothetical)
H_1: F(x) ≠ F⁰(x). (3.8)

To argue that F⁰ is a good fit for F we form the test statistic

T_n := √n sup{|F̂_n(x, ω) − F⁰(x)|: x ∈ ℝ}, (3.10)

referred to as the Kolmogorov-Smirnov statistic.

Kolmogorov and Smirnov showed that if the null hypothesis H_0 is true, then as n → ∞,

P{T_n ≤ t} → H(t) (for each t), (3.13)

where H(t) is the well-known H-PDF, which is tabulated. (See Table 3.2.) As in other hypothesis tests, we set

P{T_n > c | H_0 is true} = 1 − H(c) (3.14)

(for a large n) and we set it equal to a significance level α, i.e.,

P{T_n > c | H_0 is true} = α. (3.15)

From (3.14-15), then,

c = H⁻¹(1 − α) (3.16)

and so the critical region is

C = (c, ∞). (3.17)

For instance, if the significance level is α = 0.05, from Table 3.2 we will have H⁻¹(0.95) ≈ 1.35, since H(1.35) = 0.9478 is closest to 0.95.

As we have already done in other instances of hypotheses testing, we form an empirical PDF F_n (in place of the random F̂_n) and compare it with the hypothetical F⁰ by calculating the norm (the largest distance) d_n of their difference and forming the empirical version

τ_n = √n d_n (3.22)


of the statistic T_n. The value τ_n is then measured up against C, and if it falls into C, we reject the null hypothesis. [We then say that if in place of τ_n there were T_n, it would be unlikely to see T_n greater than c.] Conversely, we do not reject H_0 if τ_n ≤ c.

We cautiously admit that another candidate F⁰ may possibly exist that produces an empirical τ_n less than c. □

The technical part of this can be laid out as the following procedure.

Procedure 1.

Step 1. Take x_1 < … < x_n ordered.

Step 2. Construct

Δ_i⁻ := |F_n(x_i−) − F⁰(x_i)| and Δ_i⁺ := |F_n(x_i) − F⁰(x_i)|. (3.23)

Since F⁰ is monotone increasing and F_n is a step function, constant between successive observations, sup|F_n(x) − F⁰(x)| for x_i ≤ x < x_{i+1} is readily seen to be reached at x_i or x_{i+1}, with values Δ_i⁻ or Δ_i⁺, Δ_{i+1}⁻ or Δ_{i+1}⁺, respectively.

Figure 3.1

Step 3. Find the maximum of the 2n values:

d_n := max{Δ_i⁻, Δ_i⁺; i = 1, …, n} (3.24)

(the empirical value of D_n).

Step 4. Check whether the empirical value τ_n = √n d_n of the Kolmogorov-Smirnov statistic is greater than c = H⁻¹(1 − α).


If so, reject the null hypothesis.

If τ_n ≤ c, then do not reject H_0.

t       H(t)       t       H(t)       t       H(t)       t       H(t)

0.30    0.0000     0.75    0.3728     1.20    0.8878     1.80    0.9969
0.35    0.0003     0.80    0.4559     1.25    0.9121     1.90    0.9985
0.40    0.0028     0.85    0.5347     1.30    0.9319     2.00    0.9993
0.45    0.0126     0.90    0.6073     1.35    0.9478     2.10    0.9997
0.50    0.0361     0.95    0.6725     1.40    0.9603     2.20    0.9999
0.55    0.0772     1.00    0.7300     1.45    0.9702     2.30    0.9999
0.60    0.1357     1.05    0.7798     1.50    0.9778     2.40    1.0000
0.65    0.2080     1.10    0.8223     1.60    0.9880     2.50    1.0000
0.70    0.2888     1.15    0.8580     1.70    0.9938     2.60    1.0000

Table 3.2 (of the H-distribution)

Remark 3.1. It is easily seen that

Δ_i⁻ = |(i − 1)/n − F⁰(x_i)| and Δ_i⁺ = |i/n − F⁰(x_i)|, i = 1, …, n.

Remark 3.2 (The P-Value). The P-value, as we know it from the previous two sections, is defined as

p = P{T > τ_n}. (3.29)

We recall that the P-value is set apart from any significance level α (according to which we derive the critical region C). So, after getting τ_n we substitute it in (3.29) to obtain p. Let us suppose that after the collection of empirical data and evaluation of τ_n, using the table for the H r.v., we have some p.

From (3.29), it is obvious that p is the tail area under the density curve to the right of τ_n, namely over (τ_n, ∞). Clearly, the larger the P-value, the larger this area, and vice versa: the smaller p is, the smaller the area over (τ_n, ∞). If the P-value p turns out to be very small, then we say that there is a very low probability that the real T_n can be larger than τ_n, and thus we reject H_0. However, if p is not small, we say it is quite possible that T can be that "large." Therefore, we will not reject H_0. □

Example 3.1. The time (in seconds) between successive vehicle arrivals at a certain intersection was measured for some period of time and yielded the following:

0.3    0.6    1.0    1.1    1.3    1.8    1.9    2.1    2.3    4.0    5.0
5.5    8.0    8.2    10.0   13.0


(i) Test the hypothesis that these data come from an exponential distribution at the significance level α = 0.05. First find the m.l.e. of the unknown parameter λ, and then test the hypothesis.

(ii) Test the hypothesis that these data come from an exponential distribution with a mean of 6 seconds, and use α = 0.05.

Solution. (i) We have n = 16, the m.l.e. λ̂ = 1/4.13125 (see Problem 1.2, Chapter III), and the hypothetical PDF F⁰(x) = 1 − e^{−x/4.13125}. After calculations (see the attached spreadsheet, as Table 3.3), we arrive at d_16 = 0.135482143. Therefore, τ_16 = √16 · d_16 = 0.5419. The latter is less than 1.36, and hence we do not reject the null hypothesis that the above data come from the exponential distribution with parameter 1/4.13125.

Table 3.3
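
In R, ks.test() reproduces this computation. A minimal sketch (with the caveat, glossed over above, that estimating λ from the same data strictly alters the null distribution of the statistic):

x <- c(0.3, 0.6, 1.0, 1.1, 1.3, 1.8, 1.9, 2.1, 2.3, 4.0, 5.0,
       5.5, 8.0, 8.2, 10.0, 13.0)
ks.test(x, "pexp", rate = 1 / mean(x))   # D = 0.1355, so sqrt(16)*D = 0.542 < 1.36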


PROBLEMS

3.1. Test the hypothesis that the below data come from a uniform distribution on the interval (0, 1):

0.42    0.06    0.88    0.40    0.90    0.38
0.78    0.71    0.57    0.66    0.48    0.35
0.16    0.22    0.08    0.11    0.29    0.79
0.75    0.82    0.30    0.23    0.01    0.41
0.09

Table 3.4

3.2. Test the hypothesis that the below data come from a Gaussian distribution with parameters (μ = 26, σ² = 4):

25.088    26.615    25.468    27.453    23.845
25.996    26.516    28.240    25.980    30.432
25.560    25.844    26.964    23.382    25.282
24.432    23.593    24.644    26.849    26.801
26.303    23.016    27.378    25.351    23.601

Table 3.5


4. Kolmogorov-Smirnov Test for Two Populations

The Kolmogorov-Smirnov method is also useful for discrete-valued r.v.'s and, in the two-population form below, is known as the Kolmogorov-Smirnov test for two samples; the procedure is slightly different. Suppose we are in the hypotheses setting (3.8), only F and F⁰ are now discrete PDF's and, consequently, both F_n and F⁰ are discrete. If the associated hypothetical r.v. X⁰ is valued in a countable set E⁰, then clearly we can restrict E⁰ to a finite set {y_1, …, y_m}, where y_m = min{y_j: y_j ≥ x_n}. This is because past y_m the difference between F_n and F⁰ can only decrease. We thus can assume that E⁰ = {y_1, …, y_m} without loss of generality.

We notice that when constructing the empirical distribution function (EDF) F_n with some x_i's equal, we proceed as follows. If x_{i_1} = … = x_{i_s}, then we set

F_n(x_{i_1}) − F_n(x_{i_1}−) = s/n.

The corresponding Kolmogorov-Smirnov statistic is

T_mn = (mn/(m + n))^{1/2} sup{|F̂_n(x) − Ĝ_m(x)|: x ∈ ℝ}, (4.1)

where Ĝ_m is the EDF of the second sample. Consequently, we need to find the critical region from

P{T_mn > c | H_0} = α. (4.2)

This version of the Kolmogorov-Smirnov Theorem asserts that

lim_{m,n→∞} P{T_mn ≤ t} = H(t).

Procedure 2

Step 1. Take x_1 < … < x_n and y_1 < … < y_m ordered and construct the associated empirical PDF's F_n (for x_1 < … < x_n) and G_m (for y_1 < … < y_m) as in Step 1 of Test Procedure 1.

Step 2. Mix the two sets {x_1, …, x_n} and {y_1, …, y_m} into one and reorder it, denoting it {z_1, …, z_{m+n}}.

Step 3. Construct |F_n(z_k) − G_m(z_k)| for all k = 1, …, m + n.

Step 4. Find

d_mn := max{|F_n(z_k) − G_m(z_k)|: k = 1, …, m + n}, (4.3)

which is the empirical version of D⁰_mn.

Step 5. Check whether the empirical Kolmogorov-Smirnov statistic


τ_mn := d_mn (mn/(m + n))^{1/2} > H⁻¹(1 − α). (4.4)

If so, then reject H_0 at the significance level α.

Figure 4.1

As per Figure 4.1, the maximum d_mn between F_n and G_m occurs at one of the points {z_1, …, z_{m+n}} of Step 2.

Example 4.1. Suppose there are two measurements of aluminum oxide taken from two different archeological sites with Roman-era potteries. Do these findings come from the same period? In other words, if F_n is the sample PDF of site 1 and G_m is the sample PDF of site 2, do they belong to the same class of PDF's?

Table 4.2 below contains the data about the aluminum oxide contents from both sites as x_i's and y_j's. They are reordered, while placed in different columns for convenience. As reordered, they represent the set {z_1, …, z_15}.


n     x_i     m     y_j     F_10    G_5    d_mn

1     10.1                  0.1     0.0
2     10.9                  0.2     0.0
3     11.1                  0.3     0.0
              1     11.5    0.3     0.2
4     12.4                  0.4     0.2
5     12.5                  0.5     0.2
              2     12.7    0.5     0.4
6     13.1                  0.6     0.4
7     13.4                  0.7     0.4
8     13.8                  0.8     0.4    0.4
              3     14.3    0.8     0.6
9     14.6                  0.9     0.6
              4     16.7    0.9     0.8
10    16.9                  1.0     0.8
              5     17.2    1.0     1.0

Table 4.2

The largest difference between F_10 and G_5 is at z_10 = x_8 = 13.8, and it equals d_mn = 0.4. Now, let α = 0.05. Then H⁻¹(0.95) = 1.36 (as already mentioned). Therefore the critical region is

C = (1.36, ∞).

Furthermore,

(mn/(m + n))^{1/2} = (50/15)^{1/2} = 1.826 ⇒ τ_mn = 0.4 · 1.826 = 0.73.

Since 0.73 < 1.36, H_0, that the sites are from the same era, is not rejected at α = 0.05.

On the other hand, H(0.73) ≈ 0.35. Therefore, the P-value is 1 − 0.35 = 0.65, and we will accept H_0 at each α < 0.65. □
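
The same comparison in R, as a sketch (ks.test() reports the unscaled D = d_mn; the scaling by (mn/(m + n))^{1/2} is applied separately, following (4.4)):

x <- c(10.1, 10.9, 11.1, 12.4, 12.5, 13.1, 13.4, 13.8, 14.6, 16.9)  # site 1
y <- c(11.5, 12.7, 14.3, 16.7, 17.2)                                # site 2
d <- ks.test(x, y)$statistic                                        # D = 0.4
sqrt(length(x) * length(y) / (length(x) + length(y))) * d           # 0.73 < 1.36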

Remark 4.1. In testing a discrete distribution, an observed sample x_1, …, x_n need not have all distinct values. In this case we gather the values in k groups (or rather sets) x̂_1, …, x̂_k, where y_1 ∈ x̂_1 = {x_1, …, x_{n_1}}, with |x̂_1| = n_1 (all of x_1, …, x_{n_1} equal), etc. So, |x̂_1| + … + |x̂_k| = n_1 + … + n_k = n is the total size of the sample, now consisting of k distinct groups, each containing identical values. If we construct the corresponding EDF F_n(x), then F_n(y_1) = n_1/n, F_n(y_2) = (n_1 + n_2)/n, and so on, with now y_1 < … < y_k and F_n a monotone nondecreasing step function. □

To illustrate Remark 4.1 consider the following example.


Example 4.2. The data in Table 4.3 below show the frequency counts for n = 400 observations on the number of bacterial colonies within the field of a microscope, using samples of milk film.

Number of Colonies per Field y_i    # Observations n_i

0 = y_0                              56 = n_0
1 = y_1                             104 = n_1
2                                    80
3                                    62
4                                    42
5                                    27
6                                    10
7                                     9
8                                     5
9                                     3
10                                    2

Table 4.3

So, in the first column we place the number of colonies, grouped in each of the 11 rows, beginning with 0 and ending with 10, the largest count among the 400 observations. The second column provides the numbers of observations corresponding to the named groups. For instance, the first group x̂_0 includes a total of 56 observations with no colonies. Thus, y_0 ∈ x̂_0.

We need to test the hypothesis that the data fit a Poisson distribution with some reasonable parameter λ at α = 0.05.

Solution. First we find the total number of the observed colonies:

k = 0·56 + 1·104 + 2·80 + … + 10·2 = 963.

That means x_i = # of colonies in the ith observation, i = 1, …, 56 = n_0 (with x_1 = … = x_56 = 0), and so on. Therefore, the sample mean of the number of colonies (per observation) is

x̄_400 = 963/400 = 2.4075 = λ̂.

Next, we form the EDF F_400 based on the above frequencies and compare it with the genuine hypothetical Poisson PDF with λ = 2.5 (for convenience, in place of 2.4075). The following table shows the calculations:


y_i    n_i    n_i/400    F_400(y_i)    F⁰(y_i)    Δ_i

0      56     0.14       0.14          0.0821     0.0579
1      104    0.26       0.40          0.2873     0.1127
2      80     0.2        0.6           0.5438     0.0562
3      62     0.155      0.755         0.7576     0.0026
4      42     0.105      0.86          0.8912     0.0312
5      27     0.0675     0.9275        0.9580     0.0305
6      10     0.025      0.9525        0.9858     0.0333
7      9      0.0225     0.975         0.9958     0.0208
8      5      0.0125     0.9875        0.9989     0.0114
9      3      0.0075     0.9950        0.9997     0.0047
10     2      0.0050     1.000         0.9999     0.0001

Table 4.4

The supremum norm of the difference between the EDF F_400 and the hypothetical PDF F⁰ is ||F_400 − F⁰||_∞ = 0.1127 = d_400. Using Smirnov's version (as if we compared two samples of size 400 each), we therefore have τ_400 = √(400/2) · 0.1127 = 1.59, which belongs to the critical region at α = 0.05, and therefore the null hypothesis that the data fit the Poisson distribution must be rejected.

If we adjust the data as in Table 4.5 below,

Number of Colonies per Field y_i    # Observations n_i

0 = y_0                              56 = n_0
1 = y_1                              80 = n_1
2                                   100
3                                    66
4                                    42
5                                    27
6                                    10
7                                     9
8                                     5
9                                     3
10                                    2

Table 4.5

we will get


y_i    n_i    n_i/400    F_400(y_i)    F⁰(y_i)    Δ_i

0      56     0.14       0.14          0.0821     0.0579
1      80     0.2        0.34          0.2873     0.0527
2      100    0.25       0.59          0.5438     0.0462
3      66     0.165      0.755         0.7576     0.0026
4      42     0.105      0.86          0.8912     0.0312
5      27     0.0675     0.9275        0.9580     0.0305
6      10     0.025      0.9525        0.9858     0.0333
7      9      0.0225     0.975         0.9958     0.0208
8      5      0.0125     0.9875        0.9989     0.0114
9      3      0.0075     0.9950        0.9997     0.0047
10     2      0.0050     1.000         0.9999     0.0001

Table 4.6

The supremum norm now decreases to ||F_400 − F⁰||_∞ = 0.0579 = d_400. Therefore, τ_400 = √(400/2) · 0.0579 = 0.82, which does not belong to the critical region at α = 0.05, and therefore the null hypothesis that the data fit the Poisson distribution will not be rejected.

Speaking in terms of the P-value, in the latter case we have

p = P{T > τ_400} = 1 − H(0.82) ≈ 1 − 0.48 = 0.52.

Hence it is very likely that the real T is greater than 0.82, giving us reason to believe that the EDF coming from the corresponding readings fits the hypothetical Poisson PDF with parameter 2.5 well. □
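
The adjusted comparison is easy to reproduce in R; a sketch mirroring Table 4.6 (counts from Table 4.5, λ = 2.5 as above):

y  <- 0:10
n  <- c(56, 80, 100, 66, 42, 27, 10, 9, 5, 3, 2)   # adjusted counts, n = 400
Fn <- cumsum(n) / sum(n)                           # EDF at y
F0 <- ppois(y, lambda = 2.5)                       # hypothetical Poisson PDF
d  <- max(abs(Fn - F0))                            # 0.0579, attained at y = 0
sqrt(400 / 2) * d                                  # tau = 0.82 < 1.36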

Problem 4.1. Test, at α = 0.05, the hypothesis that the 25 observations in Table 4.7, selected at random from a distribution whose PDF F is unknown, and the 20 observations in Table 4.8, with an unknown PDF G, are such that F and G are identical:

 0.61     0.29     0.06     0.59    −1.73    −0.74
 0.51    −0.56    −0.39     1.64     0.05    −0.06
 0.64    −0.82     0.31     1.77     1.09    −1.28
 2.36     1.31     1.05    −0.32    −0.40     1.06
−2.47

Table 4.7

 2.20     1.66     1.38     0.20     0.36     0.00
 0.96     1.56     0.44     1.50    −0.30     0.66
 2.31     3.29    −0.27    −0.37     0.38     0.70
 0.52    −0.71

Table 4.8


CHAPTER VII. FURTHER TOPICS IN PARAMETER ESTIMATION

1. Conditional Distribution and Conditional Densities

Conditional Distributions and Densities for Two Random Variables. For two discrete r.v.'s X and Y, using the conditional probability formula

P{X = i | Y = j} = P{X = i, Y = j} / P{Y = j} (1.1)

(such that P{Y = j} ≠ 0), we can define the conditional distribution. Notice that the distribution in the denominator is the marginal of Y.

Using the same principle we can define the conditional probability density function for continuous r.v.'s X and Y. Let (X, Y): Ω → ℝ² be a continuous random vector on a probability space (Ω, F(Ω), P) with pdf (probability density function) f(x, y). We call

f(x|y) := f(x, y) / f_Y(y)

the conditional pdf of X given Y, provided f_Y(y) ≠ 0. Here f_Y(y) is the marginal pdf of the r.v. Y. Analogously,

f(y|x) := f(x, y) / f_X(x)

is the conditional pdf of Y given X, provided f_X(x) ≠ 0.

Conditional densities are genuine pdf's in the sense that they are nonnegative, the integrals of conditional pdf's in the first variable are equal to 1 (as is easily verified), and, if X and Y are independent, the conditional pdf's reduce to the marginal pdf's. We postpone for now the discussion of the regions where the denominators of the conditional densities are zero.

Example 1.1. Suppose a point is randomly selected on the interval (0, 1) and its value is X. Then, another point is randomly selected on the interval (0, X) and its value is Y. We need to find the marginal pdf of the r.v. Y.

Solution. The experiment can be seen from the diagram (Figure 1.1) below:

Figure 1.1


Here we observe that a random selection of a point on the interval (0, 1) means that X is uniformly distributed on (0, 1), i.e. X is standard uniform. Furthermore, we interpret the corresponding pdf of X as its marginal. Thus,

f_X(x) = 1_{(0,1)}(x).

Notice that the positions of X and Y are mutually dependent, as we know from basic probability, where events A and B are either mutually dependent or independent. The position of X seems to be invariant of Y only because, in our interpretation, it is given by its marginal pdf.

Furthermore, the random choice of Y means that the conditional pdf of Y given X is uniform, that is,

f(y|x) = (1/x) 1_{(0,x)}(y) 1_{(0,1)}(x).

Next, we get the joint pdf

f(x, y) = f_X(x) f(y|x) = (1/x) 1_{(0,x)}(y) 1_{(0,1)}(x) = f(y|x),

followed by its integration w.r.t. x:

f_Y(y) = ∫_{x=0}^{1} (1/x) 1_{(0,x)}(y) dx = ∫_{x=y}^{1} (1/x) dx · 1_{(0,1)}(y)

       = −ln y · 1_{(0,1)}(y). (1.2)

The graph of f_Y is depicted in Figure 1.2 below.

Figure 1.2
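
A quick Monte Carlo sanity check of (1.2) in R (a sketch; the sample size and seed are arbitrary):

set.seed(1)
x <- runif(1e5)                      # X ~ U(0, 1)
y <- runif(1e5, min = 0, max = x)    # Y | X = x ~ U(0, x)
hist(y, breaks = 50, freq = FALSE)   # histogram of Y
curve(-log(x), from = 0.01, to = 1, add = TRUE, lwd = 2)  # overlays -ln y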


PROBLEMS

1.1. Let the joint pdf of X and Y be given by

f(x, y) = (12/5) x(2 − x − y) for 0 < x < 1, 0 < y < 1, and 0 elsewhere.

Find the conditional pdf f(x|y) of X given Y.

1.2. Let the joint pdf of X and Y be given by

f(x, y) = e^{−x/y} e^{−y} / y for 0 < x < ∞, 0 < y < ∞, and 0 elsewhere.

Find the conditional pdf f(x|y) and P{X > 1 | Y = y}.

1.3. Show that the integrals of conditional pdf's in the first variable are equal to 1 and that, if X and Y are independent, the conditional pdf's reduce to the marginal pdf's.

1.4. A small water-supply reservoir holds a random amount Y at the beginning of a month and dispenses a random amount X during the month, with measurements in thousands of gallons. It is not resupplied during the month, thus making X ≤ Y. Suppose that the joint density of X and Y is

f(x, y) = 1/2 for 0 < x ≤ y ≤ 2, and 0 elsewhere.

Find the conditional density of X given Y = y.

Solution. We use the formula f(x|y) = f(x, y)/f_Y(y). Thus we need to find the marginal pdf f_Y:

f_Y(y) = ∫_{x=−∞}^{∞} f(x, y) dx = ∫_{x=0}^{y} (1/2) dx · 1_{(0,2]}(y) = (y/2) 1_{(0,2]}(y).

Therefore,

f(x|y) = 1/y for 0 < x ≤ y ≤ 2, and f(x|y) = 0 elsewhere. □


2. Bayesian Analysis

The Bayesian Likelihood Function. We continue our discussion of conditional densities, now in connection with a very important and useful statistical tool, Bayesian analysis.

In the context of parameter estimation started in section 1, Chapter III, we dealt with the likelihood function

f_n(x_n | θ) = f_n(x_1, …, x_n | θ) = f(x_1 | θ) ··· f(x_n | θ) (2.1)

of a sample X_n = (X_1, …, X_n) with respect to an unknown parameter θ, assuming no prior knowledge of θ. From the "Bayesian point of view," the parameter is now treated as a r.v. Θ with some prior marginal pdf denoted by f_Θ(θ) =: f(θ) and called the prior pdf of Θ.

We will agree on the following notation. The capital Greek letter Θ, representing the unknown parameter, will stand for the r.v. The associated parameter variable in the density will be denoted by the lower-case letter θ. In light of this, the likelihood function in (2.1) above will be rewritten as

f_n(x_n | θ) = f_n(x_1, …, x_n | θ) = f(x_1 | θ) ··· f(x_n | θ), (2.2)

in which the joint pdf of the sample, as well as the marginal pdf's of the r.v.'s X_1, …, X_n, will be regarded as conditional pdf's given the random parameter Θ = θ.

Parametric Conditional Distributions. In the forthcoming situations we most often deal with discrete distributions that depend on a real-valued random parameter which is a continuous r.v. For example, for a binomial r.v. with parameters (n, P) we can assume that the parameter P is a continuous r.v. distributed in the interval (0, 1). Then the distribution b(n, p; x) = C(n, x) p^x (1 − p)^{n−x} can be interpreted as the conditional distribution P{X = x | P = p} of a "mixed type" (where X is a discrete and P is a continuous r.v.).

In another example, if X is a Poisson r.v. with an unknown parameter λ, we interpret λ as an "empirical value" of the random parameter Λ and then regard e^{−λ} λ^x / x! as the conditional distribution P{X = x | Λ = λ} = f(x|λ).

Posterior Distribution of the Parameter Θ. We target the conditional distribution of the parameter Θ given X_1 = x_1, …, X_n = x_n. This will calibrate the prior density f(θ) of the random parameter Θ after we draw a sample from the population [X]; we call the density so obtained posterior. In other words, we are interested in f(θ | x_1, …, x_n), which we can calculate using the conditional density formula from section 1:

f(θ | x_1, …, x_n) = f(θ | x_n) = f(x_1, …, x_n, θ) / g(x_1, …, x_n), (2.3)


where 1B" ß á ß B8  stands for the marginal distribution of the sample \" ß á ß \8 and
0 ÐB" ß á ß B8 ß *Ñ is the joint pdf of the sample and parameter ). The joint density 0 ÐB" ß á ß B8 ß *Ñ
can be found by a mere multiplication of the likelihood function 08 of (2.2) and the prior 0, in
accordance with the conditional density formula (of section 1), i.e.,

0 ÐB" ß á ß B8 ß *Ñ œ 08 ÐB" ß á ß B8 l*Ñ0*. (2.4)

As to the marginal distribution 1 of the sample, in order to get it we should undergo integration
in * (as in any case whenever the marginal density is required), but this unwanted procedure will
be avoided through a simple recipe that applies to many special cases.

Beta and Gamma Random Variables. A r.v. ) À H Ä Ð!ß "Ñ is said to be beta with parameters
α  ! and "  ! if its pdf is

>Ðα" Ñ α"
0Ð*lαß " Ñ œ >ÐαÑ>Ð" Ñ * Ð"  *Ñ"" 1Ð!ß"Ñ Ð*Ñß (2.5)

where >ÐαÑß α  !, is the gamma function that reduces to α  "x if α is a positive integer. The
mean of ) is
α
IÒ)Ó œ α " . (2.6)

If α œ " œ ", then the beta r.v. readily reduces to the standard uniform r.v.

In section 3, Chapter II, we introduced the gamma r.v. The student is referred to this section, but
for consistency we revisit it.

A r.v. ) is said to be gamma with parameters α and " if its pdf is



0Ð*lαß " Ñ œ >ÐαÑ * / 1‘ (*),
α" "*
(2.7)

where >ÐαÑß α  !, is the gamma function.

With α œ ", (2.7) reduces to the exponential pdf:

0Ð*l-Ñ œ -/-* 1‘ (*).

More properties of gamma r.v. are:

I )  œ α
" (2.8)

and the mgf of a gamma r.v. is 7Ð>Ñ œ  "">  .


α
(2.9)

The Proportion of Defective Items Revisited. We revisit Example 1.1 of section 1, Chapter III,
about the unknown proportion of defective items. Only now the unknown parameter is random.


Example 2.1. Let Θ be the proportion of defective items in a large manufactured lot, which is unknown, but suppose its prior distribution is uniform in (0, 1). (Note that if we do not know a prior of some proportion, we can always assume that it is standard uniform.) Let a random sample of n items, X_1, …, X_n, be drawn. X_1, …, X_n are independent Bernoulli r.v.'s with the parameter Θ unknown. Then their sum Z := X_1 + … + X_n, as we recall (Example 8.3, Chapter II), is a binomial r.v. with parameters (n, Θ), where Θ is random.

Using formula (1.2), section 1, Chapter III, the conditional density f(x|θ) of X_k can be written as

f(x|θ) = θ^x (1 − θ)^{1−x} 1_{(0,1)}(θ) for x = 0, 1, and 0 otherwise, (2.10)

and thus the likelihood function is

f_n(x_n | θ) = θ^k (1 − θ)^{n−k} 1_{(0,1)}(θ), (2.11)

with k = x_1 + … + x_n (the number of defective items in the empirical sample, an integer).

The prior density of the parameter Θ is, by assumption, standard uniform:

f(θ) = 1_{(0,1)}(θ). (2.12)

Hence, the joint density function is

f(x_n, θ) = f_n(x_n | θ) f(θ) = θ^k (1 − θ)^{n−k} 1_{(0,1)}(θ), (2.13)

with the posterior pdf of Θ,

f(θ | x_n) = f_n(x_n | θ) f(θ) / g(x_n) = [1/g(x_n)] θ^k (1 − θ)^{n−k} 1_{(0,1)}(θ). (2.14)

This must belong to the family of beta densities. Here is why. Rewrite (2.14) as

f(θ | x_n) ∼ θ^{k+1−1} (1 − θ)^{n−k+1−1} 1_{(0,1)}(θ), (2.15)

which, except for the missing constant factor 1/g(x_n), fits (2.5). We identify the parameters α as k + 1 and β as n − k + 1. Hence, the vacant constant (relative to θ) 1/g(x_n) should thus be

1/g(x_n) = Γ(α + β) / (Γ(α)Γ(β)) = Γ(n + 2) / (Γ(k + 1)Γ(n − k + 1)) = (n + 1)! / (k!(n − k)!), (2.16)

since k is also an integer. Finally,


f(θ | x_n) = [(n + 1)! / (k!(n − k)!)] θ^{k+1−1} (1 − θ)^{n−k+1−1} 1_{(0,1)}(θ). (2.17)

This is the technique that allows us to get the posterior density without integration of f_n(x_n | θ) f(θ). Notice that the idea of "guessing" a constant factor for a posterior pdf rests on the fact that any two pdf's of the form f(θ) = a h(θ) and g(θ) = b h(θ) must have a = b. This is very easy to show through integration of both densities. □
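
A one-glance R sketch of (2.17): under a uniform prior, n observed Bernoulli trials with k defectives give a Beta(k + 1, n − k + 1) posterior (the particular n and k below are ours, anticipating Problem 2.2):

n <- 10; k <- 1
curve(dbeta(x, k + 1, n - k + 1), from = 0, to = 1,
      xlab = "theta", ylab = "posterior density")   # Beta(2, 10) posterior
(k + 1) / (n + 2)                                   # its mean, about 0.167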

Example 2.2. Suppose we need to reevaluate the proportion of defective items and that, from the past information we had, 10% defective items occurred with probability 0.7 and 20% defective items occurred with probability 0.3. Suppose 8 items were selected and 2 of them turned out to be defective. We need to find the posterior distribution of defective items based on this sample.

Solution. In this example we first identify

f(θ) = 0.7 · 1_{{0.1}}(θ) + 0.3 · 1_{{0.2}}(θ) (2.18)

as the prior distribution of defective items. Now, the defective items X_1, X_2, … are independent Bernoulli r.v.'s with the conditional density as in (2.10), while the conditional density of the sample (i.e., the likelihood function) is

f_n(x_n | θ) = θ^k (1 − θ)^{n−k} 1_{(0,1)}(θ),

where k = 2 and n = 8. [Notice that the likelihood function in the last expression depends on the data explicitly only through the sum k = 2.] Thus, in this case, the conditional density of the sample is

f_8(x_8 | θ) = θ² (1 − θ)⁶ 1_{(0,1)}(θ) (2.19)

and consequently the joint density function of the parameter Θ and of the sample is

f_8(x_8, θ) = θ² (1 − θ)⁶ f(θ), (2.20)

with f(θ) satisfying (2.18) and θ taking on the values 0.1 or 0.2. Now we need to find the marginal density g(x_8) using an analog of the total probability formula, i.e. the discrete analog of the integral of the joint density,

Σ_θ f_n(x_1, …, x_n | θ) f(θ) = g(x_1, …, x_n) = g(x_n),

to have g(x_8) = f_8(x_8 | θ = 0.1) f(0.1) + f_8(x_8 | θ = 0.2) f(0.2)

             = 0.1² (1 − 0.1)⁶ · 0.7 + 0.2² (1 − 0.2)⁶ · 0.3.

The posterior distribution is then


f(θ = 0.1 | x_8) = [0.1² (1 − 0.1)⁶ · 0.7] / [0.1² (1 − 0.1)⁶ · 0.7 + 0.2² (1 − 0.2)⁶ · 0.3] = 0.5418

and

f(θ = 0.2 | x_8) = [0.2² (1 − 0.2)⁶ · 0.3] / [0.1² (1 − 0.1)⁶ · 0.7 + 0.2² (1 − 0.2)⁶ · 0.3] = 0.4582.

Notice that f(θ = 0.1 | x_8) + f(θ = 0.2 | x_8) = 1. Thus, after the observations, the correction to the prior distribution (which was 0.7 and 0.3) is 0.54 and 0.46, respectively. Also, the above was nothing other than the conventional Bayes formula. □

Example 2.3. Under the conditions of Example 2.1, assume that the prior of the proportion Θ of defective items is beta with parameters α and β (instead of standard uniform). Suppose a sample of n items was drawn and that k = x_1 + … + x_n were defective. Find the posterior pdf of Θ.

Solution. We can write the conditional density f(x|θ) of each r.v. X_k as

f(x|θ) = θ^x (1 − θ)^{1−x} 1_{(0,1)}(θ) for x = 0, 1, and 0 otherwise. (2.21)

Thus the likelihood function of the sample is

f_n(x_n | θ) = θ^k (1 − θ)^{n−k} 1_{(0,1)}(θ). (2.22)

Since the prior of Θ is beta with the pdf

f(θ) = [Γ(α + β) / (Γ(α)Γ(β))] θ^{α−1} (1 − θ)^{β−1} 1_{(0,1)}(θ), (2.23)

we have the joint pdf

f(x_n, θ) = [Γ(α + β) / (Γ(α)Γ(β))] θ^{α+k−1} (1 − θ)^{β+n−k−1} 1_{(0,1)}(θ) (2.24)

and thus the posterior of Θ is

f(θ | x_n) ∼ θ^{α+k−1} (1 − θ)^{β+n−k−1} 1_{(0,1)}(θ). (2.25)

We easily figure out that the missing constant is

Γ(α + β + n) / (Γ(α + k)Γ(β + n − k)),

and thus

f(θ | x_n) = [Γ(α + β + n) / (Γ(α + k)Γ(β + n − k))] θ^{α+k−1} (1 − θ)^{β+n−k−1} 1_{(0,1)}(θ) (2.26)


is again a beta density, with parameters α + k and β + n − k. □

Example 2.4 (Example 2.3 revisited). Under the conditions of Example 2.3, suppose that the prior of Θ is beta with parameters α_0 and β_0, i.e.

f_0(θ) = [Γ(α_0 + β_0) / (Γ(α_0)Γ(β_0))] θ^{α_0−1} (1 − θ)^{β_0−1} 1_{(0,1)}(θ). (2.27)

We draw the first sample x_1, …, x_{n_1}, in which we assume that k_1 items were defective. Hence the likelihood function of the sample should be

f_{n_1}(x_{n_1} | θ) = θ^{k_1} (1 − θ)^{n_1−k_1} 1_{(0,1)}(θ) (2.28)

and the joint pdf of X_{n_1} and Θ is the product of (2.27) and (2.28):

f(x_{n_1}, θ) = f_{n_1}(x_{n_1} | θ) f_0(θ)

  = θ^{k_1} (1 − θ)^{n_1−k_1} [Γ(α_0 + β_0) / (Γ(α_0)Γ(β_0))] θ^{α_0−1} (1 − θ)^{β_0−1} 1_{(0,1)}(θ)

  = [Γ(α_0 + β_0) / (Γ(α_0)Γ(β_0))] θ^{α_0+k_1−1} (1 − θ)^{β_0+n_1−k_1−1} 1_{(0,1)}(θ). (2.29)

Dropping the constant Γ(α_0 + β_0) / (Γ(α_0)Γ(β_0)), we arrive at the posterior f(θ | x_{n_1}),

f(θ | x_{n_1}) ∼ θ^{α_0+k_1−1} (1 − θ)^{β_0+n_1−k_1−1} 1_{(0,1)}(θ), (2.30)

without its multiplicative constant, figuring that it is beta with parameters α_1 = α_0 + k_1 and β_1 = β_0 + n_1 − k_1, and thus concluding that the posterior pdf is

f_1(θ | x_{n_1}) = [Γ(α_0 + β_0 + n_1) / (Γ(α_0 + k_1)Γ(β_0 + n_1 − k_1))]

  × θ^{α_0+k_1−1} (1 − θ)^{β_0+n_1−k_1−1} 1_{(0,1)}(θ). (2.31)

Suppose now that we conduct yet another experiment (say, experiment 2) by drawing another sample y_1, …, y_{n_2} in which k_2 = y_1 + … + y_{n_2} items are defective. Using the posterior f_1(θ | x_{n_1}) of experiment 1 as a prior for experiment 2 to go to the second round, we have the likelihood function of the sample,

f_{n_2}(y_{n_2} | θ) = θ^{k_2} (1 − θ)^{n_2−k_2} 1_{(0,1)}(θ), (2.32)

and the joint pdf of Y_{n_2} and Θ is the product of (2.31) and (2.32):

f(y_{n_2}, θ) = f_{n_2}(y_{n_2} | θ) f_1(θ | x_{n_1})


= θ^{k_2} (1 − θ)^{n_2−k_2} [Γ(α_0 + β_0 + n_1) / (Γ(α_0 + k_1)Γ(β_0 + n_1 − k_1))]

  × θ^{α_0+k_1−1} (1 − θ)^{β_0+n_1−k_1−1} 1_{(0,1)}(θ)

= [Γ(α_0 + β_0 + n_1) / (Γ(α_0 + k_1)Γ(β_0 + n_1 − k_1))]

  × θ^{α_0+k_1+k_2−1} (1 − θ)^{β_0+n_1+n_2−k_1−k_2−1} 1_{(0,1)}(θ). (2.33)

Dropping the constant Γ(α_0 + β_0 + n_1) / (Γ(α_0 + k_1)Γ(β_0 + n_1 − k_1)), we arrive at the posterior f(θ | y_{n_2}):

f(θ | y_{n_2}) ∼ θ^{α_0+k_1+k_2−1} (1 − θ)^{β_0+n_1+n_2−k_1−k_2−1} 1_{(0,1)}(θ), (2.34)

without its multiplicative constant, figuring that the second posterior is also beta, with parameters α_2 = α_0 + k_1 + k_2 and β_2 = β_0 + n_1 + n_2 − k_1 − k_2; thus,

f_2(θ | y_{n_2}) = [Γ(α_0 + β_0 + n_1 + n_2) / (Γ(α_0 + k_1 + k_2)Γ(β_0 + n_1 + n_2 − k_1 − k_2))]

  × θ^{α_0+k_1+k_2−1} (1 − θ)^{β_0+n_1+n_2−k_1−k_2−1} 1_{(0,1)}(θ). (2.35)

Bayes Estimates and Estimators. We will briefly introduce the notion of the Bayes estimator with a minimum of rigor. For further details the student is referred to sections 4 and 5.

Given two r.v.'s X and Θ and the conditional pdf f(θ|x), we define the conditional expectation of the r.v. Θ given X = x as

E[Θ | X = x] = ∫_{θ=−∞}^{∞} θ f(θ|x) dθ.

In our case, replacing X with the vector X_n yields

E[Θ | X_n = x_n] = ∫_{θ=−∞}^{∞} θ f(θ | x_n) dθ. (2.36)

We will call E[Θ | X_n = x_n] the Bayes estimate of the parameter θ.

Example 2.5. To obtain the Bayes estimates in the above examples, all we need is to copy the formula for the associated expectation and adjust its parameters. For instance, using the formula E[Θ] = α/(α + β) for the expectation of a beta r.v. Θ, in the context of Example 2.1, in which α = k + 1 and β = n − k + 1, we get the Bayes estimate of Θ as

E[Θ | x_n] = (k + 1)/(n + 2) = (x̄_n + 1/n)/(1 + 2/n), (2.37)


after dividing the numerator and denominator by n; this approaches the m.l.e. x̄_n for large n. Evidently, the Bayes estimate is more accurate than the corresponding m.l.e. x̄_n of θ from section 1, Chapter III, provided we know a prior of Θ.

Now, if in formula (2.37) we replace x̄_n with X̄_n, we will have the so-called Bayes estimator of Θ, which formally is warranted by the replacement of x_n with the capital X_n in the conditional expectation formula. Thus we have

E[Θ | X_n] = (X̄_n + 1/n)/(1 + 2/n). (2.38)

Example 2.6. From Example 2.3, with the posterior

f(θ | x_n) = [Γ(α + β + n) / (Γ(α + k)Γ(β + n − k))] θ^{α+k−1} (1 − θ)^{β+n−k−1} 1_{(0,1)}(θ)

belonging to the beta family of r.v.'s with parameters α + k and β + n − k, the Bayes estimate is

E[Θ | x_n] = (α + k)/(α + β + n) = (x̄_n + α/n)/(1 + (α + β)/n) (2.39)

and the Bayes estimator is

E[Θ | X_n] = (X̄_n + α/n)/(1 + (α + β)/n). (2.40)

Example 2.7. We use an R program to generate a first sample of 100 Bernoulli r.v.'s with parameter p = 0.1. Because we need only their sum k, we generate a single binomial r.v. from the class (100, 0.1). We first obtain the maximum likelihood estimate x̄_100 and then the Bayes estimate E[Θ | x_n] = (k + 1)/(n + 2) of formula (2.37). Then we use the resulting beta posterior as the new prior and draw a second sample of 100 Bernoulli r.v.'s with p = 0.1 (again through their binomial sum) to calculate a new, improved Bayes estimate E[Θ | x_n] = (α + k)/(α + β + n) by formula (2.39). (The bracketed outputs below show two sample runs.)

# prior is uniform
# generate a sample of 100 defective items with p = 0.1
n1 <- 100
# maximum likelihood estimate of the sample
sigma0 <- rbinom(1, n1, prob = 0.1)
MaxLik <- sigma0 / n1
print(MaxLik)
# [1] 0.13         [1] 0.07
# obtain the Bayes estimate (2.37); the posterior is Beta(alpha1, beta1)
alpha1 <- sigma0 + 1
beta1  <- n1 - sigma0 + 1
BayesEst1 <- alpha1 / (n1 + 2)
print(BayesEst1)
# [1] 0.1372549    [1] 0.07843137
# generate a second sample to improve the first Bayes estimate, as in (2.39)
n2 <- 100
sigma1 <- rbinom(1, n2, prob = 0.1)
alpha2 <- alpha1 + sigma1
BayesEst2 <- alpha2 / (alpha1 + beta1 + n2)
print(BayesEst2)
# [1] 0.1052632    [1] 0.0990099


PROBLEMS

2.0. Suppose X_n is drawn from an exponential population with an unknown parameter Λ > 0 and the prior of Λ is exponential with parameter β. Show that the posterior of Λ is a gamma r.v. with parameters (n + 1, β + Σ_{i=1}^n x_i).

2.1. If X_n is drawn from an exponential population with parameter Λ > 0 and the prior of Λ is gamma with parameters (α, β), then show that the posterior of Λ is also gamma, with parameters (α + n, β + Σ_{i=1}^n x_i).

2.2. Let Θ be the proportion of defective items in a large manufactured lot, which is unknown, but whose prior distribution is supposed to be uniform in (0, 1). Let a random sample of 10 items, X_10 = (X_1, …, X_10), be drawn, and suppose just one of them turned out to be defective. Find the posterior density f(θ | x_10) = f(θ | x_1, …, x_10).

2.3. Suppose that, as in Problem 2.2, a sample of (n_1 =) 10 items from a Bernoulli population was drawn and exactly one item turned out to be defective (k_1 = 1). In Problem 2.2 we assumed that the prior pdf was standard uniform (i.e. a special case of beta with parameters α_0 = β_0 = 1). Now assume that a new sample of n_2 = 14 items was drawn from the same population and that k_2 = 4 of them appeared to be defective. Show that the posterior density after the second draw is beta and find its parameters.

2.4. Suppose that in the estimation of an unknown parameter Θ, the proportion of defective items, the prior of Θ is known to be a beta density with parameters (α, β). Give the Bayes estimator of Θ.

2.5. Under the conditions of Problem 2.2, with an unknown parameter Θ, the proportion of defective items, the prior density of Θ was assumed to be standard uniform. With a sample of 10 items drawn, of which one was defective, give the Bayes estimate of Θ.

2.6. In light of Examples 2.4 and 2.5, suppose in the second round of the experiment with the proportion of defective items, 14 items were drawn, of which 4 were defective. Find the Bayes estimate of the proportion of defective items after the second experiment.


3. Conjugacy of Families of Random Variables

A family 𝒫 of distributions is called conjugate (for a population [X] with pdf f(x|θ)) if, whenever the random parameter Θ has a prior f(θ) belonging to 𝒫, its posterior after n observations (for any n),

f(θ | x_n) = f_n(x_n | θ) f(θ) / g(x_n), (3.1)

also belongs to 𝒫.

To understand the concept of this definition better, we consider a few examples.

Example 3.1. Show that the family 𝒫 of beta pdf's is conjugate for a Bernoulli r.v. with parameter Θ ∈ (0, 1).

Solution. Suppose first that X is a Bernoulli r.v. with unknown parameter Θ ∈ (0, 1) (previously interpreted as a proportion of defective items). We know from Example 2.1 that if Θ is uniform (which is a member of the beta family), then the posterior is beta, as per (2.17):

f(θ | x_n) = [(n + 1)! / (k!(n − k)!)] θ^{k+1−1} (1 − θ)^{n−k+1−1} 1_{(0,1)}(θ). (3.2)

However, we need to prove it for a more general class of priors to justify the conjugacy of 𝒫. In other words, we now assume that the prior of Θ is not just uniform, but an arbitrary beta with parameters α and β:

f(θ) = [Γ(α + β) / (Γ(α)Γ(β))] θ^{α−1} (1 − θ)^{β−1} 1_{(0,1)}(θ). (3.3)

Recall that the likelihood function of an n-sample in Example 2.1 was

f_n(x_n | θ) = θ^k (1 − θ)^{n−k} 1_{(0,1)}(θ), (3.4)

with k = x_1 + … + x_n. Thus the posterior pdf will be

f(θ | x_n) ∼ θ^k (1 − θ)^{n−k} 1_{(0,1)}(θ) × θ^{α−1} (1 − θ)^{β−1} (3.5)

(the right-hand side of (3.5) being the "principal part" of the density), in which the constant factor Γ(α + β)/(Γ(α)Γ(β)) of (3.3) and 1/g(x_n) of (3.1) are ignored. We rewrite (3.5) by regrouping the factors as

f(θ | x_n) ∼ θ^{α+k−1} (1 − θ)^{β+n−k−1} 1_{(0,1)}(θ),

thus figuring out that f(θ | x_n) is beta with parameters (α + k, β + n − k). Consequently, we say that the beta family (of priors) is conjugate for the Bernoulli family

f(x|θ) = θ^x (1 − θ)^{1−x} 1_{(0,1)}(θ) for x = 0, 1, and 0 otherwise, (3.6)


as per (2.17), for θ ∈ (0, 1). □

Example 3.2. Show that the gamma family is conjugate for the Poisson family.

Solution. According to (2.7), the gamma prior is

f(θ) = [β^α / Γ(α)] θ^{α−1} e^{−βθ} 1_{ℝ₊}(θ). (3.7)

Now, the Poisson "density" can be written as

f(x|θ) = e^{−θ} θ^x / x! · 1_{{0,1,…}}(x). (3.8)

Notice that, as in section 1, this conditional density is a mixture of discrete (in x) and continuous (in θ). (3.8) yields the likelihood function

f_n(x_n | θ) = e^{−nθ} θ^k / (x_1! ··· x_n!) (x_i integer), (3.9)

where

k = x_1 + … + x_n. (3.10)

Recalling (3.1), multiplying (3.9) by (3.7), and ignoring constants, we have the principal part of the posterior,

e^{−(n+β)θ} θ^{α+k−1},

i.e. f(θ | x_n) ∼ e^{−(n+β)θ} θ^{α+k−1}, (3.11)

which we figure out is gamma with parameters (α + k, β + n). The rest is obvious. □

Example 3.3 (Example 1.4 revisited). Recall that in Example 1.4, the number of connections to a wrong phone number was modeled by a Poisson distribution. Suppose we need to estimate the parameter Θ (denoted there by λ) of that distribution (the mean number of wrong connections) by observing a sample x_1, …, x_n of wrong connections on n different days. Only now, we assume that the prior f(θ) of Θ is gamma distributed with parameters α and β. From Example 3.2, it follows that the posterior pdf f(θ | x_1, …, x_n) is gamma with parameters (α + k, β + n), where k = x_1 + … + x_n ≥ 0. □
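
A short R illustration of the gamma-Poisson conjugacy (a sketch; the prior parameters and the true rate below are hypothetical):

set.seed(2)
alpha <- 2; beta <- 3            # hypothetical Gamma(alpha, beta) prior
x <- rpois(10, lambda = 2)       # simulated counts for n = 10 days
k <- sum(x); n <- length(x)
# posterior is Gamma(alpha + k, beta + n); its mean is the Bayes estimate
(alpha + k) / (beta + n)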

Problem 3.1. The number of connections to a wrong phone number is modeled by a Poisson distribution. Suppose the prior of its parameter Λ is known to be gamma with parameters α = 2 and β = 3. As in section 1, Chapter III, we need to estimate Λ (the mean number of wrong connections) by observing a sample x_1, …, x_n of wrong connections on n different days. Applying the results of Examples 3.2 and 3.3, find the Bayes estimate of Λ, using the fact that on 10 different days there was a total of 20 wrong connections.


4. The Conditional Expectation

To continue with the Bayes method we need the notion of conditional expectation.

DISCRETE CASE

Definition 4.1. Let X and Y be two r.v.'s defined on Ω and valued in a subset Ω′ of the real numbers. Let h: Ω′ → ℝ be a function of the r.v. Y. We assume that the r.v.'s X and Y are discrete.

The conditional expectation of the r.v. h(Y) given the event {X = x} is defined as

g(x) := E[h(Y) | X = x] = Σ_i h(y_i) P{Y = y_i | X = x} (4.1)

if Y is discrete.

If we formally drop the little x, having E[h(Y) | X], the latter will become a function of the r.v. X, say g(X). □

Conditioning on X means that in place of one single event {X = x} in expression (4.1) we now have a multitude of events generated by the r.v. X.

Example 4.1. Let X_1, X_2, … be a sequence of iid r.v.'s drawn from [X], each with a common mean μ. Suppose we need to find the mean of their sum S_n = X_1 + … + X_n, which is E S_n = nμ.

Now, if we need to calculate the conditional mean of a random sum of them, say S_N = X_1 + … + X_N given N, then it will be

E[S_N | N] = Nμ, (4.2)

whereas E[S_N | N = n] = E S_n = nμ.

Now, if X_1, X_2, … are geometric with a common parameter p and N is an integer-valued r.v. independent of the X_i's, then, by (4.2),

E[S_N | N] = N/p. (4.3)

Example 4.2. Let X_1, X_2, … be a sequence of iid r.v.'s drawn from [X], each with a common pgf g(z) = E z^X. Suppose we need to find the distribution of their sum S_n = X_1 + … + X_n expressed in the pgf form G(z) = E z^{S_n}. We have, by Proposition 4.2, Chapter II,

G(z) = (E z^{X_1}) ··· (E z^{X_n}) = g(z)^n. (4.4)


Now, if we need to calculate the pgf of a random sum of them, say X_1 + … + X_N, where the X_i's are geometric with a common parameter p and N is an integer-valued r.v. independent of the X_i's, then, by (4.4),

E[z^{X_1+…+X_N} | N] = (pz/(1 − qz))^N, where q = 1 − p. (4.5)

Notice that the conditional expectation (pz/(1 − qz))^N in (4.5) is a function of the r.v. N.

Property 4.1. (The Law of Double Expectation, or Iterated Conditioning.) Let X and Y be two discrete-valued r.v.'s and let h be a function. Then it holds that

E[E[h(Y) | X]] = E[h(Y)]. (4.6)

(See Problem 4.1.) □

Example 4.3. Suppose we need to calculate the pgf of a random sum of the r.v.'s from Example 4.1, X_1 + … + X_N, where the X_i's are independent and geometrically distributed and N is also a geometrically distributed r.v. independent of the X_i's, with parameter π > 0. Then, by equation (4.6), we have

E[z^{X_1+…+X_N}] = E[E[z^{X_1+…+X_N} | N]]

  = E[(pz/(1 − qz))^N] = πy/(1 − (1 − π)y), where y = pz/(1 − qz).

After simple algebra, we find that

E[z^{X_1+…+X_N}] = πpz/(1 − (1 − πp)z), (4.7)

which indicates that X_1 + … + X_N is, surprisingly, also a geometric r.v., with parameter πp. □
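
A quick simulation check of (4.7) in R (a sketch; note that rgeom() counts failures, so we add 1 to match the support {1, 2, …} used here):

set.seed(3)
p <- 0.4; ppi <- 0.3                  # `ppi` avoids masking R's constant pi
S <- replicate(1e5, {
  N <- rgeom(1, ppi) + 1              # N ~ geometric(ppi) on {1, 2, ...}
  sum(rgeom(N, p) + 1)                # sum of N geometric(p) terms
})
c(sim.mean = mean(S), theory = 1 / (ppi * p))   # both near 8.33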

CONTINUOUS CASE

The notion of the conditional expectation introduced for discrete r.v.'s now will be extended for
continuous and arbitrary cases.

Definition 4.2. Let ÐHß Y ÐHÑß T Ñ be a probability spaceß \ and ] be two random variables on
this probability space and 2 be a Borel measurable function. The conditional expectation of a
r.v. 2Ð] Ñ given the event Ö= À \Ð=Ñ œ B× is defined as

IÒ2Ð] Ñl\ œ BÓ œ Cœ∞ 2ÐCÑ0 ÐClBÑ.C,



(4.8)

where 0 ÐClBÑ is the conditional pdf defined in section 1.


Recall that in the discrete case,

E[h(Y) | X = x] = Σ_i h(y_i) P{Y = y_i | X = x}.

In both cases, these are functions of the variable x. If we formally drop x, having E[h(Y) | X], the latter will become a function of the r.v. X, say g(X). □

Example 4.4. Suppose we need to calculate the mgf of a random sum of r.v.'s, say X_1 + … + X_N, where the X_i's are iid exponential and N is an integer-valued r.v. independent of the X_i's. Then

E[e^{θ(X_1+…+X_N)} | N] = (λ/(λ − θ))^N. (4.9)

(See Example 4.5 for the conclusion of this case.) □

Property 4.2 below is a continuous analog of Property 4.1.

Property 4.2. (The Law of Double Expectation.) Let X and Y be two continuous r.v.'s and let h be a Borel measurable function. Then it holds that

E[E[h(Y) | X]] = E[h(Y)]. (4.10)

Indeed, E[E[h(Y) | X]] = E[g(X)] = ∫_x g(x) f_X(x) dx

  = ∫_x E[h(Y) | X = x] f_X(x) dx

  = ∫_x [∫_y h(y) f(y|x) dy] f_X(x) dx

  = ∫_y ∫_x h(y) f(y|x) f_X(x) dx dy

  = ∫_y h(y) [∫_x f(y|x) f_X(x) dx] dy (the inner integrand being f(x, y))

  = ∫_y h(y) f_Y(y) dy = E[h(Y)]. □

Example 4.5. Suppose we need to calculate the mgf of a random sum of r.v.'s, say X_1 + … + X_N, where the X_i's are iid exponential and N is a geometrically distributed r.v. independent of the X_i's, with parameter p > 0. Then, by the law of double expectation, we have

E[e^{θ(X_1+…+X_N)}] = E[E[e^{θ(X_1+…+X_N)} | N]]


= E[(λ/(λ − θ))^N] = pz/(1 − (1 − p)z), where z = λ/(λ − θ).

After simple algebra, we find that

E[e^{θ(X_1+…+X_N)}] = λp/(λp − θ), (4.11)

which indicates that X_1 + … + X_N is an exponential r.v. with parameter λp. □


PROBLEMS

4.1. Prove Property 4.1.

4.2. Under the conditions of Problem 1.1, find E[X|Y]. Then, using this result, find EX.

4.3. Under the conditions of Problem 1.2, find E[X|Y]. Then, using this result, find EX.

4.4. Suppose someone flips a fair coin N times; for fixed N = n the number of Heads would be binomial. However, N is a geometrically distributed r.v. with parameter 1/2. Find the distribution of the number W of Heads in the N trials. Hint: first find the pgf of W; then expand this pgf in a Taylor series.

4.5. (Generalization.) Suppose someone flips a biased coin (with probability p of a Head) N times; for fixed N = n the number of Heads would be binomial. However, N is a geometrically distributed r.v. with parameter a. Find the distribution of the number W of Heads in the N trials. Hint: first find the pgf of W; then expand this pgf in a Taylor series.


5. The Bayes Estimator Revisited

An estimator of an unknown parameter θ of a r.v. X is based on a sample X_n = (X_1, …, X_n) ⊆ [X], and it is a function δ(X_n). We knew from section 1 that the sample mean X̄_n of a sample X_n was an MLE of the unknown mean μ = EX of the population [X] for many special cases of r.v.'s. In this section we will use Bayes' approach to estimate an unknown parameter Θ (regarding it as a r.v.), most often the mean, provided that we know some history of this parameter, namely its prior density f(θ).

The conditional estimator of Θ,

δ*(X_n) = E[Θ | X_n], (5.1)

is called the Bayes estimator.

Therefore, in light of sections 2-4, the Bayes estimator

δ*(X_n) = E[Θ | X_n]

is the posterior mean of the parameter Θ. This can easily be found from the posterior distributions or densities, or just their parameters, as the following examples show.

Example 5.1. According to Problem 2.1, if X_n is drawn from an exponential population with (an unknown) parameter Λ > 0 and the prior of Λ is gamma with parameters (α, β), then the posterior of Λ is also gamma, with parameters (α + n, β + Σ_{i=1}^n x_i). In particular, if Λ is m-Erlang with parameter β for each of its m individual exponential phases, then the posterior of Λ is (m + n)-Erlang with parameter β + Σ_{i=1}^n x_i for each phase.

Recall (cf. formula (2.8)) that the mean of a gamma random variable Λ with parameters (α, β) is E[Λ] = α/β. Therefore, the Bayes estimator δ*(X_n) of Λ is

E[Λ | X_n] = (α + n) / (β + Σ_{i=1}^n X_i). (5.2)

In other words, if Λ is an unknown parameter of an exponential population and a sample from the population was drawn and valued x_n = (x_1, …, x_n), then the "best" mean-square-error (MSE) estimate (i.e. empirical or observed predictor) of Λ is the value

E[Λ | X_n = x_n] = δ*(x_n) = (α + n) / (β + Σ_{i=1}^n x_i). (5.3)


Notice that Λ is the reciprocal of the (conditional) expected value of each member Xₖ of the
population. In other words, since the estimate δ*(xₙ) estimates Λ, (δ*(xₙ))⁻¹ estimates the mean
of the "exponential" population. Now, with the observed sample mean value denoted by

x̄ₙ = (1/n) Σ_{i=1}^n xᵢ, (5.4)

by dividing the numerator and denominator in (5.3) by n and reciprocating it, we can express the
reciprocal of δ*(xₙ) in terms of the observed sample mean as follows:

(δ*(xₙ))⁻¹ = (β/n + x̄ₙ) / (α/n + 1). (5.5)

Hence, for large n, (δ*(xₙ))⁻¹ becomes approximately the observed sample mean, with no regard
to the history of Λ. So, for a large sample the sample mean does its m.l.e. job as usual, but for
smaller samples the Bayes estimator of Λ is more accurate. Note that, while (δ*(T_n))⁻¹
approaches the sample mean X̄ₙ, the sample mean, for large n, by the Law of Large Numbers, in
turn approaches the mean of the population. The latter in our case is the reciprocal of EΛ. Any
estimator (like (δ*(T_n))⁻¹ of (EΛ)⁻¹) with such a property is called consistent, and this
property is referred to as consistency. □
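A small R sketch (with an illustrative true rate and assumed prior parameters) comparing the Bayes estimate (5.3) and its reciprocal (5.5) with the classical sample-mean m.l.e. for a modest sample:

    set.seed(2)
    lambda.true <- 3                 # rate to be estimated
    alpha <- 2; beta <- 1            # assumed gamma prior parameters
    n <- 8                           # a small sample
    x <- rexp(n, rate = lambda.true)
    bayes.rate <- (alpha + n) / (beta + sum(x))      # formula (5.3)
    c(bayes = bayes.rate, mle = 1/mean(x))           # two estimates of lambda
    c(recip.bayes = 1/bayes.rate, xbar = mean(x))    # cf. formula (5.5)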

Remark 5.2. In Example 2.3 we arrived at the posterior, after the second experiment with the
proportion of defective items, as beta with parameters α₂ = α₀ + k₁ + k₂ and
β₂ = β₀ + n₁ + n₂ − k₁ − k₂. The Bayes estimate then is

E[θ | T_n = x_{n₂}] = α₂/(α₂ + β₂) = (α₀ + k₁ + k₂)/(α₀ + β₀ + n₁ + n₂).

It is easy to see that after the k-th experiment, the Bayes estimate would be

E[θ | T_n = x_{n_k}] = α_k/(α_k + β_k) = (α₀ + Σ_{i=1}^k kᵢ) / (α₀ + β₀ + Σ_{i=1}^k nᵢ),

where nᵢ is the size of the i-th sample and kᵢ is the number of defective items in the i-th sample.
The same result, as we see, could be obtained by combining the k samples into one larger sample of size
n = Σ_{i=1}^k nᵢ, with the total number of defective items k = Σ_{i=1}^k kᵢ, in one single experiment. The
result formally looks the same. However, conducting these experiments sequentially allows us to
draw smaller, more "affordable" samples and also to spread them over a longer time period, which is
practically more rational. □
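The sequential updating is a one-liner per round; a minimal R sketch with made-up sample sizes and defect counts:

    alpha <- 1; beta <- 1                  # uniform prior Beta(1,1), assumed
    n <- c(10, 14, 20); k <- c(1, 4, 3)    # made-up sizes and defect counts
    for (i in seq_along(n)) {
      alpha <- alpha + k[i]                # posterior alpha after round i
      beta  <- beta + n[i] - k[i]          # posterior beta after round i
    }
    alpha / (alpha + beta)                 # Bayes estimate after all rounds
    (1 + sum(k)) / (2 + sum(n))            # same value from the pooled sample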


PROBLEMS

5.1. Under the condition of Example 2.1, give the Bayes estimator of the parameter θ, the
proportion of defective items, with the uniform prior.

5.2. Suppose that in the estimation of an unknown parameter θ, being the proportion of defective
items, the prior of θ is known to be a beta density with parameters (α, β). Give the Bayes
estimator of θ.

5.3. Under the condition of Problem 2.2, with an unknown parameter θ being the proportion of
defective items, the prior density of θ was assumed to be standard uniform. With a sample of 10
items drawn, of which one was defective, give the Bayes estimate of θ.

5.4. In light of Examples 2.4 and 2.5, suppose that in the second round of the experiment with the
proportion of defective items, 14 items were drawn, of which 4 were defective. Find the Bayes
estimate of the proportion of defective items after the second experiment.

5.5. The number of connections to a wrong phone number is modeled by a Poisson distribution.
Suppose the prior of its parameter Λ is known to be gamma with parameters α = 2 and β = 3.
As in section 1, we need to estimate Λ (the mean number of wrong connections) by observing a
sample x₁, ..., xₙ of wrong connections on n different days. Applying the results of Examples
4.2 and 4.3, find the Bayes estimate of Λ using the fact that on 10 different days there was a total of
20 wrong connections.


CHAPTER VIII. GAUSSIAN MULTIVARIATE DISTRIBUTIONS
1. Bivariate Normal Distribution
Let Z₁ and Z₂ be two independent standard Gaussian r.v.'s. Set

Y = | Y₁ |   | σ₁          0          | | Z₁ |   | μ₁ |
    | Y₂ | = | σ₂ρ   σ₂√(1−ρ²)        | | Z₂ | + | μ₂ |   (1.1)

or, in matrix form,

Y = AZ + μ, (1.2)

where μ₁, μ₂ are real constants, σᵢ > 0, i = 1, 2, and ρ ∈ (−1, 1). The random vector Y is
called a bivariate normal random vector.

Finally, the bivariate normal density is

f_Y(y) = [1/(2πσ₁σ₂√(1−ρ²))]
× exp{ −[1/(2(1−ρ²))] [ ((y₁−μ₁)/σ₁)² − 2ρ((y₁−μ₁)/σ₁)((y₂−μ₂)/σ₂) + ((y₂−μ₂)/σ₂)² ] }. (1.7)

From (1.1) we have

Y₁ = σ₁Z₁ + μ₁, (1.8)

Y₂ = σ₂ρZ₁ + σ₂√(1−ρ²)Z₂ + μ₂. (1.9)

Recalling the property that Cov(Y, c) = 0, where c is a constant, we have

Cov(Y₁, Y₂) = ρσ₁σ₂. (1.10)

Thus,

ρ = Cov(Y₁, Y₂) / √(Var Y₁ · Var Y₂) (1.11)

is the correlation coefficient. If ρ = 0, then Cov(Y₁, Y₂) = 0 and thus Y₁ and Y₂ are uncorrelated; in
this case we see from (1.8)-(1.9) that Y₁ and Y₂ reduce to affine transformations of Z₁ and Z₂,
respectively, and thus are


independent. Therefore, Y₁ and Y₂ are independent if and only if they are uncorrelated, i.e., if and
only if the correlation coefficient ρ = 0.

Furthermore, (1.8) shows that Y₁ is an affine transformation of a standard Gaussian r.v., and thus the
marginal distribution of Y₁ is Gaussian with parameters (μ₁, σ₁²). From (1.9) we have that Y₂ is a
linear combination of two independent standard Gaussian r.v.'s plus a constant; thus it is
Gaussian with mean μ₂ and variance σ₂².

From (1.10) we have the covariance matrix of the vector Y:

K = | σ₁²     ρσ₁σ₂ |
    | ρσ₁σ₂   σ₂²   |,   (1.12)

with

det K = (det A)² = σ₁²σ₂²(1 − ρ²). (1.13)

Furthermore,

K⁻¹ = [1/(1−ρ²)] | 1/σ₁²        −ρ/(σ₁σ₂) |
                 | −ρ/(σ₁σ₂)    1/σ₂²     |   (1.14)

or, in the form

K⁻¹ = (1/det K) | σ₂²      −ρσ₁σ₂ |
                | −ρσ₁σ₂   σ₁²    |.  (1.15)

Now, we notice that

(y − μ)'K⁻¹(y − μ)

= [1/(1−ρ²)] [ ((y₁−μ₁)/σ₁)² − 2ρ((y₁−μ₁)/σ₁)((y₂−μ₂)/σ₂) + ((y₂−μ₂)/σ₂)² ]. (1.16)

Therefore, in light of (1.16), equation (1.7) can be rewritten in terms of the covariance matrix K:

f_Y(y) = [1/(2π√(det K))] exp{ −½ (y − μ)'K⁻¹(y − μ) }. (1.17)

The latter form better resembles the univariate normal pdf

f(y) = [1/√(2πσ²)] exp{ −½ (y − μ) σ⁻² (y − μ) }.

Problem 1.1. Find the marginal densities of a bivariate normal random vector.


2. Multivariate Normal Distribution


Distributions under Linear and Affine Transformations. Let Xₙ = (X₁, ..., Xₙ) be a
random vector with the joint pdf f_X(xₙ) and let r be a vector function from ℝⁿ to ℝⁿ. Denote the
random vector

Yₙ = (Y₁, ..., Yₙ) := r(Xₙ). (1.1)

If yₙ = r(xₙ) is such that yₙ = Axₙ, where A is an n × n nonsingular matrix, then the joint pdf
of Yₙ satisfies the following formula:

f_Y(yₙ) = f_X(A⁻¹yₙ) · 1/|det A|. (1.15)

If r is an affine function defined as

yₙ = r(xₙ) = Axₙ + μ, (1.16)

then the joint pdf of Yₙ satisfies the following formula:

f_Y(yₙ) = f_X(A⁻¹(yₙ − μ)) · 1/|det A|. (1.20)

We recall that the Euclidean norm of a vector xₙ is defined as

||xₙ|| = √(x₁² + ... + xₙ²).

The results of section 1 can be generalized as follows. Let Z₁, Z₂, ... be iid standard Gaussian
r.v.'s, let A be an n × n nonsingular matrix, let μ ∈ ℝⁿ be a constant vector, and set
Z := (Z₁, ..., Zₙ)'. Then, the random vector

Y = Yₙ = AZ + μ (2.1)

is called an n-variate normal random vector.

We start with the following lemma.

Lemma 2.1. Let Y be an n-variate normal vector. Then,

||A⁻¹(y − μ)||² = (y − μ)'(Cov Y)⁻¹(y − μ). (2.2)

Now, using formula (1.20),

f_Y(yₙ) = f_Z(A⁻¹(yₙ − μ)) · 1/|det A|, (2.5)

where


B#" B#8

 #1 / á "#1 / #
"
0Z Ðx8 Ñ œ  #

œ "
Ð#1Ñ8Î#
exp  "# B#"  á  B#8 

œ "
Ð#1Ñ8Î#
exp  "# || x8 ||#  (2.6)

we arrive at

0Z ÐE" Ðy8  .ÑÑ œ "


Ð#1Ñ8Î#
exp  "# ||E" y  .||#  (2.7)

by Lemma 2.1

œ "
Ð#1Ñ8Î#
exp  "# y  .w ÐCovYÑ" y  .Þ

Hence,

0Y Ðy8 Ñ œ " "


Ð#1Ñ8Î# detE
exp  "# y  .w ÐCovYÑ" y  .Þ

Furthermore,

CovY œ O œ EEw ,

we have

detO œ detE# . (2.8)

Hence,

0Y Ðy8 Ñ œ Ð#1Ñ8Î# detCovY


" "
exp  "# y  .w ÐCovYÑ" y  .

œ Ð#1Ñ8Î# detO
" "
exp  "# y  .w O " y  .. (2.9)

The results above can be summarized in the following theorem.

Theorem 2.2. Let Y œ EZ  . be an 8-variate normal random vector generated by an 8 ‚ 8


nonsingular matrix E, random vector Z œ Ð^" ß á ß ^8 Ñw of iid standard Gaussian r.v.'s, and
. − ‘8 Þ Then, the joint pdf of vector Y satisfies formula (2.9), where

CovY œ O œ EEw Þ 
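A short R illustration of Theorem 2.2 (with an arbitrary, assumed nonsingular matrix A): the empirical covariance of Y = AZ + μ approaches K = AA':

    set.seed(4)
    A  <- matrix(c(2, 0.5, 0,  1, 1.5, 0.3,  0, 0, 1), nrow = 3)  # assumed A
    mu <- c(1, 0, -1)
    n  <- 1e5
    Z  <- matrix(rnorm(3 * n), nrow = 3)   # columns are iid N(0, I) vectors
    Y  <- A %*% Z + mu                     # columns are N(mu, AA') vectors
    round(cov(t(Y)), 2)                    # empirical covariance of Y
    round(A %*% t(A), 2)                   # K = AA'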

From (2.1) we easily get


EY = μ. (2.10)

Consequently, we adopt:

Definition 2.1. The random vector in Theorem 2.2 is called an n-variate normal random vector
with parameters (μ, K). We will also say that Y ∈ N(μ, K). □

From (2.3), for any vector x ∈ ℝⁿ s.t. x ≠ 0,

x'Kx = x'AA'x = (A'x)'(A'x) = ||A'x||² > 0. (2.11)

Therefore, Cov Y = K (since it is symmetric) is positive definite.

Proposition 2.3. The MGF of Y is

M_Y(θ) = exp{θ'μ + ½ θ'Kθ}. (2.12)

Theorem 2.4. Let Y = (Y₁, ..., Yₙ)' be an n-variate normal vector. Then, Y₁, ..., Yₙ are
independent if and only if they are pairwise uncorrelated.

(See Problem 2.4.)

Definition 2.2. A nonsingular n × n matrix A is called orthogonal if A' = A⁻¹. □

Theorem 2.5. Let Y = AZ + μ be an n-variate normal random vector generated by an n × n
orthogonal matrix A, a random vector Z = (Z₁, ..., Zₙ)' of iid standard Gaussian r.v.'s, and
μ ∈ ℝⁿ. Then, the components Y₁, ..., Yₙ of Y are independent and Var Yᵢ = 1 for all
i = 1, ..., n. Furthermore, Y₁, ..., Yₙ are identically distributed if and only if μ = μ(1, ..., 1)'.
In particular, if μ₁ = ... = μₙ = 0, then Y₁, ..., Yₙ are independent standard Gaussian r.v.'s.

(See Problem 2.7.)

Theorem 2.6. Suppose Y is a random vector whose components Y₁, ..., Yₙ are independent
Gaussian r.v.'s with means μ₁, ..., μₙ and common variance σ². Let W = AY, where A is an
orthogonal matrix. Then the components W₁, ..., Wₙ of the vector W are independent Gaussian
r.v.'s with common variance σ². In particular, if Y₁, ..., Yₙ are independent standard Gaussian,
then W₁, ..., Wₙ are also independent standard Gaussian.

Proof. Denote μ = (μ₁, ..., μₙ)' and let X = σ⁻¹(Y − μ). Then the elements of X are iid standard
Gaussian r.v.'s. Consider

AX = σ⁻¹AY − σ⁻¹Aμ.

Then,


W = σ(AX + σ⁻¹Aμ).

Since the elements of X are iid standard Gaussian, by Theorem 2.5 the vector AX + σ⁻¹Aμ has
independent Gaussian components, each with variance 1. Thus W has independent Gaussian
components with common variance σ² and means calculated from Aμ.

If Y₁, ..., Yₙ are standard Gaussian, then μ is the zero vector, Aμ = 0 and σ² = 1, which yields
the second part of the statement. □

Theorem 2.7. Let Y ∈ N(μ, K) be an n-variate Gaussian random vector, let B be an
n × n nonsingular matrix, and let b ∈ ℝⁿ. Then,

V := BY + b

is also n-variate Gaussian, with V ∈ N(Bμ + b, BKB'). In particular, if Y₁, ..., Yₙ are
pairwise uncorrelated with a common variance σ², and B is orthogonal, then the components
of the vector V are also independent with the same common variance σ².

Proof. We have Y = AZ + μ and thus V = BAZ + Bμ + b, with

Cov V = (BA)(BA)' = BAA'B' = B(Cov Y)B'.

In particular, if Y₁, ..., Yₙ are pairwise uncorrelated (thus independent) and have a common
variance σ², then

Cov V = B(Cov Y)B' = σ²BB'.

Now, if B is orthogonal, BB' = I and thus the components of the vector V are also independent
with the same common variance σ². □


PROBLEMS

2.2. Prove Proposition 2.3.

2.3. Derive the mgf for the bivariate normal r.v. given in (1.1).

2.4. Prove Theorem 2.4.

Solution. Suppose the components Y₁, ..., Yₙ of the vector Y are pairwise uncorrelated. Then K is
the diagonal matrix

K = diag(σ₁², ..., σₙ²), with K⁻¹ = diag(1/σ₁², ..., 1/σₙ²), (2.14)

and the exponent (y − μ)'K⁻¹(y − μ) of (2.9) reduces to

(1/σ₁²)(y₁ − μ₁)² + ... + (1/σₙ²)(yₙ − μₙ)².

Furthermore,

√(det K) = σ₁ ⋯ σₙ.

Hence, from (2.9) we have

f_Y(yₙ) = ∏_{i=1}^n [1/(√(2π)σᵢ)] exp{−(yᵢ − μᵢ)²/(2σᵢ²)}, (2.15)

concluding that Y₁, ..., Yₙ are independent.

2.5. Specify the condition imposed on the matrix A in order that Y₁, ..., Yₙ be independent. How
does K look in terms of the elements of the matrix A in this case?

2.6. Let Y = (Y₁, ..., Yₙ)' ∈ N(μ, K) and let α ∈ ℝⁿ. Show that α'Y ∈ N(α'μ, α'Kα).

Solution. α'Y = α'AZ + α'μ. Thus, α'Y is a linear combination of iid standard Gaussian r.v.'s
plus a constant, hence Gaussian, with mean α'μ and variance

Var(α'Y) = ||A'α||² = (α'A)(α'A)' = α'AA'α = α'Kα. □

2.7. Prove Theorem 2.5: Let Y = AZ + μ be an n-variate normal random vector generated by an
n × n orthogonal matrix A, a random vector Z = (Z₁, ..., Zₙ)' of iid standard Gaussian r.v.'s, and
μ ∈ ℝⁿ. Recall (Definition 2.2) that a nonsingular n × n matrix A is called orthogonal if
A' = A⁻¹. Prove that the components Y₁, ..., Yₙ of Y are independent and that Var Yᵢ = 1 for all
i = 1, ..., n. Furthermore, Y₁, ..., Yₙ are identically distributed if and only if μ = μ(1, ..., 1)'.
In particular, if μ₁ = ... = μₙ = 0, then Y₁, ..., Yₙ are independent standard Gaussian r.v.'s.

Solution. Since K = AA' = AA⁻¹ = I, the elements of Cov Y are zero except on the main
diagonal. Hence, Y₁, ..., Yₙ are pairwise uncorrelated. By Theorem 2.4, they are also
independent. Moreover, Var Yᵢ = σᵢ² = 1, i = 1, ..., n, and thus the Yᵢ's are identically distributed if
and only if μ₁ = ... = μₙ. The rest is trivial. □

2.8. Find the marginal pdf's of a multivariate normal density.


CHAPTER IX. SIMPLE LINEAR REGRESSION


1. The Regression Line
Regression methods were advanced by Carl Friedrich Gauss in the 19th century in problems
related to astronomical measurements.

Suppose we want to find out how increasing the amount x of a certain chemical in the soil
increases the amount y of that chemical in the plants grown in that soil. For certain chemicals
and plants, the relationship between x and y can be approximated by the linear equation

y(x) = β₀ + β₁x (1.1)

with β₀ and β₁ yet to be determined.

Now, if we run several experiments with the same x using nearly identical soils and plants, we will
find that the values of y are not the same. At the same time, if we use different values x₁, x₂, ...
of x and expect to observe y(x₁), y(x₂), ... from equation (1.1), we will see that the actually
observed values y₁, y₂, ... differ from y(x₁), y(x₂), ...

For this reason, we need to extend the deterministic model driven by equation (1.1) to a more
flexible stochastic model with the random line

Y(x) = β₀ + β₁x + ε, (1.2)

whose only random element ε is a random error generated by random variations due to
differences between plants, soils, measurements, and other possible factors.

If we now conduct n independent experiments with possibly different values x₁, ..., xₙ of x,
our model becomes

Yᵢ = Y(xᵢ) = β₀ + β₁xᵢ + εᵢ, i = 1, ..., n, (1.3)

with the εᵢ's presumably independent and identically distributed. If, in addition, the εᵢ's are
Gaussian with zero mean and Var εᵢ = σ², then the corresponding model is referred to as the
linear regression model.

Notice that the presence of the r.v.'s εᵢ in (1.3) explains the observed vertical deviations of the yᵢ's
from y(xᵢ) = β₀ + β₁xᵢ. Consequently, the Y(xᵢ)'s of (1.3) must "predict" the yᵢ's. In other words,
we can regard the yᵢ's as observed values of the Y(xᵢ)'s.

Now, to find the most suitable β₀, β₁ one uses the "least squares estimates" β̂₀, β̂₁. We seek the
two values β̂₀ and β̂₁ that make the sum of squared error deviations

Q(β₀, β₁) = Σ_{i=1}^n (yᵢ − y(xᵢ))² = Σ_{i=1}^n (yᵢ − (β₀ + β₁xᵢ))² (1.4)

minimal. Once the parameters β̂₀ and β̂₁ that minimize Q(β₀, β₁) are found, the line
ŷ = β̂₀ + β̂₁x can be drawn so that it best fits all the points (xᵢ, yᵢ), i = 1, ..., n, dispersed
around this line. In Figure 1.1 below we drew a line with yet unspecified β₀ and β₁ through a
group of scattered points. The "best" line ŷ = β̂₀ + β̂₁x can be used to interpolate or extrapolate
values of chemicals in the plants for any given value x of the chemical put in the soil in the above
illustration. Notice that in the objective function Q(β₀, β₁) only the vertical deviations of the
(xᵢ, yᵢ)'s from the line are taken into account.

Figure 1.1 (a scatter of points with a fitted straight line; image not reproduced)

In (1.4), differentiating Q partially in β₀ and β₁ and setting the obtained equations to zero, we
can solve them w.r.t. β₀ and β₁, denoting the solutions by a = β̂₀ and b = β̂₁:

a := β̂₀ = ȳₙ − β̂₁x̄ₙ, (1.5)

b := β̂₁ = [Σ_{i=1}^n (xᵢ − x̄ₙ)yᵢ] / [Σ_{i=1}^n (xᵢ − x̄ₙ)²] =: s_xy/s_xx. (1.6)

Applying the familiar "second partials" test to β̂₀ and β̂₁ we find that, in the form of (1.5) and
(1.6), they indeed minimize Q. However, β̂₀ and β̂₁ do only part of the job. We still have σ²
undetermined, which contributes to the larger picture of the regression line Y.
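These estimates are easy to compute directly; here is a minimal R sketch with made-up data, cross-checked against R's built-in lm():

    x <- c(1, 2, 3, 4, 5)              # made-up control values
    y <- c(1.2, 1.9, 3.1, 3.9, 5.2)    # made-up responses
    sxy <- sum((x - mean(x)) * y)
    sxx <- sum((x - mean(x))^2)
    b <- sxy / sxx                     # formula (1.6)
    a <- mean(y) - b * mean(x)         # formula (1.5)
    c(a = a, b = b)
    coef(lm(y ~ x))                    # the same values from lm()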

We are back at the stochastic version of the deterministic line y = β₀ + β₁x, as per (1.2):

Y = β₀ + β₁x + ε,

where ε ∈ N(0, σ²), with σ² unknown. Obviously,

Y ∈ N(β₀ + β₁x, σ²). (1.7)

The error ε, as previously mentioned, can be due to unobservable factors that corrupt
measurements. In our next step, we consider a random sample Y₁, ..., Yₙ such that

Yᵢ = β₀ + β₁xᵢ + εᵢ, εᵢ ∈ N(0, σ²), i = 1, ..., n, (1.8)

are independent, but obviously not identically distributed, Gaussian r.v.'s.

If we form the likelihood function of the sample Y₁, ..., Yₙ, even though the Yᵢ's are not identically
distributed, we can still use a method similar to that of section 1, Chapter III, to find the values
of β₀, β₁, and σ² that maximize the likelihood function of the sample. It turns out that the
m.l.e.'s of β₀ and β₁ are precisely the values a = β̂₀ and b = β̂₁ of (1.5) and (1.6). The m.l.e. σ̂²
of σ² satisfies the formula

σ̂² = (1/n) sse, (1.9)

where

sse := Σ_{i=1}^n [yᵢ − (a + bxᵢ)]². (1.9a)


2. Testing Hypotheses about Regression Parameters β₀ and β₁

From section 1 it may sound as if we are done with the unknowns β₀, β₁, σ², having replaced them with
their m.l.e.'s a, b, σ̂². However, real testing of β₀ and β₁ requires more scrutiny. This
is primarily due to the impact of the random term ε, which makes the regression line (now Y)
random, and this in turn changes the nature of the underlying parameters. We will be more
specific in a moment.

We test hypotheses about the parameters β₀ and β₁ in the general form below, choosing some
fixed constants c₀, c₁, and c*:

H₀: c₀β₀ + c₁β₁ = c* (2.1)

H₁: c₀β₀ + c₁β₁ ≠ c* (2.2)

This covers a large variety of special cases for different choices of c₀, c₁, and c*.

As in section 1, Chapter III, we replace the yᵢ's in (1.5), (1.6), and (1.9) with their random versions
Yᵢ of (1.8), thereby upgrading a, b, and σ̂² from m.l.e.'s to MLE's, in notation A, B, and
K².

To test the veracity of the null hypothesis H₀ of (2.1) we define the statistic

T = c₀A + c₁B, (2.3)

which can be shown to be Gaussian with parameters

μ_T = c₀β₀ + c₁β₁ (2.4)

and

σ_T² = σ² [ c₀²/n + (c₀x̄ₙ − c₁)²/s_x² ], (2.5)

where, as we recall,

s_x² := Σ_{i=1}^n (xᵢ − x̄ₙ)². (2.5a)

Thus, if the null hypothesis (2.1) is true, μ_T of (2.4) must equal c*, and since T ∈ N(μ_T, σ_T²),
the statistic


X -‡
[!" œ 5X − ÒR Ð!ß "ÑÓ (2.6)

is standard Gaussian. Notice that the subscrips !" in [!" mean to represent the respective sub-
scripts in "! and "" rather than dealing with the fact that [!" − R !ß "ß although the latter
holds true.

A Special Case. Now we will turn to a particular case of (2.1)-(2.2) where

c₀ = c* = 0 and c₁ = 1. (2.7)

The hypotheses reduce to

H₀: β₁ = 0 (2.8)

H₁: β₁ ≠ 0, (2.9)

meaning that if H₀ is true, then in the regression line y = β₀ + β₁x, y does not (linearly)
depend upon x.

We would like to discuss this special case. Under the restrictions (2.7), the T, μ_T, σ_T² of
(2.3)-(2.5) reduce to

T = B (2.10)

and

μ_T = β₁, (2.11)

σ_T² = σ²/s_x². (2.12)

Thus, if the null hypothesis H₀ of (2.8) is true, the mean μ_T of T must be zero, and since
T ∈ N(μ_T, σ_T²), the r.v.

W₁ := T/σ_T ∈ N(0, 1) (2.13)

will serve as a test statistic for the parameter β₁. Consequently, it stands to reason to test the
statistic W₁ for being standard Gaussian. This approach, however, has a serious shortcoming:
in σ_T² of (2.5), the variance σ² is unknown.

Now, if in σ̂² of (1.9) we replace the yᵢ's with the Yᵢ's, and a and b with A and B, we arrive at

K² = (1/n) Σ_{i=1}^n [Yᵢ − (A + Bxᵢ)]². (2.14)

This is the MLE of σ², which however is biased. We will use the unbiased estimator


K'² = (1/(n−2)) Σ_{i=1}^n [Yᵢ − (A + Bxᵢ)]². (2.15)

Notice that T is normal regardless of whether or not μ_T = 0, but it fails to be standard normal if
H₀ is not true. Furthermore, if σ is replaced with K', the statistic is no longer standard normal,
even when the null hypothesis is true, and thus we need to figure out an alternative test for the
veracity of H₀. Consider, in place of W₁, the statistic

U₁ = (T/σ_T)/(K'/σ) = B s_x / K', (2.16)

in which the unknown σ gets canceled. The r.v. U₁ turns out to be a t-r.v. with n−2 degrees of
freedom. Hence, the null hypothesis is plausible if and only if U₁ behaves like a genuine t_{n−2}
r.v.

Recall that the t-r.v. with k degrees of freedom has a pdf looking very much like the pdf of the
standard Gaussian r.v.: a bell-shaped curve, symmetric about the y-axis, with mean equal to
zero for k > 1. (For k = 1 the mean does not exist.)


3. The Procedure

In this section, we will develop a procedure which leads to a decision in the root
problem of accepting or rejecting the null hypothesis. The veracity of H₀ is checked by
whether or not U₁ belongs to a specified "critical region", say C ⊆ ℝ. If it does, the null
hypothesis will be rejected. The critical region C will be established based on how a true t_{n−2}
r.v. behaves. The criterion for finding C logically depends on the tail of
the distribution of t_{n−2}, P{|t_{n−2}| > c}, being small for some positive value c. The critical
region will thus be the set

C = (−∞, −c) ∪ (c, ∞), (3.1)

with c to be determined from the equation

P{|t_{n−2}| > c} = α, (3.2)

where the level of significance α is usually chosen to equal 0.05. For instance, if n = 12,
we will deal with t₁₀ (with 10 degrees of freedom), and from the tables for t₁₀ we find that

c = 2.228. (3.3)

Precisely, from (3.2) we have

1 − α = P{|t_{n−2}| ≤ c} = F_{t_{n−2}}(c) − F_{t_{n−2}}(−c)

= 2F_{t_{n−2}}(c) − 1

(so that F_{t_{n−2}}(c) = 1 − α/2 and c = F⁻¹_{t_{n−2}}(1 − α/2)),

due to the mentioned symmetry of the density of t_{n−2}.
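In R, the critical value c of (3.2) comes directly from the t quantile function (shown for α = 0.05, n = 12):

    alpha <- 0.05; n <- 12
    qt(1 - alpha/2, df = n - 2)   # c = 2.228 for 10 degrees of freedom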

Consequently, if the test statistic U₁ behaves like t_{n−2}, then the empirical value u₁ (based on
the empirical values yᵢ of the Yᵢ's) of U₁ will lie in the region (−c, c) = (−2.228, 2.228),
complementary to C, with "confidence" 1 − α = 0.95. The chief reason why 1 − α is being used
as confidence, rather than probability, is that an empirical value u₁ of U₁ replaces the r.v. U₁.
(The same reasons as those we used when arguing about confidence intervals in the context of
the Central Limit Theorem.)

The empirical value u₁ will be like that of U₁, with the Yᵢ's replaced with their "observed" values
yᵢ:

u₁ = β̂₁ s_x / σ' = β̂₁ √(s_x²) / √( (1/(n−2)) Σ_{i=1}^n [yᵢ − (β̂₀ + β̂₁xᵢ)]² ), (3.4)


where

σ'² = (1/(n−2)) Σ_{i=1}^n [yᵢ − (β̂₀ + β̂₁xᵢ)]² (3.5)

is the empirical unbiased sample variance, and β̂₀ and β̂₁ satisfy (1.5)-(1.6). In a nutshell, u₁ is
calculated from (1.5)-(1.6) and (3.4) and checked for whether or not it belongs to the critical region
C. For α = 0.05 and n = 12 it is equal to C = (−∞, −2.228) ∪ (2.228, ∞). If u₁ ∈ C, then
the null hypothesis is rejected. Otherwise, the null hypothesis is either not rejected or will
undergo yet another, more refined testing.

Remark 3.1 (The P-Value). The P-value is defined as

p = P{|t_{n−2}| > |u₁|}, (3.6)

where u₁ is the value given by (3.4). Unlike decision making by first setting a
significance level and then calculating the associated critical region, the P-value implies a
simpler logic. Namely, if the P-value is small, we conclude that deviations of the genuine t-r.v.
beyond |u₁| are unlikely and thus we will reject the null hypothesis. □


4. Examples and Notation

In the literature on regression analysis, one uses the following common notation for the
quantities in the formula (3.4) for u₁:

s_x² = s_xx := Σ_{i=1}^n (xᵢ − x̄ₙ)², (4.1)

s_xy := Σ_{i=1}^n (xᵢ − x̄ₙ)yᵢ. (4.2)

The latter can easily be proved to equal

s_xy = Σ_{i=1}^n (xᵢ − x̄ₙ)(yᵢ − ȳₙ). (4.3)

In light of (4.1)-(4.3), formulas (1.5)-(1.6) read:

b = β̂₁ = s_xy/s_xx, (4.4)

a = β̂₀ = ȳₙ − β̂₁x̄ₙ = ȳₙ − bx̄ₙ. (4.5)

Introduce

s_y² := s_yy := Σ_{i=1}^n (yᵢ − ȳₙ)² (4.6)

to get the sum of squared errors

sse := Σ_{i=1}^n [yᵢ − (β̂₀ + β̂₁xᵢ)]² = Σ_{i=1}^n [yᵢ − (a + bxᵢ)]², (4.7)

which can be shown to equal

sse = s_yy − β̂₁s_xy = s_yy − b s_xy. (4.8)

Next,

σ'² := (1/(n−2)) sse, (4.9)


which is the square of the denominator of (3.4) (the unbiased empirical sample variance). Finally,
formula (3.4) can be rewritten as

u₁ = b/denom, (4.10)

where

denom := √(σ'²/s_xx). (4.11)

The above formulas can be readily incorporated into spreadsheets in Excel or QuattroPro office
utilities.

Example 4.1. We will test the stock AmeriGas (ticker GASFX) against IBM, conjecturing their
"linear" independence as the null hypothesis. We take twelve weekly values of
both stocks from February 2008 through April 2008.

#    Date    GASFX xᵢ   (xᵢ−x̄)²    IBM yᵢ    (yᵢ−ȳ)²    (xᵢ−x̄)(yᵢ−ȳ)
1    04/21   20.29      0.367034    123.67    85.22367     5.592851
2    04/14   20.44      0.571284    124.40    99.23480     7.529360
3    04/07   19.65      0.001167    116.00     2.438802   −0.053357
4    03/31   19.67      0.000200    115.76     1.746802   −0.018724
5    03/24   19.14      0.296117    114.57     0.017336   −0.071849
6    03/17   18.98      0.495850    118.33    15.14506    −2.740382
7    03/10   19.44      0.059617    115.23     0.626736   −0.193299
8    03/03   19.43      0.064600    113.94     0.248336    0.126660
9    02/25   19.51      0.030334    113.86     0.334469    0.100726
10   02/19   19.97      0.081700    108.07    40.55566    −1.820228
11   02/11   19.95      0.070667    106.16    68.53080    −2.200657
12   02/04   19.74      0.003117    103.27   124.7316     −0.623565

x̄ₙ = 19.68416,  ȳₙ = 114.43,  s_xx = 2.041691,  s_yy = 438.834167;
β̂₁ = b = 2.7563,  b x̄ₙ = 54.257,  β̂₀ = a = 60.181,  b s_xy = 15.512;
sse = 423.32,  σ'² = 42.33,  denom = 4.55345,  u₁ = 0.6053.

Figure 4.1

The value u₁ = 0.6053 does not belong to the critical region
C = (−∞, −2.228) ∪ (2.228, ∞), and therefore the null hypothesis (that AmeriGas is
independent of IBM) should not be rejected. It thus stands to reason to assume that these two
companies had a limited (if any) impact on each other.
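A compact R version of the same test, using the xᵢ, yᵢ columns above:

    x <- c(20.29, 20.44, 19.65, 19.67, 19.14, 18.98,
           19.44, 19.43, 19.51, 19.97, 19.95, 19.74)        # GASFX
    y <- c(123.67, 124.40, 116.00, 115.76, 114.57, 118.33,
           115.23, 113.94, 113.86, 108.07, 106.16, 103.27)  # IBM
    fit <- lm(y ~ x)
    summary(fit)$coefficients["x", ]  # slope b, its std. error, t value u1, p-value
    qt(0.975, df = length(x) - 2)     # the critical value c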


PROBLEMS

1. The weekly closing prices (see the table below), in US$, from the stock exchange for AmeriGas
GASFX (x-values) and Baxter International (y-values) were recorded from February
4, 2008 to April 21, 2008. Calculate the least-squares coefficients a and b for the
deterministic linear regression line y = β₀ + β₁x.

2. Under the conditions specified in Problem 1, calculate the sse (sum of squared errors) value
sse = Σ_{i=1}^n (yᵢ − a − bxᵢ)².

3. Under the conditions specified in Problem 1, test the hypothesis H₀: β₁ = 0 (i.e., that weekly
closings of AmeriGas are "linearly" independent of Baxter prices) against its alternative
H₁: β₁ ≠ 0 at the significance level α = 0.05. Among other things, calculate u₁. Also, find the
p-value and state your conclusion about the testing.

No Date American Gas GASFX x Baxter y


1 4/21 20.29 60.54
2 4/14 20.44 61.28
3 4/07 19.65 60.57
4 3/31 19.67 59.89
5 3/24 19.14 57.5
6 3/17 18.98 58.05
7 3/10 19.44 56.27
8 3/03 19.43 57.78
9 2/25 19.51 59.02
10 2/19 19.97 59.61
11 2/11 19.95 59.34
12 2/04 19.74 60.65


CHAPTER X. MULTIPLE LINEAR REGRESSION
1. Problem Formulation
We recall that in the simple regression model we developed a regression line Y = β₀ + β₁x + ε
under the following specifications:

1) it depended on two unknown parameters β₀ and β₁;

2) it additionally depended on an additive random term ε ∈ N(0, σ²), with σ² unknown;

3) it contained a control (input) variable x;

4) Y was interpreted as a value of the conditional expectation of some r.v. Z given X = x,
where X was a control r.v.;

5) altogether, the regression line depended on three parameters, β₀, β₁, and σ², for which we
identified maximum likelihood estimators B₀, B₁, and K²;

6) the entire routine included a collection of n independent responses Y₁, ..., Yₙ of Y, under
n different input variables x₁, ..., xₙ;

7) the empirical values y₁, ..., yₙ, along with "observed" control variables x₁, ..., xₙ, were
used to find initially the deterministic values β̂₀ and β̂₁ of β₀ and β₁ using the least-squares
method, yielding the deterministic regression line ŷ = β̂₀ + β̂₁x;

8) the impact of a presumed additive Gaussian error ε brought us to a likelihood function
whose maximum value was reached at β̂₀ and β̂₁ (in agreement with least squares) and, in
addition, at σ̂², all three being the maximum likelihood estimates of β₀, β₁, and σ², respectively;

9) replacing the empirical values y₁, ..., yₙ with their "hypothetical random priors"
Y₁, ..., Yₙ turned the maximum likelihood estimates β̂₀, β̂₁, and σ̂² into the maximum likelihood
estimators B₀, B₁, and K²;

10) B₀ and B₁ appeared to be bivariate normal, while K² was independent of B₀ and B₁, with
nK²/σ² ∈ χ²_{n−2}, i.e., a chi-square r.v. with n−2 degrees of freedom;

11) we determined a statistical test procedure for the three parameters β₀, β₁, and σ² by using the
test statistic U₀₁, a t r.v. with n−2 degrees of freedom;


12) the veracity of the null hypothesis (some relationship among the tested parameters) had to
be checked by whether or not the empirical value u₀₁ of U₀₁ fell inside the critical region C,
which was established based on the behavior of a genuine t r.v. with n−2 degrees of freedom
and a selected significance level.

In the case of a multiple regression model, the new regression figure depends on more than
two deterministic parameters, compared to the simple regression model.

Example 1.1. Suppose that instead of fitting n points by a single straight line we want to employ
a possibly more flexible fitting by a (p−1)-st degree polynomial of the form

y = β₀ + β₁x + ... + β_{p−1}x^{p−1}, p ≥ 2. (1.1)

Notice that the p unknown parameters in (1.1) still appear in a linear form, while the control
variable x forms a polynomial function.

We can again use the method of least squares to find the βᵢ's that minimize the sum Q of squares of
the vertical deviations of the points from the polynomial curve:

Q = Σ_{i=1}^n ( yᵢ − β₀ − β₁xᵢ − ... − β_{p−1}xᵢ^{p−1} )². (1.2)

Calculating the p partial derivatives ∂Q/∂βᵢ, i = 0, 1, ..., p−1, and equating each one of them
to zero, we obtain the following system of p normal equations in the p unknowns β₀, ..., β_{p−1}:

Σᵢ yᵢ = β₀n + β₁ Σᵢ xᵢ + ... + β_{p−1} Σᵢ xᵢ^{p−1}

Σᵢ xᵢyᵢ = β₀ Σᵢ xᵢ + β₁ Σᵢ xᵢ² + ... + β_{p−1} Σᵢ xᵢ^p (1.3)

...............

Σᵢ xᵢ^{p−1}yᵢ = β₀ Σᵢ xᵢ^{p−1} + β₁ Σᵢ xᵢ^p + ... + β_{p−1} Σᵢ xᵢ^{2(p−1)}

(all sums running over i = 1, ..., n). We assume that the rank of the associated matrix of these
equations is p, and thus this system has a unique solution. It can be shown that the solution of this
system indeed minimizes Q. Let these values be β̂₀, ..., β̂_{p−1}. Then, the least-squares
polynomial is

y = β̂₀ + β̂₁x + ... + β̂_{p−1}x^{p−1}. (1.4)


Example 1.2. Suppose a new drug a is being tested on a group of n volunteers. In the
framework of simple regression we would use x as a control (input) variable, being some


numeric response of a patient to an old drug b, and y as the patient's response to drug a, and
represent y as a "linear" function y = β₀ + β₁x w.r.t. the parameters β₀ and β₁.

In a more general case, we are interested in the respective responses y₁, ..., yₙ to the new drug
compared with multiple responses x_{i1}, ..., x_{i,p−1}, i = 1, ..., n, to an old drug b, combined
with other readings such as blood pressure, interaction with a supplement, and heart rate. For
example, suppose we have p−1 input variables x₁, ..., x_{p−1} representing p−1 responses to b
listed above (blood pressure, interaction with a supplement, heart rate, body temperature,
interaction with drugs a, b, and c, red cell count, etc.) for each of the n patients in the test
group. Furthermore, suppose we would like to represent a generic response y of a patient to
drug a as a function, linear w.r.t. the βᵢ's, of the patient's response to drug b and the other named
components, in the form

y = β₀ + β₁x₁ + ... + β_{p−1}x_{p−1} = x'β, (1.5)

where

x' = (1, x₁, ..., x_{p−1}) and β = (β₀, ..., β_{p−1})'. (1.5a)

Also in this case, we will use the least-squares function Q like (1.2), in slightly different form

Q = Σ_{i=1}^n ( yᵢ − β₀ − β₁x_{i1} − ... − β_{p−1}x_{i,p−1} )². (1.6)

The corresponding system of normal equations, after the same routine, will read

Σᵢ yᵢ = β₀n + β₁ Σᵢ x_{i1} + ... + β_{p−1} Σᵢ x_{i,p−1}

Σᵢ x_{i1}yᵢ = β₀ Σᵢ x_{i1} + β₁ Σᵢ x_{i1}² + ... + β_{p−1} Σᵢ x_{i1}x_{i,p−1} (1.7)

...............

Σᵢ x_{i,p−1}yᵢ = β₀ Σᵢ x_{i,p−1} + β₁ Σᵢ x_{i,p−1}x_{i1} + ... + β_{p−1} Σᵢ x_{i,p−1}²

(all sums over i = 1, ..., n). Again, after finding the unique solution β̂₀, ..., β̂_{p−1} (assuming
this is the case), we have the least-squares hyperplane

y = β̂₀ + β̂₁x₁ + ... + β̂_{p−1}x_{p−1}. (1.8)



Comparing both examples, we conclude that Example 1.1 is a special case of Example 1.2 with
x_{ij} of Example 1.2 equal to xᵢʲ of Example 1.1. It therefore stands to reason to consider the general
linear model

y = β₀ + Σ_{j=1}^{p−1} βⱼxⱼ, (1.9)

interpreting (1.9) in the form

y = E[Z | X₁ = x₁, ..., X_{p−1} = x_{p−1}] = β₀ + Σ_{j=1}^{p−1} βⱼxⱼ (1.10)

in the case of a single observation. For n independent observations we have

yᵢ = E[Zᵢ | X_{i1} = x_{i1}, ..., X_{i,p−1} = x_{i,p−1}]

= Σ_{j=0}^{p−1} βⱼx_{ij}, i = 1, ..., n, with x_{i0} = 1. (1.11)

The stochastic version of (1.11) will read

Yᵢ = E[Zᵢ | X_{i1} = x_{i1}, ..., X_{i,p−1} = x_{i,p−1}] + εᵢ

= β₀ + Σ_{j=1}^{p−1} βⱼx_{ij} + εᵢ, i = 1, ..., n, (1.12)

emerging from a single generic regression plane

Y = E[Z | X₁ = x₁, ..., X_{p−1} = x_{p−1}] + ε

= β₀ + Σ_{j=1}^{p−1} βⱼxⱼ + ε (1.13)

with

EZ = EY = β₀ + Σ_{j=1}^{p−1} βⱼxⱼ. (1.14)

Thus, a single multiple-regression plane Y now includes p unknown parameters β₀, ..., β_{p−1},
the same number of control variables 1, x₁, ..., x_{p−1}, and one random error ε ∈ N(0, σ²). With
n independent observations of Y and the associated control variables x₀ = 1, x₁, ..., x_{p−1}, the
complete model in matrix form becomes


 ]"   " B":"  "!   &" 


 ]#   " B#:"  ""   &# 
B"" B"# á
 œ   
B#" B## á

 ]8   " B8:"  ":"   &8 


ã ã ã ã ä ã ã ã
B8" B8# á
(1.15)
or

Y œ G "  & (1.16)


8‚" 8‚:: ‚" 8‚"

where G is called the design matrix,

I & œ 0 and Cov& œ 5# M8 and & − R 0ß 5# M8 . (1.17)
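R builds the design matrix C of (1.15) automatically from a model formula; a small sketch with made-up predictor columns:

    d <- data.frame(x1 = c(1.3, 1.5, 2.0), x2 = c(5, 3, 3))  # made-up rows
    model.matrix(~ x1 + x2, data = d)   # C: first column is the intercept 1's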


PROBLEMS

1.1. Suppose y is the current value of a house and x₁ is the square footage of the living area (in
thousands of sq. ft.), x₂ is the location (a numerical indicator-zone value from 1 to 10; 10
is the best), x₃ is the state mean appraised value for the past three years (in $100,000), x₄ the
wind resistance of the roof (the hurricane code: 1 if it is up to the code, 0 if it is not), x₅
additional hurricane protection, such as shutters, panels, etc. (ranked from 0 to 10), x₆ flood
zone (from −5 to 5), x₇ total acreage of the property. (a) Set up the corresponding linear
regression model. (b) Modify the model by allowing a third-degree polynomial for the first
control variable only, keeping the model linear otherwise.

Suppose we have 12 observations on the current value of the house, collected from 12 different
homes.

#    x₁ sq.f.   x₂ loc   x₃ av   x₄ hc   x₅ hp   x₆ fz   x₇ acr   y hv
1    1.3        5        1.2     1       0       2       0.3      1.3
2    1.5        3        1.4     1       2       0       0.4      1.6
3    2.0        3        1.5     1       5       1       0.5      1.8
4    3.0        4        1.8     0       3       1       0.8      2.4
5    3.3        4        1.8     1       5       2       1.0      2.6
6    3.5        5        2.0     0       5       3       1.1      2.8
7    3.5        6        2.5     1       4       2       1.5      3.1
8    3.3        7        2.7     1       2       4       2.5      3.5
9    4.1        6        2.5     0       2       1       3.5      3.7
10   4.5        7        3.0     1       5       1       2.5      4.0
11   4.6        8        3.2     1       9       3       1.5      4.2
12   5.2        10       3.8     1       10      4       2.8      4.5

(c) Specify the design matrix C.


2. Maximum Likelihood Estimators


By analogy with simple linear regression, we consider the likelihood function

fₙ(y; x) = (2πσ²)^{−n/2} exp{ −(1/(2σ²)) Σ_{i=1}^n (yᵢ − β₀ − x_{i1}β₁ − ... − x_{i,p−1}β_{p−1})² }. (2.1)

We denote, as previously, by β̂₀, ..., β̂_{p−1} the values of β₀, ..., β_{p−1} which maximize the above
likelihood function. For a fixed σ², we need to minimize the same objective function Q as in
(1.6):

Q = Σ_{i=1}^n (yᵢ − β₀ − x_{i1}β₁ − ... − x_{i,p−1}β_{p−1})². (2.2)

Therefore, the least-squares values β̂₀, ..., β̂_{p−1} of β₀, ..., β_{p−1} will be the same ones
that solve system (1.7). They will also be the m.l.e.'s of β₀, ..., β_{p−1}. Now, replacing
β₀, ..., β_{p−1} with β̂₀, ..., β̂_{p−1} in (2.1), we can also find the m.l.e. σ̂² of σ² as

σ̂² = (1/n) sse, (2.3)

where

sse = Σ_{i=1}^n (yᵢ − β̂₀ − x_{i1}β̂₁ − ... − x_{i,p−1}β̂_{p−1})² (2.4)

stands for the numerical sum of squared errors.

Then, replacing the yᵢ's with the Yᵢ's and the β̂ₖ's with the Bₖ's, we will have the p MLE's of
β₀, ..., β_{p−1}, and the MLE of σ²:

K² = (1/n) SSE, (2.5)

where

S² := SSE = Σ_{i=1}^n (Yᵢ − B₀ − x_{i1}B₁ − ... − x_{i,p−1}B_{p−1})² (2.5a)

stands for the associated stochastic sum of squared errors. Also, with

K'² = (1/(n−p)) SSE, (2.6)


we have an unbiased estimator of σ², as can be shown. Furthermore, the r.v.

(1/σ²)SSE ∈ χ²_{n−p},

and it is independent of B₀, ..., B_{p−1}. For the forthcoming discussion on the nature of the
estimators B₀, ..., B_{p−1} we will turn to the matrix and vector notation started in section 1. Recall
that

C = | x₁₀ = 1     x₁₁     ⋯  x_{1,p−1}   |
    | x₂₀ = 1     x₂₁     ⋯  x_{2,p−1}   |
    |   ⋮          ⋮      ⋱     ⋮        |
    | x_{n0} = 1  x_{n1}  ⋯  x_{n,p−1}   |   (2.7)

is the design matrix. We assume that n ≥ p and that the rank of the matrix is p. The other notation
will be as follows:

y = (y₁, ..., yₙ)', (2.8)

Y = (Y₁, ..., Yₙ)', (2.9)

β = (β₀, ..., β_{p−1})', (2.10)

β̂ = (β̂₀, ..., β̂_{p−1})', (2.11)

B = (B₀, ..., B_{p−1})'. (2.12)

For example, in light of the notation (2.7)-(2.12), the objective function Q can be represented as

Q = (y − Cβ)'(y − Cβ). (2.13)

System (1.7) can be written as

C'Cβ = C'y, (2.14)


where the index k is replaced with p−1. System (2.14) obviously has a unique solution β̂ if and only
if the matrix C'C is nonsingular. This in turn will hold if the number of observations n is at least
p and, furthermore, there must be at least p linearly independent rows in the matrix C. In this case,
we have from (2.14),

β̂ = RC'y, (2.15)

where

R = (C'C)⁻¹ (2.16)

and similarly,

B = RC'Y. (2.17)

Furthermore, as in (2.15), we can rewrite (2.4) in matrix form:

sse = (y − Cβ̂)'(y − Cβ̂). (2.18)

Using (2.15), it can be simplified to

sse = (y − Cβ̂)'y (2.19)

(see Problem 2.1), to be used in

σ̂² = (1/n) sse. (2.20)

Because each Bᵢ in (2.17) is a linear combination of independent Gaussian r.v.'s, Bᵢ is also
Gaussian. Recall from (1.12) that

Yᵢ = β₀ + Σ_{j=1}^{p−1} βⱼx_{ij} + εᵢ, i = 1, ..., n,

or, in matrix form,

Y = Cβ + ε, (2.21)

where

ε = (ε₁, ..., εₙ)'. (2.22)


Substituting (2.21)-(2.22) into (2.17) we have

B = β + RC'ε,

or, in the form

B = σRC'Z + β, (2.23)

where β and B are defined in (2.10) and (2.12) and Z ∈ ℝⁿ is the random vector of independent
standard Gaussian r.v.'s.

Obviously, the vector B has each of its components normal, while jointly B does not fall into the
category of multivariate normal as constructed above, since σRC' = σ(C'C)⁻¹C', being a p × n
matrix, is not necessarily a square matrix. However, if we multiply B by an n × p matrix G, we will
obviously have

GB = σGRC'Z + Gβ, (2.24)

a multivariate Gaussian vector, provided that the n × n matrix GRC' is nonsingular.

Returning to equation (2.23) we notice that, by Theorem 9.5, Chapter II,

Cov B = (σRC')(Cov Z)(σRC')',

and since Cov Z = Iₙ (the n × n identity matrix) and R' = R (a symmetric p × p matrix), we have from the
last equation:

Cov B = σ²RC'CR' = σ²R, (2.25)

provided that R⁻¹ = C'C is nonsingular, after taking into account the well-known property
(A')⁻¹ = (A⁻¹)' for a nonsingular square matrix A. Equation (2.25), along with the obvious

EB = β (2.26)

from (2.23) (i.e., B is an unbiased estimator of β), gives all first- and second-order characteristics
of the random vector B.

From (2.25) we will find the correlation coefficients

ρᵢⱼ = Cov(Bᵢ, Bⱼ)/√(Var Bᵢ · Var Bⱼ) = σ²rᵢⱼ/√(σ²rᵢᵢ · σ²rⱼⱼ) = rᵢⱼ/√(rᵢᵢrⱼⱼ). (2.27)

Example 2.1. Consider the case of a simple linear regression whose control and response
variables are given in Table 2.1 below:


i    xᵢ    yᵢ
1    −2    0
2    −1    0
3     0    1
4     1    1
5     2    3

Table 2.1

The simple linear regression line is y = β₀ + β₁x, for which we would like to find β̂₀ and β̂₁ by
using matrix operations. The design matrix and response vector are

C = | 1  −2 |
    | 1  −1 |
    | 1   0 |
    | 1   1 |
    | 1   2 |,   y = (0, 0, 1, 1, 3)'.  (2.28)

C'C = | 5   0 |
      | 0  10 |,   R = (C'C)⁻¹ = | 1/5    0  |
                                 | 0    1/10 |,  (2.29)

C'y = (5, 7)'. (2.30)

β̂ = RC'y = (1, 0.7)', (2.31)

yielding the deterministic predicting line

ŷ(x) = 1 + 0.7x. (2.32)
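The same computation in R, done both with the explicit matrix algebra of (2.15)-(2.16) and with lm():

    C <- cbind(1, c(-2, -1, 0, 1, 2))   # design matrix (2.28)
    y <- c(0, 0, 1, 1, 3)
    R <- solve(t(C) %*% C)              # R = (C'C)^(-1), cf. (2.29)
    beta.hat <- R %*% t(C) %*% y        # (2.31): gives (1, 0.7)'
    beta.hat
    coef(lm(y ~ C[, 2]))                # the same estimates via lm()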


Example 2.2. Under the conditions of Example 2.1, fit a quadratic polynomial curve to the data in
Table 2.1. The quadratic parabola is y = β₀ + β₁x + β₂x². The design matrix will then be

C = | 1  −2  4 |
    | 1  −1  1 |
    | 1   0  0 |
    | 1   1  1 |
    | 1   2  4 |.  (2.33)

Next,


"  # %
 " " " "  "  " "
 
 " ! " # " !
"

 %  
! " % "
G wG œ # !

" %
" " "
#

 & ! "! 

 "! $% 
œ ! "! ! Þ (2.34)
!

 "(Î$& !  "Î( 

  "Î( ! "Î"% 
V œ ÐG w GÑ" œ ! !Þ" ! (2.35)

!
 " " " "  !   & 
 
 " ! " # " œ
"

 %    
Gyœ w
# ( Þ (2.36)

$
" ! " % " "$

Thus,

^  "(Î$& !  "Î(  & 

  "Î( ! "Î"%  "$ 


" œ VG y œ
w
! !Þ" ! (

 %Î( 
 $Î"% 
œ !Þ( (2.37)

and

C^ÐBÑ œ %
(  !Þ(B  $ #
"% B . (2.38)

Example 2.3. Suppose each of 10 patients is treated with two blood pressure medications, drug A
and drug B, and that the change of systolic blood pressure in patient i is measured by the responses
xᵢ and yᵢ, respectively. The data are recorded in the following table:

i     xᵢ     yᵢ
1     1.9    0.7
2     0.8   −1.0
3     1.1   −0.2
4     0.1   −1.2
5    −0.1   −0.1
6     4.4    3.4
7     4.6    0.0
8     1.6    0.8
9     5.5    3.7
10    3.4    2.0

Table 2.2

Suppose that for each value x of X, the regression function is a polynomial of the form
E[Y | X = x] = β₀ + β₁x + β₂x².

Using the system of equations (1.3) for n = 10 and polynomial degree 2, we have

10β₀ + 23.3β₁ + 90.37β₂ = 8.1

23.3β₀ + 90.37β₁ + 401.0β₂ = 43.59 (2.39)

90.37β₀ + 401.0β₁ + 1892.7β₂ = 204.55

Solving system (2.39) (or using (2.15)) we get the m.l.e.'s of β₀, β₁, β₂:

β̂₀ = −0.744, β̂₁ = 0.616, β̂₂ = 0.013, (2.40)

and therefore the polynomial regression function is

ŷ(x) = 0.013x² + 0.616x − 0.744. (2.41)

The design matrix C is as follows:

C = | 1   x₁    x₁²  |
    | 1   x₂    x₂²  |
    | ⋮    ⋮     ⋮   |
    | 1   x₁₀   x₁₀² |.  (2.42)

Now, from Table 2.2 and the matrix C, we obtain the matrix C'C and the matrix R = (C'C)⁻¹:

 "! #$Þ$ *!Þ$( 

 *!Þ$( %!" 
w
GGœ #$Þ$ *!Þ$( %!" Þ (2.41)
")*#Þ(

Thus,

 !Þ%!! !Þ!%' 
V œ G G 
 !Þ$!(

 !Þ!%' !Þ!"% 
"
w
œ  !Þ$!( !Þ%#"  !Þ!(% Þ (2.42)
 !Þ!(%

Thus, from (2.26), CovB œ 5# V , and from (2.27) we get

CovF! ß F"  œ  5# !Þ$!(ß VarF! œ 5# !Þ%ß VarF" œ 5# !Þ%#"

!Þ%Þ(!Þ%#" .
!Þ$!(
3!" œ

The m.l.e. of 5# is, by formula (2.3) and using (2.40) (or (2.19) in matrix form),

" 
   !Þ(%% † "  !Þ'"'B3"  !Þ!"$B3# Ñ# œ !Þ*$(
"!
^
5# œ "! ÐC3
3œ"
(2.43)
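The same numbers in R (data from Table 2.2):

    x <- c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 4.6, 1.6, 5.5, 3.4)
    y <- c(0.7, -1.0, -0.2, -1.2, -0.1, 3.4, 0.0, 0.8, 3.7, 2.0)
    C <- cbind(1, x, x^2)
    R <- solve(t(C) %*% C)           # the matrix R above
    beta.hat <- R %*% t(C) %*% y     # (2.40): about (-0.744, 0.616, 0.013)
    sse <- sum((y - C %*% beta.hat)^2)
    sse / length(y)                  # sigma-hat^2 = 0.937, cf. (2.43)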


PROBLEMS

2.1. Prove formula (2.19).

2.2. Suppose the observation data for a simple linear regression Y = β₀ + β₁x + ε are presented
in the following table:

i    xᵢ   yᵢ
1    0    1
2    1    4
3    2    3
4    3    8
5    4    9

Table 2.3

Use matrix calculus to find the m.l.e. estimates of β₀, β₁, and σ². Also find Cov B.


3. Inference about Single Regression Parameters βⱼ

A typical test is whether one of the regression coefficients βⱼ equals a particular value
βⱼ*, i.e.,

H₀: βⱼ = βⱼ*
H₁: βⱼ ≠ βⱼ* (3.1)

Since Bⱼ is Gaussian with mean βⱼ and variance σⱼ² = σ²rⱼⱼ, if H₀ is true,

Wⱼ = (Bⱼ − βⱼ*)/√(σ²rⱼⱼ) ∈ N(0, 1). (3.2)

Since σ² is unknown, we use another test statistic

Uⱼ = Wⱼ / [ (1/(n−p)) SSE/σ² ]^{1/2}, (3.3)

where

SSE = S² = Σ_{i=1}^n (Yᵢ − B₀ − x_{i1}B₁ − ... − x_{i,p−1}B_{p−1})². (3.4)

After cancellation of σ², (3.3) reduces to

Uⱼ = (Bⱼ − βⱼ*) / √( (1/(n−p)) SSE rⱼⱼ ) ∈ t_{n−p}, if H₀ is true.

The estimator

K'² := (1/(n−p)) SSE (3.5)

can be proved to be an unbiased estimator of σ². This estimator will replace σ² in (3.2) to get
the test statistic

Uⱼ = (Bⱼ − βⱼ*)/(K'√rⱼⱼ), (3.6)

which is supposed to be a t_{n−p} r.v. if H₀ is true. Let the observed value of Uⱼ be

uⱼ = (β̂ⱼ − βⱼ*)/(σ'√rⱼⱼ), (3.7)

where


σ'² := (1/(n−p)) sse, (3.8)

and

sse = Σ_{i=1}^n ( yᵢ − β̂₀ − x_{i1}β̂₁ − ... − x_{i,p−1}β̂_{p−1} )², (3.9)

or, from (2.19),

sse = (y − Cβ̂)'y. (3.9a)

Then, the p-value is

π = P{|t_{n−p}| > |uⱼ|} = 1 − P{−|uⱼ| ≤ t_{n−p} ≤ |uⱼ|}

= 1 − [ T_{n−p}(|uⱼ|) − T_{n−p}(−|uⱼ|) ]

= 2(1 − T_{n−p}(|uⱼ|)), (3.10)

where T_{n−p} denotes the cdf of t_{n−p}; we denote the p-value this time by π in order not to
confuse it with p, the number of the model parameters β₀, ..., β_{p−1}.
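In R, the two-sided p-value (3.10) for an observed value uⱼ is a one-liner (shown for n − p = 7):

    u <- 0.095; df <- 7
    2 * (1 - pt(abs(u), df))   # about 0.93; cf. Example 3.1 below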

Example 3.1. Under the conditions of Example 2.3, test the hypotheses

H₀: β₂ = 0

H₁: β₂ ≠ 0.

Firstly,

σ' = (9.37/7)^{1/2}, r₂₂ = 0.014, and β̂₂ = 0.013.

Hence,

u₂ = 0.013 / ( (9.37/7) · 0.014 )^{1/2} = 0.095.

Using the table for t₇ we find the p-value 2(1 − T₇(0.095)) ≈ 0.9. This is large and we do not reject
H₀ that β₂ = 0. □


PROBLEMS

3.1. In Table 3.1 below, the data is collected on 23 days (from February 15, 2011 through March
18, 2011) on closings of AMEX Oil Index, CBOE Gold Index, and Boening Co. AMEX Oil and
Gold Index are assumed to be control variables B" and B# , while Boeing represents the response
C.

Table 3.1

3 Establish the design matrix G and formulate the regression function
C œ IÒ] l\ œ BÓ œ "!  "" B"  "# B# .
33 Calculate matrix V œ ÐG w GÑ" . Then, using the system of equations "
^ œ VG w y, calculate
^ of the m.l.e.'s of " ß " ß " .
333 Calculate the associated covariance matrix CovB.
the vector " ! " #

3@ Find the m.l.e. of 5# .


@ Test the hypotheses

H! À "# œ !
H" À "# Á !

@3 Test the hypotheses


H₀: β₁ = 0
H₁: β₁ ≠ 0

In each case find the critical region at the significance level α = 0.05 and, using the p-value,
interpret the results of the testing.


4. Prediction

After finding β̂₀, ..., β̂_{p−1} we can introduce the deterministic "predictor"

ŷ(x') = β̂₀ + β̂₁x₁ + ... + β̂_{p−1}x_{p−1} = x'β̂ (4.1)

of y = β₀ + β₁x₁ + ... + β_{p−1}x_{p−1}, which can be used to interpolate a response for any set of
control variables x₁, ..., x_{p−1}. Replacing β̂₀, ..., β̂_{p−1} with their random versions B₀, ..., B_{p−1}
(the MLE's), we introduce the random predictor of the random regression line

Y = x'β + ε (4.2)

as

Ŷ := B₀ + B₁x₁ + ... + B_{p−1}x_{p−1} = x'B

= x'β + x'RC'ε, (4.3)

with

EŶ = EY = x'β (4.4)

and, as can be shown,

Var Ŷ = Var(x'B) = x'(Cov B)x = σ²x'Rx. (4.5)

We would like to determine the mean square error (MSE) of Ŷ w.r.t. Y = x'β + ε. As in
Remark 3.3, Y and Ŷ are independent and thus

E(Ŷ − Y)² = E[ (Ŷ − x'β) − (Y − x'β) ]² = Var(Ŷ − Y)

= Var Ŷ + Var Y − 2Cov(Ŷ, Y) = Var Ŷ + Var Y

= σ²x'Rx + σ² = σ²(1 + x'Rx). (4.6)


Since the MSE E(Ŷ − Y)² in (4.6) contains σ², which is unknown, as we did in Remark 3.3,
we can formally replace σ² with the unbiased empirical sample variance

σ'² := (1/(n−p)) sse, (4.7)

where

sse = (y − Cβ̂)'y. (4.8)

After we do this, we call the obtained value

E(Ŷ − Y)² |_{σ² ← σ'²} = σ'²(1 + x'Rx)

= (1/(n−p)) (y − Cβ̂)'y (1 + x'Rx) (4.9)

the empirical MSE between Y and Ŷ. Formula (4.9) can be used to calculate the predicted
deviation of a response Y for any set of control variables x₁, ..., x_{p−1}.
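A short R sketch of (4.9), reusing the objects from the Example 2.3 sketch above and an illustrative prediction point x = 3:

    # assumes x, y, C, R, beta.hat, sse from the Example 2.3 sketch
    x0 <- c(1, 3, 9)                       # (1, x, x^2) at x = 3
    n <- length(y); p <- 3
    s2 <- sse / (n - p)                    # sigma'^2 of (4.7)
    s2 * (1 + t(x0) %*% R %*% x0)          # empirical MSE (4.9)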


PROBLEMS

4.1. Under the conditions of Problem 3.1, find the empirical MSE between Y and Ŷ at x₁ = 1300
and x₂ = 240.

4.2. Under the conditions of Example 2.3 with the polynomial regression function, find the empirical
MSE between Y and Ŷ at x = 3.


5. The Prediction Interval

Since both Y and Ŷ are Gaussian and independent, so is their difference:

Y − Ŷ ∈ N(0, σ²(1 + x'Rx)).

Consequently,

Z := (Y − Ŷ) / (σ√(1 + x'Rx)) ∈ N(0, 1).

Furthermore, it can be proved that B is independent of

S² = SSE = Σ_{i=1}^n (Yᵢ − B₀ − x_{i1}B₁ − ... − x_{i,p−1}B_{p−1})²

(the random version of the sse) and that

W = SSE/σ² ∈ χ²_{n−p}.

Therefore Ŷ = x'B, and hence also Y − Ŷ and Z, are independent of W. Hence the r.v.

T := Z / √(W/(n−p)) = [ (Y − Ŷ)/(σ√(1 + x'Rx)) ] / √(K'²/σ²) = (Y − Ŷ) / (K'√(1 + x'Rx))

is t with n − p degrees of freedom. Furthermore,

P{|T| ≤ c} = 1 − α

= P{−c ≤ T ≤ c} = 2P{T ≤ c} − 1

⟹ P{T ≤ c} = 1 − α/2 ⟹ c = T⁻¹_{n−p}(1 − α/2) =: t^{n−p}_{α/2}.

Therefore, the r.v.

T = (Y − Ŷ) / (K'√(1 + x'Rx)) ∈ (−t^{n−p}_{α/2}, t^{n−p}_{α/2})

with probability 1 − α, or formally,

Y ∈ ( Ŷ − t^{n−p}_{α/2}√(K'²(1 + x'Rx)),  Ŷ + t^{n−p}_{α/2}√(K'²(1 + x'Rx)) ).


An empirical version of the above interval is

( ŷ(x') − t^{n−p}_{α/2}√(σ'²(1 + x'Rx)),  ŷ(x') + t^{n−p}_{α/2}√(σ'²(1 + x'Rx)) )

= ( x'β̂ − t^{n−p}_{α/2} r_x,  x'β̂ + t^{n−p}_{α/2} r_x ),

where

r_x = √( sse(1 + x'Rx)/(n−p) ) = √( (y − Cβ̂)'y (1 + x'Rx)/(n−p) ),

and it is called the 100(1−α)%-prediction interval for the regression function
y = β₀ + Σ_{i=1}^{p−1} βᵢxᵢ.
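Continuing the Example 2.3 sketch in R, the 95% prediction interval at the illustrative point x = 3:

    # assumes x0, n, p, s2, R, beta.hat from the sketches above
    alpha <- 0.05
    rx <- as.numeric(sqrt(s2 * (1 + t(x0) %*% R %*% x0)))
    yhat <- as.numeric(t(x0) %*% beta.hat)
    yhat + c(-1, 1) * qt(1 - alpha/2, n - p) * rx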


PROBLEMS

5.1. Under the conditions of Problems 3.1 and 4.1, find the 95% prediction interval for y.

5.2. Under the conditions of Example 2.3 and the polynomial regression function with x = 3, find the
95% prediction interval for y.


6. Testing for Redundancy of Regression Parameters

The regression plane y = β₀ + Σ_{i=1}^{p−1} βᵢxᵢ, stuffed with numerous control variables x₁, ..., x_{p−1},
may contain some which have little or no impact on the response y, and if this is the case, it
would be a good idea to get rid of them and reduce the regression model. Consequently, it makes
sense to test the hypothesis that the associated βᵢ's equal zero. We had some testing of
single β's in section 3, but testing more than one β at once requires a different approach.

We will call the original regression model

Y = β₀ + Σ_{i=1}^{p−1} βᵢxᵢ + ε

the complete model. Suppose, without loss of generality, we need to test the last p − r (r < p)
parameters as

H₀: β_r = ... = β_{p−1} = 0,

and in order to do so we will propose an associated test statistic. We then introduce the
reduced model

Y = β₀ + Σ_{i=1}^{r−1} βᵢxᵢ + ε.

The degree of redundancy of the tested parameters can be evaluated based on the associated
SSE's of both models, for convenience denoted by SSE_C and SSE_R for the complete and reduced
models respectively. It seems plausible that SSE_C must be smaller than SSE_R, since the presence
of more parameters in the complete model should reduce the least-squares error function. How
significant the reduction is, however, is left to a test: the greater the difference SSE_R − SSE_C, the
stronger the evidence that H₀ must be rejected.

We know from the previous sections that

SSE_C/σ² ∈ χ²_{n−p},

and, similarly,

SSE_R/σ² ∈ χ²_{n−r}.

We then form the test statistic

F := [ (SSE_R − SSE_C)/(p − r) ] / [ SSE_C/(n − p) ],


in which the numerator and denominator can be shown to be independent. Furthermore, the r.v.
F is known to have the F-distribution with p − r numerator and n − p denominator degrees of
freedom.

Consequently, given an α, we reject H₀ at the level of significance α if f > c, where
c = U⁻¹(1 − α), U(x) = P{F ≤ x}, and

f := [ (sse_R − sse_C)/(p − r) ] / [ sse_C/(n − p) ]

is the associated empirical value of F.

We can also operate with the p-value, p = P{F > f} = 1 − U(f), to reject or not to reject H₀.

Example 6.1. In light of Example 2.2, where we considered a three-parameter linear regression
model driven by the table

i    xᵢ    yᵢ
1    −2    0
2    −1    0
3     0    1
4     1    1
5     2    3

with the fit parabolic in x, y = β₀ + β₁x + β₂x², we will test

H₀: β₁ = β₂ = 0

H₁: at least one of β₁, β₂ is not zero

at the level of significance α = 0.05.

The model of Example 2.2 can therefore be regarded as complete, while the reduced model will
be y = β₀.

We will use formula (2.19),

sse = (y − Cβ̂)'y,

to compute the sse's for both models. For the reduced model,


 B"!  !
 B#!  !
œ"
   
GR œ  B$! ß y œ  " Þ
œ"
   
œ"

 B&!  $
B%! œ" "
œ"

We calculate VR œ G w G " œ "


&
^ œ V G w y œ " & œ " and this yields
and " R R &

sseR œ 'Þ

For the complete model,

$$C_C = \begin{pmatrix}
x_{10} = 1 & x_{11} = -2 & x_{11}^2 = 4 \\
x_{20} = 1 & x_{21} = -1 & x_{21}^2 = 1 \\
x_{30} = 1 & x_{31} = 0 & x_{31}^2 = 0 \\
x_{40} = 1 & x_{41} = 1 & x_{41}^2 = 1 \\
x_{50} = 1 & x_{51} = 2 & x_{51}^2 = 4
\end{pmatrix}$$

and

$$\hat{\boldsymbol\beta} = \begin{pmatrix} 4/7 \\ 0.7 \\ 3/14 \end{pmatrix}$$

give

$$\mathrm{sse}_C = 11 - \left(5 \cdot \tfrac{4}{7} + 7 \cdot 0.7 + 13 \cdot \tfrac{3}{14}\right)
 = \tfrac{16}{35} \approx 0.457.$$

Furthermore, we identify $n = 5$, $p = 3$, $r = 1$, and thus have

$$f = \frac{(6 - 0.457)/(3 - 1)}{0.457/(5 - 3)} = 12.125.$$

The tabulated critical value is $c = \mathcal{F}^{-1}(0.95) = 19$, so $f$ does not fall into the
critical region and we do not reject $H_0$ that $\beta_1 = \beta_2 = 0$, meaning that the
regression is independent of $x$ in both the linear and the parabolic terms. In addition, the
$p$-value is $p = P\{F > f\} = 0.0762$, which is not large, but greater than $0.05$, so $H_0$ is
not rejected. □
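
The computation of Example 6.1 can be checked in R, where anova() applied to two nested fitted
models carries out exactly this F-test. A sketch using the data of the example:

  x <- c(-2, -1, 0, 1, 2)
  y <- c(0, 0, 1, 1, 3)
  complete <- lm(y ~ x + I(x^2))            # y = b0 + b1 x + b2 x^2
  reduced  <- lm(y ~ 1)                     # y = b0
  deviance(reduced); deviance(complete)     # sse_R = 6, sse_C = 16/35
  anova(reduced, complete)                  # F = 12.125, p-value = 0.0762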


PROBLEMS

6.1. Under the conditions of Problem 3.1, test the hypotheses

$H_0$: $\beta_1 = \beta_2 = 0$

$H_1$: $H_0$ is not true

at the level of significance $\alpha = 0.05$.

6.2. Suppose $y$ is the current value of a house, $x_1$ is the square footage of the living area
(in thousands of sq. ft.), $x_2$ is the location (numerical value of the indicator zone: from 1
to 10; 10 is the best), $x_3$ is the state mean appraised value for the past three years (in
$100,000), $x_4$ is the wind resistance of the roof (the hurricane code: 1 if it is under the
code, 0 if it is not), $x_5$ is additional hurricane protection, such as shutters, panels, etc.
(ranked from 0 to 10), $x_6$ is the flood zone (from $-5$ to $5$), and $x_7$ is the total
acreage of the property.

Suppose we have 12 observations on the current value of the house, collected from 12 different
homes.

  i   x1 sq.ft.  x2 loc  x3 av  x4 hc  x5 hp  x6 fz  x7 acr   y hv
  1      1.3        5     1.2     1      0      2      0.3    1.3
  2      1.5        3     1.4     1      2      0      0.4    1.6
  3      2.0        3     1.5     1      5      1      0.5    1.8
  4      3.0        4     1.8     0      3      1      0.8    2.4
  5      3.3        4     1.8     1      5      2      1.0    2.6
  6      3.5        5     2.0     0      5      3      1.1    2.8
  7      3.5        6     2.5     1      4      2      1.5    3.1
  8      3.3        7     2.7     1      2      4      2.5    3.5
  9      4.1        6     2.5     0      2      1      3.5    3.7
 10      4.5        7     3.0     1      5      1      2.5    4.0
 11      4.6        8     3.2     1      9      3      1.5    4.2
 12      5.2       10     3.8     1     10      4      2.8    4.5

a) Find the deterministic prediction plane $\hat{y}$ and calculate the predicted value of a home
with the following control variables: $x_1 = 3.7$, $x_2 = 5$, $x_3 = 2.7$, $x_4 = 1$,
$x_5 = 1$, $x_6 = 1$, $x_7 = 2.0$.

b) Under the conditions of a), find the empirical MSE between $Y$ and $\hat{Y}$.

c) Under the conditions of a), find the 95% prediction interval for $y$.

d) Test the hypothesis that $\beta_3$, $\beta_4$, and $\beta_7$ are redundant as an entire
group, pairwise, and individually.


6.3 (PROJECT). Select four different financial instruments of your choice (such as stocks,
mutual funds, and indexes), identifying one of them as the response variable, and test different
combinations of $\beta_1, \beta_2, \beta_3$ for redundancy. You can restrict the data to 12
trading days.


7. Testing Hypotheses about Linear Functions of Regression Parameters

In this section we investigate how to make an inference about the linear function

$$\mathbf{c}'\boldsymbol\beta = \sum_{i=0}^{p-1} c_i \beta_i, \qquad c_i \in \mathbb{R}, \tag{7.1}$$

thereby generalizing the results of section 3, in which not a combination but one particular
regression coefficient $\beta_j$ was tested. Furthermore, we also add a discussion of confidence
intervals for $\mathbf{c}'\boldsymbol\beta$, not previously rendered.

We begin by figuring out the nature of its estimator $\mathbf{c}'\mathbf{B}$. The results and
discussion will be very similar to those for simple linear regression; however, since we did not
focus there on the general case, we revisit it in this section. If

$$\mathbf{c}'\mathbf{B} = \sum_{i=0}^{p-1} c_i B_i, \tag{7.2}$$

we have

$$\mathbf{c}'\boldsymbol\beta = E\left[\mathbf{c}'\mathbf{B}\right]. \tag{7.3}$$

It can be shown that

$$\operatorname{Var}(\mathbf{c}'\mathbf{B}) = \mathbf{c}'(\operatorname{Cov}\mathbf{B})\,\mathbf{c} = \sigma^2\,\mathbf{c}' R\,\mathbf{c}. \tag{7.4}$$

Recall from (2.24) that if we multiply $\mathbf{B}$ by an $n \times p$ matrix $K$, we have

$$K\mathbf{B} = K R C' \sigma \mathbf{Z} + K\boldsymbol\beta, \tag{7.5}$$

a multivariate Gaussian vector, provided that the $n \times n$ matrix $KRC'$ is nonsingular.
Moreover, $\mathbf{W} := K\mathbf{B}$ is $n$-variate normal; therefore the first component of
$\mathbf{W}$, say $W_1$, is normal. Obviously, $\mathbf{c}'\mathbf{B}$ can be regarded as the
product of the first row of the matrix $K$ and the vector $\mathbf{B}$, and the result is a
normal r.v. with parameters $\mathbf{c}'\boldsymbol\beta$ and
$\sigma^2\,\mathbf{c}' R\,\mathbf{c}$. Consequently, the r.v.

$$Z := \frac{\mathbf{c}'\mathbf{B} - \mathbf{c}'\boldsymbol\beta}{\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}}} \tag{7.6}$$

is standard normal.

Remark 7.1. Suppose we intend to estimate $\mathbf{c}'\boldsymbol\beta$ with
$\mathbf{c}'\hat{\boldsymbol\beta}$. Consider


$$P\left\{ \frac{|\mathbf{c}'\mathbf{B} - \mathbf{c}'\boldsymbol\beta|}{\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}}} \le \gamma \right\} = 1 - \alpha. \tag{7.7}$$

Unfolding the inequality within the probability, we see that

$$P\left\{ \mathbf{c}'\mathbf{B} - \gamma\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}} < \mathbf{c}'\boldsymbol\beta < \mathbf{c}'\mathbf{B} + \gamma\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}} \right\} = 1 - \alpha, \tag{7.8}$$

i.e., $1 - \alpha$ is the probability that $\mathbf{c}'\boldsymbol\beta$ falls into the random
interval

$$(A, B) = \left( \mathbf{c}'\mathbf{B} - \gamma\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}},\ \mathbf{c}'\mathbf{B} + \gamma\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}} \right). \tag{7.9}$$

To find $\gamma$, we set the probability in (7.7) equal to $1 - \alpha$, where $\alpha$ is the
usual significance level. Since the r.v.
$(\mathbf{c}'\mathbf{B} - \mathbf{c}'\boldsymbol\beta)/(\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}})$
is standard Gaussian, we can easily find
$\gamma = \Phi^{-1}\!\left(1 - \frac{\alpha}{2}\right) =: z_{\alpha/2}$ (which stands for the
corresponding $\alpha/2$-tail of the standard Gaussian PDF). Replacing $\gamma$ with
$z_{\alpha/2}$ in (7.9), we conclude that $\mathbf{c}'\boldsymbol\beta$ falls into the random
interval

$$(A, B) = \left( \mathbf{c}'\mathbf{B} - z_{\alpha/2}\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}},\ \mathbf{c}'\mathbf{B} + z_{\alpha/2}\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}} \right) \tag{7.10}$$

with probability $1 - \alpha$. The practical significance of this conclusion is minimal, mainly
due to the presence of the estimator $\mathbf{B}$ of $\boldsymbol\beta$; thus, if we replace
$\mathbf{c}'\mathbf{B}$ with its empirical value $\mathbf{c}'\hat{\boldsymbol\beta}$, we obtain
a fully deterministic interval

$$(a, b) = \left( \mathbf{c}'\hat{\boldsymbol\beta} - z_{\alpha/2}\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}},\ \mathbf{c}'\hat{\boldsymbol\beta} + z_{\alpha/2}\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}} \right), \tag{7.11}$$

for which, however, the notion of probability no longer applies. In other words, we can no
longer claim that $\mathbf{c}'\boldsymbol\beta$ falls into the interval $(a, b)$ of (7.11) with
probability $1 - \alpha$, simply because the latter interval is one of the realizations of the
original random interval $(A, B)$. Consequently, statisticians call this empirical interval
$(a, b)$ the $100(1 - \alpha)\%$ confidence interval for $\mathbf{c}'\boldsymbol\beta$. □

Remark 7.2. The above confidence interval for $\mathbf{c}'\boldsymbol\beta$ has yet another
shortcoming. We notice that it contains the unknown model parameter $\sigma$, which naturally
needs to be replaced with $s'$, the empirical estimate of $\sigma$. However, the entire
associated argumentation then needs to be modified. Namely, we need to start with
$Z = (\mathbf{c}'\mathbf{B} - \mathbf{c}'\boldsymbol\beta)/(\sigma\sqrt{\mathbf{c}' R\,\mathbf{c}})$
in (7.6) and there replace $\sigma$ with $S'$, where $S'^2 = \mathrm{SSE}/(n - p)$ is the
unbiased estimator of $\sigma^2$. By doing so, the modified r.v.

$$T := \frac{\mathbf{c}'\mathbf{B} - \mathbf{c}'\boldsymbol\beta}{S'\sqrt{\mathbf{c}' R\,\mathbf{c}}} \tag{7.12}$$

becomes a $t$-r.v. with $n - p$ degrees of freedom, as we have seen numerous times. The main
argument, as in the past cases, is that $\mathbf{B}$ is independent of SSE and thus of
$S'^2 = \mathrm{SSE}/(n - p)$. Skipping the details of this procedure (as being totally similar
to those in Remark 7.1), we obtain the $100(1 - \alpha)\%$ confidence interval for
$\mathbf{c}'\boldsymbol\beta$ as


$$(a, b) = \left( \mathbf{c}'\hat{\boldsymbol\beta} - t^{\,n-p}_{\alpha/2}\, s'\sqrt{\mathbf{c}' R\,\mathbf{c}},\ \mathbf{c}'\hat{\boldsymbol\beta} + t^{\,n-p}_{\alpha/2}\, s'\sqrt{\mathbf{c}' R\,\mathbf{c}} \right), \tag{7.13}$$

where $t^{\,n-p}_{\alpha/2}$ is the $\alpha/2$-tail of the $t_{n-p}$ distribution. □
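
A sketch of how (7.13) can be evaluated in R, reusing the data of Example 6.1 with a
hypothetical coefficient vector c (qt gives the t quantile; solve() inverts C'C):

  x <- c(-2, -1, 0, 1, 2); y <- c(0, 0, 1, 1, 3)
  Cmat <- cbind(1, x, x^2)                  # design matrix of the complete model
  n <- nrow(Cmat); p <- ncol(Cmat)
  R <- solve(t(Cmat) %*% Cmat)              # R = (C'C)^{-1}
  beta_hat <- R %*% t(Cmat) %*% y           # least-squares estimate
  s_prime <- sqrt(sum((y - Cmat %*% beta_hat)^2) / (n - p))   # s' = sqrt(sse/(n - p))
  cvec <- c(1, 3, 9)                        # hypothetical c: regression value at x = 3
  est  <- sum(cvec * beta_hat)
  half <- qt(1 - 0.05/2, df = n - p) * s_prime * sqrt(t(cvec) %*% R %*% cvec)
  c(est - half, est + half)                 # 95% confidence interval for c'beta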

Now suppose we need to test

$$H_0: \mathbf{c}'\boldsymbol\beta = c_* \tag{7.14}$$

against

$$H_1: \mathbf{c}'\boldsymbol\beta \ne c_*. \tag{7.15}$$

The usual way to employ a test statistic would be first to replace
$\mathbf{c}'\boldsymbol\beta$ in (7.6) with $c_*$, in which case we would test whether or not
the obtained statistic, still Gaussian, remains standard. However, since $\sigma$ is unknown, we
also need to replace $\sigma$ with $S'$, thereby arriving at the $t$-r.v. with $n - p$ degrees
of freedom

$$U_* := \frac{\mathbf{c}'\mathbf{B} - c_*}{S'\sqrt{\mathbf{c}' R\,\mathbf{c}}}, \tag{7.16}$$

if the null hypothesis proves to be true.

The critical region for the corresponding inference will be

$$C = \left( -\infty,\ -t^{\,n-p}_{\alpha/2} \right) \cup \left( t^{\,n-p}_{\alpha/2},\ \infty \right) \tag{7.17}$$

at significance level $\alpha$, where $t^{\,n-p}_{\alpha/2}$, as mentioned, is the
$\alpha/2$-tail of the $t_{n-p}$ distribution. To reject the null hypothesis we therefore need
to test whether the empirical version of $U_*$,

$$u_* = \frac{\mathbf{c}'\hat{\boldsymbol\beta} - c_*}{s'\sqrt{\mathbf{c}' R\,\mathbf{c}}}, \tag{7.18}$$

falls into the critical region $C$. The associated $p$-value will be

$$\pi = P\{|t_{n-p}| > |u_*|\} = 1 - P\{-|u_*| \le t_{n-p} \le |u_*|\} = 2\left(1 - T_{n-p}(|u_*|)\right), \tag{7.19}$$

where $T_{n-p}$ denotes the CDF of the $t_{n-p}$ distribution; we denote the $p$-value this time
by $\pi$ in order not to confuse it with $p$, the number of model parameters
$\beta_0, \ldots, \beta_{p-1}$.
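
In R, the statistic (7.18) and the $p$-value (7.19) can be computed directly, with pt as the CDF
of the t distribution. A sketch continuing the setup of the previous code, with a hypothetical
null value c_*:

  x <- c(-2, -1, 0, 1, 2); y <- c(0, 0, 1, 1, 3)
  Cmat <- cbind(1, x, x^2); n <- nrow(Cmat); p <- ncol(Cmat)
  R <- solve(t(Cmat) %*% Cmat); beta_hat <- R %*% t(Cmat) %*% y
  s_prime <- sqrt(sum((y - Cmat %*% beta_hat)^2) / (n - p))
  cvec <- c(1, 3, 9); c_star <- 0           # hypothetical c and null value c_*
  u_star <- (sum(cvec * beta_hat) - c_star) /
            (s_prime * sqrt(t(cvec) %*% R %*% cvec))    # statistic (7.18)
  pi_val <- 2 * (1 - pt(abs(u_star), df = n - p))       # p-value (7.19)
  abs(u_star) > qt(1 - 0.05/2, df = n - p)  # TRUE means u* falls into C of (7.17)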


Remark 7.3. Suppose we test some generic simple hypothesis $H_0$: $\mu = \mu_0$, and suppose
there is a test statistic $T$, an r.v. whose pdf is an even function, that rejects $H_0$ at
significance level $\alpha$ whenever $|T| > c$, where

$$c = F_T^{-1}\!\left(1 - \tfrac{\alpha}{2}\right).$$

Suppose $t$ is an observed value of, allegedly, $T$. We reject $H_0$ if $t > c$ or $t < -c$. If
we reject $H_0$, it means that $|t| > c$, which is equivalent to saying that

$$P\{|T| > |t|\} = 2\left(1 - F_T(|t|)\right) < P\{|T| > c\} = \alpha.$$

Therefore, it makes sense to define

$$p = P\{|T| > |t|\}$$

for an observed $t$ and call it the $p$-value. Since $t$ was initially measured against $c$, and
$c$ had been calculated from a given significance level $\alpha$, we see that the smaller $p$ is
relative to $\alpha$, the more evidence we have against $H_0$. Small values of $p$, which speak
against the validity of $H_0$, tell us that under $H_0$ deviations of $T$ as large as the
observed $t$ are unlikely. In conclusion, the $p$-value is a measure of how much evidence we
have against the null hypothesis. □


PROBLEMS

7.1. Prove formula (7.17) for the critical region.

7.2. Give the formula for the confidence interval for $\mathbf{c}'\boldsymbol\beta$ when testing
a simple null hypothesis about a particular regression coefficient in section 3.

7.3. Under the conditions of Problem 1, section 3, test the hypothesis that $\beta_1 = \beta_2$,
first at the 0.025 significance level and then by using the $p$-value.

7.4. Under the conditions of Problem 1, section 3, give the 95% confidence interval for the
linear combination $\beta_0 + 1300\beta_1 + 70\beta_2$. Compare it with the 90% and 99%
confidence intervals.
