
STAT 301 Probability Distributions

Unit 1 Events and their Probabilities

Introduction
I welcome you, my valued student, to Unit 1 of this module on Probability
Distributions. In the courses STAT 201 and STAT 204 you learnt about a random
variable as a variable that can take several values within a defined range on the real
line in a random manner. In this Unit, you shall learn about random variables as
functions from an abstract space (a probability space) to the real line, and about
probability measures, which assign to events numbers in the closed interval [0,1].
The discussions will cover the following:

Section 1 Abstract Definition of a Sample Space

Section 2 Axioms of Probability

Section 3 The Concept of Random variables

Section 4 Probability distributions

Section 5 Conditional Probability

Section 6 Conditional expectation

Objectives
Upon the completion of this Unit, you should be able

• to define a sample space as an abstract space,
• to define a random variable as a function,
• to compute the probability distribution of random variables,
• to find the conditional probability distribution of random variables,
• to calculate conditional expectations.


Section 1 Abstract Definition of a Sample Space

Introduction
You are on the launching pad, ready to take off on a journey through the world
of Probability Theory. In this section I shall define a sample space and explain
to you the main assumptions underlying this definition.

Objectives
By the end of this section, you should be able

 to give abstract definition of a sample space


 to define a sigma-field and give examples of sigma-algebra.

Review
Recall that in connection with an experiment we have the following quantities:

Definition 1.1 (Sample Space)

It is the set of all possible outcomes of a random experiment. Denote it by Ω and
note that the complement of Ω is the empty set, i.e. Ωᶜ = ∅.

A sample point is any member of Ω, and you shall denote it by ω, i.e. ω ∈ Ω.

Subsets of Ω are called events. If A ⊆ Ω, you say that the event A occurs (with
some probability) if the outcome of the experiment is an element of A. Thus
the relation A ⊆ B, between sets considered as events, means that the occurrence of
A implies the occurrence of B.

If A = {ω}, A is called an elementary event or simple event. If A contains more
than one point, it is called a composite or compound event.

Definition 1.2 (Sigma-field or σ-field)

A collection F of subsets of Ω is called a sigma-field (σ-algebra) if it
satisfies the following properties:

(a) Ω ∈ F;

(b) Aᶜ ∈ F whenever A ∈ F, where Aᶜ = Ω \ A;

(c) ∪_{n=1}^∞ Aₙ ∈ F whenever Aₙ ∈ F, n = 1, 2, 3, ....


Remark

By the above definition, if Ω has n elements, then the class F of all subsets of Ω
is a sigma-field.

Example 1.4

If you toss a fair coin and your interest is in the outcomes, i.e. Head denoted by
H and Tail denoted by T, then Ω = {H, T} and the class F = {∅, {H}, {T}, {H, T}}
is a sigma-field containing four elements.

But this is not the only possible sigma-field. There are many others. If A ⊆ Ω,
then the class of sets {∅, A, Aᶜ, {H, T}} is also a sigma-field; in this case a sub
sigma-algebra of F.

Remark
Note that , by definition 1.2 , a sigma-field is a non-empty class of events, closed
under countable unions, intersections and complementation.
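
For a finite sample space, the closure properties in Definition 1.2 can be checked mechanically. The short Python sketch below is only an illustration (it is not part of the original notes); the function name is_sigma_field and the coin-toss test data are assumptions made for this example.

```python
def is_sigma_field(omega, F):
    """Check Definition 1.2 for a finite sample space omega and a class F of frozensets."""
    omega = frozenset(omega)
    if omega not in F:                          # (a) Omega must belong to F
        return False
    for A in F:                                 # (b) closed under complementation
        if omega - A not in F:
            return False
    for A in F:                                 # (c) closed under unions (finite case)
        for B in F:
            if A | B not in F:
                return False
    return True

omega = {"H", "T"}
F = {frozenset(), frozenset({"H"}), frozenset({"T"}), frozenset({"H", "T"})}
print(is_sigma_field(omega, F))                                # True: the sigma-field of Example 1.4
print(is_sigma_field(omega, {frozenset(), frozenset({"H"})}))  # False: Omega and {T} are missing
```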

Example 1.6

Consider the experiment of two independent tosses of a fair coin, and let the sample
space Ω consist of the possible numbers of heads observed.

(i) List the elements of Ω.

(ii) List three different sigma-fields, each with 4 elements, that can be formed on Ω.

Solution

(i) Ω = {0, 1, 2}

(ii) Three such sigma-fields are:

F₁ = {∅, {1}, {0, 2}, Ω}

F₂ = {∅, {0}, {1, 2}, Ω}

F₃ = {∅, {2}, {0, 1}, Ω}.


Activity 1.7
In an experiment of tossing a fair die once, let Ω be the sample space and ∅ the
impossible event.

(i) List a sigma-field with eight elements.

(ii) Is your answer in (i) unique ?

Summary
I am sure by now you can define a sample space and list sigma- fields generated
by a sample space. But just before you give yourself a break, let me recap all that
we have discussed in this section.

You have learnt

 about abstract definition of a sample space


 to list a sigma-field and examples of sigma-algebra.


Section 2 Axioms of Probability


You are welcome back from your break. I will now define a probability space ,and
explain to you the axioms underlying this definition.

Objectives
By the end of this section, you should be able to

 define a probability space,


 explain the axioms underlying the definition of probability of an event.

Definition 2.1 (Probability Space)

If the probability measure, a function P defined on F (where F is a σ-field),
satisfies

(i) 0 = P(∅) ≤ P(A) ≤ P(Ω) = 1, for all A ∈ F,

(ii) P(A₁ ∪ A₂) = P(A₁) + P(A₂) − P(A₁ ∩ A₂), for Aᵢ ∈ F, i = 1, 2, 3, ...,

(iii) P(∪_{i=1}^∞ Aᵢ) = Σ_{i=1}^∞ P(Aᵢ) if Aᵢ ∈ F, i = 1, 2, 3, ... are mutually disjoint
(i.e. Aᵢ ∩ Aⱼ = ∅ for i ≠ j),

then the triplet (Ω, F, P) is called a probability space.

Theorem 2.2

Let (Ω, F, P) be a probability space and let H ∈ F with P(H) > 0. For an
arbitrary A ∈ F, define

P_H(A) = P(A ∩ H)/P(H).

Then (Ω, F, P_H) is a probability space.

Proof of Theorem 2.2

Clearly, you can verify that P_H(A) = P(A ∩ H)/P(H) ≥ 0, for all A ∈ F.

Also you have P_H(Ω) = P(Ω ∩ H)/P(H) = P(H)/P(H) = 1.

Next you can verify that, if A₁, A₂, ... is a disjoint sequence of sets in F, then

P_H(∪_{i=1}^∞ Aᵢ) = P((∪_{i=1}^∞ Aᵢ) ∩ H)/P(H) = Σ_{i=1}^∞ P(Aᵢ ∩ H)/P(H) = Σ_{i=1}^∞ P_H(Aᵢ).

This implies (Ω, F, P_H) is a probability space.

Example 2.3

Let Ω be the sample space of a random experiment. If A and B form a partition of Ω and ∅
denotes the impossible event, show that the class of sets F = {∅, Ω, A, B} is a sigma-field.

Solution
Since A and B form a partition of Ω, Aᶜ = B and Bᶜ = A. By the definition of F:
(i) Ω ∈ F and ∅ ∈ F;
(ii) Aᶜ = B ∈ F and Bᶜ = A ∈ F, so F is closed under complementation;
(iii) A ∪ B = Ω ∈ F, and every other union of members of F is again ∅, A, B or Ω,
which shows that F is a sigma-field.

Example 2.4

Let (Ω, F, P) be a probability space and let B ∈ F with P(B) > 0. Show that the function
P_B : F → R defined by P_B(A) = P(A | B) is a probability measure on (Ω, F) with
P_B(B) = 1.

Solution
Given P_B(A) = P(A | B) = P(A ∩ B)/P(B), you have

P_B(∅) = P(∅ ∩ B)/P(B) = P(∅)/P(B) = 0, since ∅ ∩ B = ∅.

Now take any A with A ∩ B = B (for example A = B); then notice that you have

P(A ∩ B) = P(B), so P(A ∩ B)/P(B) = 1,

∴ P(A | B) = 1, i.e. P_B(A) = 1, and in particular P_B(B) = 1.

Property 2.5 (Addition property)

Let A ∩ C = ∅. Then

P_B(A ∪ C) = P((A ∪ C) ∩ B)/P(B) = P(A ∩ B)/P(B) + P(C ∩ B)/P(B).

Hence,

P_B(A ∪ C) = P_B(A) + P_B(C).

Activity 2.6

Consider two independent tosses of a fair coin, observing Heads (H) and Tails (T). Choose
Ω = {HH, HT, TH, TT} and let F be the collection of all subsets of Ω. Assuming each
outcome in Ω is equally likely, show that (Ω, F, P) is a probability space.

Activity 2.7
Let Ω be the sample space of a random experiment. If A and B form a partition
of Ω and ∅ denotes the impossible event, show that the class of sets
F = {∅, Ω, A, B} is a σ-field. If P is a function defined on F, what properties
must P satisfy for the triple (Ω, F, P) to be called a probability space?

Summary
In this section you have learnt

 to give the definition of a probability space,


 about the axioms underlying the definition of probability of an event.


Section 3 Discrete Random Variables

Introduction
You are welcome to Section 3 of Unit 1. Here, I will define a discrete random
variable as a function from a probability space to the real line.

Objectives
At the end of this Section , you should be able to

 define a discrete random variable on a probability space,


 define a probability mass function or a probability frequency function,
 calculate simple probabilities from the cumulative distribution function.

Definition 3.1 (Discrete Random Variables)

A discrete real-valued random variable X on a probability space (Ω, L, P) is a
function X with domain Ω and range a finite or countably infinite subset
{x₁, x₂, x₃, ...} of the real numbers R such that {ω : X(ω) = xᵢ} is an event for all i.

Definition 3.2 (Probability mass function)

Let X be a real-valued random variable with c.d.f. F on the probability space
(Ω, L, P). Then X is said to be discrete if there exists a countable set E ⊂ R such that
P(X ∈ E) = 1, i.e. P({ω : X(ω) ∈ E}) = 1. The points of E which have positive
probability are the jump points of the step function F. The probability mass
function or the probability frequency function of X is the function

p_k = P(X = k), k ≥ 1, satisfying p_k ≥ 0 and Σ_{k≥1} p_k = 1.

Remark

Given any set of numbers p_k with p_k ≥ 0, k = 1, 2, 3, ..., and Σ_{k≥1} p_k = 1, then
{p_k, k = 1, 2, 3, ...} is the probability mass function of some random variable X.


Definition 3.3 (Cumulative distribution function)

If X is a discrete random variable , then its cumulative distribution function (c.d.f)


is given by

F(a) = Σ_{x ≤ a} P(X = x).

Example 3.4

Suppose X has a probability mass function given by p(1) = 1/4, p(2) = 1/2,
p(3) = 1/8 and p(4) = 1/8.

Then the cumulative distribution function of X is given by

F(a) = 0 for a < 1; 1/4 for 1 ≤ a < 2; 3/4 for 2 ≤ a < 3; 7/8 for 3 ≤ a < 4; 1 for a ≥ 4.

Lemma 3.5

Suppose that X has c.d.f. F. Then the probability that X = xᵢ is given by

P(X = xᵢ) = F(xᵢ) − F(xᵢ₋₁), i = 1, 2, 3, ....

Example 3.6

Suppose the cumulative distribution function of X is given by

F(a) = 0 for a < 1; 1/4 for 1 ≤ a < 2; 3/4 for 2 ≤ a < 3; 7/8 for 3 ≤ a < 4; 1 for a ≥ 4.

Find

(i) P(X = 2)

(ii) P(X = 1).


Solution

(i) P(X = 2) = F(2) − F(1) = 3/4 − 1/4 = 1/2.

(ii) P(X = 1) = F(1) − F(0) = 1/4 − 0 = 1/4.
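
As a quick numerical companion to Examples 3.4 and 3.6 (an illustrative sketch, not part of the original notes), the c.d.f. can be tabulated from the p.m.f. and the jumps recovered as in Lemma 3.5.

```python
# p.m.f. of Example 3.4
pmf = {1: 1/4, 2: 1/2, 3: 1/8, 4: 1/8}

def F(a):
    """Cumulative distribution function F(a) = sum of P(X = x) over x <= a."""
    return sum(p for x, p in pmf.items() if x <= a)

# Lemma 3.5: P(X = x_i) = F(x_i) - F(x_{i-1}) at consecutive jump points
print(F(2) - F(1))   # P(X = 2) = 3/4 - 1/4 = 0.5
print(F(1) - F(0))   # P(X = 1) = 1/4 - 0  = 0.25
```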

Activity 3.8

Suppose X is a random variable with c.d.f. F. Show that for any real number x,

P(X ≤ x) = lim_{y↓x} F(y).

Summary
You can now

 define a discrete random variable on a probability space,


 define a probability mass function or a probability frequency function,
 Compute simple probabilities from the cumulative distribution function.

Section 4 Continuous Random Variables

Introduction
The issue of continuous random variables and probability density functions (p.d.f.s)
is more complicated. A random variable X : Ω → R always has a cumulative
distribution function F. Whether there exists a function f such that f integrates
to F, and dF(x)/dx exists and equals f (almost everywhere), depends on something
stronger than just continuity. In this section I will take you through some concepts
of continuity in relation to c.d.f.s of continuous random variables.

Objectives
At the end of this Section you should be able to

 explain the concept of absolute continuity of c.d.fs,


 compute probabilities when the p.d.f of random variables is given.


Definition 4.1

A real-valued function F is continuous at x₀ ∈ R iff ∀ε > 0, ∃δ > 0 such that for all x:

|x − x₀| < δ ⟹ |F(x) − F(x₀)| < ε.

Note that F is continuous iff F is continuous at every x ∈ R.

Definition 4.2

A real-valued function F defined on [a, b] is absolutely continuous on [a, b] iff
∀ε > 0, ∃δ > 0 such that for every finite collection of disjoint subintervals
(aᵢ, bᵢ), i = 1, 2, ..., n, of [a, b]:

Σ_{i=1}^n (bᵢ − aᵢ) < δ ⟹ Σ_{i=1}^n |F(bᵢ) − F(aᵢ)| < ε.

Here you note that absolute continuity implies continuity.

Activity 4.3
Show that

(i) if F is absolutely continuous, then dF(x)/dx exists almost everywhere;

(ii) a function F is an indefinite integral iff it is absolutely continuous, i.e.
every absolutely continuous function F is the indefinite integral of
its derivative dF(x)/dx.

Definition 4.4

Let X be a random variable on (Ω, L, P) with c.d.f. F. We say X is a
continuous random variable iff F is absolutely continuous. Hence, there exists a
non-negative integrable function f, the probability density function (p.d.f.) of X,
such that

F(x) = ∫_{−∞}^{x} f(t) dt = P(X ≤ x).


Hence you have that, if a, b ∈ R with a < b, then

P(a < X ≤ b) = F(b) − F(a) = ∫_a^b f(t) dt

exists and is well defined.

Theorem 4.5

Let X be a continuous random variable with p.d.f. f. Then it holds that:

(i) For every Borel set B, P(X ∈ B) = ∫_B f(t) dt.

(ii) If F is absolutely continuous and f is continuous at x, then

F′(x) = dF(x)/dx = f(x).

Sketch of Proof

Part (i): from Definition 4.2 above.

Part (ii): by the fundamental theorem of calculus.

Note here that every c.d.f. F can be written in the form

F(x) = a F_d(x) + (1 − a) F_c(x), 0 ≤ a ≤ 1,

where F_d is the c.d.f. of a discrete random variable and F_c is a continuous c.d.f.

Example 4.6

Consider the function given as

F(x) = 0 for x < 0; 1/2 + x/2 for 0 ≤ x < 1; 1 for x ≥ 1.

You can write F(x) = a F_d(x) + (1 − a) F_c(x), 0 ≤ a ≤ 1. How?


Solution

Since F(x) has only one jump, at x = 0, it is reasonable to get started with a
p.m.f. p₀ = 1 and corresponding c.d.f.

F_d(x) = 0 for x < 0; 1 for x ≥ 0.

Since F(x) = 0 for x < 0 and F(x) = 1 for x ≥ 1, it must clearly hold that
F_c(x) = 0 for x ≤ 0 and F_c(x) = 1 for x ≥ 1. Moreover, F(x) increases linearly
in 0 < x < 1. A good guess would be a p.d.f. f_c(x) = 1, 0 < x < 1, and the
corresponding c.d.f.

F_c(x) = 0 for x ≤ 0; x for 0 < x < 1; 1 for x ≥ 1.

Now observe that F(0) = 1/2, so you have at least to multiply F_d(x) by 1/2.

Hence F(x) can be written as F(x) = (1/2) F_d(x) + (1/2) F_c(x).

Definition 4.7

The two-valued function I_A(x) is called the indicator function and it is defined as
follows:

I_A(x) = 1 if x ∈ A; 0 if x ∉ A.

Activity 4.8 (Dirac mass at a point). Verify that the function F given by

F(x) = 0 for x < a; 1 for x ≥ a,

where a is a fixed real number, is a legitimate c.d.f.

Summary
All too soon we have come to the end of this section. Let me recap what you have
learnt so far. You have learnt

 about absolute continuity of c.d.f


 how to compute probabilities from c.d.f.s and p.d.f.s


Section 5 Conditional Probability

Introduction
Welcome to Section 5 of Unit 1. In this section I shall re-introduce to you the
concept of conditioning given a jointly distributed random variables, say, X and Y .

Objectives
By the end of this section, you should be able to

 find conditional probability functions and probability frequency functions,


 compute probabilities from the conditional p.d.f and p.m.f.

Definition 5.1 (The Discrete Case)

If X and Y are discrete random variables, then

(i) the conditional probability mass function of X given Y = y is given by

p_{X|Y}(x|y) = P(X = x, Y = y)/P(Y = y) = p(x, y)/p_Y(y), for p_Y(y) > 0,

where p_Y is the marginal probability mass function of Y;

(ii) the conditional probability mass function of Y given X = x is given by

p_{Y|X}(y|x) = P(X = x, Y = y)/P(X = x) = p(x, y)/p_X(x), for p_X(x) > 0,

where p_X is the marginal probability mass function of X.

Observe that if X and Y are independent, then you have

(i) p_{Y|X}(y|x) = P(Y = y),

(ii) p_{X|Y}(x|y) = P(X = x).

This is because the joint p.m.f. of X and Y factorizes into the product of the
marginal probability mass functions of X and Y, i.e. p(x, y) = p_X(x) p_Y(y).


Example 5.2
The number of eggs laid by an insect is known to have a geometric distribution

f(x) = p(1 − p)ˣ for x = 0, 1, 2, 3, ...,

where p is a parameter. Each egg laid has a probability θ of hatching, independently of the
development of any other eggs. Show that the number of hatched eggs from a nest has a
geometric distribution. What are the mean and variance of the number of eggs hatched?
[Hint: the number of eggs hatched depends on the number of eggs laid.]

Solution
Let X be the number of eggs laid, and Y be the number of eggs hatched out of the X laid.
Note that the conditional probability mass function of Y given X = x is binomial:

P(Y = y | X = x) = C(x, y) θʸ (1 − θ)^{x−y}, y = 0, 1, 2, ..., x.

Therefore, you have the joint probability mass function of (Y, X) as

f(x, y) = f_{Y|X}(y|x) f_X(x) = C(x, y) θʸ (1 − θ)^{x−y} p(1 − p)ˣ.

And for the marginal probability of Y you calculate the summation

P(Y = y) = Σ_{x=y}^{∞} C(x, y) θʸ (1 − θ)^{x−y} p(1 − p)ˣ
         = p θʸ (1 − p)ʸ Σ_{x=y}^{∞} C(x, y) [(1 − θ)(1 − p)]^{x−y}.

Let x − y = k, so that x = y + k; then as x runs over y, y + 1, ..., k runs over 0, 1, 2, ..., and

P(Y = y) = p θʸ (1 − p)ʸ Σ_{k=0}^{∞} C(y + k, k) [(1 − θ)(1 − p)]ᵏ.

Recall the binomial expansion theorem (the negative binomial series)

(1 − z)^{−(y+1)} = Σ_{k=0}^{∞} C(y + k, k) zᵏ, |z| < 1,

and use it with z = (1 − θ)(1 − p) to obtain

P(Y = y) = p θʸ (1 − p)ʸ [1 − (1 − θ)(1 − p)]^{−(y+1)}.

Since 1 − (1 − θ)(1 − p) = p + θ(1 − p), this gives

P(Y = y) = p θʸ (1 − p)ʸ / [p + θ(1 − p)]^{y+1}
         = [p/(p + θ(1 − p))] · [θ(1 − p)/(p + θ(1 − p))]ʸ, y = 0, 1, 2, ...,

which is a geometric distribution with parameter p′ = p/(p + θ(1 − p)). Therefore, you have

E(Y) = (1 − p′)/p′ = θ(1 − p)/p,

and, from the variance of a geometric distribution of this form,

Var(Y) = (1 − p′)/p′² = θ(1 − p)[p + θ(1 − p)]/p².
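
The geometric conclusion of Example 5.2 can be supported by a small Monte Carlo experiment. The sketch below is an illustration only; the values p = 0.4 and θ = 0.5 are arbitrary assumptions, and the sample mean of Y is compared with θ(1 − p)/p.

```python
import random

p, theta, trials = 0.4, 0.5, 200_000
total = 0
for _ in range(trials):
    # number of eggs laid: geometric pmf p(1-p)^x on x = 0, 1, 2, ...
    x = 0
    while random.random() > p:
        x += 1
    # each egg hatches independently with probability theta
    y = sum(random.random() < theta for _ in range(x))
    total += y

print(total / trials)        # sample mean of Y
print(theta * (1 - p) / p)   # theoretical mean theta(1-p)/p = 0.75
```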
p

Example 5.4
If X and Y are independent Poisson random variables with parameters λ₁ and λ₂ respectively,
calculate and identify the conditional distribution of X given X + Y = k.

Solution
X ~ Po(λ₁), Y ~ Po(λ₂), so

P(X = x) = e^{−λ₁} λ₁ˣ/x!, x = 0, 1, 2, ...,  P(Y = y) = e^{−λ₂} λ₂ʸ/y!, y = 0, 1, 2, ....

X + Y ~ Po(λ₁ + λ₂) by the reproductive property.

∴ P(X = x | X + Y = k) = P(X = x, X + Y = k)/P(X + Y = k)

= P(X = x, Y = k − x)/P(X + Y = k)

= P(X = x) P(Y = k − x)/P(X + Y = k), since X and Y are independent,

= [e^{−λ₁} λ₁ˣ/x!][e^{−λ₂} λ₂^{k−x}/(k − x)!] / [e^{−(λ₁+λ₂)} (λ₁ + λ₂)ᵏ/k!]

= [k!/(x!(k − x)!)] · λ₁ˣ λ₂^{k−x}/(λ₁ + λ₂)ᵏ

= C(k, x) [λ₁/(λ₁ + λ₂)]ˣ [λ₂/(λ₁ + λ₂)]^{k−x}, x = 0, 1, 2, ..., k.

Hence X | X + Y = k ~ b(k, λ₁/(λ₁ + λ₂)), i.e. the conditional distribution of X given
X + Y = k is binomial with parameters n = k and p = λ₁/(λ₁ + λ₂).

Activity 5.5
The random variables X₁, X₂, ..., Xₙ are independent and each has a Poisson distribution with
mean 1. Let Y = X₁ + X₂ + ... + Xₙ. Find the conditional distribution of X₂ given Y.

Activity 5.6
The number of eggs X laid by an insect is known to have a binomial distribution with
parameters n and p (0 < p < 1). Each egg laid has a probability θ of hatching, independently
of the development of any other eggs.

(a) Show that the number of eggs hatched has a binomial distribution.
(b) What does this mean?
(c) What are the mean and variance of the number of eggs hatched?


Activity 5.7
Consider Y, the number of successes in M independent Bernoulli trials, each with success
probability X. Suppose that X itself is a r.v. which is uniformly distributed over (0, 1).

(a) Find the p.m.f. of Y and identify the distribution.

(b) What are the mean and variance of Y?

Activity 5.8
The number of automobile accidents a driver will be involved in during a one-year period is a
random variable Y having a Poisson distribution with parameter X, where X depends on the
driver. Suppose a driver is chosen at random from some population and hence X itself is a
continuous random variable with p.d.f. f(x), 0 < x < ∞.

(a) Show that P(Y = k) = ∫₀^∞ P(Y = k | X = x) f(x) dx.

(b) Suppose that X has an exponential distribution with mean 1/c, where c is a
positive constant. Obtain the distribution of Y; hence find the expectation of Y.

Definition 5.9 (The Continuous Case)

If X and Y have a joint p.d.f. f(x, y), then the conditional probability density
function of X given that Y = y is defined, for all values of y where the
marginal p.d.f. of Y satisfies h(y) > 0, by

f_{X|Y}(x|y) = f(x, y)/h(y).

The conditional p.d.f. of Y given that X = x can be similarly defined as

f_{Y|X}(y|x) = f(x, y)/g(x), for g(x) > 0,

where g is the marginal p.d.f. of X.


Example 5.10

Suppose the joint p.d.f. f(x, y) is given by

f(x, y) = (1/y) e^{−x/y} e^{−y}, 0 < x, y < ∞; 0 otherwise.

Find P(X > 1 | Y = y).

Solution

First obtain the conditional p.d.f. of X given Y = y as follows:

f_{X|Y}(x|y) = f(x, y)/h(y) = [(1/y) e^{−x/y} e^{−y}] / [e^{−y} ∫₀^∞ (1/y) e^{−x/y} dx] = (1/y) e^{−x/y}, x > 0, y > 0.

Hence, P(X > 1 | Y = y) = ∫₁^∞ (1/y) e^{−x/y} dx = [−e^{−x/y}]₁^∞ = e^{−1/y}.

Notice, if X and Y are independent, then you have that

f_{X|Y}(x|y) = g(x), the marginal p.d.f. of X, and

f_{Y|X}(y|x) = h(y), the marginal p.d.f. of Y.

The preceding examples show that if X and Y are jointly continuous, then
for any set A,

P(X ∈ A | Y = y) = ∫_A f_{X|Y}(x|y) dx.

In particular, if you let A = (−∞, a), you can define the conditional c.d.f. of X
given Y = y by

F_{X|Y}(a|y) = ∫_{−∞}^a f_{X|Y}(x|y) dx.

Note that by the Fundamental theorem of calculus you have

(d/da) F_{X|Y}(a|y) = f_{X|Y}(a|y).


Similarly, if X and Y are jointly discrete, then you can define the conditional
c.d.f. of X given Y = y as

F_{X|Y}(a|y) = P(X ≤ a, Y = y)/P(Y = y).

Definition 5.11 (Bivariate mixed random variables)
(X, Y) is called a bivariate mixed random variable if the joint distribution of
X and Y is partly discrete and partly continuous.

Example 5.12
Suppose that X is a continuous random variable with p.d.f. f(x) and c.d.f. F(x),
and N is a discrete random variable, so that (X, N) is a bivariate r.v.
Given this mixed situation, write

P(x < X ≤ x + dx | N = n)/dx = [P(N = n | x < X ≤ x + dx)/P(N = n)] · [P(x < X ≤ x + dx)/dx]

and let dx → 0 to obtain

lim_{dx→0} [F_{X|n}(x + dx) − F_{X|n}(x)]/dx = [P(N = n | X = x)/P(N = n)] · lim_{dx→0} [F(x + dx) − F(x)]/dx.

This yields

(d/dx) F_{X|n}(x|n) = [P(N = n | X = x)/P(N = n)] · (d/dx) F(x).

Hence you have

f_{X|n}(x|n) = [P(N = n | X = x)/P(N = n)] · f(x),

where f is the marginal p.d.f. of X. Use the legitimacy property of f_{X|n} to obtain

[1/P(N = n)] ∫_{−∞}^{∞} P(N = n | X = x) f(x) dx = 1.

Hence, P(N = n) = ∫_{−∞}^{∞} P(N = n | X = x) f(x) dx.

The joint distribution of N and X is

f(n, x) = P(N = n | X = x) f(x) = f_{X|n}(x|n) P(N = n).


In general, if X and Y are jointly distributed random variables, where X or
Y may be discrete or continuous, then observe that

P(X = x) = Σ_y P(X = x | Y = y) P(Y = y), if Y is discrete,

P(X = x) = ∫_{−∞}^{∞} P(X = x | Y = y) h(y) dy, if Y is continuous.

Example 5.13
Consider Y, the number of successes in n independent Bernoulli trials, each with a success
probability X. Suppose that X is a r.v. with uniform distribution over the interval (0, 1).
Find P(Y = y).

Solution
Given X = x, Y | X = x ~ b(n, x), i.e.

P(Y = y | X = x) = C(n, y) xʸ (1 − x)^{n−y}, y = 0, 1, ..., n.

Note that X ~ U(0, 1) and so the probability density function of X is given by

g(x) = 1, 0 < x < 1.

But the marginal of Y is given by

P(Y = y) = ∫_{[0,1]} f(x, y) dx, and P(Y = y | x) = f(x, y)/g(x), so f(x, y) = P(Y = y | x) g(x).

∴ h(y) = P(Y = y) = ∫_{[0,1]} P(Y = y | x) g(x) dx

= ∫₀¹ C(n, y) xʸ (1 − x)^{n−y} dx

= C(n, y) β(y + 1, n − y + 1), using the Beta function,

= [n!/(y!(n − y)!)] · [y!(n − y)!/(n + 1)!]

= 1/(n + 1), y = 0, 1, 2, ..., n,

which is the uniform distribution on the set {0, 1, 2, ..., n}.
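
The discrete-uniform conclusion of Example 5.13 can be checked with the Beta integral of Definition 5.15. The sketch below assumes n = 6 trials for illustration.

```python
from math import comb, gamma

n = 6  # assumed number of Bernoulli trials

def beta(a, b):
    """Beta integral B(a, b) = Gamma(a)Gamma(b)/Gamma(a+b)."""
    return gamma(a) * gamma(b) / gamma(a + b)

# P(Y = y) = C(n, y) * Integral_0^1 x^y (1-x)^(n-y) dx = C(n, y) * B(y+1, n-y+1)
for y in range(n + 1):
    prob = comb(n, y) * beta(y + 1, n - y + 1)
    print(y, round(prob, 6))        # each value equals 1/(n+1) = 0.142857
```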

Definition 5.14 (The Gamma integral)

The gamma function is defined by

Γ(x) = ∫₀^∞ θ^{x−1} e^{−θ} dθ, x > 0,

and for large x you can verify that

Γ(x) ≈ √(2π) e^{−x} x^{x−1/2}

by Stirling's formula.
When x = n is an integer you have

Γ(n) = (n − 1)(n − 2) ... 1 = (n − 1)!,

which can easily be obtained by successive integration by parts or a reduction
formula.

Definition 5.15 (The Beta integral)

The beta integral is given by

β(p, q) = ∫₀¹ x^{p−1} (1 − x)^{q−1} dx = Γ(p)Γ(q)/Γ(p + q),

where p > 0, q > 0.

Activity 5.16

Let X and Y be two independent, nonnegative integer-valued random variables
whose distribution has the property

P(X = x | X + Y = x + y) = C(m, x) C(n, y) / C(m + n, x + y)

for all nonnegative integers x and y, where m and n are given positive integers.

Assume that P(X = 0) and P(Y = 0) are strictly positive. Show that both X and
Y have binomial distributions with the same parameter p, the other parameters being
m and n respectively.

Summary
At the end of this section you have learnt,
 How to find conditional p.m.f’s and p.d.f’s
 How to compute probabilities from a conditional p.d.f and p.m.f.


Section 6 Conditional Expectation

Introduction
You are ready to use the conditional probability distributions studied in Section 5 to
calculate some numerical characteristics or estimates of these distributions.
You begin by first recalling from STAT 204 or Introductory Probability II the definition of
conditional expectation.

Objectives
By the end of this section, you should be able to
• compute numerical values from the conditional distribution,
• interpret the numerical values from the conditional distribution.

Definition 6.1 (Continuous case)

If (X, Y) is a bivariate continuous random variable with joint p.d.f. f(x, y),
you can define the conditional expectation of X given Y = y as

E[X | Y = y] = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx.

Definition 6.2 (Discrete case)

If (X, Y) is a bivariate discrete random variable with joint p.m.f. p(x, y), you
can define the conditional expectation of X given Y = yⱼ as

E[X | Y = yⱼ] = Σ_{i≥1} x_i P(X = x_i | Y = yⱼ).

You can analogously define the conditional expectation of Y given X = x as follows.

Definition 6.3 (Continuous case)

If (X, Y) is a bivariate continuous random variable with joint p.d.f. f(x, y),
you can define the conditional expectation of Y given X = x as

E[Y | X = x] = ∫_{−∞}^{∞} y f_{Y|X}(y|x) dy.


Definition 6.4 (Discrete case)

If (X, Y) is a bivariate discrete random variable with joint p.m.f. p(x, y), the
conditional expectation of Y given X = x_i is

E[Y | X = x_i] = Σ_{j≥1} yⱼ P(Y = yⱼ | X = x_i).

You recall that E[X | Y = y] is a value of the random variable E[X | Y] and
E[Y | X = x] is a value of the random variable E[Y | X].

Activity 6.5
Prove that if (X, Y) is a bivariate continuous or discrete random variable, then

(i) E[E(X | Y)] = E[X],

(ii) E[E(Y | X)] = E[Y].

Definition 6.6 (Conditional Variance)

The conditional variance of X given Y = y is defined as

V[X | Y] = E[(X − E[X | Y])² | Y].

Theorem 6.7

For any bivariate r.v. (X, Y), V[X] = E[Var(X | Y)] + Var(E[X | Y]).

Proof

Note that V[X | Y] = E[X² | Y] − (E[X | Y])², and so you have

E[V(X | Y)] = E[E(X² | Y)] − E[(E[X | Y])²] = E[X²] − E[(E[X | Y])²]. ............ (1)

Also, as you know from STAT 204,

V[E(X | Y)] = E[(E[X | Y])²] − (E[E(X | Y)])² = E[(E[X | Y])²] − (E[X])². ............ (2)

Add equations (1) and (2) above to obtain

E[Var(X | Y)] + Var(E[X | Y]) = E[X²] − (E[X])² = V[X].

Example 6.8

Suppose that by any time t the number of people that have arrived at a train station is a
Poisson r.v. with mean λt. If the initial train arrives at the station at a time that is uniformly
distributed over (0, T), what are the mean and variance of the number of passengers that enter
the initial train?

Solution
Let X be the number of people who have arrived by the (random) arrival time t of the train. Then

P(X = x | t) = e^{−λt} (λt)ˣ/x!, with t ~ U(0, T), f(t) = 1/T, 0 < t < T, and E[X | t] = λt.

However, E[X] = E[E(X | t)] = E[λt] = λE[t] = λT/2, since t ~ U(0, T).

Also Var(X) = Var(E(X | t)) + E(Var(X | t)), where Var(X | t) = λt.

Therefore, you have that

Var(X) = Var(λt) + E(λt) = λ²Var(t) + λE(t) = λ²T²/12 + λT/2 = (1/12)λT(λT + 6).
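
A short simulation can support the answer of Example 6.8. The sketch below assumes illustrative values λ = 2 passengers per minute and T = 30 minutes, and compares the sample mean and variance with λT/2 and λT(λT + 6)/12.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, T, trials = 2.0, 30.0, 200_000
t = rng.uniform(0, T, size=trials)       # train arrival times, uniform on (0, T)
x = rng.poisson(lam * t)                 # passengers present when the train arrives

print(x.mean(), lam * T / 2)                       # ~30 vs 30
print(x.var(), lam * T * (lam * T + 6) / 12)       # ~330 vs 330
```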

Example 6.9
Suppose that for a bivariate continuous random variable (X, Y) with joint p.d.f. f(x, y),
x > 0 and y > 0, the probability density function of Y and the conditional p.d.f. of
X given Y = y are respectively

g(y) = e^{−y}, y > 0, and f(x|y) = (1/y) exp(−x/y), x > 0, y > 0. Find Var(X).

Solution
g(y) = e^{−y} ~ Exp(1), and so you have E[Y] = 1, Var(Y) = 1.

Now f(x|y) = (1/y) e^{−x/y} ~ Exp(1/y), and so you have E[X | y] = 1/(1/y) = y and
Var(X | y) = 1/(1/y)² = y². Write these in the expression below; you have that

Var(X) = E[Var(X | Y)] + Var(E[X | Y]) = E[Y²] + Var(Y).

Notice, Var(Y) = E[Y²] − (E[Y])², and so you have

E[Y²] = Var(Y) + (E[Y])² = 1 + 1 = 2.

Therefore, Var(X) = E[Y²] + Var(Y)
= 2 + 1
= 3.


Activity 6.10
The number N of flowers a tree bears in a season is known to have a Poisson distribution with
parameter λ. Each flower has a probability p of bearing a fruit, independently of the
development of any other flower. Let R denote the number of flowers that bear fruit in a
season. Find: (a) E[R], (b) Var(R), and show that R has the Poisson
distribution with parameter λp.

The Mean and Variance of a Random Number of Random Variables

Let X₁, X₂, ... be a sequence of independent and identically distributed random
variables and let N be a non-negative integer-valued random variable that is
independent of the sequence X₁, X₂, ....

Example 6.11

If R = Σ_{i=1}^{N} X_i, find the mean and the variance of R.

Solution

E[R] = E[Σ_{i=1}^{N} X_i] = E[E(Σ_{i=1}^{N} X_i | N)].

Now E[Σ_{i=1}^{N} X_i | N = n] = E[Σ_{i=1}^{n} X_i] = nE[X₁], by the independence of X₁, X₂, ... and
N. This implies

E[Σ_{i=1}^{N} X_i] = E[N E(X₁)] = E[N] E[X₁].

To compute V[Σ_{i=1}^{N} X_i], you condition on N as follows:

V[Σ_{i=1}^{N} X_i | N] = N V[X₁], since given N the r.v. Σ_{i=1}^{N} X_i is the sum of
a fixed number of independent r.v.'s. Hence you apply the conditional variance
formula to arrive at

V[R] = E[N] V[X₁] + (E[X₁])² V[N].
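
The two formulas of Example 6.11 can be checked by simulation. The sketch below is an illustration under assumed choices N ~ Poisson(4) and X_i exponential with mean 2 (so that V[X₁] = (E[X₁])² = 4).

```python
import numpy as np

rng = np.random.default_rng(1)
trials, lam, mean_x = 200_000, 4.0, 2.0
# each replicate: draw N ~ Poisson(lam), then sum N exponential(mean_x) variables
r = np.array([rng.exponential(mean_x, size=rng.poisson(lam)).sum() for _ in range(trials)])

# Theory: E[N] = V[N] = lam, E[X1] = mean_x, V[X1] = mean_x**2
print(r.mean(), lam * mean_x)                             # ~8
print(r.var(), lam * mean_x**2 + mean_x**2 * lam)         # ~32
```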


Example 6.12
The number of accidents occurring in a factory in a week is a random variable with
mean μ and variance σ². The numbers of individuals injured in different accidents are
independently distributed, each with mean ν and variance τ². Show that the mean
and variance of the number of individuals R injured in a week are respectively given
by E[R] = μν and V[R] = ν²σ² + μτ².

Solution
Let N be the number of accidents in a week and let X_i be the number of
individuals injured in accident number i. Then

R = Σ_{i=1}^{N} X_i and E[R] = E(N)E(X₁) = μν,

V[R] = E[N]V[X₁] + (E[X₁])²V[N] = μτ² + ν²σ².

Definition 6.13 ( Scedastic curve)

The curve of the conditional variance function V  X Y  is called the scedastic

curve of X on Y .

Definition 6.13 ( Homoscedastic curve)

The curve of the regression of X on Y i.e. E  X Y  is called homoscedastic if

the conditional variance V  X Y  is a constant.

Example 6.14
The marginal probability density function of X is g(x) = 2(1 − x) for 0 < x < 1,
and

h(y|x) = 1/(1 − x), 0 < y < 1 − x.

(a) Find the regression function of Y given X = x.
(b) Prove that the regression function of Y given X = x
is not homoscedastic.


Solution
The regression function of Y given X = x corresponds to E[Y | X = x].

Now E[Y | X = x] = ∫ y h(y|x) dy = ∫₀^{1−x} y/(1 − x) dy = [y²/2]₀^{1−x}/(1 − x)
= (1 − x)²/[2(1 − x)]
= (1 − x)/2.

Therefore, you have E[Y | x] = (1 − x)/2, 0 < x < 1.

(b) For the regression function of Y given x to be homoscedastic, Var(Y | x) should be a
constant. Now Var(Y | x) = E[Y² | x] − (E[Y | x])², and

E[Y² | x] = ∫ y² h(y|x) dy = ∫₀^{1−x} y²/(1 − x) dy = [y³/3]₀^{1−x}/(1 − x)
= (1 − x)³/[3(1 − x)]
= (1/3)(1 − x)².

Var(Y | x) = (1/3)(1 − x)² − [(1 − x)/2]²
= (1 − x)²(1/3 − 1/4)
= (1/12)(1 − x)²,

which is not a constant. So the regression function of Y given X = x, i.e. E[Y | x], is
not homoscedastic.
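
The conditional mean and variance obtained in Example 6.14 can be confirmed by numerical integration. The sketch below is illustrative only; it evaluates E[Y | x] and Var(Y | x) at a few x values by a simple Riemann sum.

```python
import numpy as np

for x in (0.2, 0.5, 0.8):
    y, dy = np.linspace(0.0, 1.0 - x, 200_000, retstep=True)
    w = dy / (1.0 - x)                     # weight of each grid point under h(y|x) = 1/(1-x)
    ey = np.sum(y * w)                     # numerical E[Y | x]
    ey2 = np.sum(y**2 * w)                 # numerical E[Y^2 | x]
    print(round(ey, 4), (1 - x) / 2, round(ey2 - ey**2, 4), round((1 - x)**2 / 12, 4))
```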

Activity 6.15
A bivariate continuous random variable (X, Y) has a joint p.d.f.

f(x, y) = x + y, 0 < x < 1, 0 < y < 1; 0 otherwise.

Verify the following.

The regression curve of X on Y = y is not linear; it is a decreasing function of y.
The scedastic function of X on Y is not constant, which means the regression
curve of X on Y is not homoscedastic.

Activity 6.16
A bivariate continuous random variable (X, Y) has joint density function f(x, y) for real
values of x and y.
(i) Define Var(Y | X = x).

(ii) Explain what is meant by saying that the regression function of Y given x is
homoscedastic.

(iii) If E[Y | X = x] is the conditional expectation of Y given X = x and g(x) is the
marginal probability density function of X, show that

∫ x E[Y | x] g(x) dx = E[XY].

Activity 6.17
A bivariate continuous random variable (X, Y) has a joint p.d.f. f(x, y) given by

f(x, y) = x + y, 0 < x < 1, 0 < y < 1; 0 elsewhere.

Find
(i) P(Y ≤ 1/2 | X = 1/2)

(ii) E(Y | X = 1/2)

(iii) Var(Y | X = 1/2).

Summary
At the end of this section, we have learnt
 About computation of numerical values from the conditional distribution
 How to interpret numerical values from the conditional distribution.

Assignment 1
(1) (a) Prove that for any discrete bivariate random variable (X, N) for
which the first moments of X and N exist,

E(X) = E[E(X | N)].

(b) The number N of customers entering the University of Ghana book-
shop each day is a random variable. Suppose that each customer has,
independently of other customers, a probability θ of buying at least
one book. Let X denote the number of customers that buy at least
one book each day.
(i) Describe without proof the distribution of X conditional on
N = n. Hence use the result in (a) to evaluate the expectation
of X if N has each of the following distributions:
(ii) P(N = k) = C(M, k) pᵏ(1 − p)^{M−k}, k = 0, 1, 2, ..., M
(iii) P(N = k) = p(1 − p)ᵏ, k = 0, 1, 2, ...
(iv) P(N = k) = e^{−λ} λᵏ/k!, k = 0, 1, 2, ...
(v) P(N = k) = p(1 − p)^{k−1}, k = 1, 2, 3, ...

(c) Find the probability distribution of X if N has the
distributions in (1b) above.

(2) For each fixed λ > 0, let X have a Poisson distribution with parameter
λ. Suppose λ itself is a random variable with the gamma distribution

f(λ) = [1/Γ(n)] λ^{n−1} e^{−λ}, λ > 0; f(λ) = 0, λ ≤ 0,

where n is a fixed positive constant. Show that

P(X = k) = [Γ(k + n)/(Γ(n)Γ(k + 1))] (1/2)^{k+n}, k = 0, 1, 2, ....

Identify this distribution if n is an integer.

Unit Summary
In this Unit you have learnt

• about a sample space as an abstract space;
• about a random variable defined on a probability space;
• how to compute a marginal distribution and a conditional distribution;
• how to find conditional expectations; and
• how to calculate conditional variances.


Unit 2 Binomial and Poisson Process

Introduction
You are welcome back from studying Unit 1. In Unit 2 I review some probability
distributions mentioned in Unit 1 and explain to you how they are derived from the
Bernoulli process and its continuous-time analogue, the Poisson process.
This unit, as in the case of Unit 1, has a lot of interesting applications to real-life
problems. You will need tools such as a pen and jotter for this unit, so make sure
you have them ready to use. I will cover the following topics:

Section 1 Bernoulli Processes

Section 2 Geometric Distribution

Section 3 Negative Binomial Distribution

Section 4 Multinomial Distribution

Section 5 The Poisson Process

Section 6 The Gamma Family of Distributions

Objectives
By the end of this unit, you should be able to

• derive the binomial, the geometric and the negative binomial distributions from the
independent Bernoulli process,
• give examples of a multinomial experiment as an extension of the Bernoulli experiment,
• define the multinomial distribution,
• state and explain the Poisson postulates,
• derive the Poisson process under the postulates,
• derive the gamma distribution from the Poisson process,
• solve problems relating to the Bernoulli and Poisson processes.


Section 1 Bernoulli Processes

Introduction
You are welcomed to Section 1 of this Unit . I will discuss in this Section
distributions which are related to the Bernoulli distribution. Example the binomial,
geometric and the negative binomial distribution.

Objectives
By the end of this Section, you should be able to

 define the bernoulli experiment and the binomial distribution ,


 solve problems involving the binomial model.

Definition 1.1 (Bernoulli Trials)

Independent trials of an experiment, each of which results in a success with
probability p, 0 < p < 1, and a failure with probability 1 − p, are called Bernoulli
trials.

Example 1.2

A repeated toss of a fair die, where success means that an even number shows, is a
Bernoulli experiment and each trial is a Bernoulli trial with success probability p = 1/2.

Definition 1.3

Consider a sequence of n Bernoulli trials and let X be the number of successes


that occur. Then X is said to be a Binomial r.v. with parameters (n, p) .

Notation:

X ~ b(n, p) means X is binomially distributed with parameters n and p.

Theorem 1.4

The probability mass function of a b(n, p) r.v. X is given by:

P(X = k) = C(n, k) pᵏ(1 − p)^{n−k}, k = 0, 1, 2, ..., n.

Activity 1.5

By expanding (p + q)ⁿ and (q − p)ⁿ, where q = 1 − p, using the Binomial theorem, show that
the probability that X is even is

(1/2)[1 + (1 − 2p)ⁿ].
Section 2 The Geometric Distribution

Introduction

Another distribution which can be derived from the Bernoulli random variable is the
Geometric distribution. You recall from Stat 201 the geometric distribution:

Objectives

After completing this Section, you should be able to

 define the geometric distribution


 apply the geometric model to solve problems.

Definition 2.1 (The geometric distribution)

Let X be the number of trials required to accumulate 1 success in independent
Bernoulli trials with a fixed success probability p, 0 < p ≤ 1. Then X has the
geometric distribution with probability mass function given by

P(X = n) = p(1 − p)^{n−1}, n = 1, 2, 3, ....

Example 2.2

A box contains 20 white and 30 black balls. Balls are randomly selected, one at a
time, until a black one is obtained. If we assume that each selected ball is replaced
before the next one is drawn, what is the probability that

(a) exactly n draws are needed;

(b) at least k draws are needed ?

Solution

Let X denote the number of draws needed to select a black ball; then X is a
geometric random variable with parameter p = 30/50 = 3/5. Hence

(a) P(X = n) = (3/5)(2/5)^{n−1}.

(b) P(X ≥ k) = (2/5)^{k−1}.

Example 2.3

In a sequence of independent rolls of a fair die, find the probability that

(i) the first 3 is observed on the fifth trial;

(ii) at least 4 trials are required to observe a 1;

(iii) a 3 will show up before a 4.

Solution
(i) Let X be the number of rolls needed to get the first 3. Then X has a geometric
distribution; since a die has 6 faces, the probability of rolling a particular number
is 1/6. Then X ~ G(1/6) and

P(X = k) = p(1 − p)^{k−1} = (1/6)(5/6)^{k−1}, k = 1, 2, 3, ....

Therefore, the probability that the first 3 is observed on the fifth trial (k = 5) is

P(X = 5) = (1/6)(5/6)^{5−1} = (1/6)(5/6)⁴ ≈ 0.0804.

(ii) "At least 4 trials are required" means that the first 1 appears on the fourth roll or later.
The required probability is

P(X ≥ 4) = Σ_{k=4}^{∞} (1/6)(5/6)^{k−1} = (1/6)(5/6)³/(1 − 5/6) = (5/6)³ ≈ 0.5787,

using the sum to infinity of a geometric series, S_∞ = a/(1 − r), |r| < 1.

(iii) Let A_k be the event that a 3 shows up on the k-th trial (k = 1, 2, ...) for the
first time and neither a 3 nor a 4 shows up in the previous (k − 1) trials. Then

P(A_k) = (1/6)(4/6)^{k−1},

since in the first (k − 1) trials neither 3 nor 4 appears (only 1, 2, 5 and 6 appear), and the
probability of neither a 3 nor a 4 occurring on a single roll is 4/6. The required probability
is given by

Σ_{k=1}^{∞} P(A_k) = Σ_{k=1}^{∞} (1/6)(4/6)^{k−1} = (1/6)/(1 − 4/6) = (1/6)/(1/3) = 1/2.
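
The three answers of Example 2.3 can be verified by simulation. The following sketch (with an assumed 200,000 replications) is illustrative only.

```python
import random

trials = 200_000

# (i) first 3 appears on the fifth roll
count = sum(1 for _ in range(trials)
            if [random.randint(1, 6) for _ in range(4)].count(3) == 0
            and random.randint(1, 6) == 3)
print(count / trials, (1/6) * (5/6)**4)          # ~0.0804

# (ii) at least 4 rolls needed to see the first 1 (no 1 in the first 3 rolls)
count = sum(1 for _ in range(trials)
            if 1 not in [random.randint(1, 6) for _ in range(3)])
print(count / trials, (5/6)**3)                  # ~0.5787

# (iii) a 3 appears before a 4
def three_first():
    while True:
        r = random.randint(1, 6)
        if r in (3, 4):
            return r == 3

count = sum(three_first() for _ in range(trials))
print(count / trials, 0.5)
```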
Summary
At the end of this Section, you have learnt,
 about the geometric distribution
 how to use the geometric model to solve problems.


Section 3 Negative Binomial Distribution

Introduction
The geometric random variable also has some relationship with the Bernoulli random
variable. To begin recall from Stat 201 the negative binomial distribution .

Objectives
At the end of the Section, you should be able to

 define the negative binomial distribution,


 solve problems relating to the negative binomial distribution model.
 find the relationship between the binomial and the negative binomial
distribution.

Definition 3.1 (The negative binomial distribution)

Let X be the number of trials required to accumulate k successes in independent
Bernoulli trials with a fixed success probability p, 0 < p ≤ 1. Then X has the
negative binomial distribution with probability mass function given by

P(X = n) = C(n − 1, k − 1) pᵏ(1 − p)^{n−k}, n = k, k + 1, ....

Note that X can be expressed as the sum of k independent geometric random variables
Z₁, ..., Z_k, each with parameter p. Thus X = Z₁ + ... + Z_k.

Hence you have that E(X) = E[Z₁ + ... + Z_k] = E[Z₁] + ... + E[Z_k] = k/p and

V(X) = V[Z₁ + ... + Z_k] = V[Z₁] + ... + V[Z_k] = k(1 − p)/p².

Theorem 3.2

Let X have a binomial distribution with parameters n and p. Let Y have a
negative binomial or Pascal distribution with parameters k and p. Then the
following relationships hold:

(a) P(Y ≤ n) = P(X ≥ k);

(b) P(Y > n) = P(X < k).

Proof

(a) If there are k or more successes in the first n trials, then you require
n or fewer trials to obtain the first k successes.

(b) If there are fewer than k successes in the first n trials, then you require
more than n trials to obtain k successes.

Example 3.3

Find the probability that more than 10 repetitions are necessary to obtain the third
success when p = P(success) = 0.2.

Solution

Within the context of the notation of the preceding theorem,

P(Y > 10) = P(X < 3) = Σ_{k=0}^{2} C(10, k)(0.2)ᵏ(0.8)^{10−k}.

Example 3.4
An archer misses a target 6% of the time. Find the probability that he will miss the target for
the 2nd time on the 10th shot.

Solution
Let p = probability that he misses the target (a "failure") = 0.06, let X = the number of
shots required to get the r-th miss, and let r = number of misses = 2. Then X has a
negative binomial distribution with parameters r = 2 and p = 0.06:

P(X = k) = C(k − 1, r − 1) pʳ(1 − p)^{k−r},

P(X = 10) = C(9, 1)(0.06)²(0.94)⁸ = 9 × 0.0021944 ≈ 0.01975.

Example 3.5
A coin is twice as likely to turn up heads as tails. In a sequence of independent tosses of
the coin, what is the probability that the 3rd head occurs on the sixth toss?

Solution
Since there are only 2 outcomes in the tossing of a coin, P(H) + P(T) = 1, but P(H) = 2P(T), so

2P(T) + P(T) = 1, 3P(T) = 1, P(T) = 1/3.

Probability of a tail: P(T) = 1/3. Probability of a head: P(H) = 2P(T) = 2/3.

Let p = probability of a head = 2/3, q = 1/3, k = number of tosses = 6, and
r = number of heads = 3. Then

P(X = k) = C(k − 1, r − 1) pʳ q^{k−r},

P(X = 6) = C(5, 2)(2/3)³(1/3)³ = 10 × (8/27) × (1/27) = 80/729 ≈ 0.1097.
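
The negative binomial calculations in Examples 3.4 and 3.5 can be reproduced directly from the p.m.f. of Definition 3.1; a minimal sketch follows.

```python
from math import comb

def negbin_pmf(k, r, p):
    """P(X = k): the k-th trial yields the r-th success, success probability p."""
    return comb(k - 1, r - 1) * p**r * (1 - p)**(k - r)

print(negbin_pmf(10, 2, 0.06))   # Example 3.4: ~0.01975
print(negbin_pmf(6, 3, 2/3))     # Example 3.5: 80/729 ~ 0.1097
```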

Activity 3.5
Find the expected value and the variance of the number of times one must throw a
die until the outcome 1 has occurred 4 times.

Activity 3.6
Consider a sequence of independent rolls of a fair die. Find the probability that
(a) the first 2 will show up on the 6th roll;
(b) a 2 will show up before a 4;
(c) a 2 will show up before a 4 or a 5.

Activity 3.7
Suppose we need some good batteries and have a pile of old batteries that is a mixture of 6
good ones and 4 bad ones. Find, to 4 decimal places, the probability that it takes
(a) 8 random tests to locate 3 good batteries;
(b) 8 random tests to locate 5 good batteries.
Summary
At the end of this section, you have learnt,
 about the negative binomial distribution,
 how to solve problems relating to the negative binomial distribution model.
 about the relationship between the binomial and the negative binomial
distribution.


Section 4 The Multinomial Distribution

Introduction
It arises out of a generalization of the binomial distribution by considering a
random experiment with a finite number of more than two outcomes. Consider an
experiment ε, its sample space Ω, and a partition of Ω into k mutually
exclusive events A₁, A₂, ..., A_k, so that one and only one of the events A_i occurs when
the experiment is performed.

Consider n independent repetitions of ε. Let p_i = P(A_i) and suppose p_i remains
constant during all repetitions of ε. Clearly, you have Σ_{i=1}^{k} p_i = 1. Define the r.v.'s
X₁, ..., X_k as follows: X_i is the number of times A_i occurs among the n
repetitions of ε, i = 1, 2, 3, ..., k.

Note that the X_i's are not independent r.v.'s since Σ_{i=1}^{k} X_i = n, so that as soon as
the values of any (k − 1) of these r.v.'s are known, the value of the remaining one is
determined.

Objectives
At the end of the Section, you should be able to

 define the multinomial distribution as generalization of the binomial


distribution,
 solve problems based on the multinomial model.

Theorem 4.1

If X_i, i = 1, 2, 3, ..., k are as defined above, the result is the multinomial distribution:

P(X₁ = n₁, ..., X_k = n_k) = [n!/(n₁! n₂! ··· n_k!)] p₁^{n₁} p₂^{n₂} ··· p_k^{n_k}, where Σ_{i=1}^{k} n_i = n.

Theorem 4.2

Suppose that (X₁, ..., X_k) has a multinomial distribution as defined above. Then

E(X_i) = np_i and Var(X_i) = np_i(1 − p_i), where i = 1, 2, 3, ..., k.


Theorem 4.3

Suppose that (X₁, ..., X_k) has a multinomial distribution as defined above. Then

Cov(X_i, X_j) = −np_i p_j, and the correlation coefficient between X_i and X_j, i ≠ j, is

ρ_{i,j} = −[p_i p_j/((1 − p_i)(1 − p_j))]^{1/2}.

Proof of Theorems 4.2 and 4.3

It can easily be shown that the m.g.f. of the multinomial r.v.'s (X₁, ..., X_k) is
given by

M_{X₁,...,X_k}(t₁, ..., t_k) = (p₁e^{t₁} + p₂e^{t₂} + ... + p_{k−1}e^{t_{k−1}} + p_k)ⁿ, for all t₁, ..., t_{k−1} ∈ R (with t_k = 0).

If M, the m.g.f. of a multinomial (X₁, ..., X_k), is differentiated once with respect to t_i,

(∂/∂t_i) M_{X₁,...,X_k}(t₁, ..., t_k) = np_i e^{t_i}(p₁e^{t₁} + ... + p_{k−1}e^{t_{k−1}} + p_k)^{n−1},

and differentiating another time with respect to t_j (j ≠ i),

(∂²/∂t_j ∂t_i) M_{X₁,...,X_k}(t₁, ..., t_k) = n(n − 1) p_i e^{t_i} p_j e^{t_j}(p₁e^{t₁} + ... + p_{k−1}e^{t_{k−1}} + p_k)^{n−2}.

Note that E(X_i) = (∂/∂t_i) M_{X₁,...,X_k}(t₁, ..., t_k) evaluated at t₁ = 0, ..., t_{k−1} = 0, which gives np_i.

Also Cov(X_i, X_j) = E[X_i X_j] − E[X_i]E[X_j] and

E(X_i X_j) = (∂²/∂t_j ∂t_i) M_{X₁,...,X_k}(t₁, ..., t_k) evaluated at t₁ = 0, ..., t_{k−1} = 0, which gives n(n − 1) p_i p_j.

Therefore Cov(X_i, X_j) = n(n − 1) p_i p_j − n² p_i p_j = −np_i p_j, and the correlation coefficient
between X_i and X_j is given by

ρ_{i,j} = Cov(X_i, X_j)/√(Var(X_i)Var(X_j)) = −np_i p_j/√(np_i(1 − p_i) · np_j(1 − p_j))

= −[p_i p_j/((1 − p_i)(1 − p_j))]^{1/2}.

Example 4.4

A fair die is rolled 9 times. What is the probability that 1 appears three times,
2 and 3 twice each, 4 and 5 once each, and 6 not at all?

Solution

The required probability is given by

P(X₁ = 3, X₂ = 2, X₃ = 2, X₄ = 1, X₅ = 1, X₆ = 0)
= [9!/(3! 2! 2! 1! 1! 0!)] (1/6)³(1/6)²(1/6)²(1/6)¹(1/6)¹(1/6)⁰
= [9!/(3! 2! 2!)](1/6)⁹.
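
The multinomial probability of Example 4.4 can be evaluated directly from Theorem 4.1; the helper function below is a sketch written for this illustration.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """P(X1 = n1, ..., Xk = nk) = n!/(n1!...nk!) * p1^n1 ... pk^nk."""
    n = sum(counts)
    coef = factorial(n)
    for c in counts:
        coef //= factorial(c)
    return coef * prod(p**c for p, c in zip(probs, counts))

counts = (3, 2, 2, 1, 1, 0)            # faces 1..6 in Example 4.4
probs = (1/6,) * 6
print(multinomial_pmf(counts, probs))  # 9!/(3!2!2!1!1!0!) * (1/6)^9 ~ 0.0015
```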

Theorem 4.5

If (X₁, ..., X_k) has a multinomial distribution as defined above, then for any 1 ≤ r < k,

P(X₁ + X₂ + ... + X_r = m) = C(n, m)(p₁ + p₂ + ... + p_r)^m (p_{r+1} + ... + p_k)^{n−m}, m = 0, 1, ..., n.

Activity 4.5

On a given day some customers enter an electronics shop. Suppose that 3 of the
customers are window shopping, 5 will buy a colour television and 2 will buy an
ordinary black and white television. Find the probability that if 5 customers enter
the shop, the shop owner will sell at least 2 television sets.

Summary

In this Section you have learnt,

 about the multinomial distribution as generalization of the binomial


distribution,
 how to solve problems based on the multinomial model.


Section 5 The Poisson Process

Introduction
The Poisson process, which is the continuous-time analogue of the Bernoulli process,
is defined in terms of a probabilistic description of the behaviour of arrivals at points
on a continuous line.

Objectives
At the end of this Section, you should be able to
 state and explain the postulates underlying the poisson distribution,
 Derive from the postulates the poisson distribution,
 solve some examples of problems involving the poisson distribution.

The Poisson Postulates

Postulate 1: p₁(h) = λh + o(h) for small h, where λ is called the intensity parameter or the
Poisson rate.
Meaning: the probability of exactly one event in a small interval of time is approximately
proportional to the width of the interval (assumed to be h).

Postulate 2: Σ_{n>1} pₙ(h) = o(h), for small h.
Meaning: the probability of more than one event in a small interval of time whose width is
h is negligible.

Postulate 3: the numbers of events in non-overlapping intervals of time are stochastically
independent and have the same distribution for intervals of the same length.
Meaning: the numbers of events in non-overlapping intervals of time are stochastically
independent.

Postulate 4: p₀(0) = 1.
Meaning: the probability of no event in time 0 is 1.


Examples

(i) λ = 5 radioactive particles per second.

(ii) λ = 4 tankers arriving per day.

(iii) λ = 3 taxis of a special type passing through a junction per minute.

Definition 5.1 (Poisson Process)

A process is said to be a Poisson process if the following postulates hold:

(a) The occurrence of an event in any given interval is independent of any
other interval, i.e. the occurrence of an event in an interval of time (or space)
has no effect on the probability of a second occurrence of the event in the
same or another interval.

(b) The probability of a single occurrence of an event in a given interval is
proportional to the length of the interval. This is referred to as stationarity.

(c) In any infinitesimally small portion of the interval, the probability of more
than one occurrence of the event is negligible.

Theorem 5.2

Under the Poisson postulates, the number of events occurring in any interval of
length t is a Poisson random variable with parameter λt, i.e. if X(t) is the
number of events occurring in the interval (0, t], then

P(X(t) = n) = e^{−λt}(λt)ⁿ/n!.

Proof of Theorem 5.2

pₙ(t + h) = P[X(t + h) = n]
= P[n events in the interval (0, t] and 0 events in the time interval (t, t + h]]
+ P[n − 1 events in the interval (0, t] and 1 event in the time interval (t, t + h]] + o(h)
= P[n events in the interval (0, t]] · P[0 events in the interval (t, t + h]]
+ P[n − 1 events in the interval (0, t]] · P[1 event in the interval (t, t + h]] + o(h),

since the numbers of events in non-overlapping intervals of time are
stochastically independent.

Therefore, you have that

pₙ(t + h) = pₙ(t)[1 − λh + o(h)] + p_{n−1}(t)[λh + o(h)] + o(h), n > 0.

This implies

(1/h)[pₙ(t + h) − pₙ(t)] = −λpₙ(t) + λp_{n−1}(t) + o(h)/h,

and so, letting h → 0 (the definition of the derivative), you have

lim_{h→0} (1/h)[pₙ(t + h) − pₙ(t)] = −λpₙ(t) + λp_{n−1}(t).

Hence

(d/dt) p₀(t) = −λp₀(t), since p_{−1}(t) = 0, ............... (1)

and (d/dt) pₙ(t) = −λpₙ(t) + λp_{n−1}(t), n > 0. ............... (2)

Multiply both sides of equations (1) and (2) by e^{λt} and write

qₙ(t) = e^{λt} pₙ(t), ............... (3)

so you have that

(d/dt) qₙ(t) = e^{λt}[(d/dt) pₙ(t) + λpₙ(t)]. ............... (4)

Then you have

e^{λt}[(d/dt) p₀(t) + λp₀(t)] = 0, ............... (5a)

e^{λt}[(d/dt) pₙ(t) + λpₙ(t)] = λe^{λt} p_{n−1}(t). ............... (5b)

Therefore you get

(d/dt) q₀(t) = 0, ............... (6a)

(d/dt) qₙ(t) = λq_{n−1}(t), ............... (6b)

with the boundary conditions

q₀(0) = p₀(0) = 1, ............... (7a)

qₙ(0) = pₙ(0) = 0, n > 0. ............... (7b)

Solving equations (6a) and (6b) you can write

q₀′(t) = 0 ⟹ q₀(t) = c, and equation (7a) gives c = 1;

q₁′(t) = λ ⟹ q₁(t) = λt + c, and equation (7b) gives c = 0;

q₂′(t) = λ²t ⟹ q₂(t) = (λt)²/2 + c, and equation (7b) gives c = 0;

qₙ′(t) = λⁿtⁿ⁻¹/(n − 1)! ⟹ qₙ(t) = (λt)ⁿ/n! + c, and equation (7b) gives c = 0.

Thus, in general, we have

qₙ(t) = e^{λt} pₙ(t) = (λt)ⁿ/n!,

and hence pₙ(t) = [(λt)ⁿ/n!] e^{−λt}, n = 0, 1, 2, ....

Example 5.3

Consider a Poisson process {X(t) : t ≥ 0} where E[X(t)] = λt. Let T₁ denote the
time of the first event of the process. Show that for s ≤ t, P(T₁ < s | X(t) = 1) = s/t.

Solution

P(T₁ < s | X(t) = 1) = P(T₁ < s, X(t) = 1)/P(X(t) = 1)

= P(1 event in (0, s); 0 events in [s, t))/P(X(t) = 1)

= P(1 event in (0, s)) · P(0 events in [s, t))/P(X(t) = 1)

= [λs e^{−λs} · e^{−λ(t−s)}]/[λt e^{−λt}]

= s/t.

Example 5.4

The number of planes arriving per day at a small private airport is a random variable having a
Poisson distribution with parameter λ = 28. What is the probability that the time between
two such arrivals is at least one hour?

Solution
From the Poisson process, the time between two successive arrivals has an exponential
distribution. Let X be the time, in days, between two successive arrivals; then X ~ Exp(28).
One hour is 1/24 of a day, so

P(X ≥ 1/24) = ∫_{1/24}^∞ 28 e^{−28x} dx

= 28 [−(1/28) e^{−28x}]_{1/24}^∞

= e^{−28/24} ≈ 0.311.

Example 5.5
Customers arrive at a bank at a Poisson rate λ. Suppose two customers arrived during the
first hour. What is the probability that they both arrived during the first 20 minutes?

Solution

Measure time in minutes, so the first hour is (0, 60] and the first 20 minutes is (0, 20].
Both customers arrived in the first 20 minutes exactly when X(20) = 2 and no arrivals occur
in (20, 60]. By the independence of non-overlapping intervals,

P(both in first 20 min | X(60) = 2) = P(X(20) = 2) P(no arrivals in (20, 60])/P(X(60) = 2)

= [e^{−20λ}(20λ)²/2!] · e^{−40λ} / [e^{−60λ}(60λ)²/2!]

= (20/60)² = 1/9 ≈ 0.111,

which does not depend on the rate λ.
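
The conditional argument in Example 5.5 can be checked by simulating the Poisson process from exponential inter-arrival times. The sketch below assumes an arbitrary rate (λ = 4 per hour) since the answer does not depend on λ.

```python
import numpy as np

rng = np.random.default_rng(2)
lam, trials = 4.0, 300_000        # arrivals per hour (illustrative value)
both_early = 0
exactly_two = 0
for _ in range(trials):
    # build arrival times in (0, 60) minutes from exponential inter-arrival times
    t, times = 0.0, []
    while True:
        t += rng.exponential(60.0 / lam)
        if t > 60.0:
            break
        times.append(t)
    if len(times) == 2:
        exactly_two += 1
        if times[1] < 20.0:       # both arrivals fall in the first 20 minutes
            both_early += 1

print(both_early / exactly_two, 1 / 9)
```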

The waiting time distribution

Let T₁ denote the time from t = 0 to the 1st event. Then you have

P(T₁ > t) = P(no event in (0, t])

= e^{−λt}(λt)⁰/0!

= e^{−λt}.

Therefore, the c.d.f. of T₁ is

F(t) = P(T₁ ≤ t) = 1 − P(T₁ > t) = 1 − e^{−λt}.

Differentiate with respect to t to obtain the p.d.f. of T₁ as

f(t) = λe^{−λt}.

The waiting time to the nth event Tₙ in a Poisson process

Let Tₙ denote the time at which the nth event occurs. Then Tₙ is less than or equal to
t iff the number of events that have occurred by time t is at least n, i.e. with
X(t) = number of events in (0, t],

P(Tₙ ≤ t) = P(X(t) ≥ n) = Σ_{k=n}^{∞} P(X(t) = k) = Σ_{k=n}^{∞} e^{−λt}(λt)ᵏ/k!.

Differentiating with respect to t, you have

fₙ(t) = (d/dt) P(Tₙ ≤ t) = Σ_{k=n}^{∞} λe^{−λt}[(λt)^{k−1}/(k − 1)! − (λt)ᵏ/k!]

= λe^{−λt}(λt)^{n−1}/(n − 1)!

= λⁿ t^{n−1} e^{−λt}/Γ(n), t > 0, n > 0.
Example 5.6
An electronic device fails when a cumulative effect of k shocks occurs. If the shocks happen
according to a Poisson process with parameter  , find the p.d.f of the life time T of the
device.

Solution
According to the Poisson process, the time between two successive shocks is exponentially
distributed with parameter λ. Let X_i be the time between the (i − 1)th and ith shocks; then
X_i ~ Exp(λ). The device fails when the kth shock occurs, so if T is the lifetime of the
device,

T = Σ_{i=1}^{k} X_i,

which is the sum of k independent exponential r.v.'s each with mean 1/λ.

Hence, according to the Poisson process,

T ~ Γ(k, λ),

and the p.d.f of T is

f_T(t) = λ^k t^{k−1} e^{−λt}/Γ(k) for t > 0, k > 0, and 0 elsewhere.

Example 5.7
The times between successive arrivals at a service counter are exponentially distributed with
mean 12 minutes. If successive inter-arrival times are independent, what is the probability of
10 or fewer arrivals in one hour?
Solution
The number of arrivals in an hour follows a Poisson distribution with mean

λt = 60/12 = 5.

Therefore, P(10 or fewer arrivals in 1 hour) = Σ_{k=0}^{10} e^{−5} 5^k/k!

= e^{−5}(1 + 5 + 5²/2! + ⋯ + 5¹⁰/10!) ≈ 0.986.
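You can confirm this value with a two-line computation (Python, standard library only; shown purely as a check):

from math import exp, factorial
p = sum(exp(-5) * 5 ** k / factorial(k) for k in range(11))
print(round(p, 3))   # ≈ 0.986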

Activity 5.8
For a Poisson process show that for s < t,

P[X(s) = k | X(t) = n] = C(n, k) (s/t)^k (1 − s/t)^{n−k},  k = 0, 1, 2, …, n.

Activity 5.9
(1) Show that the waiting time T_k for the kth event in a Poisson process with rate λ
    has the Γ(k, λ) distribution.

(2) (a) State the postulates of the Poisson process.

    (b) The number of power surges in an electric grid has a Poisson distribution with
        mean 6 every 24 hours. Let X be the time between consecutive power surges.
        Find P(X ≥ 1).

Summary
In this Section, you have learnt
 about the postulates underlying the Poisson distribution,
 to derive from the postulates the Poisson distribution,
 to solve some examples of problems involving the Poisson distribution.


Section 6 The Gamma family of distributions

Introduction
Welcome to Section 6 of this Unit. Here, I will discuss with you one of the distributions most
commonly used for modelling claims in insurance. The Exponential distribution and
the Chi-square distribution are special cases of this distribution.

Objectives
By the end of this Section , you should be able to
 define the gamma family of distributions,
 find the relationship between the poisson and the gamma distribution.
 apply the gamma model to solve real life problems.

Definition 6.1 (Gamma Distribution)

A continuous r.v. X is said to have a Gamma distribution with parameters α and β
if its p.d.f is given as

f(x) = x^{α−1} e^{−x/β} / (Γ(α) β^α) for 0 < x < ∞, and f(x) = 0 for x ≤ 0,

where α, β > 0.

Notation
By f(x) ~ Γ(α, 1/β) you denote the gamma distribution with parameters α and β (shape α and scale β, i.e. rate 1/β).

Comments:
(i) The family of Gamma distributions is a two parameter family. The parameter
 may be called the shape parameter i.e. its value determines the shape
of the graph of the density. The parameter  is called the scale
parameter.


(ii) If β = 1/b, the gamma p.d.f becomes

     f(x) = b^α x^{α−1} e^{−bx}/Γ(α) for 0 < x < ∞, and 0 for x ≤ 0.

(iii) If α = 1, the Gamma p.d.f in (i) becomes the exponential p.d.f

     f(x) = β^{−1} e^{−x/β} for 0 < x < ∞, and 0 for x ≤ 0.

(iv) If α = n, a positive integer, Γ(n, 1/β) may be seen as the distribution of
     the sum of n independent exponential r.v.'s each with p.d.f

     f(x) = β^{−1} e^{−x/β} for 0 < x < ∞, and 0 for x ≤ 0.

Example 6.2
If X has the Gamma distribution f(x) ~ Γ(α, 1/β), show that the moment
generating function (m.g.f) of X is

M_X(t) = (1 − βt)^{−α},  t < 1/β,

and use the m.g.f to derive the results E(X) = αβ and V(X) = αβ².

Hence, show that if X_1, …, X_n are independent r.v.'s such that X_j ~ Γ(α_j, 1/β),
where j = 1, 2, 3, …, n, then S_n = Σ_i X_i ~ Γ(Σ_i α_i, 1/β).

Deduce that if X_1, …, X_n are iid, each with an exponential distribution with mean β,
then S_n = Σ_i X_i ~ Γ(n, 1/β).

Hence show that if X ~ Γ(n, 1/β), then the c.d.f of X, F(x), is given by

F(x) = P(X ≤ x) = 1 − Σ_{k=0}^{n−1} P(W = k),

where W has a Poisson distribution with parameter x/β, and explain the
significance of this result.
Solution (partial)

M_X(t) = E(e^{tX}) = (1/(Γ(α)β^α)) ∫_0^∞ e^{tx} x^{α−1} e^{−x/β} dx.

If you let y = x(1/β − t), which is valid for t < 1/β, this becomes

M_X(t) = (1/(1 − βt)^α) ∫_0^∞ y^{α−1} e^{−y}/Γ(α) dy = (1 − βt)^{−α},

since the Gamma density is legitimate (it integrates to 1). Note that for any positive n,

(i) we can compute E(X^n) directly from f(x) as follows:

E(X^n) = (1/(Γ(α)β^α)) ∫_0^∞ x^{α+n−1} e^{−x/β} dx

       = (β^n Γ(α + n)/Γ(α)) · [ (1/(Γ(α + n)β^{α+n})) ∫_0^∞ x^{α+n−1} e^{−x/β} dx ]

       = β^n Γ(α + n)/Γ(α)

       = β^n (α + n − 1)(α + n − 2) ⋯ α.
(ii) The case α = 1 leads to the exponential distribution with mean β, in which case
X ~ Γ(1, 1/β), which yields E(X^n) = n! β^n. So

E(X) = β,  V(X) = β²,  with M_X(t) = (1 − βt)^{−1} for t < 1/β.

(iii) Another special case of importance is α = n/2, where n > 0 is an integer, and
λ = 1/β = 1/2, which leads to the chi-square distribution with n degrees of freedom,
i.e. Γ(n/2, 1/2) is the same as χ²_n, so the p.d.f becomes

f(x) = 2^{−n/2} x^{n/2 − 1} e^{−x/2}/Γ(n/2) for 0 < x < ∞, and 0 for x ≤ 0.

Thus, if X ~ Γ(n/2, 1/2) then E(X) = n, V(X) = 2n and

E(X^k) = 2^k Γ(n/2 + k)/Γ(n/2) = n(n + 2)(n + 4) ⋯ (n + 2k − 2), k a positive integer.

Also you have that the moment generating function of X is

M_X(t) = (1 − 2t)^{−n/2},  t < 1/2.
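A quick numerical check of these moments is sketched below (Python with numpy assumed; the parameter values are arbitrary illustrative choices):

import numpy as np
rng = np.random.default_rng(3)
alpha, beta = 2.5, 1.8
x = rng.gamma(shape=alpha, scale=beta, size=1_000_000)
print(x.mean(), alpha * beta)            # E(X) = alpha*beta
print(x.var(), alpha * beta ** 2)        # V(X) = alpha*beta^2

n = 5                                    # chi-square on n d.f. is Gamma(n/2, scale 2)
c = rng.gamma(shape=n / 2, scale=2.0, size=1_000_000)
print(c.mean(), n, c.var(), 2 * n)       # E = n, V = 2n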
Example 6.3
Some strains of paramecia produce and secrete “killer” particles that will cause the
death of a sensitive individual if contact is made. All paramecia unable to produce
killer particles are sensitive. The mean number of killer particles emitted by a killer
paramecium is 1 every 5 hours. In observing such a paramecium, what is the
probability that you must wait at most 4 hours before the first particle is emitted?
Solution
Taking the unit of measurement to be one hour, you have a Poisson process with
rate λ = 1/5. So W, the time at which the first killer particle is emitted, has an
exponential (i.e. Γ(1, 1/5)) distribution. Therefore, the density of W is given by

f(w) = (1/5) e^{−w/5},  w > 0,

and P(W ≤ 4) = ∫_0^4 (1/5) e^{−w/5} dw = 1 − e^{−4/5} ≈ 0.5507.

Also, the average time that you must wait until the first killer particle is emitted is

E(W) = 1/λ = 5 hours.


Summary
In this Section, you have learnt,
 about the gamma family of distributions,
 about the relationship between the poisson and the gamma distribution.
 to apply the Gamma model to solve real life problems.

Assignment 2
Events occur independently of each other at a steady mean rate of  per unit time
in such a way that if pn (t ) denotes the probability that n events in the interval (0, t ]
, then pn (t ) satisfies the following conditions:

(A1) For small h, p_1(h) = λh + o(h);

(A2) For small h, Σ_{n≥2} p_n(h) = o(h);

(A3) The numbers of events in non-overlapping intervals of time are
     stochastically independent, and the distribution of the number of events in an
     interval depends only on the length of the interval;

(A4) p_0(0) = 1.

Prove that

p_n(t) = (λt)ⁿ e^{−λt}/n!,  n = 0, 1, 2, …

Deduce that

(a) If T_n denotes the occurrence time of the nth event, then T_n has a
    gamma distribution with parameters n and λ;
(b) In any Poisson process {N(t), t ≥ 0}, for s < t,

    (i) P[T_1 < s | N(t) = 1] = s/t;

    (ii) P[N(s) = k | N(t) = n] = C(n, k) (s/t)^k (1 − s/t)^{n−k};

    (iii) If {N_1(t), t ≥ 0} and {N_2(t), t ≥ 0} are independent Poisson
          processes, with respective rates or intensities λ_1 and λ_2, then for 0 ≤ k ≤ n,

          P[N_1(t) = k | N_1(t) + N_2(t) = n] = C(n, k) p^k q^{n−k},

          where p = 1/(1 + λ_2/λ_1) = λ_1/(λ_1 + λ_2) and q = 1 − p.

Unit Summary
You have learnt,
• about distributions related to the Bernoulli process,
• about the multinomial distribution,
• about distributions related to the Poisson process,
• about the Poisson postulates,
• to derive the Poisson distribution from the postulates,
• about the gamma family of distributions.


Unit 3 Functions of Random variables

Introduction
Welcome to Unit 3, Functions of Random Variables. In Units 1 and 2 we studied
random variables, their distributions and the numerical features of these distributions. In
Unit 3 I will look at random variables which are functions of other random variables. The
discussion will cover the following topics:

Section 1 Functions of the discrete Random Variables

Section 2 Functions of the continuous Random Variables

Section 3 Distribution of the sum of two Random Variables

Section 4 Distribution of the product of two random variables

Section 5 The Beta family of distributions

Section 6 The Bivariate Normal distribution

Objectives

After completing this Section, you should be able

 to find the distribution of functions of random variables


 to describe the beta family of distributions
 to describe the bivariate normal density function.


Section 1 Functions of the discrete Random Variables


Introduction
My valued students, welcome back from recess. In this Section I introduce you to
probability distributions of functions of discrete random variables.

Objectives
After completing this Section, you should be able to
 find the distribution of functions of discrete random variables,
 calculate expectations , variance and other numerical features of functions of
discrete random variables.

Definition 1.1 ( Case 1)


If X is a discrete random variable and Y  H ( X ) , then it follows immediately that
Y is also a discrete r.v. Let xi ,1 , xi ,2 ,..., xi ,k represent the X values having the

property H ( xi , j )  yi for all j . Then

q( yi )  P Y  yi   p( xi ,1 )  p( xi ,2 )  .... where p( xi , j )  P  X  xi , j  .

Example 1.2
A discrete r.v. X has p.m.f P(X = n) = 1/2ⁿ, n = 1, 2, 3, … Let Y = 1 if X is even, and
Y = −1 if X is odd.

Find the p.m.f of Y.

Solution
Y assumes the two values 1 and −1. Then Y = 1 ⟺ X = 2 or X = 4 or X = 6 or …

Hence, P(Y = 1) = Σ_{k=1}^{∞} 1/2^{2k} = (1/4)/(1 − 1/4) = 1/3, and
P(Y = −1) = 1 − P(Y = 1) = 1 − 1/3 = 2/3.


Activity 1.3
A discrete random variable has probability mass function
P(X = k) = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, 2, 3, …, n. Let

Y = 1 if X is even, and Y = 0 if X is odd.

Find the expected value of the random variable Y .

Definition 1.4
If X is a continuous random variable and Y = H(X), then Y need not be continuous:
if the function H takes only a discrete set of values, then Y is a discrete random variable.

Example 1.5
Show that for any non-negative r.v. X and real number a  0 ,
P  X  a   E( X ) / a.

Solution
Define the discrete random variable Y  H ( X ) as

Y = a if X ≥ a, and Y = 0 if X < a.

Note that Y ≤ X by definition and so

E(X) ≥ E(Y) = a·P(Y = a) + 0·P(Y = 0) = a·P(X ≥ a).

Hence, P(X ≥ a) ≤ E(X)/a.
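A short numerical illustration of this bound (Markov's inequality) follows; numpy is assumed and the exponential distribution with mean 2 is used purely as an example:

import numpy as np
rng = np.random.default_rng(4)
x = rng.exponential(2.0, size=1_000_000)          # a non-negative r.v.
for a in (1.0, 2.0, 5.0):
    print(a, np.mean(x >= a), x.mean() / a)       # empirical P(X >= a) vs the bound E(X)/a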

Summary
At the end of this Section, you have learnt
 to calculate the distribution of functions of discrete random variables,
 to calculate expectations , variance and other numerical features of
functions of discrete random variables.


Section 2 Functions of Continuous Random Variables

Introduction
The most important and most frequently encountered case arises when X is a
continuous random variable with p.d.f, say f, and H is a continuous function, so
that Y = H(X) is a continuous random variable. In this case, you obtain the p.d.f, say
g(y), of Y = H(X), in general as follows:

Procedure
(i) Obtain G , the c.d.f of Y , where G( y)  P Y  y  , by finding the event

( in the range space of X ) which is equivalent to the event Y  y .


(ii) Differentiate G( y ) with respect to y in order to obtain g ( y)  G ' ( y) .
(iii) Determine those values of y in the range space of Y for which
g ( y)  0.

Objectives
At the end of this Section, you should be able to
 find the distribution of functions of continuous random variables,
 calculate expectation and variance of functions of continuous random
variables.

Example 2.1
Suppose that X is a random variable with probability density function given by
2 x , if 0  x  1

f ( x)  
0 ,
 otherwise.

Find the probability density function of the random variable Y  H ( X )  e X .


Solution
Let G( y ) be the c.d.f of Y  H ( X ) . Then

G(y) = P(Y ≤ y) = P(e^{−X} ≤ y) = P(X ≥ −ln y) = 1 − P(X ≤ −ln y).

This implies G(y) = 1 − F(−ln y), where F is the c.d.f of X.

Hence the p.d.f of Y is g(y) = (d/dy) G(y) = f(−ln y)·(1/y) = −(2/y) ln y,  1/e < y < 1,

since 0 < x < 1 ⟺ 1/e < y < 1 is the region where g(y) > 0.

Activity 2.2
A random variable Θ is uniformly distributed in (−π/2, π/2). Find the p.d.f of the
random variable Y = tan Θ.

Theorem 2.3
Let X be a continuous random variable with probability density function f ( x) .

Let Y  X 2 , then the random variable Y has p.d.f given by

g(y) = (1/(2√y)) [ f(√y) + f(−√y) ],  y > 0.

Proof
Let G(y) be the cumulative distribution function of Y = X². Then you can calculate

G(y) = P(Y ≤ y) = P(X² ≤ y) = P(|X| ≤ √y) = P(−√y ≤ X ≤ √y) = F(√y) − F(−√y),

where F is the c.d.f of X. Hence

g(y) = (d/dy) G(y) = (1/(2√y)) [ f(√y) + f(−√y) ].


Example 2.4
Suppose X is a standard normally distributed random variable. Find the probability
density function of Y = X² and identify completely the distribution of Y.

Solution
Observe that X has p.d.f f(x) = (1/√(2π)) e^{−x²/2}, and so by Theorem 2.3 the p.d.f of
Y = X² is

g(y) = (1/(2√y)) · (2/√(2π)) e^{−y/2} = y^{−1/2} e^{−y/2}/√(2π),  y > 0,

which is the density of the chi-square distribution on 1 degree of freedom.
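A quick Monte Carlo check of this result (numpy and the standard library only; the probe points y are illustrative): P(X² ≤ y) should equal 2Φ(√y) − 1, the c.d.f of a chi-square on 1 d.f.

import numpy as np
from math import erf, sqrt
rng = np.random.default_rng(5)
y_samples = rng.standard_normal(1_000_000) ** 2
for y in (0.5, 1.0, 2.0):
    phi = 0.5 * (1 + erf(sqrt(y) / sqrt(2)))      # Phi(sqrt(y)) via the error function
    print(y, np.mean(y_samples <= y), 2 * phi - 1)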

Theorem 2.5
Let X be a continuous random variable with p.d.f f and c.d.f F. Let Y be the
random variable defined by Y = F(X). Then Y is uniformly distributed over the
interval [0,1]. Y is called the probability integral transform of X.

Proof
Note that X is a continuous r.v. and that the c.d.f is F is continuous, strictly
monotone function with an inverse, say F 1 . Now let G be the c.d.f of the r.v.
Y define above. Then
G( y)  P Y  y   P  F ( X )  y   P  F 1F ( X )  F 1 ( y)   P  X  F 1 ( y)   F ( F 1 ( y))  y

Since F is the c.d.f of X . Hence the p.d.f g ( y) of Y is given by

g ( y)  G ' ( y)  1 , for y [0,1] .
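Both directions of this theorem can be seen numerically. The sketch below (numpy assumed; the Exp(1) distribution, with F(x) = 1 − e^{−x} and F^{−1}(u) = −ln(1 − u), is an illustrative choice) checks that F(X) behaves like a U(0,1) variable and, conversely, that F^{−1}(U) behaves like X:

import numpy as np
rng = np.random.default_rng(6)

x = rng.exponential(1.0, size=500_000)
u = 1 - np.exp(-x)                      # Y = F(X), should be U(0,1)
print(u.mean(), u.var())                # ≈ 1/2 and ≈ 1/12

v = rng.uniform(size=500_000)
z = -np.log(1 - v)                      # F^(-1)(U), should be Exp(1)
print(z.mean(), z.var())                # ≈ 1 and ≈ 1

The second half of this check is exactly the inverse-transform idea used in Theorem 2.11 and Example 2.12 below.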

Activity 2.6
Let T be a positive continuous random variable with c.d.f F(t). Find the p.d.f of
the random variable Y = −(1/λ) ln(1 − F(T)), where λ is a positive constant.

Activity 2.7
Let X be uniformly distributed over [0, 1]. Find the p.d.f of Y = −(1/λ) ln(1 − X),
λ > 0.

Theorem 2.8
Let X be a continuous random variable with p.d.f f(x). Suppose that y = H(x) is a
strictly monotone (increasing or decreasing), differentiable (and thus continuous)
function of x. Then the r.v. Y = H(X) has p.d.f given by

g(y) = f(H^{−1}(y)) · |(d/dy) H^{−1}(y)|  if y = H(x) for some x, and g(y) = 0 otherwise,

where x = H^{−1}(y), i.e. g(y) = f(x) |dx/dy|.

Proof of the preceding theorem


(a) Assume that H is monotonic increasing. Then the c.d.f of Y is
G( y)  P Y  y   P  H ( X )  y   P  X  H 1 ( y )   F ( H 1 ( y )) ,

where F is the c.d.f of X.


Now differentiate G( y ) with respect to y using the chain rule,
d d dx
 G( y)    G( y)   , where x  H 1 ( y).
dy dx dy
d dx dx
i.e. G ' ( y )   F ( x)    f ( x) ……………………………… (1)
dx dy dy
(b) Assume that H is monotonic decreasing. Then the c.d.f of Y
G( y)  P Y  y   P  H ( X )  y   P  X  H 1 ( y )   1  F ( H 1 ( y)) .

d dx dx
Hence, G ' ( y )  1  F ( x)     f ( x) ………………………(2)
dx dy dy

Note that if y is a decreasing function of x, then x is a decreasing function of y
and so dx/dy < 0. Thus, by using the absolute value sign around dx/dy, you can
combine the results of (a) and (b) to obtain the final form of the theorem.


Example 2.9
X has a uniform distribution on the interval (0,1). Find all differentiable monotone
functions φ(x) such that Y = φ(X) has c.d.f

G(y) = 0 for y ≤ 0, and G(y) = 1 − e^{−y²} for y > 0.

Solution
Suppose φ is monotonic increasing, i.e. φ(0) = 0 and φ(1) = ∞. Then

g(y) = f(x) dx/dy = 2y e^{−y²}, so dx = 2y e^{−y²} dy. Hence

∫ dx = ∫ 2y e^{−y²} dy, which gives x = −e^{−y²} + c, where c is a constant.

Now φ(0) = 0 gives c = 1. Therefore x = 1 − e^{−y²} and the function is y = √(−ln(1 − x)).

Next suppose φ is monotonic decreasing, i.e. φ(0) = ∞ and φ(1) = 0. Then

g(y) = −f(x) dx/dy = 2y e^{−y²}, so −∫ dx = ∫ 2y e^{−y²} dy, which gives e^{−y²} = x + c,
where c is a constant. But φ is monotonic decreasing with φ(0) = ∞, which gives c = 0.

Therefore e^{−y²} = x, i.e. y = √(−ln x). From the above,

φ(x) = √(−ln(1 − x)) (increasing case) or φ(x) = √(−ln x) (decreasing case), x ∈ (0, 1).

Theorem 2.11
Let F be any strictly increasing or non-decreasing c.d.f and let X be uniformly
distributed on  0,1 . Then there exists a function h such that h( X ) has distribution

function F , thus,
P  h( X )  x   F ( x) , for x   ,   .

Proof for the Continuous Case


If F is continuous and strictly increasing, F 1 is well defined and so you can take
h( X )  F 1 ( X ) . Then you have


P  h( X )  x   P  F 1 ( X )  x   P  X  F ( x)   G  F ( x)  F ( x) ,

where G is the c.d.f of X .

Example 2.12
Suppose that a random variable Y has c.d.f F ( y) defined by

 0 , y0

F ( y)  
1  e y y  0

Find a function h( X ) which has the c.d.f F ( x) , where X is uniformly distributed

over  0,1 .

Solution
The inverse to z  1  e y , y  0 is y   ln(1  z ) , 0  z  1 . Thus h( z )   ln(1  z )
and h( X )   ln(1  X ) has the required distribution, where X is uniformly distributed
over (0,1).

Example 2.13
A random variable X has the normal distribution N(0, 1).

(i) Find the probability density function of the random variable Y = |X|.

(ii) If w = −c ln[1 − G(y)], where c > 0, show that (1/c) e^{−w/c} dw = g(y) dy.

Solution

Let F and f be the c.d.f and p.d.f of X respectively, and G and g be the c.d.f and p.d.f of Y
respectively. Then

G(y) = P(Y ≤ y) = P(|X| ≤ y) = P(−y ≤ X ≤ y) = F(y) − F(−y).

Hence, you have g(y) = (d/dy) G(y) = f(y) + f(−y).

But X ~ N(0, 1), so f(x) = (1/√(2π)) e^{−x²/2}, and therefore

g(y) = (1/√(2π)) e^{−y²/2} + (1/√(2π)) e^{−y²/2} = (2/√(2π)) e^{−y²/2} = √(2/π) e^{−y²/2},  y > 0.

(ii) w = −c ln[1 − G(y)]

⇒ −w/c = ln[1 − G(y)]

⇒ e^{−w/c} = 1 − G(y).

Differentiating with respect to w,

−(1/c) e^{−w/c} = −g(y) dy/dw,

so (1/c) e^{−w/c} dw = g(y) dy, which completes the proof.

Example 2.15
Let X be the coordinate of a point chosen at random on the line segment [0, 1]. Find the
probability density function of the random variable Y = −(1/λ) ln(1 − X), λ > 0.

Solution
X ~ U(0, 1). Let F and f be the c.d.f and p.d.f of X respectively, and G and g be the c.d.f and
p.d.f of Y, respectively. Then, you have

G(y) = P(Y ≤ y) = P(−(1/λ) ln(1 − X) ≤ y)

     = P(ln(1 − X) ≥ −λy)

     = P(1 − X ≥ e^{−λy})

     = P(X ≤ 1 − e^{−λy})

     = F(1 − e^{−λy}).

Now, taking the derivative with respect to y,

g(y) = G′(y) = F′(1 − e^{−λy}) · λ e^{−λy} = f(1 − e^{−λy}) · λ e^{−λy} = λ e^{−λy},

since X ~ U(0, 1) gives f(x) = 1.

∴ g(y) = λ e^{−λy} for y > 0, λ > 0, and 0 otherwise.

Example 2.18
A random variable Θ ~ U(−π/2, π/2). Find the probability density functions of the random
variables Y = tan Θ and Y = 3Θ + 4.

Solution

f(θ) = 1/π for −π/2 < θ < π/2, and 0 otherwise.

Now g(y) = f(θ) |dθ/dy|. Given Y = tan Θ, you have

dy/dθ = sec²θ, so dθ/dy = 1/sec²θ = 1/(1 + tan²θ) = 1/(1 + y²).

Therefore,

g(y) = 1/(π(1 + y²)),  −∞ < y < ∞,

which is a standard Cauchy distribution.


Example 2.17
A random variable X has probability density function

f(x) = 3x² for 0 < x < 1, and 0 elsewhere.

Find all differentiable monotonic functions φ(x) such that Y = φ(X) has the probability
density function

g(y) = 2y for 0 < y < 1, and 0 elsewhere.

Solution
Suppose φ is monotonic increasing. Then you have |dx/dy| = dx/dy and

g(y) = f(x) dx/dy, so 2y = 3x² dx/dy, i.e. ∫ 2y dy = ∫ 3x² dx, which gives

y² = x³ + c.

Using the boundary conditions φ(0) = 0 and φ(1) = 1, you have c = 0, so y² = x³ and y = x^{3/2}.

Suppose φ is monotonic decreasing. Then |dx/dy| = −dx/dy and

g(y) = −f(x) dx/dy, so 2y = −3x² dx/dy, i.e. ∫ 2y dy = −∫ 3x² dx, which gives

y² = −x³ + c.

Since φ is decreasing, you have φ(1) = 0 and φ(0) = 1, and so 1 = 0 + c, i.e. c = 1. Hence

y² = 1 − x³ and y = (1 − x³)^{1/2}.

Therefore φ(x) = x^{3/2} (increasing case) or φ(x) = (1 − x³)^{1/2} (decreasing case), x ∈ (0, 1).

Activity 2.18
Suppose X is uniformly distributed on [0,1] , find the probability density function of the
following

 
(a) Y  sin  x 2

(b) z  cos  x 
2
(c) z  ex
(d) Y  3x  4

Activity 2.19
A bivariate random variable ( X , Y ) has a joint probability density function (p.d.f)

k , x 2  y 2  R 2
f  x, y   
0, otherwise
(a) Find k

 
1/2
(b) Find the p.d.f of Z  X 2  Y 2 .

Activity 2.20
A random variable Y has cumulative density function
1  e y y  0
2

F  y  
0 otherwise

Find all differentiable monotonic function  ( z ) such that z has a uniform distribution on the

interval  0, 1 , X   ( z ) has the same distribution as Y. Find the p.d. f of X .


Activity 2.21(a)
Suppose X is a continuous r.v. with c.d.f and pdf F and f respectively. Let the r.v. Y be
defined as Y  X  b  aX  , where a and b are real positive constants. Show that the c.d.f

and pdf of Y are respectively


 b  b 2  4ay   b  b 2  4ay 
G  y  1 F   F 
 2 a   2 a 
   
and

1/2 
  b  b  4ay   b  b 2  4ay  

2

2   
g  y   G  y   1 b 2
 4ay f   f 
  2 a   2a  
   

Activity 2.21(b)
Suppose Y has probability density function y / 2 for 0  y  2 , 0 otherwise.

Compute the probability P Y (2  Y )  3 / 4 .

Theorem 2.22
Let X be a continuous random variable with p.d.f of f and c.d.f F . Let
X1 , , X n be a random sample drawn on X , and K and M be minimum and

maximum of the sample respectively .i.e. K  min( X1 , , Xn) and

M  max( X1 , , X n ) . Then,

(a) the p.d.f of M is given by g(m) = n [F(m)]^{n−1} f(m);

(b) the p.d.f of K is given by h(k) = n [1 − F(k)]^{n−1} f(k).

Proof
(a) Let G(m)  P(M  m) be the c.d.f of M . Now M  m is equivalent to

the event  X i  m, i . Hence , since the X i ’s are independent you find
n
G(m)  P(M  m)  P  X 1  m, X 2  m,..., X n  m    P  X i  m    F (m)  .
n

i 1

Therefore, g (m)  G ' (m)  n  F (m)


n 1
f (m).


(b) Let H (k )  P  K  k  be the c.d.f of K . Then

H (k )  P  K  k   1  P  K  k   1  P( X 1  k , X 2  k ,...., X n  k )  1   P( X 1  k ) 
n

( since the X i ’s are independent) . Therefore,

h(k )  H ' (k )  n 1  F (k )
n 1
f (k ).

Activity 2.23
X and Y are independent random variables with a common uniform distribution on
 0,1 . Find the probability density function of K  min( X , Y ) and M  max( X , Y ) .

Example 2.24
An electronic device has a life length T which is exponentially distributed with
parameter   0.001 . Suppose that 100 such devices are tested, yielding observed
values T1 , T2 ,..., T100 .
(a) What is the probability that the largest observation exceeds 7200 hours?
(b) What is the probability that the shortest time to failure is less than 10 hours?
Solution

(a) Let M be the largest observed value. Then you require
P(M > 7200) = 1 − P(M ≤ 7200).

Recall that for an exponentially distributed r.v. with parameter λ = 0.001,
F(t) = 1 − e^{−0.001t}. Therefore, by Theorem 2.22, you have

P(M > 7200) = 1 − [F(7200)]^{100} = 1 − (1 − e^{−0.001(7200)})^{100} ≈ 0.072.

(b) Similarly from Theorem 2.22, you have that

P(K < 10) = 1 − [1 − F(10)]^{100} = 1 − (e^{−0.01})^{100} = 1 − e^{−1} ≈ 0.632,

where K is the minimum of the sample.
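A Monte Carlo check of both answers is sketched below (numpy assumed; the sample size of the simulation is an arbitrary choice):

import numpy as np
rng = np.random.default_rng(7)
T = rng.exponential(1 / 0.001, size=(100_000, 100))   # 100 lifetimes per replication
print(np.mean(T.max(axis=1) > 7200), 1 - (1 - np.exp(-0.001 * 7200)) ** 100)
print(np.mean(T.min(axis=1) < 10),   1 - np.exp(-0.001 * 10) ** 100)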


Activity 2.25
Let F  x  , H  y  represent the c.d.f of the r.v’s X and Y respectively. Also let f and h

denote their respective pdf’s. find in terms of F, f, H, h the p.d.f of the r.v
(i) U  min  X , Y 

(ii) V  max  X , Y 

Example 2.26
A r.v. X is uniformly distributed over (0,1) . Find the p.d.f of Y if Y is
uniformly distributed over the interval (0, X ).
Solution
Since Y | X = x is uniform on (0, x), the conditional p.d.f of Y given X = x is

f_{Y|X}(y|x) = 1/x,  0 < y < x < 1.

Hence the joint probability density function of (X, Y) is

f(x, y) = g(x) · (1/x) = 1/x,  0 < y < x < 1, where g(x) = 1 is the p.d.f of X.

Therefore, the marginal p.d.f of Y is given by

h(y) = ∫_y^1 f(x, y) dx = ∫_y^1 x^{−1} dx = −ln y,  0 < y < 1.

Activity 2.27
Suppose that Yn ( (n  1, 2,3,....) are r.v.such that Yn is uniformly distributed over

 0, Yn1  with Y0  1. Prove that Yn has p.d.f

( ln y) n 1
f n ( y)  , n  1, 2,3,... 0  y  1.
 ( n)

Summary
You have learnt,

 to compute the distribution of functions of continuous random


variables,
 to calculate expectation and variance of functions of continuous
random variables.


Section 3 Distribution of the Sum of two Random Variables

Introduction
A probability density function is said to be symmetric if f(−x) = f(x) for all x.
For example, the Cauchy density and the uniform density on [−a, a], for a > 0, are both
symmetric densities. A random variable X is said to be symmetric if X and
−X have the same distribution; the standard normal random variable is an example.

Objectives
By the end of this Section, you should be able to
 find the distribution sum of random variables,
 calculate some probabilities by conditioning.

Calculation of some Probability by Conditioning


Example 3.1
Suppose X and Y are independent continuous r.v having respective densities g ( x)

and h( y ) . Compute P  X  Y  .

Solution
Conditioning on the value of Y yields

P  X  Y    P  X  Y Y  y h( y )dy
0

Recall from you notes that



P  X  x    P  X  x Y  y h( y )dy
0

and so you have



P  X  Y    P  X  Y Y  y h( y )dy
0


=  P  X  y  h( y)dy

since X and Y are independent .


= F

X ( y )h( y )dy , where FX is the c.d.f of X.

Example 3.2
Suppose X and Y are independent continuous r.v with respective p.d.f g ( x) and

h( x) . Let Z  X  Y and find PZ  z.

Solution
By conditioning on the value of Y you obtain

P(Z ≤ z) = ∫_{−∞}^{∞} P(X + Y ≤ z | Y = y) h(y) dy

         = ∫_{−∞}^{∞} P(X ≤ z − y) h(y) dy

         = ∫_{−∞}^{∞} F_X(z − y) h(y) dy. …………………………………..(1)

Now differentiating the c.d.f (1) you have the p.d.f of Z as

f_Z(z) = (d/dz) ∫_{−∞}^{∞} F_X(z − y) h(y) dy = ∫_{−∞}^{∞} (d/dz) F_X(z − y) h(y) dy.

On differentiating under the integral sign (using Leibniz's theorem) you get

f_Z(z) = ∫_{−∞}^{∞} f_X(z − y) h(y) dy,

which is an integral with respect to the density of Y. This result is called the
convolution of the two p.d.f's of X and Y. This is the elementary Convolution
Theorem.

Activity 3.3
(i) Prove that the p.d.f of the r.v. W  X  Y is

f ( w)   g (w  y)h( y)dy


if X and Y are independent with p.d.f ‘s g ( x) and h( x) respectively.


(ii) Let X 1 and X 2 be two independent exponential r.v’s with respective


1 1
means and . Show that
1 2
1
P  X1  X 2   .
1  2
(iii) If X and Y are independent identically distributed positive r.v’s show
that
 X  1
(a) E 
 X Y  2

E X X Y  k 
k
(b) .
2

Example 3.4
If X and Y are independent r.v’s both uniformly distributed on  0,1 .
Calculate the probability density function of Z  X  Y .

Solution
Note that g(x) = h(x) = 1 for 0 ≤ x ≤ 1, and 0 otherwise. Hence

k(z) = ∫ g(z − y) h(y) dy, where the integrand is non-zero when 0 ≤ z − y ≤ 1,
i.e. z − 1 ≤ y ≤ z, and 0 ≤ y ≤ 1.

For 0 ≤ z ≤ 1, k(z) = ∫_0^z dy = z.

For 1 < z ≤ 2, k(z) = ∫_{z−1}^{1} dy = 2 − z.

Hence the p.d.f of Z is

k(z) = z for 0 ≤ z ≤ 1, k(z) = 2 − z for 1 < z ≤ 2, and k(z) = 0 otherwise.


Figure 3.1: Sketch of the region of integration in the (z, y) plane — the unit square 0 ≤ y ≤ 1 together with the lines y = z and y = z − 1.
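The triangular shape of k(z) is easy to see from a simulated histogram. The sketch below (numpy assumed; bin count and sample size are illustrative) compares the histogram of X + Y with the density just derived:

import numpy as np
rng = np.random.default_rng(8)
z = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)
hist, edges = np.histogram(z, bins=20, range=(0, 2), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
k = np.where(mids <= 1, mids, 2 - mids)          # k(z) = z on (0,1), 2 - z on (1,2)
print(np.abs(hist - k).max())                    # should be small (of order 0.01)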

Theorem 3.5 ( Jacobian Approach)


Suppose that ( X , Y ) is a bivariate continuous r.v. with joint p.d.f f ( x, y) .

Let Z  H1 ( X , Y ) and W  H 2 ( X , Y ) and assume the functions H1 and H 2

satisfy the follow condtions:


(a) The equations z  H1 ( x, y) and w  H 2 ( x, y) may be uniquely solved

for x and y in terms of z and w , say x  G1 ( z, w) and

y  G2 ( z, w)

 x x  y y
(b) The partial derivatives , , and exists and are continuous.
 z w  z w
Then the joint p.d.f of ( Z ,W ) say k ( z, w) , is given by

k ( z, w)  f G1 ( z, w), G2 ( z, w) J ( z, w)

where J ( z, w) is the following 2 by 2 determinant called the


Jacobian of the transformation ( x, y)  ( z, w) .i.e.


x x
z w
J ( z, w)  0
y y
z w
Theorem 3.7
Let ( X , Y ) be a bivariate continuous r.v. with p.d.f f ( x, y) . Let Z  X  Y and
U  X  Y . Then the p.d.f,s of Z and U are respectively given by

(a) f Z ( z )   f ( x, z  x)dx
0


(b) fU (u )   f (u  y, y )dy
0

with the corollary that if X and Y are independent with p.d.f g ( x) and h( y )
respectively, then

(i) f Z ( z )   g ( x)h( z  x)dx
0


(ii) fU (u )   g (u  y ) g ( y)dy .
0

Proof
(a) Let z  x  y , a  x . Thus, x  a and y  z  a . The jacobian of the
transformation is
x x
a z 1 0
J ( a, z )   1 .
y y 1 1
a z
The joint p.d.f of Z  X  Y and A  X is
f (a, z  a)  f ( x, z  x)

Therefore, the marginal p.d.f of Z is f Z ( z )   f ( x, z  x)dx .
0

(b) Let u  x  y and a  y . Thus x  u  y and y  a . The Jacobian is


Let
x x
u a 1 1
J (u, a)    1,
y y 0 1
u a


The joint p.d.f of U  X  Y and A  Y is


f (u  a, a)  f (u  y, y)
Therefore, the marginal of U  X  Y is given as

fU (u )   f (u  y, y )dx
0

The proofs of the corollaries follow from (a) and (b) by setting f ( x, y)  g ( x)h( y) .

Example 3.8
Let (X, Y) be a bivariate random variable with joint probability density

f(x, y) = 8xy for 0 < y < x < 1, and 0 otherwise.

Find P(2X ≥ 1, 2Y ≤ 1) and P(X + Y ≥ 1).

Calculate F(x, y).

Solution
Notice that the constraint 2X ≥ 1, 2Y ≤ 1 requires that (X, Y) ∈ S, where S is the
square with vertices (1/2, 0), (1, 0), (1/2, 1/2), (1, 1/2). Hence,

P(2X ≥ 1, 2Y ≤ 1) = ∫_{1/2}^{1} ∫_{0}^{1/2} 8xy dy dx = 8 ∫_{1/2}^{1} x dx ∫_{0}^{1/2} y dy = 3/8.

Similarly, X + Y ≥ 1 if (X, Y) ∈ T, where T is the triangle with vertices (1/2, 1/2), (1, 0),
(1, 1). Hence

P(X + Y ≥ 1) = 8 ∫_{1/2}^{1} ∫_{1−x}^{x} xy dy dx = 5/6.

Finally, for 0 ≤ y ≤ x ≤ 1,

F(x, y) = ∫_0^y ∫_v^x 8uv du dv = 2x²y² − y⁴.


Summary
At the end of Section 3, you have learnt
 to find the distribution of sum of random variables,
 to calculate probabilities by conditioning.


Section 4 Distribution of Product and Quotient of two Random


Variables

Introduction
Welcome to Section 4 of Unit 3. I will walk you through the procedure for finding
the distribution of product and quotient of random variables.

Objectives
After completing this section, you should be able to
 find the distribution of product of two random variables,
 find the distribution of quotient of two random variables.

Theorem 4.1
Let (X, Y) be a bivariate continuous random variable with p.d.f f(x, y). Let
V = XY and W = X/Y. Then the p.d.f's of V and W are respectively given by

(a) f_V(v) = ∫_{−∞}^{∞} f(x, v/x) (1/|x|) dx,

(b) f_W(w) = ∫_{−∞}^{∞} f(wy, y) |y| dy,

with the corollary that if X and Y are independent with p.d.f's g(x) and h(y)
respectively, then

(a*) f_V(v) = ∫_{−∞}^{∞} g(x) h(v/x) (1/|x|) dx,

(b*) f_W(w) = ∫_{−∞}^{∞} g(wy) h(y) |y| dy.


Proof
(a) Let v = xy and a = x, so that x = a and y = v/a. The Jacobian of the inverse
transformation is

J(v, a) = det [ ∂x/∂v  ∂x/∂a ; ∂y/∂v  ∂y/∂a ] = det [ 0  1 ; 1/a  −v/a² ] = −1/a,

so |J| = 1/|a|. The marginal p.d.f of V is therefore

f_V(v) = ∫_{−∞}^{∞} f(a, v/a) (1/|a|) da = ∫_{−∞}^{∞} f(x, v/x) (1/|x|) dx.

(b) Let w = x/y and a = y, so that x = wa and y = a. The Jacobian of the inverse
transformation is

J(w, a) = det [ a  w ; 0  1 ] = a, so |J| = |a|.

The marginal p.d.f of W is therefore

f_W(w) = ∫_{−∞}^{∞} f(wa, a) |a| da = ∫_{−∞}^{∞} f(wy, y) |y| dy.

The proofs of the corollaries follow from (a) and (b) by setting f(x, y) = g(x)h(y).

Example 4.2
X and Y are two independent random variables with a common uniform
distribution. Find the probability density function of V  XY .

Solution
The probability function of V  XY is

1
g (v )   g ( x)h(v / x) x dx


where g and h are the marginal p.d.f’s of X and Y . Noting that, the values
for which g and h are not equal to zero, youy find that 0  x  1 and
v
0  1  v  x  1.
x

84
STAT 301 Probability Distributions

v vx

(1,1)

x 1

(0,0) (1,0)

The region of integration is as sketched above. Therefore,


1
1
g (v)   dx   ln v , 0  v  1.
v
x

Recall and note that XY and Y have the same p.d.f, where Y ~ U(0, X) and
X ~ U(0, 1) (compare Example 2.26).
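If you want to see the density −ln v appear from data, the following sketch (numpy assumed) compares the empirical c.d.f of V = XY with the theoretical c.d.f ∫_0^{v} (−ln u) du = v − v ln v at a few illustrative points:

import numpy as np
rng = np.random.default_rng(9)
v = rng.uniform(size=1_000_000) * rng.uniform(size=1_000_000)
for v0 in (0.1, 0.3, 0.7):
    print(v0, np.mean(v <= v0), v0 - v0 * np.log(v0))   # empirical vs v - v*ln v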

Example 4.5
Suppose we have a circuit in which both the current I and the resistance R vary in some
random way, so that I and R are independent r.v.'s with p.d.f's

I:  g(i) = 2i for 0 < i < 1, and 0 elsewhere;

R:  h(r) = r²/9 for 0 < r < 3, and 0 elsewhere.

Find the p.d.f of the voltage V = IR.

Solution
V = IR. By the product formula,

k(v) = ∫ g(i) h(v/i) (1/|i|) di.

Now 0 < i < 1 and 0 < v/i < 3, so v/3 < i < 1 (and hence 0 < v < 3). Therefore, you have

k(v) = ∫_{v/3}^{1} 2i · (v²/(9i²)) · (1/i) di

     = (2v²/9) ∫_{v/3}^{1} i^{−2} di

     = (2v²/9) [−1/i]_{v/3}^{1}

     = (2v²/9)(3/v − 1)

     = 2v(3 − v)/9,  0 < v < 3.

∴ k(v) = 2v(3 − v)/9 for 0 < v < 3, and 0 otherwise.

Activity 4.6
Two independent r.v.'s X and Y have a common uniform distribution over [0, 1]. Let
Z = X + Y and W = X − Y. Show that the joint p.d.f of (Z, W) is

f(z, w) = 1/2 for 0 ≤ z + w ≤ 2 and 0 ≤ z − w ≤ 2, and 0 otherwise.

Hence find the probability density function of

(i) Z    (ii) W    (iii) V = Y − X.


Activity 4.7
X and Y are independent r.v. with g  x   e x , x  0 and

h  y   2e2 y , y  0

X
If W  , find the p.d.f of W.
Y

Activity 4.8
Let X and Y represent the numbers of emissions of α-particles from two different sources
of radioactive material during a specified time period of length t. Assume that X and Y have
Poisson distributions with parameters λ_1 t and λ_2 t respectively. Let Z = X + Y represent
the total number of particles from the two independent sources. Use the preceding convolution
theorem to find the p.m.f of Z.

Summary
In this Section, you have learnt
• to find the distribution of the product of two random variables,
• to find the distribution of the quotient of two random variables.


Section 5 The Beta family of distributions

Introduction
The gamma distribution has a close relationship with the beta distribution. In this
section I use some of the techniques developed so far in Sections 1, 2, 3 and 4 to
obtain the Beta distribution from the gamma family of distributions. The rectangular
distribution on [0, 1] is a special case of the beta distribution. The first theorem in the
section captures this relationship.

Objectives
By the end of this Section, you should be able to
 derive the beta distribution as quotient of two independent gamma variates
 compute some moments of the beta random variables.

Theorem 5.1
Let X_1 and X_2 be independent random variables with distributions Γ(α_1, 1) and
Γ(α_2, 1), respectively. Let Y_1 = X_1 + X_2 and Y_2 = X_1/(X_1 + X_2).

Then,
(i) Y_1 is distributed as Γ(α_1 + α_2, 1),

(ii) Y_2 has density function of the form

     q(y) = K y^{α_1 − 1} (1 − y)^{α_2 − 1},  0 < y < 1,

where K is a normalisation constant.
The distribution specified by the density in (ii) is called a beta distribution with
parameters α_1 and α_2. Both parameters are strictly positive.


Proof of Theorem 5.1

To establish these results you first note that X_1 and X_2 are both positive random
variables. The sample space of (X_1, X_2) is therefore given by

S_1 = {x_1 > 0, x_2 > 0}.

This implies Y_1 > 0 and 0 < Y_2 < 1, and so the sample space of (Y_1, Y_2)
is given by S_2 = {y_1 > 0, 0 < y_2 < 1}.

The inverse transformation of y_1 = x_1 + x_2, y_2 = x_1/(x_1 + x_2) is given by

x_1 = y_1 y_2,  x_2 = y_1 − y_1 y_2, with domain S_2.

The Jacobian of the inverse transformation is

J(y_1, y_2) = det [ y_2  y_1 ; 1 − y_2  −y_1 ] = −y_1.

As J will not vanish anywhere on the sample space S_2, you have the density function of
(Y_1, Y_2) as

p(y_1, y_2) = |J(y_1, y_2)| f(y_1 y_2, y_1 − y_1 y_2) = y_1 f(y_1 y_2, y_1 − y_1 y_2).

Since X_1, X_2 are independent Γ(α_1, 1) and Γ(α_2, 1), you have

f(x_1, x_2) = (1/(Γ(α_1)Γ(α_2))) x_1^{α_1 − 1} x_2^{α_2 − 1} e^{−(x_1 + x_2)},  x_1 > 0, x_2 > 0,

and so you have that

p(y_1, y_2) = (y_1/(Γ(α_1)Γ(α_2))) (y_1 y_2)^{α_1 − 1} (y_1 − y_1 y_2)^{α_2 − 1} e^{−y_1}

            = (1/(Γ(α_1)Γ(α_2))) y_1^{α_1 + α_2 − 1} e^{−y_1} · y_2^{α_1 − 1} (1 − y_2)^{α_2 − 1},

which clearly factorizes into

p(y_1, y_2) = K g(y_1) h(y_2),

where g is a function of y_1 alone and h is a function of y_2 alone. This proves
the assertion that Y_1 and Y_2 are independent random variables. The marginal densities
have the forms:

p(y_1) = K_1 y_1^{α_1 + α_2 − 1} e^{−y_1},  y_1 > 0,

q(y_2) = K_2 y_2^{α_1 − 1} (1 − y_2)^{α_2 − 1},  0 < y_2 < 1,

where K_1 and K_2 are normalizing constants. This establishes that Y_1 ~ Γ(α_1 + α_2, 1)
and Y_2 ~ β(α_1, α_2).

You may find the marginal density of Y_1 by recalling the definition of the beta function,
B(α_1, α_2) = ∫_0^1 y_2^{α_1 − 1} (1 − y_2)^{α_2 − 1} dy_2, as

p(y_1) = (B(α_1, α_2)/(Γ(α_1)Γ(α_2))) y_1^{α_1 + α_2 − 1} e^{−y_1},  y_1 > 0,

and so B(α_1, α_2) = Γ(α_1)Γ(α_2)/Γ(α_1 + α_2).

Therefore, the marginal density function of Y_2 is given by

q(y_2) = (1/B(α_1, α_2)) y_2^{α_1 − 1} (1 − y_2)^{α_2 − 1},  0 < y_2 < 1.
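A short simulation sketch of this theorem follows (numpy assumed; the shape parameters are illustrative): the ratio X_1/(X_1 + X_2) of two independent Γ(α_1, 1) and Γ(α_2, 1) variables should have the Beta(α_1, α_2) mean and variance.

import numpy as np
rng = np.random.default_rng(10)
a1, a2 = 2.0, 3.0
x1 = rng.gamma(a1, 1.0, size=1_000_000)
x2 = rng.gamma(a2, 1.0, size=1_000_000)
y2 = x1 / (x1 + x2)
print(y2.mean(), a1 / (a1 + a2))                              # Beta mean
print(y2.var(), a1 * a2 / ((a1 + a2) ** 2 * (a1 + a2 + 1)))   # Beta variance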
Example 5.2
Determine the value of A in the following beta densities.
(i) f ( x)  Ax 2 (1  x)2 , 0  x  1,

(ii) f ( x)  Ax 4 (1  x)8 , 0  x  1,

(iii) f ( x)  Ax1/2 (1  x)3/2 , 0  x  1,

(iv) f ( x)  Ax 1/2 (1  x)5/2 , 0  x  1.


Solution
A density of the form A x^{a−1} (1 − x)^{b−1} on (0, 1) is a beta density with parameters a and b,
so A = Γ(a + b)/(Γ(a)Γ(b)) = 1/B(a, b). Matching exponents in each case:

(i) a − 1 = 2 and b − 1 = 2, so a = b = 3 and A = Γ(6)/(Γ(3)Γ(3)) = 5!/(2!·2!) = 30.

(ii) a = 5 and b = 9, so A = Γ(14)/(Γ(5)Γ(9)) = 13!/(4!·8!) = 6435.

(iii) a = 3/2 and b = 5/2, so A = Γ(4)/(Γ(3/2)Γ(5/2)) = 6/((√π/2)(3√π/4)) = 16/π.

(iv) a = 1/2 and b = 7/2, so A = Γ(4)/(Γ(1/2)Γ(7/2)) = 6/(√π·(15√π/8)) = 16/(5π).

Activity 5.3
A random variable Y has density function f ( y). Find a linear transformation that
would transform the distribution of Y into a beta-distribution, if
(i) f ( y)  K ( y  2)3 (6  y)5 , 2  y  6.

(ii) f ( y)  K ( y  2)1/2 ( y  6)4 , 2  y  6 .

Summary
You have learnt
 to derive the beta distribution as quotient of two independent gamma
variates
 about the computation of moments of the beta random variables.


Section 6 The Bivariate Normal distribution

Introduction
The normal distribution is the most widely used distribution in statistics. In STAT 204
(Introductory Probability II) you came across the two-dimensional normal distribution.
I will revisit this distribution in this section.

Objectives
By the end of this Section, you should be able to
 compute moments of the bivariate normal distribution
 compute conditional expectation from the bivariate density

Definition 6.1 (Bivariate normal density)

For σ, τ > 0 and |ρ| < 1, the bivariate normal density is given by

f(x, y) = (1/(2πστ√(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ x²/σ² − 2ρxy/(στ) + y²/τ² ] }.

Example 6.2
Suppose (X, Y) is a bivariate normal random variable with joint probability density function

f(x, y) = (1/(2πστ√(1 − ρ²))) exp{ −(1/(2(1 − ρ²))) [ x²/σ² − 2ρxy/(στ) + y²/τ² ] },

where σ, τ > 0 and |ρ| < 1. Find the marginal densities f_X(x) and f_Y(y) of X and Y,
respectively.

Solution
Integrating away the effect of Y you have, after completing the square in y, the marginal
density of X as

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy
       = (e^{−x²/(2σ²)}/(2πστ√(1 − ρ²))) ∫_{−∞}^{∞} exp{ −(y/τ − ρx/σ)²/(2(1 − ρ²)) } dy.

Now you set y/τ − ρx/σ = u, so that dy = τ du, and note that

∫_{−∞}^{∞} exp{ −u²/(2(1 − ρ²)) } du = √(2π(1 − ρ²)),

and so f_X(x) = (1/(σ√(2π))) exp(−x²/(2σ²)).

You can similarly verify that

f_Y(y) = (1/(τ√(2π))) exp(−y²/(2τ²)).

Thus, X ~ N(0, σ²) and Y ~ N(0, τ²).
The general bivariate normal distribution is more complicated.
Definition 6.3
The pair (X, Y) has the bivariate normal distribution with means μ_1 and μ_2,
variances σ_1² and σ_2², and correlation coefficient ρ if the joint density function is

f(x, y) = (1/(2πσ_1σ_2√(1 − ρ²))) exp{ −Q(x, y)/2 },

where σ_1, σ_2 > 0 and Q is the following quadratic form:

Q(x, y) = (1/(1 − ρ²)) [ ((x − μ_1)/σ_1)² − 2ρ((x − μ_1)/σ_1)((y − μ_2)/σ_2) + ((y − μ_2)/σ_2)² ].

Activity 6.4
(X, Y) has the joint density function

f(x, y) = (1/(2πσ_1σ_2√(1 − ρ²))) exp{ −Q(x, y)/2 },

where σ_1, σ_2 > 0 and Q is the quadratic form given in Definition 6.3.
Show that:
(a) X ~ N(μ_1, σ_1²) and Y ~ N(μ_2, σ_2²),

(b) the correlation between X and Y is ρ,

(c) X and Y are independent iff ρ = 0.


Summary
After completing this section, you have learnt
 to compute conditional expectation from the bivariate normal density
 to calculate moments of the bivariate normal distribution.

Assignment 3 (a)
A continuous bivariate random variable  X ,Y  has joint probability density

function (p.d.f) f ( x, y) for x and y real. Let Z  X  Y . Show that the probability
density function of Z is given by

k(z) = ∫_{−∞}^{∞} f(x, z − x) dx.

 x 2  xy / 3 , 0  x  1 , 0  y  2
Hence find k ( z ) if f ( x, y )  
 0 , elsewhere;

and compute P 0  Z  1 . Also, if V  X  Y , find the joint p.d.f of (Z ,V )

and hence the marginal p.d.f of V .

Assignment 3(b)
X_1 and X_2 are independent random variables with a common density function given

by
1 , 0  x  1

f ( x)  
0 , otherwise.

Let Y1  X1  X 2 and Y2  X1  X 2 . Find the joint p.d.f of Y1 and Y2 . Hence find

the marginal p.d.f’s of Y1 and Y2 .

Unit Summary
All too soon, we have come to the end of this Unit. Let me recap what you have
learnt in this Unit. You have learnt
• how to derive the distribution of functions of discrete random variables,
• how to derive the distribution of functions of continuous random variables,
• to calculate the mean and variance of functions of discrete random variables,
• to calculate the mean and variance of functions of continuous random variables,


 about the bivariate normal distribution.


Unit 4 Generating Functions and their Applications


Introduction

Informally, there are four main generating functions, namely:

(i) the moment generating function, M_X(t) = E(e^{tX});

(ii) the factorial moment generating function;

(iii) the probability generating function;

(iv) the characteristic function, φ(t) = E(e^{itX}), where i = √(−1).

This Unit will cover the following topics:


Section 1 Factorial moment generating functions

Section 2 Probability generating functions

Section 3 Moment generating functions

Section 4 The characteristic functions

Section 5 The Reproductive Property

Section 6 The Joint probability generating function

Objectives
At the end of this Unit , you should be able

 to compute the four generating functions;


 to use the uniqueness property to obtain distribution of random variables
 to use the generating functions to estimate moments of random variables.


Section 1 Factorial Moment Generating Functions

Introduction
Welcome to Section 1 of Unit 4, Generating Functions. In this Section I will discuss a
generating function defined for integer-valued random variables. You will learn that it is
very convenient for generating (factorial) moments.

Objectives
After completing this section, you should be able to
 compute factorial generating functions of random variables
 compute factorial moments of random variables

Definition 4.1 (Factorial moment generating function)

For any r.v. X, the factorial moment generating function (f.m.g.f) of X is given by

η_X(t) = E(t^X).

Taking derivatives you can determine moments as follows:

η′(t) = E(X t^{X−1}), so η′(1) = E(X);

η″(t) = E(X(X − 1) t^{X−2}), so η″(1) = E[X(X − 1)];

η‴(t) = E(X(X − 1)(X − 2) t^{X−3}), so η‴(1) = E[X(X − 1)(X − 2)].

In general, the values at t = 1 are given by

η^{(k)}(1) = E[X(X − 1)(X − 2) ⋯ (X − k + 1)],

where η^{(k)}(1) is the kth derivative of η(t) evaluated at t = 1, i.e.

η^{(k)}(1) = d^k η(t)/dt^k evaluated at t = 1.

η^{(k)}(1) is called the kth factorial moment of X. The relationship between the m.g.f M_X(t) and
the f.m.g.f is that η(t) = M_X(ln t).

Clearly, a knowledge of the factorial moments is equivalent to knowledge of the ordinary
moments. For example:

The first factorial moment is E(X), the first moment.

The second factorial moment is

E[X(X − 1)] = E(X²) − E(X). …………………………(1)

The third factorial moment is

E[X(X − 1)(X − 2)] = E(X³) − 3E(X²) + 2E(X). ………………………………..(2)

So from equations (1) and (2) you have that

E(X²) = E[X(X − 1)] + E(X),

E(X³) = E[X(X − 1)(X − 2)] + 3E[X(X − 1)] + E(X)   (from (2)).

It is easy to prove that for the sum of two independent r.v.'s X and Y,

η_{X+Y}(t) = η_X(t) η_Y(t),

and by induction this can be extended to any finite number of independent summands:

η_{X_1 + ⋯ + X_n}(t) = Π_{k=1}^{n} η_{X_k}(t).

Example 1.2
Suppose a random variable X has the Poisson distribution with parameter Y . If Y
itself is a Poisson random variable with parameter  , find the factorial moment
generating function of Z  X  Y .
Solution
Y ~ Po(λ) and X | Y = y ~ Po(y). Let G be the factorial m.g.f of Z = X + Y.
Then you have that

G(s) = E[s^{X+Y}] = E[ E(s^{X+Y} | Y) ] = E[ s^Y E(s^X | Y) ].

Now E(s^X | Y = y) = Σ_{k≥0} s^k e^{−y} y^k/k!

                   = e^{−y} Σ_{k≥0} (sy)^k/k!   (by the Maclaurin series expansion of e^x)

                   = e^{−y} e^{sy} = e^{y(s−1)}.

Hence,

G(s) = E[ s^Y e^{Y(s−1)} ] = E[ (s e^{s−1})^Y ] = E[ t^Y ], where t = s e^{s−1},

     = e^{λ(t−1)} = exp( λ(s e^{s−1} − 1) ).

Summary
At the end of Section 1, you have learnt
 How to compute the factorial moment generating functions,
 How to use the uniqueness property of f.m.g.f to obtain distribution of
random variables,
 How to use the f.m.g.f to estimate moments of random
variables.


Section 2 Probability Generating Functions

Introduction
The probability generating functions are define for only nonnegative integer, and
therefore coincide with the factorial m.g.f discussed in Section 1

Objectives
After completing this section, you should be able to
 compute probability generating functions of random variables
 compute probabilities of random variables from p.g.f’s.

Definition 2.3 (Probability generating function)

For r.v.'s whose only possible values are the non-negative integers, the p.g.f is defined by

g(s) = Σ_{k≥0} p_k s^k = E(s^X),

where p_k = P(X = k) and X is a random variable with possible values 0, 1, 2, …

For a non-negative integer-valued random variable X, the p.g.f may also be written

g(s) = Σ_{k≥0} p_k s^k = E(s^X) = p_0 + Σ_{k≥1} p_k s^k,

where p_k = P(X = k).

Note that g(1) = 1 and g(0) = p_0. Since by hypothesis p_k ≥ 0 and Σ_k p_k = 1,
g(s) is defined at least for |s| ≤ 1, where s is a (possibly complex) dummy variable, and g is
infinitely differentiable for |s| < 1. The probability generating function of X is related to
the characteristic function φ of X through the change of variable s = e^{it}:

φ(t) = E(e^{itX}) = E[(e^{it})^X] = g(e^{it}).

Similarly, the relationship between the m.g.f M and the p.g.f is

M(ln s) = E(e^{(ln s)X}) = E[(e^{ln s})^X] = E(s^X) = g(s), since e^{ln s} = s.


Thus g  s   M  ln s  where M is the m.g.f of X . Note that, probability generating

functions p.g.f’s have the same statistical properties as characteristics functions (c.f,s),
moment generating functions (m.g.f’s) and factorial moment generating functions
(f.m.g.f’s). A p.g.f uniquely determines the p.d.f or p.m.f’s of random variables and for
that matter the moments may be obtained through successive differentiation.

The factorial moment are given by


E  X  X  1  X  k  1  g  k  (1) ………………………. (1)

d xg s
where g  k   s   is the k th derivative of f.
ds k
Hence E  X   g (1) 1 , E  X 2   g  2 1  g 1 1

Thus, whereas expanding g  s  in a Taylors series about s  1, yields the factorial moments

as co-efficients, expanding about s  0 yields the probability for non-negative integer r.v’s
Infact using the Maclaurin’s series, expansion of g  s  we have

g  s    P  X  k s k  g  0   P  X  0 
k 0


:. g   s    k P  X  k s k 1  g   0   P  X  1
k 1



g   s    k  k  1 P  X  k s k 2  g (0)  P  X  2
k 2
2!

and generally
g k  0
P X  k  , k  0, 1, 2,
k!
Thus, from a knowledge of the form of g  s  , you can generate the probabilities P  X  k  .

This result is quite useful in problems that the p.g.f can early be found, but it is difficult to
find the probabilities directly. In several cases however, one cannot easily see the pattern for
g
k
 0  even by using the theorem of Leibniz and more so if g
k
 0  gets more and more
complicated or difficult to obtain and the use of the Binomial theorem may be preferred.


Example 2.3
A random variable X taken integer values 0, 1, 2, . . . and the p.g.f of X is

1  s  1  1      s
g  s  1
1  1    s
Find the probability distribution of X.
0, if n  0

Answer : P  X  n   1   , if n  1

 1    , n  2, 3, 4,
n2

Activity 2.4
A random variable X has probability generating function

15 19 s
2

f s 
1  14 s  60 s2
19 361
Find the probability distribution of X.
1
 
Hint: Express in partial fraction by noting that
1  14 s  60 s2
19 361
1 1

1 1419 s  60 361s  1 1019 s 1  619 s 
2

Then proceed using the Binomial Theorem.

Also, the p.g.f of a sum of independent non-negative integer r.v.'s is the product of their p.g.f's
(please know how to prove this). In particular, suppose N, X_1, X_2, … are independent non-
negative integer-valued random variables and we want to determine the p.g.f g_R(s)
of the sum R = X_1 + X_2 + ⋯ + X_N, a sum of random variables with a random number of
terms (i.e. N is random). Let g_N(s) be the p.g.f of N and suppose the X_i's have the
same distribution (p.d.f or p.m.f) with common p.g.f g(s). Then

g_R(s) = E(s^R) = E[ s^{X_1 + X_2 + ⋯ + X_N} ]

       = E[ E( s^{X_1 + X_2 + ⋯ + X_N} | N ) ]

       = Σ_{n≥0} E( s^{X_1 + X_2 + ⋯ + X_n} ) P(N = n)   (since the X_i's and N are independent)

       = Σ_{n≥0} E( s^{X_1} s^{X_2} ⋯ s^{X_n} ) P(N = n)

       = Σ_{n≥0} E(s^{X_1}) E(s^{X_2}) ⋯ E(s^{X_n}) P(N = n)   (since the X_i's are independent)

       = Σ_{n≥0} [g(s)]^n P(N = n)

       = E[ (g(s))^N ]

       = g_N(g(s)).

To sum up, g_R(s) = g_N(g(s)). Using the chain rule, you have

g_R′(s) = g_N′(g(s)) g′(s), and setting s = 1 (noting that g(1) = 1),

E(R) = g_N′(1) g′(1) = E(N) E(X).

Similarly, we can show that the variance of R, σ_R², is given by

σ_R² = [E(X)]² σ_N² + E(N) σ_X²,

where σ_X² = Var(X) and σ_N² = Var(N).

Example 2.5
Find the p.g.f of a Bernoulli r.v. X with parameter p (representing the probability of success).

Solution

The probability mass function of X is P(X = x) = f(x) = p^x (1 − p)^{1−x}, x = 0, 1. Then you
have

g(s) = E(s^X) = Σ_{x=0}^{1} s^x p^x (1 − p)^{1−x}

     = (1 − p) + sp

     = ps + q, where q = 1 − p.

Example 2.6
Let X_1, X_2, …, X_n be n independent Bernoulli trials (as r.v.'s) and let

S_n = X_1 + X_2 + ⋯ + X_n.

Find the p.g.f of S_n. Hence derive the p.m.f of a Binomial r.v. with parameters n and p.

Solution

g_{S_n}(s) = [g(s)]^n, since the X_i's are iid,

           = (ps + q)^n, from the previous result,

           = Σ_{k=0}^{n} C(n, k) (ps)^k q^{n−k}, by the Binomial Theorem,

i.e. g_{S_n}(s) = Σ_{k=0}^{n} [ C(n, k) p^k q^{n−k} ] s^k,

so P(S_n = k) = C(n, k) p^k q^{n−k},  k = 0, 1, 2, …, n,

which is the p.m.f of the Binomial distribution.
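You can make the coefficient-extraction idea concrete with a few lines of Python (numpy assumed; n and p are illustrative): expanding (q + ps)^n as a polynomial in s recovers the Binomial(n, p) probabilities as its coefficients.

import numpy as np
from math import comb
n, p = 6, 0.3
q = 1 - p
pgf = np.polynomial.Polynomial([q, p]) ** n        # (q + p*s)^n
coeffs = pgf.coef                                  # coefficient of s^k = P(S_n = k)
for k in range(n + 1):
    print(k, coeffs[k], comb(n, k) * p ** k * q ** (n - k))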

Remark
Know how to derive, and identify (by the uniqueness property theorem), the p.g.f's of the
frequently encountered probability distributions, i.e. the Poisson with parameter λ > 0, the
Geometric, and the Negative Binomial (Pascal).

Example 2.7
If a discrete random variable X has probability generating function

g(s) = ( p/(1 − qs) )^α,  p + q = 1,

where α is a positive real constant, then X must be a Negative Binomial r.v. with p.m.f

P(X = n) = C(α + n − 1, n) p^α q^n,  n = 0, 1, 2, …

Note that if

g(s) = ( q/(1 − ps) )^α,

then g(s) is again the p.g.f of a Pascal (Negative Binomial) distribution, with the roles of
p and q interchanged.

Example 2.8
Let X have a Poisson distribution with parameter λ > 0. Find the p.g.f of X. Hence find
the kth factorial moment of X.

Solution

P(X = k) = e^{−λ} λ^k/k!,  k = 0, 1, 2, …

∴ g(s) = E(s^X) = Σ_{k≥0} s^k P(X = k)

       = Σ_{k≥0} s^k e^{−λ} λ^k/k!

       = e^{−λ} Σ_{k≥0} (λs)^k/k!,

i.e. g(s) = e^{−λ} e^{λs} = e^{λ(s−1)}.

Note here that the p.g.f is the same as the f.m.g.f, and so

g(s) = Σ_{k≥0} λ^k (s − 1)^k/k!,

which is the Taylor series expansion of g(s) about s = 1.

∴ g^{(k)}(1) = λ^k  (the kth factorial moment),

i.e. E[X(X − 1)(X − 2) ⋯ (X − k + 1)] = λ^k.


Example 2.9
Find the p.g.f (or, in this case, the f.m.g.f) for the outcome X of the toss of a fair die. Hence
find the p.g.f for the sum of points on two dice (assuming independence), X_1 + X_2.
What is P(X_1 + X_2 = 10)? Moreover, if X is the sum of the points on seven fair dice
simultaneously tossed, find P(X = 25) to 4 decimal places.

Solution
The p.g.f of X, the outcome of the toss of a fair die, is

g(t) = E(t^X) = (1/6) Σ_{k=1}^{6} t^k = t(1 − t⁶)/(6(1 − t)).

Hence

g_{X_1 + X_2}(t) = [g(t)]² = t²(1 − t⁶)²/(36(1 − t)²).

Now,

1/(1 − t)² = (d/dt) [1/(1 − t)] = (d/dt) Σ_{k≥0} t^k = Σ_{k≥1} k t^{k−1},

and by the Binomial Theorem,

(1 − t⁶)² = Σ_{j=0}^{2} C(2, j) (−1)^j t^{6j}.

Therefore

g_{X_1 + X_2}(t) = (1/36) t² Σ_{j=0}^{2} C(2, j) (−1)^j t^{6j} Σ_{k≥1} k t^{k−1}

                 = (1/36) Σ_{j=0}^{2} Σ_{k≥1} C(2, j) (−1)^j k t^{1 + 6j + k}.

The probability that the total number of points is, say, 10 is the coefficient of t¹⁰. But
1 + 6j + k = 10 requires k = 3 and j = 1, or k = 9 and j = 0. Hence

P(X_1 + X_2 = 10) = (1/36) [ 3·C(2, 1)(−1)¹ + 9·C(2, 0)(−1)⁰ ] = (1/36)(9 − 6) = 3/36 = 1/12.

This could of course have been computed by the elementary reasoning that there are 36
equally likely sample points, of which (6, 4), (4, 6) and (5, 5) constitute the event
X_1 + X_2 = 10, and therefore the probability is 3/36 = 1/12.

Repeating the same coefficient extraction with [g(t)]⁷ gives

P(X_1 + X_2 + ⋯ + X_7 = 25) = 24017/6⁷ = 0.0858 (to 4 d.p.).
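Coefficient extraction of this kind is easy to automate. The sketch below (numpy assumed; the targets 10 and 25 are the ones used in the example) represents the p.g.f of a single die as a polynomial and reads off probabilities for sums of dice from coefficients of its powers:

import numpy as np
die = np.polynomial.Polynomial([0, 1, 1, 1, 1, 1, 1]) / 6    # (t + t^2 + ... + t^6)/6
two = die ** 2
print(two.coef[10])              # P(X1 + X2 = 10) = 1/12 ≈ 0.0833
seven = die ** 7
print(round(seven.coef[25], 4))  # P(sum of 7 dice = 25) ≈ 0.0858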

Activity 2.10
a  bs
Let G ( s)  . When is G the probability generating function of a finite integer
1  cs
valued random variable X .

Summary
At the end of Section 2, you have learnt
 to compute probability generating functions of random variables,
 to compute probabilities of random variables from p.g.f’s.


Section 3 Moment Generating Functions

Introduction
An appropriate choice of s in the definition of the p.g.f. will give the moment
generating function for nonnegative function. It is defined for all real numbers.

Objectives
After completing this section, you should be able to
 compute moment generating functions of random variables,
 compute moments of random variables from m.g.f’s.

Definition 3.1 (Moment generating functions (m.g.f) )


The m.g.f of a r.v. X is defined as  M(t) = E(e^{tX})

    = Σ_j e^{t x_j} P(X = x_j),           if X is discrete,

    = ∫_{−∞}^{∞} e^{tx} f(x) dx,          if X is continuous with p.d.f f(x) or c.d.f F(x).

Note: there exist r.v.'s for which the m.g.f may not exist, and so its utility is therefore
sometimes limited. This is the advantage which the characteristic function has over the
moment generating function.

Theorem 3.2
Suppose that the random variable X has m.g.f M_X(t). Let Y = αX + β; then the

m.g.f of the random variable Y is given by

M_Y(t) = e^{βt} M_X(αt).

Proof
M_Y(t) = E(e^{tY}) = E(e^{(αX + β)t}) = e^{βt} E(e^{αtX}) = e^{βt} M_X(αt).


Theorem 3.3 ( The uniqueness property)

Let X and Y be two random variables with m.g.fs M X (t ) and M Y (t ) ,


respectively. If M X (t )  M Y (t ) for all values of t , then X and Y have the same
probability distribution.

Proof

The proof of theorem 3.3 is beyond the scope of this course.

Example 3.4

Suppose that X has distribution N(μ, σ²). Let Y = αX + β; then Y is also

normally distributed. Note that the m.g.f of Y is given by

M_Y(t) = e^{βt} M_X(αt),

and X ~ N(μ, σ²) has m.g.f M_X(t) = exp(μt + σ²t²/2), so that

M_Y(t) = e^{βt} e^{μαt + σ²(αt)²/2} = e^{(αμ + β)t} e^{α²σ²t²/2},

which is the m.g.f of a normally distributed random variable with expectation
αμ + β and variance α²σ².

Theorem 3.5

Suppose that X and Y are independent random variables . Let Z  X  Y , and let
M X (t ), M Y (t ) and M Z (t ) be the m.g.fs of the random variables X , Y and Z ,
respectively. Then, you have that

M Z (t )  M X (t )M Y (t ) .

Proof

M Z (t )  E  eZt   E e( X Y )t   E  etX etY   M X (t )M Y (t ) .


Notes
This theorem may be generalized as follows: if X_1, ..., X_n are independent random
variables with m.g.f.s M_{X_i}, i = 1, 2, 3, ..., n, then M_Z, the m.g.f of Z = X_1 + ... + X_n,
is given by

M_Z(t) = Π_{i=1}^{n} M_{X_i}(t).

Activity 3.6
Suppose that the continuous random X has probability density function

f(x) = (1/2) e^{−|x|},   −∞ < x < ∞.

Find the m.g.f of X and use it to find E ( X ) and V ( X ) .

Summary
At the end of this section, you have learnt to

 compute moment generating functions of random variables


 compute moments of random variables from m.g.f’s.


Section 4 Characteristic Functions


Introduction

Characteristic functions are slightly more difficult to understand than moment

generating functions in that they involve complex numbers. They have, however, two
principal advantages over moment generating functions. First, the c.f. φ_X(t) is finite
for all random variables and all real numbers t. Secondly, the distribution of X,
and usually the density if it exists, can be obtained from the characteristic function
by means of an inversion theorem.

Objectives

By the end of this Section, you should be able to

 compute characteristic functions of random variables


 compute moments of random variables from c.f’s.

Definition 6.1 (Characteristic function (c.f) )


The characteristic function of a s.v X is defined as:-
  t   E  eitX  , i 2  1

  itxj
 e P  X  xj  if X is disrecte
=  j 1
  eitx f x if X is continuous with p.df f x and c.d.f F x
       
where t is a real variable .
Note:   t  exists for all values of t, since eitx  1

  t   1

It also exists for every distribution and is uniformly continuous in t.
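The definition can be explored numerically with the empirical characteristic function, as in the sketch below (numpy assumed available; N(0, 1) samples are used because the exact answer e^{−t²/2} is known, and the sample size is an arbitrary choice). Note that the estimated |φ(t)| never exceeds 1, in line with the bound above.

import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(100_000)
for t in (0.0, 0.5, 1.0, 2.0):
    phi_hat = np.mean(np.exp(1j * t * x))        # sample estimate of E(e^{itX})
    print(t, abs(phi_hat), np.exp(-t**2 / 2))    # compare with the exact c.f. of N(0,1)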


Remark
(a) The relation between (cumulative) distribution functions and c.f.s is one-to-one. Thus,
knowing the c.f is synonymous with knowing the c.d.f and the p.d.f or p.m.f.

(b) If X_1, X_2, ..., X_n are independent r.v.s, the c.f of their sum is the product of their c.f.s.
This simple result makes c.f.s extremely convenient and expeditious for dealing with
problems involving sums of independent r.v.'s.

(c) When they are finite, the moments of a r.v. may be determined by differentiating the
c.f. The explicit relation is E(X^k) = (1/i^k) φ^{(k)}(0), where i = √(−1) and

φ^{(k)}(t) = d^k φ(t) / dt^k.

(d) The one-to-one correspondence between distribution functions (d.f.s) and their c.f.s is
also preserved by various limiting processes. In fact, if F_1, F_2, ... are d.f.s such

that lim_{n→∞} F_n(x) = F(x) for every x at which F is continuous, and φ_n(t) is the c.f.

corresponding to F_n, then φ_n(t) → φ(t) uniformly in every finite interval, as

n → ∞. Conversely, if φ_1, φ_2, ..., φ_n, ... are the c.f.'s corresponding to the d.f.s

F_1, F_2, ..., F_n, ..., and lim_{n→∞} φ_n(t) = φ(t) for every t, where φ(t) is the c.f. corresponding to the

distribution function F, then lim_{n→∞} F_n(x) = F(x) for every x at which F(x) is

continuous. This makes c.f.s suitable for the derivation of limiting distributions.

Example 6.2
Let X and Y have joint density function f given by

1  12 ( x2  y 2 )
f ( x, y )  e
2
(i) Find the characteristic function of the random variable X / Y .
(ii) Using the uniqueness property of characteristic functions identify the
distribution X / Y .
Activity 6.3
Let X and Y have joint density function

f(x, y) = (1/4) [ 1 + xy(x² − y²) ],   |x| ≤ 1, |y| ≤ 1.

Show that
(i) φ_X(t) φ_Y(t) = φ_{X+Y}(t), and
(ii) X and Y are not independent.


Summary
You have learnt to
 compute characteristic functions of random variables
 compute moments of random variables from c.f’s.

Section 5 The Reproductive Property

Introduction
Some probability distributions possess the following remarkable and very useful property: if
two or more independent random variables having a certain distribution are added, the resulting
random variable has a distribution of the same type as that of the summands. This
property is called the reproductive property.

Objectives
By the end of this Section, you should be able to
 identify known distribution with the reproductive property
 apply this property to real life problems.

Example 4.1
Suppose that X and Y are independent random variables with distributions
N(μ_1, σ_1²) and N(μ_2, σ_2²), respectively. Let Z = X + Y; then you have that

M_Z(t) = M_X(t) M_Y(t) = exp(μ_1 t + σ_1² t²/2) exp(μ_2 t + σ_2² t²/2)

       = exp[ (μ_1 + μ_2) t + (σ_1² + σ_2²) t²/2 ],

which is the m.g.f of a normally distributed random variable with expected value
μ_1 + μ_2 and variance σ_1² + σ_2².


Activity 4.2
The length of a rod is a normally distributed random variable with mean 4 inches and
variance 0.01 inch². Two such rods are placed end to end and are to fit into a slot whose
length is 8 inches with a tolerance of 0.01 inch. What is the probability that the two rods will fit?

Theorem 4.3 ( The Reproductive Property of the Normal distribution)


Suppose X_1, ..., X_n are n independent random variables with distributions N(μ_k, σ_k²),

k = 1, 2, 3, ..., n. Let Z = X_1 + ... + X_n; then Z ~ N( Σ_{k=1}^{n} μ_k , Σ_{k=1}^{n} σ_k² ).

Theorem 4.4 ( The Reproductive Property of the Poisson distribution)


Suppose X_1, ..., X_n are n independent random variables and each X_j has a Poisson

distribution with parameter λ_j, j = 1, 2, 3, ..., n. Then Z = X_1 + ... + X_n has a Poisson

distribution with parameter λ = λ_1 + ... + λ_n.
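Before proving this (Activity 4.5), you may find a numerical check reassuring. The sketch below (scipy assumed available; the rates 3 and 5 mirror Activity 4.6 but any positive rates would do) computes P(X_1 + X_2 = k) by convolving the two Poisson p.m.f.s and compares it with the Poisson(λ_1 + λ_2) p.m.f.

from scipy.stats import poisson

lam1, lam2, k = 3, 5, 6
# convolution of the two p.m.f.s at k ...
direct = sum(poisson.pmf(j, lam1) * poisson.pmf(k - j, lam2) for j in range(k + 1))
# ... equals the Poisson(lam1 + lam2) p.m.f. at k
print(direct, poisson.pmf(k, lam1 + lam2))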

Activity 4.5
Prove that Z Poisson( ) . Hint use Mathematical induction on n.

Activity 4.6
Suppose that the number of calls coming into a telephone exchange between 9 a.m.
and 10 a.m, say X 1 , is a random variable with Poisson distribution with parameter

3. Similarly, the number of calls arriving between 10. a.m. and 11. a.m, say X 2 also

has the Poisson distribution with parameter 5. If X 1 and X 2 are independent, find

the probability that more than 5 calls come in between 9 a.m. and 11 a.m ?

Theorem 4.7
Suppose that the distribution of X_i is chi-square on n_i degrees of freedom, i.e. X_i ~ χ²_{n_i},

i = 1, 2, 3, ..., k, where the X_i's are independent random variables. Let Z = X_1 + ... + X_k.

Then Z ~ χ²_n, where n = n_1 + ... + n_k.

Proof

Note that M_{X_i}(t) = (1 − 2t)^{−n_i/2}, i = 1, 2, 3, ..., k. Hence you have

M_Z(t) = M_{X_1}(t) ... M_{X_k}(t) = (1 − 2t)^{−n/2},

which is the m.g.f of a random variable having the χ²_n distribution.

Activity 4.8
Suppose X1 , , X k are independent random variables each having the N  0,1 . Prove

that W  X 12   X k2 xk2 .

Summary
In this Section , you have learnt to

 identify known distribution with the reproductive property


 apply this reproductive property to real life problems.


Section 6 The Joint Probability Generating Function

Introduction
Sometimes ,you may be interested in expectation of product of random variables,
and for that matter joint generating function. In this Section, I will discuss the joint
probability generating function.

Objectives
By the end of the Section, you should be able to

 compute joint probability generating functions of random variables,


 calculate cross moments from joint p.g.fs of random variables.

Definition 6.1

The joint probability generating function of the random variables X 1 and X 2 taking

values in the non-negative integers is defined by

 
GX1 , X 2 (s1 , s2 )  E s1X1 s2X 2 .

Note here, that joint generating functions have important uses, one of which is the
following characterization of independence.

Theorem 6.2

Random variables X 1 and X 2 are independent iff

GX1 , X 2 (s1 , s2 )  GX1 (s1 )GX 2 (s2 ) , for all s1 and s2 .

Proof of Theorem 6.2

If X 1 and X 2 are independent then so are functions of them and so g ( X1 )  s1X1


and h( X 2 )  s2X 2 . By properties of expectation, you have

GX1 , X 2 (s1 , s2 ) = E  g ( X1 )h( X 2 )   E  g ( X 1 )  E h( X 2 )  = GX1 (s1 )GX 2 (s2 )


To prove the converse , equate the coefficients of the terms such as s1i s2j to
deduce after some manipulation that P  X1  i, X 2  j   P  X1  i  P  X 2  j  .,

Example 6.3

Let G_{X,Y}(s, t) be the joint probability generating function of X and Y.

(i) Show that G_X(s) = G_{X,Y}(s, 1) and G_Y(t) = G_{X,Y}(1, t).

(ii) Show that

E(XY) = ∂²/∂s∂t [ G_{X,Y}(s, t) ] |_{s = t = 1}.

Solution

By definition G_{X,Y}(s, t) = E(s^X t^Y); therefore G_X(s) = E(s^X (1)^Y) = G_{X,Y}(s, 1)

and G_Y(t) = E((1)^X t^Y) = G_{X,Y}(1, t).

Now, differentiating the joint p.g.f partially with respect to t, followed by taking the partial
derivative with respect to s, you have

∂²/∂s∂t [ G_{X,Y}(s, t) ] = E( XY s^{X−1} t^{Y−1} ).

Hence,

∂²/∂s∂t [ G_{X,Y}(s, t) ] |_{s = t = 1} = E(XY).

Activity 6.4
Find the joint probability generating functions of the following joint frequency
functions, and state for what values of the variables the series converge.

(a) f ( j, k )  (1   )(   ) j  k  j 1 , for 0  k  j , where 0    1 ,    .


 (2 k 1) kj
(b) f ( j, k )  (e  1)e , for k , j  0.
j
k 
(c) f ( j, k )    p j  k (1  p)k  j / k log 1/ (1  p) , 0  j  k , k  1 , where
 j
0  p  1.

Deduce the marginal probability generating functions and the covariances.


Activity 6.5
If X and Y have joint probability generating function

G_{X,Y}(s, t) = E(s^X t^Y) = [ (1 − (p_1 + p_2)) / (1 − (p_1 s + p_2 t)) ]^n,

where p_1 + p_2 < 1, find the marginal mass functions of X and Y, and the mass
function of X + Y. Find also the conditional probability generating function
G_{X|Y}(s | y) = E(s^X | Y = y) of X given that Y = y. The pair X, Y is said to have
the bivariate negative binomial distribution.

Summary
You have learnt to

 compute joint probability generating functions of random variables,


 calculate cross moments from joint p.g.fs of random variables.

Assignment 4(a)

Suppose X_1, ..., X_n are independent random variables each having the N(0, 1) distribution. Let

T = X_1² + ... + X_n². Show that T ~ χ²_n.

Assignment 4(b)
Suppose the number of automobile accidents a driver will be involved in during a
one-year period is a random variable X having Poisson distribution with parameter
 , where  is a measure of accident proneness that varies from driver to driver in
accordance with a gamma distribution given by
f(λ) = [ (p/q)^n / Γ(n) ] λ^{n−1} exp(−λ p/q),   λ > 0,

where n is a positive integer and p, q are positive constants with p + q = 1.

(i) Show that the factorial moment generating function of X is

    g(s) = [ p / (1 − qs) ]^n.


(ii) Using the uniqueness property of probability generating functions,


identify completely the distribution of X .
(iii) If p  2 / 3 and n  12 find E  X ( X  1)...( X  k ) , where k is a

positive integer.

Assignment 4(c)
Write short notes on the following generating functions

(i) Factorial moment generating functions


(ii) Joint probability generating functions
(iii) Moment generating functions.

Unit Summary
In this unit , we have discussed

 Factorial moment generating functions


 Probability generating functions
 Moment generating functions
 How to use this generating functions to derive moments of random variables
 How to use the uniqueness property of generating functions to derive
distributions of random variables
 Joint probability generating functions.


Unit 5 Some Applications to Reliability Theory

Introduction
I will discuss in this Unit a very important and growing area of application of some
of the concepts introduced in the previous units of this module. This is Reliability
Theory, and it is very important for assessing the performance of systems.

The discussion will comprise the following topics:

Section 1 Basic Concepts

Section 2 The Normal Failure law

Section 3 The Exponential failure law

Section 4 The Exponential law and the Poisson distribution

Section 5 The Weibull failure law

Section 6 Reliability of Systems.

Objectives
By completing this Unit, you should be able

 to understand the basic concepts of reliability theory


 to calculate the reliability of a system.
 to compute the hazard function
 to discuss the main laws of failure time.


Section 1 Basic Concepts


In this section, I will introduce and discuss some basic concepts of reliability
theory and relate them to some real life problems.

Objectives
After completing this section, you should be able to

 define the reliability function and the hazard function,


 apply this concept to solve simple problems.

Definition ( Reliability)

The reliability of a component (or system) at time t, R(t), is defined as

R(t) = P(T > t),

where T is the life length of the component. R(t) is called the reliability function.

For example, if for a particular item R(t) = 0.9, this means that approximately 90%
of such items, used under certain conditions, will still be functioning at time t.

If T has p.d.f f, then you have that R(t) = ∫_t^∞ f(s) ds, and if T has c.d.f F,

R(t) = 1 − P(T ≤ t) = 1 − F(t).

Another function which plays crucial role in describing the failure characteristics of
an item is the hazard function.

Definition 1.1 (Hazard function)

The instantaneous failure rate Z or hazard function associated with the random
variable T is given by

Z(t) = f(t) / [1 − F(t)] = f(t) / R(t),

defined for F(t) < 1.


Theorem 1.2

If T, the time to failure, is a continuous random variable with p.d.f f, and if

F(0) = 0, where F is the cumulative distribution function of T, then f may be
expressed in terms of the failure rate as

f(t) = Z(t) exp( −∫_0^t Z(s) ds ).

Proof

Observe that R(t) = 1 − F(t), so (d/dt) R(t) = −(d/dt) F(t) = −f(t). Therefore you
have that

Z(t) = f(t)/R(t) = −R′(t)/R(t).

Integrating both sides you get

∫_0^t Z(s) ds = −∫_0^t [ R′(s)/R(s) ] ds = −[ ln R(s) ]_0^t = −ln R(t),

since R(0) = 1 by the assumption F(0) = 0. Hence

R(t) = exp( −∫_0^t Z(s) ds ).

Thus, you have

f(t) = (d/dt) [ 1 − R(t) ] = Z(t) exp( −∫_0^t Z(s) ds ).

Example 1.3

Suppose a tube has a constant failure rate Z(t) = λ. Find the probability density of the time to
failure T.

Solution

By Theorem 1.2 above, you have

f(t) = λ e^{−∫_0^t λ ds} = λ e^{−λt},   t > 0.

Theorem 1.4

If E(T) < ∞, then E(T) = ∫_0^∞ R(t) dt.

Proof

Note that ∫_0^∞ R(t) dt = ∫_0^∞ ( ∫_t^∞ f(s) ds ) dt. Integrate by parts, with u = ∫_t^∞ f(s) ds

and dv = dt. Hence you have that v = t and du = −f(t) dt. Thus,

∫_0^∞ R(t) dt = [ t ∫_t^∞ f(s) ds ]_0^∞ + ∫_0^∞ t f(t) dt.

As the first term on the right-hand side vanishes at t = 0 and at t = ∞ (because E(T) < ∞), you have

∫_0^∞ R(t) dt = E(T),

which completes the proof.

Summary
At the end of this section, you have learnt to
 define the reliability function and the hazard function,
 apply this concept to solve simple problems.


Section 2 The Normal Failure law

Introduction
There are many types of components whose failure behaviour may be represented by
the normal distribution, i.e. the life length of an item T ~ N(μ, σ²).

Objectives
By the end of this section, you should be able to

 Apply the normal failure model to solve problems.

The reliability function of the normal failure law may be expressed in terms of the
tabulated cumulative distribution function Φ as follows:

R(t) = 1 − Φ( (t − μ)/σ ).

Note that in order for you to achieve a high reliability ( say 0.9 or greater) the
operating time must be considerably less than  , the expected life length.

Example 2.1

Suppose that the life length of a component is normally distributed with standard
deviation equal to 10 hours. If the component has a reliability of 0.99 for an
operation period of 100 hours, find the expected life length.

Solution

Using the above equation you have that

0.99 = 1 − Φ( (100 − μ)/10 ).

From the table of the normal distribution this gives (100 − μ)/10 = −2.33. So you have
μ = 123.3 hours.

Note that the normal failure law represents an appropriate model for components in
which failure is due to some “wearing effect”.
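The same calculation can be done without tables, as in the sketch below (scipy assumed available; the numbers are those of Example 2.1), using the normal quantile function in place of the table look-up.

from scipy.stats import norm

sigma, t, reliability = 10.0, 100.0, 0.99
# R(t) = 1 - Phi((t - mu)/sigma) = 0.99  =>  (t - mu)/sigma = Phi^{-1}(0.01)
mu = t - sigma * norm.ppf(1 - reliability)
print(mu)          # about 123.3 hours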

Summary
In Section 2, you have learnt to

 apply the normal failure model to solve problems.


Section 3 The Exponential Failure Law

Introduction
One of the most important failure laws is the one whose time to failure is described
by the exponential distribution. The simplest way to obtain it is by supposing that the
failure rate is constant, i.e. Z(t) = λ. This leads to the probability density function
associated with the time to failure T,

f(t) = λ e^{−λt},   t > 0.

Conversely, it is not too difficult to establish that if T ~ exp(λ) then R(t) = e^{−λt} and
hence Z(t) = f(t)/R(t) = λ.

Objectives
By the end of this section, you should be able to

 apply the exponential failure law model to real life problems


Theorem 3.1

Let T , the time to failure , be a continuous random variable assuming all


nonnegative values. Then T has an exponential distribution if and only if it has
constant failure rate.

Activity 3.2
Prove that T ~ exp(λ) if and only if Z(t) = λ.

Example 3.3

If   0.01 and R(t )  0.9 find t , the time to failure.


Solution

Note 0.90 = e^{−0.01t} ⇒ t = −100 ln(0.9) = 10.54 hours. Therefore, if each of 100 components
operates for 10.54 hours, approximately 90 will not fail during that period.

Activity 3.4
Suppose that it cost more to produce an item with a large expected life length than
one with a small life expectancy. Suppose that the cost C of producing an item is
the following function of  , the mean time to failure C  3 2 . Assume that a
profit of D dollars is realized for every hour the item is in service. Find the
maximum expected profit per item.

Summary
You have learnt to

 apply the exponential failure law model to real life problems


Section 4 The Exponential Law and the Poisson Distribution

Introduction
A close connection exists between the exponential failure law and a poisson process.

In this section, I will discuss the connection between the two laws.

Objectives
By the end of this section, you should be able to

 infer the exponential law from the poisson process.

Suppose that failure occurs because of the appearance of certain random disturbances.

These may be caused by external forces, such as a sudden gust of wind or a drop (rise)
in voltage, or by internal causes such as chemical disintegration or mechanical
malfunctioning.

Let X_t be the number of such disturbances occurring during a time interval of

length t, and suppose that {X_t, t ≥ 0} constitutes a Poisson process with rate λ. Suppose that
failure during [0, t] is caused if and only if at least one such disturbance occurs. Let
T be the time to failure, which is assumed to be a continuous random variable. Then,

F(t) = P(T ≤ t) = 1 − P(T > t).

Note that T > t ⟺ no disturbance occurs during [0, t]. Thus,

{T > t} = {X_t = 0}.

Therefore, F(t) = 1 − P(X_t = 0) = 1 − e^{−λt}, which is the c.d.f of the exponential law.

Thus, you can infer that the above cause of failure implies an exponential failure law.

You can generalize the idea described above in two ways.

(i) Suppose that the disturbances appear according to a Poisson process.
Assume furthermore that whenever such a disturbance does appear, there
is a constant probability p that it will not cause failure. Then,

F(t) = 1 − [ e^{−λt} + (λt) e^{−λt} p + (λt)² e^{−λt} p²/2! + ... ] = 1 − e^{−λ(1 − p)t},

since {T > t} = {X_t = 0, or X_t = 1 and no failure resulted, or X_t = 2 and no
failures resulted, or ...}.

(ii) Suppose that the disturbances appear according to a Poisson process.
Assume that failure occurs whenever r (r ≥ 1) or more disturbances occur
during an interval of length t. Therefore, for the time to failure T, you
have

F(t) = 1 − P(T > t) = 1 − Σ_{k=0}^{r−1} (λt)^k e^{−λt} / k!,

by the fact that {T > t} = {X_t ≤ r − 1}.

Summary
After studying Section 3, you have learnt

 to infer the exponential law from the poisson process


 to solve real life problems.


Section 5 The Weibull Failure Law

Introduction
I will look at modifications of the constant failure rate (which leads to the
exponential failure law) and generalize in this section from the exponential failure law to
the Weibull failure law.

Objectives
By the end of this section, you should be able

 to generalize the exponential failure law by modifying the constant failure rate.

Suppose that the failure rate Z associated with T, the life length of an item, has
the following form:

Z(t) = (αβ) t^{β−1},

where α and β are positive constants. Then the p.d.f of T is given as

f(t) = αβ t^{β−1} e^{−α t^β},   t > 0.

A random variable having the p.d.f f above is said to have a Weibull distribution.
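The sketch below (numpy and scipy assumed available; α = 0.5 and β = 2 are arbitrary illustrative values) computes the Weibull reliability R(t) = e^{−αt^β} and hazard Z(t), and checks the mean formula of Activity 5.2 numerically via Theorem 1.4, E(T) = ∫_0^∞ R(t) dt.

import numpy as np
from scipy import integrate
from math import gamma

alpha, beta = 0.5, 2.0
R = lambda t: np.exp(-alpha * t**beta)                 # Weibull reliability
Z = lambda t: alpha * beta * t**(beta - 1)             # Weibull hazard (alpha*beta) t^(beta-1)

mean_numeric, _ = integrate.quad(R, 0, np.inf)         # E(T) by Theorem 1.4
mean_formula = alpha**(-1 / beta) * gamma(1 / beta + 1)
print(mean_numeric, mean_formula)
print(Z(0.5), Z(1.0), Z(2.0))                          # increasing hazard since beta > 1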

Activity 5.1
Sketch the probability density function f , failure rate Z and the failure probability R
for the weibull failure law for   1 and   1, 2,3 .

Activity 5.2
If the random variable T has a Weibull distribution with parameters α and β, show that

E(T) = α^{−1/β} Γ(1/β + 1)

and

V(T) = α^{−2/β} [ Γ(2/β + 1) − ( Γ(1/β + 1) )² ].

Summary
In this section, you have learnt

 to generalize, by modifying the constant rate which lead to the exponential

law.


Section 6 Reliability of Systems

Introduction
Begin the discussion in this section by asking yourself, the question : How can I
evaluate the reliability of a system if I know the reliability of its components ? This
can be very difficult problem and only few simple problem shall be considered herein.

Objectives
After studying this section, you should be able

 to solve few simple problems of reliability of system.

Example 6.1

Suppose that two components are hooked up in series

C1 C2

This means that in order for the above system to work, both components must be
functioning. If, in addition that the components function independently, you may obtain
the reliability of the system, R(t ) , in terms of the reliabilities of the components,
R1 (t ) and R2 (t ) , as follows:

R(t )  P T  t  , where T is the time to failure of the system

= P T1  t  T2  t  , where T1 and T2 are the times to failure of the system of

components C1 and C2 .

= P T1  t  P T2  t   R1 (t ) R2 (t )

Thus, you find that R(t )  min  R1 (t ), R2 (t ) . i.e. a system made up of two independent
components in sries, the reliability of the system is less than the reliability of any of
its parts.


Theorem 6.2

If n components, functioning independently, are connected in series, and if the i-th

component has reliability R_i(t), then the reliability of the entire system, R(t),

is given by

R(t) = Π_{i=1}^{n} R_i(t).

For example, if T_1 ~ exp(λ_1) and T_2 ~ exp(λ_2), then R(t) = R_1(t) R_2(t) = e^{−λ_1 t} e^{−λ_2 t} = e^{−(λ_1 + λ_2)t}.

Hence the p.d.f of the failure time of the system, T, is given by

f(t) = −(d/dt) R(t) = (λ_1 + λ_2) e^{−(λ_1 + λ_2)t}.

Theorem 6.3

If two independently functioning components having exponential failure laws with

parameters λ_1 and λ_2 are connected in series, the failure law of the resulting system
is again exponential, with parameter equal to λ_1 + λ_2.

Activity 6.4
Each of the six tubes of a radio set has a life length (in years) which may be considered
as a random variable. Suppose that these tubes function independently of one another.
What is the probability that no tube will have to be replaced during the first two
months of service if:

(a) The p.d.f of the time to failure is f(t) = 50 t e^{−25t²}, t > 0.

(b) The p.d.f of the time to failure is f(t) = 25 t e^{−25t}, t > 0.

Another important system is a parallel system in which the components are connected
in such a way that the system fails to function only if all the components fail to
function.

C1

C2


If only two components are involved, the system may be depicted as above. Again,
if you assume that the components function independently of each other, the reliability of the
system, R(t), may be expressed in terms of the reliabilities of the components, say
R_1(t) and R_2(t), as follows:

R(t) = P(T > t) = 1 − P(T ≤ t)

     = 1 − P(T_1 ≤ t ∩ T_2 ≤ t)

     = 1 − P(T_1 ≤ t) P(T_2 ≤ t)

     = 1 − [1 − P(T_1 > t)][1 − P(T_2 > t)]

     = 1 − [1 − R_1(t)][1 − R_2(t)]

     = R_1(t) + R_2(t) − R_1(t) R_2(t).

The last form indicates that R(t) ≥ max{ R_1(t), R_2(t) }, i.e. a system composed of two
independently functioning components operating in parallel will be more reliable than
either of the components.

Theorem 6.5

If n components functioning independently are operating in parallel, and if the i th

component has reliability Ri (t ) , then the reliability of the system, say R(t ) is given
by
R(t) = 1 − Π_{k=1}^{n} [ 1 − R_k(t) ].

Example 6.6

Consider two components in parallel, each of whose failure time is exponentially distributed,

with parameters λ_1 and λ_2, respectively. Then

R(t) = R_1(t) + R_2(t) − R_1(t) R_2(t) = e^{−λ_1 t} + e^{−λ_2 t} − e^{−(λ_1 + λ_2)t}.

Thus the p.d.f of the time to failure of the parallel system, T, is given by

f(t) = λ_1 e^{−λ_1 t} + λ_2 e^{−λ_2 t} − (λ_1 + λ_2) e^{−(λ_1 + λ_2)t}.


Activity 6.7
Suppose that three units operated in parallel. Assume that each has the same constant
failure rate   0.01 .

(i) Find the reliability for the period of operation of 10 hours ,

(ii) How much improvement can be obtained by operating three such units in

parallel ?

Lastly, a brief description of the concept of safety factor is given. Suppose that the
stress S applied to a structure is considered as a (continuous) random variable.
Similarly, the resistance of the structure, say R, may also be considered as a
continuous random variable.

Definition 6.8 ( Safety factor)

The safety factor of the structure is defined as the ratio of R to S,

T = R / S.

If R and S are independent random variables with p.d.f's g and h, respectively, then

the p.d.f of T = R/S is given by

f(t) = ∫_0^∞ g(ts) h(s) s ds.

Note that the structure will fail if R < S, that is, if T < 1, and hence you have
that

P(Failure) = ∫_0^1 f(t) dt.

Summary
At the end of Section 4, you have learnt

 to solve few simple problems of reliability of system.


Assignment 5(a)
Each of the six tubes of a radio set has a life length (in years) which may be
considered as a random variable. Suppose that these tubes function independently of
one another. What is the probability that no tubes will have to be replaced during the
first two months of service if:

f (t )  50te25t , t  0 .
2
(i) The p.d.f of the time to failure is
(ii) The p.d.f of the time to failure is f (t )  25te25t , t  0 .

Assignments 5(b)
Suppose that each of three electronic devices has a failure law given by an
exponential distribution with parameters λ_1, λ_2 and λ_3, respectively. Suppose that these three devices
function independently and are connected in parallel to form a single system.

(a) Obtain an expression for R(t ), the reliability of the system.


(b) Obtain an expression for the p.d.f of T , the time to failure of the system.
Sketch the p.d.f.
(c) Find the mean time to failure of the system.

Assignment 5(c)
Suppose that two independently functioning components, each with the same constant
failure rate are connected in parallel. If T is the time to failure of the resulting
system, obtain the moment generating function of T . Also determine E (T ) and
Var (T ) , using the m.g.f.

Unit Summary
As we have discussed, reliability theory is vital for studying the failure of systems.

In this Unit you have learnt,

 about the basic concepts of reliability theory


 to calculate the reliability of a system.
 to compute the hazard function


 about the main laws of failure time.

Unit 6 Convergence of Sequences of Random Variables

Introduction
Statements such as "in the long run" and "on the average" are very common in
statistics, and express our belief that the averages of the results of repeated
experimentation will show less and less random fluctuation as they stabilize to some
limit.

Our discussion will cover topics such as

Section 1 Statistical Inequalities

Section 2 Concentration Inequalities

Section 3 Modes of Convergence

Section 4 The Inversion and Continuity Theorem

Section 5 Limiting Moment Generating Functions

Section 6 Large deviations

Objectives
By the end of this unit, you should be able to

 define the various modes of convergence
 develop criteria for proving convergence
 compute the limiting distribution of sequences of random variables.


Section 1 Statistical Inequalities

Introduction
Computing the exact probability that a random variable X belongs to some set of
interest is not all the time easy. But , a simple bound on these probabilities is often
sufficient for the task in hand. Another important concept that crops up in many areas
of theoretical and applied probability is that of convexity.

Objectives

By the end of this section, you should be able to

 state and prove the main statistical inequalities,


 apply them to obtain bounds on probabilities.

Theorem 1.1 ( Basic inequality)


If h(X) is a non-negative function then, for a > 0,

P( h(X) ≥ a ) ≤ E[ h(X) ] / a.

Proof

Begin by defining the indicator function as follows:

I = 1 if h(X) ≥ a, and I = 0 otherwise.

Now note that you have E(I) = P( h(X) ≥ a ). By its construction, I satisfies

h(X) ≥ aI and hence, you have E[ h(X) ] ≥ a E(I) = a P( h(X) ≥ a ).

Based on Theorem 1.1 you can state the following useful inequalities

Theorem 1.2 ( Markov’s inequality)

Let X be a random variable. Then, for any a > 0,

P( |X| ≥ a ) ≤ E|X| / a.

Theorem 1.3 ( Chebyshev’s inequality)

Let X be a random variable. Then, for any a > 0,

P( |X| ≥ a ) ≤ E(X²) / a².

Corollary 1.4 ( Chebyshev’s inequality)

Let X = Y − E(Y); then for any a > 0, you have

P( |Y − E(Y)| ≥ a ) ≤ E[ (Y − E[Y])² ] / a² = Var(Y) / a².

Theorem 1.5 ( Exponential Chebyshev)

If c > 0, then P( X ≥ a ) ≤ E[ exp( c(X − a) ) ].

Example 1.6

Let X be a random variable such that Var ( X )  0 . Show that X is constant


with probability one.

Solution

By Chebyshev’s inequality, for any integer n ≥ 1,

P( |X − E(X)| ≥ 1/n ) ≤ n² Var(X) = 0.

Hence, defining the events C_n = { |X − E[X]| ≥ 1/n }, you have

P( X ≠ E[X] ) = P( ∪_{n≥1} C_n ) = P( lim_{n→∞} C_n ) = lim_{n→∞} P(C_n) = 0.

Definition1.7 (Convex)

A function g(x) (from R to R) is called convex if, for all a, there exists

λ(a) such that

g(x) ≥ g(a) + λ(a)(x − a),   for all x.

If g(x) is a differentiable function, then a suitable λ is given by λ(a) = g′(a), and

so you have

g(x) ≥ g(a) + g′(a)(x − a).

This says that a convex function lies above all its tangents.


Theorem1.8 (Jensen’s inequality)

Let X be a random variable with finite mean and g(x) a convex function. Then

E[ g(X) ] ≥ g( E[X] ).

Proof

Choose a = E(X) in Definition 1.7; then you have that

g(X) ≥ g(E[X]) + λ( X − E[X] ). Taking expectations of both sides you have that

E[ g(X) ] ≥ g( E[X] ), which proves the theorem.

Example 1.8

g ( x)  x and g ( x)  x 2 are both convex , so you have E  X   E ( X ) and

E  X 2    E ( X )
2

Example 1.9

Let X be a positive random variable. Show that

E(log X )  log E( X ).

Solution

First show that  log x is convex. Note that, by definition , for x  0 ,

1 1 a
1 1 1
 log x   dy   dy   dy , for a  0 ,
x
y a
y x
y

a a
1 1 1
  log a   dy   log a   dy   log a  ( x  a),
x
y x
a a

and this defines a convex function with  (a)  a 1 .

Activity 1.10 ( Arithmetic-Geometric means inequality)


Let x_1, ..., x_n be any collection of positive numbers, and p_1, ..., p_n any collection
of positive numbers such that p_1 + ... + p_n = 1. Show that

p_1 x_1 + ... + p_n x_n ≥ x_1^{p_1} ... x_n^{p_n}.

Activity 1.11 ( AM/GM)

In the special case when p_i = 1/n, i = 1, 2, 3, ..., n, the arithmetic-geometric mean
inequality takes the form

(1/n)( x_1 + ... + x_n ) ≥ ( x_1 ... x_n )^{1/n}.
n

Example 1.12(Guesswork)

Suppose you are trying to guess the value of a proper integer-valued random
variable X, with probability mass function f(x). If you underestimate by y, it
will cost you GHs yb; if you overestimate by y, it will cost you GHs ya. If your
guess must be an integer, find the guess which minimises your expected loss.

Solution

If you guess t, then your expected loss is

L(t) = a Σ_{x < t} (t − x) f(x) + b Σ_{x > t} (x − t) f(x).

Substituting t − 1 for t gives an expression for L(t − 1), and subtracting this from
L(t) gives

D(t) = L(t) − L(t − 1) = a Σ_{x < t} f(x) − b Σ_{x ≥ t} f(x) = a P(X < t) − b P(X ≥ t).

Note that D(t) → −b as t → −∞, D(t) → a as t → ∞, and D(t) is non-decreasing in t.
Hence L is minimised at the smallest t for which D(t + 1) = aF(t) − b(1 − F(t)) > 0.

This is the guess which minimises your expected loss, i.e.

t_min = min{ t : F(t) > b/(a + b) }.

Activity 1.13
Suppose that if you underestimate X you incur a fixed loss GHs b , whereas if
you overestimate X by y it will cost you GHs ay .


(a) Find an expression that determines the guess which minimises your
expected loss.

(b) Find this guess when

(i)   P(X = x) = (1/2)^x,         x = 1, 2, 3, ....

(ii)  P(X = x) = 1/[ x(x + 1) ],  x = 1, 2, 3, ....

(iii) P(X = x) = 1/(2n + 1),      x = −n, −(n − 1), ...., n.

Activity 1.14
What is your best guess if

(a) L(t) = E| X − t |,

(b) L(t) = E[ (X − t)² ] ?
Activity 1.15
Let X1 , , X100 be iid random variables with mean 75 and variance 225. Use
Chebyshev’s inequality to calculate the probability that the sample mean will not
differ from the population mean by more than 6. Then use the central limit theorem
to calculate the same probability and compare your results.

Summary
After studying this section, you have learnt

 how to prove the main statistical inequalities,


 to apply statistical inequalities to obtain bounds on probabilities.


Section 2 Concentration Inequalities

Introduction
Concentration inequalities deal with deviations of functions of independent random
variables from their expectation. The laws of large numbers of classical probability
theory state that sums of independent random variables are, under mild conditions,
close to their expectation with a large probability. Such sums are the most basic
examples of random variables concentrated around their mean. More recent results
reveal that such a behaviour is shared by a large class of general functions of
independent random variables.

Objectives
By the end of this section, you should be able to

 state some known concentration inequalities,


 apply them to obtain bounds on probabilities.

Review
Recall from Unit 5 that, for any nonnegative random variable X, you have

E(X) = ∫_0^∞ P(X > t) dt.

This implies Markov’s inequality: for any nonnegative random variable X and
t > 0,

P( X ≥ t ) ≤ E(X) / t.

It follows from Markov’s inequality that if φ is a strictly monotonically increasing

nonnegative-valued function, then

P( X ≥ t ) = P( φ(X) ≥ φ(t) ) ≤ E[ φ(X) ] / φ(t).


If this inequality is applied to φ(x) = x², you have Chebyshev’s inequality: if

X is an arbitrary random variable and t > 0, then

P( |X − E(X)| ≥ t ) = P( (X − E(X))² ≥ t² ) ≤ Var(X) / t².

More generally, if you take φ(x) = x^q (x ≥ 0), then for any q > 0 you have

P( |X − E(X)| ≥ t ) ≤ E[ |X − E(X)|^q ] / t^q.

You may choose the value of q that optimizes the upper bound. Such moment
bounds often provide very sharp estimates of the tail probabilities. If you take

φ(x) = e^{sx}, where s is an arbitrary positive number, then for any random variable X

and any t > 0, you have

P( X ≥ t ) = P( e^{sX} ≥ e^{st} ) ≤ E( e^{sX} ) / e^{st}.

In Chernoff’s method, you find an s > 0 that minimizes the upper bound or makes
the upper bound small.

Let X_1, ..., X_n be independent real-valued random variables and write S_n = Σ_{i=1}^{n} X_i;

then Chebyshev’s inequality and independence imply

P( |S_n − E(S_n)| ≥ t ) ≤ Var(S_n) / t² = Σ_{i=1}^{n} Var(X_i) / t².

Thus, if you take σ² = (1/n) Σ_{i=1}^{n} Var(X_i), then you have

P( | (1/n) Σ_{i=1}^{n} ( X_i − E(X_i) ) | ≥ ε ) ≤ σ² / (nε²).

Therefore, the Chernoff bound becomes

P( S_n − E(S_n) ≥ t ) ≤ e^{−st} E[ exp( s Σ_{i=1}^{n} ( X_i − E(X_i) ) ) ]

                     = e^{−st} Π_{i=1}^{n} E[ e^{s( X_i − E(X_i) )} ]   (by independence) ……….. (1)

Now the problem of finding tight bounds simplifies to finding a good upper bound
for the moment generating function of the random variables X_i − E(X_i). For bounded
random variables the most elegant way of doing this is due to Hoeffding.

Lemma 2.1 (Hoeffding’s inequality)

Let X_1, ..., X_n be independent bounded random variables such that X_i ∈ [a_i, b_i], for

all i = 1, 2, 3, ..., n, with probability one. Then, for any t > 0, we have

P( S_n − E(S_n) ≥ t ) ≤ exp( −2t² / Σ_{i=1}^{n} (b_i − a_i)² )

and

P( S_n − E(S_n) ≤ −t ) ≤ exp( −2t² / Σ_{i=1}^{n} (b_i − a_i)² ).

For proof of Lemma 2.1 refer to Hoeffding (1963).
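The sketch below illustrates Hoeffding's inequality by simulation (numpy assumed available; n = 100 fair Bernoulli summands, so a_i = 0 and b_i = 1, and t = 10 are arbitrary choices). The simulated tail probability should lie below the Hoeffding bound exp(−2t²/n).

import numpy as np

rng = np.random.default_rng(2)
n, t, reps = 100, 10, 200_000
S = rng.binomial(n, 0.5, reps)                 # S_n for Bernoulli(1/2) summands, E(S_n) = n/2
tail = np.mean(S - n * 0.5 >= t)               # simulated P(S_n - E(S_n) >= t)
bound = np.exp(-2 * t**2 / n)                  # Hoeffding bound with b_i - a_i = 1
print(tail, bound)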

Theorem 2.2 (Bennett’s inequality)

Let X_1, ..., X_n be independent real-valued random variables with zero mean,

and assume that |X_i| ≤ 1 with probability one.

Let σ² = (1/n) Σ_{i=1}^{n} Var(X_i). Then, for any t > 0,

P( Σ_{i=1}^{n} X_i > t ) ≤ exp( −nσ² h( t/(nσ²) ) ),

where h(u) = (1 + u) log(1 + u) − u, for u ≥ 0.

The proof of Theorem 2.2, which has been omitted from this module, is based on

minimising the moment-generating-function bound of inequality (1).

The main message of Theorem 2.2 is perhaps best seen if you do some further
bounding. Applying the elementary inequality h(u) ≥ u² / (2 + 2u/3), u ≥ 0 (which

may be seen by comparing the derivatives of both sides), you obtain a classical
inequality of Bernstein (1946).

Theorem 2.3 ( Bernstein’s inequality)

Let X_1, ..., X_n be independent real-valued random variables with zero mean,

and assume that |X_i| ≤ 1 with probability one. Then, for any ε > 0,

P( (1/n) Σ_{i=1}^{n} X_i > ε ) ≤ exp( − nε² / ( 2(σ² + ε/3) ) ).

Activity 2.4

Let S n be the sum of independent bernoulli random variables X1 , X 2 , X 3 ,.... X n each


1
with probability pn  . Show that lim P  Sn  1  n   0.
n n

Summary

In this section, you have learnt to

 state some known concentration inequalities,


 apply concentration inequalities to obtain bounds on probabilities.


Section 3 Some Modes of Convergence

Introduction
This section will enlighten you on some modes of convergence often encountered in
probability.

Objectives

After studying this section, you should be able to

 define the various modes of convergence,
 state the laws of large numbers,
 apply these concepts to prove convergence of statistics to population parameters.

Definition 3.1 (Convergence with probability one)


A sequence of r.v.s Yn , n  1, 2,  converges to a constant c . with probability one if

P lim Yn  c   1 .
 n 
Note that, more generally, c can be replaced with a random variable Y .

Definition 3.2 (Convergence in Probability or Stochastic Convergence )


A sequence of r.v.s Yn , n  1, 2,  converges in probability to a constant c if for every

  0 , lim P  Yn  c     1 or lim P  Yn  c    and we write Ya 


p
c .
n n

Again note , more generally, c can be replaced with a r.v. Y, also it can be shown that shown
that convergence with probability one implies convergence in Probability

Example 3.3
Let  X n , n  1, 2,  be a sequence of discrete random variables with p.m.f.

1 1
P  X n  1  ; P  x  0   1 
n n
Show that X n 
p
0 .


Solution
For  0, P  X n  0   P  X n  since  n  0

1
=
n

:. lim P  X n  0   lim


1
 0  X n 
p
0
n  n  n

Theorem 3.4 (The Weak Law of Large Numbers)


Let X_i, i = 1, 2, ..., n, be a sequence of independent and identically distributed r.v.s, each with a

finite variance σ². Let E(X_i) = μ, i = 1, 2, ..., n. Then for the sequence X̄_n, n = 1, 2, ..., where

X̄_n = (1/n) Σ_{i=1}^{n} X_i, we have lim_{n→∞} P( |X̄_n − μ| ≥ ε ) = 0 for every ε > 0, which implies X̄_n →p μ.

Proof

E(X̄_n) = μ and the variance of X̄_n is σ²/n. It then follows from Chebyshev’s inequality that for

every ε > 0,   0 ≤ P( |X̄_n − μ| ≥ ε ) ≤ σ²/(nε²), so you have that

0 ≤ lim_{n→∞} P( |X̄_n − μ| ≥ ε ) ≤ 0

⇒ lim_{n→∞} P( |X̄_n − μ| ≥ ε ) = 0, by the sandwich theorem,

⇒ X̄_n →p μ.

Theorem 3.5 [Bernoulli form of the Weak Law of Large Numbers]

Let E be an experiment and let A be an event associated with E. Consider n independent
repetitions of E, and let n_A be the number of times the event A occurs among the n repetitions. Let

f_A = n_A / n.

Let P(A) = p (which is assumed to be the same for all repetitions).

Then for any ε > 0,

P( |f_A − p| ≥ ε ) ≤ p(1 − p)/(nε²),   or equivalently

P( |f_A − p| < ε ) ≥ 1 − p(1 − p)/(nε²)

⇒ f_A →p p.

Sketch of Proof

E(f_A) = E(n_A)/n = p  and  Var(f_A) = (1/n²) Var(n_A) = p(1 − p)/n,  since n_A ~ B(n, p).

Now apply Chebyshev’s inequality to obtain the result.

Definition 3.6 [Convergence in Quadratic Mean or Convergence in Mean Square]

A sequence of r.v.s Y_n, n = 1, 2, ..., converges to a constant c in mean square (or quadratic

mean) if E(Y_n²) exists and lim_{n→∞} E[ (Y_n − c)² ] = 0, and as notation we write Y_n →2 c.

Theorem 3.7

Y_n →2 c  ⇒  Y_n →p c.

Proof omitted; it follows easily using Chebyshev’s inequality. Note that the converse is
not true in general,

i.e.  Y_n →p c  does not imply  Y_n →2 c  in general.

Example 3.8
Let  X n , n  1, 2,  be a sequence of discrete r.v.s having p.m.f

1 1
P  X n  0  1  and P  X n  1  for n  1, 2, respectively. Show that
n n
X n 
2
0
Solution
 1 1 1
E  X n2   02 1    12   
 n n n

E  X n  0    E  X n2  
2 1
Therefore,
  n

lim E  X n      lim E  X n2   lim  0 ,


2 1
Hence,
n    n n  n

 X a 
2
 0  X a 
P
0


Theorem 3.9
Suppose that X_1, X_2, ..., X_n is a random sample drawn on a r.v. X with all of its moments

E(X^j) finite, where j is a positive integer. Then  (1/n) Σ_{i=1}^{n} X_i^j →p E(X^j).

Proof

E[ (1/n) Σ_{i=1}^{n} X_i^j ] = (1/n) Σ_{i=1}^{n} E(X_i^j) = E(X^j).

Also,

E[ ( (1/n) Σ_{i=1}^{n} X_i^j − E(X^j) )² ] = Var( (1/n) Σ_{i=1}^{n} X_i^j )

                                         = (1/n²) Σ_{i=1}^{n} Var(X^j)

                                         = Var(X^j)/n → 0   as n → ∞,

since Var(X^j) is a finite constant. Hence

(1/n) Σ_{i=1}^{n} X_i^j →2 E(X^j)   ⇒   (1/n) Σ_{i=1}^{n} X_i^j →p E(X^j).

Comments
(1) Mean square error consistency in estimation is equivalent to convergence in mean
square, while consistency (that is, simple or weak consistency) in estimation is
equivalent to convergence in probability.

(2) (1/n) Σ_{i=1}^{n} X_i^j is the j-th sample moment about 0, and E(X^j) is the j-th population moment

about 0. Thus, the preceding theorem says that the j-th sample moment is a simple

consistent estimator of the j-th population moment.


Theorem 3.10
Let f  x, y  be a function of the variables and let  X a  and Yn  be two sequence of random

variables such that X n 


r
 a and Yn 
P
 b where a and b are constants. If f is continuous

at (a, b) and f  X a , Ya  is a r.v. n , then f  X n , Yn  


P
 f  a, b 

Proof
We have to show that

 
P f  X n , Yn   f  a, b    1 as n   the f is continuous at the point  a, b  for

every  0   0

Such that whenever x  a   then this implies that f  x, y   f  a, b  

Now for a fixed n, let


An   X n  a    w  X n  w  a   

Bn   Yn  b    w  Yn  w  a   
  
Cn  f  X nYn   f  a, b    w  f  X n  w , Yn  w    f  a, b   
Then An  Bn  Cn   An  Bn   Cnc
c

 Anc  Bnc  Cnc

 P  Cnc   P  Anc   P  Bnc  where Anc denotes the complement of An . Therefore

P  f  X n , Yn   f  a, b    P  X n  n   t   P  Yn  b  t 

The probabilities on the right hand side go to zero as n   . Hence

 
P f  X n , Yn   f  a, b    1 as n  

 f  X n , Yn  
P
 f  a, b 

Corollary 3.11
If Z_n →p a and W_n →p b, then

(i)  Z_n + W_n →p a + b  (and similarly Z_n − W_n →p a − b),

(ii) Z_n W_n →p ab.

Proof: set f(Z_n, W_n) = Z_n ± W_n and f(Z_n, W_n) = Z_n W_n in the preceding theorem.


Corollary 3.12
Any polynomial in the sample moments is a simple consistent estimator of the same polynomial in
the population moments, provided the population moments exist.
Corollary 3.13
The k-th sample central moment (1/n) Σ_{i=1}^{n} (X_i − X̄_n)^k is a consistent estimator of the k-th

population central moment E[ (X − μ)^k ], provided E(X^k) exists, where X̄_n is the sample mean

based on a random sample of size n and E(X) = μ.

Proof
Expanding by the Binomial Theorem,

(1/n) Σ_{i=1}^{n} (X_i − X̄_n)^k = (1/n) Σ_{i=1}^{n} Σ_{j=0}^{k} C(k, j) X_i^j (−X̄_n)^{k−j}

                              = Σ_{j=0}^{k} C(k, j) (−1)^{k−j} X̄_n^{k−j} [ (1/n) Σ_{i=1}^{n} X_i^j ],

which tends in probability to

Σ_{j=0}^{k} C(k, j) (−1)^{k−j} μ^{k−j} E(X^j) = E[ (X − μ)^k ],

from a generalization of the preceding theorem, since

X̄_n^{k−j} →p μ^{k−j}   and   (1/n) Σ_{i=1}^{n} X_i^j →p E(X^j),

i.e.

(1/n) Σ_{i=1}^{n} (X_i − X̄_n)^k →p E[ (X − μ)^k ].

Note. From Corollary 3.13 we have that if X is a random variable with variance σ², then the
statistic

S_n² = ( 1/(n − 1) ) Σ_{i=1}^{n} (X_i − X̄_n)²

is a simple consistent estimator of the parameter σ² = Var(X).

Proof: from Corollary 3.13,

σ̂_n² = (1/n) Σ_{i=1}^{n} (X_i − X̄_n)² →p E[ (X − μ)² ] = σ²,

and S_n² = ( n/(n − 1) ) σ̂_n² →p σ²,  since lim_{n→∞} n/(n − 1) = 1.

Comment:
The relevance of the preceding results in practical estimation problems is as follows. These
results show that the method of moments will usually produce consistent estimators, which,
together with their ease of calculation, compensates for their occasional inefficiency and
possible bias.

Definition 3.14 (Convergence in distribution or convergence in Law)


Let the c.d.f F_n(y) of the random variable Y_n depend upon n, a positive integer. If F(y) is a

c.d.f, and if lim_{n→∞} F_n(y) = F(y) for every point y at which F(y) is continuous, then the r.v.

Y_n is said to converge in law (or in distribution) to Y, where F(y) is the c.d.f of Y.

Alternatively, we say that Y_n has a limiting distribution with c.d.f F(y), or that F_n(y)

converges weakly to F(y) as n → ∞.

Comment:
It can be seen that if Y_n converges to Y in probability, then Y_n converges to Y in
distribution. Thus convergence in distribution is the weakest form of convergence.

Example 3.15
Let Y_n have p.d.f

f_n(y) = n y^{n−1} / θ^n,  0 < y < θ;   0, otherwise.

Find (i) the c.d.f of Y_n;

(ii) the limiting distribution of Y_n.

Solution

The (cumulative) distribution function of Y_n is

F_n(y) = 0,                               if y ≤ 0,

       = ∫_0^y n w^{n−1}/θ^n dw = (y/θ)^n,  if 0 < y < θ,

       = 1,                               if y ≥ θ.

Then

lim_{n→∞} F_n(y) = 0 if y < θ,  and  1 if y ≥ θ,

i.e. F(y) = 0 if y < θ; 1 if y ≥ θ

is a distribution function; moreover lim_{n→∞} F_n(y) = F(y) at each point of continuity of F(y).

Therefore, by definition, Y_n has a limiting distribution with c.d.f F(y), which represents a

degenerate distribution (all of its mass at the single point y = θ).

Comment
(1) A distribution of the discrete type which has probability 1 at a single point is called
a degenerate distribution. More precisely, a random variable Y is degenerate at c if
P(Y = c) = 1. In this case the cumulative distribution function of Y is

F(y) = 0 if y < c;  1 if y ≥ c.

(2) Sometimes a limiting distribution may
(i)   exist and be degenerate,
(ii)  exist and not be degenerate,
(iii) not exist at all.

Example 3.16
Suppose that X_n has probability density function

f_n(x) = n e^{−n(x − θ)},   x > θ

(θ a positive constant). Find the limiting distribution of X_n.

Hence find the limiting moment generating function of Z_n = n(X_n − θ) and use it to

determine the limiting distribution of Z_n.

Solution:
The moment generating function of X_n is given by

M_n(t) = E( e^{tX_n} ) = ∫_θ^∞ e^{tx} n e^{−n(x − θ)} dx.

Now let y = x − θ; then you have that

M_n(t) = ∫_0^∞ n e^{−ny + t(y + θ)} dy

       = n e^{tθ} ∫_0^∞ e^{−(n − t)y} dy

       = n e^{tθ} [ −e^{−(n − t)y} / (n − t) ]_0^∞

       = e^{tθ} n/(n − t),   for t < n.

Thus X_n has a limiting moment generating function M(t) = lim_{n→∞} M_n(t) = e^{tθ}, for all real

t. By the uniqueness property of moment generating functions this is the moment

generating function of a discrete random variable that takes the value θ with probability 1.
Thus X_n has a limiting distribution which is degenerate at x = θ.

Now the m.g.f of Z_n is

M_{Z_n}(t) = E( e^{tZ_n} ) = E( e^{tn(X_n − θ)} )

           = e^{−nθt} M_n(tn)

           = e^{−nθt} e^{nθt} [ n/(n − nt) ]

           = (1 − t)^{−1},   for t < 1.

lim_{n→∞} M_{Z_n}(t) = (1 − t)^{−1} for t < 1, which is the m.g.f of an exponential distribution with

parameter 1. Therefore, by the uniqueness property of m.g.f.s, Z_n has a limiting distribution

which is exponential with mean (or parameter, in this case) 1.


Summary
After studying Section 3, you have learnt to
 define the various modes of convergence,
 state the laws of large numbers,
 apply these concepts to prove convergence of statistics to population parameters.

Section 4 The Inversion and Continuity Theorem

Introduction
In this section you will learn about two uses the characteristic functions. The first of
these states that the distribution of a random variable is specified by its
characteristic functions. The second, there is a formula which tells us how to
recapture the distribution function, say F , corresponding to a characteristic function  .

Objectives
When you have worked through this section, you should be able to

 use uniqueness property to identify distribution of random variables,


 recapture the distribution function corresponding to a characteristic function.

Theorem 4.1

If X is continuous with density function f and characteristic function φ, then

f(x) = (1/2π) ∫_{−∞}^{∞} e^{−itx} φ(t) dt

at every point x at which f is differentiable.

Proof.

This is the Fourier inversion theorem and can be found in any introduction to
Fourier transforms.

A sufficient, but not necessary, condition that a function φ be the

characteristic function of a continuous random variable is that

∫_{−∞}^{∞} |φ(t)| dt < ∞.

Theorem 4.2 (Inversion Theorem)

Let X have cumulative distribution function F and characteristic function φ. Define F̄ : R → [0, 1] by

F̄(x) = (1/2) [ F(x) + lim_{y↑x} F(y) ].

Then, you have that

F̄(b) − F̄(a) = lim_{N→∞} ∫_{−N}^{N} [ (e^{−iat} − e^{−ibt}) / (2πit) ] φ(t) dt.

For the proof, refer to Kingman and Taylor (1966).

Theorem 4.3 (Continuity Theorem)

Suppose that F_1, F_2, F_3, ... is a sequence of distribution functions with corresponding

characteristic functions φ_1, φ_2, φ_3, ...

(a) If F_n → F for some distribution function F with characteristic function

φ, then φ_n(t) → φ(t) for all t.

(b) Conversely, if φ(t) = lim_{n→∞} φ_n(t) exists and is continuous at t = 0, then φ is the

characteristic function of some distribution F, and F_n → F.

Example 4.4 (Stirling’s Formula)

It states that n! ~ n^n e^{−n} √(2πn) as n → ∞, i.e.

n! / ( n^n e^{−n} √(2πn) ) → 1   as n → ∞.

Proof

Let Y be a random variable with the Γ(1, t) distribution (i.e. gamma with shape parameter t and

rate 1, so that E(Y) = Var(Y) = t). Then X = (Y − t)/√t has density function

f_t(x) = √t (t + x√t)^{t−1} exp( −(x√t + t) ) / Γ(t),   −√t < x < ∞,

and characteristic function

φ_t(u) = E( e^{iuX} ) = exp(−iu√t) ( 1 − iu/√t )^{−t}.

Note that f_t(x) is differentiable with respect to x on (−√t, ∞). Apply Theorem 4.1

at x = 0 to obtain

f_t(0) = (1/2π) ∫_{−∞}^{∞} φ_t(u) du.

But f_t(0) = t^{t − 1/2} e^{−t} / Γ(t), and

φ_t(u) = exp( −iu√t − t log(1 − iu/√t) ) = exp( −iu√t + iu√t − u²/2 + o(u³ t^{−1/2}) )

       = exp( −(1/2)u² + o(u³ t^{−1/2}) ) → e^{−u²/2}   as t → ∞.

Taking the limit as t → ∞, you find that

lim_{t→∞} [ t^{t − 1/2} e^{−t} / Γ(t) ] = (1/2π) ∫_{−∞}^{∞} lim_{t→∞} φ_t(u) du = (1/2π) ∫_{−∞}^{∞} e^{−u²/2} du = 1/√(2π),

that is, Γ(t) ~ t^{t − 1/2} e^{−t} √(2π); putting t = n + 1 and using Γ(n + 1) = n! gives Stirling’s formula.
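A quick numerical check of Stirling's formula is sketched below (standard library only); the printed ratio n!/(n^n e^{−n} √(2πn)) approaches 1 as n grows.

import math

for n in (5, 10, 20, 50):
    approx = n**n * math.exp(-n) * math.sqrt(2 * math.pi * n)
    print(n, math.factorial(n) / approx)      # ratio tends to 1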

Summary
At the end of this section, you have learnt to

 use uniqueness property of characteristic functions to identify distribution of


random variables,
 recapture the distribution function corresponding to a characteristic function.


Section 5 Limiting Moment Generating Functions


Introduction
Welcome to Section 5 of Unit 6. I will discuss in this section one more use of the moment
generating function: finding the limiting distribution of a sequence of random variables via
its limiting m.g.f.

Objectives
By completing this section, you should be able to
 prove the central limit theorem,
 find the limiting distribution of sequences of random variables.

Theorem 5.1 (The Central Limit Theorem)


Let X1 , X 2 , X n be a random sample form a distribution X that has mean  and (positive)

variance  2 . Then, the random variable


 n 
Yn   X i  n  n   n  xn  N  
 i 1 
has limiting distribution that is normal with mean 0 and variance 1.
Assume the existence of the m.g.f
M X (t )  E  etX  , h  t  h , of the distribution for some n  0 .

Then the function

M(t) = E\left[ e^{t(X - \mu)} \right] = e^{-\mu t}\, E\left[ e^{tX} \right] = e^{-\mu t} M_X(t)

also exists for -h < t < h, and M(t) is the m.g.f. of X - \mu. Consequently M(0) = 1 and
M'(0) = E(X - \mu) = 0. Also, you have

M''(0) = E\left[ (X - \mu)^2 \right] = \mathrm{Var}(X) = \sigma^2.

By Taylor's theorem, there exists a number \xi between 0 and t such that

M(t) = M(0) + M'(0)\, t + \frac{M''(\xi)\, t^2}{2!}
     = 1 + \frac{M''(\xi)\, t^2}{2}
     = 1 + \frac{\sigma^2 t^2}{2} + \frac{\left[ M''(\xi) - \sigma^2 \right] t^2}{2}.
Now consider M_n(t), where

M_n(t) = E\left[ \exp\left( t\, \frac{\sum_{i=1}^{n} X_i - n\mu}{\sigma\sqrt{n}} \right) \right]
       = E\left[ \prod_{i=1}^{n} \exp\left( \frac{t\,(X_i - \mu)}{\sigma\sqrt{n}} \right) \right]
       = \prod_{i=1}^{n} E\left[ \exp\left( \frac{t\,(X_i - \mu)}{\sigma\sqrt{n}} \right) \right], \quad \text{since the } X_i\text{'s are independent,}
       = \left\{ E\left[ \exp\left( \frac{t\,(X_1 - \mu)}{\sigma\sqrt{n}} \right) \right] \right\}^{n}, \quad \text{since the } X_i\text{'s are identically distributed,}
       = \left[ M\!\left( \frac{t}{\sigma\sqrt{n}} \right) \right]^{n}, \quad -h < \frac{t}{\sigma\sqrt{n}} < h, \; h > 0,
       = \left[ 1 + \frac{t^2}{2n} + \frac{\left[ M''(\xi) - \sigma^2 \right] t^2}{2n\sigma^2} \right]^{n}, \quad \text{from the previous expansion,}


where now \xi lies between 0 and t/(\sigma\sqrt{n}), with \left| t/(\sigma\sqrt{n}) \right| < h. Since M''(t) is continuous at
t = 0, and \xi \to 0 as n \to \infty because 0 < |\xi| < |t|/(\sigma\sqrt{n}), we have that

\lim_{n \to \infty} \left[ M''(\xi) - \sigma^2 \right] = 0,

and hence

\lim_{n \to \infty} M_n(t) = e^{t^2/2}

by Lemma 5.4 below. Therefore, by the uniqueness property of the m.g.f., the random variable
Y_n = \sqrt{n}\,(\bar{X}_n - \mu)/\sigma has a limiting standard normal distribution, i.e. Y_n \to N(0,1).
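If you want to see the central limit theorem empirically, the following simulation sketch (assuming NumPy and SciPy are available; the choice of exponential(1) summands, n = 100 and 50 000 replications is mine) compares the empirical distribution of Y_n with the standard normal c.d.f.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 100, 50_000
mu, sigma = 1.0, 1.0                               # mean and s.d. of Exponential(1)

samples = rng.exponential(scale=1.0, size=(reps, n))
Yn = (samples.sum(axis=1) - n * mu) / (sigma * np.sqrt(n))

for y in [-1.0, 0.0, 1.0, 1.96]:
    print(y, np.mean(Yn <= y), stats.norm.cdf(y))  # empirical vs N(0,1) c.d.f.
```

Even though the summands are quite skewed, the empirical proportions are already close to the normal probabilities at n = 100.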

Example 5.2
Let X_n denote the sample mean corresponding to a random sample of size n drawn on a
random variable X with the normal distribution N(0,1). Find the limiting distribution of X_n.
Solution
X_n has the normal distribution N(0, 1/n). The distribution function of X_n is

F_n(x) = \int_{-\infty}^{x} \frac{\sqrt{n}}{\sqrt{2\pi}}\, e^{-n w^2 / 2} \, dw.

If the change of variable v = \sqrt{n}\, w is made, you have the function

F_n(x) = \int_{-\infty}^{\sqrt{n}\, x} \frac{1}{\sqrt{2\pi}}\, e^{-v^2 / 2} \, dv.

Clearly,

\lim_{n \to \infty} F_n(x) = \begin{cases} 0 & \text{if } x < 0 \\ \tfrac{1}{2} & \text{if } x = 0 \\ 1 & \text{if } x > 0. \end{cases}

Now the function

F(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \geq 0 \end{cases}

is a cumulative distribution function, and \lim_{n \to \infty} F_n(x) = F(x) at every point of continuity of F(x).


To be sure, \lim_{n \to \infty} F_n(0) = \tfrac{1}{2} \neq F(0), but F(x) is not continuous at x = 0. Thus, by definition,
X_n has a limiting distribution with c.d.f. F(x). Moreover, the limiting distribution is, as before,
degenerate at x = 0, which implies that X_n converges to 0 in probability as n \to \infty.
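Since F_n(x) = \Phi(\sqrt{n}\,x), you can watch this convergence directly. A short sketch (SciPy assumed; the evaluation points are arbitrary choices):

```python
import numpy as np
from scipy import stats

for n in [1, 10, 100, 10000]:
    # F_n(x) = Phi(sqrt(n) * x), evaluated at a negative, zero and positive point.
    print(n, [round(stats.norm.cdf(np.sqrt(n) * x), 4) for x in (-0.5, 0.0, 0.5)])
```

The values tend to 0, 1/2 and 1 respectively, exactly as computed above.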

Example 5.3
Let X n have probability density function

f_n(x) = \begin{cases} 1 & \text{if } x = 2 + \tfrac{1}{n} \\ 0 & \text{elsewhere.} \end{cases}

Find the c.d.f. of X_n. Hence find the limiting distribution of X_n. Also find \lim_{n \to \infty} f_n(x) and
comment on your result against the background of the limiting distribution of X_n.

Solution
F_n(x) = \begin{cases} 0, & x < 2 + 1/n \\ 1, & x \geq 2 + 1/n, \end{cases}

and

\lim_{n \to \infty} F_n(x) = \begin{cases} 0 & \text{if } x \leq 2 \\ 1 & \text{if } x > 2. \end{cases}

The function

F(x) = \begin{cases} 0 & \text{if } x < 2 \\ 1 & \text{if } x \geq 2 \end{cases}

is a cumulative distribution function, and since F_n(x) \to F(x) at all points of continuity of F(x),
there is a limiting distribution of X_n with c.d.f. F(x). Clearly \lim_{n \to \infty} f_n(x) = 0 for all x. This might
suggest that X_n has no limiting distribution, which is false. It shows that the limiting distribution
of X_n exists but cannot, in general, be determined by taking the limit of the probability density
function.

Lemma 5.4

\lim_{n \to \infty} \left[ 1 + \frac{b}{n} + \frac{\psi(n)}{n} \right]^{cn} = e^{bc},

where b and c do not depend on n and where \lim_{n \to \infty} \psi(n) = 0.

Proof
This follows by applying L'Hospital's rule (or Taylor's expansion of the logarithm). Let

H_n = \left[ 1 + \frac{b}{n} + \frac{\psi(n)}{n} \right]^{cn}.

Then

\ln H_n = \frac{\ln\left( 1 + \dfrac{b + \psi(n)}{n} \right)}{1/(cn)},

and L'Hospital's rule (treating n as a continuous variable) gives \lim_{n \to \infty} \ln H_n = bc, so that

\lim_{n \to \infty} H_n = e^{bc}.
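A numerical illustration of the lemma (plain Python; the particular values b = -1.5, c = 2 and the sequence \psi(n) = 1/\sqrt{n} are arbitrary choices):

```python
import math

b, c = -1.5, 2.0
psi = lambda n: 1.0 / math.sqrt(n)     # any sequence with psi(n) -> 0 will do

for n in [10, 100, 1000, 100000]:
    value = (1 + b / n + psi(n) / n) ** (c * n)
    print(n, value, math.exp(b * c))   # the two columns agree for large n
```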

Example 5.5
Suppose that Z_n has probability density function

h_n(z) = \begin{cases} \dfrac{n\,(n\theta - z)^{n-1}}{(n\theta)^{n}}, & 0 < z < n\theta \\[4pt] 0, & \text{elsewhere.} \end{cases}

Show that Z n has a limiting distribution.

Solution
The cumulative distribution function of Z_n is

G_n(z) = \begin{cases} 0, & z < 0 \\[2pt] \displaystyle\int_{0}^{z} \frac{n\,(n\theta - w)^{n-1}}{(n\theta)^{n}} \, dw = 1 - \left( 1 - \frac{z}{n\theta} \right)^{n}, & 0 \leq z < n\theta \\[4pt] 1, & z \geq n\theta. \end{cases}

Hence, by Lemma 5.4,

\lim_{n \to \infty} G_n(z) = \begin{cases} 0, & z \leq 0 \\ 1 - e^{-z/\theta}, & z > 0. \end{cases}

Now

G(z) = \begin{cases} 0, & z < 0 \\ 1 - e^{-z/\theta}, & z \geq 0 \end{cases}

is a c.d.f. that is continuous everywhere, and \lim_{n \to \infty} G_n(z) = G(z) at all points. Therefore Z_n has a
limiting (non-degenerate) distribution with cumulative distribution function G(z), which is
exponential with mean \theta.
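You can confirm the convergence of G_n to the exponential c.d.f. numerically. A minimal sketch (NumPy assumed; the value \theta = 2 and the small grid of z values are chosen only for illustration):

```python
import numpy as np

theta = 2.0
z = np.linspace(0.0, 10.0, 6)
limit_cdf = 1 - np.exp(-z / theta)                 # G(z) for z >= 0

for n in [5, 50, 500]:
    Gn = np.where(z < n * theta, 1 - (1 - z / (n * theta)) ** n, 1.0)
    print(n, np.max(np.abs(Gn - limit_cdf)))       # maximum discrepancy on the grid
```

The maximum discrepancy decreases roughly like 1/n, in line with the limit obtained above.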


Finding the limiting distribution of a r.v. X_n by using the definition of a limiting cumulative
distribution function can be laborious. It is often most appropriate to use the characteristic
function in problems of limiting distributions, but in several cases it is more convenient to use
the m.g.f., as the following theorem shows.

Theorem 5.6
Let the r.v. Y_n have distribution function F_n(y) and m.g.f. M_n(t) that exists for |t| < h for all n.
If there exists a c.d.f. F(y), with corresponding m.g.f. M(t) defined for |t| \leq h_1 \leq h for some
h_1 > 0, such that

\lim_{n \to \infty} M_n(t) = M(t),

then Y_n has a limiting distribution with cumulative distribution function F(y).

Example 5.7
Let Yn have a binomial distribution with parameters n and p. Find the limiting probability

mass function of Y_n upon the assumption that the mean \mu = np is the same for every n, that is,
p = \mu/n, where \mu is a constant.
Solution
The m.g.f. of Y_n is

M_n(t) = E\left[ e^{tY_n} \right] = \left[ (1 - p) + p e^{t} \right]^{n} = \left[ 1 + \frac{\mu\left( e^{t} - 1 \right)}{n} \right]^{n},

so that, by Lemma 5.4,

\lim_{n \to \infty} M_n(t) = e^{\mu(e^t - 1)}

for all real numbers t. Since there exists a distribution, namely the Poisson distribution with
mean \mu, that has the moment generating function e^{\mu(e^t - 1)}, it follows that Y_n has a limiting
Poisson distribution with mean \mu.
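The limiting Poisson distribution is also visible in the probabilities themselves. The sketch below (SciPy assumed; the values \mu = 3 and the first ten values of k are my choices) compares the bin(n, \mu/n) probability mass function with the Poisson(\mu) one.

```python
import numpy as np
from scipy import stats

mu = 3.0
k = np.arange(0, 10)

for n in [10, 100, 1000]:
    p = mu / n
    diff = np.abs(stats.binom.pmf(k, n, p) - stats.poisson.pmf(k, mu))
    print(n, diff.max())               # shrinks as n increases
```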

Activity 5.8
Let X \sim b(n, \theta). Use the CLT to find n such that P\left( X < n/2 \right) \geq 1 - \alpha. In particular,
let \alpha = 0.1 and \theta = 0.45. Calculate n satisfying P\left( X < n/2 \right) \geq 0.9.


Summary
In this Section, you have learnt to
 prove the central limit theorem,
 find the limiting distribution of sequences of random variables.


Section 6 Large Deviations

Introduction
The law of large numbers says that, in a certain sense, the sum S_n of n independent and
identically distributed random variables is approximately n\mu, where \mu is the mean of a
typical summand. The central limit theorem implies that the deviations of S_n from n\mu are
typically of order \sqrt{n}, i.e. small compared with the mean. Note that S_n may deviate from
n\mu by quantities of greater order than \sqrt{n}, say n^{\alpha} with \alpha > \tfrac{1}{2}, but such "large deviations"
have probabilities which tend to zero as n \to \infty. The theory of large deviations studies the
asymptotic behaviour of such probabilities, i.e.

P\left( \left| S_n - n\mu \right| > n^{\alpha} \right) \quad \text{as } n \to \infty,

for values of \alpha satisfying \alpha > \tfrac{1}{2}; of particular interest is the case \alpha = 1, corresponding to
deviations of S_n from its mean n\mu of the same order as the mean.

Objectives
By the end of this section, you should be able to

 state the large deviation principle for the sequence S n ,

 compute the rate function of the large deviation principle for some laws.

Definition 6.1 (Cumulant generating function)


Let X_1, \ldots, X_n be a sequence of independent identically distributed random variables with
mean \mu and partial sums S_n = X_1 + \cdots + X_n. Let M(t) = E\left[ e^{tX} \right] be the m.g.f. of a
typical X_i; then the cumulant generating function is given by

\Lambda(t) = \log M(t).

Properties of \Lambda
(i) \Lambda(0) = \log M(0) = 0.


(ii) \Lambda'(0) = \dfrac{M'(0)}{M(0)} = \mu, if M'(0) exists.

(iii) \Lambda''(t) = \dfrac{M(t)\, M''(t) - \left[ M'(t) \right]^{2}}{M(t)^{2}} = \dfrac{E\left( e^{tX} \right) E\left( X^{2} e^{tX} \right) - \left[ E\left( X e^{tX} \right) \right]^{2}}{M(t)^{2}} \geq 0,
      by the Cauchy-Schwarz inequality.

(iv) \Lambda(t) is a convex function.
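These properties are easy to verify numerically for a concrete law. The sketch below (NumPy assumed) uses the Poisson(\mu) cumulant generating function \Lambda(t) = \mu(e^t - 1) and checks properties (i), (ii) and convexity by finite differences; the step size and grid of t values are arbitrary choices.

```python
import numpy as np

mu = 2.0
Lambda = lambda t: mu * (np.exp(t) - 1)   # cumulant generating function of Poisson(mu)

t = np.linspace(-1.0, 1.0, 201)
h = t[1] - t[0]

print(Lambda(0.0))                                  # property (i): Lambda(0) = 0
print((Lambda(h) - Lambda(-h)) / (2 * h), mu)       # property (ii): Lambda'(0) = mu
second = np.diff(Lambda(t), 2) / h**2               # finite-difference second derivative
print(np.all(second > 0))                           # properties (iii)-(iv): convexity
```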

Definition 6.2 (Fenchel-Legendre transform)

The Fenchel-Legendre transform of \Lambda(t) is given by

\Lambda^{*}(a) = \sup_{t \in \mathbb{R}} \left\{ at - \Lambda(t) \right\}, \quad a \in \mathbb{R}.
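When no closed form is apparent, \Lambda^{*} can be evaluated by numerical maximisation. A minimal sketch (SciPy assumed; the N(0,1) case, where \Lambda(t) = t^2/2 and \Lambda^{*}(a) = a^2/2 is known exactly, is used only as a sanity check and is not the distribution of Activity 6.3):

```python
import numpy as np
from scipy import optimize

Lambda = lambda t: 0.5 * t**2          # cumulant generating function of N(0,1)

def legendre_transform(a):
    """Compute sup_t { a*t - Lambda(t) } by numerical maximisation."""
    objective = lambda t: -(a * t - Lambda(t))
    return -optimize.minimize_scalar(objective).fun

for a in [0.0, 0.5, 1.0, 2.0]:
    print(a, legendre_transform(a), a**2 / 2)   # numerical vs exact Lambda*(a)
```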

Activity 6.3
Suppose X_1, \ldots, X_n are independent Bernoulli random variables, each with parameter p.
Let S_n = \sum_{i=1}^{n} X_i and find \Lambda^{*}.

Theorem 6.4 (Large deviations)


Let X_1, \ldots, X_n be independent identically distributed random variables with mean \mu,
and suppose that the m.g.f. M(t) = E\left[ e^{tX} \right] is finite in some neighbourhood of the
origin t = 0. Let a be such that a > \mu and P(X > a) > 0. Then \Lambda^{*}(a) > 0 and

\lim_{n \to \infty} \frac{1}{n} \log P\left( S_n > na \right) = -\Lambda^{*}(a).

Proof
Refer to Grimmett and Stirzaker (2001) for a proof of this theorem.
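To see the rate in Theorem 6.4 numerically, you can use exponential(1) summands, for which S_n has the \Gamma(1, n) distribution and the tail probability is available exactly. In the sketch below (SciPy assumed), the closed form \Lambda^{*}(a) = a - 1 - \log a for a > 1 is a hand computation on my part, not quoted from the text.

```python
import numpy as np
from scipy import stats

# Exponential(1) summands: mu = 1, Lambda(t) = -log(1 - t) for t < 1,
# and (by hand) Lambda*(a) = a - 1 - log(a) for a > 1.
a = 1.5
rate = a - 1 - np.log(a)

for n in [10, 50, 200, 1000]:
    tail = stats.gamma.sf(n * a, n)          # P(S_n > n a), since S_n ~ Gamma(n, 1)
    print(n, -np.log(tail) / n, rate)        # -(1/n) log P(S_n > na) -> Lambda*(a)
```

The quantity -(1/n) log P(S_n > na) approaches the rate \Lambda^{*}(a) slowly, the correction being of order (\log n)/n, which is consistent with the logarithmic scale of the theorem.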

Summary
After completing this section, you have learnt to
 state the large deviation principle for the sequence S n ,

 compute the rate function of the large deviation principle for some laws.


Assignment 6(a)
Let X_1, \ldots, X_n be independent random variables each having the Cauchy distribution,
and let S_n = X_1 + \cdots + X_n. Find P\left( S_n \geq an \right).

Assignment 6(b)
Let X_1, \ldots, X_{100} be i.i.d. Poisson random variables with parameter \lambda = 0.02. Let
S = S_{100} = \sum_{i=1}^{100} X_i. Use the central limit theorem to evaluate P(S \geq 3), and compare
your result to the exact probability of the event \{ S \geq 3 \}.

Assignment 6 (c)
Suppose X_n has probability density function

f_n(x) = \frac{n x^{n-1}}{\theta^{n}}, \quad 0 < x < \theta,

where \theta is a strictly positive parameter. If Z_n = n\left( \theta - X_n \right), find the moment generating
function of Z_n. Hence or otherwise show that Z_n has a limiting distribution which is
exponential with mean \theta.

Unit Summary
You have learnt
 about the various modes of convergence,
 about concentration inequalities,
 to compute the limiting distribution of sequences of random variables,
 about large deviations.


References

Grimmett, G. & Stirzaker, D. (2001). Probability and Random Processes. Oxford University
Press.

Hoel, P.G., Port, S.C. & Stone, C.J. (1971). Introduction to Probability Theory. Houghton
Mifflin.

Lindley, D.V. (1965). Introduction to Probability and Statistics. Cambridge University Press.

Meyer, P.L. (1965). Introductory Probability and Statistical Applications. Addison-Wesley.

Ross, S. (1997). A First Course in Probability. Prentice Hall.

Stirzaker, D. (1994). Elementary Probability. Cambridge University Press.
