Discrete Random Variable
Proposition 11.1. If X is a discrete random variable then \sum_{r \in Range(X)} P(X = r) = 1.

Proof. The fact that the range of X is either finite or countably infinite means that we can use Kolmogorov's second and third axioms to deduce that

\sum_{r \in Range(X)} P(X = r) = P(S) = 1.
Definition. Let X be a discrete random variable. The expectation of X is

E(X) = \sum_{r \in Range(X)} r P(X = r).

We often denote E(X) by μ.
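For example, the following short Python sketch evaluates this sum for a fair six-sided die, storing the probability mass function as a dictionary from values r to P(X = r):

from fractions import Fraction

# Probability mass function of a fair six-sided die: P(X = r) = 1/6 for r = 1, ..., 6.
pmf = {r: Fraction(1, 6) for r in range(1, 7)}

# E(X) = sum over r in Range(X) of r * P(X = r).
expectation = sum(r * p for r, p in pmf.items())

print(expectation)  # 7/2, i.e. E(X) = 3.5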
Proposition 11.2. Let X be a discrete random variable and let b, c ∈ ℝ. Then
(a) E(bX + c) = bE(X) + c.
(c) If there exists k ∈ ℝ with P(X = k - t) = P(X = k + t) for all t ∈ ℝ then E(X) = k.
Proof. (a) We have

E(bX + c) = \sum_{s \in Range(bX+c)} s P(bX + c = s)
          = \sum_{r \in Range(X)} (br + c) P(X = r)
          = b \sum_{r \in Range(X)} r P(X = r) + c \sum_{r \in Range(X)} P(X = r)
          = bE(X) + c,

where the fourth equality uses the definition of E(X) for the first term and Proposition 11.1 for the second term.
(b) See Exercises 7, Q3.
(c) We have

E(X) = \sum_{r \in Range(X)} r P(X = r)
     = k P(X = k) + \sum_{t > 0, k-t \in Range(X)} (k - t) P(X = k - t) + \sum_{t > 0, k+t \in Range(X)} (k + t) P(X = k + t)
     = k P(X = k) + \sum_{t > 0, k-t \in Range(X)} k P(X = k - t) + \sum_{t > 0, k+t \in Range(X)} k P(X = k + t)
     = k \sum_{r \in Range(X)} P(X = r)
     = k,

where:
the second and fourth equalities follow by putting r = k - t when r < k and r = k + t when r > k;
the third equality holds because P(X = k - t) = P(X = k + t) so the terms involving t cancel;
the fifth equality uses Proposition 11.1.
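As a quick numerical illustration of parts (a) and (c), here is a small Python sketch using a pmf chosen to be symmetric about k = 3 (the particular values are just an example):

from fractions import Fraction

# A pmf symmetric about k = 3: P(X = 3 - t) = P(X = 3 + t) for all t.
pmf = {1: Fraction(1, 4), 3: Fraction(1, 2), 5: Fraction(1, 4)}

def expectation(pmf):
    # E(X) = sum over r in Range(X) of r * P(X = r).
    return sum(r * p for r, p in pmf.items())

b, c = 2, 7
# pmf of bX + c: each value r of X becomes b*r + c, with the same probability.
pmf_bx_c = {b * r + c: p for r, p in pmf.items()}

print(expectation(pmf))              # 3, as predicted by part (c)
print(expectation(pmf_bx_c))         # 13
print(b * expectation(pmf) + c)      # 13, agreeing with part (a)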
Definition. Let X be a discrete random variable with E(X) = μ. Then the variance of X is

Var(X) = \sum_{r \in Range(X)} [r - μ]^2 P(X = r).
Thus Var(X) is the expected value of [X - μ]^2, i.e. the square of the distance from X to E(X). It measures how the probability distribution of X is concentrated about its expectation. A small variance means X is sharply concentrated and a large variance means X is spread out.
Proposition 11.3 (Properties of variance). Let X be a discrete random variable with E(X) = μ and let b, c ∈ ℝ. Then
(a) Var(X) ≥ 0.
(b) Var(bX + c) = b^2 Var(X).
(c) Var(X) = E(X^2) - E(X)^2.
Proof. (a) We have

Var(X) = \sum_{r \in Range(X)} [r - μ]^2 P(X = r) ≥ 0

since [r - μ]^2 ≥ 0 and P(X = r) ≥ 0 for all r ∈ Range(X).
(b) We have E(bX + c) = bμ + c by Proposition 11.2. Thus

Var(bX + c) = \sum_{r \in Range(X)} [(br + c) - (bμ + c)]^2 P(X = r)
            = b^2 \sum_{r \in Range(X)} [r - μ]^2 P(X = r)
            = b^2 Var(X).
(c) We have

Var(X) = \sum_{r \in Range(X)} [r - μ]^2 P(X = r)
       = \sum_{r \in Range(X)} r^2 P(X = r) - 2μ \sum_{r \in Range(X)} r P(X = r) + μ^2 \sum_{r \in Range(X)} P(X = r)
       = E(X^2) - 2μ E(X) + μ^2
       = E(X^2) - E(X)^2

since \sum_{r \in Range(X)} r^2 P(X = r) = E(X^2), \sum_{r \in Range(X)} r P(X = r) = E(X), \sum_{r \in Range(X)} P(X = r) = 1 and E(X) = μ.
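As an illustration, the following Python sketch computes Var(X) for a fair die directly from the definition and checks parts (b) and (c) numerically (the die and the values b = 3, c = -2 are example choices):

from fractions import Fraction

# Fair six-sided die.
pmf = {r: Fraction(1, 6) for r in range(1, 7)}

def expectation(pmf):
    return sum(r * p for r, p in pmf.items())

def variance(pmf):
    mu = expectation(pmf)
    # Var(X) = sum over r of (r - mu)^2 * P(X = r).
    return sum((r - mu) ** 2 * p for r, p in pmf.items())

mu = expectation(pmf)
e_x_squared = sum(r ** 2 * p for r, p in pmf.items())   # E(X^2)

print(variance(pmf))                  # 35/12
print(e_x_squared - mu ** 2)          # 35/12, agreeing with part (c)

b, c = 3, -2
pmf_bx_c = {b * r + c: p for r, p in pmf.items()}
print(variance(pmf_bx_c) == b ** 2 * variance(pmf))   # True, part (b)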
12
Definition. Given two discrete random variables X and Y we say that they have the same probability distribution if they have the same range and the same probability mass function, i.e. P(X = r) = P(Y = r) for all values r in the range. We use the notation X ∼ Y to mean X and Y have the same probability distribution.
Certain probability distributions occur so often that it is convenient to give
them special names. In this section we study a few of these. In each case we will
determine the probability mass function, expectation and variance, as well as
describing the sort of situation in which the distribution occurs.
12.1 Bernoulli distribution
Suppose that a trial (experiment) with two outcomes is performed. Such a trial is called a Bernoulli trial and the outcomes are usually called success and failure, with P(success) = p and hence P(failure) = 1 - p. We define a random variable X by putting X(success) = 1 and X(failure) = 0. Then the probability mass function of X is

P(X = n) = \begin{cases} 1 - p & \text{if } n = 0, \\ p & \text{if } n = 1. \end{cases}

We say that X has the Bernoulli distribution and write X ∼ Bernoulli(p).

Lemma 12.1. If X ∼ Bernoulli(p) then
(a) E(X) = p.
(b) Var(X) = p(1 - p).

Proof. (a) We have E(X) = 0 · (1 - p) + 1 · p = p.
(b) We have

Var(X) = E(X^2) - E(X)^2 = [0^2 · (1 - p) + 1^2 · p] - p^2 = p - p^2 = p(1 - p).
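As a quick check of Lemma 12.1, the following Python sketch computes E(X) and Var(X) directly from the pmf for the example value p = 3/10:

from fractions import Fraction

# Bernoulli(p) with an example value p = 3/10.
p = Fraction(3, 10)
pmf = {0: 1 - p, 1: p}

expectation = sum(n * prob for n, prob in pmf.items())
e_x_squared = sum(n ** 2 * prob for n, prob in pmf.items())
variance = e_x_squared - expectation ** 2

print(expectation == p)               # True: E(X) = p
print(variance == p * (1 - p))        # True: Var(X) = p(1 - p)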
12.2 Binomial distribution
Suppose that n independent Bernoulli trials are performed, each with probability p of success, and let X be the number of successes. We say that X has the binomial distribution and write X ∼ Bin(n, p). The probability mass function of X is

P(X = k) = \binom{n}{k} p^k (1 - p)^{n-k} for 0 ≤ k ≤ n.

Lemma 12.2. If X ∼ Bin(n, p) then
(a) E(X) = np.
(b) Var(X) = np(1 - p).

Proof. (a) We have

E(X) = \sum_{k \in Range(X)} k P(X = k)
     = \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n-k}
     = \sum_{k=1}^{n} k \binom{n}{k} p^k (1 - p)^{n-k}
     = np \sum_{k=1}^{n} \binom{n-1}{k-1} p^{k-1} (1 - p)^{n-k},

where the third equality uses the fact that the k = 0 term in the summation is zero and the fourth equality uses the identity k \binom{n}{k} = n \binom{n-1}{k-1}. We can now put m = n - 1 and i = k - 1 in the last equality to obtain

E(X) = np \sum_{i=0}^{m} \binom{m}{i} p^i (1 - p)^{m-i}
     = np [p + (1 - p)]^m
     = np.
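Lemma 12.2 can be checked numerically; the Python sketch below builds the Bin(n, p) pmf with math.comb for the example values n = 10 and p = 1/4 and compares E(X) and Var(X) with np and np(1 - p):

from fractions import Fraction
from math import comb

n, p = 10, Fraction(1, 4)

# P(X = k) = C(n, k) p^k (1 - p)^(n - k) for 0 <= k <= n.
pmf = {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

expectation = sum(k * prob for k, prob in pmf.items())
variance = sum(k ** 2 * prob for k, prob in pmf.items()) - expectation ** 2

print(sum(pmf.values()) == 1)           # True: the pmf sums to 1
print(expectation == n * p)             # True: E(X) = np
print(variance == n * p * (1 - p))      # True: Var(X) = np(1 - p)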
12.3 Hypergeometric distribution
Suppose that we have B objects, R of which are red and the remaining B - R of which are white. We make a selection of n objects without replacement. Let X be the number of red objects among the n objects chosen. We say that X has the hypergeometric distribution and write X ∼ Hg(n, R, B). The probability mass function of X is

P(X = k) = \binom{R}{k} \binom{B-R}{n-k} / \binom{B}{n}.

It is perhaps interesting to compare X with the random variable Y we obtain when the sampling is done with replacement. We have Y ∼ Bin(n, R/B). The expectation of Y is the same as that of X, but the variance differs by a factor of (B - n)/(B - 1), which is close to 1 when B is large compared to n.
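The following Python sketch illustrates this comparison for the example values B = 20, R = 8 and n = 5: it builds the Hg(n, R, B) pmf, checks that E(X) = nR/B, and checks that Var(X) equals the Bin(n, R/B) variance multiplied by the factor (B - n)/(B - 1):

from fractions import Fraction
from math import comb

B, R, n = 20, 8, 5   # example values: 20 objects, 8 red, choose 5

# P(X = k) = C(R, k) C(B - R, n - k) / C(B, n).
pmf = {k: Fraction(comb(R, k) * comb(B - R, n - k), comb(B, n)) for k in range(n + 1)}

expectation = sum(k * prob for k, prob in pmf.items())
variance = sum(k ** 2 * prob for k, prob in pmf.items()) - expectation ** 2

p = Fraction(R, B)                       # success probability for the with-replacement version
binomial_variance = n * p * (1 - p)      # variance of Y ~ Bin(n, R/B)

print(expectation == n * p)                                        # True: same expectation as Y
print(variance == binomial_variance * Fraction(B - n, B - 1))      # True: smaller by the factor (B - n)/(B - 1)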
12.4 Geometric distribution
Suppose that independent Bernoulli trials, each with probability p of success, are performed until the first success occurs, and let X be the number of trials required. We say that X has the geometric distribution and write X ∼ Geom(p). The probability mass function of X is

P(X = k) = p(1 - p)^{k-1} for k ≥ 1.

Lemma 12.4. If X ∼ Geom(p) then
(a) E(X) = 1/p.
(b) Var(X) = (1 - p)/p^2.

Proof. (a) We have

E(X) = \sum_{k=1}^{∞} k p (1 - p)^{k-1}.   (1)

We also have

(1 - p)E(X) = \sum_{k=1}^{∞} k p (1 - p)^{k} = \sum_{r=1}^{∞} (r - 1) p (1 - p)^{r-1},   (2)

where the second equality follows by putting r = k + 1 and using the fact that the r = 1 term in the last summation is zero. Subtracting (2) from (1) gives

[1 - (1 - p)]E(X) = \sum_{r=1}^{∞} [r - (r - 1)] p (1 - p)^{r-1} = \sum_{r=1}^{∞} p (1 - p)^{r-1} = 1,

where the last equality uses Proposition 11.1. Hence pE(X) = 1 and E(X) = 1/p.
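As a numerical check of Lemma 12.4, the Python sketch below truncates the series for E(X) and E(X^2) at N = 1000 terms (an arbitrary cut-off, large enough that the tail is negligible) for the example value p = 0.2:

p = 0.2
N = 1000   # truncation point for the infinite series

# P(X = k) = p (1 - p)^(k - 1) for k >= 1.
pmf = {k: p * (1 - p) ** (k - 1) for k in range(1, N + 1)}

expectation = sum(k * prob for k, prob in pmf.items())
variance = sum(k ** 2 * prob for k, prob in pmf.items()) - expectation ** 2

print(expectation, 1 / p)                    # both approximately 5.0
print(variance, (1 - p) / p ** 2)            # both approximately 20.0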
12.5 Poisson distribution
Suppose that incidents of some kind occur at random times but at an average rate of λ incidents per unit time. Let X be the number of these incidents which occur in a fixed unit time interval. (For example we could take X to be the number of emissions from a radioactive source in one particular second, or the number of phone calls received on a particular day.) In this case X has a Poisson distribution. We write X ∼ Poisson(λ). The random variable X has range {0, 1, 2, ...}. It can be shown that the probability mass function for X is

P(X = k) = e^{-λ} λ^k / k!  for k ≥ 0.

(See the Remark at the end of this subsection for some justification of this.)
Lemma 12.5. If X ∼ Poisson(λ) then
(a) E(X) = λ.
(b) Var(X) = λ.
Proof. (a) We have

E(X) = \sum_{k \in Range(X)} k P(X = k)
     = \sum_{k=0}^{∞} k e^{-λ} λ^k / k!
     = \sum_{k=1}^{∞} k e^{-λ} λ^k / k!
     = e^{-λ} \sum_{k=1}^{∞} λ^k / (k - 1)!
     = e^{-λ} λ \sum_{i=0}^{∞} λ^i / i!
     = λ,

where:
the third equality uses the fact that the k = 0 term in the summation is zero;
the fifth equality uses the substitution i = k - 1;
the last equality uses the Taylor expansion e^{λ} = \sum_{i=0}^{∞} λ^i / i! for any λ ∈ ℝ.
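A similar numerical check of Lemma 12.5, with λ = 4 as an example value and the series truncated at k = 100:

from math import exp

lam = 4.0
N = 100   # truncation point: the tail beyond k = 100 is negligible for lam = 4

# Build P(X = k) = e^(-lam) lam^k / k! iteratively: each term is the previous one times lam / k.
pmf = {0: exp(-lam)}
for k in range(1, N + 1):
    pmf[k] = pmf[k - 1] * lam / k

expectation = sum(k * prob for k, prob in pmf.items())
variance = sum(k ** 2 * prob for k, prob in pmf.items()) - expectation ** 2

print(sum(pmf.values()))     # approximately 1
print(expectation)           # approximately 4, i.e. E(X) = lam
print(variance)              # approximately 4, i.e. Var(X) = lam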
Remark. Divide the unit time interval into n subintervals of equal length. If n is large then it is reasonable to assume that each subinterval contains at most one incident, and that an incident occurs in any given subinterval with probability λ/n, independently of what happens in the other subintervals. Let Y be the number of subintervals which contain an incident. Then Y ∼ Bin(n, λ/n) and

P(Y = k) = \binom{n}{k} (λ/n)^k (1 - λ/n)^{n-k}.

Of course the probability mass function of Y is not the same as that of X (since the Poisson distribution allows there to be 2 or more incidents in a subinterval). However if n is made larger and larger then the subintervals become smaller and smaller and the probability mass function of Y gives a better and better approximation to that of X. If we take the limit of the expression for P(Y = k) as n → ∞ then we get e^{-λ} λ^k / k!. This is the probability mass function of X. (We need techniques from Analysis to make this argument precise. These will be covered in the level 5 module Convergence and Continuity.)
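The limiting behaviour described in this Remark can be seen numerically; the Python sketch below compares P(Y = k) for Y ∼ Bin(n, λ/n) with the Poisson value e^{-λ} λ^k / k! for the example values λ = 2 and k = 3, as n increases:

from math import comb, exp, factorial

lam, k = 2.0, 3   # example values

def binomial_pmf(n, k, lam):
    # P(Y = k) for Y ~ Bin(n, lam / n).
    q = lam / n
    return comb(n, k) * q ** k * (1 - q) ** (n - k)

for n in (10, 100, 1000, 10000):
    print(n, binomial_pmf(n, k, lam))

# Limiting Poisson probability e^(-lam) lam^k / k!.
print(exp(-lam) * lam ** k / factorial(k))   # approximately 0.1804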
12.6 Summary
The three most important discrete distributions are the binomial, geometric and Poisson distributions. Condense this section of the notes into half a page by writing your own summary of their properties (probability mass function, expectation, variance and when they occur). You could include the Bernoulli and hypergeometric distributions also if you like.