Discrete Random Variable



Discrete Random variables

In this section we define the expectation and variance of a discrete random
variable and deduce some basic properties.
Proposition 11.1. Let $X$ be a discrete random variable on a sample space
$S$. Then
\[ \sum_{r \in \mathrm{Range}(X)} P(X = r) = 1. \]

Proof The fact that the range of $X$ is either finite or countably infinite means that
we can use Kolmogorov's second and third axioms to deduce that
\[ \sum_{r \in \mathrm{Range}(X)} P(X = r) = P(S) = 1. \]

This proposition is a useful check that we have calculated the probability mass
function of $X$ correctly.
Definition The expectation of a discrete random variable $X$ is
\[ E(X) = \sum_{r \in \mathrm{Range}(X)} r \, P(X = r). \]

Thus the expectation of $X$ is a weighted average of the values taken by $X$,
where each value $r \in \mathrm{Range}(X)$ is weighted by the probability it occurs.
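As an illustration, the pmf check and the expectation can be computed directly from a table of probabilities. The sketch below is only an example: the fair six-sided die and the helper names `check_pmf` and `expectation` are assumptions chosen here, not part of the notes.

```python
from fractions import Fraction

def check_pmf(pmf):
    """True if all probabilities are non-negative and sum to 1 (Proposition 11.1)."""
    return all(p >= 0 for p in pmf.values()) and sum(pmf.values()) == 1

def expectation(pmf):
    """E(X) = sum of r * P(X = r) over r in Range(X)."""
    return sum(r * p for r, p in pmf.items())

# Hypothetical example: X is the score on one roll of a fair six-sided die.
die = {r: Fraction(1, 6) for r in range(1, 7)}

print(check_pmf(die))     # True
print(expectation(die))   # 7/2
```

Exact fractions are used so that the check "sums to 1" is not affected by floating-point rounding.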
Sometimes it is useful to consider functions of random variables. If $X$ is a
random variable on $S$ and $g : \mathbb{R} \to \mathbb{R}$ then $Y = g(X)$ is a new random variable on $S$
which maps $S$ into $\mathbb{R}$ by $Y(s) = g(X(s))$ for all outcomes $s \in S$. For example $Y = 2X - 7$
or $Z = X^2$ are both new random variables on $S$.
Proposition 11.2 (Properties of expectation). Let $X$ be a discrete random variable.
(a) Suppose $b, c \in \mathbb{R}$. Then $E(bX + c) = bE(X) + c$.
(b) If $m, M \in \mathbb{R}$ and $m \le X(s) \le M$ for all $s \in S$ then $m \le E(X) \le M$.
(c) If there exists $k \in \mathbb{R}$ with $P(X = k - t) = P(X = k + t)$ for all $t \in \mathbb{R}$ then $E(X) = k$.


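Proof (a) We have
\[
\begin{aligned}
E(bX + c) &= \sum_{r \in \mathrm{Range}(X)} (br + c) P(X = r) \\
&= \sum_{r \in \mathrm{Range}(X)} br P(X = r) + \sum_{r \in \mathrm{Range}(X)} c P(X = r) \\
&= b \sum_{r \in \mathrm{Range}(X)} r P(X = r) + c \sum_{r \in \mathrm{Range}(X)} P(X = r) \\
&= bE(X) + c
\end{aligned}
\]
where the fourth equality uses the definition of $E(X)$ for the first term and
Proposition 11.1 for the second term.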
(b) See Exercises 7, Q3.
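(c) We have
\[
\begin{aligned}
E(X) &= \sum_{r \in \mathrm{Range}(X)} r P(X = r) \\
&= kP(X = k) + \sum_{\substack{t > 0 \\ k - t \in \mathrm{Range}(X)}} (k - t) P(X = k - t) + \sum_{\substack{t > 0 \\ k + t \in \mathrm{Range}(X)}} (k + t) P(X = k + t) \\
&= kP(X = k) + \sum_{\substack{t > 0 \\ k - t \in \mathrm{Range}(X)}} k P(X = k - t) + \sum_{\substack{t > 0 \\ k + t \in \mathrm{Range}(X)}} k P(X = k + t) \\
&= k \sum_{r \in \mathrm{Range}(X)} P(X = r) \\
&= k
\end{aligned}
\]
where:
the second and fourth equalities follow by putting $r = k - t$ when $r < k$
and $r = k + t$ when $r > k$;
the third equality holds because $P(X = k - t) = P(X = k + t)$, so the terms
involving $t$ cancel;
the fifth equality uses Proposition 11.1.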
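The three properties in Proposition 11.2 can be checked numerically on a small example. The sketch below assumes a hypothetical pmf that is symmetric about $k = 3$ and illustrative constants $b = 2$, $c = -7$; none of these values come from the notes.

```python
from fractions import Fraction

def expectation(pmf, g=lambda r: r):
    """E(g(X)) = sum of g(r) * P(X = r) over r in Range(X)."""
    return sum(g(r) * p for r, p in pmf.items())

# Hypothetical pmf, symmetric about k = 3: P(X = 3 - t) = P(X = 3 + t) for every t.
pmf = {1: Fraction(1, 8), 2: Fraction(1, 4), 3: Fraction(1, 4),
       4: Fraction(1, 4), 5: Fraction(1, 8)}

b, c = 2, -7
mean = expectation(pmf)

print(mean)                                     # 3   (part (c): E(X) = k)
print(expectation(pmf, lambda r: b * r + c))    # -1  (part (a): b*E(X) + c)
print(min(pmf) <= mean <= max(pmf))             # True (part (b))
```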

Definition Let $X$ be a discrete random variable with $E(X) = \mu$. Then the variance of
$X$ is
\[ \mathrm{Var}(X) = \sum_{r \in \mathrm{Range}(X)} [r - \mu]^2 P(X = r). \]

Thus $\mathrm{Var}(X)$ is the expected value of $[X - \mu]^2$, i.e. the square of the distance from $X$
to $E(X)$. It measures how closely the probability distribution of $X$ is concentrated about
its expectation. A small variance means $X$ is sharply concentrated and a large
variance means $X$ is spread out.
Proposition 11.3 (Properties of variance). Let $X$ be a discrete random variable.
(a) $\mathrm{Var}(X) \ge 0$.

(b) If $b, c \in \mathbb{R}$ then $\mathrm{Var}(bX + c) = b^2 \mathrm{Var}(X)$.

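Proof (a) Let $E(X) = \mu$. Then
\[ \mathrm{Var}(X) = \sum_{r \in \mathrm{Range}(X)} [r - \mu]^2 P(X = r) \ge 0 \]
since $[r - \mu]^2 \ge 0$ and $P(X = r) \ge 0$ for all $r \in \mathrm{Range}(X)$.
(b) We have $E(bX + c) = b\mu + c$ by Proposition 11.2. Thus
\[
\begin{aligned}
\mathrm{Var}(bX + c) &= \sum_{r \in \mathrm{Range}(X)} [(br + c) - (b\mu + c)]^2 P(X = r) \\
&= b^2 \sum_{r \in \mathrm{Range}(X)} [r - \mu]^2 P(X = r) \\
&= b^2 \mathrm{Var}(X).
\end{aligned}
\]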

We close this section by deriving a useful formula for calculating variance.

Proposition 11.4. Suppose $X$ is a discrete random variable. Then
$\mathrm{Var}(X) = E(X^2) - E(X)^2$.

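Proof Let $E(X) = \mu$. Then
\[
\begin{aligned}
\mathrm{Var}(X) &= \sum_{r \in \mathrm{Range}(X)} [r - \mu]^2 P(X = r) \\
&= \sum_{r \in \mathrm{Range}(X)} r^2 P(X = r) - 2\mu \sum_{r \in \mathrm{Range}(X)} r P(X = r) + \mu^2 \sum_{r \in \mathrm{Range}(X)} P(X = r).
\end{aligned}
\]
We have:

$\sum_{r \in \mathrm{Range}(X)} r^2 P(X = r) = E(X^2)$;
$\sum_{r \in \mathrm{Range}(X)} r P(X = r) = E(X)$;
$\sum_{r \in \mathrm{Range}(X)} P(X = r) = 1$, by Proposition 11.1.

Hence
\[ \mathrm{Var}(X) = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - E(X)^2 \]
since $E(X) = \mu$.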
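A quick numerical check of Propositions 11.3(b) and 11.4 is sketched below. The fair die and the constants $b = 3$, $c = 2$ are illustrative assumptions, and the function names are hypothetical helpers rather than anything defined in the notes.

```python
from fractions import Fraction

def expectation(pmf, g=lambda r: r):
    """E(g(X)) = sum of g(r) * P(X = r) over r in Range(X)."""
    return sum(g(r) * p for r, p in pmf.items())

def variance(pmf):
    """Var(X) from the definition: sum of (r - mu)^2 * P(X = r)."""
    mu = expectation(pmf)
    return expectation(pmf, lambda r: (r - mu) ** 2)

def variance_shortcut(pmf):
    """Var(X) = E(X^2) - E(X)^2 (Proposition 11.4)."""
    return expectation(pmf, lambda r: r ** 2) - expectation(pmf) ** 2

# Hypothetical example: one roll of a fair six-sided die.
die = {r: Fraction(1, 6) for r in range(1, 7)}
print(variance(die), variance_shortcut(die))    # 35/12 35/12

# pmf of Y = bX + c (b is non-zero, so distinct values of X stay distinct).
b, c = 3, 2
scaled = {b * r + c: p for r, p in die.items()}
print(variance(scaled) == b ** 2 * variance(die))   # True
```

In hand calculations the shortcut formula is usually the quicker of the two, since it avoids subtracting the mean from every value.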


Some Common Discrete Probability Distributions

Definition Given two discrete random variables $X$ and $Y$ we say that they have
the same probability distribution if they have the same range and the same
probability mass function, i.e. $P(X = r) = P(Y = r)$ for all values $r$ in the range.
We use the notation $X \sim Y$ to mean $X$ and $Y$ have the same probability distribution.
Certain probability distributions occur so often that it is convenient to give
them special names. In this section we study a few of these. In each case we will
determine the probability mass function, expectation and variance, as well as
describing the sort of situation in which the distribution occurs.


Bernoulli distribution

Suppose that a trial (experiment) with two outcomes is performed. Such a trial is
called a Bernoulli trial and the outcomes are usually called success and failure
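with $P(\text{success}) = p$, and hence $P(\text{failure}) = 1 - p$. We define a random variable $X$
by putting $X(\text{success}) = 1$ and $X(\text{failure}) = 0$. Then the probability mass function of
$X$ is
\[ P(X = n) = \begin{cases} 1 - p & \text{if } n = 0, \\ p & \text{if } n = 1. \end{cases} \]
We say that $X$ has the Bernoulli distribution and write $X \sim \mathrm{Bernoulli}(p)$.
Lemma 12.1. If $X \sim \mathrm{Bernoulli}(p)$ then

(a) $E(X) = p$.
(b) $\mathrm{Var}(X) = p(1 - p)$.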

Proof (a) Using the definition of $E(X)$ we have
\[ E(X) = 0 \cdot (1 - p) + 1 \cdot p = p. \]

(b) Using Proposition 11.4 we have
\[ \mathrm{Var}(X) = E(X^2) - E(X)^2 = [0^2 \cdot (1 - p) + 1^2 \cdot p] - p^2 = p(1 - p). \]


Binomial distribution

Suppose we perform $n$ independent Bernoulli trials, each with probability of
success $p$. Let $X$ be the number of trials which result in success. Then $X$ has
the binomial distribution. We write $X \sim \mathrm{Bin}(n, p)$. The range of $X$ is
$\{0, 1, 2, \ldots, n\}$ and $X$ has probability mass function
\[ P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \quad \text{for } 0 \le k \le n. \]
Lemma 12.2. If $X \sim \mathrm{Bin}(n, p)$ then

(a) $E(X) = np$.

(b) $\mathrm{Var}(X) = np(1 - p)$.
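Proof (a) Using the definition of $E(X)$ we have
\[
\begin{aligned}
E(X) &= \sum_{k=0}^{n} k P(X = k) \\
&= \sum_{k=0}^{n} k \binom{n}{k} p^k (1 - p)^{n - k} \\
&= \sum_{k=1}^{n} k \frac{n!}{k!(n - k)!} p^k (1 - p)^{n - k} \\
&= np \sum_{k=1}^{n} \frac{(n - 1)!}{(k - 1)!(n - k)!} p^{k - 1} (1 - p)^{n - k}
\end{aligned}
\]
where the third equality uses the fact that the $k = 0$ term in the summation is
zero. We can now put $m = n - 1$ and $i = k - 1$ in the last equality to obtain
\[
\begin{aligned}
E(X) &= np \sum_{i=0}^{m} \frac{m!}{i!(m - i)!} p^i (1 - p)^{m - i} \\
&= np \sum_{i=0}^{m} \binom{m}{i} p^i (1 - p)^{m - i} \\
&= np[p + (1 - p)]^m \\
&= np
\end{aligned}
\]
where the third equality uses the Binomial Theorem.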

(b) See Exercises 8.
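The formulas in Lemma 12.2 can be sanity-checked by building the binomial probability mass function explicitly. The following is a minimal sketch with illustrative values $n = 10$ and $p = 3/10$ (an assumption made here); it uses `math.comb`, which requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def binomial_pmf(n, p):
    """pmf of Bin(n, p) as a dict mapping k to P(X = k)."""
    return {k: comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)}

n, p = 10, Fraction(3, 10)   # illustrative values only
pmf = binomial_pmf(n, p)

mean = sum(k * q for k, q in pmf.items())
var = sum(k ** 2 * q for k, q in pmf.items()) - mean ** 2

print(sum(pmf.values()))   # 1       (Proposition 11.1)
print(mean, var)           # 3 21/10 (np and np(1 - p))
```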


Hypergeometric distribution

Suppose that we have $B$ objects, $R$ of which are red and the remaining $B - R$ of
which are white. We make a selection of $n$ objects without replacement. Let $X$ be
the number of red objects among the $n$ objects chosen. We say that $X$ has the
hypergeometric distribution and write $X \sim \mathrm{Hg}(n, R, B)$. The
random variable $X$ takes values in $\{0, 1, 2, \ldots, n\}$ and has probability mass
function
\[ P(X = k) = \frac{\binom{R}{k} \binom{B - R}{n - k}}{\binom{B}{n}}. \]
This follows from our results on sampling. There are $\binom{R}{k}$ ways to choose
the $k$ red objects; for each of these there are $\binom{B - R}{n - k}$ ways to choose the $n - k$
white objects; and the sample space has size $\binom{B}{n}$.

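It can be shown that:

Lemma 12.3. If $X \sim \mathrm{Hg}(n, R, B)$ then

(a) $E(X) = \frac{nR}{B}$.

(b) $\mathrm{Var}(X) = \frac{nR}{B} \left(1 - \frac{R}{B}\right) \frac{B - n}{B - 1}$.

It is perhaps interesting to compare $X$ with the random variable $Y$ we obtain
when the sampling is done with replacement. We have $Y \sim \mathrm{Bin}(n, R/B)$.
The expectation of $Y$ is the same as that of $X$ but the
variance differs by a factor of $\frac{B - n}{B - 1}$, which is close to 1 when $B$ is large
compared to $n$.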
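The comparison between sampling without and with replacement can be illustrated numerically. The sketch below assumes the standard hypergeometric probability mass function $P(X = k) = \binom{R}{k}\binom{B - R}{n - k} / \binom{B}{n}$ and illustrative values $n = 5$, $R = 6$, $B = 20$; the helper names are hypothetical.

```python
from fractions import Fraction
from math import comb

def hypergeometric_pmf(n, R, B):
    """pmf of Hg(n, R, B): choose k of the R red and n - k of the B - R white objects."""
    return {k: Fraction(comb(R, k) * comb(B - R, n - k), comb(B, n))
            for k in range(n + 1)}

def mean_and_variance(pmf):
    """Return (E(X), Var(X)), using Var(X) = E(X^2) - E(X)^2."""
    mean = sum(k * q for k, q in pmf.items())
    second_moment = sum(k ** 2 * q for k, q in pmf.items())
    return mean, second_moment - mean ** 2

n, R, B = 5, 6, 20               # illustrative values only
mean, var = mean_and_variance(hypergeometric_pmf(n, R, B))

p = Fraction(R, B)
binomial_var = n * p * (1 - p)   # variance of Y ~ Bin(n, R/B)

print(mean, var)                 # 3/2 63/76
print(var / binomial_var)        # 15/19, i.e. (B - n)/(B - 1)
```

The ratio of the two variances is below 1 here because drawing without replacement from a small population reduces the spread of the count.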


Geometric distribution

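Suppose we perform an unlimited number of independent Bernoulli trials, each
with probability of success $p$, and let $X$ be the
number of trials up to and including the first success. In this case $X$ has the
geometric distribution. We write $X \sim \mathrm{Geom}(p)$. The random variable $X$ has range
$\{1, 2, 3, \ldots\}$ and has probability mass function
\[ P(X = k) = p(1 - p)^{k - 1} \quad \text{for } k \ge 1. \]
Lemma 12.4. If $X \sim \mathrm{Geom}(p)$ then

(a) $E(X) = \frac{1}{p}$.

(b) $\mathrm{Var}(X) = \frac{1 - p}{p^2}$.

Proof (a) We have
\[ E(X) = \sum_{k=1}^{\infty} k p (1 - p)^{k - 1} = \sum_{r=0}^{\infty} (r + 1) p (1 - p)^r \tag{1} \]
where the second equality follows by putting $r = k - 1$. We also have
\[ (1 - p)E(X) = \sum_{k=1}^{\infty} k p (1 - p)^k = \sum_{r=0}^{\infty} r p (1 - p)^r \tag{2} \]
where the second equality follows by putting $r = k$ and using the fact that the $r = 0$
term in the last summation is zero. Subtracting (2) from (1) gives
\[ [1 - (1 - p)]E(X) = \sum_{r=0}^{\infty} p (1 - p)^r. \]
Now $[1 - (1 - p)]E(X) = pE(X)$ and $\sum_{r=0}^{\infty} p (1 - p)^r = 1$ by the formula
for the sum of a geometric series,
so $E(X) = 1/p$.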
(b) See Exercises 8.
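Because the range of a geometric random variable is infinite, a numerical check has to truncate the sums. The sketch below assumes the probability mass function $P(X = k) = p(1 - p)^{k - 1}$ and the standard values $E(X) = 1/p$ and $\mathrm{Var}(X) = (1 - p)/p^2$; the choice $p = 0.25$ and the cut-off at 2000 terms are arbitrary, picked so that the neglected tail is far below floating-point precision.

```python
from math import isclose

def geometric_pmf(k, p):
    """P(X = k) = p (1 - p)^(k - 1) for k = 1, 2, 3, ..."""
    return p * (1 - p) ** (k - 1)

p = 0.25               # illustrative success probability
ks = range(1, 2001)    # truncated range; the omitted tail is negligible here

total = sum(geometric_pmf(k, p) for k in ks)
mean = sum(k * geometric_pmf(k, p) for k in ks)
var = sum(k ** 2 * geometric_pmf(k, p) for k in ks) - mean ** 2

print(isclose(total, 1.0))              # True
print(isclose(mean, 1 / p))             # True
print(isclose(var, (1 - p) / p ** 2))   # True
```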


Poisson distribution

Suppose that incidents of some kind occur at random times but at an average
rate of $\lambda$ incidents per unit time. Let $X$ be the number of these incidents which occur
in a fixed unit time interval. (For example we could take $X$ to be the number of
emissions from a radioactive source in one particular second, or the number of
phone calls received on a particular day.) In this case $X$ has a Poisson
distribution. We write $X \sim \mathrm{Poisson}(\lambda)$. The random variable $X$ has range
$\{0, 1, 2, \ldots\}$. It can be shown that the probability mass function for $X$ is
\[ P(X = k) = e^{-\lambda} \frac{\lambda^k}{k!}. \]
(See the Remark at the end of this subsection for some justification of this.)
Lemma 12.5. If $X \sim \mathrm{Poisson}(\lambda)$ then
(a) $E(X) = \lambda$.
(b) $\mathrm{Var}(X) = \lambda$.

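Proof (a) Using the definition of $E(X)$ we have
\[
\begin{aligned}
E(X) &= \sum_{k=0}^{\infty} k P(X = k) \\
&= \sum_{k=0}^{\infty} k e^{-\lambda} \frac{\lambda^k}{k!} \\
&= \sum_{k=1}^{\infty} k e^{-\lambda} \frac{\lambda^k}{k!} \\
&= e^{-\lambda} \lambda \sum_{k=1}^{\infty} \frac{\lambda^{k - 1}}{(k - 1)!} \\
&= e^{-\lambda} \lambda \sum_{i=0}^{\infty} \frac{\lambda^i}{i!} \\
&= \lambda
\end{aligned}
\]
where:
the third equality uses the fact that the $k = 0$ term in the summation is zero;
the fifth equality uses the substitution $i = k - 1$;
the last equality uses the Taylor expansion $e^{\lambda} = \sum_{i=0}^{\infty} \frac{\lambda^i}{i!}$ for any
$\lambda \in \mathbb{R}$.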

(b) See Exercises 8.
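The statement that a Poisson random variable with parameter $\lambda$ has expectation and variance both equal to $\lambda$ can be checked numerically with truncated sums. The sketch below assumes the probability mass function $P(X = k) = e^{-\lambda} \lambda^k / k!$; the value $\lambda = 3.5$ and the cut-off at 100 terms are arbitrary choices, and the function name is a hypothetical helper.

```python
from math import exp, isclose

def poisson_pmf_table(lam, terms):
    """P(X = k) for k = 0, ..., terms - 1, built via P(X = k) = P(X = k - 1) * lam / k."""
    probs = [exp(-lam)]
    for k in range(1, terms):
        probs.append(probs[-1] * lam / k)
    return probs

lam = 3.5                              # illustrative rate
probs = poisson_pmf_table(lam, 100)    # tail beyond k = 99 is negligible for this rate

mean = sum(k * q for k, q in enumerate(probs))
var = sum(k * k * q for k, q in enumerate(probs)) - mean ** 2

print(isclose(sum(probs), 1.0), isclose(mean, lam), isclose(var, lam))   # True True True
```

Building each probability from the previous one avoids computing large factorials and powers separately.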

Remark The probability mass function for the Poisson distribution looks rather
mysterious. It comes from considering a limit of probabilities coming from
binomial distributions. Specifically, suppose that we divide the unit time interval
we are interested in into $n$ equal subintervals. If $n$ is very large then the
subintervals are very small so it is unlikely that there will be two or more
incidents in any one of the subintervals. So let's assume that there is never more
than 1 incident in a subinterval and that the probability that there is 1 incident is
equal to $\lambda/n$. Let $Y$ be the total number of incidents in the unit time interval when
we make this assumption. Then $Y \sim \mathrm{Bin}(n, \lambda/n)$, so
\[ P(Y = k) = \binom{n}{k} \left(\frac{\lambda}{n}\right)^k \left(1 - \frac{\lambda}{n}\right)^{n - k}. \]
Of course the probability mass function of $Y$ is not the same as that of $X$ (since
the Poisson distribution allows there to be 2 or more incidents in a subinterval).
However if $n$ is made larger and larger then the subintervals become smaller and
smaller and the probability mass function of $Y$ gives a better and better
approximation to that of $X$. If we take the limit of the expression for $P(Y = k)$ as
$n \to \infty$ then we get $e^{-\lambda} \lambda^k / k!$. This is the probability mass function of $X$. (We need
techniques from Analysis to make this argument precise. These will be covered
in the level 5 module Convergence and Continuity.)
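The limiting argument in the Remark can be explored numerically by comparing $P(Y = k)$ for $Y \sim \mathrm{Bin}(n, \lambda/n)$ with $e^{-\lambda} \lambda^k / k!$ as $n$ grows. This is only a sketch for one illustrative choice, $\lambda = 2$ and $k = 3$ (assumptions made here), and it is a numerical illustration rather than a proof.

```python
from math import comb, exp, factorial

def binomial_pmf(k, n, p):
    """P(Y = k) for Y ~ Bin(n, p)."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam ** k / factorial(k)

lam, k = 2.0, 3                       # illustrative values only
ns = (10, 100, 1000, 10000)           # increasingly fine subdivisions of the unit interval

# Absolute error of the binomial approximation for each n.
errors = [abs(binomial_pmf(k, n, lam / n) - poisson_pmf(k, lam)) for n in ns]

for n, err in zip(ns, errors):
    print(n, binomial_pmf(k, n, lam / n), poisson_pmf(k, lam), err)
```

For these values the error shrinks each time $n$ is multiplied by 10, which is consistent with the limit described in the Remark.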



The three most important discrete distributions are the binomial, geometric and
Poisson distributions. Condense this section of the notes into half a page by
writing your own summary of their properties (probability mass function,
expectation, variance and when they occur). You could include the Bernoulli and
Hypergeometric distributions also if you like.

