Conditional Expectation
If X is a random variable, then the ordinary expected value EX represents our best
guess of the value of X when we have no prior information. But now suppose that
Y is a random variable related to X in some way, and that we know the value of Y.
Then one might expect that we can use that extra information to our advantage in
guessing where X will end up.
This is the idea behind the expectation of X conditional on Y , written E(X | Y ).
The first thing to notice is that the conditional expectation E(X | Y ) is actually a
random variable (that depends on Y ):
Imagine, for instance, that a dart is thrown at a dartboard and that X denotes the
point where it lands; with no further information, our best guess for X is the
centerpoint of the board. However, suppose now that every time a dart is thrown,
someone watching in the other room tells us whether the dart landed in the upper
or lower half of the board. Call this random variable Y, taking values in the set
{upper, lower}.
If we condition on Y , then we can improve our guess for the landing spot:
If Y = upper, then our best guess for X might be the centerpoint of the upper
half of the board: E(X | Y = upper) = C_upper.
If Y = lower, then our best guess for X might be the centerpoint of the lower
half of the board: E(X | Y = lower) = C_lower.
Thus E(X | Y), our conditional best guess for the landing spot of the dart
given Y, depends on whether Y = upper or Y = lower. And since Y is a
random variable, E(X | Y) is also a random variable.
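As a small numerical illustration of this point, the sketch below uses a hypothetical, one-dimensional stand-in for the dartboard with made-up probabilities; it computes E(X | Y = upper) and E(X | Y = lower) and shows that the value of E(X | Y) changes with the realized value of Y.

```python
# Toy one-dimensional stand-in for the dart example; the joint pmf is made up.
# X is a discretized landing height, Y reports which half of the board was hit.
joint = {  # (x, y): P(X = x, Y = y)
    (3, "upper"): 0.2, (4, "upper"): 0.3,
    (1, "lower"): 0.3, (2, "lower"): 0.2,
}

def cond_exp_X_given(y):
    """E(X | Y = y) = sum over x of x * P(X = x | Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)

print(cond_exp_X_given("upper"))  # about 3.6: the "centerpoint" of the upper half
print(cond_exp_X_given("lower"))  # about 1.4: the "centerpoint" of the lower half
# E(X | Y) is the random variable cond_exp_X_given(Y): its value depends on
# whether Y turns out to be "upper" or "lower".
```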
The next sections formalize the concept of conditional expectation and develop some
useful properties.
0.1 Joint, Marginal, and Conditional Probability Mass Functions
Let X, Y be discrete random variables taking values in the countable sets X and Y,
respectively, and having joint probability mass function

    p_{X,Y}(x, y) = P(X = x, Y = y).
By summing p_{X,Y}(x, y) over y ∈ Y we recover the marginal probability mass func-
tion p_X(x) of X. The marginal probability mass function p_Y(y) is obtained similarly,
by summing p_{X,Y}(x, y) over x ∈ X. The conditional probability mass function of
Y given X is defined by

    p_{Y|X}(y | x) = p_{X,Y}(x, y) / p_X(x) = P(Y = y | X = x).
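As a concrete sketch of these definitions (the joint pmf below is hypothetical, chosen only for illustration), one can compute the marginal and conditional pmfs directly from a table of joint probabilities:

```python
# Hypothetical joint pmf p_{X,Y}(x, y) on a small grid; the values sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.25, (1, 1): 0.15, (1, 2): 0.20,
}

# Marginal pmf of X: sum the joint pmf over y.
p_X = {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p

# Conditional pmf of Y given X = x: p(y | x) = p_{X,Y}(x, y) / p_X(x).
def p_Y_given_X(y, x):
    return joint.get((x, y), 0.0) / p_X[x]

print(p_X)                # about {0: 0.4, 1: 0.6}
print(p_Y_given_X(1, 0))  # 0.20 / 0.40 = 0.5
```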
Definition 1. The conditional expectation of Y given X = x is defined by

    E(Y | X = x) = ∑_{y ∈ Y} y p(y | x).    (1)

When the set Y is infinite, the sum in (1) may be infinite, or undefined. Formally, the
conditional expectation of Y given X = x is well-defined only when the integrability
condition

    ∑_{y ∈ Y} |y| p(y | x) < ∞,
which depends on x. The conditional expectation E(Y | X = x) depends on x
through p(y | x). The conditional expectation of Y given X = x is the mean or
central value of Y when X = x. If we cannot observe Y but are told that X = x,
then E(Y | X = x) would be a reasonable guess of the value of Y .
We can capture the dependence of the conditional expectation on x explicitly by
defining a function ψ : X → R via the equation

    ψ(x) = E(Y | X = x).    (2)

For each x ∈ X, the value ψ(x) is equal to the conditional expectation of Y given
that X = x.
Definition 2. Let ψ(x) be defined as in (2). The conditional expectation of
Y given X, denoted E(Y | X), is defined by the equation

    E(Y | X) = ψ(X).
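To make Definition 2 concrete, here is a minimal computational sketch, assuming a small made-up joint pmf: it builds the function ψ(x) = E(Y | X = x) and notes that E(Y | X) is simply ψ evaluated at the random variable X.

```python
# Hypothetical joint pmf p(x, y); the numbers are for illustration only.
joint = {(0, 10): 0.2, (0, 20): 0.2, (1, 10): 0.1, (1, 20): 0.5}

def psi(x):
    """psi(x) = E(Y | X = x) = sum over y of y * p(y | x)."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p / p_x for (xx, y), p in joint.items() if xx == x)

print(psi(0), psi(1))  # about 15.0 and 18.33
# E(Y | X) is the random variable psi(X): once X is observed to equal x,
# the realized value of E(Y | X) is psi(x).
```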
Fact 2. (Linearity I) If a, b ∈ R, then E(aY + b | X = x) = a E(Y | X = x) + b.
Proof. Let g(y) = ay + b. It follows from Fact 1 that
    E(aY + b | X = x) = ∑_{y ∈ Y} (ay + b) p(y | x)
                      = a ∑_{y ∈ Y} y p(y | x) + b ∑_{y ∈ Y} p(y | x)
                      = a ∑_{y ∈ Y} y p(y | x) + b
                      = a E(Y | X = x) + b.
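A quick numerical check of Fact 2, assuming a toy conditional pmf p(y | x) for one fixed value of x (the probabilities are made up):

```python
# Hypothetical conditional pmf p(y | x) for one fixed x; probabilities sum to 1.
p_y_given_x = {1: 0.3, 2: 0.5, 3: 0.2}
a, b = 4.0, -1.0

E_Y = sum(y * p for y, p in p_y_given_x.items())
E_aY_plus_b = sum((a * y + b) * p for y, p in p_y_given_x.items())

print(E_aY_plus_b, a * E_Y + b)  # both about 6.6, as Fact 2 asserts
```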
Fact 3. (Substitution rule) For any function g : X × Y → R,

    E[g(X, Y) | X = x0] = E[g(x0, Y) | X = x0].

Proof. Let Z = (X, Y). For x ∈ X and y ∈ Y,

    p_{Z|X}((x, y) | x0) = P(X = x, Y = y, X = x0) / P(X = x0)
                         = [P(X = x0, Y = y) / P(X = x0)] I(x = x0)
                         = p(y | x0) I(x = x0),
where I(·) denotes the indicator function, which is 1 if the condition in paren-
theses is satisfied, and 0 otherwise. It follows from Fact 1 that
    E[g(X, Y) | X = x0] = ∑_{x ∈ X, y ∈ Y} g(x, y) p_{Z|X}((x, y) | x0)
                        = ∑_{x ∈ X, y ∈ Y} g(x, y) p(y | x0) I(x = x0)
                        = ∑_{y ∈ Y} g(x0, y) p(y | x0)
                        = E[g(x0, Y) | X = x0],
as desired. Make sure that you follow each step of the argument in the previous
display.
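The substitution rule is also easy to check numerically; the sketch below assumes a small made-up joint pmf and an arbitrary function g:

```python
# Hypothetical joint pmf p(x, y); illustrative numbers only.
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}

def g(x, y):
    return x * y + y ** 2  # an arbitrary function of (x, y)

x0 = 1
p_x0 = sum(p for (x, _), p in joint.items() if x == x0)

# Left side: E[g(X, Y) | X = x0]; the conditional pmf of (X, Y) given X = x0
# puts mass only on pairs with x = x0.
lhs = sum(g(x, y) * p / p_x0 for (x, y), p in joint.items() if x == x0)
# Right side: E[g(x0, Y) | X = x0], with the first argument frozen at x0.
rhs = sum(g(x0, y) * p / p_x0 for (x, y), p in joint.items() if x == x0)

print(lhs, rhs)  # equal, as the substitution rule asserts
```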
Proof. The first claim follows from the substitution rule (Fact 3). Let ψ(x) =
E[Y | X = x]. Substituting X for x establishes the second claim.
Fact 5. (Taking out what is known) For any function h : X → R,
E[h(X) Y | X] = h(X) E(Y | X).
Interpretation: If we know the value of X, then the value of the function h(X) is
fixed: insofar as the conditional expectation given X is concerned, h(X) behaves
like a constant and can be placed outside the expectation. Note that h(X) E(Y | X)
is a random variable, and is clearly a function of X.
Proof. Let g(x, y) = h(x) y. Then

    E[g(X, Y) | X = x0] = E[g(x0, Y) | X = x0]
                        = E[h(x0) Y | X = x0]
                        = h(x0) E[Y | X = x0],

where the first equality follows from Fact 3, the second follows from the
definition of g(x, y), and the last follows from Fact 2 as h(x0) is a constant.
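A small numerical check of this property, again with an assumed toy joint pmf and a hypothetical known function h:

```python
# Hypothetical joint pmf p(x, y) and a known function h; numbers are made up.
joint = {(0, 1): 0.2, (0, 3): 0.2, (1, 1): 0.3, (1, 3): 0.3}

def h(x):
    return 2 * x + 5  # "known" once X is known

x0 = 0
p_x0 = sum(p for (x, _), p in joint.items() if x == x0)

E_hX_Y = sum(h(x) * y * p / p_x0 for (x, y), p in joint.items() if x == x0)
E_Y = sum(y * p / p_x0 for (x, y), p in joint.items() if x == x0)

print(E_hX_Y, h(x0) * E_Y)  # both about 10.0: h(X) factors out of the expectation
```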
Fact 6. (Law of total expectation)

    E[E(Y | X)] = EY.
Interpretation: The overall expectation EY can be found in two stages: (i) compute
the conditional expectation E(Y | X) which is a function of X; (ii) compute the
(ordinary) expectation of E(Y | X).
Proof. Let ψ(x) = E(Y | X = x). Then E(Y | X) = ψ(X) and

    E[E(Y | X)] = E[ψ(X)]                                  (3)
                = ∑_{x ∈ X} p(x) ψ(x)                      (4)
                = ∑_{x ∈ X} p(x) [∑_{y ∈ Y} y p(y | x)]    (5)
                = ∑_{x ∈ X} ∑_{y ∈ Y} p(x) p(y | x) y      (6)
                = ∑_{y ∈ Y} y [∑_{x ∈ X} p(x, y)]          (7)
                = ∑_{y ∈ Y} y p(y)                         (8)
                = EY.                                      (9)
Equation (3) follows from the definition of ψ(·), (4) follows from the formula
for the expectation of a function of a random variable, and (5) follows from
the definition of E(Y | X = x). Provide justifications for the remaining steps
above.
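The two-stage computation in Fact 6 can also be verified numerically; the sketch below assumes a small made-up joint pmf:

```python
# Hypothetical joint pmf p(x, y); illustrative numbers only.
joint = {(0, 1): 0.1, (0, 4): 0.3, (1, 1): 0.2, (1, 4): 0.4}
xs = {x for x, _ in joint}

def p_X(x):
    return sum(p for (xx, _), p in joint.items() if xx == x)

def psi(x):
    """psi(x) = E(Y | X = x)."""
    return sum(y * p / p_X(x) for (xx, y), p in joint.items() if xx == x)

# Stages (i) and (ii): E[E(Y | X)] = sum over x of p(x) * psi(x).
lhs = sum(p_X(x) * psi(x) for x in xs)
# Direct computation of EY from the joint pmf.
rhs = sum(y * p for (_, y), p in joint.items())

print(lhs, rhs)  # both about 3.1, as the law of total expectation asserts
```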
The notion of conditioning and conditional expectation extends readily from two to
any finite, or even infinite, collection of random variables. To take a representative
example, suppose that W, X, Y, Z are random variables taking values in W, X , Y, Z,
respectively, and having joint probability mass function p(w, x, y, z). Then for any
function g : Y × Z → R, the conditional expectation of g(Y, Z) given W = w, X = x
is given by

    ∑_{y ∈ Y, z ∈ Z} g(y, z) p(y, z | w, x),    (10)
provided that the sum is well-defined. The conditional probability mass function
p(y, z | w, x) is defined by
    p(y, z | w, x) = P(Y = y, Z = z | W = w, X = x) = p(w, x, y, z) / p(w, x).
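Computationally the recipe is unchanged: with an assumed toy joint pmf p(w, x, y, z), the conditional expectation in (10) is just a weighted sum over (y, z) pairs, as in the sketch below.

```python
# Hypothetical joint pmf p(w, x, y, z); illustrative numbers only, summing to 1.
joint = {
    (0, 0, 1, 1): 0.1, (0, 0, 1, 2): 0.2, (0, 0, 2, 1): 0.1,
    (0, 1, 1, 1): 0.2, (1, 0, 2, 2): 0.2, (1, 1, 1, 2): 0.2,
}

def g(y, z):
    return y + 2 * z  # an arbitrary function of (y, z)

def cond_exp_g(w0, x0):
    """E[g(Y, Z) | W = w0, X = x0], computed as in (10)."""
    p_wx = sum(p for (w, x, _, _), p in joint.items() if (w, x) == (w0, x0))
    return sum(g(y, z) * p / p_wx
               for (w, x, y, z), p in joint.items() if (w, x) == (w0, x0))

print(cond_exp_g(0, 0))  # about 4.25
```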
Some additional facts about conditional expectations are given below. We assume
that the integrability conditions hold in all cases.
(Linearity II)  E(Y + Z | X) = E(Y | X) + E(Z | X).
Proof. If X is independent of {Y, Z} then one may readily show that the
conditional probability mass function