Conditional Expectation
If X is a random variable, then the ordinary expected value EX represents our best
guess of the value of X when we have no prior information. But now suppose that
Y is a random variable related to X in some way, and that we know the value of Y.
Then one might expect that we can use that extra information to our advantage in
guessing where X will end up.
This is the idea behind the expectation of X conditional on Y , written E(X | Y ).
The first thing to notice is that the conditional expectation E(X | Y ) is actually a
random variable (that depends on Y ):
Imagine, for instance, that a dart is thrown at a dartboard and that X denotes the
point where it lands; with no further information, our best guess for X is the
centerpoint of the board. However, suppose now that every time a dart is thrown,
someone watching in the other room tells us whether the dart landed in the upper
or lower half of the board. Call this random variable Y, taking values in the set
{upper, lower}.
If we condition on Y , then we can improve our guess for the landing spot:
If Y = upper, then our best guess for X might be the centerpoint of the upper
half of the board: E(X | Y = upper) = C_upper.
If Y = lower, then our best guess for X might be the centerpoint of the lower
half of the board: E(X | Y = lower) = C_lower.
Thus E(X | Y), our conditional best guess for the landing spot of the dart
given Y, depends on whether Y = upper or Y = lower. And since Y is a
random variable, E(X | Y) is also a random variable.
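As a small numerical illustration of this point, the sketch below uses a hypothetical, one-dimensional stand-in for the dartboard with made-up probabilities; it computes E(X | Y = upper) and E(X | Y = lower) and shows that the value of E(X | Y) changes with the realized value of Y.

```python
# Toy one-dimensional stand-in for the dart example; the joint pmf is made up.
# X is a discretized landing height, Y reports which half of the board was hit.
joint = {  # (x, y): P(X = x, Y = y)
    (3, "upper"): 0.2, (4, "upper"): 0.3,
    (1, "lower"): 0.3, (2, "lower"): 0.2,
}

def cond_exp_X_given(y):
    """E(X | Y = y) = sum over x of x * P(X = x | Y = y)."""
    p_y = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(x * p / p_y for (x, yy), p in joint.items() if yy == y)

print(cond_exp_X_given("upper"))  # about 3.6: the "centerpoint" of the upper half
print(cond_exp_X_given("lower"))  # about 1.4: the "centerpoint" of the lower half
# E(X | Y) is the random variable cond_exp_X_given(Y): its value depends on
# whether Y turns out to be "upper" or "lower".
```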
The next sections formalize the concept of conditional expectation and develop some
useful properties.
0.1 Joint, Marginal, and Conditional Probability Mass Functions
Let X, Y be discrete random variables taking values in the countable sets X and Y,
respectively, and having joint probability mass function

    p_{X,Y}(x, y) = P(X = x, Y = y).
By summing p_{X,Y}(x, y) over y ∈ Y we recover the marginal probability mass func-
tion p_X(x) of X. The marginal probability mass function p_Y(y) is obtained similarly,
by summing p_{X,Y}(x, y) over x ∈ X. The conditional probability mass function of
Y given X is defined by

    p_{Y|X}(y | x) = p_{X,Y}(x, y) / p_X(x) = P(Y = y | X = x).
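As a concrete sketch of these definitions (the joint pmf below is hypothetical, chosen only for illustration), one can compute the marginal and conditional pmfs directly from a table of joint probabilities:

```python
# Hypothetical joint pmf p_{X,Y}(x, y) on a small grid; the values sum to 1.
joint = {
    (0, 0): 0.10, (0, 1): 0.20, (0, 2): 0.10,
    (1, 0): 0.25, (1, 1): 0.15, (1, 2): 0.20,
}

# Marginal pmf of X: sum the joint pmf over y.
p_X = {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, 0.0) + p

# Conditional pmf of Y given X = x: p(y | x) = p_{X,Y}(x, y) / p_X(x).
def p_Y_given_X(y, x):
    return joint.get((x, y), 0.0) / p_X[x]

print(p_X)                # about {0: 0.4, 1: 0.6}
print(p_Y_given_X(1, 0))  # 0.20 / 0.40 = 0.5
```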
Definition 1. The conditional expectation of Y given X = x is defined by

    E(Y | X = x) = ∑_{y ∈ Y} y p(y | x).    (1)

When the set Y is infinite, the sum in (1) may be infinite, or undefined. Formally, the
conditional expectation of Y given X = x is well-defined only when the integrability
condition

    ∑_{y ∈ Y} |y| p(y | x) < ∞,
which depends on x. The conditional expectation E(Y | X = x) depends on x
through p(y | x). The conditional expectation of Y given X = x is the mean or
central value of Y when X = x. If we cannot observe Y but are told that X = x,
then E(Y | X = x) would be a reasonable guess of the value of Y .
We can capture the dependence of the conditional expectation on x explicitly by
defining a function ψ : X → R via the equation

    ψ(x) = E(Y | X = x).    (2)

For each x ∈ X, the value ψ(x) is equal to the conditional expectation of Y given
that X = x.
Definition 2. Let ψ(x) be defined as in (2). The conditional expectation of
Y given X, denoted E(Y | X), is defined by the equation

    E(Y | X) = ψ(X).
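To make Definition 2 concrete, here is a minimal computational sketch, assuming a small made-up joint pmf: it builds the function ψ(x) = E(Y | X = x) and notes that E(Y | X) is simply ψ evaluated at the random variable X.

```python
# Hypothetical joint pmf p(x, y); the numbers are for illustration only.
joint = {(0, 10): 0.2, (0, 20): 0.2, (1, 10): 0.1, (1, 20): 0.5}

def psi(x):
    """psi(x) = E(Y | X = x) = sum over y of y * p(y | x)."""
    p_x = sum(p for (xx, _), p in joint.items() if xx == x)
    return sum(y * p / p_x for (xx, y), p in joint.items() if xx == x)

print(psi(0), psi(1))  # about 15.0 and 18.33
# E(Y | X) is the random variable psi(X): once X is observed to equal x,
# the realized value of E(Y | X) is psi(x).
```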
Fact 2. (Linearity I) If a, b ∈ R, then E(aY + b | X = x) = a E(Y | X = x) + b.
Proof. Let g(y) = ay + b. It follows from Fact 1 that
    E(aY + b | X = x) = ∑_{y ∈ Y} (ay + b) p(y | x)
                      = a ∑_{y ∈ Y} y p(y | x) + b ∑_{y ∈ Y} p(y | x)
                      = a ∑_{y ∈ Y} y p(y | x) + b
                      = a E(Y | X = x) + b.
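A quick numerical check of Fact 2, assuming a toy conditional pmf p(y | x) for one fixed value of x (the probabilities are made up):

```python
# Hypothetical conditional pmf p(y | x) for one fixed x; probabilities sum to 1.
p_y_given_x = {1: 0.3, 2: 0.5, 3: 0.2}
a, b = 4.0, -1.0

E_Y = sum(y * p for y, p in p_y_given_x.items())
E_aY_plus_b = sum((a * y + b) * p for y, p in p_y_given_x.items())

print(E_aY_plus_b, a * E_Y + b)  # both about 6.6, as Fact 2 asserts
```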
Fact 3. (Substitution rule) For any function g : X × Y → R,

    E[g(X, Y) | X = x0] = E[g(x0, Y) | X = x0].

Proof. Let Z = (X, Y). For x ∈ X and y ∈ Y,

    p_{Z|X}((x, y) | x0) = P(X = x, Y = y, X = x0) / P(X = x0)
                         = [P(X = x0, Y = y) / P(X = x0)] I(x = x0)
                         = p(y | x0) I(x = x0),
where I(·) denotes the indicator function, which is 1 if the condition in paren-
theses is satisfied, and 0 otherwise. It follows from Fact 1 that
    E[g(X, Y) | X = x0] = ∑_{x ∈ X, y ∈ Y} g(x, y) p_{Z|X}((x, y) | x0)
                        = ∑_{x ∈ X, y ∈ Y} g(x, y) p(y | x0) I(x = x0)
                        = ∑_{y ∈ Y} g(x0, y) p(y | x0)
                        = E[g(x0, Y) | X = x0],
as desired. Make sure that you follow each step of the argument in the previous
display.
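The substitution rule is also easy to check numerically; the sketch below assumes a small made-up joint pmf and an arbitrary function g:

```python
# Hypothetical joint pmf p(x, y); illustrative numbers only.
joint = {(0, 1): 0.1, (0, 2): 0.3, (1, 1): 0.4, (1, 2): 0.2}

def g(x, y):
    return x * y + y ** 2  # an arbitrary function of (x, y)

x0 = 1
p_x0 = sum(p for (x, _), p in joint.items() if x == x0)

# Left side: E[g(X, Y) | X = x0]; the conditional pmf of (X, Y) given X = x0
# puts mass only on pairs with x = x0.
lhs = sum(g(x, y) * p / p_x0 for (x, y), p in joint.items() if x == x0)
# Right side: E[g(x0, Y) | X = x0], with the first argument frozen at x0.
rhs = sum(g(x0, y) * p / p_x0 for (x, y), p in joint.items() if x == x0)

print(lhs, rhs)  # equal, as the substitution rule asserts
```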
Proof. The first claim follows from the substitution rule (Fact 3). Let ψ(x) =
E[Y | X = x]. Substituting X for x establishes the second claim.
Fact 5. (Taking out what is known) For any function h : X → R,
E[h(X) Y | X] = h(X) E(Y | X).
Interpretation: If we know the value of X, then the value of the function h(X) is
fixed: insofar as the conditional expectation given X is concerned, h(X) behaves
like a constant and can be placed outside the expectation. Note that h(X) E(Y | X)
is a random variable, and is clearly a function of X.
Proof. Let g(x, y) = h(x) y. Then

    E[g(X, Y) | X = x0] = E[g(x0, Y) | X = x0]
                        = E[h(x0) Y | X = x0]
                        = h(x0) E[Y | X = x0],

where the first equality follows from Fact 3, the second follows from the
definition of g(x, y), and the last follows from Fact 2 as h(x0) is a constant.
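A small numerical check of this property, again with an assumed toy joint pmf and a hypothetical known function h:

```python
# Hypothetical joint pmf p(x, y) and a known function h; numbers are made up.
joint = {(0, 1): 0.2, (0, 3): 0.2, (1, 1): 0.3, (1, 3): 0.3}

def h(x):
    return 2 * x + 5  # "known" once X is known

x0 = 0
p_x0 = sum(p for (x, _), p in joint.items() if x == x0)

E_hX_Y = sum(h(x) * y * p / p_x0 for (x, y), p in joint.items() if x == x0)
E_Y = sum(y * p / p_x0 for (x, y), p in joint.items() if x == x0)

print(E_hX_Y, h(x0) * E_Y)  # both about 10.0: h(X) factors out of the expectation
```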
Fact 6. (Law of total expectation)

    E[E(Y | X)] = EY.
Interpretation: The overall expectation EY can be found in two stages: (i) compute
the conditional expectation E(Y | X) which is a function of X; (ii) compute the
(ordinary) expectation of E(Y | X).
Proof. Let ψ(x) = E(Y | X = x). Then E(Y | X) = ψ(X) and

    E[E(Y | X)] = E[ψ(X)]                                  (3)
                = ∑_{x ∈ X} p(x) ψ(x)                      (4)
                = ∑_{x ∈ X} p(x) [∑_{y ∈ Y} y p(y | x)]    (5)
                = ∑_{x ∈ X} ∑_{y ∈ Y} p(x) p(y | x) y      (6)
                = ∑_{y ∈ Y} y [∑_{x ∈ X} p(x, y)]          (7)
                = ∑_{y ∈ Y} y p(y)                         (8)
                = EY.                                      (9)
Equation (3) follows from the definition of ψ(·), (4) follows from the formula
for the expectation of a function of a random variable, and (5) follows from
the definition of E(Y | X = x). Provide justifications for the remaining steps
above.
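The two-stage computation in Fact 6 can also be verified numerically; the sketch below assumes a small made-up joint pmf:

```python
# Hypothetical joint pmf p(x, y); illustrative numbers only.
joint = {(0, 1): 0.1, (0, 4): 0.3, (1, 1): 0.2, (1, 4): 0.4}
xs = {x for x, _ in joint}

def p_X(x):
    return sum(p for (xx, _), p in joint.items() if xx == x)

def psi(x):
    """psi(x) = E(Y | X = x)."""
    return sum(y * p / p_X(x) for (xx, y), p in joint.items() if xx == x)

# Stages (i) and (ii): E[E(Y | X)] = sum over x of p(x) * psi(x).
lhs = sum(p_X(x) * psi(x) for x in xs)
# Direct computation of EY from the joint pmf.
rhs = sum(y * p for (_, y), p in joint.items())

print(lhs, rhs)  # both about 3.1, as the law of total expectation asserts
```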
The notion of conditioning and conditional expectation extends readily from two to
any finite, or even infinite, collection of random variables. To take a representative
example, suppose that W, X, Y, Z are random variables taking values in W, X , Y, Z,
respectively, and having joint probability mass function p(w, x, y, z). Then for any
function g : Y × Z → R, the conditional expectation of g(Y, Z) given W = w, X = x
is given by

    ∑_{y ∈ Y, z ∈ Z} g(y, z) p(y, z | w, x),    (10)
provided that the sum is well-defined. The conditional probability mass function
p(y, z | w, x) is defined by
    p(y, z | w, x) = P(Y = y, Z = z | W = w, X = x) = p(w, x, y, z) / p(w, x).
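Computationally the recipe is unchanged: with an assumed toy joint pmf p(w, x, y, z), the conditional expectation in (10) is just a weighted sum over (y, z) pairs, as in the sketch below.

```python
# Hypothetical joint pmf p(w, x, y, z); illustrative numbers only, summing to 1.
joint = {
    (0, 0, 1, 1): 0.1, (0, 0, 1, 2): 0.2, (0, 0, 2, 1): 0.1,
    (0, 1, 1, 1): 0.2, (1, 0, 2, 2): 0.2, (1, 1, 1, 2): 0.2,
}

def g(y, z):
    return y + 2 * z  # an arbitrary function of (y, z)

def cond_exp_g(w0, x0):
    """E[g(Y, Z) | W = w0, X = x0], computed as in (10)."""
    p_wx = sum(p for (w, x, _, _), p in joint.items() if (w, x) == (w0, x0))
    return sum(g(y, z) * p / p_wx
               for (w, x, y, z), p in joint.items() if (w, x) == (w0, x0))

print(cond_exp_g(0, 0))  # about 4.25
```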
Some additional facts about conditional expectations are given below. We assume
that the integrability conditions hold in all cases.
(Linearity II)  E(Y + Z | X) = E(Y | X) + E(Z | X).
Proof. If X is independent of {Y, Z} then one may readily show that the
conditional probability mass function