Chapter 4: Multiple Random Variables

Chapter 4: Multiple Random Variables
We study the joint distribution of more than two random variables, called
a random vector, such that (X, Y ), (X, Y, Z), (X1 , · · · , Xn ), and the distri-
bution of their functions like X + Y , XY Z, or X1 + X2 + · · · + Xn .
1 Bivariate Random Variables

Assume both X and Y are random. We treat (X, Y ) as a two-dimensional
random vector and study their relationship.
1.1 Discrete Case

Assume that both X and Y are discrete random variables, with the sample
space X and Y respectively.
Joint pmf:
fX,Y (x, y) = P (X = x, Y = y), ∀x ∈ X , y ∈ Y.
Properties:
• fX,Y (x, y) ≥ 0;
P P
• x∈X y∈Y fX,Y (x, y) = 1.
The probability of a set A is given by

X
P ((X, Y ) ∈ A) = fX,Y (x, y).
(x,y)∈A
Marginal pmf: If the joint distribution of (X, Y ) is known, their marginal

pmf are X
fX (x) = P (X = x) = fX,Y (x, y).
y∈Y
X
fY (y) = P (Y = y) = fX,Y (x, y)
x∈X
80
Example 1 Two fair dice thrown. Let X=maximum, Y =sum.
Possible values:
X: 1, 2, 3, 4, 5, 6.
Y : 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
Can write the probabilities in a table.
Remark:
• Joint distribution determines the marginal distribution.
• Marginals do not determine the joint distribution.
Example: Define the joint pmf by

1 1
f (0, 0) = f (0, 1) = ; f (1, 0) = f (1, 1) = ; f (x, y) = 0, otherwise.
6 3
Consider another joint pmf by
1 5 3
f (0, 0) = ; f (1, 0) = ; f (0, 1) = f (1, 1) = ; f (x, y) = 0, otherwise.
12 12 12
They share the same marginal distributions, but not the same joint distri-
bution!
81
1.2 Continuous Case
Assume that both X and Y are continuous random variables.
Joint pdf: A function fX,Y (x, y) is called a joint probability density function
of (X, Y ) if
Z Z
P ((X, Y ) ∈ A) = f (x, y)dxdy, ∀A ∈ R2 .
(x,y)∈A
The joint pdf satisfies:

• f (x, y) ≥ 0,
R∞ R∞
• −∞ −∞ fX,Y (x, y)dxdy = 1.
Joint cdf: The joint distribution of (X, Y ) can be completely described

with their joint cdf
F (x, y) = P (X ≤ x, Y ≤ y), ∀(x, y) ∈ R2 .
Relationship between joint pdf and joint cdf: if F is differentiable with

respect to both x and y, then
Z y Z x
F (x, y) = fX,Y (u, v)dudv,
−∞ −∞
∂2
F (x, y) = f (x, y).
∂x∂y
Marginal pdf: If the joint pdf of (X, Y ) is given, the marginal pdfs of X
and Y are given by Z ∞
fX (x) = f (x, y)dy,
−∞
Z ∞
fY (y) = f (x, y)dx.
−∞
82
Review on Double
RR Integration:
Compute D f (x, y)dxdy using iterated integrals
83
Example. Check whether the following function a valid pdf
f (x, y) = ye−(x+y) I{0 < x < y}.
Example. Show that f (x, y) = 2I{0 ≤ x ≤ y ≤ 1} is a valid pdf.
84
Example. Assume f (x, y) = e−y I{0 < x < y}.
(i) Show that f (x, y) is a valid pdf.
(ii) What is the marginal distribution of X?
(iii) What is the marginal distribution of Y ?
(iv) Compute P (X + Y ≥ 1).
85
1.3 Expectation of Functions of Random Vector
Assume g is a real-valued function of two random variables g(X, Y ).
If X and Y are both discrete, then
XX
E(g(X, Y )) = g(x, y)fX,Y (x, y).
x∈X y∈Y
If X and Y are both continuous, then

Z ∞Z ∞
E(g(X, Y )) = g(x, y)fX,Y (x, y)dxdy.
−∞ −∞
Properties:
• E(aX + bY + c) = aE(X) + bE(Y ) + c.
• E(ag1 (X, Y ) + bg2 (X, Y ) + c) = aE(g1 (X, Y )) + bE(g2 (X, Y )) + c.
• In general, E(XY ) 6= E(X)E(Y ) unless X and Y are independent.
Joint mgf: MX,Y (t, s) = E(etX+sY ). Note
MX,Y (t, 0) = MX (t), MX,Y (0, s) = MY (s),

∂ k MX,Y (t, s) ∂ k MX,Y (t, s)
|(0,0) = E(X k ), |(0,0) = E(Y k ).
∂tk ∂sk
86
Discrete Example: Two fair dice thrown. Let X=maximum, Y =sum.
Compute E(XY ).
Ex. f (x, y) = e−y I{0 < x < y}. Compute E(X), E(Y ), E(XY ), MX,Y (t, s).
87
2 Conditional Distributions
Oftentimes (X, Y ) are related. For example, let X be a person’s height and
Y be a person’s weight. Knowledge about the value of X gives us some
information about the value of Y . It turns out conditional probabilities
of Y given knowledge of X can be computed from their joint distribution
fX,Y (x, y).
2.1 Discrete Case

Assume both X and Y are discrete. For any x such that P (X = x) > 0, the
conditional pmf of Y given X = x is defined as
P (X = x, Y = y)
fY |X (y|x) = P (Y = y|X = x) = , ∀y ∈ Y.
P (X = x)
We can define fX|Y (x|y) similarly.
Remark: The function f (y|x) is indeed a pmf, since for any fixed x it
satisfies
• fY |X (y|x) ≥ 0 for any y.

P
• y fY |X (y|x) = 1.
Proof:
Example. The two dice example, X=maximum, Y =sum.
fY |X (y|3).
fX|Y (x|7).
88
2.2 Continuous Case
Assume both X and Y are continuous. For any x such that fX (x) > 0, the
conditional pdf of Y given X = x is defined as
f (x, y)
fY |X (y|x) = , ∀y ∈ Y.
fX (x)
We can define fX|Y (x|y) similarly.
Remark: The function fY |X (y|x) is indeed a pdf, since for any fixed x it
satisfies
• fY |X (y|x) ≥ 0 for any y.

R
• y fY |X (y|x)dy = 1.
Example. Assume f (x, y) = e−y I{0 < x < y}. Compute fY |X (y|x).
89
2.3 Conditional Mean and Variance
For discrete random variables:
X
E(Y |X = x) = yfY |X (y|x),
y
X
Var(Y |X = x) = {y − E(Y |X = x)}2 fY |X (y|x).
y
For continuous random variables:

Z
E(Y |X = x) = yfY |X (y|x)dy,
Z
Var(Y |X = x) = {y − E(Y |X = x)}2 fY |X (y|x)dy.
Remark 1: As before, we have
Var(Y |X = x) = E(Y 2 |X = x) − {E(Y |X = x)}2 .
Example. Two dice example, X=max, Y =sum. Compute E(Y |X = 3).
Ex. f (x, y) = e−y I{0 < x < y}. Find E(Y |X = x) and Var(Y |X = x).
90
Remark 2: Note E(Y |X = x) is a function of x. Therefore, E(Y |X) is a
random variable as a function of X.
• E(g(X)|X) = g(X).
Theorem:
• Conditional Expectation Identity
E(Y ) = E(E(Y |X)).
• Conditional Variance Identity
Var(Y ) = E(Var(Y |X)) + Var(E(Y |X)).
Remark 3:
• E(g(X, Y )) = E(E(g(X, Y )|Y )) = E(E(g(X, Y )|X)).
• Conditional expectation as projection:
E(Y − E(Y |X))2 ≤ E(Y − g(X))2 , ∀ g function
So E(Y |X) is “closest” (in above sense) to Y among all the functions
of X.
91
3 Independence
Def: Let (X, Y ) be a bivariate random vector with joint pdf/pmf fX,Y (x, y),
Then X and Y are called independent random variables if for every x, y ∈ R
f (x, y) = fX (x)fY (y).
Example. Consider the discrete bivariate random vector (X, Y ) with joint
pmf given by
1 1 3
f (10, 1) = f (20, 1) = f (20, 2) = , f (10, 2) = f (10, 3) = , f (20, 3) = .
10 5 10
Are X and Y are independent?
Lemma. Let (X, Y ) be a bivariate random vector with joint pdf or pmf
fX,Y (x, y), Then X and Y are independent random variables if and only
if there exist functions g(x) and h(y) such that for every x ∈ R and y ∈ R,
f (x, y) = g(x)h(y).
In other words, the joint pmf/pdf is factorizable. (We do not need to com-
pute marginal pdfs).
Example. Consider the continuous bivariate random vector (X, Y ) with

joint pdf given by
1 2 4 −y−(x/2)
f (x, y) = x y e , x > 0, y > 0.
384
Are X and Y are independent?
92
Theorem: If X and Y are independent, then
(i) E(Y |X) = E(Y ).
(ii) The events {X ∈ A} and {Y ∈ B} are independent.
P (X ∈ A, Y ∈ B) = P (X ∈ A)P (Y ∈ B), ∀A ⊂ R, B ⊂ R.
(iii)
E(g(X)h(Y )) = E(g(X))E(h(Y ).
In particular, E(XY ) = E(X)E(Y ).
(iv) In addition, we have MX,Y (t, s) = E(etX+sY ) = MX (t)MY (s). And
MX+Y (t) = E(et(X+Y ) ) = MX (t)MY (t).
If it is easy to identify the right-hand side as the MGF of some standard

distribution, then the sum of two independent variables is easy to find.
Example 1. X ∼ Bin(n1 , p), Y ∼ Bin(n2 , p), and they are independent.
Example 2. X ∼ Poisson(λ1 ), Y ∼ Poisson(λ2 ), and they are independent.
93
Example 3. X ∼ NB(r1 , p), Y ∼ NB(r2 , p), and they are independent.
Example 4. X ∼ N(µ1 , σ12 ), Y ∼ N(µ2 , σ22 ), and they are independent.
Example 5. X ∼ Gamma(α1 , β), Y ∼ Gamma(α2 , β), and they are inde-

pendent.
94
4 Bivariate transformation
In this section, we only consider continuous bivariate random vector (X, Y ).
Consider the following bivariate transformation of (X, Y ):
U = g1 (X, Y ), V = g2 (X, Y ).
4.1 Transformation for Discrete Random Variables

Assume that (X, Y ) is a discrete bivariate random vector with the support
A, i.e. P (X = x, Y = y) > 0 on A. Consider the bivariate transformation
U = g1 (X, Y ), V = g2 (X, Y ).
The the support of (U, V ) is
B = {(u, v) : u = g1 (x, y), v = g2 (x, y) for some (x, y) ∈ A}.
For any (u, v) ∈ B, define A(u,v) = {(x, y) ∈ A : g1 (x, y) = u, g2 (x, y) = v}.

Then the joint pmf of (U, V ) is given by
X
fU,V (u, v) = P (U = u, V = v) = P ((X, Y ) ∈ A(u,v) ) = fX,Y (x, y).
(x,y)∈A(u,v)
Example 1: Assume X ∼ Poisson(λ) and Y ∼ Poisson(θ), and they are

independent. Find the joint pmf of (X + Y, Y ) and the marginal pmf of U .
95
4.2 One-to-One Transformation for Continuous Random Vari-
ables
Assume that g1 and g2 are continuous, differentiable, and one-to-one. There-
fore, we can define their inverse transformations as
X = h1 (U, V ), Y = h2 (U, V ).
Def: Jacobian matrix and determinant

∂x ∂y
J= ∂u ∂u .
∂x ∂y
∂v ∂v
is the Jacobian matrix and det(J) is the Jacobian determinant, or simply

the Jacobian.
Example 1. Linear transform.
U = X + Y, V = X − Y.
Example 2. Polar transform. Assume (X, Y ) ∈ R2 . Consider
x = r cos θ, y = r sin θ,
where r ∈ (0, ∞) and θ ∈ (0, 2π). How to express (r, θ) in terms of (x, y)?
96
Theorem. If fX,Y (x, y) is the joint density of (X, Y ), then
fU,V (u, v) = fX,Y (h1 (u, v), h2 (u, v))| det(J)|.
Proof follows from change of variable rules for integration — omitted.
Example. Assume X, Y ∼ N (0, 1) and they are independent. Let U =

X + Y, V = X − Y . Find the joint and marginal distributions of (U, V ).
Example. Polar transform of independent normals.
97
Example. Assume X ∼ Gamma(α1 , β) and Y ∼ Gamma(α2 , β), and they
X
are independent. Let U = X + Y, V = X+Y . Find the joint and marginal
distributions of (U, V )
Example. Assume X, Y ∼ N (0, 1) and they are independent. Let U =

X/Y . Find the distribution of U .
Example. Assume X ∼ Beta(α, β) and Y ∼ Beta(α + β, γ) and they are

independent. Find the distribution of XY .
98
4.3 Piecewise One-to-One Transformation
Assume (X, Y ) takes value from A = A0 ∪ A1 ∪ · · · ∪ Ak , where P ((X, Y ) ∈
A0 ) = 0. Also U = g1i (X, Y ), V = g2i (X, Y ) is one-to-one transformation
from Ai to B, for i = 1, · · · , k. Then
k
X
fU,V (u, v) = fX,Y (h1i (u, v), h2i (u, v))| det(Ji )|,
i=1
5 Hierarchical Mixtures.
Recall that
E(Y ) = E(E(Y |X)), var(Y ) = E(var(Y |X)) + var(E(Y |X)).
Example. (binomial-Poisson). An insect lays a large number of eggs, each

surviving with probability p. On the average, how many eggs will survive?
Let X=the number of eggs that survive
Let Y =the total number of eggs laid by the insect.
1. Describe their distributions.
2. Find the joint distribution of (X, Y ).
3. Find the marginal distribution of X, E(X) and Var(X).
99
Example: Assume X|Λ ∼ Poisson(Λ), Λ ∼ Gamma(α, β).
Example: Assume X|p ∼ Binomial(n,p), p ∼ Beta(α, β).
100
Example: binomial-Poisson-gamma (optional). Assume X|Y ∼ Bin(Y, p),
Y |Λ ∼ Poisson(Λ), Λ ∼Gamma(α, β).
6 Covariance and Correlation.

Covariance: A measure of joint variation.
Cov(X, Y ) = E[{X − E(X)}{Y − E(Y )}].
Note: the outer expectation is with respect to the joint distribution of

(X, Y ). And we have
Cov(X, Y ) = E(XY ) − E(X)E(Y ).
Correlation:
Cov(X, Y )
ρX,Y = .
σX σY
Remark: If X and Y are independent, then cov(X, Y ) = 0 and ρX,Y = 0.

But the converse is not true!
101
Example. X ∼ N (0, 1), Y = X 2 .
Example. X = X1 + X3 , Y = X2 + X3 , where X1 , X2 , X3 pairwise inde-

pendent with common variance σ 2 . Compute ρ.
102
1
Example. X ∼Unif(0,1), Z ∼Unif(0, 10 ) and they are independent. Let
Y = X + Z. Compute ρ.
Example. Y = X 2 + Z, where X, Z are independent, X symmetric about

0, Z any distribution. Compute Cov(X, Y ).
One Important Equation:

Var(aX + bY ) = a2 Var(X) + b2 Var(Y ) + 2abCov(X, Y ).
One Important Inequality: Cauchy-Schwarz Inequality

|Cov(X, Y )| ≤ σX σY
with equality iff X and Y are linearly related.
Corollary.
−1 ≤ ρX,Y ≤ 1.
And |ρX,Y | = 1 iff Y = aX + b wp 1, where a > 0 iff ρX,Y = 1 and a < 0 iff
ρX,Y = −1.
(Proofs can be found in the textbook and are omitted here.)
103
7 Bivariate Normal.
We say (X, Y ) ∼ BV N (µ1 , µ2 , σ12 , σ22 , ρ) if
1
f (x, y) = p
1 − ρ2 σ 1 σ 2
2π
" ( )#
x − µ1 2 y − µ2 2

1 x − µ1 y − µ2
× exp − − 2ρ + .
2(1 − ρ2 ) σ1 σ1 σ2 σ2
We will show that
X ∼ N (µ1 , σ12 ), Y ∼ N (µ2 , σ22 ), ρX,Y = ρ, aX + bY is normal.
104
Conditional distribution for Bivariate normal.
Suppose (X, Y ) ∼ BV N (µ1 , µ2 , σ12 , σ22 , ρ), then

σ1 2
X|Y = y ∼ N µ1 + ρ (y − µ2 ), σ1 (1 − ρ)
σ2

σ2 2
Y |X = x ∼ N µ2 + ρ (x − µ1 ), σ2 (1 − ρ)
σ1
8 Multivariate Distributions
Several variables (X1 , . . . , Xn ).
8.1 Discrete Case

Joint pmf
fX1 ,...,Xn (x1 , . . . , xn ) = P (X1 = x1 , . . . , Xn = xn ).
P
satisfying fX1 ,...,Xn (x1 , . . . , xn ) ≥ 0 and x1 ,...,xn fX1 ,...,Xn (x1 , . . . , xn ) = 1.
For any subset A of Rn , we have

X
P {(X1 , . . . , Xn ) ∈ A} = fX1 ,...,Xn (x1 , . . . , xn ).
(x1 ,...,xn )∈A
X X
Eg(X1 , . . . , Xn ) = ... g(x1 , . . . , xn )fX1 ,...,Xn (x1 , . . . , xn ).
Marginal distribution of (Xi1 , . . . , Xik ):

X
fXi1 ,...,Xik (xi1 , . . . , xik ) = fX1 ,...,Xn (x1 , . . . , xn ).
other indices
One-dimensional marginals:
X
fXi (xi ) = f (x1 , . . . , xn ).
x1 ,...,xi−1 ,xi+1 ,...,xn
Conditional distribution:
f (x1 , . . . , xn )
fXk+1 ,...,Xn |X1 ,...,Xk (xk+1 , . . . , xn |x1 , . . . , xk ) = .
fX1 ,...,Xk (x1 , . . . , xk )
Covariance: Cov(Xi , Xj ) — based on pairwise distribution.
105
Independence: X1 , · · · , Xn are called mutually independent random vari-
ables if their joint is the product of marginals:
n
Y
fX1 ,···,Xn (x1 , · · · , xn ) = fXi (xi ), ∀(x1 , · · · , xn )
i=1
Remark: If X1 , · · · , Xn are mutually independent, then

(1) Any pair Xi and Xj are pairwise independent.
(2) Functions g1 (X1 ), · · · , gn (Xn ) are independent, and
n
Y n
Y
E( gi (Xi )) = E(gi (Xi )).
i=1 i=1
(3) MGF is the product of individual MGF’s.

n
Y
MX1 ,···,Xn (t1 , · · · , tn ) = MXi (ti ).
i=1
(4) Let Z = X1 + · · · + Xn , then the mgf of Z is

n
Y
MZ (t) = MXi (t).
i=1
In particular, if X1 , · · · , Xn all have the same distribution with mgf MX (t),

then
MZ (t) = [MX (t)]n .
Applications:
(i) Sum of independent normals is normal. Mean, variance add up.
(ii) Sum of independent gammas with the same scale parameter is gamma
with the same scale and shape parameter added up. In particular, sum
of independent exponentials is gamma.
(iii) Sum of independent Poisson is Poisson with parameters added up.
(iv) Sum of independent geometric is negative binomial.
106
Multinomial distribution. n categories, and each item can be from one
and only one category. Sampling m times independently from the categories
with probabilities p1 , . . . , pn , where p1 + · · · pn = 1. Let Xi = the count of
the ith category. Let x1 , . . . , xn be non-negative integers adding up to m.
Then
m!
P (X1 = x1 , . . . , Xn = xn ) = px1 px2 · · · pxnn .
x1 !x2 ! · · · xn ! 1 2
Prob. add up to one, as they are the terms in expansion of (p1 + · · · + pn )m .
(i) marginals are (lower order) multinomial. One dimensional Xi ∼Bin(m, pi ).
(ii) Conditionals:
(iii) Merging: (X1 +X2 , X3 , · · · , Xn ) ∼ Multinomial(m; p1+p2, p3 , · · · , pn ).
(iv) V ar(Xi ) = mpi (1 − pi ) and Cov(Xi , Xj ) = −mpi pj for all i 6= j.
107
8.2 Continuous Case
Joint Rpdf of (X1 , . . . , Xn ): If fX1 ,...,Xn (x1 , . . . , xn ) satisfies fX1 ,...,Xn (x1 , . . . , xn ) ≥
0 and fX1 ,...,Xn (x1 , . . . , xn )dx1 · · · dxn = 1.
Probabilities are obtained by
Z
P {(X1 , . . . , Xn ) ∈ A} = fX1 ,...,Xn (x1 , . . . , xn )dx1 · · · dxn .
(x1 ,...,xn )∈A
Z Z
Eg(X1 , . . . , Xn ) = ... g(x1 , . . . , xn )fX1 ,...,Xn (x1 , . . . , xn )dx1 . . . dxn .
Marginal of (Xi1 , . . . , Xik ):

Z
fXi1 ,...,Xik (xi1 , . . . , xik ) = fX1 ,...,Xn (x1 , . . . , xn ).
other indices
One-dimensional marginals:
Z
fXi (xi ) = f (x1 , . . . , xn )dx1 · · · dxi−1 dxi+1 · · · dxn .
Conditional:
f (x1 , . . . , xn )
fXk+1 ,...,Xn |X1 ,...,Xk (xk+1 , . . . , xn |x1 , . . . , xk ) = .
fX1 ,...,Xk (x1 , . . . , xk )
Covariance: cov(Xi , Xj ) — based on pairwise distribution.

Independence: joint is the product of marginals. Equivalently, MGF is
the product of individual MGF’s.
Example 1. Uniform over the ball.

3
f (x1 , x2 , x3 ) = I{x21 + x22 + x23 < 1}.
4π
108
Example 2. Dirichlet.
f (x1 , . . . , xk−1 )
Γ(α1 + · · · + αk ) α1 −1 αk−1 −1 αk −1
= x · · · xk−1 xk I{xi > 0, x1 + · · · + xk = 1}.
Γ(α1 ) · · · Γ(αk ) 1
Properties:
(i) marginals are (lower order) Dirichlet. One dimensionals are beta.
(ii) Conditionals:
(iii) Merging of categories:
(iv) Covariances:
109
Example 3. Let n = 4 and the joint density of (X1 , X2 , X3 , X4 ) is
3
f(X1 ,X2 ,X3 ,X4 ) (x1 , x2 , x3 , x4 ) = (x21 +x22 +x23 +x24 ), if 0 < xi < 1, i = 1, 2, 3, 4;
4
and = 0 otherwise.
(i) Show that this is a valid pdf.
(ii) Compute P (X1 < 12 , X2 < 43 , X4 > 12 )
(iii) Obtain the marginal pdf of (X1 , X2 ).
1
(iv) Find the conditional pdf of (X3 , X4 ) given X1 = 3 and X2 = 23 .
(v) Compute E(X1 X2 ).
110
8.3 Multivariate Transformation
Let (X1 , · · · , Xn ) be a random vector with pdf fX1 ,···,Xn (x1 , · · · , xn ). Let
A = {x : fX (x) > 0}. A new random vector (U1 , · · · , Un ) is defined by
U1 = g1 (X1 , · · · , Xn ),
U2 = g2 (X1 , · · · , Xn ),
··· ···
Un = gn (X1 , · · · , Xn ).
The transformation is one-to-one from A onto B. The inverse of gi ’s are
X1 = h1 (U1 , · · · , Un ),
X2 = h2 (U1 , · · · , Un ),
··· ···
Xn = hn (U1 , · · · , Un ).
Let J be the Jacobian from the inverse. The joint pdf of U1 , · · · , Un is then
fU1 ,···,Un (u1 , · · · , un ) = fX1 ,···,Xn (h1 (u1 , · · · , un ), · · · , hn (u1 , · · · , un ))|J|.
Example: Let (X1 , X2 , X3 , X4 ) have the joint pdf
fX1 ,X2 ,X3 ,X4 (x1 , x2 , x3 , x4 ) = 24e−x1 −x2 −x3 −x4 , 0 < x1 < x2 < x3 < x4 < ∞.
Consider the transformation
U1 = X1 , U2 = X 2 − X 1 , U3 = X 3 − X 2 , U4 = X 4 − X 3 .
111
Example 3. Multivariate normal.
Joint density of Y = (Y1 , · · · , Yn ):

1 1 T −1
fY (y1 , . . . , yn ) = exp − (y − µ) Σ (y − µ) .
(2π)n/2 (det(Σ))1/2 2
Generation of multivariate normal N (µ, Σ):

(1)Let X1 , · · · , Xn be iid N (0, 1).
(2) Write X = (X1 , . . . , Xn )0 and let
Y = AX + µ,
where Σ = AAT (i.e. the Cholesky decomposition). Then Y is multivariate

normal with the above density function, with E(Y ) = µ and Var(Y ) = Σ.
Marginals and conditionals are also (multivariate) normal.
112
9 Some Useful Inequalities.
a. Cauchy-Schwarz
(E(XY ))2 ≤ E(X 2 )E(Y 2 ).
b. Hölder
|E(XY )| ≤ (E(|X|p ))1/p (E(|X|q ))1/q ,

where p−1 + q −1 = 1.
c. Minkowski
(E(|X + Y |p ))1/p ≤ (E(|X|p ))1/p + (E(|Y |p ))1/p ,

for p ≥ 1.
d. Jensen
A function ψ is called convex if ψ(at + (1 − a)s) ≤ aψ(t) + (1 − a)ψ(s).

For any convex function ψ,
E(ψ(X)) ≥ ψ(E(X)).
113

Chapter 4: Multiple Random Variables

Uploaded by

Copyright:

Available Formats

Chapter 4: Multiple Random Variables

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 4: Multiple Random Variables

Uploaded by

Copyright:

Available Formats

Chapter 4: Multiple Random Variables

1 Bivariate Random Variables

1.1 Discrete Case

fX,Y (x, y) = P (X = x, Y = y), ∀x ∈ X , y ∈ Y.

The probability of a set A is given by

Marginal pmf: If the joint distribution of (X, Y ) is known, their marginal

Example: Define the joint pmf by

The joint pdf satisfies:

Joint cdf: The joint distribution of (X, Y ) can be completely described

F (x, y) = P (X ≤ x, Y ≤ y), ∀(x, y) ∈ R2 .

Relationship between joint pdf and joint cdf: if F is differentiable with

f (x, y) = ye−(x+y) I{0 < x < y}.

Example. Show that f (x, y) = 2I{0 ≤ x ≤ y ≤ 1} is a valid pdf.

(i) Show that f (x, y) is a valid pdf.

(ii) What is the marginal distribution of X?

(iii) What is the marginal distribution of Y ?

(iv) Compute P (X + Y ≥ 1).

If X and Y are both continuous, then

• E(aX + bY + c) = aE(X) + bE(Y ) + c.

• E(ag1 (X, Y ) + bg2 (X, Y ) + c) = aE(g1 (X, Y )) + bE(g2 (X, Y )) + c.

• In general, E(XY ) 6= E(X)E(Y ) unless X and Y are independent.

Joint mgf: MX,Y (t, s) = E(etX+sY ). Note

MX,Y (t, 0) = MX (t), MX,Y (0, s) = MY (s),

2.1 Discrete Case

We can define fX|Y (x|y) similarly.

• fY |X (y|x) ≥ 0 for any y.

Example. The two dice example, X=maximum, Y =sum.

We can define fX|Y (x|y) similarly.

• fY |X (y|x) ≥ 0 for any y.

For continuous random variables:

Remark 1: As before, we have

Var(Y |X = x) = E(Y 2 |X = x) − {E(Y |X = x)}2 .

Example. Two dice example, X=max, Y =sum. Compute E(Y |X = 3).

• Conditional Expectation Identity

E(Y ) = E(E(Y |X)).

• Conditional Variance Identity

Var(Y ) = E(Var(Y |X)) + Var(E(Y |X)).

• E(g(X, Y )) = E(E(g(X, Y )|Y )) = E(E(g(X, Y )|X)).

• Conditional expectation as projection:

E(Y − E(Y |X))2 ≤ E(Y − g(X))2 , ∀ g function

f (x, y) = fX (x)fY (y).

Example. Consider the continuous bivariate random vector (X, Y ) with

(i) E(Y |X) = E(Y ).

(ii) The events {X ∈ A} and {Y ∈ B} are independent.

(iv) In addition, we have MX,Y (t, s) = E(etX+sY ) = MX (t)MY (s). And

MX+Y (t) = E(et(X+Y ) ) = MX (t)MY (t).

If it is easy to identify the right-hand side as the MGF of some standard

Example 1. X ∼ Bin(n1 , p), Y ∼ Bin(n2 , p), and they are independent.

Example 2. X ∼ Poisson(λ1 ), Y ∼ Poisson(λ2 ), and they are independent.

Example 4. X ∼ N(µ1 , σ12 ), Y ∼ N(µ2 , σ22 ), and they are independent.

Example 5. X ∼ Gamma(α1 , β), Y ∼ Gamma(α2 , β), and they are inde-

4.1 Transformation for Discrete Random Variables

The the support of (U, V ) is

B = {(u, v) : u = g1 (x, y), v = g2 (x, y) for some (x, y) ∈ A}.

For any (u, v) ∈ B, define A(u,v) = {(x, y) ∈ A : g1 (x, y) = u, g2 (x, y) = v}.

Example 1: Assume X ∼ Poisson(λ) and Y ∼ Poisson(θ), and they are

Def: Jacobian matrix and determinant

is the Jacobian matrix and det(J) is the Jacobian determinant, or simply

Example 1. Linear transform.