Chapter 4: Multiple Random Variables
We study the joint distribution of two or more random variables, called a random vector, such as (X, Y), (X, Y, Z), or (X_1, ⋯, X_n), and the distribution of functions of them, like X + Y, XYZ, or X_1 + X_2 + ⋯ + X_n.
1.1 Discrete Case
Joint pmf: For discrete random variables X and Y, the joint pmf of (X, Y) is f_{X,Y}(x, y) = P(X = x, Y = y).
Properties:
• f_{X,Y}(x, y) ≥ 0;
• ∑_{x∈X} ∑_{y∈Y} f_{X,Y}(x, y) = 1.
Example 1. Two fair dice are thrown. Let X = maximum, Y = sum.
Possible values:
X: 1, 2, 3, 4, 5, 6.
Y : 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12.
The probabilities can be written in a table; a quick enumeration is sketched below.
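The following is a small enumeration sketch (my own, not part of the original notes; standard-library Python) that tabulates this joint pmf and its X-marginal:

from collections import Counter
from fractions import Fraction
from itertools import product

# Enumerate the 36 equally likely outcomes of two fair dice and tabulate
# the joint pmf of X = maximum and Y = sum.
joint = Counter()
for d1, d2 in product(range(1, 7), repeat=2):
    joint[(max(d1, d2), d1 + d2)] += Fraction(1, 36)

# Marginal pmf of X, obtained by summing the joint pmf over y.
fX = Counter()
for (x, y), p in joint.items():
    fX[x] += p

print(joint[(3, 5)])   # 1/18: the outcomes (2,3) and (3,2)
print(dict(fX))        # f_X(x) = (2x - 1)/36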
Remark:
• The joint distribution determines the marginal distributions.
• The marginals do not determine the joint distribution.
1.2 Continuous Case
Assume that both X and Y are continuous random variables.
Joint pdf: A function f_{X,Y}(x, y) is called a joint probability density function of (X, Y) if

P((X, Y) ∈ A) = ∬_{(x,y)∈A} f(x, y) dx dy, ∀A ⊂ R^2.
The joint pdf can be obtained from the joint cdf F(x, y) = P(X ≤ x, Y ≤ y) by

∂^2/∂x∂y F(x, y) = f(x, y).
Marginal pdf: If the joint pdf of (X, Y) is given, the marginal pdfs of X and Y are given by

f_X(x) = ∫_{−∞}^{∞} f(x, y) dy,

f_Y(y) = ∫_{−∞}^{∞} f(x, y) dx.
Review on Double Integration: Compute ∬_D f(x, y) dx dy using iterated integrals.
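As a worked line (my own refresher, using the region that shows up in the examples below): for D = {(x, y) : 0 < x < y}, either order of integration may be used,

∬_D f(x, y) dx dy = ∫_0^∞ ∫_x^∞ f(x, y) dy dx = ∫_0^∞ ∫_0^y f(x, y) dx dy.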
Example. Check whether the following function is a valid pdf.
Example. Assume f(x, y) = e^{−y} I{0 < x < y}.
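A minimal sympy check (my own sketch, not from the notes) that this f integrates to one, plus a computation of its marginals:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = sp.exp(-y)  # the density on the region 0 < x < y

# Total mass: integrate x out over (0, y), then y over (0, oo).
total = sp.integrate(f, (x, 0, y), (y, 0, sp.oo))
fX = sp.integrate(f, (y, x, sp.oo))  # marginal of X
fY = sp.integrate(f, (x, 0, y))      # marginal of Y
print(total, fX, fY)                 # 1, exp(-x), y*exp(-y)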
1.3 Expectation of Functions of a Random Vector
Assume g is a real-valued function of two random variables, g(X, Y).
If X and Y are both discrete, then

E(g(X, Y)) = ∑_{x∈X} ∑_{y∈Y} g(x, y) f_{X,Y}(x, y).

If X and Y are jointly continuous, the double sum is replaced by the double integral E(g(X, Y)) = ∬ g(x, y) f_{X,Y}(x, y) dx dy.
Properties:
• Linearity: E(a g_1(X, Y) + b g_2(X, Y) + c) = a E(g_1(X, Y)) + b E(g_2(X, Y)) + c.
Discrete Example: Two fair dice are thrown. Let X = maximum, Y = sum.
Compute E(XY).
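A direct enumeration sketch (my own) for this expectation, using E g(X, Y) = ∑∑ g(x, y) f(x, y):

from fractions import Fraction
from itertools import product

# Average max(d1, d2) * (d1 + d2) over the 36 equally likely outcomes.
exy = sum(Fraction(max(d1, d2) * (d1 + d2), 36)
          for d1, d2 in product(range(1, 7), repeat=2))
print(exy)  # 308/9, about 34.22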
Ex. f(x, y) = e^{−y} I{0 < x < y}. Compute E(X), E(Y), E(XY), and the joint mgf M_{X,Y}(t, s).
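A sympy sketch (my own) for the same quantities; the mgf exists for t + s < 1, and the negative-symbol assumptions below keep sympy in the convergent region:

import sympy as sp

x, y = sp.symbols('x y', positive=True)
t, s = sp.symbols('t s', negative=True)
f = sp.exp(-y)  # density on 0 < x < y

EX = sp.integrate(x * f, (x, 0, y), (y, 0, sp.oo))        # 1
EY = sp.integrate(y * f, (x, 0, y), (y, 0, sp.oo))        # 2
EXY = sp.integrate(x * y * f, (x, 0, y), (y, 0, sp.oo))   # 3
M = sp.integrate(sp.exp(t*x + s*y) * f, (x, 0, y), (y, 0, sp.oo))
print(EX, EY, EXY, sp.simplify(M))  # M = 1/((1 - s)*(1 - s - t))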
2 Conditional Distributions
Oftentimes (X, Y ) are related. For example, let X be a person’s height and
Y be a person’s weight. Knowledge about the value of X gives us some
information about the value of Y . It turns out conditional probabilities
of Y given knowledge of X can be computed from their joint distribution
fX,Y (x, y).
2.1 Discrete Case
For discrete (X, Y), the conditional pmf of Y given X = x is

f_{Y|X}(y|x) = P(Y = y | X = x) = P(X = x, Y = y) / P(X = x), ∀y ∈ Y.
Remark: The function f_{Y|X}(y|x) is indeed a pmf, since for any fixed x it satisfies f_{Y|X}(y|x) ≥ 0 and ∑_y f_{Y|X}(y|x) = 1.
Proof: ∑_y f_{Y|X}(y|x) = ∑_y f(x, y)/f_X(x) = f_X(x)/f_X(x) = 1.
Example (dice, continued): compute f_{Y|X}(y|3) and f_{X|Y}(x|7).
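A quick enumeration sketch (my own) of both conditional pmfs from the joint table of the dice example:

from collections import Counter
from fractions import Fraction
from itertools import product

joint = Counter()
for d1, d2 in product(range(1, 7), repeat=2):
    joint[(max(d1, d2), d1 + d2)] += Fraction(1, 36)

# Condition on X = 3: divide the x = 3 slice of the joint pmf by f_X(3).
fX3 = sum(p for (x, y), p in joint.items() if x == 3)
print({y: p / fX3 for (x, y), p in joint.items() if x == 3})
# f_{Y|X}(y|3): y = 4 and y = 5 get 2/5 each, y = 6 gets 1/5

# Condition on Y = 7: divide the y = 7 slice by f_Y(7).
fY7 = sum(p for (x, y), p in joint.items() if y == 7)
print({x: p / fY7 for (x, y), p in joint.items() if y == 7})
# f_{X|Y}(x|7) = 1/3 for x = 4, 5, 6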
2.2 Continuous Case
Assume both X and Y are continuous. For any x such that fX (x) > 0, the
conditional pdf of Y given X = x is defined as
f_{Y|X}(y|x) = f(x, y) / f_X(x), ∀y ∈ Y.
Remark: The function f_{Y|X}(y|x) is indeed a pdf, since for any fixed x it satisfies f_{Y|X}(y|x) ≥ 0 and ∫_{−∞}^{∞} f_{Y|X}(y|x) dy = 1.
Example. Assume f(x, y) = e^{−y} I{0 < x < y}. Compute f_{Y|X}(y|x).
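A worked line (my own, using the marginal f_X(x) = e^{−x} computed earlier): for y > x,

f_{Y|X}(y|x) = f(x, y)/f_X(x) = e^{−y}/e^{−x} = e^{−(y−x)},

i.e., given X = x, the excess Y − x is Exponential(1).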
2.3 Conditional Mean and Variance
For discrete random variables:

E(Y | X = x) = ∑_y y f_{Y|X}(y|x),

Var(Y | X = x) = ∑_y {y − E(Y | X = x)}^2 f_{Y|X}(y|x).

For continuous random variables, the sums are replaced by integrals.
Ex. f(x, y) = e^{−y} I{0 < x < y}. Find E(Y | X = x) and Var(Y | X = x).
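A sketch of the answer (my own, from the conditional pdf found above): since Y − x given X = x is Exponential(1),

E(Y | X = x) = x + 1,  Var(Y | X = x) = 1.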
Remark 2: Note E(Y |X = x) is a function of x. Therefore, E(Y |X) is a
random variable as a function of X.
• E(g(X)|X) = g(X).
Theorem (iterated expectation and conditional variance): E{E(Y|X)} = E(Y), and Var(Y) = E{Var(Y|X)} + Var{E(Y|X)}.
Remark 3: For any function g, E{(Y − g(X))^2} ≥ E{(Y − E(Y|X))^2}.
So E(Y|X) is “closest” (in the above mean-squared-error sense) to Y among all functions of X.
3 Independence
Def: Let (X, Y) be a bivariate random vector with joint pdf/pmf f_{X,Y}(x, y). Then X and Y are called independent random variables if for every x, y ∈ R,

f_{X,Y}(x, y) = f_X(x) f_Y(y).
Example. Consider the discrete bivariate random vector (X, Y) with joint pmf given by

f(10, 1) = f(20, 1) = f(20, 2) = 1/10, f(10, 2) = f(10, 3) = 1/5, f(20, 3) = 3/10.

Are X and Y independent?
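A brute-force check (my own sketch) that compares f(x, y) with f_X(x) f_Y(y) at every support point:

from fractions import Fraction as F

f = {(10, 1): F(1, 10), (20, 1): F(1, 10), (20, 2): F(1, 10),
     (10, 2): F(1, 5), (10, 3): F(1, 5), (20, 3): F(3, 10)}
fX = {x: sum(p for (a, b), p in f.items() if a == x) for x in (10, 20)}
fY = {y: sum(p for (a, b), p in f.items() if b == y) for y in (1, 2, 3)}
print(all(f[(x, y)] == fX[x] * fY[y] for (x, y) in f))
# False: e.g. f(10, 2) = 1/5 but f_X(10) f_Y(2) = (1/2)(3/10) = 3/20,
# so X and Y are not independent.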
Lemma. Let (X, Y) be a bivariate random vector with joint pdf or pmf f_{X,Y}(x, y). Then X and Y are independent random variables if and only if there exist functions g(x) and h(y) such that for every x ∈ R and y ∈ R,

f(x, y) = g(x) h(y).

In other words, the joint pmf/pdf is factorizable (we do not need to compute the marginal pdfs). For example, f(x, y) = e^{−y} I{0 < x < y} does not factor, since the indicator ties x and y together, so there X and Y are dependent.
Theorem: If X and Y are independent, then
(i) P(X ∈ A, Y ∈ B) = P(X ∈ A) P(Y ∈ B), ∀A ⊂ R, B ⊂ R;
(ii) for any functions g and h, the random variables g(X) and h(Y) are independent;
(iii) E(g(X)h(Y)) = E(g(X)) E(h(Y)).
In particular, E(XY) = E(X)E(Y), and taking g(x) = e^{tx}, h(y) = e^{ty} in (iii) gives M_{X+Y}(t) = M_X(t) M_Y(t).
Example 3. X ∼ NB(r_1, p), Y ∼ NB(r_2, p), and they are independent. Find the distribution of X + Y.
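A sketch of the standard mgf argument (my own line, under the parametrization in which NB(r, p) counts failures before the r-th success, so M(t) = {p/(1 − (1 − p)e^t)}^r for (1 − p)e^t < 1):

M_{X+Y}(t) = M_X(t) M_Y(t) = {p/(1 − (1 − p)e^t)}^{r_1 + r_2},

which is the NB(r_1 + r_2, p) mgf, so X + Y ∼ NB(r_1 + r_2, p); the same conclusion holds under the number-of-trials parametrization.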
4 Bivariate Transformation
In this section, we only consider a continuous bivariate random vector (X, Y).
Consider the following bivariate transformation of (X, Y ):
U = g_1(X, Y), V = g_2(X, Y).
4.2 One-to-One Transformations for Continuous Random Variables
Assume that g1 and g2 are continuous, differentiable, and one-to-one. There-
fore, we can define their inverse transformations as
X = h1 (U, V ), Y = h2 (U, V ).
Example. U = X + Y, V = X − Y. Then the inverse transformation is X = (U + V)/2, Y = (U − V)/2.
Example (polar coordinates). x = r cos θ, y = r sin θ,
where r ∈ (0, ∞) and θ ∈ (0, 2π). How can (r, θ) be expressed in terms of (x, y)?
Theorem. If f_{X,Y}(x, y) is the joint density of (X, Y), then

f_{U,V}(u, v) = f_{X,Y}(h_1(u, v), h_2(u, v)) |det J|,

where J = ∂(x, y)/∂(u, v) is the Jacobian matrix of the inverse transformation.
Example. Assume X ∼ Gamma(α_1, β) and Y ∼ Gamma(α_2, β), and they are independent. Let U = X + Y, V = X/(X + Y). Find the joint and marginal distributions of (U, V).
4.3 Piecewise One-to-One Transformation
Assume (X, Y ) takes value from A = A0 ∪ A1 ∪ · · · ∪ Ak , where P ((X, Y ) ∈
A0 ) = 0. Also U = g1i (X, Y ), V = g2i (X, Y ) is one-to-one transformation
from Ai to B, for i = 1, · · · , k. Then
k
X
fU,V (u, v) = fX,Y (h1i (u, v), h2i (u, v))| det(Ji )|,
i=1
5 Hierarchical Mixtures.
Recall that E(X) = E{E(X|Y)} and Var(X) = E{Var(X|Y)} + Var{E(X|Y)}.
Example: Assume X|Λ ∼ Poisson(Λ), Λ ∼ Gamma(α, β). Find E(X) and Var(X).
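A worked sketch (my own, taking β as the scale parameter, so E(Λ) = αβ and Var(Λ) = αβ^2):

E(X) = E{E(X|Λ)} = E(Λ) = αβ,
Var(X) = E{Var(X|Λ)} + Var{E(X|Λ)} = E(Λ) + Var(Λ) = αβ + αβ^2,

and marginally X follows a negative binomial distribution.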
Example: binomial-Poisson-gamma (optional). Assume X|Y ∼ Bin(Y, p), Y|Λ ∼ Poisson(Λ), Λ ∼ Gamma(α, β).
6 Covariance and Correlation.
Covariance: Cov(X, Y) = E{(X − E X)(Y − E Y)} = E(XY) − E(X)E(Y).
Correlation:

ρ_{X,Y} = Cov(X, Y) / (σ_X σ_Y).
Example. X ∼ N(0, 1), Y = X^2. Compute ρ_{X,Y}.
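A one-line computation (my own): since E(X) = E(X^3) = 0 for X ∼ N(0, 1),

Cov(X, Y) = E(X^3) − E(X)E(X^2) = 0, so ρ_{X,Y} = 0,

even though Y is a deterministic function of X: zero correlation does not imply independence.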
Example. X ∼ Unif(0, 1), Z ∼ Unif(0, 1/10), and they are independent. Let Y = X + Z. Compute ρ_{X,Y}.
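A worked sketch (my own): by independence, Cov(X, Y) = Cov(X, X + Z) = Var(X) = 1/12 and Var(Y) = Var(X) + Var(Z) = 1/12 + 1/1200, so

ρ_{X,Y} = (1/12) / √{(1/12)(1/12 + 1/1200)} = √(100/101) ≈ 0.995.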
Corollary. −1 ≤ ρ_{X,Y} ≤ 1, and |ρ_{X,Y}| = 1 iff Y = aX + b with probability 1, where a > 0 iff ρ_{X,Y} = 1 and a < 0 iff ρ_{X,Y} = −1.
(Proofs can be found in the textbook and are omitted here.)
7 Bivariate Normal.
We say (X, Y ) ∼ BV N (µ1 , µ2 , σ12 , σ22 , ρ) if
f(x, y) = 1/(2πσ_1σ_2√(1 − ρ^2)) × exp[ −1/(2(1 − ρ^2)) { ((x − μ_1)/σ_1)^2 − 2ρ((x − μ_1)/σ_1)((y − μ_2)/σ_2) + ((y − μ_2)/σ_2)^2 } ].
Conditional distribution for Bivariate normal.
Suppose (X, Y) ∼ BVN(μ_1, μ_2, σ_1^2, σ_2^2, ρ). Then

X | Y = y ∼ N( μ_1 + ρ(σ_1/σ_2)(y − μ_2), σ_1^2(1 − ρ^2) ),

Y | X = x ∼ N( μ_2 + ρ(σ_2/σ_1)(x − μ_1), σ_2^2(1 − ρ^2) ).
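A simulation sketch (my own check): build a bivariate normal pair by the standard construction and compare the empirical conditional moments of Y near X = x_0 with the formulas above.

import numpy as np

rng = np.random.default_rng(1)
m1, m2, s1, s2, rho = 1.0, -2.0, 2.0, 0.5, 0.6
z1, z2 = rng.standard_normal((2, 1_000_000))
xs = m1 + s1 * z1
ys = m2 + s2 * (rho * z1 + np.sqrt(1 - rho**2) * z2)  # correlation rho with xs

x0 = 2.0
sel = ys[np.abs(xs - x0) < 0.05]                      # condition on X ≈ x0
print(sel.mean(), m2 + rho * (s2 / s1) * (x0 - m1))   # both ~ -1.85
print(sel.var(), s2**2 * (1 - rho**2))                # both ~ 0.16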
8 Multivariate Distributions
Several variables (X1 , . . . , Xn ).
Conditional distribution:

f_{X_{k+1},⋯,X_n | X_1,⋯,X_k}(x_{k+1}, ⋯, x_n | x_1, ⋯, x_k) = f(x_1, ⋯, x_n) / f_{X_1,⋯,X_k}(x_1, ⋯, x_k).
Independence: X_1, ⋯, X_n are called mutually independent random variables if their joint pmf/pdf is the product of the marginals:

f_{X_1,⋯,X_n}(x_1, ⋯, x_n) = ∏_{i=1}^{n} f_{X_i}(x_i), ∀(x_1, ⋯, x_n).
Applications:
(i) A sum of independent normals is normal; the means and variances add up.
(ii) A sum of independent gammas with the same scale parameter is gamma with the same scale and the shape parameters added up (see the mgf line below). In particular, a sum of independent exponentials is gamma.
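For (ii), a one-line mgf argument (my own worked line, with β the common scale, so M_{Gamma(α,β)}(t) = (1 − βt)^{−α} for t < 1/β):

M_{X+Y}(t) = (1 − βt)^{−α_1} (1 − βt)^{−α_2} = (1 − βt)^{−(α_1 + α_2)},

so X + Y ∼ Gamma(α_1 + α_2, β).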
Multinomial distribution. There are n categories, and each item can be from one and only one category. Sample m times independently from the categories with probabilities p_1, ⋯, p_n, where p_1 + ⋯ + p_n = 1. Let X_i = the count of the i-th category, and let x_1, ⋯, x_n be non-negative integers adding up to m. Then

P(X_1 = x_1, ⋯, X_n = x_n) = m!/(x_1! x_2! ⋯ x_n!) p_1^{x_1} p_2^{x_2} ⋯ p_n^{x_n}.

The probabilities add up to one, as they are the terms in the expansion of (p_1 + ⋯ + p_n)^m.
(i) The marginals are (lower-order) multinomial; one-dimensionally, X_i ∼ Bin(m, p_i) (checked numerically below).
(ii) Conditionals: given X_1 = x_1, the remaining counts (X_2, ⋯, X_n) are multinomial with m − x_1 trials and probabilities p_i/(1 − p_1), i = 2, ⋯, n.
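A simulation sketch (my own check of property (i)) comparing the first coordinate of multinomial counts with the Bin(m, p_1) mean and variance:

import numpy as np

rng = np.random.default_rng(2)
m, p = 10, np.array([0.2, 0.3, 0.5])
counts = rng.multinomial(m, p, size=500_000)

x1 = counts[:, 0]
print(x1.mean(), m * p[0])                # both ~ 2.0
print(x1.var(), m * p[0] * (1 - p[0]))    # both ~ 1.6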
8.2 Continuous Case
Joint pdf of (X_1, ⋯, X_n): f_{X_1,⋯,X_n}(x_1, ⋯, x_n) is a joint pdf if it satisfies f_{X_1,⋯,X_n}(x_1, ⋯, x_n) ≥ 0 and ∫ f_{X_1,⋯,X_n}(x_1, ⋯, x_n) dx_1 ⋯ dx_n = 1.
Probabilities are obtained by

P{(X_1, ⋯, X_n) ∈ A} = ∫_{(x_1,⋯,x_n)∈A} f_{X_1,⋯,X_n}(x_1, ⋯, x_n) dx_1 ⋯ dx_n.

Expectations are obtained by

E g(X_1, ⋯, X_n) = ∫ ⋯ ∫ g(x_1, ⋯, x_n) f_{X_1,⋯,X_n}(x_1, ⋯, x_n) dx_1 ⋯ dx_n.
Conditional:

f_{X_{k+1},⋯,X_n | X_1,⋯,X_k}(x_{k+1}, ⋯, x_n | x_1, ⋯, x_k) = f(x_1, ⋯, x_n) / f_{X_1,⋯,X_k}(x_1, ⋯, x_k).
Example 2. Dirichlet.

f(x_1, ⋯, x_{k−1}) = Γ(α_1 + ⋯ + α_k)/(Γ(α_1) ⋯ Γ(α_k)) x_1^{α_1−1} ⋯ x_{k−1}^{α_{k−1}−1} x_k^{α_k−1} I{x_i > 0, x_1 + ⋯ + x_k = 1},

where x_k = 1 − x_1 − ⋯ − x_{k−1}.
Properties (write α_0 = α_1 + ⋯ + α_k):
(i) The marginals are (lower-order) Dirichlet; the one-dimensional marginals are beta.
(ii) Conditionals: given X_1 = x_1, the rescaled remaining coordinates (X_2, ⋯, X_{k−1})/(1 − x_1) again follow a Dirichlet distribution, with parameters (α_2, ⋯, α_k).
(iii) Means: E(X_i) = α_i/α_0.
(iv) Covariances: Var(X_i) = α_i(α_0 − α_i)/{α_0^2(α_0 + 1)}, and Cov(X_i, X_j) = −α_iα_j/{α_0^2(α_0 + 1)} for i ≠ j.
Example 3. Let n = 4 and the joint density of (X_1, X_2, X_3, X_4) be

f_{(X_1,X_2,X_3,X_4)}(x_1, x_2, x_3, x_4) = (3/4)(x_1^2 + x_2^2 + x_3^2 + x_4^2), if 0 < x_i < 1, i = 1, 2, 3, 4,

and = 0 otherwise.
(iv) Find the conditional pdf of (X_3, X_4) given X_1 = 1/3 and X_2 = 2/3.
8.3 Multivariate Transformation
Let (X1 , · · · , Xn ) be a random vector with pdf fX1 ,···,Xn (x1 , · · · , xn ). Let
A = {x : fX (x) > 0}. A new random vector (U1 , · · · , Un ) is defined by
U_1 = g_1(X_1, ⋯, X_n),
U_2 = g_2(X_1, ⋯, X_n),
⋯ ⋯
U_n = g_n(X_1, ⋯, X_n).

Assume the transformation is one-to-one, with inverse

X_1 = h_1(U_1, ⋯, U_n),
X_2 = h_2(U_1, ⋯, U_n),
⋯ ⋯
X_n = h_n(U_1, ⋯, U_n).
Let J be the Jacobian of the inverse transformation. The joint pdf of (U_1, ⋯, U_n) is then

f_{U_1,⋯,U_n}(u_1, ⋯, u_n) = f_{X_1,⋯,X_n}(h_1(u_1, ⋯, u_n), ⋯, h_n(u_1, ⋯, u_n)) |det J|.

Example. f_{X_1,X_2,X_3,X_4}(x_1, x_2, x_3, x_4) = 24 e^{−x_1−x_2−x_3−x_4}, 0 < x_1 < x_2 < x_3 < x_4 < ∞. Let

U_1 = X_1, U_2 = X_2 − X_1, U_3 = X_3 − X_2, U_4 = X_4 − X_3.
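A worked sketch (my own) for this example: the inverse map is x_1 = u_1, x_2 = u_1 + u_2, x_3 = u_1 + u_2 + u_3, x_4 = u_1 + u_2 + u_3 + u_4 with |det J| = 1, and x_1 + x_2 + x_3 + x_4 = 4u_1 + 3u_2 + 2u_3 + u_4, so for u_i > 0

f_{U_1,⋯,U_4}(u_1, ⋯, u_4) = 24 e^{−(4u_1 + 3u_2 + 2u_3 + u_4)} = (4e^{−4u_1})(3e^{−3u_2})(2e^{−2u_3})(e^{−u_4}),

i.e., U_1, ⋯, U_4 are independent exponentials with rates 4, 3, 2, 1.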
Example 3. Multivariate normal. Let X = (X_1, ⋯, X_n) have iid N(0, 1) components, let A be an invertible n × n matrix, and set

Y = AX + μ.

Then Y has a multivariate normal distribution with mean μ and covariance AAᵀ.
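A sampling sketch (my own) of this construction: taking A to be the Cholesky factor of a target covariance Σ gives Cov(Y) = AAᵀ = Σ.

import numpy as np

rng = np.random.default_rng(3)
mu = np.array([1.0, -1.0])
Sigma = np.array([[2.0, 0.8],
                  [0.8, 1.0]])
A = np.linalg.cholesky(Sigma)          # A @ A.T == Sigma

X = rng.standard_normal((2, 200_000))  # iid N(0,1) components
Y = A @ X + mu[:, None]
print(Y.mean(axis=1))                  # ~ mu
print(np.cov(Y))                       # ~ Sigma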
9 Some Useful Inequalities.
a. Cauchy-Schwarz: |E(XY)| ≤ √{E(X^2) E(Y^2)}.
b. Hölder: for p, q > 1 with 1/p + 1/q = 1, |E(XY)| ≤ (E|X|^p)^{1/p} (E|Y|^q)^{1/q}.
c. Minkowski: for p ≥ 1, (E|X + Y|^p)^{1/p} ≤ (E|X|^p)^{1/p} + (E|Y|^p)^{1/p}.
d. Jensen: for any convex function ψ,
E(ψ(X)) ≥ ψ(E(X)).