Theories Joint Distribution PDF
1 Joint Distribution
The joint cumulative distribution function (cdf) of X and Y is F(x, y) = P(X ≤ x, Y ≤ y). For x1 ≤ x2 and y1 ≤ y2,

P(x1 ≤ X ≤ x2, y1 ≤ Y ≤ y2) = F(x2, y2) − F(x2, y1) − F(x1, y2) + F(x1, y1).
In general, if X1, · · · , Xn are jointly distributed random variables, the joint cdf is

F(x1, · · · , xn) = P(X1 ≤ x1, · · · , Xn ≤ xn).

Joint probability mass function: if X and Y are discrete random variables, their joint pmf is

(1.2) p(xi, yj) = P(X = xi, Y = yj).
Solution: for a fair coin tossed three times, with X the number of heads on the first toss and Y the total number of heads (see also Example 10 below), the joint distribution of (X, Y) can be summarized in the following table:

x/y    0      1      2      3
 0    1/8    2/8    1/8     0
 1     0     1/8    2/8    1/8
Marginal probability mass functions: Suppose that we wish to find the pmf
of Y from the joint pmf of X and Y in the previous example:
pY(0) = P(Y = 0) = P(Y = 0, X = 0) + P(Y = 0, X = 1) = 1/8 + 0 = 1/8,
pY(1) = P(Y = 1) = P(Y = 1, X = 0) + P(Y = 1, X = 1) = 2/8 + 1/8 = 3/8.

In general, the marginal pmf of Y is pY(y) = Σx p(x, y), and similarly pX(x) = Σy p(x, y).
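As a numerical illustration (a sketch, not part of the original notes), the marginals can be computed by summing the joint table:

```python
# Joint pmf of (X, Y) from the coin-tossing table above, indexed as p[x][y].
p = {
    0: {0: 1/8, 1: 2/8, 2: 1/8, 3: 0},
    1: {0: 0,   1: 1/8, 2: 2/8, 3: 1/8},
}

# Marginal pmf of Y: sum the joint pmf over all values of X.
p_Y = {y: sum(p[x][y] for x in p) for y in range(4)}
print(p_Y)  # {0: 0.125, 1: 0.375, 2: 0.375, 3: 0.125}
```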
Joint PDF and Joint CDF: Suppose that X and Y are continuous random
variables. The joint probability density function (pdf) of X and Y is the
function f (x, y) such that for every set C of pairs of real numbers
(1.3) P((X, Y) ∈ C) = ∫∫_{(x,y)∈C} f(x, y) dx dy.
Another interpretation of the joint pdf is obtained as follows:
P{a < X < a + da, b < Y < b + db} = ∫_b^{b+db} ∫_a^{a+da} f(x, y) dx dy ≈ f(a, b) da db.
The marginal distributions can be recovered from the joint pdf. First,

P(X ≤ x) = P{X ≤ x, Y ∈ (−∞, ∞)} = ∫_{−∞}^{x} ∫_{−∞}^{∞} f(u, y) dy du.

Then we have

fX(x) = (d/dx) P(X ≤ x) = ∫_{−∞}^{∞} f(x, y) dy.
Similarly, the pdf of Y is given by

fY(y) = ∫_{−∞}^{∞} f(x, y) dx.
(b) For 0 ≤ x ≤ 1, with joint pdf f(x, y) = (12/7)(x² + xy) on the unit square,

fX(x) = ∫_0^1 (12/7)(x² + xy) dy = (12/7)x² + (6/7)x.

For x < 0 or x > 1, we have fX(x) = 0.
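A numerical check of this marginal (assuming, as reconstructed above, the joint pdf f(x, y) = (12/7)(x² + xy) on the unit square):

```python
from scipy import integrate

f = lambda y, x: 12/7 * (x**2 + x*y)   # joint pdf on the unit square

x = 0.5
fX_numeric, _ = integrate.quad(lambda y: f(y, x), 0, 1)  # integrate out y
fX_closed = 12/7 * x**2 + 6/7 * x
print(fX_numeric, fX_closed)  # both ~0.857
```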
Example 3 Suppose the set of possible values for (X, Y ) is the rectangle
D = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. Let the joint pdf of (X, Y ) be
f(x, y) = (6/5)(x + y²), for (x, y) ∈ D.
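As a quick sanity check (a sketch, not part of the original example), f integrates to 1 over D:

```python
from scipy import integrate

# Integrate the joint pdf over the unit square D.
total, _ = integrate.dblquad(lambda y, x: 6/5 * (x + y**2),
                             0, 1, lambda x: 0, lambda x: 1)
print(total)  # ~1.0, confirming f is a valid joint pdf on D
```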
2 Independent Random Variables
The random variables X and Y are said to be independent if for any two
sets of real numbers A and B,

P{X ∈ A, Y ∈ B} = P{X ∈ A} P{Y ∈ B}.

Equivalently, in the continuous case, X and Y are independent if and only if f(x, y) = fX(x) fY(y) for all x and y.
Example 5 Suppose that a man and a woman decide to meet at a certain location, and that each person independently arrives at a time uniformly distributed on [0, T].
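The question posed in the original example is not preserved here; a common version asks for the probability that the two arrivals are within time t of each other, which equals 1 − (1 − t/T)² by a geometric argument. A Monte Carlo sketch with illustrative values of T and t:

```python
import numpy as np

rng = np.random.default_rng(0)
T, t, n = 60.0, 10.0, 1_000_000      # e.g. minutes; values chosen for illustration
x = rng.uniform(0, T, n)             # first person's arrival time
y = rng.uniform(0, T, n)             # second person's arrival time
print(np.mean(np.abs(x - y) <= t))   # ~0.3056
print(1 - (1 - t/T)**2)              # exact: 11/36 ~ 0.3056
```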
2.1 More than two random variables
If X1, · · · , Xn are all discrete random variables, the joint pmf of the variables is the function

p(x1, · · · , xn) = P(X1 = x1, · · · , Xn = xn).

If the variables are continuous, the joint pdf of the variables is the function f(x1, · · · , xn) such that

P(a1 ≤ X1 ≤ b1, · · · , an ≤ Xn ≤ bn) = ∫_{a1}^{b1} · · · ∫_{an}^{bn} f(x1, · · · , xn) dx1 · · · dxn.
Example 7 When a certain method is used to collect a fixed volume of rock
samples in a region, there are four rock types. Let X1 , X2 and X3 denote the
proportion by volume of rock types 1, 2, and 3 in a randomly selected sample
(the proportion of rock type 4 is redundant because X4 = 1−X1 −X2 −X3 ).
Suppose the joint pdf of (X1 , X2 , X3 ) is
f (x1 , x2 , x3 ) = kx1 x2 (1 − x3 )
• What is k?
• What is the probability that types 1 and 2 together account for at most
50%?
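A numerical sketch for the first question, assuming the support implied by X4 ≥ 0, namely {xi ≥ 0, x1 + x2 + x3 ≤ 1}: the integral of x1 x2 (1 − x3) over this region is 1/144, so k = 144.

```python
from scipy import integrate

# Integrate x1*x2*(1 - x3) over the simplex x_i >= 0, x1 + x2 + x3 <= 1.
# tplquad integrates the innermost variable (here x1) first.
val, _ = integrate.tplquad(
    lambda x1, x2, x3: x1 * x2 * (1 - x3),
    0, 1,                                          # x3 in [0, 1]
    lambda x3: 0, lambda x3: 1 - x3,               # x2 in [0, 1 - x3]
    lambda x3, x2: 0, lambda x3, x2: 1 - x3 - x2,  # x1 in [0, 1 - x3 - x2]
)
print(val, 1/144)  # both ~0.006944, hence k = 144
```

The second probability can be computed the same way by restricting the x1 and x2 limits to x1 + x2 ≤ 0.5.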
The random variables X1 , · · · , Xn are independent if for every subset
Xi1 , · · · , Xik of the variables, the joint pmf (pdf) is equal to the product of
the marginal pmf’s (pdf’s).
2.2 Sum of Independent Random Variables
Suppose that X and Y are independent continuous random variables with densities fX and fY, and let Z = X + Y. The cdf of Z is

FZ(a) = P(X + Y ≤ a)
      = ∫∫_{x+y≤a} fXY(x, y) dx dy
      = ∫∫_{x+y≤a} fX(x) fY(y) dx dy
      = ∫_{−∞}^{∞} ∫_{−∞}^{a−y} fX(x) dx fY(y) dy
      = ∫_{−∞}^{∞} FX(a − y) fY(y) dy.

Differentiating with respect to a gives the convolution formula for the pdf of Z:

fZ(a) = ∫_{−∞}^{∞} fX(a − y) fY(y) dy.
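A numerical sketch of this formula: for two independent Uniform(0, 1) variables, Z = X + Y has the triangular density fZ(a) = a on [0, 1] and 2 − a on [1, 2].

```python
import numpy as np

rng = np.random.default_rng(1)
z = rng.uniform(size=1_000_000) + rng.uniform(size=1_000_000)  # Z = X + Y

# P(Z <= 1) should be 1/2 by symmetry of the triangular density.
print(np.mean(z <= 1.0))   # ~0.5

# The density near a = 1 should be close to the triangle's peak value 1.
h = 0.01
print(np.mean(np.abs(z - 1.0) <= h) / (2 * h))  # ~1.0
```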
Group project (optional):
Example 9 Let X and Y denote the lifetimes (in years) of two bulbs. Suppose that X and Y are independent and that each has an exponential distribution with parameter λ = 1.
• What is the probability that the total lifetime is between 1 and 2 years?
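A sketch of one route: X + Y has a Gamma(2, 1) distribution with cdf F(a) = 1 − e^(−a) − a e^(−a), so P(1 ≤ X + Y ≤ 2) = 2e^(−1) − 3e^(−2) ≈ 0.330. A Monte Carlo check:

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.exponential(1.0, 1_000_000)   # lifetime of bulb 1 (mean 1 year)
y = rng.exponential(1.0, 1_000_000)   # lifetime of bulb 2
print(np.mean((1 <= x + y) & (x + y <= 2)))   # ~0.3298
print(2 * np.exp(-1) - 3 * np.exp(-2))        # exact value
```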
Example 10 Consider the situation in which a fair coin is tossed three times independently. Let X denote the number of heads on the first toss and Y denote the total number of heads.
(1) The Discrete Case: if X and Y are discrete, the conditional probability mass function of X, given that Y = y, is defined for all y with pY(y) > 0 by

pX|Y(x|y) = P(X = x, Y = y) / P(Y = y) = p(x, y) / pY(y).

If X and Y are independent random variables, then the conditional probability mass function is the same as the unconditional one. This follows because if X is independent of Y, then

pX|Y(x|y) = P(X = x)P(Y = y) / P(Y = y) = P(X = x) = pX(x).
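For the coin-tossing example, the conditional pmf of Y given X = 1 can be read off the joint table (a sketch):

```python
# Joint pmf p[x][y] from the coin-tossing table in Section 1.
p = {0: {0: 1/8, 1: 2/8, 2: 1/8, 3: 0},
     1: {0: 0,   1: 1/8, 2: 2/8, 3: 1/8}}

p_X1 = sum(p[1].values())                        # marginal P(X = 1) = 0.5
p_Y_given_X1 = {y: p[1][y] / p_X1 for y in p[1]}
print(p_Y_given_X1)  # {0: 0.0, 1: 0.25, 2: 0.5, 3: 0.25}
```

Note that this differs from the marginal pmf of Y computed earlier, confirming that X and Y are dependent.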
(2) The Continuous Case: If X and Y have a joint probability density func-
tion f (x, y), then the conditional pdf of X, given that Y = y, is defined for
all values of y such that fY (y) > 0, by
(2.9) fX|Y(x|y) = fX,Y(x, y) / fY(y).
To motivate this definition, multiply the left-hand side by dx and the right-hand side by (dx dy)/dy to obtain

fX|Y(x|y) dx = fX,Y(x, y) dx dy / (fY(y) dy)
             ≈ P{x ≤ X ≤ x + dx, y ≤ Y ≤ y + dy} / P{y ≤ Y ≤ y + dy}
             = P{x ≤ X ≤ x + dx | y ≤ Y ≤ y + dy}.
In other words, for small values of dx and dy, fX|Y(x|y) represents the
conditional probability that X is between x and x + dx given that Y is
between y and y + dy.
That is, if X and Y are jointly continuous, then for any set A,
P{X ∈ A | Y = y} = ∫_A fX|Y(x|y) dx.
Similarly, we define the conditional cdf of X given that Y = y by

FX|Y(a|y) = P(X ≤ a | Y = y) = ∫_{−∞}^{a} fX|Y(x|y) dx.
Group project (optional): Suppose X and Y are two independent random
variables, both uniformly distributed on (0,1). Let T1 = min(X, Y ) and
T2 = max(X, Y ).
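A simulation sketch for this project (it checks, rather than derives, the answers): E[T1] = 1/3, E[T2] = 2/3, and the minimum and maximum are positively correlated.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(size=1_000_000)
y = rng.uniform(size=1_000_000)
t1, t2 = np.minimum(x, y), np.maximum(x, y)
print(t1.mean(), t2.mean())     # ~0.3333 and ~0.6667
print(np.cov(t1, t2)[0, 1])     # ~1/36 ~ 0.0278: positively correlated
```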
3 Expected Values
Let X and Y be jointly distributed rv’s with pmf p(x, y) or pdf f (x, y)
according to whether the variables are discrete or continuous. Then the
expected value of a function h(X, Y ), denoted by E[h(X, Y )], is given by
(3.10) E[h(X, Y)] = Σx Σy h(x, y) p(x, y) in the discrete case, and

E[h(X, Y)] = ∫_{−∞}^{∞} ∫_{−∞}^{∞} h(x, y) f(x, y) dx dy in the continuous case.
Rule of expected values: If X and Y are independent random variables, then
we have
E(XY ) = E(X)E(Y ).
This is in general not true for correlated random variables.
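For instance, in the coin-tossing example X and Y are dependent, and the product rule fails: E(XY) = 1 while E(X)E(Y) = (1/2)(3/2) = 3/4. A direct computation from the joint table (sketch):

```python
# Joint pmf p[x][y] from the coin-tossing table in Section 1.
p = {0: {0: 1/8, 1: 2/8, 2: 1/8, 3: 0},
     1: {0: 0,   1: 1/8, 2: 2/8, 3: 1/8}}

E_XY = sum(x * y * p[x][y] for x in p for y in p[x])
E_X = sum(x * p[x][y] for x in p for y in p[x])
E_Y = sum(y * p[x][y] for x in p for y in p[x])
print(E_XY, E_X * E_Y)  # 1.0 vs 0.75: unequal, so X and Y are not independent
```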
4 Covariance and Correlation
Covariance and correlation are related parameters that indicate the extent
to which two random variables co-vary. Suppose there are two technology
stocks. If they are affected by the same industry trends, their prices will
tend to rise or fall together. They co-vary. Covariance and correlation
measure such a tendency. We will begin with the problem of calculating the
expected values of a function of two random variables.
[Figure: three scatter plots, (x1, y1), (x2, y2), and (x3, y3), illustrating different degrees of co-variation between two variables.]
4.1 Covariance
When two random variables are not independent, we can measure how strongly they are related to each other. The covariance between two rv's X and Y is

Cov(X, Y) = E[(X − E[X])(Y − E[Y])] = E(XY) − E(X)E(Y).

Covariance is unchanged by shifts in either variable and scales bilinearly:

Cov(aX + b, cY + d) = ac Cov(X, Y).
Variance of linear combinations: Let X, Y be two random variables, and a and b be two constants. Then

Var(aX + bY) = a² Var(X) + b² Var(Y) + 2ab Cov(X, Y).

In particular, if X and Y are independent, Var(aX + bY) = a² Var(X) + b² Var(Y).
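A quick numerical check of this identity (a sketch with arbitrarily chosen constants and a correlated pair):

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.normal(size=1_000_000)
y = 0.5 * x + rng.normal(size=1_000_000)   # y is correlated with x
a, b = 2.0, 3.0

lhs = np.var(a * x + b * y)
rhs = a**2 * np.var(x) + b**2 * np.var(y) + 2 * a * b * np.cov(x, y)[0, 1]
print(lhs, rhs)  # agree up to Monte Carlo error
```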
Example 15 The joint pdf of X and Y is f (x, y) = 24xy when 0 ≤ x ≤ 1,
0 ≤ y ≤ 1 and x + y ≤ 1, and f (x, y) = 0 otherwise. Find Cov(X, Y ).
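A numerical sketch (the exact answer works out to −2/75 ≈ −0.0267), integrating over the triangle 0 ≤ x, 0 ≤ y, x + y ≤ 1:

```python
from scipy import integrate

f = lambda y, x: 24 * x * y   # joint pdf on the triangle x + y <= 1

# E[h(X, Y)] = double integral of h * f over the triangle.
E = lambda h: integrate.dblquad(lambda y, x: h(x, y) * f(y, x),
                                0, 1, lambda x: 0, lambda x: 1 - x)[0]
E_X, E_Y, E_XY = E(lambda x, y: x), E(lambda x, y: y), E(lambda x, y: x * y)
print(E_XY - E_X * E_Y)   # ~-0.0267 = -2/75
```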
4.2 Correlation Coefficient
The defect of covariance is that its computed value depends critically on the
units of measurement (e.g., kilograms versus pounds, meters versus feet).
Ideally, the choice of units should have no effect on a measure of strength of
relationship. This can be achieved by scaling the covariance by the standard
deviations of X and Y .
The correlation coefficient of X and Y , denoted by ρX,Y , is defined by
(4.1) ρX,Y = Cov(X, Y) / (σX σY).
The correlation coefficient is not affected by a linear change in the units of measurement. Specifically, we can show that ρ(aX + b, cY + d) = ρ(X, Y) whenever a and c have the same sign; if the signs differ, the correlation changes sign but not magnitude.
Correlation and dependence: Zero correlation coefficient does not imply that
X and Y are independent, but only that there is complete absence of a
linear relationship. When ρ = 0, X and Y are said to be uncorrelated. Two
random variables could be uncorrelated yet highly dependent.
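A standard illustration (a sketch, not taken from the original notes): if X is uniform on (−1, 1) and Y = X², then Cov(X, Y) = E[X³] − E[X]E[X²] = 0, so they are uncorrelated, yet Y is a deterministic function of X.

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.uniform(-1, 1, 1_000_000)
y = x**2                                   # completely determined by x

print(np.corrcoef(x, y)[0, 1])             # ~0: uncorrelated
print(np.corrcoef(np.abs(x), y)[0, 1])     # ~0.97: strongly dependent
```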
Correlation and Causation: A large correlation does not imply that increas-
ing values of X causes Y to increase, but only that large X values are
associated with large Y values. Examples: