STAT230 Chap4


STAT 230

Introduction to Probability and Statistics

Chapter 4: Continuous Random Variables and Probability


Distributions

Rami El Haddad
[email protected]

Summer 2023

1 / 127
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

2 / 127
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

3 / 127
Probability Distributions for Continuous Variables

Definition
A random variable X is continuous if it satisfies these two conditions:

1. Possible values comprise either a single interval on the number line


or a union of disjoint intervals.

2. IP(X = c) = 0 for any number c that is a possible value of X.

4 / 127
Probability Distributions for Continuous Variables

Example 1. Suppose that, in the study of the ecology of a lake, we make


depth measurements at randomly chosen locations and let

X = the depth at such a location

Denote by M the maximum depth (in meters). Then X is a continuous rv


taking its values in the interval [0; M], so that any number in the interval
[0; M] is a possible value of X.

5 / 127
Probability Distributions for Continuous Variables

Example 1 (continued). To picture the distribution of X, let us organize
its values into intervals of width 1 centered at the integers
0, 1, …, M (if M is not an integer, we replace it by the nearest positive
integer). This allows us to draw a histogram in which the height of the
rectangle above the integer k is the proportion of measurements that are
in the class [k − .5, k + .5). In this representation, the total area of
all rectangles is 1.

Figure 4.1(a) Probability histogram of depth measured to the nearest meter

6 / 127
Probability Distributions for Continuous Variables

Example 1 (continued). If we refine the partition using narrower
intervals and represent the densities of the depths on the y-axis (i.e.
the proportions of the measurements divided by the widths of the
intervals), the total area of all rectangles will still be equal to 1.

Figure 4.1(b) Probability histogram of depth measured to the nearest centimeter

7 / 127
Probability Distributions for Continuous Variables

Example 1 (continued). If we continue in this way, the resulting sequence
of histograms approaches a smooth curve, as pictured in the following
figure.

Figure 4.1(c) A limit of a sequence of discrete histograms

Because for each histogram the total area of all rectangles equals 1, the
total area under the smooth curve is also 1.
8 / 127
Probability Distributions for Continuous Variables

Example 1 (continued). The probability that the depth at a randomly
chosen point is between a and b is just the area under the smooth curve
between a and b.

Figure 4.1 (a) Probability histogram of depth measured to the nearest
meter; (b) probability histogram of depth measured to the nearest
centimeter; (c) a limit of a sequence of discrete histograms

Any continuous probability distribution is specified by a smooth curve of
the type pictured in (c). ¨

9 / 127
Probability Distributions for Continuous Variables

Definition
Let X be a continuous rv. Then a probability distribution or probability
density function (pdf) of X is a real-valued function f such that

1. f(x) ≥ 0 for all x

2. ∫_{−∞}^{∞} f(x) dx = 1 (this is the area under the entire graph of f)

3. for any two numbers a and b with a ≤ b,

   IP(a ≤ X ≤ b) = ∫_{a}^{b} f(x) dx

10 / 127
Probability Distributions for Continuous Variables

That is, the probability that X takes values in the interval [a; b] is the
area above this interval and under the graph of the density function. The
graph of f is often referred to as the density curve.

Figure 4.2 IP(a ≤ X ≤ b) is the area under the density curve f between a and b

11 / 127
Probability Distributions for Continuous Variables

Therefore, if the density of X vanishes outside an interval [a; b], then

IP(a ≤ X ≤ b) = 1

since the area under the entire graph of f is 1.

Figure: A density f that vanishes outside [a; b]

We say that X takes its values in [a; b] with probability 1, or almost
surely (sometimes abbreviated a.s.).

In the lake example, the density vanishes outside [0; M]: the values of X
range between 0 and M.
12 / 127
Probability Distributions for Continuous Variables

Example 2. Consider a continuous rv X measuring the direction of an


imperfection (in degrees) with respect to a reference line on a tire. One
possible pdf of X is

1
ȷ
360
0 ≤ x < 360
f (x) =
0 otherwise

1. Show that f is a legitimate pdf.

2. What is the probability that the angle is between 90◦ and 180◦ ?

13 / 127
Probability Distributions for Continuous Variables

Example 2 (continued).

1. The graph of f is a rectangle of height 1/360 over [0; 360).

Clearly f(x) ≥ 0. Furthermore, the area under the density curve is
just the area of a rectangle, so

   ∫_{−∞}^{∞} f(x) dx = (height) · (base) = (1/360) · (360) = 1

Thus, f is a legitimate pdf. Moreover, X takes its values in [0; 360].

14 / 127
Probability Distributions for Continuous Variables

Example 2 (continued).

2. The probability that the angle is between 90◦ and 180◦ is

   IP(90 ≤ X ≤ 180) = ∫_{90}^{180} (1/360) dx = [x/360]_{x=90}^{x=180} = 1/4 = .25

Figure 4.3 The pdf and probability from Example 2 (shaded area = IP(90 ≤ X ≤ 180))

Because whenever 0 ≤ a ≤ b ≤ 360, IP(a ≤ X ≤ b) depends only on the
width b − a of the interval, X is said to have a uniform distribution. ¨

15 / 127
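The two computations above are easy to check numerically. The following Python sketch (an addition, not part of the original slides) approximates both integrals with a simple midpoint rule:

```python
def f(x):
    # pdf of the direction X from Example 2: constant 1/360 on [0, 360)
    return 1 / 360 if 0 <= x < 360 else 0.0

def integrate(g, a, b, n=100_000):
    # midpoint-rule approximation of the integral of g over [a, b]
    h = (b - a) / n
    return sum(g(a + (i + 0.5) * h) for i in range(n)) * h

total = integrate(f, 0, 360)   # area under the whole pdf: 1 (legitimate pdf)
prob = integrate(f, 90, 180)   # P(90 <= X <= 180): 1/4 = .25
print(round(total, 6), round(prob, 6))
```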
Probability Distributions for Continuous Variables

Definition
A continuous rv X is said to have a uniform distribution on the interval
[a; b] if its pdf is

f(x; a, b) = 1/(b − a) if a ≤ x ≤ b, and f(x; a, b) = 0 otherwise

We will often write X ∼ U([a; b]) to indicate that X is a uniform rv on
[a; b]. Notice that X takes its values in the interval [a; b].

Figure: The pdf f(x; a, b) of a uniform rv

16 / 127
Probability Distributions for Continuous Variables

When X is a discrete random variable, each possible value is assigned


positive probability. This is not true of a continuous random variable
(that is, the second condition of the definition is satisfied) because the
area under a density curve that lies above any single value is zero:

IP(X = c) = IP(c ≤ X ≤ c) = ∫_{c}^{c} f(x) dx = 0

This has an important practical consequence: The probability that X lies


in some interval between a and b does not depend on whether the lower
limit a or the upper limit b is included in the probability calculation:

IP(a ≤ X ≤ b) = IP(a < X < b) = IP(a < X ≤ b) = IP(a ≤ X < b)

By contrast, if X is discrete and both a and b are possible values of X,
then all four of the preceding probabilities are different.

17 / 127
Probability Distributions for Continuous Variables

Example 3. Let X (the time headway between consecutive cars, in seconds)
be a continuous rv with the following pdf

f(x) = .15 e^{−.15(x−.5)} if x ≥ .5, and f(x) = 0 otherwise

There is no density associated with headways less than .5, and the
density decreases rapidly (exponentially fast) as x increases from .5.

Figure 4.4 The density curve for time headway

1. Show that f is a legitimate pdf.

2. Compute the probability that X is at most 5 and the probability that
X is less than 5.

18 / 127
Probability Distributions for Continuous Variables

Example 3 (continued).

1. Clearly f(x) ≥ 0 and

   ∫_{−∞}^{∞} f(x) dx = ∫_{.5}^{∞} .15 e^{−.15(x−.5)} dx = .15 e^{.075} ∫_{.5}^{∞} e^{−.15x} dx
                      = .15 e^{.075} · (1/.15) e^{−(.15)(.5)} = 1

2. The probability that X is at most 5 is

   IP(X ≤ 5) = ∫_{−∞}^{5} f(x) dx = ∫_{.5}^{5} .15 e^{−.15(x−.5)} dx
             = .15 e^{.075} ∫_{.5}^{5} e^{−.15x} dx
             = .15 e^{.075} · [−(1/.15) e^{−.15x}]_{x=.5}^{x=5}
             = e^{.075} (−e^{−.75} + e^{−.075}) = .491

Since X is a continuous rv, the probability that it is less than 5 is

   IP(X < 5) = IP(X ≤ 5) = .491
¨
19 / 127
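A quick numerical check of part 2 (an addition, not from the slides): integrating the pdf in closed form gives the cdf F(x) = 1 − e^{−.15(x−.5)} for x ≥ .5, which can be evaluated directly:

```python
import math

def F(x):
    # cdf of X from Example 3, obtained by integrating the pdf:
    # F(x) = 1 - e^{-.15(x - .5)} for x >= .5, and 0 below .5
    return 0.0 if x < 0.5 else 1 - math.exp(-0.15 * (x - 0.5))

print(round(F(5), 3))  # P(X <= 5) = .491, as on the slide
```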
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

20 / 127
Cumulative Distribution Function

The cumulative distribution function F for a discrete rv X gives, for any


specified number x, the probability IP(X ≤ x). It is obtained by summing
the pmf p(y) over all possible values y satisfying y ≤ x. The cdf of a
continuous rv gives the same probabilities IP(X ≤ x) and is obtained by
integrating the pdf f between the limits −∞ and x.

Definition
The cumulative distribution function F(x) for a continuous rv X is
defined for every number x by

F(x) = IP(X ≤ x) = ∫_{−∞}^{x} f(y) dy

21 / 127
Cumulative Distribution Function

Proposition (Properties of a continuous rv's cdf)


The cdf F of a continuous rv has the following properties:

• The range of F is [0; 1].


• F is continuous everywhere (i.e. from the right and from the left at
each point).

• F is non-decreasing (i.e. if x ≤ y, then F (x) ≤ F (y )).


• lim F (x) = 0 and lim F (x) = 1.
x→−∞ x→∞

22 / 127
Cumulative Distribution Function

For each x, F(x) is the area under the density curve to the left of x.
Clearly, F(x) increases as x increases.

Figure 4.5 A pdf (left) and associated cdf (right)

23 / 127
Cumulative Distribution Function

Example 4. The thickness of a certain metal sheet is a continuous rv X
that has a uniform distribution on [a; b].

Figure: The pdf of a uniform rv on [a; b]

24 / 127
Cumulative Distribution Function

Example 4 (continued).

• For x < a, F(x) = 0, since there is no area under the graph of the
density function to the left of such an x.

• For x ≥ b, F(x) = 1, since all the area is accumulated to the left of
such an x.

• For a ≤ x ≤ b,

   F(x) = ∫_{−∞}^{x} f(y) dy = ∫_{a}^{x} 1/(b − a) dy
        = [y/(b − a)]_{y=a}^{y=x} = (x − a)/(b − a)

Figure 4.6 The pdf for a uniform distribution (shaded area = F(x))

25 / 127
Cumulative Distribution Function

Example 4 (continued). The entire cdf is

F(x) = 0                    for x < a
F(x) = (x − a)/(b − a)      for a ≤ x < b
F(x) = 1                    for b ≤ x

Figure 4.7 The cdf of a uniform rv on [a; b]
¨
26 / 127
Cumulative Distribution Function

The importance of the cdf here, just as for discrete rv's, is that
probabilities of various intervals can be computed from a formula for, or
a table of, F(x).

Proposition
Let X be a continuous rv with pdf f and cdf F. Then

1. for any number a,

   IP(X > a) = 1 − F(a)

2. for any two numbers a and b with a < b,

   IP(a ≤ X ≤ b) = F(b) − F(a)

Point 2 of this proposition is illustrated in the following figure: the
desired probability is the shaded area under the density curve between a
and b, and it equals the difference between the two shaded cumulative
areas. This differs from the discrete integer-valued case (e.g. binomial
or Poisson), where IP(a ≤ X ≤ b) = F(b) − F(a − 1) for integers a and b.

Figure 4.8 Computing IP(a ≤ X ≤ b) from cumulative probabilities

27 / 127
Cumulative Distribution Function

Example 5. Suppose that the magnitude X of a dynamic load on a bridge


(in newtons) has pdf given by

f(x) = 1/8 + (3/8)x if 0 ≤ x ≤ 2, and f(x) = 0 otherwise

1. Compute the cumulative distribution function of X.


2. Compute the probability that the load is between 1 and 1.5.

3. Compute the probability that the load exceeds 1.

28 / 127
Cumulative Distribution Function

Example 5 (continued).

1. For any number x between 0 and 2,

   F(x) = ∫_{−∞}^{x} f(y) dy = ∫_{0}^{x} (1/8 + (3/8)y) dy = x/8 + (3/16)x²

Thus

   F(x) = 0                   for x < 0
   F(x) = x/8 + (3/16)x²      for 0 ≤ x ≤ 2
   F(x) = 1                   for 2 < x

29 / 127
Cumulative Distribution Function

Example 5 (continued).

2. The probability that the load is between 1 and 1.5 is

   IP(1 ≤ X ≤ 1.5) = F(1.5) − F(1)
                   = [1.5/8 + (3/16)(1.5)²] − [1/8 + (3/16)(1)²]
                   = .2969

3. The probability that the load exceeds 1 is

   IP(X > 1) = 1 − IP(X ≤ 1) = 1 − F(1)
             = 1 − [1/8 + (3/16)(1)²] = .6875

30 / 127
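The cdf found in part 1 makes both probabilities a one-line computation; the following Python sketch (an addition, not from the slides) reproduces the values on this slide:

```python
def F(x):
    # cdf of the load X from Example 5
    if x < 0:
        return 0.0
    if x > 2:
        return 1.0
    return x / 8 + 3 * x**2 / 16

print(round(F(1.5) - F(1), 4))  # P(1 <= X <= 1.5) = .2969
print(round(1 - F(1), 4))       # P(X > 1) = .6875
```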
Cumulative Distribution Function

In some situations, the cdf of a rv X is given without mentioning whether
or not X is continuous. The following result gives a way to check if X is
continuous and, if so, to find its pdf using a consequence of the
Fundamental Theorem of Calculus.

Theorem
If the cdf F of a rv X is continuous everywhere and differentiable except
possibly at some numbers, then X is continuous and, at every x at which
the derivative F′(x) exists,

F′(x) = f(x)

Remark
If the cdf of a rv X is not continuous, then X cannot be continuous.

31 / 127
Cumulative Distribution Function

Example 6. When X has a uniform distribution on [a; b], recall that its
cdf is given by

F(x) = 0                    for x < a
F(x) = (x − a)/(b − a)      for a ≤ x ≤ b
F(x) = 1                    for x > b

Notice that F is continuous everywhere and differentiable except at x = a
and x = b, where the graph of F has sharp corners.

Since F(x) = 0 for x < a and F(x) = 1 for x > b, f(x) = F′(x) = 0 for
such x. For a < x < b,

f(x) = F′(x) = (d/dx)[(x − a)/(b − a)] = 1/(b − a)

32 / 127
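The differentiation step can also be checked numerically with a central-difference approximation of F′. This Python sketch (an addition; the choice a = 0, b = 1 is arbitrary) recovers the constant pdf 1/(b − a) inside the interval:

```python
def F(x, a=0.0, b=1.0):
    # cdf of U([a, b]) from Example 6, with a = 0, b = 1 chosen for the check
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)

def pdf_from_cdf(x, h=1e-6):
    # central-difference approximation of f(x) = F'(x)
    return (F(x + h) - F(x - h)) / (2 * h)

print(round(pdf_from_cdf(0.3), 4))  # ~1 = 1/(b - a) inside (a, b)
```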
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

33 / 127
Percentiles of a Continuous Distribution

The 40th percentile of a distribution is the score (X value) that exceeds


40% of all scores and is exceeded by 60% of all scores. Similarly, we can
dene any (100p)th percentile, 0 < p < 1.

Definition
Let p be a number between 0 and 1. The (100p)th percentile of the
distribution of a continuous rv X, denoted by η(p), is defined by

p = F(η(p)) = ∫_{−∞}^{η(p)} f(x) dx

34 / 127
Percentiles of a Continuous Distribution

In words, η(p) is that value on the x-axis such that 100p% of the area
under the graph of f lies to the left of η(p) and 100(1 − p)% lies to the
right. Thus η(.75), the 75th percentile, is such that the area under the
graph of f to the left of η(.75) is .75.

Figure 4.10 The (100p)th percentile of a continuous distribution

35 / 127
Percentiles of a Continuous Distribution

Example 7. Let X denote the amount of time (in hours) a book on two-hour
reserve is actually checked out. The following pdf of X is proposed:

f(x) = x/2 if 0 ≤ x ≤ 2, and f(x) = 0 otherwise

1. Compute the 80th percentile of the distribution of X.


2. Sixty percent of books will be checked out longer than what time?

36 / 127
Percentiles of a Continuous Distribution

Example 7 (continued).

1. We start by computing the cdf of X. For any number x between 0
and 2, the cdf is

   F(x) = ∫_{0}^{x} (y/2) dy = [y²/4]_{y=0}^{y=x} = x²/4

The pth percentile of this distribution satisfies the equation

   p = F(η(p)) = [η(p)]²/4

that is, [η(p)]² = 4p, which gives η(p) = 2√p.
The 80th percentile corresponds to p = .8 and is given by

   η(.8) = 2√.8 = 1.7889

We can say that 80% of the books were checked out for less than 1.8
hours.

37 / 127
Percentiles of a Continuous Distribution

Example 7 (continued).

2. The value that cuts off the top 60% of the values of X is exactly the
value that cuts off the lower 40%. So we need to compute the 40th
percentile. That is,

   η(.4) = 2√.4 = 1.2649

Sixty percent of the books will be checked out for more than 1.26 hours.

38 / 127
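Since the cdf inverts in closed form here, both percentiles follow from the formula η(p) = 2√p derived above. A short Python check (an addition, not from the slides):

```python
import math

def percentile(p):
    # inverse of F(x) = x**2 / 4 on [0, 2] from Example 7: eta(p) = 2*sqrt(p)
    return 2 * math.sqrt(p)

print(round(percentile(0.8), 4))  # 80th percentile: 1.7889
print(round(percentile(0.4), 4))  # 40th percentile: 1.2649
```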
Percentiles of a Continuous Distribution

Definition
The median of a continuous distribution, denoted by μ̃, is the 50th
percentile, so μ̃ satisfies F(μ̃) = .5. That is, half the area under the
density curve f is to the left of μ̃, and half is to the right of μ̃.

A continuous distribution whose pdf is symmetric (the graph of the pdf to
the left of some point is a mirror image of the graph to the right of
that point) has a median μ̃ equal to the point of symmetry, since half the
area under the curve lies to either side of this point. The error in a
measurement of a physical quantity is often assumed to have a symmetric
distribution.

Figure 4.12 Medians of symmetric distributions

39 / 127


Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

40 / 127
Expected Values

For a discrete random variable X, IE(X) was obtained by summing


x · p(x) over possible X values. Here we replace summation by
integration and the pmf by the pdf to get a continuous weighted average.

Definition
The expected or mean value of a continuous rv X with pdf f is

μ_X = IE(X) = ∫_{−∞}^{∞} x · f(x) dx

When the pdf f specifies a model for the distribution of values in a
numerical population, then the expected value is the population mean and
is denoted by μ.

41 / 127
Expected Values

Example 8. Consider a continuous rv X with pdf

f(x) = (3/2)(1 − x²) if 0 ≤ x ≤ 1, and f(x) = 0 otherwise

Compute the expected value of X.

42 / 127
Expected Values

Example 8 (continued). The mean value of X is

IE(X) = ∫_{−∞}^{∞} x · f(x) dx = ∫_{0}^{1} x · (3/2)(1 − x²) dx
      = (3/2) ∫_{0}^{1} (x − x³) dx = (3/2) [x²/2 − x⁴/4]_{x=0}^{x=1}
      = 3/8
¨

43 / 127
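As a numerical cross-check of this mean (an addition, not from the slides), a midpoint rule applied to x·f(x) over [0, 1] recovers 3/8:

```python
def f(x):
    # pdf from Example 8
    return 1.5 * (1 - x**2) if 0 <= x <= 1 else 0.0

def mean(n=100_000):
    # midpoint-rule approximation of E(X) = integral of x * f(x) over [0, 1]
    h = 1 / n
    return sum((i + 0.5) * h * f((i + 0.5) * h) for i in range(n)) * h

print(round(mean(), 4))  # 0.375 = 3/8
```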
Expected Values

Proposition
If X is a continuous rv with pdf f and h(X) is any function of X, then

IE[h(X)] = μ_{h(X)} = ∫_{−∞}^{∞} h(x) · f(x) dx

44 / 127
Expected Values

Example 9. Let X be a rv that has a uniform distribution on [0; 1], and
let

h(X) = max(X, 1 − X) = 1 − X if 0 ≤ X < 1/2, and X if 1/2 ≤ X ≤ 1

Compute the expected value of h(X).

45 / 127
Expected Values

Example 9 (continued). Since X has a uniform distribution on [0; 1], its
pdf is f(x) = 1 on [0; 1], so

IE[h(X)] = ∫_{−∞}^{∞} h(x) · f(x) dx = ∫_{0}^{1} max(x, 1 − x) dx
         = ∫_{0}^{1/2} max(x, 1 − x) dx + ∫_{1/2}^{1} max(x, 1 − x) dx
         = ∫_{0}^{1/2} (1 − x) dx + ∫_{1/2}^{1} x dx
         = [x − x²/2]_{x=0}^{x=1/2} + [x²/2]_{x=1/2}^{x=1}
         = 1/2 − 1/8 + 1/2 − 1/8 = 3/4
¨

46 / 127
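This expectation can also be estimated by Monte Carlo simulation, which is a useful sanity check whenever the integral is hard. A Python sketch (an addition, not from the slides):

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible
n = 200_000
# Monte Carlo estimate of E[max(X, 1 - X)] for X ~ U([0, 1])
est = sum(max(u, 1 - u) for u in (random.random() for _ in range(n))) / n
print(round(est, 2))  # close to the exact value 3/4
```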
Expected Values

Definition
The variance of a continuous random variable X with pdf f and mean
value μ is

σ² = σ_X² = V(X) = IE[(X − μ)²] = ∫_{−∞}^{∞} (x − μ)² · f(x) dx

The standard deviation (SD) of X is σ = σ_X = √V(X).

The variance and standard deviation give quantitative measures of how
much spread there is in the distribution or population of x values. The
SD is roughly the size of a typical deviation from μ.

47 / 127
Expected Values

Computation of the variance is facilitated by using the same shortcut


formula employed in the discrete case.

Proposition

V(X) = IE(X²) − [IE(X)]²

48 / 127
Expected Values

Proposition
If h(X) is an affine function of X, h(X) = aX + b, the expected value
and variance of h(X) satisfy the same properties as in the discrete case:

IE[h(X)] = IE(aX + b) = a IE(X) + b = aμ + b

and

V[h(X)] = V(aX + b) = a² V(X) = a² σ²

49 / 127
Expected Values

Proposition
If X has a uniform distribution on [a; b], then

IE(X) = (a + b)/2   and   V(X) = (b − a)²/12

In fact,

IE(X) = ∫_{−∞}^{∞} x f(x; a, b) dx = ∫_{a}^{b} x/(b − a) dx
      = [x²/(2(b − a))]_{a}^{b} = (a + b)/2

Moreover,

IE(X²) = ∫_{a}^{b} x²/(b − a) dx = [x³/(3(b − a))]_{a}^{b} = (b² + ab + a²)/3

Hence, V(X) = (b² + ab + a²)/3 − ((a + b)/2)² = (b − a)²/12.

50 / 127
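A quick simulation (an addition, not from the slides; the interval [2, 5] is an arbitrary choice) confirms these formulas: the sample mean and variance of uniform draws approach (a + b)/2 and (b − a)²/12.

```python
import random
import statistics

random.seed(1)
a, b = 2.0, 5.0
xs = [random.uniform(a, b) for _ in range(100_000)]

# sample mean and variance should be close to (a + b)/2 = 3.5
# and (b - a)**2 / 12 = 0.75
print(statistics.fmean(xs))
print(statistics.pvariance(xs))
```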
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

51 / 127
Chebyshev's Inequality

Chebyshev's inequality holds for all distributions (discrete and
continuous), and bounds the probability that X falls far from its mean.

Proposition
Let X be a random variable with mean μ and variance σ². Then, for any
positive number c,

IP(|X − μ| ≥ c) ≤ σ²/c²

Remark
Chebyshev's inequality also holds for IP(|X − μ| > c).

52 / 127
Chebyshev's Inequality

Remark
Let k be a positive integer. Taking c = kσ in Chebyshev's inequality
gives

IP(|X − μ| ≥ kσ) ≤ 1/k²

This means that the probability of X taking values at least kσ units from
its mean is at most 1/k²; equivalently, the probability that X takes
values within k standard deviations of the mean is at least 1 − 1/k². In
particular,

• the probability that X takes values within two standard deviations of
the mean is at least 1 − 1/4 = 0.75.

• the probability that X takes values within three standard deviations
of the mean is at least 1 − 1/9 ≈ 0.89.

53 / 127
Chebyshev's Inequality

Figure: Visual illustration of Chebyshev's Inequality: the area under the
density curve between μ − kσ and μ + kσ is at least 75% for k = 2 and at
least 89% for k = 3

54 / 127
Chebyshev's Inequality

Example 10. Flip a fair coin 3000 times. What does Chebyshev's inequality
say about the probability that the fraction of heads lies between 0.46
and 0.54?

55 / 127
Chebyshev's Inequality

Example 10 (continued). If we denote by X the number of heads, then
X ∼ B(n, p), with n = 3000 and p = .5. Thus,
μ = IE(X) = (3000) · (.5) = 1500 and
σ² = V(X) = (3000) · (.5) · (1 − .5) = 750. The fraction of heads is X/n,
so the probability of interest is IP(.46 ≤ X/n ≤ .54). Now write

IP(.46 ≤ X/n ≤ .54) = IP(1380 ≤ X ≤ 1620)
                    = IP(1380 − μ ≤ X − μ ≤ 1620 − μ)
                    = IP(1380 − 1500 ≤ X − μ ≤ 1620 − 1500)
                    = IP(−120 ≤ X − μ ≤ 120)
                    = IP(|X − μ| ≤ 120)
                    = 1 − IP(|X − μ| > 120)

56 / 127
Chebyshev's Inequality

Example 10 (continued). Chebyshev's inequality gives

IP(|X − μ| > 120) ≤ σ²/120² = 750/14400 = .0521

Hence,

IP(.46 ≤ X/n ≤ .54) ≥ 1 − .0521 = .9479

Thus the probability that the fraction of heads is between .46 and .54
is at least .9479. ¨

57 / 127
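The arithmetic in Example 10 is easy to reproduce; the following Python sketch (an addition, not from the slides) computes the Chebyshev bound directly from n, p, and the half-width c = 120:

```python
# Reproducing the Chebyshev bound from Example 10 (fair coin, n = 3000 flips)
n, p = 3000, 0.5
mu = n * p             # 1500
var = n * p * (1 - p)  # 750

c = 120  # |X - mu| <= 120 corresponds to a fraction of heads in [.46, .54]
bound = var / c**2
print(round(bound, 4))      # 0.0521
print(round(1 - bound, 4))  # 0.9479
```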
Chebyshev's Inequality

Example 11. Let X be a continuous rv with pdf

f(x) = 5/x⁶ if x ≥ 1, and f(x) = 0 otherwise

1. What bound does Chebyshev's inequality give for the probability


IP(X ≥ 2:5)?
2. For what value of a can we say IP(X ≥ a) ≤ 15%?

58 / 127
Chebyshev's Inequality

Example 11 (continued). We calculate the mean as

μ = ∫_{−∞}^{∞} x · f(x) dx = ∫_{1}^{∞} x · (5/x⁶) dx
  = ∫_{1}^{∞} 5/x⁵ dx = [−5/(4x⁴)]_{x=1}^{x=∞}
  = 5/4

The variance is σ² = IE(X²) − [IE(X)]², where

IE(X²) = ∫_{−∞}^{∞} x² · f(x) dx = ∫_{1}^{∞} 5/x⁴ dx = [−5/(3x³)]_{x=1}^{x=∞} = 5/3

Thus,

σ² = IE(X²) − [IE(X)]² = 5/3 − (5/4)² = 5/48 = .1042

59 / 127
Chebyshev's Inequality

Example 11 (continued).

1. We write

   IP(X ≥ 2.5) = IP(X − μ ≥ 2.5 − μ) = IP(X − μ ≥ 1.25)
               ≤ IP(|X − μ| ≥ 1.25) ≤ σ²/1.25² = .0667

2. If we want IP(X ≥ a) ≤ .15 using Chebyshev's inequality, then we
need to write (notice that, from the definition of f, a must be
greater than 1)

   IP(X ≥ a) = IP(X − μ ≥ a − μ) = IP(X − μ ≥ a − 1.25)
             ≤ IP(|X − μ| ≥ a − 1.25) ≤ σ²/(a − 1.25)²
             = 5/(48(a − 1.25)²)

Taking 5/(48(a − 1.25)²) ≤ 0.15 gives a ≥ 2.0833.
¨
60 / 127
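Both parts of Example 11 reduce to plugging μ = 5/4 and σ² = 5/48 into the Chebyshev bound; a short Python check (an addition, not from the slides):

```python
import math

mu, var = 5 / 4, 5 / 48  # mean and variance of X from Example 11

# 1. Chebyshev bound for P(X >= 2.5)
bound = var / (2.5 - mu) ** 2
print(round(bound, 4))  # 0.0667

# 2. smallest a with var / (a - mu)**2 <= 0.15
a = mu + math.sqrt(var / 0.15)
print(round(a, 4))      # 2.0833
```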
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

61 / 127
The Normal Distribution

Definition
A continuous rv X is said to have a normal distribution with parameters
μ and σ², where −∞ < μ < ∞ and σ > 0, if the pdf of X is

f(x; μ, σ) = (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)}   for −∞ < x < ∞

The statement that X is normally distributed with parameters μ and σ²
is usually abbreviated X ∼ N(μ; σ²).

Clearly, f(x; μ, σ) > 0. We can also verify that
∫_{−∞}^{∞} f(x; μ, σ) dx = 1. The first condition implies that X takes
its values between −∞ and ∞.

62 / 127
The Normal Distribution

Proposition
If X ∼ N(μ; σ²), then

IE(X) = μ,   V(X) = σ²   and   σ_X = σ

63 / 127
The Normal Distribution

In the following figure, we picture graphs of the density f(x; μ, σ) for
two different values of μ but the same σ.

Each density curve is symmetric about μ and bell-shaped, so the center
of the bell (point of symmetry) is both the mean of the distribution and
the median (μ = μ̃). The mean μ is a location parameter, since changing
its value rigidly shifts the density curve to one side or the other.

64 / 127
The Normal Distribution

To see the effect of σ on the density curve, we plot, in the following
figure, f(x; μ, σ) for the same μ and different values of σ.

Usually, σ is referred to as a scale parameter, because changing its value
stretches or compresses the curve horizontally without changing the basic
shape. A large value of σ corresponds to a density curve that is quite
spread out about μ, whereas a small value yields a highly concentrated
curve.

65 / 127
The Normal Distribution

If X ∼ N(μ, σ²), then for any numbers a and b such that a ≤ b,

IP(a ≤ X ≤ b) = ∫_a^b (1/(σ√(2π))) e^{−(x−μ)²/(2σ²)} dx

Unfortunately, we cannot compute this integral explicitly using standard
integration techniques.
Instead, for μ = 0 and σ = 1, such integrals have been calculated using
numerical techniques and tabulated for certain values of a and b.

66 / 127
The Normal Distribution

Definition
The normal distribution with parameters μ = 0 and σ = 1 is called the
standard normal distribution. A random variable having a standard
normal distribution is called a standard normal random variable and will
be denoted by Z. The pdf of Z is

f(z; 0, 1) = (1/√(2π)) e^{−z²/2}   for −∞ < z < ∞

The graph of f(z; 0, 1) is called the standard normal (or z) curve. The
cdf of Z is

IP(Z ≤ z) = ∫_{−∞}^{z} f(y; 0, 1) dy

and will be denoted by Φ(z). Moreover, Φ is increasing (without ever
remaining constant).

67 / 127
The Normal Distribution

The standard normal distribution almost never serves as a model for a
naturally arising population. Instead, it is a reference distribution from
which information about other normal distributions can be obtained.

The values Φ(z) = IP(Z ≤ z) are tabulated in Appendix Table A.3 for
z = −3.49, −3.48, . . . , 3.48, 3.49.

On the graph of the z curve, we can picture the value Φ(z), which is the
area under the standard normal density curve to the left of z.

Figure 4.14 Standard normal cumulative areas tabulated in Appendix Table A.3

68 / 127
The Normal Distribution

Example 12. Using the z table, determine the following standard normal
probabilities:

1. IP(Z ≤ 1:25)
2. IP(Z > 1:25)
3. IP(Z ≤ −1:25)
4. IP(−:38 ≤ Z ≤ 1:25)
5. IP(Z ≤ 5)

69 / 127
The Normal Distribution

Example 12 (continued).

1. IP(Z ≤ 1:25) = Φ(1:25) = :8944 (this is the value at the


intersection of the row marked 1.2 and the column marked .05).

2. IP(Z > 1:25) = 1 − IP(Z ≤ 1:25) = 1 − Φ(1:25) = 1 − :8944 = :1056


3. IP(Z ≤ −1:25) = Φ(−1:25) = 0:1056.
4. IP(−:38 ≤ Z ≤ 1:25) = Φ(1:25) − Φ(−:38) = :8944 − :3520 = :5424.
5. IP(Z ≤ 5) = Φ(5). This probability does not appear in the z table
because the last row is labeled 3.4. However, the last entry in that
row is Φ(3:49) = :9998. Since Φ is a non-decreasing function (as
any cdf ), we can deduce that Φ(5) ≥ Φ(3:49). Thus, we conclude
that IP(Z ≤ 5) = Φ(5) ≈ 1.
¨

70 / 127
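Since Φ has no closed form, software evaluates it numerically instead of by table lookup. A table-free sketch using only Python's standard library (the error function is related to the normal cdf by Φ(z) = ½(1 + erf(z/√2))) reproduces the values of Example 12:

```python
from math import erf, sqrt

def phi(z):
    # Standard normal cdf: Phi(z) = (1 + erf(z / sqrt(2))) / 2
    return 0.5 * (1 + erf(z / sqrt(2)))

print(round(phi(1.25), 4))               # 0.8944
print(round(1 - phi(1.25), 4))           # 0.1056
print(round(phi(-1.25), 4))              # 0.1056
print(round(phi(1.25) - phi(-0.38), 4))  # 0.5424
```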
The Normal Distribution

In the previous example, the probability IP(Z ≤ 1.25) = Φ(1.25) can be
pictured on the z curve. This is the shaded area to the left of 1.25.

Figure 4.15 Normal curve areas (probabilities) for Example 4.13

On the other hand, in order to compute Φ(−1.25) (called the lower-tail
area), notice that, by symmetry of the z curve, the lower-tail area and the
upper-tail area are equal, i.e. Φ(−1.25) = IP(Z > 1.25). This leads to

Φ(−1.25) = 1 − IP(Z ≤ 1.25) = 1 − Φ(1.25)

This applies to any number z.

71 / 127
The Normal Distribution

Proposition
For any number z,

Φ(−z) = 1 − Φ(z).

Therefore, one table is sufficient to compute all the Φ(z). We usually use
the one for z ≥ 0.

The foregoing proposition gives Φ(0) = .5. Since Φ is (strictly)
increasing, then, for any negative number z, Φ(z) < .5, and for any
positive number z, Φ(z) > .5.

72 / 127
The Normal Distribution

The z table gives, for fixed z, the area under the standard normal curve to
the left of z. In some situations, we have the area and want the value of z.
The table can then be used in an inverse fashion: find in the middle of the
table the probability Φ(z); the row and column in which it lies identify
the desired value z.

This is useful when we need to compute the percentiles of the standard
normal distribution. For a given p, 0 < p < 1, that does not appear in the
table, to find the corresponding value z such that Φ(z) = p, the number
closest to p is typically used. To get a more accurate answer, we can use
linear interpolation.

73 / 127
The Normal Distribution

Example 13. Find the 99th percentile, the first percentile, the 95th
percentile and the fifth percentile of the standard normal distribution.

74 / 127
The Normal Distribution

Example 13 (continued). The 99th percentile satisfies Φ(z) = .99. Since
.99 does not appear in the table, we use the closest number to it, which is
.9901. The probability .9901 lies at the intersection of the row marked
2.3 and the column marked .03, so the 99th percentile is (approximately)
z = 2.33.

By symmetry, the first percentile is as far below 0 as the 99th is above 0,
so it equals −2.33 (1% lies below the first and also above the 99th).

To find the 95th percentile, we look for .9500 inside the table. Although
it does not appear, both .9495 and .9505 do, corresponding to z = 1.64
and 1.65, respectively. Since .9500 is halfway between the two
probabilities that do appear, we will use 1.645 as the 95th percentile and
−1.645 as the 5th percentile. ¨

75 / 127
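Percentiles can also be computed without a table by numerically inverting Φ. A minimal bisection sketch (stdlib only, with Φ obtained from the erf identity Φ(z) = ½(1 + erf(z/√2))):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

def z_percentile(p):
    # Bisection on [-10, 10]: phi is strictly increasing, so the root is unique
    lo, hi = -10.0, 10.0
    for _ in range(80):
        mid = (lo + hi) / 2
        if phi(mid) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

print(round(z_percentile(0.99), 2))    # 2.33
print(round(z_percentile(0.95), 3))    # 1.645
print(round(z_percentile(0.05), 3))    # -1.645
```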
The Normal Distribution

Notation
Let α be a number between 0 and 1. We denote by zα the value on the z
axis for which α of the area under the z curve lies to the right of zα.

For example, z.05 captures upper-tail area .05.

Figure 4.19 zα notation illustrated (shaded area = IP(Z ≥ zα) = α)

So zα is the 100(1 − α)th percentile of the standard normal distribution.
By symmetry, the area under the standard normal curve to the left of −zα
is also α. The zα's are usually referred to as z critical values.
76 / 127
The Normal Distribution

The most useful z percentiles and zα values are listed below.

Percentile            90     95     97.5   99     99.5   99.9   99.95
α (upper-tail area)   .1     .05    .025   .01    .005   .001   .0005
zα                    1.28   1.645  1.96   2.33   2.58   3.08   3.27

Example 14. The critical value z.05 is the 100(1 − .05)th = 95th
percentile of the standard normal distribution, so z.05 = 1.645. By
symmetry of the z curve, the area under the standard normal curve to
the left of −z.05 is also .05.

Figure 4.20 Finding z.05

¨

Nonstandard Normal Distributions
77 / 127
The Normal Distribution

Using the z table, we can treat nonstandard normal distributions.

In fact, when X ∼ N(μ, σ²), probabilities involving X are computed by
standardizing. The standardized variable is

(X − μ)/σ

Subtracting μ shifts the mean from μ to zero, and then dividing by σ
scales the variable so that the standard deviation is 1 rather than σ.

78 / 127
The Normal Distribution

Proposition
If X has a normal distribution with mean μ and standard deviation σ,
then

Z = (X − μ)/σ

has a standard normal distribution. Thus

IP(a ≤ X ≤ b) = IP((a − μ)/σ ≤ (X − μ)/σ ≤ (b − μ)/σ)
             = Φ((b − μ)/σ) − Φ((a − μ)/σ)

IP(X ≤ a) = Φ((a − μ)/σ)   and   IP(X ≥ b) = 1 − Φ((b − μ)/σ)

79 / 127
The Normal Distribution

Example 15. Let X be a normal rv having a mean value of 1.25 and a


standard deviation of .46. What is the probability that X is between 1
and 1.75? The probability that X will exceed 2?

80 / 127
The Normal Distribution

Example 15 (continued). Since X ∼ N(1.25, .46²), then

Z = (X − 1.25)/.46 ∼ N(0, 1)

Thus,

IP(1 ≤ X ≤ 1.75) = IP((1 − 1.25)/.46 ≤ Z ≤ (1.75 − 1.25)/.46)
                 = IP(−.25/.46 ≤ Z ≤ .5/.46)
                 = Φ(.5/.46) − Φ(−.25/.46)
                 = Φ(1.09) − Φ(−.54)
                 = Φ(1.09) − (1 − Φ(.54))
                 = .8621 − 1 + .7054
                 = .5675

81 / 127
The Normal Distribution

Example 15 (continued). The probability that X exceeds 2 is

IP(X ≥ 2) = IP(Z ≥ (2 − 1.25)/.46)
          = IP(Z > 1.63)
          = 1 − Φ(1.63)
          = 1 − .9484
          = .0516

82 / 127
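Example 15's standardizing steps can be replayed in Python (Φ via the stdlib erf identity; the small differences from the slide's answers arise because the slide rounds z to two decimals before using the table):

```python
from math import erf, sqrt

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

mu, sigma = 1.25, 0.46

# P(1 <= X <= 1.75) = Phi((1.75 - mu)/sigma) - Phi((1 - mu)/sigma)
p_between = phi((1.75 - mu) / sigma) - phi((1 - mu) / sigma)

# P(X >= 2) = 1 - Phi((2 - mu)/sigma)
p_exceed = 1 - phi((2 - mu) / sigma)

print(p_between)   # close to the slide's .5675
print(p_exceed)    # close to the slide's .0516
```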
The Normal Distribution

Example 16. What is the probability that a rv X ∼ N(—; ff 2 ) is within 1


standard deviation of its mean value? 2 standard deviations? 3 standard
deviations?

83 / 127
The Normal Distribution

Example 16 (continued). Denote Z = (X − μ)/σ. Since Z ∼ N(0, 1), we have

IP(μ − σ ≤ X ≤ μ + σ) = IP((μ − σ − μ)/σ ≤ (X − μ)/σ ≤ (μ + σ − μ)/σ)
                      = IP(−1 ≤ Z ≤ 1) = Φ(1) − Φ(−1)
                      = Φ(1) − (1 − Φ(1)) = 2Φ(1) − 1 = .6826

Similarly,

IP(μ − 2σ ≤ X ≤ μ + 2σ) = IP(−2 ≤ Z ≤ 2) = Φ(2) − Φ(−2)
                        = Φ(2) − (1 − Φ(2)) = 2Φ(2) − 1 = .9544

and

IP(μ − 3σ ≤ X ≤ μ + 3σ) = 2Φ(3) − 1 = .9974
¨

84 / 127
The Normal Distribution

Proposition (The Empirical Rule)


If the population distribution of a variable is (approximately) normal,
then

1. Roughly 68% of the values are within 1 SD of the mean.

2. Roughly 95% of the values are within 2 SDs of the mean.

3. Roughly 99.7% of the values are within 3 SDs of the mean.

Hence, it is unlikely to observe a value from a normal population that is


much farther than 2 standard deviations from —.

85 / 127
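The three probabilities of the Empirical Rule reduce to 2Φ(k) − 1 and can be checked numerically; a short stdlib sketch (the slide's table-based values .6826, .9544, .9974 differ from these only in the last rounded digit):

```python
from math import erf, sqrt

def within(k):
    # P(mu - k*sigma <= X <= mu + k*sigma) = 2*Phi(k) - 1 for any normal rv
    phi_k = 0.5 * (1 + erf(k / sqrt(2)))
    return 2 * phi_k - 1

print(round(within(1), 4))   # 0.6827
print(round(within(2), 4))   # 0.9545
print(round(within(3), 4))   # 0.9973
```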
The Normal Distribution

Figure: Illustration of the Empirical Rule (68% of the area lies within
μ ± σ, 95.4% within μ ± 2σ, and 99.7% within μ ± 3σ)

86 / 127
The Normal Distribution

In the following figure, we display a histogram for a binomial rv
X ∼ B(25, .6), for which the mean is μ = (25)·(.6) = 15 and the SD is
σ = √((25)·(.6)·(.4)) = √6 = 2.449. A normal curve with this μ and σ
has been superimposed on the probability histogram.

Figure 4.25 Binomial probability histogram for n = 25, p = .6 with normal
approximation curve superimposed
87 / 127
The Normal Distribution

The normal curve gives a very good approximation, especially in the


middle part of the picture. The area of any rectangle (probability of any
particular X value) except those in the extreme tails can be accurately
approximated by the corresponding normal curve area.

Therefore, if Y ∼ N(15; 6), then Y can be used to compute probabilities


involving X but extra care should be taken to ensure that probabilities
are computed in an accurate manner.

88 / 127
The Normal Distribution

In fact, the rectangles of the histogram of X are centered at integers. For
example, IP(X = 10) corresponds to a rectangle beginning at 9.5 and
ending at 10.5. In order to compute this probability using Y, we have to
consider the area under the approximating normal curve between 9.5 and
10.5, which is IP(9.5 ≤ Y ≤ 10.5). As a matter of fact,

IP(X = 10) = C(25, 10) · (.6)¹⁰ · (.4)¹⁵ = .0212

whereas

IP(9.5 ≤ Y ≤ 10.5) = IP(−2.25 ≤ Z ≤ −1.84) = .0207

where Z = (Y − μ)/σ ∼ N(0, 1).

The correction for discreteness that consists of adding or subtracting .5
before standardizing is often called a continuity correction.

89 / 127
The Normal Distribution

Proposition
Let X be a binomial rv based on n trials with success probability p. Then,
if the binomial probability histogram is not too skewed, X has
approximately a normal distribution with μ = np and σ = √(npq). In
particular, for a possible value x of X,

IP(X ≤ x) = B(x; n, p) ≈ Φ((x + .5 − np)/√(npq))

and

IP(X ≥ x) = 1 − B(x − 1; n, p) ≈ 1 − Φ((x − .5 − np)/√(npq))

In practice, the approximation is adequate provided that both np ≥ 10


and nq ≥ 10 (i.e. the expected number of successes and the expected
number of failures are both at least 10), since there is then enough
symmetry in the underlying binomial distribution.

90 / 127
The Normal Distribution

Example 17. Suppose that 25% of all students at a large university
receive financial aid. Let X be the number of students in a random
sample of size 50 who receive financial aid.

1. Identify the distribution of X, then compute the probability that:


a. At most 10 students receive aid.
b. Between 5 and 15 (inclusive) of the selected students receive aid.
2. Use the normal approximation to compute the two previous
probabilities.

91 / 127
The Normal Distribution

Example 17 (continued).

1. The rv X has a binomial distribution with parameters n = 50 and
p = .25.

a. The probability that at most 10 students receive aid is

IP(X ≤ 10) = B(10; 50, .25) = Σ_{x=0}^{10} b(x; 50, .25)
           = Σ_{x=0}^{10} C(50, x) (.25)^x (.75)^{50−x} = .2622

b. The probability that between 5 and 15 (inclusive) of the selected
students receive aid is

IP(5 ≤ X ≤ 15) = B(15; 50, .25) − B(4; 50, .25)
               = Σ_{x=5}^{15} C(50, x) (.25)^x (.75)^{50−x} = .8348

92 / 127
The Normal Distribution

Example 17 (continued).

2. Let μ = IE(X) = np = 12.5 and σ = σ_X = √(npq) = 3.06. Since
np = (50)·(.25) = 12.5 > 10 and nq = 37.5 > 10, the
approximation with a normal distribution can safely be applied.

a. The probability that at most 10 students receive aid is

IP(X ≤ 10) ≈ Φ((10 + .5 − 12.5)/3.06) = Φ(−.65) = .2578

b. The probability that between 5 and 15 (inclusive) of the selected
students receive aid is

IP(5 ≤ X ≤ 15) ≈ Φ((15 + .5 − 12.5)/3.06) − Φ((5 − .5 − 12.5)/3.06)
               = Φ(.98) − Φ(−2.61) = .8320

The approximations are quite good.

93 / 127
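The exact binomial value of part 1(a) and its continuity-corrected normal approximation can be compared directly in Python (a stdlib-only sketch):

```python
from math import comb, erf, sqrt

n, p = 50, 0.25
mu, sigma = n * p, sqrt(n * p * (1 - p))   # 12.5 and about 3.06

def phi(z):
    return 0.5 * (1 + erf(z / sqrt(2)))

# Exact: B(10; 50, .25) = sum of the binomial pmf for x = 0..10
exact = sum(comb(n, x) * p**x * (1 - p)**(n - x) for x in range(11))

# Normal approximation with continuity correction: Phi((10 + .5 - mu)/sigma)
approx = phi((10 + 0.5 - mu) / sigma)

print(round(exact, 4))   # 0.2622
print(approx)            # close to the slide's .2578
```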
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

94 / 127
The Exponential Distribution

Definition
A rv X is said to have an exponential distribution with (scale) parameter
λ, λ > 0, if the pdf of X is

f(x; λ) = λe^{−λx}  for x ≥ 0,  and  f(x; λ) = 0  otherwise

Clearly, f(x; λ) ≥ 0 and ∫_{−∞}^{∞} f(x; λ) dx = 1.
Moreover, since f(x; λ) = 0 for x < 0, X takes non-negative values
almost surely (i.e. with probability 1).

95 / 127
The Exponential Distribution

In the following figure, we picture exponential density curves f(x; λ) for
three different values of λ (λ = 2, λ = 1, and λ = .5).

Figure 4.26 Exponential density curves

Note that, for large values of λ, the curve is more concentrated near 0.
This is due to the fact that the standard deviation of X is 1/λ (see
next slide).

96 / 127
The Exponential Distribution

Proposition
If X has an exponential distribution with parameter λ, then

IE(X) = 1/λ,   V(X) = 1/λ²   and   σ_X = 1/λ

The expected value of X requires integration by parts (u = x,
v′ = λe^{−λx}):

IE(X) = ∫_{−∞}^{∞} x f(x; λ) dx = ∫₀^∞ xλe^{−λx} dx
      = [−xe^{−λx}]_{x=0}^{x=∞} + ∫₀^∞ e^{−λx} dx = 0 − [(1/λ)e^{−λx}]_{x=0}^{x=∞} = 1/λ

97 / 127
The Exponential Distribution

For the variance, the computation of IE(X²) requires integrating by parts
twice in succession (u = x² and v′ = λe^{−λx}, then u = x and v′ = e^{−λx}):

IE(X²) = ∫₀^∞ x²λe^{−λx} dx = [−x²e^{−λx}]_{x=0}^{x=∞} + ∫₀^∞ 2xe^{−λx} dx
       = 0 + 2 ∫₀^∞ xe^{−λx} dx = 2([−xe^{−λx}/λ]_{x=0}^{x=∞} + ∫₀^∞ (e^{−λx}/λ) dx)
       = (2/λ) ∫₀^∞ e^{−λx} dx = [−(2/λ²)e^{−λx}]_{x=0}^{x=∞} = 2/λ²

Thus,

V(X) = IE(X²) − [IE(X)]² = 2/λ² − (1/λ)² = 1/λ²

98 / 127
The Exponential Distribution

Proposition
If X has an exponential distribution with parameter λ, then the cdf of X
is given by:

F(x; λ) = 1 − e^{−λx}  for x ≥ 0,  and  F(x; λ) = 0  for x < 0

The exponential cdf is obtained by integrating the pdf: for any number
x, F(x; λ) = ∫_{−∞}^{x} f(y; λ) dy.

• If x < 0, F(x; λ) = 0 (because the density vanishes on the negative
x axis).

• If x ≥ 0, then

F(x; λ) = ∫₀^x λe^{−λy} dy = [−e^{−λy}]_{y=0}^{y=x} = 1 − e^{−λx}

99 / 127
The Exponential Distribution

Example 18. The distribution of stress range in certain bridge


connections is exponential with mean value 6 MPa. Compute the
probability that stress range is at most 10 MPa and the probability that
stress range is between 5 and 10 MPa.

100 / 127
The Exponential Distribution

Example 18 (continued). Since IE(X) = 1/λ = 6, then λ = 1/6 = .1667.

The probability that stress range is at most 10 MPa is

IP(X ≤ 10) = F(10; .1667) = 1 − e^{−(.1667)·(10)} = .8112

The probability that stress range is between 5 and 10 MPa is

IP(5 ≤ X ≤ 10) = F(10; .1667) − F(5; .1667)
               = (1 − e^{−(.1667)·(10)}) − (1 − e^{−(.1667)·(5)})
               = e^{−.8333} − e^{−1.667}
               = .2457

101 / 127
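Example 18's computations follow directly from the cdf; a quick stdlib check (using the exact λ = 1/6 rather than the rounded .1667, which shifts the fourth decimal slightly):

```python
from math import exp

lam = 1 / 6   # lambda, since E(X) = 1/lambda = 6 MPa

def F(x):
    # Exponential cdf: F(x; lambda) = 1 - e**(-lambda*x) for x >= 0
    return 1 - exp(-lam * x) if x >= 0 else 0.0

print(F(10))          # close to the slide's .8112
print(F(10) - F(5))   # close to the slide's .2457
```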
The Exponential Distribution

An important application of the exponential distribution is to model the


distribution of component lifetime. This is due to the fact that this
distribution has the memoryless property.

Suppose component lifetime is exponentially distributed with parameter


–. After putting the component into service, we leave for a period of t0
hours and then return to nd the component still working. What is the
probability that it lasts at least an additional t hours?

102 / 127
The Exponential Distribution

We wish to compute IP(X ≥ t + t₀ | X ≥ t₀). By the definition of
conditional probability,

IP(X ≥ t + t₀ | X ≥ t₀) = IP[(X ≥ t + t₀) ∩ (X ≥ t₀)] / IP(X ≥ t₀)

But (X ≥ t + t₀) ∩ (X ≥ t₀) = (X ≥ t + t₀). Therefore,

IP(X ≥ t + t₀ | X ≥ t₀) = IP(X ≥ t + t₀) / IP(X ≥ t₀)
                        = (1 − F(t + t₀; λ)) / (1 − F(t₀; λ))
                        = e^{−λ(t+t₀)} / e^{−λt₀} = e^{−λt}

This conditional probability is identical to the original probability
IP(X ≥ t) that the component lasts t hours. Thus the distribution of
additional lifetime is exactly the same as the original distribution of
lifetime, so the distribution of remaining lifetime is independent of
current age.

103 / 127
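The memoryless identity above is easy to verify numerically; a tiny stdlib sketch, with λ, t₀ and t chosen as arbitrary illustrative values:

```python
from math import exp, isclose

lam, t0, t = 0.5, 3.0, 2.0   # arbitrary example values

def tail(x):
    # P(X >= x) = 1 - F(x; lambda) = e**(-lambda*x) for x >= 0
    return exp(-lam * x)

# P(X >= t + t0 | X >= t0) = P(X >= t + t0) / P(X >= t0)
conditional = tail(t + t0) / tail(t0)

assert isclose(conditional, tail(t))   # memoryless: equals P(X >= t)
print(conditional)
```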
The Exponential Distribution

The exponential distribution is frequently used as a model for the


distribution of times between the occurrence of successive events, such as
customers arriving at a service facility or calls coming in to a
switchboard. The reason for this is that the exponential distribution is
closely related to the Poisson process.

Proposition
Suppose that the number of events occurring in any time interval of
length t has a Poisson distribution with parameter ¸t (where ¸, the rate
of the event process, is the expected number of events occurring in 1 unit
of time) and that numbers of occurrences in nonoverlapping intervals are
independent of one another. Then the distribution of elapsed time
between the occurrence of two successive events is exponential with
parameter – = ¸.

104 / 127
The Exponential Distribution

Example 19. Suppose that calls to a certain center occur according to a


Poisson process with rate ¸ = :5 call per minute.

1. Compute the probability that more than 2 minutes elapse between


calls.

2. What is the expected time between successive calls?

105 / 127
The Exponential Distribution

Example 19 (continued). The number of minutes X between successive


calls has an exponential distribution with parameter value .5.

1. The probability that more than 2 minutes elapse between calls is

IP(X > 2) = 1 − IP(X ≤ 2) = 1 − F(2; .5) = e^{−(.5)·(2)} = .3679

2. The expected time between successive calls is IE(X) = 1/.5 = 2
minutes.

106 / 127
The Exponential Distribution

Example 19 (continued).

Figure: In the call center, the number of calls is distributed according to a Poisson

distribution, while the elapsed time (in minutes) between the occurrence of two

successive events is exponential

¨
107 / 127
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

108 / 127
The Gamma Distributions

Definition
For α > 0, the gamma function Γ(α) is defined by

Γ(α) = ∫₀^∞ x^{α−1} e^{−x} dx

The most important properties of the gamma function are the following:

1. For any α > 1, Γ(α) = (α − 1) · Γ(α − 1) [via integration by parts].

2. For any positive integer n, Γ(n) = (n − 1)!.

3. Γ(1/2) = √π.

109 / 127
The Gamma Distributions

Example 20. Compute the following integrals:

1. ∫₀^∞ x⁵e^{−x} dx.

2. ∫₀^∞ x³e^{−x/5} dx.

110 / 127
The Gamma Distributions

Example 20 (continued).

1. ∫₀^∞ x⁵e^{−x} dx = ∫₀^∞ x^{6−1}e^{−x} dx = Γ(6) = 5! = 120.

2. We make the change of variable y = x/5. Thus,

∫₀^∞ x³e^{−x/5} dx = ∫₀^∞ (5y)³ · e^{−y} · 5 dy = 5⁴ ∫₀^∞ y³e^{−y} dy
                   = (625) · Γ(4) = (625) · (3!) = (625) · (6) = 3750

111 / 127
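Both computations of Example 20 can be checked against the standard library's gamma function:

```python
from math import gamma, pi

# Property 2: Gamma(n) = (n - 1)!, so integral_0^inf x**5 e**-x dx = Gamma(6) = 120
assert abs(gamma(6) - 120) < 1e-9

# Property 3: Gamma(1/2) = sqrt(pi)
assert abs(gamma(0.5) ** 2 - pi) < 1e-9

# Example 20(2): the substitution y = x/5 gives 5**4 * Gamma(4) = 3750
print(round(5 ** 4 * gamma(4), 6))   # 3750.0
```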
The Gamma Distributions

Definition
A continuous random variable X is said to have a gamma distribution if
the pdf of X is

f(x; α, β) = (1/(β^α Γ(α))) x^{α−1} e^{−x/β}  for x ≥ 0,  and  f(x; α, β) = 0  otherwise

where the parameters α and β satisfy α > 0, β > 0. The standard
gamma distribution has β = 1; its pdf is denoted by f(x; α).

Remark
The exponential distribution results from taking α = 1 and β = 1/λ.

112 / 127
The Gamma Distributions

The figure at left below illustrates the graphs of the gamma pdf f(x; α, β)
for several (α, β) pairs, whereas the plot at the right presents graphs of
the standard gamma pdf.

Figure 4.27 (a) Gamma density curves; (b) standard gamma density curves

For the standard pdf, when α ≤ 1, f(x; α) is strictly decreasing as x
increases from 0; when α > 1, f(x; α) rises from 0 at x = 0 to a
maximum and then decreases. The parameter β is a scale parameter, and
α is referred to as a shape parameter because changing its value alters
the basic shape of the density curve.

113 / 127
The Gamma Distributions

Proposition
The mean and variance of a random variable X having the gamma
distribution f(x; α, β) are

IE(X) = μ = αβ   and   V(X) = σ² = αβ²

When X is a standard gamma rv, the cdf of X,

F(x; α) = ∫₀^x (y^{α−1}e^{−y}/Γ(α)) dy   for x > 0

is called the incomplete gamma function. There are extensive tables of
F(x; α) available.

114 / 127
The Gamma Distributions

The incomplete gamma function can also be used to compute
probabilities involving nonstandard gamma distributions.

Proposition
Let X have a gamma distribution with parameters α and β. Then for any
x > 0, the cdf of X is given by

IP(X ≤ x) = F(x; α, β) = F(x/β; α)

where F(·; α) is the incomplete gamma function.

115 / 127
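The mean and variance formulas can be sanity-checked by simulation with the standard library's gamma sampler (random.gammavariate takes exactly the shape α and scale β of the pdf above; the pair α = 2, β = 3 is an arbitrary illustration):

```python
import random
from statistics import mean, pvariance

random.seed(1)
alpha, beta = 2.0, 3.0   # arbitrary shape and scale for illustration

sample = [random.gammavariate(alpha, beta) for _ in range(200_000)]

print(mean(sample))       # close to E(X) = alpha*beta = 6
print(pvariance(sample))  # close to V(X) = alpha*beta**2 = 18
```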
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

116 / 127
The Chi-Squared Distribution

The chi-squared distribution is important because it is the basis for a


number of procedures in statistical inference.

Definition
Let ν be a positive integer. Then a random variable X is said to have a
chi-squared distribution with parameter ν if the pdf of X is the gamma
density with α = ν/2 and β = 2. The pdf of a chi-squared rv is thus

f(x; ν) = (1/(2^{ν/2}Γ(ν/2))) x^{(ν/2)−1} e^{−x/2}  for x ≥ 0,  and  f(x; ν) = 0  for x < 0

The parameter ν is called the number of degrees of freedom (df) of X.
The symbol χ² is often used in place of chi-squared.

117 / 127
The Chi-Squared Distribution

Taking α = ν/2 and β = 2 in the formulas for the mean and variance of a
rv having a gamma distribution gives those of a χ²-distributed rv:

Proposition
If X ∼ χ²(ν), then

IE(X) = ν   and   V(X) = 2ν

118 / 127
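Since a χ²(ν) rv is just a gamma rv with shape ν/2 and scale 2, the stdlib gamma sampler verifies the mean and variance by simulation (ν = 5 is an arbitrary illustrative choice):

```python
import random
from statistics import mean, pvariance

random.seed(7)
nu = 5   # degrees of freedom (arbitrary illustrative choice)

# chi-squared(nu) = gamma with shape nu/2 and scale 2
sample = [random.gammavariate(nu / 2, 2) for _ in range(100_000)]

print(mean(sample))       # close to E(X) = nu = 5
print(pvariance(sample))  # close to V(X) = 2*nu = 10
```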
The Chi-Squared Distribution

In the following figure, we picture graphs of the density f(x; ν) for three
different values of ν (ν = 1, 5, 10).

Figure: Chi-squared density curves f(x; ν)

As ν increases, the curve is shifted to the right and becomes more and
more spread out about ν.

119 / 127
Outline

Chapter 4: Continuous Random Variables and Probability Distributions


Probability Distributions for Continuous Variables
Cumulative Distribution Function
Percentiles of a Continuous Distribution
Expected Values
Chebyshev's Inequality
The Normal Distribution
The Exponential Distribution
The Gamma Distributions
The Chi-Squared Distribution
Mixed random variables

120 / 127
Mixed random variables

The cdf of a discrete rv is a step function, while that of a continuous rv is
continuous.
There exist random variables that are neither discrete nor continuous, but
are a mixture of both. We call them mixed random variables.

The cdf of a mixed rv takes its values in [0, 1], is right continuous and
non-decreasing, goes to 0 as x → −∞ and increases to 1 as x → ∞. To
compute probabilities involving a mixed rv, the same formulas as for
discrete rvs are used:

IP(a ≤ X ≤ b) = F(b) − F(a⁻),   IP(a < X ≤ b) = F(b) − F(a)

IP(a < X < b) = F(b⁻) − F(a),   IP(a ≤ X < b) = F(b⁻) − F(a⁻)

where F(x⁻) denotes the limit of F from the left at x. So, if F is
continuous at x, F(x⁻) = F(x). Furthermore,

IP(X = x) = F(x) − F(x⁻).

121 / 127
Mixed random variables

Example 21. Let X be a continuous rv with the following pdf:

f(x) = 2x  for 0 ≤ x ≤ 1,  and  f(x) = 0  otherwise

Let also Y = min(X, 1/2), i.e.

Y = X  if 0 ≤ X ≤ 1/2,  and  Y = 1/2  if X > 1/2

1. Find the cdf of Y and graph it. Is Y discrete? Continuous? Explain.

2. Compute IE(Y), IP(Y = 1/2), IP(1/4 ≤ Y ≤ 3/8) and IP(Y ≥ 1/4).

122 / 127
Mixed random variables

Example 21 (continued).

1. The cdf of Y at y is given by F(y) = IP(Y ≤ y).

• If y < 0, F(y) = 0 because, by definition, Y is non-negative.

• If 0 ≤ y < 1/2, F(y) = IP(X ≤ y) = ∫₀^y 2x dx = [x²]_{x=0}^{x=y} = y².

• If y ≥ 1/2, F(y) = 1 because, by definition, Y is at most 1/2.

Thus, F(y) = 0 for y < 0, F(y) = y² for 0 ≤ y < 1/2, and F(y) = 1 for
y ≥ 1/2.

123 / 127
Mixed random variables

Example 21 (continued).

The following figure shows the cdf of Y: it is 0 for y < 0, rises as y² on
[0, 1/2), and jumps from 1/4 to 1 at y = 1/2.

Figure: cdf of Y

The cdf of Y is not continuous, so Y is not a continuous random


variable. On the other hand, the cdf is not in the staircase form, so
Y is not a discrete random variable either. It is indeed a mixed
random variable.

124 / 127
Mixed random variables

Example 21 (continued).

2. • If we write Y = h(X), then

IE(Y) = IE[h(X)] = ∫_{−∞}^{∞} h(x)·f(x) dx = ∫₀^1 h(x)·2x dx
      = 2 ∫₀^{1/2} x² dx + ∫_{1/2}^1 x dx = 1/12 + 1/2 − 1/8 = 11/24 = .4583

• IP(Y = 1/2) = F(1/2) − F(1/2⁻) = 1 − 1/4 = 3/4. This is the amplitude of the
jump at y = 1/2.

• IP(1/4 ≤ Y ≤ 3/8) = F(3/8) − F(1/4⁻) = F(3/8) − F(1/4) = (3/8)² − (1/4)²
  = 5/64 = .0781

• IP(Y ≥ 1/4) = 1 − IP(Y < 1/4) = 1 − F(1/4⁻) = 1 − F(1/4) = 1 − 1/16 = .9375

• IP(Y ≥ 1/2) = 1 − IP(Y < 1/2) = 1 − F(1/2⁻) = 1 − 1/4 = .75

125 / 127
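The values IE(Y) = 11/24 and IP(Y = 1/2) = 3/4 can be confirmed by simulation: since X has cdf x² on [0, 1], it is generated by inversion as √U for uniform U, and Y = min(X, 1/2):

```python
import random
from statistics import mean

random.seed(3)
# X has cdf x**2 on [0, 1], so X = sqrt(U) by inversion; Y = min(X, 1/2)
ys = [min(random.random() ** 0.5, 0.5) for _ in range(200_000)]

print(mean(ys))                              # close to E(Y) = 11/24 = .4583
frac_at_half = sum(y == 0.5 for y in ys) / len(ys)
print(frac_at_half)                          # close to P(Y = 1/2) = 3/4
```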
Mixed random variables

Notice that we could have computed IE(Y) using the cdf F_Y by mixing its
derivative (where it exists) with its jumps.
In fact, on (0, 1/2), Y has a density given by f_Y(y) = F_Y′(y) = 2y (this is
the continuous part of Y); on (−∞, 0) and on (1/2, ∞), the derivative
vanishes because F_Y is constant. Moreover, F_Y has a jump at y = 1/2 of
amplitude IP(Y = 1/2) = 3/4; this is the discrete part of Y. Hence,

IE(Y) = ∫₀^{1/2} y(2y) dy + (1/2)·IP(Y = 1/2)
      = 2[y³/3]_{y=0}^{y=1/2} + (1/2)·(3/4)
      = 1/12 + 3/8 = 11/24

126 / 127
Mixed random variables

Moreover, the derivative (where it exists) of F_Y is a deficient pdf because
its integral is not equal to 1. In fact,

∫₀^{1/2} 2y dy = 1/4

But if we add to the integral the jump at 1/2, IP(Y = 1/2), we get
1/4 + 3/4 = 1. We retrieve the idea of a total probability of 1, but here, since
Y is a mixed rv, we have to add the integral of the deficient pdf and the
deficient pmf (i.e. the amplitudes of the jump(s)) to get a total of 1. ¨

127 / 127
