1.10 Important Inequalities



which is the desired result.

Theorem 1.10.2 (Markov's Inequality). Let u(X) be a nonnegative function of the random variable X. If E[u(X)] exists, then for every positive constant c,
\[ P[u(X) \ge c] \le \frac{E[u(X)]}{c}. \]

Proof. The proof is given when the random variable X is of the continuous type;
but the proof can be adapted to the discrete case if we replace integrals by sums.
Let A = {x : u(x) ≥ c} and let f(x) denote the pdf of X. Then
\[ E[u(X)] = \int_{-\infty}^{\infty} u(x)f(x)\,dx = \int_{A} u(x)f(x)\,dx + \int_{A^c} u(x)f(x)\,dx. \]
Since each of the integrals in the extreme right-hand member of the preceding equation is nonnegative, the left-hand member is greater than or equal to either of them. In particular,
\[ E[u(X)] \ge \int_{A} u(x)f(x)\,dx. \]
However, if x ∈ A, then u(x) ≥ c; accordingly, the right-hand member of the preceding inequality is not increased if we replace u(x) by c. Thus
\[ E[u(X)] \ge c \int_{A} f(x)\,dx. \]
Since
\[ \int_{A} f(x)\,dx = P(X \in A) = P[u(X) \ge c], \]
it follows that
\[ E[u(X)] \ge c\,P[u(X) \ge c], \]

which is the desired result.
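As a numerical illustration, Markov's inequality can be checked by simulation. The following Python sketch estimates both sides by Monte Carlo; the choice of u(X) = X with X exponential with mean 1, and the constant c = 3, are ours, made purely for illustration:

```python
import random

# Monte Carlo sketch of Markov's inequality: P[u(X) >= c] <= E[u(X)]/c.
# Illustrative choice: u(X) = X with X exponential with mean 1, so E[u(X)] = 1.
random.seed(1)
sample = [random.expovariate(1.0) for _ in range(100_000)]
c = 3.0
prob = sum(x >= c for x in sample) / len(sample)  # estimates P(X >= c) = e^{-3}
bound = sum(sample) / len(sample) / c             # estimates E[X]/c, about 1/3
assert prob <= bound
```

Here the exact probability e⁻³ ≈ 0.0498 is well below the Markov bound of about 1/3, showing the bound holds but need not be tight.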

The preceding theorem is a generalization of an inequality often called Chebyshev's Inequality, which we now establish.

Theorem 1.10.3 (Chebyshev's Inequality). Let X be a random variable with finite variance σ² (by Theorem 1.10.1, this implies that the mean μ = E(X) exists). Then for every k > 0,
\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}, \tag{1.10.2} \]
or, equivalently,
\[ P(|X - \mu| < k\sigma) \ge 1 - \frac{1}{k^2}. \]

Proof. In Theorem 1.10.2 take u(X) = (X − μ)² and c = k²σ². Then we have
\[ P[(X - \mu)^2 \ge k^2\sigma^2] \le \frac{E[(X - \mu)^2]}{k^2\sigma^2}. \]
Since the numerator of the right-hand member of the preceding inequality is σ², the inequality may be written
\[ P(|X - \mu| \ge k\sigma) \le \frac{1}{k^2}, \]
which is the desired result. Naturally, we would take the positive number k to be
greater than 1 to have an inequality of interest.
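Chebyshev's inequality can likewise be checked by simulation. The following Python sketch compares Monte Carlo estimates of P(|X − μ| ≥ kσ) with 1/k²; the standard normal sample and the particular values of k are illustrative choices, since the bound holds for any distribution with finite variance:

```python
import random

# Monte Carlo sketch of Chebyshev's inequality for a standard normal sample.
# The bound P(|X - mu| >= k*sigma) <= 1/k^2 holds for any finite-variance X.
random.seed(2)
xs = [random.gauss(0.0, 1.0) for _ in range(100_000)]
mu, sigma = 0.0, 1.0
for k in (1.5, 2.0, 3.0):
    prob = sum(abs(x - mu) >= k * sigma for x in xs) / len(xs)
    assert prob <= 1.0 / k ** 2  # e.g., for k = 2 the estimate is near 0.046 <= 0.25
```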

Hence, the number 1/k² is an upper bound for the probability P(|X − μ| ≥ kσ).
In the following example this upper bound and the exact value of the probability
are compared in special instances.
Example 1.10.1. Let X have the uniform pdf
\[ f(x) = \begin{cases} \dfrac{1}{2\sqrt{3}} & -\sqrt{3} < x < \sqrt{3} \\ 0 & \text{elsewhere.} \end{cases} \]
Based on Example 1.9.1, for this uniform distribution, we have μ = 0 and σ² = 1. If k = 3/2, we have the exact probability
\[ P(|X - \mu| \ge k\sigma) = P\left(|X| \ge \frac{3}{2}\right) = 1 - \int_{-3/2}^{3/2} \frac{1}{2\sqrt{3}}\,dx = 1 - \frac{\sqrt{3}}{2}. \]
By Chebyshev's inequality, this probability has the upper bound 1/k² = 4/9. Since 1 − √3/2 = 0.134, approximately, the exact probability in this case is considerably less than the upper bound 4/9. If we take k = 2, we have the exact probability P(|X − μ| ≥ 2σ) = P(|X| ≥ 2) = 0. This again is considerably less than the upper bound 1/k² = 1/4 provided by Chebyshev's inequality.
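The numbers in this example are easy to confirm directly; the following Python sketch computes the exact probability 1 − √3/2 and compares it with the Chebyshev bound 4/9:

```python
import math

# Exact probability vs. Chebyshev bound for the uniform pdf on (-sqrt(3), sqrt(3)).
exact = 1 - math.sqrt(3) / 2   # P(|X| >= 3/2), approximately 0.134
bound = 1 / 1.5 ** 2           # Chebyshev bound 1/k^2 = 4/9 for k = 3/2
assert exact < bound
# For k = 2 the exact probability is 0, since the support ends at sqrt(3) < 2,
# while the Chebyshev bound is 1/k^2 = 1/4.
```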
In each of the instances in Example 1.10.1, the probability P (|X − μ| ≥ kσ) and
its upper bound 1/k² differ considerably. This suggests that this inequality might
be made sharper. However, if we want an inequality that holds for every k > 0
and holds for all random variables having a finite variance, such an improvement is
impossible, as is shown by the following example.
Example 1.10.2. Let the random variable X of the discrete type have probabilities 1/8, 6/8, 1/8 at the points x = −1, 0, 1, respectively. Here μ = 0 and σ² = 1/4. If k = 2, then 1/k² = 1/4 and P(|X − μ| ≥ kσ) = P(|X| ≥ 1) = 1/4. That is, the probability P(|X − μ| ≥ kσ) here attains the upper bound 1/k² = 1/4. Hence the inequality cannot be improved without further assumptions about the distribution of X.
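The computation in this example can be verified directly from the pmf; a short Python sketch:

```python
# The three-point distribution of Example 1.10.2: the Chebyshev bound is attained.
pmf = {-1: 1 / 8, 0: 6 / 8, 1: 1 / 8}
mu = sum(x * p for x, p in pmf.items())                # 0
var = sum((x - mu) ** 2 * p for x, p in pmf.items())   # 1/4
sigma = var ** 0.5                                     # 1/2
k = 2
prob = sum(p for x, p in pmf.items() if abs(x - mu) >= k * sigma)  # P(|X| >= 1)
assert abs(prob - 1 / k ** 2) < 1e-12                  # equals the bound 1/4
```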
A convenient form of Chebyshev's Inequality is found by taking kσ = ε for ε > 0. Then Equation (1.10.2) becomes
\[ P(|X - \mu| \ge \varepsilon) \le \frac{\sigma^2}{\varepsilon^2}, \quad \text{for all } \varepsilon > 0. \tag{1.10.3} \]
The second inequality of this section involves convex functions.

(b) If r(x) = ce^{bx}, where c and b are positive constants, show that X has a Gompertz cdf given by
\[ F(x) = \begin{cases} 1 - \exp\left\{\dfrac{c}{b}\left(1 - e^{bx}\right)\right\} & 0 < x < \infty \\ 0 & \text{elsewhere.} \end{cases} \tag{3.3.13} \]

This is frequently used by actuaries as a distribution of the length of human life.
(c) If r(x) = bx, a linear hazard rate, show that the pdf of X is
\[ f(x) = \begin{cases} bx\,e^{-bx^2/2} & 0 < x < \infty \\ 0 & \text{elsewhere.} \end{cases} \tag{3.3.14} \]
This pdf is called the Rayleigh pdf.
3.3.27. Write an R function that returns the value f (x) for a specified x when
f (x) is the Weibull pdf given in expression (3.3.12). Next write an R function that
returns the associated hazard function r(x). Obtain side-by-side plots of the pdf
and hazard function for the three cases: c = 5 and b = 0.5; c = 5 and b = 1.0; and
c = 5 and b = 1.5.
3.3.28. In Example 3.3.5, a page of plots of β pdfs was discussed. All of these pdfs
are mound shaped. Obtain a page of plots for all combinations of α and β drawn
from the set {.25, .75, 1, 1.25}. Comment on these shapes.
3.3.29. Let Y1 , . . . , Yk have a Dirichlet distribution with parameters α1 , . . . , αk , αk+1 .
(a) Show that Y1 has a beta distribution with parameters α = α1 and β = α2 +
· · · + αk+1 .
(b) Show that Y1 + · · · + Yr , r ≤ k, has a beta distribution with parameters
α = α1 + · · · + αr and β = αr+1 + · · · + αk+1 .
(c) Show that Y1 + Y2 , Y3 + Y4 , Y5 , . . . , Yk , k ≥ 5, have a Dirichlet distribution
with parameters α1 + α2 , α3 + α4 , α5 , . . . , αk , αk+1 .
Hint: Recall the definition of Yi in Example 3.3.6 and use the fact that the
sum of several independent gamma variables with β = 1 is a gamma variable.

3.4 The Normal Distribution


Motivation for the normal distribution is found in the Central Limit Theorem, which
is presented in Section 5.3. This theorem shows that normal distributions provide
an important family of distributions for applications and for statistical inference,
in general. We proceed by first introducing the standard normal distribution and
through it the general normal distribution.
Consider the integral
\[ I = \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-z^2}{2}\right) dz. \tag{3.4.1} \]

This integral exists because the integrand is a positive continuous function that is bounded by an integrable function; that is,
\[ 0 < \exp\left(\frac{-z^2}{2}\right) < \exp(-|z| + 1), \quad -\infty < z < \infty, \]
and
\[ \int_{-\infty}^{\infty} \exp(-|z| + 1)\,dz = 2e. \]

To evaluate the integral I, we note that I > 0 and that I² may be written
\[ I^2 = \frac{1}{2\pi} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \exp\left(-\frac{z^2 + w^2}{2}\right) dz\,dw. \]
This iterated integral can be evaluated by changing to polar coordinates. If we set z = r cos θ and w = r sin θ, we have
\[ I^2 = \frac{1}{2\pi} \int_0^{2\pi} \int_0^{\infty} e^{-r^2/2}\, r\,dr\,d\theta = \frac{1}{2\pi} \int_0^{2\pi} d\theta = 1. \]
Because the integrand of display (3.4.1) is positive on R and integrates to 1 over
R, it is a pdf of a continuous random variable with support R. We denote this
random variable by Z. In summary, Z has the pdf
\[ f(z) = \frac{1}{\sqrt{2\pi}} \exp\left(\frac{-z^2}{2}\right), \quad -\infty < z < \infty. \tag{3.4.2} \]
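That the standard normal pdf integrates to 1 can also be confirmed numerically; the following Python sketch uses a simple Riemann sum (the interval [−10, 10] and the step size are illustrative choices, the tails beyond ±10 being negligible):

```python
import math

# Numerical sanity check that the standard normal pdf integrates to 1:
# a left Riemann sum over [-10, 10] with step h.
f = lambda z: math.exp(-z * z / 2) / math.sqrt(2 * math.pi)
h = 0.001
total = h * sum(f(-10 + i * h) for i in range(20000))
assert abs(total - 1.0) < 1e-6
```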
For t ∈ R, the mgf of Z can be derived by completing the square as follows:
\[ E[\exp\{tZ\}] = \int_{-\infty}^{\infty} \exp\{tz\} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}z^2\right\} dz \]
\[ = \exp\left\{\frac{1}{2}t^2\right\} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}(z - t)^2\right\} dz \]
\[ = \exp\left\{\frac{1}{2}t^2\right\} \int_{-\infty}^{\infty} \frac{1}{\sqrt{2\pi}} \exp\left\{-\frac{1}{2}w^2\right\} dw, \tag{3.4.3} \]
where for the last integral we made the one-to-one change of variable w = z − t. By the identity (3.4.2), the integral in expression (3.4.3) has value 1. Thus the mgf of Z is
\[ M_Z(t) = \exp\left\{\frac{1}{2}t^2\right\}, \quad \text{for } -\infty < t < \infty. \tag{3.4.4} \]
The first two derivatives of M_Z(t) are easily shown to be
\[ M_Z'(t) = t \exp\left\{\frac{1}{2}t^2\right\} \]
\[ M_Z''(t) = \exp\left\{\frac{1}{2}t^2\right\} + t^2 \exp\left\{\frac{1}{2}t^2\right\}. \]

Upon evaluating these derivatives at t = 0, the mean and variance of Z are

E(Z) = 0 and Var(Z) = 1. (3.4.5)
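Both the mgf (3.4.4) and the moments (3.4.5) can be checked by simulation; the following Python sketch compares Monte Carlo estimates with the theoretical values (the sample size and the value t = 0.5 are illustrative choices):

```python
import math
import random

# Monte Carlo check that E[exp(tZ)] = exp(t^2/2), E(Z) = 0, and Var(Z) = 1.
random.seed(3)
zs = [random.gauss(0.0, 1.0) for _ in range(200_000)]
t = 0.5
mgf_mc = sum(math.exp(t * z) for z in zs) / len(zs)
assert abs(mgf_mc - math.exp(t * t / 2)) < 0.02
mean_mc = sum(zs) / len(zs)
var_mc = sum(z * z for z in zs) / len(zs) - mean_mc ** 2
assert abs(mean_mc) < 0.02
assert abs(var_mc - 1.0) < 0.02
```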

Next, define the continuous random variable X by

X = bZ + a,

for b > 0. This is a one-to-one transformation. To derive the pdf of X, note that
the inverse of the transformation and the Jacobian are z = b−1 (x − a) and J = b−1 ,
respectively. Because b > 0, it follows from (3.4.2) that the pdf of X is
\[ f_X(x) = \frac{1}{\sqrt{2\pi}\,b} \exp\left\{-\frac{1}{2}\left(\frac{x - a}{b}\right)^2\right\}, \quad -\infty < x < \infty. \]
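The transformation X = bZ + a gives, by linearity, E(X) = a and Var(X) = b²; a Python simulation sketch (with arbitrary illustrative values a = 3 and b = 2) confirms this:

```python
import random
import statistics

# Simulation sketch of X = bZ + a: E(X) = a and Var(X) = b^2.
# The values a = 3, b = 2 are arbitrary illustration choices.
random.seed(4)
a, b = 3.0, 2.0
xs = [b * random.gauss(0.0, 1.0) + a for _ in range(200_000)]
assert abs(statistics.fmean(xs) - a) < 0.05
assert abs(statistics.pvariance(xs) - b * b) < 0.1
```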

By (3.4.5), we immediately have E(X) = a and Var(X) = b². Hence, in the expression for the pdf of X, we can replace a by μ = E(X) and b² by σ² = Var(X). We make this formal in the following:

Definition 3.4.1 (Normal Distribution). We say a random variable X has a normal distribution if its pdf is
\[ f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2\right\}, \quad \text{for } -\infty < x < \infty. \tag{3.4.6} \]
The parameters μ and σ² are the mean and variance of X, respectively. We often write that X has a N(μ, σ²) distribution.

In this notation, the random variable Z with pdf (3.4.2) has a N (0, 1) distribution.
We call Z a standard normal random variable.
For the mgf of X, use the relationship X = σZ + μ and the mgf for Z, (3.4.4), to obtain
\[ E[\exp\{tX\}] = E[\exp\{t(\sigma Z + \mu)\}] = \exp\{\mu t\} E[\exp\{t\sigma Z\}] \]
\[ = \exp\{\mu t\} \exp\left\{\frac{1}{2}\sigma^2 t^2\right\} = \exp\left\{\mu t + \frac{1}{2}\sigma^2 t^2\right\}, \tag{3.4.7} \]
for −∞ < t < ∞.


We summarize the above discussion by noting the relationship between Z and X:
\[ X \text{ has a } N(\mu, \sigma^2) \text{ distribution if and only if } Z = \frac{X - \mu}{\sigma} \text{ has a } N(0, 1) \text{ distribution.} \tag{3.4.8} \]
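The standardization (3.4.8) can be illustrated with Python's statistics.NormalDist; the values of μ, σ, and x below are arbitrary choices made only for illustration:

```python
from statistics import NormalDist

# Illustration of (3.4.8): P(X <= x) equals P(Z <= (x - mu)/sigma).
mu, sigma = 1.5, 0.7   # arbitrary illustrative parameters
x = 2.0                # arbitrary illustrative point
X = NormalDist(mu, sigma)
Z = NormalDist(0.0, 1.0)
assert abs(X.cdf(x) - Z.cdf((x - mu) / sigma)) < 1e-12
```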
Let X have a N (μ, σ 2 ) distribution. The graph of the pdf of X is seen in
Figure 3.4.1 to have the following characteristics: (1) symmetry about a vertical axis through x = μ; (2) having its maximum of 1/(σ√(2π)) at x = μ; and (3) having the x-axis as a horizontal asymptote. It should also be verified that (4) there are


Figure 3.4.1: The normal density f (x), (3.4.6).

points of inflection at x = μ ± σ; see Exercise 3.4.7. By the symmetry about μ, it follows that the median of a normal distribution is equal to its mean.
If we want to determine P(X ≤ x), then the following integration is required:
\[ P(X \le x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-(t-\mu)^2/(2\sigma^2)}\,dt. \]

From calculus we know that the integrand does not have an antiderivative; hence,
the integration must be carried out by numerical integration procedures. The R
software uses such a procedure for its function pnorm. If X has a N (μ, σ 2 ) distribu-
tion, then the R call pnorm(x, μ, σ) computes P (X ≤ x), while q = qnorm(p, μ, σ)
gives the pth quantile of X; i.e., q solves the equation P (X ≤ q) = p. We illustrate
this computation in the next example.

Example 3.4.1. Suppose the height in inches of an adult male is normally distributed with mean μ = 70 inches and standard deviation σ = 4 inches. For a graph of the pdf of X, use Figure 3.4.1 with μ replaced by 70 and σ by 4. Suppose we want to compute the probability that a man exceeds six feet (72 inches) in height. Locate 72 on the figure. The desired probability is the area under the curve over the interval (72, ∞), which is computed in R by 1-pnorm(72,70,4) = 0.3085; hence, about 31% of males exceed six feet in height. The 95th percentile in height is qnorm(0.95,70,4) = 76.6 inches. What percentage of males have heights within one standard deviation of the mean? Answer: pnorm(74,70,4) - pnorm(66,70,4) = 0.6827.
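The three R computations in this example can be mirrored without R, for instance with Python's statistics.NormalDist; a sketch:

```python
from statistics import NormalDist

# Mirroring the pnorm/qnorm calls of Example 3.4.1 with statistics.NormalDist.
X = NormalDist(70, 4)
assert abs((1 - X.cdf(72)) - 0.3085) < 5e-4           # P(height > 72)
assert abs(X.inv_cdf(0.95) - 76.6) < 0.05             # 95th percentile
assert abs((X.cdf(74) - X.cdf(66)) - 0.6827) < 5e-5   # within one sd of the mean
```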

Before the age of modern computing, tables of probabilities for normal distributions were formulated. By the relationship (3.4.8), only tables for the standard normal distribution are required. Let Z have the standard normal distribution. A graph of

its pdf is displayed in Figure 3.4.2. Common notation for the cdf of Z is
\[ P(Z \le z) = \Phi(z) \stackrel{\text{dfn}}{=} \int_{-\infty}^{z} \frac{1}{\sqrt{2\pi}}\, e^{-t^2/2}\,dt, \quad -\infty < z < \infty. \tag{3.4.9} \]
Table II of Appendix D displays a table for Φ(z) for specified values of z > 0. To compute Φ(−z), where z > 0, use the identity
\[ \Phi(-z) = 1 - \Phi(z). \tag{3.4.10} \]
This identity follows because the pdf of Z is symmetric about 0. It is apparent in
Figure 3.4.2 and the reader is asked to show it in Exercise 3.4.1.
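The identity (3.4.10) is also easy to verify numerically; a Python sketch using statistics.NormalDist (the test points are arbitrary choices):

```python
from statistics import NormalDist

# Checking Phi(-z) = 1 - Phi(z), which follows from the symmetry of the pdf about 0.
Phi = NormalDist().cdf
for z in (0.5, 1.0, 2.33):
    assert abs(Phi(-z) - (1 - Phi(z))) < 1e-12
```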


Figure 3.4.2: The standard normal density: p = Φ(zp ) is the area under the curve
to the left of zp .

As an illustration of the use of Table II, suppose in Example 3.4.1 that we want to determine the probability that the height of an adult male is between 67 and 71 inches. This is calculated as
\[ P(67 < X < 71) = P(X < 71) - P(X < 67) \]
\[ = P\left(\frac{X - 70}{4} < \frac{71 - 70}{4}\right) - P\left(\frac{X - 70}{4} < \frac{67 - 70}{4}\right) \]
\[ = P(Z < 0.25) - P(Z < -0.75) = \Phi(0.25) - [1 - \Phi(0.75)] \]
\[ = 0.5987 - 1 + 0.7734 = 0.3721 \tag{3.4.11} \]
\[ = \text{pnorm}(71, 70, 4) - \text{pnorm}(67, 70, 4) = 0.372079. \tag{3.4.12} \]
Expression (3.4.11) is the calculation by using Table II, while the last line is the calculation by using the R function pnorm. More examples are offered in the exercises.
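Both routes of this calculation, the Table II route via Φ and the direct route mirroring pnorm, can be checked with Python's statistics.NormalDist; a sketch:

```python
from statistics import NormalDist

# The Table II route (3.4.11) and the direct route (3.4.12) agree.
Phi = NormalDist().cdf
table_route = Phi(0.25) - 1 + Phi(0.75)  # Phi(0.25) - Phi(-0.75)
direct = NormalDist(70, 4).cdf(71) - NormalDist(70, 4).cdf(67)
assert abs(table_route - direct) < 1e-9
assert abs(direct - 0.372079) < 5e-7
```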
As a final note on Table II, it is generated by the R function:
