Chapter 2. Order Statistics
are called the order statistics. If F is continuous, then with probability 1 the order statistics
of the sample take distinct values (and conversely).
There is an alternative way to visualize order statistics that, although it does not
necessarily yield simple expressions for the joint density, does allow simple derivation of many
important properties of order statistics. It can be called the quantile function representation.
The quantile function (or inverse distribution function, if you wish) is defined by F^{-1}(t) = inf{x : F(x) ≥ t}, 0 < t < 1.
Now it is well known that if U is a Uniform(0,1) random variable, then F −1 (U ) has distri-
bution function F . Moreover, if we envision U1 , . . . , Un as being iid Uniform(0,1) random
variables and X1 , . . . , Xn as being iid random variables with common distribution F , then
(X_(1), . . . , X_(n)) =d (F^{-1}(U_(1)), . . . , F^{-1}(U_(n))),    (2)

where =d is to be read as “has the same distribution as.”
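As a quick numerical check of (2), here is a minimal simulation sketch (Python with numpy assumed; the exponential choice of F, the sample size, and the seed are arbitrary): it produces order statistics both by sorting draws from F and by applying F^{-1} to sorted uniforms.

import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000

def F_inv(u):
    # Quantile function of the standard exponential: F(x) = 1 - exp(-x).
    return -np.log1p(-u)

# Method 1: sort iid draws from F.
x = np.sort(rng.standard_exponential((reps, n)), axis=1)

# Method 2: apply F^{-1} to sorted uniforms, as in (2).
y = F_inv(np.sort(rng.uniform(size=(reps, n)), axis=1))

# Both order-statistic vectors have the same distribution; compare,
# for instance, the mean of each order statistic.
print(x.mean(axis=0))
print(y.mean(axis=0))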
Let F be a distribution function (continuous from the right, as usual). The proof that F is right-continuous can be obtained from the following fact:
F (x + hn ) − F (x) = P (x < X ≤ x + hn ),
where {h_n} is a sequence of real numbers such that 0 < h_n ↓ 0 as n → ∞. It follows from the continuity property of probability (P(lim_n A_n) = lim_n P(A_n) if lim A_n exists) that

lim_{n→∞} [F(x + h_n) − F(x)] = 0,
and hence that F is right-continuous. Let D be the set of all discontinuity points of F and
n be a positive integer. Set
D_n = {x ∈ D : P(X = x) ≥ 1/n}.
Then every distribution function F can be written as F(x) = αF_d(x) + (1 − α)F_c(x), 0 ≤ α ≤ 1, where F_d is a step function and F_c is continuous. Moreover, the above decomposition is unique.
Let λ denote the Lebesgue measure on B, the σ-field of Borel sets in R. It follows from
the Lebesgue decomposition theorem that we can write Fc (x) = βFs (x)+(1−β)Fac (x) where
0 ≤ β ≤ 1, Fs is singular with respect to λ, and Fac is absolutely continuous with respect to
λ. On the other hand, the Radon-Nikodym theorem implies that there exists a nonnegative
Borel-measurable function on R such that
F_ac(x) = ∫_{−∞}^{x} f dλ,
where f is called the Radon-Nikodym derivative. This says that every distribution function F admits a unique decomposition

F(x) = α_1 F_d(x) + α_2 F_s(x) + α_3 F_ac(x),

where α_i ≥ 0 and Σ_{i=1}^{3} α_i = 1.
For 0 < p < 1, the pth quantile or fractile of F is defined as ξ_p = F^{-1}(p) = inf{x : F(x) ≥ p}.
• If F has a discontinuity at x0 , suppose that F (x0 −) < y < F (x0 ) = F (x0 +). In this
case, although there exists no x for which y = F (x), F −1 (y) is defined to be equal to
x0 .
• Now consider the case that F is not strictly increasing. Suppose that
F(x) < y for x < a,  F(x) = y for a ≤ x ≤ b,  F(x) > y for x > b.
Then any value a ≤ x ≤ b could be chosen for x = F −1 (y). The convention in this case
is to define F −1 (y) = a.
Now we prove that if U is uniformly distributed over the interval (0, 1), then X = F_X^{-1}(U) has cumulative distribution function F_X(x). The proof is straightforward:

P(X ≤ x) = P[F_X^{-1}(U) ≤ x] = P[U ≤ F_X(x)] = F_X(x).
Note that discontinuities of F are converted into flat stretches of F^{-1}, and flat stretches of F into discontinuities of F^{-1}. In particular, ξ_{1/2} = F^{-1}(1/2) is called the median of F. Note that ξ_p satisfies

F(ξ_p −) ≤ p ≤ F(ξ_p).
The function F −1 (t), 0 < t < 1, is called the inverse function of F . The following
proposition, giving useful properties of F and F −1 , is easily checked.
Lemma 1 Let F be a distribution function. The function F^{-1}(t), 0 < t < 1, is nondecreasing and left-continuous, and satisfies: (i) F^{-1}(F(x)) ≤ x for −∞ < x < ∞; (ii) F(F^{-1}(t)) ≥ t for 0 < t < 1; and (iii) F^{-1}(t) ≤ x if and only if t ≤ F(x).
Here we consider statistics which may be expressed as functions of order statistics. A variety
of short-cut procedures for quick estimates of location or scale parameters, or for quick tests
of related hypotheses, are provided in the form of linear functions of order statistics, that is
statistics of the form
Σ_{i=1}^{n} c_{ni} X_{(i:n)}.
We term such statistics “L-estimates.” For example, the sample range X(n:n) − X(1:n) belongs
to this class. Another example is given by the α-trimmed mean.
(1/(n − 2[nα])) Σ_{i=[nα]+1}^{n−[nα]} X_{(i:n)},
which is a popular competitor of X̄ for robust estimation of location. The asymptotic dis-
tribution theory of L-statistics takes quite different forms, depending on the character of the
coefficients {cni }.
The representations of X̄ and ξˆpn in terms of order statistics are a bit artificial. On
the other hand, for many useful statistics, the most natural and efficient representations are
in terms of order statistics. Examples are the extreme values X1:n and Xn:n and the sample
range Xn:n − X1:n .
(2) The density of X_(k) is given by nC(n − 1, k − 1)[F(x)]^{k−1}[1 − F(x)]^{n−k} f(x).
(3) The joint density of X_(k1) and X_(k2), 1 ≤ k1 < k2 ≤ n, is

{n!/[(k1 − 1)!(k2 − k1 − 1)!(n − k2)!]} [F(x_(k1))]^{k1−1} [F(x_(k2)) − F(x_(k1))]^{k2−k1−1} [1 − F(x_(k2))]^{n−k2} f(x_(k1)) f(x_(k2)).
(4) The joint pdf of all the order statistics is n!f (z1 )f (z2 ) · · · f (zn ) for −∞ < z1 < · · · <
zn < ∞.
Proof. (1) The event {X(k) ≤ x} occurs if and only if at least k out of X1 , X2 , . . . , Xn are
less than or equal to x.
(2) The density of X_(k) is given by nC(n − 1, k − 1)[F(x)]^{k−1}[1 − F(x)]^{n−k} f(x). It can be shown by the fact that

(d/dp) Σ_{i=k}^{n} C(n, i) p^i (1 − p)^{n−i} = nC(n − 1, k − 1) p^{k−1} (1 − p)^{n−k}.

Heuristically, the k − 1 smallest observations are ≤ x, the n − k largest are > x, and the probability that X_(k) falls into a small interval of length dx about x is f(x) dx.
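The binomial characterization in (1) is easy to verify by simulation. A minimal Monte Carlo sketch (Python with numpy and scipy assumed; the exponential parent and the values of n, k, x0 are arbitrary choices):

import numpy as np
from scipy.stats import binom, expon

rng = np.random.default_rng(1)
n, k, x0, reps = 7, 3, 0.5, 200_000

# Empirical P(X_(k) <= x0) for standard exponential samples.
s = np.sort(rng.standard_exponential((reps, n)), axis=1)
emp = (s[:, k - 1] <= x0).mean()

# {X_(k) <= x0} iff at least k of the n observations are <= x0.
theo = 1 - binom.cdf(k - 1, n, expon.cdf(x0))
print(emp, theo)   # the two values should agree closely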
In the following two theorems, we relate the conditional distribution of order statistics (con-
ditioned on another order statistic) to the distribution of order statistics from a population
whose distribution is a truncated form of the original population distribution function F (x).
Proof. From the marginal density function of X(i:n) and the joint density function of
X(i:n) and X(j:n) , we have the conditional density function of X(j:n) , given that X(i:n) = xi ,
as
Here i < j ≤ n and xi ≤ xj < ∞. The result follows easily by realizing that {F (xj ) −
F (xi )}/{1 − F (xi )} and f (xj )/{1 − F (xi )} are the cdf and density function of the population
whose distribution is obtained by truncating the distribution F (x) on the left at xi .
Proof. From the marginal density function of X(i:n) and the joint density function of X(i:n)
and X(j:n) , we have the conditional density function of X(i:n) , given that X(j:n) = xj , as
Here 1 ≤ i < j ≤ n and −∞ < x_i ≤ x_j. The proof is completed by noting that F(x_i)/F(x_j) and f(x_i)/F(x_j) are the cdf and density function of the population whose distribution is obtained by truncating the distribution F(x) on the right at x_j.
In this section, we will discuss some methods of simulating order statistics from a distribution
F (x). First of all, it should be mentioned that a straightforward way of simulating order
statistics is to generate a pseudorandom sample from the distribution F (x) and then sort
the sample through an efficient algorithm like quick-sort. This general method (being time-
consuming and expensive) may be avoided in many instances by making use of some of the
distributional properties to be established now.
For example, suppose that we wish to generate the complete sample (x_(1), . . . , x_(n)) or even a Type II censored sample (x_(1), . . . , x_(r)) from the standard exponential distribution. This may be
done simply by generating a pseudorandom sample y1 , . . . , yr from the standard exponential
distribution first, and then setting
x_(i) = Σ_{j=1}^{i} y_j/(n − j + 1),  i = 1, 2, . . . , r.
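A minimal sketch of this recipe (Python with numpy assumed; n, r, and the seed are arbitrary choices):

import numpy as np

rng = np.random.default_rng(2)
n, r = 10, 4   # sample size n, number of observed order statistics r

# x_(i) = sum_{j<=i} y_j/(n - j + 1), with y_1, ..., y_r iid standard
# exponential, yields the Type II censored sample directly (no sorting).
y = rng.standard_exponential(r)
x = np.cumsum(y / (n - np.arange(1, r + 1) + 1))
print(x)       # x_(1) <= x_(2) <= ... <= x_(r)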
Theorem 4 Let X(1) ≤ X(2) ≤ · · · ≤ X(n) be the order statistics from the standard exponen-
tial distribution. Then, the random variables Z1 , Z2 , . . . , Zn , where
Zi = (n − i + 1)(X(i) − X(i−1) ),
with X(0) ≡ 0, are statistically independent and also have standard exponential distributions.
Proof. Note that the joint density function of X_(1), X_(2), . . . , X_(n) is

f_{1,2,...,n:n}(x_1, x_2, . . . , x_n) = n! exp(−Σ_{i=1}^{n} x_i),  0 ≤ x_1 < x_2 < · · · < x_n < ∞.

The transformation z_i = (n − i + 1)(x_i − x_{i−1}), i = 1, . . . , n, has Jacobian ∂(z_1, . . . , z_n)/∂(x_1, . . . , x_n) = n! and satisfies Σ_{i=1}^{n} x_i = Σ_{i=1}^{n} z_i, so the joint density of Z_1, . . . , Z_n is exp(−Σ_{i=1}^{n} z_i) on (0, ∞)^n, which factors into n standard exponential densities.
Theorem 5 For the Uniform(0, 1) distribution, the random variables V1 = U(i) /U(j) and
V2 = U(j) , 1 ≤ i < j ≤ n, are statistically independent, with V1 and V2 having Beta(i, j − i)
and Beta(j, n − j + 1) distributions, respectively.
Proof. From Theorem 1(3), we have the joint density function of U(i) and U(j) (1 ≤ i < j ≤
n) to be
f_{i,j:n}(u_i, u_j) = {n!/[(i − 1)!(j − i − 1)!(n − j)!]} u_i^{i−1} (u_j − u_i)^{j−i−1} (1 − u_j)^{n−j},  0 < u_i < u_j < 1.
Now upon making the transformation V_1 = U_(i)/U_(j) and V_2 = U_(j) and noting that the Jacobian of this transformation is v_2, we derive the joint density function of V_1 and V_2 to be

f_{V_1,V_2}(v_1, v_2) = {(j − 1)!/[(i − 1)!(j − i − 1)!]} v_1^{i−1} (1 − v_1)^{j−i−1} × {n!/[(j − 1)!(n − j)!]} v_2^{j−1} (1 − v_2)^{n−j},
0 < v1 < 1, 0 < v2 < 1. From the above equation it is clear that the random variables V1
and V2 are statistically independent, and also that they are distributed as Beta(i, j − i) and
Beta(j, n − j + 1), respectively.
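Theorem 5 can also be checked by simulation. The following sketch (Python with numpy and scipy assumed; n, i, j arbitrary) compares empirical moments of V_1 and V_2 with the stated Beta distributions and checks that their sample correlation is near zero:

import numpy as np
from scipy.stats import beta

rng = np.random.default_rng(3)
n, i, j, reps = 8, 2, 5, 200_000

u = np.sort(rng.uniform(size=(reps, n)), axis=1)
v1 = u[:, i - 1] / u[:, j - 1]   # V1 = U_(i)/U_(j)
v2 = u[:, j - 1]                 # V2 = U_(j)

# Compare empirical means with Beta(i, j-i) and Beta(j, n-j+1).
print(v1.mean(), beta.mean(i, j - i))
print(v2.mean(), beta.mean(j, n - j + 1))
# Independence: the sample correlation should be near 0.
print(np.corrcoef(v1, v2)[0, 1])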
Proof. Let X(1) < X(2) < · · · < X(n) denote the order statistics from the standard expo-
nential distribution. Then upon making use of the facts that X = − log U has a standard
exponential distribution and that − log u is a monotonically decreasing function in u, we
immediately have X_(i) =d −log U_(n−i+1). The above equation yields

V_i^* = (U_(i)/U_(i+1))^i =d (e^{−X_(n−i+1)}/e^{−X_(n−i)})^i = exp[−i(X_(n−i+1) − X_(n−i))] =d exp(−Y_{n−i+1})

upon using the above theorem, where the Y_i are independent standard exponential random variables.
The just-described methods of simulating uniform order statistics may also be used
easily to generate order statistics from any known distribution F (x) for which F −1 (·) is
relatively easy to compute. We may simply obtain the order statistics x(1) , . . . , x(n) from the
required distribution F (·) by setting x(i) = F −1 (u(i) ).
Consider the sample pth quantile, ξˆpn , which is X([np]) or X([np]+1) depending on whether
np is an integer (here [np] denotes the integer part of np). For simplicity, we discuss the
properties of X([np]) where p ∈ (0, 1) and n is large. This will in turn inform us of the
properties of ξˆpn .
We first consider the case that X is uniformly distributed over [0, 1]. Let U([np]) denote
the sample pth quantile. If i = [np], we have
nC(n − 1, i − 1) = n!/[(i − 1)!(n − i)!] = Γ(n + 1)/[Γ(i)Γ(n − i + 1)] = 1/B(i, n − i + 1),

so that U_([np]) has the Beta(i, n − i + 1) distribution. Consequently,

E U_([np]) = [np]/(n + 1) → p

and, for p_1 ≤ p_2,

n Cov(U_([np_1]), U_([np_2])) = n [np_1](n + 1 − [np_2])/[(n + 1)^2 (n + 2)] → p_1(1 − p_2).
Using these facts and Chebyshev's inequality, we can show easily that U_([np]) →P p with rate n^{−1/2}. This raises the question of whether we can claim that

ξ̂_pn →P ξ_p.
Recall (Theorem 14 below) that U_([np]) =d S_{i_0}/S_{n+1}, where i_0 = [np], S_k = V_1 + · · · + V_k, and V_1, V_2, . . . are iid standard exponential random variables. By the central limit theorem,

(Σ_{i=1}^{i_0} V_i − i_0)/√i_0 →d N(0, 1)

and

(Σ_{i=i_0+1}^{n+1} V_i − (n − i_0 + 1))/√(n − i_0 + 1) →d N(0, 1).

Consequently,

(Σ_{i=1}^{i_0} V_i − i_0)/√n →d N(0, p)

and

(Σ_{i=i_0+1}^{n+1} V_i − (n − i_0 + 1))/√n →d N(0, 1 − p).

Since Σ_{i=1}^{i_0} V_i and Σ_{i=i_0+1}^{n+1} V_i are independent for all n,

(1 − p)(Σ_{i=1}^{[np]} V_i − [np])/√n − p(Σ_{i=[np]+1}^{n+1} V_i − (n − [np] + 1))/√n →d N(0, p(1 − p)),

and hence

√n (U_([np]) − p) →d N(0, p(1 − p)).
Next, write

X_([np]) =d F^{-1}(p) + (U_([np]) − p){f(F^{-1}(D_n))}^{-1},

where the random variable D_n is between U_([np]) and p. This can be rearranged as

√n (X_([np]) − F^{-1}(p)) =d √n (U_([np]) − p){f(F^{-1}(D_n))}^{-1}.

When f is continuous at F^{-1}(p), it follows that as n → ∞, f(F^{-1}(D_n)) →P f(F^{-1}(p)). Applying the delta method then yields

√n (X_([np]) − F^{-1}(p)) →d N(0, p(1 − p)/[f(F^{-1}(p))]^2).
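The limiting normal law for the sample quantile is easy to verify numerically. A minimal sketch (Python with numpy and scipy assumed; the standard normal parent, p, n, and the number of replications are arbitrary choices):

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, p, reps = 1000, 0.25, 5000
k = int(np.floor(n * p))            # [np]

# Sample p-th quantiles X_([np]) from standard normal samples.
x = np.sort(rng.standard_normal((reps, n)), axis=1)[:, k - 1]

xi_p = norm.ppf(p)                  # F^{-1}(p)
z = np.sqrt(n) * (x - xi_p)
print(z.var(), p * (1 - p) / norm.pdf(xi_p) ** 2)   # should be close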
We shall use the following result of Hoeffding (1963) to show that P(|ξ̂_pn − ξ_p| > ε) → 0 exponentially fast.
Here

|Error| ≤ (c/√n) E|Y_1 − EY_1|^3/[Var(Y_1)]^{3/2}.
Theorem 7 Let 0 < p < 1. Suppose that ξ_p is the unique solution x of F(x−) ≤ p ≤ F(x). Then, for every ε > 0,

P(|ξ̂_pn − ξ_p| > ε) ≤ 2 exp(−2nδ_ε^2) for all n,

where δ_ε = min{F(ξ_p + ε) − p, p − F(ξ_p − ε)} > 0.
By Lemma ??,
P(ξ̂_pn > ξ_p + ε) = P(p > F_n(ξ_p + ε))
= P(Σ_{i=1}^{n} I(X_i > ξ_p + ε) > n(1 − p))
= P(Σ_{i=1}^{n} V_i − Σ_{i=1}^{n} E(V_i) > nδ_1),

where V_i = I(X_i > ξ_p + ε) and δ_1 = F(ξ_p + ε) − p. Similarly, P(ξ̂_pn < ξ_p − ε) is a tail probability for Σ_{i=1}^{n} W_i, where W_i = I(X_i < ξ_p − ε) and δ_2 = p − F(ξ_p − ε). Therefore, utilizing Hoeffding's lemma, we have

P(ξ̂_pn > ξ_p + ε) ≤ exp(−2nδ_1^2)

and

P(ξ̂_pn < ξ_p − ε) ≤ exp(−2nδ_2^2).
√n (ξ̂_pn − ξ_p) →d N(0, p(1 − p)/[f(ξ_p)]^2).
Proof. We only consider p = 1/2. Note that ξp is the unique median since f (ξp ) > 0.
First, we consider that n is odd (i.e., n = 2m − 1).
P(√n (X_(m) − F^{-1}(1/2)) ≤ t) = P(√n X_(m) ≤ t)  (taking F^{-1}(1/2) = 0 without loss of generality)
= P(X_(m) ≤ t/√n).
Let S_n be the number of X's that exceed t/√n. Then

X_(m) ≤ t/√n if and only if S_n ≤ m − 1 = (n − 1)/2.
Hence, with p_n = P(X_1 > t/√n) = 1 − F(n^{−1/2} t),

P(√n (X_(m) − F^{-1}(1/2)) ≤ t) = P(S_n ≤ (n − 1)/2)
= P[(S_n − np_n)/√(np_n(1 − p_n)) ≤ ((n − 1)/2 − np_n)/√(np_n(1 − p_n))],

and the standardized S_n is asymptotically standard normal as n → ∞. Writing

((n − 1)/2 − np_n)/√(np_n(1 − p_n)) ≈ [√n (1/2 − p_n) − 1/(2√n)]/(1/2) ≈ 2t [F(n^{−1/2} t) − F(0)]/(n^{−1/2} t) → 2tf(0).
Thus,

Φ[((n − 1)/2 − np_n)/√(np_n(1 − p_n))] ≈ Φ(2f(0) · t),

or

√n (X_(m) − F^{-1}(1/2)) →d N(0, 1/[4f^2(F^{-1}(1/2))]).
When n is even (i.e., n = 2m), both P[√n (X_(m) − F^{-1}(1/2)) ≤ t] and P[√n (X_(m+1) − F^{-1}(1/2)) ≤ t] tend to Φ(2f(F^{-1}(1/2)) · t).
We have just proved the asymptotic normality of ξ̂_p in the case that F possesses a derivative at the point ξ_p. So far we have discussed in detail the asymptotic normality of a single quantile.
This discussion extends in a natural manner to the asymptotic joint normality of a fixed
number of quantiles. This is made precise in the following result.
Theorem 9 Let 0 < p1 < · · · < pk < 1. Suppose that F has a density f in neighborhoods
of ξp1 , . . . , ξpk and that f is positive and continuous at ξp1 , . . . , ξpk . Then (ξˆp1 , . . . , ξˆpk ) is
asymptotically normal with mean vector (ξp1 , . . . , ξpk ) and covariance σij /n, where
σ_ij = p_i(1 − p_j)/[f(ξ_{p_i}) f(ξ_{p_j})]  for 1 ≤ i ≤ j ≤ k.
Suppose that we have a sample of size n from a normal distribution N (µ, σ 2 ). Let mn
represent the median of this sample. Then because f(µ) = (√(2π) σ)^{-1},

√n (m_n − µ) →d N(0, (1/4)/f^2(µ)) = N(0, πσ^2/2).
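A small simulation makes the variance comparison concrete. In the sketch below (Python with numpy assumed; n odd and all parameters arbitrary), n times the variance should be near 1 for the mean and near π/2 ≈ 1.571 for the median:

import numpy as np

rng = np.random.default_rng(5)
n, reps = 201, 20_000   # odd n so the median is X_(m), m = (n+1)/2

x = rng.standard_normal((reps, n))   # mu = 0, sigma = 1
mean_est = x.mean(axis=1)
med_est = np.median(x, axis=1)

# n * variance: about 1 for the mean, about pi/2 for the median.
print(n * mean_est.var(), n * med_est.var(), np.pi / 2)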
Compare m_n with X̄_n as an estimator of µ. We conclude immediately that X̄_n is better than m_n, since the latter has a much larger variance (πσ^2/2 versus σ^2). Now consider the above problem again with the Cauchy distribution C(µ, σ) with density function

f(x) = (1/(πσ)) · 1/{1 + [(x − µ)/σ]^2}.

Here X̄_n is not even consistent for µ, whereas the sample median, with f(µ) = (πσ)^{-1}, satisfies √n (m_n − µ) →d N(0, π^2 σ^2/4).
The joint normality of a fixed number of central order statistics can be used to construct
simultaneous confidence regions for two or more population quantiles. As an illustration, we
now consider the semi-interquantile range, R = 12 (ξ3/4 − ξ1/4 ). (Note that the parameter σ
in C(µ, σ) is the semi-interquantile range.) It is used as an alternative to σ to measure the
spread of the data. A natural estimate of R is R̂_n = (1/2)(ξ̂_{3/4} − ξ̂_{1/4}). Theorem 9 gives the joint distribution of ξ̂_{1/4} and ξ̂_{3/4}. We can use the following result, due to Cramer and Wold (1936),
which reduces the convergence of multivariate distribution functions to the convergence of
univariate distribution functions.
= Σ_{r=i}^{j−1} C(n, r) p^r (1 − p)^{n−r},    (4)
where the last equality follows from (??). Thus, we have a confidence interval [X(i:n) , X(j:n) ]
for F^{-1}(p) whose confidence coefficient α(i, j), given by (??), is free of F and can be read from the table of binomial probabilities. If p and the desired confidence level α_0 are specified, we choose i and j so that α(i, j) exceeds α_0. Because α(i, j) is a step function, usually the interval we obtain tends to be conservative. Further, the choice of i and j is not unique, and the choice that makes (j − i) small appears reasonable. For a given n and p,
the binomial pmf C(n, r)pr (1 − p)n−r increases as r increases up to around [np], and then
decreases. So if we want to make (j − i) small, we have to start with i and j close to [np]
and gradually increase (j − i) until α(i, j) exceeds α0 .
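This search is easy to automate. A minimal sketch (Python with scipy assumed; expanding i and j symmetrically around [np] is one simple choice, not the only one):

import numpy as np
from scipy.stats import binom

def quantile_ci(n, p, level):
    # Find (i, j) with alpha(i, j) = P(i <= Bin(n, p) <= j - 1) >= level,
    # as in (4); the interval is then [X_(i:n), X_(j:n)].
    i = j = int(np.floor(n * p))
    j += 1
    while binom.cdf(j - 1, n, p) - binom.cdf(i - 1, n, p) < level:
        if i == 1 and j == n:
            raise ValueError("requested level not attainable")
        i = max(i - 1, 1)
        j = min(j + 1, n)
    return i, j

print(quantile_ci(50, 0.5, 0.95))   # order statistics bracketing the median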
Wilk and Gnanadesikan (1968) proposed a graphical, rather informal, method of testing the
goodness-of-fit of a hypothesized distribution to given data. It essentially plots the quantile
function of one cdf against that of another cdf. When the latter cdf is the empirical cdf
defined below, order statistics come into the picture. The empirical cdf, to be denoted by
Fn (x) for all real x, represents the proportion of sample values that do not exceed x. It has
jumps of magnitude 1/n at X(i:n) , 1 ≤ i ≤ n. Thus, the order statistics represent the values
taken by Fn−1 (p), the sample quantile function.
The Q-Q plot is the graphical representation of the points (F^{-1}(p_i), X_{(i:n)}), with, e.g., p_i = i/(n + 1), where population quantiles are recorded along the horizontal axis and the sample quantiles on the
vertical axis. If the sample is in fact from F , we expect the Q-Q plot to be close to a straight
line. If not, the plot may show nonlinearity at the upper or lower ends, which may be an
indication of the presence of outliers. If the nonlinearity shows up at other points as well,
one could question the validity of the assumption that the parent cdf is F .
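A minimal Q-Q plot sketch (Python with numpy, scipy, and matplotlib assumed; the standard normal parent and n = 100 are arbitrary choices):

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

rng = np.random.default_rng(6)
n = 100
x = np.sort(rng.standard_normal(n))   # sample quantiles X_(i:n)

p = np.arange(1, n + 1) / (n + 1)     # plotting positions p_i = i/(n+1)
q = norm.ppf(p)                       # population quantiles F^{-1}(p_i)

plt.scatter(q, x, s=8)                # should hug the line y = x under F
plt.plot(q, q)
plt.xlabel("population quantiles")
plt.ylabel("sample quantiles")
plt.show()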
random distribution function and thus may be treated as a particular stochastic process (a random element of a suitable space).
Note that the exact distribution of nF_n(x) is simply Binomial(n, F(x)). It follows immediately that E[F_n(x)] = F(x) and Var(F_n(x)) = F(x)[1 − F(x)]/n.
We now describe the Glivenko-Cantelli Theorem which ensures that the ecdf converges uni-
formly almost surely to the true distribution function.
Theorem 12

P{sup_x |F_n(x) − F(x)| → 0} = 1.
for j = 1, . . . , k − 1. Hence,

Δ_n = max{|F_n(x_j) − F(x_j)|, |F_n(x_j^−) − F(x_j^−)| : j = 1, . . . , k − 1} →a.s. 0,

and

F_n(x) − F(x) ≥ F_n(x_{j−1}) − F(x_j^−) ≥ F_n(x_{j−1}) − F(x_{j−1}) − ε.
A related problem is to express confidence bands for F(x), −∞ < x < ∞. Thus, for selected functions a(x) and b(x), it is of interest to compute probabilities of the form

P(F_n(x) − a(x) ≤ F(x) ≤ F_n(x) + b(x), for all x).

The general problem is quite difficult. However, in the simplest case, namely a(x) = b(x) = d, the problem reduces to computation of P(D_n < d).
For the case of one-dimensional F, an exponential-type probability inequality for D_n was established by Dvoretzky, Kiefer, and Wolfowitz (1956).
Theorem 13 The distribution of D_n under H_0 is the same for all continuous distributions.
Proof. For simplicity we give the proof for F_0 strictly increasing. Then F_0 has inverse F_0^{-1}, and as u ranges over (0, 1), F_0^{-1}(u) ranges over all the possible values of X. Thus

D_n = sup_{0<u<1} |F_n(F_0^{-1}(u)) − F_0(F_0^{-1}(u))| = sup_{0<u<1} |F_n(F_0^{-1}(u)) − u|.

Let U_i = F_0(X_i). Then U_1, . . . , U_n are a sample from Uniform(0, 1), since P(U_i ≤ u) = P(X_i ≤ F_0^{-1}(u)) = F_0(F_0^{-1}(u)) = u for 0 < u < 1. Thus,
D_n = sup_{0<u<1} |F_n^*(u) − u|,
where Fn∗ (u) is the empirical distribution of the uniform sample U1 , . . . , Un and the distribu-
tion of Dn does not depend on F0 .
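The distribution-free property can be seen numerically: simulating D_n under two different continuous choices of F_0 gives (up to Monte Carlo error) the same distribution. A minimal sketch (Python with numpy and scipy assumed):

import numpy as np
from scipy.stats import expon, norm

rng = np.random.default_rng(7)
n, reps = 50, 5000
k = np.arange(1, n + 1)

def D_n(u):
    # sup_u |F_n*(u) - u| for a sample u from (0, 1).
    u = np.sort(u)
    return np.maximum(k / n - u, u - (k - 1) / n).max()

# U_i = F0(X_i): simulate D_n under two different continuous F0.
d_exp = [D_n(expon.cdf(rng.standard_exponential(n))) for _ in range(reps)]
d_norm = [D_n(norm.cdf(rng.standard_normal(n))) for _ in range(reps)]
print(np.mean(d_exp), np.mean(d_norm))   # nearly identical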
We now give an important fact which is used in Donsker (1952) to give a rigorous proof
of the Kolmogorov-Smirnov Theorems.
Theorem 14 The distribution of the order statistic (Y(1) , . . . , Y(n) ) of n iid random variables
Y1 , Y2 , . . . from the uniform distribution on [0, 1] can also be obtained as the distribution of
the ratios
S_1/S_{n+1}, S_2/S_{n+1}, · · · , S_n/S_{n+1},
where Sk = T1 + · · · + Tk , k ≥ 1, and T1 , T2 , . . . is an iid sequence of (mean 1) exponentially
distributed random variables.
Intuitively, if the T_i are regarded as the successive times between occurrences of some phenomenon, then S_{n+1} is the time to the (n + 1)st occurrence and, in units of S_{n+1}, the occurrence times should be randomly distributed because of the lack-of-memory and independence properties.
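Theorem 14 also gives a convenient way to simulate uniform order statistics without sorting. A minimal sketch (Python with numpy assumed) compares E U_(k) = k/(n + 1) under both constructions:

import numpy as np

rng = np.random.default_rng(8)
n, reps = 5, 100_000

# Ratios S_k/S_{n+1} of exponential partial sums (Theorem 14)...
t = rng.standard_exponential((reps, n + 1))
s = np.cumsum(t, axis=1)
ratios = s[:, :n] / s[:, n:]

# ...versus sorted uniforms; E U_(k) = k/(n+1) in both cases.
u = np.sort(rng.uniform(size=(reps, n)), axis=1)
print(ratios.mean(axis=0))
print(u.mean(axis=0))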
Recall that Dn = sup0<u<1 |Fn∗ (u) − u| where Fn∗ (u) is the empirical distribution of
the uniform sample U1 , . . . , Un . We then have
√n D_n = √n sup_{0<u<1} |F_n^*(u) − u| = √n max_{k≤n} |Y_(k) − k/n|
=d √n max_{k≤n} |S_k/S_{n+1} − k/n| = (n/S_{n+1}) max_{k≤n} |(S_k − k)/√n − (k/n) · (S_{n+1} − n)/√n|.
Theorem 15 Let F be defined on R. There exists a finite positive constant C (not depending on F) such that

P(D_n > d) ≤ C exp(−2nd^2),  d > 0,

for all n = 1, 2, . . ..
Secondly, we consider goodness-of-fit test statistics based on the sample distribution function. The null hypothesis in the simple case is H_0 : F = F_0, where F_0 is specified. A useful procedure is based on the statistic

sup_{−∞<x<∞} |F_n(x) − F_0(x)|,

which reduces to D_n under the null hypothesis (the Kolmogorov-Smirnov test statistic). More broadly, a class of such statistics is obtained by introducing weight functions:

sup_{−∞<x<∞} w(F_0(x)) |F_n(x) − F_0(x)|.
Another important class of statistics is based on the Cramer-von Mises test statistic

C_n = n ∫_{−∞}^{∞} [F_n(x) − F_0(x)]^2 dF_0(x),

which takes the general form n ∫ w(F_0(x))[F_n(x) − F_0(x)]^2 dF_0(x). For example, for w(t) = [t(1 − t)]^{-1}, each discrepancy F_n(x) − F_0(x) becomes weighted by the reciprocal of its standard deviation (under H_0), yielding the Anderson-Darling statistic.
A set of points a = x_0 < x_1 < · · · < x_n = b is called a partition of [a, b]. Write Δf_k = f(x_k) − f(x_{k−1}) for k = 1, 2, . . . , n. If there exists a positive number M such that

Σ_{k=1}^{n} |Δf_k| ≤ M
for all partitions of [a, b], then f is said to be of bounded variation on [a, b]. Let F (x) be a
function of bounded variation and continuous from the left such as a distribution function.
Given a finite interval (a, b) and a function f (x) we can form the sum
J_n = Σ_{i=1}^{n} f(x_i′)[F(x_i) − F(x_{i−1})]

for a division of (a, b) by points x_i such that a < x_1 < · · · < x_n < b and arbitrary x_i′ ∈ (x_{i−1}, x_i).
the length of the interval (xi − xi−1 ) instead of F (xi ) − F (xi−1 ). If J = limn→∞ Jn as the
length of each interval → 0, then J is called the Stieltjes integral of f (x) with respect to F (x)
and is denoted by

J = ∫_a^b f(x) dF(x).
The improper integral is defined by
lim_{a→−∞, b→∞} ∫_a^b f(x) dF(x) = ∫_{−∞}^{∞} f(x) dF(x).
so that the integral taken over an interval that reduces to zero need not be zero. We shall
follow the convention that the lower end point is always included but not the upper end point.
With this convention, we see that ∫_a^b dF(x) = F(b) − F(a). If there exists a function p(x) such that F(x) = ∫_{−∞}^{x} p(t) dt, the Stieltjes integral reduces to a Riemann integral:

∫ f(x) dF(x) = ∫ f(x) p(x) dx.
Theorem 17 Let α be a step function defined on [a, b] with jumps α_k at x_k. Then

∫_a^b f(x) dα(x) = Σ_k f(x_k) α_k.
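Theorem 17 says that integration against a step function is just a jump-weighted sum; in particular, the Stieltjes integral of f with respect to a discrete cdf is an expectation. A minimal sketch (Python; the three-point distribution is an arbitrary example):

# Integrate f with respect to a step function alpha (Theorem 17): the
# Stieltjes integral reduces to the jump-weighted sum sum_k f(x_k)*alpha_k.
def stieltjes_step(f, jump_points, jump_sizes):
    return sum(f(x) * a for x, a in zip(jump_points, jump_sizes))

# Example: E[X^2] for X uniform on {1, 2, 3} (a cdf with jumps of 1/3 each).
print(stieltjes_step(lambda x: x ** 2, [1, 2, 3], [1 / 3, 1 / 3, 1 / 3]))
# prints 4.666..., i.e., (1 + 4 + 9)/3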
Consider the kernel density estimate f_n(x_0) = (nb_n)^{-1} Σ_{i=1}^{n} W((x_0 − X_i)/b_n), where the kernel W is a density and b_n → 0. Then

E f_n(x_0) = (1/b_n) E W((x_0 − X)/b_n)
= ∫ (1/b_n) W((x_0 − x)/b_n) f(x) dx = ∫ W(t) f(x_0 − b_n t) dt
= ∫ W(t)[f(x_0) − b_n t f′(x_0 − θ_t b_n t)] dt
= f(x_0) − b_n ∫ t W(t) f′(x_0 − θ_t b_n t) dt.

When ∫ t W(t) dt ≠ 0, we have

E f_n(x_0) = f(x_0) − b_n f′(x_0) ∫ t W(t) dt + o(b_n).

When ∫ t W(t) dt = 0 and ∫ t^2 W(t) dt ≠ 0, we have

E f_n(x_0) = ∫ W(t)[f(x_0) − b_n t f′(x_0) + (b_n^2 t^2/2) f″(x_0 − θ_t b_n t)] dt
= f(x_0) + (b_n^2/2) f″(x_0) ∫ t^2 W(t) dt + o(b_n^2).

Therefore, b_n = O(n^{−1/3}) when ∫ t W(t) dt ≠ 0 (assuming f′ exists), and b_n = O(n^{−1/5}) when ∫ t W(t) dt = 0 and ∫ t^2 W(t) dt ≠ 0 (assuming f″ exists).
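The b_n^2 bias rate for a symmetric kernel can be seen numerically. A rough sketch (Python with numpy assumed; the uniform kernel W = (1/2)1[−1,1], the standard normal f, x_0 = 0, and the bandwidths are arbitrary choices):

import numpy as np

rng = np.random.default_rng(9)
x0, n, reps = 0.0, 2000, 1000

def kde_at(x, data, b):
    # (n b)^{-1} sum W((x - X_i)/b) with the uniform kernel W = (1/2)1[-1,1],
    # for which int t W(t) dt = 0 and int t^2 W(t) dt = 1/3.
    return (np.abs((x - data) / b) <= 1).mean() / (2 * b)

true_f = 1 / np.sqrt(2 * np.pi)   # standard normal density at 0
for b in [0.8, 0.4, 0.2]:
    est = np.mean([kde_at(x0, rng.standard_normal(n), b) for _ in range(reps)])
    print(b, est - true_f)        # bias shrinks roughly by 4 per halving of b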
4. Waiting for the Big One. Disastrous floods and destructive earthquakes re-
cur throughout history. Dam construction has long focused on so called 100-year
flood. Presumably the dams are built big enough and strong enough to handle
any water flow to be encountered except for a level expected to occur only once
every 100 years. Architects in California are particularly concerned with construc-
tion designed to withstand “the big one,” presumably an earthquake of enormous
strength, perhaps a “100-year quake.” Whether one agrees or not with the 100-year disaster philosophy, it is obvious that designers of dams and skyscrapers, and even doghouses, should be concerned with the distribution of large order statistics from a possibly dependent, possibly not identically distributed sequence.
After the disastrous flood of February 1st, 1953, in which the sea-dikes broke in
several parts of the Netherlands and nearly two thousand people were killed, the
Dutch government appointed a committee (the so-called Delta-committee) to recommend an appropriate level for the dikes (called the Delta-level since), since no specific statistical study had been done to fix a safer level for the sea-dikes before 1953.
The Dutch government set as the standard for the sea-dikes that at any time in a
given year the sea level exceeds the level of the dikes with probability 1/10, 000. A
statistical group from the Mathematical Centre in Amsterdam headed by D. van
Dantzig showed that high tides occurring during certain dangerous windstorms (to
ensure independence) within the dangerous winter months December, January, and February (for homogeneity) follow closely an exponential distribution if the smaller high tides are neglected.
If we model the annual maximum flood by a random variable Z with distribution function F, the Dutch government therefore wanted to determine the (1 − q)-quantile

F^{-1}(1 − q) = inf{t ∈ R : F(t) ≥ 1 − q}

with q = 1/10,000.
5. Strength of Materials. The adage that a chain is no stronger than its weakest link underlies much of the theory of strength of materials, whether they are threads, sheets, or blocks. By considering failure potential in infinitesimally small sections of the material, one quickly is led to strength distributions associated with limits of distributions of sample minima. Of course, if we stick to the finite chain with n links, its strength would be the minimum of the strengths of its n component links, again an order statistic.
7. Quality Control. Take a comfortable chair and watch the daily production of
Snickers candy bars pass by on the conveyor belt. Each candy bar should weigh
2.1 ounces; just a smidgen over the weight stated on the wrapper. No matter
how well the candy pouring machine was just adjusted at the beginning of the
shift, minor fluctuations will occur, and potentially major aberrations might be
encountered (if a peanut gets stuck in the control valve). We must be alert for
correctable malfunctions causing unreasonable variation in the candy bar weight.
Enter the quality control man with his X̄ and R charts or his median and R
charts. A sample of candy bars is weighed every hour, and close attention is paid
to the order statistics of the weights so obtained. If the median (or perhaps the
mean) is far from the target value, we must shut down the line. Either we are
turning out skinny bars and will hear from disgruntled six-year-olds, or we are
turning out overweight bars and wasting money (so we will hear from disgruntled
management). Attention is also focused on the sample range, the largest minus
the smallest weight. If it is too large, the process is out of control, and the widely
fluctuating candy bar weights will probably cause problems further down the line.
So again we stop and seek to identify a correctable cause before restarting the
Snickers line.
8. Selecting the Best. Field trials of corn varieties involve carefully balanced experiments to determine which of several varieties is most productive. Obviously we are concerned with the maximum of a set of probably not identically distributed
variables in such a setting. The situation is not unlike the one discussed earlier in
the context of identification of outliers. In the present situation, the outlier (the
best variety) is, however, good and merits retention (rather than being discarded
or discounted as would be the case in the usual outliers setting). Another instance
in biology in which order statistics play a clear role involves selective breeding by
culling. Here perhaps the best 10% with respect to meatiness of the animals in
each generation are raised for breeding purposes.
Geneticists and breeders measure the effectiveness of a selection program by com-
paring the average of the selected group with the population average. This differ-
ence, expressed in standard deviation units, is known as the selection differential.
Usually, the selected group consists of top or bottom order statistics. Without
loss of generality let us assume the top k order statistics are selected. Then the
selection differential is
D_{k,n}(µ, σ) = (1/σ){(1/k) Σ_{i=n−k+1}^{n} X_{i:n} − µ},
where µ and σ are the population mean and standard deviation, respectively.
Breeders quite often use E(D_{k,n}(µ, σ)) or D_{k,n}(µ, σ) as a measure of improvement due to selection. If k = n − [np], then except for a change of location and scale, D_{k,n}(µ, σ) is a trimmed mean with p_1 = p and p_2 = 1, where p_1 and 1 − p_2 represent the proportions of the sample trimmed at either end.
10. Olympic Records. Bob Beamon's 1968 long jump remains in the Olympic record book. Few other records last that long. If the best performances in each Olympic Games were modeled as independent identically distributed random variables, then records would become more and more scarce as time went by. Such is not the case. The simplest explanation involves improving and increasing populations. Thus the 1964 high-jumping champion was the best of, say, N_1 active international-caliber jumpers. In 1968 there were more high-caliber jumpers, probably of higher caliber. So we are looking, most likely, at a sequence of not identically distributed random variables. But in any case we are focusing on the maximum.
Now we discuss the possible nondegenerate limit distributions for X(n) . Denote the upper
limit of the support of F by F^{-1}(1). Suppose first that F^{-1}(1) < ∞. Observe that F_{X_(n)}(x) = [F(x)]^n; then X_(n) →P F^{-1}(1) can be established easily. Recall an elementary result in calculus:

(1 − c_n/n)^n → exp(−c) if and only if lim_n c_n = c,

where {c_n} is a sequence of real numbers. Consider the normalization [X_(n) − F^{-1}(1)]/b_n and n^{−1} c_n = 1 − F(F^{-1}(1) + b_n x). Depending on the tail of F, it is expected that different norming constants b_n and asymptotic distributions emerge. When F^{-1}(1) = ∞, it is not clear how to
find the limiting distribution of X(n) in general. In order to hope for a nondegenerate limit
distribution, we will have to appropriately normalize or standardize X(n) . In other words,
we look at the sequence {(X(n) − an )/bn , n ≥ 1} where an represents a shift in location and
bn > 0 represents a change in scale. The cdf of the normalized X(n) is F n (an + bn x). We will
now ask the following questions:
(i) Is it possible to find an and bn > 0 such that F n (an + bn x) → G(x) at all continuity
points of a nondegenerate cdf G?
In order to answer these questions precisely and facilitate the ensuing discussion, we introduce
two definitions
at all continuity points of G(x). If the above holds, we will write F ∈ D(G).
Definition Two cdfs F1 and F2 are said to be of the same type if there exist constants
a0 and b0 > 0 such that F1 (a0 + b0 x) = F2 (x).
If the random variable (X(n) − an )/bn has a limit distribution for some choice of
constants {an }, {bn }, then the limit distribution must be of the form G1 , G2 , or G3 , where
G_1(x; α) = 0 for x ≤ 0,  and exp(−x^{−α}) for x > 0,

G_2(x; α) = exp(−(−x)^α) for x < 0,  and 1 for x ≥ 0,

and

G_3(x) = exp(−e^{−x}),  −∞ < x < ∞.
(In G1 and G2 , α is a positive constant.) This result was established by Gnedenko (1943),
following less rigorous treatments by earlier authors.
Note that the above three families of distributions may be related to the exponential distribution as follows. If Y follows a standard exponential distribution, then G_1(x; α) is the distribution function of Y^{−1/α}, G_2(x; α) is the distribution function of −Y^{1/α}, and G_3(x) is the distribution function of −log Y.
As a motivating example, suppose that X_1, . . . , X_n are uniformly distributed over [0, 1]. Because the uniform distribution has an upper terminal 1, it follows easily that X_(n) →P 1. Now we would like to know how fast X_(n) tends to 1. Alternatively, we attempt to choose appropriate constants {b_n} such that Z_n = (1 − X_(n))/b_n will have a nondegenerate distribution. Note that

P(Z_n ≤ z) = 0 if z < 0,  and P(Z_n ≤ z) = 1 − (1 − b_n z)^n if 0 < z < b_n^{−1}.

Taking b_n = n^{−1} gives P(Z_n ≤ z) → 1 − e^{−z}. It follows that n(1 − X_(n)) has a limiting exponential distribution with unit mean.
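A quick numerical check (Python with numpy assumed; this uses the inverse-transform fact that X_(n) =d U^{1/n}, since F_{X_(n)}(x) = x^n):

import numpy as np

rng = np.random.default_rng(10)
n, reps = 10_000, 200_000

# F_{X_(n)}(x) = x^n, so X_(n) =d U^(1/n) by the inverse-transform method.
x_max = rng.uniform(size=reps) ** (1.0 / n)
z = n * (1 - x_max)
print(z.mean(), z.var())   # both close to 1, the Exp(1) moments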
We now consider the case that X has an upper terminal θ and 1 − F(x) ∼ a(θ − x)^c for some a, c > 0 as x → θ. Again, consider Z_n = (θ − X_(n))/b_n, where b_n is to be chosen. We have the cumulative distribution function

P(Z_n ≤ z) = 1 − [F(θ − b_n z)]^n ∼ 1 − [1 − a(b_n z)^c]^n

for z ≥ 0 and b_n z = O(1). We take a b_n^c = n^{−1} to show that (θ − X_(n))(na)^{1/c} has the limiting p.d.f. c z^{c−1} exp(−z^c).
Next, suppose that X does not have an upper terminal and that, as x → ∞, 1 − F(x) ∼ a x^{−c} for some a, c > 0. (Note that F is a Pareto(θ) cdf when 1 − F(x) = x^{−θ}, x ≥ 1, θ > 0, where θ is the shape parameter.) Consider Z_n = X_(n)/b_n; we have

P(Z_n ≤ z) = [1 − {1 − F(b_n z)}]^n ∼ {1 − a(b_n z)^{−c}}^n.

Taking b_n = (na)^{1/c}, so that a b_n^{−c} = n^{−1},

P(Z_n ≤ z) ∼ exp(−z^{−c}).
Finally, consider Z_n = (X_(n) − a_n)/b_n, so that

P(Z_n ≤ z) = [1 − {1 − F(a_n + b_n z)}]^n.    (5)

The crucial values of X_(n) are those close to F^{-1}(1 − 1/n), which we take to be a_n. Observe that

1 − F(a_n + b_n z) ≈ 1 − {F(a_n) + b_n z f(a_n)} = n^{−1}(1 − n b_n z f(a_n)).

Choosing b_n = [n f(a_n)]^{−1} makes the right-hand side n^{−1}(1 − z), the first-order expansion of n^{−1} e^{−z}; when the expansion is valid, n{1 − F(a_n + b_n z)} → e^{−z}, and hence, by (5),

P(Z_n ≤ z) ∼ exp(−e^{−z}).

For the Weibull-type tail 1 − F(x) = exp(−x^α), x > 0, this gives a_n = (log n)^{1/α} and b_n^{−1} = n f(a_n) = α(log n)^{(α−1)/α}. For the exponential distribution, α = 1 and hence a_n = log n and b_n = 1.
Example 3 (Standard Normal Distribution) Let F be the standard normal distribution. Find appropriate choices of a_n and b_n.
Sol. Using L'Hospital's rule, we obtain

1 − F(x) ≈ f(x)/x

when x is large. Setting F(a_n) = 1 − n^{−1} leads to a_n = √(2 log n) − (1/2) log(4π log n)/√(2 log n). Hence, b_n^{−1} = n f(a_n) ≈ √(2 log n), i.e., b_n = 1/√(2 log n). Refer to p. 99 of Ferguson (1996) for the details of the calculation.
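A simulation sketch of this example (Python with numpy assumed; note that the convergence of normal maxima to the Gumbel law G_3 is notoriously slow, so the agreement is only rough):

import numpy as np

rng = np.random.default_rng(11)
n, reps = 10_000, 5000

log_n = np.log(n)
an = np.sqrt(2 * log_n) - 0.5 * np.log(4 * np.pi * log_n) / np.sqrt(2 * log_n)
bn = 1 / np.sqrt(2 * log_n)

# Normalized maxima (X_(n) - a_n)/b_n; compare with Gumbel moments:
# mean = Euler's constant 0.5772..., variance = pi^2/6 = 1.6449...
maxima = np.array([rng.standard_normal(n).max() for _ in range(reps)])
z = (maxima - an) / bn
print(z.mean(), z.var())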
As we know, the Cauchy distribution is a special case of the t-distribution. The t_v distribution has density

f(x) = c/(v + x^2)^{(v+1)/2} ≈ c x^{−(v+1)}  for large |x|.

Can you figure out a_n and b_n? Refer to p. 95 of Ferguson (1996) for the details of the calculation.
We discussed in the previous section the possible limit distributions for X_(i:n), which depend on how i is related to n. If L_n is a function of a finite number of central order statistics, the limit distribution is normal, under mild conditions on F.
Even when the ain are nonzero for many i’s, Ln turns out to be asymptotically normal
when the weights are reasonably smooth. In order to make this requirement more precise, let
us suppose ain is of the form J(i/(n + 1))/n, where J(u), 0 ≤ u ≤ 1, is the associated weight
function. In other words, we assume now that Ln can be expressed as
L_n = (1/n) Σ_{i=1}^{n} J(i/(n + 1)) X_{(i:n)}.
The asymptotic normality of Ln has been established either by putting conditions on the
weights or the weight function.
Suppose that µ and the population median (= F^{-1}(1/2)) coincide. Let us assume that the variance σ^2 is finite and f(µ) is finite and positive. For simplicity, let us take the sample size n to be odd. While X̄_n is an unbiased, asymptotically normal estimator of µ with variance Var(X̄_n) = σ^2/n, X̃_n is asymptotically unbiased and normal. If the population pdf is symmetric (around µ), X̃_n is also unbiased. Further, Var(X̃_n) ≈ {4n[f(µ)]^2}^{−1}. Thus, as an estimator of µ, the sample median would be more efficient than the sample mean, at least asymptotically, whenever [2f(µ)]^{−1} < σ. This condition is satisfied, for example, for the
Laplace distribution with pdf f(x; µ) = (1/2) exp(−|x − µ|), −∞ < x < ∞. For this distribution, we know that X̃_n is the maximum-likelihood estimator of µ, and that it is robust against outliers. Further, since f(µ) = 1/2, we can construct confidence intervals for µ using the fact that √n (X̃_n − µ) is asymptotically standard normal.
References
[1] Arnold, B.C., Balakrishnan, N. and Nagaraja, H.N. (1992). A First Course in Order
Statistics. John Wiley & Sons, Inc.
[2] Chung, K.L. (1974). A Course in Probability Theory. 2nd ed., Academic Press, New
York.
[3] Cramer, H. and Wold, H. (1936). Some theorems on distribution functions. J. London
Math. Soc. 11 290-295.
[4] Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956). Asymptotic minimax character of
the sample distribution function and of the classical multinomial estimator. Ann. Math.
Statist. 27 642-669.
[5] Falk, M. (1989). A note on uniform asymptotic normality of intermediate order statistics.
Ann. Inst. Statist. Math. 41 19-29.
[6] Ferguson, T.S. (1996). A Course in Large Sample Theory. Chapman & Hall.
[7] Gnedenko, B.V. (1943). Sur la distribution limite du terme maximum d'une série aléatoire. Ann. Math. 44 423-453.
[8] Wilk, M.B. and Gnanadesikan, R. (1968). Probability plotting methods for the analysis
of data. Biometrika 55 1-17.