Journal of Nonparametric Statistics
Vol. 22, No. 8, November 2010, 937–954
Spearman’s footrule and Gini’s gamma: a review
with complements
Downloaded By: [Canadian Research Knowledge Network] At: 15:25 7 January 2011
Christian Genestᵃ*, Johanna Nešlehováᵇ and Noomen Ben Ghorbalᵃ
ᵃ Département de mathématiques et de statistique, Université Laval, 1045, avenue de la Médecine, Québec
(Québec), Canada G1V 0A6; ᵇ Department of Mathematics and Statistics, McGill University, 805,
rue Sherbrooke Ouest, Montréal (Québec), Canada H3A 2K6
(Received 19 February 2009; final version received 16 November 2009)
The scattered literature on Spearman’s footrule and Gini’s gamma is surveyed. The following topics are
covered: finite-sample moments and asymptotic distribution under independence; large-sample distribution
under arbitrary alternatives; asymptotic relative efficiency for testing independence; consistent asymptotic
variance estimation through the jackknife; multivariate generalisations and uses. Complementary results
and an extensive bibliography are provided, along with several original illustrations.
Keywords: asymptotic relative efficiency; concordance; copula; Gini’s gamma; jackknife; Spearman’s
footrule; Spearman’s rho; ranks; test of independence
1. Introduction
Spearman’s footrule is a nonparametric measure of association. It was introduced by the
British psychologist Charles Spearman as an alternative to the correlation in the pairs
(R1 , S1 ), . . . , (Rn , Sn ) of ranks associated with a random sample (X1 , Y1 ), . . . , (Xn , Yn ) from
some continuous bivariate distribution H(x, y) = Pr(X ≤ x, Y ≤ y).
Spearman’s footrule usually refers to the statistic
$$\phi_n = 1 - \frac{3}{n^2 - 1} \sum_{i=1}^{n} |R_i - S_i|, \tag{1}$$
although other normalisations have been used, even by Spearman himself (cf. Spearman 1904,
1906; Dinneen and Blakesley 1971). This coefficient is closely related to the indice de cograduazione semplice introduced by the Italian statistician, demographer and sociologist Corrado Gini
*Corresponding author. Email:
[email protected]
ISSN 1048-5252 print/ISSN 1029-0311 online
© American Statistical Association and Taylor & Francis 2010
DOI: 10.1080/10485250903499667
http://www.informaworld.com
(1914), viz.
$$\gamma_n = \frac{1}{\lfloor n^2/2 \rfloor} \sum_{i=1}^{n} \{|(n + 1 - R_i) - S_i| - |R_i - S_i|\}, \tag{2}$$
where ⌊m⌋ denotes the integer part of arbitrary m > 0.
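Equations (1) and (2) are straightforward to evaluate from a sample. The following is a minimal sketch, assuming data without ties; the function names `ranks`, `footrule` and `gini_gamma` are illustrative choices and do not come from the paper. Exact rational arithmetic is used so that the values can be compared with the closed-form expressions given later.

```python
from fractions import Fraction


def ranks(values):
    """Ranks 1..n of a sequence without ties (continuous data assumed)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def footrule(x, y):
    """Spearman's footrule, following Equation (1)."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    total = sum(abs(ri - si) for ri, si in zip(r, s))
    return 1 - Fraction(3 * total, n * n - 1)


def gini_gamma(x, y):
    """Gini's gamma, following Equation (2)."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    total = sum(abs(n + 1 - ri - si) - abs(ri - si) for ri, si in zip(r, s))
    return Fraction(total, (n * n) // 2)
```

For instance, `footrule(x, y)` and `gini_gamma(x, y)` both return 1 when the two rankings coincide.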
Spearman’s footrule and Gini’s gamma remained largely neglected until fairly recently. In the
fourth edition of his book on rank correlation methods, Kendall (1970) discussed the footrule as a
nonparametric measure of association but dismissed it because of a lack of statistical properties.
Prior to 1980, the main sources of information on Gini’s gamma were in Italian (Savorgnan 1915;
Salvemini 1951; Amato 1954; Cucconi 1964).
Interest in Spearman’s footrule was apparently revived by Diaconis and Graham (1977), who
highlighted its natural interpretation in terms of the Manhattan (or city-block) distance between
two sets of ranks. They derived its asymptotic distribution under independence and noted that in
small samples, it is less variable than Spearman’s rho, which is based on the Euclidean metric.
Extensions have since been proposed to handle data that are incomplete (Alvo and Charbonneau
1997), multivariate (Úbeda-Flores 2005), and censored (Sen, Salama, and Quade 2003; Salama
and Quade 2004; Quade and Salama 2006).
Because of its simplicity, robustness and natural interpretation, the footrule has since been
rediscovered and used in various contexts. For instance, motivated by litigation about a scoring
procedure for civil service examinations, Berman (1996) proposed the statistic Mn = ∑ (Ri −
Si)1(Ri > Si), summed over i ∈ {1, . . . , n}, as a measure of ‘unfairness’ when the results of an exam leading to ranks R1, . . . , Rn
are replaced by scores leading to ranks S1, . . . , Sn. However, Berman did not notice that Mn =
(n² − 1)(1 − ϕn)/6.
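The link between Berman's statistic and the footrule follows from the fact that the differences Ri − Si sum to zero, so that the positive parts add up to half of the total absolute deviation. The sketch below checks the relation exhaustively for a small n; it assumes that Mn is the sum of the positive parts of Ri − Si, and the helper names are hypothetical.

```python
from fractions import Fraction
from itertools import permutations


def footrule_from_ranks(r, s):
    """Equation (1) applied directly to two rank vectors."""
    n = len(r)
    return 1 - Fraction(3 * sum(abs(a - b) for a, b in zip(r, s)), n * n - 1)


def berman_m(r, s):
    """Sum of the positive parts of R_i - S_i."""
    return sum(a - b for a, b in zip(r, s) if a > b)


def identity_holds(n):
    """Check M_n = (n^2 - 1)(1 - phi_n)/6 over all n! rank permutations."""
    r = tuple(range(1, n + 1))
    return all(
        berman_m(r, s) == Fraction(n * n - 1, 6) * (1 - footrule_from_ranks(r, s))
        for s in permutations(r)
    )
```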
In the field of genomics, a simple function of ϕn was advocated a few years ago by Kim, Rha,
Cho, and Chung (2004) to measure reproducibility among replicates in microarray experiments,
which are likely to produce outliers due to a low signal-to-noise ratio. In the field of information
retrieval, Spearman’s footrule distance has also been used to measure the discrepancy between
rank lists (Fagin, Kumar, and Sivakumar 2003; Mikki in press). The same idea was used very
recently in gene expression profiling and in bioinformatics by Iorio, Tagliaferri, and di Bernardo
(2009) and Lin and Ding (2009), respectively.
In comparison with Spearman’s footrule, Gini’s gamma seems to be used rather rarely in
practice. This may well change in the years to come, however, as a strong connection between
the two coefficients was recently uncovered by Nelsen and Úbeda-Flores (2004). They observed
that γn is in fact an extension of ϕn which Salama and Quade (2001) introduced to remedy its
asymmetry, already noted by Spearman (1904).
In support of these recent developments, this paper aims to consolidate the knowledge base on
Spearman’s footrule and Gini’s gamma. The scattered literature on the subject is collected and
organised in a structured way using the theory of copulas as a unifying framework. This leads to
several new results, proofs and illustrations.
Section 2 reviews basic properties and relations between Spearman’s footrule and Gini’s
gamma. Sections 3 and 4 describe their distributions under independence and under general
alternatives, respectively. Section 5 collects results on tests of independence based on the two
coefficients.
A jackknife procedure is detailed in Section 6 for the estimation of the statistics’ asymptotic
variance under any dependence structure. Finally, multivariate extensions of ϕn and γn are considered in Section 7, and their sampling properties are studied in Section 8. Practical recommendations
are summarised in the Conclusion.
Various Appendices contain the technical arguments, including new, simpler proofs of known
results based on the asymptotic behaviour of the empirical copula process.
2. Definitions and basic properties
It is clear that the statistic ϕn defined in Equation (1) equals 1 when Ri = Si for all i ∈ {1, . . . , n}.
It takes its smallest value when the two sets of ranks are antithetic, that is, when Ri = n + 1 − Si
for every i ∈ {1, . . . , n}. A simple calculation shows that
$$\sum_{i=1}^{n} |n + 1 - 2i| = \begin{cases} \dfrac{n^2}{2}, & \text{when } n \text{ is even}, \\[1ex] \dfrac{n^2 - 1}{2}, & \text{when } n \text{ is odd}. \end{cases}$$
Therefore, ϕn varies in [−1/2, 1] when n is odd but it can go as low as −(n² + 2)/{2(n² − 1)} ∈
[−1, −1/2) for n even.
In order to span the entire interval [−1, 1], one can replace 3/(n² − 1) by 2/⌊n²/2⌋ in
Equation (1). Even if this is done, the statistic ϕn may still be regarded as unsatisfactory in
some applications. This is because it generally assigns different degrees of dependence (in absolute value) to the samples (X1 , Y1 ), . . . , (Xn , Yn ) and (−X1 , Y1 ), . . . , (−Xn , Yn ). For example, if
(X1 , Y1 ) = (10, 20), (X2 , Y2 ) = (20, 30) and (X3 , Y3 ) = (30, 10), then ϕn = −1 while ϕn = 0
for the sample (−10, 20), (−20, 30) and (−30, 10).
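The three-point example can be reproduced with a few lines of code. This is only an illustrative sketch: `rescaled_footrule` is a hypothetical name for the footrule with the normalising constant 2/⌊n²/2⌋, and Gini's gamma from Equation (2) is computed alongside it to show that it only changes sign when X is replaced by −X.

```python
from fractions import Fraction


def ranks(values):
    """Ranks 1..n of a sequence without ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for position, index in enumerate(order, start=1):
        result[index] = position
    return result


def rescaled_footrule(x, y):
    """Footrule with 2 / floor(n^2 / 2) in place of 3 / (n^2 - 1)."""
    n = len(x)
    total = sum(abs(a - b) for a, b in zip(ranks(x), ranks(y)))
    return 1 - Fraction(2 * total, (n * n) // 2)


def gini_gamma(x, y):
    """Gini's gamma, following Equation (2)."""
    n = len(x)
    total = sum(abs(n + 1 - a - b) - abs(a - b)
                for a, b in zip(ranks(x), ranks(y)))
    return Fraction(total, (n * n) // 2)


x = [10, 20, 30]
y = [20, 30, 10]
minus_x = [-value for value in x]

phi_original = rescaled_footrule(x, y)          # sample (X_i, Y_i)
phi_reflected = rescaled_footrule(minus_x, y)   # sample (-X_i, Y_i)
gamma_original = gini_gamma(x, y)
gamma_reflected = gini_gamma(minus_x, y)
```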
As explained by Salama and Quade (2001), this problem can be solved by making ϕn symmetric
with respect to the rank transformation R → n + 1 − R. Nelsen and Úbeda-Flores (2004) pointed
out that the resulting coefficient is the right-hand side of Equation (2), that is, Gini’s γn .
Many properties of ϕn and γn stem from their representation as linear rank statistics. From the
identity |u − v| = u + v − 2 min(u, v) valid for all u, v ∈ R, one gets
$$\phi_n = \frac{1}{n - 1} \sum_{i=1}^{n} J_\phi\left(\frac{R_i}{n + 1}, \frac{S_i}{n + 1}\right) - \frac{2n + 1}{n - 1}, \tag{3}$$
where Jϕ (u, v) = 6 min(u, v). Similarly, one can use the identity |(n + 1) − u − v| =
2 max{0, u + v − (n + 1)} − u − v + (n + 1) to see that
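$$\gamma_n = \frac{n + 1}{2\lfloor n^2/2 \rfloor} \sum_{i=1}^{n} J_\gamma\left(\frac{R_i}{n + 1}, \frac{S_i}{n + 1}\right) - \frac{n(n + 1)}{\lfloor n^2/2 \rfloor}, \tag{4}$$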
where Jγ (u, v) = 4 min(u, v) + 4 max(0, u + v − 1).
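As a sanity check on the linear rank representations (3) and (4), the sketch below compares them with the defining formulas (1) and (2) over every permutation of the ranks for a given small n. All function names are illustrative assumptions; exact fractions are used so that equality can be tested without rounding.

```python
from fractions import Fraction
from itertools import permutations


def footrule(r, s):
    """Equation (1)."""
    n = len(r)
    return 1 - Fraction(3 * sum(abs(a - b) for a, b in zip(r, s)), n * n - 1)


def gini_gamma(r, s):
    """Equation (2)."""
    n = len(r)
    total = sum(abs(n + 1 - a - b) - abs(a - b) for a, b in zip(r, s))
    return Fraction(total, (n * n) // 2)


def j_phi(u, v):
    return 6 * min(u, v)


def j_gamma(u, v):
    return 4 * min(u, v) + 4 * max(0, u + v - 1)


def footrule_linear(r, s):
    """Equation (3): footrule as a linear rank statistic."""
    n = len(r)
    total = sum(j_phi(Fraction(a, n + 1), Fraction(b, n + 1)) for a, b in zip(r, s))
    return total / (n - 1) - Fraction(2 * n + 1, n - 1)


def gamma_linear(r, s):
    """Equation (4): Gini's gamma as a linear rank statistic."""
    n = len(r)
    half = (n * n) // 2
    total = sum(j_gamma(Fraction(a, n + 1), Fraction(b, n + 1)) for a, b in zip(r, s))
    return Fraction(n + 1, 2 * half) * total - Fraction(n * (n + 1), half)


def representations_agree(n):
    r = tuple(range(1, n + 1))
    return all(
        footrule(r, s) == footrule_linear(r, s)
        and gini_gamma(r, s) == gamma_linear(r, s)
        for s in permutations(r)
    )
```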
3. Distribution under independence
The behaviour of ϕn , γn and variants thereof has been extensively studied under the assumption
that the variables X and Y are independent. From results which Spearman (1904) attributed to
Felix Hausdorff (see, for example, Kleinecke, Ury, and Wagner 1962, for a derivation), one gets
$$E(\phi_n) = 0 \quad \text{and} \quad \mathrm{var}(\phi_n) = \frac{2n^2 + 7}{5(n + 1)(n - 1)^2}.$$
Tables of the null distribution of ϕn were produced by Ury and Kleinecke (1979). They were
later expanded by Franklin (1988) and Salama and Quade (1990); see also Salama and Quade
(2002). Diaconis and Graham (1977) were apparently the first to show that under independence,
$$n^{1/2} \phi_n \rightsquigarrow N(0, 2/5),$$
where ⇝ denotes convergence in distribution as n → ∞. See Sen and Salama (1983) for an
alternative proof.
For Gini’s gamma, Amato (1954) and Cucconi (1964) obtained independently
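$$E(\gamma_n) = 0 \quad \text{and} \quad \mathrm{var}(\gamma_n) = \begin{cases} \dfrac{2(n^2 + 2)}{3(n - 1)n^2}, & \text{when } n \text{ is even}, \\[2ex] \dfrac{2(n^2 + 3)}{3(n - 1)(n^2 - 1)}, & \text{when } n \text{ is odd}. \end{cases}$$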
A third derivation was provided by Salama and Quade (2001) but note the typo in their final
formula for n even.
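For small n, the null moments quoted in this section can be confirmed by brute force, since under independence all n! assignments of the ranks S1, . . . , Sn to R1, . . . , Rn are equally likely. The following sketch enumerates them exactly; the function names are hypothetical and the enumeration is only practical for small samples.

```python
from fractions import Fraction
from itertools import permutations


def footrule(r, s):
    n = len(r)
    return 1 - Fraction(3 * sum(abs(a - b) for a, b in zip(r, s)), n * n - 1)


def gini_gamma(r, s):
    n = len(r)
    total = sum(abs(n + 1 - a - b) - abs(a - b) for a, b in zip(r, s))
    return Fraction(total, (n * n) // 2)


def null_moments(statistic, n):
    """Exact mean and variance over all n! equally likely rank permutations."""
    r = tuple(range(1, n + 1))
    values = [statistic(r, s) for s in permutations(r)]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def var_footrule(n):
    """Closed-form null variance of the footrule."""
    return Fraction(2 * n * n + 7, 5 * (n + 1) * (n - 1) ** 2)


def var_gamma(n):
    """Closed-form null variance of Gini's gamma (even and odd n)."""
    if n % 2 == 0:
        return Fraction(2 * (n * n + 2), 3 * (n - 1) * n * n)
    return Fraction(2 * (n * n + 3), 3 * (n - 1) * (n * n - 1))
```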
The exact null distribution of Gini’s gamma was given by Savorgnan (1915) for n ≤ 5; these
tables were later extended by Salvemini (1951) and Cifarelli and Regazzini (1977). In addition,
Rizzi (1971) used simulations to approximate the null distribution of γn up to n = 30. Betrò
(1993) later showed how the exact distribution can be derived numerically. Other approximations
were designed by Landenna and Scagni (1989), and by Vittadini (1991). It was suspected for a
long time (Salvemini 1951; Amato 1954; Cucconi 1964; Herzel 1972) that under independence,
$$n^{1/2} \gamma_n \rightsquigarrow N(0, 2/3)$$
as n → ∞. This was eventually proved by Cifarelli and Regazzini (1977).
4. Distribution in the case of dependence
The asymptotic distribution of Gini’s gamma was given by Cifarelli, Conti, and Regazzini (1996)
in the general case where the pair (X, Y) has a bivariate distribution H(x, y) = Pr(X ≤ x, Y ≤
y) with continuous margins F(x) = Pr(X ≤ x) and G(y) = Pr(Y ≤ y). The parallel result for
Spearman’s footrule is reported below, seemingly for the first time.
As it turns out, the large-sample distributions of ϕn and γn depend on H only through the
function C implicitly defined by
$$H(x, y) = \Pr(X \le x, Y \le y) = C\{F(x), G(y)\}$$
for all x, y ∈ R. The so-called copula C, which is unique, is a bivariate distribution function with
uniform margins on the interval (0, 1) (Nelsen 2006, Chap. 2).
The following proposition, whose proof is in Appendix 1, shows that ϕn and γn are
asymptotically unbiased estimators of
$$\phi_C = 1 - 3 \int_{(0,1)^2} |u - v| \, dC(u, v) = -2 + 6 \int_0^1 C(t, t) \, dt$$
and
$$\gamma_C = 2 \int_{(0,1)^2} \{|u + v - 1| - |u - v|\} \, dC(u, v) = -2 + 4 \int_0^1 \{C(t, t) + C(t, 1 - t)\} \, dt,$$
respectively. From these definitions, reported by Nelsen (1998), it is clear that ϕC and γC depend
only on the copula’s main and secondary diagonal sections, defined for all t ∈ [0, 1] by C(t, t)
and C(t, 1 − t), respectively.
Proposition 1 Suppose that a bivariate copula C admits continuous partial derivatives
Ċ1(u, v) = ∂C(u, v)/∂u and Ċ2(u, v) = ∂C(u, v)/∂v on (0, 1)². Then as n → ∞,
$$n^{1/2}(\phi_n - \phi_C) \rightsquigarrow N(0, \sigma^2_{\phi_C}), \qquad n^{1/2}(\gamma_n - \gamma_C) \rightsquigarrow N(0, \sigma^2_{\gamma_C}),$$
with $\sigma^2_{\phi_C}$ and $\sigma^2_{\gamma_C}$ defined in Equations (A1) and (A2), respectively.
When C(u, v) = Π(u, v) ≡ uv is the independence copula, one gets $\sigma^2_{\phi_C} = 2/5$ and $\sigma^2_{\gamma_C} = 2/3$.
Additional examples of explicit calculations are given below.
Example 1 Let C(u, v) = uv + θ uv(1 − u)(1 − v) be the Farlie–Gumbel–Morgenstern copula
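with parameter θ ∈ [−1, 1]. Routine calculations yield ϕC = θ/5, γC = 4θ/15,
$$\sigma^2_{\phi_C} = \frac{2}{5} + \frac{3}{70}\,\theta - \frac{11}{150}\,\theta^2 \quad \text{and} \quad \sigma^2_{\gamma_C} = \frac{2}{3} - \frac{88}{675}\,\theta^2.$$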
Example 2 Given θ ∈ [−1, 1], let A(v) = θ sin(2π v)/(2π ) and C(u, v) = uv + u(1 − u)A(v)
for all u, v ∈ (0, 1). These are examples of copulas with quadratic sections, as defined by
Quesada-Molina and Rodríguez-Lallena (1995). Interestingly, ϕC = γC = 0 for all θ ∈ [−1, 1].
This is in fact the case for any measure of concordance à la Scarsini (1984), because C(u, v) +
C(u, 1 − v) = u for all u, v ∈ (0, 1), so that all members of this family are ‘indifferent,’ in the
sense given to that term by Gini (Conti 1994). With the help of Maple, one finds
$$\sigma^2_{\phi_C} = \frac{1080\theta^2 + 72\theta^2\pi^4 + 225\theta^2\pi^2 + 64\pi^6}{160\pi^6}$$
and
$$\sigma^2_{\gamma_C} = \frac{330\theta^2 + 24\theta^2\pi^4 + 95\theta^2\pi^2 + 20\pi^6}{30\pi^6}.$$
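The population values in Examples 1 and 2 depend on the copula only through its two diagonal sections, so they can be approximated by one-dimensional quadrature. The sketch below uses a composite Simpson rule written out by hand; the helper names (`simpson`, `phi_c`, `gamma_c`, `fgm`, `quadratic_section`) are illustrative assumptions, and the numerical values are approximations rather than exact results.

```python
import math


def simpson(f, a=0.0, b=1.0, panels=400):
    """Composite Simpson rule on [a, b]; `panels` must be even."""
    h = (b - a) / panels
    total = f(a) + f(b)
    for k in range(1, panels):
        total += (4 if k % 2 else 2) * f(a + k * h)
    return total * h / 3


def phi_c(copula):
    """-2 + 6 * integral of the main diagonal section C(t, t)."""
    return -2 + 6 * simpson(lambda t: copula(t, t))


def gamma_c(copula):
    """-2 + 4 * integral of C(t, t) + C(t, 1 - t)."""
    return -2 + 4 * simpson(lambda t: copula(t, t) + copula(t, 1 - t))


def fgm(theta):
    """Farlie-Gumbel-Morgenstern copula of Example 1."""
    return lambda u, v: u * v + theta * u * v * (1 - u) * (1 - v)


def quadratic_section(theta):
    """Copula of Example 2 with A(v) = theta * sin(2 pi v) / (2 pi)."""
    def copula(u, v):
        return u * v + u * (1 - u) * theta * math.sin(2 * math.pi * v) / (2 * math.pi)
    return copula
```

With θ = 1/2, `phi_c(fgm(0.5))` and `gamma_c(fgm(0.5))` should be close to θ/5 and 4θ/15, while both functionals should be numerically zero for the family of Example 2.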
5. Asymptotic relative efficiency
Spearman’s footrule and Gini’s gamma are natural statistics for testing independence. Cifarelli
and Regazzini (1977) compared the merits of the test based on γn in terms of Pitman’s asymptotic
relative efficiency in a Gaussian model. More recently, Conti and Nikitin (1999a) computed the
local Bahadur efficiency of ϕn and γn for a large class of alternatives. As the test statistics are
rank-based, the calculations rely only on the dependence structure under the alternative, that is,
the copula.
In their work, Conti and Nikitin (1999a) considered copula alternatives defined for each
u, v ∈ (0, 1) by Cθ (u, v) = uv + θ θ (u, v), where θ ≥ 0 and θ is a non-negative function
whose mixed partial derivative satisfies mild conditions. They showed that for such alternatives,
Bahadur’s and Pitman’s efficiencies coincide.
Using the results of Genest, Quessy, and Rémillard (2006), one can extend these comparisons
to other copula families (Cθ ) in which independence occurs when, say, θ = 0. Indeed, note that
by Equations (3) and (4), ϕn and γn are asymptotically equivalent to statistics of the form
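$$S_n^{J} = \frac{1}{n} \sum_{i=1}^{n} J\left(\frac{R_i}{n + 1}, \frac{S_i}{n + 1}\right) - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} J\left(\frac{i}{n + 1}, \frac{j}{n + 1}\right). \tag{5}$$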
Here, J = Jϕ and J = Jγ , respectively. Many classical nonparametric tests of independence
are based on statistics of the form (5) for some score function J .
Given right-continuous, square-integrable, quasi-monotone score functions J1 and J2 , it is
shown by Genest et al. (2006) that Pitman’s asymptotic relative efficiency (ARE) equals
$$\mathrm{ARE}(S_n^{J_1}, S_n^{J_2}) = \left(\frac{\mu_{J_1}/\sigma_{J_1}}{\mu_{J_2}/\sigma_{J_2}}\right)^2,$$
provided that the family (Cθ ) of copula alternatives meets mild regularity conditions concerning
mainly the existence and properties of the function Ċ0 , defined as ∂Cθ (u, v)/∂θ evaluated at
θ = 0. Here, $\mu_{J_i}$ is the derivative with respect to θ of the asymptotic mean of $S_n^{J_i}$ under Cθ,
evaluated at θ = 0, that is,
$$\mu_{J_i} = \int_{(0,1)^2} \dot{C}_0(u, v) \, dJ_i(u, v).$$
Furthermore, $\sigma^2_{J_i}$ stands for the asymptotic variance of $S_n^{J_i}$ at independence.
Given below are applications of this result when J1 , J2 ∈ {Jϕ , Jγ , Jρ } with Jρ (u, v)=12 uv for
all u, v ∈ (0, 1), which corresponds to Spearman’s rho.
Example 3 If Cθ is the Gaussian copula and Φ denotes the cumulative distribution function of
a N(0, 1) random variable, one finds Ċ0(u, v) = Φ′{Φ⁻¹(u)} Φ′{Φ⁻¹(v)} for all u, v ∈ (0, 1).
Thus, μJϕ = √3/π, μJγ = 4/(√3 π) and μJρ = 3/π. Hence,
$$\mathrm{ARE}(S_n^{J_\phi}, S_n^{J_\rho}) = \frac{5}{6} \approx 0.83 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{8}{9} \approx 0.89.$$
These calculations are in accordance with the findings of Cifarelli and Regazzini (1977). For
this class of alternatives, both Spearman’s footrule and Gini’s gamma are less efficient than
Spearman’s rho. The Pitman efficiency of the latter is 9/π² ≈ 0.91 when compared with the van
der Waerden statistic, which is locally optimal for such alternatives (Genest and Verret 2005).
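As a worked check of the two ratios in Example 3, the lines below substitute the constants into the efficiency formula. They assume μJϕ = √3/π, μJγ = 4/(√3 π) and μJρ = 3/π, together with variances at independence equal to 2/5 for Jϕ, 2/3 for Jγ and 1 for Jρ (the first two are the limiting variances of Section 3). This is only an arithmetic sketch and not part of the original derivation.

$$\mathrm{ARE}(S_n^{J_\phi}, S_n^{J_\rho}) = \frac{\mu_{J_\phi}^2/\sigma_{J_\phi}^2}{\mu_{J_\rho}^2/\sigma_{J_\rho}^2} = \frac{(3/\pi^2)/(2/5)}{(9/\pi^2)/1} = \frac{15/(2\pi^2)}{9/\pi^2} = \frac{5}{6},$$

$$\mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{\mu_{J_\gamma}^2/\sigma_{J_\gamma}^2}{\mu_{J_\rho}^2/\sigma_{J_\rho}^2} = \frac{\{16/(3\pi^2)\}/(2/3)}{(9/\pi^2)/1} = \frac{8/\pi^2}{9/\pi^2} = \frac{8}{9}.$$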
Example 4 Suppose that the family (Cθ) is such that for all u, v ∈ (0, 1), Ċ0(u, v) = kuv(u^m −
1)(v^m − 1) for some k > 0 and m ≥ 1. The Farlie–Gumbel–Morgenstern, Dąbrowska, Plackett
and Frank families of copulas fall in this category when m = 1. The alternatives of Woodworth
(1970) illustrate the case m > 1. Simple calculations yield
$$\mu_{J_\phi} = \frac{4km^2}{(m + 3)(2m + 3)} \quad \text{and} \quad \mu_{J_\rho} = \frac{3km^2}{(2 + m)^2}.$$
A complex but explicit expression is also available for μJγ; it reduces to 4k/15 if m = 1. Using
σρ = 1, one recovers the results of Conti and Nikitin (1999a) for m = 1, viz.
$$\mathrm{ARE}(S_n^{J_\phi}, S_n^{J_\rho}) = \frac{9}{10} = 0.90 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{24}{25} = 0.96.$$
Spearman’s footrule and Gini’s gamma are thus somewhat less efficient than Spearman’s rho,
which is the locally optimal test statistic for this class of models (Genest and Verret 2005). As
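shown in Figure 1, however, ϕn eventually becomes more efficient than ρn as m → ∞, while γn
gradually loses ground when m ≥ 2. In fact,
$$\lim_{m \to \infty} \mathrm{ARE}(S_n^{J_\phi}, S_n^{J_\rho}) = \frac{10}{9} \approx 1.11 \quad \text{and} \quad \lim_{m \to \infty} \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{2}{3} \approx 0.67.$$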
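The efficiency of the footrule relative to Spearman's rho in Example 4 can be traced as a function of m from the two closed-form expressions for μJϕ and μJρ. The sketch below does this in exact arithmetic for integer m, assuming asymptotic variances at independence of 2/5 for Jϕ and 1 for Jρ; the function names are hypothetical, and the constant k cancels in the ratio.

```python
from fractions import Fraction

SIGMA2_PHI = Fraction(2, 5)  # asymptotic variance of the footrule at independence
SIGMA2_RHO = Fraction(1)     # asymptotic variance of Spearman's rho at independence


def mu_phi(m, k=1):
    """mu for J_phi under the alternatives of Example 4 (integer m >= 1)."""
    return Fraction(4 * k * m * m, (m + 3) * (2 * m + 3))


def mu_rho(m, k=1):
    """mu for J_rho under the alternatives of Example 4 (integer m >= 1)."""
    return Fraction(3 * k * m * m, (m + 2) ** 2)


def are_phi_rho(m):
    """Pitman efficiency of the footrule relative to Spearman's rho."""
    return (mu_phi(m) ** 2 / SIGMA2_PHI) / (mu_rho(m) ** 2 / SIGMA2_RHO)
```

Under these assumptions, `are_phi_rho(1)` equals 9/10, and for integer m the ratio first exceeds 1 at m = 7 before approaching 10/9.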
Figure 1. Relative efficiency of ϕn versus ρn (left) and γn versus ρn (right) as a function of parameter m ≥ 1 in the
Woodworth alternatives of Example 4.
Example 5 Suppose that the family (Cθ) is such that for all u, v ∈ (0, 1), Ċ0(u, v) =
kuv ln(u) ln(v) for some k > 0. The Clayton/Cook–Johnson and Gumbel–Barnett families fall in
this category, as well as Model 4.2.10 of Nelsen (2006). Here, μJϕ = 4k/9, μJγ = k(15 − π²)/9
and μJρ = 3k/4. Consequently,
$$\mathrm{ARE}(S_n^{J_\phi}, S_n^{J_\rho}) = \frac{640}{729} \approx 0.88 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{8(15 - \pi^2)^2}{243} \approx 0.87.$$
The tests based on ϕn and γn thus have similar efficiencies. For this class of alternatives, however,
neither they nor the test based on ρn can be recommended. Indeed, the Pitman efficiency of
Spearman’s rho is only 9/16 ≈ 0.563 when compared with Savage’s log-rank test, which is the
locally most powerful test statistic in this case (Genest and Verret 2005).
The final example, adapted from Conti and Nikitin (1999a), exhibits dependence models for
which Spearman’s footrule and Gini’s gamma are the locally most powerful test statistics.
Example 6 Consider the families of copulas defined for all u, v, θ ∈ (0, 1) by
$$C_\theta^{\phi}(u, v) = uv + \frac{\theta}{2} \{|u - v|^3 - (u + v)^3 + 2uv(u^2 + v^2 + 2)\}$$
and
$$C_\theta^{\gamma}(u, v) = uv + \frac{\theta}{6} \{|1 - u - v|^3 + |u - v|^3 - 3(u^2 + v^2 - u - v) - 1\}.$$
Both of them lie in the class of cubic-section copulas introduced by Nelsen, Quesada-Molina,
and Rodríguez-Lallena (1997). As shown by Conti and Nikitin (1999a), tests of independence
based on ϕn and γn are locally most powerful for the classes of alternatives $C_\theta^{\phi}$ and $C_\theta^{\gamma}$, respectively.
For the family $C_\theta^{\phi}$, one gets μJϕ = 2/5, μJγ = 1/2 and μJρ = 3/5. Thus,
$$\mathrm{ARE}(S_n^{J_\phi}, S_n^{J_\rho}) = \frac{10}{9} \approx 1.11 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{25}{24} \approx 1.04.$$
For $C_\theta^{\gamma}$, one finds μJϕ = 1/2, μJγ = 2/3 and μJρ = 4/5. Hence,
$$\mathrm{ARE}(S_n^{J_\phi}, S_n^{J_\rho}) = \frac{125}{128} \approx 0.98 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{25}{24} \approx 1.04.$$
6. Estimation of the asymptotic variance
An alternative derivation of the limiting distributions of ϕn and γn was given by Conti (1994)
using an asymptotically equivalent U -statistic (see also Cifarelli et al. 1996). His approach leads
to a consistent estimate of their large-sample variances. Given u, v, s, t ∈ R, let
$$\psi_1(u, v; s, t) = |u - v| + \mathrm{sign}(u - v)\{1(s \le u) - 1(t \le v) - u + v\},$$
$$\psi_2(u, v; s, t) = |u + v - 1| + \mathrm{sign}(u + v - 1)\{1(s \le u) + 1(t \le v) - u - v\},$$
with the convention sign(0) = −1. For k = 1, 2, define Ψk(u, v; s, t) = ψk(u, v; s, t) +
ψk(s, t; u, v) as well as
$$\Upsilon_{kn} = \binom{n}{2}^{-1} \sum_{i<j} \Psi_k(U_i, V_i; U_j, V_j),$$
where Ui = F (Xi ), Vi = G(Yi ) for i ∈ {1, . . . , n}. Conti’s result is as follows (see Conti 1994
for a proof).
Proposition 2 If the conditions of Proposition 1 hold, then as n → ∞,
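$$n^{1/2}(\Upsilon_{1n} - \phi_C) \rightsquigarrow N(0, \sigma^2_{\phi_C}) \quad \text{and} \quad n^{1/2}(\Upsilon_{2n} - \Upsilon_{1n} - \gamma_C) \rightsquigarrow N(0, \sigma^2_{\gamma_C}).$$
Let $\tilde\sigma^2_{\phi_C}$ and $\tilde\sigma^2_{\gamma_C}$ be the delete-one jackknife variance estimators based on ϒ1n and ϒ2n − ϒ1n,
respectively. The theory of U-statistics (Lee 1990, Chap. 5) implies that $\tilde\sigma^2_{\phi_C}$ is a consistent estimate
of $\sigma^2_{\phi_C}$; similarly, $\tilde\sigma^2_{\gamma_C}$ estimates $\sigma^2_{\gamma_C}$ consistently. In his work, Conti (1994) used slight variants
based on the work of Sen (1960). Specifically, let
$$\Upsilon_{kn,i} = \frac{1}{n - 1} \sum_{j=1,\, j \ne i}^{n} \Psi_k(U_i, V_i; U_j, V_j)$$
for k = 1, 2 and i ∈ {1, . . . , n}. Conti’s estimators of $\sigma^2_{\phi_C}$ and $\sigma^2_{\gamma_C}$ are then given by
$$\hat\sigma^2_{\phi_C} = \frac{4}{n} \sum_{i=1}^{n} (\Upsilon_{1n,i} - \Upsilon_{1n})^2 = \frac{(n - 2)^2}{n(n - 1)} \, \tilde\sigma^2_{\phi_C}$$
and
$$\hat\sigma^2_{\gamma_C} = \frac{4}{n} \sum_{i=1}^{n} (\Upsilon_{2n,i} - \Upsilon_{2n})^2 = \frac{(n - 2)^2}{n(n - 1)} \, \tilde\sigma^2_{\gamma_C},$$
respectively. In this fashion, $\mathrm{var}(\hat\sigma^2_{\phi_C}) < \mathrm{var}(\tilde\sigma^2_{\phi_C})$ and $\mathrm{var}(\hat\sigma^2_{\gamma_C}) < \mathrm{var}(\tilde\sigma^2_{\gamma_C})$.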
As shown by Conti (1994), the delete-one jackknife remains consistent when Ui and Vi are
replaced by Fn (Xi ) = Ri /n and Gn (Yi ) = Si /n, where Fn and Gn are the empirical versions of
F and G, respectively.
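The scaling factor (n − 2)²/{n(n − 1)} linking Conti's estimator to the delete-one jackknife is an algebraic identity for any U-statistic of degree two, and it can be checked numerically. The sketch below does so for the kernel built from ψ1 (symmetrised by adding the version with swapped arguments). It assumes that the jackknife estimator is scaled as (n − 1) times the sum of squared deviations of the leave-one-out statistics, which targets the variance of n^{1/2} times the U-statistic; this scaling and all function names are assumptions made for illustration.

```python
import random


def psi1(u, v, s, t):
    sign = 1 if u > v else -1  # convention sign(0) = -1
    return abs(u - v) + sign * ((s <= u) - (t <= v) - u + v)


def kernel1(u, v, s, t):
    """Symmetrised kernel psi1(u, v; s, t) + psi1(s, t; u, v)."""
    return psi1(u, v, s, t) + psi1(s, t, u, v)


def u_statistic(points):
    """Average of the kernel over all pairs i < j."""
    n = len(points)
    total = sum(kernel1(*points[i], *points[j])
                for i in range(n) for j in range(i + 1, n))
    return total / (n * (n - 1) / 2)


def conti_estimate(points):
    """(4 / n) * sum of squared deviations of the partial averages."""
    n = len(points)
    u_n = u_statistic(points)
    partial = [sum(kernel1(*points[i], *points[j]) for j in range(n) if j != i) / (n - 1)
               for i in range(n)]
    return 4 / n * sum((p - u_n) ** 2 for p in partial)


def jackknife_estimate(points):
    """Delete-one jackknife estimate, scaled for the variance of sqrt(n) * U_n."""
    n = len(points)
    u_n = u_statistic(points)
    leave_one_out = [u_statistic(points[:i] + points[i + 1:]) for i in range(n)]
    return (n - 1) * sum((u - u_n) ** 2 for u in leave_one_out)


rng = random.Random(1)
sample = [(rng.random(), rng.random()) for _ in range(12)]
n = len(sample)
gap = abs(conti_estimate(sample)
          - (n - 2) ** 2 / (n * (n - 1)) * jackknife_estimate(sample))
```

Here `gap` should vanish up to floating-point rounding.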
Figure 2. Dispersion of the asymptotic variance estimate of Spearman’s footrule (left) and Gini’s gamma (right), based
on 100 random samples of size n = 50, 100, 250 and 500 from the Farlie–Gumbel–Morgenstern copula with parameter
θ = 1/2.
The procedure can be implemented more easily upon noting that
$$\frac{\Psi_1(u, v; s, t)}{2} = 1\{u \le \min(v, s),\, t < \min(v, s)\} + 1\{s \le \min(t, u),\, v < \min(t, u)\}$$
and
$$\frac{\Psi_2(u, v; s, t)}{2} = 1\{s \le \min(u, 1 - t),\, v > \max(t, 1 - u)\} + 1\{u \le \min(s, 1 - v),\, t > \max(v, 1 - s)\}.$$
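The indicator forms above can be compared numerically with the symmetrised kernels ψk(u, v; s, t) + ψk(s, t; u, v) at randomly drawn points. The sketch below does this for k = 1, 2; it assumes the inequalities inside ψ1 and ψ2 read 1(s ≤ u) and 1(t ≤ v), uses the convention sign(0) = −1, and relies on the fact that ties have probability zero for continuous data. The function names are hypothetical.

```python
import random


def sign(x):
    return 1 if x > 0 else -1  # convention sign(0) = -1


def psi1(u, v, s, t):
    return abs(u - v) + sign(u - v) * ((s <= u) - (t <= v) - u + v)


def psi2(u, v, s, t):
    return abs(u + v - 1) + sign(u + v - 1) * ((s <= u) + (t <= v) - u - v)


def symmetrised(psi, u, v, s, t):
    return psi(u, v, s, t) + psi(s, t, u, v)


def indicator_form1(u, v, s, t):
    first = u <= min(v, s) and t < min(v, s)
    second = s <= min(t, u) and v < min(t, u)
    return 2 * (first + second)


def indicator_form2(u, v, s, t):
    first = s <= min(u, 1 - t) and v > max(t, 1 - u)
    second = u <= min(s, 1 - v) and t > max(v, 1 - s)
    return 2 * (first + second)


def max_discrepancy(trials=10000, seed=0):
    """Largest absolute gap between the two forms over random points in (0, 1)^4."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(trials):
        point = [rng.random() for _ in range(4)]
        worst = max(worst,
                    abs(symmetrised(psi1, *point) - indicator_form1(*point)),
                    abs(symmetrised(psi2, *point) - indicator_form2(*point)))
    return worst
```

The indicator forms avoid floating-point subtraction altogether, which is one reason they are convenient in practice.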
The behaviour of σ̂ϕ2C and σ̂γ2C is illustrated in Figure 2 using random samples of size n =
50, 100, 250 and 500 from the Farlie–Gumbel–Morgenstern copula with parameter θ = 1/2. As
can be seen, the convergence is fairly rapid. The same phenomenon was observed for several other
classes of copulas [results not shown].
7. Extensions
In recent years, various generalisations of Spearman’s footrule and Gini’s gamma have been
proposed. In particular, Cifarelli et al. (1996) considered
$$\phi_{g,C} = g^{-1}\left[\int_{(0,1)^2} g(|u - v|) \, dC(u, v)\right]$$
and
$$\gamma_{g,C} = g^{-1}\left[\int_{(0,1)^2} \{g(|u + v - 1|) - g(|u - v|)\} \, dC(u, v)\right],$$
where g : [0, 1] → [0, 1] is a strictly increasing, continuous function. In addition, if g is convex
and satisfies g(0) = 0, γg,C is a measure of concordance in the sense of Scarsini (1984); the cases
g(t) = t and g(t) = t² correspond to Gini’s gamma and Spearman’s rho, respectively. Cifarelli
et al. (1996) identified the asymptotic distribution of the empirical version of γg,C and showed
how to estimate its variance consistently by the jackknife. See Conti and Nikitin (1999b) for
additional limiting results. However, it is not clear how g should be chosen in practice.
More recently, multivariate versions of Spearman’s footrule and Gini’s gamma were proposed
by Úbeda-Flores (2005) and Behboodian, Dolati, and Úbeda-Flores (2007), respectively. The
d-variate version of ϕC is
$$\phi_C = \frac{d + 1}{d - 1} \int_0^1 \{C(t, \ldots, t) + \bar{C}(t, \ldots, t)\} \, dt - \frac{2}{d - 1},$$
where C̄ is the distribution function of 1 − U with U = (U1, . . . , Ud) distributed as C. Úbeda-Flores (2005) showed that ϕC = 0 at independence and ϕC = 1 at the Fréchet–Hoeffding upper
bound, defined for every u1 , . . . , ud ∈ (0, 1) by
M(u1 , . . . , ud ) = min(u1 , . . . , ud ).
In addition, he proved that the inequality ϕC ≥ −1/d always holds and that if C12, C13, C23
are the bivariate margins of a trivariate copula C, then
$$\phi_C = \frac{1}{3}(\phi_{C_{12}} + \phi_{C_{13}} + \phi_{C_{23}}). \tag{6}$$
This property, which does not extend to higher dimensions, is shared by the multivariate extension of Gini’s gamma proposed by Behboodian et al. (2007). The latter is defined as a linear
transformation of
$$\gamma_C^* = \int_0^1 \{C(t, \ldots, t) + \bar{C}(t, \ldots, t)\} \, dt + \sum_{A \subseteq D} (-1)^{|A|} \int_{(0,1)^d} W(u_A) \, dC(u),$$
where |A| denotes the cardinality of the set A ⊆ D = {1, . . . , d} and uA is the vector derived
from u = (u1, . . . , ud) ∈ (0, 1)^d by replacing its ℓth coordinate by 1 if and only if ℓ ∉ A. The
expression for γC∗ also involves the multivariate Fréchet–Hoeffding lower bound, defined for every
u1 , . . . , ud ∈ (0, 1) by
W (u1 , . . . , ud ) = max(0, u1 + · · · + ud + 1 − d).
More specifically, Behboodian et al. (2007) defined γC = (γC∗ − 2ad + 1)/(2bd − 2ad) with
$$a_d = \frac{1}{d + 1} + \frac{1}{2(d + 1)!} + \sum_{j=0}^{d} (-1)^j \binom{d}{j} \frac{1}{2(j + 1)!}, \qquad b_d = 1 - \sum_{j=1}^{d-1} \frac{1}{4j}$$
chosen in such a way that γC = 0 at independence and γC = 1 at M.
8. Sample properties in the multivariate case
Given a random sample (X11 , . . . , X1d ), . . . , (Xn1 , . . . , Xnd ) from some continuous d-variate
distribution, and (R11 , . . . , R1d ), . . . , (Rn1 , . . . , Rnd ) the associated vectors of componentwise
ranks, Úbeda-Flores (2005) defined the empirical version of ϕC by
$$\phi_n = 1 - \frac{d + 1}{d - 1} \sum_{i=1}^{n} \frac{L_i}{n^2 - 1},$$
where for each i ∈ {1, . . . , n}, Li = max(Ri1 , . . . , Rid ) − min(Ri1 , . . . , Rid ). The following
proposition, whose proof is in Appendix 2, implies that ϕn is asymptotically unbiased. As will be
shown below, however, it is generally biased in finite samples.
Proposition 3 Suppose that a d-variate copula C admits continuous partial derivatives
Ċ1(u1, . . . , ud) = ∂C(u1, . . . , ud)/∂u1, . . . , Ċd(u1, . . . , ud) = ∂C(u1, . . . , ud)/∂ud on (0, 1)^d.
Then as n → ∞,
$$n^{1/2}(\phi_n - \phi_C) \rightsquigarrow N(0, \sigma^2_{\phi_C}),$$
where $\sigma^2_{\phi_C}$ is defined in Equation (A5).
It is checked readily that Equation (A5) reduces to Equation (A1) when d = 2, and that Property
(6) continues to hold for the empirical version of ϕC . Although a closed-form expression is
available for $\sigma^2_{\phi_C}$, its computation can be tedious. Here is a simple example in dimension d = 3.
Example 7 Given θ12 , θ13 , θ23 , θ123 ∈ [−1, 1], a trivariate version of the Farlie–Gumbel–
Morgenstern copula is defined for all u, v, w ∈ (0, 1) by
C(u, v, w) = uvw{1 + θ12 (1 − u)(1 − v) + θ13 (1 − u)(1 − w)
+ θ23 (1 − v)(1 − w) + θ123 (1 − u)(1 − v)(1 − w)}.
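Simple algebra yields ϕC = (θ12 + θ13 + θ23)/15 and
$$\sigma^2_{\phi_C} = \frac{2}{15} + \frac{2}{63}(\theta_{12} + \theta_{13} + \theta_{23}) - \frac{11}{1350}(\theta_{12}^2 + \theta_{13}^2 + \theta_{23}^2) - \frac{17}{1350}(\theta_{12}\theta_{13} + \theta_{12}\theta_{23} + \theta_{13}\theta_{23}).$$
Note that θ123 is absent from the formulas, as might be expected from Property (6).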
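The empirical d-variate footrule is simple to compute from the matrix of componentwise ranks, and the averaging relation (6) can be checked on simulated ranks when d = 3. The sketch below is illustrative only: `multivariate_footrule`, `pairwise_average` and `random_rank_rows` are hypothetical names, and the ranks are generated as independent random permutations.

```python
import random
from fractions import Fraction
from itertools import combinations


def multivariate_footrule(rank_rows):
    """Empirical d-variate footrule; rank_rows[i] = (R_i1, ..., R_id)."""
    n = len(rank_rows)
    d = len(rank_rows[0])
    total = sum(max(row) - min(row) for row in rank_rows)
    return 1 - Fraction(d + 1, d - 1) * Fraction(total, n * n - 1)


def pairwise_average(rank_rows):
    """Average of the bivariate footrules over all pairs of components."""
    d = len(rank_rows[0])
    values = [multivariate_footrule([(row[a], row[b]) for row in rank_rows])
              for a, b in combinations(range(d), 2)]
    return sum(values) / len(values)


def random_rank_rows(n, d, rng):
    """Componentwise ranks drawn as d independent random permutations."""
    columns = [rng.sample(range(1, n + 1), n) for _ in range(d)]
    return list(zip(*columns))
```

The agreement for d = 3 reflects the elementary fact that, for three numbers, the sum of the pairwise absolute differences is twice the range.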
One possible use of the extended version of ϕn is as a test statistic for multivariate independence.
It is shown in Appendix 3 that under the null hypothesis,
$$E(\phi_n) = 1 - \frac{d + 1}{d - 1} \, \frac{n}{n - 1} \left\{1 - \frac{2}{n + 1} \sum_{k=0}^{n} \left(\frac{k}{n}\right)^{d}\right\}. \tag{7}$$
Observe that while it vanishes when d = 2 or 3, this expectation is only O(1/n²) in general
and, for example, equals 1/(9n²) when d = 4. A closed-form expression for the finite-sample
variance of ϕn is also given in Equation (A6), but it is cumbersome.
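Formula (7) can be evaluated exactly and compared with a brute-force average over all rank configurations when n and d are small. In the sketch below, the first column of ranks is held fixed at (1, . . . , n), which loses no generality under independence, and the remaining d − 1 columns run over all permutations. The function names are hypothetical and the enumeration is feasible only for very small n.

```python
from fractions import Fraction
from itertools import permutations, product


def expected_footrule(n, d):
    """Null expectation of the d-variate footrule, following Equation (7)."""
    power_sum = sum(Fraction(k, n) ** d for k in range(n + 1))
    inner = 1 - Fraction(2, n + 1) * power_sum
    return 1 - Fraction(d + 1, d - 1) * Fraction(n, n - 1) * inner


def brute_force_expectation(n, d):
    """Average of the d-variate footrule over all rank configurations."""
    base = tuple(range(1, n + 1))
    perms = list(permutations(base))
    total = Fraction(0)
    count = 0
    for columns in product(perms, repeat=d - 1):
        all_columns = (base,) + columns
        spread = sum(max(row) - min(row) for row in zip(*all_columns))
        total += 1 - Fraction(d + 1, d - 1) * Fraction(spread, n * n - 1)
        count += 1
    return total / count
```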
In view of Proposition 3, a more practical solution is to reject the null hypothesis at asymptotic
level α if n^{1/2}|ϕn|/σΠ is larger than the quantile of level 1 − α/2 of the N(0, 1) distribution. Here,
σΠ² stands for the large-sample variance of ϕn under H0, that is, when the underlying copula is
Π(u1, . . . , ud) = u1 × · · · × ud for all u1, . . . , ud ∈ (0, 1). As shown in Appendix 4,
$$\sigma_\Pi^2 = 2\left(\frac{d + 1}{d - 1}\right)^2 \left\{\frac{2 + 4d - d^2 + d^3}{d(d + 2)(2d + 1)(d + 1)^2} - \frac{B(d, d + 2)}{d + 1}\right\},$$
where B denotes the Beta function. In particular, σΠ² = 2/5, 2/15 and 149/2268 when d = 2, 3
and 4, respectively.
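The asymptotic test of multivariate independence can be sketched in a few lines. The code below evaluates the large-sample variance under independence in exact arithmetic, reading the formula as 2{(d + 1)/(d − 1)}² times the difference between (2 + 4d − d² + d³)/{d(d + 2)(2d + 1)(d + 1)²} and B(d, d + 2)/(d + 1), and then compares n^{1/2}|ϕn| divided by the corresponding standard deviation with a normal quantile. The function names and the default level are assumptions made for illustration.

```python
import math
from fractions import Fraction
from math import factorial
from statistics import NormalDist


def beta(a, b):
    """Beta function B(a, b) for positive integers a and b."""
    return Fraction(factorial(a - 1) * factorial(b - 1), factorial(a + b - 1))


def sigma2_independence(d):
    """Large-sample variance of the d-variate footrule under independence."""
    lead = 2 * Fraction(d + 1, d - 1) ** 2
    first = Fraction(2 + 4 * d - d ** 2 + d ** 3,
                     d * (d + 2) * (2 * d + 1) * (d + 1) ** 2)
    second = beta(d, d + 2) / (d + 1)
    return lead * (first - second)


def rejects_independence(phi_n, n, d, alpha=0.05):
    """Two-sided asymptotic test based on sqrt(n) * |phi_n| / sigma."""
    statistic = math.sqrt(n) * abs(phi_n) / math.sqrt(float(sigma2_independence(d)))
    return statistic > NormalDist().inv_cdf(1 - alpha / 2)
```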
Behboodian et al. (2007) defined the empirical version of γC by
$$\gamma_n = \frac{\gamma_n^* - c_n}{d_n - c_n}$$
for appropriate normalising constants cn, dn and a function γn∗ of the vectors R1, . . . , Rn of
normalised ranks given for each i ∈ {1, . . . , n} by Ri = (Ri1, . . . , Rid)/(n + 1). Specifically,
$$\gamma_n^* = \frac{1}{2n} \sum_{i=1}^{n} \left[ M(R_i) + W(R_i) + \sum_{A \subseteq D} (-1)^{|A|} \{M(R_{iA}) + W(R_{iA})\} \right],$$
where for each i ∈ {1, . . . , n}, RiA is the vector obtained from Ri ∈ (0, 1)^d by replacing its ℓth
coordinate by 1 if and only if ℓ ∉ A.
It may be conjectured that γn is an unbiased, asymptotically normal estimator of γC in arbitrary
dimension d ≥ 3. It will be a challenge to determine its large-sample variance, however, even
under independence. This may be the object of future work.
9. Conclusion
This paper reviewed and complemented the properties of Spearman’s footrule and Gini’s gamma.
As mentioned in the Introduction, Spearman’s footrule, ϕn , is quickly gaining popularity in applications, mainly due to its interpretation as a Manhattan distance between two sets of ranks. As
such, it is more robust than, for example, Spearman’s rho which is based on the Euclidean distance. However, it suffers from one major drawback, namely its asymmetry. Gini’s statistic, γn ,
corrects this defect while maintaining the interpretation as a distance. From this point of view, it
thus seems preferable.
Furthermore, ϕn and γn may be regarded as measures of non-linear association. However, only
γn satisfies the axiomatic definition of such a measure proposed by Scarsini (1984). Nonetheless,
both statistics can be used for testing independence. In most cases considered here, they turned out
to be less efficient than the classical Spearman’s rho. A general recommendation cannot be made,
however, as both ϕn and γn are locally optimal for specific classes of alternatives. For additional
discussion on rank-based tests of independence and efficiency considerations, see, for example,
Genest and Rémillard (2004), Genest and Verret (2005) and Genest et al. (2006).
At present, standard errors for ϕn and γn are rarely found in applications, if ever. Asymptotic
confidence intervals for both statistics can be derived readily from Proposition 1 using the simpler
form of Conti’s variance estimator given in Section 6.
Results in Sections 7 and 8 make it possible to use the multivariate version of Spearman’s
footrule proposed by Úbeda-Flores (2005) for the comparison of d ≥ 3 sets of ranks. It was
shown that ϕn is again asymptotically normal, but that it is generally biased in finite samples if
d ≥ 4. The asymptotic variance under independence was computed and can be used to construct
tests for multivariate independence. Similar results concerning the multivariate extension of Gini’s
gamma are still under development.
Acknowledgements
Funding in support of this work was provided by the Natural Sciences and Engineering Research Council of Canada, the
Fonds québécois de la recherche sur la nature et les technologies and the Institut de finance mathématique de Montréal.
References
Alvo, M., and Charbonneau, M. (1997), ‘The Use of Spearman’s Footrule in Testing for Trend When the Data are
Incomplete’, Communications in Statistics: Simulation and Computation, 26, 193–213.
Amato, V. (1954), ‘Sulla distribuzione dell’indice del Gini’, Statistica, 14, 505–519.
Behboodian, J., Dolati, A., and Úbeda-Flores, M. (2007), ‘A Multivariate Version of Gini’s Rank Association Coefficient’,
Statistical Papers, 48, 295–304.
Berman, S.M. (1996), ‘Rank Inversions in Scoring Multipart Examinations’, The Annals of Applied Probability, 6, 992–
1005.
Betrò, B. (1993), ‘On the Distribution of Gini’s Rank Association Coefficient’, Communications in Statistics: Simulation
and Computation, 22, 497–505.
Cifarelli, D.M., Conti, P.L., and Regazzini, E. (1996), ‘On the Asymptotic Distribution of a General Measure of Monotone
Dependence’, The Annals of Statistics, 24, 1386–1399.
Cifarelli, D.M., and Regazzini, E. (1977), ‘On a distribution-free test of independence based on Gini’s rank association
coefficient’. Recent Developments in Statistics (Proceedings of the European Meeting of Statisticians, Grenoble,
1976), Amsterdam, North-Holland, pp. 375–385.
Conti, P.L. (1994), ‘Asymptotic Inference on a General Measure of Monotone Dependence’, Journal of the Italian
Statistical Society, 3, 213–241.
Conti, P.L., and Nikitin, Y.Y. (1999a), ‘Asymptotic Efficiency of Independence Tests Based on Gini’s Rank Association
Coefficient, Spearman’s Footrule and Their Generalizations’, Communications in Statistics: Theory and Methods,
28, 453–465.
Conti, P.L., and Nikitin, Y.Y. (1999b), ‘Rates of Convergence for a Class of Rank Tests for Independence’, Zap. Nauchn.
Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 260, Veroyatn. i Stat. 3, 155–163, 319–320 [Translation in
Journal of Mathematical Science (New York) 109 (2002) 2141–2147].
Cucconi, O. (1964), ‘La distribuzione campionaria dell’indice di cograduazione del Gini’, Statistica, 24, 143–151.
Deheuvels, P. (1979), ‘La fonction de dépendance empirique et ses propriétés: un test non paramétrique d’indépendance’,
Académie royale de Belgique: Bulletin de la classe des sciences, 65(5), 274–292.
Diaconis, P., and Graham, R.L. (1977), ‘Spearman’s Footrule as a Measure of Disarray’, Journal of the Royal Statistical
Society, Series B, 39, 262–268.
Dinneen, L.C., and Blakesley, B.C. (1971), ‘Definition of Spearman’s Footrule’, Journal of the Royal Statistical Society,
Series C, 31, 66.
Fagin, R., Kumar, R., and Sivakumar, D. (2003), ‘Comparing Top k Lists’, SIAM Journal on Discrete Mathematics, 17,
134–160.
Fermanian, J.-D., Radulović, D., and Wegkamp, M.H. (2004), ‘Weak Convergence of Empirical Copula Processes’,
Bernoulli, 10, 847–860.
Franklin, L.A. (1988), ‘Exact Tables of Spearman’s Footrule for N = 11(1)18 With Estimate of Convergence and Errors
for the Normal Approximation’, Statistics and Probability Letters, 6, 399–406.
Genest, C., Quessy, J.-F., and Rémillard, B. (2006), ‘Local Efficiency of a Cramér–von Mises Test of Independence’,
Journal of Multivariate Analysis, 97, 274–294.
Genest, C., and Rémillard, B. (2004), ‘Tests of Independence and Randomness Based on the Empirical Copula Process’,
Test, 13, 335–369.
Genest, C., and Verret, F. (2005), ‘Locally Most Powerful Rank Tests of Independence for Copula Models’, Journal of
Nonparametric Statistics, 17, 521–539.
Gini, C. (1914), L’Ammontare e la Composizione della Ricchezza delle Nazioni, Torino: Bocca.
Hájek, J., and Šidák, Z. (1967), Theory of Rank Tests, New York: Academic Press.
Herzel, A. (1972), ‘Sulla distribuzione campionaria dell’indice di cograduazione del Gini’, Metron, 30, 137–153.
Iorio, F., Tagliaferri, R., and di Bernardo, D. (2009), ‘Identifying Network of Drug Mode of Action by Gene Expression
Profiling’, Journal of Computational Biology, 16, 241–251.
Kendall, M. (1970), Rank Correlation Methods (4th ed.), London: Griffin.
Kim, B.S., Rha, S.Y., Cho, G.B., and Chung, H.C. (2004), ‘Spearman’s Footrule as a Measure of cDNA Microarray
Reproducibility’, Genomics, 84, 441–448.
Kleinecke, D.C., Ury, H.K., and Wagner, L.F. (1962), ‘Spearman’s Footrule—An Alternative Rank Statistic’, (Rep. no
CDRP–182–114), Civil Defense Research Project, Institute of Engineering Research, University of California at
Berkeley.
Landenna, G., and Scagni, A. (1989), ‘An Approximated Distribution of the Gini’s Rank Association Coefficient’,
Communications in Statistics: Theory and Methods, 18, 2017–2026.
Lee, A.J. (1990), U-Statistics: Theory and Practice, New York: Dekker.
Lin, S., and Ding, J. (2009), ‘Integration of Ranked Lists via Cross Entropy Monte Carlo with Applications to mRNA and
microRNA Studies’, Biometrics, 65, 9–18.
Mikki, S. (in press), ‘Comparing Google Scholar and ISI Web of Science for Earth Sciences’, Scientometrics.
Nelsen, R.B. (1998), ‘Concordance and Gini’s Measure of Association’, Journal of Nonparametric Statistics, 9,
227–238.
Nelsen, R.B. (2006), An Introduction to Copulas (2nd ed.), Berlin: Springer.
Nelsen, R.B., Quesada-Molina, J.J., and Rodríguez-Lallena, J.A. (1997), ‘Bivariate Copulas with Cubic Sections’, Journal
of Nonparametric Statistics, 7, 205–220.
Nelsen, R.B., and Úbeda-Flores, M. (2004), ‘The Symmetric Footrule is Gini’s Rank Association Coefficient’,
Communications in Statistics: Theory and Methods, 33, 195–196.
Quade, D., and Salama, I.A. (2006), ‘Concordance of Complete or Right-censored Rankings Based on Spearman’s
Footrule’, Communications in Statistics: Theory and Methods, 35, 1059–1069.
Quesada-Molina, J.J., and Rodríguez-Lallena, J.A. (1995), ‘Bivariate Copulas with Quadratic Sections’, Journal of
Nonparametric Statistics, 5, 323–337.
Rizzi, A. (1971), ‘Distribuzione dell’indice di cograduazione del Gini’, Metron, 29, 63–73.
Rüschendorf, L. (1976), ‘Asymptotic Distributions of Multivariate Rank Order Statistics’, The Annals of Statistics, 4,
912–923.
Salama, I.A., and Quade, D. (1990), ‘A Note on Spearman’s Footrule’, Communications in Statistics: Simulation and
Computation, 19, 591–601.
Salama, I.A., and Quade, D. (2001), ‘The Symmetric Footrule’, Communications in Statistics: Theory and Methods, 30,
1099–1109.
Salama, I.A., and Quade, D. (2002), ‘Computing the Distribution of Spearman’s Footrule in O(n^4) Time’, Journal of
Statistical Computation and Simulation, 72, 895–898.
Salama, I.A., and Quade, D. (2004), ‘Agreement Among Censored Rankings Using Spearman’s Footrule’, Communications
in Statistics: Theory and Methods, 33, 1837–1850.
Salvemini, T. (1951), ‘Sui vari indici di cograduazione’, Statistica, 11, 133–154.
Savorgnan, F. (1915), Sulla Formazione dei Valori dell’Indice di Cograduazione, Studi Economico-Giuridici
dell’Università di Cagliari.
Scarsini, M. (1984), ‘On Measures of Concordance’, Stochastica, 8, 201–218.
Sen, P.K. (1960), ‘On Some Convergence Properties of U-Statistics’, Calcutta Statistical Association Bulletin, 10, 1–18.
Sen, P.K., and Salama, I.A. (1983), ‘The Spearman Footrule and a Markov Chain Property’, Statistics and Probability
Letters, 1, 285–289.
Sen, P.K., Salama, I.A., and Quade, D. (2003), ‘Spearman’s Footrule Under Progressive Censoring’, Journal of
Nonparametric Statistics, 15, 53–60.
Spearman, C. (1904), ‘The Proof and Measurement of Association Between Two Things’, The American Journal of
Psychology, 15, 72–101.
Spearman, C. (1906), ‘Footrule for Measuring Correlation’, The British Journal of Psychology, 2, 89–108.
Stute, W. (1984), ‘The Oscillation Behavior of Empirical Processes: The Multivariate Case’, The Annals of Probability,
12, 361–379.
Tsukahara, H. (2005), ‘Semiparametric Estimation in Copula Models’, The Canadian Journal of Statistics, 33, 357–375.
Úbeda-Flores, M. (2005), ‘Multivariate Versions of Blomqvist’s Beta and Spearman’s Footrule’, Annals of the
Institute of Statistical Mathematics, 57, 781–788.
Ury, H.K., and Kleinecke, D.C. (1979), ‘Tables of the Distribution of Spearman’s Footrule’, Applied Statistics, 28,
271–275.
Vittadini, G. (1991), ‘Una approssimazione della variabile casuale G di Gini’, Rivista Internazionale di Scienze Economiche
e Commerciali, 38, 81–94.
Woodworth, G.G. (1970), ‘Large Deviations and Bahadur Efficiency of Linear Rank Statistics’, The Annals of
Mathematical Statistics, 41, 251–283.
Appendix 1: Proof of Proposition 1
First, define a variant of the empirical copula of Deheuvels (1979) by
\[
C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left( \frac{R_i}{n+1} \le u, \; \frac{S_i}{n+1} \le v \right)
\]
for every $u, v \in (0, 1)$. Observe that for any score function $J : (0, 1)^2 \to \mathbb{R}$, one has
\[
\frac{1}{n} \sum_{i=1}^{n} J\left( \frac{R_i}{n+1}, \frac{S_i}{n+1} \right) = \int_{(0,1)^2} J(u, v) \, dC_n(u, v).
\]
If in addition $J$ itself is a copula, up to a multiplicative constant, Fubini’s theorem yields
\[
\int_{(0,1)^2} J(u, v) \, dC_n(u, v) = \int_{(0,1)^2} C_n(u, v) \, dJ(u, v).
\]
When these identities are used with $J = J_\varphi$, Equation (3) becomes
\[
\varphi_n = \frac{6n}{n-1} \int_0^1 C_n(t, t) \, dt - \frac{2n+1}{n-1},
\]
which shows that $n^{1/2}(\varphi_n - \varphi_C)$ has the same asymptotic behaviour as $Z_{C,n} = 6 \int_0^1 \mathbb{C}_n(t, t) \, dt$, where $\mathbb{C}_n = n^{1/2}(C_n - C)$
is the empirical copula process. Similarly, $n^{1/2}(\gamma_n - \gamma_C)$ behaves asymptotically as
\[
Z^*_{C,n} = 4 \int_0^1 \{ \mathbb{C}_n(t, t) + \mathbb{C}_n(t, 1 - t) \} \, dt.
\]
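The agreement between Equation (1) and the representation of the footrule through the diagonal of the empirical copula can be checked numerically. The Python sketch below is only an illustration and is not part of the original proof. The function names are hypothetical, and the check assumes that there are no ties, so that each rank vector is a permutation of 1, ..., n.

```python
from fractions import Fraction

def footrule_direct(r, s):
    """Equation (1): 1 - 3/(n^2 - 1) * sum of |R_i - S_i|."""
    n = len(r)
    return 1 - Fraction(3, n * n - 1) * sum(abs(a - b) for a, b in zip(r, s))

def diag_integral(r, s):
    """Exact integral of C_n(t, t) over (0, 1).

    The indicator of {R_i/(n+1) <= t, S_i/(n+1) <= t} equals 1 exactly when
    t >= max(R_i, S_i)/(n+1), so observation i contributes
    1 - max(R_i, S_i)/(n+1) to the integral.
    """
    n = len(r)
    return sum(1 - Fraction(max(a, b), n + 1) for a, b in zip(r, s)) / n

def footrule_via_copula(r, s):
    """Footrule written as 6n/(n-1) * integral of C_n(t, t) - (2n+1)/(n-1)."""
    n = len(r)
    return Fraction(6 * n, n - 1) * diag_integral(r, s) - Fraction(2 * n + 1, n - 1)

# Illustrative ranks (assumed data, no ties).
r = [1, 2, 3, 4, 5]
s = [2, 1, 5, 3, 4]
print(footrule_direct(r, s), footrule_via_copula(r, s))
```

Exact rational arithmetic is used so that the two expressions can be compared with `==` rather than up to rounding error.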
Now it has been known since the work of Rüschendorf (1976) that when $C$ admits continuous partial derivatives,
$\mathbb{C}_n$ converges weakly as $n \to \infty$ to a continuous centred Gaussian process $\mathbb{C}$ of the form
\[
\mathbb{C}(u, v) = \mathbb{U}_C(u, v) - \dot{C}_1(u, v) \, \mathbb{U}_C(u, 1) - \dot{C}_2(u, v) \, \mathbb{U}_C(1, v)
\]
for all $u, v \in (0, 1)$. Here, $\mathbb{U}_C$ denotes a pinned $C$-Brownian sheet, that is, a
centred Gaussian random field whose covariance function at $u, v, s, t \in (0, 1)$ is given by
\[
\mathrm{cov}\{ \mathbb{U}_C(u, v), \mathbb{U}_C(s, t) \} = C\{ \min(u, s), \min(v, t) \} - C(u, v) \, C(s, t).
\]
See, for example, Stute (1984), Fermanian, Radulović, and Wegkamp (2004),
or Tsukahara (2005) for further discussion.
Because $Z_{C,n}$ is a continuous linear functional of $\mathbb{C}_n$, it converges weakly as $n \to \infty$ to the centred Gaussian random
variable $Z_C = 6 \int_0^1 \mathbb{C}(t, t) \, dt$ with variance
\[
\sigma^2_{\varphi_C} = 36 \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s, s), \mathbb{C}(t, t) \} \, ds \, dt. \tag{A1}
\]
Similarly, the weak limit of $Z^*_{C,n}$ as $n \to \infty$ is the centred Gaussian random variable $Z^*_C = 4 \int_0^1 \{ \mathbb{C}(t, t) + \mathbb{C}(t, 1 - t) \} \, dt$
with variance
\[
\sigma^2_{\gamma_C} = 16 \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s, s) + \mathbb{C}(s, 1 - s), \; \mathbb{C}(t, t) + \mathbb{C}(t, 1 - t) \} \, ds \, dt. \tag{A2}
\]
The latter limit corresponds to the case $g(t) = t$ in Cifarelli et al. (1996, Theorem 4.1).
Appendix 2: Proof of Proposition 3
For arbitrary $\mathbf{u} = (u_1, \ldots, u_d) \in (0, 1)^d$, let
\[
C_n(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left( \frac{R_{i1}}{n+1} \le u_1, \ldots, \frac{R_{id}}{n+1} \le u_d \right)
\]
be the $d$-variate empirical copula. One then has
\[
\varphi_n = 1 - \frac{d+1}{d-1} \, \frac{n}{n-1} \int_{(0,1)^d} \{ \max(\mathbf{u}) - \min(\mathbf{u}) \} \, dC_n(\mathbf{u}), \tag{A3}
\]
and the factor $n/(n-1)$ can be ignored asymptotically.
Now if $(U_1, \ldots, U_d)$ has distribution $C_n$ and $U$ is an independent uniform random variable on $(0, 1)$, then
\[
\int_{(0,1)^d} \max(\mathbf{u}) \, dC_n(\mathbf{u}) = 1 - \Pr(U \ge U_1, \ldots, U \ge U_d) = 1 - \int_0^1 C_n(t, \ldots, t) \, dt
\]
and
\[
\int_{(0,1)^d} \min(\mathbf{u}) \, dC_n(\mathbf{u}) = \Pr(U \le U_1, \ldots, U \le U_d) = \int_0^1 \Pr(U_1 > t, \ldots, U_d > t) \, dt.
\]
The latter expression can be formulated alternatively in terms of $C_n$ by means of the inclusion–exclusion formula. To
this end, let $|A|$ denote the cardinality of any set $A \subseteq D = \{1, \ldots, d\}$, and denote by $t_A$ the vector $(t_1, \ldots, t_d)$ such that
$t_\ell = t \, \mathbf{1}(\ell \in A) + \mathbf{1}(\ell \notin A)$ for all $\ell \in \{1, \ldots, d\}$ so that, for example, $t_D = (t, \ldots, t)$. Then,
\[
\Pr(U_1 > t, \ldots, U_d > t) = \sum_{A \subseteq D} (-1)^{|A|} \Pr\Bigl( \bigcap_{i \in A} \{ U_i \le t \} \Bigr) = \sum_{A \subseteq D} (-1)^{|A|} C_n(t_A),
\]
where an intersection over the empty set is to be interpreted as the sure event.
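The inclusion–exclusion step can be verified exactly on a small data set. The following Python sketch is illustrative only: the function names and the rank matrix are assumptions, and the closed form used for the integral of $C_n(t_A)$ over $(0, 1)$ comes from the fact that the corresponding indicator switches on at the largest rank among the coordinates in $A$, divided by $n + 1$.

```python
from fractions import Fraction
from itertools import combinations

def integral_Cn_tA(ranks, A):
    """Exact integral over (0, 1) of C_n(t_A); `ranks` is a list of n d-tuples
    and A is a tuple of coordinate indices (0-based)."""
    n = len(ranks)
    if not A:
        return Fraction(1)  # t_A = (1, ..., 1) and C_n(1, ..., 1) = 1
    return sum(1 - Fraction(max(row[l] for l in A), n + 1) for row in ranks) / n

def integral_min(ranks):
    """Integral of min(u) with respect to C_n, i.e. the average of min_l R_il/(n+1)."""
    n = len(ranks)
    return sum(Fraction(min(row), n + 1) for row in ranks) / n

def inclusion_exclusion(ranks):
    """Sum over all subsets A of D of (-1)^|A| times the integral of C_n(t_A)."""
    d = len(ranks[0])
    total = Fraction(0)
    for size in range(d + 1):
        for A in combinations(range(d), size):
            total += (-1) ** size * integral_Cn_tA(ranks, A)
    return total

# Illustrative ranks for n = 4 observations in d = 3 dimensions (assumed data).
ranks = [(1, 3, 2), (2, 1, 4), (3, 4, 1), (4, 2, 3)]
print(integral_min(ranks), inclusion_exclusion(ranks))
```

Both quantities are computed in exact rational arithmetic, so the identity can be tested with `==`.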
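Similarly, one has
\[
\int_0^1 \bar{C}(t, \ldots, t) \, dt = \int_0^1 \bar{C}(1 - t, \ldots, 1 - t) \, dt = \sum_{A \subseteq D} (-1)^{|A|} \int_0^1 C(t_A) \, dt.
\]
Consequently, $n^{1/2}(\varphi_n - \varphi_C)$ has the same asymptotic behaviour as
\[
Z_{C,n} = \frac{d+1}{d-1} \left\{ \int_0^1 \mathbb{C}_n(t_D) \, dt + \sum_{A \subseteq D} (-1)^{|A|} \int_0^1 \mathbb{C}_n(t_A) \, dt \right\},
\]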
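which is a continuous linear functional of the process $\mathbb{C}_n$. From the work of Rüschendorf (1976), the limit of the latter is
of the form
\[
\mathbb{C}(\mathbf{u}) = \mathbb{U}_C(\mathbf{u}) - \sum_{j=1}^{d} \dot{C}_j(\mathbf{u}) \, \mathbb{U}_C(\mathbf{u}_j) \tag{A4}
\]
for arbitrary $\mathbf{u} = (u_1, \ldots, u_d)$, where $\mathbf{u}_j$ represents a $d$-dimensional vector with $u_j$ in its $j$th coordinate and 1 everywhere else. Here, $\mathbb{U}_C$ is a $d$-variate centred Gaussian field with covariance given by
$\mathrm{cov}\{ \mathbb{U}_C(\mathbf{u}), \mathbb{U}_C(\mathbf{v}) \} = C(\mathbf{u} \wedge \mathbf{v}) - C(\mathbf{u}) \, C(\mathbf{v})$, where for all $\mathbf{u}, \mathbf{v} \in (0, 1)^d$, $\mathbf{u} \wedge \mathbf{v}$ represents the componentwise minimum. Thus, $Z_{C,n}$ converges weakly as
$n \to \infty$ to a centred Gaussian random variable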
\[
Z_C = \frac{d+1}{d-1} \left\{ \int_0^1 \mathbb{C}(t_D) \, dt + \sum_{A \subseteq D} (-1)^{|A|} \int_0^1 \mathbb{C}(t_A) \, dt \right\}.
\]
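Hence, if $s_A$ is defined as $t_A$ mutatis mutandis, the variance of $Z_C$ is given by
\[
\sigma^2_{\varphi_C} = \left( \frac{d+1}{d-1} \right)^2 \left\{ \Gamma(D, D) + 2 \sum_{A \subseteq D} (-1)^{|A|} \, \Gamma(A, D) + \bar{\Gamma}(D, D) \right\}, \tag{A5}
\]
where for arbitrary $A, B \subseteq D$, one has
\[
\Gamma(A, B) = \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_B) \} \, ds \, dt
\]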
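and
\[
\bar{\Gamma}(D, D) = \sum_{A \subseteq D} \sum_{B \subseteq D} (-1)^{|A| + |B|} \, \Gamma(A, B) = \int_0^1 \int_0^1 \mathrm{cov}\{ \bar{\mathbb{C}}(s_D), \bar{\mathbb{C}}(t_D) \} \, ds \, dt.
\]
Here, the process $\bar{\mathbb{C}}$ is defined as in Equation (A4), with $C$ replaced by $\bar{C}$ everywhere. Thus when $C$ is radially symmetric,
that is, $C = \bar{C}$, one gets $\bar{\Gamma}(D, D) = \Gamma(D, D)$.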
Appendix 3: Moments of ϕn at independence
For arbitrary integers $k \le n$, let $k/n_A$ denote the vector $(k_1, \ldots, k_d)/(n + 1)$, where $k_\ell = k \, \mathbf{1}(\ell \in A) + (n + 1) \, \mathbf{1}(\ell \notin A)$
for all $\ell \in \{1, \ldots, d\}$. Using results stated on p. 59 of Hájek and Šidák (1967), one can see easily that $\mathrm{E}\{ C_n(k/n_A) \} = (k/n)^{|A|}$
at independence. Thus if $t_A$ is defined as in Appendix 2, one gets
\[
\mathrm{E}\left\{ \int_0^1 C_n(t_A) \, dt \right\} = \frac{1}{n+1} \sum_{k=0}^{n} \mathrm{E}\{ C_n(k/n_A) \} = \frac{1}{n+1} \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{|A|}.
\]
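It then follows from the identities proven in Appendix 2 that
\[
\mathrm{E}\left\{ \int_{(0,1)^d} \max(\mathbf{u}) \, dC_n(\mathbf{u}) \right\} = 1 - \mathrm{E}\left\{ \int_0^1 C_n(t_D) \, dt \right\} = 1 - \frac{1}{n+1} \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d}
\]
and that
\[
\mathrm{E}\left\{ \int_{(0,1)^d} \min(\mathbf{u}) \, dC_n(\mathbf{u}) \right\}
= \sum_{A \subseteq D} (-1)^{|A|} \, \mathrm{E}\left\{ \int_0^1 C_n(t_A) \, dt \right\}
= \frac{1}{n+1} \sum_{A \subseteq D} (-1)^{|A|} \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{|A|}
= \frac{1}{n+1} \sum_{k=0}^{n} \sum_{\ell=0}^{d} (-1)^{\ell} \binom{d}{\ell} \left( \frac{k}{n} \right)^{\ell}.
\]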
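The latter sum can be simplified further using the binomial theorem, viz.
\[
\sum_{k=0}^{n} \sum_{\ell=0}^{d} (-1)^{\ell} \binom{d}{\ell} \left( \frac{k}{n} \right)^{\ell} = \sum_{k=0}^{n} \left( 1 - \frac{k}{n} \right)^{d} = \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d}.
\]
Taking expectations on both sides of Equation (A3) and making the appropriate substitutions, one gets Formula (7), as
stated.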
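The expectation obtained by substituting these identities into Equation (A3) can be compared with a brute-force average over all rank configurations. The Python sketch below is an illustration under stated assumptions: the function names are hypothetical, `mean_formula` is the expression implied by the identities above (its agreement with the displayed form of Formula (7) is assumed, not checked against the main text), and independence is modelled by letting the last $d - 1$ rank vectors run over all permutations while the first is held fixed, which does not affect the distribution of the statistic.

```python
from fractions import Fraction
from itertools import permutations, product

def footrule_d(rows):
    """Multivariate footrule from (A3); `rows` is a list of n d-tuples of ranks."""
    n, d = len(rows), len(rows[0])
    spread = sum(Fraction(max(r) - min(r), n + 1) for r in rows) / n
    return 1 - Fraction(d + 1, d - 1) * Fraction(n, n - 1) * spread

def mean_formula(n, d):
    """E(phi_n) at independence: E of the max integral is 1 - S/(n+1) and
    E of the min integral is S/(n+1), where S = sum_{k=0}^n (k/n)^d."""
    s = sum(Fraction(k, n) ** d for k in range(n + 1))
    return 1 - Fraction(d + 1, d - 1) * Fraction(n, n - 1) * (1 - Fraction(2, n + 1) * s)

def mean_bruteforce(n, d):
    """Average of phi_n over all (n!)^(d-1) equally likely configurations."""
    base = tuple(range(1, n + 1))
    values = [
        footrule_d(list(zip(base, *perms)))
        for perms in product(permutations(base), repeat=d - 1)
    ]
    return sum(values) / len(values)

print(mean_formula(3, 4), mean_bruteforce(3, 4))
```

The enumeration grows as $(n!)^{d-1}$, so the brute-force check is only practical for very small $n$ and $d$.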
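Turning to the computation of $\mathrm{var}(\varphi_n)$, one can immediately deduce from first principles and the above expression for
$\mathrm{E}(\varphi_n)$ that
\[
\mathrm{var}(\varphi_n) = 2 \left( \frac{d+1}{d-1} \right)^2 \left( \frac{n}{n^2 - 1} \right)^2 \left[ \mathrm{E}\{ M(n, d) \} - 2 \left\{ \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d} \right\}^2 \right],
\]
where
\[
M(n, d) = (n + 1)^2 \int_0^1 \int_0^1 \left\{ C_n(s_D) \, C_n(t_D) + C_n(s_D) \sum_{A \subseteq D} (-1)^{|A|} C_n(t_A) \right\} ds \, dt.
\]
Now at independence, one has
\[
\mathrm{E}\{ C_n(j/n_D) \, C_n(k/n_A) \} = \frac{1}{n^2} \left( \frac{j}{n} \right)^{d - |A|} \left[ n \left\{ \frac{\min(j, k)}{n} \right\}^{|A|} + n(n - 1) \left\{ \frac{jk - \min(j, k)}{n^2 - n} \right\}^{|A|} \right]
\]
for arbitrary $j, k \in \{0, \ldots, n\}$ and $A \subseteq D$. Thus if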
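\[
\Lambda_n(\ell, d) = \frac{1}{n} \sum_{j=0}^{n} \sum_{k=0}^{n} \left( \frac{j}{n} \right)^{d - \ell} \left[ \left\{ \frac{\min(j, k)}{n} \right\}^{\ell} + (n - 1) \left\{ \frac{jk - \min(j, k)}{n^2 - n} \right\}^{\ell} \right]
\]
for all $\ell \in \{0, \ldots, d\}$, one gets
\[
\mathrm{E}\left\{ \int_0^1 \int_0^1 C_n(s_D) \, C_n(t_A) \, ds \, dt \right\} = \frac{\Lambda_n(|A|, d)}{(n + 1)^2}.
\]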
Upon substitution and an application of the binomial identity, one finds
\[
\mathrm{E}\{ M(n, d) \} = \Lambda_n(d, d) + \sum_{\ell=0}^{d} (-1)^{\ell} \binom{d}{\ell} \Lambda_n(\ell, d) = \Lambda_n(d, d) + \Psi_n(d),
\]
where
\[
\Psi_n(d) = \frac{1}{n} \sum_{j=0}^{n} \sum_{k=0}^{n} \left[ \left\{ \frac{j}{n} - \frac{\min(j, k)}{n} \right\}^{d} + (n - 1) \left\{ \frac{j}{n} - \frac{jk - \min(j, k)}{n^2 - n} \right\}^{d} \right].
\]
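Collecting terms, one finds
\[
\mathrm{var}(\varphi_n) = 2 \left( \frac{d+1}{d-1} \right)^2 \left( \frac{n}{n^2 - 1} \right)^2 \left[ \Lambda_n(d, d) + \Psi_n(d) - 2 \left\{ \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d} \right\}^2 \right]. \tag{A6}
\]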
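Formula (A6) lends itself to a direct numerical check for small $n$ and $d$. The Python sketch below is illustrative only: the function names are hypothetical, the two double sums are coded from the expressions for the cases $i = m$ and $i \neq m$ derived above, and the brute-force variance assumes that all rank configurations are equally likely at independence (with the first rank vector held fixed, which leaves the distribution of the statistic unchanged).

```python
from fractions import Fraction
from itertools import permutations, product

def power_sum(n, d):
    """Sum of (k/n)^d for k = 0, ..., n."""
    return sum(Fraction(k, n) ** d for k in range(n + 1))

def first_double_sum(n, d):
    """Double sum with exponent l = d: (n+1)^2 times E of the squared
    integral of C_n(t_D)."""
    total = Fraction(0)
    for j in range(n + 1):
        for k in range(n + 1):
            m = min(j, k)
            total += Fraction(m, n) ** d + (n - 1) * Fraction(j * k - m, n * n - n) ** d
    return total / n

def second_double_sum(n, d):
    """Double sum obtained after applying the binomial identity."""
    total = Fraction(0)
    for j in range(n + 1):
        for k in range(n + 1):
            m = min(j, k)
            a = Fraction(j, n)
            total += (a - Fraction(m, n)) ** d
            total += (n - 1) * (a - Fraction(j * k - m, n * n - n)) ** d
    return total / n

def variance_formula(n, d):
    """Right-hand side of (A6)."""
    scale = 2 * Fraction(d + 1, d - 1) ** 2 * Fraction(n, n * n - 1) ** 2
    bracket = first_double_sum(n, d) + second_double_sum(n, d) - 2 * power_sum(n, d) ** 2
    return scale * bracket

def variance_bruteforce(n, d):
    """Exact variance of phi_n by enumerating all (n!)^(d-1) configurations."""
    base = tuple(range(1, n + 1))
    c = Fraction(d + 1, d - 1) * Fraction(n, n - 1)
    values = []
    for perms in product(permutations(base), repeat=d - 1):
        spread = sum(Fraction(max(r) - min(r), n + 1) for r in zip(base, *perms)) / n
        values.append(1 - c * spread)
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)

print(variance_formula(3, 2), variance_bruteforce(3, 2))
```

Because the enumeration grows as $(n!)^{d-1}$, the brute-force comparison is meant only as a sanity check on small cases.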
Appendix 4: Computation of $\sigma^2_{\varphi_\Pi}$
Because the independence copula is radially symmetric, Formula (A5) reduces to
\[
\sigma^2_{\varphi_C} = 2 \left( \frac{d+1}{d-1} \right)^2 \left\{ \Gamma(D, D) + \sum_{A \subseteq D} (-1)^{|A|} \, \Gamma(A, D) \right\}.
\]
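To compute $\sigma^2_{\varphi_C}$, one must evaluate $2^d$ covariances of the form $\mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_D) \}$ for some $A \subseteq D$. In view of
Equation (A4), any such covariance may be expressed as
\[
\mathrm{cov}\{ \mathbb{U}_C(s_A), \mathbb{U}_C(t_D) \}
+ \sum_{j=1}^{d} \sum_{k=1}^{d} \dot{C}_j(s_A) \, \dot{C}_k(t_D) \, \mathrm{cov}\{ \mathbb{U}_C(s_{A \cap \{j\}}), \mathbb{U}_C(t_{D \cap \{k\}}) \}
- \sum_{j=1}^{d} \dot{C}_j(s_A) \, \mathrm{cov}\{ \mathbb{U}_C(s_{A \cap \{j\}}), \mathbb{U}_C(t_D) \}
- \sum_{k=1}^{d} \dot{C}_k(t_D) \, \mathrm{cov}\{ \mathbb{U}_C(s_A), \mathbb{U}_C(t_{D \cap \{k\}}) \}.
\]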
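Simplifications occur when $C = \Pi$ is the independence copula because $\dot{C}_j(s_A) = s^{|A \setminus \{j\}|}$, $\dot{C}_k(t_B) = t^{|B \setminus \{k\}|}$ and for arbitrary $A, B \subseteq D$,
\[
\mathrm{cov}\{ \mathbb{U}_C(s_A), \mathbb{U}_C(t_B) \} =
\begin{cases}
s^{|A|} \left( t^{|B \setminus A|} - t^{|B|} \right), & \text{if } s < t, \\
t^{|B|} \left( s^{|A \setminus B|} - s^{|A|} \right), & \text{if } s > t.
\end{cases}
\]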
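Thus if $s < t$ and $A \subseteq D$, one finds
\[
\mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_D) \} = s^{|A|} \left( t^{|D \setminus A|} - t^{|D|} \right)
+ \sum_{j = k \in A} s^{|A|} t^{|D| - 1} (1 - t)
- \sum_{j \in A} s^{|A|} t^{|D| - 1} (1 - t)
- \sum_{k \in A} s^{|A|} t^{|D| - 1} (1 - t),
\]
which reduces to
\[
s^{|A|} \left( t^{|D \setminus A|} - t^{|D|} \right) - |A| \, s^{|A|} t^{|D| - 1} (1 - t) = s^{|A|} \left\{ t^{|D \setminus A|} \left( 1 - t^{|A|} \right) - |A| \, t^{|D| - 1} (1 - t) \right\}.
\]
Similarly if $s > t$, one gets
\[
t^{|D|} \left( s^{|A \setminus D|} - s^{|A|} \right) - |A| \, t^{|D|} s^{|A| - 1} (1 - s) = t^{|D|} \left\{ \left( 1 - s^{|A|} \right) - |A| \, s^{|A| - 1} (1 - s) \right\}.
\]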
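Consequently, for any $A \subseteq D$ with $|A| = k$,
\[
\Gamma(A, D) = \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_D) \} \, ds \, dt = \Theta_1(k, d) + \Theta_2(k, d),
\]
where for arbitrary $k \in \{1, \ldots, d\}$,
\[
\Theta_1(k, d) = \int_0^1 \int_0^t s^{k} \left\{ t^{d - k} \left( 1 - t^{k} \right) - k \, t^{d - 1} (1 - t) \right\} ds \, dt
\]
and
\[
\Theta_2(k, d) = \int_0^1 \int_t^1 t^{d} \left\{ \left( 1 - s^{k} \right) - k \, s^{k - 1} (1 - s) \right\} ds \, dt.
\]
Consequently,
\[
\sigma^2_{\varphi_\Pi} = 2 \left( \frac{d+1}{d-1} \right)^2 \sum_{\ell=1}^{2} \left\{ \Theta_\ell(d, d) + \sum_{k=0}^{d} (-1)^{k} \binom{d}{k} \Theta_\ell(k, d) \right\}.
\]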
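Now observe that in view of the binomial identity,
\[
\sum_{k=0}^{d} (-1)^{k} \binom{d}{k} \Theta_1(k, d)
= \int_0^1 \int_0^t \sum_{k=0}^{d} (-1)^{k} \binom{d}{k} s^{k} \left\{ t^{d - k} \left( 1 - t^{k} \right) - k \, t^{d - 1} (1 - t) \right\} ds \, dt
\]
\[
= \int_0^1 \int_0^t \left\{ (t - s)^{d} - (1 - s)^{d} t^{d} + d \, s (1 - s)^{d - 1} t^{d - 1} - d \, s (1 - s)^{d - 1} t^{d} \right\} ds \, dt.
\]
Similarly,
\[
\sum_{k=0}^{d} (-1)^{k} \binom{d}{k} \Theta_2(k, d) = \int_0^1 \int_t^1 -t^{d} \left\{ (1 - s)^{d} - d (1 - s)^{d - 1} + d \, s (1 - s)^{d - 1} \right\} ds \, dt.
\]
Upon collecting the terms and integrating, one gets the desired conclusion.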
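The final integration can be carried out term by term, since every integrand is a polynomial in $s$ and $t$. The Python sketch below is a minimal illustration, not the authors' computation: the function names are hypothetical, and the two helper functions return the double integrals over the regions $s < t$ and $s > t$ for a set $A$ of cardinality $k$, as obtained by integrating the covariance expressions above monomial by monomial.

```python
from fractions import Fraction
from math import comb

def integral_s_below_t(k, d):
    """Integral over 0 < s < t < 1 of s^k {t^(d-k) (1 - t^k) - k t^(d-1) (1 - t)}.

    The inner integral of s^k over (0, t) equals t^(k+1)/(k+1); the remaining
    monomials in t are then integrated over (0, 1).
    """
    return Fraction(1, k + 1) * (
        Fraction(1, d + 2) - Fraction(1, d + k + 2)
        - Fraction(k, d + k + 1) + Fraction(k, d + k + 2)
    )

def integral_s_above_t(k, d):
    """Integral over 0 < t < s < 1 of t^d {(1 - s^k) - k s^(k-1) (1 - s)}.

    The inner integral over (t, 1) equals
    -t + t^k + (k-1)/(k+1) - (k-1)/(k+1) t^(k+1).
    """
    return (
        -Fraction(1, d + 2) + Fraction(1, d + k + 1)
        + Fraction(k - 1, (k + 1) * (d + 1))
        - Fraction(k - 1, (k + 1) * (d + k + 2))
    )

def sigma2_independence(d):
    """Asymptotic variance of the d-variate footrule at independence."""
    total = Fraction(0)
    for part in (integral_s_below_t, integral_s_above_t):
        total += part(d, d)
        total += sum((-1) ** k * comb(d, k) * part(k, d) for k in range(d + 1))
    return 2 * Fraction(d + 1, d - 1) ** 2 * total

print(sigma2_independence(2), sigma2_independence(3))
```

For $d = 2$ the sketch returns 2/5, which agrees with the leading term of the exact bivariate variance $(2n^2 + 7)/\{5(n + 1)(n - 1)^2\}$ multiplied by $n$.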