
Spearman's footrule and Gini's gamma: a review with complements

2010, Journal of Nonparametric Statistics

The scattered literature on Spearman's footrule and Gini's gamma is surveyed. The following topics are covered: finite-sample moments and asymptotic distribution under independence; large-sample distribution under arbitrary alternatives; asymptotic relative efficiency for testing independence; consistent asymptotic variance estimation through the jackknife; multivariate generalisations and uses. Complementary results and an extensive bibliography are provided, along with several original illustrations.

Journal of Nonparametric Statistics, Vol. 22, No. 8, November 2010, 937–954

Downloaded By: [Canadian Research Knowledge Network] At: 15:25 7 January 2011

Christian Genest (a)*, Johanna Nešlehová (b) and Noomen Ben Ghorbal (a)

(a) Département de mathématiques et de statistique, Université Laval, 1045, avenue de la Médecine, Québec (Québec), Canada G1V 0A6; (b) Department of Mathematics and Statistics, McGill University, 805, rue Sherbrooke Ouest, Montréal (Québec), Canada H3A 2K6

(Received 19 February 2009; final version received 16 November 2009)

Keywords: asymptotic relative efficiency; concordance; copula; Gini’s gamma; jackknife; Spearman’s footrule; Spearman’s rho; ranks; test of independence

1. Introduction

Spearman’s footrule is a nonparametric measure of association. It was introduced by the British psychologist Charles Spearman as an alternative to the correlation in the pairs $(R_1, S_1), \ldots, (R_n, S_n)$ of ranks associated with a random sample $(X_1, Y_1), \ldots, (X_n, Y_n)$ from some continuous bivariate distribution $H(x, y) = \Pr(X \le x, Y \le y)$. Spearman’s footrule usually refers to the statistic

$$\varphi_n = 1 - \frac{3}{n^2 - 1} \sum_{i=1}^{n} |R_i - S_i|, \tag{1}$$

although other normalisations have been used, even by Spearman himself (cf. Spearman 1904, 1906; Dinneen and Blakesley 1971).
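Equation (1) can be illustrated with a minimal Python sketch. The function names `ranks` and `spearman_footrule` are illustrative choices rather than notation from the paper, and the sketch assumes that the observations contain no ties, as is the case almost surely for a continuous distribution.

```python
def ranks(values):
    """Return the 1-based ranks of a list of distinct values (ties are not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_footrule(x, y):
    """Spearman's footrule with the normalisation of Equation (1)."""
    n = len(x)
    r, s = ranks(x), ranks(y)
    # Manhattan (city-block) distance between the two rank vectors.
    distance = sum(abs(ri - si) for ri, si in zip(r, s))
    return 1 - 3 * distance / (n ** 2 - 1)


# Identical rankings give the maximal value 1.
print(spearman_footrule([1, 2, 3, 4], [10, 20, 30, 40]))
```

Only the ranks enter the computation, so any strictly increasing transformation of either variable leaves the result unchanged.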
This coefficient is closely related to the indice de cograduazione semplice introduced by the Italian statistician, demographer and sociologist Corrado Gini (1914), viz.

$$\gamma_n = \frac{1}{\lfloor n^2/2 \rfloor} \sum_{i=1}^{n} \{ |(n + 1 - R_i) - S_i| - |R_i - S_i| \}, \tag{2}$$

where $\lfloor m \rfloor$ denotes the integer part of arbitrary $m > 0$.

*Corresponding author. Email: [email protected]. ISSN 1048-5252 print/ISSN 1029-0311 online. © American Statistical Association and Taylor & Francis 2010. DOI: 10.1080/10485250903499667. http://www.informaworld.com

Spearman’s footrule and Gini’s gamma remained largely neglected until fairly recently. In the fourth edition of his book on rank correlation methods, Kendall (1970) discussed the footrule as a nonparametric measure of association but dismissed it because of a lack of statistical properties. Prior to 1980, the main sources of information on Gini’s gamma were in Italian (Savorgnan 1915; Salvemini 1951; Amato 1954; Cucconi 1964).

Interest in Spearman’s footrule was apparently revived by Diaconis and Graham (1977), who highlighted its natural interpretation in terms of the Manhattan (or city-block) distance between two sets of ranks. They derived its asymptotic distribution under independence and noted that in small samples, it is less variable than Spearman’s rho, which is based on the Euclidean metric. Extensions have since been proposed to handle data that are incomplete (Alvo and Charbonneau 1997), multivariate (Úbeda-Flores 2005), and censored (Sen, Salama, and Quade 2003; Salama and Quade 2004; Quade and Salama 2006).

Because of its simplicity, robustness and natural interpretation, the footrule has since been rediscovered and used in various contexts.
For instance, motivated by litigation about a scoring procedure for civil service examinations, Berman (1996) proposed the statistic $M_n = \sum_{i=1}^{n} (R_i - S_i)\,\mathbf{1}(R_i > S_i)$ as a measure of ‘unfairness’ when the results of an exam leading to ranks $R_1, \ldots, R_n$ are replaced by scores leading to ranks $S_1, \ldots, S_n$. However, Berman did not notice that $M_n = (n^2 - 1)(1 - \varphi_n)/6$. In the field of genomics, a simple function of $\varphi_n$ was advocated a few years ago by Kim, Rha, Cho, and Chung (2004) to measure reproducibility among replicates in microarray experiments, which are likely to produce outliers due to a low signal-to-noise ratio. In the field of information retrieval, Spearman’s footrule distance has also been used to measure the discrepancy between rank lists (Fagin, Kumar, and Sivakumar 2003; Mikki in press). The same idea was used very recently in gene expression profiling and in bioinformatics by Iorio, Tagliaferri, and di Bernardo (2009) and Lin and Ding (2009), respectively.

In comparison with Spearman’s footrule, Gini’s gamma seems to be used rather rarely in practice. This may well change in the years to come, however, as a strong connection between the two coefficients was recently uncovered by Nelsen and Úbeda-Flores (2004). They observed that $\gamma_n$ is in fact an extension of $\varphi_n$ which Salama and Quade (2001) introduced to remedy its asymmetry, already noted by Spearman (1904).

In support of these recent developments, this paper aims to consolidate the knowledge base on Spearman’s footrule and Gini’s gamma. The scattered literature on the subject is collected and organised in a structured way using the theory of copulas as a unifying framework. This leads to several new results, proofs and illustrations.

Section 2 reviews basic properties and relations between Spearman’s footrule and Gini’s gamma. Sections 3 and 4 describe their distributions under independence and under general alternatives, respectively.
Section 5 collects results on tests of independence based on the two coefficients. A jackknife procedure is detailed in Section 6 for the estimation of the statistics’ asymptotic variance under any dependence structure. Finally, multivariate extensions of $\varphi_n$ and $\gamma_n$ are considered in Section 7, and their sampling properties are studied in Section 8. Practical recommendations are summarised in the Conclusion. Various Appendices contain the technical arguments, including new, simpler proofs of known results based on the asymptotic behaviour of the empirical copula process.

2. Definitions and basic properties

It is clear that the statistic $\varphi_n$ defined in Equation (1) equals 1 when $R_i = S_i$ for all $i \in \{1, \ldots, n\}$. It takes its smallest value when the two sets of ranks are antithetic, that is, when $R_i = n + 1 - S_i$ for every $i \in \{1, \ldots, n\}$. A simple calculation shows that

$$\sum_{i=1}^{n} |n + 1 - 2i| = \begin{cases} n^2/2, & \text{when } n \text{ is even}, \\ (n^2 - 1)/2, & \text{when } n \text{ is odd}. \end{cases}$$

Therefore, $\varphi_n$ varies in $[-1/2, 1]$ when $n$ is odd but it can go as low as $-(n^2 + 2)/\{2(n^2 - 1)\} \in [-1, -1/2)$ for $n$ even. In order to span the entire interval $[-1, 1]$, one can replace $3/(n^2 - 1)$ by $2/\lfloor n^2/2 \rfloor$ in Equation (1).

Even if this is done, the statistic $\varphi_n$ may still be regarded as unsatisfactory in some applications. This is because it generally assigns different degrees of dependence (in absolute value) to the samples $(X_1, Y_1), \ldots, (X_n, Y_n)$ and $(-X_1, Y_1), \ldots, (-X_n, Y_n)$. For example, if $(X_1, Y_1) = (10, 20)$, $(X_2, Y_2) = (20, 30)$ and $(X_3, Y_3) = (30, 10)$, then the rescaled statistic gives $\varphi_n = -1$ while $\varphi_n = 0$ for the sample $(-10, 20)$, $(-20, 30)$ and $(-30, 10)$. As explained by Salama and Quade (2001), this problem can be solved by making $\varphi_n$ symmetric with respect to the rank transformation $R \mapsto n + 1 - R$.
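The three-point example above can be checked with a short Python sketch. The function name `footrule` and its `rescaled` flag are illustrative and not taken from the paper; the flag switches between the normalisation $3/(n^2 - 1)$ of Equation (1) and the alternative $2/\lfloor n^2/2 \rfloor$. With the rescaled normalisation the sketch returns the values $-1$ and $0$ quoted in the example, whereas the normalisation of Equation (1) gives $-1/2$ and $1/4$ for the same two samples.

```python
def ranks(values):
    """Return the 1-based ranks of a list of distinct values (ties are not handled)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def footrule(x, y, rescaled=False):
    """Footrule with factor 3/(n^2 - 1), or 2/floor(n^2/2) when rescaled is True."""
    n = len(x)
    distance = sum(abs(ri - si) for ri, si in zip(ranks(x), ranks(y)))
    scale = 2 / (n * n // 2) if rescaled else 3 / (n * n - 1)
    return 1 - scale * distance


x = [10, 20, 30]
y = [20, 30, 10]
x_reflected = [-value for value in x]

print(footrule(x, y, rescaled=True))            # sample (X_i, Y_i)
print(footrule(x_reflected, y, rescaled=True))  # sample (-X_i, Y_i)
```

Reflecting $X$ reverses its ranks, so a symmetric coefficient would only change sign; the two printed values differ in absolute value, which is the asymmetry discussed above.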
Nelsen and Úbeda-Flores (2004) pointed out that the resulting coefficient is the right-hand side of Equation (2), that is, Gini’s $\gamma_n$.

Many properties of $\varphi_n$ and $\gamma_n$ stem from their representation as linear rank statistics. From the identity $|u - v| = u + v - 2\min(u, v)$ valid for all $u, v \in \mathbb{R}$, one gets

$$\varphi_n = \frac{1}{n - 1} \sum_{i=1}^{n} J_\varphi\left( \frac{R_i}{n + 1}, \frac{S_i}{n + 1} \right) - \frac{2n + 1}{n - 1}, \tag{3}$$

where $J_\varphi(u, v) = 6\min(u, v)$. Similarly, one can use the identity $|(n + 1) - u - v| = 2\max\{0, u + v - (n + 1)\} - u - v + (n + 1)$ to see that

$$\gamma_n = \frac{n + 1}{2\lfloor n^2/2 \rfloor} \sum_{i=1}^{n} J_\gamma\left( \frac{R_i}{n + 1}, \frac{S_i}{n + 1} \right) - \frac{n(n + 1)}{\lfloor n^2/2 \rfloor}, \tag{4}$$

where $J_\gamma(u, v) = 4\min(u, v) + 4\max(0, u + v - 1)$.

3. Distribution under independence

The behaviour of $\varphi_n$, $\gamma_n$ and variants thereof has been extensively studied under the assumption that the variables $X$ and $Y$ are independent. From results which Spearman (1904) attributed to Felix Hausdorff (see, for example, Kleinecke, Ury, and Wagner 1962, for a derivation), one gets

$$E(\varphi_n) = 0 \quad \text{and} \quad \mathrm{var}(\varphi_n) = \frac{2n^2 + 7}{5(n + 1)(n - 1)^2}.$$

Tables of the null distribution of $\varphi_n$ were produced by Ury and Kleinecke (1979). They were later expanded by Franklin (1988) and Salama and Quade (1990); see also Salama and Quade (2002). Diaconis and Graham (1977) were apparently the first to show that under independence,

$$n^{1/2} \varphi_n \rightsquigarrow N(0, 2/5),$$

where $\rightsquigarrow$ denotes convergence in distribution as $n \to \infty$. See Sen and Salama (1983) for an alternative proof.

For Gini’s gamma, Amato (1954) and Cucconi (1964) obtained independently

$$E(\gamma_n) = 0 \quad \text{and} \quad \mathrm{var}(\gamma_n) = \begin{cases} \dfrac{2(n^2 + 2)}{3(n - 1)n^2}, & \text{when } n \text{ is even}, \\[2ex] \dfrac{2(n^2 + 3)}{3(n - 1)(n^2 - 1)}, & \text{when } n \text{ is odd}. \end{cases}$$

A third derivation was provided by Salama and Quade (2001) but note the typo in their final formula for $n$ even. The exact null distribution of Gini’s gamma was given by Savorgnan (1915) for $n \le 5$; these tables were later extended by Salvemini (1951) and Cifarelli and Regazzini (1977).
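For small $n$, the null moments quoted above can be confirmed by brute force, since under independence every permutation of the ranks is equally likely. The following Python sketch enumerates all $n!$ permutations and uses exact rational arithmetic; the function names are illustrative, and the approach is only practical for small sample sizes.

```python
from fractions import Fraction
from itertools import permutations


def footrule(s):
    """Equation (1) for ranks R_i = i and S_i = s[i - 1]."""
    n = len(s)
    distance = sum(abs(r - si) for r, si in enumerate(s, start=1))
    return 1 - Fraction(3 * distance, n * n - 1)


def gini_gamma(s):
    """Equation (2) for ranks R_i = i and S_i = s[i - 1]."""
    n = len(s)
    total = sum(abs(n + 1 - r - si) - abs(r - si) for r, si in enumerate(s, start=1))
    return Fraction(total, n * n // 2)


def null_moments(statistic, n):
    """Exact mean and variance over all n! equally likely rank permutations."""
    values = [statistic(p) for p in permutations(range(1, n + 1))]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance


def var_footrule(n):
    return Fraction(2 * n * n + 7, 5 * (n + 1) * (n - 1) ** 2)


def var_gamma(n):
    if n % 2 == 0:
        return Fraction(2 * (n * n + 2), 3 * (n - 1) * n * n)
    return Fraction(2 * (n * n + 3), 3 * (n - 1) * (n * n - 1))


for n in range(2, 7):
    assert null_moments(footrule, n) == (0, var_footrule(n))
    assert null_moments(gini_gamma, n) == (0, var_gamma(n))
```

Multiplying either variance formula by $n$ and letting $n \to \infty$ recovers the limiting variances $2/5$ and $2/3$ of $n^{1/2}\varphi_n$ and $n^{1/2}\gamma_n$.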
In addition, Rizzi (1971) used simulations to approximate the null distribution of $\gamma_n$ up to $n = 30$. Betrò (1993) later showed how the exact distribution can be derived numerically. Other approximations were designed by Landenna and Scagni (1989), and by Vittadini (1991).

It was suspected for a long time (Salvemini 1951; Amato 1954; Cucconi 1964; Herzel 1972) that under independence, $n^{1/2} \gamma_n \rightsquigarrow N(0, 2/3)$ as $n \to \infty$. This was eventually proved by Cifarelli and Regazzini (1977).

4. Distribution in the case of dependence

The asymptotic distribution of Gini’s gamma was given by Cifarelli, Conti, and Regazzini (1996) in the general case where the pair $(X, Y)$ has a bivariate distribution $H(x, y) = \Pr(X \le x, Y \le y)$ with continuous margins $F(x) = \Pr(X \le x)$ and $G(y) = \Pr(Y \le y)$. The parallel result for Spearman’s footrule is reported below, seemingly for the first time.

As it turns out, the large-sample distributions of $\varphi_n$ and $\gamma_n$ depend on $H$ only through the function $C$ implicitly defined by

$$H(x, y) = \Pr(X \le x, Y \le y) = C\{F(x), G(y)\}$$

for all $x, y \in \mathbb{R}$. The so-called copula $C$, which is unique, is a bivariate distribution function with uniform margins on the interval $(0, 1)$ (Nelsen 2006, Chap. 2).

The following proposition, whose proof is in Appendix 1, shows that $\varphi_n$ and $\gamma_n$ are asymptotically unbiased estimators of

$$\varphi_C = 1 - 3 \int_{(0,1)^2} |u - v| \, dC(u, v) = -2 + 6 \int_0^1 C(t, t) \, dt$$

and

$$\gamma_C = 2 \int_{(0,1)^2} \{ |u + v - 1| - |u - v| \} \, dC(u, v) = -2 + 4 \int_0^1 \{ C(t, t) + C(t, 1 - t) \} \, dt,$$

respectively. From these definitions, reported by Nelsen (1998), it is clear that $\varphi_C$ and $\gamma_C$ depend only on the copula’s main and secondary diagonal sections, defined for all $t \in [0, 1]$ by $C(t, t)$ and $C(t, 1 - t)$, respectively.

Proposition 1. Suppose that a bivariate copula $C$ admits continuous partial derivatives $\dot{C}_1(u, v) = \partial C(u, v)/\partial u$ and $\dot{C}_2(u, v) = \partial C(u, v)/\partial v$ on $(0, 1)^2$.
Then as $n \to \infty$,

$$n^{1/2}(\varphi_n - \varphi_C) \rightsquigarrow N(0, \sigma^2_{\varphi_C}), \qquad n^{1/2}(\gamma_n - \gamma_C) \rightsquigarrow N(0, \sigma^2_{\gamma_C}),$$

with $\sigma^2_{\varphi_C}$ and $\sigma^2_{\gamma_C}$ defined in Equations (A1) and (A2), respectively.

When $C(u, v) = \Pi(u, v) \equiv uv$ is the independence copula, one gets $\sigma^2_{\varphi_C} = 2/5$ and $\sigma^2_{\gamma_C} = 2/3$. Additional examples of explicit calculations are given below.

Example 1. Let $C(u, v) = uv + \theta uv(1 - u)(1 - v)$ be the Farlie–Gumbel–Morgenstern copula with parameter $\theta \in [-1, 1]$. Routine calculations yield $\varphi_C = \theta/5$, $\gamma_C = 4\theta/15$,

$$\sigma^2_{\varphi_C} = \frac{2}{5} + \frac{3}{70}\theta - \frac{11}{150}\theta^2 \quad \text{and} \quad \sigma^2_{\gamma_C} = \frac{2}{3} - \frac{88}{675}\theta^2.$$

Example 2. Given $\theta \in [-1, 1]$, let $A(v) = \theta \sin(2\pi v)/(2\pi)$ and $C(u, v) = uv + u(1 - u)A(v)$ for all $u, v \in (0, 1)$. These are examples of copulas with quadratic sections, as defined by Quesada-Molina and Rodríguez-Lallena (1995). Interestingly, $\varphi_C = \gamma_C = 0$ for all $\theta \in [-1, 1]$. This is in fact the case for any measure of concordance à la Scarsini (1984), because $C(u, v) + C(u, 1 - v) = u$ for all $u, v \in (0, 1)$, so that all members of this family are ‘indifferent,’ in the sense given to that term by Gini (Conti 1994). With the help of Maple, one finds

$$\sigma^2_{\varphi_C} = \frac{1080\theta^2 + 72\theta^2\pi^4 + 225\theta^2\pi^2 + 64\pi^6}{160\pi^6} \quad \text{and} \quad \sigma^2_{\gamma_C} = \frac{330\theta^2 + 24\theta^2\pi^4 + 95\theta^2\pi^2 + 20\pi^6}{30\pi^6}.$$

5. Asymptotic relative efficiency

Spearman’s footrule and Gini’s gamma are natural statistics for testing independence. Cifarelli and Regazzini (1977) compared the merits of the test based on $\gamma_n$ in terms of Pitman’s asymptotic relative efficiency in a Gaussian model. More recently, Conti and Nikitin (1999a) computed the local Bahadur efficiency of $\varphi_n$ and $\gamma_n$ for a large class of alternatives. As the test statistics are rank-based, the calculations rely only on the dependence structure under the alternative, that is, the copula.
In their work, Conti and Nikitin (1999a) considered copula alternatives defined for each $u, v \in (0, 1)$ by $C_\theta(u, v) = uv + \theta\,\Omega_\theta(u, v)$, where $\theta \ge 0$ and $\Omega_\theta$ is a non-negative function whose mixed partial derivative satisfies mild conditions. They showed that for such alternatives, Bahadur’s and Pitman’s efficiencies coincide.

Using the results of Genest, Quessy, and Rémillard (2006), one can extend these comparisons to other copula families $(C_\theta)$ in which independence occurs when, say, $\theta = 0$. Indeed, note that by Equations (3) and (4), $\varphi_n$ and $\gamma_n$ are asymptotically equivalent to statistics of the form

$$S_n^J = \frac{1}{n} \sum_{i=1}^{n} J\left( \frac{R_i}{n + 1}, \frac{S_i}{n + 1} \right) - \frac{1}{n^2} \sum_{i=1}^{n} \sum_{j=1}^{n} J\left( \frac{i}{n + 1}, \frac{j}{n + 1} \right). \tag{5}$$

Here, $J = J_\varphi$ and $J = J_\gamma$, respectively. Many classical nonparametric tests of independence are based on statistics of the form (5) for some score function $J$.

Given right-continuous, square-integrable, quasi-monotone score functions $J_1$ and $J_2$, it is shown by Genest et al. (2006) that Pitman’s asymptotic relative efficiency (ARE) equals

$$\mathrm{ARE}(S_n^{J_1}, S_n^{J_2}) = \left( \frac{\mu_{J_1}/\sigma_{J_1}}{\mu_{J_2}/\sigma_{J_2}} \right)^2,$$

provided that the family $(C_\theta)$ of copula alternatives meets mild regularity conditions concerning mainly the existence and properties of the function $\dot{C}_0$, defined as $\partial C_\theta(u, v)/\partial\theta$ evaluated at $\theta = 0$. Here, $\mu_{J_i}$ is the derivative with respect to $\theta$ of the asymptotic mean of $S_n^{J_i}$ under $C_\theta$, evaluated at $\theta = 0$, that is,

$$\mu_{J_i} = \int_{(0,1)^2} \dot{C}_0(u, v) \, dJ_i(u, v).$$

Furthermore, $\sigma^2_{J_i}$ stands for the asymptotic variance of $S_n^{J_i}$ at independence. Given below are applications of this result when $J_1, J_2 \in \{J_\varphi, J_\gamma, J_\rho\}$ with $J_\rho(u, v) = 12uv$ for all $u, v \in (0, 1)$, which corresponds to Spearman’s rho.

Example 3. If $C_\theta$ is the Gaussian copula and $\Phi$ denotes the cumulative distribution function of a $N(0, 1)$ random variable, one finds $\dot{C}_0(u, v) = \Phi'\{\Phi^{-1}(u)\}\,\Phi'\{\Phi^{-1}(v)\}$ for all $u, v \in (0, 1)$. Thus, $\mu_{J_\varphi} = \sqrt{3}/\pi$, $\mu_{J_\gamma} = 4/(\sqrt{3}\,\pi)$ and $\mu_{J_\rho} = 3/\pi$.
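The efficiency formula is straightforward to evaluate once the slopes $\mu_J$ and the null variances $\sigma_J^2$ are known. The Python sketch below does so for the Gaussian alternatives of Example 3. It assumes the null asymptotic variances $2/5$ and $2/3$ reported in Section 3 for the footrule and for Gini's gamma, together with the value 1 for Spearman's rho; the function name `pitman_are` is illustrative.

```python
import math


def pitman_are(mu_1, var_1, mu_2, var_2):
    """Pitman ARE of test 1 relative to test 2: {(mu_1/sigma_1) / (mu_2/sigma_2)}^2."""
    return (mu_1 ** 2 / var_1) / (mu_2 ** 2 / var_2)


# Asymptotic variances at independence (assumed from Section 3; 1 for Spearman's rho).
VAR_PHI, VAR_GAMMA, VAR_RHO = 2 / 5, 2 / 3, 1.0

# Slopes at theta = 0 for the Gaussian copula, as given in Example 3.
mu_phi = math.sqrt(3) / math.pi
mu_gamma = 4 / (math.sqrt(3) * math.pi)
mu_rho = 3 / math.pi

are_phi_rho = pitman_are(mu_phi, VAR_PHI, mu_rho, VAR_RHO)
are_gamma_rho = pitman_are(mu_gamma, VAR_GAMMA, mu_rho, VAR_RHO)
print(round(are_phi_rho, 2), round(are_gamma_rho, 2))
```

Because the ARE depends on each test only through the ratio $\mu_J^2/\sigma_J^2$, the same helper applies unchanged to any other family of alternatives for which the slopes are available.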
Hence,

$$\mathrm{ARE}(S_n^{J_\varphi}, S_n^{J_\rho}) = \frac{5}{6} \approx 0.83 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{8}{9} \approx 0.89.$$

These calculations are in accordance with the findings of Cifarelli and Regazzini (1977). For this class of alternatives, both Spearman’s footrule and Gini’s gamma are less efficient than Spearman’s rho. The Pitman efficiency of the latter is $9/\pi^2 \approx 0.91$ when compared with the van der Waerden statistic, which is locally optimal for such alternatives (Genest and Verret 2005).

Example 4. Suppose that the family $(C_\theta)$ is such that for all $u, v \in (0, 1)$, $\dot{C}_0(u, v) = kuv(u^m - 1)(v^m - 1)$ for some $k > 0$ and $m \ge 1$. The Farlie–Gumbel–Morgenstern, Dąbrowska, Plackett and Frank families of copulas fall in this category when $m = 1$. The alternatives of Woodworth (1970) illustrate the case $m > 1$. Simple calculations yield

$$\mu_{J_\varphi} = \frac{4km^2}{(m + 3)(2m + 3)} \quad \text{and} \quad \mu_{J_\rho} = \frac{3km^2}{(2 + m)^2}.$$

A complex but explicit expression is also available for $\mu_{J_\gamma}$; it reduces to $4k/15$ if $m = 1$. Using $\sigma_\rho = 1$, one recovers the results of Conti and Nikitin (1999a) for $m = 1$, viz.

$$\mathrm{ARE}(S_n^{J_\varphi}, S_n^{J_\rho}) = \frac{9}{10} = 0.90 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{24}{25} = 0.96.$$

Spearman’s footrule and Gini’s gamma are thus somewhat less efficient than Spearman’s rho, which is the locally optimal test statistic for this class of models (Genest and Verret 2005). As shown in Figure 1, however, $\varphi_n$ eventually becomes more efficient than $\rho_n$ as $m \to \infty$, while $\gamma_n$ gradually loses ground when $m \ge 2$. In fact,

$$\lim_{m \to \infty} \mathrm{ARE}(S_n^{J_\varphi}, S_n^{J_\rho}) = \frac{10}{9} \approx 1.11 \quad \text{and} \quad \lim_{m \to \infty} \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{2}{3} \approx 0.67.$$

Figure 1. Relative efficiency of $\varphi_n$ versus $\rho_n$ (left) and $\gamma_n$ versus $\rho_n$ (right) as a function of parameter $m \ge 1$ in the Woodworth alternatives of Example 4.

Example 5. Suppose that the family $(C_\theta)$ is such that for all $u, v \in (0, 1)$, $\dot{C}_0(u, v) = kuv \ln(u) \ln(v)$ for some $k > 0$. The Clayton/Cook–Johnson and Gumbel–Barnett families fall in
this category, as well as Model 4.2.10 of Nelsen (2006). Here, $\mu_{J_\varphi} = 4k/9$, $\mu_{J_\gamma} = k(15 - \pi^2)/9$ and $\mu_{J_\rho} = 3k/4$. Consequently,

$$\mathrm{ARE}(S_n^{J_\varphi}, S_n^{J_\rho}) = \frac{640}{729} \approx 0.88 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{8(15 - \pi^2)^2}{243} \approx 0.87.$$

The tests based on $\varphi_n$ and $\gamma_n$ thus have similar efficiencies. For this class of alternatives, however, neither they nor the test based on $\rho_n$ can be recommended. Indeed, the Pitman efficiency of Spearman’s rho is only $9/16 \approx 0.563$ when compared with Savage’s log-rank test, which is the locally most powerful test statistic in this case (Genest and Verret 2005).

The final example, adapted from Conti and Nikitin (1999a), exhibits dependence models for which Spearman’s footrule and Gini’s gamma are the locally most powerful test statistics.

Example 6. Consider the families of copulas defined for all $u, v, \theta \in (0, 1)$ by

$$C_\theta^\varphi(u, v) = uv + \frac{\theta}{2} \{ |u - v|^3 - (u + v)^3 + 2uv(u^2 + v^2 + 2) \}$$

and

$$C_\theta^\gamma(u, v) = uv + \frac{\theta}{6} \{ |1 - u - v|^3 + |u - v|^3 - 3(u^2 + v^2 - u - v) - 1 \}.$$

Both of them lie in the class of cubic-section copulas introduced by Nelsen, Quesada-Molina, and Rodríguez-Lallena (1997). As shown by Conti and Nikitin (1999a), tests of independence based on $\varphi_n$ and $\gamma_n$ are locally most powerful for the classes of alternatives $C_\theta^\varphi$ and $C_\theta^\gamma$, respectively.

For the family $C_\theta^\varphi$, one gets $\mu_{J_\varphi} = 2/5$, $\mu_{J_\gamma} = 1/2$ and $\mu_{J_\rho} = 3/5$. Thus,

$$\mathrm{ARE}(S_n^{J_\varphi}, S_n^{J_\rho}) = \frac{10}{9} \approx 1.11 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{25}{24} \approx 1.04.$$

For $C_\theta^\gamma$, one finds $\mu_{J_\varphi} = 1/2$, $\mu_{J_\gamma} = 2/3$ and $\mu_{J_\rho} = 4/5$. Hence,

$$\mathrm{ARE}(S_n^{J_\varphi}, S_n^{J_\rho}) = \frac{125}{128} \approx 0.98 \quad \text{and} \quad \mathrm{ARE}(S_n^{J_\gamma}, S_n^{J_\rho}) = \frac{25}{24} \approx 1.04.$$

6. Estimation of the asymptotic variance

An alternative derivation of the limiting distributions of $\varphi_n$ and $\gamma_n$ was given by Conti (1994) using an asymptotically equivalent U-statistic (see also Cifarelli et al. 1996). His approach leads to a consistent estimate of their large-sample variances.
Given $u, v, s, t \in \mathbb{R}$, let

$$\psi_1(u, v; s, t) = |u - v| + \operatorname{sign}(u - v)\{\mathbf{1}(s \le u) - \mathbf{1}(t \le v) - u + v\},$$

$$\psi_2(u, v; s, t) = |u + v - 1| + \operatorname{sign}(u + v - 1)\{\mathbf{1}(s \le u) + \mathbf{1}(t \le v) - u - v\},$$

with the convention $\operatorname{sign}(0) = -1$. For $k = 1, 2$, define $\Psi_k(u, v; s, t) = \psi_k(u, v; s, t) + \psi_k(s, t; u, v)$ as well as

$$\Upsilon_{kn} = \binom{n}{2}^{-1} \sum_{i<j} \Psi_k(U_i, V_i; U_j, V_j),$$

where $U_i = F(X_i)$, $V_i = G(Y_i)$ for $i \in \{1, \ldots, n\}$. Conti’s result is as follows (see Conti 1994 for a proof).

Proposition 2. If the conditions of Proposition 1 hold, then as $n \to \infty$,

$$n^{1/2}(\Upsilon_{1n} - \varphi_C) \rightsquigarrow N(0, \sigma^2_{\varphi_C}) \quad \text{and} \quad n^{1/2}(\Upsilon_{2n} - \Upsilon_{1n} - \gamma_C) \rightsquigarrow N(0, \sigma^2_{\gamma_C}).$$

Let $\tilde{\sigma}^2_{\varphi_C}$ and $\tilde{\sigma}^2_{\gamma_C}$ be the delete-one jackknife variance estimators based on $\Upsilon_{1n}$ and $\Upsilon_{2n} - \Upsilon_{1n}$, respectively. The theory of U-statistics (Lee 1990, Chap. 5) implies that $\tilde{\sigma}^2_{\varphi_C}$ is a consistent estimate of $\sigma^2_{\varphi_C}$; similarly, $\tilde{\sigma}^2_{\gamma_C}$ estimates $\sigma^2_{\gamma_C}$ consistently.

In his work, Conti (1994) used slight variants based on the work of Sen (1960). Specifically, let

$$\Upsilon_{kn,i} = \frac{1}{n - 1} \sum_{j=1,\, j \ne i}^{n} \Psi_k(U_i, V_i; U_j, V_j)$$

for $k = 1, 2$ and $i \in \{1, \ldots, n\}$. Conti’s estimators of $\sigma^2_{\varphi_C}$ and $\sigma^2_{\gamma_C}$ are then given by

$$\hat{\sigma}^2_{\varphi_C} = \frac{4}{n} \sum_{i=1}^{n} (\Upsilon_{1n,i} - \Upsilon_{1n})^2 = \frac{(n - 2)^2}{n(n - 1)}\, \tilde{\sigma}^2_{\varphi_C}$$

and

$$\hat{\sigma}^2_{\gamma_C} = \frac{4}{n} \sum_{i=1}^{n} (\Upsilon_{2n,i} - \Upsilon_{2n})^2 = \frac{(n - 2)^2}{n(n - 1)}\, \tilde{\sigma}^2_{\gamma_C},$$

respectively. In this fashion, $\mathrm{var}(\hat{\sigma}^2_{\varphi_C}) < \mathrm{var}(\tilde{\sigma}^2_{\varphi_C})$ and $\mathrm{var}(\hat{\sigma}^2_{\gamma_C}) < \mathrm{var}(\tilde{\sigma}^2_{\gamma_C})$. As shown by Conti (1994), the delete-one jackknife remains consistent when $U_i$ and $V_i$ are replaced by $F_n(X_i) = R_i/n$ and $G_n(Y_i) = S_i/n$, where $F_n$ and $G_n$ are the empirical versions of $F$ and $G$, respectively.

Figure 2.
Dispersion of the asymptotic variance estimate of Spearman’s footrule (left) and Gini’s gamma (right), based on 100 random samples of size $n$ = 50, 100, 250 and 500 from the Farlie–Gumbel–Morgenstern copula with parameter $\theta = 1/2$.

The procedure can be implemented more easily upon noting that

$$\frac{\Psi_1(u, v; s, t)}{2} = \mathbf{1}\{u \le \min(v, s),\ t < \min(v, s)\} + \mathbf{1}\{s \le \min(t, u),\ v < \min(t, u)\}$$

and

$$\frac{\Psi_2(u, v; s, t)}{2} = \mathbf{1}\{s \le \min(u, 1 - t),\ v > \max(t, 1 - u)\} + \mathbf{1}\{u \le \min(s, 1 - v),\ t > \max(v, 1 - s)\}.$$

The behaviour of $\hat{\sigma}^2_{\varphi_C}$ and $\hat{\sigma}^2_{\gamma_C}$ is illustrated in Figure 2 using random samples of size $n$ = 50, 100, 250 and 500 from the Farlie–Gumbel–Morgenstern copula with parameter $\theta = 1/2$. As can be seen, the convergence is fairly rapid. The same phenomenon was observed for several other classes of copulas [results not shown].

7. Extensions

In recent years, various generalisations of Spearman’s footrule and Gini’s gamma have been proposed. In particular, Cifarelli et al. (1996) considered

$$\varphi_{g,C} = g^{-1}\left( \int_{(0,1)^2} g(|u - v|) \, dC(u, v) \right)$$

and

$$\gamma_{g,C} = g^{-1}\left( \int_{(0,1)^2} \{ g(|u + v - 1|) - g(|u - v|) \} \, dC(u, v) \right),$$

where $g : [0, 1] \to [0, 1]$ is a strictly increasing, continuous function. In addition, if $g$ is convex and satisfies $g(0) = 0$, $\gamma_{g,C}$ is a measure of concordance in the sense of Scarsini (1984); the cases $g(t) = t$ and $g(t) = t^2$ correspond to Gini’s gamma and Spearman’s rho, respectively. Cifarelli et al. (1996) identified the asymptotic distribution of the empirical version of $\gamma_{g,C}$ and showed how to estimate its variance consistently by the jackknife. See Conti and Nikitin (1999b) for additional limiting results. However, it is not clear how $g$ should be chosen in practice.

More recently, multivariate versions of Spearman’s footrule and Gini’s gamma were proposed by Úbeda-Flores (2005) and Behboodian, Dolati, and Úbeda-Flores (2007), respectively. The $d$-variate version of $\varphi_C$ is

$$\varphi_C = \frac{d + 1}{d - 1} \int_0^1 \{ C(t, \ldots, t) + \bar{C}(t, \ldots, t) \} \, dt - \frac{2}{d - 1},$$

where $\bar{C}$ is the distribution function of $1 - U$ with $U = (U_1, \ldots, U_d)$ distributed as $C$. Úbeda-Flores (2005) showed that $\varphi_C = 0$ at independence and $\varphi_C = 1$ at the Fréchet–Hoeffding upper bound, defined for every $u_1, \ldots, u_d \in (0, 1)$ by $M(u_1, \ldots, u_d) = \min(u_1, \ldots, u_d)$. In addition, he proved that the inequality $\varphi_C \ge -1/d$ always holds and that if $C_{12}$, $C_{13}$, $C_{23}$ are the bivariate margins of a trivariate copula $C$, then

$$\varphi_C = \frac{1}{3} (\varphi_{C_{12}} + \varphi_{C_{13}} + \varphi_{C_{23}}). \tag{6}$$

This property, which does not extend to higher dimensions, is shared by the multivariate extension of Gini’s gamma proposed by Behboodian et al. (2007). The latter is defined as a linear transformation of

$$\gamma_C^* = \int_0^1 \{ C(t, \ldots, t) + \bar{C}(t, \ldots, t) \} \, dt + \sum_{A \subseteq D} (-1)^{|A|} \int_{(0,1)^d} W(u_A) \, dC(u),$$

where $|A|$ denotes the cardinality of the set $A \subseteq D = \{1, \ldots, d\}$ and $u_A$ is the vector derived from $u = (u_1, \ldots, u_d) \in (0, 1)^d$ by replacing its $\ell$th coordinate by 1 if and only if $\ell \notin A$. The expression for $\gamma_C^*$ also involves the multivariate Fréchet–Hoeffding lower bound, defined for every $u_1, \ldots, u_d \in (0, 1)$ by $W(u_1, \ldots, u_d) = \max(0, u_1 + \cdots + u_d + 1 - d)$. More specifically, Behboodian et al. (2007) defined $\gamma_C = (\gamma_C^* - 2a_d + 1)/(2b_d - 2a_d)$ with

$$a_d = \sum_{j=0}^{d} (-1)^j \binom{d}{j} \frac{1}{2(j + 1)!} + \frac{1}{d + 1} + \frac{1}{2(d + 1)!}, \qquad b_d = 1 - \sum_{j=1}^{d-1} \frac{1}{4j}$$

chosen in such a way that $\gamma_C = 0$ at independence and $\gamma_C = 1$ at $M$.

8. Sample properties in the multivariate case

Given a random sample $(X_{11}, \ldots, X_{1d}), \ldots, (X_{n1}, \ldots, X_{nd})$ from some continuous $d$-variate distribution, and $(R_{11}, \ldots, R_{1d}), \ldots, (R_{n1}, \ldots, R_{nd})$ the associated vectors of componentwise ranks, Úbeda-Flores (2005) defined the empirical version of $\varphi_C$ by

$$\varphi_n = 1 - \frac{d + 1}{d - 1} \sum_{i=1}^{n} \frac{L_i}{n^2 - 1},$$

where for each $i \in \{1, \ldots, n\}$, $L_i = \max(R_{i1}, \ldots, R_{id}) - \min(R_{i1}, \ldots, R_{id})$.
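The empirical multivariate footrule just defined can be computed directly from the componentwise ranks. The Python sketch below is one possible implementation under the assumption that no column contains ties; the function names `column_ranks` and `multivariate_footrule` are illustrative and do not come from the paper. When $d = 2$, the range $L_i$ reduces to $|R_i - S_i|$ and the result coincides with Equation (1).

```python
def column_ranks(column):
    """Return the 1-based ranks of a list of distinct values (ties are not handled)."""
    order = sorted(range(len(column)), key=lambda i: column[i])
    result = [0] * len(column)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def multivariate_footrule(data):
    """Empirical d-variate footrule for a list of n observations of length d >= 2."""
    n, d = len(data), len(data[0])
    # Rank each coordinate separately (componentwise ranks).
    rank_columns = [column_ranks([row[j] for row in data]) for j in range(d)]
    spread = 0
    for i in range(n):
        row_ranks = [rank_columns[j][i] for j in range(d)]
        spread += max(row_ranks) - min(row_ranks)  # L_i
    return 1 - (d + 1) / (d - 1) * spread / (n * n - 1)


# Three variables ranked identically: the coefficient equals 1.
sample = [(0.1, 10, 5), (0.4, 20, 6), (0.9, 30, 7), (1.5, 40, 8)]
print(multivariate_footrule(sample))
```

Ranking each column separately keeps the statistic invariant under strictly increasing transformations of the individual coordinates.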
The following proposition, whose proof is in Appendix 2, implies that $\varphi_n$ is asymptotically unbiased. As will be shown below, however, it is generally biased in finite samples.

Proposition 3. Suppose that a $d$-variate copula $C$ admits continuous partial derivatives $\dot{C}_1(u_1, \ldots, u_d) = \partial C(u_1, \ldots, u_d)/\partial u_1, \ldots, \dot{C}_d(u_1, \ldots, u_d) = \partial C(u_1, \ldots, u_d)/\partial u_d$ on $(0, 1)^d$. Then as $n \to \infty$, $n^{1/2}(\varphi_n - \varphi_C) \rightsquigarrow N(0, \sigma^2_{\varphi_C})$, where $\sigma^2_{\varphi_C}$ is defined in Equation (A5).

It is checked readily that Equation (A5) reduces to Equation (A1) when $d = 2$, and that Property (6) continues to hold for the empirical version of $\varphi_C$. Although a closed-form expression is available for $\sigma^2_{\varphi_C}$, its computation can be tedious. Here is a simple example in dimension $d = 3$.

Example 7. Given $\theta_{12}, \theta_{13}, \theta_{23}, \theta_{123} \in [-1, 1]$, a trivariate version of the Farlie–Gumbel–Morgenstern copula is defined for all $u, v, w \in (0, 1)$ by

$$C(u, v, w) = uvw\{1 + \theta_{12}(1 - u)(1 - v) + \theta_{13}(1 - u)(1 - w) + \theta_{23}(1 - v)(1 - w) + \theta_{123}(1 - u)(1 - v)(1 - w)\}.$$

Simple algebra yields $\varphi_C = (\theta_{12} + \theta_{13} + \theta_{23})/15$ and

$$\sigma^2_{\varphi_C} = \frac{2}{15} + \frac{2}{63}(\theta_{12} + \theta_{13} + \theta_{23}) - \frac{11}{1350}(\theta_{12}^2 + \theta_{13}^2 + \theta_{23}^2) - \frac{17}{1350}(\theta_{12}\theta_{13} + \theta_{12}\theta_{23} + \theta_{13}\theta_{23}).$$

Note that $\theta_{123}$ is absent from the formulas, as might be expected from Property (6).

One possible use of the extended version of $\varphi_n$ is as a test statistic for multivariate independence. It is shown in Appendix 3 that under the null hypothesis,

$$E(\varphi_n) = 1 - \frac{d + 1}{d - 1}\,\frac{n}{n - 1} \left\{ 1 - \frac{2}{n + 1} \sum_{k=0}^{n} \left( \frac{k}{n} \right)^d \right\}. \tag{7}$$

Observe that while it vanishes when $d = 2$ or 3, this expectation is only $O(1/n^2)$ in general and, for example, equals $1/(9n^2)$ when $d = 4$. A closed-form expression for the finite-sample variance of $\varphi_n$ is also given in Equation (A6), but it is cumbersome.
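The remarks following Equation (7) can be verified with exact rational arithmetic. The short Python sketch below evaluates the right-hand side of Equation (7) as written above; the function name `expected_footrule` is illustrative.

```python
from fractions import Fraction


def expected_footrule(n, d):
    """Right-hand side of Equation (7): E(phi_n) under independence, in dimension d."""
    power_sum = sum(Fraction(k, n) ** d for k in range(n + 1))
    inner = 1 - Fraction(2, n + 1) * power_sum
    return 1 - Fraction(d + 1, d - 1) * Fraction(n, n - 1) * inner


for n in range(2, 30):
    assert expected_footrule(n, 2) == 0                       # unbiased when d = 2
    assert expected_footrule(n, 3) == 0                       # unbiased when d = 3
    assert expected_footrule(n, 4) == Fraction(1, 9 * n * n)  # bias 1/(9 n^2) when d = 4
```

Using `Fraction` rather than floating-point numbers makes the equalities exact, which matters here because the bias for $d = 4$ is already small for moderate $n$.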
In view of Proposition 3, a more practical solution is to reject the null hypothesis at asymptotic level $\alpha$ if $n^{1/2}|\varphi_n|/\sigma_\Pi$ is larger than the quantile of level $1 - \alpha/2$ of the $N(0, 1)$ distribution. Here, $\sigma_\Pi^2$ stands for the large-sample variance of $n^{1/2}\varphi_n$ under $H_0$, that is, when the underlying copula is $\Pi(u_1, \ldots, u_d) = u_1 \times \cdots \times u_d$ for all $u_1, \ldots, u_d \in (0, 1)$. As shown in Appendix 4,

$$\sigma_\Pi^2 = 2 \left( \frac{d + 1}{d - 1} \right)^2 \left\{ \frac{2 + 4d - d^2 + d^3}{d(d + 2)(2d + 1)(d + 1)^2} - \frac{B(d, d + 2)}{d + 1} \right\},$$

where $B$ denotes the Beta function. In particular, $\sigma_\Pi^2 = 2/5$, $2/15$ and $149/2268$ when $d = 2$, 3 and 4, respectively.

Behboodian et al. (2007) defined the empirical version of $\gamma_C$ by

$$\gamma_n = \frac{\gamma_n^* - c_n}{d_n - c_n}$$

for appropriate normalising constants $c_n$, $d_n$ and a function $\gamma_n^*$ of the vectors $R_1, \ldots, R_n$ of normalised ranks given for each $i \in \{1, \ldots, n\}$ by $R_i = (R_{i1}, \ldots, R_{id})/(n + 1)$. Specifically,

$$\gamma_n^* = \frac{1}{2n} \sum_{i=1}^{n} \left[ M(R_i) + W(R_i) + \sum_{A \subseteq D} (-1)^{|A|} \{ M(R_{iA}) + W(R_{iA}) \} \right],$$

where for each $i \in \{1, \ldots, n\}$, $R_{iA}$ is the vector obtained from $R_i \in (0, 1)^d$ by replacing its $\ell$th coordinate by 1 if and only if $\ell \notin A$.

It may be conjectured that $\gamma_n$ is an unbiased, asymptotically normal estimator of $\gamma_C$ in arbitrary dimension $d \ge 3$. It will be a challenge to determine its large-sample variance, however, even under independence. This may be the object of future work.

9. Conclusion

This paper reviewed and complemented the properties of Spearman’s footrule and Gini’s gamma. As mentioned in the Introduction, Spearman’s footrule, $\varphi_n$, is quickly gaining popularity in applications, mainly due to its interpretation as a Manhattan distance between two sets of ranks. As such, it is more robust than, for example, Spearman’s rho which is based on the Euclidean distance. However, it suffers from one major drawback, namely its asymmetry.
Gini’s statistic, $\gamma_n$, corrects this defect while maintaining the interpretation as a distance. From this point of view, it thus seems preferable. Furthermore, $\varphi_n$ and $\gamma_n$ may be regarded as measures of non-linear association. However, only $\gamma_n$ satisfies the axiomatic definition of such a measure proposed by Scarsini (1984).

Nonetheless, both statistics can be used for testing independence. In most cases considered here, they turned out to be less efficient than the classical Spearman’s rho. A general recommendation cannot be made, however, as both $\varphi_n$ and $\gamma_n$ are locally optimal for specific classes of alternatives. For additional discussion on rank-based tests of independence and efficiency considerations, see, for example, Genest and Rémillard (2004), Genest and Verret (2005) and Genest et al. (2006).

At present, standard errors for $\varphi_n$ and $\gamma_n$ are rarely found in applications, if ever. Asymptotic confidence intervals for both statistics can be derived readily from Proposition 1 using the simpler form of Conti’s variance estimator given in Section 6.

Results in Sections 7 and 8 make it possible to use the multivariate version of Spearman’s footrule proposed by Úbeda-Flores (2005) for the comparison of $d \ge 3$ sets of ranks. It was shown that $\varphi_n$ is again asymptotically normal, but that it is generally biased in finite samples if $d \ge 4$. The asymptotic variance under independence was computed and can be used to construct tests for multivariate independence. Similar results concerning the multivariate extension of Gini’s gamma are still under development.

Acknowledgements

Funding in support of this work was provided by the Natural Sciences and Engineering Research Council of Canada, the Fonds québécois de la recherche sur la nature et les technologies and the Institut de finance mathématique de Montréal.

References

Alvo, M., and Charbonneau, M.
(1997), ‘The Use of Spearman’s Footrule in Testing for Trend When the Data are Incomplete’, Communications in Statistics: Simulation and Computation, 26, 193–213. Amato, V. (1954), ‘Sulla distribuzione dell’indice del Gini’, Statistica, 14, 505–519. Behboodian, J., Dolati, A., and Úbeda-Flores, M. (2007), ‘A Multivariate Version of Gini’s Rank Association Coefficient’, Statistical Papers, 48, 295–304. Berman, S.M. (1996), ‘Rank Inversions in Scoring Multipart Examinations’, The Annals of Applied Probability, 6, 992– 1005. Betrò, B. (1993), ‘On the Distribution of Gini’s Rank Association Coefficient’, Communications in Statistics: Simulation and Computation, 22, 497–505. Cifarelli, D.M., Conti, P.L., and Regazzini, E. (1996), ‘On the Asymptotic Distribution of a General Measure of Monotone Dependence’, The Annals of Statistics, 24, 1386–1399. Cifarelli, D.M., and Regazzini, E. (1977), ‘On a distribution-free test of independence based on Gini’s rank association coefficient’. Recent Developments in Statistics (Proceedings of the European Meeting of Statisticians, Grenoble, 1976), Amsterdam, North-Holland, pp. 375–385. Downloaded By: [Canadian Research Knowledge Network] At: 15:25 7 January 2011 Journal of Nonparametric Statistics 949 Conti, P.L. (1994), ‘Asymptotic Inference on a General Measure of Monotone Dependence’, Journal of the Italian Statistical Society, 3, 213–241. Conti, P.L., and Nikitin, Y.Y. (1999a), ‘Asymptotic Efficiency of Independence Tests Based on Gini’s Rank Association Coefficient, Spearman’s Footrule and Their Generalizations’, Communications in Statistics: Theory and Methods, 28, 453–465. Conti, P.L., and Nikitin, Y.Y. (1999b), ‘Rates of Convergence for a Class of Rank Tests for Independence’, Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 260, Veroyatn. i Stat. 3, 155–163, 319–320 [Translation in Journal of Mathematical Science (New York) 109 (2002) 2141–2147]. Cucconi, O. 
(1964), ‘La distribuzione campionaria dell’indice di cograduazione del Gini’, Statistica, 24, 143–151.
Deheuvels, P. (1979), ‘La fonction de dépendance empirique et ses propriétés: un test non paramétrique d’indépendance’, Académie royale de Belgique: Bulletin de la classe des sciences, 65(5), 274–292.
Diaconis, P., and Graham, R.L. (1977), ‘Spearman’s Footrule as a Measure of Disarray’, Journal of the Royal Statistical Society, Series B, 39, 262–268.
Dinneen, L.C., and Blakesley, B.C. (1971), ‘Definition of Spearman’s Footrule’, Journal of the Royal Statistical Society, Series C, 31, 66.
Fagin, R., Kumar, R., and Sivakumar, D. (2003), ‘Comparing Top k Lists’, SIAM Journal on Discrete Mathematics, 17, 134–160.
Fermanian, J.-D., Radulović, D., and Wegkamp, M.H. (2004), ‘Weak Convergence of Empirical Copula Processes’, Bernoulli, 10, 847–860.
Franklin, L.A. (1988), ‘Exact Tables of Spearman’s Footrule for N = 11(1)18 With Estimate of Convergence and Errors for the Normal Approximation’, Statistics and Probability Letters, 6, 399–406.
Genest, C., Quessy, J.-F., and Rémillard, B. (2006), ‘Local Efficiency of a Cramér–von Mises Test of Independence’, Journal of Multivariate Analysis, 97, 274–294.
Genest, C., and Rémillard, B. (2004), ‘Tests of Independence and Randomness Based on the Empirical Copula Process’, Test, 13, 335–369.
Genest, C., and Verret, F. (2005), ‘Locally Most Powerful Rank Tests of Independence for Copula Models’, Journal of Nonparametric Statistics, 17, 521–539.
Gini, C. (1914), L’Ammontare e la Composizione della Ricchezza delle Nazioni, Torino: Bocca.
Hájek, J., and Šidák, Z. (1967), Theory of Rank Tests, New York: Academic Press.
Herzel, A. (1972), ‘Sulla distribuzione campionaria dell’indice di cograduazione del Gini’, Metron, 30, 137–153.
Iorio, F., Tagliaferri, R., and di Bernardo, D. (2009), ‘Identifying Network of Drug Mode of Action by Gene Expression Profiling’, Journal of Computational Biology, 16, 241–251.
Kendall, M.
(1970), Rank Correlation Methods (4th ed.), London: Griffin.
Kim, B.S., Rha, S.Y., Cho, G.B., and Chung, H.C. (2004), ‘Spearman’s Footrule as a Measure of cDNA Microarray Reproducibility’, Genomics, 84, 441–448.
Kleinecke, D.C., Ury, H.K., and Wagner, L.F. (1962), ‘Spearman’s Footrule: An Alternative Rank Statistic’ (Rep. no CDRP–182–114), Civil Defense Research Project, Institute of Engineering Research, University of California at Berkeley.
Landenna, G., and Scagni, A. (1989), ‘An Approximated Distribution of the Gini’s Rank Association Coefficient’, Communications in Statistics: Theory and Methods, 18, 2017–2026.
Lee, A.J. (1990), U-Statistics: Theory and Practice, New York: Dekker.
Lin, S., and Ding, J. (2009), ‘Integration of Ranked Lists via Cross Entropy Monte Carlo with Applications to mRNA and microRNA Studies’, Biometrics, 65, 9–18.
Mikki, S. (in press), ‘Comparing Google Scholar and ISI Web of Science for Earth Sciences’, Scientometrics.
Nelsen, R.B. (1998), ‘Concordance and Gini’s Measure of Association’, Journal of Nonparametric Statistics, 9, 227–238.
Nelsen, R.B. (2006), An Introduction to Copulas (2nd ed.), Berlin: Springer.
Nelsen, R.B., Quesada-Molina, J.J., and Rodríguez-Lallena, J.A. (1997), ‘Bivariate Copulas with Cubic Sections’, Journal of Nonparametric Statistics, 7, 205–220.
Nelsen, R.B., and Úbeda-Flores, M. (2004), ‘The Symmetric Footrule is Gini’s Rank Association Coefficient’, Communications in Statistics: Theory and Methods, 33, 195–196.
Quade, D., and Salama, I.A. (2006), ‘Concordance of Complete or Right-censored Rankings Based on Spearman’s Footrule’, Communications in Statistics: Theory and Methods, 35, 1059–1069.
Quesada-Molina, J.J., and Rodríguez-Lallena, J.A. (1995), ‘Bivariate Copulas with Quadratic Sections’, Journal of Nonparametric Statistics, 5, 323–337.
Rizzi, A. (1971), ‘Distribuzione dell’indice di cograduazione del Gini’, Metron, 29, 63–73.
Rüschendorf, L.
(1976), ‘Asymptotic Distributions of Multivariate Rank Order Statistics’, The Annals of Statistics, 4, 912–923.
Salama, I.A., and Quade, D. (1990), ‘A Note on Spearman’s Footrule’, Communications in Statistics: Simulation and Computation, 19, 591–601.
Salama, I.A., and Quade, D. (2001), ‘The Symmetric Footrule’, Communications in Statistics: Theory and Methods, 30, 1099–1109.
Salama, I.A., and Quade, D. (2002), ‘Computing the Distribution of Spearman’s Footrule in $O(n^4)$ Time’, Journal of Statistical Computation and Simulation, 72, 895–898.
Salama, I.A., and Quade, D. (2004), ‘Agreement Among Censored Rankings Using Spearman’s Footrule’, Communications in Statistics: Theory and Methods, 33, 1837–1850.
Salvemini, T. (1951), ‘Sui vari indici di cograduazione’, Statistica, 11, 133–154.
Savorgnan, F. (1915), Sulla Formazione dei Valori dell’Indice di Cograduazione, Studi Economico-Giuridici dell’Università di Cagliari.
Scarsini, M. (1984), ‘On Measures of Concordance’, Stochastica, 8, 201–218.
Sen, P.K. (1960), ‘On Some Convergence Properties of U-Statistics’, Calcutta Statistical Association Bulletin, 10, 1–18.
Sen, P.K., and Salama, I.A. (1983), ‘The Spearman Footrule and a Markov Chain Property’, Statistics and Probability Letters, 1, 285–289.
Sen, P.K., Salama, I.A., and Quade, D. (2003), ‘Spearman’s Footrule Under Progressive Censoring’, Journal of Nonparametric Statistics, 15, 53–60.
Spearman, C. (1904), ‘The Proof and Measurement of Association Between Two Things’, The American Journal of Psychology, 15, 72–101.
Spearman, C. (1906), ‘Footrule for Measuring Correlation’, The British Journal of Psychology, 2, 89–108.
Stute, W. (1984), ‘The Oscillation Behavior of Empirical Processes: The Multivariate Case’, The Annals of Probability, 12, 361–379.
Tsukahara, H. (2005), ‘Semiparametric Estimation in Copula Models’, The Canadian Journal of Statistics, 33, 357–375.
Úbeda-Flores, M. (2005), ‘Multivariate Versions of Blomqvist’s Beta and Spearman’s Footrule’, Annals of the Institute of Statistical Mathematics, 57, 781–788.
Ury, H.K., and Kleinecke, D.C. (1979), ‘Tables of the Distribution of Spearman’s Footrule’, Applied Statistics, 28, 271–275.
Vittadini, G. (1991), ‘Una approssimazione della variabile casuale G di Gini’, Rivista Internazionale di Scienze Economiche e Commerciali, 38, 81–94.
Woodworth, G.G. (1970), ‘Large Deviations and Bahadur Efficiency of Linear Rank Statistics’, The Annals of Mathematical Statistics, 41, 251–283.

Appendix 1: Proof of Proposition 1

First, define a variant of the empirical copula of Deheuvels (1979) by

$$C_n(u, v) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left( \frac{R_i}{n+1} \le u, \; \frac{S_i}{n+1} \le v \right)$$

for every $u, v \in (0, 1)$. Observe that for any score function $J : (0, 1)^2 \to \mathbb{R}$, one has

$$\frac{1}{n} \sum_{i=1}^{n} J\left( \frac{R_i}{n+1}, \frac{S_i}{n+1} \right) = \int_{(0,1)^2} J(u, v) \, dC_n(u, v).$$

If in addition $J$ itself is a copula, up to a multiplicative constant, Fubini’s theorem yields

$$\int_{(0,1)^2} J(u, v) \, dC_n(u, v) = \int_{(0,1)^2} C_n(u, v) \, dJ(u, v).$$

When these identities are used with $J = J_\varphi$, Equation (3) becomes

$$\varphi_n = \frac{6n}{n-1} \int_0^1 C_n(t, t) \, dt - \frac{2n+1}{n-1},$$

which shows that $n^{1/2}(\varphi_n - \varphi_C)$ has the same asymptotic behaviour as

$$Z_{C,n} = 6 \int_0^1 \mathbb{C}_n(t, t) \, dt,$$

where $\mathbb{C}_n = n^{1/2}(C_n - C)$ is the empirical copula process. Similarly, $n^{1/2}(\gamma_n - \gamma_C)$ behaves asymptotically as

$$Z^*_{C,n} = 4 \int_0^1 \{ \mathbb{C}_n(t, t) + \mathbb{C}_n(t, 1 - t) \} \, dt.$$

Now it has been known since the work of Rüschendorf (1976) that when $C$ admits continuous partial derivatives, $\mathbb{C}_n$ converges weakly as $n \to \infty$ to a continuous centred Gaussian process $\mathbb{C}$ of the form

$$\mathbb{C}(u, v) = \mathbb{U}_C(u, v) - \dot{C}_1(u, v) \, \mathbb{U}_C(u, 1) - \dot{C}_2(u, v) \, \mathbb{U}_C(1, v)$$

for all $u, v \in (0, 1)$. Here, $\mathbb{U}_C$ denotes a pinned $C$-Brownian sheet, that is, a centred Gaussian random field whose covariance function at $u, v, s, t \in (0, 1)$ is given by

$$\mathrm{cov}\{ \mathbb{U}_C(u, v), \mathbb{U}_C(s, t) \} = C\{ \min(u, s), \min(v, t) \} - C(u, v) \, C(s, t).$$

See, for example, Stute (1984), Fermanian, Radulović, and Wegkamp (2004), or Tsukahara (2005) for further discussion.
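Because $Z_{C,n}$ is a continuous linear functional of $\mathbb{C}_n$, it converges weakly as $n \to \infty$ to the centred Gaussian random variable $Z_C = 6 \int_0^1 \mathbb{C}(t, t) \, dt$ with variance

$$\sigma^2_{\varphi_C} = 36 \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s, s), \mathbb{C}(t, t) \} \, ds \, dt. \tag{A1}$$

Similarly, the weak limit of $Z^*_{C,n}$ as $n \to \infty$ is the centred Gaussian random variable $Z^*_C = 4 \int_0^1 \{ \mathbb{C}(t, t) + \mathbb{C}(t, 1 - t) \} \, dt$ with variance

$$\sigma^2_{\gamma_C} = 16 \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s, s) + \mathbb{C}(s, 1 - s), \; \mathbb{C}(t, t) + \mathbb{C}(t, 1 - t) \} \, ds \, dt. \tag{A2}$$

The latter limit corresponds to the case $g(t) = t$ in Cifarelli et al. (1996, Theorem 4.1).

Appendix 2: Proof of Proposition 3

For arbitrary $\mathbf{u} = (u_1, \ldots, u_d) \in (0, 1)^d$, let

$$C_n(\mathbf{u}) = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}\left( \frac{R_{i1}}{n+1} \le u_1, \ldots, \frac{R_{id}}{n+1} \le u_d \right)$$

be the $d$-variate empirical copula. One then has

$$\varphi_n = 1 - \frac{d+1}{d-1} \, \frac{n}{n-1} \int_{(0,1)^d} \{ \max(\mathbf{u}) - \min(\mathbf{u}) \} \, dC_n(\mathbf{u}), \tag{A3}$$

and the factor $n/(n - 1)$ can be ignored asymptotically. Now if $(U_1, \ldots, U_d)$ has distribution $C_n$ and $U$ is an independent uniform random variable on $(0, 1)$, then

$$\int_{(0,1)^d} \max(\mathbf{u}) \, dC_n(\mathbf{u}) = 1 - \Pr(U \ge U_1, \ldots, U \ge U_d) = 1 - \int_0^1 C_n(t, \ldots, t) \, dt$$

and

$$\int_{(0,1)^d} \min(\mathbf{u}) \, dC_n(\mathbf{u}) = \Pr(U \le U_1, \ldots, U \le U_d) = \int_0^1 \Pr(U_1 > t, \ldots, U_d > t) \, dt.$$

The latter expression can be formulated alternatively in terms of $C_n$ by means of the inclusion–exclusion formula. To this end, let $|A|$ denote the cardinality of any set $A \subseteq D = \{1, \ldots, d\}$, and denote by $t_A$ the vector $(t_1, \ldots, t_d)$ such that $t_\ell = t \, \mathbf{1}(\ell \in A) + \mathbf{1}(\ell \notin A)$ for all $\ell \in \{1, \ldots, d\}$ so that, for example, $t_D = (t, \ldots, t)$. Then,

$$\Pr(U_1 > t, \ldots, U_d > t) = \sum_{A \subseteq D} (-1)^{|A|} \Pr\left( \bigcap_{i \in A} \{ U_i \le t \} \right) = \sum_{A \subseteq D} (-1)^{|A|} C_n(t_A),$$

where an intersection over the empty set is to be interpreted as the sure event. Similarly, one has

$$\int_0^1 \bar{C}(t, \ldots, t) \, dt = \int_0^1 \bar{C}(1 - t, \ldots, 1 - t) \, dt = \sum_{A \subseteq D} (-1)^{|A|} \int_0^1 C(t_A) \, dt.$$

Consequently, $n^{1/2}(\varphi_n - \varphi_C)$ has the same asymptotic behaviour as

$$Z_{C,n} = \frac{d+1}{d-1} \left\{ \int_0^1 \mathbb{C}_n(t_D) \, dt + \sum_{A \subseteq D} (-1)^{|A|} \int_0^1 \mathbb{C}_n(t_A) \, dt \right\},$$

which is a continuous linear functional of the process $\mathbb{C}_n$. From the work of Rüschendorf (1976), the limit of the latter is of the form

$$\mathbb{C}(\mathbf{u}) = \mathbb{U}_C(\mathbf{u}) - \sum_{j=1}^{d} \dot{C}_j(\mathbf{u}) \, \mathbb{U}_C(\mathbf{u}_j) \tag{A4}$$

for arbitrary $\mathbf{u} = (u_1, \ldots, u_d)$, where $\mathbf{u}_j$ represents a $d$-dimensional vector with $u_j$ in its $j$th coordinate and 1 everywhere else. Here, $\mathbb{U}_C$ is a $d$-variate centred Gaussian field with covariance given by

$$\mathrm{cov}\{ \mathbb{U}_C(\mathbf{u}), \mathbb{U}_C(\mathbf{v}) \} = C(\mathbf{u} \wedge \mathbf{v}) - C(\mathbf{u}) \, C(\mathbf{v}),$$

where for all $\mathbf{u}, \mathbf{v} \in (0, 1)^d$, $\mathbf{u} \wedge \mathbf{v}$ represents the componentwise minimum. Thus, $Z_{C,n}$ converges weakly as $n \to \infty$ to a centred Gaussian random variable

$$Z_C = \frac{d+1}{d-1} \left\{ \int_0^1 \mathbb{C}(t_D) \, dt + \sum_{A \subseteq D} (-1)^{|A|} \int_0^1 \mathbb{C}(t_A) \, dt \right\}.$$

Hence, if $s_A$ is defined as $t_A$ mutatis mutandis, the variance of $Z_C$ is given by

$$\sigma^2_{\varphi_C} = \left( \frac{d+1}{d-1} \right)^2 \left\{ \Gamma(D, D) + 2 \sum_{A \subseteq D} (-1)^{|A|} \, \Gamma(A, D) + \bar{\Gamma}(D, D) \right\}, \tag{A5}$$

where for arbitrary $A, B \subseteq D$, one has

$$\Gamma(A, B) = \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_B) \} \, ds \, dt$$

and

$$\bar{\Gamma}(D, D) = \sum_{A \subseteq D} \sum_{B \subseteq D} (-1)^{|A| + |B|} \, \Gamma(A, B) = \int_0^1 \int_0^1 \mathrm{cov}\{ \bar{\mathbb{C}}(s_D), \bar{\mathbb{C}}(t_D) \} \, ds \, dt.$$

Here, the process $\bar{\mathbb{C}}$ is defined in Equation (A4), with $C$ replaced by $\bar{C}$ everywhere. Thus when $C$ is radially symmetric, that is, $C = \bar{C}$, one gets $\bar{\Gamma}(D, D) = \Gamma(D, D)$.

Appendix 3: Moments of $\varphi_n$ at independence

For arbitrary integers $k \le n$, let $k/n_A$ denote the vector $(k_1, \ldots, k_d)/(n + 1)$, where $k_\ell = k \, \mathbf{1}(\ell \in A) + (n + 1) \, \mathbf{1}(\ell \notin A)$ for all $\ell \in \{1, \ldots, d\}$. Using results stated on p. 59 of Hájek and Šidák (1967), one can see easily that $E\{ C_n(k/n_A) \} = (k/n)^{|A|}$ at independence. Thus if $t_A$ is defined as in Appendix 2, one gets

$$E \left\{ \int_0^1 C_n(t_A) \, dt \right\} = \frac{1}{n+1} \sum_{k=0}^{n} E\{ C_n(k/n_A) \} = \frac{1}{n+1} \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{|A|}.$$

It then follows from the identities proven in Appendix 2 that

$$E \left\{ \int_{(0,1)^d} \max(\mathbf{u}) \, dC_n(\mathbf{u}) \right\} = 1 - E \left\{ \int_0^1 C_n(t_D) \, dt \right\} = 1 - \frac{1}{n+1} \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d}$$

and that

$$E \left\{ \int_{(0,1)^d} \min(\mathbf{u}) \, dC_n(\mathbf{u}) \right\} = \sum_{A \subseteq D} (-1)^{|A|} E \left\{ \int_0^1 C_n(t_A) \, dt \right\} = \frac{1}{n+1} \sum_{A \subseteq D} (-1)^{|A|} \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{|A|} = \frac{1}{n+1} \sum_{k=0}^{n} \sum_{\ell=0}^{d} (-1)^{\ell} \binom{d}{\ell} \left( \frac{k}{n} \right)^{\ell}.$$

The latter sum can be simplified further using the binomial theorem, viz.

$$\sum_{k=0}^{n} \sum_{\ell=0}^{d} (-1)^{\ell} \binom{d}{\ell} \left( \frac{k}{n} \right)^{\ell} = \sum_{k=0}^{n} \left( 1 - \frac{k}{n} \right)^{d} = \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d}.$$

Taking expectations on both sides of Equation (A3) and making the appropriate substitutions, one gets Formula (7), as stated.

Turning to the computation of $\mathrm{var}(\varphi_n)$, one can immediately deduce from first principles and the above expression for $E(\varphi_n)$ that

$$\mathrm{var}(\varphi_n) = 2 \left( \frac{d+1}{d-1} \right)^2 \left( \frac{n}{n^2 - 1} \right)^2 \left[ E\{ M(n, d) \} - 2 \left\{ \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d} \right\}^2 \right],$$

where

$$M(n, d) = (n + 1)^2 \int_0^1 \int_0^1 \left\{ C_n(s_D) \, C_n(t_D) + C_n(s_D) \sum_{A \subseteq D} (-1)^{|A|} C_n(t_A) \right\} ds \, dt.$$

Now at independence, one has

$$E\{ C_n(j/n_D) \, C_n(k/n_A) \} = \frac{1}{n^2} \left( \frac{j}{n} \right)^{d - |A|} \left[ n \left\{ \frac{\min(j, k)}{n} \right\}^{|A|} + n(n - 1) \left\{ \frac{jk - \min(j, k)}{n^2 - n} \right\}^{|A|} \right]$$

for arbitrary $j, k \in \{0, \ldots, n\}$ and $A \subseteq D$. Thus if

$$\Lambda_n(\ell, d) = \frac{1}{n} \sum_{j=0}^{n} \sum_{k=0}^{n} \left( \frac{j}{n} \right)^{d - \ell} \left[ \left\{ \frac{\min(j, k)}{n} \right\}^{\ell} + (n - 1) \left\{ \frac{jk - \min(j, k)}{n^2 - n} \right\}^{\ell} \right]$$

for all $\ell \in \{0, \ldots, d\}$, one gets

$$E \left\{ \int_0^1 \int_0^1 C_n(s_D) \, C_n(t_A) \, ds \, dt \right\} = \frac{\Lambda_n(|A|, d)}{(n + 1)^2}.$$

Upon substitution and an application of the binomial identity, one finds

$$E\{ M(n, d) \} = \Lambda_n(d, d) + \sum_{\ell=0}^{d} (-1)^{\ell} \binom{d}{\ell} \Lambda_n(\ell, d) = \Lambda_n(d, d) + \Lambda_n(d),$$

where

$$\Lambda_n(d) = \frac{1}{n} \sum_{j=0}^{n} \sum_{k=0}^{n} \left[ \left\{ \frac{j}{n} - \frac{\min(j, k)}{n} \right\}^{d} + (n - 1) \left\{ \frac{j}{n} - \frac{jk - \min(j, k)}{n^2 - n} \right\}^{d} \right].$$

Collecting terms, one finds

$$\mathrm{var}(\varphi_n) = 2 \left( \frac{d+1}{d-1} \right)^2 \left( \frac{n}{n^2 - 1} \right)^2 \left[ \Lambda_n(d, d) + \Lambda_n(d) - 2 \left\{ \sum_{k=0}^{n} \left( \frac{k}{n} \right)^{d} \right\}^2 \right]. \tag{A6}$$

Appendix 4: Computation of $\sigma^2_\Pi$

Because the independence copula is radially symmetric, Formula (A5) reduces to

$$\sigma^2_{\varphi_C} = 2 \left( \frac{d+1}{d-1} \right)^2 \left\{ \Gamma(D, D) + \sum_{A \subseteq D} (-1)^{|A|} \, \Gamma(A, D) \right\}.$$

To compute $\sigma^2_{\varphi_C}$, one must evaluate $2^d$ covariances of the form $\mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_D) \}$ for some $A \subseteq D$.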
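In view of Equation (A4), any such covariance may be expressed as

$$\mathrm{cov}\{ \mathbb{U}_C(s_A), \mathbb{U}_C(t_D) \} + \sum_{j=1}^{d} \sum_{k=1}^{d} \dot{C}_j(s_A) \, \dot{C}_k(t_D) \, \mathrm{cov}\{ \mathbb{U}_C(s_{A \cap \{j\}}), \mathbb{U}_C(t_{D \cap \{k\}}) \} - \sum_{j=1}^{d} \dot{C}_j(s_A) \, \mathrm{cov}\{ \mathbb{U}_C(s_{A \cap \{j\}}), \mathbb{U}_C(t_D) \} - \sum_{k=1}^{d} \dot{C}_k(t_D) \, \mathrm{cov}\{ \mathbb{U}_C(s_A), \mathbb{U}_C(t_{D \cap \{k\}}) \}.$$

Simplifications occur when $C = \Pi$ because $\dot{C}_j(s_A) = s^{|A \setminus \{j\}|}$, $\dot{C}_k(t_B) = t^{|B \setminus \{k\}|}$ and for arbitrary $A, B \subseteq D$,

$$\mathrm{cov}\{ \mathbb{U}_C(s_A), \mathbb{U}_C(t_B) \} = \begin{cases} s^{|A|} \left( t^{|B \setminus A|} - t^{|B|} \right), & \text{if } s < t, \\ t^{|B|} \left( s^{|A \setminus B|} - s^{|A|} \right), & \text{if } s > t. \end{cases}$$

Thus if $s < t$ and $A \subseteq D$, one finds

$$\mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_D) \} = s^{|A|} \left( t^{|D \setminus A|} - t^{|D|} \right) + \sum_{j = k \in A} s^{|A|} t^{|D| - 1} (1 - t) - \sum_{j \in A} s^{|A|} t^{|D| - 1} (1 - t) - \sum_{k \in A} s^{|A|} t^{|D| - 1} (1 - t),$$

which reduces to

$$s^{|A|} \left( t^{|D \setminus A|} - t^{|D|} \right) - |A| \, s^{|A|} t^{|D| - 1} (1 - t) = s^{|A|} \left\{ t^{|D \setminus A|} \left( 1 - t^{|A|} \right) - |A| \, t^{|D| - 1} (1 - t) \right\}.$$

Similarly if $s > t$, one gets

$$t^{|D|} \left( s^{|A \setminus D|} - s^{|A|} \right) - |A| \, t^{|D|} s^{|A| - 1} (1 - s) = t^{|D|} \left\{ \left( 1 - s^{|A|} \right) - |A| \, s^{|A| - 1} (1 - s) \right\}.$$

Consequently, for any $A \subseteq D$ with $|A| = k$,

$$\Gamma(A, D) = \int_0^1 \int_0^1 \mathrm{cov}\{ \mathbb{C}(s_A), \mathbb{C}(t_D) \} \, ds \, dt = \Psi_1(k, d) + \Psi_2(k, d),$$

where for arbitrary $k \in \{1, \ldots, d\}$,

$$\Psi_1(k, d) = \int_0^1 \int_0^t s^k \left\{ t^{d - k} \left( 1 - t^k \right) - k \, t^{d - 1} (1 - t) \right\} ds \, dt$$

and

$$\Psi_2(k, d) = \int_0^1 \int_t^1 t^d \left\{ \left( 1 - s^k \right) - k \, s^{k - 1} (1 - s) \right\} ds \, dt.$$

Consequently,

$$\sigma^2_\Pi = 2 \left( \frac{d+1}{d-1} \right)^2 \sum_{\ell=1}^{2} \left\{ \Psi_\ell(d, d) + \sum_{k=0}^{d} (-1)^k \binom{d}{k} \Psi_\ell(k, d) \right\}.$$

Now observe that in view of the binomial identity,

$$\sum_{k=0}^{d} (-1)^k \binom{d}{k} \Psi_1(k, d) = \int_0^1 \int_0^t \sum_{k=0}^{d} (-1)^k \binom{d}{k} s^k \left\{ t^{d - k} \left( 1 - t^k \right) - k \, t^{d - 1} (1 - t) \right\} ds \, dt = \int_0^1 \int_0^t \left\{ (t - s)^d - (1 - s)^d t^d + d \, s (1 - s)^{d - 1} t^{d - 1} - d \, s (1 - s)^{d - 1} t^d \right\} ds \, dt.$$

Similarly,

$$\sum_{k=0}^{d} (-1)^k \binom{d}{k} \Psi_2(k, d) = \int_0^1 \int_t^1 -t^d \left\{ (1 - s)^d - d (1 - s)^{d - 1} + d \, s (1 - s)^{d - 1} \right\} ds \, dt.$$

Upon collecting the terms and integrating, one gets the desired conclusion.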
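The finite-sample variance formula (A6) of Appendix 3 lends itself to a quick numerical check. The Python sketch below evaluates (A6) with exact rational arithmetic and compares it with a brute-force enumeration of all rank configurations for small $n$ and $d$. It is only an illustration under stated assumptions: the function names (`pair_sum`, `alternating_sum`, `footrule_variance`, `footrule`, `brute_force_variance`) are hypothetical and not part of the paper, the two auxiliary double sums are coded directly from the expressions entering (A6), and the statistic itself is computed from Equation (A3), which gives $\varphi_n = 1 - \{(d+1)/(d-1)\} \sum_i (\max_j R_{ij} - \min_j R_{ij})/(n^2 - 1)$.

```python
from fractions import Fraction
from itertools import permutations, product


def pair_sum(n, ell, d):
    """First auxiliary double sum of Appendix 3, with exponent ell (n >= 2)."""
    total = Fraction(0)
    for j in range(n + 1):
        for k in range(n + 1):
            m = min(j, k)
            same_row = Fraction(m, n) ** ell
            other_rows = (n - 1) * Fraction(j * k - m, n * n - n) ** ell
            total += Fraction(j, n) ** (d - ell) * (same_row + other_rows)
    return total / n


def alternating_sum(n, d):
    """Second auxiliary double sum, obtained from the binomial identity."""
    total = Fraction(0)
    for j in range(n + 1):
        for k in range(n + 1):
            m = min(j, k)
            a = Fraction(j - m, n)
            b = Fraction(j, n) - Fraction(j * k - m, n * n - n)
            total += a ** d + (n - 1) * b ** d
    return total / n


def footrule_variance(n, d):
    """Variance of the d-variate footrule at independence, per (A6)."""
    s = sum(Fraction(k, n) ** d for k in range(n + 1))
    scale = 2 * Fraction(d + 1, d - 1) ** 2 * Fraction(n, n * n - 1) ** 2
    return scale * (pair_sum(n, d, d) + alternating_sum(n, d) - 2 * s * s)


def footrule(rows):
    """Footrule of n rows of d ranks, using the representation (A3)."""
    n, d = len(rows), len(rows[0])
    spread = sum(max(r) - min(r) for r in rows)
    return 1 - Fraction(d + 1, d - 1) * Fraction(spread, n * n - 1)


def brute_force_variance(n, d):
    """Exact variance by enumeration; the first ranking is fixed
    to the identity, which loses no generality at independence."""
    perms = list(permutations(range(1, n + 1)))
    values = [
        footrule(list(zip(range(1, n + 1), *cols)))
        for cols in product(perms, repeat=d - 1)
    ]
    mean = sum(values, Fraction(0)) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


print(footrule_variance(2, 2))  # prints 1
print(footrule_variance(3, 2))  # prints 5/16
print(footrule_variance(3, 3) == brute_force_variance(3, 3))
```

Exact fractions are used instead of floating point so that agreement between the formula and the enumeration can be tested as an equality. The enumeration grows like $(n!)^{d-1}$ and is therefore meant for very small cases only, whereas the evaluation of (A6) needs on the order of $n^2$ terms.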