Nonparametric tests of independence based on interpoint distances

Lingzhe Guo and Reza Modarres
Department of Statistics, George Washington University, Washington, DC, USA

Journal of Nonparametric Statistics, 2020. https://doi.org/10.1080/10485252.2020.1714613

ABSTRACT
We present novel tests for the hypothesis of independence when the number of variables is larger than the number of vector observations. We show that two multivariate normal vectors are independent if and only if their interpoint distances are independent. The proposed test statistics exploit different properties of the sample interpoint distances. A simulation study compares the new tests with three existing tests under various scenarios, including monotone and nonmonotone dependence structures. Numerical results show that the new methods are effective for independence testing.

ARTICLE HISTORY: Received 2 May 2019; Accepted 6 January 2020
KEYWORDS: Interpoint distance; distance covariance; cut-point
AMS SUBJECT CLASSIFICATIONS: 62G09; 62H15; 62H20; 62H12

1. Introduction

With numerous applications, independence testing is fundamental in statistics and has been extensively studied in the past several decades. However, there is still a need for methods of testing independence in a number of statistical problems where the sample size is smaller than the dimension. High-dimensional datasets are commonplace in finance, robotics, network analysis, medical imaging, machine learning, microarray and DNA analysis, where independence testing is a central issue. Typical examples include microarray analysis, where one assumes independence in the expression levels among different genes, and functional magnetic resonance imaging research, where one is interested in testing the independence of voxels in different parts of the brain. Independent component analysis (ICA) is another application, in which one decomposes multivariate data into mutually independent components. The ICA model is subject to the constraint that at most one of these components is Gaussian, as required for model identifiability.

Classical tests of independence are based on the asymptotic theory in which the sample size n approaches infinity while the dimension p is fixed. Therefore, they are not effective in settings where p approaches infinity. Our main concern is testing the hypothesis of independence when p is much larger than n. We present testing procedures that use the interpoint distances (IPDs), which are inherently one dimensional. These methods are effective for large p and fixed n where p > n.
Let {(Xi, Yi)}_{i=1}^n be independent and identically distributed copies of the random vector pair (X, Y) with joint distribution function (DF) F in R^p and marginal distributions Gx and Gy in R^{p1} and R^{p2}, respectively, where p = p1 + p2. Our aim in this work is to investigate whether X and Y are independent by testing the null hypothesis

  H0: F(x, y) = Gx(x) Gy(y)   for all (x, y) in R^p,   (1)

against general alternatives

  Ha: F(x, y) ≠ Gx(x) Gy(y)   for at least one (x, y) in R^p,   (2)

where p is larger than n.

Consider the random vector Z such that Z' = [X', Y']. For observation vectors that are normally distributed with covariance matrix

  Σ = [ Σxx  Σxy ; Σyx  Σyy ],

the null hypothesis of independence is equivalent to Σxy = 0. More generally, under any distribution where lack of covariance implies independence, testing H0 is equivalent to testing that the covariance Σxy = 0.

Among the earliest problems in the development of multivariate techniques, the classical test of independence is based on the Pearson correlation coefficient ρ when p1 = p2 = 1. Wilks (1935) shows that the likelihood ratio test of H0 is V = |R| / (|Rxx| |Ryy|), where R, Rxx and Ryy are the corresponding sample correlation matrices. Anderson (2003) shows that H0 is equivalent to the regression of one set on the other being zero. Moreover, one does not require normality of X to obtain the distribution of V under H0. The likelihood ratio test is strictly unbiased and is invariant under linear transformations of X and Y. The distribution of V under the null hypothesis can be written as the product of beta random variables. Assuming finite fourth moments, one can show that −n log V approaches a χ² distribution with p1 p2 degrees of freedom as n tends to infinity.

Hoeffding (1948) discusses a Cramér–von Mises type nonparametric test for bivariate independence that is robust to non-monotone dependence. Blum et al. (1961) (BKR) obtained the asymptotic null distribution of an asymptotically equivalent form of Hoeffding's statistic. Given n bivariate pairs (xi, yi), the BKR statistic is defined as

  Bn = (1/n) Σ_{i=1}^{n} [ F̂(xi, yi) − Ĝx(xi) Ĝy(yi) ]²,   (3)

where F̂, Ĝx and Ĝy denote the joint and marginal empirical distribution functions, respectively.

Classical dependence measures such as the Pearson, Spearman or Kendall correlations have been used for detecting linear or monotone dependence structures. Han et al. (2017) study two families of distribution-free test statistics, which include Kendall's and Spearman's correlations. See also Mao (2017). Puri and Sen (1971) present a nonparametric analogue of Wilks' test based on componentwise ranking. Friedman and Rafsky (1983) present graph-based tests of independence. Generalising the univariate quadrant test of Blomqvist (1950), Gieser and Randles (1997) propose a nonparametric test based on interdirection counts. Feuerverger (1993) presents a consistent bivariate rank test of independence. Taskinen et al. (2003, 2005) propose multivariate tests of independence based on spatial signs and ranks. Modarres (2007) presents a novel test of independence based on the likelihood of cut-points. Bagkavos and Patil (2017) propose a bivariate test of independence based on the property that, under independence, every quantile of one variable given the other is constant. Fan et al. (2017) propose a test based on a Cramér–von Mises type functional of a process defined from the empirical characteristic functions.
Perhaps the most popular multivariate independence test is based on the distance covariance proposed by Szekely et al. (2007). They use characteristic functions to transform high-dimensional data into one dimension. Zhu et al. (2017) argue that the distance covariance may not work well for high-dimensional data and propose the projection covariance for that setting.

This article presents novel methods of testing H0 based on the sample interpoint distances when p is larger than n and compares the methods with existing tests in the literature. Section 2 introduces seven new test statistics using the EDF of the IPDs, the method of cut-points, the odds ratio, the correlation of IPDs, the EDF in each quadrant and two normalising transformations of the IPDs. Section 3 compares the new tests with Dcor, HHG and Pcov. The last section is devoted to a summary and conclusions. The R code of the test statistics considered in this paper is available from the corresponding author.

2. IPD tests of independence

Consider two random vectors X and Y. Let ||X|| = (X'X)^{1/2} and ||Y|| = (Y'Y)^{1/2} be the Euclidean norms of X and Y. Heller et al. (2013) show that X and Y are dependent if there exist two vectors x0 and y0 and two constants rx, ry such that

  P(||X − x0|| < rx, ||Y − y0|| < ry) ≠ P(||X − x0|| < rx) P(||Y − y0|| < ry).

Because x0, y0, rx and ry are not known, Heller et al. (2013) take each sample point in turn to be x0, y0 and use all the possible interpoint distances as rx, ry.

Consider i.i.d. vector pairs {(Xi, Yi)}_{i=1}^n. The interpoint distance between Xi and Xj is d(x)_{ij} = ||Xi − Xj||, and the interpoint distance between Yi and Yj is d(y)_{ij} = ||Yi − Yj||, for 1 ≤ i < j ≤ n. Since the d(x)_{ij} are identically distributed, we use d(X) to denote the interpoint distance between the Xs. Similarly, the random variable d(Y) represents the interpoint distance of the Ys. Thus the hypothesis we are interested in is

  H0': P(d(X) ≤ x, d(Y) ≤ y) = P(d(X) ≤ x) P(d(Y) ≤ y)   for all (x, y) in R².   (4)

There are m = n(n − 1)/2 pairs of IPDs (d(x)_{ij}, d(y)_{ij}). We refer to any X IPD with the random variable d(X), with observed values {d(x)_{ij}} for 1 ≤ i < j ≤ n. Similarly, we refer to any Y IPD with the random variable d(Y), with observed values d(y)_{ij}. Two X IPDs d(x)_{ij} and d(x)_{kh} are dependent if they have an index in common. With n observations, there are n(n − 1)(n − 2)/2 dependent pairs of distances.

The following theorem shows that under the multivariate normal distribution, the independence of two vectors is equivalent to the independence of the squared interpoint distances of the two vectors.

Theorem 2.1: Let (X1, Y1), (X2, Y2) be i.i.d. copies of the random vector (X, Y) with joint multivariate normal distribution F in R^p and marginal distributions Gx and Gy in R^{p1} and R^{p2}, respectively, where p = p1 + p2. Suppose d²(X) = ||X1 − X2||² and d²(Y) = ||Y1 − Y2||². The vectors X and Y are independent if and only if the interpoint distances d²(X) and d²(Y) are independent.

The following proposition shows that when the increase in dimension involves correlated variables in the two random vectors, the covariance between the interpoint distances increases.
Proposition 2.1: Let (X1, Y1), (X2, Y2) be i.i.d. copies of the random vector (X, Y) with joint multivariate normal distribution F in R^p and marginal distributions Gx and Gy in R^{p1} and R^{p2}, respectively, where p = p1 + p2. Suppose that for every variable xi in X there is at least one variable (denoted y(xi)) in Y such that the absolute value of their covariance is larger than a positive constant ε, i.e. |cov(xi, y(xi))| > ε > 0 for i = 1, ..., p1. Let d²(X) = ||X1 − X2||² and d²(Y) = ||Y1 − Y2||². When p1 goes to infinity, the covariance between d²(X) and d²(Y) goes to infinity.

Let Σx and Σy be the covariance matrices of X and Y, respectively. The correlation coefficient between the two random vectors is defined as

  Corr(X, Y) = tr(Σxy Σxy') / [ tr(Σx²) tr(Σy²) ]^{1/2}

(Robert et al. 1985). One can show that under the normal distribution, the correlation coefficient between two random vectors is the same as the correlation between their corresponding interpoint distances, i.e. Corr(X, Y) = Corr(d²(X), d²(Y)). However, this equation may not hold under other distributions where the dependence between X and Y is not linear.

Consider the Box–Cox power transformation defined by Yj = (Xj^{λj} − 1)/λj when λj ≠ 0 and Yj = ln Xj when λj = 0, for each pair of variables Xj, Yj in X = (X1, X2, ..., Xp) and Y = (Y1, Y2, ..., Yp), respectively. Assuming that the random vector Y has a p-variate normal distribution with mean µ and covariance matrix Σ, the random vector X defined by the inverse transformations is said to have a multivariate power normal distribution. Freeman and Modarres (2006) show that the p.d.f. of a multivariate power normal distribution is

  f(X | λ, µ, Σ) = (1/K) (2π)^{−p/2} |Σ|^{−1/2} exp{ −(1/2) (Y − µ)' Σ^{−1} (Y − µ) } ∏_{j=1}^{p} Xj^{λj − 1},

where K is a normalising constant that depends on λ, µ and Σ. Practitioners often assume K = 1. Under the multivariate power normal distribution, the dependence between the variables is no longer linear. The following theorem shows that such non-linear dependence is equivalent to dependence between the interpoint distances.

Theorem 2.2: Let (W1, Z1), (W2, Z2) be i.i.d. copies of the random vector (W, Z) with a joint power normal distribution in R^p and marginal distributions Gw and Gz in R^{p1} and R^{p2}, respectively, where p = p1 + p2. Suppose d²(W) = ||W1 − W2||² and d²(Z) = ||Z1 − Z2||². The vectors W and Z are independent if and only if the interpoint distances d²(W) and d²(Z) are independent.

This result shows that statistics based on interpoint distances can not only detect linear dependence but also find non-linear relationships that are induced on the power normal scale. The equivalence of the distributions' independence and the interpoint distances' independence under other distributions requires further research. However, in the simulations we see that the tests based on interpoint distances also have good power for other types of dependence.

2.1. IPD EDF method (EDF)

Denote the marginal DFs of d(X) and d(Y) and their joint DF by Hx(x) = P(d(X) ≤ x), Hy(y) = P(d(Y) ≤ y) and Hxy(x, y) = P(d(X) ≤ x, d(Y) ≤ y), respectively. Let I(·) denote the indicator function and estimate the DFs by

  Ĥx(x) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} ≤ x),
  Ĥy(y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(y)_{st} ≤ y),
  Ĥxy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} ≤ x) I(d(y)_{st} ≤ y),   (5)

where m = n(n − 1)/2. We obtain m estimates of the joint and marginal distributions of the IPDs at (d(x)_{ij}, d(y)_{ij}) for 1 ≤ i < j ≤ n.
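Since the IPDs are the building blocks of all the statistics that follow, the minimal R sketch below (our illustration, not the authors' code, which is available from the corresponding author) computes the m observed IPD pairs and the estimates in (5) evaluated at each observed pair; the helper name ipd_edf is hypothetical.

  ## X is an n x p1 matrix, Y is an n x p2 matrix of paired observations.
  ipd_edf <- function(X, Y) {
    dx <- as.vector(dist(X))          # d(x)_ij, 1 <= i < j <= n
    dy <- as.vector(dist(Y))          # d(y)_ij, same pair ordering as dx
    m  <- length(dx)                  # m = n(n - 1)/2
    ## Equation (5): empirical marginal and joint DFs at each observed IPD pair
    Hx  <- sapply(dx, function(u) mean(dx <= u))
    Hy  <- sapply(dy, function(v) mean(dy <= v))
    Hxy <- mapply(function(u, v) mean(dx <= u & dy <= v), dx, dy)
    list(dx = dx, dy = dy, Hx = Hx, Hy = Hy, Hxy = Hxy, m = m)
  }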
To measure the inability of the bivariate DF H to factorise into a product of marginal DFs at each pair of IPDs (d(x)_{ij}, d(y)_{ij}), we estimate the total squared deviation from independence and define the test statistic

  EDF = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} [ Ĥxy(d(x)_{ij}, d(y)_{ij}) − Ĥx(d(x)_{ij}) Ĥy(d(y)_{ij}) ]².   (6)

The range of the statistic is 0 ≤ EDF ≤ n(n − 1)/2. Under H0 we expect EDF to be small, and we reject H0 for large values of EDF. The statistic EDF is similar in spirit to the statistic Bn in Equation (3), but with a fundamental difference: while Bn is evaluated at the n observed i.i.d. pairs (xi, yi), the m pairs (d(x)_{ij}, d(y)_{ij}) that construct EDF are not independent. Thus the null distribution of EDF cannot be obtained asymptotically as in the i.i.d. case. Instead, we use random permutation testing to draw conclusions with EDF, as follows. Let R be the total number of permutations. In the rth permutation, for r = 1, ..., R, we reorder the observations {Yi}_{i=1}^n as {Yi^{(r)}}_{i=1}^n, consider the sample pairs {(Xi, Yi^{(r)})}_{i=1}^n and calculate the statistic EDF^{(r)}. When the null hypothesis of independence is true, the distribution of EDF^{(r)} is the same as the distribution of EDF. Thus we can obtain the empirical distribution of EDF based on the values of the statistics EDF^{(r)}. The permutation test yields exact critical values if we consider all possible permutations; however, unless the sample size is small, this is not computationally feasible. Instead, we obtain an approximate value by performing a sequence of R random permutations. Since we reject the null hypothesis for large values of EDF, the p-value is calculated by

  p-value(EDF) = (1/R) Σ_{r=1}^{R} I(EDF < EDF^{(r)}).
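As a self-contained sketch (assumed function names edf_stat and edf_perm_test are ours), the following R code evaluates the statistic in (6) and its random-permutation p-value; the nested scans make the cost O(n^4), consistent with the complexity reported later in Table 9.

  edf_stat <- function(X, Y) {
    dx <- as.vector(dist(X)); dy <- as.vector(dist(Y))
    Hx  <- sapply(dx, function(u) mean(dx <= u))
    Hy  <- sapply(dy, function(v) mean(dy <= v))
    Hxy <- mapply(function(u, v) mean(dx <= u & dy <= v), dx, dy)
    sum((Hxy - Hx * Hy)^2)                       # Equation (6)
  }

  edf_perm_test <- function(X, Y, R = 999) {
    obs  <- edf_stat(X, Y)
    perm <- replicate(R, edf_stat(X, Y[sample(nrow(Y)), , drop = FALSE]))
    mean(perm > obs)                             # reject H0 for large EDF
  }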
2.2. IPD cut-point method (CP)

Modarres (2007) describes a test statistic based on the sum of the likelihood ratio statistics for testing independence in 2 × 2 tables defined at the n sample cut-points (xi, yi), for testing the hypothesis H0 in (1) when p1 = p2 = 1. The IPD cut-point method extends this idea to high-dimensional independence testing using the IPDs. Consider an observed IPD pair (d(x)_{ij}, d(y)_{ij}) on the plane and the four regions obtained by the two lines passing through it and parallel to the axes. Let H̄xy(d(x), d(y)) = 1 − Hx(d(x)) − Hy(d(y)) + Hxy(d(x), d(y)) be the bivariate survival function of the IPDs. Table 1 shows the probabilities and estimates of the observations that fall in each of the four regions around (d(x)_{ij}, d(y)_{ij}). Under random sampling, the joint distribution of the frequencies in the four quadrants follows a multinomial distribution at each fixed cut-point (d(x)_{ij}, d(y)_{ij}). With each observation (d(x)_{ij}, d(y)_{ij}), 1 ≤ i < j ≤ n, we associate a 2 × 2 table with elements m_{rk(ij)}/m for r, k = 1, 2 and calculate

  m11(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} ≤ d(x)_{ij}) I(d(y)_{st} ≤ d(y)_{ij}),
  m12(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} ≤ d(x)_{ij}) I(d(y)_{st} > d(y)_{ij}),
  m21(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} > d(x)_{ij}) I(d(y)_{st} ≤ d(y)_{ij}),
  m22(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} > d(x)_{ij}) I(d(y)_{st} > d(y)_{ij}),
  m1+(ij) = m11(ij) + m12(ij),  m2+(ij) = m21(ij) + m22(ij),
  m+1(ij) = m11(ij) + m21(ij),  m+2(ij) = m12(ij) + m22(ij).   (7)

Recall that m = n(n − 1)/2 is the total number of interpoint distance pairs. One can show that m11(ij) + m12(ij) + m21(ij) + m22(ij) = m. Let c_{ij} = m! / (m11(ij)! m12(ij)! m21(ij)! m22(ij)!). The likelihood at each fixed observation (d(x)_{ij}, d(y)_{ij}) is defined as

  L_{ij} = c_{ij} Hxy(d(x)_{ij}, d(y)_{ij})^{m11(ij)} [Hx(d(x)_{ij}) − Hxy(d(x)_{ij}, d(y)_{ij})]^{m12(ij)} [Hy(d(y)_{ij}) − Hxy(d(x)_{ij}, d(y)_{ij})]^{m21(ij)} H̄xy(d(x)_{ij}, d(y)_{ij})^{m22(ij)}.   (8)

The likelihood ratio test of independence at each (d(x)_{ij}, d(y)_{ij}) is

  λ_{ij} = ∏_{r=1}^{2} ∏_{s=1}^{2} [ m_{r+(ij)} m_{+s(ij)} / (m m_{rs(ij)}) ]^{m_{rs(ij)}}.   (9)

The cut-point test statistic is CP = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} −2 log(λ_{ij}), or

  CP = 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Σ_{r=1}^{2} Σ_{s=1}^{2} m_{rs(ij)} log[ m m_{rs(ij)} / (m_{r+(ij)} m_{+s(ij)}) ].   (10)

When n → ∞ and for fixed p, the asymptotic distribution of −2 log(λ_{ij}) is χ² with one degree of freedom. Furthermore, the λ_{ij} are dependent, as they depend on the cut-points (d(x)_{ij}, d(y)_{ij}).

Table 1. The 2 × 2 table of probabilities and their estimates at (d(x)_{ij}, d(y)_{ij}).
  Cell (d(X) ≤ d(x)_{ij}, d(Y) ≤ d(y)_{ij}): probability Hxy(d(x)_{ij}, d(y)_{ij}); estimate m11(ij)/m.
  Cell (d(X) ≤ d(x)_{ij}, d(Y) > d(y)_{ij}): probability Hx(d(x)_{ij}) − Hxy(d(x)_{ij}, d(y)_{ij}); estimate m12(ij)/m.
  Cell (d(X) > d(x)_{ij}, d(Y) ≤ d(y)_{ij}): probability Hy(d(y)_{ij}) − Hxy(d(x)_{ij}, d(y)_{ij}); estimate m21(ij)/m.
  Cell (d(X) > d(x)_{ij}, d(Y) > d(y)_{ij}): probability H̄xy(d(x)_{ij}, d(y)_{ij}); estimate m22(ij)/m.
  Row margins: Hx(d(x)_{ij}) [m1+(ij)/m] and 1 − Hx(d(x)_{ij}) [m2+(ij)/m]; column margins: Hy(d(y)_{ij}) [m+1(ij)/m] and 1 − Hy(d(y)_{ij}) [m+2(ij)/m]; total 1.

The statistic CP computes the total likelihood over all m cut-points, and the m individual likelihood ratio test statistics −2 log(λ_{ij}) are available to further classify regions of positive (negative) or no dependence. In practice, one uses random permutation testing to obtain the sampling distribution of CP.

2.3. IPD odds ratio (Odds)

Fleiss et al. (2013) describe the odds ratio test for the independence of two variables where each variable has exactly two possible outcomes. Agresti (1980) proposed generalised odds ratios for ordinal data. In this section, we apply odds ratios to the interpoint distances. Considering the 2 × 2 table (Table 1), at each point (d(x)_{ij}, d(y)_{ij}) we have the odds ratio

  θ(xy)_{ij} = Hxy(d(x)_{ij}, d(y)_{ij}) H̄xy(d(x)_{ij}, d(y)_{ij}) / { [Hx(d(x)_{ij}) − Hxy(d(x)_{ij}, d(y)_{ij})] [Hy(d(y)_{ij}) − Hxy(d(x)_{ij}, d(y)_{ij})] },   (11)

where 0 < θ(xy)_{ij} < ∞. If the odds ratio θ(xy)_{ij} is larger than 1, then the interpoint distances of the Xs and the Ys at (d(x)_{ij}, d(y)_{ij}) are positively associated, in the sense that the X IPD being at most d(x)_{ij} raises the odds of the Y IPD being at most d(y)_{ij}. If θ(xy)_{ij} is less than 1, then the X and Y IPDs are negatively correlated. Under independence, the odds ratio θ(xy)_{ij} = 1 for all (d(x)_{ij}, d(y)_{ij}) in R². The 2 × 2 table of probabilities and estimates at (d(x)_{ij}, d(y)_{ij}) appears in Table 1. We estimate the odds ratio at each (d(x)_{ij}, d(y)_{ij}) by θ̂(xy)_{ij} = (m11(ij) m22(ij)) / (m12(ij) m21(ij)). We consider the average odds ratio and propose the statistic

  Odds = (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} θ̂(xy)_{ij}.   (12)

If the alternative hypothesis is positive correlation between the IPDs, the null hypothesis of independence is rejected for large values of Odds. If we consider the alternative of negative correlation between the IPDs, we reject the null hypothesis for small values of Odds. Otherwise, we consider the difference between Odds and 1. The distribution of Odds under the null is calculated based on the permutation principle.
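The quadrant counts in (7) drive both the CP statistic (10) and the Odds statistic (12). The following R sketch computes both in one pass over the m cut-points; cp_odds_stats is an illustrative name, zero cells contribute 0 to the CP sum, and the Haldane-style 0.5 correction that keeps the sample odds ratio finite when a cell is empty is our assumption, not part of the paper.

  cp_odds_stats <- function(X, Y) {
    dx <- as.vector(dist(X)); dy <- as.vector(dist(Y))
    m  <- length(dx)                              # m = n(n - 1)/2 cut-points
    cp <- 0; odds <- numeric(m)
    for (k in seq_len(m)) {
      ## 2 x 2 counts of Equation (7) at the cut-point (dx[k], dy[k])
      m11 <- sum(dx <= dx[k] & dy <= dy[k])
      m12 <- sum(dx <= dx[k] & dy >  dy[k])
      m21 <- sum(dx >  dx[k] & dy <= dy[k])
      m22 <- m - m11 - m12 - m21
      counts <- c(m11, m12, m21, m22)
      expect <- c((m11 + m12) * (m11 + m21), (m11 + m12) * (m12 + m22),
                  (m21 + m22) * (m11 + m21), (m21 + m22) * (m12 + m22)) / m
      nz <- counts > 0
      cp <- cp + 2 * sum(counts[nz] * log(counts[nz] / expect[nz]))  # -2 log(lambda_ij)
      ## 0.5 continuity correction (our assumption) to avoid division by zero
      odds[k] <- ((m11 + 0.5) * (m22 + 0.5)) / ((m12 + 0.5) * (m21 + 0.5))
    }
    c(CP = cp, Odds = mean(odds))                 # Equations (10) and (12)
  }

Both statistics would then be referred to their permutation distributions, as described above for EDF.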
2.4. IPD correlation test (Coripd)

The premier test of independence under bivariate normality is based on the sample Pearson correlation coefficient r. Using a geometric argument, Fisher (1915) found the exact distribution of the correlation coefficient. Pitman (1937) obtains the permutation distribution of r, shows that it is symmetric with zero mean and variance 1/(n − 1), and proves that r is asymptotically normal as the sample size becomes unbounded. Having the distinction of being uniformly most powerful unbiased and uniformly most powerful invariant for testing the independence of bivariate normal samples, the statistic r is designed to detect linear dependence.

Let {Zi = [Xi', Yi']'}_{i=1}^n be the sample in R^p. Under some regularity conditions given in Theorem 1 of Li (2018), as the dimension p tends to infinity the squared interpoint distances ||Zi − Zj||², for 1 ≤ i < j ≤ n, converge to a normal distribution. It follows that the random vector (d(X), d(Y)) converges to a bivariate normal distribution as p tends to infinity.

Consider the covariance and the correlation of the pair (d²(X), d²(Y)). Denote the mean vector of (d(X), d(Y)) by (µ_d(x), µ_d(y)) and estimate it by d̄(x) = (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} d(x)_{ij} and d̄(y) = (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} d(y)_{ij}, respectively. Using the m observed pairs (d(x)_{ij}, d(y)_{ij}), we estimate the Pearson correlation between d(X) and d(Y) by

  Coripd = Ĉorr(d(X), d(Y)) = (1/(sx sy)) { (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} d(x)_{ij} d(y)_{ij} − d̄(x) d̄(y) },   (13)

where sx and sy are the standard deviations of the d(x)_{ij} and the d(y)_{ij} for 1 ≤ i < j ≤ n, respectively. The range of the statistic is −1 ≤ Coripd ≤ 1. If the alternative hypothesis is positive correlation between the IPDs, we reject the null hypothesis for large values of Coripd. For negative correlation as the alternative, the null hypothesis is rejected for small values of Coripd. If the direction of the alternative is unknown, we consider the absolute value of Coripd. We obtain the sampling distribution of Coripd from random permutations.

Suppose {Zi' = [Xi', Yi']}_{i=1}^n is a sample of p-dimensional random vectors drawn from a normal distribution with mean (µx, µy) and covariance matrix Σ = [ Σxx Σxy ; Σyx Σyy ]. The following lemma shows that the distribution of the squared IPD between any two random vectors within the X sample is a linear combination of p1 independent central chi-square random variables with 1 degree of freedom.

Lemma 2.1: Let X1, X2 be i.i.d. copies of a random vector X with multivariate normal distribution Gx with mean µx and covariance Σxx. Suppose d²(X) = ||X1 − X2||². Then d²(X) = Σ_{t=1}^{p1} λt Ut², where λ1, ..., λ_{p1} are the eigenvalues of 2Σxx and Ut² ~ χ²(1).

Based on Lemma 2.1, the expected value and variance of d²(X) are 2tr(Σxx) and 8tr(Σxx²), respectively. Similarly, the expected value and variance of d²(Y) are 2tr(Σyy) and 8tr(Σyy²), respectively. The proof of Theorem 2.1 shows that the covariance between d²(X) and d²(Y) is Cov(d²(X), d²(Y)) = 8tr(Σxy' Σxy). Thus the correlation is Corr(d²(X), d²(Y)) = tr(Σxy' Σxy) / [ tr(Σxx²) tr(Σyy²) ]^{1/2}.

Let r_{ts} be the sample correlation between the tth component of X and the sth component of Y. Schott (2005) proposed the statistic T = Σ_{t=1}^{p1} Σ_{s=1}^{p2} r_{ts}² to test the independence of two normal vectors. Under multivariate normality, the asymptotic distribution of this statistic is considered by Schott (2008) and Mao (2014). Nonparametric versions of this statistic using Spearman's and Kendall's correlations have also been examined in the literature. Compared with statistics that use the correlations between the components of X and Y directly, the estimate of Corr(d²(X), d²(Y)) not only uses the information in Σxy but also incorporates the correlations within each random vector.
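The statistic in (13) is simply the Pearson correlation of the two vectors of observed IPDs, so a sketch is short; the function names below are illustrative, and the two-sided version uses |Coripd| as discussed above.

  cor_ipd <- function(X, Y) cor(as.vector(dist(X)), as.vector(dist(Y)))

  ## two-sided permutation p-value based on |Coripd|
  cor_ipd_test <- function(X, Y, R = 999) {
    obs <- abs(cor_ipd(X, Y))
    mean(replicate(R, abs(cor_ipd(X, Y[sample(nrow(Y)), , drop = FALSE]))) >= obs)
  }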
2.5. IPD quadrant test (Quad)

While there is a unique definition of the cumulative distribution function (c.d.f.) of a random variable, there is not a unique definition of the c.d.f. of a multivariate random vector. The distribution function is uniquely defined when the dimension is p = 1, since P(X ≤ x) = 1 − P(X > x). This is no longer the case when p ≥ 2: with p variables, there are 2^p − 1 ways of defining a distribution function. Consider the bivariate case, where one may define the c.d.f. of the pair (x, y) based on how the observations accumulate in the four regions around (x, y). In the multivariate case, we may use IPDs to define the c.d.f. as P(d(X) ≤ x, d(Y) ≤ y), P(d(X) ≤ x, d(Y) > y), P(d(X) > x, d(Y) ≤ y) or P(d(X) > x, d(Y) > y). To test for independence between X and Y, we take all four possible orderings of the IPDs into account. The criteria to measure the dependence at each (x, y) are

  θ1(x, y) = P(d(X) ≤ x, d(Y) ≤ y) − P(d(X) ≤ x) P(d(Y) ≤ y),
  θ2(x, y) = P(d(X) ≤ x, d(Y) > y) − P(d(X) ≤ x) P(d(Y) > y),
  θ3(x, y) = P(d(X) > x, d(Y) ≤ y) − P(d(X) > x) P(d(Y) ≤ y),
  θ4(x, y) = P(d(X) > x, d(Y) > y) − P(d(X) > x) P(d(Y) > y).

It is clear that Σ_{k=1}^{4} θk(x, y) = 0. Under the null hypothesis of independence, θk(x, y) = 0 for k = 1, ..., 4. The null hypothesis is rejected if any one of θ1(x, y), ..., θ4(x, y) has a large absolute value. We estimate θk with θ̂k, where

  θ̂1(x, y) = Ĥxy(x, y) − Ĥx(x) Ĥy(y),
  θ̂2(x, y) = Ĥ2xy(x, y) − Ĥx(x) (1 − Ĥy(y)),
  θ̂3(x, y) = Ĥ3xy(x, y) − (1 − Ĥx(x)) Ĥy(y),
  θ̂4(x, y) = Ĥ4xy(x, y) − (1 − Ĥx(x)) (1 − Ĥy(y)),

and

  Ĥ2xy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} ≤ x) I(d(y)_{st} > y),
  Ĥ3xy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} > x) I(d(y)_{st} ≤ y),
  Ĥ4xy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)_{st} > x) I(d(y)_{st} > y).

To test H0 we use the test statistic

  Quad = max_{1 ≤ i < j ≤ n, 1 ≤ k ≤ 4} θ̂k²(d(x)_{ij}, d(y)_{ij}).   (14)

Ranging over the four quadrants and the m observed IPD pairs, the statistic Quad is the maximum discrepancy between the estimated c.d.f. and its factorised version. The range of Quad is 0 ≤ Quad ≤ 1, and we reject the null hypothesis for a large value of Quad. We use random permutations to obtain the distribution of Quad.
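A short R sketch of (14), with the illustrative name quad_stat, scans the m observed cut-points and keeps the largest squared deviation over the four quadrants.

  quad_stat <- function(X, Y) {
    dx <- as.vector(dist(X)); dy <- as.vector(dist(Y))
    worst <- 0
    for (k in seq_along(dx)) {
      px <- mean(dx <= dx[k]); py <- mean(dy <= dy[k])
      ## deviations from independence in the four quadrants around (dx[k], dy[k])
      th <- c(mean(dx <= dx[k] & dy <= dy[k]) - px * py,
              mean(dx <= dx[k] & dy >  dy[k]) - px * (1 - py),
              mean(dx >  dx[k] & dy <= dy[k]) - (1 - px) * py,
              mean(dx >  dx[k] & dy >  dy[k]) - (1 - px) * (1 - py))
      worst <- max(worst, max(th^2))
    }
    worst    # Quad of Equation (14); compare to its permutation distribution
  }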
2.6. IPD transformation tests

In this section, we introduce two further test statistics based on transforming the marginal distributions to normality. It is well known that marginal transformation to normality enhances the power of the normal test for independence; Fisher's Z-transformation is an example. Kowalski and Tarter (1969) consider properties of the normal theory test for independence applied to observations from bivariate distributions constrained to have normal marginals and show that the normal theory test for independence is more powerful when applied to the transformed vectors. For a random variable X with a continuous DF Hx, the transformation Wx = Φ^{−1}(Hx(X)), where Φ^{−1} is the standard normal quantile function, produces normally distributed variates.

We compute the empirical distribution functions of the IPDs using (5) and then form

  Trans = Ĉorr( Φ^{−1}(Ĥx(x)), Φ^{−1}(Ĥy(y)) )
        = (1/(σ̂1 σ̂2)) { (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Φ^{−1}(Ĥx(d(x)_{ij})) Φ^{−1}(Ĥy(d(y)_{ij})) − µ̂1 µ̂2 },   (15)

where m = n(n − 1)/2, µ̂1 and σ̂1 are the mean and standard deviation of the Φ^{−1}(Ĥx(d(x)_{ij})), and µ̂2 and σ̂2 are the mean and standard deviation of the Φ^{−1}(Ĥy(d(y)_{ij})), for 1 ≤ i < j ≤ n. Trans measures the correlation between the IPDs of X and Y after normalising transformations of the margins. The distribution of Trans is obtained by the permutation method.

Other transformations to normality, such as the cube root transformation, have also been suggested in the literature. Freeman and Modarres (2005) consider the efficiency and the power of the normal theory test for independence after a Box–Cox transformation and obtain an expression for the correlation between the variates after the transformation in terms of the correlation on the normal scale. They show that it is always more efficient to conduct the test of independence based on the Pearson correlation coefficient after transformation to normality. Wilson and Hilferty (1931) consider the cube root transformation of a chi-squared variable to normality. Lemma 2.1 shows that the distribution of interpoint distances drawn from multivariate normal vectors is a linear combination of chi-squared random variables, which is often approximated using a chi-squared distribution. The distribution of IPDs is of chi-squared type for observations from other distributions as well (Song and Modarres 2019). Hence, we consider a test based on the correlation of the cube roots of the IPDs. Let d(x)^{1/3} and d(y)^{1/3} denote the cube roots of the sample IPDs. The cube root transformation computes

  IPDcube = Ĉorr( d(x)^{1/3}, d(y)^{1/3} ).   (16)

The correlation estimate is calculated as in Equation (15). We use random permutations to obtain the approximate sampling distribution of IPDcube.
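The sketch below computes both transformation statistics (15) and (16). The empirical DFs are evaluated through ranks and rescaled by m/(m + 1) so that qnorm() stays finite at the largest IPD; that rescaling, like the function name, is our assumption rather than a detail given in the paper.

  trans_stats <- function(X, Y) {
    dx <- as.vector(dist(X)); dy <- as.vector(dist(Y))
    m  <- length(dx)
    Fx <- rank(dx) / (m + 1)                  # rescaled empirical DF at the IPDs
    Fy <- rank(dy) / (m + 1)
    c(Trans   = cor(qnorm(Fx), qnorm(Fy)),    # Equation (15)
      IPDcube = cor(dx^(1/3), dy^(1/3)))      # Equation (16)
  }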
3. Comparisons

A desirable method should maintain its power against all forms of dependence in the data. We consider monotone dependence structures under different distributions in Section 3.2 and some non-monotone dependence structures in Section 3.3. Three existing methods are described in Section 3.1 and are used for comparison.

3.1. Dcor, HHG and Pcov methods

Szekely et al. (2007) propose the distance correlation to measure the dependence between random vectors. Suppose f_{X,Y}(t, s), f_X(t) and f_Y(s) are the joint and marginal characteristic functions of the random vectors X and Y with finite first moments. The distance covariance (dCov) between X and Y and their distance variances are defined by

  V(X, Y) = || f_{X,Y}(t, s) − f_X(t) f_Y(s) ||_{L2},
  V(X) = || f_{X,X}(t, s) − f_X(t) f_X(s) ||_{L2},
  V(Y) = || f_{Y,Y}(t, s) − f_Y(t) f_Y(s) ||_{L2},

where ||·||_{L2} is a weighted L2 norm of the difference between the characteristic functions. The distance correlation (Dcor) is defined as

  R(X, Y) = V(X, Y) / [ V(X) V(Y) ]^{1/2}   if V(X) V(Y) > 0, and R(X, Y) = 0 if V(X) V(Y) = 0.

Szekely et al. (2007) prove that when the Euclidean norms of X and Y have finite expectations, the random vectors X and Y are independent if and only if R(X, Y) = 0. Motivated by this property, they propose the empirical distance correlation Rn(X, Y) for samples X = {Xi}_{i=1}^n and Y = {Yi}_{i=1}^n. The method finds the distance matrices of the X and Y samples and centres them so that each has column and row means equal to zero. By averaging the entries of the componentwise product of the two centred distance matrices, the squared distance covariance between the two samples is found. Define

  a_{kl} = ||Xk − Xl||,  ā_{k·} = (1/n) Σ_{l=1}^{n} a_{kl},  ā_{·l} = (1/n) Σ_{k=1}^{n} a_{kl},  ā_{··} = (1/n²) Σ_{k,l=1}^{n} a_{kl},
  A_{kl} = a_{kl} − ā_{k·} − ā_{·l} + ā_{··},
  b_{kl} = ||Yk − Yl||,  b̄_{k·} = (1/n) Σ_{l=1}^{n} b_{kl},  b̄_{·l} = (1/n) Σ_{k=1}^{n} b_{kl},  b̄_{··} = (1/n²) Σ_{k,l=1}^{n} b_{kl},
  B_{kl} = b_{kl} − b̄_{k·} − b̄_{·l} + b̄_{··},

for k, l = 1, ..., n. The empirical distance covariance and the empirical distance variances are

  Vn(X, Y) = [ (1/n²) Σ_{k,l=1}^{n} A_{kl} B_{kl} ]^{1/2},
  Vn(X) = [ (1/n²) Σ_{k,l=1}^{n} A_{kl}² ]^{1/2},
  Vn(Y) = [ (1/n²) Σ_{k,l=1}^{n} B_{kl}² ]^{1/2}.

The empirical distance correlation is

  Rn(X, Y) = Vn(X, Y) / [ Vn(X) Vn(Y) ]^{1/2}   if Vn(X) Vn(Y) > 0, and Rn(X, Y) = 0 if Vn(X) Vn(Y) = 0.

Szekely et al. (2007) prove that, almost surely, lim_{n→∞} Rn(X, Y) = R(X, Y). Therefore, it is straightforward to use Rn to measure dependence based on the samples X and Y. The R package 'energy' implements independence tests based on the distance correlation, and we use Dcor to denote this method in the tables.

Heller et al. (2013) propose 2 × 2 contingency tables similar to those of Modarres (2007), replacing the distribution functions by the probability that (X, Y) falls in balls centred at (Xi, Yi) with some radius, for i = 1, ..., n. Based on Pearson's test, Heller et al. (2013) construct an R package named 'HHG' with the statistic

  HHG = Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} (n − 2) { m12(ij) m21(ij) − m11(ij) m22(ij) }² / [ m1+(ij) m2+(ij) m+1(ij) m+2(ij) ],

where m11(ij), m12(ij), m21(ij), m22(ij), m1+(ij), m2+(ij), m+1(ij) and m+2(ij) are defined in (7).

To handle high-dimensional data, Zhu et al. (2017) project the data into low dimensions and then apply the Pearson correlation to test independence. They show that two random vectors are independent if and only if all of their projections are independent. Motivated by this equivalence, they define the squared projection covariance between two random vectors X and Y and propose the test statistic

  Pcov = n^{−3} Σ_{k,l,r=1}^{n} A_{klr} B_{klr},

where

  a_{klr} = arccos[ (Xk − Xr)'(Xl − Xr) / ( ||Xk − Xr|| ||Xl − Xr|| ) ],
  ā_{k·r} = n^{−1} Σ_{l=1}^{n} a_{klr},  ā_{·lr} = n^{−1} Σ_{k=1}^{n} a_{klr},  ā_{··r} = n^{−2} Σ_{k=1}^{n} Σ_{l=1}^{n} a_{klr},
  A_{klr} = a_{klr} − ā_{k·r} − ā_{·lr} + ā_{··r},
  b_{klr} = arccos[ (Yk − Yr)'(Yl − Yr) / ( ||Yk − Yr|| ||Yl − Yr|| ) ],
  b̄_{k·r} = n^{−1} Σ_{l=1}^{n} b_{klr},  b̄_{·lr} = n^{−1} Σ_{k=1}^{n} b_{klr},  b̄_{··r} = n^{−2} Σ_{k=1}^{n} Σ_{l=1}^{n} b_{klr},
  B_{klr} = b_{klr} − b̄_{k·r} − b̄_{·lr} + b̄_{··r}.
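The empirical distance correlation defined above is easy to compute directly from the double-centred distance matrices; the sketch below is our illustration (in practice one would call the 'energy' package, e.g. its dcov.test function).

  dbl_center <- function(D) {
    D - matrix(rowMeans(D), nrow(D), ncol(D)) -
        matrix(colMeans(D), nrow(D), ncol(D), byrow = TRUE) + mean(D)
  }

  dcor_hat <- function(X, Y) {
    A <- dbl_center(as.matrix(dist(X)))           # A_kl
    B <- dbl_center(as.matrix(dist(Y)))           # B_kl
    vxy <- sqrt(mean(A * B))                      # V_n(X, Y)
    vx  <- sqrt(mean(A * A)); vy <- sqrt(mean(B * B))
    if (vx * vy > 0) vxy / sqrt(vx * vy) else 0   # R_n(X, Y)
  }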
3.2. Sampling experiment

In this section, we compare the proposed methods with the approaches from the R packages 'energy' and 'HHG', as well as the statistic of Zhu et al. (2017). The dependence structures we consider are correlated relations and some unusual dependence patterns. All tests are carried out at significance level α = 0.05. The simulations are performed under the multivariate (a) normal, (b) folded normal (Leone et al. 1961), (c) log-normal, (d) Cauchy, (e) t and (f) double exponential distributions. Each simulation is conducted with 1000 replications and sample size n = 20 in each run. We use p1 = p2 ∈ {50, 100, 200}.

The testing samples are generated by the following steps. We first generate p = p1 + p2 dimensional samples {Zi}_{i=1}^n from a multivariate normal distribution with mean µ and covariance Σ. The first p1 elements of Zi are set to Xi, and the remaining p2 elements of Zi are used for Yi, i.e. Zi = (Xi', Yi')'. Let Σ = [ Σxx Σxy ; Σyx Σyy ], where Σxy is of dimension p1 × p2. By changing the matrix Σxy, we can generate the paired sample {(Xi, Yi)}_{i=1}^n with different degrees of dependence under the multivariate normal distribution. By taking the absolute value or exponentiating the elements of Xi and Yi, for i = 1, ..., n, we obtain samples from the multivariate folded normal or log-normal distributions. We use Σ = (σij), where σii = 1 + ρ and σij = ρ for i ≠ j = 1, ..., p. The value of ρ measures the degree of correlation and varies over ρ ∈ {0, 0.1, 0.2, 0.3}. When ρ = 0, samples are drawn under the null hypothesis of independence and the simulation results estimate the type I error.

Table 2. The type I error and power of different statistics under the multivariate normal distribution. Each entry gives the estimates for p1 = p2 = 50/100/200.
  ρ = 0:   EDF 0.051/0.060/0.048; Odds 0.035/0.054/0.063; CP 0.046/0.067/0.050; Coripd 0.042/0.065/0.061; Corcube 0.042/0.064/0.059; Quad 0.049/0.047/0.057; Trans 0.042/0.068/0.059; Dcor 0.043/0.045/0.046; HHG 0.041/0.054/0.042; Pcov 0.049/0.041/0.045
  ρ = 0.1: EDF 0.431/0.731/0.930; Odds 0.566/0.837/0.959; CP 0.505/0.795/0.949; Coripd 0.583/0.859/0.972; Corcube 0.560/0.841/0.967; Quad 0.386/0.673/0.905; Trans 0.536/0.817/0.959; Dcor 0.792/0.971/0.999; HHG 0.507/0.808/0.958; Pcov 0.763/0.961/0.997
  ρ = 0.2: EDF 0.893/0.978/0.998; Odds 0.943/0.989/1; CP 0.928/0.985/0.998; Coripd 0.953/0.993/1; Corcube 0.946/0.991/1; Quad 0.876/0.972/0.997; Trans 0.931/0.988/0.999; Dcor 0.991/1/1; HHG 0.939/0.992/1; Pcov 0.987/1/1
  ρ = 0.3: EDF 0.983/0.995/1; Odds 0.993/0.998/1; CP 0.989/0.998/1; Coripd 0.995/0.999/1; Corcube 0.994/0.999/1; Quad 0.978/0.994/1; Trans 0.993/0.997/1; Dcor 1/1/1; HHG 0.992/0.999/1; Pcov 1/1/1

Table 2 shows the estimated type I errors and powers of the statistics under the multivariate normal distribution. All tests have their type I error under control: the estimated significance levels are close to the nominal level of 0.05. In terms of power, Dcor and Pcov perform the best, followed by the Coripd and Corcube tests. All 10 test statistics work well under the multivariate normal distribution, and nearly all powers are above 0.9 for alternatives close to independence. The testing powers of all the methods increase when we increase the dimensions p1 and p2.
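For concreteness, the following R sketch generates one paired sample under the monotone-dependence designs just described (equicorrelation matrix with σii = 1 + ρ and σij = ρ, folded or exponentiated for the folded normal and log-normal cases). It assumes the MASS package is available for mvrnorm, and the zero mean vector and function name are our choices.

  gen_pair <- function(n, p1, p2, rho, family = c("normal", "folded", "lognormal")) {
    family <- match.arg(family)
    p <- p1 + p2
    Sigma <- matrix(rho, p, p); diag(Sigma) <- 1 + rho  # sigma_ii = 1 + rho, sigma_ij = rho
    Z <- MASS::mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
    if (family == "folded")    Z <- abs(Z)
    if (family == "lognormal") Z <- exp(Z)
    list(X = Z[, 1:p1, drop = FALSE], Y = Z[, (p1 + 1):p, drop = FALSE])
  }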
Table 3. The type I error and power of different statistics under the multivariate folded normal distribution. Each entry gives the estimates for p1 = p2 = 50/100/200.
  ρ = 0:   EDF 0.045/0.055/0.045; Odds 0.053/0.063/0.048; CP 0.053/0.053/0.045; Coripd 0.054/0.053/0.045; Corcube 0.055/0.052/0.044; Quad 0.054/0.062/0.051; Trans 0.058/0.055/0.048; Dcor 0.054/0.050/0.050; HHG 0.057/0.062/0.050; Pcov 0.050/0.048/0.050
  ρ = 0.1: EDF 0.155/0.309/0.517; Odds 0.213/0.373/0.572; CP 0.178/0.317/0.537; Coripd 0.217/0.406/0.601; Corcube 0.209/0.397/0.597; Quad 0.144/0.287/0.479; Trans 0.209/0.393/0.591; Dcor 0.120/0.220/0.387; HHG 0.119/0.248/0.421; Pcov 0.080/0.143/0.262
  ρ = 0.2: EDF 0.451/0.688/0.909; Odds 0.533/0.743/0.924; CP 0.494/0.707/0.915; Coripd 0.547/0.764/0.940; Corcube 0.548/0.759/0.939; Quad 0.429/0.658/0.892; Trans 0.537/0.755/0.933; Dcor 0.370/0.612/0.842; HHG 0.379/0.621/0.868; Pcov 0.229/0.439/0.715
  ρ = 0.3: EDF 0.718/0.891/0.988; Odds 0.750/0.909/0.992; CP 0.726/0.896/0.989; Coripd 0.785/0.924/0.994; Corcube 0.782/0.922/0.994; Quad 0.673/0.869/0.982; Trans 0.776/0.920/0.992; Dcor 0.642/0.851/0.975; HHG 0.625/0.855/0.973; Pcov 0.453/0.722/0.931

Table 3 shows results for the multivariate folded normal distribution. The estimated significance levels are close to the nominal level of 0.05. The tests Coripd, Corcube and Trans have the largest power, followed by the Odds and CP methods. The seven newly proposed methods dominate Dcor, HHG and Pcov. Increasing the dimension increases the power of all the testing methods. In addition, compared with testing independence under the multivariate normal distribution, all tests show less power.

Table 4. The type I error and power of different statistics under the multivariate log-normal distribution. Each entry gives the estimates for p1 = p2 = 50/100/200.
  ρ = 0:   EDF 0.052/0.059/0.047; Odds 0.056/0.056/0.043; CP 0.052/0.060/0.048; Coripd 0.058/0.052/0.037; Corcube 0.056/0.051/0.038; Quad 0.052/0.061/0.045; Trans 0.055/0.054/0.044; Dcor 0.059/0.048/0.039; HHG 0.055/0.054/0.049; Pcov 0.054/0.047/0.047
  ρ = 0.1: EDF 0.529/0.704/0.897; Odds 0.543/0.727/0.925; CP 0.556/0.728/0.919; Coripd 0.431/0.611/0.847; Corcube 0.532/0.714/0.907; Quad 0.464/0.657/0.849; Trans 0.595/0.762/0.932; Dcor 0.566/0.739/0.921; HHG 0.468/0.682/0.884; Pcov 0.728/0.884/0.976
  ρ = 0.2: EDF 0.806/0.941/0.993; Odds 0.812/0.942/0.990; CP 0.844/0.960/0.994; Coripd 0.679/0.854/0.964; Corcube 0.801/0.934/0.985; Quad 0.744/0.877/0.973; Trans 0.857/0.968/0.992; Dcor 0.820/0.941/0.984; HHG 0.804/0.935/0.991; Pcov 0.955/0.996/1
  ρ = 0.3: EDF 0.938/0.984/0.997; Odds 0.939/0.984/0.997; CP 0.961/0.986/0.997; Coripd 0.823/0.927/0.987; Corcube 0.927/0.971/0.997; Quad 0.881/0.961/0.991; Trans 0.956/0.985/0.998; Dcor 0.939/0.975/0.996; HHG 0.920/0.988/0.998; Pcov 0.994/1/1

Table 4 shows the performance of the statistics under the multivariate log-normal distribution. All tests have their type I error under control. Regarding power, Pcov performs better than all the other testing methods. The CP and Trans tests are not as powerful as Pcov, but are competitive when compared with the other methods. A larger dimension implies higher power for all the testing methods.

Tables 5, 6 and 7 show the results for the multivariate Cauchy, t and double exponential distributions, which exhibit heavy tails. We construct the vectors {(Xi, Yi)}_{i=1}^n as follows. We first generate the observations {Xi}_{i=1}^n; for i = 1, ..., n, each element of Xi is generated from the selected distribution. The observations {Yi}_{i=1}^n are then generated as follows. Let Xi = (Xi1, ..., Xip1)' and Yi = (Yi1, ..., Yip2)'. Suppose q is an integer such that 0 ≤ q ≤ min(p1, p2). For 1 ≤ j ≤ q, set Yij = Xij². For q < j ≤ p2, Yij is generated independently from the selected distribution. The value of q measures the degree of dependence between Xi and Yi: a larger value of q implies more dependence, and q = 0 means all the elements of Yi are generated independently of Xi. In the simulations, the dimension varies over p1 = p2 ∈ {50, 100, 200}, and q is chosen such that q/p2 ∈ {0, 0.2, 0.5, 0.7} for the Cauchy and t distributions and q/p2 ∈ {0, 0.5, 0.7, 0.9} for the double exponential distribution.
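A sketch of this heavy-tailed design follows. The generator rgen is any univariate random-number function; rcauchy is shown as the default, and for the t or double exponential cases one would substitute, for example, function(k) rt(k, df = 3) or a Laplace generator; those specific choices (degrees of freedom, Laplace routine) are ours, since the paper does not state them.

  gen_pair_q <- function(n, p1, p2, q, rgen = rcauchy) {
    stopifnot(q <= min(p1, p2))
    X <- matrix(rgen(n * p1), n, p1)
    Y <- matrix(rgen(n * p2), n, p2)
    if (q > 0) Y[, 1:q] <- X[, 1:q]^2   # dependent block: Y_ij = X_ij^2 for j <= q
    list(X = X, Y = Y)
  }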
Table 5. The type I error and power of different statistics under the multivariate Cauchy distribution. Each entry gives the estimates for p1 = p2 = 50/100/200.
  q/p2 = 0:   EDF 0.042/0.051/0.051; Odds 0.053/0.044/0.049; CP 0.050/0.046/0.050; Coripd 0.059/0.045/0.056; Corcube 0.059/0.044/0.055; Quad 0.050/0.046/0.057; Trans 0.051/0.044/0.058; Dcor 0.057/0.042/0.055; HHG 0.048/0.045/0.048; Pcov 0.054/0.040/0.057
  q/p2 = 0.2: EDF 0.368/0.419/0.390; Odds 0.341/0.350/0.347; CP 0.391/0.434/0.403; Coripd 0.288/0.268/0.276; Corcube 0.321/0.301/0.326; Quad 0.320/0.348/0.326; Trans 0.367/0.413/0.396; Dcor 0.294/0.280/0.291; HHG 0.296/0.318/0.279; Pcov 0.370/0.403/0.379
  q/p2 = 0.5: EDF 0.924/0.925/0.939; Odds 0.897/0.875/0.904; CP 0.936/0.930/0.941; Coripd 0.583/0.600/0.640; Corcube 0.693/0.692/0.726; Quad 0.840/0.862/0.875; Trans 0.862/0.857/0.886; Dcor 0.620/0.631/0.674; HHG 0.912/0.907/0.937; Pcov 0.859/0.851/0.889
  q/p2 = 0.7: EDF 0.998/0.995/0.994; Odds 0.994/0.994/0.992; CP 0.999/0.996/0.996; Coripd 0.778/0.776/0.797; Corcube 0.852/0.864/0.864; Quad 0.985/0.976/0.985; Trans 0.974/0.980/0.980; Dcor 0.813/0.793/0.823; HHG 0.997/0.996/0.999; Pcov 0.972/0.971/0.980

Table 6. The type I error and power of different statistics under the multivariate t distribution. Each entry gives the estimates for p1 = p2 = 50/100/200.
  q/p2 = 0:   EDF 0.055/0.034/0.046; Odds 0.041/0.033/0.048; CP 0.050/0.026/0.054; Coripd 0.049/0.042/0.049; Corcube 0.051/0.041/0.048; Quad 0.053/0.033/0.044; Trans 0.056/0.033/0.050; Dcor 0.045/0.038/0.050; HHG 0.049/0.032/0.044; Pcov 0.042/0.037/0.050
  q/p2 = 0.2: EDF 0.403/0.403/0.424; Odds 0.380/0.390/0.425; CP 0.417/0.433/0.448; Coripd 0.318/0.317/0.331; Corcube 0.372/0.361/0.371; Quad 0.361/0.358/0.370; Trans 0.435/0.436/0.465; Dcor 0.328/0.323/0.338; HHG 0.328/0.329/0.341; Pcov 0.399/0.385/0.419
  q/p2 = 0.5: EDF 0.893/0.902/0.910; Odds 0.867/0.889/0.890; CP 0.908/0.923/0.916; Coripd 0.676/0.687/0.687; Corcube 0.755/0.743/0.727; Quad 0.833/0.837/0.844; Trans 0.878/0.882/0.880; Dcor 0.686/0.702/0.693; HHG 0.890/0.892/0.885; Pcov 0.781/0.812/0.811
  q/p2 = 0.7: EDF 0.986/0.992/0.988; Odds 0.978/0.987/0.986; CP 0.986/0.990/0.989; Coripd 0.855/0.853/0.845; Corcube 0.898/0.896/0.895; Quad 0.968/0.981/0.970; Trans 0.975/0.980/0.977; Dcor 0.868/0.861/0.866; HHG 0.986/0.992/0.984; Pcov 0.929/0.950/0.937

Table 7. The type I error and power of different statistics under the multivariate double exponential distribution. Each entry gives the estimates for p1 = p2 = 50/100/200.
  q/p2 = 0:   EDF 0.070/0.052/0.041; Odds 0.065/0.055/0.057; CP 0.059/0.054/0.048; Coripd 0.052/0.059/0.061; Corcube 0.061/0.059/0.059; Quad 0.064/0.060/0.046; Trans 0.056/0.061/0.061; Dcor 0.058/0.042/0.050; HHG 0.050/0.051/0.052; Pcov 0.062/0.051/0.057
  q/p2 = 0.5: EDF 0.721/0.580/0.508; Odds 0.779/0.664/0.622; CP 0.765/0.644/0.582; Coripd 0.602/0.482/0.410; Corcube 0.733/0.604/0.526; Quad 0.602/0.460/0.387; Trans 0.807/0.697/0.647; Dcor 0.497/0.398/0.327; HHG 0.519/0.302/0.322; Pcov 0.372/0.315/0.293
  q/p2 = 0.7: EDF 0.853/0.690/0.614; Odds 0.900/0.757/0.678; CP 0.899/0.748/0.657; Coripd 0.719/0.587/0.458; Corcube 0.862/0.716/0.577; Quad 0.750/0.560/0.491; Trans 0.924/0.799/0.709; Dcor 0.618/0.438/0.355; HHG 0.665/0.395/0.372; Pcov 0.464/0.356/0.333
  q/p2 = 0.9: EDF 0.935/0.807/0.693; Odds 0.962/0.846/0.775; CP 0.963/0.847/0.768; Coripd 0.802/0.658/0.542; Corcube 0.930/0.792/0.670; Quad 0.851/0.667/0.541; Trans 0.975/0.880/0.801; Dcor 0.702/0.518/0.423; HHG 0.780/0.481/0.481; Pcov 0.528/0.420/0.393

From Table 5, we can see that under the Cauchy distribution all tests have estimated type I errors around 0.05. In terms of power, the EDF and CP methods perform the best, while all the other methods work similarly. Increasing the dimension does not have much influence on the performance of the methods.

The results under the t distribution are shown in Table 6. All tests control the type I error well. Regarding power, one observes that EDF and CP perform the best, followed by the Odds and Trans methods. The testing powers change only slightly when we increase the dimension of the vectors.

Table 7 shows the power of the statistics under the double exponential distribution. From the table, we can see that all tests have estimated type I errors around 0.05.
The test statistic Trans has the largest power, followed by the Odds and CP methods. In this scenario, a larger dimension implies a smaller power, which differs from the other cases. A different choice of norm for calculating the interpoint distances might improve the testing performance; further analysis is needed for the optimal choice of norm.

3.3. Special dependence

In an interesting article, Newton (2010) presents seven patterns of dependence and runs simulations to gauge the performance of the energy statistic proposed by Szekely et al. (2007) for testing the hypothesis of independence. Newton (2010) reports simulation results for p1 = p2 = 1 and n = 500. The patterns of dependence that Newton considered are shown in Figure 1. The Pearson correlations are all zero under these dependence structures. The results in Newton's paper show that the energy test is powerful for detecting special dependence beyond Pearson correlation.

Figure 1. The special patterns.

In this section, we consider the same special dependence structures, but simulate them with large dimension and small sample sizes. Consider {(Xi, Yi)}_{i=1}^n. Suppose (xi, yi) is the coordinate of a point in any of the patterns of dependence, for i = 1, ..., n. The observations are generated by Xi = xi 1_{p1} and Yi = yi 1_{p2}, where 1_{p1} is a p1 × 1 vector of ones. The dimensions of the observations are p1 = p2 = 100 and the sample size varies over n ∈ {20, 30}. Each simulation is conducted with 1000 replications and the results are shown in Table 8.

Table 8. The power of different statistics for the independence test under the special dependence structures. Each entry gives the estimates for n = 20/30.
  Type 1: EDF 0.588/0.998; Odds 0.229/0.809; CP 0.569/1; Coripd 0/0; Corcube 0/0.141; Quad 0.549/0.980; Trans 0/0.086; Dcor 0/0.002; HHG 0.046/0.620; Pcov 0.008/0.161
  Type 2: EDF 0.073/0.080; Odds 0.026/0.014; CP 0.085/0.143; Coripd 0.019/0.004; Corcube 0.031/0.013; Quad 0.074/0.094; Trans 0.021/0.006; Dcor 0.040/0.040; HHG 0.066/0.100; Pcov 0.035/0.027
  Type 3: EDF 0.105/0.325; Odds 0.009/0.007; CP 0.193/0.553; Coripd 0.001/0; Corcube 0.010/0.001; Quad 0.115/0.253; Trans 0.003/0.001; Dcor 0.026/0.034; HHG 0.119/0.255; Pcov 0.021/0.019
  Type 4: EDF 0.697/0.910; Odds 0.522/0.862; CP 0.627/0.895; Coripd 0.470/0.745; Corcube 0.644/0.886; Quad 0.631/0.891; Trans 0.538/0.793; Dcor 0.097/0.434; HHG 0.569/0.864; Pcov 0.209/0.473
  Type 5: EDF 0.989/1; Odds 0.987/1; CP 0.987/1; Coripd 0.960/1; Corcube 0.930/1; Quad 0.937/1; Trans 0.935/1; Dcor 0.132/0.161; HHG 0.813/0.994; Pcov 0.273/0.362
  Type 6: EDF 0.620/0.997; Odds 0.006/0.003; CP 0.775/1; Coripd 0/0; Corcube 0/0; Quad 0.342/0.994; Trans 0/0; Dcor 0/0; HHG 0.069/0.397; Pcov 0/0.003
  Type 7: EDF 0.048/0.041; Odds 0.048/0.048; CP 0.054/0.037; Coripd 0.045/0.050; Corcube 0.048/0.049; Quad 0.048/0.041; Trans 0.048/0.046; Dcor 0.041/0.056; HHG 0.040/0.043; Pcov 0.049/0.053

One can see that the distance correlation (Dcor) and the projection covariance (Pcov) lose power when the sample size n is small. The EDF, CP and Quad methods are sensitive to the 1st, 4th, 5th and 6th dependence structures. The Odds test can detect the 1st, 4th and 5th dependence structures when the sample size is n = 30. Nearly all the methods except Dcor and Pcov work well for the 4th and 5th dependence types. All the methods have difficulties detecting the 2nd, 3rd and last dependence structures with this limited sample size.

3.4. Discussion

We have compared the testing methods under different distributions and dependence structures. No single method dominates all the others in all situations. Generally speaking, tests based on interpoint distances perform better in detecting nonlinear dependence. In real-life problems, if one has prior knowledge of the distribution or
prior information about the form of dependence to guard against, one can use the corresponding testing method. Otherwise, one can use an ensemble approach that applies all the methods and concludes with majority voting or weighted voting. With this adjustment, one can reduce the error from relying on a single method.

The major drawback of the methods based on IPDs is the computation of the interpoint distances. For n observations, computing the IPDs has time complexity O(n²), and even the simple operation of finding the mean of the IPDs has time complexity O(n²). The time complexities of the methods are listed in Table 9. A time complexity higher than O(n²) is restrictive for large datasets.

Table 9. Time complexity.
  EDF O(n⁴); Odds O(n⁴); CP O(n⁴); Coripd O(n²); Corcube O(n²); Quad O(n⁴); Trans O(n²); Dcor O(n²); HHG O(n² log n); Pcov O(n³)

4. Summary and concluding remarks

Testing the hypothesis of independence is fundamental in statistics and has numerous applications in finance, network analysis, machine learning, microarray and DNA analysis, among others. Classical tests of independence are based on the asymptotic theory in which the sample size n approaches infinity while the dimension p is fixed. Therefore, they are not effective in settings where p approaches infinity. In this paper, we propose seven novel statistics for testing independence when the sample size is smaller than the dimension of the observations. The proposed tests are based on interpoint distances, which are inherently one dimensional. Using the empirical distribution function of the IPDs within the X and Y samples, we propose the EDF test. Based on the 2 × 2 table of probabilities and estimates at the m observed IPDs of the X and Y samples, the IPD odds ratio test, the IPD cut-point method and the IPD quadrant test are proposed. By calculating the Pearson correlation between the IPDs from the two samples, the IPD correlation test is proposed. The IPD transformation tests are constructed using the correlation between the normal-quantile transformation of the empirical distribution function of the IPDs or the cube roots of the IPDs. Simulations are conducted to compare the performance of the proposed tests with three existing testing methods under different situations. The simulations are performed under the multivariate (a) normal, (b) folded normal, (c) log-normal, (d) Cauchy, (e) t and (f) double exponential distributions. We also consider some special dependence patterns discussed by Newton (2010).

Acknowledgments

We gratefully acknowledge the constructive comments and suggestions of the two anonymous referees and the associate editor.

Disclosure statement

No potential conflict of interest was reported by the authors.

ORCID

Reza Modarres http://orcid.org/0000-0003-1240-6027

References

Agresti, A. (1980), 'Generalized Odds Ratios for Ordinal Data', Biometrics, 36, 59–67.
Anderson, T.W. (2003), An Introduction to Multivariate Statistical Analysis (3rd ed.), New York: Wiley.
Bagkavos, D., and Patil, P.N. (2017), 'A New Test of Independence for Bivariate Observations', Journal of Multivariate Analysis, 160, 117–133.
Blomqvist, N. (1950), 'On a Measure of Dependence Between Two Random Variables', Annals of Mathematical Statistics, 21, 593–600.
Blum, J.R., Kiefer, J., and Rosenblatt, M. (1961), 'Distribution Free Tests of Independence Based on the Sample Distribution Function', Annals of Mathematical Statistics, 32, 485–498.
Fan, Y., Lafaye de Micheaux, P., Penev, S., and Salopek, D. (2017), 'Multivariate Nonparametric Test of Independence', Journal of Multivariate Analysis, 153, 189–210.
Feuerverger, A. (1993), 'A Consistent Test for Bivariate Independence', International Statistical Review, 61(3), 419–433.
Fisher, R.A. (1915), 'Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population', Biometrika, 10, 507–521.
Fleiss, J.L., Levin, B., and Paik, M.C. (2013), Statistical Methods for Rates and Proportions, Hoboken, NJ: John Wiley & Sons.
Freeman, J., and Modarres, R. (2005), 'Efficiency of Test for Independence After Box–Cox Transformation', Journal of Multivariate Analysis, 95, 107–118.
Freeman, J., and Modarres, R. (2006), 'Inverse Box–Cox: The Power-normal Distribution', Statistics and Probability Letters, 76, 764–772.
Friedman, J.H., and Rafsky, L.C. (1983), 'Graph-theoretic Measures of Multivariate Association and Prediction', Annals of Statistics, 11(2), 377–391.
Gieser, P.W., and Randles, R.H. (1997), 'A Nonparametric Test of Independence Between Two Vectors', Journal of the American Statistical Association, 92, 561–567.
Han, F., Chen, Z., and Liu, H. (2017), 'Distribution-free Tests of Independence in High Dimensions', Biometrika, 104(4), 813–828.
Heller, R., Heller, Y., and Gorfine, M. (2013), 'A Consistent Test of Association Based on the Ranks of Distances', Biometrika, 100(2), 503–510.
Hoeffding, W. (1948), 'A Non-parametric Test of Independence', The Annals of Mathematical Statistics, 19, 546–557.
Kowalski, C.J., and Tarter, M.E. (1969), 'Coordinate Transformations to Normality and the Power of Normal Tests for Independence', Biometrika, 56(1), 139–148.
Leone, F.C., Nelson, L.S., and Nottingham, R.B. (1961), 'The Folded Normal Distribution', Technometrics, 3(4), 543–550.
Li, J. (2018), 'Asymptotic Normality of Interpoint Distances for High-dimensional Data with Applications to the Two-sample Problem', Biometrika, 105(3), 529–546.
Mao, G. (2014), 'A New Test of Independence for High-dimensional Data', Statistics & Probability Letters, 93, 14–18.
Mao, G. (2017), 'Robust Test for Independence in High Dimensions', Communications in Statistics - Theory and Methods, 46, 10036–10050.
Modarres, R. (2007), 'A Test of Independence Based on the Likelihood of Cut-points', Communications in Statistics: Simulation and Computation, 36, 817–825.
Newton, M.A. (2010), 'Introducing the Discussion Paper by Szekely and Rizzo', arXiv preprint arXiv:1010.3575.
Pitman, E.J.G. (1937), 'Significance Tests which May Be Applied to Samples from Any Populations II. The Correlation Coefficient Test', Supplement to the Journal of the Royal Statistical Society, 4, 225–232.
Puri, M.L., and Sen, P.K. (1971), Nonparametric Methods in Multivariate Analysis, New York, NY: John Wiley and Sons.
Robert, P., Cléroux, R., and Ranger, N. (1985), 'Some Results on Vector Correlation', Computational Statistics and Data Analysis, 3, 25–32.
Schott, J.R. (2005), 'Testing for Complete Independence in High Dimensions', Biometrika, 92, 951–956.
Schott, J.R. (2008), 'A Test for Independence of Two Sets of Variables when the Number of Variables Is Large Relative to the Sample Size', Statistics and Probability Letters, 78, 3096–3102.
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), 'Measuring and Testing Dependence by Correlation of Distances', The Annals of Statistics, 35(6), 2769–2794.
Song, Y., and Modarres, R. (2019), 'Interpoint Distance Test of Homogeneity for Multivariate Mixture Models', International Statistical Review, 1–26. doi:10.1111/insr.12332.
Taskinen, S., Oja, H., and Randles, R.H. (2003), 'Sign Test of Independence Between Two Random Vectors', Statistics and Probability Letters, 62(1), 9–21.
Taskinen, S., Oja, H., and Randles, R.H. (2005), 'Multivariate Nonparametric Tests of Independence', Journal of the American Statistical Association, 100(471), 916–925.
Wilks, S.S. (1935), 'On the Independence of K Sets of Normally Distributed Statistical Variables', Econometrica, 3, 309–326.
Wilson, E.B., and Hilferty, M.M. (1931), 'The Distribution of Chi-squared', Proceedings of the National Academy of Sciences, 17, 684–688.
Zhu, L., Xu, K., Li, R., and Zhong, W. (2017), 'Projection Correlation Between Two Random Vectors', Biometrika, 104(4), 829–843.

Appendix

Proof of Theorem 2.1: The necessary part is trivial, since functions of independent random vectors are independent. To prove the sufficient part we need to show that if the IPDs d²(X) = ||X1 − X2||² and d²(Y) = ||Y1 − Y2||² are independent, then X and Y are independent. Since the contrapositive of any true proposition is also true, we show that if X and Y are dependent, then d²(X) and d²(Y) are dependent. Suppose Z' = [X', Y'] is a random vector from a multivariate normal distribution with mean (µx, µy) and covariance matrix Σ = [ Σxx Σxy ; Σyx Σyy ]. Let σ(x)_{s1 s2}, σ(y)_{t1 t2} and σ(xy)_{st} be the components of Σxx, Σyy and Σxy, respectively, for s, s1, s2 = 1, ..., p1 and t, t1, t2 = 1, ..., p2. Suppose X and Y are dependent; then

  tr(Σxy Σxy') = Σ_{s=1}^{p1} Σ_{t=1}^{p2} σ(xy)_{st}² > 0.   (A1)

Let Xit be the tth variable of the random vector Xi and Yis be the sth variable of the random vector Yi for i = 1, 2, t = 1, ..., p1 and s = 1, ..., p2. One can show that

  Cov(d²(X), d²(Y)) = Cov( Σ_{t=1}^{p1} (X1t − X2t)², Σ_{s=1}^{p2} (Y1s − Y2s)² ) = Σ_{t=1}^{p1} Σ_{s=1}^{p2} 4 σ(x)_{tt} σ(y)_{ss} Cov(W1t², W2s²),

where W1t = (X1t − X2t)/(2σ(x)_{tt})^{1/2} ~ N(0, 1) and W2s = (Y1s − Y2s)/(2σ(y)_{ss})^{1/2} ~ N(0, 1). The covariance between W1t and W2s is

  Cov(W1t, W2s) = Cov(X1t − X2t, Y1s − Y2s) / (4 σ(x)_{tt} σ(y)_{ss})^{1/2} = [Cov(X1t, Y1s) + Cov(X2t, Y2s)] / (4 σ(x)_{tt} σ(y)_{ss})^{1/2} = ρ(xy)_{st},

where ρ(xy)_{st} is the correlation between the tth component of X and the sth component of Y. Thus W1t and W2s are jointly normal random variables with correlation ρ(xy)_{st}. The conditional expectation and variance of W1t given W2s are E(W1t | W2s) = ρ(xy)_{st} W2s and Var(W1t | W2s) = 1 − ρ(xy)_{st}², so that E(W1t² | W2s) = 1 − ρ(xy)_{st}² + ρ(xy)_{st}² W2s². The covariance between W1t² and W2s² is therefore

  Cov(W1t², W2s²) = E(W1t² W2s²) − E(W1t²) E(W2s²)
                  = E[ E(W1t² | W2s) W2s² ] − 1
                  = E[ (1 − ρ(xy)_{st}² + ρ(xy)_{st}² W2s²) W2s² ] − 1
                  = ρ(xy)_{st}² ( E(W2s⁴) − 1 )
                  = 2 ρ(xy)_{st}².

Since Cov(W1t², W2s²) = 2 ρ(xy)_{st}², we obtain

  Cov(d²(X), d²(Y)) = Σ_{t=1}^{p1} Σ_{s=1}^{p2} 4 σ(x)_{tt} σ(y)_{ss} Cov(W1t², W2s²) = Σ_{t=1}^{p1} Σ_{s=1}^{p2} 8 σ(x)_{tt} σ(y)_{ss} ρ(xy)_{st}² = 8 Σ_{t=1}^{p1} Σ_{s=1}^{p2} Cov²(X1t, Y1s) = 8 tr(Σxy' Σxy).   (A2)

Under the assumption of dependence, Equations (A1) and (A2) give Cov(d²(X), d²(Y)) > 0. Thus d²(X) and d²(Y) are dependent.

Proof of Proposition 2.1: Let Σxy be the covariance matrix between X and Y. Because every variable in X has a correlated variable in Y, there are at least p1 elements of Σxy whose absolute value exceeds ε. Based on Equation (A2), we have
  Cov(d²(X), d²(Y)) = 8 tr(Σxy' Σxy) > 8 p1 ε²,

which tends to infinity as p1 → ∞.

Proof of Theorem 2.2: Let ρx and ρy denote the coefficients of correlation on the bivariate power normal and bivariate normal scales, respectively. Freeman and Modarres (2006) study the properties of the power normal distribution and show that

  ρx = h(ρy) = Σ_{i=1}^{∞} i! b_{1i} b_{2i} ρy^{i} / [ ( Σ_{i=1}^{∞} b_{1i}² i! ) ( Σ_{j=1}^{∞} b_{2j}² j! ) ]^{1/2},

where the form of the function h depends on the mean, covariance and the transformation parameter, and b_{1i}, b_{2i} are defined in terms of the parameters and the Chebyshev–Hermite polynomials. It follows that ρy = 0 if and only if ρx = 0, and, based on Theorem 2.1, the proof is complete.

Proof of Lemma 2.1: Let W = X1 − X2; then W follows a multivariate normal distribution with mean 0 and covariance 2Σxx. We have d²(X) = W'W, which is a quadratic form in W. Let Z = (2Σxx)^{−1/2} W, so that Cov(Z) = I, where I is the identity matrix. Let P be an orthogonal matrix that diagonalises 2Σxx with PP' = I; that is, P'(2Σxx)P = diag(λ1, ..., λ_{p1}), where λ1, ..., λ_{p1} are the eigenvalues of 2Σxx and diag(λ1, ..., λ_{p1}) is the diagonal matrix formed by the eigenvalues. Let U = P'Z, so that Z = PU. The distribution of U is multivariate normal with mean E(U) = 0 and covariance Cov(U) = I. Therefore, the quadratic form can be written as

  d²(X) = W'W = Z'(2Σxx)^{1/2}(2Σxx)^{1/2}Z = Z'(2Σxx)Z = U'P'(2Σxx)PU = U' diag(λ1, ..., λ_{p1}) U = Σ_{t=1}^{p1} λt Ut²,

where U = (U1, ..., U_{p1})' and Ut² follows a chi-square distribution with one degree of freedom for t = 1, ..., p1.