Journal of Nonparametric Statistics
ISSN: 1048-5252 (Print) 1029-0311 (Online)
https://doi.org/10.1080/10485252.2020.1714613
Nonparametric tests of independence based on interpoint
distances
Lingzhe Guo and Reza Modarres
Department of Statistics, George Washington University, Washington, DC, USA
ABSTRACT
We present novel tests for the hypothesis of independence when the number of variables is larger than the number of vector observations. We show that two multivariate normal vectors are independent if and only if their interpoint distances are independent. The proposed test statistics exploit different properties of the sample interpoint distances. A simulation study compares the new tests with three existing tests under various scenarios, including monotone and non-monotone dependence structures. Numerical results show that the new methods are effective for independence testing.
ARTICLE HISTORY
Received 2 May 2019; Accepted 6 January 2020
KEYWORDS
Interpoint distance; distance covariance; cut-point
AMS SUBJECT CLASSIFICATIONS
62G09; 62H15; 62H20; 62H12
1. Introduction
With numerous applications, independence testing is fundamental in statistics and has
been extensively studied in the past several decades. However, there is still a need for methods of testing independence in a number of statistical problems where the sample size
is smaller than the dimension. High-dimensional datasets are commonplace in finance,
robotics, network analysis, medical imaging, machine learning, microarray and DNA analysis where independence testing is a central issue. Typical examples include microarray
analysis, where one assumes independence of the expression levels among different genes, and functional magnetic resonance imaging, where one is interested in testing the independence of voxels in different parts of the brain. Independent component analysis
(ICA) is another application where one decomposes multivariate data into mutually independent components. The ICA model is subject to a constraint that at most one of these
components is Gaussian as required for model identifiability. Classical tests of independence are based on the asymptotic theory that the sample size n approaches infinity while
the dimension p is fixed. Therefore, they are not effective in settings where p approaches
infinity. Our main concern is testing the hypothesis of independence when p is much larger than n. We present testing procedures that use the interpoint distances (IPDs), which are inherently one dimensional. These methods are effective for large p and fixed n with p > n.
Let {(Xi , Yi )}ni=1 be independent and identically distributed copies of random vector
pairs (X, Y) with joint distribution function (DF) F in Rp and marginal distributions
Gx and Gy in Rp1 and Rp2, respectively, where p = p1 + p2. Our aim in this work is to investigate whether X and Y are independent by testing the null hypothesis

H0 : F(X, Y) = Gx(X) Gy(Y) for all (X, Y) in Rp,  (1)

against general alternatives

Ha : F(X, Y) ≠ Gx(X) Gy(Y) for at least one (X, Y) in Rp,  (2)

where p is larger than n.
Consider the random vector Z such that Z′ = [X′, Y′]. For observation vectors that are normally distributed with covariance matrix Σ = [Σxx Σxy; Σyx Σyy], the null hypothesis of independence is equivalent to Σxy = 0. More generally, under any distribution where lack of covariance implies independence, testing H0 is equivalent to testing that the covariance Σxy = 0. Among the earliest problems in the development of multivariate techniques, the
classical test of independence is based on the Pearson correlation coefficient ρ when p1 =
p2 = 1. Wilks (1935) shows that the likelihood ratio test of H0 is based on V = |R|/(|Rxx| × |Ryy|),
where R, Rxx and Ryy are the corresponding sample correlation matrices. Anderson (2003)
shows that H0 is equivalent to the regression of one set on the other being zero. Moreover,
one does not require normality of X to obtain the distribution of V under H0 . The likelihood ratio test is strictly unbiased and is invariant under linear transformations of X and Y.
The distribution of V under the null hypothesis can be written as the product of beta random variables. Assuming finite fourth moments, one can show that −n log V approaches
a χ 2 distribution with p1 p2 degrees of freedom as n tends to infinity.
Hoeffding (1948) discusses a Cramer–von Mises type nonparametric test for bivariate independence that is robust to non-monotone dependence. Blum et al. (1961) (BKR) obtained the asymptotic null distribution of an asymptotically equivalent form of the Hoeffding statistic. Given n bivariate pairs (xi, yi), the BKR statistic is defined as

Bn = (1/n) Σ_{i=1}^{n} [F̂(xi, yi) − Ĝx(xi) Ĝy(yi)]²,  (3)
where F̂, Ĝx and Ĝy denote the joint and marginal empirical distribution functions,
respectively. The classical dependence measures such as Pearson, Spearman or Kendall
correlations have been used for detecting linear or monotone dependence structures. Han
et al. (2017) study two families of distribution-free test statistics, which include Kendall’s
and Spearman’s correlations. See also Mao (2017).
Puri and Sen (1971) present a nonparametric analog to Wilks’ test based on componentwise ranking. Friedman and Rafsky (1983) present graph-based tests of independence.
Generalising the univariate quadrant test of Blomqvist (1950), Gieser and Randles (1997) proposed a nonparametric test based on interdirection counts. Feuerverger (1993) presents
a consistent bivariate rank test of independence. Taskinen et al. (2003, 2005) propose multivariate tests of independence based on spatial signs and ranks. Modarres (2007) presents a
novel test of independence based on the likelihood of cut-points. Bagkavos and Patil (2017)
propose a bivariate test of independence based on the property that under independence,
every quantile of one variable given the other is constant. Fan et al. (2017) propose a test
based on a Cramer–von Mises type functional of a process defined from the empirical
characteristic functions. Perhaps the most popular multivariate independence test is based
on the distance covariance proposed by Szekely et al. (2007). They use characteristic functions to transform high-dimensional data into one dimension. Zhu et al. (2017) argue that the distance covariance may not work well for high-dimensional data and propose the projection covariance for high-dimensional data.
This article presents novel methods of testing H0 based on the sample interpoint distances when p is larger than n and compares the methods with existing tests in the literature.
Section 2 introduces seven new test statistics using the EDF of IPDs, method of cut-point,
odds ratio, correlation of IPDs, EDF in each quadrant and two normalising transformations of IPDs. Section 3 considers comparisons of the new tests with Dcor, HHG and Pcov.
The last section is devoted to summary and conclusions. The R code of the test statistics
considered in this paper is available from the corresponding author.
2. IPD tests of independence
Consider two random vectors X and Y. Let ‖X‖ = (X′X)^{1/2} and ‖Y‖ = (Y′Y)^{1/2} be the Euclidean norms of X and Y. Heller et al. (2013) show that X and Y are dependent if there exist two vectors x0 and y0 and two constants rx, ry such that

P(‖X − x0‖ < rx, ‖Y − y0‖ < ry) ≠ P(‖X − x0‖ < rx) P(‖Y − y0‖ < ry).
Because x0 , y0 , rx and ry are not known, Heller et al. (2013) take each sample point in turn to
be x0 , y0 and use all the possible interpoint distances to be rx , ry . Consider i.i.d. vector pairs
{(Xi, Yi)}_{i=1}^{n}. The interpoint distance between Xi and Xj is given by d(x)ij = ‖Xi − Xj‖, and the interpoint distance between Yi and Yj is given by d(y)ij = ‖Yi − Yj‖ for 1 ≤ i < j ≤ n.
Since the distributions of d(x)ij are the same, we use d(X) to denote the interpoint distance
between Xs. Similarly, the random variable d(Y) represents the interpoint distance of Ys.
Thus the hypothesis we are interested in is
H0′ : P(d(X) ≤ x, d(Y) ≤ y) = P(d(X) ≤ x)P(d(Y) ≤ y) for all (x, y) ∈ R2 .
(4)
There are m = n(n − 1)/2 pairs of IPDs (d(x)ij, d(y)ij). We refer to any X IPD by the random variable d(X) with observed values {d(x)ij} for 1 ≤ i < j ≤ n. Similarly, we refer to any Y IPD by the random variable d(Y) with observed values {d(y)ij}. Two X IPDs d(x)ij
and d(x)kh are dependent if they have an index in common. With n observations, there
are (n(n − 1)(n − 2))/2 dependent pairs of distances. The following theorem shows
that under the multivariate normal distribution, the independence of two vectors is equivalent to the independence of the squared interpoint distances of the two vectors.

Theorem 2.1: Let (X1, Y1), (X2, Y2) be i.i.d. copies of the random vector (X, Y) with joint multivariate normal distribution F in Rp and marginal distributions Gx and Gy in Rp1 and Rp2, respectively, where p = p1 + p2. Suppose d²(X) = ‖X1 − X2‖² and d²(Y) = ‖Y1 − Y2‖². The vectors X and Y are independent if and only if the interpoint distances d²(X) and d²(Y) are independent.
The following proposition shows that when the increase in dimension involves correlated variables in the two random vectors, the covariance between the interpoint distances will increase.

Proposition 2.1: Let (X1, Y1), (X2, Y2) be i.i.d. copies of the random vector (X, Y) with joint multivariate normal distribution F in Rp and marginal distributions Gx and Gy in Rp1 and Rp2, respectively, where p = p1 + p2. Suppose for every variable xi in X there is at least one variable (denoted y(xi)) in Y such that the absolute value of their covariance is larger than a positive constant ǫ, i.e. |cov(xi, y(xi))| > ǫ > 0 for i = 1, . . . , p1. Let d²(X) = ‖X1 − X2‖² and d²(Y) = ‖Y1 − Y2‖². When p1 goes to infinity, the covariance between d²(X) and d²(Y) goes to infinity.
Let Σx and Σy be the covariance matrices of X and Y, respectively. The correlation coefficient between the two random vectors is defined as Corr(X, Y) = tr(Σxy Σ′xy) / [tr(Σ²x) tr(Σ²y)]^{1/2} (Robert et al. 1985). One can show that under the normal distribution, the correlation coefficient between two random vectors is the same as the correlation between their corresponding squared interpoint distances, i.e. Corr(X, Y) = Corr(d²(X), d²(Y)). However, this equality may not hold under other distributions where the dependence between X and Y is not linear.
Consider the Box–Cox power transformation defined by Yj = (Xj^λj − 1)/λj when λj ≠ 0 and Yj = ln Xj when λj = 0, for each pair of variables Xj, Yj in X = (X1, X2, . . . , Xp) and Y = (Y1, Y2, . . . , Yp), respectively. Assuming that the random vector Y has a p-variate normal distribution with mean µ and covariance matrix Σ, the random vector X defined by the inverse transformation is said to have a multivariate power normal distribution. Freeman and Modarres (2006) show that the p.d.f. of the multivariate power normal distribution is

f(X | λ, µ, Σ) = (1/K) · [1/((2π)^{p/2} |Σ|^{1/2})] · exp{ −(1/2) (Y − µ)′ Σ^{−1} (Y − µ) } · ∏_{j=1}^{p} Xj^{λj − 1},

where K is a normalising constant that depends on λ, µ and Σ. Practitioners often assume K = 1. Under the multivariate power normal distribution, the dependence between the variables is no longer linear. The following theorem shows that such non-linear dependence is equivalent to the dependence between the interpoint distances.
Theorem 2.2: Let (W1, Z1), (W2, Z2) be i.i.d. copies of the random vector (W, Z) with a joint power-normal distribution in Rp and marginal distributions Gw and Gz in Rp1 and Rp2, respectively, where p = p1 + p2. Suppose d²(W) = ‖W1 − W2‖² and d²(Z) = ‖Z1 − Z2‖². The vectors W and Z are independent if and only if the interpoint distances d²(W) and d²(Z) are independent.
This result shows that statistics based on interpoint distances can not only detect linear dependence but also find non-linear relationships that are induced on the power normal scale. The equivalence of the distributions' independence and the interpoint distances' independence under other distributions requires further research. However, in the simulations, we see that the tests based on interpoint distances also have good power for other types of dependence.
2.1. IPD EDF method (EDF)
Denote the marginal DFs of d(X) , and d(Y) and their joint DF by Hx (x) = P(d(X) ≤ x),
Hy (y) = P(d(Y) ≤ y) and Hxy (x, y) = P(d(X) ≤ x, d(Y) ≤ y), respectively. Let I(·) denote
the indicator function and estimate the DFs by

Ĥx(x) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st ≤ x),

Ĥy(y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(y)st ≤ y),

Ĥxy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st ≤ x) I(d(y)st ≤ y),  (5)

where m = n(n − 1)/2.
We obtain m estimates of the joint and marginal distributions of IPDs at (d(x)ij , d(y)ij )
for 1 ≤ i < j ≤ n. To measure the inability of bivariate DF H to factorise into a product of
marginal DFs at each pair of IPDs (d(x)ij , d(y)ij ), we estimate the total squared deviations
from independence and define the test statistic

EDF = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} [Ĥxy(d(x)ij, d(y)ij) − Ĥx(d(x)ij) Ĥy(d(y)ij)]².  (6)
The range of the statistic EDF is 0 ≤ EDF ≤ n(n − 1)/2. Under H0 we expect EDF to be small, and we reject H0 for large values of EDF.
The statistic EDF is similar in spirit to the statistic Bn in Equation (3), but with a fundamental difference. While Bn is evaluated at the n observed i.i.d. pairs (xi, yi), the m pairs (d(x)ij, d(y)ij) that construct EDF are not independent. Thus the null distribution of EDF cannot be obtained asymptotically as in the i.i.d. case. Instead, we use random permutation testing to draw conclusions with EDF, which is described as follows.
Let R be the total number of permutations. In the rth permutation, for r = 1, . . . , R, we reorder the observations {Yi}_{i=1}^{n} as {Y(r)i}_{i=1}^{n}. Consider the sample pairs {(Xi, Y(r)i)}_{i=1}^{n} and calculate the statistic as EDF(r). When the null hypothesis of independence is true, the distribution of EDF(r) is the same as the distribution of EDF. Thus we can obtain the empirical distribution of EDF based on the values of the statistics EDF(r).
The permutation test will result in exact critical values if we consider all possible permutations. However, unless the sample size is small, this is not computationally feasible. Instead, we obtain an approximate value by performing a sequence of R random permutations. Since we reject the null hypothesis for large values of EDF, the p-value is calculated as the proportion of permuted statistics exceeding the observed value, p-valueEDF = (1/R) Σ_{r=1}^{R} I(EDF < EDF(r)).
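As an illustration of the EDF statistic and its permutation test, the following R sketch computes (5), (6) and the permutation p-value. The function names (ipd, edf_stat, edf_test) and the use of Euclidean distances via dist() are our own choices for this sketch and are not taken from the authors' code.

## Illustrative sketch of the EDF test (Section 2.1); not the authors' implementation.
## X, Y: n x p1 and n x p2 data matrices with observations in rows.

ipd <- function(M) as.vector(dist(M))      # the m = n(n-1)/2 Euclidean IPDs

edf_stat <- function(dx, dy) {
  stat <- 0
  for (k in seq_along(dx)) {
    Hx  <- mean(dx <= dx[k])                # marginal EDF of the X IPDs at d(x)ij
    Hy  <- mean(dy <= dy[k])                # marginal EDF of the Y IPDs at d(y)ij
    Hxy <- mean(dx <= dx[k] & dy <= dy[k])  # joint EDF at (d(x)ij, d(y)ij)
    stat <- stat + (Hxy - Hx * Hy)^2
  }
  stat
}

edf_test <- function(X, Y, R = 999) {
  n   <- nrow(X)
  dx  <- ipd(X)
  obs <- edf_stat(dx, ipd(Y))
  ## permutation: reorder the Y observations relative to the X observations
  perm <- replicate(R, edf_stat(dx, ipd(Y[sample(n), , drop = FALSE])))
  mean(perm >= obs)                         # permutation p-value
}

## Example: set.seed(1); X <- matrix(rnorm(20 * 50), 20); Y <- matrix(rnorm(20 * 50), 20)
## edf_test(X, Y, R = 199)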
2.2. IPD cut-point method (CP)
Modarres (2007) describes a test statistic based on the sum of the likelihood ratio statistics for testing independence in 2 × 2 tables defined at the n sample cut-points (xi, yi), for testing the hypothesis H0 in (1) when p1 = p2 = 1. The IPD cut-point method extends this idea
to high dimensional independence testing using the IPDs. Consider an observed IPD pair
(d(x)ij , d(y)ij ) on the plane and the four regions obtained by the two lines passing through it
and parallel to the axes. Let H̄xy (d(x) , d(y) ) = 1 − Hx (d(x) ) − Hy (d(y) ) + Hxy (d(x) , d(y) ) be
the bivariate survival function of IPDs. Table 1 shows the probabilities and estimates of the
observations that fall in each of the four regions around (d(x)ij , d(y)ij ). Under random sampling, the joint distribution of the frequencies in the four quadrants follows a multinomial
distribution at each fixed cut-point (d(x)ij , d(y)ij ).
With each observation (d(x)ij, d(y)ij), 1 ≤ i < j ≤ n, we associate a 2 × 2 table with elements mrk(ij)/m for r, k = 1, 2 and calculate

m11(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st ≤ d(x)ij) I(d(y)st ≤ d(y)ij),
m12(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st ≤ d(x)ij) I(d(y)st > d(y)ij),
m21(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st > d(x)ij) I(d(y)st ≤ d(y)ij),
m22(ij) = Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st > d(x)ij) I(d(y)st > d(y)ij),  (7)

m1+(ij) = m11(ij) + m12(ij),  m2+(ij) = m21(ij) + m22(ij),
m+1(ij) = m11(ij) + m21(ij),  m+2(ij) = m12(ij) + m22(ij).
Recall that m = n(n − 1)/2 is the total number of interpoint distance pairs. One can show that m11(ij) + m12(ij) + m21(ij) + m22(ij) = m. Let cij = m!/(m11(ij)! m12(ij)! m21(ij)! m22(ij)!). The likelihood at each fixed observation (d(x)ij, d(y)ij) is defined as

Lij = cij [Hxy(d(x)ij, d(y)ij)]^{m11(ij)} [Hx(d(x)ij) − Hxy(d(x)ij, d(y)ij)]^{m12(ij)} × [Hy(d(y)ij) − Hxy(d(x)ij, d(y)ij)]^{m21(ij)} [H̄xy(d(x)ij, d(y)ij)]^{m22(ij)}.  (8)
The likelihood ratio test of independence at each (d(x)ij, d(y)ij) is

λij = ∏_{r=1}^{2} ∏_{s=1}^{2} [ mr+(ij) m+s(ij) / (m mrs(ij)) ]^{mrs(ij)}.  (9)

The cut-point test statistic is CP = Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} −2 log(λij), or

CP = 2 Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Σ_{r=1}^{2} Σ_{s=1}^{2} mrs(ij) log[ m mrs(ij) / (mr+(ij) m+s(ij)) ].  (10)
When n → ∞ and for fixed p, the asymptotic distribution of −2 log(λij) is χ² with one degree of freedom. Furthermore, the λij are dependent as they depend on the cut-points (d(x)ij, d(y)ij). The statistic CP computes the total likelihood over all m cut-points, and the m individual likelihood ratio test statistics −2 log(λij) are available to further classify regions of positive (negative) or no dependence. In practice, one uses random permutation testing to obtain the sampling distribution of CP.

Table 1. The 2 × 2 table of probabilities and estimates at (d(x)ij, d(y)ij).

Prob. (Estimate)   d(Y) ≤ d(y)ij                                     d(Y) > d(y)ij                                     Total
d(X) ≤ d(x)ij      Hxy(d(x)ij, d(y)ij)  (m11(ij)/m)                  Hx(d(x)ij) − Hxy(d(x)ij, d(y)ij)  (m12(ij)/m)    Hx(d(x)ij)  (m1+(ij)/m)
d(X) > d(x)ij      Hy(d(y)ij) − Hxy(d(x)ij, d(y)ij)  (m21(ij)/m)     H̄xy(d(x)ij, d(y)ij)  (m22(ij)/m)                 1 − Hx(d(x)ij)  (m2+(ij)/m)
Total              Hy(d(y)ij)  (m+1(ij)/m)                           1 − Hy(d(y)ij)  (m+2(ij)/m)                       1
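The counts in (7) and the statistic in (10) can be computed directly from the paired IPD vectors. The R sketch below is illustrative only; skipping empty cells inside the logarithm is our own convention and is not specified in the paper.

## Illustrative sketch of the cut-point statistic CP in (10).
## dx, dy: paired IPD vectors of length m, e.g. dx <- as.vector(dist(X)); dy <- as.vector(dist(Y)).
cp_stat <- function(dx, dy) {
  m <- length(dx)
  total <- 0
  for (k in seq_len(m)) {
    ## 2 x 2 table of IPD pairs around the cut-point (dx[k], dy[k]), as in (7)
    m11 <- sum(dx <= dx[k] & dy <= dy[k])
    m12 <- sum(dx <= dx[k] & dy >  dy[k])
    m21 <- sum(dx >  dx[k] & dy <= dy[k])
    m22 <- sum(dx >  dx[k] & dy >  dy[k])
    cell <- c(m11, m12, m21, m22)
    rsum <- c(m11 + m12, m11 + m12, m21 + m22, m21 + m22)   # m_{r+(ij)}
    csum <- c(m11 + m21, m12 + m22, m11 + m21, m12 + m22)   # m_{+s(ij)}
    nz   <- cell > 0                                        # skip empty cells in the log
    total <- total + 2 * sum(cell[nz] * log(m * cell[nz] / (rsum[nz] * csum[nz])))
  }
  total
}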
2.3. IPD odds ratio (Odds)
Fleiss et al. (2013) describe the odds ratio test for the independence of two variables where each variable has exactly two possible outcomes. Agresti (1980) proposed generalised odds ratios for ordinal data. In this section, we apply odds ratios to the interpoint distances. Consider the odds ratio of the 2 × 2 table (Table 1); at each point (d(x)ij, d(y)ij) the odds ratio is

θ(xy)ij = Hxy(d(x)ij, d(y)ij) H̄xy(d(x)ij, d(y)ij) / { [Hx(d(x)ij) − Hxy(d(x)ij, d(y)ij)] [Hy(d(y)ij) − Hxy(d(x)ij, d(y)ij)] },  (11)

where 0 < θ(xy)ij < ∞. If the odds ratio θ(xy)ij is larger than 1, then the interpoint distances of the Xs and Ys at (d(x)ij, d(y)ij) are positively associated in the sense that the X IPD being at most d(x)ij raises the odds of the Y IPD being at most d(y)ij. If θ(xy)ij is less than 1, then the X and Y IPDs are negatively correlated. Under independence, the odds ratios θ(xy)ij = 1 for all (d(x)ij, d(y)ij) in R². The 2 × 2 table of probabilities and estimates at (d(x)ij, d(y)ij) appears in Table 1. We estimate the odds ratio at each (d(x)ij, d(y)ij) with θ̂(xy)ij = (m11(ij) m22(ij))/(m12(ij) m21(ij)). We consider the average odds ratio and propose the statistic

Odds = (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} θ̂(xy)ij.  (12)
If the alternative hypothesis is positive correlation between the IPDs, the null hypothesis
of independence is rejected for large values of Odds. If we consider the alternative as negative correlation between the IPDs, we reject the null hypothesis for small values of Odds.
Otherwise, we consider the difference between Odds and 1. The distribution of Odds under
the null is calculated based on the permutation principle.
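A corresponding R sketch for the average odds ratio in (12) is given below; the small continuity correction added to each cell is our own device to avoid division by zero and is not part of the paper's definition.

## Illustrative sketch of the Odds statistic (12).
odds_stat <- function(dx, dy, cc = 0.5) {
  m <- length(dx)
  theta <- numeric(m)
  for (k in seq_len(m)) {
    m11 <- sum(dx <= dx[k] & dy <= dy[k]); m12 <- sum(dx <= dx[k] & dy > dy[k])
    m21 <- sum(dx >  dx[k] & dy <= dy[k]); m22 <- sum(dx >  dx[k] & dy > dy[k])
    ## empirical odds ratio at the cut-point, with continuity correction cc
    theta[k] <- ((m11 + cc) * (m22 + cc)) / ((m12 + cc) * (m21 + cc))
  }
  mean(theta)                               # average odds ratio over the m cut-points
}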
2.4. IPD correlation test (Coripd )
The premier test of independence under bivariate normality is based on the sample Pearson correlation coefficient r. Using a geometric argument, Fisher (1915) found the exact
distribution of the correlation coefficient. Pitman (1937) obtains the permutation distribution of r, shows that it is symmetric with zero mean and variance 1/(n − 1), and proves that r is asymptotically normal as the sample size becomes unbounded. Having the distinction of being the uniformly most powerful unbiased and uniformly most powerful invariant test of independence for bivariate normal samples, the statistic r is designed to detect linear dependence.
Let {Zi = [Xi′, Yi′]′}_{i=1}^{n} be the sample in Rp. Under some regularity conditions given in Theorem 1 of Li (2018), as the dimension p tends to infinity the squared interpoint distances ‖Zi − Zj‖² for 1 ≤ i < j ≤ n converge to a normal distribution. It follows that the random vector (d(X), d(Y)) converges to a bivariate normal distribution as p tends to infinity.
Consider the covariance and the correlation for the pair (d²(X), d²(Y)). Denote the mean vector of (d(X), d(Y)) by (µd(x), µd(y)) and estimate it by d̄(x) = (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} d(x)ij and d̄(y) = (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} d(y)ij, respectively. Using the m observed pairs (d(x)ij, d(y)ij), we estimate the Pearson correlation between d(X) and d(Y) with

Coripd = Ĉorr(d(X), d(Y)) = (1/(sx sy)) { (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} d(x)ij d(y)ij − d̄(x) d̄(y) },  (13)

where sx and sy are the standard deviations of the d(x)ij and the d(y)ij for 1 ≤ i < j ≤ n, respectively. The range of the statistic Coripd is −1 ≤ Coripd ≤ 1. If the alternative hypothesis is positive correlation between the IPDs, we reject the null hypothesis for large values of Coripd. For negative correlation as the alternative, the null hypothesis is rejected for small values of Coripd. If the alternative hypothesis is unknown, we consider the absolute value of the statistic Coripd.
We obtain the sampling distribution of Coripd from random permutations.
Suppose {Zi′ = [Xi′, Yi′]}_{i=1}^{n} is a sample of p-dimensional random vectors drawn from a normal distribution with mean (µx, µy) and covariance matrix Σ = [Σxx Σxy; Σyx Σyy]. The following lemma shows that the distribution of the squared IPD between any two random vectors within the X sample is a linear combination of p1 independent central chi-square random variables with 1 degree of freedom.

Lemma 2.1: Let X1, X2 be i.i.d. copies of a random vector X with multivariate normal distribution Gx with mean µx and covariance Σxx. Suppose d²(X) = ‖X1 − X2‖². Then d²(X) = Σ_{t=1}^{p1} λt Ut², where λ1, . . . , λp1 are the eigenvalues of 2Σxx and Ut² ∼ χ²(1).
Based on Lemma 2.1, the expected value and variance of d²(X) are 2tr(Σxx) and 8tr(Σ²xx), respectively. Similarly, the expected value and variance of d²(Y) are 2tr(Σyy) and 8tr(Σ²yy), respectively. The proof of Theorem 2.1 shows that the covariance between d²(X) and d²(Y) is Cov(d²(X), d²(Y)) = 8tr(Σxy Σ′xy). Thus the correlation is Corr(d²(X), d²(Y)) = tr(Σxy Σ′xy) / [tr(Σ²xx) tr(Σ²yy)]^{1/2}.
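The closed form above is easy to check numerically. The following R sketch compares the theoretical correlation tr(Σxy Σ′xy)/[tr(Σ²xx) tr(Σ²yy)]^{1/2} with a Monte Carlo estimate; the dimensions and the randomly generated covariance matrix are arbitrary choices for illustration.

## Monte Carlo check of Corr(d^2_(X), d^2_(Y)) under multivariate normality (illustrative).
library(MASS)
set.seed(2)
p1 <- 3; p2 <- 2
Sigma <- crossprod(matrix(rnorm((p1 + p2)^2), p1 + p2))   # an arbitrary covariance matrix
Sxx <- Sigma[1:p1, 1:p1]
Syy <- Sigma[(p1 + 1):(p1 + p2), (p1 + 1):(p1 + p2)]
Sxy <- Sigma[1:p1, (p1 + 1):(p1 + p2)]

## tr(Sxy Sxy') / sqrt(tr(Sxx^2) tr(Syy^2))
theory <- sum(Sxy^2) / sqrt(sum(Sxx^2) * sum(Syy^2))

B  <- 1e5                                                  # number of simulated pairs (Z1, Z2)
Z1 <- mvrnorm(B, rep(0, p1 + p2), Sigma)
Z2 <- mvrnorm(B, rep(0, p1 + p2), Sigma)
d2x <- rowSums((Z1[, 1:p1] - Z2[, 1:p1])^2)
d2y <- rowSums((Z1[, (p1 + 1):(p1 + p2)] - Z2[, (p1 + 1):(p1 + p2)])^2)
c(theory = theory, monte_carlo = cor(d2x, d2y))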
Let rts be the sample correlation between the tth component of X and the sth component of Y. Schott (2005) proposed the statistic T = Σ_{t=1}^{p1} Σ_{s=1}^{p2} r²ts to test the independence of two normal vectors. Under multivariate normality, the asymptotic distribution of this statistic is considered by Schott (2008) and Mao (2014). Nonparametric versions of this statistic using Spearman's and Kendall's correlations have also been examined in the literature. Compared with statistics that use the correlations between the components of X and Y directly, the estimation of Corr(d²(X), d²(Y)) not only considers the information from Σxy but also includes the correlations within each random vector.
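A minimal R sketch of the IPD correlation test follows; stats::cor reproduces the explicit formula in (13), and the two-sided permutation p-value uses |Coripd| as discussed above.

## Illustrative sketch of the Coripd test (Section 2.4).
cor_ipd <- function(X, Y) {
  dx <- as.vector(dist(X))     # X interpoint distances
  dy <- as.vector(dist(Y))     # Y interpoint distances
  cor(dx, dy)                  # Pearson correlation of the m IPD pairs, as in (13)
}

cor_ipd_test <- function(X, Y, R = 999) {
  n    <- nrow(X)
  obs  <- abs(cor_ipd(X, Y))   # two-sided alternative: use |Coripd|
  perm <- replicate(R, abs(cor_ipd(X, Y[sample(n), , drop = FALSE])))
  mean(perm >= obs)            # permutation p-value
}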
2.5. IPD quadrant test (Quad)
While there is a unique definition of the cumulative distribution function (c.d.f.) of a random variable, there is not a unique definition of the c.d.f. for a multivariate random vector. The distribution function is uniquely defined when the dimension is p = 1 since P(X ≤ x) = 1 − P(X > x). This is no longer the case when p ≥ 2. With p variables, there are 2^p − 1 ways of defining a distribution function. Consider the bivariate case, where one may define the c.d.f. of the pair (x, y) based on how the variables are accumulated in the 4 regions around (x, y). In the multivariate case, we may use IPDs to define the c.d.f. as P(d(X) ≤ x, d(Y) ≤ y), P(d(X) ≤ x, d(Y) > y), P(d(X) > x, d(Y) ≤ y) or P(d(X) > x, d(Y) > y). To test for independence between X and Y we take all four possible orderings of the IPDs into account. The criteria to measure the dependence at each (x, y) are

θ1(x, y) = P(d(X) ≤ x, d(Y) ≤ y) − P(d(X) ≤ x) P(d(Y) ≤ y),
θ2(x, y) = P(d(X) ≤ x, d(Y) > y) − P(d(X) ≤ x) P(d(Y) > y),
θ3(x, y) = P(d(X) > x, d(Y) ≤ y) − P(d(X) > x) P(d(Y) ≤ y),
θ4(x, y) = P(d(X) > x, d(Y) > y) − P(d(X) > x) P(d(Y) > y).
It is clear that Σ_{k=1}^{4} θk(x, y) = 0. Under the null hypothesis of independence, we have θk(x, y) = 0 for k = 1, . . . , 4. The null hypothesis is rejected if any one of θ1(x, y), . . . , θ4(x, y) has a large absolute value. We estimate θk with θ̂k, where

θ̂1(x, y) = Ĥxy(x, y) − Ĥx(x) Ĥy(y),
θ̂2(x, y) = Ĥ2xy(x, y) − Ĥx(x)(1 − Ĥy(y)),
θ̂3(x, y) = Ĥ3xy(x, y) − (1 − Ĥx(x)) Ĥy(y),
θ̂4(x, y) = Ĥ4xy(x, y) − (1 − Ĥx(x))(1 − Ĥy(y)),

Ĥ2xy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st ≤ x) I(d(y)st > y),
Ĥ3xy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st > x) I(d(y)st ≤ y),
Ĥ4xy(x, y) = (1/m) Σ_{s=1}^{n−1} Σ_{t=s+1}^{n} I(d(x)st > x) I(d(y)st > y).
To test H0 we will use the test statistic

Quad = max_{1 ≤ i < j ≤ n, 1 ≤ k ≤ 4} θ̂k²(d(x)ij, d(y)ij).  (14)

Ranging over the four quadrants and the n data points, the statistic Quad is the maximum discrepancy between the estimated c.d.f. and its factorised version. The range of Quad is 0 ≤ Quad ≤ 1, and we reject the null hypothesis for a large value of Quad. We use random permutations to obtain the distribution of Quad.
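An R sketch of the Quad statistic is given below; it scans the m observed IPD pairs and the four quadrant deviations defined above (illustrative only).

## Illustrative sketch of the Quad statistic (14).
quad_stat <- function(dx, dy) {
  worst <- 0
  for (k in seq_along(dx)) {
    Hx <- mean(dx <= dx[k]); Hy <- mean(dy <= dy[k])
    H1 <- mean(dx <= dx[k] & dy <= dy[k])
    H2 <- mean(dx <= dx[k] & dy >  dy[k])
    H3 <- mean(dx >  dx[k] & dy <= dy[k])
    H4 <- mean(dx >  dx[k] & dy >  dy[k])
    dev <- c(H1 - Hx * Hy,                  # theta_1 estimate
             H2 - Hx * (1 - Hy),            # theta_2 estimate
             H3 - (1 - Hx) * Hy,            # theta_3 estimate
             H4 - (1 - Hx) * (1 - Hy))      # theta_4 estimate
    worst <- max(worst, max(dev^2))
  }
  worst
}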
2.6. IPD transformation tests
In this section, we introduce two other test statistics based on the transformation of the
marginal distributions to normality. It is well known that marginal transformation to normality enhances the power of the normal test for independence. Fisher’s Z-transformation
is an example. Kowalski and Tarter (1969) consider properties of the normal theory test
for independence applied to observations from bivariate distributions constrained to have
normal marginals and show that the normal theory test for independence is more powerful
when applied to the transformed vectors.
For a random variable X with a continuous DF Hx, the transformation Wx = Φ^{−1}(Hx(X)), where Φ^{−1} is the standard normal quantile function, produces normally distributed variates. We compute the empirical distribution function of the IPDs using (5) and then form

Trans = Ĉorr( Φ^{−1}(Ĥx(d(x))), Φ^{−1}(Ĥy(d(y))) )
      = (1/(σ̂1 σ̂2)) { (1/m) Σ_{i=1}^{n−1} Σ_{j=i+1}^{n} Φ^{−1}(Ĥx(d(x)ij)) Φ^{−1}(Ĥy(d(y)ij)) − µ̂1 µ̂2 },  (15)

where m = n(n − 1)/2, µ̂1 and σ̂1 are the mean and standard deviation of Φ^{−1}(Ĥx(d(x)ij)), and µ̂2 and σ̂2 are the mean and standard deviation of Φ^{−1}(Ĥy(d(y)ij)) for 1 ≤ i < j ≤ n. The statistic measures the correlation between the IPDs of X and Y after normalising transformations of the margins. The distribution of Trans is obtained by the permutation method.
Other transformations to normality, such as the cube root transformation, have also been suggested in the literature. Freeman and Modarres (2005) consider the efficiency and the power of the normal theory test for independence after a Box–Cox transformation and obtain an expression for the correlation between the variates after the transformation in terms of the correlation on the normal scale. They show that it is always more efficient to conduct the test of independence based on the Pearson correlation coefficient after transformation to normality. Wilson and Hilferty (1931) consider the cube root transformation of a chi-squared variable to normality. Lemma 2.1 shows that the distribution of squared interpoint distances drawn from multivariate normal vectors is a linear combination of chi-squared random variables, which is often approximated by a chi-squared distribution. The distribution of IPDs is of chi-squared type for observations from other distributions as well (Song and Modarres 2019). Hence, we consider a test based on the correlation of the cube roots of the IPDs. Let d(x)^{1/3} and d(y)^{1/3} denote the cube roots of the sample IPDs. The cube root transformation computes

IPDcube = Ĉorr( d(x)^{1/3}, d(y)^{1/3} ).  (16)

The approach to calculating the correlation estimate is similar to Equation (15). We use random permutations to obtain the approximate sampling distribution of IPDcube.
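The two transformation statistics can be sketched in R as follows; using rank()/(m + 1) as the empirical DF keeps the normal quantiles finite, a standard device that is our own choice here and is not spelled out in the paper.

## Illustrative sketches of Trans in (15) and IPDcube in (16).
trans_stat <- function(dx, dy) {
  m  <- length(dx)
  zx <- qnorm(rank(dx) / (m + 1))   # Phi^{-1}(Hx_hat(d(x)ij))
  zy <- qnorm(rank(dy) / (m + 1))   # Phi^{-1}(Hy_hat(d(y)ij))
  cor(zx, zy)
}

cube_stat <- function(dx, dy) cor(dx^(1/3), dy^(1/3))   # correlation of cube-root IPDs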
3. Comparisons
A desirable method should maintain its power against all forms of dependence in the data. We consider some monotone dependence structures under different distributions in Section 3.2 and some non-monotone dependence structures in Section 3.3. Three existing methods are described in Section 3.1 and will be used for comparison.
3.1. Dcor, HHG and Pcov methods
Szekely et al. (2007) propose the distance correlation to measure the dependence between random vectors. Suppose fX,Y(t, s), fX(t) and fY(s) are the joint and marginal characteristic functions of the random vectors X and Y with finite first moments. The distance covariance (dCov) between X and Y and their distance variances are defined by

V(X, Y) = ‖fX,Y(t, s) − fX(t) fY(s)‖_{L2},
V(X) = ‖fX,X(t, s) − fX(t) fX(s)‖_{L2},
V(Y) = ‖fY,Y(t, s) − fY(t) fY(s)‖_{L2},

where ‖·‖_{L2} is a weighted L2 norm of the difference between the characteristic functions. The distance correlation (Dcor) is defined as

R(X, Y) = V(X, Y)/[V(X) V(Y)]^{1/2} if V(X) V(Y) > 0, and R(X, Y) = 0 if V(X) V(Y) = 0.
Szekely et al. (2007) prove that when the Euclidean norms of X and Y have finite expectations, the random vectors X and Y are independent if and only if R(X, Y) = 0. Motivated by this property, they propose the empirical distance correlation Rn(X, Y) for samples X = {Xi}_{i=1}^{n} and Y = {Yi}_{i=1}^{n}.
The method finds the distance matrices of the X and Y samples and centres them so each has column and row means equal to zero. By averaging the entries of the component-wise product of the two centred distance matrices, the squared distance covariance between the two variables is found. Define akl = ‖Xk − Xl‖, āk· = (1/n) Σ_{l=1}^{n} akl, ā·l = (1/n) Σ_{k=1}^{n} akl, ā·· = (1/n²) Σ_{k,l=1}^{n} akl, Akl = akl − āk· − ā·l + ā··, and bkl = ‖Yk − Yl‖, b̄k· = (1/n) Σ_{l=1}^{n} bkl, b̄·l = (1/n) Σ_{k=1}^{n} bkl, b̄·· = (1/n²) Σ_{k,l=1}^{n} bkl, Bkl = bkl − b̄k· − b̄·l + b̄··, for k, l = 1, . . . , n. The empirical distance covariance and the empirical distance variances are

Vn(X, Y) = (1/n) ( Σ_{k,l=1}^{n} Akl Bkl )^{1/2},
Vn(X) = (1/n) ( Σ_{k,l=1}^{n} A²kl )^{1/2},
Vn(Y) = (1/n) ( Σ_{k,l=1}^{n} B²kl )^{1/2}.

The empirical distance correlation is

Rn(X, Y) = Vn(X, Y)/[Vn(X) Vn(Y)]^{1/2} if Vn(X) Vn(Y) > 0, and Rn(X, Y) = 0 if Vn(X) Vn(Y) = 0.
Szekely et al. (2007) prove that, almost surely, lim_{n→∞} R²n(X, Y) = R(X, Y). Therefore, it is straightforward to use Rn to measure dependence based on the samples X and Y. The R package 'energy' implements independence tests based on the distance correlation, and we use Dcor to denote this method in the tables.
Heller et al. (2013) propose 2 × 2 contingency tables similar to those of Modarres (2007), replacing the distribution functions by the probability that (X, Y) falls in balls with centre (Xi, Yi) and some radius, for i = 1, . . . , n. Based on Pearson's test, Heller et al. (2013) construct an R package named 'HHG' with the statistic

HHG = Σ_{i=1}^{n} Σ_{j=1, j≠i}^{n} (n − 2){m12(ij) m21(ij) − m11(ij) m22(ij)}² / [m1+(ij) m2+(ij) m+1(ij) m+2(ij)],

where m11(ij), m12(ij), m21(ij), m22(ij), m1+(ij), m2+(ij), m+1(ij), m+2(ij) are defined as in (7).
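For reference, both existing tests are available in R; the usage below reflects our understanding of the 'energy' and 'HHG' interfaces (function and argument names may differ across package versions) and is only an illustration.

## Illustrative use of the packaged Dcor and HHG tests.
library(energy)
library(HHG)

set.seed(1)
X <- matrix(rnorm(20 * 50), nrow = 20)
Y <- matrix(rnorm(20 * 50), nrow = 20)

dcor.test(X, Y, R = 199)          # distance correlation permutation test (package 'energy')

Dx <- as.matrix(dist(X))          # HHG operates on distance matrices
Dy <- as.matrix(dist(Y))
hhg.test(Dx, Dy, nr.perm = 199)   # Heller-Heller-Gorfine test (package 'HHG')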
To handle high-dimensional data, Zhu et al. (2017) project the data into low dimensions and then apply the Pearson correlation to test independence. They show that two random vectors are independent if and only if all of their projections are independent. Motivated by this equivalence, they define the squared projection covariance between two random vectors X and Y and propose the test statistic

Pcov = n^{−3} Σ_{k,l,r=1}^{n} Aklr Bklr,

where

aklr = arccos[ (Xk − Xr)′(Xl − Xr) / (‖Xk − Xr‖ ‖Xl − Xr‖) ],
āk·r = n^{−1} Σ_{l=1}^{n} aklr,  ā·lr = n^{−1} Σ_{k=1}^{n} aklr,  ā··r = n^{−2} Σ_{k=1}^{n} Σ_{l=1}^{n} aklr,
Aklr = aklr − āk·r − ā·lr + ā··r,

bklr = arccos[ (Yk − Yr)′(Yl − Yr) / (‖Yk − Yr‖ ‖Yl − Yr‖) ],
b̄k·r = n^{−1} Σ_{l=1}^{n} bklr,  b̄·lr = n^{−1} Σ_{k=1}^{n} bklr,  b̄··r = n^{−2} Σ_{k=1}^{n} Σ_{l=1}^{n} bklr,
Bklr = bklr − b̄k·r − b̄·lr + b̄··r.
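A direct, unoptimised O(n³) R sketch of this statistic is given below; the clipping of the arccos argument to [−1, 1] and the handling of the degenerate terms with k = r or l = r are our own conventions, not specified in the displayed formula.

## Illustrative sketch of the projection covariance statistic Pcov.
pcov_stat <- function(X, Y) {
  n <- nrow(X)
  angle_array <- function(M) {
    a <- array(0, dim = c(n, n, n))
    for (r in seq_len(n)) {
      D <- sweep(M, 2, M[r, ])                 # row k holds M_k - M_r
      G <- D %*% t(D)                          # inner products (M_k - M_r)'(M_l - M_r)
      nrm <- sqrt(diag(G))
      C <- G / outer(nrm, nrm)                 # cosine of the angle a_klr
      C[!is.finite(C)] <- 0                    # degenerate terms (k = r or l = r)
      a[, , r] <- acos(pmin(pmax(C, -1), 1))   # a_klr, with clipping for rounding error
    }
    a
  }
  center <- function(a) {                      # A_klr = a_klr - a_k.r - a_.lr + a_..r
    for (r in seq_len(n)) {
      ak <- rowMeans(a[, , r]); al <- colMeans(a[, , r]); aa <- mean(a[, , r])
      a[, , r] <- a[, , r] - outer(ak, rep(1, n)) - outer(rep(1, n), al) + aa
    }
    a
  }
  A <- center(angle_array(X))
  B <- center(angle_array(Y))
  sum(A * B) / n^3
}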
3.2. Sampling experiment
In this section, we compare the proposed methods with the approaches from the R packages 'energy' and 'HHG', as well as the statistic from Zhu et al. (2017).
Table 2. The type I error and power of different statistics under the multivariate normal distribution.

ρ     p1    EDF    Odds   CP     Coripd  Corcube  Quad   Trans  Dcor   HHG    Pcov
0     50    0.051  0.035  0.046  0.042   0.042    0.049  0.042  0.043  0.041  0.049
0     100   0.060  0.054  0.067  0.065   0.064    0.047  0.068  0.045  0.054  0.041
0     200   0.048  0.063  0.050  0.061   0.059    0.057  0.059  0.046  0.042  0.045
0.1   50    0.431  0.566  0.505  0.583   0.560    0.386  0.536  0.792  0.507  0.763
0.1   100   0.731  0.837  0.795  0.859   0.841    0.673  0.817  0.971  0.808  0.961
0.1   200   0.93   0.959  0.949  0.972   0.967    0.905  0.959  0.999  0.958  0.997
0.2   50    0.893  0.943  0.928  0.953   0.946    0.876  0.931  0.991  0.939  0.987
0.2   100   0.978  0.989  0.985  0.993   0.991    0.972  0.988  1      0.992  1
0.2   200   0.998  1      0.998  1       1        0.997  0.999  1      1      1
0.3   50    0.983  0.993  0.989  0.995   0.994    0.978  0.993  1      0.992  1
0.3   100   0.995  0.998  0.998  0.999   0.999    0.994  0.997  1      0.999  1
0.3   200   1      1      1      1       1        1      1      1      1      1
The dependence structures we considered are correlated relations and some unusual dependence patterns. All tests are carried out at the significance level α = 0.05. The simulations are performed under the multivariate (a) normal, (b) folded normal (Leone et al. 1961), (c) log-normal, (d) Cauchy, (e) t and (f) double exponential distributions. Each simulation is conducted with 1000 replications and sample size n = 20 in each run. We use p1 = p2 ∈ {50, 100, 200}. The testing samples are generated by the following steps. We first generate p = p1 + p2 dimensional samples {Zi}_{i=1}^{n} from a multivariate normal distribution with mean µ and covariance Σ. The first p1 elements of Zi are set to Xi, and the remaining p2 elements are used for Yi, i.e. Zi = (Xi′, Yi′)′. Let Σ = [Σxx Σxy; Σyx Σyy], where Σxy is of dimension p1 × p2. By changing the matrix Σxy, we can generate the paired sample {(Xi, Yi)}_{i=1}^{n} with different degrees of dependence under the multivariate normal distribution.
By taking the absolute value or exponentiating the elements of Xi and Yi, for i = 1, . . . , n, we obtain samples from the multivariate folded normal or log-normal distributions. We use Σ = (σij), where σii = 1 + ρ and σij = ρ for i ≠ j, i, j = 1, . . . , p. The value of ρ measures the degree of correlation and varies in ρ ∈ {0, 0.1, 0.2, 0.3}. When ρ = 0, samples are drawn under the null hypothesis of independence and the simulation results estimate the type I error.
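The data-generating step described above can be sketched in R as follows; MASS::mvrnorm is one of several ways to draw from the multivariate normal distribution, and the function name make_sample is our own.

## Illustrative sketch of the simulation design of Section 3.2.
library(MASS)

make_sample <- function(n = 20, p1 = 50, p2 = 50, rho = 0.1,
                        family = c("normal", "folded", "lognormal")) {
  family <- match.arg(family)
  p <- p1 + p2
  Sigma <- matrix(rho, p, p); diag(Sigma) <- 1 + rho       # sigma_ii = 1 + rho, sigma_ij = rho
  Z <- mvrnorm(n, mu = rep(0, p), Sigma = Sigma)
  Z <- switch(family, normal = Z, folded = abs(Z), lognormal = exp(Z))
  list(X = Z[, 1:p1, drop = FALSE], Y = Z[, (p1 + 1):p, drop = FALSE])
}

## Example: one replication under the folded normal with rho = 0.2
## s <- make_sample(rho = 0.2, family = "folded"); str(s)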
Table 2 shows the estimated type I errors and powers of the statistics under the multivariate normal distribution. One notices that all tests have their type I error under control.
The estimated significance levels are close to the nominal level of 0.05. In terms of power,
one observes that Dcor and Pcov perform the best, followed by the Coripd and Corcube
tests. All 10 test statistics work fine under the multivariate normal distribution. Nearly all
powers are above 0.9 for alternatives close to independence. The testing powers of all the
methods increase when we increase the dimensions p1 and p2 .
Table 3 shows results for the multivariate folded normal distribution. The estimated
significance levels are close to the nominal level of 0.05. The tests Coripd , Corcube and Trans
have the largest power, followed by Odds and CP methods. The seven newly proposed
methods dominate Dcor, HHG and Pcov. Increasing the dimension increases the power
of all the testing methods. In addition, compared with testing independence under the multivariate normal distribution, all tests show lower power.
Table 3. The type I error and power of different statistics under the multivariate folded normal distribution.

ρ     p1    EDF    Odds   CP     Coripd  Corcube  Quad   Trans  Dcor   HHG    Pcov
0     50    0.045  0.053  0.053  0.054   0.055    0.054  0.058  0.054  0.057  0.050
0     100   0.055  0.063  0.053  0.053   0.052    0.062  0.055  0.050  0.062  0.048
0     200   0.045  0.048  0.045  0.045   0.044    0.051  0.048  0.050  0.050  0.050
0.1   50    0.155  0.213  0.178  0.217   0.209    0.144  0.209  0.120  0.119  0.080
0.1   100   0.309  0.373  0.317  0.406   0.397    0.287  0.393  0.220  0.248  0.143
0.1   200   0.517  0.572  0.537  0.601   0.597    0.479  0.591  0.387  0.421  0.262
0.2   50    0.451  0.533  0.494  0.547   0.548    0.429  0.537  0.370  0.379  0.229
0.2   100   0.688  0.743  0.707  0.764   0.759    0.658  0.755  0.612  0.621  0.439
0.2   200   0.909  0.924  0.915  0.94    0.939    0.892  0.933  0.842  0.868  0.715
0.3   50    0.718  0.75   0.726  0.785   0.782    0.673  0.776  0.642  0.625  0.453
0.3   100   0.891  0.909  0.896  0.924   0.922    0.869  0.920  0.851  0.855  0.722
0.3   200   0.988  0.992  0.989  0.994   0.994    0.982  0.992  0.975  0.973  0.931
Table 4. The type I error and power of different statistics under the multivariate log-normal distribution.

ρ     p1    EDF    Odds   CP     Coripd  Corcube  Quad   Trans  Dcor   HHG    Pcov
0     50    0.052  0.056  0.052  0.058   0.056    0.052  0.055  0.059  0.055  0.054
0     100   0.059  0.056  0.06   0.052   0.051    0.061  0.054  0.048  0.054  0.047
0     200   0.047  0.043  0.048  0.037   0.038    0.045  0.044  0.039  0.049  0.047
0.1   50    0.529  0.543  0.556  0.431   0.532    0.464  0.595  0.566  0.468  0.728
0.1   100   0.704  0.727  0.728  0.611   0.714    0.657  0.762  0.739  0.682  0.884
0.1   200   0.897  0.925  0.919  0.847   0.907    0.849  0.932  0.921  0.884  0.976
0.2   50    0.806  0.812  0.844  0.679   0.801    0.744  0.857  0.820  0.804  0.955
0.2   100   0.941  0.942  0.96   0.854   0.934    0.877  0.968  0.941  0.935  0.996
0.2   200   0.993  0.99   0.994  0.964   0.985    0.973  0.992  0.984  0.991  1
0.3   50    0.938  0.939  0.961  0.823   0.927    0.881  0.956  0.939  0.92   0.994
0.3   100   0.984  0.984  0.986  0.927   0.971    0.961  0.985  0.975  0.988  1
0.3   200   0.997  0.997  0.997  0.987   0.997    0.991  0.998  0.996  0.998  1
Table 4 shows the performance of the statistics under the multivariate log-normal distribution. Clearly, all tests have their type I error under control. Regarding the powers, Pcov
performs better than all other testing methods. The CP and Trans tests are not as powerful
as Pcov, but are competitive when compared with the other methods. Larger dimension
implies a higher power for all testing methods.
Tables 5, 6 and 7 show the results for the multivariate Cauchy, t and double exponential distributions, which exhibit heavy tails. We construct the vectors {(Xi, Yi)}_{i=1}^{n} as follows. We first generate the observations {Xi}_{i=1}^{n}. For i = 1, . . . , n, each element of Xi is generated from the selected distribution. The observations {Yi}_{i=1}^{n} are generated as follows. Let Xi = (Xi1, . . . , Xip1)′ and Yi = (Yi1, . . . , Yip2)′. Suppose q is an integer such that 0 ≤ q ≤ min(p1, p2). For 1 ≤ j ≤ q, set Yij = X²ij. For q < j ≤ p2, Yij is generated from the selected distribution independently. The value of q measures the degree of dependence between Xi and Yi: a larger value of q implies more dependence, and q = 0 means that all the elements of Yi are generated independently of Xi. In the simulations, the dimension varies in p1 = p2 ∈ {50, 100, 200}, and q is chosen such that q/p2 ∈ {0, 0.2, 0.5, 0.7} for the Cauchy and t distributions and q/p2 ∈ {0, 0.5, 0.7, 0.9} for the double exponential distribution.
Table 5. The type I error and power of different statistics under the multivariate Cauchy distribution.

q/p2  p2    EDF    Odds   CP     Coripd  Corcube  Quad   Trans  Dcor   HHG    Pcov
0     50    0.042  0.053  0.050  0.059   0.059    0.050  0.051  0.057  0.048  0.054
0     100   0.051  0.044  0.046  0.045   0.044    0.046  0.044  0.042  0.045  0.040
0     200   0.051  0.049  0.050  0.056   0.055    0.057  0.058  0.055  0.048  0.057
0.2   50    0.368  0.341  0.391  0.288   0.321    0.320  0.367  0.294  0.296  0.370
0.2   100   0.419  0.350  0.434  0.268   0.301    0.348  0.413  0.280  0.318  0.403
0.2   200   0.390  0.347  0.403  0.276   0.326    0.326  0.396  0.291  0.279  0.379
0.5   50    0.924  0.897  0.936  0.583   0.693    0.840  0.862  0.620  0.912  0.859
0.5   100   0.925  0.875  0.930  0.600   0.692    0.862  0.857  0.631  0.907  0.851
0.5   200   0.939  0.904  0.941  0.640   0.726    0.875  0.886  0.674  0.937  0.889
0.7   50    0.998  0.994  0.999  0.778   0.852    0.985  0.974  0.813  0.997  0.972
0.7   100   0.995  0.994  0.996  0.776   0.864    0.976  0.980  0.793  0.996  0.971
0.7   200   0.994  0.992  0.996  0.797   0.864    0.985  0.980  0.823  0.999  0.980
Table 6. The type I error and power of different statistics under the multivariate t distribution.

q/p2  p2    EDF    Odds   CP     Coripd  Corcube  Quad   Trans  Dcor   HHG    Pcov
0     50    0.055  0.041  0.050  0.049   0.051    0.053  0.056  0.045  0.049  0.042
0     100   0.034  0.033  0.026  0.042   0.041    0.033  0.033  0.038  0.032  0.037
0     200   0.046  0.048  0.054  0.049   0.048    0.044  0.050  0.050  0.044  0.050
0.2   50    0.403  0.380  0.417  0.318   0.372    0.361  0.435  0.328  0.328  0.399
0.2   100   0.403  0.390  0.433  0.317   0.361    0.358  0.436  0.323  0.329  0.385
0.2   200   0.424  0.425  0.448  0.331   0.371    0.370  0.465  0.338  0.341  0.419
0.5   50    0.893  0.867  0.908  0.676   0.755    0.833  0.878  0.686  0.890  0.781
0.5   100   0.902  0.889  0.923  0.687   0.743    0.837  0.882  0.702  0.892  0.812
0.5   200   0.910  0.890  0.916  0.687   0.727    0.844  0.880  0.693  0.885  0.811
0.7   50    0.986  0.978  0.986  0.855   0.898    0.968  0.975  0.868  0.986  0.929
0.7   100   0.992  0.987  0.990  0.853   0.896    0.981  0.980  0.861  0.992  0.950
0.7   200   0.988  0.986  0.989  0.845   0.895    0.970  0.977  0.866  0.984  0.937
Table 7. The type I error and power of different statistics under the multivariate double exponential distribution.

q/p2  p2    EDF    Odds   CP     Coripd  Corcube  Quad   Trans  Dcor   HHG    Pcov
0     50    0.070  0.065  0.059  0.052   0.061    0.064  0.056  0.058  0.050  0.062
0     100   0.052  0.055  0.054  0.059   0.059    0.060  0.061  0.042  0.051  0.051
0     200   0.041  0.057  0.048  0.061   0.059    0.046  0.061  0.050  0.052  0.057
0.5   50    0.721  0.779  0.765  0.602   0.733    0.602  0.807  0.497  0.519  0.372
0.5   100   0.580  0.664  0.644  0.482   0.604    0.460  0.697  0.398  0.302  0.315
0.5   200   0.508  0.622  0.582  0.41    0.526    0.387  0.647  0.327  0.322  0.293
0.7   50    0.853  0.900  0.899  0.719   0.862    0.750  0.924  0.618  0.665  0.464
0.7   100   0.690  0.757  0.748  0.587   0.716    0.560  0.799  0.438  0.395  0.356
0.7   200   0.614  0.678  0.657  0.458   0.577    0.491  0.709  0.355  0.372  0.333
0.9   50    0.935  0.962  0.963  0.802   0.930    0.851  0.975  0.702  0.780  0.528
0.9   100   0.807  0.846  0.847  0.658   0.792    0.667  0.880  0.518  0.481  0.420
0.9   200   0.693  0.775  0.768  0.542   0.670    0.541  0.801  0.423  0.481  0.393
From Table 5, we can see that under Cauchy distribution, all tests have estimated type I
errors around 0.05. In terms of the power, EDF and CP methods perform the best, while
all other methods work similarly. Increasing the dimension does not have much influence
on the performance of the methods.
Figure 1. The special patterns.
The result under t distribution is shown in Table 6. All tests control the type I error well.
Regarding the powers, one observes that EDF and CP perform the best, followed by Odds
and Trans methods. The testing powers change slightly when we increase the dimension
of the vectors.
Table 7 shows the power of the statistics under the double exponential distribution.
From the table, we can see that all tests have estimated type I errors around 0.05. The test
statistic Trans has the largest power, followed by Odds and CP method. In this scenario,
a larger dimension implies a smaller power, which is different from other cases. A different choice for the norm to calculate the interpoint distance might help improve the testing
performance. Further analysis is needed for the optimal choice of norm.
3.3. Special dependence
In an interesting article, Newton (2010) presents seven patterns of dependence and
runs simulations to gauge the performance of the energy statistic proposed by Szekely
et al. (2007) for testing the hypothesis of independence. Newton (2010) reports simulation results for p1 = p2 = 1 and n = 500. The patterns of the dependence that Newton
considered are shown in Figure 1. The Pearson correlations are all zero in these dependence structures. The results in Newton's paper show that the energy test is powerful in detecting special forms of dependence that the Pearson correlation cannot capture. In this section, we consider the same special dependence structures, but simulate them with large dimension and small sample sizes. Consider {(Xi, Yi)}_{i=1}^{n}. Suppose (xi, yi) is the coordinate of a point in any of the patterns of dependence, for i = 1, . . . , n. The observations are generated by Xi = xi 1p1 and Yi = yi 1p2, where 1p1 is a p1 × 1 vector of 1s. The dimensions of the observations are p1 = p2 = 100 and the sample size varies in n ∈ {20, 30}. Each simulation is
conducted under 1000 replications and the results are shown in Table 8. One can see that
the distance correlation (Dcor) and projection covariance (Pcov) lose power when the sample size n is small. The EDF, CP and Quad methods are sensitive for the 1st, 4th, 5th and
6th dependence structures. The Odds test can detect the 1st, 4th and 5th dependence when
the sample size is n = 30. Nearly all the methods except Dcor and Pcov work fine for the
4th and 5th dependence type. All the methods have difficulties detecting the 2nd, 3rd and
the last dependence structure with limited sample size.
3.4. Discussion
We have compared the testing methods under different distributions or dependence structures. There is not a single method that dominates all other methods in all situations.
Generally speaking, tests based on interpoint distances perform better in detecting non-linear dependence.
Table 8. The power of different statistics for the independence test under special dependence structures.

Type  n     EDF    Odds   CP     Coripd  Corcube  Quad   Trans  Dcor   HHG    Pcov
1     20    0.588  0.229  0.569  0       0        0.549  0      0      0.046  0.008
1     30    0.998  0.809  1      0       0.141    0.980  0.086  0.002  0.620  0.161
2     20    0.073  0.026  0.085  0.019   0.031    0.074  0.021  0.040  0.066  0.035
2     30    0.080  0.014  0.143  0.004   0.013    0.094  0.006  0.040  0.100  0.027
3     20    0.105  0.009  0.193  0.001   0.010    0.115  0.003  0.026  0.119  0.021
3     30    0.325  0.007  0.553  0       0.001    0.253  0.001  0.034  0.255  0.019
4     20    0.697  0.522  0.627  0.47    0.644    0.631  0.538  0.097  0.569  0.209
4     30    0.910  0.862  0.895  0.745   0.886    0.891  0.793  0.434  0.864  0.473
5     20    0.989  0.987  0.987  0.96    0.93     0.937  0.935  0.132  0.813  0.273
5     30    1      1      1      1       1        1      1      0.161  0.994  0.362
6     20    0.620  0.006  0.775  0       0        0.342  0      0      0.069  0
6     30    0.997  0.003  1      0       0        0.994  0      0      0.397  0.003
7     20    0.048  0.048  0.054  0.045   0.048    0.048  0.048  0.041  0.040  0.049
7     30    0.041  0.048  0.037  0.050   0.049    0.041  0.046  0.056  0.043  0.053
Table 9. Time complexity.

Statistic     EDF     Odds    CP      Coripd  Corcube  Quad    Trans   Dcor    HHG            Pcov
Complexity    O(n⁴)   O(n⁴)   O(n⁴)   O(n²)   O(n²)    O(n⁴)   O(n²)   O(n²)   O(n² log n)    O(n³)
In real-life problems, if one has prior knowledge of the distribution or prior information about the form of dependence to guard against, one can use the corresponding testing method. Otherwise, one can use an ensemble approach that applies all the methods and concludes with majority voting or weighted voting. With this adjustment, one can reduce the error from relying on a single method. The major drawback of the methods based on IPDs is the computation of the interpoint distances. For n observations, the time complexity of computing the IPDs is O(n²), and a simple operation such as finding the mean of the IPDs has time complexity O(n²). The time complexities of the methods are listed in Table 9. A time complexity higher than O(n²) is restrictive for large datasets.
4. Summary and concluding remarks
Testing the hypothesis of independence is fundamental in statistics and has numerous
applications in finance, network analysis, machine learning, microarray and DNA analysis, among others. Classical tests of independence are based on the asymptotic theory
that the sample size n approaches infinity while the dimension p is fixed. Therefore, they
are not effective in settings where p approaches infinity. In this paper, we propose seven
novel statistics for testing independence where the sample size is smaller than the dimension of the observations. The proposed tests are based on interpoint distances, which are inherently one dimensional. Using the empirical distribution function of the IPDs within the X and Y samples, we propose the EDF test. Based on the 2 × 2 table of probabilities and estimates at the m observed IPD pairs of the X and Y samples, the IPD odds ratio test, the IPD cut-point method and the IPD quadrant test are proposed. By calculating the Pearson correlation between the IPDs from the two samples, the IPD correlation test is proposed. The IPD transformation test is constructed using the correlation between normal-quantile transformations of the empirical distribution function of the IPDs or the cube roots of the IPDs. Simulations are conducted to compare the performance of the proposed tests with three existing
testing methods under different situations. The simulations are performed under the multivariate (a) normal, (b) folded normal, (c) log-normal, (d) Cauchy, (e) t and (f) double
exponential distributions. We also consider some special dependence patterns discussed by
Newton (2010).
Acknowledgments
We gratefully acknowledge the constructive comments and suggestions of the two anonymous
referees, and the associate editor.
Disclosure statement
No potential conflict of interest was reported by the authors.
ORCID
Reza Modarres
http://orcid.org/0000-0003-1240-6027
References
Agresti, A. (1980), ‘Generalized Odds Ratios for Ordinal Data’, Biometrics, 36, 59–67.
Anderson, T.W. (2003), An Introduction to Multivariate Statistical Analysis (3rd ed), New York: Wiley.
Bagkavos, D., and Patil, P.N. (2017), ‘A New Test of Independence for Bivariate Observations’,
Journal of Multivariate Analysis, 160, 117–133.
Blomqvist, N. (1950), ‘On a Measure of Dependence Between Two Random Variables’, Annals of
Mathematical Statistics, 21, 593–600.
Blum, J.R., Kiefer, J., and Rosenblatt, M. (1961), ‘Distribution Free Tests of Independence Based on
the Sample Distribution Function’, Annals of Mathematical Statistics, 32, 485–498.
Fan, Y., Lafaye de Micheaux, P., Penev, S., and Salopeka, D. (2017), ‘Multivariate Nonparametric Test
of Independence’, Journal of Multivariate Analysis, 153, 189–210.
Feuerverger, A. (1993), 'A Consistent Test for Bivariate Independence', International Statistical Review, 61(3), 419–433.
Fisher, R.A. (1915), 'Frequency Distribution of the Values of the Correlation Coefficient in Samples from an Indefinitely Large Population', Biometrika, 10, 507–521.
Fleiss, J.L., Levin, B., and Paik, M.C.. (2013), Statistical Methods for Rates and Proportions, Hoboken,
NJ: John Wiley & Sons.
Freeman, J., and Modarres, R. (2005), ‘Efficiency of Test for Independence After Box–Cox Transformation’, Journal of Multivariate Analysis, 95, 107–118.
Freeman, J., and Modarres, R. (2006), ‘Inverse Box–Cox: The Power-normal Distribution’, Statistics
and Probabaility Letters, 76, 764–772.
Friedman, J.H., and Rafsky, L.C. (1983), ‘Graph-theoretic Measures of Multivariate Association and
Prediction’, Annals of Statistics, 11(2), 377–391.
Gieser, P.W., and Randles, R.H. (1997), ‘A Nonparametric Test of Independence Between Two
Vectors’, Journal of the American Statistical Association, 92, 561–567.
Han, F., Chen, Z., and Liu, H. (2017), 'Distribution-free Tests of Independence in High Dimensions', Biometrika, 104(4), 813–828.
Heller, R., Heller, Y., and Gorfin, M. (2013), ‘A Consistent Test of Association Based on the Ranks
of Distances’, Biometrika, 100(2), 503–510.
Hoeffding, W. (1948), ‘A Non-parametric Test of Independence’, The Annals of Mathematical
Statistics, 19, 546–557.
Kowalski, C.J., and Tarter, M.E. (1969), ‘Coordinate Transformations to Normality and the Power of
Normal Tests for Independence’, Biometrika, 56(1), 139–148.
JOURNAL OF NONPARAMETRIC STATISTICS
19
Leone, F.C., Nelson, L.S., and Nottingham, R.B. (1961), ‘The Folded Normal Distribution’, Technometrics, 3(4), 543–550.
Li, J. (2018), 'Asymptotic Normality of Interpoint Distances for High-dimensional Data with Applications to the Two-sample Problem', Biometrika, 105(3), 529–546.
Mao, G. (2014), ‘A New Test of Independence for High-dimensional Data’, Statistics & Probability
Letters, 93, 14–18.
Mao, G. (2017), ‘Robust Test for Independence in High Dimensions’, Communications in Statistics Theory and Methods, 46, 10036–10050.
Modarres, R. (2007), ‘A Test of Independence Based on the Likelihood of Cut-points’, Communications in Statistics: Simulation and Computation, 36, 817–825.
Newton, M.A. (2010), Introducing the discussion paper by Szekely and Rizzo. arXiv preprint
arXiv:1010.3575.
Pitman, E.J.G. (1937), ‘Significance Tests which May Be Applied to Samples from Any Populations
II. The Correlation Coefficient Test’, Supplement to the Journal of the Royal Statistical Society,
4, 225–232.
Puri, M.L., and Sen, P.K.. (1971), Nonparametric Methods in Multivariate Analysis, New York, NY:
John Wiley and Sons.
Robert, P., Cléroux, R., and Ranger, N. (1985), ‘Some Results on Vector Correlation’, Computational
Statistics and Data Analysis, 3, 25–32.
Schott, J.R. (2005), ‘Testing for Complete Independence in High Dimensions’, Biometrika,
92, 951–956.
Schott, J.R. (2008), ‘A Test for Independence of Two Sets of Variables when the Number of Variables
Is Large Relative to the Sample Size’, Statistics and Probability Letters, 78, 3096–3102.
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), ‘Measuring and Testing Dependence by Correlation of Distances’, The Annals of Statistics, 35(6), 2769–2794.
Song, Y., and Modarres, R. (2019), ‘Interpoint Distance Test of Homogeneity for Multivariate
Mixture Models’, International Statistical Review, 1–26. doi:10.1111/insr.12332.
Taskinen, S., Oja, H., and Randles, R.H. (2003), ‘Sign Test of Independence Between Two Random
Vectors’, Statistics and Probability Letters, 62(1), 9–21.
Taskinen, S., Oja, H., and Randles, R.H. (2005), ‘Multivariate Nonparametric Tests of Independence’,
Journal of the American Statistical Association, 100(471), 916–925.
Wilks, S.S. (1935), ‘On the Independence of K Sets of Normally Distributed Statistical Variables’,
Econometrica, 3, 309–326.
Wilson, E.B., and Hilferty, M.M. (1931), ‘The Distribution of Chi-squared’, Proceedings of the
National Academy of Sciences, 17, 684–688.
Zhu, L., Xu, K., Li, R., and Zhong, W. (2017), 'Projection Correlation Between Two Random Vectors', Biometrika, 104(4), 829–843.
Appendix

Theorem 2.1: The necessary part is trivial, since functions of independent random vectors are independent. To prove the sufficient part we need to show that if the IPDs d²(X) = ‖X1 − X2‖² and d²(Y) = ‖Y1 − Y2‖² are independent, then X and Y are independent. Since the contrapositive of any true proposition is also true, we will show that if X and Y are dependent, then d²(X) and d²(Y) are dependent.
Suppose Z′ = [X′, Y′] is a random vector from a multivariate normal distribution with mean (µx, µy) and covariance matrix Σ = [Σxx Σxy; Σyx Σyy]. Let σ(x)s1s2, σ(y)t1t2 and σ(xy)st be the components of Σxx, Σyy and Σxy, respectively, for s, s1, s2 = 1, . . . , p1 and t, t1, t2 = 1, . . . , p2. Suppose X and Y are dependent; then

tr(Σxy Σ′xy) = Σ_{s=1}^{p1} Σ_{t=1}^{p2} σ²(xy)st > 0.  (A1)
Let Xit be the tth variable of the random vector Xi, and Yis be the sth variable of the random vector Yi, for i = 1, 2, t = 1, . . . , p1 and s = 1, . . . , p2. One can show that

Cov(d²(X), d²(Y)) = Cov( Σ_{t=1}^{p1} (X1t − X2t)², Σ_{s=1}^{p2} (Y1s − Y2s)² )
                  = Σ_{t=1}^{p1} Σ_{s=1}^{p2} 4 σ(x)tt σ(y)ss Cov(W²1t, W²2s),

where W1t = (X1t − X2t)/√(2σ(x)tt) ∼ N(0, 1) and W2s = (Y1s − Y2s)/√(2σ(y)ss) ∼ N(0, 1).
The covariance between W1t and W2s is

Cov(W1t, W2s) = Cov(X1t − X2t, Y1s − Y2s) / √(4 σ(x)tt σ(y)ss)
             = [Cov(X1t, Y1s) + Cov(X2t, Y2s)] / √(4 σ(x)tt σ(y)ss)
             = ρ(xy)st,

where ρ(xy)st is the correlation between the tth component of X and the sth component of Y. Thus W1t and W2s are jointly normal random variables with correlation ρ(xy)st. The conditional expectation and variance of W1t given W2s are E(W1t | W2s) = ρ(xy)st W2s and Var(W1t | W2s) = 1 − ρ²(xy)st. Thus we have E(W²1t | W2s) = 1 − ρ²(xy)st + ρ²(xy)st W²2s. The covariance between W²1t and W²2s is
Cov(W²1t, W²2s) = E(W²1t W²2s) − E(W²1t) E(W²2s)
               = E( E(W²1t W²2s | W2s) ) − 1
               = E[ (1 − ρ²(xy)st + ρ²(xy)st W²2s) W²2s ] − 1
               = ρ²(xy)st ( E(W⁴2s) − 1 )
               = 2 ρ²(xy)st.
Since Cov(W²1t, W²2s) = 2ρ²(xy)st, we obtain

Cov(d²(X), d²(Y)) = Σ_{t=1}^{p1} Σ_{s=1}^{p2} 4 σ(x)tt σ(y)ss Cov(W²1t, W²2s)
                  = Σ_{t=1}^{p1} Σ_{s=1}^{p2} 8 σ(x)tt σ(y)ss ρ²(xy)st
                  = 8 Σ_{t=1}^{p1} Σ_{s=1}^{p2} Cov²(X1t, Y1s)
                  = 8 tr(Σxy Σ′xy).  (A2)

Under the assumption of dependence, Equations (A1) and (A2) lead to Cov(d²(X), d²(Y)) > 0. Thus d²(X) and d²(Y) are dependent.
Proposition 2.1: Let Σxy be the covariance matrix between X and Y. Because every variable in X has a correlated variable in Y, there are at least p1 elements in Σxy whose absolute values are larger than ǫ. Based on Equation (A2), we have

Cov(d²(X), d²(Y)) = 8 tr(Σxy Σ′xy) > 8 ǫ² p1 → ∞.
Theorem 2.2: Let ρx and ρy denote the coefficients of correlation on the bivariate power normal and bivariate normal scales, respectively. Freeman and Modarres (2006) study the properties of the power normal distribution and show that

ρx = h(ρy) = Σ_{i=1}^{∞} i! b1i b2i ρ_y^i / [ (Σ_{i=1}^{∞} b²1i i!) (Σ_{j=1}^{∞} b²2j j!) ]^{1/2},

where the form of the function h depends on the mean, covariance and the transformation parameter. Here, b1i and b2i are defined in terms of the parameters and the Chebyshev–Hermite polynomials. It follows that ρy = 0 if and only if ρx = 0, and based on Theorem 2.1 the proof is completed.
Lemma 2.1: Let W = X1 − X2; then W follows a multivariate normal distribution with mean 0 and covariance 2Σxx. We have d²(X) = W′W, which is a quadratic form in W. Let Z = (2Σxx)^{−1/2} W. It follows that Cov(Z) = I, where I is an identity matrix. Let P be an orthogonal matrix that diagonalises 2Σxx with PP′ = I; that is, P′(2Σxx)P = diag(λ1, . . . , λp1), where λ1, . . . , λp1 are the eigenvalues of 2Σxx and diag(λ1, . . . , λp1) is the diagonal matrix formed by the eigenvalues. Let U = P′Z; it follows that Z = PU. The distribution of U is multivariate normal with mean E(U) = 0 and covariance Cov(U) = I. Therefore, the quadratic form can be written as

d²(X) = W′W = Z′(2Σxx)^{1/2}(2Σxx)^{1/2}Z = Z′(2Σxx)Z = U′P′(2Σxx)PU = U′ diag(λ1, . . . , λp1) U = Σ_{t=1}^{p1} λt U²t,

where U = (U1, . . . , Up1)′ and U²t follows a chi-square distribution with one degree of freedom, for t = 1, . . . , p1.