Article
A New Goodness of Fit Test for Multivariate Normality and
Comparative Simulation Study
Jurgita Arnastauskaitė 1,2, * , Tomas Ruzgas 2 and Mindaugas Bražėnas 3

1 Department of Applied Mathematics, Kaunas University of Technology, 51368 Kaunas, Lithuania


2 Department of Computer Sciences, Kaunas University of Technology, 51368 Kaunas, Lithuania;
[email protected]
3 Department of Mathematical Modelling, Kaunas University of Technology, 51368 Kaunas, Lithuania;
[email protected]
* Correspondence: [email protected]

Abstract: The testing of multivariate normality remains a significant scientific problem. Although it is being extensively researched, it is still unclear how to choose the best test based on the sample size, variance, covariance matrix and other characteristics. In order to contribute to this field, a new goodness of fit test for multivariate normality is introduced. This test is based on the mean absolute deviation of the empirical distribution density from the theoretical distribution density. The new test was compared with the most popular tests in terms of empirical power. The power of the tests was estimated for the selected alternative distributions and examined by the Monte Carlo modeling method for the chosen sample sizes and dimensions. Based on the modeling results, it can be concluded that the new test is one of the most powerful tests for checking multivariate normality, especially for smaller samples. In addition, the assumption of normality of two real data sets was checked.


Keywords: multivariate normality; power of tests; squared radii; skewness; kurtosis

Academic Editor: María Purificación Galindo Villardón
Received: 12 October 2021; Accepted: 19 November 2021; Published: 23 November 2021

1. Introduction
Much multivariate data is being collected by monitoring natural and social processes. IBM estimates that we all generate 175 zettabytes of data every day. In addition, data are being collected at a rapidly increasing rate; it is estimated that 90% of all data has been generated in the last two years. The need to extract useful information from continuously generated data sets drives demand for data specialists and the development of robust analysis methods.
Data analytics is inconceivable without testing the goodness of fit hypothesis. The primary task of a data analyst is to become familiar with the data sets received. This usually starts by identifying the distribution of the data. Then, the assumption that the data follow a normal distribution is usually tested. Since 1990, many tests have been developed to test this assumption, mostly for univariate data.
It is important to use powerful goodness of fit tests for the assumption of normality because the alternative distribution is in general not known. Based on the outcome of normality verification, one can choose suitable analysis methods (parametric or non-parametric) for further investigation. From the end of the 20th century to the present day, multivariate tests for the goodness of fit hypothesis have been developed by a number of authors [1–14]. Some of the most popular and commonly used multivariate tests are Chi-Square [8], Cramer-von Mises [2], Anderson-Darling [2], and Royston [3].
Checking the assumption of normality of multivariate data is more complex than in the univariate case. Additional data processing is required (e.g., standardization). The development of multivariate tests is also more complex because they require checking the properties of invariance and consistency, whereas for univariate tests the invariance property is always satisfied. The properties of invariance and consistency are presented in Section 2 and are discussed in more detail in [2,12,15].
The study aims to perform a power analysis of multivariate goodness of fit hypothesis tests for the assumption of normality, to find out how the proposed test performs compared to other well-known tests, and to apply the multivariate tests to real data. The power estimation procedure is discussed in [16].
Scientific novelty. The power analysis of multivariate goodness of fit hypothesis
testing for different data sets was performed. The goodness of fit tests were selected as
representatives of popular techniques, which had been analyzed by other researchers
experimentally. In addition, we proposed a new multivariate test based on the mean
absolute deviation of the empirical distribution density from the theoretical distribution
density. In this test, the density estimate is derived by using an inversion formula which is
presented in Section 3.
The rest of the paper is organized as follows. Section 2 defines the tests for the
comparative multivariate test power study. Section 3 presents details of our proposed
test. Section 4 presents the data distributions used for experimental test power evaluation.
Section 5 presents and discusses the results of simulation modeling. Section 6 discusses
the application of multivariate goodness of fit hypothesis tests to real data. Finally, the
conclusions and recommendations are given in Section 7.

2. Multivariate Tests for Normality


We denote the p-variate normal distribution as $N_p(\mu, \Sigma)$, where $\mu = (\mu_1, \ldots, \mu_p)^T$ is an expectation vector and $\Sigma$ is the nonsingular covariance matrix. $\mathcal{N}_p$ indicates the set of all possible p-variate normal distributions. Let $X_1, X_2, \ldots, X_n$, where $X_k = (X_{k1}, X_{k2}, \ldots, X_{kp})^T$ and $k = 1, 2, \ldots, n$, be a finite sample generated by a random p-variate (column) vector $X$ with distribution function $F_X$. The mean vector is given by $\bar{X} = \frac{1}{n} \sum_{j=1}^{n} X_j$, where $n$ is the sample size, and the sample covariance matrix is

$$S = \frac{1}{n} \sum_{j=1}^{n} \left( X_j - \bar{X} \right) \left( X_j - \bar{X} \right)^T.$$
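As a minimal illustration (a Python sketch with our own naming, not the authors' code), the mean vector and the sample covariance matrix with the $1/n$ divisor used above can be computed as follows:

```python
import numpy as np

def mean_and_covariance(X):
    """Sample mean vector and covariance matrix S (with the 1/n
    normalization used in the paper) for an (n, p) data matrix X."""
    n = X.shape[0]
    X_bar = X.mean(axis=0)
    centered = X - X_bar
    S = centered.T @ centered / n          # (p, p), divisor n (not n - 1)
    return X_bar, S
```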

To assess the multivariate normality of $X$ (based on the observed sample $X_1, X_2, \ldots, X_n$), a lot of statistical tests have been developed. Before reviewing the specific tests selected for this study, let us consider two essential properties. The set $\mathcal{N}_p$ is closed with respect to affine transformations, i.e.,

$$F_{AX+b} \in \mathcal{N}_p \Leftrightarrow F_X \in \mathcal{N}_p,$$

for any translation vector $b \in \mathbb{R}^p$ and any nonsingular matrix $A \in \mathbb{R}^{p \times p}$. Thus, a reasonable statistic $T_n$ for checking the null hypothesis ($H_0$) of multivariate normality should have the same value for a sample and its affine transforms, that is

$$T_n(AX_1 + b, \ldots, AX_n + b) = T_n(X_1, \ldots, X_n). \quad (1)$$

An invariant test has a statistic which satisfies condition (1). It might seem that a test based on the standardized sample

$$Y_j = S^{-1/2} \left( X_j - \bar{X} \right)$$

is invariant; however, Henze and Zirkler [2] note that this is not always the case. In practice, for a given sample $X_1, X_2, \ldots, X_n$ the alternative distribution is not known. In such a case it is important to use a test for which the probability of correctly rejecting $H_0$ tends to one as $n \to \infty$. Such a test is said to be consistent. For a more elaborate discussion of these properties we refer the reader to [2]. Other important notation is given in Appendix A.

2.1. Tests Based on Squared Radii


This section reviews the properties of several measures of squared radii concerning their use for assessing multivariate normality. Squared radii are defined as

$$D_{n,j} = \left( X_j - \bar{X} \right)^T S^{-1} \left( X_j - \bar{X} \right), \quad j = 1, 2, \ldots, n.$$

Under normality, $D_{n,j}$ has a distribution which is $(n-1)^2/n$ times a $\mathrm{Beta}\left( \frac{p}{2}, \frac{n-p-1}{2} \right)$ distribution [9]. Under $H_0$, the distribution of $D_{n,j}$ is approximately $\chi^2_p$ for large $n$.
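Building on the previous sketch, the squared radii can be computed as follows (a sketch; `mean_and_covariance` is the helper defined above):

```python
import numpy as np
from scipy.stats import beta

def squared_radii(X):
    """Squared radii D_{n,j} = (X_j - Xbar)^T S^{-1} (X_j - Xbar)."""
    X_bar, S = mean_and_covariance(X)          # helper from the sketch above
    centered = X - X_bar
    # Solve S y = (X_j - Xbar) for every j instead of inverting S explicitly.
    D = np.einsum('ij,ij->i', centered, np.linalg.solve(S, centered.T).T)
    return D

# Under H0, n/(n-1)^2 * D_{n,j} follows Beta(p/2, (n-p-1)/2); e.g.,
# beta(p/2, (n - p - 1)/2).ppf(q) gives the corresponding null quantiles.
```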

2.1.1. Chi-Squared (CHI2)


In 1981, Moore and Stubblebine presented a multivariate Chi-Squared goodness of fit test based on order statistics [8]. The statistic of the test is defined as

$$M_{n,k} = \frac{k}{n} \sum_{l=1}^{k} \left( N_{n,l} - \frac{n}{k} \right)^2, \quad (2)$$

where $N_{n,l} = \sum_{j=1}^{n} 1\{ a_{l-1} < D_{n,j} \le a_l \}$, $(l = 1, 2, \ldots, k;\ a_0 = 0,\ a_k = +\infty)$. Since $N_{n,l}/n = G_n(a_l) - G_n(a_{l-1})$, where $G_n$ is the empirical distribution function of the squared radii, $M_{n,k}$ takes the equivalent form [8]:

$$M_{n,k} = nk \sum_{l=1}^{k} \left( G_n(a_l) - G_n(a_{l-1}) - k^{-1} \right)^2.$$

The cell boundaries are chosen so that $G_p(a_l) - G_p(a_{l-1}) = k^{-1}$ $(l = 1, 2, \ldots, k;\ G_p(+\infty) = 1)$, where $G_p(\cdot)$ is the probability distribution function of $\chi^2(p)$.
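Assuming equiprobable cells under the $\chi^2_p$ approximation, as described above, a sketch of the statistic (2):

```python
import numpy as np
from scipy.stats import chi2

def chi2_statistic(X, k=5):
    """Moore-Stubblebine statistic (2) with equiprobable chi^2_p cells,
    i.e. boundaries a_l chosen so that G_p(a_l) - G_p(a_{l-1}) = 1/k."""
    n, p = X.shape
    D = squared_radii(X)                           # sketch from Section 2.1
    edges = chi2.ppf(np.arange(k + 1) / k, df=p)   # a_0 = 0, ..., a_k = +inf
    N, _ = np.histogram(D, bins=edges)             # cell counts N_{n,l}
    return k / n * np.sum((N - n / k) ** 2)
```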

2.1.2. Cramer-Von Mises (CVM)


In 1982, Koziol proposed the use of a Cramer-von Mises-type multivariate goodness of fit test based on order statistics [2]. The test statistic is defined as

$$CM = \frac{1}{12n} + \sum_{j=1}^{n} \left( G_p\big(D_{(j)}\big) - \frac{2j-1}{2n} \right)^2, \quad (3)$$

where $D_{(j)}$, $j = 1, 2, \ldots, n$, are the order statistics.

2.1.3. Anderson-Darling (AD)


In 1987, Paulson, Roohan and Sullo proposed the Anderson-Darling-type multivariate goodness of fit test based on order statistics [2]. The test statistic is defined as

$$AD = -n - \sum_{j=1}^{n} \frac{2j-1}{n} \left[ \log G_p\big(D_{(j)}\big) + \log\Big( 1 - G_p\big(D_{(n+1-j)}\big) \Big) \right]. \quad (4)$$
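A sketch of the statistics (3) and (4), reusing `squared_radii` from above:

```python
import numpy as np
from scipy.stats import chi2

def cvm_ad_statistics(X):
    """Cramer-von Mises (3) and Anderson-Darling (4) statistics based on
    the ordered values G_p(D_(j)) of the squared radii."""
    n, p = X.shape
    G = np.sort(chi2.cdf(squared_radii(X), df=p))   # G_p(D_(1)) <= ... <= G_p(D_(n))
    j = np.arange(1, n + 1)
    cm = 1 / (12 * n) + np.sum((G - (2 * j - 1) / (2 * n)) ** 2)
    ad = -n - np.sum((2 * j - 1) / n * (np.log(G) + np.log(1 - G[::-1])))
    return cm, ad
```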

2.2. Tests Based on Skewness and Kurtosis


This section reviews the properties of several measures of multivariate skewness and kurtosis regarding their use as statistics for assessing multivariate normality [2]. The skewness and kurtosis are defined as

$$s = \frac{m_3}{\sqrt{m_2^3}}, \qquad k = \frac{m_4}{m_2^2},$$

where $m_i = \frac{1}{n} \sum_{j=1}^{n} (x_j - \bar{x})^i$ and $\bar{x} = \frac{1}{n} \sum_{j=1}^{n} x_j$.

2.2.1. Doornik-Hansen (DH)


In 2008, Doornik and Hansen proposed a new multivariate goodness of fit test based on the skewness and kurtosis of multivariate data transformed to ensure independence [6]. The Doornik-Hansen test statistic is defined as the sum of squared transformations of the skewness and kurtosis. Approximately, the test statistic follows a $\chi^2$ distribution

$$DH = Z_1^T Z_1 + Z_2^T Z_2 \sim \chi^2(2p), \quad (5)$$

where $Z_1^T = (z_{11}, \ldots, z_{1p})$ and $Z_2^T = (z_{21}, \ldots, z_{2p})$ are defined as

$$Z_1 = \delta \log\left( y + \sqrt{y^2 + 1} \right) \quad \text{and} \quad Z_2 = \sqrt{9\alpha} \left( \sqrt[3]{\frac{\chi}{2\alpha}} - 1 + \frac{1}{9\alpha} \right),$$

where

$$\delta = \frac{1}{\sqrt{\log(w^2)}}, \quad w^2 = -1 + \sqrt{2(\beta - 1)}, \quad \beta = \frac{3(n^2 + 27n - 70)(n+1)(n+3)}{(n-2)(n+5)(n+7)(n+9)}, \quad y = s \sqrt{\frac{(w^2 - 1)(n+1)(n+3)}{12(n-2)}},$$

$$\alpha = a + c \cdot s^2, \quad a = \frac{(n-2)(n+5)(n+7)(n^2 + 27n - 70)}{6\delta}, \quad c = \frac{(n-7)(n+5)(n+7)(n^2 + 2n - 5)}{6\delta},$$

$$\delta = (n-3)(n+1)(n^2 + 15n - 4), \quad \chi = 2l\left( k - 1 - s^2 \right), \quad l = \frac{(n+5)(n+7)(n^3 + 37n^2 + 11n - 313)}{12\delta}.$$

2.2.2. Royston (Roy)


In 1982, Royston proposed a test that uses the Shapiro-Wilk/Shapiro-Francia statistic to test multivariate normality. If the kurtosis of the sample is greater than 3, it uses the Shapiro-Francia test for leptokurtic distributions; otherwise it uses the Shapiro-Wilk test for platykurtic distributions [3,5]. Let $W_j$ be the Shapiro-Wilk/Shapiro-Francia test statistic for the jth variable ($j = 1, 2, \ldots, d$) and $Z_j$ be the values obtained from the normality transformation [3,5]:

if $4 \le n \le 11$: $x = n$ and $W_j' = -\log\big( \gamma - \log(1 - W_j) \big)$;
if $12 \le n \le 2000$: $x = \log(n)$ and $W_j' = \log(1 - W_j)$.

Thus, it is observed that $x$ and $W_j'$ change with the sample size. The transformed values of each random variable are obtained by [3,5]

$$Z_j = \frac{W_j' - l}{\sigma},$$

where $\gamma$, $l$ and $\sigma$ are derived from polynomial approximations. The polynomial coefficients are provided for the different sample size intervals [3,5]:

$$\gamma = a_{0\gamma} + a_{1\gamma} x + a_{2\gamma} x^2 + \ldots + a_{d\gamma} x^d,$$
$$l = a_{0l} + a_{1l} x + a_{2l} x^2 + \ldots + a_{dl} x^d,$$
$$\log(\sigma) = a_{0\sigma} + a_{1\sigma} x + a_{2\sigma} x^2 + \ldots + a_{d\sigma} x^d.$$

Royston's test statistic for multivariate normality is defined as

$$H = \frac{e \sum_{j=1}^{p} \psi_j}{p} \sim \chi^2_e, \quad (6)$$

where $e$ is the equivalent degrees of freedom and $\Phi(\cdot)$ is the cumulative distribution function for the standard normal distribution, such that

$$e = \frac{p}{1 + (p-1)\bar{c}}, \qquad \psi_j = \left\{ \Phi^{-1}\left( \frac{\Phi(-Z_j)}{2} \right) \right\}^2, \quad j = 1, 2, \ldots, d.$$

Let $R$ be the correlation matrix and $r_{ij}$ the correlation between the ith and jth observations. Then the extra term $\bar{c}$ is found by

$$\bar{c} = \sum_{i} \sum_{j} \frac{c_{ij}}{p(p-1)}, \quad \left\{ c_{ij} \right\}_{i \ne j},$$

where

$$c_{ij} = \begin{cases} g(r_{ij}, n) & \text{for } i \ne j, \\ 1 & \text{for } i = j. \end{cases}$$

With $g(0, n) = 0$ and $g(1, n) = 1$, $g(\cdot)$ can be defined as

$$g(r, n) = r^{\lambda} \left[ 1 - \frac{l}{v} (1 - r)^{l} \right],$$

where $l$, $\lambda$ and $v$ are unknown parameters, estimated by Ross modeling [4]. It was found that $l = 0.715$ and $\lambda = 5$ for sample size $10 \le n \le 2000$, and $v$ is a cubic function

$$v(n) = 0.21364 + 0.015124 (\log(n))^2 - 0.0018034 (\log(n))^3.$$
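A hedged sketch of Royston's combination step (6), assuming the transformed values $Z_j$ have already been obtained from the (omitted) polynomial normality transformation; applying $g$ to $|r_{ij}|$ is our simplification to keep $c_{ij}$ in $[0, 1]$:

```python
import numpy as np
from scipy.stats import norm, chi2

def royston_H(Z, R, n, l=0.715, lam=5.0):
    """Combination step of Royston's test (6): given the transformed
    per-variable statistics Z_j and the correlation matrix R, compute
    the statistic H and its chi^2_e p-value."""
    Z = np.asarray(Z)
    p = len(Z)
    v = 0.21364 + 0.015124 * np.log(n) ** 2 - 0.0018034 * np.log(n) ** 3
    g = lambda r: r ** lam * (1 - (l / v) * (1 - r) ** l)
    c = g(np.abs(R))                     # c_ij = g(r_ij, n), our |r| choice
    off = ~np.eye(p, dtype=bool)
    c_bar = c[off].sum() / (p * (p - 1))
    e = p / (1 + (p - 1) * c_bar)        # equivalent degrees of freedom
    psi = norm.ppf(norm.cdf(-Z) / 2) ** 2
    H = e * psi.sum() / p
    return H, chi2.sf(H, df=e)
```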

2.2.3. Mardia (Mar1 and Mar2)


In 1970, K.V. Mardia proposed a new multivariate goodness of fit test based on skewness and kurtosis. The statistics for this test are defined as [17]

$$M_S(s) = \frac{n \cdot s}{6} \xrightarrow{d} \chi^2\left( \frac{p(p+1)(p+2)}{6} \right), \qquad M_K(k) = \frac{n \big( k - p(p+2) \big)^2}{8p(p+2)} \xrightarrow{d} \chi^2(1). \quad (7)$$
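A sketch of Mardia's statistics (7), using the standard multivariate skewness and kurtosis coefficients (which play the roles of $s$ and $k$ above):

```python
import numpy as np
from scipy.stats import chi2

def mardia_statistics(X):
    """Mardia's skewness and kurtosis statistics (7) with chi^2 p-values."""
    n, p = X.shape
    X_bar, S = mean_and_covariance(X)               # sketch from Section 2
    centered = X - X_bar
    A = centered @ np.linalg.solve(S, centered.T)   # A_ij = (X_i-Xbar)' S^-1 (X_j-Xbar)
    b1 = np.sum(A ** 3) / n ** 2                    # multivariate skewness
    b2 = np.mean(np.diag(A) ** 2)                   # multivariate kurtosis
    MS = n * b1 / 6
    MK = n * (b2 - p * (p + 2)) ** 2 / (8 * p * (p + 2))
    df_s = p * (p + 1) * (p + 2) / 6
    return (MS, chi2.sf(MS, df_s)), (MK, chi2.sf(MK, 1))
```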

2.3. Other Tests


This section reviews the properties of several measures: a non-negative functional distance, a covariance matrix, and the Energy distance, concerning their use as statistics for assessing multivariate normality. A non-negative functional distance that measures the distance between two functions is defined as

$$D_h(P, Q) = \int \left| \hat{P}(t) - \hat{Q}(t) \right|^2 \varphi_h(t)\, dt,$$

where $\hat{P}(t)$ is the characteristic function of the multivariate standard normal, $\hat{Q}(t)$ is the empirical characteristic function of the standardised observations, and $\varphi_h(t)$ is a kernel (weighting) function

$$\varphi_h(t) = \left( 2\pi h^2 \right)^{-p/4} e^{-\frac{t^T t}{2h^2}},$$

where $t \in \mathbb{R}^p$ and $h \in \mathbb{R}$ is a smoothing parameter that needs to be selected [10].



2.3.1. Energy (Energy)


In 2013, G. Szekely and M. Rizzo introduced a new multivariate goodness of fit test based on the Energy distance between multivariate distributions. The statistic for this test is defined as [18]

$$E_n = n \left( \frac{2}{n} \sum_{j=1}^{n} E\|\tilde{Y}_{n,j} - N_1\| - E\|N_1 - N_2\| - \frac{1}{n^2} \sum_{j,k=1}^{n} \|\tilde{Y}_{n,j} - \tilde{Y}_{n,k}\| \right), \quad (8)$$

where $\tilde{Y}_{n,j} = \sqrt{n/(n-1)}\, Y_{n,j}$ and $Y_{n,j} = S_n^{-1/2} \left( X_j - \bar{X}_n \right)$, $j = 1, \ldots, n$, are called scattering residues. $N_1$ and $N_2$ are independent random vectors with the standard p-variate normal distribution, and $E\|N_1 - N_2\| = 2\Gamma\left( \frac{p+1}{2} \right) / \Gamma\left( \frac{p}{2} \right)$, where $\Gamma(\cdot)$ is the Gamma function. The null hypothesis is rejected when $E_n$ acquires large values.
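A sketch of the statistic (8); the expectation $E\|\tilde{Y}_{n,j} - N_1\|$ has a closed form, but here it is approximated by Monte Carlo draws of $N_1$ for brevity:

```python
import numpy as np
from scipy.special import gammaln
from scipy.linalg import fractional_matrix_power

def energy_statistic(X, n_mc=4096, seed=0):
    """Energy statistic (8); E||Y_j - N_1|| is estimated by simulation."""
    n, p = X.shape
    X_bar, S = mean_and_covariance(X)                 # sketch from Section 2
    Y = (X - X_bar) @ fractional_matrix_power(S, -0.5)
    Y = np.sqrt(n / (n - 1)) * Y                      # scattering residues
    rng = np.random.default_rng(seed)
    Z = rng.standard_normal((n_mc, p))                # draws of N_1
    term1 = np.mean(np.linalg.norm(Y[:, None, :] - Z[None, :, :], axis=2))
    term2 = 2 * np.exp(gammaln((p + 1) / 2) - gammaln(p / 2))  # E||N1 - N2||
    term3 = np.mean(np.linalg.norm(Y[:, None, :] - Y[None, :, :], axis=2))
    return n * (2 * term1 - term2 - term3)
```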

2.3.2. Lobato-Velasco (LV)


In 2004, I. Lobato and C. Velasco improved the Jarque and Bera test and applied it to stationary processes. The statistic for this test is defined as [19]

$$G = \frac{n \hat{\mu}_3^2}{6 \hat{F}^{(3)}} + \frac{n \left( \hat{\mu}_4 - 3\hat{\mu}_2^2 \right)^2}{24 \hat{F}^{(4)}}, \quad (9)$$

where $\hat{F}^{(k)} = \sum_{t=1-n}^{n-1} \hat{\psi}(t) \big( \hat{\psi}(t) + \hat{\psi}(n - |t|) \big)^{k-1}$ and $\hat{\psi}(t)$ is the sample autocovariance function.

2.3.3. Henze-Zirkler (HZ)


In 1990, Henze and Zirkler introduced the HZ test [1]. The statistic for this test is defined as

$$HZ = \frac{1}{n} \sum_{i=1}^{n} \sum_{j=1}^{n} e^{-\frac{h^2}{2} D_{ij}} - 2\left( 1 + h^2 \right)^{-\frac{p}{2}} \sum_{i=1}^{n} e^{-\frac{h^2 D_i}{2(1 + h^2)}} + n\left( 1 + 2h^2 \right)^{-\frac{p}{2}}, \quad (10)$$

where $D_{ij} = \left( X_i - X_j \right)^T S^{-1} \left( X_i - X_j \right)$ and $D_i = \left( X_i - \bar{X} \right)^T S^{-1} \left( X_i - \bar{X} \right)$.

$D_i$ gives the squared Mahalanobis distance of the ith observation to the centroid and $D_{ij}$ gives the Mahalanobis distance between the ith and jth observations. If the sample follows a multivariate normal distribution, the test statistic is approximately log-normally distributed with mean [1]

$$1 - a^{-\frac{p}{2}} \left( 1 + \frac{p h^2}{a} + \frac{p(p+2) h^4}{2a^2} \right),$$

and variance [1]

$$2\left( 1 + 4h^2 \right)^{-\frac{p}{2}} + 2a^{-p} \left( 1 + \frac{2p h^4}{a^2} + \frac{3p(p+2) h^8}{4a^4} \right) - 4 w_h^{-\frac{p}{2}} \left( 1 + \frac{3p h^4}{2w_h} + \frac{p(p+2) h^8}{2w_h^2} \right),$$

where $a = 1 + 2h^2$ and $w_h = \left( 1 + h^2 \right)\left( 1 + 3h^2 \right)$. Henze and Zirkler also proposed an optimal choice of the parameter $h$ for using HZ in the p-variate case [1]:

$$h^* = \frac{1}{\sqrt{2}} \left( \frac{n(2p+1)}{4} \right)^{\frac{1}{p+4}}.$$

A drawback of the Henze-Zirkler test is that, when $H_0$ is rejected, the cause of the violation of normality is generally not straightforward to identify. Thus, many biomedical researchers would prefer a more informative and equally or more powerful test than the Henze-Zirkler test [5].
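A sketch of the HZ statistic (10) with the optimal smoothing parameter $h^*$:

```python
import numpy as np

def henze_zirkler(X):
    """HZ statistic (10), using the optimal parameter h*."""
    n, p = X.shape
    h = (n * (2 * p + 1) / 4) ** (1 / (p + 4)) / np.sqrt(2)   # h*
    X_bar, S = mean_and_covariance(X)            # sketch from Section 2
    centered = X - X_bar
    A = centered @ np.linalg.solve(S, centered.T)
    Di = np.diag(A)                              # squared distances to centroid
    Dij = Di[:, None] + Di[None, :] - 2 * A      # pairwise squared distances
    return (np.sum(np.exp(-h ** 2 / 2 * Dij)) / n
            - 2 * (1 + h ** 2) ** (-p / 2)
              * np.sum(np.exp(-h ** 2 * Di / (2 * (1 + h ** 2))))
            + n * (1 + 2 * h ** 2) ** (-p / 2))
```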

2.3.4. Nikulin-Rao-Robson (NRR) and Dzhaparidze-Nikulin (DN)


In 1981, Moore and Stubblebine suggested a multivariate Nikulin-Rao-Robson (NRR) goodness of fit test [7,8]. The test statistic for a covariance matrix of any dimension is defined as

$$Y_n^2 = \sum_i V_i^2 + \frac{2pr \left( \sum_i V_i p_i \right)^2}{1 - 2pr \sum_i p_i^2}, \quad (11)$$

where $V_i$ is a vector of standardized cell frequencies with components

$$V_i = V_{in}\left( \hat{\theta}_n \right) = \frac{N_{in} - n/r}{\sqrt{n/r}}, \quad i = 1, \ldots, r,$$

where $N_{in}$ is the number of random vectors $X_1, \ldots, X_n$ falling into $E_{in}\left( \hat{\theta}_n \right)$, $i = 1, \ldots, r$. The limiting covariance matrix of the standardized frequencies $V_n\left( \hat{\theta}_n \right)$ is $\Sigma_l = I - qq^T - BJ^{-1}B^T$, where $B$ is the $r \times m$ matrix with elements


$$B_{ij} = \frac{1}{\sqrt{p_i(\theta)}}\, \frac{\partial p_i(\theta)}{\partial \theta_j}, \quad i = 1, \ldots, r, \quad j = 1, \ldots, m,$$

where $q$ is an r-vector with its entries as $1/\sqrt{r}$, $m = p + p(p+1)/2$ is the number of unknown parameters, and $J = J(\theta)$ is the Fisher information matrix of size $m \times m$ for one observation, which is evaluated as

$$J(\theta) = \begin{pmatrix} \Sigma^{-1} & 0 \\ 0 & Q^{-1} \end{pmatrix},$$

where $Q$ is the $p(p+1)/2 \times p(p+1)/2$ covariance matrix of $w$ (a vector of the entries of $\sqrt{n} S$ arranged column-wise by taking the upper triangular elements) [7]:

$$w = \left( s_{11}, s_{12}, s_{22}, s_{13}, s_{23}, s_{33}, \ldots, s_{pp} \right)^T.$$

The second term of $Y_n^2$ recovers information lost due to data grouping. Another useful decomposition of $Y_n^2$ is defined as

$$Y_n^2 = U_n^2 + S_n^2,$$

where $U_n^2$ is the multivariate statistic defined by Dzhaparidze and Nikulin (1974) [7]. It is defined as

$$U_n^2 = V_n^T\left( \hat{\theta}_n \right) \left( I - B_n \left( B_n^T B_n \right)^{-1} B_n^T \right) V_n\left( \hat{\theta}_n \right), \quad (12)$$

and in 1985, McCulloch presented a multivariate test statistic [7]:

$$S_n^2 = Y_n^2 - U_n^2 = V_n^T\left( \hat{\theta}_n \right) B_n \left( \left( J_n - B_n^T B_n \right)^{-1} + \left( B_n^T B_n \right)^{-1} \right) B_n^T V_n\left( \hat{\theta}_n \right).$$

If $\mathrm{rank}(B) = s$, then $U_n^2$ and $S_n^2$ are asymptotically independent and distributed in the limit as $\chi^2_{r-s-1}$ and $\chi^2_s$, respectively.

3. The New Test
Our test is based on a distribution distance that has been derived using an inversion formula. The estimation of the sample distribution density is based on the application of the characteristic function and the inversion formula. This method is known for its good properties (i.e., low sensitivity) and was introduced in [20]. Marron and Wand [21] carried out an extensive comparison of density estimation methods (including the adapted kernel method) and concluded that density estimation based on the application of the characteristic function and inversion is more accurate for non-Gaussian data sets.
The random p-variate vector $X \in \mathbb{R}^p$, which follows a distribution of a mixture model, has a density function

$$f(X) = f(X, \theta) = \sum_{k=1}^{q} p_k f_k(X, \theta_k), \quad (13)$$

where $q$ is the number of clusters (i.e., components, classes) of the mixture, and $p_k$ $(k = 1, \ldots, q)$ are the a priori probabilities, which satisfy

$$p_k > 0, \quad \sum_{k=1}^{q} p_k = 1. \quad (14)$$

Here $f_k(X, \theta_k)$ is the distribution of the kth class and $\theta = \left( p_1, \ldots, p_q, \theta_1, \ldots, \theta_q \right)$ is the set of parameters. We denote by $X$ the p-variate sample of independent and identically distributed random values.
When examining approximations of parametric methods, it should be emphasized that as the data dimension increases, the number of model parameters increases rapidly, making it more difficult to find accurate parameter estimates. It is much easier to estimate the density of the univariate data projections

$$x_\tau = \tau^T x, \quad (15)$$

than the multivariate data density $f$, because of the one-to-one correspondence

$$f \leftrightarrow \{ f_\tau, \tau \in \mathbb{R}^p \}. \quad (16)$$

It is quite natural to try to find the multivariate density $f$ using the density estimates $\hat{f}_\tau$ of univariate observational projections [20]. In the case of a Gaussian mixture model, the projection of the observations (15) is also distributed according to a Gaussian mixture model:

$$f_\tau(x) = f_\tau(x, \theta_\tau) = \sum_{k=1}^{q} p_{k,\tau} \varphi_{k,\tau}(x), \quad (17)$$

where $\varphi_{k,\tau}(x) = \varphi\left( x; m_{k,\tau}, \sigma^2_{k,\tau} \right)$ is a univariate Gaussian density. The parameter set $\theta$ of the multivariate mixture and the distribution parameters of the data projections $\theta_\tau = \left\{ p_{k,\tau}, m_{k,\tau}, \sigma^2_{k,\tau}, k = 1, \ldots, q \right\}$ are related by the equations:

$$p_{j,\tau} = p_j, \qquad m_{j,\tau} = \tau^T M_j, \qquad \sigma^2_{j,\tau} = \tau^T R_j \tau. \quad (18)$$

The inversion formula is used:

$$f(x) = \frac{1}{(2\pi)^p} \int_{\mathbb{R}^p} e^{-it^T x} \psi(t)\, dt, \quad (19)$$

where

$$\psi(t) = E e^{it^T X} \quad (20)$$

denotes the characteristic function of the random variable $X$. Setting $u = |t|$, $\tau = t/|t|$ and changing the variables to a spherical coordinate system, we obtain

$$f(x) = \frac{1}{(2\pi)^p} \int_{\tau: |\tau| = 1} ds \int_0^\infty e^{-iu\tau^T x} \psi(u\tau) u^{p-1}\, du, \quad (21)$$
(2π ) p τ: |τ |=1 0

where the first integral is the surface integral over the unit sphere. The characteristic function of the projection of the observed random variable is

$$\psi_\tau(u) = E e^{iu\tau^T X}, \quad (22)$$

and has the property

$$\psi(u\tau) = \psi_\tau(u). \quad (23)$$

By selecting a set $T$ of uniformly distributed directions on the sphere and replacing the characteristic function with its estimate, a density estimate is obtained [20,22]:

$$\hat{f}(x) = \frac{A(p)}{\#T} \sum_{\tau \in T} \int_0^\infty e^{-iu\tau^T x} \hat{\psi}_\tau(u) u^{p-1} e^{-hu^2}\, du, \quad (24)$$

where $\#T$ denotes the size of the set $T$. Using the p-variate ball volume formula

$$V_p(R) = \frac{\pi^{p/2} R^p}{\Gamma\left( \frac{p}{2} + 1 \right)} = \begin{cases} \frac{\pi^{p/2}}{(p/2)!} R^p, & \text{when } p \bmod 2 \equiv 0, \\ \frac{2^{\frac{p+1}{2}} \pi^{\frac{p-1}{2}}}{p!!} R^p, & \text{when } p \bmod 2 \equiv 1, \end{cases} \quad (25)$$

the constant $A(p)$ is defined as

$$A(p) = \frac{V_p'(1)}{(2\pi)^p} = \frac{p\, 2^{-p} \pi^{-p/2}}{\Gamma\left( \frac{p}{2} + 1 \right)}. \quad (26)$$

Computer simulation studies have shown that the density estimates obtained using the inversion formula are not smooth. Therefore, in Formula (24), an additional multiplier $e^{-hu^2}$ is used. This multiplier smoothes the estimate $\hat{f}(x)$ with the Gaussian kernel function. Moreover, this form of the multiplier allows the integral value to be calculated analytically. Monte Carlo studies have shown that its use significantly reduces the error of the estimates.
To use Formula (24), the characteristic function of the projected data must be estimated. Let us consider two approaches. The first one is based on the density approximation by the Gaussian distribution mixture model. In this case, the parametric estimate of the characteristic function is used:

$$\hat{\psi}_\tau(u) = \sum_{k=1}^{\hat{q}_\tau} \hat{p}_{k,\tau}\, e^{iu\hat{m}_{k,\tau} - u^2 \hat{\sigma}^2_{k,\tau}/2}. \quad (27)$$

By substituting $\hat{\psi}_\tau(u)$ in (24) by (27), we get

$$\hat{f}(x) = \frac{A(p)}{\#T} \sum_{\tau \in T} \sum_{k=1}^{\hat{q}_\tau} \hat{p}_{k,\tau} \int_0^\infty e^{iu(\hat{m}_{k,\tau} - \tau^T x) - u^2 (h + \hat{\sigma}^2_{k,\tau}/2)} u^{p-1}\, du = \frac{A(p)}{\#T} \sum_{\tau \in T} \sum_{k=1}^{\hat{q}_\tau} \hat{p}_{k,\tau}\, I_{p-1}\!\left( \frac{\hat{m}_{k,\tau} - \tau^T x}{\sqrt{\hat{\sigma}^2_{k,\tau} + 2h}} \right) \left( \hat{\sigma}^2_{k,\tau} + 2h \right)^{-\frac{p}{2}}, \quad (28)$$

where

$$I_j(y) = \mathrm{Re}\left( \int_0^\infty e^{iyt - t^2/2} t^j\, dt \right). \quad (29)$$

We note that only the real part of the expression is considered here (the sum of the imaginary parts must be equal to zero); in other words, the density estimate $\hat{f}(x)$ can acquire only real values. The chosen form of the smoothing multiplier $e^{-hu^2}$ allows relating the smoothing parameter $h$ to the variances of the projection clusters, i.e., in the calculations the variances are simply increased by $2h$. Next, the expression (29) is evaluated. Let

$$C_j(y) = \int_0^\infty \cos(yt) \cdot e^{-t^2/2} \cdot t^j\, dt, \quad (30)$$
0

$$S_j(y) = \int_0^\infty \sin(yt) \cdot e^{-t^2/2} \cdot t^j\, dt, \quad (31)$$

then (29) can be written as

$$\int_0^\infty e^{iyt - t^2/2} t^j\, dt = C_j(y) + i S_j(y). \quad (32)$$

By integrating by parts, we get

$$C_j(y) = \left[ -e^{-\frac{t^2}{2}} t^{j-1} \cos(yt) \right]_0^\infty + \int_0^\infty e^{-\frac{t^2}{2}} \left( (j-1) t^{j-2} \cos(yt) - y t^{j-1} \sin(yt) \right) dt = 1_{\{j=1\}} + (j-1) C_{j-2}(y) - y S_{j-1}(y), \quad j \ge 1. \quad (33)$$

$S_j(y)$ is expressed analogously. With respect to the limitations of the $j$ index, the following recursive equations are obtained:

$$C_j(y) = (j-1) C_{j-2}(y) - y S_{j-1}(y), \quad j \ge 2, \quad (34)$$
$$C_1(y) = 1 - y S_0(y), \quad (35)$$
$$S_j(y) = (j-1) S_{j-2}(y) + y C_{j-1}(y), \quad j \ge 2, \quad (36)$$
$$S_1(y) = y C_0(y). \quad (37)$$
The initial function $S_0(y)$ is found by starting with the relation

$$\left( S_0(y) \right)'_y = \int_0^\infty t \cos(yt) \cdot e^{-t^2/2}\, dt = C_1(y). \quad (38)$$

From (35) and (38) it follows that $S_0$ satisfies the differential equation

$$S_0'(y) = 1 - y S_0(y), \quad S_0(0) = 0, \quad (39)$$

which is solved by writing down $S_0$ as the Taylor series $S_0(y) = \sum_{l=0}^{\infty} c_l y^l$:

$$S_0'(y) = \sum_{l=0}^{\infty} c_{l+1} (l+1) y^l = 1 - \sum_{l=1}^{\infty} c_{l-1} y^l. \quad (40)$$

By equating the coefficients of the same powers, their values are obtained:

$$c_0 = 0, \quad c_1 = 1, \quad c_l = -c_{l-2}/l, \quad l \ge 2, \quad (41)$$

which gives us

$$S_0(y) = \sum_{l=0}^{\infty} \frac{(-1)^l y^{2l+1}}{(2l+1)!!} = y - \frac{y^3}{3!!} + \frac{y^5}{5!!} - \frac{y^7}{7!!} + \ldots. \quad (42)$$

$C_0$ is found from expression (30):

$$C_0(y) = \int_0^\infty \cos(yt) \cdot e^{-t^2/2}\, dt = \frac{1}{2} \int_{-\infty}^{\infty} \cos(yt) \cdot e^{-t^2/2}\, dt = \frac{1}{2} \int_{-\infty}^{\infty} \left( \cos(yt) - i \sin(yt) \right) \cdot e^{-t^2/2}\, dt = \sqrt{\frac{\pi}{2}}\, e^{-y^2/2}. \quad (43)$$

The value of the integral (29) then is

$$I_j(y) = C_j(y). \quad (44)$$
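The recursions (34)-(37), the series (42) and the closed form (43) translate directly into code; a sketch (note the plus sign used in (36), which the integration by parts requires):

```python
import numpy as np

def S0(y, terms=60):
    """S_0(y) by the series (42); numerically reliable for moderate |y|."""
    total, term = 0.0, y
    for l in range(terms):
        total += term
        term *= -y * y / (2 * l + 3)    # next term: (-1)^l y^(2l+1)/(2l+1)!!
    return total

def CS(y, jmax):
    """Arrays C_0..C_jmax and S_0..S_jmax via (43), (42), (35), (37)
    and the recursions (34), (36)."""
    C = np.empty(jmax + 1)
    S = np.empty(jmax + 1)
    C[0] = np.sqrt(np.pi / 2) * np.exp(-y * y / 2)   # (43)
    S[0] = S0(y)                                     # (42)
    if jmax >= 1:
        C[1] = 1 - y * S[0]                          # (35)
        S[1] = y * C[0]                              # (37)
    for j in range(2, jmax + 1):
        C[j] = (j - 1) * C[j - 2] - y * S[j - 1]     # (34)
        S[j] = (j - 1) * S[j - 2] + y * C[j - 1]     # (36)
    return C, S

def I_j(y, j):
    """I_j(y) = C_j(y), formula (44)."""
    return CS(y, j)[0][j]
```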

One of the disadvantages of the inversion formula method (defined by (24)) is that the Gaussian distribution mixture model (13) described by this estimate (for $f_k = \varphi_k$) does not represent the density accurately, except around the observations. When approximating the density under study with a mixture of Gaussian distributions, the estimation of the density using the inversion formula often becomes complicated due to a large number of components. Thus, we merge components with small a priori probabilities into one noise cluster.
We have developed and examined a modification of the algorithm which is based on the use of a multivariate Gaussian distribution mixture model. The parametric estimate of the characteristic function of the uniform distribution density is defined as

$$\hat{\psi}(u) = \frac{2}{(b-a)u} \sin\left( \frac{(b-a)u}{2} \right) \cdot e^{\frac{iu(a+b)}{2}}, \quad (45)$$

in the inversion Formula (19). In the density estimate calculation Formula (24), the estimate of the characteristic function is constructed as a union of the characteristic functions of a mixture of Gaussian distributions and a uniform distribution with corresponding a priori probabilities:

$$\hat{\psi}_\tau(u) = \sum_{k=1}^{\hat{q}_\tau} \hat{p}_{k,\tau}\, e^{iu\hat{m}_{k,\tau} - u^2 \hat{\sigma}^2_{k,\tau}/2} + \hat{p}_{0,\tau} \frac{2}{(b-a)u} \sin\left( \frac{(b-a)u}{2} \right) \cdot e^{\frac{iu(a+b)}{2}}, \quad (46)$$

where the second member describes the uniformly distributed noise cluster, $\hat{p}_0$ is the noise cluster weight, and $a = a(\tau)$, $b = b(\tau)$. Based on the established estimates of the parameters of the uniform distribution and the data projections, it is possible to define the range

$$a = \min\left( \tau^T x \right) - \frac{\max\left( \tau^T x \right) - \min\left( \tau^T x \right)}{2(n-1)}, \quad (47)$$

$$b = \max\left( \tau^T x \right) + \frac{\max\left( \tau^T x \right) - \min\left( \tau^T x \right)}{2(n-1)}. \quad (48)$$
By inserting (46) into (24) we obtain

$$\hat{f}(x) = \frac{A(p)}{\#T} \sum_{\tau \in T} \left[ \sum_{k=1}^{\hat{q}_\tau} \hat{p}_{k,\tau} \int_0^\infty e^{iu(\hat{m}_{k,\tau} - \tau^T x) - u^2 (h + \hat{\sigma}^2_{k,\tau}/2)} u^{p-1}\, du + \frac{2\hat{p}_{0,\tau}}{b-a} \int_0^\infty e^{iu\left( \frac{a+b}{2} - \tau^T x \right) - u^2 h} \cdot \sin\left( \frac{(b-a)u}{2} \right) \cdot u^{p-2}\, du \right]. \quad (49)$$

Using notation such as in (28), we define the density estimate as

$$\hat{f}(x) = \frac{A(p)}{\#T} \sum_{\tau \in T} \left[ \sum_{k=1}^{\hat{q}_\tau} \hat{p}_{k,\tau}\, I_{p-1}\!\left( \frac{\hat{m}_{k,\tau} - \tau^T x}{\sqrt{\hat{\sigma}^2_{k,\tau} + 2h}} \right) \left( \hat{\sigma}^2_{k,\tau} + 2h \right)^{-\frac{p}{2}} + \frac{2\hat{p}_{0,\tau}}{b-a}\, J_{p-2}\!\left( \frac{a + b - 2\tau^T x}{2\sqrt{2h}}, \frac{b-a}{2\sqrt{2h}} \right) \cdot (2h)^{-\frac{p-1}{2}} \right], \quad (50)$$

where $I_j(y)$ is given in (29), which is evaluated by (44), and

$$J_j(y, t) = \mathrm{Re}\left( \int_0^\infty e^{iyu - u^2/2} \cdot \sin(tu) \cdot u^j\, du \right). \quad (51)$$

By integrating, we get

$$\int_0^\infty e^{iyu - \frac{u^2}{2}} \cdot \sin(tu) \cdot u^j\, du = \int_0^\infty \left( \cos(yu) + i \sin(yu) \right) \cdot \sin(tu) \cdot e^{-\frac{u^2}{2}} \cdot u^j\, du = \int_0^\infty \left( \frac{\sin((y+t)u) + \sin((t-y)u)}{2} + i\, \frac{\cos((y-t)u) - \cos((y+t)u)}{2} \right) \cdot e^{-\frac{u^2}{2}} \cdot u^j\, du = \frac{1}{2} S_j(y+t) + \frac{1}{2} S_j(t-y) + \frac{i}{2} C_j(y-t) - \frac{i}{2} C_j(y+t), \quad (52)$$

where $S_j(y)$ and $C_j(y)$ are defined in (31) and (30). Then the integral (51) evaluates to

$$J_j(y, t) = \frac{1}{2} S_j(y+t) + \frac{1}{2} S_j(t-y). \quad (53)$$
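Given the `CS` helper above, $J_j$ follows from (53) in a few lines (a sketch):

```python
def J_j(y, t, j):
    """J_j(y, t) = (S_j(y + t) + S_j(t - y)) / 2, formula (53).
    S_j is odd in its argument, so a negative t - y needs no special care."""
    return 0.5 * (CS(y + t, j)[1][j] + CS(t - y, j)[1][j])
```

With $I_j$ and $J_j$ available, evaluating the estimate (50) at a point $x$ reduces to a loop over the directions $\tau \in T$.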
The above procedure is called the modified inversion formula density estimate. Our proposed normality test is based on the distance function

$$T = \int_{\mathbb{R}^p} \left| f(z) - \hat{f}(z) \right| dG(z), \quad (54)$$

where $z$ is a standardized value and $\hat{f}(z)$ is an estimate of the density function.
The choice of $G(z)$ in (54) is influenced by three aspects [23]:
• $G(z)$ assigns high weight where $|f(z) - \hat{f}(z)|$ is large, with $f(z)$ pertaining to the alternative hypothesis.
• $G(z)$ gives high weight where $\hat{f}(z)$ is a relatively precise estimator of $f(z)$.
• $G(z)$ is such that the integral (54) has a closed form.
For the distribution free method, the first two aspects are fulfilled by adequately selecting the smoothness parameter $h$; in addition, it yields a closed form of the integral (54):

$$T = n^{-1} \sum_{t=1}^{n} \left| f(z_t) - \hat{f}(z_t) \right|. \quad (55)$$
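A sketch of computing (55) and $T^*$ for a standardized sample, treating the density estimate $\hat{f}$ (formula (50)) as a black-box callable; the function names are ours:

```python
import numpy as np
from scipy.stats import multivariate_normal

def new_test_statistic(Z, f_hat):
    """Statistic (55) and T* = -log(T): mean absolute deviation between
    the standard normal density (H0) and the estimate f_hat, evaluated
    at the standardized sample points (rows of Z)."""
    p = Z.shape[1]
    f0 = multivariate_normal(mean=np.zeros(p)).pdf     # density under H0
    T = np.mean(np.abs(f0(Z) - np.apply_along_axis(f_hat, 1, Z)))
    return T, -np.log(T)
```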

For moderate sample sizes (≥32), $T$ does not depend on the sample size, but it does depend on the data dimension. It is convenient to use the test statistic $T^* = -\log(T)$, which had the lowest sensitivity based on the exploratory study. Under the null hypothesis, the statistic $T^*$ approximately follows the Johnson SU distribution, which is specified by the shape ($\delta > 0$, $\gamma$), scale ($\lambda > 0$) and location ($\xi$) parameters and has the density function

$$f(X) = \frac{\delta}{\lambda \sqrt{2\pi}}\, g'\!\left( \frac{X - \xi}{\lambda} \right) \exp\left( -0.5 \left( \gamma + \delta g\!\left( \frac{X - \xi}{\lambda} \right) \right)^2 \right), \quad X \in (-\infty, +\infty),$$

where $g(y) = \ln\left( y + \sqrt{y^2 + 1} \right)$ and $g'(y) = \frac{1}{\sqrt{y^2 + 1}}$.
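Since the Johnson SU family is available in SciPy (shape parameters $a = \gamma$, $b = \delta$, with loc $= \xi$ and scale $= \lambda$), the fitted null distribution of $T^*$ can be evaluated directly; a sketch using the $p = 2$ estimates from Table 1 below:

```python
from scipy.stats import johnsonsu

# Null distribution of T* for p = 2 (parameter estimates from Table 1).
null_T_star = johnsonsu(a=0.764906, b=1.498293, loc=4.342807, scale=0.585038)

t_star_obs = 5.1                      # an illustrative observed value of T*
print(null_T_star.pdf(t_star_obs))    # null density at the observed value
print(null_T_star.cdf(t_star_obs))    # P(T* <= t*_obs) under H0
```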
In the middle of the twentieth century, N. L. Johnson [24] proposed certain systems of curves derived by the method of translation which retain most of the advantages and eliminate some of the drawbacks of the systems first based on this method. Johnson introduced the log-normal ($S_L$), bounded ($S_B$), and unbounded ($S_U$) systems. The range of variation of the bounded system covers the area between the bounding line $\beta_2 - \beta_1 - 1 = 0$ and the Pearson Type III distribution, where the $(\beta_1, \beta_2)$ points are obtained from the distribution moments defined by Wicksell [25]:

$$\mu_r'(y) = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{r(z-\gamma)/\delta}\, e^{-\frac{1}{2} z^2}\, dz = e^{\frac{1}{2} r^2 \delta^{-2} - r\gamma\delta^{-1}}.$$

It follows that

$$\beta_1 = \left( e^{\delta^{-2}} - 1 \right) \left( e^{\delta^{-2}} + 2 \right)^2, \quad \left( \sqrt{\beta_1} > 0 \right),$$
$$\beta_2 = \left( e^{\delta^{-2}} \right)^4 + 2 \left( e^{\delta^{-2}} \right)^3 + 3 \left( e^{\delta^{-2}} \right)^2 - 3.$$

The $S_U$ system is bounded at one end only (Pearson Type V). The $S_L$ system lies between the $S_B$ and $S_U$ systems. These regions are indicated in Figure 1. The $S_U$ system is presented in detail in [24].

Figure 1. Regions of Johnson’s systems.

Estimates of T ∗ statistic Johnson SU distribution parameters for different dimensions


are given in Table 1.

Table 1. Statistic T ∗ Johnson SU distribution parameter estimates.

Parameter Symbol Estimate


p=2
Location ξ̂ 4.342807
Scale λ̂ 0.585038
Shape δ̂ 1.498293
Shape γ̂ 0.764906
p=5
Location ξ̂ 7.025845
Scale λ̂ 0.088023
Shape δ̂ 0.895003
Shape γ̂ 0.400035
p = 10
Location ξ̂ 5.195174
Scale λ̂ 1.578613
Shape δ̂ 2.24856
Shape γ̂ −1.83037

For the statistic $T^*$, the invariance and consistency properties were checked. The invariance property is confirmed because standardized data were used. The consistency property is confirmed experimentally (see Section 5).

4. Statistical Distributions
The overviewed normality tests are assessed by a simulation study of 11 statistical distributions grouped into four groups: symmetric, asymmetric, mixed and normal mixture distributions [5]. A description of these distribution groups is given in the following subsections.

4.1. A Group of Symmetric Distributions


Symmetric multivariate distributions are taken from the research [5]:
• Three cases of the Beta(a,b) distribution − Beta(1,1), Beta(1,2) and Beta(2,2), where a and b are the shape parameters.
• One case of the Cauchy(t,s) distribution − Cauchy(0,1), where t and s are the location
and scale parameters.
• One case of the Laplace(t,s) distribution − Laplace(0,1), where t and s are the location
and scale parameters.
• One case of the Logistic(t,s) distribution − Logistic(0,1), where t and s are the location
and scale parameters.
• Two cases of the t-Student(ν) distribution − t(2) and t(5), where ν is the number of
degrees of freedom.
• One case of the standard normal N(0,1) distribution.

4.2. A Group of Asymmetric Distributions


Asymmetric multivariate distributions are taken from the research [5]:
• Five cases of the Chi-squared(ν) distribution − χ2 (1), χ2 (2), χ2 (5), χ2 (10) and χ2 (15),
where ν is the number of degrees of freedom.
• Two cases of the Gamma(a,b) distribution − Gamma(0.5,1) and Gamma(5,1), where a
and b are the shape and scale parameters.
• One case of the Gumbel(t,s) distribution − Gumbel(1,2), where t and s are the location
and scale parameters.
• Two cases of the Lognormal(t,s) distribution − LN(0,1) and LN(0,0.25) where t and s
are the location and scale parameters.
• Three cases of the Weibull(β) distribution − Weibull(0.8), Weibull(1) and Weibull(1.5), where β is the shape parameter.

4.3. A Group of Mixed Distributions


The generated mixed data distribution

$$X_k = \left( X_{k1}, X_{k2}, \ldots, X_{km}, \ldots, X_{kp} \right)^T, \quad k = 1, 2, \ldots, n,$$

is such that the first m variates (i.e., Xk1 , Xk2 , . . . , Xkm ) follow the standard normal distri-
bution and distribution of the remaining variates is one of the non-normal distributions
(Laplace(0,1), χ2 (5), t(5), Beta(1,1), Beta(1,2), Beta(2,2)). The experimental research covers the
cases for m = p − 1, m = p/2 and m = 1.

4.4. A Group of Normal Mixture Distributions


Normal mixture distributions are considered in this research [5]: nine cases of the multivariate normal mixture distribution MVNMIX(a,b,c,d) − MVNMIX(0.5,2,0,0), MVNMIX(0.5,4,0,0), MVNMIX(0.5,2,0.9,0), MVNMIX(0.5,0.5,0.9,0), MVNMIX(0.5,0.5,0.9,0.1), MVNMIX(0.5,0.5,0.9,0.9), MVNMIX(0.7,2,0.9,0.3), MVNMIX(0.3,1,0.9,0.1), MVNMIX(0.3,1,0.9,0.9). The multivariate normal mixture distribution has density

$$a N\!\left( 0, \Sigma_1 \right) + (1 - a) N\!\left( b\mathbf{1}, \Sigma_2 \right),$$

where $\mathbf{1}$ is the column vector with all elements equal to 1, and

$$\Sigma_1 = (1 - c) I + c \mathbf{1}\mathbf{1}^T \quad \text{and} \quad \Sigma_2 = (1 - d) I + d \mathbf{1}\mathbf{1}^T.$$
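A sketch of drawing a sample from MVNMIX(a, b, c, d) as defined above (function name ours; both components are drawn and then selected, which is wasteful but keeps the sketch short):

```python
import numpy as np

def sample_mvnmix(n, p, a, b, c, d, seed=0):
    """Draw n points from a*N(0, S1) + (1-a)*N(b*1, S2), with
    S1 = (1-c)I + c*11^T and S2 = (1-d)I + d*11^T."""
    rng = np.random.default_rng(seed)
    one = np.ones(p)
    S1 = (1 - c) * np.eye(p) + c * np.outer(one, one)
    S2 = (1 - d) * np.eye(p) + d * np.outer(one, one)
    from_first = rng.random(n) < a                 # component labels
    X = np.where(from_first[:, None],
                 rng.multivariate_normal(np.zeros(p), S1, size=n),
                 rng.multivariate_normal(b * one, S2, size=n))
    return X
```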

5. Simulation Study and Discussion


This section provides a modeling study that evaluates the power of the selected multivariate normality tests. We used the Monte Carlo method to compare our proposed test with the 12 multivariate tests described above for dimensions p = 2, 5, 10, with sample sizes n = 32, 64, 128, 256, 512, 1024 at significance level α = 0.05. Power was estimated by applying the tests to 1,000,000 randomly drawn samples from the alternative distributions (Beta, Cauchy, Laplace, Logistic, Student, Standard normal, Chi-Square, Gamma, Gumbel, Lognormal, Weibull, Mixed, Normal mixture).
The values of the test smoothness parameter h were selected experimentally: from 0.1 to 5 with a step of 0.1. The value of the parameter h was determined for each dimension considered. It was found that the best results (i.e., the maximum statistic value) are obtained for p = 2 with h = 1.05, for p = 5 with h = 0.1, and for p = 10 with h = 2.4. These smoothness parameter values were used to carry out the numerical experiments.
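The power estimation itself reduces to a simple Monte Carlo loop; a generic sketch (the paper used up to 1,000,000 replications; `test_pvalue` and `sampler` are placeholders for any of the tests and alternative distributions above):

```python
import numpy as np

def empirical_power(test_pvalue, sampler, n, reps=10_000, alpha=0.05, seed=0):
    """Empirical power: fraction of samples drawn from the alternative
    distribution for which the test rejects H0 at level alpha.
    test_pvalue(X) returns a p-value; sampler(n, rng) draws a sample."""
    rng = np.random.default_rng(seed)
    rejections = 0
    for _ in range(reps):
        X = sampler(n, rng)
        if test_pvalue(X) < alpha:
            rejections += 1
    return rejections / reps
```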
The power of the 13 multivariate goodness of fit hypothesis tests (including our proposed test) was estimated for different sample sizes, distributions and mixtures. The mean power values for the groups of distributions (given in Section 4), for each test and sample size, have been computed and are presented in Tables 2–5. It can be seen that the new test is the most powerful one for the groups of symmetric and mixed distributions. In the group of asymmetric distributions, the new (for p = 2) and Roy (for p = 5 and 10) tests are the most powerful ones. The new (for p = 2 and 5) and Roy (for p = 10 with sample sizes n = 256, 512, 1024) tests are also the most powerful in the group of normal distribution mixtures. Comparing the Mardia tests (Mar1 and Mar2), based on the asymmetry and excess coefficients, it has been found that Mar1 is among the most powerful only for the group of asymmetric distributions. For the group of symmetric distributions the power of this test is the lowest (compared to the other tests).

Table 2. An average empirical power for a group of symmetric distributions.

AD CHI2 CVM DH DN Energy HZ LV New Mar1 Mar2 NRR Roy


p=2
n = 32 0.651 0.57 0.652 0.677 0.565 0.65 0.644 0.696 0.999 0.532 0.605 0.608 0.703
n = 64 0.778 0.692 0.779 0.809 0.671 0.77 0.765 0.815 0.999 0.617 0.751 0.736 0.819
n = 128 0.867 0.798 0.868 0.892 0.768 0.86 0.853 0.893 0.999 0.681 0.857 0.842 0.891
n = 256 0.92 0.873 0.92 0.932 0.847 0.914 0.906 0.932 0.999 0.721 0.917 0.91 0.929
n = 512 0.939 0.912 0.94 0.945 0.903 0.941 0.936 0.945 0.999 0.743 0.942 0.941 0.944
n = 1024 0.945 0.932 0.945 0.949 0.937 0.948 0.947 0.949 0.999 0.758 0.95 0.949 0.95
p=5
n = 32 0.644 0.531 0.624 0.735 0.585 0.632 0.622 0.763 0.985 0.523 0.637 0.602 0.784
n = 64 0.791 0.656 0.775 0.864 0.7 0.758 0.755 0.871 0.989 0.621 0.792 0.739 0.875
n = 128 0.883 0.773 0.876 0.924 0.806 0.863 0.856 0.925 0.988 0.696 0.89 0.851 0.924
n = 256 0.929 0.864 0.926 0.941 0.886 0.921 0.91 0.941 0.987 0.735 0.934 0.916 0.941
n = 512 0.942 0.916 0.942 0.946 0.932 0.945 0.94 0.946 0.981 0.752 0.948 0.944 0.947
n = 1024 0.946 0.941 0.946 0.949 0.949 0.949 0.949 0.949 0.985 0.764 0.95 0.95 0.95
p = 10
n = 32 0.557 0.473 0.534 0.754 0.599 0.598 0.604 0.791 0.997 0.458 0.65 0.599 0.834
n = 64 0.754 0.604 0.728 0.884 0.704 0.709 0.71 0.893 0.998 0.592 0.802 0.719 0.905
n = 128 0.878 0.726 0.865 0.934 0.817 0.821 0.831 0.935 0.998 0.676 0.899 0.844 0.934
n = 256 0.928 0.824 0.922 0.941 0.896 0.906 0.901 0.941 0.998 0.733 0.94 0.913 0.943
n = 512 0.942 0.891 0.941 0.945 0.936 0.943 0.937 0.945 0.991 0.747 0.951 0.942 0.946
n = 1024 0.945 0.928 0.945 0.948 0.949 0.948 0.949 0.948 0.991 0.756 0.95 0.949 0.95

Table 3. An average empirical power for a group of asymmetric distributions.

AD CHI2 CVM DH DN Energy HZ LV New Mar1 Mar2 NRR Roy


p=2
n = 32 0.634 0.631 0.639 0.852 0.55 0.832 0.811 0.87 0.999 0.813 0.63 0.639 0.877
n = 64 0.744 0.767 0.744 0.956 0.657 0.93 0.906 0.961 0.999 0.941 0.776 0.759 0.962
n = 128 0.827 0.861 0.822 0.995 0.724 0.985 0.968 0.995 0.999 0.992 0.876 0.841 0.995
n = 256 0.897 0.931 0.892 0.999 0.774 0.999 0.995 0.999 0.999 0.999 0.947 0.915 0.999
n = 512 0.954 0.977 0.949 0.999 0.816 0.999 0.999 0.999 0.999 0.999 0.988 0.968 0.999
n = 1024 0.985 0.996 0.982 0.999 0.864 0.999 0.999 0.999 0.999 0.999 0.999 0.993 0.999
p=5
n = 32 0.614 0.6 0.608 0.915 0.551 0.854 0.798 0.932 0.982 0.803 0.623 0.61 0.945
n = 64 0.763 0.779 0.761 0.99 0.675 0.958 0.907 0.992 0.989 0.954 0.791 0.763 0.993
n = 128 0.869 0.892 0.869 0.999 0.748 0.996 0.974 0.999 0.997 0.997 0.908 0.869 0.999
n = 256 0.946 0.965 0.947 0.999 0.812 0.999 0.998 0.999 0.997 0.999 0.978 0.95 0.999
n = 512 0.984 0.994 0.985 0.999 0.872 0.999 0.999 0.999 0.999 0.999 0.997 0.989 0.999
n = 1024 0.995 0.999 0.995 0.999 0.926 0.999 0.999 0.999 0.999 0.999 0.999 0.999 0.999
p = 10
n = 32 0.483 0.443 0.459 0.944 0.532 0.829 0.744 0.96 0.922 0.693 0.573 0.532 0.98
n = 64 0.707 0.712 0.7 0.998 0.679 0.956 0.861 0.998 0.947 0.931 0.746 0.722 0.999
n = 128 0.863 0.87 0.859 0.999 0.776 0.997 0.954 0.999 0.98 0.997 0.898 0.86 0.999
n = 256 0.955 0.96 0.953 0.999 0.858 0.999 0.994 0.999 0.995 0.999 0.978 0.952 0.999
n = 512 0.99 0.994 0.989 0.999 0.93 0.999 0.999 0.999 0.996 0.999 0.998 0.992 0.999
n = 1024 0.996 0.999 0.996 0.999 0.975 0.999 0.999 0.999 0.996 0.999 0.999 0.999 0.999

Table 4. An average empirical power for a group of mixed distributions.

AD CHI2 CVM DH DN Energy HZ LV New Mar1 Mar2 NRR Roy


p=2
n = 32 0.469 0.408 0.463 0.436 0.439 0.582 0.572 0.453 0.999 0.476 0.412 0.444 0.451
n = 64 0.572 0.476 0.567 0.51 0.511 0.703 0.697 0.547 0.999 0.577 0.527 0.533 0.513
n = 128 0.683 0.571 0.679 0.591 0.59 0.809 0.807 0.667 0.999 0.659 0.651 0.641 0.572
n = 256 0.78 0.667 0.778 0.66 0.674 0.872 0.871 0.749 0.999 0.717 0.762 0.741 0.643
n = 512 0.848 0.763 0.847 0.746 0.763 0.895 0.894 0.808 0.999 0.76 0.843 0.827 0.72
n = 1024 0.883 0.842 0.883 0.826 0.835 0.902 0.901 0.857 0.999 0.78 0.884 0.878 0.764
p=5
n = 32 0.626 0.466 0.585 0.545 0.538 0.703 0.706 0.553 0.982 0.584 0.584 0.551 0.47
n = 64 0.749 0.582 0.726 0.631 0.684 0.788 0.815 0.628 0.989 0.675 0.723 0.694 0.524
n = 128 0.805 0.67 0.791 0.695 0.771 0.845 0.864 0.692 0.995 0.722 0.791 0.769 0.589
n = 256 0.852 0.729 0.841 0.747 0.825 0.88 0.885 0.751 0.998 0.75 0.838 0.822 0.669
n = 512 0.88 0.763 0.875 0.777 0.865 0.894 0.895 0.81 0.999 0.766 0.864 0.863 0.73
n = 1024 0.894 0.789 0.893 0.795 0.891 0.9 0.899 0.883 0.999 0.778 0.889 0.889 0.764
p = 10
n = 32 0.688 0.477 0.642 0.58 0.669 0.719 0.745 0.592 0.916 0.614 0.731 0.679 0.475
n = 64 0.753 0.579 0.744 0.69 0.753 0.744 0.78 0.69 0.942 0.68 0.796 0.753 0.529
n = 128 0.775 0.651 0.771 0.736 0.776 0.777 0.821 0.735 0.94 0.722 0.795 0.774 0.602
n = 256 0.802 0.709 0.795 0.761 0.793 0.823 0.87 0.76 0.968 0.745 0.811 0.79 0.689
n = 512 0.833 0.746 0.821 0.778 0.818 0.875 0.892 0.776 0.995 0.764 0.84 0.814 0.745
n = 1024 0.866 0.763 0.853 0.791 0.842 0.897 0.899 0.79 0.997 0.779 0.861 0.837 0.769

In order to supplement and emphasize the results presented in Tables 2–5, generalized line diagrams were drawn using the Trellis display [26] multivariate data visualization method. The resulting graph is shown in Figure 2, which shows that the New test is significantly more powerful than the other tests. The power of the Mar1 test is the lowest compared with the other tests. Figure 2 indicates that the power of the tests increases as the sample size increases. As the dimension increases, the power of 8 tests (AD, CHI2, CVM, Energy, HZ, New, Mar1 and NRR) decreases, while the power of the other tests (DH, DN, LV, Mar2 and Roy) increases slightly. For small sample sizes, the most powerful tests are New, Roy and DH. For large sample sizes, the most powerful tests are New, Energy, HZ and LV.

Table 5. An average empirical power for a group of normal mixture distributions.

AD CHI2 CVM DH DN Energy HZ LV New Mar1 Mar2 NRR Roy


p=2
n = 32 0.465 0.422 0.468 0.56 0.428 0.537 0.529 0.588 0.999 0.433 0.437 0.442 0.607
n = 64 0.576 0.508 0.581 0.74 0.503 0.682 0.672 0.752 0.999 0.544 0.563 0.544 0.778
n = 128 0.71 0.618 0.715 0.893 0.582 0.836 0.823 0.895 0.999 0.633 0.707 0.664 0.908
n = 256 0.844 0.738 0.848 0.974 0.685 0.938 0.926 0.974 0.999 0.701 0.84 0.805 0.978
n = 512 0.943 0.845 0.945 0.998 0.791 0.986 0.977 0.998 0.999 0.733 0.931 0.924 0.998
n = 1024 0.987 0.917 0.988 0.999 0.882 0.999 0.998 0.999 0.999 0.757 0.977 0.985 0.999
p=5
n = 32 0.45 0.399 0.441 0.594 0.443 0.503 0.485 0.632 0.98 0.384 0.46 0.442 0.672
n = 64 0.574 0.491 0.563 0.782 0.51 0.64 0.621 0.795 0.994 0.516 0.59 0.539 0.828
n = 128 0.699 0.598 0.689 0.916 0.594 0.781 0.761 0.92 0.997 0.619 0.728 0.655 0.934
n = 256 0.806 0.702 0.798 0.979 0.691 0.894 0.877 0.979 0.999 0.694 0.832 0.766 0.984
n = 512 0.889 0.787 0.883 0.998 0.782 0.963 0.95 0.998 0.999 0.736 0.905 0.857 0.998
n = 1024 0.946 0.857 0.942 0.999 0.859 0.992 0.985 0.999 0.999 0.758 0.954 0.925 0.999
p = 10
n = 32 0.402 0.392 0.396 0.62 0.495 0.476 0.487 0.667 0.989 0.287 0.547 0.487 0.735
n = 64 0.556 0.47 0.537 0.8 0.536 0.599 0.581 0.815 0.984 0.472 0.634 0.55 0.855
n = 128 0.709 0.587 0.692 0.915 0.629 0.723 0.708 0.919 0.977 0.582 0.758 0.674 0.939
n = 256 0.801 0.688 0.79 0.973 0.723 0.834 0.815 0.974 0.971 0.669 0.834 0.771 0.98
n = 512 0.853 0.76 0.846 0.995 0.8 0.912 0.887 0.995 0.971 0.711 0.885 0.833 0.997
n = 1024 0.893 0.818 0.886 0.999 0.85 0.96 0.939 0.999 0.973 0.739 0.924 0.875 0.999

Figure 2. The summary of average empirical power of all examined distribution groups by sample size and dimensionality.

6. Examples
6.1. Survival Data
The data set collected in 2001–2020 by the Head of the Department of Urology Clinic of the Lithuanian University of Health Sciences [27] illustrates the practical application. This dataset consists of study data from 2423 patients and two different continuous attributes (patient age and prostate-specific antigen (PSA)). The assumption of normality was verified by filtering patients' age and PSA by year of death (i.e., deaths during the first 1, 2, 3, 4, 5, 6, 7, 10, and 15 years). The filtered data were standardized. The power and p-value were calculated for the multivariate tests. The significance level α = 0.05 was used for the study. Based on the obtained results, it was found that all the applied multivariate tests rejected the H0 hypothesis of normality. The power of the tests CHI2, DH, Energy, HZ, LV, New, Mar, NRR and Roy was 0.999 and the p-value was <0.0001; the exception was the DN test, whose power was 0.576 and p-value 0.026.

6.2. IQOS Data


In 2017, the data set of pollution research with IQOS and traditional cigarettes [28], collected by Kaunas University of Technology, Faculty of Chemical Technology, Department of Environmental Technology, was used for the practical application. This data set consists of 33 experiments (with different conditions) in which the numerical (Pn10) and mass (Pm2.5, Pm10) concentrations of particles were measured. The assumption of normality was checked by filtering Pn10, Pm2.5, Pm10 according to the number of the experiment in the smoking phase. The filtered data were standardized. The power and p-values of the multivariate tests were calculated at a significance level of α = 0.05. Based on the obtained results, it was found that all the applied multivariate tests show that the assumption of normality is rejected. Most of the multivariate tests used (CHI2, DH, Energy, HZ, LV, New, Mar, NRR, and Roy) had a power of 0.999 and a p-value of <0.0001. The power of the other tests was also close to 0.99 and the p-value was about 0.0001.

7. Conclusions
In this study, a comprehensive comparison of the power of 13 multivariate goodness of fit tests was performed for groups of symmetric, asymmetric, mixed, and normal mixture distributions. Two-dimensional, five-dimensional, and ten-dimensional data sets were generated to estimate the test power empirically.
A new multivariate goodness of fit test based on the inversion formula was proposed. Based on the obtained modeling results, it was determined that the most powerful tests for the groups of symmetric, asymmetric, mixed and normal mixture distributions are the proposed test and the Roy multivariate test. From two real data examples, it was concluded that our proposed test is stable, even when applied to real data sets.

Author Contributions: Data curation, J.A., T.R.; Formal analysis, J.A., T.R.; Investigation, J.A., T.R.;
Methodology, J.A., T.R.; Software, J.A., T.R.; Supervision, T.R.; Writing—original draft, J.A., M.B.;
writing—review and editing, J.A., M.B. All authors have read and agreed to the published version of
the manuscript.
Funding: This research received no external funding.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Generated data sets were used in the study (see in Section 4).
Conflicts of Interest: The authors declare no conflict of interest.

Appendix A
R p is p-variate set of real numbers,
 T
Xk = Xk1 , Xk2 , . . . , Xkp ∈ R p , k = 1, 2, . . . , n is p-variate vector,
#T denotes a size of set T,
p is dimension,
h is smoothness parameter,
D( j) , j = 1, 2, . . . , n are ordered statistics,
G p (·) is the probability distribution function of χ2 ( p),
s is skewness,
k is kurtosis,
n is sample size,
x is sample mean,
σ2 is sample variance,
z is a standardized value,
d is number of variables,
e is the equivalent degrees of freedom,
Φ(·) is the cumulative distribution function for the standard normal distribution,
R is the correlation matrix,
rij is the correlation between ith and jth observations,
Vi is a vector of standardized cell frequencies,
Nin is the number of random vectors,
J = J (θ ) is the Fisher information matrix,
Q is the p( p + 1)/2 × p( p + 1)/2 covariance matrix of w,
P̂(t) is the characteristic function of the multivariate standard normal,
Q̂(t) is the empirical characteristic function of the standardised observations,
ϕh (t) is a kernel (weighting) function,
Γ(·) is a Gamma function,
F̂ (k) is an auto-covariance function,
Dij is Mahalanobis distance between ith and jth observations,
W j is the normality transformations,
f k ( X, θk ) is a distribution of the kth class,
θ is a set of parameters θ = p1 , . . . , pq , θ1 , . . . , θq ,
ψ(t) is the characteristic function of the random variable X.
pk (k = 1, . . . , q) is the a priori probability,
R is the radius of the ball (bounded sphere),
qς is a quantile of standardized normal distribution
δ and γ are shape parameters,
λ is scale parameter,
ξ is location parameter.

References
1. Henze, N.; Zirkler, B. A class of invariant consistent tests for multivariate normality. Commun. Stat. Theory Methods 1990, 19,
3595–3617. [CrossRef]
2. Henze, N. Invariant tests for multivariate normality: A critical review. Stat. Pap. 2002, 43, 467–506. [CrossRef]
3. Royston, J.P. An extension of Shapiro and Wilk’s W test for normality to large samples. Appl. Stat. 1982, 31, 115–124. [CrossRef]
4. Ross, G.J.S.; Hawkins, D. MLP: Maximum Likelihood Program; Rothamsted Experimental Station: Harpenden, UK, 1980.
5. Korkmaz, S.; Goksuluk, D.; Zararsiz, G. MVN: An R Package for Assessing Multivariate Normality. R J. 2014, 6, 151–162.
[CrossRef]
6. Doornik, J.A.; Hansen, H. An Omnibus Test for Univariate and Multivariate Normality. Oxf. Bull. Econ. Stat. 2008, 70, 927–939.
7. Voinov, V.; Pya, N.; Makarov, R.; Voinov, Y. New invariant and consistent chi-squared type goodness-of-fit tests for multivariate
normality and a related comparative simulation study. Commun. Stat. Theory Methods 2016, 45, 3249–3263. [CrossRef]
8. Moore, D.S.; Stubblebine, J.B. Chi-square tests for multivariate normality with application to common stock prices. Commun. Stat.
Theory Methods 1981, A10, 713–738. [CrossRef]
9. Gnanadesikan, R.; Kettenring, J.R. Robust estimates, residuals, and outlier detection with multiresponse data. Biometrics 1972, 28,
81–124.
10. Górecki, T.; Horváth, L.; Kokoszka, P. Tests of Normality of Functional Data. Int. Stat. Rev. 2020, 88, 677–697. [CrossRef]
11. Pinto, L.P.; Mingoti, S.A. On hypothesis tests for covariance matrices under multivariate normality. Pesqui. Operacional. 2015, 35,
123–142. [CrossRef]

12. Dörr, P.; Ebner, B.; Henze, N. A new test of multivariate normality by a double estimation in a characterizing PDE. Metrika 2021,
84, 401–427. [CrossRef]
13. Zhou, M.; Shao, Y. A Powerful Test for Multivariate Normality. J. Appl. Stat. 2014, 41, 351–363. [CrossRef]
14. Kolkiewicz, A.; Rice, G.; Xie, Y. Projection pursuit based tests of normality with functional data. J. Stat. Plan. Inference 2021, 211,
326–339. [CrossRef]
15. Ebner, B.; Henze, N. Tests for multivariate normality-a critical review with emphasis on weighted Lˆ2-statistics. TEST 2020, 29,
845–892. [CrossRef]
16. Arnastauskaitė, J.; Ruzgas, T.; Bražėnas, M. An Exhaustive Power Comparison of Normality Tests. Mathematics 2021, 9, 788.
[CrossRef]
17. Mardia, K. Measures of Multivariate Skewness and Kurtosis with Applications. Biometrika 1970, 57, 519–530. [CrossRef]
18. Szekely, J.G.; Rizzo, L.M. Energy statistics: A class of statistics based on distances. J. Stat. Plan. Inference 2013, 143, 1249–1272.
[CrossRef]
19. Lobato, I.; Velasco, C. A simple Test of Normality for Time Series. Econom. Theory 2004, 20, 671–689. [CrossRef]
20. Ruzgas, T. The Nonparametric Estimation of Multivariate Distribution Density Applying Clustering Procedures. Ph.D. Thesis,
Institute of Mathematics and Informatics, Vilnius, Lithuania, 2007; p. 161.
21. Marron, J.S.; Wand, M.P. Exact Mean Integrated Squared Error. Ann. Stat. 1992, 20, 712–736. [CrossRef]
22. Kavaliauskas, M.; Rudzkis, R.; Ruzgas, T. The Projection-based Multivariate Distribution Density Estimation. Acta Comment.
Univ. Tartu. Math. 2004, 8, 135–141.
23. Epps, T.W.; Pulley, L.B. A test for normality based on the empirical characteristic function. Biometrika 1983, 70, 723–726. [CrossRef]
24. Johnson, N.L. Systems of Frequency Curves Generated by Methods of Translation. Biometrika 1949, 36, 149–176. [CrossRef]
[PubMed]
25. Wicksell, S.D. The construction of the curves of equal frequency in case of type A correlation. Ark. Mat. Astr. Fys. 1917, 12, 1–19.
26. Theus, M. High Dimensional Data Visualization. In Handbook of Data Visualization; Springer: Berlin/Heidelberg, Germany, 2008;
pp. 5–7.
27. Milonas, D.; Ruzgas, T.; Venclovas, Z.; Jievaltas, M.; Joniau, S. The significance of prostate specific antigen persistence in prostate
cancer risk groups on long-term oncological outcomes. Cancers 2021, 13, 2453. [CrossRef]
28. Martuzevicius, D.; Prasauskas, T.; Setyan, A.; O’Connell, G.; Cahours, X.; Julien, R.; Colard, S. Characterization of the Spatial and
Temporal Dispersion Differences Between Exhaled E-Cigarette Mist and Cigarette Smoke. Nicotine Tob. Res. 2019, 21, 1371–1377.
[CrossRef]
