A Riemannian Approach to Multivariate Geostatistical
Modeling
Álvaro I. Riquelme*
Abstract In geosciences, the use of classical Euclidean methods is unsuitable for treating and analyzing some types of data, as these may not belong to a vector space. This is the case for correlation matrices, which belong to a subfamily of the symmetric positive definite matrices and which in turn form a cone-shaped Riemannian manifold. We propose two novel applications that deal with the non-linear behavior usually present in multivariate geological data by exploiting the manifold structure of correlation matrices. First, we employ an extension of the linear model of coregionalization (LMC) that alters the linear mixture, which is assumed fixed over the domain, making it locally varying according to the local strength of the dependency between the coregionalized variables. The main challenge, once this relaxation of the LMC is assumed, is to interpolate the correlation matrices, known at sample locations, throughout the domain in a reliable and coherent fashion. The present work adopts a non-Euclidean framework to achieve this objective, locally averaging and interpolating the correlations between the variables while retaining the intrinsic geometry of correlation matrices. A second application deals with the problem of clustering multivariate data.
Keywords Geostatistical modeling · Linear model of coregionalization · Eigen-decomposition · geodesics ·
Riemannian manifold · Symmetric positive definite
1 Introduction
In geosciences, classical Euclidean methods are not suitable for treating and analyzing some types of data, as they may not belong to a vector space. A common example is weather data, commonly assumed to be restricted to the sphere. Viewing data as lying in different submanifolds of a Riemannian space is an increasingly used approach that has been highly successful over the past decades. In geostatistics, these concepts have been used in particular for the modeling of spatial non-stationarity in the data (Sampson and Guttorp 1992; Almendral et al. 2008; Boisvert et al. 2009; Fouedjio et al. 2015). The concepts have also been applied exhaustively by Taylor et al. (2003), Taylor et al. (2006) and Adler and Taylor (2007) in order to understand the topology of random fields (RF) on manifolds.
We address the problem of capturing and incorporating a second source of non-stationarity in geological phenomena, which stems from the fact that the different variables describing ore deposits, 𝑍1(u), . . . , 𝑍𝑝(u), cannot be modeled independently of one another, since they are mineralogically and physically related in complex ways. As simple linear multivariate features rarely occur among the geological variables composing sampling databases, which usually show nonlinear features instead, the correct reproduction of such characteristics becomes a problem when employing traditional estimation and geostatistical simulation techniques.
*E-mail: [email protected]. Robert M. Buchan Department of Mining, Queen's University, Kingston, Canada.
When relationships are simple and linear, one can rely on the linear model of coregionalization (LMC) (Journel and Huijbregts 1978; Chilès and Delfiner 2012), which can be interpreted, in the standard Gaussian setting, as assigning a constant correlation 𝜌𝑖𝑗 to the pair of variables {𝑍𝑖(u), 𝑍𝑗(u)} throughout the domain. This correlation parameter fixes the theoretical direct and cross covariances in space, even at different positions {𝑍𝑖(u), 𝑍𝑗(u + h)}, and must be modeled beforehand in order to proceed with estimation techniques such as co-kriging (Wackernagel 2013), or before applying decorrelation of the data through linear transformations such as Principal Component Analysis (PCA) (Pearson 1901) or Minimum/Maximum Autocorrelation Factors (Switzer 1985).
Since geological variables rarely show linear Gaussian characteristics, it is hard to give the LMC a geological interpretation, especially when non-linear multivariate behavior among the different attributes is present, reducing the rate of success of the traditional methods. To overcome this limitation, several approaches have been proposed that generalize the Gaussian transformation of the geological variables (raw variables hereafter) into independent standard Gaussian variables that can be treated individually (Leuangthong and Deutsch 2003; Barnett et al. 2014; van den Boogaart et al. 2017). We follow a different path, which is to modify the LMC by altering the fixed correlation among geological features over the domain. This linear mixture can be made locally varying according to the local strength of the dependency between the variables, leading to a locally varying linear model of coregionalization (LVLMC), first introduced by Gelfand et al. (2003) in the context of spatial non-stationary models.
The main challenge, once this relaxation of the LMC is assumed, is to properly carry the correlation matrix, computed first at the data positions, to unknown locations of the spatial domain. Correlation matrices belong to the family of symmetric positive definite (SPD) matrices, which in turn forms a cone-shaped Riemannian manifold. Building upon earlier studies showing that a Riemannian framework is appropriate for interpolating between correlation matrices, the present work adopts this non-Euclidean framework to achieve our objective by interpolating the correlations between the variables throughout the geological domain while retaining the intrinsic geometry of correlation matrices.
2 Background

2.1 Review of Riemannian Manifolds

A differentiable manifold 𝑀 of dimension 𝑝 is a topological space that is locally similar to a Euclidean space, with every point on the manifold having a neighborhood for which there exists a homeomorphism (a continuous bijection whose inverse is also continuous) mapping the neighborhood to ℝ^𝑝. The tangent space 𝑇x𝑀 at x is the vector space that contains the tangent vectors to all 1-D curves on 𝑀 passing through x. Fig. 1 shows an example of a two-dimensional manifold, a smooth surface living in ℝ³. A Riemannian metric on a manifold 𝑀 is a bilinear form which associates to each point x ∈ 𝑀 a differentiably varying inner product ⟨·, ·⟩x on the tangent space 𝑇x𝑀 at x. The norm of a vector v ∈ 𝑇x𝑀 is given by ‖v‖²x = ⟨v, v⟩x. The Riemannian distance between two points x_i and x_j that lie on the manifold, 𝑑(x_i, x_j), is defined as the minimum length over all possible smooth curves on the manifold between x_i and x_j. The smooth curve with minimum length is known as the geodesic curve 𝛾.

Fig. 1 The exponential map.

Given a tangent vector v ∈ 𝑇x𝑀, locally there exists a unique geodesic 𝛾_v(𝑡) starting at x with initial velocity v, and this geodesic has constant speed equal to ‖v‖x. The exponential map exp_x : 𝑇x𝑀 → 𝑀 maps a tangent vector v to the point on the manifold that is reached at time 1 by the geodesic 𝛾_v(𝑡). The inverse of exp_x is known as the logarithm map and is denoted by log_x : 𝑀 → 𝑇x𝑀. Now, if we have two points x_i and x_j on the manifold 𝑀, the tangent vector to the geodesic curve from x_i to x_j is defined as v = log_{x_i}(x_j), and the exponential map takes v to the point x_j = exp_{x_i}(log_{x_i}(x_j)). In addition, 𝛾_v(0) = x_i and 𝛾_v(1) = x_j. The Riemannian distance between x_i and x_j is defined as 𝑑(x_i, x_j) = ‖log_{x_i}(x_j)‖_{x_i}.

Given data x_1, . . . , x_k ∈ 𝑀, the geometric or Fréchet mean 𝝁 is defined as a minimizer of the sum of squared distances:

$$\boldsymbol{\mu} = \arg\inf_{\mathbf{x} \in M} \sum_{i=1}^{k} d^2(\mathbf{x}, \mathbf{x}_i).$$
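For later use, and assuming the data lie in a neighborhood of the manifold where this mean is unique, a standard first-order characterization of the minimizer is that the logarithm maps of the data, taken at the mean, balance out:

$$\sum_{i=1}^{k} \log_{\boldsymbol{\mu}}(\mathbf{x}_i) = \mathbf{0}.$$

The fixed-point schemes of Sect. 3.1.2 iterate precisely towards this balance by repeatedly averaging in the tangent space and mapping back to the manifold.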
2.2 The Riemannian manifold of SPD matrices

Let Sym⁺(𝑝) denote the set of symmetric, positive definite matrices of size 𝑝 × 𝑝, that is, the set of all symmetric 𝑝 × 𝑝 matrices X such that the quadratic form vᵀXv > 0 for all non-zero v ∈ ℝ^𝑝. A crucial aspect of the set Sym⁺(𝑝) is that it is not a vector space but forms a cone-shaped Riemannian manifold. As a consequence of the manifold structure of Sym⁺(𝑝), computational methods that simply rely on Euclidean distances between SPD matrices are generally suboptimal, with low performance (Tuzel et al. 2008). To exploit the manifold structure of Sym⁺(𝑝) it is necessary to consider the notion of geodesic distance, which is the length of the shortest curve connecting two points, in this case two matrices, on the manifold. Among the different Riemannian metrics that have been considered on Sym⁺(𝑝), the one that has been most studied and analyzed is the classical affine-invariant metric, in which the geodesic distance on the manifold between two SPD matrices P₁ and P₂ is defined as

$$d_{\mathrm{Sym}^+}^2(\mathbf{P}_1, \mathbf{P}_2) = \left\| \operatorname{Log}\!\left(\mathbf{P}_1^{-1/2}\mathbf{P}_2\mathbf{P}_1^{-1/2}\right) \right\|^2 = \operatorname{tr}\!\left[ \operatorname{Log}^2\!\left(\mathbf{P}_1^{-1/2}\mathbf{P}_2\mathbf{P}_1^{-1/2}\right) \right],$$

with Log(·) denoting the matrix logarithm and ‖A‖² = tr(AᵀA) denoting the Frobenius matrix norm.

Furthermore, given a tangent vector Y_P ∈ 𝑇_P Sym⁺(𝑝) at a point P ∈ Sym⁺(𝑝), the Riemannian exponential map exp_P : 𝑇_P Sym⁺(𝑝) → Sym⁺(𝑝) is given by

$$\mathbf{X} = \exp_{\mathbf{P}}(\mathbf{Y}_{\mathbf{P}}) = \mathbf{P}^{1/2} \operatorname{Exp}\!\left(\mathbf{P}^{-1/2}\mathbf{Y}_{\mathbf{P}}\mathbf{P}^{-1/2}\right) \mathbf{P}^{1/2}, \qquad (1)$$

where Exp(·) denotes the matrix exponential. Given two positive definite matrices P, X ∈ Sym⁺(𝑝), the Riemannian logarithmic map log_P : Sym⁺(𝑝) → 𝑇_P Sym⁺(𝑝) of X in relation to P is given by

$$\mathbf{Y}_{\mathbf{P}} = \log_{\mathbf{P}}(\mathbf{X}) = \mathbf{P}^{1/2} \operatorname{Log}\!\left(\mathbf{P}^{-1/2}\mathbf{X}\mathbf{P}^{-1/2}\right) \mathbf{P}^{1/2}. \qquad (2)$$

Finally, the geodesic passing through P in the direction of Y_P is uniquely given by

$$\gamma_{\mathbf{P}}(t; \mathbf{Y}_{\mathbf{P}}) = \mathbf{P}^{1/2} \operatorname{Exp}\!\left(\mathbf{P}^{-1/2}\mathbf{Y}_{\mathbf{P}}\mathbf{P}^{-1/2}\, t\right) \mathbf{P}^{1/2}. \qquad (3)$$

2.3 The Riemannian manifold of Correlation matrices

The affine-invariant structure of Sym⁺(𝑝) is not only intrinsically linked with Corr(𝑝) but also imposes symmetry on its structure as a quotient manifold.

2.3.1 Visualizing Corr(2) and Corr(3)

Let us begin by visualizing Corr(2) as a subset of Sym⁺(2):

$$\mathrm{Corr}(2) := \left\{ \begin{pmatrix} 1 & x \\ x & 1 \end{pmatrix} : x \in (-1, 1) \right\}.$$

We can then realize this space to be a manifold of dimension 1, parameterized by the map 𝜑 : (−1, 1) → Corr(2) given by

$$\varphi(x) = \begin{pmatrix} 1 & x \\ x & 1 \end{pmatrix}.$$

This is a smooth map into the symmetric matrices which restricts to Corr(2), and whose inverse is simply given by projection onto one of the off-diagonal entries. We can visualize any C ∈ Corr(2) by associating to it the ellipsoid vᵀC⁻¹v = 1. Because of the global parametrization 𝜑 : (−1, 1) → Corr(2), we can visualize the manifold Corr(2) as the interval (−1, 1), where at each point of the interval we attach the ellipsoid corresponding to the positive-definite form associated with the matrix. We see this in Fig. 2. Another visualization we will consider is to see the correlation matrices embedded inside the symmetric positive-definite matrices (Fig. 3).

Fig. 2 The manifold Corr(2).

Fig. 3 The manifold Corr(2) visualized as an embedded submanifold of Sym⁺(2). Points in Sym⁺(2) (black) are sampled independently of those in Corr(2) (red).

In the case of correlation matrices of dimension 3, the shape formed by the set is named the 3-dimensional elliptope, which can be represented as a linear matrix inequality and characterized by

$$\mathrm{Corr}(3) := \left\{ \begin{pmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{pmatrix} : \det\!\begin{pmatrix} 1 & x & y \\ x & 1 & z \\ y & z & 1 \end{pmatrix} = 1 + 2xyz - x^2 - y^2 - z^2 > 0 \right\}.$$

The boundary of the elliptope (Fig. 4) is the cubic surface defined by

$$1 + 2xyz - x^2 - y^2 - z^2 = 0.$$

Fig. 4 The boundary of the elliptope.
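Before turning to the quotient structure, the geometric operations above can be made concrete in a few lines of code. The sketch below is my own illustration (not code from the paper): it evaluates the affine-invariant distance and the maps of Eqs. (1)–(2) with NumPy, computing matrix functions through the eigendecomposition of their symmetric arguments.

```python
import numpy as np

def _sym_fun(A, fun):
    """Apply a scalar function to a symmetric matrix through its eigendecomposition."""
    w, V = np.linalg.eigh(A)
    return (V * fun(w)) @ V.T

def spd_dist(P1, P2):
    """Affine-invariant distance: || Log(P1^{-1/2} P2 P1^{-1/2}) ||_F."""
    P1_ih = _sym_fun(P1, lambda w: w ** -0.5)          # P1^{-1/2}
    return np.linalg.norm(_sym_fun(P1_ih @ P2 @ P1_ih, np.log), "fro")

def spd_exp(P, Y):
    """Exponential map exp_P(Y) of Eq. (1)."""
    Ph, Pih = _sym_fun(P, np.sqrt), _sym_fun(P, lambda w: w ** -0.5)
    return Ph @ _sym_fun(Pih @ Y @ Pih, np.exp) @ Ph

def spd_log(P, X):
    """Logarithm map log_P(X) of Eq. (2)."""
    Ph, Pih = _sym_fun(P, np.sqrt), _sym_fun(P, lambda w: w ** -0.5)
    return Ph @ _sym_fun(Pih @ X @ Pih, np.log) @ Ph

if __name__ == "__main__":
    P = np.array([[1.0, 0.6], [0.6, 1.0]])
    X = np.array([[2.0, 0.3], [0.3, 1.5]])
    Y = spd_log(P, X)
    print(spd_dist(P, X))                  # geodesic distance between P and X
    print(np.allclose(spd_exp(P, Y), X))   # exp_P(log_P(X)) recovers X -> True
```

The same helpers are reused, with minor variations, in the sketches of the following sections.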
2.3.2 Quotient Geometry

Consider an element 𝚺 ∈ Sym⁺(𝑝). The orbit of 𝚺, that is, the set of images of 𝚺 under the action of the group of diagonal matrices with positive entries Diag⁺(𝑝), Diag⁺(𝑝) × Sym⁺(𝑝) → Sym⁺(𝑝), given by (D, 𝚺) ↦ D𝚺D,

$$\mathrm{Diag}^+(p) \cdot \boldsymbol{\Sigma} = \left\{ \mathbf{D} \cdot \boldsymbol{\Sigma} : \mathbf{D} \in \mathrm{Diag}^+(p) \right\}, \qquad \mathbf{D} \cdot \boldsymbol{\Sigma} := \mathbf{D}\boldsymbol{\Sigma}\mathbf{D},$$

is a smooth manifold of dimension equal to dim Diag⁺(𝑝) = 𝑝. This can be seen explicitly by taking an element C ∈ Corr(2) ⊂ Sym⁺(2) and sampling the orbit space by applying DCD, where D ∈ Diag⁺(𝑝) is generated randomly (Fig. 5).

Fig. 5 A foliation of the cone Sym⁺(2). Each leaf is an embedded two-dimensional submanifold, obtained by translating a given correlation matrix C (red dot) by the action DCD.

Subsequently, the quotient manifold Sym⁺(𝑝)/Diag⁺(𝑝) is a smooth manifold on which one can take, as representative of the equivalence relation, an element of Corr(𝑝), with dim Corr(𝑝) = dim Sym⁺(𝑝) − dim Diag⁺(𝑝) (David 2019). Intuitively, this corresponds to a "retraction" along the leaves to the one-dimensional line Corr(2), in the case of Sym⁺(2).

The representative that we take on Sym⁺(𝑝)/Diag⁺(𝑝) corresponds to the element given by the projection

$$\pi : \mathrm{Sym}^+(p) \to \mathrm{Corr}(p), \qquad \boldsymbol{\Sigma} \mapsto \mathbf{D}_{\boldsymbol{\Sigma}}\, \boldsymbol{\Sigma}\, \mathbf{D}_{\boldsymbol{\Sigma}} = \mathbf{C}_{\boldsymbol{\Sigma}},$$

where D_𝚺 = (I_𝑝 ∘ 𝚺)^{−1/2} and ∘ is the Hadamard product. Since more than one element can be projected onto the same correlation matrix, we call the leaf 𝜋⁻¹(C_𝚺) projected onto the correlation matrix C_𝚺 the fiber of C_𝚺:

$$\pi^{-1}(\mathbf{C}_{\boldsymbol{\Sigma}}) = \left\{ \boldsymbol{\Sigma} \in \mathrm{Sym}^+(p) : \mathbf{D}_{\boldsymbol{\Sigma}}\, \boldsymbol{\Sigma}\, \mathbf{D}_{\boldsymbol{\Sigma}} = \mathbf{C}_{\boldsymbol{\Sigma}} \right\}.$$

2.3.3 Accounting for a distance in Corr(𝑝)

While the result that Corr(𝑝) exhibits this particular quotient manifold structure is meaningful in itself, this fact alone does not yield results that are suitable for algorithms and computation, as closed-form expressions are not available for computing distances on Corr(𝑝). Thus, one must rely on the Riemannian structure that Corr(𝑝) inherits from Sym⁺(𝑝) in order to obtain an algorithm that computes distances through an optimization procedure.

In order to come up with such an algorithm, we use a very helpful theorem, proved by Huckemann et al. (2010), showing that the geodesic connecting two points in the quotient can be expressed as the geodesic in the ambient manifold from the starting point to an optimal representative of the end point, lying on the fiber over the desired endpoint:

Theorem 1 (Huckemann et al. 2010). Let 𝑀 be a Riemannian manifold with an isometric action of a Lie group 𝐺. Then a geodesic 𝛾 in the quotient 𝑀/𝐺 with end points 𝑎, 𝑏 ∈ 𝑀/𝐺 can be obtained from the projection of a geodesic 𝛾̃ on 𝑀 (i.e. 𝛾 = 𝜋 ∘ 𝛾̃) such that

– 𝛾̃ has end points 𝑝, 𝑞 with 𝜋(𝑝) = 𝑎, 𝜋(𝑞) = 𝑏, and
– 𝑞 is the solution to the problem

$$\min\; d_M(p, c) \quad \text{such that} \quad c \in \pi^{-1}(b).$$

This last point can be rephrased, for fixed 𝑐 ∈ 𝜋⁻¹(𝑏), as

$$\min\; d_M(p, g \cdot c) \quad \text{such that} \quad g \in G.$$
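As a simple numerical illustration of this optimization over the group G = Diag⁺(𝑝) (a sketch of my own, using a generic optimizer rather than the dedicated gradient descent of David (2019) described in Sect. 3.1.2.2), one can parameterize D = diag(e^{u₁}, . . . , e^{u_p}) so that the search over the fiber becomes unconstrained:

```python
import numpy as np
from scipy.optimize import minimize

def _sym_fun(A, fun):
    w, V = np.linalg.eigh(A)
    return (V * fun(w)) @ V.T

def spd_dist(P1, P2):
    P1_ih = _sym_fun(P1, lambda w: w ** -0.5)
    return np.linalg.norm(_sym_fun(P1_ih @ P2 @ P1_ih, np.log), "fro")

def corr_dist(C1, C2):
    """Quotient distance d_Corr(C1, C2) = inf over D in Diag+(p) of d_Sym+(C1, D C2 D),
    with D = diag(exp(u)) so that the minimization over u is unconstrained."""
    p = C1.shape[0]

    def objective(u):
        D = np.diag(np.exp(u))
        return spd_dist(C1, D @ C2 @ D) ** 2

    res = minimize(objective, x0=np.zeros(p), method="Nelder-Mead")
    return np.sqrt(res.fun)

C1 = np.array([[1.0, 0.2], [0.2, 1.0]])
C2 = np.array([[1.0, 0.7], [0.7, 1.0]])
# the quotient distance can never exceed the ambient Sym+ distance (D = I is feasible)
print(corr_dist(C1, C2), "<=", spd_dist(C1, C2))
```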
Adapting Eq. 3 to the current scenario, let C₁, C₂ ∈ Corr(𝑝). Then the geodesic and the corresponding distance in Sym⁺(𝑝) connecting these two points are given by the following:

$$\gamma_{\mathrm{Sym}^+}(t) = \mathbf{C}_1^{1/2} \operatorname{Exp}\!\left(\operatorname{Log}\!\left(\mathbf{C}_1^{-1/2}\mathbf{C}_2\mathbf{C}_1^{-1/2}\right) t\right) \mathbf{C}_1^{1/2},$$
$$d_{\mathrm{Sym}^+}^2(\mathbf{C}_1, \mathbf{C}_2) = \left\| \operatorname{Log}\!\left(\mathbf{C}_1^{-1/2}\mathbf{C}_2\mathbf{C}_1^{-1/2}\right) \right\|^2.$$

In order to adapt this Riemannian structure to Corr(𝑝), we need to find the optimal representative of C₂ with respect to the starting point C₁. This is done by finding the unique element C̃₂ in the fiber 𝜋⁻¹(C₂) which minimizes the Sym⁺-distance between C₁ and C̃₂. This can be written as

$$d_{\mathrm{Corr}}^2(\mathbf{C}_1, \mathbf{C}_2) = \inf_{\mathbf{D} \in \mathrm{Diag}^+(p)} d_{\mathrm{Sym}^+}^2(\mathbf{C}_1, \mathbf{D}\mathbf{C}_2\mathbf{D}).$$

Using the equation above, we then aim to solve the following minimization problem:

$$\text{minimize } d_{\mathrm{Sym}^+}^2(\mathbf{C}_1, \mathbf{D}\mathbf{C}_2\mathbf{D}) \quad \text{subject to } \mathbf{D} \in \mathrm{Diag}^+(p). \qquad (4)$$

Assuming D* is a solution to the above problem, we define C̃₂ as the element in the fiber 𝜋⁻¹(C₂) which minimizes the Sym⁺-distance between C₁ and C̃₂,

$$\widetilde{\mathbf{C}}_2 = \mathbf{D}^* \mathbf{C}_2 \mathbf{D}^*.$$

The corresponding geodesic can be taken as the projection of the Sym⁺-geodesic connecting C₁ and C̃₂:

$$\gamma_{\mathrm{Corr}}(t) = \pi\!\left( \mathbf{C}_1^{1/2} \operatorname{Exp}\!\left(t \operatorname{Log}\!\left(\mathbf{C}_1^{-1/2}\widetilde{\mathbf{C}}_2\mathbf{C}_1^{-1/2}\right)\right) \mathbf{C}_1^{1/2} \right).$$

3 Applications

In this section we present two novel applications based on the theory introduced previously. The first one deals with the prediction of geological attributes at unknown locations when complex non-linear multivariate features are shown by the data. The second one is related to the clustering of data.

3.1 Extending The Linear Model of Coregionalization

As we mentioned earlier, the main contribution of this work is to see any geological process globally as a mixture of multivariate RVs on a given spatial domain 𝐷, acting locally with different properties that change smoothly throughout the positions u ∈ 𝐷. The correlation among attributes is the property that we consider as a function of u; since we consider standard Gaussian RVs, this yields the reproduction of the complex non-linear features among the variables. Therefore, this idea is a simple, linear, and geologically meaningful approach to estimation and uncertainty quantification at unknown locations when the mentioned characteristics are exhibited by the data.

3.1.1 The Model

The following model relies on the assumption that "simple" non-linear multivariate features can be reconstructed in a straightforward way by mapping the original 𝑝-variate cumulative distribution function to a 𝑝-variate Gaussian distribution equipped with a proper prior covariance matrix. This procedure is also known as the Nataf transformation (Nataf 1962) or NORTA (NORmal To All), and several properties of the transformation have been studied in different contexts, for instance in Cario and Nelson (1997), Ayadi et al. (2019), Xie et al. (2015), Xiao (2014), Li and Hammond (1975) and, in the geostatistical context, in Bourgault (2014). We start with a brief motivation, followed by the relevant theoretical aspects of the transformation.

Let Z̃ = [𝑍̃₁(u), . . . , 𝑍̃_𝑝(u)]ᵀ be the vector-valued random function (RF) considering 𝑝 simultaneous RFs 𝑍̃ᵢ = {𝑍̃ᵢ(u) : u ∈ 𝐷 ⊆ ℝⁿ, 𝑛 ≥ 1}, indexed by 𝑖 ranging in the set 𝐼 = {1, . . . , 𝑝} and defined on a fixed continuous domain of interest 𝐷 of the Euclidean space ℝⁿ. Let the sampling data be given by the multivariate vectors z̃_𝛼 = [𝑧̃₁(u_𝛼), . . . , 𝑧̃_𝑝(u_𝛼)]ᵀ, 𝛼 ∈ {1, . . . , 𝑘}. We face, as a main problem, that blindly transforming the values 𝑧̃ᵢ of the different RVs 𝑍̃ᵢ, each variable into univariate Gaussian values 𝑧ᵢ through

$$z_i = G^{-1}\!\left(F(\tilde{z}_i)\right),$$

does not translate into independent Gaussian variables 𝑍ᵢ. This is shown in the cross plots of Fig. 6, which give a comparative illustration of the original data and the data normally transformed in a univariate way: the data, previously correlated in raw values, remain correlated after the transformation.

Therefore, when modeling two or more variables by using a non-correlated multi-Gaussian pdf, the path is
prone to give bad results when estimating or simulating values and back-transforming into raw values, since the separate transformation entails an incorrect map of the multivariate probability densities. However, this problem is quickly fixed when a correlated Gaussian distribution is considered instead. This simple method works as a multivariate transformation, by coupling the univariate transformations 𝜙ᵢ.

Fig. 6 Correlation of variables before and after applying a normal score transformation. The variables are still correlated.

We define the non-coupled transformation of the initial multivariate RF Z̃ into a stationary 𝑝-variate Gaussian RF with zero mean vector 𝝁 = (0, . . . , 0)ᵀ = 0 and covariance matrix equal to the identity matrix I_𝑝, i.e., Z = [𝑍₁(u), . . . , 𝑍_𝑝(u)]ᵀ ∼ N(0, I_𝑝) (with (I_𝑝)ᵢᵢ = 1 and (I_𝑝)ᵢⱼ = 0 for 𝑖 ≠ 𝑗, 𝑖, 𝑗 ∈ 𝐼), by using the anamorphosis function 𝜙ᵢ⁻¹ on each of the components, so that

$$\widetilde{\mathbf{Z}} = [\widetilde{Z}_1(\mathbf{u}), \ldots, \widetilde{Z}_p(\mathbf{u})]^T = \left(\phi_1[Z_1(\mathbf{u})], \ldots, \phi_p[Z_p(\mathbf{u})]\right)^T = \Phi_{I_p}(\mathbf{Z}).$$

The coupled prior distribution of Z is still a 𝑝-variate Gaussian distribution N(0, 𝚺̂), with zero mean vector and correlation matrix given by

$$\widehat{\boldsymbol{\Sigma}} = \begin{pmatrix} 1 & \hat{\rho}_{12} & \cdots & \hat{\rho}_{1p} \\ \hat{\rho}_{21} & 1 & \cdots & \hat{\rho}_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\rho}_{p1} & \hat{\rho}_{p2} & \cdots & 1 \end{pmatrix}.$$

Then, the random variables 𝑍ᵢ are correlated and their pairwise relationships are quantified by the correlation coefficients 𝜌̂_{𝑍ᵢ𝑍ⱼ} (or simply 𝜌̂ᵢⱼ), 𝑖, 𝑗 ∈ 𝐼, which have to be inferred; we proceed to do this in the next section. The 𝑝-variate ccdf over the original variables is then retrieved simply as

$$F_{\widetilde{Z}_1(\mathbf{u}), \ldots, \widetilde{Z}_p(\mathbf{u})}(\tilde{z}_1, \ldots, \tilde{z}_p) = G_{\mathbf{0}, \widehat{\boldsymbol{\Sigma}}}\!\left( \phi_1^{-1}(\tilde{z}_1), \ldots, \phi_p^{-1}(\tilde{z}_p) \right). \qquad (5)$$

We will say that Z follows a coupled anamorphosis function, i.e., Z ∼ 𝚽(0, 𝚺̂). The transformation (or coupling process) is conceptually illustrated, for the bivariate case, in Figure 7.

It is important to mention that this transformation is well-defined, in the sense that the order of the variables does not play a role, and a permutation of them just translates into a permutation of the correlation coefficients in 𝚺̂. However, this procedure entails the severe hypothesis that the multivariate behavior of geological attributes can be modeled by assuming a correlated Gaussian distribution, which may be a lousy model globally. Instead, we take this hypothesis for granted locally in the geological domain.

Given the different RVs that describe ore deposits, Z̃ = [𝑍̃₁(u), . . . , 𝑍̃_𝑝(u)]ᵀ, we proceed to transform the variables into Gaussian RVs jointly, according to Eq. 5, in order to get the vector Z = [𝑍₁(u_𝛼), . . . , 𝑍_𝑝(u_𝛼)]ᵀ.
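A minimal sketch of this local transformation follows; it is my own illustration, with a rank-based normal-score transform standing in for the anamorphosis 𝜙ᵢ⁻¹ and the local correlation matrix estimated from the resulting Gaussian scores.

```python
import numpy as np
from scipy.stats import norm, rankdata

def normal_scores(x):
    """Rank-based normal-score (Gaussian anamorphosis) transform of one variable."""
    n = len(x)
    return norm.ppf(rankdata(x) / (n + 1.0))

def local_gaussianization(raw_neigh):
    """raw_neigh: array of shape (n_samples, p) with the raw values in a neighborhood
    V(u_alpha). Returns the Gaussian scores and the local correlation matrix Sigma(u_alpha)."""
    scores = np.column_stack([normal_scores(raw_neigh[:, j])
                              for j in range(raw_neigh.shape[1])])
    return scores, np.corrcoef(scores, rowvar=False)

# toy neighborhood: two positively related, non-linear (lognormal-like) variables
rng = np.random.default_rng(0)
a = rng.lognormal(size=800)
b = np.exp(0.8 * np.log(a) + 0.5 * rng.normal(size=800))
scores, Sigma = local_gaussianization(np.column_stack([a, b]))
print(Sigma)   # estimated local correlation of the coupled Gaussian scores
```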
Once we have a device to perform this transformation, we have to decide between two possible modeling options. On the one hand, to perform the Gaussianization in a "global" fashion, that is, gathering all the z̃_𝛼 data and performing only one transformation by running Eq. 5 once. On the other hand, to perform the Gaussianization "locally", which means collecting chunks of data in a vicinity 𝑉 of the location under study, z̃_𝛼, 𝛼 ∈ 𝑉 ⊂ {1, . . . , 𝑘}, in a moving-neighborhood fashion, noticing that 𝑉 may be a function of the location under study, 𝑉(u). This is a non-trivial choice to make. We take the second path in our model, since there is no loss of generality and it contains the case in which Z̃ is stationary, as performing the Gaussianization locally should not be theoretically biased. As the next step in the methodology is the inference of a local correlation matrix 𝚺(u_𝛼) at the sampling locations, taking the first path of performing a global transformation and later taking chunks of data would give, as a result, the inference of a 𝚺(u_𝛼) matrix based on data with a locally non-zero mean. Finally, the second path does not contradict the traditional methodology for uncertainty modeling, which consists in partitioning the data into stationary clusters and continuing the work individually on each of the clusters separately.

Fig. 7 Conceptual bi-variate picture of the adjustment in the correlated behavior of an independent raw distribution through the Gaussian coupled anamorphosis. Panels: uncoupled raw distribution 𝚽(0, I), coupled raw distribution 𝚽(0, 𝚺̂), uncoupled Gaussian distribution N(0, I) and coupled Gaussian distribution N(0, 𝚺̂). The surfaces show the corresponding cdfs, and the contour plots the pdfs.

Now we proceed to deal with the problem of interpolating the different known correlation matrices in space. The LMC is brought into play, assuming that each variable consists of a sum of 𝑝 independent factors:

$$Z_i(\mathbf{u}) = \sum_{j=1}^{p} a_{ij}(\mathbf{u})\, Y_j(\mathbf{u}), \qquad (6)$$

with the number of factors equal to the number of attributes in order to avoid ill-definition as a linear system (the problem of working in the stationary set-up with a number of factors different from the number of attributes has been recently tackled by Pinto et al. (2021), and including their methodology into the presented one is a topic of further research). We thus obtain a model of the form Z = AY ∼ N(0, 𝚺(u)).

A second difficulty is to find an appropriate decomposition 𝚺(u) = A(u)Aᵀ(u), in order to proceed later with the decoupling of Z and be able to work with independent variables Y(u) = A(u)⁻¹Z(u) ∼ N(0, I_𝑝). One could suggest the use of the eigen-decomposition 𝚺 = UDUᵀ in order to take Z = UD^{1/2}Y, but used in an automated way it may result in a model with spatial discontinuities, owing to the non-uniqueness of this decomposition. 𝚺 can, however, be uniquely decomposed as the product of a positive-diagonal lower triangular matrix and its transpose by the Cholesky decomposition, 𝚺 = LLᵀ, which is a suitable choice for our purposes.

Once a continuous decomposition of 𝚺(u) and the independent variables are obtained, the overall process of estimation and simulation becomes straightforward, by working individually on the spatial behavior of each of the variables separately.

One last difficulty to overcome comes from the fact that, once the Cholesky decomposition is applied, one notices that the following transformation also works: Z = LRY, with R a rotation matrix, as any decomposition of the form 𝚺 = LRRᵀLᵀ is valid. This is problematic since the model acquires an extra free parameter, which is a source of ill-definition for our model if different spatial models are involved in the 𝑌ᵢ(u) variables (Fig. 8). Whether there is a way of finding a suitable R and fixing this parameter is a topic of further research. In order to simplify these issues and the overall methodology at this point, we take 𝑌ᵢ(u) to follow the same variogram model for all 𝑖 ∈ 𝐼.

Fig. 8 Effect of the rotation and mixing of two independent Gaussian RFs Y = [𝑌₁(u), 𝑌₂(u)]ᵀ ∼ N(0, I₂) with different spatial continuity (top) to obtain Z = LRY for two different choices of R (middle and bottom), taking L as the Cholesky factor of the correlation map with off-diagonal entry 𝜌₁₂(u) shown on top.
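The decoupling and recoupling just described amount to a triangular solve and a matrix product; a minimal sketch (my own illustration, not the author's code):

```python
import numpy as np

def decorrelate(Z, Sigma):
    """Y = L^{-1} Z with Sigma = L L^T (Cholesky); Y ~ N(0, I_p) when Z ~ N(0, Sigma)."""
    return np.linalg.solve(np.linalg.cholesky(Sigma), Z)

def recorrelate(Y, Sigma):
    """Z = L Y: re-impose the local correlation Sigma(u) on independent factors Y."""
    return np.linalg.cholesky(Sigma) @ Y

Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])
Z = recorrelate(np.random.default_rng(1).normal(size=(2, 5000)), Sigma)
Y = decorrelate(Z, Sigma)
print(np.corrcoef(Y))                          # close to the identity matrix
print(np.allclose(recorrelate(Y, Sigma), Z))   # True: the operation is invertible
```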
3.1.2 Interpolation of the Correlation Matrices

We present a fixed-point and a gradient descent algorithm which seek to minimize the mean-squared distance to Sym⁺(𝑝)- and Corr(𝑝)-valued observations, respectively, with respect to the affine-invariant distance. The general optimization procedure for Corr(𝑝) is proposed by David (2019) and takes the following steps:

1. At the current iterate C_𝑡 ∈ Corr(𝑝), find all appropriate distances utilizing the fiber structure of Sym⁺/Diag⁺.
2. Interpret C_𝑡 ∈ Sym⁺(𝑝) and perform the update to a point P_{𝑡+1} ∈ Sym⁺(𝑝).
3. Obtain the next iterate of the algorithm by projecting back to Corr(𝑝), that is, C_{𝑡+1} = 𝜋(P_{𝑡+1}).

We begin by summarizing the optimization methods on Sym⁺(𝑝) and Corr(𝑝).

3.1.2.1 Optimizing on Sym⁺(𝑝)

Given the observations P₁, . . . , P_𝑘 ∈ Sym⁺(𝑝), one could consider the arithmetic mean of the 𝑘 labeled covariance matrices {Pᵢ}ᵢ₌₁ᵏ,

$$\widehat{\boldsymbol{\Sigma}} = \frac{1}{k} \sum_{i=1}^{k} \mathbf{P}_i,$$

which does not account for any intrinsic geometric property of Sym⁺(𝑝). We consider, instead, the use of the geometric or Fréchet mean, introduced in the Sym⁺(𝑝) context by Moakher (2005). Such a matrix is defined as follows:

$$\widehat{\boldsymbol{\Sigma}} = \arg\inf_{\boldsymbol{\Sigma}} \sum_{i=1}^{k} d_{\mathrm{Sym}^+}^2(\mathbf{P}_i, \boldsymbol{\Sigma}). \qquad (7)$$

Recall that the Riemannian distance between two SPD matrices is defined as

$$d_{\mathrm{Sym}^+}^2(\mathbf{P}_i, \boldsymbol{\Sigma}) = \left\| \log_{\mathbf{P}_i}(\boldsymbol{\Sigma}) \right\|^2 = \left\| \operatorname{Log}\!\left(\mathbf{P}_i^{-1/2}\boldsymbol{\Sigma}\mathbf{P}_i^{-1/2}\right) \right\|^2 = \operatorname{tr}\!\left[ \operatorname{Log}^2\!\left(\mathbf{P}_i^{-1/2}\boldsymbol{\Sigma}\mathbf{P}_i^{-1/2}\right) \right],$$

and, therefore, minimizing Eq. 7 seems impossible to solve in closed form, according to Moakher (2006). The same author describes a fixed-point algorithm to numerically compute the geometric mean of a set of symmetric positive-definite matrices. Other methods, such as Newton's method on Riemannian manifolds (David 2019), could also be used for the numerical computation of the geometric mean. However, the fixed-point algorithm described below is simple to implement, does not require sophisticated machinery, and converges rapidly.

The geometric mean 𝚺̂ can be computed efficiently by an iterative procedure consisting in: projecting the covariance matrices into the tangent space, estimating the arithmetic mean in the tangent space, and projecting the arithmetic mean back onto the manifold. These three steps are iterated until convergence.

If we want to account for the spatial configuration of the data, we need to consider the use of the weighted Fréchet mean,

$$\widehat{\boldsymbol{\Sigma}} = \arg\inf_{\boldsymbol{\Sigma}} \sum_{i=1}^{k} \lambda_i\, d_{\mathrm{Sym}^+}^2(\mathbf{P}_i, \boldsymbol{\Sigma}), \qquad \sum_{i=1}^{k} \lambda_i = 1,$$

with 𝜆ᵢ the weights obtained from the kriging interpolation. The algorithm in this case is given by slightly modifying the one taken from Moakher (2006):
Algorithm 1 Weighted mean of 𝑘 SPD matrices
Require: a set of 𝑘 SPD matrices P₁, . . . , P_𝑘 ∈ Sym⁺(𝑝), weights 𝜆₁, . . . , 𝜆_𝑘, and 𝜖 > 0.
1: Initialize 𝚺̂⁽¹⁾ = Σᵢ₌₁ᵏ 𝜆ᵢ Pᵢ
2: repeat
3:   P̄ = Σᵢ₌₁ᵏ 𝜆ᵢ log_{𝚺̂⁽ᵗ⁾}(Pᵢ)   ⊳ Weighted mean in the tangent space
4:   𝚺̂⁽ᵗ⁺¹⁾ = exp_{𝚺̂⁽ᵗ⁾}(P̄)
5: until ‖P̄‖ < 𝜖
6: return 𝚺̂⁽ᵗ⁺¹⁾

3.1.2.2 Optimizing Along Fibers

In the same fashion as previously, given the observations C₁, . . . , C_𝑘 ∈ Corr(𝑝), we are interested in finding

$$\widehat{\mathbf{C}} = \arg\inf_{\mathbf{C}} \sum_{i=1}^{k} \lambda_i\, d_{\mathrm{Corr}}^2(\mathbf{C}_i, \mathbf{C}), \qquad \sum_{i=1}^{k} \lambda_i = 1. \qquad (8)$$

Recall again that the distance between Cᵢ, C ∈ Corr(𝑝) is given by

$$d_{\mathrm{Corr}}^2(\mathbf{C}_i, \mathbf{C}) = \inf_{\mathbf{D} \in \mathrm{Diag}^+(p)} d_{\mathrm{Sym}^+}^2(\mathbf{C}_i, \mathbf{D}\mathbf{C}\mathbf{D}) = \inf_{\mathbf{D} \in \mathrm{Diag}^+(p)} \operatorname{tr}\!\left[ \operatorname{Log}^2\!\left( \mathbf{C}_i^{-1/2}\, \mathbf{D}\mathbf{C}\mathbf{D}\, \mathbf{C}_i^{-1/2} \right) \right],$$

where we note that one can fix C and then optimize over the fiber of Cᵢ as well, by symmetry. The minimization of the distance between an iterate C_𝑡 of the algorithm and all of the observations C₁, . . . , C_𝑘 is preferred. Hence, the algorithm is arranged to always keep the iterate fixed and then optimize along the fibers of the given observations. In this way, it is guaranteed that the iterated point is updated consistently; otherwise, one would end up with different optimal points, not yielding a consistent base point.

In finding the optimal point, a gradient descent method is employed on the set Diag⁺(𝑝) with respect to the objective function

$$g_i(\mathbf{D}) = d_{\mathrm{Sym}^+}^2(\mathbf{C}, \mathbf{D}\mathbf{C}_i\mathbf{D}).$$

The gradient descent algorithm to find the optimal D in the above expression is proposed by David (2019). The algorithm's derivation is long and tedious, and we refer to the mentioned author for further details. One ends up, however, with a brief two-step iterative algorithm: using a step size 𝛿 > 0 and initializing D₀ = I_𝑝, one follows the iterative steps

$$\boldsymbol{\Delta}_t = \mathbf{I} \circ 2\,\mathrm{Sym}\!\left[ \mathbf{D}_t \operatorname{Log}\!\left( \mathbf{C}_i \mathbf{D}_t \mathbf{C}^{-1} \mathbf{D}_t \right) \right], \qquad \mathbf{D}_{t+1} = \mathbf{D}_t \operatorname{Exp}\!\left( -\delta\, \mathbf{D}_t^{-1} \boldsymbol{\Delta}_t \right),$$

with Sym(A) = ½(A + Aᵀ), until a desired stopping criterion is reached. Once we find an optimal element D* ∈ Diag⁺(𝑝) as a result of minimizing 𝑔ᵢ(D), we define C̃ᵢ = D*CᵢD* as the element over the fiber 𝜋⁻¹(Cᵢ) which minimizes the Sym⁺-distance to C.

We summarize the proposed algorithm which finds the Fréchet mean on Corr(𝑝):

Algorithm 2 Weighted mean of 𝑘 Correlation matrices
Require: a set of 𝑘 correlation matrices C₁, . . . , C_𝑘 ∈ Corr(𝑝), 𝜖 > 0, initial point C₀ = Σᵢ₌₁ᵏ 𝜆ᵢ Cᵢ, step size 𝛿 > 0.
1: 𝑡 = 0
2: while stopping criterion not met do
3:   for 𝑖 = 1, . . . , 𝑘 do
4:     Require: initial point D₀; set 𝑛 = 1
5:     while stopping criterion not met do
6:       𝚫ₙ = I ∘ 2 Sym[Dₙ Log(Cᵢ Dₙ C_𝑡⁻¹ Dₙ)]
7:       Dₙ₊₁ = Dₙ Exp(−𝛿 Dₙ⁻¹ 𝚫ₙ)
8:       𝑛 = 𝑛 + 1
9:     end while
10:    C̃ᵢ = D_{𝑛_max} Cᵢ D_{𝑛_max}
11:  end for
12:  P̄_𝑡 = Σᵢ₌₁ᵏ 𝜆ᵢ log_{C_𝑡}(C̃ᵢ)   ⊳ Mean in the tangent space of Sym⁺(𝑝)
13:  𝚺_{𝑡+1} = exp_{C_𝑡}(P̄_𝑡)
14:  C_{𝑡+1} = 𝜋(𝚺_{𝑡+1}) = (I_𝑝 ∘ 𝚺_{𝑡+1})^{−1/2} 𝚺_{𝑡+1} (I_𝑝 ∘ 𝚺_{𝑡+1})^{−1/2}   ⊳ Project back to Corr(𝑝)
15:  𝑡 = 𝑡 + 1
16: end while
17: return C_𝑡

3.1.3 Methodology

Now that we have gone through the steps for interpolating the correlation matrices, we summarize the proposed methodology for extending the LMC, consisting of the following steps, where the first and the last step are optional and suggested when the data is compositional:

1) (Perform the log-ratio transformation on the data, if constraint conditions are present).
2) At each location u_𝛼 with an observation, find the nearest 𝑙 samples.
3) Perform the Gaussian transformation individually for each variable, 𝜙ᵢ⁻¹[𝑍̃ᵢ(u_𝛼)] = 𝑍ᵢ(u_𝛼), locally using the nearest 𝑙 samples.
4) Compute the correlation matrix 𝚺(u_𝛼) of the vector Z = [𝑍₁(u_𝛼), . . . , 𝑍_𝑝(u_𝛼)].
   a) Take the Cholesky decomposition 𝚺(u_𝛼) = LLᵀ and apply L⁻¹Z = Y to decorrelate the Gaussian variables.
   b) Interpolate 𝚺(u_𝛼) over the domain 𝐷 using the weighted Fréchet mean and your favorite set of weights 𝜆ᵢ; kriging weights given by the variogram modeling of Y work appropriately (a sketch of this weighted mean is given after this list).
5) Perform variogram modeling and simulation of 𝑌ᵢ, assuming the same model for all 𝑖 ∈ 𝐼.
6) At each unsampled location u, take the estimated correlation matrix 𝚺̂(u), perform its Cholesky decomposition, and recover Z(u) = L̂(u)Y(u).
7) At the unsampled location u, find the nearest 𝑙 samples, perform the Gaussian transformation individually for each variable, 𝜙̂ᵢ⁻¹[𝑍̃ᵢ(u)] = 𝑍ᵢ(u), and recover the value 𝑧̃ᵢ(u) = 𝜙̂ᵢ[𝑧ᵢ(u)].
8) (Perform the log-ratio back-transformation on the data).
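For the interpolation in step 4b, the sketch below mirrors Algorithm 1: a weighted Fréchet mean of SPD matrices obtained by iterating a weighted average in the tangent space. The helpers follow Eqs. (1)–(2); the tolerance is an arbitrary choice of mine, the weights are assumed positive and summing to one, and the final re-projection onto Corr(𝑝) corresponds to step 14 of Algorithm 2.

```python
import numpy as np

def _sym_fun(A, fun):
    w, V = np.linalg.eigh(A)
    return (V * fun(w)) @ V.T

def spd_exp(P, Y):
    Ph, Pih = _sym_fun(P, np.sqrt), _sym_fun(P, lambda w: w ** -0.5)
    return Ph @ _sym_fun(Pih @ Y @ Pih, np.exp) @ Ph

def spd_log(P, X):
    Ph, Pih = _sym_fun(P, np.sqrt), _sym_fun(P, lambda w: w ** -0.5)
    return Ph @ _sym_fun(Pih @ X @ Pih, np.log) @ Ph

def weighted_frechet_mean(mats, weights, tol=1e-10, max_iter=100):
    """Algorithm 1: start from the weighted Euclidean mean, then repeatedly average
    the observations in the tangent space at the current estimate and map back."""
    mu = sum(w * P for w, P in zip(weights, mats))
    for _ in range(max_iter):
        T = sum(w * spd_log(mu, P) for w, P in zip(weights, mats))
        mu = spd_exp(mu, T)
        if np.linalg.norm(T, "fro") < tol:
            break
    return mu

C1 = np.array([[1.0, 0.9], [0.9, 1.0]])
C2 = np.array([[1.0, -0.3], [-0.3, 1.0]])
mu = weighted_frechet_mean([C1, C2], [0.5, 0.5])   # geodesic midpoint in Sym+(2)
D = np.diag(1.0 / np.sqrt(np.diag(mu)))
print(D @ mu @ D)   # projection pi back onto Corr(2), as in step 14 of Algorithm 2
```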
3.2 Geological Domaining
Now we move to a second novel application of the concepts presented in the previous sections. We consider the classical problem in geostatistics of clustering data which carry continuous information in space, 𝑧(u) (such as grades), where u is the location vector in three-dimensional space (u ∈ ℝ³). However, 𝑧 has only been sampled at a discrete set of points {u_𝛼, 𝛼 ∈ 𝑁 = 1, . . . , 𝑛}. From these measurements, we have some intuition that there is an unknown finite collection 𝐴₁, 𝐴₂, . . . , 𝐴_𝑘 of disjoint subsets of 𝑁, with ∪ᵢ₌₁ᵏ 𝐴ᵢ = 𝑁, such that the measurements 𝑧(u_𝛼), 𝛼 ∈ 𝐴_𝑙, and 𝑧(u_𝛽), 𝛽 ∈ 𝐴_𝑚, for all 𝑙 ≠ 𝑚, have a low relationship (or none at all) between them (for example, because they have a different genesis) and, therefore, should be clustered into different categories (typical examples are lithofacies types). We want to find the collection 𝐴₁, 𝐴₂, . . . , 𝐴_𝑘.
Methodologies able to deal with this problem have several significant applications. One of the most important is the definition of stationary spatial domains, where the assumption of a fairly constant mean within a given spatial domain is critical for some aspects of resource estimation; no resource estimation is done without the definition of stationary units. This process is known, in geoscientific terms, as the definition of geological units. Most of the time, geological domaining is done based on non-continuous attributes (lithology and alteration of the rock) which are related to, or somehow explain, the values of the continuous data. Sometimes, however, the categorical information is not enough to define the units by itself.

The methodology presented next implements the 𝐾-means algorithm on Corr(𝑝) and is an alternative way to include the spatial information carried by continuous data; it should help the geo-modeler to decide boundaries for geological units in cases of fuzzy or contradictory categorical data.
3.2.1 Methodology
Following an idea proposed in You and Park (2021) for the SPD case, we implemented a 𝐾-means algorithm modified to our context. The 𝐾-means algorithm (MacQueen et al. 1967) is one of the most famous clustering algorithms for data analysis. As pointed out in Goh and Vidal (2008), the method is easily extensible to non-Euclidean data, as it solely depends on the distance measure used to determine class memberships. The algorithm proceeds as follows:
1) Randomly choose 𝐾 correlation matrices as cluster means, 𝝁₁⁽¹⁾, . . . , 𝝁_𝐾⁽¹⁾, where the upper index refers to the iteration number.
2) Repeat the following steps until convergence:
   a) Assign each observation to the cluster whose center is at the smallest distance,

$$S_i^{(t)} = \left\{ \boldsymbol{\Sigma}_\alpha : d\big(\boldsymbol{\Sigma}_\alpha, \boldsymbol{\mu}_i^{(t)}\big) \le d\big(\boldsymbol{\Sigma}_\alpha, \boldsymbol{\mu}_j^{(t)}\big) \text{ for all } 1 \le j \le K \right\},$$

   and when it comes to a situation where an observation could belong to more than one cluster, assign the cluster randomly.
   b) Update the cluster centroids by the Fréchet means of each class,

$$\boldsymbol{\mu}_i^{(t+1)} = \arg\min_{\mathbf{P} \in \mathrm{Corr}(p)} \sum_{\alpha \in S_i^{(t)}} d^2\big(\mathbf{P}, \boldsymbol{\Sigma}_\alpha\big), \qquad i = 1, \ldots, K.$$

We have tested the algorithm (a compact sketch is given below) on Corr(3), that is, on the elliptope (Fig. 9), and later on a set of correlation matrices interpolated in space (Fig. 10), in both cases with good results in terms of the continuity of the clusters.
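The sketch below is a compact version of this loop (my own simplification: the ambient affine-invariant distance on Sym⁺(𝑝) is used in the assignment step instead of the full quotient distance of Sect. 2.3.3, and the centroid update is a Fréchet-mean iteration followed by the projection 𝜋 onto Corr(𝑝)); swapping in the quotient distance recovers the full method.

```python
import numpy as np

def _sym_fun(A, fun):
    w, V = np.linalg.eigh(A)
    return (V * fun(w)) @ V.T

def spd_dist(P1, P2):
    P1_ih = _sym_fun(P1, lambda w: w ** -0.5)
    return np.linalg.norm(_sym_fun(P1_ih @ P2 @ P1_ih, np.log), "fro")

def spd_exp(P, Y):
    Ph, Pih = _sym_fun(P, np.sqrt), _sym_fun(P, lambda w: w ** -0.5)
    return Ph @ _sym_fun(Pih @ Y @ Pih, np.exp) @ Ph

def spd_log(P, X):
    Ph, Pih = _sym_fun(P, np.sqrt), _sym_fun(P, lambda w: w ** -0.5)
    return Ph @ _sym_fun(Pih @ X @ Pih, np.log) @ Ph

def proj_corr(S):
    D = np.diag(1.0 / np.sqrt(np.diag(S)))
    return D @ S @ D

def frechet_mean_corr(mats, n_iter=30):
    """Fréchet-mean iteration kept on Corr(p) via the projection pi at each step."""
    mu = proj_corr(sum(mats) / len(mats))
    for _ in range(n_iter):
        mu = proj_corr(spd_exp(mu, sum(spd_log(mu, P) for P in mats) / len(mats)))
    return mu

def kmeans_corr(corrs, K, n_iter=20, seed=0):
    rng = np.random.default_rng(seed)
    centers = [corrs[i] for i in rng.choice(len(corrs), K, replace=False)]
    for _ in range(n_iter):
        labels = np.array([np.argmin([spd_dist(C, m) for m in centers]) for C in corrs])
        centers = [frechet_mean_corr([corrs[i] for i in np.where(labels == k)[0]])
                   if np.any(labels == k) else centers[k] for k in range(K)]
    return labels, centers

# toy run: three groups of 2x2 correlation matrices clustered around rho = -0.6, 0.0, 0.7
rng = np.random.default_rng(1)
rhos = np.clip(np.concatenate([r + 0.05 * rng.normal(size=20) for r in (-0.6, 0.0, 0.7)]), -0.95, 0.95)
corrs = [np.array([[1.0, r], [r, 1.0]]) for r in rhos]
labels, _ = kmeans_corr(corrs, K=3)
```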
Fig. 9 Clustering of random samples on the Corr(3) manifold using the 𝐾-means algorithm with 𝐾 = 4. Iterations 1, 2 and 5 are displayed from the top, respectively.
4 Case Study
4.1 The Data
Fig. 10 Clustering on the Corr(3) manifold using the 𝐾-means algorithm with 𝐾 = 4, on a distribution of interpolated ellipses in space. Iterations 1, 2 and 11 are displayed from the top, respectively.

In order to show the capabilities of the techniques described previously, a data set obtained from a blast-hole campaign pertaining to a nickel-laterite deposit is considered, with six cross-correlated variables isotopically assayed at each sample point: Fe, Ni, MgO, SiO2, Al2O3, and Cr. Isotopic sampling ensures that all the variables are available at all sample locations (Wackernagel 2013). The data set includes 9990 samples with a very dense sampling pattern. The name and location of the data set cannot be disclosed for confidentiality reasons.

The primary inspection of the multivariate relations (scatter plots shown in Fig. 11) exposed many aspects of complexity, such as non-linearity and heteroscedasticity. A map of the samples for each variable is presented in Fig. 12. In order to assess the predictability of the proposed methodology, 500 random samples are selected and set aside for testing purposes.
Fig. 11 Display of multivariate features of the sampling data of the nickel-laterite deposit. Four out of the six variables can be seen in the scatter plots, by adding a color and a variable bullet diameter proportional to the amount of alumina (top). The geological codes provided are displayed as well (bottom).
4.2 Variography
As described in the proposed methodology, we begin by applying the additive log-ratio transformation to the data, taken with respect to the Rest variable (Rest = 100% − Ni% − · · · − Cr%), as this extra variable may give us further information for the prediction (a small sketch of this transform is given at the end of this subsection). The Gaussian transformation is applied at each sample location by selecting a neighborhood of the closest 800 samples. This parameter was calibrated several times, showing that working with less data drastically reduces the ability to reproduce the multivariate behavior shown by the data, as the correlation matrix gets distorted with a low amount of data.

Fig. 12 Isometric view showing the sampling grade information.

Once the data is Gaussianized and de-correlated after obtaining the correlation matrix, the experimental direct and cross omni-directional variograms are calculated. Variogram analysis in different directions was not considered, as the amount of data in the vertical direction is much smaller than in the horizontal ones. This last aspect, however, is included later in the search radius for estimation.
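The additive log-ratio step mentioned above is standard practice for compositional data; a small sketch of my own, with the Rest variable used as the common denominator and a toy composition in weight percent:

```python
import numpy as np

def alr(comp):
    """Additive log-ratio of a composition whose rows sum to 100 %;
    the last column is the 'Rest' variable used as the common denominator."""
    return np.log(comp[:, :-1] / comp[:, -1:])

def alr_back(y):
    """Back-transform alr coordinates to a composition closed to 100 %."""
    expanded = np.hstack([np.exp(y), np.ones((y.shape[0], 1))])
    return 100.0 * expanded / expanded.sum(axis=1, keepdims=True)

# toy grades (%): Fe, Ni, MgO, SiO2, Al2O3, Cr, and Rest = 100 % minus their sum
grades = np.array([[38.0, 1.2, 12.0, 30.0, 4.0, 1.5]])
comp = np.hstack([grades, 100.0 - grades.sum(axis=1, keepdims=True)])
print(np.allclose(alr_back(alr(comp)), comp))   # True: the transform is invertible
```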
Variogram analysis and calibration is the weakest point of the methodology. A first complication is that the de-correlation breaks the marginal Gaussianity of the factors Y = L⁻¹Z, suggesting that the assumption of multi-Gaussianity of Z is not a perfect model at every location. As a consequence, the experimental variance of the factors Y does not attain the value of 1, although it gets close for a couple of factors. This can be seen in the sills of the experimental variograms in Fig. 13. Nevertheless, a one-structure exponential variogram with a range of 10 m is fixed as the final model, fitting relatively well most of the direct variograms. Cross variograms show low correlation among the variables, as expected; however, the sill does not attain 1 in some of the cases, in the same fashion as previously described. This last effect was neither considered nor included.
4.3 Results
Once the single variogram model is derived, one can simply work individually on each of the factors. A grid with a mesh dimension of 2×2×2 (in meters) and a mesh size of 75, 90 and 25 along the east, north, and elevation coordinates is considered. We proceed to generate 1000 geostatistical simulations using the turning bands algorithm (Chilès and Delfiner 2012), picking 1200 directions. A moving neighborhood is selected, with a search radius of 100 m, up to 25 data, and without considering octants. This number is chosen arbitrarily, as the scope of this study is mainly focused on the examination of uncertainty, the number of data chosen for simulation not being relevant for the study. The simulated factors are later correlated according to the estimated correlation, interpolated by ordinary kriging (in order to get weights adding up to 1) at each location of the grid using the same variogram model as for the factors, and then back-transformed from Gaussian values and from log-ratios into the raw distribution. The filtered grid for the mean of the simulations in the nickel case, excluding the nodes far from sample data, is shown in Fig. 14. The estimated correlations at sample locations and their interpolation on the regular grid are shown in Fig. 15. The maps showing the mean of the simulations at level 95 m are given in Fig. 16 for the six back-transformed cross-correlated variables. The results reproduce the cross-correlation trends in the maps. For instance, there is a strong negative correlation between Fe and MgO, which can be corroborated by visual inspection.
The non-linear behavior among the variables is well reproduced. This is shown in Fig. 17 for the mean of the simulations and for one individual simulation, around level 95 m. Scatter plots showing all the bi-variate relations for the mean of the simulations are shown in Fig. 18, together with the results of the variography. Variograms are well reproduced, apart from the issues discussed previously. It is quite impressive how well fitted most of the direct and cross variograms are, given the fact that only one variogram model was considered for the purposes of the presented methodology.
In order to test the predictive and uncertainty-assessment capabilities of the methodology, we bring back the testing data left out in the first part of the case study. Each testing datum was linked to the closest node of the grid, retaining only the data within less than 2.5 m of the corresponding node to avoid distortions in the results, finally leaving 366 of the initial 500 samples to be considered in the uncertainty analysis.
The resulting pdfs from the simulations are obtained and shown in Fig. 19 for 50 samples; we pick this small window to inspect the results in detail. The realizations are displayed as light gray lines, and the mean estimation of the simulations (black dots) is shown for the seven variables at the 366 samples. Red dots represent the true grades of the samples. The 5% and 95% percentile lines are displayed in black to give a 90% confidence area.
Figure 20 shows the scatter plots comparing the estimated mean of the volumes versus their ground-truth value, as the mean value is often taken as a predictor of the real value. Low bias in the prediction and high correlation values are obtained, varying from a lowest value of 0.78 (in the case of nickel) to 0.95 (in the case of iron).
The main advantage of simulating is that we can validate whether the decisions made in the previous steps were correct. The validation is completed with the generation of an accuracy plot to check that the uncertainty given by the pdfs effectively represents the experimental frequencies of the ground truth of the testing data (Fig. 21).
Finally, in Figs. 22 and 23 we present the clustering algorithm results, with 𝐾 = 5, which show high consistency between the clusters obtained at the last stage and the geological units tagged in the database. The algorithm even seems to capture the directions of continuity shown by the grades in the data. There is spatial continuity in the units provided by the algorithm and, despite starting with some artifacts at the initial step, already by iteration three one can anticipate the definitive zones, demonstrating it to be a promising tool for the delimitation of stationary units.

Fig. 13 Experimental variograms of the Gaussian factors, and the final model used.

Fig. 14 Grid used for the study showing the estimated mean and sampling data, for the nickel case.

5 Conclusions
We have shown how multivariate data can be modeled and understood as a RF lying on a correlation manifold, where one can map every data sample into this topological space. By using this tool, two applications follow: first, the interpolation of the different known correlation matrices throughout the domain with the purpose of reproducing the non-linear multivariate features of the data; and second, an application which deals with the problem of clustering multivariate data. In summary, a conceptually simple and novel methodology has been proposed to account for non-linearity in multivariate data, with reasonably good results that reproduce the multivariate behavior. By integrating some basic aspects of Riemannian geometry and the well-known machinery for handling SPD matrices into the geostatistical setting, we gain enough flexibility to reproduce the mentioned complex multivariate behavior and, at the same time, improve our understanding of the geological data. Implementing the interpolation of correlation matrices for carrying the local linear multivariate relationships is a key step in the good reproduction of the data behavior.

Fig. 15 Estimated correlation matrices at sample positions (correlation of Ni, Fe and Mg), represented by ellipses (left). Interpolation of the correlation matrices on a regular grid (right). The color of the ellipsoids is related to their anisotropy (bottom). From left to right: isotropic tensor, planar tensor (flat ellipsoid) (𝜆1 ≃ 𝜆2 > 𝜆3), elongated ellipsoid (𝜆1 ≫ 𝜆2 ≥ 𝜆3).
Among the limitations, we mention that the proposed methodology only works with enough data to estimate the correlation locally. As with other methodologies that try to handle non-stationarity, when limited data is available it is better to simplify the problem and assume stationarity of the data, as the calibration of hyperparameters, such as the correlation matrix at the different locations, may become hard to achieve. The variography also becomes theoretically challenging to handle and interpret under the assumption of different underlying structures, since working with different models of spatial continuity for the different structures is a tricky decision to make, as a "rotation" of the structures is a free parameter and a valid model that also fits the spatial correlation among variables. A third issue is that the definition of stationary geological domains beforehand replaces the presented methodology; if the multivariate behavior changes "continuously", the proposed methodology may be a promising approach for handling non-stationarity.
As part of future research, we propose developing a synthetic study to fully understand some crucial details of the methodology, such as the sensitivity analysis of the main parameters involved and the impact of their adjustment. Among these parameters, which may be critical for the method, we mention the local neighborhood from which the correlation matrix is obtained at sample locations. How sensitive the estimation of this correlation matrix is to the number of data used, and the possibility of using a variable amount of data, are some of the open questions to be answered. Further efforts have to be made to give meaning to the variogram modeling step when including different models of spatial structure, and to the effect of performing the linear mixing. A base case using traditional alternatives, such as splitting the data into stationary domains, is pending as well, in order to have a way of comparing the improvement in estimation made by the presented methodology, if any.

Finally, the possibility of investigating a statistical approach to the modeling of correlation matrices is an interesting path to follow, setting as an objective to obtain a probability distribution of correlation matrices at unknown locations, improving our capabilities and understanding when modeling uncertainty by building different scenarios sampled from such distributions.
Acknowledgements The author acknowledges the funding provided by the International Association for Mathematical Geosciences (IAMG) student grant, funding reference number MG-2020-14, and by the Natural Sciences and Engineering Research Council of Canada (NSERC), funding reference numbers RGPIN-2017-04200 and RGPAS-2017-507956.
References
Adler RJ, Taylor JE (2007) Random fields and geometry.
Springer New York
Almendral A, Abrahamsen P, Hauge R (2008) Multidimensional scaling and anisotropic covariance functions. In: Proceedings of the Eighth International Geostatistics Congress, GECAMIN Ltd, pp 187–196
Ayadi MA, Ben-Ameur H, Channouf N, Tran QK (2019)
Norta for portfolio credit risk. Annals of Operations Research 281(1):99–119
Barnett RM, Manchuk JG, Deutsch CV (2014) Projection pursuit multivariate transform. Mathematical Geosciences 46(3):337–359
Fig. 16 Plan view showing the estimated mean of the simulations at level 95 m and sampling data.
Boisvert J, Manchuk J, Deutsch C (2009) Kriging in the presence of locally varying anisotropy using non-euclidean distances. Mathematical Geosciences 41(5):585–601
van den Boogaart KG, Mueller U, Tolosana-Delgado R (2017)
An affine equivariant multivariate normal score transform for compositional data. Mathematical Geosciences
49(2):231–251
Bourgault G (2014) Revisiting multi-gaussian kriging with
the nataf transformation or the bayes’ rule for the estimation of spatial distributions. Mathematical Geosciences
46(7):841–868
Cario MC, Nelson BL (1997) Modeling and generating random vectors with arbitrary marginal distributions and
correlation matrix. Tech. rep., Citeseer
Chilès JP, Delfiner P (2012) Geostatistics: Modeling Spatial
Uncertainty
David P (2019) A riemannian quotient structure for correlation matrices with applications to data science. PhD
thesis, The Claremont Graduate University
Fouedjio F, Desassis N, Romary T (2015) Estimation of space
deformation model for non-stationary random functions.
Spatial statistics 13:45–61
Gelfand AE, Kim HJ, Sirmans C, Banerjee S (2003) Spatial modeling with spatially varying coefficient processes. Journal of the American Statistical Association
98(462):387–396
Fig. 17 Scatter plots for the sampling data, the mean of the grid values, and one simulation, respectively from the left, around level 95 m.
Goh A, Vidal R (2008) Clustering and dimensionality reduction on riemannian manifolds. In: 2008 IEEE Conference
on Computer Vision and Pattern Recognition, IEEE, pp
1–7
Huckemann S, Hotz T, Munk A (2010) Intrinsic shape analysis: Geodesic pca for riemannian manifolds modulo isometric lie group actions. Statistica Sinica pp 1–58
Journel AG, Huijbregts CJ (1978) Mining geostatistics, vol
600. Academic press London
Leuangthong O, Deutsch CV (2003) Stepwise conditional
transformation for simulation of multiple variables. Mathematical Geology 35(2):155–173
Li ST, Hammond JL (1975) Generation of pseudorandom
numbers with specified univariate distributions and correlation coefficients. IEEE Transactions on Systems, Man,
and Cybernetics (5):557–561
MacQueen J, et al. (1967) Some methods for classification and
analysis of multivariate observations. In: Proceedings of
the fifth Berkeley symposium on mathematical statistics
and probability, Oakland, CA, USA, vol 1, pp 281–297
Moakher M (2005) A differential geometric approach to
the geometric mean of symmetric positive-definite matrices. SIAM Journal on Matrix Analysis and Applications
26(3):735–747
Moakher M (2006) On the averaging of symmetric positivedefinite tensors. Journal of Elasticity 82(3):273–296
Nataf A (1962) Détermination des distributions dont les marges sont données. Comptes Rendus de l'Académie des Sciences 225:42–43
Pearson K (1901) Liii. on lines and planes of closest fit
to systems of points in space. The London, Edinburgh,
and Dublin philosophical magazine and journal of science
2(11):559–572
Pinto FC, Manchuk JG, Deutsch CV (2021) Decomposition
of multivariate spatial data into latent factors. Computers
& Geosciences 153:104773
Sampson PD, Guttorp P (1992) Nonparametric estimation of
nonstationary spatial covariance structure. Journal of the
American Statistical Association 87(417):108–119
Switzer P (1985) Min/max autocorrelation factors for multivariate spatial imagery. Computer science and statistics
Taylor JE, Adler RJ, et al. (2003) Euler characteristics for
gaussian fields on manifolds. The Annals of Probability
31(2):533–563
Taylor JE, et al. (2006) A gaussian kinematic formula. The
Annals of Probability 34(1):122–158
Tuzel O, Porikli F, Meer P (2008) Pedestrian detection
via classification on riemannian manifolds. IEEE Transactions on Pattern Analysis and Machine Intelligence
30(10):1713–1727, DOI 10.1109/TPAMI.2008.75
Wackernagel H (2013) Multivariate geostatistics: an introduction with applications. Springer Science & Business Media
Xiao Q (2014) Evaluating correlation coefficient for nataf
transformation. Probabilistic Engineering Mechanics
37:1–6
Xie W, Sun H, Li C (2015) Quantifying statistical uncertainty
for dependent input models with factor structure. In: 2015
Winter Simulation Conference (WSC), IEEE, pp 667–678
You K, Park HJ (2021) Re-visiting riemannian geometry of
symmetric positive definite matrices for the analysis of
functional connectivity. Neuroimage 225:117464
Fig. 18 Scatter plots and variograms comparing the ground truth and the mean of the simulations. Variograms from the simulations are shown as light gray lines, the mean of the variograms as a black line, and the variogram of the ground truth as a red line. The scatter plots showing the bi-variate relations are shown as green dots, superposed on the mean of the simulations shown as red dots.
Fig. 19 Uncertainty assessment for 50 samples taken from testing data, showing 1000 simulations in gray lines, the lower 5%
and the upper 95% percentile as confidence boundary in black lines, the estimated mean in black dots and, in red dots, the
ground truth.
Fig. 20 Scatter plots comparing the estimated mean of the simulations at locations close to the testing data with the ground truth.
Fig. 21 Accuracy plot, which shows the proportion of locations where the true value falls within the symmetric p-probability interval.
Fig. 23 Scatter plots showing the progress of the clustering when using the 𝐾-means algorithm on the sampling data, for 𝐾 = 5 and iterations 1, 3, and 30, respectively from the top.

Fig. 22 Isometric view of the ellipses at sample locations showing the progress of the clustering when using the 𝐾-means algorithm on the sampling data, for 𝐾 = 5 and iterations 1, 3, and 30, respectively from the top.