
Spectral decomposition of H¹(µ) and Poincaré inequality on a compact interval – Application to kernel quadrature


Olivier Roustant¹, Nora Lüthen², and Fabrice Gamboa³

¹ UMR CNRS 5219, Institut de Mathématiques de Toulouse, INSA, Université de Toulouse, France
² Chair of Risk, Safety, and Uncertainty Quantification, ETH Zürich, 8093 Zürich, Switzerland
³ UMR CNRS 5219, Institut de Mathématiques de Toulouse, Université Paul Sabatier, France

November 29, 2022

Abstract
Motivated by uncertainty quantification of complex systems, we aim at finding quadrature formulas of the form $\int_a^b f(x)\,d\mu(x) = \sum_{i=1}^n w_i f(x_i)$ where f belongs to H¹(µ). Here, µ belongs to a class of continuous probability distributions on [a, b] ⊂ ℝ and $\sum_{i=1}^n w_i \delta_{x_i}$ is a discrete probability distribution on [a, b]. We show that H¹(µ) is a reproducing kernel Hilbert space with a continuous kernel K, which allows us to reformulate the quadrature question as a kernel (or Bayesian) quadrature problem. Although K does not have an easy closed form in general, we establish a correspondence between its spectral decomposition and the one associated to Poincaré inequalities, whose common eigenfunctions form a T-system (Karlin and Studden, 1966). The quadrature problem can then be solved in the finite-dimensional proxy space spanned by the first eigenfunctions. The solution is given by a generalized Gaussian quadrature, which we call Poincaré quadrature.
We derive several results for the Poincaré quadrature weights and the associated worst-case error. When µ is the uniform distribution, the results are explicit: the Poincaré quadrature is equivalent to the midpoint (rectangle) quadrature rule. Its nodes coincide with the zeros of an eigenfunction and the worst-case error scales as $\frac{b-a}{2\sqrt{3}}\,n^{-1}$ for large n. By comparison with known results for H¹(0, 1), this shows that the Poincaré quadrature is asymptotically optimal. For a general µ, we provide an efficient numerical procedure, based on finite elements and linear programming. Numerical experiments provide useful insights: nodes are nearly evenly spaced, weights are close to the probability density at nodes, and the worst-case error is approximately O(n⁻¹) for large n.

Keywords Sobolev space, RKHS, Mercer representation, Poincaré inequality, Sturm-Liouville theory, Tchebytchev system (T-system), Bayesian quadrature, kernel quadrature, Gaussian quadrature.

Contents

1 Introduction
2 Background
  2.1 Poincaré inequalities and basis
  2.2 Quadrature with T-systems
  2.3 Kernel quadrature
3 Spectral decomposition of H¹(µ) with the Poincaré basis
  3.1 Main result
  3.2 Examples
4 Poincaré quadrature and optimal kernel quadrature in H¹(µ)
  4.1 Definitions and notations
  4.2 Equivalence of Poincaré and optimal kernel quadratures in H¹(µ)
  4.3 Formulas for optimal weights
  4.4 Quadrature error
5 The case of H¹(a, b)
  5.1 Nodes coincide with zeros of a basis function
  5.2 Explicit quadrature formulas and quadrature error
  5.3 Asymptotical optimality of the Poincaré quadrature
6 Numerical experiments
  6.1 Numerical computation of the Poincaré quadrature
  6.2 Further properties of Poincaré quadratures
  6.3 Worst-case error
7 Conclusion and outlook

1 Introduction
Motivation. This research is motivated by uncertainty quantification of complex systems, where a typical task is to compute integrals I = ∫ G(x) dP(x). Here, G is a multivariate function representing a quantity of interest of the system, x is a vector of ℝᵈ representing the input variables, and P is a probability distribution representing the uncertainty on x. In this context, the evaluation of G is often time-consuming, and cubature formulas may be preferred to sampling techniques to compute the integral I. Assuming that the input variables are independent, cubature formulas then boil down to 1-dimensional quadrature formulas, by tensorization or using sparse grids (Smolyak, 1963; Garcke and Griebel, 2012).

Problem considered. For a given finite interval [a, b] of ℝ, we aim at finding accurate approximations of integrals $\int_a^b f(x)\,d\mu(x)$, replacing µ by a discrete probability distribution. We thus deal with quadrature formulas
$$\int_a^b f(x)\,d\mu(x) = \sum_{i=1}^{n} w_i f(x_i), \tag{1.1}$$
where, for i = 1, …, n, the quadrature nodes x_i lie in [a, b]. The quadrature weights (w_i) are non-negative and sum to 1. We will denote by X (resp. w) the sequence of nodes (x_i) (resp. of weights (w_i)). Considering minimal regularity conditions, we assume that f belongs to the Sobolev space H¹(µ) = {f ∈ L²(µ) s.t. f′ ∈ L²(µ)}, where the derivatives are defined in a weak sense. More generally, for an integer p ≥ 2, we define the Sobolev space Hᵖ(µ) as the subset of functions f of Hᵖ⁻¹(µ) such that f⁽ᵖ⁾ ∈ L²(µ). In the whole paper, we consider the usual norm of H¹(µ) defined by ‖f‖²_{H¹(µ)} = ‖f‖² + ‖f′‖², where ‖f‖² = $\int_a^b f^2(x)\,d\mu(x)$ is the norm of L²(µ). For technical reasons, we assume that µ is a bounded perturbation of the uniform distribution on [a, b], meaning that it admits a positive continuous probability density function ρ on [a, b]. This includes a wide range of probability distributions used in practice, such as the truncated normal one, obtained by conditioning a Gaussian variable to vary in a finite domain. This assumption implies that the sets L²(µ), H¹(µ) contain the same equivalence classes of functions as L²(a, b), H¹(a, b), associated to the uniform distribution on [a, b], with equivalent norms.

Kernel quadrature formulation. When µ is the uniform probability distribution, it is well known that H¹(a, b) is a reproducing kernel Hilbert space (RKHS) (Duc-Jacquet, 1973; Thomas-Agnan, 1996). Under the previous assumption on µ, we will show that H¹(µ) is also a RKHS (Section 3). In that case, a suitable criterion to evaluate the accuracy of a quadrature (X, w) is the worst-case error, defined by
$$\mathrm{wce}(X, w, H) = \sup_{h \in H,\ \|h\|_H \le 1} \left| \int h(x)\,d\mu(x) - \sum_{i=1}^{n} w_i h(x_i) \right|.$$
Here, H is some particular given functional space. Indeed, when H is a RKHS, wce(X, w, H) can be explicitly
computed as a function of the kernel K (see Section 2.3 and e.g. Novak and Woźniakowski, 2008, Chapter
10). An interesting quadrature problem is then the following minimization problem:
$$(P): \quad \min_{X, w}\ \mathrm{wce}(X, w, H),$$
where one wishes to identify the minimizing quadrature. Such a problem is often called kernel quadrature, or Bayesian quadrature in a probabilistic framework. There, the prior information is that the integrand is drawn from a Gaussian process with kernel K, and the worst-case error is equal to the posterior variance of the integral. We refer to O'Hagan (1991); Karvonen et al. (2019) for a presentation of this latter point of view.

Originality of the problem. We remark that, apart from the case of the uniform distribution, the problem does not reduce to the more standard quadrature problem with a weight function
$$\int_a^b f(x)\rho(x)\,dx = \sum_{i=1}^{n} w_i f(x_i) \tag{1.2}$$
where f belongs to H¹(a, b). Indeed, the unit balls {h ∈ H¹(µ), ‖h‖_{H¹(µ)} ≤ 1} and {h ∈ H¹(a, b), ‖h‖_{H¹(a,b)} ≤ 1} are different if ρ is not a constant function. Thus wce(X, w, H¹(µ)) ≠ wce(X, w, H¹(a, b)), and the weighted quadrature problem, formulated as a worst-case error minimization problem, will in general not give the same solutions as (P).

Spectral decomposition of H¹(µ), Poincaré inequality and T-systems. The resolution of problem (P) is in general intractable, as K does not have an explicit form. A key result of this paper is that there is a correspondence between the spectral decomposition (Mercer representation) of K and the one associated to Poincaré inequalities. Furthermore, the common eigenfunctions form a Tchebytchev system (T-system, see Karlin and Studden, 1966). Interestingly, even if K is not known explicitly in general, one can compute its Mercer representation numerically with a finite element technique (Roustant et al., 2017).

Problem resolution in a finite-dimensional proxy space. Thanks to the spectral decomposition of H¹(µ), we can replace problem (P) by the tractable proxy problem
$$(P_M): \quad \min_{X, w}\ \mathrm{wce}(X, w, H_M)$$
where H¹(µ) has been replaced by its projection H_M onto the space spanned by the first M eigenfunctions. Indeed, similarly to polynomials, T-systems admit a Gaussian quadrature: for a given number of nodes n, there exists a unique quadrature (X, w) with positive weights for which wce(X, w, K_M) = 0, where M = 2n − 1 is maximal. We call this optimal quadrature the Poincaré quadrature. For a general probability distribution µ, the Poincaré quadrature can be computed easily and efficiently by linear programming.

Properties of the Poincaré quadrature. We derive several results for the connection between the kernel associated to H¹(µ), the Poincaré quadrature nodes and weights, and the associated worst-case error. When µ is the uniform distribution, the results are explicit (Section 5): the Poincaré quadrature is equal to the midpoint (rectangle) quadrature rule, its nodes coincide with the zeros of an eigenfunction, as for the Gaussian quadrature of polynomials, and the worst-case error scales as $\frac{b-a}{2\sqrt{3}}\,n^{-1}$ for large n. Furthermore, in the case of H¹(0, 1), the kernel is given explicitly, and it is possible in this frame to compute the optimal kernel quadrature without relying on a finite-dimensional approximation. The results obtained by Duc-Jacquet (1973) show that the optimal kernel quadrature has evenly spaced nodes and weights asymptotically equal to 1/n, which shows that the Poincaré quadrature is asymptotically optimal.
In the general case, numerical experiments provide empirical insights (Section 6): nodes are nearly evenly spaced, weights are close to the probability density at nodes, and the worst-case error is approximately proportional to n⁻¹ for large n.

Links with literature. The Mercer representation of Sobolev spaces seems unexplored for non-uniform probability distributions. For the uniform case, by connecting RKHS to Green's functions, Fasshauer (2012) gives Mercer representations of various kernels associated to Sobolev spaces, such as the anchored Sobolev space H₀¹(0, 1) = {f ∈ H¹(0, 1), f(0) = f(1) = 0} with the usual norm. Dick et al. (2014) provide the Mercer representation of H¹(0, 1) for the norm given by ‖f‖² = (∫₀¹ f(x)dx)² + ∫₀¹ f′(x)² dx. While close to these examples, the case of H¹(0, 1) with its usual norm seems not to be reported in the literature.
Similarly, considering Sobolev spaces with a non-uniform probability distribution seems unexplored for quadrature problems. As mentioned above, this does not boil down to a quadrature with weights for the uniform distribution, as the unit balls are different. The case of H¹(0, 1) (uniform case) has been studied by several authors, with various choices of norms and weight functions (Equation 1.2). For instance, Zhang and Novak (2019) provide expressions of the radius of information (worst-case error for the optimal quadrature) as a function of the nodes, for the semi-norm ∫₀¹ f′(x)² dx and centered weight functions. For a constant weight function, and the usual norm of H¹(0, 1) considered here, Duc-Jacquet (1973) obtains the optimal kernel quadrature. The link between T-systems and kernel quadrature has also been exploited in Oettershagen (2017). There, the kernel is assumed to have an explicit form, and the T-system is obtained by considering the kernel functions at the nodes K(x_i, ·), which is different from our approach based on the spectral decomposition of K. Therein, the case of H¹(0, 1) is considered in numerical experiments, but with a different norm associated to the Bernoulli polynomials, given by ‖f‖² = (∫₀¹ f(x)dx)² + ∫₀¹ f′(x)² dx.

Paper organization. Section 2 gives the prerequisites on Poincaré inequalities, T-systems, RKHS and kernel quadrature. The RKHS structure of H¹(µ) is discussed in Section 3, where a connection is established between Poincaré inequalities and the kernel of H¹(µ). Section 4 gives general formulas for the optimal quadrature weights and the associated worst-case error as a function of the kernel. Section 5 focuses on the case of the uniform distribution. Section 6 presents numerical experiments in the general case. Section 7 concludes and gives an outlook for future research.

2 Background
In the whole paper, we consider a bounded interval of the real line [a, b], with −∞ < a < b < ∞. We consider
a probability distribution µ supported on [a, b] which is a bounded perturbation of the uniform distribution,
in the following sense.
Definition 1 (Bounded perturbation of the uniform distribution). Let µ be a continuous probability distribution on [a, b], with density ρ. We say that µ is a bounded perturbation of the uniform distribution if ρ is a positive, continuous and piecewise C¹ function on [a, b]. We denote by B the set of bounded perturbations of the uniform distribution on [a, b].
We also denote by V = −log(ρ) the so-called potential associated to µ. Equivalently, ρ(t) = e^{−V(t)}.
Remark 1. (i) Obviously, if µ fulfils the previous definition, then ρ is bounded from below and above by positive constants. That is, there exist m, M in ℝ such that
∀t ∈ [a, b], 0 < m ≤ ρ(t) ≤ M < +∞.
(ii) When µ ∈ B, it is straightforward that the sets L²(µ), H¹(µ) contain the same equivalence classes of functions as L²(a, b), H¹(a, b), associated to the uniform distribution on [a, b], with equivalent norms.

2.1 Poincaré inequalities and basis


This section is based on Roustant et al. (2017) (see also Bakry et al., 2014). Let µ be a probability distribution on [a, b]. For f, g ∈ L²(µ), let ‖f‖ = (∫ f² dµ)^{1/2} be the usual norm, and ⟨f, g⟩ = ∫ fg dµ the usual scalar product. Denote by Var_µ(f) the variance of f:
$$\mathrm{Var}_\mu(f) := \int \left( f - \int f\,d\mu \right)^2 d\mu.$$

We first recall the notion of Poincaré inequality.

Definition 2 (Poincaré inequality). We say that µ satisfies a Poincaré inequality if there exists a positive constant C such that for all f ∈ H¹(µ):
$$\mathrm{Var}_\mu(f) \le C\,\|f'\|^2.$$
In this case, the smallest possible constant C above is denoted C_P(µ), and called the Poincaré constant of µ.
When it exists, the Poincaré constant is obtained by minimizing the so-called Rayleigh ratio J(f) = ‖f′‖²/‖f‖² over all centered functions of H¹(µ). An important result is that a bounded perturbation of the uniform distribution satisfies a Poincaré inequality, which is related to a spectral decomposition as described in the following theorem (proved e.g. in Roustant et al., 2017):
Theorem 1 (Spectral theorem). Let µ be a probability distribution in B. Consider the three following problems:

(P1) Find f ∈ H¹(µ) with ∫ f dµ = 0 s.t. J(f) = ‖f′‖²/‖f‖² is minimum.

(P2) Find f ∈ H¹(µ) and λ > 0 s.t. ⟨f′, g′⟩ = λ⟨f, g⟩ for all g ∈ H¹(µ).

(P3) Find f ∈ H²(µ) and λ > 0 s.t. f″ − V′f′ = −λf and f′(a) = f′(b) = 0.

Then the eigenvalue problems (P2) and (P3) are equivalent. The eigenvalues form an increasing sequence (λ_m)_{m≥0} of non-negative real numbers that tends to infinity. They are all simple, and λ₀ = 0. The eigenvectors (ϕ_m)_{m≥0} form a Hilbert basis of L²(µ), and ϕ₀ is a constant function.
Furthermore, when λ = λ₁, the first positive eigenvalue, (P2) and (P3) are equivalent to (P1) and the minimum of (P1) is attained for f = ϕ₁. Thus C_P(µ) = 1/λ₁.

In this paper, our interest is in the whole spectral decomposition. In particular, we define the Poincaré basis of L²(µ) as follows.
Definition 3 (Poincaré basis). Let µ be a probability distribution in B. The Poincaré basis is the orthonormal
basis formed by the eigenfunctions (ϕm )m≥0 defined in the spectral theorem (Theorem 1).

Remark 2. As all eigenvalues are simple, the Poincaré basis is unique up to a sign change for each eigen-
function. We set ϕ0 = 1.
We conclude this section by linking the previous material with the Sturm-Liouville theory of second-order
differential equations.
Proposition 1. Let µ be a probability distribution in B. Then the Poincaré basis consists of the eigenfunctions of the Sturm-Liouville eigenproblem
$$L(f)(x) = \beta\, r(x) f(x) \tag{2.1}$$
with Neumann conditions f′(a) = f′(b) = 0, where L(f)(x) = −(p(x)f′(x))′ + q(x)f(x) and q = r = p = e^{−V}. Furthermore, the Sturm-Liouville problem is regular, in the sense that all its eigenvalues are positive.

Proof. Recall that ⟨f, g⟩_{H¹(µ)} = ⟨f, g⟩ + ⟨f′, g′⟩. From the proof of Theorem 2 in Roustant et al. (2017), the eigenfunctions of the Poincaré operator are solutions of the spectral problem: find f ∈ H¹(µ) and β such that for all g ∈ H¹(µ),
$$\langle f, g \rangle_{H^1(\mu)} = \beta \langle f, g \rangle. \tag{2.2}$$
The corresponding eigenvalues are β_m = 1 + λ_m. In particular, β_m > 0, as λ_m ≥ 0. Moreover, Problem (2.2) is equivalent to the second-order differential equation
$$f''(x) - V'(x)f'(x) - f(x) = -\beta f(x)$$
with Neumann conditions f′(a) = f′(b) = 0 (see also Roustant et al., 2017, proof of Theorem 2). Multiplying by e^{−V(x)}, which by definition of B does not vanish, we obtain the equivalent Sturm-Liouville form (2.1).

2.2 Quadrature with T-systems
This section is based on Karlin and Studden (1966).
Definition 4 (T-systems, generalized polynomials). Let (u_n)_{n∈ℕ} be a family of real-valued continuous functions defined on a compact interval [a, b] ⊂ ℝ. We say that (u_n) is a complete Tchebytchev system, or simply T-system, if for all integers n ≥ 1 and all sequences of distinct points a ≤ t₁ < ⋯ < t_n ≤ b, the determinant of the generalized Vandermonde matrix
$$V(u_0, \dots, u_{n-1};\ t_1, \dots, t_n) := \begin{pmatrix} u_0(t_1) & \dots & u_0(t_n) \\ \vdots & \ddots & \vdots \\ u_{n-1}(t_1) & \dots & u_{n-1}(t_n) \end{pmatrix}$$
is positive. The finite linear combinations of the u_n's are called generalized polynomials or u-polynomials.
A prototype of T-system on any interval of the real line is given by the polynomial functions u_i(t) = tⁱ, and the determinant of V is nothing more than the Vandermonde determinant $\det V(1, t, \dots, t^{n-1};\ t_1, \dots, t_n) = \prod_{1 \le i < j \le n} (t_j - t_i)$ (see e.g. Karlin and Studden, 1966, page 1). An equivalent definition of T-systems (up to a sign change) is that any generalized polynomial, i.e., any linear combination of u₀, u₁, …, u_n, has at most n zeros (Karlin and Studden, 1966, Theorem 4.1). This extends the property that (ordinary) polynomials of degree n have at most n zeros. In that sense, T-systems can be viewed as a generalization of polynomials, which justifies the name generalized polynomials.
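As a small illustration (not from the paper), the following Python snippet checks the defining determinant positivity on the monomial prototype, for which the Vandermonde product gives an exact reference value; the Poincaré bases introduced later can be tested the same way.

```python
import numpy as np

rng = np.random.default_rng(0)

def gen_vandermonde(funcs, t):
    """Generalized Vandermonde matrix V(u_0, ..., u_{n-1}; t_1, ..., t_n)."""
    return np.array([[u(ti) for ti in t] for u in funcs])

# Monomial prototype u_i(t) = t^i on [0, 1]: det V = prod_{i<j} (t_j - t_i) > 0
n = 4
funcs = [lambda x, i=i: x**i for i in range(n)]
for _ in range(5):
    t = np.sort(rng.uniform(0.0, 1.0, size=n))      # distinct, increasing nodes
    det = np.linalg.det(gen_vandermonde(funcs, t))
    prod = np.prod([t[j] - t[i] for i in range(n) for j in range(i + 1, n)])
    assert det > 0 and np.isclose(det, prod)
print("all generalized Vandermonde determinants are positive")
```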

In the context of quadrature problems, the definition of T-systems guarantees that for any set of distinct quadrature nodes in [a, b], there exists a unique set of quadrature weights such that the quadrature formula (1.1) is exact at order n − 1, i.e. for all functions in span(u₀, …, u_{n−1}). Indeed, up to reordering, the exactness equations define a linear system whose matrix is invertible and equal, up to a sign change, to V(u₀, …, u_{n−1}; t₁, …, t_n). This quadrature formula thus generalizes the Newton-Cotes quadrature of polynomials, and suffers in general from the same drawback: the weights can be negative and the resulting quadrature formula can be unstable (Oettershagen, 2017; Brass and Petras, 2011).

Interestingly, extending the Gaussian quadrature of ordinary polynomials to more general functions, T-systems admit a unique quadrature that has positive weights and is exact at order 2n − 1. Contrary to the polynomial case, however, the nodes of this quadrature do in general not coincide with the zeros of a (generalized) polynomial. The computation of the nodes and weights uses a different approach, relying on geometry. More precisely, consider the moment space:
$$\mathcal{M}_{n+1} = \left\{ c \in \mathbb{R}^{n+1} \ \text{with} \ c_i = \int_a^b u_i(t)\,d\sigma(t) \ \text{for all} \ i = 0, \dots, n, \ \text{where } \sigma \text{ is a finite measure with support } [a, b] \right\}. \tag{2.3}$$
$\mathcal{M}_{n+1}$ is the closed convex hull of the curve (u(t))_{t∈[a,b]}, where u(t) = (u₀(t), …, u_n(t)) ∈ ℝ^{n+1}.
Then, we may state precisely the previous quadrature results.
Proposition 2. Let u = (u_n)_{n∈ℕ} be a T-system, and µ be a probability distribution in B. Then,

1. For all n ∈ ℕ, the vector $\bar{c} := \left( \int_a^b u_i(t)\,d\mu(t) \right)_{0 \le i \le 2n-1}$ is an interior point of $\mathcal{M}_{2n}$.

2. There exists a unique quadrature (1.1) with positive weights which is exact at order 2n − 1 (i.e. exact on the vector space spanned by u₀, …, u_{2n−1}). It uses a minimal number of nodes, equal to n. The nodes are all in the open interval (a, b) and the weights sum to 1.

3. This quadrature is obtained by solving the minimization problem
$$\min_{\sigma \in V_{2n-1}(\bar{c})}\ \int_a^b u_{2n}(t)\,d\sigma(t) \tag{2.4}$$
over the set $V_{2n-1}(\bar{c})$ of all probability distributions subject to the moment conditions $\int_a^b u_i(t)\,d\sigma(t) = \bar{c}_i$, for i = 0, 1, …, 2n − 1.
Proof of Proposition 2. The proof is based on different results of Karlin and Studden (1966), which we will refer to. Let us denote by N = 2n − 1 the quadrature order.
First, the separation Lemma 9.2, page 65, states that c is an interior point of $\mathcal{M}_{2n}$ if and only if for all non-negative non-trivial u-polynomials $g = \sum_{i=0}^{N} a_i u_i$, we have $m_g(c) = \sum_{i=0}^{N} a_i c_i > 0$. Taking c = c̄, we have $m_g(\bar{c}) = \int_a^b g\,d\mu$ and thus $m_g(\bar{c}) \ge 0$. As the density function of µ is continuous and does not vanish on [a, b], we can conclude that $m_g(\bar{c}) > 0$, which shows that c̄ is an interior point of $\mathcal{M}_{2n}$.
Now, following Karlin and Studden (1966), §3, case (ii), page 46, let us define the notion of index of a sequence of nodes, as the number of nodes in [a, b], with a half weight for the nodes equal to the endpoints a, b (if any). Then, as N is odd, there are exactly two quadratures with positive weights and the smallest possible index, equal to (N + 1)/2 = n. These quadratures are called principal representations in this context. For one of them, called the upper principal representation, the nodes include the endpoints, and thus the quadrature involves n + 1 nodes formed by the endpoints a, b and n − 1 nodes in (a, b). The other one, called the lower principal representation, involves n nodes in the open interval (a, b). Thus, it is the only quadrature with positive weights and containing the smallest number of nodes, equal to n.
Furthermore, Theorem 1.1, page 80, shows that, if u is a T-system, the solution of (2.4) is unique and equal to the lower principal representation of u. In particular, the weights are positive and sum to one.
Finally, if u_i(t) = tⁱ, we have equality with the Gaussian quadrature by uniqueness of the quadrature, since the Gaussian quadrature has n distinct nodes in (a, b), positive weights summing to one, and is exact for u₀, …, u_{2n−1} (Karlin and Studden, 1966, Chapter IV).
Remark 3. Replacing the minimization problem in (2.4) by maximization, we obtain another valid quadrature of order 2n − 1, called the upper principal representation. However, it involves one more node, i.e. n + 1 nodes, including the endpoints a, b. Equivalently, for a fixed number n of nodes, this quadrature has order 2n − 3, compared to 2n − 1 for the Gaussian quadrature. It generalizes the Lobatto quadrature for polynomials (Brass and Petras, 2011).
Definition 5 (Gaussian quadrature for T-systems). The unique quadrature of Prop. 2 is called the lower principal representation of u. By analogy with polynomials, and following (Brass and Petras, 2011; Oettershagen, 2017), we will call it the generalized Gaussian quadrature, or simply the Gaussian quadrature of the T-system (u_n)_{n∈ℕ}.
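In practice, problem (2.4) becomes a finite linear program once σ is restricted to a discretization grid of [a, b], since both the objective and the moment conditions are linear in the (non-negative) masses of σ. Below is a minimal sketch of this idea (not code from the paper; the grid discretization is our assumption), using the cosine Poincaré basis of the uniform distribution as the T-system (see Example 1 in Section 3.2), so that the result can be checked against the midpoint rule of Proposition 11.

```python
import numpy as np
from scipy.optimize import linprog

# Generalized Gaussian (Poincare) quadrature via the moment LP (2.4) on a grid.
# Uniform distribution on [0, 1]; the known solution is the midpoint rule
# with n nodes and weights 1/n (Proposition 11).
a, b, n = 0.0, 1.0, 3
om = np.pi / (b - a)
M = 2 * n - 1

def u(i, t):  # Poincare basis of the uniform distribution (Example 1)
    return np.ones_like(t) if i == 0 else np.sqrt(2) * np.cos(i * om * (t - a))

grid = np.linspace(a, b, 2401)                       # candidate support of sigma
A_eq = np.array([u(i, grid) for i in range(M + 1)])  # moment conditions
c_bar = np.array([1.0] + [0.0] * M)                  # int u_i dmu for uniform mu
cost = u(M + 1, grid)                                # objective: int u_{2n} dsigma

res = linprog(cost, A_eq=A_eq, b_eq=c_bar, bounds=(0, None), method="highs")
mask = res.x > 1e-8
print("nodes  :", grid[mask])                        # ~ [1/6, 1/2, 5/6]
print("weights:", res.x[mask])                       # ~ [1/3, 1/3, 1/3]
```

The grid is chosen so that the exact nodes (2i − 1)/(2n) lie on it; for a general µ, the moments c̄_i must themselves be computed numerically.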
We now show that a Poincaré basis is a T -system. This is an immediate consequence of the example of
eigenfunctions of Sturm-Liouville problems (see e.g. Example 7 in Karlin and Studden, 1966). A proof can
be found in Gantmakher and Krejn (2002). However, the proof is not easy to read as it is split in several
parts. We recall below the main steps and give a roadmap for the interested reader.
Proposition 3. Consider the notations and the assumption of Theorem 1. Then the Poincaré basis (ϕm )m∈N
is a T -system.
Proof. By Prop. 1, the eigenfunctions of the Poincaré operator are eigenfunctions of the regular Sturm-
Liouville problem (2.1). Then the result is a particular case of a more general result stating that the
eigenfunctions of a regular Sturm-Liouville operator form a T -system. A proof can be found in Gantmakher
and Krejn (2002). We give here the three main steps, and pointers to the corresponding sections.
Firstly, it is proved in IV.10.4 (pages 236-238) that the eigenfunctions of a Sturm-Liouville operator under boundary constraints verify an integral equation of the form
$$\varphi(x) = \lambda \int_a^b K(x, s)\,\varphi(s)\,d\sigma(s) \tag{2.5}$$
where λ > 0, σ is a probability distribution, and K is the so-called Green function of L, defined by
$$K(x, s) = \psi(\min(x, s))\,\chi(\max(x, s)),$$
where ψ, χ are particular solutions of the homogeneous equation L(f) = 0 such that the ratio ψ/χ is a non-decreasing function.
Secondly, it is proved that K is an oscillatory kernel, in the sense of Definition 1, page 178 (see also Definition 1′, page 179, and Definitions 4 and 6, pages 74 and 76). Roughly speaking, it means that every matrix extracted from K in ascending order, i.e. (K(x_i, s_j))_{1≤i,j≤n} with x₁ < ⋯ < x_n and s₁ < ⋯ < s_n, has a non-negative determinant. The proof starts at page 78 (Example 5, 'Single-pair' matrices), continues at page 103 (Theorem 12), page 220 (Criterion A) and ends at page 238 (Theorem 16).
Thirdly, if K is an oscillatory kernel, then the solutions of the integral equation (2.5) form a T-system, which is proved in Theorem 1, page 181.

2.3 Kernel quadrature


RKHS. We first recall some facts on reproducing kernel Hilbert spaces (RKHS), referring to Berlinet and Thomas-Agnan (2011) for more details. For a given set T, let H be a Hilbert space of functions T → ℝ, with norm ‖·‖ and scalar product ⟨·, ·⟩. We say that H is a RKHS if for all x ∈ T, the evaluation functionals h ∈ H ↦ h(x) are continuous. It can be shown that a RKHS is in bijection with a positive semi-definite function, also called kernel. If K is a kernel associated to H, we write H = H_K. The RKHS H_K is characterized by the so-called reproducing property
$$\forall x \in T,\ \forall h \in H_K, \quad \langle K(x, \cdot), h \rangle = h(x).$$
In particular, choosing h = K(y, ·), we get
$$\forall x, y \in T, \quad \langle K(x, \cdot), K(y, \cdot) \rangle = K(x, y).$$

Worst-case error in RKHS. In RKHS, worst-case quantities of linear functionals can be computed explicitly. For instance, the Cauchy-Schwarz inequality gives that for all h ∈ H_K and all x ∈ T:
$$|h(x)| = |\langle K(x, \cdot), h \rangle| \le \|K(x, \cdot)\|\,\|h\|,$$
from which it is obvious that sup_{h∈H_K, ‖h‖≤1} |h(x)| = ‖K(x, ·)‖. Furthermore, by the reproducing property, we have ‖K(x, ·)‖² = K(x, x), and finally, for all x ∈ T,
$$\sup_{h \in H_K,\ \|h\| \le 1} |h(x)| = \sqrt{K(x, x)}.$$

A similar computation can be done for linear functionals defined by quadrature formulas. Let us first define
the worst-case error of a general quadrature.
Definition 6 (worst-case error of a quadrature). Let (X, w) be a quadrature composed of a set of nodes X = (x₁, …, x_n) ∈ [a, b]ⁿ and a set of weights w = (w₁, …, w_n) ∈ ℝⁿ. Let H be a set of functions from [a, b] to ℝ. The worst-case error of (X, w) on H is defined by
$$\mathrm{wce}(X, w, H) = \sup_{h \in H,\ \|h\| \le 1} \left| \int h(x)\,d\mu(x) - \sum_{i=1}^{n} w_i h(x_i) \right|.$$
If H is a RKHS H_K, we simply denote wce(X, w, K) = wce(X, w, H_K).


By a direct extension of the computation above with T = [a, b], we have
$$\mathrm{wce}(X, w, K) = \left\| \int K(x, \cdot)\,d\mu(x) - \sum_{i=1}^{n} w_i K(x_i, \cdot) \right\|_{H_K}.$$
Using the reproducing property, one obtains the well-known analytical expression (see e.g. Novak and Woźniakowski, 2008, Chapter 10):
$$\mathrm{wce}(X, w, K)^2 = \iint K(x, x')\,d\mu(x)\,d\mu(x') - 2\sum_{i=1}^{n} w_i \int K(x_i, x)\,d\mu(x) + \sum_{i,j} w_i w_j K(x_i, x_j),$$
which can be rewritten in the matrix form
$$\mathrm{wce}(X, w, K)^2 = w^\top K(X, X)\,w - 2\,\ell_K(X)^\top w + c_K \tag{2.6}$$
where K(X, X) = (K(x_i, x_j))_{1≤i,j≤n} is the Gram matrix, $\ell_K(X) = \left( \int K(x_i, x)\,d\mu(x) \right)_{1 \le i \le n}$ is the column vector formed by integrating the kernel against µ at the nodes x_i, and $c_K = \iint K(x, x')\,d\mu(x)\,d\mu(x')$ is a constant.
Kernel quadrature. A kernel quadrature is obtained by minimizing the worst-case error. We will need the following assumption:
Assumption 1. If x_i ≠ x_j for all i ≠ j, then the Gram matrix K(X, X) is invertible.
Under Assumption 1, for a given set X of distinct nodes, (2.6) defines a strictly convex function of w. Thus, it has a unique minimum, denoted by w⋆(X, K). By solving the first-order conditions, we directly get the exact expression of the vector of optimal weights:
$$w^\star(X, K) = K(X, X)^{-1}\,\ell_K(X). \tag{2.7}$$
After some algebra, we obtain the corresponding minimal value of the worst-case error:
$$\mathrm{wce}(X, w^\star, K)^2 = c_K - \ell_K(X)^\top K(X, X)^{-1}\,\ell_K(X). \tag{2.8}$$
Kernel quadratures and optimal kernel quadratures can then be defined as follows.
Definition 7 (Kernel quadrature, optimal kernel quadrature). Let X be a set of nodes, and assume that Assumption 1 is verified. 1) The kernel quadrature associated to X on H_K is the quadrature (X, w⋆(X, K)) that minimizes the worst-case error wce(X, w, K) over all sets of weights in ℝⁿ.
2) An optimal kernel quadrature is a quadrature (X⋆, w⋆) that minimizes the worst-case error wce(X, w, K) among all quadratures (X, w), or equivalently, that minimizes wce(X, w⋆(X, K), K) over all sets of nodes X.
Remark 4. Notice that the weights of a kernel quadrature are constrained neither to be positive, nor to sum to 1.
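Formulas (2.7) and (2.8) translate directly into code. Below is a minimal sketch (not from the paper) using the toy kernel K(x, y) = 1 + min(x, y) on [0, 1] with µ uniform, chosen only because its Gram matrix is easy to form; any continuous kernel satisfying Assumption 1 could be substituted, with ℓ_K and c_K computed by numerical integration.

```python
import numpy as np
from scipy.integrate import quad, dblquad

# Kernel quadrature sketch: optimal weights (2.7) and worst-case error (2.8).
# Toy kernel for illustration (an assumption; not the kernel of H^1(mu)).
K = lambda x, y: 1.0 + np.minimum(x, y)

X = np.array([0.2, 0.5, 0.8])                    # quadrature nodes
G = K(X[:, None], X[None, :])                    # Gram matrix K(X, X)
lK = np.array([quad(lambda y: K(xi, y), 0, 1)[0] for xi in X])  # l_K(X)
cK = dblquad(lambda y, x: K(x, y), 0, 1, 0, 1)[0]               # c_K

w_star = np.linalg.solve(G, lK)                  # optimal weights (2.7)
wce2 = cK - lK @ w_star                          # squared worst-case error (2.8)
print("w* =", w_star, " wce =", np.sqrt(max(wce2, 0.0)))
```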

3 Spectral decomposition of H¹(µ) with the Poincaré basis


We show our main result: when µ is a bounded perturbation of the uniform distribution, H¹(µ) is a RKHS whose kernel eigenfunctions coincide with the Poincaré basis. This comes from two connections: the first between the Poincaré basis and Sturm-Liouville problems, the second between Sturm-Liouville problems and RKHS via Green's functions. We illustrate the result on two examples where explicit computations can be made.

3.1 Main result


Proposition 4 (Mercer representation of H¹(µ) with the Poincaré basis). Assume that µ is a probability distribution in B. Denote by (λ_m, ϕ_m)_{m∈ℕ} the eigenvalues and (normalized) eigenfunctions of the Poincaré operator. Let, for m ∈ ℕ, α_m = (1 + λ_m)⁻¹. Then,

1. H¹(µ), with its usual Hilbert norm ‖f‖²_{H¹(µ)} = ‖f‖² + ‖f′‖², is a RKHS. Its kernel K is continuous on [a, b]² and verifies ∫_a^b K(x, y) dµ(y) = 1 for all x ∈ [a, b].

2. The Mercer decomposition of K is written
$$K(x, y) = \sum_{m=0}^{\infty} \alpha_m \varphi_m(x)\varphi_m(y) \qquad (x, y \in [a, b]) \tag{3.1}$$
where the convergence is uniform on [a, b]².

3. K is a single-pair kernel (Gantmakher and Krejn, 2002), meaning that it has the form
$$K(x, y) = \frac{1}{C}\,\psi(\min(x, y))\,\chi(\max(x, y)) \qquad (x, y \in [a, b]). \tag{3.2}$$
Furthermore, ψ, χ are two linearly independent solutions of the homogeneous equation f″ − f′V′ − f = 0 such that ψ′(a) = 0 and χ′(b) = 0, and $C = \chi(b)\int_a^b \psi(x)\,d\mu(x) = \psi(a)\int_a^b \chi(y)\,d\mu(y)$ is a normalization constant.

9
Proof. Under the assumption on µ, for all f ∈ L²[a, b],
$$m(b-a)\int_a^b f^2\,d\lambda \ \le\ \int_a^b f^2\,d\mu \ \le\ M(b-a)\int_a^b f^2\,d\lambda,$$
where m, M are defined in Remark 1 and λ is the uniform distribution on [a, b]. Thus L²(µ) = L²(a, b) (with an equivalent norm). Similarly H¹(µ) = H¹(a, b) (with an equivalent norm). Now, it is well known that H¹(a, b) is a RKHS, with an explicit kernel (see e.g. Duc-Jacquet, 1973; Thomas-Agnan, 1996). Thus, for all x ∈ [a, b], the evaluation f ∈ H¹(a, b) ↦ f(x) is continuous. By equivalence of the norms, the identity map f ∈ H¹(µ) ↦ f ∈ H¹(a, b) is continuous. Hence by composition, f ∈ H¹(µ) ↦ f(x) is continuous. This shows that H¹(µ) is a RKHS. Let us denote by K its kernel.
The link between K and the Poincaré inequality can be understood through the bilinear form a(f, g) = ⟨f, g⟩_{H¹(µ)}. Consider the spectral problem: find f ∈ H¹(µ) and β such that for all g ∈ H¹(µ),
$$a(f, g) = \beta \langle f, g \rangle. \tag{3.3}$$
From (Roustant et al., 2017, Theorem 2 and its proof), under the assumption on µ, there exists a countable sequence of solutions, which is given by β_m = 1 + λ_m and ϕ_m (m ∈ ℕ). Notice that β_m > 0 for all m ∈ ℕ. Furthermore, β_m, ϕ_m are defined in a unique way (up to a sign change of the eigenfunctions) because the eigenvalues are simple and the eigenfunctions have norm 1.
Now, since H¹(µ) is a RKHS, the functions K(x, ·) (x ∈ [a, b]) span a dense subspace of H¹(µ). Thus, Problem (3.3) is equivalent to:
$$\forall x \in [a, b], \quad a(f, K(x, \cdot)) = \beta \langle f, K(x, \cdot) \rangle.$$
By the reproducing property, a(f, K(x, ·)) = ⟨f, K(x, ·)⟩_{H¹(µ)} = f(x). Hence, (3.3) is equivalent to: find f ∈ H¹(µ) and β such that
$$\forall x \in [a, b], \quad f(x) = \beta \int_a^b K(x, y) f(y)\,d\mu(y), \tag{3.4}$$

which is equivalent to the spectral decomposition of the Hilbert-Schmidt operator associated to K (recall that β > 0).
Moreover, by Proposition 1 and its proof, Problem (3.3) is equivalent to the regular Sturm-Liouville problem (2.1). Thus, from (Gantmakher and Krejn, 2002, Section 10, pages 234-238) or (Fasshauer, 2012), we also obtain that the solution of (2.1) is equivalent to the solution of (3.4). In this context, K is called a Green function. This point of view gives sharp information and leads to
$$K(x, y) = \frac{1}{C}\,\psi(\min(x, y))\,\chi(\max(x, y)) \qquad (x, y \in [a, b]),$$
where ψ, χ are two linearly independent solutions of the homogeneous equation L(f) = 0 such that ψ′(a) = 0 and χ′(b) = 0 (Gantmakher and Krejn, 2002, Section 7). The constant C is determined such that ∫_a^b K(x, y) dµ(y) = 1 for all x ∈ [a, b]. Indeed, as the constant function 1 belongs to H¹(µ), the RKHS reproducing property gives, for all x ∈ [a, b]:
$$1 = \mathbf{1}(x) = \langle \mathbf{1}, K(x, \cdot) \rangle_{H^1(\mu)} = \int_a^b K(x, y)\,d\mu(y).$$
For instance, choosing x = b or x = a, we obtain $C = \chi(b)\int_a^b \psi(x)\,d\mu(x) = \psi(a)\int_a^b \chi(y)\,d\mu(y)$.
Now ψ and χ are continuous, as elements of H¹(µ) (whose functions are equal to those of H¹(a, b)). As min, max are continuous functions, we obtain, by composition, that K is continuous on [a, b]². Hence, by Mercer's theorem (see e.g. Berlinet and Thomas-Agnan, 2011), K is written in terms of the solutions of (3.4) as
$$K(x, y) = \sum_{m \in \mathbb{N}} \alpha_m \varphi_m(x)\varphi_m(y) \qquad (x, y \in [a, b]) \tag{3.5}$$
with α_m = 1/β_m = 1/(1 + λ_m), and the convergence is uniform on [a, b]².

Remark 5. We mention another way to obtain the Mercer representation of K. A property of the Poincaré basis is that it is an orthogonal basis of H¹(µ) with ‖ϕ_m‖²_{H¹(µ)} = 1 + λ_m = 1/α_m (Lüthen et al., 2021). Thus, the functions $e_m = \sqrt{\alpha_m}\,\varphi_m$ (m ≥ 0) define an orthonormal basis of H¹(µ). Then, the representation (3.5) is obtained with the usual representation of a kernel in a separable RKHS (Berlinet and Thomas-Agnan, 2011):
$$K(x, y) = \sum_{m \in \mathbb{N}} \alpha_m \varphi_m(x)\varphi_m(y) = \sum_{m \in \mathbb{N}} e_m(x) e_m(y).$$
However, this approach only yields pointwise convergence, so it is not clear from it whether K is continuous on [a, b]². Thus, another argument, coming from the Green function point of view, has been used here to prove the kernel continuity.
Remark 6. At first look, it may be surprising that the Neumann conditions f′(a) = f′(b) = 0 do not appear in the RKHS, whereas all basis functions ϕ_m satisfy them (while being dense in L²(µ)). Actually, it is not difficult to see that any function f of H¹(µ) can be approximated by a function f_ε ∈ H¹(µ) that verifies the Neumann conditions, simply by truncating f on [a + ε, b − ε] and extending it continuously by a constant on [a, a + ε] and [b − ε, b]. As functions of H¹(µ) are continuous on the compact interval [a, b] (still under our assumption on µ), the approximation error can be made arbitrarily small.

3.2 Examples
Example 1 (Case of the uniform distribution). Let µ be the uniform distribution on [a, b]. Then L is the Laplacian operator, and the spectral problem becomes
$$\varphi''(x) = -\lambda\,\varphi(x) \qquad \forall x \in [a, b],$$
with Neumann conditions ϕ′(a) = ϕ′(b) = 0. The solutions are given by λ₀ = 0, ϕ₀ = 1 and, for m ≥ 1,
$$\lambda_m = m^2\omega^2, \qquad \varphi_m(x) = \sqrt{2}\,\cos(m\omega(x - a)),$$
with ω = π/(b − a). The kernel of H¹(µ) (with its usual norm) was obtained by Duc-Jacquet (1973) in the 70's. English-written proofs can be found in Atteia (1992), Example 1.4, or Thomas-Agnan (1996). The kernel may be written as
$$K(x, y) = \frac{b - a}{\sinh(b - a)}\,\cosh[\min(x, y) - a]\,\cosh[b - \max(x, y)]$$
where cosh, sinh denote the hyperbolic functions: cosh(x) = (eˣ + e⁻ˣ)/2, sinh(x) = (eˣ − e⁻ˣ)/2. Applying Proposition 4, we deduce that for all (x, y) ∈ [a, b]² such that x ≤ y,
$$K(x, y) = \frac{\pi/\omega}{\sinh(\pi/\omega)}\,\cosh(x - a)\cosh(b - y) = 1 + 2\sum_{m=1}^{+\infty} \frac{1}{1 + m^2\omega^2}\,\cos[m\omega(x - a)]\,\cos[m\omega(y - a)].$$

As a by-product, we can derive the value of some 'shifted' Riemann series. For instance, from x = y = a and x = a, y = b, and using r = 1/ω, we get the formulas (with tanh(x) = sinh(x)/cosh(x)), valid for all r > 0:
$$\sum_{n=1}^{+\infty} \frac{1}{n^2 + r^2} = \frac{1}{2r^2}\left( \frac{\pi r}{\tanh(\pi r)} - 1 \right), \qquad \sum_{n=1}^{+\infty} \frac{(-1)^{n-1}}{n^2 + r^2} = \frac{1}{2r^2}\left( 1 - \frac{\pi r}{\sinh(\pi r)} \right).$$
It can be shown that these formulas are also valid when r tends to zero. The limit case gives the well-known expressions at s = 2 of the Riemann and Dirichlet functions: ζ(2) = π²/6 and η(2) = π²/12.
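A short numerical check of Example 1 (a sketch, assuming a = 0 and b = 1): the truncated Mercer sum should converge uniformly to the closed-form kernel, and the first series identity can be tested directly.

```python
import numpy as np

a, b = 0.0, 1.0
om = np.pi / (b - a)

def K_closed(x, y):  # closed-form kernel of H^1(a, b) (Duc-Jacquet, 1973)
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    return (b - a) / np.sinh(b - a) * np.cosh(lo - a) * np.cosh(b - hi)

def K_mercer(x, y, M=5000):  # truncated Mercer expansion (3.1)
    m = np.arange(1, M + 1)
    terms = np.cos(np.outer(m, om * (x - a))) * np.cos(np.outer(m, om * (y - a)))
    return 1.0 + 2.0 * np.sum(terms / (1.0 + (m * om)[:, None] ** 2), axis=0)

x = np.linspace(a, b, 7)
print(np.max(np.abs(K_closed(x, x) - K_mercer(x, x))))  # -> 0 as M grows

# 'shifted' Riemann series identity, here with r = 1
r = 1.0
lhs = np.sum(1.0 / (np.arange(1, 200000) ** 2 + r ** 2))
rhs = (np.pi * r / np.tanh(np.pi * r) - 1.0) / (2 * r ** 2)
print(lhs, rhs)  # agree up to the truncation of the series
```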

As for the classical space H 1 (a, b), the kernel of H 1 (µ) and its Mercer representation are completely
tractable for the truncated exponential distribution.

Example 2 (Truncated exponential distribution). Consider the exponential distribution, truncated on [a, b] ⊆ ℝ₊:
$$\frac{d\mu}{dx} = \exp(-V(x)) = \frac{e^{-x}}{e^{-a} - e^{-b}}\,\mathbf{1}_{[a,b]}(x).$$
Noting that V′(x) = 1 on [a, b] leads to linear differential equations with constant coefficients. Following Roustant et al. (2017, Proof of Proposition 5), the spectral problem
$$\varphi'' - \varphi' = -\lambda\varphi$$
with Neumann conditions ϕ′(a) = ϕ′(b) = 0 admits the solutions λ₀ = 0, ϕ₀ ≡ 1 and, for m ≥ 1,
$$\lambda_m = \frac{1}{4} + (m\omega)^2, \qquad \varphi_m(x) = c_m e^{x/2}\left( 2m\omega \cos(m\omega(x - a)) - \sin(m\omega(x - a)) \right) \tag{3.6}$$
where ω = π/(b − a) and c_m is a normalizing constant ensuring that ϕ_m has L²(µ) norm equal to 1. More precisely, we have
$$c_m = \left( \frac{e^{-a} - e^{-b}}{b - a}\,\frac{1}{2\lambda_m} \right)^{1/2}.$$
From Proposition 4 and following Gantmakher and Krejn (2002), the Green function associated to this spectral problem can be computed by considering the linear homogeneous equation
$$\varphi'' - \varphi' - \varphi = 0.$$
The solutions are spanned by $e^{r_0 x}$, $e^{r_1 x}$, where $r_0 = \frac{1 - \sqrt{5}}{2}$ and $r_1 = \frac{1 + \sqrt{5}}{2}$ (the golden ratio). Solutions ψ, χ satisfying the one-sided Neumann conditions ψ′(a) = 0 and χ′(b) = 0 are given by
$$\psi(x) = (r_1 e^{r_1 a})\,e^{r_0 x} - (r_0 e^{r_0 a})\,e^{r_1 x}, \qquad \chi(x) = (r_1 e^{r_1 b})\,e^{r_0 x} - (r_0 e^{r_0 b})\,e^{r_1 x}. \tag{3.7}$$
Finally, the normalization constant is given by $C = \int_a^b \psi(x)\chi(b)\,d\mu(x)$. After some algebra, we obtain
$$C = \frac{(e^{r_0 a + r_1 b} - e^{r_0 b + r_1 a})(r_1 - r_0)}{e^{-a} - e^{-b}}.$$
The kernel of H¹(µ) is then given explicitly by its single-pair form (3.2) or by its Mercer representation (3.1), where α_m = (1 + λ_m)⁻¹ and λ_m, ϕ_m are given by (3.6).
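The two representations can be cross-checked numerically. The sketch below (not from the paper, assuming a = 0 and b = 3) evaluates the single-pair form (3.2) with (3.7) and the truncated Mercer series built from (3.6); the difference shrinks as the truncation order grows.

```python
import numpy as np

a, b = 0.0, 3.0
om = np.pi / (b - a)
r0, r1 = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2   # roots of r^2 - r - 1 = 0

lam = lambda m: 0.25 + (m * om) ** 2                  # eigenvalues (3.6)

def phi(m, x):                                        # eigenfunctions (3.6)
    if m == 0:
        return np.ones_like(x)
    c = np.sqrt((np.exp(-a) - np.exp(-b)) / ((b - a) * 2 * lam(m)))
    return c * np.exp(x / 2) * (2 * m * om * np.cos(m * om * (x - a))
                                - np.sin(m * om * (x - a)))

psi = lambda t: r1 * np.exp(r1 * a + r0 * t) - r0 * np.exp(r0 * a + r1 * t)
chi = lambda t: r1 * np.exp(r1 * b + r0 * t) - r0 * np.exp(r0 * b + r1 * t)
C = ((np.exp(r0 * a + r1 * b) - np.exp(r0 * b + r1 * a)) * (r1 - r0)
     / (np.exp(-a) - np.exp(-b)))

K_pair = lambda x, y: psi(np.minimum(x, y)) * chi(np.maximum(x, y)) / C  # (3.2)

def K_mercer(x, y, M=20000):                          # truncated Mercer sum (3.1)
    s = np.ones_like(x)                               # m = 0 term (alpha_0 = 1)
    for m in range(1, M + 1):
        s = s + phi(m, x) * phi(m, y) / (1 + lam(m))
    return s

x = np.linspace(a, b, 5)
print(np.max(np.abs(K_pair(x, x) - K_mercer(x, x))))  # small, -> 0 as M grows
```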

4 Poincaré quadrature and optimal kernel quadrature in H¹(µ)


Proposition 4 shows that the spectral decomposition associated to Poincaré inequalities is in correspondence
with the spectral decomposition of the kernel of H 1 (µ), viewed as a RKHS. However, the kernel of H 1 (µ) is in
general not directly available, which makes optimal kernel quadrature in H 1 (µ) intractable. This motivates
us to focus on the quadratures defined from the Poincaré basis, as eigenfunctions of H 1 (µ).

4.1 Definitions and notations


Definition 8 (Poincaré quadrature). We call Poincaré quadrature the Gaussian quadrature of the T-system given by the Poincaré basis of µ, as defined in Definition 5. We denote it by (X_P, w_P).
We now establish a connection between the Poincaré quadrature and the kernel quadrature spanned by
the Poincaré basis functions. To reach this goal, we first check that these kernel quadratures are properly
defined, by showing that Assumption 1 is verified both for K and its finite-dimensional approximation KM ,
defined below.
Definition 9 (Finite-dimensional kernel for H¹(µ)). Let M be a non-zero integer, and consider the notations of Proposition 4 and its assumptions. We set
$$K_M(x, y) = \sum_{m=0}^{M} \alpha_m \varphi_m(x)\varphi_m(y) \qquad (x, y \in [a, b]), \tag{4.1}$$
the truncated Mercer representation of the kernel of H¹(µ).

Proposition 5 (Invertibility of the Gram matrix for the truncated Mercer representation of H¹(µ)). Assumption 1 is verified for K_M when X contains at most M + 1 distinct points.
Proof. Let x₀, …, x_n be a set of distinct points with n ≤ M. For i = 0, …, n, denote e_i = K_M(x_i, ·) ∈ H¹(µ). Then K_M(x_i, x_j) = ⟨e_i, e_j⟩_{H¹(µ)}. Thus K_M(X, X) is the Gram matrix of the vectors e_i for the scalar product in H¹(µ). By a classical result, it is invertible if and only if e₀, e₁, …, e_n are linearly independent. It is enough to prove that e₀, e₁, …, e_M are linearly independent, since n ≤ M. Now, by definition, we have
$$e_i = K_M(x_i, \cdot) = \sum_{m=0}^{M} A_{i,m}\,\varphi_m(\cdot)$$
with A_{i,m} = α_m ϕ_m(x_i). Hence, if e = (e₀, …, e_M)ᵀ and ϕ = (ϕ₀, …, ϕ_M)ᵀ are column vectors whose elements are in H¹(µ), then we have e = Aϕ. Remember that the ϕ_m's are linearly independent (m ≥ 0), as they form a Hilbert basis of L²(µ). Thus the coordinates of e are linearly independent if and only if A is invertible. Remark that the m-th column of A is proportional to the vector (ϕ_m(x_i))_{0≤i≤M}, with a non-zero multiplicative coefficient α_m. Hence, the rank of A is equal to the rank of the matrix (ϕ_m(x_i))_{0≤m,i≤M}. As the ϕ_m's form a T-system, this matrix is invertible, which completes the proof.
Proposition 6 (Invertibility and form of the Gram matrix for H¹(µ)). Let K be the kernel of H¹(µ), where µ is a probability distribution in B. Then, Assumption 1 is verified for every set X composed of distinct nodes. Furthermore, in that case, if the elements of X are sorted in ascending (or descending) order, the precision matrix K(X, X)⁻¹ is a one-band matrix (or Jacobi matrix) of the form:
$$\begin{pmatrix}
a_1 & b_1 & 0 & \cdots & \cdots & 0 \\
b_1 & a_2 & b_2 & 0 & & \vdots \\
0 & b_2 & a_3 & b_3 & \ddots & \vdots \\
\vdots & \ddots & \ddots & \ddots & \ddots & 0 \\
\vdots & & 0 & b_{n-2} & a_{n-1} & b_{n-1} \\
0 & \cdots & \cdots & 0 & b_{n-1} & a_n
\end{pmatrix}$$
Proof. By Proposition 4 and Theorem 16 (page 238) of Gantmakher and Krejn (2002), the single-pair kernel K is oscillatory in the sense of Definition 1, page 178, of the same reference, implying in particular that for all X composed of distinct nodes, K(X, X) is invertible. The form of the precision matrix is derived in Section II.3, Example 6, pages 79-82, of the same reference.
Remark 7 (Link with Gaussian graphical models). The one-band form of the precision matrix of Proposition 6 shows that a Gaussian process associated to the kernel of H¹(µ) is a Gaussian graphical model (Maathuis et al., 2018). Indeed, consider the graph whose vertices are the quadrature nodes sorted in ascending order x₁ < ⋯ < x_n and whose edges are formed by successive nodes (x_i, x_{i+1}), i = 1, …, n − 1. An associated Gaussian process Z verifies that, for |i − j| > 1, Z(x_i) and Z(x_j) are independent conditionally on {Z(x_ℓ) s.t. ℓ ∉ {i, j}}. This is equivalent to the one-band form of the precision matrix of Z computed at x₁, …, x_n.

4.2 Equivalence of Poincaré and optimal kernel quadratures in H¹(µ)


In the previous section, we proved that kernel quadrature is well defined both for the kernel K of H¹(µ) and for its finite-dimensional approximation K_M. Now, we show that the Poincaré quadrature can be viewed as an optimal kernel quadrature with positive weights for the finite-dimensional approximation of the kernel of H¹(µ) in its Mercer representation.

Proposition 7 (Equivalence of Poincaré quadrature and optimal kernel quadrature in H¹(µ)). Let (X_P, w_P) be the Poincaré quadrature of H¹(µ) with n nodes and order M = 2n − 1. Then, wce(X_P, w_P, K_M) = 0 and (X_P, w_P) is an optimal kernel quadrature for H_{K_M}, with positive weights.
Conversely, if (X, w) defines a kernel quadrature for H_{K_M} such that wce(X, w, K_M) = 0 and the weights are positive, then X = X_P and w = w_P.
Furthermore, w_P minimizes the worst-case error over all (possibly negative) weights given X_P:
$$w_P = w^\star(X_P, K_M) = \operatorname{argmin}_{w \in \mathbb{R}^n}\ \mathrm{wce}(X_P, w, K_M).$$

Remark 8. As a consequence of Proposition 7, the optimal weights w⋆(X_P, K_M) are equal to w_P, and are thus positive and sum to 1, which was not obvious a priori as they are defined by an optimization problem on ℝⁿ.
Proof. Let us first consider the Poincaré quadrature. We know that
$$\int \varphi_m(x)\,d\mu(x) = \sum_{j=1}^{n} w_j\,\varphi_m(x_j), \qquad m = 0, \dots, M. \tag{4.2}$$
Let x′ ∈ [a, b]. By linearity, we deduce from (4.2) that
$$\sum_{m=0}^{M} \alpha_m \varphi_m(x') \int \varphi_m(x)\,d\mu(x) = \sum_{m=0}^{M} \alpha_m \varphi_m(x') \sum_{j=1}^{n} w_j \varphi_m(x_j),$$
or equivalently
$$\int \sum_{m=0}^{M} \alpha_m \varphi_m(x')\varphi_m(x)\,d\mu(x) = \sum_{j=1}^{n} w_j \left( \sum_{m=0}^{M} \alpha_m \varphi_m(x_j)\varphi_m(x') \right),$$
i.e.
$$\int K_M(x, x')\,d\mu(x) = \sum_{j=1}^{n} w_j\,K_M(x_j, x').$$
This proves that wce(X_P, w_P, K_M) = 0.

Conversely, let (X, w) be a kernel quadrature for H_{K_M} such that wce(X, w, K_M) = 0 and with positive weights. By definition of the worst-case error, this implies that for all f in the unit ball of H_{K_M}, ∫ f(x) dµ(x) = Σᵢ w_i f(x_i). By considering f/‖f‖_{H_{K_M}}, this identity is true for all f ∈ H_{K_M}. In particular, for f = ϕ_m with m ≤ M, we deduce that the quadrature defined by (X, w) is exact for all ϕ_m such that m ≤ M. Thus, by uniqueness of the Gaussian quadrature of the T-system (ϕ_m)_{m∈ℕ} (Proposition 2), we deduce that X = X_P and w = w_P.
Now, denote X_P = {x₁, …, x_n}. Recall that the nodes x_i are all different by a property of the (generalized) Gaussian quadrature. By Proposition 5, the Gram matrix (K_M(x_i, x_j))_{1≤i,j≤n} is then invertible. This implies that the minimization problem
$$\min_{w}\ \mathrm{wce}(X_P, w, K_M)$$
has a unique solution w⋆(X_P, K_M). Since wce(X_P, w_P, K_M) = 0 and wce(X_P, w, K_M) ≥ 0 for all w, we obtain w⋆(X_P, K_M) = w_P, which concludes the proof.

4.3 Formulas for optimal weights


Exploiting the equivalence of Poincaré quadrature and kernel quadrature, we obtain several formulas for the
optimal weights.
Proposition 8 (Expression of the optimal weights and associated worst-case error for H 1 (µ)). Let K be the
kernel of H 1 (µ). Then, for all set X composed of n distinct quadrature knots, we have

w? (X, K) = K(X, X)−1 1, (4.3)

where 1 is the vector of ones of length n, and


n
X
(wce(X, w? (X, K), K))2 = 1 − 1> K(X, X)−1 1 = 1 − wi? (X, K). (4.4)
i=1

14
Similarly, when X is formed by at most M + 1 distinct points, Equations (4.3) and (4.4) are true when
replacing K by KM . In particular, let (XP , wP ) be the Poincaré quadrature of H 1 (µ) with n nodes and order
M = 2n − 1. Then we have:
wP = w? (XP , KM ) = KM (XP , XP )−1 1. (4.5)
.
Proof. First recall that Assumption 1 is verified both for K and K_M (Prop. 6 and 5), in the latter case when X has at most M + 1 points. Let us first consider the case of the kernel of H¹(µ). By Proposition 4, 1., we have, for all y ∈ [a, b]:
$$\ell_K(y) = \int_a^b K(x, y)\,d\mu(x) = 1.$$
From (2.7), we then deduce (4.3). Finally, from (2.8), we deduce (4.4), using that $c_K = \int_a^b \ell_K(y)\,d\mu(y) = 1$.
The same proof applies when K = K_M, remarking that, by orthonormality of the Poincaré basis,
$$\ell_{K_M}(y) = \int_a^b K_M(x, y)\,d\mu(x) = \sum_{m=0}^{M} \alpha_m \varphi_m(y) \int_a^b \varphi_m(x)\,d\mu(x) = 1.$$
This implies that formulas (4.3) and (4.4) hold when K is replaced by K_M.
Now, let us set X = X_P. From Proposition 7, we have w_P = w⋆(X_P, K_M). This implies (4.5). Notice that, as the weights w_P sum to one, we recover one statement of Proposition 7, namely that wce(X_P, w_P, K_M) = 0.
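Formulas (4.3)-(4.4) are straightforward to evaluate once the kernel is available. As a sketch (not code from the paper), we apply them with the closed-form uniform kernel of Example 1 at the midpoint nodes; the weights come out equal and close to 1/n, and the worst-case error matches the explicit values derived in Section 5.

```python
import numpy as np

# Optimal weights (4.3) and worst-case error (4.4) for the explicit kernel of
# H^1(0, 1) (Example 1), evaluated at the midpoint nodes of Section 5.
a, b, n = 0.0, 1.0, 5

def K(x, y):
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    return (b - a) / np.sinh(b - a) * np.cosh(lo - a) * np.cosh(b - hi)

X = a + (np.arange(1, n + 1) - 0.5) * (b - a) / n   # midpoint nodes
G = K(X[:, None], X[None, :])                       # Gram matrix
w = np.linalg.solve(G, np.ones(n))                  # (4.3)
wce = np.sqrt(1.0 - np.sum(w))                      # (4.4)
print("w  =", w)    # equal weights, 2*tanh(1/(2n)) ~ 1/n (cf. Section 5.3)
print("wce=", wce)  # cf. the radius of information r(n) in (5.2)
```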

4.4 Quadrature error

The quadrature error can be quantified using the radius of information, which is defined as the smallest worst-case error of a kernel quadrature with n nodes:
$$r(n) = \inf_{X, w}\ \mathrm{wce}(X, w, H^1(\mu)). \tag{4.6}$$
In our case, the radius of information can hardly be computed. However, we can consider the Poincaré quadrature, which is the optimal kernel quadrature with positive weights for the finite-dimensional approximation of H¹(µ), and compute the corresponding worst-case error in H¹(µ). More precisely, if (X_P, w_P) denotes the Poincaré quadrature with n nodes and order M = 2n − 1, we set:
$$\mathrm{wce}(n) = \mathrm{wce}(X_P, w_P, H^1(\mu)). \tag{4.7}$$
By definition we have wce(n) ≥ r(n). Furthermore, when n is large, K_M tends to K and we can hope that wce(n) is a good approximation of r(n). We now provide formulas for wce(n).
Proposition 9 (Quadrature error). The worst-case error of the Poincaré quadrature with n nodes and order M = 2n − 1 can be expressed with the Mercer representation of H¹(µ), by:
$$\mathrm{wce}(n)^2 = \sum_{m \ge M+1} \alpha_m \left( \sum_{i=1}^{n} w_i\,\varphi_m(x_i) \right)^2, \tag{4.8}$$
or with formulas involving the kernel of H¹(µ):
$$\mathrm{wce}(n)^2 = w_P^\top \left( K(X_P, X_P) - K_M(X_P, X_P) \right) w_P \tag{4.9}$$
$$\qquad\quad\ = \mathbf{1}^\top K_M(X_P, X_P)^{-1} \left( K(X_P, X_P) - K_M(X_P, X_P) \right) K_M(X_P, X_P)^{-1}\,\mathbf{1}. \tag{4.10}$$
Furthermore, we have, for all n ∈ ℕ,
$$\mathrm{wce}(n) \le \sqrt{\|K - K_{2n-1}\|_\infty},$$
which goes to zero when n tends to infinity.

Proof of Proposition 9. Denote by L the linear form on H_K defined by
$$L(f; X, w) = \int f(x)\,d\mu(x) - \sum_{i=1}^{n} w_i f(x_i)$$
and set
$$R_M(x, x') = \sum_{m \ge M+1} \alpha_m \varphi_m(x)\varphi_m(x'). \tag{4.11}$$
Notice that K = K_M + R_M. For the Poincaré quadrature, we have:
$$L(K_M(\cdot, x'); X_P, w_P) = \sum_{m=0}^{M} \alpha_m \varphi_m(x') \left( \int \varphi_m(x)\,d\mu(x) - \sum_{i=1}^{n} w_i \varphi_m(x_i) \right) = 0,$$
as the quadrature is exact for the eigenfunctions up to order M. Hence, by linearity, it holds for all x′:
$$L(K(\cdot, x'); X_P, w_P) = L(R_M(\cdot, x'); X_P, w_P).$$
Now, remark that for all quadrature formulas (X, w),
$$\mathrm{wce}(X, w, K) = \left\| x' \mapsto L(K(\cdot, x'); X, w) \right\|_{H_K}.$$
Therefore, since wce(X_P, w_P, K_M) = 0, the quantity of interest reduces to:
$$\mathrm{wce}(n) = \mathrm{wce}(X_P, w_P, K) = \mathrm{wce}(X_P, w_P, R_M).$$
Now, when m ≥ M + 1, we have ∫ ϕ_m dµ = 0. Thus,
$$\mathrm{wce}(n)^2 = \left\| x' \mapsto L(R_M(\cdot, x'); X_P, w_P) \right\|^2_{H_K} = \left\| \sum_{m \ge M+1} \alpha_m \left( \sum_{i=1}^{n} w_i \varphi_m(x_i) \right) \varphi_m \right\|^2_{H_K}.$$
As (ϕ_m) is an orthogonal basis of H¹(µ) with ‖ϕ_m‖²_{H_K} = 1 + λ_m = α_m⁻¹, we immediately obtain
$$\mathrm{wce}(n)^2 = \sum_{m \ge M+1} \alpha_m \left( \sum_{i=1}^{n} w_i \varphi_m(x_i) \right)^2.$$
Remarking that $R_M(x_i, \cdot) = \sum_{m \ge M+1} \alpha_m \varphi_m(x_i)\varphi_m(\cdot)$, we get
$$\mathrm{wce}(n)^2 = \left\| \sum_{i=1}^{n} w_i R_M(x_i, \cdot) \right\|^2_{H_K},$$
which gives, using the reproducing property (as R_M(x_i, ·) ∈ H_K),
$$\mathrm{wce}(n)^2 = \sum_{1 \le i, j \le n} w_i w_j\,R_M(x_i, x_j),$$
which gives (4.9). Using (4.5), we deduce (4.10). By Mercer's theorem, as K is continuous, the series Σ_m α_m ϕ_m(x)ϕ_m(y) converges to K(x, y) uniformly on [a, b] × [a, b]. Thus R_M goes to zero uniformly on [a, b]², and, using the positivity of the weights,
$$\mathrm{wce}(n)^2 \le \left( \sum_{1 \le i, j \le n} w_i w_j \right) \|R_M\|_\infty.$$
The result follows by remarking that $\sum_{1 \le i, j \le n} w_i w_j = \left( \sum_{i=1}^{n} w_i \right)^2 = 1$.

5 The case of H¹(a, b)
5.1 Nodes coincide with zeros of a basis function
For polynomials, the nodes of the Gaussian quadrature coincide with the zeros of orthogonal polynomials (Szegö, 1959). The main result of this section can be viewed as a variant of this property on the circle. Indeed, for the uniform distribution, the Poincaré basis consists of trigonometric functions that verify a stability-by-product property, similarly to power functions. The key argument is that the quadrature is not only exact for the functions of the T-system (up to some order) but also for their products. This guarantees that the quadrature nodes coincide with the zeros of an element of the T-system.
Proposition 10. The nodes of the Poincaré quadrature of H¹(a, b) with n nodes are equal to the zeros of the Poincaré basis function ϕ_n.
Proof. We first prove that the Poincaré quadrature with n nodes is exact for the product ϕ_j ϕ_k of eigenfunctions with j, k ∈ ℕ such that j < n and k ≤ n.
Recall that when µ is the uniform distribution on [a, b], we have ϕ_m(x) = √2 cos(mω(x − a)) for m ≥ 1, with ω = π/(b − a), and ϕ₀ = 1. Using the trigonometric identity cos(u)cos(v) = ½(cos(u + v) + cos(u − v)), it follows that, for k > j ≥ 1,
$$\varphi_k(x)\varphi_j(x) = \frac{1}{\sqrt{2}}\left( \varphi_{k+j}(x) + \varphi_{k-j}(x) \right),$$
while $\varphi_k^2 = \varphi_0 + \frac{1}{\sqrt{2}}\varphi_{2k}$ and ϕ_k ϕ₀ = ϕ_k; in every case, ϕ_k ϕ_j is a linear combination of ϕ_{k+j} and ϕ_{k−j}.
By Definition 5, the Gaussian quadrature of the T-system (ϕ_m)_{m∈ℕ} with n nodes is exact for ϕ_m for 0 ≤ m ≤ 2n − 1. Let j, k ∈ ℕ such that j < n, k ≤ n. Without loss of generality, assume k ≥ j. This implies that both k + j and k − j are non-negative and lower than or equal to 2n − 1. Thus, the quadrature is exact for ϕ_{k+j} and ϕ_{k−j}, hence by linearity for ϕ_k ϕ_j, and we have:
$$\delta_{kj} = \int \varphi_k(x)\varphi_j(x)\,d\mu(x) = \sum_{i=1}^{n} w_i\,\varphi_k(x_i)\varphi_j(x_i).$$

This proves that the Poincaré quadrature with n nodes is exact for ϕ_j ϕ_k, with j < n and k ≤ n. We now deduce that the quadrature nodes coincide with the zeros of ϕ_n.
Following (Karlin and Studden, 1966, p. 20), as (ϕ_m)_{m∈ℕ} is a T-system, there exists a non-zero generalized polynomial p_n = α₀ϕ₀ + ⋯ + α_nϕ_n that vanishes at x₁, …, x_n. It can be defined from the generalized Vandermonde matrix by:
$$p_n(x) = \det V(\varphi_0, \dots, \varphi_{n-1}, \varphi_n;\ x_1, \dots, x_n, x).$$
Notice that α_n ≠ 0. Indeed, by expanding the determinant, we have α_n = det V(ϕ₀, …, ϕ_{n−1}; x₁, …, x_n), which is non-zero by the T-system property. Now, let j ∈ ℕ with j < n. By the first part above, for all k ∈ ℕ such that k ≤ n, the Poincaré quadrature is exact for ϕ_j ϕ_k. This implies that p_n is orthogonal to ϕ_j:
$$\int p_n(x)\varphi_j(x)\,d\mu(x) = \sum_{k=0}^{n} \alpha_k \int \varphi_k(x)\varphi_j(x)\,d\mu(x) = \sum_{k=0}^{n} \alpha_k \sum_{i=1}^{n} w_i\,\varphi_k(x_i)\varphi_j(x_i) = \sum_{i=1}^{n} w_i \underbrace{\left( \sum_{k=0}^{n} \alpha_k \varphi_k(x_i) \right)}_{=\,p_n(x_i)\,=\,0} \varphi_j(x_i) = 0.$$
Thus p_n is orthogonal to ϕ₀, …, ϕ_{n−1}. Since (ϕ_m)_{m∈ℕ} is a system of orthogonal functions, we deduce that p_n = α_nϕ_n. As α_n ≠ 0, we have ϕ_n = α_n⁻¹ p_n. From the definition of p_n, this implies that ϕ_n is zero at the quadrature nodes, which was to be proved.

17
5.2 Explicit quadrature formulas and quadrature error
We first give a standard result about trigonometric sums.
Lemma 1. For all m ∈ ℤ and all n ∈ ℕ*,
$$\sum_{i=1}^{n} \cos\left( \left(i - \frac{1}{2}\right)\frac{m\pi}{n} \right) = \begin{cases} 0 & \text{if } m \text{ is not a multiple of } 2n, \\ n(-1)^p & \text{if } m = 2np \text{ for some } p \in \mathbb{Z}. \end{cases}$$
Proof. Let x = (mπ)/n. If m is a multiple of 2n, then x = 2pπ for some p ∈ ℤ. Then
$$\sum_{i=1}^{n} \cos\left( \left(i - \frac{1}{2}\right)x \right) = \sum_{i=1}^{n} \cos(-p\pi) = \sum_{i=1}^{n} (-1)^p = n(-1)^p.$$
Now, assume that m is not a multiple of 2n. Then x/2 is not a multiple of π and sin(x/2) ≠ 0. Using the trigonometric identity 2 cos(a) sin(b) = sin(a + b) − sin(a − b), we have
$$2\cos\left( \left(i - \frac{1}{2}\right)x \right)\sin\frac{x}{2} = \sin(ix) - \sin((i-1)x).$$
Summing with respect to i from 1 to n, the right-hand side telescopes to sin(nx) = sin(mπ) = 0, which gives the announced result.
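A two-line numerical check of Lemma 1 (an illustration, with n = 5): the sum vanishes except when m is a multiple of 2n = 10.

```python
import numpy as np

n = 5
s = lambda m: np.sum(np.cos((np.arange(1, n + 1) - 0.5) * m * np.pi / n))
print({m: round(float(s(m)), 10) for m in range(21)})  # nonzero only at m = 0, 10, 20
```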
We can now deduce the two main propositions of the section.
Proposition 11. The Poincaré quadrature of H¹(a, b) with n nodes corresponds to the midpoint (or rectangle) quadrature rule
$$\int_a^b f(x)\,dx = \frac{b-a}{n}\sum_{i=1}^{n} f\left( a + \left(i - \frac{1}{2}\right)\frac{b-a}{n} \right).$$
Thus, the optimal weights are equal to 1/n, and the optimal nodes are evenly spaced on [a, b] and located at the middle of each interval [a + (i−1)(b−a)/n, a + i(b−a)/n], i = 1, …, n. This quadrature has order 2n − 1 with respect to the generalized polynomials: it is exact for all ϕ_m ∝ cos(mπ(x−a)/(b−a)) with m ≤ 2n − 1. Furthermore, it is also exact for all ϕ_m such that m is not a multiple of 2n, and for polynomials of order 1.

Proof. From the previous section, the nodes of the Poincaré quadrature of H¹(a, b) coincide with the zeros of ϕ_n = √2 cos(nπ(x−a)/(b−a)) on [a, b]. Hence they are equal to x_i = a + (i − ½)(b−a)/n, for i = 1, …, n.
Now, ϕ_m(x_i) = √2 cos(mω(x_i − a)) = √2 cos((i − ½) mπ/n). Thus, by Lemma 1, Σ_{i=1}^n ϕ_m(x_i) = 0 if m is not a multiple of 2n. Hence, if we set w_i = 1/n and if m is not a multiple of 2n, then
$$\int_a^b \varphi_m(x)\,\frac{dx}{b-a} = \delta_{m,0} = \sum_{i=1}^{n} w_i\,\varphi_m(x_i),$$
where the case m = 0 is equivalent to w₁ + ⋯ + w_n = 1. Recall that the quadrature weights are uniquely determined by the first n equations above, corresponding to m = 0, …, n − 1. Indeed, the matrix of the linear system is (ϕ_m(x_i))_{0≤m≤n−1, 1≤i≤n}, which is invertible by the T-system property. This proves that the optimal weights are equal to 1/n. Furthermore, the same equations show that the quadrature rule is exact for all ϕ_m such that m is not a multiple of 2n. Finally, the quadrature is interpreted as the midpoint quadrature rule, which is exact for all polynomials of order 1.
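A quick numerical illustration of Proposition 11 (a sketch, with a = 0, b = 1, n = 4): the midpoint rule integrates ϕ_m exactly unless m is a multiple of 2n.

```python
import numpy as np

a, b, n = 0.0, 1.0, 4
om = np.pi / (b - a)
X = a + (np.arange(1, n + 1) - 0.5) * (b - a) / n   # midpoint nodes
w = np.full(n, 1.0 / n)                             # Poincare quadrature weights

for m in range(4 * n + 1):
    exact = 1.0 if m == 0 else 0.0                  # int phi_m dmu
    phi = np.ones(n) if m == 0 else np.sqrt(2) * np.cos(m * om * (X - a))
    if not np.isclose(w @ phi, exact):
        print(f"m = {m}: not exact (m is a multiple of 2n = {2 * n})")
```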
Proposition 12 (Quadrature error). Consider the quadrature error defined as the worst-case error of the Poincaré quadrature of H¹(a, b) with n nodes (see Section 4.4). We have:
$$\mathrm{wce}(n) = \left( \frac{\frac{b-a}{2n}}{\tanh\left(\frac{b-a}{2n}\right)} - 1 \right)^{1/2} \tag{5.1}$$
and wce(n) goes to zero at a linear speed when n tends to infinity:
$$\mathrm{wce}(n) \sim \frac{b-a}{2\sqrt{3}}\,\frac{1}{n}.$$
Proof. Recall that the optimal weights are wi = 1/n, and the optimal nodes are xi = a + (i − 1/2)(b − a)/n. Then, using (4.8), we have:
$$\mathrm{wce}(n)^2 = \sum_{m=2n}^{\infty} \alpha_m^2 \left(\frac{1}{n}\sum_{i=1}^{n} \varphi_m(x_i)\right)^{2}.$$
Now, ϕm(xi) = √2 cos(mω(xi − a)) = √2 cos((i − 1/2)mπ/n), with ω = π/(b − a). By Lemma 1,
$$\sum_{i=1}^{n} \varphi_m(x_i) = \begin{cases} 0 & \text{if } m \text{ is not a multiple of } 2n, \\ \sqrt{2}\,n(-1)^p & \text{if } m = 2np \text{ for a positive integer } p. \end{cases}$$
Thus, in the sum above, the non-zero terms are those with m = 2np. Observe that m ≥ 2n is then equivalent to p ≥ 1. Hence, reparameterizing by p, we obtain
$$\mathrm{wce}(n)^2 = 2\sum_{p=1}^{\infty} \alpha_{2np}^2 = 2\sum_{p=1}^{\infty} \frac{1}{1 + p^2/r^2},$$
with r = (2nω)⁻¹. Following the computations of Example 1, we have
$$\mathrm{wce}(n)^2 = 2\sum_{p=1}^{\infty} \frac{1}{1 + p^2/r^2} = 2r^2 \sum_{p=1}^{\infty} \frac{1}{p^2 + r^2} = \frac{\pi r}{\tanh(\pi r)} - 1.$$
This gives the explicit form of wce(n). To obtain the speed of convergence, notice that, by an immediate application of Lebesgue's dominated convergence theorem, $\sum_{p=1}^{\infty} \frac{1}{p^2 + r^2}$ tends to ζ(2) = π²/6 when r tends to zero. Hence, when n tends to infinity,
$$\mathrm{wce}(n)^2 \sim \frac{1}{2n^2\omega^2}\,\frac{\pi^2}{6} = \frac{(b-a)^2}{12}\,\frac{1}{n^2}.$$
5.3 Asymptotic optimality of the Poincaré quadrature
In the particular case of the uniform distribution, the kernel of H¹(a, b) is given explicitly (see Section 3.2). Thus it is possible to derive the optimal kernel quadrature for H¹(a, b), which can be found in (Duc-Jacquet, 1973, Theorem IV.5). For a = 0, b = 1, it is proved that the optimal nodes are xi* = (2i − 1)/(2n), thus corresponding to the nodes of the rectangle quadrature, and the optimal weights are wi* = 2 tanh(1/(2n)). For large n, wi* ∼ 1/n. Thus, the Poincaré quadrature, here equal to the rectangle quadrature, is asymptotically equivalent to the optimal kernel quadrature. Furthermore, the radius of information (the worst-case error for the optimal quadrature, see Section 4.4) verifies
$$r(n)^2 = 1 - 2n\tanh\!\left(\frac{1}{2n}\right) \sim \frac{1}{12\,n^2}. \tag{5.2}$$
This is the same convergence speed as the worst-case error of the Poincaré quadrature, which we derived in Prop. 12:
$$\frac{\mathrm{wce}(n)}{r(n)} \longrightarrow 1 \quad \text{as } n \to \infty.$$
Therefore, we can conclude that the Poincaré quadrature is asymptotically optimal for H¹(0, 1). This result is intuitive, as the finite-dimensional kernel KM, for which the Poincaré quadrature is optimal, tends to the kernel of H¹(µ).
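This asymptotic equivalence can also be observed numerically; a minimal sketch for H¹(0, 1), based on formulas (5.1) and (5.2):

```python
# Sketch: ratio of the Poincaré worst-case error (5.1) to the radius of
# information (5.2) for H^1(0, 1); the ratio tends to 1.
import numpy as np

def wce(n):
    x = 1 / (2 * n)
    return np.sqrt(x / np.tanh(x) - 1)

def radius(n):
    return np.sqrt(1 - 2 * n * np.tanh(1 / (2 * n)))

for n in [1, 10, 100]:
    print(n, wce(n) / radius(n))  # -> 1 as n grows
```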
6 Numerical experiments
6.1 Numerical computation of the Poincaré quadrature
Computation of the spectral decomposition. The first step in computing the Poincaré quadrature numerically is to obtain the spectral decomposition of Theorem 1. This was investigated by Roustant et al. (2017, Section 4.2), who proposed a finite element technique. It consists of solving Problem (P2) in the finite-dimensional space spanned by Nmesh piecewise linear functions whose knots are evenly spaced, which then boils down to a matrix diagonalisation problem. The theory of finite elements quantifies the speed of convergence when Nmesh tends to infinity, depending on the regularity of the probability density function ρ of µ ∈ B. If ρ is of class C^ℓ, the Poincaré basis functions are of class C^{ℓ+1}, and the order of convergence is O(Nmesh^{−2ℓ}) for the eigenvalues and O(Nmesh^{−ℓ}) for the eigenfunctions. Notice that the value of Nmesh, which controls the mesh size, should of course be much larger than the order of the eigenvalue (or eigenfunction) to estimate. In practice we choose Nmesh = 1000. Figure 1 illustrates the result for the uniform distribution on (0, 1) and the exponential distribution truncated on (0, 3), for which the spectral decomposition is known in closed form (as detailed in Section 3.2). We can see that the numerical approximation is accurate.
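For concreteness, here is a rough P1 finite-element sketch of this step, not the authors' exact implementation; it assumes the weak form of the eigenproblem, ∫ f′g′ dµ = λ ∫ f g dµ with Neumann boundary conditions, hat functions on an even mesh, and a midpoint rule within each element (the function name and discretization details are ours):

```python
# Sketch of the finite-element computation of the Poincaré spectral
# decomposition: assemble the stiffness matrix (int phi_i' phi_j' rho dx) and
# the mass matrix (int phi_i phi_j rho dx) for P1 hat functions, then solve
# the generalized eigenproblem K v = lambda M v.
import numpy as np
from scipy.linalg import eigh

def poincare_fem(rho, a, b, Nmesh=1000):
    x = np.linspace(a, b, Nmesh + 1)
    h = (b - a) / Nmesh
    rm = rho(0.5 * (x[:-1] + x[1:]))              # density at element midpoints
    K = np.zeros((Nmesh + 1, Nmesh + 1))
    M = np.zeros_like(K)
    for e in range(Nmesh):                        # element-by-element assembly
        ke = rm[e] / h * np.array([[1.0, -1.0], [-1.0, 1.0]])
        me = rm[e] * h / 4 * np.ones((2, 2))      # midpoint rule: both hats equal 1/2
        idx = np.ix_([e, e + 1], [e, e + 1])
        K[idx] += ke
        M[idx] += me
    vals, vecs = eigh(K, M)                       # ascending; vals[0] ~ 0 (constants)
    return vals, vecs
```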
Figure 1: Poincaré spectral decomposition for the uniform distribution on (0, 1) (top) and the exponential distribution dµ(x) = e^{−x} 1_{ℝ₊}(x) dx, truncated on (0, 3) (bottom). The left panels represent the first five eigenfunctions (Poincaré basis) and the right panels the first 50 eigenvalues in log scale. The colored dotted lines are the approximations computed by finite elements, and the (superimposed) grey solid lines are the true expressions.
Computation of the Poincaré quadrature. We now assume that the Poincaré basis has been computed
numerically, as explained in the previous paragraph. We aim at computing the Poincaré quadrature with
n nodes. By definition, the Poincaré quadrature is the (generalized) Gaussian quadrature of the Poincaré basis. Thus, it can be obtained by solving the minimization problem (2.4) of Proposition 2 over probability distributions σ subject to moment conditions. More precisely, inspired by the work of Ryu and Boyd (2015) for the usual Gaussian quadrature, we search for σ as a discrete measure supported on a fine uniform grid. We thus choose a large integer N ≫ n and consider the grid formed by the evenly spaced points zj = a + jh (j = 0, …, N − 1), where h = (b − a)/(N − 1) is the grid size. Searching for σ of the form σ = Σ_{j=0}^{N−1} wj δ_{zj}, (2.4) is then rewritten as the following linear programming (LP) problem:
$$w^* = \arg\min_{w \in [0,1]^N}\ \sum_{j=0}^{N-1} w_j\,\varphi_{2n}(z_j) \tag{6.1}$$
$$\text{subject to} \quad \sum_{j=0}^{N-1} w_j = 1 \quad \text{and} \quad \sum_{j=0}^{N-1} w_j\,\varphi_i(z_j) = \int_a^b \varphi_i\,d\mu = \delta_{i,0} \qquad (i = 0, \dots, 2n-1).$$
The problem can be solved numerically by standard LP solvers. However, as the grid points may not contain exactly the unknown nodes, the solution σ* = Σ_{j=0}^{N−1} wj* δ_{zj} is generally supported on more than n points. Thus, as a postprocessing step, we follow Ryu and Boyd (2015) and apply a clustering technique with n clusters (typically the k-means algorithm) to approximate the support points of the distribution. In each cluster Ci (i = 1, …, n), a node xi is defined as a convex combination of its elements zj, with weights proportional to wj* (j ∈ Ci). Finally, the weight wi associated to this node is defined as the sum of the weights in Ci.
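A compact sketch of this LP-plus-clustering procedure is given below. It assumes the Poincaré basis is available as a callable `phi(m, x)` (for instance from the finite-element step above); the solver choices (scipy's HiGHS backend, scikit-learn's weighted k-means) are ours, not necessarily those of Ryu and Boyd (2015):

```python
# Sketch of the Poincaré quadrature via the LP (6.1) on a fine grid, followed
# by weighted k-means clustering of the LP support into n nodes.
import numpy as np
from scipy.optimize import linprog
from sklearn.cluster import KMeans

def poincare_quadrature_lp(phi, a, b, n, N=2000):
    z = np.linspace(a, b, N)                          # grid points z_j
    c = phi(2 * n, z)                                 # objective: sum_j w_j phi_2n(z_j)
    A_eq = np.vstack([phi(i, z) for i in range(2 * n)])
    b_eq = np.zeros(2 * n)
    b_eq[0] = 1.0                                     # phi_0 = 1: row 0 enforces sum w_j = 1
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0.0, 1.0), method="highs")
    w = res.x
    active = w > 1e-12                                # support of the LP solution
    km = KMeans(n_clusters=n, n_init=10).fit(z[active].reshape(-1, 1),
                                             sample_weight=w[active])
    nodes, weights = np.zeros(n), np.zeros(n)
    for k in range(n):                                # convex combination within each cluster
        mask = km.labels_ == k
        weights[k] = w[active][mask].sum()
        nodes[k] = np.average(z[active][mask], weights=w[active][mask])
    order = np.argsort(nodes)
    return nodes[order], weights[order]
```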
Due to the finite grid and the heuristic clustering and averaging technique, the moment conditions are in
general not fulfilled exactly. To improve the accuracy of the obtained solution (x, w) further, we include a
second optimization step, again following Ryu and Boyd (2015), in which we minimize the squared norm of
the misfit for the moments
 2
2n−1
X Xn
(X P , wP ) = arg min δi,0 − wj ϕi (xj ) (6.2)
x,w
i=0 j=1

subject to a ≤ xj ≤ b (j = 1, . . . , n)
and 0 ≤ wj ≤ 1 (j = 1, . . . , n)

over both x and w, using an interior-point algorithm (since we compute lower principal representations, whose nodes always lie in (a, b)). The nodes and weights obtained from postprocessing the solution to (6.1) are used as the starting point for (6.2). The objective function value of the solution (XP, wP) of (6.2) is usually on the order of 10⁻⁷. For the uniform distribution, it is on the order of 10⁻¹⁶.
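The refinement step can be sketched as follows, with `phi` and the LP output as above; we use scipy's `trust-constr` method as a stand-in for the interior-point solver mentioned above:

```python
# Sketch of the moment-misfit refinement (6.2): minimize the squared misfit of
# the first 2n moment conditions over nodes and weights, within the box bounds.
import numpy as np
from scipy.optimize import minimize

def refine(phi, a, b, nodes0, weights0):
    n = len(nodes0)
    def misfit(theta):
        x, w = theta[:n], theta[n:]
        res = [(1.0 if i == 0 else 0.0) - np.sum(w * phi(i, x)) for i in range(2 * n)]
        return np.sum(np.square(res))
    theta0 = np.concatenate([nodes0, weights0])
    bounds = [(a, b)] * n + [(0.0, 1.0)] * n
    sol = minimize(misfit, theta0, bounds=bounds, method="trust-constr")
    return sol.x[:n], sol.x[n:]
```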
To evaluate the accuracy of the numerical quadrature, we compute the Poincaré quadrature of the uniform distribution, for which the analytical result is known (see Section 5). We have used the exact expression of the Poincaré basis. Figure 2 shows the results for n = 3 and n = 5 nodes. We can see that the nodes found coincide with the zeros of ϕn, as expected from the theory. Furthermore, the weights are equal to 1/n (up to machine precision), which is also expected.
Although the numerical computation of the Poincaré basis has been found to be accurate (see above in this section), we also investigate its influence by replacing the exact expression of the Poincaré basis in the previous experiments by its numerical approximation. The results are almost the same (the difference is on the order of 10⁻¹²), showing that the whole procedure gives accurate results for the uniform distribution.
6.2 Further properties of Poincaré quadratures

Empirically, we find that the nodes and weights of an n-point Poincaré quadrature have the following properties, independently of the density:
Figure 2: Poincaré quadrature for the uniform distribution on (0, 1), with n = 3 nodes (left) and n = 5 nodes (right). The curve represents the Poincaré basis function with n roots, and the red crosses and lines represent the quadrature nodes and weights obtained by the numerical procedure.
1. The nodes are almost – but in general not exactly – equal to the zeros of ϕn. The difference is not due to numerical error: it is present even when the basis functions are known analytically, as in the case of the truncated exponential.

2. The nodes are almost evenly spaced, but slightly skewed towards the concentration of probability mass.

3. The weights mimic the shape of the probability density function.

These three observations are illustrated in Fig. 3 for the truncated exponential distribution and for a nonparametric density.
(a) Truncated exponential (n = 8)    (b) Nonparametric density (n = 8)

Figure 3: Poincaré nodes and weights for the truncated exponential distribution (interval [1, 5], left) and a nonparametric density (right). Note that each plot has two y-axes with different scaling: the left one (in black) is for the basis function ϕn and for the nodes xi, while the right one (in gray) is used for the probability density and the weights wi.
Experiments with random densities. To better understand the properties of the Poincaré quadrature, we investigate it for a number of randomly generated continuous probability distributions. Their probability density functions (pdfs) are generated as follows. On the interval [0, 1], we sample independent realizations of a Gaussian process with mean zero and a Matérn-5/2 covariance kernel with lengthscale parameter θ = 0.3. Denote one such realization by g(x). Then, we define a pdf proportional to exp(g(x)) (up to a normalization constant). If the minimal value on [0, 1] is smaller than 0.05, we reject this pdf, in order to avoid numerical issues (recall that µ must be a bounded perturbation of the uniform distribution, so that its pdf does not vanish on the support interval). One example, together with its Poincaré quadrature for n = 8, is shown in Fig. 3b. A set of 100 such pdfs is visualized in Figure 4a. As expected, the shapes vary widely, and the pdfs are often multimodal.
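A minimal sketch of this generator is shown below; the grid discretization, the jitter term, and the point at which the 0.05 rejection threshold is applied are our assumptions:

```python
# Sketch of the random-density generator: exponentiate a Gaussian-process path
# with a Matern-5/2 covariance (lengthscale theta = 0.3), normalize it, and
# reject densities whose minimum is below 0.05.
import numpy as np

def matern52(x, y, theta=0.3):
    d = np.abs(x[:, None] - y[None, :]) / theta
    return (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)

def random_pdf(N=500, rng=None):
    if rng is None:
        rng = np.random.default_rng()
    x = np.linspace(0, 1, N)
    C = matern52(x, x) + 1e-10 * np.eye(N)   # jitter for numerical stability
    while True:
        g = rng.multivariate_normal(np.zeros(N), C)
        rho = np.exp(g)
        rho /= rho.mean()                    # on a uniform grid of [0,1], mean ~ integral
        if rho.min() >= 0.05:                # keep the pdf bounded away from zero
            return x, rho
```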
Ratio of pdf and weights. To further investigate the second and third observations mentioned above, namely that the nodes are nearly evenly spaced and that the weights of the Poincaré quadrature mimic the associated density, we analyse the nodes and weights associated to 100 random densities.
In Fig. 4b, we display boxplots of the locations of the n = 5 nodes. We see that the nodes are nearly evenly spaced, which would correspond to the locations (0.1, 0.3, 0.5, 0.7, 0.9).
Furthermore, in Fig. 4c we display boxplots of the ratio n wi/ρ(xi) for the n = 5 quadrature nodes, where the weights are scaled by n for convenience. As already guessed from Fig. 3, we see that this ratio is quite close to 1. This suggests that the Poincaré quadrature might provide a good quantization of the density ρ, which we investigate in the following.
(a) 100 randomly generated densities (b) Distribution of nodes
(c) Ratio between weights and pdf

Figure 4: Left panel: 100 random pdfs generated by the procedure described in Section 6.2. Right panel: locations of the nodes of the Poincaré quadratures associated to the same densities, with n = 5. Bottom panel: ratio n wi/ρ(xi), where ρ is the pdf associated to µ.
Wasserstein-optimal quantization. It is interesting to compare the Poincaré quadrature to other standard quantization procedures, where quantization means the approximation of a continuous probability distribution by a discrete one. Here, we focus on the optimal quantization associated to the Wasserstein distance, called Wasserstein-optimal quantization. The Wasserstein distance between two cumulative distribution functions (cdfs) F, G is defined by
$$W(F, G) = \left(\int_0^1 \big(F^{-1}(p) - G^{-1}(p)\big)^2\, dp\right)^{1/2}. \tag{6.3}$$
Then, the corresponding optimal quantization with n points is the discrete probability distribution that has the smallest Wasserstein distance to the distribution µ. It can be computed efficiently using Lloyd's algorithm (Graf and Luschgy, 2007).
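In one dimension, (6.3) is convenient to evaluate through quantile functions; a small sketch (the helper names and the midpoint discretization of (0, 1) are ours):

```python
# Sketch: Wasserstein-2 distance (6.3) between two distributions given by their
# quantile (inverse cdf) functions, plus the quantile function of a discrete
# quadrature rule sum_i w_i delta_{x_i}.
import numpy as np

def wasserstein2(Finv, Ginv, n_p=100_000):
    p = (np.arange(n_p) + 0.5) / n_p              # midpoint rule on (0, 1)
    return np.sqrt(np.mean((Finv(p) - Ginv(p)) ** 2))

def discrete_quantile(nodes, weights):
    cum = np.cumsum(weights)                      # nodes assumed sorted
    return lambda p: nodes[np.minimum(np.searchsorted(cum, p), len(nodes) - 1)]

# Example usage (Finv_mu is an assumed quantile function of mu):
#   W = wasserstein2(Finv_mu, discrete_quantile(x, w))
```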
For each random pdf, for a fixed number of nodes n = 5, we compute the Poincaré quadrature, the standard
Gaussian quadrature (associated to ordinary polynomials), and the Wasserstein-optimal quantization. We
report the location of the nodes, as well as the zeros of ϕn , in Figure 5a. We observe that the Poincaré nodes
are quite evenly spaced with small variability, and close (but not equal) to the zeros of the Poincaré basis
function ϕn (denoted by red crosses). Furthermore, the Poincaré nodes are more evenly spaced than the
support points of the Wasserstein-optimal quantization (denoted by yellow triangles). Finally, we observe
that the Gaussian nodes are more spread out than the Poincaré nodes: the outermost Gaussian nodes are
closer to the boundary than the outermost Poincaré nodes.
(a) Locations of nodes and zeros (b) Comparison of Wasserstein distances
Figure 5: Analysis of the properties of Poincaré quadrature rules with n = 5 nodes for the random probability
distributions displayed in Fig. 4a. The left panel shows the corresponding nodes for the (usual) Gaussian
quadrature and the Poincaré quadrature, as well as the support points for the Wasserstein-optimal quantiza-
tion. The red crosses indicate the zeros of the Poincaré basis function ϕn . Right panel: Wasserstein distances
between the continuous probability distribution µ and three different quadrature rules: Gaussian, Poincaré,
and Wasserstein-optimal. Each blue point corresponds to a random density and shows the Wasserstein dis-
tances of Gaussian vs Poincaré quadratures. Similarly, each red point corresponds to a random density and
shows the Wasserstein distances of Wasserstein-optimal vs Poincaré quadratures. In this way, each random
density corresponds to two points in the plot. The black dashed line visualizes the identity x = y.
To further quantify the comparison, we measure the Wasserstein distance of the standard Gaussian quadrature and of the Poincaré quadrature to µ. Results with n = 5 are presented in Figure 5b. We see that in
most cases, the Poincaré quadrature has a smaller Wasserstein distance than the Gaussian one. The optimal quantization is by construction better than both, but often not much better than the Poincaré quadrature.
6.3 Worst-case error
Finally, we investigate the behaviour of the worst-case error wce(n) as a function of n. To compute wce(n) numerically, we use Equation (4.9), based on K − KM with M = 2n − 1. As K is in general not known analytically, we approximate it by KT (Equation 4.1), obtained by truncating the Mercer representation of K at a large value T ≫ M. Thus, wce(n) = wce(XP, wP, K) is approximated by wce(XP, wP, KT).
We first investigate the quality of this approximation for the uniform distribution, where both K and the Poincaré quadrature (XP, wP) have a closed-form expression. Here we have two options: a fully numerical computation, where we compute the Poincaré quadrature numerically, or a semi-analytical computation where we use its closed-form expression. The resulting estimates of wce(n) differ by less than 10⁻⁴, which is negligible compared to the actual value of wce(n). This reinforces our previous statement that the numerical computation of the Poincaré quadrature yields accurate results (Section 6.1). Therefore, in the next experiment, we only show the results for the fully numerical computation. Figure 6 shows the proxy worst-case error wce(XP, wP, KT) as a function of the number of nodes n for different values of T. We compare it to the exact value of wce(n), given here by Equation (5.1), as well as to (1/√12) n⁻¹, which is asymptotically equivalent. We observe that as we include more terms in the truncated kernel KT, the estimates approach wce(n). Because of the truncation, wce(XP, wP, KT) underestimates the true error (see also Equation 4.8).
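Under the Mercer truncation, the proxy worst-case error reduces, up to the small residual moment misfit, to a finite tail sum, since the quadrature reproduces the integrals of the first 2n basis functions. A sketch, assuming kernel eigenvalues `lam[m]` and basis evaluations `phi(m, x)` from the numerical spectral decomposition:

```python
# Sketch of wce(X_P, w_P, K_T): only the tail m = 2n..T contributes, because
# the Poincaré quadrature is exact on phi_0, ..., phi_{2n-1}.
import numpy as np

def wce_truncated(phi, lam, nodes, weights, n, T=100):
    total = 0.0
    for m in range(2 * n, T + 1):
        s = np.sum(weights * phi(m, nodes))   # quadrature value; true integral is 0
        total += lam[m] * s**2
    return np.sqrt(total)
```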
Figure 6: Approximation of wce(n) by wce(XP, wP, KT) for the uniform distribution, and comparison with the closed-form expression of wce(n) and its asymptotic speed of convergence.
Secondly, we repeat the experiment for 5 random densities generated as described in Section 6.2. The worst-case error is computed with the fully numerical procedure presented above, with T = 100. The results are shown in Figure 7. Since we use a truncated kernel, the estimate likely underestimates the true wce(n). We observe that the worst-case error for the random densities appears to be close to (1/√12) n⁻¹, the asymptotic speed of convergence of wce(n) for H¹(0, 1), suggesting that this speed of convergence might be universal.
Figure 7: Worst-case error (left) for 5 random densities supported on [0, 1] (right). The worst-case error is computed using Equation (4.9) as wce(XP, wP, KT) for T = 100.

7 Conclusion and outlook
In this paper, we have considered the wide class of probability distributions µ with a continuous and non-vanishing pdf on a finite interval [a, b]. In this setting, we have made three main contributions.
Firstly, remarking that the Sobolev space H¹(µ) is an RKHS, we have made a connection between the spectral decomposition of its kernel, Poincaré inequalities on a compact interval, and T-systems. The kernel has a 'single-pair' form, which is linked to graphical Gaussian models.
Secondly, we have exploited this connection to achieve optimal kernel quadrature in H¹(µ). The T-system property guarantees the existence of a generalized Gaussian quadrature, extending the Gaussian quadrature of polynomials. We have investigated some particular cases where the results are explicit, notably the standard (unweighted) Sobolev space with its usual Hilbert norm.
Thirdly, we have proposed an efficient numerical procedure. Using densities with and without an analytically known kernel, we have explored the distribution of Poincaré quadrature nodes, the magnitude of the weights, and the speed of convergence of the worst-case error. We have gathered evidence for the hypothesis that, asymptotically, the Poincaré quadrature nodes might be evenly spaced, and that the worst-case error of the Poincaré quadrature scales as (1/√12) n⁻¹, as in the uniform case.
A direction for future research may be to investigate theoretically the asymptotic properties, observed in the numerical experiments, with respect to the number of quadrature nodes. This may involve specific techniques for studying large covariance matrices.
Acknowledgement
This research was conducted with the support of the consortium in Applied Mathematics CIROQUO, gathering partners in technological research and academia for the development of advanced methods for Computer Experiments (doi:10.5281/zenodo.6581217), and of the LabEx CIMI in the framework of the research project Global sensitivity analysis and Poincaré inequalities. Support from the ANR-3IA Artificial and Natural Intelligence Toulouse Institute is also gratefully acknowledged.
References
Atteia, M. (1992). Hilbertian Kernels and Spline Functions. Studies in Computational Mathematics, 4.

Bakry, D., I. Gentil, and M. Ledoux (2014). Analysis and geometry of Markov diffusion operators, Volume 348 of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences]. Springer, Cham.

Berlinet, A. and C. Thomas-Agnan (2011). Reproducing kernel Hilbert spaces in probability and statistics. Springer Science & Business Media.
Brass, H. and K. Petras (2011). Quadrature theory: the theory of numerical integration on a compact interval. Number 178. American Mathematical Soc.

Dick, J., D. Nuyens, and F. Pillichshammer (2014). Lattice rules for nonperiodic smooth integrands. Numerische Mathematik 126 (2), 259–291.

Duc-Jacquet, M. (1973). Approximation des fonctionnelles linéaires sur les espaces hilbertiens autoreproduisants. Ph. D. thesis, Université Joseph-Fourier-Grenoble I.

Fasshauer, G. E. (2012). Green's functions: Taking another look at kernel approximation, radial basis functions, and splines. In Approximation Theory XIII: San Antonio 2010, pp. 37–63. Springer.

Gantmakher, F. R. and M. G. Krejn (2002). Oscillation matrices and kernels and small vibrations of mechanical systems. Number 345. American Mathematical Soc.

Garcke, J. and M. Griebel (2012). Sparse grids and applications, Volume 88. Springer Science & Business Media.

Graf, S. and H. Luschgy (2007). Foundations of quantization for probability distributions. Springer.

Karlin, S. and W. Studden (1966). T-systems: with applications in analysis and statistics. Pure Appl. Math., Interscience Publishers, New York, London, Sydney.

Karvonen, T., M. Kanagawa, and S. Särkkä (2019). On the positivity and magnitudes of Bayesian quadrature weights. Statistics and Computing 29 (6), 1317–1333.

Lüthen, N., O. Roustant, F. Gamboa, B. Iooss, S. Marelli, and B. Sudret (2021). Global sensitivity analysis using derivative-based sparse Poincaré chaos expansions.

Maathuis, M., M. Drton, S. Lauritzen, and M. Wainwright (2018). Handbook of graphical models. CRC Press.

Novak, E. and H. Woźniakowski (2008). Tractability of Multivariate Problems: Standard information for functionals, Volume 2. European Mathematical Society.

Oettershagen, J. (2017). Construction of optimal cubature algorithms with applications to econometrics and uncertainty quantification. Verlag Dr. Hut.

O'Hagan, A. (1991). Bayes–Hermite quadrature. Journal of Statistical Planning and Inference 29 (3), 245–260.

Roustant, O., F. Barthe, and B. Iooss (2017). Poincaré inequalities on intervals – application to sensitivity analysis. Electronic Journal of Statistics 11 (2), 3081–3119.

Ryu, E. K. and S. P. Boyd (2015). Extensions of Gauss quadrature via linear programming. Foundations of Computational Mathematics 15 (4), 953–971.

Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of functions. In Doklady Akademii Nauk, Volume 148, pp. 1042–1045. Russian Academy of Sciences.

Szegö, G. (1959). Orthogonal polynomials. American Mathematical Society Colloquium Publications.

Thomas-Agnan, C. (1996). Computing a family of reproducing kernels for statistical applications. Numerical Algorithms 13 (1), 21–32.

Zhang, S. and E. Novak (2019). Optimal quadrature formulas for the Sobolev space H^1. Journal of Scientific Computing 78 (1), 274–289.