Spectral Decomposition of H1
Abstract
Motivated by uncertainty quantification of complex systems, we aim at finding quadrature formulas of the form $\int_a^b f(x)\,d\mu(x) = \sum_{i=1}^{n} w_i f(x_i)$, where $f$ belongs to $H^1(\mu)$. Here, $\mu$ belongs to a class of continuous probability distributions on $[a, b] \subset \mathbb{R}$ and $\sum_{i=1}^{n} w_i \delta_{x_i}$ is a discrete probability distribution
on $[a, b]$. We show that $H^1(\mu)$ is a reproducing kernel Hilbert space with a continuous kernel $K$, which allows us to reformulate the quadrature question as a kernel (or Bayesian) quadrature problem. Although $K$ does not have a simple closed form in general, we establish a correspondence between its spectral decomposition
and the one associated to Poincaré inequalities, whose common eigenfunctions form a T -system (Karlin
and Studden, 1966). The quadrature problem can then be solved in the finite-dimensional proxy space
spanned by the first eigenfunctions. The solution is given by a generalized Gaussian quadrature, which
we call Poincaré quadrature.
We derive several results for the Poincaré quadrature weights and the associated worst-case error. When µ
is the uniform distribution, the results are explicit: the Poincaré quadrature is equivalent to the midpoint
(rectangle) quadrature rule. Its nodes coincide with the zeros of an eigenfunction, and the worst-case error scales as $\frac{b-a}{2\sqrt{3}}\,n^{-1}$ for large $n$. By comparison with known results for $H^1(0, 1)$, this shows that the Poincaré quadrature is asymptotically optimal. For a general $\mu$, we provide an efficient numerical procedure, based on finite elements and linear programming. Numerical experiments provide useful insights: nodes are nearly evenly spaced, weights are close to the probability density at the nodes, and the worst-case error decreases approximately as $n^{-1}$ for large $n$.
Keywords Sobolev space, RKHS, Mercer representation, Poincaré inequality, Sturm-Liouville theory,
Tchebytchev system (T -system), Bayesian quadrature, kernel quadrature, Gaussian quadrature.
Contents
1 Introduction
2 Background
2.1 Poincaré inequalities and basis
2.2 Quadrature with T-systems
2.3 Kernel quadrature
4 Poincaré quadrature and optimal kernel quadrature in $H^1(\mu)$
4.1 Definitions and notations
4.2 Equivalence of Poincaré and optimal kernel quadratures in $H^1(\mu)$
4.3 Formulas for optimal weights
4.4 Quadrature error
6 Numerical experiments
6.1 Numerical computation of the Poincaré quadrature
6.2 Further properties of Poincaré quadratures
6.3 Worst-case error
1 Introduction
Motivation. This research is motivated by uncertainty quantification of complex systems, where a typical task is to compute integrals $I = \int G(x)\,dP(x)$. Here, $G$ is a multivariate function representing a quantity of interest of the system, $x$ is a vector of $\mathbb{R}^d$ representing the input variables, and $P$ is a probability distribution representing the uncertainty on $x$. In this context, the evaluation of $G$ is often time-consuming, and cubature formulas may be preferred to sampling techniques to compute the integral $I$. Assuming that the input variables
are independent, cubature formulas then boil down to 1-dimensional quadrature formulas, by tensorization
or using sparse grids (Smolyak, 1963; Garcke and Griebel, 2012).
Problem considered. For a given finite interval $[a, b]$ of $\mathbb{R}$, we aim at finding accurate approximations of integrals $\int_a^b f(x)\,d\mu(x)$, replacing $\mu$ by a discrete probability distribution. We thus deal with quadrature formulas
$$\int_a^b f(x)\,d\mu(x) = \sum_{i=1}^{n} w_i f(x_i), \qquad (1.1)$$
where, for i = 1, · · · , n, the quadrature nodes xi lie in [a, b]. The quadrature weights (wi ) are non-negative
and sum to 1. We will denote by X (resp. w) the sequence of nodes (xi ) (resp. of weights (wi )). Considering
minimal regularity conditions, we assume that $f$ belongs to the Sobolev space $H^1(\mu) = \{f \in L^2(\mu) \text{ s.t. } f' \in L^2(\mu)\}$, where the derivatives are defined in a weak sense. More generally, for an integer $p \ge 2$, we define the Sobolev space $H^p(\mu)$ as the subset of functions $f$ of $H^{p-1}(\mu)$ such that $f^{(p)} \in L^2(\mu)$. In the whole paper, we consider the usual norm of $H^1(\mu)$, defined by $\|f\|_{H^1(\mu)}^2 = \|f\|^2 + \|f'\|^2$, where $\|f\|^2 = \int_a^b f^2(x)\,d\mu(x)$ is the squared norm of $L^2(\mu)$. For technical reasons, we assume that $\mu$ is a bounded perturbation of the uniform distribution on $[a, b]$, meaning that it admits a positive continuous probability density function $\rho$ on $[a, b]$.
This includes a wide range of probability distributions used in practice, such as the truncated normal one,
obtained by conditioning a Gaussian variable to vary in a finite domain. This assumption implies that the sets $L^2(\mu)$ and $H^1(\mu)$ contain the same equivalence classes of functions as $L^2(a, b)$ and $H^1(a, b)$, associated to the uniform distribution on $[a, b]$, with equivalent norms.
Kernel quadrature formulation. When µ is the uniform probability distribution, it is well known that
H 1 (a, b) is a reproducing kernel Hilbert space (RKHS) (Duc-Jacquet, 1973; Thomas-Agnan, 1996). Under
the previous assumption on µ, we will show that H 1 (µ) is also a RKHS (Section 3). In that case, a suitable
criterion to evaluate the accuracy of a quadrature $(X, w)$ is the worst-case error, defined by
$$\mathrm{wce}(X, w, H) = \sup_{h \in H,\, \|h\|_H \le 1} \left| \int_a^b h(x)\,d\mu(x) - \sum_{i=1}^{n} w_i h(x_i) \right|.$$
Here, $H$ is some particular given functional space. Indeed, when $H$ is a RKHS, $\mathrm{wce}(X, w, H)$ can be explicitly computed as a function of the kernel $K$ (see Section 2.3 and e.g. Novak and Woźniakowski, 2008, Chapter 10). An interesting quadrature problem is thus the following minimization problem:
$$(P) \qquad \min_{X, w}\ \mathrm{wce}(X, w, H^1(\mu)),$$
where one wishes to identify the minimizing quadrature. Such a problem is often called kernel quadrature, or Bayesian quadrature in a probabilistic framework. There, the prior information is that the integrand is drawn from a Gaussian process with kernel $K$, and the worst-case error is equal to the posterior variance of the integral. We refer to O'Hagan (1991); Karvonen et al. (2019) for a presentation of this latter point of view.
Originality of the problem. We remark that, apart from the case of the uniform distribution, the problem
does not reduce to the more standard quadrature problem with a weight function
$$\int_a^b f(x)\rho(x)\,dx = \sum_{i=1}^{n} w_i f(x_i), \qquad (1.2)$$
where $f$ belongs to $H^1(a, b)$. Indeed, the unit balls $\{h \in H^1(\mu),\ \|h\|_{H^1(\mu)} \le 1\}$ and $\{h \in H^1(a, b),\ \|h\|_{H^1(a,b)} \le 1\}$ are different if $\rho$ is not a constant function. Thus $\mathrm{wce}(X, w, H^1(\mu)) \ne \mathrm{wce}(X, w, H^1(a, b))$, and the weighted quadrature problem, formulated as a worst-case error minimization problem, will in general not give the same solutions as (P).
Spectral decomposition of H 1 (µ), Poincaré inequality and T -systems. The resolution of problem
(P) is in general intractable, as $K$ does not have an explicit form. A key result of this paper is that there is a correspondence between the spectral decomposition (Mercer representation) of $K$ and the one associated to Poincaré inequalities. Furthermore, the common eigenfunctions form a Tchebytchev system ($T$-system, see Karlin and Studden, 1966). Interestingly, even if $K$ is not known explicitly in general, one can compute its Mercer representation numerically with a finite element technique (Roustant et al., 2017). This suggests solving a proxy of problem (P),
$$\min_{X, w}\ \mathrm{wce}(X, w, H_M),$$
where $H^1(\mu)$ has been replaced by its projection $H_M$ onto the space spanned by the first $M$ eigenfunctions.
Indeed, similarly to polynomials, T -systems admit a Gaussian quadrature and for a given number of nodes n,
there exists a unique quadrature (X, w) with positive weights for which wce(X, w, KM ) = 0, where M = 2n−1
is maximal. We call this optimal quadrature Poincaré quadrature. For a general probability distribution µ,
the Poincaré quadrature can be computed easily and efficiently by linear programming.
Properties of the Poincaré quadrature. We derive several results for the connection between the kernel
associated to H 1 (µ), the Poincaré quadrature nodes and weights, and the associated worst-case error. When
µ is the uniform distribution, the results are explicit (Section 5): the Poincaré quadrature is equal to the midpoint (rectangle) quadrature rule, its nodes coincide with the zeros of an eigenfunction, as for the Gaussian quadrature of polynomials, and the worst-case error scales as $\frac{b-a}{2\sqrt{3}}\,n^{-1}$ for large $n$. Furthermore, in the case of $H^1(0, 1)$, the kernel is given explicitly, and it is possible in this frame to compute the optimal kernel quadrature without relying on a finite-dimensional approximation. The results obtained by Duc-Jacquet (1973) show that the optimal kernel quadrature has evenly spaced nodes and weights asymptotically equal to $1/n$, which shows that the Poincaré quadrature is asymptotically optimal.
In the general case, numerical experiments provide empirical insights (Section 6): nodes are nearly evenly
spaced, weights are close to the probability density at nodes, and the worst-case error is approximately
proportional to n−1 for large n.
Links with literature. The Mercer representation of Sobolev spaces seems unexplored for non-uniform
probability distributions. For the uniform case, by connecting RKHS to Green's functions, Fasshauer (2012) gives Mercer representations of various kernels associated to Sobolev spaces, such as the anchored Sobolev space $\{f \in H^1(0, 1),\ f(0) = f(1) = 0\}$ with the usual norm. Dick et al. (2014) provide the Mercer representation of $H^1(0, 1)$ for the norm given by $\|f\|^2 = \left(\int_0^1 f(x)\,dx\right)^2 + \int_0^1 f'(x)^2\,dx$. While close to these examples, the case of $H^1(0, 1)$ with its usual norm seems not to have been reported in the literature.
Similarly, considering Sobolev spaces with a non-uniform probability distribution seems unexplored for
quadrature problems. As mentioned above, this does not boil down to a quadrature with weights for the
uniform distribution, as the unit balls are different. The case of H 1 (0, 1) (uniform case) has been studied by
several authors, with various choices of norms and weight functions (Equation 1.2). For instance, Zhang and Novak (2019) provide expressions of the radius of information (worst-case error for the optimal quadrature) as a function of the nodes, for the semi-norm $\int_0^1 f'(x)^2\,dx$ and centered weight functions. For a constant weight function, and the usual norm of $H^1(0, 1)$ considered here, Duc-Jacquet (1973) obtains the optimal kernel quadrature. The link between $T$-systems and kernel quadrature has also been exploited in Oettershagen (2017). There, the kernel is assumed to have an explicit form, and the $T$-system is obtained by considering the kernel functions at the nodes $K(x_i, \cdot)$, which is different from our approach based on the spectral decomposition of $K$. Therein, the case of $H^1(0, 1)$ is considered in numerical experiments, but with a different norm associated to the Bernoulli polynomials, given by $\|f\|^2 = \left(\int_0^1 f(x)\,dx\right)^2 + \int_0^1 f'(x)^2\,dx$.
Paper organization. Section 2 gives the prerequisites on Poincaré inequalities, T -systems, RKHS and
kernel quadrature. The RKHS structure of H 1 (µ) is discussed in Section 3, where a connection is established
between Poincaré inequalities and the kernel of H 1 (µ). Section 4 gives general formulas for the optimal
quadrature weights and the associated worst-case error as a function of the kernel. Section 5 focuses on the
case of the uniform distribution. Section 6 presents numerical experiments in the general case. Section 7
concludes and gives an outlook for future research.
2 Background
In the whole paper, we consider a bounded interval of the real line [a, b], with −∞ < a < b < ∞. We consider
a probability distribution µ supported on [a, b] which is a bounded perturbation of the uniform distribution,
in the following sense.
Definition 1 (Bounded perturbation of the uniform distribution). Let µ be a continuous probability distri-
bution on [a, b], with density ρ. We say that µ is a bounded perturbation of the uniform distribution if ρ is
a positive continuous and piecewise C 1 function on [a, b]. We denote by B the set of bounded perturbations
of the uniform distribution on [a, b].
We also denote by V = − log(ρ) the so-called potential associated to µ. Equivalently, ρ(t) = e−V (t) .
Remark 1. (i) Obviously, if µ fulfils the previous definition, then ρ is bounded from below and above by
positive constants. That is, there exist m, M in R such that
∀t ∈ [a, b], 0 < m ≤ ρ(t) ≤ M < +∞.
(ii) When µ ∈ B, it is straightforward that the sets $L^2(\mu)$ and $H^1(\mu)$ contain the same equivalence classes of functions as $L^2(a, b)$ and $H^1(a, b)$, associated to the uniform distribution on $[a, b]$, with equivalent norms.
Definition 2 (Poincaré inequality). We say that µ satisfies a Poincaré inequality if there exists a positive
constant C such that for all f ∈ H 1 (µ):
$$\mathrm{Var}_\mu(f) \le C\,\|f'\|^2.$$
In this case, the smallest possible constant C above is denoted CP (µ), and called the Poincaré constant of µ.
When it exists, the Poincaré constant is obtained by minimizing the so-called Rayleigh ratio $J(f) = \frac{\|f'\|^2}{\|f\|^2}$ over all centered functions of $H^1(\mu)$. An important result is that a bounded perturbation of the uniform
distribution satisfies a Poincaré inequality, which is related to a spectral decomposition as described in the
following theorem (proved e.g. in Roustant et al., 2017):
Theorem 1 (Spectral theorem). Let µ be a probability distribution in B. Consider the three following
problems:
(P1) Find $f \in H^1(\mu)$ with $\int f\,d\mu = 0$ such that $J(f) = \frac{\|f'\|^2}{\|f\|^2}$ is minimum.
Then the eigenvalue problems (P2) and (P3) are equivalent. The eigenvalues form an increasing sequence
(λm )m≥0 of non-negative real numbers that tends to infinity. They are all simple, and λ0 = 0. The eigenvec-
tors (ϕm )m≥0 form a Hilbert basis of L2 (µ), and ϕ0 is a constant function.
Furthermore, when $\lambda = \lambda_1$, the first positive eigenvalue, (P2) and (P3) are equivalent to (P1), and the minimum of (P1) is attained for $f = \varphi_1$. Thus $C_P(\mu) = 1/\lambda_1$.
In this paper, our interest is on the whole spectral decomposition. In particular, we define the Poincaré
basis of L2 (µ) as follows.
Definition 3 (Poincaré basis). Let µ be a probability distribution in B. The Poincaré basis is the orthonormal
basis formed by the eigenfunctions (ϕm )m≥0 defined in the spectral theorem (Theorem 1).
Remark 2. As all eigenvalues are simple, the Poincaré basis is unique up to a sign change for each eigen-
function. We set ϕ0 = 1.
We conclude this section by linking the previous material with the Sturm-Liouville theory of second-order
differential equations.
Proposition 1. Let µ be a probability distribution in B. Then the Poincaré basis consists of the eigenfunctions of the Sturm-Liouville eigenproblem
$$L(f) = \beta\, r f \qquad (2.1)$$
with Neumann conditions $f'(a) = f'(b) = 0$, where $L(f)(x) = -(p(x)f'(x))' + q(x)f(x)$ and $q = r = p = e^{-V}$.
Furthermore, the Sturm-Liouville problem is regular, in the sense that all its eigenvalues are positive.
Proof. Recall that $\langle f, g\rangle_{H^1(\mu)} = \langle f, g\rangle + \langle f', g'\rangle$. From the proof of Theorem 2 in Roustant et al. (2017), the eigenfunctions of the Poincaré operator are solutions of the spectral problem: find $f \in H^1(\mu)$ and $\beta$ such that for all $g \in H^1(\mu)$,
$$\langle f, g\rangle_{H^1(\mu)} = \beta\langle f, g\rangle. \qquad (2.2)$$
The corresponding eigenvalues are $\beta_m = 1 + \lambda_m$. In particular, $\beta_m > 0$, as $\lambda_m \ge 0$. Moreover, Problem (2.2) is equivalent to the second-order differential equation
$$-f''(x) + V'(x) f'(x) + f(x) = \beta f(x)$$
with Neumann conditions $f'(a) = f'(b) = 0$ (see also Roustant et al., 2017, proof of Theorem 2). Multiplying by $e^{-V(x)}$, which by definition of B does not vanish, we obtain the equivalent Sturm-Liouville form (2.1).
2.2 Quadrature with T-systems
This section is based on Karlin and Studden (1966).
Definition 4 (T-systems, generalized polynomials). Let (un )n∈N be a family of real-valued continuous func-
tions defined on a compact interval [a, b] ⊂ R. We say that (un ) is a complete Tchebytchev system, or
simply T-system, if for every integer $n \ge 1$ and every sequence of distinct points $a \le t_1 < \cdots < t_n \le b$, the determinant of the generalized Vandermonde matrix
$$V(u_0, \ldots, u_{n-1};\, t_1, \ldots, t_n) := \begin{pmatrix} u_0(t_1) & \cdots & u_0(t_n) \\ \vdots & & \vdots \\ u_{n-1}(t_1) & \cdots & u_{n-1}(t_n) \end{pmatrix}$$
is positive. The finite linear combinations of the $u_n$'s are called generalized polynomials or $u$-polynomials.
A prototype of $T$-system on any interval of the real line is given by the polynomial functions $u_i(t) = t^i$, and the determinant of $V$ is nothing more than the Vandermonde determinant $\det V(1, t, \ldots, t^{n-1};\, t_1, \ldots, t_n) = \prod_{1 \le i < j \le n} (t_j - t_i)$ (see e.g. Karlin and Studden, 1966, page 1). An equivalent definition of $T$-systems (up to
a sign change) is that any generalized polynomial, i.e., any linear combination of u0 , u1 , . . . , un , has at most
n zeros (Karlin and Studden, 1966, Theorem 4.1.). This extends the property that (ordinary) polynomials
of degree n have at most n zeros. In that sense, T -systems can be viewed as a generalization of polynomials,
which justifies the name generalized polynomials.
In the context of quadrature problems, the definition of T -systems guarantees that for any set of dis-
tinct quadrature nodes in [a, b], there exists a unique set of quadrature weights such that the quadrature
formula (1.1) is exact at order n − 1, i.e. for all functions in span(u0 , . . . , un−1 ). Indeed, up to reorder-
ing, the equations above define a linear system whose matrix is invertible and equal, up to a sign change,
to $V(u_0, \ldots, u_{n-1}; t_1, \ldots, t_n)$. This quadrature formula thus generalizes the Newton-Cotes quadrature of polynomials, and suffers in general from the same drawback: the weights can be negative and the resulting quadrature formula can be unstable (Oettershagen, 2017; Brass and Petras, 2011).
Interestingly, extending the Gaussian quadrature of ordinary polynomials to more general functions, T -
systems admit a unique quadrature that has positive weights and is exact at order $2n - 1$. Contrary to the polynomial case, however, the nodes of this quadrature do in general not coincide with the zeros of a (generalized) polynomial. The computation of the nodes and weights uses a different approach, relying on geometry. More precisely, consider the moment space
$$\mathcal{M}_{n+1} = \left\{ c \in \mathbb{R}^{n+1} \text{ with } c_i = \int_a^b u_i(t)\,d\sigma(t) \text{ for all } i = 0, \ldots, n, \text{ where } \sigma \text{ is a finite measure with support } [a, b] \right\}. \qquad (2.3)$$
$\mathcal{M}_{n+1}$ is the closed convex hull of the curve $(u(t))_{t \in [a, b]}$, where $u(t) = (u_0(t), \ldots, u_n(t)) \in \mathbb{R}^{n+1}$ ($t \in [a, b]$). Then, we may state precisely the previous quadrature results.
Proposition 2. Let u = (un )n∈N be a T -system, and µ be a probability distribution in B. Then,
1. For all $n \in \mathbb{N}$, the vector $\bar{c} := \left( \int_a^b u_i(t)\,d\mu(t) \right)_{0 \le i \le 2n-1}$ is an interior point of $\mathcal{M}_{2n}$.
2. There exists a unique quadrature (1.1) with positive weights which is exact at order 2n − 1 (i.e. exact
on the vector space spanned by u0 , . . . , u2n−1 ). It uses a minimal number of nodes, equal to n. The
nodes are all in the open interval (a, b) and the weights sum to 1.
3. This quadrature is obtained by solving the minimization problem
$$\min_{\sigma \in V_{2n-1}(\bar{c})} \int_a^b u_{2n}(t)\,d\sigma(t) \qquad (2.4)$$
over the set $V_{2n-1}(\bar{c})$ of all probability distributions subject to the moment conditions $\int_a^b u_i(t)\,d\sigma(t) = \bar{c}_i$, for $i = 0, 1, \ldots, 2n - 1$.
Proof of Proposition 2. The proof is based on different results of Karlin and Studden (1966), that we will
refer to. Let us denote N = 2n − 1 the quadrature order.
First, the separation Lemma 9.2, page 65, states that $c$ is an interior point of $\mathcal{M}_N$ if and only if for all non-negative non-trivial $u$-polynomials $g = \sum_{i=0}^{N} a_i u_i$, we have $m_g(c) = \sum_{i=0}^{N} a_i c_i > 0$. Taking $c = \bar{c}$, we have $m_g(\bar{c}) = \int_a^b g\,d\mu$ and thus $m_g(\bar{c}) \ge 0$. As the density function of µ is continuous and does not vanish on $[a, b]$, we can conclude that $m_g(\bar{c}) > 0$, which shows that $\bar{c}$ is an interior point of $\mathcal{M}_N$.
Now, following Karlin and Studden (1966), §3, case (ii), page 46, let us define the notion of index of a sequence of nodes, as the number of nodes in $[a, b]$, with a half weight for the nodes equal to the endpoints $a, b$ (if any). Then, as $N$ is odd, there are exactly two quadratures with positive weights and the smallest possible index, equal to $(N + 1)/2 = n$. These quadratures are called principal representations in this context. For one of them, called the upper principal representation, the nodes include the endpoints, and thus the quadrature involves $n + 1$ nodes formed by the endpoints $a, b$ and $n - 1$ nodes in $(a, b)$. The other one, called the lower principal representation, involves $n$ nodes in the open interval $(a, b)$. Thus, it is the only quadrature with positive weights containing the smallest number of nodes, equal to $n$.
Furthermore, they show in Theorem 1.1. page 80, that, if u is a T -system, the solution of (2.4) is unique and
equal to the lower representation of u. In particular, the weights are positive and sum to one.
Finally, if ui (t) = ti , we have equality with the Gaussian quadrature by uniqueness of the quadrature, since
the Gaussian quadrature has n distinct nodes in (a, b), positive weights summing to one, and is exact for
u0 , . . . , u2n−1 (Karlin and Studden, 1966, Chapter IV).
Remark 3. Replacing the minimization problem in (2.4) by maximization, we obtain another valid quadrature of order $2n - 1$, called the upper principal representation. However, it involves one more node, i.e. $n + 1$ nodes, including the endpoints $a, b$. Equivalently, for a fixed number $n$ of nodes, this quadrature has order $2n - 3$, compared to $2n - 1$ for the Gaussian quadrature. It generalizes the Lobatto quadrature for polynomials (Brass and Petras, 2011).
Definition 5 (Gaussian quadrature for T -systems). The unique quadrature of Prop. 2 is called lower principal
representation of u. By analogy with polynomials, and following (Brass and Petras, 2011; Oettershagen,
2017), we will call it the generalized Gaussian quadrature, or simply the Gaussian quadrature of the T -
system (un )n∈N .
We now show that a Poincaré basis is a $T$-system. This is an immediate consequence of the example of eigenfunctions of Sturm-Liouville problems (see e.g. Example 7 in Karlin and Studden, 1966). A proof can be found in Gantmakher and Krejn (2002). However, the proof is not easy to read, as it is split into several parts. We recall below the main steps and give a roadmap for the interested reader.
Proposition 3. Consider the notations and the assumption of Theorem 1. Then the Poincaré basis (ϕm )m∈N
is a T -system.
Proof. By Prop. 1, the eigenfunctions of the Poincaré operator are eigenfunctions of the regular Sturm-
Liouville problem (2.1). Then the result is a particular case of a more general result stating that the
eigenfunctions of a regular Sturm-Liouville operator form a T -system. A proof can be found in Gantmakher
and Krejn (2002). We give here the three main steps, and pointers to the corresponding sections.
Firstly, it is proved in IV.10.4 (pages 236–238) that the eigenfunctions of a Sturm-Liouville operator under boundary constraints verify an integral equation of the form
$$\varphi(x) = \lambda \int_a^b K(x, s)\varphi(s)\,d\sigma(s) \qquad (2.5)$$
where $\lambda > 0$, $\sigma$ is a probability distribution, and $K$ is the so-called Green function of $L$, defined by
$$K(x, s) = \psi(\min(x, s))\,\chi(\max(x, s)),$$
where $\psi, \chi$ are particular solutions of the homogeneous equation $L(f) = 0$ such that $\psi/\chi$ is a non-decreasing function.
Secondly, it is proved that $K$ is an oscillatory kernel, in the sense of Definition 1, page 178 (see also Definition 1', page 179, and Definitions 4 and 6, pages 74 and 76). Roughly speaking,
it means that every matrix extracted from $K$ in ascending order, i.e. $(K(x_i, s_j))_{1 \le i,j \le n}$ with $x_1 < \cdots < x_n$ and $s_1 < \cdots < s_n$, is positive semidefinite. The proof starts at page 78 (Example 5, ‘single-pair' matrices), continues at page 103 (Theorem 12) and page 220 (Criterion A), and ends at page 238 (Theorem 16).
Thirdly, if $K$ is an oscillatory kernel, then the solutions of the integral equation (2.5) form a $T$-system, which is proved in Theorem 1, page 181.
Worst-case error in RKHS. In an RKHS, worst-case quantities of linear functionals can be computed explicitly. For instance, the Cauchy-Schwarz inequality gives that for all $h \in H_K$ and all $x \in T$:
$$|h(x)| = |\langle K(x, \cdot), h\rangle| \le \|K(x, \cdot)\|\,\|h\|,$$
from which it is obvious that $\sup_{h \in H_K, \|h\| \le 1} |h(x)| = \|K(x, \cdot)\|$. Furthermore, by the reproducing property, we have $\|K(x, \cdot)\|^2 = K(x, x)$, and finally, for all $x \in T$,
$$\sup_{h \in H_K,\, \|h\| \le 1} |h(x)| = \sqrt{K(x, x)}.$$
A similar computation can be done for linear functionals defined by quadrature formulas. Let us first define
the worst-case error of a general quadrature.
Definition 6 (worst-case error of a quadrature). Let (X, w) be a quadrature composed of a set of nodes
X = (x1 , . . . , xn ) ∈ [a, b]n and a set of weights w = (w1 , . . . , wn ) ∈ Rn . Let H be a set of functions on
$[a, b] \to \mathbb{R}$. The worst-case error of $(X, w)$ on $H$ is defined by
$$\mathrm{wce}(X, w, H) = \sup_{h \in H,\, \|h\| \le 1} \left| \int h(x)\,d\mu(x) - \sum_{i=1}^{n} w_i h(x_i) \right|.$$
Using the reproducing property, one obtains the well-known analytical expression (see e.g. Novak and
Woźniakowski, 2008, Chapter 10):
$$\mathrm{wce}(X, w, K)^2 = \iint K(x, x')\,d\mu(x)\,d\mu(x') - 2\sum_{i=1}^{n} w_i \int K(x_i, x)\,d\mu(x) + \sum_{i,j} w_i w_j K(x_i, x_j). \qquad (2.6)$$
Kernel quadrature. A kernel quadrature is obtained by minimizing the worst-case error. We will need the following assumption:
Assumption 1. If for all $i \ne j$, $x_i \ne x_j$, then the Gram matrix $K(X, X)$ is invertible.
Under Assumption 1, for a given set $X$ formed by distinct nodes, (2.6) defines a strictly convex function of $w$. Thus, it has a unique minimum, denoted by $w^\star(X, K)$. By solving the first-order conditions, we directly get the exact expression of the vector of optimal weights:
$$w^\star(X, K) = K(X, X)^{-1} z(X), \qquad (2.7)$$
where $z(X) = \left( \int_a^b K(x_i, x)\,d\mu(x) \right)_{1 \le i \le n}$. After some algebra, we obtain the corresponding minimal value of the worst-case error:
$$\mathrm{wce}(X, w^\star(X, K), K)^2 = c_K - z(X)^\top K(X, X)^{-1} z(X), \qquad (2.8)$$
where $c_K = \iint K(x, x')\,d\mu(x)\,d\mu(x')$.
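As an illustration (our own minimal Python sketch, not from the paper), formulas (2.7) and (2.8) can be evaluated numerically; we assume the closed-form kernel of $H^1(a, b)$ for the uniform distribution given in Example 1 below, and the helper names are ours:

```python
import numpy as np

def trap(y, x):
    # simple trapezoidal rule, to stay self-contained
    return float(np.sum((y[1:] + y[:-1]) * np.diff(x)) / 2.0)

def kernel(x, y, a=0.0, b=1.0):
    # closed-form kernel of H^1(a, b) for the uniform distribution (Example 1 below)
    lo, hi = np.minimum(x, y), np.maximum(x, y)
    return (b - a) / np.sinh(b - a) * np.cosh(lo - a) * np.cosh(b - hi)

a, b = 0.0, 1.0
X = np.array([0.2, 0.5, 0.8])                  # arbitrary distinct nodes
t = np.linspace(a, b, 20001)                   # fine grid for dmu = dx/(b - a)

# z_i = int K(x_i, x) dmu(x); by Proposition 4, each z_i should be ~ 1
z = np.array([trap(kernel(xi, t), t) for xi in X]) / (b - a)
cK = 1.0                                       # int int K dmu dmu = 1 for this kernel

G = kernel(X[:, None], X[None, :])             # Gram matrix K(X, X)
w_star = np.linalg.solve(G, z)                 # optimal weights, formula (2.7)
wce2 = cK - z @ w_star                         # squared worst-case error, formula (2.8)
print(w_star, np.sqrt(max(wce2, 0.0)))
```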
Kernel quadrature and optimal kernel quadratures can then be defined as follows.
Definition 7 (Kernel quadrature, optimal kernel quadrature). Let $X$ be a set of nodes, and assume that Assumption 1 is verified. 1) The kernel quadrature associated to $X$ on $H_K$ is the quadrature $(X, w^\star(X, K))$ that minimizes the worst-case error $\mathrm{wce}(X, w, K)$ over all sets of weights in $\mathbb{R}^n$.
2) An optimal kernel quadrature is a quadrature $(X^\star, w^\star)$ that minimizes the worst-case error $\mathrm{wce}(X, w, K)$ among all quadratures $(X, w)$, or equivalently, that minimizes $\mathrm{wce}(X, w^\star(X, K), K)$ over all sets of nodes $X$.
Remark 4. Notice that the weights of a kernel quadrature are constrained neither to be positive, nor to sum
to 1.
Proof. Under the assumption on µ, for all f ∈ L2 [a, b],
$$m(b - a)\int_a^b f^2\,d\lambda \;\le\; \int_a^b f^2\,d\mu \;\le\; M(b - a)\int_a^b f^2\,d\lambda,$$
where $m, M$ are defined in Remark 1 and $\lambda$ is the uniform distribution on $[a, b]$. Thus $L^2(\mu) = L^2(a, b)$ (with an equivalent norm). Similarly $H^1(\mu) = H^1(a, b)$ (with an equivalent norm). Now, it is well known that $H^1(a, b)$ is a RKHS, with an explicit kernel (see e.g. Duc-Jacquet, 1973; Thomas-Agnan, 1996). Thus, for all $x \in [a, b]$, the evaluation map $f \in H^1(a, b) \mapsto f(x)$ is continuous. By equivalence of the norms, the identity map $f \in H^1(\mu) \mapsto f \in H^1(a, b)$ is continuous. Hence, by composition, $f \in H^1(\mu) \mapsto f(x)$ is continuous. This shows that $H^1(\mu)$ is a RKHS. Let us denote by $K$ its kernel.
The link between $K$ and the Poincaré inequality can be understood through the bilinear form $a(f, g) = \langle f, g\rangle_{H^1(\mu)}$. Consider the spectral problem: find $f \in H^1(\mu)$ and $\beta$ such that for all $g \in H^1(\mu)$,
$$a(f, g) = \beta \langle f, g\rangle. \qquad (3.3)$$
From (Roustant et al., 2017, Theorem 2 and its proof), under the assumption on µ, there exists a countable sequence of solutions, given by $\beta_m = 1 + \lambda_m$ and $\varphi_m$ ($m \in \mathbb{N}$). Notice that $\beta_m > 0$ for all $m \in \mathbb{N}$. Furthermore, $\beta_m, \varphi_m$ are defined in a unique way (up to a sign change of the eigenfunctions) because the eigenvalues are simple and the eigenfunctions have norm 1.
Now, since $H^1(\mu)$ is a RKHS, the span of the functions $K(x, \cdot)$ ($x \in [a, b]$) is dense in $H^1(\mu)$. Thus, Problem (3.3) is equivalent to:
$$\forall x \in [a, b], \quad a(f, K(x, \cdot)) = \beta \langle f, K(x, \cdot)\rangle.$$
By the reproducing property, $a(f, K(x, \cdot)) = \langle f, K(x, \cdot)\rangle_{H^1(\mu)} = f(x)$. Hence, (3.3) is equivalent to: find $f \in H^1(\mu)$ and $\beta$ such that
$$\forall x \in [a, b], \quad f(x) = \beta \int_a^b K(x, y) f(y)\,d\mu(y), \qquad (3.4)$$
which is the spectral decomposition of the Hilbert-Schmidt operator associated to $K$ (recall that $\beta > 0$).
Moreover, by Proposition 1 and its proof, Problem (3.3) is equivalent to the regular Sturm-Liouville problem (2.1). Thus, from (Gantmakher and Krejn, 2002, Section 10, pages 234–238) or (Fasshauer, 2012), we also obtain that the solution of (2.1) is equivalent to the solution of (3.4). In this context, $K$ is called a Green function. This point of view gives sharp information and leads to
$$K(x, y) = \frac{1}{C}\,\psi(\min(x, y))\,\chi(\max(x, y)), \qquad (x, y \in [a, b])$$
where $\psi, \chi$ are two linearly independent solutions of the homogeneous equation $L(f) = 0$ such that $\psi'(a) = 0$ and $\chi'(b) = 0$ (Gantmakher and Krejn, 2002, Section 7). The constant $C$ is determined such that $\int_a^b K(x, y)\,d\mu(y) = 1$ for all $x \in [a, b]$. Indeed, as the constant function $\mathbf{1}$ belongs to $H^1(\mu)$, the RKHS reproducing property gives, for all $x \in [a, b]$:
$$1 = \mathbf{1}(x) = \langle \mathbf{1}, K(x, \cdot)\rangle_{H^1(\mu)} = \int_a^b K(x, y)\,d\mu(y).$$
For instance, choosing $x = b$ or $x = a$, we obtain $C = \chi(b)\int_a^b \psi(y)\,d\mu(y) = \psi(a)\int_a^b \chi(y)\,d\mu(y)$.
Now $\psi$ and $\chi$ are continuous, as elements of $H^1(\mu)$ (whose functions are equal to those of $H^1(a, b)$). As min, max are continuous functions, we obtain, by composition, that $K$ is continuous on $[a, b]^2$. Hence, by Mercer's theorem (see e.g. Berlinet and Thomas-Agnan, 2011), $K$ is written in terms of the solutions of (3.4) as
$$K(x, y) = \sum_{m \in \mathbb{N}} \alpha_m \varphi_m(x)\varphi_m(y), \qquad (x, y \in [a, b]) \qquad (3.5)$$
with $\alpha_m = \frac{1}{\beta_m} = \frac{1}{1 + \lambda_m}$, and the convergence is uniform on $[a, b]^2$.
Remark 5. We mention another way to obtain the Mercer representation of $K$. A property of the Poincaré basis is that it is an orthogonal basis of $H^1(\mu)$ with $\|\varphi_m\|_{H^1(\mu)}^2 = 1 + \lambda_m = 1/\alpha_m$ (Lüthen et al., 2021). Thus, the functions $e_m = \sqrt{\alpha_m}\,\varphi_m$ ($m \ge 0$) define an orthonormal basis of $H^1(\mu)$. Then, the representation (3.5) is obtained with the usual representation of a kernel in a separable RKHS (Berlinet and Thomas-Agnan, 2011):
$$K(x, y) = \sum_{m \in \mathbb{N}} \alpha_m \varphi_m(x)\varphi_m(y) = \sum_{m \in \mathbb{N}} e_m(x) e_m(y).$$
However, this approach only yields pointwise convergence, so it is not clear whether $K$ is continuous on $[a, b]^2$. Thus, another argument, coming from the Green function point of view, has been used here to prove the kernel continuity.
Remark 6. At first look, it may be surprising that the Neumann conditions $f'(a) = f'(b) = 0$ do not appear in the RKHS, whereas all basis functions $\varphi_m$ satisfy them (while being dense in $L^2(\mu)$). Actually, it is not difficult to see that any function $f$ of $H^1(\mu)$ can be approximated by a function $f_\varepsilon \in H^1(\mu)$ that verifies the Neumann conditions, simply by truncating $f$ on $[a + \varepsilon, b - \varepsilon]$ and extending it continuously by a constant on $[a, a + \varepsilon]$ and $[b - \varepsilon, b]$. As functions of $H^1(\mu)$ are continuous on the compact interval $[a, b]$ (still under our assumption on µ), the approximation error can be made arbitrarily small.
3.2 Examples
Example 1 (Case of the uniform distribution). Let µ be the uniform distribution on $[a, b]$. Then $L$ is the Laplacian operator, and the spectral problem becomes
$$\varphi'' = -\lambda \varphi$$
with Neumann conditions $\varphi'(a) = \varphi'(b) = 0$. The solutions are given by $\lambda_0 = 0$, $\varphi_0 = 1$ and, for $m \ge 1$,
$$\lambda_m = m^2\omega^2, \qquad \varphi_m(x) = \sqrt{2}\cos(m\omega(x - a)),$$
with $\omega = \pi/(b - a)$. The kernel of $H^1(\mu)$ (with its usual norm) was obtained by Duc-Jacquet (1973) in the 70's. English-written proofs can be found in Atteia (1992), Example 1.4, or Thomas-Agnan (1996). The kernel may be written as
$$K(x, y) = \frac{b - a}{\sinh(b - a)}\,\cosh[\min(x, y) - a]\,\cosh[b - \max(x, y)],$$
where cosh, sinh denote the hyperbolic functions $\cosh(x) = \frac{e^x + e^{-x}}{2}$, $\sinh(x) = \frac{e^x - e^{-x}}{2}$. Applying Proposition 4, we deduce that for all $(x, y) \in [a, b]^2$ such that $x \le y$,
$$K(x, y) = \frac{\pi/\omega}{\sinh(\pi/\omega)}\cosh(x - a)\cosh(b - y) = 1 + \sum_{m=1}^{+\infty} \frac{2}{1 + m^2\omega^2}\cos[m\omega(x - a)]\cos[m\omega(y - a)].$$
As a by-product, we can derive the value of some ‘shifted' Riemann series. For instance, from $x = y = a$ and $x = a, y = b$, and using $r = 1/\omega$, we get the following formulas (with $\tanh(x) = \sinh(x)/\cosh(x)$), valid for all $r > 0$:
$$\sum_{n=1}^{+\infty} \frac{1}{n^2 + r^2} = \frac{1}{2r^2}\left( \frac{\pi r}{\tanh(\pi r)} - 1 \right), \qquad \sum_{n=1}^{+\infty} \frac{(-1)^{n-1}}{n^2 + r^2} = \frac{1}{2r^2}\left( 1 - \frac{\pi r}{\sinh(\pi r)} \right).$$
It can be shown that these formulas are also valid when r tends to zero. The limit case gives the well-known
expression at s = 2 of the Riemann and Dirichlet functions: ζ(2) = π 2 /6 and η(2) = π 2 /12.
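As a quick numerical illustration (our own sketch, not part of the paper), one can check that the truncated Mercer series converges to the closed-form kernel of Example 1:

```python
import numpy as np

a, b = 0.0, 1.0
omega = np.pi / (b - a)

def K_closed(x, y):
    lo, hi = min(x, y), max(x, y)
    return (b - a) / np.sinh(b - a) * np.cosh(lo - a) * np.cosh(b - hi)

def K_mercer(x, y, M):
    m = np.arange(1, M + 1)
    coef = 2.0 / (1.0 + m**2 * omega**2)       # 2 * alpha_m for m >= 1
    return 1.0 + np.sum(coef * np.cos(m * omega * (x - a)) * np.cos(m * omega * (y - a)))

x, y = 0.3, 0.7
for M in (10, 100, 1000):
    print(M, abs(K_mercer(x, y, M) - K_closed(x, y)))   # truncation error decreases
```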
As for the classical space H 1 (a, b), the kernel of H 1 (µ) and its Mercer representation are completely
tractable for the truncated exponential distribution.
Example 2 (Truncated exponential distribution). Consider the exponential distribution, truncated on $[a, b] \subseteq \mathbb{R}_+$:
$$\frac{d\mu}{dx} = \exp(-V(x)) = \frac{e^{-x}}{e^{-a} - e^{-b}}\,\mathbf{1}_{[a,b]}(x).$$
Noting that $V'(x) = 1$ on $[a, b]$ leads to linear differential equations with constant coefficients. Following Roustant et al. (2017, Proof of Proposition 5), the spectral problem
$$\varphi'' - \varphi' = -\lambda\varphi$$
with Neumann conditions $\varphi'(a) = \varphi'(b) = 0$ admits the solutions $\lambda_0 = 0$, $\varphi_0 \equiv 1$ and, for $m \ge 1$,
$$\lambda_m = \frac{1}{4} + (m\omega)^2, \qquad \varphi_m(x) = c_m e^{x/2}\left( 2m\omega\cos(m\omega(x - a)) - \sin(m\omega(x - a)) \right), \qquad (3.6)$$
where $\omega = \pi/(b - a)$ and $c_m$ is a normalizing constant ensuring that $\varphi_m$ has $L^2(\mu)$ norm equal to 1. More precisely, we have
$$c_m = \left( \frac{e^{-a} - e^{-b}}{b - a}\,\frac{1}{2\lambda_m} \right)^{1/2}.$$
From Proposition 4 and following Gantmakher and Krejn (2002), the Green function associated to this spectral problem can be computed by considering the linear homogeneous equation
$$\varphi'' - \varphi' - \varphi = 0.$$
The solutions are spanned by $e^{r_0 x}, e^{r_1 x}$, where $r_0 = \frac{1 - \sqrt{5}}{2}$ and $r_1 = \frac{1 + \sqrt{5}}{2}$ (the golden ratio). Solutions $\psi, \chi$ satisfying the one-sided Neumann conditions $\psi'(a) = 0$ and $\chi'(b) = 0$ are given by
$$\psi(x) = (r_1 e^{r_1 a}) e^{r_0 x} - (r_0 e^{r_0 a}) e^{r_1 x}, \qquad \chi(x) = (r_1 e^{r_1 b}) e^{r_0 x} - (r_0 e^{r_0 b}) e^{r_1 x}. \qquad (3.7)$$
Finally, the normalization constant is given by $C = \chi(b)\int_a^b \psi(x)\,d\mu(x)$. After some algebra, we obtain an explicit closed-form expression of $C$.
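A quick numerical sanity check of (3.7) and of the normalization (our own sketch; the interval $[1, 5]$ is borrowed from Figure 3, and the variable names are ours):

```python
import numpy as np

a, b = 1.0, 5.0                            # e.g. the interval used in Figure 3 (left)
r0, r1 = (1 - np.sqrt(5)) / 2, (1 + np.sqrt(5)) / 2

psi = lambda s: r1 * np.exp(r1 * a) * np.exp(r0 * s) - r0 * np.exp(r0 * a) * np.exp(r1 * s)
chi = lambda s: r1 * np.exp(r1 * b) * np.exp(r0 * s) - r0 * np.exp(r0 * b) * np.exp(r1 * s)

y = np.linspace(a, b, 200001)
rho = np.exp(-y) / (np.exp(-a) - np.exp(-b))            # truncated exponential pdf
trap = lambda f: float(np.sum((f[1:] + f[:-1]) * np.diff(y)) / 2.0)

C = chi(b) * trap(psi(y) * rho)                         # normalizing constant
K = lambda s: psi(np.minimum(s, y)) * chi(np.maximum(s, y)) / C

dpsi = lambda s: r0 * r1 * (np.exp(r1 * a + r0 * s) - np.exp(r0 * a + r1 * s))
print(dpsi(a))                                          # one-sided Neumann: ~ 0
for s in (1.5, 3.0, 4.5):
    print(trap(K(s) * rho))                             # ~ 1, as in Proposition 4
```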
Proposition 5 (Invertibility of the Gram matrix for truncated Mercer representation of H 1 (µ)). Assump-
tion 1 is verified for KM when X contains at most M + 1 distinct points.
Proof. Let $x_0, \ldots, x_n$ be a set of distinct points with $n \le M$. For $i = 0, \ldots, n$, denote $e_i = K_M(x_i, \cdot) \in H^1(\mu)$. Then $K_M(x_i, x_j) = \langle e_i, e_j\rangle_{H^1(\mu)}$. Thus $K_M(X, X)$ is the Gram matrix of the vectors $e_i$ for the scalar product in $H^1(\mu)$. By a classical result, it is invertible if and only if $e_0, e_1, \ldots, e_n$ are linearly independent. It is enough to prove that $e_0, e_1, \ldots, e_M$ are linearly independent, since $n \le M$. Now, by definition, we have
$$e_i = K_M(x_i, \cdot) = \sum_{m=0}^{M} A_{i,m}\,\varphi_m(\cdot)$$
with $A_{i,m} = \alpha_m \varphi_m(x_i)$. Hence, if $e = (e_0, \ldots, e_M)^\top$ and $\varphi = (\varphi_0, \ldots, \varphi_M)^\top$ are column vectors whose elements are in $H^1(\mu)$, then we have $e = A\varphi$. Remember that the $\varphi_m$'s are linearly independent ($m \ge 0$), as they form a Hilbert basis of $L^2(\mu)$. Thus the coordinates of $e$ are linearly independent if and only if $A$ is invertible. Now, the $m$-th column of $A$ is proportional to the vector $(\varphi_m(x_i))_{0 \le i \le M}$, with a non-zero multiplicative coefficient $\alpha_m$. Hence, the rank of $A$ is equal to the rank of the matrix $(\varphi_m(x_i))_{0 \le m, i \le M}$. As the $\varphi_m$'s form a $T$-system, this matrix is invertible, which completes the proof.
Proposition 6 (Invertibility and form of the Gram matrix for H 1 (µ)). Let K be the kernel of H 1 (µ), where
µ is a probability distribution in B. Then, Assumption 1 is verified for every set $X$ composed of distinct nodes. Furthermore, in that case, if the elements of $X$ are sorted in ascending (or descending) order, the precision matrix $K(X, X)^{-1}$ is a one-band matrix (or Jacobi matrix) of the form
$$K(X, X)^{-1} = \begin{pmatrix} a_1 & b_1 & 0 & \cdots & 0 \\ b_1 & a_2 & b_2 & & \vdots \\ 0 & b_2 & a_3 & \ddots & 0 \\ \vdots & & \ddots & \ddots & b_{n-1} \\ 0 & \cdots & 0 & b_{n-1} & a_n \end{pmatrix}.$$
Proof. By Proposition 4 and Theorem 16 (page 238) of Gantmakher and Krejn (2002), the single-pair kernel $K$ is oscillatory in the sense of Definition 1, page 178, of the same reference, implying in particular that for every $X$ composed of distinct nodes, $K(X, X)$ is invertible. The form of the precision matrix is derived in Section II.3, Example 6, pages 79–82, of the same reference.
Remark 7 (Link with Gaussian graphical models). The one-band form of the precision matrix of Propo-
sition 6 shows that a Gaussian process associated to the kernel of H 1 (µ) is a graphical Gaussian model
(Maathuis et al., 2018). Indeed, consider the graph whose vertices are the quadrature nodes sorted in ascending order $x_1 < \cdots < x_n$ and whose edges are formed by successive nodes $(x_i, x_{i+1})$, $i = 1, \ldots, n - 1$. An associated Gaussian process $Z$ verifies that, for $|i - j| > 1$, $Z(x_i)$ and $Z(x_j)$ are independent conditionally on $\{Z(x_\ell)\ \text{s.t.}\ \ell \notin \{i, j\}\}$. This is equivalent to the one-band form of the precision matrix of $Z$ computed at $x_1, \ldots, x_n$.
Proposition 7 (Equivalence of Poincaré quadrature and optimal kernel quadrature in H 1 (µ)). Let (XP , wP )
be the Poincaré quadrature of H 1 (µ) with n nodes and order M = 2n − 1. Then, wce(XP , wP , KM ) = 0 and
(XP , wP ) is an optimal kernel quadrature for HKM , with positive weights.
Conversely, if $(X, w)$ defines a kernel quadrature for $H_{K_M}$ such that $\mathrm{wce}(X, w, K_M) = 0$ and the weights are positive, then $X = X_P$ and $w = w_P$.
Furthermore, $w_P$ minimizes, over all (possibly negative) weights, the worst-case error given $X_P$:
$$\mathrm{wce}(X_P, w_P, K_M) = \min_{w \in \mathbb{R}^n} \mathrm{wce}(X_P, w, K_M).$$
Remark 8. As a consequence of Proposition 7, the optimal weights $w^\star(X_P, K_M)$ are equal to $w_P$, and are thus positive and sum to 1, which was not obvious a priori as they are defined by an optimization problem on $\mathbb{R}^n$.
Proof. Let us first consider the Poincaré quadrature. We know that
$$\int \varphi_m(x)\,d\mu(x) = \sum_{j=1}^{n} w_j \varphi_m(x_j), \qquad m = 0, \ldots, M, \qquad (4.2)$$
or equivalently, multiplying by $\alpha_m \varphi_m(x')$ and summing over $m$,
$$\int \sum_{m=0}^{M} \alpha_m \varphi_m(x')\varphi_m(x)\,d\mu(x) = \sum_{j=1}^{n} w_j \left( \sum_{m=0}^{M} \alpha_m \varphi_m(x')\varphi_m(x_j) \right),$$
i.e.
$$\int K_M(x, x')\,d\mu(x) = \sum_{i=1}^{n} w_i K_M(x_i, x').$$
Similarly, when X is formed by at most M + 1 distinct points, Equations (4.3) and (4.4) are true when
replacing K by KM . In particular, let (XP , wP ) be the Poincaré quadrature of H 1 (µ) with n nodes and order
$M = 2n - 1$. Then we have:
$$w_P = w^\star(X_P, K_M) = K_M(X_P, X_P)^{-1}\mathbf{1}. \qquad (4.5)$$
Proof. First recall that Assumption 1 is verified both for $K$ and $K_M$ (Prop. 6 and 5), in the latter case when $X$ has at most $M + 1$ points. Let us first consider the case of the kernel of $H^1(\mu)$. By Proposition 4, 1., we have, for all $y \in [a, b]$:
$$\ell_K(y) = \int_a^b K(x, y)\,d\mu(x) = 1.$$
From (2.7), we then deduce (4.3). Finally, from (2.8), we deduce (4.4), using that $c_K = \int_a^b \ell_K(y)\,d\mu(y) = 1$.
The same proof applies when $K = K_M$, remarking that, by orthonormality of the Poincaré basis,
$$\ell_{K_M}(y) = \int_a^b K_M(x, y)\,d\mu(x) = \sum_{m=0}^{M} \alpha_m \varphi_m(y) \int_a^b \varphi_m(x)\,d\mu(x) = 1.$$
This implies that formulas (4.3) and (4.4) hold when $K$ is replaced by $K_M$.
Now, let us set $X = X_P$. From Proposition 7, we have $w_P = w^\star(X_P, K_M)$. This implies (4.5). Notice that, as the weights $w_P$ sum to one, we recover the statement of Proposition 7 that $\mathrm{wce}(X_P, w_P, K_M) = 0$.
In our case, the radius of information can hardly be computed. However, we can consider the Poincaré
quadrature, which is the optimal kernel quadrature with positive weights of the finite-dimensional approx-
imation of $H^1(\mu)$, and compute the corresponding worst-case error in $H^1(\mu)$. More precisely, if $(X_P, w_P)$ denotes the Poincaré quadrature with $n$ nodes and order $M = 2n - 1$, we set
$$\mathrm{wce}(n) := \mathrm{wce}(X_P, w_P, K).$$
By definition we have $\mathrm{wce}(n) \ge r(n)$. Furthermore, when $n$ is large, $K_M$ tends to $K$ and we can hope that $\mathrm{wce}(n)$ is a good approximation of $r(n)$. We now provide formulas for $\mathrm{wce}(n)$.
Proposition 9 (Quadrature error). The worst-case error of the Poincaré quadrature with $n$ nodes and order $M = 2n - 1$ can be expressed with the Mercer representation of $H^1(\mu)$ by
$$\mathrm{wce}(n)^2 = \sum_{m \ge M+1} \alpha_m \left( \sum_{i=1}^{n} w_i \varphi_m(x_i) \right)^2. \qquad (4.8)$$
Proof of Proposition 9. Denote by $L$ the linear form on $H_K$ defined by
$$L(f; X, w) = \int f(x)\,d\mu(x) - \sum_{i=1}^{n} w_i f(x_i)$$
and
$$R_M(x, x') = \sum_{m \ge M+1} \alpha_m \varphi_m(x)\varphi_m(x'). \qquad (4.11)$$
Note that $L(K_M(\cdot, x'); X_P, w_P) = 0$ for all $x'$, as the quadrature is exact for the eigenfunctions up to order $M$. Hence, by linearity, it holds for all $x'$:
$$L(K(\cdot, x'); X_P, w_P) = L(R_M(\cdot, x'); X_P, w_P).$$
Now, remark that for every quadrature formula $(X, w)$,
$$\mathrm{wce}(X, w, K) = \left\| x' \mapsto L(K(\cdot, x'); X, w) \right\|_{H_K}.$$
Therefore, we have $\mathrm{wce}(X_P, w_P, K_M) = 0$ and $\mathrm{wce}(X_P, w_P, K) = \mathrm{wce}(X_P, w_P, R_M)$. Thus, the quantity of interest reduces to
$$\mathrm{wce}(n) := \left| \mathrm{wce}(X_P, w_P, K_M) - \mathrm{wce}(X_P, w_P, K) \right| = \mathrm{wce}(X_P, w_P, R_M).$$
Now, when $m \ge M + 1$, we have $\int \varphi_m\,d\mu = 0$. Thus,
$$\mathrm{wce}(n)^2 = \left\| x' \mapsto L(R_M(\cdot, x'); X_P, w_P) \right\|_{H_K}^2 = \left\| \sum_{m \ge M+1} \alpha_m \left( \sum_{i=1}^{n} w_i \varphi_m(x_i) \right) \varphi_m \right\|_{H_K}^2.$$
As $(\varphi_m)$ is an orthogonal basis of $H^1(\mu)$ with $\|\varphi_m\|_{H_K}^2 = 1 + \lambda_m = \alpha_m^{-1}$, we immediately obtain
$$\mathrm{wce}(n)^2 = \sum_{m \ge M+1} \alpha_m \left( \sum_{i=1}^{n} w_i \varphi_m(x_i) \right)^2.$$
Remarking that $R_M(x_i, \cdot) = \sum_{m \ge M+1} \alpha_m \varphi_m(x_i)\varphi_m(\cdot)$, we get
$$\mathrm{wce}(n)^2 = \left\| \sum_{i=1}^{n} w_i R_M(x_i, \cdot) \right\|_{H_K}^2,$$
which gives (4.9). Using (4.5), we deduce (4.10). By Mercer's theorem, as $K$ is continuous, the series $\sum_m \alpha_m \varphi_m(x)\varphi_m(y)$ converges to $K(x, y)$ uniformly on $[a, b] \times [a, b]$. Thus $R_M$ goes to zero uniformly on $[a, b]^2$, and, using the positivity of the weights,
$$\mathrm{wce}(n)^2 \le \sum_{1 \le i,j \le n} w_i w_j \|R_M\|_\infty.$$
The result follows by remarking that $\sum_{1 \le i,j \le n} w_i w_j = \left( \sum_{i=1}^{n} w_i \right)^2 = 1$.
5 The case of H 1 (a, b)
5.1 Nodes coincide with zeros of a basis function
For polynomials, the nodes of the Gaussian quadrature coincide with the zeros of orthogonal polynomials
(Szegö, 1959). The main result of this section can be viewed as a variant of this property on the circle. Indeed,
for the uniform distribution, the Poincaré basis consists of trigonometric functions that verify a stability-by-
product property, similarly to power functions. The key argument is that the quadrature is not only exact
for the functions of the T -system (up to some order) but also for their products. This guarantees that the
quadrature nodes coincide with the zeros of an element of the T -system.
Proposition 10. The nodes of the Poincaré quadrature of H 1 (a, b) with n nodes are equal to the zeros of
the Poincaré basis function ϕn .
Proof. We first prove that the Poincaré quadrature with $n$ nodes is exact for the product $\varphi_j\varphi_k$ of eigenfunctions with $j, k \in \mathbb{N}$ such that $j < n$ and $k \le n$.
Recall that when µ is the uniform distribution on $[a, b]$, we have $\varphi_m(x) = \sqrt{2}\cos(m\omega(x - a))$ with $\omega = \pi/(b - a)$. Using the trigonometric identity $\cos(u)\cos(v) = \frac{1}{2}(\cos(u + v) + \cos(u - v))$, it follows that
$$\varphi_k(x)\varphi_j(x) = \cos((k + j)\omega(x - a)) + \cos((k - j)\omega(x - a)).$$
By Definition 5, the Gaussian quadrature of the $T$-system $(\varphi_m)_{m\in\mathbb{N}}$ with $n$ nodes is exact for $\varphi_m$ for $0 \le m \le 2n - 1$, and hence for $\cos(m\omega(\cdot - a)) \propto \varphi_m$. Let $j, k \in \mathbb{N}$ such that $j < n$, $k \le n$. Without loss of generality, assume $k \ge j$. This implies that both $k + j$ and $k - j$ are non-negative and at most $2n - 1$. Thus, the quadrature is exact for $\cos((k+j)\omega(\cdot - a))$ and $\cos((k-j)\omega(\cdot - a))$, and we have:
$$\delta_{kj} = \int \varphi_k(x)\varphi_j(x)\,d\mu(x) = \int \cos((k+j)\omega(x - a))\,d\mu(x) + \int \cos((k-j)\omega(x - a))\,d\mu(x)$$
$$= \sum_{i=1}^{n} w_i \cos((k+j)\omega(x_i - a)) + \sum_{i=1}^{n} w_i \cos((k-j)\omega(x_i - a)) = \sum_{i=1}^{n} w_i \varphi_k(x_i)\varphi_j(x_i).$$
This proves that the Poincaré quadrature with $n$ nodes is exact for $\varphi_j\varphi_k$, with $j < n$ and $k \le n$. We now deduce that the quadrature nodes coincide with the zeros of $\varphi_n$.
Following (Karlin and Studden, 1966, p. 20), as (ϕm )m∈N is a T -system, there exists a non-zero generalized
polynomial pn = α0 ϕ0 + . . . + αn ϕn that vanishes at x1 , . . . , xn . It can be defined from the generalized
Vandermonde matrix by:
pn (x) = det V (ϕ0 , . . . , ϕn−1 , ϕn ; x1 , . . . , xn , x).
Notice that $\alpha_n \ne 0$. Indeed, by expanding the determinant, we have $\alpha_n = \det V(\varphi_0, \ldots, \varphi_{n-1}; x_1, \ldots, x_n)$, which is non-zero by the $T$-system property. Now, let $j \in \mathbb{N}$ with $j < n$. By the first part above, for all $k \in \mathbb{N}$ such that $k \le n$, the Poincaré quadrature is exact for $\varphi_j\varphi_k$. This implies that $p_n$ is orthogonal to $\varphi_j$:
$$\int p_n(x)\varphi_j(x)\,d\mu(x) = \sum_{k=0}^{n} \alpha_k \int \varphi_k(x)\varphi_j(x)\,d\mu(x) = \sum_{k=0}^{n} \alpha_k \sum_{i=1}^{n} w_i \varphi_k(x_i)\varphi_j(x_i) = \sum_{i=1}^{n} w_i \underbrace{\left( \sum_{k=0}^{n} \alpha_k \varphi_k(x_i) \right)}_{= p_n(x_i) = 0} \varphi_j(x_i) = 0.$$
Thus $p_n$ is orthogonal to $\varphi_0, \ldots, \varphi_{n-1}$. Since $(\varphi_m)_{m\in\mathbb{N}}$ is a system of orthogonal functions, we deduce that $p_n = \alpha_n\varphi_n$. As $\alpha_n \ne 0$, we have $\varphi_n = \alpha_n^{-1} p_n$. From the definition of $p_n$, this implies that $\varphi_n$ vanishes at the quadrature nodes, which was to be proved.
5.2 Explicit quadrature formulas and quadrature error
We first give a standard result about trigonometric sums.
Lemma 1. For all $m \in \mathbb{Z}$ and all $n \in \mathbb{N}^\star$,
$$\sum_{i=1}^{n} \cos\left( \left( i - \frac{1}{2} \right) \frac{m\pi}{n} \right) = \begin{cases} 0 & \text{if } m \text{ is not a multiple of } 2n, \\ n(-1)^p & \text{if } m = 2np \text{ for some } p \in \mathbb{Z}. \end{cases}$$
Proof. Let $x = m\pi/n$. If $m$ is a multiple of $2n$, then $x = 2p\pi$ for some $p \in \mathbb{Z}$. Then
$$\sum_{i=1}^{n} \cos\left( \left( i - \frac{1}{2} \right) x \right) = \sum_{i=1}^{n} \cos(-p\pi) = \sum_{i=1}^{n} (-1)^p = n(-1)^p.$$
Now, assume that $m$ is not a multiple of $2n$. Then $x/2$ is not a multiple of $\pi$ and $\sin(x/2) \ne 0$. Using the trigonometric identity $2\cos a \sin b = \sin(a + b) - \sin(a - b)$, we have
$$2\cos\left( \left( i - \frac{1}{2} \right) x \right) \sin\frac{x}{2} = \sin(ix) - \sin((i - 1)x).$$
Summing with respect to $i$ from 1 to $n$ gives a telescoping sum equal to $\sin(nx) = \sin(m\pi) = 0$, which gives the announced result.
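A brute-force numerical check of Lemma 1 (our own sketch):

```python
import numpy as np

def cosine_sum(m, n):
    i = np.arange(1, n + 1)
    return np.cos((i - 0.5) * m * np.pi / n).sum()

n = 7
for m in range(0, 4 * n + 1):
    expected = n * (-1) ** (m // (2 * n)) if m % (2 * n) == 0 else 0.0
    assert abs(cosine_sum(m, n) - expected) < 1e-9
print("Lemma 1 verified for n =", n, "and m = 0, ..., 4n")
```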
We can now deduce the two main propositions of the section.
Proposition 11. The Poincaré quadrature of $H^1(a, b)$ with $n$ nodes corresponds to the midpoint (or rectangle) quadrature rule
$$\int_a^b f(x)\,dx = \frac{b - a}{n}\sum_{i=1}^{n} f\left( a + \left( i - \frac{1}{2} \right) \frac{b - a}{n} \right).$$
Thus, the optimal weights are equal to $1/n$, and the optimal nodes are evenly spaced on $[a, b]$, located at the middle of each interval $\left[ a + (i - 1)\frac{b - a}{n},\ a + i\,\frac{b - a}{n} \right]$, $i = 1, \ldots, n$. This quadrature has order $2n - 1$ with respect to the generalized polynomials: it is exact for all $\varphi_m \propto \cos\left( m\pi\frac{x - a}{b - a} \right)$ with $m \le 2n - 1$. Furthermore, it is also exact for all $\varphi_m$ such that $m$ is not a multiple of $2n$, and for polynomials of order 1.
Proof. From the previous section, the nodes of the Poincaré quadrature of $H^1(a, b)$ coincide with the zeros of $\varphi_n = \sqrt{2}\cos\left( n\pi\frac{x - a}{b - a} \right)$ on $[a, b]$. Hence they are equal to $x_i = a + \left( i - \frac{1}{2} \right)\frac{b - a}{n}$, for $i = 1, \ldots, n$.
Now, $\varphi_m(x_i) = \sqrt{2}\cos(m\omega(x_i - a)) = \sqrt{2}\cos\left( \left( i - \frac{1}{2} \right)\frac{m\pi}{n} \right)$. Thus, by Lemma 1, $\sum_{i=1}^{n} \varphi_m(x_i) = 0$ if $m$ is not a multiple of $2n$. Hence, if we set $w_i = \frac{1}{n}$ and if $m$ is not a multiple of $2n$, then
$$\int_a^b \varphi_m(x)\,\frac{dx}{b - a} = \delta_{m,0} = \sum_{i=1}^{n} w_i \varphi_m(x_i),$$
where the case $m = 0$ is equivalent to $w_1 + \cdots + w_n = 1$. Recall that the quadrature weights are uniquely determined by the first $n$ equations above, corresponding to $m = 0, \ldots, n - 1$. Indeed, the matrix of the linear system is $(\varphi_m(x_i))_{0 \le m, i \le n-1}$, which is invertible by the $T$-system property. This proves that the optimal weights are equal to $1/n$. Furthermore, the same equations show that the quadrature rule is exact for all $\varphi_m$ such that $m$ is not a multiple of $2n$. Finally, the quadrature is interpreted as the midpoint quadrature rule, which is exact for all polynomials of order 1.
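The exactness properties of Proposition 11 are easy to check numerically (our own sketch; the quadrature is compared to a fine trapezoidal approximation of the integral):

```python
import numpy as np

a, b, n = 0.0, 1.0, 5
omega = np.pi / (b - a)
nodes = a + (np.arange(1, n + 1) - 0.5) * (b - a) / n   # midpoint nodes
w = np.full(n, 1.0 / n)                                 # Poincare weights

def phi(m, x):
    return np.sqrt(2) * np.cos(m * omega * (x - a)) if m > 0 else np.ones_like(x)

t = np.linspace(a, b, 100001)
for m in range(2 * n):                                  # exact up to order 2n - 1
    integral = np.sum((phi(m, t)[1:] + phi(m, t)[:-1]) * np.diff(t)) / 2 / (b - a)
    print(m, integral - w @ phi(m, nodes))              # ~ 0 for every m
```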
Proposition 12 (Quadrature error). Consider the quadrature error defined as the worst-case error of the Poincaré quadrature of $H^1(a, b)$ with $n$ nodes (see Section 4.4). We have
$$\mathrm{wce}(n) = \left( \frac{\frac{b - a}{2n}}{\tanh\left( \frac{b - a}{2n} \right)} - 1 \right)^{1/2} \qquad (5.1)$$
and $\mathrm{wce}(n)$ goes to zero at a linear speed when $n$ tends to infinity:
$$\mathrm{wce}(n) \sim \frac{b - a}{2\sqrt{3}}\,\frac{1}{n}.$$
Proof. Recall that the optimal weights are $w_i = 1/n$, and the optimal nodes are $x_i = a + \left( i - \frac{1}{2} \right)\frac{b - a}{n}$. Then, using (4.8), we have
$$\mathrm{wce}(n)^2 = \sum_{m=2n}^{\infty} \alpha_m \frac{1}{n^2} \left( \sum_{i=1}^{n} \varphi_m(x_i) \right)^2.$$
Now, $\varphi_m(x_i) = \sqrt{2}\cos(m\omega(x_i - a)) = \sqrt{2}\cos\left( \left( i - \frac{1}{2} \right)\frac{m\pi}{n} \right)$, with $\omega = \frac{\pi}{b - a}$. By Lemma 1,
$$\sum_{i=1}^{n} \varphi_m(x_i) = \begin{cases} 0 & \text{if } m \text{ is not a multiple of } 2n, \\ \sqrt{2}\,n(-1)^p & \text{if } m = 2np, \text{ for a positive integer } p. \end{cases}$$
Thus, in the sum above, the non-zero terms are such that $m = 2pn$. Observe that $m \ge 2n$ is then equivalent to $p \ge 1$. Hence, reparameterizing by $p$ and setting $r = \frac{1}{2n\omega} = \frac{b - a}{2\pi n}$, we obtain
$$\mathrm{wce}(n)^2 = 2\sum_{p=1}^{\infty} \alpha_{2np} = 2\sum_{p=1}^{\infty} \frac{1}{1 + p^2/r^2}.$$
This, together with the first shifted Riemann series formula of Section 3.2, gives the explicit form (5.1) of $\mathrm{wce}(n)$. To obtain the speed of convergence, notice that by an immediate application of Lebesgue's theorem, $\sum_{p=1}^{\infty} \frac{1}{p^2 + r^2}$ tends to $\zeta(2) = \frac{\pi^2}{6}$ when $r$ tends to zero. Hence, when $n$ tends to infinity,
$$\mathrm{wce}(n)^2 = 2\sum_{p=1}^{\infty} \frac{r^2}{p^2 + r^2} \sim \frac{2}{(2n\omega)^2}\,\frac{\pi^2}{6} = \frac{(b - a)^2}{12}\,\frac{1}{n^2}.$$
This is the same convergence speed as the worst-case error of the Poincaré quadrature, which we derived in Prop. 12:
$$\frac{\mathrm{wce}(n)}{r(n)} \xrightarrow[n \to \infty]{} 1.$$
Therefore, we can conclude that the Poincaré quadrature is asymptotically optimal for $H^1(0, 1)$. This result is intuitive, as the finite-dimensional kernel $K_M$, for which the Poincaré quadrature is optimal, tends to the kernel of $H^1(\mu)$.
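The explicit formula (5.1) and its asymptotic equivalent are straightforward to compare numerically (our own sketch):

```python
import numpy as np

def wce(n, a=0.0, b=1.0):
    x = (b - a) / (2 * n)              # explicit worst-case error, formula (5.1)
    return np.sqrt(x / np.tanh(x) - 1.0)

for n in (10, 100, 1000):
    print(n, wce(n), 1.0 / (2 * np.sqrt(3) * n))   # (5.1) vs its asymptote, b - a = 1
```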
6 Numerical experiments
6.1 Numerical computation of the Poincaré quadrature
Computation of the spectral decomposition. The first step to compute numerically the Poincaré
quadrature is to obtain the spectral decomposition of Theorem 1. This was investigated by Roustant et al.
(2017, Section 4.2.), who proposed a finite element technique. It consists of solving Problem (P2) in the
finite-dimensional space spanned by Nmesh piecewise linear functions whose knots are evenly spaced, which
then boils down to a matrix diagonalisation problem. The theory of finite elements quantifies the speed of
convergence when Nmesh tends to infinity, depending on the regularity of the probability density function ρ
of µ ∈ B. If $\rho$ is of class $C^\ell$, the Poincaré basis functions are of class $C^{\ell+1}$, and the order of convergence is $O(N_{\mathrm{mesh}}^{-2\ell})$ for the eigenvalues and $O(N_{\mathrm{mesh}}^{-\ell})$ for the eigenfunctions. Notice that the value of $N_{\mathrm{mesh}}$, controlling the mesh size, should of course be much larger than the order of the eigenvalue (or eigenfunction) to estimate.
In practice we choose Nmesh = 1 000. Figure 1 illustrates the result for the uniform distribution on (0, 1) and
the exponential distribution truncated on (0, 3), for which the spectral decomposition is known in closed-form
(as detailed in Section 3.2). We can see that the numerical approximation is accurate.
Figure 1: Poincaré spectral decomposition for the uniform distribution on (0, 1) (top) and the exponential distribution $d\mu(x) = e^{-x}\mathbf{1}_{\mathbb{R}_+}(x)\,dx$, truncated on (0, 3) (bottom). The left panels represent the first five eigenfunctions (Poincaré basis) and the right panels the first 50 eigenvalues in log scale. The colored dotted lines are the approximations computed by finite elements, and the (superimposed) grey solid lines are the true expressions.
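For illustration, the following minimal Python sketch implements the finite element discretization described above, assuming P1 elements and a midpoint approximation of ρ on each element; the function name poincare_fem is ours, not from Roustant et al. (2017):

```python
import numpy as np
from scipy.linalg import eigh

def poincare_fem(rho, a=0.0, b=1.0, n_mesh=1000):
    # P1 finite elements for -(rho f')' = lambda rho f with Neumann conditions;
    # rho is the pdf of mu, approximated by its midpoint value on each element
    x = np.linspace(a, b, n_mesh + 1)
    h = np.diff(x)
    r = rho(0.5 * (x[:-1] + x[1:]))
    A = np.zeros((n_mesh + 1, n_mesh + 1))   # stiffness matrix: int rho f' g'
    B = np.zeros_like(A)                     # mass matrix: int rho f g
    for k in range(n_mesh):
        sl = slice(k, k + 2)
        A[sl, sl] += r[k] / h[k] * np.array([[1.0, -1.0], [-1.0, 1.0]])
        B[sl, sl] += r[k] * h[k] / 6.0 * np.array([[2.0, 1.0], [1.0, 2.0]])
    lam, vec = eigh(A, B)                    # generalized symmetric eigenproblem
    return x, lam, vec

x, lam, vec = poincare_fem(lambda t: np.ones_like(t))    # uniform case on (0, 1)
print(lam[1:4])                                          # ~ (m pi)^2, m = 1, 2, 3
```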
Computation of the Poincaré quadrature. We now assume that the Poincaré basis has been computed
numerically, as explained in the previous paragraph. We aim at computing the Poincaré quadrature with
n nodes. By definition, the Poincaré quadrature is the (generalized) Gaussian quadrature of the Poincaré
basis. Thus, it can be obtained by solving the minimization problem (2.4) of Proposition 2 over probability
distributions σ subject to moment conditions. More precisely, inspired by the work of Ryu and Boyd (2015)
for the usual Gaussian quadrature, we search σ as a discrete measure supported on a fine uniform grid. We thus choose a large integer $N \gg n$ and consider the grid formed by evenly spaced points $z_j = a + jh$ ($j = 0, \ldots, N - 1$), where $h = \frac{b - a}{N - 1}$ is the grid size. Searching for σ of the form $\sigma = \sum_{j=0}^{N-1} w_j \delta_{z_j}$, (2.4) is then rewritten as the following linear programming (LP) problem:
$$w^\star = \arg\min_{w \in [0,1]^N} \sum_{j=0}^{N-1} w_j \varphi_{2n}(z_j) \qquad (6.1)$$
$$\text{subject to} \quad \sum_{j=0}^{N-1} w_j = 1 \quad \text{and} \quad \sum_{j=0}^{N-1} w_j \varphi_i(z_j) = \int_a^b \varphi_i\,d\mu = \delta_{i,0} \quad (i = 0, \ldots, 2n - 1).$$
The problem can be solved numerically by standard LP solvers. However, as the grid points may not contain exactly the unknown nodes, the solution $\sigma^\star = \sum_{j=0}^{N-1} w_j^\star \delta_{z_j}$ is generally supported on more than $n$ points. Thus, as a postprocessing step, we follow Ryu and Boyd (2015) and apply a clustering technique with $n$ clusters (typically the k-means algorithm) to approximate the support points of the distribution. In each cluster $C_i$ ($i = 1, \ldots, n$), a node $x_i$ is defined as a convex combination of its elements $z_j$ with weights proportional to $w_j^\star$ ($j \in C_i$). Finally, the weight $w_i$ associated to this node is defined as the sum of the weights in $C_i$.
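For illustration, the LP (6.1) can be solved with an off-the-shelf solver; the following Python sketch (our own, with hypothetical helper names, and omitting the k-means postprocessing) uses scipy.optimize.linprog:

```python
import numpy as np
from scipy.optimize import linprog

def poincare_lp(phis, a, b, n, N=2000):
    # solve the LP (6.1) on a uniform grid; phis[m] evaluates the m-th basis function
    z = np.linspace(a, b, N)
    c = phis[2 * n](z)                                # objective: int phi_{2n} d sigma
    A_eq = np.vstack([phis[i](z) for i in range(2 * n)])
    b_eq = np.zeros(2 * n); b_eq[0] = 1.0             # moment conditions, incl. sum = 1
    res = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=(0.0, 1.0))
    return z, res.x

# uniform case on (0, 1): phi_0 = 1 and phi_m = sqrt(2) cos(m pi x)
phis = [lambda x: np.ones_like(x)] + \
       [lambda x, m=m: np.sqrt(2) * np.cos(m * np.pi * x) for m in range(1, 12)]
z, w = poincare_lp(phis, 0.0, 1.0, n=5)
print(z[w > 1e-6], w[w > 1e-6])   # mass concentrates near the n Poincare nodes
```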
Due to the finite grid and the heuristic clustering and averaging technique, the moment conditions are in general not fulfilled exactly. To further improve the accuracy of the obtained solution $(x, w)$, we include a second optimization step, again following Ryu and Boyd (2015), in which we minimize the squared norm of the misfit of the moments:
$$(X_P, w_P) = \arg\min_{x, w} \sum_{i=0}^{2n-1} \left( \delta_{i,0} - \sum_{j=1}^{n} w_j \varphi_i(x_j) \right)^2 \qquad (6.2)$$
$$\text{subject to} \quad a \le x_j \le b \quad \text{and} \quad 0 \le w_j \le 1 \quad (j = 1, \ldots, n),$$
over both $x$ and $w$, using an interior-point algorithm (since we compute lower principal representations, whose nodes are always in $(a, b)$). The nodes and weights obtained from postprocessing the solution to (6.1) are used as the starting point for (6.2). The objective function value of the solution $(X_P, w_P)$ of (6.2) is usually of the order of $10^{-7}$. For the uniform distribution, it is of the order of $10^{-16}$.
To evaluate the accuracy of the numerical quadrature, we compute the Poincaré quadrature of the uniform distribution, for which the analytical result is known (see Section 5). We have used the exact expression of the Poincaré basis. Figure 2 shows the results for n = 3 and n = 5 nodes. We can see that the nodes found coincide with the zeros of $\varphi_n$, as expected by the theory. Furthermore, the weights are equal to 1/n (up to machine precision), which is also expected.
Although the numerical computation of the Poincaré basis has been found to be accurate (see above in this section), we also investigate its influence by replacing the exact expression of the Poincaré basis in the previous experiments by its numerical approximation. The results are almost the same (the difference is of the order of $10^{-12}$), showing that the whole procedure gives accurate results for the uniform distribution.
Figure 2: Poincaré quadrature for the uniform distribution on (0, 1), with n = 3 nodes (left) and n = 5 nodes
(right). The curve represents the Poincaré basis function with n roots, and the red crosses and lines the
quadrature nodes and weights obtained by the numerical procedure.
1. The nodes are almost – but in general not exactly – equal to the zeros of $\varphi_n$. The difference is not due to numerical error. It is present even if the basis functions are known analytically, such as in the case of the truncated exponential.
2. The nodes are almost evenly spaced, but slightly skewed towards the concentration of probability mass.
3. The weights, scaled by n, are close to the probability density at the nodes.
Figure 3: Poincaré nodes and weights for the truncated exponential distribution (interval [1, 5], left) and a
nonparametric density (right). Note that each plot has two y-axes with different scaling: the left one (in
black) is for the basis function ϕn and for the nodes xi , while the right one (in gray) is used for the probability
density and the weights wi .
Experiments with random densities. To better understand the properties of the Poincaré quadrature, we investigate it for a number of randomly generated continuous probability distributions. Their probability density functions (pdfs) are generated as follows. On the interval [0, 1], we sample independent realizations of a Gaussian process with mean zero and a Matérn-5/2 covariance kernel with parameter θ = 0.3. Denote one such realization by g(x). Then, we define a pdf proportional to exp(g(x)) (up to a normalization constant). If the minimal value on [0, 1] is smaller than 0.05, we reject this pdf, in order to avoid numerical issues (recall that µ must be a bounded perturbation of the uniform distribution, and thus its pdf must not vanish on the support interval). One example, together with its Poincaré quadrature for n = 8, is shown in Fig. 3b. A set of 100 such pdfs is visualized in Figure 4a. As expected, the configurations are varied, and often correspond to multimodal pdfs.
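A minimal sketch of this generation procedure (our own; we assume the rejection threshold applies to the normalized pdf, and the function name is ours):

```python
import numpy as np

def random_pdf(theta=0.3, grid_size=500, seed=None):
    # density proportional to exp(g), g ~ GP(0, Matern-5/2); reject if min < 0.05
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, grid_size)
    d = np.abs(x[:, None] - x[None, :]) / theta
    C = (1 + np.sqrt(5) * d + 5 * d**2 / 3) * np.exp(-np.sqrt(5) * d)  # Matern-5/2
    g = np.linalg.cholesky(C + 1e-10 * np.eye(grid_size)) @ rng.standard_normal(grid_size)
    pdf = np.exp(g)
    pdf /= np.sum((pdf[1:] + pdf[:-1]) * np.diff(x)) / 2.0             # normalize
    if pdf.min() < 0.05:                                               # reject and retry
        return random_pdf(theta, grid_size, rng.integers(1 << 31))
    return x, pdf

x, pdf = random_pdf(seed=0)
```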
Ratio of pdf and weights. To further investigate the second and third observations mentioned above,
namely that the nodes are nearly evenly spaced and that the weights of the Poincaré quadrature mimic the
associated density, we analyse the nodes and weights associated to the 100 random densities.
In Fig. 4b, we display boxplots of the locations of the n = 5 nodes. We see that the nodes are nearly
evenly spaced, close to the equispaced locations (0.1, 0.3, 0.5, 0.7, 0.9).
Furthermore, in Fig. 4c we display boxplots of the ratio $\frac{n w_i}{\rho(x_i)}$ for the n = 5 quadrature nodes, where the
weights are scaled by n for convenience. As already suggested by Fig. 3, this ratio is quite close
to 1. This suggests that the Poincaré quadrature might provide a good quantization of the density ρ, which we
investigate in the following.
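This ratio is straightforward to compute from any quadrature rule; a one-function sketch, assuming nodes x, weights w, and a pdf tabulated on a grid as in the experiments above:

```python
import numpy as np

def weight_density_ratio(x, w, grid, pdf):
    # Ratio n * w_i / rho(x_i): values near 1 indicate that the quadrature
    # weights track the probability density at the nodes (up to the factor 1/n).
    return len(x) * np.asarray(w) / np.interp(x, grid, pdf)
```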
(a) 100 randomly generated densities (b) Distribution of nodes
(c) Ratio between weights and pdf
Figure 4: Left panel: 100 random pdfs generated by the procedure described in Section 6.2. Right panel:
location of the nodes of the Poincaré quadratures associated to the same densities, with n = 5. Bottom panel:
ratio $\frac{n w_i}{\rho(x_i)}$, where ρ is the pdf associated to µ.
Wasserstein-optimal quantization. It is interesting to compare the Poincaré quadrature to other stan-
dard quantization procedures, where quantization means the approximation of a continuous probability dis-
tribution by a discrete one. Here, we focus on the optimal quantization associated to the Wasserstein
distance, called Wasserstein-optimal quantization. The Wasserstein distance between two cumulative distribution
functions (cdfs) F, G is defined by
$$W(F, G) = \left( \int_0^1 \big( F^{-1}(p) - G^{-1}(p) \big)^2 \, dp \right)^{1/2}. \qquad (6.3)$$
Then, the corresponding optimal quantization with n points is the discrete probability distribution that has
the smallest Wasserstein distance to µ. It can be computed efficiently
using Lloyd’s algorithm (Graf and Luschgy, 2007).
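A minimal sketch of both ingredients, on a fine discretization of µ: in one dimension, Lloyd's algorithm reduces to alternating nearest-node assignment and cell-mean updates, and (6.3) can be evaluated through the two quantile functions. Function names and discretization choices are illustrative, not the exact implementation used here.

```python
import numpy as np

def lloyd_1d(grid, pdf, n, iters=200):
    # Lloyd's algorithm on a discretization of mu: each grid point carries a
    # mass p_j; iterate nearest-node assignment and cell-mean updates.
    p = pdf * np.gradient(grid)
    p /= p.sum()
    # start at the quantiles (2i - 1) / (2n) of mu
    x = np.interp((2 * np.arange(1, n + 1) - 1) / (2 * n), np.cumsum(p), grid)
    for _ in range(iters):
        idx = np.abs(grid[:, None] - x[None, :]).argmin(axis=1)
        mass = np.bincount(idx, weights=p, minlength=n)
        mean = np.bincount(idx, weights=p * grid, minlength=n)
        x = np.where(mass > 0, mean / np.maximum(mass, 1e-300), x)
    return x, mass

def wasserstein(grid, pdf, x, w):
    # Distance (6.3) between mu and sum_i w_i delta_{x_i}, via the two
    # quantile functions evaluated on a uniform grid in p.
    p = np.linspace(0.0, 1.0, 2001)[1:-1]
    cdf = np.concatenate(([0.0], np.cumsum((pdf[1:] + pdf[:-1]) / 2 * np.diff(grid))))
    cdf /= cdf[-1]
    Finv = np.interp(p, cdf, grid)
    order = np.argsort(x)
    Ginv = x[order][np.clip(np.searchsorted(np.cumsum(w[order]), p), 0, len(x) - 1)]
    return np.sqrt(np.mean((Finv - Ginv) ** 2))

# Example: truncated exponential density on [1, 5]
grid = np.linspace(1.0, 5.0, 2000)
pdf = np.exp(-grid)
pdf /= np.trapz(pdf, grid)
nodes, weights = lloyd_1d(grid, pdf, n=5)
print(nodes, weights, wasserstein(grid, pdf, nodes, weights))
```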
For each random pdf, for a fixed number of nodes n = 5, we compute the Poincaré quadrature, the standard
Gaussian quadrature (associated to ordinary polynomials), and the Wasserstein-optimal quantization. We
report the location of the nodes, as well as the zeros of ϕn , in Figure 5a. We observe that the Poincaré nodes
are quite evenly spaced with small variability, and close (but not equal) to the zeros of the Poincaré basis
function ϕn (denoted by red crosses). Furthermore, the Poincaré nodes are more evenly spaced than the
support points of the Wasserstein-optimal quantization (denoted by yellow triangles). Finally, we observe
that the Gaussian nodes are more spread out than the Poincaré nodes: the outermost Gaussian nodes are
closer to the boundary than the outermost Poincaré nodes.
(a) Locations of nodes and zeros (b) Comparison of Wasserstein distances
Figure 5: Analysis of the properties of Poincaré quadrature rules with n = 5 nodes for the random probability
distributions displayed in Fig. 4a. The left panel shows the corresponding nodes for the (usual) Gaussian
quadrature and the Poincaré quadrature, as well as the support points for the Wasserstein-optimal quantiza-
tion. The red crosses indicate the zeros of the Poincaré basis function ϕn . Right panel: Wasserstein distances
between the continuous probability distribution µ and three different quadrature rules: Gaussian, Poincaré,
and Wasserstein-optimal. Each blue point corresponds to a random density and shows the Wasserstein dis-
tances of Gaussian vs Poincaré quadratures. Similarly, each red point corresponds to a random density and
shows the Wasserstein distances of Wasserstein-optimal vs Poincaré quadratures. In this way, each random
density corresponds to two points in the plot. The black dashed line visualizes the identity x = y.
To further quantify the comparison, we measure the Wasserstein distance of the standard Gaussian
quadrature and of the Poincaré quadrature to µ. Results with n = 5 are presented in Figure 5b. We see that in
most cases, the Poincaré quadrature has a smaller Wasserstein distance than the Gaussian one. The optimal
quantization is by construction better than both, but often not much better than the Poincaré
quadrature.
Figure 6: Approximation of wce(n) by wce(XP, wP, KT) for the uniform distribution, and comparison with
the closed-form expression of wce(n) and its asymptotic rate of convergence.
Secondly, we repeat the experiment for 5 random densities generated as described in Section 6.2. The
worst-case error is computed with the fully numerical procedure presented above, with T = 100. The results
are shown in Figure 7. Since we use a truncated kernel, the estimate likely underestimates the true wce(n).
We observe that the worst-case error for the random densities appears to be close to $\frac{1}{\sqrt{12}}\, n^{-1}$, the asymptotic
rate of convergence of wce(n) for $H^1(0, 1)$, suggesting that this rate might be universal.
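For reference, the worst-case error used above can be evaluated from any (possibly truncated) kernel through the standard kernel-quadrature identity; the sketch below assumes a vectorized callable K (e.g. the truncated kernel KT assembled by the numerical procedure, which is not reproduced here) and approximates the µ-integrals by the trapezoidal rule.

```python
import numpy as np

def wce(nodes, weights, grid, pdf, K):
    # Standard kernel-quadrature identity:
    #   wce^2 = int int K dmu dmu - 2 sum_i w_i int K(x_i, .) dmu
    #           + sum_{i,j} w_i w_j K(x_i, x_j),
    # with all mu-integrals approximated by the trapezoidal rule on the grid.
    Kgg = K(grid[:, None], grid[None, :])
    double = np.trapz(np.trapz(Kgg * pdf[None, :], grid, axis=1) * pdf, grid)
    single = np.trapz(K(nodes[:, None], grid[None, :]) * pdf[None, :], grid, axis=1)
    Knn = K(nodes[:, None], nodes[None, :])
    val = double - 2.0 * weights @ single + weights @ Knn @ weights
    return np.sqrt(max(val, 0.0))

# e = wce(x, w, grid, pdf, KT)   # KT: truncated kernel from Section 6.1
```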
Figure 7: Worst-case error (left) for 5 random densities supported on [0, 1] (right). The worst-case error is
computed using Equation (4.9) as wce(XP , wP , KT ) for T = 100.
Acknowledgement
This research was conducted with the support of the consortium in Applied Mathematics CIROQUO, gath-
ering partners in technological research and academia in the development of advanced methods for Computer
Experiments (doi:10.5281/zenodo.6581217), and of the LabEx CIMI within the framework of the research project Global
sensitivity analysis and Poincaré inequalities. Support from the ANR-3IA Artificial and Natural Intelligence
Toulouse Institute is also gratefully acknowledged.
References
Atteia, M. (1992). Hilbertian Kernels and Spline Functions. Studies in Computational Mathematics, 4.
Bakry, D., I. Gentil, and M. Ledoux (2014). Analysis and geometry of Markov diffusion operators, volume 348
of Grundlehren der Mathematischen Wissenschaften [Fundamental Principles of Mathematical Sciences].
Springer, Cham.
Berlinet, A. and C. Thomas-Agnan (2011). Reproducing kernel Hilbert spaces in probability and statistics.
Springer Science & Business Media.
Brass, H. and K. Petras (2011). Quadrature theory: the theory of numerical integration on a compact interval.
Number 178. American Mathematical Society.
Dick, J., D. Nuyens, and F. Pillichshammer (2014). Lattice rules for nonperiodic smooth integrands. Nu-
merische Mathematik 126 (2), 259–291.
Duc-Jacquet, M. (1973). Approximation des fonctionnelles linéaires sur les espaces hilbertiens autorepro-
duisants. Ph. D. thesis, Université Joseph-Fourier-Grenoble I.
Fasshauer, G. E. (2012). Green’s functions: Taking another look at kernel approximation, radial basis
functions, and splines. In Approximation Theory XIII: San Antonio 2010, pp. 37–63. Springer.
Gantmakher, F. R. and M. G. Krejn (2002). Oscillation matrices and kernels and small vibrations of me-
chanical systems. Number 345. American Mathematical Society.
Garcke, J. and M. Griebel (2012). Sparse grids and applications, Volume 88. Springer Science & Business
Media.
Graf, S. and H. Luschgy (2007). Foundations of quantization for probability distributions. Springer.
Karlin, S. and W. Studden (1966). T-systems: with applications in analysis and statistics. Pure Appl. Math.,
Interscience Publishers, New York, London, Sydney.
Karvonen, T., M. Kanagawa, and S. Särkkä (2019). On the positivity and magnitudes of Bayesian quadrature
weights. Statistics and Computing 29 (6), 1317–1333.
Lüthen, N., O. Roustant, F. Gamboa, B. Iooss, S. Marelli, and B. Sudret (2021). Global sensitivity analysis
using derivative-based sparse Poincaré chaos expansions.
Maathuis, M., M. Drton, S. Lauritzen, and M. Wainwright (2018). Handbook of graphical models. CRC Press.
Novak, E. and H. Woźniakowski (2008). Tractability of Multivariate Problems: Standard information for
functionals, Volume 2. European Mathematical Society.
Oettershagen, J. (2017). Construction of optimal cubature algorithms with applications to econometrics and
uncertainty quantification. Verlag Dr. Hut.
O’Hagan, A. (1991). Bayes–Hermite quadrature. Journal of Statistical Planning and Inference 29 (3), 245–260.
Roustant, O., F. Barthe, and B. Iooss (2017). Poincaré inequalities on intervals – application to sensitivity
analysis. Electronic Journal of Statistics 11 (2), 3081 – 3119.
Ryu, E. K. and S. P. Boyd (2015). Extensions of Gauss quadrature via linear programming. Foundations of
Computational Mathematics 15 (4), 953–971.
Smolyak, S. A. (1963). Quadrature and interpolation formulas for tensor products of certain classes of
functions. In Doklady Akademii Nauk, Volume 148, pp. 1042–1045. Russian Academy of Sciences.
Szegö, G. (1959). Orthogonal polynomials. American Mathematical Society Colloquium Publications.
Thomas-Agnan, C. (1996). Computing a family of reproducing kernels for statistical applications. Numerical
Algorithms 13 (1), 21–32.
Zhang, S. and E. Novak (2019). Optimal quadrature formulas for the Sobolev space H 1 . Journal of Scientific
Computing 78 (1), 274–289.