Calculus of Variations: Jump Processes As Generalized Gradient Flows
(2022) 61:33
https://doi.org/10.1007/s00526-021-02130-2 Calculus of Variations
Received: 11 August 2020 / Accepted: 17 October 2021 / Published online: 4 January 2022
© The Author(s) 2021
Abstract
We have created a functional framework for a class of non-metric gradient systems. The state
space is a space of nonnegative measures, and the class of systems includes the Forward
Kolmogorov equations for the laws of Markov jump processes on Polish spaces. This frame-
work comprises a definition of a notion of solutions, a method to prove existence, and an
archetype uniqueness result. We do this by using only the structure that is provided directly
by the dissipation functional, which need not be homogeneous, and we do not appeal to any
metric structure.
Contents
1 Introduction
1.1 Generalized gradient systems for Markov jump processes
1.2 Challenges
1.2.1 Definition of a solution
1.2.2 Existence of solutions
Oliver Tse
[email protected]
Mark A. Peletier
[email protected]
Riccarda Rossi
[email protected]
Giuseppe Savaré
[email protected]
1 Department of Mathematics and Computer Science and Institute for Complex Molecular Systems,
TU Eindhoven 5600 MB, Eindhoven, The Netherlands
2 DIMI, Università degli studi di Brescia, Via Branze 38, 25133 Brescia, Italy
3 Department of Decision Sciences and BIDSA, Bocconi University, Via Roentgen 1, 20136 Milan,
Italy
4 Department of Mathematics and Computer Science, Eindhoven University of Technology, 5600 MB
Eindhoven, The Netherlands
33 Page 2 of 85 M. A. Peletier et al.
1 Introduction
The study of dissipative variational evolution equations has seen a tremendous activity in the
last two decades. A general class of such systems is that of generalized gradient flows, which
formally can be written as
\[ \dot\rho = D_\zeta R^*\big(\rho, -D_\rho E(\rho)\big) \tag{1.1} \]
in terms of a driving functional E and a dual dissipation potential R∗ = R∗ (ρ, ζ), where
Dζ and Dρ denote derivatives with respect to ζ and ρ. The most well-studied of these are
classical gradient flows [4], for which ζ → Dζ R∗ (ρ, ζ) = K(ρ)ζ is a linear operator K(ρ),
and rate-independent systems [61], for which ζ → Dζ R∗ (ρ, ζ) is zero-homogeneous.
However, various models naturally lead to gradient structures that are neither classic nor
rate-independent. For these systems, the map ζ → Dζ R∗ (ρ, ζ) is neither linear nor zero-
homogeneous, and in many cases it is not even homogeneous of any order. Some examples
are
(1) Models of chemical reactions, where R∗ depends exponentially on ζ [6,32,38,46],
(2) The Boltzmann equation, also with exponential R∗ [38],
(3) Nonlinear viscosity relations such as the Darcy-Forchheimer equation for porous media
flow [39,44],
(4) Effective, upscaled descriptions in materials science, where the effective potential R∗
arises through a cell problem, and can have many different types of dependence on ζ
[16,28,46,55,65,72–74],
(5) Gradient structures that arise from large-deviation principles for sequences of stochastic
processes, in particular jump processes [58,59].
The last example is the inspiration for this paper.
Regardless of whether R∗ is classic, rate-independent, or otherwise, equation (1.1) typically
is only formal, and it is a major mathematical challenge to construct an appropriate func-
tional framework for this equation. Such a functional framework should give the equation a
rigorous meaning, and provide the means to prove well-posedness, stability, regularity and
approximation results to facilitate the study of the equation.
For classical gradient systems, in which Dζ R∗ is linear and R∗ is quadratic in ζ (therefore
also called ‘quadratic’ gradient systems) and when R∗ generates a metric space, a rich frame-
work has been created by Ambrosio, Gigli, and Savaré [4]. For rate-independent systems, in
which R∗ is 1-homogeneous in ζ, the complementary concepts of ‘Global Energetic solutions’
and ‘Balanced Viscosity solutions’ give rise to two different frameworks [19,61,62,64,67].
For the examples (1–5) listed above, however, R∗ is not homogeneous in ζ, and neither the
rate-independent frameworks nor the metric-space theory apply. Nonetheless, the existence of
such models of real-world systems with a formal variational-evolutionary structure suggests
that there may exist a functional framework for such equations that relies on this structure.
In this paper we build exactly such a framework for an important class of equations of this
type, those that describe Markov jump processes. We expect the approach advanced here to
be applicable to a broader range of systems.
Some generalized gradient-flow structures of evolution equations are generated by the large
deviations of an underlying, more microscopic stochastic process [1,2,22,46,59,60]. This
explains the origin and interpretation of such structures, and it can be used to identify hitherto
unknown gradient-flow structures [36,71].
It is the example of Markov jump processes that inspires the results of this paper, and we
describe this example here; nonetheless, the general setup that starts in Sect. 3.1 has wider
application. We think of Markov jump processes as jumping from one ‘vertex’ to another
‘vertex’ along an ‘edge’ of a ‘graph’; we place these terms between quotes because the
space V of vertices may be finite, countable, or even uncountable, and similarly the space
E:=V × V of edges may be finite, countable, or uncountable (see Assumption (Vπκ) below).
In this paper, V is a standard Borel space.
The jump kernel κ in these definitions characterizes the process: κ(x, ·) ∈ M+ (V ) is the
infinitesimal rate of jumps of a particle from the point x to points in V . Here we address the
reversible case, which means that the process has an invariant measure π ∈ M+ (V ), i.e.,
Q ∗ π = 0, and that the joint measure π(dx)κ(x, dy) is symmetric in x and y.
In this paper we consider evolution equations of the form (1.2) for the nonnegative measure
ρ, as well as various linear and nonlinear generalizations. We will view them as gradient
systems of the form (1.1), and use this gradient structure to study their properties.
The gradient structure for equation (1.2) consists of the state space M+ (V ), a driving
functional E : M+ (V ) → [0, +∞], and a dual dissipation potential R ∗ : M+ (V ) ×
Bb (E) → [0, +∞] (where Bb (E) denotes the space of bounded Borel functions on E). We
now describe this structure in formal terms; making it rigorous is one of the aims of this
paper.
The functional that drives the evolution is the relative entropy with respect to the invariant
measure π, namely
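\[
\mathscr{E}(\rho) = \mathscr{F}_\phi(\rho|\pi) :=
\begin{cases}
\displaystyle\int_V \phi\big(u(x)\big)\,\pi(dx) & \text{if } \rho \ll \pi, \text{ with } u = \dfrac{d\rho}{d\pi},\\[1ex]
+\infty & \text{otherwise,}
\end{cases}
\tag{1.4}
\]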
where for the example of Markov jump processes the ‘energy density’ φ is given by the
Boltzmann entropy function
\[ \phi(s) := s\log s - s + 1. \tag{1.5} \]
(In the general development below we consider more general functions φ, such as those that
arise in strongly interacting particle systems; see e.g. [23,45]).
The dissipation potential R∗ is best written in terms of an alternative potential R ∗ ,
Here the ‘graph gradient’ ∇ : Bb (V ) → Bb (E) and its negative dual, the ‘graph divergence
operator’ div : M(E) → M(V ), are defined as follows:
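\[
\langle D_\zeta R^*(\rho,\zeta), \tilde\zeta\rangle
= \frac{d}{dh}\,\mathscr{R}^*(\rho, \nabla\zeta + h\nabla\tilde\zeta)\Big|_{h=0}
= \langle D_\xi \mathscr{R}^*(\rho,\nabla\zeta), \nabla\tilde\zeta\rangle
= \langle -\operatorname{div} D_\xi \mathscr{R}^*(\rho,\nabla\zeta), \tilde\zeta\rangle,
\]
and DE(ρ) = φ′(u) (which corresponds to log u for the logarithmic entropy (1.5)). This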
(div, ∇)-duality structure is a common feature in both physical and probabilistic models, and
has its origin in the distinction between ‘states’ and ‘processes’; see [70, Sec. 3.3] and [69]
for discussions.
For this example of Markov jump processes we consider a class of generalized gradient
structures of the type above, given by E and R ∗ (or equivalently by the densities φ, Ψ∗, and
the measure νρ ), with the property that equations (1.1) and (1.9) coincide with (1.2). Even for
fixed E there exists a range of choices for Ψ∗ and νρ that achieve this (see also the discussion
in [35,59]). A simple calculation (see the discussion at the end of Sect. 3.1) shows that, if
one chooses for the measure νρ the form
νρ (dx dy) = α(u(x), u(y)) π(dx)κ(x, dy), (1.10)
for a suitable function α : [0, ∞) × [0, ∞) → [0, ∞), and one introduces the map F :
(0, ∞) × (0, ∞) → ℝ
\[ F(u,v) := (\Psi^*)'\big(\phi'(v) - \phi'(u)\big)\,\alpha(u,v), \qquad u, v > 0, \tag{1.11} \]
then (1.9) takes the form of the integro-differential equation
\[ \partial_t u_t(x) = \int_{y\in V} F\big(u_t(x), u_t(y)\big)\,\kappa(x,dy), \tag{1.12} \]
Two other choices have received attention in the recent literature. Both of these are based
not on the quadratic energy φ(s) = ½s², but on the Boltzmann entropy functional φ(s) =
s log s − s + 1:
(1) The large-deviation characterization [58,59] leads to the choice
\[ \Psi^*(\xi) := 4\big(\cosh(\xi/2) - 1\big) \quad\text{and}\quad \alpha(u,v) := \sqrt{uv}. \tag{1.17a} \]
(2) The ‘quadratic-dissipation’ choice introduced independently by Maas [49], Mielke [52],
and Chow, Huang, and Zhou [13] for Markov processes on finite graphs,
\[ \Psi^*(\xi) := \tfrac12\xi^2, \quad \Psi(s) = \tfrac12 s^2, \quad\text{and}\quad \alpha(u,v) := \frac{u-v}{\log(u)-\log(v)}. \tag{1.17b} \]
Other examples are discussed in Sect. 1.3. With the quadratic choice (1.17b), the gradient
system fits into the metric-space structure (see e.g. [4]) and this feature has been used exten-
sively to investigate the properties of general Markov jump processes [25,29–31,49,52]. In
this paper, however, we focus on functions Ψ∗ that are not homogeneous, as in (1.17a), and
such that the corresponding structure is not covered by the usual metric framework. On the
other hand, there are various arguments why this structure nonetheless has a certain ‘natu-
ralness’ (see Sect. 1.4), and these motivate our aim to develop a functional framework based
on this structure.
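A quick numerical sanity check may help here. The sketch below evaluates the map F of (1.11) for the two choices (1.17a) and (1.17b), with the Boltzmann entropy φ′(s) = log s, and compares it with the linear field v − u. The function and variable names are hypothetical and chosen only for this illustration; the derivatives (Ψ∗)′ are computed by hand from the formulas above.

```python
import math

def F(dpsi_star, dphi, alpha, u, v):
    """F(u, v) = (Psi*)'(phi'(v) - phi'(u)) * alpha(u, v), as in (1.11)."""
    return dpsi_star(dphi(v) - dphi(u)) * alpha(u, v)

# Boltzmann entropy phi(s) = s log s - s + 1, hence phi'(s) = log s.
dphi = math.log

# 'cosh' choice (1.17a): Psi*(xi) = 4(cosh(xi/2) - 1), so (Psi*)'(xi) = 2 sinh(xi/2).
def dpsi_cosh(xi):
    return 2.0 * math.sinh(xi / 2.0)

def alpha_geo(u, v):
    return math.sqrt(u * v)

# Quadratic choice (1.17b): Psi*(xi) = xi^2 / 2, so (Psi*)'(xi) = xi.
def dpsi_quad(xi):
    return xi

def alpha_log(u, v):
    # Logarithmic mean, extended by continuity on the diagonal u == v.
    return u if u == v else (u - v) / (math.log(u) - math.log(v))

samples = [(0.3, 2.0), (1.5, 0.2), (4.0, 4.5)]
for u, v in samples:
    f_cosh = F(dpsi_cosh, dphi, alpha_geo, u, v)
    f_quad = F(dpsi_quad, dphi, alpha_log, u, v)
    print(u, v, f_cosh, f_quad, v - u)
```

Both columns agree with v − u up to rounding, which is consistent with the statement that different pairs (Ψ∗, α) can generate the same linear equation.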
1.2 Challenges
Constructing a ‘functional framework’ for the gradient-flow equation (1.9) with the
choices (1.5) and (1.17a) presents a number of independent challenges.
As it stands, the formulation of equation (1.9) and of the functional R∗ of (1.8) presents
many difficulties: the definition of R∗ and the measure νρ when ρ is not absolutely contin-
uous with respect to π, the concept of time differentiability for the curve of measures ρt ,
whether ρt is necessarily absolutely continuous with respect to π along an evolution, what
happens if dρt /dπ vanishes and φ is not differentiable at 0 as in the case of the logarithmic
entropy, etcetera. As a result of these difficulties, it is not clear what constitutes a solution of
equation (1.9), let alone whether such solutions exist. In addition, a good solution concept
should be robust under taking limits, and the formulation (1.9) does not seem to satisfy this
requirement either.
For quadratic and rate-independent systems, successful functional frameworks have been
constructed on the basis of variational tools such as the Energy-Dissipation balance [4,42,
61,63]. Moreover, these functional frameworks have been shown to be stable under various
forms of asymptotic limits [46,54,66,79,80]. This strongly suggests that also the framework
proposed here should enjoy this stability with respect to perturbations of E and R which, in
particular, would allow one to generalize the more classical notion of solutions developed
in Sect. 6. We have chosen not to dwell upon the stability issue to avoid overburdening the
exposition; we only provide a ‘partial’ stability result (with E and R fixed) in Theorem 5.10.
In fact, the same large-deviation principle that gives rise to the ‘cosh’ structure (1.17a)
above formally yields the ‘EDP’ functional (see Appendix A for a formal derivation)
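\[
\mathscr{L}(\rho, j) :=
\begin{cases}
\displaystyle\int_0^T \Big( \mathscr{R}(\rho_t, j_t) + \mathscr{R}^*\Big(\rho_t, -\nabla\phi'\Big(\frac{d\rho_t}{d\pi}\Big)\Big) \Big)\,dt + \mathscr{E}(\rho_T) - \mathscr{E}(\rho_0)
 & \text{if } \partial_t\rho_t + \operatorname{div} j_t = 0 \text{ and } \rho_t \ll \pi \text{ for all } t\in[0,T],\\[1ex]
+\infty & \text{otherwise.}
\end{cases}
\tag{1.18}
\]
In this formulation, R is the Legendre dual of R ∗ with respect to the ξ variable, which can
be written in terms of the Legendre dual Ψ := Ψ∗∗ of Ψ∗ as
\[ \mathscr{R}(\rho, j) := \frac12 \int_E \Psi\Big(2\,\frac{dj}{d\nu_\rho}\Big)\,d\nu_\rho. \tag{1.19} \]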
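Along smooth curves ρt = u t π with strictly positive densities, the functional L is
nonnegative, since
\begin{align*}
\frac{d}{dt}\mathscr{E}(\rho_t)
&= \int_V \phi'(u_t)\,\partial_t u_t\,d\pi = \int_V \phi'(u_t(x))\,\partial_t\rho_t(dx) \\
&= -\int_V \phi'(u_t(x))\,(\operatorname{div} j_t)(dx) \\
&= \int_E \nabla\phi'(u_t)(x,y)\, j_t(dx\,dy) \\
&= \int_E \nabla\phi'(u_t)(x,y)\,\frac{dj_t}{d\nu_{\rho_t}}(x,y)\,\nu_{\rho_t}(dx\,dy) \tag{1.20}\\
&\ge -\frac12 \int_E \Big[ \Psi\Big(2\,\frac{dj_t}{d\nu_{\rho_t}}(x,y)\Big) + \Psi^*\big(-\nabla\phi'(u_t)(x,y)\big)\Big]\,\nu_{\rho_t}(dx\,dy). \tag{1.21}
\end{align*}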
After time integration we find that L (ρ, j ) is nonnegative for any pair (ρ, j ).
The minimum of L is formally achieved at value zero, at pairs (ρ, j ) satisfying
\[ 2 j_t = (\Psi^*)'\Big(-\nabla\phi'\Big(\frac{d\rho_t}{d\pi}\Big)\Big)\,\nu_{\rho_t} \quad\text{and}\quad \partial_t\rho_t + \operatorname{div} j_t = 0, \tag{1.22} \]
which is an equivalent way of writing the gradient-flow equation (1.9). This can be recognized,
as usual for gradient systems, by observing that achieving equality in the inequality (1.21)
requires equality in the Legendre duality of Ψ and Ψ∗, which reduces to the equations above.
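The pointwise inequality behind (1.21) and its equality case (1.22) can be checked numerically for the ‘cosh’ pair. The sketch below is only an illustration: the closed form used for Ψ, namely Ψ(s) = 2s asinh(s/2) − 2√(s² + 4) + 4, is the Legendre dual of Ψ∗(ξ) = 4(cosh(ξ/2) − 1) worked out by hand for this example, and all names are hypothetical.

```python
import math

def psi_star(xi):
    """Psi*(xi) = 4(cosh(xi/2) - 1), the 'cosh' choice (1.17a)."""
    return 4.0 * (math.cosh(xi / 2.0) - 1.0)

def psi(s):
    """Legendre dual of psi_star, computed by hand (an assumption of this sketch)."""
    return 2.0 * s * math.asinh(s / 2.0) - 2.0 * math.sqrt(s * s + 4.0) + 4.0

def young_gap(a, w):
    """a*w + (Psi(2w) + Psi*(-a))/2, which the step to (1.21) asserts is >= 0.

    Here a plays the role of the graph gradient of phi'(u_t) at an edge,
    and w the role of the density of j_t with respect to nu_rho.
    """
    return a * w + 0.5 * (psi(2.0 * w) + psi_star(-a))

grid = [k / 4.0 for k in range(-12, 13)]
min_gap = min(young_gap(a, w) for a in grid for w in grid)

# Equality case (1.22): 2w = (Psi*)'(-a) = 2 sinh(-a/2).
eq_gaps = [young_gap(a, math.sinh(-a / 2.0)) for a in grid]

print("smallest gap on the grid:", min_gap)
print("largest |gap| along the equality curve:", max(abs(g) for g in eq_gaps))
```

On the grid the gap never becomes negative, and it vanishes (up to rounding) exactly along the curve prescribed by (1.22).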
Remark 1.1 It is worth noticing that the joint convexity of the functional R of (1.19) (a
crucial property for the development of our analysis) is equivalent to the convexity of Ψ and
concavity of the function α.
Remark 1.2 Let us add a comment concerning the choice of the factor 1/2 in front of Ψ∗
in (1.8), and the corresponding factors 1/2 and 2 in (1.19). The cosh-entropy combination
(1.17a) satisfies the linear-equation condition F(u, v) = v − u (equation (1.13)) because
of the elementary identity
\[ 2\sqrt{uv}\,\sinh\Big(\frac12\log\frac{v}{u}\Big) = v - u. \]
The factor 1/2 inside the sinh can be included in different ways. In [59] it was included
explicitly, by writing expressions of the form DR∗ (ρ, −½ DE(ρ)); in this paper we follow
[46] and include this factor in the definition of R ∗ .
The first test of a new solution concept is whether solutions exist under reasonable conditions.
In this paper we provide two existence results that complement each other, one based on
dissipative L 1 -theory and the other on the Energy-Dissipation balance. These theories are
not completely equivalent, as can be observed for the classical heat equation (cf. [42] for
a variational approach), and they reveal different properties of a solution, even when the
assumptions for both theories are satisfied.
The first existence proof is based on a reformulation of the equation (1.2) as a differential
equation in the Banach space L 1 (V , π), driven by a continuous dissipative operator. Under
general compatibility conditions on φ, Ψ, and α, we show that the solution provided by
this abstract approach is also a solution in the variational sense that we discussed above.
The proof is presented in Sect. 6 and is quite robust for initial data whose density takes
values in a compact interval [a, b] ⊂ (0, ∞). In order to deal with a more general class of
initial data, we will adopt two different viewpoints. A first possibility is to take advantage of
the robust stability properties of the (E , R , R ∗ ) Energy-Dissipation balance when the Fisher
information D is lower semicontinuous (cf. Theorem 5.10). A second possibility is to exploit
the monotonicity properties of (1.12) when the map F in (1.11) exhibits good behaviour at
the boundary of ℝ²₊ and at infinity (cf. Theorem 6.5). As mentioned above, neither of these
viewpoints is completely contained within the other.
Since we believe that the variational formulation reveals a relevant structure of such
systems and we expect that it may also be useful in dealing with more singular cases and their
stability issues, we also present a more intrinsic approach by adapting the well-established
‘JKO-Minimizing-Movement’ method to the structure of this equation. This method has
been used, e.g., for metric-space gradient flows [4,42], for rate-independent systems [51],
for some non-metric systems with formal metric structure [7,48], and also for Lagrangian
systems with local transport [33].
This approach relies on the Dynamical-Variational Transport cost (DVT) W (τ, μ, ν),
which is the τ -dependent transport cost between two measures μ, ν ∈ M+ (V ) induced by
the dissipation potential R via
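\[ \mathscr{W}(\tau,\mu,\nu) := \inf\Big\{ \int_0^\tau \mathscr{R}(\rho_t, j_t)\,dt \;:\; \partial_t\rho_t + \operatorname{div} j_t = 0,\ \rho_0 = \mu,\ \text{and } \rho_\tau = \nu \Big\}. \tag{1.25} \]
In the Minimizing-Movement scheme a single increment with time step τ > 0 is defined by
the minimization problem
\[ \rho^n \in \operatorname*{argmin}_\rho \big( \mathscr{W}(\tau,\rho^{n-1},\rho) + \mathscr{E}(\rho) \big). \tag{1.26} \]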
where S− : D(E ) → [0, +∞) is a suitable relaxed slope of the energy functional E with
respect to the cost W (see (7.29)). Under a lower-semicontinuity condition on D we show
that S − ≥ D . It then follows that ρ is a solution as defined above (see Definition 5.4).
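As a purely illustrative sketch of the increment (1.26), one can replace the DVT cost by a quadratic ‘metric’ cost |x − y|²/(2τ) on the real line and take the energy E(x) = x²/2. These choices, the function names, and the ternary-search minimizer are assumptions made for this example and are not the construction of the paper; in this toy setting the scheme reduces to the implicit Euler method for ẋ = −x.

```python
import math

def argmin_1d(f, lo, hi, iters=200):
    """Ternary search for the minimizer of a convex function on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def minimizing_movement(energy, cost, x0, tau, n_steps, lo=-10.0, hi=10.0):
    """Iterate x^n in argmin_y ( cost(tau, x^{n-1}, y) + energy(y) )."""
    xs = [x0]
    for _ in range(n_steps):
        prev = xs[-1]
        xs.append(argmin_1d(lambda y: cost(tau, prev, y) + energy(y), lo, hi))
    return xs

def energy(x):
    return 0.5 * x * x

def metric_cost(tau, x, y):
    # Quadratic stand-in for the transport cost: |x - y|^2 / (2 tau).
    return (y - x) ** 2 / (2.0 * tau)

tau, n_steps = 0.01, 100
xs = minimizing_movement(energy, metric_cost, 1.0, tau, n_steps)
print("discrete solution at t = 1:", xs[-1])
print("implicit Euler value:      ", (1.0 + tau) ** (-n_steps))
print("exact gradient flow e^-1:  ", math.exp(-tau * n_steps))
```

The `cost` argument is deliberately left generic, so that a non-quadratic, τ-dependent cost could be substituted without changing the iteration itself.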
Section 7 is devoted to developing the ‘Minimizing-Movement’ approach for general
DVTs. This requires establishing
(6) Properties of W that generalize those of the ‘metric version’ W (τ, μ, ν) = (1/(2τ)) d(μ, ν)²
(Sect. 7.2);
(7) A generalization of the ‘Moreau-Yosida approximation’ and of the ‘De Giorgi variational
interpolant’ to the non-metric case, and a generalization of their properties (Sects. 7.1
and 7.2);
(8) A compactness result as τ → 0, based on the properties of W (Sect. 7.4);
(9) A proof of S − ≥ D (Corollary 7.11).
This procedure leads to our existence result, Theorem 7.4, of solutions in the sense of Defi-
nition 5.4.
1.3 Examples
We will use the following two guiding examples to illustrate the results of this paper. Precise
assumptions are given in Sect. 3.1. In both examples the state space consists of measures ρ
on a standard Borel space (V , B) endowed with a reference Borel measure π. The kernel
x → κ(x, ·) is a measurable family of nonnegative measures with uniformly bounded mass,
such that the pair (π, κ) satisfies detailed balance (see Sect. 3.1).
Example 1: Linear equations driven by the Boltzmann entropy. This is the example that
we have been using in this introduction. The equation is the linear equation (1.2),
\[ \partial_t\rho_t(dx) = \int_{y\in V} \rho_t(dy)\,\kappa(y,dx) - \rho_t(dx)\int_{y\in V}\kappa(x,dy), \]
and corresponds to the linear field F of (1.13). Apart from the classical quadratic setting
of (1.14), two gradient structures for this equation have recently received attention in the
literature, both driven by the Boltzmann entropy (1.5) φ(s) = s log s − s + 1 as described
in (1.17):
(1) The ‘cosh’ structure: Ψ∗(ξ) = 4(cosh(ξ/2) − 1) and α(u, v) = √(uv);
(2) The ‘quadratic’ structure: Ψ∗(ξ) = ½ξ² and α(u, v) = (u − v)/ log(u/v).
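To make Example 1 concrete, the following sketch integrates the linear equation on a hypothetical three-vertex graph. The measure π, the symmetric edge weights, the explicit Euler discretization, and all names are assumptions chosen for illustration. Under detailed balance the equation for the density u = dρ/dπ reads ∂t u(x) = Σ_y κ(x, y)(u(y) − u(x)), and the sketch records the relative entropy along the discrete evolution.

```python
import math

def phi(s):
    """Boltzmann entropy density phi(s) = s log s - s + 1, extended by phi(0) = 1."""
    return 1.0 if s == 0.0 else s * math.log(s) - s + 1.0

# Hypothetical data: invariant measure pi and symmetric edge weights c, so that
# pi(x) * kappa(x, y) = c(x, y) is symmetric in x and y (detailed balance).
pi = [0.5, 0.3, 0.2]
c = [[0.0, 0.2, 0.1],
     [0.2, 0.0, 0.3],
     [0.1, 0.3, 0.0]]
n = len(pi)
kappa = [[c[x][y] / pi[x] for y in range(n)] for x in range(n)]

def rhs(u):
    """Right-hand side of (1.12) with the linear field F(u, v) = v - u."""
    return [sum(kappa[x][y] * (u[y] - u[x]) for y in range(n)) for x in range(n)]

def entropy(u):
    return sum(pi[x] * phi(u[x]) for x in range(n))

def mass(u):
    return sum(pi[x] * u[x] for x in range(n))

u = [2.0, 0.0, 0.0]  # density (w.r.t. pi) of a unit mass placed at vertex 0
dt = 0.01            # Euler step, small enough that dt * sum_y kappa(x, y) <= 1
mass0 = mass(u)
entropies = [entropy(u)]
for _ in range(2000):
    du = rhs(u)
    u = [u[x] + dt * du[x] for x in range(n)]
    entropies.append(entropy(u))

print("final density:", u)
print("entropy at start and end:", entropies[0], entropies[-1])
```

In this toy run the total mass is preserved, the density relaxes to the constant 1, and the entropy decreases along the iteration, as one expects from the gradient structure.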
However, the approach of this paper applies to more general combinations (φ, Ψ∗, α) that lead
to the same equation. Due to the particular structure of (1.11), it is clear that the 1-homogeneity
of the linear map F (1.13) and the 0-homogeneity of the term φ′(v) − φ′(u) associated with
the Boltzmann entropy (1.5) restrict the range of possible α to 1-homogeneous functions
such as the ‘mean functions’ α(u, v) = √(uv) (geometric) and α(u, v) = (u − v)/ log(u/v)
(logarithmic).
Confining the analysis to concave functions (according to Remark 1.1), we observe that
every concave and 1-homogeneous function α can be obtained by the concave generating
function f : (0, +∞) → (0, +∞)
α(u, v) = uf(v/u) = vf(u/v), f(r ):=α(r , 1), u, v, r > 0. (1.28)
The symmetry of α corresponds to the property
r f(1/r ) = f(r ) for every r > 0, (1.29)
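and shows that the function
\[ g(s) := \frac{\exp(s)-1}{f(\exp(s))}, \quad s\in\mathbb{R}, \quad\text{is odd.} \tag{1.30} \]
The concaveness of f also shows that g is increasing, so that we can define
\[ \Psi^*(\xi) := \int_0^\xi g(s)\,ds = \int_1^{\exp(\xi)} \frac{r-1}{f(r)}\,\frac{dr}{r}, \quad \xi\in\mathbb{R}, \tag{1.31} \]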
from which we identify other simpler means, such as the power means m_p(u, v) = c_{p,2p}(u, v)
with p ∈ [−∞, 1]:
\[
m_p(u,v) =
\begin{cases}
\Big(\dfrac12\big(u^p+v^p\big)\Big)^{1/p} & \text{if } 0<p\le 1, \text{ or } -\infty<p<0 \text{ and } u,v\neq 0,\\[1ex]
\sqrt{uv} & \text{if } p=0,\\
\min(u,v) & \text{if } p=-\infty,\\
0 & \text{if } p<0 \text{ and } uv=0,
\end{cases}
\tag{1.33}
\]
and the generalized logarithmic mean l_p(u, v) = c_{1,p+1}(u, v), p ∈ [−∞, −1].
The power means are obtained from the concave generating functions
\[ f_p(r) := 2^{-1/p}(r^p+1)^{1/p} \ \text{ if } p\neq 0, \quad f_0(r) = \sqrt{r}, \quad f_{-\infty}(r) = \min(r,1), \quad r>0. \tag{1.34} \]
We can thus define
\[ \Psi_p^*(\xi) := 2^{1/p}\int_1^{\exp\xi} \frac{r-1}{(r^p+1)^{1/p}}\,\frac{dr}{r}, \quad \xi\in\mathbb{R},\ p\in(-\infty,1]\setminus\{0\}, \tag{1.35} \]
with the obvious changes when p = 0 (the case Ψ∗_0(ξ) = 4(cosh(ξ/2) − 1)) or p = −∞
(the case Ψ∗_{−∞}(ξ) = exp(|ξ|) − 1 − |ξ|).
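The construction (1.30), (1.31), (1.34) lends itself to a numerical check. The sketch below integrates g(s) = (exp(s) − 1)/f_p(exp(s)) with a composite Simpson rule and compares the result with the two closed forms quoted above for p = 0 and p = −∞. The names and the quadrature are assumptions of this example; the additional comparison value 4 log cosh(ξ/2) for p = 1 is computed by hand here and is not stated in the text.

```python
import math

def f_p(r, p):
    """Generating function of the power mean m_p, as in (1.34)."""
    if p == 0:
        return math.sqrt(r)
    if p == -math.inf:
        return min(r, 1.0)
    return 2.0 ** (-1.0 / p) * (r ** p + 1.0) ** (1.0 / p)

def g(s, p):
    """Integrand g(s) = (exp(s) - 1) / f_p(exp(s)) from (1.30)."""
    r = math.exp(s)
    return (r - 1.0) / f_p(r, p)

def psi_star_p(xi, p, n=2000):
    """Composite Simpson approximation of the integral of g over [0, xi]; n must be even."""
    h = xi / n
    total = g(0.0, p) + g(xi, p)
    for k in range(1, n):
        total += (4.0 if k % 2 else 2.0) * g(k * h, p)
    return total * h / 3.0

xi = 1.5
print("p = 0:   ", psi_star_p(xi, 0), 4.0 * (math.cosh(xi / 2.0) - 1.0))
print("p = -inf:", psi_star_p(xi, -math.inf), math.exp(abs(xi)) - 1.0 - abs(xi))
print("p = 1:   ", psi_star_p(xi, 1), 4.0 * math.log(math.cosh(xi / 2.0)))
```

Because g is odd, the same routine evaluated at −ξ returns the same value, which reflects the evenness of each Ψ∗_p.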
and can be classically considered in the framework of the Dirichlet forms, i.e. α ≡ 1,
Ψ∗(r) = r²/2, with energy φ satisfying φ′ = f.
(2) The case F(u, v) = g(v − u), with g : ℝ → ℝ monotone and odd, yields the equation
\[ \partial_t u_t(x) = \int_{y\in V} g\big(u_t(y)-u_t(x)\big)\,\kappa(x,dy), \]
and can be obtained with the choices α ≡ 1, φ(s) := s²/2 and Ψ∗(r) := ∫_0^r g(s) ds.
(3) Consider now the case when F is positively q-homogeneous, with q ∈ [0, 1]. It is then
natural to consider a q-homogeneous α and the logarithmic entropy φ(r) = r log r − r + 1.
If the function h : (0, ∞) → ℝ, h(r) := F(1, r)/α(1, r) is increasing, then setting as in
(1.35)
\[ \Psi^*(\xi) := \int_1^{\exp(\xi)} h(r)\,\frac{dr}{r} \]
equation (1.12) provides an example of generalized gradient system (E , R , R ∗ ). Simple
examples are F(u, v) = v^q − u^q, corresponding to the equation
\[ \partial_t u_t(x) = \int_{y\in V}\big(u_t^q(y) - u_t^q(x)\big)\,\kappa(x,dy), \]
with α(u, v) := m_p(u^q, v^q) and Ψ∗(ξ) := (1/q) Ψ∗_p(qξ), where Ψ∗_p has been defined in (1.35).
In the case p = 0 we get Ψ∗(ξ) = (4/q)(cosh(qξ/2) − 1).
As a last example, we can consider F(u, v) = sign(v − u)|v^m − u^m|^{1/m}, m > 0, and
α(u, v) = min(u, v); in this case, the function h given by h(r) = (r^m − 1)^{1/m} when
r ≥ 1, and h(r) = −(r^{−m} − 1)^{1/m} when r < 1, satisfies the required monotonicity
property.
1.4 Comments
Rationale for studying this structure. We think that the structure of generalized gradient
systems (E, R , R ∗ ) is sufficiently rich and interesting to deserve a careful analysis. It provides
a genuine extension of the more familiar quadratic gradient-flow structure of Maas, Mielke,
and Chow–Huang–Zhou, which better fits into the metric framework of [4]. In Sect. 6 we
will also show its connection with the theory of dissipative evolution equations.
Moreover, the specific non-homogeneous structure based on the cosh function (1.17a)
has a number of arguments in its favor, which can be summarized in the statement that it is
‘natural’ in various different ways:
(1) It appears in the characterization of large deviations of Markov processes; see Appendix A
or [10,59];
(2) It arises in evolutionary limits of other gradient structures (including quadratic ones)
[6,46,53,66];
(3) It ‘responds naturally’ to external forcing [66, Prop. 4.1];
(4) It can be generalized to nonlinear equations [37,38].
We will explore these claims in more detail in a forthcoming paper. Last but not least, the
very fact that non-quadratic, generalized gradient flows may arise in the limit of gradient
flows suggests that allowing for a broad class of dissipation mechanisms is crucial in order
to (1) fully exploit the flexibility of the gradient-structure formulation, and (2) explore its
robustness with respect to Γ-converging energies and dissipation potentials.
Potential for generalization. In this paper we have chosen to concentrate on the con-
sequences of non-homogeneity of the dissipation potential for the techniques that are
commonly used in gradient-flow theory. Until now, the lack of a sufficiently general rigorous
construction of the functional R and its minimal integral over curves W has impeded the
use of this variational structure in rigorous proofs, and a main aim of this paper is to provide
a way forward by constructing a rigorous framework for these objects, while keeping the
setup (in particular, the ambient space V ) as general as possible.
In order to restrict the length of this paper, we considered only simple driving functionals
E , which are of the local variety E (ρ) = ∫_V φ(dρ/dπ) dπ. Many gradient systems appearing
in the literature are driven by more general functionals, that include interaction and other
nonlinearities [25,26,40,78], and we expect that the techniques of this paper will be of use
in the study of such systems.
As one specific direction of generalization, we note that the Minimizing-Movement
construction on which the proof of Theorem 7.4 is based has a scope wider than that of
the generalized gradient structure (E , R , R ∗ ) under consideration. In fact, as we show
in Sect. 7, Theorem 7.4 yields the existence of (suitably formulated) gradient flows in
a general topological space endowed with a cost fulfilling suitable properties. While
we do not develop this discussion in this paper, at places throughout the paper we
hint at this prospective generalization: the ‘abstract-level’ properties of the DVT cost
are addressed in Sect. 4.7, and the whole proof of Theorem 7.4 is carried out under
more general conditions than those required on the ‘concrete’ system set up in Sect. 3.
Challenges for generalization. A well-formed functional framework includes a concept
of solutions that behaves well under the taking of limits, and the existence proof is the first
test of this. Our existence proof highlights a central challenge here, in the appearance of
two slope functionals S − and D that both represent rigorous versions of the ‘Fisher information’
term R ∗ (ρ, −∇φ′(dρ/dπ)). The chain-rule lower-bound inequality holds under
general conditions for D (Theorem 4.16), but the Minimizing-Movement construction leads
to the more abstract object S − . Passing to the limit in the minimizing-movement approach
requires connecting the two through the inequality S − ≥ D . We prove it by first obtain-
ing the inequality S ≥ D , cf. Proposition 7.10, under the condition that a solution to
the (E , R , R ∗ ) system exists (for instance, by the approach developed in Sect. 6). We
then deduce the inequality S − ≥ D under the further condition that D be lower semi-
continuous, which can be in turn proved under a suitable convexity condition (cf. Prop.
5.3). We hope that more effective ways of dealing with these issues will be found in the
future.
Comparison with the Weighted Energy-Dissipation method. It would be interesting
to develop the analogous variational approach based on studying the limit behaviour as
ε ↓ 0 of the minimizers (ρt , j t )t≥0 of the Weighted Energy-Dissipation (WED) func-
tional
+∞ 1
Wε (ρ, j ):= e−t/ε R (ρt , j t ) + E (ρt ) dt (1.37)
0 ε
among the solutions to the continuity equation with initial datum ρ0 , see [76]. Indeed, the
intrinsic character of the WED functional, which only features the dissipation potential R ,
makes it suitable to the present non-metric framework.
1.5 Notation
The following table collects the notation used throughout the paper.
2 Preliminary results
The set function |μ| : B → [0, +∞) is a positive finite measure on B [3, Thm. 1.6] and
(M(Y ; ℝ^m), ‖·‖_TV) is a Banach space.
In the case m = 1, we will simply write M(Y ), and we shall denote the space of positive
finite measures on B by M+ (Y ). For m > 1, we will identify any element μ ∈ M(Y ; Rm )
with a vector (μ1 , . . . , μm ), with μi ∈ M(Y ) for all i = 1, . . . , m. If ϕ = (ϕ 1 , . . . , ϕ m ) ∈
Bb (Y ; Rm ), the set of bounded Rm -valued B-measurable maps, the duality between μ ∈
M(Y ; Rm ) and ϕ can be expressed by
\[ \langle\mu,\varphi\rangle := \int_Y \varphi\cdot\mu(dx) = \sum_{i=1}^m \int_Y \varphi^i(x)\,\mu_i(dx). \]
Besides the topology of convergence in total variation (induced by the norm ‖·‖_TV), we will
also consider the topology of setwise convergence, i.e. the coarsest topology on M(Y ; ℝ^m)
making all the functions
\[ \mu \mapsto \mu(B), \qquad B\in\mathfrak{B}, \]
continuous. For a sequence (μn )n∈N and a candidate limit μ in M(Y ; Rm ) we have the
following equivalent characterizations of the corresponding convergence [9, §4.7(v)]:
(1) Setwise convergence:
\[ \lim_{n\to+\infty}\mu_n(B) = \mu(B) \quad\text{for every set } B\in\mathfrak{B}. \tag{2.3} \]
(3) Weak topology of the Banach space: the sequence μn converges to μ in the weak topology
of the Banach space (M(Y ; ℝ^m); ‖·‖_TV).
(4) Weak L¹-convergence of the densities: there exists a common dominating measure γ ∈
M+ (Y ) such that μn ≪ γ, μ ≪ γ and
\[ \frac{d\mu_n}{d\gamma} \rightharpoonup \frac{d\mu}{d\gamma} \quad\text{weakly in } L^1(Y,\gamma;\mathbb{R}^m). \tag{2.5} \]
dγ dγ
(5) Alternative form of weak L 1 -convergence: (2.5) holds for every common dominating
measure γ .
We will refer to setwise convergence for sequences satisfying one of the equivalent
properties above. The above topologies also share the same notion of compact subsets, as
stated in the following useful theorem, cf. [9, Theorem 4.7.25], where we shall denote by
σ (M(Y ; Rm ); Bb (Y ; Rm )) the weak topology on M(Y ; Rm ) induced by the duality with
Bb (Y ; Rm ).
Theorem 2.1 For every set ∅ ≠ M ⊂ M(Y ; Rm ) the following properties are equivalent:
(1) M has a compact closure in the topology of setwise convergence.
(2) M has a compact closure in the topology σ (M(Y ; Rm ); Bb (Y ; Rm )).
(3) M has a compact closure in the weak topology of (M(Y ; ℝ^m); ‖·‖_TV).
(4) Every sequence in M has a subsequence converging on every set of B.
(5) There exists a measure γ ∈ M+ (Y ) such that
\[ \forall\,\varepsilon>0\ \ \exists\,\delta>0 :\quad B\in\mathfrak{B},\ \gamma(B)\le\delta \ \Rightarrow\ \sup_{\mu\in M}\mu(B)\le\varepsilon. \tag{2.6} \]
(6) There exists a measure γ ∈ M+ (Y ) such that μ ≪ γ for every μ ∈ M and the set
{dμ/dγ : μ ∈ M} has compact closure in the weak topology of L¹(Y , γ ; ℝ^m).
The name ‘equi-absolute continuity’ above derives from the interpretation that the mea-
sure f γ is absolutely continuous with respect to γ in a uniform manner; ‘equi-absolute
continuity’ is a shortening of Bogachev’s terminology ‘F has uniformly absolutely continu-
ous integrals’ [9, Def. 4.5.2]. A fourth equivalent property is equi-integrability with respect
to γ [9, Th. 4.5.3], a fact that we will not use.
When Y is endowed with a (separable and metrizable) topology τY , we will use the symbol
Cb (Y ; Rm ) to denote the space of bounded Rm -valued continuous functions on (Y , τY ). We
will consider the corresponding weak topology σ (M(Y ; Rm ); Cb (Y ; Rm )) induced by the
It is obvious that for a sequence (μn )n∈N convergence in total variation implies setwise
convergence (or in duality with bounded measurable functions), and setwise convergence
implies weak convergence in duality with bounded continuous functions.
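We will use the following construction several times. Let ψ : ℝ^m → [0, +∞] be convex and
lower semicontinuous and let us denote by ψ^∞ : ℝ^m → [0, +∞] its recession function
\[ \psi^\infty(z) := \lim_{t\to+\infty} \frac{\psi(tz)}{t} = \sup_{t>0} \frac{\psi(tz)-\psi(0)}{t}, \tag{2.10} \]
which is a convex, lower semicontinuous, and positively 1-homogeneous map with ψ^∞(0) =
0. We define the functional Fψ : M(Y ; ℝ^m) × M+ (Y ) → [0, +∞] by
\[ \mathscr{F}_\psi(\mu|\nu) := \int_Y \psi\Big(\frac{d\mu}{d\nu}\Big)\,d\nu + \int_Y \psi^\infty\Big(\frac{d\mu^\perp}{d|\mu^\perp|}\Big)\,d|\mu^\perp|, \quad\text{for } \mu = \frac{d\mu}{d\nu}\,\nu + \mu^\perp. \tag{2.11} \]
Note that when ψ is superlinear then ψ^∞(x) = +∞ in ℝ^m \ {0}. Equivalently,
\[ \psi \text{ superlinear},\ \ \mathscr{F}_\psi(\mu|\nu) < \infty \quad\Rightarrow\quad \mu \ll \nu,\ \ \mathscr{F}_\psi(\mu|\nu) = \int_Y \psi\Big(\frac{d\mu}{d\nu}\Big)\,d\nu. \tag{2.12} \]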
We collect in the next Lemma a list of useful properties.
Lemma 2.3
(1) When ψ is also positively 1-homogeneous, then ψ ≡ ψ∞ , Fψ (·|ν) is independent of ν
and will also be denoted by Fψ (·): it satisfies
dμ
Fψ (μ) = ψ dγ for every γ ∈ M+ (Y ) such that μ γ . (2.13)
Y dγ
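(2) If $\hat\psi : \mathbb{R}^{m+1}\to[0,\infty]$ denotes the 1-homogeneous, convex, perspective function associated with ψ by
\[
\hat\psi(z,t) := \begin{cases} \psi(z/t)\,t & \text{if } t > 0,\\ \psi^\infty(z) & \text{if } t = 0,\\ +\infty & \text{if } t < 0, \end{cases} \tag{2.14}
\]
then
\[
\mathscr{F}_\psi(\mu|\nu) = \mathscr{F}_{\hat\psi}(\mu,\nu) \quad\text{for every } (\mu,\nu)\in\mathcal{M}(Y;\mathbb{R}^m)\times\mathcal{M}^+(Y) \tag{2.15}
\]
with $\mathscr{F}_{\hat\psi}$ defined as in (2.13).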
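(3) In particular, if $\gamma\in\mathcal{M}^+(Y)$ is a common dominating measure such that $\mu = u\gamma$, $\nu = v\gamma$, and $Y' := \{x\in Y : v(x) > 0\}$ we also have
\[
\mathscr{F}_\psi(\mu|\nu) = \int_Y \hat\psi(u,v)\,d\gamma = \int_{Y'} \psi(u/v)\,v\,d\gamma + \int_{Y\setminus Y'} \psi^\infty(u)\,d\gamma. \tag{2.16}
\]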
where we also used the fact that |μ|(Y \ (N ∪ N )) = 0, so that dμ/dγ = 0 γ -a.e. on
Y \ (N ∪ N ).
(2) Since ψ̂ is 1-homogeneous, we can apply the previous claim and evaluate Fψ̂ (μ, ν) by
choosing the dominating measure γ :=ν + μ⊥ .
(3) It is an immediate consequence of the first two claims.
(4) By (2.15) it is sufficient to consider the 1-homogeneous case. The convexity then follows
by the convexity of ψ and by choosing a common dominating measure to represent the
integrals. Relations (2.17) are also immediate.
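(5) Using (2.15) and selecting a dominating measure γ with γ(B) = 1, Jensen's inequality applied to the convex functional $\hat\psi$ yields
\[
\hat\psi(\mu(B),\nu(B)) = \hat\psi\Bigl(\int_B \frac{d\mu}{d\gamma}\,d\gamma,\ \int_B \frac{d\nu}{d\gamma}\,d\gamma\Bigr) \le \int_B \hat\psi\Bigl(\frac{d\mu}{d\gamma},\frac{d\nu}{d\gamma}\Bigr)\,d\gamma = \mathscr{F}_{\hat\psi}(\mu\llcorner B,\ \nu\llcorner B).
\]
Applying now the above inequality to the mutually singular couples $(\mu^a,\nu)$ and $(\mu^\perp,0)$ and using the second identity of (2.17) we obtain (2.18).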
(6) We apply (2.15) and the first identity of (2.16), observing that if ψ(0) = 0 then ψ̂ is
decreasing with respect to its second argument.
(7) By (2.15) it is not restrictive to assume that ψ is 1-homogeneous. If $(\mu_n)_n$ is a sequence setwise converging to μ in $\mathcal{M}(Y;\mathbb{R}^m)$ we can find a common dominating measure γ such that (2.5) holds. The claimed property is then reduced to the weak lower semicontinuity of the functional
\[
u \mapsto \int_Y \psi(u)\,d\gamma \tag{2.20}
\]
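Let us set $\mathbb{R}_+ := [0,+\infty[$, $\mathbb{R}^m_+ := (\mathbb{R}_+)^m$, and let $\alpha : \mathbb{R}^m_+ \to \mathbb{R}_+$ be a continuous and concave function. It is obvious that α is non-decreasing with respect to each variable. As for (2.10), the recession function $\alpha^\infty$ is defined by
\[
\alpha^\infty(z) := \lim_{t\to+\infty}\frac{\alpha(tz)}{t} = \inf_{t>0}\frac{\alpha(tz)-\alpha(0)}{t}, \qquad z\in\mathbb{R}^m_+. \tag{2.21}
\]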
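We define the corresponding map $\alpha : \mathcal{M}(Y;\mathbb{R}^m_+)\times\mathcal{M}^+(Y)\to\mathcal{M}^+(Y)$ by
\[
\alpha[\mu|\gamma] := \alpha\Bigl(\frac{d\mu}{d\gamma}\Bigr)\gamma + \alpha^\infty\Bigl(\frac{d\mu^\perp}{d|\mu^\perp|}\Bigr)|\mu^\perp|, \qquad \mu\in\mathcal{M}(Y;\mathbb{R}^m_+),\ \gamma\in\mathcal{M}^+(Y), \tag{2.22}
\]
where as usual $\mu = \frac{d\mu}{d\gamma}\gamma + \mu^\perp$ is the Lebesgue decomposition of μ with respect to γ; in what follows, we will use the shorthand $\mu_\gamma := \frac{d\mu}{d\gamma}\gamma$. We also mention in advance that, for shorter notation, we will write $\alpha[\mu_1,\mu_2|\gamma]$ in place of $\alpha[(\mu_1,\mu_2)|\gamma]$.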
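Like for $\mathscr{F}$, it is not difficult to check that $\alpha[\mu|\gamma]$ is independent of γ if α is positively 1-homogeneous (and thus coincides with $\alpha^\infty$). If we define the perspective function $\hat\alpha : \mathbb{R}^{m+1}_+ \to \mathbb{R}_+$
\[
\hat\alpha(z,t) := \begin{cases} \alpha(z/t)\,t & \text{if } t > 0,\\ \alpha^\infty(z) & \text{if } t = 0 \end{cases} \tag{2.23}
\]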
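by the arbitrariness of t > 0, we conclude that $\alpha^\infty(z) \le y\cdot z$ for every $y\in D(\alpha_*)$. On the other hand, by (2.26) we have
\[
\begin{aligned}
\alpha^\infty(z) &= \inf_{t>0}\frac{\alpha(tz)-\alpha(0)}{t} = \inf_{t>0}\ \inf_{y\in D(\alpha_*)}\frac{y\cdot(tz)-\alpha_*(y)-\alpha(0)}{t}\\
&= \inf_{y\in D(\alpha_*)}\Bigl( y\cdot z + \inf_{t>0}\frac{-\alpha_*(y)-\alpha(0)}{t}\Bigr)\\
&= \inf_{y\in D(\alpha_*)} y\cdot z,
\end{aligned}
\]
where we have used that $-\alpha_*(y)-\alpha(0)\ge 0$ since $\alpha(0) = \inf_{y\in D(\alpha_*)}(-\alpha_*(y))$.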
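For every Borel set B ⊂ Y, Jensen's inequality yields (recall the notation $\mu_\gamma = \frac{d\mu}{d\gamma}\gamma$)
\[
\begin{aligned}
\alpha[\mu|\gamma](B) &\le \alpha\Bigl(\frac{\mu_\gamma(B)}{\gamma(B)}\Bigr)\gamma(B) + \alpha^\infty(\mu^\perp(B)),\\
\alpha[\mu|\gamma](B) &\le \alpha(\mu(B)) \quad\text{if } \alpha = \alpha^\infty \text{ is 1-homogeneous.}
\end{aligned} \tag{2.28}
\]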
Taking the infimum with respect to y and y , and recalling (2.26) and (2.27), we find (2.28).
Choosing y = y in the previous formula we also obtain the linear upper bound
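We will set
\[
\kappa_Y(x) := \kappa(x,Y), \qquad \|\kappa_Y\|_\infty := \sup_{x\in X}|\kappa|(x,Y), \tag{2.31}
\]
then Fubini's Theorem [18, II, 14] shows that there exists a unique measure $\kappa_\gamma(dx,dy) = \gamma(dx)\kappa(x,dy)$ on $(X\times Y, \mathcal{A}\otimes\mathcal{B})$ such that
\[
\kappa_\gamma(A\times B) = \int_A \kappa(x,B)\,\gamma(dx) \quad\text{for every } A\in\mathcal{A},\ B\in\mathcal{B}. \tag{2.33}
\]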
where $\mathsf{y} : E \to V$ denotes the projection on the second component, cf. (3.1) ahead. We say that γ is reversible if it satisfies the detailed balance condition, i.e. $\kappa_\gamma$ is symmetric: $\mathsf{s}_\#\kappa_\gamma = \kappa_\gamma$. The concepts of invariance and detailed balance correspond to the analogous
concepts in stochastic-process theory; see Sect. 3.1. It is immediate to check that reversibility
implies invariance.
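If f : X × Y → R is a positive or bounded measurable function, then
\[
\text{the map } x \mapsto \kappa f(x) := \int_Y f(x,y)\,\kappa(x,dy) \text{ is } \mathcal{A}\text{-measurable} \tag{2.35}
\]
and
\[
\int_{X\times Y} f(x,y)\,\kappa_\gamma(dx,dy) = \int_X\Bigl(\int_Y f(x,y)\,\kappa(x,dy)\Bigr)\gamma(dx). \tag{2.36}
\]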
showing the setwise convergence. The other statement follows by a similar argument.
In the Introduction we described jump processes on V with kernel κ, and showed that the
evolution equation ∂t ρt = Q ∗ ρt for the law ρt of the process is a generalized gradient flow
characterized by a driving functional E and a dissipation potential R ∗ .
The mathematical setup of this paper is slightly different. Instead of starting with an
evolution equation and proceeding to the generalized gradient system, our mathematical
development starts with the generalized gradient system; we then consider the equation to
be defined by this system. In this Section, therefore, we describe assumptions that we make
on E and R ∗ that will allow us to set up the rigorous functional framework for the evolution
equation (1.9).
We first state the assumptions about the sets V of ‘vertices’ and E:=V × V of ‘edges’.
‘Edges’ are identified with ordered pairs (x, y) of vertices x, y ∈ V . We will denote by
x, y : E → V and s : E → E the coordinate and the symmetry maps defined by
x(x, y):=x, y(x, y):=y, s(x, y) := (y, x) for every x, y ∈ V . (3.1)
We next turn to the driving functional, which is given by the construction in (2.11) and
(2.12) for a superlinear density ψ = φ and for the choice γ = π.
The flux density map α : [0, +∞) × [0, +∞) → [0, +∞), with α ≢ 0, is continuous,
concave, symmetric:
α(u 1 , u 2 ) = α(u 2 , u 1 ) for all u 1 , u 2 ∈ [0, +∞), (3.12)
Note that since α is nonnegative, concave, and not trivially 0, it cannot vanish in the interior
of R2+ , i.e.
u 1 u 2 > 0 ⇒ α(u 1 , u 2 ) > 0. (3.14)
The examples that we gave in the introduction of the cosh-type dissipation (1.17a) and the
quadratic dissipation (1.17b) both fit these assumptions; other examples are
α(u, v) = 1 and α(u, v) = u + v.
In some cases we will use an additional property, namely that α is positively 1-
homogeneous, i.e. α(λu 1 , λu 2 ) = λα(u 1 , u 2 ) for all λ ≥ 0. This 1-homogeneity is
automatically satisfied under the compatibility condition (1.13), with the Boltzmann entropy
function φ(s) = s log s − s + 1.
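The structural conditions on α (symmetry, concavity, and, where relevant, positive 1-homogeneity) can be spot-checked numerically. The sketch below tests them on a small grid for the geometric mean, the logarithmic mean, the constant 1, and u + v; the grid, tolerances, and function names are illustrative assumptions, and a grid check is of course only a sanity test, not a proof.

```python
import math
from itertools import product

def alpha_geo(u, v):
    """Geometric mean sqrt(u v)."""
    return math.sqrt(u * v)

def alpha_logmean(u, v):
    """Logarithmic mean (u - v)/(log u - log v), extended by 0 on the boundary."""
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u == v:
        return u
    return (u - v) / (math.log(u) - math.log(v))

def alpha_one(u, v):
    """Constant flux density alpha = 1."""
    return 1.0

def alpha_sum(u, v):
    """alpha(u, v) = u + v."""
    return u + v

GRID = [0.0, 0.5, 1.0, 2.0, 3.0]
PAIRS = list(product(GRID, GRID))

def is_symmetric(alpha, tol=1e-12):
    return all(abs(alpha(u, v) - alpha(v, u)) <= tol for u, v in PAIRS)

def is_one_homogeneous(alpha, lam=2.5, tol=1e-12):
    return all(abs(alpha(lam * u, lam * v) - lam * alpha(u, v)) <= tol
               for u, v in PAIRS)

def is_midpoint_concave(alpha, tol=1e-12):
    # alpha at the midpoint must dominate the average of the endpoint values
    return all(
        alpha((u1 + u2) / 2, (v1 + v2) / 2) + tol >= (alpha(u1, v1) + alpha(u2, v2)) / 2
        for (u1, v1), (u2, v2) in product(PAIRS, PAIRS)
    )

for name, a in [("geo", alpha_geo), ("logmean", alpha_logmean),
                ("one", alpha_one), ("sum", alpha_sum)]:
    print(name, is_symmetric(a), is_midpoint_concave(a), is_one_homogeneous(a))
```

On this grid all four functions pass the symmetry and concavity checks, while only the constant α = 1 fails 1-homogeneity.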
Lemma 3.1 Under Assumption ($\mathscr{R}^*\Psi\alpha$), the function Ψ : R → R is even and satisfies
\[
0 = \Psi(0) < \Psi(s) < +\infty \quad\text{for all } s\in\mathbb{R}\setminus\{0\}. \tag{3.15a}
\]
\[
\Psi \text{ is strictly convex, strictly increasing, and superlinear.} \tag{3.15b}
\]
Proof The superlinearity of $\Psi^*$ implies that $\Psi(s) < +\infty$ for all s ∈ R, and similarly the finiteness of $\Psi^*$ on R implies that Ψ is superlinear. Since $\Psi^*$ is even, Ψ is convex and even, and therefore $\Psi(s) \ge \Psi(0) = \sup_{\xi\in\mathbb{R}}[-\Psi^*(\xi)] = 0$. Furthermore, since for all p ∈ R, $\operatorname{argmin}_{s\in\mathbb{R}}(\Psi(s) - ps) = \partial\Psi^*(p)$ (see e.g. [77, Thm. 11.8]) and $\Psi^*$ is differentiable at every p, we conclude that $\operatorname{argmin}_s(\Psi(s) - ps) = \{(\Psi^*)'(p)\}$; therefore each point of the graph of Ψ is an exposed point. It follows that Ψ is strictly convex, and $\Psi(s) > 0$ for all s ≠ 0.
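As described in the introduction, we use Ψ, $\Psi^*$, and α to define the dual pair of dissipation potentials $\mathscr{R}$ and $\mathscr{R}^*$, which for a couple of measures $\rho = u\pi\in\mathcal{M}^+(V)$ and $j\in\mathcal{M}(E)$ are formally given by
\[
\mathscr{R}(\rho,j) := \frac12\int_E \Psi\Bigl(2\frac{dj}{d\nu_\rho}\Bigr)\,d\nu_\rho, \qquad \mathscr{R}^*(\rho,\xi) := \frac12\int_E \Psi^*(\xi)\,d\nu_\rho, \tag{3.16}
\]
with
\[
\nu_\rho(dx\,dy) := \alpha\bigl(u(x),u(y)\bigr)\,\vartheta(dx\,dy) = \alpha\bigl(u(x),u(y)\bigr)\,\pi(dx)\kappa(x,dy). \tag{3.17}
\]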
This expression for the edge measure νρ also is implicitly present in the structure built in
[30,49]. The above definitions are made rigorous in Definition 4.9 and in (4.20) below.
The three sets of conditions above, Assumptions (Vπκ), (E φ), and (R ∗ α), are the main
assumptions of this paper. Under these assumptions, the evolution equation (1.9) may be linear
or nonlinear in ρ. The equation coincides with the Forward Kolmogorov equation (1.2) if
and only if condition (1.13) is satisfied, as shown below.
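Let us call Q[ρ] the right-hand side of (1.9) and let us compute
\[
\langle Q[\rho],\varphi\rangle = \Bigl\langle -\mathrm{div}\Bigl( D_\xi\mathscr{R}^*\Bigl(\rho, -\nabla\phi'\Bigl(\frac{d\rho}{d\pi}\Bigr)\Bigr)\Bigr), \varphi\Bigr\rangle
\]
for every $\varphi\in B_b(V)$ and $\rho\in\mathcal{M}^+(V)$ with $\rho\ll\pi$. With $u = \frac{d\rho}{d\pi}$ we thus obtain
\[
\begin{aligned}
\langle Q[\rho],\varphi\rangle &= \bigl\langle D_\xi\mathscr{R}^*\bigl(\rho, -\nabla\phi'(u)\bigr), \nabla\varphi\bigr\rangle\\
&= \frac12\int_E (\Psi^*)'\bigl(-\nabla\phi'(u)(x,y)\bigr)\,\nabla\varphi(x,y)\,\nu_\rho(dx,dy).
\end{aligned} \tag{3.18}
\]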
where for (∗) we used the symmetry of ϑ (i.e. the detailed-balance condition). This calculation
justifies (1.12).
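In the linear case of (1.2) it is immediate to see that
\[
\begin{aligned}
\langle Q^*\rho,\varphi\rangle = \langle\rho, Q\varphi\rangle &= \int_E [\varphi(y)-\varphi(x)]\,\kappa(x,dy)\rho(dx)\\
&= \frac12\int_E \nabla\varphi(x,y)\,\bigl[\kappa(x,dy)\rho(dx) - \kappa(y,dx)\rho(dy)\bigr]\\
&= \frac12\int_E \nabla\varphi(x,y)\,\bigl(u(x)-u(y)\bigr)\,\vartheta(dx,dy),
\end{aligned} \tag{3.20}
\]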
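The chain of identities in (3.20) can be checked directly on a finite state space. The sketch below assumes a three-point set V, a symmetric edge measure `theta` (so that detailed balance holds by construction), and the graph gradient ∇ϕ(x, y) = ϕ(y) − ϕ(x); all numerical values and function names are illustrative assumptions.

```python
# Finite-state sketch: V = {0, 1, 2}, measures stored as lists of weights.
pi = [0.2, 0.3, 0.5]
theta = [[0.0, 0.06, 0.10],     # symmetric edge measure theta(x, y) = pi(x) kappa(x, y)
         [0.06, 0.0, 0.15],
         [0.10, 0.15, 0.0]]
n = len(pi)
kappa = [[theta[x][y] / pi[x] for y in range(n)] for x in range(n)]  # jump kernel

u = [2.0, 0.5, 1.0]                       # density of rho with respect to pi
rho = [u[x] * pi[x] for x in range(n)]
test_fn = [1.0, -2.0, 0.5]                # an arbitrary bounded test function

def generator_pairing(rho, f):
    """First line of (3.20): sum over edges of [f(y) - f(x)] kappa(x, y) rho(x)."""
    return sum((f[y] - f[x]) * kappa[x][y] * rho[x]
               for x in range(n) for y in range(n))

def edge_pairing(u, f):
    """Last line of (3.20): one half of the sum of grad f (u(x) - u(y)) theta(x, y)."""
    return 0.5 * sum((f[y] - f[x]) * (u[x] - u[y]) * theta[x][y]
                     for x in range(n) for y in range(n))

print(generator_pairing(rho, test_fn), edge_pairing(u, test_fn))
```

The two printed numbers agree up to rounding, and the agreement relies on the symmetry of `theta`, that is, on detailed balance.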
4 Curves in M+ (V)
A major challenge in any rigorous treatment of an equation such as (1.1) is finding a way
to deal with the time derivative. The Ambrosio-Gigli-Savaré framework for metric-space
gradient systems, for instance, is organized around absolutely continuous curves. These are
a natural choice because on the one hand this class admits a ‘metric velocity’ that generalizes
the time derivative, while on the other hand solutions are automatically absolutely continuous
by the superlinear growth of the dissipation potential.
For the systems of this paper, a similar role is played by curves such that the ‘action’
$\int\mathscr{R}\,dt$ is finite; we show below that the superlinearity of $\mathscr{R}(\rho,j)$ in j leads to similarly
beneficial properties. In order to exploit this aspect, however, a number of intermediate steps
need to be taken:
(a) We define the class CE(0, T ) of solutions (ρ, j ) of the continuity equation (1.23) (Defi-
nition 4.1).
(b) For such solutions, t → ρt is continuous in the total variation distance (Corollary 4.3).
(c) We give a rigorous definition of the functional R (Definition 4.9), and describe its
behaviour on absolutely continuous and singular parts of (ρ, j ) (Lemma 4.10 and The-
orem 4.13).
(d) If the action functional R is finite along a solution (ρ, j ) of the continuity equation in
[0, T ], then the property that ρt is absolutely continuous with respect to π at some time
t ∈ [0, T ] propagates to all the interval [0, T ] (Corollary 4.14).
(e) We prove a chain rule for the derivative of convex entropies along curves of finite R -
action (Theorem 4.16) and derive an estimate involving R and a Fisher-information-like
term (Corollary 4.20).
(f) If the action R is uniformly bounded along a sequence (ρ n , j n ) ∈ CE(0, T ), then the
sequence is compact in an appropriate sense (Proposition 4.21).
Once properties (a)–(f) have been established, the next step is to consider finite-action
curves that also connect two given values μ, ν, leading to the definition of the Dynamical-
Variational Transport (DVT) cost
\[
\mathscr{W}(\tau,\mu,\nu) := \inf\Bigl\{ \int_0^\tau \mathscr{R}(\rho_t,j_t)\,dt \ :\ (\rho,j)\in\mathrm{CE}(0,\tau),\ \rho_0 = \mu,\ \rho_\tau = \nu \Bigr\}. \tag{4.1}
\]
This definition is in the spirit of the celebrated Benamou-Brenier formula for the Wasserstein
distance [8], generalized to a broader family of transport distances [20] and to jump processes
[30,49]. However, a major difference with those constructions is that W also depends on
the time variable τ and that $\mathscr{W}(\tau,\cdot,\cdot)$ is not a (power of a) distance, since Ψ is not, in general, positively homogeneous of any order. Indeed, when $\mathscr{R}$ is p-homogeneous in j, for p ∈ (1, +∞), we have (see also the discussion at the beginning of Sec. 7.1)
\[
\mathscr{W}(\tau,\mu,\nu) = \frac{1}{\tau^{p-1}}\,\mathscr{W}(1,\mu,\nu) = \frac{1}{p\,\tau^{p-1}}\,d_{\mathscr{R}}^{\,p}(\mu,\nu), \tag{4.2}
\]
where dR is an extended distance and is a central object in the usual Minimizing-Movement
construction. In Sect. 7, the DVT cost W will replace the rescaled p-power of the distance
and play a similar role for the Minimizing-Movement approach.
For the rigorous construction of W ,
(g) we show that minimizers of (4.1) exist (Corollary 4.22);
(h) we establish properties of W that generalize those of the metric-space version (4.2)
(Theorem 4.26).
Finally,
(i) we close the loop by showing that from a given functional $\mathscr{W}$ integrals of the form $\int_a^b\mathscr{R}$ can be reconstructed (Proposition 4.27).
Throughout this section we adopt Assumptions (Vπκ) and (R ∗ α).
We now introduce the formulation of the continuity equation we will work with. Hereafter,
for a given function μ : I → M(V ), or μ : I → M(E), with I = [a, b] ⊂ R, we shall
often write μt in place of μ(t) for a given t ∈ I and denote the time-dependent function μ
by (μt )t∈I . We will write λ for the Lebesgue measure on I . The following definition mimics
those given in [4, Sec. 8.1] and [21, Def. 4.2].
Definition 4.1 (Solutions (ρ, j ) of the continuity equation) Let I = [a, b] be a closed
interval of R. We denote by CE(I ) the set of pairs (ρ, j ) given by
• a family of time-dependent measures ρ = (ρt )t∈I ⊂ M+ (V ), and
• a measurable family $(j_t)_{t\in I}\subset\mathcal{M}(E)$ with $\int_0^T |j_t|(E)\,dt < +\infty$, satisfying the continuity equation
\[
\dot\rho + \mathrm{div}\,j = 0 \quad\text{in } I\times V, \tag{4.3}
\]
in the following sense:
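\[
\int_V \varphi\,d\rho_{t_2} - \int_V \varphi\,d\rho_{t_1} = \iint_{J\times E} \nabla\varphi\,dj_\lambda \quad\text{for all } \varphi\in B_b(V),\ J = [t_1,t_2]\subset I. \tag{4.4}
\]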
Remark 4.2 The requirement (4.4) shows in particular that t → ρt is continuous with respect
to the total variation metric. Choosing ϕ ≡ 1 in (4.4), one immediately finds that
the total mass ρt (V ) is constant in I . (4.5)
By the disintegration theorem, it is equivalent to assign the measurable family ( j t )t∈I in
M(E) or the measure j λ in M(I × E).
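On a finite vertex set the weak formulation (4.4) and the mass conservation (4.5) reduce to elementary matrix identities. The sketch below assumes the discrete convention (div j)(x) = Σ_y (j(x, y) − j(y, x)), chosen here so that pairing with a test function reproduces the right-hand side of (4.4); the flux values and helper names are illustrative. It also checks, in this discrete setting, that the positive part of j − s_# j has the same divergence as j.

```python
n = 3
# A signed flux on ordered pairs (x, y); the values are arbitrary illustrative data.
j = [[0.0, 0.3, -0.1],
     [0.2, 0.0, 0.4],
     [0.0, -0.5, 0.0]]

def div(j):
    """Discrete divergence, assumed here as (div j)(x) = sum_y (j(x,y) - j(y,x))."""
    return [sum(j[x][y] - j[y][x] for y in range(n)) for x in range(n)]

def grad_pairing(f, j):
    """Sum over edges of (f(y) - f(x)) j(x, y), the discrete right-hand side of (4.4)."""
    return sum((f[y] - f[x]) * j[x][y] for x in range(n) for y in range(n))

def positive_skew_part(j):
    """Positive part of j - s_# j, where s swaps the two vertices of an edge."""
    return [[max(j[x][y] - j[y][x], 0.0) for y in range(n)] for x in range(n)]

f = [1.0, -2.0, 0.5]
rho_dot = [-d for d in div(j)]             # continuity equation: rho' = -div j

lhs = sum(f[x] * rho_dot[x] for x in range(n))
print(lhs, grad_pairing(f, j))             # weak form: both sides agree
print(sum(rho_dot))                        # total mass is conserved (f = 1)
print(div(j), div(positive_skew_part(j)))  # same divergence
```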
We can in fact prove a more refined property. The proof of the Corollary below is postponed
to Appendix B.
Corollary 4.3 If (ρ, j) ∈ CE(0, T), then there exist a common dominating measure $\gamma\in\mathcal{M}^+(V)$ (i.e., $\rho_t\ll\gamma$ for all t ∈ [a, b]), and an absolutely continuous map $\tilde u : [a,b]\to L^1(V,\gamma)$ such that $\rho_t = \tilde u_t\gamma \ll \gamma$ for every t ∈ [a, b].
The interpretation of the continuity equation in Definition 4.1—in duality with all bounded
measurable functions—is quite strong, and in particular much stronger than the more com-
mon continuity in duality with continuous and bounded functions. However, this continuity
equation can be recovered starting from a much weaker formulation. The following result
illustrates this; it is a translation of [4, Lemma 8.1.2] (cf. also [21, Lemma 4.1]) to the present
setting. The proof adapts the argument for [4, Lemma 8.1.2] and is given in Appendix B.
Lemma 4.4 (Continuous representative) Let (ρt )t∈I ⊂ M+ (V ) and ( j t )t∈I be measurable
families that are integrable with respect to λ and let τ be any separable and metrizable
topology inducing B. If
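\[
-\int_0^T \eta'(t)\Bigl(\int_V \zeta(x)\,\rho_t(dx)\Bigr)dt = \int_0^T \eta(t)\Bigl(\int_E \nabla\zeta(x,y)\,j_t(dx\,dy)\Bigr)dt \tag{4.6}
\]
holds for every $\eta\in C^\infty_c((a,b))$ and $\zeta\in C_b(V,\tau)$, then there exists a unique curve $I\ni t\mapsto\tilde\rho_t\in\mathcal{M}^+(V)$ such that $\tilde\rho_t = \rho_t$ for λ-a.e. t ∈ I. The curve $\tilde\rho$ is continuous in the total-variation norm with estimate
\[
\|\tilde\rho_{t_2} - \tilde\rho_{t_1}\|_{TV} \le 2\int_{t_1}^{t_2} |j_t|(E)\,dt \quad\text{for all } t_1\le t_2, \tag{4.7}
\]
and satisfies
\[
\int_V \varphi(t_2,\cdot)\,d\tilde\rho_{t_2} - \int_V \varphi(t_1,\cdot)\,d\tilde\rho_{t_1} = \int_{t_1}^{t_2}\int_V \partial_t\varphi\,d\tilde\rho_t\,dt + \iint_{J\times E}\nabla\varphi\,dj_\lambda \tag{4.8}
\]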
Remark 4.5 In (4.4) we can always replace j with the positive measure j + :=( j − s# j )+ =
(2 j )+ , since div j = div j + (see Lemma B.1); therefore we can assume without loss of
generality that j is a positive measure.
In this section we give a rigorous definition of the dissipation potential R , following the
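formal descriptions above. In the special case when ρ and j are absolutely continuous, i.e.
\[
\rho = u\pi \ll \pi \quad\text{and}\quad 2j = w\vartheta \ll \vartheta, \tag{4.9}
\]
we set
\[
E' := \{(x,y)\in E : \alpha(u(x),u(y)) > 0\}, \tag{4.10}
\]
and in this case we can define the functional $\mathscr{R}$ by the direct formula
\[
\mathscr{R}(\rho,j) = \begin{cases} \dfrac12\displaystyle\int_{E'} \Psi\Bigl(\frac{w(x,y)}{\alpha(u(x),u(y))}\Bigr)\alpha(u(x),u(y))\,\vartheta(dx,dy) & \text{if } |j|(E\setminus E') = 0,\\[2mm] +\infty & \text{if } |j|(E\setminus E') > 0. \end{cases} \tag{4.11}
\]
Recalling the definition of the perspective function $\hat\Psi$ (2.14), we can also write (4.11) in the equivalent and more compact form
\[
\mathscr{R}(\rho,j) = \frac12\int_E \hat\Psi\bigl(w(x,y), \alpha(u(x),u(y))\bigr)\,\vartheta(dx,dy), \qquad 2j = w\vartheta. \tag{4.12}
\]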
so that it is natural to introduce the function ϒ : [0, +∞) × [0, +∞) × R → [0, +∞],
\[
\Upsilon(u,v,w) := \hat\Psi(w, \alpha(u,v)), \tag{4.13}
\]
observing that
\[
\mathscr{R}(\rho,j) = \frac12\int_E \Upsilon(u(x),u(y),w(x,y))\,\vartheta(dx,dy) \quad\text{for } 2j = w\vartheta. \tag{4.14}
\]
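Lemma 4.7 The function ϒ : [0, +∞) × [0, +∞) × R → [0, +∞] defined above is convex and lower semicontinuous, with recession functional
\[
\Upsilon^\infty(u,v,w) = \hat\Psi(w, \alpha^\infty(u,v)) = \begin{cases} \Psi\Bigl(\dfrac{w}{\alpha^\infty(u,v)}\Bigr)\alpha^\infty(u,v) & \text{if } \alpha^\infty(u,v) > 0,\\[2mm] 0 & \text{if } w = 0,\\ +\infty & \text{if } w \ne 0 \text{ and } \alpha^\infty(u,v) = 0. \end{cases} \tag{4.15}
\]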
For any u, v ∈ [0, ∞) with α∞ (u, v) > 0, the map w → ϒ(u, v, w) is strictly convex.
If α is positively 1-homogeneous then ϒ is positively 1-homogeneous as well.
Proof Note that ϒ may be equivalently represented in the form
\[
\Upsilon(u,v,w) = \sup_{\xi\in\mathbb{R}}\bigl\{ \xi w - \alpha(u,v)\,\Psi^*(\xi) \bigr\} =: \sup_{\xi\in\mathbb{R}} f_\xi(u,v,w). \tag{4.16}
\]
The convexity of f ξ for each ξ ∈ R readily follows from its linearity in w and the convexity of
−α in (u, v). Therefore, ϒ is convex and lower semicontinuous as the pointwise supremum
of a family of convex continuous functions.
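The dual representation (4.16) can be compared numerically with the perspective form ϒ(u, v, w) = Ψ̂(w, α(u, v)) from (4.13). The sketch below assumes the cosh-type potential Ψ*(ξ) = 4(cosh(ξ/2) − 1) and the geometric mean α(u, v) = √(uv); the closed form used for Ψ is obtained here by solving the Legendre problem by hand and should be read as an assumption of this sketch, and the supremum is approximated by brute force on a bounded grid.

```python
import math

def psi_star(xi):
    """Cosh-type dual potential, assumed as Psi*(xi) = 4 (cosh(xi/2) - 1)."""
    return 4.0 * (math.cosh(xi / 2.0) - 1.0)

def psi(s):
    """Legendre dual of psi_star; the maximiser is xi = 2 asinh(s/2)."""
    return 2.0 * s * math.asinh(s / 2.0) - 2.0 * math.sqrt(4.0 + s * s) + 4.0

def alpha(u, v):
    """Geometric-mean flux density."""
    return math.sqrt(u * v)

def upsilon(u, v, w):
    """Perspective form: alpha Psi(w/alpha) if alpha > 0, recession value otherwise."""
    a = alpha(u, v)
    if a > 0.0:
        return a * psi(w / a)
    return 0.0 if w == 0.0 else math.inf

def upsilon_dual(u, v, w, xi_max=20.0, steps=40000):
    """Brute-force approximation of sup_xi { xi w - alpha(u, v) Psi*(xi) } on a grid."""
    a = alpha(u, v)
    best = -math.inf
    for k in range(steps + 1):
        xi = -xi_max + 2.0 * xi_max * k / steps
        best = max(best, xi * w - a * psi_star(xi))
    return best

for (u, v, w) in [(1.0, 2.0, 1.5), (0.5, 0.5, -0.7), (3.0, 1.0, 0.0)]:
    print(upsilon(u, v, w), upsilon_dual(u, v, w))
```

When α(u, v) = 0 and w ≠ 0 the grid supremum grows linearly with `xi_max`, which is the numerical shadow of the value +∞ in the perspective form.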
The characterization (4.15) of $\Upsilon^\infty$ follows from observing that $\Upsilon(0,0,0) = \hat\Psi(0,0) = 0$ and using the 1-homogeneity of $\hat\Psi$:
\[
\cdots = \hat\Psi\bigl(w, \alpha^\infty(u,v)\bigr),
\]
where the last equality follows from the continuity of $r\mapsto\hat\Psi(w,r)$ for all w ∈ R.
The strict convexity of w → ϒ(u, v, w) for any u, v ∈ [0, ∞) with α∞ (u, v) > 0 follows
directly from the strict convexity of (cf. Lemma 3.1).
The choice (4.14) provides a rigorous definition of R for couples of measures (ρ, j ) that
are absolutely continuous with respect to π and ϑ. In order to extend R to pairs (ρ, j ) that
are not absolutely continuous, it is useful to interpret the measure
νρ (dx, dy):=α(u(x), u(y))ϑ(dx, dy) (4.17)
in the integral of (4.11) in terms of a suitable concave transformation as in (2.22) of two
couplings generated by ρ. We therefore introduce the measures
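\[
\vartheta^-_\rho(dx\,dy) := \rho(dx)\kappa(x,dy), \qquad \vartheta^+_\rho(dx\,dy) := \rho(dy)\kappa(y,dx) = \mathsf{s}_\#\vartheta^-_\rho(dx\,dy), \tag{4.18}
\]
observing that
\[
\rho = u\pi \ll \pi \ \Rightarrow\ \vartheta^\pm_\rho \ll \vartheta, \qquad \frac{d\vartheta^-_\rho}{d\vartheta}(x,y) = u(x), \quad \frac{d\vartheta^+_\rho}{d\vartheta}(x,y) = u(y). \tag{4.19}
\]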
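We thus obtain that (4.17), (4.11) and (4.14) can be equivalently written as
\[
\nu_\rho = \alpha[\vartheta^-_\rho, \vartheta^+_\rho|\vartheta], \qquad \mathscr{R}(\rho,j) = \frac12\mathscr{F}_\Psi(2j|\nu_\rho), \tag{4.20}
\]
where $\alpha[\vartheta^-_\rho, \vartheta^+_\rho|\vartheta]$ stands for $\alpha[(\vartheta^-_\rho, \vartheta^+_\rho)|\vartheta]$, and the functional $\mathscr{F}_\psi(\cdot|\cdot)$ is from (2.11), and also
\[
\mathscr{R}(\rho,j) = \frac12\mathscr{F}_\Upsilon(\vartheta^-_\rho, \vartheta^+_\rho, 2j|\vartheta), \tag{4.21}
\]
again writing for shorter notation $\mathscr{F}_\Upsilon(\vartheta^-_\rho, \vartheta^+_\rho, 2j|\vartheta)$ in place of $\mathscr{F}_\Upsilon((\vartheta^-_\rho, \vartheta^+_\rho, 2j)|\vartheta)$.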
Therefore we can use the same expressions (4.20) and (4.21) to extend the functional R
to measures ρ and j that need not be absolutely continuous with respect to π and ϑ; the
next lemma shows that they provide equivalent characterizations. We introduce the functions
u ± : E → R, adopting the notation
u − :=u ◦ x and u + :=u ◦ y,
or equivalently u − (x, y):=u(x), u + (x, y):=u(y). (4.22)
(Recall that x and y denote the coordinate maps from E to V ).
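\[
\mathscr{F}_\Upsilon(\vartheta^-_\rho, \vartheta^+_\rho, 2j|\vartheta) = \mathscr{F}_\Psi(2j|\nu_\rho). \tag{4.23}
\]
\[
\nu_\rho = \alpha[\vartheta^-_\rho, \vartheta^+_\rho|\vartheta] = \nu^1_\rho + \nu^2_\rho, \qquad \nu^1_\rho := \alpha(u^-,u^+)\,\vartheta, \quad \nu^2_\rho := \alpha^\infty(z^-,z^+)\,\varsigma. \tag{4.26}
\]
\[
2j = w\vartheta + w'\varsigma, \tag{4.27}
\]
Indeed, identity (*) follows from the fact that, since $\hat\Psi$ is 1-homogeneous,
\[
\mathscr{F}_{\hat\Psi}(w\vartheta, \nu^1_\rho) = \int_E \hat\Psi\Bigl(\frac{d(w\vartheta, \nu^1_\rho)}{d\gamma}\Bigr)\,d\gamma
\]
for every $\gamma\in\mathcal{M}^+(E)$ such that $w\vartheta\ll\gamma$ and $\nu^1_\rho\ll\gamma$, cf. (2.13). Then, it suffices to observe that $w\vartheta\ll\vartheta$ and $\nu^1_\rho\ll\vartheta$ with $\frac{d\nu^1_\rho}{d\vartheta} = \alpha(u^-,u^+)$. The same argument applies to $\mathscr{F}_{\hat\Psi}(w'\varsigma, \nu^2_\rho)$, cf. also Lemma 2.3(3).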
In this section, we study the properties of curves with finite R -action, i.e., elements of
\[
\mathscr{A}(a,b) := \Bigl\{ (\rho,j)\in\mathrm{CE}(a,b) \ :\ \int_a^b \mathscr{R}(\rho_t,j_t)\,dt < +\infty \Bigr\}. \tag{4.33}
\]
The finiteness of the R -action leads to the following remarkable property: A curve (ρ, j )
with finite R -action can be separated into two mutually singular curves (ρ a , j a ), (ρ ⊥ , j ⊥ ) ∈
Theorem 4.13 Let $(\rho,j)\in\mathscr{A}(a,b)$ and let us consider the Lebesgue decompositions $\rho_t = \rho^a_t + \rho^\perp_t$ and $j_t = j^a_t + j^\perp_t$ of $\rho_t$ with respect to π and of $j_t$ with respect to ϑ.
(3) If α is sub-linear or $\kappa(x,\cdot)\ll\pi$ for every x ∈ V, then $\rho^\perp_t$ is constant in [a, b] and $j^\perp\equiv 0$.
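Proof (1) Let $\gamma\in\mathcal{M}^+(V)$ be a dominating measure for the curve ρ according to Corollary 4.3 and let us denote by $\gamma = \gamma^a + \gamma^\perp$ the Lebesgue decomposition of γ with respect to π; we also denote by $P\in\mathcal{B}(V)$ a π-negligible Borel set such that $\gamma^\perp = \gamma\llcorner P$. Setting $R := V\setminus P$, since $\rho_t\ll\gamma$ we thus obtain $\rho^a_t = \rho_t\llcorner R$, $\rho^\perp_t = \rho_t\llcorner P$. By Lemma 4.10 for λ-a.e. t ∈ (a, b) we obtain $j^\perp_t = j\llcorner(P\times P)$ and $j^a_t = j\llcorner(R\times R)$ with $|j_t|(R\times P) = |j_t|(P\times R) = 0$. For every function $\varphi\in B_b$ we have $\nabla(\varphi\chi_R)\equiv 0$ on P × P so that we get
\[
\begin{aligned}
\int_V \varphi\,d\rho^a_{t_2} - \int_V \varphi\,d\rho^a_{t_1} &= \int_R \varphi\,d\rho_{t_2} - \int_R \varphi\,d\rho_{t_1} = \int_{t_1}^{t_2}\int_E \nabla(\varphi\chi_R)\,d(j^a_t + j^\perp_t)\,dt\\
&= \int_{t_1}^{t_2}\int_{R\times R}\nabla(\varphi\chi_R)\,dj^a_t\,dt = \int_{t_1}^{t_2}\int_E \nabla\varphi\,dj^a_t\,dt,
\end{aligned}
\]
showing that $(\rho^a,j^a)$ belongs to CE(a, b). Estimate (4.34) follows by (4.30). From Lemma 4.4 we deduce that $\rho^a_t(V)$ and $\rho^\perp_t(V)$ are constant.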
(2) This follows by the linearity of the continuity equation and (4.31).
(3) If α is sub-linear or $\kappa(x,\cdot)\ll\pi$ for every x ∈ V, then Lemma 4.10 shows that $j^\perp\equiv 0$. Since by linearity $(\rho^\perp,j^\perp)\in\mathrm{CE}(a,b)$, we deduce that $\rho^\perp_t$ is constant.
Corollary 4.14 Let $(\rho,j)\in\mathscr{A}(a,b)$. If there exists $t_0\in[a,b]$ such that $\rho_{t_0}\ll\pi$, then we have $\rho_t\ll\pi$ for every t ∈ [a, b], $j^\perp\equiv 0$, and $\mathrm{div}\,j_t\ll\pi$ for λ-a.e. t ∈ (a, b). In particular, there exists an absolutely continuous and a.e. differentiable map $u : [a,b]\to L^1(V,\pi)$ and a map $w\in L^1(E,\lambda\otimes\vartheta)$ such that
\[
2j_\lambda = w\,\lambda\otimes\vartheta, \qquad \partial_t u_t(x) = \frac12\int_V \bigl(w_t(y,x) - w_t(x,y)\bigr)\,\kappa(x,dy) \quad\text{for a.e. } t\in(a,b). \tag{4.36}
\]
Moreover there exists a measurable map ξ : (a, b) × E → R such that $w = \xi\,\alpha(u^-,u^+)$ λ ⊗ ϑ-a.e. and
\[
\mathscr{R}(\rho_t,j_t) = \frac12\int_E \Psi(\xi_t(x,y))\,\alpha(u_t(x),u_t(y))\,\vartheta(dx,dy) \quad\text{for a.e. } t\in(a,b). \tag{4.37}
\]
If w is skew-symmetric, then ξ is skew-symmetric as well and (4.36) reads as
\[
\partial_t u_t(x) = \int_V w_t(y,x)\,\kappa(x,dy) = \int_V \xi_t(y,x)\,\alpha(u_t(x),u_t(y))\,\kappa(x,dy) \quad\text{a.e. in } (a,b). \tag{4.38}
\]
Remark 4.15 Relations (4.36) and (4.38) hold both in the sense of a.e. differentiability of
maps with values in L 1 (V , π) and pointwise a.e. with respect to x ∈ V : more precisely,
there exists a set U ⊂ V of full π-measure such that for every x ∈ U the map t → u t (x) is
absolutely continuous and equations (4.36) and (4.38) hold for every x ∈ U , a.e. with respect
to t ∈ (0, T ).
Proof The first part of the statement is an immediate consequence of Theorem 4.13, which
yields ρt⊥ (V ) = 0 for every t ∈ [a, b]. We can thus write 2 j = w(λ⊗ϑ) for some measurable
map w : (a, b) × E → R. Moreover div j λ ⊗ π, since s j s (λ ⊗ ϑ) = λ ⊗ ϑ, and
therefore
2 j = j − s j λ ⊗ ϑ ⇒ div j = x (2 j ) x (λ ⊗ ϑ) λ ⊗ π. (4.39)
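Setting $z_t = d(\mathrm{div}\,j_t)/d\pi$ we get for a.e. t ∈ (a, b)
\[
\begin{aligned}
\partial_t u_t &= -z_t,\\
-2\int_V \varphi\,z_t\,d\pi &= \int_E (\varphi(y)-\varphi(x))\,w_t(x,y)\,\vartheta(dx,dy)\\
&= \int_E \varphi(x)\bigl(w_t(y,x) - w_t(x,y)\bigr)\,\vartheta(dx,dy)\\
&= \int_V \varphi(x)\int_V \bigl(w_t(y,x) - w_t(x,y)\bigr)\,\kappa(x,dy)\,\pi(dx),
\end{aligned}
\]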
The existence of ξ and formula (4.37) follow from Lemma 4.10(2).
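Note that $A_\beta$ is continuous (with extended real values) in $\mathbb{R}_+\times\mathbb{R}_+\setminus\{(0,0)\}$ and is finite and continuous whenever $\beta'(0) > -\infty$. When $\beta'(0) = -\infty$ we have $A_\beta(0,v) = -A_\beta(u,0) = +\infty$ for every u, v > 0.
In the following we will adopt the convention
\[
|\pm\infty| = +\infty, \qquad a\cdot(+\infty) := \begin{cases} +\infty & \text{if } a > 0,\\ 0 & \text{if } a = 0,\\ -\infty & \text{if } a < 0, \end{cases} \qquad a\cdot(-\infty) = -a\cdot(+\infty), \tag{4.41}
\]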
for every a ∈ [−∞, +∞] and, using this convention, we define the extended valued function
Bβ : R+ × R+ × R → [−∞, +∞] by
We want to study the differentiability properties of the functional $\mathscr{F}_\beta(\cdot|\pi)$ along solutions (ρ, j) ∈ CE(I) of the continuity equation. Note that if β is superlinear and $\mathscr{F}_\beta$ is finite at a time $t_0\in I$, then Corollary 4.14 shows that $\rho_t\ll\pi$ for every t ∈ I. If β has linear growth then
\[
\mathscr{F}_\beta(\rho_t|\pi) = \int_V \beta(u_t)\,d\pi + \beta^\infty(1)\,\rho^\perp_t(V), \qquad \rho_t = u_t\pi + \rho^\perp_t, \tag{4.43}
\]
where we have used that $t\mapsto\rho^\perp_t(V)$ is constant. Thus, we are reduced to studying $\mathscr{F}_\beta$ along $(\rho^a,j^a)$, which is still a solution of the continuity equation. The absolute continuity property of $\rho_t$ with respect to π is therefore quite a natural assumption in the next result.
Theorem 4.16 (Chain rule I) Let (ρ, j ) ∈ A(a, b) with ρt = u t π π and let 2 j =
j − s j = w λ ⊗ ϑ as in Corollary 4.14 satisfy
b
β(u a ) dπ < +∞, Bβ (u t (x), u t (y), wt (x, y)) ϑ(dx, dy) dt < +∞
V a E +
(4.44)
Then the map t → V β(u t ) dπ is absolutely continuous in [a, b], the map Bβ (u − , u + , w )
is λ ⊗ ϑ-integrable and
d 1
β(u t ) dπ = Bβ (u t (x), u t (y), wt (x, y))ϑ(dx, dy) for a.e. t ∈ (a, b).
dt V 2 E
(4.45)
Remark 4.17 At first sight condition (4.44) on the positive part of Bβ is remarkable: we only
require the positive part of Bβ to be integrable, but in the assertion we obtain integrability of
the negative part as well. This integrability arises from the combination of the upper bound
on $\int_V \beta(u_a)\,d\pi$ in (4.44) with the lower bound β ≥ 0.
The identity (4.52) is obvious if β (0) is finite, and if β (0) = −∞ then it follows by the
upper bound (4.49) and the fact that the right-hand side of (4.49) is finite almost everywhere.
By the monotone convergence theorem S(t) = limk→+∞ Sk (t) ∈ [0, +∞] for all t ∈ [a, b]
and the limit is finite for t = 0. For all t ∈ [a, b], therefore,
1 t
S(t) = S(a) + B dϑ dr ,
2 a E
We now introduce three functions associated with the (general) continuous convex func-
tion β : R+ → R+ , differentiable in (0, +∞), that we have considered so far, and whose
main example will be the entropy density φ from (3.9). Recalling the definition (4.40),
the convention (4.41), and setting $\Psi^*(\pm\infty) := +\infty$, let us now introduce the functions $D^+_\beta, D^-_\beta, D_\beta : \mathbb{R}^2_+\to[0,+\infty]$
\[
D^-_\beta(u,v) := \begin{cases} \Psi^*(A_\beta(u,v))\,\alpha(u,v) & \text{if } \alpha(u,v) > 0,\\ 0 & \text{otherwise,} \end{cases} \tag{4.53a}
\]
\[
D^+_\beta(u,v) := \begin{cases} \Psi^*(A_\beta(u,v))\,\alpha(u,v) & \text{if } \alpha(u,v) > 0,\\ 0 & \text{if } u = v = 0,\\ +\infty & \text{otherwise, i.e. if } \alpha(u,v) = 0,\ u\ne v, \end{cases} \tag{4.53b}
\]
\[
D_\beta(\cdot,\cdot) := \text{the lower semicontinuous envelope of } D^+_\beta \text{ in } \mathbb{R}^2_+. \tag{4.53c}
\]
The function $D_\phi$ corresponding to the choice β = φ shall feature in the (rigorous) definition of the Fisher information functional $\mathscr{D}$, cf. (5.1) ahead. Nonetheless, it is significant to introduce the functions $D^-_\phi$ and $D^+_\phi$ as well, cf. Remarks 5.8 and 7.12 ahead.
Example 4.18 (The functions $D^\pm_\phi$ and $D_\phi$ in the quadratic and in the cosh case) In the two examples of the linear equation (1.2), with Boltzmann entropy function φ, and with quadratic and cosh-type potentials $\Psi^*$ (see (1.17a) and (1.17b)), the functions $D^\pm_\phi$ and $D_\phi$ take the following forms:
(1) If $\Psi^*(s) = s^2/2$ and, accordingly, $\alpha(u,v) = (u-v)/(\log(u)-\log(v))$ for all u, v > 0 (with α(u, v) = 0 otherwise), then
\[
D^-_\phi(u,v) = \begin{cases} \frac12(\log(u)-\log(v))(u-v) & \text{if } u, v > 0,\\ 0 & \text{if } u = 0 \text{ or } v = 0, \end{cases}
\]
\[
D_\phi(u,v) = D^+_\phi(u,v) = \begin{cases} \frac12(\log(u)-\log(v))(u-v) & \text{if } u, v > 0,\\ 0 & \text{if } u = v = 0,\\ +\infty & \text{if } u = 0 \text{ and } v \ne 0, \text{ or vice versa.} \end{cases}
\]
(2) For the case $\Psi^*(s) = 4\bigl(\cosh(s/2) - 1\bigr)$ and $\alpha(u,v) = \sqrt{uv}$ for all u, v ≥ 0, one finds
\[
D^-_\phi(u,v) = \begin{cases} 2\bigl(\sqrt{u}-\sqrt{v}\bigr)^2 & \text{if } u, v > 0,\\ 0 & \text{if } u = 0 \text{ or } v = 0, \end{cases}
\]
\[
D_\phi(u,v) = 2\bigl(\sqrt{u}-\sqrt{v}\bigr)^2 \quad\text{for all } u, v \ge 0,
\]
\[
D^+_\phi(u,v) = \begin{cases} 2\bigl(\sqrt{u}-\sqrt{v}\bigr)^2 & \text{if } u, v > 0 \text{ or } u = v = 0,\\ +\infty & \text{if } u = 0 \text{ and } v \ne 0, \text{ or vice versa.} \end{cases}
\]
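The closed forms in Example 4.18 can be verified numerically from the general definition (4.53a). The sketch below assumes A_φ(u, v) = φ′(v) − φ′(u) = log v − log u for the Boltzmann entropy (the sign convention does not matter here, since both potentials are even), and the helper names are illustrative.

```python
import math

def A_phi(u, v):
    """Assumed form A_phi(u, v) = phi'(v) - phi'(u) with phi'(s) = log s, for u, v > 0."""
    return math.log(v) - math.log(u)

def psi_star_quad(xi):
    return 0.5 * xi * xi

def psi_star_cosh(xi):
    return 4.0 * (math.cosh(xi / 2.0) - 1.0)

def alpha_logmean(u, v):
    if u <= 0.0 or v <= 0.0:
        return 0.0
    if u == v:
        return u
    return (u - v) / (math.log(u) - math.log(v))

def alpha_geo(u, v):
    return math.sqrt(u * v)

def D_minus(u, v, psi_star, alpha):
    """(4.53a): Psi*(A(u, v)) alpha(u, v) where alpha > 0, and 0 otherwise."""
    a = alpha(u, v)
    if a <= 0.0:
        return 0.0
    return psi_star(A_phi(u, v)) * a

def closed_quad(u, v):
    """Closed form from Example 4.18(1), for u, v > 0."""
    return 0.5 * (math.log(u) - math.log(v)) * (u - v)

def closed_cosh(u, v):
    """Closed form from Example 4.18(2)."""
    return 2.0 * (math.sqrt(u) - math.sqrt(v)) ** 2

samples = [(1.0, 2.0), (0.3, 5.0), (4.0, 4.0), (2.5, 0.1)]
for u, v in samples:
    print(D_minus(u, v, psi_star_quad, alpha_logmean), closed_quad(u, v),
          D_minus(u, v, psi_star_cosh, alpha_geo), closed_cosh(u, v))
```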
(2) $D^-_\beta$ and $D_\beta$ are lower semicontinuous;
(3) For every $u, v\in\mathbb{R}_+$ and w ∈ R we have
\[
B_\beta(u,v,w) \le \Upsilon(u,v,w) + D^-_\beta(u,v). \tag{4.54}
\]
(4) Moreover, when the right-hand side of (4.54) is finite, then the equality
\[
-B_\beta(u,v,w) = \Upsilon(u,v,w) + D^-_\beta(u,v) \tag{4.55}
\]
is equivalent to the condition
\[
\alpha(u,v) = w = 0 \quad\text{or}\quad \Bigl[\alpha(u,v) > 0,\ A_\beta(u,v)\in\mathbb{R},\ -w = (\Psi^*)'\bigl(A_\beta(u,v)\bigr)\,\alpha(u,v)\Bigr]. \tag{4.56}
\]
Proof It is not difficult to check that $D^-_\beta$ is lower semicontinuous: such a property is trivial where α vanishes, and in all the other cases it is sufficient to use the positivity and the continuity of $\Psi^*$ in [−∞, +∞], the continuity of $A_\beta$ in $\mathbb{R}^2_+\setminus\{(0,0)\}$, and the continuity and the positivity of α. It is also obvious that $D^-_\beta\le D^+_\beta$, and therefore $D^-_\beta\le D_\beta\le D^+_\beta$.
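Corollary 4.20 (Chain rule II) Let $(\rho,j)\in\mathscr{A}(a,b)$ with $\rho_t = u_t\pi\ll\pi$ and $2j_\lambda = w(\lambda\otimes\vartheta)$ satisfy
\[
\int_V \beta(u_a)\,d\pi < +\infty, \qquad \int_a^b\int_E D^-_\beta(u_t(x),u_t(y))\,\vartheta(dx,dy)\,dt < +\infty. \tag{4.58}
\]
Then the map $t\mapsto\int_V \beta(u_t)\,d\pi$ is absolutely continuous in [a, b] and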
\[
\frac{d}{dt}\int_V \beta(u_t)\,d\pi \le \mathscr{R}(\rho_t,j_t) + \frac12\int_E D^-_\beta(u_t(x),u_t(y))\,\vartheta(dx,dy) \quad\text{for a.e. } t\in(a,b). \tag{4.59}
\]
If moreover
\[
-\frac{d}{dt}\int_V \beta(u_t)\,d\pi = \mathscr{R}(\rho_t,j_t) + \frac12\int_E D^-_\beta(u_t(x),u_t(y))\,\vartheta(dx,dy)
\]
then 2 j = j and
\[
-w_t(x,y) = (\Psi^*)'\bigl(A_\beta(u_t(x),u_t(y))\bigr)\,\alpha(u_t(x),u_t(y)) \quad\text{for } \vartheta\text{-a.e. } (x,y)\in E. \tag{4.60}
\]
In particular, $w_t = 0$ ϑ-a.e. in $\{(x,y)\in E : \alpha(u_t(x),u_t(y)) = 0\}$.
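As an illustration of how (4.60) combines with (4.38), the sketch below works on a three-point state space with the cosh-type potential, the geometric-mean α, and the Boltzmann entropy. It assumes A_φ(u, v) = log v − log u and (Ψ*)′(ξ) = 2 sinh(ξ/2); with these choices the flux density from (4.60) reduces to w(x, y) = u(x) − u(y), so the right-hand side of (4.38) coincides with the linear generator applied to u. The data, step size, and function names are illustrative assumptions.

```python
import math

pi = [0.2, 0.3, 0.5]
theta = [[0.0, 0.06, 0.10],      # symmetric edge measure (detailed balance)
         [0.06, 0.0, 0.15],
         [0.10, 0.15, 0.0]]
n = len(pi)
kappa = [[theta[x][y] / pi[x] for y in range(n)] for x in range(n)]

def dpsi_star(xi):
    """Derivative of the assumed dual potential Psi*(xi) = 4 (cosh(xi/2) - 1)."""
    return 2.0 * math.sinh(xi / 2.0)

def alpha(u, v):
    return math.sqrt(u * v)

def flux_density(u, x, y):
    """w(x, y) from (4.60), with A = log u(y) - log u(x) (requires u > 0)."""
    A = math.log(u[y]) - math.log(u[x])
    return -dpsi_star(A) * alpha(u[x], u[y])

def rhs_gradient_flow(u):
    """Right-hand side of (4.38): sum over y of w(y, x) kappa(x, y)."""
    return [sum(flux_density(u, y, x) * kappa[x][y] for y in range(n))
            for x in range(n)]

def rhs_kolmogorov(u):
    """Linear generator applied to the density: sum over y of (u(y) - u(x)) kappa(x, y)."""
    return [sum((u[y] - u[x]) * kappa[x][y] for y in range(n)) for x in range(n)]

def euler(u, h, steps):
    """Explicit Euler integration of the gradient-flow form of the equation."""
    for _ in range(steps):
        du = rhs_gradient_flow(u)
        u = [u[x] + h * du[x] for x in range(n)]
    return u

u0 = [2.0, 0.5, 1.0]
mass0 = sum(u0[x] * pi[x] for x in range(n))
u_end = euler(u0, 0.01, 2000)
print(rhs_gradient_flow(u0), rhs_kolmogorov(u0))
print(u_end, sum(u_end[x] * pi[x] for x in range(n)), mass0)
```

The density relaxes towards the constant with the same total mass, and the mass is preserved along the Euler iteration because the edge measure is symmetric.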
We can then apply Lemma 4.19 and Theorem 4.16, observing that
ϒ(u t (x), u t (y), wt (x, y)) ϑ(dx, dy) ≤ ϒ(u t (x), u t (y), w(x, y)) ϑ(dx, dy)
E E
(4.61)
since
1
ϒ(u t (x), u t (y), wt (x, y)) = ϒ(u t (x), u t (y), (wt (x, y) − wt (y, x)))
2
1 1
≤ ϒ(u t (x), u t (y), wt (x, y)) + ϒ(u t (x), u t (y), wt (y, x))
2 2
and the integral of the last term coincides with the right-hand side of (4.61) thanks to the
symmetry of ϑ.
The next result shows an important compactness property for collections of curves in A(a, b)
with bounded action. Recalling the discussion and the notation of Sect. 2.4, we will systematically associate with a given $(\rho,j)\in\mathscr{A}(I)$, I = [a, b], a couple of measures $\rho_\lambda\in\mathcal{M}^+(I\times V)$, $j_\lambda\in\mathcal{M}(I\times E)$ by integrating with respect to the Lebesgue measure λ in I:
\[
\rho_\lambda(dt,dx) = \lambda(dt)\rho_t(dx), \qquad j_\lambda(dt,dx,dy) = \lambda(dt)\,j_t(dx,dy). \tag{4.62}
\]
Similarly, we define
\[
\begin{aligned}
\vartheta^\pm_{\rho,\lambda}(dt,dx,dy) := (\vartheta^\pm_\rho)_\lambda(dt,dx,dy) &= \lambda(dt)\,\vartheta^\pm_{\rho_t}(dx,dy)\\
&= \lambda(dt)\rho_t(dx)\kappa(x,dy) = \vartheta^\pm_{\rho_\lambda}(dt,dx,dy).
\end{aligned} \tag{4.63}
\]
Then, there exist a subsequence (not relabelled) and a pair $(\rho,j)\in\mathscr{A}(a,b)$ such that, for the measures $j^n_\lambda\in\mathcal{M}([a,b]\times E)$ defined as in (4.62) there holds
\[
\rho^n_t \to \rho_t \quad\text{setwise in } \mathcal{M}^+(V) \text{ for all } t\in[a,b], \tag{4.66a}
\]
\[
j^n_\lambda \rightharpoonup j_\lambda \quad\text{setwise in } \mathcal{M}([a,b]\times E), \tag{4.66b}
\]
where $j_\lambda$ is induced (in the sense of (4.62)) by a λ-integrable family $(j_t)_{t\in[a,b]}\subset\mathcal{M}(E)$. In addition, for any sequence $(\rho^n,j^n)$ converging to (ρ, j) in the sense of (4.66), we have
\[
\int_a^b \mathscr{R}(\rho_t,j_t)\,dt \le \liminf_{n\to\infty}\int_a^b \mathscr{R}(\rho^n_t,j^n_t)\,dt. \tag{4.67}
\]
Proof Let us first remark that the mass conservation property of the continuity equation yields
\[
\rho^n_t(V) = \rho^n_a(V) \le M_1 \quad\text{for every } t\in[a,b],\ n\in\mathbb{N} \tag{4.68}
\]
for a suitable finite constant $M_1$ independent of n. We deduce that for every t ∈ [a, b] the measures $\vartheta^\pm_{\rho^n_t}$ have total mass bounded by $M_1\|\kappa_V\|_\infty$, so that estimate (2.29) for $y = (c,c)\in D(\alpha_*)$ yields
\[
\nu_{\rho^n_t}(E) = \alpha[\vartheta^+_{\rho^n_t}, \vartheta^-_{\rho^n_t}|\vartheta](E) \le M_2 \quad\text{for every } t\in[a,b],\ n\in\mathbb{N}, \tag{4.69}
\]
where $M_2 := 2cM_1\|\kappa_V\|_\infty - \alpha_*(c,c)\,\vartheta(E)$. Jensen's inequality (2.18) and the monotonicity property (2.19) yield
\[
\mathscr{R}(\rho^n_t,j^n_t) \ge \frac12\hat\Psi\bigl(2j^n_t(E), \nu_{\rho^n_t}(E)\bigr) \ge \frac12\hat\Psi\bigl(2j^n_t(E), M_2\bigr) = \frac12\Psi\Bigl(\frac{2j^n_t(E)}{M_2}\Bigr)M_2, \tag{4.70}
\]
with $\hat\Psi$ the perspective function associated with Ψ, cf. (2.14). Since Ψ has superlinear growth, we deduce that the sequence of functions $t\mapsto|j^n_t|(E)$ is equi-integrable.
Since the sequence $(\rho^n_a)_n$, with $\rho^n_a = u^n_a\pi\ll\pi$, is relatively compact with respect to setwise convergence, by Theorems 2.1(6) and 2.2(3) there exist a convex superlinear function $\beta : \mathbb{R}_+\to\mathbb{R}_+$ and a constant $M_3 < +\infty$ such that
\[
\mathscr{F}_\beta(\rho^n_a|\pi) = \int_V \beta(u^n_a)\,d\pi \le M_3 \quad\text{for every } n\in\mathbb{N}. \tag{4.71}
\]
We conclude that the sequence of maps $(u^n_t)_{t\in[a,b]}$ satisfies the conditions of the compactness result [4, Prop. 3.3.1], which yields the existence of a (not relabelled) subsequence and of a $L^1(V,\pi)$-continuous (thus also weakly-continuous) function $[a,b]\ni t\mapsto u_t\in L^1(V,\pi)$ such that $u^n_t\rightharpoonup u_t$ weakly in $L^1(V,\pi)$ for every t ∈ [a, b]. By (2.5) we also deduce that (4.66a) holds, i.e.
For every B ∈ A ⊗ B, A being the Borel σ -algebra of [a, b], with ϑ λ (B) > 0, Jensen’s
inequality (2.18) yields
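\[
\Psi\Big(\frac{j^n_\lambda(B)}{\varsigma^n(B)}\Big)\,\varsigma^n(B) \le \mathcal{F}_\Psi\big(j^n_\lambda \llcorner B \,\big|\, \varsigma^n \llcorner B\big) \le M. \tag{4.77}
\]
Denoting by U : R+ → R+ the inverse function of Ψ, we thus find
\[
j^n_\lambda(B) \le \varsigma^n(B)\, U\Big(\frac{M}{\varsigma^n(B)}\Big). \tag{4.78}
\]
Since Ψ is superlinear, U is sublinear so that
\[
\lim_{\delta \downarrow 0} \delta\, U(M/\delta) = 0. \tag{4.79}
\]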
For every ε > 0 there exists δ0 > 0 such that δU (M/δ) ≤ ε for every δ ∈ (0, δ0 ). Since ς n is
equi absolutely continuous with respect to ϑ λ we can also find δ1 > 0 such that ϑ λ (B) < δ1
yields ς n (B) ≤ δ0 . By (4.78) we eventually conclude that j nλ (B) ≤ ε.
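As a quick sanity check of the limit (4.79), assume purely for illustration the quadratic density Ψ(r) = r²/2 (a choice made here only for this example, not one imposed by the text). Its inverse on R+ and the resulting decay can then be written explicitly:
\[
U(y) = \sqrt{2y}, \qquad \delta\, U(M/\delta) = \sqrt{2M\delta} \longrightarrow 0 \quad \text{as } \delta \downarrow 0 .
\]
In this special case the threshold δ0 in the argument above may be taken as δ0 = ε²/(2M).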
It is then easy to pass to the limit in the integral formulation (4.4) of the continuity equation.
Finally, concerning (4.67), it is sufficient to use the equivalent representation given by (4.64).
Remark 4.23 (Scaling invariance) Let us consider the perspective function Ψ̂(r , s) associated
with Ψ as in (2.14), Ψ̂(r , s) = sΨ(r /s) if s > 0. We call Rs (ρ, j ) the dissipation
functional induced by Ψ̂(·, s), with induced Dynamic-Transport cost Ws . For every τ > 0,
ρ0 , ρ1 ∈ M+ (V ) a rescaling argument yields
\[
\mathcal{W}(\tau, \rho_0, \rho_1) = \mathcal{W}_{\tau/\sigma}(\sigma, \rho_0, \rho_1) = \inf\Big\{ \int_0^\sigma \mathcal{R}_{\tau/\sigma}(\rho_t, j_t)\,\mathrm{d}t \,:\, (\rho, j) \in \mathrm{CE}(0, \sigma; \rho_0, \rho_1) \Big\}. \tag{4.82}
\]
In particular, choosing σ = 1 we find
\[
\mathcal{W}(\tau, \rho_0, \rho_1) = \mathcal{W}_\tau(1, \rho_0, \rho_1). \tag{4.83}
\]
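The rescaling behind (4.82) can be checked numerically in a scalar toy model. The sketch below is only an illustration under stated assumptions: the density Ψ(r) = cosh(r) − 1, the scalar "flux" `flux`, and the helper names `perspective` and `action` are hypothetical choices made here, and the measure-valued structure of the paper is ignored. It verifies that the time change t = (τ/σ)s turns the action with density Ψ on (0, τ) into the action with density Ψ̂(·, τ/σ) on (0, σ).

```python
import math

def psi(r: float) -> float:
    """Illustrative dissipation density Psi(r) = cosh(r) - 1 (an assumption)."""
    return math.cosh(r) - 1.0

def perspective(r: float, s: float) -> float:
    """Perspective function hat-Psi(r, s) = s * Psi(r / s) for s > 0."""
    return s * psi(r / s)

def action(rate, horizon: float, scale: float, steps: int = 20000) -> float:
    """Midpoint-rule approximation of the integral of hat-Psi(rate(t), scale) over (0, horizon)."""
    h = horizon / steps
    return sum(perspective(rate((k + 0.5) * h), scale) for k in range(steps)) * h

tau, sigma = 0.7, 2.5

def flux(t: float) -> float:
    """Hypothetical scalar flux on (0, tau)."""
    return 1.0 + math.sin(3.0 * t)

def flux_rescaled(s: float) -> float:
    """Flux after the time change t = (tau / sigma) * s; it picks up the factor tau / sigma."""
    return (tau / sigma) * flux(s * tau / sigma)

original = action(flux, tau, 1.0)                     # action with Psi on (0, tau)
rescaled = action(flux_rescaled, sigma, tau / sigma)  # action with hat-Psi(., tau/sigma) on (0, sigma)
```

The two quadrature sums agree up to rounding, because the midpoint nodes of the two integrals correspond to each other under the time change.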
Currently, proving that any pair of measures can be connected by a curve with finite
action R under general conditions on V , Ψ, and α is an open problem: in other words, in the
general case we cannot exclude that A (0, τ ; ρ0 , ρ1 ) = ∅, which would make W (τ, ρ0 , ρ1 ) =
+∞. Nonetheless, in a more specific situation, Proposition 4.25 below provides sufficient
conditions for this connectivity property, between two measures ρ0 , ρ1 ∈ M+ (V ) with the
same mass and such that ρi ≪ π for i ∈ {0, 1}. Preliminarily, we give the following
Definition 4.24 Let q ∈ (1, +∞). We say that the measures (π, ϑ) satisfy a q-Poincaré
inequality if there exists a constant C P > 0 such that for every ξ ∈ L q (V ; π) with
We are now in a position to state the connectivity result, where we specialize the discussion
to dissipation densities with p-growth for some p ∈ (1, +∞).
for i ∈ {0, 1}. Then, for every τ > 0 the set A (0, τ ; ρ0 , ρ1 ) is non-empty and thus
W (τ, ρ0 , ρ1 ) < ∞.
We postpone the proof of Proposition 4.25 to Appendix D, where some preliminary results,
also motivating the role of the q-Poincaré inequality, will be provided.
The main result of this section collects a series of properties of the cost that will play a key
role in the study of the Minimizing Movement scheme (1.26). Indeed, as already hinted in
the Introduction, the analysis that we will carry out in Sect. 7 ahead might well be extended
to a scheme set up in a general topological space, endowed with a cost functional enjoying
properties (4.86) below. We will now check them for the cost W associated with the generalized
gradient structure (E , R , R ∗ ) fulfilling Assumptions (Vπκ) and (R ∗ α). In this section
all convergences will be with respect to the setwise topology.
W (τ, ρ0 , ρ1 ) = 0 ⇔ ρ0 = ρ1 . (4.86a)
Proof (1) Since Ψ(s) is strictly positive for s ≠ 0 it is immediate to check that R (ρ, j ) =
0 ⇒ j = 0. For an optimal pair (ρ, j ) satisfying $\int_0^\tau \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t = 0$ we deduce that
j t = 0 for a.e. t ∈ (0, τ ). The continuity equation then implies ρ0 = ρ1 .
(2) This can easily be checked by using the existence of minimizers for W (τ, ρ0 , ρ1 ).
(3) Assume without loss of generality that lim inf n→+∞ W (τn , ρ0n , ρ1n ) < ∞. By (4.83)
we use that, for every n ∈ N and setting τ = supn τn ,
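\[
\mathcal{W}(\tau_n, \rho_0^n, \rho_1^n) = \mathcal{W}_{\tau_n}(1, \rho_0^n, \rho_1^n) \le \mathcal{W}_{\tau}(1, \rho_0^n, \rho_1^n) \overset{(*)}{=} \int_0^1 \mathcal{R}_{\tau}(\rho^n_t, j^n_t)\,\mathrm{d}t,
\]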
where the identity (∗) holds for an optimal pair (ρ n , j n ) ∈ CE(0, 1; ρ0n , ρ1n ). Applying
Proposition 4.21, we obtain the existence of (ρ, j ) ∈ CE(0, 1; ρ0 , ρ1 ) such that, up to a
subsequence,
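\[
\rho^n_s \to \rho_s \ \text{setwise in } \mathcal{M}^+(V) \ \text{for all } s \in [0,1], \qquad j^n \to j \ \text{setwise in } \mathcal{M}([0,1] \times E). \tag{4.87}
\]
Arguing as in Proposition 4.21 and using the joint lower semicontinuity of Ψ̂, we find that
\[
\liminf_{n\to\infty} \int_0^1 \mathcal{R}_{\tau_n}(\rho^n_s, j^n_s)\,\mathrm{d}s \ge \int_0^1 \mathcal{R}_{\tau}(\rho_s, j_s)\,\mathrm{d}s \ge \mathcal{W}_\tau(1, \rho_0, \rho_1) = \mathcal{W}(\tau, \rho_0, \rho_1).
\]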
By the same argument as for part (3), every subsequence of ρn has a converging subsequence
in the setwise topology; the lower semicontinuity result of the proof of part (3) shows that
any limit point must coincide with ρ.
(5) The argument combines (4.88) and part (3).
Given a functional W satisfying the properties (4.86), we define the ‘W -action’ of a curve
ρ : [a, b] → M+ (V ) as
\[
\mathbb{W}(\rho; [a,b]) := \sup\Big\{ \sum_{j=1}^{M} \mathcal{W}\big(t^j - t^{j-1}, \rho(t^{j-1}), \rho(t^j)\big) \,:\, (t^j)_{j=0}^{M} \in \mathcal{P}_f([a,b]) \Big\}, \tag{4.89}
\]
for all [a, b] ⊂ [0, T ], where P f ([a, b]) denotes the set of all partitions of a given interval
[a, b].
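To make the supremum over partitions in (4.89) concrete, the following sketch evaluates the partition sums for a scalar toy cost. It is an illustration under stated assumptions only: the cost `cost(tau, a, b) = (b - a)^2 / (2 tau)` is a hypothetical quadratic stand-in for W, and the curve t ↦ t² on [0, 1] replaces a measure-valued curve. For this stand-in the sums over uniform partitions with M intervals equal 2/3 − 1/(6M²), so they increase under dyadic refinement towards 2/3, which is the integral of |ρ′(t)|²/2 over [0, 1].

```python
def cost(tau: float, a: float, b: float) -> float:
    """Hypothetical scalar cost W(tau, a, b) = (b - a)^2 / (2 tau)."""
    return (b - a) ** 2 / (2.0 * tau)

def partition_sum(curve, times) -> float:
    """Sum of W(t_j - t_{j-1}, rho(t_{j-1}), rho(t_j)) over one partition, as inside (4.89)."""
    return sum(cost(t1 - t0, curve(t0), curve(t1)) for t0, t1 in zip(times, times[1:]))

def curve(t: float) -> float:
    """Illustrative scalar curve on [0, 1]."""
    return t * t

def uniform(m: int):
    """Uniform partition of [0, 1] into m intervals."""
    return [j / m for j in range(m + 1)]

# Partition sums along dyadic refinements M = 2, 4, ..., 256.
sums = [partition_sum(curve, uniform(2 ** k)) for k in range(1, 9)]
```

The monotonicity under refinement reflects the generalized triangle inequality that the text invokes as property (4.86b); in this quadratic toy case it reduces to the Cauchy-Schwarz inequality.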
If W is defined by (4.80), then each term in the sum above is defined as an optimal
version of $\int_{t^{j-1}}^{t^j} \mathcal{R}(\rho_t, \cdot)\,\mathrm{d}t$, and we might expect that W(ρ; [a, b]) is an optimal version of
$\int_a^b \mathcal{R}(\rho_t, \cdot)\,\mathrm{d}t$. This is indeed the case, as is illustrated by the following analogue of [20,
Th. 5.17]:
and there exists a unique j opt such that equality is achieved. The optimal j opt is skew-
symmetric, i.e. j opt = j opt (cf. Remark 4.12).
Lemma 4.28 Let ρ : [0, T ] → M+ (V ) satisfy W(ρ; [0, T ]) < +∞. For a sequence of partitions
$P_n = (t_n^j)_{j=0}^{M_n} \in \mathcal{P}_f([0,T])$ with fineness $\tau_n = \max_{j=1,\dots,M_n} (t_n^j - t_n^{j-1})$ converging
to zero, let $\rho^n : [0,T] \to \mathcal{M}^+(V)$ satisfy
\[
\rho^n(t_n^j) = \rho(t_n^j) \ \text{ for all } j = 1, \dots, M_n \quad \text{and} \quad \sup_{n \in \mathbb{N}} \mathbb{W}(\rho^n; [0,T]) < +\infty.
\]
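Proof First of all, observe that by the symmetry of Ψ, also the time-reversed curve
ρ̌(t) := ρ(T − t) satisfies W(ρ̌; [0, T ]) < +∞. Let $\bar t_n$ and $\underline t_n$ be the piecewise constant
interpolants associated with the partitions Pn , cf. (7.5). Fix t ∈ [0, T ]; we estimate
\[
\begin{aligned}
\mathcal{W}\big(2(\bar t_n - t), \rho^n(t), \rho(t)\big)
&\overset{(1)}{\le} \mathcal{W}\big(\bar t_n - t, \rho^n(t), \rho^n(\bar t_n(t))\big) + \mathcal{W}\big(\bar t_n - t, \rho(\bar t_n(t)), \rho(t)\big) \\
&= \mathcal{W}\big(\bar t_n - t, \rho^n(t), \rho^n(\bar t_n(t))\big) + \mathcal{W}\big(\bar t_n - t, \check\rho(T - \bar t_n(t)), \check\rho(T - t)\big) \\
&\le \mathbb{W}(\rho^n; [t, \bar t_n(t)]) + \mathbb{W}(\check\rho; [T - \bar t_n(t), T - t]) \\
&\le \sup_{n \in \mathbb{N}} \mathbb{W}(\rho^n; [0,T]) + \mathbb{W}(\check\rho; [0,T]) =: C < +\infty,
\end{aligned}
\]
where (1) follows from property (4.86b) of W . Consequently, by property (4.86d) it follows
that ρ n (t) → ρ(t) setwise in M+ (V ) for all t ∈ [0, T ].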
Proof of Proposition 4.27 One implication is straightforward: if a pair (ρ, j ) exists, then
\[
\mathcal{W}(t - s, \rho_s, \rho_t) \overset{(4.80)}{\le} \int_s^t \mathcal{R}(\rho_r, j_r)\,\mathrm{d}r \quad \text{for all } 0 \le s < t \le T,
\]
n ∈ N, construct a pair (ρ n , j n ) ∈ CE(0, T ) as follows: On each time interval $[t_n^{j-1}, t_n^j]$, let
(ρ n , j n ) be given by Corollary 4.22 as the minimizer under the constraint $\rho^n(t_n^{j-1}) = \rho(t_n^{j-1})$
and $\rho^n(t_n^j) = \rho(t_n^j)$, namely
\[
\mathcal{W}\big(t_n^j - t_n^{j-1}, \rho(t_n^{j-1}), \rho(t_n^j)\big) = \int_{t_n^{j-1}}^{t_n^j} \mathcal{R}(\rho^n_r, j^n_r)\,\mathrm{d}r. \tag{4.91}
\]
By Lemma 4.28 we then find that ρ n (t) → ρ(t) setwise as n → ∞ for each t ∈ [0, T ].
Applying Proposition 4.21, we find that j n (dt dx dy):= j nt (dx dy) dt setwise converges
along a subsequence to a limit j . The limit j can be disintegrated as j (dt dx dy) =
λ(dt) j t (dx dy) for a measurable family ( j t )t∈[0,T ] , and the pair (ρ, j ) is an element of
CE(0, T ). In addition we have the lower-semicontinuity property
\[
\liminf_{n\to\infty} \int_0^T \mathcal{R}(\rho^n_t, j^n_t)\,\mathrm{d}t \ge \int_0^T \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t. \tag{4.94}
\]
With the definitions and the properties that we established in the previous section we have
given a rigorous meaning to the first term in the functional L in (1.18). In this section
we continue with the second term in the integral, often called Fisher information, after the
canonical version in diffusion problems [68]. Section 5.2 is devoted to
(a) A rigorous definition of the Fisher information D (ρ) (Definition 5.1).
In several practical settings, such as the proof of existence that we give in Sect. 7, it is
important to have lower semicontinuity of D : this is proved in Proposition 5.3.
We are then in a position to give
(b) a rigorous definition of solutions to the (E , R , R ∗ ) system (Definition 5.4).
In Sect. 1.2.1 we explained that the Energy-Dissipation balance approach to defining solutions
is based on the fact that L (ρ, j ) ≥ 0 for all (ρ, j ) by the validity of a suitable chain-rule
inequality.
(c) A rigorous proof of this chain-rule inequality, involving R and D , is given in Corollary 5.6,
which is based on Theorem 4.16.
This establishes the inequality L (ρ, j ) ≥ 0. Hence, we can rigorously deduce that the
opposite inequality L (ρ, j ) ≤ 0 characterizes the property that (ρ, j ) is a solution to the
(E , R , R ∗ ) system. Theorem 5.7 provides an additional characterization of this solution
concept.
Finally, in Sects. 5.3 and 5.4,
(d) we prove existence, uniqueness and stability of solutions under suitable convexity/l.s.c.
conditions on D (Theorems 5.9 and 5.10). We also discuss their asymptotic
behaviour and the role of the invariant measures π.
Throughout this section we adopt Assumptions (Vπκ), (R ∗ α), and (E φ).
In order to give a precise meaning to this formulation when φ is not differentiable at 0 (as,
for instance, in the case of the Boltzmann entropy function (3.10)), we use the function Dφ
defined in (4.53c).
Example 5.2 (The Fisher information in the quadratic and in the cosh case) For illustration
we recall the two expressions for Dφ from Example 4.18 for the linear equation (1.2) with
quadratic and cosh-type potentials Ψ∗ :
Dφ (u, v) = 0 if u = v = 0,
Dφ (u, v) = +∞ if u = 0 and v ≠ 0, or vice versa.
(2) If Ψ∗ (s) = 4(cosh(s/2) − 1), then
Dφ (u, v) = 2(√u − √v)² ∀ (u, v) ∈ [0, +∞) × [0, +∞).
These two examples of Dφ are convex. The convexity of Dφ in the case of potentials Ψ∗
generated by the power means (1.33) is discussed in Appendix E.
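The closed form in the cosh case can be checked numerically. The sketch below rests on assumptions that are not spelled out in this passage: for u, v > 0 it takes the Fisher density to be Ψ∗(φ′(v) − φ′(u)) α(u, v), with φ′(u) = log u (the Boltzmann entropy) and the geometric mean α(u, v) = √(uv). Under these assumed choices, the expression built from Ψ∗(s) = 4(cosh(s/2) − 1) should coincide with 2(√u − √v)². The function names are hypothetical.

```python
import math

def psi_star(s: float) -> float:
    """Cosh-type dual dissipation potential Psi*(s) = 4 (cosh(s/2) - 1)."""
    return 4.0 * (math.cosh(s / 2.0) - 1.0)

def fisher_density(u: float, v: float) -> float:
    """Assumed form Psi*(phi'(v) - phi'(u)) * alpha(u, v) for u, v > 0,
    with phi'(u) = log u and alpha(u, v) = sqrt(u v) (both are assumptions)."""
    return psi_star(math.log(v) - math.log(u)) * math.sqrt(u * v)

def closed_form(u: float, v: float) -> float:
    """Closed form 2 (sqrt(u) - sqrt(v))^2 stated in the cosh case."""
    return 2.0 * (math.sqrt(u) - math.sqrt(v)) ** 2

samples = [(0.3, 0.3), (0.5, 2.0), (1.0, 7.5), (4.0, 0.01)]
max_gap = max(abs(fisher_density(u, v) - closed_form(u, v)) for u, v in samples)
```

The agreement follows from cosh(½ log(v/u)) = (√(v/u) + √(u/v))/2, which also explains why Ψ∗ must vanish at 0 for the density to vanish on the diagonal u = v.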
Proposition 5.3 (Lower semicontinuity of D ) Assume either that π is purely atomic or that
the function Dφ is convex on R2+ . Then D is (sequentially) lower semicontinuous with respect
to setwise convergence, i.e., for all (ρ n )n , ρ ∈ D(E )
\[
\rho^n \to \rho \ \text{setwise in } \mathcal{M}^+(V) \quad \Longrightarrow \quad \mathcal{D}(\rho) \le \liminf_{n\to\infty} \mathcal{D}(\rho^n). \tag{5.2}
\]
convex on R²+ , then the functional (5.3) is also lower semicontinuous with respect to the
weak topology in L 1 (V , π). On the other hand, since ρn and ρ are absolutely continuous
with respect to π, ρn → ρ setwise if and only if dρn /dπ ⇀ dρ/dπ weakly in L 1 (V , π) (see
Theorem 2.1).
Remark 5.5
(1) Since (ρ, j ) ∈ CE(0, T ), the curve ρ is absolutely continuous with respect to the total
variation distance.
(2) The Energy-Dissipation balance (5.4) written for s = 0 and t = T implies that (ρ, j ) ∈
A(0, T ) as well. Moreover, t → E (ρt ) takes finite values and it is absolutely continuous
in the interval [0, T ].
(3) The chain-rule estimate (4.59) implies the following important corollary:
Corollary 5.6 (Chain-rule estimate III) For any curve (ρ, j ) ∈ CE(0, T ),
\[
\mathcal{L}_T(\rho, j) := \int_0^T \big( \mathcal{R}(\rho_r, j_r) + \mathcal{D}(\rho_r) \big)\,\mathrm{d}r + \mathcal{E}(\rho_T) - \mathcal{E}(\rho_0) \ge 0. \tag{5.5}
\]
Let us now collect a few basic structural properties of solutions of the (E , R , R ∗ ) Energy-
Dissipation balance. Recall that we will always adopt Assumptions (Vπκ), (R ∗ α),
and (E φ).
Following an argument by Gigli [34] we first use convexity of D to deduce uniqueness.
Theorem 5.9 (Uniqueness) Suppose that D is convex and the energy density φ is strictly
convex. Suppose that ρ 1 , ρ 2 satisfy the (E , R , R ∗ ) Energy-Dissipation balance (5.4) and
are identical at time zero. Then ρt1 = ρt2 for every t ∈ [0, T ].
Proof Let j i ∈ M((0, T ) × E) satisfy Lt (ρ i , j i ) = 0 and let us set
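\[
\rho_t := \tfrac12 (\rho^1_t + \rho^2_t), \qquad j := \tfrac12 (j^1 + j^2).
\]
By the linearity of the continuity equation we have that (ρ, j ) ∈ CE(0, T ) with ρ0 = ρ01 =
ρ02 , so that by convexity
\[
\begin{aligned}
\mathcal{E}(\rho_t) &\ge \mathcal{E}(\rho_0) - \int_0^t \big( \mathcal{R}(\rho_r, j_r) + \mathcal{D}(\rho_r) \big)\,\mathrm{d}r \\
&\ge \mathcal{E}(\rho_0) - \frac12 \int_0^t \big( \mathcal{R}(\rho^1_r, j^1_r) + \mathcal{D}(\rho^1_r) \big)\,\mathrm{d}r - \frac12 \int_0^t \big( \mathcal{R}(\rho^2_r, j^2_r) + \mathcal{D}(\rho^2_r) \big)\,\mathrm{d}r \\
&= \frac12 \mathcal{E}(\rho^1_t) + \frac12 \mathcal{E}(\rho^2_t).
\end{aligned}
\]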
Theorem 5.10 (Existence and stability) Let us suppose that the Fisher information functional
D is lower semicontinuous with respect to setwise convergence (e.g. if π is purely atomic,
or Dφ is convex, see Proposition 5.3).
(1) For every ρ0 ∈ M+ (V ) with E (ρ0 ) < +∞ there exists a solution ρ : [0, T ] → M+ (V )
of the (E , R , R ∗ ) evolution system starting from ρ0 .
(2) Every sequence (ρtn )t∈[0,T ] of solutions to the (E , R , R ∗ ) evolution system such that
\[
\sup_{n \in \mathbb{N}} \mathcal{E}(\rho^n_0) < +\infty \tag{5.13}
\]
has a subsequence setwise converging to a limit (ρt )t∈[0,T ] for every t ∈ [0, T ].
(3) Let (ρtn )t∈[0,T ] be a sequence of solutions, with corresponding fluxes ( j nt )t∈[0,T ] . Let ρtn
converge setwise to ρt for every t ∈ [0, T ], and assume that
\[
\lim_{n\to\infty} \mathcal{E}(\rho^n_0) = \mathcal{E}(\rho_0). \tag{5.14}
\]
Then ρ is a solution as well, with flux j , and the following additional convergence
properties hold:
\[
\lim_{n\to\infty} \int_0^T \mathcal{R}(\rho^n_t, j^n_t)\,\mathrm{d}t = \int_0^T \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t, \tag{5.15a}
\]
\[
\lim_{n\to\infty} \int_0^T \mathcal{D}(\rho^n_t)\,\mathrm{d}t = \int_0^T \mathcal{D}(\rho_t)\,\mathrm{d}t. \tag{5.15b}
\]
If moreover E is strictly convex then ρ n converges uniformly in [0, T ] with respect to the
total variation distance.
4.21 to conclude uniform convergence of the sequence (ρn )n with respect to the total variation
distance.
For part (1), when the density u 0 of ρ0 takes value in a compact interval [a, b] with
0 < a < b < ∞, the existence of a solution follows by Theorem 6.6 below. The general case
follows by a standard approximation of u 0 by truncation and applying the stability properties
of parts (2) and (3).
Let us finally make a few comments on stationary measures and on the asymptotic behaviour
of solutions of the (E , R , R ∗ ) system. The definition of invariant measures was already given
in Sect. 2.4, and we recall it for convenience.
When 0 < γ < 1, any function of the form u(x) = 1{x ∈ A} for A ⊂ V is a stationary
point of this equation, and equivalently any measure π⌞A (the restriction of π to A) is a stationary solution of the
(E , R , R ∗ ) system. For 0 < γ < 1 therefore the set of stationary measures is much larger
than just invariant measures.
As in the case of linear evolutions, (E , R , R ∗ ) systems behave well with respect to
decomposition of π into mutually singular invariant measures.
Theorem 5.13 (Decomposition) Let us suppose that π = π 1 + π 2 with π 1 , π 2 ∈ M+ (V )
mutually singular and invariant. Let ρ : [0, T ] → M+ (V ) be a curve with ρt = u t π ≪ π
and let ρti := u t π i be the decomposition of ρt with respect to π 1 and π 2 . Then ρ is a
solution of the (E , R , R ∗ ) system if and only if each curve ρti , i = 1, 2, is a solution of the
(E i , R i , (R i )∗ ) system, where E i (μ) := Fφ (μ|π i ) is the relative entropy with respect to the
measure π i and R i , (R i )∗ are induced by π i .
Remark 5.14 It is worth noting that when α is 1-homogeneous then R i = R and (R i )∗ = R ∗
do not depend on π i , cf. Corollary 4.11. The decomposition is thus driven just by the splitting
of the entropy E .
Proof of Theorem 5.13 Note that the assumptions of invariance and mutual singularity of π 1
and π 2 imply that ϑ has a singular decomposition $\vartheta = \vartheta^1 + \vartheta^2 := \kappa_{\pi^1} + \kappa_{\pi^2}$, where the $\kappa_{\pi^i}$
are symmetric. It then follows that E (ρt ) = E 1 (ρt1 ) + E 2 (ρt2 ) and D (ρt ) = D 1 (ρt1 ) + D 2 (ρt2 ),
where
\[
\mathcal{D}^i(\rho^i) = \frac12 \int_E \mathrm{D}_\phi(u(x), u(y))\,\vartheta^i(\mathrm{d}x, \mathrm{d}y).
\]
Finally, Corollary 4.11 shows that decomposing j as the sum j 1 + j 2 where j i ≪ ϑ i , the
pairs (ρ i , j i ) belong to CE(0, T ) and R (ρt , j t ) = R 1 (ρt1 , j 1t ) + R 2 (ρt2 , j 2t ).
Theorem 5.15 (Asymptotic behaviour) Let us suppose that the only stationary measures are
multiples of π, and that D is lower semicontinuous with respect to setwise convergence. Then
every solution ρ : [0, ∞) → M+ (V ) of the (E , R , R ∗ ) evolution system converges setwise
to cπ, where c:=ρ0 (V )/π(V ).
Proof Let us fix a vanishing sequence τn ↓ 0 such that $\sum_n \tau_n = +\infty$. Let ρ∞ be any limit
point with respect to setwise convergence of the curve ρt along a diverging sequence of
times tn ↑ +∞. Such a point exists since the curve ρ is contained in a sublevel set of E . Up
to extracting a further subsequence, it is not restrictive to assume that tn+1 ≥ tn + τn .
Since
\[
\sum_{n \in \mathbb{N}} \int_{t_n}^{t_n + \tau_n} \big( \mathcal{R}(\rho_t, j_t) + \mathcal{D}(\rho_t) \big)\,\mathrm{d}t \le \int_0^{+\infty} \big( \mathcal{R}(\rho_t, j_t) + \mathcal{D}(\rho_t) \big)\,\mathrm{d}t \le \mathcal{E}(\rho_0) < \infty
\]
and the series of τn diverges, we find
\[
\liminf_{n\to+\infty} \frac{1}{\tau_n} \int_{t_n}^{t_n + \tau_n} \mathcal{D}(\rho_t)\,\mathrm{d}t = 0, \qquad \lim_{n\to\infty} \int_{t_n}^{t_n + \tau_n} \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t = 0.
\]
Up to extracting a further subsequence, we can suppose that the above lim inf is a limit and
we can select $t'_n \in [t_n, t_n + \tau_n]$ such that
\[
\lim_{n\to\infty} \mathcal{D}(\rho_{t'_n}) = 0, \qquad \lim_{n\to\infty} \int_{t_n}^{t'_n} \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t = 0.
\]
Recalling the definition (4.80) of the Dynamical-Variational Transport cost and the monotonicity
with respect to τ , we also get $\lim_{n\to\infty} \mathcal{W}(\tau_n, \rho_{t_n}, \rho_{t'_n}) = 0$, so that Theorem 4.26(5)
and the relative compactness of the sequence $(\rho_{t'_n})_n$ yield $\rho_{t'_n} \to \rho_\infty$ setwise.
The lower semicontinuity of D yields D (ρ∞ ) = 0 so that ρ∞ = cπ thanks to the
uniqueness assumption and to the conservation of the total mass. Since we have uniquely
identified the limit point, we conclude that the whole curve ρt converges setwise to ρ∞ as
t → +∞.
Let J ⊂ R be a closed interval (not necessarily bounded) and let us first consider a map
G : E × J 2 → R with the following properties:
(1) measurability with respect to (x, y) ∈ E:
for every u, v ∈ J the map (x, y) → G(x, y; u, v) is measurable; (6.1a)
(2) continuity with respect to u, v and linear growth: there exists a constant M > 0 such
that
for every (x, y) ∈ E (u, v) → G(x, y; u, v) is continuous and
|G(x, y; u, v)| ≤ M(1 + |u| + |v|) for every u, v ∈ J , (6.1b)
(3) skew-symmetry:
G(x, y; u, v) = −G(y, x; v, u), for every (x, y) ∈ E, u, v ∈ J , (6.1c)
(4) ℓ-dissipativity: there exists a constant ℓ ≥ 0 such that for every (x, y) ∈ E, u, u′, v ∈ J :
u ≤ u′ ⇒ G(x, y; u′, v) − G(x, y; u, v) ≤ ℓ(u′ − u). (6.1d)
Remark 6.1 Note that (6.1d) is surely satisfied if G is ℓ-Lipschitz in (u, v), uniformly with
respect to (x, y). The ‘one-sided Lipschitz condition’ (6.1d) however is weaker than the
standard Lipschitz condition; this type of condition is common in the study of ordinary
differential equations, since it is still strong enough to guarantee uniqueness and non-blowup
of the solutions (see e.g. [41, Ch. IV.12]).
Let us also remark that (6.1c) and (6.1d) imply the reverse monotonicity property of G
with respect to v,
v ≥ v′ ⇒ G(x, y; u, v′) − G(x, y; u, v) ≤ ℓ(v − v′) , (6.2)
and the joint estimate
u ≤ u′, v ≥ v′ ⇒ G(x, y; u′, v′) − G(x, y; u, v) ≤ ℓ[(u′ − u) + (v − v′)] . (6.3)
(4) If a ∈ J satisfies
0 = G(x, y; a, a) ≤ G(x, y; a, v) for every (x, y) ∈ E, v ≥ a , (6.7)
then for every function u ∈ L 1 (V , π; J ) we have
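\[
u \ge a \ \ \pi\text{-a.e.} \quad \Longrightarrow \quad \lim_{h \downarrow 0} \frac1h \int_V \big( a - (u + h\,\boldsymbol{G}[u]) \big)_+ \,\mathrm{d}\pi = 0. \tag{6.8}
\]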
If b ∈ J satisfies
0 = G(x, y; b, b) ≥ G(x, y; b, v) for every (x, y) ∈ E, v ≤ b, (6.9)
then for every function u ∈ L 1 (V , π;J ) we have
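\[
u \le b \ \ \pi\text{-a.e.} \quad \Longrightarrow \quad \lim_{h \downarrow 0} \frac1h \int_V \big( u + h\,\boldsymbol{G}[u] - b \big)_+ \,\mathrm{d}\pi = 0. \tag{6.10}
\]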
Proof (1) Since G is a Carathéodory function, for every measurable u and every (x, y) ∈ E
the map (x, y) → G(x, y; u(x), u(y)) is measurable. Since
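\[
\int_E |G(x, y; u(x), u(y))|\,\kappa(x, \mathrm{d}y)\pi(\mathrm{d}x) = \int_E |G(x, y; u(x), u(y))|\,\vartheta(\mathrm{d}x, \mathrm{d}y) \le M \|\kappa_V\|_\infty \Big( 1 + 2 \int_V |u|\,\mathrm{d}\pi \Big), \tag{6.11}
\]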
the first claim follows by Fubini’s Theorem [18, II, 14].
(2) Let (u n )n∈N be a sequence of functions strongly converging to u in L 1 (V , π; J ). Up to
extracting a further subsequence, it is not restrictive to assume that u n also converges to u
pointwise π-a.e. We have
\[
\big\| \boldsymbol{G}[u_n] - \boldsymbol{G}[u] \big\|_{L^1(V,\pi)} \le \int_E \big| G(x, y; u_n(x), u_n(y)) - G(x, y; u(x), u(y)) \big|\,\vartheta(\mathrm{d}x, \mathrm{d}y). \tag{6.12}
\]
Since the integrand gn in (6.12) vanishes ϑ-a.e. in E as n → ∞, by the generalized Dominated
Convergence Theorem (see for instance [27, Thm. 4, page 21]) it is sufficient to show that
there exist positive functions h n pointwise converging to h such that
\[
g_n \le h_n \ \ \vartheta\text{-a.e. in } E, \qquad \lim_{n\to\infty} \int_E h_n\,\mathrm{d}\vartheta = \int_E h\,\mathrm{d}\vartheta.
\]
We select h n (x, y):=M(2 + |u n (x)| + |u n (y)| + |u(x)| + |u(y)|) and h(x, y):=2M(1 +
|u(x)| + |u(y)|). This proves the result.
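(3) Let us set
\[
s(r) := \begin{cases} 1 & \text{if } r > 0, \\ -1 & \text{if } r \le 0, \end{cases}
\]
and observe that the left-hand side of (6.6) may be estimated from below by
\[
\big\| (u_1 - u_2) - h (\boldsymbol{G}[u_1] - \boldsymbol{G}[u_2]) \big\|_{L^1(V,\pi)} \ge \| u_1 - u_2 \|_{L^1(V,\pi)} - h \int_V s(u_1 - u_2) \big( \boldsymbol{G}[u_1] - \boldsymbol{G}[u_2] \big)\,\mathrm{d}\pi
\]
for all h > 0. Therefore, estimate (6.6) follows if we prove that
\[
\delta := \int_V s(u_1 - u_2) \big( \boldsymbol{G}[u_1] - \boldsymbol{G}[u_2] \big)\,\mathrm{d}\pi \le 2 \ell \|\kappa_V\|_\infty \| u_1 - u_2 \|_{L^1(V,\pi)}. \tag{6.13}
\]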
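Let us set
\[
\Delta_G(x, y) := G(x, y; u_1(x), u_1(y)) - G(x, y; u_2(x), u_2(y)),
\]
and
\[
\Delta_s(x, y) := s(u_1(x) - u_2(x)) - s(u_1(y) - u_2(y)). \tag{6.14}
\]
Since $\Delta_G(x, y) = -\Delta_G(y, x)$, using (6.1c) we have
\[
\delta = \int_V s(u_1 - u_2) \big( \boldsymbol{G}[u_1] - \boldsymbol{G}[u_2] \big)\,\mathrm{d}\pi = \int_E s(u_1(x) - u_2(x))\,\Delta_G(x, y)\,\vartheta(\mathrm{d}x, \mathrm{d}y) = \frac12 \int_E \Delta_s(x, y)\,\Delta_G(x, y)\,\vartheta(\mathrm{d}x, \mathrm{d}y).
\]
Setting $\Delta(x) := u_1(x) - u_2(x)$ we observe that by (6.3)
\[
\begin{aligned}
\Delta(x) > 0,\ \Delta(y) > 0 \ &\Rightarrow\ \Delta_s(x, y) = 0, \\
\Delta(x) \le 0,\ \Delta(y) \le 0 \ &\Rightarrow\ \Delta_s(x, y) = 0, \\
\Delta(x) \le 0,\ \Delta(y) > 0 \ &\Rightarrow\ \Delta_s(x, y) = -2,\quad \Delta_G(x, y) \ge -\ell \big( \Delta(y) - \Delta(x) \big), \\
\Delta(x) > 0,\ \Delta(y) \le 0 \ &\Rightarrow\ \Delta_s(x, y) = 2,\quad \Delta_G(x, y) \le \ell \big( \Delta(x) - \Delta(y) \big).
\end{aligned}
\]
We deduce that
\[
\delta \le \ell \int_E \big( |u_1(x) - u_2(x)| + |u_1(y) - u_2(y)| \big)\,\vartheta(\mathrm{d}x, \mathrm{d}y) \le 2 \ell \|\kappa_V\|_\infty \| u_1 - u_2 \|_{L^1(V,\pi)}.
\]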
(4) We will only address the proof of property (6.8), as the argument for (6.10) is completely
analogous. Suppose that u ≥ a π-a.e. Let us first observe that if u(x) = a, then from (6.7),
\[
\boldsymbol{G}[u](x) = \int_V G(x, y; a, u(y))\,\kappa(x, \mathrm{d}y) \ge 0 .
\]
u |t=0 = u 0 . (6.15b)
Lemma 6.3 (Comparison principles) Let us suppose that the map G satisfies (6.1a,b,c) with
J = R.
(1) If ū ∈ R satisfies
0 = G(x, y; ū, ū) ≤ G(x, y; ū, v) for every (x, y) ∈ E, v ≥ ū, (6.16)
then for every initial datum u 0 ≥ ū the solution u of (6.15) satisfies u t ≥ ū π-a.e. for
every t ≥ 0.
(2) If ū ∈ R satisfies
0 = G(x, y; ū, ū) ≥ G(x, y; ū, v) for every (x, y) ∈ E, v ≤ ū, (6.17)
then for every initial datum u 0 ≤ ū the solution u of (6.15) satisfies u t ≤ ū π-a.e. for
every t ≥ 0.
Proof (1) Let us first consider the case ū = 0. We define a new map Ḡ by symmetry:
Ḡ(x, y; u, v) := G(x, y; u, |v|) (6.18)
which satisfies the same structural properties (6.1a,b,c), and moreover
0 = Ḡ(x, y; 0, 0) ≤ Ḡ(x, y; 0, v) for every x, y ∈ V , v ∈ R. (6.19)
We call $\bar{\boldsymbol{G}}$ the operator induced by Ḡ, and ū the solution curve of the corresponding Cauchy
problem starting from the same (nonnegative) initial datum u 0 . If we prove that ū t ≥ 0 for
every t ≥ 0, then ū t is also the unique solution of the original Cauchy problem (6.15) induced
by G, so that we obtain the positivity of u t .
Note that (6.19) and property (6.1d) yield
Ḡ(x, y; u, v) ≥ Ḡ(x, y; u, v) − Ḡ(x, y; 0, v) ≥ ℓu for u ≤ 0 . (6.20)
We set β(r ) := r− = max(0, −r ) and Pt := {x ∈ V : ū t (x) < 0} for each t ≥ 0. Due to the
Lipschitz continuity of β, the map $t \mapsto b(t) := \int_V \beta(\bar u_t)\,\mathrm{d}\pi$ is absolutely continuous. Hence,
the chain-rule formula applies, which, together with (6.20) gives
\[
\begin{aligned}
\frac{\mathrm{d}}{\mathrm{d}t} b(t) &= -\int_{P_t} \bar{\boldsymbol{G}}[\bar u_t](x)\,\pi(\mathrm{d}x) = -\int_{P_t \times V} \bar G(x, y; \bar u_t(x), \bar u_t(y))\,\vartheta(\mathrm{d}x, \mathrm{d}y) \\
&\le \ell \int_{P_t \times V} (-\bar u_t(x))\,\vartheta(\mathrm{d}x, \mathrm{d}y) = \ell \int_E \beta(\bar u_t(x))\,\vartheta(\mathrm{d}x, \mathrm{d}y) \le \ell \|\kappa_V\|_\infty\, b(t) .
\end{aligned}
\]
We can now state our main result concerning the well-posedness of the Cauchy prob-
lem (6.15).
(5) If ℓ = 0, then the evolution is order preserving: if u, v are two solutions with initial data
u 0 , v0 then
u 0 ≤ v0 ⇒ u t ≤ vt for every t ≥ 0. (6.22)
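The order-preservation property (6.22) can be observed in a finite-state sketch. Everything below is an illustration under assumptions made here: the state space has three points, the map is the linear choice G(x, y; u, v) = v − u (skew-symmetric and dissipative with constant 0), the kernel `kappa` and measure `pi` are hypothetical and satisfy detailed balance, and an explicit Euler iteration stands in for the semigroup. With a step h such that h times each row sum of `kappa` is at most 1, each Euler step is a convex averaging, so order and mass are preserved at the discrete level.

```python
def generator(u, kappa):
    """Discrete analogue of G[u](x) = sum over y of G(u(x), u(y)) kappa[x][y], with G(u, v) = v - u."""
    n = len(u)
    return [sum((u[y] - u[x]) * kappa[x][y] for y in range(n)) for x in range(n)]

def euler_flow(u0, kappa, h, steps):
    """Explicit Euler iteration u <- u + h G[u]; a numerical stand-in for the semigroup."""
    u = list(u0)
    for _ in range(steps):
        g = generator(u, kappa)
        u = [ux + h * gx for ux, gx in zip(u, g)]
    return u

# Hypothetical three-state chain; pi[x] * kappa[x][y] is symmetric (detailed balance).
pi = [0.5, 0.3, 0.2]
kappa = [[0.0, 0.6, 0.4],
         [1.0, 0.0, 0.0],
         [1.0, 0.0, 0.0]]

def mass(u):
    """Total mass of the measure with density u with respect to pi."""
    return sum(p * x for p, x in zip(pi, u))

u0 = [0.2, 1.0, 3.0]
v0 = [0.5, 1.5, 3.0]            # u0 <= v0 componentwise
ut = euler_flow(u0, kappa, 0.05, 200)
vt = euler_flow(v0, kappa, 0.05, 200)
```

In this run the ordering u ≤ v persists along the iteration and the mass of each solution stays equal to its initial value, in line with the conservation property of the continuity equation.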
Proof Claims (1), (3), (4) follow by the abstract generation result of [50, §6.6, Theorem
6.1] applied to the operator G defined in the closed convex subset D:=L 1 (V , π; J ) of
the Banach space L 1 (V , π). For the theorem to apply, one has to check the continuity of
G : D → L 1 (V , π) (Lemma 6.2(2)), its dissipativity (6.6), and the property
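\[
\liminf_{h \downarrow 0} h^{-1} \inf_{v \in D} \big\| u + h\,\boldsymbol{G}[u] - v \big\|_{L^1(V,\pi)} = 0 \quad \text{for every } u \in D .
\]
When J = R, the inner infimum always is zero; if J is a bounded interval [a, b] then the
property above follows from the estimates of Lemma 6.2(4), since for any u ∈ D,
\[
\inf_{v \in D} \int_V |u + h\,\boldsymbol{G}[u] - v|\,\mathrm{d}\pi \le \int_V \big( a - (u + h\,\boldsymbol{G}[u]) \big)_+\,\mathrm{d}\pi + \int_V \big( u + h\,\boldsymbol{G}[u] - b \big)_+\,\mathrm{d}\pi .
\]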
Let us now consider the map F : (0, +∞)2 → R induced by the system (Ψ∗ , φ, α), first
introduced in (1.11),
\[
\mathrm{F}(u, v) := (\Psi^*)'\big( \phi'(v) - \phi'(u) \big)\,\alpha(u, v) \quad \text{for every } u, v > 0 , \tag{6.23}
\]
with the corresponding integral operator:
\[
\boldsymbol{F}[u](x) := \int_V \mathrm{F}(u(x), u(y))\,\kappa(x, \mathrm{d}y) . \tag{6.24}
\]
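As an illustration of definition (6.23), the sketch below evaluates F for one specific triple. The choices are assumptions made here for the example: the cosh-type potential Ψ∗(s) = 4(cosh(s/2) − 1), the Boltzmann entropy with φ′(u) = log u, and the geometric mean α(u, v) = √(uv). Under these assumptions F reduces to the linear expression v − u, and the skew-symmetry required in (6.1c) is visible directly. The function names are hypothetical.

```python
import math

def psi_star_prime(s: float) -> float:
    """Derivative of Psi*(s) = 4 (cosh(s/2) - 1), namely 2 sinh(s/2)."""
    return 2.0 * math.sinh(s / 2.0)

def flux_density(u: float, v: float) -> float:
    """F(u, v) = (Psi*)'(phi'(v) - phi'(u)) * alpha(u, v) for u, v > 0,
    with the assumed choices phi'(u) = log u and alpha(u, v) = sqrt(u v)."""
    return psi_star_prime(math.log(v) - math.log(u)) * math.sqrt(u * v)

pairs = [(0.2, 0.2), (0.5, 3.0), (2.0, 0.1), (10.0, 11.0)]
# Gap to the linear expression v - u.
linear_gap = max(abs(flux_density(u, v) - (v - u)) for u, v in pairs)
# Skew-symmetry F(u, v) = -F(v, u).
skew_gap = max(abs(flux_density(u, v) + flux_density(v, u)) for u, v in pairs)
```

The reduction follows from 2 sinh(½ log(v/u)) = √(v/u) − √(u/v); multiplying by √(uv) gives v − u.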
Since Ψ∗ , φ are C1 convex functions on (0, +∞) and α is locally Lipschitz in (0, +∞)2 it
is easy to check that F satisfies properties (6.1a,b,c,d) in every compact subset J ⊂ (0, +∞)
and conditions (6.7), (6.9) at every point a, b ∈ J . In order to focus on the structural properties
of the associated evolution problem, cf. (6.28) below, we will mostly confine our analysis to
the regular case, according to the following:
Assumption (F). The map F defined by (6.23) satisfies the following properties:
F admits a continuous extension to [0, ∞), (6.25)
Note that (6.25) is always satisfied if φ is differentiable at 0. Estimate (6.26) is also true
if in addition α is Lipschitz. However, as we have shown in Sect. 1.3, there are important
examples in which φ′(0) = −∞, but (6.25) and (6.26) hold nonetheless. All of the examples
given in Sect. 1.3 indeed provide families of maps F that satisfy Assumption (F).
Theorem 6.4 yields the following general result:
We now show that the solution u given by Theorem 6.5 is also a solution in the sense of
the (E , R , R ∗ ) Energy-Dissipation balance.
If F satisfies the stronger assumption (F), then the same result holds for every essentially
bounded and nonnegative initial datum. Finally, if also (F∞ ) holds, the above result is valid
for every nonnegative u 0 ∈ L 1 (V , π) with ρ0 = u 0 π ∈ D(E ).
Proof Let us first consider the case when u 0 satisfies 0 < a ≤ u 0 ≤ b < +∞ π-a.e.. Then,
the solution u = S[u 0 ] satisfies the same bounds, the map wt is uniformly bounded and
α(u t (x), u t (y)) ≥ α(a, a) > 0, so that (ρ, j ) ∈ A(0, T ). We can thus apply Theorem 5.7,
obtaining the Energy-Dissipation balance
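\[
\mathcal{E}(\rho_0) - \mathcal{E}(\rho_T) = \int_0^T \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t + \int_0^T \mathcal{D}(\rho_t)\,\mathrm{d}t, \quad \text{or equivalently } \mathcal{L}(\rho, j) = 0. \tag{6.29}
\]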
In the case 0 ≤ u 0 ≤ b we can argue by approximation, setting u a0 := max{u 0 , a},
a > 0, and considering the solution u at :=St [u a0 ] with divergence field 2 j at (dx, dy) =
−F(u at (x), u at (y))ϑ(dx, dy). Theorem 6.5(4) shows that u at → u t strongly in L 1 (V , π)
as a ↓ 0, and consequently also j aλ → j λ setwise. Hence, we can pass to the limit in (6.29)
(written for (ρ a , j a )) thanks to Proposition 4.21 and Proposition 5.3, obtaining L (ρ, j ) ≤ 0,
which is still sufficient to conclude that (ρ, j ) is a solution thanks to Remark 5.5(3).
Finally, if (F∞ ) holds, we obtain the general result by a completely analogous argument,
approximating u 0 by the sequence u b0 := min{u 0 , b} and letting b ↑ +∞.
In this section we construct solutions to the (E , R , R ∗ ) formulation via the Minimizing Move-
ment approach. The method uses only fairly general properties of W , E , and the underlying
space, and it may well have broader applicability than the measure-space setting that we
consider here (see Remark 7.8). Therefore we formulate the results in a slightly more general
setup.
We consider a topological space
For consistency with the above definition, in this section we will use the abstract notation
$\overset{\sigma}{\rightharpoonup}$ to denote setwise convergence in X = M+ (V ). Although throughout this paper we adopt
the Assumptions (Vπκ), (R ∗ α), and (E φ), in this chapter we will base the discussion only
on the following properties:
Assumption (Abs).
(1) the Dynamical-Variational Transport (DVT) cost W enjoys properties (4.86);
(2) the driving functional E enjoys the typical lower-semicontinuity and coercivity
properties underlying the variational approach to gradient flows:
E ≥ 0 and E is σ -sequentially lower semicontinuous; (7.2a)
Assumption (Abs) is implied by Assumptions (Vπκ), (R ∗ α), and (E φ). The proper-
ties (4.86) are the content of Theorem 4.26; condition (7.2a) follows from Assumption (E φ)
and Lemma 5.3; condition (7.2b) follows from the superlinearity of φ at infinity and
Prokhorov’s characterization of compactness in the space of finite measures [9, Th. 8.6.2].
The classical ‘Minimizing Movement’ scheme for metric-space gradient flows [4,17] starts
by defining approximate solutions through incremental minimization,
\[
\rho^n \in \operatorname*{argmin}_{\rho} \Big\{ \frac{1}{2\tau}\, d(\rho^{n-1}, \rho)^2 + \mathcal{E}(\rho) \Big\} .
\]
In the context of this paper the natural generalization of the expression to be minimized is
W (τ, ρ n−1 , ρ) + E (ρ). This can be understood by remarking that if R (ρ, ·) is quadratic,
then it formally generates a metric
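\[
\begin{aligned}
\frac12\, d(\mu, \nu)^2 &= \inf\Big\{ \int_0^1 \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t \,:\, \partial_t \rho_t + \operatorname{div} j_t = 0,\ \rho_0 = \mu, \text{ and } \rho_1 = \nu \Big\} \\
&= \tau \inf\Big\{ \int_0^\tau \mathcal{R}(\rho_t, j_t)\,\mathrm{d}t \,:\, \partial_t \rho_t + \operatorname{div} j_t = 0,\ \rho_0 = \mu, \text{ and } \rho_\tau = \nu \Big\} \\
&= \tau\, \mathcal{W}(\tau, \mu, \nu).
\end{aligned}
\]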
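The incremental minimization can be sketched in the simplest metric setting. The example below is a toy illustration under assumptions made here: the space is the real line with d(x, y) = |x − y|, the driving functional is the hypothetical quadratic energy E(x) = x²/2, and the minimizer of each step is located by a ternary search, which is valid only because the objective is convex. In this case one step has the closed form x_prev / (1 + τ), i.e. the implicit Euler scheme for x′ = −x.

```python
def minimizing_movement_step(x_prev, tau, energy, lo=-10.0, hi=10.0, iters=200):
    """One step: argmin over x in [lo, hi] of (x - x_prev)^2 / (2 tau) + energy(x),
    located by ternary search (valid because the objective is convex here)."""
    def objective(x):
        return (x - x_prev) ** 2 / (2.0 * tau) + energy(x)
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1) <= objective(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)

def energy(x):
    """Hypothetical driving functional on the real line."""
    return 0.5 * x * x

tau, x = 0.1, 2.0
trajectory = [x]
for _ in range(10):
    x = minimizing_movement_step(x, tau, energy)
    trajectory.append(x)
```

The discrete energies decrease along the trajectory, which is the scalar counterpart of the energy estimates that the scheme is designed to provide.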
Lemma 7.2 Under assumption (Abs), for any τ > 0 Problem 7.1 admits a solution $\{\rho_\tau^n\}_{n=1}^{N_\tau} \subset X$.
The existence of a measurable selection is guaranteed by [15, Cor. III.3, Thm. III.6].
It is natural to introduce the following extension of the notion of (Generalized) Minimizing
Movement, which is typically given in a metric setting [4,5]. For simplicity, we will continue
to use the classical terminology.
Theorem 7.4 Under Assumptions (Vπκ), (R ∗ α), and (E φ), let the lower-semicontinuity
Property (5.2) be satisfied.
Then GMM(E , W ; (0, T ), ρ ◦ ) ≠ ∅ and every ρ ∈ GMM(E , W ; (0, T ), ρ ◦ ) satisfies the
(E , R , R ∗ ) Energy-Dissipation balance (Definition 5.4).
Throughout Sects. 7.2–7.4 we will first prove an abstract version of this theorem as The-
orem 7.7 below, under Assumption (Abs). Indeed, therein we could ‘move away’ from the
context of the ‘concrete’ gradient structure for the Markov processes, and carry out our anal-
ysis in a general topological setup (cf. Remark 7.8 ahead). In Sect. 7.5 we will ‘return’ to the
problem under consideration and deduce the proof of Theorem 7.4 from Theorem 7.7.
Recalling the duality formula for the local slope (cf. [4, Lemma 3.15]) and the fact that
W (τ, ·, ·) is a proxy for $\frac{1}{2\tau} d^2(\cdot, \cdot)$, it is immediate to recognize that the generalized slope is a
surrogate of the local slope. Furthermore, as we will see, its definition is tailored
to the validity of Lemma 7.5 ahead. Heuristically, the generalized slope S (ρ) coincides with
the Fisher information D (ρ) = R ∗ (ρ, −DE (ρ)). This can be recognized, again heuristically,
by fixing a point ρ0 and considering curves ρt :=ρ0 − tdiv j , for a class of fluxes j . We then
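calculate
\[
\begin{aligned}
\mathcal{R}^*(\rho_0, -\mathrm{D}\mathcal{E}(\rho_0)) &= \sup_{j} \Big\{ -\mathrm{D}\mathcal{E}(\rho_0) \cdot j - \mathcal{R}(\rho_0, j) \Big\} \\
&= \sup_{j} \lim_{r \to 0} \frac1r \Big\{ \mathcal{E}(\rho_0) - \mathcal{E}(\rho_r) - \int_0^r \mathcal{R}(\rho_t, j)\,\mathrm{d}t \Big\} .
\end{aligned}
\]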
Lemma 7.5 For all ρ ∈ D(E ) and for every selection ρr ∈ Jr (ρ)
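\[
\mathcal{E}_{r_2}(\rho) \le \mathcal{E}_{r_1}(\rho) \le \mathcal{E}(\rho) \quad \text{for all } 0 < r_1 < r_2; \tag{7.12}
\]
\[
\rho_r \overset{\sigma}{\rightharpoonup} \rho \ \text{ as } r \downarrow 0, \qquad \mathcal{E}(\rho) = \lim_{r \downarrow 0} \mathcal{E}_r(\rho); \tag{7.13}
\]
\[
\frac{\mathrm{d}}{\mathrm{d}r} \mathcal{E}_r(\rho) \le -\mathcal{S}(\rho_r) \quad \text{for a.e. } r > 0. \tag{7.14}
\]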
In particular, for all ρ ∈ D(E )
Proof Let r > 0, ρ ∈ D(E ), and ρr ∈ Jr (ρ). It follows from (7.10) and (4.86a) that
in the same way, one checks that for all ρ ∈ X and 0 < r1 < r2 ,
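\[
\mathcal{E}_{r_2}(\rho) - \mathcal{E}_{r_1}(\rho) \overset{(7.9)}{\le} \mathcal{W}(r_2, \rho_{r_1}, \rho) + \mathcal{E}(\rho_{r_1}) - \mathcal{W}(r_1, \rho_{r_1}, \rho) - \mathcal{E}(\rho_{r_1}) \le 0,
\]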
which implies (7.12). Thus, the map r → Er (ρ) is non-increasing on (0, +∞), and hence
almost everywhere differentiable. Let us fix a point of differentiability r > 0. For h > 0 and
ρr ∈ Jr (ρ) we then have
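$$\frac{\mathscr E_{r+h}(\rho) - \mathscr E_r(\rho)}{h} = \frac1h\Big\{\inf_{v\in X}\big(\mathscr W(r+h, \rho, v) + \mathscr E(v)\big) - \mathscr W(r, \rho, \rho_r) - \mathscr E(\rho_r)\Big\} \le \frac1h \inf_{v\in X}\Big\{\mathscr W(h, \rho_r, v) + \mathscr E(v) - \mathscr E(\rho_r)\Big\},$$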
the latter inequality due to (4.86b), so that
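$$\frac{d}{dr}\mathscr E_r(\rho) \le \liminf_{h\downarrow 0}\frac1h \inf_{v\in X}\Big\{\mathscr W(h, \rho_r, v) + \mathscr E(v) - \mathscr E(\rho_r)\Big\} = -\limsup_{h\downarrow 0}\frac1h \sup_{v\in X}\Big\{-\mathscr W(h, \rho_r, v) - \mathscr E(v) + \mathscr E(\rho_r)\Big\},$$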
whence (7.14). Finally, (7.17) yields that, for any ρ ∈ D(E ) and any selection ρr ∈ Jr (ρ),
one has supr >0 W (r , ρ, ρr ) < +∞. Therefore, (4.86d) entails the first convergence in (7.13).
Furthermore, we have
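$$\mathscr E(\rho) \ge \limsup_{r\downarrow 0} \mathscr E_r(\rho) \ge \liminf_{r\downarrow 0}\big(\mathscr W(r, \rho, \rho_r) + \mathscr E(\rho_r)\big) \ge \liminf_{r\downarrow 0} \mathscr E(\rho_r) \ge \mathscr E(\rho),$$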
where the first inequality again follows from (7.17), and the last one from the σ -lower
semicontinuity of E . This implies the second statement of (7.13).
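The three properties of Lemma 7.5 can be checked by hand in a toy situation. The following Python sketch is only an illustration under strong assumptions that are not part of the paper's setting: the state space is the real line, the cost is the quadratic proxy W(r, ρ, v) = (v − ρ)²/(2r), and the energy is E(v) = v²/2. In that case the minimizer ρ_r is explicit and the generalized slope reduces to |E′|²/2. All function names are hypothetical.

```python
# Toy model (an assumption, not the paper's setting): X = R,
# W(r, rho, v) = (v - rho)^2 / (2 r) and E(v) = v^2 / 2.

def energy(v):
    return 0.5 * v * v

def cost(r, rho, v):
    return (v - rho) ** 2 / (2.0 * r)

def minimizer(r, rho):
    # argmin over v of cost(r, rho, v) + energy(v), solved in closed form
    return rho / (1.0 + r)

def yosida(r, rho):
    # E_r(rho) = inf_v { W(r, rho, v) + E(v) }
    v = minimizer(r, rho)
    return cost(r, rho, v) + energy(v)

def slope(rho):
    # generalized slope in the quadratic toy case: S(rho) = |E'(rho)|^2 / 2
    return 0.5 * rho * rho

rho = 1.7
radii = [0.01, 0.1, 0.5, 1.0, 2.0]
values = [yosida(r, rho) for r in radii]

# Analogue of (7.12): r -> E_r(rho) is non-increasing and bounded by E(rho).
monotone = all(b <= a <= energy(rho) for a, b in zip(values, values[1:]))

# Analogue of (7.14): d/dr E_r(rho) + S(rho_r) <= 0 (an equality in this toy case).
r, h = 0.5, 1e-6
derivative = (yosida(r + h, rho) - yosida(r - h, rho)) / (2.0 * h)
gap = derivative + slope(minimizer(r, rho))

print(monotone, gap)
```

In this quadratic case the inequality (7.14) holds with equality, which is consistent with the interpretation of the generalized slope as a surrogate of the local slope.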
Our next result collects the basic estimates on the discrete solutions. In order to properly state it, we need to introduce the ‘density of dissipated energy’ associated with the interpolant ρτ , namely the piecewise constant function Wτ : [0, T ] → [0, +∞) defined by
$$W_\tau(t) := \frac{\mathscr W(t_\tau^n - t_\tau^{n-1}, \rho_\tau^{n-1}, \rho_\tau^n)}{t_\tau^n - t_\tau^{n-1}}, \qquad t \in (t_\tau^{n-1}, t_\tau^n], \quad n = 1, \ldots, N_\tau,$$
so that
$$\int_{t_\tau^{j-1}}^{t_\tau^n} W_\tau(t)\,dt = \sum_{k=j}^{n} \mathscr W(t_\tau^k - t_\tau^{k-1}, \rho_\tau^{k-1}, \rho_\tau^k) \quad \text{for all } 1 \le j < n \le N_\tau. \tag{7.18}$$
and there exists a constant C > 0 such that for all τ > 0
$$\int_0^T W_\tau(t)\,dt \le C, \qquad \int_0^T \mathscr S(\widetilde\rho_\tau(t))\,dt \le C. \tag{7.21}$$
$$\overline\rho_\tau(t),\ \underline\rho_\tau(t),\ \widetilde\rho_\tau(t) \in K \quad \forall\, t \in [0, T] \text{ and } \tau > 0. \tag{7.22}$$
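Proof From (7.16) we directly deduce, for $t \in (t_\tau^{j-1}, t_\tau^j]$,
$$\mathscr W(t - t_\tau^{j-1}, \rho_\tau^{j-1}, \widetilde\rho_\tau(t)) + \int_{t_\tau^{j-1}}^{t} \mathscr S(\widetilde\rho_\tau(r))\,dr + \mathscr E(\widetilde\rho_\tau(t)) \le \mathscr E(\rho_\tau^{j-1}), \tag{7.23}$$
which implies (7.19); in particular, for $t = t_\tau^j$ one has
$$\int_{t_\tau^{j-1}}^{t_\tau^j} W_\tau(t)\,dt + \int_{t_\tau^{j-1}}^{t_\tau^j} \mathscr S(\widetilde\rho_\tau(t))\,dt + \mathscr E(\rho_\tau^j) \le \mathscr E(\rho_\tau^{j-1}). \tag{7.24}$$
The estimate (7.20) follows upon summing (7.24) over the index j. Furthermore, applying (7.8)–(7.9) one deduces for all 1 ≤ n ≤ Nτ that
$$\mathscr W(n\tau, \rho_0, \rho_\tau^n) + \mathscr E(\rho_\tau^n) \le \int_0^{t_\tau^n} W_\tau(r)\,dr + \int_0^{t_\tau^n} \mathscr S(\widetilde\rho_\tau(r))\,dr + \mathscr E(\rho_\tau^n) \le \mathscr E(\rho_0). \tag{7.25}$$
In particular, (7.21) follows, as well as $\sup_{n=0,\ldots,N_\tau} \mathscr E(\rho_\tau^n) \le C$. Then, (7.23) also yields $\sup_{t\in[0,T]} \mathscr E(\widetilde\rho_\tau(t)) \le C$.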
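The discrete energy estimate obtained by summing (7.24) can be illustrated numerically. The sketch below again uses a toy model that is an assumption and not the paper's setting (real line, quadratic cost (v − ρ)²/(2τ), energy v²/2, slope v²/2); in that case each minimizing-movement step, the variational interpolant ρ/(1 + r), and the time integral of the slope along it are all explicit. The function names are hypothetical.

```python
import math

def energy(v):
    return 0.5 * v * v

def mm_step(tau, rho):
    # Minimizer of v -> (v - rho)^2 / (2 tau) + v^2 / 2 (closed form).
    return rho / (1.0 + tau)

def run_scheme(rho0, horizon, n_steps):
    tau = horizon / n_steps
    rho, dissipated = rho0, 0.0
    for _ in range(n_steps):
        new = mm_step(tau, rho)
        # Cost of one step, i.e. the integral of the dissipation density over the step.
        cost_term = (new - rho) ** 2 / (2.0 * tau)
        # Integral over (0, tau) of S(rho / (1 + r)) with S(v) = v^2 / 2, in closed form.
        slope_term = 0.5 * rho * rho * tau / (1.0 + tau)
        dissipated += cost_term + slope_term
        rho = new
    return rho, dissipated

rho0, horizon = 2.0, 1.0
rho_end, dissipated = run_scheme(rho0, horizon, n_steps=1000)

# Telescoped version of (7.24): dissipated + E(rho_end) <= E(rho0).
balance = dissipated + energy(rho_end) - energy(rho0)
print(balance, rho_end, rho0 * math.exp(-horizon))
```

In this toy case the balance vanishes up to rounding, and the discrete solution stays close to the exact gradient-flow solution ρ0·e^{−t} of the toy problem.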
Next we show the two estimates
where for (1) we have used that W (2T − tτn , ρ ∗ , ρ0 ) ≤ W (tτm , ρ ∗ , ρ0 ) since 2T − tτn ≥ tτm .
Thus, in view of (7.25) we deduce
The main result of this section, Theorem 7.7 below, states that GMM(E , W ; (0, T ), ρ ◦ ) is
non-empty, and that any curve ρ ∈ GMM(E , W ; (0, T ), ρ ◦ ) fulfills an ‘abstract’ version
(7.31) of the (E , R , R ∗ ) Energy-Dissipation estimate (5.6), obtained by passing to the limit
in the discrete inequality (7.20).
for all [a, b] ⊂ [0, T ], where P f ([a, b]) is the set of all finite partitions of the interval [a, b].
We also introduce the relaxed generalized slope S − : D(E ) → [0, +∞] of the driving
energy functional E , namely the relaxation of the generalized slope S along sequences with
bounded energy:
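$$\mathscr S^-(\rho) := \inf\Big\{\liminf_{n\to\infty} \mathscr S(\rho_n) \,:\, \rho_n \stackrel{\sigma}{\rightharpoonup} \rho, \ \sup_{n\in\mathbb N} \mathscr E(\rho_n) < +\infty\Big\}. \tag{7.29}$$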
We are now in a position to state and prove the ‘abstract version’ of Theorem 7.4.
Theorem 7.7 Under Assumption (Abs), let ρ ◦ ∈ D(E ). Then, for every vanishing sequence
(τk )k there exist a (not relabeled) subsequence and a σ -continuous curve ρ : [0, T ] → X
such that ρ(0) = ρ ◦ , and
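$$\overline\rho_{\tau_k}(t),\ \underline\rho_{\tau_k}(t),\ \widetilde\rho_{\tau_k}(t) \stackrel{\sigma}{\rightharpoonup} \rho(t) \quad \text{for all } t \in [0, T], \tag{7.30}$$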
Remark 7.8 Theorem 7.7 could be extended to a topological space where the cost W and the
energy functional E satisfy the properties listed at the beginning of the section.
where (1) follows from (7.20) (using the lower bound on E ), and (2) is due to the fact that
t → E (ρτk (t)) is nonincreasing.
By the property (4.86e) of W , this estimate is a form of uniform continuity of ρ, and we
now use this to extend ρ. Fix t ∈ [0, T ] \ A, and choose a sequence tm ∈ A, tm → t, with
the property that ρ(tm ) σ -converges to some ρ̃. For any sequence sm ∈ A, sm → t, we then
have
$$\sup_{m} \mathscr W(|t_m - s_m|, \rho(s_m), \rho(t_m)) < +\infty,$$
and since |tm − sm | → 0, property (4.86e) implies that $\rho(s_m) \stackrel{\sigma}{\rightharpoonup} \tilde\rho$. This implies that along
any converging sequence tm ∈ A, tm → t the sequence ρ(tm ) has the same limit; therefore
there is a unique extension of ρ to [0, T ], that we again indicate by ρ. By again applying the
lower-semicontinuity property (4.86c) we find that
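$$\mathscr W(|t - s|, \tilde\rho, \rho(s)) \le \liminf_{j\to\infty} \mathscr W(|t - s|, \overline\rho_{\tau_{k_j}}(t), \overline\rho_{\tau_{k_j}}(s)) \le \mathscr E(\rho_0) \le C,$$
by the same argument as above. Taking the limit s → t, property (4.86e) and the continuity of ρ imply $\tilde\rho = \rho(t)$. Therefore $\overline\rho_{\tau_{k_j}}(t) \stackrel{\sigma}{\rightharpoonup} \rho(t)$ along each subsequence $\tau_{k_j}$, and consequently also along the whole sequence $\tau_k$.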
Estimates (7.19) & (7.20) also give at each t ∈ (0, T ]
so that, again using the compactness information provided by (7.22) and property (4.86e) of
the cost W , it is immediate to conclude (7.30).
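Step 3: Derive the energy-dissipation estimate. Finally, let us observe that
$$\liminf_{k\to\infty} \int_0^{\overline t_{\tau_k}(t)} W_{\tau_k}(r)\,dr \ge \mathbb W(\rho; [0, t]) \quad \text{for all } t \in [0, T]. \tag{7.33}$$
Indeed, for any partition $\{0 = t^0 < \ldots < t^j < \ldots < t^M = t\}$ of [0, t] we find that
$$\sum_{j=1}^{M} \mathscr W(t^j - t^{j-1}, \rho(t^{j-1}), \rho(t^j)) \stackrel{(1)}{\le} \liminf_{k\to\infty} \sum_{j=1}^{M} \mathscr W\big(\overline t_{\tau_k}(t^j) - \overline t_{\tau_k}(t^{j-1}), \overline\rho_{\tau_k}(t^{j-1}), \overline\rho_{\tau_k}(t^j)\big) = \liminf_{k\to\infty} \int_0^{\overline t_{\tau_k}(t)} W_{\tau_k}(r)\,dr,$$
with (1) due to (4.86c). Then (7.33) follows by taking the supremum over all partitions. On the other hand, by Fatou’s Lemma we find that
$$\liminf_{k\to\infty} \int_0^{\overline t_{\tau_k}(t)} \mathscr S(\widetilde\rho_{\tau_k}(r))\,dr \ge \int_0^t \mathscr S^-(\rho(r))\,dr,$$
so that (7.31) follows from taking the $\liminf_{k\to\infty}$ in (7.20) for s = 0.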
Having established the abstract compactness result of Theorem 7.7, we now apply this to the
proof of Theorem 7.4. As described above, under Assumptions (Vπκ), (R ∗ α), and (E φ)
the conditions of Theorem 7.7 are fulfilled, and Theorem 7.7 provides us with a curve ρ :
[0, T ] → M+ (V ) that is continuous with respect to setwise convergence such that
$$\mathbb W(\rho; [0, t]) + \int_0^t \mathscr S^-(\rho(r))\,dr + \mathscr E(\rho(t)) \le \mathscr E(\rho_0) \quad \text{for all } t \in [0, T]. \tag{7.34}$$
To conclude the proof of Theorem 7.4, we now show that the Energy-Dissipation inequality
(5.6) can be derived from (7.34).
We first note that Corollary 4.22 implies the existence of a flux j such that (ρ, j ) ∈ CE(0, T ) and $\mathbb W(\rho; [0, T]) = \int_0^T \mathscr R(\rho_t, j_t)\,dt$. Then from Corollary 7.11 below, we find that $\mathscr S^-(\rho(r)) \ge \mathscr D(\rho(r))$ for all r ∈ [0, T ]. Combining these results with (7.34) we find the required estimate (5.6).
It remains to prove the inequality S − ≥ D , which follows from the corresponding
inequality S ≥ D for the non-relaxed slope (Theorem 7.9) with the lower semicontinuity
of D that is assumed in Theorem 7.4. This is the topic of the next section.
Given the structure of this definition, the proof of the inequality S ≥ D naturally proceeds
by constructing an admissible curve (ρ, j ) ∈ CE(0, T ) such that ρ |t=0 = ρ and such that
the expression in braces can be related to D (ρ).
For the systems of this paper, the construction of such a curve faces three technical
difficulties: the first is that ρ needs to remain nonnegative, the second is that φ may be
unbounded at zero, and the third is that the function Dφ (u, v) in (4.53c) that defines D may
be infinite when u or v is zero (see Example 5.2).
We first prove a lower bound for the generalized slope S involving D− φ , under the basic
conditions on the (E , R , R ∗ ) system presented in Sect. 3.
where
which vanishes if α(u, v) < ε/2 or min(u, v) < ε/2 or max(u, v) ≥ 2/ε, and coincides
with α if α ≥ ε, min(u, v) ≥ ε, and max(u, v) ≤ 1/ε. Since gε is Lipschitz, it is easy
to check that Gε satisfies all the assumptions (6.1a,b,c,d) and also (6.7) for a = 0, since
0 = gε (0, 0) ≤ gε (0, v) for every v ≥ 0 and every (x, y) ∈ E.
It follows that for every nonnegative u 0 ∈ L 1 (X , π) there exists a unique nonnegative
solution u ε ∈ C1 ([0, ∞); L 1 (V , π)) of the Cauchy problem (6.15) induced by Gε with initial
datum u 0 and the same total mass. Henceforth, we set ρtε = u εt π for all t ≥ 0.
Setting 2 j εt (dx, dy):=wtε (x, y)ϑ(dx, dy), where wtε (x, y):=Gε (x, y; u t (x), u t (y)), it is
also easy to check that (ρ ε , j ε ) ∈ A(0, T ), since gε (u, v) ≤ α(u, v) and
|wtε (x, y)| ≤ |ξ |α(u εt (x), u εt (y))χ Uε (t) (x, y) for (x, y) ∈ E ,
and consequently
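$$\mathscr S(\rho_0) \ge \limsup_{\tau\downarrow 0}\, \tau^{-1}\Big(\mathscr E(\rho_0) - \mathscr E(\rho^\varepsilon_\tau) - \int_0^\tau \mathscr R(\rho^\varepsilon_t, j^\varepsilon_t)\,dt\Big) = \frac12 \int_E \Big[\mathrm B_\phi\big(u_0(x), u_0(y), w^\varepsilon_0(x, y)\big) - \Upsilon\big(u_0(x), u_0(y), w^\varepsilon_0(x, y)\big)\Big]\,\vartheta(dx, dy). \tag{7.38}$$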
Let us now set k to be the truncation of φ (u 0 (x)) − φ (u 0 (y)) to [−k, k], i.e.
* +
k (x, y):= max −k, min k, φ (u 0 (x)) − φ (u 0 (y)) ,
and ξk (x, y):=( ∗ ) (k (x, y)) for each k ∈ N. Notice that ξk is a bounded measurable
skew-symmetric map satisfying |ξk (x, y)| ≤ k for every (x, y) ∈ E and k ∈ N. Therefore,
inequality (7.38) holds for w0ε (x, y) = ξk (x, y) gε (u 0 (x), u 0 (y)), (x, y) ∈ E. We then
observe from Lemma 4.19(3) that
(φ (u 0 (x)) − φ (u 0 (y))) · ξk (x, y) ≥ k (x, y)ξk (x, y)
(7.39)
= (ξk (x, y)) + ∗ (k (x, y)) ,
In the next proposition we finally bound S from below by the Fisher information, by
relying on the existence of a solution to the (E , R , R ∗ ) system, as shown in Sect. 6.
Proposition 7.10 Let us suppose that for ρ ∈ D(E ) there exists a solution to the (E , R , R ∗ )
system. Then the generalized slope bounds the Fisher information from above:
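Therefore
$$\mathscr S(\rho_0) \ge \liminf_{t\downarrow 0} \frac1t\Big(\mathscr E(\rho_0) - \mathscr E(\rho_t) - \mathscr W(t, \rho_0, \rho_t)\Big) \ge \liminf_{t\downarrow 0} \frac1t\Big(\mathscr E(\rho_0) - \mathscr E(\rho_t) - \int_0^t \mathscr R(\rho_r, j_r)\,dr\Big) = \liminf_{t\downarrow 0} \frac1t \int_0^t \mathscr D(\rho_r)\,dr.$$
Since $u_t \to u_0$ in $L^1(V; \pi)$ as t → 0 and since $\mathscr D$ is lower semicontinuous with respect to $L^1(V, \pi)$-convergence (see the proof of Proposition 5.3), with a change of variables we find
$$\mathscr S(\rho_0) \ge \liminf_{t\downarrow 0} \int_0^1 \mathscr D(\rho_{ts})\,ds \ge \mathscr D(\rho_0).$$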
We then easily get the desired lower bound for S − in terms of D , under the condition that
the latter functional is lower semicontinuous (recall that Proposition 5.3 provides sufficient
conditions for the lower semicontinuity of D ):
Corollary 7.11 Let us suppose that Assumptions (Vπκ), (R ∗ α), (E φ) hold and that D is
lower semicontinuous with respect to setwise convergence. Then
As discussed in Example 4.18, the cosh and the quadratic case provide examples in which
the Fisher information functional D is lower semicontinuous. When π is purely atomic, then
D is lower semicontinuous for all the examples mentioned in Sect. 1.3. In the case when π is
not purely atomic, the lower semicontinuity of D is related to the convexity of the function Dφ
(4.53). We show in Appendix E that all the power means in (1.34) for p ∈ [−∞, −1] ∪ [0, 1), together with Ψ∗ in (1.35), do lead to Dφ ’s that are convex and lower semicontinuous, and ultimately to the lower semicontinuity of D .
Remark 7.12 The combination of Theorem 7.9, Proposition 7.10, and Corollary 7.11 illustrates why we introduced both $\mathrm D_\phi$ and $\mathrm D^-_\phi$. For the duration of this remark, consider both
In the two guiding cases of Example 4.18, $\mathrm D_\phi$ is convex and lower semicontinuous, but $\mathrm D^-_\phi$ is only lower semicontinuous. As a result, $\mathscr D$ is lower semicontinuous with respect to setwise convergence, but $\mathscr D^-$ is not (indeed, consider e.g. a sequence $\rho_n$ converging setwise to ρ, with $d\rho_n/d\pi$ given by characteristic functions of some sets $A_n$, where the sets $A_n$ are chosen such that for the limit the density $d\rho/d\pi$ is strictly positive and non-constant; then $\mathscr D^-(\rho_n) = 0$ for all n while $\mathscr D^-(\rho) > 0$). Setwise lower semicontinuity of $\mathscr D$ is important for two reasons: first, this is required for stability of solutions of the Energy-Dissipation balance under convergence in some parameter (evolutionary Γ-convergence), which is a hallmark of a good variational formulation; and secondly, the proof of existence using the Minimizing-Movement approach requires the bound (7.43), for which $\mathscr D$ also needs to be lower semicontinuous. This explains the importance of $\mathrm D_\phi$, and it also explains why we defined the Fisher information $\mathscr D$ in terms of $\mathrm D_\phi$ and not in terms of $\mathrm D^-_\phi$.
On the other hand, $\mathrm D^-_\phi$ is straightforward to determine, and in addition the weaker control of $\mathrm D^-_\phi$ is still sufficient for the chain rule: it is $\mathrm D^-_\phi$ that appears on the right-hand side of (4.59). Note that if $\mathrm D^-_\phi$ itself is convex, then it coincides with $\mathrm D_\phi$.
Acknowledgements M.A.P. acknowledges support from NWO Grant 613.001.552, “Large Deviations and
Gradient Flows: Beyond Equilibrium". R.R. and G.S. acknowledge support from the MIUR - PRIN project
2017TEXA3H “Gradient flows, Optimal Transport and Metric Measure Structures". G.S. also acknowledges
the support of the Institute of Advanced Study of the Technical University of Munich, of IMATI-CNR, Pavia,
and of the Department of Mathematics of the University of Pavia, where this project was partially carried
out. O.T. acknowledges support from NWO Vidi grant 016.Vidi.189.102, “Dynamical-Variational Transport
Costs and Application to Variational Evolutions". Finally, the authors thank Jasper Hoeksema for insightful
and valuable comments during the preparation of this manuscript.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which
permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give
appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence,
and indicate if changes were made. The images or other third party material in this article are included in the
article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is
not included in the article’s Creative Commons licence and your intended use is not permitted by statutory
regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
arise in the context of large deviations. In this section we describe this context. Throughout this
section we work under Assumptions (Vπκ), (E φ), and (R ∗ α), and since we are interested
in the choices above, we will also assume (A.1), implying that
$$\nu_\rho(dx\,dy) = \sqrt{u(x)u(y)}\;\pi(dx)\kappa(x, dy), \quad \text{if } \rho = u\pi \ll \pi.$$
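$$\rho^n : [0, T] \to \mathcal M^+(V), \qquad \rho^n_t := \frac1n \sum_{i=1}^{n} \delta_{X^i_t},$$
$$j^n \in \mathcal M^+((0, T) \times E), \qquad j^n(dt\,dx\,dy) := \frac1n \sum_{i=1}^{n} \sum_{k=1}^{\infty} \delta_{t^i_k}(dt)\,\delta_{(X^i_{t-}, X^i_t)}(dx\,dy),$$
where $t^i_k$ is the $k$th jump time of $X^i$, and $X^i_{t-}$ is the left limit (pre-jump state) of $X^i$ at time t. Equivalently, $j^n$ is defined by
$$\langle j^n, \varphi\rangle := \frac1n \sum_{i=1}^{n} \sum_{k=1}^{\infty} \varphi\big(t^i_k, X^i_{t^i_k-}, X^i_{t^i_k}\big), \quad \text{for } \varphi \in C_b([0, T] \times E).$$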
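The empirical measure and empirical flux defined above can be computed directly from simulated paths. The following Python sketch assumes a finite state space and a hypothetical table of jump rates `rates[x][y]` playing the role of κ(x, {y}); these choices, the function names, and the Gillespie-type sampling are illustrative assumptions only. The sketch records the time-integrated flux on each edge and checks the discrete continuity equation: the change of mass at x equals inflow minus outflow.

```python
import random
from collections import Counter

def simulate_particle(x0, rates, horizon, rng):
    """Return the jumps (time, pre-jump state, post-jump state) and the final state."""
    t, x, jumps = 0.0, x0, []
    while True:
        out = rates[x]                      # dict: target state -> jump rate
        total = sum(out.values())
        if total == 0.0:
            break
        t += rng.expovariate(total)         # exponential waiting time
        if t > horizon:
            break
        targets = list(out)
        y = rng.choices(targets, weights=[out[z] for z in targets])[0]
        jumps.append((t, x, y))
        x = y
    return jumps, x

def empirical_pair(initial_states, rates, horizon, seed=0):
    rng = random.Random(seed)
    n = len(initial_states)
    rho0, rho_end, flux = Counter(), Counter(), Counter()
    for x0 in initial_states:
        jumps, x_end = simulate_particle(x0, rates, horizon, rng)
        rho0[x0] += 1.0 / n
        rho_end[x_end] += 1.0 / n
        for _, x, y in jumps:
            flux[(x, y)] += 1.0 / n         # time-integrated flux on the edge (x, y)
    return rho0, rho_end, flux

states = [0, 1, 2]
rates = {0: {1: 1.0, 2: 0.5}, 1: {0: 2.0}, 2: {0: 1.0, 1: 1.0}}
initial = [0] * 200 + [1] * 100
rho0, rho_end, flux = empirical_pair(initial, rates, horizon=2.0)

# Continuity equation, integrated in time: rho_end(x) - rho0(x) = inflow(x) - outflow(x).
residual = max(
    abs(rho_end[x] - rho0[x]
        - sum(flux[(y, x)] for y in states)
        + sum(flux[(x, y)] for y in states))
    for x in states
)
print(residual)
```

The residual vanishes up to rounding for every realization, since each recorded jump removes mass 1/n from its pre-jump state and adds it to its post-jump state.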
The rate function I0 describes the large deviations of the initial datum ρ0n ; this functional
is determined by the choices of the initial data of X 0i and is independent of the stochastic
process itself, and we therefore disregard it here.
The functional I characterizes the large-deviation properties of the dynamics of the pair
(ρ n , j n ) conditional on the initial state, and has the expression
$$I(\rho, j) = \int_0^T \mathscr F_\eta\big(j_t \,\big|\, \vartheta^-_{\rho_t}\big)\,dt. \tag{A.2}$$
Remark A.1 Sanov’s theorem can be found in many references on large deviations (e.g. [24,
Sec. 6.2]); the derivation of the expression (A.2) is fairly well known and can be found in
e.g. [56, Eq. (8)] or [43, App. A]. Instead of proving (A.2) we give an interpretation of the
expression (A.2) and the function η in terms of exponential clocks. An exponential clock with
rate parameter r has large-deviation behaviour given by r η(·/r ) (see [24, Exercise 5.2.12] or
[57, Th. 1.5]) in the following sense: for each t > 0,
$$\mathrm{Prob}\big(\approx \beta n t \text{ firings in time } nt\big) \sim \exp\big(-n t\, r\, \eta(\beta/r)\big) \quad \text{as } n \to \infty.$$
The expression (A.2) generalizes this to a field of exponential clocks, one for each
edge (x, y). In this case, the rescaled rate parameter r for the clock at edge (x, y) is equal to
ρt (dx)κ(x, dy), since it is proportional to the number of particles nρt (dx) at x and to the rate
of jump κ(x, dy) from x to y. The flux n j t (dx dy) is the observed number of jumps from x
to y, corresponding to firings of the clock associated with the edge (x, y). In this way, the
functional I in (A.2) can be interpreted as characterizing the large-deviation fluctuations in
the clock-firings for each edge (x, y) ∈ E.
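The exponential-clock asymptotics in Remark A.1 can be checked numerically. The sketch below assumes η(s) = s log s − s + 1, which is the function whose perspective is the η̂ written out in the proof of Formal Lemma A.2, and uses the standard fact that the number of firings of a rate-r exponential clock in time nt is Poisson distributed with mean rnt. The function names are hypothetical.

```python
import math

def eta(s):
    # Assumed form eta(s) = s log s - s + 1, extended by eta(0) = 1.
    return 1.0 if s == 0 else s * math.log(s) - s + 1.0

def poisson_log_pmf(k, lam):
    # log of P(N = k) for N ~ Poisson(lam)
    return k * math.log(lam) - lam - math.lgamma(k + 1)

def empirical_rate(n, t, r, beta):
    # -(1 / (n t)) * log P(exactly round(beta n t) firings in time n t)
    k = round(beta * n * t)
    return -poisson_log_pmf(k, r * n * t) / (n * t)

r, beta, t = 1.0, 2.0, 1.0
predicted = r * eta(beta / r)
for n in (10**2, 10**4, 10**6):
    print(n, empirical_rate(n, t, r, beta), predicted)
```

The difference between the two columns decays like log(n)/n, which is the usual sub-exponential correction coming from Stirling's formula.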
The expression (A.2) leads to the functional L in (1.18) after a symmetry reduction,
which we now describe (see also [43, App. A]). Assuming that we are more interested in the
fluctuation properties of ρ than those of j , we might decide to minimize I (ρ, j ) over a class
of fluxes j for a fixed choice of ρ. Here we choose to minimize over the class of fluxes with
the same skew-symmetric part,
$$A_j := \big\{ j' \in \mathcal M([0, T] \times E) : j' - s_\# j' = j - s_\# j \big\}.$$
By the form (1.23) of the continuity equation and the definition (1.6b) of the divergence we have div j′ = div j for all j′ ∈ A j , so that replacing j by j′ preserves the continuity equation.
Formal Lemma A.2 The minimum of I (ρ, j′) over all j′ ∈ A j is achieved for the ‘skew-symmetrization’ $j^\flat = \frac12(j - s_\# j)$, and for $j^\flat$ the result equals $\frac12 \mathscr L$:
This implies that the two functionals I and L can be considered to be the same, if one
is only interested in ρ, not in j . By the Contraction Principle (e.g. [24, Sec. 4.2.1]) the
functional ρ → inf j I (ρ, j ) = inf j 21 L (ρ, j ) also can be viewed as the large-deviation rate
function of the sequence of empirical measures ρ n .
The above lemma is only formal because we have not given a rigorous definition of the
functional L . While it would be possible to do so, using the construction of Lemma 2.3 and
the arguments of the proof below, actually the rest of this paper deals with this question in
a more detailed manner. In addition, in the context of this paper, this lemma only serves to
explain why we consider this specific class of functionals L . Therefore here we only give
heuristic arguments.
Proof We assume throughout this (formal) proof that all measures are absolutely continuous, strictly positive, and finite where necessary. Note that writing $\rho_t = u_t\pi$ we have $\vartheta^-_{\rho_t}(dx\,dy) = u_t(x)\vartheta(dx\,dy)$, and using (A.1) we therefore have
$$\sqrt{\vartheta^-_{\rho_t}\, s_\#\vartheta^-_{\rho_t}}\,(dx\,dy) = \sqrt{u_t(x)u_t(y)}\;\vartheta(dx\,dy) = \nu_{\rho_t}(dx\,dy), \quad \text{and}$$
$$\log\frac{d s_\#\vartheta^-_{\rho_t}}{d\vartheta^-_{\rho_t}}(x, y) = \log\frac{u_t(y)}{u_t(x)} = \nabla\phi'(u_t)(x, y).$$
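For the length of this proof we write η̂ for the perspective function corresponding to η (see (2.14) in Lemma 2.3)
$$\hat\eta(a, b) := \begin{cases} a\log\dfrac{a}{b} - a + b & \text{if } a, b > 0,\\[4pt] 0 & \text{if } a = 0,\\ +\infty & \text{if } a > 0,\ b = 0. \end{cases}$$
$$\psi : \mathbb R \times [0, +\infty)^2 \to [0, +\infty], \qquad \psi(s\,; c, d) := \inf_{a, b \ge 0}\big\{\hat\eta(a, c) + \hat\eta(b, d) : a - b = 2s\big\},$$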
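In the last identity we used the fact that since $\mathrm{div}\, j_t = -\partial_t\rho_t$, formally we have
$$\frac12\int_0^T\!\!\int_E w_t\,\nabla\phi'(u_t)\,d\vartheta\,dt = \int_0^T\!\!\int_E \nabla\phi'(u_t)\,dj_t\,dt = \int_0^T \langle\phi'(u_t), \partial_t\rho_t\rangle\,dt = \mathscr E(\rho_T) - \mathscr E(\rho_0).$$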
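The one-dimensional minimization that defines ψ earlier in this proof can be explored numerically. The sketch below restricts to c, d > 0 and to interior points a, b > 0, where η̂(a, b) = a log(a/b) − a + b. The stationarity condition log(a/c) + log(b/d) = 0 together with a − b = 2s gives the candidate minimizer a = s + √(s² + cd); this formula is our own first-order computation, not a statement taken from the paper, and the code compares it with a direct ternary search on the convex objective. Function names are hypothetical.

```python
import math

def eta_hat(a, b):
    # Perspective function of eta, used here only on the open quadrant a, b > 0.
    return a * math.log(a / b) - a + b

def objective(a, s, c, d):
    # eta_hat(a, c) + eta_hat(b, d) under the constraint a - b = 2 s
    return eta_hat(a, c) + eta_hat(a - 2.0 * s, d)

def psi_numeric(s, c, d, iterations=200):
    # Ternary search on the convex function a -> objective(a, s, c, d).
    lo = max(0.0, 2.0 * s) + 1e-12
    hi = 2.0 * (abs(s) + math.sqrt(s * s + c * d)) + 1.0
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if objective(m1, s, c, d) <= objective(m2, s, c, d):
            hi = m2
        else:
            lo = m1
    return objective(0.5 * (lo + hi), s, c, d)

def psi_stationary(s, c, d):
    # Value at the stationary point a = s + sqrt(s^2 + c d), where a * b = c * d.
    a = s + math.sqrt(s * s + c * d)
    return objective(a, s, c, d)

cases = [(0.3, 1.0, 2.0), (-1.2, 0.5, 0.7), (0.0, 1.0, 1.0)]
errors = [abs(psi_numeric(*case) - psi_stationary(*case)) for case in cases]
print(errors)
```

At the stationary point the forward and backward fluxes satisfy ab = cd, which is one way to see why the geometric mean of the two jump-rate measures appears after the symmetry reduction.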
The expression on the right-hand side of (A.4) is one half times the functional L defined
in (1.18) (see also (1.21)). This proves that
1
inf I (ρ, j ) = L ρ, j .
j ∈A j 2
From convexity of and symmetry of νρ we deduce that L (ρ, j ) ≤ L (ρ, j ) for any j ;
see Remark 4.12. The identity L ρ, j = inf j ∈A j L (ρ, j ) then follows immediately;
this proves (A.3).
To prove the second part of the Lemma, we write
* + * +
inf I (ρ, j ) : ∂t ρ + div j = 0 = inf inf I (ρ, j ) : ∂t ρ + div j = 0 ,
j j j ∈A j
* +
= inf inf 2 L (ρ,
1
j ) : ∂t ρ + div j = 0 ,
j j ∈A j
* +
= inf 2 L (ρ,
1
j ) : ∂t ρ + div j = 0 ,
j
* +
= inf 21 L (ρ, j ) : ∂t ρ + div j = 0 .
j
In this Section we complete the analysis of the continuity equation by carrying out the proofs
of Lemma 4.4 and Corollary 4.3.
Proof of Lemma 4.4 The distributional identity (4.6) yields that for every ζ ∈ Cb (V , τ ) the
map
$$t \mapsto \rho_t(\zeta) := \int_V \zeta(x)\,\rho_t(dx) \quad \text{belongs to } W^{1,1}(a, b),$$
which implies
dt (V ) ≤ 2| j t |(E).
Hence, the set L ζ of the Lebesgue points of t → ρt (ζ ) has full Lebesgue measure.
Choosing ζ ≡ 1 one immediately recognizes that ρt (V ) is (essentially) constant: it is not
is a distance inducing the weak topology in $\mathcal M^+(V)$ (see e.g. [4, § 5.1.1]). By introducing the set $L_Z := \bigcap_{\zeta\in Z} L_\zeta$, it follows from (B.2) that
$$d(\rho_s, \rho_t) \le 2\int_s^t |j_r|(E)\,dr \tag{B.3}$$
which shows that the measures (ρt )t∈L Z are uniformly continuous with respect to the total
variation metric in M+ (V ) and thus can be extended to an absolutely continuous curve
ρ̃ ∈ AC(I ; M+ (V )) satisfying (B.5) for every s, t ∈ I .
When ϕ ∈ Cb (V ), (4.4) immediately follows from (B.1). By a standard argument based
on the functional monotone class Theorem [9, §2.12] we can extend the validity of (4.4) to
every bounded Borel function.
If $\varphi \in C^1([a, b]; B_b(V))$, combining (B.1) and the fact that the map $t \mapsto \int_V \varphi(t, x)\,\tilde\rho_t(dx)$ is absolutely continuous we easily get (4.8).
Proof of Corollary 4.3 Keeping the same notation of the previous proof, if we define
T
γ :=ρ0 + dt dt
0
The main result of this Section is Lemma C.3 ahead, invoked in the proof of Proposition 4.21. It provides the construction of a smooth function estimating the entropy density φ from below and such that the function $(r, s) \mapsto \Psi^*(A_\omega(r, s))\,\alpha(r, s)$ fulfills a suitable bound, cf. (C.10) ahead. Prior to that, we prove the preliminary Lemmas C.1 and C.2 below.
Lemma C.1 Let us suppose that α satisfies Assumptions (R ∗ α). Then for every a ≥ 0
$$\lim_{r\to+\infty} \frac{\alpha(r, a)}{r} = \lim_{r\to+\infty} \frac{\alpha(a, r)}{r} = 0. \tag{C.1}$$
Proof Since α is symmetric it is sufficient to prove the first limit. Let us first observe that the concavity of α yields the existence of the limit since the map $r \mapsto r^{-1}(\alpha(r, a) - \alpha(0, a))$ is decreasing, so that
$$\lim_{r\to+\infty} \frac{\alpha(r, a)}{r} = \lim_{r\to+\infty} \frac{\alpha(r, a) - \alpha(0, a)}{r} = \inf_{r>0} \frac{\alpha(r, a) - \alpha(0, a)}{r}.$$
Let us call $L(a) \in \mathbb R_+$ the above quantity. The inequality (following by the concavity of α and the fact that α(0, 0) ≥ 0)
$$\alpha(r, a) \le \lambda\,\alpha(r/\lambda, a/\lambda) \quad \text{for every } \lambda \ge 1 \tag{C.2}$$
yields
$$L(a) = \lim_{r\to+\infty} \frac{\alpha(r, a)}{r} \le \lim_{r\to+\infty} \frac{\alpha(r/\lambda, a/\lambda)}{r/\lambda} = L(a/\lambda) \quad \text{for every } \lambda \ge 1. \tag{C.3}$$
For every b ∈ (0, a) and r > 0, setting λ := a/b > 1, we thus obtain
$$L(a) \le L(b) \le \frac{\alpha(r, b) - \alpha(0, b)}{r}.$$
Passing first to the limit as b ↓ 0 and using the continuity of α we get
$$L(a) \le \frac{\alpha(r, 0) - \alpha(0, 0)}{r} \quad \text{for every } r > 0.$$
Eventually, we pass to the limit as r ↑ +∞ and we get $L(a) \le \alpha_\infty(1, 0) = 0$ thanks to (3.13).
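The sublinear growth asserted in Lemma C.1 is easy to observe on concrete means. The Python sketch below evaluates α(r, a)/r for the power means with p ∈ {−∞, −1, 0} (minimum, harmonic and geometric mean), for which the ratio tends to zero, and contrasts them with the arithmetic mean p = 1, for which the ratio tends to 1/2, so that the conclusion of the lemma cannot hold. The helper names are hypothetical, and the choice of test means is ours.

```python
def power_mean(u, v, p):
    """Power mean m_p(u, v) of two nonnegative numbers."""
    if p == float("-inf"):
        return min(u, v)
    if p == 0:
        return (u * v) ** 0.5
    if p < 0 and (u == 0 or v == 0):
        return 0.0                      # limiting value for negative exponents
    return ((u ** p + v ** p) / 2.0) ** (1.0 / p)

def ratio(r, a, p):
    # The quantity alpha(r, a) / r whose limit as r -> infinity appears in (C.1).
    return power_mean(r, a, p) / r

a = 2.0
for p in (float("-inf"), -1, 0, 1):
    print(p, [ratio(10.0 ** k, a, p) for k in (2, 4, 8)])
```

For the geometric mean the decay is of order r^{−1/2}, for the harmonic mean and the minimum it is of order r^{−1}.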
Since limr →+∞ f (r ) = +∞, the minimizing set in (C.7) is closed and not empty, so that
the algorithm is well defined. It yields a sequence xn satisfying
so that $(x_n)_{n\in\mathbb N}$ is strictly increasing and unbounded, and induces a partition $\{0 = x_0 < x_1 < x_2 < \cdots < x_n < \cdots\}$ of $\mathbb R_+$. We can thus consider the piecewise linear function
g : R+ → R+ such that
g(xn ):=nδ, g((1 − t)xn + t xn+1 ):=(n + t)δ for every n ∈ N, t ∈ [0, 1]. (C.9)
Since α is concave, the function x → α(x, 1)/x is nonincreasing in (0, +∞); we can thus
define the nondecreasing function Q(x):=P(x/α(x, 1)) and the function
x
1
γ (x):=2g0 + min(β (y), Q (y)) dy for every x ≥ 1, g0 := min(β0 , Q(1)) > 0.
1 2
By construction γ (1) = 2g0 = min(β0 , Q(1)) ≤ β (1) so that γ (x) ≤ min(β (x), Q(x)) for
every x ≥ 1. We eventually set
et
f (t):= t ≥ 0.
γ (et )
Clearly, we have f (0) = 2g0 . Furthermore, we combine the estimate γ (et ) ≤ Q(et ) =
P(et /α(et , 1)) with the facts that et /α(et , 1) → +∞ as t → +∞, thanks to Lemma C.1,
and that P has sublinear growth at infinity, being the inverse function of ∗ . All in all, we
conclude that
Therefore, we are in a position to apply Lemma C.2, obtaining an increasing concave function g : R+ → R+ such that g0 = g(0) ≤ g(t) ≤ f (t) and limt→+∞ g(t) = +∞. Since g(0) ≥ 0, the concavity of g yields $g(t'') - g(t') \le g(t'' - t')$ for every $0 \le t' \le t''$, so that the function h(x) := g(log(x ∨ 1)) satisfies h(x) = g0 ≤ β (x) for x ∈ [0, 1], and
$$h(z) \le \min(\beta(z), Q(z)) \ \text{ for every } z \ge 1, \qquad h(y) - h(x) \le h(y/x) \ \text{ for every } 0 < x \le y. \tag{C.11}$$
In fact, if x ≤ 1 we get
and if x ≥ 1 we get
Preliminarily, with the reference measure $\pi \in \mathcal M^+(V)$ and with the ‘jump equilibrium rate’ ϑ from (3.5) we associate the ‘graph divergence’ operator $\mathrm{div}_{\pi,\vartheta} : L^p(E; \vartheta) \to L^p(V; \pi)$, p ∈ [1, +∞], defined as the transpose of the ‘graph gradient’ $\nabla : L^q(V; \pi) \to L^q(E; \vartheta)$, with $q = p'$. Namely
or, equivalently,
ξ π = −div(ζ ϑ) (D.1)
(with div the divergence operator from (1.6)) in the sense of measures.
We can now first address the connectivity problem in the very specific setup
Now, by a general duality result on linear operators, the operator $-\mathrm{div}_{\pi,\vartheta} : L^p(E; \vartheta) \to L^p(V; \pi)$ is surjective if and only if the graph gradient $\nabla : L^q(V; \pi) \to L^q(E; \vartheta)$ fulfills the following property:
$$\exists\, C > 0 \ \ \forall\, \xi \in L^q(V; \pi) \text{ with } \int_V \xi(x)\,\pi(dx) = 0 \text{ there holds } \|\xi\|_{L^q(V;\pi)} \le C\,\|\nabla\xi\|_{L^q(E;\vartheta)},$$
namely the q-Poincaré inequality (4.84). We can thus conclude the following result.
Lemma D.1 Suppose that α ≡ 1, that Ψ has p-growth (cf. (4.85)), and that the measures (π, ϑ) satisfy a q-Poincaré inequality for $q = \frac{p}{p-1}$. Let $\rho_0, \rho_1 \in \mathcal M^+(V)$ be given by $\rho_i = u_i\pi$, with positive $u_i \in L^p(V; \pi)$, for i ∈ {0, 1}. Then, for every τ ∈ (0, 1) we have $\mathscr W(\tau, \rho_0, \rho_1) < +\infty$. If $\Psi(r) = \frac1p|r|^p$, the q-Poincaré inequality is also necessary for having $\mathscr W(\tau, \rho_0, \rho_1) < +\infty$.
Proof of Proposition 4.25 Assume that $\rho_0(V) = \int_V u_0(x)\,\pi(dx) = \pi(V)$. Hence, it is sufficient to provide a solution for the connectivity problem between $u_0$ and $u_1 \equiv 1$. We may also assume without loss of generality that $\alpha(u, v) \ge \alpha_0(u, v)$ with $\alpha_0(u, v) = c_0\min(u, v, 1)$ for some $c_0 > 0$, so that
$$\Psi\Big(\frac{w}{\alpha(u, v)}\Big)\alpha(u, v) \le \Psi\Big(\frac{w}{\alpha_0(u, v)}\Big)\alpha_0(u, v) \le C_p\Big(1 + \Big|\frac{w}{\alpha_0(u, v)}\Big|^p\Big)\alpha_0(u, v) \le C_p c_0 + C_p|w|^p(\alpha_0(u, v))^{1-p}, \tag{D.5}$$
where the first estimate follows from the convexity of Ψ and the fact that Ψ(0) = 0, yielding that $\lambda \mapsto \lambda\Psi(w/\lambda)$ is non-increasing. It is therefore sufficient to consider the case in which $c_0 = C_p = 1$, $\alpha_0(u, v) = \min(u, v, 1)$, and to solve the connectivity problem for $\tilde\Psi(r) = \frac1p|r|^p$. By Lemma D.1, we may first find $w \in L^p(E; \vartheta)$ solving the minimum problem (D.4) in the case α ≡ 1, so that the flux density $\zeta_t \equiv \frac12 w$ is associated with the curve $u_t = (1-t)u_0 + tu_1$, t ∈ [0, τ ]. Then, we fix an exponent γ > 0 and we consider the rescaled curve $\tilde u_t := u_{t^\gamma}$, that fulfills $\partial_t\tilde u_t = -\mathrm{div}_{\pi,\vartheta}(\tilde\zeta_t)$ with $\tilde\zeta_t = \frac12\tilde w_t = \frac12\gamma t^{\gamma-1}w$. Moreover,
$$\int_E \Psi\Big(\frac{\tilde w_t(x, y)}{\alpha(\tilde u_t(x), \tilde u_t(y))}\Big)\alpha(\tilde u_t(x), \tilde u_t(y))\,\vartheta(dx, dy) \le C_p c_0\,\vartheta(E) + \int_E \gamma^p t^{p(\gamma-1)}|w(x, y)|^p\, t^{\gamma(1-p)}\,\vartheta(dx, dy) = C_p c_0\,\vartheta(E) + \gamma^p t^{\gamma-p}\,\|w\|^p_{L^p(E;\vartheta)}.$$
hence $\mathcal A(0, \tau; \rho_0, \rho_1) \ne \emptyset$.
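The role of the exponent γ in the time rescaling can be made explicit by integrating the last bound in time. The following short computation is our own reading of the argument, under the assumption that the bound is integrated over (0, τ); the threshold γ > p − 1 is an inference and is not quoted from the paper.

```latex
% Time integral of the t-dependent factor in the bound above (a sketch).
\[
  \int_0^\tau \gamma^p\, t^{\gamma-p}\,\mathrm{d}t
  \;=\;
  \begin{cases}
    \dfrac{\gamma^p\,\tau^{\gamma-p+1}}{\gamma-p+1} & \text{if } \gamma > p-1,\\[8pt]
    +\infty & \text{if } \gamma \le p-1,
  \end{cases}
\]
% so the rescaled pair has finite action as soon as gamma > p - 1:
\[
  \int_0^\tau\!\!\int_E
    \Psi\Big(\frac{\tilde w_t}{\alpha(\tilde u_t(x),\tilde u_t(y))}\Big)
    \alpha(\tilde u_t(x),\tilde u_t(y))\,\vartheta(\mathrm{d}x,\mathrm{d}y)\,\mathrm{d}t
  \;\le\;
  C_p c_0\,\vartheta(E)\,\tau
  + \frac{\gamma^p\,\tau^{\gamma-p+1}}{\gamma-p+1}\,\|w\|_{L^p(E;\vartheta)}^p .
\]
```

The exponent γ − p comes from adding p(γ − 1), produced by the time derivative of the rescaling, and γ(1 − p), produced by the lower bound on the mobility along the rescaled curve.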
Here, we prove the claim that $\mathrm D_\phi$ is convex when φ is the Boltzmann entropy (1.5) and the pair (α, Ψ∗) is induced by the power means $m_p$ (see Example 1 in Sect. 1.3), for p ∈ [−∞, −1] ∪ [0, 1). More precisely, we consider the case $\alpha(u, v) := m_p(u, v) = v\,f_p(u/v)$, p ∈ [−∞, 1), where $f_p$ is the concave generating function
$$f_p(r) := \begin{cases} 2^{-1/p}(r^p + 1)^{1/p} & \text{if } p \in (-\infty, 1) \setminus \{0\},\\ \sqrt r & \text{if } p = 0,\\ \min(r, 1) & \text{if } p = -\infty, \end{cases} \qquad r > 0,$$
and
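$$\Psi^*_p(\xi) := 2^{1/p}\int_1^{\exp\xi} \frac{r - 1}{f_p(r)}\,\frac{dr}{r}, \qquad \xi \in \mathbb R.$$
We consider here only the case p < 1 since $m_p$ is not concave for p > 1 and $\Psi^*_p$ is not superlinear for p = 1. We have for u, v > 0
$$\mathrm D_{p,\phi}(u, v) = v\,K_p(u/v) \quad \text{where } K_p(\xi) := f_p(\xi)\,\Psi^*_p(\ln\xi), \quad \xi > 0,$$
so that
$$\mathrm D_{p,\phi}(u, v) = (u^p + v^p)^{1/p}\int_1^{u/v} \frac{r - 1}{(r^p + 1)^{1/p}}\,\frac{dr}{r} \quad \text{if } p \in (-\infty, 1) \setminus \{0\},$$
$$\mathrm D_{0,\phi}(u, v) = 2\big(\sqrt u - \sqrt v\big)^2,$$
$$\mathrm D_{-\infty,\phi}(u, v) = |u - v| - \min(u, v)\,|\ln u - \ln v|.$$
$\mathrm D_{p,\phi}$ takes a more explicit and remarkable form also in the case of the harmonic mean p = −1:
$$\mathrm D_{-1,\phi}(u, v) = \frac{(u - v)^2}{u + v}.$$
It is also interesting to note that $\mathrm D_{p,\phi}$ has a natural lower semicontinuous extension to the boundary $\{0\} \times (0, \infty) \cup (0, \infty) \times \{0\} = \partial(0, \infty)^2 \setminus \{(0, 0)\}$ given by
$$\mathrm D_{p,\phi}(u, v) = \begin{cases} +\infty & \text{if } p > 0,\\ 2(u + v) & \text{if } p = 0,\\ u + v & \text{if } p < 0, \end{cases} \qquad (u, v) \in \partial(0, \infty)^2 \setminus \{(0, 0)\}.$$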
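The closed forms for p = 0 and p = −1 can be cross-checked against the integral representation of $\mathrm D_{p,\phi}$ by numerical quadrature. The Python sketch below uses a composite Simpson rule from scratch; the function names are hypothetical. For p = 0 it assumes the same integral with $f_0(r) = \sqrt r$ and unit normalization, which is the normalization that reproduces the stated formula $2(\sqrt u - \sqrt v)^2$.

```python
import math

def simpson(f, a, b, n=2000):
    # Composite Simpson rule with n (even) subintervals; also valid when b < a.
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3.0

def fisher_density(u, v, p):
    """D_{p,phi}(u, v) for finite p < 1 and u, v > 0, via the integral formula."""
    if p == 0:
        mean = math.sqrt(u * v)
        integrand = lambda r: (r - 1.0) / (math.sqrt(r) * r)
    else:
        mean = (u ** p + v ** p) ** (1.0 / p)
        integrand = lambda r: (r - 1.0) / ((r ** p + 1.0) ** (1.0 / p) * r)
    return mean * simpson(integrand, 1.0, u / v)

def harmonic_closed_form(u, v):
    return (u - v) ** 2 / (u + v)

def geometric_closed_form(u, v):
    return 2.0 * (math.sqrt(u) - math.sqrt(v)) ** 2

checks = [
    abs(fisher_density(3.0, 1.0, -1) - harmonic_closed_form(3.0, 1.0)),
    abs(fisher_density(1.0, 3.0, -1) - harmonic_closed_form(1.0, 3.0)),
    abs(fisher_density(4.0, 1.0, 0) - geometric_closed_form(4.0, 1.0)),
    abs(fisher_density(0.5, 2.5, 0) - geometric_closed_form(0.5, 2.5)),
]
print(checks)
```

The second and fourth checks use u < v, where the integration runs from 1 down to u/v and both the integrand and the step are negative, so the result stays nonnegative and symmetric in (u, v).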
The main result of this section is summarized in the following lemma (where we just consider
the range p ∈ [ − ∞, 1)).
Lemma E.1 D p,φ is convex on (0, ∞)2 if and only if p ∈ [−∞, −1] ∪ [0, 1).
Therefore the assertion holds if and only if H p (ξ ) ≥ 0 for ξ > 0. Notice that H p (1) =
22−1/ p ≥ 0. We now show that if p ∈ ( − ∞, −1] ∪ (0, 1) then H p takes its minimum at
ξ = 1. Since
!
1/2 − ξ −1/2 ξ p+1/2 − ξ −( p+1/2)
−1/ p −1/2 ξ
H p (ξ ) = 2 p(ξ + 1)
p
ξ + ,
2 2
. /0 1
=:G p (ξ )
Since G̃ p (η) is an odd function, H p takes its minimum at ξ = 1 if G̃ p (η) ≥ 0 for η > 0.
This property clearly holds if p ∈ (0, 1). On the other hand, if p < 0, G̃ p (η) ≥ 0 if and only
if (1 + p)η ≤ 0 for all η ≥ 0, i.e. the cases p ≤ −1 apply. Altogether, we conclude that
H p (ξ ) ≥ H p (1) ≥ 0 and therefore K p (ξ ) ≥ 0 for every ξ > 0 if p ∈ ( − ∞, −1] ∪ (0, 1).
If p ∈ (−1, 0) then
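$$(\xi^p + 1)^{1 - 1/p}\,\xi^{-p} + \xi = \xi + o(\xi) \quad \text{as } \xi \to +\infty$$
and
$$\int_1^\xi \frac{r - 1}{(r^p + 1)^{1/p}}\,\frac{dr}{r} = \xi + o(\xi) \quad \text{as } \xi \to +\infty$$
so that
$$H_p(\xi) = p\,\xi + o(\xi) \to -\infty \quad \text{as } \xi \to +\infty.$$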
It follows that K p takes strictly negative values in (0, +∞) and therefore it is not convex.
References
1. Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: From a large-deviations principle to the Wasserstein
gradient flow: a new micro-macro passage. Commun. Math. Phys. 307, 791–815 (2011)
2. Adams, S., Dirr, N., Peletier, M.A., Zimmer, J.: Large deviations and gradient flows. Philos. Trans. R.
Soc. A Math. Phys. Eng. Sci. 371(2005), 20120341 (2013)
3. Ambrosio, L., Fusco, N., Pallara, D.: Functions of Bounded Variation and Free Discontinuity Problems.
Oxford University Press, Oxford (2005)
4. Ambrosio, L., Gigli, N., Savaré, G.: Gradient Flows in Metric Spaces and in the Space of Probability
Measures. Lectures in Mathematics ETH Zürich. Birkhäuser, Basel (2008)
5. Ambrosio, L.: Minimizing movements. Rend. Accad. Naz. Sci. XL Mem. Mat. Appl. 19(5), 1773–1799
(1995)
6. Arnrich, S., Mielke, A., Peletier, M.A., Savaré, G., Veneroni, M.: Passing to the limit in a Wasserstein
gradient flow: from diffusion to reaction. Calc. Var. Partial. Differ. Equ. 44, 419–454 (2012)
7. Almgren, F., Taylor, J.E., Wang, L.: Curvature-driven flows: a variational approach. SIAM J. Control.
Optim. 31, 387–437 (1993)
8. Benamou, J.-D., Brenier, Y.: A computational fluid mechanics solution to the Monge–Kantorovich mass
transfer problem. Numer. Math. 84(3), 375–393 (2000)
9. Bogachev, V.I.: Measure Theory, vol. I, II. Springer, Berlin (2007)
10. Bonaschi, G.A., Peletier, M.A.: Quadratic and rate-independent limits for a large-deviations functional.
Contin. Mech. Thermodyn. 28, 1191–1219 (2016)
11. Brezis, H.: Functional Analysis. Sobolev Spaces and Partial Differential Equations. Springer, New York
(2011)
12. Bullen, P.S.: Handbook of Means and their Inequalities. Mathematics and its Applications. Springer,
Netherlands (2003)
13. Chow, S.-N., Huang, W., Li, Y., Zhou, H.: Fokker–Planck equations for a free energy functional or Markov
process on a graph. Arch. Ration. Mech. Anal. 203(3), 969–1008 (2012)
14. Crandall, M.G., Tartar, L.: Some relations between nonexpansive and order preserving mappings. Proc.
Am. Math. Soc. 78(3), 385–390 (1980)
15. Castaing, C., Valadier, M.: Convex Analysis and Measurable Multifunctions. Lectures Notes in Mathe-
matics, vol. 580. Springer, Berlin, New York (1977)
16. Dondl, P., Frenzel, T., Mielke, A.: A gradient system with a wiggly energy and relaxed EDP-convergence.
ESAIM Control Optim. Calc. Var. 25, 68 (2019)
17. De Giorgi, E., Marino, A., Tosques, M.: Problems of evolution in metric spaces and maximal decreasing
curve. Atti Accad. Naz. Lincei Rend. Cl. Sci. Fis. Mat. Natur. (8) 68(3), 180–187 (1980)
18. Dellacherie, C., Meyer, P.-A.: Probabilities and Potential. North-Holland Mathematics Studies, vol. 29. North-Holland Publishing Co., Amsterdam-New York (1978)
19. Dal Maso, G., DeSimone, A., Mora, M.G.: Quasistatic evolution problems for linearly elastic-perfectly
plastic materials. Arch. Ration. Mech. Anal. 180(2), 237–291 (2006)
20. Dolbeault, J., Nazaret, B., Savaré, G.: A new class of transport distances between measures. Calc. Var.
Part. Differ. Equ. 34(2), 193–231 (2009)
21. Dolbeault, J., Nazaret, B., Savaré, G.: A new class of transport distances between measures. Calc. Var.
Part. Differ. Equ. 34(2), 193–231 (2009)
22. Duong, M.H., Peletier, M.A., Zimmer, J.: GENERIC formalism of a Vlasov–Fokker–Planck equation and
connection to large-deviation principles. Nonlinearity 26, 2951–2971 (2013)
23. Dirr, N., Stamatakis, M., Zimmer, J.: Entropic and gradient flow formulations for nonlinear diffusion. J.
Math. Phys. 57(8), 081505 (2016)
24. Dembo, A., Zeitouni, O.: Large Deviations Techniques and Applications. Springer, New York (1998)
25. Erbar, M., Fathi, M., Laschos, V., Schlichting, A.: Gradient flow structure for McKean–Vlasov equations
on discrete spaces. Disc. Contin. Dyn. Syst. 36(12), 6799–6833 (2016)
26. Erbar, M., Fathi, M., Schlichting, A.: Entropic curvature and convergence to equilibrium for mean-field
dynamics on discrete spaces. arXiv preprint arXiv:1908.03397 (2019)
27. Evans, L.C., Gariepy, R.F.: Measure Theory and Fine Properties of Functions. Studies in Advanced
Mathematics. CRC Press, Boca Raton, FL (1992)
28. El Hajj, A., Ibrahim, H., Monneau, R.: Dislocation dynamics: from microscopic models to macroscopic
crystal plasticity. Continuum Mech. Thermodyn. 21(2), 109–123 (2009)
29. Erbar, M., Maas, J.: Gradient flow structures for discrete porous medium equations. Disc. Contin. Dyn.
Syst. 34(4), 1355–1374 (2014)
30. Erbar, M.: Gradient flows of the entropy for jump processes. Ann. Inst. Henri Poincaré Probab. Stat. 50(3),
920–945 (2014)
31. Erbar, M.: A gradient flow approach to the Boltzmann equation. arXiv preprint arXiv:1603.00540 (2016)
32. Feinberg, M.: On chemical kinetics of a certain class. Arch. Ration. Mech. Anal. 46(1), 1–41 (1972)
33. Figalli, A., Gangbo, W., Yolcu, T.: A variational method for a class of parabolic PDEs. Annali della Scuola
Normale Superiore di Pisa-Classe di Scienze 10(1), 207–252 (2011)
34. Gigli, N.: On the heat flow on metric measure spaces: existence, uniqueness and stability. Calc. Var.
Partial. Differ. Equ. 39(1–2), 101–120 (2010)
35. Glitzky, A., Mielke, A.: A gradient structure for systems coupling reaction-diffusion effects in bulk and
interfaces. Z. Angew. Math. Phys. 64(1), 29–52 (2013)
36. Gavish, N., Nyquist, P., Peletier, M.: Large deviations and gradient flows for the Brownian one-dimensional
hard-rod system. arXiv preprint arXiv:1909.02054 (2019). https://doi.org/10.1007/s11118-021-09933-0
37. Grmela, M.: Particle and bracket formulations of kinetic equations. In: Marsden, J.E. (ed) Proceedings of
the AMS–IMS–SIAM Joint Summer Research Conference in the Mathematical Sciences on Fluids and
Plasmas: Geometry and Dynamics, pp 125–132 (1984)
38. Grmela, M.: Multiscale equilibrium and nonequilibrium thermodynamics in chemical engineering. Adv.
Chem. Eng. 39, 75–129 (2010)
39. Girault, V., Wheeler, M.F.: Numerical discretization of a Darcy–Forchheimer model. Numer. Math. 110(2),
161–198 (2008)
40. Hudson, T., van Meurs, P., Peletier, M.A.: Atomistic origins of continuum dislocation dynamics. Math.
Models Methods Appl. Sci. 30(13), 2557–2618 (2020)
41. Hairer, E., Wanner, G.: Solving Ordinary Differential Equations II: Stiff and Differential-Algebraic Problems,
volume 14 of Springer Series in Computational Mathematics, 2nd edn. Springer, Berlin (1996)
42. Jordan, R., Kinderlehrer, D., Otto, F.: The variational formulation of the Fokker–Planck equation. SIAM
J. Math. Anal. 29(1), 1–17 (1998)
43. Kaiser, M., Jack, R.L., Zimmer, J.: Canonical structure and orthogonality of forces and currents in
irreversible Markov chains. J. Stat. Phys. 170(6), 1019–1050 (2018)
44. Knupp, P.M., Lage, J.L.: Generalization of the Forchheimer-extended Darcy flow model to the tensor
permeability case via a variational principle. J. Fluid Mech. 299, 97–104 (1995)
45. Kipnis, C., Olla, S., Varadhan, S.R.S.: Hydrodynamics and large deviation for simple exclusion processes.
Commun. Pure Appl. Math. 42(2), 115–137 (1989)
46. Liero, M., Mielke, A., Peletier, M.A., Renger, D.R.M.: On microscopic origins of generalized gradient
structures. Disc. Cont. Dyn. Syst. Ser. S 10(1), 1 (2017)
47. Liero, M., Mielke, A., Savaré, G.: Optimal entropy-transport problems and a new Hellinger–Kantorovich
distance between positive measures. Invent. Math. 211(3), 969–1117 (2018)
48. Luckhaus, S., Sturzenhecker, T.: Implicit time discretization for the mean curvature flow equation. Calc.
Var. Partial. Differ. Equ. 3(2), 253–271 (1995)
49. Maas, J.: Gradient flows of the entropy for finite Markov chains. J. Funct. Anal. 261(8), 2250–2292 (2011)
50. Martin, R.H., Jr.: Nonlinear operators and differential equations in Banach spaces. Pure and Applied
Mathematics. Wiley, New York (1976)
51. Mielke, A.: Evolution in rate-independent systems. In: Handbook of Differential Equations: Evolutionary
Differential Equations, pp 461–559. North-Holland (2005)
52. Mielke, A.: Geodesic convexity of the relative entropy in reversible Markov chains. Calc. Var. Part. Differ.
Equ. 48(1–2), 1–31 (2013)
53. Mielke, A.: Deriving effective models for multiscale systems via evolutionary Γ-convergence. In: Control
of Self-Organizing Nonlinear Systems, pp. 235–251. Springer, New York (2016)
54. Mielke, A.: On evolutionary Γ-convergence for gradient systems. In: Macroscopic and large scale
phenomena: coarse graining, mean field limits and ergodicity, vol. 3 of Lecture Notes in Applied Mathematics
and Mechanics, pp. 187–249. Springer, New York (2016)
55. Mielke, A., Montefusco, A., Peletier, M.A.: Exploring families of energy-dissipation landscapes via
tilting: three types of EDP convergence. arXiv preprint arXiv:2001.01455 (2020).
https://doi.org/10.1007/s00161-020-00932-x
56. Maes, C., Netočný, K.: Canonical structure of dynamical fluctuations in mesoscopic nonequilibrium
steady states. Europhys. Lett. 82(3), 30003 (2008)
57. Mörters, P.: Introduction to large deviations. Technical report, University of Bath (2010)
58. Mielke, A., Patterson, R.I.A., Peletier, M.A., Renger, D.R.M.: Non-equilibrium thermodynamical
principles for chemical reactions with mass-action kinetics. SIAM J. Appl. Math. 77(4), 1562–1585 (2017)
59. Mielke, A., Peletier, M.A., Renger, D.R.M.: On the relation between gradient flows and the large-deviation
principle, with applications to Markov chains and diffusion. Potential Anal. 41(4), 1293–1327 (2014)
60. Mielke, A., Peletier, M.A., Renger, D.R.M.: A generalization of Onsager's reciprocity relations to gradient
flows with nonlinear mobility. J. Non-Equil. Thermodyn. 41(2), 141–149 (2016)
61. Mielke, A., Roubíček, T.: Rate-independent systems. Theory and application, volume 193 of Applied
Mathematical Sciences. Springer, New York (2015)
62. Mielke, A., Rossi, R., Savaré, G.: BV solutions and viscosity approximations of rate-independent systems.
ESAIM Control Optim. Calc. Variat. 18(01), 36–80 (2012)
63. Mielke, A., Rossi, R., Savaré, G.: Nonsmooth analysis of doubly nonlinear evolution equations. Calc.
Var. Part. Differ. Equ. 46(1–2), 253–310 (2013)
64. Mielke, A., Rossi, R., Savaré, G.: Balanced viscosity (BV) solutions to infinite-dimensional rate-
independent systems. J. Eur. Math. Soc. (JEMS) 18(9), 2107–2165 (2016)
65. Mirrahimi, S., Souganidis, P.E.: A homogenization approach for the motion of motor proteins. Nonlinear
Differ. Equ. Appl. 20(1), 129–147 (2013)
66. Mielke, A., Stephan, A.: Coarse-graining via EDP-convergence for linear fast-slow reaction systems.
Math. Models Methods Appl. Sci. 30(9), 1765–1807 (2020)
67. Mielke, A., Theil, F., Levitas, V.I.: A variational formulation of rate-independent phase transformations
using an extremum principle. Arch. Ration. Mech. Anal. 162(2), 137–177 (2002)
68. Otto, F.: The geometry of dissipative evolution equations: the porous medium equation. Commun. Part.
Differ. Equ. 26, 101–174 (2001)
69. Öttinger, H.C.: On the combined use of friction matrices and dissipation potentials in thermodynamic
modeling. J. Non-Equilib. Thermodyn. 44(3), 295–302 (2019)
70. Peletier, M.A.: Variational modelling: energies, gradient flows, and large deviations. arXiv preprint
arXiv:1402.1990 (2014)
71. Peletier, M.A., Redig, F., Vafayi, K.: Large deviations in stochastic heat-conduction processes provide a
gradient-flow structure for heat conduction. J. Math. Phys. 55(9), 093301 (2014)
72. Perthame, B., Souganidis, P.E.: Asymmetric potentials and motor effect: a large deviation approach. Arch.
Ration. Mech. Anal. 193(1), 153–169 (2009)
73. Perthame, B., Souganidis, P.E.: Asymmetric potentials and motor effect: a homogenization approach.
Ann. de l’Institut Henri Poincaré (C) Non-Linear Anal. 26(6), 2055–2071 (2009)
74. Peletier, M.A., Schlottke, M.C.: Large-deviation principles of switching Markov processes via Hamilton–
Jacobi equations. arXiv preprint arXiv:1901.08478 (2019)
75. Renger, D.R.M.: Flux large deviations of independent and reacting particle systems, with implications
for macroscopic fluctuation theory. J. Stat. Phys. 172(5), 1291–1326 (2018)
76. Rossi, R., Savaré, G., Segatti, A., Stefanelli, U.: Weighted energy-dissipation principle for gradient flows
in metric spaces. J. Math. Pures Appl. 9(127), 1–66 (2019)
77. Rockafellar, R.T., Wets, R.J.-B.: Variational Analysis. Springer, Berlin (1998)
78. Renger, M., Zimmer, J.: Orthogonality of fluxes in general nonlinear reaction networks. Disc. Contin.
Dyn. Syst. Ser. S (2019)
79. Serfaty, S.: Gamma-convergence of gradient flows on Hilbert and metric spaces and applications. Disc.
Contin. Dyn. Syst. 31(4), 1427–1451 (2011)
80. Sandier, E., Serfaty, S.: Gamma-convergence of gradient flows with applications to Ginzburg–Landau.
Commun. Pure Appl. Math. 57(12), 1627–1672 (2004)
Publisher’s Note Springer Nature remains neutral with regard to jurisdictional claims in published maps and
institutional affiliations.