Data-Driven Computation of Molecular Reaction Coordinates: Andreas Bittracher, Ralf Banisch, and Christof SCH Utte
Data-Driven Computation of Molecular Reaction Coordinates: Andreas Bittracher, Ralf Banisch, and Christof SCH Utte
Data-Driven Computation of Molecular Reaction Coordinates: Andreas Bittracher, Ralf Banisch, and Christof SCH Utte
Reaction Coordinates
Andreas Bittracher1 , Ralf Banisch1 , and Christof Schutte1,2
1
Department of Mathematics and Computer Science, Freie Universitat Berlin, Germany
arXiv:1712.05215v1 [physics.comp-ph] 14 Dec 2017
2
Zuse Institute Berlin, Germany
Abstract
The identification of meaningful reaction coordinates plays a key role in the study
of complex molecular systems whose essential dynamics is characterized by rare or
slow transition events. In a recent publication, the authors identified a condition
under which such reaction coordinates exist the existence of a so-called transition
manifold and proposed a numerical method for their point-wise computation that
relies on short bursts of MD simulations.
This article represents an extension of the method towards practical applicability
in computational chemistry. It describes an alternative computational scheme that
instead relies on more commonly available types of simulation data, such as single
long molecular trajectories, or the push-forward of arbitrary canonically-distributed
point clouds. It is based on a Galerkin approximation of the transition manifold
reaction coordinates, that can be tuned to individual requirements by the choice of
the Galerkin ansatz functions. Moreover, we propose a ready-to-implement variant
of the new scheme, that computes data-fitted, mesh-free ansatz functions directly
from the available simulation data. The efficacy of the new method is demonstrated
on a realistic peptide system.
Introduction
In recent years, it has become possible to numerically explore the chemically relevant
slow transition processes in systems with several thousands of atoms. This was made
possible due to the increase of raw computational power and deployment of specialized
computing architectures [28], as well as by the development of accellerated integration
schemes that bias the dynamics in the favor for the slow transition processes, yet preserve
the original statistics [1, 11, 17].
To obtain chemical insight about the essential dynamics of the system, this vast
amount of high-dimensional data has to be adequately processed and filtered. One
desirable goal often is a simplified model of the mechanism of action, in which the fast,
1
unimportant processes are averaged out or otherwise disregarded. One way is to con-
struct kinetic models of the system, i.e. identifying metastable reactant-, product- and
possibly intermediate states, and reducing the dynamics to a jump process between
them. Under certain regularity assumptions on the root model that are readily fulfilled,
such a model can be built in an automated, data-driven fashion [5, 26]. However, the
simplicity of the resulting model comes with a price: since the long-time relaxation ki-
netics is described just by jumps between finitely-many discrete states, any information
about the transition process and its dynamical features is lost.
An alternative collection of approaches, to which this paper ultimately contributes,
thus aims at the automated identification of good reaction coordinates or order parame-
ters, mappings from the full to some lower-dimensional, but still continuous state space,
onto which the full dynamics can be projected without loss of the essential processes.
Often enough, this reaction coordinate alone (i.e. without the corresponding dynamical
model) already contains more valuable chemical information than the kinetic models, as
for example the free energy profile along the reaction coordinate allows the determination
of the activation energy of the respective transition process [19].
The systematic and mathematically rigorously motivated construction of reaction co-
ordinates is an area of active research, for an overview see [19]. Where it is available,
chemical expert knowledge can be used to guide the construction [7, 31]. In the context
of transition path theory [32], the committor function is known to be an ideal reaction
coordinate [20]. Related to this, approximations to the dominant eigenfunctions of the
transfer operator are also often considered ideal reaction coordinates [26, 8, 24], which
has been confirmed in [12] for a class of systems with local timescale separation. How-
ever, the computation of both committor functions and transfer operator eigenfunctions
is infeasible for very high-dimensional systems. Moreover, the authors have recently
shown that said eigenfunctions yield redundant reaction coordinates, in the sense that
often a further reduction is possible [4].
In the same work, the authors identified necessary characteristics that reaction coordi-
nates have to exhibit in order to retain the slow processes: Their isolines must correspond
to those of the dominant transfer operator eigenfunctions. What is more, it was shown
that the existence of such reaction coordinates is guaranteed by the existence of a so-
called transition manifold M, a low-dimensional manifold in the function space L1 . The
property that defines M is that, on moderate time scales tfast < t tslow , the transition
density functions of the dynamics concentrate around M. A firm mathematical theory
for the existence and identification of reaction coordinates was developed around this
transition manifold.
The main practical result of [4] was the insight that any parametrization of M can be
turned into a good reaction coordinate. A numerical algorithm was proposed that allows
the point-wise computation of this reaction coordinate and only requires the ability to
generate trajectories of the aforementioned moderate length that start at the desired
evaluation point.
While the method has a solid theoretical foundation and is directly applicable in many
cases, there exists a certain gap to common scenarios in computational practice:
2
1. While the ability to efficiently compute the reaction coordinate only in specific
points is quite remarkable, in practice one often wishes to learn the reaction co-
ordinate in all of the accessible state space (i.e. where data is available), as the
location of the interesting points is unknown in advance.
2. The original method cannot compute the reaction coordinate from dynamical bulk
data such as long equilibrated trajectories or the push-forward of point clouds
that sample the canonical ensemble that is preferably generated by contemporary
simulation methods and software.
In the present work we attempt to close this gap by proposing an alternative, purely
data-driven algorithm for computing the transition manifold reaction coordinate. It is
based on a classical Galerkin approximation of the reaction coordinate with an arbitrary
ansatz space. As with all Galerkin approximations, its numerical approximation requires
only a so-called transition matrix between its discretisation elements. This matrix can
be constructed for example from the aforementioned types of bulk data. We will see that
the same transition matrix appears as an essential ingredient in the construction of the
aforementioned Markov state models. This makes our new method applicable whenever
the construction of such a discrete kinetic model is possible.
Finally, with the objective to create a method that requires only a minimum of a
priori information about the system, we propose a very practical implementation of
this Galerkin approximation that constructs a mesh-free set of Voronoi cell-based ansatz
functions directly from the available simulation data. Interestingly, the task of optimally
choosing the Voronoi centers leads to two well-known and highly scalable algorithms
from data mining, namely the k-Means clustering algorithm and Poisson-disk sampling
algorithm, depending on the chosen error measure. We demonstrate the efficacy of this
method in identifying chemically interpretable essential degrees of freedom and retaining
the original slow timescales of the small peptide Dialanine.
The paper is organized as follows: In Section 1 the basic concepts of timescale-
separated processes and reaction coordinates are introduced. Section 2 reviews the
established transition manifold theory as well as the local burst-based algorithm. In
Section 3, the Galerkin approximation of the transition manifold reaction coordinate is
derived as well as the Voronoi-based implementation. Section 4 demonstrates the ap-
plication of our new method to a simple synthetic example system, as well as to the
realistic Dialanine system.
3
Here P[|] denotes the conditional probability. ptx (y) can thus be interpreted as the
conditional density of Xt = y provided that X0 = x.
Stationarity together with ergodicity implies the existence of a unique equilibrium
density that is invariant under Xt , i.e. that fulfills
X0 Xt t 0 .
holds for all x, y X. Again, this condition is usually satisfied by common molecular
models.
X0 u Xt T t u .
Formally it is defined as
(y) t
Z
t
T u(x) = p (y)u(y) dy.
X (x) x
being the unique invariant density directly implies that the constant function v0 (x)
1 is the only eigenfunction of T t to eigenvalue 0 = 1. Furthermore, it can be shown
that T t is self-adjoint and contractive under an appropriate expansion onto L2 [2], and
as such possesses a real positive point spectrum
1 = t0 < t1 . . . tm
with the remaining spectrum being confined to a ball around the origin of radius Rt
tm .
It is well known [] that the eigenpair (ti , vi ), i = 1, . . . , m then correspond to the i-th
slowest-decaying sub-processes of Xt , with the implied timescale of this process being
defined by
ti = t/ log(ti ) . (1)
We are interested in the case where there exist a 1 d < m such that the d slowest
processes are well-separated from the rest by a timescale gap, i.e. t1 td td td+1 .
The reproduction of these d slowest processes and the associated timescales will be the
primary objective of our coarse graining approaches.
4
Characterization of good reaction coordinates. In our workflow, the first step of
coarse graining a MD system is the identification of a good reaction coordinate (RC).
Following the notions of [18], a RC is simply any C 1 function : X Rk , i.e. an
observable of the full systems state. One may think of a map from the atoms cartesian
positions to some internal degrees of freedom, e.g. certain dihedral angles or interatomic
distances.
The now k-dimensional projected process is then given by (Xt ). As mentioned above,
the the quality of is determined by how well the slowest timescales of (Xt ) correspond
to those of the full process Xt . For that, we consider the projected transfer operator
associated with (Xt ), which is given by Tt : L1 L1 ,
Tt = P T t P ,
P f (x) = E [f (Y ) | (Y ) = (x)] .
In other words, P takes the -weighted average of f along the (x)-level set of and
assigns it to each point of the level set1 . Analogous to T t , Tt describes the transport of
L1 -densities of the process (Xt ).
Due to the definition (1) of implied timescales by the eigenvalues of T t , we now vaguely
call a good reaction coordinate, if
where dom is the set of the d + 1 leading eigenvalues of the respective operator.
In [4] it has been shown that (2) is fulfilled in an appropriate sense whenever
parametrizes the associated eigenfunctions:
Theorem 1.1 (Corollary 3.6 and Lemma 4.2 in [4]). Let (ti , vi ) be an eigenpair of T t .
Assume there exists a function vi : Rk R with
Intuitively, is a good RC if its isolines run almost parallel to the isolines of all the
dominant eigenfunctions vi . Thus, naturally, with the collection of the d dominant eigen-
functions as RC, := (v1 , . . . , vd )| the eigenvalue error (2) is zero. However, working
with dominant eigenfunctions as RCs has two major disadvantages:
1. The eigenproblem is inherently global. If we we wish to learn the value of an
eigenfunction vi at only one location x X, we need an approximation of T t that
is is accurate on all of X. However, the computational effort to construct such an
approximation grows exponentially2 with dim(X).
1
Note that P f is itself a function on X, but is constant along each level set of , and thus essentially
depends only on (X) Rk .
2
There have been attempts to mitigate this ([33, 15]), but we aim to circumvent this problem entirely.
5
2. In metastable systems, the number of dominant eigenfunctions (d + 1) equals the
number of metastable states. This number can be much larger than the opti-
mal dimension of a good RC. Thus, in general, a RC comprised of the dominant
eigenfunctions is redundant.
Assumption 2.1. There exists a lag time t and an r-dimensional manifold M L1 (X)
so that for every x X,
min kf ptx k .
f M
Remark 2.2. This argument has been made somewhat more precise in [4]. However,
rigorous conditions for the existence of a transition manifold of certain dimension are still
lacking and subject of current investigation. In particular, it is easy to find examples that
show that metastability alone is neither sufficient nor required for a transition manifold
to exist.
6
Figure 1
and let E : M Rk be an embedding of M3 . Then one can show [4] that the reaction
coordinate
(x) := E Q (x) (4)
fulfills the requirement of Theorem 1.1 and thus guarantees a good approximation of the
dominant timescales.
with arbitrarily chosen observables i L then finally leads to the transition manifold
reaction coordinate (TMRC)
(x) := E[(Xt ) | X0 = x] . (5)
As a sidenote, this is simply the Koopman operator [16, 21] applied to : (x) = Kt (x)
To compute the value of at specific evaluation points XN := {x1 , . . . , xN }x X, an
algorithm based on short simulation burst was proposed in [4]. For completeness sake,
it has been added to Appendix A.
3
By the strong Whitney embedding theorem, there always exists an embedding of dimension k 2r.
4
in the sense of prevalence
7
3. Galerkin approach for computing reaction coordinates
The above burst-based approach for computing the reaction coordinate has two major
disadvantages:
1. can only be computed pointwise and has no closed analytic form. For every new
evaluation point many numerical MD simulations have to be started. Further,
the evaluation points have to be chosen in regions relevant to the slow transition
processes (i.e. in the transition regions and metastable sets), which is a non-trivial
task.
Skj = hk , j i .
For many popular choices of basis functions, this matrix is given analytically or can be
cheaply computed.
Recall that can also bePwritten in terms of the Koopman operator: (x)
= Kt (x).
N
Let N L VN , N = l=1 cl l be an observable with k N kL2 . Then
8
Here it was used that Kt can be considered a non-expanding operator on L2 , see [2].
With this, the scalar product in (6) can be estimated as follows:
N
= hk , Kt i hk , Kt N i =
X
hk , i hk , Kt l i cl . (7)
l=1
Thus, in order to estimate (6), all that has to be computed are the entries of the transition
matrix T RN N ,
Tkl = hk , Kt l i .
Remark 3.1. This is the same transition matrix on which the Galerkin approximation
of the transfer operator eigenfunctions, and thus such popular methods as Markov State
Model (MSM) analysis are based [27, 10]. The Galerkin approximation of is thus
applicable whenever those methods are.
Data-based computation of the transition matrix. The entries of T can now be ap-
proximated based on simulation data. Let X X be the region of interest in which the
the computation of the reaction coordinate is desired, and let V L2 (X) be a basis that
adequately resolves L2 (X).
The approximation of T requires two sets of data points,
XM = {x0 , . . . , xM } , YM = {y0 , . . . , yM } , XM , YM X ,
where XM samples the stationary measure , and YM contains realizations of the dy-
namics with starting points XM , yi = t xi , with t the lag time from Assumption 2.1,
and t denoting the random variable that maps the initial state of a (stochastic) MD
trajectory to its state after time t. This data can for example be obtained from a single
equilibrated numerical trajectory of step size ,
which can be approximated by a Monte Carlo sum with -distributed sampling points.
As the points in XM are -distributed, this becomes
M M
1 X t 1 X
k (xj )K l (xj ) = k (xj )E[l (Xt ) | X0 = xj ] .
M M
j=0 j=0
9
If the number of sampling points M is sufficiently high, the expectation value can be
approximated by a single realization.We then get
M M
1 X 1 X
Tkl k (xj )l (t xj ) = k (xj )l (yj ) . (9)
M M
j=0 j=0
Once the Tkl are computed, N can be evaluated at arbitrary points in X as follows:
N N
X
X
N (x) = j (x)(S 1 )jk Tkl cl (10)
j,k=1 l=1
which is effectively just counting the number of transitions from set Ak to set Al within
the data sets XM , YM .
In the case of indicator functions, with x Ak , we have, with Skk = h1Ak , 1Ak i ,
N
X
1
N (x) = Skk Tkl cl .
l=1
Choosing the sets Ak naively, for example on a regular grid, invokes the curse of
dimension. We thus propose a partition into Voronoi cells {A1 , . . . , AN } with suitable
center points {e1 , . . . , eN } X for Galerkin approximation. With this, we will also be
able to avoid the explicit construction of the transition matrix.
Commonly, one wishes to approximate in the region X of state space that is cov-
ered with the available data points XM . The question is then how the Voronoi centers
{e1 , . . . , eN } X should be chosen in order to achieve this. In the following, we demon-
strate that two different criteria on the approximation quality of lead to two different
algorithms for selecting the Voronoi centers.
!
k N kL2 (X) = min . (11)
{e1 ,...,eN }X
10
The difficulty is that neither nor N are known in advance. In the appendix we
show that an unbiased estimator of k N k2L2 (X) based on the sampled data XM =
{x1 , . . . , xM } is given by
N X
i ) A k2 2k+1 ,
X
S (A1 , . . . , AN ) = k(x k R
(12)
k=1 xi Ak
which is the objective function ofPk-means clustering in the image space of the reaction
Here A = |Ak |1
coordinate .
k xi Ak (xi ) is the mean of in cell Ak . To minimize
this objective function directly one would have to know . If we assume in addition that
where diam(Ai ) is the diameter of the Voronoi cell Ai . Minimizing the upper bound
then means looking for Voronoi centers such that the diameter of the largest Voronoi
cell is minimized. Since the number of Voronoi cells and the volume of the set XM to be
covered are fixed, the minimum is achieved if the centers cover XM evenly, such that the
Voronoi cells all have similar diameters. Therefore, we may alternatively maximize the
diameter of the smallest Voronoi cell, which is lower bounded by the minimal internal
point distance:
min diam(Ai ) min kei ej k.
i=1...N i,j=1,...,N
i6=j
The inequality holds because min kei ej k is twice the distance from ei to that face of
Ai which is closest to ei , while the diameter of Ai is by definition larger. Maximizing
11
the lower bound then leads to the objective function of maximal minimal internal point
distance:
E = arg max min kei ej k .
i,j=1,...,N
{e1 ,...,eN }XM
i6=j
This problem is related to Poisson disk- or blue noise (sub)sampling in computer vi-
sion [6]. A simple heuristic picking algorithm to compute a set that approximates the
above distance is the following [34]:
The corresponding weight matrix is diagonal with Sll = hl , l i . Thus, the l-th Galerkin
coefficient of (6) is
l i
R
h, (x) l (x) d(x)
= XR 2
.
hl , l i X l (x) d(x)
Both integrals can again be approximated by a Monte Carlo sum with -distributed
sampling points, for which again the data set XM can be used. Thus we have
M M
1 X
Z
X
(x) l (x) d(x) (x j ) (x
l j ) (yj )l (xj )
X M
j=1 j=1
1 X
= (yj ) arg minkei xj k = l .
M i=1,...,N
12
This can be computed simply by assigning the points in XM to the Voronoi centers,
evaluating the observable at the corresponding points in YM and summing up.
The other integral is
M
1 X
Z
2
l (x) d(x) l (xj )
X M
j=1
1
= # j | arg minkei xj k = l .
M i=1,...,N
This is just the relative number of data points in XM that belong to the Voronoi center el .
Overall, to estimate the Galerkin coefficients and thus a functional form of N , we
propose the following algorithm:
4. Examples
4.1. Curved double well potential
As our first example, we consider a diffusion process in a two-dimensional double well
potential with curved reaction pathway that was first studied in the context of reaction
coordinates in [18]. The system obeys the SDE
p
dXt = V (Xt )dt + 2 1 dWt , (14)
13
Voronoi-Galerkin approximation of the reaction coordinate N that hopefully resolves
the transition pathway and retains the dominant implied timescales of the system. The
algorithm was implemented in Matlab, and for the time-critical parts highly-optimized
native routines (such as dsearchn for the nearest-neighbor search) were used.
(a) (b)
2 2
(c)
t0
t1 5.9332
x2
x2
0
t2 0.9021
t3 0.6031
2 2
t4 0.4001
2 1 0 1 2 2 1 0 1 2
x1 x1
Figure 2: (a) Curved double well potential with minimum energy path. (b) Trajectory
of length 2 (red) and 6 (blue). (c) Implied timescales of the full process.
Setup. The implied timescales5 of the full process for = 0.5 can be seen in Fig-
ure 2 (c). As expected, the system is timescale-separated, with the single slow timescale
representing the mean expected waiting time for a single transition between the wells [3].
We also see that the relaxation time t = 2 falls in between the slow and fast timescales,
so we use it as input for Algorithm 3.2. Indeed, typical trajectories of length 2.0 seem
to sample the local metastable set, yet undergo no transition (Figure 2 (b)). Moreover,
we assume the dimension r = 1 of the effective dynamics to be known.
The transition matrix was computed based on dynamical data that was extracted from
a single well-equilibrated trajectory T = {x0 , x0 , . . .}, with starting point x0 = (1, 0),
step size = 102 and 2 107 steps. We constructed data sets XM , YM of form (8) with
M 2 107 .
The subset of the rectangle [2, 2] [3, 3] that is covered by XM will be taken as
the interesting state space region X where we want to compute N . It is partitioned
into N = 1000 Voronoi cells Ak with centers computed by both the k-means algorithm
(provided by Matlabs kmeans function) as well as Algorithm 3.1 (see Figure 3 (a)).
While the center points based on Algorithm 3.1 cover X evenly (by construction), the
k-means-based center points appear to emphasize the metastable regions and slightly
under-sample the transition regions. As described in Section 3.2, indicator functions
over the Ak will serve as the basis for the Galerkin discretization.
5
Computed by a fine Ulam approximation of the eigenvalues of T t and using the relation (1). Numerical
errors resulting from finite dynamical data introduce a dependence of the timescales on the lag time t
in (1). We compute the timescales at t = 2, as here they are locally almost constant in t.
14
The 2r + 1 = 3 generic observables i : R2 R were chosen as linear functions
i (x) = a|i x with randomly-chosen orthogonal coefficients ai R2 . They were generated
in Matlab by applying Gram Schmidt orthogonalization to the rows of a randomly drawn
matrix:
rng(1); A = rand(3,2); A = GramSchmidt(A) .
However, as in this toy example the embedding dimension is higher than the original
dimension, the third coefficient could not be chosen orthogonal to the other two. Instead,
a3 = (1, 0)| was chosen for simplicity.
Results and analysis. Figure 3 (a) shows the computed reaction coordinate N , eval-
uated at the center points of the Voronoi cells. Note that N is defined and can be
evaluated on all of X, but is constant on the individual Voronoi cells, thus N (X) is
finite.
The points appear to be concentrated around a one-dimensional manifold, which is the
transition manifold M embedded into R3 by . As detailed in [4], a standard manifold
learning method, such as the well-known diffusion maps algorithm [9, 29, 30], can be
used to parametrize the embedded points along this manifold (see the color-coding in
Figure 3 (b)). Calling the parametrization by the leading diffusion maps eigenvector
1 : N (X) R, the map
N (x) := 1 N (x)
can then be used as one-dimensional reaction coordinate. Figure 3 (b) shows that for
both types of Voronoi center points, N parametrizes the transition pathway.
To verify the quality of the one-dimensional reaction coordinate N , we compare
the timescales of the full process Xt and the projected process N (Xt ). The latter
is computed by using the projected trajectory N (T) as dynamical data for a Ulam ap-
proximation of the projected transfer operator Tt (on a regular 50 interval partition
N
of [min N (T), max N (T)]) and using relation (1). Table 1 shows that the reaction co-
ordinates based on the Voronoi centers of both k-means and Algorithm 3.1 significantly
outperform the naively-chosen reaction coordinate (x1 , x2 ) = x1 . Moreover, despite the
seemingly worse approximation behavior at the borders of X, the reaction coordinate
based on Algorithm 3.1 seems to preserve the dominant timescales better. A possible
reason might be the better resolution in the transition region.
Table 1: Implied timescales of the curved double well potential under projection onto
different reaction coordinates.
15
(a) (b) (c)
1
2 2
0.5
3
kMeans
0
x2
x2
0 0
0.5
1
2 2
1
0
1 0.5 1
2 1 0 1 2 0.5 0 2 1 0 1 2
1 1 2
x1 x1
1
2 2
Picking Algorithm
0.5
3
0
x2
x2
0 0
0.5
2 1 2
1
0
2 1 0 1 2 1 0.5 1 2 1 0 1 2
0.5 0
x1 1 1 2 x1
Figure 3: (a) Voronoi center points, computed with k-means clustering and the picking
algorithm, respectively. (b) The three components of the computed reaction coordinate
N , evaluated at the respective Voronoi centers. The coloring indicates the dominant
diffusion maps eigenvector, which is used to construct the one-dimensional reaction co-
ordinate N . (c) N as a continuous function on XM .
4.2. Dialanine
Finally, we show that Algorithm 3.2 can be used to successfully identify good reaction
coordinates in realistic molecular systems. At temperature 400K, the peptide Dialanine
in aqueous solution shows four metastable conformations that can be described by two
essential backbone dihedral angles , (Figure 4). We examine whether our algorithm
can identify reaction coordinates that correlate with , and reproduce the dominant
timescales.
Setup. The molecule consists of 22 atoms (including hydrogen), thus the state space
X is 66-dimensional. The relaxation time t = 20 ps as well as the embedding dimension
16
(b) (c)
(a)
0
0 10 20 30 40
0
0 0 10 20 30 40
t
Figure 4: (a) Dialanine with its two essential dihedral angles and . (b) Ramachan-
dran plot of , , revealing four metastable conformations. (c) Frames from a long
trajectory projected onto and .
r = 2 are assumed to be known. For the dynamical data, a long molecular trajectory T
of Dialanine in explicit water was computed using the MD software Gromacs. To en-
sure the internal conformational switching to be the dominant motion of the system,
the global translational and rotational degrees of freedom (that describe the molecule
floating around in the simulation box) were removed from the trajectory prior to fur-
ther analysis. The simulation of overall length 40 ns (step size 0.002 ps) yielded data
sets XM and YM of size M 2 106 .
Again, N = 1000 Voronoi centers in the region covered by XM were computed using
the k-means algorithm and Algorithm 3.1. The projection of these points onto the
(, )-plane can be seen in Figure 5 (b). Again, the former emphasizes the metastable
sets, whereas the latter covers the range of values more evenly.
For the embedding functions : R66 R5 , linear functions with orthogonal coefficient
vectors were chosen as in Section 4.1.
Results and analysis. Figure 5 visualizes the computed reaction coordinates with
Voronoi center points chosen by the k-means algorithm (top) and Algorithm 3.1 (bot-
tom). The first three of the five components of N , evaluated at the Voronoi center
points, are plotted in Figure 5 (a). The two dominant diffusion map eigenvectors on
the data points, 1 , 2 , are color-coded, and have again been used to construct the
two-dimensional reaction coordinate
1 N (x)
N (x) := .
2 N (x)
As visualized in Figure 5 (b), N shows a clear correlation with the dihedral angles ,
for both methods of choosing the Voronoi center points. The range of N can be seen in
Figure 5 (c).
We again compute the implied timescales of the reduced process N (Xt ). To yield
higher accuracy than the simple box-based Ulam method from the previous example,
17
(a) (b)
24.5
(c)
3
24 0 0.03
41 0
k-means
42 95.5
2 96
97 96.5 0 -0.03
1
2
43
-0.06
24.5
3
-0.09
24 0
42 95.5
2 96
97 96.5 0
43 1
(a) (b)
24.5
3
24 0
(c)
Picking Algorithm
0.03
41
42 95.5 0
2 96
97 96.5 0
43 1
2
-0.03
-0.06
24.5
3
41
42 95.5
2 96
97 96.5 0
43 1
18
we utilize the PyEMMA software package [25] with its built-in methods to discretize
the transfer operator, estimate its eigenvalues and compute the timescales. Again the
implied timescales show a certain dependence on the lag time parameter t (see Equation
(1)) due to numerical inaccuracies. We compute the timescales at the lag time of minimal
local variance, t = 31.96 ps.
Computing the timescales of the full 66-dimensional process with the necessary accu-
racy is numerically infeasible, so a rigorous error analysis is not possible for this system.
Instead, we utilize the variational principle of conformation dynamics [23] which states
that the timescales of the full process are always underestimated by those of any pro-
jection of the process. Thus, larger projected timescales can in general be considered
more accurate. However, due to the possibility of systematic errors in approximating the
projected timescales (discretisation of the transfer operator, finite amount of dynamical
data), this variational principle might be violated in unvetted ways. Thus, we addition-
ally offer a comparison to the timescales of a manually-chosen RC that can be assumed
to be good, namely the backbone dihedrals , . We emphasize that these again only
represent an approximation to the full systems timescales with unknown error.
Using these two error estimators, we compare our RCs N for both the k-means and
the picking algorithm to a two-dimensional TICA projection, a dimensionality reduction
method that is popular in MD analysis [24]. The method finds the directions in the
data sets with maximal global autocorrelation for a specified lag time, and thus always
constructs linear reaction coordinates. For this lag time = 120 ps was chosen as it
maximizes the cumulative kinetic variance (95.5%) [22].
The three (nontrivial) dominant timescales and their deviation from the benchmark
(, )-projection can be seen in Table 2. The remaining timescales ti , i 4 are signifi-
cantly smaller (< 5 ps) and are thus considered irrelevant.
Judging by both the variational principle and the deviation from the benchmark pro-
jection, the k-means Galerkin-RC provides a better approximation of the first two dom-
inant timescales than the TICA projection. The picking algorithm Galerkin-RC yields
a worse approximation of the first, but a better approximation of the second timescale,
compared to TICA.
However, both the new k-means and the picking algorithm Galerkin-RCs significantly
outperform the TICA projection in approximating the third dominant timescale. The
process N (Xt ) can thus be considered a significantly more realistic representation of the
slow processes in Xt than the TICA-projected process.
5. Conclusion
In this paper we introduced a Galerkin approach to the computation of the -optimal
reaction coordinates introduced in [4]. The Galerkin approach is suitable whenever the
transition matrix of a Markov State Model can be computed from available simulation
data, e.g. if the data consists of a long equilibrated trajectory. As with any Galerkin
method, one has to choose a set of basis functions, and this choice is crucial in order
to obtain a small approximation error. We provide two algorithms in order to compute
19
(a) dominant timescales [ps]
(b) rel. error to (, )-projection
timescale t1 t2 t3
timescale t1 t2 t3
k-means 193.10 62.38 41.63
k-means 0.0083 0.0088 0.0088
picking 187.61 61.73 41.25
picking 0.0365 0.0192 0.0004
TICA 191.78 61.27 29.84
TICA 0.0150 0.0264 0.2769
(, ) 194.71 62.93 41.27
Table 2: (a) Implied timescales of the Dialanine system under projection onto different
reaction coordinates. (b) Relative error to the (, )-projection.
a basis of Voronoi ansatz functions directly from the data. Both algorithms are highly
scalable and readily available, making the Voronoi based Galerkin method straightfor-
ward to apply for practitioners who have existing simulation data on their hard drives.
We showed that the resulting approximation error in the dominant time scales is com-
petitive with state of the art dimension reduction techniques. This demonstrates that
the reaction coordinates we compute here can be used to build efficient coarse grained
models. Of course the computed reaction coordinates themselves are also of indepen-
dent value. In the case of the Dipeptide Alanine we showed that our computed reaction
coordinates and the dihedral angles, which are typically used as reaction coordinates for
this system, produce a similar portrait of the system when viewed in this reduced space.
The requirement to have simulation data that samples the invariant distribution is
of course somewhat strict, and in future work we will relax this requirement to a local
version, i.e. we will work with samples that are locally equilibrated in some metastable
region of the state space.
Acknowledgements
This research has been funded by Deutsche Forschungsgemeinschaft (DFG) through
grant CRC 1114 Scaling Cascades in Complex Systems, Project B03 Multilevel coarse
graining of multi-scale problems.
References
[1] D. Aristoff and T. Lelievre. Mathematical analysis of temperature accelerated dy-
namics. 2014.
[3] A. Bianchi, A. Bovier, and D. Ioffe. Pointwise estimates and exponential laws in
metastable systems via coupling methods. The Annals of Probability, 40(1):339371,
01 2012.
20
[4] A. Bittracher, P. Koltai, S. Klus, R. Banisch, M. Dellnitz, and C. Schutte. Transition
manifolds of complex metastable systems: Theory and data-driven computation of
effective dynamics. arXiv preprint arXiv:1704.08927, 2017.
[6] R. Bridson. Fast poisson disk sampling in arbitrary dimensions. In ACM SIG-
GRAPH 2007 Sketches, SIGGRAPH 07, New York, NY, USA, 2007. ACM.
[9] R. R. Coifman and S. Lafon. Diffusion maps. Applied and Computational Harmonic
Analysis, 21(1):530, 2006.
[11] A. K. Faradjian and R. Elber. Computing time scales from reaction coordinates by
milestoning. J. Chem. Phys., 120:1088010889, 2004.
[17] A. Laio and F. L. Gervasio. Metadynamics: a method to simulate rare events and
reconstruct the free energy in biophysics, chemistry and material science. Rep. Prog.
Phys., 71(12):126601, 2008.
21
[18] F. Legoll and T. Lelievre. Effective dynamics using conditional expectations. Non-
linearity, 23(9):2131, 2010.
[21] I. Mezic. Spectral properties of dynamical systems, model reduction and decompo-
sitions. Nonlinear Dynamics, 41(1):309325, 2005.
[22] F. Noe and C. Clementi. Kinetic distance and kinetic maps from molecular dynamics
simulation. J. Chem. Theory Comput., 11(10):50025011, 2015.
[23] F. Noe and F. Nuske. A variational approach to modeling slow processes in stochas-
tic dynamical systems. Multiscale Model. Simul., 11(2):635655, 2013.
[26] C. Schutte and M. Sarich. Metastability and Markov State Models in Molecular
Dynamics. Courant Lecture Notes in Mathematics, 2013.
[29] A. Singer and R. R. Coifman. Non-linear independent component analysis with dif-
fusion maps. Applied and Computational Harmonic Analysis, 25(2):226239, 2008.
22
[31] N. Socci, J. N. Onuchic, and P. G. Wolynes. Diffusive dynamics of the reaction co-
ordinate for protein folding funnels. The Journal of chemical physics, 104(15):5860
5868, 1996.
[34] M. Weber, K. Fackeldey, and C. Schutte. Set-free Markov state model building. J.
Chem. Phys., 146:124133, 2017.
A. Burst-based algorithm
We state an algorithm proposed in [4] for the pointwise computation of at the evaluation
points {x1 , . . . , xN }. The algorithm requires M short trajectories starting at each of the
N evaluation points xi (N M trajectories in total).
B. Objective functions
We show that the objective functions discussed in section 3.2 minimize lower bounds of
the L2 error and the uniform error of respectively.
23
k-means objective function. We show that the objective function S (A1 , . . . , AN ) de-
fined in (12) is an unbiased estimator of k N k2L2 (X) . First, since xi ,
E Ak = E
1 i ) = h1Ak , i .
X
(x
|Ak | h1Ak , 1i
xi Ak
Thus
!2
Z
h1 ,
i
i ) A k2 2k+1 = Ak
X
E k(x k R
(x) d(x)
Ak h1Ak , 1i
xi Ak
Z 2
= N (x) d(x)
(x)
Ak
h1Ak ,i
where the last line follows from N = k
P
h1Ak ,1i 1Ak . Summing over k then gives
as desired.
24