
Chem 860.

Lecture 1
Introduction to molecular simulation
September 2, 2004

Class-related materials

Syllabus
Homework assignments
Project

2 Basic ideas and goals

2.1 What is simulation?

Simulate a process computationally based on a set of pre-determined (physical, mathematical) rules


Molecular simulation: use rules based on physics and chemistry to simulate the behavior
of molecular systems

2.2 What can molecular simulations offer?

Simulation vs. Analytical Theory


Check and refine analytical theories (for a given model, correct simulations give essentially exact
results)
Simulation vs. Experiments
Help to interpret experimental data (spectroscopy, x-ray diffraction)
Study microscopic details of complex processes (protein folding, conformational/reaction
dynamics) that are difficult to obtain experimentally (biological systems, complex materials)

Analyze contributions from different factors using computational experiments that cannot be done in reality
(Vitkup et al. Nat. Struct. Biol. 7, 34, 2000)
Perform high-throughput in silico experiments and make predictions (mutation, ligand
design)
Study processes for which you do not want to (or cannot) do experiments (nuclear meltdown, galaxy
collision)

2.3 Typical observables from molecular simulation and state-of-the-art

Equilibrium quantities (time-independent): either Molecular Dynamics (MD) or Monte Carlo (MC)
Time average vs. Ensemble average - assumed to be equivalent in ergodic systems

$$A_{obs} = \langle A \rangle_{time} = \lim_{t_{obs}\to\infty}\frac{1}{t_{obs}}\int_0^{t_{obs}} A(\Gamma(t))\,dt = \lim_{\tau_{obs}\to\infty}\frac{1}{\tau_{obs}}\sum_{i=1}^{\tau_{obs}} A(\Gamma(t_i)), \qquad \tau_{obs} = t_{obs}/\Delta t$$

$$A_{obs} = \langle A \rangle_{ensemble} = \langle A \,|\, \rho_{ensemble}(\Gamma)\rangle = \sum_{\Gamma} A(\Gamma)\,\rho_{ensemble}(\Gamma)$$

Critical to both MC and MD: the configurations follow the correct distribution $\rho(\Gamma)$, which
is usually the Boltzmann distribution. (In sophisticated MC/MD schemes, the generated
distribution is purposely NOT Boltzmann, but the results are appropriately reweighted.)
Structures (average, fluctuations) of liquids and (bio)molecules (radial distribution function,
structure factor, B-factors...)

$$g(r) = \frac{V}{N^2}\Big\langle \sum_{i}\sum_{j\neq i}\delta(\mathbf{r} - \mathbf{r}_{ij})\Big\rangle$$

$$S(k) = N^{-1}\langle \rho(\mathbf{k})\rho(-\mathbf{k})\rangle = 1 + \rho\,\tilde{g}(\mathbf{k})$$
Thermodynamics (temperature, pressure, free energy, entropy, heat capacity....)

$$T = \frac{2}{3(N - N_c)k_B}\Big\langle \sum_i \frac{p_i^2}{2m_i}\Big\rangle$$

$$P = \rho k_B T - \frac{1}{3V}\Big\langle \sum_i \mathbf{r}_i\cdot\nabla_i U\Big\rangle$$

$$C_V = \frac{\langle(\delta E)^2\rangle}{k_B T^2}$$

Dynamical quantities (time-dependent): Molecular Dynamics for $C_{AB} = \langle A(t)B(0)\rangle$
Time-correlation functions: transport properties; spectroscopy; rates

$$D = \frac{1}{3}\int_0^{\infty}\langle \mathbf{v}(t)\cdot\mathbf{v}(0)\rangle\,dt$$

$$\eta = \frac{V}{k_B T}\int_0^{\infty}\langle P_{xy}(t) P_{xy}(0)\rangle\,dt$$

$$I(\omega) = \frac{1}{2\pi}\int dt\, e^{i\omega t}\,\langle \mathbf{M}(t)\cdot\mathbf{M}(0)\rangle$$

2.4 What are the limitations of molecular simulations?

Reliability of the model (e.g., accuracy of the molecular force field) - Ab initio Molecular
Dynamics (related: Car-Parrinello MD)
Scale of systems (e.g., length and time scales) - million atoms; micro-second MD (Kollman
et al.; Pande et al.).

Useful references and homework assignment

Chapter 2 in Frenkel and Smit. 2nd Ed.


Do Question 4 (1D Ising model at zero field) to practice your stat. mech.
Two review articles by M. Karplus separated by one decade: Nature, 347, 631 (1991); Nat.
Struct. Biol. 9, 646 (2002)

Chem 860. Lecture 2


The basic elements: intermolecular interactions and molecular force field
September 15, 2004

$$U(\{\mathbf{r}_i\},\{\mathbf{x}_A\}) = \langle \Psi(\{\mathbf{r}_i;\mathbf{x}_A\})|\hat{H}_{el}|\Psi(\{\mathbf{r}_i;\mathbf{x}_A\})\rangle + \sum_{A<B}\frac{Z_A Z_B}{|\mathbf{x}_A - \mathbf{x}_B|}$$

Although it is possible to compute U with reasonable accuracy with recent developments in
electronic structure theories (e.g., linear-scaling methods), it remains challenging to compute
U for the large number of configurations needed for converged simulation results (unconverged
simulations are rarely meaningful!).
Often U is approximated with empirical expressions (a force field).
Non-bonded terms: often assumed to be effectively pair-wise additive (higher-order terms are
approximately taken into account through parameterization).
Electrostatics (the magnitudes of the charges are not necessarily physical!)
van der Waals (Lennard-Jones)
Bonded terms: simple harmonic/Fourier expansions - so do NOT allow chemical reactions!
Bond
Angle
Dihedral
Others (Urey-Bradley, Improper)
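To make the structure of such a force field concrete, here is a minimal Python sketch of the kinds of terms listed above; the functional forms follow common conventions (CHARMM-style harmonic bonds, epsilon/Rmin Lennard-Jones), but all numerical parameters are made up for illustration:

import numpy as np

def bond_energy(r, k_b, r0):
    # bonded term: harmonic stretch, U = k_b (r - r0)^2
    return k_b * (r - r0) ** 2

def lj_energy(r, eps, rmin):
    # van der Waals term: U = eps [ (rmin/r)^12 - 2 (rmin/r)^6 ]
    x6 = (rmin / r) ** 6
    return eps * (x6 ** 2 - 2.0 * x6)

def coulomb_energy(r, qi, qj, coul_const=332.0716):
    # electrostatic term; 332.0716 converts e^2/Angstrom to kcal/mol
    return coul_const * qi * qj / r

# illustrative (non-physical) parameters: one stretched O-H bond and one non-bonded pair
print(bond_energy(1.01, 450.0, 0.9572),
      lj_energy(3.5, 0.15, 3.55) + coulomb_energy(3.5, 0.42, -0.42))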
Popular force field for biomolecules
CHARMM (Karplus et al.)
AMBER (Kollman et al.)
OPLS (Jorgensen et al.)
A very useful review on the status of modern force fields and on-going developments: Ponder,
J. W.; Case, D. Adv. Prot. Chem. 66, 27 (2003)

Chem 860. Lecture 3


The basic algorithm for molecular dynamics simulations
September 15, 2004

Basic procedures of MD

The basic idea of MD simulation is the same as that of a typical experiment:


sample preparation → measurement → measurement → measurement → ... (each separated by a time interval Δt)


For MD simulation, the pseudo-code looks like the following:
program md
call init
t=0
do while (t.lt.tmax)
call force
call integrate
t = t + dt
call sample
enddo
initialization
position x_i(0): random displacements around the lattice for liquids; x-ray, NMR structures
for biomolecules
velocity v_i(0): either uniformly random, or drawn from a Gaussian distribution (following the
Maxwellian distribution: $f(v_{i,\alpha}) \propto e^{-m_i v_{i,\alpha}^2/(2 k_B T)}$). Use the equipartition theorem to adjust (scale)
the velocities according to the temperature:

$$k_B T = \frac{1}{3N}\sum_{i=1}^{N} m_i v_i^2$$
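A minimal Python sketch of this initialization step (kB in kcal/(mol K), hypothetical masses; not the actual CHARMM routine):

import numpy as np

def initialize_velocities(masses, T, kB=0.0019872041):
    # draw each component from the Maxwellian f(v) ~ exp(-m v^2 / 2 kB T)
    v = np.random.randn(len(masses), 3) * np.sqrt(kB * T / masses)[:, None]
    v -= v.mean(axis=0)                                # remove net drift in the velocities
    T_inst = np.sum(masses[:, None] * v ** 2) / (3.0 * len(masses) * kB)
    return v * np.sqrt(T / T_inst)                     # rescale via equipartition

v0 = initialize_velocities(np.full(125, 18.0), 300.0)  # e.g., 125 water-like masses at 300 K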

integration of the equation of motion

Use classical mechanics (Newton's 2nd law). Total energy is conserved, so we are sampling the
microcanonical ensemble. This gives (in the thermodynamic limit) the same result as any
canonical-ensemble based sampling (which is VERY easy to achieve with Monte Carlo!).
The most basic integration procedure is the Verlet algorithm:

$$x_i(t + \Delta t) = x_i(t) + v_i(t)\Delta t + \frac{f_i(t)}{2 m_i}\Delta t^2 + \frac{1}{3!}\frac{d^3 x_i}{dt^3}\Delta t^3 + O(\Delta t^4)$$

$$x_i(t - \Delta t) = x_i(t) - v_i(t)\Delta t + \frac{f_i(t)}{2 m_i}\Delta t^2 - \frac{1}{3!}\frac{d^3 x_i}{dt^3}\Delta t^3 + O(\Delta t^4)$$

Summing them up, one obtains the Verlet integration (propagation) scheme:

$$x_i(t + \Delta t) = 2 x_i(t) - x_i(t - \Delta t) + \frac{f_i(t)}{m_i}\Delta t^2 + O(\Delta t^4)$$

Note that velocities do NOT enter the propagation scheme, although they can be estimated
(for the sake of monitoring total energy, temperature, etc.):

$$v_i(t) = \frac{x_i(t + \Delta t) - x_i(t - \Delta t)}{2\Delta t} + O(\Delta t^2)$$
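As an illustration, a minimal Python sketch of this propagation scheme applied to a one-dimensional harmonic oscillator (hypothetical parameters):

def verlet_step(x, x_prev, f, m, dt):
    # x(t+dt) = 2 x(t) - x(t-dt) + f(t)/m dt^2 ; velocity is only estimated for monitoring
    x_new = 2.0 * x - x_prev + (f / m) * dt ** 2
    v_est = (x_new - x_prev) / (2.0 * dt)
    return x_new, v_est

k, m, dt = 1.0, 1.0, 0.05                      # force f(x) = -k x
x = 1.0                                        # start at rest at x = 1
x_prev = x - 0.5 * (k * x / m) * dt ** 2       # x(-dt) from a Taylor expansion
for step in range(1000):
    x_new, v = verlet_step(x, x_prev, -k * x, m, dt)
    x_prev, x = x, x_new
print(x, v, 0.5 * m * v ** 2 + 0.5 * k * x ** 2)   # total energy stays close to 0.5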

How large can Δt be? Rule of thumb: 1/20th of the period of the fastest vibration. For molecular systems,
this corresponds to Δt ≈ 0.5 fs. One can freeze certain high-frequency motions that are not
of interest (e.g., SHAKE), and use larger time steps (≈ 2 fs).
An interesting point is that due to the error in propagation schemes, limits in machine
precision, and the fact that most systems under study are chaotic (i.e., a small perturbation in
the initial condition changes the trajectory dramatically after a certain time), the computed
trajectories are NOT the real trajectories that correspond to the precise initial condition.
However, since we care only about the statistical behavior, satisfactory results are obtained
as long as the simulated trajectories are sufficiently close to some trajectories of the real
system, although we do not know a priori which ones. Although not proven in general, people have shown
the existence of shadow orbits in chaotic systems - i.e., real trajectories that stay close
to simulated trajectories for times longer than expected based on the Lyapunov exponent of
the system.

2 Homework

2.1 Pencil exercises: From Frenkel and Smit, Chap. 5

1. Show that the Verlet and velocity Verlet algorithms lead to identical trajectories.
2. Derive the Leap-Frog Algorithm by using Taylor expansions for v(t + Δt/2), v(t − Δt/2), x(t + Δt) and x(t).

2.2 Computational exercises

Log on to bohr, familiarize yourself with the computational environment. Copy files in
qiang/CHEM 860/Assignment/hw1, run the calculation with CHARMM. At this point, try
to simply understand the basic structure of CHARMM input script, output and keywords
(documentation files are in CHARM 860/doc/; pay more attention to dynamc.doc). We will
make longer runs and analysis (e.g., g(r), diffusion constant) in the future. Try a few different
integrators in CHARMM (Verlet, Leapfrog, velocity Verlet) and compare their speed and
accuracy (conservation of energy, stability of temperature etc.). Plot the evolution of total
energy, kinetic energy, potential energy as a function of time. Is the temperature for the
production run stable?
You can visualize the trajectory with VMD. When loading the molecule, use the option: psf
and dcd.
Refer to the course website (http://kandinsky.chem.wisc.edu/qiang/chem 860.html) for
how to run CHARMM using departmental computers.

Chem 860. Lecture 4,5


Algorithms for MD-II: Constrained dynamics and force calculations
September 24, 2004

Constrained dynamics

Recall the Verlet integration (propagation) scheme:

$$x_i(t + \Delta t) = 2 x_i(t) - x_i(t - \Delta t) + \frac{f_i(t)}{m_i}\Delta t^2 + O(\Delta t^4)$$

Rule of thumb for Δt: 1/20th of the period of the fastest vibration. For molecular systems, this corresponds to Δt ≈ 0.5 fs.
Methods for more efficient MD calculations:
Multiple time steps: different integration steps for different types of degrees of freedom
(details later); e.g., a short Δt for bond stretches, an intermediate value for bending, and a large
value for torsions.
Freeze high-frequency motions that are not of interest (e.g., X-H bonds), which allows one to
use larger time steps (≈ 2 fs).
The general scheme for including constraints in the dynamics is the Lagrange multiplier approach. Given a set of constraints {σ_α}, e.g.,

$$\sigma_\alpha = r_{\alpha;ij}^2 - d_\alpha^2$$
one modifies the Lagrangian of the system,


$$L' = L + \sum_\alpha \lambda_\alpha \sigma_\alpha = T - U + \sum_\alpha \lambda_\alpha \sigma_\alpha$$

Substituting this into the Lagrangian equation of motion, one obtains:

$$m_i \frac{d^2 x_i(t)}{dt^2} = f_i(t) + \sum_\alpha \lambda_\alpha \nabla_i \sigma_\alpha(t) = f_i(t) + g_i(t)$$

The physical meaning of this is very clear - a new set of generalized forces (g_i) is needed
to maintain the constraints during the evolution of the system.
The {λ_α} are the set of Lagrange multipliers whose values need to be determined. This
can be accomplished, theoretically, using the condition that the second time derivatives of
the constraints are zero.
A simple example: a free particle on a ring. The constraint is:

$$\sigma = \frac{1}{2}(r^2 - d^2)$$

Thus the equation of motion is (no other external force):

$$m\frac{d^2\mathbf{r}(t)}{dt^2} = \lambda\nabla\sigma(t) = \lambda\,\mathbf{r}(t)$$

The condition that $d^2\sigma/dt^2 = 0$ gives

$$\mathbf{r}\cdot\frac{d^2\mathbf{r}}{dt^2} + \Big(\frac{d\mathbf{r}}{dt}\Big)^2 = 0$$

Substituting this into the equation of motion (dotting r onto both sides of the equation), one obtains:

$$\lambda = -\frac{m v^2}{r^2}$$

The generalized force is therefore

$$\mathbf{g} = \lambda\,\mathbf{r} = -\frac{m v^2}{r}\hat{\mathbf{r}}$$

which is the expected centripetal force.
An important point is that this scheme of solving for λ does NOT work in realistic MD
simulations because the integration schemes that we use are approximate. In fact, this
scheme leads to exponential deviation from the constraint during MD simulations.
The practical approach is a predictor-corrector scheme based on the specific integration
algorithm. In the presence of the generalized force, the Verlet integration scheme takes the form

$$x_i(t+\Delta t) = 2 x_i(t) - x_i(t-\Delta t) + \frac{f_i(t)}{m_i}\Delta t^2 + \frac{\sum_\alpha \lambda_\alpha \nabla_i\sigma_\alpha(t)}{m_i}\Delta t^2 = x_i^{uncons}(t+\Delta t) + x_i^{corr}(\{\lambda_\alpha\})$$

We solve for {λ_α} by imposing that the coordinates {x_i(t + Δt)} satisfy the constraints ({σ_α})
rigorously (i.e., we correct the predicted unconstrained values x_i^{uncons}(t + Δt)). In the
presence of multiple constraints, the set of {λ_α} has to be solved for iteratively.
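As a concrete illustration, here is a Python sketch of this predictor-corrector idea for a single distance constraint between two particles (the essence of SHAKE; a hypothetical example, not the CHARMM implementation):

import numpy as np

def shake_pair(x1, x2, x1_old, x2_old, m1, m2, d, tol=1e-10, max_iter=100):
    # iteratively adjust the unconstrained Verlet positions x1, x2 so that |x1 - x2| = d;
    # the correction is applied along the OLD bond vector, as in the scheme above
    for _ in range(max_iter):
        r = x1 - x2
        diff = np.dot(r, r) - d ** 2
        if abs(diff) < tol:
            break
        r_old = x1_old - x2_old
        g = diff / (2.0 * (1.0 / m1 + 1.0 / m2) * np.dot(r, r_old))
        x1 = x1 - (g / m1) * r_old
        x2 = x2 + (g / m2) * r_old
    return x1, x2

# e.g., restore an O-H distance of 0.9572 (Angstrom) after an unconstrained step
x1, x2 = shake_pair(np.array([0.02, 0.0, 0.0]), np.array([0.99, 0.05, 0.0]),
                    np.zeros(3), np.array([0.9572, 0.0, 0.0]), 16.0, 1.0, 0.9572)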
2 Boundary condition and Force calculations

To minimize the effect of the boundary (the fraction of molecules on the surface scales as N^{-1/3}), a popular
choice is the periodic boundary condition (PBC). One has to choose the size of the periodic
cells carefully based on the problem at hand. For example, in the presence of long-range
correlation (during a phase transition), the size of the cell needs to be larger than the length-scale of the correlation. Even in more standard simulations, too small a cell size would
introduce artifacts such as over-stabilization of a particular set of conformations or non-physical anisotropy in the liquid structure. A useful reference: McCammon et al. J. Phys.
Chem. B 104, 3668 (2000)
Force calculations: to avoid the quadratic scaling of long-range pair-wise force calculations,
typically a cut-off in the interaction potentials is used. In choosing the cut-off scheme, one needs
to pay attention to the continuity of the potential energy and force. Above all, however, the
cut-off distance should be sufficiently large (e.g., 12 Å for typical biomolecular simulations).
An important component of the cut-off scheme is the generation of the non-bond list, which
is updated with a given (or heuristic) frequency. For large-scale simulations, the generation
of the non-bond list can contribute significantly to the cost of the simulation. Therefore, one should
pay attention to efficient schemes in those cases (e.g., cell-based).
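A minimal Python sketch of how a distance-based non-bond list might be built under cubic PBC with the minimum image convention (simple O(N^2) version; the cell-based schemes mentioned above are preferred for large systems; the box size and cutoff below are made up):

import numpy as np

def nonbond_list(coords, L, r_cut):
    # pairwise minimum-image distances in a cubic box of side L
    diff = coords[:, None, :] - coords[None, :, :]
    diff -= L * np.round(diff / L)                 # wrap each component into [-L/2, L/2)
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    i, j = np.triu_indices(len(coords), k=1)       # unique pairs only
    keep = dist[i, j] < r_cut
    return list(zip(i[keep], j[keep]))

coords = np.random.uniform(0.0, 25.0, size=(200, 3))    # a made-up 25 Angstrom box
pairs = nonbond_list(coords, 25.0, 12.0)                # 12 Angstrom cutoff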

3 Homework

3.1 Pencil exercises

In explicit solvent simulations, most of the computer time is spent on dealing with water
molecules (you need a lot of them to solvate a biomolecule and to set up a sufficiently large
box with PBC). Therefore, it is useful to understand how to propagate their equations of
motion efficiently with the SHAKE algorithm. The typical set-up is to describe the internal
structure of water by three bond vectors (2 OH and 1 HH). Write out the equations for
propagating the trajectory using the Verlet scheme with these bonds constrained; i.e., how
many Lagrange multipliers do you need, and how do you solve for their values? Discuss
briefly how such procedures can be applied to simulations that contain many water molecules.
Useful discussions can be found in Allen and Tildesley, Chap. 3.

3.2 Computational exercises

We emphasized the importance of cut-off schemes in simulations. Here we explore this with a
simple molecular complex that contains two NMA molecules (N-methyl acetamide, the standard model for the peptide bond). Use the CHARMM script in cui/CHEM 860/Assignments/hw2/
to compute the potential energy profile as a function of the distance between the two NMA
molecules; explore how the results vary with different non-bond cut-off schemes (e.g., SHIFT,
FSHIFT, SWITCH, FSWITCH for electrostatics; VSHIFT, VSWITCH for LJ interactions)
and cut-off distances (refer to doc/nbond.doc for keyword options). Also pick an atom and
monitor its force components as a function of distance. Discuss your findings - a very useful
reference is B. Brooks et al. J. Comput. Chem. 15, 667 (1994)

Chem 860. Lecture 6


Algorithms for MD-III: Statistical error analysis
September 27, 2004

Analysis of statistical error

By statistical error, we mean that simulations may not be sufficiently long to contain enough
statistically uncorrelated data to give accurate results. We imagine that results from different
simulations distribute around the true value (for a given potential function), and our goal is
to estimate the width of this distribution. We assume that the quantity that we are interested
in follows Gaussian statistics, so that we only need to estimate the second moment. This
Gaussian assumption is a reasonably good one, according to the central limit theorem, if the
property is an average (sum) over many data points.
The most common quantity is expressed as the following average,
$$\langle A \rangle_{run} = \frac{1}{N}\sum_i A(t_i)$$

If we assume that all the data points are uncorrelated, then we have
$$\sigma^2(\langle A \rangle_{run}) = \frac{1}{N}\sigma^2(A) = \frac{1}{N}\big(\langle A^2\rangle - \langle A\rangle^2\big)$$

The error is simply given by σ(⟨A⟩_run). The result makes perfect sense: the fluctuation
in the average is much smaller than the fluctuation in the data points! However, this simple
estimate is too optimistic, because we assumed that all the data points are uncorrelated -
which is not true - data points from MD simulations are likely to be correlated if they are
not too far apart in time.

Block average

To better estimate errors, we use block average - which simply stated is a scheme that
attempts to make sure that data points are uncorrelated. We separate data into nb different

blocks, with each block containing Nb data, so we have N = nb Nb . We then compute the
average over each block,
$$\langle A \rangle_b = \frac{1}{N_b}\sum_{i=1}^{N_b} A(t_i)$$
We then compute the second moments of < A >b , (note that the average is NOT affected
by the blocking procedure)
$$\sigma^2(\langle A \rangle_b) = \frac{1}{n_b}\sum_{i=1}^{n_b}\big[\langle A \rangle_b(i) - \langle A \rangle_{run}\big]^2$$

We repeat this procedure for different values of N_b = 2, 4, 8, .... We then plot $\sigma^2(\langle A \rangle_b)/n_b$
vs. N_b (or n_b), which should reach a plateau when the blocked data points ⟨A⟩_b become
truly decorrelated from each other. In that case, we readily get

$$\sigma^2(\langle A \rangle_{run}) = \frac{1}{n_b}\sigma^2(\langle A \rangle_b)$$

When doing the block-average, you will appreciate how poor (or optimistic) the naive 1/N
estimate is.
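A small Python sketch of the block-averaging procedure described above (synthetic correlated data for illustration):

import numpy as np

def block_average(data, n_blocks):
    # split the time series into n_blocks blocks, average each, and estimate the
    # error on the overall mean from the scatter of the block averages
    usable = len(data) - len(data) % n_blocks
    blocks = data[:usable].reshape(n_blocks, -1).mean(axis=1)   # <A>_b for each block
    return data[:usable].mean(), np.sqrt(blocks.var(ddof=1) / n_blocks)

# correlated toy data: the naive 1/N error estimate would be far too optimistic here
a = np.sin(np.arange(200000) * 0.001) + 0.1 * np.random.randn(200000)
for n_blocks in (1000, 100, 20):             # larger blocks correspond to fewer blocks
    print(n_blocks, block_average(a, n_blocks))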

Size and length dependence of statistical error

The error σ(⟨A⟩) scales as $1/\sqrt{N}$ in terms of simulation length. So if one wants to
decrease the error by a factor of 2, the length of the simulation has to be extended by a factor
of 4.
For properties that can be written as a sum over particles in the system, i.e.,

$$\langle A \rangle = \sum_{i=1}^{M} a_i = M\langle a \rangle$$

if one assumes that there is no correlation between different a_i, one can show straightforwardly that

$$\frac{\sigma^2\langle A \rangle}{\langle A \rangle^2} = \frac{1}{M}\,\frac{\langle a^2\rangle - \langle a\rangle^2}{\langle a\rangle^2}$$

So once again, the relative error $\sigma\langle A \rangle/\langle A \rangle$ scales as $1/\sqrt{M}$. In other words, if one wants to
decrease the error by a factor of 2, the size of the simulation has to be extended by a factor of 4.

Chem 860. Lecture 7


Protein simulations. I. Structural properties
September 29, 2004

Overall structures

One useful quantity to monitor during the simulation is the root-mean-square deviation (RMSD):

$$RMSD = \Big[\frac{1}{N}\sum_{i=1}^{N}\big(\mathbf{r}_i - \mathbf{r}_i^{ref}\big)^2\Big]^{1/2}$$

where N is the number of atoms being monitored. Often one reports the RMSD for the backbone
atoms (C, O, N, Cα) and for all heavy (i.e., non-hydrogen) atoms. The reference structure
is typically from x-ray or NMR measurements.
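A minimal Python sketch of this quantity (it assumes the two structures have already been superimposed, e.g., with CHARMM's coordinate orientation commands or in VMD; no fitting is done here):

import numpy as np

def rmsd(coords, coords_ref):
    # coords, coords_ref: (N, 3) arrays of the monitored atoms, already aligned
    return np.sqrt(((coords - coords_ref) ** 2).sum(axis=1).mean())

print(rmsd(np.random.rand(10, 3), np.random.rand(10, 3)))   # toy example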
A number of examples where the overall structure of the protein(s) from MD simulation is
of interest:
Benchmark new methods (force field, implicit solvent models etc.) - compare RMSD to
high resolution structures from x-ray or NMR.
Characterize structures in different environment - e.g., crystal vs. solution; temperature
dependence; effect of small solutes.
Structural refinement of experimental data - using experimental data as constraints (e.g.,
NOEs, H/D exchange, Φ-values for residues in protein folding...). A nice example of sampling
rare fluctuations in protein structure is: M. Vendruscolo, E. Paci, C. M. Dobson, M. Karplus,
J. Am. Chem. Soc. 125, 15686 (2003). See also M. Vendruscolo, E. Paci, C. M. Dobson,
M. Karplus, Nature, 409, 641 (2001).
Improvements over coarse-grained simulations: use simple models to efficiently sample
conformational space (e.g., peptide aggregation), and all-atom MD to refine detailed structures.

Structural fluctuations

Proteins are not static - they have substantial size of thermal fluctuations at room temperature! A classical paper that you should definitely read: A. J. McCammon, B. R. Gelin, M.
1

Karplus, Nature, 267, 585 (1977)

2.1 Isotropic and anisotropic fluctuations of individual atoms

Root-mean-square fluctuation:

$$RMSF(i) = \big\langle (\Delta\mathbf{r}_i)^2 \big\rangle^{1/2}$$

which is also referred to as the root-mean-square displacement.
As summarized in Proteins: A theoretical perspective of dynamics, structure and thermodynamics, C. L. Brooks III, M. Karplus, B. M. Pettitt, Adv. Chem. Phys. LXXI (1988):
All atoms have RMSF on the order of 0.5 Å or larger. Sidechains have larger fluctuations
than mainchains, and atoms on the surface have larger fluctuations than those in the interior.
Atomic motions are highly anharmonic (non-vanishing higher moments of the position), and
are highly anisotropic. The anisotropy is characterized by several parameters, one of which
is

$$A_1 = \Big[\frac{\langle \Delta x^2\rangle}{0.5\,(\langle \Delta y^2\rangle + \langle \Delta z^2\rangle)}\Big]^{1/2} \geq 1.0$$

where $\langle \Delta x^2\rangle$ ($\langle \Delta y^2\rangle$, $\langle \Delta z^2\rangle$) is the fluctuation of the position along x (y, z) (to be more precise, one should
use the principal axes of the thermal fluctuations, which can be obtained by diagonalizing the
covariance matrix for a given atom).
A cute piece of work is by Zhou, Vitkup and Karplus, J. Mol. Biol. 285, 1371 (1999).
The quantity investigated is the so-called Lindemann parameter

$$\Delta_L = \frac{\sqrt{\sum_i \langle \Delta r_i^2\rangle / N}}{a_0}$$

where a_0 is the most typical distance between non-bonded neighbors. In their study, a_0 was
taken to be 4.5 Å.
The picture that emerged from the study is a physically meaningful one: the interior of the protein
is solid-like (Δ_L < 0.15), while the surface of the protein is liquid-like or molten-solid (Δ_L > 0.15).
Evidently, the flexibility associated with the surface is crucial to the function of the protein.
The thermal denaturation and glass transition in proteins are correlated with the melting
of the solid core and the freezing of the molten surface, respectively.

2.2 Correlated fluctuations and principal components

Protein atoms may exhibit long-range correlated fluctuations, which is one of the most
intriguing aspects of protein behavior and is crucial to function. Correlation can be characterized with the covariance matrix,

$$\sigma_{i\alpha,j\beta} = \big\langle (r_{i\alpha} - \langle r_{i\alpha}\rangle)(r_{j\beta} - \langle r_{j\beta}\rangle)\big\rangle$$

where i, j label atoms and α, β label coordinate components (x, y, z).
The diagonal (same-atom) blocks characterize individual fluctuations (e.g., ⟨Δx_i²⟩, ⟨Δx_iΔy_i⟩).
For different atoms, the displacements are said to be positively correlated if σ_{iα,jβ} is larger
than zero, and negatively correlated otherwise (uncorrelated if zero). See nice examples in
T. Ichiye, M. Karplus, Proteins, 11, 205, 1991; also J. L. Radkiewicz, C. L. Brooks, III, J.
Am. Chem. Soc. 122, 225, 2000.
Principal components can be obtained by diagonalizing the covariance matrix. The component associated with the largest eigenvalue corresponds to the direction that contains the
most fluctuation. Several principal components can be used as a dimension-reduction tool
for analyzing the essential motions of molecules (see, e.g., L. Caves, J. D. Evanseck, M.
Karplus, Proteins, 7, 649, 1998).
In many ways, principal components can be thought of as a generalization of normal mode
analysis - both correspond to some sort of linear coordinate transformation that
decomposes atomic motions into uncorrelated sets. Principal components with large eigenvalues are highly correlated with the normal modes with low frequencies. In fact, in the
so-called quasiharmonic analysis (M. Karplus, J. N. Kushick, Macromolecules, 14, 325,
1981; T. Ichiye, M. Karplus, Proteins, 11, 205, 1991; I. Andricioaei, M. Karplus, J. Chem.
Phys., 115, 6289 (2001)), one diagonalizes the mass-weighted effective Hessian,

$$H^{eff} = k_B T\,\sigma^{-1}$$

which can be understood heuristically by comparison with a one-dimensional harmonic oscillator, for which

$$\langle x^2\rangle = \frac{k_B T}{m\omega^2}$$

Therefore, the eigenvalues of H^{eff} are the squares of a set of effective frequencies. Compared
to normal mode analysis, the quasi-harmonic analysis considers multiple structures and
anharmonic motions of atoms.
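A short Python sketch of the covariance/principal-component analysis described above (toy data; alignment to remove overall rotation/translation is assumed to have been done beforehand):

import numpy as np

def principal_components(traj):
    # traj: (n_frames, n_atoms, 3) -> covariance matrix over the 3N coordinates
    X = traj.reshape(len(traj), -1)
    dX = X - X.mean(axis=0)
    cov = dX.T @ dX / len(traj)
    evals, evecs = np.linalg.eigh(cov)              # eigenvalues in ascending order
    return cov, evals[::-1], evecs[:, ::-1]         # largest-fluctuation modes first

traj = 0.3 * np.random.randn(500, 10, 3)            # 500 frames, 10 atoms (made up)
cov, evals, evecs = principal_components(traj)
print(evals[:3])                                     # variance along the top 3 PCs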

3 Homework

3.1 Pencil exercises

Covariance matrix and thermodynamics

Propose a way to estimate the entropy of a macromolecular system based on the covariance matrix from a molecular dynamics simulation. Hint: one may obtain a set of effective
frequencies from the quasi-harmonic analysis. The system can be approximated as a set of
harmonic oscillators with those effective frequencies. Give the explicit expression that relates
the entropy to the covariance matrix; discuss its value and limitations. Bonus: is the method
useful for estimating nuclear quantum mechanical contributions to the entropy?

3.2 Computational exercises

In the Assignment/hw3 directory, there is a CHARMM input script for performing a short
MD simulation for a simple hairpin with an implicit solvent model (EEF1, which will be
discussed briefly in the near future). Modify the input file such that you carry out 10 ps
of equilibration and 50 ps of production run. The calculation should take about 6 minutes
of CPU time. Once again, you can use VMD to visualize the MD simulation. Note the
fluctuations in temperature and the typical frequency of updating the non-bond list. Please
remove the dcd (trajectory) files after you finish your analysis - to avoid filling up the home
diskspace.
Statistical error analysis
Store the potential energy (of the production run) into a file by specifying the KUNIT value
and an appropriate NPRINT value (10) in the dynamics section. Estimate the average
and statistical error in the total potential energy using the block average technique. How
different is the result compared to the value without any block averaging? You need to
write a simple piece of code that reads in the data from data/hair2.ene and performs block
averaging. If you have difficulty, come and see me.
Structural properties
Plot the following quantities associated with the simulation:
1. RMSD for the backbone atoms relative to the starting structure as a function of time.
2. RMSD for the backbone atoms in the averaged MD structure relative to the the starting
structure. Make a figure to illustrate this by overlaying the two structures in VMD.
3. RMSF for Cα atoms. Bonus: one can map the RMSF onto the structure in VMD (i.e.,
different colors for different magnitudes of fluctuation).
4. Covariance matrix for Cα atoms; discuss the qualitative pattern you see. In general, it
takes a long time to get a converged covariance matrix (because non-local values are influenced
by slow fluctuations).
5. End-to-end distance ($|\mathbf{r}_1^{C\alpha} - \mathbf{r}_N^{C\alpha}|$); main-chain hydrogen bonds (i.e., between NH and CO).
6. Radius of gyration ($R_g = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\langle(\mathbf{r}_i - \mathbf{r}_G)^2\rangle}$, where $\mathbf{r}_G$ is the center of mass). We
usually use Cα atoms only.

Chem 860. Lecture 9,10


Structural & Dynamical properties; preliminary free energy simulations
October 10, 2004

Structural properties. Radial distribution function

Consider a simple monoatomic fluid:

Joint probability distribution for finding a specific particle 1 at position r_1 and specific
particle 2 at r_2:

$$P^{(2,N)}(\mathbf{r}_1,\mathbf{r}_2) = \int d\mathbf{r}_3 \int d\mathbf{r}_4 \cdots \int d\mathbf{r}_N\, P(\mathbf{r}^N)$$

where P(r^N) is the Boltzmann distribution for a given configuration {r^N}.
Joint probability distribution for finding any particle at position r_1 and any other particle
at r_2:

$$\rho^{(2,N)}(\mathbf{r}_1,\mathbf{r}_2) = N(N-1)\,P^{(2,N)}(\mathbf{r}_1,\mathbf{r}_2)$$

For an ideal gas,

$$\rho^{(2,N)}(\mathbf{r}_1,\mathbf{r}_2) = N(N-1)\,P^{(1,N)}P^{(1,N)} = N(N-1)/V^2 \approx \rho^2$$

where ρ is the bulk number density.
For a liquid, the radial distribution function is defined as the ratio of ρ^{(2,N)}(r_1, r_2) and ρ², to
emphasize the difference between liquid and gas:

$$g(\mathbf{r}_1,\mathbf{r}_2) = \rho^{(2,N)}(\mathbf{r}_1,\mathbf{r}_2)/\rho^2$$
For an isotropic liquid, g(r_1, r_2) = g(r). Not surprisingly, g(r) approaches 1 as r goes to ∞.
One can re-write the definition of g(r) and think about its physical meaning:

$$\rho^{(2,N)}(0,\mathbf{r})/\rho = \rho\, g(r)$$

The l.h.s. gives the conditional probability (density) that a particle is found at r given that another is
at the origin.
The r.h.s. gives the average density of particles at r given that a tagged particle is at the
origin.
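A minimal Python sketch of how g(r) is estimated in practice from a single configuration in a cubic box (minimum image convention; averaging over frames and over atom types, as in the later homework, is omitted and the positions below are random, not a real liquid):

import numpy as np

def g_of_r(coords, L, n_bins=100):
    N = len(coords)
    diff = coords[:, None, :] - coords[None, :, :]
    diff -= L * np.round(diff / L)                                 # minimum image
    d = np.sqrt((diff ** 2).sum(-1))[np.triu_indices(N, k=1)]      # unique pair distances
    hist, edges = np.histogram(d, bins=n_bins, range=(0.0, L / 2.0))
    r = 0.5 * (edges[1:] + edges[:-1])
    shell = 4.0 * np.pi * r ** 2 * (edges[1] - edges[0])           # spherical shell volume
    rho = N / L ** 3
    return r, hist / (0.5 * N * rho * shell)                       # normalize to 1 at large r

r, g = g_of_r(np.random.uniform(0.0, 26.3, (600, 3)), 26.3)        # roughly water-like density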

Dynamical properties. Time correlation function

Onsager regression hypothesis: spontaneous fluctuations in an equilibrium system regress
back to equilibrium according to the same relaxation equation that describes the macroscopic relaxation, provided that the perturbation to the system is very weak. In other words,
by observing fluctuations in equilibrium simulations, one can learn about dynamical properties.
Time correlation functions $C_{AB}(\tau)$, $C_{AA}(\tau)$:

$$C_{AA}(\tau) = \langle A(0)A(\tau)\rangle = \lim_{T\to\infty}\frac{1}{T}\int_0^{T} dt\, A(t)A(t+\tau)$$

Short-time limit: $C_{AA}(0) = \langle A^2\rangle$.
Long-time limit: $C_{AA}(\infty) = \langle A\rangle^2$.
To make the T.C.F. zero at long times, one often subtracts out the average:

$$C'_{AA}(\tau) = \langle \delta A(0)\,\delta A(\tau)\rangle = \langle (A(0) - \langle A\rangle)(A(\tau) - \langle A\rangle)\rangle = C_{AA}(\tau) - \langle A\rangle^2$$

A useful quantity to characterize the T.C.F. is the correlation time,

$$\tau_c = \int_0^{\infty} d\tau\, C'_{AA}(\tau)/\langle A^2\rangle$$

When the T.C.F. is an exponentially decaying function, exp(−t/τ_rel), the correlation time
τ_c is the same as τ_rel.
Very often it is of great interest to extract the time-scales (or frequencies) of the motions involved
in the dynamics. For such a purpose, one computes the power spectrum of the T.C.F.:

$$I_A(\omega) = \frac{1}{2\pi}\int d\tau\, e^{i\omega\tau}\, C_{AA}(\tau)$$

Green-Kubo relations: dynamical properties can be expressed in terms of different time
correlation functions. For example, the diffusion constant,

$$D = \frac{1}{3}\int_0^{\infty}\langle \mathbf{v}(0)\cdot\mathbf{v}(t)\rangle\,dt$$

Computations of T.C.F.
The simplest (although not the most efficient) way is to follow the definition: if data points
are separated by δ, then the T.C.F. can be computed as

$$C_{AA}(\tau) = \frac{1}{N}\sum_{i=1}^{N} A_i A_{i+n}$$

where A_i is the value of A at the i-th data point, and n = τ/δ.
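A simple Python sketch of this direct estimator, together with its use in the Green-Kubo expression for D given above (synthetic velocity data; FFT-based estimators are faster but not shown; the average here is taken over the available time origins):

import numpy as np

def tcf(a, max_lag):
    # C_AA(n*delta) estimated by averaging A_i * A_{i+n} over the available time origins
    a = np.asarray(a, dtype=float)
    return np.array([np.mean(a[:len(a) - n] * a[n:]) for n in range(max_lag)])

v = np.random.randn(20000, 3)                 # placeholder velocity time series
cvv = sum(tcf(v[:, k], 300) for k in range(3))
dt = 0.001                                    # sampling interval (made-up units)
D = np.trapz(cvv, dx=dt) / 3.0                # D = (1/3) * integral of <v(0).v(t)> dt
print(D)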


An important point is that the absolute statistical error of the T.C.F. is independent of the time
τ, i.e.,

$$\langle \delta^2 C_{AA}(\tau)\rangle \approx \frac{\tau_c}{\tau_0}\langle A^2\rangle^2$$

where τ_0 is the simulation length. This suggests that the relative error in the T.C.F. becomes
large at long times. We note that the long-time behavior of C_AA(τ) is often important to the
calculation of dynamical properties - such as the diffusion constant.

Free energy simulations. I. Thermodynamical Integration

It is very difficult to compute absolute free energies and entropies because these quantities
depend on the absolute value of the partition function, which requires sampling an enormous
part of the phase space. It is, however, simpler (though still challenging) to compute relative
free energies, which can be cast into the form of ensemble averages and depend on a much
smaller portion of the phase space (relatively speaking) that has large Boltzmann weight.
One can construct appropriate paths that connect different systems of interest for computing
relative free energies.
The simplest path one can take is a linear parameter path (λ) that interpolates between two
thermodynamical states of interest. The potential function associated with a state with a
specific λ along such a linear interpolation is

$$U(\mathbf{r}^N;\lambda) = (1-\lambda)U_0(\mathbf{r}^N) + \lambda U_1(\mathbf{r}^N)$$

It is straightforward algebra to show that the derivative of the free energy with respect to λ
is simply the ensemble average of the potential derivative with respect to λ, i.e.,

$$\partial F/\partial\lambda = \langle \partial U(\mathbf{r}^N;\lambda)/\partial\lambda\rangle_\lambda = \langle U_1(\mathbf{r}^N) - U_0(\mathbf{r}^N)\rangle_\lambda$$

where the second step holds only for the linear coupling case. The subscript λ associated with
the ensemble average emphasizes the fact that the average is taken with the unphysical
system described by the potential U(r^N; λ) - which is why the free energy derivative varies with
λ.
The total free energy difference is obtained readily by integrating the free energy derivatives
from λ = 0 to 1 - thus this scheme is referred to as thermodynamical integration:

$$\Delta F = \int_0^1 \frac{\partial F}{\partial\lambda}\,d\lambda$$
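In practice one runs separate simulations at a set of fixed λ values, records ⟨U_1 − U_0⟩_λ in each, and does the integral numerically; a minimal Python sketch (the λ grid and derivative values below are purely illustrative):

import numpy as np

lam = np.linspace(0.0, 1.0, 11)               # 11 lambda windows
dF_dlam = 8.0 - 30.0 * lam + 12.0 * lam ** 2  # made-up <U1 - U0>_lambda values (kcal/mol)
delta_F = np.trapz(dF_dlam, lam)              # trapezoidal thermodynamic integration
print(delta_F)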

4 Homework

4.1 Pencil exercises

1. Using the definition of the T.C.F., show that

$$C_{AA}(\tau) = \langle A(0)A(\tau)\rangle \leq \langle A^2\rangle$$

Can you think of any simple system in which the equal sign holds from time to time?
Hint: the Schwarz inequality is, in its general form,

$$\Big|\sum_i A_i B_i\Big|^2 \leq \Big(\sum_i A_i^2\Big)\Big(\sum_i B_i^2\Big)$$

2. Show that the 2nd derivative of the free energy with respect to λ is negative semi-definite
(i.e., $\partial^2 F/\partial\lambda^2 \leq 0$).

4.2 Computational exercises

The CHARMM scripts are in: bohr:/cui/CHEM 860/Assignment/hw4.


1. Radial distribution function and velocity autocorrelation function of water.
Use tip3 box rdf vac.inp to run a set of short MD runs. Compute the radial distribution
functions (g_OO(r), g_OH(r), g_HH(r)). Also estimate the self-diffusion constant of water using
two different approaches: (1) the Einstein relation, ⟨Δr²⟩ = 6Dt; (2) integrating the
velocity autocorrelation function. Compare the results to available experimental data that
you can find in the literature, and comment on what you find.
2. Thermodynamic integration
Use free energy ti.inp to estimate the free energy difference associated with an alchemical
mutation: converting ethane to methanol in water (ΔF_aq). [Although this value itself doesn't
have real physical meaning, the difference between ΔF_aq and the free energy associated with
the same mutation in another medium (e.g., the gas phase, ΔF_gas) is the solvation free energy
difference between ethane and methanol in these two media.] Plot the free energy derivative as
a function of time during the simulation, and comment on the convergence behavior. Comment
on the limitations of this simulation.

Chem 860. Lecture 11,12


Free energy simulations - thermodynamic integration and umbrella sampling
October 13, 2004

Practical aspect of TI

Free energy simulations are powerful in that one can construct different thermodynamic
cycles to look at free energy changes associated with many different processes: solvation,
ligand binding, protein stability, phase transitions, etc.
Although the principle of TI is very simple - in the linear coupling case it reduces to the
evaluation of the following ensemble average,

$$\partial F/\partial\lambda = \langle \partial U(\mathbf{r}^N;\lambda)/\partial\lambda\rangle_\lambda = \langle U_1(\mathbf{r}^N) - U_0(\mathbf{r}^N)\rangle_\lambda$$

there are many practical issues that one has to handle with care to obtain meaningful results. A
useful recent review is: T. Simonson, G. Archontis and M. Karplus, Acc. Chem.
Res., 35, 430-437 (2002), Free energy simulations come of age.

1.1 End-point problems

Correction for ghost/dummy atoms


Imagine simulating the relative binding free energies of two ligands (A, B) to a protein. A
general set-up is the so-called dual topology approach, in which both ligands are present
at the same time. Calling the two ligands blocks 2 and 3 and the environment (protein) block
1, the coupling potential is written as:

$$U(\mathbf{r}^N,\lambda) = U_{11} + U_{22} + U_{33} + (1-\lambda)U_{12} + \lambda U_{13}$$

where we keep the internal energy terms associated with the two ligands throughout the
simulations so that at the end points (λ = 0, 1) the internal structure of the ligand is well
maintained. Note that the two ligands do NOT interact with each other, of course.
A slight problem is that the end-states from such simulations do NOT correspond exactly
to the physical system. For instance, at λ = 0, we should have just blocks 1 + 2; instead,
we have in addition a ghost block 3 that is non-interacting with either 1 or 2.

In principle, this is not a problem because as long as block 3 is non-interacting with 1+2,
we can estimate its contribution to the free energy of the system using ideal gas partition
functions for 3. More fortunately, even if such a correction is not exact (due to anharmonicity
etc.), the correction should cancel out exactly if one does the TI simulation in both the
protein and pure solution (to complete the thermodynamic cycle).
In practice, however, a serious problem is that as λ approaches 0 or 1, one of the ligands
becomes a ghost and starts to wander around the simulation box. In most practical
simulations, the space that the ghost ligand samples is far from the area it would sample
under the ideal gas condition. Therefore, large uncertainty remains in the estimated free
energy derivatives (S. Boresch, M. Karplus, J. Phys. Chem. A 103, 103-118 (1999)).
A practical solution is to leave the bonded terms connecting the ligands with the protein, if
any, unscaled throughout the simulations, i.e.,

$$U(\mathbf{r}^N,\lambda) = U_{11} + U_{22} + U_{33} + U_{12}^{bonded} + U_{13}^{bonded} + (1-\lambda)U_{12}^{nb} + \lambda U_{13}^{nb}$$

With this coupling potential, the ghost ligands are attached to the protein even at the
end-points (T. Simonson, G. Archontis, M. Karplus, J. Phys. Chem. B 101, 8349-8362
(1997)). The contribution of the bonded terms to the free energy can be approximated and
removed with local configuration integrals (S. Boresch, M. Karplus, J. Phys. Chem. A 103,
103-118 (1999)) - which relies on the assumption that these bonded terms are very local and
therefore can be decoupled from the rest of the degrees of freedom. Although this is NOT exact,
the approximation works fairly well in practical cases, especially when considering the error
cancellation due to the thermodynamic cycle (i.e., in protein and pure solution simulations).
If the ligand is NOT covalently bonded to the protein, one can use harmonic constraints to
anchor the ligands in space to achieve the same effect. The contribution of the constraints
can then be removed in a similar fashion. This is particularly useful for estimating absolute
binding free energies (B. Roux, M. Nina, R. Pomes, J. Smith, Biophys. J. 71, 670-681, 1996;
J. Hermans, L. Wang, J. Am. Chem. Soc. 119, 2702-2714 (1997)).
Van der Waals term
Another kind of end-point problem arises when certain atoms are annihilated or created
(i.e., they have no interaction with the environment at λ = 1 or 0). The repulsive nature of the
van der Waals interactions makes the free energy derivative poorly behaved and also causes
instability in the MD simulation. In the case of annihilation, one can show (T. Simonson,
Mol. Phys. 80, 441-447 (1993)) that the free energy derivative diverges as (1−λ)^{n/4−1}
as λ approaches 1, where n is the power of λ in the coupling potential. In other words, if
linear coupling is chosen and some atoms are annihilated/created, free energy derivatives
around the end-points should be extrapolated with care. Alternatively, one may choose non-linear coupling (e.g., $U_\lambda = (1-\lambda)^4 U_0 + \lambda^4 U_1$).
A better solution is to use the soft-core van der Waals potential - which allows stable MD
integration even at the end-points because the repulsive nature of the van der Waals potential
at short range is better controlled (Beutler et al. Chem. Phys. Lett. 222, 529 (1994)). For
a specific pair of atoms,
$$U_{ij}^{vdW}(\lambda) = \lambda\,\epsilon_{ij}\left[\frac{1}{\big[\alpha(1-\lambda)^2 + (r_{ij}/\sigma_{ij})^6\big]^{2}} - \frac{1}{\alpha(1-\lambda)^2 + (r_{ij}/\sigma_{ij})^6}\right]$$
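A small Python sketch of such a soft-core interaction (the prefactor and λ-power conventions vary between implementations; this is one common Beutler-style choice with a softness parameter alpha, not necessarily the exact form used in CHARMM):

def softcore_vdw(r, lam, eps, sigma, alpha=0.5):
    # the r^6 term is shifted by alpha*(1-lam)^2, so the energy stays finite as r -> 0
    s = alpha * (1.0 - lam) ** 2 + (r / sigma) ** 6
    return 4.0 * eps * lam * (1.0 / s ** 2 - 1.0 / s)

def lj(r, eps, sigma):
    x6 = (sigma / r) ** 6
    return 4.0 * eps * (x6 ** 2 - x6)

# when an atom is nearly decoupled (small lam), an overlapping pair stays finite,
# whereas the plain LJ potential blows up:
print(softcore_vdw(0.05, 0.1, 0.2, 3.4), lj(0.05, 0.2, 3.4))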

1.2 Statistical errors and sampling

As λ varies, it might take a long time for the environment to reorganize, especially if the
reorganization is collective in nature or requires crossing a significant barrier. Therefore,
the convergence behavior of the free energy derivatives needs to be carefully monitored. A nice
discussion is given in Yang and Karplus, J. Chem. Phys., 120, 2618 (2004).
When several conformational substates (e.g., rotamer states of sidechains) are important
but separated by barriers that are high compared to k_B T, it might be advantageous to follow a
divide-and-conquer type of approach. Consider two ligands (A, B) that are different
chemically, each of which has several important substates, A_1, A_2, ..., B_1, B_2, .... For each ligand,
the total free energy can be written in terms of contributions from local free energies, e.g.,

$$F_A = -k_B T\ln Q_A = -k_B T\ln(Q_{A_1} + Q_{A_2} + \cdots) = -k_B T\ln Q_{A_1} - k_B T\ln\big(1 + Q_{A_2}/Q_{A_1} + \cdots\big)$$
$$= F_{A_1} - k_B T\ln\big(1 + e^{-\beta\Delta F_{A_{1,2}}} + \cdots\big)$$

where $F_{A_i} = -k_B T\ln Q_{A_i} = -k_B T\ln\big[\int_{A_i} d\mathbf{r}^N e^{-\beta U(\mathbf{r}^N)}\big]$ and $\Delta F_{A_{1,2}} = F_{A_2} - F_{A_1}$. Therefore, we can use TI to estimate
$F_{B_1} - F_{A_1}$, and somehow estimate (see below), for each ligand, the relative local free energies
of the conformational substates; i.e.,

$$F_B - F_A = (F_{B_1} - F_{A_1}) - k_B T\ln\Big[\big(1 + e^{-\beta\Delta F_{B_{1,2}}} + \cdots\big)\big/\big(1 + e^{-\beta\Delta F_{A_{1,2}}} + \cdots\big)\Big]$$

Conformational free energy and umbrella sampling

The last discussion illustrates another free energy problem: the variation of the free energy
along conformational degrees of freedom (instead of chemical degrees of freedom as in the
alchemical free energy simulations). It is complicated in general in the sense that it is often
hard to identify a single geometrical parameter that describes the conformational variations. In
most cases, based on the specific problem or intuition, a variable is specified, which is often
referred to as the reaction coordinate, q. To explore the free energy variation along q, we define
the potential of mean force,

$$W(q) = -k_B T\ln P(q) = -k_B T\ln\Big[\Big(\int d\mathbf{r}^N\,\delta(q'(\mathbf{r}^N) - q)\,e^{-\beta U(\mathbf{r}^N)}\Big)\Big/\Big(\int d\mathbf{r}^N\,e^{-\beta U(\mathbf{r}^N)}\Big)\Big]$$

It is called the potential of mean force because the derivative of W(q) with respect to q gives
the mean force on q (see Homework 1).
Although computing W(q) seems trivial based on the definition - one simply has to collect
the probability distribution of q - it is difficult in practice if there are barriers along q. A
powerful technique is called umbrella sampling, in which one adds an umbrella potential
to the true potential function,

$$U^b(\mathbf{r}^N) = U(\mathbf{r}^N) + U^{umb}(q)$$

The form of U^{umb}(q) is arbitrary, although often a parabola is used (i.e., $U^{umb}(q) = \frac{1}{2}k_q(q - q_0)^2$) - which looks like an inverted umbrella - hence the name umbrella sampling. The
advantage of using U^b instead of U is clear if U^{umb}(q) is added to the high-energy region
- then U^b is smoother than U (the low-energy region is raised by U^{umb}(q) compared to the
high-energy region). Therefore, one can obtain much better sampling along q.
The problem remains that one needs to figure out the true probability distribution (in the
absence of U^{umb}(q)!) from such biased simulations. The derivation is simple, as follows:

$$P(q) = \frac{\int d\mathbf{r}^N\,\delta(q'-q)\exp(-\beta U(\mathbf{r}^N))}{\int d\mathbf{r}^N\,\exp(-\beta U(\mathbf{r}^N))} = \frac{\int d\mathbf{r}^N\,\exp(-\beta U^b(\mathbf{r}^N))\,\delta(q'-q)\exp(\beta U^{umb}(q'))}{\int d\mathbf{r}^N\,\exp(-\beta U^b(\mathbf{r}^N))\exp(\beta U^{umb}(q'))}$$

which can be re-written as

$$P(q) = \frac{\int d\mathbf{r}^N\,\exp(-\beta U^b(\mathbf{r}^N))\,\delta(q'-q)\exp(\beta U^{umb}(q'))}{\int d\mathbf{r}^N\,\exp(-\beta U^b(\mathbf{r}^N))}\Bigg/\frac{\int d\mathbf{r}^N\,\exp(-\beta U^b(\mathbf{r}^N))\exp(\beta U^{umb}(q'))}{\int d\mathbf{r}^N\,\exp(-\beta U^b(\mathbf{r}^N))}$$

Recognizing that the two terms are nothing but ensemble averages over the biased simulations, we have

$$P(q) = \frac{\langle \delta(q'-q)\exp(\beta U^{umb}(q'))\rangle_b}{\langle \exp(\beta U^{umb}(q'))\rangle_b} = e^{\beta U^{umb}(q)}\,\frac{P^b(q)}{\langle \exp(\beta U^{umb}(q'))\rangle_b}$$

where we used the property of the Dirac δ function and the fact that $P^b(q) = \langle \delta(q'-q)\rangle_b$.
Physically, the result should not be surprising: we know the bias that we put in (U umb (q 0 )),
thus we should be able to rigorously unbias the simulation to obtain the probability distribution (or any physical observable, see Homework 2) of the true system.
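A single-window Python sketch of this unbiasing step (a harmonic umbrella, made-up samples; combining several windows is normally done with WHAM, which is not shown):

import numpy as np

def unbias(q_samples, k_q, q0, beta, bins=60):
    # P(q) ~ exp(+beta*U_umb(q)) * P_b(q), then renormalize within the window
    hist, edges = np.histogram(q_samples, bins=bins, density=True)   # biased P_b(q)
    q = 0.5 * (edges[1:] + edges[:-1])
    p = hist * np.exp(beta * 0.5 * k_q * (q - q0) ** 2)
    p /= np.trapz(p, q)
    return q, p, -np.log(p) / beta                                   # W(q) up to a constant

# hypothetical biased samples around q0 = 1.0 (energies in units of kB*T, so beta = 1)
q, p, w = unbias(np.random.normal(1.0, 0.15, 50000), 40.0, 1.0, 1.0)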
Once again, the convergence behavior of umbrella sampling depends on the specific form
and location of the umbrella potential. A common practice is to use a number of umbrella
potentials along q to cover all regions of interest - which can be done in a serial fashion (i.e.,
multiple runs with a different umbrella potential - different k_q and q_0 - in each simulation) or
in a simultaneous fashion (i.e., more than one umbrella potential present at once). The goal is to obtain
sampling that is as uniform as possible. In fact, the perfect umbrella potential is simply the negative of
the potential of mean force; this is the principle of adaptive umbrella sampling, in which the
umbrella potential is expanded in a number of basis functions and the expansion coefficients
are updated iteratively such that sampling in the region of interest approaches being uniform
(e.g., see Bartels, Karplus, J. Phys. Chem. B, 102, 865-880 (1998)).
The value of potential of mean force calculations critically depends on the choice of the
reaction coordinate, q. For complex processes - such as reactions in the condensed phase -
identifying the right q is very challenging. It should be chosen such that the dynamics along
q is indeed much slower than that of all other degrees of freedom - i.e., we assume an adiabatic
separation between q and all other degrees of freedom (in other words, as q changes its
value, the environment quickly reaches a new equilibrium). If an improper q is chosen, the
potential of mean force computed along q cannot give a reliable description of the kinetics of
the system; often this can be examined by looking into non-equilibrium quantities such as the
transmission coefficient across the barrier along q - which will be discussed in future lectures.

3 Homework

3.1 Pencil exercises

1. Show explicitly that the derivative of the potential of mean force is the mean force acting
on the chosen reaction coordinate, i.e.,

$$\partial W(q)/\partial q = \langle \partial U(\mathbf{r}^N)/\partial q\rangle_q$$

The subscript q emphasizes that this is a constrained ensemble average (i.e., in addition to the Boltzmann factor, there is also δ(q' − q) in the phase-space integral).
2. Generalize the discussion in class on umbrella sampling from unbiasing the probability
distribution to unbiasing general observables. That is, show that for any observable A, the
true ensemble average can be obtained from biased simulations with an umbrella potential
U^{umb},

$$\langle A\rangle = \langle A\,e^{\beta U^{umb}}\rangle_b \big/ \langle e^{\beta U^{umb}}\rangle_b$$

where the subscript b emphasizes that these are biased ensemble averages.

3.2 Computational exercises

Next week.

Chem 860. Lecture 13


Treatment of electrostatics in molecular simulations
October 20, 2004

Electrostatics are important

Due to the presence of polar and charged chemical groups, electrostatics are extremely
important to the function of biomolecules. The slowly decaying nature of electrostatic forces
makes it undesirable to use cut-off schemes to reduce cost of the simulation; e.g., people
found that ions of the SAME charge tend to cluster at the cut-off distance although they
should repel each other (Bader and Chandler, J. Chem. Phys., 1992, 96, 6423 ). Moreover,
even if the cut-off value is large, the result might depend sensitively on the specific algorithm
used in the calculation; e.g., atom-based and group-based cut-off schemes might give different
results (Wood et al., J. Phys. Chem. B, 1998, 102, 3844 ).
The task of developing an efficient and robust algorithm for computing electrostatics in
macromolecular simulations is a challenging and exciting one.

Finite size simulations

The simplest kind involves a finite simulation set-up, in which a relatively small number of
explicit water molecules are used to solvate the system of interest. This is used most often
if the phenomenon of interest is localized in an area of the biomolecule, such as the binding site
of a protein. The bulk solvent is then treated with a continuum model or, possibly, other
liquid state theories (e.g., integral equation theories).
The issue of continuum electrostatics will be discussed in the near future. Here we focus on
computing electrostatic interactions between the explicit partial charges in the simulation.
We also assume that these charges have fixed values (i.e., they are not polarizable), as in most standard
force fields.
The total electrostatic energy of a set of partial charges {q^{(k)}} is given by the well-known
Coulomb's law:

$$U^{ele} = \sum_{i<j} q^{(i)} q^{(j)}/r_{ij}$$

This can also be written in terms of the product between each partial charge and the electrostatic potential generated by all other charges in the system at that charge:

$$U^{ele} = \frac{1}{2}\sum_i q^{(i)}\,\phi(\mathbf{r}^{(i)})$$

where $\phi(\mathbf{r}^{(i)}) = \sum_{j\neq i} q^{(j)}/r_{ij}$. The factor of $\frac{1}{2}$ is due to double counting.

Straightforward computation of U^{ele} is an N² scaling problem, which is something we try
to avoid. The basic idea is that when r is far from the source charges, the electrostatic
potential can be approximated by a multipole moment expansion.
Imagine that we try to compute the electrostatic potential at r due to a set of charges
{q^{(k)}} around a location b; the distances between the charges and b are much smaller than that
between b and r. Analytically, φ(r) is given by

$$\phi(\mathbf{r}) = \sum_j q^{(j)}/|\mathbf{r} - \mathbf{r}^{(j)}| = \sum_j q^{(j)}/|\mathbf{r} - \mathbf{b} + \mathbf{b} - \mathbf{r}^{(j)}|$$

Since |r − b| is much larger than |b − r^{(j)}|, one can perform a Taylor expansion, and the leading
terms would be:

$$\phi(\mathbf{r}) \approx \frac{\sum_j q^{(j)}}{|\mathbf{r}-\mathbf{b}|} + \frac{\sum_j q^{(j)}(\mathbf{r}^{(j)}-\mathbf{b})\cdot(\mathbf{r}-\mathbf{b})}{|\mathbf{r}-\mathbf{b}|^3} = \frac{Q}{|\mathbf{r}-\mathbf{b}|} + \frac{\mathbf{d}\cdot(\mathbf{r}-\mathbf{b})}{|\mathbf{r}-\mathbf{b}|^3}$$

The total charge (Q) and dipole moment (d) are properties of the charge set; they only need
to be computed once and can be used for (r) at any location r.
Since the multipole moment expansion (MME) works at large distances (i.e., it converges
quickly there), the general strategy is to compute interactions directly (i.e., via Coulomb's law) for
charge pairs that are close and to use the MME when the distance separations are large. The
switch between the explicit sum and the MME can be determined based on either a spherical threshold or cubic cells in 3-D. The efficiency of the MME can be further enhanced by grouping
cells together at even larger distances (fast multipole expansion) or by doing a Taylor expansion around the observation point r as well. For more details see, for example, a nice chapter
written by T. A. Darden in Computational Biochemistry and Biophysics, Ed. O. M. Becker
et al.
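A tiny Python sketch comparing the exact Coulomb potential with the monopole + dipole approximation above (arbitrary made-up charges, Gaussian units):

import numpy as np

def phi_exact(r, q, x):
    return sum(qi / np.linalg.norm(r - xi) for qi, xi in zip(q, x))

def phi_multipole(r, q, x, b):
    Q = q.sum()                                   # total charge
    d = (q[:, None] * (x - b)).sum(axis=0)        # dipole moment about b
    rb = r - b
    R = np.linalg.norm(rb)
    return Q / R + np.dot(d, rb) / R ** 3

q = np.array([0.4, -0.8, 0.4])
x = np.array([[0.5, 0.0, 0.0], [0.0, 0.0, 0.0], [-0.3, 0.4, 0.0]])
r = np.array([20.0, 5.0, 3.0])
print(phi_exact(r, q, x), phi_multipole(r, q, x, np.zeros(3)))   # agree at large |r - b|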

Periodic condition - Ewald summation

To minimize the effect of the boundary in finite size simulations, the periodic boundary
condition (PBC) is used. Provided that the size of the unit cell is large, artifacts introduced
due to the periodicity are manageable for high dielectric media such as water.
Since there are an infinite number of cells in PBC, it is difficult to do the multipole expansion trick. Historically, one often used cut-off schemes and the so-called minimum image
convention - i.e., the cutoff and cell size are chosen such that each particle does not see its own
images explicitly. Provided that the cutoff is large, reasonable results were obtained for
polar systems. For highly charged systems, e.g., DNA, cutoff schemes were often found to be
insufficient (i.e., unstable results).

3.1 Theory for the Ewald summation

Lattice sum, or Ewald summation, is an elegant way to compute electrostatic interactions
between the charges in the primary cell and their (infinite number of) images. Using a cubic PBC as
an example, we use n = (n_1, n_2, n_3) to label all cells. The goal is to compute the electrostatic
potential generated by all charges in the primary cell and all images:

$$\phi(\mathbf{r}) = \sum_{\mathbf{n}}\sum_{i=1}^{N}\frac{q_i}{|\mathbf{r}_i + \mathbf{n}L - \mathbf{r}|}$$

This infinite series (in terms of n) is conditionally convergent if the primary cell is NOT
charge neutral. Therefore, it is a common practice to add salt ions to maintain the charge
neutrality.
The basic idea to solve this infinite series is to separate interactions into a short range part
and a long range part; the former will be computed in the real space, while the latter will
be computed in the reciprocal space.
To construct the short-range part, we add to each point charge a Gaussian with the same
magnitude but opposite sign. As a result, the interactions between different point charges are
significantly screened and short ranged. One can show that the corresponding electrostatic
potential is

$$\phi_1(\mathbf{r}) = \sum_{\mathbf{n}}\sum_{i=1}^{N}\frac{q_i\,\mathrm{erfc}(\kappa|\mathbf{r}_i - \mathbf{r} + \mathbf{n}L|)}{|\mathbf{r}_i - \mathbf{r} + \mathbf{n}L|}$$

where κ is a parameter that characterizes the width of the Gaussians.
To remove the contributions from those additional Gaussians, as the second step we add another
set of Gaussians with the same sign and magnitude as the original point charges. The electrostatic problem (Poisson equation) associated with a set of Gaussians under the periodic boundary condition
is straightforward to solve in reciprocal space - i.e., perform a Fourier transform of the
Poisson equation and then carry out the inverse Fourier transform of the solution. The result
is

$$\phi_2(\mathbf{r}) = \frac{1}{\pi L}\sum_{\mathbf{k}\neq 0}\frac{e^{-\pi^2 k^2/(\kappa^2 L^2)}}{k^2}\,S(\mathbf{k})\,e^{2\pi i\mathbf{k}\cdot\mathbf{r}/L}$$

where S(k) is the structure factor

$$S(\mathbf{k}) = \sum_i q_i\,e^{-2\pi i\mathbf{k}\cdot\mathbf{r}_i/L}$$

For any point r NOT coincident with the point charges, the total electrostatic potential is
simply the sum of φ_1 and φ_2. For computing electrostatic energies, we need the electrostatic potential at the sites of the point charges. In this case, we should NOT include the
contribution from the charge at that site. For the short-range part, we simply do not include
the contribution from the current site, i.e.,

$$\phi'_1(\mathbf{r}_i) = \sum_{\mathbf{n}}{}^{\prime}\sum_{j=1}^{N}\frac{q_j\,\mathrm{erfc}(\kappa|\mathbf{r}_j - \mathbf{r}_i + \mathbf{n}L|)}{|\mathbf{r}_j - \mathbf{r}_i + \mathbf{n}L|}$$

where the prime on the summation sign indicates that the summation does not include j = i when
n = 0.
For φ_2, it turns out that it is convenient to include all the Gaussians because it is easier to
solve a truly periodic problem. This means, however, that we have to correct the result to be
consistent with the modification we made for the short-range part. This gives rise to
the so-called self-interaction term,

$$\phi_{self} = \frac{2\kappa}{\sqrt{\pi}}\,q_i$$

To sum up, the total electrostatic potential at the site of a point charge in the primary cell
is

$$\phi_{Ewald}(\mathbf{r}_i) = \sum_{\mathbf{n}}{}^{\prime}\sum_{j=1}^{N}\frac{q_j\,\mathrm{erfc}(\kappa|\mathbf{r}_j - \mathbf{r}_i + \mathbf{n}L|)}{|\mathbf{r}_j - \mathbf{r}_i + \mathbf{n}L|} + \frac{1}{\pi L}\sum_{\mathbf{k}\neq 0}\frac{e^{-\pi^2 k^2/(\kappa^2 L^2)}}{k^2}\,S(\mathbf{k})\,e^{2\pi i\mathbf{k}\cdot\mathbf{r}_i/L} - \frac{2\kappa}{\sqrt{\pi}}\,q_i$$

The total electrostatic interaction energy is nothing but

$$U^{Ewald} = \frac{1}{2}\sum_{i=1}^{N} q_i\,\phi_{Ewald}(\mathbf{r}_i)$$

Without going into details, we simply note that if the primary cell is NOT charge neutral,
an additional correction should be added, which is equal to

$$-\frac{\pi Q^2}{2\kappa^2 L^3}$$

where Q is the total charge of the primary cell.
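For concreteness, a compact Python sketch of an Ewald energy evaluation for a charge-neutral cubic cell (standard textbook conventions with a splitting parameter kappa and tinfoil boundary conditions; the prefactors are written slightly differently from the notation above, and only the n = 0 real-space image is kept, which assumes erfc(kappa*L/2) is negligible):

import numpy as np
from scipy.special import erfc

def ewald_energy(pos, q, L, kappa, r_cut, kmax=6):
    N, V = len(q), L ** 3
    # real-space sum: screened Coulomb with minimum image, cut off at r_cut < L/2
    e_real = 0.0
    for i in range(N - 1):
        d = pos[i + 1:] - pos[i]
        d -= L * np.round(d / L)
        r = np.linalg.norm(d, axis=1)
        m = r < r_cut
        e_real += np.sum(q[i] * q[i + 1:][m] * erfc(kappa * r[m]) / r[m])
    # reciprocal-space sum over k = 2*pi*n/L, n != 0
    e_recip = 0.0
    for nx in range(-kmax, kmax + 1):
        for ny in range(-kmax, kmax + 1):
            for nz in range(-kmax, kmax + 1):
                if nx == ny == nz == 0:
                    continue
                k = 2.0 * np.pi * np.array([nx, ny, nz]) / L
                k2 = k @ k
                S = np.sum(q * np.exp(1j * pos @ k))          # structure factor
                e_recip += 4.0 * np.pi / k2 * np.exp(-k2 / (4.0 * kappa ** 2)) * abs(S) ** 2
    e_recip /= 2.0 * V
    e_self = -kappa / np.sqrt(np.pi) * np.sum(q ** 2)          # self-interaction correction
    return e_real + e_recip + e_self

pos = np.array([[1.0, 1.0, 1.0], [3.0, 1.0, 1.0]])             # a toy +1/-1 pair
print(ewald_energy(pos, np.array([1.0, -1.0]), L=10.0, kappa=0.5, r_cut=4.9))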

3.2 Practical issues of Ewald calculations

Basically we have to choose κ (the width of the Gaussians), the range of the real-space sum, and the
range of the reciprocal-space sum; all of these are related. One may choose κ such that the
real-space sum is negligible beyond L/2. In this case, the real-space sum is ∼N² while the
reciprocal sum is ∼N. Alternatively, one can choose a larger κ so that one can truncate the real-space part at a fixed cutoff (e.g., 10 Å); in this case, the reciprocal sum scales as ∼N² (because
more k sums are needed). The best balance can make the Ewald summation scale as N^{3/2}. A rule
of thumb is κ ≈ (4-5)/r_cut; however, one should always vary κ, r_cut and the number of k sums
such that the errors in the forces are small (< 10⁻³ kcal/(mol·Å)).
More recent developments have further sped up Ewald summations. For example, a popular
approach is the so-called Particle-Mesh Ewald, which computes the reciprocal sum using the
Fast Fourier Transform (FFT). This makes the calculation scale essentially as N log N.
Finally, we emphasize once again that whenever possible, one should systematically check
the convergence of results with respect to the size of the primary cell. For discussions, see,
for example, Smith and Pettitt, J. Chem. Phys., 1996, 105, 4289; McCammon et al. J.
Phys. Chem. B 2000, 104, 3668

4 Homework

4.1 Pencil exercises

See last handout.

4.2 Computational exercises

The CHARMM scripts are in: bohr:/cui/CHEM 860/Assignment/hw5.


1. umbrella.inp. It deals with the PMF for the chair-to-boat conversion of cyclohexane using a combination of dihedral angles as the reaction coordinate. You can read
doc/umbrella.doc for further details. Compare the statistics of the reaction coordinate
with and without the umbrella potential by building histograms from the trace
files (rxn1mean.trc and rxn2mean.trc). Estimate the PMF based on the histogram.
2. tip3 ewald.inp. This is essentially the same water simulation as before, except that we
now use the Ewald summation to treat electrostatics as opposed to a simple cut-off scheme.
Compare the RDF and diffusion constant from those simulations to your previous results.
Which are closer to the experimental values? Does the result surprise you? I also encourage
you to explore the speed and convergence of the energy/force with different Ewald parameters
(e.g., κ - the width of the Gaussians - and the number of k vector summations); also compare the
efficiency of standard Ewald and the Particle-Mesh Ewald (PME).

Chem 860. Lecture 14,15


Implicit solvent models in biomolecular simulations
October 27, 2004

Although solvent molecules are shown to play an important role in many processes such
as molecular recognition, it is desirable to be able to treat their effect implicitly, because
a major part of the computational cost is associated with describing solvent-solvent and
solvent-macromolecule interactions. Take the CHARMM benchmark - solvated MbCO in a
55 Å periodic cube - as an example: one has 2,500 protein atoms but 15,000 water atoms.
In other words, it would be computationally very efficient to have all or some solvent
molecules treated implicitly so that one can focus on the behavior of the macromolecules.
In addition to the explicit gain in computational speed, another point is that the slow motions of the macromolecule are sped up due to the lack of collisions (or friction) with the
solvent molecules (i.e., a 1 ns simulation with implicit solvent corresponds to a much longer time
scale in reality); this is useful in the simulation of processes that involve large-scale conformational rearrangements such as allosteric transitions or protein folding. Finally, it is useful
to use implicit solvent models to include the effects due to bulk solvent not included in finite-size simulations; here electrostatics would be the major issue, although other more subtle
subjects (e.g., polarization across the liquid-vacuum interface) might also be important.

Implicit solvent models - a potential of mean force perspective

Using X and Y to represent the coordinates of the solute and solvent molecules, respectively,
the probability of a given configuration (X, Y) is

$$P(\mathbf{X},\mathbf{Y}) = \frac{\exp[-\beta U(\mathbf{X},\mathbf{Y})]}{\int d\mathbf{X}\,d\mathbf{Y}\,\exp[-\beta U(\mathbf{X},\mathbf{Y})]}$$

If we only care about the solute distribution, we can define the reduced probability distribution

$$P(\mathbf{X}) = \int d\mathbf{Y}\,P(\mathbf{X},\mathbf{Y})$$

i.e., we integrate out all solvent coordinates.

More explicitly, we can write the expression for the reduced probability distribution as

$$P(\mathbf{X}) = \frac{\int d\mathbf{Y}\,\exp[-\beta U(\mathbf{X},\mathbf{Y})]}{\int d\mathbf{X}\,d\mathbf{Y}\,\exp[-\beta U(\mathbf{X},\mathbf{Y})]} = \frac{\exp[-\beta W(\mathbf{X})]}{\int d\mathbf{X}\,\exp[-\beta W(\mathbf{X})]} = \frac{1}{C}\exp[-\beta W(\mathbf{X})]$$

As discussed in earlier lectures, W(X) is called the potential of mean force, because its
derivative gives back the mean force,

$$-\frac{\partial W(\mathbf{X})}{\partial x_i} = -\Big\langle \frac{\partial U}{\partial x_i}\Big\rangle_{\mathbf{X}} = \langle F_i\rangle_{\mathbf{X}}$$

The normalization constant C is not important because the quantity of interest is the relative
PMF during a process (e.g., solvation). Another way to look at this is that the choice of
the normalization factor does not change the reduced probability P(X) - which is the ultimate
quantity of interest because any solute observable can be obtained once it is known:

$$\langle Q\rangle = \int d\mathbf{X}\,d\mathbf{Y}\,Q(\mathbf{X})P(\mathbf{X},\mathbf{Y}) = \int d\mathbf{X}\,Q(\mathbf{X})P(\mathbf{X})$$

2 Solvation free energy components

2.1 Decomposition

It is instructive to decompose the PMF into the following terms,


W (X) = Uu (X) + W (np) (X) + W (elec) (X)
The first term is solute internal interactions, and the last two terms represent the solvation
effects.
In defining the non-polar and polar components, we imagine that the non-polar interaction
between the solute (u) and solvent (v) is first turned on in the absence of any electrostatics,
then the electrostatic interactions are turned on. Accordingly, we have
R
exp[W

(np)

(X)] =

(np)

dYexp[(Uvv (Y) + Uuv (X, Y))]


R
dYexp[Uvv (Y)]

and
R
exp[W

(elec)

(X)] =

(np)

(elec)

dYexp[(Uvv (Y) + Uuv (X, Y) + Uuv (X, Y))]


R
(np)
dYexp[(Uvv (Y) + Uuv (X, Y))]

The choice of the normalization constants has a nice physical interpretation,


exp[(W (np) (X) + W (elec) (X))] = exp[W solv (X)]
2

R
=

(np)

(elec)

dYexp[(Uvv (Y) + Uuv (X, Y) + Uuv


R
dYexp[Uvv (Y)]

(X, Y))

which is exactly the definition of solvation free energy.


We emphasize that although the solvation free energy, W solv (X) is independent of the sequence that we turn on non-polar and electrostatic interactions, the components W (np) (X),
W (elec) (X) are path-dependent, like any free energy components.

2.2 Non-polar component

In many ways, the non-polar term is difficult to understand quantitatively. At a qualitative level, a large component is cavitation - i.e., the reversible work needed to push solvent out of the volume to be occupied by the solute - which means that this term is generally positive (unfavorable). The non-polar component gives rise to two aspects of the hydrophobic effect: hydrophobic solvation (poor solubility of non-polar species) and hydrophobic interaction (clustering of non-polar molecules).
A commonly used formula for describing the non-polar component is based on the surface area:

$$W^{(np)}(X) = \gamma_v A_{tot}(X)$$

where $A_{tot}(X)$ is the total solvent accessible surface area (i.e., it includes both polar and non-polar contributions). Although $\gamma_v$ should be closely related to the macroscopic oil-water surface tension, its value is often adjusted to reproduce the solvation free energies of non-polar molecules; e.g., a popular choice is 20-30 cal/(mol·Å²). The SASA model is very primitive and much is going on in the field to achieve a better description of $W^{(np)}(X)$ and the hydrophobic effect.
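As a toy illustration (not from the original notes), the following Python sketch evaluates the SASA-based estimate of $W^{(np)}$. It assumes that per-atom solvent accessible surface areas have already been computed by some external tool; the function name and the default coefficient are illustrative only.

# Minimal sketch: W_np = gamma * A_tot, with per-atom SASA values (in A^2)
# assumed to come from an external surface-area calculation.
def nonpolar_solvation(atom_sasa, gamma=0.025):
    """atom_sasa: iterable of per-atom SASA values in A^2.
    gamma: surface-tension-like coefficient in kcal/(mol*A^2);
    0.020-0.030 corresponds to the 20-30 cal/(mol*A^2) range quoted above."""
    total_area = sum(atom_sasa)
    return gamma * total_area          # kcal/mol

# Example: a tiny hypothetical solute with three exposed atoms
print(nonpolar_solvation([40.0, 25.5, 31.2]))   # roughly 2.4 kcal/mol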

2.3 Electrostatic component

The electrostatic component is often described in the framework of continuum electrostatics - i.e., one assumes that the solvent is structureless. This seems to be a drastic approximation, especially for the solvent molecules in direct contact with the macromolecule, yet in reality the continuum model often gives a reasonable physical picture and reasonable results.
The most fundamental equation in continuum electrostatics is the Poisson Equation,

$$\nabla \cdot [\epsilon(\mathbf{r}) \nabla \phi(\mathbf{r})] = -4\pi \rho_u(\mathbf{r})$$

where $\epsilon(\mathbf{r})$ is the spatially dependent dielectric constant (often taken to be 80 for water and a small value for the macromolecule), and $\rho_u(\mathbf{r})$ is the solute charge distribution. Note that this is a linear differential equation in the sense that the electrostatic potential $\phi(\mathbf{r})$ depends linearly on the solute charge.
This linearity has an important implication in the calculation of the electrostatic solvation free energy. Imagine that we gradually turn on the solute-solvent interaction along a linear coupling path (i.e., charging the solute up linearly), $U^{(elec)}_{uv}(X,Y;\lambda) = \lambda\, U^{(elec)}_{uv}(X,Y)$; then thermodynamic integration gives

$$W^{(elec)}(X) = \int_0^1 d\lambda\, \left\langle U^{(elec)}_{uv}(X,Y) \right\rangle_{\lambda;X}$$

In a continuum electrostatics framework,

$$\left\langle U^{(elec)}_{uv}(X,Y) \right\rangle_{\lambda;X} = \sum_i q_i\, \phi_{rf}(x_i; \lambda)$$

where $\phi_{rf}$ is the reaction field generated by the solvent at the solute site, which is the electrostatic potential in solution minus the value in vacuum.
Since in the Poisson framework $\phi_{rf}$ depends linearly on the solute charge distribution, $\phi_{rf}$ depends linearly on $\lambda$ in the charging process. Therefore, we can write,

$$W^{(elec)}(X) = \int_0^1 d\lambda\, \lambda \sum_i q_i\, \phi_{rf}(x_i; \lambda=1) = \frac{1}{2}\sum_i q_i\, \phi_{rf}(x_i; \lambda=1)$$

The factor $\frac{1}{2}$ is the consequence of the linear response of the solvent to the solute charge distribution.

In the presence of salt, one uses the Debye-Hückel theory to describe the salt distribution; accordingly, one has the Poisson-Boltzmann Equation, whose linearized form is given by

$$\nabla \cdot [\epsilon(\mathbf{r}) \nabla \phi(\mathbf{r})] - \bar{\kappa}^2(\mathbf{r})\, \phi(\mathbf{r}) = -4\pi \rho_u(\mathbf{r})$$

where $\bar{\kappa}^2(\mathbf{r})$ is the space-dependent screening factor that varies from 0 in the interior of the macromolecule to $4\pi\beta\sum_i q_i^2 \rho_i$ in the bulk solvent.
The PB equations can be solved on a grid - which is the finite difference approach - or on
a surface - which is the boundary element formulation. More recent developments using
adaptive finite element methods have made it possible to solve PB for very large cellular
components such as the ribosome and microtubule.

2.4 Other applications of Continuum electrostatics

The accuracy of the PB solution depends on, for a given force field, the way that one defines the dielectric boundary between the protein and the solvent, and the values for the dielectric constant. In most cases, one defines the dielectric boundary based on the molecular surface of the protein (Gilson et al., J. Comput. Chem. 9, 327, 1987), the definition of which depends on the atomic radii; the values of such radii can be optimized for small molecules based on solvation free energy calculations (Honig et al., J. Phys. Chem. 98, 1978, 1994; Nina et al., J. Phys. Chem. 101, 5239, 1997) - in some cases, e.g. for metal ions, one can identify the radii based on the first peak in the ion-water RDF. The precise value of the dielectric constant has been controversial in the literature. In principle, there should not be a single constant since the charge distribution of a protein is so heterogeneous; in fact, MD simulations (Simonson and Perahia, Proc. Natl. Acad. Sci., 92, 1082, 1995) showed that the local dielectric constant varies from small values (about 2) in the core of the protein to high values (about 20) on the surface of the protein, reflecting different degrees of dielectric polarizability. For most applications, however, a single value is used, and values between 4 and 20 were found to be useful in different contexts.
Another practical aspect concerns the grid spacing associated with the finite difference approximation. Although smaller is better (0.4 Å is reasonable), the size of the grid increases very quickly. If only a localized region is of interest, one can do focusing calculations (Gilson et al., J. Comput. Chem. 9, 327, 1987); i.e., solve the PB equation using a coarse grid, then use the result on the boundary of the region of interest as the boundary condition and solve the PB equation again for the smaller region with a finer grid. This can certainly be done iteratively.
Solvation and binding calculations - the importance of desolvation

For a molecule A, the electrostatic solvation energy is calculated as,

$$\Delta F^{el}_{solv} = F^{el}_{aq} - F^{el}_{vac}$$

The only practical point to note is that one should solve the PB equation for the aqueous and vacuum states with the same grid setup, so that the grid-dependent self-energy cancels. Given two molecules A (e.g., protein) and B (ligand) in solution, the electrostatic binding energy can be computed within the framework of continuum electrostatics as:

$$\Delta F^{el}_{AB} = F^{el}_{AB;aq} - F^{el}_{A;aq} - F^{el}_{B;aq}$$

in which $F^{el}_i$ can be computed from Poisson-Boltzmann calculations. We emphasize that this does not include other major contributions to the binding, one of which corresponds to the loss of translation and rotation of the small ligand.
An important point to note is that the binding energy does not increase monotonically as the charge of the ligand increases - because one has to pay for the favorable electrostatic energy of the ligand in solution - i.e., the desolvation penalty is large if the ligand is highly charged.
pKa predictions - note that below I will use the Gibbs free energy G instead of the Helmholtz free energy F.

Another useful example concerns pKa predictions. For an acid dissociation, we know the following relations:

$$K_a = \frac{[A^-][H^+]}{[AH]} = \exp[-\beta \Delta G^\circ]$$

thus

$$\log_{10}\frac{[A^-]}{[AH]} = pH - pK_a$$

Based on this relation, if there is a single ionizable group, one can easily write down a thermodynamic cycle and estimate the pKa ($-\log_{10}K_a$) shift going from solution to the protein. The free energy associated with transferring AH and A$^-$ from solution to the protein can be estimated with PB calculations. For more complicated cases where one has many ionizable groups, as in a protein, one uses a similar trick. More detailed derivations are given below.
Since

$$\frac{[A^-]}{[AH]} = \exp[-\beta \Delta G]$$

we can write the free energy difference between the ionized and neutral states of the acid as:

$$\Delta G = 2.303\, RT\, \gamma\, (pH - pK_a)$$

where $\gamma$ is +1 for bases and $-1$ for acids.
For a protein with M ionizable groups in the state $(x_1, \ldots, x_M)$ ($x_i = 1$ for an ionized group and 0 for a neutral one),

$$\Delta G(x_1, \ldots, x_M) = 2.303\, RT \sum_i x_i \gamma_i\, (pH - pK_{i,protein})$$

For the $i$th residue, if we can find a model compound of known pKa, we can then write,

$$\Delta G(x_1, \ldots, x_M) = 2.303\, RT \sum_i x_i \gamma_i \left(pH - pK_{i,model} - \frac{\Delta\Delta G_{i,mp}}{2.303\, RT}\right)$$

where $\Delta\Delta G_{i,mp} = \Delta G_{i,protein} - \Delta G_{i,model}$.


If we can find a way to estimate $\Delta\Delta G_{i,mp}$, then for a given ionization configuration $(x_1, \ldots, x_M)$ we can compute the corresponding $\Delta G(x_1, \ldots, x_M)$. By sampling all possible configurations ($2^M$, not considering complicated cases) or the dominant ones, we can compute the average ionization population $\langle x_i \rangle$, which essentially gives the equilibrium value of $[A^-]/[AH]$ for the $i$th ionizable group, which in turn gives the pKa.
We assume (reasonably) that $\Delta\Delta G_{i,mp}$ is dominated by electrostatics. Therefore we will use PB to estimate its value - which involves 4 PB calculations for each ionizable group: neutral/ionized in protein/water. We note that the values in the protein depend on the ionization configuration $(x_1, \ldots, x_M)$ because the charge distribution changes (otherwise the problem would be too trivial :-)).
Using PB, we have

$$\Delta G(x_1, \ldots, x_M) = 2.303\, RT \sum_i x_i \gamma_i \left(pH - pK_{i,model} - \frac{\Delta G^{intrinsic}_{i,mp}}{2.303\, RT}\right) + \sum_{i=1}^{M-1}\sum_{j=i+1}^{M} x_i x_j \gamma_i \gamma_j \Delta_{ij}$$

The term $\Delta G^{intrinsic}_{i,mp} = \Delta G^{intrinsic}_{i,protein} - \Delta G_{i,model}$ is the free energy change of ionization for group $i$ in the otherwise un-ionized protein (thus "intrinsic") relative to that for a model compound. The $\Delta G^{intrinsic}_{i,protein}$ and $\Delta G_{i,model}$ are estimated from PB electrostatic free energy calculations; e.g.,

$$\Delta G^{intrinsic}_{i,protein} = W^{ionized}_{i,protein} - W^{neutral}_{i,protein} = \frac{1}{2}\left[\sum_k q^{ionized}_k \phi^{ionized}_k - \sum_{k'} q^{neutral}_{k'} \phi^{neutral}_{k'}\right]$$

The sum $pK_{i,model} + \Delta G^{intrinsic}_{i,mp}/(2.303\, RT)$ is also often referred to as the intrinsic pK.
The term involving $\Delta_{ij}$ represents the interaction between the $i$th and the $j$th ionizable groups,

$$\gamma_i \gamma_j \Delta_{ij} = \Delta G^{intrinsic}_{i,j,protein} - \Delta G^{intrinsic}_{i,protein} - \Delta G^{intrinsic}_{j,protein}$$

where $\Delta G^{intrinsic}_{i,j,protein}$ has the same meaning as $\Delta G^{intrinsic}_{i,protein}$ but with two groups ($i$, $j$) ionized.

For M ionizable groups, the intrinsic pK values and the interactions between them ($\Delta_{ij}$) need to be calculated only once. Then $\Delta G(x_1, \ldots, x_M)$ can be computed for any ionization state/configuration. In practice, one does not have to enumerate all possible configurations but can use Monte Carlo to sample the important ones (as sketched below). Once equilibrated, the average ionization population for each ionizable group at a given pH can be computed, from which the pKa values can be obtained.
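As an illustration of this Monte Carlo step (a sketch under assumptions, not CHARMM's actual implementation), the following Python fragment samples ionization states given precomputed intrinsic pK values, $\gamma_i$, and an interaction matrix $\Delta_{ij}$; the names, units (kcal/mol) and temperature (about 300 K) are assumptions made for the example.

import math, random

RT = 0.593   # kcal/mol near 300 K (assumed units)

def dG(x, pH, pK_intr, gamma, Delta):
    # 2.303*RT*sum_i x_i*gamma_i*(pH - pK_intr_i) + sum_{i<j} x_i x_j gamma_i gamma_j Delta_ij
    M = len(x)
    g = 2.303 * RT * sum(x[i] * gamma[i] * (pH - pK_intr[i]) for i in range(M))
    g += sum(x[i] * x[j] * gamma[i] * gamma[j] * Delta[i][j]
             for i in range(M) for j in range(i + 1, M))
    return g

def sample_ionization(pH, pK_intr, gamma, Delta, nsteps=100000):
    M = len(pK_intr)
    x = [0] * M                       # start from the all-neutral state
    counts = [0] * M
    g_old = dG(x, pH, pK_intr, gamma, Delta)
    for _ in range(nsteps):
        i = random.randrange(M)       # trial move: flip one ionization state
        x[i] ^= 1
        g_new = dG(x, pH, pK_intr, gamma, Delta)
        if g_new <= g_old or random.random() < math.exp(-(g_new - g_old) / RT):
            g_old = g_new             # accept
        else:
            x[i] ^= 1                 # reject: restore old state
        for k in range(M):
            counts[k] += x[k]         # rejected moves still count in the average
    return [c / nsteps for c in counts]   # average ionization <x_i> at this pH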
In general, the results correlate well with experimental data. However, it is often found to
be important to sample multiple conformations of the protein from MD simulations etc.,
especially if significant changes in the protein conformation occur upon ionization. For more
systematic comparisons, see, for example, McCammon et al., J. Mol. Biol. 238, 415, 1994.

2.5 Other forms of implicit models

Generalized Born model: different versions of GB (e.g., J. Phys. Chem. 103, 3765, 1999); ACE (J. Phys. Chem. 100, 1578, 1996)
SASA-based models: EEF1 (Proteins, 35, 133, 1999); others: Proteins, 46, 24, 2002
Reaction field models for finite-size simulations: GSBP (J. Chem. Phys. 114, 2924, 2001)

3 Homework

3.1 Pencil exercises

1. Consider the solvation free energy of a spherical ion. Use continuum electrostatics to show that the result is

$$-\frac{q^2}{2r}\left(1 - \frac{1}{\epsilon}\right)$$

where $q$ and $r$ are the charge and radius of the ion, respectively, and $\epsilon$ is the dielectric constant of the medium. Hint: use thermodynamic integration in which the charge of the solute is linearly turned on; at each $\lambda$ value, estimate the interaction between the charge and the polarization of the medium using classical electrostatics.
2. Use the above result to consider a toy ligand optimization problem. Model ligand binding as the merging of a small sphere (radius $r$) into a large sphere (radius $R$). The charge of the protein is $Q$, that of the ligand is $q$. What is the binding energy (take $\epsilon \to \infty$ to simplify the algebra)? Can one optimize $q$ to obtain the most favorable binding? Show explicitly that the solution indeed gives the most favorable binding rather than a local stationary result. Hint: an interesting related reference is Tidor et al., J. Chem. Phys. 109, 7522, 1998.

3.2 Computational exercises

Script is in bohr:/cui/CHEM 860/Assignment/hw6. A simple pKa calculation - one ionizable group. Note: you need to look up the pKa for the model compound.
Explore the variation in the result by using different (i) atomic radii for defining the dielectric boundary (disable stream radius.str to switch from the optimized set to CHARMM van der Waals radii; see pka.str); (ii) solvent probe radii (0 or 1.4 Å); (iii) dielectric constants of the protein (1.0, 4.0 or 20.0); (iv) grid spacings (e.g., with FOCUS on or off); (v) salt concentrations (0 or 150 mM - the latter is the physiological condition). Refer to pbeq.doc for syntax-related issues. You will learn quite a few CHARMM scripting tricks from this exercise.

Chem 860. Lecture 16,17


Basic Monte Carlo and Canonical MD with extended Lagrangian
November 3, 2004

As we discussed in the beginning of the course, one can obtain equilibrium static properties from stochastic simulations in which configurations are NOT related by an equation of motion. The requirement (in the most basic scheme) is that these configurations follow a pre-defined distribution - which often is the Boltzmann distribution. These stochastic simulation methods are referred to as Monte Carlo methods for obvious reasons. Several major advantages of Monte Carlo methods compared to Molecular Dynamics, which will soon become clear, are:
Force evaluation is NOT required
Easy to implement for different ensembles
Flexible moves (e.g., identity exchange) and biases (e.g., multi-canonical MC, parallel
tempering)

1 Metropolis Monte Carlo

1.1 Motivation for Importance Sampling

To compute the ensemble average of a quantity,

$$\langle A \rangle = \int dX\, A(X)\, P(X)$$

all we have to do is construct an algorithm that evaluates this multi-dimensional integral efficiently - e.g., by focusing on regions that have high $P(X)$ instead of sampling uniformly. This is the basic idea of Importance Sampling.
If we know $P(X)$, the integral is straightforward to evaluate. Consider a one-dimensional example,

$$I = \int_0^1 f(x)\, dx = \int_0^1 dx\, w(x)\, \frac{f(x)}{w(x)}$$

Let $du(x)/dx = w(x)$; then we have

$$I = \int_{u(0)}^{u(1)} du\, \frac{f[x(u)]}{w[x(u)]}$$

So we can generate a uniform distribution for $u(x)$, then evaluate the average of $f[x(u)]/w[x(u)]$ over those points. One can show that the statistical uncertainty associated with this average over $L$ points is

$$\sigma_I^2 = \frac{1}{L}\left[\langle (f/w)^2 \rangle - \langle f/w \rangle^2\right]$$

Therefore, if $w(x)$ is chosen such that $f/w$ is a smooth function, the variance will be much smaller than that for $w(x) = 1$ (a uniform distribution in $x$).
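A toy numerical illustration of this variance reduction (not part of the original notes): for $f(x) = e^{-x}$ on $[0,1]$, choosing $w(x) \propto e^{-x}$ makes $f/w$ constant, so the importance-sampling estimate has essentially zero spread.

import random, math

f = lambda x: math.exp(-x)
L = 10000

# (a) uniform sampling, w(x) = 1
vals = [f(random.random()) for _ in range(L)]
mean_u = sum(vals) / L
spread_u = sum((v - mean_u) ** 2 for v in vals) / L      # spread of the f values

# (b) importance sampling: draw x from w(x) = exp(-x)/c on [0,1] by inversion
c = 1.0 - math.exp(-1.0)                                  # normalization of w
draws = [-math.log(1.0 - c * random.random()) for _ in range(L)]
vals_w = [f(x) / (math.exp(-x) / c) for x in draws]       # f/w, here constant = c
mean_w = sum(vals_w) / L
spread_w = sum((v - mean_w) ** 2 for v in vals_w) / L

print(mean_u, spread_u)   # estimate ~0.632, non-zero spread
print(mean_w, spread_w)   # estimate ~0.632, spread ~0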

1.2 Metropolis scheme

A slight problem is that we do not know the probability density $P(X)$ itself. Rather, we only know $\exp[-\beta U(X)]$ and not the configurational integral, $Z = \int dX\, \exp[-\beta U(X)]$. So we have to develop an algorithm such that configurations are generated with relative probabilities proportional to the Boltzmann factor.
Starting from a configuration $o$ (old) with Boltzmann factor $\exp[-\beta U(o)]$, add a displacement to reach configuration $n$ (new) with Boltzmann factor $\exp[-\beta U(n)]$. We have to develop a rule to reject/accept $n$ such that on average the relative probability of finding the system in $n$ and $o$ equals the ratio of the Boltzmann factors.
The transition probability from $o$ to $n$ is written as $\pi(o \to n)$, which involves two steps - pick a move, then reject/accept the move,

$$\pi(o \to n) = \alpha(o \to n) \times acc(o \to n)$$

$\alpha(o \to n)$ is called the underlying matrix of the Markov chain. In general, it can be non-symmetric (i.e., $\alpha(o \to n) \neq \alpha(n \to o)$), although often it is symmetric (e.g., a uniformly random displacement in $x$).
The (strong) condition that the transition probability has to satisfy under equilibrium is detailed balance:

$$P(o)\,\pi(o \to n) = P(n)\,\pi(n \to o)$$

The Metropolis scheme:

$$\pi(o \to n) = \alpha(o \to n), \quad P(n) \geq P(o)$$
$$\pi(o \to n) = \alpha(o \to n)[P(n)/P(o)], \quad P(n) < P(o)$$
$$\pi(o \to o) = 1 - \sum_{n \neq o} \pi(o \to n)$$

or equivalently,

$$acc(o \to n) = \min[1, P(n)/P(o)]$$

Does the Metropolis scheme satisfy detailed balance? Imagine that $U(n) < U(o)$; then

$$\pi(o \to n) = \alpha(o \to n)$$
$$\pi(n \to o) = \alpha(n \to o)[P(o)/P(n)]$$

which means that, if $\alpha$ is symmetric,

$$\frac{\pi(o \to n)}{\pi(n \to o)} = \frac{P(n)}{P(o)}$$

Accepting a move with probability $P(n)/P(o) < 1$: generate a random number uniformly in $[0, 1]$. If it is smaller than $P(n)/P(o)$, accept the move; otherwise, reject. For a long MC run, a good random number generator routine is important.
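A minimal Python sketch of the Metropolis recipe above, for a single particle in a 1D harmonic well (the potential and parameters are arbitrary choices for illustration):

import math, random

def U(x):                       # example potential: harmonic well
    return 0.5 * x * x

def metropolis(nsteps=100000, beta=1.0, max_disp=0.5):
    x, u = 0.0, U(0.0)
    samples = []
    for _ in range(nsteps):
        x_new = x + max_disp * (2.0 * random.random() - 1.0)   # symmetric move
        u_new = U(x_new)
        # acc(o->n) = min[1, exp(-beta*(U_n - U_o))]
        if u_new <= u or random.random() < math.exp(-beta * (u_new - u)):
            x, u = x_new, u_new        # accept
        samples.append(x)              # the old configuration is counted if rejected
    return samples

samples = metropolis()
print(sum(x * x for x in samples) / len(samples))   # <x^2> should approach 1/beta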

1.3 A few practical points

Displacements: single or all?
Since the probability of accepting a move is small if the energy rise is much larger than $k_B T$, it is much better to move one particle at a time than to move all particles at the same time in a random fashion. However, one can generate collective moves using MD (with a large step size), which does not change the energy much. This is the so-called hybrid Monte Carlo approach, which has been shown to sample better than standard MC.
Size of displacement - optimal acceptance ratio?
Clearly the size of the displacement should be adjusted such that the acceptance ratio is reasonable. One must consider both the efficiency of sampling (how far we move around) and the ergodicity of sampling (whether very different configurations are sampled). Previous work, for example that of Thirumalai et al. (Physica A, 210, 453, 1994), showed that the optimal acceptance ratio is closer to 20% than the naive 50%.
Importance of detailed balance
One should avoid changing the maximum stepsize after equilibration.
Even if a move is rejected, the old configuration should still be counted in the average, because $\pi(o \to o)$ is in general not zero.
Unphysical moves can significantly speed things up! This is especially true for highly packed systems such as solid mixtures, dense polymers and proteins.

2 Ensemble in MC and MD

We said that it is straightforward to implement different ensembles in the MC framework compared to MD. As an illustration, let us look at the canonical ensemble. The Metropolis MC scheme we discussed generates exactly the canonical distribution. How can we do the same with Molecular Dynamics?

2.1 Canonical is NOT isokinetic

It is important to know that the canonical ensemble does not imply that the kinetic energy (i.e., an instantaneous temperature) does not fluctuate. The temperature is only a thermodynamic variable that characterizes the energy distribution of the system. At equilibrium, we know:

$$P(\mathbf{p}) = \left(\frac{\beta}{2\pi m}\right)^{3/2} \exp[-\beta p^2/2m]$$

and

$$3 k_B T = \langle p^2/m \rangle$$

To compute the variance of the temperature, we need the second and fourth moments of the Maxwell distribution:

$$\langle p^2 \rangle = \int d\mathbf{p}\, p^2 P(\mathbf{p}) = 3m/\beta$$

$$\langle p^4 \rangle = \int d\mathbf{p}\, p^4 P(\mathbf{p}) = 15(m/\beta)^2$$

Thus,

$$\frac{\sigma^2_{T_k}}{\langle T_k \rangle^2_{NVT}} = \frac{\langle T_k^2 \rangle_{NVT} - \langle T_k \rangle^2_{NVT}}{\langle T_k \rangle^2_{NVT}} = \frac{N\langle p^4 \rangle + N(N-1)\langle p^2 \rangle^2 - N^2\langle p^2 \rangle^2}{N^2 \langle p^2 \rangle^2} = \frac{\langle p^4 \rangle - \langle p^2 \rangle^2}{N \langle p^2 \rangle^2} = \frac{2}{3N}$$

So the relative fluctuations in the instantaneous temperature decrease as the number of particles increases.

2.2 The Andersen model

Intuitive idea: the effect of the heat bath is to exchange energy with the system. So we occasionally pick particles at random and re-assign their velocities based on the Maxwell distribution at temperature T.

Pseudo-code:

do i = 1, Nparticle
   if (rand() < nu*dt) then        ! collision with the bath, probability nu*dt per step
      v(i) = Gaussian(T)           ! new velocity drawn from the Maxwell distribution at T
   endif
enddo

It can be shown that the Andersen scheme does produce a canonical distribution (e.g., one can monitor the velocity distribution), and static properties such as the pressure are independent of the collision frequency ($\nu$). However, the stochastic kicks perturb the dynamics of the system, which makes dynamical properties sensitive to the value of the collision frequency (e.g., the kicks artificially decorrelate the velocities, which modifies the velocity autocorrelation function, which in turn modifies the diffusion constant).
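In more concrete terms, a minimal Python sketch of the collision step (assuming 1D velocities and reduced units; the function name is illustrative):

import math, random

def andersen_collisions(v, m, kB_T, nu, dt):
    """With probability nu*dt per step, re-draw a particle's velocity from
    the Maxwell distribution at temperature T (1D, mass m)."""
    sigma = math.sqrt(kB_T / m)               # width of the Maxwell distribution
    for i in range(len(v)):
        if random.random() < nu * dt:
            v[i] = random.gauss(0.0, sigma)   # fresh velocity from the bath
    return v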

2.3 Nose-Hoover dynamics

The extended Lagrangian approach is widely used - for constant-temperature and constant-pressure simulations, as well as in simulations in which expensive optimizations have to be done (e.g., relaxing the electronic degrees of freedom in polarizable models or electronic structure theory).
For constant-temperature simulations, the basic idea is that canonical trajectories for the real molecular system are generated by sampling the microcanonical ensemble of an extended system (i.e., one with extra degrees of freedom that represent the thermostat).
The Lagrangian of this system is given as,

$$L_{Nose} = \sum_{i=1}^{N} \frac{m_i s^2 \dot{r}_i^2}{2} - U(r^N) + \frac{Q}{2}\dot{s}^2 - \frac{L}{\beta}\ln s$$

where $Q$ is an effective mass associated with the extended degree of freedom ($s$), and $L$ is a variable to be determined later. The magic of the $\ln$ term will become clear soon.
The corresponding Hamiltonian is then

$$H_{Nose} = \sum_i \dot{q}_i p_i - L_{Nose}$$

where $q_i = \{r_i, s\}$ and

$$p_i = \frac{\partial L_{Nose}}{\partial \dot{r}_i} = m_i s^2 \dot{r}_i, \qquad p_s = \frac{\partial L_{Nose}}{\partial \dot{s}} = Q\dot{s}$$

Substituting these in, we get the Nose Hamiltonian,

$$H_{Nose} = \sum_{i=1}^{N} \frac{p_i^2}{2 m_i s^2} + U(r^N) + \frac{p_s^2}{2Q} + L\frac{\ln s}{\beta}$$

Thus, the microcanonical partition function associated with the Nose Hamiltonian is given by,

$$Q_{Nose} = \frac{1}{N!} \int dp_s\, ds\, dp^N\, dr^N\, \delta(E - H_{Nose})$$

Substituting in the Nose Hamiltonian, and using the property,

$$\delta[f(s)] = \delta(s - s_0)/|f'(s_0)|$$

($s_0$ being the root of the function $f(s)$), with straightforward algebra we get

$$Q_{Nose} = C\,\frac{1}{N!} \int dp'^N\, dr^N\, \exp\left[-\beta\,\frac{3N+1}{L}\, H(p', r)\right]$$

where $p' = p/s$.
The exponential comes about because of the magic choice of $\ln s$ in the original Nose Lagrangian.
We see that if we choose $L = 3N + 1$, $Q_{Nose}$ is essentially the canonical partition function of the system characterized by $(r^N, p'^N)$. In other words,

$$\langle A(p/s, r) \rangle_{Nose} = \langle A(p', r) \rangle_{NVT}$$

A technical point is that we distinguish "real" variables (physical system) and "virtual" variables (extended system):

$$r' = r, \quad p' = p/s, \quad s' = s, \quad t' = t/s$$

So the extended variable $s$ can be thought of as adjusting the integration time step of the system. One may choose to propagate the equations of motion in terms of either the physical variables or the virtual variables. To ensure the correct canonical distribution for the physical system, different $L$ values need to be adopted in the two cases (see Frenkel and Smit).
Another tricky point, which we mention without deep explanation, is that the correct distribution is generated only if there is a single constant of motion; for isolated systems (without external forces), this means that the correct distribution is obtained only if the center of mass is fixed. A useful alternative is to use the so-called Nose-Hoover chain, in which a set of thermostats are coupled to one another (G. J. Martyna, et al. J. Chem. Phys. 97, 2635, 1992).
A key parameter in Nose-Hoover MD is the effective mass Q. A small Q corresponds to a low inertia of the heat bath and leads to rapid T fluctuations, while a large Q gives a slow response to T changes. The magnitude of Q has only a minor influence on dynamical quantities such as the diffusion constant.
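As a rough illustration (not from the original notes), here is a sketch of the Hoover real-variable form of these equations for a single 1D harmonic oscillator with m = k = kB = 1. Note that a single oscillator is the classic case where one thermostat is not ergodic (hence the Nose-Hoover chains mentioned above), so the sketch mainly illustrates how Q enters the equations of motion.

def nose_hoover_1d(T=1.0, Q=1.0, dt=0.001, nsteps=100000):
    # Hoover form: dv/dt = F - xi*v ,  dxi/dt = (v*v - T)/Q   (one degree of freedom)
    x, v, xi = 1.0, 0.0, 0.0
    avg_v2 = 0.0
    for _ in range(nsteps):
        f = -x                          # harmonic force, k = 1
        v += dt * (f - xi * v)          # friction-like coupling to the thermostat
        x += dt * v
        xi += dt * (v * v - T) / Q      # small Q: fast response; large Q: sluggish
        avg_v2 += v * v
    return avg_v2 / nsteps              # compare to kB*T (exact only for ergodic systems)

print(nose_hoover_1d(Q=0.5))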

3 Homework

3.1 Pencil exercises: extended Lagrangian

Show explicitly that the microcanonical partition function associated with the Nose Hamiltonian gives the canonical partition function for the physical system with specific choice of
L.

3.2 Computational exercises: Basic Metropolis Monte Carlo

Script is in bohr:/cui/CHEM 860/Assignment/hw7. You will find a CHARMM script that does Monte Carlo simulations for a small protein, BPTI, plus four buried water molecules. Look through the input and mc.doc to understand the different types of moves available in CHARMM for protein simulations.
(1) Run a short MC simulation to optimize the move set (e.g., step size etc.). (2) With this move set, first carry out MC simulations to equilibrate the system - i.e., monitor the structure of the protein and make sure that the RMSD reaches a plateau. (3) Next, run MC production runs. Estimate the average potential energy and the specific heat of the system. (4) Bonus: Can you develop a way to compare the efficiency of MC simulations to Molecular Dynamics? What quantities would you look at? Run a short MD to illustrate your point.

3.3 Project

It is time to think about the final project. You are required to come up with a problem - which
can be (but does not have to be) related to your research. Discuss how to solve this problem
with molecular simulation methods. Perform preliminary calculations with CHARMM to
demonstrate the feasibility of such research. Give a twenty minute presentation by the end
of the semester. You are encouraged to discuss the project with me.

Chem 860. Lecture 18,19


Advanced Monte Carlo: Landau-Wang, parallel tempering; different ensembles
November 14, 2004

1 Sampling with advanced Monte Carlo methods

Previous discussions have emphasized that sampling is a major issue in molecular simulations. With molecular dynamics, which follows the natural dynamical evolution of the system, it is difficult (though not impossible - e.g., umbrella sampling if you know what to bias, or a smoother transformed potential based on Tsallis statistics) to surmount barriers and sample different regions efficiently. With Monte Carlo, by contrast, one can develop innovative techniques because the process is stochastic. All we have to make sure of is that, when we analyze the results, we use the correct probability distribution.
Many versions of MC have been developed over the years. We mainly discuss two approaches: (i) density of states Monte Carlo (also known as the Landau-Wang approach), which samples energy space with a uniform distribution; (ii) parallel tempering or replica exchange, which combines sampling at different temperatures as well as parallel computing technology.

1.1 Density of states Monte Carlo

In canonical MC, energy barriers that are high compared to $k_B T$ are difficult to overcome. A powerful alternative is to perform sampling that is uniform over energy levels (instead of exponential, as in Boltzmann sampling!) - in this way we can sample a wide range of energies and therefore overcome barriers easily. If we can estimate the density of states $\Omega(E)$ - which essentially measures the degeneracy (number of microstates) of the energy levels - then we can calculate any statistical quantity. After all, $\Omega(E)$ and the canonical partition function are related by a Laplace transform,

$$Q(N,V,T) = \int dE\, \Omega(E)\, \exp(-\beta E)$$

The canonical distribution is simply

$$P(E) = \Omega(E)\exp(-\beta E)/Q$$

which can be used to compute any ensemble average of an observable. We note that it is difficult to use traditional canonical MC to estimate $\Omega(E)$ by sampling $P(E)$, because the sampling is poor for high energies and for regions separated by high barriers.
The Landau-Wang algorithm (Phys. Rev. Lett., 86, 2050, 2001) is a method that converges to $\Omega(E)$ within a given tolerance. Imagine that we have a set of energy levels, $\{E_i\}$, and we perform a random walk in energy space such that the histogram $H(E)$ becomes uniform or flat. Such a random walk can be constructed in a similar way as the canonical MC discussed earlier, i.e., make a move according to $\alpha(o \to n)$, then accept the move with a rule specified by $acc(o \to n)$. If we choose $\alpha$ to be symmetric, then the acc function takes the following form to satisfy detailed balance,

$$acc(o \to n) = \min\left(1, \frac{N(n)}{N(o)}\right)$$

where the distribution $N$ in our case is inversely proportional to the density of states (because we want a uniform distribution over energy levels); i.e.,

$$acc(o \to n) = \min\left(1, \frac{\Omega(o)}{\Omega(n)}\right)$$

The only problem is that we don't know $\Omega(E)$ to start with! So we have to iterate. We initialize $H(E) = 0$ and $\Omega(E) = 1$. Then we perform MC moves according to (uniform) random displacements and the acceptance rule established above (based on the current guess of $\Omega(E)$). Throughout the simulation, we update both quantities based on the energy level ($E_i$) after the move (whether accepted or rejected!):

$$H(E_i) \to H(E_i) + 1, \qquad \Omega(E_i) \to f\,\Omega(E_i)$$

where $f$ is taken to be a large number to start with (e.g., $e^1$).
The updates are continued until the histogram $H(E)$ is reasonably flat. At this point, $\Omega(E)$ has converged with an error proportional to $\ln f$. To improve the result, we move on to the next iteration: we reset the histogram to zero and reset $f$ as $f \to \sqrt{f}$.
The iterations stop when $f$ is reasonably close to 1 (e.g., $\ln f < 10^{-8}$).
In practice, there are issues like:
How flat is "flat" for $H(E)$?
Is there a unique functional form for the $f$ update?
How to accumulate $\Omega(E)$.
We emphasize that Landau-Wang does not rigorously satisfy detailed balance - because the acceptance criterion changes during the iterations. But in reality the deviation is very small, especially once $f$ is close to 1. A minimal sketch for a toy system follows.
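A minimal sketch of the scheme for a toy system (not from the original notes): N independent two-level spins with energy E equal to the number of up spins, for which the exact density of states is the binomial coefficient C(N, E). The flatness criterion and update schedule are the usual simple choices and are only illustrative.

import math, random

def wang_landau(N=10, f_final=1e-8, flatness=0.8):
    spins = [0] * N
    E = 0                                   # current energy level (number of up spins)
    ln_g = [0.0] * (N + 1)                  # store ln(Omega) to avoid overflow
    lnf = 1.0                               # start with f = e, i.e. ln f = 1
    while lnf > f_final:
        H = [0] * (N + 1)
        while True:
            k = random.randrange(N)         # trial move: flip one spin
            E_new = E + (1 if spins[k] == 0 else -1)
            # accept with min(1, Omega(old)/Omega(new))
            if ln_g[E] >= ln_g[E_new] or random.random() < math.exp(ln_g[E] - ln_g[E_new]):
                spins[k] ^= 1
                E = E_new
            H[E] += 1                       # update H and ln(Omega) at the current level,
            ln_g[E] += lnf                  # whether the move was accepted or not
            if min(H) > flatness * sum(H) / (N + 1):
                break                       # histogram is "flat enough"
        lnf *= 0.5                          # f -> sqrt(f)
    return ln_g                             # compare to ln C(N, E) + constant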

1.2 Parallel tempering

Another type of approach takes advantage of simulations at multiple temperatures. We construct an extended ensemble that contains multiple copies of the system, with each copy simulated at a different temperature. In addition, we occasionally swap systems; e.g., swap a high-T with a low-T system. Since the high-T simulation can sample configuration space very efficiently, the swap helps the low-T simulation explore the landscape much better than an isolated low-T simulation would. To make the swaps get accepted with reasonable probability, we choose an appropriate range of temperatures.
The only thing we have to figure out is the acceptance criterion, i.e., the acc function. The detailed balance condition for a pair of states is, assuming that $\alpha$ is taken to be symmetric,

$$N(i,\beta_i)N(j,\beta_j)\, acc[(i,\beta_i; j,\beta_j) \to (i,\beta_j; j,\beta_i)] = N(i,\beta_j)N(j,\beta_i)\, acc[(i,\beta_j; j,\beta_i) \to (i,\beta_i; j,\beta_j)]$$

Simple rearrangement then gives,

$$\frac{acc[(i,\beta_i; j,\beta_j) \to (i,\beta_j; j,\beta_i)]}{acc[(i,\beta_j; j,\beta_i) \to (i,\beta_i; j,\beta_j)]} = \frac{N(i,\beta_j)N(j,\beta_i)}{N(i,\beta_i)N(j,\beta_j)} = \exp[(\beta_i - \beta_j)(U(i) - U(j))] = \exp[\Delta\beta_{ij}\,\Delta U_{ij}]$$

Therefore, we can choose the acc function as,

$$acc[(i,\beta_i; j,\beta_j) \to (i,\beta_j; j,\beta_i)] = \min\{1, \exp[\Delta\beta_{ij}\,\Delta U_{ij}]\}$$

A few points to note (a minimal swap sketch is given below):
The swap is computationally inexpensive (no need to recompute any energy).
The swap does not perturb the Boltzmann distribution - so it is simple to accumulate results.
The exchanged variable can be more than just temperature - it can be the chemical potential, polymer length, etc. See work from the de Pablo group.
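A minimal sketch of the swap test (assuming, for illustration, that each replica is stored as a (beta, U, configuration) tuple):

import math, random

def attempt_swap(replicas, i, j):
    """Attempt to exchange the configurations of replicas i and j,
    using acc = min{1, exp[(beta_i - beta_j)*(U_i - U_j)]}."""
    beta_i, U_i, x_i = replicas[i]
    beta_j, U_j, x_j = replicas[j]
    delta = (beta_i - beta_j) * (U_i - U_j)
    if delta >= 0 or random.random() < math.exp(delta):
        # swap configurations (and their energies) between the two temperatures
        replicas[i] = (beta_i, U_j, x_j)
        replicas[j] = (beta_j, U_i, x_i)
        return True
    return False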

2 Monte Carlo simulation of various ensembles

It is very straightforward to develop Monte Carlo methods for different ensembles. All we have to do is include new types of moves that reflect the character of the ensemble (e.g., a volume move for the isobaric ensemble), and develop an acceptance rule for the move such that detailed balance is satisfied. In general, if we make the move-generation step symmetric (i.e., $\alpha(o \to n)$ symmetric), the task of developing the acceptance rule is reduced to finding the probability distribution associated with the move variable (e.g., volume), because we can always use the Metropolis scheme:

$$acc(o \to n) = \min\left\{1, \frac{N(n)}{N(o)}\right\}$$

2.1 Isobaric-isothermal ensemble - NPT

Consider that the system of interest ($N, V, T$) is in contact with a much larger bath system ($M - N$ particles, volume $V_0 - V$, temperature $T$) through a piston; the latter contains an ideal gas at fixed pressure $P$. The volume $V$ can fluctuate such that the pressure of the system of interest is constant. To develop a Monte Carlo scheme for the NPT ensemble, we simply have to add a move that changes the volume $V$. For the corresponding acceptance rule, we have to find $N(V)$ or $N(V, r^N)$.
The canonical partition function for an NVT system is given as:

$$Q_{NVT} = \frac{1}{\Lambda^{3N} N!} \int dr^N\, e^{-\beta U(r^N)}$$

Assuming that we deal with a cube of box length $L$, we can define a set of scaled coordinates,

$$s^N = L^{-1} r^N$$

and the partition function has an explicit $V$-dependence,

$$Q_{NVT} = \frac{V^N}{\Lambda^{3N} N!} \int ds^N\, e^{-\beta U(s^N;L)}$$

Taking the ideal gas in the bath system into account, the total partition function is

$$Q_{NVT;M,V_0} = \frac{V^N (V_0 - V)^{M-N}}{\Lambda^{3M} N!(M-N)!} \int ds^N\, e^{-\beta U(s^N;L)}$$

Note that there is no configurational integral contribution from the bath because it contains an ideal gas.
Using this expression, we can write the distribution of the volume,

$$N(V) = \frac{V^N (V_0 - V)^{M-N} \int ds^N\, e^{-\beta U(s^N;L)}}{\int_0^{V_0} dV'\, V'^N (V_0 - V')^{M-N} \int ds^N\, e^{-\beta U(s^N;L')}}$$
We note that if we take the bath to be extremely large, i.e., $M \to \infty$, $V_0 \to \infty$, we have the approximation,

$$(V_0 - V)^{M-N} = V_0^{M-N}\left\{1 - \frac{V}{V_0}\right\}^{M-N} \approx V_0^{M-N} e^{-(M-N)V/V_0} = V_0^{M-N} e^{-\beta P V}$$

where we used the fact that the bath contains an ideal gas of well-defined volume, pressure and temperature. Substituting this result into the expression for $N(V)$, it is easy to see that

$$N(V, s^N) \propto V^N e^{-\beta P V} e^{-\beta U(s^N;L)} = \exp\{-\beta[U(s^N;L) + PV - N\beta^{-1}\ln V]\}$$

which means that the acceptance rule for the volume move is

$$acc(V_{old} \to V_{new}) = \min\{1, \exp\{-\beta[\Delta U(s^N;L) + P\Delta V - N\beta^{-1}\ln(V_{new}/V_{old})]\}\}$$

Note that the potential energy contributes because the inter-particle distances change once the volume of the system changes (i.e., it is $s^N$ that is held fixed during the volume move). As a result, the volume move is expensive because the total potential energy has to be recalculated. In realistic simulations, one should invoke the volume move with a probability of about $1/N$, $N$ being the number of particles in the system.
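A minimal sketch of the volume move (assuming an energy function U(s, L) for the scaled coordinates is provided; parameter values are illustrative):

import math, random

def volume_move(s, L, U, N, P, beta, dV_max=50.0):
    """Attempt one NPT volume move with scaled coordinates s held fixed."""
    V_old = L ** 3
    U_old = U(s, L)
    V_new = V_old + dV_max * (2.0 * random.random() - 1.0)   # symmetric trial move
    if V_new <= 0.0:
        return L                          # reject unphysical volume
    L_new = V_new ** (1.0 / 3.0)
    U_new = U(s, L_new)                   # full energy recalculation: expensive step
    arg = -beta * ((U_new - U_old) + P * (V_new - V_old)
                   - (N / beta) * math.log(V_new / V_old))
    if arg >= 0 or random.random() < math.exp(arg):
        return L_new                      # accept: box length changes
    return L                              # reject: keep old box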

2.2 Grand Canonical ensemble - GCMC

The grand canonical ensemble is very powerful when studying absorption or the water distribution in the interior of biomolecules, for example. We use the same bath-system trick to think about GCMC, except that now the system and bath can exchange particles as well. For simplicity, we consider the $\mu VT$ ensemble, in which the volume of the system is constant.
The canonical partition function of the entire system plus bath is given as

$$Q_{M,V,V_0,T} = \sum_{N=0}^{M} \frac{V^N (V_0 - V)^{M-N}}{\Lambda^{3M} N!(M-N)!} \int ds^N\, e^{-\beta U(s^N;L)}$$

The probability distribution for $N$ is then given as

$$N(N, s^N) = \frac{V^N (V_0 - V)^{M-N}\, e^{-\beta U(s^N;L)}}{Q_{M,V,V_0,T}\, \Lambda^{3M}\, N!\,(M-N)!}$$

Using the same large-bath limit, one can show that the probability distribution simplifies to

$$N(N, s^N) \propto \frac{e^{\beta\mu N}\, V^N\, e^{-\beta U(s^N;L)}}{\Lambda^{3N}\, N!}$$

where $\mu$ is the chemical potential imposed by the bath.
This immediately suggests the following acceptance rules:

$$acc(N \to N+1) = \min\left\{1, \frac{V}{\Lambda^3 (N+1)} \exp[\beta(\mu - U(N+1) + U(N))]\right\}$$

$$acc(N \to N-1) = \min\left\{1, \frac{\Lambda^3 N}{V} \exp[-\beta(\mu + U(N-1) - U(N))]\right\}$$

where notation such as $U(N+1)$ indicates the potential energy of the system containing the specified number of particles. One can show that this set of acceptance rules satisfies detailed balance.
In practical simulations, the difficulty is associated with the insertion move, because one has to be careful about where to insert the new particle; otherwise, the acceptance probability is very small. For a recent example of an implementation for studying water/ion distributions in biomolecules, see J. Chem. Phys., 121, 6392.
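A minimal sketch of the two acceptance tests above (the energy evaluations for the trial configurations are assumed to be done elsewhere; Lambda3 denotes the cube of the thermal de Broglie wavelength):

import math, random

def accept_insert(N, V, beta, mu, Lambda3, U_old, U_new):
    # acc(N -> N+1) = min{1, V/(Lambda^3 (N+1)) * exp[beta*(mu - (U_new - U_old))]}
    arg = math.log(V / (Lambda3 * (N + 1))) + beta * mu - beta * (U_new - U_old)
    return arg >= 0 or random.random() < math.exp(arg)

def accept_delete(N, V, beta, mu, Lambda3, U_old, U_new):
    # acc(N -> N-1) = min{1, Lambda^3 N / V * exp[-beta*(mu + (U_new - U_old))]}
    arg = math.log(Lambda3 * N / V) - beta * mu - beta * (U_new - U_old)
    return arg >= 0 or random.random() < math.exp(arg)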

3 Homework

3.1 Pencil exercises - Question 13 in FS

Consider developing a GCMC scheme for a mixture of two components at temperature $T$ and chemical potentials $\mu_1$ and $\mu_2$ for components 1 and 2.
1. To add/remove particles, the following scheme is used:
select at random whether to add or remove a particle;
select at random a component;
add or remove a particle of this component.
Derive the acceptance rules for these trial moves.
2. Imagine an alternative scheme:
select at random whether to add or remove a particle;
select at random a particle, independent of its identity, and add or remove it.
Does this scheme satisfy detailed balance if the acceptance rules derived in 1 are used? If not, can this be corrected? Hint: a useful reference is Mol. Phys. 85, 435, 1995.

3.2 Computational exercises: NVT vs. NPT simulation of water

Script is in bohr:/cui/CHEM 860/Assignment/hw8. You will find a sample script for running an NPT MC simulation for a box of water molecules. Modify the script such that:
You can run a converged simulation for water, write out the trajectory and compute the RDFs; compare with your previous MD run.
Bonus: Run an NVT MC simulation and compare the results with the NPT simulation.

Chem 860. Lecture 20,21


Special Topics in Biophysics. I. Enzyme Catalysis
November 21, 2004

Recap: what have we learned about Molecular Simulation?

In previous lectures, we have gone through introductions to some of the most fundamental
topics in molecular simulations. Recall that most simulation studies deal with two types of
observables:
Equilibrium properties - such as average structures (RDF), fluctuations (heat capacity)
and thermodynamical properties (free energy of binding or potential of mean force).
Dynamical properties - such as diffusion constant and line-shape of condensed phase spectroscopy, which involve evaluation of certain type of time-correlation function.
For equilibrium properties, both Molecular Dynamics (MD) and Monte Carlo (MC) can
be used. MD is straightforward to implement and follows natural evolution of the system;
MC requires some tuning (e.g., step-size and move types) but is more flexible which allows
better sampling and implementation of various ensembles. For dynamical properties, only
MD should be used rigorously.
To obtain reliable results for realistic systems, one has to worry about the accuracy of the force field (e.g., whether polarization is needed), the sampling of conformational space and the statistical error (e.g., block averaging), as well as the treatment of important interactions (e.g., different methods for treating electrostatics - cut-off, Ewald, implicit solvent). The level of sophistication depends on the specific question under investigation; for simulation studies of complex systems, relevant experimental validation is crucial. Finally, one should always remember that although accurate simulation is useful, the most powerful aspect of carrying out molecular simulations is that they allow one to analyze which factors are most important in governing the process of interest.

2 Special Topics

Starting from this lecture, we will discuss several special topics in the field of computational biophysics. We will introduce the relevant scientific background and then discuss representative techniques that were specifically developed for studying these topics; since these topics concern areas of active research, there is plenty of opportunity for you to contribute, so we will also comment on the current challenges.
We plan to introduce (if time permits):
Enzyme catalysis: combined QM/MM methods and Car-Parrinello
Rare events in the condensed phase: reaction path search and transition path sampling
Protein-protein/ligand interaction: thermodynamics and kinetics
Conformational transitions in macromolecules: from protein folding to signal transduction
Simulation and experiment: structural refinement and probing dynamics
Simulation of large length scales.

3 Topic I: Enzyme Catalysis: combined QM/MM methods and Car-Parrinello

3.1 Background

Enzymes are amazing catalysts - they can catalyze chemical reactions up to $10^{20}$ times faster than the corresponding solution reactions and often have equally impressive specificity (e.g., stereochemistry). In the words of Jeremy Knowles, enzymes are "not different, just better". Understanding what makes them so efficient and specific is the dream of many bio- and physical chemists. Ultimately, we envision that we can design new enzymes for specific chemistry in a rational way.
The fundamental difficulty associated with enzyme studies is that bond breaking and formation processes have to be described with adequate accuracy. Traditional force fields are not even qualitatively useful for such purposes. Although accurate numerical potential energy surfaces can be developed for small molecules, it is difficult to do so for large systems.
There are two general approaches: QM/MM, where a small part of the system is treated with QM and the rest with MM; and Car-Parrinello, which treats all (or most) atoms with Density Functional Theory on the fly. Although mainly introduced here as a way to study chemical reactions in biological systems, both find applications in wide areas of science - such as materials science (material cracking) and heterogeneous catalysis (surface reactions). In addition, both methods can be used to probe not only energetics but also diverse types of molecular properties in complex environments (e.g., NMR chemical shifts), which is difficult to do with pure empirical force fields.

3.2 QM/MM

It has been controversial as to who initially proposed QM/MM methods - it is safe to say that many had similar ideas (which are very intuitive anyway), but Warshel/Levitt and Karplus led the two groups that really pushed the technique forward in different ways. Many others made useful developments subsequently (J. Gao, R. Friesner, K. Morokuma, W. Yang, W. Thiel ...).
Useful review articles: Gao, J. Acc. Chem. Res., 29, 298 (1996); Field, M. J. et al., J. Comp. Chem., 11, 700, 1990; Warshel, A., Ann. Rev. Biophys. Biomol. Struct., 32, 425 (2003)
QM/MM partitioning and potential energy
A popular expression for the potential function is,

$$U^{tot} = \left\langle \Psi \left| \hat{H}_{QM} + \hat{H}^{QM/MM}_{elec} \right| \Psi \right\rangle + E^{QM/MM}_{van} + E^{QM/MM}_{bonded} + E^{MM}$$

where the electrostatic part is the dominant interaction between QM and MM atoms. Usually it takes the form of pure Coulombic interactions between the MM partial charges ($A$) and the QM electrons ($i$)/nuclei ($B$),

$$\hat{H}^{QM/MM}_{elec} = -\sum_{i,A} \frac{Q_A\, e}{|\mathbf{R}_A - \mathbf{r}_i|} + \sum_{B,A} \frac{Q_A Z_B}{|\mathbf{R}_A - \mathbf{R}_B|}$$

The van der Waals term ($E^{QM/MM}_{van}$) describes the short-range repulsion and long-range dispersion between QM and MM atoms; the bonded term ($E^{QM/MM}_{bonded}$) maintains the connection of the QM fragment (e.g., a sidechain in a protein) with the MM part. For the importance of these terms see, for example, D. Riccardi et al. J. Phys. Chem. B, 108, 6467 (2004).
Key items to pay attention to:
The interface scheme (though NOT as serious an issue as claimed in the literature) - unless there is a significant change in the total charge of the QM part, e.g., in proton affinity or redox potential calculations. For a recent survey see, for example, Reuter et al., J. Phys. Chem. A, 104, 1720 (2000).
Electrostatics in the QM/MM interactions - cut-off vs. an appropriate treatment - can significantly influence the results of a simulation.
The level of QM (EVB, semi-empirical, DFT, ab initio) - which to choose depends on the question of interest.
Sampling - depends on the reaction. See Warshel's review for interesting comments.

3.3 Car-Parrinello

With a potential function at a specific level of QM, one can compute the forces on the atoms and propagate the classical equations of motion using molecular dynamics. This is often referred to as ab initio MD. Although straightforward, it may not be the most efficient approach. In fact, since we often use small integration time steps, the electronic structure of the system does not change much between steps - we should be able to take advantage of this somehow.
Car and Parrinello proposed just such an approach in 1985 (Phys. Rev. Lett., 55, 2471, 1985), using the concept of an extended Lagrangian. The CP Lagrangian has the form,

$$L_{CP} = \frac{1}{2}\sum_A M_A \dot{R}_A^2 - U(\{\psi_i\}, \{R_A\}) + \sum_i \mu_i \left\langle \dot{\psi}_i \middle| \dot{\psi}_i \right\rangle$$

The last term contains the fictitious kinetic energy associated with the Kohn-Sham orbitals $\psi_i$. In other words, instead of going through the SCF cycles of ab initio MD for each nuclear configuration, we propagate the Kohn-Sham orbitals along with the nuclear positions during the MD. Here we have implicitly assumed that the orthonormality conditions associated with the Kohn-Sham orbitals have been incorporated into the energy function.
Using the Lagrangian formalism, we have the following equations of motion for the nuclear positions and KS orbitals,

$$M_A \ddot{R}_A = -\frac{\partial U(\{\psi_i\},\{R_A\})}{\partial R_A}, \qquad \mu_i |\ddot{\psi}_i\rangle = -\frac{\partial U(\{\psi_i\},\{R_A\})}{\partial \langle\psi_i|}$$

It is generally assumed that if the fictitious mass $\mu_i$ is sufficiently small, CPMD generates a potential energy sufficiently close to that of the true adiabatic electronic state (which is solved for rigorously in ab initio MD through the SCF). This is generally accepted, although one can show that there is a systematic error associated with the energy and forces in CPMD as long as the fictitious mass is finite. The result of CPMD depends very sensitively on the fictitious mass; if $\mu_i$ is too large, the electronic motion in CPMD becomes coupled with the nuclear motion (which it should not be when the Born-Oppenheimer approximation is valid) and the results (e.g., vibrational frequencies or diffusion constants) can be significantly perturbed. For a nice illustration of this, see a recent discussion by Galli et al., J. Chem. Phys., 120, 300 (2004). To avoid kinetic energy transfer between the electronic and nuclear degrees of freedom, Nose-Hoover chain methods are used to keep the electronic and nuclear temperatures at different values (e.g., 2 and 300 K, respectively).
Essential limitations of CP at the current stage:
Time scales are still too short (~20 ps).
Mostly restricted to the use of pure DFT (i.e., original CPMD cannot be used with hybrid functionals, which are known to be more accurate).
An un-natural basis set (plane waves) for molecular systems (in contrast to periodic systems).
However, CPMD and ab initio MD clearly offer a very exciting future for molecular simulation in chemistry-related physical and biological science. Follow the work of the Parrinello and Tuckerman groups for further developments.

3.4 Current Challenges

Reliable treatment of large systems in an efficient manner (e.g., ribosome function).


Open-shell systems (e.g., transition metalloenzymes, electronically excited states).
Balanced treatment of reactive group and its environment (e.g., polarization in the environment).
Accuracy vs. sampling - e.g., combine different levels of theories in free energy simulations

4 Examples

4.1 QM/MM in CHARMM

CHARMM has a number of QM packages interfaced with it, which allows QM/MM simulations at different levels of sophistication (in terms of level of QM and level of sampling).
As a very simple example for syntax, see the input file under Assignment/qmmm. Read
qmmm.doc for general instructions for calculations using semi-empirical (AM1, PM3) as
the QM method. Other possibilities are explained in: gamess.doc, gamess-uk.doc and sccdftb.doc.

4.2 CPMD

Just google:-) but www.cpmd.org is a good place to start.

Chem 860. Lecture 22,23


Special Topics in Biophysics. II. Studying rare events in proteins
December 6, 2004

1 Background

The last two lectures discussed general methods for generating potential functions that can describe chemical processes. The next two lectures discuss how we can use such potentials to gain mechanistic insight into chemical reactions.
Many chemical reactions that occur in biomolecules can be described as rare events - the barrier is substantially higher than the thermal energy, $k_B T$. Most of the time the system simply undergoes thermal fluctuations without making significant progress towards reacting; very rarely, a sufficient amount of energy is distributed into the relevant degrees of freedom, and as a result the chemical reaction occurs very quickly (i.e., barrier crossing is fast). Because of the long time (typically > µs) associated with the system preparing the reactive configuration and energy distribution, it is often not possible to watch the chemical reaction occur spontaneously in a simulation - especially if meaningful statistics need to be accumulated.
In a very limited number of cases, one can identify a few coordinates that serve as the most important descriptors ("order parameters") of the reactive event - then one can compute the potential of mean force along those coordinates. The shape of the PMF can tell us about the approximate energetics, the sequence of events and the nature of possible intermediates. For reactions in the condensed phase, however, it is often difficult to identify the most important coordinates (after all, if we had identified these coordinates, the problem would be more or less solved); more seriously, seemingly reasonable results might be obtained even when important coordinates are missing, which can be misleading.
There are two general classes of methods for identifying the important degrees of freedom for describing a chemical reaction, given that both the reactant and product states are known. The first class searches for paths in coordinate space - the result is generally known as a "reaction path"; the second class searches for paths in phase space (both coordinates and momenta) - the result is referred to as a dynamical (transition) path. The latter clearly offers more detail but is also much more expensive computationally.
There are also many processes that are diffusive - e.g., conformational transitions in proteins. The issues of identifying the specific sequence of events and the thermodynamic driving force are equally important, but different methodologies have to be used, which will be introduced separately.

2 Reaction path search algorithms

2.1 Temperature-independent path - the minimum energy path (MEP)

The MEP is a path in coordinate space that connects the reactant, the product and the first-order saddle point between the two; it is defined as the steepest descent path from the saddle point, and often mass-weighted coordinates are used. There is no kinetic energy component, and therefore the MEP is temperature-independent. The effect of temperature on the thermodynamics/kinetics of the reaction is usually taken into account through the harmonic approximation and transition state theory.
The MEP is used extensively in the study of small-molecule reactions. In the condensed phase, it is difficult to use the MEP to obtain quantitative results because there are many local minima and saddle points, and therefore many MEPs connecting them. However, previous studies have shown that qualitative insights can be obtained for many systems by sampling multiple reaction paths (e.g., Zhang and McCammon, J. Phys. Chem. B, 107, 4459, 2003).
There are two general protocols for searching for MEPs. One protocol searches for the saddle point first, then follows the steepest descent path from the saddle; the other searches for the MEP directly and hopes to converge to the saddle point as well. We give a basic introduction to those implemented in CHARMM.
2.1.1 Minimization and saddle point optimization; Conjugate Peak Refinement (CPR)

Most optimization procedures in simulations are based on the Newton method, which involves a second-order Taylor expansion of the potential energy around the current configuration,

$$U(X) = U(X_0) + g(X_0)(X - X_0) + \frac{1}{2}(X - X_0)^T H(X_0)(X - X_0) + \cdots$$

where $g$ and $H$ are the gradient vector and the Hessian matrix, respectively.
Taking the first derivative of $U(X)$ and setting the result to zero (stationary condition), one obtains

$$X = X_0 - H^{-1}(X_0)\, g(X_0)$$

If the surface is quadratic, the Newton method converges in one step for a system of arbitrary dimension from an arbitrary starting configuration. In reality, the potential energy surface is of course not quadratic and one has to iterate.
For large systems, the Hessian is expensive to compute, and the inversion also scales poorly with the number of dimensions. Therefore, one often uses approximate Hessians and updates them based on gradient information. How quickly the optimization converges depends critically on the quality of the Hessian used.
To make sure the energy goes down (we are searching for minima), any negative eigenvalues of the Hessian are shifted to positive values.
In the more dramatic extreme, one uses steepest descent, in which the Hessian is set to unity and one simply follows the direction of the gradient, i.e.,

$$X = X_0 - g(X_0)$$

To make the optimization efficient, one often does a line search in the direction of the gradient, i.e.,

$$X = X_0 - \lambda\, g(X_0)$$

Or one simply uses $\lambda$ as a way to restrict the size of the step (a trust radius); a minimal sketch is given below.
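A minimal steepest-descent sketch (not from the original notes) with a crude backtracking step control; the quadratic test function is only an example:

def steepest_descent(x, U, grad, step0=1.0, tol=1e-6, max_iter=1000):
    """Follow -grad(U), shrinking the step until the energy decreases."""
    for _ in range(max_iter):
        g = grad(x)
        gnorm = sum(gi * gi for gi in g) ** 0.5
        if gnorm < tol:
            break                                    # gradient ~ 0: a minimum
        step = step0
        while step > 1e-12:
            x_new = [xi - step * gi for xi, gi in zip(x, g)]
            if U(x_new) < U(x):
                x = x_new
                break
            step *= 0.5                              # backtrack (trust-radius-like control)
    return x

# Example on U(x, y) = x^2 + 4*y^2
U = lambda r: r[0] ** 2 + 4.0 * r[1] ** 2
grad = lambda r: [2.0 * r[0], 8.0 * r[1]]
print(steepest_descent([1.0, 1.0], U, grad))         # approaches [0, 0]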
The search for saddle points is generally much more demanding because we need to identify one and only one direction along which the curvature is negative. Therefore, the Hessian should have exactly one negative eigenvalue. If the starting configuration is far from the saddle, this may not be the case; many algorithms have been developed to identify the correct eigenvector and enforce the corresponding eigenvalue to be negative.
The conjugate peak refinement (CPR) method is a gradient-based method for searching for saddle points (Fischer, Karplus, Chem. Phys. Lett., 194, 252 (1992)). It consists of the following procedure:
1. Start from a straight-line path as the initial guess; search along the line to identify the maximum, which is called the "peak".
2. From that peak, search conjugate to the direction of the reaction pathway and find the lowest energy point. Make this point a permanent intermediate point on the path.
3. Return to step 1 and refine the two intermediate segments in the same manner, until convergence.
The method is powerful because it can converge to the exact saddle point. The practical convergence behavior depends on the initial guess (one does not have to start with only two points - reactant and product; instead, one can use a coordinate to drive an approximate path and then feed it to CPR). The CPR method has been used in many enzymatic studies as well as conformational transition problems.
2.1.2 Direct path search methods - Nudged Elastic Band (NEB)

A representative direct path method is the NEB approach, which was developed by Jonsson and co-workers based on the pioneering work of Elber and Karplus. One describes the path with a number ($N$) of images (replicas of the system). The general goal is to identify a set of images that minimizes the functional,

$$\sum_{i}^{N} U(X_i)$$

in the presence of constraints that keep the path points nearly equidistant:

$$\sum_{i}^{N-1} \frac{k}{2}(X_{i+1} - X_i)^2$$

Although one may search for images by minimizing the sum of the above two terms, Jonsson et al. found that better convergence can be obtained by working with modified versions of the force components perpendicular and parallel to the tangent of the path ($\hat{\tau}_i$),

$$F_i = F^s_i|_{\parallel} - \nabla U(X)|_{\perp}$$

where

$$\nabla U(X)|_{\perp} = \nabla U(X) - (\nabla U(X)\cdot\hat{\tau}_i)\,\hat{\tau}_i$$

and

$$F^s_i|_{\parallel} = k[(X_{i+1} - X_i) - (X_i - X_{i-1})]\cdot\hat{\tau}_i\;\hat{\tau}_i$$

The direction of the tangent can be estimated in different ways, the simplest being

$$\hat{\tau}_i = \frac{X_{i+1} - X_i}{|X_{i+1} - X_i|}$$

or one may use the bisector of two adjacent two-point estimates, or even more sophisticated approximations (J. Chem. Phys. 113, 9978 (2000)).
The fact that the perpendicular force vanishes upon convergence ensures an MEP; the parallel component makes sure that the images remain nearly equidistant.
Without the force massaging (or "nudging"), the spring force causes the path to deviate from the MEP at high-curvature regions ("corner-cutting") and the potential force causes the images to drift toward low energy regions ("sliding-down"). The nudging decouples the spring force and the potential force, making them responsible for controlling the image spacing and the MEP convergence, respectively.
One drawback of the NEB approach is that it does not yield the exact saddle point. However, a well-converged NEB path can be used as the input for other methods to obtain a more accurate saddle point configuration.
Many improvements have been made on NEB to make it more effective for biological systems - e.g., using lower weights for soft degrees of freedom. See, for example, J. Chem. Phys., 120, 8039 (2004). A sketch of the nudged force on a single image is given below.
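A minimal sketch of the nudged force on one interior image, using the simple tangent estimate above (numpy arrays are assumed for the image coordinates and the gradient; names are illustrative):

import numpy as np

def neb_force(X_prev, X_i, X_next, grad_i, k):
    """Nudged force on image i: perpendicular true force plus parallel spring force."""
    tau = (X_next - X_i).astype(float)
    tau /= np.linalg.norm(tau)                       # simple tangent estimate
    # perpendicular component of the true force (-grad U), parallel part removed
    f_perp = -grad_i + np.dot(grad_i, tau) * tau
    # parallel spring force keeps the images roughly equidistant along the path
    f_spring = k * np.dot((X_next - X_i) - (X_i - X_prev), tau)
    return f_perp + f_spring * tau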

3 Transition path sampling - throwing ropes between mountains in the dark

There are many reasons to go beyond the MEP search methods outlined above:
There are many MEPs connecting two free energy basins for complex systems. One would like to minimize the effect of the initial guess - although chemical intuition is useful, it can often be misleading as well.
At finite temperature, the transition state can be very different from the saddle point. Even in the gas phase this can be the case: an extreme example concerns radical recombination, for which there is no saddle point on the potential energy surface but there is a transition state (i.e., a dynamical bottleneck) for recombination on the free energy surface - because there is an entropic penalty associated with the recombination of two radicals. For many condensed phase processes, there is a significant entropic component.

3.1 Transition State Ensemble

A useful definition of a transition state configuration is in terms of the commitment probability - i.e., the probability of reaching the product (reactant) after initiating trajectories with random velocities (consistent with the temperature, of course). A configuration is a transition state if the commitment probabilities towards the product ($p_B$) and the reactant ($p_A$) are both 0.5. All these TS configurations form the so-called Transition State Ensemble.
With this definition, it is straightforward to identify the TS ensemble - we collect a set of reactive trajectories (which TPS is efficient for), because these trajectories will include at least one member of the TS ensemble, and for each configuration we determine the commitment probability by initiating many short trajectories (as sketched below). If $p_B = p_A = 1/2$, this configuration is part of the TS ensemble.
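A hypothetical sketch of such a committor estimate; the velocity-drawing and short-MD helpers are assumed and passed in as callables, since they depend on the system being studied:

def committor_pB(x0, draw_velocities, run_to_basin, n_shots=100):
    """Estimate p_B for configuration x0.
    draw_velocities(): returns Maxwell-Boltzmann velocities at the target T (assumed helper).
    run_to_basin(x0, v0): runs a short trajectory and returns 'A' or 'B' depending on
    which basin is reached first (assumed helper)."""
    n_B = 0
    for _ in range(n_shots):
        if run_to_basin(x0, draw_velocities()) == 'B':
            n_B += 1
    return n_B / n_shots      # about 0.5 for members of the transition state ensemble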
Once the TS ensemble is collected, one can attempt to perform an analysis and identify the degrees of freedom that might be important for the reaction. To verify that these are the important variables - the reaction coordinates - one may further study the distribution of the commitment probability when the candidate reaction coordinates are held at their transition-state values. If the variables identified are the only important ones, $p_B$ should have a sharp distribution at 1/2. Otherwise, one observes either a very flat distribution or one peaked at 0 and/or 1. For a good discussion, see Chandler and co-workers in Proc. Natl. Acad. Sci. 97, 5877, 2000; even for a short peptide in the gas phase, the variables that make chemical sense are not sufficient to provide a complete description.

3.2

TPS and chemical kinetics

Here we briefly discuss how TPS can give rigorous (classical) rate constants and how TPS
sampling is done. For more details, see Adv. Chem. Phys, 123, 1 (2002); or Ann. Rev. Phys.
Chem., 53, 291 (2002).
5

Basically, the key quantity to observe when studying kinetics is the time-correlation function,
$$C(t) = \frac{\langle h_A(x_0)\, h_B(x_t) \rangle}{\langle h_A(x_0) \rangle}$$

where hA(x0) is the indicator function of the reactant; i.e., hA(x0) = 1 if x0, by a predefined criterion, falls into the reactant region, and 0 otherwise. The same definition applies to hB(xt) for the product region. On time scales that are long compared to the molecular time scale but much shorter than the reaction relaxation time (which is related to the macroscopic rate constants), we have
$$C(t) \approx \langle h_B \rangle \left(1 - e^{-t/\tau_R}\right)$$
where τ_R is the relaxation time, 1/(k_AB + k_BA). In this regime, one can show that the time derivative of C(t) is nothing but the forward rate constant.

$$\dot{C}(t) = \langle h_B \rangle / \tau_R = k_{AB}$$
The last identity can be derived straightforwardly from classical macroscopic chemical kinetics (see either Chandler's book or Frenkel & Smit). In short, one uses TPS to compute C(t) and monitors its time derivative; the plateau value of the latter gives the classically exact rate constant.
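For orientation, the brute-force estimator of C(t) from one long equilibrium trajectory is sketched below (Python); it assumes the trajectory has already been reduced to indicator arrays hA and hB (one entry per saved frame), and dt_frame, the frame spacing, is an assumed input. For rare events this direct route is hopeless, which is the motivation for the path-sampling strategy that follows.

    import numpy as np

    def correlation_c(hA, hB, max_lag):
        """Brute-force C(t) = <hA(0) hB(t)> / <hA(0)> from indicator arrays."""
        hA = np.asarray(hA, dtype=float)
        hB = np.asarray(hB, dtype=float)
        n = len(hA)
        C = np.empty(max_lag)
        for lag in range(max_lag):
            C[lag] = np.mean(hA[:n - lag] * hB[lag:]) / np.mean(hA[:n - lag])
        return C

    # k_AB is the plateau of the numerical derivative, e.g.
    #   k_est = np.gradient(correlation_c(hA, hB, max_lag), dt_frame)
    # evaluated in the window t_molecular < t << tau_R.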
How to compute C(t) using TPS? One can express C(t) slightly differently,
$$C(t) = \frac{\int dx_0\, N(x_0)\, h_A(x_0)\, h_B(x_t)}{\int dx_0\, N(x_0)\, h_A(x_0)} = \frac{\int dx_0\, N_{A,t}(x_0)\, h_B(x_t)}{\int dx_0\, N_{A,t}(x_0)} = \langle h_B(x_t) \rangle_{A,t}$$
That is, C(t) is nothing but the ensemble average of hB(xt) - but the average is over a path ensemble consisting of all paths that start from the reactant region (A) and last for a time t. Although this ensemble is straightforward to construct, computing C(t) from it is not necessarily easy, because only trajectories that end up in the product region (B) at time t contribute to C(t), and such reactive trajectories are rare. The trick is to do something like umbrella sampling. Imagine that the condition we use to define the product region, B, is that a certain order parameter λ falls into a specific range, i.e.,
$$\lambda_{min} < \lambda < \lambda_{max}$$
Then we can rewrite C(t) as,
$$C(t) = \int_{\lambda_{min}}^{\lambda_{max}} P(\lambda, t)\, d\lambda$$
where
$$P(\lambda, t) = \langle \delta(\lambda - \lambda(x_t)) \rangle_{A,t}$$
To compute P(λ, t) efficiently, we can divide the region (λ_min, λ_max) into multiple windows and add windowing functions to enforce the sampling of a specific window, as in the more conventional umbrella sampling,

$$P(\lambda, t) = \frac{\int dx_0\, N(x_0)\, h_A(x_0)\, W(x_t)\, \delta(\lambda - \lambda(x_t))}{\int dx_0\, N(x_0)\, h_A(x_0)\, W(x_t)} = \langle \delta(\lambda - \lambda(x_t)) \rangle_{A,W,t}$$
where the windowing function W(xt) is 1 if the final configuration falls in the desired window (i.e., λ range) and 0 otherwise. The probability distribution sampled in such a biased path simulation is then
$$N_{A,W,t}(x_0) = N(x_0)\, h_A(x_0)\, W(x_t)$$
TPS is an algorithm that uses Monte Carlo to sample paths according to this
probability distribution.
In regular configuration-based Monte Carlo, we move from the current configuration to a new one, then decide whether to accept the new configuration based on a rule that ensures detailed balance. The same idea applies here: we start from the current trajectory/path in the A, W, t ensemble, make either shooting or shifting moves to generate a new trajectory/path, then decide whether to accept this path based on the acceptance rule,
$$acc(o \rightarrow n) = \min\left[1,\; \frac{N_{A,W,t}(n)}{N_{A,W,t}(o)}\right]$$
For a more detailed discussion of the moves, see the references given above. Note also that this brute-force way of computing C(t) is not the most efficient - one would need to run TPS for a large number of t values. A better way is to use a recursive relation, which is detailed by Dellago in the Adv. Chem. Phys. review.
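A heavily simplified sketch of one shooting move is given below (Python). The path is stored as arrays of positions and momenta at discrete time slices; perturb_momenta, propagate_backward, propagate_forward, in_A and window_W are hypothetical helpers tied to the MD engine and the chosen order parameter. The acceptance shown assumes symmetric momentum perturbations, in which case the ratio of path weights reduces to the constraints hA(x0) W(xt) on the trial path.

    import numpy as np

    def shooting_move(path, rng, perturb_momenta, propagate_backward,
                      propagate_forward, in_A, window_W):
        """One TPS shooting move; path is a dict with 'x' and 'p' arrays,
        one row per time slice."""
        n_slices = len(path['x'])
        i = rng.integers(1, n_slices - 1)        # pick an interior time slice

        # Perturb the momenta at the shooting point
        p_shoot = perturb_momenta(path['p'][i], rng)

        # Re-integrate backward to slice 0 and forward to the final slice.
        # Convention assumed here: the backward segment returns slices
        # 0 .. i-1 (already in forward time order), the forward segment
        # returns slices i .. n_slices-1 (including the shooting slice).
        x_back, p_back = propagate_backward(path['x'][i], p_shoot, i)
        x_fwd, p_fwd = propagate_forward(path['x'][i], p_shoot, n_slices - i)
        x_new = np.concatenate([x_back, x_fwd])
        p_new = np.concatenate([p_back, p_fwd])

        # Accept if the trial path still starts in A and ends in the window
        if in_A(x_new[0]) and window_W(x_new[-1]):
            return {'x': x_new, 'p': p_new}      # accepted
        return path                              # rejected: keep the old path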
What are the current challenges for TPS?
Appropriate order parameters that define regions A and B so that they do not overlap.
Generation of the initial path. The usual trick is to use high-temperature trajectories.
Meaningful analysis of the TPS results.
Diffusive processes???

Chem 860. Lecture 24,25


Special Topics in Biophysics. III. Brownian Dynamics
December 6, 2004

Protein-ligand interactions are involved in many fundamental processes, such as substrate recognition by enzymes; the most practical application is of course drug design. Computational approaches can, in principle, help minimize the amount of experimental screening and guide the design in the right direction.
Protein-protein interactions are important for many reasons: signal transduction, aggregation and polymerization, etc. Computational approaches are useful because the complex between proteins is often short-lived and therefore difficult to characterize in detail experimentally.
For a decent and recent review, see McCammon et al., J. Phys. Chem. 105, 1504 (2001), which summarizes both binding affinity and kinetics. Here we focus on binding kinetics.

Langevin Dynamics

Binding kinetics is important because many enzymes are diffusion controlled. Moreover, in vivo conditions often require that the real ligand bind not only strongly but also quickly, to avoid competition. Binding kinetics is often studied with Brownian Dynamics (BD), a special form of Langevin Dynamics originally developed to study Brownian motion. For a concise exposition, on which the following is based, see Zwanzig, Nonequilibrium Statistical Mechanics, 2001.
For the simplest form of 1D Brownian motion - one big solute in the presence of solvent molecules but free of any other interactions - the minimal effect of the solvent is that the solute motion is damped by the solvent (viscous effect),
$$m\frac{dv}{dt} = -\zeta v$$
For a sphere, for example, ζ = 6πηR, where η is the solvent viscosity.


The solution is simply
$$v(t) = e^{-\zeta t/m}\, v(0)$$
which goes to zero in the long-time limit. But this can NOT be correct, because at equilibrium we know that
$$\langle v^2 \rangle = k_B T/m$$
Therefore, we need to add a random, fluctuating force to the equation of motion, which gives the Langevin equation,
$$m\frac{dv}{dt} = -\zeta v + F(t)$$
How do we specify the random force? What matters is the distribution of these random forces - and only the first two moments are needed if we assume Gaussian fluctuations,
$$\langle F(t) \rangle = 0, \qquad \langle F(t)F(t') \rangle = 2B\,\delta(t - t')$$
where we have assumed zero mean and fluctuations that are uncorrelated in time.
Since both the viscous drag and the random fluctuations come from the solvent, it should not come as a surprise that ζ and F(t) (or B) are related. Such a relation can be revealed by imposing the equilibrium condition ⟨v²⟩ = kB T/m.
The formal solution of the Langevin equation is
$$v(t) = e^{-\zeta t/m}\, v(0) + e^{-\zeta t/m} \int_0^t dt'\, e^{\zeta t'/m}\, F(t')/m$$
The variance is then
$$\langle v^2(t) \rangle = e^{-2\zeta t/m}\, v^2(0) + \frac{B}{\zeta m}\left(1 - e^{-2\zeta t/m}\right)$$
which, upon requiring ⟨v²⟩ → kB T/m at long times, gives
$$B = \zeta k_B T$$
or
$$\langle F(t)F(t') \rangle = 2\zeta k_B T\, \delta(t - t')$$
which is referred to as the fluctuation-dissipation theorem.
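The relation can be checked numerically with a minimal sketch like the one below (Python), which integrates the free-particle Langevin equation with a simple Euler-Maruyama scheme; all parameter values are arbitrary reduced units. With the noise variance chosen as 2ζkBT dt, the long-time average of v² settles at kBT/m.

    import numpy as np

    m, zeta, kB, T = 1.0, 5.0, 1.0, 1.0      # illustrative reduced units
    dt, n_steps = 1.0e-3, 200_000
    rng = np.random.default_rng(0)

    v = 0.0
    v2_samples = []
    for step in range(n_steps):
        # Euler-Maruyama step: m dv = -zeta*v*dt + random impulse,
        # with <impulse^2> = 2*zeta*kB*T*dt (fluctuation-dissipation)
        impulse = np.sqrt(2.0 * zeta * kB * T * dt) * rng.normal()
        v += (-zeta * v * dt + impulse) / m
        if step > n_steps // 2:              # discard the initial transient
            v2_samples.append(v * v)

    print(np.mean(v2_samples), "vs  kB*T/m =", kB * T / m)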

Brownian Dynamics

Having the first and second moments, we can integrate the Langevin equation in time - just as we did for Newton's equations. One special case is the over-damped limit (i.e., the viscous force is much larger than the inertial term m dv/dt). This is often referred to as Brownian dynamics:
$$0 = -\zeta v + f + F$$
where we have also included any conservative force (f) acting on the Brownian particle. Straightforward integration over a time step Δt gives
$$\Delta r = \frac{f}{\zeta}\Delta t + \frac{F}{\zeta}\Delta t$$
Using the relation D = kB T/ζ, we have
$$\Delta r = \frac{f D}{k_B T}\Delta t + S(\Delta t)$$
where S(Δt) is a random displacement whose second moment is 2DΔt (McCammon et al., J. Chem. Phys., 69, 1352, 1978).
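A minimal sketch of one position-update step of this kind (without hydrodynamic interactions) is given below (Python); force_func, D, kBT and dt are assumed inputs, and the random displacement is drawn per Cartesian component with variance 2DΔt.

    import numpy as np

    def bd_step(r, force_func, D, kBT, dt, rng):
        """One Brownian dynamics step.

        r          : (N, 3) array of positions
        force_func : callable returning the systematic forces, shape (N, 3)
        D          : translational diffusion constant
        kBT        : thermal energy
        """
        drift = force_func(r) * D * dt / kBT                    # f D dt / kB T
        noise = rng.normal(scale=np.sqrt(2.0 * D * dt), size=r.shape)
        return r + drift + noise

    # Usage with a hypothetical implicit-solvent force routine:
    #   rng = np.random.default_rng()
    #   r = bd_step(r, implicit_solvent_forces, D, kBT, dt, rng)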
In protein association BD simulations, we usually assume that the protein internal structures are frozen, which allows a very large time step; Δt is often close to 1 ps. The forces the proteins feel (f) are estimated from implicit solvent models. One usually starts the binding simulation by placing the two proteins a distance b apart, then runs multiple BD trajectories to obtain a statistically meaningful binding probability β. The association rate constant is then
$$k_{asso} = \beta\, k_b$$
where k_b is the rate constant for the two proteins to reach a separation b - which for the spherically symmetric case is equal to 4πD_rel b.
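As a small numerical illustration (all numbers made up but of a realistic magnitude), the conversion from the BD quantities to a bimolecular rate constant might look like this:

    import math

    D_rel = 0.02     # relative diffusion constant, A^2/ps (illustrative)
    b     = 60.0     # radius of the starting surface, A (illustrative)
    beta  = 0.01     # binding probability from the BD trajectories (illustrative)

    k_b = 4.0 * math.pi * D_rel * b          # ~15 A^3/ps
    k_asso = beta * k_b                      # A^3/ps per pair

    # 1 A^3/ps per pair corresponds to ~6.022e8 M^-1 s^-1
    print(k_asso * 6.022e8, "M^-1 s^-1")     # ~9e7 M^-1 s^-1 for these numbers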
Not unexpectedly, binding is often found to be driven by electrostatic interactions - which are long-ranged and spatially heterogeneous. This is particularly important for proteins carrying the same overall charge: they can still bind because of local charge complementarity in the binding sites.

Chem 860. Lecture 26


Special Topics in Biophysics. IV. Coarse-grained simulations
December 8, 2004

The all-atom models discussed in earlier lectures offer great detail about the process under study. However, the computational cost is too high for processes that occur on large length and time scales - e.g., membrane fusion, which is involved in viral infection and exo/endocytosis. Further approximations are needed to make such simulations possible while maintaining the essential physical and chemical nature of the system. Here we introduce two such models widely used in this rapidly developing area.

Reduced-resolution models

The most straightforward approach is to reduce the resolution of the model - i.e., to group a number of atoms into a larger particle/bead. Interactions among these coarse-grained (CG) beads often have a similar physical meaning as in all-atom force fields - i.e., electrostatic, van der Waals and bonded terms, etc.; the relevant parameters can be established by comparison to all-atom simulations, e.g., by reproducing radial distribution functions. Due to the reduced resolution, however, many terms are much softer than those in all-atom models. As a result, in addition to the reduction in the number of interactions that need to be calculated, a further increase in the effective efficiency of the simulation arises because a larger time step can be used and the dynamics are faster. Take the Klein model for DMPC as an example: the number of particles is reduced from 118 to 13 and the number of intra-DMPC interaction terms from 900 to 24; the time step increases by one order of magnitude and the translational diffusion constant is larger by two orders of magnitude. Therefore, the effective increase in the sampling efficiency is roughly 10^5-fold. Such a major increase allows one to pursue
simulations of processes that usually take too long to do using all-atom simulations; good
examples are self-assembly, membrane fusion and peptide insertion into lipids. For a recent
review, see Klein et al. J. Phys.: Condens. Matt., 16, R481 (2004). Other developments
involve the Voth group (e.g., J. Chem. Phys, 120, 4074 (2004)) and Mark et al. (J. Am.
Chem. Soc., 125, 11144 (2003)).

Dissipative Particle Dynamics

DPD was introduced to study colloid dynamics. In many ways, DPD is very similar to Brownian Dynamics: it explicitly describes only the colloid (solute) dynamics and treats the effect of the solvent through dissipative and random forces. The expression for the force, however, has a major difference:
$$\vec{F}_i = \sum_{j \neq i}\left[\vec{f}^{\,C}(r_{ij}) + \vec{f}^{\,D}(r_{ij}) + \vec{f}^{\,R}(r_{ij})\right]$$
The dissipative and random forces take the form
$$\vec{f}^{\,D}(r_{ij}) = -\gamma\, \omega^D(r_{ij})\, (\vec{v}_{ij} \cdot \hat{r}_{ij})\, \hat{r}_{ij}$$
$$\vec{f}^{\,R}(r_{ij}) = \sigma\, \omega^R(r_{ij})\, \theta_{ij}\, \hat{r}_{ij}$$
where γ is the friction coefficient and σ is related to γ through the fluctuation-dissipation theorem,
$$\sigma^2 = 2\gamma k_B T$$
θ_ij is a Gaussian random variable with zero mean and unit variance. The distance-scaling functions ω^{D/R}(r_ij) are also related to each other if the Boltzmann distribution is to be satisfied (see Frenkel & Smit),
$$\omega^D(r_{ij}) = \left[\omega^R(r_{ij})\right]^2$$
An important difference between DPD and BD is that, by writing the forces in pairwise form, DPD - by construction - satisfies Newton's 3rd law and therefore conserves linear momentum. BD, on the other hand, does not. The conservation of momentum is important for studying systems in which hydrodynamics plays an essential role - such as microdomain formation in block copolymers.
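A minimal sketch of the three pairwise contributions for a single pair is given below (Python), using the common Groot-Warren choices of a soft linear conservative force and ω^R(r) = 1 - r/r_c; the amplitude a, cutoff r_c, friction γ and time step dt are illustrative, and the 1/sqrt(dt) factor on the random force is the usual discrete-time convention.

    import numpy as np

    def dpd_pair_force(ri, rj, vi, vj, a, gamma, kBT, r_c, dt, rng):
        """Conservative + dissipative + random DPD force on particle i from j.
        By Newton's 3rd law the force on j is the negative of this."""
        rij_vec = ri - rj
        rij = np.linalg.norm(rij_vec)
        if rij >= r_c:
            return np.zeros(3)
        e_ij = rij_vec / rij                  # unit vector from j to i
        v_ij = vi - vj

        w_R = 1.0 - rij / r_c                 # omega_R(r)
        w_D = w_R ** 2                        # omega_D = (omega_R)^2
        sigma = np.sqrt(2.0 * gamma * kBT)    # fluctuation-dissipation

        f_C = a * w_R * e_ij                                    # soft repulsion
        f_D = -gamma * w_D * np.dot(v_ij, e_ij) * e_ij          # dissipative
        f_R = sigma * w_R * rng.normal() / np.sqrt(dt) * e_ij   # random
        return f_C + f_D + f_R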
