John von Neumann Institute for Computing
Introduction to Monte Carlo Methods
Daan Frenkel
published in
Computational Soft Matter: From Synthetic Polymers to Proteins,
Lecture Notes,
Norbert Attig, Kurt Binder, Helmut Grubmüller, Kurt Kremer (Eds.),
John von Neumann Institute for Computing, Jülich,
NIC Series, Vol. 23, ISBN 3-00-012641-4, pp. 29-60, 2004.
© 2004 by John von Neumann Institute for Computing
Permission to make digital or hard copies of portions of this work
for personal or classroom use is granted provided that the copies
are not made or distributed for profit or commercial advantage and
that copies bear this notice and the full citation on the first page. To
copy otherwise requires prior specific permission by the publisher
mentioned above.
http://www.fz-juelich.de/nic-series/volume23
Introduction to Monte Carlo Methods
Daan Frenkel
FOM Institute for Atomic and Molecular Physics,
Kruislaan 407, 1098 SJ Amsterdam, The Netherlands
E-mail: [email protected]
These lecture notes give an introduction to Monte Carlo simulations. After a general introduction of the approach and its practical implementation, special attention is paid to the use of biased sampling methods in the context of polymer simulations.
1 Introduction
The Monte Carlo techniques that are described in this chapter can be used to compute the
equilibrium properties of classical many-body systems. In this context, the word ‘classical’ means that the core motion of the constituent particles obeys the laws of classical
mechanics. This is an excellent approximation for a wide range of materials. Only when
we consider the translational or rotational motion of light atoms or molecules (He, H2, D2, ...) or vibrational motion with a frequency such that hν > k_B T, should we worry about
quantum effects.
These lecture notes provide a somewhat selective introduction to the Monte Carlo (MC)
method. The selection reflects my own personal bias. It is largely (but not completely)
based on the more complete description given in ref. 1.
Before embarking on a description of the MC method, I should first briefly explain the
role of computer simulations in general. This topic is best discussed by considering the
situation that prevailed before the advent of electronic computers. At that time, there was
only one way to predict the outcome of an experiment, namely by making use of a theory
that provided an approximate description of the system under consideration. The reason
why an approximate theory was almost always used is that there are very few model systems for which the equilibrium properties can be computed exactly (examples are the ideal
gas, the harmonic crystal and a number of lattice models, such as the two-dimensional Ising
model for ferro-magnets), and even fewer model systems for which the transport properties can be computed exactly. As a result, most properties of real materials were predicted
on the basis of approximate theories (examples are the van der Waals equation for dense
gases, the Debye-Hückel theory for electrolytes, or the Boltzmann equation to describe the
transport properties of dilute gases). Given sufficient information about the intermolecular
interactions, these theories will provide us with an estimate of the properties of interest.
Unfortunately, our knowledge of the intermolecular interactions of all but the simplest
molecules is quite limited. This leads to a problem if we wish to test the validity of a particular theory by comparing directly to experiment. If we find that theory and experiment
disagree, it may mean that our theory is wrong, or that we have an incorrect estimate of
the intermolecular interactions, or both... Clearly, it would be very nice if we could obtain
essentially exact results for a given model system, without having to rely on approximate
theories. Computer simulations allow us to do precisely that. On the one hand, we can
now compare the calculated properties of a model system with those of an experimental
system: if the two disagree, our model is inadequate, i.e. we have to improve on our estimate of the intermolecular interactions. On the other hand, we can compare the result
of a simulation of a given model system with the predictions of an approximate analytical
theory applied to the same model. If we now find that theory and simulation disagree, we
know that the theory is flawed. So, in this case, the computer simulation plays the role of
the ‘experiment’ that is designed to test the theory. This method to ‘screen’ theories before
we apply them to the real world, is called a ‘computer experiment’. Computer experiments
have become standard practice, to the extent that they now provide the first (and often the
last) test of a new theoretical result.
In fact, the early history of computer simulation (see e.g. ref. 2 ) illustrates this role of
computer simulation. In some areas of physics there appeared to be little need for simulation because very good analytical theories were available (e.g. to predict the properties of
dilute gases or of nearly harmonic crystalline solids). However, in other areas, few if any
‘exact’ theoretical results were known, and progress was much hindered by the fact that
there were no unambiguous tests to assess the quality of approximate theories. A case in
point was the theory of dense liquids. Before the advent of computer simulations, the only
way to model liquids was by mechanical simulation 3–5 of large assemblies of macroscopic
spheres (e.g. ball bearings). But such mechanical models are rather crude, as they ignore
the effect of thermal motion. Moreover, the analysis of the structures generated by mechanical simulation was very laborious and, in the end, had to be performed by computer
anyway.
This may explain why, when in 1953 electronic computers were, for the first time, made
available for unclassified research, numerical simulation of dense liquids was one of the
first problems that was tackled. In fact, the first simulation of a ‘liquid’ was carried out by
Metropolis et al.6 at Los Alamos, using (or, more properly, introducing) the Monte Carlo
method. Almost simultaneously, Fermi, Pasta and Ulam 7 performed a very famous numerical study of the dynamics of an anharmonic, one-dimensional crystal. However, the first
proper Molecular Dynamics simulations were reported in 1956 by Alder and Wainwright 8
at Livermore, who studied the dynamics of an assembly of hard spheres. The first MD simulation of a model for a ‘real’ material was reported in 1959 (and published in 1960) by the
group of Vineyard at Brookhaven9 , who simulated radiation damage in crystalline Cu (for
a historical account, see10 ). The first MD simulation of a real liquid (argon) was reported in
1964 by Rahman at Argonne11 . After that, computers were increasingly becoming available to scientists outside the US government labs, and the practice of simulation started
spreading to other continents12–14 . Much of the methodology of MD simulations has been
developed since then, although it is fair to say that the basic algorithms for MC and MD
have hardly changed since the fifties.
1.1 Statistical mechanics
In the present lecture, we describe the basic principles of the Monte Carlo method and
molecular dynamics. In particular, we focus on simulations of systems of a fixed number
of particles (N ) in a given volume (V ) at a temperature (T ).
Our aim is to indicate where the Monte Carlo method comes in. We start from the
classical expression for the partition function Q:
Q = c ∫ dp^N dr^N exp[−H(r^N, p^N)/k_B T],     (1)
where rN stands for the coordinates of all N particles, and p N for the corresponding
momenta. The function H(qN , pN ) is the Hamiltonian of the system. It expresses the
total energy of an isolated system as a function of the coordinates and momenta of the
constituent particles: H = K + U, where K is the kinetic energy of the system and U is the
potential energy. Finally, c is a constant of proportionality, chosen such that the sum over
quantum states approaches the classical partition function in the limit ℏ → 0. For instance, for a system of N identical atoms, c = 1/(h^{3N} N!). The classical expression for the thermal average of an observable A is

⟨A⟩ = ∫ dp^N dr^N A(p^N, r^N) exp[−βH(p^N, r^N)] / ∫ dp^N dr^N exp[−βH(p^N, r^N)],     (2)
where β = 1/kB T . In this equation, the observable A has been expressed as a function
of coordinates and momenta. As K is a quadratic function of the momenta the integration
over momenta can be carried out analytically. Hence, averages of functions that depend
on momenta only are usually easy to evaluate. The difficult problem is the computation of
averages of functions A(rN ). Only in a few exceptional cases can the multidimensional
integral over particle coordinates be computed analytically, in all other cases numerical
techniques must be used.
Having thus defined the nature of the numerical problem that we must solve, let us next
look at possible solutions. It might appear that the most straightforward approach would be
to evaluate hAi in equation 2 by numerical quadrature, for instance using Simpson’s rule.
It is easy to see, however, that such a method is completely useless even if the number of
independent coordinates DN (D is the dimensionality of the system) is still relatively small, say O(100). Suppose that we plan to carry out the quadrature by evaluating the integrand on
a mesh of points in the DN -dimensional configuration space. Let us assume that we take
m equidistant points along each coordinate axis. The total number of points at which the
integrand must be evaluated is then equal to m^{DN}. For all but the smallest systems this
number becomes astronomically large, even for small values of m. For instance, if we
take 100 particles in three dimensions, and m = 5, then we would have to evaluate the
integrand at 10^{210} points! Computations of such magnitude cannot be performed in the
known universe. And this is fortunate, because the answer that would be obtained would
have been subject to a large statistical error. After all, numerical quadratures work best
on functions that are smooth over distances corresponding to the mesh size. But for most
intermolecular potentials, the Boltzmann factor in equation 2 is a rapidly varying function
of the particle coordinates. Hence an accurate quadrature requires a small mesh spacing
(i.e. a large value of m). Moreover, when evaluating the integrand for a dense liquid
(say), we would find that for the overwhelming majority of points this Boltzmann factor is
vanishingly small. For instance, for a fluid of 100 hard spheres at the freezing point, the
Boltzmann factor would be nonzero for 1 out of every 10^{260} configurations!
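As a quick check of the first of these numbers (the arithmetic is not spelled out in the original text):

m^{DN} = 5^{3·100} = 5^{300} = 10^{300 log₁₀ 5} ≈ 10^{210}.

The estimate of 1 in 10^{260} for the hard-sphere fluid at freezing follows from a separate, model-specific counting argument that is not reproduced here.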
The closing lines of the previous section suggest that it is in general not possible to evaluate an integral, such as ∫ dr^N exp[−βU(r^N)], by direct Monte Carlo sampling. However, in many cases, we are not interested in the configurational part of the partition function itself but in averages of the type

⟨A⟩ = ∫ dr^N exp[−βU(r^N)] A(r^N) / ∫ dr^N exp[−βU(r^N)].     (3)
Hence, we wish to know the ratio of two integrals. What Metropolis et al. 6 showed
is that it is possible to devise an efficient Monte Carlo scheme to sample such a ratio. a
To understand the Metropolis method, let us first look more closely at the structure of
equation 3. In what follows we denote the configurational part of the partition function by
Z:

Z ≡ ∫ dr^N exp[−βU(r^N)].     (4)
Note that the ratio exp(−βU)/Z in equation 3 is the probability density to find the system
in a configuration around rN . Let us denote this probability density by
N(r^N) ≡ exp[−βU(r^N)] / Z.
Clearly, N (rN ) is > 0.
Suppose now that we are somehow able to randomly generate points in configuration
space according to this probability distribution N (r N ). This means that, on average, the
number of points ni generated per unit volume around a point rN is equal to L N (rN ),
where L is the total number of points that we have generated. In other words,

⟨A⟩ ≈ (1/L) Σ_{i=1}^{L} n_i A(r_i^N).     (5)
By now the reader is almost certainly confused, so let us therefore try to clarify this
method with the help of a simple example (see Figure 1). In this figure, we compare
two ways to measure the depth of the river Nile, by conventional quadrature (left) and
by Metropolis sampling; that is, the construction of an importance-weighted random walk
(right). In the conventional quadrature scheme, the value of the integrand is measured at a
predetermined set of points. As the choice of these points does not depend on the value of
the integrand, many points may be located in regions where the integrand vanishes. In contrast, in the Metropolis scheme, a random walk is constructed through that region of space
where the integrand is nonnegligible (i.e. through the Nile itself). In this random walk, a
trial move is rejected if it takes you out of the water, and is accepted otherwise. After every
trial move (accepted or not), the depth of the water is measured. The (unweighted) average
of all these measurements yields an estimate of the average depth of the Nile. This, then,
is the essence of the Metropolis method. In principle, the conventional quadrature scheme
would also give results for the total area of the Nile. In the importance sampling scheme,
however, information on the total area cannot be obtained directly, since this quantity is
similar to Z.
Let us next consider how to generate points in configuration space with a relative probability proportional to the Boltzmann factor. The general approach is first to prepare the
a An interesting account of the early history of the Metropolis method may be found in H.L. Anderson, J. Stat.
Phys. 43:731, 1986; and Wood15 p. 3.
Figure 1. Measuring the depth of the Nile: a comparison of conventional quadrature (left) with the Metropolis scheme (right).
system in a configuration rN , which we denote by o (old), that has a nonvanishing Boltzmann factor exp[−βU(o)]. This configuration, for example, may correspond to a regular
crystalline lattice with no hard-core overlaps. Next, we generate a new trial configuration r′^N, which we denote by n (new), by adding a small random displacement ∆ to o. The Boltzmann factor of this trial configuration is exp[−βU(n)]. We must now decide whether
we will accept or reject the trial configuration. Many rules for making this decision satisfy
the constraint that on average the probability of finding the system in a configuration n is
proportional to N (n). Here we discuss only the Metropolis scheme, because it is simple
and generally applicable.
Let us now “derive” the Metropolis scheme to determine the transition probability
π(o → n) to go from configuration o to n. It is convenient to start with a thought experiment (actually a thought simulation). We carry out a very large number (say M ) Monte
Carlo simulations in parallel, where M is much larger than the total number of accessible
configurations. We denote the number of points in any configuration o by m(o). We wish
that, on average, m(o) is proportional to N (o). The matrix elements π(o → n) must satisfy one obvious condition: they do not destroy such an equilibrium distribution once it
is reached. This means that, in equilibrium, the average number of accepted trial moves
that result in the system leaving state o must be exactly equal to the number of accepted
trial moves from all other states n to state o. It is convenient to impose a much stronger
condition; namely, that in equilibrium the average number of accepted moves from o to any
other state n is exactly canceled by the number of reverse moves. This detailed balance
condition implies the following:
N (o)π(o → n) = N (n)π(n → o).
(6)
Many possible forms of the transition matrix π(o → n) satisfy equation (6). Let us look
how π(o → n) is constructed in practice. We recall that a Monte Carlo move consists of
two stages. First, we perform a trial move from state o to state n. We denote the transition
matrix that determines the probability to perform a trial move from o to n by α(o → n);
where α is usually referred to as the underlying matrix of Markov chain 16 . The next stage
is the decision to either accept or reject this trial move. Let us denote the probability of
accepting a trial move from o to n by acc(o → n). Clearly,
π(o → n) = α(o → n) × acc(o → n).
(7)
In the original Metropolis scheme, α is chosen to be a symmetric matrix (α(o → n) = α(n → o)). However, in later sections we shall see several examples where α is not
symmetric. If α is symmetric, we can rewrite equation 6 in terms of the acc(o → n):
N(o) × acc(o → n) = N(n) × acc(n → o).     (8)

From Eqn. 8 it follows that

acc(o → n) / acc(n → o) = N(n) / N(o) = exp{−β[U(n) − U(o)]}.     (9)
Again, many choices for acc(o → n) satisfy this condition (and the obvious condition that
the probability acc(o → n) cannot exceed 1). The choice of Metropolis et al. is
acc(o → n) = N(n)/N(o)   if N(n) < N(o)
           = 1            if N(n) ≥ N(o).     (10)
Other choices for acc(o → n) are possible (for a discussion, see for instance 17 ), but the
original choice of Metropolis et al. appears to result in a more efficient sampling of configuration space than most other strategies that have been proposed.
1.2 A Basic Monte Carlo Algorithm
It is difficult to talk about Monte Carlo or Molecular Dynamics programs in abstract terms.
The best way to explain how such programs work is to write them down. This will be done
in the present section.
Most Monte Carlo or Molecular Dynamics programs are only a few hundred to several
thousand lines long. This is very short compared to, for instance, a typical quantum-chemistry code. For this reason, it is not uncommon that a simulator will write many
different programs that are tailor-made for specific applications. The result is that there
is no such thing as a standard Monte Carlo or Molecular Dynamics program. However,
the cores of most MD/MC programs are, if not identical, at least very similar. Next, we
shall construct such a core. It will be very rudimentary, and efficiency has been traded for
clarity. But it will serve to demonstrate how the Monte Carlo method works.
1.3 The Algorithm
The prime purpose of the kind of Monte Carlo or Molecular Dynamics program that we
shall be discussing is to compute equilibrium properties of classical many-body systems.
From now on, we shall refer to such programs simply as MC or MD programs, although it
should be remembered that there exist many other applications of the Monte Carlo method
(and, to a lesser extent, of the Molecular Dynamics method). Let us now look at a simple
Monte Carlo program.
In the previous section, the Metropolis method was introduced as a Markov process
in which a random walk is constructed in such a way that the probability of visiting a
particular point rN is proportional to the Boltzmann factor exp[−βU(r N )]. There are
many ways to construct such a random walk. In the approach introduced by Metropolis et
al.6 , the following scheme is proposed:
1. Select a particle at random, and calculate its energy U(r N ).
2. Give the particle a random displacement, r′ = r + ∆, and calculate its new energy U(r′^N).

3. Accept the move from r^N to r′^N with probability

   acc(o → n) = min(1, exp{−β[U(r′^N) − U(r^N)]}).     (11)
An implementation of this basic Metropolis scheme is shown in Algorithms 1 and 2.
Algorithm 1 [Basic Metropolis Algorithm]

      PROGRAM mc                         ! basic Metropolis algorithm
      do icycl=1,ncycl                   ! perform ncycl MC cycles
         call mcmove                     ! displace a particle
         if (mod(icycl,nsamp).eq.0)
     +      call sample                  ! sample averages
      enddo
      end
Comments:
1. Subroutine mcmove attempts to displace a randomly selected particle
(see Algorithm 2).
2. Subroutine sample samples quantities every nsampth cycle.
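For readers who prefer a runnable example, the fragment below is a minimal Python sketch of the same loop structure as Algorithms 1 and 2; it is not part of the original notes. To keep it self-contained, the "system" consists of non-interacting particles in a harmonic trap, so that the sampled average potential energy can be checked against the equipartition value (3/2)k_B T per particle; all names and parameter values are arbitrary choices.

import math, random

# Toy model: npart non-interacting particles in a 3D harmonic trap,
# U = 0.5*sum_i |r_i|^2, so that <U>/npart should approach 1.5*kT.
npart, beta, delx = 20, 1.0, 1.0        # number of particles, 1/kT, max displacement
ncycl, nsamp = 50000, 10                # MC cycles, sampling interval

x = [[0.0, 0.0, 0.0] for _ in range(npart)]

def ener(r):
    """Potential energy of one particle at position r (harmonic trap)."""
    return 0.5 * (r[0]**2 + r[1]**2 + r[2]**2)

def mcmove():
    """Attempt to displace a randomly selected particle (cf. Algorithm 2)."""
    o = random.randrange(npart)                             # select a particle at random
    eno = ener(x[o])                                        # energy of old configuration
    xn = [c + delx*(random.random() - 0.5) for c in x[o]]   # random trial displacement
    enn = ener(xn)                                          # energy of new configuration
    if random.random() < math.exp(-beta*(enn - eno)):       # acceptance rule (11)
        x[o] = xn                                           # accepted: replace old by new
    # if rejected, the old configuration is simply retained

usum, nobs = 0.0, 0
for icycl in range(1, ncycl + 1):                           # cf. Algorithm 1
    mcmove()
    if icycl % nsamp == 0:                                  # sample averages
        usum += sum(ener(r) for r in x)
        nobs += 1

print("<U> per particle:", usum/(nobs*npart), "  expected ~", 1.5/beta)

After a short equilibration the printed average should be close to 1.5/β; with a realistic pair potential only the ener routine and the bookkeeping of interaction energies would change.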
1.4 Trial Moves
Now that we have specified the general structure of the Metropolis algorithm, we should
consider its implementation. We shall not go into the problem of selecting intermolecular
potentials for the model system under study. Rather, we shall simply assume that we
have an atomic or molecular model system in a suitable starting configuration and that we
have specified all intermolecular interactions. We must now set up the underlying Markov chain, that is, the matrix α.
Algorithm 2 [Attempt to Displace a Particle]

      SUBROUTINE mcmove                             ! attempts to displace a particle
      o=int(ranf()*npart)+1                         ! select a particle at random
      call ener(x(o),eno)                           ! energy old configuration
      xn=x(o)+(ranf()-0.5)*delx                     ! give particle random displacement
      call ener(xn,enn)                             ! energy new configuration
      if (ranf().lt.exp(-beta*(enn-eno))) x(o)=xn   ! acceptance rule (11); accepted: replace x(o) by xn
      return
      end

Comments:
1. Subroutine ener calculates the energy of a particle at the given position.
2. Note that, if a configuration is rejected, the old configuration is retained.
3. ranf() is a random number uniformly distributed in [0, 1].
In more down-to-earth terms: we must decide how we are
going to generate trial moves. We should distinguish between trial moves that involve only
the molecular centers of mass, and those that change the orientation or possibly even the
conformation of a molecule.
We start our discussion with trial moves of the molecular centers of mass. A perfectly
acceptable method to create a trial displacement is to add random numbers between −∆/2
and +∆/2 to the x, y, and z coordinates of the molecular center of mass:
x′_i → x_i + ∆ (Ranf − 0.5)
y′_i → y_i + ∆ (Ranf − 0.5)
z′_i → z_i + ∆ (Ranf − 0.5),     (12)
where Ranf are random numbers uniformly distributed between 0 and 1. Clearly, the
reverse trial move is equally probable (hence, α is symmetric).
We are now faced with two questions: how large should we choose ∆?, and should
we attempt to move all particles simultaneously or one at a time? In the latter case we
should pick the molecule that is to be moved at random to ensure that the underlying
Markov chain remains symmetric. All other things being equal, we should choose the
most efficient sampling procedure. But, to this end, we must first define what we mean by
efficient sampling. Broadly speaking, sampling is efficient if it gives you good value for
money. Good value in a simulation corresponds to high statistical accuracy, and “money”
is simply money: the money that buys your computer time and even your own time. For
the sake of the argument, we assume the average scientific programmer is poorly paid. In
that case we have to worry only about your computer budget. b Then we could use the
following definition of an optimal sampling scheme: a Monte Carlo sampling scheme can
be considered optimal if it yields the lowest statistical error in the quantity to be computed
for a given expenditure of computing budget. Usually, computing budget is equivalent to
CPU time.
From this definition it is clear that, in principle, a sampling scheme may be optimal
for one quantity but not for another. Actually, the preceding definition is all but useless in
practice (as are most definitions). For instance, it is just not worth the effort to measure
the error estimate in the pressure for a number of different Monte Carlo sampling schemes
in a series of runs of fixed length. However, it is reasonable to assume that the mean-square error in the observables is inversely proportional to the number of uncorrelated
configurations visited in a given amount of CPU time. And the number of independent
configurations visited is a measure for the distance covered in configuration space. This
suggests a more manageable, albeit rather ad hoc, criterion to estimate the efficiency of a
Monte Carlo sampling scheme: the sum of the squares of all accepted trial displacements
divided by computing time. This quantity should be distinguished from the mean-square
displacement per unit of computing time, because the latter quantity goes to zero in the
absence of diffusion (e.g. in a solid or a glass), whereas the former does not.
Next, consider the choice of the parameter ∆ which determines the size of the trial
move. How large should ∆ be? If it is very large, it is likely that the resulting configuration will have a high energy and the trial move will probably be rejected. If it is very small,
the change in potential energy is probably small and most moves will be accepted. In the
literature, one often finds the mysterious statement that an acceptance of approximately
50% should be optimal. This statement is not necessarily true. The optimum acceptance
ratio is the one that leads to the most efficient sampling of configuration space. If we
express efficiency as mean-square displacement per CPU time, it is easy to see that different Monte Carlo codes will have different optimal acceptance ratios. The reason is that it
makes a crucial difference if the amount of computing required to test whether a trial move
is accepted depends on the magnitude of the move (see Figure 2).
In the conventional Metropolis scheme, all continuous interactions have to be computed before a move can be accepted or rejected. Hence, for continuous potentials, the
amount of computation does not depend on the size of a trial move. In contrast, for simulations of molecules with hard repulsive cores, a move can be rejected as soon as overlap
with any neighbor is detected. In that case, a rejected move is cheaper than an accepted
one, and hence the average computing time per trial move goes down as the step size is
increased. As a result, the optimal acceptance ratio for hard-core systems is appreciably
lower than for systems with continuous interactions. Exactly how much depends on the
nature of the program, in particular on whether it is a scalar or a vector code (in the latter
case, hard-core systems are treated much like continuous systems), how the information
about neighbor lists is stored, and even on the computational “cost” of random numbers
and exponentiation. The consensus seems to be that for hard-core systems the optimum
acceptance ratio is closer to 20 than to 50%, but this is just another rule of thumb that
should be checked.
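As a practical illustration of these rules of thumb, the fragment below (a Python sketch, not taken from the original text) adjusts the maximum displacement towards some target acceptance ratio and accumulates the sum of squared accepted displacements proposed above as an efficiency measure. The target value, the rescaling factor and the helper names are arbitrary; note that changing ∆ on the fly formally breaks detailed balance, so such tuning should be restricted to the equilibration phase.

def tune_step(delx, n_accepted, n_tried, target=0.5, factor=1.05):
    """Rescale the maximum displacement towards a target acceptance ratio.
    Use during equilibration only: adapting delx during production
    formally violates detailed balance."""
    if n_tried == 0:
        return delx
    return delx * factor if n_accepted / n_tried > target else delx / factor

# Inside the equilibration loop one might write, for example:
#   accepted, dr2 = mcmove()            # flag and squared displacement of the trial move
#   n_tried += 1
#   if accepted:
#       n_accepted += 1
#       sum_sq_disp += dr2              # efficiency measure: accepted displacements squared
#   if n_tried % 1000 == 0:
#       delx = tune_step(delx, n_accepted, n_tried)
#       n_tried = n_accepted = 0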
b Still, we should stress that it is not worthwhile spending a lot of time developing a fancy computational scheme
that will be only marginally better than existing, simpler schemes, unless your program will run very often and
speed is crucial.
Figure 2. (left) Typical dependence of the mean-square displacement ⟨∆r²⟩ of a particle on the average size ∆ of the trial move. (right) Typical dependence of the computational cost of a trial move on the step-size ∆. For continuous potentials, the cost is constant, while for hard-core potentials it decreases rapidly with the size of the trial move.
2 Canonical Ensemble
In a conventional Molecular Dynamics simulation, the total energy E and the total linear
momentum P are constants of motion. Hence, Molecular Dynamics simulations measure
(time) averages in an ensemble that is very similar to the microcanonical (see 18 ); namely,
the constant-N V E-P ensemble. In contrast, a conventional Monte Carlo simulation probes
the canonical (i.e. constant-N V T ) ensemble. The fact that these ensembles are different
leads to observable differences in the statistical averages computed in Molecular Dynamics
and Monte Carlo simulations. Most of these differences disappear in the thermodynamic
limit and are already relatively small for systems of a few hundred particles. However,
the choice of ensemble does make a difference when computing the mean-square value
of fluctuations in thermodynamic quantities. Fortunately, techniques exist to relate fluctuations in different ensembles19 . Moreover, nowadays it is common practice to carry out
Molecular Dynamics simulations in ensembles other than the microcanonical. In particular, it is possible to do Molecular Dynamics at constant pressure, at constant stress, and at
constant temperature. The choice of ensembles for Monte Carlo simulations is even wider:
isobaric-isothermal, constant-stress-isothermal, grand-canonical (i.e. constant-µV T ), and
even microcanonical20–25 . A more recent addition to this list is a Monte Carlo method that
employs the Gibbs-ensemble technique26 , which was developed to study phase coexistence
in moderately dense (multi component) fluids.
2.1 General Approach
In the following sections, we will use the following procedure to demonstrate the validity
of our Monte Carlo algorithms:
1. Decide which distribution we want to sample. This distribution, denoted N , will
depend on the details of the ensemble.
Figure 3. Canonical ensemble. The number of particles, volume, and temperature are constant. Shown is a Monte Carlo move in which a particle is displaced.
2. Impose the condition of detailed balance,
K(o → n) = K(n → o),     (13)

where K(o → n) is the flow of configurations from o to n. This flow is given by the product of the probability of being in configuration o, the probability of generating configuration n, and the probability of accepting this move,

K(o → n) = N(o) × α(o → n) × acc(o → n).     (14)
3. Determine the probabilities of generating a particular configuration.
4. Derive the condition which needs to be fulfilled by the acceptance rules.
It is instructive to apply the preceding recipe to the ordinary Metropolis scheme. In the
canonical ensemble, the number of particles, temperature, and volume are constant (see
Figure 3).
The partition function is

Q(N, V, T) ≡ (1 / (Λ^{3N} N!)) ∫ dr^N exp[−βU(r^N)],     (15)

where Λ = √(h²/(2π m k_B T)) is the thermal de Broglie wavelength. From the partition function it follows that the probability of finding configuration r^N is given by the distribution

N(r^N) ∝ exp[−βU(r^N)].     (16)

Equations 15 and 16 are the basic equations for a simulation in the canonical ensemble.
2.2 Monte Carlo Simulations
In the canonical ensemble, we have to sample distribution (16). This can be done using the
following scheme:
1. Select a particle at random and calculate the energy of this configuration U(o).
2. Give this particle a random displacement (see Figure 3),
r(o) → r(o) + ∆(Ranf − 0.5),
where ∆/2 is the maximum displacement. The value of ∆ should be chosen such that
the sampling scheme is optimal (see section 1.4). The new configuration is denoted n
and its energy U(n).
3. The move is accepted with a probability (see equation 10)
acc(o → n) = min(1, exp{−β[U(n) − U(o)]}).     (17)
If rejected, the old configuration is kept.
An implementation of this basic Metropolis scheme is shown in Section 1.2 (Algorithms 1
and 2).
2.3 Justification of the Algorithm
The probability of generating a particular configuration is constant and independent of the
conformation of the system:
α(o → n) = α(n → o) = α.
Substitution of this equation in the condition of detailed balance (13) and substitution of
the desired distribution (16) gives as condition for the acceptance rules
acc(o → n) / acc(n → o) = exp{−β[U(n) − U(o)]}.     (18)
It is straightforward to demonstrate that acceptance rule (17) obeys this condition.
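Spelling this out for the case U(n) ≤ U(o) (the opposite case follows by interchanging o and n): rule (17) gives

acc(o → n) = 1   and   acc(n → o) = exp{−β[U(o) − U(n)]},

so that acc(o → n)/acc(n → o) = exp{−β[U(n) − U(o)]}, which is exactly condition (18).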
3 Grand-Canonical Ensemble
The ensembles we have discussed so far have the total number of particles imposed. For
some systems, however, one would like to obtain information on the average number of
particles in a system as a function of the external conditions. For example, in adsorption
studies one would like to know the amount of material adsorbed as a function of the pressure and temperature of the reservoir with which the material is in contact. A naive but
theoretically valid approach would be to use the Molecular Dynamics technique (microcanonical ensemble) and simulate the experimental situation; an adsorbent in contact with
a gas (see Figure 4).
Such a simulation is possible only for very simple systems. In real experiments, equilibration may take minutes or even several hours depending on the type of gas molecules.
Figure 4. Adsorbent (for example a zeolite) in direct contact with a gas.
These equilibration times would be reflected in a Molecular Dynamics simulation, the difference being that a minute of experimental time takes of the order of 10^9 seconds on a
computer. Furthermore, in most cases, we are not interested in the properties of the gas
phase, yet a significant amount of CPU time will be spent on the simulation of this phase.
Finally, in such a simulation, there is an interface between the gas phase and the adsorbent.
In the interfacial region the properties of the system are different from the bulk properties
in which we are interested. Since in a simulation the system is relatively small, we have to
simulate a very large system to minimize the influence of this interfacial region. c
Most of these problems can be solved by a careful choice of ensembles. For adsorption
studies, a natural ensemble to use is the grand-canonical ensemble (or µ,V ,T ensemble).
In this ensemble, the temperature, volume, and chemical potential are fixed. In the experimental setup, the adsorbed gas is in equilibrium with the gas in the reservoir. The
equilibrium conditions are that the temperature and chemical potential of the gas inside
and outside the adsorbent must be equal.d The gas that is in contact with the adsorbent
can be considered as a reservoir that imposes a temperature and chemical potential on the
adsorbed gas (see Figure 5).
We therefore have to know only the temperature and chemical potential of this reservoir
to determine the equilibrium concentration inside the adsorbent. This is exactly what is
mimicked in the grand-canonical ensemble: the temperature and chemical potential are
imposed and the number of particles is allowed to fluctuate during the simulation. This
makes these simulations different from the conventional ensembles, where the number of
molecules is fixed.
c Such a simulation, of course, would be appropriate if the interest is in just this region.
d Note that the pressure is not defined inside the zeolite; therefore, the pressure cannot be an equilibrium quantity. However, the pressure is related to the chemical potential via an equation of state, and it is always possible to calculate the pressure of the gas that corresponds to a given chemical potential and vice versa.
Figure 5. Adsorbent in contact with a reservoir that imposes constant chemical potential and temperature by exchanging particles and energy.
3.1 Statistical Mechanical Basis
The Metropolis sampling scheme was introduced as a method to compute thermal averages of functions
A(rN ) that depend explicitly on the coordinates of the molecules in the N -body system
under study. Examples of such mechanical properties are the potential energy or the virial
contribution to the pressure.
However, the Metropolis method could not be used to determine the integral ∫ dr^N exp[−βU(r^N)] itself. The latter quantity measures the effective volume in configuration space that is accessible to the system. Hence, the original
Metropolis scheme could not be used to determine those thermodynamic properties of a
system that depend explicitly on the configurational integral. Examples of such thermal
properties are the Helmholtz free energy F , the entropy S, and the Gibbs free energy G.
However, although the Metropolis method cannot be used to measure, for instance, free
energies directly, it can be used to measure the difference in free energy between two possible states of an N -body system. This fact is exploited in the grand-canonical Monte Carlo
method first implemented for classical fluids by Norman and Filinov 23 , and later extended
and improved by a number of other groups24, 27–34 . The basic idea of the grand-canonical
Monte Carlo method is explained next.
To understand the statistical mechanical basis for the grand-canonical Monte Carlo
technique, let us return to consider the partition function of a combined system of N interacting particles in volume V and M − N ideal gas molecules in volume V_0 − V:
Q(N, M, V, V_0, T) = [V^N (V_0 − V)^{M−N} / (Λ^{3M} N! (M − N)!)] ∫ ds^{M−N} ∫ ds^N exp[−βU(s^N)].
Now, instead of allowing the two systems to exchange volume, let us see what happens if
the systems can also exchange particles (see Figure 6).
To be more precise, we assume that the molecules in the two subvolumes are actually
identical particles. The only difference is that when they find themselves in volume V ,
they interact and, when they are in volume V0 − V , they do not. If we transfer a molecule
Figure 6. Ideal gas (M − N particles, volume V_0 − V) can exchange particles with an N-particle system (volume V).
i from a reduced coordinate si in the volume V0 − V to the same reduced coordinate in
volume V , then the potential energy function U changes from U(s N ) to U(sN +1 ). The
expression for the total partition function of the system, including all possible distributions
of the M particles over the two subvolumes is
Q(M, V, V_0, T) = Σ_{N=0}^{M} [V^N (V_0 − V)^{M−N} / (Λ^{3M} N! (M − N)!)] ∫ ds^{M−N} ∫ ds^N exp[−βU(s^N)].     (19)
We now write the probability density to find a system with M − N particles at reduced
coordinates sM −N in volume V ′ ≡ V0 − V and N particles at reduced coordinates sN in
volume V :
N(s^M; N) = [V^N V′^{M−N} / (Q(M, V, V′, T) Λ^{3M} N! (M − N)!)] exp[−βU(s^N)].     (20)
Let us now consider a trial move in which a particle is transferred from V ′ to the same
scaled coordinate in V . First we should make sure that we construct an underlying Markov
chain that is symmetric. Symmetry, in this case, implies that the a priori probability to
move a particle from V ′ to V should be equal to the a priori probability of the reverse
move. The probability of acceptance of a trial move in which we move a particle to or
from volume V is determined by the ratio of the corresponding probability densities (20):
α(N → N + 1) = [V (M − N) / (V′ (N + 1))] exp(−β[U(s^{N+1}) − U(s^N)]),     (21)

α(N + 1 → N) = [V′ (N + 1) / (V (M − N))] exp(−β[U(s^N) − U(s^{N+1})]).     (22)
Now let us consider the limit that the ideal gas system is very much larger than the interacting system: M → ∞, V ′ → ∞, (M/V ′ ) → ρ. Note that for an ideal gas the chemical
potential µ is related to the particle density ρ by
µ = k_B T ln(Λ³ρ).
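This relation follows directly from the ideal-gas partition function Q_id(N, V, T) = V^N/(Λ^{3N} N!); the short derivation is included here for convenience:

µ = −k_B T ln[Q_id(N + 1, V, T)/Q_id(N, V, T)] = −k_B T ln[V/(Λ³(N + 1))] ≈ k_B T ln(Λ³ρ),

where ρ = N/V and the last step assumes N ≫ 1.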
Therefore, in the limit (M/N ) → ∞, the partition function (19) becomes
Q(µ, V, T) ≡ Σ_{N=0}^{∞} [exp(βµN) V^N / (Λ^{3N} N!)] ∫ ds^N exp[−βU(s^N)],     (23)
and the corresponding probability density is

N_{µVT}(s^N; N) ∝ [exp(βµN) V^N / (Λ^{3N} N!)] exp[−βU(s^N)].     (24)

Equations 23 and 24 are the basic equations for Monte Carlo simulations in the grand-canonical ensemble. Note that, in these equations, all explicit reference to the ideal gas system has disappeared.
3.2 Monte Carlo Simulations
In a grand-canonical simulation, we have to sample the distribution (24). Acceptable trial
moves are
1. Displacement of particles. A particle is selected at random and given a new conformation (for example in the case of atoms a random displacement). This move is
accepted with a probability
acc(s → s′) = min(1, exp{−β[U(s′^N) − U(s^N)]}).     (25)
2. Insertion and removal of particles. A particle is inserted at a random position or a
randomly selected particle is removed. The creation of a particle is accepted with a
probability
acc(N → N + 1) = min(1, [V / (Λ³ (N + 1))] exp{β[µ − U(N + 1) + U(N)]})     (26)

and the removal of a particle is accepted with a probability

acc(N → N − 1) = min(1, [Λ³ N / V] exp{−β[µ + U(N − 1) − U(N)]}).     (27)
The chemical potential of the reservoir can be related to the pressure of the reservoir.
Algorithm 3 shows the basic structure of a simulation in the grand-canonical ensemble.
3.3 Justification of the Algorithm
It is instructive to demonstrate that the acceptance rules (25)–(27) indeed lead to a sampling of distribution (24). Consider a move in which we start with a configuration with N
particles and move to a configuration with N + 1 particles by inserting a particle in the
system. Recall that we have to demonstrate that detailed balance is obeyed:
K(N → N + 1) = K(N + 1 → N ),
Algorithm 3 [Basic Grand-Canonical Ensemble Simulation]

      PROGRAM mc_gc                          ! basic µVT-ensemble simulation
      do icycl=1,ncycl                       ! perform ncycl MC cycles
         ran=int(ranf()*(npart+nexc))+1
         if (ran.le.npart) then
            call mcmove                      ! displace a particle
         else
            call mcexc                       ! exchange a particle with the reservoir
         endif
         if (mod(icycl,nsamp).eq.0)
     +      call sample                      ! sample averages
      enddo
      end
Comments:
1. This algorithm ensures that, after each MC step, detailed balance is obeyed. Per cycle
we perform on average npart attempts to displace particles and nexc attempts to
exchange particles with the reservoir.
2. Subroutine mcmove attempts to displace a particle (Algorithm 2), subroutine mcexc attempts to exchange a particle with a reservoir (Algorithm 4), and subroutine sample samples quantities every nsamp cycles.
with
K(N → N + 1) = N (N ) × α(N → N + 1) × acc(N → N + 1).
In Algorithm 3 at each Monte Carlo step the probability that an attempt is made to remove
a particle is equal to the probability of attempting to add one:
αgen (N → N + 1) = αgen (N + 1 → N ),
where the subscript “gen” refers to the fact that α measures the probability to generate this
trial move. Substitution of this equation together with equation 24 into the condition of
detailed balance gives:
acc(N → N + 1) / acc(N + 1 → N) = [exp[βµ(N + 1)] V^{N+1} exp[−βU(s^{N+1})] / (Λ^{3(N+1)} (N + 1)!)] × [Λ^{3N} N! exp[βU(s^N)] / (exp(βµN) V^N)]
                                 = [exp(βµ) V / (Λ³ (N + 1))] exp{−β[U(s^{N+1}) − U(s^N)]}.
It is straightforward to show that acceptance rules (26) and (27) obey this condition.
Algorithm 4 [Attempt to Exchange a Particle with a Reservoir]

      SUBROUTINE mcexc                       ! attempt to exchange a particle with a reservoir
      if (ranf().lt.0.5) then                ! decide to remove or add a particle
         if (npart.eq.0) return              ! test whether there is a particle
         o=int(npart*ranf())+1               ! select a particle to be removed
         call ener(x(o),eno)                 ! energy particle o
         arg=npart*exp(beta*eno)
     +      /(zz*vol)                        ! acceptance rule (27)
         if (ranf().lt.arg) then
            x(o)=x(npart)                    ! accepted: remove particle o
            npart=npart-1
         endif
      else
         xn=ranf()*box                       ! new particle at a random position
         call ener(xn,enn)                   ! energy new particle
         arg=zz*vol*exp(-beta*enn)
     +      /(npart+1)                       ! acceptance rule (26)
         if (ranf().lt.arg) then
            x(npart+1)=xn                    ! accepted: add new particle
            npart=npart+1
         endif
      endif
      return
      end

Comment:
1. We have defined zz = exp(βµ)/Λ³. The subroutine ener calculates the energy of a particle at a given position.
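A minimal Python transcription of this exchange move is sketched below; it is an illustration rather than the original implementation. The routine energy_of(pos, others) is a placeholder that is assumed to return the interaction energy of a particle at pos with the particles in others, and the cubic box of side box, the constant zz = exp(βµ)/Λ³ and the volume vol are supplied by the caller.

import math, random

def mcexc(positions, zz, vol, box, beta, energy_of):
    """Attempt to exchange one particle with the reservoir (cf. Algorithm 4).
    positions is a list of 3D coordinates; zz = exp(beta*mu)/Lambda**3."""
    if random.random() < 0.5:                      # decide to remove or add a particle
        if not positions:                          # test whether there is a particle
            return
        o = random.randrange(len(positions))       # select a particle to be removed
        eno = energy_of(positions[o], positions[:o] + positions[o+1:])
        arg = len(positions) * math.exp(beta * eno) / (zz * vol)       # acceptance rule (27)
        if random.random() < arg:
            positions.pop(o)                       # accepted: remove particle o
    else:
        xn = [box * random.random() for _ in range(3)]   # new particle at a random position
        enn = energy_of(xn, positions)                   # energy of the new particle
        arg = zz * vol * math.exp(-beta * enn) / (len(positions) + 1)  # acceptance rule (26)
        if random.random() < arg:
            positions.append(xn)                   # accepted: add the new particle

As in Algorithm 4, the factor min(1, ·) in the acceptance rules is implicit in the comparison with a uniform random number.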
The most salient feature of the grand-canonical Monte Carlo technique is that in such
simulations the chemical potential µ is imposed, while the number of particles N is a fluctuating quantity. During the simulation we may measure other thermodynamic quantities,
such as the pressure P , the average density hρi, or the internal energy hUi. As we know
µ, we can derive all other thermal properties, such as the Helmholtz free energy or the
entropy. This may seem surprising. After all, the Metropolis sampling cannot be used
to sample absolute free energies and related quantities. Yet, with grand-canonical Monte
Carlo we seem to be doing precisely that. The answer is that, in fact, we do not. What we
measure is not an absolute but a relative free energy. In grand-canonical Monte Carlo, we
are equating the chemical potential of a molecule in an ideal gas at density ρ (for the ideal
gas case we know how to compute µ) and the chemical potential of the same species in an
interacting system at density ρ′ .
Grand-canonical Monte Carlo works best if the acceptance of trial moves by which
particles are added or removed is not too low. For atomic fluids, this condition effectively
limits the maximum density at which the method can be used to about twice the critical
density. Special tricks are needed to extend the grand-canonical Monte Carlo method to
somewhat higher densities32 . Grand-canonical Monte Carlo is easily implemented for mixtures and inhomogeneous systems, such as fluids near interfaces. In fact, some of the most
useful applications of the grand-canonical Monte Carlo method are precisely in these areas
of research. Although the grand canonical Monte Carlo technique can be applied to simple models of nonspherical molecules, special techniques are required since the method
converges very poorly for all but the smallest polyatomic molecules.
4 Beyond Metropolis
4.1 Introduction
The original Metropolis Monte Carlo scheme6 was designed to perform single-particle
trial moves. For most simulations, such moves are perfectly adequate. However, in some
cases it is more efficient to perform moves in which the coordinates of many particles are
changed. A case in point is the sampling of polymer conformations. The conventional
Metropolis algorithm is ill-suited for polymer simulations because the natural dynamics
of polymers is dominated by topological constraints (chains cannot cross). Hence, any
algorithm that mimics the real motion of macromolecules will suffer from the same problem. For this reason, many algorithms have been proposed to speed up the Monte Carlo
sampling of polymer conformations (see e.g. ref. 35 ). The Configurational-Bias Monte
Carlo (CBMC) method is a dynamic MC scheme that makes it possible to achieve large
conformational changes in a single trial move that affects a large number of monomeric
units36–39 .
4.2 CBMC
The CBMC method is based on the Rosenbluth sampling scheme 41, 36, 37 for lattice systems.
In this scheme, the molecular conformation is built up step-by-step, in such a way that, at
every stage, the next monomeric unit is preferentially added in a direction that has a large
Boltzmann weight. This increases the probability of generating a trial conformation that
has no hard-core overlaps. As explained below, the probability of acceptance of the trial
conformation is given by the ratio of the ‘Rosenbluth weights’ of the new and the old
conformations. Whereas the original Rosenbluth scheme was devised for polymers on a
lattice, the CBMC scheme will also work for chain molecules in continuous space. The
advantage of the CBMC algorithm over many of the other, popular algorithms is that it can
be used in cases where particle-insertion and particle-removal trial moves are essential.
This is the case in Grand-Canonical and Gibbs-Ensemble simulations. In addition, the
CBMC can be used in the simulation of grafted chains and ring polymers.
4.2.1 Detailed balance
Before explaining the CBMC scheme, it is useful to recall the general recipe to construct a
Monte Carlo algorithm. It is advisable (although not strictly obligatory 42 ) to start from the
condition of detailed balance:
Po × Pgen (o → n) × Pacc (o → n) = Pn × Pgen (n → o) × Pacc (n → o) ,
(28)
where Po (Pn ) is the Boltzmann weight of the old (new) conformation, P gen denotes the
a priori probability to generate the trial move from o to n, and P acc is the probability that
this trial move will be accepted. From Eqn. 28 it follows that
P_acc(o → n) / P_acc(n → o) = exp(−β∆U) P_gen(n → o) / P_gen(o → n),     (29)
where exp(−β∆U ) is the ratio of the Boltzmann weights of the new and old conformations. If we use the Metropolis rule to decide on the acceptance of MC trial moves, then
Eqn 29 implies
Pgen (n → o)
.
(30)
Pacc (o → n) = min 1, exp(−β∆U )
Pgen (o → n)
Ideally, by biasing the probability to generate a trial conformation in the right way, we
could make the term on the right-hand side of Eqn. 30 always equal to unity. In that
case, every trial move will be accepted. This ideal situation can be reached in rare cases 40 . Configurational-bias Monte Carlo does not achieve this ideal situation. However, it does
lead to enhanced acceptance probability of trial moves that involve large conformational
changes.
In CBMC, chain configurations are generated by successive insertion of the bonded
segments of the chain. When the positions of the segments are chosen at random, it is very
likely that one of the segments will overlap with another particle in the fluid, which results
in rejection of the trial move. The Rosenbluth sampling scheme increases the insertion
probability by looking one step ahead. On lattices, the availability (i.e. the Boltzmann
factor) of all sites adjacent to the previous segment can be tested. In continuous space,
there are in principle an infinite number of positions that should be tested (e.g. in the
case of a chain molecule with rigid bonds, all points on the surface of a sphere with a
radius equal to the bond length). Of course, it is not feasible to scan an infinite number
of possibilities. Surprisingly, it turns out that it is possible to construct a correct Monte
Carlo scheme for off-lattice models in which only a finite number of trial segments (k) is
selected either at random or, more generally, drawn from the distribution of bond-lengths
and bond-angles of the ‘ideal’ chain molecule.
During a CBMC trial move, a polymer conformation is generated segment-by-segment.
At every step, k trial segments are generated (k is, in principle, arbitrary and can be chosen
to optimize computational efficiency). One of these segments, say i, is selected with a
probability
P_i = exp(−βu_i) / Σ_{j=1}^{k} exp(−βu_j),
where u_j is the change in the potential energy of the system that would result if this particular trial segment were added to the polymer. The probability of generating a complete
conformation Γnew consisting of ℓ segments is then
P(Γ)_new = Π_{n=1}^{ℓ} [exp(−βu_i(n)) / Σ_{j=1}^{k} exp(−βu_j(n))].
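The selection step just described is easy to write down explicitly. The Python fragment below (an illustration, not part of the original notes) generates k trial positions for the next segment, picks one of them with probability proportional to its Boltzmann factor, and returns the factor that this step contributes to the Rosenbluth weight; trial_position() and delta_energy(pos) are placeholders for the model-specific parts (bond geometry and external energy).

import math, random

def grow_one_segment(k, beta, trial_position, delta_energy):
    """One CBMC growth step: choose segment i with probability
    exp(-beta*u_i) / sum_j exp(-beta*u_j) among k trial positions and
    return (chosen position, w) with w = sum_j exp(-beta*u_j) / k."""
    trials = [trial_position() for _ in range(k)]
    boltz = [math.exp(-beta * delta_energy(t)) for t in trials]
    wsum = sum(boltz)
    r = random.random() * wsum
    running = 0.0
    for pos, b in zip(trials, boltz):
        running += b
        if r <= running:
            return pos, wsum / k
    return trials[-1], wsum / k        # guard against floating-point round-off

# The Rosenbluth weight of a whole chain is the product of these per-step
# factors; a CBMC regrowth is accepted with probability min(1, W(new)/W(old)),
# where W(old) is obtained by "retracing" the old conformation.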
To keep the equations simple, we only consider the expression for one of the ℓ segments.
P_gen({j}) = dΓ_j P_id(Γ_j) [Π_{j′≠j}^{k} dΓ_{j′} P_id(Γ_{j′})] × exp(−βu_ext(j)) / Σ_{j′=1}^{k} exp(−βu_ext(j′)).     (31)
In order to compute the acceptance probability of this move, we have to consider what
happens in the reverse move. Then we start from conformation j and generate a set of k
trial directions that includes i. When computing the acceptance probability of the forward
move, we have to impose detailed balance. However, detailed balance in this case means
not just that in equilibrium the number of moves from i to j is equal to the number of
reverse moves, but even that the rates are equal for any given set of trial directions for the forward and reverse moves. This condition we call ‘super-detailed balance’. Super-detailed balance implies that we can only decide on the acceptance of the forward move if we also generate a set of k − 1 trial directions around the old conformation i. We denote the probability to generate this set of k − 1 trial orientations by P_rest({i}), where the subscript
‘rest’ indicates that this is the set of orientations around, but excluding, i. This allows us to compute the ratio w_j^{(t)}/w_i^{(o)} of the Rosenbluth weights for forward and reverse moves:
w_j^{(t)} = [exp(−βu_ext^{(t)}(j)) + Σ_{j′≠j}^{k−1} exp(−βu_ext^{(t)}(j′))] / k

and

w_i^{(o)} = [exp(−βu_ext^{(o)}(i)) + Σ_{i′≠i}^{k−1} exp(−βu_ext^{(o)}(i′))] / k.
The superscripts (t) and (o) distinguish the trial conformation from the old conformation. The acceptance probability is determined by the ratio x ≡ w_j^{(t)}/w_i^{(o)} (actually, for a molecule of ℓ segments, we should compute a product of such factors). Let us assume that w_j^{(t)} < w_i^{(o)}. In that case, P_acc(i → j) = x while P_acc(j → i) = 1. Next, let us check whether detailed balance is satisfied. To do so, we write down the explicit expressions for K_ij and K_ji:
K_ij = N_i P_gen({j}) P_rest({i}) w_j^{(t)} / w_i^{(o)}

and

K_ji = N_j P_gen({i}) P_rest({j}) × 1.

In addition, we use the fact that

P_gen({j}) P_rest({i}) w_j^{(t)} ∼ N_j P_rest({j}) P_rest({i})

and

P_gen({i}) P_rest({j}) w_i^{(o)} ∼ N_i P_rest({i}) P_rest({j}).
It then follows immediately that

K_ij w_i^{(o)} = N_i P_gen({j}) P_rest({i}) w_j^{(t)}
             = constant × N_i N_j P_rest({i}) P_rest({j})     (32)
             = N_j P_gen({i}) P_rest({j}) w_i^{(o)}
             = K_ji w_i^{(o)}.     (33)
Hence, Kij is indeed equal to Kji . Note that, in this derivation, the number of trial
directions, k, was arbitrary. The procedure sketched above is valid for a complete regrowth
of the chain, but it is also possible to regrow only part of a chain, i.e. to cut a chain at a
(randomly chosen) point and regrow the cut part of the chain either at the same site or
at the other end of the molecule. Clearly, if only one segment is regrown and only one
trial direction is used, CBMC reduces to the reptation algorithm (at least, for linear homopolymers).
We still have to consider the choice for the number of trial directions at the i-th regrowth step, k_i. Too many trial directions increase the cost of a simulation cycle, but too few trial directions lower the acceptance rate, and increase the simulation length. There exist simple guidelines that allow us to select k_i for every segment such that it optimizes
the efficiency of the simulation44 .
4.3 Beyond CBMC
The CBMC method has several drawbacks. First of all, as the scheme looks ahead only one
step at a time, it is likely to end up in “dead-alleys”. Secondly, for long chain molecules,
the Rosenbluth weight of the trial conformation tends to become quite small. Hence, much
of the computational effort may be wasted on “doomed” configurations.
4.4 DPERM
Grassberger and co-workers45 have suggested adding two ingredients to the Rosenbluth
scheme to improve its efficiency: “pruning” and “enrichment”. The basic rationale behind
pruning is that it is not useful to spend much computer time on the generation of conformations that have a low Rosenbluth weight. Therefore, it is advantageous to discard
(“prune”) such irrelevant conformations at an early stage. The idea behind enrichment is
to make multiple copies of partially grown chains that have a large statistical weight 45, 46 ,
and to continue growing these potentially relevant chains. The algorithm that combines
these two features is called the Pruned-Enriched Rosenbluth Method (PERM). The examples presented by Grassberger and co-workers 47, 48 indicate that the PERM approach can
be very useful to estimate the thermal equilibrium properties of long polymers. The main
limitation of both the original Rosenbluth method and the PERM algorithm is that they
are “static” Monte Carlo schemes. Such schemes can simulate single polymer chains very
efficiently, but are less suited to simulate systems consisting of many polymer chains: at
each step, one would have to simultaneously generate the conformations of all the chains
in the system. On the other hand, in a dynamic scheme, one can conveniently choose a
new point in phase space by only changing one chain at each step of the algorithm.
Just as CBMC is the “dynamic” version of Rosenbluth sampling, one can construct a dynamic version of the PERM algorithm: DPERM 49 .
The static PERM algorithm uses the Rosenbluth algorithm to generate the chains except
that now pruning and enrichment are added. These ingredients are implemented as follows.
At any step of the creation of a chain, if the partial Rosenbluth weight W(j) = Π_{i=1}^{j} w_i of a configuration is below a lower threshold W_<(j), there is a probability of 50% to terminate the generation of this conformation. If the conformation survives this pruning step, its Rosenbluth weight is doubled, W*(j) = 2 W(j). Enrichment occurs when the partial Rosenbluth weight of a conformation W(j) = Π_{i=1}^{j} w_i exceeds an upper threshold W_>(j). In that case, k copies of the partial chain are generated, each with a weight W*(j) = W(j)/k. All these copies subsequently grow independently (subject to further
pruning and enrichment).
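To make the bookkeeping explicit, the fragment below sketches the pruning and enrichment decisions in Python; it is an illustration of the rules just described, not the published implementation, and the threshold functions lower(j), upper(j) and the copy number k are left as free parameters.

import random

def prune_or_enrich(partial_chains, j, lower, upper, k):
    """Apply the PERM rules to a list of (chain, weight) pairs at growth step j:
    weights below lower(j) are pruned with 50% probability (survivors are doubled),
    weights above upper(j) are replaced by k copies carrying weight/k each."""
    survivors = []
    for chain, w in partial_chains:
        if w < lower(j):
            if random.random() < 0.5:
                continue                         # pruned: discard this partial chain
            survivors.append((chain, 2.0 * w))   # survived: double its weight
        elif w > upper(j):
            for _ in range(k):                   # enrichment: k copies with weight w/k
                survivors.append((list(chain), w / k))
        else:
            survivors.append((chain, w))
    return survivors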
The DPERM algorithm is the dynamic generalization of the PERM algorithm. As in
the CBMC algorithm, we bias the acceptance of trial conformations to recover a correct
Boltzmann sampling of chain conformations.
Thus, starting from an old configuration, we create a trial conformation and calculate the
probability to generate it. Starting from the condition for detailed balance, we then derive
the expression for the probability to accept or reject a new trial conformation. As we use the
Rosenbluth method to generate chains, the probability to grow a particular conformation
is:

P_gen(chain) = Π_{i=1}^{l} [e^{−βu^{(i)}(n)} / w_i].     (34)

Whenever the re-weighted Rosenbluth partial weight W*(j) of the chain drops below the lower threshold W_<(j), the chain has a 50% probability of being deleted. Let us assume that this happens m times. Then, the total probability to generate a particular conformation is:

P_gen(chain) = (1/2^m) Π_{i=1}^{l} [e^{−βu^{(i)}(n)} / w_i],     (35)

and the re-weighted Rosenbluth weight of such a chain would be:

W*(chain new) = 2^m W(chain new),     (36)

with W(chain new) = Π_{i=1}^{l} w_i.     (37)
Whenever the Rosenbluth partial weight exceeds the upper threshold, k copies of the chain
are created with the Rosenbluth weight W*(j) = W(j)/k, which leads to the creation of a set of chains: this is a deterministic procedure. At every stage during the growth of the chain, other chains will branch off. The probability to grow the entire family of chains
that is generated in one DPERM move can be written as:

P_gen(chain new) × P_gen(rest new),     (38)
where P_gen(rest new) describes the product of the probabilities involved in generating all
the other pieces of chains that branch off from the main chain. If we now call p the number
of times the Rosenbluth weight exceeds the upper threshold during the generation of the
given trial configuration, the probability to generate this particular chain is :
\[
P_{\rm gen}(\text{chain new}) = k^p \prod_{i=1}^{l} \frac{e^{-\beta u^{(i)}(n)}}{w_i} \tag{39}
\]
and its re-weighted Rosenbluth weight is:
\[
W^*(\text{chain new}) = \frac{1}{k^p}\, W(\text{chain new}) \tag{40}
\]
Here k is the number of copies that are created each time the Rosenbluth weight exceeds the upper threshold. In Eqn. 39, the product on the right-hand side describes the usual probability to generate a given chain following the Rosenbluth method. The factor k^p arises because the new chain could be any of the chains in the set, so that the probability to generate a given chain is multiplied by this term. Eqn. 40 follows from the fact that, each time copies are made, the Rosenbluth weight is divided by k.
If we now also take into account the possibility that the chain can be pruned, then Eqn. 39 becomes:
\[
P_{\rm gen}(\text{chain new}) = \frac{k^p}{2^m} \prod_{i=1}^{l} \frac{e^{-\beta u^{(i)}(n)}}{w_i} = \frac{k^p}{2^m}\,\frac{e^{-\beta U(n)}}{W(\text{chain new})} \tag{41}
\]
and Eqn. 40 becomes:
\[
W^*(\text{chain new}) = \frac{2^m}{k^p}\, W(\text{chain new}) . \tag{42}
\]
Note that Eqn. 41 and Eqn. 42 reduce to Eqn. 35 and Eqn. 36, respectively, in the absence of enrichment (p = 0), and to Eqn. 39 and Eqn. 40 in the absence of pruning (m = 0).
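As a consistency check (this remark is not part of the original text), note that the product of Eqns. 41 and 42 no longer depends on the pruning and enrichment history:
\[
P_{\rm gen}(\text{chain new})\, W^*(\text{chain new}) = \frac{k^p}{2^m}\,\frac{e^{-\beta U(n)}}{W(\text{chain new})} \times \frac{2^m}{k^p}\, W(\text{chain new}) = e^{-\beta U(n)},
\]
which is why selecting chains in proportion to their re-weighted weight and using the acceptance rule below recovers Boltzmann sampling.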
We now choose to select the new trial chain from the set of chains created by the DPERM move with a probability given by:
\[
P_{\rm choose\ new} = \frac{W^*(\text{chain new})}{W_{\rm total}(\text{new})}, \tag{43}
\]
where W^*(chain new) is the re-weighted Rosenbluth weight of Eqn. 42 and W_total(new) is the sum of all such weights in the set:
\[
W_{\rm total}(\text{new}) = \sum_{\rm set} W^*_{\rm chain} . \tag{44}
\]
Eqn. 43 implies that we are most likely to choose the best chain (the one with the largest
re-weighted Rosenbluth weight) of the set as the next Monte Carlo trial conformation.
Assuming that we start from an old configuration denoted by the subscript old, we generate
a new configuration following the scheme described above, and we accept this move with the following acceptance rule:
\[
{\rm acc}({\rm old} \to {\rm new}) = \min\!\left(1, \frac{W_{\rm total}(\text{new})}{W_{\rm total}(\text{old})}\right) \tag{45}
\]
To calculate W_total(old), one has to “retrace” the old chain: the old chain is first cleared and then reconstructed following the procedure described above to determine its weight. This is exactly analogous to what is done in the configurational-bias Monte Carlo scheme. The demonstration that this scheme satisfies detailed balance is given in Ref. 49.
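The following sketch, written under the same caveats as above, shows the overall structure of one DPERM move (Eqns. 43–45). The helper grow_family is hypothetical: called without arguments it grows a fresh family of chains and returns (chain, W*) pairs with W* the re-weighted weight of Eqn. 42; called with forced_chain it retraces that chain as the main chain, as required for W_total(old). This is not the implementation of ref. 49.

```python
import random

def dperm_move(old_chain, grow_family):
    """Sketch of one DPERM move (Eqns. 43-45); not the code of ref. 49.

    grow_family(forced_chain=None) is a hypothetical helper: without arguments it
    grows a fresh family of chains and returns (chain, W*) pairs, W* being the
    re-weighted Rosenbluth weight of Eqn. 42; with forced_chain it "retraces" that
    chain as the main chain and regrows the branches around it.
    """
    # Grow a new family and select one chain with probability W*/W_total (Eqn. 43).
    new_set = grow_family()
    weights = [w for _, w in new_set]
    W_total_new = sum(weights)
    if W_total_new == 0.0:
        return old_chain                              # nothing viable was generated
    new_chain, _ = random.choices(new_set, weights=weights)[0]

    # Retrace the old chain to obtain W_total(old), as in CBMC.
    old_set = grow_family(forced_chain=old_chain)
    W_total_old = sum(w for _, w in old_set)

    # Acceptance rule, Eqn. 45.
    if random.random() < min(1.0, W_total_new / W_total_old):
        return new_chain
    return old_chain
```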
This algorithm contains even more free parameters than CBMC. We choose them to
optimize computational efficiency. In practice, this usually means that we wish to generate
“enough” potentially successful trial chains, but not too many. Typically, the number of
chains that survive at the end of a trial move should be O(1).
4.5 Recoil-growth
The recoil growth (RG) scheme is a dynamic Monte Carlo algorithm that was also developed with the dead-alley problem in mind50, 51 . The algorithm is related to earlier static
MC schemes due to Meirovitch52 and Alexandrowicz and Wilding53 . The basic strategy of
the method is that it allows us to escape from a trap by “recoiling back” a few monomers
and retrying the growth process using another trial orientation. In contrast, the CBMC
scheme looks only one step ahead. Once a trial orientation has been selected, we cannot
“deselect” it, even if it turns out to lead into a dead alley. The recoil growth scheme looks
several monomers ahead to see whether traps are to be expected before a monomer is irrevocably added to the trial conformation. In this way we can alleviate (but not remove)
the dead-alley problem. In principle, one could also do something similar with CBMC by
adding a sequence of l monomers per step. However, as there are k possible directions for
every monomer, this would involve computing k^l energies per group. Even though many
of these trial monomers do not lead to acceptable conformations, we would still have to
compute all interaction energies.
The RG algorithm is best explained by considering a totally impractical, but conceptually simple scheme that has the same effect. We place the first monomer at a random
position. Next, we generate k trial positions for the second monomer. From each of these
trial positions, we generate k trial positions for the third monomer. At this stage, we have
generated k^2 “trimer” chains. We continue in the same manner until we have grown k^{l−1}
chains of length l. Obviously, most of the conformations thus generated have a vanishing
Boltzmann factor and are, therefore, irrelevant. However, some may have a reasonable
Boltzmann weight and it is these conformations that we should like to find. To simplify
this search, we introduce a concept that plays an important role in the RG algorithm: we
shall distinguish between trial directions that are “open” and those that are “closed.” To
decide whether a given trial direction, say b, for monomer j is open, we compute its energy
u_j(b). The probability that trial position b is open is given by
\[
p_j^{\rm open}(b) = \min\left(1, \exp[-\beta u_j(b)]\right). \tag{46}
\]
For hard-core interactions, the decision whether a trial direction is open or closed is unambiguous, as p_j^open(b) is either zero or one. For continuous interactions we compare p_j^open(b) with a random number between 0 and 1. If the random number is less than p_j^open(b), the direction is open; otherwise it is closed. We now have a tree with k^{l−1} branches, but many
of these branches are “dead,” in the sense that they emerge from a “closed” monomer.
Clearly, there is little point in exploring the remainder of a branch if it does not correspond
to an “open” direction. This is where the RG algorithm comes in. Rather than first generating a host of (mostly useless) conformations, it generates branches only as they are needed, “on the fly.” In addition, the algorithm uses a cheap test to check if a given branch will “die” within a specified number of steps (this number is denoted by l_max). The algorithm then randomly chooses among the available open branches. As we have only looked a distance l_max ahead, it may still happen that
we have picked a branch that is doomed. But the probability of ending up in such a dead
alley is much lower than that in the CBMC scheme.
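As a small illustration of Eqn. 46, the test below decides whether a single trial direction is open; the energy u of the trial monomer is assumed to be supplied by the caller.

```python
import math
import random

def is_open(u, beta):
    """Decide whether a trial direction with energy u is 'open' (Eqn. 46).

    p_open = min(1, exp(-beta*u)): 1 for u <= 0, 0 for a hard-core overlap,
    and a stochastic decision in between.
    """
    p_open = 1.0 if u <= 0.0 else math.exp(-beta * u)
    return random.random() < p_open
```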
In practice, the recoil growth algorithm consists of two steps. The first step is to grow
a new chain conformation using only “open” directions. The next step is to compute the
weights of the new and the old conformations. The following steps are involved in the
generation of a new conformation:
1. The first monomer of a chain is placed at a random position. The energy of this
monomer is calculated (u_1). The probability that this position is “open” is given by
Eqn. 46. If the position is closed we cannot continue growing the chain and we reject
the trial conformation. If the first position is open, we continue with the next step.
2. A trial position b_{i+1} for monomer i + 1 is generated starting from monomer i. We compute the energy of this trial monomer u_{i+1}(b) and, using Eqn. 46, we decide
whether this position is open or closed. If this direction is closed, we try another trial
position, up to a maximum of k trial orientations. As soon as we find an open position
we continue with step 3.
If not a single open trial position is found, we make a recoil step. The chain retracts
one step to monomer i − 1 (if this monomer exists), and the unused directions (if any)
from step 2, for i − 1, are explored. If all directions at level i − 1 are exhausted, we
attempt to recoil to i − 2. The chain is allowed to recoil a total of l_max steps, i.e., down to length i − l_max + 1.
If, at the maximum recoil length, all trial directions are closed, the trial conformation
is discarded.
3. We have now found an “open” trial position for monomer i + 1. At this point monomer i − l_max is permanently added to the new conformation; i.e., a recoil step will no longer reach this monomer.
4. Steps 2 and 3 are repeated until the entire chain has been grown.
In terms of the naive version of the algorithm sketched above, these steps can be viewed as a procedure for searching for an open branch on the existing tree. The RG procedure, however, generates only the absolute minimum number of trial directions compatible with the chosen recoil distance l_max.
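The sketch below strings steps 1–4 together into a chain-generation routine. It is only an illustration of the logic described above, with hypothetical helpers trial_position and energy and a simplified convention for the recoil limit (descriptions differ by off-by-one conventions); it is not the implementation of refs. 50, 51.

```python
import math
import random

def grow_rg_chain(length, k, l_max, beta, first_position, trial_position, energy):
    """Minimal sketch of the recoil-growth generation stage (steps 1-4 above).

    trial_position(chain) -> a random trial position for the next monomer
    energy(chain, pos)    -> energy of a trial monomer at pos
    Returns the grown chain, or None if the trial conformation is discarded.
    """
    def is_open(chain, pos):
        u = energy(chain, pos)
        p_open = 1.0 if u <= 0.0 else math.exp(-beta * u)   # Eqn. 46
        return random.random() < p_open

    # Step 1: place and test the first monomer.
    if not is_open([], first_position):
        return None
    chain = [first_position]
    attempts = [k, 0]     # attempts[j]: trial directions already used for monomer j+1
    frozen = 1            # number of monomers that can no longer be recoiled past

    while len(chain) < length:
        j = len(chain)                        # placing monomer j+1
        placed = False
        while attempts[j] < k:                # step 2: at most k trial directions
            attempts[j] += 1
            pos = trial_position(chain)
            if is_open(chain, pos):
                chain.append(pos)
                attempts.append(0)
                # step 3 (one possible convention): monomers more than l_max
                # behind the growing end are permanently fixed.
                frozen = max(frozen, len(chain) - l_max)
                placed = True
                break
        if not placed:                        # recoil by one monomer and retry the
            if len(chain) <= frozen:          # unused directions of the previous slot
                return None                   # maximum recoil reached: discard
            chain.pop()
            attempts.pop()
    return chain
```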
Once we have successfully generated a trial conformation, we have to decide on its
acceptance. To this end, we have to compute the weights, W (n) and W (o), of the new
and the old conformations, respectively. This part of the algorithm is more expensive.
However, we only carry it out once we know for sure that we have successfully generated
a trial conformation. In contrast, in CBMC it may happen that we spend much of our time
computing the weight factor for a conformation that terminates in a dead alley.
In the RG scheme, the following algorithm is used to compute the weight of the new
conformation:
1. Consider that we are at monomer position i (initially, of course, i = 1). In the previous stage of the algorithm, we have already found that at least one trial direction is
available (namely, the one that is included in our new conformation). In addition, we
may have found that a certain number of directions (say k_c) are closed—these are the ones that we tried but that died within l_max steps. We still have to test the remaining k_rest ≡ k − 1 − k_c directions. We randomly generate k_rest trial positions for monomer i + 1 and use the recoil growth algorithm to test whether at least one “feeler” of length l_max can be grown in each of these directions (unless i + l_max > l; in that case we only continue until we have reached the end of the chain). Note that, again, we do not explore all possible branches. We only check if there is at least one open branch of length l_max in each of the k_rest directions. If this is the case, we call that direction “available”. We denote the total number of available directions (including the one that corresponds to the direction that we had found in the first stage of the algorithm) by m_i. One can show (see refs. 50, 51) that monomer i contributes a factor w_i(n) to the weight of the chain, where w_i(n) is given by
\[
w_i(n) = \frac{m_i(n)}{p_i^{\rm open}(n)}
\]
and p_i^open(n) is given by Eqn. 46.
2. Repeat the previous step for all i from 1 to l − 1. The expression for the partial weight of the final monomer seems ambiguous, as m_l(n) is not defined. An easy (and correct) solution is to choose m_l(n) = 1.
3. Next compute the weight for the entire chain:
\[
W(n) = \prod_{i=1}^{\ell} w_i(n) = \prod_{i=1}^{\ell} \frac{m_i(n)}{p_i^{\rm open}(n)} . \tag{47}
\]
For the calculation of the weight of the old conformation, we use almost the same procedure. The difference is that, for the old conformation, we have to generate k − 1 additional directions for every monomer i. The weight is again related to the total number of directions that start from monomer i and that are “available”, i.e., that contain at least one open feeler of length l_max:
\[
W(o) = \prod_{i=1}^{\ell} w_i(o) = \prod_{i=1}^{\ell} \frac{m_i(o)}{p_i^{\rm open}(o)} .
\]
Finally, the new conformation is accepted with a probability:
\[
{\rm acc}(o \to n) = \min\!\left(1, \frac{e^{-\beta U(n)}\, W(n)}{e^{-\beta U(o)}\, W(o)}\right), \tag{48}
\]
where U (n) and U (o) are the energies of the new and old conformations, respectively. It
can easily be demonstrated (see 50, 51 ) that this scheme generates a Boltzmann distribution
of conformations.
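For completeness, here is a sketch of the weight computation (Eqn. 47) and the acceptance test (Eqn. 48). The helpers count_available and p_open are hypothetical: they return m_i and p_i^open for monomer i of a given conformation, and they are where the real computational effort (growing the feelers) would reside; this is not the code of refs. 50, 51.

```python
import math
import random

def rg_weight(chain, count_available, p_open):
    """Weight of a conformation, Eqn. 47: W = prod_i m_i / p_i^open.

    count_available(chain, i) -> m_i, number of 'available' directions at monomer i
                                 (m_l = 1 for the last monomer by convention)
    p_open(chain, i)          -> p_i^open of Eqn. 46 for the direction actually used
    """
    W = 1.0
    for i in range(len(chain)):
        W *= count_available(chain, i) / p_open(chain, i)
    return W

def rg_accept(new_chain, old_chain, U_new, U_old, beta, count_available, p_open):
    """Acceptance test of Eqn. 48 for a recoil-growth trial move."""
    W_new = rg_weight(new_chain, count_available, p_open)
    W_old = rg_weight(old_chain, count_available, p_open)
    ratio = math.exp(-beta * (U_new - U_old)) * W_new / W_old
    return random.random() < min(1.0, ratio)
```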
5 Relative Merits
A comparison between CBMC and the RG algorithm was made by Consta et al. 51 , who
studied the behavior of Lennard-Jones chains in solution. The simulations showed that for
relatively short chains (ℓ = 10) at a density of ρ = 0.2, the recoil growth scheme was a
factor of 1.5 faster than CBMC. For higher densities ρ = 0.4 and longer chains N = 40
the gain could be as large as a factor 25. This illustrates the fact that the recoil scheme
is still efficient, under conditions where CBMC is likely to fail. For still higher densities
or still longer chains, the relative advantage of RG would be even larger. However, the
bad news is that, under those conditions, both schemes become very inefficient. A similar
comparison has been made between CBMC and DPERM49 . For the cases studied, DPERM
was no worse than CBMC, but also not much better. However, the fact that pruning can be
performed on any chain property (i.e. not necessarily the Rosenbluth weight), may make
the method attractive in special cases.
The basic idea behind both DPERM and RG is that these algorithms aim to avoid investing computational effort in the generation of trial moves that are, in the end, rejected.
Houdayer54 has proposed an algorithm that aims to achieve the same. In this algorithm,
the trial move is a so-called “wormhole” move, where a polymer grows in one part of the
simulation box while it shrinks at its original location. The growth-shrinkage process is
carried out using a reptation-like algorithm. This algorithm has the advantage that, even
if trial moves to the new state are rejected, the “old” state has also changed. This speeds
up relaxation. In addition, the computing effort for the wormhole scheme appears to scale
favorably with polymer size for long polymers (namely as a power of N, see ref. 55, rather than as exp(cN)).
However, for short chains, the existing schemes are almost certainly more efficient (in
ref.54 , the comparison with CBMC is made for a particularly inefficient CBMC-parameter
choice). For longer chains, the wormhole scheme really should win. However, in that
regime, all schemes are extremely costly. Nevertheless, as pointed out in ref. 54 , a combination of the various algorithms would probably be more efficient than any one of them
alone.
Acknowledgments
The work of the FOM Institute is part of the research program of the “Stichting voor Fundamenteel Onderzoek der Materie” (FOM), which is financially supported by the “Nederlandse Organisatie voor Wetenschappelijk Onderzoek” (NWO).
References
1. D. Frenkel and B. Smit, Understanding Molecular Simulation: From Algorithms to Applications, Academic Press, San Diego, 2nd edition (2002).
2. G. Ciccotti, D. Frenkel, and I.R. McDonald. Simulation of Liquids and Solids. North-Holland, Amsterdam, 1987.
3. J.A. Prins, Onze voorstelling omtrent de bouw van de stof. Inaugural Address, 11-10-1928, Rijksuniversiteit Groningen; Physica, 8:257, 1928.
4. O.K. Rice. J. Chem. Phys., 12:1, 1944.
5. J.D. Bernal. Bakerian lecture. Proc. Roy. Soc., 280:299, 1964.
6. N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, and E. Teller. Equation of state calculations by fast computing machines. J. Chem. Phys., 21:1087–1092,
1953.
7. E. Fermi, J.G. Pasta, and S.M. Ulam. Studies of non-linear problems. LASL Report,
LA-1940, 1955.
8. B.J. Alder and T.E. Wainwright. In I. Prigogine, editor, Proceedings of the International Symposium on Statistical Mechanical Theory of Transport Processes (Brussels,
1956). Interscience, New York, 1958.
9. J.B. Gibson, A.N. Goland, M. Milgram, and G.H. Vineyard. Dynamics of radiation
damage. Phys. Rev., 120:1229–1253, 1960.
10. G.H. Vineyard. Autobiographical remarks of G.H. Vineyard. In P.C. Gehlen, J.R.
Beeler, and R.I. Jaffe, editors, Interatomic Potentials and Simulation of Lattice Defects, pages xiii–xvi. Plenum, New York, 1972.
11. A. Rahman. Correlations in the motion of atoms in liquid argon. Phys. Rev.,
136:A405–A411, 1964.
12. L. Verlet. Computer ‘experiments’ on classical fluids. i. thermodynamical properties
of Lennard-Jones molecules. Phys. Rev., 159:98–103, 1967.
13. J.A. Barker and R.O. Watts. Structure of water: A Monte Carlo calculation. Chem.
Phys. Lett., 3:144–145, 1969.
14. I.R. McDonald and K. Singer. Calculation of the thermodynamic properties of liquid
argon from Lennard-Jones parameters by a Monte Carlo method, Discuss. Faraday
Soc., 43:40, 1967.
15. W.W. Wood. Early history of computer simulation in statistical mechanics. In
G. Ciccotti and W.G. Hoover, editors, Molecular Dynamics Simulations of Statistical
Mechanics Systems, pages 2–14. Proceedings of the 97th Int. “Enrico Fermi” School
of Physics, North Holland, Amsterdam, 1986.
16. N.G. van Kampen. Stochastic Processes in Physics and Chemistry. North-Holland,
Amsterdam, 1981.
17. M.P. Allen and D.J. Tildesley. Computer Simulation of Liquids. Clarendon Press,
Oxford, 1987.
18. J.J. Erpenbeck and W.W. Wood. Molecular dynamics techniques for hard-core systems. In B.J. Berne, editor, Statistical Mechanics, Part B, pages 1–40. Plenum, New
York, 1976.
19. J.L. Lebowitz, J.K. Percus, and L. Verlet. Ensemble dependence of fluctuations with
application to machine computations. Phys. Rev., 153:250–254, 1967.
20. W.W. Wood. Monte Carlo calculations for hard disks in the isothermal-isobaric ensemble. J. Chem. Phys., 48:415–434, 1968.
21. I.R. McDonald. NpT-ensemble Monte Carlo calculations for binary liquid mixtures.
Mol. Phys., 23:41–58, 1972.
22. R. Najafabadi and S. Yip. Observation of finite-temperature strain transformation
(f.c.c. ↔ b.c.c.) in Monte Carlo simulation of iron. Scripta Metall., 17:1199–1204,
1983.
23. G.E. Norman and V.S. Filinov. Investigation of phase transitions by a Monte-Carlo
method. High Temp. (USSR), 7:216–222, 1969.
24. D.J. Adams. Chemical potential of hard-sphere fluids by Monte Carlo methods. Mol.
Phys., 28:1241–1252, 1974.
25. M. Creutz. Microcanonical Monte Carlo simulation. Phys. Rev. Lett., 50:1411–1414,
1983.
26. A.Z. Panagiotopoulos. Direct determination of phase coexistence properties of fluids
by Monte Carlo simulation in a new ensemble. Mol. Phys., 61:813–826, 1987.
27. D.J. Adams. Grand canonical ensemble Monte Carlo for a Lennard-Jones fluid. Mol.
Phys., 29:307–311, 1975.
28. D.J. Adams. Calculating the low temperature vapour line by Monte Carlo. Mol.
Phys., 32:647–657, 1976.
29. D.J. Adams. Calculating the high-temperature vapour line by Monte Carlo. Mol.
Phys., 37:211–221, 1979.
30. L.A. Rowley, D. Nicholson, and N.G. Parsonage. Monte Carlo grand canonical ensemble calculation in a gas-liquid transition region for 12-6 argon. J. Comp. Phys.,
17:401–414, 1975.
31. J. Yao, R.A. Greenkorn, and K.C. Chao. Monte Carlo simulation of the grand canonical ensemble. Mol. Phys., 46:587–594, 1982.
32. M. Mezei. A cavity-biased (T,V,µ) Monte Carlo method for the computer simulation
of fluids. Mol. Phys., 40:901–906, 1980.
33. J.P. Valleau and L.K. Cohen. Primitive model electrolytes. I. Grand canonical Monte
Carlo computations. J. Chem. Phys., 72:5935–5941, 1980.
34. W. van Megen and I.K. Snook. The grand canonical ensemble Monte Carlo method
applied to the electrical double layer. J. Chem. Phys., 73:4656–4662, 1980.
35. Monte Carlo and Molecular Dynamics Simulations in Polymer Science Edited by K.
Binder, OUP, Oxford, 1995.
36. J. Harris and S.A. Rice, A lattice model of a supported monolayer of amphiphilic
molecules: Monte Carlo simulations J. Chem. Phys. 88, 1298–1306 (1988).
37. J.I. Siepmann and D. Frenkel, Configurational-bias Monte Carlo: a new sampling
scheme for flexible chains Mol.Phys. 75, 59–70 (1992).
38. D. Frenkel, G.C.A.M. Mooij, and B. Smit, Novel scheme to study structural and
thermal properties of continuously deformable molecules J.Phys. Condensed Matter
4,3053–3076 (1992).
39. J.J. de Pablo, M. Laso, and U.W. Suter, Simulation of polyethylene above and below
the melting point J. Chem. Phys. 96, 2395–2403 (1992).
40. R.H. Swendsen and J.S. Wang, Nonuniversal critical dynamics in Monte Carlo simulations, Phys. Rev. Lett. 58, 86–88 (1987).
41. M.N. Rosenbluth and A.W. Rosenbluth, Monte Carlo simulations of the average
extension of molecular chains J.Chem. Phys. 23, 356–359 (1955).
42. V.I. Manousiouthakis and M.W. Deem, Strict Detailed Balance is Unnecessary in
Monte Carlo Simulation J. Chem. Phys., 110,2753–2758(1999).
43. G.C.A.M. Mooij, D. Frenkel, and B. Smit, Direct simulation of phase equilibria of
chain molecules J. Phys. Condensed Matter 4, L255–L259 (1992).
44. G.C.A.M. Mooij and D.Frenkel, A systematic optimization scheme for configurational bias Monte Carlo Molecular Simulation 17,41–55(1996).
45. P. Grassberger, Pruned-enriched Rosenbluth method: Simulations of theta polymers
of chain length up to 1,000,000 Phys. Rev. E 56, 3682–3693 (1997).
46. U. Bastolla, H. Frauenkron, E. Gerstner, P. Grassberger, and W. Nadler, Testing a
new Monte Carlo algorithm for protein folding Proteins: Struc. Func. Gen. 32,52–
66 (1998).
47. H. Frauenkron, U. Bastolla, E. Gerstner, P. Grassberger, and W. Nadler, New Monte
Carlo Algorithm for Protein Folding Phys. Rev. Lett. 80,3149–3152 (1998).
48. U. Bastolla and P. Grassberger, xxx J. Stat. Phys. 89,1061 (1997).
49. N. Combe, T.J.H. Vlugt, P.R. ten Wolde and D. Frenkel, Dynamic pruned-enriched Rosenbluth method Mol. Phys. 101, 1675–1682 (2003).
50. S.Consta, T.J.H. Vlugt, J.Wichers Hoeth, B.Smit, and D.Frenkel, Recoil growth
algorithm for chain molecules with continuous interactions Mol. Phys. 97,1243–
1254(1999).
51. S. Consta, N.B. Wilding, D. Frenkel, and Z. Alexandrowicz, Recoil growth: An
efficient simulation method for multi-polymer systems J. Chem. Phys. 110,3220–
3228(1999).
52. H. Meirovitch, Statistical Properties Of The Scanning Simulation Method For
Polymer-Chains J. Chem. Phys. 89,2514–2522,(1988).
53. Z. Alexandrowicz and N.B. Wilding, Simulation of polymers with rebound selection
J. Chem. Phys. 109,5622–5626(1998).
54. J. Houdayer, The wormhole move: A new algorithm for polymer simulations J.
Chem. Phys. 116, 1783–1787(2002).
55. Houdayer54 suggests that the computing time per accepted move should scale as N^2. However, as the problem is asymptotically that of a one-dimensional random walk with absorbing boundaries, it is more plausible that, for very long chains, the computing time per accepted move should scale as N^3.