Monte Carlo Protein
Monte Carlo Protein
Monte Carlo Protein
K. P. N. Murthy
Materials Science Division, Indira Gandhi Centre for Atomic Research,
Kalpakkam - 603 102, Tamilnadu, India
R. Chidambaram
Central Complex, Bhabha Atomic Research Centre,
Mumbai - 400 085, India
We demostrate that the recently proposed interacting growth walk (IGW) model, modified for
generating self-avoiding heteropolymers, proves to be a simpler alternative to the other Monte Carlo
methods available in the literature for obtaining minimum energy conformations of lattice proteins.
In fact, this simple growth algorithm seems to be capable of quickly leading to low energy states for
all the three dimensional bench mark HP-sequences investigated.
Proteins are non-branching heteropolymers obtained cially designed Monte Carlo methods [4] have to be used
from twenty commonly occurring amino acids. Their for generating a large ensemble of conformations before
specific biological functions are intimately related to their a ground state search is performed. This ignores the
’native’ (i.e., unique and thermodynamically stable) con- phenomenology of chain growth and its conformational
formational structure. One of the most challenging prob- relaxation.
lems in Biophysics is to understand the kinetic mech- The information for building the proteins is tran-
anism by which the information encoded in an amino scribed from the DNA to the mRNA, and translated into
acid sequence helps the protein grow or fold quickly into the corresponding amino acid sequences by a specific bio-
its native conformation [1]. Since the amino acids can chemical process - namely, as the ribosomes move along
be classified broadly into hydrophobic (H) or polar (P) the mRNA strand from one codon to the next, differ-
types, the problem may be simplified by treating a pro- ent tRNA molecules lock into the appropriate places one
tein as a random copolymer made up only two types of after another, resulting in the transfer of the correspond-
amino acids with non-bonded contact interactions be- ing amino acids to the growing protein molecule. And as
tween them. If we ignore the intra-molecular struc- the protein grows, it coils into its secondary and tertiary
tures of the individual amino acids (i.e., treat them as structures as well.
monomer units) then these interactions can be expressed In this paper, we discuss an implementation of this
in terms of a few effective parameters. Such a coarse- growth scenario for any given HP-sequence using the re-
grained approach to this problem is justified by the obser- cently proposed Interacting Growth Walk (IGW) model
vation that the protein-folding mechanisms and rates are [5]. We demonstrate that this simple growth algorithm
determined more by the topology of protein conforma- is capable of quickly finding the ground state conforma-
tions than by the details of interatomic interactions [2]. tions for all the three dimensional bench mark sequences
By assuming further that the individual bonds in the lin- investigated [6,7].
ear chain of monomers are all of equal length and can only Given a sequence of H and P monomers, say SN =
be along the directions of a regular lattice, we have the {Aj = H or P ; j = 0, 1, ..., N − 1}, we start growing a
lattice protein models [3], called HP-models, which are lattice protein by placing the first monomer A0 at an
amenable to exact enumeration as well as Monte Carlo arbitrarily chosen site, r0 , of a regular d-dimensional lat-
studies. tice of coordination number z. The second monomer,
In this paper, we consider the problem of finding the A1 , can be placed in any one of the z available nearest
native conformational structure that corresponds to a neighbour (NN) sites of r0 , chosen at random. Let the
given HP-sequence and contact interactions. For short walk be non-reversing so that we have a maximum of
chains, the native state (or equivalently, the ground z − 1 sites to choose from for making any further step.
state) can easily be found by an exact enumeration of all Let {rmk | m = 1, 2, ..., zk } be the ’unoccupied’ NN sites
possible conformations. However, for long chains, spe- available for making the kth step of the walk. If the
1
number of available sites, zk , is zero, then the walk can temperature case.
not grow further because it is geometrically ’trapped’.
In this case, we clear the lattice and start a fresh walk B
from r0 again. If zk 6= 0, the walk proceeds by choosing × e -e -u
one of the zk available sites at random with a probability 6
A× e ×C u u ?e
defined as follows. 6 6
Let nm e -u u ?u ×B
k be the number of non-bonded NN contacts
which the kth monomer of type Ak would make with the 6 6
u -u e ×A
walk if it were placed at site rm m
k . Clearly, 0 ≤ nk < z − 1.
Some of these contacts will be with the H type, and some (a) (b)
of them will be the P type monomers. Let these num- e -e -u -e -u
bers be denoted by nm m
Ak H and nAk P respectively. If the 6
energies associated with an HH-contact, an HP-contact u u e e ? u
and a PP-contact are denoted by εHH , εHP and εP P 6 6
u ?u × e ? u
respectively, then the cost of energy involved in plac- 6 B
ing the kth monomer of type Ak at rm k is given by
e ×
m m m A
EA k
= n Ak H ε A k H +n Ak P ε A k P . The probability of choos- (c)
ing the site rmk for the kth monomer of type
P A k may now
be defined as, PAmk ≡ exp(−βEA m
k
)/ m exp(−βE m
A k
), FIG. 1. Schematic illustration of HP-chain growth on a
where the summation is over all the zk available sites, square lattice at T = 0. Hydrophobic and Polar monomers are
β = 1/kB T , kB is the Boltzmann constant and T the represented by open and filled circles respectively. Suppose
temperature. the next monomer to be added is of Hydrophobic type. (a)
At ’infinite’ temperature (β = 0), the walk is insen- Site A will be chosen with probability 1 since it has a NN
sitive to the type of monomers being added and so is contact; (b) Sites A and B will be chosen at random with
identical to the Kinetic Growth Walk (KGW) which in probability 1/2 since they both have a NN contact each; (c)
turn is in the same universality class as self-avoiding walk Site B will be chosen with probability 1 since it has two NN
[5]. However, at zero temperature (β = ∞), the walk contacts whereas site A has only one.
will grow into a compact or an extended conformation
depending on whether the local site-energies, E, are neg- In ’pass’ 1, we use the HP-IGW algorithm to generate
ative or positive respectively. How compact a minimum a large number of chains consisting of N1 monomers in
energy conformation is going to be depends strongly on the order, as well as of the type, specified by the first sub-
the fraction, χ, of H type monomers in the walk - in sequence, SN1 . Out of all the conformations generated,
the limit χ → 1, it is the most compact one, whereas in we store only those with minimum energy, say −E1 . Let
the other limit χ → 0, it is identical to the KGW which N1 denote the ensemble as well as the number of all these
is noncompact. In order to mimic the tendency of the minimum energy conformations. This part of the algo-
H type monomers to minimize contact with the aqueous rithm is not truely kinetic because it does not simulate
medium by forming clusters, it is customary to choose the relaxation kinetics in the conformation space. Nev-
the contact energies, ~ ε ≡ (εHH , εHP , εP P ) = (−1, 0, 0), ertheless, together with the chain growth algorithm, it
in lattice protein models. quickly leads to the ground state conformations.
In this case, at T = 0, an H type monomer will be In ’pass’ 2, we take a chain, say C11 ∈ N1 , and use the
placed at a site with maximum HH-contacts whereas a HP-IGW algorithm to let it grow further by a chain seg-
P type monomer is insensitive to the type of NN con- ment consisting of N2 monomers whose order and type
tacts. We have schematically illustrated this in Fig. 1. are specified by the next subsequence, SN2 . Thus we ob-
It is clear that the total energy associated with a con- tain a chain consisting of N1 + N2 monomers whose type
formation is equal to the total number of HH-contacts and order are specified by the subsequence, SN1 ∪ SN2 .
in the walk. However, such a straightforward adap- We use the same chain segment C1 again and again for
tation of the IGW algorithm, hereafter referred to as generating a specified number of (N1 + N2 )-monomer
the HP-IGW algorithm, does not automatically lead to chain segments. We repeat this for all the chain seg-
ground state conformations corresponding to a given HP- ments, {Cj1 ∈ SN1 | j = 1, 2, ..., N1 }. It is clear that
sequence. Hence, we have further modified the HP-IGW all these chain conformations will have energy less than
algorithm so that it can be used in the multi-pass mode. or equal to −E1 . Let N2 denote the ensemble as well
The given HP-sequence, SN , may be partitioned into as the number of (N1 + N2 )-monomer conformatins with
M subsequences, SN1 , SN2 , ..., SNM , without changing minimum energy, say −E2 .
the order in which the letters appear in the original Similarly, in ’pass’ 3, we generate an ensemble, N3 , of
sequence. Clearly, SN ≡ SN1 ∪ SN2 ∪ ... ∪ SNM with (N1 + N2 + N3 )-monomer conformations with minimum
N = N1 + N2 + ... + NM . We concentrate on the zero energy −E3 , and so on. Thus, at the end of ’pass’ M,
2
we have an ensemble of N -monomer chain conformations specified by the subsequence, SN1 . The chain segments
with minimum energy, −E(< −EM−1 < ... < −E2 < in subsequent passes will be grown always in the order in
−E1 ). It may be noted that the order and type of the which they should grow. We handled these passes sepa-
monomers in the chains generated are as specified by the rately and manually, rather than as integrated modules
original sequence, SN . We have schematically illustrated in a single program. In Table I, we have presented our
this algorithm in Fig.2. We generate a large number of results along with the ones reported in the literature.
chain conformations at every stage, not for the purpose
of statistics, but to make sure that we obtain as many TABLE I. Ground state energies obtained for the 3d bench
distinct conformations as possible. This improves the mark sequences. All the sequences of Yue et al. [6] are of
length N = 48. The sequences 10 and 11 of Toma and Toma
chances of obtaining minimum energy states. Moreover,
[7] are of lengths, N = 46 and N = respectively. The two
the optimal lengths of the subsequences (i.e., the opti-
energy values presented in the middle column are those of
mal values of N1 , N2 , N3 etc.,) for which we may get the Yue et al and Bastolla et al [6]
’global’ minimum are strongly dependent on the partic-
ular sequence under consideration. Ref. Seq. −Emin (reported) −Emin (ours)
h
Hh
P
Hh
PPh h
HPH
P N1 0 Yue et al [6] 1 31, 32 31
Hhh
−E1 PHh
HP Phh 2 32, 34 32
HP
H
P N2 3 31, 34 32
Hhh
−E2 P
HPhh
HP h 4 30, 33 30
HPP N3 5 30, 32 30
HH
−E3 6 30, 32 30
7 31, 32 31
?
−E 8 31, 31 30
9 31, 34 31
10 33, 33 31
0.15
the site r0 with the monomer of the type specified by the 0.00
3
were in turn used for growing the last segment of length, (2001).
N3 = 12, in the third pass. The distributions, not nor- [6] K. Yue et al, Proc. Natl. Acad. Sci. U. S. A, 92, 325 (1995);
malized in any way with respect to each other, schemat- U. Bestolla, H. Frauenkron, E. Gerstner, P. Grassberger
ically illustrate how the number of contacts in a typical and W. Nadler, Proteins: Struct. Func. Genetics, 32, 52
configuration can be increased in a progressive manner. (1998);
[7] L. Toma and S. Toma, Protein Sci. 5, 147 (1996).
Thus, we have a simple but powerful growth algorithm
[8] F. T. Wall and J.J. Erpenbeck, J. Chem. Phys. 30, 634
which can be used for obtaining the ground state confor- (1959); ibid., 637 (1959).
mations of lattice proteins. A finite temperature version
of this algorithm can be implemented in two ways. Since
the temperature used in the chain growth process, say
TG , could be different from the temperature T at which
the fully grown conformations are sampled, we may use
the growth module HP-IGW either at TG = 0 or at
TG 6= 0. In the latter case, we may set TG = T so as
to have just one temperature for the entire process. Hav-
ing generated a chain conformation of energy E, we may
store it with a probability proportional to the Botzmann
factor, exp(−βE). This algorithm is similar to the simple
enrichment scheme of Wall and Erpenbeck [8] for KGW
at TG = T → ∞. It is also likely that this finite temper-
ature version is more efficient for a ground state search
than the zero temperature version discussed in this pa-
per. A detailed study of this algorithm and its variants
will be reported elsewhere.
S. L. N thanks A. K. Rajarajan for critical reading of
the manuscript.
∗
[email protected];