Design DNA

Download as pdf or txt
Download as pdf or txt
You are on page 1of 17

Downloaded from rsif.royalsocietypublishing.

org on March 5, 2012

Design and analysis of DNA strand displacement devices


using probabilistic model checking
Matthew R. Lakin, David Parker, Luca Cardelli, Marta Kwiatkowska and Andrew Phillips
J. R. Soc. Interface published online 4 January 2012
doi: 10.1098/rsif.2011.0800

Supplementary data

"Data Supplement"
http://rsif.royalsocietypublishing.org/content/suppl/2012/01/03/rsif.2011.0800.DC1.htm
l

References

This article cites 27 articles, 6 of which can be accessed free

P<P

Published online 4 January 2012 in advance of the print journal.

http://rsif.royalsocietypublishing.org/content/early/2012/01/03/rsif.2011.0800.full.html#
ref-list-1

This article is free to access

Email alerting service

Receive free email alerts when new articles cite this article - sign up in the box at the top
right-hand corner of the article or click here

Advance online articles have been peer reviewed and accepted for publication but have not yet appeared in
the paper journal (edited, typeset versions may be posted when available prior to final publication). Advance
online articles are citable and establish publication priority; they are indexed by PubMed from initial publication.
Citations to Advance online articles must include the digital object identifier (DOIs) and date of initial
publication.

To subscribe to J. R. Soc. Interface go to: http://rsif.royalsocietypublishing.org/subscriptions

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

J. R. Soc. Interface
doi:10.1098/rsif.2011.0800
Published online

Design and analysis of DNA strand


displacement devices using
probabilistic model checking
Matthew R. Lakin1,3,, David Parker2,, Luca Cardelli1,
Marta Kwiatkowska2 and Andrew Phillips1,*
1

Microsoft Research, 7 JJ Thomson Avenue, Cambridge CB3 0FB, UK


Department of Computer Science, University of Oxford, Wolfson Building, Parks Road,
Oxford OX1 3QD, UK
3
Department of Computer Science, University of New Mexico, Albuquerque, NM 87131, USA
2

Designing correct, robust DNA devices is difcult because of the many possibilities for
unwanted interference between molecules in the system. DNA strand displacement has been
proposed as a design paradigm for DNA devices, and the DNA strand displacement (DSD)
programming language has been developed as a means of formally programming and analysing these devices to check for unwanted interference. We demonstrate, for the rst time,
the use of probabilistic verication techniques to analyse the correctness, reliability and performance of DNA devices during the design phase. We use the probabilistic model checker
PRISM, in combination with the DSD language, to design and debug DNA strand
displacement components and to investigate their kinetics. We show how our techniques can
be used to identify design aws and to evaluate the merits of contrasting design decisions,
even on devices comprising relatively few inputs. We then demonstrate the use of these components to construct a DNA strand displacement device for approximate majority voting.
Finally, we discuss some of the challenges and possible directions for applying these methods
to more complex designs.
Keywords: DNA computing; formal verication; probabilistic model checking;
DNA strand displacement

1. INTRODUCTION

exhibit stochastic behaviour, for example, owing to the


possibility of failures or uncertainty regarding timing.
Conventional (non-probabilistic) model-checking techniques can be used to check correctness properties such
as molecules 1 and 2 are never simultaneously bound to
molecule 3, whereas probabilistic model checking allows
verication of quantitative guarantees such as the probability of a strand displacement device failing to complete
within 20 min is at most 1026. Furthermore, probabilistic
model checking can be used to evaluate many other
quantitative properties, such as performance: what is the
expected time for an input signal to be transduced to an
output signal by a strand displacement circuit?. More
generally, probabilistic model checking has already been
successfully applied to the analysis of systems from a
wide range of application areas, from communication
protocols such as Bluetooth [3] to pin-cracking attacks for
cash machines [4]. In particular, it has also been used in
the domain of systems biology to analyse, for example,
cell signalling pathways [5,6]. In this paper, we use the
probabilistic model-checking tool PRISM [7].
The remainder of the paper is structured as follows.
In 2, we present DNA strand displacement and the
DSD programming language, and introduce probabilistic
model checking and the PRISM model checker. In 3, we
present the results of applying probabilistic model

Molecular computing is a relatively new eld that aims to


construct information-processing devices at the molecular
level. In particular, molecular devices constructed using
DNA show promise for a wide range of important application areas, including biosensing, biomimetic molecular
manufacture and drug delivery. However, designing correct and robust DNA devices is a major challenge. This
results, in part, from the possibility of unwanted interference between molecules in the system. The DNA
strand displacement (DSD) [1,2] has been developed to
facilitate the design, simulation and analysis of DNA
strand displacement devices.
In this paper, we propose the use of formal verication techniques to check the correctness of, and
identify faulty behaviour in, DNA device designs. We
focus on model checking, a fully automated approach
to verication based on the exhaustive exploration
of a nite-state model. We also employ probabilistic
model checking, which generalizes these techniques to
the analysis of probabilistic models of systems that
*Author for correspondence ([email protected]).

These authors contributed equally to the study.


Electronic supplementary material is available at http://dx.doi.org/
10.1098/rsif.2011.0800 or via http://rsif.royalsocietypublishing.org.
Received 20 November 2011
Accepted 6 December 2011

This journal is q 2012 The Royal Society

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

M. R. Lakin et al.
y

2 Design and analysis of DNA

t*

x*

y*

t*

t
t*

x
x

x*

y*

t*

y
y

t*

x*

y*

t*

t*

x*

y
y

t*

x*

y*

y*

t*
y

t*

t*

x*

y*

t*

t*

x*

y*

t*

t*

x*

y*

t*

Figure 1. Toehold-mediated DNA branch migration and strand displacement. (Online version in colour.)

checking to DNA strand displacement gates and systems,


including a DNA strand displacement device for approximate majority voting. Additional details of the methods
are presented in 4, followed by a discussion of future
work in 5.

2. BACKGROUND
2.1. DNA strand displacement
DNA strand displacement [8] is a mechanism for performing computation with DNA molecules. Once initial
species of DNA are mixed together, strand displacement
systems proceed autonomously [9] as increases in entropy
(from releasing strands) and enthalpy (from forming
additional base pairs) drive the system forward [10].
These increases typically result from the conversion of
active gate structures into unreactive waste. Furthermore, because DNA strand displacement relies solely
on hybridization between complementary nucleotide
sequences to perform computational steps, these systems
require no additional enzymes or transcription machinery, which in turn allows experiments to be run using
simple laboratory equipment.
In most strand displacement schemes, populations of
single strands of DNA are interpreted as signals, whereas
double-stranded DNA complexes act as gates, mediating
changes in the signal populations. Within the system, the
computational mechanism is toehold-mediated branch
migration and strand displacement [11]. At the periphery
of the system, signal populations may be connected to
uorophores for human-readable output, or regulated
by custom-designed aptamer molecules that interface to
the biological environment. The latter example highlights a key strength of DNA-based computational
devices: the ability to interface directly with biological
systems [12,13].
Figure 1 presents example branch migration and
strand displacement reactions. Each letter in the gure
represents a distinct domain (a sequence of nucleotides)
and the asterisk operator (*) denotes the Watson
Crick (CG, TA) complement of a given domain.
J. R. Soc. Interface

Short domains (represented in colour) are known as toeholds, while long domains (in grey) are often referred to as
recognition domains. We assume that toeholds are sufciently short (410 nucleotides) that they hybridize
reversibly with their complements, whereas recognition
domains are sufciently long (greater than 20 nucleotides) to hybridize irreversibly [11]. Each single strand
is oriented from the 50 (left) end to the 30 (right) end,
and each double-stranded complex consists of hybridized
single strands with opposite orientations. We assume
that the underlying nucleotide sequences have been
chosen such that distinct domains do not interact at all.
In the rst reaction from gure 1, an incoming strand
binds to a gate because the t toehold domain in
the strand hybridizes with its exposed complement
in the gate, producing the intermediate complex on the
right-hand side. Because the incoming strand is only
held on by a toehold, this reaction can be reversed, causing the single strand to oat away into the solution. In the
second reaction, the x domain in the overhanging strand
matches the next domain along the double-stranded
backbone, which means that the branching point
between the overhanging strand and the backbone can
move back and forth in a random walk called a branch
migration. Eventually, the random walk may completely
detach the short x strand from the gate in a strand displacement. This reaction is considered irreversible
because the invading strand is now attached to the gate
by a long domain as well as a toehold. Note that if the recognition domain on the strand did not match the next
domain along the gate, then branch migration could
not proceed, and the incoming strand would eventually
unbind. We call such binding reactions unproductive.
The third reaction is another branch migration, though
in this case no strand is displaced because even after
the y domain has been displaced, the rightmost strand
is still attached by a toehold. The fourth reaction is a
(reversible) unbinding reaction in which the rightmost
strand spontaneously unbinds because of the low binding
strength of the toehold.
Binding, migration and unbinding reactions such as
those illustrated in gure 1 allow signal populations

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

Design and analysis of DNA


to be dynamically modied over time, and irreversible strand displacement reactions such as the second
reaction from gure 1 provide a thermodynamic bias
towards producing output. Combining these different
kinds of reactions allows us to construct cascades of
gates in which the output strands from one gate serve
as the input strands for another. This technique has
enabled the construction of large, complex logic circuits
based on DNA strand displacement [14].
In this paper, we restrict our attention to a class of
gates closely related to Cardellis two-domain gate
scheme [15]. Two-domain gates are a restricted class of
systems where every strand consists of two domains (a toehold and a long recognition domain) and gates have
no structures hanging off the main double-stranded backbone of the complex. The initial and nal gates shown in
gure 1 have this property, although the intermediate
steps do involve transient overhanging structures during
branch migration and strand displacement steps. These
gate structures can be thought of as one continuous
strand hybridized with a complementary strand that
has breaks at certain points. Such restricted gate
structures could be assembled by synthesizing doublestranded DNA and inserting the breaks using either
restriction enzymes or site-specic photocleavage to
split the backbone of the DNA strand at the appropriate
point. This technique should allow gates to be constructed
with a higher yield than would be obtained with the usual
technique of annealing single strands, which has a higher
probability of producing unwanted secondary structures.
In this paper, we use a variant of two-domain gates that
relaxes the restrictions on single-stranded DNA molecules
but that retains the simplied gate structure with all its
practical benets to the experimentalist.
2.2. The DNA strand displacement
programming language
The DSD programming language [2] provides a textual
syntax for expressing the structure of DNA species such
as those portrayed graphically in gure 1. The semantics of the DSD language denes a formal translation
of a collection of DNA species into a system of chemical
reactions that captures the possible interactions
between the species. The language includes syntactic
and graphical abbreviations that allow us to represent
a particular class of DNA molecules in a concise
manner. The class of molecules in question is those
without secondary structurethat is, only singlestranded DNA sequences may hang off the main
double-stranded backbone of the molecule. This rules
out tree-like or pseudo-knotted structures, which
greatly simplies the denition of the semantics while
still allowing a wide variety of systems to be designed.
The textual syntax of the DSD programming
language and the corresponding graphical representation are presented in table 1. The syntax is dened
in terms of sequences S, L, R, strands A, gates G and systems D. A sequence S comprises one or more domains,
which can be long domains N or short domains N^.
DNA species can be single or double stranded. A
single upper strand kSl denotes a sequence S oriented
from left to right on the page, while a single lower
J. R. Soc. Interface

M. R. Lakin et al. 3

Table 1. Syntax of the DNA strand displacement (DSD)


language, in terms of strands A, gates G and systems D.
Where present, the graphical representation below is
equivalent to the program code above.

strand fSg denotes a sequence S oriented from right


to left on the page. A double strand [S] denotes an
upper strand kSl bound to the complementary lower
strand fS*g. A gate G is composed of double-stranded
segments of the form fLgkLl[S]kRlfRg, which represents an upper strand kL S Rl bound to a lower
strand fL S* Rg along the double-stranded region
[S]. The sequences L, R, L0 and R can potentially
be empty, in which case we simply omit them. Gates
are built up by concatenating gate segments G1 and
G2 along a common lower strand, written G1:G2, or
along a common upper strand, written G1::G2. In
the graphical representation, we omit the colons
altogether and connect the strands.
An individual DNA species can be an upper strand
kSl, a lower strand fSg or a gate G. We let D range
over systems of such species. Multiple systems D1,
D2 can be present in parallel, written as D1jD2.
A domain N can also be restricted to molecules D, written new N D. This represents the assumption that the
domain (or its complement) is not used by any other
molecules outside D. We also allow module denitions
)=D, where n
are the module parof the form X(n
) is an instance of the module D
ameters and X(m
replaced by m
. We assume a xed
with parameters n
set of module denitions, which are declared at the
start of the program. The denitions are assumed to
be non-recursive, such that a module cannot invoke
itself, either directly or indirectly via another module.
All of the models discussed in this paper were created
using the Visual DSD tool.1 This is a web-based
implementation of the DSD language that allows networks of strand displacement reactions to be designed,
1

http://lepton.research.microsoft.com/webdna

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

4 Design and analysis of DNA

M. R. Lakin et al.

simulated and analysed. For the purposes of this work,


we have developed additional functionality for visual
DSD that allows the reaction network to be exported
as a model that can be loaded into the PRISM probabilistic model checker for verication. Additional details
of how reaction networks are computed in DSD are
provided in 4.
2.3. Probabilistic model checking
Model checking is an automated formal verication
technique, based on the exhaustive construction and
analysis of a nite-state model of the system being veried. The model is usually a labelled state-transition
system, in which each state represents a possible conguration of the system and each transition between
states represents a possible evolution from one conguration to another. The desired correctness properties of
the system are typically expressed in temporal logics,
such as computation tree logic (CTL) or linear-time
temporal logic. We omit here a precise description of
these logics (see [16,17] for detailed coverage); instead,
we give some typical CTL formulae below, along with
their corresponding informal meanings:
A [ G !(access1 & access2): processes 1
and 2 never simultaneously access a shared resource;
A [ F end ]: the algorithm always eventually
terminates and
E [ !fail U end ]: it is possible for the algorithm to terminate without any failures occurring.
Once the desired correctness properties of the system have
been formally expressed in this way, they can then be veried using a model checker. This performs an exhaustive
analysis of the system model, for each property either
concluding that it is satised or, if not, providing a
counterexample illustrating why it is violated.
Probabilistic model checking is a generalization of
model checking for the verication of systems that exhibit stochastic behaviour. In this case, the models that
are constructed and analysed are augmented with quantitative information regarding the likelihood that
transitions occur and the times at which they do so.
In practice, these models are typically Markov chains
or Markov decision processes. To model systems of reactions at a molecular level, the appropriate model is
continuous-time Markov chains (CTMCs), in which
transitions between states are assigned ( positive, realvalued) rates. These values are interpreted as the
rates of negative exponential distributions.
Properties of CTMCs are, like in non-probabilistic
model checking, expressed in temporal logic, but are
now quantitative in nature. For this, we use probabilistic temporal logics such as continuous stochastic logic
(CSL) [18,19] and its extensions for reward-based properties [20]. For example, rather than verifying that the
protein always eventually degrades, using CSL allows
us to ask what is the probability that the protein eventually degrades? or what is the probability that the
protein degrades within t hours? Reward-based properties include what is the expected time that proteins
are bound within the rst t time units? and what is
J. R. Soc. Interface

Figure 2. Initial transducer gate code, with additional denition


for signal strands.

the expected number of phosphorylations before relocation occurs? For further details on probabilistic model
checking of CTMCs, see [19,20]. For a description of the
application of these techniques to the study of biological
systems, see Kwiatkowska et al. [21]. All of the models
discussed in this paper were analysed using PRISM [7], a
probabilistic model-checking tool developed at the
Universities of Birmingham and Oxford. Additional
details of probabilistic model checking using PRISM are
provided in 4.

3. RESULTS
In this section, we present a series of case studies
that demonstrate the application of probabilistic modelchecking techniques to the design of DNA strand
displacement systems. As mentioned previously, to do
so we have extended the Visual DSD tool [2] with the
capability to generate, from DSD designs, corresponding
model descriptions that can be directly analysed by the
PRISM probabilistic model checker [7]. We will study our
modied two-domain gate designs from 2.1, and present
analyses of various correctness, reliability and performance properties of the gates using PRISM. Finally, we will
construct an approximate majority voting system using
these components and show how additional approximation mechanisms can be adopted in order to verify
this system.
3.1. Transducer gates: correctness
We begin by considering one of the simplest reaction
gates: a transducer that takes an X signal as input
and produces a Y signal as output. The gate can be
thought of as implementing a unary chemical reaction
X ! Y. We will demonstrate the use of (non-probabilistic) model checking to debug strand displacement
systems, by detecting a bug in a awed design for a
transducer gate.
Our initial transducer design is specied by the
DSD code of gure 2. This also includes a denition
of single-stranded signals, where the S(N,x) module
denotes a population of N single-stranded signals carrying the X domain. We assume that the t^ toehold is
dened globally and shared by all gates and strands.
The T(N,x,y) module represents N parallel copies of
the transducer gate that implements the X ! Y reaction. The body of the module denition consists of
two populations of N complexes, and two populations
of N strands. The species present in the initial state
(when N 1) are shown in gure 3a. The rst gate

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

Design and analysis of DNA


(a)

(b)
t

c.1

(1)

y
x
x*

(1)

t* y*
x

t* x*

(1)

c.1

(2)

c.1 t
t

(1)
c.1 t

t* c.1* a*

c.1* t* a*
c.1

t*

(1)

t* a*

(1)

M. R. Lakin et al. 5

(1)
(1)

c.1

t* x* t* c.1* a* t* a*
x

x*

t* y*

c.1 t

(1)

c.1* t* a* t*

(1)

Figure 3. (a) Initial species and (b) expected nal species for the transducer gate. (Online version in colour.)

accepts an input signal X and the second gate produces


an output signal Y. The single strands are fuel species:
the rst ejects an intermediate strand from the input
gate in an irreversible reaction that prevents input
signal X from being rejected (thereby undoing the
execution of the gate so far), whereas the second
ejects the output signal Y from the output gate. This
design is closely related to the two-domain design
from [22], with the addition of a private c domain
that introduces an additional irreversible step into
the execution of the gate. The effect of this modication
is limited for the simple transducer gate but will become
more apparent when we move on to more complicated
gate designs. Note, however, that the denition of
T(N,x,y) does not include a similar new declaration
for the a domain. This has implications for crosstalk
between gate populations, as we shall see below.
The expected nal species for the transducer gate
design (when N 1) are shown in gure 3b. The intended
effect is to convert an incoming kt^ xl signal into kt^ yl,
leaving only unreactive waste. We say that a strand or
gate is reactive if it can react with some other species
present in the system to cause a strand to be displaced,
and unreactive otherwise. In this example, the unreactive species are those in which all toeholds occur in
double-stranded segments and are thus sequestered.
Here, we will focus on verifying the correctness of two
transducer gates in series. The rst gate should turn a
signal X0 into X1, and then the second should turn
signal X1 into X2. Using the DSD code from gure 2,
the input signal and pair of transducers are given by:
S(1,x0) | T(1,x0,x1)| T(1,x1,x2).
To formalize a correctness property to be checked by
PRISM, we rst need to identify the states of the model in
which the execution of the gates has completed successfully. This is performed with the following PRISM code,
which is, in a generic form, designed to be applicable to
various different designs:

Here, strands_reactive and gates_reactive


are pre-dened formulae (automatically generated
J. R. Soc. Interface

by Visual DSD) that, when evaluated in a particular state of a model, return the number of reactive
strands and reactive gates in that state, respectively.
The variable output gives the number of output
strands (in this example, kt^ x2l) and N is the
number of parallel copies of the system. So, we say
that the execution was successful when all copies of
the gate have produced the required output and there
are no reactive gates or strands (other than output
strands) present.
Notice that, by denition, when the specication
all_done given above is true; no further reactions
can occur. Hence, such states of the model are deadlock
states (those with no outgoing transitions to other
states). We specify the correctness of the system
design by stating that: (i) any possible deadlock state
that can be reached must satisfy all_done and
(ii) there is at least one path through the system that
reaches a state satisfying all_done. These two
properties can be represented by formulae in the (nonprobabilistic) temporal logic CTL, which can be veried
by PRISM:

When we use PRISM to check these two queries,


we nd that the second is true but the rst is false.
In fact, we nd that there are two deadlock states in
the model, one where all_done is false and one
where it is true. We can visualize both states using
the Visual DSD tool, as shown in gure 4. State 2, on
the right-hand side, represents the case where the
system has executed correctly and indeed this is
the result that we would anticipate: the state contains
the output strand kt^ x2l along with the inert
garbage left over from complete execution of the two
transducer gates. State 1, however, is incorrect: even
though the output strand kt^ x2l is produced,
we see that some constituent complexes of the transducers are left in a reactive state, with exposed
toehold domains.
When checking that the rst query above is false,
PRISM also produces a counterexample in the form of
a path through the model that leads to a deadlock

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

6 Design and analysis of DNA

M. R. Lakin et al.

(a)

(b)

(1)

c.1

(1)

c.2

(1)

c.2

x2

(1)

x1

c.1 t

(1)

(1)

c.1

t* a*

c.1 t

x1

(1)

x0* t* x1* c.1* t* a*

t*

x1

x2

c.2 t

c.2

(2)

c.2

t*

a*

t* x1* t* c.2* a*

x2

(1)

(1)

(1)

(1)

x0

(1)

x1

(1)
x0

c.1

t* x0* t* c.1* a*

t* a*

x1

c.2

t* x1* t* c.2* a*
x0

x1* t* x2* c.2* t* a* t*


x1

(2)

t* x0* t* c.1* a*
x0

c.1

x0

(2)

(1)

x0

x1

t* a*

c.1 t

t*

x1

x2

c.2 t

x1* t* x2* c.2* t* a*

(1)

x0* t* x1* c.1* t* a*


t

(1)

t*

(1)

(1)

Figure 4. Deadlock states for the two faulty transducers in series. (a) State 1 (error); (b) state 2 (success). (Online version in
colour.)

state where all_done is not true (state 1 from


gure 4). Analysing this path reveals exactly how the

x0 t c.1 a t a
t* x0* t* c.1* a* t* a*

x0

c.1 a t a
t x0
t* x0* t* c.1* a* t* a*

c.1

t x2 c.2 t a
x1* t* x2* c.2*t* a* t*

x1

x2 c.2 t

a t
t x2 c.2
x1* t* x2* c.2*t* a* t*

There are some subsequent reactions that tidy up as


many as possible of the species with exposed toeholds.
The output strand X2 is produced, as expected, but
J. R. Soc. Interface

x0

c.1 a t a
t x0
t* x0* t* c.1* a* t* a*

The problem arises because the ka t^l strand released


from the input complex of the X0 ! X1 transducer can

x1

system can fail. The rst few reactions proceed as one


would expect.

a
t x0 t c.1 a
t* x0* t* c.1* a* t* a*

c.1

now interact with the output complex of the X1 ! X2


transducer, causing the following reactions.

x1

a t
t x2 c.2
x1* t* x2* c.2*t* a* t*
x1
x2 c.2 t a t
x1* t* x2* c.2*t* a* t*

c.2

t x2

there are still some reactive species left at the end. The
ka t^l strand from the X0 ! X1 transducer prematurely
activates the X1 ! X2 transducer without producing the

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

Figure 5. Corrected transducer gate code, with an additional new a declaration that prevents crosstalk between
different gates.

intermediate kt^ x1l signal, thereby leaving parts of the


transducers unused. This happens because of crosstalk:
the two transducers share a common recognition
domain a that allows them to interfere with each
other. In contrast, the new c declaration in the denition of the T(N,x,y) module from gure 2 enforces
that the two transducers use different nucleotide
sequences for their c domains. The existence of this
faulty behaviour was pointed out in Cardelli [15] and
illustrated by manually tracing a path through the
system. Such bugs have, however, proved to be difcult
to identify manually using simulation tools. Here, we
demonstrate that model checking can identify such
aws in an automatic fashion.
We can x the crosstalk bug in the transducer
module from gure 2 by adding an additional new a
declaration within the module denitions, as shown in
the denition of the alternative T2(N,x,y) module
in gure 5. This sufces to prevent crosstalk between
the populations of gates that implement the different
chemical reactions, because each population of gates
uses different domains for their private a and c
domains. We verify this using PRISM by re-running the
above-mentioned queries against the same model but
with occurrences of the T module replaced by T2 modules. In each case, these designs are correct: both queries
are true.
3.2. Transducer gates: performance
and reliability
Next, we examine some quantitative properties of the
transducer gate designs from 3.1. Returning rst of
all to the pair of faulty transducers, we use PRISM to analyse the kinetics of the system. Recall that there are two
possible outcomes once the system eventually terminates, one where the execution has completed
successfully and one where it has not. Using the PRISM
temporal logic queries

we can compute the probability that the transducer


pair has, after exactly T seconds: (i) terminated, (ii)
terminated in error; and (iii) terminated successfully.
Figure 6 shows how these probabilities vary for
different values of T. We see that the erroneous outcome is more likely to occur in the early stages than
the successful outcome. This is to be expected because
the error in the system arises when various intermediate
reaction steps are skipped. By removing the time
J. R. Soc. Interface

probability

Design and analysis of DNA

M. R. Lakin et al. 7

1.0
0.9
0.8
0.7
0.6
0.5
0.4
0.3
0.2
0.1
0

terminate
error
success

0.5

1.0

1.5 2.0
T (104 s)

2.5

3.0

Figure 6. Plot showing the probability for each possible outcome of the faulty transducer pair, after T seconds. (Online
version in colour.)

parameter T from the queries, we can use PRISM to compute the eventual probability of each outcome:

As the plot in gure 6 suggests, there is in fact an equal


probability of 0.5 for each possible outcome. Intuitively,
this can be explained as follows. There is a point in the
execution of the system where, as described in 3.1, it is
possible either for the reaction to proceed as intended or
for interference between gates to occur. In fact, each of
these two conicting reactions occurs with the same
rate, meaning that they are equally likely. Furthermore,
each one is irreversible; so the nal outcome is effectively decided at this point.
Interestingly, although a single copy of the faulty
transducer pair is clearly unreliable (because it fails
with probability 0.5), we can improve the overall
reliability of the design by adding multiple (N ) parallel
copies of the gates. Section 4 of Cardelli [15] suggests
that, if large populations of these gates execute in parallel, the additional strands available in the system
should be able to unblock the partially completed
gates in the erroneous deadlock state. This hypothesis
is supported by evidence from individual stochastic
simulations of the system. Here, we use PRISM to perform
a more exhaustive analysis: we compute, for different
values of N, the expected percentage of gates in the
nal state of the system that are still reactive (recall
from 3.1 that a reactive gate in the nal state is an
indicator that the transducer did not execute correctly).
The DSD code for N copies of the transducer pair is:
S(N,x0)| T(N,x0,x1)| T(N,x1,x2). There are
several different ways to compute the desired property
using PRISM. One simple way is to use the query:

which gives the probability that there are i reactive


gates in the nal state of the system and, from this,
compute the expected nal percentage.
Figure 7 plots this value for a range of values of N.
We see that the percentage of reactive gates decreases
with increasing N, indicating, as conjectured, that running more copies of the faulty gates in parallel (i.e.
increasing N ) means that more of the gates in the
system will be used correctly.

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

M. R. Lakin et al.

40
35
30
25
20
15
10
5
0

expected time (104 s)

expected percentage

8 Design and analysis of DNA

2
3
4
5
6
N (number of copies of transducer pair)

Figure 7. Plot showing expected percentage of leftover reactive gates in the nal state of the system against the
number of parallel buggy transducersthat is, the parameter
N in the system S(N,x0)| T(N,x0,x1)| T(N,x1,x2).
We observe that the expected number of unreactive (i.e. correctly executed) gates increases as we add more parallel
copies of the buggy transducer system. (Online version in
colour.)

Finally, in this section, we consider performance properties of DNA strand displacement systems, as opposed
to the reliability properties discussed earlier. Seelig &
Soloveichik [23] showed that the time for strand displacement circuits to execute increases linearly with the
depth of the circuit. We can verify such properties
computationally with our model of DNA strand displacement and PRISM. Using the corrected transducer
design from gure 5, we constructed DSD models of
the form S(1,x0)| T2(1,x0,x1)| . . .| T2(1,x
f k 2 1g,xk), for various values of k, which correspond
to transducer chains of increasing length. Note that we
only consider a single copy of every transducer in the
chain. We used the following PRISM temporal logic query
to compute the expected time to reach a state in which
all of the gates have nished executing:

In this query, "time" denotes a simple PRISM reward


structure (see 4.2) that assigns a reward of 1 to all
states of the model. This indicates the rate at which
reward will be accumulated, i.e. T units of reward for
each T seconds spent in a state. The property abovementioned gives the expected reward accumulated
until "all_done" rst becomes true, thus giving the
expected execution time.
The results are shown in gure 8. We observe that
there is indeed a linear relationship between the time
to complete the circuit and the number of transducers
in the chain, which agrees with Seelig & Soloveichik
[23]. Thus, we can use probabilistic model checking
to analytically investigate the kinetics of strand
displacement circuits.
3.3. Catalyst gates
We now focus on a more complicated reaction gate that
models a chemical reaction X ! Z catalysed by a third
species Y, i.e. a reaction of the form X Y ! Y Z.
Figure 9 presents DSD code for a catalyst module
C(N,x,y,z) that represents N copies of the reaction
gate implementing the chemical reaction X Y !
J. R. Soc. Interface

4.5
4.0
3.5
3.0
2.5
2.0
1.5
1.0
0.5
0

2
3
4
5
6
K (size of transducer chain)

Figure 8. Plot showing the expected time to terminate for


chains of corrected transducer gates; that is, we vary the parameter k in the system S(1,x0)| T2(1,x0,x1)| . . .|
T2(1,xf k 2 1g ,xk). (Online version in colour.)

Figure 9. Catalyst gate code, presenting two different gate


implementations: one that carries out garbage collection
reactions and one that does not.

Y Z. This is an extension of the transducer gate,


which takes advantage of the fact that the extra reactant and product are both of the same species in
order to optimize the gate design. Figure 10 shows the
initial and expected nal states of the system for one
copy of the catalyst gate, i.e. the module instantiation
C(1,x,y,z). Note that the catalyst gate
C(N,x,y,z) effectively implements a catalytic reaction X Y ! Y Z by consuming a strand Y as
input and producing a different strand Y as output.
Even though a different strand is produced, the DNA
implementation effectively emulates the function of a
catalyst by producing a strand identical to the one
that is consumedhence the population of the catalyst
remains constant. This is in line with the general idea of
emulating chemical reactions using DNA [24]: it is not
the exact species and reactions that are implemented,
but equivalent ones.
In this section, we will use probabilistic model checking to investigate the relative performance of two
different catalyst gate designs. In particular, we study
the effect of omitting garbage collection from the
design, i.e. the process of tidying up intermediate species
into inert structures. Omitting garbage collection makes
the design simpler and cheaper to implement but, as we
will see, has an effect on its performance.
Figure 9 also presents DSD code for a variant catalyst
module C_NoGC(N,x,y,z) that also implements N
copies of the chemical reaction X Y ! Y Z, but
with the key difference that intermediate strands displaced from the gate during execution are not garbage

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

Design and analysis of DNA


(a)

(b)

(1)

(2)

(1)

(1)

(1)

(1)

z
x

c
t

t
z

x* t* z*

(1)

(1)
c

(1)

c* t* y* t* a* t*

a t a
x t y t c
t* x* t* y* t* c* a* t* a*

M. R. Lakin et al. 9

(1)
x

t* x* t* y* t* c* a* t* a*
(1)

(1)

x* t* z* c* t* y* t* a* t*

(1)

Figure 10. (a) Initial species and (b) expected nal species for the catalyst gate C(1,x,y,z). Garbage collection results in only
inert structures being present among the nal species. (Online version in colour.)

collected by the gate structure. This means that some


intermediate single-stranded fuels remain in solution
after the gate has nished executing (see gure 11). We
will quantify the effect that this has on the kinetics of subsequent reactions by using PRISM to compute the expected
completion times of catalyst gates with and without garbage collection. Intuitively, we would expect that the
intermediate strands that are not garbage collected will
accumulate over time, gradually slowing the system
down by providing a larger backward force in the kinetics.
For both variants of the catalyst gate, we adapt the
earlier fragment of PRISM code used to identify the states
of the model in which the gates have executed
successfully:

This code is as before except that, because there


are now two output species we use output1 and
output2 to refer to the two output signals, which are
Y and Z in the case of C(1,x,y,z). We no longer
require that the only reactive strands must be the
output strands because this is not true for the catalyst
gate without garbage collection (the leftover fuel
strands are reactive). Furthermore, our denition of
gates_reactive must be carefully designed to
ensure a fair comparison between gates with and without
garbage collection. In the case without garbage collection
(C_NoGC(N,X,y,z)), the nal gate structures actually
contain exposed toeholds, because there are no garbage
collection reactions to seal off the nal toehold in the
gate. Despite the fact that the nal gate has an exposed
toehold, the irreversible steps introduced by adding an
extra domain, which was not present in the design from
Cardelli [15], mean that execution of the gate is still irreversible. Because the a and c domains in the catalyst
gate are both private to this gate, we can guarantee
that no other strand in the system will be able to react
with these nished gates in a productive manner. Thus,
it is reasonable to adjust our denition of gates_reactive in this case so that these structures are not counted
as reactive.
In the case with garbage collection (C(N,x,y,z)),
the nal gate structures are indeed completely sealed
J. R. Soc. Interface

off. In order to make a fair comparison of the kinetics,


however, we must also designate the penultimate form
of each gate structure as unreactive, that is, the structure before the nal garbage collection reaction. This
is essentially saying that, for the purposes of computing the time to termination, we do not care whether
the garbage collection reactions have actually taken
place when we decide if the system has terminated. Without this, it would not be possible to make a
fair comparison.
We rst check that both designs satisfy the correctness property given earlier for transducers. Having
established this, we then look at the performance of
the designs, i.e. how quickly they execute. To quantify
the effect of garbage collection on the kinetics of
the system, we compared the behaviour of the systems
S(N,x)| S(N,y)| C(N,x,y,z) and S(N,x)|
S(N,y)| C_NoGC(N,x,y,z) for different values of
N. We vary this parameter because one would expect
that the negative effects of garbage collection will
only begin to accumulate after a number of identical
gates have been executed. Re-using the PRISM temporal
logic query from 3.2, we compute the expected time
until termination.
The results of this analysis are presented in gure 12.
We observe that, as N increases, the expected completion
time for the gates without garbage collection increases
faster than for the gates with garbage collection. This
conrms our intuition that accumulating waste strands
from earlier executions of the gate exerts an additional
backward force on subsequent executions of the gate,
gradually slowing the system down.
3.4. Approximate majority
We now use the catalyst gates from 3.3 to implement a
larger systemthe approximate majority population
protocol of Angluin et al. [25]using DNA strand displacement. The following chemical reactions implement the
approximate majority population protocol:
a X Y ! Y B

BX !X X

b Y X ! X B

d B Y ! Y Y

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

10

Design and analysis of DNA M. R. Lakin et al.


(a)

(b)

(1)

(1)

z
t

c
z

t* z*

(1)

t
c

(2)

(1)

(1)

(1)

(1)
t

(1)

t x
t y
t c
t* x* t* y* t* c*

(1)

c* t* y* t* a* t*

x
t y
t c
t* x* t* y* t* c*

a t
(1)
a* t*

t* z*

a
(1)
a* t*
a

c* t* y* t* a* t*

(1)

Figure 11. (a) Initial species and (b) expected nal species for the alternative catalyst gate C_NoGC(1,x,y,z). Note that this
gate does not perform the additional garbage collection reactions that produced the completely inert structures seen in gure 10.
(Online version in colour.)

9.0
expected time (104 s)

8.0
7.0
6.0
5.0
4.0
3.0
2.0
1.0
0

2
4
6
N (number of copies of catalyst gate)

Figure 12. Plot of expected time to completion for N parallel


copies of catalyst gates with (solid line with lled circles) and
without (solid line with lled triangle) garbage collection.
(Online version in colour.)

Figure 13. DNA strand displacement (DSD) code for a catalyst gate, which extends the C_NoGC gate from gure 9 by
using the constant keyword from the DSD language to
abstract away from population changes due to accumulation
of waste and depletion of fuel.

When an X and a Y meet, reactions (a) and (b) convert one of them into an auxiliary species B with equal
probability. Then, when a B meets an X or Y, reactions
(c) and (d ) convert the B to match the species that it
encountered. In this second step, the probability of a B
encountering an X as opposed to a Y depends on the
initial populations of X and Y in the system, and this
fact allows the system to amplify any excess population
of one species over the other to converge on a consensus
in which all of the species are converted either to X or to
Y. Furthermore, it was proved by Angluin et al. [25] that
the system converges with high probability to the population that was initially in the majority, if the original
margin is sufciently large.
Note that the above-mentioned reactions all involve
catalysts, like those from 3.3. Thus, we can use the catalyst gates discussed therein to implement this
system of chemical reactions in the DSD language, as
shown in the code in gure 13. In order to reduce the
number of reactions and to make model checking
more tractable, we will use a catalyst gate without

garbage collection. Even with this simplication, however, the fact that there are cycles in the chemical
reactions means that the system can potentially use all
of the available fuel, which causes the state space to
grow enormously. To counteract this effect, we modify
the gate designs from 3.3 further, using the constant
keyword of the DSD language. This keyword declares
that the population of a particular species should be
held constant across all reactions in the system, even
those reactions where it is produced or consumed. This
approximation can be used when the species is in
excess, allowing us to abstract away from depletion of
fuels and accumulation of waste, in cases when these
species are in very high concentrations. This helps us to
greatly restrict the size of the state space in the PRISM
model, by essentially collapsing any states that differ
only by their populations of fuels and/or waste products.
Because the population protocol is not guaranteed to
form a consensus around the species that was initially in
the majority, we use PRISM to compute the probabilities
of reaching each consensus state (X or Y ) in the DNA

J. R. Soc. Interface

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

probability of choosing X

Design and analysis of DNA

M. R. Lakin et al.

11

1.0
0.9

1.0

0.8
0.7

0.5

4
5
6
7
8
9
10

0.6
0
5

0.5
4
init X

1 1

0.4

0.3
0.2

init X

Figure 14. Surface plot which shows the probability of reaching a consensus of X, for various initial populations of
X and Y. (Online version in colour.)

0.1
0
1.0

0.5

0.5

1.0

(init Xinit Y)/N


Table 2. Probability of reaching a consensus of X, for various
initial populations of X and Y.

X1
X2
X3
X4
X5

Y1

Y2

Y3

Y4

Y5

0.5000
0.7468
0.8709
0.9341
0.9665

0.2531
0.5000
0.6843
0.8082
0.8869

0.1290
0.3156
0.5000
0.6537
0.7699

0.0658
0.1917
0.3462
0.5000
0.6349

0.0334
0.1131
0.2299
0.3651
0.5000

implementation, given different initial populations of


the species. We constructed PRISM models for various
input populations and computed the probabilities of
ending up with the two possible consensus values.
Care must be taken because, even when a consensus
has been achieved, the resulting signal strands can
still speculatively bind to the remaining fuel gates,
even though the gate will never execute fully (because
we have reached a consensus, there are no different
input species to bind and complete the reaction process)
and the strand must therefore eventually unbind again.
We use PRISM variables output_x and output_y,
which return the number of individuals in the two consensus states, taking these transient structures into
account in the denitions. The required queries in
PRISM are then:

As a sanity check, we used the following query:

to check that the probability of eventually ending up in


either consensus state is 1 in all cases. Thus, it sufces
to study just one of the two outcomes. Figure 14
shows a plot of the probability of nishing in consensus
state X for initial populations of X and Y ranging
between 1 and 5. Table 2 shows the same values. If X
or Y equals zero, then no reactions can take place,
and for total initial populations exceeding 10, the
model becomes too large to handle. We observe that,
if the initial populations of X and Y are the same,
then the system is equally likely to form a consensus
around either X or Y. This is illustrated by the contour
line at the 0.5 level in gure 14. However, as the initial
J. R. Soc. Interface

Figure 15. Probability of reaching a consensus of X, plotted


against (X0 2 Y0 )/N, which is the difference between the
initial populations of X and Y, relative to the total initial
population N X0 Y0 . Results are shown for N 4 . . . 10.

excess of one species increases, the probability of ending


up in that consensus state increases rapidly, so that
when all but one of the initial species are X (say),
then the probability of forming a consensus around X
is almost one, as we would expect.
Theoretical results from Angluin et al. [25] show
that the correct consensus is achieved with high probability if the margin
p between the initial numbers of X
and Y is above N logN , for large total initial population N. So, in gure 15, we also plot the probability
of reaching a consensus of X against (X0 2 Y0 )/N,
which is the initial difference between X and Y relative
to the total initial population N X0 Y0 , for N varying between 4 and 10. Here, we see more clearly how the
probability grows with the increase in the size of
the margin jX0 2 Y0j/N. Figure 15 also illustrates
how, for larger population sizes, we see an increasingly
clear threshold above which consensus is achieved
with high probability.
Finally, it is worth remarking that an analysis of this
system in terms of ordinary differential equations
(ODEs) has a number of limitations. In particular,
the simulation of the ODE model gives us the average
behaviour of the system, but not the probability of computing the majority. Specically, because the system is
inherently stochastic, there is always a chance that the
minority wins, whereas in the ODE model the majority
always wins. Furthermore, in cases where both species
have equal initial populations the ODE model never
converges to a majority, while the stochastic model
does converge. Thus, to correctly analyse the behaviour
of the system, a detailed analysis of the chemical master
equation is required, which is analytically unsolvable in
this case. Another alternative is to simply run large
numbers of stochastic simulations, though this is also
intractable, owing to the very low probability of error
as the number of molecules increases.
Thus, probabilistic model checking allows us to analyse properties that would be considerably more difcult
to obtain using other techniques. Essentially, probabilistic model checking can be viewed as an automated

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

Design and analysis of DNA M. R. Lakin et al.

'

'

'

'

'

S1

L1

S1*

S1

L1

L1

R1

'

R1

12

N*

'

'

'

R
'

L1

*
N

S1*

S*

S*

N*

*
N

L
S

R
'

'

'

S2

S1

S2

S1*

S*

S2*

S1*

S*

S2*

R2

'
S

R2

L2

S1

S1*

S*

S1*

S*

S1

'
L2

'

'

S1

R2

L1

L1

R2

'

Figure 16. Elementary reduction rules of the DSD programming language. We let S denote the migration rate of a domain
sequence S and fst(S) denote its rst domain. We also let N+ and N 2 denote the binding and unbinding rates, respectively,
of a toehold N^. We assume that fst(R2) = fst(S2) for rule (RM). This ensures that branch migration is maximal along a
given sequence and that rules (RM) and (RD) are mutually exclusive. (Online version in colour.)

way to solve the chemical master equation for small but


non-trivial model sizes. In particular, it allows us to
determine the distribution of the system, i.e. the individual possible outcomes of the system and their
corresponding probabilities.

4. METHODS
4.1. Model simulation in DNA strand
displacement
The syntax of DSD described in 2 interprets systems as
well-mixed solutions: hence we impose a standard set
of structural equivalence rules (see Lakin & Phillips [2]
for more details). In addition, we assume that no
long domain and its complement are simultaneously
unbound anywhere in the system. This well-formedness
restriction ensures that two species can only interact
with each other via complementary toeholds.
The rules in gure 16 present reduction rules that
formalize DNA interactions in the DSD language. The
arrows are labelled with rates that are used to
J. R. Soc. Interface

parametrize an exponential rate distribution. Rules


(RB) and (RU) dene reversible toehold-mediated
binding and unbinding reactions between a single
strand and a double-stranded gate complex. Rule
(RC) allows complementary toeholds to hybridize if
they are present in opposite overhanging strands of a
gate. This situation could arise when an incoming
strand contains multiple toeholds that match up to
exposed complementary toeholds in the gate. We do
not provide versions of rules (RB), (RU) and (RC),
where the complementary domain is not a toehold
because our well-formedness assumption ensures that
the only complementary domains exposed simultaneously are toeholds. Rules (RM) and (RD) present
primitive branch migration and strand displacement
reactions, respectively. Rule (RD) can be thought of
as the special case of (RM) where there are sufciently
many matching domains for the branch migration to
make it right to the end of the strand. The additional
assumption that fst(R2) = fst(S2) in rule (RM)
ensures that we only derive maximal branch migration
reactions and also ensures that rules (RM) and (RD)

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

Design and analysis of DNA


are mutually exclusive. As an example, the following
steps illustrate the application of the sequence of rules
(RB), (RD) and (RC) on an initial system kt^ x u^lj
ft^*g[x]fu^*g
x

13

produce more compact models for verication. In this


paper, we phrase all models in terms of these merged
rules, which allow us to translate a collection of DNA
molecules into a system of chemical reactions for analysis.
Under this condensed semantics, the four reactions from
gure 1 are combined into the following single reaction.

t x
t* x* u*

t x
t* x*

u*

x
t* x* u*

M. R. Lakin et al.

t x u
t* x* u*

The rules of gure 16 dene the fundamental DNA


reactions that we consider in this paper. We do not consider reactions that would form chain polymers or
complex secondary structures, or leak reactions. To complete the denition of reduction, however, we must also
provide additional contextual rules that allow these
primitive steps to occur in richer contexts. For example,
rule (RB) allows a strand to bind to the bottom-right
overhanging strand of a segment, whereas in reality it
could bind to any of the overhangs (provided that there
is a complementary toehold available). We also require
that these reactions can take place partway along more
complex molecules. Thus, we require each reduction
rule to be closed under rotation and mirroring of individual DNA species about the axis of the double-stranded
backbone, and under concatenation of additional
segments onto either end of the gate complex involved
in a particular reaction. We also lift the rules to transform
systems involving parallel compositions and name restrictions in the standard way. We refer the reader to Lakin &
Phillips [2] for further details.
The reduction rules in gure 16 provide a detailed
model of DNA strand displacement reactions between
single strands and gate complexes. In fact, for our purposes a simpler model would sufce. Hence, we use a
merged reduction semantics for the DSD language, in
which we model toehold binding reactions as having a
nite rate and all other reactions as instantaneous. In
particular, we consider gates to be equivalent up to
branch migration; so if two gates differ only by applications of rule (RM), we treat them as if they were
the same gate. These simplications are based on the
assumption that toehold binding steps are sufciently
slow to be rate limiting, which is valid in the limit of
low concentration.
N
We write D ) D to denote a merged reduction from
N

RB;N

D to D. Formally, D ) D means that D ! D and


RX1 ;r1

RXk ;rk

D !    ! D both hold, for some D, and where


none of the (RXi) rules are repeat occurrences of (RB).
Furthermore, because we are assuming that branch
migration is included in the structural equivalence
relation, there should be no occurrences of (RM) either.
In order to improve efciency and reduce the size of the
resulting model, we ignore unproductive reactions where
a strand binds onto a gate but cannot perform any subsequent reaction other than an unbinding. This
N

corresponds to merged reductions of the form D ) D.


The merged reduction relation dened earlier is the
Innite semantics from Lakin & Phillips [2]. This is
the most abstract model of DNA strand displacement
interactions dened in that paper, and allows us to
J. R. Soc. Interface

t*

x*

y*

t*

t*

x*

y*

t*

Where possible (i.e. where models do not become too


large for verication), we have also re-run the experiments in this paper using models obtained with
DSDs Default semantics, observing an overall slow
down in reaction time, but otherwise identical patterns
in behaviour. For all simulations and analysis, the
kinetic rates of toehold binding, toehold unbinding
and branch migration were based on the experimental measurements of Zhang & Winfree [26]. These
rates were in turn used to derive the corresponding
simulation and analysis times, in seconds.

4.2. Probabilistic model checking in

PRISM

[7] is a probabilistic model checking tool developed at the Universities of Birmingham and Oxford.
It provides support for several types of probabilistic
models, including CTMCs, which we use here. Models
are specied in a simple, state-based language based
on guarded commands. Support for several other
high-level model description languages has also been
made available through language-level translations to
the PRISM modelling language. For example, PRISM has
the ability to import SBML [27] specications, which
have an underlying CTMC semantics. Translations
from stochastic process algebra such as PEPA and the
stochastic calculus [28] have also been developed.
Formally, letting R0 denote the set of non-negative
reals and AP denote a xed, nite set of atomic propositions used to label states with properties of interest,
a CTMC is a tuple (S,R,L) where:
PRISM

S is a nite set of states;


R:(S  S) ! R0 is a transition rate matrix;
L:S ! 2AP is a labelling function which associates
each state with a set of atomic propositions.

The transition rate matrix R assigns rates to each pair


of states, which are used as parameters of the exponential distribution. A transition can only occur between
states s and s0 if R(s,s0 ). 0 and, in this case, the probability of the transition
being triggered within t time
0
units is 1 2 e 2 R(s,s )t. Typically, in a state s, there is
more than one state s0 for which R(s,s0 ). 0; this is
known as a race condition and the rst transition to
be triggered determines the next state. The time spent
in state s before any such transition occurs
P is exponentially distributed with the rate Es s0 [S Rs; s0 ,
called the exit rate. The probability of moving to state
s0 is given by R(s,s0 )/E(s).

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

14

Design and analysis of DNA M. R. Lakin et al.

A CTMC can be augmented with rewards, attached


to states and/or transitions of the model. Formally, a
reward structure for a CTMC is a pair (c,C) where

c:S ! R0 is a state reward function;


C:(S  S) ! R0 is a transition reward function.

State rewards can represent either a quantitative


measure of interest at a particular time instant (e.g.
the number of phosphorylated proteins in the system)
or the rate at which some measure accumulates over
time (e.g. energy dissipation). Transition rewards are
accumulated each time a transition occurs and can be
used to compute, e.g. the number of protein bindings
over a particular time period.
PRISM can then be used to specify and verify a range
of properties of CTMCs, including those expressed
in the logic CSL and the reward-based extension of
Kwiatkowska et al. [20]. The underlying computation
performed to apply probabilistic model checking
involves a combination of graph theoretical algorithms
(e.g. to construct and explore models) and numerical
computation methods (e.g. to calculate probabilities
or reward values). For the latter, PRISM typically
solves linear equation systems or performs transient
analysis. Owing to the size of the models that need to
be handled, it uses iterative methods rather than
direct methods. For solutions of linear equation systems, it supports a range of well-known techniques,
including the Jacobi, Gauss-Seidel and SOR (successive
over-relaxation) methods; for transient analysis of
CTMCs, it employs uniformization.
The PRISM tool also offers a graphical user interface
with a range of functionality. First, it facilitates construction and editing of PRISM models and property
specications. In addition, it includes a graph-plotting
component for visualization of numerical results from
probabilistic model checking. It also provides a tool to
perform manual execution and debugging of probabilistic models, based on an underlying discrete-event
simulation engine. Another use of this engine is to
generate approximate solutions for the numerical computations that underlie the model checking process,
by applying Monte Carlo methods and sampling.
These techniques offer increased scalability, at the
expense of numerical accuracy. The main strength of
PRISM, though, and the probabilistic model checking
techniques that it implements, is the ability to compute
quantitative properties exactly, based on exhaustive
model exploration and numerical solution.
5. DISCUSSION
We have introduced a modied version of Cardellis
two-domain gate designs [22], which is amenable to verication yet retains the key simplication of the twodomain design: initial species contain no overhangs
and can thus be constructed by inserting breaks into
one strand of a simple double-stranded DNA complex.
This should give higher yields compared with simple
annealing of single strands in a test tube.
We have demonstrated that probabilistic (and nonprobabilistic) model checking can be used to verify a
J. R. Soc. Interface

Table 3. Model checking statistics: model sizes and run-times.


example

parameters

states

time (s)

buggy transducers (3.2)

N4
N5
N6
X 3,Y 5
X 4,Y 5
X 5,Y 5

57 188
284 641
1 160 292
240 286
674 066
1 785 250

4
26
145
266
860
2602

approx. majority (3.4)

wide range of properties of individual circuit components constructing using this design. We showed
that the PRISM model checker can detect bugs owing
to crosstalk between gates, analyse quantitative properties such as reliability and performance, and compute
the probability of different possible outcomes of the
gates execution. These techniques can be particularly
useful during the initial stages of gate design. Even
model checking a single gate executing in isolation, as
in 3.2, can help us to identify errors in the design
that would be difcult to quantify using simulationbased methods. Although multiple simulation runs
can be used to approximate the probability of a given
error, performing large numbers of simulations can be
time-consuming, particularly in cases such as gure 7,
where the error probability is low for large number of
gates. More importantly, model checking can be used
to identify the source of an error, by providing a specic
execution trace of the behaviour that leads to its occurrence. As illustrated in 3.3, we can also use model
checking to investigate the potential of different
system designs, even when analysed using relatively
small numbers of inputs. Finally, we have shown how
applying additional abstractions to the populations of
fuel and waste species can allow us to scale up to verifying more complicated systems, such as the approximate
majority population protocol [25].
Nonetheless, model checking has its limitations. As
the species populations grow, the number of reaction
interleavings explodes, which causes problems for
naively scaling up to larger systems. Table 3 shows statistics for a selection of the largest models that we used
to generate the results in 3 (model checking was run
on a 2.80 GHz Dell PowerEdge R410 with 32 GB of
RAM). The table shows the size of each model and
the time required to check a single property. As
expected, model sizes grow rapidly as population sizes
are increased, meaning that models larger than those
shown in the table could not be analysed. In 3.4, we
had to approximate the populations of fuel and waste
species in the model as constant in order to prevent
the state space from becoming too large to generate.
This effect can be mitigated to an extent, for example,
by careful gate design: our modied two-domain gates
were deliberately constructed to minimize the number
of asynchronous steps required for garbage collection and sealing off used gates, which greatly expand
the state space. We also used a high level of abstraction (the Innite semantics of DSD [2]) to reduce the
number of reactions in the model as far as possible.
However, even with the cleverest gate design, the

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

Design and analysis of DNA


sheer number of interleavings will eventually become
too great.
One key challenge is to extrapolate the results from
model checking relatively small systems to systems
with higher numbers of molecules. Being able to identify design aws in individual system components, such
as the buggy transducer gate in 3.1, is already valuable
because the aws are still likely to occur when the component is present in larger numbers or is part of a more
complex design. On the other hand, our full-system
model checking does not verify interactions with an
arbitrary environment. For example, the buggy transducer gate would appear to work correctly when
model-checked in isolationthe unwanted crosstalk
only becomes apparent when two gates are modelchecked together. To help address this, we can selectively model check a given gate design with all of the
remaining gates in the system, in order to identify
possible interferences.
For quantitative properties, such as performance or
reliability, we showed in 3.2 that it is already possible
to make comparisons between gate designs using relatively small numbers of molecules. Furthermore, for
certain categories of circuits, such as those involving localized strands tethered to the surface of DNA origami [29],
the internal behaviour of each origami circuit can be analysed independently, and then used to accurately predict
the behaviour of potentially millions of circuits in solution. This is because localization signicantly reduces
cross-talk between circuits, allowing them to be accurately analysed in modular and scalable ways. We also
note that, in the context of cell signalling pathways,
PRISM has been used successfully to evaluate regulation
mechanisms for the broblast growth factor pathway [6]:
behavioural predictions from a PRISM model over small
population numbers were later validated experimentally [30]. In this paper, we were able to reproduce
behavioural trends previously analysed in a theoretical
setting [23,25]. In the future, we plan to investigate experimental validation of our analysis techniques for DNA
strand displacement circuit designs.
Other important areas for research include developing techniques to further improve the scalability of
probabilistic model checking on DNA designsfor
example, through the construction of abstractions or
by analysing systems in a compositional manner. Promising directions for the former include sliding window
abstractions [31], which optimize the analysis of temporal system properties by restricting analysis to a
particular subset of the state space for each phase of
its evolution, and the stochastic hybrid model of [32]
for analysing systems in which some populations are
present in small numbers and others in large numbers.
Abstractions will also become essential when modelling
Turing-powerful computation with DNA strand displacement, because the corresponding reaction network is
of potentially unbounded size. In this case, a notion of
dynamically generated reactions is needed, as discussed
in Lakin & Phillips [2].
Regarding compositional techniques, it may be
benecial to consider stochastic Petri nets [33], which
are an alternative means of representing the behaviour
of strand displacement systems, and have already
J. R. Soc. Interface

M. R. Lakin et al.

15

been applied to systems and synthetic biology [34].


In this approach, places correspond to DNA species
and transitions to the chemical reactions between
them. Previous work has explored compositional
model checking of (non-stochastic) modular Petri
nets [35] composed by transition sharing. In fact, in
the context of Petri net-based models for strand displacement systems, it would be advantageous to consider
compositions based on both sharing of transitions and
of places.
We believe that advances in abstraction techniques
and compositional model checking will be vital in
order to apply model checking to larger DNA strand
displacement systems. In fact, the two-domain scheme
is an excellent framework for research into compositional verication of strand displacement circuits
because the restricted syntax makes it straightforward
to compute, for any gate complex, the set of all singlestranded species that could interact with the complex.
This is a much more challenging problem for more general strand displacement schemes where strands can
contain arbitrary domains. This insight could allow us
to relate every gate complex with a nite-state automaton describing its possible interactions with the
environment, which could form the basis for a compositional verication technique. Just as we used generic
temporal logic formulae to characterize correct nal
states of the reaction gates examined in 3, for compositional model checking we would need to identify
temporal logic formulae that characterize valid
sequences of interactions between the gate and its
environment. We would hope to prove that, if a particular set of gates all satisfy those formulae, then any
cascade comprising just those gates would be correct
in some sense. Such techniques will become increasingly
important as larger and more complex strand displacement systems are constructed, such as Qian &
Winfrees [14] four-bit square root circuit.
This work is partially supported by ERC Advanced Grant
VERIWARE. We are grateful to the anonymous referees for
their helpful comments.

REFERENCES
1 Phillips, A. & Cardelli, L. 2009 A programming language
for composable DNA circuits. J. R. Soc. Interface 6,
S419S436. (doi:10.1098/rsif.2009.0072.focus)
2 Lakin, M. R. & Phillips, A. 2011 Modelling, simulating
and verifying Turing-powerful strand displacement systems. In DNA computing and molecular programming:
17th Int. Conf., DNA 17, 19 23 September 2011 (eds
L. Cardelli & W. M. Shih), pp. 130 144. Pasadena, CA,
USA. Springer Lecture Notes in Computer Science, no.
6937. Berlin, Germany: Springer.
3 Duot, M., Kwiatkowska, M., Norman, G. & Parker, D.
2006 A formal analysis of Bluetooth device discovery.
Int. J. Softw. Tools Technol. Trans. 8, 621632. (doi:10.
1007/s10009-006-0014-x)
4 Steel, G. 2006 Formal analysis of pin block attacks.
Theor. Comput. Sci. 367, 257 270. (doi:10.1016/j.tcs.
2006.08.042)
5 Calder, M., Gilmore, S. & Hillston, J. 2006 Modelling the
inuence of RKIP on the ERK signalling pathway using
the stochastic process algebra PEPA. In Transactions on

Downloaded from rsif.royalsocietypublishing.org on March 5, 2012

16

10

11

12

13

14

15

16
17
18

19

20

21

Design and analysis of DNA M. R. Lakin et al.


computational systems biology VII, vol. 4230 (eds
C. Priami, A. Ingolfsdottir, B. Mishra & H. R. Nielson),
pp. 123. LNBI. Berlin, Germany: Springer.
Heath, J., Kwiatkowska, M., Norman, G., Parker, D. &
Tymchyshyn, O. 2008 Probabilistic model checking of
complex biological pathways. Theor. Comput. Sci. 319,
239 257. (doi:10.1016/j.tcs.2007.11.013)
Kwiatkowska, M., Norman, G. & Parker, D. 2011 PRISM
4.0: verication of probabilistic real-time systems. In
Proc. 23rd Int. Conf. on Computer Aided Verication
(CAV11), Springer Lecture Notes in Computer Science,
vol. 6806, pp. 585 591. Berlin, Germany: Springer
Seelig, G., Soloveichik, D., Zhang, D. Y. & Winfree, E.
2006 Enzyme-free nucleic acid logic circuits. Science
314, 15851588. (doi:10.1126/science.1132493)
Green, S. J., Lubrich, D. & Turbereld, A. J. 2006 DNA
hairpins: fuel for autonomous DNA devices. Biophys. J.
91, 26992675. (doi:10.1529/biophysj.106.084681)
Zhang, D. Y. & Seelig, G. 2011 Dynamic DNA nanotechnology using strand-displacement reactions. Nat. Chem. 3,
103 113. (doi:10.1038/nchem.957)
Yurke, B. & Mills Jr, A. P. 2006 Using DNA to power
nanostructures. Genet. Program. Evol. Mach. Arch. 4,
111 122. (doi:10.1023/A:1023928811651)
Benenson, Y., Paz-Elizur, T., Adar, R., Keinan, E.,
Livneh, Z. & Shapiro, E. 2001 Programmable and autonomous computing machine made of biomolecules. Nature
414, 430 434.
Venkataraman, S., Dirks, R. M., Ueda, C. T. & Pierce,
N. A. 2010 Selective cell death mediated by small conditional RNAs. Proc. Natl Acad. Sci. USA 107, 16 777
16 782. (doi:10.1073/pnas.1006377107)
Qian, L. & Winfree, E. 2011 Scaling up digital circuit computation with DNA strand displacement cascades. Science
332, 11961201. (doi:10.1126/science.1200520)
Cardelli, L. 2010 Two-domain DNA strand displacement. In
Developments in computational models (DCM 2010), vol. 26
(eds S. B. Cooper, E. Kashe & P. Panangaden). Electronic
Proc. in Theoretical Computer Science, pp. 4761
Clarke, E., Grumberg, O. & Peled, D. 1999 Model checking. Cambridge, MA: MIT Press.
Baier, C. & Katoen, J. P. 2008 Principles of model checking. Cambridge, MA: MIT Press.
Aziz, A., Sanwal, K., Singhal, V. & Brayton, R. 2000 Modelchecking continuous time Markov chains. ACM Trans.
Comput. Logic 1, 162170. (doi:10.1145/343369.343402)
Baier, C., Haverkort, B., Hermanns, H. & Katoen, J. P.
2003 Model-checking algorithms for continuous-time
Markov chains. IEEE Trans. Softw. Eng. 29, 524541.
(doi:10.1109/TSE.2003.1205180)
Kwiatkowska, M., Norman, G. & Parker, D. 2007 Stochastic model checking. In Formal methods for the design of
computer, communication and software systems: performance evaluation (SFM07), vol. 4486 (eds M. Bernardo &
J. Hillston). Lecture Notes in Computer Science (tutorial
volume), pp. 220 270. Berlin, Germany: Springer.
Kwiatkowska, M., Norman, G. & Parker, D. 2010
Symbolic systems biology. In Probabilistic model checking
for systems biology (ed. M. Sriram Iyengar), pp. 3159.
Sudbury, MA: Jones and Bartlett.

J. R. Soc. Interface

22 Cardelli, L. In press. Two-domain DNA strand displacement. Math. Struct. Comput. Sci.
23 Seelig, G. & Soloveichik, D. 2009 Time-complexity of multilayered DNA strand displacement circuits. In DNA
computing and molecular programming: 15th Int. Conf.,
DNA 15, Fayetteville, AR, USA, 8 11 June 2009, Revised
Selected Papers, vol. 5877 (eds R. Deaton & A. Suyama).
Lecture Notes in Computer Science, pp. 144 153. Berlin,
Germany: Springer.
24 Soloveichik, D., Seelig, G. & Winfree, E. 2010 DNA as a universal substrate for chemical kinetics. Proc. Natl Acad. Sci.
USA 107, 53935398. (doi:10.1073/pnas.0909380107)
25 Angluin, D., Aspnes, J. & Eisenstat, D. 2008 A simple
population protocol for fast robust approximate majority.
Distrib. Comput. 21, 87 102. (doi:10.1007/s00446-0080059-z)
26 Zhang, D. Y. & Winfree, E. 2009 Control of DNA strand
displacement kinetics using toehold exchange. J. Am.
Chem. Soc. 131, 17 303 17 314.(doi:10.1021/ja906987s)
27 Hucka, M. et al. 2003 The systems biology markup
language (SBML): a medium for representation and
exchange of biochemical network models. Bioinformatics
9, 524531. (doi:10.1093/bioinformatics/btg015)
28 Priami, C., Regev, A., Shapiro, E. & Silverman, W. 2001
Application of a stochastic name-passing calculus to
representation and simulation of molecular processes. Inf.
Process. Lett. 80, 25 31. (doi:10.1016/S0020-0190
(01)00214-9)
29 Chandran, H., Gopalkrishnan, N., Phillips, A. & Reif,
J. H. Localized hybridization circuits. In DNA computing
and molecular programming, 17th Int. Conf. DNA 17,
Pasadena, CA, USA, 19 23 September 2011, vol. 6937
(eds L. Cardelli & W. M. Shih). Lecture Notes in Computer Science, pp. 6483. Berlin, Germany: Springer.
30 Sandilands, E., Akbarzadeh, S., Vecchione, A., McEwan,
D. G., Frame, M. C. & Heath, J. K. 2007 Src kinase modulates the activation, transport and signalling dynamics of
broblast growth factor receptors. EMBO Rep. 8, 1162
1169. (doi:10.1038/sj.embor.7401097)
31 Wolf, V., Goel, R., Mateescu, M. & Henzinger, T. 2010
Solving the chemical master equation using sliding
windows. BMC Syst. Biol. J. 4, 42. (doi:10.1186/17520509-4-42)
32 Henzinger, T., Mateescu, M., Mikeev, L. & Wolf, V.
2010 Hybrid numerical solution of the chemical master
equation. In Proc. 8th Int. Conf. on Computational
Methods in Systems Biology (CMSB10), pp. 5565.
New York, NY: ACM.
33 Goss, P. J. E. & Peccoud, J. 1998 Quantitative modeling of
stochastic systems in molecular biology by using stochastic
Petri nets. Proc. Natl Acad. Sci. USA 95, 67506755.
(doi:10.1073/pnas.95.12.6750)
34 Heiner, M., Gilbert, D. & Donaldson, R. 2008 Petri nets
for systems and synthetic biology. In Proc. SFM 2008,
vol. 5016 (eds M. Bernardo, P. Degano & G. Zavattaro),
Lecture Notes in Computer Science, pp. 215 264. Berlin,
Germany: Springer.
35 Christensen, S. & Petrucci, L. 2000 Modular analysis of
Petri nets. Comput. J. 43, 224 242. (doi:10.1093/
comjnl/43.3.224)

You might also like