Synchronous Elastic Networks: Sava Krsti C Jordi Cortadella Mike Kishinevsky, John O'Leary
Synchronous Elastic Networks: Sava Krsti C Jordi Cortadella Mike Kishinevsky, John O'Leary
Synchronous Elastic Networks: Sava Krsti C Jordi Cortadella Mike Kishinevsky, John O'Leary
... 3 5 2 1
Abstract— We formally define—at the stream transformer (a) + ... 7 6 2 3
level—a class of synchronous circuits that tolerate any variabil- ... 4 1 0 2
ity in the latency of their environment. We study behavioral
properties of networks of such circuits and prove fundamental ... 3 5 2 1
compositionality results. The paper contributes to bridging the (b) +e ... 7 6 2 3
... 4 1 0 2
gap between the theory of latency-insensitive systems and the
correct implementation of efficient control structures for them. Fig. 1. (a) Conventional synchronous adder, (b) Synchronous elastic adder.
I. I NTRODUCTION
The conventional abstract model for a synchronous circuit is proofs are omitted, but are available in the technical report
a machine that reads inputs and writes outputs at every cycle. [7].
The outputs at cycle i are produced according to a calculation
that depends on the inputs at cycles 0, . . . , i. Computations A. Overview
and data transfers are assumed to take zero delay. Figure 1(a) depicts the timing behavior of a conventional
Latency-insensitive design by Carloni et al. [2] aims to relax synchronous adder that reads input and produces output data
this model by elasticizing the time dimension and so decou- at every cycle (boxes represent cycles). In this adder, the i-th
pling the cycles from the calculations of the circuit. It enables output value is produced at the i-th cycle. Figure 1(b) depicts
the design of circuits tolerant to any discrete variation (in a related behavior of an elastic adder—a synchronous circuit
the number of cycles) of the computation and communication too—in which data transfer occurs in some cycles and not in
delays. With this modular approach, the functionality of the others. We refer to the transferred data items as tokens and we
system only depends on the functionality of its components say that idle cycles contain bubbles.
and not on their timing characteristics. Put succinctly, elasticization decouples cycle count from
The motivation for latency-insensitive design comes from token count. In a conventional synchronous circuit, the i-th
the difficulties with timing and communication in nanoscale token of a wire is transmitted at the i-th cycle, whereas in
technologies. The number of cycles required to transmit data a synchronous elastic circuit the i-th token is transmitted at
from a sender to a receiver is governed by the distance some cycle k ≥ i.
between them, and often cannot be accurately known until Turning a conventional synchronous adder into a syn-
the chip layout is generated late in the design process. Tra- chronous elastic adder requires a communication discipline
ditional design approaches require fixing the communication that differentiates idle from non-idle cycles (bubbles from
latencies up front, and these are difficult to amend when tokens). In SELF, this is implemented by a pair of single-
layout information finally becomes available. Elastic circuits bit control wires: Valid and Stop. Every input or output wire
offer a solution to this problem. In addition, their modularity Z in a synchronous component is associated to a channel in
promises novel methods for microarchitectural design that the elastic version of the same component. The channel is a
can use variable-latency components and tolerate static and triple of wires hZ, validZ , stopZ i, with Z carrying the data and
dynamic changes in communication latencies, while—unlike the other two wires implementing the control bits, as shown
asynchronous circuits—still employing standard synchronous in Figure 2(b). A token is transferred on this channel when
design tools and methods. validZ ∧ ¬stopZ : the sender sends valid data and the receiver
Cortadella et al. [4] present a simple elastic protocol, called is ready to accept it; see Figure 4. Additional constraints that
SELF (Synchronous Elastic Flow) and describe methods for guarantee correct elastic behavior are given in Section III.
efficient implementation of elastic systems and for conversion There we define precisely the class of elastic circuits and what
of regular synchronous designs into elastic form. Inspired by it means for a circuit Ae to be an elastization of a given circuit
the original work on latency-insensitive design [2], SELF also A. In particular, our definition implies liveness: Ae produces
differs from it in ways that render the theory developed in [2] infinite streams of tokens if its environment produces infinite
hardly applicable. streams of tokens at the input channels and is ready to accept
In this paper we give theoretical foundations of SELF: a infinite streams at the output channels.
novel and arguably more practicable definition of elasticity, Suppose N is a network of standard (non-elastic) compo-
and the basic compositionality results. For space reasons, the nents, as in Figure 2(a). Suppose we then take elasticizations of
Fig. 2. A synchronous network (a) and its elastic counterpart (b). Fig. 3. Four machines (left) put into a network N (middle), and the network’s
dependency graph ∆(N ) (right). The nodes of ∆(N ) are wires; internal
wires get two labels. The arcs are non-sequential input-output wire pairs of
component circuits. Dotted arcs indicate that (1,2) and (7,10) are sequential
these standard components and join their channels accordingly, pairs for A and C resp.; they are not part of ∆(N ) so ∆(N ) is acyclic.
as in Figure 2(b), ignoring the buffer. Will the resulting
network N e be an elasticization of N ? Will it be elastic at all?
These fundamental questions are answered by Theorem 4 of more complex than that of SELF (Figure 4) and consequently
Section IV, which is the main result of the paper. The answers LID requires significantly more complex implementation. For
are “yes”, provided a certain graph ∆e (N e ) associated with example, conversion of a regular design into LID form needs
N e is acyclic. This graph captures the information about paths a wrapper or registers around every module, increasing the la-
inside elastic systems that contain no tokens—analogous to tency of each module’s computation by two cycles—a penalty
combinational paths in ordinary systems. Importantly, ∆e (N e ) that is not required in the SELF elasticization. There might
can be constructed using only local information (the “sequen- also be practical challenges in interfacing a LID system with
tiality interfaces”) of the individual elastic components. an existing non-LID module, requiring the latter to generate
Since elastic networks tolerate any variability in the latency stop signals with complex semantics.
of the components, empty FIFO buffers can be inserted in
any channel, as shown in Figure 2(b), without changing the cycle 0 1 2 3 4 5 6 7 8 9 ...
functional behavior of the network. This practically important dataZ ∗ A B B B C ∗ ∗ D D ...
fact is proved as a consequence of Theorem 4. validZ 0 1 1 1 1 1 0 0 1 1 ...
stopZ 0 0 1 1 0 0 0 1 1 0 ...
Synchronous circuits are modeled in this paper as stream
transformers, called machines. This well-known technique (see SELF @ t @ @ t t @ @ @ t ...
[8] and references therein) appears to be quite underdeveloped. LID @ t t @ t t @ @ @ t ...
Our rather lengthy preliminary Section II elaborates the nec- Fig. 4. Comparing the SELF and LID protocols. The bottom rows show the
essary theory of networks of machines, culminating with a states of the channel Z, differentiating between bubbles (@) and tokens (t).
When ¬validZ , the value at the data wire is irrelevant (labelled ∗ in cycles
surprisingly novel combinational loop theorem (Theorem 1). 0, 6 and 7). The receiver can issue a stopZ even when the sender does not
Figure 3 illustrates Theorem 1 and, by analogy, Theorem 4 send valid data (cycle 7). In the cycles 3, 4, and 9, the sender persistently
as well. It relies on the formalization of the notion of combina- maintains the same valid data as in the previous cycle. In SELF, data transfer
takes place in cycles 1,4,5,9, so the transferred sequence is ABCD . . .. In
tional dependence at the level of input-output wire pairs. Each LID, the same sequence of values on the channel wires signifies transfer of a
input-output pair of a machine is either sequential or not, and different sequence of data: ABBCD . . . This is because a token is transferred
the set of sequential pairs provides a machine’s “sequentiality on the LID channel when validZ ∧ ¬(stopZ ∧ pre(stopZ )), where pre
stands for the value during the previous cycle. (The first occurrence of the
interface”. When several machines are put together into a stop request stopZ = 1 means “perhaps you will need to stop next cycle”
network N , their sequentiality interfaces define the graph and the data item B sent through the channel during cycle 2 is assumed to
∆(N ), the acyclicity of which is a test for the network to be successfully transmitted to the receiver.)
be a legitimate machine itself.
Elasticizations of ordinary circuits are not uniquely defined. We emphasize that the limitations of LID implementations
On the other hand, for every elastic machine A there is a are not inherent to the concept of patient processes. Regarding
unique standard machine, denoted A| , that corresponds to it. latency properties, they do not seem to be more limited than
We do not discuss any specific elasticization procedures in this elastic systems. Still, it turns out that patient processes are not
paper, but state our results in the form that only involves elastic general enough to model elastic systems as we define them
machines and their unique standard counterparts. This makes in Section III. This we prove in Section V where patient
the results applicable to multiple elasticization procedures. processes and elastic systems are compared as alternative
formalizations of latency-insensitive circuits.
B. Related Work Suhaib et al. [12] revisited and generalized Carloni’s elasti-
Carloni et al. [2] pioneered a theory of latency-insensitive cization procedure, validating its correctness by a simulation
circuits based on their notion of patient processes. Patient method based on model checking.
processes are defined at a high level of abstraction that models Lee et al. [9] study causality interfaces (pairwise input-
communication on a channel only by “token or bubble”, leav- output dependencies) and are “interested in existence and
ing implementation protocol(s) unspecified. In the companion uniqueness of the behavior of feedback composition”, but do
paper [3], Carloni et al. give an incomplete description of not go as far as deriving a combinational loop theorem.
an implementation protocol. Assuming our recovery of that In their work on design of interlock pipelines [6], Jacobson
protocol (let us call it LID) is accurate, its transfer condition is et al. use a protocol equivalent to SELF, without explicitly
2
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006
specifying it. Lemma 1: If f : A∞ → A∞ is contractive, then it has a
Manohar and Martin discuss “slack elasticity” of asyn- unique fixpoint.
chronous implementations in [10]. Their slack elasticity con- Remark. One can define the distance d(a, b) between se-
ditions relate to the structure of choices in the asynchronous quences a and b to be 1/2k , where k is the length of the
specification. Unlike [10], in the current paper we deal with largest common prefix of a and b. This gives the sets A∞ and
synchronous systems and we take a black box view of their Aω the structure of complete metric spaces and Lemma 1 is an
control—no information about the control flow (and hence on instance of Banach Fixed Point Theorem. See the review paper
the structure of choices) is ever used. Instead the connectivity [8] for more details and references about the metric semantics
information corresponding to the system data-flow is used for of systems and [13] for “diadic arithmetic of circuits”. We
elasticization. Conservatively ignoring control flow may lead choose not to use the metric space terminology in this paper
to a performance penalty, but simplifies the translation to an since all “metric reasoning” we need can be as easily done
elastic system. with equivalence relations ∼k instead. See [11] for principles
of reasoning with such “converging equivalence relations” in
II. C IRCUITS AS S TREAM F UNCTIONS
more general contexts.
In this section we introduce machines as a mathematical
abstraction of circuits without combinational cycles. For sim- B. Systems
plicity, this abstraction implicitly assumes that all sequential
Suppose W is a set of typed wires; all we know about
elements inside the circuit are initialized. Extending to par-
an individual wire w is a set type(w) associated to it. A
tially initialized systems appears to be trivial. While there is a
W -behavior is a function σ that associates a stream σ.w ∈
large body of work studying circuits or equivalent objects with
type(w)∞ to each wire w ∈ W . The set of all W -behaviors
good (e.g. constructive [1]) combinational cycles and their
will be denoted JW K. Slightly abusing the notation, we will
composition (e.g. [5]), we deliberately restrict consideration
also write JwK for the set type(w)∞ . Notice that the equiva-
to the fully acyclic objects, since neither logic synthesis nor
lence relations ∼k extend naturally from streams to behaviors:
timing analysis can properly treat circuits with combinational
cycles. σ ∼k σ 0 iff ∀w ∈ W. σ.w ∼k σ 0 .w
Most of the effort in this section goes into establishing
modularity conditions guaranteeing that a system obtained as a Notice also that a W -behavior σ can be seen as a single
network of machines (the feedback construction in particular) stream (σ[0], σ[1], . . .) of W -states, where a state is an as-
is a machine itself. signment of a value in type(w) to each wire w.
Definition 1: A W -system is a subset of JW K.
A. Streams
Example. A circuit that at each clock cycle receives an
A stream over A is an infinite sequence whose elements integer as input and returns the sum of all previously received
belong to the set A. The first element of a stream a is referred inputs is described by the W -system S, where W consists
to by a[0], the second by a[1], etc. For example, the equation of two wires u, v of type Z, and S consists of all stream
a[i] = 3i + 1 describes the stream (1, 4, 7, . . .). pairs (a, b) ∈ Z∞ × Z∞ such that b[0] = 0 and b[n] =
The set of all streams will be denoted A∞ . Occasionally a[0]+· · ·+a[n−1] for n > 0. Each stream pair (a, b) represents
we will need to consider finite sequences too; the set of all, a behavior σ such that σ.u = a and σ.v = b.
finite or infinite, sequences over A is denoted Aω . We will use wires as typed variables in formulas meant to
We will write a ∼k b to indicate that the streams a and describe system properties. The formulas are built using ordi-
b have a common prefix of length k. The equivalence rela- nary mathematical and logical notation, enhanced with tempo-
tions ∼0 , ∼1 , ∼2 , . . . are progressively finer and have trivial ral operators next, always, and eventually, denoted respectively
intersection. Thus, to prove two sequences a and b are equal, by ( )+ , G, F. As an illustration, the system S in the example
it suffices to show a ∼k b holds for every k. Note also that above is characterized by the property v = 0∧G (v + = v +u).
a ∼0 b holds for every a and b. Also, one has S |= F G (u > 0) ⇒ F G (v > 1000), where |=
We will use the equivalence relations ∼k to express prop- is used to denote that a formula is true of a system.
erties of systems and machines viewed as multivariate stream
functions. All these properties will be derived from the fol- C. Operations on Systems
lowing two basic properties of single-variable stream functions
f : A∞ → B ∞ . If W 0 ⊆ W , there is an obvious projection map σ 7→
σ ↓ W 0 : JW K → JW 0 K. These projections are all one needs
causality: ∀a, b ∈ A∞ . ∀k ≥ 0. a ∼k b ⇒ f (a) ∼k f (b) for the definition of the following two basic operations on
contraction: ∀a, b ∈ A∞ . ∀k ≥ 0. a ∼k b ⇒ f (a) ∼k+1 f (b) systems.
Definition 2: (a) If S is a W -system and W 0 ⊆ W , then
Informally, f is causal if (for every a) the first k elements of hiding W 0 in S produces a (W − W 0 )-system hideW 0 (S)
f (a) are determined by the first k elements of a, and f is defined by
contractive if the first k elements of f (a) are determined by
the first k − 1 elements of a. τ ∈ hideW 0 (S) iff ∃σ ∈ S. τ = σ ↓ (W − W 0 ).
3
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006
(b) The composition of a W1 -system S1 and a W2 -system S2 holds if and only if σ ∗ τ ∈ S. The causality condition for
is a (W1 ∪ W2 )-system S1 t S2 defined by such S can be also written as follows:
σ ∈ S1 t S2 iff σ ↓ W1 ∈ S1 ∧ σ ↓ W2 ∈ S2 . ∀σ, σ 0 ∈ JIK. ∀k ≥ 0. σ ∼k σ 0 ⇒ F (σ) ∼k F (σ 0 ).
The system in the example in Section II-B is a machine if
If W and W 0 are disjoint wire sets, σ ∈ JW K, and τ ∈
we regard u as an input wire and v as an output wire. The
JW 0 K, then there is a unique behavior ϑ ∈ JW ∪W 0 K such that
same is true of the system Conn(u, v): its associated function
σ = ϑ ↓ W and τ = ϑ ↓ W 0 . This “product” of behaviors will
F is the identity function.
be written as ϑ = σ ∗ τ . (If W is the empty set, then JW K has
one element—a “trivial behavior” that is also a multiplicative E. Feedback on Machines
unit for the product operation ∗.) We will also use the notation We will use the term feedback for the system hS | u = vi
[u 7→ a, v 7→ b, . . .] for the {u, v, . . .}-behavior σ such that as mentioned in Section II-C when S is a machine and the
σ.u = a, σ.v = b, etc. wires u and v of the same type are an input and output of
Hiding and composition suffice to define complex networks S respectively. Our concern now is to understand under what
of systems. To model identification of wires, we use simple conditions the feedback produces a machine.
connection systems: by definition, Conn(u, v) is the {u, v}- To fix the notation, assume S is an (I, O)-machine given
system consisting of all behaviors σ such that σ.u = σ.v. by F : JIK → JOK, with wires u ∈ I, v ∈ O of the same type
Now if S1 , . . . , Sm are given systems and u1 , . . . , un , A. By the note at the end of Section II-C, we have that for
v1 , . . . , vn are some of their wires, the network obtained every σ ∈ JI − {u}K and τ ∈ JO − {v}K,
from these systems by identifying each wire ui with the
corresponding wire vi (of equal type) is the system σ ∗ τ ∈ hS | u = vi
hS1 , . . . , Sm | u1 = v1 , . . . , un = vn i if and only if
4
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006
a description of hS | v = wi by a single equation that relates its A. Input-output Structure, Channels, and Transfer
input and output: z = û ⊕ û. The other feedback hS | u = zi is We assume that the set of wires is partitioned into data,
easier to calculate; it is given by equation w = v⊕v⊕((0)#v). valid, and stop wires, so that for each data wire X there
exist associated wires validX and stopX of boolean type. (In
F. Networks of Machines and the Combinational Loop Theo- actual circuit implementations, validX and stopX need not be
rem physical wires; it suffices that they be appropriately encoded.)
Consider a network N = hS1 , . . . , Sm | u1 = v1 , . . . , un = Definition 5: Let I, O be disjoint sets of data wires. An
vn i, where S1 , . . . , Sm are machines with disjoint wire sets [I, O]-system is an (I 0 , O0 )-machine, where I 0 = I ∪
and the pairs (u1 , v1 ),. . . ,(un , vn ) involve n distinct input {validX | X ∈ I} ∪ {stopY | Y ∈ O} and O0 = O ∪
wires ui and n distinct output wires vi . (There is no assump- {validY | Y ∈ O} ∪ {stopX | X ∈ I}.
tion that ui , vi belong to the same machine Sj .) Our goal is to The triples hX, validX , stopX i (for X ∈ I) and
understand under what conditions the system N is a machine. hY, validY , stopY i (for Y ∈ O) are to be thought of as elastic
Note that N = hS | u1 = v2 , . . . , un = vn i, where S = input and output channels of the system.
S1 t · · · t Sm . It is easy to check that an input-output pair Let transferZ be a shorthand for validZ ∧ ¬stopZ and say
(u, v) of S is sequential if either (1) (u, v) is sequential for that transfer along Z occurs in a state s if s |= transferZ .
some Si , or (2) u and v belong to different machines. Thus, Given a behavior σ = (σ[0], σ[1], σ[2], . . .) of an [I, O]-system
the information about sequentiality of input-output pairs of the and Z ∈ I ∪ O, let σZ be the sequence (perhaps finite!)
“parallel composition” machine S is readily available from the obtained from σ.Z = (σ[0].Z, σ[1].Z, σ[2].Z, . . .) by deleting
sequentiality information about the component machines Si , all entries σ[i].Z such that transfer along Z does not occur
and our problem boils down to determining when a multiple in σ[i]. The transfer behavior σ | associated with σ is then
feedback operation performed on a single machine results in defined by σ | .Z = σZ . If all sequences σZ are infinite, then
a system that is itself a machine. σ | is an (I ∪ O)-behavior; in general, however, we only have
Simultaneous feedback specified by a set of two or more σZ ∈ type(Z)ω .
input-output pairs of a machine does not necessarily produce For each wire Z of an [I, O]-system S we introduce
a machine even if all pairs involved are sequential. Indeed, an auxiliary transfer counter variable tctZ of type Z. The
in the example in Section II-E, we had a system S with counters serve for expressing system properties related to
two sequential pairs (u, z) and (v, w), but (u, z) ceases to transfer. By definition, tctZ is equal to the number of states
be sequential for hS | v = wi. Indeed, if z and u are related that precede the current state and in which transfer along Z
by z = û ⊕ û, then knowing a prefix of length k of z requires has occurred. That is, for every behavior σ of S, we have
knowing the prefix of the same length of u; a shorter one σ.tctZ = (t0 , t1 , . . .), where tk is the number of indices i
would not suffice. such that i < k and transfer along Z occurs in σ[i]. Note that
To ensure that a multiple feedback construction produces a the sequence σ.tctZ is non-decreasing and begins with t0 = 0.
machine, one needs to show that, in addition to the wire pairs The notation min tctS , for any subset S of I ∪ O will be
to be identified, sufficiently many other input-output pairs are used to denote the smallest of the numbers tctZ , where Z ∈ S.
also sequential. A precise formulation for a double feedback
is given by a version of the Bekić Lemma: for the system B. Definition of Elasticity
hS | u = w, v = zi to be a machine, it suffices that three An elastic component, when ready to communicate over
pairs of wires be sequential—(u, w), (v, z), and one of (u, z), an output channel must remain ready until the transfer takes
(v, w). This non-trivial auxiliary result is needed for the proof place.
of Theorem 1 below, and is a special case of it. Definition 6: The persistence conditions for an [I, O]-
Given an (I, O)-machine S, let its dependency graph ∆(S) system S are given by
have the vertex set I ∪ O and directed edges that go from u
S |= G (validY ∧ stopY ⇒ (validY )+ ∧ Y + = Y ) (1)
to v for each pair (u, v) ∈ I × O that is not sequential. For
a network system N = hS1 , . . . , Sm | u1 = v1 , . . . , un = vn i, for every Y ∈ O.
its graph ∆(N ) is then defined as the direct sum of graphs The conjunct Y + = Y can be removed from (1) without
∆(S1 ), . . . , ∆(Sm ) with each vertex ui (1 ≤ i ≤ n) identified affecting the definition of elastic machines (it follows from
with the corresponding vertex vi (Figure 3). other conditions). The most useful consequence of persistence
Theorem 1 (Combinational Loop Theorem): The network is the “handshake lemma”:
system N is a machine if the graph ∆(N ) is acyclic.
S |= G F validY ∧ G F ¬stopY ⇒ G F transferY
III. E LASTIC M ACHINES
Liveness of an elastic component is expressed in terms of to-
In this section we give the definition of elastic machines. ken count: if all input channels have seen k transfers and there
Its four parts—input-output structure, persistence conditions, is an output channel that has seen less, then the communication
liveness conditions, and the transfer determinism condition— on output channels with the minimum amount of transfer must
are covered by Definitions 5-8 below. be eventually offered. The following definition formalizes this,
5
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006
2 a c 1 2 a c 1 2 a c 1 k transfers on the output channels in the cooperating envi-
d 0 d 1 d 1
1 b e 1 1 b e 1 2 b e 1 ronment. Thus, it is not surprising (even though the proof is
not obvious) that the determinism postulated in Definition 8
Fig. 5. Liveness: Only the hungriest channels (shaded) are being served. suffices to derive the causality of S | :
The numbers indicate the current token count at each channel. Theorem 2: If S is an [I, O]-elastic machine, then S | is an
(I, O)-machine.
In the situation of Definition 8, we say that S is an ela-
together with a similar commitment to eventual readiness on
sticization of S | and that S | is the transfer machine of S.
input channels. (See also Figure 5.)
Definition 7: The liveness conditions for an [I, O]-system
IV. E LASTIC N ETWORKS
are given by
An elastic network N is given by a set of elastic machines
S |= G (min tctO = tctY ∧ min tctI > tctY ⇒ F validY )(2)
S1 , . . . , Sm with no shared wires, together with a set of chan-
S |= G (min tctI∪O = tctX ⇒ F ¬stopX ) (3) nel pairs (X1 , Y1 ), . . . , (Xn , Yn ), where the Xi are n distinct
for every Y ∈ O and every X ∈ I. input channels and the Yi are n distinct output channels. As
In practice, elastic components will satisfy simpler (but a network of standard machines, the elastic network N is
stronger) liveness properties; e.g. remove min tctO ≥ tctY defined by
from (2) and replace min tctI∪O ≥ tctX with min tctO ≥
N = hS1 , . . . , Sm | Xi = Yi , validXi = validYi , AAAAAAi
tctX in (3). However, a composition of such components,
while satisfying (2) and (3), may not satify the stronger stopXi = stopYi (1 ≤ i ≤ n)i
versions of these conditions.
for which we will use the shorter notation
Consider single-channel [I, O]-systems satisfying the per-
sistence and liveness conditions: an elastic consumer is a N = hhS1 , . . . , Sm [] X1 = Y1 , . . . , Xn = Yn ii.
[{Z}, ∅]-system C satisfying (4) below; similarly, an elastic
producer is a [∅, {Z}]-system P satisfying (5) and (6). We will define a graph that encodes the sequentiality infor-
mation about the network N and prove in Theorem 4 that
C |= G F ¬stopZ (4)
acyclicity of that graph implies that N is an elastic machine
P |= G (validZ ∧ stopZ ⇒ (validZ )+ ) (5) and that N | = hS1| , . . . , Sm
|
| X1 = Y1 , . . . , Xn = Yn i.
P |= G F validZ (6)
A. Elastic Feedback
Let CZ be the {Z, validZ , stopZ }-system characterized by
condition (4)—the largest (in the sense of behavior inclusion) Elastic feedback is a simple case of elastic network:
of the systems satisfying this condition. Similarly, let PZ be
the {Z, validZ , stopZ }-system characterized by properties (5) hhS [] P = Qii = hS | P = Q, validP = validQ , stopP = stopQ i.
and (6). Finally, define the [I, O]-elastic environment to be the
system Definition 9: Suppose S is an elastic machine. An input-
F F output channel pair (P, Q) will be called sequential for S if
EnvI,O = X∈I PX t Y ∈O CY .
Note that EnvI,O is only a system; it is not functional and so ∧ min tctI∪O = tctQ
S |= G ⇒ F validQ . (7)
is not a machine. ∧ min tctI−{P } > tctQ
When a system satisfying the persistence and liveness con- Condition (7) is a strengthening of the liveness condition
ditions (1-3) is coupled with a matching elastic environment, (2) for channel Q. It expresses a degree of independence of
the transfer on all data wires never comes to a stall: the output channel Q from the input channel P ; e.g., the first
Lemma 3 (Liveness): If S satisfies (1-3), then for every token at Q need not wait for the arrival of the first token
behavior ω of S t EnvI,O , all the component sequences of at P . This independence can be achieved in the system by
the transfer behavior ω | are infinite. storing some tokens inside, between these two channels. Note
As an immediate consequence of Liveness Lemma, if S that (7) does not guarantee that connecting channels P and Q
satisfies (1-3), then would not introduce ordinary combinational cycles. Therefore
S | = {ω | | ω ∈ S t EnvI,O } the acyclicity condition in the following theorem is required
to ensure (by Theorem 1) that the elastic feedback, viewed as
is a well-defined (I, O)-system. an ordinary network, is a machine.
Definition 8: An [I, O]-system S is an [I, O]-elastic ma- Theorem 3: Let S be an elastic machine and F the elastic
chine if it satisfies the properties (1-3) and the associated feedback system hhS [] P = Qii. If the channel pair (P, Q) is
system S | is deterministic. sequential for S, then: (a) the wire pair (P, Q) is sequential for
The liveness conditions (2,3) are visibly related to causality S | . If, in addition, ∆(F) is acyclic, then: (b) F is an elastic
at the transfer level: k transfers on the input channels imply machine, and (c) F | = hS | | P = Qi.
6
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006
B. Main Theorems If ∆(N ), ∆(M), and ∆e (N ) are acyclic, then M is an elastic
Sequentiality of two channel pairs (P, Q), (P 0 , Q) of an machine, and M| = N | .
elastic machine does not imply their “simultaneous sequen- The precise relationship between graphs ∆(M) and ∆(N )
tiality” can be easily described. In practice they are at the same time
acyclic or not, as a consequence of sequentiality of sufficiently
∧ min tctI∪O = tctQ many input-output wire pairs of B.
S |= G ⇒ F validQ .
∧ min tctI−{P,P 0 } > tctQ
V. E LASTIC VS . PATIENT S YSTEMS
This deviates from the situation with ordinary machines, where
the analogous property holds and is instrumental in the proof Elastic machines and patient processes of [2] provide two
of Combinational Loop Theorem. formalizations of the intuitive concept of latency-insensitive
To justify multiple feedback on elastic machines, we have circuits. In this section we address their connections and differ-
thus to postulate that simultaneous sequentiality is true where ences. We begin with an overview of [2], using a minimalistic
required. Specifically, we demand that elastic machines come approach and terminology that differs from the original. We
with simultaneous sequentiality information: If S is an [I, O]- believe, however, that Definition 11 below matches the original
elastic machine, then for every Y ∈ O a set δ(Y ) ⊆ I is given definion accurately in most important aspects.
so that
A. Patient Systems
∧ min tctI∪O = tctY
S |= G ⇒ F validY . (8) The notation A∗ is for the set of finite sequences over A. A
∧ min tctI−δ(Y ) > tctY
finitary W -system, by definition, is a set of behaviors σ such
Note that if P ∈ δ(Q), then the pair (P, Q) is sequential, but that σ.w is a finite sequence for every w ∈ W .
the converse is not implied. A function δ : O → 2I with the A stalling stream over A is a stream over A ∪ {@}. We
property (8) will be called a sequentiality interface for S. will refer to @ as the bubble and to elements of A as tokens.
For an [I, O]-elastic machine S with a sequentiality inter- We will consider only stalling streams that contain finitely
face δ, we define ∆e (S, δ) to be the graph with the vertex many tokens. If a is such a stream, let a ∈ A∗ denote the
set I ∪ O and directed edges (X, Y ) where X ∈ / δ(Y ). By sequence over A obtained by dropping all bubbles from a.
Theorem 3(a), ∆e (S, δ) contains ∆(S | ) as a subgraph. Clearly, a is determined by a and the sequence ∂(a) ∈ N∗ of
Given an elastic network N = hhS1 , . . . , Sm [] X1 = lengths of bubble sequences between consecutive tokens of a.
Y1 , . . . , Xn = Yn ii, where each Si comes equipped with a For example, if
sequentiality interface δi , its graph ∆e (N ) is by definition the
direct sum of graphs ∆e (S1 , δ1 ), . . . , ∆e (Sm , δm ) with each a = (@, @, 7, @, 4, 5, @, @, @, 8, . . .) (9)
vertex Xi (1 ≤ i ≤ n) identified with the corresponding vertex
we have a = (7, 4, 5, 8, . . .) and ∂(a) = (2, 1, 0, 3, . . .). Two
Yi .
stalling streams a, b are latency equivalent, written a $ b,
Theorem 4: If the graphs ∆(N ) and ∆e (N ) are acyclic, when a = b. Note that a $ a.
then the network system N is an elastic machine, the cor-
By definition, a stalling W -system is a set of behaviors
responding non-elastic system N̄ = hS1| , . . . , Sm |
| X1 =
| σ such that for every w ∈ W , σ.w is a stalling stream over
Y1 , . . . , Xn = Yn i is a machine, and N = N̄ .
type(w). Latency equivalence extends to W -behaviors and W -
As in Theorem 3, acyclicity of ∆(N ) is needed to ensure
systems: σ $ τ iff σ.w $ τ.w holds for every w ∈ W ; S $ S 0
(by Theorem 1) that N defines a machine. Elasticization
iff for every σ ∈ S (σ ∈ S 0 ) there exists τ ∈ S 0 (τ ∈ S) such
procedures (e.g. [4]) will typically produce elastic components
that σ $ τ .
with enough sequential input-output wire pairs, so that ∆(N )
A stalling W -system S determines a standard finitary W -
will be acyclic as soon as ∆e (N ) is acyclic.
system S | = {σ | σ ∈ S}, where σ is given by σ.w = σ.w
Note, however, that cycles in ∆e (N ) need not correspond
(for all w ∈ W ). Clearly, S | $ S.
to combinational cycles in N seen as an ordinary network,
Stalling the k-th token of a by d steps produces a latency
since empty buffers with sequential elements cutting the
equivalent stream that will be denoted stall(a, k, d). Omitting
combinational feedbacks may be inserted into N . Even though
the easy definition, we give an example: if a is as in (9), then
non-combinational in the ordinary sense, these cycles contain
no tokens and therefore no progress along them can be made. stall(a, 1, 3) = (@, @, 7, @, @, @, @, 4, 5, @, @, @, 8, . . .)
Theorem 4 impies that insertion of empty elastic buffers
does not affect the basic functionality of an elastic network, Definition 11: Let ≺ be a well-founded order1 on W and
as illustrated in Figure 2(b). let D > 0. A patient W -system (relative to ≺ and D) is a
Definition 10: An empty elastic buffer is an elastic machine 1 Introduction of a well-founded ordering of wires is motivated in [2] with
S such that S | = Conn(X, Y ) for some X, Y . the purpose of modeling combinational dependencies, but such dependencies
Theorem 5 (Buffer Insertion Theorem): Suppose B is an in patient systems are not discussed in any detail. Moreover, the ordering of
empty elastic buffer with channels X, Y . Let N = wires is implicitly assumed to be total in [2], which is somewhat unnatural.
For instance, when constructing a patient adder with inputs u, v and output
hhS1 , . . . , Sm [] X1 = Y1 , . . . , Xn = Yn ii and M = w, one has two ordering choices: u ≺1 v ≺1 w and v ≺2 u ≺2 w. It is not
hhB, S1 , . . . , Sm [] X = Y1 , X1 = Y, X2 = Y2 , . . . , Xn = Yn ii. clear that a patient adder in the ≺1 -sense will be patient in the ≺2 -sense too.
7
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006
stalling system P such that for every σ ∈ P, every u ∈ W , Are elastic machines more general? The answer is an easy
and every k ≥ 0 there exists σ 0 ∈ P such that “no” since, for example, the set of all possible stalling W -
(Pat-1) σ 0 .u = stall(σ.u, k, 1) behaviors is a patient system in the sense of Definition 11.
and for every v 6= u there However, if one adds to Definition 11 a reasonable require-
exists dv ≤ D such that ment that a patient system be a machine, the answer is not
stall(σ.v, k, dv ) if u ≺ v
(Pat-2) σ 0 .v = immediately clear.
stall(σ.v, k + 1, dv ) otherwise
Which formalization is easier to use? Without offering a
The main results of [2] can now be summarized: definitive answer, we would argue that verifying that a low-
1) a theorem saying that the composition of patient systems level design (RTL, say) implements an elastic machine would
(with the same W , ≺, and D) is a patient system; be easier than verifying that it implements a patient system.
2) the definition and analysis of patient buffers, i.e. patient The bottom line is that the conditions for a system to be
systems B such that B | = Connfin (u, v)—the finitary an elastic machine are expressible as temporal properties of
connection system; suitably constructed infinite-state models. This is not obvious
3) a general construction that, for a given finitary system for the determinism condition for S | in Definition 8, but
M without combinational dependencies (model of a can be done by replacing determinism with causality and
Moore machine), produces a patient system P such that introducing auxiliary variables for sequences of transferred
P $ M. values over channels. Even though (e.g., because of infinite
counters involved) these conditions are not directly checkable
B. Comparison
by the existing model checking technology, there are palpable
The formalization given by patient systems is at a higher opportunities to find manageable stronger conditions that taken
level of abstraction. While elastic machines deal explicitly with together imply elasticity (e.g., postulating a limit on the token
handshaking signals between communicating systems, patient count differences between channels eliminates the need for
systems communicate purely in the token/bubble language. infinite counters). On the other hand, the definition of a patient
Given an elastic (as defined in Section III) [I, O]-system E, system, being of the form “for every behavior σ, there exists
the corresponding stalling (I ∪ O)-system E @ is obtained by a behavior σ 0 such that . . . ” appears to us to be intrinsically
projecting the finite-transfer behaviors of E to data wires and more complex. Our only positive conclusion, however, is that
replacing data items on each wire with @ at all cycles where the mechanical checking of either of the definitions is an open
transfer along that wire does not occur. Precisely, let E F be problem deserving further study.
the subset of E consisting of all behaviors ω such that ω | .Z
is finite for all channels Z.2 Then, given ω ∈ E F , we define a VI. C ONCLUSION
stalling (I ∪ O)-behavior ω @ by We have presented a theory of elastic machines that gives an
easy-to-check condition for the compositional theorem of the
(ω.Z)[i] if (ω.validZ )[i] ∧ ¬(ω.stopZ )[i]
(ω .Z)[i] =
@
form “an elasticization of a network of ordinary components
@ otherwise
is equivalent to the network of components’ elasticizations”.
and finally we define the stalling system E @ as the set of all Verification of a particular implementation is reduced to prov-
such behaviors ω @ . Clearly, the system (E @ )| is the finitary ing that conditions of Definition 8 are satisfied for all elastic
version of the standard machine E | . components used, and that the graph ∆e (N e ) is acyclic for
Now we can address some questions pertinent to the com- every network N to which the elasticization is applied. While
parison of patient processes vs. elastic machines. the definition of the graphs ∆e may appear complex because
Are patient processes more general? The answer is “no” of the sequentiality interfaces involved, it should be noted that
because there exist elastic machines E such that E @ is not the elasticization procedures, e.g. [4], are reasonably expected
patient. To see this, consider an elastic machine E that starts to completely preserve sequentiality: a channel P belongs to
offering new valid outputs on channel u only on even cycles. δ(Q) if the wire-pair (P, Q) is sequential in the original non-
(The existence of such elastic machines is obvious.) Observe elastic machine. This ensures ∆e (N e ) = ∆(N ) and so testing
that σ.u = (@, 7, 9, . . .) is possible for some behavior σ of for sequentiality is done at the level of ordinary networks.
E @ (token 7, even though transmitted on cycle 1 was first Future work will be focused on proving correctness of
offered on cycle 0). Then stall(σ.u, 0, 1) = (@, @, 7, 9, . . .) particular elasticization methods, on techniques for mechanical
must also be part of a behavior of E @ , by condition (Pat-1) verification of elasticity, and on extending the theory to more
of Definition 11. This implies that token 9 is first offered on advanced protocols.
cycle 3, contrary to our assumption.
The above example can be viewed as an indication that Acknowledgments: Luca Carloni clarified some details of [2].
the condition (Pat-1) is too restrictive. It would be interesting Ken McMillan pointed out several inaccuracies in a previous
to see if an appropriate modification of (Pat-1) results in a version of the paper and further clarified [2] for us. Gerard
definition of patient processes that captures elastic machines. Berry, Ching-Tsun Chou, John Harrison, and the anonymous
2 One can prove that E is the set of all limits of behaviors of E F and so E reviewers provided useful remarks. We are grateful for all the
is determined by E F . help we received.
8
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006
R EFERENCES [8] E. A. Lee and A. Sangiovanni-Vincentelli. A framework for comparing
models of computation. IEEE Transactions on Computer-Aided Design
[1] G. Berry. The Constructive Semantics of Pure Esterel. Draft book, of Integrated Circuits and Systems, 17(12):1217–1229, 1998.
available at http://www.esterel.org, version 3, July 1999. [9] E. A. Lee, H. Zheng, and Y. Zhou. Causality interfaces
[2] L. P. Carloni, K. L. McMillan, and A. L. Sangiovanni-Vincentelli. and compositional causality analysis. Invited paper in
Theory of latency-insensitive design. IEEE Transactions on Computer- Foundations of Interface Technologies (FIT 2005), available at
Aided Design of Integrated Circuits, 20(9):1059–1076, September 2001. http://ptolemy.eecs.berkeley.edu/publications.
[3] L. P. Carloni and A. L. Sangiovanni-Vincentelli. Coping with latency in [10] R. Manohar and A. J. Martin. Slack elasticity in concurrent computing.
SoC design. IEEE Micro, Special Issue on Systems on Chip, 22(5):12, In Proc. 4th Int. Conf. on the Mathematics of Program Construction,
October 2002. volume 1422 of Lecture Notes in Computer Science, pages 272–285,
[4] J. Cortadella, M. Kishinevsky, and B. Grundmann. Synthesis of syn- 1998.
chronous elastic architectures. In Proc. Digital Automation Conference [11] J. Matthews. Recursive function definition over coinductive types. In
(DAC), July 2006. TPHOLs ’99: Proc. the 12th Int. Conf. on Theorem Proving in Higher
[5] S. A. Edwards and E. A. Lee. The semantics and execution of a Order Logics, pages 73–90, London, UK, 1999. Springer-Verlag.
synchronous block-diagram language. Sci. Comput. Program., 48(1):21– [12] S. Suhaib, D. Berner, D. Mathaikutty, J.-P. Talpin, and S. Shukla.
42, 2003. Presentation and formal verification of a family of protocols for latency
[6] H. M. Jacobson et al. Synchronous interlocked pipelines. In Proc. Int. insensitive design. Technical Report 2005-02, FERMAT, Virginia Tech,
Symp. on Advanced Research in Asynchronous Circuits and Systems, 2005.
pages 3–12, 2002. [13] J. Vuillemin. On circuits and numbers. IEEE Transactions on Comput-
[7] S. Krstić, J. Cortadella, M. Kishinevsky, and J. O’Leary. Syn- ers, 43(8):868–879, 1994.
chronous elastic networks. Available at www.lsi.upc.edu/
˜jordicf/gavina/BIB/reports/fmcad06 ext.pdf, 2006.
9
Proceedings of the Formal Methods in Computer Aided Design (FMCAD'06)
0-7695-2707-8/06 $20.00 © 2006