An Efficient Byzantine-Resilient Tuple Space
Alysson Neves Bessani, Miguel Correia, Member, IEEE, Joni da Silva Fraga, Member, IEEE, and Lau Cheuk Lung
Abstract— Open distributed systems are typically composed of an unknown number of processes running on heterogeneous hosts. Their communication often requires tolerance to temporary disconnections and security against malicious actions. Tuple spaces are a well-known coordination model for this kind of system. They can support communication that is decoupled both in time and space. There are currently several implementations of distributed fault-tolerant tuple spaces, but they are not Byzantine-resilient, i.e., they do not provide a correct service if some replicas are attacked and start to misbehave. This paper presents an efficient implementation of a Linearizable Byzantine fault-tolerant Tuple Space (LBTS) that uses a novel Byzantine quorum systems replication technique in which most operations are implemented by quorum protocols while stronger operations are implemented by more expensive protocols based on consensus. LBTS is linearizable and wait-free, showing interesting performance gains when compared to a similar construction based on state machine replication.
Index Terms— Tuple spaces, Byzantine fault tolerance, intrusion tolerance, quorum systems.
I. INTRODUCTION
COORDINATION is a classical distributed systems paradigm based on the idea that separating system activities into computation and coordination can simplify the design of distributed applications [1]. The generative coordination model, originally introduced in the LINDA programming language [2],
uses a shared memory object called a tuple space to support
the coordination. Tuple spaces can support communication that
is decoupled both in time – processes do not have to be active
at the same time – and space – processes do not need to know
each others’ addresses [3]. The tuple space can be considered to
be a kind of storage that stores tuples, i.e., finite sequences of
values. The operations supported are essentially three: inserting a
tuple in the space, reading a tuple from the space and removing
a tuple from the space. The programming model supported by
tuple spaces is regarded as simple, expressive and elegant, being
implemented in middleware platforms like GIGASPACES [4], Sun's JAVASPACES [5] and IBM's TSPACES [6].
There has been some research about fault-tolerant tuple spaces
(e.g., [7], [8]). The objective of those works is essentially to
guarantee the availability of the service provided by the tuple
space, even if some of the servers that implement it crash. This
paper goes one step further by describing a tuple space that
tolerates Byzantine faults.
A preliminary version of this paper entitled “Decoupled Quorum-based Byzantine-Resilient Coordination in Open Distributed Systems” appeared in the Proceedings of the 6th IEEE International Symposium on Network Computing and Applications – NCA 2007.
Alysson Neves Bessani and Miguel Correia are with the Departamento de Informática, Faculdade de Ciências da Universidade de Lisboa, Lisboa, Portugal. Email: {bessani,mpc}@di.fc.ul.pt. Joni da Silva Fraga is with the Departamento de Automação e Sistemas, Universidade Federal de Santa Catarina, Florianópolis–SC, Brazil. Email: [email protected]. Lau Cheuk Lung is with the Departamento de Informática e Estatística, Universidade Federal de Santa Catarina, Florianópolis–SC, Brazil. Email: [email protected].
More specifically, this work is part of a recent research effort in intrusion-tolerant systems, i.e., on
systems that tolerate malicious faults, like attacks and intrusions
[9]. These faults can be modeled as arbitrary faults, also called
Byzantine faults [10] in the literature.
The proposed tuple space is dubbed LBTS since it is a
Linearizable Byzantine Tuple Space. LBTS is implemented by a
set of distributed servers and behaves according to its specification
if up to a number of these servers fail, either accidentally (e.g.,
crashing) or maliciously (e.g., being attacked and starting to
misbehave). Moreover, LBTS tolerates accidental and malicious
faults in an unbounded number of clients accessing it.
A tuple space service like LBTS might be interesting in
several domains. One case is application domains with frequent disconnections and mobility, which can benefit from the time and space decoupling provided by LBTS. Two examples of such domains are ad hoc networks [11] and mobile agents [3]. Another domain is bag-of-tasks applications in grid computing [12],
where a large number of computers are used to run complex
computations. These applications are decoupled in space and time
since the computers that run the application can enter and leave
the grid dynamically.
LBTS has two important properties. First, it is linearizable,
i.e., it provides a strong concurrency semantics in which operations invoked concurrently appear to take effect instantaneously
sometime between their invocation and the return of their result
[13]. Second, it is wait-free, i.e., every correct client process that
invokes an operation in LBTS eventually receives a response,
independently of the failure of other client processes or access
contention [14].
Additionally, LBTS is based on a novel Byzantine quorum
systems replication philosophy in which the semantics of each
operation is carefully analyzed and a protocol as simple as possible is defined for each. Most operations on the tuple space are
implemented by pure asynchronous Byzantine quorum protocols
[15]. However, a tuple space is a shared memory object with
consensus number higher than one [16], according to Herlihy’s
wait-free hierarchy [14], so it cannot be implemented using only
asynchronous quorum protocols [17]. In this paper we identify
the tuple space operations that require stronger protocols, and
show how to implement them using a Byzantine PAXOS consensus
protocol [18], [19], [20]. The philosophy behind our design is
that simple operations are implemented by “cheap” quorum-based
protocols, while stronger operations are implemented by more
expensive protocols based on consensus. These protocols are more
expensive in two senses. First, they have higher communication
and message complexities (e.g., the communication complexity is
typically O(n²) instead of O(n)). Second, while Byzantine quorum protocols can be strictly asynchronous, consensus has been
shown to be impossible to solve deterministically in asynchronous
systems [21], so additional assumptions are needed: either about
synchrony or about the existence of random oracles. Although
there are other recent works that use quorum-based protocols
to implement objects stronger than atomic registers [22] and to
optimize state machine replication [23], LBTS is the first to mix
these two approaches supporting wait freedom and being efficient
even in the presence of contention.
The contributions of the paper can be summarized as follows:
1) it presents the first linearizable tuple space that tolerates
Byzantine faults; the tuple space requires n ≥ 4f + 1 servers,
from which f can be faulty, and tolerates any number of
faulty clients. Moreover, it presents a variant of this design
that requires the minimal number of servers (n ≥ 3f + 1) at
the cost of having weaker semantics;
2) it introduces a new design philosophy to implement shared
memory objects with consensus number higher than one
[14], by using quorum protocols for the weaker operations
and consensus protocols for stronger operations. To implement this philosophy several new techniques are developed;
3) it presents the correctness conditions for a linearizable tuple
space; although this type of object has been used for more
than two decades, there is no other work that provides such
a formalization;
4) it compares the proposed approach with Byzantine state machine replication [18], [24] and shows that LBTS presents
several benefits: some operations are much cheaper and it
supports the concurrent execution of operations, instead of
executing them in total order.
The paper is organized as follows. Section II presents the
background information for the paper as well as the definition of
the system model assumed by our protocols. The definition of the
correctness conditions for a linearizable tuple space is formalized
in Section III. The LBTS protocols are presented in Section
IV. Section V presents several optimizations and improvements
for the basic LBTS protocols. An alternative version of LBTS
is presented in Section VI. Sections VII and VIII present an evaluation of LBTS and summarize the related work, respectively.
The conclusions are presented in Section IX, and an Appendix
containing the proofs of the protocols is included at the end.
II. PRELIMINARIES
A. Tuple Spaces
The generative coordination model, originally introduced in the LINDA programming language [2], uses a shared memory
object called a tuple space to support the coordination between
processes. This object essentially allows the storage and retrieval
of generic data structures called tuples.
Each tuple is a sequence of fields. A tuple t in which all fields have a defined value is called an entry. A tuple with one or more undefined fields is called a template (usually denoted by a bar, e.g., t̄). An entry t and a template t̄ match — m(t, t̄) — if they have the same number of fields and all defined field values of t̄ are equal to the corresponding field values of t. Templates are used to allow content-addressable access to tuples in the tuple space (e.g., template ⟨1, 2, ∗⟩ matches any tuple with three fields in which 1 and 2 are the values of the first and second fields, respectively).
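To make the matching rule concrete, the following sketch implements the predicate m(t, t̄) in Java (the representation of tuples as arrays of objects and the use of null for undefined fields are our own illustrative choices, not the paper's):

// Matching predicate m(t, tBar): same number of fields, and every defined
// template field is equal to the corresponding tuple field (null = wildcard).
static boolean matches(Object[] tuple, Object[] template) {
    if (tuple.length != template.length) return false;
    for (int i = 0; i < template.length; i++) {
        if (template[i] != null && !template[i].equals(tuple[i])) return false;
    }
    return true;
}

For example, matches(new Object[]{1, 2, 3}, new Object[]{1, 2, null}) returns true, mirroring the ⟨1, 2, ∗⟩ example above.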
A tuple space provides three basic operations [2]: out(t) that outputs/inserts the entry t in the tuple space; inp(t̄) that reads and removes some tuple that matches t̄ from the tuple space; and rdp(t̄) that reads a tuple that matches t̄ without removing it from the space. The inp and rdp operations are non-blocking, i.e., if there is no tuple in the space that matches the template, an error code
is returned. Most tuple spaces also provide blocking versions of
these operations, in and rd. These operations work in the same way as their non-blocking versions but block until there is some matching tuple available.
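A minimal programming interface corresponding to these operations could be sketched as follows (Java; Tuple and Template are hypothetical wrappers around the field sequences described above, with null marking an undefined field and a null result standing for the error code of the non-blocking operations):

final class Tuple {
    final Object[] fields;
    Tuple(Object... fields) { this.fields = fields; }
}

final class Template {
    final Object[] fields;   // null entries are wildcards
    Template(Object... fields) { this.fields = fields; }
}

interface TupleSpace {
    void out(Tuple t);          // inserts the entry t into the space
    Tuple rdp(Template tBar);   // non-blocking read: a matching tuple, or null
    Tuple inp(Template tBar);   // non-blocking read-remove: a matching tuple, or null
    Tuple rd(Template tBar);    // blocking read: waits for a matching tuple
    Tuple in(Template tBar);    // blocking read-remove: waits for a matching tuple
}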
These few operations together with the content-addressable
capabilities of generative coordination provide a simple and
powerful programming model for distributed applications [2], [6],
[25]. The drawback of this model is that it depends on an infrastructure object (the tuple space), which is usually implemented
as a centralized server and is thus a single point of failure, the main problem addressed in this paper.
B. System Model
The system is composed of an unknown set of client processes1 Π = {p1, p2, p3, ...} which interact with a set of n servers U = {s1, s2, ..., sn} that implement a tuple space with certain dependability properties. We consider that each client process and each server has a unique id.
All communication between client processes and servers is
made over reliable authenticated point-to-point channels2 . All
servers are equipped with a local clock used to compute message
timeouts. These clocks are not synchronized so their values can
drift.
In terms of failures, we assume that an arbitrary number of
client processes and up to f ≤ ⌊(n − 1)/4⌋ servers can be subject
to Byzantine failures, i.e., they can deviate arbitrarily from the
algorithm they are specified to execute and work in collusion
to corrupt the system behavior. Clients or servers that do not
follow their algorithm in some way are said to be faulty. A
client/server that is not faulty is said to be correct. We assume
fault independence for the servers, i.e., that the probability of each
server failing is independent of another server being faulty. This
assumption can be substantiated in practice using several kinds
of diversity [26].
We assume an eventually synchronous system model [27]: in
all executions of the system, there is a bound ∆ and an instant
GST (Global Stabilization Time), so that every message sent by
a correct server to another correct server at instant u > GST is
received before u + ∆. ∆ and GST are unknown. The intuition
behind this model is that the system can work asynchronously
(with no bounds on delays) most of the time but there are stable
periods in which the communication delay is bounded (assuming
local computations take negligible time)3. This assumption of eventual synchrony is needed to guarantee the termination of the
Byzantine PAXOS [18], [19], [20]. An execution of a distributed
algorithm is said to be nice if the bound ∆ always holds and there
are no server failures.
Additionally, we use a digital signature scheme that includes a
signing function and a verification function that use pairs of public
and private keys [28]. A message is signed using the signing
function and a private key, and this signature is verified with
the verification function and the corresponding public key. We
assume that each correct server has a private key known only by
itself, and that its public key is known by all client processes
1 We also call a client process simply a client or a process.
2 These channels can easily be implemented in practice assuming fair links and using retransmissions, or, in a more practical view, using TCP over IPsec or SSL/TLS.
3 In practice this stable period has to be long enough for the algorithm to terminate, but does not need to last forever.
and servers. We represent a message signed by a server s with a
subscript σs , e.g., mσs .
C. Byzantine Quorum Systems
Quorum systems are a technique for implementing dependable
shared memory objects in message passing distributed systems
[29]. Given a universe of data servers, a quorum system is a set
of server sets, called quorums, that have a non-empty intersection.
The intuition is that if, for instance, a shared variable is stored
replicated in all servers, any read or write operation has to be done
only in a quorum of servers, not in all servers. The existence
of intersections between the quorums allows the development
of read and write protocols that maintain the integrity of the
shared variable even if these operations are performed in different
quorums.
Byzantine quorum systems are an extension of this technique
for environments in which client processes and servers can fail in
a Byzantine way [15]. Formally, a Byzantine quorum system is
a set of server quorums Q ⊆ 2^U in which each pair of quorums
intersect in sufficiently many servers (consistency) and there is
always a quorum in which all servers are correct (availability).
The servers can be used to implement one or more shared
memory objects. In this paper the servers implement a single
object – a tuple space. The servers form an f-masking quorum system, which tolerates at most f faulty servers, i.e., it masks the failure of at most that number of servers [15]. This type
of Byzantine quorum systems requires that the majority of the
servers in the intersection between any two quorums are correct,
thus ∀Q1, Q2 ∈ Q, |Q1 ∩ Q2| ≥ 2f + 1. Given this requirement, each quorum of the system must have q = ⌈(n + 2f + 1)/2⌉ servers and the quorum system can be defined as: Q = {Q ⊆ U : |Q| = q}. This implies that |U| = n ≥ 4f + 1 servers [15]. With these constraints, a quorum system with n = 4f + 1 will have quorums of 3f + 1 servers.
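As a quick illustration of these constraints (a standalone sketch, not part of the LBTS protocols):

class FMaskingQuorums {
    // Quorum size q = ceil((n + 2f + 1) / 2) for an f-masking quorum system; requires n >= 4f + 1.
    static int quorumSize(int n, int f) {
        if (n < 4 * f + 1) throw new IllegalArgumentException("need n >= 4f + 1");
        return (int) Math.ceil((n + 2.0 * f + 1) / 2.0);
    }

    public static void main(String[] args) {
        int f = 1, n = 4 * f + 1;              // n = 5 servers
        int q = quorumSize(n, f);              // q = 3f + 1 = 4
        int intersection = 2 * q - n;          // any two quorums overlap in at least 2f + 1 = 3 servers
        System.out.println(q + " " + intersection);   // prints "4 3"
    }
}

With f = 1 this gives quorums of 4 out of 5 servers, whose pairwise intersections contain 3 servers, a majority of which is guaranteed to be correct.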
D. Byzantine PAXOS
Since LBTS requires some modifications to the Byzantine
PAXOS total order protocol [18], this section briefly presents this
protocol.
The protocol begins with a client sending a signed message m
to all servers. One of the servers, called the leader, is responsible
for ordering the messages sent by the clients. The leader then
sends a PRE-PREPARE message to all servers giving a sequence
number i to m. A server accepts a PRE-PREPARE message if the
proposal of the leader is good: the signature of m verifies and no
other PRE-PREPARE message was accepted for sequence number
i. When a server accepts a PRE-PREPARE message, it sends a
PREPARE message with m and i to all servers. When a server
receives ⌈(n + f)/2⌉ PREPARE messages with the same m and i, it marks m as prepared and sends a COMMIT message with m and i to all servers. When a server receives ⌈(n + f)/2⌉ COMMIT messages
with the same m and i, it commits m, i.e., accepts that message
m is the i-th message to be delivered.
While the PREPARE phase of the protocol ensures that there
cannot be two prepared messages for the same sequence number i
(which is sufficient to order messages when the leader is correct),
the COMMIT phase ensures that a message committed with
sequence number i will have this sequence number even if the
leader is faulty.
When the leader is detected to be faulty, a leader election
protocol is used to freeze the current round of the protocol,
elect a new leader and start a new round. When a new leader
is elected, it collects the protocol state from ⌈(n + f)/2⌉ servers. The
protocol state comprises information about accepted, prepared and
committed messages. This information is signed and allows the
new leader to verify if some message was already committed with
some sequence number. Then, the new leader continues to order
messages.
III. TUPLE SPACE CORRECTNESS CONDITIONS
Informally, a tuple space has to provide the semantics presented
in Section II-A. Here we specify this semantics formally using
the notion of history [13].
A history H models an execution of a concurrent system composed of a set of processes and a shared memory object (a
tuple space in our case). A history is a finite sequence of operation
invocation events and operation response events. A subhistory S
of a history H is a subsequence of the events of H.
We specify the properties of a tuple space in terms of sequential
histories, i.e., histories in which the first event is an invocation
and each invocation is directly followed by the corresponding
response (or an event that signals the operation completion).
We represent a sequential history H by a sequence of pairs ⟨operation, response⟩ separated by commas. We also separate subhistories by commas to form a new (sub)history. We use the membership operation ∈ to mean “is a subhistory of”.
A set of histories is said to be prefix-closed if H being in the
set implies every prefix of H is also in the set. A sequential
specification for an object is a prefix-closed set of sequential
histories of that object.
A sequential specification for a tuple space is a set of prefix-closed histories of a tuple space in which any history H and any subhistory S of H satisfy the following properties4:
1) S, ⟨rdp(t̄), t⟩ ∈ H ⇒ ∃⟨out(t), ack⟩ ∈ S
2) S, ⟨rdp(t̄), t⟩ ∈ H ⇒ ∄⟨inp(t̄′), t⟩ ∈ S
3) S, ⟨inp(t̄), t⟩ ∈ H ⇒ ∃⟨out(t), ack⟩ ∈ S
4) ⟨inp(t̄), t⟩, S ∈ H ⇒ ∄⟨inp(t̄′), t⟩ ∈ S
The first property states that if a tuple t is read at a given instant, it must have been written before (the response for out(t) is an acknowledgment, ack). The other properties have the same structure: property 2 states that a tuple cannot be read after it has been removed, property 3 that a tuple can only be removed if it was written before, and property 4 that a tuple cannot be removed more than once. For simplicity, the properties assume that each tuple is unique, i.e., that it is inserted only once in the space.
The properties 1-4 presented above are sufficient to prove tuple space linearizability; however, another property, which we call no match, is needed to define when non-blocking read operations (rdp or inp) can return ⊥. This property states the following:
5) (No match) the special value ⊥ can only be returned as a result of rdp(t̄) (or inp(t̄)) if there is no tuple that matches t̄ inserted before the operation or all these tuples were removed before the operation.
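For instance (an illustrative example of our own, derived directly from the definitions above), the sequential history H = ⟨out(t), ack⟩, ⟨rdp(t̄), t⟩, ⟨inp(t̄), t⟩, with m(t, t̄), satisfies properties 1-5; appending a second ⟨inp(t̄′), t⟩ to H would violate property 4, and a history consisting only of ⟨rdp(t̄), t⟩ would violate property 1.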
We give a sequential specification of a tuple space but we want
these properties to be satisfied by LBTS even when it is accessed
concurrently by a set of processes. We guarantee this is indeed the
case by proving that LBTS is linearizable [13] (see Appendix).
This property states, informally, that operations invoked concurrently appear to take effect instantaneously sometime between
4 These properties also specify the behaviour of blocking operations, substituting inp/rdp respectively by in/rd.
their invocation and the return of a response (those not invoked
concurrently are executed in the order they are invoked). In other
words, any concurrent history of LBTS is equivalent to a sequential history. One should note that the sequential specification
given above considers each tuple individually. This is sufficient
to ensure the linearizability of LBTS given the inherent non-determinism of tuple access in the tuple space coordination model [2] (e.g., two successive rdp operations using the same template do not need to return the same tuple).
The five properties above are safety properties of a tuple space.
The liveness property we are interested in providing is wait-free
termination [14]: every correct client process that invokes a non-blocking tuple space operation eventually receives a response, independently of other client failures or tuple space contention.
IV. LINEARIZABLE BYZANTINE TUPLE SPACE
This section presents LBTS. We concentrate our discussion
only on the non-blocking tuple space operations.
A. Design Rationale and New Techniques
The design philosophy of LBTS is to use quorum-based
protocols for read (rdp) and write (out) operations, and an
agreement primitive for the read-remove operation (inp). The
implementation of this philosophy requires the development of
some new techniques, described in this section.
To better understand these techniques let us recall how basic
quorum-based protocols work. Traditionally, the objects implemented by quorums are read-write registers [15], [30], [31], [32],
[33], [34]. The state of a register in each replica is represented by
its current value and a timestamp (a kind of “version number”).
The write protocol usually consists of (i.) reading the register's current timestamp from a quorum, (ii.) incrementing it, and (iii.) writing the new value with the new timestamp in a quorum (deleting the old value). In the read protocol, the standard procedure is (i.) reading the timestamp-value pair from a quorum and (ii.) applying some read consolidation rule such as “the current value of the register is the one associated with the greatest timestamp that appears at least f + 1 times” to define what is the current value stored
in the register. To ensure register linearizability (a.k.a. atomicity)
two techniques are usually employed: write-backs – the read value
is written again in the system to ensure that it will be the result of
subsequent reads (e.g., [32], [33]) – or the listener communication
pattern – the reader registers itself with the quorum system servers
for receiving updates on the register values until it receives the
same register state (timestamp-value) from a quorum, ensuring
that this state will be observed in subsequent reads (e.g., [30],
[31], [34]).
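For background only, the read consolidation rule for a timestamped register could be sketched as follows (this is the standard register technique, not an LBTS protocol; the names and the use of String values are ours):

import java.util.*;

class RegisterReadRule {
    record Reply(long timestamp, String value) {}

    // The current register value is the one with the greatest timestamp that is
    // reported by at least f + 1 replies (and hence vouched for by a correct server).
    static String consolidate(List<Reply> replies, int f) {
        Map<Reply, Integer> count = new HashMap<>();
        for (Reply r : replies) count.merge(r, 1, Integer::sum);
        return count.entrySet().stream()
                .filter(e -> e.getValue() >= f + 1)
                .map(Map.Entry::getKey)
                .max(Comparator.comparingLong(Reply::timestamp))
                .map(Reply::value)
                .orElse(null);   // nothing vouched for yet: the reader keeps collecting replies
    }
}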
In trying to develop a tuple space object using these techniques
two differences between this object and a register were observed:
(1.) the state of the tuple space (the tuples it contains) can be
arbitrarily large and (2.) the inp operation cannot be implemented
by read and write protocols due to the requirement that the
same tuple cannot be removed by two concurrent operations.
Difference (1.) makes it difficult to use timestamps to define what is the current state of the space (the state can be arbitrarily large), while difference (2.) requires that concurrent inp operations are executed in total order by all servers. The challenge is how to
develop quorum protocols for implementing an object that does
not use timestamps for versioning and, at the same time, requires
a total order protocol in one operation. To solve these problems,
we developed three algorithmic techniques.
The first technique introduced in LBTS serves to avoid timestamps in a collection object (one whose state is composed of a set of items added to it): we partition the state of the tuple space into infinitely many simpler objects, the tuples, which have three states: not inserted, inserted, and removed. This means that
when a process invokes a read operation, the space chooses the
response from the set of matching tuples that are in the inserted
state. So, it does not need the version (timestamp) of the tuple
space, because the read consolidation rule is applied to tuples and
not to the space state.
The second technique is the application of the listener communication pattern in the rdp operation, to ensure that the usual
quorum reasoning (e.g., a tuple can be read if it appears in
f + 1 servers) can be applied in the system even in parallel with
executions of Byzantine PAXOS for inp operations. In the case of
a tuple space, the inp operation is the single read-write operation: “if there is some tuple that matches t̄ in the space, remove it”. The
listener pattern is used to “fit” the rdp between the occurrence
of two inp operations. As will be seen in Section IV-B.4, the
listener pattern is not used to ensure linearizability (as in previous
works [30], [31], [34]), but for capturing replicas’ state between
removals. Linearizability is ensured using write-backs.
The third technique is the modification of the Byzantine PAXOS
algorithm to allow the leader to propose the order plus a candidate
result for an operation, allowing the system to reach an agreement
even when there is no state agreement between the replicas. This
is the case when the tuple space has to select a tuple to be
removed that is not present in all servers. Notice that, without
this modification, two agreements would have to be executed:
one to decide which inp would be the first to remove a tuple, in
case of concurrency (i.e., to order inp requests), and another to
decide which tuple would be the result of the inp.
Another distinguishing feature of LBTS is the number of replicas it requires. The minimal number of replicas required for asynchronous Byzantine-resilient quorum and consensus protocols is 3f + 1 [34], [35]5. However, LBTS requires n ≥ 4f + 1 replicas. Using 3f + 1 replicas implies a quorum system with self-verifiable data, which requires a cryptographically expensive two-step
preparing phase for write protocols [32], or the use of timestamps
for the tuple space, which requires two additional steps in the
out protocol that are vulnerable to timestamp exhaustion attacks.
Solving this kind of attack requires asymmetric cryptography [32],
threshold cryptography [31] or 4 f +1 replicas [30]. Moreover, the
use of f more replicas allows the use of authenticators [18] in
some operations that require digital signatures (see Section V).
In summary, we trade f more replicas for simplicity and enhanced
performance.
B. Protocols
Before presenting the LBTS protocols, we will describe some
additional assumptions taken into account by the algorithms and
the variables maintained by each LBTS server. The complete
proof of the LBTS protocols is presented in the Appendix.
1) Additional Assumptions: We adopt several simplifications
to improve the presentation of the protocols, namely:
5 Without using special components like in [36], or weakening the protocol semantics, like in [37], in which case only n ≥ 2f + 1 suffices.
• To allow us to represent local tuple spaces as sets, we assume
that all tuples are unique. In practice this might be enforced
by appending to each tuple an opaque field containing its
writer id and a sequence number generated by the writer.
A faulty client would be prevented from inserting a tuple
without its id, due to the channels being authenticated. These
invalid tuples and duplicated tuples are simply ignored by the
algorithm;
• Any message that was supposed to be signed by a server and
is not correctly signed is simply ignored;
• All messages carry nonces in order to avoid replay attacks;
• Access control is implicitly enforced. The tuple space has
some kind of access control mechanism (like access control
lists or role based access control) specifying what processes
can insert tuples in it (space-level access control) and each
tuple has two sets of processes that can read and remove it
(tuple-level access control) [38]. The basic idea is that each
tuple carries the credentials required to read and remove it,
and a client can only remove or read a tuple if it presents
credentials to execute the operation. Notice that this access
control can be implemented inside the local tuple space on
each server, and does not need to appear explicitly in our
algorithms;
• The algorithms are described considering a single tuple
space, but their extension to support multiple tuple spaces
is straightforward: a copy of each space is deployed in each
server and all protocols are executed in the scope of one of the spaces, adding a field in each message indicating which
tuple space is being accessed;
• The reactions of the servers to message receptions are
atomic, i.e., servers are not preempted during execution of
some processing message code block (e.g., lines 3-6 in
Algorithm 1).
2) Protocol Variables: Before we delve into the protocols, we have to introduce four variables stored in each server s: Ts, rs, Rs and Ls. Ts is the local copy of the tuple space T in this server.
The variable rs gives the number of tuples previously removed
from the tuple space replica in s. The set Rs contains the tuples
already removed from the local copy of the tuple space (Ts ). We
call Rs the removal set and we use it to ensure that a tuple is
not removed more than once from Ts . Variable rs is updated only
by the inp protocol. Later, in Section IV-B.5, we see that this
operation is executed in all servers in the same order. Therefore,
the value of rs follows the same sequence in all correct servers.
In the basic protocol (without the improvements of Section V)
rs = |Rs |. Finally, the set Ls contains all clients registered to
receive updates from this tuple space. This set is used in the
rdp operation (Section IV-B.4). The protocols use a function
send(to, msg) to send a message msg to the recipient to, and a
function receive(from, msg) to receive a message, where from is
the sender and msg the message received.
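For concreteness, the per-server state just described could be represented as below (a sketch only; Tuple and Template are the hypothetical types of the sketch in Section II-A, assumed to have value-based equality):

import java.util.*;

class LbtsServerState {
    final Set<Tuple> Ts = new HashSet<>();            // local copy of the tuple space
    long rs = 0;                                      // number of removals already executed locally
    final Set<Tuple> Rs = new HashSet<>();            // removal set: tuples already removed
    final Set<Registration> Ls = new HashSet<>();     // clients registered as listeners for rdp

    record Registration(String clientId, Template template) {}
}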
3) Tuple Insertion (out): Algorithm 1 presents the out protocol.
When a process p wants to insert a tuple t in the tuple space, it
sends t to all servers (line 1) and waits for acknowledgments
from a quorum of servers6 (line 2). At the server side, if the tuple is not in the removal set Rs, i.e., if it has not already been removed (line 3), it is inserted in the tuple space (line 4).
6 In fact, it would be possible to implement this step by sending the message to a quorum and then, periodically, to other servers, until there are responses from a quorum.
An acknowledgment is returned (line 6).
Algorithm 1 out operation (client p and server s).
{CLIENT}
procedure out(t)
1: ∀s ∈ U, send(s, ⟨OUT, t⟩)
2: wait until ∃Q ∈ Q : ∀s ∈ Q, receive(s, ⟨ACK-OUT⟩)
{SERVER}
upon receive(p, ⟨OUT, t⟩)
3: if t ∉ Rs then
4:   Ts ← Ts ∪ {t}
5: end if
6: send(p, ⟨ACK-OUT⟩)
With this simple algorithm a faulty client process can insert a
tuple in a subset of the servers. In that case, we say that it is an
incompletely inserted tuple. The number of incomplete insertions
made by a process can be bounded to one, as described in Section
V. As can be seen in the next sections, rdp (resp. inp) operations are
able to read (resp. remove) such a tuple if it is inserted in at least
f + 1 servers.
Notice that this protocol is always fast (terminates in two communication steps) [39]. Additionally, this protocol is confirmable
[37], i.e., a process executing out knows when the operation
ends. This is important because a protocol with this property
gives ordered semantics to LBTS’ out operation, which makes
the coordination language provided by LBTS Turing powerful
[40]. This means that the tuple space implemented by LBTS is
equivalent to a Turing machine and thus can be used to implement
any computable function.
4) Tuple Reading (rdp): rdp is implemented by the protocol
presented in Algorithm 2. The protocol is trickier than the previous one for two reasons. First, it employs the listener communication pattern to capture the replicas' state between
removals. Second, if a matching tuple is found, the process may
have to write it back to the system to ensure that it will be read
in subsequent reads, satisfying the linearizability property.
When rdp(t̄) is called, the client process p sends the template t̄ to the servers (line 1). When a server s receives this message, it registers p as a listener, and replies with all tuples in Ts that match t̄ 7 and the current number of tuples already removed rs (lines 18-20). While p is registered as a listener, whenever a tuple is added to or removed from the space, a set with the tuples that match t̄ is sent to p 8 (lines 28-31).
Process p collects replies from the servers, putting them in the
Replies matrix, until it manages to have a set of replies from
a quorum of servers reporting the state after the same number
of tuple removals r (lines 2-6)9 . After that, a RDP-COMPLETE
message is sent to the servers (line 8).
The result of the operation depends on a single row r of the
matrix Replies. This row represents a cut on the system state in
which a quorum of servers processed exactly the same r removals,
so, in this cut, quorum reasoning can be applied. This mechanism
is fundamental to ensure that agreement algorithms and quorum-based protocols can be used together for different operations,
7 If there are many tuples, only a given number of the oldest tuples are sent,
and the client can request more as needed.
8 In practice, only the update is sent to p.
9 We use a matrix in the algorithm just to simplify the exposition. In practice
this matrix is very sparse so it would have to be implemented using some
other structure, like a list.
Algorithm 2 rdp operation (client p and server s).
{CLIENT}
procedure rdp(t̄)
1: ∀s ∈ U, send(s, ⟨RDP, t̄⟩)
2: ∀x ∈ {1, 2, ...}, ∀s ∈ U, Replies[x][s] ← ⊥
3: repeat
4:   wait until receive(s, ⟨REP-RDP, s, Ts^t̄, rs⟩σs)
5:   Replies[rs][s] ← ⟨REP-RDP, s, Ts^t̄, rs⟩σs
6: until ∃r ∈ {1, 2, ...}, {s ∈ U : Replies[r][s] ≠ ⊥} ∈ Q
7: {From now on r indicates the r of the condition above}
8: ∀s ∈ U, send(s, ⟨RDP-COMPLETE, t̄⟩)
9: if ∃t, count_tuple(t, r, Replies[r]) ≥ q then
10:   return t
11: else if ∃t, count_tuple(t, r, Replies[r]) ≥ f + 1 then
12:   ∀s ∈ U, send(s, ⟨WRITEBACK, t, r, Replies[r]⟩)
13:   wait until ∃Q ∈ Q : ∀s ∈ Q, receive(s, ⟨ACK-WB⟩)
14:   return t
15: else
16:   return ⊥
17: end if
{SERVER}
upon receive(p, ⟨RDP, t̄⟩)
18: Ls ← Ls ∪ {⟨p, t̄⟩}
19: Ts^t̄ ← {t ∈ Ts : m(t, t̄)}
20: send(p, ⟨REP-RDP, s, Ts^t̄, rs⟩σs)
upon receive(p, ⟨RDP-COMPLETE, t̄⟩)
21: Ls ← Ls \ {⟨p, t̄⟩}
upon receive(p, ⟨WRITEBACK, t, r, proof⟩)
22: if count_tuple(t, r, proof) ≥ f + 1 then
23:   if t ∉ Rs then
24:     Ts ← Ts ∪ {t}
25:   end if
26:   send(p, ⟨ACK-WB⟩)
27: end if
upon removal of t from Ts or insertion of t in Ts
28: for all ⟨p, t̄⟩ ∈ Ls : m(t, t̄) do
29:   Ts^t̄ ← {t′ ∈ Ts : m(t′, t̄)}
30:   send(p, ⟨REP-RDP, s, Ts^t̄, rs⟩σs)
31: end for
Predicate: count_tuple(t, r, msgs) ≜ |{s ∈ U : msgs[s] = ⟨REP-RDP, s, Ts^t̄, r⟩σs ∧ t ∈ Ts^t̄}|
one of the novel ideas of this paper. If there is some tuple t in
Replies[r] that was replied by all servers in a quorum, then t is the
result of the operation (lines 9-10). This is possible because this
quorum ensures that the tuple can be read in all subsequent reads,
thus ensuring linearizability. On the contrary, if there is no tuple
replied by an entire quorum, but there is still some tuple t returned
by more than f servers10 for the same value of r, then t is written back to the servers (lines 11-14). The purpose of this write-back operation is to ensure that if t has not been removed until r, then it will be readable by all subsequent rdp(t̄) operations requested by any client with m(t, t̄), until t is removed. Therefore, the
write-back is necessary to handle incompletely inserted tuples.
Upon the reception of a write-back message ⟨WRITEBACK, t, r, proof⟩, server s verifies if the write-back is justified, i.e., if proof includes at least f + 1 correctly signed REP-RDP messages from different servers with r and t (line 22). A write-back that is not justified is ignored by correct
10 If a tuple is returned by less than f + 1 servers it can be a tuple that has not been inserted in the tuple space, created by a collusion of faulty servers.
servers. After this verification, if t is not already in Ts and has
not been removed, then s inserts t in its local tuple space (lines
23-24). Finally, s sends an ACK-WB message to the client (line
26), which waits for these replies from a quorum of servers and
returns t (lines 13-14).
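The client-side decision of Algorithm 2 (lines 9-17) boils down to the rule sketched below, where counts maps each tuple to the number of REP-RDP replies in row Replies[r] that contain it (our own simplification; signatures and the actual write-back message exchange are omitted, and Tuple is the type from the earlier sketches):

import java.util.*;

class RdpDecision {
    // q is the quorum size and f the failure threshold; returns the tuple to read,
    // or null to represent ⊥. needsWriteBack[0] is set when lines 11-14 apply.
    static Tuple decide(Map<Tuple, Integer> counts, int q, int f, boolean[] needsWriteBack) {
        for (Map.Entry<Tuple, Integer> e : counts.entrySet())
            if (e.getValue() >= q) return e.getKey();      // replied by a whole quorum: safe to return
        for (Map.Entry<Tuple, Integer> e : counts.entrySet())
            if (e.getValue() >= f + 1) {                   // vouched for by at least one correct server
                needsWriteBack[0] = true;                  // must be written back before returning
                return e.getKey();
            }
        return null;                                       // no tuple can be read: result is ⊥
    }
}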
5) Tuple Destructive Reading (inp): The previous protocols
are implemented using only Byzantine quorum techniques. The
protocol for inp, on the other hand, requires stronger abstractions.
This is a direct consequence of the tuple space semantics that does
not allow inp to remove the same tuple twice (once removed it is
no longer available). This is what makes the tuple space shared
memory object have consensus number two [16].
An approach to implement this semantics is to execute all inp
operations in the same order in all correct servers. This can be
done using a total order multicast protocol based on the Byzantine
PAXOS algorithm (see Section II-D). A simple approach would
be to use it as an unmodified building block, but this requires
two executions of the protocol for each inp [41]. To avoid this
overhead, the solution we propose is based on modifying this
algorithm in four specific points:
1) When the leader l receives a request inp(t̄) from client p (i.e., a message ⟨INP, p, t̄⟩), it sends to the other servers a PRE-PREPARE message with not only the sequence number i but also ⟨tt, ⟨INP, p, t̄⟩σp⟩σl, where tt is a tuple in Tl that matches t̄. If there is no tuple that matches t̄ in Tl, then tt = ⊥.
2) A correct server s′ accepts to remove the tuple tt proposed
by the leader in the PRE-PREPARE message if: (i.) the
usual Byzantine PAXOS conditions for acceptance described
in Section II-D are satisfied; (ii.) s′ did not accept the
removal of tt previously; (iii.) tt and t̄ match; and (iv.) tt is not forged, i.e., either tt ∈ Ts′ or s′ received f + 1 signed messages from different servers ensuring that they have tt in their local tuple spaces. This last condition ensures that
a tuple t can be removed if and only if it can be read, i.e.,
only if at least f + 1 servers report having it.
3) When a new leader l ′ is elected, each server sends its
protocol state to l ′ (as in the original total order Byzantine
PAXOS algorithm11 ) and a signed set with the tuples in its
local tuple space that match t̄. This information is used by
l ′ to build a proof for a proposal with a tuple t (in case
it gets that tuple from f + 1 servers). If there is no tuple
reported by f + 1 servers, this set of tuples justifies a ⊥
proposal. This condition can be seen as a write-back from
the leader in order to ensure that the tuple will be available
in sufficiently many replicas before its removal.
4) A client waits for ⌈(n + 1)/2⌉ matching replies from different servers to consolidate the result of its request, instead of f + 1 as in Byzantine PAXOS. This increase in the number of expected responses before completing an operation ensures that at least f + 1 servers (hence at least one correct server) are in the intersection between the set of servers that report a tuple
removal and the quorums of servers accessed in subsequent
rdps. Ultimately, it ensures that a tuple will never be read
after its removal.
Given these modifications to the total order protocol, an inp operation is executed as shown in Algorithm 3.
11 The objective is to ensure that a value decided by some correct server
in some round (i.e., the request sequence number and the reply) will be the
only possible decision in all subsequent rounds.
Algorithm 3 inp operation (client p and server s).
{CLIENT}
procedure inp(t̄)
1: TO-multicast(U, ⟨INP, p, t̄⟩)
2: wait until receive ⟨REP-INP, tt⟩ from ⌈(n + 1)/2⌉ servers in U
3: return tt
{SERVER}
upon paxos_leader(s) ∧ Ps ≠ ∅
4: for all ⟨INP, p, t̄⟩ ∈ Ps do
5:   i ← i + 1
6:   if ∃t ∈ Ts : m(t, t̄) ∧ ¬marked(t) then
7:     tt ← t
8:     mark(i, t)
9:   else
10:     tt ← ⊥
11:   end if
12:   paxos_propose(i, ⟨tt, ⟨INP, p, t̄⟩⟩)
13: end for
upon paxos_deliver(i, ⟨tt, ⟨INP, p, t̄⟩⟩)
14: unmark(i)
15: Ps ← Ps \ {⟨INP, p, t̄⟩}
16: if tt ≠ ⊥ then
17:   if tt ∈ Ts then
18:     Ts ← Ts \ {tt}
19:   end if
20:   Rs ← Rs ∪ {tt}
21:   rs ← rs + 1
22: end if
23: send(p, ⟨REP-INP, tt⟩)
For a client p, the inp(t̄) algorithm works exactly as if the
replicated tuple space was implemented using state machine
replication based on Byzantine PAXOS [18]: p sends a request
to all servers and waits until ⌈(n + 1)/2⌉ servers reply with the same
response, which is the result of the operation (lines 1-3).
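On the client side, this consolidation amounts to waiting for ⌈(n + 1)/2⌉ identical REP-INP replies, as in the sketch below (our names; Tuple as in the earlier sketches, and a ⊥ reply would need a distinct sentinel Tuple value):

import java.util.*;

class InpConsolidation {
    // Returns the operation result once ceil((n + 1) / 2) servers have replied with
    // the same tuple, or null while more replies are still needed.
    static Tuple consolidate(List<Tuple> repliesSoFar, int n) {
        int needed = (n + 2) / 2;   // integer form of ceil((n + 1) / 2)
        Map<Tuple, Integer> count = new HashMap<>();
        for (Tuple t : repliesSoFar)
            if (count.merge(t, 1, Integer::sum) >= needed) return t;
        return null;                // keep waiting for more REP-INP messages
    }
}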
On the server side, the inp requests received are inserted in the pending set Ps. When this set is not empty, the code in lines 4-13 is executed by the leader (the predicate paxos_leader(s) is true iff s is the current leader). For each pending request in Ps, a sequence number is attributed (line 5). Then, the leader picks a tuple from the tuple space that matches t̄ (lines 6-7) and marks it with its sequence number to prevent it from being removed (line 8). The procedure mark(i, t) marks the tuple as the one proposed to be removed in the i-th removal, while the predicate marked(t) indicates whether t is marked for removal. If no unmarked tuple matches t̄, ⊥ is proposed for the Byzantine PAXOS agreement (using the aforementioned PRE-PREPARE message), i.e., it is sent to the other servers (lines 10, 12). The code in lines 4-13 corresponds to modification 1
above. Modifications 2 and 3 do not appear in the code since
they are reasonably simple changes on the Byzantine PAXOS
algorithm.
When the servers reach agreement about the sequence number
and the tuple to remove, the paxos_deliver predicate becomes true and lines 14-23 of the algorithm are executed. Then,
each server s unmarks any tuple that it marked for removal with
the sequence number i (line 14) and removes the ordered request
from Ps (line 15). After that, if the result of the operation is a
valid tuple tt , the server verifies if it exists in the local tuple space
Ts (line 17). If it does, it is removed from Ts (line 18). Finally,
tt is added to Rs , the removal counter rs is incremented and the
result is sent to the requesting client process (lines 20-23).
It is worth noticing that Byzantine PAXOS usually does not
employ public-key cryptography when the leader does not change.
The signatures required by the protocol are made using authenticators, which are vectors of message authentication codes [18].
However, modification 3 requires that the signed set of tuples be sent to a new leader when it is elected. Therefore, our
inp protocol requires public-key cryptography, but only when the
operation cannot be resolved in the first Byzantine PAXOS round
execution.
V. IMPROVEMENTS AND OPTIMIZATIONS
This section describes several improvements and optimizations
that can be made on the presented protocols.
A. Optimizing rdp
The protocol in Algorithm 2 usually does not require the
write-back phase when there are no faulty servers and there
are many tuples in the space that match the requested template.
In that case it is very likely that some tuple will be replied
by a complete quorum of servers, thus avoiding the need for
verifying if write-backs are justified, something that would imply
verifying the signatures of a set of REP-RDP messages. Public-key cryptography has been shown to be a major bottleneck in practical Byzantine fault-tolerant systems [18], especially in LANs
and other high speed networks, where the communication latency
is lower and public-key cryptography processing costs dominate
the operation latency. To avoid using public-key signatures we
propose the following optimization on the rdp protocol: the client
first accesses a quorum of servers asking for the tuples that match
the template (without requiring signed REP-RDP messages from
servers or the use of the listener pattern). If there is some tuple
t returned by the whole quorum of servers, then the operation is
finished and t is the result. If no tuple is returned by more than f
servers then the operation is finished and the result is ⊥. If some
tuple is returned by at least f + 1 servers, it means that possibly
a write-back will be needed, so the protocol from Algorithm 2 is
used. Notice that this optimization does not affect the correctness
of the protocol since the only delicate operation, the write-back,
is executed using the normal protocol.
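This optimistic first phase can be summarized as follows (a sketch under the same conventions as the read-decision sketch in Section IV-B.4; counts maps each tuple to the number of servers in the accessed quorum that returned it):

import java.util.*;

class OptimisticRdp {
    enum Outcome { DONE_TUPLE, DONE_NO_MATCH, RUN_FULL_PROTOCOL }

    static Outcome firstPhase(Map<Tuple, Integer> counts, int q, int f) {
        for (int c : counts.values())
            if (c >= q) return Outcome.DONE_TUPLE;             // a whole quorum agrees: no write-back needed
        for (int c : counts.values())
            if (c >= f + 1) return Outcome.RUN_FULL_PROTOCOL;  // a write-back may be needed: use Algorithm 2
        return Outcome.DONE_NO_MATCH;                          // the result is ⊥
    }
}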
B. Bounding Memory
As described in Section IV-B, the Rs set stores all tuples
removed from the space at a particular tuple space server s. In
long executions, where a large number of tuples are inserted and
removed from the tuple space, Rs will use an arbitrary amount
of memory. Even considering that, in practice, we can bound
the memory size by discarding old tuples from this set, since
insertions and removals cannot take an infinite amount of time to
complete12 , in theoretical terms, this requires unbounded memory
in servers. We can modify the protocols to make servers discard
removed tuples.
The first need for unbounded memory is to prevent clients executing rdp from writing back a tuple, whose removal is concurrent, to servers that have already removed it (line 23 of Algorithm 2). In this way, the requirement is to prevent
12 Recall that the removed tuples are stored in this set only to prevent them
from being removed more than once during their insertion and/or removal.
tuples being removed from being written back. This can be implemented by making each server s store the removed tuple t as long as there
are rdp operations that started before the removal of t on s. More
specifically, a removed tuple t must be stored in Rs until there
are no clients in Ls (listeners set) that began their rdp operations
in s (lines 1-17 of Algorithm 2) before the removal of t from
s (lines 14-23 of Algorithm 3). In this way, the Rs set can be
“cleaned” when s receives a message RDP-COMPLETE (line 21
of Algorithm 2).
The second need for unbounded memory in LBTS is avoiding more than one removal of the same tuple. This problem can
happen if some tuple t is removed while its insertion is not
complete, i.e., t is not in the local tuple space of a quorum of
servers and is removed from the tuple space (recall that a tuple
can be removed if it appears in at least f + 1 servers). To solve
this problem, we have to implement a control that ensures that
a tuple that was not present in the local tuple space of server s
(Ts ) when it was removed from the replicated tuple space cannot
be inserted in Ts after its removal (otherwise it could be removed
more than once). More specifically, a removed tuple t will be
removed from Rs only if t was received (inserted) by s.
To summarize, a tuple can only be removed from the Rs set
when (1.) there is no rdp concurrent with the removal and (2.)
when the tuple was already received by s.
These two modifications make the number of tuples stored in Rs at a given time directly dependent on the number of out operations being executed concurrently with removals in s at that time (in the worst case, the concurrency of the system) and on the number of partially inserted tuples in LBTS (which can be bounded to one per faulty client, as shown below).
C. Bounding Faulty Clients
The LBTS protocols as described allow a malicious client to
partially insert an unbounded number of tuples in a space. Here
we present some modifications to the out protocol to bound to
one the number of incomplete writes that a faulty client can make.
The modifications are based on the use of insertion certificates:
sets of signed ACK-OUT messages from a quorum of servers
corresponding to the replies from an OUT message [32]. Each
insertion has a timestamp provided by the client. Each server
stores the greatest timestamp used by a client. To insert a new
tuple, a client must send an OUT message with a timestamp greater than the last one it used, together with the insertion certificate of the previously completed write. A server accepts an insertion if the presented insertion certificate is valid, i.e., if it contains at least 2f + 1 correctly signed ACK-OUT messages.
There are three main implications of this modification. First, it
requires that all ACK-OUT messages are signed by the servers.
However, we can avoid public-key signatures since we are assuming n ≥ 4f + 1 servers, and with this number of servers we
can use authenticators [18], a.k.a. MAC vectors, that implement
signatures using only symmetric cryptography. In this way, an
insertion certificate is valid for a server s if and only if it contains
q − f messages correctly authenticated for this server (the MAC
field corresponding to s contains a correct signature). The second
implication is the use of FIFO channels between clients and
servers, which were not needed in the standard protocol. The final implication, also related to FIFO channels, is that each server must store the timestamp of the last write of a client to evaluate whether the insertion certificate is correct; consequently, the amount of
memory needed by the algorithm is proportional to the number
of clients of the system13 .
Notice that this modification can be implemented in a layer below the LBTS out algorithm, without requiring any modification to it. Notice also that the use of authenticators adds very modest
latency to the protocol [18].
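A server-side validity check for an insertion certificate could be sketched as follows (our names; MacVerifier stands in for whatever authenticator scheme is used, and q is the quorum size):

import java.util.*;

class CertificateCheck {
    record AckOut(int senderId, long timestamp) {}                      // one authenticated ACK-OUT message
    interface MacVerifier { boolean verify(AckOut ack, int receiverId); }

    // Server myId accepts the certificate if at least q - f ACK-OUT messages from
    // distinct servers, referring to the client's previous timestamp, carry a MAC
    // that is correct for this server.
    static boolean valid(List<AckOut> cert, long previousTs, int q, int f,
                         int myId, MacVerifier mac) {
        Set<Integer> vouchers = new HashSet<>();
        for (AckOut a : cert)
            if (a.timestamp() == previousTs && mac.verify(a, myId)) vouchers.add(a.senderId());
        return vouchers.size() >= q - f;
    }
}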
VI. MINIMAL LBTS
As stated in Section IV-A, LBTS requires a sub-optimal number of replicas (n ≥ 4f + 1) to implement a tuple space with confirmable semantics. However, it is possible to implement a tuple space with optimal resilience if we sacrifice the confirmation phase of some LBTS protocols. To implement LBTS with only n ≥ 3f + 1 servers we need to use an asymmetric Byzantine quorum system [37]. This type of quorum system has two types of quorums with different sizes: read quorums with qr = ⌈(n + f + 1)/2⌉ servers and write quorums with qw = ⌈(n + 2f + 1)/2⌉ servers.
The key property of this type of system is that every read quorum intersects every write quorum in at least 2f + 1 servers. This property makes this system very similar to the f-masking quorum system used by LBTS. Thus, the adaptation of LBTS to
this kind of quorum is very simple: we need to make all write
quorum operations (out protocol and write-back phase of the rdp
protocol) non-confirmable, i.e., the request for write is sent to a
write quorum and there is no wait for confirmation (ACK-OUT
or ACK-WB messages).
In a non-confirmable protocol, the writer does not know when
its operation completes. In LBTS, an out(t) operation ends when
t is inserted in the local tuple space of qw − f correct servers.
The same happens with the write-back phase of the rdp protocol.
Missing this confirmation for an out operation has subtle but
significant impacts on the tuple space computational power: if
the protocol that implements out is non-confirmable, the resulting
operation has unordered semantics and the coordination language provided by the tuple space has less expressive power (it is not
Turing powerful) [40].
The correctness proof of Minimal LBTS is the same as that of “normal” LBTS because the intersection between write and read quorums contains at least 2f + 1 servers, which is the same intersection provided by the symmetric quorums used in LBTS.
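The asymmetric quorum sizes and their intersection can be checked with a small standalone sketch (illustrative only):

class AsymmetricQuorums {
    static int readQuorum(int n, int f)  { return (int) Math.ceil((n + f + 1) / 2.0); }        // qr
    static int writeQuorum(int n, int f) { return (int) Math.ceil((n + 2.0 * f + 1) / 2.0); }  // qw

    public static void main(String[] args) {
        int f = 1, n = 3 * f + 1;                                  // n = 4: optimal resilience
        int qr = readQuorum(n, f), qw = writeQuorum(n, f);
        // every read quorum intersects every write quorum in qr + qw - n >= 2f + 1 servers
        System.out.println(qr + " " + qw + " " + (qr + qw - n));   // prints "3 4 3"
    }
}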
VII. EVALUATION
A. Distributed Algorithm Metrics
This section presents an evaluation of the system using two distributed algorithm metrics: message complexity (M.C.) and number of communication steps (C.S.). Message complexity measures the maximum number of messages exchanged between processes, so it gives some insight about the communication system usage and the algorithm scalability. The number of communication steps is the number of sequential communications between processes, so it is usually the main factor in the time needed for a distributed algorithm execution to terminate.
In this evaluation, we compare LBTS with an implementation
of a tuple space with the same semantics based on state machine
replication [24], which we call SMR-TS. SMR is a generic solution for the implementation of fault-tolerant distributed services
using replication. The idea is to make all replicas start in the
13 In practice this is not a problem, since only a single integer is required to be stored for each client.
same state and deterministically execute the same operations in
the same order in all replicas. The implementation considered
for SMR-TS is based on the total order algorithm of [18] with
the optimization for fast decision (two communication steps) in
nice executions of [19], [20]. This optimization is also considered
for the modified Byzantine PAXOS used in our inp protocol. The
SMR-TS implements an optimistic version of read operations in which all servers immediately return the value read without executing Byzantine PAXOS; the operation is successful in this optimistic phase if the process manages to collect n − f identical replies (and perceives no concurrency); otherwise, the Byzantine PAXOS protocol is executed and the read result is the response returned by at least f + 1 replicas. This condition ensures
linearizability for all executions.
                     LBTS                   SMR-TS
Operation       M.C.      C.S.        M.C.             C.S.
out             O(n)      2           O(n²)            4
rdp             O(n)      2/4         O(n)/O(n²)       2/6
inp             O(n²)     4/7         O(n²)            4

TABLE I
COSTS IN NICE EXECUTIONS
Table I evaluates nice executions of the operations in terms
of message complexity and communication steps14 . The costs of
LBTS’ operations are presented in the second and third columns
of the table. The fourth and fifth columns show the evaluation
of SMR-TS. The LBTS protocol for out is cheaper than SMR-TS in both metrics. The protocol for rdp has the same costs in
LBTS and SMR-TS in executions in which there is no matching
tuple being written concurrently with rdp. The first values in the rdp row of the table refer to this optimistic case (O(n) for message complexity, 2 for communication steps).
When a read cannot be made optimistically, the operation requires
4 steps in LBTS and 6 in SMR-TS (optimistic phase plus the
normal operation). Moreover LBTS’ message complexity is linear,
instead of O(n2 ) like SMR-TS. The protocol for inp uses a single
Byzantine PAXOS execution in both approaches. However, in
cases in which there are many tuples incompletely inserted (due
to extreme contention or too many faulty clients), LBTS might
not decide in the first round (as discussed in Section V). In this
case a new leader must be elected. We expect this situation to
be rare. Notice that LBTS’ quorum-based protocols (rdp and inp)
are fast (terminates in two communication steps) when executed
in favorable settings, matching the lower bound of [39].
The table allows us to conclude that an important advantage of LBTS when compared with SMR-TS is the fact that in SMR-TS all operations require protocols with message complexity O(n²), making simple operations such as rdp and out as complex as inp. Another advantage of LBTS is that its quorum-based operations, out and rdp, always terminate in a few communication steps, while in SMR-TS these operations rely on Byzantine PAXOS, which is only guaranteed to terminate in 4 steps in nice executions [19], [20].
The evaluation of what happens in “not nice” situations is
not shown in the table. In that case, all operations based on
Byzantine PAXOS are delayed until there is enough synchrony
14 Recall from Section II-B that an execution is said to be nice if the maximum delay ∆ always holds and there are no failures.
for the protocol to terminate (u > GST). This problem is especially relevant in systems deployed in large scale networks
and Byzantine environments in which an attacker might delay
the communication at specific moments of the execution of the
Byzantine PAXOS algorithm with the purpose of delaying its
termination.
B. Experimental Latency Evaluation
In order to assess the characteristics of LBTS real executions,
we implemented a prototype of the system and compared it with
D EP S PACE, an SMR-TS implementation [42]. Both systems are
implemented in Java and use JBP (Java Byzantine Paxos) as total
order multicast communication support (with the modifications
described in Section IV-B.5 in the case of LBTS)15 .
The experiments were conducted on a testbed consisting of
6 Dell PowerEdge 850 computers. The characteristics of the
machines are the same: a single 2.8 GHz Pentium 4 CPU
with 2 GB of RAM. The machines were connected by a Dell
PowerConnect 2724 network switch with bandwidth of 1 Gbps.
Figure 1 presents some experimental results comparing LBTS
and SMR-TS. We report operations’ latency results for different
tuple sizes in executions without faults and considering n = 5
(LBTS) and n = 4 (SMR-TS). The experiments consisted of a client inserting, reading and removing the same tuple from the space, repeated 2000 times. We calculated the mean of the measured values, excluding the 5% with the greatest variance.
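To make the measurement procedure concrete, the sketch below shows one way to implement it in Java (the language of both prototypes). The class and method names are ours and do not come from the LBTS or DEPSPACE code, and the outlier rule is our reading of "excluding the 5% with the greatest variance": dropping the samples farthest from the mean.

import java.util.Arrays;

// Minimal measurement sketch (hypothetical names): time an operation many
// times and average the samples after discarding the 5% that deviate most
// from the mean.
public class LatencyBench {
    static double meanExcludingOutliers(long[] samplesNs, double dropFraction) {
        double mean = Arrays.stream(samplesNs).average().orElse(0);
        // Sort samples by distance to the mean and keep the closest (1 - dropFraction).
        Long[] sorted = Arrays.stream(samplesNs).boxed()
                .sorted((a, b) -> Double.compare(Math.abs(a - mean), Math.abs(b - mean)))
                .toArray(Long[]::new);
        int keep = (int) Math.round(sorted.length * (1 - dropFraction));
        return Arrays.stream(sorted, 0, keep).mapToLong(Long::longValue).average().orElse(0);
    }

    static double measureMs(Runnable op, int repetitions) {
        long[] samples = new long[repetitions];
        for (int i = 0; i < repetitions; i++) {
            long start = System.nanoTime();
            op.run();                       // e.g., out(t); rdp(t); inp(t) on the replicated space
            samples[i] = System.nanoTime() - start;
        }
        return meanExcludingOutliers(samples, 0.05) / 1e6;  // nanoseconds -> milliseconds
    }

    public static void main(String[] args) {
        System.out.printf("no-op latency: %.4f ms%n", measureMs(() -> { }, 2000));
    }
}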
As expected, the results reflect the metrics shown in Table I
and provide more insights about the advantages of LBTS. Figure
1(a) shows that LBTS’ out protocol greatly outperforms SMR-TS:
it is nine times faster, instead of the two times faster suggested
by the number of communication steps of these protocols. This
happens because LBTS’ out protocol is much simpler when
compared with the Byzantine PAXOS total order protocol used
in the corresponding protocol of SMR-TS. Figure 1(b) presents the latency of both systems' rdp operations. As can be seen in this figure, the latency of this operation is almost the same for both protocols, with a small advantage for SMR-TS due to the very simple optimistic protocol that suffices for reads in the absence of faults and contention. Finally, Figure 1(c) shows the latency of
the inp operation in both systems. As expected, the results reflect the fact that both inp operations are based on the Byzantine PAXOS total order protocol. Notice that when tuples get bigger, LBTS becomes slower. This happens because the Byzantine PAXOS agreement is made over hashes of the proposed values (messages and sequence numbers) and, in the case of our modified algorithm, the leader must propose a tuple as the response, adding it to the proposed value that the replicas must agree upon. Consequently, when tuples get bigger, the messages exchanged during the Byzantine PAXOS execution get bigger, and the protocol loses performance. We do not consider this a significant problem, since tuples are not expected to be so large.
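The size effect discussed above can be illustrated with the following sketch; the ProposalSizeSketch layout is hypothetical (it is not taken from JBP), but it captures the point that the leader's proposal carries the result tuple in addition to a fixed-size digest, so the agreement messages grow linearly with the tuple.

import java.nio.ByteBuffer;
import java.security.MessageDigest;

// Illustrative sketch only: in the modified Byzantine PAXOS the leader's
// proposal carries a fixed-size digest of the ordered request plus the result
// tuple itself, so the proposal grows with the tuple size.
public class ProposalSizeSketch {
    static byte[] proposal(byte[] requestDigest, byte[] resultTuple) {
        ByteBuffer buf = ByteBuffer.allocate(requestDigest.length + resultTuple.length);
        buf.put(requestDigest).put(resultTuple);
        return buf.array();
    }

    public static void main(String[] args) throws Exception {
        MessageDigest sha = MessageDigest.getInstance("SHA-256");
        for (int tupleSize : new int[] {64, 256, 1024, 4096}) {
            byte[] tuple = new byte[tupleSize];                 // tuple being removed by inp
            byte[] digest = sha.digest(tuple);                  // fixed-size request digest
            System.out.printf("tuple=%dB  proposal=%dB%n",
                    tupleSize, proposal(digest, tuple).length); // grows linearly with the tuple
        }
    }
}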
VIII. RELATED WORK
Two replication approaches can be used to build Byzantine
fault-tolerant services: Byzantine quorum systems [15] and state
machine replication [18], [24]. The former is a data centric
approach based on the idea of executing different operations in
different intersecting sets of servers, while the latter is based on
15 Both prototypes and JBP are freely available at http://www.navigators.di.fc.ul.pt/software/depspace/.
Fig. 1. Experimental latency evaluation for LBTS and SMR-TS with n = 5 and n = 4, respectively: (a) out, (b) rdp, (c) inp. Each panel plots latency (ms) against tuple size (64, 256, 1024 and 4096 bytes).
maintaining a consistent replicated state across all servers in the
system. One advantage of quorum systems in comparison to the
state machine approach is that they do not need the operations
to be executed in the same order in the replicas, so they do not
need to solve consensus. Quorum protocols usually scale much
better due to the opportunity of concurrency in the execution
of operations and the shifting of hard work from servers to
client processes [22]. On the other hand, pure quorum protocols
cannot be used to implement objects stronger than registers (in
asynchronous systems), as opposed to state machine replication,
which is more general [17], but requires additional assumptions
on the system.
The system presented in this paper uses a Byzantine quorum
system and provides specific protocols for tuple space operations.
Only one of the protocols (inp) requires a consensus protocol,
therefore it needs more message exchanges and time assumptions
(eventual synchrony) to terminate. This requirement is justified
by the fact that tuple spaces have consensus number two [16]
according to Herlihy’s wait-free hierarchy [14], therefore they
cannot be implemented deterministically in a wait-free way in
a completely asynchronous system. To the best of our knowledge, there is only one work on Byzantine quorums that implements objects more powerful than registers in a way that is similar to ours: the Q/U protocols [22]. That work aims to implement general services using quorum-based protocols in asynchronous Byzantine systems. Since this cannot be done while ensuring wait-freedom, the approach sacrifices liveness: the operations are guaranteed to terminate only if there is no other operation executing concurrently. A tuple space built using Q/U has two main drawbacks when compared with LBTS: (i) it is not wait-free, so in a Byzantine environment malicious clients could invoke operations continuously, causing a denial of service; and (ii) it requires 5f + 1 servers, f more than LBTS, which has an impact on the cost of the system due to the cost of diversity [26].
There are a few other works on Byzantine quorums related
to ours. In [30], a non-skipping timestamps object is proposed.
This type of object is equivalent to a fetch&add register, which
is known to have consensus number two [14]. However, in
order to implement this object in asynchronous systems using
quorums, the specification is weakened in such a way that the
resulting object has consensus number 1 (like a register). Some
works propose consensus objects based on registers implemented
using quorum protocols and randomization (e.g., [43]) or failure
detectors (e.g., [44]). These works differ fundamentally from
ours since they use basic quorum-based objects (registers) to
build consensus while we use consensus to implement a more
elaborate object (tuple space). Furthermore, the coordination algorithms provided in these works require that processes know each other, a problem for open systems.
Cowling et al. proposed HQ-REPLICATION [23], an interesting
replication scheme that uses quorum protocols when there is no contention in the execution of operations and consensus protocols to resolve contention. This protocol requires n ≥ 3f + 1 replicas and processes reads and writes in 2 to 4 communication steps in contention-free executions. When contention is detected,
the protocol uses Byzantine PAXOS to order contending requests.
This contention resolution protocol adds significant latency to the protocols, reaching more than 10 communication steps even in
nice executions. Comparing LBTS with a tuple space based
on HQ-REPLICATION, in executions without contention, LBTS’
out will be faster (2 steps instead of HQ's 4), rdp will be equivalent (the protocols are similar) and inp will have the same latency in both; however, LBTS' protocol has O(n²) message complexity instead of HQ's O(n). In executions with contention, LBTS is expected to outperform HQ by orders of magnitude since its protocols are little affected by these situations. On the other hand, HQ-REPLICATION requires f fewer replicas than LBTS.
Recently, some of the authors designed, implemented and evaluated a Byzantine fault-tolerant SMR-TS called DEPSPACE [42]. As shown in Section VII, this kind of tuple space uses a much less efficient out protocol when compared with LBTS. There are
also other works that replicate tuple spaces for fault tolerance.
Some of them are based on the state machine approach (e.g., [7])
while others use quorum systems (e.g., [8]). However, none of
these proposals deals with Byzantine failures and intrusions, the
main objective of LBTS.
Several papers have proposed the integration of security mechanisms in tuple spaces. Amongst these proposals, some try to
enforce security policies that depend on the application [25],
[45], while others provide integrity and confidentiality in the
tuple space through the implementation of access control [38].
These works are complementary to LBTS, which focuses on the implementation of a Byzantine fault-tolerant tuple space and does not propose any specific protection mechanism to prevent malicious clients from executing disruptive operations on the tuple space, assuming instead that an access control mechanism is employed in accordance with the applications that use the tuple space.
The construction presented in this paper, LBTS, builds on a
preliminary solution with several limitations, BTS [41]. LBTS
goes much further in three main aspects: it is linearizable; it uses a confirmable protocol for the out operation (improving its semantic power [40]); and it implements the inp operation using a single Byzantine PAXOS execution, instead of the two used in BTS.
IX. CONCLUSIONS
In this paper we presented the design of LBTS, a Linearizable
Byzantine Tuple Space. This construction provides reliability,
availability and integrity for coordination between processes in
open distributed systems. The overall system model is based on a set of servers, of which less than a fourth may be faulty, and on an unlimited number of client processes, of which arbitrarily many can also be faulty. Given the time and space decoupling offered by the tuple space coordination model [3], this model appears to be an interesting alternative for the coordination of non-trusted processes in practical dynamic distributed systems (like P2P networks on the Internet or infrastructure-based wireless networks).
LBTS uses a novel hybrid replication technique which combines Byzantine quorum system protocols with consensus-based protocols, resulting in a design in which simple operations use simple quorum-based protocols, while a more complicated operation, which requires the synchronization of the servers, uses more complex agreement-based protocols. The integration of these
two replication approaches required the development of some
novel algorithmic techniques that are interesting by themselves.
Concerning tuple space implementation, an important contribution
of this work is the assertion that out and rdp can be implemented
using quorum-based protocols, while inp requires consensus. This
design shows important benefits when compared with the same
object implemented using state machine replication.
ACKNOWLEDGEMENTS
We warmly thank the referees, Paulo Sousa, Piotr Zielinski
and Rodrigo Rodrigues for their suggestions to improve the paper.
This work was supported by LaSIGE, the EU through project IST4-027513-STP (CRUTIAL) and CAPES/GRICES (project TISD).
APPENDIX
LBTS CORRECTNESS PROOF
In this Appendix we prove that our protocols implement a tuple space that satisfies the correctness conditions stated in Section III. We consider the protocols as presented in Section IV.
In this section, when we say that a tuple t exists in some server s, or that server s has t, we mean that t ∈ Ts. Recall that there are |U| = n ≥ 4f + 1 servers and that the quorum size is q = ⌈(n+2f+1)/2⌉.16
We begin by defining the notion of readable tuple.
Definition 1 A tuple t is said to be readable if it would be the result of a rdp(t) operation, with m(t,t), executed without concurrency in a tuple space containing only the tuple t.
The definition states that a tuple is readable if it would be the result of some rdp operation, independently of the accessed quorum or the number of failures in the system (but assuming that less than n/4 servers fail). Given the non-deterministic nature of tuple spaces, this is the only way to define, using only the tuple space specification, that some tuple is available to be read/removed. Informally, we define a readable tuple as a tuple that will certainly be the result of a matching rdp, i.e., when that tuple is the only tuple in the tuple space (ruling out the possibility of another matching tuple being read instead) and there are no operations being executed concurrently with the read (ensuring that the tuple will not be removed by a concurrent inp). Notice that any tuple returned by a rdp or inp operation from a tuple space containing several tuples is readable, according to Definition 1.
From the algorithm that implements rdp, line 11, it is simple to
infer that a tuple is readable if it exists in f + 1 correct servers in
any quorum of the system. This implies that the tuple must exist
in n − q + f + 1 correct servers ( f + 1 servers from the quorum
plus n − q servers not from the quorum). The concept of readable
tuple is used in the proofs to assert that if a tuple is correctly
inserted, then it can be read (it is readable) or that if a tuple is
correctly removed, then it cannot be read (it is not readable).
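For concreteness, this bound can be instantiated with the values suggested in footnote 16 (n = 4f + 1 and q = 3f + 1); the derivation below is only this special case of the generic condition above, not a new assumption.

% Readability threshold for n = 4f+1 and q = 3f+1:
\[
  n - q + f + 1 = (4f+1) - (3f+1) + f + 1 = 2f + 1 .
\]
% Any quorum of q = 3f+1 servers misses only n - q = f servers, so it contains
% at least (2f+1) - f = f+1 correct servers that hold the readable tuple.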
The main strategy to show that LBTS implements a linearizable
tuple space is to prove that, for every tuple t in the space, every
sequence of operations that manipulates t is linearizable (Lemmas
1-6). Given that every operation manipulates a single tuple, the tuple space object can be said to be composed of the set of tuples that are inserted in it during a run. Then, if every tuple is
linearizable, the locality property of linearizability (Theorem 1 of
[13]) ensures that the tuple space object is linearizable (Theorem
1). LBTS' wait-freedom is proved (in our system model) by showing that a correct client that invokes an operation on LBTS never stays blocked (Lemmas 7-9).
Our first lemma shows that if a tuple is readable then it can be
removed.
Lemma 1 If after an operation op a tuple t is readable, then t
will be the result of inp(t) executed immediately after op.
Proof: Assume that after op the number of removed tuples is r.
Assume a correct process p invokes inp(t). If a tuple t is readable
after r removals then there are at least n−q+ f +1 correct servers
that have t. Under this condition, we have to prove that an execution of inp(t) must return t. Let l be the current leader in the execution of the Byzantine PAXOS; we have to consider two cases:
1) l is correct – two cases have to be considered:
a) t ∈ Tl : in this case l will propose t to be the result of
the operation (and no other tuple due to the definition
of readable tuple). Since at least n − q + f + 1 correct
servers have t, they will accept this proposal and the
tuple will be the result of the operation.
b) t ∉ Tl: in this case l will propose ⊥. This value will not be accepted because t exists in n − q + f + 1 correct servers, so at most q − f − 1 < ⌈(n+f)/2⌉ servers can accept ⊥.17 Therefore, a new leader is elected. Since
no tuple was accepted for removal by n − q + f + 1
servers, this leader will be able to choose t as the
result. Any other proposal will be rejected and will
cause a new leader election. Eventually a leader with
t ∈ Tl will be elected and case 1.(a) will apply.
2) l is faulty – in this case, l can propose ⊥ or some t′ ≠ t.
If ⊥ is proposed, it will not be accepted by any of the
n − q + f + 1 correct servers that have t (because m(t,t)).
If t ′ is proposed, it will not be accepted by more than f
servers so it is not decided as the result of the operation.
The reason for this is that, by the definition of readable,
16 The proofs in this section are generic but we suggest the reader to use
n = 4 f + 1 and q = 3 f + 1 to make them simpler to understand.
17 Recall that in the Byzantine PAXOS a value can be decided only if ⌈(n+f)/2⌉ servers accept it (Section II-D).
there is no other tuple in the space that matches t, so no
correct server will have t ′ in its local tuple space (or t ′ and t
do not match). In both cases there will be at most n − q + f servers that accept the result proposed by the leader and the decision will not be reached in this round; consequently, a new leader will be elected. Depending on whether this leader is correct or not, case 1 or 2 applies again.
In both cases, the result of the inp(t) will be t.
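As a sanity check on case 1.(b), the inequality can be instantiated for n = 4f + 1 and q = 3f + 1, using the acceptance threshold recalled in footnote 17:

\[
  q - f - 1 = 2f \;<\; \left\lceil \tfrac{n+f}{2} \right\rceil = \left\lceil \tfrac{5f+1}{2} \right\rceil ,
\]
% so the at most 2f servers that do not hold t can never gather enough
% acceptances to decide \bot while t is readable.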
The following lemma proves that a tuple cannot be read before
being inserted in the space. This lemma is important because it
shows that faulty servers cannot “create” tuples.
Lemma 2 Before an ⟨out(t)/ack⟩, t is not readable.
Proof: A tuple will be considered for reading if at least f + 1 servers send it in reply to a read request. This means that at least one correct server must send it. Since no correct server will reply with the tuple t before out(t), t is not readable.
Lemma 3 After an ⟨out(t)/ack⟩ and before an ⟨inp(t)/t⟩, t is readable.
Proof: The definition of readable considers that t is the only tuple
in the tuple space that matches t. If t has been inserted by an out(t)
then there is a quorum of servers Q1 that have t. Consider a read
operation performed after that insertion but before the removal of
t from the space. We have to prove that t must be the result of a
rdp(t), assuming that m(t,t). After line 6 of the rdp(t) protocol,
we know that a quorum Q2 of servers replied with the matching tuples they had after r removals. Every correct server of Q2 that is
member of Q1 will reply t. Since the intersection of two quorums
has at least 2 f +1 servers, |Q1 ∩Q2 | ≥ 2 f +1, from which at most
f are faulty, t will be returned by at least f + 1 servers and will
be the result of the operation. Therefore, t is readable.
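The quorum intersection bound used in this argument follows directly from the quorum size:

\[
  |Q_1 \cap Q_2| \;\ge\; 2q - n \;=\; 2\left\lceil \tfrac{n+2f+1}{2} \right\rceil - n \;\ge\; 2f + 1 ,
\]
% and since at most f of these servers are faulty, at least f+1 correct servers
% of the intersection return t.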
The following lemma proves that when a tuple is read it
remains readable until it is removed.
Lemma 4 After a ⟨rdp(t)/t⟩ and before an ⟨inp(t)/t⟩, t is readable.
Proof: We know that a quorum of servers has t after a rdp(t)
operation that returns t (due to the write-back phase). Therefore,
following the same reasoning as in the proof of Lemma 3, we
can infer that t is readable.
The following two lemmas prove that it is not possible to
remove a tuple twice.
Lemma 5 After an ⟨inp(t)/t⟩, t is not readable.
Proof: Assume t was the r-th tuple removed from the space. Assume also, for the sake of simplicity, that there are no more removals in the system. If t was removed, then at least ⌈(n+1)/2⌉ > n − q + f servers would report r removals and consequently it is impossible for a client to see the tuple space state with r′ < r removals. Now,
we will prove the lemma by contradiction. Suppose that a rdp(t)
operation executed after r removals returns t. This implies that t
was reported by at least f + 1 servers that have removed r tuples.
This is clearly impossible because no correct server will report t
after removing it.
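The inequality used in this proof can be verified directly from the quorum size:

\[
  n - q + f \;=\; n - \left\lceil \tfrac{n+2f+1}{2} \right\rceil + f \;=\; \left\lfloor \tfrac{n-1}{2} \right\rfloor \;<\; \left\lceil \tfrac{n+1}{2} \right\rceil ;
\]
% for n = 4f+1 this is simply 2f < 2f+1.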
Lemma 6 A tuple t cannot be removed more than once.
Proof: Due to the safety properties of the Byzantine PAXOS total order algorithm, all removals are executed sequentially, one after another. Modification 2 of this algorithm, described in Section IV-B.5, prevents correct servers from accepting the removal of tuples that were already removed.
The following lemmas state that the three operations provided
by LBTS satisfy wait-freedom [14], i.e., that they always terminate in our system model when invoked by a correct client.
Lemma 7 Operation out is wait-free.
Proof: An inspection of Algorithm 1 shows that the only place in
which it can block is when waiting for replies from a quorum of
servers (q servers) in line 2. Since all correct servers reply and
q ≤ n − f (availability property in Section II-C), the algorithm
does not block.
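The availability property invoked here is itself a direct consequence of the resilience bound:

\[
  q = \left\lceil \tfrac{n+2f+1}{2} \right\rceil \le n - f
  \;\Longleftrightarrow\;
  n + 2f + 1 \le 2(n - f)
  \;\Longleftrightarrow\;
  n \ge 4f + 1 ,
\]
% so a quorum of correct servers always exists and the wait in line 2 of
% Algorithm 1 always unblocks.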
Lemma 8 Operation rdp is wait-free.
Proof: In the first phase of Algorithm 2, client p waits for replies
from a quorum of servers that removed r tuples. The Byzantine
PAXOS guarantees that if a correct server removed r tuples then
eventually all other correct servers will also remove r tuples.
The listener pattern makes each server notify p of the tuples that match its template after each removal, so eventually there will be some r for which all correct servers have replied to the read operation after r removals. Therefore, the first phase of the algorithm always terminates.
The write-back phase, when necessary, also satisfies wait-freedom since the condition for the client to unblock is the reception of confirmations from a quorum of servers. Since there is always a quorum of correct servers (as q ≤ n − f) and these servers always reply to a correctly justified write-back, the client cannot block.
Lemma 9 Operation inp is wait-free.
Proof: The liveness of this operation depends on the modified Byzantine PAXOS. This protocol guarantees that new leaders will be elected until there is some correct leader in a “synchronous” round (i.e., a round in which all messages are exchanged within certain time limits and the leader is not suspected). In this round, the leader will propose a sequence number and a result t for the invoked inp(t) operation. The properties of Byzantine PAXOS ensure that the sequence number is valid. The result tuple t will be decided if it is accepted by ⌈(n+f)/2⌉ servers.
This happens if the out() protocol was correctly executed by the
client that inserted t in the space, or if t is correctly justified by a
proposal certificate. Therefore, we have to consider three cases:
1) t was completely inserted. In this case q = ⌈(n+2f+1)/2⌉ servers have t and at least q − f of them (the correct ones) will accept the proposal. Since q − f ≥ ⌈(n+f)/2⌉ always holds for n ≥ 4f + 1, this proposal will eventually be decided.
2) t was partially inserted. If t exists in ⌈(n+f)/2⌉ servers, it will be accepted and decided as the result of the operation. If t was justified by a proposal certificate showing that t exists in at least f + 1 servers, it will be accepted and eventually decided. If t neither exists in ⌈(n+f)/2⌉ servers nor is justified by a proposal certificate, this value will not be accepted by sufficiently many servers and a new leader will be elected.
This leader will receive the matching tuples of all servers
and then, if it is correct, will choose a result that can be
justified for the operation.
3) t = ⊥. This value will be accepted by a server if there is
no tuple that matches the given template or if this value is
justified by a proposal certificate (if the certificate shows
that there is no tuple that exists in f + 1 servers). If this value is accepted by ⌈(n+f)/2⌉ servers, it will eventually be decided.
In all these cases, the result t will eventually be accepted and the operation ends.
Using all these lemmas we can prove that LBTS is a tuple space
implementation that satisfies linearizability and wait-freedom.
Theorem 1 LBTS is a linearizable wait-free tuple space.
Proof: Wait-freedom is ensured by Lemmas 7, 8 and 9, so we
only have to prove linearizability.
Consider any history H of a system in which processes interact
uniquely through LBTS. We consider that H is complete, i.e., all
invocations in H have a matching response18 . In this history a
set of tuples TH are manipulated, i.e., read, inserted or removed.
Since there is no interference between operations that manipulate
different tuples, we can consider that each tuple t is a different
object and for each tuple t ∈ TH we denote by H|t the subhistory
of H that contains only operations that manipulate t.
For each H|t, our tuple uniqueness assumption ensures that
there will be only one ⟨out(t)/ack⟩ and Lemma 6 ensures that
there will be no more than one removal of t in this subhistory.
Therefore, H|t contains one out operation, zero or more readings
of t and at most one inp operation with result t.
The proof that LBTS is a linearizable tuple space has four main
steps. First, we build a sequential history H ′ |t for each tuple t ∈
TH with all sequential operations of H|t preserving their original
order. The second step is to order concurrent operations according
to the properties of LBTS (stated by Lemmas 1-6). Then we will show that H′|t conforms to the sequential specification of a tuple space in which just one tuple is manipulated. Finally, we will use the locality property of linearizability (Theorem 1 of [13]): if for all t ∈ TH, H|t is linearizable, then H is linearizable.
1) For sequential operations, for each tuple t ∈ TH, Lemmas 2 and 3 show that a ⟨rdp(t)/t⟩ can only occur in H|t after its insertion. Lemmas 3, 4 and 5 show that all ⟨rdp(t)/t⟩ will happen before the removal of t.
2) For each tuple t ∈ TH, we will order the concurrent operations in H|t, obtaining H′|t with all the operations of H|t. The ordering is done in the following way: all ⟨rdp(t)/t⟩ are put after ⟨out(t)/ack⟩ (Lemma 2 states that t cannot be read before ⟨out(t)/ack⟩, and Lemma 3 states that t can be read after ⟨out(t)/ack⟩) and before ⟨inp(t)/t⟩ (Lemma 3 states that t can be read before ⟨inp(t)/t⟩ and Lemma 5 states that t cannot be read after ⟨inp(t)/t⟩). The order between different rdp operations that return t does not matter since Lemma 4 states that once read, t will always be read until its removal.
3) After ordering the operations in the described way, all histories H′|t, for each tuple t ∈ TH, will begin with an ⟨out(t)/ack⟩, will have zero or more ⟨rdp(t)/t⟩ after it, and can end with an ⟨inp(t)/t⟩. Clearly, this history satisfies all
four correctness conditions presented in Section III, even
18 If it is not complete, we can extend it with the missing matching
responses.
when operations are executed concurrently in the system.
Therefore, for all t ∈ TH , H|t is linearizable.
4) Using the linearizability locality property, we conclude that
H is linearizable.
This means that all histories in which a set of processes communicate uniquely using LBTS are linearizable, so LBTS is
linearizable.
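To illustrate step 2, consider a small hypothetical subhistory H|t in which the rdp is concurrent with the out and the inp comes later; since the rdp returned t, it is ordered after the out:

\[
  H|t:\; \langle out(t)/ack\rangle \parallel \langle rdp(t)/t\rangle \;\prec\; \langle inp(t)/t\rangle
  \qquad\leadsto\qquad
  H'|t:\; \langle out(t)/ack\rangle,\ \langle rdp(t)/t\rangle,\ \langle inp(t)/t\rangle .
\]
% H'|t inserts t, reads it before its removal and removes it exactly once,
% which is precisely the form required in step 3.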
REFERENCES
[1] D. Gelernter and N. Carriero, “Coordination languages and their significance,” Communications of the ACM, vol. 35, no. 2, pp. 96–107, 1992.
[2] D. Gelernter, “Generative communication in Linda,” ACM Transactions
on Programming Languages and Systems, vol. 7, no. 1, pp. 80–112, Jan.
1985.
[3] G. Cabri, L. Leonardi, and F. Zambonelli, “Mobile agents coordination
models for Internet applications,” IEEE Computer, vol. 33, no. 2, pp.
82–89, Feb. 2000.
[4] GigaSpaces, “GigaSpaces – write once, scale anywhere,” Available at
http://www.gigaspaces.com/, 2008.
[5] Sun Microsystems, “JavaSpaces service specification,” Available at
http://www.jini.org/standards, 2003.
[6] T. J. Lehman, A. Cozzi, Y. Xiong, J. Gottschalk, V. Vasudevan, S. Landis,
P. Davis, B. Khavar, and P. Bowman, “Hitting the distributed computing
sweet spot with TSpaces,” Computer Networks, vol. 35, no. 4, pp. 457–
472, 2001.
[7] D. E. Bakken and R. D. Schlichting, “Supporting fault-tolerant parallel
programming in Linda,” IEEE Transactions on Parallel and Distributed
Systems, vol. 6, no. 3, pp. 287–302, Mar. 1995.
[8] A. Xu and B. Liskov, “A design for a fault-tolerant, distributed implementation of Linda,” in Proceedings of the 19th Symposium on Fault-Tolerant Computing - FTCS’89, Jun. 1989, pp. 199–206.
[9] J. Fraga and D. Powell, “A fault- and intrusion-tolerant file system,” in
Proceedings of the 3rd International Conference on Computer Security,
1985, pp. 203–218.
[10] L. Lamport, R. Shostak, and M. Pease, “The Byzantine generals problem,” ACM Transactions on Programming Languages and Systems, vol. 4,
no. 3, pp. 382–401, Jul. 1982.
[11] A. Murphy, G. Picco, and G.-C. Roman, “LIME: A coordination
model and middleware supporting mobility of hosts and agents,” ACM
Transactions on Software Engineering and Methodology, vol. 15, no. 3,
pp. 279–328, Jul. 2006.
[12] F. Favarim, J. S. Fraga, L. C. Lung, and M. Correia, “GridTS: A new
approach for fault-tolerant scheduling in grid computing,” in Proceedings
of 6th IEEE Symposium on Network Computing and Applications - NCA
2007, Jul. 2007, pp. 187–194.
[13] M. Herlihy and J. M. Wing, “Linearizability: A correctness condition
for concurrent objects,” ACM Transactions on Programming Languages
and Systems, vol. 12, no. 3, pp. 463–492, Jul. 1990.
[14] M. Herlihy, “Wait-free synchronization,” ACM Transactions on Programming Languages and Systems, vol. 13, no. 1, pp. 124–149, Jan. 1991.
[15] D. Malkhi and M. Reiter, “Byzantine quorum systems,” Distributed
Computing, vol. 11, no. 4, pp. 203–213, Oct. 1998.
[16] E. J. Segall, “Resilient distributed objects: Basic results and applications
to shared spaces,” in Proceedings of the 7th Symposium on Parallel and
Distributed Processing - SPDP 1995, Oct. 1995, pp. 320–327.
[17] R. Ekwall and A. Schiper, “Replication: Understanding the advantage of
atomic broadcast over quorum systems,” Journal of Universal Computer
Science, vol. 11, no. 5, pp. 703–711, 2005.
[18] M. Castro and B. Liskov, “Practical Byzantine fault-tolerance and
proactive recovery,” ACM Transactions on Computer Systems, vol. 20, no. 4,
pp. 398–461, Nov. 2002.
[19] J.-P. Martin and L. Alvisi, “Fast Byzantine consensus,” IEEE Transactions on Dependable and Secure Computing, vol. 3, no. 3, pp. 202–215,
Jul. 2006.
[20] P. Zielinski, “Paxos at war,” University of Cambridge Computer Laboratory, Cambridge, UK, Tech. Rep. UCAM-CL-TR-593, Jun. 2004.
[21] M. J. Fischer, N. A. Lynch, and M. S. Paterson, “Impossibility of
distributed consensus with one faulty process,” Journal of the ACM,
vol. 32, no. 2, pp. 374–382, Apr. 1985.
[22] M. Abd-El-Malek, G. Ganger, G. Goodson, M. Reiter, and J. Wylie,
“Fault-scalable Byzantine fault-tolerant services,” in Proceedings of the
20th ACM Symposium on Operating Systems Principles - SOSP 2005,
Oct. 2005, pp. 59–74.
[23] J. Cowling, D. Myers, B. Liskov, R. Rodrigues, and L. Shrira, “HQ replication: A hybrid quorum protocol for Byzantine fault tolerance,”
in Proceedings of 7th Symposium on Operating Systems Design and
Implementations - OSDI 2006, Nov. 2006.
[24] F. B. Schneider, “Implementing fault-tolerant services using the state machine approach: A tutorial,” ACM Computing Surveys, vol. 22, no. 4,
pp. 299–319, Dec. 1990.
[25] A. N. Bessani, M. Correia, J. da Silva Fraga, and L. C. Lung, “Sharing memory between Byzantine processes using policy-enforced tuple
spaces,” IEEE Transactions on Parallel and Distributed Systems, vol. 20,
no. 3, pp. 419–443, Mar. 2009.
[26] R. R. Obelheiro, A. N. Bessani, L. C. Lung, and M. Correia, “How
practical are intrusion-tolerant distributed systems?” Dep. of Informatics,
Univ. of Lisbon, DI-FCUL TR 06–15, 2006.
[27] C. Dwork, N. A. Lynch, and L. Stockmeyer, “Consensus in the presence
of partial synchrony,” Journal of the ACM, vol. 35, no. 2, pp. 288–322,
1988.
[28] R. L. Rivest, A. Shamir, and L. M. Adleman, “A method for obtaining
digital signatures and public-key cryptosystems,” Communications of the
ACM, vol. 21, no. 2, pp. 120–126, 1978.
[29] D. Gifford, “Weighted voting for replicated data,” in Proceedings of the
7th ACM Symposium on Operating Systems Principles - SOSP’79, Dec.
1979, pp. 150–162.
[30] R. A. Bazzi and Y. Ding, “Non-skipping timestamps for Byzantine data
storage systems,” in Proceedings of 18th International Symposium on
Distributed Computing - DISC 2004, Oct. 2004, pp. 405–419.
[31] C. Cachin and S. Tessaro, “Optimal resilience for erasure-coded Byzantine distributed storage,” in Proceedings of the International Conference
on Dependable Systems and Networks - DSN 2006, Jun. 2006, pp. 115–
124.
[32] B. Liskov and R. Rodrigues, “Tolerating Byzantine faulty clients in a
quorum system,” in Proceedings of 26th IEEE International Conference
on Distributed Computing Systems - ICDCS 2006, 2006.
[33] D. Malkhi and M. Reiter, “Secure and scalable replication in Phalanx,”
in Proceedings of the 17th IEEE Symposium on Reliable Distributed
Systems - SRDS 1998, Oct. 1998, pp. 51–60.
[34] J.-P. Martin, L. Alvisi, and M. Dahlin, “Minimal Byzantine storage,”
in Proceedings of the 16th International Symposium on Distributed
Computing - DISC 2002, Oct. 2002, pp. 311–325.
[35] G. Bracha and S. Toueg, “Asynchronous consensus and broadcast
protocols,” Journal of ACM, vol. 32, no. 4, pp. 824–840, 1985.
[36] M. Correia, N. F. Neves, and P. Veríssimo, “How to tolerate half less
one Byzantine nodes in practical distributed systems,” in Proceedings
of the 23rd IEEE Symposium on Reliable Distributed Systems - SRDS
2004, Oct. 2004, pp. 174–183.
[37] J.-P. Martin, L. Alvisi, and M. Dahlin, “Small Byzantine quorum
systems,” in Proceedings of the Dependable Systems and Networks DSN 2002, Jun. 2002, pp. 374–388.
[38] N. Busi, R. Gorrieri, R. Lucchi, and G. Zavattaro, “SecSpaces: a data-driven coordination model for environments open to untrusted agents,”
Electronic Notes in Theoretical Computer Science, vol. 68, no. 3, pp.
310–327, Mar. 2003.
[39] P. Dutta, R. Guerraoui, R. R. Levy, and A. Chakraborty, “How fast can
a distributed atomic read be?” in Proceedings of the 23rd annual ACM
Symposium on Principles of Distributed Computing - PODC 2004, Jul.
2004, pp. 236–245.
[40] N. Busi, R. Gorrieri, and G. Zavattaro, “On the expressiveness of Linda
coordination primitives,” Information and Computation, vol. 156, no.
1-2, pp. 90–121, Jan. 2000.
[41] A. N. Bessani, J. da Silva Fraga, and L. C. Lung, “BTS: A Byzantine
fault-tolerant tuple space,” in Proceedings of the 21st ACM Symposium
on Applied Computing - SAC 2006, 2006, pp. 429–433.
[42] A. N. Bessani, E. P. Alchieri, M. Correia, and J. S. Fraga, “DepSpace: A
Byzantine fault-tolerant coordination service,” in Proceedings of the 3rd
ACM SIGOPS/EuroSys European Systems Conference - EuroSys 2008,
Apr. 2008, pp. 163–176.
[43] D. Malkhi and M. Reiter, “An architecture for survivable coordination in
large distributed systems,” IEEE Transactions on Knowledge and Data
Engineering, vol. 12, no. 2, pp. 187–202, Apr. 2000.
[44] I. Abraham, G. Chockler, I. Keidar, and D. Malkhi, “Byzantine disk
paxos: optimal resilience with Byzantine shared memory,” Distributed
Computing, vol. 18, no. 5, pp. 387–408, Apr. 2006.
[45] N. H. Minsky, Y. M. Minsky, and V. Ungureanu, “Making tuple-spaces
safe for heterogeneous distributed systems,” in Proceedings of the 15th
ACM Symposium on Applied Computing - SAC 2000, Mar. 2000, pp.
218–226.
Alysson Neves Bessani is Visiting Assistant Professor of the Department of Informatics of the
University of Lisboa Faculty of Sciences, Portugal, and a member of the LASIGE research unit and
the Navigators research team. He received his B.S.
degree in Computer Science from Maringá State
University, Brazil in 2001, the MSE in Electrical
Engineering from Santa Catarina Federal University
(UFSC), Brazil in 2002 and the PhD in Electrical
Engineering from the same university in 2006. His
main interests are distributed algorithms, Byzantine
fault tolerance, coordination, middleware and systems architecture.
Miguel Correia is Assistant Professor of the Department of Informatics, University of Lisboa Faculty of
Sciences, and Adjunct Faculty of the Carnegie Mellon Information Networking Institute. He received
a PhD in Computer Science at the University of
Lisboa in 2003. Miguel Correia is a member of the
LASIGE research unit and the Navigators research
team. He has been involved in several international
and national research projects related to intrusion
tolerance and security, including the MAFTIA and
CRUTIAL EC-IST projects, and the ReSIST NoE.
He is currently the coordinator of University of Lisboa’s degree on Informatics
Engineering and an instructor at the joint Carnegie Mellon University and
University of Lisboa MSc in Information Technology - Information Security.
His main research interests are: intrusion tolerance, security, distributed
systems, distributed algorithms. More information about him is available at
http://www.di.fc.ul.pt/∼mpc.
Joni da Silva Fraga received the B.S. degree
in Electrical Engineering in 1975 from University
of Rio Grande do Sul (UFRGS), the MSE degree in Electrical Engineering in 1979 from the
Federal University of Santa Catarina (UFSC), and
the PhD degree in Computing Science (Docteur
de l’INPT/LAAS) from the Institut National Polytechnique de Toulouse/Laboratoire d’Automatique
et d’Analyse des Systèmes, France, in 1985. Also,
he was a visiting researcher at UCI (University of
California, Irvine) in 1992-1993. Since 1977 he has
been employed as a Research Associate and later as a Professor in the
Department of Automation and Systems at UFSC, in Brazil. His research
interests are centered on Distributed Systems, Fault Tolerance and Security.
He has over a hundred scientific publications and is a Member of the IEEE
and of Brazilian scientific societies.
Lau Cheuk Lung is an associate professor of
the Department of Informatics and Statistics (INE)
at Federal University of Santa Catarina (UFSC).
Currently, he is conducting research in fault tolerance, security in distributed systems and middleware. From 2003 to 2007, he was an associate
professor in the Department of Computer Science
at Pontifical Catholic University of Paraná (Brazil).
From 1997 to 1998, he was an associate research
fellow at the University of Texas at Austin, working
in the Nile Project. From 2001 to 2002, he was
a postdoctoral research associate in the Department of Informatics of the
University of Lisbon, Portugal. In 2001, Lau received the PhD degree in
Electrical Engineering from the Federal University of Santa Catarina, Brazil.