Bridging Paxos and Blockchain Consensus: Aleksey Charapko Ailidani Ailijiang Murat Demirbas
Bridging Paxos and Blockchain Consensus: Aleksey Charapko Ailidani Ailijiang Murat Demirbas
Bridging Paxos and Blockchain Consensus: Aleksey Charapko Ailidani Ailijiang Murat Demirbas
Abstract—The distributed consensus problem has been ex- Byzantine tolerant consensus solutions, such as PBFT [5], are
tensively studied in the last four decades as an important limited to less than 10, and the latency of PBFT has been
problem in distributed systems. Recent advances in decentralized shown to grow quadratically as the number of nodes in the
consensus and blockchain technology, however, arose from a
disparate model and gave rise to disjoint knowledge-base and cluster increases [6].
techniques than those in the classical consensus research. In Recently, on the other side of the canyon, blockchain
this paper we make a case for bridging these two seemingly consensus arose as a radical solution to distributed consen-
disparate approaches in order to help transfer the lessons sus, ignoring both the contributions and baggage of classical
learned from the classical distributed consensus world to the
blockchain world and vice versa. To this end, we draw parallels distributed consensus work. The blockchain consensus model
between blockchain consensus and a classical consensus protocol, brings new constraints and requirements to the consensus
Paxos. We also survey prominent approaches to improving the problem. In this model, participation is open and permission-
throughput and providing instant irreversibility to blockchain less. In an open model, it is not sufficient anymore to use
consensus and show analogies to the techniques from classical 3-5 participants—which was enough for tolerating crashes
consensus protocols. Finally, inspired by the central role formal
methods played in the success of classical consensus research, we and ensuring persistency of the data in a datacenter setting.
suggest more extensive use of formal methods in modeling the For reasons of attestability and tolerating colluding groups of
blockchains and smartcontracts. Byzantine participants, it is preferred to keep the participants
at 1000s or higher.
I. I NTRODUCTION
To deal with the challenges of Byzantine participants in
The consensus problem has been studied for more than open/permissionless settings, many blockchain consensus pro-
forty years in distributed systems. Distributed consensus is tocols adopt proof-of-work (PoW) [7], [8] as an identity
a fundamental problem arising in the context of database assignment tool. PoW ensures that the number of identi-
transactions, state machine replication, group membership, and ties assigned to the honest and adversarial parties can be
leader election. Consensus means processes reach agreement made proportional to their aggregate computational power. By
on a single value: The processes provide their candidate values, employing PoW for blockchain consensus, Sybil attacks [9]
communicate with one another, and agree/commit on a single that employ impersonation is rendered useless since each
consensus value regardless of some faulty processes. sock puppet need to expend significant amount of computing
The research on consensus in distributed systems literature resources. PoW is expensive in terms of time wasted and
has established a rigorous perimeter around the formal safety electricity bills: only a miner which has successfully solved a
and liveness/progress guarantees that can be provided. In computationally hard puzzle (finding the right nonce for the
particular, Paxos [1] protocol for consensus compartmen- block header) can append to the blockchain.
talizes the safety and progress properties nicely. Even un- Blockchain solutions are also not immune from short-
der a fully asynchronous model (where all timing assump- comings. Many solutions exhibited low throughput and high
tions/expectations break), Paxos preserves safety thanks to its latency problems. The approximate probabilistic nature of
balloting and anchoring system. Paxos also provides progress some blockchain solutions also created issues. Finally, some
when the system is out of the realm of the fully asynchronous problems arose from unclear/vague assumptions and guaran-
model (where FLP result [2] holds) and in to the partially tees in blockchain protocols.
synchronous model where weak-complete and eventually-
Contributions. In this paper we make a case for bridging
weak-accurate failure detectors are implementable [3]. Formal
these two seemingly disparate approaches in order to help
methods and formally specified assumptions and guarantees
transfer the lessons learned from the classical distributed
played a key role in the success of the classical distributed
consensus world to the blockchain world and in reciprocal.
consensus work. Today Paxos variants have been deployed
Our paper has the following major contributions.
ubiquitously in many cloud computing and web applications
to provide distributed coordination [4]. 1) We show parallels between the two approaches by show-
A shortcoming of distributed consensus work is that it failed ing how blockchain consensus fits in and compares with
to provide a solution that scales to the large number of par- the Paxos consensus protocol.
ticipants desired in public, permissionless, open participation 2) We survey prominent blockchain consensus work that
settings. Paxos deployments are typically limited to 5 nodes. achieved improved throughput and instant irreversibility,
selfappoint as wait for wait for
command leader majority majority
and show analogies to techniques from the classic con- "lead with
"Ok, but" value v? "Ok" commit v
sensus protocols. Leader
ballot b?"
500 Nodes
than that of byzantized Paxos, making it more suitable for
odes
900 Nodes
400 public environments without further changes.
300 N
Node
s ode
Latency (ms)
300
700 N
IV. B LOCKCHAIN E XTENSIONS AND R ELATION TO PAXOS
1100
200
In this section we discuss some recent blockchain optimiza-
100 tion and enhancements and show how they relate to classical
consensus protocols, such as Paxos and practical byzantine
0
0 20 40 60 80 100 120 consensus protocol (PBFT).
Throughput (rounds/sec)
Fig. 4: Paxos throughput drastically reduces for larger cluster. A. Leader Election: Stable Leader
Paxos deployments typically continue with the same leader
for many rounds. So to avoid paying the cost of phase-1
in a Paxos protocol, the leader would struggle to maintain any
repeatedly, MultiPaxos, or multi-decree Paxos, skips phase-
reasonable commit throughput while receiving that many ack
1 and continues with iterating over phase-2 messages for the
messages in each round. To avoid the problem with thousands
consecutive rounds. If an incumbent leader arises, the phase-
of participants, the blockchain has only one way broadcast
1 leader election may need to happen again. An orthogonal
communication, which is propagated from the leader to the
mechanism, such as a failure detection service, is used to
participants via a gradual gossip protocol (Figure 3).
ensure that there are not many candidates running to become
We illustrate the problem with Paxos leader struggling to
a leader at a given time, as that may hinder the progress of
sustain high throughput for large clusters in Figure 4, showing
the protocol. That situation is called the “dueling leaders”
the results of performance model [13] with parameters to
situation. As we explained earlier, even when there are mul-
roughly match single region EC2 deployment on small nodes.
tiple leader candidates, Paxos is safe thanks to the ballot and
In particular, we modeled a FIFO processing pipeline at the
majority mechanisms employed in phase-1 and phase-2.
leader node with a G/G/1 queue. We approximated message
Bitcoin-NG [16] introduces a similar optimization to the
processing and communication delays with real Paxos imple-
blockchain, where a single leader takes control over the ledger
mentation over Amazon AWS EC2 t2-small instances. We as-
for a duration of several microblocks. In the blockchain setting,
sumed normally distributed network latency. Our final average
however, we do not want to have one stable leader for too long,
latency formula is as follows: L = 2mo + N mi + lm + w + l,
as to incentivize other miners and make sure no single miner
where mo and mi are processing latencies for outbound and
can control the network for long enough to gain any advantage.
inbound messages, lm is the round-trip time for the message
To that case, Bitcoin-NG uses key blocks to carry out the leader
making a majority quorum obtained with Monte-Carlo method
election. Similar to the original Bitcoin [12], miners try to find
approximation of k-order statistics, w is the G/G/1 queue wait
a nonce that will generate a certain predetermined hash value
time [14] and l is network RTT between leader and a client
for the key block. Once the leader is chosen, it can append
issuing a request.
some limited number of microblocks to the ledger. The leader
is prohibited from appending too many microblocks or from
D. The Environment: Public vs. Private appending them too often to prevent malicious or greedy leader
The typical use case of Paxos is to replicate state to make from swamping the system with transactions.
it consistently and safely survive crash failures in a private In Bitcoin-NG microblocks do not require any proof-of-
environment. As a result, keeping the number of participants work, as the leader is already chosen in the key block. This
small, around 5 nodes, works well for this case. allows the leader to process transactions and add them to the
The blockchain applications bring new constraints and re- ledger more efficiently, given that is stays within the rate limits
quirements to the consensus problem. In blockchains, the par- on issuing the microblocks.
ticipants can now be Byzantine, motivated by financial gains.
So it is not sufficient to limit the consensus participants to be B. Irreversibility of Blockchain
3-5 nodes, because the rest of the network does not necessarily Byzcoin [17] is a recent blockchain variant that uses
trust these nodes. In blockchains, for reasons of attestability practical byzantine consensus protocol (PBFT) to make the
and tolerating colluding groups of Byzantine participants, it is blockchain consensus instantly irreversible when a block is
preferred to have many thousands of participants. added to the ledger, rather than probabilistically irreversible
Comparing popular proof-of-work Nakamoto [12] consen- after adding some number of consecutive blocks to the ledger.
sus with Byzantine consensus solutions, such as byzantine PBFT cannot reasonably scale over to thousands of nodes
Paxos [15] reveals major differences. Whereas Nakamoto con- needed for public blockchain networks, therefore Byzcoin
sensus has probabilistic guarantees for termination, agreement, forms a small subset or council of nodes to run PBFT. Unlike
and validity, the classic Byzantine Consensus has deterministic regular blockchain with a single miner leading a block, in
guarantees for them. However, the performance of Nakamoto Byzcoin the council is responsible on agreeing on what the
consensus is far less sensitive to the number of participants next block should be. Once such agreement has been reached,
and thee council collectively endorses the block with Schnorr A practical companion to formal methods is the model
signatures and the block becomes irreversible. checking tools to support them. Model checking frameworks
Byzcoin must choose the council with great care to prevent provide a language to specify/model a distributed algorithm,
large colluding groups of nodes from entering the council. and also support testing this model exhaustively to detect
A PoW approach is used to elect the council members on a violations of user provided safety and progress properties. A
rolling basis: the miner or miners successfully solving the most prominent example is the TLA+ [10] framework, by Leslie
recent block join the council, while some old members leave. Lamport, who is also the inventor for Paxos. TLA+ pro-
The council consists of 144 members, roughly representing vides a math/logic based language for specifying distributed
a day worth of miners in a chain producing a new block algorithms/protocols and model checking them. Many major
every 10 minutes. The PoW prevents any colluding party from distributed consensus and coordination protocols, including
overpowering the council, given the total number of Byzantine Paxos and several of its variants [19], [20], [21], have been
nodes in the network is less than one-third. modeled and verified with TLA+.
Even on the industry side of distributed systems, formal
C. Federated Blockchains methods have been adapted to deal with the challenges of
Stellar Consensus Protocol (SCP) [18] gives a different take distributed systems. AWS successfully used invariant-based
on using classical byzantine consensus, such as PBFT. Unlike reasoning and formal methods (in particular TLA+) for build-
Byzcoin that must carefully select the council to run PBFT, ing robust distributed systems. AWS used TLA+ in many
SCP breaks the network into federation or groups of nodes. key projects: S3, DynamoDB, EBS, and a distributed lock
Upon entering the network, a node specifies some quorum manager. The paper “Use of formal methods at Amazon Web
slices. Each quorum slice represents a consortium of other Services, 2014” [22] writes: “Before launching any complex
participants a node trusts. However, the node does not need service, we need to reach extremely high confidence that the
to trust every node in the quorum slice individually. SCP uses core of the system is correct. We have found that the standard
quorum slices to construct quorums for different nodes in such verification techniques in industry (deep design reviews, code
a way that every node in the quorum must have at least one reviews, static code analysis, stress testing, fault-injection
of its slices be entirely in the quorum as well. testing, etc.) are necessary but not sufficient. Human fallibility
The protocol runs in a two phases. It starts with a nomi- means that some of the more subtle, dangerous bugs turn
nation phase, which if run long enough, eventually produces out to be errors in design; the code faithfully implements
the same set of candidate values at every intact node. Upon the intended design, but the design fails to correctly handle a
predicted/approximated convergence of nomination phase, the particular ‘rare’ scenario. We have found that testing the code
nodes start the ballot phase to perform federated voting is inadequate as a method to find subtle errors in design.”
(PBFT) to commit and abort ballots associated with composite Based on the significant role formal methods played in the
values. The ballot phase runs on some quorum in the system, success of the classical distributed consensus and coordination
and for safety any two quorums in the network need to work, we suggest the use of formal methods for blockchain
intersect with at least one non-faulty, non-byzantine node. consensus for better specifying and model checking the
PBFT also makes the committed blocks instantly irreversible. blockchain protocols and smartcontracts.
SCP does not rely on PoW for any of its operations. The
quorum intersection requirement also means that quorums A. Using formal methods for blockchain consensus
must get larger with more nodes participating, potentially TLA+ can help precisely specify the assumptions made
reducing the performance of the system. by blockchain protocols, such as timing and partition as-
sumptions. Several protocols have been shown vulnerable to
V. F ORMAL M ETHODS AND B LOCKCHAIN timing attacks because they depended on some reasonable time
Formal methods have been an important part of distributed synchronization assumptions. While PoW consensus protocols
systems research. Since concurrency introduces an explosion make several reasonable timing assumptions, when an attacker
of possible executions, operational reasoning about distributed manages to violate tem, the safety of the protocol may be
algorithms is inapplicable. Clearly/unambiguously specifying violated as well. Using TLA+ it is possible to model these
the protocol and environment behavior is of paramount im- assumptions and identify vulnerabilities.
portance for distributed systems. Any assumptions made while Another benefit of using formal methods and TLA+ for
building a distributed system will bite back. For instance, it modeling blockchain protocols is in the design process. TLA+
may not be safe to assume that one network hop is faster than specification language supports stepwise refinement, as it is
two, or even that no hops is faster than one. Or what was easy to check if one model implements/refines the other: this
assumed to be an atomic execution block may be violated by is specified via an implication sign. This can be leveraged
some rogue process executing concurrently and changing the to identify new implementations and abstractions as well as
system state in an unanticipated manner. When faults enter the designing and verifying new features to blockchain protocols,
picture, things get even more convoluted, because fault-actions such as sharding.
will collude with program actions to complicate things and Of course modeling blockchain protocols brings new chal-
create many new corner cases. lenges for TLA+ framework. Blockchain protocols have a
(a) Faulty model
(b) Fixed model
Fig. 5: TLA+ model for DAO attack
probabilistic component, and it is important to model it in To illustrate the usefulness of formal methods and how they
TLA+. It is also important to develop libraries and high level can save contract developers from major problems, we have
primitives for specifying consensus algorithms and blockchain modeled the DAO withdrawal protocol with the bug ultimately
consensus protocols (with models for basic math primitives exploited by the hackers as shown in Figure 5a. We then
like hashes, public/private signatures). performed model checking to find the concurrency problem
with the contract’s protocol and corrected it in Figure 5b.
B. Using formal methods for smartcontracts
TLA+ facilitates invariant-based reasoning, so
Smartcontracts [23], [24] represent an important application
we first define an invariant for safe withdrawal:
of blockchain technology. Smartcontracts help users establish
SafeWithdrawal == (bankBalance=BALANCE ∧
and enforce the common rules for all involved parties, avoiding
malloryBalance=0) ∨ (bankBalance=BALANCE-AMOUNT
the need in lawyers and courts as reward and/or failure clauses
∧ malloryBalance=AMOUNT). This invariant checks that
execute automatically in the contract. However, smart contracts
Mallory’s withdrawal affects the bank balance only once.
are not devoid of challenges. Since a contract is non-mutable
However, we can see the invariant is too tight for our
once in place, all parties must be sure of contract’s correctness
protocol, since the bank balance updates before Mallory
before joining. Formal methods and model checking allow to
receives the money in non-atomic way, i.e. the money can
simplify the contract development, ensure its correctness under
be in-flight from bank account to Mallory. That is how the
various conditions and provide an extra level of confidence to
invariant-based thinking helps us immediately: we can see
the parties joining the contract.
that the withdrawal is a non-atomic operation, and realize that
When money is involved, attackers get smart quickly:
we should be more careful with the updates. Nevertheless,
“Never underestimate the time and expense your opponent will
we relax the invariant as shown in Figure 5a on lines 5-9.
take to break your code”[25]. It is easy to have vulnerabilities
in concurrent code due to the many corner cases. Smartcon- Procedure BankWithdraw on lines 11-18 in Figure 5a mod-
tracts are especially at risk due to money on the line and the els the bank withdrawal contract, while procedure mallory-
high speed of transfers. TLA+ can be used for identifying SendMoney (lines 20-26) governs how Mallory receives the
those cornercases and help design correct protocols. money after the bank withdrawal. Note that in line 25 she
Modeling the DAO hack. can attempt to withdraw the money again. Also note that
Incorrect or buggy smart contract can have large repercus- BankWithdraw line 15 gives money to Mallory before updating
sion on the parties involved. For instance, the DAO attack the bank balance on line 17. If Mallory is cheating, she may
[23] resulted in unauthorized drainage of large amounts of attempt to withdraw the money before the bank balance is
money from the DAO, an investment fund managed entirely updated from her previous withdrawal.
by smart contracts with no central authority or human control. TLA+ model checking catches the malicious behavior eas-
The contract protocol allowed the participants to withdraw ily, since some execution causes the violation of the invariant,
some of their money from the fund, however, the withdraw part resulting in a double spending. The model checker also
of the contract was flawed. The bug was ultimately exploited, produces the error trace, outlining all steps that must happen
resulting in the fund’s money stolen by a malicious participant. to cause the invariant violation. In this case, the error trace
Throughput (transactions/s)
contains 8 steps: initially BankWithdraw is called, which then 12
Pa
calls the MallorySendMoney to complete withdrawal. How- 10 xo
s1
ever, Mallory’s SendMoney implementation includes another Mb
8 Bl
call to BankWithdraw and the balance check on line 13 passes oc
k
6
because bankBalance is not decremented by amount (that
Blockchain 1 Mb Block
comes in line 17). So the second BankWithdraw executes 4
concurrently and Mallory manages to do double withdrawal.
2000 3000 4000 5000 6000 7000 8000 9000 10000
In Figure 5b we illustrate the fixed version of the protocol. Cluster Size (Nodes)
We move the updating bank balance before invoking Mallo-
rySendMoney, making sure the bank balance updates before Fig. 6: Estimated maximum throughput of paxos and
Mallory receives the funds. Of course, for that we change blockchain, both using 1 Mb blocks with 495 byte transactions.
SafeWithdrawal to accommodate the new way of updating
bankBalance. But it turns out that the invariant is still too tight. while the difficulty of the puzzle both reduces the rate at
For instance, if initially BALANCE=10 and we call BankWi- which that one-way communication happens and reduces the
hdraw(AMOUNT=4), it is still valid to have two withdrawals chance of multiple miners arriving to the solution at roughly
concurrently provided that in the final state no new money the same time. Additionally, the peer-to-peer communication
is produced: Invariant == bankBalance+malloryBalance ≤ further reduces miner bottlenecks by shedding some of the
BALANCE. We also model check for progress and write an network load to the nearby peer nodes.
EndState temporal formula for it on lines 11-12. All of these techniques make sure that the network per-
formance does not become the bottleneck in the system,
VI. PAXOS AND B LOCKCHAIN P ERFORMANCE allowing blockchains to provide stable throughput regardless
Paxos and PoW blockchain protocols have very different of the number of participants. One can easily estimate the
performance characteristics, making it rather hard to perform maximum throughput of the system just by knowing how many
a direct and fair comparison. Paxos is designed for small clus- transactions can go to a block and the mining rate of the chain:
ters, relatively quick operation latency and high throughput. Rmax = bs Rµblocks
tx
, where Rblocks is the block mining rate in
Blockchain, on the other hand, runs on many thousands of blocks per some unit of time, bs is the block’s storage capacity,
participants and trades small operation latency for predictable and µtx is average transaction size.
throughput in large deployments. Figure 6 shows our estimated maximum throughput for
Paxos-style protocols rely on inter-node communication, Paxos and blockchain. Both protocols use 1 Mb blocks filled
making the sheer number of message exchanges to be a bottle- with transactions of 495 bytes. For Paxos, we assumed the
neck for a Paxos round leader. In typical small deployments, leader is responsible for sending blocks to each follower over
the message requirement for a round is manageable, however a 100 Mbit connection. Expectedly, the number of messages
as cluster size increases, the same leader needs to handle more greatly hinders the performance of Paxos as we increase the
traffic, ultimately degrading the performance. number of nodes due to the longer block transmission times.
The maximum throughput of Paxos largely depends on the At the blockchain scale, a lower baseline throughput is offset
network and processing capacity of the leader server and the by its stability and better performance in large deployments.
number of followers. If an incoming and outgoing message Note that we did not use byzantized Paxos in this comparison,
and outgoing broadcast take an average ofµi and µo and µb and protocols like PBFT require even more communication
milliseconds to process respectively, we can estimate the max- and will degrade quicker than non-byzantine Paxos.
1
imum throughput of a Paxos system as Rmax = N µi +µ b +µo
, Reasoning about operation latencies involves accounting
where N is the number of nodes in the cluster. for all possible delays between client issuing a request and
On the blockchain scale with thousands of nodes, network receiving the ack from the system. For very large clusters, the
bandwidth becomes the limiting factor, since sending an accept network bandwidth at the leader will determine these latencies,
message to all participants is problematic even with smallest and for smaller deployments the latencies are largely driven
possible blocks. For instance, in the absence of UDP multicast by the compute capacity. For instance, in a network-bound
on the Internet scale, 1 Kb block requires about 0.82 seconds estimate, the time to run a Paxos round on 1 Mb block at a
to be send over to all Paxos followers in a cluster of 10,000 leader with 100 mbps connection may exceed 800 seconds, if
nodes with 100 mbps connection at the leader. This drives no additional methods to relieve network bottlenecks are used.
the broadcast cost µb very high for large clusters. Orthogonal PoW blockchain latency largely depends on the probabilistic
solutions are then required to make Paxos overcome the nature of Nokamoto consensus. Thus, the latency of PoW
network bottlenecks, such as using peer-to-peer networks or blockchain depends on the time required to mine sufficient
pub/sub systems, such as Apache Kafka or BookKeeper [26], number of blocks to make reversing the consensus highly
[27], to deliver signed messages from the leader and back. improbable. In Bitcoin this is typically 6 blocks, each taking
Blockchain, on the other hand, puts more effort in the proof- roughly 10 minutes to mine. However, transactions may spend
of-work and delay it introduces. PoW requires only a one- some additional time in the pool before being picked up by
way communication from the successful miner to its peers, the miners and added to the block, and the queuing effect
of the pool may drive transaction commit latency above [5] M. Castro and B. Liskov, “Practical byzantine fault tolerance and
the 60-minute minimum. More recent blockchains that offer proactive recovery,” ACM Trans. Comput. Syst., vol. 20, no. 4, pp. 398–
461, November 2002.
irreversibility of transactions once the block is appended to [6] L. Luu, V. Narayanan, C. Zheng, K. Baweja, S. Gilbert, and P. Saxena,
the chain can greatly reduce the minimum latency to the time “A secure sharding protocol for open blockchains,” in Proceedings of
of mining/adding one block. the 2016 ACM SIGSAC Conference on Computer and Communications
Security. ACM, 2016, pp. 17–30.
[7] C. Dwork and M. Naor, “Pricing via processing or combatting junk
VII. F UTURE W ORK AND C ONCLUDING R EMARKS mail,” in Annual International Cryptology Conference. Springer, 1992,
In this paper we drew parallels between classical consensus pp. 139–147.
[8] M. Jakobsson and A. Juels, “Proofs of work and bread pudding proto-
protocols and blockchains. Although the algorithms have orig- cols,” in Secure Information Networks. Springer, 1999, pp. 258–272.
inated in disparate communities with different background and [9] M. Demirbas and Y. Song, “An rssi-based scheme for sybil attack de-
expertise, both share many similarities, such as leader election tection in wireless sensor networks,” Advanced EXPerimental activities
ON WIRELESS networks and systems (EXPONWIRELESS) Workshop
phase, and transaction accept phase. Recently there has been (as part of WOWMOM), pp. 564–570, 2006.
several successful examples of transferring techniques from [10] L. Lamport, “The temporal logic of actions,” ACM Transactions on
classical consensus protocols to blockchain domain to offer Programming Languages and Systems, vol. 16, no. 3, pp. 872–923, May
1994.
better throughput and latency. Aside from borrowing on the [11] L. Lamport, R. Shostak, and M. Pease, “The byzantine generals prob-
protocol-level techniques, we suggest that blockchain work lem,” ACM Transactions on Programming Languages and Systems
would benefit from the formal methods commonly adopted in (TOPLAS), vol. 4, no. 3, pp. 382–401, 1982.
[12] S. Nakamoto, “Bitcoin: A peer-to-peer electronic cash system.”
distributed systems. For instance, the DAO hack case study in [13] “Paxos performance model,” https://github.com/acharapko/ppm/, 2018.
Section V shows how formal specification and model checking [14] W. G. Marchal, “An approximate formula for waiting time in single
can help find bugs in smartcontracts. server queues,” AIIE transactions, vol. 8, no. 4, pp. 473–474, 1976.
[15] L. Lamport, “Byzantizing paxos by refinement,” in International Sym-
In our current work, we work towards a more direct perfor- posium on Distributed Computing. Springer, 2011, pp. 211–224.
mance comparison of blockchain and classical consensus at the [16] I. Eyal, A. E. Gencer, E. G. Sirer, and R. Van Renesse, “Bitcoin-ng: A
protocol level in order to illustrate the benefits and drawbacks scalable blockchain protocol.” in NSDI, 2016, pp. 45–59.
[17] E. K. Kogias, P. Jovanovic, N. Gailly, I. Khoffi, L. Gasser, and B. Ford,
of protocol-level techniques independent of implementations “Enhancing bitcoin security and performance with strong consistency
and deployment environments. via collective signing,” in 25th USENIX Security Symposium (USENIX
We have developed the Paxi framework [28], which allows Security 16), 2016, pp. 279–296.
[18] D. Mazieres, “The stellar consensus protocol: A federated model for
consensus protocols to be evaluated in a common environment internet-level consensus,” Stellar Development Foundation, 2015.
under varying workloads, deployment size, and failure recov- [19] I. Moraru, D. G. Andersen, and M. Kaminsky, “There is more consensus
ery scenarios. Paxi framework already ships with a variety in egalitarian parliaments,” in Proceedings of the Twenty-Fourth ACM
Symposium on Operating Systems Principles. ACM, 2013, pp. 358–
of Paxos protocols including MultiPaxos [29], Flexible Paxos 372.
[30], EPaxos [19], Paxos Groups, WPaxos [31], WanKeeper [20] S. Peluso, A. Turcu, R. Palmieri, G. Losa, and B. Ravindran, “Making
[32], within the framework. We plan to add Blockchain fast consensus generally faster,” Proceedings - 46th Annual IEEE/IFIP
International Conference on Dependable Systems and Networks, DSN
implementations to Paxi in the near future to enable direct 2016, pp. 156–167, 2016.
comparison between Paxos and Blockchain solutions. [21] L. Lamport, D. Malkhi, and L. Zhou, “Vertical paxos and primary-
In our future work, we propose to use TLA+/PlusCal to backup replication,” in Proceedings of the 28th ACM symposium on
Principles of distributed computing. ACM, 2009, pp. 312–313.
develop a formal framework and a domain specific language [22] C. Newcombe, “Why amazon chose tla+,” in Abstract State Machines,
in order to facilitate modeling and proving properties for Alloy, B, TLA, VDM, and Z. Springer, 2014, pp. 25–39.
blockchain consensus protocols and systems. The framework [23] M. HERLIHY, “Blockchains from a distributed computing perspective,”
2018.
will be used for model checking blockchain and smartcontracts [24] K. Delmolino, M. Arnett, A. Kosba, A. Miller, and E. Shi, “Step
in the presence of inopportune crash and byzantine failures, by step towards creating a safe smart contract: Lessons and insights
stretched timelines and schedules, and network partitions. from a cryptocurrency lab,” in International Conference on Financial
Cryptography and Data Security. Springer, 2016, pp. 79–94.
Finally, the stepwise refinement that TLA+ formalism supports [25] R. Morris, “Ways of losing information,” 1995.
can help us explore the blockchain design state-space for [26] J. Kreps, N. Narkhede, J. Rao et al., “Kafka: A distributed messaging
hybridized/hierarchical versions of blockchains protocols that system for log processing.” NetDB, 2011.
[27] F. P. Junqueira, I. Kelly, and B. Reed, “Durability with bookkeeper,”
provide efficiency/performance benefits. ACM SIGOPS Operating Systems Review, vol. 47, no. 1, pp. 9–15, 2013.
[28] A. Ailijiang, A. Charapko, and M. Demirbas, “Paxi - paxos protocol
R EFERENCES framework,” https://github.com/ailidani/paxi, 2018.
[1] L. Lamport, “Paxos made simple,” ACM SIGACT News, vol. 32, no. 4, [29] R. Van Renesse and D. Altinbuken, “Paxos made moderately complex,”
pp. 18–25, 2001. ACM Computing Surveys (CSUR), vol. 47, no. 3, p. 42, 2015.
[2] M. J. Fischer, N. A. Lynch, and M. S. Peterson, “Impossibility of [30] H. Howard, D. Malkhi, and A. Spiegelman, “Flexible Paxos:
distributed consensus with one faulty processor,” Journal of the ACM, Quorum intersection revisited,” 2016. [Online]. Available: http:
vol. 32, no. 2, pp. 373–382, 1985. //arxiv.org/abs/1608.06696
[3] T. D. Chandra and S. Toueg, “Unreliable failure detectors for reliable [31] A. Ailijiang, A. Charapko, M. Demirbas, and T. Kosar, “Multileader
distributed systems,” Journal of the ACM (JACM), vol. 43, no. 2, pp. wan paxos: Ruling the archipelago with fast consensus,” arXiv preprint
225–267, 1996. arXiv:1703.08905, 2017.
[4] A. Ailijiang, A. Charapko, and M. Demirbas, “Consensus in the cloud: [32] A. Ailijiang, A. Charapko, M. Demirbas, B. O. Turkkan, and T. Kosar,
Paxos systems demystified,” in Computer Communication and Networks “Efficient distributed coordination at wan-scale,” in Distributed Comput-
(ICCCN), 2016 25th International Conference on. IEEE, 2016, pp. 1– ing Systems (ICDCS), 2017 37th International Conference on. IEEE,
10. 2017.