In this paper we study the key management problem in the context of Group Communication Systems (GCS). GCSs are mid-sized systems, scaling up to 100 members. We present a side-by-side comparison of three ways of managing keys, studying bandwidth and latency.
A message-passing solver for linear systems ... Ori Shental, Danny Bickson, Paul H. Siegel, Jack K. Wolf and Danny Dolev ... Abstract: We develop an efficient distributed message-passing solution for systems of linear equations based upon Gaussian belief propagation that ...
In this paper, the paradigm of linear detection is reformulated as a Gaussian belief propagation (GaBP) scheme, without resorting to direct matrix inversion. The derived iterative framework allows for a distributive message-passing implementation of this important class of sub-optimal tractable estimators. The properties of GaBP-based linear detection are addressed, while its faster convergence, in comparison with conventional iterative solution methods, is demonstrated experimentally.
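As a rough illustration of solving a linear system Ax = b by Gaussian belief propagation without matrix inversion, the sketch below uses the GaBP message updates as they are commonly stated. It is not the authors' implementation: the function name `gabp_solve`, the fixed iteration count, and the dense list-of-lists representation are assumptions made for this example, and A is assumed to be symmetric and diagonally dominant so that the iteration converges.

```python
def gabp_solve(A, b, iters=50):
    """Approximate the solution of A x = b with Gaussian belief propagation.

    Hypothetical helper for illustration; assumes A is symmetric and
    diagonally dominant. Each message i -> j carries a precision P[i][j]
    and a precision-times-mean term H[i][j].
    """
    n = len(b)
    P = [[0.0] * n for _ in range(n)]
    H = [[0.0] * n for _ in range(n)]
    for _ in range(iters):
        new_P = [[0.0] * n for _ in range(n)]
        new_H = [[0.0] * n for _ in range(n)]
        for i in range(n):
            for j in range(n):
                if i == j or A[i][j] == 0:
                    continue  # no edge between i and j, so no message
                # Combine node i's own term with all incoming messages
                # except the one that came from j.
                p = A[i][i] + sum(P[k][i] for k in range(n) if k not in (i, j))
                h = b[i] + sum(H[k][i] for k in range(n) if k not in (i, j))
                new_P[i][j] = -A[i][j] * A[j][i] / p
                new_H[i][j] = -A[j][i] * h / p
        P, H = new_P, new_H
    # Marginal mean at each node: combine the node term with all messages.
    x = []
    for i in range(n):
        p = A[i][i] + sum(P[k][i] for k in range(n) if k != i)
        h = b[i] + sum(H[k][i] for k in range(n) if k != i)
        x.append(h / p)
    return x


if __name__ == "__main__":
    A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
    b = [1.0, 2.0, 3.0]
    print(gabp_solve(A, b))
```

Each node only needs its own row of A and the messages from its neighbours, which is what makes a distributed implementation possible.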
23rd Annual Symposium on Foundations of Computer Science (sfcs 1982), 1982
Two different notions of Byzantine Agreement, immediate and eventual, are defined depending on whether the agreement involves an action to be performed synchronously or not. The lower bounds for time complexity depend on what kind of agreement has to be achieved. All previous algorithms to reach Byzantine Agreement ensure immediate agreement. We present two algorithms that in many
Proceedings of the 3rd Innovations in Theoretical Computer Science Conference on - ITCS '12, 2012
Fair allocation has been studied intensively in both economics and computer science, and fair sharing of resources has aroused renewed interest with the advent of virtualization and cloud computing. Prior work has typically focused on mechanisms for fair sharing of a single resource. We provide a new definition for the simultaneous fair allocation of multiple continuously-divisible resources. Roughly speaking, we define fairness as the situation where every user either gets all the resources he wishes for, or else gets at least his entitlement on some bottleneck resource, and therefore cannot complain about not getting more. This definition has the same desirable properties as the recently suggested dominant resource fairness, and also handles the case of multiple bottlenecks. We then prove that a fair allocation according to this definition is guaranteed to exist for any combination of user requests and entitlements (where a user's relative use of the different resources is fixed). The proof, which uses tools from the theory of ordinary differential equations, is constructive and provides a method to compute the allocations numerically.
Proceedings of the sixteenth annual ACM symposium on Principles of distributed computing - PODC '97, 1997
We study failure detectors in an asynchronous environment that admits message omission failures. In such environments, processes may fail by crashing, but may also disconnect from each other. We adapt Chandra and Toueg's definitions of failure detection completeness and accuracy to the omissions failure model, and define a weak failure detector ◇W(om) that allows any majority of the processes that become connected to reach a
Today's high-end Network Interface Cards (NICs) are equipped with an onboard CPU. In most cases, these CPUs are only used by the vendor and are operated by a proprietary OS, which makes them inaccessible to the HPC application developer. In this paper we present a design and implementation of a framework for building high-performance networking applications. The framework consists of an embedded NIC Operating System with a specialized scheduler. The main challenge in developing such a scheduler is the lack of a preemption mechanism in most high-end NICs. Our scheduler provides finer-grained schedules than the alternatives. We have implemented several network applications, and were able to increase their throughput while decreasing the host's CPU utilization.
Proceedings of the Thirty-First Hawaii International Conference on System Sciences, 1998
In this paper we present IMSS: IP Multicast Shortcut Service for ATM networks. IMSS pursues the "shortcut" routing paradigm in order to exploit the underlying ATM QoS and native routing mechanisms in the optimal way. IMSS is built on top of CONGRESS (CONnection oriented Group-address RESolution Service). CONGRESS is an efficient native ATM protocol for resolution and management of multicast group addresses in a large ATM cloud. CONGRESS resolves multicast group addresses and maintains their membership for applications. It is not designed to handle the applications' data-exchange.
This paper presents CONGRESS: a connection-oriented group-address resolution service, and its applications. CONGRESS is an efficient native ATM protocol for resolution and management of multicast group addresses in an ATM WAN. It complements the native ATM multicast mechanisms. CONGRESS resolves multicast group addresses and maintains their membership for applications. It is not designed to handle the applications' data-exchange. Applications can use the resolved addresses returned by CONGRESS to implement a many-to-many communication model. CONGRESS employs hierarchically organized servers in order to be scalable. The CONGRESS hierarchy maps naturally onto the ATM private network-to-network interface (PNNI) peer group hierarchy. The communication overhead of CONGRESS for management of a single multicast group is linear in the size of the group. Apart from facilitating native ATM multicast applications, CONGRESS can be used to implement IP multicast "cut-through" routing over ATM. The cut-through routing paradigm is regarded as one of the most promising techniques for enabling traditional IP-based communication with QoS. Unfortunately, for a variety of reasons, building scalable IP multicast cut-through protocols is non-trivial. We claim that a multicast address resolution and maintenance service like CONGRESS can greatly contribute to the development of scalable IP cut-through routing services. A conceptual cut-through routing solution, IP multicast service for non-broadcast access networking technology (IP-SENATE), built on top of CONGRESS and scalable to a large ATM cloud, is sketched.
Consider an asynchronous system where each process begins with an arbitrary real value. Given some fixed ε > 0, an approximate agreement algorithm must have all non-faulty processes decide on values that are at most ε from each other and are in the range of the initial values of the non-faulty processes.
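The two requirements in this abstract can be written out as a pair of conditions. The notation is assumed here for illustration and is not taken from the paper: C is the set of non-faulty processes, v_i is the initial value of process i, d_i is its decision value, and ε is the fixed tolerance.

```latex
\begin{align*}
\text{Agreement:}\quad & |d_i - d_j| \le \varepsilon && \forall\, i, j \in C \\
\text{Validity:}\quad & \min_{k \in C} v_k \le d_i \le \max_{k \in C} v_k && \forall\, i \in C
\end{align*}
```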
We define the "Pulse Synchronization" problem that requires nodes to achieve tight synchronization of regular pulse events, in the setting of distributed computing systems. Pulse-coupled synchronization is a phenomenon displayed by a large variety of biological systems, typically overcoming a high level of noise. Inspired by such biological models, a robust and self-stabilizing Byzantine pulse synchronization algorithm for distributed computer systems is presented. The algorithm attains near-optimal synchronization tightness while tolerating up to a third of the nodes exhibiting Byzantine behavior concurrently. Pulse synchronization has been previously shown to be a powerful building block for designing algorithms in this severe fault model. We have previously shown how to stabilize general Byzantine algorithms using pulse synchronization. To the best of our knowledge there is no other scheme to do this without the use of synchronized pulses.
Awareness of the need for robustness in distributed systems increases as distributed systems become an integral part of day-to-day systems. Tolerating Byzantine faults and possessing self-stabilizing features are sensible and important requirements of distributed systems in general, and of a fundamental task such as clock synchronization in particular. There are efficient solutions for Byzantine non-stabilizing clock synchronization as well as for non-Byzantine self-stabilizing clock synchronization. In contrast, current Byzantine self-stabilizing clock synchronization algorithms have exponential convergence time and are thus impractical. We present a linear time Byzantine self-stabilizing clock synchronization algorithm, which thus makes this task feasible. Our deterministic clock synchronization algorithm is based on the observation that all clock synchronization algorithms require events for re-synchronizing the clock values. These events usually need to happen synchronously at the different nodes. In these solutions this is fulfilled or aided by having the clocks initially close to each other, so that the actual clock values can be used for synchronizing the events. This implies that the clock values cannot differ arbitrarily, which necessarily renders these solutions non-stabilizing. Our scheme suggests using a tight pulse synchronization that is uncorrelated to the actual clock values. The synchronized pulses are used as the events for re-synchronizing the clock values.
In this paper we present algorithms which, given a circular arrangement of n uniquely numbered processes, determine the maximum number in a distributive manner. We begin with a simple unidirectional algorithm, in which the number of messages passed is bounded by 2n log n + O(n). By making several improvements to the simple algorithm, we obtain a unidirectional algorithm in which the number of messages passed is bounded by 1.5n log n + O(n). These algorithms disprove Hirschberg and Sinclair's conjecture that O(n^2) is a lower bound on the number of messages passed in unidirectional algorithms for this problem. At the end of the paper we indicate how our methods can be used to improve an algorithm due to Peterson, to obtain a unidirectional algorithm using at most 1.356n log n + O(n) messages. This is the best bound so far on the number of messages passed in both the bidirectional and unidirectional cases.
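To make the problem setting concrete, the sketch below simulates maximum finding on a unidirectional ring and counts messages. It is deliberately not one of the algorithms from the paper: it is a simpler baseline in the style of Chang and Roberts, in which each process forwards only identifiers larger than its own, and its worst case is quadratic in n. The function name `ring_max` and the round-based simulation are assumptions made for this example.

```python
def ring_max(ids):
    """Simulate a naive unidirectional ring maximum-finding baseline.

    ids[i] is the unique number of process i; process i sends only to
    process (i + 1) % n. Returns (maximum, total messages sent).
    Hypothetical helper for illustration, not the paper's algorithm.
    """
    n = len(ids)
    # Every process starts by sending its own number to its successor.
    in_flight = [(i, ids[i]) for i in range(n)]  # (sender index, value)
    messages = 0
    maximum = None
    while in_flight:
        next_round = []
        for sender, value in in_flight:
            receiver = (sender + 1) % n
            messages += 1
            if value > ids[receiver]:
                # Larger numbers keep travelling around the ring.
                next_round.append((receiver, value))
            elif value == ids[receiver]:
                # A number that returns to its owner beat everyone else.
                maximum = value
            # Smaller numbers are discarded by the receiver.
        in_flight = next_round
    return maximum, messages


if __name__ == "__main__":
    print(ring_max([3, 1, 4, 2]))
    print(ring_max([4, 3, 2, 1]))
```

When the numbers decrease in the direction of travel, this baseline sends n(n + 1)/2 messages, which is the kind of quadratic behaviour the O(n log n) bounds in the abstract improve on.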
Byzantine Agreement involves a system of n processes, of which some t may be faulty. The problem is for the correct processes to agree on a binary value sent by a transmitter that may itself be one of the n processes. If the transmitter sends the same value to each process, then all correct processes must agree on that value, but in any case, they must agree on some value. An explicit solution not using authentication for n = 3t + 1 processes is given, using 2t + 3 rounds and O(t^3 log t) message bits. This solution is easily extended to the general case of n ≥ 3t + 1 to give a solution using 2t + 3 rounds and O(nt + t^3 log t) message bits.
This paper is concerned with the solvability of the problem of processor renaming in unreliable, completely asynchronous distributed systems. Fischer et al. prove in [8] that nontrivial consensus cannot be attained in such systems, even when only a single, ...
Two different kinds of Byzantine Agreement for distributed systems with processor faults are defined and compared. The first is required when coordinated actions may be performed by each participant at different times. This kind of agreement is called Eventual Byzantine Agreement (EBA). The second is needed for coordinated actions that must be performed by all participants at the same time. This kind is called Simultaneous Byzantine Agreement (SBA).
This paper introduces a general formulation of atomic snapshot memory, a shared memory partitioned into words written (updated) by individual processes, or instantaneously read (scanned) in its entirety. This paper presents three wait-free implementations of atomic snapshot memory. The first implementation in this paper uses unbounded (integer) fields in these registers, and is particularly easy to understand. The second implementation uses bounded registers. Its correctness proof follows the ideas of the unbounded implementation. Both constructions implement a single-writer snapshot memory, in which each word may be updated by only one process, from single-writer, n-reader registers. The third algorithm implements a multi-writer snapshot memory from atomic n-writer, n-reader registers, again echoing key ideas from the earlier constructions. All operations require Θ(n^2) reads and writes to the component shared registers in the worst case.
Papers by Danny Dolev