Reliability at massive scale is one of the biggest challenges we face at Amazon.com, one of the largest e-commerce operations in the world; even the slightest outage has significant financial consequences and impacts customer trust. The Amazon.com platform, which provides services for many web sites worldwide, is implemented on top of an infrastructure of tens of thousands of servers and network components located in many datacenters around the world. At this scale, small and large components fail continuously, and the way persistent state is managed in the face of these failures drives the reliability and scalability of the software systems. This paper presents the design and implementation of Dynamo, a highly available key-value storage system that some of Amazon’s core services use to provide an “always-on” experience. To achieve this level of availability, Dynamo sacrifices consistency under certain failure scenarios. It makes extensive use of object versioning and application-assisted conflict resolution in a manner that provides a novel interface for developers to use.
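The abstract above mentions object versioning with application-assisted conflict resolution. As a rough illustration of the underlying idea, the sketch below shows how version vectors can distinguish causally ordered updates from concurrent ones; it is not Dynamo's implementation, and all names in it are hypothetical.

```python
# Illustrative sketch of version-vector conflict detection, in the spirit of
# the object versioning the Dynamo abstract describes. Not Dynamo's actual
# code; node names and function names are invented for this example.

def descends(a: dict, b: dict) -> bool:
    """True if version vector `a` dominates (or equals) `b`."""
    return all(a.get(node, 0) >= count for node, count in b.items())

def compare(a: dict, b: dict) -> str:
    if descends(a, b) and descends(b, a):
        return "equal"
    if descends(a, b):
        return "a-newer"        # b is an ancestor of a; b can be discarded
    if descends(b, a):
        return "b-newer"
    return "concurrent"         # siblings: surface both to the application

# Two replicas update the same key independently:
v1 = {"node-A": 2, "node-B": 1}
v2 = {"node-A": 1, "node-B": 2}
print(compare(v1, v2))          # -> "concurrent": application must reconcile
```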
4th Workshop on Future Trends of Distributed Computing Systems, 1993
The development of distributed systems in the next few years will most probably be centered around: improvement of facilities for application development and execution control; introduction of new distributed services; integration of new technologies to support the first two. One of ...
Proceedings of the 22nd International Conference on Distributed Computing Systems Workshops
In this paper we describe the model used for the NewsWire collaborative content delivery system. The system builds on the robustness and scalability of Astrolabe to weave a peer-to-peer infrastructure for real-time delivery of news items. The goal of the system is to deliver news updates to hundreds of thousands of subscribers within tens of seconds of the moment of publishing. The system significantly reduces the compute and network load at the publishers and guarantees delivery even in the face of publisher overload or denial of service attacks.
To deliver multicast messages reliably in a group, each member maintains copies of all messages it sends and receives in a buffer for potential local retransmission. The storage of these messages is costly, and buffers may grow without bound; garbage collection is needed to address this issue. Garbage collection occurs once a process learns that a message in its buffer has been received by every process in the group. The message is declared stable and is released from the process's buffer. This paper proposes a gossip-style garbage collection scheme called GSGC for scalable reliable multicast protocols. This scheme achieves fault-tolerance and scalability without relying on the underlying multicast protocols. It collects and disseminates information in the multicast group by making each group member periodically gossip information to a random subset of the group. Extending the global gossip protocol further, this paper also investigates a local gossip scheme that achieves improved scalability and significantly better performance. Simulations conducted in a WAN environment are used to evaluate the performance of both schemes.
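Since the abstract spells out the mechanism (gossip knowledge of who has received each message; release a message once everyone is known to have it), a minimal sketch of that idea follows. The names are invented, and the real protocol's message formats, timing, and the local gossip variant are not reproduced.

```python
# Minimal sketch of gossip-style message-stability tracking in the spirit of
# GSGC: each member remembers which peers are known to have received each
# buffered message, gossips that knowledge to a random subset of the group,
# and releases a message once it is known to be received by everyone.
import random

class Member:
    def __init__(self, name: str, group: list[str]):
        self.name = name
        self.group = set(group)       # names of all group members
        self.buffer = {}              # msg_id -> payload awaiting stability
        self.seen = {}                # msg_id -> members known to have it

    def receive(self, msg_id: str, payload: bytes) -> None:
        self.buffer[msg_id] = payload
        self.seen.setdefault(msg_id, set()).add(self.name)

    def merge(self, digest: dict) -> None:
        """Fold a peer's knowledge into ours, then garbage-collect."""
        for msg_id, receivers in digest.items():
            self.seen.setdefault(msg_id, set()).update(receivers)
        for msg_id in list(self.buffer):
            if self.seen[msg_id] >= self.group:   # received by everyone
                del self.buffer[msg_id]           # stable: release it

    def gossip_round(self, peers: list["Member"], fanout: int = 2) -> None:
        digest = {m: set(s) for m, s in self.seen.items()}
        for peer in random.sample(peers, min(fanout, len(peers))):
            peer.merge(digest)
```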
This paper describes how this evaluation led to the insight that Microsoft's Windows NT is the operating system that is best prepared for the future.
Proceedings of the twenty-fifth annual ACM symposium on Principles of distributed computing - PODC '06, 2006
Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping ...
Proceedings of the 2003 ACM/IEEE conference on Supercomputing - SC '03, 2003
The Common Language Infrastructure is a new, standardized virtual machine that is likely to become popular on several platforms. In this paper we review whether this technology has any future in the high-performance computing community, for example by targeting the same application space as the Java Grande Forum. We review the technology by benchmarking three implementations of the CLI and compare those with the results on Java virtual machines.
Proceedings of the 7th workshop on ACM SIGOPS European workshop Systems support for worldwide applications - EW 7, 1996
The one issue that unites almost all approaches to distributed computing is the need to know whether certain components in the system have failed or are otherwise unavailable. When designing and building systems that need to function at a global scale, failure management needs to be considered a fundamental building block. This paper describes the development of a system-independent failure management service, which allows systems and applications to incorporate accurate detection of failed processes, nodes and networks, without the need for making compromises in their particular design.

Introduction. With the advent of ubiquitous, worldwide distributed systems, it is becoming clear that the systems used today in local-area settings cannot simply be employed in their existing form or trivially converted for wide-area, large-scale operation. Whatever form such systems may take in the future, whether they are replicated databases of hyper-links, distributed objects, view or virtual synchronous groups, or agents employing lazy consistency schemes, one of the key problems that needs to be addressed is the detection and handling of faulty components, nodes and networks. Building distributed systems and applications today is done using a variety of systems, ranging from bare-bones protocol interfaces such as BSD sockets and the TDI, to RPC-based systems such as DCE, to more advanced distributed support systems such as Isis, Horus, Delta-4 and others [1,2,3,11,12]. After years of experience with building these systems and applications, it is clear that failure management is not just an essential tool for group-oriented systems, all of which have built-in failure handling, but a fundamental service that should be placed among such established basic services as naming, authentication, security, service brokerage and IPC. This paper reports on an ongoing research effort to abstract the failure handling strategies from a variety of popular distributed systems and to develop a basic failure management service that can be used by any distributed system, regardless of the purpose of that system or the techniques used. The strategies employed by this basic service are specifically targeted towards applications that need to operate on a global scale.

Research goals. To build a successful service, the following goals were set:
• design a failure management system that is independent of the distributed systems packages in use and provide failure detection of processes, nodes and networks;
• improve the accuracy of detection of process and node failure through systems support;
• design support for failure detectors to work in large-scale systems, while maintaining a high level of accuracy;
• provide support for the detection of partitions in networks;
• build a comprehensive software package that can be easily integrated into various distributed systems packages and applications.
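The service's goals center on accurate detection of failed processes, nodes and networks. As a hedged illustration of the simplest building block in this space, the sketch below shows a heartbeat-based detector; the paper's actual strategies are not reproduced here, and the timeout and names are invented.

```python
# A minimal heartbeat-style failure detector, sketching one common way to
# implement the process-failure detection this service targets. Threshold
# and identifiers are hypothetical, not taken from the paper.
import time

class HeartbeatDetector:
    def __init__(self, timeout_s: float = 5.0):
        self.timeout_s = timeout_s
        self.last_heard = {}        # process-id -> last heartbeat timestamp

    def heartbeat(self, process_id: str) -> None:
        self.last_heard[process_id] = time.monotonic()

    def suspected(self) -> list[str]:
        """Processes not heard from within the timeout. Note these are only
        suspected, not proven, failed: a slow network can produce false
        suspicions, which is why accuracy at scale is hard."""
        now = time.monotonic()
        return [p for p, t in self.last_heard.items()
                if now - t > self.timeout_s]

detector = HeartbeatDetector(timeout_s=2.0)
detector.heartbeat("worker-1")
time.sleep(2.5)
print(detector.suspected())         # -> ['worker-1']
```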
We have performed a study of the usage of the Windows NT File System through long-term kernel tracing. Our goal was to provide a new data point with respect to the 1985 and 1991 trace-based file system studies, to investigate the usage details of the Windows NT file system architecture, and to study the overall statistical behavior of the usage data. In this paper we report on these issues through a detailed comparison with the older traces, through details on the operational characteristics, and through a usage analysis of the file system and cache manager. In addition to architectural insights, we provide evidence for the pervasive presence of heavy-tail distribution characteristics in all aspects of file system usage. Extreme variances are found in session inter-arrival times, session holding times, read/write frequencies, read/write buffer sizes, etc., which is of importance to system engineering, tuning and benchmarking.
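The heavy-tail finding is the statistical core of the study. The following self-contained illustration, with parameters invented for the example rather than taken from the NT traces, shows why heavy-tailed samples break average-based reasoning: a Pareto distribution matched to an exponential's mean produces far larger variance and maxima.

```python
# Illustration of why heavy tails matter for the statistics reported above:
# Pareto-distributed samples (heavy tail) show extreme variance compared to
# an exponential distribution with the same mean. Parameters are invented
# for illustration only.
import random

random.seed(42)
N = 100_000
mean = 3.0

# Exponential samples with mean 3.0
expo = [random.expovariate(1.0 / mean) for _ in range(N)]

# Pareto samples with shape a=1.5 (infinite variance in the limit),
# scaled so the mean is also 3.0 (mean of scaled Pareto = a*scale/(a-1))
a = 1.5
scale = mean * (a - 1) / a
pareto = [scale * random.paretovariate(a) for _ in range(N)]

for name, xs in [("exponential", expo), ("pareto", pareto)]:
    m = sum(xs) / N
    var = sum((x - m) ** 2 for x in xs) / N
    print(f"{name:12s} mean={m:6.2f} var={var:12.1f} max={max(xs):12.1f}")
# The Pareto sample's variance and maximum dwarf the exponential's, which
# is why heavy-tailed usage undermines tuning based on averages alone.
```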
An important factor in the successful deployment of federated web services-based business activities will be the ability to guarantee reliable distributed operation and execution under scalable conditions. For example, advanced failure management is essential for any reliable distributed operation, but especially for the target areas of web service architectures, where activities can be constructed out of services located at different enterprises and accessed over heterogeneous network topologies. In this paper we describe the first technologies and implementations coming out of the Obduro project, whose goal is to apply the results of scalability and reliability research to globally scalable service-oriented architectures. We present technology developed for failure and availability tracking of processes involved in long-running business activities within a web services coordination framework. The Service Tracker, Coordination Service and related development toolkits are available for public use.
Proceedings of the DARPA Information Survivability Conference and Exposition II (DISCEX'01)
Most existing communications technologies are either not scalable at all, or scale only under carefully controlled conditions. This threatens an emerging generation of mission-critical but very large computing systems, which will need communication support for such purposes as system management and control, policy administration, data dissemination, and to initiate adaptation in demanding environments. Cornell University's Spinglass project has discovered that "gossip-based" protocols can overcome scalability problems, offering security and reliability even in the most demanding settings. Gossip protocols emulate the spread of an infection in a crowded population, and are both reliable and stable under forms of stress that can disable more traditional protocols. Our effort is developing a new generation of gossip-based technology for secure, reliable, large-scale collaboration and soft real-time communications, even over global networks.
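The epidemic spread the abstract invokes is easy to see in simulation. The sketch below, with an invented group size and fanout rather than anything from the project, pushes an update the way gossip protocols do: each round, every member holding the update forwards it to a few random peers, so full coverage is typically reached in a logarithmic number of rounds.

```python
# Sketch of epidemic ("gossip") dissemination: coverage grows like an
# infection in a crowd. Group size and fanout are illustrative only.
import random

random.seed(7)

def gossip_rounds(n_members: int = 10_000, fanout: int = 2) -> int:
    infected = {0}                  # member 0 publishes the update
    rounds = 0
    while len(infected) < n_members:
        newly = set()
        for _ in infected:          # each infected member pushes to peers
            newly.update(random.randrange(n_members) for _ in range(fanout))
        infected |= newly
        rounds += 1
    return rounds

print(gossip_rounds())              # typically on the order of log(n) rounds
```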
Proceedings of the 26th International Conference on Software Engineering
Rapid acceptance of the Web Services architecture promises to make it the most widely supported and popular object-oriented architecture to date. One consequence is that a wave of mission-critical Web Services applications will certainly be deployed in coming years. Yet the reliability options available within Web Services are limited in important ways. To use a term proposed by IBM, Web Services systems need to become far more "autonomic": configuring themselves, diagnosing faults, and managing themselves. High-availability applications need more attention. Moreover, the scenarios in which such issues arise often entail very large deployments, raising questions of scalability. In this paper we propose a path by which the architecture could be extended in these respects.
The U-Net communication architecture provides processes with a virtual view of a network interface to enable user-level access to high-speed communication devices. The architecture, implemented on standard workstations using off-the-shelf ATM communication hardware, removes the kernel from the communication path, while still providing full protection. The model presented by U-Net allows for the construction of protocols at user level whose performance is only limited by the capabilities of the network.