Design errors in the development of concurrent software can result in deadlocks and race conditions, and discovering these is difficult. Rigorous analysis techniques such as model checking require the use of temporal logic to formalize application-specific properties and reason about the modeled system behavior. The well-established property patterns taxonomy captures requirements and contains useful mappings for different state-based target formalisms, but practitioners have to fully understand these patterns before they can select and apply the appropriate ones. To reassess the applicability of the pattern-based classification in an event-based setting, we examined 25 published works that use µ-calculus to express system properties from different domains. In the process of "patterning", we encountered wrongly formalized formulas in a number of them. Our findings indicate that manually eliciting properties is error-prone, and should be supported by more accessible tools. In our previous work, to bring the process of correctly eliciting functional properties closer to software engineers, we introduced PASS, a Property ASSistant, as part of a UML-based front-end to the mCRL2 model checking toolset. PASS instantiates pattern templates using three notations: a natural language summary, a µ-calculus formula, and a UML sequence diagram depicting the desired behavior. Through a case study of a critical Grid module used by one of the CERN experiments, we demonstrate the usefulness of PASS in eliciting properties which capture the required behavior in real-world settings. We also introduce new patterns deemed useful, but missing from the original classification.
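To illustrate the kind of pattern instantiation the abstract describes, here is a hedged sketch (not taken from the paper) of two classic property patterns rendered in the modal µ-calculus, using the regular-formula style supported by mCRL2; the action names `error`, `request`, and `response` are placeholders:

```latex
% Absence pattern: the action "error" never occurs, globally.
[\mathit{true}^{\ast} \cdot \mathit{error}]\,\mathit{false}

% Response pattern: every "request" is eventually followed by a
% "response" (the least fixed point forbids avoiding "response"
% forever while still requiring progress).
[\mathit{true}^{\ast} \cdot \mathit{request}]\,
  \mu X.\,\bigl([\overline{\mathit{response}}]X
               \wedge \langle\mathit{true}\rangle\mathit{true}\bigr)
```

A tool like PASS would pair each such formula with a natural-language summary ("error never occurs"; "request is always followed by response") and a sequence diagram of the desired behavior.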
This paper argues that a distributed system should support two communication paradigms: Remote Procedure Call (RPC) and group communication. The former captures point-to-point communication; the latter captures one-to-many communication. We demonstrate that group communication is an important paradigm by showing that a fault-tolerant directory service is much easier to implement with groups than with RPC and is also more …
Clusters of workstations are often claimed to be a good platform for parallel processing, especially if a fast network is used to interconnect the workstations. Indeed, high performance can be obtained for low-level message passing primitives on modern networks like ATM and Myrinet. Most application programmers, however, want to use higher-level communication primitives. Unfortunately, implementing such primitives efficiently on a modern network is a difficult task, because their software overhead is relatively much higher than on a traditional, slow network (such as Ethernet).
This paper describes generic NI-level mechanisms that reduce data and control transfer overheads in user-level communication systems. These mechanisms reduce polling overhead without reducing throughput, improve multicast performance, and reduce the frequency of network and timer interrupts. They have all been implemented in a family of low-level Myrinet communication systems. We illustrate the performance impact of these mechanisms using microbenchmarks …
DIRAC (Distributed Infrastructure with Remote Agent Control) is the grid solution designed to support production activities as well as user data analysis for the Large Hadron Collider "beauty" experiment. It consists of cooperating distributed services and a plethora of light-weight agents delivering the workload to the grid resources. Services accept requests from agents and running jobs, while agents actively fulfill specific goals. Services maintain database back-ends to store dynamic state information of entities such as jobs, queues, or requests for data transfer. Agents continuously check for changes in the service states, and react to these accordingly. The logic of each agent is rather simple; the main source of complexity lies in their cooperation. These agents run concurrently, and communicate using the services' databases as a shared memory for synchronizing the state transitions. Despite the effort invested in making DIRAC reliable, entities occasionally get into inconsistent states. Tracing and fixing such behaviors is difficult, given the inherent parallelism among the distributed components and the size of the implementation.
Although grids hold great promise for many scientific applications, writing efficient and portable grid applications is notoriously difficult. Grid programmers often have to use low-level programming interfaces that change frequently, and they have to deal with heterogeneity, connectivity problems, and fault tolerance. Also, managing a running application is complicated, because the execution environment changes dynamically, as resources come and go.
2008 Eighth IEEE International Symposium on Cluster Computing and the Grid (CCGRID), 2008
This paper shows how lightpath-based networks can allow challenging, fine-grained parallel supercomputing applications to be run on a grid, using parallel retrograde analysis on DAS-3 as a case study. Detailed performance analysis shows that several problems arise that are not present on tightly-coupled systems like clusters. In particular, flow control, asynchronous communication, and host-level communication overheads become new obstacles. By optimizing these aspects, however, a 10G grid can obtain high performance for this type of communication-intensive application. The class of large-scale distributed applications suitable for running on a grid is therefore larger than previously thought realistic.
Proceedings of the 13th IEEE International Symposium on High Performance Distributed Computing, 2004
Grid computing applications are challenged by current wide-area networks: firewalls, private IP addresses and network address translation (NAT) hamper connectivity, the TCP protocol can hardly exploit the available bandwidth, and security features like authentication and encryption are usually difficult to integrate. Existing systems (like GridFTP, JXTA, SOCKS) each address only one of these issues. However, applications need to cope with all of them at the same time. Unfortunately, existing solutions are often not easy to combine, and a particular solution for one subproblem may reduce the applicability or performance of another.
Proceedings of the 5th ACM SIGOPS European Workshop: Models and Paradigms for Distributed Systems Structuring (EW 5), 1992
This paper suggests that a distributed system should support two communication paradigms: Remote Procedure Call (RPC) and group communication. The former is used for point-to-point communication; the latter is used for one-to-many communication. We demonstrate that group communication is an important paradigm by showing that a fault-tolerant directory service is much easier to implement with groups than with RPC and …
Many high speed networks have been developed that may be suitable for parallel computing on clusters of workstations. This paper compares three different networks: FastEthernet, ATM, and Myrinet. We have implemented the Panda portability layer on all three networks, using the same host machines and as much the same software as possible. We compare the latency and throughput for Panda's point-to-point and multicast communication on the three networks and analyze the performance differences.
2009 IEEE International Symposium on Parallel & Distributed Processing, 2009
The introduction of optical private networks (lightpaths) has significantly improved the capacity of long distance network links, making it feasible to run large parallel applications in a distributed fashion on multiple sites of a computational grid. Besides offering bandwidths of 10 Gbit/s or more, lightpaths also allow network connections to be dynamically reconfigured. This paper …
Clusters of workstations are a popular platform for high-performance computing. For many parallel applications, efficient use of a fast interconnection network is essential for good performance. Several modern System Area Networks include programmable network interfaces that can be tailored to perform protocol tasks that otherwise would need to be done by the host processors. Finding the right trade-off between protocol processing at the host and the network interface is difficult in general. In this work, we systematically evaluate the performance of different implementations of a single, user-level communication interface. The implementations make different architectural assumptions about the reliability of the network and the capabilities of the network interface. The implementations differ accordingly in their division of protocol tasks between host software, network-interface firmware, and network hardware. Also, we investigate the effects of alternative data-transfer methods and multicast implementations, and we evaluate the influence of packet size. Using microbenchmarks, parallel-programming systems, and parallel applications, we assess the performance of the different implementations at multiple levels. We use two hardware platforms with different performance characteristics to validate our conclusions. We show how moving protocol tasks to a relatively slow network interface can yield both performance advantages and disadvantages, depending on specific characteristics of the application and the underlying parallel-programming system.
We systematically evaluate the performance of five implementations of a single, user-level communication interface. Each implementation makes different architectural assumptions about the reliability of the network hardware and the capabilities of the network interface. The implementations differ accordingly in their division of protocol tasks between host software, network-interface firmware, and network hardware. Using microbenchmarks, parallel-programming systems, and parallel applications, we assess the performance impact of different protocol decompositions. We show how moving protocol tasks to a relatively slow network interface yields both performance advantages and disadvantages, depending on the characteristics of the application and the underlying parallel-programming system. In particular, we show that a communication system that assumes highly reliable network hardware and that uses network-interface support to process multicast traffic performs best for all applications.
Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing, 1996
Although multicast is an important communication primitive for parallel programming, many modern networks do not support it in hardware. Multicast can be implemented in software on such networks, using some spanning tree protocol. Making multicast reliable, however, is a difficult problem, even if the hardware point-to-point communication is reliable. The key issue is that a flow control mechanism is …
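The spanning-tree idea mentioned above can be sketched as follows. This is a minimal, illustrative binomial broadcast tree over reliable point-to-point links — node numbering and tree shape are invented for the sketch; real systems such as the Panda/Myrinet ones discussed here use more elaborate protocols with flow control:

```python
# Minimal sketch of software multicast over reliable point-to-point
# links using a binomial spanning tree (illustrative, not from the
# paper): at step k of the broadcast, every node with rank < 2^k
# forwards the message to rank + 2^k.

def bcast_children(rank, size):
    """Children of `rank` in a binomial broadcast tree rooted at 0:
    rank + 2^k for every power 2^k > rank that stays within `size`."""
    out, step = [], 1
    while step <= rank:          # skip the steps before `rank` received
        step <<= 1
    while rank + step < size:    # forward during the remaining steps
        out.append(rank + step)
        step <<= 1
    return out

def covers_all(size):
    """Sanity check: the tree reaches every node exactly once."""
    reached = [0] * size
    reached[0] = 1               # the root holds the message initially
    for r in range(size):
        for c in bcast_children(r, size):
            reached[c] += 1
    return all(n == 1 for n in reached)

print(bcast_children(0, 8))  # -> [1, 2, 4]
print(covers_all(8))         # -> True
```

Because every non-root node has exactly one parent, a lost or delayed forwarding step stalls an entire subtree — which is why, as the abstract notes, reliability and flow control are the hard part even when each link is reliable.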
Performance modeling is important for implementing efficient parallel applications and runtime systems. The LogP model captures the relevant aspects of message passing in distributed-memory architectures. In this paper we describe an efficient method that measures LogP parameters for a given message passing platform. Measurements are performed for messages of different sizes, as covered by the parameterized LogP model, a slight extension of LogP and LogGP. To minimize both intrusiveness and completion time of the measurement, we propose a procedure that sends as few messages as possible. An implementation of this procedure, called the MPI LogP benchmark, is available from our WWW site.
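The parameterized LogP model mentioned above makes the overheads and gap functions of the message size. The sketch below shows how such parameters combine into cost predictions; the parameter values are invented for illustration only — the MPI LogP benchmark measures the real ones for a given platform:

```python
# Illustrative use of the parameterized LogP model: latency L is a
# constant, while sender overhead o_s(m), receiver overhead o_r(m)
# and gap g(m) depend on the message size m.  All values below are
# made up for illustration.

L_WIRE = 5e-6                        # latency (s), assumed constant

def o_s(m): return 1e-6 + 2e-9 * m   # sender overhead for m bytes
def o_r(m): return 1e-6 + 3e-9 * m   # receiver overhead for m bytes
def g(m):   return 2e-6 + 4e-9 * m   # gap: min interval between sends

def one_way_time(m):
    """End-to-end time of a single m-byte message."""
    return o_s(m) + L_WIRE + o_r(m)

def stream_time(n, m):
    """Time until the n-th of a stream of m-byte messages arrives:
    the sender is gap-limited for the first n-1 messages."""
    return (n - 1) * g(m) + one_way_time(m)

print(round(one_way_time(0) * 1e6, 3))  # -> 7.0 (microseconds)
```

Fitting these size-dependent parameter functions from as few round-trip measurements as possible is exactly the minimization problem the paper's procedure addresses.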
Proceedings of the 13th International Conference on Distributed Computing Systems, 1993
Group communication is an important paradigm for building distributed applications. This paper discusses a fault-tolerant distributed directory service based on group communication, and compares it with the previous design and implementation based on remote procedure call. The group directory service uses an active replication scheme and, when triplicated, can handle 627 lookup operations per second and 88 update operations per second (using nonvolatile RAM). This performance is better than the performance for the RPC implementation and it is even better than the performance for directory operations under SunOS, which does not provide any fault tolerance at all. The paper concludes that the implementation using group communication is simpler and has better performance than the one based on remote procedure call, supporting the claim that a distributed operating system should provide both remote procedure call and group communication.
2009 IEEE International Symposium on Parallel & Distributed Processing, 2009
Model checking is a popular technique to systematically and automatically verify system properties. Unfortunately, the well-known state explosion problem often limits the extent to which it can be applied to realistic specifications, due to the huge resulting memory requirements. Distributed-memory model checkers exist, but have thus far only been evaluated on small-scale clusters, with mixed results. We examine one well-known distributed model checker in detail, and show how a number of additional optimizations in its runtime system enable it to efficiently check very demanding problem instances on a large-scale, multi-core compute cluster. We analyze the impact of the distributed algorithms employed, the problem instance characteristics and network overhead. Finally, we show that the model checker can even obtain good performance in a high-bandwidth computational grid environment.
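The core idea behind distributed-memory model checking is that the reachable state space is partitioned over workers, typically by hashing each state, so that every worker stores only a disjoint slice of the visited set. The toy sketch below illustrates that partitioning scheme; the example "system" and the partition function are invented and not taken from the paper:

```python
# Hedged sketch of hash-partitioned state-space exploration, the
# storage scheme used by distributed-memory model checkers.  The toy
# transition system below is illustrative only.
from collections import deque

def owner(state, n_workers):
    """Worker that owns `state` under hash partitioning."""
    return hash(state) % n_workers

def explore(initial, successors, n_workers):
    """Breadth-first reachability with one visited-set slice per worker.

    In a real distributed run, a successor owned by another worker
    would be sent to it as a message; here we just index the right
    slice directly."""
    visited = [set() for _ in range(n_workers)]
    queue = deque([initial])
    while queue:
        s = queue.popleft()
        slice_ = visited[owner(s, n_workers)]
        if s in slice_:
            continue
        slice_.add(s)
        queue.extend(successors(s))
    return sum(len(v) for v in visited)  # total states reached

# Toy system: a 3-bit counter with wrap-around -> 8 reachable states.
print(explore(0, lambda s: [(s + 1) % 8], 4))  # -> 8
```

Because the hash spreads states roughly evenly, memory use scales with the number of workers — which is what lets the optimized checker in the paper handle instances that exhaust a single machine.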
Unlike many other operating systems, Amoeba is a distributed operating system that provides group communication (i.e., one-to-many communication). We will discuss design issues for group communication, Amoeba's group system calls, and the protocols to implement group communication. To demonstrate that group communication is a useful abstraction, we will describe a design and implementation of a fault-tolerant directory service. We discuss two versions of the directory service: one with Non-Volatile RAM (NVRAM) and one without NVRAM. We will give performance figures for both implementations.
Papers by Kees Verstoep