IEEE Transactions on Parallel and Distributed Systems, 2006
Freedom from deadlock is a key issue in Cut-Through, Wormhole, and Store-and-Forward networks, and such freedom is usually obtained through careful design of the routing algorithm. Most existing deadlock-free routing methods for irregular topologies do, however, impose severe limitations on the available routing paths. We present a method called Layered Routing, which gives rise to a series of routing algorithms, some of which perform considerably better than previous ones. Our method groups virtual channels into network layers, and to each layer it assigns a limited set of source/destination address pairs. This separation of traffic yields a significant increase in routing efficiency. We show how the method can be used to improve the performance of irregular networks, both through load balancing and by guaranteeing shortest-path routing. The method is simple to implement, and its application does not require any features in the switches other than the existence of a modest number of virtual channels. The performance of the approach is evaluated through extensive experiments within three classes of technologies. These experiments reveal a need for virtual channels as well as an improvement in throughput for each technology class.
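As a rough illustration of the layering idea, the following Python sketch groups source/destination pairs into virtual-channel layers. The round-robin assignment policy and all names are illustrative assumptions, not the paper's actual layering algorithms.

```python
# Hypothetical sketch: virtual channels are grouped into layers, and
# each layer is assigned a limited set of source/destination pairs.
# The round-robin policy below is a stand-in for the paper's methods.

def assign_to_layers(pairs, num_layers):
    """Map each (src, dst) pair to exactly one virtual-channel layer."""
    layers = [[] for _ in range(num_layers)]
    for i, pair in enumerate(sorted(pairs)):
        layers[i % num_layers].append(pair)   # crude load balancing
    return layers

# Four traffic pairs spread over two layers.
pairs = [(0, 3), (1, 2), (2, 0), (3, 1)]
for layer_id, members in enumerate(assign_to_layers(pairs, 2)):
    print(f"layer {layer_id}: {members}")
```

Because each layer carries only its own pairs, deadlock freedom can be argued per layer while the union of layers preserves full connectivity.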
Fat trees are a very common communication architecture in current large-scale parallel computers. The probability of failure in these systems increases with the number of components. We present a routing method for deterministically and adaptively routed fat trees, applicable to both distributed and source routing, that is able to handle several concurrent faults and that transparently returns to the original routing strategy once the faulty components have recovered. The method is local and dynamic, completely masking the fault from the rest of the system. It only requires a small extra functionality in the switches to handle rerouting packets around a fault. The method guarantees connectedness and deadlock and livelock freedom for up to k − 1 benign simultaneous switch and/or link faults, where k is half the number of ports in the switches. Our simulation experiments show a graceful degradation of performance as more faults occur. Furthermore, we demonstrate that for most fault combinations, our method will even be able to handle significantly more faults beyond the k − 1 limit with high probability.
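The local rerouting decision can be sketched as follows. This is a minimal, hypothetical rendering of the detour logic implied by the abstract, assuming a simple port-numbering scheme rather than the paper's exact switch mechanism.

```python
# Hypothetical sketch of the local rerouting decision at a fat-tree
# switch: if the preferred downward port toward the destination has
# failed, misroute one hop upward through any healthy up-port, from
# which an alternative downward path exists (guaranteed while the
# number of benign faults stays below k - 1, per the abstract).

def select_port(preferred_down, up_ports, failed):
    if preferred_down not in failed:
        return preferred_down            # normal shortest-path forwarding
    for port in up_ports:                # local, dynamic detour
        if port not in failed:
            return port
    raise RuntimeError("no healthy port: fault limit exceeded")

# Downward port 2 has failed; the packet detours via up-port 4.
print(select_port(preferred_down=2, up_ports=[4, 5], failed={2}))
```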
Interconnection network research: Interconnection networks place particularly high demands in terms of bandwidth, delay, and delivery over short distances. Typical application areas include multiprocessors, computing clusters, storage servers, computer I/O devices, processor-memory interconnects, and networks-on-a-chip. Advancing research will enable an interconnection network to support the same seamless virtualization found in other parts of the hardware, such as CPUs. Such a network thus poses particular challenges as well as opportunities for a utility computing data center.
IEEE Transactions on Parallel and Distributed Systems
Clouds offer flexible and economically attractive compute and storage solutions for enterprises. However, the effectiveness of cloud computing for high-performance computing (HPC) systems remains questionable. When clouds are deployed on lossless interconnection networks, like InfiniBand (IB), challenges related to load balancing, low-overhead virtualization, and performance isolation hinder full utilization of the underlying interconnect. Moreover, cloud data centers incorporate a highly dynamic environment, rendering the static network reconfigurations typically used in IB systems infeasible. In this paper, we present a framework for a self-adaptive network architecture for HPC clouds based on lossless interconnection networks, demonstrated by means of our implemented IB prototype. Our solution, based on a feedback control and optimization loop, enables the lossless HPC network to adapt dynamically to varying traffic patterns, current resource availability, and workload distributions, in accordance with service provider-defined policies. Furthermore, we present IBAdapt, a simplified rule-based language for service providers to specify the adaptation strategies used by the framework. Our self-adaptive IB network prototype is demonstrated using state-of-the-art industry software. The results obtained on a test cluster demonstrate the feasibility and effectiveness of the framework when it comes to improving Quality-of-Service compliance in HPC clouds.
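The feedback control and optimization loop can be pictured as a monitor-decide-act skeleton. The sketch below is assumption-laden: the metric names, the toy threshold policy, and the reconfigure hook are invented for the example, and IBAdapt's real syntax is not reproduced.

```python
# Hypothetical skeleton of the monitor-decide-act loop described above.
import time

def adaptation_loop(read_metrics, reconfigure, policy, rounds=3, period_s=1):
    """Observe the fabric, consult the provider policy, apply actions."""
    for _ in range(rounds):
        metrics = read_metrics()          # e.g. per-link utilization
        action = policy(metrics)          # provider-defined strategy
        if action is not None:
            reconfigure(action)           # push new routes or limits
        time.sleep(period_s)

def policy(metrics):
    """Toy rule: rebalance when any link exceeds 80% utilization."""
    hot = [link for link, u in metrics.items() if u > 0.8]
    return {"rebalance": hot} if hot else None

adaptation_loop(read_metrics=lambda: {"sw1-sw2": 0.93, "sw2-sw3": 0.40},
                reconfigure=print, policy=policy, rounds=1, period_s=0)
```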
2016 2nd IEEE International Workshop on High-Performance Interconnection Networks in the Exascale and Big-Data Era (HiPINEB), 2016
Exascale computing systems are being built with thousands of nodes. A key component of these systems is the interconnection network. The high number of components significantly increases the probability of failure. If failures occur in the interconnection network, they may isolate a large fraction of the machine. For this reason, an efficient fault-tolerant mechanism is needed to keep the system interconnected, even in the presence of faults. A recently proposed topology for these large systems is the hybrid KNS family, which provides excellent performance and connectivity at a reduced hardware cost. This paper presents a fault-tolerant routing methodology for the KNS topology that degrades performance gracefully in the presence of faults and tolerates a reasonably large number of faults without disabling any healthy node. In order to tolerate network failures, the methodology uses a simple mechanism: for some source-destination pairs, and only if necessary, packets are forwarded to the destination node through a set of intermediate nodes (without being ejected from the network), which allows faults to be avoided. The evaluation results show that the methodology tolerates a large number of faults. Furthermore, it offers graceful performance degradation: for instance, performance degrades by only 1% for a 2D network with 1024 nodes and 1% faulty links.
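A minimal sketch of the intermediate-node mechanism follows, assuming an abstract fault model in which whole minimal segments are either healthy or broken. The candidate scan is illustrative, not the paper's selection rule.

```python
# Hypothetical sketch: when the minimal path src -> dst is broken,
# route via an intermediate node so that both legs avoid the fault.
# The packet is not ejected from the network at the intermediate node.

def route(src, dst, nodes, segment_is_healthy):
    if segment_is_healthy(src, dst):
        return [(src, dst)]                          # direct, no detour
    for mid in nodes:                                # only if necessary
        if mid not in (src, dst) and \
           segment_is_healthy(src, mid) and segment_is_healthy(mid, dst):
            return [(src, mid), (mid, dst)]          # two healthy legs
    raise RuntimeError("no fault-free intermediate node found")

# Toy fault model: only the direct segment (0, 3) is broken.
healthy = lambda a, b: (a, b) != (0, 3)
print(route(0, 3, nodes=range(8), segment_is_healthy=healthy))
```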
2015 IEEE International Conference on Cluster Computing, 2015
As the size of high-performance computing systems grows, the number of events requiring a network reconfiguration, as well as the complexity of each reconfiguration, is likely to increase. In large systems, the probability of component failure is high. At the same time, with more network components, ensuring high utilization of network resources becomes challenging. Reconfiguration in interconnection networks, like InfiniBand (IB), typically involves computation and distribution of a new set of routes in order to maintain connectivity and performance. In general, current routing algorithms do not consider the existing routes in a network when calculating new ones. Such configuration-oblivious routing might result in substantial modifications to the existing paths, and the reconfiguration becomes costly as it potentially involves a large number of source-destination pairs. In this paper, we propose a novel routing algorithm for IB-based fat-tree topologies, SlimUpdate. SlimUpdate employs techniques to preserve existing forwarding entries in switches to ensure a minimal routing update, without any performance penalty, and with minimal computational overhead. We present an implementation of SlimUpdate in OpenSM and compare it with the current de facto fat-tree routing algorithm. Our experiments and simulations show a decrease of up to 80% in the number of total path modifications when using SlimUpdate routing, while achieving similar or even better performance than the fat-tree routing in most reconfiguration scenarios.
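The benefit of preserving forwarding entries is easy to see in a toy diff. The sketch below assumes routes are simple destination-to-port maps; a configuration-aware algorithm like SlimUpdate goes further by biasing the new route computation toward the old paths, but the diff shows why fewer modified entries means a cheaper update.

```python
# Hypothetical illustration: only the forwarding entries that actually
# changed need to be pushed to a switch during reconfiguration.

def minimal_update(old_routes, new_routes):
    """Return only the forwarding entries that differ."""
    return {dst: port for dst, port in new_routes.items()
            if old_routes.get(dst) != port}

old = {10: 1, 11: 2, 12: 3}
new = {10: 1, 11: 4, 12: 3}      # only destination 11 moved
print(minimal_update(old, new))  # {11: 4} -> a single entry to push
```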
2015 IEEE International Conference on Cluster Computing, 2015
To meet the demands of the Exascale era and facilitate Big Data analytics in the cloud while maintaining flexibility, cloud providers will have to offer efficient virtualized High Performance Computing clusters in a pay-as-you-go model. As a consequence, high-performance network interconnect solutions, like InfiniBand (IB), will be beneficial. Currently, the only way to provide IB connectivity on Virtual Machines (VMs) is by utilizing direct device assignment; at the same time, Single-Root I/O Virtualization (SR-IOV) is used to achieve scalability. However, the current SR-IOV model employed by IB adapters is a Shared Port implementation with limited flexibility, as it does not allow transparent virtualization and live migration of VMs. In this paper, we explore an alternative SR-IOV model for IB, the virtual switch (vSwitch), and propose and analyze two vSwitch implementations with different scalability characteristics. Furthermore, as network reconfiguration time is critical to making live migration a practical option, we accompany our proposed architecture with a scalable and topology-agnostic dynamic reconfiguration method, implemented and tested using OpenSM. Our results show that we are able to significantly reduce the reconfiguration time, as route recalculations are no longer needed, and that in large IB subnets, for certain scenarios, the number of reconfiguration subnet management packets (SMPs) sent is reduced from several hundred thousand down to a single one.
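Why route recalculation can be avoided is sketched below under a simplifying assumption: each VM keeps its own address (LID) across migration, so every switch only repoints its entry for that one LID along paths that already exist toward the new host. The data structures are invented for the example.

```python
# Hypothetical sketch: migrating a VM repoints one forwarding entry per
# affected switch onto pre-computed routes toward the new hypervisor;
# no paths are recomputed, and unchanged switches need no update (SMP).

def migrate_vm(vm_lid, new_host_lid, switch_tables, routes_to_host):
    """Repoint vm_lid at each switch; return the number of updates."""
    updates = 0
    for switch, table in switch_tables.items():
        port = routes_to_host[switch][new_host_lid]  # existing route
        if table.get(vm_lid) != port:
            table[vm_lid] = port
            updates += 1
    return updates

switch_tables = {"sw1": {5: 1}, "sw2": {5: 2}}       # VM has LID 5
routes_to_host = {"sw1": {20: 3}, "sw2": {20: 2}}    # new host has LID 20
print(migrate_vm(5, 20, switch_tables, routes_to_host))  # -> 1 update
```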
Medical imaging is critical for the detection and diagnosis of disease, guided biopsies, assessment of therapies, and administration of treatment. While computerized tomography (CT), magnetic resonance imaging (MRI), positron emission tomography (PET), and ultrasound (US) are the more familiar modalities, interest in yet other modalities continues to grow. Among the motivations are reduction of cost, avoidance of ionizing radiation, and the search for new information, including biochemical and molecular processes. Fluorescence Molecular Tomography (FMT) is one such emerging technique and, like other techniques, has its advantages and limitations. FMT can reconstruct the distribution of fluorescent molecules in vivo using near-infrared or visible-band light to illuminate the subject. FMT is very safe, since non-ionizing radiation is used, and inexpensive, due to the comparatively low cost of the imaging system. This should make it particularly well suited for small-animal studies in research. A broad range of cell activity can be identified by FMT, making it a potentially valuable tool for cancer screening, drug discovery, and gene therapy. Since FMT imaging is scattering dominated, reconstruction of volume images is significantly more computationally intensive than for CT. For instance, to reconstruct a 32×32×32 image, a flattened matrix with approximately 10^10 (ten billion) elements must be dealt with in the inverse problem, requiring more than 100 GB of memory. To reduce the error introduced by noisy measurements, significantly more measurements are needed, leading to a proportionally larger matrix. The computational complexity of reconstructing FMT images, along with inaccuracies in photon propagation models, has heretofore limited the resolution and accuracy of FMT. To surmount the problems stated above, we decompose the forward problem into a Khatri-Rao product. Inversion of this model is shown to lead to a novel reconstruction method that significantly reduces the computational complexity and memory requirements for overdetermined datasets. Compared to the well-known SVD approach, this new reconstruction method decreases computation time by a factor of up to 25, while simultaneously reducing the memory requirement by up to three orders of magnitude. Using this method, we have reconstructed images up to 32×32×32. Also outlined is a two-step approach that would enable imaging larger volumes; however, it remains a topic for future research. In achieving the above, the author studied the physics of FMT, developed an extensive set of original computer programs, performed COMSOL simulations of photon diffusion, and, unavoidably, developed visual displays.
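The thesis's decomposition details are not reproduced here, but the standard Khatri-Rao Gram identity that makes such inversions cheap can be verified in a few lines of numpy: the Gram matrix of a column-wise Khatri-Rao product equals the elementwise product of the factors' Gram matrices, so the large flattened matrix never has to be multiplied with itself.

```python
# Illustration of the standard identity (A kr B)^T (A kr B) = (A^T A) * (B^T B),
# where "kr" is the column-wise Khatri-Rao product and "*" is elementwise.
# This is a generic property, not the thesis's specific reconstruction code.
import numpy as np

def khatri_rao(A, B):
    """Column-wise Kronecker (Khatri-Rao) product."""
    m, n = A.shape
    p, _ = B.shape
    return np.einsum('in,jn->ijn', A, B).reshape(m * p, n)

rng = np.random.default_rng(0)
A, B = rng.standard_normal((50, 8)), rng.standard_normal((40, 8))
big = khatri_rao(A, B)                      # 2000 x 8, explicit
gram_direct = big.T @ big                   # cost grows with m * p
gram_fast = (A.T @ A) * (B.T @ B)           # never materializes `big`
print(np.allclose(gram_direct, gram_fast))  # True
```

For overdetermined problems, the normal equations can therefore be assembled from the small factors, which is consistent with the reported savings in time and memory.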
2006 18th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD'06), 2006
An increasing number of interconnect technologies rely on source routing to forward packets through the network. It is therefore important to develop methods for fault tolerance that are well suited for source-routed networks. Dynamic fault tolerance allows the network to remain available through the occurrence of faults, as opposed to static fault tolerance, which requires the network to be halted and reconfigured. Source routing readily supports the source node choosing a different path when a fault occurs, but with this approach, packets already in the network will be lost. Local dynamic fault tolerance, where the packet is routed around the fault locally, would prevent much of the traffic from being lost during failures, but this is cumbersome to achieve in source-routed networks, since packets encountering a fault need to follow a path different from that encoded in the packet header. In this paper, we present a mechanism to achieve local dynamic fault tolerance in source-routed fat trees, a topology that has widespread use in supercomputer systems, and compare it with endpoint dynamic fault tolerance. We also show that by combining the two approaches we achieve performance superior to either of them individually.
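The header problem, and the local fix, can be caricatured in a few lines. This is a toy stand-in, assuming the header is a list of output ports and that the detour rejoins the original path; the paper's actual splicing mechanism is more involved.

```python
# Hypothetical sketch: a source-routed packet carries its remaining
# path in the header, so a switch that detours it around a fault must
# also splice a corrected sub-path into that header.

def reroute_header(remaining_path, failed_port, detour):
    """Replace the failed next hop with a locally computed detour."""
    assert remaining_path[0] == failed_port
    return detour + remaining_path[1:]   # detour rejoins the old route

header = [2, 7, 1]                       # ports the packet would traverse
print(reroute_header(header, failed_port=2, detour=[4, 6]))
# -> [4, 6, 7, 1]: one extra hop, then back onto the original route
```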
Virtualization of computing resources is becoming increasingly important both for high-end servers and multi-core CPUs. In a virtualized system, the sets of resources that constitute different virtual compute entities should be spatially separated from each other. Dividing the cores on a chip, or the CPUs in a high-end server, into disjoint sets for each task is a trivial problem.
Interconnection networks play a key role in the fault tolerance of massively parallel computers, since faults may isolate a large fraction of the machine containing many healthy nodes. In this paper, we present a methodology to design fully adaptive fault-tolerant routing algorithms for direct interconnection networks that can be applied to different regular topologies. The methodology is mainly based on the selection of an intermediate node (if needed) for each source-destination pair. Packets are adaptively routed to the intermediate node and, from this node, they are adaptively forwarded to their destination. This methodology requires only one additional virtual channel, even for tori. Evaluation results show that the methodology is 7-fault tolerant and that, for up to 14 faults, more than 99% of the fault combinations are tolerated, without significantly degrading performance in the presence of faults.
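A hypothetical sketch for a 2D mesh shows the idea: fully adaptive minimal routing is safe whenever no fault lies in the rectangle spanned by source and destination, so an intermediate node is selected only when that rectangle is contaminated. The brute-force candidate scan below is illustrative only, not the paper's selection rule.

```python
# Hypothetical 2D-mesh sketch: choose an intermediate node such that
# the minimal-routing rectangles of both legs are fault-free.

def in_rect(p, a, b):
    return (min(a[0], b[0]) <= p[0] <= max(a[0], b[0]) and
            min(a[1], b[1]) <= p[1] <= max(a[1], b[1]))

def pick_intermediate(src, dst, faults, size):
    if not any(in_rect(f, src, dst) for f in faults):
        return None                      # adaptive minimal routing is safe
    for x in range(size):
        for y in range(size):
            mid = (x, y)
            if mid in (src, dst) or mid in faults:
                continue
            if not any(in_rect(f, src, mid) or in_rect(f, mid, dst)
                       for f in faults):
                return mid               # both legs avoid every fault
    return None

print(pick_intermediate((0, 0), (3, 3), faults={(1, 2)}, size=4))
```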
2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum, 2011
Frank Olaf Sem-Jacobsen, Åshild Grønstad Solheim, Olav Lysne, Tor Skeie, and Thomas Sødring (Department of Informatics; Networks and Distributed Systems).