Fault Tolerant
Recent papers in Fault Tolerant
We describe in detail a general strategy for implementing a conditional geometric phase between two spins. Combined with single-spin operations, this simple operation is a universal gate for quantum computation, in that any unitary... more
Responsiveness, the ability to provide real-time behavior even in the presence of faults, is becoming one of the most sought-after properties in distributed computing systems. We present a framework for "High-Performance Responsive... more
This tutorial is about design and proof of design of reliable systems from unreliable components. It teaches the concept and techniques of fault-tolerance, at the same time building a formal theory where this property can be specified and... more
Soft error tolerant design becomes more crucial due to exponential increase in the vulnerability of computer systems to soft errors. Accurate estimation of soft error rate (SER), the probability of system failure due to soft errors, is a... more
A method is investigated to directly engineer the voltage swing in SiGe resonant interband tunnel diodes (RITDs). Voltage swing, defined here as the voltage difference between the peak voltage and the projected peak voltage, is... more
Ultra large scale (ULS) systems are future software intensive systems that have billions of lines of code, composed of heterogeneous, changing, inconsistent and independent elements that are dispersed through worldwide global networks.... more
In the application domain of online information services such as online census information, health records and real-time stock quotes, there are at least two fundamental challenges: the protection of users' privacy and the assurance of... more
This paper presents a new low-loss modulation technique for the hybrid three-level four-leg converter. The total losses of the converter are reduced by about 18% on average compared to the standard three-leg neutral-point-clamped... more
The goal of the GUARDS project is to design and develop a generic fault-tolerant computer architecture that can be built from predefined standardised components. The architecture favours the use of commercial off-the-shelf (COTS) hardware... more
In this paper, we present a new deadlock-free fault-tolerant adaptive routing algorithm for 2D mesh NoC interconnections. The main contribution of this routing algorithm is that it allows both routing of messages in the networks... more
One formidable difficulty in quantum communication and computation is to protect information-carrying quantum states against undesired interactions with the environment. To address this difficulty, many good quantum error-correcting codes... more
In this paper, a methodology for the development of fault-tolerant adders based on the radix 2 signed digit (SD) representation is presented. The use of a number representation characterized by a carry propagation confined to neighbor... more
In this research work, a survey on Wireless Sensor Networks (WSN) and their technologies, standards and applications was carried out. Wireless sensor networks consist of small nodes with sensing, computation, and wireless communications... more
A major challenge for quantum computation in ion trap systems is scalable integration of error correction and fault tolerance. We analyze a distributed architecture with rapid high fidelity local control within nodes and entangled links... more
Proactive fault tolerance (FT) in high-performance computing is a concept that prevents compute node failures from impacting running parallel applications by preemptively migrating application parts away from nodes that are about to fail.... more
Over the last decade, storage systems have experienced a 10-fold increase in the gap between their capacity and bandwidth. This gap is predicted to grow faster with exponentially growing concurrency levels, with future exascales delivering millions... more
Real computer-based systems fail, and hence are often far less dependable than their owners and users need and desire. Individuals, organisations and indeed the world at large are becoming more dependent on such systems, so there has been... more
Software architectures are becoming central to the development of quality software systems, being the first concrete model of the software system and the base to guide its implementation. When architecting dependable... more
In a grid environment, resources and services are distributed with dynamic and heterogeneous characteristics. Efficient service discovery is one challenging issue in a grid environment. In this paper, we propose a new distributed and... more
To address the increasing susceptibility of commodity chip multiprocessors (CMPs) to transient faults, we propose Chip-level Redundantly Threaded multiprocessor with Recovery (CRTR). CRTR extends the previously-proposed CRT for... more
Wireless distributed microsensor systems will enable fault tolerant monitoring and control of a variety of applications. Due to the large number of microsensor nodes that may be deployed and the long required system lifetimes, replacing... more
With the increasing complexity of multiprocessor and distributed processing systems, the need to develop efficient and accurate modeling methods is evident. Fault tolerance and degradable performance of such systems have given rise to... more
The tremendous advances in wireless networks, mobile computing, and sensor networks, along with the rapid growth of small, portable and powerful computing devices, offer more and more opportunities for pervasive computing and... more
Interest in the area of pattern recognition has been renewed recently due to emerging applications which are not only challenging but also computationally more demanding. These applications include data mining (identifying a "pattern",... more
This article describes a self-healing mechanism for state-machine-based distributed components. Each component is composed of two layers: a healing layer (HL) and a service or functional layer (FL). At least, the functional layer must be... more
Network-on-Chip (NoC) is used as the communication network in many applications that use multiple cores or Processing Elements (PEs). Routers play a crucial role as connectors since a faulty router can degrade the NoC's performance and... more
Reliable delivery of messages is an important problem that needs to be addressed in distributed systems. In this paper we present our strategy to enable reliable delivery of messages in the presence of link and node failures. This is... more
Artificial intelligence (AI) techniques are becoming useful as alternate approaches to conventional techniques or as components of integrated systems. They have been used to solve complicated practical problems in various areas and are... more
Peer-to-peer (P2P) overlay networks have recently become one of the hottest topics in OS research. These networks bring with them the promise of harnessing idle storage and network resources from client machines that voluntarily join the... more
In this paper, we present a new generation of active tactile modules (i.e., HEX-O-SKIN), which are developed in order to approach multimodal whole-body-touch sensation for humanoid robots. To better perform like humans, humanoid robots... more
FPGA-based fault injection and fault tolerance techniques are used to evaluate and validate the reliability of VLSI circuits. This approach combines the efficiency of hardware-based techniques and the flexibility of simulation-based... more
Service Oriented Computing and its most famous implementation technology, Web Services (WS), are becoming an important enabler of networked business models. Discovery mechanisms are a critical factor in the overall utility of Web Services.... more
This paper describes two different but complementary approaches that can be used to perform SEU-like fault injection sessions in order to predict error rates of digital processors. The Code Emulated Upset (CEU) approach allows fault... more
The emergence of cloud environments has made feasible the delivery of Internet-scale services by addressing a number of challenges such as live migration, fault tolerance and quality of service. However, current approaches do not tackle... more
Software errors are a major cause of outages and they are increasingly exploited in malicious attacks. Byzantine fault tolerance allows replicated systems to mask some software errors but it is expensive to deploy. This paper describes a... more
Hybrid systems are at the core of most embedded and many other kinds of systems; formal methods for analysis of hybrid systems have made remarkable progress in the last decade and thus provide a strong foundation for assurance in the... more
Universal quantum computation on decoherence-free subspaces and subsystems (DFSs) is examined with particular emphasis on using only physically relevant interactions. A necessary and sufficient condition for the existence of... more
This research presents an overview of the issue of fault diagnosis in distributed systems and an evaluation study of some of the algorithms proposed in the literature for performing distributed fault diagnosis. One algorithm was chosen and... more