Fault Tolerance

Download as pdf or txt
Download as pdf or txt
You are on page 1of 49

FAULT TOLERANCE

Part I Introduction
Part II Process Resilience
Part III Reliable Communication
Part IV Distributed Commit
Part V Recovery
Chapter 8

FAULT TOLERANCE

Part I
Introduction
FAULT TOLERANCE

A DS should be fault-tolerant
⚫ Should be able to continue functioning in the presence of faults

Fault tolerance is related to dependability


DEPENDABILITY
Dependability Includes

Availability
Reliability
Safety
Maintainability
AVAILABILITY & RELIABILITY (1)
Availability: A measurement of whether a system is ready to be
used immediately
⚫ System is up and running at any given moment

Reliability: A measurement of whether a system can run


continuously without failure
⚫ System continues to function for a long period of time
AVAILABILITY & RELIABILITY (2)
A system goes down 1ms/hr has an availability of more
than 99.99%, but is unreliable

A system that never crashes but is shut down for a week


once every year is 100% reliable but only 98% available
SAFETY & MAINTAINABILITY
Safety: A measurement of how safe failures are
⚫ System fails, nothing serious happens
⚫ For instance, high degree of safety is required for systems controlling
nuclear power plants
Maintainability: A measurement of how easy it is to repair a
system
⚫ A highly maintainable system may also show a high degree of
availability
⚫ Failures can be detected and repaired automatically? Self-healing
systems?
FAULTS

A system fails when it cannot meet its promises


(specifications)
An error is part of a system state that may lead to a failure
A fault is the cause of the error
Fault-Tolerance: the system can provide services even in
the presence of faults
Faults can be:
⚫ Transient (appear once and disappear)
⚫ bird flying through the beam of a microwave transmitter
⚫ Intermittent (appear-disappear-reappear behavior)
A loose contact on a connector intermittent fault
⚫ Permanent (appear and persist until repaired)
FAILURE MODELS
Type of failure Description
Crash failure A server halts, but is working correctly until it halts

Omission failure A server fails to respond to incoming requests


Receive omission A server fails to receive incoming messages
Send omission A server fails to send messages
Timing failure A server's response lies outside the specified time interval

Response failure The server's response is incorrect


Value failure The value of the response is wrong
State transition failure The server deviates from the correct flow of control
Arbitrary failure A server may produce arbitrary responses at arbitrary times
(Byzantine failure)
FAILURE MASKING
Redundancy is key technique for hiding failures
Redundancy types:
1. Information: add extra (control) information
● Error-correction codes in messages
2. Time: perform an action persistently until it succeeds:
● Transactions
3. Physical: add extra components (S/W & H/W)
● Process replication, electronic circuits
EXAMPLE – REDUNDANCY IN CIRCUITS
(2)
Triple modular redundancy.
Chapter 8

FAULT TOLERANCE

Part II
Process Resilience
PROCESS RESILIENCE
Mask process failures by replication

Organize processes into groups, a message sent to a group is


delivered to all members

If a member fails, another should fill in


FLAT GROUPS VERSUS HIERARCHICAL
GROUPS

a) Communication in a flat group.


b) Communication in a simple hierarchical group
PROCESS REPLICATION
Replicate a process and group replicas in one group
How many replicas do we create?
A system is k fault-tolerant if it can survive and function even if it
has k faulty processes
⚫ For crash failures (a faulty process halts, but is working correctly until it
halts)
k+1 replicas
⚫ For Byzantine failures (a faulty process may produce arbitrary responses at
arbitrary times)
3k+1 replicas
AGREEMENT
Need agreement in DS:
⚫ Leader, commit, synchronize
Distributed Agreement algorithm: all non-faulty processes
achieve consensus in a finite number of steps
Perfect processes, faulty channels: two-army
Faulty processes, perfect channels: Byzantine generals
TWO-ARMY PROBLEM
BYZANTINE GENERALS PROBLEM
BYZANTINE GENERALS -EXAMPLE (1)

The Byzantine generals problem for 3 loyal generals and1 traitor.


a) The generals announce the time to launch the attack (by messages
marked by their ids).
b) The vectors that each general assembles based on (a)
c) The vectors that each general receives in step 3, where every general
passes his vector from (b) to every other general.
BYZANTINE GENERALS –EXAMPLE (2)

The same as in previous slide, except now with 2


loyal generals and one traitor.
BYZANTINE GENERALS
Given three processes, if one fails, consensus is impossible
Lamport et a1(1982) proved that in a system with k faulty
processes, agreement can be achieved only if 2k+1 correctly
functioning processes are present, for a total of 3k + 1.
Put in slightly different terms, agreement is possible only if more
than two-thirds of the processes are working properly
Chapter 8

FAULT TOLERANCE

Part III
Reliable Communication
RELIABLE GROUP
COMMUNICATION
RELIABLE GROUP
COMMUNICATION
For process resilience to take place, there has to be reliable
process communication.
Reliable multicast services guarantee that messages are
delivered to all members in a process group
⚫ When a group is static and processes do not fail

Reliable communication = deliver the message to all group


members
⚫ Any order delivery
⚫ Ordered delivery
BASIC RELIABLE-MULTICASTING
SCHEMES

A simple solution to reliable multicasting when all


receivers are known and assumed not to fail
a) Message transmission
b) Reporting feedback
BASIC RELIABLE-MULTICASTING
SCHEMES
The sending process assigns a sequence number to each
message it multicasts. We assume that messages are
received in the order they are sent
Each multicast message is stored locally in a history
buffer at the sender.
If a receiver detects it is missing a message, it may return
a negative acknowledgment, requesting the sender for a
retransmission
To reduce the number of messages returned to the
sender, acknowledgments could possibly be piggybacked
with other messages
SCALABILITY IN RELIABLE
MULTICASTING
Reliable multicast scheme cannot support large numbers
of receivers
N receivers = N acknowledgments
Soln: Receivers do not acknowledge receipt
A receiver returns a feedback message only to inform the
sender it is missing a message
Problem: Sender has to keep message in history buffer
NONHIERARCHICAL FEEDBACK
CONTROL
The key issue to scalable solutions for reliable
multicasting is to reduce the number of feedback
messages that are returned to the sender
Solution Model used: feedback suppression
⚫ Scalable Reliable Multicasting(SRM) Protocol
SCALABLE RELIABLE MULTICASTING
(SRM) PROTOCOL
How it works
Receivers never acknowledge the successful delivery of a
multicast message, but instead, report only when they are
missing a message
How message loss is detected is left to the application.
Only negative acknowledgments are returned as feedback.
Whenever a receiver notices that it missed a message, it
multicasts its feedback to the rest of the group
SCALABLE RELIABLE MULTICASTING
(SRM) PROTOCOL
Multicasting feedback allows another group member to
suppress its own feed-back
Problems:
Multiple retransmissions
Interruption of the processes to which the message has been
successfully delivered.
A better approach - let receivers that tend to miss the same
messages team up and share the same multicast channel for
feedback messages and retransmissions
HIERARCHICAL FEEDBACK CONTROL
⚫ SRM protocol described is a non hierarchical solution to reliable
multicasting that may not be suitable for a large system.
⚫ Achieving scalability for very large groups of receivers requires
that hierarchical approaches are adopted
HIERARCHICAL FEEDBACK CONTROL
A group of receivers is partitioned into a number of
subgroups, which are subsequently organized into a tree.
The subgroup containing the sender forms the root of the
tree.
Within each sub-group, any reliable multicasting scheme
that works for small groups can be used.
Each subgroup appoints a local coordinator, which is
responsible for handling retransmission requests of
receivers contained in its subgroup.
The local coordinator has its own history buffer.
HIERARCHICAL FEEDBACK CONTROL
If the coordinator itself has missed a message m, it asks
the coordinator of the parent subgroup to retransmit m.
In a scheme based on acknowledgments, a local
coordinator sends an acknowledgment to its parent if it
has received the message.
If a coordinator has received acknowledgments for
message m from all members in its subgroup, as well as
from its children, it can remove m from its history buffer
The main problem with hierarchical solutions is the
construction of the tree – has to be dynamic
VIRTUAL SYNCHRONY SYSTEM MODEL

The logical organization of a distributed system to distinguish


between message receipt and message delivery
ATOMIC MULTICAST
How do we achieve reliable multicasting in the presence
of process failures?
Distributed system need a guarantee that a message is
delivered to either all processes or to none at all.
Atomic multicast ensures this
In the event of a crash, non faulty processes must reach
an agreement
Update is performed if the remaining replicas have
agreed that the crashed replica no longer belongs to the
group
On recovery, the process is forced to join the group
ATOMIC MULTICAST
Idea behind atomic multicasting is that a multicast message
m is uniquely associated with a list of processes to which it
should be delivered.

This delivery list corresponds to a group view G: the view


on the set of processes contained in the group, which the
sender had at the time message m was multicast

Scenario: a message m is multicast and immediately after,


a message vc is multicast signaling a new member has
joined/left
ATOMIC MULTICAST
What we need to guarantee is that m is either delivered to
all processes in G before each one of them is delivered
message vc, or m is not delivered at all

If m is not delivered to any process, can we speak of a


reliable multicast protocol?

Yes: Delivery of m is allowed to fail only when the


group membership change is the result of the sender of m
crashing
VIRTUAL SYNCHRONOUS MULTICAST
Reliable multicast guarantees that a message multicast to
group view G is delivered to each non-faulty process in
G.
If the sender of the message crashes during the multicast,
the message may either be delivered to all remaining
processes, or ignored by each of them.
A reliable multicast with this property is said to be
virtually synchronous
The principle of virtual synchrony comes from the fact
that all multicasts take place between view changes
MESSAGE ORDERING
Virtual synchrony allows an application developer to think
about multicasts as taking place in epochs that are separated
by group membership changes
Multicasts can be ordered in different ways:
⚫ Unordered multicasts
⚫ FIFO-ordered multicasts
⚫ Causally-ordered multicasts
⚫ Totally-ordered multicasts
Regardless of whether message delivery is unordered, FIFO
ordered, or causally ordered, it is required additionally that
when messages are delivered, they are delivered in the same
order to all group members
RELIABLE MULTICASTING
Virtually synchronous reliable multicasting offering
totally-ordered delivery of messages is called atomic
multicasting.
ISIS is a fault-tolerant distributed system that implements
virtually synchronous reliable multicast
Isis makes use of available reliable point-to-point communication
facilities of the underlying network, in particular, TCP
Isis also assumes that messages from the same source are
received by a communication layer in the order they were sent by
that source
Main problem: guarantee that all messages sent to view G are
delivered to all non-faulty processes in G before the next group
membership change takes place
VIRTUAL SYNCHRONOUS MULTICAST

a) Message is not b) Message is


delivered delivered
A A

B B

C C
Gi = (A, B, C) Gi+1 = (B, Gi = (A, B, C) Gi+1 = (B,
C) C)
VIRTUAL SYNCHRONY
IMPLEMENTATION: [BIRMAN ET AL., 1991]
Only stable messages are delivered
Stable message: a message received by all processes in the
message’s group view
Assumptions (can be ensured by using TCP):
⚫ Point-to-point communication is reliable
⚫ Point-to-point communication ensures FIFO-ordering

How to determine if a message is stable?


VIRTUAL SYNCHRONY
IMPLEMENTATION: EXAMPLE
Gi = {P1, P2, P3, P4, P5}
P5 fails
P2 P3
P1 detects that P5 has failed
P1 send a “view change” message
to every process in Gi+1 = {P1, P2,
P3, P4} P1 change view

P4

P5
VIRTUAL SYNCHRONY
IMPLEMENTATION: EXAMPLE

Every process unstable message


⚫ Send each unstable message m P2 P3
from Gi to members in Gi+1
⚫ Marks m as being stable
⚫ Send a flush message to mark
that all unstable messages have P1
been sent flush
P4
message

P5
VIRTUAL SYNCHRONY
IMPLEMENTATION: EXAMPLE

Every process
⚫ After receiving a flush message P2 P3
from all processes in Gi+1 installs
Gi+1

P1

P4

P5
IMPLEMENTING VIRTUAL SYNCHRONY
If the sender of m to G fails, there should be other ways
of ensuring that m is received by those processes that did
not.
Every process in G keeps m until it knows for sure that
all members in G have received it.
If m has been received by all members in G, m is said to
be stable. Only stable messages are allowed to be
delivered.
To ensure stability, it is sufficient to select an arbitrary
(operational) process in G and request it to send m to all
other processes.
IMPLEMENTING VIRTUAL SYNCHRONY
When a process P receives the view-change message for
Gi+1, it first forwards a copy of any unstable message
from G, it still has to every process in Gi+1, and
subsequently marks it as being stable.
To indicate that P no longer has any unstable messages
and that it is prepared to install Gi+1 as soon as the other
processes can do that as well, it multicasts a flush
message for Gi +1
After P has received a flush message for Gi+ 1 from each
other process, it can safely install the new view
IMPLEMENTING VIRTUAL SYNCHRONY
Major flaw in this protocol:
it cannot deal with process failures while a new view
change is being announced
Solution: announcing view changes for any view Gi+k
even while previous changes have not yet been installed
by all processes.
DISTRIBUTED COMMIT
The atomic multicasting problem discussed previously is an
example of a more general problem, known as distributed
commit.

Goal: Either all members of a group decide to perform an


operation, or none of them perform the operation

Atomic transaction: a transaction that happens completely or


not at all
Implementation is through: 1PC, 2PC or 3PC.

You might also like