Introduction To Fault Tolerant
Introduction To Fault Tolerant
Introduction To Fault Tolerant
1 / 52
114
Distributed Systems
114
Distributed Systems
2 / 52
3 / 52
4 / 52
faults at runtime
outside of control of designer/developer
e.g., to the right: crack in a diode in the
data link interface of the Space Shuttle
led to erroneous messages being sent
Driscoll (Honeywell)
Josef Widder (www.forsyte.at)
4 / 52
faults at runtime
outside of control of designer/developer
e.g., to the right: crack in a diode in the
data link interface of the Space Shuttle
led to erroneous messages being sent
Driscoll (Honeywell)
Josef Widder (www.forsyte.at)
4 / 52
faults at runtime
outside of control of designer/developer
e.g., to the right: crack in a diode in the
data link interface of the Space Shuttle
led to erroneous messages being sent
approach:
keep system operational despite faults
fault-tolerant distributed algorithms
Josef Widder (www.forsyte.at)
Driscoll (Honeywell)
TMPA14, Nov. 2014
4 / 52
5 / 52
5 / 52
Lecture overview
6 / 52
7 / 52
8 / 52
9 / 52
Application areas
buzzwords from the 60ies
operating systems
(distributed) data base systems
communication networks
multiprocessor architectures
control systems
New buzzwords
cloud computing
social networks
multi core
cyber-physical systems
Josef Widder (www.forsyte.at)
10 / 52
Major challenge
Uncertainty
computers run independently at different speeds
exchange messages with (unknown) delays
faults
11 / 52
Major challenge
Uncertainty
computers run independently at different speeds
exchange messages with (unknown) delays
faults
challenge in design of distributed algorithms
a process has access only to its local state
but one wants to achieve some global property
11 / 52
Major challenge
Uncertainty
computers run independently at different speeds
exchange messages with (unknown) delays
faults
challenge in design of distributed algorithms
a process has access only to its local state
but one wants to achieve some global property
challenge in proving them correct
large degree of non-determinism
large execution and state space
Josef Widder (www.forsyte.at)
11 / 52
12 / 52
replication
P
P1
P2
P3
12 / 52
replication
P
P1
P2
consistency
P3
P
P
12 / 52
replication
P
P1
P2
P3
13 / 52
pi
receive
send
internal
14 / 52
15 / 52
15 / 52
15 / 52
Asynchronous system
has very mild restrictions on the environment
interleaving semantics
unbounded message delays
very little can be done. . .
there is no distributed algorithm that solves consensus in the presence
of one faulty process
(as we will see, consensus is the paradigm of consistency)
folklore explanation:
you cannot distinguish a slow process from a crashed one
real explanation:
see intricate proof by Fischer, Lynch, and Paterson (JACM 1985)
16 / 52
Where we stand
replication
P
P1
P2
P3
17 / 52
P1
P2
consistency
P3
P
P
18 / 52
P1
P2
consistency
P3
P
P
18 / 52
P1
P2
consistency
P3
P
P
18 / 52
Wiktionary:
Byzantine: adj. of a devious, usually stealthy manner, of practice.
19 / 52
20 / 52
20 / 52
20 / 52
21 / 52
21 / 52
21 / 52
22 / 52
n
?
t
22 / 52
n
?
?
?
t f
n processes communicate by messages (reliable communication)
all processes know that at most t of them might be faulty
f are actually faulty
resilience conditions, e.g., n > 3t t f 0
no masquerading: the processes know the origin of incoming messages
Josef Widder (www.forsyte.at)
22 / 52
least severe
crash faults:
faulty processes prematurely halt (also) in the middle of send to all
omission faults:
faulty processes follow the algorithm, but some messages sent by them
might be lost
symmetric faults:
faulty processes send arbitrarily to all or nobody
Byzantine faults:
most severe
23 / 52
24 / 52
25 / 52
25 / 52
26 / 52
27 / 52
27 / 52
27 / 52
27 / 52
27 / 52
validity. If all correct processes have the same initial value v , then v
is the only possible decision value.
termination. Every correct process eventually decides.
Give an algorithm that solves validity and termination!
28 / 52
validity. If all correct processes have the same initial value v , then v
is the only possible decision value.
termination. Every correct process eventually decides.
Solution: decide my own proposed value. (no need to agree)
28 / 52
28 / 52
28 / 52
28 / 52
28 / 52
distributed systems
replication and consistency
synchronous vs. asynchronous
fault models
example for an agreement problem: Byzantine Generals
29 / 52
30 / 52
Asynchronous FTDAs
In this lecture we consider methods for asynchronous FTDAs that either
solve problems that are less hard than consensus:
reliable broadcast. termination required only for specific initial state
(Srikanth & Toueg, 1987). [Verified in Parts II, III, V]
condition-based consensus properties required only in runs from
specific initial states (Mostefaoui et al., 2003)
[Verified in Part II]
The Paxos idea fault-tolerant distributed algorithms that are safe and
make progress only if you are lucky (Lamport, 1998)
[Serious challenge]
are asynchronous but use information on faults as a black box
failure detector based atomic commitment. distributed databases
(Raynal, 1997)
[Challenge]
31 / 52
Asynchronous FTDAs
In this lecture we consider methods for asynchronous FTDAs that either
solve problems that are less hard than consensus:
reliable broadcast. termination required only for specific initial state
(Srikanth & Toueg, 1987). [Verified in Parts II, III, V]
condition-based consensus properties required only in runs from
specific initial states (Mostefaoui et al., 2003)
[Verified in Part II]
The Paxos idea fault-tolerant distributed algorithms that are safe and
make progress only if you are lucky (Lamport, 1998)
[Serious challenge]
are asynchronous but use information on faults as a black box
failure detector based atomic commitment. distributed databases
(Raynal, 1997)
[Challenge]
We use the algorithm from (Srikanth & Toueg, 1987) as running example
Josef Widder (www.forsyte.at)
31 / 52
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
32 / 52
asynchronous interleaving
33 / 52
Unforgeability. If vi = false for all correct processes i, then for all correct
processes j, acceptj remains false forever.
34 / 52
Unforgeability. If vi = false for all correct processes i, then for all correct
processes j, acceptj remains false forever.
if no loyal general wants to attack, then traitors should not
be able to force one.
Completeness. If vi = true for all correct processes i, then there is a
correct process j that eventually sets acceptj to true.
If all loyal generals want to attack, there shall be an attack.
Relay. If a correct process i sets accepti to true, then eventually all
correct processes j set acceptj to true.
If one loyal general attacks, then all loyal generals should attack.
34 / 52
Unforgeability. If vi = false for all correct processes i, then for all correct
processes j, acceptj remains false forever.
if no loyal general wants to attack, then traitors should not
be able to force one.
Completeness. If vi = true for all correct processes i, then there is a
correct process j that eventually sets acceptj to true.
If all loyal generals want to attack, there shall be an attack.
Relay. If a correct process i sets accepti to true, then eventually all
correct processes j set acceptj to true.
If one loyal general attacks, then all loyal generals should attack.
These are the specs as given in literature: they can be formalized in LTL
Josef Widder (www.forsyte.at)
34 / 52
35 / 52
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
36 / 52
37 / 52
37 / 52
37 / 52
t f
(threshold)
38 / 52
t f
(threshold)
38 / 52
t f
(threshold)
38 / 52
39 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
By contradiction assume a
correct process sets acceptj = 1
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
Thus it has received n t
messages by distinct processes
8
9
10
11
12
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
Thus it has received n t
messages by distinct processes
That means messages by at
n 2t correct processes
8
9
10
11
12
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
Thus it has received n t
messages by distinct processes
That means messages by at
n 2t correct processes
Let p be the first correct
processes that has sent (echo)
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
Thus it has received n t
messages by distinct processes
That means messages by at
n 2t correct processes
Let p be the first correct
processes that has sent (echo)
It did not send in line 7, as
vp = 0 by assumption
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
Thus it has received n t
messages by distinct processes
That means messages by at
n 2t correct processes
Let p be the first correct
processes that has sent (echo)
It did not send in line 7, as
vp = 0 by assumption
Thus, p sent in line 12
13
14
15
16
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
Thus it has received n t
messages by distinct processes
That means messages by at
n 2t correct processes
Let p be the first correct
processes that has sent (echo)
It did not send in line 7, as
vp = 0 by assumption
Thus, p sent in line 12
Based on t + 1 messages, i.e., 1
sent by a correct processes
40 / 52
Proof: Unforgeability
If vi = false for all correct processes i, then for all correct processes j,
acceptj remains false forever.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
By contradiction assume a
correct process sets acceptj = 1
Thus it has executed line 16
Thus it has received n t
messages by distinct processes
That means messages by at
n 2t correct processes
Let p be the first correct
processes that has sent (echo)
It did not send in line 7, as
vp = 0 by assumption
Thus, p sent in line 12
Based on t + 1 messages, i.e., 1
sent by a correct processes
contradiction to p being the
first one.
TMPA14, Nov. 2014
40 / 52
Proof: Completeness
If vi = true for all correct processes i, then there is a correct process j
that eventually sets acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
41 / 52
Proof: Completeness
If vi = true for all correct processes i, then there is a correct process j
that eventually sets acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
41 / 52
Proof: Completeness
If vi = true for all correct processes i, then there is a correct process j
that eventually sets acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
41 / 52
Proof: Completeness
If vi = true for all correct processes i, then there is a correct process j
that eventually sets acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
41 / 52
Proof: Completeness
If vi = true for all correct processes i, then there is a correct process j
that eventually sets acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
41 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
An atomic step:
i f vi = 1
then send ( echo ) to all ;
8
9
10
11
12
13
14
15
16
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
11
12
An atomic step:
i f vi = 1
then send ( echo ) to all ;
10
13
14
15
16
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
11
12
An atomic step:
i f vi = 1
then send ( echo ) to all ;
10
13
14
15
16
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
11
12
i f vi = 1
then send ( echo ) to all ;
i f received (echo) from a t l e a s t
t + 1 distinct p r o c e s s e s
and not s e n t ( echo ) b e f o r e
then send ( echo ) to all ;
13
14
15
16
An atomic step:
10
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
11
12
i f vi = 1
then send ( echo ) to all ;
i f received (echo) from a t l e a s t
t + 1 distinct p r o c e s s e s
and not s e n t ( echo ) b e f o r e
then send ( echo ) to all ;
13
14
15
16
An atomic step:
10
42 / 52
Proof: Relay
If a correct process i sets accepti to true, then eventually all correct
processes j set acceptj to true.
1
2
3
Variables of process i
vi : { 0 , 1} i n i t i a l l y 0 or 1
accepti : { 0 , 1} i n i t i a l l y 0
4
5
6
7
11
12
i f vi = 1
then send ( echo ) to all ;
i f received (echo) from a t l e a s t
t + 1 distinct p r o c e s s e s
and not s e n t ( echo ) b e f o r e
then send ( echo ) to all ;
13
14
15
16
An atomic step:
10
42 / 52
43 / 52
44 / 52
45 / 52
46 / 52
46 / 52
47 / 52
47 / 52
47 / 52
End of Part I
48 / 52
References I
49 / 52
References II
Lincoln, P., & Rushby, J. 1993.
A formally verified algorithm for interactive consistency under a hybrid fault model.
Pages 402411 of: FTCS-23.
http://dx.doi.org/10.1109/FTCS.1993.627343.
Malekpour, Mahyar R., & Siminiceanu, Radu. 2006.
Comments on the Byzantine Self-Stabilizing Pulse Synchronization Protocol:
Counterexamples.
Tech. rept. TM-2006-213951. NASA.
Martin, Jean-Philippe, & Alvisi, Lorenzo. 2006.
Fast Byzantine Consensus.
IEEE Trans. Dependable Sec. Comput., 3(3), 202215.
Most
efaoui, Achour, Mourgaya, Eric, Parv
edy, Philippe Raipin, & Raynal, Michel. 2003.
Evaluating the Condition-Based Approach to Solve Consensus.
Pages 541550 of: DSN.
Raynal, Michel. 1997.
A Case Study of Agreement Problems in Distributed Systems: Non-Blocking Atomic
Commitment.
Pages 209214 of: HASE.
Josef Widder (www.forsyte.at)
50 / 52
References III
51 / 52
52 / 52
52 / 52