Chapter 4

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 60

Distributed Systems

(3rd Edition)

Chapter 4: Communication
Communication
• Inter-process communication is at the heart of all distributed
systems
• Communication in distributed systems has traditionally
always been based on low-level message passing as offered
by the underlying network
• Expressing communication through message passing is
harder than using primitives based on shared memory, as
available for nondistributed system
• Modern distributed systems often consists of thousands or
even millions of processes scattered across a network with
unreliable communication such as the Internet
• Unless the primitive communication facilities of computer
networks are replaced by something else, development of
large-scale distributed applications extremely difficult
Outline
• Foundations –Communication Protocol
• Remote Procedure Call (RPC)
• Message-Oriented Communication (MOM)
• Multicast Communication
Communication: Foundations Layered Protocols

Basic networking model

► Due to the absence of shared memory, all communication in


distributed systems is based on sending and receiving (low
level) messages
► When process P wants to communicate with
process Q, it first builds a message in its own
address space
► Then it executes a system call that causes the
operating system to send the message over the
network to Q

The OSI reference model 4 / 49


Communication: Foundations Layered Protocols

Basic networking model…

Application protocol
Application 7
Presentation protocol
Presentation 6
Session protocol
Session 5
Transport protocol
Transport 4
Network protocol
Network 3
Data link protocol
Data link 2
Physical protocol
Physical 1

Network

Layers, interfaces, and protocols in the OSI model

The OSI reference model 5 / 49


Communication: Foundations Layered Protocols

Basic networking model…

A typical message as it appears on the network

Drawbacks
► Focus on message-passing only
► Often unneeded or unwanted functionality
► Violates access transparency
The OSI reference model 6 / 49
Communication: Foundations Layered Protocols

Low-level layers

Recap
► Physical layer: contains the specification and implementation of
bits, and their transmission between sender and receiver
► Data link layer: prescribes the transmission of a series of bits into
a frame to allow for error and flow control
► Network layer: describes how packets in a network of computers
are to be routed.

Observation
For many distributed systems, the lowest-level interface is that of the
network layer.

The OSI reference model 7 / 49


Communication: Foundations Layered Protocols

Transport Layer

Important
The transport layer provides the actual communication facilities for
most distributed systems.

Standard Internet protocols


► TCP: connection-oriented, reliable, stream-oriented
communication
► UDP: unreliable (best-effort) datagram
communication

The OSI reference model 8 / 49


Communication: Foundations Layered Protocols

Middleware layer

Observation
Middleware is invented to provide common services and protocols that
can be used by many different applications
► A rich set of communication protocols
► (Un)marshaling of data, necessary for integrated systems
► Naming protocols, to allow easy sharing of resources
► Security protocols for secure communication
► Scaling mechanisms, such as for replication and caching

Note
What remains are truly application-specific protocols... such as?

Middleware protocols 9 / 49
Communication: Foundations Layered Protocols

An adapted layering scheme

Application protocol
Application

Middleware protocol
Middleware

Operating Host-to-host protocol

system
Physical/Link-level protocol
Hardware

Network

Middleware protocols 10 /
Communication: Foundations Types of Communication

Types of communication

Distinguish...
Synchronize at Synchronize at Synchronize after
request submission request delivery processing by server
Client

Request

Transmission
interrupt

Storage
facility

Reply

Server Time

Viewing middleware as an intermediate (distributed) service in application-


level communication
► Transient versus persistent communication
► Asynchronous versus synchronous communication

11 /
Communication: Foundations Types of Communication

Types of communication
Transient versus persistent
Synchronize at Synchronize at Synchronize after
request submission request delivery processing by server
Client

Request

Transmission
interrupt

Storage
facility

Reply

Server Time

► Transient communication: Comm. server discards message when


it cannot be delivered at the next server, or at the receiver.
► Persistent communication: A message is stored at a
communication server as long as it takes to deliver it
(email).
12 /
Communication: Foundations Types of Communication

Types of communication
Places for synchronization
Synchronize at Synchronize at Synchronize after
request submission request delivery processing by server
Client

Request

Transmission
interrupt

Storage
facility

Reply

Server Time

► At request submission
► At request delivery
► After request processing

13 /
Communication: Foundations Types of Communication

Client/Server

Some observations
Client/Server computing is generally based on a model of transient
synchronous communication:
► Client and server have to be active at time of communication
► Client issues request and blocks until it receives reply
► Server essentially waits only for incoming requests, and
subsequently processes them

10 / 49
Communication: Foundations Types of Communication

Client/Server

Some observations
Client/Server computing is generally based on a model of transient
synchronous communication:
► Client and server have to be active at time of communication
► Client issues request and blocks until it receives reply
► Server essentially waits only for incoming requests, and
subsequently processes them

Drawbacks synchronous communication


► Client cannot do any other work while waiting for reply
► Failures have to be handled immediately: the client is
waiting
► The model may simply not be appropriate (mail, news)

10 / 49
Communication: Foundations Types of Communication

Messaging

Message-oriented middleware
Aims at high-level persistent asynchronous communication:
► Processes send each other messages, which are queued
► Sender need not wait for immediate reply, but can do other things
► Middleware often ensures fault tolerance

11 / 49
Communication: Remote procedure call Basic RPC operation

Basic RPC operation

Observations
► Application developers are familiar with simple procedure model
► Well-engineered procedures operate in isolation (black box)
► There is no fundamental reason not to execute procedures on
separate machine
Wait for result
Client

Conclusion Call remote


procedure
Return
from call
Communication between caller &
callee can be hidden by using Request Reply
procedure-call mechanism. Server
Call local procedure Time
and return results

12 / 49
Communication: Remote procedure call Basic RPC operation

Basic RPC operation

Client machine Server machine

Client process Server process


1. Client call to
procedure Implementation 6. Stub makes
of doit local call to “doit”
Server stub
r = doit(a,b) r = doit(a,b)
Client stub
proc: “doit” proc: “doit”
type1: val(a) type1: val(a) 5. Stub unpacks
2. Stub builds message
type2: val(b) type2: val(b)
message
proc: “doit” 4. Server OS
Client OS type1: val(a) Server OS
hands message
type2: to server stub
val(b)

3. Message is sent
across the network
1. Client procedure calls client stub. 6. Server does local call; returns result to
2. Stub builds message; calls local OS. stub.
3. OS sends message to remote 7. Stub builds message; calls OS.
OS. 8. OS sends message to client’s OS.
4. Remote OS gives message to stub. 9. Client’s OS gives message to stub.
5. Server stub unpacks parameter(s) 10. Client stub unpacks result; returns to
and call the server Client 13 / 49
Communication: Remote procedure call Parameter passing

RPC: Parameter passing


There’s more than just wrapping parameters into a
message
► Client and server machines may have different data
representations (think of byte ordering)
► Wrapping a parameter means transforming a value into a
sequence of bytes
► Client and server have to agree on the same encoding:

► How are basic data values represented (integers, floats,


characters)
► How are complex data values represented (arrays,
unions)

Conclusion
Client and server need to properly interpret messages, transforming
them into machine-dependent representations.
14 / 49
Communication: Remote procedure call Parameter passing

RPC: Parameter passing

Some assumptions
► Copy in/copy out semantics: while procedure is executed, nothing
can be assumed about parameter values.
► All data that is to be operated on is passed by parameters.
Excludes passing references to (global) data.

15 / 49
Communication: Remote procedure call Parameter passing

RPC: Parameter passing

Some assumptions
► Copy in/copy out semantics: while procedure is executed, nothing
can be assumed about parameter values.
► All data that is to be operated on is passed by parameters.
Excludes passing references to (global) data.

Conclusion
Full access transparency cannot be realized.

15 / 49
Communication: Remote procedure call Parameter passing

RPC: Parameter passing

Some assumptions
► Copy in/copy out semantics: while procedure is executed, nothing
can be assumed about parameter values.
► All data that is to be operated on is passed by parameters.
Excludes passing references to (global) data.

Conclusion
Full access transparency cannot be realized.

A remote reference mechanism enhances access


transparency
► Remote reference offers unified access to remote data
► Remote references can be passed as parameter in
RPCs
► Note: stubs can sometimes be used as such
references
15 / 49
Communication: Remote procedure call Variations on RPC

Asynchronous RPCs

Essence
Try to get rid of the strict request-reply behavior, but let the client
continue without waiting for an answer from the server.

Wait for Callback to client


Client acceptance

Call remote Return


procedure from call Return

Accept results
Request request

Server Call local procedure Time

Asynchronous RPC 16 / 49
Communication: Remote procedure call Variations on RPC

Sending out multiple RPCs

Essence
Sending an RPC request to a group of servers.

Server Call local procedure

Callbacks to client
Client

Call remote

procedures

Server Call local procedure Time

Multicast RPC 17 / 49
Communication: Remote procedure call Example: DCE RPC

RPC in practice

Uuidgen

Interface
definition file

IDL compiler

Client code Client stub Header Server stub Server code

#include #include

C compiler C compiler C compiler C compiler

Client Client stub Server stub Server


object file object file object file object file

Runtime Runtime
Linker Linker
library library

Client Server

binary binary

Writing a Client and a Server 18 / 49


Communication: Remote procedure call Example: DCE RPC

Client-to-server binding (DCE)

Issues
(1) Client must locate server machine, and (2) locate the server.

Directory machine

Directory
server
2. Register service
3. Look up server
Server machine
Client machine

5. Do RPC 1. Register port


Server
Client

4. Ask for port DCE


daemon
Port
table

Binding a client to a server 19 / 49


Communication: Message-oriented communication Overview

Message-oriented communication

• RPC and ROI contribute to hiding communication in DS


• Neither mechanism is always appropriate
• Receiving side is executing at the time a request is issued,
alternative communication services are needed
• The inherent synchronous nature of RPCs, by which a client
is blocked until its request has been processed- may need to
be replaced by something else
• Many DSs and applications are built directly on top of the
simple message-oriented model offered by the transport
layer

20 / 49
Communication: Message-oriented communication Simple transient messaging with sockets

Transient messaging: sockets


Berkeley socket interface
Operation Description
socket Create a new communication end point
bind Attach a local address to a socket
listen Tell operating system what the maximum number of pending
connection requests should be
accept Block caller until a connection request arrives
connect Actively attempt to establish a connection
send Send some data over the connection
receive Receive some data over the connection
close Release the connection

Server
socket bind listen accept receive send close

Synchronization point Communication

socket connect send receive close


Client
20 / 49
Communication: Message-oriented communication Simple transient messaging with sockets

Sockets: Python code

Server
1 from so ck et import *
2 s = socket(AF_INET, SOCK_STREAM)
3 s.bind((HOST, PORT))
4 s.listen(1)
5 (conn, a d d r) = s . a c c e p t ( ) # re t u r n s new s o c k e t and a d d r. c l i e n t
6 while True: # f o re v e r
7 d a t a = conn.recv(1024) # re c e i v e data from c l i e n t
8 i f not d a t a : break # s t o p i f c l i e n t stopped
9 conn.send(str(data)+" * ") # re t u r n s e n t data p l u s an " * "
10 co n n .clo se() # c l o s e t h e connection

Client
1 from so ck et import *
2 s = socket(AF_INET, SOCK_STREAM)
3 s.connect((HOST, PORT)) # connect t o s e r v e r (b l o c k u n t i l accepted)
4 s. se n d ( ’He l l o , world’) # send same data
5 d a t a = s.recv(1024) # re c e i v e t h e response
6 print d a t a # p rin t th e re su l t
7 s.close() # c l o s e t h e connection

21 / 49
Communication: Message-oriented communication Advanced transient messaging

Making sockets easier to work with

Observation
Sockets are rather low level and programming mistakes are easily
made. However, the way that they are used is often the same (such as
in a client-server setting).

Alternative: ZeroMQ
Provides a higher level of expression by pairing sockets: one for
sending messages at process P and a corresponding one at process
Q for receiving messages. All communication is asynchronous.

Three patterns
► Request-reply
► Publish-subscribe
► Pipeline

Using messaging patterns: ZeroMQ 22 / 49


Communication: Message-oriented communication Advanced transient messaging

Request-reply

Server
1 import zmq
2 co n tex t = zmq.Context()
3
4 p1 = " t c p : / / " + HOST +":"+ PORT1 # how and where t o connect
5 p2 = " t c p : / / " + HOST +":"+ PORT2 # how and where t o connect
6 s = context.socket(zmq.REP) # c re a t e re p l y s o c k e t
7
8 s.b in d (p 1 ) # bind s o c k e t t o address
9 s.b in d (p 2 ) # bind s o c k e t t o
10 while True: address
11 message = s . r e c v ( ) # w a i t f o r incoming message
12 i f not "STOP" i n message: # i f not t o s t o p . . .
13 s.send(message + " * " ) # append " * " t o message
14 else: # else...
15 break # break o u t o f loop and
end

Using messaging patterns: ZeroMQ 23 / 49


Communication: Message-oriented communication Advanced transient messaging

Request-reply

Client
1 import zmq
2 co n tex t = zmq.Context()
3
4 php = " t c p : / / " + HOST +":"+ PORT # how and where t o
connect
5 s = context.socket(zmq.REQ) # c re a t e s o c k e t
67 s.connect(php) # b lo ck u n t i l connected
8 s.send("Hello World") # send message
9 message = s . r e c v ( ) # b lo ck u n t i l response
10 s.send("STOP") # t e l l server t o stop
11 print message # p rin t re su l t

Using messaging patterns: ZeroMQ 24 / 49


Communication: Message-oriented communication Advanced transient messaging

Publish-subscribe
Server
1 import zmq, time
2
3 co n tex t = zmq.Context()
4 s = context.socket(zmq.PUB) # c re a t e a p u b li sh er s o c k e t
5 p = " t c p : / / " + HOST +":"+ PORT # how and where t o communicate
6 s.bind(p) # bind s o c k e t t o t h e address
7 while True:
8 t i m e . sl e e p (5 ) # w a i t every 5 seconds
9 s.send("TIME " + t i m e . a sc t i m e ()) # p u b li sh t h e cu rren t t i m e
Client
1 import zmq
2
3 co n tex t = zmq.Context()
4 s = context.socket(zmq.SUB) # c re a t e a subscriber s o c k e t
5 p = " t c p : / / " + HOST +":"+ PORT # how and where t o communicate
6 s.co n n ect(p ) # connect t o t h e s e r v e r
7 s.setsockopt(zmq.SUBSCRIBE, "TIME") # subscribe t o TIME messages
8
9 f o r i i n range(5): # Five i t e r a t i o n s
10 time = s . r e c v ( ) # re c e i v e a message
11 print time

Using messaging patterns: ZeroMQ 25 / 49


Communication: Message-oriented communication Advanced transient messaging

Pipeline

Source
1 import zmq, t i m e , p i c k l e , s y s , random
2
3 co n tex t = zmq.Context()
4 me = s t r ( s y s . a rg v [ 1 ] )
5 s = context.socket(zmq.PUSH) # c re a t e a push s o c k e t
6 s r c = SRC1 i f me == ’ 1 ’ e l s e SRC2 # check t a s k source h o s t
7 p r t = PORT1 i f me == ’ 1 ’ e l s e PORT2 # check t a s k source p o r t
8 p = " t c p : / / " + s r c +":"+ p r t # how and where t o connect
9 s.bind(p) # bind s o c k e t t o address
10
11 f o r i i n range(100): # generate 100 workloads
12 workload = random.randint(1, 100) # compute workload
13 s.send(pickle.dumps((me,workload))) # send workload t o
worker

Using messaging patterns: ZeroMQ 26 / 49


Communication: Message-oriented communication Advanced transient messaging

Pipeline

Worker
1 import zmq, t i m e , p i c k l e , s y s
2
3 co n tex t = zmq.Context()
4 me = s t r ( s y s . a rg v [ 1 ] )
5 r = context.socket(zmq.PULL) # c re a t e a p u l l s o c k e t
6 p1 = " t c p : / / " + SRC1 +":"+ PORT1 # address f i r s t t a s k source
7 p2 = " t c p : / / " + SRC2 +":"+ PORT2 # address second t a s k source
8 r.connect(p1) # connect t o t a s k source 1
9 r.connect(p2) # connect t o t a s k source 2
10
11 while True:
12 work = p i c k l e . l o a d s ( r. r e c v ( ) ) # re c e i v e work from a source
13 time.sleep(work[1] * 0.01) # pretend t o work

Using messaging patterns: ZeroMQ 27 / 49


Communication: Message-oriented communication Advanced transient messaging

MPI: When lots of flexibility is needed


Representative operations
Operation Description
MPI bsend Append outgoing message to a local send buffer
MPI send Send a message and wait until copied to local or
remote buffer
MPI ssend Send a message and wait until transmission
starts
MPI sendrecv Send a message and wait for reply
MPI isend Pass reference to outgoing message, and
continue
MPI issend Pass reference to outgoing message, and wait
until receipt starts
MPI recv Receive a message; block if there is none
MPI irecv Check if there is an incoming message, but do
not block

The Message-Passing Interface (MPI) 28 / 49


Communication: Message-oriented communication Message-oriented persistent communication

Message-oriented middleware
Essence
Asynchronous persistent communication through support of
middleware-level queues. Queues correspond to buffers at
communication servers.

Operations
Operation Description
put Append a message to a specified queue
get Block until the specified queue is nonempty, and
remove the first message
poll Check a specified queue for messages, and
remove the first. Never block
notify Install a handler to be called when a message is
put into the specified queue

Message-queuing model 29 / 49
Communication: Message-oriented communication Message-oriented persistent communication

General model
Queue managers
Queues are managed by queue managers. An application can put
messages only into a local queue. Getting a message is possible by
extracting it from a local queue only ⇒ queue managers need to route
messages.

Routing
Look up
Source queue contact address Destination queue
manager of destination manager
queue manager
Logical
queue-level
address (name)

Local OS Address lookup


Local OS
database

Contact
Network
address

General architecture of a message-queuing system 30 / 49


Communication: Message-oriented communication Message-oriented persistent communication

Message broker

Observation
Message queuing systems assume a common messaging protocol: all
applications agree on message format (i.e., structure and data
representation)

Broker handles application heterogeneity in an MQ system


► Transforms incoming messages to target format
► Very often acts as an application gateway
► May provide subject-based routing capabilities (i.e.,
publish-subscribe capabilities)

Message brokers 31 / 49
Communication: Message-oriented communication Message-oriented persistent communication

Message broker: general architecture

Source Message broker Destination

Application Application

Broker plugins Rules

Interface Interface
Queuing
layer
Local OS Local OS Local OS

Message brokers 32 / 49
Communication: Message-oriented communication Example: IBM’s WebSphere message-queuing system

IBM’s WebSphere MQ
Basic concepts
► Application-specific messages are put into, and removed from
queues
► Queues reside under the regime of a queue manager
► Processes can put messages only in local queues, or through an
RPC mechanism

Message transfer
► Messages are transferred between queues
► Message transfer between queues at different processes,
requires a channel
► At each end point of channel is a message channel agent
► Message channel agents are responsible for:
► Setting up channels using lower-level network
communication facilities (e.g., TCP/IP)
► (Un)wrapping messages from/in transport-
level packets
Overview ► Sending/receiving packets 33 / 49
Communication: Message-oriented communication Example: IBM’s WebSphere message-queuing system

IBM’s WebSphere MQ
Schematic overview

Client's receive
Sending client Routing table Send queue queue Receiving client

Queue Queue
manager
manager
MQ Interface MQ Interface

Server MCA MCA


Stub
stub MCA MCA Server
stub
Stub

Enterprise network Local network


RPC Message To other remote
passing (synchronous) queue
(asynchronous) managers
► Channels are inherently unidirectional
► Automatically start MCAs when messages arrive
► Any network of queue managers can be created
► Routes are set up manually (system administration)
Overview 34 / 49
Communication: Message-oriented communication Example: IBM’s WebSphere message-queuing system

Message channel agents

Some attributes associated with message channel agents

Attribute Description
Transport type Determines the transport protocol to be used
FIFO delivery Indicates that messages are to be delivered in
the order they are sent
Message length Maximum length of a single message
Setup retry count Specifies maximum number of retries to start up
the remote MCA
Delivery retries Maximum times MCA will try to put received
message into queue

Channels 35 / 49
Communication: Message-oriented communication Example: IBM’s WebSphere message-queuing system

IBM’s WebSphere MQ
Routing
By using logical names, in combination with name resolution to local
queues, it is possible to put a message in a remote queue

Alias table Routing table


LA1 QMC QMB SQ1
Alias table Routing table
LA2 QMD QMC SQ1
LA1 QMA QMA SQ1
QMD SQ2
LA2 QMD QMC SQ1
QMD SQ1
SQ2 SQ1
SQ1
QMA
QMB

Routing table Routing table


SQ1 QMC
QMA SQ1 QMA SQ1
QMC SQ2 QMB SQ1
QMB SQ1 SQ2 QMD SQ1

Alias table
LA1 QMA
LA2 QMC SQ1
QMD

Message transfer 36 / 49
Communication: Multicast communication Application-level tree-based multicasting

Multicast Communication
• An important topic in communication id DSs is the support for
sending data to multiple receivers
• This topic has belonged to the domain of network protocols,
where numerous proposals for network-level and transport-level
solutions have been implemented and evaluated
• A major issue in all solutions was setting up the communication
paths for information disseminations
• In practice, this involved a huge management effort, in many
cases requiring human intervention
• With the advent of peer-to-peer technology, and notably
structured overlay management, it became easier to set up
communication paths
• As peer-to-peer solutions are typically deployed at the application
layer, various application level multicasting techniques have been
introduced
• Gossip-based information dissemination provides simple (yet
often less efficient) ways for multicasting

37 / 49
Communication: Multicast communication Application-level tree-based multicasting

Application-level multicasting

Essence
Organize nodes of a distributed system into an overlay network and
use that network to disseminate data:
► Oftentimes a tree, leading to unique paths
► Alternatively, also mesh networks, requiring a form of routing

• Network routers are not involved in group membership


• The connections between nodes in the overlay network may
cross several physical links, and as such, routing messages
within the overlay may not be optimal in comparison to what
could have been achieved by network-level routing

37 / 49
Communication: Multicast communication Application-level tree-based multicasting

Application-level multicasting in Chord

Basic approach
1. Initiator generates a multicast identifier mid .
2. Lookup succ(mid ), the node responsible for mid .
3. Request is routed to succ(mid ), which will become the root.
4. If P wants to join, it sends a join request to the root.
5. When request arrives at Q:
► Q has not seen a join request before ⇒ it becomes
forwarder; P becomes child of Q. Join request continues to
be forwarded.
► Q knows about tree ⇒ P becomes child of Q. No need to
forward join request anymore.

38 / 49
Communication: Multicast communication Application-level tree-based multicasting

ALM: Some costs


Different metrics
End host E Router
1 1 C
A 1
Ra Re Rc
30 20
5
7
40 Rd
Rb 1
1 Internet D
B
Overlay network

► Link stress: How often does an ALM message cross the same
physical link? Example: message from A to D needs to cross
► (Ra, Rb) twice.
Stretch: Ratio in delay between ALM-level path and network-level
path. Example: messages B to C follow path of length 73 at ALM,
but 47 at network level ⇒ stretch = 73/47.

Performance issues in overlays 39 / 49


Communication: Multicast communication Flooding-based multicasting

Flooding
Essence The size of a random
P simply sends a message m overlay as function of the
to each of its neighbors. Each number of nodes
neighbor will forward that
message, except to P, and only
300
if it had not seen m before.

Number of edges (x 1000)


250
Performance 200 pedge = 0.6
The more edges, the more 150
pedge = 0.4
expensive! 100
pedge = 0.2
50

0
100 500 1000
Number of nodes

40 / 49
Communication: Multicast communication Flooding-based multicasting

Flooding
Essence The size of a random
P simply sends a message m overlay as function of the
to each of its neighbors. Each number of nodes
neighbor will forward that
message, except to P, and only
300
if it had not seen m before.

Number of edges (x 1000)


250
Performance 200 pedge = 0.6
The more edges, the more 150
pedge = 0.4
expensive! 100
pedge = 0.2
50

0
100 500 1000
Number of nodes

Variation
Let Q forward a message with a certain probability pflood , possibly
even dependent on its own number of neighbors (i.e., node degree) or
the degree of its neighbors. 40 / 49
Communication: Multicast communication Gossip-based data dissemination

Epidemic protocols

Assume there are no write–write conflicts


► Update operations are performed at a single server
► A replica passes updated state to only a few neighbors
► Update propagation is lazy, i.e., not immediate
► Eventually, each update should reach every replica

Two forms of epidemics


► Anti-entropy: Each replica regularly chooses another replica at
random, and exchanges state differences, leading to identical
states at both afterwards
► Rumor spreading: A replica which has just been updated (i.e.,
has been contaminated), tells a number of other replicas about its
update (contaminating them as well).

41 / 49
Communication: Multicast communication Gossip-based data dissemination

Anti-entropy

Principle operations
► A node P selects another node Q from the system at random.
► Pull: P only pulls in new updates from Q
► Push: P only pushes its own updates to Q
► Push-pull: P and Q send updates to each other

Observation
For push-pull it takes O(log(N)) rounds to disseminate updates to all
N nodes (round = when every node has taken the initiative to start an
exchange).

Information dissemination models 42 / 49


Communication: Multicast communication Gossip-based data dissemination

Anti-entropy: analysis
Basics
Consider a single source, propagating its update. Let pi be the
probability that a node has not received the update after the ith round.

Analysis: staying ignorant


► With pull, pi + 1 = (pi ) 2 : the node
was not updated during the ith
1.0
round and should contact another N = 10,000

Probability not yet updated


ignorant node during the next 0.8
round.
push
► With push, pi + 1 = 0.6

pi (1 −N 1−1 ) (N−1)(1−pi ) ≈ pi e−1 0.4 push-pull


small pi and large N): (for the node pull
0.2
was ignorant during the ith round
and no updated node chooses to
0 5 10 15 20 25
contact it during the next round.
Round
► With push-pull: (pi ) 2 · (pi e−1 )

Information dissemination models 43 / 49


Communication: Multicast communication Gossip-based data dissemination

Anti-entropy performance

1.0
N = 10,000
Probability not yet updated 0.8

0.6 push

0.4 push-pull
pull
0.2

0 5 10 15 20 25
Round

Information dissemination models 44 / 49


Communication: Multicast communication Gossip-based data dissemination

Rumor spreading

Basic model
A server S having an update to report, contacts other servers. If a
server is contacted to which the update has already propagated, S
stops contacting other servers with probability pstop.

Observation
If s is the fraction of ignorant servers (i.e., which are unaware of the
update), it can be shown that with many servers

−(1/ pstop + 1)(1−s)


s= e

Information dissemination models 45 / 49


Communication: Multicast communication Gossip-based data dissemination

Formal analysis
Notations
Let s denote fraction of nodes that have not yet been updated (i.e.,
susceptible; i the fraction of updated (infected) and active nodes; and r
the fraction of updated nodes that gave up (removed).

From theory of epidemics

(1) ds/ dt = −s
(2) di/dt ·= i s · i − pstop · (1 −
stopp
⇒ di/ ds s) · i −(1 + pstop ) s
=
⇒ i (s) +
= −(1 + pstop ) · s + pstop · ln(s)
+ C
Wrapup
i (1) = 0 ⇒ C = 1 + pstop ⇒ i (s) = (1 + pstop) · (1 −s) + pstop · ln(s).
We
are looking for the case i (s) = 0, which leads to s = e − (1/p stop + 1)(1 − s )
Information dissemination models 46 / 49
Communication: Multicast communication Gossip-based data dissemination

Rumor spreading

The effect of stopping


0.20 Consider 10,000 nodes
1/p stop s Ns
0.15
1 0.203188 2032
2 0.059520 595
s 0.10
3 0.019827 198
0.05 4 0.006977 70
5 0.002516 25
0.00
0.0 0.2 0.4
pstop
0.6 0.8 1.0 6 0.000918 9
7 0.000336 3

Information dissemination models 47 / 49


Communication: Multicast communication Gossip-based data dissemination

Rumor spreading

The effect of stopping


0.20 Consider 10,000 nodes
1/p stop s Ns
0.15
1 0.203188 2032
2 0.059520 595
s 0.10
3 0.019827 198
0.05 4 0.006977 70
5 0.002516 25
0.00
0.0 0.2 0.4
pstop
0.6 0.8 1.0 6 0.000918 9
7 0.000336 3

Note
If we really have to ensure that all servers are eventually updated,
rumor spreading alone is not enough

Information dissemination models 47 / 49


Communication: Multicast communication Gossip-based data dissemination

Deleting values

Fundamental problem
We cannot remove an old value from a server and expect the removal
to propagate. Instead, mere removal will be undone in due time using
epidemic algorithms

Solution
Removal has to be registered as a special update by inserting a death
certificate

Removing data 48 / 49
Communication: Multicast communication Gossip-based data dissemination

Deleting values

When to remove a death certificate (it is not allowed to


stay for ever)
► Run a global algorithm to detect whether the removal is known
everywhere, and then collect the death certificates (looks like
garbage collection)
► Assume death certificates propagate in finite time, and associate
a maximum lifetime for a certificate (can be done at risk of not
reaching all servers)

Note
It is necessary that a removal actually reaches all servers.

Removing data 49 / 49

You might also like