Distributed Systems: Chapter 01: Introduction


Distributed Systems

(3rd Edition)

Chapter 01: Introduction


Version: February 25, 2017
Introduction: What is a distributed system?

Distributed System

Definition
A distributed system is a collection of autonomous computing elements that
appears to its users as a single coherent system.

Characteristic features
Autonomous computing elements, also referred to as nodes, be they
hardware devices or software processes.
Single coherent system: users or applications perceive a single system ⇒
nodes need to collaborate.

2 / 56
Introduction: What is a distributed system? Middleware and distributed systems

Middleware: the OS of distributed systems

[Figure: four computers, each running its own local OS, connected by a network. The distributed-system layer (middleware) spans all four local operating systems and offers the same interface everywhere; applications A, B, and C run on top of the middleware.]

What does it contain?


Commonly used components and functions that need not be implemented by
applications separately.

6 / 56
Introduction: Design goals

What do we want to achieve?

Support sharing of resources


Distribution transparency
Openness
Scalability

7 / 56
Introduction: Design goals Supporting resource sharing

Sharing resources

Canonical examples
Cloud-based shared storage and files
Peer-to-peer assisted multimedia streaming
Shared mail services (think of outsourced mail systems)
Shared Web hosting (think of content distribution networks)

Observation
“The network is the computer”

(quote from John Gage, then at Sun Microsystems)

8 / 56
Introduction: Design goals Making distribution transparent

Distribution transparency

Types
Transparency Description
Access Hide differences in data representation and how an
object is accessed
Location Hide where an object is located
Relocation Hide that an object may be moved to another location
while in use
Migration Hide that an object may move to another location
Replication Hide that an object is replicated
Concurrency Hide that an object may be shared by several
independent users
Failure Hide the failure and recovery of an object

Types of distribution transparency 9 / 56



Introduction: Design goals Making distribution transparent

Degree of transparency

Observation
Aiming at full distribution transparency may be too much:
There are communication latencies that cannot be hidden
Completely hiding failures of networks and nodes is (theoretically and
practically) impossible
You cannot distinguish a slow computer from a failing one
You can never be sure that a server actually performed an operation
before a crash
Full transparency will cost performance, exposing distribution of the
system
Keeping replicas exactly up-to-date with the master takes time
Immediately flushing write operations to disk for fault tolerance

Degree of distribution transparency 10 / 56



Introduction: Design goals Making distribution transparent

Degree of transparency

Exposing distribution may be good


Making use of location-based services (finding your nearby friends)
When dealing with users in different time zones
When it makes it easier for a user to understand what’s going on (when
e.g., a server does not respond for a long time, report it as failing).

Conclusion
Distribution transparency is a nice goal, but achieving it is a different story,
and often it should not even be aimed at.

Degree of distribution transparency 11 / 56


Introduction: Design goals Being open

Openness of distributed systems

What are we talking about?


Be able to interact with services from other open systems, irrespective of the
underlying environment:
Systems should conform to well-defined interfaces
Systems should easily interoperate
Systems should support portability of applications
Systems should be easily extensible

Interoperability, composability, and extensibility 12 / 56


Introduction: Design goals Being open

Policies versus mechanisms

Implementing openness: policies


What level of consistency do we require for client-cached data?
Which operations do we allow downloaded code to perform?
Which QoS requirements do we adjust in the face of varying bandwidth?
What level of secrecy do we require for communication?

Implementing openness: mechanisms


Allow (dynamic) setting of caching policies
Support different levels of trust for mobile code
Provide adjustable QoS parameters per data stream
Offer different encryption algorithms

Separating policy from mechanism 13 / 56


Introduction: Design goals Being scalable

Scale in distributed systems

Observation
Many developers of modern distributed systems easily use the adjective
“scalable” without making clear why their system actually scales.

At least three components


Number of users and/or processes (size scalability)
Maximum distance between nodes (geographical scalability)
Number of administrative domains (administrative scalability)

Observation
Most systems account only, to a certain extent, for size scalability. Often a
solution: multiple powerful servers operating independently in parallel. Today,
the challenge still lies in geographical and administrative scalability.

Scalability dimensions 15 / 56
Introduction: Design goals Being scalable

Size scalability

Root causes for scalability problems with centralized solutions


The computational capacity, limited by the CPUs
The storage capacity, including the transfer rate between CPUs and disks
The network between the user and the centralized service

Scalability dimensions 16 / 56
Introduction: Design goals Being scalable

Problems with geographical scalability

Cannot simply go from LAN to WAN: many distributed systems assume


synchronous client-server interactions: client sends request and waits for
an answer. Latency may easily prohibit this scheme.
WAN links are often inherently unreliable: simply moving streaming video
from LAN to WAN is bound to fail.
Lack of multipoint communication, so that a simple search broadcast
cannot be deployed. Solution is to develop separate naming and directory
services (having their own scalability problems).

Scalability dimensions 20 / 56
Introduction: Design goals Being scalable

Problems with administrative scalability


Essence
Conflicting policies concerning usage (and thus payment), management, and
security

Examples
Computational grids: share expensive resources between different
domains.
Shared equipment: how to control, manage, and use a shared radio
telescope constructed as a large-scale shared sensor network?

Exception: several peer-to-peer networks


File-sharing systems (based, e.g., on BitTorrent)
Peer-to-peer telephony (Skype)
Peer-assisted audio streaming (Spotify)

Note: it is the end users who collaborate, not administrative entities.


Scalability dimensions 21 / 56
Introduction: Design goals Being scalable

Techniques for scaling

Hide communication latencies


Make use of asynchronous communication
Have separate handler for incoming response
Problem: not every application fits this model

Scaling techniques 22 / 56
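The idea can be sketched with Python threads (all names here are illustrative, not part of any real middleware): the caller fires off requests and hands each response to a separate handler instead of blocking on every round trip.

```python
import threading
import queue

def fetch(request, on_response):
    """Simulate an asynchronous request: do the (slow) work on a
    separate thread and hand the result to a response handler."""
    def worker():
        result = f"response to {request}"   # stand-in for a network round trip
        on_response(result)
    t = threading.Thread(target=worker)
    t.start()
    return t

results = queue.Queue()
threads = [fetch(f"req-{i}", results.put) for i in range(3)]
for t in threads:
    t.join()

# The caller was free to do other work while the requests were in flight.
print(sorted(results.queue))
```

The three requests overlap in time instead of being serviced one after the other; as the slide notes, this only helps applications that can do useful work while waiting.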
Introduction: Design goals Being scalable

Techniques for scaling

Partition data and computations across multiple machines


Move computations to clients (Java applets)
Decentralized naming services (DNS)
Decentralized information systems (WWW)

Scaling techniques 24 / 56
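Partitioning can be sketched as follows (a hypothetical example, not how DNS or the WWW actually partition their data): a stable hash decides which machine stores which key.

```python
import hashlib

def partition(key, n_nodes):
    """Decide which of n_nodes machines stores a given key (a toy
    sketch of partitioning data across machines). A stable hash keeps
    the mapping deterministic across processes."""
    h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
    return h % n_nodes

# Spread (made-up) user records over 4 machines.
placement = {name: partition(name, 4) for name in ("alice", "bob", "carol")}
print(placement)
```

Because every node computes the same mapping, no central directory is needed to locate a record, which is exactly what makes the technique scale.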
Introduction: Design goals Being scalable

Techniques for scaling

Replication and caching: Make copies of data available at different machines


Replicated file servers and databases
Mirrored Web sites
Web caches (in browsers and proxies)
File caching (at server and client)

Scaling techniques 25 / 56
Introduction: Design goals Being scalable

Scaling: The problem with replication

Applying replication is easy, except for one thing


Having multiple copies (cached or replicated) leads to inconsistencies:
modifying one copy makes that copy different from the rest.
Always keeping copies consistent in a general way requires global
synchronization on each modification.
Global synchronization precludes large-scale solutions.

Observation
If we can tolerate inconsistencies, we may reduce the need for global
synchronization, but tolerating inconsistencies is application dependent.

Scaling techniques 26 / 56
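A toy illustration of the inconsistency problem (no real replication machinery is involved; the record is made up): two copies of the same data diverge as soon as an update is applied to only one of them.

```python
import copy

# Two machines each hold a copy of the same record.
machine_a = {"balance": 100}
machine_b = copy.deepcopy(machine_a)

# An update is applied at machine A only...
machine_a["balance"] -= 30

# ...so without global synchronization the copies now disagree.
print(machine_a["balance"], machine_b["balance"])
```

Propagating the update to machine_b before anyone reads it is precisely the global synchronization that the slide says precludes large-scale solutions.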
Introduction: Design goals Pitfalls

Developing distributed systems: Pitfalls


Observation
Many distributed systems are needlessly complex because of mistakes that
later required patching. Many false assumptions are often made.

False (and often hidden) assumptions


The network is reliable
The network is secure
The network is homogeneous
The topology does not change
Latency is zero
Bandwidth is infinite
Transport cost is zero
There is one administrator
27 / 56
Introduction: Types of distributed systems

Three types of distributed systems

High performance distributed computing systems


Distributed information systems
Distributed systems for pervasive computing

28 / 56
Introduction: Types of distributed systems High performance distributed computing

Parallel computing

Observation
High-performance distributed computing started with parallel computing

Multiprocessor and multicore versus multicomputer


[Figure: a shared-memory multiprocessor (processors P connected through an interconnect to shared memories M) versus a private-memory multicomputer (each processor P paired with its own memory M, the pairs connected through an interconnect).]
29 / 56
Introduction: Types of distributed systems High performance distributed computing

Cluster computing

Essentially a group of high-end systems connected through a LAN


Homogeneous: same OS, near-identical hardware
Single managing node

[Figure: a cluster. The master node runs the management application and parallel libraries on its local OS and is reachable through a remote-access network; each compute node runs a component of the parallel application on its local OS. The nodes are connected by a standard network and a high-speed network.]

Cluster computing 31 / 56
Introduction: Types of distributed systems High performance distributed computing

Grid computing

The next step: lots of nodes from everywhere


Heterogeneous
Dispersed across several organizations
Can easily span a wide-area network

Note
To allow for collaborations, grids generally use virtual organizations. In
essence, this is a grouping of users (or better: their IDs) that will allow for
authorization on resource allocation.

Grid computing 32 / 56
Introduction: Types of distributed systems High performance distributed computing

Cloud computing

[Figure: cloud computing organized into four layers.
Hardware: CPU, memory, disk, bandwidth (datacenters).
Infrastructure as a Service: computation (VM), storage (block, file); example: Amazon EC2.
Platform as a Service: software framework (Java/Python/.Net), storage (databases); examples: MS Azure, Google App Engine, Amazon S3.
Software as a Service: Web services, multimedia, business apps; examples: Google Docs, Gmail, YouTube, Flickr.]
Cloud computing 34 / 56
Introduction: Types of distributed systems High performance distributed computing

Cloud computing

Make a distinction between four layers


Hardware: Processors, routers, power and cooling systems. Customers
normally never get to see these.
Infrastructure: Deploys virtualization techniques. Evolves around
allocating and managing virtual storage devices and virtual servers.
Platform: Provides higher-level abstractions for storage and such.
Example: Amazon S3 storage system offers an API for (locally created)
files to be organized and stored in so-called buckets.
Application: Actual applications, such as office suites (text processors,
spreadsheet applications, presentation applications). Comparable to the
suite of apps shipped with OSes.

Cloud computing 35 / 56
Introduction: Types of distributed systems Distributed information systems

Integrating applications

Situation
Organizations were confronted with many networked applications, but achieving
interoperability was painful.

Basic approach
A networked application is one that runs on a server, making its services
available to remote clients. Simple integration: clients combine requests for
(different) applications, send them off, collect the responses, and present a
coherent result to the user.

Next step
Allow direct application-to-application communication, leading to Enterprise
Application Integration.

42 / 56
Introduction: Types of distributed systems Distributed information systems

Example EAI: (nested) transactions


Transaction
Primitive Description
BEGIN TRANSACTION Mark the start of a transaction
END TRANSACTION Terminate the transaction and try to commit
ABORT TRANSACTION Kill the transaction and restore the old values
READ Read data from a file, a table, or otherwise
WRITE Write data to a file, a table, or otherwise

Issue: all-or-nothing (ACID)
Atomic: happens indivisibly (seemingly)
Consistent: does not violate system invariants
Isolated: no mutual interference
Durable: once committed, changes are permanent

[Figure: a nested transaction with two subtransactions, one on an airline database and one on a hotel database: two different (independent) databases.]
Distributed transaction processing 43 / 56
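The all-or-nothing behavior of the primitives above can be sketched with Python's sqlite3 module, whose connection context manager commits on success and rolls back on an exception (the bookings table is a made-up example):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE bookings (item TEXT)")

try:
    with conn:                                    # implicit BEGIN TRANSACTION
        conn.execute("INSERT INTO bookings VALUES ('flight')")
        raise RuntimeError("hotel is full")       # something goes wrong...
except RuntimeError:
    pass                                          # ...so everything rolls back

rows = conn.execute("SELECT item FROM bookings").fetchall()
print(rows)
```

Because the hotel booking failed, the flight booking inside the same transaction is undone as well; the table ends up empty, exactly the ABORT TRANSACTION semantics in the table above.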


Introduction: Types of distributed systems Distributed information systems

TPM: Transaction Processing Monitor

[Figure: a client application sends a transaction request to a TP monitor, which splits it into requests to several servers, collects their replies, and returns a single reply to the client.]

Observation
In many cases, the data involved in a transaction is distributed across several
servers. A TP Monitor is responsible for coordinating the execution of a
transaction.

Distributed transaction processing 44 / 56


Introduction: Types of distributed systems Distributed information systems

Middleware and EAI

[Figure: client applications and server-side applications all communicate through a shared communication middleware layer.]

Middleware offers communication facilities for integration


Remote Procedure Call (RPC): a request is issued as a local procedure call,
packaged as a message, processed remotely, answered by a message, and the
result is returned as the return value of the call.
Message-Oriented Middleware (MOM): messages are sent to a logical contact
point (published) and forwarded to subscribed applications.
Enterprise application integration 45 / 56
Introduction: Types of distributed systems Pervasive systems

Distributed pervasive systems

Observation
An emerging next generation of distributed systems in which nodes are small,
mobile, and often embedded in a larger system, characterized by the fact that
the system naturally blends into the user’s environment.

Three (overlapping) subtypes


Ubiquitous computing systems: pervasive and continuously present, i.e.,
there is a continuous interaction between system and user.
Mobile computing systems: pervasive, but emphasis is on the fact that
devices are inherently mobile.
Sensor (and actuator) networks: pervasive, with emphasis on the actual
(collaborative) sensing and actuation of the environment.

47 / 56
Introduction: Types of distributed systems Pervasive systems

Mobile computing

Distinctive features
A myriad of different mobile devices (smartphones, tablets, GPS devices,
remote controls, active badges).
Mobile implies that a device’s location is expected to change over time ⇒
change of local services, reachability, etc. Keyword: discovery.
Communication may become more difficult: no stable route, but also
perhaps no guaranteed connectivity ⇒ disruption-tolerant networking.

Mobile computing systems 49 / 56


Introduction: Types of distributed systems Pervasive systems

Sensor networks

Characteristics
The nodes to which sensors are attached are:
Many (10s-1000s)
Simple (small memory/compute/communication capacity)
Often battery-powered (or even battery-less)

Sensor networks 53 / 56
Introduction: Types of distributed systems Pervasive systems

Sensor networks as distributed databases

Two extremes

[Figure: two ways of organizing a sensor network as a distributed database. At one extreme, sensor data is sent directly to the operator's site, which does all processing and storage. At the other, each sensor can process and store data; the operator injects a query into the network and the sensors send only answers.]
Sensor networks 54 / 56
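The two extremes can be contrasted with a toy aggregation query (the readings are made up): shipping all raw data to the operator versus letting each sensor answer with a partial aggregate.

```python
# Hypothetical raw readings held at three sensors.
readings = {"s1": [20.1, 20.3], "s2": [19.8], "s3": [21.0, 20.9]}

# Extreme 1: ship every reading to the operator, who averages centrally.
central = (sum(sum(r) for r in readings.values())
           / sum(len(r) for r in readings.values()))

# Extreme 2: each sensor sends only its local (sum, count); the operator
# combines the partial answers, moving far less data over the network.
partials = [(sum(r), len(r)) for r in readings.values()]
total, count = map(sum, zip(*partials))
in_network = total / count

print(central == in_network)
```

Both organizations compute the same answer; they differ only in where the processing happens and how much traffic crosses the (battery-powered) network.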
Distributed Systems
(3rd Edition)

Chapter 02: Architectures


Version: February 25, 2017
Architectures: Architectural styles

Architectural styles

Basic idea
A style is formulated in terms of
(replaceable) components with well-defined interfaces
the way that components are connected to each other
the data exchanged between components
how these components and connectors are jointly configured into a
system.

Connector
A mechanism that mediates communication, coordination, or cooperation
among components. Example: facilities for (remote) procedure call,
messaging, or streaming.

2 / 36
Architectures: Architectural styles Layered architectures

Layered architecture

Different layered organizations

[Figure: three layered organizations: (a) pure layering, where layer N makes request/response downcalls only to layer N-1; (b) mixed, where one-way calls may skip layers; (c) layering with upcalls, where a lower layer invokes a handler in the layer above.]

3 / 36
Architectures: Architectural styles Layered architectures

Example: communication protocols

Protocol, service, interface

[Figure: parties A and B each implement layers N and N-1. Layer N uses the service of layer N-1 through an interface; the peer layers at A and B communicate according to a protocol.]
Layered communication protocols 4 / 36


Architectures: Architectural styles Layered architectures

Two-party communication

Server

from socket import *
s = socket(AF_INET, SOCK_STREAM)
s.bind((HOST, PORT))           # bind to an agreed address
s.listen(1)                    # accept incoming connections
(conn, addr) = s.accept()      # returns new socket and addr. client
while True:                    # forever
    data = conn.recv(1024)     # receive data from client
    if not data: break         # stop if client stopped
    conn.send(data + b"*")     # return sent data plus an "*"
conn.close()                   # close the connection

Client

from socket import *
s = socket(AF_INET, SOCK_STREAM)
s.connect((HOST, PORT))        # connect to server (block until accepted)
s.send(b'Hello, world')        # send some data
data = s.recv(1024)            # receive the response
print(data)                    # print the result
s.close()                      # close the connection

Layered communication protocols 5 / 36


Architectures: Architectural styles Layered architectures

Application Layering

Traditional three-layered view


Application-interface layer contains units for interfacing to users or
external applications
Processing layer contains the functions of an application, i.e., without
specific data
Data layer contains the data that a client wants to manipulate through the
application components

Observation
This layering is found in many distributed information systems, using traditional
database technology and accompanying applications.

Application layering 6 / 36
Architectures: Architectural styles Layered architectures

Application Layering

Example: a simple search engine


[Figure: a search engine organized in three levels. User-interface level: the user interface accepts a keyword expression and presents an HTML page containing a ranked list. Processing level: a query generator translates the expression into database queries, a ranking algorithm orders the Web page titles with meta-information returned by those queries, and an HTML generator builds the result page. Data level: a database with Web pages.]

Application layering 7 / 36
Architectures: Architectural styles Object-based and service-oriented architectures

Object-based style

Essence
Components are objects, connected to each other through procedure calls.
Objects may be placed on different machines; calls can thus execute across a
network.

[Figure: objects, each encapsulating state and offering methods through an interface; objects are connected by method calls, which may cross machine boundaries.]

Encapsulation
Objects are said to encapsulate data and offer methods on that data without
revealing the internal implementation.

8 / 36
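The object-based style can be sketched with Python's standard xmlrpc modules (the Calculator class is a made-up example): the client holds only a proxy object, and the method call is executed at the remote object.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

class Calculator:            # a made-up remote object
    def add(self, a, b):
        return a + b

# Expose the object's methods over the network.
server = SimpleXMLRPCServer(("localhost", 0), logRequests=False)
server.register_instance(Calculator())
threading.Thread(target=server.serve_forever, daemon=True).start()

# The client invokes a method on a proxy; the call crosses the network.
proxy = ServerProxy("http://localhost:%d" % server.server_address[1])
result = proxy.add(1, 2)
server.shutdown()
print(result)
```

From the caller's point of view, proxy.add looks like a local method, which is exactly the encapsulation the slide describes: the interface is visible, the implementation (and its location) is not.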
Architectures: Architectural styles Resource-based architectures

RESTful architectures
Essence
View a distributed system as a collection of resources, individually managed by
components. Resources may be added, removed, retrieved, and modified by
(remote) applications.
1 Resources are identified through a single naming scheme
2 All services offer the same interface
3 Messages sent to or from a service are fully self-described
4 After executing an operation at a service, that component forgets
everything about the caller

Basic operations
Operation Description
PUT Create a new resource
GET Retrieve the state of a resource in some representation
DELETE Delete a resource
POST Modify a resource by transferring a new state

9 / 36
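The four operations map naturally onto HTTP verbs. Below is a toy in-process resource store (paths, port, and class names are all illustrative) exercised through Python's standard http modules:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

store = {}   # toy resource store: URI path -> stored representation

class ResourceHandler(BaseHTTPRequestHandler):
    def do_PUT(self):                        # create (or overwrite) a resource
        length = int(self.headers.get("Content-Length", 0))
        store[self.path] = self.rfile.read(length)
        self._reply(201)

    def do_GET(self):                        # retrieve a representation
        body = store.get(self.path)
        self._reply(200 if body is not None else 404, body or b"")

    def do_DELETE(self):                     # remove a resource
        store.pop(self.path, None)
        self._reply(204)

    def _reply(self, code, body=b""):
        self.send_response(code)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):            # keep the demo quiet
        pass

server = HTTPServer(("localhost", 0), ResourceHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()

conn = http.client.HTTPConnection("localhost", server.server_port)
conn.request("PUT", "/mybucket/obj1", body=b"hello")   # create
conn.getresponse().read()
conn.request("GET", "/mybucket/obj1")                  # retrieve
data = conn.getresponse().read()
conn.request("DELETE", "/mybucket/obj1")               # delete
conn.getresponse().read()
conn.request("GET", "/mybucket/obj1")                  # gone now
status = conn.getresponse().status
server.shutdown()
print(data, status)
```

Note that every request is fully self-described and the server keeps no per-client state between operations, matching the four RESTful characteristics listed above.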
Architectures: Architectural styles Resource-based architectures

Example: Amazon’s Simple Storage Service

Essence
Objects (i.e., files) are placed into buckets (i.e., directories). Buckets cannot be
placed into buckets. Operations on ObjectName in bucket BucketName require
the following identifier:

http://BucketName.s3.amazonaws.com/ObjectName

Typical operations
All operations are carried out by sending HTTP requests:
Create a bucket/object: PUT, along with the URI
Listing objects: GET on a bucket name
Reading an object: GET on a full URI

10 / 36
Architectures: Architectural styles Resource-based architectures

On interfaces

Issue
Many people like RESTful approaches because the interface to a service is so
simple. The catch is that much needs to be done in the parameter space.

Amazon S3 SOAP interface


Bucket operations Object operations
ListAllMyBuckets PutObjectInline
CreateBucket PutObject
DeleteBucket CopyObject
ListBucket GetObject
GetBucketAccessControlPolicy GetObjectExtended
SetBucketAccessControlPolicy DeleteObject
GetBucketLoggingStatus GetObjectAccessControlPolicy
SetBucketLoggingStatus SetObjectAccessControlPolicy

11 / 36
Architectures: Architectural styles Resource-based architectures

On interfaces
Simplifications
Assume an interface bucket offering an operation create, requiring an input
string such as mybucket, for creating a bucket “mybucket.”

SOAP
import bucket
bucket.create("mybucket")

RESTful
PUT "http://mybucket.s3.amazonaws.com/"

Conclusions
Are there any to draw?

12 / 36
Architectures: Architectural styles Publish-subscribe architectures

Coordination
Temporal and referential coupling
                          Temporally coupled   Temporally decoupled
Referentially coupled     Direct               Mailbox
Referentially decoupled   Event-based          Shared data space

[Figure: event-based coordination: components publish notifications on an event bus, which delivers them to components that subscribed to them. Shared data space: components publish data items into a shared (persistent) data space, from which subscribing components retrieve them.]

13 / 36
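A minimal in-process sketch of the event-based style (the EventBus class and topic names are illustrative): publishers and subscribers are referentially decoupled, sharing only a topic name, never a reference to each other.

```python
from collections import defaultdict

class EventBus:
    """Deliver published events to every handler subscribed to a topic."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
received = []
bus.subscribe("news", received.append)
bus.publish("news", "hello")
bus.publish("sports", "ignored")   # no subscriber, silently dropped
print(received)
```

This sketch is still temporally coupled: a subscriber must be registered when the event is published. Storing events until they are consumed would move it toward the shared-data-space quadrant.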
Architectures: Architectural styles Publish-subscribe architectures

Example: Linda tuple space

Three simple operations


in(t): remove a tuple matching template t
rd(t): obtain copy of a tuple matching template t
out(t): add tuple t to the tuple space

More details
Calling out(t) twice in a row leads to storing two copies of tuple t ⇒ a
tuple space is modeled as a multiset.
Both in and rd are blocking operations: the caller will be blocked until a
matching tuple is found, or has become available.

14 / 36
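The three operations can be sketched as a minimal in-process tuple space (an illustrative sketch, not a real Linda implementation; in is spelled in_ because in is a Python keyword). A template matches a tuple if each field is equal to, or a type of, the corresponding value:

```python
import threading

class TupleSpace:
    """A toy Linda-style tuple space with blocking rd/in_ and non-blocking out."""
    def __init__(self):
        self.tuples = []                      # multiset: duplicates allowed
        self.cond = threading.Condition()

    def _match(self, template, tup):
        return len(template) == len(tup) and all(
            (isinstance(f, type) and isinstance(v, f)) or f == v
            for f, v in zip(template, tup))

    def _find(self, template):
        for tup in self.tuples:
            if self._match(template, tup):
                return tup
        return None

    def out(self, tup):                       # add tuple t to the space
        with self.cond:
            self.tuples.append(tup)
            self.cond.notify_all()

    def rd(self, template):                   # blocking read; tuple stays
        with self.cond:
            while (tup := self._find(template)) is None:
                self.cond.wait()
            return tup

    def in_(self, template):                  # blocking read-and-remove
        with self.cond:
            while (tup := self._find(template)) is None:
                self.cond.wait()
            self.tuples.remove(tup)
            return tup

space = TupleSpace()
space.out(("bob", "distsys", "I am studying chap 2"))
print(space.rd(("bob", "distsys", str)))   # a copy: the tuple stays
print(space.in_(("bob", "distsys", str)))  # removes the tuple
```

The condition variable implements the blocking semantics: a caller of rd or in_ sleeps until some out makes a matching tuple available.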
Architectures: Architectural styles Publish-subscribe architectures

Example: Linda tuple space


Bob
blog = linda.universe._rd(("MicroBlog",linda.TupleSpace))[1]

blog._out(("bob","distsys","I am studying chap 2"))
blog._out(("bob","distsys","The linda example's pretty simple"))
blog._out(("bob","gtcn","Cool book!"))

Alice
blog = linda.universe._rd(("MicroBlog",linda.TupleSpace))[1]

blog._out(("alice","gtcn","This graph theory stuff is not easy"))
blog._out(("alice","distsys","I like systems more than graphs"))

Chuck
blog = linda.universe._rd(("MicroBlog",linda.TupleSpace))[1]

t1 = blog._rd(("bob","distsys",str))
t2 = blog._rd(("alice","gtcn",str))
t3 = blog._rd(("bob","gtcn",str))

15 / 36
Architectures: Middleware organization Wrappers

Using legacy to build middleware

Problem
The interfaces offered by a legacy component are most likely not suitable for all
applications.

Solution
A wrapper or adapter offers an interface acceptable to a client application. Its
functions are transformed into those available at the component.

16 / 36
Architectures: Middleware organization Wrappers

Organizing wrappers

Two solutions: 1-on-1 or through a broker


Wrapper

Application Broker

Complexity with N applications

1-on-1: requires N × (N − 1) = O(N²) wrappers
broker: requires 2N = O(N) wrappers

17 / 36
Architectures: Middleware organization Interceptors

Developing adaptable middleware

Problem
Middleware contains solutions that are good for most applications ⇒ you may
want to adapt its behavior for specific applications.

18 / 36
Architectures: Middleware organization Interceptors

Intercept the usual flow of control

Figure: a client application issues the intercepted call B.doit(val); the
application stub passes it to a request-level interceptor, which hands
invoke(B, &doit, val) to the object middleware; a message-level
interceptor then turns this into send(B, "doit", val) before it reaches
the local OS on its way to object B. A nonintercepted call skips the
interceptors.
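The idea can be sketched in Python (all names here — invoke, send, and the logging interceptor — are hypothetical, mirroring the call chain in the figure): the middleware's invoke is wrapped so that extra behavior runs before the call is passed on, without changing the application.

```python
# Sketch of a request-level interceptor: extra behavior (here: logging)
# is slotted in before the middleware's invoke(), transparently to the
# client application issuing B.doit(val).

log = []

def send(obj, method, arg):            # message-level: ship the request
    return getattr(obj, method)(arg)

def invoke(obj, method, arg):          # object middleware
    return send(obj, method, arg)

def intercepted_invoke(obj, method, arg):
    log.append((method, arg))          # request-level interceptor
    return invoke(obj, method, arg)

class B:
    def doit(self, val):
        return val * 2

print(intercepted_invoke(B(), "doit", 21))  # the call was also logged
print(log)
```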

19 / 36
Architectures: System architecture Centralized organizations

Centralized system architectures

Basic Client–Server Model


Characteristics:
There are processes offering services (servers)
There are processes that use services (clients)
Clients and servers can be on different machines
Clients follow request/reply model with respect to using services

Figure: the client sends a request and waits; the server provides the
service and returns a reply.

Simple client-server architecture 20 / 36


Architectures: System architecture Centralized organizations

Multi-tiered centralized system architectures


Some traditional organizations
Single-tiered: dumb terminal/mainframe configuration
Two-tiered: client/single server configuration
Three-tiered: each layer on separate machine

Traditional two-tiered configurations


Figure: alternative client-server organizations (a)–(e); moving from (a)
to (e), progressively more functionality is shifted from the server
machine to the client machine, from only part of the user interface in
(a) to the user interface, the application, and part of the database in
(e).
Multitiered Architectures 21 / 36
Architectures: System architecture Centralized organizations

Being client and server at the same time

Three-tiered architecture
Figure: the client sends a request operation to the application server
and waits for the reply; the application server requests data from the
database server and waits for the data; the database server returns the
data, after which the application server returns the reply. The
application server thus acts as both a client and a server.

Multitiered Architectures 22 / 36
Architectures: System architecture Decentralized organizations: peer-to-peer systems

Alternative organizations

Vertical distribution
Comes from dividing distributed applications into three logical layers, and
running the components from each layer on a different server (machine).

Horizontal distribution
A client or server may be physically split up into logically equivalent parts, but
each part is operating on its own share of the complete data set.

Peer-to-peer architectures
Processes are all equal: the functions that need to be carried out are
represented by every process ⇒ each process will act as a client and a server
at the same time (i.e., acting as a servant).

23 / 36
Architectures: System architecture Decentralized organizations: peer-to-peer systems

Structured P2P
Essence
Make use of a semantic-free index: each data item is uniquely associated with
a key, in turn used as an index. Common practice: use a hash function
key(data item) = hash(data item’s value).
P2P system now responsible for storing (key,value) pairs.

Simple example: hypercube

Figure: a four-dimensional hypercube with node identifiers 0000–1111;
each node is a neighbor of the nodes whose identifier differs in exactly
one bit.

Looking up d with key k ∈ {0, 1, 2, . . . , 2⁴ − 1} means routing the
request to the node with identifier k.
Structured peer-to-peer systems 24 / 36
Architectures: System architecture Decentralized organizations: peer-to-peer systems

Example: Chord

Principle
Nodes are logically organized in a ring. Each node has an m-bit identifier.
Each data item is hashed to an m-bit key.
Data item with key k is stored at the node with the smallest identifier
id ≥ k, called the successor of key k.
The ring is extended with various shortcut links to other nodes.
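The successor rule can be sketched in Python (the node set below is hypothetical, chosen so that the node with id 4 stores key 3 and the node with id 9 stores keys {5, . . . , 9}; shortcut links and routing are omitted):

```python
from bisect import bisect_left

# Sketch of Chord's key-to-node mapping: the successor of key k is the
# node with the smallest identifier >= k, wrapping around the m-bit ring.

def successor(nodes, k, m=5):
    """Return the node with the smallest id >= k (mod 2**m)."""
    ids = sorted(nodes)
    i = bisect_left(ids, k % (2 ** m))
    return ids[i] if i < len(ids) else ids[0]   # wrap around the ring

nodes = {1, 4, 9, 11, 14, 18, 20, 21, 28}       # hypothetical node set
print(successor(nodes, 3))                       # stored at node 4
print([successor(nodes, k) for k in range(5, 10)])  # all stored at node 9
```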

Structured peer-to-peer systems 25 / 36


Architectures: System architecture Decentralized organizations: peer-to-peer systems

Example: Chord

Figure: a Chord ring with m = 5 (identifiers 0–31); filled positions are
actual nodes, the rest are nonexisting nodes, and shortcut links skip
ahead along the ring. The node with identifier 9 is responsible for keys
{5, 6, 7, 8, 9}.

lookup(3)@9 : 28 → 1 → 4
Structured peer-to-peer systems 26 / 36
Architectures: System architecture Decentralized organizations: peer-to-peer systems

Unstructured P2P

Essence
Each node maintains an ad hoc list of neighbors. The resulting overlay
resembles a random graph: an edge ⟨u, v⟩ exists only with a certain
probability P[⟨u, v⟩].

Searching
Flooding: issuing node u passes request for d to all neighbors. Request
is ignored when receiving node had seen it before. Otherwise, v searches
locally for d (recursively). May be limited by a Time-To-Live: a maximum
number of hops.
Random walk: issuing node u passes request for d to randomly chosen
neighbor, v . If v does not have d, it forwards request to one of its
randomly chosen neighbors, and so on.

Unstructured peer-to-peer systems 27 / 36


Architectures: System architecture Decentralized organizations: peer-to-peer systems

Flooding versus random walk

Model
Assume N nodes and that each data item is replicated across r randomly
chosen nodes.

Random walk
P[k]: the probability that the item is found after exactly k attempts:

    P[k] = (r/N) · (1 − r/N)^(k−1)

S ("search size"): the expected number of nodes that need to be probed:

    S = Σ_{k=1}^{N} k · P[k] = Σ_{k=1}^{N} k · (r/N) · (1 − r/N)^(k−1) ≈ N/r   for 1 ≪ r ≤ N

Unstructured peer-to-peer systems 28 / 36


Architectures: System architecture Decentralized organizations: peer-to-peer systems

Flooding versus random walk

Flooding
Flood to d randomly chosen neighbors.
After k steps, some R(k) = d · (d − 1)^(k−1) nodes will have been reached
(assuming k is small).
With a fraction r/N of the nodes having the data: if (r/N) · R(k) ≥ 1, we
will have found the data item.

Comparison
If r /N = 0.001, then S ≈ 1000
With flooding and d = 10, k = 4, we contact 7290 nodes.
Random walks are more communication efficient, but might take longer
before they find the result.
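The comparison numbers can be checked directly (a small sketch under the model above; the concrete N and r are only one choice with r/N = 0.001):

```python
# Reproduce the flooding-versus-random-walk numbers from the model:
# random walk: expected search size S ≈ N / r;
# flooding:    R(k) = d * (d - 1)**(k - 1) nodes reached after k steps.

def search_size(N, r):
    return N / r                      # expected probes for a random walk

def reached(d, k):
    return d * (d - 1) ** (k - 1)     # nodes contacted by flooding

# r/N = 0.001, e.g. N = 100000 nodes with r = 100 replicas:
print(search_size(100_000, 100))      # about 1000 probes on average
print(reached(10, 4))                 # 7290 nodes contacted
```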

Unstructured peer-to-peer systems 29 / 36


Architectures: System architecture Decentralized organizations: peer-to-peer systems

Super-peer networks

Essence
It is sometimes sensible to break the symmetry in pure peer-to-peer networks:
When searching in unstructured P2P systems, having index servers
improves performance
Deciding where to store data can often be done more efficiently through
brokers.

Figure: a hierarchically organized peer-to-peer network: weak peers
attach to a super peer, and the super peers form an overlay network of
their own.

Hierarchically organized peer-to-peer networks 30 / 36


Architectures: System architecture Decentralized organizations: peer-to-peer systems

Skype’s principal operation: A wants to contact B

Both A and B are on the public Internet


A TCP connection is set up between A and B for control packets.
The actual call takes place using UDP packets between negotiated ports.

A operates behind a firewall, while B is on the public Internet


A sets up a TCP connection (for control packets) to a super peer S
S sets up a TCP connection (for relaying control packets) to B
The actual call takes place through UDP and directly between A and B

Both A and B operate behind a firewall


A connects to an online super peer S through TCP
S sets up a TCP connection to B
For the actual call, another super peer is contacted to act as a relay R:
A sets up a connection to R, and so does B
All voice traffic is forwarded over the two TCP connections, and through R

Hierarchically organized peer-to-peer networks 31 / 36


Architectures: System architecture Hybrid Architectures

Edge-server architecture

Essence
Systems deployed on the Internet where servers are placed at the edge of the
network: the boundary between enterprise networks and the actual Internet.

Client Content provider

ISP
ISP

Core Internet

Edge server
Enterprise network

Edge-server systems 32 / 36
Architectures: System architecture Hybrid Architectures

Collaboration: The BitTorrent case

Principle: search for a file F


Lookup file at a global directory ⇒ returns a torrent file
Torrent file contains reference to tracker: a server keeping an accurate
account of active nodes that have (chunks of) F .
P can join swarm, get a chunk for free, and then trade a copy of that
chunk for another one with a peer Q also in the swarm.

Figure: a client node looks up F via a Web page or search engine, fetches
the torrent file from a Web server, obtains a list of active nodes from
the tracker, and exchanges chunks of file F with K out of the N nodes in
the swarm.

Collaborative distributed systems 33 / 36


Architectures: System architecture Hybrid Architectures

BitTorrent under the hood

Some essential details


A tracker for file F returns the set of its downloading processes: the
current swarm.
A communicates only with a subset of the swarm: the neighbor set NA .
if B ∈ NA then also A ∈ NB .
Neighbor sets are regularly updated by the tracker

Exchange blocks
A file is divided into equally sized pieces (typically each being 256 KB)
Peers exchange blocks of pieces, typically some 16 KB.
A can upload a block d of piece D only if it has piece D.
Neighbor B belongs to the potential set PA of A, if B has a block that A
needs.
If B ∈ PA and A ∈ PB : A and B are in a position that they can trade a block.
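The potential-set rule can be sketched with piece bitmaps (all peer names and piece sets below are hypothetical):

```python
# Sketch of computing the potential set P_A from piece sets: B is in P_A
# if B holds a piece that A still needs; A and B can trade a block only
# if each is in the other's potential set.

def potential_set(pieces, me):
    mine = pieces[me]
    return {b for b, theirs in pieces.items()
            if b != me and theirs - mine}     # b has a piece `me` needs

pieces = {
    "A": {0, 1},
    "B": {1, 2},      # has piece 2, which A needs
    "C": {0},         # nothing A needs
}
PA = potential_set(pieces, "A")
PB = potential_set(pieces, "B")
print(PA)                          # {'B'}
print("A" in PB and "B" in PA)     # True: A and B can trade
```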

Collaborative distributed systems 34 / 36


Architectures: System architecture Hybrid Architectures

BitTorrent phases

Bootstrap phase
A has just received its first piece (through optimistic unchoking: a node from NA
unselfishly provides the blocks of a piece to get a newly arrived node started).

Trading phase
|PA | > 0: there is (in principle) always a peer with whom A can trade.

Last download phase


|PA | = 0: A is dependent on newly arriving peers in NA in order to get the last
missing pieces. NA can change only through the tracker.

Collaborative distributed systems 35 / 36


Architectures: System architecture Hybrid Architectures

BitTorrent phases

Development of |P| relative to |N|.


Figure: |P|/|N| (vertical axis, 0.0–1.0) plotted against the fraction of
pieces downloaded (horizontal axis), for |N| = 5, 10, and 40.

Collaborative distributed systems 36 / 36


Distributed Systems
Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science


[email protected]

Chapter 03: Processes


Version: November 1, 2012
Processes 3.1 Threads

Introduction to Threads

Basic idea
We build virtual processors in software, on top of physical processors:

Processor: Provides a set of instructions along with the capability of


automatically executing a series of those instructions.
Thread: A minimal software processor in whose context a series of
instructions can be executed. Saving a thread context implies stopping
the current execution and saving all the data needed to continue the
execution at a later stage.
Process: A software processor in whose context one or more threads may be
executed. Executing a thread, means executing a series of instructions
in the context of that thread.

2 / 34
Processes 3.1 Threads

Context Switching

Contexts
Processor context: The minimal collection of values stored in the
registers of a processor used for the execution of a series of
instructions (e.g., stack pointer, addressing registers, program
counter).
Thread context: The minimal collection of values stored in
registers and memory, used for the execution of a series of
instructions (i.e., processor context, state).
Process context: The minimal collection of values stored in
registers and memory, used for the execution of a thread (i.e.,
thread context, but now also at least MMU register values).

3 / 34
Processes 3.1 Threads

Context Switching

Observations
1 Threads share the same address space. Thread context switching
can be done entirely independent of the operating system.
2 Process switching is generally more expensive as it involves
getting the OS in the loop, i.e., trapping to the kernel.
3 Creating and destroying threads is much cheaper than doing so
for processes.

4 / 34
Processes 3.1 Threads

Threads and Operating Systems

Main issue
Should an OS kernel provide threads, or should they be implemented as
user-level packages?

User-space solution
All operations can be completely handled within a single process ⇒
implementations can be extremely efficient.
All services provided by the kernel are done on behalf of the process in
which a thread resides ⇒ if the kernel decides to block a thread, the
entire process will be blocked.
Threads are used when there are lots of external events: threads block
on a per-event basis ⇒ if the kernel can’t distinguish threads, how can it
support signaling events to them?

5 / 34
Processes 3.1 Threads

Threads and Operating Systems

Kernel solution
The whole idea is to have the kernel contain the implementation of a thread
package. This means that all thread operations are implemented as system calls.
Operations that block a thread are no longer a problem: the kernel
schedules another available thread within the same process.
Handling external events is simple: the kernel (which catches all events)
schedules the thread associated with the event.
The problem is (or used to be) the loss of efficiency due to the fact that
each thread operation requires a trap to the kernel.

Conclusion – but
Try to mix user-level and kernel-level threads into a single concept, however,
performance gain has not turned out to outweigh the increased complexity.

6 / 34
Processes 3.1 Threads

Threads and Distributed Systems

Multithreaded Web client


Hiding network latencies:
Web browser scans an incoming HTML page, and finds that more files
need to be fetched.
Each file is fetched by a separate thread, each doing a (blocking) HTTP
request.
As files come in, the browser displays them.

Multiple request-response calls to other machines (RPC)


A client does several calls at the same time, each one by a different
thread.
It then waits until all results have been returned.
Note: if calls are to different servers, we may have a linear speed-up.
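The multithreaded-client pattern above can be sketched in Python (fetch() is a stand-in for a blocking HTTP request or RPC; no real network traffic is involved):

```python
import threading

# Sketch of a multithreaded client: each (simulated) blocking fetch runs
# in its own thread, and the client waits until all results are in.

def fetch(url, results, lock):
    data = f"contents of {url}"        # placeholder for a blocking call
    with lock:
        results[url] = data

urls = ["a.html", "b.png", "c.css"]
results, lock = {}, threading.Lock()
threads = [threading.Thread(target=fetch, args=(u, results, lock))
           for u in urls]
for t in threads:
    t.start()
for t in threads:
    t.join()                           # wait until all replies are in
print(sorted(results))                 # all three files fetched
```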

7 / 34
Processes 3.1 Threads

Threads and Distributed Systems

Improve performance
Starting a thread is much cheaper than starting a new process.
Having a single-threaded server prohibits simple scale-up to a
multiprocessor system.
As with clients: hide network latency by reacting to next request while
previous one is being replied.

Better structure
Most servers have high I/O demands. Using simple, well-understood
blocking calls simplifies the overall structure.
Multithreaded programs tend to be smaller and easier to understand due
to simplified flow of control.

8 / 34
Processes 3.2 Virtualization

Virtualization
Observation
Virtualization is becoming increasingly important:
Hardware changes faster than software
Ease of portability and code migration
Isolation of failing or attacked components

Figure: (a) a program using interface A on hardware/software system A;
(b) the same program on hardware/software system B, running on an
implementation that mimics interface A on top of interface B.
9 / 34
Processes 3.2 Virtualization

Architecture of VMs

Observation
Virtualization can take place at very different levels, strongly depending
on the interfaces as offered by various systems components:

Figure: the interfaces offered by the system components: an application
uses library functions offered by a library, the library uses system
calls offered by the operating system, and the operating system uses the
privileged and general instructions offered by the hardware.

10 / 34
Processes 3.2 Virtualization

Process VMs versus VM Monitors

Figure: (a) a process VM: an application on a runtime system, on top of
an operating system and the hardware; (b) a virtual machine monitor on
the hardware, supporting several operating systems, each with its own
applications.

Process VM: A program is compiled to intermediate (portable)


code, which is then executed by a runtime system (Example: Java
VM).
VM Monitor: A separate software layer mimics the instruction set
of hardware ⇒ a complete operating system and its applications
can be supported (Example: VMware, VirtualBox).
11 / 34
Processes 3.2 Virtualization

VM Monitors on operating systems

Practice
We’re seeing VMMs run on top of existing operating systems.
Perform binary translation: while executing an application or
operating system, translate instructions to that of the underlying
machine.
Distinguish sensitive instructions: traps to the original kernel (think
of system calls, or privileged instructions).
Sensitive instructions are replaced with calls to the VMM.

12 / 34
Processes 3.3 Clients

Clients: User Interfaces

Essence
A major part of client-side software is focused on (graphical) user
interfaces.

Figure: the basic organization of the X Window System. Applications and
the window manager run on application servers and use the Xlib interface
on top of the local OS; the X protocol carries their requests to the X
kernel on the user's terminal, which contains the device drivers for the
display, keyboard, and mouse.
13 / 34
Processes 3.3 Clients

Client-Side Software

Generally tailored for distribution transparency


access transparency: client-side stubs for RPCs
location/migration transparency: let client-side software keep track of
actual location
replication transparency: multiple invocations handled by client stub:
Figure: the client application hands a single request to the client
side, which replicates it and sends the replicated request to the
server applications on servers 1, 2, and 3.

failure transparency: can often be placed only at the client (we’re
trying to mask server and communication failures).

14 / 34
Processes 3.4 Servers

Servers: General organization

Basic model
A server is a process that waits for incoming service requests at a
specific transport address. In practice, there is a one-to-one mapping
between a port and a service.

ftp-data   20   File Transfer [Default Data]
ftp        21   File Transfer [Control]
telnet     23   Telnet
           24   any private mail system
smtp       25   Simple Mail Transfer
login      49   Login Host Protocol
sunrpc    111   SUN RPC (portmapper)
courier   530   Xerox RPC
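The basic model can be sketched with Python sockets (an iterative toy server; the echo service and the loopback address are arbitrary, and port 0 lets the OS pick any free port):

```python
import socket
import threading

# Sketch of the basic server model: a process waits for incoming service
# requests at a specific transport address. This iterative server handles
# one client per accept() call.

def serve_once(sock):
    conn, _ = sock.accept()            # wait for a service request
    data = conn.recv(1024)
    conn.sendall(b"echo: " + data)     # provide the service: echo back
    conn.close()

def echo_roundtrip(message):
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))         # port 0: pick any free port
    srv.listen()
    port = srv.getsockname()[1]
    threading.Thread(target=serve_once, args=(srv,), daemon=True).start()

    cli = socket.socket()              # the client's side of the exchange
    cli.connect(("127.0.0.1", port))
    cli.sendall(message)
    reply = cli.recv(1024)
    cli.close()
    srv.close()
    return reply

print(echo_roundtrip(b"hi"))           # b'echo: hi'
```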

15 / 34
Processes 3.4 Servers

Servers: General organization

Type of servers
Superservers: Servers that listen to several ports, i.e., provide several
independent services. In practice, when a service request comes
in, they start a subprocess to handle the request (UNIX inetd)
Iterative vs. concurrent servers: Iterative servers can handle only one
client at a time, in contrast to concurrent servers

16 / 34
Processes 3.4 Servers

Servers and state

Stateless servers
Never keep accurate information about the status of a client after having
handled a request:
Don’t record whether a file has been opened (simply close it again after
access)
Don’t promise to invalidate a client’s cache
Don’t keep track of your clients

Consequences
Clients and servers are completely independent
State inconsistencies due to client or server crashes are reduced
Possible loss of performance because, e.g., a server cannot anticipate
client behavior (think of prefetching file blocks)

18 / 34
Processes 3.4 Servers

Servers and state

Stateful servers
Keeps track of the status of its clients:
Record that a file has been opened, so that prefetching can be
done
Knows which data a client has cached, and allows clients to keep
local copies of shared data

Observation
The performance of stateful servers can be extremely high, provided
clients are allowed to keep local copies. As it turns out, reliability is not
a major problem.

20 / 34
Distributed Systems
Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science


[email protected]

Chapter 04: Communication


Version: November 5, 2012
Communication 4.1 Layered Protocols

Layered Protocols

Low-level layers
Transport layer
Application layer
Middleware layer

2 / 55
Communication 4.1 Layered Protocols

Basic networking model

Application protocol
Application 7
Presentation protocol
Presentation 6
Session protocol
Session 5
Transport protocol
Transport 4
Network protocol
Network 3
Data link protocol
Data link 2
Physical protocol
Physical 1

Network

Drawbacks
Focus on message-passing only
Often unneeded or unwanted functionality
Violates access transparency

3 / 55
Communication 4.1 Layered Protocols

Low-level layers

Recap
Physical layer: contains the specification and implementation of
bits, and their transmission between sender and receiver
Data link layer: prescribes the transmission of a series of bits into
a frame to allow for error and flow control
Network layer: describes how packets in a network of computers
are to be routed.

Observation
For many distributed systems, the lowest-level interface is that of the
network layer.

4 / 55
Communication 4.1 Layered Protocols

Transport Layer

Important
The transport layer provides the actual communication facilities for
most distributed systems.

Standard Internet protocols


TCP: connection-oriented, reliable, stream-oriented
communication
UDP: unreliable (best-effort) datagram communication

Note
IP multicasting is often considered a standard available service (which
may be dangerous to assume).

5 / 55
Communication 4.1 Layered Protocols

Middleware Layer

Observation
Middleware is invented to provide common services and protocols that
can be used by many different applications

A rich set of communication protocols


(Un)marshaling of data, necessary for integrated systems
Naming protocols, to allow easy sharing of resources
Security protocols for secure communication
Scaling mechanisms, such as for replication and caching

Note
What remains are truly application-specific protocols...
such as?

6 / 55
Communication 4.1 Layered Protocols

Types of communication
Figure: a client can synchronize with the server at request submission,
at request delivery, or after processing by the server; a storage
facility between client and server enables persistent communication.

Distinguish
Transient versus persistent communication
Asynchronous versus synchronous communication

7 / 55
7/31/2017 Synchronous vs asynchronous - javatpoint

Understanding Synchronous vs
Asynchronous
Before understanding AJAX, let’s understand classic web application
model and ajax web application model first.

Synchronous (Classic Web-Application Model)
A synchronous request blocks the client until the operation completes,
i.e., the browser is unresponsive; the JavaScript engine of the browser
is blocked. The full page is refreshed at request time, and the user is
blocked until the request completes.
Asynchronous (AJAX Web-Application Model)
An asynchronous request doesn’t block the client, i.e., the browser
stays responsive: the user can perform other operations in the meantime,
and the JavaScript engine of the browser is not blocked. The full page
is not refreshed at request time; the user receives the response from
the AJAX engine.

Note: not every blocking operation is synchronous, and not every
non-blocking operation is asynchronous.

Communication 4.2 Remote Procedure Call

Remote Procedure Call (RPC)

Basic RPC operation


Parameter passing
Variations

12 / 55
Communication 4.2 Remote Procedure Call

Basic RPC operation

Observations
Application developers are familiar with simple procedure model
Well-engineered procedures operate in isolation (black box)
There is no fundamental reason not to execute procedures on
separate machine

Conclusion
Communication between caller and callee can be hidden by using the
procedure-call mechanism.

Figure: the client calls the remote procedure and waits for the result;
the request travels to the server, which calls the local procedure and
returns the results in the reply, after which the client returns from
the call.
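The same idea can be demonstrated with Python's standard-library XML-RPC support (the loopback address and the add procedure are arbitrary for this local sketch; port 0 lets the OS pick any free port): the client calls add() as if it were local, and the stub (ServerProxy) marshals the request and the reply.

```python
import threading
from xmlrpc.server import SimpleXMLRPCServer
from xmlrpc.client import ServerProxy

# Sketch of RPC: communication between caller and callee is hidden
# behind an ordinary-looking procedure call.

def add(a, b):
    return a + b                       # the remote procedure

server = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
port = server.server_address[1]
server.register_function(add, "add")
threading.Thread(target=server.serve_forever, daemon=True).start()

proxy = ServerProxy(f"http://127.0.0.1:{port}")   # the client-side stub
print(proxy.add(2, 3))   # looks like a local call, runs on the server
```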

13 / 55
Communication 4.2 Remote Procedure Call

Asynchronous RPCs

Essence
Try to get rid of the strict request-reply behavior, but let the client
continue without waiting for an answer from the server.

Figure: (a) a traditional synchronous RPC, in which the client waits for
the result while the server calls the local procedure and returns the
results; (b) an asynchronous RPC, in which the client waits only for the
server to accept the request and then returns from the call, while the
server processes the request afterwards.

17 / 55
Lecture 4.1

Remote Method Invocation


(RMI)
References
 http://www.cis.upenn.edu/~bcpierce/courses/629/jdkdocs/guide/rmi/spec/rmiTOC.doc.html

2
Overview
 Distributed systems require that computations
running in different address spaces, potentially on
different hosts, be able to communicate. For a basic
communication mechanism, the Java language
supports sockets, which are flexible and sufficient for
general communication. However, sockets require
the client and server to engage in applications-level
protocols to encode and decode messages for
exchange, and the design of such protocols is
cumbersome and can be error-prone.

3
Cont.
 The Java language's RMI system assumes
the homogeneous environment of the Java
Virtual Machine, and the system can
therefore take advantage of the Java object
model whenever possible.

4
RMI Interfaces and Classes
 The interfaces and classes that are
responsible for specifying the remote
behavior of the RMI system are defined in the
java.rmi and the java.rmi.server packages.
The figure in the next slide shows the
relationship between these interfaces and
classes:

5
6
The Remote Interface
 All remote interfaces extend, either directly or
indirectly, the interface java.rmi.Remote. The Remote
interface defines no methods, as shown here:
public interface Remote {}

Example:
public interface HelloInterface extends Remote
{
public String say() throws RemoteException;
}

7
Implementing a Remote Interface
 The general rules for a class that implements
a remote interface are as follows:
 The class can implement any number of
remote interfaces.
 The class can extend another remote
implementation class.
 The class can define methods that do not
appear in the remote interface, but those
methods can only be used locally and are not
available remotely.
8
System Architectural Overview
 The RMI system consists of three layers:
 The stub/skeleton layer - client-side stubs (proxies)
and server-side skeletons
 The remote reference layer - remote reference
behavior (such as invocation to a single object or to a
replicated object)
 The transport layer - connection set up and
management and remote object tracking
 The application layer sits on top of the RMI system.
The relationship between the layers is shown in the
following figure.
9
10
Cont.
 A remote method invocation from a client to a remote server
object travels down through the layers of the RMI system to the
client-side transport, then up through the server-side transport to
the server

 A client invoking a method on a remote server object actually


makes use of a stub or proxy for the remote object as a conduit
to the remote object. A client- held reference to a remote object
is a reference to a local stub. This stub is an implementation of
the remote interfaces of the remote object and forwards
invocation requests to that server object via the remote
reference layer. Stubs are generated using the rmic compiler.

 The remote reference layer is responsible for carrying out the


semantics of the invocation.
11
The Stub/Skeleton Layer
 The stub/skeleton layer is the interface between the
application layer and the rest of the RMI system.

 A stub for a remote object is the client-side proxy for


the remote object. Such a stub implements all the
interfaces that are supported by the remote object
implementation.

 A skeleton for a remote object is a server-side entity


that contains a method which dispatches calls to the
actual remote object implementation.
12
The Remote Reference Layer
 The remote reference layer deals with the
lower-level transport interface. This layer is
also responsible for carrying out a specific
remote reference protocol which is
independent of the client stubs and server
skeletons.

13
The Transport Layer
 In general, the transport layer of the RMI system is
responsible for:
 Setting up connections to remote address spaces.
 Managing connections.
 Monitoring connection "liveness."
 Listening for incoming calls.
 Maintaining a table of remote objects that reside in the
address space.
 Setting up a connection for an incoming call.
 Locating the dispatcher for the target of the remote call
and passing the connection to this dispatcher.

14
Client Interfaces
 The Remote Interface
 package java.rmi; public interface Remote {}
 The java.rmi.Remote interface serves to identify remote objects;
all remote objects must directly or indirectly implement this
interface. Note that all remote interfaces must be declared public.

 The RemoteException Class
 All remote exceptions are subclasses of java.rmi.RemoteException.
This allows interfaces to handle all types of remote exceptions and
to distinguish local exceptions, and exceptions specific to the
method, from exceptions thrown by the underlying distributed-object
mechanisms.

 The Naming Class
 The java.rmi.Naming class allows remote objects to be bound and
retrieved using the familiar Uniform Resource Locator (URL) syntax.
The URL consists of protocol, host, port, and name fields. The
Registry service on the specified host and port is used to perform
the specified operation. The protocol should be specified as rmi, as
in rmi://java.sun.com:1099/root or rmi://localhost:1099/Hello
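Failures in the RMI runtime can be told apart from application errors by catching RemoteException separately. The following self-contained sketch (the class name and port are hypothetical; the port is assumed to have no registry listening on it) shows a lookup failing at the connection level:

```java
import java.rmi.Naming;
import java.rmi.RemoteException;

public class RemoteExceptionDemo {
    // Classify the outcome of a registry lookup. The URL used in main
    // points at a port where no registry is assumed to be running, so the
    // RMI runtime fails to connect and raises a RemoteException (here a
    // java.rmi.ConnectException).
    static String probe(String url) {
        try {
            Naming.lookup(url);
            return "bound";
        } catch (RemoteException e) {
            // Communication failure in the distributed-object machinery
            return "remote failure: " + e.getClass().getSimpleName();
        } catch (Exception e) {
            // NotBoundException, MalformedURLException: client-side mistakes
            return "other failure: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(probe("rmi://localhost:59123/NoSuchService"));
    }
}
```

Naming.lookup can also throw NotBoundException or MalformedURLException, which signal client-side mistakes rather than communication failures.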
15
Server Interfaces
 The UnicastRemoteObject Class
 The java.rmi.server.UnicastRemoteObject
class provides support for point-to-point active
object references using TCP-based streams.
The class implements a remote server object
with the following characteristics:
 References are valid only for, at most, the life of
the process that creates the remote object.
 A TCP connection-based transport is used.
 Invocations, parameters, and results use a
stream protocol for communicating between client
and server.
16
Stub and Skeleton Compiler
 The rmic stub and skeleton compiler is used
to compile the appropriate stubs and
skeletons for a specific remote object
implementation. The compiler is invoked with
the package qualified class name of the
remote object class. The class must
previously have been compiled successfully.
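Putting the pieces together, here is a minimal end-to-end sketch. The Hello interface, HelloImpl class, and RmiSketch driver are hypothetical names; note that since Java 5 the runtime generates stubs dynamically, so running rmic is only required for pre-Java 5 clients. An in-process registry is created on the default port 1099 (assumed free):

```java
import java.rmi.Naming;
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Hypothetical remote interface: extends Remote, and every method
// declares RemoteException.
interface Hello extends Remote {
    String sayHello(String name) throws RemoteException;
}

// Server-side implementation; the UnicastRemoteObject constructor
// exports the object over TCP when it is created.
class HelloImpl extends UnicastRemoteObject implements Hello {
    HelloImpl() throws RemoteException { super(); }
    public String sayHello(String name) throws RemoteException {
        return "Hello, " + name;
    }
}

public class RmiSketch {
    public static void main(String[] args) throws Exception {
        // In-process registry on the default RMI port (assumed free).
        Registry reg = LocateRegistry.createRegistry(1099);
        HelloImpl impl = new HelloImpl();
        Naming.rebind("rmi://localhost:1099/Hello", impl);

        // Client side: look up the stub and invoke through it.
        Hello stub = (Hello) Naming.lookup("rmi://localhost:1099/Hello");
        System.out.println(stub.sayHello("world"));

        // Unexport both remote objects so the JVM can terminate.
        UnicastRemoteObject.unexportObject(impl, true);
        UnicastRemoteObject.unexportObject(reg, true);
    }
}
```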

17
Communication 4.3 Message-Oriented Communication

Message-Oriented Communication

Transient messaging
Message-queuing systems (e.g., JMS)
Message brokers
Example: IBM WebSphere MQ

21 / 55
Message-Oriented Middleware and
The Java Message Service API
(JMS)
javax.jms
home page:
http://www.oracle.com/technetwork/java/jms/index.html
Ref: D2212 Network Programming with Java, Lecture 8
by Vladimir Vlassov and Leif Lindbäck, KTH/ICT/SCS,
HT 2015
Message-Oriented Middleware,
MOM

Enables the exchange of general-purpose
messages in a distributed application.

Data is exchanged by message queuing, typically
asynchronously.

Reliable message delivery is achieved using
message queues, and by providing security,
transactions and the required administrative
services.
The Java Message Service API (JMS)
• JMS provides a Java API for an existing message queue. The
JMS specification defines how to call the provider.
• Asynchronous message production
• Asynchronous message consumption by a message listener
registered as consumer.
– Message-driven EJBs (Enterprise Java Beans) asynchronously
consume messages.
• Reliable messaging: Can ensure that a message is delivered
once and only once.
• A JMS provider is a messaging agent that implements the JMS
interfaces and performs the messaging on behalf of clients.
Lecture 8: JMS. JavaEmail API. JNDI 5


Two Messaging Domains
• Queues: Point-to-Point (PTP) Messaging Domain
Each message has only one consumer.

• Topics: Publish/Subscribe (pub/sub) Messaging Domain

Each message can have multiple consumers
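JMS code itself needs a provider on the classpath, but the difference between the two domains can be illustrated with plain JDK classes. PtpQueue, Topic, and DomainsDemo below are hypothetical helper names, not part of the JMS API:

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Point-to-point: each message is handed to exactly one consumer.
class PtpQueue {
    private final Queue<Consumer<String>> consumers = new ArrayDeque<>();
    void register(Consumer<String> c) { consumers.add(c); }
    void send(String msg) {
        Consumer<String> c = consumers.poll();      // pick one consumer
        if (c != null) { c.accept(msg); consumers.add(c); } // round-robin
    }
}

// Publish/subscribe: each message is copied to every subscriber.
class Topic {
    private final List<Consumer<String>> subscribers = new ArrayList<>();
    void subscribe(Consumer<String> c) { subscribers.add(c); }
    void publish(String msg) {
        for (Consumer<String> c : subscribers) c.accept(msg);
    }
}

public class DomainsDemo {
    public static void main(String[] args) {
        List<String> a = new ArrayList<>(), b = new ArrayList<>();
        PtpQueue q = new PtpQueue();
        q.register(a::add); q.register(b::add);
        q.send("m1"); q.send("m2");   // one consumer per message
        Topic t = new Topic();
        t.subscribe(a::add); t.subscribe(b::add);
        t.publish("m3");              // both subscribers receive m3
        System.out.println(a + " " + b);   // [m1, m3] [m2, m3]
    }
}
```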



JMS Architecture
• A JMS application is composed of:
• A JMS provider
– There are many message queues that can be
used as JMS provider, e.g., Apache ActiveMQ,
RabbitMQ and IBM WebSphere MQ. The
GlassFish server also includes a JMS provider.
• JMS clients
– producing and/or consuming messages.
• Messages
– objects that communicate information between
JMS clients.
• Administered objects
– Destinations (D) and Connection Factories (CF),
described in Administered Objects;
– created by an administrator for the use of
clients.
JMS Programming Concepts
• Administered Objects
– Connection Factory
– Destinations (queues, topics,
both)
• Connection
• Session
• Message Producers
• Message Consumers
– Message consumers
– Message listeners
– Message selectors
• Messages
– Headers, properties, bodies
• Queue Browsers
• Steps:
– Creating a connection and a session
– Creating message producers and consumers
– Sending and receiving messages



ConnectionFactory

An administered object, deployed to the server by the message
queue administrator.

Encapsulates a set of connection configuration parameters,
defined by the administrator.

Used by a JMS client to create a connection with a JMS
provider.

When used in a Java EE server, the connection factory object is
created and injected by the server:

@Resource(mappedName="jms/MyConnectionFactory")
private static ConnectionFactory connectionFactory;



Destination

An administered object, deployed to the server by the message
queue administrator.

Encapsulates a provider-specific address.

Used by a client to specify the target of messages it produces
and the source of messages it consumes.

When used in a Java EE server, the destination objects are
created and injected by the server:
@Resource(mappedName="jms/MyQueue")
private static Queue queue;

@Resource(mappedName="jms/MyTopic")
private static Topic topic;



Connection

Encapsulates an open connection with a JMS provider.

Typically represents an open TCP/IP socket between a client
and the service provider.

Created by a ConnectionFactory :

Connection connection =
connectionFactory.createConnection();
...
connection.close();



Session

A single-threaded context for producing and consuming
messages.

Used to create message producers and consumers,
messages, queue browsers, temporary queues and topics.

Retains messages it consumes until they have been
acknowledged.

A non-transacted session with automatic acknowledgement
of messages:
Session session = connection.createSession(false,
Session.AUTO_ACKNOWLEDGE);

A transacted session; messages are acknowledged on
commit:
Session session = connection.createSession(true, 0);



MessageProducer
A message producer is created by a session, and used for
sending messages to a destination.
– Create a producer for a Destination object (Queue or
Topic ):
MessageProducer producer =
session.createProducer(destination);
– Send messages by using the send method:
producer.send(message);
– Create an unidentified producer and specify a
destination when sending a message:
MessageProducer producer =
session.createProducer(null);
producer.send(destination, message);



MessageConsumer
• A message consumer is created by a session and used for receiving
messages sent to a destination.
• Create a consumer for a Destination object (Queue or Topic ):
MessageConsumer consumer =
session.createConsumer(dest);
• Start the connection and use the receive method to consume a message
synchronously.
connection.start();
Message m = consumer.receive();
Message m = consumer.receive(1000); // time out after a second



MessageListener
• A message listener acts as an asynchronous event handler
for messages.
– Implements the MessageListener interface, which has one
method, onMessage:

public void onMessage(Message message);

• Register the message listener with a specific
MessageConsumer:
Listener myListener = new Listener();
consumer.setMessageListener(myListener);
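The callback style above can be sketched without a JMS provider using plain Java; MiniListener, MiniConsumer, and ListenerDemo are hypothetical stand-ins for javax.jms.MessageListener and MessageConsumer:

```java
import java.util.ArrayList;
import java.util.List;

// Stand-in for javax.jms.MessageListener: one callback method.
interface MiniListener {
    void onMessage(String message);
}

// Stand-in for a MessageConsumer that delivers asynchronously to a
// registered listener instead of blocking the client in receive().
class MiniConsumer {
    private MiniListener listener;
    void setMessageListener(MiniListener l) { listener = l; }
    // Called by the (hypothetical) provider when a message arrives.
    void deliver(String message) {
        if (listener != null) listener.onMessage(message);
    }
}

public class ListenerDemo {
    public static void main(String[] args) {
        List<String> seen = new ArrayList<>();
        MiniConsumer consumer = new MiniConsumer();
        consumer.setMessageListener(seen::add);   // lambda as the callback
        consumer.deliver("hello");
        consumer.deliver("world");
        System.out.println(seen);                 // [hello, world]
    }
}
```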



Messages
• A JMS message has three parts:
1. (required) a header,
2. (optional) properties,
3. (optional) a body.
• A header contains predefined fields with values that both
clients and providers use to identify and to route messages.

Header Field       Set By
JMSDestination     send or publish method
JMSDeliveryMode    send or publish method
JMSExpiration      send or publish method
JMSPriority        send or publish method
JMSMessageID       send or publish method
JMSTimestamp       send or publish method
JMSCorrelationID   Client
JMSReplyTo         Client
JMSType            Client
JMSRedelivered     JMS provider
Message Body Types
• Five message body formats (a.k.a. message types):

Message Type    Contents
TextMessage     A String object (for example,
                the contents of an XML file).
MapMessage      A set of name-value pairs, names
                as String and values as primitive
                types. Entries can be accessed
                sequentially by enumerator or
                randomly by name.
BytesMessage    A stream of bytes.
StreamMessage   A stream of primitive values.
ObjectMessage   A Serializable object.
Message         Nothing, but header fields and
                properties only.

TextMessage message = session.createTextMessage();
message.setText(msg_text);
producer.send(message);
...
Message m = consumer.receive();
if (m instanceof TextMessage) {
    TextMessage message = (TextMessage) m;
    System.out.println("Message: " + message.getText());
} else {
    // Handle error
}
