Module 1 Chapter 2
Module 1 Chapter 2
Module 1 Chapter 2
Module-1
ARCHITECTURE
1
2.0 ARCHITECTURE
* The organization of distributed systems is mostly about the software components that
constitute the system.
* These software architectures tell us how the various software components are to be
organized and how they should interact.
* An important goal of distributed systems is to separate applications from underlying
platforms by providing a middleware layer.
* Adopting such a layer is an important architectural decision, and its main purpose is to
provide distribution transparency.
2
2.1 ARCHITECTURAL STYLES
* Logical organization of a DS into software components is referred to as its software
architecture.
* Architecture style is formulated in terms of components, the way that components are
connected to each other, the data exchanged between components, and finally how these
elements are jointly configured into a system.
* Component: A modular unit with well-defined required and provided interfaces that is
replaceable within its environment.
* Connector: A mechanism that mediates communication, coordination, or cooperation
among components. It allows the flow of control and data between components.
* Important styles of architecture for DS are discussed.
3
LAYERED ARCHITECTURES
In layered architecture components are organized in a layered
fashion where a component at layer Lj can make a downcall to a
component at a lower-level layer Li (with i < j) and generally
expects a response.
Only in exceptional cases will an upcall be made to a higher-level
component.
There are three common cases as shown in figure- 2.1
4
A Application using library Los to
Lmath
LOS
Figure 2.1: (a) Pure layered organization. (b) Mixed layered organization.
(c) Layered organization with upcalls 5
Layered communication protocols
Server
1 from socket import *
2 s = socket(AF_INET, SOCK_STREAM)
3 (conn, addr) = s.accept() # returns new socket and addr.
client
5 while
4 data True:
= conn.recv(1024) ## receive
forever data from
6 if not data: break
client # stop if client stopped
7 conn.send(str(data)+"*") # return sent data plus an
"*conn.close()
8 " # close the connection
Client
1 from socket import *
2 s = socket(AF_INET, SOCK_STREAM)
3 s.connect((HOST, PORT)) # connect to server (block until
accepted)
4 data
5 =
s.send(’Hello, world’) ## send
receive
somethe
data
s.recv(1024) response # print
6 print data the result
7 s.close() # close the
connection
8
Application Layering
Logical layering of applications is viewed at three different layers.
1) The application-interface level
2) The processing level
3) The data level
In line with this layering, we see that applications can often be constructed from roughly
three different pieces:
a) A part that handles interaction with a user or some external application
b) A part that operates on a database or file system
c) A middle part that generally contains the core functionality of the application
Example: Internet search engine
9
Figure 2.4: The simplified organization of an Internet search engine into three different layers.
10
Example-2: Decision support system for stock brokerage
11
OBJECT BASED AND SERVICE ORIENTED ARCHITECTURES
16
RESOURCE BASED ARCHITECTURES
19
To create a bucket, or an object for that matter, an application would
essentially send a PUT request with the URI of the bucket/object. The
protocol that is used with the service is HTTP.
It is just another HTTP request, which will subsequently be correctly
interpreted by S3.
If the bucket or object already exists, an HTTP error message is returned.
Similarly, to know which objects are contained in a bucket, an application
would send a GET request with the URI of that bucket.
S3 will return a list of object names, again as an ordinary HTTP response.
20
PUBLISH-SUBSCRIBE ARCHITECTURE
An architecture in which dependencies between processes is less.
There is a strong separation between processing and coordination.
Here system is viewed as a collection of autonomously operating processes.
In this model, coordination encompasses the communication and cooperation
between processes.
The activities performed by processes are grouped into a whole.
The distinction between the coordination models is done along two different
dimensions, temporal and referential as shown in Figure 2.9.
21
Direct coordination: When processes are temporally and referentially coupled,
coordination takes place in a direct way.
Referential coupling: Generally appears in the form of explicit referencing in
communication.
Example: A process can communicate only if it knows the name or identifier of the
other processes it wants to exchange information with.
22
Temporal coupling: Means that processes that are communicating will both have to be
up and running.
Example of direct communication: Talking over cell phones.
23
Shared data spaces are often combined with event-based coordination:
A process subscribes to certain tuples by providing a search pattern; when a process inserts a tuple into the
data space, matching subscribers are notified. In both cases, we are dealing with a publish-subscribe
architecture, and indeed, the key characteristic feature is that processes have no explicit reference to each
other.
An important aspect of publish-subscribe systems is that communication takes place by describing the
events that a subscriber is interested in.
24
The difference between a pure event-based architectural style, and that of a shared
data space is shown in Figure 2.10.
Figure 2.10: The (a) event-based and (b) shared data-space architectural style.
25
Example: Linda tuple space (Linda- A programming model)
Data space in Linda is known as a tuple space, which essentially supports three operations:
• in(t): remove a tuple that matches the template t
• rd(t): obtain a copy of a tuple that matches the template t
• out(t): add the tuple t to the tuple space
• Calling out(t) twice in a row, leads to storing two copies of tuple t ⇒ a tuple space
is modeled as a multiset.
• Both in and rd are blocking operations: the caller will be blocked until a matching
tuple is found, or has become available.
1 blog = linda.universe._rd(("MicroBlog",linda.TupleSpace))[1]
2
3 blog._out(("alice","gtcn","This graph theory stuff is not easy"))
4 blog._out(("alice","distsys","I like systems more than graphs"))
(b) Alice’s code for creating a microblog and posting two messages.
1 blog = linda.universe._rd(("MicroBlog",linda.TupleSpace))[1]
2
3 t1 = blog._rd(("bob","distsys",str))
4 t2 = blog._rd(("alice","gtcn",str))
5 t3 = blog._rd(("bob","gtcn",str))
(c) Chuck reading a message from Bob’s and Alice’s microblog.
28
Wrappers are much more than simple interface transformers.
Example: An object adapter is a component that allows applications to
invoke remote objects.
Wrappers help in extending systems with existing components.
Example: If an application A managed data that was needed by application B,
one approach would be to develop a wrapper specific for B so that it could
have access to A’s data.
Broker contacts the relevant applications and gives back result. ( Figure 2.13)
Figure 2.13: (a) Requiring each application to have a wrapper for each other application.
2. The call by A is transformed into a generic object invocation, made possible through a general
object-invocation interface offered by the middleware at the machine where A resides.
3. Finally, the generic object invocation is transformed into a message that is sent through the transport-
level network interface as offered by A’s local operating system.
32
Figure 2.14: Using interceptors to handle remote-object invocations.
33
After the first step, the call B.doit(val) is transformed into a generic call such as invoke(B,
&doit, val) with a reference to B’s method and the parameters that go along with the call.
Now object B is replicated. In that case, each replica should actually be invoked. This is a
clear point where interception can help. The request-level interceptor will call invoke(B,
&doit, val) for each of the replicas.
A call to a remote object will be sent over the network by using the messaging interface as
offered by the local OS.
At that level, a message-level interceptor may assist in transferring the invocation to the
target object. 34
2.3 SYSTEM ARCHITECTURE
Deciding on software components, their interaction, and their placement leads
CENTRALIZED ORGANIZATIONS
35
Simple client-server architecture
Multitiered Architectures
Based on 3 logical levels, a client server application can be physically distributed across
several machines in different ways.
The simplest organization is to have two types of machines:
1) A client machine containing only the programs implementing (part of) the user-interface
level.
2) A server machine containing the programs implementing the processing and data level
37
• Everything is handled by server and client has only graphical interface.
• The approach for organizing client and sever is to distribute the 3 layers (User interface
layer, Processing layer and data layer) across different machines as shown in Figure 2.16.
Distribution of layers
Figure
Client machine Server machine
2.16 (a)
Terminal-dependent part of the user Applications remote control over the
interface presentation of their data
2.16 (b) Rest of the application and data
Entire Graphical front end
Remaining application and database
Entire Graphical front end and part of
the application
2.16 (c)
Example-1: Forms to be filled entirely before processing.
Example-2: Word processor checking spellings at the client side.
39
Distribution of layers
Figure
Client machine Server machine
2.16 (d)
Most of the application is running on the All operations on files or database
client machine. entries
Example: Banking applications run on an end-user’s machine where the user
prepares transactions. Once finished, the application contacts the database on the
bank’s server and uploads the transactions for further processing.
2.16 (e) Client’s local disk contains part of the Remaining part of the database
data.
Example: When browsing the Web, a client can gradually build a huge cache on
local disk of most recent inspected Web pages.
40
• Three-tiered architecture (Physically): A server may some times act as a client.
• In vertical distribution functions are logically and physically split across multiple machines.
• Horizontal distribution: Distribution of the clients and the servers is referred as horizontal
distribution.
• A client or server may be physically split up into logically equivalent parts, but each
part is operating on its own share of the complete data set, thus balancing the load.
• These systems use a semantic free index, in which each data item that is to be maintained by
the system, is uniquely associated with a key, and that key is used as an index.
• Each node is assigned an identifier from the same set of all possible hash
values, and each node is made responsible for storing data associated with a
specific subset of keys.
44
Example : A hypercube which is nothing but an n-dimensional cube.
4D hypercube is viewed as 2 ordinary cubes, each with 8 vertices and 12 edges.
• Now suppose that the node with identifier 0111 is requested to look up the data having
key 14, corresponding to the binary value 1110. We assume that the node with
identifier 1110 is responsible for storing all data items that have key 14.
• What node 0111 can simply do, is forward the request to a neighbor that is closer to
node 1110. In this case, this is either node 0110 or node 1111. If it picks node 0110,
that node will then forward the request directly to node 1110 from where the data can
be retrieved. 46
Unstructured peer-to-peer systems
• When a node joins it often contacts a well-known node to obtain a starting list
of other peers in the system.
• Two usual ways to search for data: Flooding and Random Walk.
47
• Flooding: In the case of flooding, an issuing node ‘u’ simply passes a request
for a data item to all its neighbors.
• Random walks: At the other end of the search spectrum, an issuing node u can
simply try to find a data item by asking a randomly chosen neighbor, say v. If v
does not have the data, it forwards the request to one of its randomly chosen
neighbors, and so on.
48
Hierarchically organized peer-to-peer networks
• These are useful in the situations where peer-to-peer nature of the systems is
not suitable.
• Broker: Collects data on resource usage and availability for a number of nodes
that are in each other’s proximity.
• Whenever a weak peer joins the network, it attaches to one of the super peers
and remains attached until it leaves the network.
• There is need that super peers should be long lived and highly available.
• For a specific organization, one edge server acts as an origin server from which
all content originates.
• Origin server can use other edge servers for replicating Web pages.
• Cloud computing: Edge servers are used to assist in computations and storage,
essentially leading to distributed cloud systems.
• Fog computing: End-user devices form part of the system and are (partly)
controlled by a cloud-service provider
54
Collaborative distributed systems
• Hybrid structures are notably deployed in collaborative distributed systems.
• The main issue in many of these systems is to first get started, for which
often a traditional client-server scheme is deployed.
• Once a node has joined the system, it can use a fully decentralized scheme for
collaboration.
Example: BitTorrent file-sharing system
• BitTorrent is a peer-to-peer file downloading system and its principal working
is shown in Figure 2.22
55
Figure 2.22: The principal working of BitTorrent
The basic idea is that when an end user is looking for a file, he downloads
chunks of the file from other users until the downloaded chunks can be
assembled together yielding the complete file. 56
• Free riding: A phenomenon in which a significant fraction of participants merely
download files but contribute nothing.
• To prevent this situation, in BitTorrent a file can be downloaded only when the
downloading client is providing content to someone else.
• Downloading a file: A user needs to access a global directory -Well know Web
site. Such a directory contains references to what are called torrent files. A torrent
file contains the information that is needed to download a specific file. In particular,
it contains a link to what is known as a tracker, which is a server that is keeping
an accurate account of active nodes that have (chunks of) the requested file.
57
• Once the nodes have been identified from where chunks can be downloaded, the
downloading node effectively becomes active. At that point, it will be forced to help
others. For example by providing chunks of the file it is downloading that others do not yet
have.
• This enforcement comes from a very simple rule: if node P notices that node Q is
downloading more than it is uploading, P can decide to decrease the rate at which it sends
data to Q. This scheme works well provided P has something to download from Q.
Figure 2.24: (a) The remote access model. (b) The upload/download model.
60
• In Unix systems NFS is generally implemented following the layered
architecture as shown in Figure 2.25.
• With NFS, operations on the VFS interface are either passed to a local file system, or
passed to a separate component known as the NFS client, which takes care of
handling access to files stored at a remote server.
• The NFS client implements the NFS file system operations as remote procedure calls
to the server. 62
• Server side: The NFS server is responsible for handling incoming client
requests.
• The RPC component at the server converts incoming requests to regular VFS
file operations that are subsequently passed to the VFS layer.
• Again, the VFS is responsible for implementing a local file system in which the
actual files are stored.
63
The Web
• The core of a Web site is formed by a process that has access to a local file system
storing documents.
• A client interacts with Web servers through a browser, which is responsible for
properly displaying a document 64
• The communication between a browser and Web server is standardized: they
both adhere to the HyperText Transfer Protocol (HTTP). This leads to the
overall organization shown in Figure 2.27.
• CGI defines a standard way by which a Web server can execute a program taking user
data as input.
• Usually, user data come from an HTML form; it specifies the program that is to be
executed at the server side, along with parameter values that are filled in by the user.
• Once the form has been completed, the program’s name and collected parameter
values are sent to the server, as shown in Figure 2.28.
66
Figure 2.28: The principle of using server-side CGI programs.
• When the server sees the request, it starts the program named in the request and passes it
the parameter values.
• At that point, the program simply does its work and generally returns the results in the
form of a document that is sent back to the user’s browser to be displayed. 67