Chapter 1
Distributed Systems
Chapter One
Introduction
Computer systems are undergoing a revolution. From 1945, when the modern
computer era began, until about 1985, computers were large and expensive. Even
minicomputers normally cost tens of thousands of dollars each.
As a result, most organizations had only a handful of computers, and for lack of a
way to connect them, these operated independently from one another.
The amount of improvement that has occurred in computer technology in the past half
century is truly staggering and totally unprecedented in other industries. Two developments
drove the change. The first was the development of powerful, inexpensive microprocessors.
The second was the invention of high-speed computer networks.
Local area networks (LANs) allow dozens, or even hundreds, of machines within a
building to be connected in such a way that small amounts of information can be
transferred between machines in a millisecond or so.
What is a Distributed System?
A distributed system is a collection of independent computers that appear to the users of
the system as a single computer. This definition has two aspects.
The first one deals with hardware: the machines are autonomous.
The second one deals with software: the users think of the system as a single computer.
Item             Description
Device sharing   Allow many users to share expensive peripherals like color printers
Flexibility      Spread the workload over the available machines in the most cost-effective way
Advantages of distributed systems over centralized systems.
Item                    Description
Economics               Microprocessors offer a better price/performance than mainframes
Speed                   A distributed system may have more total computing power than a mainframe
Inherent distribution   Some applications involve spatially separated machines
Reliability             If one machine crashes, the system as a whole can still survive
Disadvantages of Distributed Systems
Item Description
Characteristics of Distributed Systems
Differences between the computers and the ways they communicate are
hidden from users
Users and applications can interact with a distributed system in a
consistent and uniform way regardless of location
Distributed systems should be easy to expand and scale
A distributed system is normally continuously available, even if there
may be partial failures
o Users and applications should not notice that parts are being replaced
or fixed, or that new parts are added to serve more users or
applications
Goals of a Distributed System
To support heterogeneous computers and networks
To provide a single-system view, a distributed system is often organized by
means of a layer of software called middleware that extends over multiple
machines
A distributed system should:
Make resources accessible (printers, computers, storage facilities, data,
files, Web pages, ...)
o Reasons: economics, and the need to collaborate and exchange information
Be transparent: hide the fact that the resources and processes are
distributed across multiple computers.
Be open
Be scalable
Transparency in a Distributed System
A distributed system that is able to present itself to users
and applications as if it were only a single computer
system is said to be transparent.
It appears as a single entity, so users need not worry about
the location from which they get its services
Different forms of transparency in a distributed system
Transparency   Description
Access         Hide differences in data representation and how a resource is accessed
Location       Hide where a resource is physically located; e.g., where is http://www.prenhall.com/index.html? (naming)
Migration      Hide that a resource may move to another location
Relocation     Hide that a resource may be moved to another location while in use; e.g., mobile users using their wireless laptops
Replication    Hide that a resource is replicated
Concurrency    Hide that a resource may be shared by several competitive users; a resource must be left in a consistent state
Failure        Hide the failure and recovery of a resource
Persistence    Hide whether a (software) resource is in memory or on disk
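Access transparency is easiest to see with data representation. Machines may store integers in different byte orders, so the sender and receiver usually agree on one wire format. The following is a minimal sketch, assuming a hypothetical message layout (a 32-bit request ID followed by a 16-bit length) that is not taken from any particular system:

```python
import struct

# Hypothetical wire format (an assumption for illustration):
# "!" selects network (big-endian) byte order, I = uint32, H = uint16.
WIRE_FORMAT = "!IH"


def encode(request_id: int, length: int) -> bytes:
    """Convert host values into the agreed network representation."""
    return struct.pack(WIRE_FORMAT, request_id, length)


def decode(payload: bytes) -> tuple:
    """Convert the network representation back into host values."""
    request_id, length = struct.unpack(WIRE_FORMAT, payload)
    return request_id, length


message = encode(1, 512)
print(message.hex())    # identical bytes whatever the sender's native order
print(decode(message))
```

Because both sides go through encode and decode, application code never needs to know how the other machine lays out integers in memory.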
Hardware Concepts
Although all distributed systems consist of multiple CPUs, there are several different ways the hardware
can be organized, especially in terms of how the CPUs are interconnected and how they
communicate.
Flynn (1972) picked two characteristics: the number of instruction streams and the number
of data streams.
A computer with a single instruction stream and a single data stream is called SISD.
SIMD: single instruction stream, multiple data stream. This type refers to array
processors with one instruction unit that fetches an instruction, and then commands many
data units to carry it out in parallel, each with its own data. These machines are useful for
computations that repeat the same calculation on many sets of data, for example, adding
up all the elements of 64 independent vectors. Some supercomputers are SIMD.
MISD: multiple instruction stream, single data stream. No known computers fit this
model.
MIMD: multiple instruction stream, multiple data stream. This essentially means a group of
independent computers, each with its own program counter, program, and data. All
distributed systems are MIMD.
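Flynn's two characteristics can be turned into a tiny classifier. This is only an illustrative sketch; the function name flynn_class is made up for this example:

```python
def flynn_class(instruction_streams: int, data_streams: int) -> str:
    """Return the Flynn (1972) category for the given stream counts."""
    if instruction_streams < 1 or data_streams < 1:
        raise ValueError("a machine needs at least one stream of each kind")
    instr = "S" if instruction_streams == 1 else "M"
    data = "S" if data_streams == 1 else "M"
    return f"{instr}I{data}D"


print(flynn_class(1, 1))    # one instruction stream, one data stream
print(flynn_class(1, 64))   # an array processor working on 64 vectors
print(flynn_class(8, 8))    # eight independent computers
```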
In some systems the machines are tightly coupled and in others they are loosely coupled.
In a tightly-coupled system, the delay experienced when a message is sent from one
computer to another is short, and the data rate is high; that is, the number of bits per second
that can be transferred is large.
In a loosely-coupled system, the opposite is true: the inter-machine message delay is large
and the data rate is low.
For example, two CPU chips on the same printed circuit board and connected by wires
etched onto the board are likely to be tightly coupled, whereas two computers connected by a
2400 bit/sec modem over the telephone system are certain to be loosely coupled.
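The two quantities that define coupling, delay and data rate, combine into a simple estimate of how long one message takes. The sketch below is a rough model only. The 2400 bit/sec rate comes from the example above, while the message size, both delays, and the on-board data rate are assumed values chosen for illustration:

```python
def transfer_time(delay_s: float, message_bits: int, rate_bps: float) -> float:
    """Time to deliver one message: fixed delay plus transmission time."""
    return delay_s + message_bits / rate_bps


MESSAGE_BITS = 8 * 1024  # a 1 KB message (assumed size)

# Loosely coupled: 2400 bit/sec modem, with an assumed 0.1 s delay.
modem = transfer_time(delay_s=0.1, message_bits=MESSAGE_BITS, rate_bps=2400)

# Tightly coupled: assumed 1 microsecond delay and 1 Gbit/sec on-board link.
board = transfer_time(delay_s=1e-6, message_bits=MESSAGE_BITS, rate_bps=1e9)

print(f"modem: {modem:.3f} s")
print(f"board: {board * 1e6:.3f} microseconds")
```

Under these assumptions the same message takes seconds over the modem and microseconds on the board, which is the gap the terms loosely coupled and tightly coupled describe.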
Tightly-coupled systems tend to be used more as parallel systems (working on a single
problem) and loosely-coupled ones tend to be used as distributed systems (working on
many unrelated problems), although this is not always true.
One famous counterexample is a project in which hundreds of computers all over the world
worked together trying to factor a huge number (about 100 digits). Each computer was
assigned a different range of divisors to try, and they all worked on the problem in their
spare time, reporting the results back by electronic mail when they finished.
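The division of labor in this counterexample can be sketched as follows. This is a toy illustration under stated assumptions: it uses plain trial division because the text speaks of ranges of divisors, it uses a small number in place of the 100-digit one, and the function names are invented here. Each call to search_range stands for the work done by one machine, and the returned list stands for the result it mails back.

```python
import math


def divisor_ranges(n: int, workers: int) -> list:
    """Split the candidate divisors 2..isqrt(n) into contiguous ranges."""
    limit = math.isqrt(n)
    total = max(limit - 1, 0)                 # number of candidates
    size = math.ceil(total / workers) if total else 0
    ranges = []
    start = 2
    while start <= limit:
        end = min(start + size - 1, limit)
        ranges.append((start, end))
        start = end + 1
    return ranges


def search_range(n: int, low: int, high: int) -> list:
    """Work done by one machine: try every divisor in its own range."""
    return [d for d in range(low, high + 1) if n % d == 0]


n = 8051  # small stand-in for the huge number
reports = [search_range(n, low, high) for low, high in divisor_ranges(n, 4)]
found = sorted(d for report in reports for d in report)
print(divisor_ranges(n, 4))
print(found)
```

The machines never need to talk to each other while searching, which is why such loosely coupled hardware can still work on a single problem.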
Multiprocessors tend to be more tightly coupled than multicomputers, because they can
exchange data at memory speeds, but some fiber-optic based multicomputers can also work at
memory speeds.
Combining the two kinds of machine with the two kinds of interconnect gives four categories:
Bus multiprocessors,
Switched multiprocessors,
Bus multicomputers, and
Switched multicomputers.
Software Concepts
Although the hardware is important, the software is even more important. The image that a
system presents to its users, and how they think about the system, is largely determined by
the operating system software, not the hardware.
Operating systems cannot be put into nice, neat pigeonholes like hardware.
By nature software is vague and amorphous.
It is more-or-less possible to distinguish two kinds of operating systems for multiple-CPU
systems: loosely coupled and tightly coupled.
o Loosely-coupled software allows machines and users of a distributed system to be
fundamentally independent of one another, but still to interact to a limited degree where
that is necessary.
o Consider a group of personal computers, each of which has its own CPU, its own
memory, its own hard disk, and its own operating system, but which share some
resources, such as laser printers and databases, over a LAN.
To show how difficult it is to make definitions in this area, now consider the same system as
above, but without the network.
To print a file, the user writes the file on a floppy disk, carries it to the machine with the
printer, reads it in, and then prints it. Is this still a distributed system, only now even more
loosely coupled? It's hard to say.
From a fundamental point of view, there is not really any theoretical difference between
communicating over a LAN and communicating by carrying floppy disks around.
At most one can say that the delay and data rate are worse in the second example.
At the other extreme we might find a multiprocessor dedicated to running a single chess
program in parallel. Each CPU is assigned a board to evaluate, and it spends its time
examining that board and all the boards that can be generated from it. When the evaluation
is finished, the CPU reports back the results and is given a new board to work on. The
software for this system, both the application program and the operating system required to
support it, is clearly much more tightly coupled than in our previous example.
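The assign, evaluate, report, repeat cycle of the chess example can be sketched with a shared work queue. This is a simplified model under clear assumptions: threads stand in for CPUs, boards are plain strings, and evaluate is a made-up scoring rule rather than real chess evaluation.

```python
import queue
import threading


def evaluate(board: str) -> int:
    """Stand-in for board evaluation (hypothetical scoring rule)."""
    return sum(ord(ch) for ch in board) % 100


def worker(tasks: queue.Queue, results: dict, lock: threading.Lock) -> None:
    """One CPU: take a board, evaluate it, report back, ask for another."""
    while True:
        try:
            board = tasks.get_nowait()
        except queue.Empty:
            return  # no boards left to evaluate
        score = evaluate(board)
        with lock:
            results[board] = score  # report the result back


boards = [f"board-{i}" for i in range(10)]
tasks = queue.Queue()
for board in boards:
    tasks.put(board)

results = {}
lock = threading.Lock()
cpus = [threading.Thread(target=worker, args=(tasks, results, lock))
        for _ in range(4)]
for cpu in cpus:
    cpu.start()
for cpu in cpus:
    cpu.join()

print(len(results), "boards evaluated")
```

Every worker depends on the same queue and the same result table, which is what makes this software tightly coupled compared with independent workstations sharing a printer.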
Network Operating Systems
Let us start with loosely-coupled software on loosely-coupled hardware, since this is
probably the most common combination at many organizations.
A typical example is a network of workstations connected by a LAN. In this model, each
user has a workstation for his exclusive use. It may or may not have a hard disk. It definitely
has its own operating system. All commands are normally run locally, right on the
workstation.
However, it is sometimes possible for a user to log into another workstation remotely by
using a command such as rlogin machine.
The effect of this command is to turn the user's own workstation into a remote terminal
logged into the remote machine. Commands typed on the keyboard are sent to the remote
machine, and output from the remote machine is displayed on the screen. To switch to a
different remote machine, it is necessary first to log out, then to use the rlogin command to
connect to another machine. At any instant, only one machine can be used, and the selection
of the machine is entirely manual.
Networks of workstations often also have a remote copy command to copy files from one
machine to another.
For example, a command like rcp machine1:file1 machine2:file2
might copy the file file1 from machine1 to machine2 and give it the name file2 there.
Again here, the movement of files is explicit and requires the user to be completely aware of
where all files are located and where all commands are being executed.
While better than nothing, this form of communication is extremely primitive and has led
system designers to search for more convenient forms of communication and
information sharing.
One approach is to provide a shared, global file system accessible from all the
workstations.
The file system is supported by one or more machines called file servers.
The file servers accept requests from user programs running on the other (non-server)
machines, called clients, to read and write files.
Each incoming request is examined and executed, and the reply is sent back, as shown in the
figure below.
File servers generally maintain hierarchical file systems, each with a root directory
containing subdirectories and files.
Workstations can import or mount these file systems, augmenting their local file systems
with those located on the servers.
Consider two file servers. One has a directory called games, while the other has a directory called work.
These directories each contain several files. Both of the clients shown have mounted
both of the servers, but they have mounted them in different places in their respective
file systems.
Client 1 has mounted them in its root directory, and can access them as /games and
/work, respectively.
Client 2, like client 1, has mounted games in its root directory, but regarding the
reading of mail and news as a kind of game, has created a directory /games/work and
mounted work there.
Consequently, it can access news using the path /games/work/news rather than
/work/news.
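The effect of mounting in different places can be sketched with a per-client mount table. This is an illustrative model, not how any particular operating system implements mounting; the server names server1 and server2 and the function resolve are assumptions introduced here.

```python
def resolve(mounts: dict, path: str) -> tuple:
    """Map a client path to (server, path on that server).

    The longest matching mount point wins, which handles nested mounts.
    """
    best = None
    for mount_point in mounts:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", path)  # not under any mounted file system
    server, exported = mounts[best]
    return (server, exported + path[len(best):])


# Mount tables: mount point -> (server, exported directory).
client1 = {"/games": ("server1", "/games"), "/work": ("server2", "/work")}
client2 = {"/games": ("server1", "/games"), "/games/work": ("server2", "/work")}

print(resolve(client1, "/work/news"))
print(resolve(client2, "/games/work/news"))
```

Both lookups reach the same file on the same server, even though the two clients use different path names for it.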
Types Of Distributed Systems
Distributed Computing Systems
Cluster Computing
Essentially a group of systems connected through a LAN.
Homogeneous: same OS, near-identical hardware
Single managing node
Tightly coupled systems
Centralized job management & scheduling system
Grid Computing
Lots of nodes (including clusters across multiple subnets) from everywhere.
Heterogeneous
Diversity and dynamism (it can handle nodes dropping in and out at any point of
time)
Dispersed across several organizations
Can easily span a wide-area network
To allow for collaborations, grids generally use virtual organizations (grouping of
users that will allow for authorization on resource allocation).
Loosely coupled (decentralization)
Distributed job management & scheduling
Cloud Computing
Web-based tools or applications that users can access and use through a
web browser as if it were a program installed locally on their own
computer.
Internet-based computing
Offers dynamically scalable and virtualized resources as
services that users consume over the Internet
The only thing the user's computer needs to be able to run is the cloud
computing system's interface software
Distributed Pervasive Systems
A next generation of distributed systems is emerging in which the nodes are small, wireless,
battery-powered, mobile (e.g. PDAs, smart phones, wireless surveillance cameras, portable
ECG monitors, etc.), and often embedded as part of a larger system.
Some requirements:
Contextual change: The system is part of an environment in which changes should be
immediately accounted for.
Ad hoc composition: Each node may be used in very different ways by different users.
- Requires ease of configuration.
Sharing is the default: Nodes come and go, providing sharable services and
information.
- Calls again for simplicity.
Introduction to
Distributed Systems
Chapter Two
Architecture
Introduction
Distributed systems are often complex pieces of software whose components are dispersed
across multiple machines.
To master their complexity, it is crucial that these systems are properly organized.
The organization of a distributed system is mostly about the software components that constitute the system.
These software architectures tell us how the various software components are to be
organized and how they should interact.
The actual realization of a distributed system requires that we instantiate and place
software components on real machines, and there are many different choices in doing so.
The final instantiation of a software architecture is also referred to as a system
architecture.
An important goal of distributed systems is to separate applications from underlying
platforms by providing a middleware layer that provides distribution transparency.
Adaptive (autonomic) distributed systems are frequently organized in the form of feedback control
loops, which form an important architectural element during a system's design.
Architectural Styles
An architectural style is formulated in terms of:
components
the way that components are connected to each other
the data exchanged between components
how these elements are jointly configured into a system
• A component is a modular unit with well-defined required and provided interfaces
that is replaceable within its environment.
• A connector is a mechanism that mediates communication, coordination, or cooperation
among components; it can be formed by the facilities for (remote) procedure calls, message
passing, or streaming data.
Using components and connectors, we can come to various configurations, which, in turn,
have been classified into architectural styles.
The most important ones for distributed systems are:
Layered architectures
Object-based architectures
Data-centered architectures
Event-based architectures
Layered and Object-based Architectures
Components are organized in a layered fashion
A component at layer L_i is allowed to call components at the underlying layer L_(i-1), but not the other way around
The layered style is widely adopted by the networking community
A key observation is that control generally flows from layer to layer:
requests go down the hierarchy whereas the results flow upward
A far looser organization is followed in object-based architectures, which
are illustrated in the figure
Each object corresponds to what we have defined as a component, and these
components are connected through a (remote) procedure call mechanism
This software architecture matches the client-server system architecture
The layered and object-based architectures still form the most important styles
for large software systems
The (a) layered and (b) object-based architectural style
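The layered style can be sketched with three small classes in which each layer holds a reference only to the layer directly beneath it. The class names and the toy data are assumptions made for this example; the point is the direction of the calls.

```python
class StorageLayer:
    """Lowest layer: holds the data and calls nothing below it."""

    def __init__(self):
        self._data = {"greeting": "hello"}

    def read(self, key):
        return self._data.get(key)


class LogicLayer:
    """Middle layer: may call only the layer directly below."""

    def __init__(self, below: StorageLayer):
        self._below = below

    def lookup(self, key):
        value = self._below.read(key)          # request goes down
        return value.upper() if value is not None else None  # result goes up


class InterfaceLayer:
    """Top layer: talks to the user and calls only the logic layer."""

    def __init__(self, below: LogicLayer):
        self._below = below

    def show(self, key):
        result = self._below.lookup(key)       # request goes down
        return f"{key} = {result}"             # result flows upward


stack = InterfaceLayer(LogicLayer(StorageLayer()))
print(stack.show("greeting"))
```

No lower layer holds a reference to a higher one, so a call can never travel up the hierarchy; only return values do.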
Data-centered architectures
Processes communicate through a common (passive or active) repository
They are as important as the layered and object-based architectures
A wealth of networked applications have been developed that rely on a
shared distributed file system in which virtually all communication takes
place through files
Likewise, Web-based distributed systems are largely data-centric: processes
communicate through the use of shared Web-based data services
Event-based architectures
The (a) event-based and (b) shared data-space architectural style
Event-based architectures can be combined with data-centered architectures, yielding what is
also known as shared data spaces.
Many shared data spaces use a SQL-like interface to the shared repository, in the sense
that data can be accessed using a description rather than an explicit reference, as is the
case with files
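Access by description rather than by explicit reference can be sketched with a toy shared data space. This is a minimal in-memory model, assuming tuples as data items and None as a wildcard; the class name SharedDataSpace and its methods are invented for illustration and do not represent any specific product.

```python
class SharedDataSpace:
    """Toy repository: processes write tuples and read them by pattern."""

    def __init__(self):
        self._tuples = []

    def write(self, item: tuple) -> None:
        self._tuples.append(item)

    def read(self, pattern: tuple) -> list:
        """Return every tuple matching the pattern; None is a wildcard."""
        return [
            item for item in self._tuples
            if len(item) == len(pattern)
            and all(p is None or p == field for p, field in zip(pattern, item))
        ]


space = SharedDataSpace()
space.write(("temperature", "room-1", 21))
space.write(("temperature", "room-2", 19))
space.write(("humidity", "room-1", 40))

# The reader describes what it wants; it never names a file or a writer.
print(space.read(("temperature", None, None)))
```

The writer and the reader never refer to each other, which is the decoupling that shared data spaces add on top of a plain shared repository.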
System Architectures
Centralized Architectures
In the basic client-server model, processes in a distributed system are divided into two
(possibly overlapping) groups:
A server is a process implementing a specific service, e.g. a file system service or a
database service.
A client is a process that requests a service from a server by sending it a request and
subsequently waiting for the server's reply.
This client-server interaction is also known as request-reply behavior.
Communication between a client and a server can be implemented by means of a
simple connectionless protocol.
A connectionless protocol is efficient: as long as messages do not get lost or corrupted, the
request/reply protocol just sketched works fine.
The problem, however, is that the client cannot detect whether the original request
message was lost, or that transmission of the reply failed.
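The problem described above can be made concrete with a small simulation. Everything here is a hypothetical model: the channel drops messages according to a fixed script instead of using a real network, and a return value of None stands for a client timeout. From the client's side the two failures look identical, yet resending has different effects on the server.

```python
class CounterServer:
    """Server with a non-idempotent operation."""

    def __init__(self):
        self.value = 0

    def handle(self, request: str) -> str:
        if request == "increment":
            self.value += 1
        return f"value={self.value}"


class LossyChannel:
    """Delivers messages, dropping the attempts listed in a fixed script."""

    def __init__(self, server, drop_requests=(), drop_replies=()):
        self.server = server
        self.drop_requests = set(drop_requests)
        self.drop_replies = set(drop_replies)
        self.attempt = 0

    def call(self, request):
        self.attempt += 1
        if self.attempt in self.drop_requests:
            return None  # the request never reached the server
        reply = self.server.handle(request)
        if self.attempt in self.drop_replies:
            return None  # the server did the work, but the reply vanished
        return reply


def call_with_retry(channel, request, max_attempts=3):
    """Resend on timeout; the client cannot tell which message was lost."""
    for _ in range(max_attempts):
        reply = channel.call(request)
        if reply is not None:
            return reply
    raise TimeoutError("no reply")


lost_request = CounterServer()
call_with_retry(LossyChannel(lost_request, drop_requests={1}), "increment")

lost_reply = CounterServer()
call_with_retry(LossyChannel(lost_reply, drop_replies={1}), "increment")

print(lost_request.value, lost_reply.value)
```

When the reply is lost, the retry makes the server perform the operation twice. Resending is therefore safe only for operations that can be repeated without harm.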
As an alternative, many client-server systems use a reliable connection-oriented
protocol.
Although this solution is not entirely appropriate in a local-area network due to
relatively low performance, it works perfectly fine in wide-area systems in which
communication is inherently unreliable.
Application Layering
Considering that many client-server applications are targeted toward supporting
user access to databases, many people have advocated a distinction between the
following three levels:
The user-interface level: contains all that is necessary to directly interface with the
user, such as display management.
o Clients typically implement the user-interface level
o allow end users to interact with applications
o Many client-server applications can be constructed from roughly three different pieces: a
part that handles interaction with a user, a part that operates on a database or file
system, and a middle part that generally contains the core functionality of an application
The processing level: typically contains the applications
o middle part is logically placed at the processing level
The data level: manages the actual data that is being acted on
The simplified organization of an Internet search engine into three different
layers
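The three levels can be sketched for a toy search engine. This is not a reconstruction of the figure: the pages, the example URLs, and the ranking rule are all invented for illustration. Each function belongs to one level and calls only the level beneath it.

```python
# Data level: a tiny in-memory stand-in for the database of Web pages.
PAGES = {
    "a.example/intro": "distributed systems introduction",
    "b.example/os": "operating systems and distributed file systems",
    "c.example/cook": "cooking for beginners",
}


def data_level_search(keyword: str) -> list:
    """Data level: return (url, text) pairs whose text contains the keyword."""
    return [(url, text) for url, text in PAGES.items()
            if keyword in text.split()]


def processing_level(query: str) -> list:
    """Processing level: turn user input into a lookup and rank the hits."""
    keyword = query.strip().lower()
    hits = data_level_search(keyword)
    # Hypothetical ranking: more occurrences first, ties broken by URL.
    ranked = sorted(hits, key=lambda hit: (-hit[1].split().count(keyword), hit[0]))
    return [url for url, _ in ranked]


def user_interface(query: str) -> str:
    """User-interface level: present the results to the user."""
    urls = processing_level(query)
    return "\n".join(urls) if urls else "no results"


print(user_interface(" Systems "))
```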
Alternative client-server organizations (a)-(e)
Cases:
A: only the terminal-dependent part of the user interface on the client machine
B: place the entire user-interface software on the client side
o Divide the application into a graphical front end, which communicates with the rest of the application
(residing at the server) through an application-specific protocol.
o the front end (the client software) does no processing other than necessary for presenting the
application's interface
C: move part of the application to the front end
o e.g. the application makes use of a form that needs to be filled in entirely before it can be processed
o front end can then check the correctness and consistency of the form, and where necessary interact with
the user
D: used where the client machine is a PC or workstation, connected through a network to a distributed
file system or database
o most of the application is running on the client machine, but all operations on files or database entries
go to the server
o e.g. many banking applications run on an end-user's machine where the user prepares transactions and
such. Once finished, the application contacts the database on the bank's server and uploads the
transactions for further processing
E: used where the client machine is a PC or workstation, connected through a network to a distributed
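Case C, where the front end checks a form before anything is sent, can be sketched as follows. The form fields, the error messages, and the send_to_server callback are hypothetical; the sketch only shows that an invalid form is handled on the client without contacting the server.

```python
def validate_transfer_form(form: dict) -> list:
    """Client-side check; returns a list of problems (empty if none)."""
    errors = []
    for field in ("from_account", "to_account", "amount"):
        if not str(form.get(field, "")).strip():
            errors.append(f"{field} is required")
    amount = str(form.get("amount", "")).strip()
    if amount:
        try:
            if float(amount) <= 0:
                errors.append("amount must be positive")
        except ValueError:
            errors.append("amount must be a number")
    return errors


def submit(form: dict, send_to_server) -> str:
    """Front end: interact with the user locally, contact the server last."""
    errors = validate_transfer_form(form)
    if errors:
        return "fix: " + "; ".join(errors)  # no network traffic at all
    return send_to_server(form)


sent = []


def fake_server(form: dict) -> str:
    """Stand-in for the server side of the application."""
    sent.append(form)
    return "accepted"


print(submit({"from_account": "A", "to_account": "B", "amount": "-5"}, fake_server))
print(submit({"from_account": "A", "to_account": "B", "amount": "25"}, fake_server))
```

Only the second, complete and consistent form reaches the server; the first is rejected on the client.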
Decentralized Architectures
Peer-to-peer systems
o support horizontal distribution
o functions that need to be carried out are represented by every process that
constitutes the distributed system
o interaction between processes is symmetric.
• each process will act as a client and a server at the same time (which is also
referred to as acting as a servent)
A two-layered approach for constructing and maintaining specific overlay topologies using techniques from
unstructured peer-to-peer systems
Super-peers
Super-peers are often also organized in a peer-to-peer network, leading to a
hierarchical organization.
Every regular peer is connected as a client to a super-peer
All communication from and to a regular peer proceeds through that peer's
associated super-peer
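The super-peer organization can be sketched with two classes. This is a simplified model under stated assumptions: the class and peer names are invented, and super-peers locate the target by asking their neighbors in turn, which is only one of several possible lookup strategies.

```python
class SuperPeer:
    def __init__(self, name):
        self.name = name
        self.clients = {}    # regular peers attached to this super-peer
        self.neighbors = []  # other super-peers in the overlay

    def attach(self, peer):
        self.clients[peer.name] = peer
        peer.super_peer = self

    def route(self, sender, target, text, visited=None):
        """Deliver locally if possible, otherwise ask neighboring super-peers.

        Returns the list of super-peers the message passed through.
        """
        if visited is None:
            visited = set()
        visited.add(self.name)
        if target in self.clients:
            self.clients[target].inbox.append((sender, text))
            return [self.name]
        for neighbor in self.neighbors:
            if neighbor.name not in visited:
                path = neighbor.route(sender, target, text, visited)
                if path:
                    return [self.name] + path
        return []


class RegularPeer:
    def __init__(self, name):
        self.name = name
        self.super_peer = None
        self.inbox = []

    def send(self, target, text):
        # A regular peer never contacts another peer directly.
        return self.super_peer.route(self.name, target, text)


sp1, sp2 = SuperPeer("sp1"), SuperPeer("sp2")
sp1.neighbors.append(sp2)
sp2.neighbors.append(sp1)
alice, bob = RegularPeer("alice"), RegularPeer("bob")
sp1.attach(alice)
sp2.attach(bob)

print(alice.send("bob", "hi"))
print(bob.inbox)
```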