Chapter 1

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 50

Introduction to

Distributed Systems

Chapter One
Introduction

1
Computer systems are undergoing a revolution. From 1945, when the modern
computer era began, until about 1985, computers were large and expensive. Even
minicomputers normally cost tens of thousands of dollars each.

As a result, most organizations had only a handful of computers, and for lack of a
way to connect them, these operated independently from one another.

Starting in the mid-1980s, however, two advances in technology began to change


that situation. The first was the development of powerful microprocessors.
Initially, these were 8-bit machines, but soon 16, 32, and even 64-bit CPUs
became common.
Many of these had the computing power of a decent-sized mainframe (i.e., large)
computer, but for a fraction of the price.

2
The amount of improvement that has occurred in computer technology in the past half
century is truly staggering and totally unprecedented in other industries. The second
development was the invention of high-speed computer networks.
The local area networks or LANs allow dozens, or even hundreds, of machines within a
building to be connected in such a way that small amounts of information can be
transferred between machines in a millisecond or so.

3
What is a Distributed System?
A distributed system is a collection of independent computers that appear to the users of
the system as a single computer. This definition has two aspects.
The first one deals with hardware: the machines are autonomous.
The second one deals with software: the users think of the system as a single computer.

Example1. Consider a network of workstations in a university or company department. In


addition to each user's personal workstation, there might be a pool of processors in the
machine room that are not assigned to specific users but are allocated dynamically as
needed. Such a system might have a single file system, with all files accessible from all
machines in the same way and using the same path name. Furthermore, when a user typed a
command, the system could look for the best place to execute that command, possibly on the
user's own workstation, possibly on an idle workstation belonging to someone else, and
possibly on one of the unassigned processors in the machine room. If the system as a whole
looked and acted like a classical single-processor timesharing system, it would qualify as a
distributed system. 4
Advantages of distributed systems over isolated (personal) computers.

Item Description

Data Sharing Allow many users access to a common data base

Device sharing Allow many users to share expensive peripherals like color printers

Communication Make human-to-human communication easier, for example, by


electronic mail.

Flexibility Spread the workload over the available machines in the most cost
effective way

5
Advantages of distributed systems over centralized systems.

ITEM Description
Economics Microprocessors offer a better price/performance than mainframes

Speed A distributed system may have more total computing power than a
mainframe
Inherent Some applications involve spatially separated machines
distribution
Reliability If one machine crashes, the system as a whole can still survive

Incremental Computing power can be added in small increments


growth

6
Disadvantages of Distributed Systems

Item Description

Software Little software exists at present for distributed systems

Networking The network can saturate or cause other problems

Security Easy access also applies to secret data

7
Characteristics of Distributed Systems
 Differences between the computers and the ways they communicate are
hidden from users
 Users and applications can interact with a distributed system in a
consistent and uniform way regardless of location
 Distributed systems should be easy to expand and scale
 Distributed system is normally continuously available, even if there
may be partial failures
o Users and applications should not notice that parts are being replaced
or fixed, or that new parts are added to serve more users or
applications

8
Goals of a Distributed System
 To support heterogeneous computers and networks
 To provide a single-system view, a distributed system is often organized by
means of a layer of software called middleware that extends over multiple
machines

9
Distributed system should
 Make resources accessible(printers, computers, storage facilities, data,
files, Web pages, ...)
o reasons: economics, to collaborate and exchange information
 Be transparent: hide the fact that the resources and processes are
distributed across multiple computers.
 Be open
 Be scalable
Transparency in a Distributed System
 Distributed system that is able to present itself to users
and applications as if it were only a single computer
system is said to be transparent.
 It exist as a single entity without worrying about
location in which users can get the services
10
Different forms of transparency in a distributed system
Transparency Description
Access Hide differences in data representation
and how a resource is accessed
Location Hide where a resource is physically located; where
is http://www.prenhall.com/index.html? (naming)
Migration Hide that a resource may move to another location
Relocation Hide that a resource may be moved to another location while in use;
e.g., mobile users using their wireless laptops
Replication Hide that a resource is replicated
Concurrency Hide that a resource may be shared by several competitive users; a
resource must be left in a consistent state
Failure Hide the failure and recovery of a resource
Persistence Hide whether a (software) resource is in memory or on disk

11
Hardware Concepts
Distributed systems consist of multiple CPUs, there are several different ways the hardware
can be organized, especially in terms of how they are interconnected and how they
communicate.
Flynn(1972) picked two characteristics: the number of instruction streams and the number
of data streams.
 A computer with a single instruction stream and a single data stream is called SISD.
 SIMD, single instruction stream, multiple data stream. This type refers to array
processors with one instruction unit that fetches an instruction, and then commands many
data units to carry it out in parallel, each with its own data. These machines are useful for
computations that repeat the same calculation on many sets of data, for example, adding
up all the elements of 64 independent vectors. Some supercomputers are SIMD.
 MISD multiple instruction stream, single data stream. No known computers fit this
model.
 MIMD means a group of independent computers, each with its own program counter,
program, and data. All distributed systems are MIMD.
12
13
Systems are tightly coupled and in others they are loosely coupled.
In a tightly-coupled system, the delay experienced when a message is sent from one
computer to another is short, and the data rate is high; that is, the number of bits per second
that can be transferred is large.
In a loosely-coupled system, the opposite is true: the inter-machine message delay is large
and the data rate is low.
For example, two CPU chips on the same printed circuit board and connected by wires
etched onto the board are likely to be tightly coupled, whereas two computers connected by a
2400 bit/sec modem over the telephone system are certain to be loosely coupled.
Tightly-coupled systems tend to be used more as parallel systems (working on a single
problem) and loosely-coupled ones tend to be used as distributed systems (working on
many unrelated problems), although this is not always true.
One famous counterexample is a project in which hundreds of computers all over the world
worked together trying to factor a huge number (about 100 digits). Each computer was
assigned a different range of divisors to try, and they all worked on the problem in their
spare time, reporting the results back by electronic mail when they finished.
14
Multiprocessors tend to be more tightly coupled than multi-computers, because they can
exchange data at memory speeds, but some fiber-optic based multicomputer can also work at
memory speeds.
 Bus multiprocessors,
 Switched multiprocessors,
 Bus multicomputer, and
 Switched multicomputer.

15
Software Concepts
Although the hardware is important, the software is even more important. The image that a
system presents to its users, and how they think about the system, is largely determined by
the operating system software, not the hardware.
 Operating systems cannot be put into nice, neat pigeonholes like hardware.
 By nature software is vague and amorphous.
 It is more-or-less possible to distinguish two kinds of operating systems for multiple CPU
systems: loosely coupled and tightly coupled.
o Loosely-coupled software allows machines and users of a distributed system to be
fundamentally independent of one another, but still to interact to a limited degree where
that is necessary.
o Consider a group of personal computers, each of which has its own CPU, its own
memory, its own hard disk, and its own operating system, but which share some
resources, such as laser printers and databases, over a LAN.

16
To show how difficult it is to make definitions in this area, now consider the same system as
above, but without the network.
To print a file, the user writes the file on a floppy disk, carries it to the machine with the
printer, reads it in, and then prints it. Is this still a distributed system, only now even more
loosely coupled? It's hard to say.
From a fundamental point of view, there is not really any theoretical difference between
communicating over a LAN and communicating by carrying floppy disks around.
At most one can say that the delay and data rate are worse in the second example.

At the other extreme we might find a multiprocessor dedicated to running a single chess
program in parallel. Each CPU is assigned a board to evaluate, and it spends its time
examining that board and all the boards that can be generated from it. When the evaluation
is finished, the CPU reports back the results and is given a new board to work on. The
software for this system, both the application program and the operating system required to
support it, is clearly much more tightly coupled than in our previous example.
17
Network Operating Systems
Let us start with loosely-coupled software on loosely-coupled hardware, since this is
probably the most common combination at many organizations.
A typical example is a network of workstations connected by a LAN. In this model, each
user has a workstation for his exclusive use. It may or may not have a hard disk. It definitely
has its own operating system. All commands are normally run locally, right on the
workstation.
However, it is sometimes possible for a user to log into another workstation remotely by
using a command such as rlogin machine.

The effect of this command is to turn the user's own workstation into a remote terminal
logged into the remote machine. Commands typed on the keyboard are sent to the remote
machine, and output from the remote machine is displayed on the screen. To switch to a
different remote machine, it is necessary first to log out, then to use the rlogin command to
connect to another machine. At any instant, only one machine can be used, and the selection
of the machine is entirely manual.
18
Networks of workstations often also have a remote copy command to copy files from one
machine to another.
For example, a command like rcp machinel:filel machine2:file2
might copy the file file1 from machine1 to machine2 and give it the name file2 there.
Again here, the movement of files is explicit and requires the user to be completely aware of
where all files are located and where all commands are being executed.
 While better than nothing, this form of communication is extremely primitive and has led
system designers to search for more convenient forms of communication and
information sharing.
 One approach is to provide a shared, global file system accessible from all the
workstations.
 The file system is supported by one or more machines called file servers.
 The file servers accept requests from user programs running on the other (non-server)
machines, called clients, to read and write files.
 Each incoming request is examined and executed, and the reply is sent back as shown fig
below. 19
20
File servers generally maintain hierarchical file systems, each with a root directory
containing subdirectories and files.
 Workstations can import or mount these file systems, augmenting their local file systems
with those located on the servers.
 One has a directory called games, while the other has a directory called work.
 These directories each contain several files. Both of the clients shown have mounted
both of the servers, but they have mounted them in different places in their respective
file systems.
 Client 1 has mounted them in its root directory, and can access them as /games and
/work, respectively.
 Client 2, like client 1, has mounted games in its root directory, but regarding the
reading of mail and news as a kind of game, has created a directory /games/work and
mounted work there.
 Consequently, it can access news using the path /games/work/news rather than
/work/news.
21
 Types Of Distributed Systems

Main distributed system types are:


1. Distributed computing systems
o Focus on computation
o Goal: High performance computing tasks
2. Distributed information systems
o Focus on interoperability (the ability to exchange and use information)
o Goal: Distribute information across several servers
3. Distributed pervasive systems
o Focus on mobile, embedded, communicating systems
o Goal: Spread a real-life environment with a large variety of smart devices.

22
Cluster Computing
Essentially a group of systems connected through a LAN.
 Distributed Computing Systems  Homogeneous: Same OS, near-identical hardware
 Single managing node
 Tightly coupled systems
 Centralized job management & scheduling system

23
Grid Computing
Lots of nodes (including clusters across multiple subnets) from everywhere.
 Heterogeneous
 Diversity and dynamism (it can handle nodes dropping in and out at any point of
time)
 Dispersed across several organizations
 Can easily span a wide-area network
 To allow for collaborations, grids generally use virtual organizations (grouping of
users that will allow for authorization on resource allocation).
 Loosely coupled (decentralization)
 Distributed job management & scheduling

24
 Cloud Computing

Web-based tools or applications that users can access and use through a
web browser as if it were a program installed locally on their own
computer.
 Internet-based computing
 offers dynamically scalable and virtualized resources that make up
services for users to use over the internet
 The only thing the user's computer needs to be able to run is the cloud
computing system's interface software

25
 Distributed Pervasive Systems
A next-generation of distributed systems emerging in which the nodes are small, wireless,
battery-powered, mobile (e.g. PDAs, smart phones, wireless surveillance cameras, portable
ECG monitors, etc.), and often embedded as part of a larger system.
Some requirements:
 Contextual change: The system is part of an environment in which changes should be
immediately accounted for.
 Ad hoc composition: Each node may be used in a very different ways by different users.
-Requires ease-of-configuration.
 Sharing is the default: Nodes come and go, providing sharable services and
information.
- Calls again for simplicity.

26
IF You Have Question ,
Comment, Suggestion You
well come!

27
Introduction to
Distributed Systems

Chapter Two
Architecture

28
 Introduction
Distributed systems
 are often complex pieces of software whose components are dispersed across multiple
machines.
 To master their complexity, it is crucial that these systems are properly organized.
 Its main organization is mostly about the software components that constitute the system
 These software architectures tell us how the various software components are to be
organized and how they should interact.
 There are many different choices for the actual realization of a distributed system
requires that we instantiate and place software components on real machines
 The final instantiation of a software architecture is also referred to as a system
architecture
 The main goal of distributed system is to separate applications from underlying
platforms by providing a middleware layer that provide distribution transparency.
 Distributed systems are frequently organized in the form of feedback control
loops which form an important architectural element during a system's design 29
 Architectural Styles
Architectural style is formulated
 in terms of components
 the way that components are connected to each other
 the data exchanged between components
 finally how these elements are jointly configured into a system
• A component is a modular unit with well-defined required and provided interfaces
that is replaceable within its environment.
• Connector can be formed by the facilities for (remote) procedure calls, message
passing, or streaming data.
Using components and connectors, we can come to various configurations, which, in tum
have been classified into architectural styles.

30
The most important ones for distributed systems are:
 Layered architectures
 Object-based architectures
 Data-centered architectures
 Event-based architectures

31
 Layered and Object-based Architectures
 components are organized in a layered fashion
 component at layer L; is allowed to call components at the underlying layer Li
 widely adopted by the networking community
 An key observation is that control generally flows from layer to layer
 requests go down the hierarchy whereas the results flow upward
 A far looser organization is followed in object-based architectures, which
are illustrated in Fig
 each object corresponds to what we have defined as a component, and these
components are connected through a (remote) procedure call mechanism
 this software architecture matches the client-server system architecture
 The layered and object based architectures still form the most important styles
for large software systems
32
The (a) layered and (b) object-based architectural style

33
 Data-centered architectures
 Processes communicate through a common (passive or active) repository
 Are as important as the layered and object based architectures
 A wealth of networked applications have been developed that rely on a
shared distributed file system in which virtually all communication takes
place through files
 Likewise, Web-based distributed systems are largely data-centric: processes
communicate through the use of shared Web-based data services

34
 Event-based architectures

 Processes essentially communicate through the propagation of events, which


optionally also carry data.
 Event propagation has generally been associated with what are known as
publish/subscribe systems.
 Processes publish events after which the middleware ensures that only
those processes that subscribed to those events will receive them.
 The main advantage of event-based systems is that processes are loosely
coupled.
 In principle, they need not explicitly refer to each other.
 This is also referred to as being decoupled in space, or referentially decoupled

35
The (a) event-based and (b) shared data-space architectural style
 It can be combined with data-centered architectures, yielding what is also known as
shared data spaces.
 many shared data spaces use a SQL-like interface to the shared repository in that sense
that data can be accessed using a description rather than an explicit reference, as is the
case with files
36
 System Architectures
 Centralized Architectures
 In the basic client-server model, processes in a distributed system are divided into two
(possibly overlapping) groups
 A server is a process implementing a specific service. Eg: a file system service or a
database service.
 A client is a process that requests a service from a server by sending it a request and
subsequently waiting for the server's reply.
 This client-server interaction, also known as request-reply.

37
 Communication between a client and a server can be implemented by means of a
simple connectionless protocol.
 Connectionless protocol is efficient while there is no message lost or corrupted, the
request/reply protocol just sketched works fine.
 The problem, however, is that the client cannot detect whether the original request
message was lost, or that transmission of the reply failed.
 As an alternative, many client-server systems use a reliable connection oriented
protocol.
 Although this solution is not entirely appropriate in a local-area network due to
relatively low performance, it works perfectly tine in wide-area systems in which
communication is inherently unreliable.

38
 Application Layering
 considering that many client-server applications are targeted toward supporting
user access to databases, many people have advocated a distinction between the
following three levels
 The user-interface level: contains all that is necessary to directly interface with the
user, such as display management.
o Clients typically implement the user-interface level
o allow end users to interact with applications
o Many client-server applications can be constructed from roughly three different pieces: a
part that handles interaction with a user, a part that operates on a database or file
system, and a middle part that generally contains the core functionality of an application
 The processing level: typically contains the applications
o middle part is logically placed at the processing level
 The data level: manages the actual data that is being acted on

39
The simplified organization of an Internet search engine into three different
layers
40
Alternative client-server organizations (a)-(e)

41
Cases:
a) only the terminal-dependent part of the user interface on the client machine
B: place the entire user-interface software on the client side
o Divide the application into a graphical front end, which communicates with the rest of the application
(residing at the server) through an application-specific protocol.
O the front end (the client software) does no processing other than necessary for presenting the
application's interface
C: move part of the application to the front end
o e.g. the application makes use of a form that needs to be filled in entirely before it can be processed
o front end can then check the correctness and consistency of the form, and where necessary interact with
the user
D: used where the client machine is a PC or workstation, connected through a network to a distributed
file system or database
o most of the application is running on the client machine, but all operations on files or database entries
go to the server
o e.g. many banking applications run on an end-user's machine where the user prepares transactions and
such.. Once finished, the application contacts the database on the bank's server and uploads the
transactions for further processing
E: used where the client machine is a PC or workstation, connected through a network to a distributed 42

file system or database


An example of a server acting as client

43
 Decentralized Architectures
 Peer-to-peer systems
o support horizontal distribution
o functions that need to be carried out are represented by every process that
constitutes the distributed system
o interaction between processes is symmetric.
• each process will act as a client and a server at the same time (which is also
referred to as acting as a servant)

 Structured Peer-to-Peer Architectures


o the overlay network is constructed using a deterministic procedure.
o most-used procedure is to organize the processes through a distributed hash table
(DHT)
o In a DHT -based system, data items are assigned a random key from a large
44
identifier space, such as a 128-bit or 160-bit identifier
o node id informs its departure to its predecessor and successor, and transfers its data
items to succ(id).

The mapping of data items onto nodes in Chord


45
 Unstructured Peer-to- Peer Architectures
o largely rely on randomized algorithms for constructing an overlay network
o each node maintains a list of neighbors, but that this list is constructed in a more or
less random way
o data items are assumed to be randomly placed on nodes
o when a node needs to locate a specific data item, the only thing it can effectively
do is flood the network with a search query

A two-layered approach for constructing and maintaining specific overlay topologies using techniques from
unstructured peer-to-peer systems 46
 Super-peers
 Super-peers are often also organized in a peer-to-peer network, leading to a
hierarchical organization.
 every regular peer is connected as a client to a super-peer
 All communication from and to a regular peer proceeds through that peer's
associated super-peer

A hierarchical organization of nodes into a super-peer network


47
 Hybrid Architectures
 Edge-Server Systems
o deployed on the Internet where servers are placed "at the edge" of the network
o This edge is formed by the boundary between enterprise networks and the actual
Internet, for example, as provided by an Internet Service Provider (ISP).

Viewing the Internet as consisting of a collection of edge servers 48


o End users, or clients in general, connect to the Internet by means of an edge server
o main purpose is to serve content, possibly after applying filtering and transcoding
functions
o collection of edge servers can be used to optimize content and application distribution
o one edge server acts as an origin server from which all content originates
 Collaborative Distributed Systems
o Hybrid structures are notably deployed in collaborative distributed systems

The principal working of Bit Torrent


o The basic idea is that when an end user is looking for a file, he downloads chunks of the file from
49
other users until the downloaded chunks can be assembled together yielding the complete file.
IF You Have Question ,
Comment, Suggestion You
well come!

50

You might also like