Parallel Algorithm - Introduction
PARALLEL
ALGORITHMS
CS 1406 - ACOA
• What is an Algorithm?
- An algorithm is a sequence of instructions followed to solve a
problem. While designing an algorithm, we should consider the
architecture of the computer on which it will be executed. Based on
architecture, there are two types of computers −
• Sequential Computer
• Parallel Computer
• Depending on the architecture of computers, we have two types of
algorithms −
• Sequential Algorithm − An algorithm in which consecutive steps of
instructions are executed in chronological order to solve a problem.
• Parallel Algorithm − The problem is divided into sub-problems that are
executed in parallel to produce individual outputs. Later on, these
individual outputs are combined to form the final desired output.
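The divide/compute/combine structure of a parallel algorithm can be sketched in Python. This is a minimal sketch with a thread pool; the function names are ours, and CPython threads illustrate the structure rather than true CPU parallelism.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Sub-problem: sum one slice of the data.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Divide: split the input into roughly equal chunks.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    # Conquer: each worker solves its sub-problem independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Combine: merge the individual outputs into the final result.
    return sum(partials)

print(parallel_sum(list(range(1, 101))))  # 5050
```

The same three-phase shape (divide, solve in parallel, combine) underlies most of the parallel algorithms discussed later.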
To design an algorithm properly, we must have a clear idea of the
basic model of computation in a parallel computer.
• Model of Computation:
SIMD Computers
• Here, a single control unit sends instructions to all processing units. During
computation, at each step, all the processors receive a single set of instructions
from the control unit and operate on different sets of data from the memory unit.
• Each of the processing units has its own local memory unit to store both data and
instructions. In SIMD computers, processors need to communicate among
themselves. This is done through shared memory or an interconnection network.
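The SIMD idea of one broadcast instruction applied in lockstep to many data elements can be sketched as follows (the `broadcast` helper is ours, standing in for the control unit):

```python
# Each "processing unit" holds its own local data element; a single
# control unit broadcasts one instruction per step, and every unit
# applies that same instruction to its own element in lockstep.
local_data = [1, 2, 3, 4]          # one element per processing unit

def broadcast(instruction, data):
    # SIMD step: the same instruction runs on every unit's data.
    return [instruction(x) for x in data]

step1 = broadcast(lambda x: x * 2, local_data)   # all units multiply
step2 = broadcast(lambda x: x + 1, step1)        # all units add
print(step2)  # [3, 5, 7, 9]
```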
MISD Computers
• As the name suggests, MISD computers contain multiple control units,
multiple processing units, and one common memory unit.
• Here, each processor has its own control unit, and they share a common
memory unit. All the processors get instructions individually from their own
control units and operate on a single stream of data as per the instructions
they have received. These processors operate simultaneously.
MIMD Computers
• MIMD computers have multiple control units, multiple processing units, and
a shared memory or interconnection network.
• Here, each processor has its own control unit, local memory unit, and
arithmetic and logic unit. They receive different sets of instructions from
their respective control units and operate on different sets of data.
• An MIMD computer that shares a common memory is
known as a multiprocessor, while one that uses an
interconnection network is known as a multicomputer.
• Based on the physical distance of the processors,
multicomputers are of two types −
– Multicomputer − When all the processors are very
close to one another (e.g., in the same room).
– Distributed system − When the processors are far
away from one another (e.g., in different cities).
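In the MIMD model each processor runs its own instruction stream on its own data. A minimal sketch with threads as "processors" (the shared `results` dict plays the role of the common memory of a multiprocessor):

```python
import threading

results = {}   # shared memory: each "processor" writes its own output

def worker(name, instructions, data):
    # Each MIMD processor runs its OWN instruction stream on its OWN data.
    acc = data
    for op in instructions:
        acc = op(acc)
    results[name] = acc

t1 = threading.Thread(target=worker,
                      args=("p1", [lambda x: x + 10, lambda x: x * 2], 1))
t2 = threading.Thread(target=worker,
                      args=("p2", [lambda x: x ** 2], 5))
t1.start(); t2.start()
t1.join(); t2.join()
print(results["p1"], results["p2"])  # 22 25
```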
Parallel Algorithm - Models
• The model of a parallel algorithm is developed by considering a
strategy for dividing the data and the processing method, and by
applying a suitable strategy to reduce interactions. This
chapter discusses several such parallel algorithm models.
• Tasks may be available at the beginning, or may be generated dynamically. If
tasks are generated dynamically and assigned in a decentralized manner, then a
termination detection algorithm is required so that all the processes can actually
detect the completion of the entire program and stop looking for more tasks.
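A simple termination scheme for dynamically generated tasks can be sketched with a task queue: `queue.Queue.join()` detects the point when every task, including ones generated mid-run, has been processed, after which sentinel values tell the workers to stop. The task shape here (squaring a number and spawning its predecessor) is made up for illustration.

```python
import queue
import threading

tasks = queue.Queue()
processed = []

def worker():
    while True:
        item = tasks.get()
        if item is None:                 # sentinel: termination signal
            tasks.task_done()
            break
        processed.append(item * item)
        if item > 1:                     # dynamically generate a new task
            tasks.put(item - 1)
        tasks.task_done()

for n in (3, 4):                         # initial tasks
    tasks.put(n)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()

tasks.join()                             # all tasks (incl. generated) done
for _ in threads:
    tasks.put(None)                      # broadcast termination to workers
for t in threads:
    t.join()
print(sorted(processed))  # [1, 1, 4, 4, 9, 9, 16]
```

`tasks.join()` only returns once the unfinished-task count reaches zero, which works here because each worker enqueues any new task before calling `task_done()` on the current one.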
• A process includes:
– program counter
– stack
– data section
(Figure: Process in Memory)
(Figure: Process State Diagram)
• Most processes exhibit a typical execution pattern: an alternating
cycle of CPU bursts and I/O bursts (CPU burst → I/O burst → CPU burst → …).
• In an I/O burst:
– The process may be waiting for input from the user, waiting to read the
contents of a file from secondary storage, waiting for input from an input
device, etc.
– It may produce output, continuously or at some time interval, to an
output device, etc.
(Figure: process state transitions: READY and RUNNING, with an interrupt
returning a RUNNING process to READY)
(Figure: resource-allocation graph: R1 is held by P1, which is requesting R2,
while R2 is held by P2, which is requesting R1, forming a circular wait)
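The resource-allocation scenario above (P1 holds R1 and requests R2; P2 holds R2 and requests R1) can be reduced to a wait-for graph, where a cycle means deadlock. A minimal sketch:

```python
# Wait-for graph for the scenario above: P1 waits for P2 (which holds R2),
# and P2 waits for P1 (which holds R1).
wait_for = {"P1": "P2", "P2": "P1"}   # edge: process -> process it waits on

def has_deadlock(graph):
    # Follow wait-for edges from each process; revisiting a node on the
    # same walk means there is a cycle, i.e., a deadlock.
    for start in graph:
        seen, node = set(), start
        while node in graph:
            if node in seen:
                return True
            seen.add(node)
            node = graph[node]
    return False

print(has_deadlock(wait_for))  # True
```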
WAYS TO DEAL WITH DEADLOCK
• DEADLOCK PREVENTION
• DEADLOCK DETECTION
• DEADLOCK CORRECTION
DEADLOCK PREVENTION
• Deadlock prevention works by preventing one of the four
Coffman conditions (mutual exclusion, hold and wait, no preemption,
circular wait) from occurring.
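One common way to prevent the circular-wait condition is to impose a global ordering on resources and require every process to acquire locks in that order. A minimal sketch (resource and function names are illustrative):

```python
import threading

lock_r1 = threading.Lock()
lock_r2 = threading.Lock()
# Global order: r1 before r2. Enforcing this for every thread breaks
# the "circular wait" Coffman condition, so no deadlock cycle can form.
ORDER = {id(lock_r1): 1, id(lock_r2): 2}

def acquire_in_order(*locks):
    # Always acquire in the global order, regardless of the order requested.
    for lock in sorted(locks, key=lambda l: ORDER[id(l)]):
        lock.acquire()

def release_all(*locks):
    for lock in locks:
        lock.release()

log = []

def p1():
    acquire_in_order(lock_r1, lock_r2)   # wants r1 then r2
    log.append("p1")
    release_all(lock_r1, lock_r2)

def p2():
    acquire_in_order(lock_r2, lock_r1)   # wants r2 then r1: order enforced
    log.append("p2")
    release_all(lock_r2, lock_r1)

t1, t2 = threading.Thread(target=p1), threading.Thread(target=p2)
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(log))  # ['p1', 'p2']
```

Without `acquire_in_order`, p1 and p2 requesting the two locks in opposite orders could deadlock exactly like the P1/P2/R1/R2 cycle shown earlier.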
Cache Coherence Protocol
• SOFTWARE BASED
– Software approaches generally make conservative decisions, leading to inefficient cache utilization.
– Compiler-based cache coherence mechanisms perform an analysis of the code to determine which
data items may become unsafe for caching, and they mark those items accordingly. The operating
system or hardware then prevents the noncacheable items from being cached.
– The simplest approach is to prevent any shared data variables from being cached.
– This is too conservative, because a shared data structure may be exclusively used during some periods
and may be effectively read-only during other periods.
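The conservative compiler-style analysis described above can be sketched as a toy rule: any variable written by more than one processor is marked noncacheable. The variable names and write sets here are hypothetical.

```python
# Hypothetical per-variable write sets gathered by a compiler analysis:
# which processors ever write each variable.
writes = {
    "x":    {"p0"},         # written only by p0 -> safe to cache
    "flag": {"p0", "p1"},   # written by several processors -> unsafe
}

def cacheable(var):
    # Conservative rule: a variable is cacheable only if at most one
    # processor ever writes it.
    return len(writes[var]) <= 1

print({v: cacheable(v) for v in writes})  # {'x': True, 'flag': False}
```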
Cache Coherence Protocol
• HARDWARE BASED
– Hardware solutions provide dynamic recognition at run time of potential inconsistency
conditions.
– The problem is only dealt with when it actually arises, leading to improved performance
over a software approach.
– Hardware schemes can be divided into two categories: i) Directory protocols; ii) Snoopy
protocols.
• Directory Protocols:
– Directory protocols collect and maintain information about where copies of lines reside.
– There is a centralized controller that is part of the main memory controller, and a directory
that is stored in main memory.
– The directory contains global state information about the contents of the various local
caches.
– When an individual cache controller makes a request, the centralized controller checks
and issues the necessary commands for data transfer between memory and caches or
between caches themselves.
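The bookkeeping a directory performs can be sketched as a mapping from each memory line to the set of caches holding a copy; a write makes the centralized controller invalidate every other copy. This is an assumed minimal structure, not a full protocol.

```python
# Directory sketch: line -> set of cache ids currently holding a copy.
directory = {}

def read(line, cache_id):
    # A read miss: record that this cache now holds a copy of the line.
    directory.setdefault(line, set()).add(cache_id)

def write(line, cache_id):
    # On a write, the controller invalidates every OTHER cached copy
    # and records the writer as the sole holder of the line.
    holders = directory.setdefault(line, set())
    invalidated = holders - {cache_id}   # invalidate commands to send
    directory[line] = {cache_id}
    return invalidated

read("A", 0); read("A", 1); read("A", 2)   # line A shared by three caches
print(sorted(write("A", 0)))  # [1, 2]  (caches told to invalidate)
print(directory["A"])         # {0}     (writer holds the only copy)
```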
Cache Coherence Protocol
• Snoopy Protocols:
– Snoopy protocols distribute the responsibility for maintaining cache coherence among all of the cache
controllers in a multiprocessor system.
– A cache must recognize when a line that it holds is shared with other caches.
– When an update action is performed on a shared cache line, it must be announced to all other caches
by a broadcast mechanism.
– Each cache controller is able to “snoop” on the network to observe these broadcast
notifications and react accordingly.
– The two basic approaches to snoopy protocols are write-invalidate and write-update (write-broadcast).
Cache Coherence Protocol
(Snoopy Protocols)
Write-invalidate Protocol:
– There can be multiple readers but only one writer at a time.
– Initially, a line may be shared among several caches for reading purposes.
When one of the caches wants to perform a write to the line, it first issues a
notice that invalidates that line in the other caches, making the line
exclusive to the writing cache.
– Once the line is exclusive, the owning processor can make local writes until
some other processor requires the same line.
Write-update (write-broadcast) Protocol:
– There can be multiple writers as well as multiple readers.
– When a processor wishes to update a shared line, the word to be updated is
distributed to all others, and caches containing that line can update it.
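The write-invalidate behavior can be sketched with caches snooping a shared bus: on a write, the writer broadcasts an invalidate and the other holders drop their copies. Single-line caches are assumed here for brevity.

```python
# Write-invalidate snooping sketch: every cache watches a shared "bus".
class Cache:
    def __init__(self):
        self.line = None          # locally cached value, or None

    def load(self, value):
        self.line = value         # read the line into this cache

    def snoop_invalidate(self):
        self.line = None          # observed a broadcast: drop our copy

def bus_write(writer, caches, value):
    # The writer broadcasts an invalidate notice before writing,
    # making the line exclusive to the writing cache.
    for c in caches:
        if c is not writer:
            c.snoop_invalidate()
    writer.line = value

c0, c1, c2 = Cache(), Cache(), Cache()
for c in (c0, c1, c2):
    c.load(7)                     # line initially shared by all readers
bus_write(c0, [c0, c1, c2], 8)
print(c0.line, c1.line, c2.line)  # 8 None None
```

Under write-update, `snoop_invalidate` would instead be a `snoop_update(value)` that keeps all copies and overwrites them with the broadcast word.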
Task Scheduling Approaches in
Multiprocessor System
• Stochastic- concerns scheduling problems involving random attributes,
such as random processing times, random due dates, random weights,
and stochastic machine breakdowns
• Deterministic- all the parameters of the system and of the tasks are
known in advance
• Probable- all the parameters of the system and of the tasks are presumed
(estimated rather than known exactly)
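The deterministic case, where all processing times are known in advance, can be illustrated with a classic list-scheduling heuristic (Longest Processing Time first); the task times below are made up for the example.

```python
import heapq

def lpt_schedule(times, machines):
    # Deterministic scheduling: all processing times known in advance.
    # LPT: take tasks longest-first and give each to the machine that
    # currently has the smallest total load.
    loads = [(0, m) for m in range(machines)]   # (load, machine id)
    heapq.heapify(loads)
    assignment = {m: [] for m in range(machines)}
    for t in sorted(times, reverse=True):
        load, m = heapq.heappop(loads)          # least-loaded machine
        assignment[m].append(t)
        heapq.heappush(loads, (load + t, m))
    makespan = max(sum(ts) for ts in assignment.values())
    return assignment, makespan

tasks = [7, 5, 4, 3, 3, 2]
plan, makespan = lpt_schedule(tasks, 2)
print(makespan)  # 12
```

With 24 total units of work over 2 machines, the makespan of 12 here happens to be optimal; in general LPT is only a heuristic with a bounded worst-case ratio.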