Lecture 2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 19

Parallel & Distributed Computing

Lecture # 02
Flynn’s Taxonomy and Parallelism
Raja Shamayel

Department of Computer Science


NUML University Islamabad
von Neumann Architecture
• Authored the general
requirements for an electronic
computer in his 1945 papers
• Also known as "stored-program
computer”
• Both program instructions and
data are kept in electronic
memory
• Since then, virtually all
computers have followed his
design
John von Neumann
Von Neumann Architecture
Four main Components
• Read/write, random access memory is
used to store both program instructions
and data
• Program instructions are coded data which tell
the computer to do something
• Data is simply information to be used by the
program
• Control unit fetches instructions/data
from memory, decodes the instructions
and then sequentially coordinates
operations to accomplish the
programmed task.
• Arithmetic Unit performs basic arithmetic
operations parallel computers still follow this basic design
• Input/Output is the interface to the Just multiplied in units
human operator
Flynn's Classical Taxonomy
• Different ways to classify parallel
computers
• One of the more widely used
classifications, in use since 1966,
is called Flynn's Taxonomy
• Distinguishes multi-processor
computer architectures
according to two dimensions
• Instruction Stream
• Data Stream
SISD (Scalar Processing)
• A serial (non-parallel) computer
• Single Instruction: Only one
instruction stream is being acted on
by the CPU during any one clock cycle
• Single Data: Only one data stream is
being used as input during any one
clock cycle
• Deterministic execution
• This is the oldest type of computer
• Examples: older generation
mainframes, minicomputers,
workstations and single
processor/core PCs.
SIMD (Vector Processing)
• Single Instruction: All processing units
execute the same instruction at any given
clock cycle
• Multiple Data: Each processing unit can
operate on a different data element
• Best suited for specialized problems such as
graphics/image processing.
• Synchronous and deterministic execution
• Processor Arrays Vector Pipelines
• Examples:
• Processor Arrays: Thinking Machines CM-2
• Vector Pipelines: IBM 9000
• Most modern computers with GPU units
Scalar vs Vector
(SIMD)
Multiple Instruction, Single Data (MISD)
• Multiple Instruction: Each processing
unit operates on the data
independently via separate instruction
streams.
• Single Data: A single data stream is fed
into multiple processing units.
• Few (if any) actual examples of this
class of parallel computer have ever
existed.
• Some conceivable uses might be:
• multiple frequency filters operating on a
single signal stream
• multiple cryptography algorithms
attempting to crack a single coded
message.
Multiple Instruction, Multiple Data (MIMD)
• Multiple Instruction: Every processor may
be executing a different instruction
stream
• Multiple Data: Every processor may be
working with a different data stream
• Execution can be synchronous or
asynchronous, deterministic or
non-deterministic
• Currently, the most common type of
parallel computer - most modern
supercomputers fall into this category
• Examples: most current supercomputers,
networked parallel computer clusters,
grids, multi-core PCs
MIMD Classification
• Shared Memory
• Distributed
Memory
Shared vs Distributed Memory
Some General Parallel Terminology
• Supercomputing / High Performance Computing (HPC)
• Using the world's fastest and largest computers to solve large problems
• Node
• A standalone "computer in a box". Usually comprised of multiple
CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked
together to comprise a supercomputer.
• CPU / Socket / Processor / Core
• In the past, a CPU (Central Processing Unit) was a singular execution component
• Then, multiple CPUs were incorporated into a node
• Then, individual CPUs were subdivided into multiple "cores", each being a unique
execution unit
• CPUs with multiple cores are sometimes called "sockets”
• The result is a node with multiple CPUs, each containing multiple cores
Some General Parallel Terminology
• Task
• A logically discrete section of computational work. A task is typically a program or
program-like set of instructions that is executed by a processor. A parallel program
consists of multiple tasks running on multiple processors.
• Pipelining
• Breaking a task into steps performed by different processor units, with inputs
streaming through, much like an assembly line; a type of parallel computing.
• Shared Memory
• From a strictly hardware point of view, describes a computer architecture where all
processors have direct (usually bus based) access to common physical memory. In a
programming sense, it describes a model where parallel tasks all have the same
"picture" of memory and can directly address and access the same logical memory
locations regardless of where the physical memory actually exists.
Some General Parallel Terminology
• Symmetric Multi-Processor (SMP)
• Shared memory hardware architecture where multiple processors share a single
address space and have equal access to all resources.
• Distributed Memory
• In hardware, refers to network based memory access for physical memory that is
not common. As a programming model, tasks can only logically "see" local machine
memory and must use communications to access memory on other machines
where other tasks are executing.
• Communications
• Parallel tasks typically need to exchange data. There are several ways this can be
accomplished, such as through a shared memory bus or over a network, however
the actual event of data exchange is commonly referred to as communications
regardless of the method employed
Some General Parallel Terminology
• Synchronization
• The coordination of parallel tasks in real time, very often associated with communications.
• Often implemented by establishing a synchronization point within an application where a task may not proceed further until another
task(s) reaches the same or logically equivalent point.
• Synchronization usually involves waiting by at least one task and can therefore cause a parallel application's wall clock execution
time to increase.
• Granularity
• granularity is a qualitative measure of the ratio of computation to communication
• Coarse: relatively large amounts of computational work are done between communication events
• Fine: relatively small amounts of computational work are done between communication events
• Observed Speedup
• Observed speedup of a code which has been parallelized, defined as:
wall-clock time of serial execution
-----------------------------------
wall-clock time of parallel execution
• One of the simplest and most widely used indicators for a parallel program's performance.
Some General Parallel Terminology
• Parallel Overhead
• The amount of time required to coordinate parallel tasks, as opposed to doing
useful work. Parallel overhead can include factors such Task start-up time
• Synchronizations
• Data communications
• Software overhead imposed by parallel languages, libraries, operating system, etc.
• Task termination time
• Massively Parallel
• Refers to the hardware that comprises a given parallel system - having many
processing elements. The meaning of "many" keeps increasing, but currently, the
largest parallel computers are comprised of processing elements numbering in the
hundreds of thousands
• Embarrassingly Parallel
• Solving many similar, but independent tasks simultaneously; little to no need for
coordination between the tasks
Some General Parallel Terminology
• Scalability
• Refers to a parallel system's (hardware and/or software) ability to
demonstrate a proportionate increase in parallel speedup with the addition
of more resources. Factors that contribute to scalability include:
• Hardware - particularly memory-CPU bandwidths and network communication
properties
• Application algorithm
• Parallel overhead related
• Characteristics of your specific application
That’s all for today!!

You might also like