MODULE I
Parallel computer models – Evolution of Computer Architecture, System attributes to
performance, Amdahl's law for a fixed workload. Multiprocessors and Multicomputers,
Multivector and SIMD computers, Architectural development tracks, Conditions of parallelism.
Parallel processing has emerged as a key enabling technology in modern computers, driven by
the ever-increasing demand for higher performance, lower costs, and sustained productivity in
real-life applications.
Concurrent events are taking place in today's high-performance computers due to the common
practice of multiprogramming, multiprocessing, or multicomputing.
The evolution started with the von Neumann architecture, built as a sequential machine executing scalar data. The sequential computer was improved from bit-serial to word-parallel operations, and from fixed-point to floating-point operations. The von Neumann architecture is slow because the instructions in a program are executed sequentially.
Functional parallelism was supported by two approaches: One is to use multiple functional
units simultaneously, and the other is to practice pipelining at various processing levels.
The latter includes pipelined instruction execution, pipelined arithmetic computations, and
memory-access operations. Pipelining has proven especially attractive in performing identical
operations repeatedly over vector data strings.
A vector is a one-dimensional array of numbers. A vector processor is a CPU that implements an instruction set containing instructions that operate on one-dimensional arrays of data called vectors.
Vector operations were originally carried out implicitly by software-controlled looping using
scalar pipeline processors.
Explicit vector instructions were introduced with the appearance of vector processors. A vector processor is equipped with multiple vector pipelines that can be used concurrently under hardware or firmware control.
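To make the scalar-versus-vector distinction concrete, here is a minimal Python sketch (illustrative only, not from the original text): the function mimics software-controlled looping over scalar operands, while the NumPy call stands in for a single explicit vector instruction operating on whole one-dimensional arrays.

import numpy as np

def scalar_add(a, b):
    # Scalar-pipeline style: a software-controlled loop issues one add per element.
    out = [0.0] * len(a)
    for i in range(len(a)):
        out[i] = a[i] + b[i]
    return out

a = np.arange(8, dtype=np.float64)
b = np.ones(8)
print(scalar_add(list(a), list(b)))   # eight separate scalar adds
print(a + b)                          # one explicit "vector" operation on whole arrays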
Another important branch of the architecture tree consists of the SIMD computers for synchronized vector processing. An SIMD computer exploits spatial parallelism rather than temporal parallelism as in a pipelined computer. SIMD computing is achieved through the use of an array of processing elements (PEs) synchronized by the same controller. Associative memory can be used to build SIMD associative processors.
Intrinsic parallel computers are those that execute programs in MIMD mode.
There are two major classes of parallel computers, namely, shared-memory multiprocessors and message-passing multicomputers. The major distinction between multiprocessors and multicomputers lies in memory sharing and the mechanisms used for interprocessor communication.
The processors in a multiprocessor system communicate with each other through shared
variables in a common memory.
Each computer node in a multicomputer system has a local memory, unshared with other nodes. Interprocessor communication is done through message passing among the nodes.
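As a rough illustration of the two communication mechanisms, here is a sketch using Python's standard threading and multiprocessing modules (the variable and function names are invented for this example): threads share a variable in common memory, while processes with private memories exchange messages through queues.

import threading
import multiprocessing as mp

shared = {"total": 0}            # common memory seen by all "processors"
lock = threading.Lock()

def add_shared(n):
    with lock:                   # synchronized update of a shared variable
        shared["total"] += n

def node(q_in, q_out):
    # A multicomputer "node": private memory, communicates only by messages.
    n = q_in.get()
    q_out.put(n * 2)

if __name__ == "__main__":
    threads = [threading.Thread(target=add_shared, args=(i,)) for i in range(4)]
    for t in threads: t.start()
    for t in threads: t.join()
    print("shared-memory result:", shared["total"])    # 0+1+2+3 = 6

    q_in, q_out = mp.Queue(), mp.Queue()
    p = mp.Process(target=node, args=(q_in, q_out))
    p.start()
    q_in.put(21)
    print("message-passing result:", q_out.get())      # 42
    p.join()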
Qn: Describe briefly the operational model of an SIMD computer, with an example.
• Both instructions and data are fetched from the memory modules. Instructions are decoded by the control unit, which sends the decoded instructions to the processor units for execution. Data streams flow between the processors and the memory bidirectionally. Each instruction stream is generated by an independent control unit; in an SIMD computer there is a single instruction stream, so one control unit drives all the processing elements in lockstep.
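As a concrete example for the question above, here is a minimal Python sketch (an illustration, not hardware-accurate): a single control unit "decodes" each instruction once and broadcasts it, and every processing element applies it to its own local datum in lockstep.

pe_data = [1, 2, 3, 4]                    # one local operand per PE

def broadcast(instruction, operand):
    # Control unit: one decoded instruction sent to all PEs simultaneously.
    return [instruction(x, operand) for x in pe_data]

pe_data = broadcast(lambda x, c: x * c, 10)   # all PEs multiply by 10
pe_data = broadcast(lambda x, c: x + c, 5)    # all PEs add 5
print(pe_data)                                # [15, 25, 35, 45]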
According to Flynn’s classification, either of the instruction or data streams can be single or multiple.
Computer architectures can thus be classified into:
single-instruction stream, single-data stream (SISD);
single-instruction stream, multiple-data stream (SIMD);
multiple-instruction stream, single-data stream (MISD); and
multiple-instruction stream, multiple-data stream (MIMD).
• Conventional sequential machines are called SISD -[single instruction stream over single data
stream] computers. Instructions are executed sequentially but may be overlapped in their
execution stages (pipelining).
• MIMD (multiple instruction streams over multiple data streams) – the most popular model; in practice, the term "parallel computers" is reserved for MIMD machines.
• MISD (multiple instruction streams over a single data stream) – the same data stream flows through a linear array of processors executing different instruction streams. This architecture is also known as a systolic array, used for pipelined execution of specific algorithms.
Of the four machine models, most parallel computers built in the past assumed the MIMD model for general-purpose computations. The SIMD and MISD models are more suitable for special-purpose computations. For this reason, MIMD is the most popular model, SIMD next, and MISD the least popular model applied in commercial machines.
• The address space of a processor in a computer system varies among different architectures. It depends on the memory organization, which is machine-dependent. These features are up to the designer and should match the target application domains.
• On the other hand, we want to develop application programs and programming environments
which are machine-independent. Independent of machine architecture, the user programs can be
ported to many computers with minimum conversion costs.
• High-level languages and communication models depend on the architectural choices made in a computer system. From a programmer's viewpoint, these two layers should be architecture-transparent.
• Programming languages such as Fortran, C, C++, Pascal, Ada, Lisp and others can be supported
by most computers. However, the communication models, shared variables versus message
passing, are mostly machine-dependent.
Challenges in Parallel Processing: The major challenge is on the software and application side. It is still difficult to program parallel and vector computers. High-performance computers should provide fast and accurate solutions to scientific, engineering, business, social, and defense problems.
The ideal performance of a computer system requires a perfect match between machine
capability and program behaviour.
Machine capability can be enhanced with better hardware technology; however, program behaviour is difficult to predict due to its dependence on the application and run-time conditions.
Below are the five fundamental factors for projecting the performance of a computer.
The CPU is driven by a clock with a constant cycle time τ. The inverse of the cycle time is the clock rate: f = 1/τ.
The size of a program is determined by its instruction count (Ic). Different instructions in a particular program may require different numbers of clock cycles to execute. Cycles per instruction (CPI) is therefore an important parameter for measuring the time needed to execute an instruction.
Execution Time/CPU Time (T): Let Ic be the instruction count, i.e., the total number of instructions in the program. The execution time or CPU time (T) is then:
T = Ic × CPI × τ
• The execution of an instruction involves the instruction fetch, decode, operand fetch, execution, and store-results phases. Only the instruction decode and execution phases are carried out in the CPU; the remaining three operations may require access to memory.
The CPI of an instruction type can be divided into two component terms corresponding to the total processor cycles and memory cycles needed to complete the execution of the instruction. That is:
CPI = processor cycles + memory cycles, i.e.
CPI = p + m × k
where
p = number of processor cycles,
m = number of memory references, and
k = latency factor (how many processor cycles one memory access costs, i.e., how much slower the memory is relative to the CPU).
Substituting into the expression for T gives
T = Ic × (p + m × k) × τ (Eq. 1.2)
From the above equation, the five factors affecting performance are Ic, p, m, k, and τ.
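The relationship between these factors can be evaluated directly; below is a small Python sketch of Eq. 1.2 (the example numbers are illustrative, not from the text).

def cpu_time(ic, p, m, k, tau):
    # Eq. 1.2: T = Ic * (p + m*k) * tau
    # ic: instruction count; p: processor cycles per instruction;
    # m: memory references per instruction; k: memory latency in CPU cycles;
    # tau: processor cycle time in seconds.
    return ic * (p + m * k) * tau

# 100,000 instructions, CPI = 1 + 0.3*4 = 2.2, on a 40 MHz clock:
print(cpu_time(ic=100_000, p=1, m=0.3, k=4, tau=1 / 40e6))   # 5.5e-03 s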
System Attributes: The above five performance factors (Ic, p, m, k, τ) are influenced by four system attributes: instruction-set architecture, compiler technology, CPU implementation and control, and cache and memory hierarchy, as described below.
The instruction-set architecture affects the program length (Ic) and the processor cycles needed (p). Compiler technology affects the values of Ic, p, and the memory reference count (m). The CPU implementation and control determine the total processor time (p × τ) needed. Finally, the memory technology and hierarchy design affect the memory access latency (k × τ). The above CPU time can be used as a basis for estimating the execution rate of a processor.
MIPS Rate. The processor speed is often measured in terms of million instructions per second
(MIPS). We simply call it the MIPS rate of a given processor.
Let C be the total number of cycles needed to execute a given program (i.e., C = Ic × CPI).
Then the CPU time in Eq. 1.2 can be estimated as
T = C × τ = C / f
and the MIPS rate can be written equivalently as
MIPS rate = Ic / (T × 10^6) = f / (CPI × 10^6) = (f × Ic) / (C × 10^6)
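A quick check of the MIPS formula in Python (the 40 MHz clock and CPI of 1.55 are the figures used in Problem 1 below):

def mips_rate(f_hz, cpi):
    # MIPS = f / (CPI * 10^6)
    return f_hz / (cpi * 1e6)

print(mips_rate(40e6, 1.55))   # ~25.8 MIPS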
MFLOPS:
Most compute-intensive applications in science and engineering make heavy use of floating-point operations. For such applications, a more relevant measure of performance is floating-point operations per second, abbreviated flops, with the prefixes mega (10^6), giga (10^9), tera (10^12), or peta (10^15). Floating-point performance is expressed as millions of floating-point operations per second (MFLOPS), defined as follows, counting only floating-point instructions:
MFLOPS = (number of executed floating-point operations in a program) / (execution time × 10^6)
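For instance, the following sketch evaluates the MFLOPS formula (the operation count and time are made-up example values):

def mflops(flop_count, exec_time_s):
    # MFLOPS = floating-point operations executed / (execution time * 10^6)
    return flop_count / (exec_time_s * 1e6)

print(mflops(200e6, 4.0))   # 50.0 MFLOPS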
Throughput Rate: The number of programs executed per unit time is called the system throughput Ws (in programs per second). In a multiprogrammed system, the system throughput is often lower than the CPU throughput Wp, defined by
Wp = f / (Ic × CPI) (Eq. 1.4)
Since T = Ic × CPI / f, this is simply Wp = 1/T. Equivalently,
Wp = (MIPS × 10^6) / Ic
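A quick sketch of the CPU throughput relation (the example values are illustrative):

def cpu_throughput(f_hz, ic, cpi):
    # Wp = f / (Ic * CPI) = 1/T, in programs per second
    return f_hz / (ic * cpi)

print(cpu_throughput(40e6, 100_000, 1.55))   # ~258 programs per second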
Problems:
1. A benchmark program is executed on a 40 MHz processor. The benchmark program has the following statistics (the instruction-type labels are representative; the counts and cycle figures are those used in the calculation below):

Instruction type     Instruction count   Cycles per instruction
Arithmetic           45,000              1
Data transfer        32,000              2
Floating point       15,000              2
Control transfer     8,000               2

Calculate the average CPI, MIPS rate, and execution time for the above benchmark program.
Given: clock speed of the processor = 40 MHz = 40 × 10^6 Hz.
Average CPI = C / Ic
= total cycles to execute the program / instruction count
= (45000 × 1 + 32000 × 2 + 15000 × 2 + 8000 × 2) / (45000 + 32000 + 15000 + 8000)
= 155000 / 100000
= 1.55
MIPS rate = f / (CPI × 10^6) = (40 × 10^6) / (1.55 × 10^6) ≈ 25.8 MIPS
Execution time T = C / f = 155000 / (40 × 10^6) = 3.875 × 10^-3 s = 3.875 ms
2. Consider the execution of an object code with 2 × 10^6 instructions on a 400 MHz processor. The program consists of four major types of instructions. The instruction mix and the number of cycles (CPI) needed for each instruction type are given below, based on the result of a program trace experiment (the instruction-type labels are representative; the CPI and mix values are those used in the solution):

Instruction type                    CPI   Instruction mix
Arithmetic and logic                1     60%
Load/store with cache hit           2     18%
Branch                              4     12%
Memory reference with cache miss    8     10%
(a) Calculate the average CPI when the program is executed on a uniprocessor with the above
trace results.
(b) Calculate the corresponding MIPS rate based on the CPI obtained in part (a).
Answer:
average CPI = (1 × 0.6) + (2 × 0.18) + (4 × 0.12) + (8 × 0.1) = 2.24.
MIPS rate = f / (CPI × 10^6)
= (400 × 10^6) / (2.24 × 10^6) ≈ 178.6
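The same answer can be checked with a few lines of Python (the (CPI, mix) pairs are those from the trace table above):

mix = [(1, 0.60), (2, 0.18), (4, 0.12), (8, 0.10)]   # (CPI, fraction) pairs
avg_cpi = sum(cpi * frac for cpi, frac in mix)
print(avg_cpi)                      # 2.24
print(400e6 / (avg_cpi * 1e6))      # ~178.6 MIPS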
Programming Environments:
Qn: What is the difference between implicit parallelism and explicit forms of parallelism?
The programmability of a computer depends on the programming environment provided to its users. Conventional uniprocessor computers are programmed in a sequential environment in which instructions are executed one after another. Parallel computers employ a parallel environment in which parallelism is exploited either automatically or under programmer control. Based on the programming environment required, parallelism can be of two types:
- Implicit parallelism: the programmer writes the source code in a conventional sequential language (such as C, Fortran, Lisp, or Pascal), and a parallelizing compiler automatically detects the parallelism and assigns machine resources.
- Explicit parallelism: the programmer specifies the parallelism explicitly in the source code using parallel constructs or dialects, reducing the burden on the compiler to detect it. A sketch contrasting the two styles follows.
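Below is a minimal Python sketch of the contrast (Python's concurrent.futures stands in for an explicit parallel construct; in the implicit case the source stays sequential and any parallelization is left to the compiler):

from concurrent.futures import ProcessPoolExecutor

def f(x):
    return x * x

if __name__ == "__main__":
    data = list(range(8))

    # Implicit style: ordinary sequential source code; detecting and
    # exploiting parallelism is left entirely to a parallelizing compiler.
    seq = [f(x) for x in data]

    # Explicit style: the programmer states the parallelism in the source.
    with ProcessPoolExecutor() as pool:
        par = list(pool.map(f, data))

    print(seq == par)   # True: same results from both environments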
Multiprocessors                                    Multicomputers
1. A single computer with multiple                 1. Multiple autonomous computers.
   processors.
2. The PEs (CPUs/processing elements) do           2. Each PE has its own memory and
   not have their own individual memories;            resources, with no sharing; thus
   memory and I/O resources are shared;               called distributed-memory
   thus called shared-memory                          multicomputers.
   multiprocessors.
3. PEs communicate through shared                  3. PEs communicate through message
   variables in a common memory.                      passing between nodes.
Shared-memory multiprocessors follow one of three memory models:
- UMA (uniform memory access) model
- NUMA (nonuniform memory access) model
- COMA (cache-only memory architecture) model
ADVANTAGES
- Suitable for general-purpose and time-sharing applications by multiple users.
- Can be used to speed up the execution of a single program in time-critical applications.
DISADVANTAGES
- Interacting processes cause simultaneous accesses to the same memory locations, which causes problems when an update is followed by a read operation (the old value may be read).
- Poor scalability: as the number of processors increases, contention for the shared memory grows and the network becomes a bottleneck.
- The number of processors is usually in the range of 10-100.
Advantage (of the NUMA model)
- Reduces the network bottleneck that occurs in UMA, since each processor has a direct access path to its attached local memory.
Application of COMA