Ca Unit 2.2
Ca Unit 2.2
Ca Unit 2.2
Parallel Processing
Pipelining
Arithmetic Pipeline Instruction Pipeline RISC Pipeline Vector Processing Array Processors
Computer Organization
Parallel Processing
PARALLEL PROCESSING
Execution of Concurrent Events in the computing process to achieve faster Computational Speed Levels of Parallel Processing
- Job or Program level - Task or Procedure level
- Inter-Instruction level
- Intra-Instruction level
Computer Organization
Parallel Processing
PARALLEL COMPUTERS
Architectural Classification
Flynn's classification Based on the multiplicity of Instruction Streams and Data Streams Instruction Stream
Sequence of Instructions read from memory
Data Stream
Operations performed on the data in the processor
Single
Number of Single Instruction Streams Multiple
Computer Organization
Multiple
SIMD MIMD
Computer Architectures Lab
SISD MISD
Parallel Processing
MISD SIMD
MIMD Reduction
Message-passing multicomputers
Hypercube Mesh Reconfigurable Computer Organization
Parallel Processing
Instruction stream
Characteristics
- Standard von Neumann machine - Instructions and data are stored in memory - One operation at a time
Limitations
Von Neumann bottleneck Maximum speed of the system is limited by the Memory Bandwidth (bits/sec or bytes/sec) - Limitation on Memory Bandwidth - Memory is shared by CPU and I/O
Computer Organization
Parallel Processing
Multiprogramming Spooling Multifunction processor Pipelining Exploiting instruction-level parallelism - Superscalar - Superpipelining - VLIW (Very Long Instruction Word)
Computer Organization
Parallel Processing
CU
Memory
CU
Data stream
Instruction stream
Characteristics
- There is no computer at present that can be classified as MISD
Computer Organization
Parallel Processing
Control Unit
Instruction stream
Data stream
Processor units
Alignment network
Memory modules
Characteristics
- Only one copy of the program exists - A single controller executes one instruction at a time
Computer Organization
Parallel Processing
Systolic Arrays
- Regular arrangement of a large number of very simple processors constructed on VLSI circuits - CMU Warp, Purdue CHiP
Associative Processors
- Content addressing - Data transformation operations over many sets of arguments with a single instruction - STARAN, PEPE
Computer Organization
10
Parallel Processing
Interconnection Network
Shared Memory
Characteristics
- Multiple processing units
- Execution of multiple instructions on multiple data
11
Parallel Processing
Interconnection Network(IN)
Characteristics
All processors have equally direct access to one large memory address space
Example systems
Bus and cache-based systems - Sequent Balance, Encore Multimax Multistage IN-based systems - Ultracomputer, Butterfly, RP3, HEP Crossbar switch-based systems - C.mmp, Alliant FX/8
Limitations
Memory access latency Hot spot problem
Computer Organization
12
Parallel Processing
MESSAGE-PASSING MULTICOMPUTER
Message-Passing Network Point-to-point connections
Characteristics
- Interconnected computers - Each processor has its own memory, and communicate via message-passing
Example systems
- Tree structure: Teradata, DADO - Mesh-connected: Rediflow, Series 2010, J-Machine - Hypercube: Cosmic Cube, iPSC, NCUBE, FPS T Series, Mark III
Limitations
- Communication overhead - Hard to programming
Computer Organization
13
Pipelining
PIPELINING
A technique of decomposing a sequential process into suboperations, with each subprocess being executed in a partial dedicated segment that operates concurrently with all other segments.
Ai * Bi + Ci
Ai Segment 1 R1 R2 Bi
for i = 1, 2, 3, ... , 7
Memory Ci
Multiplier
Segment 2
R3 R4
Segment 3
Adder
R5
R1 Ai, R2 Bi R3 R1 * R2, R4 Ci R5 R3 + R4
Computer Organization
14
Pipelining
A1 * B1 + C1 A2 * B2 + C2 A3 * B3 + C3 A4 * B4 + C4 A5 * B5 + C5 A6 * B6 + C6 A7 * B7 + C7
Computer Organization
15
Pipelining
GENERAL PIPELINE
General Structure of a 4-Segment Pipeline
Clock
Input
S1
R1
S2
R2
S3
R3
S4
R4
Space-Time Diagram
1 Segment 1 2 T1 2 T2 T1 3 T3 T2 4 T4 T3 5 T5 T4 6 T6 T5 T6 7 8 9 Clock cycles
3
4
T1
T2
T1
T3
T2
T4
T3
T5
T4
T6
T5 T6
Computer Organization
16
Pipelining
PIPELINE SPEEDUP
n: Number of tasks to be performed Conventional Machine (Non-Pipelined) tn: Clock cycle : Time required to complete the n tasks = n * tn Pipelined Machine (k stages) tp: Clock cycle (time to complete each suboperation) : Time required to complete the n tasks = (k + n - 1) * tp
Sk =
tn tp
( = k, if tn = k * tp )
Computer Organization
17
Pipelining
I i+2
I i+3
P1
P 2
P 3
P4
Computer Organization
18
Arithmetic Pipeline
ARITHMETIC PIPELINE
Floating-point adder
X = A x 2a Y = B x 2b [1] [2] [3] [4] Compare the exponents Align the mantissa Add/sub the mantissa Normalize the result
Exponents a b R Mantissas A B R
Segment 1:
Difference
Segment 2:
Choose exponent
Align mantissa R
Segment 3:
Segment 4:
Adjust exponent R
Normalize result R
Computer Organization
19
Arithmetic Pipeline
Stages: S1
Exponent subtractor
Other fraction
r = max(p,q) t = |p - q|
S2 r
Fraction adder c
S3
c Left shifter
r
d S4 Exponent adder
Computer Organization
20
Instruction Pipeline
INSTRUCTION CYCLE
Six Phases* in an Instruction Cycle [1] Fetch an instruction from memory [2] Decode the instruction [3] Calculate the effective address of the operand [4] Fetch the operands from memory [5] Execute the operation [6] Store the result in the proper place * Some instructions skip some phases * Effective address calculation can be done in the part of the decoding phase * Storage of the operation result into a register is done automatically in the execution phase ==> 4-Stage Pipeline [1] FI: Fetch an instruction from memory [2] DA: Decode the instruction and calculate the effective address of the operand [3] FO: Fetch the operand [4] EX: Execute the operation
Computer Organization
21
Instruction Pipeline
INSTRUCTION PIPELINE
Pipelined
i FI i+1 DA FO FI i+2 EX EX
DA FO
FI
DA FO EX
Computer Organization
22
Instruction Pipeline
Segment2:
Update PC
Empty pipe
Step: Instruction 1 2 (Branch) 3 4 5 6 7 1 FI 2 DA FI 3 FO DA FI 4 EX FO DA FI EX FO EX FI DA FI FO DA FI EX FO DA FI EX FO DA EX FO EX 5 6 7 8 9 10 11 12 13
Computer Organization