Lect27 Parallel Processing


Pipeline and Vector Processing
 Parallel Processing
 Simultaneous data processing tasks for the purpose of
increasing the computational speed
 Perform concurrent data processing to achieve faster execution
time
 Multiple Functional Unit :
 Separate the execution unit into eight functional units operating in parallel
» Adder-subtractor
» Integer multiply
» Logic unit
» Shift unit
» Incrementer
» Floating-point add-subtract
» Floating-point multiply
» Floating-point divide
 The processor registers supply operands to the functional units and connect to memory
 Pipelining : the process of decomposing a sequential process into
suboperations, with each suboperation executed in a special dedicated
segment that operates concurrently with all other segments.
 A pipeline is a collection of processing segments through which binary
information flows. Each segment performs partial processing dictated by the
way the task is partitioned.
 Pipelining
 Multiply-and-add example : Fig. 9-2
» Operation : Ai * Bi + Ci ( for i = 1, 2, …, 7 )
» Separated into three suboperation segments :
1) R1 ← Ai, R2 ← Bi : Input Ai and Bi
2) R3 ← R1 * R2, R4 ← Ci : Multiply and input Ci
3) R5 ← R3 + R4 : Add Ci to the product
 Content of registers in pipeline example : Tab. 9-1
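Fig. 9-2 : Ai and Bi are loaded into R1 and R2, which feed the Multiplier. The product is loaded into R3 while Ci is loaded into R4. R3 and R4 feed the Adder, whose output is loaded into R5.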
Clock pulse   Segment 1    Segment 2        Segment 3
number        R1    R2     R3       R4      R5
1             A1    B1     -        -       -
2             A2    B2     A1*B1    C1      -
3             A3    B3     A2*B2    C2      A1*B1+C1
4             A4    B4     A3*B3    C3      A2*B2+C2
5             A5    B5     A4*B4    C4      A3*B3+C3
6             A6    B6     A5*B5    C5      A4*B4+C4
7             A7    B7     A6*B6    C6      A5*B5+C5
8             -     -      A7*B7    C7      A6*B6+C6
9             -     -      -        -       A7*B7+C7
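The register contents in the table can be reproduced with a short simulation. The following is a minimal sketch, not part of the original slides: the function name `simulate_pipeline` and the use of `None` for an empty register are assumptions made for illustration. It relies on the fact that all registers load on the same clock pulse, so each new value is computed from the old register contents.

```python
def simulate_pipeline(a, b, c):
    """Simulate the three-segment pipeline computing a[i] * b[i] + c[i].

    Returns one row (clock, R1, R2, R3, R4, R5) per clock pulse.
    None marks an empty register (hypothetical convention for this sketch).
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    rows = []
    for clock in range(1, n + 3):  # n tasks, 3 segments: n + 2 clock pulses
        # Compute every new value from the OLD register contents,
        # because all registers are clocked simultaneously.
        new_r5 = r3 + r4 if r3 is not None else None            # segment 3: add
        new_r3 = r1 * r2 if r1 is not None else None            # segment 2: multiply
        new_r4 = c[clock - 2] if 2 <= clock <= n + 1 else None  # segment 2: input Ci
        new_r1 = a[clock - 1] if clock <= n else None           # segment 1: input Ai
        new_r2 = b[clock - 1] if clock <= n else None           # segment 1: input Bi
        r1, r2, r3, r4, r5 = new_r1, new_r2, new_r3, new_r4, new_r5
        rows.append((clock, r1, r2, r3, r4, r5))
    return rows


if __name__ == "__main__":
    for row in simulate_pipeline([1, 2, 3, 4, 5, 6, 7], [2] * 7, [10] * 7):
        print(row)
```

With seven input triples the simulation runs for nine clock pulses, and the first result appears in R5 on the third pulse.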
 General considerations
 4-segment pipeline : the operands pass through
all four segments in a fixed sequence. Each segment consists
of a combinational circuit Si that performs a suboperation on
the data stream. The segments are separated by
registers Ri that hold the intermediate results.
Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4 ( a common Clock drives all registers )

Fig.: Four-segment pipeline


Space-time diagram :
» Shows segment utilization as a function of time
» Task : the total operation performed going through all the segments
» Tasks T1, T2, T3, …, T6 are executed in four segments
» Processing time in the pipeline = 9 clock cycles

Clock cycle :   1   2   3   4   5   6   7   8   9
Segment 1 :     T1  T2  T3  T4  T5  T6
Segment 2 :         T1  T2  T3  T4  T5  T6
Segment 3 :             T1  T2  T3  T4  T5  T6
Segment 4 :                 T1  T2  T3  T4  T5  T6
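The space-time diagram follows a simple rule: task Tj occupies segment s during clock cycle j + s - 1. The sketch below builds the grid from that rule. The function name `space_time` and the empty-string marker for an idle segment are assumptions for illustration only.

```python
def space_time(k, n):
    """Return a k-row grid; grid[s-1][c-1] is the task in segment s at cycle c.

    An empty string marks an idle segment (hypothetical convention).
    """
    total_cycles = k + n - 1
    grid = []
    for segment in range(1, k + 1):
        row = []
        for cycle in range(1, total_cycles + 1):
            task = cycle - segment + 1  # task j is in segment s at cycle j + s - 1
            row.append(f"T{task}" if 1 <= task <= n else "")
        grid.append(row)
    return grid


if __name__ == "__main__":
    for number, row in enumerate(space_time(4, 6), start=1):
        print(f"Segment {number}: " + " ".join(f"{cell:3}" for cell in row))
```

For four segments and six tasks the grid has 4 + 6 - 1 = 9 columns, matching the nine clock cycles stated above.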
 Speedup S : nonpipeline time / pipeline time
 S = n • tn / ( ( k + n - 1 ) • tp ) = 6 • 6 tp / ( ( 4 + 6 - 1 ) • tp ) = 36 tp / 9 tp = 4
» n : number of tasks ( 6 )
» tn : time to complete each task in the nonpipeline circuit ( 6 cycle times = 6 tp )
» tp : clock cycle time ( 1 clock cycle )
» k : number of segments ( 4 )
 As n → ∞, k + n - 1 approaches n, so S = tn / tp
 If we assume that the time it takes to process a task is the same in the
pipeline and nonpipeline circuits, then
tn ( nonpipeline ) = k • tp ( pipeline )

S = tn / tp = k • tp / tp = k
where k is the number of segments. The maximum theoretical speedup therefore equals the number of segments.
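The speedup formula can be checked numerically. This is a minimal sketch; the function name `speedup` and its parameter order are chosen here for illustration and do not come from the slides.

```python
def speedup(n, k, tn, tp):
    """Speedup of a k-segment pipeline over a nonpipeline unit.

    n  : number of tasks
    k  : number of segments
    tn : time per task in the nonpipeline circuit
    tp : pipeline clock cycle time (same unit as tn)
    """
    nonpipeline_time = n * tn
    pipeline_time = (k + n - 1) * tp
    return nonpipeline_time / pipeline_time


if __name__ == "__main__":
    # Values from the slide: n = 6, k = 4, tn = 6 tp, tp = 1 clock cycle.
    print(speedup(n=6, k=4, tn=6, tp=1))
    # With tn = k * tp, the speedup approaches k as n grows.
    for n in (10, 100, 10_000):
        print(n, speedup(n=n, k=4, tn=4, tp=1))
```

The second loop illustrates the limiting case: with tn = k • tp the result stays below k and gets closer to it as n increases.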

 Arithmetic Pipeline
 Floating-point Adder Pipeline Example :
 Add / subtract two normalized floating-point binary numbers
» X = A x 2^a
» Y = B x 2^b
 Decimal numbers are used in the example for simplicity : X = 0.9504 x 10^3, Y = 0.8200 x 10^2
 4 segment suboperations
» 1) Compare exponents by subtraction : 3 - 2 = 1
 X = 0.9504 x 10^3
 Y = 0.8200 x 10^2
» 2) Align mantissas
 X = 0.9504 x 10^3
 Y = 0.08200 x 10^3
» 3) Add mantissas
 Z = 1.0324 x 10^3
» 4) Normalize result
 Z = 0.10324 x 10^4
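The four suboperations can be traced in code. The sketch below works in decimal, like the slide's numeric example, and uses Python's `decimal` module to keep the mantissas exact. The function name `fp_add` and the normalization range 0.1 ≤ |mantissa| < 1 are assumptions for this illustration; a hardware adder would operate on binary fractions.

```python
from decimal import Decimal


def fp_add(a_mant, a_exp, b_mant, b_exp):
    """Add two decimal floating-point numbers mant x 10^exp in four steps."""
    # Segment 1: compare exponents by subtraction.
    diff = a_exp - b_exp

    # Segment 2: choose the larger exponent and align the other mantissa
    # by shifting it right |diff| decimal places.
    if diff >= 0:
        exp = a_exp
        b_mant = b_mant.scaleb(-diff)
    else:
        exp = b_exp
        a_mant = a_mant.scaleb(diff)

    # Segment 3: add the mantissas.
    z = a_mant + b_mant

    # Segment 4: normalize so that 0.1 <= |z| < 1, adjusting the exponent.
    while abs(z) >= 1:
        z = z.scaleb(-1)
        exp += 1
    while z != 0 and abs(z) < Decimal("0.1"):
        z = z.scaleb(1)
        exp -= 1
    return z, exp


if __name__ == "__main__":
    mantissa, exponent = fp_add(Decimal("0.9504"), 3, Decimal("0.8200"), 2)
    print(mantissa, exponent)
```

For the slide's operands, the aligned mantissas sum to 1.0324, which overflows the normalized range, so segment 4 shifts once to the right and raises the exponent from 3 to 4.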
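Fig.: Pipeline for floating-point addition and subtraction
Inputs : exponents a, b and mantissas A, B, held in registers R
Segment 1 : Compare exponents by subtraction ( produces the difference )
Segment 2 : Choose exponent ; align mantissas
Segment 3 : Add or subtract mantissas
Segment 4 : Adjust exponent ; normalize result
Registers R hold the intermediate results between segments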
Instruction Pipeline
 Instruction Cycle
1) Fetch the instruction from memory
2) Decode the instruction
3) Calculate the effective address
4) Fetch the operands from memory
5) Execute the instruction
6) Store the result in the proper place
Fig.: Four-segment CPU pipeline ( flowchart )
Segment 1 : Fetch instruction from memory
Segment 2 : Decode instruction and calculate the effective address
» Branch ? If yes, update PC and empty the pipe
Segment 3 : Fetch operand from memory
Segment 4 : Execute instruction
» Interrupt ? If yes, perform interrupt handling, then update PC and empty the pipe
 Example : Four-segment Instruction Pipeline
 Four-segment CPU pipeline :
» 1) FI : Instruction fetch
» 2) DA : Decode instruction and calculate the effective address ( EA )
» 3) FO : Operand fetch
» 4) EX : Execution
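 Timing of Instruction Pipeline :
Step :            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction 1 :   FI  DA  FO  EX
Instruction 2 :       FI  DA  FO  EX
Instruction 3 :           FI  DA  FO  EX
Instruction 4 :               FI  -   -   FI  DA  FO  EX
Instruction 5 :                   -   -   -   FI  DA  FO  EX
Instruction 6 :                                   FI  DA  FO  EX
Instruction 7 :                                       FI  DA  FO  EX

» Instruction 3 is a branch. Once it is decoded in step 4, further fetching is held until it executes in step 6.
» No branch : the instruction fetched in step 4 can be used.
» Branch : a new instruction is fetched in step 7.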
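The timing of the four-segment instruction pipeline can be sketched as a small scheduler. This is an illustrative model under stated assumptions, not the slides' own method: it assumes one instruction is fetched per step, that fetching pauses once a branch is recognized, and that fetching resumes in the step after the branch's EX segment. The function name `schedule` and the dictionary layout are hypothetical.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def schedule(num_instructions, branch=None):
    """Return {instruction: {step: stage}} for a four-segment pipeline.

    branch : number of the branch instruction, or None for no branch.
    Assumption: instructions after the branch are (re)fetched starting in
    the step after the branch's EX segment.
    """
    timing = {}
    for i in range(1, num_instructions + 1):
        steps = {}
        if branch is not None and i > branch:
            branch_ex_step = branch + len(STAGES) - 1  # branch is fetched in step `branch`
            if i == branch + 1:
                steps[i] = "FI"  # fetch made before the branch outcome is known
            start = branch_ex_step + 1 + (i - branch - 1)
        else:
            start = i
        for offset, stage in enumerate(STAGES):
            steps[start + offset] = stage
        timing[i] = steps
    return timing


if __name__ == "__main__":
    for instr, steps in schedule(7, branch=3).items():
        print(instr, sorted(steps.items()))
```

Under these assumptions, seven instructions with a branch at instruction 3 finish in step 13, against step 10 when there is no branch.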
 Pipeline Conflicts : 3 major difficulties
 1) Resource conflicts
» Caused by two segments accessing memory at the same time.
» Can be avoided by using separate instruction and data memories.
 2) Data dependency
» Arises when an instruction depends on the result of a previous
instruction, but this result is not yet available.
 3) Branch difficulties
» Arise from branch and other instructions ( interrupt, ret, .. ) that change the
value of PC.
 Methods for resolving data dependency
 Hardware methods
» Hardware Interlock
 Hardware forcibly inserts delays until the result of the previous instruction is available
» Operand Forwarding
 The result of the previous instruction is passed directly to the ALU ( normally it would go through a register )
 Software method
» Delayed Load
 No-operation instructions are inserted until the result of the previous instruction is available
Assignment
 What is meant by pipelining and parallel processing?
 Explain vector processing.
