Lect27 Parallel Processing

Pipeline and Vector Processing
 Parallel Processing
 Simultaneous data processing tasks for the purpose of increasing the computational speed
 Perform concurrent data processing to achieve faster execution time
 Multiple Functional Unit :
 Separate the execution unit into eight functional units operating in parallel
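Fig.: Processor with multiple functional units. The processor registers, which also connect to memory, feed the eight units:
» Adder-subtractor
» Integer multiply
» Logic unit
» Shift unit
» Incrementer
» Floating-point add-subtract
» Floating-point multiply
» Floating-point divide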
 Pipelining : the process of decomposing a sequential process into suboperations, with each suboperation executed in a special dedicated segment that operates concurrently with all other segments.
 A pipeline is a collection of processing segments through which binary information flows. Each segment performs partial processing dictated by the way the task is partitioned.
 Pipelining
 Multiply and add example : Fig. 9-2
» Operation : Ai * Bi + Ci ( for i = 1, 2, …, 7 )
» Separated into 3 suboperation segments
1) R1 ← Ai, R2 ← Bi : Input Ai and Bi
2) R3 ← R1 * R2, R4 ← Ci : Multiply and input Ci
3) R5 ← R3 + R4 : Add Ci
 Content of registers in pipeline example : Tab. 9-1
Fig. 9-2 : Ai → R1 and Bi → R2 ; R1, R2 → Multiplier → R3 and Ci → R4 ; R3, R4 → Adder → R5
Clock pulse   Segment 1     Segment 2       Segment 3
Number        R1    R2      R3      R4      R5
1             A1    B1      -       -       -
2             A2    B2      A1*B1   C1      -
3             A3    B3      A2*B2   C2      A1*B1+C1
4             A4    B4      A3*B3   C3      A2*B2+C2
5             A5    B5      A4*B4   C4      A3*B3+C3
6             A6    B6      A5*B5   C5      A4*B4+C4
7             A7    B7      A6*B6   C6      A5*B5+C5
8             -     -       A7*B7   C7      A6*B6+C6
9             -     -       -       -       A7*B7+C7
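
The register contents of Tab. 9-1 can be reproduced with a small simulation. This is only a sketch: the function name simulate_multiply_add and the use of None for an empty register are assumptions made for illustration, not part of the lecture.

```python
def simulate_multiply_add(a, b, c):
    """Clock a three-segment pipeline computing a[i] * b[i] + c[i].

    Returns (rows, results). rows holds (R1, R2, R3, R4, R5) after each
    clock pulse; None marks a register that holds no valid data.
    """
    n = len(a)
    r1 = r2 = r3 = r4 = r5 = None
    rows, results = [], []
    for pulse in range(n + 2):  # k + n - 1 pulses with k = 3 segments
        # All registers load on the same clock edge, so each new value is
        # computed from the old contents, working from the last segment back.
        r5 = r3 + r4 if r3 is not None else None        # segment 3: add
        r3 = r1 * r2 if r1 is not None else None        # segment 2: multiply
        r4 = c[pulse - 1] if 1 <= pulse <= n else None  # segment 2: input Ci
        r1 = a[pulse] if pulse < n else None            # segment 1: input Ai
        r2 = b[pulse] if pulse < n else None            # segment 1: input Bi
        rows.append((r1, r2, r3, r4, r5))
        if r5 is not None:
            results.append(r5)
    return rows, results


if __name__ == "__main__":
    rows, results = simulate_multiply_add([1, 2, 3, 4, 5, 6, 7], [2] * 7, [10] * 7)
    for number, row in enumerate(rows, start=1):
        print(number, row)
```

With seven operand triples the loop runs for nine clock pulses, and the first result appears in R5 on the third pulse, as in the table.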
 General considerations
 4 segment pipeline : the operands pass through all four segments in a fixed sequence. Each segment consists of a combinational circuit Si that performs a suboperation over the data stream. The segments are separated by registers Ri that hold the intermediate results.
Input → S1 → R1 → S2 → R2 → S3 → R3 → S4 → R4 ( Clock applied to all registers )
Fig.: Four Segment pipeline

Space-time diagram :
» Shows segment utilization as a function of time
» Tasks : T1, T2, T3, …, T6 executed in four segments
» Processing time for all tasks to pass through all the segments of the pipeline = 9 clock cycles
Clock cycles :  1   2   3   4   5   6   7   8   9
Segment :   1   T1  T2  T3  T4  T5  T6
            2       T1  T2  T3  T4  T5  T6
            3           T1  T2  T3  T4  T5  T6
            4               T1  T2  T3  T4  T5  T6
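
A space-time diagram like the one above can be generated for any number of segments k and tasks n. The helper below is an illustrative sketch; the name space_time and the grid representation are assumptions, not from the lecture.

```python
def space_time(k, n):
    """Return a k x (k + n - 1) grid of task names.

    grid[s][c] names the task occupying segment s + 1 during clock
    cycle c + 1, or is '' when that segment is idle.
    """
    cycles = k + n - 1
    grid = [["" for _ in range(cycles)] for _ in range(k)]
    for task in range(n):
        for seg in range(k):
            # Task i enters segment 1 in cycle i and advances one
            # segment per clock cycle.
            grid[seg][task + seg] = f"T{task + 1}"
    return grid


if __name__ == "__main__":
    for seg, row in enumerate(space_time(4, 6), start=1):
        print(seg, " ".join(f"{cell:>2}" for cell in row))
```

The grid width is k + n - 1, which is where the 9 clock cycles for k = 4 and n = 6 come from.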
 Speedup S : Nonpipeline / Pipeline
 S = n • tn / ( k + n - 1 ) • tp = 6 • 6 tp / ( 4 + 6 - 1 ) • tp = 36 tp / 9 tp = 4
» n : task number ( 6 )
» tn : time to complete each task in nonpipeline ( 6 cycle times = 6 tp )
» tp : clock cycle time ( 1 clock cycle )
» k : segment number ( 4 )
 If n → ∞, then ( k + n - 1 ) approaches n and S = tn / tp
 If we assume that the time it takes to process a task is the same in the
pipeline and nonpipeline circuits then we have
nonpipeline ( tn ) = pipeline ( k • tp )
S = tn / tp = k • tp / tp = k
Where k is the number of segments.
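
The speedup formula can be checked numerically. This is a minimal sketch using the symbols n, k, tn and tp from the slide; the function name speedup is an assumption.

```python
def speedup(n, k, tn, tp):
    """Speedup of a k-segment pipeline over a nonpipelined unit for n tasks.

    tn is the nonpipelined time per task and tp the pipeline clock
    cycle time, both in the same unit.
    """
    return (n * tn) / ((k + n - 1) * tp)


if __name__ == "__main__":
    # Slide example: n = 6 tasks, k = 4 segments, tn = 6 tp.
    print(speedup(n=6, k=4, tn=6, tp=1))
    # With tn = k * tp the speedup approaches k as n grows.
    for n in (10, 100, 10_000):
        print(n, round(speedup(n, k=4, tn=4, tp=1), 3))
```

The second loop illustrates the limiting case: for small n the fill time of k - 1 cycles keeps the speedup well below k.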
 Arithmetic Pipeline
 Floating-point Adder Pipeline Example :
 Add / Subtract two normalized floating-point binary numbers ( decimal values are used in the example )
» X = A x 2^a, e.g. X = 0.9504 x 10^3
» Y = B x 2^b, e.g. Y = 0.8200 x 10^2
 4 segment suboperations
» 1) Compare exponents by subtraction : 3 - 2 = 1
 X = 0.9504 x 10^3
 Y = 0.8200 x 10^2
» 2) Align mantissas
 X = 0.9504 x 10^3
 Y = 0.08200 x 10^3
» 3) Add mantissas
 Z = 1.0324 x 10^3
» 4) Normalize result
 Z = 0.10324 x 10^4
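
The four suboperations can be traced in code. The sketch below works on decimal (mantissa, exponent) pairs to match the example; the function name fp_add and the use of Python's decimal module are assumptions for illustration, and a hardware adder would operate on binary fields instead.

```python
from decimal import Decimal


def fp_add(x, y):
    """Add two decimal floating-point numbers given as (mantissa, exponent)."""
    (ma, ea), (mb, eb) = x, y
    # Segment 1: compare exponents by subtraction.
    diff = ea - eb
    # Segment 2: choose the larger exponent and shift the other mantissa right.
    if diff >= 0:
        exp, mb = ea, mb.scaleb(-diff)
    else:
        exp, ma = eb, ma.scaleb(diff)
    # Segment 3: add the mantissas.
    m = ma + mb
    # Segment 4: normalize so that 0.1 <= |m| < 1 (or m == 0).
    while abs(m) >= 1:
        m, exp = m.scaleb(-1), exp + 1
    while m != 0 and abs(m) < Decimal("0.1"):
        m, exp = m.scaleb(1), exp - 1
    return m, exp


if __name__ == "__main__":
    print(fp_add((Decimal("0.9504"), 3), (Decimal("0.8200"), 2)))
```

For the slide's operands the sum of the aligned mantissas is 1.0324, so the normalize step shifts once and raises the exponent from 3 to 4.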
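Fig.: Pipeline for floating-point addition and subtraction
Inputs : exponents a, b and mantissas A, B, each held in a register R
Segment 1 : Compare exponents by subtraction ( produces the difference )
Segment 2 : Choose exponent ; align mantissas
Segment 3 : Add or subtract mantissas
Segment 4 : Adjust exponent ; normalize result
Registers R separate the segments and hold the final result.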
 Instruction Pipeline
 Instruction Cycle
1) Fetch the instruction from memory
2) Decode the instruction
3) Calculate the effective address
4) Fetch the operands from memory
5) Execute the instruction
6) Store the result in the proper place
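Fig.: Four-segment CPU pipeline ( flowchart )
Segment 1 : Fetch instruction from memory
Segment 2 : Decode instruction and calculate the effective address, followed by the test "Branch ?"
Segment 3 : Fetch operand from memory
Segment 4 : Execute instruction, followed by the test "Interrupt ?" ( leading to interrupt handling )
A branch or an interrupt leads to "Update PC" and "Empty pipe".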
 Example : Four-segment Instruction Pipeline
 Four-segment CPU pipeline :
» 1) FI : Instruction Fetch
» 2) DA : Decode Instruction & calculate EA
» 3) FO : Operand Fetch
» 4) EX : Execution
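Timing of Instruction Pipeline :
Step :            1   2   3   4   5   6   7   8   9   10  11  12  13
Instruction : 1   FI  DA  FO  EX
              2       FI  DA  FO  EX
    (Branch)  3           FI  DA  FO  EX
              4               FI          FI  DA  FO  EX
              5                               FI  DA  FO  EX
              6                                   FI  DA  FO  EX
              7                                       FI  DA  FO  EX
Branch : instruction 3 is a branch, so FI for instruction 4 is repeated in step 7, after the branch has executed in step 6.
No Branch : the instruction fetched in step 4 can be used and no steps are lost.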
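
The timing above can be generated programmatically. The sketch below assumes, as in the diagram, that a taken branch resolves in its EX step and that the instruction fetched right after the branch is fetched again afterwards; the names STAGES and schedule are hypothetical.

```python
STAGES = ["FI", "DA", "FO", "EX"]


def schedule(num_instructions, branch_at=None):
    """Return {instruction: {step: stage}} for a four-segment pipeline.

    branch_at is the 1-based number of a taken branch, or None when
    the program runs without branches.
    """
    timing = {}
    for i in range(1, num_instructions + 1):
        row = {}
        fetch = i  # without a branch, instruction i is fetched in step i
        if branch_at is not None and i > branch_at:
            if i == branch_at + 1:
                row[i] = "FI"  # fetched before the branch resolves, then discarded
            # Wait until the branch has finished EX before fetching again.
            fetch = i + len(STAGES) - 1
        for offset, stage in enumerate(STAGES):
            row[fetch + offset] = stage
        timing[i] = row
    return timing


if __name__ == "__main__":
    for instr, row in schedule(7, branch_at=3).items():
        print(instr, " ".join(f"{step}:{stage}" for step, stage in sorted(row.items())))
```

Comparing schedule(7) with schedule(7, branch_at=3) shows the three steps lost to the branch: the last instruction finishes in step 10 instead of step 13.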
 Pipeline Conflicts : 3 major difficulties
 1) Resource conflicts
» Memory access by two segments at the same time.
» Can be avoided by using separate instruction and data memories.
 2) Data dependency
» Arises when an instruction depends on the result of a previous instruction, but this result is not yet available.
 3) Branch difficulties
» Branch and other instructions ( interrupt, ret, .. ) that change the value of PC.
 Solutions to Data Dependency
 Hardware methods
» Hardware Interlock
 Forces a hardware delay until the result of the previous instruction is available
» Operand Forwarding
 Passes the result of the previous instruction directly to the ALU ( normally it would go through a register )
 Software method
» Delayed Load
 Inserts no-operation instructions until the result of the previous instruction is available
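
As a rough illustration of the software approach, the sketch below inserts a no-operation after a load whose result is read by the very next instruction. The tuple format (opcode, destination, sources), the opcode names and the single delay slot are assumptions made for this example, not details given in the lecture.

```python
NOP = ("NOP", None, ())


def insert_delay_slots(program):
    """Insert a NOP after each LOAD whose result the next instruction reads.

    Each instruction is a tuple (opcode, destination, sources).
    Assumes a load needs exactly one extra cycle before its result is usable.
    """
    out = []
    for idx, instr in enumerate(program):
        out.append(instr)
        op, dest, _ = instr
        nxt = program[idx + 1] if idx + 1 < len(program) else None
        if op == "LOAD" and nxt is not None and dest in nxt[2]:
            out.append(NOP)  # delay until the loaded value is available
    return out


if __name__ == "__main__":
    program = [
        ("LOAD", "R1", ("A",)),
        ("LOAD", "R2", ("B",)),
        ("ADD", "R3", ("R1", "R2")),
        ("STORE", "C", ("R3",)),
    ]
    for instr in insert_delay_slots(program):
        print(instr)
```

A hardware interlock would detect the same condition at run time and stall the pipeline instead, while operand forwarding would remove the need for the delay where a direct path to the ALU exists.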
Assignment
 What do you mean by pipeline and parallel processing?
 Explain vector processing.
