Lec-10 Software Pipelining
Lec-10 Software Pipelining
Lec-10 Software Pipelining
Ajit Pal
Professor Department of Computer Science and Engineering Indian Institute of Technology Kharagpur INDIA -721302
The growth in code size due to loop unrolling (may increase cache miss rates) Shortfall of registers created by aggressive unrolling and scheduling (register pressure)
Ajit Pal, IIT Kharagpur
Recap
Loop unrolling improves the performance by eliminating overhead instructions Loop unrolling is a simple but useful method to increase the size of straight-line code fragments Sophisticated high-level transformations led to significant increase in complexity of the compilers
Software Pipelining
Eliminates loop-independent dependence through code restructuring Reduces stalls Helps achieve better performance in pipelined execution As compared to simple loop unrolling: Consumes less code space
Software Pipelining
Exactly just as it happens in a hardware pipeline: In each iteration of a software pipelined code, some instruction of some iteration of the original loop is executed
DDG
a b c
4x unrolled loop a b a c b a c b a c b c
Kernel
c b
Software Pipelining
Central idea: reorganize loops Each iteration is made from instructions chosen from different iterations of the original loop
i0
i1 i2 i3
i4
i5
Software Pipelining
How is this done? 1 unroll loop body with an unroll factor of n. (we have taken n = 3 for our example) 2 select order of instructions from different iterations to pipeline 3 paste instructions from different iterations into the new pipelined loop body
Ajit Pal, IIT Kharagpur
F4,0(R1)
R1,R1,#-8
; store result
; decrement pointer
Note: 1. We are unrolling the loop. Hence no loop overhead Instructions are needed! 2. A single loop body of restructured loop would contain instructions from different iterations of the original loop body
ADD.D
S.D Iteration i + 2: L.D ADD.D S.D
F4,0(R1)
F0,0(R1) F4,F0,F2 F4,0(R1)
ADD.D F4,F0,F2
S.D Iteration i + 2: L.D ADD.D S.D
F4,0(R1)
F0,0(R1) F4,F0,F2 F4,0(R1) 3.)
2. Each instruction (L.D ADD.D S.D) must be selected at least once to make sure that we dont leave out any instruction of the original loop in the pipelined loop.
L.D
2.) 3.)
ADD.D
L.D
F4,F0,F2
F0,0(R1)
ADD.D
S.D
F4,F0,F2
F4,0(R1)
F4,16(R1) F4,F0,F2
; M[ i ] ; M[ i 1 ]
L.D
DADDUI BNE
F0,0(R1)
R1,R1,#-8 R1,R2,Loop
; M[ i 2 ]
Postheader
Points to Remember
What is pipelining? It is an implementation technique where multiple tasks are performed in an overlapped manner When can it be implemented? It can be implemented when a task can be divided into two or subtasks, which can be performed independently The earliest use of parallelism in designing CPUs (since 1985) to enhance processing speed was Pipelining Pipelining does not reduces the execution time of a single instruction, it increases the throughput CISC processors are not suitable for pipelining because of: Variable instruction format Variable execution time Complex addressing mode RISC processors are suitable for pipelining because of: Fixed instruction format Fixed execution time Limited addressing modes
Ajit Pal, IIT Kharagpur
Points to Remember
There are situations, called hazards, that prevent the next instruction stream from getting executed in its designated clock cycle Three major types:
Structural hazards: Not enough HW resources to keep all instructions moving Data hazards: Data results of earlier instructions not available yet Control hazards: Control decisions resulting from earlier instruction (branches) not yet made; dont know which new instruction to execute
Structural Hazard can be overcome using additional hardware Data Hazards can be overcome using additional hardware (forwarding) or software (compiler)
Ajit Pal, IIT Kharagpur
Thanks!
Ajit Pal, IIT Kharagpur