CMSC 611-101
Advanced Computer Architecture
Lecture 7
Pipeline Hazards
www.csee.umbc.edu/~younis/CMSC611/CMSC611.htm
→ Pipeline performance
• Performance improvement by increasing instruction throughput
• Ideal and upper bound for speedup is the number of stages in the pipeline
This Lecture:
• Structural, data and control hazards
• Data Hazard resolution techniques
• Pipelined control
Mohamed Younis CMCS 611, Advanced Computer Architecture 2
Stages of Instruction Execution
          Cycle 1   Cycle 2   Cycle 3   Cycle 4   Cycle 5
Load      Ifetch    Reg/Dec   Exec      Mem       WB
[Figure: pipeline diagram. Load, Instr 1, Instr 2, Instr 3 and Instr 4 in program order, each passing through Mem, Reg, ALU, Mem, Reg.]
• Can be easily detected
• Resolved by inserting idle cycles
* Slide is courtesy of Dave Patterson
$$\text{Speedup} = \frac{1}{1 + \text{Pipeline stall cycles per instruction}} \times \text{Pipeline depth}$$
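As a quick numeric check of the speedup formula, the sketch below evaluates it for a pipeline with and without stalls. The depth of 5 and the stall rate of 0.25 cycles per instruction are illustrative values chosen here, not figures from the lecture.

```python
def pipeline_speedup(depth, stall_cycles_per_instr):
    """Speedup = pipeline depth / (1 + pipeline stall cycles per instruction)."""
    return depth / (1.0 + stall_cycles_per_instr)

# Illustrative values (assumptions): a 5-stage pipeline.
ideal = pipeline_speedup(5, 0.0)         # no stalls: speedup equals the depth
with_stalls = pipeline_speedup(5, 0.25)  # one stall cycle every four instructions

print(ideal, with_stalls)
```

With no stalls the speedup equals the pipeline depth, which is the upper bound; any stall cycles only lower it.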
Control Hazard
Stall: wait until decision is clear
→ It is possible to move the decision up to the 2nd stage by adding hardware to check the registers as they are being read
[Figure: pipeline diagram, time in clock cycles. Add, Beq and Load in program order; the Load is stalled until the branch decision is clear.]
[Figure: pipeline diagram. Add, Beq and Load in program order, with no stall.]
[Figure: pipeline diagram. Add, Beq, Misc and Load in program order.]
Impact: 0 clock cycles per branch instruction if an instruction can be found to put in the "slot" (50% of the time)
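The impact statement can be turned into a rough average cost per branch. The sketch below assumes a single delay slot that wastes exactly one cycle whenever no useful instruction is found for it; the base CPI of 1 and the 20% branch frequency are hypothetical inputs, not values from the lecture.

```python
def avg_branch_penalty(fill_rate, slots=1):
    """Average wasted cycles per branch.

    Assumption: each unfilled delay slot holds a no-op and costs one cycle.
    """
    return slots * (1.0 - fill_rate)

def cpi_with_branches(base_cpi, branch_fraction, fill_rate):
    """Effective CPI once the average delay-slot penalty is added."""
    return base_cpi + branch_fraction * avg_branch_penalty(fill_rate)

# 50% fill rate is the figure quoted on the slide; the rest is assumed.
print(avg_branch_penalty(0.5))
print(cpi_with_branches(1.0, 0.2, 0.5))
```

Under these assumptions, a slot filled half of the time costs half a cycle per branch on average.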
* Slide is courtesy of Dave Patterson
[Figure: pipeline diagram (Im, Reg, ALU, Dm, Reg) of the following instructions in program order; every instruction after the add reads r1.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11
[Figure: the same pipeline diagram (Im, Reg, ALU, Dm, Reg) with forwarding paths for r1.]
add r1,r2,r3
sub r4,r1,r3
and r6,r1,r7
or  r8,r1,r9
xor r10,r1,r11
“Forward” result from one stage to another
* Slide is courtesy of Dave Patterson
• The ALU result from the EX/MEM register is fed back and kept in the following stages
• If a data hazard is detected, the forwarded values will be used
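To make the forwarding idea concrete, the sketch below walks the add/sub/and/or/xor sequence and reports where each dependent operand would come from. It assumes the classic 5-stage MIPS pipeline with ALU-only producers (no loads) and a register file that writes in the first half of a cycle and reads in the second; the function names are made up for this illustration.

```python
# Each instruction: (text, destination register, source registers)
program = [
    ("add r1,r2,r3",   "r1",  ("r2", "r3")),
    ("sub r4,r1,r3",   "r4",  ("r1", "r3")),
    ("and r6,r1,r7",   "r6",  ("r1", "r7")),
    ("or r8,r1,r9",    "r8",  ("r1", "r9")),
    ("xor r10,r1,r11", "r10", ("r1", "r11")),
]

def operand_source(distance):
    """Where a consumer `distance` instructions after an ALU producer reads the value."""
    if distance == 1:
        return "forward from EX/MEM"
    if distance == 2:
        return "forward from MEM/WB"
    if distance == 3:
        return "register file (written first half, read second half)"
    return "register file"

def classify(prog):
    """List (consumer, register, source) for every operand with an earlier producer."""
    result = []
    for i, (text, _, srcs) in enumerate(prog):
        for src in srcs:
            # Search backwards for the nearest earlier instruction writing src.
            for j in range(i - 1, -1, -1):
                if prog[j][1] == src:
                    result.append((text, src, operand_source(i - j)))
                    break
    return result

for row in classify(program):
    print(row)
```

Only the sub and the and need a forwarded value here; the or relies on the split register-file cycle, and the xor reads r1 normally.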
Example
ADD R1, R2, R3
LW R4, 0(R1)
SW 12(R1), R4
LW  R1, 0(R2)     IF   ID   EX   MEM1   MEM2   WB
ADD R1, R2, R3         IF   ID   EX     WB
Example: compile the following:
    a = b + c;  d = e - f;

Original order                 After scheduling
LW  Rb, b                      LW  Rb, b
LW  Rc, c                      LW  Rc, c
ADD Ra, Rb, Rc   (stall)       LW  Re, e          (swapped)
LW  Re, e                      ADD Ra, Rb, Rc     (swapped)
SW  a, Ra                      LW  Rf, f          (swapped)
LW  Rf, f                      SW  a, Ra          (swapped)
SUB Rd, Re, Rf   (stall)       SUB Rd, Re, Rf
SW  d, Rd                      SW  d, Rd

[Figure: fraction of loads that cause a stall, per benchmark]
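The benefit of reordering in the a = b + c; d = e - f example can be checked by counting load-use stalls before and after scheduling. This is a minimal sketch that assumes forwarding is in place, so the only stall is an instruction that reads a register loaded by the instruction immediately before it; the tuple encoding and the function name are made up for this illustration.

```python
# Each instruction: (opcode, destination register or None, source registers)
original = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("ADD", "Ra", ("Rb", "Rc")), ("LW", "Re", ()),
    ("SW", None, ("Ra",)), ("LW", "Rf", ()),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]
scheduled = [
    ("LW", "Rb", ()), ("LW", "Rc", ()),
    ("LW", "Re", ()), ("ADD", "Ra", ("Rb", "Rc")),
    ("LW", "Rf", ()), ("SW", None, ("Ra",)),
    ("SUB", "Rd", ("Re", "Rf")), ("SW", None, ("Rd",)),
]

def load_use_stalls(seq):
    """Count instructions that directly follow a load and read its result."""
    stalls = 0
    for prev, cur in zip(seq, seq[1:]):
        op, dest, _ = prev
        if op == "LW" and dest in cur[2]:
            stalls += 1
    return stalls

print(load_use_stalls(original), load_use_stalls(scheduled))
```

Under this model the original order incurs two stalls (ADD after the load of Rc, SUB after the load of Rf) and the scheduled order incurs none.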
Data Hazards Detection
Detecting hazards early in the pipeline reduces hardware complexity since
the machine state will not get erroneously changed
For the MIPS integer pipeline, all data hazards can be checked in ID stage
Load interlock detection

Situation: No dependence
Example code sequence:
    LW  R1, 45(R2)
    ADD R5, R6, R7
    SUB R8, R6, R7
    OR  R9, R6, R7
Action: No hazard possible because no dependence exists on R1 in the immediately following three instructions

Situation: Dependence requiring stall
Example code sequence:
    LW  R1, 45(R2)
    ADD R5, R1, R7
    SUB R8, R6, R7
    OR  R9, R6, R7
Action: Comparators detect the use of R1 in the ADD and stall the ADD (and SUB and OR) before the ADD begins EX

Situation: Dependence overcome by forwarding
Example code sequence:
    LW  R1, 45(R2)
    ADD R5, R6, R7
    SUB R8, R1, R7
    OR  R9, R6, R7
Action: Comparators detect the use of R1 in the SUB and forward the result of the load to the ALU in time for SUB to begin EX

Situation: Dependence with accesses in order
Example code sequence:
    LW  R1, 45(R2)
    ADD R5, R6, R7
    SUB R8, R6, R7
    OR  R9, R1, R7
Action: No action required because the read of R1 by OR occurs in the second half of the ID phase, while the write of the loaded data occurred in the first half
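The four situations reduce to one question: how many instructions after the load is R1 first read? The sketch below encodes that rule. Each following instruction is represented only by the tuple of registers it reads, and the function name and return strings are made up for this illustration.

```python
def load_hazard_action(following_sources, load_dest):
    """Classify a load by the position of the first later reader of its result.

    following_sources: source-register tuples of the instructions after the
    load, in program order. Only the first three can interact with the load.
    """
    actions = {
        1: "stall",
        2: "forward",
        3: "no action (accesses in order)",
    }
    for distance, srcs in enumerate(following_sources[:3], start=1):
        if load_dest in srcs:
            return actions[distance]
    return "no hazard"

# The four example sequences, each following LW R1, 45(R2).
no_dep  = [("R6", "R7"), ("R6", "R7"), ("R6", "R7")]
stall   = [("R1", "R7"), ("R6", "R7"), ("R6", "R7")]
forward = [("R6", "R7"), ("R1", "R7"), ("R6", "R7")]
ordered = [("R6", "R7"), ("R6", "R7"), ("R1", "R7")]

for seq in (no_dep, stall, forward, ordered):
    print(load_hazard_action(seq, "R1"))
```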
Opcode field of ID/EX: Load
Opcode field of IF/ID: Load, store, ALU imm., or branch
Matching operand fields: ID/EX.IR 11..15 == IF/ID.IR 6..10
The control logic is a simple combinational circuit with inputs from the ID/EX and IF/ID registers
Once the hazard is detected, the control unit must insert the pipeline stall and
prevent the instructions in the IF and ID stages from advancing
Since all control information is data stationary (it travels with the instruction), the pipeline
is stalled simply by setting the control portion of ID/EX to zero (matching the NOP instruction)
In case of a stall, the contents of the IF/ID registers will be re-circulated to
hold the stalled instruction
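The detection and stall steps described above can be sketched as a tiny model of the IF/ID and ID/EX registers. This is an illustration under stated assumptions, not the lecture's hardware: the `Instr` fields, the opcode class names, and the `advance` function are hypothetical, `rs` stands for IR bits 6..10 and `rt` for IR bits 11..15, and the extra `rt` comparison for register-register ALU instructions is assumed from the usual MIPS encoding rather than taken from the slide.

```python
from dataclasses import dataclass

@dataclass
class Instr:
    op: str = "nop"  # "load", "store", "alu", "alu_imm", "branch" or "nop"
    rs: int = 0      # IR bits 6..10
    rt: int = 0      # IR bits 11..15 (destination register of a load)

def load_interlock(id_ex, if_id):
    """Combinational check: does the instruction in ID read the load now in EX?"""
    if id_ex.op != "load":
        return False
    if id_ex.rt == if_id.rs:
        return True
    # Assumption: register-register ALU instructions also read rt.
    return if_id.op == "alu" and id_ex.rt == if_id.rt

def advance(if_id, id_ex, fetched):
    """Return (new IF/ID, new ID/EX) after one clock edge."""
    if load_interlock(id_ex, if_id):
        # Stall: a zeroed ID/EX acts as a NOP and IF/ID is re-circulated.
        return if_id, Instr()
    return fetched, if_id

lw = Instr("load", rs=2, rt=1)    # LW  R1, 45(R2)
add = Instr("alu", rs=1, rt=7)    # ADD R5, R1, R7
sub = Instr("alu", rs=6, rt=7)    # SUB R8, R6, R7

if_id, id_ex = advance(add, lw, sub)         # hazard: bubble inserted, ADD held
print(if_id, id_ex)
if_id, id_ex = advance(if_id, id_ex, sub)    # bubble in EX: ADD now advances
print(if_id, id_ex)
```

The model omits the register-zero special case and everything after ID/EX; it only shows that one bubble separates the load from its dependent instruction.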
Next Lecture
→ Pipeline control hazards
→ Pipelining and exception handling
Reading assignment includes Appendix A.2 & A.3 in the textbook