ECE 338 Parallel Computer Architecture Spring 2022: Basic MIPS Pipeline Review


Nikos Bellas

Electrical and Computer Engineering Department


University of Thessaly

ECE338 Parallel Computer Architecture


Basic 5-stage MIPS pipeline

[Figure: datapath of the 5-stage MIPS pipeline with the pipeline registers IF/ID, ID/EX, EX/MEM and MEM/WB. The Control unit generates the EX, M and WB control fields, which travel with the instruction through the pipeline registers. Labeled blocks: PC, instruction memory, register file, sign extend, shift left 2, adders, ALU, data memory. Labeled control signals: PCSrc, RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemToReg.]



5-stage pipeline
Cycle                   1    2    3    4    5    6    7    8    9
lw  $t0, 4($sp)         IF   ID   EX   MEM  WB
sub $v0, $a0, $a1            IF   ID   EX   MEM  WB
and $t1, $t2, $t3                 IF   ID   EX   MEM  WB
or  $s0, $s1, $s2                      IF   ID   EX   MEM  WB
add $t5, $t6, $0                            IF   ID   EX   MEM  WB

• Clock period in the single-cycle processor is Tc1 = 800 ps
• Best case in the 5-stage pipeline would be 800/5 = 160 ps.
• More realistically, Tc2 = 200 ps
• Execution time in an ideal pipeline (CPI = 1):
– Pipeline fill time + 1 cycle per instruction
– For N instructions we need 4 + N cycles.
– Execution time = (4 + N) * 200 ps = 200.8 ns for N = 1000 instructions
• Execution time for the single-cycle processor:
– Execution time = N * 800 ps = 800 ns
• Speedup = 800/200.8 ≈ 3.98, i.e. close to 4
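
The arithmetic above can be reproduced with a short script. This is only a sketch: the function names are made up for illustration, and the numbers (800 ps, 200 ps, 5 stages, N = 1000) are the ones given on the slide.

```python
def single_cycle_time_ps(n_instructions, cycle_ps=800):
    """Execution time of the single-cycle processor: one long cycle per instruction."""
    return n_instructions * cycle_ps


def pipeline_time_ps(n_instructions, stages=5, cycle_ps=200):
    """Execution time of an ideal pipeline (CPI = 1): fill time plus one cycle per instruction."""
    fill_cycles = stages - 1
    return (fill_cycles + n_instructions) * cycle_ps


n = 1000
t_single = single_cycle_time_ps(n)
t_pipe = pipeline_time_ps(n)
speedup = t_single / t_pipe

print(f"single-cycle: {t_single / 1000:.1f} ns")
print(f"pipelined:    {t_pipe / 1000:.1f} ns")
print(f"speedup:      {speedup:.2f}")
```

As N grows, the fill time becomes negligible and the speedup approaches the ratio of the clock periods, 800/200 = 4.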

Forwarding

sub $2, $1, $3
and $12, $2, $5
or $13, $6, $2
add $14, $2, $2
sw $15, 100($2)

[Figure: pipelined datapath with a Forwarding Unit. The unit receives the Rs and Rt register numbers of the instruction in ID/EX together with EX/MEM.RegisterRd and MEM/WB.RegisterRd, and controls the three-input multiplexers (inputs 0, 1, 2) in front of the two ALU operands.]
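
The decision the Forwarding Unit makes can be sketched in software. This is a minimal model, not the slide's hardware: it assumes the usual textbook conditions (forward only from an instruction that writes a register other than $0, and prefer the newer value in EX/MEM over the older one in MEM/WB). The RegWrite inputs and the string labels for the multiplexer choices are assumptions made for illustration.

```python
def forwarding_unit(id_ex_rs, id_ex_rt,
                    ex_mem_rd, ex_mem_regwrite,
                    mem_wb_rd, mem_wb_regwrite):
    """Return the source chosen for each ALU operand: 'REG', 'EX/MEM' or 'MEM/WB'."""

    def select(src_reg):
        # Newest result first: the instruction one ahead, now in MEM.
        if ex_mem_regwrite and ex_mem_rd != 0 and ex_mem_rd == src_reg:
            return "EX/MEM"
        # Otherwise the instruction two ahead, now in WB.
        if mem_wb_regwrite and mem_wb_rd != 0 and mem_wb_rd == src_reg:
            return "MEM/WB"
        # No hazard: use the value read from the register file.
        return "REG"

    return select(id_ex_rs), select(id_ex_rt)


# and $12, $2, $5 in EX while sub $2, $1, $3 is in MEM
print(forwarding_unit(2, 5, ex_mem_rd=2, ex_mem_regwrite=True,
                      mem_wb_rd=0, mem_wb_regwrite=False))

# or $13, $6, $2 in EX: "and" is in MEM (writes $12), "sub" is in WB (writes $2)
print(forwarding_unit(6, 2, ex_mem_rd=12, ex_mem_regwrite=True,
                      mem_wb_rd=2, mem_wb_regwrite=True))
```

In this model add $14, $2, $2 needs no forwarding: by the time it reaches EX, sub has already written $2 back, assuming the register file writes in the first half of the cycle and reads in the second.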



Stalling

lw $2, 20($3)
and $12, $2, $5

[Figure: pipelined datapath with a Hazard Unit and a Forwarding Unit. The Hazard Unit receives ID/EX.MemRead, ID/EX.RegisterRt and the Rs and Rt fields of the instruction in IF/ID. It controls PC Write, IF/ID Write and a multiplexer that replaces the control signals entering ID/EX with 0. The Forwarding Unit receives Rs, Rt, EX/MEM.RegisterRd and MEM/WB.RegisterRd and controls the multiplexers in front of the ALU operands.]
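
The load-use check performed by the Hazard Unit can be sketched as follows. This is a simplified model under the usual textbook condition (stall when the instruction in EX is a load whose destination Rt matches a source register of the instruction in ID). The function name and the dictionary of outputs are hypothetical; the signal names follow the figure.

```python
def hazard_unit(id_ex_memread, id_ex_rt, if_id_rs, if_id_rt):
    """Detect a load-use hazard and return the resulting control actions."""
    stall = bool(id_ex_memread) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "pc_write": not stall,      # freeze the PC during a stall
        "if_id_write": not stall,   # keep the dependent instruction in IF/ID
        "insert_bubble": stall,     # zero the control signals entering ID/EX
    }


# lw $2, 20($3) is in EX, and $12, $2, $5 is in ID: one stall cycle is needed
print(hazard_unit(id_ex_memread=True, id_ex_rt=2, if_id_rs=2, if_id_rt=5))

# same registers, but the instruction in EX is not a load: forwarding suffices
print(hazard_unit(id_ex_memread=False, id_ex_rt=2, if_id_rs=2, if_id_rt=5))
```

After the one-cycle bubble, the loaded value is available in MEM/WB and can be forwarded to the ALU in the normal way.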



Branches

[Figure: 5-stage pipelined datapath with the branch logic shown. The Branch control signal and the ALU Zero output together produce PCSrc, which selects between PC + 4 and the branch target computed by the adder from the sign-extended immediate shifted left by 2. Other labeled signals: RegWrite, ALUSrc, ALUOp, RegDst, MemRead, MemWrite, MemToReg.]

Branches - Control hazards
Branches in our MIPS pipeline are executed in the third stage. The decision is forwarded to the PC at the fourth stage.

What happens in the meantime?

If beq is TAKEN, the CPU erroneously fetches the instructions that follow beq, starting with the "and" instruction, instead of the "lw" instruction at the branch target.

We lose three cycles and we need to flush the pipeline.



Branches - Pipeline flush
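[Figure: pipeline timing diagram over clock cycles 1 to 8. The instruction beq $1, $3, 28 is followed by an instruction marked ???, each drawn with the stage symbols IM, Reg, DM, Reg.]

• Flush the three instructions that have (erroneously) entered the pipeline
• Can we reduce the three cycles?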
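
The cost of these flushes can be estimated with a small calculation. This sketch assumes that every TAKEN branch costs exactly three flushed cycles and that nothing else stalls the pipeline. The function name and the count of 100 taken branches are hypothetical and chosen only for illustration.

```python
def total_cycles(n_instructions, n_taken_branches, stages=5, flush_penalty=3):
    """Cycles for an otherwise ideal pipeline that flushes on every taken branch."""
    fill_cycles = stages - 1
    return fill_cycles + n_instructions + flush_penalty * n_taken_branches


n = 1000
taken = 100  # hypothetical number of taken branches among the n instructions
cycles = total_cycles(n, taken)
print(f"cycles = {cycles}, effective CPI = {cycles / n:.3f}")
```

Under these assumptions, reducing the penalty per taken branch lowers the effective CPI in direct proportion.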



Branch prediction
• Modern CPUs have a large number of pipeline stages (>20)
• Branches are executed towards the end of the pipeline
• If the branch is TAKEN, the penalty can be very large (>10 cycles)
• One solution is to predict the branch outcome immediately after the branch is fetched from memory
• The CPU then follows the predicted path, hoping that the prediction is correct



Dynamic branch prediction
• The direction (taken/not taken, T/NT) of a branch B is correlated with previous executions of B
• Especially with the most recent executions of B



A dynamic branch predictor
• Use a 4-state FSM to predict the direction of the branch.
• This FSM is located in the IF stage of the CPU
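
One common 4-state FSM is the 2-bit saturating counter, sketched below. The slide does not show its state diagram, so the transitions here are an assumption: states 0 and 1 predict NOT TAKEN, states 2 and 3 predict TAKEN, and each actual outcome moves the state one step up or down. The class name, the helper function and the loop-branch trace are made up for illustration.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0-1 predict not taken, 2-3 predict taken."""

    def __init__(self, state=0):
        self.state = state

    def predict(self):
        return self.state >= 2  # True means "predict TAKEN"

    def update(self, taken):
        # Move one step towards the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_correct(outcomes, predictor):
    """Run the predictor over a trace of outcomes and count correct predictions."""
    correct = 0
    for taken in outcomes:
        if predictor.predict() == taken:
            correct += 1
        predictor.update(taken)
    return correct


# A loop branch: taken 9 times, then not taken on exit, executed 3 times.
trace = ([True] * 9 + [False]) * 3
hits = count_correct(trace, TwoBitPredictor())
print(f"{hits} of {len(trace)} predictions correct")
```

With this scheme a single loop exit does not flip the prediction, so the branch is still predicted TAKEN the next time the loop is entered.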

