CA Chap4 Cpu Nlt2021 Part1
Performance metric
CPU time = CPI * CC * IC
(IC: instruction count; CPI: average clock cycles per instruction; CC: clock cycle time)
How to improve?
• IC:
• CC:
• CPI:
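A minimal Python sketch of the performance equation. The instruction count, CPI and clock-cycle values are hypothetical, chosen only to show that halving any one of the three factors halves CPU time.

```python
def cpu_time_ns(ic: int, cpi: float, cc_ns: float) -> float:
    """CPU time (ns) = IC * CPI * CC, with CC given in nanoseconds."""
    return ic * cpi * cc_ns

# Hypothetical program: 1,000,000 instructions, CPI = 2.0, 1 ns clock cycle (1 GHz).
baseline = cpu_time_ns(1_000_000, 2.0, 1.0)           # 2,000,000 ns
fewer_instructions = cpu_time_ns(500_000, 2.0, 1.0)   # halve IC
shorter_cycle = cpu_time_ns(1_000_000, 2.0, 0.5)      # halve CC
lower_cpi = cpu_time_ns(1_000_000, 1.0, 1.0)          # halve CPI
print(baseline, fewer_instructions, shorter_cycle, lower_cpi)
```

In real designs the factors interact (for example, a change that lowers IC may raise CPI or CC), so improving one factor in isolation is an idealization.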
In this chapter
• Implementation of data path
• How to get CPI < 1
❑ Multiplexer
❑ Control
[Figure: instruction fetch: the PC addresses the instruction memory and an adder computes PC = PC + 4 on every clock; the cycle repeats Fetch → Decode → Exec]
[Figure: decode: the instruction goes to the control unit and to the register file (inputs Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2)]
Example: add s1, s2, s3
- The values of s2 and s3 are read from the register file and sent to the ALU
- The ALU executes the s2 + s3 operation
- The result is stored into s1
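The three steps of the example can be modeled in a few lines of Python. This is a hypothetical sketch, not the hardware: the register file is a dict keyed by the register names from the example, and `execute_add` (a name introduced here) also returns the next PC to mirror PC = PC + 4.

```python
MASK32 = 0xFFFFFFFF  # registers hold 32-bit values

def execute_add(regs: dict, rd: str, rs: str, rt: str, pc: int) -> int:
    """Model add rd, rs, rt; returns the next PC (PC + 4)."""
    # Decode: read both source registers from the register file.
    a, b = regs[rs], regs[rt]
    # Execute: the ALU adds them; the sum wraps to 32 bits.
    result = (a + b) & MASK32
    # Write back: store the ALU result in the destination register.
    regs[rd] = result
    return (pc + 4) & MASK32

regs = {"s1": 0, "s2": 7, "s3": 5}
next_pc = execute_add(regs, "s1", "s2", "s3", pc=0x00400000)
print(regs["s1"], hex(next_pc))  # 12 0x400004
```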
[Figure: execute for R-type instructions: Read Data 1 and Read Data 2 feed the ALU, which produces the result plus overflow and zero flags; the result returns to the register file through Write Data]
[Figure: load/store datapath: the ALU adds a register value to the 16-bit offset sign-extended to 32 bits; the sum is the data memory Address, Read Data 2 supplies the memory's Write Data, and Read Data (enabled by MemRead) returns to the register file]
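For loads and stores, the ALU adds a base register to the 16-bit offset after it has been sign-extended to 32 bits. A small Python sketch of that step; the helper names `sign_extend16` and `effective_address` are hypothetical, and values are kept as unsigned 32-bit patterns.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16: int) -> int:
    """Copy bit 15 of a 16-bit field into bits 31-16 (result as an unsigned 32-bit pattern)."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:             # negative offset: fill the upper half with 1s
        return imm16 | 0xFFFF0000
    return imm16                   # non-negative offset: upper half stays 0

def effective_address(base: int, imm16: int) -> int:
    """Data memory address for lw/sw: base register + sign-extended offset (mod 2**32)."""
    return (base + sign_extend16(imm16)) & MASK32

print(hex(effective_address(0x10000000, 0x0004)))  # 0x10000004
print(hex(effective_address(0x10000000, 0xFFFC)))  # 0xffffffc (offset -4)
```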
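[Figure: branch datapath: the sign-extended offset is shifted left 2 and added to PC + 4 to form the branch target address; the ALU, set by ALU control, compares the registers]
[Figure: jump datapath: the 26-bit address field of the instruction is shifted left 2 to form 28 bits, which are combined with the upper 4 bits of PC + 4 to form the jump address]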
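Both target computations reduce to shifts, masks and one addition. A Python sketch under the usual MIPS conventions (branch offset counted in words relative to PC + 4; jump keeps the upper 4 bits of PC + 4); the function names are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def branch_target(pc: int, imm16: int) -> int:
    """(PC + 4) + (sign-extended 16-bit offset shifted left 2)."""
    offset = imm16 & 0xFFFF
    if offset & 0x8000:            # sign-extend: interpret as a negative word offset
        offset -= 0x10000
    return (pc + 4 + (offset << 2)) & MASK32   # shift left 2: words -> bytes

def jump_target(pc: int, addr26: int) -> int:
    """Upper 4 bits of PC + 4, followed by the 26-bit field shifted left 2 (28 bits)."""
    return ((pc + 4) & 0xF0000000) | ((addr26 & 0x03FFFFFF) << 2)

print(hex(branch_target(0x00400000, 3)))       # 0x400010
print(hex(jump_target(0x00400000, 0x100000)))  # 0x400000
```

The shift left by 2 works because MIPS instructions are 4-byte aligned, so the two low address bits are always 0 and need not be stored in the instruction.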
[Figure: combined single-cycle datapath: PC, instruction memory, PC + 4 adder, register file, ALU (ovf, zero), data memory and 16-to-32-bit sign extend, with control signals RegWrite, ALUSrc, ALU control, MemWrite, MemtoReg and MemRead]
❑ Operations
And, or, nor
Add, subtract
[Figure: ALU with inputs a and b, an ALU control input, and flag outputs]
❑ What do we have if
Operation = 0:
Operation = 1:
❑ Operation = 00:
❑ Operation = 01:
❑ Operation = 10:
CO&ISA, NLT 2021 21
How about 1-bit ALU with AND, OR, ADD, SUB?
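❑ a - b = a + (-b) = a + (2’s complement of b) = a + (NOT b) + 1, so subtraction reuses the adder by inverting b and setting the carry-in of the lowest bit to 1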
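A bit-level Python model of a 1-bit ALU with AND, OR, ADD and SUB, rippled across 32 bits. The operation encoding (0 = AND, 1 = OR, 2 = add) and the `binvert` input follow the common textbook design and are assumptions here, not values taken from this section; subtraction inverts b and feeds a carry-in of 1 into bit 0, which is the a + (NOT b) + 1 identity.

```python
def alu_1bit(a: int, b: int, carry_in: int, binvert: int, operation: int) -> tuple:
    """One ALU bit slice; returns (result, carry_out).

    Assumed encoding: operation 0 = AND, 1 = OR, 2 = add.
    """
    b = b ^ binvert                                        # Binvert mux: b or NOT b
    total = a ^ b ^ carry_in                               # full-adder sum bit
    carry_out = (a & b) | (a & carry_in) | (b & carry_in)  # full-adder carry
    result = (a & b, a | b, total)[operation]              # output multiplexer
    return result, carry_out

def alu32(a: int, b: int, op: str) -> int:
    """Ripple 32 one-bit slices; op is 'and', 'or', 'add' or 'sub'."""
    operation = {"and": 0, "or": 1, "add": 2, "sub": 2}[op]
    binvert = 1 if op == "sub" else 0
    carry = binvert            # a - b = a + NOT b + 1: the +1 is the carry into bit 0
    result = 0
    for i in range(32):
        bit, carry = alu_1bit((a >> i) & 1, (b >> i) & 1, carry, binvert, operation)
        result |= bit << i
    return result

print(alu32(7, 5, "sub"), hex(alu32(5, 7, "sub")))  # 2 0xfffffffe
```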
[Figure: single-cycle datapath with main control: Instr[31-26] drives the control unit, which outputs RegDst, Branch, MemRead, MemtoReg, ALUOp, MemWrite, ALUSrc and RegWrite; Instr[25-21] and Instr[20-16] select the registers to read; the RegDst multiplexer selects Instr[20-16] (0) or Instr[15-11] (1) as the write register; PCSrc selects PC + 4 (0) or the branch target (1)]
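The instruction fields wired into this datapath can be extracted with shifts and masks. A Python sketch; the dictionary keys (opcode, rs, rt, rd, imm16, funct) are conventional MIPS field names rather than labels from the figure, and `write_register` (hypothetical) models the RegDst multiplexer.

```python
def decode_fields(instr: int) -> dict:
    """Split a 32-bit instruction into the fields routed through the datapath."""
    return {
        "opcode": (instr >> 26) & 0x3F,  # Instr[31-26] -> control unit
        "rs": (instr >> 21) & 0x1F,      # Instr[25-21] -> Read Addr 1
        "rt": (instr >> 16) & 0x1F,      # Instr[20-16] -> Read Addr 2, RegDst input 0
        "rd": (instr >> 11) & 0x1F,      # Instr[15-11] -> RegDst input 1
        "imm16": instr & 0xFFFF,         # Instr[15-0]  -> sign extend
        "funct": instr & 0x3F,           # Instr[5-0]   -> ALU control
    }

def write_register(fields: dict, reg_dst: int) -> int:
    """RegDst multiplexer: 1 selects rd (R-type), 0 selects rt (e.g. lw)."""
    return fields["rd"] if reg_dst else fields["rt"]

# add $s1, $s2, $s3: opcode 0, rs = 18 ($s2), rt = 19 ($s3), rd = 17 ($s1), funct 0x20
instr = (18 << 21) | (19 << 16) | (17 << 11) | 0x20
fields = decode_fields(instr)
print(hex(instr), fields["rs"], fields["rt"], write_register(fields, 1))  # 0x2538820 18 19 17
```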
Mark the active connections during the execution flow.
[Figure: the same single-cycle datapath, also showing Instr[15-0] sign-extended from 16 to 32 bits and Instr[5-0] feeding ALU control]
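When tracing which connections are active, it helps to tabulate the control signals per instruction class. The values below are the standard single-cycle MIPS settings for R-type, lw, sw and beq (opcodes 0x00, 0x23, 0x2B, 0x04) and the usual 4-bit ALU control codes; treat them as a reference sketch to check against the figure, not as part of the slide. `None` marks a don't-care, and the helper names are hypothetical.

```python
# Main control, indexed by opcode (Instr[31-26]). None marks a don't-care (X).
MAIN_CONTROL = {
    0x00: {"RegDst": 1, "ALUSrc": 0, "MemtoReg": 0, "RegWrite": 1,        # R-type
           "MemRead": 0, "MemWrite": 0, "Branch": 0, "ALUOp": 0b10},
    0x23: {"RegDst": 0, "ALUSrc": 1, "MemtoReg": 1, "RegWrite": 1,        # lw
           "MemRead": 1, "MemWrite": 0, "Branch": 0, "ALUOp": 0b00},
    0x2B: {"RegDst": None, "ALUSrc": 1, "MemtoReg": None, "RegWrite": 0,  # sw
           "MemRead": 0, "MemWrite": 1, "Branch": 0, "ALUOp": 0b00},
    0x04: {"RegDst": None, "ALUSrc": 0, "MemtoReg": None, "RegWrite": 0,  # beq
           "MemRead": 0, "MemWrite": 0, "Branch": 1, "ALUOp": 0b01},
}

# R-type funct (Instr[5-0]) -> 4-bit ALU control: add, sub, and, or, slt
FUNCT_TO_ALU = {0x20: 0b0010, 0x22: 0b0110, 0x24: 0b0000, 0x25: 0b0001, 0x2A: 0b0111}

def alu_control(alu_op: int, funct: int) -> int:
    if alu_op == 0b00:
        return 0b0010              # lw/sw: add base + offset
    if alu_op == 0b01:
        return 0b0110              # beq: subtract, then check the zero flag
    return FUNCT_TO_ALU[funct]     # R-type: the funct field decides

def pc_src(branch: int, zero: int) -> int:
    """PCSrc = Branch AND zero: take the branch target only for a taken beq."""
    return branch & zero

def asserted_signals(opcode: int) -> list:
    """Names of the 1-bit control signals set to 1 for this opcode."""
    return sorted(name for name, value in MAIN_CONTROL[opcode].items()
                  if name != "ALUOp" and value == 1)

print(asserted_signals(0x23))  # ['ALUSrc', 'MemRead', 'MemtoReg', 'RegWrite']
```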
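[Figure: single-cycle timing over Cycle 1 and Cycle 2: lw and sw each take one full clock cycle; the part of the cycle that a shorter instruction does not need is wasted]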
❑ Pipelining:
• Start fetching and executing the next instruction before the current one has completed
• Overlapping execution
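◼ With 4 loads (each load passes through 4 stages of 0.5 hour, i.e. 2 hours per load)
Tnormal = 4*2 = 8 hours
Tpipeline = (3+4)/2 = 3.5 hours
◼ With n loads
Tnormal = n*2 hours
Tpipeline = (3+n)/2 hours: the first load takes 2 hours and each of the other n-1 loads finishes 0.5 hour after the previous one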
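A quick Python check of the laundry numbers, assuming 4 equal stages of 0.5 hour (the split implied by 2 hours per load and the (3+n)/2 formula). The function names are hypothetical.

```python
def sequential_hours(n_loads: int, n_stages: int = 4, stage_hours: float = 0.5) -> float:
    """Each load runs through all stages before the next one starts."""
    return n_loads * n_stages * stage_hours

def pipelined_hours(n_loads: int, n_stages: int = 4, stage_hours: float = 0.5) -> float:
    """First load needs n_stages slots; every later load finishes one slot after the previous."""
    return (n_stages + n_loads - 1) * stage_hours

print(sequential_hours(4), pipelined_hours(4))  # 8.0 3.5
```

As n grows, the ratio n*2 / ((3+n)/2) approaches 4, the number of stages: pipelining improves throughput, not the time of a single load.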
[Figure: two timing diagrams over Cycles 1 to 5. Left: start fetching and executing the next instruction before the current one has completed. Right: more than one instruction is executed at a time]
Single Cycle versus Pipeline
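[Figure: single-cycle timing (lw, sw, with wasted time) compared with pipelined timing]
[Figure: pipeline diagram, instruction order vs time: successive instructions (Inst 1 to Inst 4) each pass through Mem, Reg, ALU, Mem, Reg; in one cycle an earlier instruction's data access and a later instruction's "reading instruction from memory" both need the single memory]
❑ With one shared memory this is a structural hazard. Fix with separate instruction and data memories (I$ and D$)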
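A toy Python model of the conflict: it counts, per clock cycle, how many instructions need each memory. It assumes a 5-stage pipeline (IF in cycle i, MEM in cycle i + 3, counting from 0), one instruction issued per cycle with no stalls, and a hypothetical program whose first instruction is a load.

```python
from collections import Counter

def memory_conflicts(program: list, separate_memories: bool) -> list:
    """Cycles (0-based) in which two instructions need the same memory.

    Assumes a 5-stage pipeline with one instruction issued per cycle and no stalls:
    instruction i fetches in cycle i (IF) and, if it is lw/sw, accesses data in cycle i + 3 (MEM).
    """
    fetch_mem = "I$" if separate_memories else "Mem"
    data_mem = "D$" if separate_memories else "Mem"
    uses = Counter()
    for i, instr in enumerate(program):
        uses[(i, fetch_mem)] += 1                 # instruction fetch
        if instr.split()[0] in ("lw", "sw"):
            uses[(i + 3, data_mem)] += 1          # data access in MEM
    return sorted(cycle for (cycle, _mem), count in uses.items() if count > 1)

program = ["lw", "Inst 1", "Inst 2", "Inst 3", "Inst 4"]   # hypothetical: first instruction is a load
print(memory_conflicts(program, separate_memories=False))  # [3]
print(memory_conflicts(program, separate_memories=True))   # []
```

Cycle 3 here is the fourth clock cycle: the load's data access collides with the fetch of Inst 3, and splitting the memory into I$ and D$ removes the collision.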
How About Register File Access?
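[Figure: pipeline diagram over time (clock cycles): successive instructions (Inst 1, Inst 2 and the instruction add $2,$1,) each pass through IM, Reg, ALU, DM, Reg, so in some cycles one instruction writes the register file while another reads it]
The register file access hazard is avoided by doing writes in the first half of the cycle and reads in the second half.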
[Figure: data dependences in the pipeline: add $1, is followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5, all of which read $1]
[Figure: load followed by dependent instructions: lw $1,4($2) is followed by sub $4,$1,$5; and $6,$1,$7; or $8,$1,$9; xor $4,$1,$5, all of which read $1]
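A minimal Python sketch of the dependence check for the two sequences above. It assumes no forwarding, so a value is usable only after the producer's write-back; with writes in the first half of the cycle and reads in the second half, a consumer 3 or more instructions after the producer reads the new value. The helpers `parse` and `raw_hazards` are hypothetical, and since the first sequence shows only `add $1,`, the check uses just its destination register.

```python
import re

def parse(instr: str) -> tuple:
    """Return (destination register or None, set of source registers)."""
    op, _, operands = instr.partition(" ")
    regs = re.findall(r"\$\w+", operands)
    if op in ("sw", "beq") or not regs:   # stores/branches write no register
        return None, set(regs)
    return regs[0], set(regs[1:])

def raw_hazards(program: list) -> list:
    """(consumer, register, distance) for reads that happen before the producer's write-back.

    Assumes no forwarding and a register file written in the first half of a cycle and
    read in the second half, so a distance of 3 or more instructions is safe.
    """
    last_writer = {}                      # register -> index of its most recent producer
    found = []
    for j, instr in enumerate(program):
        dest, srcs = parse(instr)
        for reg in sorted(srcs):
            if reg in last_writer and j - last_writer[reg] < 3:
                found.append((instr, reg, j - last_writer[reg]))
        if dest is not None:
            last_writer[dest] = j
    return found

alu_seq = ["add $1,", "sub $4,$1,$5", "and $6,$1,$7", "or $8,$1,$9", "xor $4,$1,$5"]
load_seq = ["lw $1,4($2)", "sub $4,$1,$5", "and $6,$1,$7", "or $8,$1,$9", "xor $4,$1,$5"]
print(raw_hazards(alu_seq))   # [('sub $4,$1,$5', '$1', 1), ('and $6,$1,$7', '$1', 2)]
print(raw_hazards(load_seq))  # [('sub $4,$1,$5', '$1', 1), ('and $6,$1,$7', '$1', 2)]
```

Both sequences give the same answer under this model, because without forwarding only the moment the value reaches the register file (write-back) matters, and add and lw both write back in the same stage.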
❑ In the MIPS pipeline
Need to compare registers and compute the branch target early in the pipeline
Add hardware to do it in the ID stage
[Figure: pipeline diagram: beq followed by lw, Inst 3 and Inst 4, each passing through IM, Reg, ALU, DM, Reg]
[Figure: branch prediction, two cases: prediction correct and prediction incorrect]
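With the branch resolved in ID, a wrong prediction wastes the one instruction fetched after the branch, i.e. one cycle. A Python sketch of the resulting average CPI; the branch frequency and prediction accuracy in the example are hypothetical, and `effective_cpi` is a name introduced here.

```python
def effective_cpi(branch_fraction: float, prediction_accuracy: float,
                  penalty_cycles: int = 1, base_cpi: float = 1.0) -> float:
    """Average CPI when each mispredicted branch costs penalty_cycles extra cycles."""
    mispredict_rate = 1.0 - prediction_accuracy
    return base_cpi + branch_fraction * mispredict_rate * penalty_cycles

# Hypothetical workload: 20% branches, 90% predicted correctly, 1-cycle penalty.
print(effective_cpi(0.2, 0.9))
```

A correct prediction costs nothing, so CPI stays at the ideal value of 1 only when every branch is predicted correctly.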
[Figure: pipeline diagram: Inst 0 to Inst 4 each pass through IM, Reg, ALU, DM, Reg, one cycle apart]
Once the pipeline is full, one instruction is completed every cycle, so CPI = 1.
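The CPI = 1 figure ignores the cycles spent filling the pipeline. A Python sketch, assuming 5 stages and no stalls: n instructions need 5 + (n - 1) cycles, so CPI approaches 1 as n grows. The function names are hypothetical.

```python
def pipeline_cycles(n_instructions: int, n_stages: int = 5) -> int:
    """Cycles to finish n instructions with no stalls: fill the pipeline, then one per cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + n_instructions - 1

def pipeline_cpi(n_instructions: int, n_stages: int = 5) -> float:
    """Average cycles per instruction, including the pipeline fill time."""
    return pipeline_cycles(n_instructions, n_stages) / n_instructions

for n in (5, 100, 1_000_000):
    print(n, pipeline_cycles(n), round(pipeline_cpi(n), 6))
```

For the five instructions in the diagram this gives 9 cycles (CPI 1.8); for long instruction streams the fill cost is amortized and CPI is effectively 1.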