CA Chap4 Cpu Nlt2021 Part1


Chapter 4: The Processor

Ngo Lam Trung

[with materials from Computer Organization and Design, 4th Edition,
Patterson & Hennessy, © 2008, MK
and M.J. Irwin’s presentation, PSU 2008]
CO&ISA, NLT 2021 1
Review

Performance metric
CPU time = CPI * CC * IC
CPI: cycles per instruction
CC: clock cycle time
IC: instruction count
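As a quick illustration of the formula above, the following Python sketch computes CPU time from IC, CPI and CC. The function name and the sample numbers are illustrative assumptions, not values taken from this slide.

```python
def cpu_time(ic, cpi, cc_seconds):
    """CPU time = IC * CPI * CC (function and parameter names are illustrative)."""
    return ic * cpi * cc_seconds

# Hypothetical program: 1,000,000 instructions, CPI = 1, clock cycle = 800 ps
t_base = cpu_time(1_000_000, 1.0, 800e-12)

# Same program and CPI with a clock cycle that is 4 times shorter (200 ps)
t_fast = cpu_time(1_000_000, 1.0, 200e-12)

print(t_base, t_fast, t_base / t_fast)
```

Reducing any one of the three factors while the other two stay fixed reduces CPU time proportionally.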

How to improve?
• IC:
• CC:
• CPI:

In this chapter
• Implementation of data path
• How to get CPI < 1



Overview

❑ We will examine two MIPS implementations
A simplified version
A more realistic pipelined version

❑ Limit to a simple subset of MIPS ISA
Memory reference: lw, sw
Arithmetic/logical: add, sub, and, or, slt
Control transfer: beq, j

❑ Implementation of a real CPU with other instructions is
similar to the simplified version (theoretically!)


General instruction cycle
❑ Generic implementation
use the program counter (PC) to supply the instruction address and fetch the
instruction from memory (and update the PC: PC = PC+4)
decode the instruction (and read registers)
execute the instruction
[Figure: instruction cycle loop: Fetch (PC = PC+4), Decode, Exec]

❑ All instructions (except j) use the ALU after reading the registers
ALU: Arithmetic and Logic Unit, where the arithmetic and logic
operations are executed

❑ In this chapter: implementation of a CPU that can execute
the simple subset of MIPS ISA


CPU implementation with MUXes and Control

❑ Multiplexer
❑ Control

Don’t panic! We’ll build this incrementally.


Fetching Instructions
❑ Fetching instructions involves
reading the instruction from the Instruction Memory
updating the PC value to be the address of the next instruction
in memory
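A minimal sketch of the fetch step, assuming a dictionary stands in for the Instruction Memory. The addresses, the `instruction_memory` contents and the `fetch` helper are hypothetical and only illustrate the two actions listed above.

```python
# Hypothetical Instruction Memory: byte address -> 32-bit instruction word
instruction_memory = {
    0x00400000: 0x02538820,  # add $s1, $s2, $s3
    0x00400004: 0x8E090004,  # lw  $t1, 4($s0)
}

def fetch(pc):
    """Read the instruction at address pc and compute the next PC (PC + 4)."""
    instruction = instruction_memory[pc]
    next_pc = (pc + 4) & 0xFFFFFFFF  # keep the PC within 32 bits
    return instruction, next_pc

instr, pc = fetch(0x00400000)
print(hex(instr), hex(pc))
```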

[Figure: instruction fetch datapath: the PC supplies the Read Address of the Instruction Memory, which outputs the Instruction; an adder computes PC + 4 for the next fetch]


Decoding Instructions
❑ Decoding instructions involves
sending the fetched instruction’s opcode and function field
bits to the control unit
The control unit sends the appropriate control signals to the other
parts of the CPU to execute the operations that correspond to
the instruction

[Figure: decode datapath: the Control Unit and the Register File (inputs Read Addr 1, Read Addr 2, Write Addr, Write Data; outputs Read Data 1, Read Data 2), both fed by the fetched Instruction]

• Example: reading two values from the Register File
→ Register File addresses are contained in the instruction

Executing R Format Operations
❑ R format operations (add, sub, slt, and, or)
R-type: op [31-26] | rs [25-21] | rt [20-16] | rd [15-11] | shamt [10-6] | funct [5-0]
read two register operands rs and rt
perform operation (op and funct) on values in rs and rt
store the result back into the Register File (into location rd)
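The three steps above can be sketched in Python as follows. The helper `decode_r_type`, the register list `regs` and the operand values are illustrative assumptions; only the `add` case (op = 0, funct = 0x20) is handled.

```python
def decode_r_type(instr):
    """Split a 32-bit R-type instruction word into its six fields."""
    return {
        "op":    (instr >> 26) & 0x3F,  # bits 31-26
        "rs":    (instr >> 21) & 0x1F,  # bits 25-21
        "rt":    (instr >> 16) & 0x1F,  # bits 20-16
        "rd":    (instr >> 11) & 0x1F,  # bits 15-11
        "shamt": (instr >> 6) & 0x1F,   # bits 10-6
        "funct": instr & 0x3F,          # bits 5-0
    }

regs = [0] * 32
regs[18], regs[19] = 5, 7              # hypothetical values in $s2 and $s3

fields = decode_r_type(0x02538820)     # add $s1, $s2, $s3
if fields["op"] == 0 and fields["funct"] == 0x20:
    # read rs and rt, add them, write the result to rd
    regs[fields["rd"]] = (regs[fields["rs"]] + regs[fields["rt"]]) & 0xFFFFFFFF

print(fields, regs[17])
```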

Example: add s1, s2, s3
- The values of s2 and s3 are sent to the ALU
- The ALU executes the s2 + s3 operation
- The result is stored into s1


Draw connection between a and b to form the execution unit?


[Figure: R-format datapath: Register File and ALU (overflow and zero outputs), with the RegWrite and ALU control signals]
We need the RegWrite control signal to control when the result is
written to the Register File

Executing Load and Store Operations
❑ Load and store operations involve
read register operands (including one base register)
compute memory address by adding the base to the offset
- The 16-bit offset field in the instruction is sign-extended to 32 bits
store: read from the Register File, write to the Data Memory
load: read from the Data Memory, write to the Register File
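A minimal sketch of the address computation and the two memory operations, assuming a dictionary as the Data Memory. The helper names, the base address in `$s0` and the stored value are hypothetical.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def effective_address(base, offset16):
    """Memory address = base register + sign-extended 16-bit offset."""
    return (base + sign_extend_16(offset16)) & 0xFFFFFFFF

data_memory = {}
regs = [0] * 32
regs[16] = 0x10010000   # hypothetical base address in $s0
regs[9] = 42            # hypothetical value in $t1

# sw $t1, -4($s0): read from the Register File, write to the Data Memory
addr = effective_address(regs[16], 0xFFFC)   # 0xFFFC encodes -4
data_memory[addr] = regs[9]

# lw $t2, -4($s0): read from the Data Memory, write to the Register File
regs[10] = data_memory[effective_address(regs[16], 0xFFFC)]

print(hex(addr), regs[10])
```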

[Figure: load/store datapath: Register File, ALU, Sign Extend (16 to 32 bits) and Data Memory, with the RegWrite, ALU control, MemWrite and MemRead signals]
Draw the necessary connections to form the execution unit?


Executing Branch Operations
❑ Branch operations involve
read register operands
compare the operands (subtract, check zero ALU output)
compute the branch target address: add the updated PC (PC+4) to the
16-bit sign-extended offset field of the instruction, shifted left by 2 bits
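The branch steps can be sketched as below. This assumes the offset is shifted left by 2 before the addition, as the "Shift left 2" block in the datapath figure indicates; the function names and sample addresses are hypothetical.

```python
def sign_extend_16(value):
    """Interpret the low 16 bits of value as a signed two's complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_target(pc, offset16):
    """Target = (PC + 4) + (sign-extended offset shifted left by 2)."""
    return ((pc + 4) + (sign_extend_16(offset16) << 2)) & 0xFFFFFFFF

def next_pc_beq(pc, rs_value, rt_value, offset16):
    """beq: subtract the operands and branch when the ALU zero output is set."""
    zero = ((rs_value - rt_value) & 0xFFFFFFFF) == 0
    return branch_target(pc, offset16) if zero else (pc + 4) & 0xFFFFFFFF

print(hex(next_pc_beq(0x00400000, 5, 5, 3)))   # operands equal: branch taken
print(hex(next_pc_beq(0x00400000, 5, 6, 3)))   # operands differ: fall through
```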

[Figure: branch datapath: one adder computes PC + 4; a second adder adds the sign-extended (16 to 32 bits) offset, shifted left by 2, to form the branch target address; the ALU zero output goes to the branch control logic]
Draw the necessary connections to form the execution unit?

Executing Jump Operations
❑ Jump operation involves
keeping the 4 highest bits of the updated PC (PC+4)
replacing the lower 28 bits of the PC with
- the lower 26 bits of the fetched instruction shifted left by 2 bits
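A short sketch of how the jump address is assembled, assuming the 4 highest bits come from PC + 4 (as labelled PC+4[31-28] in the later datapath figure). The function name and the sample instruction words are hypothetical.

```python
def jump_target(pc, instr):
    """Concatenate the top 4 bits of PC + 4 with the 26-bit field shifted left by 2."""
    upper4 = (pc + 4) & 0xF0000000          # bits 31-28 of the updated PC
    lower28 = (instr & 0x03FFFFFF) << 2     # 26-bit field becomes 28 bits
    return upper4 | lower28

# j with a 26-bit field of 0x0100000 (opcode 2 in bits 31-26)
print(hex(jump_target(0x00400008, 0x08100000)))
```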

[Figure: jump datapath: the lower 26 bits of the fetched instruction are shifted left by 2 to give 28 bits, which are combined with the 4 highest bits of PC + 4 to form the jump address]


Creating a Single Datapath from the Parts

❑ Assemble the datapath segments and add control lines
and multiplexors as needed
❑ Single cycle design: fetch, decode and execute each
instruction in one clock cycle
separate Instruction Memory and Data Memory are needed, though they
are both in main memory
multiplexors are needed at the input of shared elements, with
control lines to do the selection
write signals control writing to the Register File and Data
Memory


Fetch, R, and Memory Access Portions

[Figure: combined datapath for instruction fetch, R-type and memory-access instructions, with the RegWrite, ALUSrc, ALU control, MemWrite, MemRead and MemtoReg control signals]


Designing ALU
❑ Input/output
Two data inputs: a, b
ALU control signal
Data out
Flags out

❑ Operations
And, or, nor
Add, subtract


1-bit ALU with logic operation

❑ What do we have if
Operation = 0:
Operation = 1:



1-bit full-adder

➔ We already designed this in the previous chapter


1-bit ALU with AND, OR, ADD

❑ Operation = 00:
❑ Operation = 01:
❑ Operation = 10:

How about 1-bit ALU with AND, OR, ADD, SUB?
❑ a-b = a + (-b) = a + (2’s complement of b)

❑ For SUB operation
Operation =
Binvert =
CarryIn =

How to add NOR operation?
❑ $\overline{a + b} = \overline{a} \cdot \overline{b}$

❑ Find control signal for NOR operation:


Adding SLT operation
❑ Check highest bit of (a-b)
1: a < b → set
0: a >= b → clear
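The ALU operations discussed so far can be sketched at word level in Python. This is a behavioural sketch under stated assumptions, not the gate-level design: the operation names are illustrative strings rather than the actual control-signal encodings, and SLT simply takes the highest bit of (a - b) as on the slide, ignoring overflow.

```python
MASK = 0xFFFFFFFF  # 32-bit word

def alu(op, a, b):
    """Return (result, zero_flag) for 32-bit operands a and b."""
    a &= MASK
    b &= MASK
    if op == "and":
        result = a & b
    elif op == "or":
        result = a | b
    elif op == "nor":
        result = (~a & ~b) & MASK                # not(a or b) = (not a) and (not b)
    elif op == "add":
        result = (a + b) & MASK
    elif op in ("sub", "slt"):
        diff = (a + ((~b) & MASK) + 1) & MASK    # a - b = a + (2's complement of b)
        if op == "sub":
            result = diff
        else:
            result = (diff >> 31) & 1            # highest bit of (a - b)
    else:
        raise ValueError("unknown operation: " + op)
    return result, result == 0

print(alu("sub", 5, 7), alu("slt", 5, 7), alu("slt", 7, 5))
```

Inverting b and adding 1 through the carry input is what lets the same adder serve both ADD and SUB.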


Building 32-bit ALU from 1-bit ALUs

Final ALU with AND, OR, NOR, ADD, SUB, SLT

ALU symbol and interconnection


Adding Control Unit to build single datapath

[Figure: single-cycle datapath with Control Unit. The Control Unit decodes Instr[31-26] and generates RegDst, ALUSrc, MemtoReg, RegWrite, MemRead, MemWrite, Branch and ALUOp. Instr[25-21] and Instr[20-16] drive Read Addr 1 and Read Addr 2; a multiplexor selects Instr[20-16] or Instr[15-11] as Write Addr; Instr[15-0] is sign-extended from 16 to 32 bits; Instr[5-0] feeds the ALU control; PCSrc selects between PC + 4 and the branch target]


R-type Instruction Data/Control Flow

[Figure: single-cycle datapath with the connections active during an R-type instruction]


Load Word Instruction Data/Control Flow

[Figure: single-cycle datapath, shown first unmarked and then with the connections used by lw marked]
Mark the active connections during the execution flow


Branch Instruction Data/Control Flow

[Figure: single-cycle datapath, shown first unmarked and then with the connections used by beq marked]
Mark the active connections during the execution flow


Adding the Jump Operation
[Figure: single-cycle datapath extended for jump: Instr[25-0] is shifted left by 2 (26 to 28 bits) and concatenated with PC+4[31-28] to form the 32-bit jump address; an additional multiplexor controlled by the Jump signal selects it as the next PC]
Mark the active connections during the execution flow


Instruction Times (Critical Paths)
❑ What is the clock cycle time assuming negligible
delays for muxes, control unit, sign extend, PC access,
shift left 2, wires, setup and hold times except:
Instruction and Data Memory (200 ps)
ALU and adders (200 ps)
Register File access (reads or writes) (100 ps)

Instr.   I Mem  Reg Rd  ALU Op  D Mem  Reg Wr  Total
R-type   200    100     200     -      100     600
load     200    100     200     200    100     800
store    200    100     200     200    -       700
beq      200    100     200     -      -       500
jump     200    -       -       -      -       200
(all delays in ps; "-" means the unit is not on the instruction's critical path)
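The totals in the table can be reproduced by summing the delays of the units each instruction uses. In this sketch the dictionary names are illustrative; the delays and the unit lists are taken from the table.

```python
# Unit delays in picoseconds, as given on the slide
DELAY_PS = {"I Mem": 200, "Reg Rd": 100, "ALU Op": 200, "D Mem": 200, "Reg Wr": 100}

# Units on the critical path of each instruction class
UNITS_USED = {
    "R-type": ["I Mem", "Reg Rd", "ALU Op", "Reg Wr"],
    "load":   ["I Mem", "Reg Rd", "ALU Op", "D Mem", "Reg Wr"],
    "store":  ["I Mem", "Reg Rd", "ALU Op", "D Mem"],
    "beq":    ["I Mem", "Reg Rd", "ALU Op"],
    "jump":   ["I Mem"],
}

totals = {name: sum(DELAY_PS[u] for u in units) for name, units in UNITS_USED.items()}

# In a single-cycle design the clock must fit the slowest instruction
clock_cycle_ps = max(totals.values())
print(totals, clock_cycle_ps)
```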
Single Cycle Disadvantages & Advantages
❑ Uses the clock cycle inefficiently: the clock cycle must
be timed to accommodate the slowest instruction
• especially problematic for more complex instructions like
floating point multiply
[Figure: clock timing for lw in Cycle 1 and sw in Cycle 2; sw finishes before the end of its cycle and the remaining time is wasted]

❑ May be wasteful of area since some functional units
(e.g., adders) must be duplicated since they cannot be
shared during a clock cycle
but
❑ Is simple and easy to understand


How Can We Make The Computer Faster?
❑ Divide instruction cycles into smaller cycles
❑ Execute instructions in parallel
• With only one CPU?

❑ Pipelining:
• Start fetching and executing the next instruction before the
current one has completed
• Overlapping execution


Pipeline in real life



A more serious example: laundry work
❑ Pipelined laundry boosts performance up to 4 times
4 stages: washing, drying, ironing, folding (0.5 hour each)

◼ With 4 loads
Tnormal = 4*2 = 8 hours
Tpipeline = (3+4)/2 = 3.5 hours

◼ With n loads
Tnormal = n*2 hours
Tpipeline = (3+n)/2 hours

When n → ∞: Tnormal → 4*Tpipeline
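A small sketch of the laundry timing, assuming 4 stages of 0.5 hour each (which is what the formulas Tnormal = n*2 and Tpipeline = (3+n)/2 imply). The function and constant names are illustrative.

```python
STAGES = 4          # washing, drying, ironing, folding
STAGE_HOURS = 0.5   # assumed duration of each stage

def t_normal(n):
    """Each load passes through all stages before the next load starts."""
    return n * STAGES * STAGE_HOURS

def t_pipeline(n):
    """The first load takes STAGES steps; each further load adds one step."""
    return (STAGES - 1 + n) * STAGE_HOURS

for n in (4, 100, 1_000_000):
    print(n, t_normal(n), t_pipeline(n), t_normal(n) / t_pipeline(n))
```

The printed ratio approaches, but never reaches, the number of stages as n grows.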
MIPS Pipeline
❑ Five stages, one step per stage
• IFetch: Instruction Fetch and Update PC
• Dec: Registers Fetch and Instruction Decode
• Exec: Execute R-type; calculate memory address
• Mem: Read/write the data from/to the Data Memory
• WB: Write the result data into the register file

Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5
IFetch   Dec      Exec     Mem      WB

Execution time for a single instruction is always 5 cycles, regardless
of instruction operation


Instruction pipeline

Sequential execution (Cycle 1 to Cycle 5 for each instruction):
IFetch Dec Exec Mem WB | IFetch Dec Exec Mem WB

Instructions in pipeline:
         Cycle 1  Cycle 2  Cycle 3  Cycle 4  Cycle 5  Cycle 6  Cycle 7
lw       IFetch   Dec      Exec     Mem      WB
sw                IFetch   Dec      Exec     Mem      WB
R-type                     IFetch   Dec      Exec     Mem      WB

Start fetching and executing the next instruction before the current
one has completed
More than one instruction is executed at a time

Single Cycle versus Pipeline

Single Cycle Implementation (CC = 800 ps):
lw occupies Cycle 1 and sw occupies Cycle 2; sw finishes early and the rest of its cycle is wasted

Pipeline Implementation (CC = 200 ps):
         CC1     CC2     CC3     CC4   CC5   CC6   CC7
lw       IFetch  Dec     Exec    Mem   WB
sw               IFetch  Dec     Exec  Mem   WB
R-type                   IFetch  Dec   Exec  Mem   WB

❑ To complete an entire instruction in the pipelined case
takes 1000 ps (as compared to 800 ps for the single
cycle case). Why?
❑ How long does each take to complete 1,000,000 adds?
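One way to approach the last question is sketched below. It assumes an ideal pipeline with no hazards or stalls, so that n instructions need n + 4 cycles on 5 stages (4 cycles to fill the pipeline); the function names are illustrative.

```python
def single_cycle_time_ps(n, cc_ps=800):
    """Every instruction takes one full 800 ps clock cycle."""
    return n * cc_ps

def pipelined_time_ps(n, cc_ps=200, stages=5):
    """Ideal pipeline: (stages - 1) cycles to fill, then one instruction per cycle."""
    return (n + stages - 1) * cc_ps

n = 1_000_000
t_single = single_cycle_time_ps(n)
t_pipe = pipelined_time_ps(n)
print(t_single, t_pipe, t_single / t_pipe)

# Latency of one instruction in the pipeline: 5 stages, each one 200 ps cycle
print(pipelined_time_ps(1))
```

Under these assumptions a single instruction is slower in the pipeline (5 cycles of 200 ps), while the throughput for a long run is close to 4 times higher.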
Example with lw instructions

Single-cycle (Tc= 800ps)

Pipelined (Tc= 200ps)



Pipeline hazards
❑ Pipelining can lead us into trouble!!!
❑ Hazards: situations that prevent starting the next
instruction in the next cycle
• structural hazards: attempt to use the same resource by two
different instructions at the same time
• data hazards: attempt to use data before it is ready
- An instruction’s source operand(s) are produced by a prior
instruction still in the pipeline
• control hazards: attempt to make a decision about program
control flow before the condition has been evaluated and the
new PC target address calculated
- branch and jump instructions, exceptions

❑ In most cases, hazards can be resolved simply by waiting
• but we need better solutions to take advantage of the pipeline


Structural hazard
❑ Conflict for use of a resource
❑ In MIPS pipeline with a single memory
• Load/store requires data access
• Instruction fetch would have to stall for that cycle
- Would cause a pipeline “bubble”

❑ Hence, pipelined datapaths require separate
instruction/data memories
• Or separate instruction/data caches


A Single Memory Would Be a Structural Hazard
Time (clock cycles)
Instr. Order   CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
lw             Mem  Reg  ALU  Mem  Reg
Inst 1              Mem  Reg  ALU  Mem  Reg
Inst 2                   Mem  Reg  ALU  Mem  Reg
Inst 3                        Mem  Reg  ALU  Mem  Reg
Inst 4                             Mem  Reg  ALU  Mem  Reg

In CC4, lw is reading data from memory while Inst 3 is reading its instruction from memory

❑ Fix with separate instr and data memories (I$ and D$)

How About Register File Access?
Time (clock cycles)
Instr. Order   CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
add $1,        IM   Reg  ALU  DM   Reg
Inst 1              IM   Reg  ALU  DM   Reg
Inst 2                   IM   Reg  ALU  DM   Reg
add $2,$1,                    IM   Reg  ALU  DM   Reg

Fix register file access hazard by doing reads in the second half of the
cycle and writes in the first half
[Figure callouts: clock edge that controls register writing; clock edge that controls loading of pipeline state registers]

Data hazard

❑ An instruction depends on completion of data access by a
previous instruction
add $s0, $t0, $t1
sub $t2, $s0, $t3

CPU must wait until data in $s0 becomes valid


Example
❑ Dependencies backward in time cause hazards

               CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
add $1,        IM   Reg  ALU  DM   Reg
sub $4,$1,$5        IM   Reg  ALU  DM   Reg
and $6,$1,$7             IM   Reg  ALU  DM   Reg
or $8,$1,$9                   IM   Reg  ALU  DM   Reg
xor $4,$1,$5                       IM   Reg  ALU  DM   Reg

❑ Read before write data hazard

Example
❑ Dependencies backward in time cause hazards

Instr. Order   CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
lw $1,4($2)    IM   Reg  ALU  DM   Reg
sub $4,$1,$5        IM   Reg  ALU  DM   Reg
and $6,$1,$7             IM   Reg  ALU  DM   Reg
or $8,$1,$9                   IM   Reg  ALU  DM   Reg
xor $4,$1,$5                       IM   Reg  ALU  DM   Reg

❑ Load-use data hazard

Solving hazard with forwarding

❑ Use result when it is computed
Don’t wait for it to be stored in a register
Requires extra connections in the datapath

❑ Forward from EX to EX (output to input)


Load-Use Data Hazard

❑ One cycle stall is necessary
❑ Forward from MEM (output) to EX (input)


Code Scheduling to Avoid Stalls

❑ Reorder code to avoid use of load result in the next
instruction
❑ C code: A = B + E;
C = B + F;

Original order (13 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
(stall)
add $t3, $t1, $t2
sw $t3, 12($t0)
lw $t4, 8($t0)
(stall)
add $t5, $t1, $t4
sw $t5, 16($t0)

Reordered (11 cycles):
lw $t1, 0($t0)
lw $t2, 4($t0)
lw $t4, 8($t0)
add $t3, $t1, $t2
sw $t3, 12($t0)
add $t5, $t1, $t4
sw $t5, 16($t0)
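The two cycle counts can be checked with a small sketch. It assumes a 5-stage pipeline with forwarding, so the only stall is one cycle when an instruction uses the result of the load immediately before it. The tuple representation `(op, dest, sources)` and the function name are illustrative assumptions.

```python
def count_cycles(program, stages=5):
    """Cycles = instructions + (stages - 1) to fill the pipeline + load-use stalls."""
    stalls = 0
    for prev, cur in zip(program, program[1:]):
        prev_op, prev_dest, _ = prev
        _, _, cur_sources = cur
        # load-use hazard: the next instruction reads the register being loaded
        if prev_op == "lw" and prev_dest in cur_sources:
            stalls += 1
    return len(program) + (stages - 1) + stalls

original = [
    ("lw",  "$t1", ["$t0"]),
    ("lw",  "$t2", ["$t0"]),
    ("add", "$t3", ["$t1", "$t2"]),
    ("sw",  None,  ["$t3", "$t0"]),
    ("lw",  "$t4", ["$t0"]),
    ("add", "$t5", ["$t1", "$t4"]),
    ("sw",  None,  ["$t5", "$t0"]),
]

# Same instructions with the third load moved up, away from its consumer
reordered = [original[0], original[1], original[4],
             original[2], original[3], original[5], original[6]]

print(count_cycles(original), count_cycles(reordered))
```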


Control Hazards
❑ Branch determines flow of control
Fetching next instruction depends on branch outcome
Pipeline can’t always fetch correct instruction
- Still working on ID stage of branch

❑ In MIPS pipeline
Need to compare registers and compute target early in the
pipeline
Add hardware to do it in ID stage



Branch Instructions Cause Control Hazards
❑ Dependencies backward in time cause hazards

Instr. Order   CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8
beq            IM   Reg  ALU  DM   Reg
lw                  IM   Reg  ALU  DM   Reg
Inst 3                   IM   Reg  ALU  DM   Reg
Inst 4                        IM   Reg  ALU  DM   Reg


Stall on Branch

❑ Naïve approach: Wait until branch outcome determined
before fetching next instruction

Performance effect: assume that 17% of the instructions in a program are
branches. If each branch takes one extra cycle for the stall, then performance
will be 17% slower (CPI = 1.17).
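The CPI figure follows from adding the average stall cycles per instruction to the ideal CPI of 1. A minimal sketch, with an illustrative function name:

```python
def cpi_with_branch_stalls(base_cpi, branch_fraction, stall_cycles):
    """Average CPI when every branch adds a fixed number of stall cycles."""
    return base_cpi + branch_fraction * stall_cycles

cpi = cpi_with_branch_stalls(1.0, 0.17, 1)
print(cpi)
```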


Branch Prediction
❑ Predict outcome of branch
❑ Only stall if prediction is wrong
❑ In MIPS pipeline
Can predict branches not taken
Fetch instruction after branch, with no delay



MIPS with Predict Not Taken

[Figure: pipeline diagrams for the two cases: prediction correct and prediction incorrect]


More-Realistic Branch Prediction

❑ Static branch prediction
Based on typical branch behavior
Example: loop and if-statement branches
- Predict backward branches taken
- Predict forward branches not taken

❑ Dynamic branch prediction
Hardware measures actual branch behavior
- e.g., record recent history of each branch
Assume future behavior will continue the trend
- When wrong, stall while re-fetching, and update history
Accuracy can exceed 90%
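To make the idea of dynamic prediction concrete, here is a deliberately simple sketch that remembers only the last outcome of each branch. This scheme is an assumption chosen for illustration; the slides do not say which history mechanism the hardware uses, and the class name, branch address and trace are hypothetical.

```python
class LastOutcomePredictor:
    """Predict that each branch repeats its most recent outcome."""

    def __init__(self):
        self.history = {}  # branch address -> last outcome (True = taken)

    def predict(self, branch_pc):
        # Branches never seen before are predicted not taken
        return self.history.get(branch_pc, False)

    def update(self, branch_pc, taken):
        self.history[branch_pc] = taken

def accuracy(predictor, trace):
    """Fraction of correct predictions over a list of (branch_pc, taken) pairs."""
    correct = 0
    for branch_pc, taken in trace:
        if predictor.predict(branch_pc) == taken:
            correct += 1
        predictor.update(branch_pc, taken)
    return correct / len(trace)

# Hypothetical loop branch: taken 9 times, then not taken when the loop exits
trace = [(0x400020, True)] * 9 + [(0x400020, False)]
result = accuracy(LastOutcomePredictor(), trace)
print(result)
```

On this trace the predictor is wrong on the first iteration and on the loop exit; longer loops push the accuracy higher.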


Summary: Pipeline Operation
Time (clock cycles)
Instr. Order   CC1  CC2  CC3  CC4  CC5  CC6  CC7  CC8  CC9
Inst 0         IM   Reg  ALU  DM   Reg
Inst 1              IM   Reg  ALU  DM   Reg
Inst 2                   IM   Reg  ALU  DM   Reg
Inst 3                        IM   Reg  ALU  DM   Reg
Inst 4                             IM   Reg  ALU  DM   Reg

Once the pipeline is full, one instruction is completed every cycle, so CPI = 1
[Figure callout: time to fill the pipeline]


Pipelined datapath
❑ How to share/isolate data between different stages?
❑ Adding pipeline registers
❑ 5 stages, why only 4 registers?

Instruction fetch in pipeline

Instruction decode

Execution

Memory access

Write-back

Correction to support lw instruction

Pipeline diagram

Control signals in pipeline

Pipelined control

Solving read-after-write hazard

Solving load-use hazard

Solving control hazard

Pipeline when branch taken


Summary

❑ All modern-day processors use pipelining
❑ Pipelining doesn’t help latency of a single task, it helps
throughput of the entire workload
❑ Potential speedup: a CPI of 1 and a fast CC
❑ Must detect and resolve hazards
Stalling negatively affects CPI (makes CPI more than the ideal
of 1)
