L11 Pipelined Datapath and
                     Clock cycle
                     1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
sub $v0, $a0, $a1         IF   ID   EX   MEM  WB
and $t1, $t2, $t3              IF   ID   EX   MEM  WB
or $s0, $s1, $s2                    IF   ID   EX   MEM  WB
add $sp, $sp, -4                         IF   ID   EX   MEM  WB
Pipeline terminology
                     Clock cycle
                     1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
sub $v0, $a0, $a1         IF   ID   EX   MEM  WB
and $t1, $t2, $t3              IF   ID   EX   MEM  WB
or $s0, $s1, $s2                    IF   ID   EX   MEM  WB
add $sp, $sp, -4                         IF   ID   EX   MEM  WB
Pipelined datapath and control
Now we’ll see a basic implementation of a pipelined processor.
— The datapath and control unit share similarities with both the single-
cycle and multicycle implementations that we already saw.
— An example execution highlights important pipelining concepts.
In future lectures, we’ll discuss several complications of pipelining that
we’re hiding from you for now.
Pipelining concepts
A pipelined processor allows multiple instructions to execute at once, and
each instruction uses a different functional unit in the datapath.
This increases throughput, so programs can run faster.
— One instruction can finish executing on every clock cycle, and simpler
stages also lead to shorter cycle times.
                     Clock cycle
                     1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
sub $v0, $a0, $a1         IF   ID   EX   MEM  WB
and $t1, $t2, $t3              IF   ID   EX   MEM  WB
or $s0, $s1, $s2                    IF   ID   EX   MEM  WB
add $t5, $t6, $0                         IF   ID   EX   MEM  WB
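The timing in the chart above can be reproduced with a few lines of Python. This is only a sketch of an ideal pipeline with no stalls; the names `stage_schedule` and `total_cycles` are made up for illustration and do not come from the lecture.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_schedule(num_instructions, stages=STAGES):
    """Return {instruction index: {stage: clock cycle}} for an ideal pipeline.

    Instruction i (0-based) enters the first stage in cycle i + 1 and moves
    one stage forward per cycle. Stalls are not modeled.
    """
    return {
        i: {stage: i + s + 1 for s, stage in enumerate(stages)}
        for i in range(num_instructions)
    }

def total_cycles(num_instructions, num_stages=len(STAGES)):
    """Cycles to run the whole sequence: fill the pipeline, then one per instruction."""
    return num_stages + num_instructions - 1

schedule = stage_schedule(5)
print(schedule[0]["WB"])  # 5: the lw writes back in cycle 5
print(schedule[4]["WB"])  # 9: the last add writes back in cycle 9
print(total_cycles(5))    # 9
```

After the first instruction completes in cycle 5, one more instruction completes in every following cycle, which is where the throughput gain comes from.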
Pipelined Datapath
The whole point of pipelining is to allow multiple instructions to execute
at the same time.
We may need to perform several operations in the same cycle.
— Increment the PC and add registers at the same time.
— Fetch one instruction while another one reads or writes data.
                     Clock cycle
                     1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
sub $v0, $a0, $a1         IF   ID   EX   MEM  WB
and $t1, $t2, $t3              IF   ID   EX   MEM  WB
or $s0, $s1, $s2                    IF   ID   EX   MEM  WB
add $t5, $t6, $0                         IF   ID   EX   MEM  WB
One register file is enough
We need only one register file to support both the ID and WB stages.
[Figure: the register file ("Registers"), with inputs Read register 1, Read register 2, Write register, and Write data, and outputs Read data 1 and Read data 2.]
Single-cycle datapath, slightly rearranged
[Figure: single-cycle datapath. Components: PC, two adders (PC + 4 and the branch target, fed by Shift left 2), Instruction memory, Registers, Sign extend, ALU, and Data memory. Instruction fields: Instr [20-16], Instr [15-11], Instr [15-0]. Control signals: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg.]
Registers added to the multi-cycle
[Figure: multicycle datapath with its intermediate registers: Instruction register, Memory data register, A, B, and ALU Out. Instruction fields: [31-26], [25-21], [20-16], [15-11], [15-0]. Control signals: PCWrite, IorD, MemRead, MemWrite, IRWrite, RegDst, RegWrite, ALUSrcA, ALUSrcB, ALUOp, PCSource, MemToReg.]
Pipeline registers
We’ll add intermediate registers to our pipelined datapath too.
There’s a lot of information to save, however. We’ll simplify our diagrams
by drawing just one big pipeline register between each stage.
The registers are named for the stages they connect.
Pipelined datapath
[Figure: pipelined datapath. The single-cycle components (Instruction memory, Registers, Sign extend, ALU, Data memory) separated by pipeline registers. Control signals: PCSrc, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg.]
Propagating values forward
Any data values required in later stages must be propagated through the
pipeline registers.
The most extreme example is the destination register.
— The rd field of the instruction word, retrieved in the first stage (IF),
determines the destination register. But that register isn’t updated
until the fifth stage (WB).
— Thus, the rd field must be passed through all of the pipeline stages,
as shown in red on the next slide.
Why can’t we keep a single instruction register like we did in the
multicycle datapath?
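One way to see why a single instruction register is not enough: in any given cycle each stage is working on a different instruction, so each stage needs its own copy of that instruction's destination. The sketch below shifts destination register numbers through the four pipeline registers on every clock edge. It is a simplification (IF/ID really holds the whole instruction word, and the destination is only selected later), and `trace_destinations` is a hypothetical helper name. The destinations 8, 2, 9, 16, 13 are those of the five-instruction example used in this lecture.

```python
from collections import deque

PIPELINE_REGS = ["IF/ID", "ID/EX", "EX/MEM", "MEM/WB"]

def trace_destinations(dest_regs):
    """Shift each instruction's destination number through the pipeline registers.

    Returns one snapshot per clock edge, mapping pipeline register name to the
    destination it holds (None for an empty slot).
    """
    regs = deque([None] * len(PIPELINE_REGS), maxlen=len(PIPELINE_REGS))
    snapshots = []
    # Feed the instructions, then enough empty slots to drain the pipeline.
    for dest in list(dest_regs) + [None] * len(PIPELINE_REGS):
        regs.appendleft(dest)  # newest enters IF/ID, oldest falls out of MEM/WB
        snapshots.append(dict(zip(PIPELINE_REGS, regs)))
    return snapshots

snapshots = trace_destinations([8, 2, 9, 16, 13])
# After the fourth clock edge (that is, during cycle 5) four different
# destinations are in flight at once.
print(snapshots[3])
# {'IF/ID': 16, 'ID/EX': 9, 'EX/MEM': 2, 'MEM/WB': 8}
```

The value in MEM/WB is the one the register file uses as its write register, four clock edges after the instruction was fetched.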
The destination register
[Figure: pipelined datapath with the path of the destination register number highlighted, from the Instr [20-16] and Instr [15-11] fields through the pipeline registers to the Write register input of the register file.]
What about control signals?
The control signals are generated in the same way as in the single-cycle
processor—after an instruction is fetched, the processor decodes it and
produces the appropriate control values.
But just like before, some of the control signals will not be needed until
some later stage and clock cycle.
These signals must be propagated through the pipeline until they reach
the appropriate stage. We can just pass them in the pipeline registers,
along with the other data.
Control signals can be categorized by the pipeline stage that uses them.
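As a rough illustration of this grouping, the sketch below stores the control values for lw by the stage that consumes them and shows which groups each pipeline register still has to carry. The values are the ones shown for lw in the example cycles later in these slides; PCSrc is left out because the example does not show its value. The function name `control_in_pipeline_register` is an assumption made for this sketch.

```python
# Control values for lw, grouped by the stage that uses them.
LW_CONTROL = {
    "EX": {"RegDst": 0, "ALUSrc": 1, "ALUOp": "add"},
    "M": {"MemRead": 1, "MemWrite": 0},
    "WB": {"RegWrite": 1, "MemToReg": 1},
}

# Each pipeline register carries only the groups that later stages still need.
CARRIED_GROUPS = {
    "ID/EX": ["EX", "M", "WB"],
    "EX/MEM": ["M", "WB"],
    "MEM/WB": ["WB"],
}

def control_in_pipeline_register(control, register):
    """Return the control groups held in the given pipeline register."""
    return {group: control[group] for group in CARRIED_GROUPS[register]}

print(control_in_pipeline_register(LW_CONTROL, "EX/MEM"))
# {'M': {'MemRead': 1, 'MemWrite': 0}, 'WB': {'RegWrite': 1, 'MemToReg': 1}}
```

Once a stage has used its group, those signals are dropped, so the pipeline registers get narrower toward the end of the pipeline.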
Pipelined datapath and control
[Figure: pipelined datapath and control. Pipeline registers IF/ID, ID/EX, EX/MEM, and MEM/WB separate the stages. The Control unit produces three groups of signals: EX, M, and WB. ID/EX carries all three groups, EX/MEM carries M and WB, and MEM/WB carries WB. Control signals: PCSrc, RegWrite, RegDst, ALUSrc, ALUOp, MemRead, MemWrite, MemToReg.]
Notes about the diagram
The control signals are grouped together in the pipeline registers, just to
make the diagram a little clearer.
Not all of the registers have a write enable signal.
— Because the datapath fetches one instruction per cycle, the PC must
also be updated on each clock cycle. Including a write enable for the
PC would be redundant.
— Similarly, the pipeline registers are also written on every cycle, so no
explicit write signals are needed.
An example execution sequence
Here’s a sample sequence of instructions to execute.
We’ll make some assumptions, just so we can show actual data values.
— Each register contains its number plus 100. For instance, register $8
contains 108, register $29 contains 129, and so forth.
— Every data memory location contains 99.
Our pipeline diagrams will follow some conventions.
— An X indicates values that aren’t important, like the constant field of
an R-type instruction.
— Question marks ??? indicate values we don’t know, usually resulting
from instructions coming before and after the ones in our example.
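Under these assumptions the values that should appear in the cycle-by-cycle diagrams can be worked out ahead of time. The sketch below does that in Python, executing the five instructions one after another. That gives the same results as the pipeline here only because none of these instructions reads a register written by an earlier one. Treating register $0 as always zero is an assumption of the sketch, consistent with the value 0 shown when the add reads register 0; the helper names are hypothetical.

```python
MEMORY_VALUE = 99  # every data memory location contains 99

def initial_registers():
    """Each register holds its number plus 100; $0 is assumed to read as 0."""
    regs = {n: n + 100 for n in range(1, 32)}
    regs[0] = 0
    return regs

def run_example(regs):
    """Execute the five example instructions; return the lw effective address."""
    address = regs[29] + 4           # lw  $8, 4($29)
    regs[8] = MEMORY_VALUE
    regs[2] = regs[4] - regs[5]      # sub $2, $4, $5
    regs[9] = regs[10] & regs[11]    # and $9, $10, $11
    regs[16] = regs[17] | regs[18]   # or  $16, $17, $18
    regs[13] = regs[14] + regs[0]    # add $13, $14, $0
    return address

regs = initial_registers()
address = run_example(regs)
print(address)                                    # 133
print(regs[8], regs[2], regs[9], regs[16], regs[13])  # 99 -1 110 119 114
```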
Cycle 1 (filling)
IF: lw $8, 4($29) ID: ??? EX: ??? MEM: ??? WB: ???
[Figure: pipelined datapath during cycle 1. The PC adder output is 1004; every other value is still unknown (???).]
Cycle 2
IF: sub $2, $4, $5 ID: lw $8, 4($29) EX: ??? MEM: ??? WB: ???
[Figure: pipelined datapath during cycle 2. PC adder output: 1008; IF/ID holds 1004. ID (lw): Read register 1 = 29, Read data 1 = 129, sign-extended constant = 4, Instr [20-16] = 8, Instr [15-11] = X. Values in the later stages are unknown (???).]
Cycle 3
IF: and $9, $10, $11 ID: sub $2, $4, $5 EX: lw $8, 4($29) MEM: ??? WB: ???
[Figure: pipelined datapath during cycle 3. PC adder output: 1012; IF/ID holds 1008. ID (sub): reads registers 4 and 5, giving 104 and 105; Instr [15-11] = 2. EX (lw): ALUSrc = 1, ALUOp = add, RegDst = 0; the ALU computes 129 + 4 = 133; destination register 8. Values in the later stages are unknown (???).]
Cycle 4
IF: or $16, $17, $18 ID: and $9, $10, $11 EX: sub $2, $4, $5 MEM: lw $8, 4($29) WB: ???
[Figure: pipelined datapath during cycle 4. PC adder output: 1016; IF/ID holds 1012. ID (and): reads registers 10 and 11, giving 110 and 111; Instr [15-11] = 9. EX (sub): ALUSrc = 0, ALUOp = sub, RegDst = 1; the ALU computes 104 - 105 = -1; destination register 2. MEM (lw): MemRead = 1, MemWrite = 0; address 133, read data 99; destination register 8. WB values are unknown (???).]
Cycle 5 (full)
IF: add $13, $14, $0 ID: or $16, $17, $18 EX: and $9, $10, $11 MEM: sub $2, $4, $5 WB: lw $8, 4($29)
[Figure: pipelined datapath during cycle 5. PC adder output: 1020; IF/ID holds 1016. ID (or): reads registers 17 and 18, giving 117 and 118; Instr [15-11] = 16. EX (and): ALUSrc = 0, ALUOp = and, RegDst = 1; the ALU computes 110 AND 111 = 110; destination register 9. MEM (sub): MemRead = 0, MemWrite = 0; ALU result -1; destination register 2. WB (lw): RegWrite = 1, MemToReg = 1; 99 is written to register 8.]
Cycle 6 (emptying)
IF: ??? ID: add $13, $14, $0 EX: or $16, $17, $18 MEM: and $9, $10, $11 WB: sub $2, $4, $5
[Figure: pipelined datapath during cycle 6. IF/ID holds 1020. ID (add): reads registers 14 and 0, giving 114 and 0; Instr [15-11] = 13. EX (or): ALUSrc = 0, ALUOp = or, RegDst = 1; the ALU computes 117 OR 118 = 119; destination register 16. MEM (and): MemRead = 0, MemWrite = 0; ALU result 110; destination register 9. WB (sub): RegWrite = 1, MemToReg = 0; -1 is written to register 2.]
Cycle 7
IF: ??? ID: ??? EX: add $13, $14, $0 MEM: or $16, $17, $18 WB: and $9, $10, $11
[Figure: pipelined datapath during cycle 7. WB (and): RegWrite = 1; 110 is written to register 9.]
Cycle 8
IF: ??? ID: ??? EX: ??? MEM: add $13, $14, $0 WB: or $16, $17, $18
[Figure: pipelined datapath during cycle 8. WB (or): RegWrite = 1; 119 is written to register 16.]
Cycle 9
IF: ??? ID: ??? EX: ??? MEM: ??? WB: add $13, $14, $0
[Figure: pipelined datapath during cycle 9. WB (add): RegWrite = 1; 114 is written to register 13.]
That’s a lot of diagrams there
                     Clock cycle
                     1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
sub $v0, $a0, $a1         IF   ID   EX   MEM  WB
and $t1, $t2, $t3              IF   ID   EX   MEM  WB
or $s0, $s1, $s2                    IF   ID   EX   MEM  WB
add $t5, $t6, $0                         IF   ID   EX   MEM  WB
Compare the last nine slides with the pipeline diagram above.
— You can see how instruction executions are overlapped.
— Each functional unit is used by a different instruction in each cycle.
— The pipeline registers save control and data values generated in
previous clock cycles for later use.
— When the pipeline is full in clock cycle 5, all of the hardware units
are utilized. This is the ideal situation, and what makes pipelined
processors so fast.
Try to understand this example or the similar one in the book at the end
of Section 6.3.
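The per-cycle view in the preceding slides can also be generated mechanically. The sketch below reports which instruction occupies each stage in a given clock cycle and counts how many stages are busy, assuming an ideal pipeline with no stalls. The function names `occupancy` and `busy_stages` are made up for this illustration.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]
PROGRAM = [
    "lw $8, 4($29)",
    "sub $2, $4, $5",
    "and $9, $10, $11",
    "or $16, $17, $18",
    "add $13, $14, $0",
]

def occupancy(cycle, program=PROGRAM, stages=STAGES):
    """Map each stage to the instruction it holds in the given 1-based cycle."""
    result = {}
    for s, stage in enumerate(stages):
        index = cycle - 1 - s  # the instruction that entered IF s cycles earlier
        result[stage] = program[index] if 0 <= index < len(program) else None
    return result

def busy_stages(cycle):
    """Number of stages doing work for this program in the given cycle."""
    return sum(instr is not None for instr in occupancy(cycle).values())

print(occupancy(5)["WB"])                       # lw $8, 4($29)
print([busy_stages(c) for c in range(1, 10)])   # [1, 2, 3, 4, 5, 4, 3, 2, 1]
```

Only cycle 5 keeps all five stages busy for this short program; a longer instruction stream would stay at five busy stages for most of its run.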
Performance Revisited
[Figure not recoverable; surviving labels: ALU, mem Read, Mem Write.]
Ideal speedup
                     Clock cycle
                     1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
sub $v0, $a0, $a1         IF   ID   EX   MEM  WB
and $t1, $t2, $t3              IF   ID   EX   MEM  WB
or $s0, $s1, $s2                    IF   ID   EX   MEM  WB
add $sp, $sp, -4                         IF   ID   EX   MEM  WB
The pipelining paradox
                     Clock cycle
                     1    2    3    4    5    6    7    8    9
lw $t0, 4($sp)       IF   ID   EX   MEM  WB
sub $v0, $a0, $a1         IF   ID   EX   MEM  WB
and $t1, $t2, $t3              IF   ID   EX   MEM  WB
or $s0, $s1, $s2                    IF   ID   EX   MEM  WB
add $sp, $sp, -4                         IF   ID   EX   MEM  WB
Pipelining does not improve the execution time of any single instruction.
Each instruction here actually takes longer to execute than in a single-
cycle datapath (15ns vs. 12ns)!
Instead, pipelining increases the throughput, or the amount of work done
per unit time. Here, several instructions are executed together in each
clock cycle.
The result is improved execution time for a sequence of instructions, such
as an entire program.
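The trade-off can be put into numbers. The sketch below takes the 12ns single-cycle time and the 15ns pipelined latency from above, and assumes the 15ns is five stages at a 3ns clock (that split is an assumption of the sketch, not stated here). The function names are hypothetical.

```python
SINGLE_CYCLE_NS = 12       # one instruction per 12 ns cycle
PIPELINE_LATENCY_NS = 15   # one instruction through all stages
NUM_STAGES = 5
STAGE_NS = PIPELINE_LATENCY_NS // NUM_STAGES  # assumed 3 ns pipelined clock

def single_cycle_time(n):
    """Total time in ns for n instructions on the single-cycle datapath."""
    return n * SINGLE_CYCLE_NS

def pipelined_time(n):
    """Total time in ns for n instructions on an ideal pipeline (no stalls)."""
    return (NUM_STAGES + n - 1) * STAGE_NS

for n in (1, 5, 1000):
    print(n, single_cycle_time(n), pipelined_time(n))
# 1 12 15
# 5 60 27
# 1000 12000 3012
```

A single instruction is slower on the pipeline (15ns against 12ns), but the five-instruction example already finishes in less than half the time, and for long sequences the speedup approaches 12 / 3 = 4 under these assumptions.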
Instruction set architectures and pipelining
The MIPS instruction set was designed especially for easy pipelining.
— All instructions are 32 bits long, so the instruction fetch stage just
needs to read one word on every clock cycle.
— Fields are in the same position in different instruction formats—the
opcode is always the first six bits, rs is the next five bits, etc. This
makes things easy for the ID stage.
— MIPS is a register-to-register architecture, so arithmetic operations
cannot contain memory references. This keeps the pipeline shorter
and simpler.
Pipelining is harder for older, more complex instruction sets.
— If different instructions had different lengths or formats, the fetch
and decode stages would need extra time to determine the actual
length of each instruction and the position of the fields.
— With memory-to-memory instructions, additional pipeline stages may
be needed to compute effective addresses and read memory before
the EX stage.
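The fixed field positions make decoding a matter of shifts and masks that do not depend on the instruction type. The sketch below extracts every field from a 32-bit word, using the bit ranges shown in the datapath figures ([31-26], [25-21], [20-16], [15-11], [15-0]). The example word is an encoding of lw $8, 4($29) built by hand; the lw opcode value 35 comes from the MIPS instruction set, not from these slides, and `decode` is a made-up name.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its fixed-position fields."""
    return {
        "opcode": (word >> 26) & 0x3F,  # bits 31-26
        "rs": (word >> 21) & 0x1F,      # bits 25-21
        "rt": (word >> 16) & 0x1F,      # bits 20-16
        "rd": (word >> 11) & 0x1F,      # bits 15-11 (ignored by I-type instructions)
        "imm": word & 0xFFFF,           # bits 15-0 (ignored by R-type instructions)
    }

# lw $8, 4($29): opcode 35, rs = 29, rt = 8, immediate = 4
word = (35 << 26) | (29 << 21) | (8 << 16) | 4
fields = decode(word)
print(hex(word))  # 0x8fa80004
print(fields)
# {'opcode': 35, 'rs': 29, 'rt': 8, 'rd': 0, 'imm': 4}
```

All fields can be extracted in parallel before the opcode is even examined, so the register file can start reading rs and rt right away in the ID stage.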
Summary
The pipelined datapath combines ideas from the single-cycle and multicycle
processors that we saw earlier.
— It uses multiple memories and ALUs.
— Instruction execution is split into several stages.
Pipeline registers propagate data and control values to later stages.
The MIPS instruction set architecture supports pipelining with uniform
instruction formats and simple addressing modes.