5 Single-Cycle Datapath
The Big Picture: The Performance Perspective
° Performance of a machine is determined by:
• Instruction count
• Clock cycle time
• Clock cycles per instruction (CPI)
(Figure: diagram relating Inst. Count, CPI, and Cycle Time)
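The three factors combine multiplicatively: CPU time = instruction count × CPI × clock cycle time. The sketch below makes this concrete. The instruction count and the 800 ps cycle time are assumed values chosen for illustration; they are not taken from the slides.

```python
def cpu_time_seconds(instruction_count: int, cpi: float, cycle_time_s: float) -> float:
    """CPU time = instruction count x clock cycles per instruction x cycle time."""
    return instruction_count * cpi * cycle_time_s

# Assumed numbers: 1e9 instructions, CPI = 1 (single-cycle), 800 ps clock period.
t = cpu_time_seconds(1_000_000_000, 1.0, 800e-12)
print(f"{t:.3f} s")  # 0.800 s
```

Halving any one factor while the others stay fixed halves the execution time. The catch is that the factors are not independent: a design change that lowers CPI often lengthens the cycle time.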
Category | Instruction | Example | Meaning | Comments
Arithmetic | subtract | sub $s1, $s2, $s3 | $s1 = $s2 - $s3 | Three operands; data in registers
Arithmetic | add immediate | addi $s1, $s2, 100 | $s1 = $s2 + 100 | Used to add constants
Data transfer | load word | lw $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Word from memory to register
Data transfer | store word | sw $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Word from register to memory
Data transfer | load byte | lb $s1, 100($s2) | $s1 = Memory[$s2 + 100] | Byte from memory to register
Data transfer | store byte | sb $s1, 100($s2) | Memory[$s2 + 100] = $s1 | Byte from register to memory
Data transfer | load upper immediate | lui $s1, 100 | $s1 = 100 * 2^16 | Loads constant in upper 16 bits
Conditional branch | branch on equal | beq $s1, $s2, 25 | if ($s1 == $s2) go to PC + 4 + 100 | Equal test; PC-relative branch
Conditional branch | branch on not equal | bne $s1, $s2, 25 | if ($s1 != $s2) go to PC + 4 + 100 | Not equal test; PC-relative
Conditional branch | set on less than | slt $s1, $s2, $s3 | if ($s2 < $s3) $s1 = 1; else $s1 = 0 | Compare less than; for beq, bne
Conditional branch | set less than immediate | slti $s1, $s2, 100 | if ($s2 < 100) $s1 = 1; else $s1 = 0 | Compare less than constant
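The beq/bne rows state the target as PC + 4 + 100 for an offset field of 25, because the offset counts 32-bit words and is multiplied by 4. The lui row places the immediate in the upper 16 bits. A small Python sketch can check both calculations. The starting PC value is an assumption made only for the example.

```python
MASK32 = 0xFFFFFFFF

def branch_target(pc: int, offset_words: int) -> int:
    """beq/bne target: PC + 4 + offset * 4, wrapped to 32 bits."""
    return (pc + 4 + (offset_words << 2)) & MASK32

def lui_value(imm16: int) -> int:
    """lui: the 16-bit immediate moves to the upper half, i.e. imm * 2**16."""
    return (imm16 & 0xFFFF) << 16

pc = 0x00400000                      # assumed address of the branch instruction
print(hex(branch_target(pc, 25)))    # 0x400068  (0x400000 + 4 + 100)
print(lui_value(100))                # 6553600   (100 * 65536)
```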
Register Transfer Logic (RTL)
CPU Implementations
° Single-cycle CPU (CPI = 1)
• All instructions execute in a single, long clock cycle
° Multi-cycle CPU (CPI = n)
• Instructions can take a different number of short clock cycles to execute
Abstract/Simplified View
(Figure: PC, instruction memory, registers, ALU, and data memory, with register numbers, addresses, and data flowing between them)
State Elements
° Unclocked vs. clocked
° Clocks used in synchronous logic
(Figure: clock waveform marking the rising edge, falling edge, and cycle time)
Unclocked State Element
° Set-reset latch
• Output depends on present inputs and also on past inputs
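The sketch below models a set-reset latch as two cross-coupled NOR gates. The slide names only the latch type, so the NOR construction is an assumption. The previously stored Q is fed back into the gates, which is why the same inputs (S = R = 0) can produce different outputs depending on the past.

```python
def nor(a: int, b: int) -> int:
    return int(not (a or b))

def sr_latch(s: int, r: int, q: int) -> int:
    """Cross-coupled NOR SR latch (assumed construction); returns the new Q.

    q is the stored value from before, so the output depends on past inputs.
    The gates are re-evaluated until the feedback loop settles.
    S = R = 1 is the forbidden input combination and is not meaningful here.
    """
    q_bar = int(not q)
    for _ in range(4):  # a few passes are enough for this loop to settle
        new_q = nor(r, q_bar)
        new_q_bar = nor(s, new_q)
        if (new_q, new_q_bar) == (q, q_bar):
            break
        q, q_bar = new_q, new_q_bar
    return q

q = 0
q = sr_latch(1, 0, q)  # set: Q = 1
q = sr_latch(0, 0, q)  # hold: Q stays 1 (remembers the earlier set)
q = sr_latch(0, 1, q)  # reset: Q = 0
q = sr_latch(0, 0, q)  # hold: Q stays 0
print(q)               # 0
```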
Latches and Flip-flops
° Output is equal to the stored value inside the element (don't need to ask for permission to look at the value)
° Change of state (value) is based on the clock
D-latch
° Two inputs:
• the data value to be stored (D)
• the clock signal (C) indicating when to read & store D
° Two outputs:
• the value of the internal state (Q) and its complement
(Figure: D-latch circuit and waveforms for C, D, and Q)
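A D latch is level-sensitive. While C is 1 the latch is transparent and Q follows D; while C is 0 it holds what it last stored. The minimal behavioral sketch below models only that level-sensitive behavior, not gate-level timing.

```python
def d_latch(c: int, d: int, q: int) -> tuple[int, int]:
    """Level-sensitive D latch: returns (Q, not Q).

    C = 1: transparent, Q follows D.  C = 0: Q keeps the stored value.
    """
    if c:
        q = d
    return q, int(not q)

q = 0
q, q_bar = d_latch(1, 1, q)   # C = 1: Q = 1
q, q_bar = d_latch(0, 0, q)   # C = 0: D is ignored, Q stays 1
q, q_bar = d_latch(1, 0, q)   # C = 1 again: Q follows D, Q = 0
print(q, q_bar)               # 0 1
```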
D flip-flop (Register)
° Output changes only on the clock edge
(Figure: D flip-flop built from two D latches in series, with waveforms for C, D, and Q)
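Two D latches in series (master and slave) give a flip-flop whose output changes only on a clock edge. This sketch assumes the falling-edge variant, consistent with the negative-edge-triggered clocking used later in these notes. The master is transparent while C = 1, and the slave copies the master while C = 0.

```python
class DFlipFlop:
    """Falling-edge D flip-flop sketched as a master latch plus a slave latch."""

    def __init__(self) -> None:
        self.master = 0  # master latch contents
        self.q = 0       # slave latch contents = flip-flop output

    def step(self, c: int, d: int) -> int:
        if c:                    # C = 1: master follows D, slave holds Q
            self.master = d
        else:                    # C = 0: master holds, slave copies master
            self.q = self.master
        return self.q

ff = DFlipFlop()
ff.step(1, 1)        # C high: master captures 1, Q still 0
ff.step(1, 0)        # D changes while C high: master = 0, Q still 0
ff.step(1, 1)        # master = 1, Q still 0
print(ff.step(0, 0)) # falling edge: Q becomes 1 (the new D = 0 is ignored)
```

Because Q only moves when C falls, any logic that reads Q sees a stable value for the entire cycle.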
Our Implementation
° An edge-triggered methodology
° Typical execution:
• Read contents of some state elements,
• Send values through some combinational logic
• Write results to one or more state elements
(Figure: State element 1 → Combinational logic → State element 2, all within one clock cycle)
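The read, compute, write pattern can be sketched in Python. The state names and the "swap" logic below are hypothetical. The combinational logic sees only the old values, and every write lands together at the clock edge. As a result, an element can be read and written in the same cycle without a race; here A and B even exchange values.

```python
from typing import Callable, Dict

State = Dict[str, int]

def clock_cycle(state: State, logic: Callable[[State], State]) -> State:
    """One edge-triggered cycle: read all state, compute, then commit at the edge."""
    next_values = logic(dict(state))   # combinational logic sees only old values
    return {**state, **next_values}    # all writes take effect together

def swap_logic(s: State) -> State:
    """Hypothetical combinational logic: swap A and B, count cycles."""
    return {"A": s["B"], "B": s["A"], "count": s["count"] + 1}

state = {"A": 3, "B": 7, "count": 0}
state = clock_cycle(state, swap_logic)
print(state)  # {'A': 7, 'B': 3, 'count': 1}
```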
Register File
° Built using D flip-flops
(Figure: register file with inputs Read register number 1, Read register number 2, Write register, and Write data, and outputs Read data 1 and Read data 2; each read port is a multiplexer selecting among Register 0 through Register n)
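The following is a behavioral sketch of a register file with two read ports and one write port. Reads are combinational; list indexing stands in for the read-port multiplexers. A write takes effect only at the clock edge and only when RegWrite is 1. Register 0 always reading as zero is the MIPS `$zero` convention, which this sketch assumes; it is not shown on the slide.

```python
class RegisterFile:
    """Sketch of a 32 x 32-bit register file: two read ports, one write port."""

    def __init__(self, n: int = 32) -> None:
        self.regs = [0] * n

    def read(self, read_reg1: int, read_reg2: int) -> tuple[int, int]:
        # Combinational read: indexing plays the role of the read-port muxes.
        return self.regs[read_reg1], self.regs[read_reg2]

    def clock_edge(self, reg_write: int, write_reg: int, write_data: int) -> None:
        # Write happens only at the edge, only if RegWrite = 1,
        # and never to register 0 (assumed MIPS $zero convention).
        if reg_write and write_reg != 0:
            self.regs[write_reg] = write_data & 0xFFFFFFFF

rf = RegisterFile()
rf.clock_edge(1, 1, 5)
rf.clock_edge(1, 2, 7)
rf.clock_edge(0, 3, 99)   # RegWrite = 0: nothing is written
rf.clock_edge(1, 0, 42)   # writes to register 0 are ignored
print(rf.read(1, 2), rf.read(3, 0))  # (5, 7) (0, 0)
```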
Register File
° Note: We still use the real clock to determine when to write
(Figure: write port: a decoder on the register number, gated with Write, drives the C input of the selected register; Register data drives the D input of every register)
Simple Implementation
° Include the functional units we need for each instruction
° Think of this as a puzzle!
(Figure: building blocks: instruction memory, program counter, adder, register file (RegWrite), ALU, data memory (MemWrite), and 16-to-32-bit sign-extension unit)
Instruction Ordering
° Identify which components each instruction type would use and in what order: ALU-Type, LW, SW, BEQ
(Figure: the datapath building blocks arranged for each instruction type)
• ALU-Type (ADD R3, R2, R1): 1. PC, 2. I-Mem, 3. Registers, 4. ALU, 5. WB to Reg.
• LW (LW R2, 4(R1)): 1. PC, 2. I-Mem, 3. Base Reg., 4. ALU, 5. Read Mem, 6. WB to Reg.
• SW (SW R2, 8(R1)): 1. PC, 2. I-Mem, 3. Base Reg., 4. ALU, 5. Write Mem
• BEQ (BEQ R3, R1, displace): 1. PC, 2. I-Mem, 3. Register Access, 4. Compare, 5. If Zero, update PC = PC + disp
Fetch Components
° Required operations
• Taking address from PC and reading instruction from memory
• Incrementing PC to point at next instruction
° Components
• PC register
• Instruction Memory / Cache
• Adder to increment PC value
Fetch Datapath
° PC value serves as address to instruction memory while also being incremented by 4 using the adder
° Instruction word is returned by memory after some delay
° New PC value is clocked into PC register at end of clock cycle
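The fetch steps above map onto a short Python sketch. The instruction memory is modeled as a dictionary keyed by byte address, which is an assumption made for simplicity. The PC addresses the memory while the adder computes PC + 4, and the new PC is committed at the end of the cycle.

```python
class FetchUnit:
    """Sketch of the fetch datapath: PC register, instruction memory, +4 adder."""

    def __init__(self, imem: dict[int, int], pc: int = 0) -> None:
        self.imem = imem  # byte address -> 32-bit instruction word (assumed model)
        self.pc = pc

    def cycle(self) -> int:
        instruction = self.imem[self.pc]       # PC is the instruction memory address
        next_pc = (self.pc + 4) & 0xFFFFFFFF   # adder works in parallel with the read
        self.pc = next_pc                      # clocked into PC at end of cycle
        return instruction

# 0x00221820 encodes add $3, $1, $2; 0x00000000 is a nop (sll $0, $0, 0).
f = FetchUnit({0x0: 0x00221820, 0x4: 0x00000000})
instr = f.cycle()
print(hex(instr), hex(f.pc))  # 0x221820 0x4
```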
Fetch Datapath Example
° The PC and adder operation is shown
• The PC doesn’t update until the end of the current cycle
° The instruction being read out from the instruction memory
• We have shown “assembly” syntax and the field-by-field machine code breakdown
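The field-by-field breakdown can be reproduced with bit shifts and masks. The sketch uses the standard MIPS field positions. The slide's own example instruction did not survive extraction, so `add $3, $1, $2` (encoding 0x00221820) is used here instead.

```python
def decode(word: int) -> dict[str, int]:
    """Split a 32-bit MIPS instruction word into its fields."""
    return {
        "op":     (word >> 26) & 0x3F,   # bits 31:26
        "rs":     (word >> 21) & 0x1F,   # bits 25:21
        "rt":     (word >> 16) & 0x1F,   # bits 20:16
        "rd":     (word >> 11) & 0x1F,   # bits 15:11 (R-type)
        "shamt":  (word >> 6) & 0x1F,    # bits 10:6  (R-type)
        "funct":  word & 0x3F,           # bits 5:0   (R-type)
        "imm":    word & 0xFFFF,         # bits 15:0  (I-type)
        "target": word & 0x3FFFFFF,      # bits 25:0  (J-type)
    }

fields = decode(0x00221820)  # add $3, $1, $2
print({k: fields[k] for k in ("op", "rs", "rt", "rd", "shamt", "funct")})
# {'op': 0, 'rs': 1, 'rt': 2, 'rd': 3, 'shamt': 0, 'funct': 32}
```

All fields are extracted every cycle. Which ones are actually used depends on the opcode, and that is exactly what the control unit decides.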
Modified Fetch Datapath
° Support for branch instruction added
Modified Fetch Example
° Mux provides a path for branch target address
Decode
° Opcode and func. field are decoded to produce other control signals
° Execution of an ALU instruction (ADD $3,$1,$2) requires reading 2 register values and writing the result to a third
° RegWrite is an enable signal indicating the write data should be written to the specified register
ALU Datapath
° ALU takes inputs from register file and performs the add, sub, and, or, slt operations
° Result is written back to dest. register
Memory Access Datapath
° Operands are read from register file while offset is sign extended
° ALU calculates effective address
° Memory access is performed
° If LW, read data is written back to register
Memory Access
° LW and SW require:
• Sign extension unit for address offset
• ALU to compute (add) base address + offset
• Data memory
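The three requirements above (sign extension, an ALU add, and data memory) can be sketched in a few lines. The register contents and the dictionary-based data memory are hypothetical, and the instructions follow the slides' `SW R2, 8(R1)` style.

```python
MASK32 = 0xFFFFFFFF

def sign_extend16(imm16: int) -> int:
    """Sign-extend a 16-bit field to a (Python) signed integer."""
    imm16 &= 0xFFFF
    return imm16 - 0x10000 if imm16 & 0x8000 else imm16

def effective_address(base: int, imm16: int) -> int:
    """ALU add: base register + sign-extended offset, wrapped to 32 bits."""
    return (base + sign_extend16(imm16)) & MASK32

regs = [0] * 32
regs[1] = 0x1000                 # hypothetical base register value
regs[2] = 1234
dmem: dict[int, int] = {}        # byte address -> word (assumed model)

dmem[effective_address(regs[1], 8)] = regs[2]   # SW R2, 8(R1)
regs[3] = dmem[effective_address(regs[1], 8)]   # LW R3, 8(R1)
print(regs[3], hex(effective_address(0x1000, 0xFFF8)))  # 1234 0xff8
```

The second printed value shows why sign extension matters: the field 0xFFF8 means -8, so the address lands below the base.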
Branch Datapath
° BEQ requires…
• ALU for comparison (examine ‘zero’ output)
• Sign extension unit for branch offset
• Adder to add PC + 4 and the offset (shifted left 2)
- Need a separate adder since ALU is used to perform comparison
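The BEQ requirements fit together as follows: the ALU subtracts to produce Zero, while a separate adder forms PC + 4 + (sign-extended offset << 2). The function below is a behavioral sketch of that arrangement; its name and signature are hypothetical.

```python
MASK32 = 0xFFFFFFFF

def beq_next_pc(pc: int, rs_val: int, rt_val: int, imm16: int) -> int:
    """Next PC for BEQ (sketch): ALU compare plus a separate branch adder."""
    pc_plus_4 = (pc + 4) & MASK32
    zero = ((rs_val - rt_val) & MASK32) == 0          # ALU subtract -> Zero
    offset = imm16 & 0xFFFF
    if offset & 0x8000:
        offset -= 0x10000                             # sign extension
    target = (pc_plus_4 + (offset << 2)) & MASK32     # separate branch adder
    return target if zero else pc_plus_4              # mux picks the next PC

print(hex(beq_next_pc(0x100, 5, 5, 3)))       # 0x110 (taken)
print(hex(beq_next_pc(0x100, 5, 6, 3)))       # 0x104 (not taken)
print(hex(beq_next_pc(0x100, 7, 7, 0xFFFE)))  # 0xfc  (offset -2, backward)
```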
Combining Datapaths
° Combine all datapaths into one
° We can have multiple options for certain inputs
ALUSrc Mux
° Mux controlling second input to ALU
• ALU instruction provides Read data 2 from the register file to the 2nd input of ALU
• LW/SW uses the sign-extended offset as the 2nd input of ALU to form the effective address
MemtoReg Mux
° Mux controlling writeback value to register file
• ALU instructions use the result of the ALU
• LW uses the read data from data memory
PCSrc Mux
° Next instruction can either be PC+4, or the branch target address, PC+4+(offset << 2)
RegDst Mux
° Different instructions use different destination register ID fields: ALU (R-type) instructions write rd (bits 15:11), LW writes rt (bits 20:16)
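All four muxes are 2-to-1 selectors, so one helper covers them. The sketch below walks through `LW R2, 4(R1)`. The signal values are hypothetical. The 0/1 input assignment for each mux follows the common textbook convention and is also an assumption here.

```python
def mux2(select: int, in0: int, in1: int) -> int:
    """2-to-1 multiplexer: returns in0 when select is 0, in1 when select is 1."""
    return in1 if select else in0

# Hypothetical datapath values while executing LW R2, 4(R1)
rt, rd = 2, 0            # for LW, bits 15:11 belong to the offset, not a register
read_data_2 = 77         # register-file output, not used by LW
sign_ext_imm = 4         # sign-extended offset
alu_result = 0x1004      # R1 + 4, assuming R1 = 0x1000
mem_read_data = 555      # word read from data memory
pc_plus_4, branch_target = 0x40, 0x80

write_reg  = mux2(0, rt, rd)                      # RegDst = 0: write rt
alu_in_b   = mux2(1, read_data_2, sign_ext_imm)   # ALUSrc = 1: use the offset
write_data = mux2(1, alu_result, mem_read_data)   # MemtoReg = 1: memory data
next_pc    = mux2(0, pc_plus_4, branch_target)    # PCSrc = 0: not a branch
print(write_reg, alu_in_b, write_data, hex(next_pc))  # 2 4 555 0x40
```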
Single-Cycle CPU Datapath
(Figure: complete single-cycle datapath)
Control
° We now have the datapath in place, but how do we control which path an instruction will take?
° Single-Cycle Design:
• Instruction takes exactly one clock cycle
• Datapath units used only once per cycle
• Writable state updated at end of cycle
Control
° Single-Cycle Design: everything happens in one clock cycle
• until the next falling edge of the clock, the processor is just one big combinational circuit!
• control is just a combinational circuit (its outputs are just a function of its inputs)
° Outputs? Control points in the datapath
° Inputs? The current instruction! (opcode and funct control everything)
Datapath + Control
Defining Control
° Most control signals are a function of the opcode (i.e. LW/SW, R-Type, Branch, Jump)
Defining Control
° Funct field only present in R-type instructions
• Funct controls ALU only
° To simplify control, define Main and ALU control separately
ALU Control
° ALU Control needs to know what instruction type it is:
• R-Type (op. depends on func. code)
• LW/SW (op. = ADD)
• BEQ (op. = SUB)
° Let main control unit produce ALUOp[1:0] to indicate
instruction type, then use function bits if necessary to tell the
ALU what to do
ALU Control
(Figure: ALU with inputs A and B, control input ALUcon, and outputs Result and Zero)
Note: We don’t use NOR. Ignore MSB in ALUCon.
ALUCon | ALU function | Instruction supported
0000 | AND | R-format (AND)
0001 | OR | R-format (OR)
0010 | Add | R-format (Add), lw, sw
0110 | Subtract | R-format (Sub), beq
0111 | Set on less than | R-format (Slt)
1100 | NOR | R-format (Nor)
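A behavioral sketch of the ALU driven by the ALUCon codes in the table, ignoring the MSB as the note says, so NOR is not supported. Treating slt as a signed comparison follows MIPS semantics. Zero is 1 exactly when the result is 0, which is what BEQ examines.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Interpret a 32-bit pattern as a two's-complement signed value."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def alu(alucon: int, a: int, b: int) -> tuple[int, int]:
    """Return (Result, Zero) for the ALUCon codes in the table above."""
    op = alucon & 0b0111                  # ignore the MSB of ALUCon
    a &= MASK32
    b &= MASK32
    if op == 0b000:
        result = a & b                    # AND
    elif op == 0b001:
        result = a | b                    # OR
    elif op == 0b010:
        result = (a + b) & MASK32         # add: R-format add, lw, sw
    elif op == 0b110:
        result = (a - b) & MASK32         # subtract: R-format sub, beq
    elif op == 0b111:
        result = int(to_signed32(a) < to_signed32(b))  # slt (signed)
    else:
        raise ValueError(f"unsupported ALUCon {alucon:04b}")
    return result, int(result == 0)

print(alu(0b0010, 5, 7))   # (12, 0)
print(alu(0b0110, 9, 9))   # (0, 1)  -> Zero = 1, so beq would branch
print(alu(0b0111, -1, 1))  # (1, 0)  -> -1 < 1 when compared as signed
```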
ALU Control Truth Table
° ALUControl[2:0] is a function of ALUOp[1:0] and Func[5:0]
Instruction type | Instruction operation | Desired ALU action | ALUOp[1:0] | Func[5:0] | ALUControl
LW | Load Word | Add | 00 | X | 010
SW | Store Word | Add | 00 | X | 010
Branch | BEQ | Subtract | 01 | X | 110
R-Type | AND | And | 10 | 100100 | 000
R-Type | OR | Or | 10 | 100101 | 001
R-Type | Add | Add | 10 | 100000 | 010
R-Type | Sub | Subtract | 10 | 100010 | 110
R-Type | SLT | Set on less than | 10 | 101010 | 111
Simplified ALUControl Truth Table
° We can simplify using don’t cares
° Can turn into gates
ALUOp1 | ALUOp0 | F5 | F4 | F3 | F2 | F1 | F0 | ALUCont.
0 | 0 | X | X | X | X | X | X | 010
X | 1 | X | X | X | X | X | X | 110
1 | X | X | X | 0 | 0 | 0 | 0 | 010
1 | X | X | X | 0 | 0 | 1 | 0 | 110
1 | X | X | X | 0 | 1 | 0 | 0 | 000
1 | X | X | X | 0 | 1 | 0 | 1 | 001
1 | X | X | X | 1 | 0 | 1 | 0 | 111
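The step from the simplified table to gates can be checked in Python. The three sum-of-products equations below are one minimization consistent with the simplified table; they are not given on the slide. The script verifies them against every row of the full truth table, including every funct value for the LW/SW and BEQ rows, where funct is a don't-care.

```python
def alu_control(alu_op1: int, alu_op0: int, f: int) -> int:
    """3-bit ALUControl from ALUOp[1:0] and Funct[5:0] (F5, F4 are don't-cares)."""
    f0, f1, f2, f3 = f & 1, (f >> 1) & 1, (f >> 2) & 1, (f >> 3) & 1
    op2 = alu_op0 | (alu_op1 & f1)
    op1 = int(not alu_op1) | int(not f2)
    op0 = alu_op1 & (f0 | f3)
    return (op2 << 2) | (op1 << 1) | op0

rows = [                       # (ALUOp1, ALUOp0, funct, expected ALUControl)
    (1, 0, 0b100100, 0b000),   # AND
    (1, 0, 0b100101, 0b001),   # OR
    (1, 0, 0b100000, 0b010),   # add
    (1, 0, 0b100010, 0b110),   # sub
    (1, 0, 0b101010, 0b111),   # slt
]
for op1, op0, funct, expected in rows:
    assert alu_control(op1, op0, funct) == expected
for funct in range(64):        # funct is a don't-care for LW/SW and BEQ
    assert alu_control(0, 0, funct) == 0b010
    assert alu_control(0, 1, funct) == 0b110
print("all rows match")
```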
Datapath + Control
Main Control Signals
° Main control signals are a function of the opcode
• Could generate each control signal by writing a full truth table of the 6-bit opcode
• Simpler for humans to design if we decode the opcode and then use instruction signals to generate the desired control signals
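The "decode first, then combine" approach can be sketched in two steps: turn the 6-bit opcode into one signal per instruction type, then derive each control signal from those. The opcode values are the standard MIPS encodings. The control settings follow the usual textbook single-cycle design rather than anything visible on these slides. `None` marks a don't-care.

```python
from typing import Optional

# Standard MIPS opcodes for the instructions handled here.
R_FORMAT, LW, SW, BEQ = 0b000000, 0b100011, 0b101011, 0b000100

def main_control(opcode: int) -> dict[str, Optional[int]]:
    # Step 1: decode the opcode into instruction signals.
    r_format, lw, sw, beq = (opcode == R_FORMAT, opcode == LW,
                             opcode == SW, opcode == BEQ)
    if not (r_format or lw or sw or beq):
        raise ValueError(f"opcode {opcode:06b} not handled in this sketch")
    # Step 2: each control signal is a simple function of the instruction signals.
    return {
        "RegDst":   1 if r_format else (0 if lw else None),
        "ALUSrc":   int(lw or sw),
        "MemtoReg": 0 if r_format else (1 if lw else None),
        "RegWrite": int(r_format or lw),
        "MemRead":  int(lw),
        "MemWrite": int(sw),
        "Branch":   int(beq),
        "ALUOp1":   int(r_format),
        "ALUOp0":   int(beq),
    }

print(main_control(LW))
# {'RegDst': 0, 'ALUSrc': 1, 'MemtoReg': 1, 'RegWrite': 1, 'MemRead': 1,
#  'MemWrite': 0, 'Branch': 0, 'ALUOp1': 0, 'ALUOp0': 0}
```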
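The “Truth Table” for the Main Control
(Figure: the 6-bit op field drives Main Control, which produces RegDst, ALUSrc, the other main control signals, and ALUop; ALUop and the 6-bit func field drive the local ALU Control, which produces the 3-bit ALUctr)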
Review – R-type Instruction
(Figure: single-cycle datapath and control for an R-type instruction)
Review – Lw Instruction
(Figure: single-cycle datapath and control for a lw instruction)
Review – Branch Instructions
(Figure: single-cycle datapath and control for a branch instruction)
Jump Instruction Implementation
Jump Instruction
° JAL (Jump and Link) for function calls
° How do we add JAL to the datapath?
• Place PC+4 (return address) into $ra by:
- 1. Extending the RegDst mux to include 31 ($ra)
- 2. Extending the MemtoReg mux at the write data input to have PC+4 as an input
° Executing this instruction requires setting the control signals accordingly to execute the jump and write PC+4 to $ra
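The two mux extensions above determine what JAL writes: register 31 from the widened RegDst mux and PC + 4 from the widened MemtoReg mux. The jump target itself is not described on the slide. The sketch assumes the standard MIPS J-format rule: the upper 4 bits of PC + 4 concatenated with the 26-bit target field shifted left by 2.

```python
def jal(pc: int, target26: int) -> tuple[int, int, int]:
    """Return (next_pc, write_reg, write_data) for JAL at address pc."""
    pc_plus_4 = (pc + 4) & 0xFFFFFFFF
    # Assumed MIPS J-format rule: PC+4[31:28] concatenated with target << 2.
    next_pc = (pc_plus_4 & 0xF0000000) | ((target26 & 0x3FFFFFF) << 2)
    write_reg = 31            # extended RegDst mux selects $ra
    write_data = pc_plus_4    # extended MemtoReg mux selects PC + 4
    return next_pc, write_reg, write_data

next_pc, write_reg, ra = jal(0x00400020, 0x0100040)
print(hex(next_pc), write_reg, hex(ra))  # 0x400100 31 0x400024
```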
Clocking Methodology - Negative Edge Triggered
(Figure: Clk waveform with setup and hold windows and don't-care regions; PC, ideal memory, and 32-bit ALU all driven by the same clock)
The worst-case delay for a load is much longer than that needed for all other instructions, yet it sets the cycle time.
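The cost of letting the slowest instruction set the clock can be estimated with a short calculation. All unit delays and the instruction mix below are assumed values for illustration only. With these numbers, every instruction pays 800 ps even though the mix needs only 640 ps on average.

```python
# Assumed per-unit delays in picoseconds (illustrative, not from the slides).
IMEM, REG_READ, ALU, DMEM, REG_WRITE = 200, 100, 200, 200, 100

path_ps = {
    "R-type": IMEM + REG_READ + ALU + REG_WRITE,         # 600
    "lw":     IMEM + REG_READ + ALU + DMEM + REG_WRITE,  # 800, the longest path
    "sw":     IMEM + REG_READ + ALU + DMEM,              # 700
    "beq":    IMEM + REG_READ + ALU,                     # 500
}
cycle_ps = max(path_ps.values())   # single cycle: the slowest path sets the clock

mix = {"R-type": 45, "lw": 25, "sw": 10, "beq": 20}   # assumed mix, in percent
avg_needed_ps = sum(mix[k] * path_ps[k] for k in mix) // 100

print(cycle_ps, avg_needed_ps)  # 800 640
```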
Summary
° 5 steps to design a processor
• 1. Analyze instruction set => datapath requirements
• 2. Select set of datapath components & establish clock methodology
• 3. Design datapath meeting the requirements
• 4. Analyze implementation of each instruction to determine the settings of the control points that effect the register transfer.
• 5. Design the control logic
° MIPS makes it easier
• Instructions same size
• Source registers always in same place
• Immediates same size, location
• Operations always on registers/immediates
° Single-cycle datapath => CPI = 1, clock cycle time (CCT) => long