05SingleCycleCPU_1410693631

Download as pdf or txt
Download as pdf or txt
You are on page 1of 48

Computer Architecture

Single-Cycle Microarchitecture

Jaewoong Sim
Electrical and Computer Engineering
Seoul National University
• CPU performance factors
• Instruction Count
} Impacted by ISA and compiler
• CPI (cycles per instruction) and cycle time
} Impacted by CPU hardware (microarchitecture)

• There can be many different HW implementations for an ISA


• Today, we will look at a simple, single-cycle microarchitecture
} Single Cycle è CPI = 1 (caveat: clock cycle time needs to be very long)
• We will look at more realistic multi-cycle, pipelined version later J

Reading
• P&H Ch 4.1 – 4.4

2
• Formats: Four core instruction formats + two small variations

• Simple Decoding
• 4 bytes per instruction, regardless of format
• rs1, rs2, rd in the same position across different formats

3
• Combinational Circuits
• The outputs depend only on the combination of the current values
• Memoryless
• Also called combinatorial circuits

• Sequential Circuits
• The outputs depend on the input sequence
(i.e., both current and previous values of the inputs)
• Thus, a sequential circuit has memory

• Finite State Machines (FSMs)


• A synchronous sequential circuit with k registers that can be one of a finite
number (2k) of unique states
• FSM = next state logic + output logic + a register that stores the state
• On each clock edge, FSM advances to the next state
4
Implementing the ISA
Microarchitecture

5
• Now that we have an ISA, which defines
• Architectural States + Instructions + Others...

• (Recall) Two key concepts of von Neumann architecture


• Stored Program
• Sequential Instruction Processing
1. Program counter (instruction pointer) identifies the current instruction
2. Instruction is fetched from memory and is executed one at a time
3. Program counter is advanced (according to instruction)
4. Repeat

• We need to build a machine that processes instructions


(one by one for now)

6
• What does processing an instruction mean?

• Remember: Instructions specify how to transform the values of AS

AS = Architectural (programmer visible) state before an


instruction is processed

Process an instruction

AS’ = Architectural (programmer visible) state after an


instruction is processed
7
• ISA abstractly specifies what AS’ should be, given an instruction and AS
• It describes an abstract finite state machine (FSM)
} State = programmer visible state
} Next-state logic = instruction execution
• From ISA point of view, there are no “intermediate states” between AS and AS’
during instruction execution
} One state transition per instruction

• Microarchitecture implements how AS is transformed to AS’


• There are many choices in implementation
• We can have programmer invisible state to optimize the speed of instruction
execution: multiple state transitions per instruction
} Choice 1: AS è AS’ (transform AS to AS’ in a single clock cycle)
} Choice 2: AS è AS+MS1 è AS+MS2 è AS+MS3 è AS’
(take multiple clock cycles to transform AS to AS’)

8
I AS’ AS O
Next-State Logic Programmer
(Combinational Logic) Visible State
(PC, Registers,
MEM, ...)

• Single-cycle machine can be thought of as an abstract FSM


• Next-state (combinational) logic implements all the instructions specified in ISA
• At the rising edge of the clock, the architectural state is updated

• What about clock cycle time?


9
• Let’s start with the state elements
• States updated at the end of the clock cycle
struction
dressControl Signals
• RegWrite PC
Instruction Add Sum
• MemWrite
struction
• MemRead
memory

. Instruction memory b. Program counter c. Adder

Instruction
address

PC
Instruction Add Sum
Instruction
memory

a. Instruction memory b. Program counter c. Adder

10
**Based on figures from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
“Magic” Memory and Register File

• Combinational Read
• The output of the read data port is a combinational function of the register file
contents and the corresponding read select port

• Synchronous write
• The selected register is updated on the positive-edge clock transition when
write enable is asserted
• i.e., cannot affect read output in between clock edge

• In practice, it takes multiple cycles for memory & register files to


read/write the data

11
• 5 Generic Steps
• IF: Instruction fetch
• ID: Instruction decode and register operand fetch
• EX: ALU (execute operation or calculate memory address)
• MEM: Memory operand fetch (only for load & store instructions)
• WB: Write result back to register

WB
IF Data

Register #
PC Address Instruction Registers ALU Address
Register #
Instruction
memory Data
Register # memory
EX
ID Data
MEM 12
**Based on figures from [P&H CO&D, COPYRIGHT 2004 Elsevier. ALL RIGHTS RESERVED.]
We will build a RISC-V datapath incrementally 13
Single-Cycle Datapath for
ALU Instructions
R-Type (Register-Register Instructions)
I-Type (Register-Immediate Instructions)

14
Increment by
4 for next
instruction

15
• Assembly (e.g., register-register signed addition)
ADD rd, rs1, rs2
• Machine Encoding

funct7 rs2 rs1 funct3 rd opcode R-type


7 bits 5 bits 5 bits 3 bits 5 bits 7 bits

• Semantics
if MEM[PC] == ADD rd rs1 rs2
GPR[rd] ¬ GPR[rs1] + GPR[rs2]
PC ¬ PC + 4
• Exceptions: none (ignore carry and overflow)
• Variations
• Arithmetic: {ADD, SUB}
• Logical: {AND, OR, XOR}
• Shift: {Left, Right-Logical, Right-Arithmetic}
• Compare: {signed, unsigned} x {Set if Less Than}
16
• Two more elements needed to implement R-format ALU operations
• Register File (RF)
• Arithmetic and Logic Unit (ALU)

IF ID EX MEM WB
if MEM[PC] == ADD rd rs1 rs2
GPR[rd] ¬ GPR[rs1] + GPR[rs2] Combinational
state update logic
PC ¬ PC + 4 17
4 ALU
ALUoperation
operation
[19:15] Read 3
register 1 Read
Read data 1
[24:20] Zero
Instruction register 2
Registers ALU ALU
[11:7] Write result
register
Read
Write data 2
data

1 RegWrite

0000000 rs2 rs1 000 rd 0110011

7 bits 5 bits 5 bits 3 bits 5 bits 7 bits


IF ID EX MEM WB
if MEM[PC] == ADD rd rs rt
GPR[rd] ¬ GPR[rs] + GPR[rt] Combinational
state update logic
PC ¬ PC + 4 18
• Assembly (e.g., register-immediate signed additions)
ADDI rd, rs1, immediate12
• Machine Encoding

immediate rs1 funct3 rd opcode I-type


12 bits 5 bits 3 bits 5 bits 7 bits

• Semantics
if MEM[PC] == ADDI rd rs1 immediate
GPR[rd] ¬ GPR[rs1] + sign-extend (immediate)
PC ¬ PC + 4
• Exceptions: none (ignore carry and overflow)
• Variations
• Arithmetic: {ADDI, SUBI}
• Logical: {ANDI, ORI, XORI}
• **Shifts by unsigned imm[4:0]: {SLLI, SRLI, SRAI}
• Compare: {signed, unsigned} x {Set if Less Than Imm}
19
• Input: 32-bit instruction

immediate rs1 funct3 rd opcode I-type


12 bits 5 bits 3 bits 5 bits 7 bits

• Output: Sign-extended 32-/64-bit


• RV32I: Take the 12 bits immediate and sign-extend to 32-bits
• RV64I: Take the 12 bits immediate and sign-extend to 64-bits

20
IF ID EX MEM WB
if MEM[PC] == ADDI rd rs1 immediate Combinational
GPR[rd] ¬ GPR[rs1] + sign-extend(immediate) state update logic
PC ¬ PC + 4 21
Add

Read [19:15]
PC address
[24:20]

Instruction
[11:7]
Instruction
memory

Q: How do we efficiently build a datapath for both R-type and I-type ALU?
22
Add

Read
PC address

Instruction

Instruction
memory

ALUSrc
0: R-Type
1: I-Type

23
Single-Cycle Datapath for
Data Movement Instructions
I-Type (Load Instructions)
S-Type (Store Instructions)

24
• Assembly (e.g., load 4-byte word)
LW rd, offset12 (base)
• Machine Encoding
immediate rs1 funct3 rd opcode I-type
12 bits 5 bits 3 bits 5 bits 7 bits

• Semantics
if MEM[PC]==LW rd offset12(base)
EA32 = sign-extend(offset12) + GPR[base]
GPR[rd] ¬ MEM32[ translate(EA) ]
PC ¬ PC + 4
• Exceptions: none for now
• Variations: {LW, LH, LHU, LB, LBU}
• e.g., LB: GPR[rd] ç sign-extend(MEM8[EA])
LBU: GPR[rd] ç zero-extend(MEM8[EA]) EA: Effective Address

25
0

ALUSrc
Add 0: R-Type
1: I-Type

Read
PC address
1

Instruction

Instruction
memory 1

if MEM[PC]==LW rd offset12 (base) IF ID EX MEM WB


EA = sign-extend(offset) + GPR[base] Combinational
GPR[rd] ¬ MEM[ translate(EA) ] state update logic 26
PC ¬ PC + 4
• Assembly (e.g., store 4-byte word)
SW rs2, offset12 (base)
• Machine Encoding
imm[11:5] rs2 rs1 funct3 imm[4:0] opcode S-type
7 bits 5 bits 5 bits 3 bits 5 bits 7 bits

• Semantics
if MEM[PC]==SW rs2 offset12(base)
EA32 = sign-extend(offset12) + GPR[base]
MEM32[ translate(EA) ] ¬ GPR[rs2]
PC ¬ PC + 4
• Exceptions: none for now
• Variations: {SW, SH, SB}
• e.g., SB: MEM8[EA] ç (GPR[rs2])[7:0]

27
• e.g., Load Datapath
• RegWrite: 1, ALUSrc: 1, MemRead: 1, MemWrite: 0
• e.g., Store Datapath
• RegWrite: 0, ALUSrc: 1, MemRead: 0, MemWrite: 1

extend

LoadExtend 28
{W, H, HU, B, BU}
• e.g., Load Datapath
• RegWrite: 1, ALUSrc: 1, MemRead: 1, MemWrite: 0, MemtoReg: 1

extend

29
Single-Cycle Datapath for
Control Flow Instructions
SB-Type (Conditional Branch)
UJ-Type (Unconditional Branch)

30
• Assembly (e.g., branch if equal)
BEQ rs1, rs2, immediate13 (note: implicit imm[0]=0)

• Machine Encoding
imm imm
[10:5] rs2 rs1 funct3 [4:1] opcode SB-type
imm[12] imm[11]

• Semantics
if MEM[PC]==BEQ rs1 rs2 immediate13
target = PC + sign-extend(immediate13)
if GPR[rs1] == GPR[rs2] then PC ç target
else PC ç PC + 4
How far can you jump?
• Exceptions: misaligned target (4-byte) if taken
• Variations: BEQ, BNE, BLT{U}, BGE{U}

31
• Read register operands (rs1, rs2)

• Compare the operands


• Use ALU, subtract and check the “Zero” output (flag)

• Compute target address


• Sign-extended offset
• Shift left 1 place (because of implicit imm[0]=0)
• Add to PC value

32
Just
re-routes
wires

Sign-bit wire
replicated

33
• Assembly
JAL rd, immediate21 (note: implicit imm[0]=0)

• Machine encoding
imm[10:1] imm[19:12] rd opcode UJ-type
5 bits 7 bits
imm[20] imm[11]

• Semantics
if MEM[PC]==JAL rd, immediate21
target = PC + sign-extend(immediate21)
GPR[rd] = PC+4
PC ¬ target How far can you jump?
• Exceptions: misaligned target (4-byte)

34
35
Control

36
• We need to add a control function to make the datapath work for a variety
of instructions!

• The following control signals are derived from a given instruction


• Single-bit control signals
• Multi-bit ALU control signal

37
• Control signals
• A combinational function of Inst=MEM[PC]

ALUOp
RegWrite
Instruction ALUSrc
MEM[PC] Decode PCSrc
Logic MemRead
MemWrite
MemtoReg

• Consider
• All R-type and I-type ALU instructions
• LW and SW
• BEQ, BNE, BLE, BGT
• JAL, JALR 38
• A combinational unit to control the ALU
• Input: 32-bit inst. è funct7, funct3, ALUOp (2-bit)
• Output: 4-bit ALU control signals (ALU operation)

• ALUOp is derived from the opcode field (i.e., instruction[6:0])


• Another combinational logic (called Control) generates ALUOp
} ALUOp (00): add (compute memory address for ld/st)
} ALUOp (01): subtract (for branch comparison)
} ALUOp (10): determined by funct7 & func3

39
• Truth table to derive the “ALU control” combinational function

• Q. Do we need all 10 bits from funct7 and funct3 for the example?

40
41
42
1
0

43
1
0

44
PCSrc

0
0

45
Now you know how to design
a single-cycle RISC-V CPU!!

However, this is a very slow CPU.


Multi-cycle CPU implementations (fast!) in the next lecture.
46
Computer Architecture
Single-Cycle Microarchitecture

Jaewoong Sim
Electrical and Computer Engineering
Seoul National University
• Lab1 will be released on Apr.04 (Tue)
• Implementing a single-cycle RISC-V CPU using Verilog
• Due: Two weeks from Tue!

• Verilog materials are on eTL


• See the materials if you are not familiar with Verilog!
• Do practice using HDLBits!

• Reading for the next lectures


• P&H 4.5
• P&H Appendix C

48

You might also like