Unit 5a - CPU Design


Unit 5

CPU Design
Introduction

• The CPU contains three main sections:
– the register section,
– the arithmetic/logic unit (ALU), and
– the control unit.
• These three work together to perform the sequences of micro-operations needed to carry out the fetch, decode, and execute cycles of every instruction in the CPU’s instruction set.
Big Picture

• What will the CPU be used for?
– To control a microwave oven? Or a PC?
• Once the application is clear, we know what kind of programming requirements exist
– simple, routine or complex, real-time…
• Then we can decide what to put in our instruction set architecture (ISA)
• and which internal registers (not programmer-accessible) to include
Big Picture

• Design the CPU state diagram
– plus the micro-operations needed to fetch, decode and execute each instruction.
• Define the internal data paths and the control signals needed to ensure proper sequencing and execution
• Control unit design
– logic that generates the control signals and causes the operations to occur
Generic CPU State Diagram
• Fetch cycle: fetch an instruction from memory, then go to the decode cycle.
• Decode cycle: decode the instruction (that is, determine which instruction has been fetched), then go to the execute cycle for that instruction.
• Execute cycle: execute the instruction, then go to the fetch cycle and fetch the next instruction.
Specifications of a Simple Accumulator-based CPU (Carpinelli)
• Accesses 64 bytes of memory
• Each byte has 8 bits
• Outputs a 6-bit address on pins A[5..0]
• Reads 8-bit data from memory on input pins D[7..0]
• Has one programmer-accessible register
– an 8-bit accumulator, AC

Instruction   Instruction Code   Operation
ADD           00AAAAAA           AC ← AC + M[AAAAAA]
AND           01AAAAAA           AC ← AC ∧ M[AAAAAA]
JMP           10AAAAAA           GOTO AAAAAA
INC           11XXXXXX           AC ← AC + 1
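To make the instruction table concrete, here is a minimal Python sketch of the instruction-level behavior (what each instruction does, not yet how the hardware does it). It assumes memory is a list of 64 integers and that ADD and INC wrap around modulo 256, since the specification does not say what happens on overflow; the function name `step` is hypothetical.

```python
def step(mem, pc, ac):
    """Execute one instruction; return the updated (pc, ac).

    mem: list of 64 byte values; pc: 6-bit address; ac: 8-bit accumulator.
    """
    instr = mem[pc]
    opcode = instr >> 6               # two high-order bits select the instruction
    addr = instr & 0x3F               # six low-order bits: AAAAAA
    pc = (pc + 1) & 0x3F              # 6-bit program counter
    if opcode == 0b00:                # ADD: AC <- AC + M[AAAAAA]
        ac = (ac + mem[addr]) & 0xFF  # 8-bit wraparound is an assumption
    elif opcode == 0b01:              # AND: AC <- AC AND M[AAAAAA]
        ac &= mem[addr]
    elif opcode == 0b10:              # JMP: GOTO AAAAAA
        pc = addr
    else:                             # INC: AC <- AC + 1 (XXXXXX ignored)
        ac = (ac + 1) & 0xFF
    return pc, ac
```

For example, the program ADD 8, AND 9, INC, JMP 0 at addresses 0 to 3 is encoded as the bytes 08H, 49H, C0H and 80H.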
Additional Registers
• 6-bit address register, AR
– supplies an address to memory via A[5..0]
• 6-bit program counter, PC
– contains the address of the next instruction to be executed
• 8-bit data register, DR
– receives instructions and data from memory via D[7..0]
• 2-bit instruction register, IR
– stores the opcode of the instruction fetched from memory
Fetching Instructions from Memory
– Send the address to memory by placing it on the address pins A[5..0].
– Allow memory enough time to perform its internal decoding and to retrieve the desired instruction.
– Then send a signal to memory so that it outputs the instruction on its output pins.

These steps are commonly written in register transfer notation (RTN).


Fetching Instructions from Memory
• The address of the instruction to be fetched is copied from the program counter, PC, into the address register, AR
– FETCH1: AR ← PC
• The CPU asserts a READ signal to read the instruction into DR, and increments PC
– FETCH2: DR ← M, PC ← PC + 1
• Copy the 2 high-order bits of DR to IR and the 6 low-order bits of DR to AR
– FETCH3: IR ← DR[7..6], AR ← DR[5..0]
Fetch Cycle

FETCH1: AR ← PC
FETCH2: DR ← M, PC ← PC + 1
FETCH3: IR ← DR[7..6], AR ← DR[5..0]


Fetch and Decode Cycles
ADD1: DR ← M
ADD2: AC ← AC + DR
AND1: DR ← M
AND2: AC ← AC ∧ DR
JMP1: PC ← DR[5..0]
INC1: AC ← AC + 1
Complete State Diagram for the Simple CPU

Establishing Required Data Paths

• Two approaches to datapath design:
– Direct paths between each pair of components
• Use multiplexers to select one of several possible data inputs for registers that can receive data from more than one source
– Example: AR ← PC and AR ← DR[5..0] require a multiplexer at the input of AR
• Becomes impractical as CPU complexity increases
– A system bus within the CPU
• Route data between CPU components via this shared bus
Establishing Required Data Paths
• The state diagram and RTN specify what must be done to realize this CPU. Now we start the CPU design.
• The operations associated with each state of the CPU are:
FETCH1: AR ← PC
FETCH2: DR ← M, PC ← PC + 1
FETCH3: IR ← DR[7..6], AR ← DR[5..0]
ADD1: DR ← M
ADD2: AC ← AC + DR
AND1: DR ← M
AND2: AC ← AC ∧ DR
JMP1: PC ← DR[5..0]
INC1: AC ← AC + 1
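The RTN above is effectively a cycle-by-cycle specification, so it can be executed directly. The Python sketch below performs the transfers of one state per call and returns the next state; it assumes the registers are held in a dict and memory in a 64-entry list, and the name `run_state` is hypothetical. Transfers listed in the same state touch different registers, so applying them one after another gives the same result as doing them in parallel.

```python
def run_state(state, r, mem):
    """Perform the register transfers of one CPU state; return the next state.

    r:   dict with keys "AR", "PC", "DR", "IR", "AC" (register contents)
    mem: list of 64 byte values
    """
    if state == "FETCH1":                  # AR <- PC
        r["AR"] = r["PC"]
        return "FETCH2"
    if state == "FETCH2":                  # DR <- M, PC <- PC + 1
        r["DR"] = mem[r["AR"]]
        r["PC"] = (r["PC"] + 1) & 0x3F     # 6-bit PC
        return "FETCH3"
    if state == "FETCH3":                  # IR <- DR[7..6], AR <- DR[5..0]
        r["IR"] = (r["DR"] >> 6) & 0b11
        r["AR"] = r["DR"] & 0x3F
        # Decode: the opcode selects the first state of the execute routine
        return ("ADD1", "AND1", "JMP1", "INC1")[r["IR"]]
    if state == "ADD1":                    # DR <- M
        r["DR"] = mem[r["AR"]]
        return "ADD2"
    if state == "AND1":                    # DR <- M
        r["DR"] = mem[r["AR"]]
        return "AND2"
    if state == "ADD2":                    # AC <- AC + DR (8-bit wraparound assumed)
        r["AC"] = (r["AC"] + r["DR"]) & 0xFF
    elif state == "AND2":                  # AC <- AC AND DR
        r["AC"] &= r["DR"]
    elif state == "JMP1":                  # PC <- DR[5..0]
        r["PC"] = r["DR"] & 0x3F
    elif state == "INC1":                  # AC <- AC + 1
        r["AC"] = (r["AC"] + 1) & 0xFF
    else:
        raise ValueError(f"unknown state {state}")
    return "FETCH1"                        # every execute routine ends here
```

Starting from FETCH1 with PC = 0 and ADD 4 (04H) at address 0, the states visited are FETCH1, FETCH2, FETCH3, ADD1 and ADD2, after which control returns to FETCH1.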
Establishing Required Data Paths
• Regroup the operations by the registers whose contents they modify:
AR: AR ← PC; AR ← DR[5..0]
PC: PC ← PC + 1; PC ← DR[5..0]
DR: DR ← M
IR: IR ← DR[7..6]
AC: AC ← AC + DR; AC ← AC ∧ DR; AC ← AC + 1
• Connect every component to the system bus
Preliminary Register Section for the Simple CPU
• Registers: AR, PC, DR, AC and IR
• Buffers
• 8-bit bus
Further Optimization
Observations:
1. AR only supplies its data to memory, so it need not be connected to any other component or to the internal bus.
2. IR and AC also do not supply data to any other component via the bus, so their output connections to the bus can be removed.
3. The bus is 8 bits wide, but not all data transfers are 8 bits (some are 6 bits, some are 2), so we need to specify which registers send data to and receive data from which bits of the bus.
• In practice, just connect as required: AR and PC to the low-order 6 bits, IR to the high-order 2 bits, etc.
4. AC must be able to load the sum of AC and DR and the logical AND of AC and DR. The CPU needs to include an ALU to do that.
A Simple ALU

Create separate hardware for each function and use a multiplexer to select the desired function result.
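A behavioral sketch of this ALU in Python: the adder and the AND logic both produce a result every cycle, and a 2-to-1 multiplexer driven by a select bit passes one of them to AC. The select encoding (0 for the sum, 1 for the AND) matches the ALUSEL signal defined in the control-unit section; the 8-bit truncation of the sum and the name `alu` are assumptions.

```python
def alu(ac, dr, alusel):
    """Simple ALU: separate hardware per function, then a multiplexer.

    alusel = 0 selects AC + DR, alusel = 1 selects AC AND DR.
    """
    total = (ac + dr) & 0xFF            # adder output (carry out discarded)
    conj = ac & dr                      # eight parallel AND gates
    return conj if alusel else total    # 2-to-1 multiplexer
```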
Final Register Section for the Simple CPU
Next: Control Unit Design
• At this point our CPU can perform every operation necessary to fetch, decode and execute the entire instruction set.
– Next task: design the circuitry that generates the control signals in the proper sequence.
• Control unit design
– Two main methodologies:
• Hardwired control: uses sequential and combinational logic to generate the control signals
• Micro-sequenced control: uses a lookup memory to output the control signals
Control Unit Design

HARDWIRED DESIGN
Hardwired Control Unit for our CPU

Requirements
• Counter: contains the current state
• Decoder: generates individual state signals from the current state
• Additional logic: takes the individual state signals and generates the control signals for each component, as well as the signals that control the counter itself
Our Example CPU has Nine States

There are 9 states, so a 4-bit counter and a 4-to-16 decoder suffice in this case (seven outputs of the decoder will not be used).

Heuristic guidelines:
1. Assign FETCH1 to counter value 0 and use the CLR input of the counter to reach this state.
2. Assign sequential states to sequential counter values and use the INC input of the counter to traverse these states.
– This CPU assigns FETCH2 to counter value 1 and FETCH3 to counter value 2.
– ADD1 and ADD2, and AND1 and AND2, are also assigned consecutive counter values.
3. Assign the first state of each execute routine based on the instruction opcodes and the maximum number of states in the execute routines.
4. Use the opcodes to generate the data input to the counter, and use the LD input of the counter to reach the proper execute routine.
Choosing a proper mapping function
• Instructions, first states and opcodes for the CPU:
Instruction   First State   IR
ADD           ADD1          00
AND           AND1          01
JMP           JMP1          10
INC           INC1          11

• One possible mapping function: 10IR[1..0]
IR[1..0]   Counter Value   State
00         1000 (8)        ADD1
01         1001 (9)        AND1
10         1010 (10)       JMP1
11         1011 (11)       INC1
Choosing a proper mapping function
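• This mapping leads to difficulties: ADD2 must follow ADD1 at counter value 9, but 9 is already assigned to AND1 (likewise, AND2 would collide with JMP1 at value 10).
• A better mapping function: 1IR[1..0]0
IR[1..0]   Counter Value   State
00         1000 (8)        ADD1
01         1010 (10)       AND1
10         1100 (12)       JMP1
11         1110 (14)       INC1
• This leaves counter values 9 and 11 free for ADD2 and AND2.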
• Once we decide which decoder output is assigned to each state, we can use these signals to generate the control signals for the counter and for the other components of the CPU.
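The collision problem can be checked mechanically. This Python sketch (helper names are hypothetical) lists the counter values occupied by the execute routines under both mapping functions, using the routine lengths from the state diagram: two states each for ADD and AND, one each for JMP and INC.

```python
# Number of states in each execute routine, indexed by opcode IR[1..0]
ROUTINE_LENGTH = {0b00: 2, 0b01: 2, 0b10: 1, 0b11: 1}   # ADD, AND, JMP, INC


def map_10ir(ir):
    """First mapping function: counter value = 1 0 IR[1] IR[0]."""
    return 0b1000 | ir


def map_1ir0(ir):
    """Better mapping function: counter value = 1 IR[1] IR[0] 0."""
    return 0b1000 | (ir << 1)


def occupied(mapping):
    """Counter values used by all execute routines under a mapping."""
    values = []
    for ir, length in ROUTINE_LENGTH.items():
        first = mapping(ir)
        values.extend(range(first, first + length))  # consecutive states
    return values


def collides(mapping):
    """True if two execute states would share a counter value."""
    values = occupied(mapping)
    return len(values) != len(set(values))
```

`collides(map_10ir)` is true because ADD2 and AND1 both need value 9, while `collides(map_1ir0)` is false: the occupied values are 8, 9, 10, 11, 12 and 14.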
Counter Control Signals

• INC: assert when the control unit is traversing sequential states, during FETCH1, FETCH2, ADD1 and AND1.
• CLR: assert at the end of each execute routine to return to the fetch cycle (during ADD2, AND2, JMP1 and INC1).
• LD: assert at the end of the fetch cycle (specifically, during FETCH3).
• Observe that each state of the CPU’s state diagram drives exactly one of these three control signals.
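A Python sketch of the counter logic, assuming the 1IR[1..0]0 mapping chosen above: the INC, CLR and LD inputs are derived from the current state, and the counter then clears, loads the mapped value, or increments. The function names are hypothetical.

```python
def counter_controls(state):
    """Return the (INC, CLR, LD) inputs of the state counter for a state."""
    inc = state in ("FETCH1", "FETCH2", "ADD1", "AND1")
    clr = state in ("ADD2", "AND2", "JMP1", "INC1")
    ld = state == "FETCH3"
    return int(inc), int(clr), int(ld)


def next_count(count, state, ir):
    """Next value of the 4-bit counter, given the current state and IR[1..0]."""
    inc, clr, ld = counter_controls(state)
    if clr:
        return 0                       # back to FETCH1
    if ld:
        return 0b1000 | (ir << 1)      # data input 1 IR[1..0] 0
    if inc:
        return (count + 1) & 0xF
    return count                       # hold (not reached for the nine valid states)
```

Checking that `counter_controls` sums to 1 for all nine states confirms that every state drives exactly one counter input.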
Remaining Control Signals

• Combine these state signals to create the remaining control signals for the registers and buffers shown in the datapath
• AR: loaded during FETCH1 and FETCH3
ARLOAD = FETCH1 ∨ FETCH3
• Similarly, we have:
PCLOAD = JMP1
PCINC = FETCH2
DRLOAD = FETCH2 ∨ ADD1 ∨ AND1
ACLOAD = ADD2 ∨ AND2
ACINC = INC1
IRLOAD = FETCH3
• The ALU has one control input, ALUSEL
– ALUSEL = 0 selects the sum; ALUSEL = 1 selects the AND
– Set ALUSEL = AND2
Register Section for the CPU
Remaining Control Signals: Buffers

• DR is placed onto the internal data bus during FETCH3, ADD2, AND2 and JMP1
DRBUS = FETCH3 ∨ ADD2 ∨ AND2 ∨ JMP1
• Similarly, we have:
MEMBUS = FETCH2 ∨ ADD1 ∨ AND1
PCBUS = FETCH1
• The control unit must also generate a READ signal, output from the CPU to memory
READ = FETCH2 ∨ ADD1 ∨ AND1
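Because the decoder asserts exactly one state signal at a time, each equation above is just an OR over a set of states. A Python sketch that evaluates all of the equations for a given state (the table layout and function names are hypothetical):

```python
# For each control signal, the states whose decoder outputs are ORed together
EQUATIONS = {
    "ARLOAD": {"FETCH1", "FETCH3"},
    "PCLOAD": {"JMP1"},
    "PCINC":  {"FETCH2"},
    "DRLOAD": {"FETCH2", "ADD1", "AND1"},
    "ACLOAD": {"ADD2", "AND2"},
    "ACINC":  {"INC1"},
    "IRLOAD": {"FETCH3"},
    "ALUSEL": {"AND2"},
    "DRBUS":  {"FETCH3", "ADD2", "AND2", "JMP1"},
    "MEMBUS": {"FETCH2", "ADD1", "AND1"},
    "PCBUS":  {"FETCH1"},
    "READ":   {"FETCH2", "ADD1", "AND1"},
}


def control_signals(state):
    """Evaluate every control equation for the current (one-hot) state."""
    return {signal: int(state in states) for signal, states in EQUATIONS.items()}


def active(state):
    """Names of the control signals asserted in a state, sorted."""
    return sorted(name for name, value in control_signals(state).items() if value)
```

Evaluating all nine states also confirms that at most one of DRBUS, MEMBUS and PCBUS is asserted at a time, so only one buffer ever drives the internal bus.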
Hardwired Control Unit for our CPU
Control Signal Generation

Verify the design with this program (address: contents):
0: ADD 4
1: AND 5
2: INC
3: JMP 0
4: 10H
5: 20H
Example: executing the instruction ADD 4 (AC initially 0)
State    Active Signals                 Operations Performed            Next State
FETCH1   PCBUS, ARLOAD                  AR ← 0                          FETCH2
FETCH2   READ, MEMBUS, DRLOAD, PCINC    DR ← 04H, PC ← PC + 1 (= 1)     FETCH3
FETCH3   DRBUS, ARLOAD, IRLOAD          IR ← 00, AR ← 04H               ADD1
ADD1     DRLOAD, MEMBUS, READ           DR ← M (= 10H)                  ADD2
ADD2     ACLOAD, DRBUS                  AC ← AC + DR (= 0 + 10H)        FETCH1
Home Assignment

Instruction   State    Active Signals   Operations Performed   Next State
0: ADD 4      FETCH1
              FETCH2
1: AND 5      …
2: INC
3: JMP 0
4: 10H
5: 20H        …
Microprogrammed Control Unit

MICRO-SEQUENCER DESIGN
Micro-sequencer Design

• Hardwired control generates the control signals using combinational and sequential logic
• Micro-programmed (micro-sequenced) control
– uses a lookup ROM, called the microcode memory, to issue the control signals
– by accessing the locations of the microcode memory in the correct order, the lookup ROM asserts the control signals in the proper sequence to realize the instructions in the processor’s instruction set
Generic microsequencer organization
• A register stores a value that corresponds to one state in the CPU state diagram
• This value serves as the address input to the microcode memory
• The microcode memory outputs a micro-instruction: the contents of the memory location at that address
• Collectively, all the micro-instructions make up the microcode, or micro-program, for the CPU
• A typical micro-instruction is made up of several bit fields, divided broadly into two groups:
• µOPs: signals output from the micro-sequencer to the rest of the CPU (micro-operations)
• Next-address generation bits (sequencing)
Generic microsequencer organization
• µOPs: are input to combinational logic that generates the CPU’s control signals, or they directly produce the control signals
• Second output: used to generate the next address
• These bits are input, along with the opcode and flag values, to the logic that generates the address of the next microinstruction (the equivalent of making a transition from one state to the next)
• Possible next addresses are:
• the next address in microcode memory, i.e., current address + 1
• an absolute address supplied by the microcode memory
Generic microsequencer organization
• Mapping logic: every micro-sequencer must be able to reach the correct execute routine. The mapping hardware (the Generate Next Address block) maps the fetched instruction’s opcode to the microcode address of the first micro-instruction of that routine
• The micro-sequencer loads this address into the register, thereby branching to the correct execute routine
• This decoding is done at the end of the fetch cycle
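The next-address logic described above behaves like a multiplexer with three inputs. A minimal Python sketch, in which the three SELECT encodings and all names are hypothetical (a real microsequencer chooses its own field layout and may also test flags):

```python
from enum import Enum


class Select(Enum):
    """Hypothetical encodings of the SELECT field."""
    NEXT = 0   # current address + 1
    JUMP = 1   # absolute address taken from the ADDR field
    MAP = 2    # address produced by the opcode mapping logic


def next_microaddress(select, current, addr_field, opcode, map_opcode):
    """Choose the address of the next microinstruction."""
    if select is Select.NEXT:
        return current + 1
    if select is Select.JUMP:
        return addr_field
    if select is Select.MAP:           # used at the end of the fetch cycle
        return map_opcode(opcode)
    raise ValueError(f"unsupported SELECT value: {select}")
```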
Generic microinstruction format

• SELECT field: determines the source of the address of the next microinstruction
• ADDR field: specifies an absolute address; used when performing an absolute jump (may also be used otherwise)
• MICRO-OPERATIONS field: the micro-operation fields, or the actual control signals to be activated
• Three primary approaches to microcode:
• Horizontal microcode
• Vertical microcode
• Direct generation of control signals
1. Horizontal Microcode
• List every micro-operation performed by the CPU
• Assign one bit in the microcode for each micro-operation
– Can result in large microcode: for example, a CPU with 50 micro-operations would need 50 µOP bits in every microinstruction, which is pretty large
• Need more efficient space utilization
2. Vertical Microcode
• Micro-operations are grouped into fields
– Each micro-operation is assigned a unique encoded value within its field
• For example, 16 micro-operations could be encoded using four bits, with each micro-operation assigned a unique binary field value from 0000 to 1111 (including one value for a no-op)
• Requires fewer bits per microinstruction, but must include a decoder for each field
In both vertical and horizontal micro-sequencing, the CPU must convert the micro-operation signals to the actual control signals that load, clear and increment registers, enable buffers, etc.
3. Direct Generation of Control Signals
• Does away with micro-operations
• Stores the values of the control signals directly in the microinstruction
– one bit is connected directly to a register's load input, another to the increment input of the program counter, etc.
• Faster and requires no additional logic, but less readable and more difficult to debug
Microsequencer for our CPU
No need to redesign the CPU from scratch: the instruction set, FSM, data paths and ALU remain the same.
• Two possible next addresses:
• opcode mapping (MAP)
• absolute jump (ADDR)
• Why? Look at our CPU state diagram: every state either goes to one fixed next state or, at FETCH3, branches on the opcode
• MUX: used to select the correct address
• The MUX inputs, the register, the microcode memory address input and the ADDR output of the microcode memory are all 4 bits wide, the minimum needed to identify one of the 9 states of the CPU (see the complete state diagram)
Mapping logic for Simple Microsequencer

Assign each state of the FSM to an address in microcode:
State    Address
FETCH1   0000 (0)
FETCH2   0001 (1)
FETCH3   0010 (2)
ADD1     1000 (8)
ADD2     1001 (9)
AND1     1010 (10)
AND2     1011 (11)
JMP1     1100 (12)
INC1     1110 (14)

We choose the same addresses as in the hardwired control (this could also be done differently).
Partial Microcode for the Simple Microsequencer

State    Address    SEL   ADDR
FETCH1   0000 (0)   0     0001
FETCH2   0001 (1)   0     0010
FETCH3   0010 (2)   1     xxxx
ADD1     1000 (8)   0     1001
ADD2     1001 (9)   0     0000
AND1     1010 (10)  0     1011
AND2     1011 (11)  0     0000
JMP1     1100 (12)  0     0000
INC1     1110 (14)  0     0000

• SEL = 0 takes the next address from the ADDR field
• FETCH3 must map to the correct execute routine, so it uses SEL = 1 to take the mapped address
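A Python sketch of this microsequencer's sequencing path, assuming the table above is stored as a dictionary and that `ir` holds the opcode of the instruction being traced (all names are hypothetical). With SEL = 0 the mux passes ADDR; with SEL = 1 it passes the mapped opcode.

```python
# Sequencing portion of the microcode: address -> (state, SEL, ADDR).
# ADDR is None where the table shows xxxx (don't care).
MICROCODE = {
    0b0000: ("FETCH1", 0, 0b0001),
    0b0001: ("FETCH2", 0, 0b0010),
    0b0010: ("FETCH3", 1, None),
    0b1000: ("ADD1", 0, 0b1001),
    0b1001: ("ADD2", 0, 0b0000),
    0b1010: ("AND1", 0, 0b1011),
    0b1011: ("AND2", 0, 0b0000),
    0b1100: ("JMP1", 0, 0b0000),
    0b1110: ("INC1", 0, 0b0000),
}


def map_opcode(ir):
    """MAP input of the mux: 1 IR[1..0] 0, same as the hardwired design."""
    return 0b1000 | (ir << 1)


def next_address(address, ir):
    """SEL = 0 selects the ADDR field; SEL = 1 selects the opcode mapping."""
    _, sel, addr = MICROCODE[address]
    return map_opcode(ir) if sel else addr


def trace(ir):
    """States visited for one instruction, from FETCH1 until control returns to 0."""
    states, address = [], 0
    while True:
        states.append(MICROCODE[address][0])
        address = next_address(address, ir)
        if address == 0:
            return states
```

For example, `trace(0b10)` visits FETCH1, FETCH2, FETCH3 and JMP1.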


Micro-operations and their mnemonics for the Micro-sequencer
• We need to list every micro-operation so we can develop the microcode.
• Assign a mnemonic to each micro-operation.
• This CPU has nine micro-operations.
• We need one bit for each micro-operation.

Mnemonic   Micro-operation
ARPC       AR ← PC
ARDR       AR ← DR[5..0]
PCIN       PC ← PC + 1
PCDR       PC ← DR[5..0]
DRM        DR ← M
IRDR       IR ← DR[7..6]
PLUS       AC ← AC + DR
AND        AC ← AC ∧ DR
ACIN       AC ← AC + 1
Data Paths

• Operations associated with each state of the CPU, with their micro-operations:
FETCH1: AR ← PC                              ARPC
FETCH2: DR ← M, PC ← PC + 1                  DRM, PCIN
FETCH3: IR ← DR[7..6], AR ← DR[5..0]         IRDR, ARDR
ADD1: DR ← M                                 DRM
ADD2: AC ← AC + DR                           PLUS
AND1: DR ← M                                 DRM
AND2: AC ← AC ∧ DR                           AND
JMP1: PC ← DR[5..0]                          PCDR
INC1: AC ← AC + 1                            ACIN
1. Preliminary horizontal microcode for the Simple Microsequencer
State   Address    SEL  ARPC ARDR PCIN PCDR DRM  IRDR PLUS AND  ACIN ADDR
FETCH1  0000 (0)   0    1    0    0    0    0    0    0    0    0    0001
FETCH2  0001 (1)   0    0    0    1    0    1    0    0    0    0    0010
FETCH3  0010 (2)   1    0    1    0    0    0    1    0    0    0    xxxx
ADD1    1000 (8)   0    0    0    0    0    1    0    0    0    0    1001
ADD2    1001 (9)   0    0    0    0    0    0    0    1    0    0    0000
AND1    1010 (10)  0    0    0    0    0    1    0    0    0    0    1011
AND2    1011 (11)  0    0    0    0    0    0    0    0    1    0    0000
JMP1    1100 (12)  0    0    0    0    1    0    0    0    0    0    0000
INC1    1110 (14)  0    0    0    0    0    0    0    0    0    1    0000


Optimize

• In every state, ARDR and IRDR have the same value
– We don't need two separate outputs
• Use one output, AIDR, to drive both micro-operations, combining
– AR ← DR[5..0] and IR ← DR[7..6]
Optimized horizontal microcode for the Simple Microsequencer
State   Address    SEL  ARPC AIDR PCIN PCDR DRM  PLUS AND  ACIN ADDR
FETCH1  0000 (0)   0    1    0    0    0    0    0    0    0    0001
FETCH2  0001 (1)   0    0    0    1    0    1    0    0    0    0010
FETCH3  0010 (2)   1    0    1    0    0    0    0    0    0    xxxx
ADD1    1000 (8)   0    0    0    0    0    1    0    0    0    1001
ADD2    1001 (9)   0    0    0    0    0    0    1    0    0    0000
AND1    1010 (10)  0    0    0    0    0    1    0    0    0    1011
AND2    1011 (11)  0    0    0    0    0    0    0    1    0    0000
JMP1    1100 (12)  0    0    0    0    1    0    0    0    0    0000
INC1    1110 (14)  0    0    0    0    0    0    0    0    1    0000


Control signal values for the Simple CPU
We can now generate the control signals as shown in this table:
Signal   Value
ARLOAD   ARPC ∨ AIDR
PCLOAD   PCDR
PCINC    PCIN
DRLOAD   DRM
ACLOAD   PLUS ∨ AND
ACINC    ACIN
IRLOAD   AIDR
ALUSEL   AND
MEMBUS   DRM
PCBUS    ARPC
DRBUS    AIDR ∨ PCDR ∨ PLUS ∨ AND
READ     DRM
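To check this table against the hardwired equations, the Python sketch below stores the µOP bits of the optimized horizontal microcode (SEL and ADDR omitted) and ORs the relevant bits to form each control signal. The data layout and names are hypothetical.

```python
# Micro-operation columns of the optimized horizontal microcode, in table order
UOPS = ("ARPC", "AIDR", "PCIN", "PCDR", "DRM", "PLUS", "AND", "ACIN")

# One bit per micro-operation for each state (SEL and ADDR omitted)
HORIZONTAL = {
    "FETCH1": "10000000",
    "FETCH2": "00101000",
    "FETCH3": "01000000",
    "ADD1":   "00001000",
    "ADD2":   "00000100",
    "AND1":   "00001000",
    "AND2":   "00000010",
    "JMP1":   "00010000",
    "INC1":   "00000001",
}

# Control signal -> micro-operations ORed together to produce it
SIGNAL_SOURCES = {
    "ARLOAD": ("ARPC", "AIDR"), "PCLOAD": ("PCDR",), "PCINC": ("PCIN",),
    "DRLOAD": ("DRM",), "ACLOAD": ("PLUS", "AND"), "ACINC": ("ACIN",),
    "IRLOAD": ("AIDR",), "ALUSEL": ("AND",), "MEMBUS": ("DRM",),
    "PCBUS": ("ARPC",), "DRBUS": ("AIDR", "PCDR", "PLUS", "AND"),
    "READ": ("DRM",),
}


def active_uops(state):
    """Micro-operations whose bit is 1 in the microinstruction for a state."""
    return {name for name, bit in zip(UOPS, HORIZONTAL[state]) if bit == "1"}


def control_signals(state):
    """Form each control signal as the OR of its micro-operation bits."""
    uops = active_uops(state)
    return {sig: int(any(u in uops for u in srcs))
            for sig, srcs in SIGNAL_SOURCES.items()}
```

For FETCH2, for instance, this asserts DRLOAD, MEMBUS, PCINC and READ, the same signals as the hardwired equations.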
2. Micro-operations from vertical microcode
• The horizontal microcode table has lots of zeros: 85% in this instance!
• In vertical microcode, micro-operations are grouped into fields such that no more than one µOP in a field is active at a time
• A unique field value is assigned to each µOP in the field
• Decoders generate the µOP signals from these field bits
We already have the instruction set, state diagram, data paths and ALU in place, plus the same sequencing hardware, mapping logic, sequencing portion of the microcode (SEL and ADDR) and µOPs as in the horizontal microcode.
Heuristic guidelines to design a microsequencer using vertical microcode
1. Whenever two µOPs occur during the same state, assign them to different fields.
2. Include a NOP in each field if necessary.
– Needed because some value must be output every cycle, even when no µOP in that field is active
3. Distribute the remaining micro-operations to make the best use of the µOP field bits.
4. Group µOPs that modify the same registers in the same field.
– Because two µOPs cannot modify the same register simultaneously
Find micro operations that can run concurrently

• DRM and PCIN both occur in FETCH2, so they must be assigned to different fields
– We will need at least two fields for the Simple CPU:
M1     M2
NOP    NOP
DRM    PCIN
• Since PCIN and PCDR both modify PC, add PCDR to M2
• Then arbitrarily assign the remaining micro-operations to the fields, keeping micro-operations that change the same register in the same field.



Micro-operation field assignments and values
M1                          M2
Value  Micro-operation      Value  Micro-operation
000    NOP                  0      NOP
001    DRM                  1      PCIN
010    ARPC                 2      PCDR
011    AIDR
100    PLUS
101    AND
110    ACIN
• M1 requires 3 bits and M2 requires 2 bits, for a total of 5 bits
Micro-operation field assignments and values
M1                          M2
Value  Micro-operation      Value  Micro-operation
000    NOP                  0      NOP
001    DRM                  1      PCIN
010    ARPC
011    AIDR
100    PCDR
101    PLUS
110    AND
111    ACIN
• To optimize, it is best to fill each field up to a power of two, so if possible, put eight micro-operations in M1 and two in M2.
• M1 then requires 3 bits and M2 only one bit, for a total of 4 bits
Vertical microcode for the Simple Microsequencer

State    Address    SEL  M1   M2  ADDR
FETCH1   0000 (0)   0    010  0   0001
FETCH2   0001 (1)   0    001  1   0010
FETCH3   0010 (2)   1    011  0   xxxx
ADD1     1000 (8)   0    001  0   1001
ADD2     1001 (9)   0    101  0   0000
AND1     1010 (10)  0    001  0   1011
AND2     1011 (11)  0    110  0   0000
JMP1     1100 (12)  0    100  0   0000
INC1     1110 (14)  0    111  0   0000
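The decoders that turn the M1 and M2 fields back into µOPs can be sketched in Python as table lookups (standing in for a 3-to-8 decoder on M1 and a 1-to-2 decoder on M2). The encodings are those of the 4-bit assignment above; the data layout and names are assumptions.

```python
# 4-bit vertical format: M1 is 3 bits (8 values), M2 is 1 bit (2 values)
M1 = ("NOP", "DRM", "ARPC", "AIDR", "PCDR", "PLUS", "AND", "ACIN")
M2 = ("NOP", "PCIN")

# state -> (SEL, M1, M2, ADDR); None stands for the don't-care xxxx
VERTICAL = {
    "FETCH1": (0, 0b010, 0, 0b0001),
    "FETCH2": (0, 0b001, 1, 0b0010),
    "FETCH3": (1, 0b011, 0, None),
    "ADD1":   (0, 0b001, 0, 0b1001),
    "ADD2":   (0, 0b101, 0, 0b0000),
    "AND1":   (0, 0b001, 0, 0b1011),
    "AND2":   (0, 0b110, 0, 0b0000),
    "JMP1":   (0, 0b100, 0, 0b0000),
    "INC1":   (0, 0b111, 0, 0b0000),
}


def decode(state):
    """Decode the M1 and M2 fields into the set of active micro-operations.

    Indexing the tuples plays the role of the field decoders; the NOP
    outputs drive nothing and are dropped.
    """
    _, m1, m2, _ = VERTICAL[state]
    return {M1[m1], M2[m2]} - {"NOP"}
```

Each state decodes to the same µOPs as in the horizontal microcode, while the µOP portion of each microinstruction shrinks from 8 bits to 4.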


Generating micro operations for vertical microcode
3. Directly Generating Control Signals from the Microcode

• Use one bit for each control signal
– set to 1, the signal is active
– set to 0, the signal is not active
• Example
FETCH2: DR ← M and PC ← PC + 1
READ: output data from memory
MEMBUS: allow the data onto the internal bus
DRLOAD: load the data from the bus into DR
PCINC: perform the second micro-operation
Set those four control-signal bits to 1 and the others to 0
Microcode to directly generate control signals for the Simple Microsequencer
State   Address    SEL ARLOAD PCLOAD PCINC DRLOAD ACLOAD ACINC IRLOAD ALUSEL MEMBUS PCBUS DRBUS READ ADDR
FETCH1  0000 (0)   0   1      0      0     0      0      0     0      0      0      1     0     0    0001
FETCH2  0001 (1)   0   0      0      1     1      0      0     0      0      1      0     0     1    0010
FETCH3  0010 (2)   1   1      0      0     0      0      0     1      0      0      0     1     0    xxxx
ADD1    1000 (8)   0   0      0      0     1      0      0     0      0      1      0     0     1    1001
ADD2    1001 (9)   0   0      0      0     0      1      0     0      0      0      0     1     0    0000
AND1    1010 (10)  0   0      0      0     1      0      0     0      0      1      0     0     1    1011
AND2    1011 (11)  0   0      0      0     0      1      0     0      1      0      0     1     0    0000
JMP1    1100 (12)  0   0      1      0     0      0      0     0      0      0      0     1     0    0000
INC1    1110 (14)  0   0      0      0     0      0      1     0      0      0      0     0     0    0000
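With direct generation there is no decoding step: each stored bit is simply wired to its control line. The Python sketch below copies the control-signal columns of the table as bit strings (SEL and ADDR omitted); the names are hypothetical.

```python
# Control-signal columns of the direct-generation microcode, in table order
SIGNALS = ("ARLOAD", "PCLOAD", "PCINC", "DRLOAD", "ACLOAD", "ACINC",
           "IRLOAD", "ALUSEL", "MEMBUS", "PCBUS", "DRBUS", "READ")

# One stored bit per control signal for each state
DIRECT = {
    "FETCH1": "100000000100",
    "FETCH2": "001100001001",
    "FETCH3": "100000100010",
    "ADD1":   "000100001001",
    "ADD2":   "000010000010",
    "AND1":   "000100001001",
    "AND2":   "000010010010",
    "JMP1":   "010000000010",
    "INC1":   "000001000000",
}


def drive(state):
    """Map each stored bit straight onto its control line (no decoder)."""
    return {name: int(bit) for name, bit in zip(SIGNALS, DIRECT[state])}
```

The trade-off noted below is visible here: the table is wider than the vertical microcode (12 signal bits instead of 4), but no logic sits between the microcode memory and the control lines.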
Pros/Cons of Directly Generating Control Signals

• Pro: does not require additional logic to convert the outputs of the microcode memory to control signals
• Con: less readable and more difficult to debug
References

• John D. Carpinelli, Computer Architecture and Organization, 2nd Edition, Chapters 6 and 7.
• M. Morris Mano, Computer System Architecture, 3rd Edition.
