CMP 3011 - Unit 2 - CPU
Sets of wires
1. System (processor-memory) buses
Address bus
Data bus
Control bus
2. I/O buses
3. Backplane buses
Address bus: contains the source or destination address of
the data on the data bus
e.g. the CPU needs to read an instruction
(data) from a given location in memory
Transfer type
• serial or parallel
Dedicated
◦ Separate data & address lines
Multiplexed
◦ Shared lines
◦ Address valid or data valid control line
◦ Advantage - fewer lines
◦ Disadvantages
More complex control
Degradation of performance
Ensuring only one device uses the bus at a
time – avoiding collisions
Choosing a master among multiple requests
◦ Try to implement priority and fairness (no device
“starves”)
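One way to picture a fair arbitration policy is round-robin granting. The sketch below is only an illustration under assumed conditions: the function name, the numbering of devices from 0, and the set of pending requests are hypothetical and do not describe any particular bus standard.

```python
def round_robin_arbiter(requests, last_granted, num_devices):
    """Pick the next bus master among the requesting device IDs.

    Devices are searched in circular order, starting just after the
    device that was granted last, so every requester is eventually
    served. Returns None when no device is requesting the bus.
    """
    for offset in range(1, num_devices + 1):
        candidate = (last_granted + offset) % num_devices
        if candidate in requests:
            return candidate
    return None


# Hypothetical scenario: devices 0, 2 and 3 request the bus continuously.
pending = {0, 2, 3}
last = 0
grant_order = []
for _ in range(6):
    last = round_robin_arbiter(pending, last, num_devices=4)
    grant_order.append(last)

print(grant_order)  # [2, 3, 0, 2, 3, 0]
```

Because the search always starts after the last master, no requesting device is skipped twice in a row. A fixed-priority arbiter would instead always search from device 0, which is simpler but can starve low-priority devices.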
fast memory, almost always connected to
circuitry that allows various arithmetic,
logical, control, and other manipulations, as
well as possibly setting internal flags
include those that may be controlled by the
programmer - called operational registers
and are often referred to as the register set
of the machine.
[Figure: x86 register set, showing EAX with its 16-bit half AX, and ESI]
Make them general purpose
◦ Increase flexibility and programmer options
◦ Increase instruction size & complexity
Data
Address
Condition Codes
May be true general purpose
May be restricted
May be used for data or addressing
◦ Data
Accumulator
◦ Addressing
Segment
How Many?
Between 8 - 64
Fewer = more memory references
More does not noticeably reduce memory references
and takes up processor real estate
How big?
Large enough to hold full address
Large enough to hold full word
Often possible to combine two data registers
Example: C programming
◦ long long int a;
◦ long int a;
register-to-register instructions
memory-to-register instructions
memory-to-memory instructions
load-and-store instructions
AX – used to accumulate the results of additions,
subtractions and so forth.
SP (stack pointer) – points to the top of the
processor’s stack
CS (code segment) – addresses the start of the
program’s machine code in memory
Also called status bits
O or OF – overflow flag
D or DF – direction flag
I or IF – interrupt enable flag
T or TF – trap flag
S or SF – sign flag
Z or ZF – zero flag
A or AF – auxiliary carry flag
P or PF – parity flag
C or CF – carry flag
Sets of individual bits
◦ e.g. result of last operation was zero
Can be read (implicitly) by programs
◦ e.g. Jump if zero
Cannot (usually) be set by programs
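To make the condition codes concrete, here is a minimal sketch of how six of the status flags could be derived from an 8-bit addition. The function name `add8_flags` is hypothetical, and the 8-bit operand width is an assumption chosen to keep the numbers small; the flag definitions follow the usual x86 conventions.

```python
def add8_flags(a, b):
    """Add two 8-bit values and derive x86-style status flags."""
    full = a + b
    result = full & 0xFF  # keep only the low 8 bits
    flags = {
        "CF": full > 0xFF,                      # carry out of bit 7
        "ZF": result == 0,                      # result is zero
        "SF": bool(result & 0x80),              # sign bit of the result
        # Signed overflow: both operands share a sign the result lacks.
        "OF": bool(~(a ^ b) & (a ^ result) & 0x80),
        "PF": bin(result).count("1") % 2 == 0,  # even number of 1 bits
        "AF": (a & 0xF) + (b & 0xF) > 0xF,      # carry out of bit 3
    }
    return result, flags


result, flags = add8_flags(0xFF, 0x01)
print(result, flags["ZF"], flags["CF"])  # 0 True True
```

A conditional jump such as "jump if zero" then amounts to testing `flags["ZF"]` after the arithmetic instruction has run.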
Program Counter
Instruction Decoding Register
Memory Address Register
Memory Buffer Register
May have registers pointing to:
◦ Process control blocks
◦ Interrupt Vectors
A.k.a. writing microcode
method used to implement machine instructions
in a CPU relatively easily, often using less
hardware than with other methods.
a set of very detailed and rudimentary lowest-
level routines which controls and sequences the
actions needed to execute particular
instructions, sometimes also to decode them.
a machine instruction implemented by a series of
microinstructions - loosely comparable to how an
interpreter implements a high-level language statement
using a series of machine instructions
Machine languages
consist entirely of numbers
almost impossible for humans to read and
write.
Assembly languages
have the same structure and set of commands
as machine languages
enable a programmer to use names instead of
numbers
Three basic characteristics differentiate
microprocessors:
1. Clock speed
2. Bandwidth/Bus speed
3. Instruction set
Clock speed
also called clock rate
the speed at which a microprocessor executes
instructions.
regulated by an internal clock
synchronizes all the various computer
components.
Clock speed
faster the clock => more instructions that may be
executed per second
stated in Hertz (Hz)
e.g. 1 MHz = 1 million cycles per second
A higher clock speed alone does not imply better performance,
but it is a major factor in determining the power of a
computer
Bus speed
measured in MHz
often hampers the performance of the computer
by having a slower speed than the processor
Ideally should be the same as the CPU clock
speed
Instruction set:
The complete collection of instructions that
the microprocessor can execute:
◦ Machine Code
◦ Binary
Usually represented by assembly codes
Determines a computer family
Instruction set:
Dictates how programs (software) are written for a
microprocessor
example: the SIMP computer understands
10 instructions, and any program written for it
uses those ten instructions in various ways to
accomplish some surprisingly complicated tasks.
Instruction set:
Sometimes a larger instruction set means
better performance.
For example, one difference between the Pentium III
and the Pentium 4 is that the Pentium 4 has a larger
instruction set (it added the SSE2 instructions).
plays an important role in co-design of the
embedded computer system
links the software and hardware part of the
system.
design process supports software interface
implementation and hardware interface
synthesis.
enables software running on the
microcontroller to control external devices
consists of the sequential logic that
physically connects the devices to the
microcontroller and the software drivers
that allow code to access the device
functions.
model that renders lower-level details of
computer systems temporarily invisible in
order to facilitate the design of sophisticated
systems
the notion that we can concentrate on one
“level” of the big picture at a time, with
confidence that we can then connect
effectively with the levels above and below.
framing the levels of abstraction
appropriately is one of the most important
skills in any undertaking.
Compiler
Assembly Language
Assembler
includes anything programmers need to
know to make a binary machine language
program work correctly, including
instructions, I/O devices, and so on.
Note: The operating system will encapsulate the
details of doing I/O, allocating memory, and other
low-level system functions, so that application
programmers do not need to worry about such
details. The combination of the basic instruction
set and the operating system interface provided for
application programmers is called the application
binary interface (ABI).
allows computer designers to talk about
functions independently from the hardware
that performs them. Computer designers
distinguish architecture from an
implementation of an architecture along the
same lines: an implementation is hardware
that obeys the architecture abstraction.
Operation code (Op code) - What to do
1. Fixed length
Length
2. Variable length
– Complex implementation
+ Code density
Encoding
Length
• 32-bits
• MIPS16: 16-bit variants of common instructions for
density
Encoding
• 4 formats, simple encoding
Format    Purpose                            Fields (width in bits)
R-Type    register to register               opcode (6) | rs (5) | rt (5) | rd (5) | sa (5) | function (6)
I-Type    register with immediate operand    opcode (6) | rs (5) | rt (5) | immediate (16)
FR-Type   floating-point register            opcode (6) | format (5) | ft (5) | fs (5) | fd (5) | function (6)
J-Type    jump to target                     opcode (6) | target (26)
opcode (bits 31-26)
rs (bits 25-21)
rt (bits 20-16)
rd (bits 15-11)
sa (bits 10-6)
function (bits 5-0)
An additional 6 bits used to specify the
operation, in addition to the opcode
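As an illustration of how the R-type fields pack into a 32-bit word, the sketch below encodes and decodes one instruction. The helper names `encode_r_type` and `decode_r_type` are hypothetical. The example values assume the standard MIPS32 register numbering ($s1 = 17, $s2 = 18, $t0 = 8) and the `add` function code 0x20.

```python
def encode_r_type(opcode, rs, rt, rd, sa, funct):
    """Pack the six R-type fields into one 32-bit instruction word."""
    fields = [(opcode, 6), (rs, 5), (rt, 5), (rd, 5), (sa, 5), (funct, 6)]
    word = 0
    for value, width in fields:
        if not 0 <= value < (1 << width):
            raise ValueError(f"value {value} does not fit in {width} bits")
        word = (word << width) | value  # append the field on the right
    return word


def decode_r_type(word):
    """Split a 32-bit word back into its R-type fields."""
    return {
        "opcode": (word >> 26) & 0x3F,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "rd": (word >> 11) & 0x1F,
        "sa": (word >> 6) & 0x1F,
        "funct": word & 0x3F,
    }


# add $t0, $s1, $s2  ->  opcode 0, rs = 17, rt = 18, rd = 8, sa = 0, funct = 0x20
word = encode_r_type(0, 17, 18, 8, 0, 0x20)
print(f"{word:#010x}")  # 0x02324020
```

Because every field has a fixed position and width, decoding needs only shifts and masks, which is one reason fixed-length encodings are simple to implement in hardware.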
3 addresses
◦ Operand 1, Operand 2, Result
◦ a = b + c;
◦ May be a fourth - next instruction (usually
implicit)
◦ Not common
◦ Needs very long words to hold everything
2 addresses
◦ One address doubles as operand and
result
◦ a = a + b
◦ Reduces length of instruction
◦ Requires some extra work
Temporary storage to hold some results
1 address
◦ Implicit second address
◦ Usually a register (accumulator)
◦ Common on early machines
0 (zero) addresses
◦ All addresses implicit
◦ Uses a stack
◦ e.g. push a
◦ push b
◦ add
◦ pop c
◦ c = a + b
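The zero-address sequence above can be traced with a tiny interpreter. This is a minimal sketch, not a model of any real stack machine: the function name `run_stack_program`, the dictionary used as memory, and the values of `a` and `b` are assumptions made for illustration.

```python
def run_stack_program(program, memory):
    """Execute push/add/pop instructions against a simple operand stack."""
    stack = []
    for instruction in program:
        op, *operand = instruction.split()
        if op == "push":
            stack.append(memory[operand[0]])   # copy a variable onto the stack
        elif op == "pop":
            memory[operand[0]] = stack.pop()   # store the top of stack
        elif op == "add":
            right = stack.pop()                # both operands are implicit
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return memory


memory = run_stack_program(["push a", "push b", "add", "pop c"], {"a": 2, "b": 3})
print(memory["c"])  # 5
```

Note that `add` names no operands at all: its sources and destination are implied by the stack, which is what makes the instruction "zero address".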
More addresses
◦ More complex (powerful?) instructions
◦ More registers
Inter-register operations are quicker
◦ Fewer instructions per program
Fewer addresses
◦ Less complex (powerful?) instructions
◦ More instructions per program
◦ Faster fetch/execution of instructions
Operation repertoire
◦ How many ops?
◦ What can they do?
◦ How complex are they?
Data types
Instruction formats
◦ Length of op code field
◦ Number of addresses
Registers
◦ Number of CPU registers available
◦ Which operations can be performed on which
registers?
RISC v CISC
Addresses
Numbers
◦ Integer/floating point
Characters
◦ ASCII etc.
Logical Data
◦ Bits or flags
(Aside: Is there any difference between numbers and characters?
Ask a C programmer!)
Operation type encoded in instruction opcode
Operations act on operands (stored in registers)
◦ Data Transfer
◦ Arithmetic
◦ Logical
◦ Conversion
◦ I/O
◦ System Control
◦ Transfer of Control
Data Transfer
Specify
◦ Source
◦ Destination
◦ Amount of data
May be different instructions for different
movements
◦ e.g. IBM 370
Or one instruction and different addresses
◦ e.g. VAX
Arithmetic
Add, Subtract, Multiply, Divide, Mod/rem
◦ (signed/unsigned)
Packed integer: padd, pmul, pand, por…
◦ (saturating/wraparound)
Floating point
◦ add, sub, mul, div, sqrt
May include
◦ Increment (a++)
◦ Decrement (a--)
◦ Negate (-a)
Shift and Rotate Operations
Logical
Bitwise operations
AND, OR, NOT, XOR, SLL, SRL, SRA
Conversion
E.g. Binary to Decimal
Input/Output
Privileged instructions
◦ Kernel mode
Skip
◦ e.g. increment and skip if zero
◦ ISZ Register1
◦ Branch xxxx
◦ ADD A
Subroutine call
◦ c.f. interrupt call
What order do we read numbers that occupy
more than one byte?
Endianness
◦ byte ordering used to represent types of data
◦ the transmission order over a network or other
medium
Big endian
◦ most significant byte (MSB) value is stored at the
memory location with the lowest address
Little endian
◦ The least significant byte (LSB) value is stored at the
lowest address
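The two byte orders can be seen directly with Python's built-in integer conversions. The value 0x12345678 is an arbitrary example; the list index stands in for the memory address, with index 0 as the lowest address.

```python
value = 0x12345678

# Index 0 of each bytes object corresponds to the lowest memory address.
big = value.to_bytes(4, byteorder="big")
little = value.to_bytes(4, byteorder="little")

print(big.hex(" "))     # 12 34 56 78  (most significant byte first)
print(little.hex(" "))  # 78 56 34 12  (least significant byte first)

# Reading the same bytes back with the wrong byte order gives a different number.
misread = int.from_bytes(little, byteorder="big")
print(hex(misread))     # 0x78563412
```

This is why data exchanged between machines, or sent over a network, needs an agreed byte order.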
refers to the ways in which instructions
reference their operands.
Direct – moves a byte or word between a
memory location specified in the instruction
and a register
MOV AL, [1234h]
Register relative – transfers a byte or word of
data between a register and the memory location
addressed by an index (DI or SI) or base register
(BP or BX) plus a constant displacement
Base relative plus index –
transfers a byte or word of data between a
register and the memory location addressed by
a base register (BP or BX) plus an index
register (DI or SI) plus a displacement
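The three modes above differ only in which terms contribute to the address. The sketch below computes the 16-bit offset for each; the function name and the register values are hypothetical, and segment translation is deliberately left out.

```python
def effective_address(base=0, index=0, displacement=0):
    """Offset = base + index + displacement, wrapped to 16 bits."""
    return (base + index + displacement) & 0xFFFF


# Assumed register contents, for illustration only.
BX, SI = 0x1000, 0x0020

direct = effective_address(displacement=0x1234)                         # [1234h]
register_relative = effective_address(base=BX, displacement=0x10)       # [BX + 10h]
base_plus_index = effective_address(base=BX, index=SI, displacement=4)  # [BX + SI + 4]

print(hex(direct), hex(register_relative), hex(base_plus_index))
# 0x1234 0x1010 0x1024
```

Unused terms simply default to zero, so direct addressing is the special case with no base or index register.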
For the CPU to function effectively, it must:
◦ Fetch instructions
◦ Interpret instructions
◦ Fetch data
◦ Process data
◦ Write data
The Control Unit orchestrates the complete
execution of each instruction:
Note: Only branch hazards and RAW data hazards are possible in MIPS pipeline
Data hazards
◦ when reads and writes of data occur in a different
order in the pipeline than in the program code,
resulting in data dependencies
AND R6, R1, R7      IF  ID  EX  MEM  WB
OR  R8, R1, R9          IF  ID  EX  MEM  WB
XOR R10, R1, R11            IF  ID  EX  MEM  WB
Analysis of hazards
All the instructions after the ADD use the result of the
ADD instruction (in R1).
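Read-after-write dependences like these can be found mechanically. In the sketch below the source operands of the ADD (R2 and R3) are assumed, since only its destination R1 is given, and the `window` parameter is a stand-in for how many following instructions can read too early, which depends on the pipeline design and on forwarding.

```python
def find_raw_hazards(program, window=3):
    """Return (writer, reader, register) triples for RAW dependences.

    Each instruction is a tuple (op, dest, src1, src2). A later
    instruction within `window` positions that reads `dest` is reported.
    """
    hazards = []
    for i, instruction in enumerate(program):
        dest = instruction[1]
        for j in range(i + 1, min(i + 1 + window, len(program))):
            if dest in program[j][2:]:
                hazards.append((i, j, dest))
    return hazards


program = [
    ("ADD", "R1", "R2", "R3"),    # source operands assumed for illustration
    ("AND", "R6", "R1", "R7"),
    ("OR", "R8", "R1", "R9"),
    ("XOR", "R10", "R1", "R11"),
]
print(find_raw_hazards(program))
# [(0, 1, 'R1'), (0, 2, 'R1'), (0, 3, 'R1')]
```

Each triple names the writing instruction, the reading instruction, and the shared register; a real pipeline resolves these by stalling or by forwarding the result.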
Delayed Branch
◦ Do not take jump until you have to
◦ Rearrange instructions
Branch Prediction
Predict by Opcode
◦ Some instructions are more likely to result in a
jump
◦ Can get up to 75% success
For example
Op       F      CPI   CPI x F   % Time
ALU      50%    1     0.5       23
Load     20%    5     1.0       45
Store    10%    3     0.3       14
Branch   20%    2     0.4       18
Total    100%         2.2       100
(the frequencies (F) must always add up to 100%)
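The table can be reproduced with a few lines of code. This is a minimal sketch; the function name `cpi_breakdown` and the dictionary layout are choices made for illustration.

```python
def cpi_breakdown(mix):
    """Return the average CPI and each class's share of execution time (%).

    `mix` maps an instruction class to (frequency, cycles per instruction).
    """
    contributions = {op: freq * cycles for op, (freq, cycles) in mix.items()}
    total_cpi = sum(contributions.values())
    share = {op: 100 * c / total_cpi for op, c in contributions.items()}
    return total_cpi, share


mix = {"ALU": (0.50, 1), "Load": (0.20, 5), "Store": (0.10, 3), "Branch": (0.20, 2)}
total_cpi, share = cpi_breakdown(mix)

print(f"{total_cpi:.1f}")                             # 2.2
print({op: round(pct) for op, pct in share.items()})
# {'ALU': 23, 'Load': 45, 'Store': 14, 'Branch': 18}
```

Loads are only 20% of the instructions but account for about 45% of the time, so they are the most profitable class to speed up.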
Estimating Performance Improvements
Assume a processor currently requires 10 seconds to
execute a program and performance improves by 50% per
year.
By what factor does processor performance improve in 5
years?
Factor of improvement (foi):
$$\text{foi} = (1 + \text{annual improvement})^{\text{years}}$$
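As a worked check of the question above, the sketch below applies the formula with a 50% annual improvement over 5 years, assuming the improvement compounds once per year. The function name is hypothetical.

```python
def factor_of_improvement(annual_rate, years):
    """Compound a yearly performance improvement over several years."""
    return (1 + annual_rate) ** years


foi = factor_of_improvement(0.50, 5)
new_time = 10 / foi  # the program originally takes 10 seconds

print(f"{foi:.2f}")       # 7.59
print(f"{new_time:.2f}")  # 1.32
```

So performance improves by a factor of about 7.6, and the 10-second program would run in roughly 1.3 seconds.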
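$$\text{Speedup}_{\text{overall}} = \frac{\text{Execution}_{\text{old}}}{\text{Execution}_{\text{new}}} = \frac{1}{(1 - \text{Fraction}_{\text{enhanced}}) + \dfrac{\text{Fraction}_{\text{enhanced}}}{\text{Speedup}_{\text{enhanced}}}}$$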
Overall speedup achieved by using the
floating-point processor:
$$\text{Speedup}_{\text{overall}} = \frac{1}{(1 - 0.5) + \dfrac{0.5}{15}} = \frac{1}{0.5 + 0.033} \approx 1.875$$
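The same calculation as a small function, taking the enhanced fraction (0.5) and the speedup of the enhanced part (15) from the example. The function name `amdahl_speedup` is a label chosen here for the speedup formula given earlier.

```python
def amdahl_speedup(fraction_enhanced, speedup_enhanced):
    """Overall speedup when only part of the execution time is enhanced."""
    return 1.0 / ((1.0 - fraction_enhanced) + fraction_enhanced / speedup_enhanced)


overall = amdahl_speedup(0.5, 15)
print(round(overall, 3))  # 1.875

# Even an enormous speedup of the enhanced half cannot reach 2x overall.
print(round(amdahl_speedup(0.5, 1e9), 3))  # 2.0
```

The second call shows the limit: with half of the time untouched, the overall speedup approaches but never exceeds 2, however fast the floating-point unit becomes.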
Example: (Impact of optimizing compiler)
Assume the following program makeup:
Operation   Frequency   Clock cycles
ALU         43%         1
Load        21%         2
Store       12%         2
Branch      24%         2
Assume a 20 ns clock, optimizing compiler
eliminates 50% of all ALU operations.
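Solution:
$$\text{MIPS} = \frac{\text{instruction count}}{\text{execution time} \times 10^{6}} = \frac{\text{clock rate}}{\text{CPI} \times 10^{6}}$$
A 20 ns clock corresponds to a clock rate of 50 MHz.
NOT optimized: CPI = 0.43(1) + 0.21(2) + 0.12(2) + 0.24(2) = 1.57
$$\text{MIPS} = \frac{50 \times 10^{6}}{1.57 \times 10^{6}} \approx 31.8$$
Optimized (half of the ALU operations removed): CPI = 1.355 / 0.785 ≈ 1.73
$$\text{MIPS} = \frac{50 \times 10^{6}}{1.73 \times 10^{6}} \approx 28.9$$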
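The sketch below recomputes CPI and the MIPS rating for the instruction mix in this example, both before and after the compiler removes half of the ALU operations. The helper names are hypothetical, and the optimized mix is obtained by renormalizing the remaining instruction frequencies so that they again sum to 1.

```python
def average_cpi(mix):
    """Weighted CPI for a mix of {class: (frequency, cycles)}."""
    return sum(freq * cycles for freq, cycles in mix.values())


def mips_rating(clock_rate_hz, cpi):
    """MIPS = clock rate / (CPI x 10^6)."""
    return clock_rate_hz / (cpi * 1e6)


clock_rate = 50e6  # a 20 ns clock period is 50 MHz
mix = {"ALU": (0.43, 1), "Load": (0.21, 2), "Store": (0.12, 2), "Branch": (0.24, 2)}

# Optimizing compiler: drop half of the ALU operations, then renormalize.
removed = 0.5 * mix["ALU"][0]
remaining = 1.0 - removed
optimized = {
    op: ((freq - removed if op == "ALU" else freq) / remaining, cycles)
    for op, (freq, cycles) in mix.items()
}

cpi_before = average_cpi(mix)
cpi_after = average_cpi(optimized)
print(f"{cpi_before:.2f} {mips_rating(clock_rate, cpi_before):.1f}")  # 1.57 31.8
print(f"{cpi_after:.2f} {mips_rating(clock_rate, cpi_after):.1f}")    # 1.73 29.0
```

The unrounded optimized CPI gives about 29.0 MIPS; rounding CPI to 1.73 first gives 28.9. Either way the optimized program has the lower MIPS rating even though it executes fewer instructions, which is why MIPS alone is a poor performance measure.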
An implementation of a CPU
Multicycle implementation
◦ instructions use multiple clock cycles.
◦ Advantage: Shorter execution time
◦ Disadvantage: More complex control
[Figure: datapath with control signals (PCSrc, RegWrite, ALUOp), adders, a multiplexer and a shift-left-2 unit]
Cycle time = Σ(stages)
Execution Time = IC * CPI * Cycle Time
Processor design (datapath and control) will
determine:
◦ Clock cycle time
◦ Clock cycles per instruction
All combinational logic must stabilize within
one clock cycle.
All state elements will be written exactly
once at the end of the clock.
CPI = 1
All operations must occur in parallel
Used to improve performance in real computer
systems
Divides instruction execution into multiple clock
cycles.
Resources can be reused on each cycle, saving
resources
◦ ALU used to compute address and to increment PC
◦ Memory used for instruction and data
Control signals not determined solely by
instruction
◦ e.g., what should the ALU do for a “subtract”
instruction?
Advantages
◦ The work required by the typical instructions can be
divided over approximately equal, smaller elementary
operations.
◦ The clock cycle can be set to the longest elementary
operation.
Multicycle Datapath
The performance of a pipelined processor may
be considered:
Cycle time
= longest stage + pipeline overhead
Execution time
= cycle time * (no. of stages + IC – 1)
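To see what the two formulas imply, the sketch below compares a pipelined and a single-cycle implementation. The stage delays, the overhead, and the instruction count are invented for illustration, and the function names are hypothetical. The single-cycle cycle time is taken as the sum of the stage delays.

```python
def pipelined_execution_time(stage_times, overhead, instruction_count):
    """cycle time * (no. of stages + IC - 1), cycle = longest stage + overhead."""
    cycle_time = max(stage_times) + overhead
    return cycle_time * (len(stage_times) + instruction_count - 1)


def single_cycle_execution_time(stage_times, instruction_count):
    """Every instruction takes one long cycle equal to the sum of all stages."""
    return sum(stage_times) * instruction_count


# Assumed stage delays in picoseconds (IF, ID, EX, MEM, WB) and latch overhead.
stages = [200, 100, 200, 200, 100]
pipelined = pipelined_execution_time(stages, overhead=20, instruction_count=1000)
single = single_cycle_execution_time(stages, instruction_count=1000)

print(pipelined)                     # 220880
print(single)                        # 800000
print(round(single / pipelined, 2))  # 3.62
```

The speedup is below the ideal factor of 5 (the number of stages) because the stages are unbalanced, the pipeline registers add overhead, and the pipeline needs a few cycles to fill.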
the processor is not the only component in the
system that determines overall system
performance
Speed is NOT performance, e.g. a system running
with a Pentium 150 is not necessarily "50% faster"
than one with a Pentium 100.
speeding up the processor only improves system
performance for those aspects of system use that
depend on the processor.
most processors are already much faster than the
devices that support them, so they spend a great deal
of time waiting around for data that they can use