Topic #3 - CPU - With Notations

Concordia University
Electrical & Computer Engineering
COEN 311
Computer Organization & Software
Topic #3 - continued
Principal Components of a Computer (CPU)
Dr. Fadi Alzhouri
Adapted from notes by

Dr. Sofiene Tahar
Dr. Anjali Agarwal
Performance: Execution Time
• Processor time depends on the hardware (processor and memory connected by a bus) involved in the execution of individual machine instructions.
• CPU time
  - Time spent processing a given job
  - Discounts I/O time and other jobs' shares
  - Combines user CPU time and system CPU time
Performance: Execution Time = CPU Time
• Computer operations are synchronized with a clock signal.
• The clock determines when events must happen.
• The clock period (or clock cycle time) is the duration of one clock cycle.
• The clock rate (or frequency) is the reciprocal of the clock period.
• Example: if the clock period is 250 ps, the clock rate is 4 GHz.
• CPU Clocking
  - Clock period: duration of a clock cycle
    E.g., 250 ps = 0.25 ns = 250×10^-12 s
  - Clock frequency (rate): cycles per second
    E.g., 4.0 GHz = 4.0×10^9 Hz
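The reciprocal relation between clock period and clock rate can be checked with a short sketch. The function names below are illustrative only and do not come from the course notes.

```python
def clock_rate_hz(period_s: float) -> float:
    """Clock rate is the reciprocal of the clock period."""
    return 1.0 / period_s


def clock_period_s(rate_hz: float) -> float:
    """Clock period is the reciprocal of the clock rate."""
    return 1.0 / rate_hz


# The slide's example: a 250 ps period.
period = 250e-12
rate = clock_rate_hz(period)
print(f"{rate / 1e9:.1f} GHz")                   # rate expressed in GHz
print(f"{clock_period_s(4.0e9) * 1e12:.0f} ps")  # period of a 4.0 GHz clock in ps
```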
CPU Time
\[
\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}
\]
• Performance improved by
  - Reducing the number of clock cycles
  - Increasing the clock rate (reducing the length of the clock cycle)
Example: CPU Time
• A program runs in 10 seconds on computer A, which has a 2 GHz clock. We want to have a computer, B, which will run this program in 6 seconds. The designer has determined that a substantial increase in the clock rate is possible, but this increase will affect the rest of the CPU design, causing computer B to require 1.2 times as many clock cycles as computer A for this program.
What clock rate should we tell the designer to target?
Example: CPU Time (cont.)
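\[
\text{Clock Cycles}_B = 1.2 \times \text{Clock Cycles}_A
\]
\[
\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}
\]
\[
\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}
\]
\[
\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}
\]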
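The same derivation can be reproduced numerically. This is a minimal sketch using the relation CPU Time = Clock Cycles / Clock Rate; the helper names are assumptions made for illustration.

```python
def clock_cycles(cpu_time_s: float, clock_rate_hz: float) -> float:
    """Clock Cycles = CPU Time x Clock Rate."""
    return cpu_time_s * clock_rate_hz


def required_clock_rate(cycles: float, target_time_s: float) -> float:
    """Clock Rate = Clock Cycles / CPU Time."""
    return cycles / target_time_s


cycles_a = clock_cycles(10.0, 2e9)            # computer A: 10 s at 2 GHz
cycles_b = 1.2 * cycles_a                     # computer B needs 1.2x as many cycles
rate_b = required_clock_rate(cycles_b, 6.0)   # B must finish in 6 s
print(f"Target clock rate for B: {rate_b / 1e9:.1f} GHz")
```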
Components of a Computer System
Central Processing Unit (CPU)
• The CPU is the heart and brain of the computer
• It interprets and executes machine-level instructions
• It controls data transfer between Main Memory (MM) and the CPU
• It detects errors
• In the following lectures, we will learn:
  - Instruction representation
  - Data transfer mechanism between MM and CPU
  - The internal functional units of two different CPU architectures
  - How these units are interconnected
  - How a processor executes instructions
Instruction Representation
• CPU operation is determined by the instruction it executes
• The collection of instructions that a CPU can execute forms its Instruction Set Architecture (ISA).
• An instruction is represented as a sequence of bits, for example:
1001 0010 0000 0011 1011 1011 1000 0001
In hexadecimal: 9 2 0 3 B B 8 1
• The instruction is divided into fields: Opcode, Operand1, Operand2
• The opcode indicates the operation to be performed, e.g., 92 above indicates a copy operation (MOV). The number of bits in the opcode field depends on the number of instructions in the ISA, the number of instruction types, and the complexity of the instructions.
• We need one, two, or three fields for operands, for source and destination
• The opcode also represents
  - The nature of the operands (data or address): here operand 1 is an address and operand 2 is data
  - The addressing mode (register or memory): here operand 1 is memory and operand 2 is immediate data
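As a small illustration of reading fields out of a bit pattern, the sketch below converts the example instruction to hexadecimal and extracts its opcode. It assumes the opcode is the top 8 bits (the two hex digits 92); the notes do not state where Operand1 ends and Operand2 begins, so the remaining 24 bits are left unsplit.

```python
BITS = "1001 0010 0000 0011 1011 1011 1000 0001"

# Parse the 32-bit pattern into an integer.
word = int(BITS.replace(" ", ""), 2)

hex_digits = f"{word:08X}"      # eight hex digits, one per 4-bit group
opcode = word >> 24             # assumed: opcode occupies the top 8 bits
operand_bits = word & 0xFFFFFF  # remaining 24 bits (operand split not specified)

print(hex_digits)
print(f"opcode = {opcode:02X}")
```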
Basic Instruction Types
Not all instructions require two operands
• 3-address instructions
Operation Destination, Source1, Source2
e.g. Add A, B, C ; A = B + C
• 2-address instructions
Operation Destination, Source
e.g. Move B, C ; B = C
Add A, C ; A = C + A
Here the destination is implicitly also Source2
• 1-address instructions
e.g. Inc A ; A = A + 1
Simple Instruction Set
Assume we have a processor whose Instruction Set consists of four machine language instructions
• Move from a memory location to a data register in CPU
• Move from a data register in CPU to a memory location
• Add the contents of a memory location to a data register
• Stop
Suppose our program for Z = X + Y looks like:
Move D0, X
Add D0, Y
Move Z, D0
Stop
This program is coded into machine instructions and, suppose, is loaded into memory starting at location $0000 0000.
(Figure: memory holding the move, add, move instructions starting at location $0000 0000.)
• How does the CPU know which instruction to execute?
• There is a dedicated register in the CPU called the Program Counter (PC) that points to the memory location where the next instruction is stored. Therefore, at start, PC = $0000 0000
• The instruction is in Main Memory; it must be transferred (fetched) to the CPU to be executed
• The CPU has an Instruction Register (IR) that holds the instruction
• What kind of instruction is to be executed?
• The CPU has its own Instruction Interpreter (Decoder)
• Followed by instruction execution
• The next instruction follows. The PC is incremented by the length of the instruction just completed
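The PC/IR mechanism can be sketched as a loop over the four-instruction machine running Z = X + Y. The encoding here is hypothetical: each instruction is stored as a (mnemonic, operand) pair occupying one memory cell, so the "length of the instruction" is 1. The values of X and Y are made up for the example.

```python
# Hypothetical memory image: program at cells 0..3, data under symbolic names.
memory = {
    0x0000: ("MOVE_TO_REG", "X"),   # Move D0, X
    0x0001: ("ADD", "Y"),           # Add D0, Y
    0x0002: ("MOVE_TO_MEM", "Z"),   # Move Z, D0
    0x0003: ("STOP", None),         # Stop
    "X": 5, "Y": 7, "Z": 0,
}


def run(memory, start=0x0000):
    pc, d0 = start, 0
    while True:
        ir = memory[pc]             # fetch: instruction at PC goes into IR
        pc += 1                     # PC advances by the instruction length (1 cell here)
        op, operand = ir            # decode
        if op == "MOVE_TO_REG":     # execute
            d0 = memory[operand]
        elif op == "ADD":
            d0 += memory[operand]
        elif op == "MOVE_TO_MEM":
            memory[operand] = d0
        elif op == "STOP":
            return pc
        else:
            raise ValueError(f"unknown opcode: {op}")


final_pc = run(memory)
print(memory["Z"], final_pc)
```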
Mechanism of Transferring Data from MM to CPU
The CPU has an external bus that connects it to the memory and I/O devices.
The data lines are connected to the processor via the Memory Data Register (MDR).
The address lines are connected to the processor via the Memory Address Register (MAR).
• The memory address from which the instruction/data is to be accessed is copied into the MAR
• The contents of the MAR are loaded onto the address bus
• The corresponding memory location is accessed
• The contents of this location are put onto the data bus
• The data on the data bus is loaded into the MDR
(Figure: CPU with MAR and MDR connected to MM by the address bus, the data bus, and the control bus carrying R/W.)
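The five steps of a memory read can be mirrored one-to-one in code. This is only a sketch: the class name and the dictionary used as main memory are assumptions, and the buses are modelled as plain local variables.

```python
class MemoryInterface:
    """Hypothetical model of the CPU side of the memory bus (MAR and MDR)."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # address -> stored word
        self.mar = 0
        self.mdr = 0

    def read(self, address):
        self.mar = address                        # address copied into MAR
        address_bus = self.mar                    # MAR contents placed on the address bus
        data_bus = self.main_memory[address_bus]  # location accessed, contents put on data bus
        self.mdr = data_bus                       # data bus loaded into MDR
        return self.mdr


interface = MemoryInterface({0x10: 0x9103, 0x12: 0x9202})
value = interface.read(0x12)
print(f"MAR=${interface.mar:02X} MDR=${value:04X}")
```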
General Purpose Register (GPR) Architecture
Its functional units are:
- Data Registers: the Register File (consists of a number of data registers, for example D0, D1, D2, ..., D7) for arithmetic operations; holds any kind of data
- Address Registers: A0, A1, A2, ..., A7 serve as pointers to memory addresses
- Working Registers: several such registers serve as scratch pads for the CPU
- Program Counter (PC): holds the address in memory of the next instruction to be executed. After an instruction is fetched from memory, the PC is automatically incremented to hold the address of, or point to, the next instruction to be executed.
- Instruction Register (IR): holds the instruction most recently read from memory while it is being decoded by the Instruction Interpreter.
- Memory Address Register (MAR): holds the address of the next location to be accessed in memory.
- Memory Buffer Register (MBR or MDR): holds the data just read from memory, or the data which is about to be written to memory. "Buffer" refers to holding data temporarily.
- Status Register (SR): records status information
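GPR CPU (16 bit)
(Figure: block diagram of a 16 bit CPU containing the Register File (registers 0 to 3), MDR, IR, MAR, Interpreter, ALU, PC with Increment unit, and Control, connected through the data bus and address bus to an 8 bit Memory.)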
Program Execution
Fetch Cycle:
• The processor fetches one instruction at a time from successive memory locations until a branch/jump occurs.
• The instruction is located in the memory location pointed to by the PC
• The instruction is loaded into the IR
• The contents of the PC are incremented by the size of an instruction
Decode Cycle:
• The instruction is decoded/interpreted; the opcode provides the type of operation to be performed and the nature and mode of the operands
• The decoder and control logic unit are responsible for selecting the registers involved and directing the data transfer.
Execute Cycle:
• Carry out the actions specified by the instruction in the IR
Execution for add R3, R1, R2 in a GPR processor
R3 ← R1 + R2 (16 bit instruction)
Fetch:
  MAR ← PC ; memory location of the instruction
  MDR ← M[MAR]
  PC ← PC + 2
  IR ← MDR
Decode
Execute:
  R3 ← R1 + R2
Instruction Execution in GPR Machine
ADD R3, R1, R2
R3 <- R1 + R2
Memory:
  $10 9103
  $12 9202
  $14 A039
1) Fetch (PC = $14) & Inc PC
2) Decode: A039 = 1010 xxxx xx11 10 01, i.e. ADD R3, R2, R1
3) Execute
(Figure: CPU containing PC, IR, MAR, MDR, Control, registers R0 to R15, and ALU, connected to Memory.)
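The decode step for A039 can be sketched with shifts and masks. The field positions below are read off the pattern 1010 xxxx xx11 10 01 on the slide (4-bit opcode, six don't-care bits, three 2-bit register fields); the function name is an assumption, and a real encoding for sixteen registers would need wider fields.

```python
def decode_register_add(instruction: int):
    """Split a 16-bit word laid out as oooo xxxx xxaa bbcc into its fields."""
    opcode = (instruction >> 12) & 0xF   # bits 15..12
    reg_a = (instruction >> 4) & 0b11    # bits 5..4
    reg_b = (instruction >> 2) & 0b11    # bits 3..2
    reg_c = instruction & 0b11           # bits 1..0
    return opcode, reg_a, reg_b, reg_c


opcode, reg_a, reg_b, reg_c = decode_register_add(0xA039)
print(f"opcode={opcode:04b} -> ADD R{reg_a}, R{reg_b}, R{reg_c}")
```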
GPR CPU
(Figure: the GPR CPU block diagram: Register File, MDR, IR, MAR, Interpreter, ALU, PC with Increment unit, and Control in a 16 bit CPU, connected by the data bus and address bus to an 8 bit Memory.)
Timing Analysis of Instruction Execution: ADD R3, R1, R2
Clock cycles (period P): regular time intervals defined by the CPU clock
Clock rate, R = 1/P cycles per second (Hz)
  500 MHz => P = 2 ns
  1.25 GHz => P = 0.8 ns
  3.33 GHz => P = 0.3 ns
Processor: 3.33 GHz, 1/(3.3 × 10^9) = 0.3 ns
Memory: 333 MHz, 1/(333 × 10^6) = 3 ns

Time        Cycles   Operation        Phase
0.3 ns      1        MAR ← PC         fetch
3+0.3 ns    10+1     MDR ← M[MAR]     fetch
0.3 ns      1        IR ← MDR         fetch
2×0.3 ns    1+1      PC ← PC+2        Inc PC
0.3 ns      1        decode           decode
3×0.3 ns    1+1+1    R3 ← R1+R2       execute
Total: 5.7 ns, 19 clock cycles
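The total can be re-derived by summing the per-step cycle counts and multiplying by the processor clock period. This is a sketch; the list simply transcribes the cycle counts from the timing analysis, and the variable names are illustrative.

```python
PROCESSOR_HZ = 3.33e9

# (micro-operation, clock cycles) as listed in the timing analysis.
steps = [
    ("MAR <- PC", 1),
    ("MDR <- M[MAR]", 10 + 1),   # memory access plus register transfer
    ("IR <- MDR", 1),
    ("PC <- PC+2", 1 + 1),
    ("decode", 1),
    ("R3 <- R1+R2", 1 + 1 + 1),
]

total_cycles = sum(cycles for _, cycles in steps)
period_ns = 1e9 / PROCESSOR_HZ           # clock period in nanoseconds
total_ns = total_cycles * period_ns

print(f"{total_cycles} clock cycles, {total_ns:.1f} ns")
```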

Timing Analysis of Instruction Execution: ADD R3, R1, #100
Encoding: 1010 11 01 0110 0100 (ADD R3, R1, #100; the last 8 bits hold the immediate value, imm)

Time        Cycles   Operation            Phase
0.3 ns      1        MAR ← PC             fetch
3+0.3 ns    10+1     MDR ← M[MAR]         fetch
0.3 ns      1        IR ← MDR             fetch
2×0.3 ns    1+1      PC ← PC+2
0.3 ns      1        decode               decode (data 100 extracted from IR)
3×0.3 ns    1+1+1    R3 ← R1 + IR[100]    execute
Total: 5.7 ns, 19 clock cycles

Execution for add R1, R0, X in a GPR processor
Instruction in memory: A464 = 1010 01 00 0110 0100, i.e. ADD R1, R0, X with X = $64
Fetch:
  MAR ← PC ; memory location of the instruction
  MDR ← M[MAR]
  PC ← PC + 2
  IR ← MDR
Decode
Execute:
  MAR ← IR(X) ; address X extracted from IR
  MDR ← M[MAR] ; contents of address X transferred to MDR
  R1 ← MDR + R0 ; contents of address X added to R0
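The register transfers above can be traced in a few lines of code. This is a sketch under stated assumptions: memory is a dictionary mapping an address to a 16-bit word, the field layout (4-bit opcode, 2-bit destination, 2-bit source, 8-bit address) is read off the A464 bit pattern, and the instruction address $10 and the data value at $64 are made up.

```python
def execute_add_memory_operand(memory, regs, pc):
    """Trace ADD Rd, Rs, X: fetch, decode, then one extra memory read for X."""
    # Fetch
    mar = pc
    mdr = memory[mar]
    pc = pc + 2
    ir = mdr
    # Decode (layout assumed: oooo dd ss aaaaaaaa)
    dest = (ir >> 10) & 0b11
    src = (ir >> 8) & 0b11
    address_x = ir & 0xFF
    # Execute
    mar = address_x                            # MAR <- IR(X)
    mdr = memory[mar]                          # MDR <- M[MAR]
    regs[dest] = (mdr + regs[src]) & 0xFFFF    # Rd <- MDR + Rs, kept to 16 bits
    return pc


memory = {0x10: 0xA464, 0x64: 5}   # hypothetical instruction address and data
regs = [3, 0, 0, 0]                # R0 = 3
next_pc = execute_add_memory_operand(memory, regs, 0x10)
print(regs[1], hex(next_pc))
```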
GPR CPU
(Figure: the GPR CPU block diagram: Register File, MDR, IR, MAR, Interpreter, ALU, PC with Increment unit, and Control in a 16 bit CPU, connected by the data bus and address bus to an 8 bit Memory.)
Complex (multiple words) instruction
Assembly: ADD R4, ($2000), ($2002)
Operation: R4 ← M[$2000] + M[$2002]
Encoding:
  ADD      R4       M[$2000]   M[$2002]
  1010     0100     $2000      $2002
  4 bits   4 bits   16 bits    16 bits     = 40 bits = 5 bytes!!
With 16-bit data:
  1010 XXXX 0100 XXXX   $2000     $2002
  16 bits               16 bits   16 bits  = 48 bits = 6 bytes
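ADD R4, ($2000), ($2002)
Encoding: 1010 0000 0100 0000 | $2000 | $2002 (16 bits each): 48 bits = 6 bytes = 3 words
Memory contents:
  Instruction: $1000 A040
               $1002 2000
               $1004 2002
  Data:        $2000 0710
               $2002 0311
(Figure: PC pointing to the 48-bit instruction; IR holding A040; MAR, MDR, Temp 1, and Temp 2 carrying the addresses $2000, $2002 and the data $0710, $0311 over 16-bit paths to the adder, whose result goes to R4.)
16 bit data, 16 bit address architecture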

ADD R4, ($2000), ($2002)
Fetch:
  MAR ← PC ; $1000
  MDR ← M[MAR] ; $A040
  IR ← MDR ; $A040
Decode: interpret the opcode as add reg, mem1, mem2 (as opposed to some other opcode)
ADD R4, ($2000), ($2002)
Fetch address of memory operand 1:
  PC ← PC+2 ; $1002
  MAR ← PC ; $1002
  MDR ← M[MAR] ; $2000
  Temp1 ← MDR ; $2000
Fetch address of memory operand 2:
  PC ← PC+2 ; $1004
  MAR ← PC ; $1004
  MDR ← M[MAR] ; $2002
  Temp2 ← MDR ; $2002
ADD R4, ($2000), ($2002)
Fetch memory operand 1:
  MAR ← Temp1 ; $2000
  MDR ← M[MAR] ; $0710
  Temp1 ← MDR ; $0710
Fetch memory operand 2:
  MAR ← Temp2 ; $2002
  MDR ← M[MAR] ; $0311
  Temp2 ← MDR ; $0311
Execution:
  R4 ← Temp1+Temp2 ; $0710+$0311
Increment PC:
  PC ← PC+2 ; $1006
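The whole three-word instruction can be traced end to end with the memory contents shown on the slides. This is a sketch, not the course's reference model: the dictionary memory and the function name are assumptions, and the destination register is taken from bits 7..4 of the first word as in the 1010 XXXX 0100 XXXX layout.

```python
def execute_add_mem_mem(memory, regs, pc):
    """Trace ADD Rd, (addr1), (addr2) for a three-word instruction."""
    # Fetch the opcode word and decode the destination register.
    mar = pc
    mdr = memory[mar]
    ir = mdr
    dest = (ir >> 4) & 0xF           # 1010 XXXX dddd XXXX
    # Fetch the two operand addresses from the following words.
    pc += 2
    mar = pc
    temp1 = memory[mar]              # address of memory operand 1
    pc += 2
    mar = pc
    temp2 = memory[mar]              # address of memory operand 2
    # Fetch the operands themselves.
    mar = temp1
    temp1 = memory[mar]
    mar = temp2
    temp2 = memory[mar]
    # Execute and step the PC past the instruction.
    regs[dest] = (temp1 + temp2) & 0xFFFF
    pc += 2
    return pc


memory = {0x1000: 0xA040, 0x1002: 0x2000, 0x1004: 0x2002,
          0x2000: 0x0710, 0x2002: 0x0311}
regs = [0] * 16
next_pc = execute_add_mem_mem(memory, regs, 0x1000)
print(f"R4=${regs[4]:04X} PC=${next_pc:04X}")
```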

ADD R4, ($2000), ($2002)
Processor: 3.3 GHz; Memory: 333 MHz
Memory access: 10 cc; Register transfer: 1 cc; Logical operation: 1 cc; Arithmetic operation: 2 cc

Time     Cycles   Operation
0.3 ns   1        MAR ← PC
3.3 ns   10+1     MDR ← M[MAR]
0.3 ns   1        IR ← MDR
0.3 ns   1        decode
0.9 ns   2+1      PC ← PC+2
0.3 ns   1        MAR ← PC
3.3 ns   10+1     MDR ← M[MAR]
0.3 ns   1        Temp1 ← MDR
0.9 ns   2+1      PC ← PC+2
0.3 ns   1        MAR ← PC
3.3 ns   10+1     MDR ← M[MAR]
0.3 ns   1        Temp2 ← MDR
0.3 ns   1        MAR ← Temp1
3.3 ns   10+1     MDR ← M[MAR]
0.3 ns   1        Temp1 ← MDR
0.3 ns   1        MAR ← Temp2
3.3 ns   10+1     MDR ← M[MAR]
0.3 ns   1        Temp2 ← MDR
1.2 ns   1+2+1    R4 ← Temp1+Temp2
0.9 ns   2+1      PC ← PC+2
Total: 79 cc = 23.7 ns
ADD R4, ($2000), ($2002)
Processor: 3.3 GHz; Memory: 333 MHz
Memory access: 10 cc; Register transfer: 1 cc; Logical operation: 1 cc; Arithmetic operation: 2 cc
Of the total of 79 cc = 23.7 ns, the five memory reads (MDR ← M[MAR]) account for 5 × 10 cc = 50 cc: 63% memory traffic!!
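The memory-traffic share can be recomputed from the micro-operation list. This sketch assumes the full sequence contains five memory reads of 10 + 1 cycles each (one per word fetched), which is what makes the stated total of 79 cc add up; the tuple layout and names are illustrative.

```python
MEMORY_ACCESS_CC = 10
CYCLE_NS = 0.3

# (micro-operation, clock cycles, involves a memory access)
micro_ops = [
    ("MAR <- PC", 1, False), ("MDR <- M[MAR]", 11, True), ("IR <- MDR", 1, False),
    ("decode", 1, False),
    ("PC <- PC+2", 3, False), ("MAR <- PC", 1, False),
    ("MDR <- M[MAR]", 11, True), ("Temp1 <- MDR", 1, False),
    ("PC <- PC+2", 3, False), ("MAR <- PC", 1, False),
    ("MDR <- M[MAR]", 11, True), ("Temp2 <- MDR", 1, False),
    ("MAR <- Temp1", 1, False), ("MDR <- M[MAR]", 11, True), ("Temp1 <- MDR", 1, False),
    ("MAR <- Temp2", 1, False), ("MDR <- M[MAR]", 11, True), ("Temp2 <- MDR", 1, False),
    ("R4 <- Temp1+Temp2", 4, False),
    ("PC <- PC+2", 3, False),
]

total_cycles = sum(cycles for _, cycles, _ in micro_ops)
memory_cycles = sum(MEMORY_ACCESS_CC for _, _, is_memory in micro_ops if is_memory)
memory_share = memory_cycles / total_cycles

print(f"{total_cycles} cc = {total_cycles * CYCLE_NS:.1f} ns, "
      f"memory traffic {memory_share:.0%}")
```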
Accumulator (Acc) Architecture
• Its functional units are the same as in the GPR architecture, except that there is only ONE register, the accumulator (Acc), instead of the Register File
Ex: Z = X + Y
Move contents of location X to Acc
Add contents of location Y to Acc
Move from Acc to location Z
Stop
• All operations and data movements are on this single register
• Most of the instructions in the instruction set require only one operand
• Destination and source are implicitly Acc
• Leads to shorter instructions, but the program may be slower to execute since there are more moves to memory for intermediate results (to free Acc)
• May lead to inefficiency
Accumulator Architecture CPU
(Figure: block diagram of a 16 bit CPU containing Acc, MDR, IR, MAR, Interpreter, ALU, PC with Increment unit, and Control, connected through the data bus and address bus to a 10 bit Memory.)
Execution for Add Y in an Acc Architecture
ADD Y: ACC ← ACC + M[Y]
Fetch:
  MAR ← PC
  MDR ← M[MAR]
  PC ← PC + 2
  IR ← MDR
Decode
Execute:
  MAR ← Y ; address Y extracted from IR
  MDR ← M[MAR] ; contents of address Y transferred to MDR
  Acc ← MDR + Acc ; contents of address Y added to Accumulator
Your First Assembly Program
We want to compute
a ← (x+y) * (x-y)
where x, y and a are memory references
1) Write an assembly program for the above task using:
a) Accumulator machine
b) GPR machine (4 registers)
2) Assume all accumulator instructions take 50 ns, and all GPR instructions take 30 ns, except move M-R/R-M, which take 50 ns.
Compute the total execution time of the programs in 1a) and 1b)
Instruction Sets
1) Accumulator Machine
ADD X ; ACC ← ACC + M[X]
SUB X ; ACC ← ACC - M[X]
MUL X ; ACC ← ACC * M[X]
LD X ; ACC ← M[X]
ST X ; M[X] ← ACC
X: address in memory
x: data in memory, i.e., M[X] = x
(Figure: memory, with data x stored at address X.)
2) GPR Machine
ADD Rk, Ri, Rj ; Rk ← Ri + Rj
SUB Rk, Ri, Rj ; Rk ← Ri - Rj
MUL Rk, Ri, Rj ; Rk ← Ri * Rj
MOV Rj, Ri ; Rj ← Ri
MOV X, Ri ; M[X] ← Ri
MOV Ri, X ; Ri ← M[X]
Operand order, e.g. ADD Rk, Ri, Rj: dest, src1, src2
1) Accumulator Machine
LD X ; ACC ← M[X] = x
ADD Y ; ACC ← M[Y] + x
ST T ; M[T] = temp ← x + y
LD X ; ACC ← M[X] = x
SUB Y ; ACC ← ACC - M[Y] = x-y
MUL T ; ACC ← ACC * M[T] = (x-y)*temp
ST A ; M[A] = a ← (x-y)*(x+y)
(Figure: memory holding data x, y, a, temp at addresses X, Y, A, T.)
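The accumulator program can be checked with a tiny interpreter for the five instructions LD, ST, ADD, SUB, MUL. This is a sketch with made-up input values; note that X is reloaded before the subtraction so that ACC holds x rather than x + y at that point.

```python
def run_accumulator(program, memory):
    """Interpret (mnemonic, address) pairs on a single-accumulator machine."""
    acc = 0
    for op, address in program:
        if op == "LD":
            acc = memory[address]      # ACC <- M[X]
        elif op == "ST":
            memory[address] = acc      # M[X] <- ACC
        elif op == "ADD":
            acc += memory[address]     # ACC <- ACC + M[X]
        elif op == "SUB":
            acc -= memory[address]     # ACC <- ACC - M[X]
        elif op == "MUL":
            acc *= memory[address]     # ACC <- ACC * M[X]
        else:
            raise ValueError(f"unknown instruction: {op}")
    return acc


acc_program = [("LD", "X"), ("ADD", "Y"), ("ST", "T"),
               ("LD", "X"), ("SUB", "Y"), ("MUL", "T"), ("ST", "A")]
acc_memory = {"X": 7, "Y": 3, "T": 0, "A": 0}   # example values, x = 7 and y = 3
run_accumulator(acc_program, acc_memory)
print(acc_memory["A"])   # (x + y) * (x - y) for the example values
```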
2) GPR
MOV R0, X ; R0 ← M[X] = x
MOV R1, Y ; R1 ← M[Y] = y
ADD R2, R0, R1 ; R2 ← R0+R1
SUB R3, R0, R1 ; R3 ← R0-R1
MUL R0, R2, R3 ; R0 ← R2*R3
MOV A, R0 ; M[A] ← R0 = R2*R3
; a ← (x+y)*(x-y)
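A matching sketch for the GPR program uses four registers and tuples for instructions. The tuple encoding and the LOAD/STORE names (standing for MOV Ri, X and MOV X, Ri) are assumptions made for the example, as are the input values.

```python
import operator

ALU_OPS = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul}


def run_gpr(program, memory, num_regs=4):
    """Interpret a small three-address register program."""
    regs = [0] * num_regs
    for instruction in program:
        op = instruction[0]
        if op == "LOAD":               # MOV Ri, X : Ri <- M[X]
            _, ri, x = instruction
            regs[ri] = memory[x]
        elif op == "STORE":            # MOV X, Ri : M[X] <- Ri
            _, x, ri = instruction
            memory[x] = regs[ri]
        elif op in ALU_OPS:            # op Rk, Ri, Rj : Rk <- Ri op Rj
            _, rk, ri, rj = instruction
            regs[rk] = ALU_OPS[op](regs[ri], regs[rj])
        else:
            raise ValueError(f"unknown instruction: {instruction}")
    return regs


gpr_program = [
    ("LOAD", 0, "X"),      # MOV R0, X
    ("LOAD", 1, "Y"),      # MOV R1, Y
    ("ADD", 2, 0, 1),      # ADD R2, R0, R1
    ("SUB", 3, 0, 1),      # SUB R3, R0, R1
    ("MUL", 0, 2, 3),      # MUL R0, R2, R3
    ("STORE", "A", 0),     # MOV A, R0
]
gpr_memory = {"X": 7, "Y": 3, "A": 0}   # example values, x = 7 and y = 3
final_regs = run_gpr(gpr_program, gpr_memory)
print(gpr_memory["A"], final_regs)
```

Unlike the accumulator version, no temporary memory location T is needed because the intermediate sum and difference stay in R2 and R3.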
The Assembly Programs
1) Accumulator Machine
LD X ; ACC ← M[X] = x
ADD Y ; ACC ← ACC + M[Y] = x + y
ST T ; M[T] = temp ← ACC = x + y
LD X ; ACC ← M[X] = x
SUB Y ; ACC ← ACC - M[Y] = x-y
MUL T ; ACC ← (x-y)*M[T] = (x-y)*temp
ST A ; M[A] = a ← (x-y)*(x+y)
(Figure: memory holding data x, y, a, temp at addresses X, Y, A, T.)
2) GPR
MOV R0, X ; R0 ← M[X] = x
MOV R1, Y ; R1 ← M[Y] = y
ADD R2, R0, R1 ; R2 ← R0+R1
SUB R3, R0, R1 ; R3 ← R0-R1
MUL R0, R2, R3 ; R0 ← R2*R3
MOV A, R0 ; M[A] ← R0 = R2*R3
; a ← (x+y)*(x-y)
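Part 2 of the exercise can be worked out mechanically from the stated costs: 50 ns for every accumulator instruction, 30 ns for GPR instructions, and 50 ns for GPR moves between memory and a register. The sketch below is one possible tally, assuming the seven-instruction accumulator program and the six-instruction GPR program; the helper names are illustrative.

```python
ACC_INSTR_NS = 50
GPR_INSTR_NS = 30
GPR_MEMORY_MOVE_NS = 50

ACC_PROGRAM = ["LD X", "ADD Y", "ST T", "LD X", "SUB Y", "MUL T", "ST A"]
GPR_PROGRAM = [
    "MOV R0, X", "MOV R1, Y", "ADD R2, R0, R1",
    "SUB R3, R0, R1", "MUL R0, R2, R3", "MOV A, R0",
]


def is_register(operand):
    """True for operands written as R followed by digits, e.g. R0."""
    return operand.startswith("R") and operand[1:].isdigit()


def gpr_instruction_ns(instruction):
    """A MOV with a non-register operand is a memory move and costs 50 ns."""
    mnemonic, operand_text = instruction.split(maxsplit=1)
    operands = [o.strip() for o in operand_text.split(",")]
    touches_memory = mnemonic == "MOV" and not all(is_register(o) for o in operands)
    return GPR_MEMORY_MOVE_NS if touches_memory else GPR_INSTR_NS


acc_total_ns = ACC_INSTR_NS * len(ACC_PROGRAM)
gpr_total_ns = sum(gpr_instruction_ns(i) for i in GPR_PROGRAM)
print(f"accumulator: {acc_total_ns} ns, GPR: {gpr_total_ns} ns")
```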
Next …. Memory (Topic #3_2)