Topic #3 - CPU - With Notations


Concordia University

Electrical & Computer Engineering

COEN 311
Computer Organization & Software
Topic #3 - continued
Principal Components of a Computer (CPU)
Dr. Fadi Alzhouri

Adapted from notes by Dr. Sofiene Tahar and Dr. Anjali Agarwal
Performance: Execution Time

• Processor time depends on the hardware (processor and memory connected by a bus)
involved in the execution of individual machine instructions.
• CPU time
  • Time spent processing a given job
  • Excludes I/O time and other jobs' shares
  • Combines user CPU time and system CPU time.
Performance: Execution Time = CPU Time
• Computer operations are synchronized with a clock signal.
• The clock determines when events must happen.
• The clock period (or clock cycle) is the duration of one complete cycle of the clock signal.
• The clock rate (or frequency) is the reciprocal of the clock period.
• Example: if the clock period is 250 ps, the clock rate is 4 GHz.

• CPU Clocking
  • Clock period: duration of a clock cycle
    • E.g., 250 ps = 0.25 ns = 250×10^-12 s
  • Clock frequency (rate): cycles per second
    • E.g., 4.0 GHz = 4.0×10^9 Hz
CPU Time
$$\text{CPU Time} = \text{CPU Clock Cycles} \times \text{Clock Cycle Time} = \frac{\text{CPU Clock Cycles}}{\text{Clock Rate}}$$
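As a rough illustration of the relation above, the following Python sketch converts a clock rate into a clock period and evaluates CPU time. The function names are illustrative and not from the course notes; the job size of 20×10^9 cycles at 2 GHz is an assumed sample value.

```python
def clock_period(clock_rate_hz: float) -> float:
    """Clock period in seconds: the reciprocal of the clock rate."""
    return 1.0 / clock_rate_hz


def cpu_time(clock_cycles: float, clock_rate_hz: float) -> float:
    """CPU time in seconds = clock cycles / clock rate."""
    return clock_cycles / clock_rate_hz


period_s = clock_period(4.0e9)      # 4 GHz clock -> 250 ps period
time_s = cpu_time(20e9, 2.0e9)      # assumed job: 20e9 cycles at 2 GHz

print(f"period = {period_s * 1e12:.0f} ps")
print(f"CPU time = {time_s:.1f} s")
```

Fewer cycles or a higher clock rate both shrink the result, which is the point of the formula.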

• Performance improved by
  • Reducing number of clock cycles
  • Increasing clock rate (reducing length of clock cycles)
Example: CPU Time

• A program runs in 10 seconds on computer A, which has a 2 GHz clock. We want to
have a computer, B, which will run this program in 6 seconds. The designer has
determined that a substantial increase in the clock rate is possible, but this
increase will affect the rest of the CPU design, causing computer B to require
1.2 times as many clock cycles as computer A for this program.

What clock rate should we tell the designer to target?
Example: CPU Time (cont.)

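$$\text{Clock Rate}_B = \frac{\text{Clock Cycles}_B}{\text{CPU Time}_B} = \frac{1.2 \times \text{Clock Cycles}_A}{6\,\text{s}}$$

$$\text{Clock Cycles}_A = \text{CPU Time}_A \times \text{Clock Rate}_A = 10\,\text{s} \times 2\,\text{GHz} = 20 \times 10^{9}$$

$$\text{Clock Rate}_B = \frac{1.2 \times 20 \times 10^{9}}{6\,\text{s}} = \frac{24 \times 10^{9}}{6\,\text{s}} = 4\,\text{GHz}$$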
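The same calculation can be checked numerically. This is a minimal Python sketch using only the values given in the example; the variable names are illustrative.

```python
# Computer A: values given in the example.
time_a_s = 10.0            # program runs in 10 s
rate_a_hz = 2.0e9          # 2 GHz clock

# Clock cycles the program needs on A, and on B (1.2 times as many).
cycles_a = time_a_s * rate_a_hz
cycles_b = 1.2 * cycles_a

# Clock rate B must reach to finish in 6 s.
target_time_b_s = 6.0
rate_b_hz = cycles_b / target_time_b_s

print(f"Clock rate B = {rate_b_hz / 1e9:.1f} GHz")
```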

Components of a Computer System

Central Processing Unit (CPU)
• CPU is the heart and brain
• It interprets and executes machine level instructions
• Controls data transfer from/to Main Memory (MM) and CPU
• Detects any errors

• In the following lectures, we will learn:


• Instruction representation
• Data transfer mechanism between MM and CPU
• The internal functional units of two different CPU architectures
• How these units are interconnected
• How a processor executes instructions

Instruction Representation
• CPU operation is determined by the instruction it executes
• The collection of instructions that a CPU can execute forms its Instruction Set
Architecture (ISA).
• An instruction is represented as a sequence of bits, for example:
  Binary: 1001 0010 0000 0011 1011 1011 1000 0001
  Hex:       9    2    0    3    B    B    8    1
• The instruction is divided into fields: Opcode, Operand1, Operand2

• Opcode indicates the operation to be performed, e.g., 92 above indicates a copy
operation (MOV). The number of bits in the opcode field depends on the number of
instructions in the ISA, the number of instruction types and the complexity of the
instructions.
• We need one, two or three fields for operands, for source and destination
• The opcode also represents
  • The nature of the operands (data or address): in the example, operand 1 is an address and operand 2 is data
  • The addressing mode (register or memory): in the example, operand 1 is in memory and operand 2 is immediate data.
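To make the field idea concrete, here is a small Python sketch that separates the opcode from the operand bits of the example word. The 8-bit opcode width is an assumption inferred from "92" being called the opcode; the notes do not give the boundary between Operand1 and Operand2, so the remaining 24 bits are left unsplit.

```python
# The 32-bit example instruction, written as in the notes.
word = 0b1001_0010_0000_0011_1011_1011_1000_0001

WORD_BITS = 32
OPCODE_BITS = 8            # assumption: opcode is the top two hex digits (92)

# Shift the operand bits away to isolate the opcode field.
opcode = word >> (WORD_BITS - OPCODE_BITS)

# Mask off the opcode to keep only the operand bits.
operand_bits = word & ((1 << (WORD_BITS - OPCODE_BITS)) - 1)

print(f"word     = {word:08X}")
print(f"opcode   = {opcode:02X}")
print(f"operands = {operand_bits:06X}")
```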
Basic Instruction Types
Not all instructions require two operands
• 3-address instructions
Operation Destination, Source1, Source2
e.g. Add A, B, C ;A=B+C
• 2-address instructions
Operation Destination, Source
e.g. Move B, C ;B=C
     Add A, C ;A=C+A
Here the destination A also serves implicitly as the second source operand (Source2)

• 1-address instructions
e.g. Inc A ; A = A+1
Simple Instruction Set
Assume we have a processor whose Instruction Set consists of four machine
language instructions
• Move from a memory location to a data register in CPU
• Move from a data register in CPU to a memory location
• Add the contents of a memory location to a data register
• Stop
Suppose our program for Z = X + Y looks like:
Move D0, X
Add D0, Y
Move Z, D0
Stop
This program is coded into machine instructions and, suppose, is loaded into memory
starting at location $0000 0000
[Figure: memory map starting at $0000 0000 with entries labelled move, add, move and Directive]
• How does the CPU know which instruction to execute?
• There is a dedicated register in the CPU called the Program Counter (PC)
that points to the memory location where the next instruction is stored
Therefore, at start PC = $0000 0000
• The instruction is in Main Memory – it is to be transferred
(fetched) to the CPU to be executed
• The CPU has an Instruction Register (IR) that holds the instruction
• What kind of instruction is to be executed?
• The CPU has its own Instruction Interpreter (Decoder)
• Followed by instruction execution
• The next instruction follows. The PC is incremented by the length of the
instruction just completed
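The cycle described above can be sketched as a small Python loop running the Z = X + Y program. This is only an illustration under stated assumptions: the tuple encoding, the 4-byte instruction length, the operation names and the data values 7 and 5 are all hypothetical and not part of the notes.

```python
INSTR_LEN = 4   # assumption: every instruction occupies 4 bytes

# Main memory: instructions at numeric addresses, data at named locations.
memory = {
    0x0000: ("MOVE_TO_REG", "D0", "X"),   # Move D0, X
    0x0004: ("ADD", "D0", "Y"),           # Add  D0, Y
    0x0008: ("MOVE_TO_MEM", "Z", "D0"),   # Move Z, D0
    0x000C: ("STOP",),                    # Stop
    "X": 7, "Y": 5, "Z": 0,               # illustrative data values
}
registers = {"D0": 0}

pc = 0x0000                        # at start, PC = $0000 0000
while True:
    ir = memory[pc]                # fetch: the instruction is copied to the IR
    pc += INSTR_LEN                # PC advances by the instruction length
    op = ir[0]                     # decode: inspect the opcode
    if op == "MOVE_TO_REG":        # execute
        registers[ir[1]] = memory[ir[2]]
    elif op == "ADD":
        registers[ir[1]] += memory[ir[2]]
    elif op == "MOVE_TO_MEM":
        memory[ir[1]] = registers[ir[2]]
    elif op == "STOP":
        break

print(memory["Z"])
```

With the illustrative data, Z ends up holding X + Y.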
Mechanism of Transferring Data from MM to CPU
CPU has an external bus that connects it to the Memory and I/O devices.
The data lines are connected to the processor via the Memory Data Register
(MDR)
The address lines are connected to the processor via the Memory Address
Register (MAR)
• Memory address from where the instruction/data is to be accessed is copied into
the MAR
• Contents of MAR are loaded onto the address bus
• Corresponding memory location is accessed
• Contents of this location are put onto the data bus
• Data on the data bus is loaded into MDR
[Figure: CPU (MAR, MDR) connected to MM by the address bus, data bus and control bus (R/W)]
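The read sequence above can be mimicked in a few lines of Python. This is a sketch, not a hardware model: the class name `Cpu`, the dictionary standing in for main memory, and the sample address and value are illustrative assumptions.

```python
class Cpu:
    """Toy CPU front end with only the two bus-facing registers."""

    def __init__(self, memory):
        self.memory = memory    # main memory (MM), modelled as a dict
        self.mar = 0            # Memory Address Register
        self.mdr = 0            # Memory Data Register

    def read(self, address):
        self.mar = address                    # address is copied into MAR
        address_bus = self.mar                # MAR contents go onto the address bus
        data_bus = self.memory[address_bus]   # MM puts the location's contents on the data bus
        self.mdr = data_bus                   # data bus contents are loaded into MDR
        return self.mdr


cpu = Cpu({0x2000: 0x0710})     # illustrative memory contents
value = cpu.read(0x2000)
print(f"{value:04X}")
```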
General Purpose Register (GPR) Architecture
Its functional units are:
- Data Registers: Register File (consists of a number of data registers, for example D0, D1,
D2,..., D7) for arithmetic operations – holds any kind of data
- Address Registers: A0, A1, A2,..., A7 serve as pointers to memory addresses
- Working Registers: several such registers – serve as scratch pads for the CPU
- Program Counter (PC) holds the address in memory of the next instruction to be
executed. After an instruction is fetched from memory, the PC is automatically
incremented to hold the address of, or point to, the next instruction to be executed.
- Instruction Register (IR) holds the most recently read instruction from memory while it
is being decoded by the Instruction Interpreter.
- Memory Address Register (MAR) holds the address of the next location to be accessed
in memory.
- Memory Buffer Register (MBR or MDR) holds the data just read from memory, or the
data which is about to be written to memory. "Buffer" refers to holding data
temporarily.
- Status Register (SR) records status information
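GPR CPU (16 bit)
[Figure: block diagram of the GPR CPU: Register File (registers 0 to 3), IR, Interpreter, Control, ALU, PC with Increment unit, MDR on the data bus and MAR on the address bus, connected to Memory; labels: 16 bit CPU, 8 bit Memory]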
Program Execution
Fetch Cycle:
• The processor fetches one instruction at a time from successive memory locations until
a branch/jump occurs.
• The instruction is located at the memory location pointed to by the PC
• The instruction is loaded into the IR
• The contents of the PC are incremented by the size of an instruction
Decode Cycle:
• Instruction is decoded/interpreted, opcode will provide the type of operation to be
performed, the nature and mode of the operands
• Decoder and control logic unit are responsible for selecting the registers involved
and directing the data transfer.
Execute Cycle:
• Carry out the actions specified by the instruction in the IR
Execution for add R3, R1, R2 in a GPR processor
Operation: R3 ← R1 + R2 (16 bit instruction)

Fetch:
  MAR ← PC          (memory location of the instruction)
  MDR ← M[MAR]
  PC ← PC + 2
  IR ← MDR
Decode
Execute:
  R3 ← R1 + R2
Instruction Execution in GPR Machine
ADD R3, R1, R2
R3 <- R1 + R2

Memory:
  $10  9103
  $12  9202
  $14  A039

1) Fetch the instruction at PC = $14 and increment the PC
2) Decode: A039 = 1010 xxxx xx11 10 01, i.e. ADD R3, R2, R1
3) Execute

[Figure: CPU containing MAR, MDR, Control, PC, IR, registers R0 to R15 and the ALU, connected to Memory]
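The decode step for A039 can be sketched in Python. The field positions are read off the bit pattern 1010 xxxx xx11 10 01 shown above (4-bit opcode, then three 2-bit register fields in the lowest six bits); the function name and the one-entry opcode table are illustrative assumptions.

```python
def decode(word: int) -> str:
    """Decode a 16-bit register-register instruction word."""
    opcode = (word >> 12) & 0xF     # top 4 bits
    dest = (word >> 4) & 0b11       # bits 5..4
    src1 = (word >> 2) & 0b11       # bits 3..2
    src2 = word & 0b11              # bits 1..0
    mnemonic = {0b1010: "ADD"}.get(opcode, "???")   # only ADD is known here
    return f"{mnemonic} R{dest}, R{src1}, R{src2}"


print(decode(0xA039))
```

The decoded operand order is R3, R2, R1; since addition is commutative this computes the same result as ADD R3, R1, R2.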
Timing Analysis of Instruction Execution ADD R3, R1, R2

Clock Cycles (P) – regular time intervals defined by the CPU clock
Clock Rate, R = 1/P cycles per second (Hz)
  500 MHz  => P = 2 ns
  1.25 GHz => P = 0.8 ns
  3.33 GHz => P = 0.3 ns
Processor: 3.33 GHz, 1/(3.3 × 10^9) = 0.3 ns
Memory: 333 MHz, 1/(333 × 10^6) = 3 ns

Phase     Operation        Cycles   Time
fetch     MAR ← PC         1        0.3 ns
fetch     MDR ← M[MAR]     10+1     3 + 0.3 ns
fetch     IR ← MDR         1        0.3 ns
Inc PC    PC ← PC + 2      1+1      2 × 0.3 ns
decode    decode           1        0.3 ns
execute   R3 ← R1 + R2     1+1+1    3 × 0.3 ns
Total                      19       5.7 ns
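The total can be reproduced by summing the per-step cycle counts. A minimal Python sketch, using the cycle counts from the slide and a 0.3 ns clock period; the list layout and names are illustrative.

```python
CLOCK_PERIOD_NS = 0.3     # 3.33 GHz processor
MEMORY_ACCESS_CC = 10     # one 3 ns memory access = 10 processor cycles

# (operation, clock cycles) for ADD R3, R1, R2
steps = [
    ("MAR <- PC",      1),
    ("MDR <- M[MAR]",  MEMORY_ACCESS_CC + 1),
    ("IR <- MDR",      1),
    ("PC <- PC + 2",   1 + 1),
    ("decode",         1),
    ("R3 <- R1 + R2",  1 + 1 + 1),
]

total_cc = sum(cc for _, cc in steps)
total_ns = total_cc * CLOCK_PERIOD_NS
print(f"{total_cc} clock cycles = {total_ns:.1f} ns")
```

The single memory access accounts for 10 of the 19 cycles, which is why memory speed dominates even this register-only instruction.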


Timing Analysis of Instruction Execution ADD R3, R1, #100

Encoding: 1010 11 01 0110 0100 (the last 8 bits, 0110 0100, are the immediate value imm = 100)

Phase     Operation                              Cycles   Time
fetch     MAR ← PC                               1        0.3 ns
fetch     MDR ← M[MAR]                           10+1     3 + 0.3 ns
fetch     IR ← MDR                               1        0.3 ns
fetch     PC ← PC + 2                            1+1      2 × 0.3 ns
decode    decode (data 100 extracted from IR)    1        0.3 ns
execute   R3 ← R1 + IR[100]                      1+1+1    3 × 0.3 ns
Total                                            19       5.7 ns
Execution for add R1, R0, X in a GPR processor
ADD R1, R0, X with X = $64
Encoding: 1010 01 00 0110 0100 = A464

Fetch:
  MAR ← PC           (memory location holding A464)
  MDR ← M[MAR]
  PC ← PC + 2
  IR ← MDR
Decode
Execute:
  MAR ← IR(X)        Address X extracted from IR
  MDR ← M[MAR]       Contents of address X transferred to MDR
  R1 ← MDR + R0      Contents of address X added to R0
Complex (multiple words) instruction

Assembly:   ADD R4, ($2000), ($2002)
Operation:  R4 ← M[$2000] + M[$2002]

Encoding:   ADD      R4       M[$2000]   M[$2002]
            1010     0100     $2000      $2002
            4 bits   4 bits   16 bits    16 bits
            40 bits = 5 Bytes!!

16-bit data:   1010 XXXX 0100 XXXX   $2000     $2002
               16 bits               16 bits   16 bits
               48 bits = 6 Bytes
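The 16-bit layout above can be assembled programmatically. This Python sketch assumes the don't-care bits (XXXX) are zero, which matches the word A040 used in the walkthrough that follows; the function name is illustrative.

```python
def encode_add_mem(dest_reg: int, addr1: int, addr2: int) -> list[int]:
    """Build the three 16-bit words for ADD Rn, (addr1), (addr2)."""
    opcode = 0b1010                              # ADD
    first = (opcode << 12) | (dest_reg << 4)     # 1010 XXXX rrrr XXXX, X bits = 0
    return [first, addr1 & 0xFFFF, addr2 & 0xFFFF]


words = encode_add_mem(4, 0x2000, 0x2002)
print(" ".join(f"{w:04X}" for w in words))
print(f"{len(words) * 16} bits = {len(words) * 2} bytes")
```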
ADD R4, ($2000), ($2002)
Encoding: 1010 0000 0100 0000 | $2000 | $2002 (16 bits each)
48 bits = 6 bytes = 3 Words

Memory contents (16 bit data, 16 bit address architecture):
  Instruction:  $1000  A040
                $1002  2000
                $1004  2002
  Data:         $2000  0710
                $2002  0311

[Figure: data path showing PC, IR holding A040, MAR, MDR, Temp 1, Temp 2, the 16-bit adder and R4]
ADD R4, ($2000), ($2002)

Fetch:
  MAR ← PC           $1000
  MDR ← M[MAR]       $A040
  IR ← MDR           $A040
Decode:
  interpret the opcode: either add reg, mem1, mem2 or some other opcode
ADD R4, ($2000), ($2002)

Fetch address of memory operand 1:
  PC ← PC + 2        $1002
  MAR ← PC           $1002
  MDR ← M[MAR]       $2000
  Temp1 ← MDR        $2000
Fetch address of memory operand 2:
  PC ← PC + 2        $1004
  MAR ← PC           $1004
  MDR ← M[MAR]       $2002
  Temp2 ← MDR        $2002
ADD R4, ($2000), ($2002)

Fetch memory operand 1:
  MAR ← Temp1        $2000
  MDR ← M[MAR]       $0710
  Temp1 ← MDR        $0710
Fetch memory operand 2:
  MAR ← Temp2        $2002
  MDR ← M[MAR]       $0311
  Temp2 ← MDR        $0311
Execution:
  R4 ← Temp1 + Temp2 ($0311 + $0710)
Increment PC:
  PC ← PC + 2        $1006
ADD R4, ($2000), ($2002)

Processor: 3.3 GHz         Memory: 333 MHz
Memory access: 10 cc       Register transfer: 1 cc
Logical operation: 1 cc    Arithmetic operation: 2 cc

Operation               Cycles   Time
MAR ← PC                1        0.3 ns
MDR ← M[MAR]            10+1     3.3 ns
IR ← MDR                1        0.3 ns
decode                  1        0.3 ns
PC ← PC + 2             2+1      0.9 ns
MAR ← PC                1        0.3 ns
MDR ← M[MAR]            10+1     3.3 ns
Temp1 ← MDR             1        0.3 ns
PC ← PC + 2             2+1      0.9 ns
MAR ← PC                1        0.3 ns
MDR ← M[MAR]            10+1     3.3 ns
Temp2 ← MDR             1        0.3 ns
MAR ← Temp1             1        0.3 ns
MDR ← M[MAR]            10+1     3.3 ns
Temp1 ← MDR             1        0.3 ns
MAR ← Temp2             1        0.3 ns
MDR ← M[MAR]            10+1     3.3 ns
Temp2 ← MDR             1        0.3 ns
R4 ← Temp1 + Temp2      1+2+1    1.2 ns
PC ← PC + 2             2+1      0.9 ns
Total                   79 cc    23.7 ns

63% memory traffic!!
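The 79-cycle total and the 63% figure can be reproduced from the per-step costs. In this Python sketch the memory share is taken as 10 cc per memory access (five accesses), which is one reading of the slide that happens to give 63%; the breakdown of "2+1" as a 2 cc add plus a 1 cc transfer is likewise an assumption.

```python
CLOCK_PERIOD_NS = 0.3
MEMORY_ACCESS_CC = 10

READ = MEMORY_ACCESS_CC + 1   # MDR <- M[MAR]: memory access + register transfer
INC_PC = 2 + 1                # PC <- PC + 2: arithmetic + register transfer (assumed split)

steps = [
    ("MAR <- PC", 1), ("MDR <- M[MAR]", READ), ("IR <- MDR", 1),
    ("decode", 1),
    ("PC <- PC + 2", INC_PC), ("MAR <- PC", 1),
    ("MDR <- M[MAR]", READ), ("Temp1 <- MDR", 1),
    ("PC <- PC + 2", INC_PC), ("MAR <- PC", 1),
    ("MDR <- M[MAR]", READ), ("Temp2 <- MDR", 1),
    ("MAR <- Temp1", 1), ("MDR <- M[MAR]", READ), ("Temp1 <- MDR", 1),
    ("MAR <- Temp2", 1), ("MDR <- M[MAR]", READ), ("Temp2 <- MDR", 1),
    ("R4 <- Temp1 + Temp2", 1 + 2 + 1),
    ("PC <- PC + 2", INC_PC),
]

total_cc = sum(cc for _, cc in steps)
memory_reads = sum(1 for name, _ in steps if name == "MDR <- M[MAR]")
memory_cc = MEMORY_ACCESS_CC * memory_reads

print(f"Total: {total_cc} cc = {total_cc * CLOCK_PERIOD_NS:.1f} ns")
print(f"Memory traffic: {100 * memory_cc / total_cc:.0f}%")
```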
Accumulator (Acc) Architecture
• Its functional units are the same as in the GPR architecture, except there is only ONE
register – accumulator (Acc) – instead of the Register File
Ex: Z = X + Y
Move contents of location X to Acc
Add contents of location Y to Acc
Move from Acc to location Z
Stop
• All operations and data movements are on this single register
• Most of the instructions in the instruction set require only one operand
• Destination and Source are implicitly Acc
• Leads to shorter instructions, but the program may be slower to execute since
there are more moves to memory for intermediate results (to free Acc)
• May lead to inefficiency
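Accumulator Architecture CPU
[Figure: block diagram of the accumulator CPU: Acc, IR, Interpreter, Control, ALU, PC with Increment unit, MDR on the data bus and MAR on the address bus, connected to Memory; labels: 16 bit CPU, 10 bit Memory]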
Execution for Add Y in an Acc Architecture
ADD Y: ACC ← ACC + M[Y]

Fetch:
  MAR ← PC
  MDR ← M[MAR]
  PC ← PC + 2
  IR ← MDR
Decode
Execute:
  MAR ← Y              Address Y extracted from IR
  MDR ← M[MAR]         Contents of address Y transferred to MDR
  Acc ← MDR + Acc      Contents of address Y added to Accumulator
Your First Assembly Program

We want to compute
a ← (x+y) * (x-y)
where x, y and a are memory references

1) Write an assembly program for the above task using:
a) Accumulator machine
b) GPR machine (4 registers)

2) Assuming all accumulator instructions take 50 ns, and all GPR instructions take 30 ns
except move M-R/R-M, which takes 50 ns, compute the total execution time of the
programs in 1a) and 1b)
Instruction Sets
1) Accumulator Machine
ADD X ; ACC ← ACC + M[X]
SUB X ; ACC ← ACC – M[X]
MUL X ; ACC ← ACC * M[X]
LD X ; ACC ← M[X]
ST X ; M[X] ← ACC
X: address in memory
x: data in memory, i.e., M[X]=x
[Figure: memory with data x stored at address X]

2) GPR Machine
ADD Rk, Ri, Rj ; Rk ← Ri + Rj
SUB Rk, Ri, Rj ; Rk ← Ri – Rj
MUL Rk, Ri, Rj ; Rk ← Ri * Rj
MOV Rj, Ri ; Rj ← Ri
MOV X, Ri ; M[X] ← Ri
MOV Ri, X ; Ri ← M[X]
Operand order: ADD Rk, Ri, Rj is ADD dest, src1, src2
1) Accumulator Machine
LD X ; ACC ← M[X] = x
ADD Y ; ACC ← M[Y] + x
ST T ; M[T] = temp ← x + y

LD X ; ACC ← M[X] = x
SUB Y ; ACC ← ACC - M[Y] = x-y
MUL T ; ACC ← ACC * M[T] = (x-y)*temp
ST A ; M[A] = a ← (x-y)*(x+y)

Memory (data at address): x at X, y at Y, a at A, temp at T
2) GPR
MOV R0, X ; R0 ← M[X] = x
MOV R1, Y ; R1 ← M[Y] = y
ADD R2, R0, R1 ; R2 ← R0+R1
SUB R3, R0, R1 ; R3 ← R0-R1
MUL R0, R2, R3 ; R0 ← R2*R3
MOV A, R0 ; M[A] ← R0 = R2*R3
; a ← (x+y)*(x-y)
The Assembly Programs
1) Accumulator Machine
LD X ; ACC ← M[X] = x
ADD Y ; ACC ← ACC + M[Y] = x + y
ST T ; M[T] = temp ← ACC = x + y
LD X ; ACC ← M[X] = x
SUB Y ; ACC ← ACC – M[Y] = x-y
MUL T ; ACC ← (x-y)*M[T] = (x-y)*temp
ST A ; M[A] = a ← (x-y)*(x+y)

Memory (data at address): x at X, y at Y, a at A, temp at T

2) GPR
MOV R0, X ; R0 ← M[X] = x
MOV R1, Y ; R1 ← M[Y] = y
ADD R2, R0, R1 ; R2 ← R0+R1
SUB R3, R0, R1 ; R3 ← R0-R1
MUL R0, R2, R3 ; R0 ← R2*R3
MOV A, R0 ; M[A] ← R0 = R2*R3
; a ← (x+y)*(x-y)
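Both programs can be traced with a short Python sketch that also adds up the execution time asked for in part 2, using the costs stated in the exercise (50 ns per accumulator instruction; 30 ns per GPR instruction, 50 ns for memory/register moves). The function names and the test inputs x = 5, y = 3 are illustrative assumptions.

```python
def run_accumulator(x, y):
    """Trace the accumulator program; return (a, time in ns)."""
    mem = {"X": x, "Y": y, "T": 0, "A": 0}
    program = [("LD", "X"), ("ADD", "Y"), ("ST", "T"),
               ("LD", "X"), ("SUB", "Y"), ("MUL", "T"), ("ST", "A")]
    acc = 0
    for op, addr in program:
        if op == "LD":
            acc = mem[addr]
        elif op == "ST":
            mem[addr] = acc
        elif op == "ADD":
            acc += mem[addr]
        elif op == "SUB":
            acc -= mem[addr]
        elif op == "MUL":
            acc *= mem[addr]
    return mem["A"], 50 * len(program)    # every instruction takes 50 ns


def run_gpr(x, y):
    """Trace the GPR program; return (a, time in ns)."""
    mem = {"X": x, "Y": y, "A": 0}
    r = [0] * 4                  # R0..R3
    r[0] = mem["X"]              # MOV R0, X       50 ns
    r[1] = mem["Y"]              # MOV R1, Y       50 ns
    r[2] = r[0] + r[1]           # ADD R2, R0, R1  30 ns
    r[3] = r[0] - r[1]           # SUB R3, R0, R1  30 ns
    r[0] = r[2] * r[3]           # MUL R0, R2, R3  30 ns
    mem["A"] = r[0]              # MOV A, R0       50 ns
    return mem["A"], 3 * 50 + 3 * 30


print(run_accumulator(5, 3))
print(run_gpr(5, 3))
```

Under these costs the accumulator program takes 7 × 50 = 350 ns and the GPR program takes 3 × 50 + 3 × 30 = 240 ns, since the GPR version keeps the intermediate results in registers instead of going back to memory.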
Next …. Memory (Topic #3_2)
