Module_5_DD_CO_Ver4

UNIT – IV
Basic Processing Unit

Overview
 Instruction Set Processor (ISP)
 Central Processing Unit (CPU)
 A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a program.
 An instruction is executed by carrying out a sequence of more rudimentary operations.

Some Fundamental Concepts
 Processor fetches one instruction at a time and
performs the operation specified.
 Instructions are fetched from successive memory
locations until a branch or a jump instruction is
encountered.
 Processor keeps track of the address of the memory
location containing the next instruction to be fetched
using Program Counter (PC).
 The instruction being executed is held in the Instruction Register (IR).
Executing an Instruction
 Fetch the contents of the memory location pointed
to by the PC. The contents of this location are
loaded into the IR (fetch phase).
IR ← [[PC]]
 Assuming that the memory is byte addressable and each instruction occupies 4 bytes,
increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
 Carry out the actions specified by the instruction in
the IR (execution phase).
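The three steps above can be traced with a small sketch. It assumes a toy byte-addressable memory holding one 4-byte instruction per word; the instruction names and the `run` helper are hypothetical and only serve to show the order of the register transfers.

```python
# Toy program memory: address -> instruction (names are made up for illustration).
memory = {0: "LOAD", 4: "ADD", 8: "HALT"}

def run(memory, pc=0):
    """Repeat the fetch and execution phases until HALT is fetched."""
    trace = []
    while True:
        ir = memory[pc]          # IR <- [[PC]]    (fetch phase)
        pc = pc + 4              # PC <- [PC] + 4  (fetch phase)
        trace.append((ir, pc))   # record IR and the updated PC
        if ir == "HALT":         # execution phase: only HALT has an effect here
            return trace

trace = run(memory)
for ir, pc in trace:
    print(ir, pc)
```

Note that the PC already points to the next instruction while the current one is being executed.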
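Processor Organization
[Figure: PC, MAR, MDR, IR, Y, the registers R0 to R(n-1), TEMP and Z are connected to the internal processor bus. MAR drives the address lines and MDR connects to the data lines of the memory bus. The Select MUX chooses between Y and Constant 4 for ALU input A; input B comes from the bus. The ALU has a carry-in and control lines Add, Sub, ..., XOR. The instruction decoder and control logic generates the control signals.]
Figure 7.1. Single-bus organization of the datapath inside a processor.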

Internal organization of the processor
 ALU
 Registers for temporary storage
 Various digital circuits for executing different
micro-operations (gates, MUX, decoders, counters).
 Internal path for movement of data between ALU
and registers.
 Driver circuits for transmitting signals to external
units.
 Receiver circuits for incoming signals from external
units.
PC:
 Keeps track of execution of a program.
 Contains the memory address of the next instruction to be fetched and executed.
MAR:
 Holds the address of the location to be accessed.
 The input of MAR is connected to the internal bus and its output to the external bus.
MDR:
 Contains data to be written into or read out of the addressed location.
 It has 2 inputs and 2 outputs.
 Data can be loaded into MDR either from the memory bus or from the internal processor bus.
 The data and address lines are connected to the internal bus via MDR and MAR.
Registers:
 The number and use of the processor registers R0 to Rn-1 vary considerably from one processor to another.
 Registers are provided for general-purpose use by the programmer.
 Special-purpose registers: index and stack registers.
 Registers Y, Z and TEMP are temporary registers used by the processor during the execution of some instructions.
Multiplexer:
 Selects either the output of register Y or a constant value 4 to be provided as input A of the ALU.
 Constant 4 is used by the processor to increment the contents of PC.
ALU:
 Used to perform arithmetic and logical operations.
Data Path:
 The registers, ALU and interconnecting bus are collectively referred to as the data path.
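1. Register Transfers
[Figure: register Ri is connected to the internal processor bus through an input gate (Riin) and an output gate (Riout). Yin loads register Y, which feeds the Select MUX together with Constant 4. The MUX drives ALU input A and the bus drives input B. The ALU result is loaded into Z by Zin and placed on the bus by Zout.]
Figure 7.2. Input and output gating for the registers in Figure 7.1.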
 The input and output gates for register Ri are controlled by signals Riin and Riout.
 Riin set to 1: data available on the common bus are loaded into Ri.
 Riout set to 1: the contents of register Ri are placed on the bus.
 Riout set to 0: the bus can be used for transferring data from other registers.
Data transfer between two registers:
Example:
Transfer the contents of R1 to R4.
1. Enable output of register R1 by setting
R1out=1. This places the contents of R1 on
the processor bus.
2. Enable input of register R4 by setting
R4in=1. This loads the data from the
processor bus into register R4.
2. Performing an Arithmetic or Logic Operation
 The ALU is a combinational circuit that has no internal storage.
 The ALU gets its two operands from the MUX (input A) and the bus (input B). The result is temporarily stored in register Z.
 What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
Step 1: The output of register R1 and the input of register Y are enabled, causing the contents of R1 to be transferred to Y.
Step 2: The multiplexer's select signal is set to SelectY, causing the multiplexer to gate the contents of register Y to input A of the ALU. At the same time, the contents of R2 are gated onto the bus and hence to input B. The Add line is set to 1, and the sum is loaded into Z because Zin is active.
Step 3: The contents of Z are transferred to the destination register R3.
Register Transfers
 All operations and data transfers are controlled by the processor clock.
[Figure: a D flip-flop clocked by Clock, with input gating from the bus controlled by Riin and output gating onto the bus controlled by Riout.]
Figure 7.3. Input and output gating for one register bit.
Fetching a Word from Memory
 Address into MAR; issue Read operation; data into MDR.
[Figure: MDR sits between the memory-bus data lines and the internal processor bus, with control signals MDRinE and MDRoutE on the memory-bus side and MDRin and MDRout on the processor-bus side.]
Figure 7.4. Connection and control signals for register MDR.
3. Fetching a Word from Memory
 The response time of each memory access varies
(cache miss, memory-mapped I/O,…).
 To accommodate this, the processor waits until it
receives an indication that the requested operation
has been completed (Memory-Function-Completed,
MFC).
 Move (R1), R2
 MAR ← [R1]
 Start a Read operation on the memory bus
 Wait for the MFC response from the memory
 Load MDR from the memory bus
 R2 ← [MDR]
 Move (R1), R2
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
 Assume MAR is always available on the address lines of the memory bus.
[Figure: timing diagram over steps 1 to 3 showing Clock, MARin, Address, Read, MR, MDRinE, Data, MFC and MDRout.]
Figure 7.5. Timing of a memory Read operation.
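The effect of waiting for MFC can be illustrated with a minimal sketch. The `Memory` class and its `latency` parameter are assumptions introduced here to model a memory whose response time varies; they are not part of the slides. Only step 2 (WMFC) stretches when the memory is slow.

```python
class Memory:
    """Hypothetical memory that asserts MFC `latency` cycles after a Read."""
    def __init__(self, cells, latency):
        self.cells, self.latency = cells, latency
        self.remaining = 0

    def read(self, address):
        self.address, self.remaining = address, self.latency

    def tick(self):
        """Advance one clock cycle; return True once MFC is asserted."""
        self.remaining -= 1
        return self.remaining <= 0

def move_indirect(regs, mem):
    """Move (R1), R2. Returns the number of clock cycles used."""
    regs["MAR"] = regs["R1"]              # step 1: R1out, MARin, Read
    mem.read(regs["MAR"])
    cycles = 1
    while True:                           # step 2: MDRinE, WMFC
        cycles += 1
        if mem.tick():                    # MFC received, MDR can be loaded
            break
    regs["MDR"] = mem.cells[mem.address]
    regs["R2"] = regs["MDR"]              # step 3: MDRout, R2in
    return cycles + 1

regs_fast = {"R1": 100, "R2": 0, "MAR": 0, "MDR": 0}
regs_slow = dict(regs_fast)
fast = move_indirect(regs_fast, Memory({100: 42}, latency=1))
slow = move_indirect(regs_slow, Memory({100: 42}, latency=4))
print(fast, slow)
```

With a one-cycle memory the transfer takes three clock cycles, one per control step; a slower memory only lengthens the wait in step 2.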

4. Storing a word in memory
 Address is loaded into MAR.
 Data to be written is loaded into MDR.
 Write command is issued.
 Example: Move R2, (R1)
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
Execution of a Complete Instruction
 Add (R3), R1
 Fetch the instruction
 Fetch the first operand (the contents of the memory location pointed to by R3)
 Perform the addition
 Load the result into R1
Execution of a Complete Instruction
Add (R3), R1
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End
Figure 7.6. Control sequence for execution of the instruction Add (R3), R1.
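The seven control steps for Add (R3), R1 can be traced with a small simulator. This is a sketch under stated assumptions: registers and memory are plain dictionaries, a Read is assumed to have delivered its data into MDR by the step that carries WMFC, and the instruction is stored as a string rather than an encoded word. The helper `run_sequence` is hypothetical.

```python
def run_sequence(regs, memory, sequence):
    """Apply each control step to the single-bus datapath model."""
    for signals in sequence:
        bus = None
        for s in signals:                       # one register drives the bus
            if s.endswith("out"):
                bus = regs[s[:-3]]
        if "Add" in signals:                    # input A from MUX, input B from bus
            a = 4 if "Select4" in signals else regs["Y"]
            alu = a + bus
        for s in signals:                       # load the destination registers
            if s == "Zin":
                regs["Z"] = alu
            elif s.endswith("in"):
                regs[s[:-2]] = bus
        if "WMFC" in signals:                   # memory data arrives in MDR
            regs["MDR"] = memory[regs["MAR"]]

ADD_R3_R1 = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],
    ["Zout", "PCin", "Yin", "WMFC"],
    ["MDRout", "IRin"],
    ["R3out", "MARin", "Read"],
    ["R1out", "Yin", "WMFC"],
    ["MDRout", "SelectY", "Add", "Zin"],
    ["Zout", "R1in", "End"],
]

regs = {"PC": 0, "MAR": 0, "MDR": 0, "IR": None,
        "Y": 0, "Z": 0, "R1": 10, "R3": 200}
memory = {0: "Add (R3), R1", 200: 32}           # illustrative contents
run_sequence(regs, memory, ADD_R3_R1)
print(regs["R1"], regs["PC"], regs["IR"])
```

Steps 1 to 3 are the fetch phase, which also increments the PC through the ALU; steps 4 to 7 fetch the operand, add it to R1 and store the sum back into R1.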

Execution of Branch Instructions
 A branch instruction replaces the contents of PC with the branch target address, which is usually obtained by adding an offset X, given in the branch instruction, to the updated value of the PC.
 The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
 Unconditional branch
Execution of Branch Instructions
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End
Figure 7.7. Control sequence for an unconditional branch instruction.
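The relation between the offset X and the branch target can be checked with a short sketch. It assumes 4-byte instructions, consistent with PC ← [PC] + 4 in the fetch phase; the function names and the addresses are made up for illustration.

```python
def branch_offset(branch_address, target_address):
    """Offset X stored in the instruction: target minus the updated PC."""
    return target_address - (branch_address + 4)

def branch_target(branch_address, offset):
    updated_pc = branch_address + 4     # steps 1-2: PC has already been incremented
    return updated_pc + offset          # steps 4-5: PC <- [PC] + X

forward = branch_offset(1000, 1100)     # branch at 1000 to a later address
backward = branch_offset(1000, 900)     # branch at 1000 to an earlier address
print(forward, branch_target(1000, forward))
print(backward, branch_target(1000, backward))
```

A backward branch simply has a negative offset, because X is measured from the instruction that follows the branch.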

Chapter 8. Pipelining
Overview
 Pipelining is widely used in modern
processors.
 Pipelining improves system performance in
terms of throughput.
 Pipelined organization requires sophisticated
compilation techniques.
Basic Concepts
Making the Execution of Programs Faster
 Use faster circuit technology to build the processor and the main memory.
 Arrange the hardware so that more than one operation can be performed at the same time.
 In the latter way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
Use the Idea of Pipelining in a Computer
Fetch + Execution
(a) Sequential execution
Time →  F1 E1 F2 E2 F3 E3  (instructions I1, I2, I3 in turn)
(b) Hardware organization
Instruction fetch unit → Interstage buffer B1 → Execution unit
(c) Pipelined execution
Clock cycle   1    2    3    4
I1            F1   E1
I2                 F2   E2
I3                      F3   E3
Figure 8.1. Basic idea of instruction pipelining.
 Processor executes a program by fetching and executing instructions, one after the other.
 Consider a computer that has two separate hardware units, one for fetching instructions and another for executing them.
 The instruction fetched is deposited in an intermediate storage buffer, B1, which enables the execution unit to execute the instruction while the fetch unit is fetching the next instruction.
 The results of execution are deposited in the destination location specified by the instruction.
 The computer is controlled by a clock.
 Any fetch or execute step is completed in one clock cycle.
 1st clock cycle: the fetch unit fetches instruction I1 (step F1) and stores it in B1 at the end of the clock cycle.
 2nd clock cycle: the instruction fetch unit proceeds with the fetch of I2 (step F2).
 Meanwhile, the execution unit performs the operation specified by I1, which is available to it in B1 (step E1).
 By the end of the 2nd clock cycle, the execution of I1 is completed and I2 is available. I2 is stored in B1, replacing I1.
 Step E2 is performed during the 3rd clock cycle, while I3 is being fetched.
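The cycle-by-cycle description above can be reproduced with a short sketch. It assumes an ideal pipeline in which every step takes exactly one clock cycle and nothing stalls; `pipeline_schedule` is a hypothetical helper name.

```python
def pipeline_schedule(n_instructions, stages):
    """Return {instruction: {clock cycle: step}} for a stall-free pipeline."""
    schedule = {}
    for i in range(1, n_instructions + 1):
        # Instruction i enters the first stage in clock cycle i and then
        # moves one stage further in each following cycle.
        schedule[f"I{i}"] = {i + s: f"{name}{i}" for s, name in enumerate(stages)}
    return schedule

two_stage = pipeline_schedule(3, ["F", "E"])
for instr, steps in two_stage.items():
    print(instr, steps)

last_cycle = max(max(steps) for steps in two_stage.values())
print("cycles needed:", last_cycle)
```

Three instructions finish in four clock cycles instead of six, because E1 overlaps F2 and E2 overlaps F3. The same helper accepts any list of stage names, for example `["F", "D", "E", "W"]`.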
Use the Idea of Pipelining in a Computer
Fetch + Decode + Execution + Write
(a) Instruction execution divided into four steps
Clock cycle   1    2    3    4    5    6    7
I1            F1   D1   E1   W1
I2                 F2   D2   E2   W2
I3                      F3   D3   E3   W3
I4                           F4   D4   E4   W4
(b) Hardware organization
F: Fetch instruction → B1 → D: Decode instruction and fetch operands → B2 → E: Execute operation → B3 → W: Write results
(B1, B2 and B3 are interstage buffers.)
Textbook page: 457
Figure 8.2. A 4-stage pipeline.

 During clock cycle 4, the information in the buffers is as follows:
 B1 holds instruction I3, which was fetched in cycle 3 and is being decoded.
 B2 holds both the source operands for I2 and the specification of the operation to be performed (produced by the decoding unit in cycle 3).
 B2 also holds the information needed for the write step of I2 (W2).
 B3 holds the result produced by the execution unit and the destination information for I1.

Role of Cache Memory
 Each pipeline stage is expected to complete in one
clock cycle.
 The clock period should be long enough for the
slowest pipeline stage to complete.
 Faster stages can only wait for the slowest one to
complete.
 Since main memory is very slow compared to the
execution, if each instruction needs to be fetched
from main memory, pipeline is almost useless.
 Fortunately, we have cache.
Pipeline Performance
 The potential increase in performance
resulting from pipelining is proportional to the
number of pipeline stages.
 However, this increase would be achieved only if all pipeline stages require the same time to complete, and there is no interruption throughout program execution.
 Unfortunately, this is not true.
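The ideal case can be quantified with a short calculation. The sketch below assumes that every stage takes exactly one clock cycle and that the pipeline never stalls; the function names are hypothetical.

```python
def cycles_sequential(n, k):
    """n instructions, each using all k steps before the next one starts."""
    return n * k

def cycles_pipelined(n, k):
    """k cycles for the first instruction, then one completion per cycle."""
    return k + (n - 1)

def speedup(n, k):
    return cycles_sequential(n, k) / cycles_pipelined(n, k)

for n in (4, 100, 1000):
    print(n, cycles_pipelined(n, 4), round(speedup(n, 4), 3))
```

For four instructions in a 4-stage pipeline the count is 7 cycles, as in Figure 8.2. As the number of instructions grows, the speedup approaches the number of stages but never reaches it, and any stall lowers it further.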
[Figure: timing diagram for instructions I1 to I5 over clock cycles 1 to 9 with steps F, D, E and W; one Execute step occupies more than one clock cycle, delaying the instructions that follow.]
Figure 8.3. Effect of an execution operation taking more than one clock cycle.
 The pipeline in Figure 8.3 is said to have been stalled for two clock cycles.
 Any condition that causes a pipeline to stall is called a hazard.
 Data hazard – any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. So some operation has to be delayed, and the pipeline stalls.
 Instruction (control) hazard – a delay in the availability of an instruction causes the pipeline to stall.
 Structural hazard – the situation when two instructions require the use of a given hardware resource at the same time.
 An instruction hazard may be the result of a miss in the cache, requiring the instruction to be fetched from main memory.

Pipeline Performance
Instruction hazard
(a) Instruction execution steps in successive clock cycles
Clock cycle   1    2    3    4    5    6    7    8    9
I1            F1   D1   E1   W1
I2                 F2   F2   F2   F2   D2   E2   W2
I3                                     F3   D3   E3   W3
(b) Function performed by each processor stage in successive clock cycles
Clock cycle   1    2    3    4    5    6    7    8    9
F: Fetch      F1   F2   F2   F2   F2   F3
D: Decode          D1   idle idle idle D2   D3
E: Execute              E1   idle idle idle E2   E3
W: Write                     W1   idle idle idle W2   W3
Idle periods are called stalls (bubbles).
Figure 8.4. Pipeline stall caused by a cache miss in F2.

 Load X(R1), R2
 The memory address, X+[R1], is computed in step E2 in cycle 4.
 The memory access takes place in cycle 5.
 The operand read from memory is written into R2 in cycle 6.
 The execution step thus takes 2 clock cycles (cycles 4 and 5).
 This causes the pipeline to stall for one cycle, because both instructions I2 and I3 require access to the register file in cycle 6.
 Even though the instructions and data are all available, the pipeline is stalled because one hardware resource, the register file, cannot handle two operations at once.
 If the register file had two input ports, the pipeline would not stall.
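The register-file conflict described above can be detected mechanically. The sketch below takes the write cycles from the text (I1 writes in cycle 4, the Load I2 in cycle 6, and I3 would normally also write in cycle 6); the `conflicts` helper and the `input_ports` parameter are assumptions made for illustration.

```python
from collections import Counter

# Clock cycle in which each instruction wants to write the register file.
write_cycle = {"I1": 4, "I2": 6, "I3": 6}

def conflicts(write_cycle, input_ports):
    """Return the cycles in which more writes are requested than ports exist."""
    demand = Counter(write_cycle.values())
    return sorted(c for c, n in demand.items() if n > input_ports)

one_port = conflicts(write_cycle, input_ports=1)
two_ports = conflicts(write_cycle, input_ports=2)
print(one_port, two_ports)
```

With a single input port the conflict in cycle 6 forces one of the two writes to be delayed, which is the one-cycle stall; with two input ports no cycle is oversubscribed.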

Load X(R1), R2
Structural hazard
[Figure: timing diagram over clock cycles 1 to 7 for instructions I1 to I5; I2 (Load) passes through steps F2, D2, E2, M2 and W2, so the steps of the instructions that follow are delayed.]
Figure 8.5. Effect of a Load instruction on pipeline timing.

 Again, pipelining does not result in individual
instructions being executed faster; rather, it is the
throughput that increases.
 Throughput is measured by the rate at which
instruction execution is completed.
 Pipeline stalls cause degradation in pipeline
performance.
 We need to identify all hazards that may cause the
pipeline to stall and to find ways to minimize their
impact.
