Module_5_DD_CO_Ver4


UNIT – IV

Basic Processing Unit


Overview
• The processing unit is often called the Instruction Set Processor (ISP) or the Central Processing Unit (CPU).
• A typical computing task consists of a series of steps specified by a sequence of machine instructions that constitute a program.
• An instruction is executed by carrying out a sequence of more rudimentary operations.


Some Fundamental Concepts
• The processor fetches one instruction at a time and performs the operation specified.
• Instructions are fetched from successive memory locations until a branch or a jump instruction is encountered.
• The processor keeps track of the address of the memory location containing the next instruction to be fetched using the Program Counter (PC).
• The fetched instruction is held in the Instruction Register (IR).
Executing an Instruction
• Fetch the contents of the memory location pointed to by the PC. The contents of this location are loaded into the IR (fetch phase).
IR ← [[PC]]
• Assuming that the memory is byte addressable and each instruction occupies one 4-byte word, increment the contents of the PC by 4 (fetch phase).
PC ← [PC] + 4
• Carry out the actions specified by the instruction in the IR (execution phase).
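The two phases above can be sketched as a small Python loop. This is a minimal illustration, assuming 4-byte instructions in byte-addressable memory; the `Processor` class, its `memory` dictionary and the string "instructions" are hypothetical, and the execution phase only records which instruction it would carry out.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional

WORD_SIZE = 4  # assumption: each instruction occupies one 4-byte word


@dataclass
class Processor:
    memory: dict[int, str]          # hypothetical: address -> instruction text
    pc: int = 0                     # Program Counter
    ir: Optional[str] = None        # Instruction Register
    executed: list[str] = field(default_factory=list)

    def fetch(self) -> None:
        self.ir = self.memory[self.pc]  # IR <- [[PC]]
        self.pc += WORD_SIZE            # PC <- [PC] + 4

    def execute(self) -> None:
        # Execution phase placeholder: record the instruction held in the IR.
        self.executed.append(self.ir)

    def run(self, n_instructions: int) -> None:
        for _ in range(n_instructions):
            self.fetch()
            self.execute()


cpu = Processor(memory={0: "Move (R1), R2", 4: "Add (R3), R1"})
cpu.run(2)
print(cpu.pc, cpu.executed)  # 8 ['Move (R1), R2', 'Add (R3), R1']
```

Branches are deliberately left out: in this sketch the PC only ever advances by 4.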
Processor Organization
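[Figure 7.1. Single-bus organization of the datapath inside a processor. The PC, MAR, MDR, IR, registers R0 to Rn-1, the temporary registers Y, Z and TEMP, and the ALU are connected by the internal processor bus. MAR drives the address lines and MDR connects to the data lines of the memory bus. A MUX (Select) chooses either Y or the constant 4 as ALU input A; input B comes from the bus. The ALU control lines include Add, Sub, XOR and Carry-in. The instruction decoder and control logic take the IR as input and generate the control signals.]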


Internal organization of the processor
• ALU
• Registers for temporary storage
• Various digital circuits for executing different micro-operations (gates, multiplexers, decoders, counters)
• Internal paths for movement of data between the ALU and registers
• Driver circuits for transmitting signals to external units
• Receiver circuits for incoming signals from external units
PC:
• Keeps track of the execution of a program.
• Contains the memory address of the next instruction to be fetched and executed.
MAR:
• Holds the address of the location to be accessed.
• The input of MAR is connected to the internal bus and its output to the external (memory) bus.
MDR:
• Contains data to be written into or read out of the addressed location.
• It has two inputs and two outputs.
• Data can be loaded into MDR either from the memory bus or from the internal processor bus.
The data and address lines are connected to the internal bus via MDR and MAR, respectively.
Registers:
• The processor registers R0 to Rn-1 vary considerably from one processor to another.
• General-purpose registers are provided for use by the programmer.
• Special-purpose registers include index and stack registers.
• Registers Y, Z and TEMP are temporary registers used by the processor during the execution of some instructions.
Multiplexer:
• Selects either the output of register Y or a constant value 4 to be provided as input A of the ALU.
• The constant 4 is used by the processor to increment the contents of the PC.
ALU:
• Used to perform arithmetic and logic operations.
Data Path:
• The registers, the ALU and the interconnecting bus are collectively referred to as the data path.
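The roles of the multiplexer and the ALU can be sketched as two pure functions, reflecting that the ALU is combinational and stores nothing. A minimal sketch, assuming 32-bit words; the function names are hypothetical, only the Add, Sub and XOR control lines are modelled, and the operand order for Sub is an assumption.

```python
MASK_32 = 0xFFFFFFFF  # assumption: 32-bit words


def mux(select_4: bool, y: int) -> int:
    """Choose ALU input A: the constant 4 (Select4) or register Y (SelectY)."""
    return 4 if select_4 else y


def alu(op: str, a: int, b: int) -> int:
    """Combinational ALU: the result must be latched elsewhere (register Z)."""
    if op == "Add":
        return (a + b) & MASK_32
    if op == "Sub":
        return (a - b) & MASK_32  # assumption: Sub computes A - B
    if op == "XOR":
        return a ^ b
    raise ValueError(f"unsupported ALU operation: {op}")


pc = 1000
print(alu("Add", mux(True, y=0), pc))  # 1004: PC incremented by 4
```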
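1. Register Transfers
[Figure 7.2. Input and output gating for the registers in Figure 7.1. Each register Ri is connected to the internal processor bus through gates controlled by Riin and Riout. Register Y is loaded by Yin and feeds the MUX, which selects Y or the constant 4 (Select) as ALU input A; input B comes from the bus. The ALU result is loaded into Z by Zin and placed on the bus by Zout.]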
• The input and output gates for register Ri are controlled by the signals Riin and Riout.
• Riin set to 1: the data available on the common bus are loaded into Ri.
• Riout set to 1: the contents of the register are placed on the bus.
• Riout set to 0: the bus can be used for transferring data from other registers.
Data transfer between two registers:
Example: transfer the contents of R1 to R4.
1. Enable the output of register R1 by setting R1out = 1. This places the contents of R1 on the processor bus.
2. Enable the input of register R4 by setting R4in = 1. This loads the data from the processor bus into register R4.
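The transfer above can be sketched as one bus cycle in Python. This is an illustrative model, assuming a dictionary of register values and hypothetical signal sets named after the registers (`"R1"` in `out_signals` standing for R1out = 1); it only enforces that a single register drives the bus.

```python
from __future__ import annotations


def bus_transfer(regs: dict[str, int], out_signals: set[str], in_signals: set[str]) -> None:
    """One clock cycle on the single internal bus.

    out_signals: registers whose Xout = 1 (they drive the bus)
    in_signals:  registers whose Xin = 1 (they load from the bus)
    """
    if len(out_signals) != 1:
        # Two drivers would corrupt the bus; no driver leaves nothing to load.
        raise ValueError("exactly one register must drive the bus")
    (source,) = out_signals
    bus = regs[source]          # e.g. R1out = 1 places [R1] on the bus
    for dest in in_signals:
        regs[dest] = bus        # e.g. R4in = 1 loads the bus into R4


regs = {"R1": 42, "R4": 0}
bus_transfer(regs, out_signals={"R1"}, in_signals={"R4"})
print(regs["R4"])  # 42
```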
2. Performing an Arithmetic or Logic Operation
• The ALU is a combinational circuit that has no internal storage.
• The ALU gets its two operands from the MUX and the bus. The result is temporarily stored in register Z.
• What is the sequence of operations to add the contents of register R1 to those of R2 and store the result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
Step 1: The output of register R1 and the input of register Y are enabled, causing the contents of R1 to be transferred to Y.
Step 2: The multiplexer's select signal is set to SelectY, causing the multiplexer to gate the contents of register Y to input A of the ALU. At the same time, the contents of R2 are placed on the bus and reach input B. The Add signal makes the ALU add its two inputs, and Zin loads the sum into Z.
Step 3: The contents of Z are transferred to the destination register R3.
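The three control steps can be traced with a small Python model of the datapath. A sketch under simplifying assumptions: registers are dictionary entries, each control step is a set of signal names, and the hypothetical helper `control_step` handles only the signals used in this example (register gating, Select4/SelectY, Add, Zin).

```python
from __future__ import annotations


def control_step(regs: dict[str, int], signals: set[str]) -> None:
    """Apply one control step to a simplified single-bus datapath.

    'Xout' puts register X on the bus, 'Xin' loads X from the bus,
    'Select4'/'SelectY' drive ALU input A, and 'Add' with 'Zin' latches A + B into Z.
    """
    drivers = [s[:-3] for s in signals if s.endswith("out")]
    if len(drivers) > 1:
        raise ValueError("only one register may drive the bus")
    bus = regs[drivers[0]] if drivers else None
    a = 4 if "Select4" in signals else regs["Y"]   # the MUX chooses ALU input A
    if "Add" in signals and "Zin" in signals:
        regs["Z"] = a + bus                         # ALU input B is the bus
    for s in signals:
        if s.endswith("in") and s != "Zin":
            regs[s[:-2]] = bus                      # e.g. 'Yin' loads Y


regs = {"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0}
for step in [{"R1out", "Yin"},
             {"R2out", "SelectY", "Add", "Zin"},
             {"Zout", "R3in"}]:
    control_step(regs, step)
print(regs["R3"])  # 12
```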
Register Transfers
• All operations and data transfers are controlled by the processor clock.
[Figure 7.3. Input and output gating for one register bit: a D flip-flop clocked by the processor clock, loaded from the bus under control of Riin, whose output Q is placed on the bus under control of Riout.]
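The clocked behaviour of one register bit can be sketched as follows. This Python model is a hedged illustration, assuming the common design in which a multiplexer selects between the bus and the flip-flop's own output Q under control of Riin; the class and method names are hypothetical, and `None` stands in for the high-impedance state when Riout = 0.

```python
from typing import Optional


class RegisterBit:
    """One bit of register Ri, modelled as a D flip-flop (sketch).

    Assumption: a 2-input multiplexer in front of D feeds back Q when
    Riin = 0, so the bit keeps its value unless Riin = 1 at a clock edge.
    """

    def __init__(self) -> None:
        self.q = 0

    def clock_edge(self, bus_bit: int, ri_in: int) -> None:
        d = bus_bit if ri_in else self.q  # MUX: bus when Riin = 1, else Q
        self.q = d                        # the flip-flop samples D on the clock edge

    def drive_bus(self, ri_out: int) -> Optional[int]:
        """Return Q when Riout = 1; None stands for the disconnected (high-impedance) state."""
        return self.q if ri_out else None


bit = RegisterBit()
bit.clock_edge(bus_bit=1, ri_in=1)   # load a 1 from the bus
bit.clock_edge(bus_bit=0, ri_in=0)   # Riin = 0: the stored 1 is kept
print(bit.q, bit.drive_bus(1), bit.drive_bus(0))  # 1 1 None
```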
Fetching a Word from Memory
• Address into MAR; issue Read operation; data into MDR.
[Figure 7.4. Connection and control signals for register MDR: MDRinE and MDRoutE connect MDR to the memory-bus data lines; MDRin and MDRout connect it to the internal processor bus.]
3. Fetching a Word from Memory
• The response time of each memory access varies (cache miss, memory-mapped I/O, …).
• To accommodate this, the processor waits until it receives an indication that the requested operation has been completed (Memory-Function-Completed, MFC).
• Move (R1), R2:
  MAR ← [R1]
  Start a Read operation on the memory bus
  Wait for the MFC response from the memory
  Load MDR from the memory bus
  R2 ← [MDR]
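The Read sequence and its wait for MFC can be sketched in Python with a generator that reports, cycle by cycle, whether the memory has completed the request. A minimal sketch: the `Memory` class, its random 1 to 5 cycle latency and the helper `move_indirect` are hypothetical stand-ins for a real memory bus.

```python
from __future__ import annotations

import random
from typing import Iterator, Optional, Tuple


class Memory:
    """Memory whose response time varies from access to access (sketch)."""

    def __init__(self, contents: dict[int, int], seed: int = 0) -> None:
        self.contents = contents
        self.rng = random.Random(seed)

    def read(self, address: int) -> Iterator[Tuple[bool, Optional[int]]]:
        """Yield (MFC, data) once per clock cycle until the read completes."""
        latency = self.rng.randint(1, 5)  # hypothetical: 1 cycle (hit) to 5 (miss)
        for _ in range(latency - 1):
            yield False, None             # MFC not asserted yet
        yield True, self.contents[address]


def move_indirect(regs: dict[str, int], mem: Memory, src: str, dst: str) -> int:
    """Move (src), dst; returns the number of cycles spent waiting for MFC."""
    regs["MAR"] = regs[src]                  # MAR <- [src]
    cycles = 0
    for mfc, data in mem.read(regs["MAR"]):  # start Read, then wait (WMFC)
        cycles += 1
        if mfc:
            regs["MDR"] = data               # load MDR from the memory bus
    regs[dst] = regs["MDR"]                  # dst <- [MDR]
    return cycles


regs = {"R1": 100, "R2": 0}
waited = move_indirect(regs, Memory({100: 77}), "R1", "R2")
print(regs["R2"], 1 <= waited <= 5)  # 77 True
```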
Move (R1), R2
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
[Figure 7.5. Timing of a memory Read operation over steps 1, 2 and 3, showing the Clock, MARin, Address, Read, MR, MDRinE, Data, MFC and MDRout signals. It is assumed that MAR is always available on the address lines of the memory bus.]


4. Storing a Word in Memory
• The address is loaded into MAR.
• The data to be written are loaded into MDR.
• A Write command is issued.
• Example: Move R2, (R1)
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
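The Write sequence can be sketched the same way, with the processor waiting in the WMFC step until the memory signals completion. A hedged sketch: the fixed `latency` parameter and the helper name `move_to_memory` are hypothetical, and memory is modelled as a plain dictionary.

```python
from __future__ import annotations


def move_to_memory(regs: dict[str, int], memory: dict[int, int],
                   src: str, addr_reg: str, latency: int = 3) -> int:
    """Move src, (addr_reg): store a word and wait for MFC.

    latency is a hypothetical number of cycles before the memory responds.
    Returns the number of cycles spent in the WMFC step.
    """
    regs["MAR"] = regs[addr_reg]   # step 1: R1out, MARin
    regs["MDR"] = regs[src]        # step 2: R2out, MDRin, Write
    cycles = 0
    mfc = False
    while not mfc:                 # step 3: MDRoutE, WMFC
        cycles += 1
        if cycles >= latency:
            memory[regs["MAR"]] = regs["MDR"]  # the word is written...
            mfc = True                         # ...and the memory asserts MFC
    return cycles


regs = {"R1": 200, "R2": 9}
memory: dict[int, int] = {}
print(move_to_memory(regs, memory, src="R2", addr_reg="R1"), memory)  # 3 {200: 9}
```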
Execution of a Complete Instruction
• Add (R3), R1
• Fetch the instruction.
• Fetch the first operand (the contents of the memory location pointed to by R3).
• Perform the addition.
• Load the result into R1.
Execution of a Complete Instruction
Add (R3), R1

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     R3out, MARin, Read
5     R1out, Yin, WMFC
6     MDRout, SelectY, Add, Zin
7     Zout, R1in, End

Figure 7.6. Control sequence for execution of the instruction Add (R3), R1.
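The seven steps of Figure 7.6 can be executed on a toy model of the single-bus datapath to check that they leave [R1] + [[R3]] in R1 and the PC pointing at the next instruction. A hedged sketch: the dictionary-based registers, the string standing in for the encoded instruction, and the assumption that each Read has completed by the following WMFC step are simplifications, not part of the original design.

```python
from __future__ import annotations

ADD_R3_R1 = [  # Figure 7.6, one set of control signals per step
    {"PCout", "MARin", "Read", "Select4", "Add", "Zin"},
    {"Zout", "PCin", "Yin", "WMFC"},
    {"MDRout", "IRin"},
    {"R3out", "MARin", "Read"},
    {"R1out", "Yin", "WMFC"},
    {"MDRout", "SelectY", "Add", "Zin"},
    {"Zout", "R1in", "End"},
]


def run_control_sequence(regs: dict, memory: dict, steps: list[set[str]]) -> int:
    """Apply control steps to a simplified single-bus datapath; returns steps executed.

    Assumption: a Read started in one step has completed by the next WMFC step.
    """
    read_pending = False
    for n, signals in enumerate(steps, start=1):
        drivers = [s[:-3] for s in signals if s.endswith("out")]
        assert len(drivers) <= 1, "only one source may drive the bus"
        bus = regs[drivers[0]] if drivers else None
        a = 4 if "Select4" in signals else regs["Y"]   # MUX output = ALU input A
        if "Add" in signals and "Zin" in signals:
            regs["Z"] = a + bus                         # ALU input B comes from the bus
        for s in signals:
            if s.endswith("in") and s != "Zin":
                regs[s[:-2]] = bus                      # e.g. MARin, PCin, Yin, IRin
        if "Read" in signals:
            read_pending = True
        if "WMFC" in signals and read_pending:
            regs["MDR"] = memory[regs["MAR"]]           # MFC received: MDR <- memory
            read_pending = False
        if "End" in signals:
            return n
    return len(steps)


regs = {"PC": 0, "R1": 10, "R3": 1000, "Y": 0, "Z": 0, "MAR": 0, "MDR": 0, "IR": None}
memory = {0: "Add (R3), R1", 1000: 5}
steps_run = run_control_sequence(regs, memory, ADD_R3_R1)
print(steps_run, regs["R1"], regs["PC"])  # 7 15 4
```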


Execution of Branch Instructions
• A branch instruction replaces the contents of the PC with the branch target address, which is usually obtained by adding an offset X, given in the branch instruction, to the updated contents of the PC.
• The offset X is usually the difference between the branch target address and the address immediately following the branch instruction.
• Unconditional branch
Execution of Branch Instructions

Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
3     MDRout, IRin
4     Offset-field-of-IRout, Add, Zin
5     Zout, PCin, End

Figure 7.7. Control sequence for an unconditional branch instruction.
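The address arithmetic behind Figure 7.7 can be checked with a short Python sketch. It assumes 4-byte instructions, so the PC (and Y) already hold the branch address plus 4 when the offset is added; the function names are hypothetical.

```python
WORD_SIZE = 4  # assumption: 4-byte instructions, as in the fetch phase


def branch_target(branch_address: int, offset: int) -> int:
    """Target address of an unconditional branch at branch_address with offset X."""
    updated_pc = branch_address + WORD_SIZE  # steps 1-2: PC and Y hold [PC] + 4
    return updated_pc + offset               # step 4: offset + [Y]; step 5: Zout, PCin


def offset_for(branch_address: int, target: int) -> int:
    """Offset X to encode: the target minus the address following the branch."""
    return target - (branch_address + WORD_SIZE)


print(branch_target(1000, 20))  # 1024
print(offset_for(1000, 992))    # -12, a backward branch
```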


Chapter 8. Pipelining
Overview
• Pipelining is widely used in modern processors.
• Pipelining improves system performance in terms of throughput.
• Pipelined organization requires sophisticated compilation techniques.
Basic Concepts
Making the Execution of Programs Faster
• Use faster circuit technology to build the processor and the main memory.
• Arrange the hardware so that more than one operation can be performed at the same time.
• In the latter way, the number of operations performed per second is increased even though the elapsed time needed to perform any one operation is not changed.
Use the Idea of Pipelining in a Computer
[Figure 8.1. Basic idea of instruction pipelining. (a) Sequential execution: each instruction I1, I2, I3 is fetched (Fi) and then executed (Ei) before the next one is fetched. (b) Hardware organization: an instruction fetch unit and an execution unit separated by interstage buffer B1. (c) Pipelined execution over clock cycles 1 to 4: the fetch of each instruction overlaps the execution of the previous one.]
• The processor executes a program by fetching and executing instructions one after the other.
• The computer has two separate hardware units, one for fetching instructions and the other for executing them.
• The fetched instruction is deposited in an intermediate storage buffer, B1, which enables the execution unit to execute that instruction while the fetch unit is fetching the next instruction.
• The results of execution are deposited in the destination location specified by the instruction.
• The computer is controlled by a clock.
• Each fetch step and each execute step is completed in one clock cycle.
• 1st clock cycle: the fetch unit fetches instruction I1 (step F1) and stores it in B1 at the end of the clock cycle.
• 2nd clock cycle: the instruction fetch unit proceeds with the fetch of I2 (step F2).
• Meanwhile, the execution unit performs the operations specified by I1, which is available to it in B1 (step E1).
• By the end of the 2nd clock cycle, the execution of I1 is completed and I2 is available. I2 is stored in B1, replacing I1.
• Step E2 is performed during the 3rd clock cycle while I3 is being fetched.
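The cycle-by-cycle behaviour just described can be tabulated with a few lines of Python. A minimal sketch, assuming every fetch and execute step takes exactly one clock cycle and no stalls occur; `two_stage_schedule` is a hypothetical helper. With n instructions the pipeline needs n + 1 cycles, compared with 2n for sequential execution.

```python
from __future__ import annotations


def two_stage_schedule(n: int) -> dict[int, list[str]]:
    """Steps active in each clock cycle for n instructions in a fetch/execute pipeline."""
    schedule: dict[int, list[str]] = {}
    for i in range(1, n + 1):
        schedule.setdefault(i, []).append(f"F{i}")      # Ii is fetched in cycle i
        schedule.setdefault(i + 1, []).append(f"E{i}")  # and executed from B1 in cycle i + 1
    return schedule


print(two_stage_schedule(3))
# {1: ['F1'], 2: ['E1', 'F2'], 3: ['E2', 'F3'], 4: ['E3']}
print(len(two_stage_schedule(3)), "cycles pipelined vs", 2 * 3, "sequential")
# 4 cycles pipelined vs 6 sequential
```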
Use the Idea of Pipelining in a Computer
(a) Instruction execution divided into four steps: F: fetch instruction; D: decode instruction and fetch operands; E: execute operation; W: write results.

Clock cycle   1     2     3     4     5     6     7
I1            F1    D1    E1    W1
I2                  F2    D2    E2    W2
I3                        F3    D3    E3    W3
I4                              F4    D4    E4    W4

(b) Hardware organization: stages F, D, E and W separated by interstage buffers B1, B2 and B3.

Textbook page: 457

Figure 8.2. A 4-stage pipeline.


• During clock cycle 4, the information in the buffers is as follows:
• Buffer B1 holds instruction I3, which was fetched in cycle 3 and is being decoded.
• Buffer B2 holds both the source operands for I2 and the specification of the operation to be performed (produced by the decoding unit in cycle 3).
• B2 also holds the information needed for the write step of I2 (W2).
• Buffer B3 holds the result produced by the execution unit and the destination information for I1.
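The buffer contents listed above follow from a simple rule: in an ideal 4-stage pipeline, instruction Ii is in stage s (counting F as stage 0) during clock cycle i + s. A Python sketch of that rule, assuming no stalls; the helper names are hypothetical.

```python
from __future__ import annotations

from typing import Optional

STAGES = ["F", "D", "E", "W"]  # fetch, decode, execute, write


def stage_occupancy(cycle: int, n_instructions: int) -> dict[str, Optional[int]]:
    """Instruction number handled by each stage in `cycle` (ideal pipeline, no stalls)."""
    occupancy: dict[str, Optional[int]] = {}
    for s, name in enumerate(STAGES):
        i = cycle - s  # instruction Ii is in stage s during cycle i + s
        occupancy[name] = i if 1 <= i <= n_instructions else None
    return occupancy


def buffer_contents(cycle: int, n_instructions: int) -> dict[str, Optional[int]]:
    """Buffer Bk sits in front of stage k, so it holds the instruction that stage is processing."""
    occupancy = stage_occupancy(cycle, n_instructions)
    return {f"B{k}": occupancy[STAGES[k]] for k in range(1, len(STAGES))}


print(buffer_contents(4, 4))  # {'B1': 3, 'B2': 2, 'B3': 1}
```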


Role of Cache Memory
• Each pipeline stage is expected to complete in one clock cycle.
• The clock period should be long enough to let the slowest pipeline stage complete.
• Faster stages can only wait for the slowest one to complete.
• Since main memory is very slow compared to execution, if each instruction had to be fetched from main memory, pipelining would be almost useless.
• Fortunately, cache memory solves this problem: when the cache is on the same chip as the processor, its access time is comparable to the time needed for other basic operations inside the processor.
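The effect of the slowest stage on the clock can be made concrete with a short calculation. The stage delays below are invented for illustration only (they are not measurements): a 10 ns fetch from main memory forces all four stages onto a 10 ns clock, whereas a 1 ns fetch from a cache lets every stage run at 1 ns.

```python
from __future__ import annotations


def clock_period(stage_delays_ns: list[float]) -> float:
    """The clock period must accommodate the slowest stage."""
    return max(stage_delays_ns)


def idle_time(stage_delays_ns: list[float]) -> list[float]:
    """Time each stage spends waiting within one clock period."""
    period = clock_period(stage_delays_ns)
    return [period - d for d in stage_delays_ns]


# Hypothetical F, D, E, W delays: fetch from main memory vs. from an on-chip cache.
from_main_memory = [10.0, 1.0, 1.0, 1.0]
from_cache = [1.0, 1.0, 1.0, 1.0]
print(clock_period(from_main_memory), idle_time(from_main_memory))  # 10.0 [0.0, 9.0, 9.0, 9.0]
print(clock_period(from_cache))                                     # 1.0
```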
Pipeline Performance
• The potential increase in performance resulting from pipelining is proportional to the number of pipeline stages.
• However, this increase would be achieved only if all pipeline stages required the same time to complete and there were no interruptions throughout program execution.
• Unfortunately, this is not true in practice.
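The claim that the speedup is proportional to the number of stages can be checked with the usual idealized cycle count. A sketch assuming k equal one-cycle stages, no stalls, and a non-pipelined processor that spends k cycles per instruction: n instructions then need nk cycles without pipelining and k + n - 1 cycles with it, so the speedup approaches k as n grows.

```python
def sequential_cycles(n: int, k: int) -> int:
    """Non-pipelined: each of n instructions takes k one-cycle steps."""
    return n * k


def pipelined_cycles(n: int, k: int) -> int:
    """Ideal k-stage pipeline: k cycles to fill, then one instruction per cycle."""
    return k + (n - 1)


def speedup(n: int, k: int) -> float:
    return sequential_cycles(n, k) / pipelined_cycles(n, k)


print(pipelined_cycles(4, 4))             # 7, as in the 4-stage pipeline of Figure 8.2
print(round(speedup(1_000_000, 4), 3))    # 4.0, approaching the number of stages
```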
Pipeline Performance
[Figure 8.3. Effect of an execution operation taking more than one clock cycle: instructions I1 to I5 pass through stages F, D, E and W over clock cycles 1 to 9; one instruction's execute step takes more than one clock cycle, delaying the instructions that follow.]
Pipeline Performance
• The pipeline in Figure 8.3 is said to have been stalled for two clock cycles.
• Any condition that causes a pipeline to stall is called a hazard.
• Data hazard: any condition in which either the source or the destination operands of an instruction are not available at the time expected in the pipeline. As a result, some operation has to be delayed, and the pipeline stalls.
• Instruction (control) hazard: a delay in the availability of an instruction causes the pipeline to stall. This may result from a miss in the cache, which requires the instruction to be fetched from the main memory.
• Structural hazard: the situation when two instructions require the use of a given hardware resource at the same time.

Pipeline Performance
(a) Instruction execution steps in successive clock cycles (instruction hazard)

Clock cycle   1     2     3     4     5     6     7     8     9
I1            F1    D1    E1    W1
I2                  F2    F2    F2    F2    D2    E2    W2
I3                                          F3    D3    E3    W3

(b) Function performed by each processor stage in successive clock cycles

Clock cycle   1     2     3     4     5     6     7     8     9
F: Fetch      F1    F2    F2    F2    F2    F3
D: Decode           D1    idle  idle  idle  D2    D3
E: Execute                E1    idle  idle  idle  E2    E3
W: Write                        W1    idle  idle  idle  W2    W3

Idle periods are stalls (bubbles).

Figure 8.4. Pipeline stall caused by a cache miss in F2.
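The timing in Figure 8.4, and the two-cycle stall discussed for Figure 8.3, can be reproduced by a small scheduling model of an in-order pipeline in which each stage holds one instruction at a time. A hedged sketch: `pipeline_schedule` is a hypothetical helper, stage durations are supplied by hand, and the rule that an instruction cannot enter a stage until its predecessor has moved on is an assumption chosen to match the figures, not a description of any particular processor. The 3-cycle E2 used for the second example is likewise an assumption.

```python
from __future__ import annotations


def pipeline_schedule(durations: list[list[int]]) -> list[list[tuple[int, int]]]:
    """(first_cycle, last_cycle) of every stage of every instruction.

    durations[i][s] is the number of clock cycles instruction i spends in stage s.
    """
    result: list[list[tuple[int, int]]] = []
    for i, row in enumerate(durations):
        times: list[tuple[int, int]] = []
        for s, d in enumerate(row):
            start = 1
            if s > 0:
                start = max(start, times[s - 1][1] + 1)  # own previous stage is done
            if i > 0:
                prev = result[i - 1]
                start = max(start, prev[s][1] + 1)       # predecessor has finished stage s
                if s + 1 < len(prev):
                    start = max(start, prev[s + 1][0])   # ...and has moved on to stage s + 1
            times.append((start, start + d - 1))
        result.append(times)
    return result


# Figure 8.4: a cache miss makes the fetch of I2 (F2) take 4 cycles.
fig_8_4 = pipeline_schedule([[1, 1, 1, 1], [4, 1, 1, 1], [1, 1, 1, 1]])
print([stages[-1][1] for stages in fig_8_4])  # [4, 8, 9]: cycles of W1, W2, W3

# Assumption for illustration: the long execute step of Figure 8.3 is E2, taking 3 cycles.
fig_8_3 = pipeline_schedule([[1, 1, 1, 1], [1, 1, 3, 1], [1, 1, 1, 1],
                             [1, 1, 1, 1], [1, 1, 1, 1]])
print([stages[-1][1] for stages in fig_8_3])  # [4, 7, 8, 9, 10]: two cycles later than ideal
```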


• Load X(R1), R2 (instruction I2 in Figure 8.5)
• The memory address, X + [R1], is computed in step E2 in cycle 4.
• The memory access takes place in cycle 5.
• The operand read from memory is written into register R2 in cycle 6.
• Execution thus takes 2 clock cycles (cycles 4 and 5).
• This causes the pipeline to stall for one cycle, because both instructions I2 and I3 require access to the register file in cycle 6.
• Even though the instructions and their data are all available, the pipeline is stalled because one hardware resource, the register file, cannot handle two operations at once.
• If the register file had two input ports, the pipeline would not stall.
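The structural hazard can be seen by counting how many instructions want to write the register file in each cycle. A minimal Python sketch, assuming the write cycles from the Load example (I1 writes in cycle 4, while I2 and I3 are both ready to write in cycle 6) and a hypothetical helper `register_write_cycles`; with one input port I3 is pushed to cycle 7, while two ports remove the stall.

```python
from __future__ import annotations


def register_write_cycles(ready: list[int], ports: int = 1) -> list[int]:
    """Cycle in which each instruction (in program order) writes the register file.

    ready[i] is the earliest cycle instruction i could perform its write step;
    at most `ports` writes can happen in one cycle, and writes stay in order.
    """
    writes_per_cycle: dict[int, int] = {}
    result: list[int] = []
    earliest = 0
    for r in ready:
        cycle = max(r, earliest)          # never overtake the previous instruction
        while writes_per_cycle.get(cycle, 0) >= ports:
            cycle += 1                    # port busy: this instruction stalls
        writes_per_cycle[cycle] = writes_per_cycle.get(cycle, 0) + 1
        result.append(cycle)
        earliest = cycle
    return result


# I1 writes in cycle 4; the Load (I2) writes R2 in cycle 6; I3 could also write in cycle 6.
print(register_write_cycles([4, 6, 6]))           # [4, 6, 7]: I3 stalls one cycle
print(register_write_cycles([4, 6, 6], ports=2))  # [4, 6, 6]: no stall
```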


Pipeline Performance
Load X(R1), R2 (structural hazard)
[Figure 8.5. Effect of a Load instruction on pipeline timing, clock cycles 1 to 7: I1 (F1, D1, E1, W1); I2, the Load (F2, D2, E2, M2, W2); I3 (F3, D3, E3, W3); I4 (F4, D4, E4); I5 (F5, D5).]


Pipeline Performance
• Again, pipelining does not result in individual instructions being executed faster; rather, it is the throughput that increases.
• Throughput is measured by the rate at which instruction execution is completed.
• A pipeline stall causes degradation in pipeline performance.
• We need to identify all hazards that may cause the pipeline to stall and find ways to minimize their impact.
