Module_5_DD_CO_Ver4
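[Figure 7.1: Datapath of the processor. Components: PC; MAR (driving the address lines of the memory bus); MDR (connected to the data lines); IR; instruction decoder and control logic (producing the control signals); registers R0 to R(n-1); temporary registers Y, Z and TEMP; a multiplexer (MUX) with a Select input choosing register Y or the constant 4; and an ALU with inputs A and B, ALU control lines (Add, Sub, XOR) and a Carry-in input.]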
programmer.
Special-purpose registers: index and stack registers.
Registers Y, Z and TEMP are temporary registers used by the
processor during the execution of some instructions.
Multiplexer:
Selects either the output of register Y or the constant value 4
(used to increment the PC) as input A of the ALU.
ALU:
Used to perform arithmetic and logic
operations.
Data Path:
The registers, ALU and interconnecting bus are
collectively referred to as the data path.
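1. Register Transfers
[Figure 7.2: register Ri connected to the internal processor bus through the gating signals Riin and Riout; register Y (Yin) feeding the MUX, which selects Y or the constant 4 (Select) for ALU input A; the bus feeding ALU input B; the ALU output going to register Z (Zin, Zout).]
Figure 7.2. Input and output gating for the registers in Figure 7.1.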
The input and output gates for register Ri are
controlled by the signals Riin and Riout.
Riin is set to 1: the data available on the common bus
are loaded into Ri.
Riout is set to 1: the contents of register Ri are
placed on the bus.
Riout is set to 0: the bus can be used for
transferring data from other registers.
Data transfer between two registers:
EX:
Transfer the contents of R1 to R4.
1. Enable output of register R1 by setting
R1out=1. This places the contents of R1 on
the processor bus.
2. Enable input of register R4 by setting
R4in=1. This loads the data from the
processor bus into register R4.
2. Performing an Arithmetic or Logic Operation
The ALU is a combinational circuit that has no
internal storage.
The ALU gets its two operands from the MUX (input A) and the bus (input B).
The result is temporarily stored in register Z.
What is the sequence of operations to add the
contents of register R1 to those of R2 and store the
result in R3?
1. R1out, Yin
2. R2out, SelectY, Add, Zin
3. Zout, R3in
Step 1: Output of the register R1 and input of
the register Y are enabled, causing the
contents of R1 to be transferred to Y.
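Step 2: The multiplexer’s select signal is set to
SelectY, causing the multiplexer to gate the
contents of register Y to input A of the ALU. At the
same time, R2out places the contents of R2 on the bus,
which feeds input B; the Add signal makes the ALU add
the two inputs, and Zin loads the sum into register Z.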
Step 3: The contents of Z are transferred to the
destination register R3.
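The three control steps can be traced in software. This is a sketch, not a hardware description: it assumes a 32-bit word (the slides do not state one), and the function name `add_r1_r2_to_r3` is hypothetical. Each block of statements corresponds to one control step and updates only what that step's signals enable.

```python
def add_r1_r2_to_r3(regs: dict[str, int], word_bits: int = 32) -> dict[str, int]:
    """Trace R1out,Yin / R2out,SelectY,Add,Zin / Zout,R3in on a copy of the registers."""
    mask = (1 << word_bits) - 1   # assumed word width
    r = dict(regs)

    # Step 1: R1out, Yin. R1 drives the bus and Y latches it.
    bus = r["R1"]
    r["Y"] = bus

    # Step 2: R2out, SelectY, Add, Zin. The MUX passes Y to ALU input A,
    # the bus (now carrying R2) feeds input B, and Z latches the sum.
    bus = r["R2"]
    alu_a = r["Y"]                # SelectY (the other MUX choice is the constant 4)
    alu_b = bus
    r["Z"] = (alu_a + alu_b) & mask

    # Step 3: Zout, R3in. Z drives the bus and R3 latches the result.
    bus = r["Z"]
    r["R3"] = bus
    return r


result = add_r1_r2_to_r3({"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0})
print(result["R3"])  # 12
```

Y is needed because the ALU is purely combinational and the single bus can carry only one operand per step.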
Register Transfers
All operations and data transfers are controlled by the processor clock.
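[Figure 7.3: one bit of register Ri: a D flip-flop driven by the Clock, with Riin controlling whether the bit is loaded from the bus and Riout controlling whether the output Q is placed on the bus.]
Figure 7.3. Input and output gating for one register bit.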
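A behavioral sketch of one register bit, assuming the common arrangement in which a 2-to-1 multiplexer feeds the flip-flop's D input (the bus when Riin = 1, the flip-flop's own Q otherwise) and the output reaches the bus only when Riout = 1. The class `RegisterBit` is hypothetical, and `None` stands in for a disconnected (high-impedance) output.

```python
class RegisterBit:
    """Hypothetical model of one bit of register Ri (D flip-flop plus gating)."""

    def __init__(self) -> None:
        self.q = 0

    def clock_edge(self, bus: int, ri_in: int) -> None:
        # The MUX feeds D: the bus value if Riin = 1, otherwise recirculate Q.
        d = bus if ri_in else self.q
        self.q = d

    def drive(self, ri_out: int):
        # Output gate: Q reaches the bus only when Riout = 1; None means "not driving".
        return self.q if ri_out else None


bit = RegisterBit()
bit.clock_edge(bus=1, ri_in=1)   # load a 1 from the bus
bit.clock_edge(bus=0, ri_in=0)   # Riin = 0: the stored value is kept
print(bit.q, bit.drive(1), bit.drive(0))  # 1 1 None
```

Recirculating Q when Riin = 0 is what lets the register hold its value across clock edges while the bus carries other data.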
Fetching a Word from Memory
Address into MAR; issue Read operation; data into MDR.
[Figure 7.4: register MDR connected to the memory-bus data lines through the gating signals MDRinE and MDRoutE, and to the internal processor bus through MDRin and MDRout.]
Figure 7.4. Connection and control signals for register MDR.
3. Fetching a Word from Memory
The response time of each memory access varies
(cache miss, memory-mapped I/O,…).
To accommodate this, the processor waits until it
receives an indication that the requested operation
has been completed (Memory-Function-Completed,
MFC).
Move (R1), R2
MAR ← [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 ← [MDR]
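The wait for MFC can be sketched as a loop that polls a memory whose latency changes from access to access. Everything here is an assumption for illustration: `SlowMemory` draws a random latency of 1 to 5 cycles to stand in for cache hits and misses, and `move_indirect` is a hypothetical name for the Move (R1), R2 sequence.

```python
import random


class SlowMemory:
    """Hypothetical memory whose read latency varies per access."""

    def __init__(self, contents: dict[int, int], min_wait: int = 1, max_wait: int = 5, seed: int = 0):
        self.contents = contents
        self.rng = random.Random(seed)
        self.min_wait, self.max_wait = min_wait, max_wait
        self.pending = None  # [cycles_left, address] while a read is in progress

    def start_read(self, address: int) -> None:
        self.pending = [self.rng.randint(self.min_wait, self.max_wait), address]

    def tick(self):
        """Advance one clock cycle; return (MFC, data), with MFC = 1 once the read is done."""
        self.pending[0] -= 1
        if self.pending[0] == 0:
            data = self.contents[self.pending[1]]
            self.pending = None
            return 1, data
        return 0, None


def move_indirect(regs: dict[str, int], mem: SlowMemory, src: str, dst: str) -> int:
    """Move (src), dst. Returns the number of clock cycles spent waiting for MFC."""
    regs["MAR"] = regs[src]          # MAR <- [R1]
    mem.start_read(regs["MAR"])      # start a Read operation on the memory bus
    waited = 0
    while True:                      # WMFC: wait for Memory-Function-Completed
        waited += 1
        mfc, data = mem.tick()
        if mfc:
            regs["MDR"] = data       # MDRinE: load MDR from the memory bus
            break
    regs[dst] = regs["MDR"]          # R2 <- [MDR]
    return waited


regs = {"R1": 0x100, "R2": 0}
mem = SlowMemory({0x100: 99})
cycles = move_indirect(regs, mem, "R1", "R2")
print(regs["R2"])  # 99
```

The number of cycles spent in the loop differs from access to access, which is exactly why the processor waits for MFC instead of assuming a fixed delay.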
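Control sequence for Move (R1), R2:
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
[Timing diagram: Clock, MARin, MDRinE, Data, MFC and MDRout over steps 1, 2 and 3.]

Storing a word in memory (writing the contents of R2 to the location whose address is in R1):
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC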
Execution of a Complete Instruction
Add (R3), R1
Fetch the instruction.
Fetch the first operand (the contents of the memory location pointed to by R3).

Figure 7.6. Control sequence for execution of the instruction Add (R3), R1:
Step  Action
1     PCout, MARin, Read, Select4, Add, Zin
2     Zout, PCin, Yin, WMFC
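Steps 1 and 2 overlap the instruction read with incrementing the PC. The sketch below assumes a 32-bit word and a memory that completes the read by the end of step 2 (no latency modeled); the function name `fetch_steps_1_2` is hypothetical. It stops with the instruction word in MDR, since the later steps of the sequence are not shown here.

```python
def fetch_steps_1_2(regs: dict[str, int], memory: dict[int, int], word_bits: int = 32) -> None:
    mask = (1 << word_bits) - 1                # assumed word width
    # Step 1: PCout, MARin, Read, Select4, Add, Zin
    bus = regs["PC"]
    regs["MAR"] = bus
    read_address = regs["MAR"]                 # Read is issued with MAR as the address
    alu_a = 4                                  # Select4: MUX passes the constant 4 to input A
    regs["Z"] = (alu_a + bus) & mask           # Add: Z <- [PC] + 4
    # Step 2: Zout, PCin, Yin, WMFC
    bus = regs["Z"]
    regs["PC"] = bus                           # PC now points to the next instruction
    regs["Y"] = bus                            # Yin latches the same value from the bus
    regs["MDR"] = memory[read_address]         # WMFC: assume the read completes here


regs = {"PC": 0x1000}
fetch_steps_1_2(regs, {0x1000: 0xABCD})
print(hex(regs["PC"]), hex(regs["MDR"]))  # 0x1004 0xabcd
```

Using the ALU during step 1 costs nothing extra, because the ALU would otherwise sit idle while the memory read is in progress.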
terms of throughput.
Pipelined organization requires sophisticated
compilation techniques.
Basic Concepts
Making the Execution of
Programs Faster
Use faster circuit technology to build the
processor and the main memory.
Arrange the hardware so that more than one
operation can be performed at the same time.
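[Figure: (a) Sequential execution: instructions I1, I2 and I3 each perform fetch (F1, F2, F3) and then execute (E1, E2, E3) before the next instruction starts. Hardware organization: an instruction fetch unit and an execution unit separated by interstage buffer B1. (c) Pipelined execution.]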
is being fetched.
Use the Idea of Pipelining in a Computer
Four-stage pipeline: Fetch + Decode + Execution + Write.

Clock cycle    1    2    3    4    5    6    7    (time)
I1             F1   D1   E1   W1
I2                  F2   D2   E2   W2
I3                       F3   D3   E3   W3
I4                            F4   D4   E4   W4

F: Fetch instruction
D: Decode instruction and fetch operands
E: Execute operation
W: Write results
Interstage buffers B1, B2 and B3 separate the F/D, D/E and E/W stages.
specification of the operation to be
performed (produced by the decoding unit in
cycle 3).
The buffer also holds the information needed for the write
step of I2 (W2).
Buffer B3 holds the result produced by the execution unit.
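The diagram shows four instructions completing in 7 clock cycles instead of 16. A small sketch of that arithmetic, assuming every stage takes exactly one cycle and nothing stalls; the function names are hypothetical.

```python
def sequential_cycles(n_instructions: int, n_stages: int) -> int:
    """Each instruction finishes all of its stages before the next one starts."""
    return n_instructions * n_stages


def pipelined_cycles(n_instructions: int, n_stages: int) -> int:
    """Ideal pipeline: the first instruction needs n_stages cycles,
    then one more instruction completes in every following cycle."""
    if n_instructions == 0:
        return 0
    return n_stages + (n_instructions - 1)


print(sequential_cycles(4, 4), pipelined_cycles(4, 4))  # 16 7
```

For a long run of instructions the pipelined count approaches one instruction per cycle, which is the throughput gain pipelining aims for; any stall adds cycles on top of this ideal figure.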
Instruction   Steps
I1            F1 D1 E1 W1
I2            F2 D2 E2 W2
I3            F3 D3 E3 W3
I4            F4 D4 E4 W4
I5            F5 D5 E5
Figure 8.3. Effect of an execution operation taking more than one clock cycle.
Pipeline Performance
The previous pipeline is said to have been stalled for two clock
cycles.
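The two-cycle stall can be reproduced with a simple timing model. This sketch assumes an in-order pipeline in which each stage holds one instruction and an instruction may enter a stage only after the previous instruction has moved on. It also assumes E2 needs three cycles, which is what a two-cycle stall implies. The function `schedule` is a hypothetical helper, not a standard algorithm name.

```python
def schedule(durations: list[list[int]]) -> list[list[int]]:
    """Return start[i][s]: the clock cycle in which instruction i enters stage s.

    durations[i][s] is the number of cycles instruction i needs in stage s.
    """
    n, k = len(durations), len(durations[0])
    start = [[0] * k for _ in range(n)]
    for i in range(n):
        for s in range(k):
            # The first fetch starts in cycle 1; otherwise wait for own previous stage.
            t = 1 if s == 0 and i == 0 else 0
            if s > 0:
                t = start[i][s - 1] + durations[i][s - 1]
            if i > 0:
                # Stage s is free once instruction i-1 has moved into stage s+1
                # (for the last stage, once i-1 has finished there).
                if s + 1 < k:
                    freed = start[i - 1][s + 1]
                else:
                    freed = start[i - 1][s] + durations[i - 1][s]
                t = max(t, freed)
            start[i][s] = t
    return start


durations = [[1, 1, 1, 1] for _ in range(5)]   # I1..I5, stages F, D, E, W
durations[1][2] = 3                            # E2 of I2 takes three cycles (assumed)
start = schedule(durations)
last_cycle = start[-1][-1] + durations[-1][-1] - 1
ideal = len(durations[0]) + len(durations) - 1
print(start[2], last_cycle - ideal)  # [3, 4, 7, 8] 2
```

I3 finishes decoding in cycle 4 but cannot enter the execution unit until cycle 7. Setting the fetch of I2 to four cycles instead (`durations[1][0] = 4`) models the instruction cache miss discussed next.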
Any condition that causes a pipeline to stall is called a hazard.
Data hazard – any condition in which either the source or the
destination operands of an instruction are not available at the
time expected in the pipeline. So some operation has to be
delayed, and the pipeline stalls.
Instruction (control) hazard – a delay in the availability of an
instruction causes the pipeline to stall.
Structural hazard – the situation when two instructions require
the use of a given hardware resource at the same time.
Control hazard: a delay in the availability of an
instruction.
For example, a cache miss, requiring the instruction to be fetched from the main memory.
Stage view (instruction cache miss during the fetch of I2):

Clock cycle    1    2    3    4    5    6    7    8    9    (time)
F: Fetch       F1   F2   F2   F2   F2   F3
D: Decode           D1   idle idle idle D2   D3
E: Execute               E1   idle idle idle E2   E3

Idle periods are stalls (bubbles).
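The stage view above can be generated directly. This sketch assumes that exactly one instruction's fetch is slow and that every other step takes one cycle, so every later step is shifted by the extra fetch cycles. The function `stage_view` and its parameters are hypothetical; instructions are counted from 0, so `miss_instr=1` means I2.

```python
def stage_view(n_instr: int, stages: str, miss_instr: int, miss_cycles: int) -> dict[str, list[str]]:
    """Label each cycle of each stage with the step it performs, or 'idle' for a bubble."""
    penalty = miss_cycles - 1
    occupancy: dict[str, dict[int, str]] = {s: {} for s in stages}
    for j in range(n_instr):
        for k, s in enumerate(stages):
            # Everything after the slow fetch is delayed by the extra fetch cycles.
            delay = penalty if j > miss_instr or (j == miss_instr and k > 0) else 0
            first = j + k + 1 + delay
            length = miss_cycles if (j == miss_instr and k == 0) else 1
            for c in range(first, first + length):
                occupancy[s][c] = f"{s}{j + 1}"
    rows = {}
    for s in stages:
        cycles = occupancy[s]
        lo, hi = min(cycles), max(cycles)
        rows[s] = [cycles.get(c, "idle") for c in range(lo, hi + 1)]
    return rows


for stage, row in stage_view(3, "FDE", miss_instr=1, miss_cycles=4).items():
    print(stage, " ".join(row))
# F F1 F2 F2 F2 F2 F3
# D D1 idle idle idle D2 D3
# E E1 idle idle idle E2 E3
```

A fetch that takes m cycles leaves m - 1 bubbles in each later stage; here m = 4 gives the three idle cycles shown for D and E.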
step E2 in cycle 4.
The memory access takes place in cycle 5.
The operand read from memory is written into R2
in cycle 6.
Execution thus takes two clock cycles (cycles 4 and 5),
which causes the pipeline to stall for one cycle.
Instruction   Steps
I1            F1 D1 E1 W1
I2 (Load)     F2 D2 E2 M2 W2
I3            F3 D3 E3 W3
I4            F4 D4 E4
I5            F5 D5