Computer Organisation-Unit - II


UNIT-II Basic Processing Unit

Fundamental Concepts
 The processor fetches one instruction at a time and performs the operation specified.
 Instructions are fetched from successive memory locations until a branch or a jump instruction is
encountered.
 Processor keeps track of the address of the memory location containing the next instruction to be
fetched using Program Counter (PC).
 The Instruction Register (IR) holds the instruction that is currently being executed.

Executing an Instruction
 Fetch the contents of the memory location pointed to by the PC. The contents of this location are
loaded into the IR (fetch phase). IR ← [[PC]]
 Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch
phase). PC ← [PC] + 4
 Carry out the actions specified by the instruction in the IR (execution phase).
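The fetch phase can be illustrated with a small sketch. The memory contents, word size, and values below are illustrative assumptions only, not part of any particular processor.

# Minimal sketch of the fetch phase (assumes byte-addressable memory, 4-byte words).
memory = {0: "Add (R3), R1", 4: "Move (R1), R2", 8: "Branch LOOP"}  # hypothetical contents

PC = 0                      # program counter
IR = None                   # instruction register

# Fetch phase: IR <- [[PC]], then PC <- [PC] + 4
IR = memory[PC]             # load the instruction pointed to by PC into IR
PC = PC + 4                 # advance PC to the next sequential instruction

print(IR, PC)               # -> Add (R3), R1  4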

Processor Organization

Figure 2.1 Single-bus organization of the data path inside a processor.



 ALU and all the registers are interconnected via a single common bus.
 The data and address lines of the external memory bus connected to the internal processor
bus via the memory data register, MDR, and the memory address register, MAR respectively.
 Register MDR has two inputs and two outputs.
 Data may be loaded into MDR either from the memory bus or from the internal
processor bus.
 The data stored in MDR may be placed on either bus.
 The input of MAR is connected to the internal bus, and its output is connected to the external
bus.
 The control lines of the memory bus are connected to the instruction decoder and control
logic.
 This unit is responsible for issuing the signals that control the operation of all the units
inside the processor and for interacting with the memory bus.
 The MUX selects either the output of register Y or a constant value 4 to be provided as input
A of the ALU.
 The constant 4 is used to increment the contents of the program counter.

Register Transfers

Figure 2.2 Register Transfer


 Instruction execution involves a sequence of steps in which data are transferred from one
register to another.
 For each register, two control signals are used: one to place the contents of that register on the
bus and one to load the data on the bus into the register (see Figure 2.3).
 When Riin is set to 1, the data on the bus are loaded into register Ri.
 Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.
 While Riout is equal to 0, the bus can be used for transferring data from other registers.

Example
 Suppose we wish to transfer the contents of register R1 to register R4. This can be
accomplished as follows.
 Enable the output of registers R1 by setting R1out to 1. This places the contents of R1 on the
processor bus.
 Enable the input of register R4 by setting R4in to 1. This loads data from the processor bus
into register R4.
 All operations and data transfers with in the processor take place within time periods
defined by the processor clock.
 The control signals that govern a particular transfer are asserted at the start of the clock
cycle.
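As a rough illustration of how the Riout and Riin signals gate a transfer over the single bus, the sketch below models the bus and the registers as plain Python variables; the register names and initial contents are assumptions made only for this example.

# Sketch of a single-bus register transfer R1 -> R4 driven by control signals.
registers = {"R1": 25, "R2": 7, "R3": 0, "R4": 0}   # hypothetical initial contents

def clock_cycle(out_signal, in_signal):
    """One clock cycle: the register whose 'out' signal is asserted drives the bus,
    and the register whose 'in' signal is asserted loads from the bus."""
    bus = registers[out_signal]          # e.g. R1out = 1 places [R1] on the bus
    registers[in_signal] = bus           # e.g. R4in = 1 loads the bus into R4

clock_cycle(out_signal="R1", in_signal="R4")   # R1out, R4in
print(registers["R4"])                          # -> 25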
[Figure: an edge-triggered D flip-flop clocked by the processor clock; its D input is gated from the bus by Riin and its Q output is gated onto the bus by Riout.]
Figure 2.3 Input and output gating for one register bit.
Performing an Arithmetic or Logic Operation
 The ALU is a combinational circuit that has no internal storage.
 The ALU gets its two operands from the MUX (input A) and the bus (input B). The result is
temporarily stored in register Z.
 What is the sequence of operations to add the contents of register R1 to those of R2 and store
the result in R3?
o R1out, Yin
o R2out, SelectY, Add, Zin
o Zout, R3in
 All other signals are inactive.
 In step 1, the output of register R1 and the input of register Y are enabled, causing the contents
of R1 to be transferred over the bus to Y.
 Step 2, the multiplexer’s select signal is set to Select Y, causing the multiplexer to gate the
contents of register Y to input A of the ALU.
 At the same time, the contents of register R2 are gated onto the bus and, hence, to input B.
 The function performed by the ALU depends on the signals applied to its control lines.
 In this case, the ADD line is set to 1, causing the output of the ALU to be the sum of the two
numbers at inputs A and B.
 This sum is loaded into register Z because its input control signal is activated.
 In step 3, the contents of register Z are transferred to the destination register R3. This last transfer
cannot be carried out during step 2, because only one register output can be connected to the bus
during any clock cycle.
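The three-step sequence above can be mimicked in a short sketch, with the bus carrying one value per step and Y and Z standing in for the ALU's input and output latches; the register contents are made-up example values.

# Sketch of adding R1 and R2 into R3 on the single-bus datapath (one bus transfer per step).
registers = {"R1": 10, "R2": 32, "R3": 0}
Y = Z = 0

# Step 1: R1out, Yin -- copy R1 over the bus into temporary register Y
bus = registers["R1"]; Y = bus

# Step 2: R2out, SelectY, Add, Zin -- ALU adds Y (input A) and the bus (input B)
bus = registers["R2"]; Z = Y + bus      # SelectY gates Y to input A; Add is selected

# Step 3: Zout, R3in -- result moves from Z over the bus into R3
bus = Z; registers["R3"] = bus

print(registers["R3"])    # -> 42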
Fetching a Word from Memory
 The processor has to specify the address of the memory location where this information is stored
and request a Read operation.
 This applies whether the information to be fetched represents an instruction in a program or an
operand specified by an instruction.
 The processor transfers the required address to the MAR, whose output is connected to the
address lines of the memory bus.

[Figure: register MDR sits between the memory-bus data lines and the internal processor bus; MDRinE and MDRoutE connect it to the memory bus, while MDRin and MDRout connect it to the internal bus.]
Figure 2.4 Connection and control signals for register MDR.


 At the same time, the processor uses the control lines of the memory bus to indicate that a Read
operation is needed.
 When the requested data are received from the memory they are stored in register MDR, from
where they can be transferred to other registers in the processor.



 The response time of each memory access varies (cache miss, memory-mapped I/O…).
 To accommodate this, the processor waits until it receives an indication that the requested
operation has been completed (Memory-Function-Completed, MFC).
 Move (R1), R2
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
 The output of MAR is enabled all the time.
 Thus the contents of MAR are always available on the address lines of the memory bus.
 When a new address is loaded into MAR, it will appear on the memory bus at the beginning of
the next clock cycle (see Figure 2.5).
 A read control signal is activated at the same time MAR is loaded.
 This means a memory Read operation requires three steps, which can be described by the signals
being activated as follows (a sketch follows below):
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
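The Read handshake can be sketched as below; the memory model and latency value are illustrative assumptions, and the busy-wait loop stands in for the processor idling until MFC is asserted.

# Sketch of Move (R1), R2: read the word whose address is in R1, wait for MFC.
memory = {0x1000: 99}              # hypothetical memory contents
registers = {"R1": 0x1000, "R2": 0}
MAR = MDR = 0

MAR = registers["R1"]              # Step 1: R1out, MARin, Read
mfc = False
cycles_until_mfc = 3               # assumed (variable) memory latency

while not mfc:                     # Step 2: WMFC -- wait for Memory-Function-Completed
    cycles_until_mfc -= 1
    mfc = (cycles_until_mfc == 0)

MDR = memory[MAR]                  # MDRinE: data from the memory bus into MDR
registers["R2"] = MDR              # Step 3: MDRout, R2in
print(registers["R2"])             # -> 99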
[Timing diagram showing, over steps 1–3: the clock, MARin, the address lines, Read, MR, MDRinE, the data lines, MFC, and MDRout.]
Figure 2.5 Timing of a memory Read operation.


Storing a word in Memory
• Writing a word into a memory location follows a similar procedure.
• The desired address is loaded into MAR.
• Then, the data to be written are loaded into MDR, and a write command is issued.
Example
• Executing the instruction Move R2, (R1) requires the following steps:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
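A companion sketch for the Write case, under the same illustrative assumptions as the Read sketch above:

# Sketch of Move R2, (R1): write [R2] to the memory location whose address is in R1.
memory = {}
registers = {"R1": 0x2000, "R2": 77}

MAR = registers["R1"]              # Step 1: R1out, MARin
MDR = registers["R2"]              # Step 2: R2out, MDRin, Write
memory[MAR] = MDR                  # Step 3: MDRoutE, WMFC -- memory latches the data
print(memory[0x2000])              # -> 77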

Execution of a Complete Instruction


• Add (R3), R1
• Fetch the instruction
• Fetch the first operand (the contents of the memory location pointed to by R3)
• Perform the addition
• Load the result into R1

Step Action

1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 R3out, MARin, Read
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
7 Zout, R1in, End

Figure 2.6 Control sequence for execution of the instruction Add (R3), R1.
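The whole seven-step sequence of Figure 2.6 can be walked through in a short sketch; the instruction encoding, addresses, and memory contents are made-up values for illustration only.

# Sketch of the 7-step control sequence for Add (R3), R1.
memory = {0x100: "Add (R3), R1", 0x2000: 5}        # hypothetical code and data
reg = {"PC": 0x100, "R1": 37, "R3": 0x2000}
MAR = MDR = IR = Y = Z = 0

MAR = reg["PC"]; Z = reg["PC"] + 4                 # 1: PCout, MARin, Read, Select4, Add, Zin
reg["PC"] = Z; Y = Z; MDR = memory[MAR]            # 2: Zout, PCin, Yin, WMFC (fetch completes)
IR = MDR                                           # 3: MDRout, IRin
MAR = reg["R3"]; MDR = memory[MAR]                 # 4: R3out, MARin, Read
Y = reg["R1"]                                      # 5: R1out, Yin, WMFC
Z = Y + MDR                                        # 6: MDRout, SelectY, Add, Zin
reg["R1"] = Z                                      # 7: Zout, R1in, End

print(IR, reg["PC"], reg["R1"])                    # -> Add (R3), R1  260  42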



Figure 2.7 Single-bus organization of the data path inside a processor

Execution of Branch Instructions


 A branch instruction replaces the contents of PC with the branch target address, which is usually
obtained by adding an offset X given in the branch instruction.
 The offset X is usually the difference between the branch target address and the address
immediately following the branch instruction.
 Conditional branch: the PC is loaded with the branch target address only if the specified
condition (tested using the condition-code flags) is satisfied; otherwise the PC retains its
incremented value and execution continues sequentially.



Figure: 2.8 Control sequence for an unconditional branch instruction.

Multiple-Bus Organization

Figure 2.9: Three-bus organization of the data path.



Example: Add R4, R5, R6

Fig 2.10: Control sequence for the instruction Add R4, R5, R6 for the three-bus organization

Hardwired Control

 To execute instructions, the processor must have some means of generating the control signals
needed in the proper sequence.
 Two categories: hardwired control and microprogrammed control.
 A hardwired system can operate at high speed, but with little flexibility.

Control Unit Organization

Figure: 2.11 Control Unit Organisation



Detailed Control design

[Figure: the clock drives a control step counter (with Reset) whose outputs go to a step decoder producing T1, T2, …, Tn; the IR feeds an instruction decoder producing INS1 … INSm; an encoder combines these with external inputs, condition codes, and the Run/End signals to generate the control signals.]

Figure 2.12: Separation of the decoding and encoding functions

Generating Zin

 Zin = T1 + T6 • ADD + T4 • BR + …

Figure 2.13: Generation of the Zin control signal for the processor in Figure 2.1
Generating End

 End = T7 • ADD + T5 • BR + (T5 • N + T4 • N') • BRN + …  (N' denotes the complement of the condition-code flag N)


[Figure: End is produced by an OR of T7 • Add, T5 • Branch, and, for Branch<0, T5 • N and T4 • N'.]
Figure 2.14: Generation of the End control signal.
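The two logic expressions above can be read as plain Boolean functions of the step-counter outputs, the decoded instruction, and the condition codes. A sketch follows; it includes only the terms shown above, and the instruction mnemonics are used as simple labels.

# Sketch of hardwired control: Zin and End as Boolean functions of the current
# time step T, the decoded instruction, and the N condition-code flag.
def zin(T, instr):
    # Zin = T1 + T6.ADD + T4.BR + ...
    return T == 1 or (T == 6 and instr == "ADD") or (T == 4 and instr == "BR")

def end(T, instr, N):
    # End = T7.ADD + T5.BR + (T5.N + T4.N').BRN + ...
    return ((T == 7 and instr == "ADD") or
            (T == 5 and instr == "BR") or
            (instr == "BRN" and ((T == 5 and N) or (T == 4 and not N))))

print(zin(1, "ADD"), zin(6, "ADD"), end(4, "BRN", False))   # -> True True True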

A Complete Processor

[Figure: the processor contains an instruction unit, an integer unit, and a floating-point unit, together with an instruction cache and a data cache; a bus interface connects the processor to the system bus, which also serves the main memory and input/output.]

Figure 2.15: Block diagram of a complete processor.



Microprogrammed Control

 Control signals are generated by a program similar to machine language programs.


 A Control Word (CW) is a word whose individual bits represent the various control signals; the
sequence of CWs corresponding to one machine instruction is called a microroutine, and each CW
in that sequence is a microinstruction.
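A control word is simply one bit per control signal, and a microroutine is a list of such words. A tiny sketch follows; the signal names and their ordering are a reduced, illustrative subset chosen for the example.

# Sketch: a control word (CW) is one bit per control signal; a microroutine is a list of CWs.
SIGNALS = ["PCout", "MARin", "Read", "Select4", "Add", "Zin", "Zout", "PCin", "WMFC", "End"]

def control_word(*active):
    """Build a CW (tuple of bits) with the named signals set to 1."""
    return tuple(1 if s in active else 0 for s in SIGNALS)

fetch_microroutine = [
    control_word("PCout", "MARin", "Read", "Select4", "Add", "Zin"),  # microinstruction 1
    control_word("Zout", "PCin", "WMFC"),                             # microinstruction 2
]
print(fetch_microroutine[0])   # -> (1, 1, 1, 1, 1, 1, 0, 0, 0, 0)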

[Figure: a table with one row per microinstruction (1–7) and one bit column per control signal — PCin, PCout, MARin, Read, MDRout, IRin, Yin, Select, Add, Zin, Zout, R1out, R1in, R3out, WMFC, End — where each bit is 1 when that signal is active in the corresponding step of Figure 2.17.]

Figure 2.16: An example of microinstructions for the control sequence of Figure 2.17.

Step Action

1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 R3out, MARin, Read
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
7 Zout, R1in, End

Figure 2.17: Control sequence for execution of the instruction Add (R3), R1.



[Figure: the IR drives a starting address generator; the clock increments the µPC, which addresses the control store; the control store output is the current CW.]

Figure 2.18: Basic organization of a microprogrammed control unit.

 The previous organization cannot handle the situation when the control unit is required to check
the status of the condition codes or external inputs to choose between alternative courses of
action.
 Use conditional branch microinstruction.



Figure 2.19 Organisation of the control unit to allow conditional branching in the microprogram

Microinstructions
 A straightforward way to structure microinstructions is to assign one bit position to each control
signal.
 However, this is very inefficient.
 The length can be reduced: most signals are not needed simultaneously, and many signals are
mutually exclusive.
 Signals that are mutually exclusive are placed in the same group, and each group is encoded in
binary so that only enough bits to identify one signal in the group are needed.
Microinstruction fields F1–F8:

F1 (4 bits): 0000 No transfer; 0001 PCout; 0010 MDRout; 0011 Zout; 0100 R0out; 0101 R1out; 0110 R2out; 0111 R3out; 1010 TEMPout; 1011 Offsetout
F2 (3 bits): 000 No transfer; 001 PCin; 010 IRin; 011 Zin; 100 R0in; 101 R1in; 110 R2in; 111 R3in
F3 (3 bits): 000 No transfer; 001 MARin; 010 MDRin; 011 TEMPin; 100 Yin
F4 (4 bits): 0000 Add; 0001 Sub; …; 1111 XOR (16 ALU functions)
F5 (2 bits): 00 No action; 01 Read; 10 Write
F6 (1 bit): 0 SelectY; 1 Select4
F7 (1 bit): 0 No action; 1 WMFC
F8 (1 bit): 0 Continue; 1 End

Figure 2.20: An example of a partial format for field-encoded microinstructions.
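Field encoding trades a slightly wider decoder for a much shorter control word. The sketch below decodes one field-encoded microinstruction using a subset of the F1 and F5 code tables of Figure 2.20; how the fields are packed into an integer here is an assumption of the sketch, not part of the figure.

# Sketch: decoding a field-encoded microinstruction (subset of Figure 2.20).
F1_CODES = {0b0000: None, 0b0001: "PCout", 0b0010: "MDRout", 0b0011: "Zout",
            0b0101: "R1out", 0b0111: "R3out"}                 # 4-bit field: what drives the bus
F5_CODES = {0b00: None, 0b01: "Read", 0b10: "Write"}          # 2-bit field: memory command

def decode(microword):
    """Assume the word packs F1 in bits 5..2 and F5 in bits 1..0 (illustrative layout)."""
    f1 = (microword >> 2) & 0b1111
    f5 = microword & 0b11
    return [s for s in (F1_CODES.get(f1), F5_CODES.get(f5)) if s]

# F1 = 0001 (PCout), F5 = 01 (Read) -> the active signals of the two groups
print(decode(0b000101))    # -> ['PCout', 'Read']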



Further Improvement

 Enumerate the patterns of required signals in all possible microinstructions. Each meaningful
combination of active control signals can then be assigned a distinct code.
 Vertical organization: highly encoded schemes that use compact codes to specify only a small
number of control functions in each microinstruction; they need a smaller control store but more
decoding and hence slower operation.
 Horizontal organization: minimally encoded schemes in which each microinstruction can control
many resources in parallel; they are useful when a higher operating speed is desired.

Micro program Sequencing

 If all microprograms require only straightforward sequential execution of microinstructions
except for branches, letting a µPC govern the sequencing would be efficient.
 However, two disadvantages:
o Having a separate microroutine for each machine instruction results in a large total number of
microinstructions and a large control store.
o Longer execution time because it takes more time to carry out the required branches.
 Example: Add src, Rdst
 Four addressing modes: register, autoincrement, autodecrement, and indexed (with indirect
forms).

Figure 2.22: Microroutine for the instruction Add (Rsrc)+, Rdst

Microinstructions with Next-Address Field

[Figure: the IR, external inputs, and condition codes feed decoding circuits that load the microinstruction address register (µAR); the µAR addresses the control store; each word read out contains a next-address field and control bits that pass through the microinstruction decoder to produce the control signals.]

Figure 2.23. Microinstruction-sequencing organization.



 The microprogram we discussed requires several branch microinstructions, which perform
no useful operation in the datapath.
 A powerful alternative approach is to include an address field as a part of every
microinstruction to indicate the location of the next microinstruction to be fetched.
 Pros: separate branch microinstructions are virtually eliminated; few limitations in
assigning addresses to microinstructions.
 Cons: additional bits for the address field (around 1/6)
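With a next-address field, the sequencer simply follows the address carried in each microinstruction instead of incrementing a µPC. A minimal sketch follows; the addresses and signal lists in the control store are invented for illustration.

# Sketch: microinstruction sequencing with an explicit next-address field.
# Each entry: address -> (control signals, next address); None ends the routine.
control_store = {
    0o000: (["PCout", "MARin", "Read", "Select4", "Add", "Zin"], 0o001),
    0o001: (["Zout", "PCin", "Yin", "WMFC"],                     0o002),
    0o002: (["MDRout", "IRin"],                                  None),   # would branch on the opcode
}

uAR = 0o000                           # microinstruction address register
while uAR is not None:
    signals, next_addr = control_store[uAR]
    print(oct(uAR), signals)          # "execute" the microinstruction
    uAR = next_addr                   # the next address comes from the microword itself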
Microinstruction fields F0–F10:

F0 (8 bits): address of the next microinstruction
F1 (3 bits): 000 No transfer; 001 PCout; 010 MDRout; 011 Zout; 100 Rsrcout; 101 Rdstout; 110 TEMPout
F2 (3 bits): 000 No transfer; 001 PCin; 010 IRin; 011 Zin; 100 Rsrcin; 101 Rdstin
F3 (3 bits): 000 No transfer; 001 MARin; 010 MDRin; 011 TEMPin; 100 Yin
F4 (4 bits): 0000 Add; 0001 Sub; …; 1111 XOR
F5 (2 bits): 00 No action; 01 Read; 10 Write
F6 (1 bit): 0 SelectY; 1 Select4
F7 (1 bit): 0 No action; 1 WMFC
F8 (1 bit): 0 NextAdrs; 1 InstDec
F9 (1 bit): 0 No action; 1 ORmode
F10 (1 bit): 0 No action; 1 ORindsrc

Figure 2.24. Format for microinstructions in the example of Section 7



Implementation of the Microroutine

[Figure: the control words at octal addresses 000–003, 121–122, and 170–173 of the control store, giving the F0–F10 bit settings for each microinstruction of the routine.]

Figure 2.25. Implementation of the microroutine of Figure 2.22 using a next-microinstruction address field. (See Figure 2.24 for encoded signals.)



Figure 2.26 Some details of the control-signal-generating circuitry.

Figure 2.27 control circuitry for bit-ORing



PREFETCHING MICROINSTRUCTIONS
 Drawback of microprogrammed control: Slower operating speed because of the time
it takes to fetch microinstructions from the control-store.
 Solution: Faster operation is achieved if the next microinstruction is pre-fetched while
the current one is being executed.
EMULATION
 The main function of microprogrammed control is to provide a means for simple, flexible, and
relatively inexpensive execution of machine instructions.
 Its flexibility in using a machine's resources allows diverse classes of instructions to
be implemented.
 Suppose we add to the instruction repertoire of a given computer M1 an entirely new set of
instructions that is in fact the instruction set of a different computer M2.
 Programs written in the machine language of M2 can then be run on computer M1, i.e. M1
emulates M2.
 Emulation allows us to replace obsolete equipment with more up-to-date machines.
 If the replacement computer fully emulates the original one, then no software changes need to
be made to run existing programs.
 Emulation is easiest when the machines involved have similar architectures.

Cache Memory
Cache memory bridges the speed mismatch between the processor and the main memory.
When cache hit occurs,
 The required word is present in the cache memory.
 The required word is delivered to the CPU from the cache memory.
When cache miss occurs,
 The required word is not present in the cache memory.
 The block containing the required word has to be brought into the cache from the main memory.
 This mapping is performed using cache mapping techniques.
Cache Mapping-
 Cache mapping defines how a block from the main memory is mapped to the cache
memory in case of a cache miss.
OR
 Cache mapping is a technique by which the contents of main memory are brought into
the cache memory.
The following diagram illustrates the mapping process-



Cache Mapping Techniques
Cache mapping is performed using following three different techniques-

1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
1. Direct Mapping-
In direct mapping,
 A particular block of main memory can map only to a particular line of the cache.
 The line number of cache to which a particular block can map is given by-

Cache line number = (Main Memory Block Address) Modulo (Number of lines in Cache)

Example-
Consider a cache memory divided into ‘n’ lines.
 Then, block ‘j’ of main memory can map to line number (j mod n) only of the cache.
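A quick sketch of the direct-mapping rule; the cache size and block addresses are arbitrary example values.

# Sketch: direct mapping -- each main-memory block maps to exactly one cache line.
NUM_LINES = 8                                    # assumed cache size (in lines)

def cache_line(block_address):
    return block_address % NUM_LINES             # line = block address mod number of lines

for block in (3, 11, 19):                        # 11 and 19 collide with 3 on line 3
    print(block, "->", cache_line(block))        # 3 -> 3, 11 -> 3, 19 -> 3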



Need of Replacement Algorithm
In direct mapping,
 There is no need of any replacement algorithm.
 This is because a main memory block can map only to a particular line of the cache.
 Thus, the new incoming block will always replace the existing block (if any) in that
particular line.
Division of Physical Address
In direct mapping, the physical address is divided into three fields: Tag, Line Number, and Block/Byte Offset.

2. Fully Associative Mapping


In fully associative mapping,
 A block of main memory can map to any line of the cache that is freely available at
that moment.
 This makes fully associative mapping more flexible than direct mapping.
Example-
Consider the following scenario-



Here,
 All the lines of cache are freely available.
 Thus, any block of main memory can map to any line of the cache.
 Had all the cache lines been occupied, one of the existing blocks would have to be replaced.
Need of Replacement Algorithm-
In fully associative mapping,
 A replacement algorithm is required.
 The replacement algorithm selects the block to be replaced when all the cache lines are
occupied.
 Thus, replacement algorithms such as FCFS, LRU, etc. are employed (see the sketch below).
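A rough sketch of fully associative placement with LRU replacement; the cache size and the reference string are made-up example values.

from collections import OrderedDict

# Sketch: fully associative cache with LRU replacement -- any block may go in any line.
NUM_LINES = 4
cache = OrderedDict()                       # insertion/access order tracks recency

def access(block):
    if block in cache:                      # hit: mark the block most recently used
        cache.move_to_end(block)
        return "hit"
    if len(cache) == NUM_LINES:             # miss with all lines occupied:
        cache.popitem(last=False)           # evict the least recently used block
    cache[block] = True
    return "miss"

for b in (1, 2, 3, 4, 1, 5):                # block 5 evicts block 2 (the LRU), not block 1
    print(b, access(b))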
Division of Physical Address-
In fully associative mapping, the physical address is divided into two fields: Tag and Block/Byte Offset.

3. K-way Set Associative Mapping


In k-way set associative mapping,
 Cache lines are grouped into sets where each set contains k number of lines.
 A particular block of main memory can map to only one particular set of the cache.
 However, within that set, the memory block can map to any cache line that is freely
available.



 The set of the cache to which a particular block of the main memory can map is given
by-

Cache set number = (Main Memory Block Address) Modulo (Number of sets in Cache)

Example-
Consider the following example of 2-way set associative mapping-

Here,
 k = 2 suggests that each set contains two cache lines.
 Since the cache contains 6 lines, the number of sets in the cache = 6 / 2 = 3.
 Block ‘j’ of main memory can map to set number (j mod 3) only of the cache.
 Within that set, block ‘j’ can map to any cache line that is freely available at that
moment.
 If all the cache lines are occupied, then one of the existing blocks will have to be
replaced.
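The 2-way example can be checked with a few lines of code; the line count and block numbers are the example's own values.

# Sketch: 2-way set associative mapping with 6 cache lines -> 3 sets.
K, NUM_LINES = 2, 6
NUM_SETS = NUM_LINES // K                        # = 3

def cache_set(block_address):
    return block_address % NUM_SETS              # set = block address mod number of sets

for block in (0, 4, 7, 9):
    print(block, "-> set", cache_set(block))     # 0 -> 0, 4 -> 1, 7 -> 1, 9 -> 0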
Need of Replacement Algorithm-
 Set associative mapping is a combination of direct mapping and fully associative
mapping.
 It uses fully associative mapping within each set.
 Thus, set associative mapping requires a replacement algorithm.
Division of Physical Address-
In set associative mapping, the physical address is divided into three fields: Tag, Set Number, and Block/Byte Offset.
Special Cases-
 If k = 1, then k-way set associative mapping becomes direct mapping i.e.

1-way Set Associative Mapping ≡ Direct Mapping

 If k = total number of lines in the cache, then k-way set associative mapping becomes fully
associative mapping.
Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This separation
provides large virtual memory for programmers when only small physical memory is
available.
Virtual memory is used to give programmers the illusion that they have a very large memory
even though the computer has a small main memory. It makes the task of programming easier
because the programmer no longer needs to worry about the amount of physical memory available.
Address mapping using pages:
The table implementation of the address mapping is simplified if the information in the
address space and the memory space is each divided into groups of fixed size.
The physical memory is broken down into groups of equal size called blocks, which may
range from 64 to 4096 words each.
The term page refers to groups of address space of the same size.
Consider a computer with an address space of 8K and a memory space of 4K.
If we split each into groups of 1K words we obtain eight pages and four blocks as
shown in the figure.
At any given time, up to four pages of address space may reside in main memory in
any one of the four blocks.



Associative memory page table:
The implementation of the page table is vital to the efficiency of the virtual memory technique, for
each memory reference must also include a reference to the page table. The fastest solution is a set
of dedicated registers to hold the page table but this method is impractical for large page tables
because of the expense. But keeping the page table in main memory could cause intolerable delays
because even only one memory access for the page table involves a slowdown of 100 percent and
large page tables can require more than one memory access. The solution is to augment the page
table with a special high-speed memory made up of associative registers, also known as a
translation lookaside buffer (TLB) or associative memory.
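A TLB can be sketched as a small dictionary consulted before the in-memory page table; the sizes and contents below are illustrative assumptions only.

# Sketch: address translation with a TLB (associative memory) in front of the page table.
tlb = {2: 1, 5: 0}                         # small and fast: holds a few recent page->block pairs
page_table = {0: 3, 2: 1, 5: 0, 6: 2}      # full table, assumed to live in main memory
PAGE_SIZE = 1024

def translate(virtual_address):
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page in tlb:                        # TLB hit: no extra memory access needed
        block = tlb[page]
    else:                                  # TLB miss: slow path through the page table
        block = page_table[page]
        tlb[page] = block                  # remember the translation for next time
    return block * PAGE_SIZE + offset

print(translate(6 * 1024 + 5))             # page 6 -> block 2 -> physical address 2053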
