Computer Organisation-Unit - II


UNIT-II Basic Processing Unit

Fundamental Concepts
 The processor fetches one instruction at a time and performs the operation specified.
 Instructions are fetched from successive memory locations until a branch or a jump instruction is
encountered.
 Processor keeps track of the address of the memory location containing the next instruction to be
fetched using Program Counter (PC).
 The Instruction Register (IR) holds the instruction that is currently being executed.

Executing an Instruction
 Fetch the contents of the memory location pointed to by the PC. The contents of this location are
loaded into the IR (fetch phase). IR ← [[PC]]
 Assuming that the memory is byte addressable, increment the contents of the PC by 4 (fetch
phase). PC ← [PC] + 4
 Carry out the actions specified by the instruction in the IR (execution phase).
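The fetch phase can be illustrated with a small sketch. The memory contents, word size, and values below are illustrative assumptions only, not part of any particular processor.

# Minimal sketch of the fetch phase (assumes byte-addressable memory, 4-byte words).
memory = {0: "Add (R3), R1", 4: "Move (R1), R2", 8: "Branch LOOP"}  # hypothetical contents

PC = 0                      # program counter
IR = None                   # instruction register

# Fetch phase: IR <- [[PC]], then PC <- [PC] + 4
IR = memory[PC]             # load the instruction pointed to by PC into IR
PC = PC + 4                 # advance PC to the next sequential instruction

print(IR, PC)               # -> Add (R3), R1  4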

Processor Organization

Figure 2.1 Single-bus organization of the data path inside a processor.



 ALU and all the registers are interconnected via a single common bus.
 The data and address lines of the external memory bus connected to the internal processor
bus via the memory data register, MDR, and the memory address register, MAR respectively.
 Register MDR has two inputs and two outputs.
 Data may be loaded into MDR either from the memory bus or from the internal
processor bus.
 The data stored in MDR may be placed on either bus.
 The input of MAR is connected to the internal bus, and its output is connected to the external
bus.
 The control lines of the memory bus are connected to the instruction decoder and control
logic.
 This unit is responsible for issuing the signals that control the operation of all the units
inside the processor and for interacting with the memory bus.
 The MUX selects either the output of register Y or a constant value 4 to be provided as input
A of the ALU.
 The constant 4 is used to increment the contents of the program counter.

Register Transfers

Figure 2.2 Register Transfer


 Instruction execution involves a sequence of steps in which data are transferred from one
register to another.
 For each register, two control signals are used: one to place the contents of that register on the
bus and one to load the data on the bus into the register (see Figure 2.3).
 When Riin is set to 1, the data on the bus are loaded into register Ri.
 Similarly, when Riout is set to 1, the contents of register Ri are placed on the bus.
 While Riout is equal to 0, the bus can be used for transferring data from other registers.

Example
 Suppose we wish to transfer the contents of register R1 to register R4. This can be
accomplished as follows.
 Enable the output of registers R1 by setting R1out to 1. This places the contents of R1 on the
processor bus.
 Enable the input of register R4 by setting R4in to 1. This loads data from the processor bus
into register R4.
 All operations and data transfers with in the processor take place within time periods
defined by the processor clock.
 The control signals that govern a particular transfer are asserted at the start of the clock
cycle.
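As a rough illustration of how the Riout and Riin signals gate a transfer over the single bus, the sketch below models the bus and the registers as plain Python variables; the register names and initial contents are assumptions made only for this example.

# Sketch of a single-bus register transfer R1 -> R4 driven by control signals.
registers = {"R1": 25, "R2": 7, "R3": 0, "R4": 0}   # hypothetical initial contents

def clock_cycle(out_signal, in_signal):
    """One clock cycle: the register whose 'out' signal is asserted drives the bus,
    and the register whose 'in' signal is asserted loads from the bus."""
    bus = registers[out_signal]          # e.g. R1out = 1 places [R1] on the bus
    registers[in_signal] = bus           # e.g. R4in = 1 loads the bus into R4

clock_cycle(out_signal="R1", in_signal="R4")   # R1out, R4in
print(registers["R4"])                          # -> 25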
[Figure: an edge-triggered D flip-flop clocked by the processor clock; its D input is gated from the bus by Riin and its Q output is gated onto the bus by Riout.]
Figure 2.3 Input and output gating for one register bit.
Performing an Arithmetic or Logic Operation
 The ALU is a combinational circuit that has no internal storage.
 The ALU gets its two operands from the MUX (input A) and the bus (input B). The result is
temporarily stored in register Z.
 What is the sequence of operations to add the contents of register R1 to those of R2 and store
the result in R3?
o R1out, Yin
o R2out, SelectY, Add, Zin
o Zout, R3in
 All other signals are inactive.
 In step 1, the output of register R1 and the input of register Y are enabled, causing the contents
of R1 to be transferred over the bus to Y.
 Step 2, the multiplexer’s select signal is set to Select Y, causing the multiplexer to gate the
contents of register Y to input A of the ALU.
 At the same time, the contents of register R2 are gated onto the bus and, hence, to input B.
 The function performed by the ALU depends on the signals applied to its control lines.
 In this case, the ADD line is set to 1, causing the output of the ALU to be the sum of the two
numbers at inputs A and B.
 This sum is loaded into register Z because its input control signal is activated.
 In step 3, the contents of register Z are transferred to the destination register R3. This last transfer
cannot be carried out during step 2, because only one register output can be connected to the bus
during any clock cycle.
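The three-step sequence above can be mimicked in a short sketch, with the bus carrying one value per step and Y and Z standing in for the ALU's input and output latches; the register contents are made-up example values.

# Sketch of adding R1 and R2 into R3 on the single-bus datapath (one bus transfer per step).
registers = {"R1": 10, "R2": 32, "R3": 0}
Y = Z = 0

# Step 1: R1out, Yin -- copy R1 over the bus into temporary register Y
bus = registers["R1"]; Y = bus

# Step 2: R2out, SelectY, Add, Zin -- ALU adds Y (input A) and the bus (input B)
bus = registers["R2"]; Z = Y + bus      # SelectY gates Y to input A; Add is selected

# Step 3: Zout, R3in -- result moves from Z over the bus into R3
bus = Z; registers["R3"] = bus

print(registers["R3"])    # -> 42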
Fetching a Word from Memory
 The processor has to specify the address of the memory location where this information is stored
and request a Read operation.
 This applies whether the information to be fetched represents an instruction in a program or an
operand specified by an instruction.
 The processor transfers the required address to the MAR, whose output is connected to the
address lines of the memory bus.

[Figure: register MDR sits between the memory-bus data lines and the internal processor bus; MDRinE and MDRoutE connect it to the memory bus, while MDRin and MDRout connect it to the internal bus.]
Figure 2.4 Connection and control signals for register MDR.


 At the same time, the processor uses the control lines of the memory bus to indicate that a Read
operation is needed.
 When the requested data are received from the memory they are stored in register MDR, from
where they can be transferred to other registers in the processor.



 The response time of each memory access varies (cache miss, memory-mapped I/O…).
 To accommodate this, the processor waits until it receives an indication that the requested
operation has been completed (Memory-Function-Completed, MFC).
 Move (R1), R2
1. MAR ← [R1]
2. Start a Read operation on the memory bus
3. Wait for the MFC response from the memory
4. Load MDR from the memory bus
5. R2 ← [MDR]
 The output of MAR is enabled all the time.
 Thus the contents of MAR are always available on the address lines of the memory bus.
 When a new address is loaded into MAR, it will appear on the memory bus at the beginning of
the next clock cycle (see Figure 2.5).
 A read control signal is activated at the same time MAR is loaded.
 This means a memory Read operation requires three steps, which can be described by the signals
being activated as follows (a sketch follows below):
1. R1out, MARin, Read
2. MDRinE, WMFC
3. MDRout, R2in
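The Read handshake can be sketched as below; the memory model and latency value are illustrative assumptions, and the busy-wait loop stands in for the processor idling until MFC is asserted.

# Sketch of Move (R1), R2: read the word whose address is in R1, wait for MFC.
memory = {0x1000: 99}              # hypothetical memory contents
registers = {"R1": 0x1000, "R2": 0}
MAR = MDR = 0

MAR = registers["R1"]              # Step 1: R1out, MARin, Read
mfc = False
cycles_until_mfc = 3               # assumed (variable) memory latency

while not mfc:                     # Step 2: WMFC -- wait for Memory-Function-Completed
    cycles_until_mfc -= 1
    mfc = (cycles_until_mfc == 0)

MDR = memory[MAR]                  # MDRinE: data from the memory bus into MDR
registers["R2"] = MDR              # Step 3: MDRout, R2in
print(registers["R2"])             # -> 99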
[Timing diagram showing, over steps 1–3: the clock, MARin, the address lines, Read, MR, MDRinE, the data lines, MFC, and MDRout.]
Figure 2.5 Timing of a memory Read operation.


Storing a word in Memory
• Writing a word into a memory location follows a similar procedure.
• The desired address is loaded into MAR.
• Then, the data to be written are loaded into MDR, and a write command is issued.
Example
• Executing the instruction Move R2, (R1) requires the following steps:
1. R1out, MARin
2. R2out, MDRin, Write
3. MDRoutE, WMFC
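A companion sketch for the Write case, under the same illustrative assumptions as the Read sketch above:

# Sketch of Move R2, (R1): write [R2] to the memory location whose address is in R1.
memory = {}
registers = {"R1": 0x2000, "R2": 77}

MAR = registers["R1"]              # Step 1: R1out, MARin
MDR = registers["R2"]              # Step 2: R2out, MDRin, Write
memory[MAR] = MDR                  # Step 3: MDRoutE, WMFC -- memory latches the data
print(memory[0x2000])              # -> 77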

Execution of a Complete Instruction


• Add (R3), R1
• Fetch the instruction
• Fetch the first operand (the contents of the memory location pointed to by R3)
• Perform the addition
• Load the result into R1

Step Action

1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 R3out, MARin, Read
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
7 Zout, R1in, End

Figure 2.6 Control sequence for execution of the instruction Add (R3), R1.
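The whole seven-step sequence of Figure 2.6 can be walked through in a short sketch; the instruction encoding, addresses, and memory contents are made-up values for illustration only.

# Sketch of the 7-step control sequence for Add (R3), R1.
memory = {0x100: "Add (R3), R1", 0x2000: 5}        # hypothetical code and data
reg = {"PC": 0x100, "R1": 37, "R3": 0x2000}
MAR = MDR = IR = Y = Z = 0

MAR = reg["PC"]; Z = reg["PC"] + 4                 # 1: PCout, MARin, Read, Select4, Add, Zin
reg["PC"] = Z; Y = Z; MDR = memory[MAR]            # 2: Zout, PCin, Yin, WMFC (fetch completes)
IR = MDR                                           # 3: MDRout, IRin
MAR = reg["R3"]; MDR = memory[MAR]                 # 4: R3out, MARin, Read
Y = reg["R1"]                                      # 5: R1out, Yin, WMFC
Z = Y + MDR                                        # 6: MDRout, SelectY, Add, Zin
reg["R1"] = Z                                      # 7: Zout, R1in, End

print(IR, reg["PC"], reg["R1"])                    # -> Add (R3), R1  260  42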



Figure 2.7 Single-bus organization of the data path inside a processor

Execution of Branch Instructions


 A branch instruction replaces the contents of PC with the branch target address, which is usually
obtained by adding an offset X given in the branch instruction.
 The offset X is usually the difference between the branch target address and the address
immediately following the branch instruction.
 Conditional branch: the PC is loaded with the branch target address only if the specified
condition (tested using the condition-code flags) is satisfied; otherwise the PC retains its
incremented value and execution continues sequentially.



Figure: 2.8 Control sequence for an unconditional branch instruction.

Multiple-Bus Organization

Figure 2.9: Three-bus organization of the data path.



Example: Add R4, R5, R6

Fig 2.10: Control sequence for the instruction Add R4, R5, R6 for the three-bus organization

Hardwired Control

 To execute instructions, the processor must have some means of generating the control signals
needed in the proper sequence.
 Two categories: hardwired control and microprogrammed control.
 A hardwired system can operate at high speed, but with little flexibility.

Control Unit Organization

Figure: 2.11 Control Unit Organisation



Detailed Control design

[Figure: the clock drives a control step counter (with Reset) whose outputs go to a step decoder producing T1, T2, …, Tn; the IR feeds an instruction decoder producing INS1 … INSm; an encoder combines these with external inputs, condition codes, and the Run/End signals to generate the control signals.]

Figure 2.12: Separation of the decoding and encoding functions

Generating Zin

 Zin = T1 + T6 • ADD + T4 • BR + …

Figure 2.13: Generation of the Zin control signal for the processor in Figure 2.1
Generating End

 End = T7 • ADD + T5 • BR + (T5 • N + T4 • N') • BRN + …  (N' denotes the complement of the condition-code flag N)


[Figure: End is produced by an OR of T7 • Add, T5 • Branch, and, for Branch<0, T5 • N and T4 • N'.]
Figure 2.14: Generation of the End control signal.
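The two logic expressions above can be read as plain Boolean functions of the step-counter outputs, the decoded instruction, and the condition codes. A sketch follows; it includes only the terms shown above, and the instruction mnemonics are used as simple labels.

# Sketch of hardwired control: Zin and End as Boolean functions of the current
# time step T, the decoded instruction, and the N condition-code flag.
def zin(T, instr):
    # Zin = T1 + T6.ADD + T4.BR + ...
    return T == 1 or (T == 6 and instr == "ADD") or (T == 4 and instr == "BR")

def end(T, instr, N):
    # End = T7.ADD + T5.BR + (T5.N + T4.N').BRN + ...
    return ((T == 7 and instr == "ADD") or
            (T == 5 and instr == "BR") or
            (instr == "BRN" and ((T == 5 and N) or (T == 4 and not N))))

print(zin(1, "ADD"), zin(6, "ADD"), end(4, "BRN", False))   # -> True True True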

A Complete Processor

[Figure: the processor contains an instruction unit, an integer unit, and a floating-point unit, together with an instruction cache and a data cache; a bus interface connects the processor to the system bus, which also serves the main memory and input/output.]

Figure 2.15: Block diagram of a complete processor.



Microprogrammed Control

 Control signals are generated by a program similar to machine language programs.


 A Control Word (CW) is a word whose individual bits represent the various control signals; the
sequence of CWs corresponding to one machine instruction is called a microroutine, and each CW
in that sequence is a microinstruction.
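A control word is simply one bit per control signal, and a microroutine is a list of such words. A tiny sketch follows; the signal names and their ordering are a reduced, illustrative subset chosen for the example.

# Sketch: a control word (CW) is one bit per control signal; a microroutine is a list of CWs.
SIGNALS = ["PCout", "MARin", "Read", "Select4", "Add", "Zin", "Zout", "PCin", "WMFC", "End"]

def control_word(*active):
    """Build a CW (tuple of bits) with the named signals set to 1."""
    return tuple(1 if s in active else 0 for s in SIGNALS)

fetch_microroutine = [
    control_word("PCout", "MARin", "Read", "Select4", "Add", "Zin"),  # microinstruction 1
    control_word("Zout", "PCin", "WMFC"),                             # microinstruction 2
]
print(fetch_microroutine[0])   # -> (1, 1, 1, 1, 1, 1, 0, 0, 0, 0)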

[Figure: a table with one row per microinstruction (1–7) and one bit column per control signal — PCin, PCout, MARin, Read, MDRout, IRin, Yin, Select, Add, Zin, Zout, R1out, R1in, R3out, WMFC, End — where each bit is 1 when that signal is active in the corresponding step of Figure 2.17.]

Figure 2.16: An example of microinstructions for the control sequence of Figure 2.17.

Step Action

1 PCout, MARin, Read, Select4, Add, Zin
2 Zout, PCin, Yin, WMFC
3 MDRout, IRin
4 R3out, MARin, Read
5 R1out, Yin, WMFC
6 MDRout, SelectY, Add, Zin
7 Zout, R1in, End

Figure 2.17: Control sequence for execution of the instruction Add (R3), R1.



[Figure: the IR drives a starting address generator; the clock increments the µPC, which addresses the control store; the control store output is the current CW.]

Figure 2.18: Basic organization of a microprogrammed control unit.

 The previous organization cannot handle the situation when the control unit is required to check
the status of the condition codes or external inputs to choose between alternative courses of
action.
 Use conditional branch microinstruction.



Figure 2.19 Organisation of the control unit to allow conditional branching in the microprogram

Microinstructions
 A straightforward way to structure microinstructions is to assign one bit position to each control
signal.
 However, this is very inefficient.
 The length can be reduced: most signals are not needed simultaneously, and many signals are
mutually exclusive.
 Signals that are mutually exclusive are placed in the same group, and each group is encoded in
binary so that only enough bits to identify one signal in the group are needed.
Microinstruction fields F1–F8:

F1 (4 bits): 0000 No transfer; 0001 PCout; 0010 MDRout; 0011 Zout; 0100 R0out; 0101 R1out; 0110 R2out; 0111 R3out; 1010 TEMPout; 1011 Offsetout
F2 (3 bits): 000 No transfer; 001 PCin; 010 IRin; 011 Zin; 100 R0in; 101 R1in; 110 R2in; 111 R3in
F3 (3 bits): 000 No transfer; 001 MARin; 010 MDRin; 011 TEMPin; 100 Yin
F4 (4 bits): 0000 Add; 0001 Sub; …; 1111 XOR (16 ALU functions)
F5 (2 bits): 00 No action; 01 Read; 10 Write
F6 (1 bit): 0 SelectY; 1 Select4
F7 (1 bit): 0 No action; 1 WMFC
F8 (1 bit): 0 Continue; 1 End

Figure 2.20: An example of a partial format for field-encoded microinstructions.
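Field encoding trades a slightly wider decoder for a much shorter control word. The sketch below decodes one field-encoded microinstruction using a subset of the F1 and F5 code tables of Figure 2.20; how the fields are packed into an integer here is an assumption of the sketch, not part of the figure.

# Sketch: decoding a field-encoded microinstruction (subset of Figure 2.20).
F1_CODES = {0b0000: None, 0b0001: "PCout", 0b0010: "MDRout", 0b0011: "Zout",
            0b0101: "R1out", 0b0111: "R3out"}                 # 4-bit field: what drives the bus
F5_CODES = {0b00: None, 0b01: "Read", 0b10: "Write"}          # 2-bit field: memory command

def decode(microword):
    """Assume the word packs F1 in bits 5..2 and F5 in bits 1..0 (illustrative layout)."""
    f1 = (microword >> 2) & 0b1111
    f5 = microword & 0b11
    return [s for s in (F1_CODES.get(f1), F5_CODES.get(f5)) if s]

# F1 = 0001 (PCout), F5 = 01 (Read) -> the active signals of the two groups
print(decode(0b000101))    # -> ['PCout', 'Read']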



Further Improvement

 Enumerate the patterns of required signals in all possible microinstructions. Each meaningful
combination of active control signals can then be assigned a distinct code.
 Vertical organization: highly encoded schemes that use compact codes to specify only a small
number of control functions in each microinstruction; they need a smaller control store but more
decoding and hence slower operation.
 Horizontal organization: minimally encoded schemes in which each microinstruction can control
many resources in parallel; they are useful when a higher operating speed is desired.

Micro program Sequencing

 If all microprograms require only straightforward sequential execution of microinstructions
except for branches, letting a µPC govern the sequencing would be efficient.
 However, two disadvantages:
o Having a separate microroutine for each machine instruction results in a large total number of
microinstructions and a large control store.
o Longer execution time because it takes more time to carry out the required branches.
 Example: Add src, Rdst
 Four addressing modes: register, autoincrement, autodecrement, and indexed (with indirect
forms).

Figure 2.22: Microroutine for the instruction Add (Rsrc)+, Rdst

Microinstructions with Next-Address Field

[Figure: the IR, external inputs, and condition codes feed decoding circuits that load the microinstruction address register (µAR); the µAR addresses the control store; each word read out contains a next-address field and control bits that pass through the microinstruction decoder to produce the control signals.]

Figure 2.23. Microinstruction-sequencing organization.



 The microprogram we discussed requires several branch microinstructions, which perform
no useful operation in the datapath.
 A powerful alternative approach is to include an address field as a part of every
microinstruction to indicate the location of the next microinstruction to be fetched.
 Pros: separate branch microinstructions are virtually eliminated; few limitations in
assigning addresses to microinstructions.
 Cons: additional bits for the address field (around 1/6)
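With a next-address field, the sequencer simply follows the address carried in each microinstruction instead of incrementing a µPC. A minimal sketch follows; the addresses and signal lists in the control store are invented for illustration.

# Sketch: microinstruction sequencing with an explicit next-address field.
# Each entry: address -> (control signals, next address); None ends the routine.
control_store = {
    0o000: (["PCout", "MARin", "Read", "Select4", "Add", "Zin"], 0o001),
    0o001: (["Zout", "PCin", "Yin", "WMFC"],                     0o002),
    0o002: (["MDRout", "IRin"],                                  None),   # would branch on the opcode
}

uAR = 0o000                           # microinstruction address register
while uAR is not None:
    signals, next_addr = control_store[uAR]
    print(oct(uAR), signals)          # "execute" the microinstruction
    uAR = next_addr                   # the next address comes from the microword itself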
Microinstruction fields F0–F10:

F0 (8 bits): address of the next microinstruction
F1 (3 bits): 000 No transfer; 001 PCout; 010 MDRout; 011 Zout; 100 Rsrcout; 101 Rdstout; 110 TEMPout
F2 (3 bits): 000 No transfer; 001 PCin; 010 IRin; 011 Zin; 100 Rsrcin; 101 Rdstin
F3 (3 bits): 000 No transfer; 001 MARin; 010 MDRin; 011 TEMPin; 100 Yin
F4 (4 bits): 0000 Add; 0001 Sub; …; 1111 XOR
F5 (2 bits): 00 No action; 01 Read; 10 Write
F6 (1 bit): 0 SelectY; 1 Select4
F7 (1 bit): 0 No action; 1 WMFC
F8 (1 bit): 0 NextAdrs; 1 InstDec
F9 (1 bit): 0 No action; 1 ORmode
F10 (1 bit): 0 No action; 1 ORindsrc

Figure 2.24. Format for microinstructions in the example of Section 7



Implementation of the Microroutine

[Figure: the control words at octal addresses 000–003, 121–122, and 170–173 of the control store, giving the F0–F10 bit settings for each microinstruction of the routine.]

Figure 2.25. Implementation of the microroutine of Figure 2.22 using a next-microinstruction address field. (See Figure 2.24 for encoded signals.)



Figure 2.26 Some details of the control-signal-generating circuitry.

Figure 2.27 control circuitry for bit-ORing



PREFETCHING MICROINSTRUCTIONS
 Drawback of microprogrammed control: Slower operating speed because of the time
it takes to fetch microinstructions from the control-store.
 Solution: Faster operation is achieved if the next microinstruction is pre-fetched while
the current one is being executed.
EMULATION
 The main function of microprogrammed control is to provide a means for simple, flexible, and
relatively inexpensive execution of machine instructions.
 Its flexibility in using a machine's resources allows diverse classes of instructions to
be implemented.
 Suppose we add to the instruction repertoire of a given computer M1 an entirely new set of
instructions that is in fact the instruction set of a different computer M2.
 Programs written in the machine language of M2 can then be run on computer M1, i.e. M1
emulates M2.
 Emulation allows us to replace obsolete equipment with more up-to-date machines.
 If the replacement computer fully emulates the original one, then no software changes need to
be made to run existing programs.
 Emulation is easiest when the machines involved have similar architectures.

Cache Memory
Cache memory bridges the speed mismatch between the processor and the main memory.
When cache hit occurs,
 The required word is present in the cache memory.
 The required word is delivered to the CPU from the cache memory.
When cache miss occurs,
 The required word is not present in the cache memory.
 The block containing the required word has to be brought into the cache from the main memory.
 This mapping is performed using cache mapping techniques.
Cache Mapping-
 Cache mapping defines how a block from the main memory is mapped to the cache
memory in case of a cache miss.
OR
 Cache mapping is a technique by which the contents of main memory are brought into
the cache memory.
The following diagram illustrates the mapping process-



Cache Mapping Techniques
Cache mapping is performed using following three different techniques-

1. Direct Mapping
2. Fully Associative Mapping
3. K-way Set Associative Mapping
1. Direct Mapping-
In direct mapping,
 A particular block of main memory can map only to a particular line of the cache.
 The line number of cache to which a particular block can map is given by-

Cache line number = (Main Memory Block Address) Modulo (Number of lines in Cache)

Example-
Consider a cache memory divided into ‘n’ lines.
 Then, block ‘j’ of main memory can map to line number (j mod n) only of the cache.
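A quick sketch of the direct-mapping rule; the cache size and block addresses are arbitrary example values.

# Sketch: direct mapping -- each main-memory block maps to exactly one cache line.
NUM_LINES = 8                                    # assumed cache size (in lines)

def cache_line(block_address):
    return block_address % NUM_LINES             # line = block address mod number of lines

for block in (3, 11, 19):                        # 11 and 19 collide with 3 on line 3
    print(block, "->", cache_line(block))        # 3 -> 3, 11 -> 3, 19 -> 3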



Need of Replacement Algorithm
In direct mapping,
 There is no need of any replacement algorithm.
 This is because a main memory block can map only to a particular line of the cache.
 Thus, the new incoming block will always replace the existing block (if any) in that
particular line.
Division of Physical Address
In direct mapping, the physical address is divided into three fields: Tag, Line Number, and Block/Byte Offset.

2. Fully Associative Mapping


In fully associative mapping,
 A block of main memory can map to any line of the cache that is freely available at
that moment.
 This makes fully associative mapping more flexible than direct mapping.
Example-
Consider the following scenario-



Here,
 All the lines of cache are freely available.
 Thus, any block of main memory can map to any line of the cache.
 Had all the cache lines been occupied, one of the existing blocks would have to be replaced.
Need of Replacement Algorithm-
In fully associative mapping,
 A replacement algorithm is required.
 The replacement algorithm selects the block to be replaced when all the cache lines are
occupied.
 Thus, replacement algorithms such as FCFS, LRU, etc. are employed (see the sketch below).
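A rough sketch of fully associative placement with LRU replacement; the cache size and the reference string are made-up example values.

from collections import OrderedDict

# Sketch: fully associative cache with LRU replacement -- any block may go in any line.
NUM_LINES = 4
cache = OrderedDict()                       # insertion/access order tracks recency

def access(block):
    if block in cache:                      # hit: mark the block most recently used
        cache.move_to_end(block)
        return "hit"
    if len(cache) == NUM_LINES:             # miss with all lines occupied:
        cache.popitem(last=False)           # evict the least recently used block
    cache[block] = True
    return "miss"

for b in (1, 2, 3, 4, 1, 5):                # block 5 evicts block 2 (the LRU), not block 1
    print(b, access(b))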
Division of Physical Address-
In fully associative mapping, the physical address is divided into two fields: Tag and Block/Byte Offset.

3. K-way Set Associative Mapping


In k-way set associative mapping,
 Cache lines are grouped into sets where each set contains k number of lines.
 A particular block of main memory can map to only one particular set of the cache.
 However, within that set, the memory block can map to any cache line that is freely
available.



 The set of the cache to which a particular block of the main memory can map is given
by-

Cache set number = (Main Memory Block Address) Modulo (Number of sets in Cache)

Example-
Consider the following example of 2-way set associative mapping-

Here,
 k = 2 suggests that each set contains two cache lines.
 Since the cache contains 6 lines, the number of sets in the cache = 6 / 2 = 3.
 Block ‘j’ of main memory can map to set number (j mod 3) only of the cache.
 Within that set, block ‘j’ can map to any cache line that is freely available at that
moment.
 If all the cache lines are occupied, then one of the existing blocks will have to be
replaced.
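The 2-way example can be checked with a few lines of code; the line count and block numbers are the example's own values.

# Sketch: 2-way set associative mapping with 6 cache lines -> 3 sets.
K, NUM_LINES = 2, 6
NUM_SETS = NUM_LINES // K                        # = 3

def cache_set(block_address):
    return block_address % NUM_SETS              # set = block address mod number of sets

for block in (0, 4, 7, 9):
    print(block, "-> set", cache_set(block))     # 0 -> 0, 4 -> 1, 7 -> 1, 9 -> 0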
Need of Replacement Algorithm-
 Set associative mapping is a combination of direct mapping and fully associative
mapping.
 It uses fully associative mapping within each set.
 Thus, set associative mapping requires a replacement algorithm.
Division of Physical Address-
In set associative mapping, the physical address is divided into three fields: Tag, Set Number, and Block/Byte Offset.
Special Cases-
 If k = 1, then k-way set associative mapping becomes direct mapping i.e.

1-way Set Associative Mapping ≡ Direct Mapping

 If k = total number of lines in the cache, then k-way set associative mapping becomes fully
associative mapping.
Virtual Memory
Virtual memory is the separation of logical memory from physical memory. This separation
provides large virtual memory for programmers when only small physical memory is
available.
Virtual memory is used to give programmers the illusion that they have a very large memory
even though the computer has a small main memory. It makes the task of programming easier
because the programmer no longer needs to worry about the amount of physical memory available.
Address mapping using pages:
The table implementation of the address mapping is simplified if the information in the
address space and the memory space is each divided into groups of fixed size.
The physical memory is broken down into groups of equal size called blocks, which may
range from 64 to 4096 words each.
The term page refers to groups of address space of the same size.
Consider a computer with an address space of 8K and a memory space of 4K.
If we split each into groups of 1K words we obtain eight pages and four blocks as
shown in the figure.
At any given time, up to four pages of address space may reside in main memory in
any one of the four blocks.



Associative memory page table:
The implementation of the page table is vital to the efficiency of the virtual memory technique, for
each memory reference must also include a reference to the page table. The fastest solution is a set
of dedicated registers to hold the page table but this method is impractical for large page tables
because of the expense. But keeping the page table in main memory could cause intolerable delays
because even only one memory access for the page table involves a slowdown of 100 percent and
large page tables can require more than one memory access. The solution is to augment the page
table with a special high-speed memory made up of associative registers, also known as a
translation lookaside buffer (TLB) or associative memory.
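A TLB can be sketched as a small dictionary consulted before the in-memory page table; the sizes and contents below are illustrative assumptions only.

# Sketch: address translation with a TLB (associative memory) in front of the page table.
tlb = {2: 1, 5: 0}                         # small and fast: holds a few recent page->block pairs
page_table = {0: 3, 2: 1, 5: 0, 6: 2}      # full table, assumed to live in main memory
PAGE_SIZE = 1024

def translate(virtual_address):
    page, offset = divmod(virtual_address, PAGE_SIZE)
    if page in tlb:                        # TLB hit: no extra memory access needed
        block = tlb[page]
    else:                                  # TLB miss: slow path through the page table
        block = page_table[page]
        tlb[page] = block                  # remember the translation for next time
    return block * PAGE_SIZE + offset

print(translate(6 * 1024 + 5))             # page 6 -> block 2 -> physical address 2053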
