Introduction To IT and Com Systems

Download as doc, pdf, or txt
Download as doc, pdf, or txt
You are on page 1of 122

INDEX

1. Digital Logic Fundamentals

1.1 Overview

1.2 Boolean Algebra

1.3 Combinational Logic Circuit

1.4 Multiplexers

11

1.5 Decoders & Encoders

12

1.6 Memory

14

1.7 Flip-flop

14

1.8 Sequential Circuit

16

1.9 Register

17

1.10

Counters

18

1.11

Digital circuit

19

1.12

Multiple Choice Quiz

20

2. Introduction to Computer Organisation

22

2.1 Functional Units

22

2.2 System Buses

23

2.3 Memory subsystem Organisation

24

2.4 I/O Subsystem Organisation & Interfacing

27

2.5 Multiple Choice Quiz

29

3. Programming the basic computer

31

3.1 Introduction

31

3.2 The level of programming languages

32

3.3 Translators

34

3.4 Multiple Choice Quiz

40

4. CPU Organisation

42
1

4.1 General Register Organisation

42

4.2 Stack Organisation

44

4.3 Register Organisation

45

4.4 Status & control Registers

46

4.5 Register transfer language

47

4.6 Register Transfer

48

4.7 Bus and Memory Transfer

48

4.8 Arithmetic Micro Operations

49

4.9 Logic Micro-Operation

50

4.10 Shift Micro Operation

51

4.11 Control Unit

52

4.12 Case Study: 8085: Microprocessor

56

4.13 Multiple Choice Quiz

58

5. Computer Arithmetic

60

5.1 Introduction

60

5.2 Number Systems

60

5.3 Addition and Substraction

63

5.4 Multiplication

63

5.5 Division

64

5.6 Floating Point

66

5.7 Multiple Choice Quiz

70

6. Input /Output Organisation

72

6.1 Peripheral Devices

72

6.2 Input/Output Interface

73

6.3 Asynchronous Data Transfer

74

6.4 Mode of Transfer

77
2

6.5 Priority Interrupt

86

6.6 Direct Memory Access

88

6.7 Input Output Processor

89

6.8 Serial Communicator

89

6.9 Multiple Choice Quiz

90

7. Memory Organisation

92

7.1 Memory Hierarchy

92

7.2 Main Memory

92

7.3 Auxiliary Memory

95

7.4 Associative Memory

97

7.5 Cache Memory

98

7.6 Virtual Memory Concept

103

8. Pipeline & Vector Processing

107

8.1 Parallel Processing

109

8.2 Pipelining

110

8.3 Arithmetic Pipeline

112

8.4 Instruction Pipeline

112

8.5 RISC Pipeline

114

8.6 Vector Processing

114

8.7 Array Processor

117

8.8 RISC vs. CISC

118

8.9 Multiple Choice Quiz

119

Chapter 1
Digital Logic Fundamentals

1.1 Overview

Gates, latches, memories and other logic components are used to design computer systems and
their subsystems

Good understanding of digital logic is necessary in order to learn the fundamentals of


computing systems organization and architecture

Two types of digital logic:

Combinatorial logic: output is a function of inputs

Sequential logic: output is a complex function of inputs, previous inputs and previous
outputs

Neither combinatorial logic nor sequential logic is better than the other. In practice, both are
used as appropriate in circuit design.

1.2 Boolean Algebra

A Boolean algebra value can be either true or false.

Digital logic uses 1 to represent true and 0 to represent false.

This presentation introduces the Boolean algebra basic functions and examines the fundamental
methods used to combine, manipulate and transform these functions.

AND
x

out = xy

out

amplitude
0

1
Y(t)

0 1

1
X(t)

1
out(t)= x(t) and y(t)

Output is one if every input has value of 1

More than two values can be and-ed together

For example xyz = 1 only if x=1, y=1 and z=1

OR
x

out = x+y

x
out

amplitude
0

0 1 1

Y(t)

X(t)

The number of inputs that are 1 matter.

More than two values can be xor-ed together.

General rule: the output is equal to 1 if an odd number of input values are 1 and 0 if an even
number of input values are 1.

out(t)= x(t) xor y(t)

XOR (Exclusive OR)


x

out =

x
out
y
amplitude

The number of inputs that are 1 matter.

More than two values can be xor-ed together.

1 1

Y(t)

X(t)
out(t)= x(t) xor y(t)

General rule: the output is equal to 1 if an odd number of input values are 1 and 0 if an even
number of input values are 1.

NOT
x

x'

x'

amplitude
0

This function operates on a single Boolean value.

Its output is the complement of its input.

An input of 1 produces an output of 0 and an input of 0 produces an output of 1

x(t)
x'(t)

XNOR
x

out =x xnor y

x
out
y

amplitude
0

0 0

Output value is the complemented output from an XOR function.

Y(t)

X(t)
out(t)= x(t) xnor y(t)

1.3 Combinatorial Logic Circuit

Combinatorial Logic Circuit that implements the function xy+yz DeMorgans Law

(ab)=a+b
(a+b)=ab

y
xy'+yz
z

Property for generating equivalent functions

Allows conversion of AND function to an equivalent OR function and vice-versa

It may allow the simplification of complex functions, that will allow a simpler design

It is useful in generating the complement of a function

Generating the complement of a function using DeMorgans law


7

(xy + yz) = (xy)(yz) = (x + y)(y + z) = xy + xz + yy + yz (because yy=0) => (xy+yz) =


xy + xz + yz

x'y'

x'z'

yz'

x'y + y'z + yz'

Karnaugh Map (K map)

Method for minimizing logic

Is used to represent the values of a function for different input values

The rows and columns of the K-map correspond to the possible values of the function's input

Each cell in the K-map represents a minterm (i.e. a three variables function has: xyz, xyz,
xyz, xyz, xyz, xyz, xyz and xyz)

Gray Code

Depends on the number of bits in its value

The 1-bit Gray code serves as basis for the 2-bit Gray code, the 2-bit Gray code is the basis for
3-bit Gray code, etc

Gray code sequences are cycles: 000 -> 001 -> 011 -> 010 -> 110 -> 111 -> 101 -> 100 -> 000

Adjacent values differ by only one bit

K-map Example

Lets consider (xy+yz) = xy + xz + yz

Group together the 1s in the map:

g1: xyz+xyz=xy(z+z)=xy

g2: xyz+xyz = yz(x+x)=yz

g3: xyz+xyz=xz(y+y)=xz

To derive a minimal expression we must select the fewest groups that cover all active minterms
(1s).
9

(xy + yz)= xy + yz
x

x'y'+y'z'+yz'

K-map for more complex function


wxyz + wxyz + wxyz + wxyz + wxyz + wxyz
The final minimized function is:
xz + wx + wxyz

10

1.4 Multiplexers

It is a selector: it chooses one of its data inputs and passes it to the output according to some
other selection inputs

Consider four binary data inputs as inputs of a multiplexer. Two select signals will determine
which of the four inputs will be passed to the output.

Figure (a) presents the internal structure of a four inputs multiplexer, b and c present the
multiplexer schematic representation with active high enable signal (b) and active low enable
signal (c)

11

Multiplexer schematic representation with active high enable signal

Multiplexer schematic representation with active low enable signal

Multiplexers can be cascaded to select from a large number of inputs 4 to 1 multiplexer made

of 2 to 1 multiplexers
1.5 Decoders and Encoders

A decoder accepts a binary value as input and decodes it.

It has n inputs and 2n outputs, numbered from 0 to 2n -1.

Each output represents one minterm of the inputs

The output corresponding to the value of the n inputs is activated

For example, a decoder with three inputs and eight outputs will activate output 6 whenever the
input values are 110.

12

Figure (a) shows a two to four decoder internal structure, (b) and (c) show its schematic
representation with active high enable signal and active low enable signal

For inputs S1S0 = 00, 01, 10 and 11 the outputs are 0, 1, 2 respectively 3 are active

As with the multiplexer, the output can tri-state all outputs

Decoders can have active high or active low enable signals.

Variants:

have active low outputs (the selected output has a value 0 and all the other outputs have
a value 1)
13

output all 0 when not enabled instead of state Z (the ones in the figure).

Encoders

The encoder is the exact opposite of the decoder.

It receives 2n inputs and outputs a n bit value corresponding to the one input that has a value of
1.

An 4-to-2 encoder and its schematic representations are presented in (a), (b) and (c).

1.6 Memory
The memory unit is an essential components in any digital computer since it is needed for strong
progress and data. Most general purpose computer would run more efficiently if they were equipped
with additional storage device beyond the capacity of main memory.The main memory unit that
communicates directly with CPU is called the MAIN MEMORY . Devices that provide backup
storage are called AUXILARY MEMORY. Most common auxiliary devices are magnetic disks and
tapes they are used for strong system programs, large data files and other backup information. Only
programs and data currently needed by the processor resides in main memory. All other informations
are stored in auxiliary memory and transferred to the main memory when needed.
The main memory hierarchy system consists of all storage devices employed in a computer system
from the slow but high capacity auxiliary memory to a relatively faster main memory, to an even
smaller and faster cache memory accessible to the high-speed processing logic. Memory Hierarchy is
to obtain the highest possible access speed while minimizing the total cost of the memory system.
1.7 Flip-Flop
Every digital circuit likely to have a combinational circuit , most systems encountered, its also
include storage elements which describe in terms of sequential circuits. The most common type of
sequential circuit is synchronous.The storage elements employed in clocked sequential circuits are
called flip-flops. A flip-flop is a binary cell capable of storing one bit of information.
Basic RS Flip-Flop (NAND)

14

Clocked D Flip-Flop

Clock Pulse Definition

Master-Slave Flip-Flop

T Flip-Flop

15

JK Flip-Flop

1.8 Sequential circuit


A sequential circuits is an interconnection of flip-flop and gates.The gates by themselves constitutes a
Combinational circuit ,but when included with the flip-flop, the overall circuit is classified as a
Sequential circuit .

State Tables and State Diagrams


Input
Present state

Next state
Y

x/z
Input/output
Present state

Y/z

y
Next
state/output
(a)

16

(b)

Sequential Circuit Example


Input

0
A
B
C
D

Present
state

D
B
C
A

/0
/1
/1
/0

C
A
D
B

1
/1
/0
/0
/1

(a)

0/1
1/1
A

0/0

1/0

1/0

0/0

B
0/1

D
1/1
x /z
(b)

1.9 Register
A register is a group of flip-flops with each flip-flop capable of storing one bit of information. An n-bit
register has a group of n flip-flop and is capable of storing any binary information of n-bits. In addition
to flip-flops a register may have combinational gates that perform certain data -processing tasks. The
flip-flops hold the binary information and the gates control when and how new information is
transferred into the register.

4-Bit Parallel Register

17

D and Q are both sets of lines, with the number of lines equal to the width of each register. There are
often multiple address ports, as well as additional data ports.

1.10 Counters
A register that goes through a predetermined of states upon the application of input pulses is called a
counter. The input pulses may be clock pulses or may originate from external sources. They may occur
at uniform interval of time or at a random. Counters are found in almost all equipment containing
digital logic.
Binary counter

Count-down counter: A binary counter with reverse count: Starts from 15 goes down.

In a count-down counter the least significant bit is complemented with every count pulse. Any
other bit is complemented if the previous bit goes from 0 to 1.

We can use the same counter design with negative edge flip-flops to make a count-down flipflop.
A BCD counter starts from 0 ends at 9.

State diagram of Binary counter.

18

1.11Digital Circuit

19

MULTIPLE CHOICE QUESTIONS


1. Which of the following is not a Boolean Algebra function?
a) AND
b) OR
c) XAND
d) XNOR
2. Which of the following is a De- Morgan Law ?
a) a+b = ab
b) (a-b) = ab
c) (ab)=a+b
d) a+b=a + b
3. Transistors are made up of?
a) Lithium
b) Neon
c) Silicon
d) Nickel
4. How many Type of memory are there?
a) 1
b) 2
c) 3
d) 4
5. Which of the following is not a counter?
a) Century counter
b) Ring Counter
c) Johnson Counter
20

d) Decade Counter
6. What is the main advantage of synchronous logic?
a) Complexity
b) Simplicity
c) Serial Port
d) Regularity
7. How many types of counters are there?
a) 6 type
b) 7 type
c) 8 type
d) 9 type
8. Asynchronous & Synchronous counter is created from?
a) Decoder
b) Encoder
c) JK flip Flop
d) D Flip flop
9. Which of the following is not a type of register?
a) Active Register
b) Data Register
c) Address Register
d) Vector Register
10 . Which of the following is a type of register?
a) Speed Register
b) Shift Register
c) User- accessible Register
21

d) Constant Register
ANSWERS
1. D, 2.C,3.B,4.A,5.B,6.B,7.C,8.A,9.A ,10.B

22

Chapter 2
Introduction to computer organization

2.1 Functional Units

CPU Central Processing Unit, which includes:


o ALU Arithmetic and Logic Unit
o Set of registers (including program counter keeping the address of the next
instruction to be performed)
o Control Unit decoding instructions and controlling internal data movement
among different CPU parts
o Interface Unit moves data between CPU and other hardware components

Memory
o The Main Memory Primary Storage or RAM Random Access Memory

Holds data or a piece of a program code (instructions)

Consists of a large number of cells, each individually addressable.

Cell is called byte and consists of 8 bits (binary elements holding 0 or


1)

Usually a word consists of 4 bytes.

In most cases volatile memory

o The Multilevel Cache - An intermediate stage between ultra-fast registers (inside


CPU) and much slower main memory.

Most actively used information in the main memory is duplicated in the


cache, which is faster, but of much lesser capacity.

2.2.System Buses
2.2.1 Memory Bus
23

The memory bus is the set of wires that is used to carry memory addresses and data to and from the
system RAM. The memory bus in most PCs is also shared with the processor bus, connecting the
system memory to the processor and the system chipset. The memory bus is part of the PC's hierarchy
of buses, which are high-speed communications channels used within the computer to transfer
information between its components.
The memory bus is made up of two parts:

Data bus: carries actual memory data within the PC

Address bus: selects the memory address that the data will come from or go to on a read or write

2.2.2 I/O Bus


The I/O bus facilitates communication between CPU and all the interface units.
I/O bus is of two types:

Isolated I/O
o Separate I/O read/write control lines in addition to memory read/write control lines
o Separate (isolated) memory and I/O address spaces
o Distinct input and output instructions

Memory-mapped I/O
o A single set of read/write control lines (no distinction between memory and I/O
transfer)
o Memory and I/O addresses share the common address space which reduces memory
address range available
o No specific input or output instruction, as a result the same memory reference
instructions can be used for I/O transfers
o

Considerable flexibility in handling I/O operations

2.3 Memory Subsystem Organization

24

2.3.1. Memory Hierarchy

Memory Hierarchy is to obtain the highest possible access speed while minimizing the total cost of the
memory system.

25

2.3.2. Memory Location: Address Map


Address space assignment to each memory chip. Example: 512 bytes RAM and 512 bytes ROM.
Address bus

Hexa

Component

10 9

8 7 6 5

4 3 2 1

0 0

0 x x x

x x x x

RAM 1

address
0000 - 007F

RAM 2

0080 - 00FF

0 0

1 x x x

x x x x

RAM 3

0100 - 017F

0 1

0 x x x

x x x x

RAM 4

0180 - 01FF

0 1

1 x x x

x x x x

ROM

0200 - 03FF

1 x

x x x x

x x x x

2.3.3.Types of Memory

Main Memory
o Consists of RAM and ROM chips
o Holds instructions and data for the programs in execution
o Volatile
o Fast access
o Small storage capacity
Chip select 1
Chip select 2
Read
Write
7-bit address

CS1
CS2
RD
WR
AD 7

128 x 8
RAM

8-bit data bus

Typical RAM chip


Chip select 1
Chip select 2

9-bit address

CS1
CS2

512 x 8
ROM

8-bit data bus

AD 9

Typical ROM chip


26

Auxiliary Memory
o Information organized on magnetic tapes
o Auxiliary memory holds programs and data for future use
o Non-volatile data remains even when the memory device is taken off-power
o Slower access rates
o Greater storage capacity

Associative Memory
o Accessed by the content of the data rather than by an address
o Also called Content Addressable Memory (CAM)
o Used in very high speed searching applications
o Much faster than RAM in virtually all search applications
o Higher costs

Argument register(A)

Key register (K)


Match Register
Input

Associative memory
array and logic

Read

m words

Write

n bits per word

CAM Hardware Organization

Cache Memory
o Cache is a fast small capacity memory that should hold those information which are
most likely to be accessed
o The property of Locality of Reference makes the Cache memory systems work

27

o Locality of Reference - The references to memory at any given time interval tend to be
confined within a localized areas. This area contains a set of information and the
membership changes gradually as time goes by.

Temporal Locality - The information which will be used in near future is likely
to be in use already

Spatial Locality - If a word is accessed, adjacent(near) words are likely accessed


soon

2.4 I/O Subsystem Organization and Interfacing


2.4.1. A Model of I/O Subsystem Organization

User Processes
User Processes
Directory
Management

Logical
I/O

Communication
Architecture

File System

Physical
Organization

Device I/O
Device I/O
Scheduling & Control
Scheduling & Control

Hardware
Hardware
Local peripheral
device

Communications
port

File
System

2.4.2. I/O Interface

Provides a method for transferring information between internal storage (such as memory and
CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices
28

Peripherals - Electromechanical Devices


CPU or Memory - Electronic Device

Data Transfer Rate


o Peripherals - Usually slower
o CPU or Memory - Usually faster than peripherals; Some Synchronization
mechanism may be needed
unit of Information
o Peripherals Byte, Block, etc.
o CPU or Memory Word

Port A
register

Bidirectional
data bus

Bus
buffers

Chip select

CPU

Register select
Register select
I/O read
I/O write

CS
RS1
RS0
RD

Timing
and
Control

Internal bus

Port B
register
Control
register
Status
register

WR

29

I/O data

I/O data

Control

Status

I/O
Device

End chapter quizzes

Q1.Control unit basically


a)
b)
c)
d)

Decode instruction
Moves the data
It is a logical unit
Used to save the data in register

Q2. CPU does not include


a)
b)
c)
d)

ALU
Registers
Control Unit
System Bus

Q3. Memory bus is made up of two parts


a)
b)
c)
d)

Data Bus and Address bus


Data bus and I/O bus
Main memory and address bus
Address bus and I/O bus

Q4. RAM and ROM chip


a)
b)
c)
d)

Volatile and Non volatile


Non volatile and Volatile
Both are volatile
Both are non volatile

Q5. I/O interface work for


a)
b)
c)
d)

Synchronization
Transferring information
Managing memory
Process management
30

Q6. Special high speed memory


a)
b)
c)
d)

Auxiliary memory
Main memory
Associative memory
Cache memory

Q7.Memory is assigned with help of


a)
b)
c)
d)

Memory location
Memory management
Memory address map
By assignment

Q8. CPU- Central Processing Unit has how many parts


a)
b)
c)
d)

5 parts
4 parts
3 parts
2 parts

Q9. Associative Memory is also known as


a)
b)
c)
d)

RAM
ROM
CAD
CAM

Q10. What is the basic function of I/O.


a)
b)
c)
d)

Connect to CPU
Connect to Memory
Connect to Output Devices
Connect to ROM

ANSWERS
1. c, 2.d, 3.a, 4.a, 5.b, 6.d, 7.c, 8.b, 9.d, 10. c

31

Chapter 3
Programming the Basic computer

3.1 Introduction:
A total computer system includes both hardware and software. Hardware consists of physical
components and all associated equipment. Software refers to the program that written to the computer.
It is possible to be familiar with various aspects of computer software without being concerned with of
how computer hardware operates. It is also possible to design the parts of software without a
knowledge of software capabilities. A program written by a user may be either dependent or
independent of the physical computer that runs his program. For this programming language is a
interface between computer and programmer. The set of instructions written in a language which is
being translated in machine compatible language.
3.2 The level of programming languages
3.2.1

Low level Programming Language:


2.2.1.1. First generation language.
2.2.1.2. Second generation language.
Low level programming language includes machine languages and Assembly
languages. Both are machine dependent like a programs written in low-level
languages for IBM PC cannot be run in a Macintosh computer. e.g. LEGO
programming. Low level languages are used to control the hardware for e.g. Driver
programs are written in low-level languages. And also Programs written in low-level
languages are usually small in size and efficient.

3.2.2.

High level Programming Language


1. Third Generation Language (3GL)
2. Fourth Generation Language (4GL)

3.2.3.

Comparison : High & Low Level languages

3.21.1 First generation languages

The first generation programming languages:


32

Machine languages
Defined by the hardware design of the computer
Machine dependent.
Instruction formed by binary digits (0or 1)Corresponding to an ON or OFF state of an
electric circuit .
Machine Instruction has two parts: op code and operand
1. Op code : Operation to be performed as a verb like increment, add , copy etc.
2. operand: operand acts like a noun on the data for example incrementing the value of A,
A is the operand. It can be a value or an address of memory location.(Remark: variable is
machine language is in form of address, i.e address = variable)

3.2.1.2. Second generation languages

Assembly languages: The user employs symbols (letter numerals , or special characters )
for the operation parts , the address part and other parts of instruction code. Each symbolic
instruction can be translated in to one binary coded instruction. This translation is done by
the special program called assembler. Because an assembler translates the symbols, this
type of symbolic program is referred to as an assembly language program and the language
is assembly language.
Instructions are 1:1 with machine instructions.
Symbolic instruction code (or mnemonics).
1. Using symbolic instruction code to replace binary digits.
2. Meaningful abbreviations that substitute the op code
3. e.g. JMP A means jump to address represented by A.
3.2.3 High-Level Programming Languages

Machine-independent

Third generation languages or above

More like human languages

Let programmers concentrate on the logic rather than machine


architecture.
These are the special languages developed to reflected the procedure used in the
solution of a problem rather than be concerned with computer hardware
behavior e.g. FORTAN.
3.2.3.1. Third generation language (3GL)

More human like than assembly language


Meet demand of efficiency and effectiveness
Examples : C, Fortran, COBOL, BASIC, PASCAL etc.
Each high-level language statement is translated () into many

machine instructions.
32.2.3.2. Fourth generation language (4GL)

Each 4GL instruction represents many 3GL statements.


Typical example: SQL (Structured Query Language).
33

e.g. SELECT * FROM student ORDER BY class, class_no;


will display all the records in table Student in ascending order of class and class_no.
If the same task is written in a 3GL, instructions have to be written:
Sort the records in ascending order of class and class_no;Get each record, test for
end of file, put each item on screen;Go back and repeat the operation until no more
records to process.

2.2.3. Comparing Low & High-level Languages (1)


1. Advantages of high-level language

1.
2.
3.

4.

Advantages of HIGH level


language
more effective

Reasons

easier to learn, write, read and


modify
programs are shorter in length

instructions are English-like format

portable (can use in different


computers with little
modification)

2.

one high-level language statement


take the place of many machine
instructions
machine-independent

Advantages of low-level language

Advantages of LOW level


language
more efficient

1.

designed to solve problem

Reasons

hardware features, like bus width, is


fully made use of
require smaller memory space to fewer low-level instructions are
run
needed to do the same task (no
redundant instructions produced by a
translator)
precise control of hardware
some low-level instructions are
designed specifically for controlling
the hardware at low level

2.

3.

Examples of High Level Languages

Designed to solve different problems.


Ranging from business to games .
Some common programming languages:

1.

PASCAL

2.

Designed for teaching structured programming.


C
34

3.

Java

4.

Includes some syntax of C.


Java programs can be called from HTML document or run directly.
Computer needs Java Virtual Machine to execute Java Programs e.g. Yahoo Snooker
.
COBOL

5.

Designed to operate computer at low-level using high level language.


More readable and better structured than Assembly language.
Similar language C++ is popular in game programming.

Designed for business applications


Wordy language
Readable for beginners
e.g. multiply hourly-rate by hours-worked giving gross-pay
i.e. gross-pay hourly-rate * hours-worked. But, can be clumsy for long
programs.
BASIC

Designed to provide students with an easy-to-learn language.


Visual Basic
A version of BASIC
Specialized for developing Windows applications
Belong to Rapid Application Development (RAD)
Programming can be done by drag-and-drop
Programmers can quickly build an application
Visual Basic.NET
Newest version of Visual Basic
Supports all Web-based features
6.

Script Languages
Interpreted and processed by a software (e.g. Browser)
e.g.
VBScript and JavaScript
have similar syntax as Visual Basic and Java respectively
Interpreted and processed by Web browser
Instructions are embedded in HTML documents
35

Increasing the function of HTML


Making Web pages more interactive
3.3 Translators
Translator consist of three attributes source program, translator and object program. Source program
is a text file used to for storing source code which must be translated into machine instructions before
execution .Translator is a software to convert source code into machine instructions. And it is used to
discover syntax errors in a program.Object program is a binary file e.g. EXE used for storing machine
instructions during execution.
Types of translators:
3.3.1. Assembler
3.3.2. Compiler
3.3.3. Interpreter
3.3.1. Assemblers
A software used to translate assembly language program into machine instructions for producing one
machine instruction for each assembly language statement which translate the whole source program
before execution and the store the results in an object program(During execution, need the object
program only).
3.3.2 Compilers
Compiler (e.g. Visual Basic Make EXE)
A software
Translate high-level language program into machine instructions
Translate the whole source program before execution
Storing the results in an object program
3.3.3. Interpreters (e.g. Visual Basic Run)

A software for translating ONE high-level language program statement at a time


send it to the computer for execution and then proceed to the NEXT statement.
No object program is produced when program is run, the following must be
present interpreter and source code.
Interpreted program runs slower than compiled ones because time is spent on
translation .
Like by most of the programmers as it is more convenient in testing.
3.3.4. Comparison of Translators

36

Assembler

Compiler

Interpreter

Translate low-level language


Translate high-level language into machine language
into machine language
A new file, called object program, is created
No new file is produced
Programs run faster

Programs run slower

Translate the whole source program

Translate and execute a


statement at a time.
Convenient for programmer

Less convenient for programmer. Every change in the


source program needs compilation.
Programmer can hide the source program
Translator is not necessary at run-time

The source program must be


present
Translator is necessary at
run-time

3.3.5. Instruction sets


An instruction set is the complete collection of instructions that are understood (executed) by a CPU
that is machine code, binary code and usually represented by assembly codes.
2.3.5.1. Elements of a Machine Instruction each instruction must contain the information required by
the CPU for execution.Elements of a Machine Instruction are:

Operation Code: Specifies the operation to be performed. The operation is specified by a


binary code, known as the operation code, or opcode.
Source operand reference: The operation may involve one or more source operands; that is,
operands that are the inputs for the operation.
Result operand reference: The operation may produce a result.
Next instruction reference: This tells the CPU where to fetch the next instruction after the
execution of this instruction is complete.

3.3.5.2. Source and Result operands:(values for that operands are of importance)

Main (or virtual) memory: The memory address must be supplied.


CPU register: CPU contains one or more registers that may be referenced by machine
instructions. If only one register exists, reference to it may be implicit. If more than one
register exists, then each register is assigned a unique number, and the instruction must contain
the number of the desired register.
I/O device: The instruction must specify the I/O module or device for the operation. If
memory-mapped I/O is used, this is just another memory address.
Each instruction is represented by a sequence of bits. The instruction is divided into fields,
corresponding to the constituent elements of the instruction.

3.3.5.3Instruction Types
Example: High level language statement: X = X + Y
37

If we assume a simple set of machine instructions, this operation could be accomplished with
three instructions: (assume X is stored in memory location 624, and Y in memory loc. 625.)
1. Load a register with the contents of memory location 624.
2. Add the contents of memory location 625 to the register,
3. Store the contents of the register in memory location 624.
As seen, a simple "C" (or BASIC) instruction, may require 3 machine instructions.
The instructions fall into one the following four categories:
Data processing: Arithmetic and logic instructions.
Data storage: Memory instructions.
Data movement: I/O instructions.
Control: Test and branch instructions.
What is the maximum number of addresses one might need in an instruction?

Virtually all arithmetic and logic operations are either unary (one operand) or binary (two
operands). The result of an operation must be stored, suggesting a third address. Finally, after
the completion of an instruction, the next instruction must be fetched, and its address is needed.
This line of resoning suggests that an instruction could be required to contain 4 address
references: two operands, one result, and one address of the next instruction. In practice, the
address of the next instruction is handled by the program counter; therefore most instructions
have one, two or three operand addresses.
Three-address instruction formats are not common, because they require a relatively long
instruction format to hold three address references.
Number of Addresses

Instruction Set Design


The fundamental design issues in designing an instruction set are:

38

Operation Repertoire: How many and which operations to provide, and how complex
operations should be.
Data Types: The various types of data upon which operations are performed.
Instruction Format: Instruction length (in bits), number of addresses, size of various fields,
and so on.
Registers: Number of CPU registers that can be referenced by instructions and their use.
Addressing: The mode or modes by which the address of an operand is specified.
These issues are highly interrelated and must be considered togeter in designing an instruction set.
3.3.6. Types of Operands
3.3.6.1. Most important general categories of data are:
Addresses
Numbers
Characters
Logical data.
Numbers:
o Integer or fixed point
o Floating point (real numbers)
o Decimal
Characters:
o International Reference Alphabet (IRA) which is referred to as ASCII in the USA,
o EBCDIC character set used in IBM 370 machines.
Logical Data:

Boolean data. 1 = true, 0 = false.

Types of Operations
A set of general types of operations is as follows:

Data transfer: Move, Store, Load (Fetch), Exchange, Clear (Reset), set, Push, Pop.
Arithmetic: Add, Subtract, Multiply, Divide, Absolute, Negate, Increment, Decrement.
Logical: AND, OR, NOT, XOR, Test, Compare, Shift, Rotate, Set control variables.
Conversion: Translate, Convert.
I/O: Input (Read), Output (Write), Start I/O, Test I/O.
Transfer of Control: Jump (Branch), Jump Conditional, Jump to Subroutine, Return, Execute,
Skip, Skip Conditional, Halt, Wait (Hold), No Operation.
System Control: Instructions that can only be executed while the processor is in a priviledged
state, or is executing a program in a special priviledged area of memory. Instructions reserved
for the use of operating system.
39

3.3.7. Instruction Formats

Includes opcode
Includes (implicit or explicit) operand(s)
Usually more than one instruction format in an instruction set

Instruction Length:

Affected by and affects:


Memory size
Memory organization
Bus structure
CPU complexity
CPU speed
Trade off between powerful instruction repertoire and saving space

Allocation of Bits:

Number of addressing modes


Number of operands
Register versus memory
Number of register sets
Address range
Address granularity

Variable-Length Instructions:

1.
2.

Due to varying number of operands,


Due to varying length of opcode in some CPU's.

40

MULTIPLE CHOICE QUESTIONS

1. The speed at which the CPU processes data to convert is measured in what :
a) Megahertz
b) Gigahertz
c) Nanoseconds
d) A and B
2. Machine cycles are measured in nanoseconds or picoseconds..
a) True
b) False
3. Which register example holds the address of the current instruction being processed?
a) Program counter
b)
Instruction register
c)
Control Unit
d)
Arithmetic Logic Unit
4. What is a machine cycle :
a) Data measured in megahertz
b) Data measured in gigahertz
c) A sequence of instruction perform to execute one program instruction
d) All of the Above
5. Most of the computers available today are known as
a). 3rd generation computers

b). 4 th generation computers


41

c).5th generation computers

d). 6th

generation computers

6. What difference does the 5th generation computers have from othe generation computers?
a). Technological advancement

b). Scientific code

c). Object Oriented Programming

d). All of the above

7.Which of the following language is translated in to pseudo code?


a). Assembly .language
c).PASCAL

b). FORTRAN

d). BASIC

8. Which translator translate high level into machine language


a). compiler and interpreter
b). compiler and assembler
c) assembler and interpreter

9.Which translator translate the whole source program before execution


a) compiler
b) assembler
c) interpreter
10. Instruction format does not include
a)
b)
c)
d)

Includes opcode
Includes (implicit or explicit) operand(s)
Usually more than one instruction format in an instruction set
nclude only operators

Answers
1.a , 2.b, 3.b, 4.a, 5.b 6.a , 7.a ,8.a, 9. a,10.d

42

Chapter 4
CPU ORGANIZATION
4.1 General Register Organization In a computational device memory access is the most time
consuming process. So, it would be more convenient & more efficient to store these intermediate
values in processor register.
Input

Cloc
k
R
1
R
2
R
3
R
4
R
5
R
6
R
Load 7
(7 lines)

SEL
A

3x8
decoder

SEL
D

MUX

MUX

A bus

B
bus

ALU

OPR

Output

The overall layout includes:


o Bus Systems
43

} SELB

MUX A & B - select one of 7 register or external data input by SELA


and SELB

BUS A and B : form the inputs to a common ALU

3 X 8 Decoder: select the register (by SELD) that receives the


information from ALU

o Control Word

Format 14 binary selection inputs with 3 bits for SELA, 3 for SELB, 3
for SELD, and 5 for OPR

Encoding of register selection fields

o ALU An operation is selected by the ALU operation selector (OPR). The result
of a microoperation is directed to a destination register selected by a decoder
(SELD).

44

Example of CPU Microprograms

4.2 Stack Organization


o Stack - A storage device that stores information in such a manner that the item
stored last is the first item retrieved. Hence, also called last-in first-out (LIFO)
list.
o Use of Stack
Temporary storage during program runtime
Not fixed in size: grows and shrinks using push and pop
operations
Not typed: elements of list not defined as one type
Subroutines (functions, procedures)

Linkage: saving a link back to the caller (address of calling


routine)

Data: return value, parameters, local variables

Other saved information

Managed at the assembly language level

Rules for compiler, AL programmer

o Stack Pointer (SP) - A register that holds

the address of the top item in the

stack. SP always points at the top item in the stack.


o Push - Operation to insert an item into the stack.
o Pop - Operation to retrieve an item from the stack.

45

4.3 Register Organization


One of the key difference among various computers is the difference in their register sets. Some
computers have very large while some has smaller sets. But on the whole, from a user's point of view
the register set can be classified under two basic categories
Programmer Visible Registers: These registers can be used by machine or assembly language
programmers to minimize the references to main memory.
Status Control and Registers: These registers can not be used by the programmers but are used to
control the CPU or the execution of a program.
4.3.1 Program Visible Register
These registers can be accessed using machine language. In general we encounter four types of
programmer visible registers.

General Purpose Registers

Data registers

Address Registers

Condition Codes Registers


The general purpose registers are used for various functions desired by the processor. A true general
purpose register can contain operand or can be used for calculation of address of operand for any
operation code of an instruction. But trends in today's machines show drift towards dedicated registers,
for example, some registers may be dedicated to floating point operations. In some machines there is a
distinct separation between data and address registers. The data registers are used only for storing
intermediate results or data. These data registers are not used for calculation of address of the operand.
An address register may be a general-purpose register, but some dedicated address registers are also
used in several machines. Examples of dedicated address registers can be:
Segment Pointer : Used to point out a segment of memory.
Index Register : These are used for index addressing scheme.
Stack Pointer (when programmer visible stack addressing is used.)
In the case of specialized register the number of bits needed for register specific details are reduced as
here we need to specify only few registers out of a set of registers. However, this specialization does
not allow much flexibility to the programmer.
Another issue related to the register set design is the number of general-purpose registers or data and
address registers to be provided in a microprocessor. The number of registers also effects the
instruction design as the number of registers determines the number of bits needed in an instruction to
specify a register reference. In general, it has been found that optimum number of registers in a CPU is
in the range 8 to 32. In case registers fall below the range then more memory reference per instruction
on an average will be needed, as some of the intermediate result then has to be stored in the memory.
46

On the other hand, if the number of registers goes above 32, then there is no appreciable reduction in
memory references.
4.4 Status and Control Registers
For control of various operations several registers are used. These registers can not be used in data
manipulations, however. the content of some of these registers can be used by the programmer. Some
of the control register for a von Neurnann machine can be the Program Counter (PC), Memory Address
Register (MAR) and Data Register .Almost all the CPUs, have status register, a part of which may be
programmer visible. A register which may be formed by condition codes is called condition code
register. Some of the commonly used flags or condition codes in such a register may be:
Sign flag : This indicates whether the sign of previous arithmetic operation was positive (0) or
negative (1).
Zero flag : This flag bit will be set if the result of last arithmetic operation was zero.
Carry flag : This flag is set, if a carry results from the addition of the highest order bits or a borrow is
taken on subtraction of highest order bit.
Equal flag : This bit flag will be set if a logic comparison operation finds out that both of its operands
are equal.
Overflow flag : This flag is used to indicate the condition of arithmetic overflow.
Interrupt : This flag is used for enabling or disabling interrupts.
Supervisor : This flag is used in certain computers to determine whether the CPU is,flag executing in
supervisor or user mode. In case the CPU is in supervisor modeit will be allowed to execute certain
privileged instructions. In most CPUs, on encountering a subroutine call or interrupt handling routine,
it is desired that the status information such as conditional codes and other register information be
stored and restored on initiation and end of these routines respectively. The register often known as
Program Status Word (PSW) contains condition code plus other status information. There can be
several other status and control registers such as interrupt vector register in the machines using
vectored interrupt, stack pointer if a stack is used to implement subroutine calls, etc.T4.2 Status and
Control Registers has status and control register design and also dependent on the Operating System
(0s) support. The functional understanding of OS helps in tailoring the register organization. In fact,
some control information is only of specific use to the operating system.
One major decision to be taken for designing status and control registers organization is :
how to allocate control information between registers and the memory. Generally first few hundred or
thousands words of memory are allocated for storing control information. It is the responsibility of the
designer to determine how much control information should be in registers and how much in memory.

47

4.5 Register Transfer Language

Rather than specifying a digital system in words, a specific notation is used, register transfer
language
For any function of the computer, the register transfer language can be used to describe the
(sequence of) microoperations
Register transfer language:
A symbolic language which is a convenient tool for describing the internal organization of
digital computers and can also be used to facilitate the design process of digital systems.
Registers are designated by capital letters, sometimes followed by numbers (e.g., A, R13, IR)
Often the names indicate function:
MAR - memory address register
PC
- program counter
IR
- instruction register
Registers and their contents can be viewed and represented in various ways
A register can be viewed as a single entity:

Registers may also be represented showing the bits of data they contain

4.6 Register Transfer


Computer registers are designated by capital letters to denote the function of registers eg. The register
that holds an address for the memory unit is usually called a memory address register and is designated
by the name MAR. Other designation for registers are PC (for programme counter), IR (for instruction
register), and R1 (for processor register). The individual flip flops is an n-bit register are3 numbered in
sequence from zero through n-1, starting from zero is the right most position and increasing the
number towards the left.
Block diagram of register

Copying the contents of one register to another is a register transfer


A register transfer is indicated as R2 R1
In this case the contents of register R2 are copied (loaded) into register R1
A simultaneous transfer of all bits from the source R1 to the destination register R2,
during one clock pulse.

48

Note that this is a non-destructive; i.e. the contents of R1 are not altered by copying
(loading) them to R2

4.7 Bus And Memory Transfer


Bus is a path(of a group of wires) over which information is transferred, from any of several sources to
any of several destinations.
From a register to bus: BUS R
Bus system for four registers

Transfer From Bus to a Destination Register

49

4.8 Arithmetic Micro operations

The basic arithmetic microoperations are


Addition
Subtraction
Increment
Decrement
The additional arithmetic microoperations are
Add with carry
Subtract with borrow
Transfer/Load

50

4.9 Logic Microoperation


Specify binary operations on the strings of bits in registers
Logic microoperations are bit-wise operations, i.e., they work on the individual bits
of data
and useful for bit manipulations on binary data even useful for making logical decisions based on
the bit value.

There are, in principle, 16 different logic functions that can be defined over two binary
input variables

However, most systems only implement four of these are AND (), OR (), XOR (),
Complement/NOT.
The others can be created from combination of these.
List of Logic Microoperations
- 16 different logic operations with 2 binary vars.

- n binary vars

functions

51

Truth tables for 16 functions of 2 variables and the corresponding 16 logic microoperations

Application of logic Microoperation

Logic microoperations can be used to manipulate individual bits or a portions of a word in a


register
Consider the data in a register A. In another register, B, is bit data that will be used to modify
the contents of A

Selective-set
Selective-complement
Selective-clear
Mask (Delete)
Clear
Insert
Compare

A
A
A
A
A
A
A

A+ B
A B
A B
A B
A B
(A B) + C
A B

4.10 Shift Micro operations


Logic micro operation specify binary operations for strings of bits stored in
register. These
operations consider each bit of register separately and treat them as binary variables. For example, the
exclusive OR micro operation with the content of two registers R1 and R2 is symbolized by the
statement.
P: R1 R1 R2
It specifies a logic micro operation to be executed on the individual bits of registers provided that
controls variable P = 1.
a).There are three types of shifts

Logical shift
52

Circular shift
Arithmetic shift
b).What differentiates them is the information that goes into the serial input

c).
A right shift operation

A left shift operation

4.11 Control Unit


The Control Unit can be thought of as the brain of the CPU itself. It controls based on the instructions
it decodes, how other parts of the CPU and in turn, rest of the computer systems should work in order
that the instruction gets executed in a correct manner. There are two types of control units, the first
type is called hardwired control unit. Hardwired control units are constructed using digital circuits and
once formed cannot be changed. The other type of control unit is micro programmed control unit. A
micro programmed control unit itself decodes and execute instructions by means of executing micro
programs.
4.11.1 Structure of control unit
Control Unit Block Diagram

53

4.11.2 Hardwired Control Unit


Figure 1 is a block diagram showing the internal organization of a hard-wired control unit for our
simple computer. Input to the controller consists of the 4-bit opcode of the instruction currently
contained in the Instruction Register and the negative flag from the accumulator. The controller's
output is a set of 16 control signals that go out to the various registers and to the memory of the
computer, in addition to a HLT signal that is activated whenever the leading bit of the op-code is one.
The controller is composed of the following functional units: A ring counter, an instruction decoder,
and a control matrix.
The ring counter provides a sequence of six consecutive active signals that cycle continuously.
Synchronized by the system clock, the ring counter first activates its T0 line, then its T1 line, and so
forth. After T5 is active, the sequence begins again with T0. Figure 2 shows how the ring counter
might be organized internally. The instruction decoder takes its four-bit input from the op-code field of
the instruction register and activates one and only one of its 8 output lines. Each line corresponds to
one of the instructions in the computer's instruction set. Figure 3 shows the internal organization of this
decoder. The most important part of the hard-wired controller is the control matrix. It receives input
from the ring counter and the instruction decoder and provides the proper sequence of control signals.

Figure 1.Block diagram of hard-wired control unit

Figure 2. The internal organization of Ring Counter

54

Figure 3.The internal organization of hard-wired Instruction Decoder

4.11.3 A Micro-programmed Control Unit


As we have seen, the controller causes instructions to be executed by issuing a specific set of control
signals at each beat of the system clock. Each set of control signals issued causes one basic operation
(micro-operation), such as a register transfer, to occur within the data path section of the computer. In
the case of a hard-wired control unit the control matrix is responsible for sending out the required
sequence of signals.
An alternative way of generating the control signals is that of micro-programmed control. In order to
understand this method it is convenient to think of sets of control signals that cause specific microoperations to occur as being "microinstructions" that could be stored in a memory. Each bit of a
microinstruction might correspond to one control signal. If the bit is set it means that the control signal
will be active; if cleared the signal will be inactive. Sequences of microinstructions could be stored in
an internal "control" memory. Execution of a machine language instruction could then be caused by
fetching the proper sequence of microinstructions from the control memory and sending them out to
the data path section of the computer. A sequence of microinstructions that implements an instruction
on the external computer is known as a micro-routine. The instruction set of the computer is thus
determined by the set of micro-routines, the "microprogram," stored in the controller's memory. The
control unit of a microprogram-controlled computer is essentially a computer within a computer.

55

Figure 4. A Microprogrammed Control Unit for the Simple Computer

Figure 4 is a block diagram of a micro-programmed control unit that may be used to implement the
instruction set of the computer we described above. The heart of the controller is the control 32 X 24
ROM memory in which upt to 32 24-bit long microinstructions can be stored. Each is composed of two
main fields: a 16-bit wide control signal field and an 8-bit wide next-address field. Each bit in the
control signal field corresponds to one of the control signals discussed above. The next-address field
contains bits that determine the address of the next microinstruction to be fetched from the control
ROM. We shall see the details of how these bits work shortly. Words selected from the control ROM
feed the microinstruction register. This 24-bit wide register is analogous to the outer machine's
instruction register. Specifically, the leading 16 bits (the control-signal field) of the microinstruction
register are connected to the control-signal lines that go to the various components of the external
machine's data path section.
Addresses provided to the control ROM come from a micro-counter register, which is analogous to the
external machine's program counter. The micro-counter, in turn, receives its input from a multiplexer
which selects from : (1) the output of an address ROM, (2) a current-address incrementer, or (3) the
address stored in the next-address field of the current microinstruction.
56

4.12 Case Study: 8085 Microprocessor

The 8085 is an 8-bit general purpose microprocessor that can address 64K Byte of memory. It
has 40 pins and uses +5V for power. It can run at a maximum frequency of 3 MHz.The pins on
the chip can be grouped into 6 groups:
Address Bus.

Data Bus.

Control and Status Signals.

Power supply and frequency.

Externally Initiated Signals.

Serial I/O ports.

The Address and Data Busses

The address bus has 8 signal lines A8 A15 which are unidirectional.The other 8 address bits
are multiplexed (time shared) with the 8 data bits.So, the bits AD0 AD7 are bi-directional and
serve as A0 A7 and D0 D7 at the same time.During the execution of the instruction, these
lines carry the address bits during the early part, then during the late parts of the execution, they
carry the 8 data bits.In order to separate the address from the data, we can use a latch to save
the value before the function of the bits changes.

The Control and Status Signals

There are 4 main control and status signals. These are:


ALE: Address Latch Enable. This signal is a pulse that become 1 when the AD0
AD7 lines have an address on them. It becomes 0 after that. This signal can be
used to enable a latch to save the address bits from the AD lines.

RD: Read. Active low.

WR: Write. Active low.

IO/M: This signal specifies whether the operation is a memory operation


(IO/M=0) or an I/O operation (IO/M=1).

S1 and S0 : Status signals to specify the kind of operation being performed


.Usually un-used in small systems.

Frequency Control Signals

There are 3 important pins in the frequency control group.X0 and X1 are the inputs from the
crystal or clock generating circuit.The frequency is internally divided by 2.So, to run the

57

microprocessor at 3 MHz, a clock running at 6 MHz should be connected to the X0 and X1


pins.

CLK (OUT): An output clock pin to drive the clock of the rest of the system.

Steps For Fetching an Instruction

Lets assume that we are trying to fetch the instruction at memory location 2005. That means
that the program counter is now set to that value.The following is the sequence of operations:
The program counter places the address value on the address bus and the
controller issues a RD signal.

The memorys address decoder gets the value and determines which memory
location is being accessed.

The value in the memory location is placed on the data bus.

The value on the data bus is read into the instruction decoder inside the
microprocessor.

After decoding the instruction, the control unit issues the proper control signals
to perform the operation.

MULTIPLE CHOICE QUESTIONS


1. Which register example hold the instruction itself?
A) Program Counter
B) Instruction Register
C) Control Unit
D) Arithmetic Logic unit
58

2. To accomplish a task a computer has to process data in three stages. They are:

3. The CPU is also known as:


A) The Brain
B) The Processor
C) The Central Processing Unit
D) All of the above
4. Main Memory holds data and instructions being processed by the computer and is this memory is
accessible by the CPU.
a) True
b) False
5. It is difficult to classify computer systems on the basis of their system performance ,as newer,
smaller computer systems outperform their larger models of yesteryear.
a) True
b) False
6.The two smaller units of the processor are CU and ALU
a). True
b). False

7.Which smaller unit of the CPU directs and coordinates all activities within it and determines the
sequence in which instructions are executed, sending instruction sequence to the other units.
a)

CU

b)

ALU

c)

PROCESSOR
59

d)

All of the above

8. The CPU primary responsibility is the movement of data and instructions from itself to main
memory and ALU and back. Arrange the CU execution of instruction in the correct order by placing
the execution instructions letter in the box provided:
a)

Send data to memory unit after processing

b)

Fetches data required by the instruction from memory

c)

Fetches the instruction from memory

d)

Decode the instruction

e)

Send data and instruction to the ALU for processing

9.

Which smaller CPU unit contains registers-temporary storage locations that hold a single
instruction or data item needed immediately and frequently

a)

CU

b)

ALU

c)

PROCESSOR

d)

All of the above

10.

Program counter (PC) and instruction register (IR) are examples of registers:

a)

True

b)

False

Answers
1 ,2 ,3 ,4 ,5 ,6 ,7 8, 9, 10

Chapter 5
Computer Arithmetic
5.1 INTRODUCTION
Because electronic logic deals with currents that are on or off, it has been found convenient to
represent quantities in binary form to perform arithmetic on a computer. Thus, instead of having ten
different digits, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, in binary arithmetic, there are only two different digits, 0
60

and 1, and when moving to the next column, instead of the digit representing a quantity that is ten
times as large, it only represents a quantity that is two times as large. Thus, the first few numbers are
written in binary as follows:
Decimal
Zero
0
One
1
Two
2
Three
3
Four
4
Five
5
Six
6
Seven
7
Eight
8
Nine
9
Ten
10
Eleven 11
Twelve 12

Binary
0
1
10
11
100
101
110
111
1000
1001
1010
1011
1100

The addition and multiplication tables for binary arithmetic are very small, and this makes it possible
to use logic circuits to build binary adders.
+|0 1
--------0|0 1
1 | 1 10

*|0 1
--------0|0 0
1|0 1

Thus, from the table above, when two binary digits, A and B are added, the carry bit is simply (A AND
B), while the last digit of the sum is more complicated; ((A AND NOT B) OR ((NOT A) AND B)) is
one way to express it.
5.2 Number Systems

ALU does calculations with binary numbers


Decimal number system
Uses 10 digits (0,1,2,3,4,5,6,7,8,9)
In
decimal
system,
a
number
84,
e.g.,
means
84 = (8x10) + 3
4728 = (4x1000)+(7x100)+(2x10)+8
Base or radix of 10: each digit in the number is multiplied by 10 raised to a power
corresponding to that digits position
E.g. 83 = (8x101)+ (3x100)
4728 = (4x103)+(7x102)+(2x101)+(8x100)

Decimal number system

Fractional values, e.g.


61

472.83=(4x102)+(7x101)+(2x100)+(8x10-1)+(3x10-2)
In
general,
for
the
decimal
X
=
{
x2x1x0.x-1x-2x-3

representation

of
}

X = i xi10i
Binary Number System

Uses only two digits, 0 and 1


It is base or radix of 2
Each digit has a value depending on its position:
102 = (1x21)+(0x20) = 210
112 = (1x21)+(1x20) = 310
1002 = (1x22)+ (0x21)+(0x20) = 410
1001.1012=(1x23)+(0x22)+(0x21)+(1x20)+(1x2-1)+(0x2-2)+(1x2-3)= 9.62510
Decimal to Binary conversion

Integer and fractional parts are handled separately,


Integer part is handled by repeating division by 2
Factional part is handled by repeating multiplication by 2
E.g. convert decimal 11.81 to binary
Integer part 11
Factional part .81
Decimal to Binary conversion, e.g. //

e.g. 11.81 to 1011.11001 (approx)


11/2 = 5 remainder 1
5/2 = 2 remainder 1
2/2 = 1 remainder 0
1/2 = 0 remainder 1
Binary number 1011
.81x2 = 1.62 integral part 1
.62x2 = 1.24 integral part 1
.24x2 = 0.48 integral part 0
.48x2 = 0.96 integral part 0
.96x2 = 1.92 integral part 1
Binary number .11001 (approximate)
Hexadecimal Notation:

command ground between computer and Human


Use 16 digits, (0,1,3,9,A,B,C,D,E,F)
1A16=(116x161)+(A16x16o)
= (110 x 161)+(1010 x 160)=2610
Convert group of four binary digits to/from one hexadecimal digit,
0000=0; 0001=1; 0010=2; 0011=3; 0100=4; 0101=5; 0110=6; 0111=7; 1000=8;
1001=9; 1010=A; 1011=B; 1100=C; 1101=D; 1110=E; 1111=F;
62

e.g.
1101 1110 0001. 1110 1101 = DE1.DE

Integer Representation (storage)

Only have 0 & 1 to represent everything


Positive numbers stored in binary
e.g. 41=00101001
No minus sign
No period
How to represent negative number
Sign-Magnitude
Twos compliment
Sign-Magnitude

Left most bit is sign bit


0 means positive
1 means negative
+18 = 00010010
-18 = 10010010
Problems
Need to consider both sign and magnitude in arithmetic
Two representations of zero (+0 and -0)

Twos Compliment (representation)


+3 = 00000011
+2 = 00000010
+1 = 00000001
+0 = 00000000
-1 = 11111111
-2 = 11111110
-3 = 11111101
Benefits

One representation of zero


Arithmetic works easily (see later)
Negating is fairly easy (2s compliment operation)
3 = 00000011
Boolean complement gives 11111100
Add 1 to LSB
11111101

5.3 Addition and Subtraction

Normal binary addition


0011
0101

1100
63

5.4

+0100
+0100
+1111
---------------------------0111
1001 = overflow 11011
Monitor sign bit for overflow (sign bit change as adding two positive numbers or two
negative numbers.)
Subtraction: Take twos compliment of subtrahend then add to minuend
i.e. a - b = a + (-b)
So we only need addition and complement circuits

Multiplication
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Multiplication Example

(unsigned numbers e.g.)


1011 Multiplicand (11 dec)
x 1101 Multiplier (13 dec)
1011 Partial products
0000 Note: if multiplier bit is 1 copy
1011 multiplicand (place value)
1011 otherwise zero
10001111 Product (143 dec)
Note: need double length result

5.4.1 Booths Algorithm for multiplication

64

Example of Booths Algorithm

5.5 Division

More complex than multiplication


However, can utilize most of the same hardware.
Based on long division

Division of Unsigned Binary Integers

65

Divisor

00001101
1011 10010011
1011

Partial
Remainders

Quotient
Dividend

001110
1011
001111
1011
100

Remainder

Flowchart for Unsigned Binary division

Real Numbers

Numbers with fractions


Could be done in pure binary
1001.1010 = 24 + 20 +2-1 + 2-3 =9.625
66

Where is the binary point?


Fixed?
Very limited
Moving?
How do you show where it is?
5.6 Floating Point

+/- .significand x 2exponent


Point is actually fixed between sign bit and body of mantissa
Exponent indicates place value (point position)

Floating Point Examples

Signs for Floating Point

Exponent is in excess or biased notation

67

e.g. Excess (bias) 127 means


8 bit exponent field
Pure value range 0-255
Subtract 127 to get correct value
Range -127 to +128

The relative magnitudes (order) of the numbers do not change. Can be treated as integers for
comparison.
5.6.1 Normalization

FP numbers are usually normalized


i.e. exponent is adjusted so that leading bit (MSB) of mantissa is 1
Since it is always 1 there is no need to store it
(c.f. Scientific notation where numbers are normalized to give a single digit before the
decimal point
e.g. 3.123 x 103)
FP Ranges

For a 32 bit number


8 bit exponent
+/- 2256 1.5 x 1077
Accuracy
The effect of changing lsb of mantissa
23 bit mantissa 2-23 1.2 x 10-7
About 6 decimal places
Expressible Numbers

IEEE 754

Standard for floating point storage


68

32 and 64 bit standards


8 and 11 bit exponent respectively
Extended formats (both mantissa and exponent) for intermediate results
Representation: sign, exponent, faction
0: 0, 0, 0
-0: 1, 0, 0
Plus infinity: 0, all 1s, 0
Minus infinity: 1, all 1s, 0
NaN; 0 or 1, all 1s, =! 0

FP Arithmetic +/ Check for zeros


Align significands (adjusting exponents)
Add or subtract significands
Normalize result
FP Arithmetic x/

Check for zero


Add/subtract exponents
Multiply/divide significands (watch sign)
Normalize
Round
All intermediate results should be in double length storage

Floating Point Multiplication

69

Floating Point Division

70

End chapter quizzes


Q1. The Binary Number system uses the radix of
a) 2
b) 3
c) 1
d) 0
Q2. (41) is equivalent to
a)
b)
c)
d)

(101001)2
(.10111)2
(101010)2
(156)2

Q3. (0.1011) is equivalent to


a)
b)
c)
d)

(41)10
(41)8
(0.6785)10
(0.6875)10

Q4. What is the result of adding the following two positive binary bit strings?
101101.101
+101100.0010
a).1000001.1110
b).1000001.1010
c).1000001.1000
d).1000001.1100

Q5.What name is used to describe the bit of lowest magnitude within a


byte or bit string?
a). radix point
71

b). low order bit


c). high order bit
d). op code

Q6. What is the best way to write the value '7564' and make it clear to
the reader that the number should be interpreted as a hexadecimal
value?
a). 0x7654
b). 7654B
c). 7654H
d). \7654
Q7.What is the numeric range of an eight bit unsigned binary number?
a). 0..7
b). 1..8
c). 0..255
d). 1..256
Q8. What coding format encodes a real number as a mantissa multiplied
by a power (exponent) of two?
a). Binary
b). excess notation
c). floating point
d). two's complement
Q9. Which of the following does not result from floating point math
operations?
a). Underflow
b). Overflow
c). Truncation
d). Two's complement
Q10. How many bits of information can each memory cell in a computer chip hold?
a). 0 bit
b). 1 bit
c). 8 bits

72

Answers
1.a ,2. a ,3.c, 4.d, 5.a, 6.a, 7.a, 8.a, 9.d, 10.c

Chapter 6
Input-Output Organization
INTRODUCTION
In computing, input/output, or I/O, refers to the communication between an information processing
system (such as a computer), and the outside world possibly a human, or another information
processing system. Inputs are the signals or data received by the system, and outputs are the signals or
data sent from it. The term can also be used as part of an action; to "perform I/O" is to perform an input
or output operation. I/O devices are used by a person (or other system) to communicate with a
computer. For instance, keyboards and mouses are considered input devices of a computer, while
monitors and printers are considered output devices of a computer. Devices for communication
between computers, such as modems and network cards, typically serve for both input and output.
Note that the designation of a device as either input or output depends on the perspective. Mouses and
keyboards take as input physical movement that the human user outputs and convert it into signals that
a computer can understand. The output from these devices is input for the computer. Similarly, printers
and monitors take as input signals that a computer outputs. They then convert these signals into
representations that human users can see or read. (For a human user the process of reading or seeing
these representations is receiving input.)
In computer architecture, the combination of the CPU and main memory (i.e. memory that the CPU
can read and write to directly, with individual instructions) is considered the brain of a computer, and
from that point of view any transfer of information from or to that combination, for example to or from
a disk drive, is considered I/O. The CPU and its supporting circuitry provide memory-mapped I/O that
is used in low-level computer programming in the implementation of device drivers.
INPUT-OUTPUT ORGANIZATION

Peripheral Devices
Input-Output Interface

Asynchronous Data Transfer

Modes of Transfer

Priority Interrupt

Direct Memory Access

Input-Output Processor
73

Serial Communication

6.1 PERIPHERAL DEVICES


6.1.1 Keyboard
A keyboard is a human interface device which is represented as a layout of buttons. Each button, or
key, can be used to either input a linguistic character to a computer, or to call upon a particular function
of the computer. Traditional keyboards use spring-based buttons, though newer variations employ
virtual keys, or even projected keyboards.
Examples of types of keyboards include:

Computer keyboard
Keyer

Chorded keyboard

LPFK
6.1.2. Pointing devices

A pointing device is any human interface device that allows a user to input spatial data to a
computer. In the case of mice and touch screens, this is usually achieved by detecting movement across
a physical surface. Analog devices, such as 3D mice, joysticks, or pointing sticks, function by reporting
their angle of deflection. Movements of the pointing device are echoed on the screen by movements of
the cursor, creating a simple, intuitive way to navigate a computer's GUI.
6.1.3. High-degree of freedom input devices
Some devices allow many continuous degrees of freedom as input. These can be used as pointing
devices, but are generally used in ways that don't involve pointing to a location in space, such as the
control of a camera angle while in 3D applications. These kinds of devices are typically used in
CAVEs, where input that registers 6DOF is
6.1.4. Imaging and Video input devices
Video input devices are used to digitize images or video from the outside world into the computer. The
information can be stored in a multitude of formats depending on the user's requirement.

Webcam
Image scanner

Fingerprint scanner

Barcode reader

3D scanner

Laser rangefinder
74

6.1.5. Medical Imaging


o
o

Computed tomography
Magnetic resonance imaging

Positron emission tomography

Medical ultrasonography

6.2 Output Devices


An output device is any piece of computer hardware equipment used to communicate the results
of data processing carried out by an information processing system (such as a computer) to the
outside world. In computing, input/output, or I/O, refers to the communication between an
information processing system (such as a computer), and the outside world. Inputs are the signals
or data sent to the system, and outputs are the signals or data sent by the system to the outside.
The most common input devices used by the computer are the keyboard and mouse. The keyboard
allows the entry of textual information while the mouse allows the selection of a point on the
screen by moving a screen cursor to the point and pressing a mouse button. The most common
outputs are monitors and speakers
o Card Puncher, Paper Tape Puncher
o CRT
o Printer (Impact, Ink Jet, Laser, Dot Matrix) Plotter
o Analog
o Voice
6.3 INPUT/OUTPUT INTERFACE

Provides a method for transferring information between internal storage (such as memory and
CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices

Peripherals - Electromechanical Devices

CPU or Memory - Electronic Device

Data Transfer Rate

Peripherals - Usually slower

CPU or Memory - Usually faster than peripherals

Some kinds of Synchronization mechanism may be needed


75

Unit of Information

Peripherals Byte, Block,

CPU or Memory Word

Data representations may differ

6.3.1 BUS AND INTERFACE MODULES

Each peripheral has an interface module associated with it Interface.


- Decodes the device address (device code)
- Decodes the commands (operation)
- Provides signals for the peripheral controller
- Synchronizes the data flow and supervises
the transfer rate between peripheral and CPU or Memory
Typical I/O instruction
Op.
code

Device
address

Function code

Commands
6.3.2. CONNECTION OF I/O BUS
6.3.2.1 Connection of I/O Bus to CPU

76

6.3.2.2.Connection of I/O to One interface

6.3.4. I/O BUS AND MEMORY BUS


Functions of Buses
MEMORY BUS is for information transfers between CPU and the MM
* I/O BUS is for information transfers between CPU and I/O devices through their I/O interface
* Many computers use a common single bus system for both memory and I/O interface units
- Use one common bus but separate control lines for each function
- Use one common bus with common control lines for both functions
* Some computer systems use two separate buses, one to communicate with memory and the other
with I/O interfaces
- Communication between CPU and all interface units is via a common I/O Bus
- An interface connected to a peripheral device may have a number of data registers , a control
register, and a status register
- A command is passed to the peripheral by sending to the appropriate interface register

77

- Function code and sense lines are not needed (Transfer of data, control, and status information is
always via the common I/O Bus)

6.3.4. ISOLATED vs MEMORY MAPPED I/O


Isolated I/O
Separate I/O read/write control lines in addition to memory read/write control lines.
- Separate (isolated) memory and I/O address spaces.
- Distinct input and output instructions.
Memory-mapped I/O
A single set of read/write control lines(no distinction between memory and I/O transfer)
- Memory and I/O addresses share the common address space
-> reduces memory address range available
- No specific input or output instruction
-> The same memory reference instructions can be used for I/O transfers
-

Considerable flexibility in handling I/O operations


6.3.5. I/O INTERFACE

6.3.6. Programmable Interface

78

- Information in each port can be assigned a meaning depending on the mode of operation of the I/O
device.
Port A = Data; Port B = Command; Port C = Status
- CPU initializes(loads) each port by transferring a byte to the Control Register .
Allows CPU can define the mode of operation of each port.
Programmable Port: By changing the bits in the control register, it is possible to
the interface characteristics.

change

6.4 ASYNCHRONOUS DATA TRANSFER


6.4.1. Synchronous and Asynchronous Operations
Synchronous - All devices derive the timing information from common clock line.
Asynchronous - No common clock.
6.4.2 Asynchronous Data Transfer
Asynchronous data transfer between two independent units requires that control signals be transmitted
between the communicating units to indicate the time at which data is being transmitted.
Two Asynchronous Data Transfer Methods
Strobe pulse : A strobe pulse is supplied by one unit to indicate the other unit when the transfer has to
occur.
Handshaking: A control signal is accompanied with each data being transmitted to indicate the
presence of data .The receiving unit responds with another control signal to acknowledge receipt of
the data.
6.4.2.1 STROBE CONTROL
* Employs a single control line to time each transfer
* The strobe may be activated by either the source or the destination unit

79

Source-Initiated Strobe for Data Transfer


Transfer

Destination-Initiated Strobe for Data

6.4.2.2. HANDSHAKING
Strobe Methods
1. Source-Initiated
The source unit that initiates the transfer has no way of knowing whether the destination unit
has actually received data
2. Destination-Initiated
The destination unit that initiates the transfer no way of knowing whether the source has actually
placed the data on the bus.To solve this problem, the HANDSHAKE method introduces a second
control signal to provide a Reply to the unit that initiates the transfer.
SOURCE-INITIATED TRANSFER USING HANDSHAKE

* Allows arbitrary delays from one state to the next


* Permits each unit to respond at its own data transfer rate
* The rate of transfer is determined by the slower unit
80

DESTINATION-INITIATED TRANSFER USING HANDSHAKE

6.4. ASYNCHRONOUS SERIAL TRANSFER


Four Different Types of Transfer

Asynchronous serial transfer

Synchronous serial transfer

Asynchronous parallel transfer

Synchronous parallel transfer

Asynchronous Serial Transfer


- Employs special bits which are inserted at both ends of the character code .
- Each character consists of three parts; Start bit; Data bits; Stop bits.

A character can be detected by the receiver from the knowledge of 4 rules;


When data are not being sent, the line is kept in the 1-state (idle state)
- The initiation of a character transmission is detected by a Start Bit , which is always a 0.
- The character bits always follow the Start Bit.
- After the last character , a Stop Bit is detected when the line returns to the 1-state for at least 1 bit
time.
81

The receiver knows in advance the transfer rate of the


expect.
UNIVERSAL
- UART

ASYNCHRONOUS

bits and the number of information bits to

RECEIVER-TRANSMITTER

Transmitter Register
- Accepts a data byte(from CPU) through the data bus
- Transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- Complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits.
Define baud rate, no. of bits in each character, whether to generate and check parity, and no.
bits.
FIRST-IN-FIRST-OUT(FIFO) BUFFER
* Input data and output data at two different rates
* Output data are always in the same order in which the data entered the buffer.
* Useful in some applications when data is transferred asynchronously
82

of stop

4 x 4 FIFO Buffer (4 4-bit registers Ri),


4

Control

Registers(flip-flops

Fi,

associated

with

MODES OF TRANSFER - PROGRAM-CONTROLLED I/O 3 different Data Transfer Modes between the central computer (CPU or Memory) and peripherals;
1. Program-Controlled I/O
2. Interrupt-Initiated I/O
3. Direct Memory Access (DMA)
Program-Controlled I/O(Input Dev to CPU)

83

each

MODES OF TRANSFER - INTERRUPT INITIATED I/O & DMA


- Polling takes valuable CPU time.
- Open communication only when some data has to be passed -> Interrupt.
- I/O interface, instead of the CPU, monitors the I/O device.
- When the interface determines that the I/O device is ready for data transfer, it generates an Interrupt
Request to the CPU.
- Upon detecting an interrupt, CPU stops momentarily the task it is doing, branches to the service
routine to process the data transfer, and then returns to the task it was performing.

DMA (Direct Memory Access)


- Large blocks of data transferred at a high speed to or from high speed devices, magnetic drums, disks,
tapes, etc.
- DMA controller :
the I/O device.

Interface that provides I/O transfer of data directly to and from the memory and

- CPU initializes the DMA controller by sending a memory address and the number of words to be
transferred.
- Actual transfer of data is done directly between the device and memory through DMA controller.
-> Freeing CPU for other tasks
PRIORITY INTERRUPT
Priority
- Determines which interrupt is to be served first when two or more requests are made
simultaneously. And also determines which interrupts are permitted to interrupt the computer while
another is being serviced.
- Higher priority interrupts can make requests while servicing a lower priority interrupt .
Priority Interrupt by Software(Polling)
- Priority is established by the order of polling the devices(interrupt sources) and flexible since it is
established by software.
- Low cost since it needs a very little hardware.
84

- Very slow .
Priority Interrupt by Hardware
Require a priority interrupt manager which accepts all the interrupt requests to determine the
highest priority request . Fast since identification of the highest priority interrupt request is
identified by the hardware. Fast since each interrupt source has its own interrupt vector to access
directly to its own service routine .
HARDWARE PRIORITY INTERRUPT - DAISY-CHAIN Interrupt Request from any device(>=1)
-> CPU responds by INTACK <- 1.
-> Any device receives signal (INTACK) 1 at PI puts the VAD on the bus.
Among interrupt requesting devices the only device which is physically closest to CPU gets
INTACK=1, and it blocks INTACK to propagate to the next device.

One stage of the daisy chain priority arrangement

PARALLEL PRIORITY INTERRUPT


IEN:

Set or Clear by instructions ION or IOF.


85

IST: Represents an unmasked interrupt has occurred. INTACK enables tristate Bus Buffer to load
VAD generated by the Priority Logic.
Interrupt Register:
- Each bit is associated with an Interrupt Request from different Interrupt Source - different priority
level.
- Each bit can be cleared by a program instruction.
Mask Register:
- Mask Register is associated with Interrupt Register.
- Each bit can be set or cleared by an Instruction.

INTERRUPT PRIORITY ENCODER


Determines the highest priority interrupt when more than one interrupts take place.
Priority Encoder Truth table

INTERRUPT CYCLE
At the end of each Instruction cycle :
- CPU checks IEN and IST.
86

- If IEN IST = 1, CPU -> Interrupt Cycle.


SP SP - 1

Decrement stack pointer.

M[SP] PC Push PC into stack.


INTACK 1 Enable interrupt acknowledge.
PC VAD
IEN 0

Transfer vector address to PC.


Disable further interrupts.

Go To Fetch to execute the first instruction in the interrupt service routine.


INTERRUPT SERVICE ROUTINE

Initial and Final Operations


Each interrupt service routine must have an initial and final set of operations for controlling the
registers in the hardware interrupt system.
Initial and Final Operations

6.5 DIRECT MEMORY ACCESS


* Block of data transfer from high speed devices, Drum, Disk, Tape.
87

* DMA controller - Interface which allows I/O transfer directly between Memory and Device, freeing
CPU for other tasks.
* CPU initializes DMA Controller by sending memory address and the block size (number of words) .
CPU bus signals for DMA transfer

Block diagram of DMA controller

6.5.1. DMA I/O OPERATION


Starting an I/O
- CPU executes instruction to
a. Load Memory Address Register
b. Load Word Counter
c. Load Function (Read or Write) to be performed
d. Issue a GO command
Upon receiving, a GO Command DMA performs I/O operation as follows independently from CPU.
[1] Input Device <- R (Read control signal).
[2] Buffer (DMA Controller) <- Input Byte;and assembles the byte into a word until word is full.
[4] M <- memory address, W(Write control signal).
[5] Address Reg <- Address Reg +1; WC(Word Counter) <- WC 1.
88

[6] If WC = 0, then Interrupt to acknowledge done, else go to [1] .


Output
[1] M <- M Address, R, M Address R <- M Address R + 1, WC <- WC 1.
[2] Disassemble the word .
[3] Buffer <- One byte; Output Device <- W, for all disassembled bytes .
[4] If WC = 0, then Interrupt to acknowledge done, else go to [1].

6.5.2. CYCLE STEALING


While DMA I/O takes place, CPU is also executing instructions DMA Controller and CPU both access
Memory -> Memory Access Conflict .
Memory Bus Controller
- Coordinating the activities of all devices requesting memory access
- Priority System
Memory accesses by CPU and DMA Controller are interwoven, with the top priority given to DMA
Controller
-> Cycle Stealing

Cycle Steal
- CPU is usually much faster than I/O(DMA), thus
cycles.

CPU uses the most of the memory

- DMA Controller steals the memory cycles from CPU.


- For those stolen cycles, CPU remains idle.
- For those slow CPU, DMA Controller may steal most of the memory cycles which may cause
CPU remain idle long time .
DMA TRANSFER

89

6.6 PRIORITY INPUT/OUTPUT PROCESSOR


Channel
- Processor with direct memory access capability that communicates with I/O devices.
- Channel accesses memory by cycle stealing.
- Channel can execute a Channel Program.
i. Stored in the main memory.
ii. Consists of Channel Command Word(CCW).
iii. Each CCW specifies the parameters needed by the channel to control the I/O
devices and perform data transfer operations.
- CPU initiates the channel by executing an channel I/O class instruction and once initiated,
channel operates independently of the CPU .

CHANNEL / CPU COMMUNICATION


90

Multiple choice Questions


1.

In Programmable interface each port contains

a).Port A=data ; Port B= Command ; Port C=status ;


b).Port A=data ; Port B= status ; Port C =Command ;
c).Port A=Command ; Port B= Data ; Port=status ;

2._________ types of Asynchronous Data Transfer methods are


91

a). 3, strobe , stroke and Handshaking


b).2, strobe and handshaking
c). 1 , handshaking

3. Four different types of asynchronous serial transfer


a).Asynchronous serial and parallel transfer
b). synchronous serial and asynchronous parallel transfer
c).Asynchronous serial, synchronous parallel Asynchronous parallel, synchronous serial.

4.Status register is used for


a). I/O flag and to receive information
b).I/O flag and for recording errors.
c).for recording errors and to receive information

5 Which one is incorrect


a).CPU is usually much faster than I/O(DMA) .
b). DMA Controller steals the memory cycles from CPU.
c).For those stolen cycles, CPU remains idle.
d). For those slow CPU, DMA Controller may steal most of the memory cycles which may
cause CPU remain idle for few seconds .

6. Transmitter register
a).Accepts a data byte through data bus.
b). Accepts a data byte through memory bus.
c). Accepts a data byte through address bus.
7.The larger the RAM of computer the faster its processing speed is, since it eliminates.
a). need for external memory
b). need for ROM
92

c).frequent disk I/Os.

d). need for wider data path

8. A group of signal lines used to transmit data in parallel from one element of a computer to
another is
a).Control Bus
b).Address Bus
c). Databus

d). Network

9. The basic unit within a computer store capable of holding a single unit of Data is
a). register

b).ALU

c).Control unit

d). store location

10.A device used to bring information into a computer is


a).ALU

b).Input device

c).Control unit

d).Output device

Answers
1.a, 2.b,3.c,4.b,5.d,6.a, 7.b,8.c,9.b,10.d

Chapter 7
Memory Organization

7.1 Memory Hierarchy


The memory unit is an essential components in any digital computer since it is needed for strong
progress and data. Most general purpose computer would run more efficiently if they were equipped
with additional storage

93

device beyond the capacity of main memory.The main memory unit that communicates directly with
CPU is called the MAIN MEMORY . Devices that provide backup storage are called AUXILARY
MEMORY. Most common auxiliary devices are magnetic disks and tapes they are used for strong
system programs, large data files and other backup information. Only programs and data currently
needed by the processor resides in main memory. All other informations are stored in auxiliary
memory and transferred to the main memory when needed.
The main memory hierarchy system consists of all storage devices employed in a computer system
from the slow but high capacity auxiliary memory to a relatively faster main memory, to an even
smaller and faster cache memory accessible to the high-speed processing logic. Memory Hierarchy is
to obtain the highest possible access speed while minimizing the total cost of the memory system.
Memory Hierarchy in computer system

A very high speed memory is called cache memory used to increase the speed of processing
by making current programs and data available to the CPU at rapid rate.The cache memory is
employed in the system to compensates the speed differential between main memory access time and
processor logic.
7.2 Main Memory
The main memory is the central storage unit in a computer system. It is a relatively large and fast
memory used to store programs and data during the computer operations. The principal technology
used for maim memory is based on semiconductor integrated circuits. Integrated circuits RAM chips
are available in two possible operating modes static and dynamic. The static RAM is easier to use and
has shorter read and write cycles.
The dynamic RAM offers reduced power consumption and larger storage capacity in a single memory
chip compared to static RAM.
7.2.1 RAM and ROM Chips

94

Most of main memory in a general- purpose computer is made up of RAM integrated circuit chips, but
apportion of the memory may be constructed with ROM chips. Originally RAM was used to refer the
random access memory, but now it used to designate the read/write memory to distinguish it from only
read only memory, although ROM is also a random access. RAM is used for storing bulk of programs
and data that are subject to change. ROM are used to for storing programs that are permanently
resident in the computer and for tables of constants that do not change in value once the production of
computer s completed. Among other things , the ROM portion is used to store the initial programs
called a bootstrap loader .This is program whose function is used to turn on the computer software
operating system. Since RAM is volatile its content are destroyed when power is turn off on other side
the content in ROM remain unchanged either the power is turn off and on again.

7.2.2 Memory Address maps


The designer of computer system must calculate the amount of memory required for particular
application and assign it to either RAM and ROM. The interconnection between memory and
processor is an established from knowledge of the size of memory needed and the type of RAM and
ROM chips available. The addressing of memory can be established by means of a table that specifies
95

the memory address to each chip. The table, called a memory address map , is a pictorial representation
of assigned address space for each chip in the system.

Example: 512 bytes RAM and 512 bytes ROM

Memory Connection to CPU


1. RAM and ROM chips are connected to a CPU through the data and address buses.
2. The low-order lines in the address bus select the byte within the chips and other lines in the
address bus select a particular chip through its chip select inputs.

7.3. Auxiliary Memory


The most common auxiliary memory devices used in computer systems are magnetic disks and tapes.
Other components used, but not as frequently, are magnetic drums, magnetic bubble memory, and
optical disks. To understand fully the physical mechanism of auxiliary memory devices one must have
knowledge of magnetic, electronics and electronics and electromechanical systems.
96

7.3.1 Magnetic Tapes


A magnetic tape transport consists of electrical, mechanical and electronic components to provide the
parts and control mechanism for magnetic tape unit. The tape itself is a strip of coated with magnetic
recording medium. Bits are recorded as magnetic spots on the tape along tracks. Usually, seven or nine
bits are recorded simultaneously to from a character together with a parity bit. Read/write heads are
mounted one in each track so that data can be recorded and read as a sequence of characters.

7.3.2 Magnetic Disks


A magnetic disk is a circular plate constructed of metals or plastic coated with magnetized. Often both
sides of disk are used and several disks may be stacked on one spindle with read/write heads available on
each surface. All disks rotate together at high speed and are not stopped or started for access purposes.
Bits are stored in magnetized surface in spots along concentric circles called track. The tracks are
commonly divided into section called sectors. In most systems, the minimum quality of information,
which can be transferred, is a sector.

7.3.3

RAID

RAID is an acronym first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the
University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive Disks a
97

technology that allowed computer users to achieve high levels of storage reliability from low-cost and
less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for
redundancy .More recently, marketers representing industry RAID manufacturers reinvented the term
to describe a Redundant Array of Independent Disks as a means of disassociating a "low cost"
expectation from RAID technology.
"RAID" is now used as an umbrella term for computer data storage schemes that can divide and replicate
data among multiple hard disk drives. The different Schemes/architectures are named by the word RAID
followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all involve two key design
goals: increased data reliability or increased input/output performance. When multiple physical disks are
set up to use RAID technology, they are said to be in a RAID array. This array distributes data across
multiple disks, but the array is seen by the computer user and operating system as one single disk. RAID
can be set up to serve several different purposes.
Purpose and basics: Redundancy is achieved by either writing the same data to multiple drives (known
as mirroring), or writing extra data (known as parity data) across the array, calculated such that the
failure of one (or possibly more, depending on the type of RAID) disks in the array will not result in loss
of data. A failed disk may be replaced by a new one, and the lost data reconstructed from the remaining
data and the parity data. Organizing disks into a redundant array decreases the usable storage capacity.
For instance, a 2-disk RAID 1 array loses half of the total capacity that would have otherwise been
available using both disks independently, and a RAID 5 array with several disks loses the capacity of one
disk. Other types of RAID arrays are arranged so that they are faster to write to and read from than a
single disk.There are various combinations of these approaches giving different trade-offs of protection
against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover
most requirements.
RAID can involve significant computation when reading and writing information. With traditional
"real" RAID hardware, a separate controller does this computation. In other cases the operating system
or simpler and less expensive controllers require the host computer's processor to do the computing,
which reduces the computer's performance on processor-intensive tasks (see "Software RAID" and
"Fake RAID" below). Simpler RAID controllers may provide only levels 0 and 1, which require less
processing.
RAID systems with redundancy continue working without interruption when one (or possibly more,
depending on the type of RAID) disks of the array fail, although they are then vulnerable to further
failures. When the bad disk is replaced by a new one the array is rebuilt while the system continues to
operate normally. Some systems have to be powered down when removing or adding a drive; others
support hot swapping, allowing drives to be replaced without powering down. RAID with hotswapping is often used in high availability systems, where it is important that the system remains
running as much of the time as possible.
Principles: RAID combines two or more physical hard disks into a single logical unit by using either
special hardware or software. Hardware solutions often are designed to present themselves to the
attached system as a single hard drive, so that the operating system would be unaware of the technical
workings. For example, you might configure a 1TB RAID 5 array using three 500GB hard drives in
hardware RAID, the operating system would simply be presented with a "single" 1TB disk. Software
98

solutions are typically implemented in the operating system and would present the RAID drive as a
single drive to applications running upon the operating system.
There are three key concepts in RAID: mirroring, the copying of data to more than one disk; striping,
the splitting of data across more than one disk; and error correction, where redundant data is stored to
allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID levels use
one or more of these techniques, depending on the system requirements. RAID's main aim can be
either to improve reliability and availability of data, ensuring that important data is available more
often than not (e.g. a database of customer orders), or merely to improve the access speed to files (e.g.
for a system that delivers video on demand TV programs to many viewers).
7.4 Associative memory
Many data-processing application require the search of items in a table stored in memory. The
established way to search a table is to store all items where they can be addressed in a sequence. The
search procedure is a strategy for choosing a sequence of addresses, reading the content of memory at
each address, and comparing the information read with the item being searched until the match occurs.
The number of accesses to memory depends on the location of item and efficiency of the search
algorithm.
The time required to find the item stored in memory can be reduced considerably if stored data can be
identified for access by content of the data itself rather than by an address. A memory unit accessed by
a content is called associative memory or content addressable memory(CAM).

Compare each word in CAM in parallel with the content of A(Argument Register)
- If CAM Word[i] = A, M(i) = 1
- Read sequentially accessing CAM for CAM Word(i) for M(i) = 1

99

- K(Key Register) provides a mask for choosing a particular field or key in the argument in A(only
those bits in the argument that have 1s in
their corresponding position of K are compared).

Organization of CAM

7.5

Cache memory

The cache is a small amount of high-speed memory, usually with a memory cycle time comparable to
the time required by the CPU to fetch one instruction. The cache is usually filled from main memory
when instructions or data are fetched into the CPU. Often the main memory will supply a wider data
word to the cache than the CPU requires, to fill the cache more rapidly. The amount of information
which is replaces at one time in the cache is called the line size for the cache. This is normally the
width of the data bus between the cache memory and the main memory. A wide line size for the cache
means that several instruction or data words are loaded into the cache at one time, providing a kind of
prefetching for instructions or data. Since the cache is small, the effectiveness of the cache relies on the
following properties of most programs:

Spatial locality -- most programs are highly sequential; the next instruction usually comes from
the next memory location.
Data is usually structured, and data in these structures normally are stored in contiguous
memory locations.

Short loops are a common program structure, especially for the innermost sets of nested loops.
This means that the same small set of instructions is used over and over.
Generally, several operations are performed on the same data values, or variables.
100

When a cache is used, there must be some way in which the memory controller determines whether the
value currently being addressed in memory is available from the cache. There are several ways that this
can be accomplished. One possibility is to store both the address and the value from main memory in
the cache, with the address stored in a type of memory called associative memory or, more
descriptively, content addressable memory.
An associative memory, or content addressable memory, has the property that when a value is
presented to the memory, the address of the value is returned if the value is stored in the memory,
otherwise an indication that the value is not in the associative memory is returned. All of the
comparisons are done simultaneously, so the search is performed very quickly. This type of memory is
very expensive, because each memory location must have both a comparator and a storage element. A
cache memory can be implemented with a block of associative memory, together with a block of
``ordinary'' memory. The associative memory would hold the address of the data stored in the cache,
and the ordinary memory would contain the data at that address. Such a cache memory might be
configured as shown in Figure.

Figure: A cache implemented with associative memory


If the address is not found in the associative memory, then the value is obtained from main memory.
Associative memory is very expensive, because a comparator is required for every word in the
memory, to perform all the comparisons in parallel. A cheaper way to implement a cache memory,
without using expensive associative memory, is to use direct mapping. Here, part of the memory
address (usually the low order digits of the address) is used to address a word in the cache. This part of
the address is called the index. The remaining high-order bits in the address, called the tag, are stored
in the cache memory along with the data. For example, if a processor has an 18 bit address for
memory, and a cache of 1 K words of 2 bytes (16 bits) length, and the processor can address single
bytes or 2 byte words, we might have the memory address field and cache organized as in Figure .

101

Figure: A direct mapped cache configuration


This was, in fact, the way the cache is organized in the PDP-11/60. In the 11/60, however, there are 4
other bits used to ensure that the data in the cache is valid. 3 of these are parity bits; one for each byte
and one for the tag. The parity bits are used to check that a single bit error has not occurred to the data
while in the cache. A fourth bit, called the valid bit is used to indicate whether or not a given location
in cache is valid. In the PDP-11/60 and in many other processors, the cache is not updated if memory is
altered by a device other than the CPU (for example when a disk stores new data in memory). When
such a memory operation occurs to a location which has its value stored in cache, the valid bit is reset
to show that the data is ``stale'' and does not correspond to the data in main memory. As well, the valid
bit is reset when power is first applied to the processor or when the processor recovers from a power
failure, because the data found in the cache at that time will be invalid. In the PDP-11/60, the data path
from memory to cache was the same size (16 bits) as from cache to the CPU. (In the PDP-11/70, a
faster machine, the data path from the CPU to cache was 16 bits, while from memory to cache was 32
bits which means that the cache had effectively prefetched the next instruction, approximately half of
the time). The amount of information (instructions or data) stored with each tag in the cache is called
the line size of the cache. (It is usually the same size as the data path from main memory to the cache.)
A large line size allows the prefetching of a number of instructions or data words. All items in a line of
the cache are replaced in the cache simultaneously, however, resulting in a larger block of data being
replaced for each cache miss.
The MIPS R2000/R3000 had a built-in cache controller which could control a cache up to 64K bytes.
For a similar 2K word (or 8K byte) cache, the MIPS processor would typically have a cache
configuration as shown in Figure . Generally, the MIPS cache would be larger (64Kbytes would be
typical, and line sizes of 1, 2 or 4 words would be typical).

102

Figure: One possible MIPS cache organization


A characteristic of the direct mapped cache is that a particular memory address can be mapped into
only one cache location. Many memory addresses are mapped to the same cache location (in fact, all
addresses with the same index field are mapped to the same cache location.) Whenever a ``cache miss''
occurs, the cache line will be replaced by a new line of information from main memory at an address
with the same index but with a different tag.
Note that if the program ``jumps around'' in memory, this cache organization will likely not be
effective because the index range is limited. Also, if both instructions and data are stored in cache, it
may well happen that both map into the same area of cache, and may cause each other to be replaced
very often. This could happen, for example, if the code for a matrix operation and the matrix data itself
happened to have the same index values.
A more interesting configuration for a cache is the set associative cache, which uses a set associative
mapping. In this cache organization, a given memory location can be mapped to more than one cache
location. Here, each index corresponds to two or more data words, each with a corresponding tag. A set
associative cache with n tag and data fields is called an ``n-way set associative cache''. Usually
,
for k = 1, 2, 3 are chosen for a set associative cache (k = 0 corresponds to direct mapping). Such n-way
set associative caches allow interesting tradeoff possibilities; cache performance can be improved by
increasing the number of ``ways'', or by increasing the line size, for a given total amount of memory.
103

An example of a 2-way set associative cache is shown in Figure , which shows a cache containing a
total of 2K lines, or 1 K sets, each set being 2-way associative. (The sets correspond to the rows in the
figure.)

Figure: A set-associative cache organization


In a 2-way set associative cache, if one data word is empty for a read operation corresponding to a
particular index, then it is filled. If both data words are filled, then one must be overwritten by the new
data. Similarly, in an n-way set associative cache, if all n data and tag fields in a set are filled, then one
value in the set must be overwritten, or replaced, in the cache by the new tag and data values. Note that
an entire line must be replaced each time. The most common replacement algorithms are:

Random -- the location for the value to be replaced is chosen at random from all n of the cache
locations at that index position. In a 2-way set associative cache, this can be accomplished with
a single modulo 2 random variable obtained, say, from an internal clock.
First in, first out (FIFO) -- here the first value stored in the cache, at each index position, is the
value to be replaced. For a 2-way set associative cache, this replacement strategy can be
implemented by setting a pointer to the previously loaded word each time a new word is stored
in the cache; this pointer need only be a single bit. (For set sizes > 2, this algorithm can be
implemented with a counter value stored for each ``line'', or index in the cache, and the cache
can be filled in a ``round robin'' fashion).
Least recently used (LRU) -- here the value which was actually used least recently is replaced.
In general, it is more likely that the most recently used value will be the one required in the
near future. For a 2-way set associative cache, this is readily implemented by setting a special
bit called the ``USED'' bit for the other word when a value is accessed while the corresponding
bit for the word which was accessed is reset. The value to be replaced is then the value with the
USED bit set. This replacement strategy can be implemented by adding a single USED bit to
each cache location. The LRU strategy operates by setting a bit in the other word when a value
is stored and resetting the corresponding bit for the new word. For an n-way set associative
cache, this strategy can be implemented by storing a modulo n counter with each data word. (It
is an interesting exercise to determine exactly what must be done in this case. The required
circuitry may become somewhat complex, for large n.)

Cache memories normally allow one of two things to happen when data is written into a memory
location for which there is a value stored in cache:

Write through cache -- both the cache and main memory are updated at the same time. This
may slow down the execution of instructions which write data to memory, because of the
104

7.6

relatively longer write time to main memory. Buffering memory writes can help speed up
memory writes if they are relatively infrequent, however.
Write back cache -- here only the cache is updated directly by the CPU; the cache memory
controller marks the value so that it can be written back into memory when the word is
removed from the cache. This method is used because a memory location may often be altered
several times while it is still in cache without having to write the value into main memory. This
method is often implemented using an ``ALTERED'' bit in the cache. The ALTERED bit is set
whenever a cache value is written into by the processor. Only if the ALTERED bit is set is it
necessary to write the value back into main memory (i.e., only values which have been altered
must be written back into main memory). The value should be written back immediately before
the value is replaced in the cache.
Virtual Memory Concept

In a memory hierarchy system, programs and data are first stored in auxiliary memory. Portion of
program or data are brought into main memory as they are needed by CPU. Virtual memory is a concept
used in some large computer systems that permit the user to construct programs as though a large
memory space were available , equal to the totality of auxiliary memory. Each address that is referenced
by CPU goes through the address mapping from the so called virtual address to a physical address in
memory.Virtual memory is used to give the programmer the illusion that the system has a very large
memory, even though the computer actually has a relatively small main memory.

7.6.1 Address Mapping

Memory Mapping Table for Virtual Address -> Physical address

Address Space and Memory Space are each divided into fixed size group of words called blocks
or pages
1K words group

105

Organization of memory Mapping Table in a paged system

7.6.2 Associative memory page table


Assume that Number of Blocks in memory = m, Number of Pages in Virtual Address Space = n.
Page Table

Straight forward design -> n entry table in memory, Inefficient storage space utilization <- n-m
entries of the table is empty
More efficient method is m-entry Page Table. Page Table made of an Associative Memory that is
m words; (Page Number: Block Number)

106

Virtual address
Page no.
1 0 1

Line number

Argument register

1 0 1

0 0

Key register

0
0
1
1

1
0
0
1

Associative memory

0
1
0
1

1
0
1
0

1
0
1
0

Page no. Block no.

Page Fault
1. Trap to the OS
2. Save the user registers and program state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of the page on the
store(disk)

backing

5. Issue a read from the backing store to a free frame


a. Wait in a queue for this device until serviced
b. Wait for the device seek and/or latency time
c. Begin the transfer of the page to a free frame
6. While waiting, the CPU may be allocated to some other process
7. Interrupt from the backing store (I/O completed)
8. Save the registers and program state for the other user
9. Determine that the interrupt was from the backing store
10. Correct the page tables (the desired page is now in memory)
11. Wait for the CPU to be allocated to this process again
12. Restore the user registers, program state, and new page table, then
instruction.
107

resume the interrupted

Processor architecture should provide the ability to restart any instruction after a page fault.

108

Multiple choice questions


1.The content of these chips are lost when the computer is switched off?
a) ROM Chips
b) RAM Chips
c) DRAM Chips
2.What are responsible for storing permanent data and instructions?
a) ROM Chips
b) RAM Chips
c) DRAM Chips
3. which memory is used to speedup the processes
a) Associative memory
b) Main memory
c) Cache memory
4.How many bits of information can each memory cell in a computer chip hold?
a).0 bit
b).1 bit
c).8 bits

5.What type of computer chips are said to be volatile


a).ROM Chips
b).RAM Chips
c).DRAM Chips

6. The interface between level-2 (operating system) and level-1 (Microprogram) of a


design is called:
a).Computer architecture
b).Virtual machine interface
109

computer

c).User interface
d).All of the above

7.Which memory is actual working memory


a)SAM

b)ROM

c).RAM

d).TAB MEMORY

8.Which of the following is secondary memory device


a). ALU

b).keyboard

c). Disk

d) All of the above

9. A micro computer has primary memory of 640k . What is the exact number of bytes contained in this
memory?
a). 64x 1000

b).640x100 c).640x1024

d). either b or c

10.How many bits can be stored in 8K RAM


a). 8000

b).8192

c). 4000

d). 4096

Answers
1. a , 2.b , 3. a,4.c ,5.b ,6.d ,7. d ,8. c , 9. c,10.d

110

Chapter 8
Pipeline and Vector Processing

8.1 Parallel Processing


Parallel processing is a term used to denote a large class of techniques that are used to provide
simultaneous data processing tasks for the purpose of increasing the computational speed of computer.
The purpose of parallel processing is to speed up the computer processing capability and increased its
throughput that is, the amount of processing can be accomplished during a interval of time the amount
of hardware increases with parallel processing and with it the cost of system increases.

Processor with multiple functional units 8.1

Parallel processing at a higher level of complexity can be achieved by having a multiplicity of


functional units that perform identical or different operation simultaneously. Parallel processing is
established by distributing the data among the multiple functional units. . For example, the arithmetic,
logic, and shift operations can be separated into three units and the operands diverted to each unit
under the supervision of a control unit . The normal operation of a computer is to fetch instructions
from memory and execute them in the processor.
The sequence of instructions read from memory constitutes an instruction stream. The operations
performed on the data in the processor constitutes a data stream. Parallel processing may occur in the
instruction stream, in the data stream, or in both. Flynn's classification divides computers into four
111

major groups as follows:


1. Single instruction stream, single data stream (SISD)
2. Single instruction stream, multiple data stream (SIMD)
3. Multiple instruction stream, single data stream (MISD)
4. Multiple instruction stream, multiple data stream (MIMD)
SISD represents the organization of single computer containing a control unit, a processor unit and
memory unit. Instruction are executed sequentially and system may or may not have internal parallel
processing capabilities. Parallel processing in this case may be achieved by means of multiple
functional units or by pipeline processing.
SIMD represents an organization that includes many processing units under the supervision of a
common control unit .All processor receive the same instruction from the control unit but operate on
the different items of data. The shared memory unit must contain multiple modules so that it can
communicate with all the processors simultaneously. MISD structure is only of theoretical interest
since no practical system has been constructed using this organization. MIMD organization refer to
computer system capable of processing several programs at the same time. Most multiprocessor and
multicomputer systems can be classified in this category.
Flynn's classification depends on the distinction between the performance of the control unit and the
data-processing unit. It emphasizes the behavior characteristics of the computer system rather than its
operational and structural interconnections. One type of parallel processing that does not fit Flynn's
classification is pipelining.

8.2 Pipelining

Pipelining is a technique of decomposing a sequential process into suboperations, with each


subprocess being executed in special dedicated segment that operates concurrently with all other
segments. A pipeline can be visualized as a collection of processing segments through which binary
information flows.
Each segment performs partial processing dictated by the way the task is partitioned. The result
obtained from the computation in each segment is transferred to the next segment in the pipeline. The
final result is obtained after the data have passed through all segments. The name "pipeline" implies a
flow of information analogous to an industrial assembly line. It is characteristic of pipelines that
several computations can be in progress in distinct segments at the same time. The overlapping of
computation is made possible by associating a register with each segment in the pipeline. The registers
provide isolation between each segment so that each can operate on distinct data simultaneously.
Perhaps the simplest way of viewing the pipeline structure is to imagine that each segment consists of
an input register followed by a combinational circuit. The register holds the data and the combinational
circuit performs the suboperation in the particular segment. The output of the combinational circuit in a
112

given segment is applied to the input register of the next segment. A clock is applied to all registers
after enough time has elapsed to perform all segment activity. In this way the information flows
through the pipeline one step at a time.
The pipeline organization will be demonstrated by means of a simple example. Suppose that we want
to perform the combined multiply and add operations with a stream of numbers.

Ai*Bi + Cj

for i = 1,2,3, . . . , 7
Each suboperation is to be implemented in a segment within a pipeline. Each segment has one or two
registers and a combinational circuit as shown in Figure 8.2. Rl through R5 are registers that receive
new data with every clock pulse.The multiplier and adder are combinational circuits. The
suboperations performed in each segment of the pipeline are as follows:

R1<--Aj, R2<--Bj

Input Aj and Bj

R3<--Rl*R2, R4-Cj

Multiply and input Ci

R5<--R3 + R4

Add Cj to product

Example of pipeline processing 8.2

8.3 Arithmetic Pipeline


113

Pipeline arithmetic units are usually found in very high speed computers. They are used to implement
floating-point operations, multiplication of fixed-point numbers, and similar computations encountered
in scientific problems. A pipeline multiplier is essentially an array multiplier as described in Fig. 10-10,
with special adders designed to minimize the carry propagation time through the partial products.
Floating-point operations are easily decomposed into suboperations as demonstrated in . We will now
show an example of a pipeline unit for floating-point addition and subtraction.
Floating Point Arithmetic Pipeline:
a. Suitable for pipeline as it consists of series of steps
b. Can implement FP ADD algorithm using the following pipeline

Fixed Point Arithmetic Pipeline:


a. Can convert the ripple-carry adder as follows:

b. Can be used for both ADD & SUB

8.4 Instruction Pipeline


Pipeline processing can occur not only in the data stream but in the instruction stream as well. An
instruction pipeline reads consecutive instructions from memory while previous instructions are being
executed in other segments.This causes the instruction fetch and execute phases to overlap and perform
simultaneous operations. One possible digression associated with such a scheme is that an instruction
may cause a branch out of sequence. In that case the pipeline must be emptied and all the instructions
that have been read from memory after the branch instruction must be discarded.
114

Consider a computer with an instruction fetch unit and an instruction execution unit designed to
provide a two-segment pipeline. The instruction fetch segment can be implemented by means of a firstin, first-out (FIFO) buffer. This is a type of unit that forms a queue rather than a stack. Whenever the
execution unit is not using memory, the control increments the program counter and uses its address
value to read consecutive instructions from memory. The instructions are inserted into the FIFO buffer
so that they can be executed on a first-in, first-out basis. Thus an instruction stream can be placed in a
queue, waiting for decoding and processing by the execution segment. The instruction stream queuing
mechanism provides an efficient way for reducing the average access time to memory for reading
instructions. Whenever there is space in the FIFO buffer, the control unit initiates the next instruction
fetch phase. The buffer acts as a queue from which control then extracts the instructions for the
execution unit.
Computers with complex instructions require other phases in addition to the fetch and
execute to process an instruction completely. In the most general case, the computer needs to process
each instruction with the following sequence of steps.
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.

There are certain difficulties that will prevent the instruction pipeline from operating at its maximum
rate. Different segments may take different times to operate on the incoming information. Some
segments are skipped for certain operations. For example, a register mode instruction does not need an
effective address calculation. Two or more segments may require memory access at the same time,
causing one segment to wait until another is finished with the memory. Memory access conflicts are
sometimes resolved by using two memory buses for accessing instructions and data in separate
modules. In this way, an instruction word and a data word can be read simultaneously from two
different modules.
The design of an instruction pipeline will be most efficient if the instruction cycle is divided into
segments of equal duration. The time that each step takes to fulfill its function' depends on the
instruction and the way it is executed.

8.5 RISC Pipeline


115

Among the characteristics attributed to RISC is its ability to use an efficient instruction pipeline. The
simplicity of the instruction set can be utilized to implement an instruction pipeline using a small
number of suboperations, with each being executed in one clock cycle. Because of the fixed-length
instruction format, the decoding of the operation can occur at the same time as the register selection.
All data manipulation instructions have register-to register operations. Since all operands are in
registers, there is no need for calculating an effective address or fetching of operands from memory.
Therefore, the instruction pipeline can be implemented with two or three segments.One segment
fetches the instruction from program memory, and the other segment executes the instruction in the
ALU. A third segment may be used to store the result of the ALU operation in a destination register.
The data transfer instructions in RISC are limited to load and store instructions. These instructions use
register indirect addressing. They usually need three or four stages in the pipeline. To prevent conflicts
between a memory access to fetch an instruction and to load or store an operand, most RISe machines
use two separate buses with two memories: one for storing the instructions and the other for storing the
data. The two memories can sometime operate at the same speed as the epu clock and are referred to as
cache memories .
It is not possible to expect that every instruction be fetched from memory and executed in one clock
cycle. What is done, in effect, is to start each instruction with each clock cycle and to pipeline the
processor to achieve the goal of single-cycle instruction execution. The advantage of RISC over else
(complex instruction set computer) is that RISC can achieve pipeline segments, requiring just one
clock cycle, while CISC uses many segments in its pipeline, with the longest segment requiring two or
more clock cycles.
Another characteristic of RISC is the support given by the compiler that translates the high-level
language program into machine language program. Instead of designing hardware to handle the
difficulties associated with data conflicts and branch penalties, RISC processors rely on the efficiency
of the compiler to detect and minimize the delays encountered with these problems.

8.6 Vector Processing


There is a class of computational problems that are beyond the capabilities of conventional computer.
These problems are characterized by the fact that they require a vast number of computations that will
take a conventional computer days or even weeks to complete. In many science and engineering
applications, the problems can be formulated in terms of vectors and matrices that lend themselves to
vector processing.Computers with vector processing capabilities are in demand in specialized
applications. The following are representative application areas where vector processing is of the
utmost importance.

1. Long-range weather forecasting


2. Petroleum explorations
3. Seismic data analysis
116

4.
5.
6.
7.

Medical diagnosis Aerodynamics and space flight simulations


Artificial intelligence and expert systems
Mapping the human genome
Image processing

Without sophisticated computers, many of the required computations cannot be completed within a
reasonable amount of time. To achieve the required level of high performance it is necessary to utilize
the fastest and most reliable hardware and apply innovative procedures from vector and parallel
processing techniques.

Vector Operations
Many scientific problems require arithmetic operations on large arrays of numbers. These numbers fire
usually formulated as vectors and matrices of floating-point numbers. A vector is an ordered set of a
one-dimensional array of data items. A vector V of length n is represented as a row vector by V = [VI
V2 V3 ... Vn]. It may be represented as a column vector if the data items are listed in a column. A
conventional sequential computer is capable of processing operands one at a time. Consequently,
operations on vectors must be broken down into single computations with subscripted variables. The
element Vi ot vector V is written as V(I) and the index I refers to a memory address or register where
the number is stored. To examine the difference between a conventional scalar processor and a vector
processor, consider the following Fortran DO loop:
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
This is a program for adding two vectors A and B of length 100 to produce a vector C. This is
implemented in machine language by the following sequence of operations.
Initialize I = 0
20

Read A (I)
Read B ( I )
Store C (I) = A (I) + B (I)
Increment I = I + 1
If I :5 100 go to 20
Continue

This constitutes a program loop that reads a pair of operands from arrays A and B and performs a
floating-point addition. The loop control variable is then up~ and the steps repeat 100 times.
117

A computer capable of vector processing eliminates the overhead associated with the time it takes of
fetch and execute the instructions in the program loop.It allows operations to be specified with single
vector instruction of the form
C(1:100) = A(1:100) + B(1:100)
Vector Instructions

Vector Instruction format

Pipeline for Inner Product

8.7 Array Processors


An array processor is processor that performs computations on large arrays of data. The term is used to
118

refer two different types of processor . An attach array processor is an auxiliary processor attached to
general purpose computer. It is intended to improve the performance of the host computer in specific
numerical computation tasks. An SIMD
processor is a processor that has single instruction multiple data organization. It manipulates vector
instructions by means of multiple functional units responding to a common instruction . Although both
types of array processors manipulate vector , their organization is different.
Attached Array Processor
An attached array processor is designed as a peripheral for conventional host computer, and its
purpose is to enhance the performance of computer by providing Vector processing for complex
applications. It achieves high performance by means of parallel processing with multiple functional
units. It includes an arithmetic unit containing one or more pipelined floating point adders and
multipliers. The array processor can be programmed by the user to accommodate variety of complex
arithmetic problems.
Attached array processor with host computer

SIMD Array Processor


An SIMD array processor is a computer with multiple processing units operating in parallel. The
processing units are synchronized to perform the same operation under the control of common control
unit, thus providing a single instruction stream, multiple data stream organization. A general block
diagrams of an array is shown in the below diagram. It contains a set of identical processing elements
(PE),each having a local memory M. Each processor includes an ALU, a floating point arithmetic
unit , and working registers. The master control unit controls the operations in the processor
elements.The main memory is used for storage of the program.The Function of the master control unit
is decode the instructions and determine how the instructions and determine how the instruction is to
be executed. Scalar and program control instructions are directly executed within the master control
units. Vector instructions are broad cast to all PEs simultaneously. Each PE uses operand stored in its
local memory. Vector operands distributed to the local memories prior to parallel execution of the
instruction.

119

8.8 Reduced Instruction set computer: RISC VS CISC

Many computers have instruction sets that include more than 100 and sometimes even more than
200 instructions. These computers also employ a variety of data types and a large number of
addressing modes. The trend into computer hardware complexity was influenced by various
factors, such as upgrading existing models to provide more customer applications, adding
instructions that facilitate the translation from high-level language into machine language
programs, and striving to develop machines that move functions from software implementation
into hardware implementation. A computer with a large number of instructions is classified as a
complex instruction set computer, abbreviated CISC.
ClSC Characteristics
1. A large number of instructions-typically from 100 to 250 instructions.
2. Some instructions that perform specialized tasks and are used infrequently.
3. A large variety of addressing modes-typically from 5 to 20 different modes.
4. Variable-length instruction formats
5. Instructions that manipulate operands in memory
RISC Characteristics
1.Relatively few instructions.
2.Relatively few addressing modes.
3. Memory access limited to load and store instructions
4.All operations done within the registers of the CPU
5. Fixed-length, easily decoded instruction format
6. Single-cycle instruction execution
7. Hardwired rather than micro programmed control
120

End Chapter quizzes:


Q1. A technique uses special hardware to detect a conflict and then avoid the data by routing the
data through special paths between pipeline segment.
a)
b)
c)
d)

Operand forwarding
Delayed load
Hardware interlocks
Delayed branch

Q2.Data transfer are limited to load and store instructions.


a)
b)
c)
d)

CISC
RISC
SIMD
MISD

Q3. Which Pipeline are used to implement floating-point operations, multiplication of fixed-point
numbers, and similar computations encountered in scientific problems.
a)
b)
c)
d)

Instruction Pipeline
Arithmetic Pipeline
RISC Pipeline
CISC Pipeline

Q4. SIMD does not include


a)
b)
c)
d)

ALU
Processing Element
Local memory
Control Unit

Q5. Pipelining is a technique of decomposing a __________into __________, with each subprocess


being executed in special dedicated segment.
a)
b)
c)
d)

sequential process and suboperations


Suboperations and sequential process
Instructions and sequential process
Sequential and segmentation

Q6. Flow of information is synonymous of


a) Parallel processing
b) Vector processing
c) Array processor
121

d) Pipelining
Q7.Parallel processing can be achieved with the help of
a)
b)
c)
d)

Multiple functional unit


Synchronization
Stream lining the data
Execution of instruction

Q8. To solve A-B/C equation number of registers


a)
b)
c)
d)

3
4
5
2

Q9.Flynns classification divides computers into how many computers.


a)
b)
c)
d)

5
6
3
4

Q10. A measure used to evaluate supercomputer computer in their ability to perform a given
number of floating-point operations per second is referred
a)
b)
c)
d)

Flops
Bytes
Bits
Hz

Answers
1.(a), 2.(b),3.(b), 4.(d) ,5.(a) ,6(d), 7.(a) ,8.(c), 9.(d) , 10.(a)

122

You might also like