Introduction To IT and Com Systems
Contents
Chapter 1: Digital Logic Fundamentals
1.1 Overview
1.4 Multiplexers
1.5 Decoders and Encoders
1.6 Memory
1.7 Flip-flop
1.9 Register
1.10 Counters
1.11 Digital Circuit
Chapter 2: Introduction to Computer Organization
Chapter 3: Programming the Basic Computer
3.1 Introduction
3.3 Translators
Chapter 4: CPU Organisation
Chapter 5: Computer Arithmetic
5.1 Introduction
5.4 Multiplication
5.5 Division
Chapter 6: Input-Output Organisation
Chapter 7: Memory Organisation
Chapter 8
8.2 Pipelining
Chapter 1
Digital Logic Fundamentals
1.1 Overview
Gates, latches, memories and other logic components are used to design computer systems and their subsystems.
Combinational logic: the output is a function of the current inputs only.
Sequential logic: the output is a complex function of the current inputs, previous inputs and previous outputs.
Neither combinational logic nor sequential logic is better than the other; in practice, both are used as appropriate in circuit design.
This chapter introduces the basic Boolean algebra functions and examines the fundamental methods used to combine, manipulate and transform these functions.
AND: out = xy. The output is 1 only when all inputs are 1.
OR: out = x + y. The output is 1 when at least one input is 1.
XOR: out = x xor y. General rule: the output is equal to 1 if an odd number of input values are 1 and 0 if an even number of input values are 1.
NOT: out = x'. The output is the complement of the input.
XNOR: out = x xnor y. The output is 1 when the inputs are equal, i.e. the complement of XOR.
[Figure: for each gate, its symbol and a timing diagram of the waveforms x(t), y(t) and out(t).]
Combinatorial logic: the circuit shown implements the function xy' + yz from inputs x, y and z.
DeMorgan's Laws:
(ab)' = a' + b'
(a + b)' = a'b'
DeMorgan's Laws may allow the simplification of complex functions, which allows a simpler design.
The rows and columns of the K-map correspond to the possible values of the function's inputs.
Each cell in the K-map represents a minterm (i.e. a three-variable function has: x'y'z', x'y'z, x'yz', x'yz, xy'z', xy'z, xyz' and xyz).
Gray Code
The 1-bit Gray code serves as the basis for the 2-bit Gray code, the 2-bit Gray code is the basis for the 3-bit Gray code, etc.
Gray code sequences are cycles: 000 -> 001 -> 011 -> 010 -> 110 -> 111 -> 101 -> 100 -> 000
K-map Example
g1: xyz' + xyz = xy(z' + z) = xy
g3: xy'z + xyz = xz(y' + y) = xz
To derive a minimal expression we must select the fewest groups that cover all active minterms (1s).
Applying DeMorgan's Law:
(xy' + yz)' = (xy')'(yz)' = (x' + y)(y' + z') = x'y' + x'z' + yz'
1.4 Multiplexers
It is a selector: it chooses one of its data inputs and passes it to the output according to some other selection inputs.
Consider four binary data inputs as inputs of a multiplexer. Two select signals will determine which of the four inputs will be passed to the output.
Figure (a) presents the internal structure of a four-input multiplexer; (b) and (c) present the multiplexer schematic representation with an active-high enable signal (b) and an active-low enable signal (c).
Multiplexers can be cascaded to select from a larger number of inputs: for example, a 4-to-1 multiplexer can be made of 2-to-1 multiplexers.
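The cascading idea can be sketched in Python (an illustration only; the function names `mux2` and `mux4` are ours, not from the text). Three 2-to-1 multiplexers build a 4-to-1 multiplexer: two first-stage muxes use select bit s0, and a second-stage mux uses s1.

```python
def mux2(d0, d1, sel):
    """2-to-1 multiplexer: returns d1 when sel is 1, else d0."""
    return d1 if sel else d0

def mux4(d0, d1, d2, d3, s1, s0):
    """4-to-1 multiplexer built by cascading three 2-to-1 multiplexers."""
    low = mux2(d0, d1, s0)       # first stage: choose within the lower pair
    high = mux2(d2, d3, s0)      # first stage: choose within the upper pair
    return mux2(low, high, s1)   # second stage: choose between the pairs

# Select lines s1 s0 = 10 pass input d2 through to the output.
print(mux4(0, 0, 1, 0, 1, 0))  # -> 1
```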
1.5 Decoders and Encoders
For example, a decoder with three inputs and eight outputs will activate output 6 whenever the
input values are 110.
12
Figure (a) shows a two-to-four decoder internal structure; (b) and (c) show its schematic representation with an active-high enable signal and an active-low enable signal.
For inputs S1S0 = 00, 01, 10 and 11, outputs 0, 1, 2 and 3, respectively, are active.
Variants:
have active low outputs (the selected output has a value 0 and all the other outputs have
a value 1)
13
output all 0s when not enabled, instead of the high-impedance state Z (as the ones in the figure).
Encoders
It receives 2^n inputs and outputs an n-bit value corresponding to the one input that has a value of 1.
A 4-to-2 encoder and its schematic representations are presented in (a), (b) and (c).
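The decoder and encoder behaviour can be sketched in Python (function names are ours, for illustration): the decoder activates exactly one of its 2^n outputs, and the encoder recovers the index of the single active input.

```python
def decoder2to4(s1, s0, enable=1):
    """Active-high 2-to-4 decoder: exactly one output is 1 when enabled."""
    if not enable:
        return [0, 0, 0, 0]      # disabled: all outputs inactive
    index = s1 * 2 + s0          # binary value of the select inputs
    return [1 if i == index else 0 for i in range(4)]

def encoder4to2(inputs):
    """4-to-2 encoder: returns the binary index of the single active input."""
    i = inputs.index(1)
    return (i >> 1) & 1, i & 1   # (high bit, low bit)

print(decoder2to4(1, 0))          # -> [0, 0, 1, 0], output 2 active
print(encoder4to2([0, 0, 0, 1]))  # -> (1, 1), input 3 encodes as 11
```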
1.6 Memory
The memory unit is an essential component in any digital computer, since it is needed for storing programs and data. Most general-purpose computers would run more efficiently if they were equipped with additional storage devices beyond the capacity of the main memory. The memory unit that communicates directly with the CPU is called the MAIN MEMORY. Devices that provide backup storage are called AUXILIARY MEMORY. The most common auxiliary devices are magnetic disks and tapes; they are used for storing system programs, large data files and other backup information. Only programs and data currently needed by the processor reside in main memory. All other information is stored in auxiliary memory and transferred to main memory when needed.
The memory hierarchy system consists of all storage devices employed in a computer system, from the slow but high-capacity auxiliary memory, to a relatively faster main memory, to an even smaller and faster cache memory accessible to the high-speed processing logic. The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.
1.7 Flip-Flop
Every digital circuit is likely to have a combinational circuit; most systems encountered also include storage elements, which are described in terms of sequential circuits. The most common type of sequential circuit is synchronous. The storage elements employed in clocked sequential circuits are called flip-flops. A flip-flop is a binary cell capable of storing one bit of information.
Basic RS Flip-Flop (NAND)
Clocked D Flip-Flop
Master-Slave Flip-Flop
T Flip-Flop
JK Flip-Flop
A sequential circuit can be described by a state table (a) or a state diagram (b). The state table gives, for each present state and input x, the next state and output z (written next state/z):

Present state | x = 0 | x = 1
      A       |  D/0  |  C/1
      B       |  B/1  |  A/0
      C       |  C/1  |  D/0
      D       |  A/0  |  B/1

[Figure (b): the corresponding state diagram, with arcs between states A, B, C and D labelled x/z.]
1.9 Register
A register is a group of flip-flops, with each flip-flop capable of storing one bit of information. An n-bit register has a group of n flip-flops and is capable of storing any binary information of n bits. In addition to flip-flops, a register may have combinational gates that perform certain data-processing tasks. The flip-flops hold the binary information and the gates control when and how new information is transferred into the register.
D and Q are both sets of lines, with the number of lines equal to the width of each register. There are
often multiple address ports, as well as additional data ports.
1.10 Counters
A register that goes through a predetermined sequence of states upon the application of input pulses is called a counter. The input pulses may be clock pulses or may originate from external sources. They may occur at uniform intervals of time or at random. Counters are found in almost all equipment containing digital logic.
Binary counter
Count-down counter: a binary counter with reverse count; it starts from 15 and goes down.
In a count-down counter the least significant bit is complemented with every count pulse. Any other bit is complemented if the previous bit goes from 0 to 1.
We can use the same counter design with negative-edge flip-flops to make a count-down counter.
A BCD counter counts from 0 to 9.
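The three count sequences can be sketched in Python (function names are ours; a 4-bit width is assumed for the binary counters):

```python
def binary_counter(bits=4):
    """Full up-count cycle of an n-bit binary counter: 0 .. 2^n - 1."""
    return [i for i in range(2 ** bits)]

def count_down(bits=4):
    """Count-down counter: starts at the maximum value and goes down to 0."""
    return list(range(2 ** bits - 1, -1, -1))

def bcd_counter(pulses=12):
    """BCD counter: counts 0..9 and wraps back to 0."""
    state, seq = 0, []
    for _ in range(pulses):      # a few pulses past the wrap-around
        seq.append(state)
        state = (state + 1) % 10
    return seq

print(count_down()[:3])     # -> [15, 14, 13]
print(bcd_counter()[9:12])  # -> [9, 0, 1], showing the wrap at 9
```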
1.11 Digital Circuit
d) Decade Counter
6. What is the main advantage of synchronous logic?
a) Complexity
b) Simplicity
c) Serial Port
d) Regularity
7. How many types of counters are there?
a) 6 types
b) 7 types
c) 8 types
d) 9 types
8. Asynchronous and synchronous counters are created from?
a) Decoder
b) Encoder
c) JK flip Flop
d) D Flip flop
9. Which of the following is not a type of register?
a) Active Register
b) Data Register
c) Address Register
d) Vector Register
10. Which of the following is a type of register?
a) Speed Register
b) Shift Register
c) User- accessible Register
d) Constant Register
ANSWERS
1. D, 2.C,3.B,4.A,5.B,6.B,7.C,8.A,9.A ,10.B
Chapter 2
Introduction to computer organization
Memory
o The main memory: primary storage, or RAM (Random Access Memory)
2.2 System Buses
2.2.1 Memory Bus
The memory bus is the set of wires that is used to carry memory addresses and data to and from the
system RAM. The memory bus in most PCs is also shared with the processor bus, connecting the
system memory to the processor and the system chipset. The memory bus is part of the PC's hierarchy
of buses, which are high-speed communications channels used within the computer to transfer
information between its components.
The memory bus is made up of two parts:
Address bus: selects the memory address that the data will come from or go to on a read or write.
Data bus: carries the data being read from or written to memory.
Isolated I/O
o Separate I/O read/write control lines in addition to memory read/write control lines
o Separate (isolated) memory and I/O address spaces
o Distinct input and output instructions
Memory-mapped I/O
o A single set of read/write control lines (no distinction between memory and I/O
transfer)
o Memory and I/O addresses share the common address space which reduces memory
address range available
o No specific input or output instruction, as a result the same memory reference
instructions can be used for I/O transfers
The goal of the memory hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.
Memory address map (address bits 10-1; x = don't care):

Component | Hexa address | 10 9 8 | 7 6 5 4 3 2 1
RAM 1     | 0000 - 007F  | 0  0 0 | x x x x x x x
RAM 2     | 0080 - 00FF  | 0  0 1 | x x x x x x x
RAM 3     | 0100 - 017F  | 0  1 0 | x x x x x x x
RAM 4     | 0180 - 01FF  | 0  1 1 | x x x x x x x
ROM       | 0200 - 03FF  | 1  x x | x x x x x x x
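The address map above can be decoded in a few lines of Python (a sketch; the function name is ours). Bit 10 distinguishes ROM from RAM, and bits 9-8 pick one of the four 128-word RAM chips:

```python
def select_chip(address):
    """Decode a 10-bit address per the memory address map:
    four 128 x 8 RAM chips, one 512 x 8 ROM."""
    if address >> 9:               # top bit set -> ROM (0200 - 03FF)
        return "ROM"
    bank = (address >> 7) & 0b11   # next two bits pick the RAM chip
    return "RAM " + str(bank + 1)

print(select_chip(0x07F))  # -> RAM 1
print(select_chip(0x180))  # -> RAM 4
print(select_chip(0x3FF))  # -> ROM
```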
2.3.3.Types of Memory
Main Memory
o Consists of RAM and ROM chips
o Holds instructions and data for the programs in execution
o Volatile
o Fast access
o Small storage capacity
[Figure: a 128 x 8 RAM chip with chip select inputs CS1 and CS2, read (RD) and write (WR) controls, and a 7-bit address input (AD7); and a 512 x 8 ROM chip with chip selects CS1 and CS2 and a 9-bit address input (AD9).]
Auxiliary Memory
o Information organized on magnetic tapes
o Auxiliary memory holds programs and data for future use
o Non-volatile data remains even when the memory device is taken off-power
o Slower access rates
o Greater storage capacity
Associative Memory
o Accessed by the content of the data rather than by an address
o Also called Content Addressable Memory (CAM)
o Used in very high speed searching applications
o Much faster than RAM in virtually all search applications
o Higher costs
[Figure: an associative memory of m words, with an argument register (A) feeding the associative memory array and logic, and read/write controls.]
Cache Memory
o Cache is a fast, small-capacity memory that should hold the information which is most likely to be accessed
o The property of locality of reference makes cache memory systems work
o Locality of reference: references to memory at any given time interval tend to be confined within a localized area. This area contains a set of information, and the membership changes gradually as time goes by.
Temporal locality: information which will be used in the near future is likely to be in use already.
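Locality is why caches pay off, and a toy model makes this concrete. The sketch below (our illustration; the direct-mapped organization and all names are assumptions, not from the text) replays a loop that touches the same few addresses, so after the initial misses every access hits:

```python
class DirectMappedCache:
    """A toy direct-mapped cache: one tag per line, no data stored."""
    def __init__(self, lines=8):
        self.lines = lines
        self.tags = [None] * lines     # tag currently held by each line
        self.hits = self.misses = 0

    def access(self, address):
        index = address % self.lines   # which line this address maps to
        tag = address // self.lines
        if self.tags[index] == tag:
            self.hits += 1
        else:
            self.misses += 1
            self.tags[index] = tag     # fill the line on a miss

cache = DirectMappedCache()
for _ in range(10):                    # a loop over a small working set:
    for addr in range(4):              # temporal + spatial locality
        cache.access(addr)
print(cache.hits, cache.misses)        # -> 36 4: only the first pass misses
```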
[Figure: layers of the I/O software system. User processes sit at the top; below them are directory management, logical I/O and the communication architecture; then the file system's physical organization; then device I/O and scheduling & control; and finally the hardware. Separate paths serve a local peripheral device, a communications port, and the file system.]
Provides a method for transferring information between internal storage (such as memory and
CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices
[Figure: an I/O interface unit. A bidirectional data bus from the CPU connects through bus buffers to an internal bus. Chip select (CS), register select (RS1, RS0), I/O read (RD) and I/O write (WR) lines feed the timing and control block, which selects among the Port A register, Port B register, control register and status register. Ports A and B exchange I/O data with the I/O device, which also receives control information and returns status.]
Decode instruction
Moves the data
It is a logical unit
Used to save the data in register
ALU
Registers
Control Unit
System Bus
Synchronization
Transferring information
Managing memory
Process management
Auxiliary memory
Main memory
Associative memory
Cache memory
Memory location
Memory management
Memory address map
By assignment
5 parts
4 parts
3 parts
2 parts
RAM
ROM
CAD
CAM
Connect to CPU
Connect to Memory
Connect to Output Devices
Connect to ROM
ANSWERS
1. c, 2.d, 3.a, 4.a, 5.b, 6.d, 7.c, 8.b, 9.d, 10. c
Chapter 3
Programming the Basic computer
3.1 Introduction:
A total computer system includes both hardware and software. Hardware consists of the physical components and all associated equipment. Software refers to the programs that are written for the computer. It is possible to be familiar with various aspects of computer software without being concerned with how the computer hardware operates. It is also possible to design parts of the hardware without a knowledge of its software capabilities. A program written by a user may be either dependent on or independent of the physical computer that runs the program. For this reason, a programming language serves as an interface between the computer and the programmer: a set of instructions written in a language which is then translated into a machine-compatible language.
3.2 The level of programming languages
3.2.1 Machine languages
Defined by the hardware design of the computer.
Machine dependent.
Instructions are formed by binary digits (0 or 1), corresponding to the ON or OFF state of an electric circuit.
A machine instruction has two parts: op code and operand.
1. Op code: the operation to be performed, acting like a verb, e.g. increment, add, copy.
2. Operand: the operand acts like a noun, the data the operation acts on; for example, when incrementing the value of A, A is the operand. It can be a value or the address of a memory location. (Remark: a variable in machine language is in the form of an address, i.e. address = variable.)
3.2.2 Assembly languages: The user employs symbols (letters, numerals, or special characters) for the operation part, the address part and the other parts of the instruction code. Each symbolic instruction can be translated into one binary-coded instruction. This translation is done by a special program called an assembler. Because an assembler translates the symbols, this type of symbolic program is referred to as an assembly language program and the language is assembly language.
Instructions are 1:1 with machine instructions.
Symbolic instruction code (or mnemonics).
1. Using symbolic instruction code to replace binary digits.
2. Meaningful abbreviations that substitute the op code
3. e.g. JMP A means jump to address represented by A.
3.2.3 High-Level Programming Languages
Machine-independent.
Each statement typically translates into several machine instructions.
3.2.3.2 Fourth generation languages (4GL)
Examples of high-level programming languages include PASCAL and Java.
Script Languages
Interpreted and processed by software (e.g. a browser)
e.g. VBScript and JavaScript, which have similar syntax to Visual Basic and Java respectively
Interpreted and processed by a Web browser
Instructions are embedded in HTML documents
Assembler: translates an assembly language program into machine code.
Compiler: translates an entire high-level language program into machine code before it is executed.
Interpreter: translates and executes a high-level language program one statement at a time.
3.3.5.2 Source and result operands (the values of these operands are of importance)
3.3.5.3 Instruction Types
Example: High level language statement: X = X + Y
If we assume a simple set of machine instructions, this operation could be accomplished with
three instructions: (assume X is stored in memory location 624, and Y in memory loc. 625.)
1. Load a register with the contents of memory location 624.
2. Add the contents of memory location 625 to the register,
3. Store the contents of the register in memory location 624.
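The three steps above can be simulated in Python (a sketch: the memory contents 10 and 32 are made-up values; the locations 624 and 625 come from the text):

```python
# Hypothetical machine state: a memory array and one accumulator register.
memory = {624: 10, 625: 32}        # X at location 624, Y at location 625
register = 0

register = memory[624]             # 1. Load the register from location 624
register = register + memory[625]  # 2. Add the contents of location 625
memory[624] = register             # 3. Store the register back to location 624

print(memory[624])  # -> 42, i.e. X = X + Y
```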
As seen, a simple "C" (or BASIC) statement may require 3 machine instructions.
The instructions fall into one of the following four categories:
Data processing: Arithmetic and logic instructions.
Data storage: Memory instructions.
Data movement: I/O instructions.
Control: Test and branch instructions.
What is the maximum number of addresses one might need in an instruction?
Virtually all arithmetic and logic operations are either unary (one operand) or binary (two
operands). The result of an operation must be stored, suggesting a third address. Finally, after
the completion of an instruction, the next instruction must be fetched, and its address is needed.
This line of reasoning suggests that an instruction could be required to contain 4 address
references: two operands, one result, and one address of the next instruction. In practice, the
address of the next instruction is handled by the program counter; therefore most instructions
have one, two or three operand addresses.
Three-address instruction formats are not common, because they require a relatively long
instruction format to hold three address references.
Number of Addresses
Operation Repertoire: How many and which operations to provide, and how complex
operations should be.
Data Types: The various types of data upon which operations are performed.
Instruction Format: Instruction length (in bits), number of addresses, size of various fields,
and so on.
Registers: Number of CPU registers that can be referenced by instructions and their use.
Addressing: The mode or modes by which the address of an operand is specified.
These issues are highly interrelated and must be considered together in designing an instruction set.
3.3.6. Types of Operands
3.3.6.1. Most important general categories of data are:
Addresses
Numbers
Characters
Logical data.
Numbers:
o Integer or fixed point
o Floating point (real numbers)
o Decimal
Characters:
o International Reference Alphabet (IRA) which is referred to as ASCII in the USA,
o EBCDIC character set used in IBM 370 machines.
Logical Data:
Types of Operations
A set of general types of operations is as follows:
Data transfer: Move, Store, Load (Fetch), Exchange, Clear (Reset), set, Push, Pop.
Arithmetic: Add, Subtract, Multiply, Divide, Absolute, Negate, Increment, Decrement.
Logical: AND, OR, NOT, XOR, Test, Compare, Shift, Rotate, Set control variables.
Conversion: Translate, Convert.
I/O: Input (Read), Output (Write), Start I/O, Test I/O.
Transfer of Control: Jump (Branch), Jump Conditional, Jump to Subroutine, Return, Execute,
Skip, Skip Conditional, Halt, Wait (Hold), No Operation.
System Control: Instructions that can only be executed while the processor is in a privileged state, or is executing a program in a special privileged area of memory. These instructions are reserved for the use of the operating system.
Includes opcode
Includes (implicit or explicit) operand(s)
Usually more than one instruction format in an instruction set
Instruction Length:
Allocation of Bits:
Variable-Length Instructions:
1. The speed at which the CPU processes data to convert is measured in what :
a) Megahertz
b) Gigahertz
c) Nanoseconds
d) A and B
2. Machine cycles are measured in nanoseconds or picoseconds.
a) True
b) False
3. Which register example holds the address of the current instruction being processed?
a) Program counter
b)
Instruction register
c)
Control Unit
d)
Arithmetic Logic Unit
4. What is a machine cycle :
a) Data measured in megahertz
b) Data measured in gigahertz
c) A sequence of instruction perform to execute one program instruction
d) All of the Above
5. Most of the computers available today are known as
a). 3rd generation computers
d). 6th
generation computers
6. What difference do the 5th generation computers have from other generation computers?
a). Technological advancement
b). FORTRAN
d). BASIC
Includes opcode
Includes (implicit or explicit) operand(s)
Usually more than one instruction format in an instruction set
Include only operators
Answers
1.a , 2.b, 3.b, 4.a, 5.b 6.a , 7.a ,8.a, 9. a,10.d
Chapter 4
CPU ORGANIZATION
4.1 General Register Organization
In a computational device, memory access is the most time-consuming process, so it is more convenient and more efficient to store intermediate values in processor registers.
[Figure: bus organization for seven CPU registers R1-R7. A clock drives the registers, which also receive external input and seven load lines. Two multiplexers, controlled by SELA and SELB, place the selected registers on the A bus and B bus feeding the ALU; the ALU operation is chosen by OPR, and a 3x8 decoder driven by SELD selects the destination register for the output.]
o Control word format: 14 binary selection inputs, with 3 bits for SELA, 3 for SELB, 3 for SELD, and 5 for OPR.
o ALU: an operation is selected by the ALU operation selector (OPR). The result of a microoperation is directed to a destination register selected by a decoder (SELD).
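Packing and unpacking such a 14-bit control word can be sketched in Python. The field order chosen here (SELA in the top bits, OPR in the bottom bits) is an assumption for illustration; the text only fixes the field widths:

```python
def make_control_word(sela, selb, seld, opr):
    """Pack a 14-bit control word: 3 bits SELA, 3 SELB, 3 SELD, 5 OPR."""
    assert sela < 8 and selb < 8 and seld < 8 and opr < 32
    return (sela << 11) | (selb << 8) | (seld << 5) | opr

def decode_control_word(word):
    """Unpack a 14-bit control word into its four fields."""
    return (word >> 11) & 7, (word >> 8) & 7, (word >> 5) & 7, word & 31

w = make_control_word(2, 3, 1, 0b00101)
print(format(w, '014b'))       # -> 01001100100101
print(decode_control_word(w))  # -> (2, 3, 1, 5)
```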
Data registers
Address Registers
On the other hand, if the number of registers goes above 32, then there is no appreciable reduction in
memory references.
4.4 Status and Control Registers
For control of various operations several registers are used. These registers can not be used in data
manipulations, however. the content of some of these registers can be used by the programmer. Some
of the control register for a von Neurnann machine can be the Program Counter (PC), Memory Address
Register (MAR) and Data Register .Almost all the CPUs, have status register, a part of which may be
programmer visible. A register which may be formed by condition codes is called condition code
register. Some of the commonly used flags or condition codes in such a register may be:
Sign flag : This indicates whether the sign of previous arithmetic operation was positive (0) or
negative (1).
Zero flag : This flag bit will be set if the result of last arithmetic operation was zero.
Carry flag : This flag is set, if a carry results from the addition of the highest order bits or a borrow is
taken on subtraction of highest order bit.
Equal flag : This bit flag will be set if a logic comparison operation finds out that both of its operands
are equal.
Overflow flag : This flag is used to indicate the condition of arithmetic overflow.
Interrupt : This flag is used for enabling or disabling interrupts.
Supervisor: This flag is used in certain computers to determine whether the CPU is executing in supervisor or user mode. In case the CPU is in supervisor mode, it will be allowed to execute certain privileged instructions. In most CPUs, on encountering a subroutine call or interrupt-handling routine, it is desired that status information such as condition codes and other register information be stored and restored on the initiation and end of these routines respectively. The register often known as the Program Status Word (PSW) contains the condition codes plus other status information. There can be several other status and control registers, such as an interrupt vector register in machines using vectored interrupts, or a stack pointer if a stack is used to implement subroutine calls.
The status and control register design is also dependent on the Operating System (OS) support. A functional understanding of the OS helps in tailoring the register organization; in fact, some control information is only of specific use to the operating system.
One major decision to be taken in designing the status and control register organization is how to allocate control information between registers and memory. Generally the first few hundred or thousand words of memory are allocated for storing control information. It is the responsibility of the designer to determine how much control information should be in registers and how much in memory.
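The sign, zero and carry flags described above can be computed directly for an n-bit addition; this Python sketch (our illustration; the function name and 8-bit width are assumptions) shows how each flag follows from the raw sum:

```python
def flags_after_add(a, b, bits=8):
    """Result and S/Z/C flags for an n-bit addition of unsigned values."""
    mask = (1 << bits) - 1
    full = (a & mask) + (b & mask)     # sum before truncation
    result = full & mask               # what actually lands in the register
    sign = (result >> (bits - 1)) & 1  # sign flag: high bit of the result
    zero = 1 if result == 0 else 0     # zero flag: result was all zeros
    carry = 1 if full > mask else 0    # carry flag: carry out of the top bit
    return result, {"S": sign, "Z": zero, "C": carry}

# 0x80 + 0x80 wraps to 0 in 8 bits: zero and carry are both set.
print(flags_after_add(0x80, 0x80))  # -> (0, {'S': 0, 'Z': 1, 'C': 1})
```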
Rather than specifying a digital system in words, a specific notation is used: register transfer language.
For any function of the computer, the register transfer language can be used to describe the
(sequence of) microoperations
Register transfer language:
A symbolic language which is a convenient tool for describing the internal organization of
digital computers and can also be used to facilitate the design process of digital systems.
Registers are designated by capital letters, sometimes followed by numbers (e.g., A, R13, IR)
Often the names indicate function:
MAR - memory address register
PC - program counter
IR - instruction register
Registers and their contents can be viewed and represented in various ways
A register can be viewed as a single entity:
Registers may also be represented showing the bits of data they contain
Note that this is non-destructive; i.e. the contents of R1 are not altered by copying (loading) them to R2.
There are, in principle, 16 different logic functions that can be defined over two binary input variables.
However, most systems only implement four of these: AND, OR, XOR and Complement/NOT.
The others can be created from combinations of these.
List of Logic Microoperations
With 2 binary variables there are 16 different logic operations; in general, n binary variables give 2^(2^n) functions.
Truth tables for 16 functions of 2 variables and the corresponding 16 logic microoperations
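All 16 truth tables can be generated mechanically: each function is identified by the 4-bit output column it produces over the input pairs (0,0), (0,1), (1,0), (1,1). A Python sketch (our illustration):

```python
# Enumerate all 16 logic functions of two binary variables x and y.
pairs = [(0, 0), (0, 1), (1, 0), (1, 1)]
tables = []
for f in range(16):
    # bit i of f is the function's output for input pair number i
    tables.append([(f >> i) & 1 for i in range(4)])

print(len(tables))  # -> 16
print(tables[8])    # -> [0, 0, 0, 1]: output 1 only for (1,1), i.e. AND
print(tables[14])   # -> [0, 1, 1, 1]: output 1 unless both are 0, i.e. OR
```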
Selective-set:        A ← A + B
Selective-complement: A ← A ⊕ B
Selective-clear:      A ← A B'
Mask (Delete):        A ← A B
Clear:                A ← A ⊕ B
Insert:               A ← (A B) + C   (mask out the field, then OR in the new value C)
Compare:              A ← A ⊕ B      (result is 0 where A = B)
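These microoperations map directly onto bitwise operators; a quick Python check over 4-bit example values (the values A = 1010 and B = 1100 are ours, for illustration):

```python
A = 0b1010
B = 0b1100
MASK4 = 0b1111  # keep results in 4 bits

print(format(A | B, '04b'))            # selective-set:        1110
print(format(A ^ B, '04b'))            # selective-complement: 0110
print(format(A & ~B & MASK4, '04b'))   # selective-clear:      0010
print(format(A & B, '04b'))            # mask (delete):        1000
```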
Logical shift
Circular shift
Arithmetic shift
What differentiates them is the information that goes into the serial input: on a right shift, a logical shift inserts a 0, a circular shift re-inserts the bit shifted out at the other end, and an arithmetic shift replicates the sign bit.
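The three right-shift variants can be sketched in Python for an 8-bit register (function names are ours, for illustration):

```python
def logical_shift_right(x, bits=8):
    """Logical shift right: a 0 enters at the serial input."""
    return x >> 1

def arithmetic_shift_right(x, bits=8):
    """Arithmetic shift right: the sign bit is replicated."""
    sign = x & (1 << (bits - 1))       # keep the top bit as-is
    return (x >> 1) | sign

def circular_shift_right(x, bits=8):
    """Circular shift (rotate): the bit shifted out re-enters at the top."""
    return (x >> 1) | ((x & 1) << (bits - 1))

x = 0b10010011
print(format(logical_shift_right(x), '08b'))     # -> 01001001
print(format(arithmetic_shift_right(x), '08b'))  # -> 11001001
```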
Figure 4 is a block diagram of a micro-programmed control unit that may be used to implement the instruction set of the computer we described above. The heart of the controller is the 32 x 24 control ROM, in which up to 32 microinstructions, each 24 bits long, can be stored. Each is composed of two main fields: a 16-bit wide control-signal field and an 8-bit wide next-address field. Each bit in the control-signal field corresponds to one of the control signals discussed above. The next-address field contains bits that determine the address of the next microinstruction to be fetched from the control ROM. We shall see the details of how these bits work shortly. Words selected from the control ROM feed the microinstruction register. This 24-bit wide register is analogous to the outer machine's instruction register. Specifically, the leading 16 bits (the control-signal field) of the microinstruction register are connected to the control-signal lines that go to the various components of the external machine's data path section.
Addresses provided to the control ROM come from a micro-counter register, which is analogous to the
external machine's program counter. The micro-counter, in turn, receives its input from a multiplexer
which selects from : (1) the output of an address ROM, (2) a current-address incrementer, or (3) the
address stored in the next-address field of the current microinstruction.
The 8085 is an 8-bit general-purpose microprocessor that can address 64K bytes of memory. It has 40 pins and uses +5V for power. It can run at a maximum frequency of 3 MHz. The pins on the chip can be grouped into 6 groups:
Address Bus.
Data Bus.
The address bus has 8 signal lines, A8-A15, which are unidirectional. The other 8 address bits are multiplexed (time-shared) with the 8 data bits. So, the bits AD0-AD7 are bidirectional and serve as A0-A7 and D0-D7 at the same time. During the execution of an instruction, these lines carry the address bits during the early part; then, during the late parts of the execution, they carry the 8 data bits. In order to separate the address from the data, we can use a latch to save the value before the function of the bits changes.
There are 3 important pins in the frequency control group. X1 and X2 are the inputs from the crystal or clock-generating circuit. The frequency is internally divided by 2, so to run the microprocessor at 3 MHz, a 6 MHz clock source is needed.
CLK (OUT): An output clock pin to drive the clock of the rest of the system.
Let's assume that we are trying to fetch the instruction at memory location 2005; that means the program counter is now set to that value. The following is the sequence of operations:
The program counter places the address value on the address bus and the controller issues a RD signal.
The memory's address decoder gets the value and determines which memory location is being accessed.
The value on the data bus is read into the instruction decoder inside the microprocessor.
After decoding the instruction, the control unit issues the proper control signals to perform the operation.
2. To accomplish a task a computer has to process data in three stages. They are:
7.Which smaller unit of the CPU directs and coordinates all activities within it and determines the
sequence in which instructions are executed, sending instruction sequence to the other units.
a)
CU
b)
ALU
c)
PROCESSOR
d)
8. The CPU's primary responsibility is the movement of data and instructions from itself to main memory and the ALU and back. Arrange the CU's execution of an instruction in the correct order by placing the execution instructions' letters in the boxes provided:
a)
b)
c)
d)
e)
9.
Which smaller CPU unit contains registers-temporary storage locations that hold a single
instruction or data item needed immediately and frequently
a)
CU
b)
ALU
c)
PROCESSOR
d)
10.
Program counter (PC) and instruction register (IR) are examples of registers:
a)
True
b)
False
Answers
1 ,2 ,3 ,4 ,5 ,6 ,7 8, 9, 10
Chapter 5
Computer Arithmetic
5.1 INTRODUCTION
Because electronic logic deals with currents that are on or off, it has been found convenient to
represent quantities in binary form to perform arithmetic on a computer. Thus, instead of having ten
different digits, 0, 1, 2, 3, 4, 5, 6, 7, 8, and 9, in binary arithmetic, there are only two different digits, 0
and 1, and when moving to the next column, instead of the digit representing a quantity that is ten
times as large, it only represents a quantity that is two times as large. Thus, the first few numbers are
written in binary as follows:
Decimal        Binary
Zero      0    0
One       1    1
Two       2    10
Three     3    11
Four      4    100
Five      5    101
Six       6    110
Seven     7    111
Eight     8    1000
Nine      9    1001
Ten       10   1010
Eleven    11   1011
Twelve    12   1100
The addition and multiplication tables for binary arithmetic are very small, and this makes it possible
to use logic circuits to build binary adders.
 + | 0   1          * | 0   1
---+------        ---+------
 0 | 0   1          0 | 0   0
 1 | 1  10          1 | 0   1
Thus, from the table above, when two binary digits, A and B are added, the carry bit is simply (A AND
B), while the last digit of the sum is more complicated; ((A AND NOT B) OR ((NOT A) AND B)) is
one way to express it.
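That expression is exactly a half adder, and it can be checked exhaustively in Python (the function name is ours, for illustration):

```python
def half_adder(a, b):
    """Sum and carry for two binary digits, built from AND, OR, NOT."""
    carry = a & b                          # carry bit is A AND B
    total = (a & ~b & 1) | (~a & 1 & b)    # (A AND NOT B) OR ((NOT A) AND B)
    return total, carry

for a in (0, 1):
    for b in (0, 1):
        print(a, '+', b, '->', half_adder(a, b))
# 1 + 1 -> (0, 1), i.e. sum digit 0 with carry 1: binary 10
```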
5.2 Number Systems
472.83 = (4 x 10^2) + (7 x 10^1) + (2 x 10^0) + (8 x 10^-1) + (3 x 10^-2)
In general, for the decimal representation of X = { ... x2 x1 x0 . x-1 x-2 x-3 ... },
X = sum over i of (xi x 10^i)
Binary Number System
e.g. 1101 1110 0001 . 1110 1101 = DE1.ED
Addition: add bit columns, carrying as needed. For example:

  0011        0101
 +0100       +0100
 -----       -----
  0111        1001 = overflow

Monitor the sign bit for overflow (the sign bit changes when adding two positive numbers or two negative numbers).
Subtraction: take the two's complement of the subtrahend, then add it to the minuend,
i.e. a - b = a + (-b)
So we only need addition and complement circuits
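This "subtract by adding the complement" rule is easy to verify in 4 bits with Python (function names are ours, for illustration):

```python
def twos_complement(x, bits=4):
    """Two's complement of x in a fixed number of bits."""
    return ((~x) + 1) & ((1 << bits) - 1)

def subtract(a, b, bits=4):
    """a - b computed as a + (-b): only an adder and a complementer needed."""
    return (a + twos_complement(b, bits)) & ((1 << bits) - 1)

print(subtract(0b0111, 0b0011))       # 7 - 3 -> 4
print(format(subtract(7, 3), '04b'))  # -> 0100
```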
5.4 Multiplication
Complex
Work out partial product for each digit
Take care with place value (column)
Add partial products
Multiplication Example
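The partial-product procedure above can be sketched as shift-and-add in Python (the function name is ours; one shifted partial product is added per 1 bit of the multiplier):

```python
def multiply(a, b):
    """Shift-and-add multiplication: one partial product per multiplier bit."""
    product = 0
    shift = 0
    while b:
        if b & 1:                  # this multiplier digit is 1:
            product += a << shift  # add the partial product, shifted into place
        b >>= 1
        shift += 1
    return product

print(multiply(0b1011, 0b1101))  # 11 x 13 -> 143
print(bin(multiply(11, 13)))     # -> 0b10001111
```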
5.5 Division
Example: 10010011 divided by 1011 (147 / 11 = 13 remainder 4):

               00001101      Quotient
          ___________
Divisor 1011 ) 10010011      Dividend
               1011
               ------
               001110        Partial
                 1011        remainders
               ------
               001111
                  1011
               ------
                   100       Remainder
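The long-division layout above corresponds to restoring shift-and-subtract division; a Python sketch (function name is ours) reproduces the same quotient and remainder:

```python
def divide(dividend, divisor, bits=8):
    """Restoring shift-and-subtract division, as in the worked example."""
    remainder, quotient = 0, 0
    for i in range(bits - 1, -1, -1):
        remainder = (remainder << 1) | ((dividend >> i) & 1)  # bring down a bit
        quotient <<= 1
        if remainder >= divisor:   # divisor "goes into" the partial remainder
            remainder -= divisor
            quotient |= 1
    return quotient, remainder

q, r = divide(0b10010011, 0b1011)  # 147 / 11
print(bin(q), bin(r))              # -> 0b1101 0b100 (13 remainder 4)
```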
Real Numbers
The relative magnitudes (order) of the numbers do not change. Can be treated as integers for
comparison.
5.6.1 Normalization
IEEE 754
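The IEEE 754 single-precision format stores a number as 1 sign bit, 8 exponent bits (with a bias of 127) and 23 fraction bits. The fields can be inspected in Python using the standard `struct` module (the function name is ours, for illustration):

```python
import struct

def float_bits(x):
    """Return the IEEE 754 single-precision fields of x:
    (sign, biased exponent, fraction)."""
    (raw,) = struct.unpack('>I', struct.pack('>f', x))  # raw 32-bit pattern
    sign = raw >> 31
    exponent = (raw >> 23) & 0xFF
    fraction = raw & 0x7FFFFF
    return sign, exponent, fraction

print(float_bits(1.0))   # -> (0, 127, 0): 1.0 = 1.0 x 2^(127-127)
print(float_bits(-2.5))  # -> (1, 128, 2097152): -1.25 x 2^(128-127)
```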
(101001)2
(.10111)2
(101010)2
(156)2
(41)10
(41)8
(0.6785)10
(0.6875)10
Q4. What is the result of adding the following two positive binary bit strings?
101101.101
+101100.0010
a).1000001.1110
b).1000001.1010
c).1000001.1000
d).1000001.1100
Q6. What is the best way to write the value '7654' and make it clear to
the reader that the number should be interpreted as a hexadecimal
value?
a). 0x7654
b). 7654B
c). 7654H
d). \7654
Q7.What is the numeric range of an eight bit unsigned binary number?
a). 0..7
b). 1..8
c). 0..255
d). 1..256
Q8. What coding format encodes a real number as a mantissa multiplied
by a power (exponent) of two?
a). Binary
b). excess notation
c). floating point
d). two's complement
Q9. Which of the following does not result from floating point math
operations?
a). Underflow
b). Overflow
c). Truncation
d). Two's complement
Q10. How many bits of information can each memory cell in a computer chip hold?
a). 0 bit
b). 1 bit
c). 8 bits
Answers
1.a ,2. a ,3.c, 4.d, 5.a, 6.a, 7.a, 8.a, 9.d, 10.c
Chapter 6
Input-Output Organization
INTRODUCTION
In computing, input/output, or I/O, refers to the communication between an information processing
system (such as a computer), and the outside world possibly a human, or another information
processing system. Inputs are the signals or data received by the system, and outputs are the signals or
data sent from it. The term can also be used as part of an action; to "perform I/O" is to perform an input
or output operation. I/O devices are used by a person (or other system) to communicate with a
computer. For instance, keyboards and mice are considered input devices of a computer, while
monitors and printers are considered output devices of a computer. Devices for communication
between computers, such as modems and network cards, typically serve for both input and output.
Note that the designation of a device as either input or output depends on the perspective. Mice and
keyboards take as input the physical movement that the human user outputs and convert it into signals that
a computer can understand. The output from these devices is input for the computer. Similarly, printers
and monitors take as input signals that a computer outputs. They then convert these signals into
representations that human users can see or read. (For a human user the process of reading or seeing
these representations is receiving input.)
In computer architecture, the combination of the CPU and main memory (i.e. memory that the CPU
can read and write to directly, with individual instructions) is considered the brain of a computer, and
from that point of view any transfer of information from or to that combination, for example to or from
a disk drive, is considered I/O. The CPU and its supporting circuitry provide memory-mapped I/O that
is used in low-level computer programming in the implementation of device drivers.
INPUT-OUTPUT ORGANIZATION
Peripheral Devices
Input-Output Interface
Modes of Transfer
Priority Interrupt
Input-Output Processor
Serial Communication
Computer keyboard
Keyer
Chorded keyboard
LPFK
6.1.2. Pointing devices
A pointing device is any human interface device that allows a user to input spatial data to a
computer. In the case of mice and touch screens, this is usually achieved by detecting movement across
a physical surface. Analog devices, such as 3D mice, joysticks, or pointing sticks, function by reporting
their angle of deflection. Movements of the pointing device are echoed on the screen by movements of
the cursor, creating a simple, intuitive way to navigate a computer's GUI.
6.1.3. High-degree of freedom input devices
Some devices allow many continuous degrees of freedom as input. These can be used as pointing
devices, but are generally used in ways that don't involve pointing to a location in space, such as the
control of a camera angle in 3D applications. These kinds of devices are typically used in
CAVEs, where input that registers all six degrees of freedom (6DOF) is required.
6.1.4. Imaging and Video input devices
Video input devices are used to digitize images or video from the outside world into the computer. The
information can be stored in a multitude of formats depending on the user's requirement.
Webcam
Image scanner
Fingerprint scanner
Barcode reader
3D scanner
Laser rangefinder
Computed tomography
Magnetic resonance imaging
Medical ultrasonography
Provides a method for transferring information between internal storage (such as memory and
CPU registers) and external I/O devices
Resolves the differences between the computer and peripheral devices
[Figure: I/O bus — each transfer carries a unit of information (data), a device address, and a function code (command).]
6.3.2. CONNECTION OF I/O BUS
6.3.2.1 Connection of I/O Bus to CPU
- Function code and sense lines are not needed (Transfer of data, control, and status information is
always via the common I/O Bus)
- Information in each port can be assigned a meaning depending on the mode of operation of the I/O
device.
Port A = Data; Port B = Command; Port C = Status
- CPU initializes (loads) each port by transferring a byte to the control register, allowing the
CPU to define the mode of operation of each port.
Programmable Port: by changing the bits in the control register, it is possible to change
the interface characteristics.
6.4.2.2. HANDSHAKING
Strobe Methods
1. Source-Initiated
The source unit that initiates the transfer has no way of knowing whether the destination unit
has actually received data
2. Destination-Initiated
The destination unit that initiates the transfer has no way of knowing whether the source has actually
placed the data on the bus. To solve this problem, the HANDSHAKE method introduces a second
control signal to provide a reply to the unit that initiates the transfer.
SOURCE-INITIATED TRANSFER USING HANDSHAKE
ASYNCHRONOUS RECEIVER-TRANSMITTER
Transmitter Register
- Accepts a data byte(from CPU) through the data bus
- Transferred to a shift register for serial transmission
Receiver
- Receives serial information into another shift register
- Complete data byte is sent to the receiver register
Status Register Bits
- Used for I/O flags and for recording errors
Control Register Bits
- Define the baud rate, the number of bits in each character, whether to generate and check
parity, and the number of stop bits.
FIRST-IN-FIRST-OUT (FIFO) BUFFER
* Input data and output data can flow at two different rates.
* Output data are always in the same order in which the data entered the buffer.
* Useful in some applications when data is transferred asynchronously.
* Control registers (flip-flops Fi) are associated with the data registers of the buffer.
MODES OF TRANSFER
Three different data transfer modes exist between the central computer (CPU or memory) and peripherals:
1. Program-Controlled I/O
2. Interrupt-Initiated I/O
3. Direct Memory Access (DMA)
Program-Controlled I/O (Input Device to CPU)
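Program-controlled input can be sketched as a busy-wait loop in which the CPU polls a status flag before each data transfer. This is an illustrative model only; the device here is a random stub, and all names are ours:

```python
import random

def programmed_io_read(n_bytes):
    """Program-controlled input sketch: the CPU busy-waits on the
    device's 'ready' flag before reading each byte."""
    def flag_set():                  # stand-in for reading a status port
        return random.random() < 0.3
    def read_data_register():        # stand-in for reading the data port
        return random.randrange(256)

    data = []
    for _ in range(n_bytes):
        while not flag_set():        # CPU is tied up polling the flag
            pass
        data.append(read_data_register())
    return data

print(len(programmed_io_read(4)))   # 4
```

The polling loop is exactly what interrupt-initiated I/O and DMA are designed to avoid: here the CPU does nothing useful while it waits.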
Direct Memory Access (DMA): an interface that provides I/O transfer of data directly to and from the memory and the I/O device.
- CPU initializes the DMA controller by sending a memory address and the number of words to be
transferred.
- Actual transfer of data is done directly between the device and memory through DMA controller.
-> Freeing CPU for other tasks
PRIORITY INTERRUPT
Priority
- Determines which interrupt is to be served first when two or more requests are made
simultaneously, and also determines which interrupts are permitted to interrupt the computer while
another is being serviced.
- Higher priority interrupts can make requests while servicing a lower priority interrupt .
Priority Interrupt by Software(Polling)
- Priority is established by the order of polling the devices(interrupt sources) and flexible since it is
established by software.
- Low cost since it needs very little hardware.
- Very slow.
Priority Interrupt by Hardware
Requires a priority interrupt manager which accepts all the interrupt requests and determines the
highest priority request. Fast, since the highest priority interrupt request is identified by the
hardware, and each interrupt source has its own interrupt vector for accessing its service
routine directly.
HARDWARE PRIORITY INTERRUPT - DAISY CHAIN
- An interrupt request from any device (one or more) -> CPU responds with INTACK <- 1.
- Any device that receives INTACK = 1 at its PI input puts its VAD on the bus.
- Among the interrupt-requesting devices, only the device physically closest to the CPU gets
INTACK = 1, and it blocks INTACK from propagating to the next device.
IST: indicates that an unmasked interrupt has occurred. INTACK enables a tristate bus buffer to load
the VAD generated by the priority logic.
Interrupt Register:
- Each bit is associated with an Interrupt Request from different Interrupt Source - different priority
level.
- Each bit can be cleared by a program instruction.
Mask Register:
- Mask Register is associated with Interrupt Register.
- Each bit can be set or cleared by an Instruction.
INTERRUPT CYCLE
At the end of each Instruction cycle :
- CPU checks IEN and IST.
* DMA controller - Interface which allows I/O transfer directly between Memory and Device, freeing
CPU for other tasks.
* CPU initializes DMA Controller by sending memory address and the block size (number of words) .
CPU bus signals for DMA transfer
Cycle Steal
- The CPU is usually much faster than I/O (DMA); the DMA controller therefore "steals" occasional
memory cycles from the CPU to transfer data.
6. Transmitter register
a).Accepts a data byte through data bus.
b). Accepts a data byte through memory bus.
c). Accepts a data byte through address bus.
7. The larger the RAM of a computer, the faster its processing speed, since it eliminates the
a). need for external memory
b). need for ROM
8. A group of signal lines used to transmit data in parallel from one element of a computer to
another is
a).Control Bus
b).Address Bus
c). Databus
d). Network
9. The basic unit within a computer store capable of holding a single unit of Data is
a). register
b).ALU
c).Control unit
b).Input device
c).Control unit
d).Output device
Answers
1.a, 2.b,3.c,4.b,5.d,6.a, 7.b,8.c,9.b,10.d
Chapter 7
Memory Organization
device beyond the capacity of main memory. The memory unit that communicates directly with the
CPU is called the MAIN MEMORY. Devices that provide backup storage are called AUXILIARY
MEMORY. The most common auxiliary devices are magnetic disks and tapes; they are used for storing
system programs, large data files and other backup information. Only programs and data currently
needed by the processor reside in main memory. All other information is stored in auxiliary
memory and transferred to main memory when needed.
The memory hierarchy system consists of all storage devices employed in a computer system,
from the slow but high-capacity auxiliary memory, to a relatively faster main memory, to an even
smaller and faster cache memory accessible to the high-speed processing logic. The goal of the memory
hierarchy is to obtain the highest possible access speed while minimizing the total cost of the memory system.
Memory Hierarchy in a computer system
A very high speed memory called cache memory is used to increase the speed of processing
by making current programs and data available to the CPU at a rapid rate. The cache memory is
employed in the system to compensate for the speed differential between main memory access time and
processor logic.
7.2 Main Memory
The main memory is the central storage unit in a computer system. It is a relatively large and fast
memory used to store programs and data during computer operation. The principal technology
used for main memory is based on semiconductor integrated circuits. Integrated-circuit RAM chips
are available in two possible operating modes, static and dynamic. The static RAM is easier to use and
has shorter read and write cycles.
The dynamic RAM offers reduced power consumption and larger storage capacity in a single memory
chip compared to static RAM.
7.2.1 RAM and ROM Chips
Most of the main memory in a general-purpose computer is made up of RAM integrated circuit chips, but
a portion of the memory may be constructed with ROM chips. Originally, RAM referred to
random-access memory, but now it is used to designate read/write memory, to distinguish it from
read-only memory (ROM), although ROM is also random access. RAM is used for storing the bulk of the programs
and data that are subject to change. ROMs are used for storing programs that are permanently
resident in the computer and for tables of constants that do not change in value once production of the
computer is completed. Among other things, the ROM portion is used to store an initial program
called a bootstrap loader, whose function is to start the computer's software
operating system. Since RAM is volatile, its contents are destroyed when power is turned off; the
contents of ROM, on the other hand, remain unchanged even after power is turned off and on again.
A table can be used to assign a range of memory addresses to each chip. The table, called a memory address map, is a pictorial representation
of the assigned address space for each chip in the system.
7.3.3 RAID
RAID is an acronym first defined by David A. Patterson, Garth A. Gibson and Randy Katz at the
University of California, Berkeley in 1987 to describe a Redundant Array of Inexpensive Disks a
technology that allowed computer users to achieve high levels of storage reliability from low-cost and
less reliable PC-class disk-drive components, via the technique of arranging the devices into arrays for
redundancy. More recently, marketers representing industry RAID manufacturers reinvented the term
to describe a Redundant Array of Independent Disks as a means of disassociating a "low cost"
expectation from RAID technology.
"RAID" is now used as an umbrella term for computer data storage schemes that can divide and replicate
data among multiple hard disk drives. The different Schemes/architectures are named by the word RAID
followed by a number, as in RAID 0, RAID 1, etc. RAID's various designs all involve two key design
goals: increased data reliability or increased input/output performance. When multiple physical disks are
set up to use RAID technology, they are said to be in a RAID array. This array distributes data across
multiple disks, but the array is seen by the computer user and operating system as one single disk. RAID
can be set up to serve several different purposes.
Purpose and basics: Redundancy is achieved by either writing the same data to multiple drives (known
as mirroring), or writing extra data (known as parity data) across the array, calculated such that the
failure of one (or possibly more, depending on the type of RAID) disks in the array will not result in loss
of data. A failed disk may be replaced by a new one, and the lost data reconstructed from the remaining
data and the parity data. Organizing disks into a redundant array decreases the usable storage capacity.
For instance, a 2-disk RAID 1 array loses half of the total capacity that would have otherwise been
available using both disks independently, and a RAID 5 array with several disks loses the capacity of one
disk. Other types of RAID arrays are arranged so that they are faster to write to and read from than a
single disk. There are various combinations of these approaches giving different trade-offs of protection
against data loss, capacity, and speed. RAID levels 0, 1, and 5 are the most commonly found, and cover
most requirements.
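The capacity arithmetic described above (mirroring halves the raw space; RAID 5 parity costs one disk's worth) can be sketched as follows. This is a simplification that assumes equal-size disks and ignores metadata overhead; the function name is ours:

```python
def usable_capacity(level, disk_gb, n_disks):
    """Usable capacity for a few common RAID levels (simplified sketch:
    equal-size disks assumed, metadata overhead ignored)."""
    if level == 0:
        return disk_gb * n_disks          # striping only: no redundancy
    if level == 1:
        return disk_gb * n_disks // 2     # mirroring: half the raw space
    if level == 5:
        return disk_gb * (n_disks - 1)    # parity costs one disk's capacity
    raise ValueError("unsupported RAID level")

print(usable_capacity(5, 500, 3))  # 1000 -> the "1TB" array from three 500GB disks
```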
RAID can involve significant computation when reading and writing information. With traditional
"real" RAID hardware, a separate controller does this computation. In other cases the operating system
or simpler and less expensive controllers require the host computer's processor to do the computing,
which reduces the computer's performance on processor-intensive tasks (see "Software RAID" and
"Fake RAID" below). Simpler RAID controllers may provide only levels 0 and 1, which require less
processing.
RAID systems with redundancy continue working without interruption when one (or possibly more,
depending on the type of RAID) disks of the array fail, although they are then vulnerable to further
failures. When the bad disk is replaced by a new one the array is rebuilt while the system continues to
operate normally. Some systems have to be powered down when removing or adding a drive; others
support hot swapping, allowing drives to be replaced without powering down. RAID with hot swapping is often used in high availability systems, where it is important that the system remains
running as much of the time as possible.
Principles: RAID combines two or more physical hard disks into a single logical unit by using either
special hardware or software. Hardware solutions often are designed to present themselves to the
attached system as a single hard drive, so that the operating system would be unaware of the technical
workings. For example, if you configure a 1TB RAID 5 array using three 500GB hard drives in
hardware RAID, the operating system is simply presented with a "single" 1TB disk. Software
solutions are typically implemented in the operating system and would present the RAID drive as a
single drive to applications running upon the operating system.
There are three key concepts in RAID: mirroring, the copying of data to more than one disk; striping,
the splitting of data across more than one disk; and error correction, where redundant data is stored to
allow problems to be detected and possibly fixed (known as fault tolerance). Different RAID levels use
one or more of these techniques, depending on the system requirements. RAID's main aim can be
either to improve reliability and availability of data, ensuring that important data is available more
often than not (e.g. a database of customer orders), or merely to improve the access speed to files (e.g.
for a system that delivers video on demand TV programs to many viewers).
7.4 Associative memory
Many data-processing applications require the search of items in a table stored in memory. The
established way to search a table is to store all items where they can be addressed in sequence. The
search procedure is a strategy for choosing a sequence of addresses, reading the content of memory at
each address, and comparing the information read with the item being searched for, until a match occurs.
The number of accesses to memory depends on the location of the item and the efficiency of the search
algorithm.
The time required to find the item stored in memory can be reduced considerably if stored data can be
identified for access by the content of the data itself rather than by an address. A memory unit accessed
by content is called an associative memory or content addressable memory (CAM).
Compare each word in CAM in parallel with the content of A(Argument Register)
- If CAM Word[i] = A, M(i) = 1
- Read sequentially accessing CAM for CAM Word(i) for M(i) = 1
- K (Key Register) provides a mask for choosing a particular field or key in the argument A (only
those bits of the argument that have 1s in their corresponding positions of K are compared).
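The masked compare can be sketched in software. This is a sequential stand-in for what the CAM hardware does on every word in parallel; the word values are hypothetical:

```python
def cam_search(words, argument, key_mask):
    """Masked compare, as in a CAM: word i matches when its bits agree
    with the argument in every position where the key mask is 1."""
    match = []
    for i, w in enumerate(words):                       # hardware does this in parallel
        if (w & key_mask) == (argument & key_mask):
            match.append(i)                             # M(i) = 1 for matching words
    return match

# 6-bit words; the mask selects the upper 3-bit field as the key
words = [0b101110, 0b011010, 0b101001]
print(cam_search(words, argument=0b101000, key_mask=0b111000))  # [0, 2]
```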
Organization of CAM
7.5 Cache memory
The cache is a small amount of high-speed memory, usually with a memory cycle time comparable to
the time required by the CPU to fetch one instruction. The cache is usually filled from main memory
when instructions or data are fetched into the CPU. Often the main memory will supply a wider data
word to the cache than the CPU requires, to fill the cache more rapidly. The amount of information
which is replaced at one time in the cache is called the line size for the cache. This is normally the
width of the data bus between the cache memory and the main memory. A wide line size for the cache
means that several instruction or data words are loaded into the cache at one time, providing a kind of
prefetching for instructions or data. Since the cache is small, the effectiveness of the cache relies on the
following properties of most programs:
Spatial locality -- most programs are highly sequential; the next instruction usually comes from
the next memory location.
Data is usually structured, and data in these structures normally are stored in contiguous
memory locations.
Short loops are a common program structure, especially for the innermost sets of nested loops.
This means that the same small set of instructions is used over and over.
Generally, several operations are performed on the same data values, or variables.
When a cache is used, there must be some way in which the memory controller determines whether the
value currently being addressed in memory is available from the cache. There are several ways that this
can be accomplished. One possibility is to store both the address and the value from main memory in
the cache, with the address stored in a type of memory called associative memory or, more
descriptively, content addressable memory.
An associative memory, or content addressable memory, has the property that when a value is
presented to the memory, the address of the value is returned if the value is stored in the memory,
otherwise an indication that the value is not in the associative memory is returned. All of the
comparisons are done simultaneously, so the search is performed very quickly. This type of memory is
very expensive, because each memory location must have both a comparator and a storage element. A
cache memory can be implemented with a block of associative memory, together with a block of
``ordinary'' memory. The associative memory would hold the address of the data stored in the cache,
and the ordinary memory would contain the data at that address. Such a cache memory might be
configured as shown in Figure.
An example of a 2-way set associative cache is shown in Figure , which shows a cache containing a
total of 2K lines, or 1 K sets, each set being 2-way associative. (The sets correspond to the rows in the
figure.)
Random -- the location for the value to be replaced is chosen at random from all n of the cache
locations at that index position. In a 2-way set associative cache, this can be accomplished with
a single modulo 2 random variable obtained, say, from an internal clock.
First in, first out (FIFO) -- here the first value stored in the cache, at each index position, is the
value to be replaced. For a 2-way set associative cache, this replacement strategy can be
implemented by setting a pointer to the previously loaded word each time a new word is stored
in the cache; this pointer need only be a single bit. (For set sizes > 2, this algorithm can be
implemented with a counter value stored for each ``line'', or index in the cache, and the cache
can be filled in a ``round robin'' fashion).
Least recently used (LRU) -- here the value which was actually used least recently is replaced.
In general, it is more likely that the most recently used value will be the one required in the
near future. For a 2-way set associative cache, this is readily implemented by setting a special
bit called the ``USED'' bit for the other word when a value is accessed while the corresponding
bit for the word which was accessed is reset. The value to be replaced is then the value with the
USED bit set. This replacement strategy can be implemented by adding a single USED bit to
each cache location. The LRU strategy operates by setting a bit in the other word when a value
is stored and resetting the corresponding bit for the new word. For an n-way set associative
cache, this strategy can be implemented by storing a modulo n counter with each data word. (It
is an interesting exercise to determine exactly what must be done in this case. The required
circuitry may become somewhat complex, for large n.)
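The single USED-bit scheme for a 2-way set can be sketched as an illustrative software model (not hardware; class and method names are ours):

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache with single-bit LRU:
    accessing one way marks the OTHER way as the replacement victim."""
    def __init__(self):
        self.tags = [None, None]
        self.victim = 0            # the USED bit: which way to replace next

    def access(self, tag):
        if tag in self.tags:                 # hit
            way = self.tags.index(tag)
        else:                                # miss: replace the LRU way
            way = self.victim
            self.tags[way] = tag
        self.victim = 1 - way                # the other way is now least recent
        return way

s = TwoWaySet()
s.access("A"); s.access("B"); s.access("A")
print(s.access("C"))   # 1: "B" (the least recently used tag) is replaced, not "A"
```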
Cache memories normally allow one of two things to happen when data is written into a memory
location for which there is a value stored in cache:
Write through cache -- both the cache and main memory are updated at the same time. This
may slow down the execution of instructions which write data to memory, because of the
relatively longer write time to main memory. Buffering memory writes can help speed up
memory writes if they are relatively infrequent, however.
Write back cache -- here only the cache is updated directly by the CPU; the cache memory
controller marks the value so that it can be written back into memory when the word is
removed from the cache. This method is used because a memory location may often be altered
several times while it is still in cache without having to write the value into main memory. This
method is often implemented using an ``ALTERED'' bit in the cache. The ALTERED bit is set
whenever a cache value is written into by the processor. Only if the ALTERED bit is set is it
necessary to write the value back into main memory (i.e., only values which have been altered
must be written back into main memory). The value should be written back immediately before
the value is replaced in the cache.
7.6 Virtual Memory Concept
In a memory hierarchy system, programs and data are first stored in auxiliary memory. Portions of a
program or data are brought into main memory as they are needed by the CPU. Virtual memory is a concept
used in some large computer systems that permits the user to construct programs as though a large
memory space were available, equal to the totality of auxiliary memory. Each address that is referenced
by the CPU goes through an address mapping from the so-called virtual address to a physical address in
main memory. Virtual memory is used to give the programmer the illusion that the system has a very large
memory, even though the computer actually has a relatively small main memory.
Address space and memory space are each divided into fixed-size groups of words called blocks
or pages (e.g. groups of 1K words).
A straightforward design uses an n-entry table in memory, but storage space utilization is inefficient,
since n - m entries of the table are empty.
A more efficient method is an m-entry page table, built from an associative memory of m words
holding (page number : block number) pairs.
[Figure: associative-memory page table. The virtual address (page number + line number) is placed in
the argument register; the key register masks the page-number field; the associative memory holds
(page number : block number) entries and returns the block number of the matching entry.]
Page Fault
1. Trap to the OS
2. Save the user registers and program state
3. Determine that the interrupt was a page fault
4. Check that the page reference was legal and determine the location of the page on the
backing store (disk)
Processor architecture should provide the ability to restart any instruction after a page fault.
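The lookup-or-trap behaviour can be sketched as follows. The 1K-word page size follows the example above; the page-table contents and function name are hypothetical:

```python
def translate(virtual_addr, page_table, page_size=1024):
    """Map a virtual address to a physical one via a page table;
    raise a 'page fault' when the page is not resident."""
    page, offset = divmod(virtual_addr, page_size)
    if page not in page_table:             # trap to the OS: page fault
        raise KeyError(f"page fault on page {page}")
    block = page_table[page]               # resident block number
    return block * page_size + offset

page_table = {0: 3, 2: 1}                  # page -> block, hypothetical contents
print(translate(2050, page_table))         # 1026: page 2, offset 2 -> block 1
```

In a real system the KeyError would be the trap that starts the page-fault sequence listed above.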
computer
c).User interface
d).All of the above
b)ROM
c).RAM
d).TAB MEMORY
b).keyboard
c). Disk
9. A micro computer has primary memory of 640k . What is the exact number of bytes contained in this
memory?
a). 64 x 1000
b). 640 x 100
c). 640 x 1024
d). either b or c
b).8192
c). 4000
d). 4096
Answers
1. a , 2.b , 3. a,4.c ,5.b ,6.d ,7. d ,8. c , 9. c,10.d
Chapter 8
Pipeline and Vector Processing
8.2 Pipelining
The output of a given segment is applied to the input register of the next segment. A clock is applied to all registers
after enough time has elapsed to perform all segment activity. In this way the information flows
through the pipeline one step at a time.
The pipeline organization will be demonstrated by means of a simple example. Suppose that we want
to perform the combined multiply and add operations with a stream of numbers.
Ai*Bi + Ci
for i = 1, 2, 3, . . . , 7
Each suboperation is to be implemented in a segment within a pipeline. Each segment has one or two
registers and a combinational circuit as shown in Figure 8.2. R1 through R5 are registers that receive
new data with every clock pulse. The multiplier and adder are combinational circuits. The
suboperations performed in each segment of the pipeline are as follows:
R1 <- Ai, R2 <- Bi        Input Ai and Bi
R3 <- R1*R2, R4 <- Ci     Multiply and input Ci
R5 <- R3 + R4             Add Ci to product
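A clock-by-clock software model of this three-segment pipeline (an illustrative sketch; the function name is ours) shows one result emerging per clock once the pipeline is full:

```python
def pipeline_mac(A, B, C):
    """Cycle-by-cycle sketch of the Ai*Bi + Ci pipeline: segment 1 latches
    (Ai, Bi), segment 2 forms the product and latches Ci, segment 3 adds.
    Registers update back-to-front, as they would on one clock edge."""
    seg1 = seg2 = seg3 = None          # register contents between segments
    results = []
    for t in range(len(A) + 3):
        if seg3 is not None:           # R5: a completed result leaves the pipe
            results.append(seg3)
        seg3 = seg2[0] + seg2[1] if seg2 else None                 # R5 <- R3 + R4
        seg2 = (seg1[0] * seg1[1], C[seg1[2]]) if seg1 else None   # R3 <- R1*R2, R4 <- Ci
        seg1 = (A[t], B[t], t) if t < len(A) else None             # R1 <- Ai, R2 <- Bi
    return results

A, B, C = [1, 2, 3], [4, 5, 6], [7, 8, 9]
print(pipeline_mac(A, B, C))   # [11, 18, 27] = Ai*Bi + Ci for each i
```

After the three-clock fill latency, each additional clock produces one more result, which is the whole point of the pipeline.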
Pipeline arithmetic units are usually found in very high speed computers. They are used to implement
floating-point operations, multiplication of fixed-point numbers, and similar computations encountered
in scientific problems. A pipeline multiplier is essentially an array multiplier as described in Fig. 10-10,
with special adders designed to minimize the carry propagation time through the partial products.
Floating-point operations are easily decomposed into suboperations, as demonstrated earlier. We will now
show an example of a pipeline unit for floating-point addition and subtraction.
Floating Point Arithmetic Pipeline:
a. Suitable for pipeline as it consists of series of steps
b. Can implement FP ADD algorithm using the following pipeline
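The four classic segments of the FP ADD pipeline (compare exponents, align, add, normalize) can be sketched on (fraction, exponent) pairs in decimal. This is a simplified illustration of the stages, not the binary hardware, and the function name is ours:

```python
def fp_add_pipeline(a, b):
    """The four segments of a floating-point add pipeline, on numbers
    represented as (fraction, exponent) with fraction in [0.1, 1) base 10."""
    (fa, ea), (fb, eb) = a, b
    # Segment 1: compare exponents (swap so that 'a' has the larger one)
    if ea < eb:
        (fa, ea), (fb, eb) = (fb, eb), (fa, ea)
    # Segment 2: align fractions -- shift the smaller operand right
    fb = fb / (10 ** (ea - eb))
    # Segment 3: add fractions
    f, e = fa + fb, ea
    # Segment 4: normalize the result (overflow case only, for brevity)
    while f >= 1.0:
        f, e = f / 10, e + 1
    return f, e

print(fp_add_pipeline((0.9504, 3), (0.8200, 2)))  # ~ (0.10324, 4)
```

In hardware, four different operand pairs occupy the four segments at once, so one sum is completed per clock after the pipeline fills.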
Consider a computer with an instruction fetch unit and an instruction execution unit designed to
provide a two-segment pipeline. The instruction fetch segment can be implemented by means of a first-in, first-out (FIFO) buffer. This is a type of unit that forms a queue rather than a stack. Whenever the
execution unit is not using memory, the control increments the program counter and uses its address
value to read consecutive instructions from memory. The instructions are inserted into the FIFO buffer
so that they can be executed on a first-in, first-out basis. Thus an instruction stream can be placed in a
queue, waiting for decoding and processing by the execution segment. The instruction stream queuing
mechanism provides an efficient way for reducing the average access time to memory for reading
instructions. Whenever there is space in the FIFO buffer, the control unit initiates the next instruction
fetch phase. The buffer acts as a queue from which control then extracts the instructions for the
execution unit.
Computers with complex instructions require other phases in addition to the fetch and
execute to process an instruction completely. In the most general case, the computer needs to process
each instruction with the following sequence of steps.
1. Fetch the instruction from memory.
2. Decode the instruction.
3. Calculate the effective address.
4. Fetch the operands from memory.
5. Execute the instruction.
6. Store the result in the proper place.
There are certain difficulties that will prevent the instruction pipeline from operating at its maximum
rate. Different segments may take different times to operate on the incoming information. Some
segments are skipped for certain operations. For example, a register mode instruction does not need an
effective address calculation. Two or more segments may require memory access at the same time,
causing one segment to wait until another is finished with the memory. Memory access conflicts are
sometimes resolved by using two memory buses for accessing instructions and data in separate
modules. In this way, an instruction word and a data word can be read simultaneously from two
different modules.
The design of an instruction pipeline will be most efficient if the instruction cycle is divided into
segments of equal duration. The time that each step takes to fulfill its function depends on the
instruction and the way it is executed.
Among the characteristics attributed to RISC is its ability to use an efficient instruction pipeline. The
simplicity of the instruction set can be utilized to implement an instruction pipeline using a small
number of suboperations, with each being executed in one clock cycle. Because of the fixed-length
instruction format, the decoding of the operation can occur at the same time as the register selection.
All data manipulation instructions have register-to register operations. Since all operands are in
registers, there is no need for calculating an effective address or fetching of operands from memory.
Therefore, the instruction pipeline can be implemented with two or three segments. One segment
fetches the instruction from program memory, and the other segment executes the instruction in the
ALU. A third segment may be used to store the result of the ALU operation in a destination register.
The data transfer instructions in RISC are limited to load and store instructions. These instructions use
register indirect addressing. They usually need three or four stages in the pipeline. To prevent conflicts
between a memory access to fetch an instruction and to load or store an operand, most RISC machines
use two separate buses with two memories: one for storing the instructions and the other for storing the
data. The two memories can sometimes operate at the same speed as the CPU clock and are referred to as
cache memories.
It is not possible to expect that every instruction be fetched from memory and executed in one clock
cycle. What is done, in effect, is to start each instruction with each clock cycle and to pipeline the
processor to achieve the goal of single-cycle instruction execution. The advantage of RISC over CISC
(complex instruction set computer) is that RISC can achieve pipeline segments, requiring just one
clock cycle, while CISC uses many segments in its pipeline, with the longest segment requiring two or
more clock cycles.
Another characteristic of RISC is the support given by the compiler that translates the high-level
language program into machine language program. Instead of designing hardware to handle the
difficulties associated with data conflicts and branch penalties, RISC processors rely on the efficiency
of the compiler to detect and minimize the delays encountered with these problems.
Without sophisticated computers, many of the required computations cannot be completed within a
reasonable amount of time. To achieve the required level of high performance it is necessary to utilize
the fastest and most reliable hardware and apply innovative procedures from vector and parallel
processing techniques.
Vector Operations
Many scientific problems require arithmetic operations on large arrays of numbers. These numbers are
usually formulated as vectors and matrices of floating-point numbers. A vector is an ordered set of a
one-dimensional array of data items. A vector V of length n is represented as a row vector by V = [V1
V2 V3 ... Vn]. It may be represented as a column vector if the data items are listed in a column. A
conventional sequential computer is capable of processing operands one at a time. Consequently,
operations on vectors must be broken down into single computations with subscripted variables. The
element Vi of vector V is written as V(I), and the index I refers to a memory address or register where
the number is stored. To examine the difference between a conventional scalar processor and a vector
processor, consider the following Fortran DO loop:
DO 20 I = 1, 100
20 C(I) = B(I) + A(I)
This is a program for adding two vectors A and B of length 100 to produce a vector C. This is
implemented in machine language by the following sequence of operations.
     Initialize I = 0
20   Read A(I)
     Read B(I)
     Store C(I) = A(I) + B(I)
     Increment I = I + 1
     If I ≤ 100 go to 20
     Continue
This constitutes a program loop that reads a pair of operands from arrays A and B and performs a
floating-point addition. The loop control variable is then updated and the steps repeat 100 times.
A computer capable of vector processing eliminates the overhead associated with the time it takes to
fetch and execute the instructions in the program loop. It allows operations to be specified with a single
vector instruction of the form
C(1:100) = A(1:100) + B(1:100)
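The scalar loop and the vector statement compute the same result. As a rough sketch in plain Python (used here only to contrast the two styles of expression, not to model vector hardware), the per-element loop mirrors the read/read/store/increment/branch sequence above, while the whole-array form corresponds to the single vector instruction:

```python
# Sample operands; any 100-element arrays would do.
A = [float(i) for i in range(1, 101)]        # 1.0 .. 100.0
B = [float(i) for i in range(100, 0, -1)]    # 100.0 .. 1.0

# Scalar form: one subscripted addition per loop pass, mirroring the
# read A(I) / read B(I) / store C(I) / increment / branch sequence.
C_scalar = [0.0] * 100
for i in range(100):
    C_scalar[i] = A[i] + B[i]

# Vector form: one whole-array expression, the software analogue of the
# single instruction C(1:100) = A(1:100) + B(1:100).
C_vector = [a + b for a, b in zip(A, B)]

print(C_scalar == C_vector)  # True
```

On vector hardware, the second form is not merely shorter source code: the loop control (increment, compare, branch) disappears entirely, which is the overhead the paragraph above refers to.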
Array Processors
An array processor is a processor that performs computations on large arrays of data. The term is used
to refer to two different types of processors. An attached array processor is an auxiliary processor attached to a
general-purpose computer. It is intended to improve the performance of the host computer in specific
numerical computation tasks. An SIMD array
processor is a processor that has a single-instruction multiple-data organization. It manipulates vector
instructions by means of multiple functional units responding to a common instruction. Although both
types of array processors manipulate vectors, their internal organization is different.
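The SIMD organization can be sketched as a toy model, assuming four processing elements each holding one element of the operand vectors in its own local memory (both the PE count and the data values are arbitrary choices for this sketch): the control unit broadcasts one instruction, and every PE applies it to its own operands simultaneously.

```python
# Toy SIMD model: one broadcast instruction, many processing elements,
# each operating on data in its own local memory.

class ProcessingElement:
    def __init__(self):
        self.local_memory = {}

    def execute(self, op, dst, src1, src2):
        # Every PE performs the *same* broadcast instruction on local data.
        if op == "ADD":
            self.local_memory[dst] = self.local_memory[src1] + self.local_memory[src2]

pes = [ProcessingElement() for _ in range(4)]

# Distribute one element of vectors A and B to each PE's local memory.
for i, pe in enumerate(pes):
    pe.local_memory["A"] = float(i + 1)          # A = [1, 2, 3, 4]
    pe.local_memory["B"] = 10.0 * (i + 1)        # B = [10, 20, 30, 40]

# The control unit broadcasts a single ADD instruction to all PEs at once.
for pe in pes:
    pe.execute("ADD", "C", "A", "B")

print([pe.local_memory["C"] for pe in pes])  # [11.0, 22.0, 33.0, 44.0]
```

The essential point the model captures is that there is one instruction stream but many data streams: adding a longer vector needs more (or re-used) PEs, not more instructions.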
Attached Array Processor
An attached array processor is designed as a peripheral for a conventional host computer, and its
purpose is to enhance the performance of the computer by providing vector processing for complex
applications. It achieves high performance by means of parallel processing with multiple functional
units. It includes an arithmetic unit containing one or more pipelined floating-point adders and
multipliers. The array processor can be programmed by the user to accommodate a variety of complex
arithmetic problems.
Attached array processor with host computer
Many computers have instruction sets that include more than 100 and sometimes even more than
200 instructions. These computers also employ a variety of data types and a large number of
addressing modes. The trend toward increasing computer hardware complexity was influenced by various
factors, such as upgrading existing models to provide more customer applications, adding
instructions that facilitate the translation from high-level language into machine language
programs, and striving to develop machines that move functions from software implementation
into hardware implementation. A computer with a large number of instructions is classified as a
complex instruction set computer, abbreviated CISC.
CISC Characteristics
1. A large number of instructions, typically from 100 to 250 instructions.
2. Some instructions that perform specialized tasks and are used infrequently.
3. A large variety of addressing modes, typically from 5 to 20 different modes.
4. Variable-length instruction formats.
5. Instructions that manipulate operands in memory.
RISC Characteristics
1. Relatively few instructions.
2. Relatively few addressing modes.
3. Memory access limited to load and store instructions.
4. All operations done within the registers of the CPU.
5. Fixed-length, easily decoded instruction format.
6. Single-cycle instruction execution.
7. Hardwired rather than microprogrammed control.
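RISC characteristic 5 can be illustrated with a hypothetical fixed 32-bit format, with an 8-bit opcode and three 8-bit register fields (field widths invented for this sketch, not taken from any real instruction set): because every field sits at a fixed bit position, decoding reduces to simple bit-slicing, with no need to first work out how long the instruction is.

```python
# Decoding a fixed-length instruction word is just fixed-position
# bit-slicing; compare this with variable-length CISC formats, where
# the opcode must be examined before the operand fields can be located.

def decode(word):
    """Split a 32-bit instruction word into (opcode, rd, rs1, rs2)."""
    opcode = (word >> 24) & 0xFF
    rd     = (word >> 16) & 0xFF
    rs1    = (word >> 8)  & 0xFF
    rs2    = word         & 0xFF
    return opcode, rd, rs1, rs2

# Example: opcode 0x01 (say, ADD), destination R3, sources R1 and R2.
word = (0x01 << 24) | (3 << 16) | (1 << 8) | 2
print(decode(word))  # (1, 3, 1, 2)
```

Fixed positions mean the register fields can even be read in parallel with opcode decoding, which is part of what makes single-cycle execution feasible.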
a) Operand forwarding
b) Delayed load
c) Hardware interlocks
d) Delayed branch
a) CISC
b) RISC
c) SIMD
d) MISD
Q3. Which pipeline is used to implement floating-point operations, multiplication of fixed-point
numbers, and similar computations encountered in scientific problems?
a) Instruction Pipeline
b) Arithmetic Pipeline
c) RISC Pipeline
d) CISC Pipeline
a) ALU
b) Processing Element
c) Local memory
d) Control Unit
d) Pipelining
Q7. Parallel processing can be achieved with the help of
a)
b)
c)
d)
3
4
5
2
5
6
3
4
Q10. A measure used to evaluate a supercomputer's ability to perform a given
number of floating-point operations per second is referred to as
a) Flops
b) Bytes
c) Bits
d) Hz
Answers
1. (a), 2. (b), 3. (b), 4. (d), 5. (a), 6. (d), 7. (a), 8. (c), 9. (d), 10. (a)