Digital Computer Fundamentals Unit III

Download as pdf or txt
Download as pdf or txt
You are on page 1of 23

Digital Computer

Fundamentals
Unit III
Digital Computer Fundamentals

Unit III
This book is a part of the course by uts, Pune.
This book contains the course content for Digital Computer Fundamentals.

© uts, Pune
First Edition 2011

The content in the book is copyright of uts. All rights reserved.


No part of the content may in any form or by any electronic, mechanical, photocopying, recording, or any other
means be reproduced, stored in a retrieval system or be broadcast or transmitted without the prior permission of
the publisher.

uts makes reasonable endeavours to ensure Content is current and accurate. uts reserves the right to alter it’s the
content whenever the need arises, and to vary it at any time without prior notice.

Published by
uts
Bavdhan, Pune - 411021

Website : www.utsglobal.edu.in
Tel : +91-20-41034800, +91 9011067684
Contents

Chapter V....................................................................................................................................................... 1
Central Processing Unit................................................................................................................................ 1
Aim................................................................................................................................................................. 1
Objectives....................................................................................................................................................... 1
Learning outcome......................................................................................................................................... 1
5.1 Introduction . ............................................................................................................................................ 2
5.2 CPU Operation.......................................................................................................................................... 2
5.3 Processor Organisation............................................................................................................................. 3
5.3.1 General Register....................................................................................................................... 3
5.3.2 Addressing Modes ................................................................................................................... 4
5.4 RISC, CISC and VLIW . .......................................................................................................................... 4
5.4.1 RISC ........................................................................................................................................ 4
5.4.2 VLIW........................................................................................................................................ 4
5.4.3 CISC.......................................................................................................................................... 5
5.5 Instruction Set........................................................................................................................................... 5
5.5.1 Code Density............................................................................................................................. 6
5.5.2 Number of Operands................................................................................................................. 6
Summary........................................................................................................................................................ 8
References...................................................................................................................................................... 8
Recommended Reading................................................................................................................................ 8

Chapter VI..................................................................................................................................................... 9
Introduction to Multiprocessor System...................................................................................................... 9
Aim................................................................................................................................................................. 9
Objectives....................................................................................................................................................... 9
Learning outcome......................................................................................................................................... 9
6.1 Introduction . .......................................................................................................................................... 10
6.2 Flynn’s Classification............................................................................................................................. 10
6.3 Characteristics of Multiprocessors...........................................................................................................11
6.3.1 Interprocessor Arbitration....................................................................................................... 12
6.3.2 Inter Processor Communication and Synchronisation............................................................ 12
6.3.3 Cache Coherence.................................................................................................................... 13
6.4 Parallel Processing.................................................................................................................................. 13
6.5 Types of Parallel Organisation................................................................................................................ 14
6.6 Pipelining................................................................................................................................................ 14
Summary...................................................................................................................................................... 17
References.................................................................................................................................................... 17
Recommended Reading.............................................................................................................................. 17
Video Links.................................................................................................................................................. 18

I/uts
Digital Computer Fundamentals

Chapter

V
Central Processing Unit

Aim
The aim of this chapter is to:

• introduce CPU organisation

• explain CPU operation

• elucidate processor organisation

• explain addressing modes

Objectives
The objectives of the chapter are to:

• enlist the number of operands

• classify CPU operation

• explain addressing modes

Learning outcome
At the end of this chapter, the students will be able to:

• enlist RISW, CISW and VLIW

• recognise addressing modes

• describe CPU organisation

1/uts
Digital Computer Fundamentals

5.1 Introduction
• The CPU is the brain of computer. Sometimes simply referred to as the central processor, CPU is more commonly
called “Processor”. The CPU is where most calculations take place. In terms of computing power, the CPU is
the most important element of a computer system.
• On large machines, CPUs require one or more printed circuit boards. On personal computers and small
workstations, the CPU is housed in a single chip called a microprocessor. Since the 1970’s, the microprocessor
class of CPUs has almost completely overtaken all other CPU implementations.
• The CPU itself is an internal component of the computer. Modern CPUs are small and square in shape and
contain multiple metallic connectors or pins on the underside. The CPU is inserted directly into a CPU socket,
pin side down, on the motherboard.
• Each motherboard will support only a specific type or range of CPU, so one must check the motherboard
manufacturer’s specifications before attempting to replace or upgrade a CPU. Modern CPUs also have an
attached heat sink and small fan that is located at the top of the CPU to help dissipate heat.

5.2 CPU Operation


The fundamental operation of most CPUs, regardless of the physical form they take, is to execute a sequence
of stored instructions called a program. The program is represented by a series of numbers that are kept in
some kind of computer memory. There are four steps that nearly all CPUs use in their operation: fetch, decode,
execute, and write back.

Fetch:
• This involves retrieving an instruction (which is represented by a number or sequence of numbers) from program
memory. The location in program memory is determined by a program counter (PC), which stores a number
that identifies the current position in the program.
• In other words, the program counter keeps a track of the CPU’s place in the program. After an instruction is
fetched, the PC is incremented by the length of the instruction word in terms of memory units.
• Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall
while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches
and pipeline architectures.
• The instruction that the CPU fetches from memory is used to determine what the CPU is to do.

Decode:
• In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU.
• The way in which the numerical instruction value is interpreted is defined by the CPU’s instruction set
architecture (ISA). Often, one group of numbers in the instruction, called the opcode, indicates which operation
to perform.
• The remaining parts of the number usually provide information required for that instruction, such as operands
for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a
place to locate a value: a register or a memory address, as determined by some addressing mode.
• In older designs, the portions of the CPU responsible for instruction decoding, were unchangeable hardware
devices. However, in more abstract and complicated CPUs and ISAs, a micro program is often used to assist in
translating instructions into various configuration signals for the CPU.
• This micro program is sometimes rewritable, so that it can be modified to change the way the CPU decodes
instructions even after it has been manufactured.

2/uts
Digital Computer Fundamentals

Execute:
• After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU
are connected, so they can perform the desired operation. If, for instance, an addition operation was requested,
an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs.
• The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the
circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations).
If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags
register may also be set.

Write:
• The final step, write back, simply “writes back” the results of the execute step to some form of memory. Very
often, the results are written to some internal CPU register for quick access by subsequent instructions.
• In other cases, results may be written to slower, but cheaper and larger, main memory. Some types of instructions
manipulate the program counter rather than directly produce result data. These are generally called “jumps”
and facilitate behaviour like loops, conditional program execution (through the use of a conditional jump), and
functions in programs.
• Many instructions will also change the state of digits in a “flags” register. These flags can be used to influence
how a program behaves, since they often indicate the outcome of various operations. For example, one type of
“compare” instruction considers two values and sets a number in the flags register according to which one is
greater. This flag could then be used by a later jump instruction to determine program flow.
• After the execution of the instruction and write back of the resulting data, the entire process repeats, with the
next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in
the program counter.
• If the completed instruction was a jump, the program counter will be modified to contain the address of the
instruction that was jumped to, and program execution continues normally.
• In more complex CPUs than the ones described here, multiple instructions can be fetched, decoded, and executed
simultaneously. This section describes what generally is referred to as the “Classic RISC pipeline”, which in
fact is quite common among the simple CPUs used in many electronic devices (often called microcontroller).
• It largely ignores the important role of CPU cache, and therefore the access stage of the pipeline.

5.3 Processor Organisation


The two main components of processor organisation are:

5.3.1 General Register


• The internal processor memory of a CPU is served by its registers. One of the key differences among various
computers is the difference in their register sets. Some computers have very large, while some has smaller
sets.
• But on the whole, from a user’s point of view, the register set can be classified under two basic categories.
‚‚ Programmer visible registers-These registers can be used by machine or assembly language programmers
to minimise the references to main memory.
‚‚ Status control and registers-These registers can not be used by the programmers but are used to control the
CPU or the execution of a program.
• Different vendors have used some of these registers interchangeably; therefore one should not stick to these
definitions rigidly.

3/uts
Digital Computer Fundamentals

5.3.2 Addressing Modes


• Addressing modes are the aspects of the instruction set architecture in most central processing unit (CPU)
designs.
• The various addressing modes that are defined in a given instruction set architecture define how machine language
instructions in that architecture identify the operand (or operands) of each instruction.
• An addressing mode specifies how to calculate the effective memory address of an operand by using information
held in registers and/or constants contained within a machine instruction or elsewhere.
• In computer programming, addressing modes are primarily of interest to compiler writers and to those who
write code directly in assembly language.

5.4 RISC, CISC and VLIW


RISC, CISC and VLIW design classifications are the modern equivalents of the micro, mini and supermini, scheme
of the old.

5.4.1 RISC
• Reduced instruction set computing or RISC, is a CPU design strategy based on the insight that simplified
(as opposed to complex) instructions can provide higher performance, if this simplicity enables much faster
execution of each instruction.
• A computer based on this strategy is a reduced instruction set computer (also RISC). There are many proposals
for precise definitions, but the term is slowly being replaced by the more descriptive load-store architecture.
• Well known RISC families include DEC Alpha, AMD 29k, ARC, ARM, Atmel AVR, MIPS, PA-RISC, Power
(including PowerPC), SuperH, and SPARC.
• Some aspects attributed to the first RISC-labeled designs around 1975 include the observations that the memory-
restricted compilers of the time were often unable to take advantage of features intended to facilitate manual
assembly coding, and that complex addressing modes take many cycles to perform due to the required additional
memory accesses.
• It was argued that such functions would be better performed by sequences of simpler instructions if this could
yield implementations small enough to leave room for many registers, reducing the number of slow memory
accesses.
• In these simple designs, most instructions are of uniform length and similar structure, arithmetic operations are
restricted to CPU registers and only separate load and store instructions access memory.
• These properties enable a better balancing of pipeline stages than before, making RISC pipelines significantly
more efficient and allowing higher clock frequencies.

5.4.2 VLIW
• Very long instruction word or VLIW refers to a CPU architecture designed to take advantage of instruction
level parallelism (ILP).
• A processor that executes every instruction one after the other (i.e., a non-pipelined scalar architecture) may
use processor resources inefficiently, potentially leading to poor performance.
• The performance can be improved by executing different sub-steps of sequential instructions simultaneously (this
is pipelining), or even executing multiple instructions entirely simultaneously as in superscalar architectures.
• Further improvement can be achieved by executing instructions in an order different from the order they appear
in the program; this is called out-of-order execution.
• As often implemented, these three techniques all come at a cost: increased hardware complexity. Before executing
any operations in parallel, the processor must verify that the instructions do not have interdependencies.
• For example, a result of first instruction is used as an input for the second instruction. Clearly, they cannot
execute at the same time, and the second instruction can’t be executed before the first.

4/uts
Digital Computer Fundamentals

• Modern out-of-order processors have increased the hardware resources which do the scheduling of instructions
and determining of interdependencies.
• The VLIW approach, on the other hand, executes operations in parallel based on a fixed schedule determined
when programs are compiled. Since determining the order of execution of operations (including which operations
can execute simultaneously) is handled by the compiler, the processor does not need the scheduling hardware
that the three techniques described above require.
• As a result, VLIW CPUs offer significant computational power with less hardware complexity (but greater
compiler complexity) than is associated with most superscalar CPUs.
• As is the case with any novel architectural approach, the concept is only as useful as code generation makes
it. That is, the fact that a number of special-purpose instructions are available to facilitate certain complicated
operations—say, Fast Fourier Transform (FFT) computation or certain calculations that recur in tomography
contexts—is useless if compilers are unable to spot relevant source code constructs and generate target code
that duly utilises the CPU’s advanced offerings.
• A fortiori, the programmer must be able to express his algorithms in a manner that makes the compiler’s task.

5.4.3 CISC
• A complex instruction set computer (CISC), is a computer where single instructions can execute several low-
level operations (such as a load from memory, an arithmetic operation, and a memory store) and/or are capable
of multi-step operations or addressing modes within single instructions.
• The term was retroactively coined in contrast to reduced instruction set computer (RISC).

5.5 Instruction Set


• Any given instruction set can be implemented in a variety of ways. All ways of implementing an instruction set
give the same programming model, and they all are able to run the same binary executables.
• The various ways of implementing an instruction set give different tradeoffs between cost, performance, power
consumption, size, etc.
• When designing micro architectures, engineers use blocks of “hard-wired” electronic circuitry (often designed
separately) such as adders, multiplexers, counters, registers, ALUs etc.
• Some kind of register transfer language is then often used to describe the decoding and sequencing of each
instruction of an ISA using this physical micro architecture.
• There are two basic ways to build a control unit to implement this description (although many designs use
middle ways or compromises):
‚‚ Early computer designs and some of the simpler RISC computers “hard-wired” the complete instruction
set decoding and sequencing (just like the rest of the micro architecture).
‚‚ Other designs employ microcode routines and/or tables to do this—typically as on chip ROMs and/or PLAs
(although separate RAMs have been used historically).
• There are also some new CPU designs which compile the instruction set to a writable RAM or FLASH inside
the CPU or an FPGA (reconfigurable computing). The Western Digital MCP-1600 is an older example, using
a dedicated, separate ROM for microcode.
• An ISA can also be emulated in software by an interpreter. Naturally, due to the interpretation overhead, this is
slower than directly running programs on the emulated hardware, unless the hardware running the emulator is
an order of magnitude faster. Today, it is common practice for vendors of new ISAs or micro architectures to
make software emulators available to software developers before the hardware implementation is ready.
• Often the details of the implementation have a strong influence on the particular instructions selected for the
instruction set.
• For example, many implementations of the instruction pipeline only allow a single memory load or memory store
per instruction, leading to load-store architecture (RISC). In another example, some early ways of implementing
the instruction pipeline led to a delay slot.

5/uts
Digital Computer Fundamentals

5.5.1 Code Density


• In early computers, program memory was expensive, so minimising the size of a program to make sure it would
fit in the limited memory was often central. Thus, the combined size of all the instructions needed to perform a
particular task, the code density, was an important characteristic of any instruction set.
• Computers with high code density also often had (and have still) complex instructions for procedure entry,
parameterised returns, loops etc. (therefore retroactively named as Complex Instruction Set Computers,
CISC).
• However, more typical or frequent “CISC” instructions merely combine a basic ALU operation, such as “add”,
with the access of one or more operands in memory (using addressing modes such as direct, indirect, indexed
etc.). Certain architectures may allow two or three operands (including the result) directly in memory or may
be able to perform functions such as automatic pointer increment etc.
• Software-implemented instruction sets may have even more complex and powerful instructions.
• Reduced instruction-set computers, RISC, were first widely implemented during a period of rapidly-growing
memory subsystems and sacrifice code density in order to simplify implementation circuitry and thereby try to
increase performance via higher clock frequencies and more registers.
• RISC instructions typically perform only a single operation, such as an “add” of registers or a “load” from
a memory location into a register; they also normally use a fixed instruction width, whereas a typical CISC
instruction set has many instructions shorter than this fixed length.
• Fixed-width instructions are less complicated to handle than variable-width instructions for several reasons and
are therefore somewhat easier to optimise for speed. However, as RISC computers normally require more and
often longer instructions to implement a given task, they inherently make less optimal use of bus bandwidth
and cache memories.
• Minimal instructions set computers (MISC) are a form of stack machine, where there are few separate instructions
(16-64), so that multiple instructions can be fit into a single machine word.
• This often takes little silicon to implement, so they can be easily realised in an FPGA or in a multi-core form.
Code density is similar to RISC; the increased instruction density is offset by requiring more of the primitive
instructions to do a task.

5.5.2 Number of Operands


• Instruction sets may be categorised by the maximum number of operands explicitly specified in instructions.
• 0-operand (zero address machines), so called “stack machines”: All arithmetic operations take place using the
top one or two positions on the stack; 1-operand push and pop instructions are used to access memory: push
a, push b, add, pop c.
• 1-operand: One address machines, called “accumulator machines”, include most early computers and many
small microcontrollers.
• Most instructions specify a single explicit right operand (a register, a memory location, or a constant) with the
implicit accumulator as both destination and left (or only) operand: load a, add b, store c. A related class is
practical stack machines which often allow a single explicit operand in arithmetic instructions: push a, add b,
pop c.
• 2-operand: Many CISC and RISC machines fall under this category:
‚‚ CISC: load a,reg1; add reg1,b; store reg1,c
‚‚ RISC: Requiring explicit memory loads, the instructions would be: load a,reg1; load b,reg2; add reg1,reg2;
store reg2,c
• 3-operand: Allows better reuse of data:
‚‚ CISC: It becomes either a single instruction: add a,b,c, or more typically: move a,reg1; add reg1,b,c as
most machines are limited to two memory operands.
‚‚ RISC: Due to the large number of bits needed to encode three registers, this scheme is typically not
available in RISC processors using small 16-bit instructions; arithmetic instructions use registers only,

6/uts
Digital Computer Fundamentals

so explicit 2-operand load/store instructions are needed: load a,reg1; load b,reg2; add reg1+reg2->reg3;
store reg3,c;
• More operands: Some CISC machines permit a variety of addressing modes that allow more than 3 operands
(registers or memory accesses), such as the VAX “POLY” polynomial evaluation instruction.
• Each instruction specifies some number of operands (registers, memory locations, or immediate values)
explicitly.
• Some instructions give one or both operands implicitly, such as by being stored on top of the stack or in an
implicit register.
• When some of the operands are given implicitly, the number of specified operands in an instruction is smaller
than the parity of the operation.
• When a “destination operand” explicitly specifies the destination, the number of operand specifies in an
instruction is larger than the parity of the operation. Some instruction sets have different numbers of operands
for different instructions.

7/uts
Digital Computer Fundamentals

Summary
• The chapter explains central processing unit as the brain of the computer.
• The instruction that the CPU fetches from memory is used to determine what the CPU is to do.
• The internal processor memory of a CPU is served by its registers.
• Addressing modes are aspects of the instruction set architecture in most central processing unit (CPU)
designs.
• Reduced instruction set computing, or RISC, is a CPU design strategy based on the insight that simplified
instructions can provide higher performance if this simplicity enables much faster execution of each
instruction.
• Very long instruction word or VLIW refers to a CPU architecture designed to take advantage of instruction
level parallelism.
• A complex instruction set computer (CISC), is a computer where single instructions can execute several low-
level operations and/or are capable of multi-step operations or addressing modes within single instructions.
• The various ways of implementing an instruction set give different tradeoffs between cost, performance, power
consumption, size, etc.
• Instruction sets may be categorised by the maximum number of operands explicitly specified in instructions.

References
• Katevenis, G.H., 2007, Reduced instruction set computer architectures for VLSI, MIT Press.
• Godse, D.A., Godse, A. P., 2006, Computer Organisation and Architecture, Technical Publications.
• Maini, A., 2007, Digital electronics: principles, devices and applications, John Wiley and Sons.

Recommended Reading
• Warford, J. S., Computer Systems, Jones & Bartlett publisher, 4th ed.
• Patterson, D., A., 2008, Computer Organization and Design, Fourth Edition: The Hardware/Software
Interface,
• White, R., 2007, How Computers Work, Que.

8/uts
Digital Computer Fundamentals

Chapter

VI
Introduction to Multiprocessor System

Aim
The aim of this chapter is to:

• understand multiprocessor system

• study Flynn’s classification

• comprehend pipelining

Objectives
The objectives of this chapter are to:

• understand the characteristics of multiprocessor

• classify multi-port memory

• explain inter processor arbitration

Learning outcome
At the end of this chapter, the students will be able to:

• enlist the types of parallel organisation

• explain the characteristics of multiprocessor

• describe parallel processing

9/uts
Digital Computer Fundamentals

6.1 Introduction
• The computers can be made faster by increasing their clock speed or by using improved VLSI technology. But
there is a practical limitation to both of them. The only way that seems possible to theoretically increase the
speed in an unlimited fashion is to have multiple modules, that is, split the original problems into independent
modules and execute them simultaneously.
• When more number of processors do the job simultaneously, the speed also is more. “Parallel processing” is a
term used to denote a large class of techniques that are used to provide simultaneous data processing tasks for
the purpose of increasing the computational speed of a computer.
• Most basic parallel processing can be seen in uniprocessors (the systems with one CPU) as well. The simplest
parallel processing is the parallel transfer of data from one module to another in the computer. Instead of
transferring the data bit by bit, 8/16/32 bit are transferred simultaneously through the bus.
• Various cases of instruction execution also take place simultaneously. For example, while one instruction is being
executed in the 8088 based PC, the next instruction is fetched and kept in the queue, ready for execution. In a
similar manner, CPU and I/O operations are also overlapped to increase the overall speed of the computer.

6.2 Flynn’s Classification


There are varieties of ways in which parallel processing can be classified. One of the most popular classifications
of parallel computer is by M.J. Flynn, called Flynn’s classification, and it is based on the multiplicity of instruction
streams and data streams in a computer system. The sequence of instructions read from the memory constitutes the
instruction steam, and the data they operate of in the processor constitutes the data stream. Flynn’s classification
divides the computers into four categories:
‚‚ single instruction single data (SISD)
‚‚ single instruction multiple data (SIMD)
‚‚ multiple instruction single data (MISD)
‚‚ multiple instruction multiple data (MIMD)
• SISD organisation is available in most serial computers today. Instructions are executed sequentially, though
may be overlapped in their execution stage (pipelining). All the functional units are under the supervision of
one control unit. The Von Neumann architecture falls under this category.
• SIMD organisation consists of multiple processing elements supervised by the same control unit. All processing
units receive the same instruction broadcast but works or different data streams. One of the very common example
is the execution of “For Loop”, in which the same set of instructions are executed for, may be a different set
of data.

Control
(Instruction Stream)

Proc 1 Proc 2 Proc n

Memory
(Data Stream)

Fig. 6.1 SIMD

10/uts
Digital Computer Fundamentals

• MISD organisation consists of N processor units, each working on a different set of instructions but working on
the same set of data. The output of one unit becomes the input to the other unit. There is a commercial computer
of this kind.

(Instruction Streams)
Control 1 Control 2 Control n

Proc 1 Proc 2 Proc n

Memory
(Data Stream)

Fig. 6.2 MISD

• MIMD organisation implies interactions between N processors because all memory streams are derived from the
same data stream shared by all processors. If the interaction between the processors is high, it is called a tightly
coupled system, or else it is called a loosely coupled system. Most multiprocessors fit into this category.

(Instruction Streams)
Control 1 Control 2 Control n

Proc 1 Proc 2 Proc n

Memory
(Data Stream)

Fig 6.3 MIMD

6.3 Characteristics of Multiprocessors


• A multiprocessor system is an interconnection of two or more CPUs, with memory and input-output equipment.
As defined earlier, multiprocessors can be put under MIMD category.
• The term multiprocessor is some times confused with the term multicomputer. Though both of them support
concurrent operations, there is an important difference between a system with multiple computers and a system
with multiple processors. In a multicomputer system, there are multiple computers with their own operating
systems, which communicate with each other, if needed, through communication links.
• A multiprocessor system, on the other hand, is controlled by a single operating system, which coordinates the
activities of the various processors, either through shared memory or through interprocessor messages.

11/uts
Digital Computer Fundamentals

• The advantages of multiprocessor systems are:


‚‚ increased reliability because of redundancy in processors
‚‚ increased throughput because of execution of - multiple jobs in parallel - portions of the same job in
parallel
• A single job can be divided into independent tasks, either manually by the programmer, or by the compiler,
which finds the portions of the program that are data independent, and can be executed in parallel.
• The multiprocessors are further classified into two groups, depending on the way their memory is organised.
The processors with shared memory are called tightly coupled or shared memory processors.
• The information in these processors is shared through the common memory. Each of the processors can also
have their local memory too.
• The other class of multiprocessors is loosely coupled or distributed memory multiprocessors. In this, all the
processors have their own private memory, and they share information with each other through interconnection
switching scheme or message passing.
• The principal characteristic of a multiprocessor is its ability to share a set of main memory and some I/O
devices. This sharing is possible through some physical connections between them called the interconnection
structures.

6.3.1 Interprocessor Arbitration


• Computer system needs buses to facilitate the transfer of information between its various components. For
example, even in a uniprocessor system, if the CPU has to access a memory location, it sends the address of
the memory location on the address bus. This address activates a memory chip.
• The CPU then sends a red signal through the control bus, in response of which the memory puts the data on the
address bus. This address activates a memory chip. The CPU then sends a read signal through the control bus,
in response of which the memory puts the data on the data bus. Similarly, in a multiprocessor system, if any
processor has to read a memory location from the shared area, it follows the similar routine.
• There are buses that transfer data between the CPUs and memory. These are called memory buses. An I/O
bus is used to transfer data to and from input and output devices. A bus that connects major components in a
multiprocessor system, such as CPUs, I/Os, and memory is called system bus.
• A processor, in a multiprocessor system, requests the access of a component through the system bus. If there
is no processor accessing the bus at that time, it is then given the control of the bus immediately. If there is a
second processor utilising the bus, then this processor has to wait for the bus to be freed.
• If at any time, there is a request for the services of the bus by more than one processor, then the arbitration is
performed to resolve the conflict. A bus controller is placed between the local bus and the system bus to handle
this.

6.3.2 Inter Processor Communication and Synchronisation


• In a multiprocessor system, it becomes very necessary, that there is a proper communication protocol between
the various processors. In a shared memory multiprocessor system, a common area in the memory is provided,
in which all the messages that need to be communicated to other processors are written.
• A proper synchronisation is also needed whenever there is a race of two or more processors for shared resources
like I/O resources. The operating system in this case is given the task of allocating the resources to the various
processors in a way that at any time not more than one processor uses the resource.
• A very common problem can occur when two or more resources are trying to access a resource, can be modified.
For example, processor 1 and 2 are simultaneously trying to access memory location 100. Processor 1 is writing
on to the location while processor 2 is reading it.
• The chances are that, processor 2 will end up reading erroneous data. Such kind of resources which need to
be protected from simultaneous access of more than one processors are called critical sections. The following
assumptions are made regarding the critical sections:

12/uts
Digital Computer Fundamentals

At most one processor can be in a critical section at a


Mutual exclusion
time.
Termination The critical section is executed in a finite time.
A process attempting to enter the critical section will
Fair scheduling
eventually do so in a finite time.

Table 6.1 Critical sections

• A binary value called a semaphore is usually used to indicate whether a processor is currently executing the
critical section.

6.3.3 Cache Coherence


• Cache memories are high speed buffers which are inserted between the processor and the main memory to
capture those portions of the contents of main memory which are currently in use. These memories are five to
ten times faster than main memories and therefore reduce the overall access time. In a multiprocessor system,
with shared memory, each processor has its own set of private cache.
• Multiple copies of the cache are provided with each processor to reduce the access time. Each processor, whenever
accesses the shared memory, also updates its private cache. This introduced the problem of cache coherence,
which may result in data inconsistency. That is, several copies of the same data may exist in different caches
at any given time.
• For example, let us assume there are two processors x and y. Both have the same copy of the cache. Processor x
produces data ‘a’, which is to be consumed by processor y. Processor updates the value of ‘a’ in its own private
copy of the cache. As it does not have any access to the private copy of cache of processor y, the processor y
continues to use the variable ‘a’ with old value, unless it is informed of the change.

6.4 Parallel Processing


• Parallel processing means simultaneous use of more than one CPU or processor core to execute a program or
multiple computational threads. Ideally, parallel processing makes programs run faster because there are more
engines (CPUs or cores) running it.
• In practice, it is often difficult to divide a program in such a way that separate CPUs or cores can execute different
portions without interfering with each other. Most computers have just one CPU, but some models have several,
and multi-core processor chips are becoming the norm.
• There are even computers with thousands of CPUs. With single CPU, single-core computers; it is possible to
perform parallel processing by connecting the computers in a network.
• However, this type of parallel processing requires very sophisticated software called distributed processing
software.
• Parallel processors can be categorised into several categories. These include:
‚‚ Array processors
‚‚ Distributed architecture
‚‚ Multiprocessors
‚‚ Data flow architectures
• Array processors are parallel architectures which deal with repetitive operations. These are the examples of
SIMD architecture, with applications varying from mathematical array operations and in structures in which
data objects are known well in advance.
• Distributed architecture are composed of relatively autonomous subsystems which are capable of handling
complete system and execution functions, and which cooperate together to run a large application.

13/uts
Digital Computer Fundamentals

• Multiprocessors are architectures composed of multiple computing units which operate in a synchronous mode.
Generally, both shared and local memory can be available for these processors. The communication between
the processors takes place either through the shared memory area or through the interprocessor messages.
• Dataflow architectures are functionally distributed architectures, in which the operations are triggered with the
arrival of data at these processors. They may be viewed as very general MIMD parallel architectures.

6.5 Types of Parallel Organisation


The types of parallel organisation are as follows:
Time sharing bus
• An extensible time-sharing bus structure is provided for a microprocessor to read data from, or write data to
a memory. An address/data bus transfers addresses and data between the microprocessor and the memory in a
time sharing manner.
• The combination of the logic levels of two control lines is used to determine that the address/data bus is utilised
to transfer addresses, to read data or to write data.
• Thus, the pin number required in the bus interface is reduced, and the memory capacity can be increased
flexibly.

Multiport memory
• A multiport memory has a RAM port includes a memory cell array having a plurality of memory cells arranged
in a matrix form.
• It has a sense amplifier circuit for sensing potential of a bit line after the storage potential has been transferred
from the memory cells and restore circuit connected to the bit line for pulling up the potential of the bit line at
the predetermined timing after sense operation has been started.
• It has a barrier circuit connected between the bit line and the sense amplifier circuit; and a SAM port including
a data register, transfer gate and functional means for transferring serial data in the column direction.
• In this memory, the RAM port is connected to the SAM port by the transfer gate with the bit line directly
connected to the data register, and the potentials at the bit line are amplified by the sense amplifier circuit and
are directly transferred to the data register.

6.6 Pipelining
The basic idea behind pipeline design is quite natural; it is not specific to computer technology. In fact, the name
pipeline stems from the analogy with petroleum pipelines, in which a sequence of hydrocarbon products is pumped
through a pipeline.
• The last product might well be entering the pipeline before the first product has been removed from the terminus.
The key contribution of pipelining is that it provides a way to start a new task before an old one has been
completed. The concept can be better understood by taking an example of an industrial plant.
• To achieve pipelining, one must subdivide the input task into a sequence of subtasks, each of which can be
executed by a specialised hardware stage that operates concurrently with other stages in the hardware. Successive
tasks are streamed into the pipe and get executed in an overlapped fashion at the subtask level. Hence, the
completion rate is not a function of the total processing time, but rather of how soon a new process can be
introduced.
• The subdivision of labour in assembly lines has contributed to the success of mass production in modern industry.
By the same token, pipeline processing has led to the tremendous improvement of system throughput in the
modern digital computers.

14/uts
Digital Computer Fundamentals

• Now consider that for execution of an instruction, a computers performs N processing steps. A server unit as
represented in fig. 6.4 (a) can perform any one of the N steps in one unit time. If we process M instructions by
this server, then the rate of completion is one instruction of every N steps, and the time behaviour of the job
stream is as described in fig. 6.4 (a). Compare fig. 6.4 (a) with fig. 6.4 (b) which shows N servers concatenated
in a sequence each performing only a single step flows through the collection of servers by visiting server 1,
then server 2, and so on, and finally emerging from server N after N steps.
• The time behaviour of this system is shown in fig. 6.4 (b).
• Fig. 6.4 (b) is an ideal model of a constant speed assembly line, such as an automobile assembly plant.

N stpes N stpes
Job stream Server One completion every N steps
N units job (N) units
Time
(a) Sequential execution with N-unit server

1 2 3 4 N
One
Job stream Server Server Server completion
N units job (1 unit) (1 unit) (1 unit) every 1 step 1 2 3 4 N

1 2 3 4 N

1 2 3 4 N
Time
(b) Pipelined execution with 1 unit servers

Fig. 6.4 Two ways of executing N-unit jobs in a stream

• Thus, any arithmetic task is divided into steps. Thus, a sort of concurrent operations can be obtained in the
pipeline. This will help in increasing the throughput of the system.
• Though superficially it seems, that it should be possible to execute any application in various pipeline stages,
it is not actually so. Only those applications, which can be broken into independent subtasks, can take the
advantage of pipelining.
• It is most efficient for those applications that need to repeat the same task many times with different sets of data.
Fig. 6.5 shows the four segment pipeline.

Clock

Input
S1 R1 S2 R2 S3 R3 S4 R4

Fig. 6.5 Four segment pipeline


• The figure shows two kinds of modules, Si and Ri. These specify the processing circuits and the registers to
hold the intermediate results, respectively. The information flows from left to right, after being input from the
left most modules.
• The task can be defined as the total operation going from all the segments of the pipeline.
• The behaviour of a pipelined operation can be explained with the help of a space-time diagram, fig. 6.6.
• This illustrates the overlapped behaviour in a pipelined processor having four segments. The horizontal axis
displays the time in clock cycles, and the vertical axis displays the segment number.

15/uts
Digital Computer Fundamentals

• The diagram shows the six tasks, T1 through T6. A task is the total operation performed after going through all
the segments of a pipeline. At the initial clock cycles, the segment 1 is busy with task Tl while all other segments
are idle.
• In the second clock cycle, the second segment starts with the first task output from the segment 1, while the
segment 1 starts with the second task (T2). Once the pipe is filled up, it will output one result per clock period,
independent of the number of stages in the pipe.

1 2 3 4 5 6 7 8 9
Clock cycle
Segment: 1 T1 T2 T3 T4 T5 T6

2 T1 T2 T3 T4 T5 T6
3 T1 T2 T3 T4 T5 T6
4 T1 T2 T3 T4 T5 T6

Fig. 6.6 Space time diagram for a segment pipeline

• Let us now expand our example, by considering the case of k-segment pipeline. Let us assume the clock cycle
time is Tp and a total of n tasks are to be executed.
• The first task requires a time equal to k Tp to complete its operation as it has to go through k stages of the
pipeline. The remaining (n-1) tasks emerge from the pipe at the rate of one task per clock cycle and they will
be completed after a time equal to (n-1) Tp. Therefore, to complete n tasks using a k segment pipeline, the total
time required is k + (n-1) clock periods.
• For example, in the pipelined processing of fig. 6.5, with four segments and six tasks. The time required to
complete all the operations is 4 + (6-1) = 9 clock cycles, as indicated in the fig. 3.6
• Now let us assume that in a non-pipelined processor, a task is executed in Tn times, then, the speedup obtained
form a k stage, n tasks pipeline can be given as:
S k = Time taken in non pipelined system /Time taken in a k stage pipelined system
= n Tn / (k + n - 1) Tp
• As the number of tasks increases, n becomes much larger than k - 1, and k + n - 1 approaches the value of n,
the speedup becomes
S k = Tn/Tp
• If we assume that the time it takes to process the task is same in the pipeline and the non-pipeline circuits, we
will have Tn = k Tp. Including this assumption, the speedup becomes:
S k = k Tp/Tp = k
• Example: Let us consider a pipeline system with the following specifications.
time takes to process a sub operation in each segment = 20 nsecs
number of segments in the pipeline =4
number of tasks being executed in sequence = 100
time takes to complete the pipeline = (k + n - 1) Tp
= (4 + 100 - 1) 20
= 2060 nsecs
• Drawback of a pipeline:
‚‚ The performance of a pipeline highly depends on the data dependency and branches in the program.
‚‚ In this memory, the RAM port is connected to the SAM port by the transfer gate with the bit line directly
connected to the data register, and the potentials at the bit line are amplified by the sense amplifier circuit
and are directly transferred to the data register.

16/uts
Digital Computer Fundamentals

Summary
• The chapter explains multiprocessor system and its functioning.
• Flynn’s Classification is based on the multiplicity of instruction streams and data streams in a computer
system.
• Flynn’s classification divides the computers into four categories that are SISD, SIMD, MISD and MIMD.
• A multiprocessor system is an interconnection of two or more CPU, with memory and input-output
equipment.
• Cache memories are high speed buffers which are inserted between the processor and the main memory to
capture those portions of the contents of main memory which are currently in use.
• Parallel processing means simultaneous use of more than one CPU or processor core to execute a program or
multiple computational threads.
• The key contribution of pipelining is that, it provides a way to start a new task before an old one has been
completed.

References
• Catanzaro, B. J., 2007, Multiprocessor System Architectures, PTR Prentice Hall.
• Jadhav, S.S., 2009, Advanced Computer Architecture and Computing, Technical Publications.
• Popovici, K., Jerraya, A. A., 2010, Embedded Software Design and Programming of Multiprocessor System-
on-Chip, Springer.

Recommended Reading
• Baer, J. L., Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, Cambridge University,
1st ed.
• Bruce Jacob, David Wang 2007, Memory Systems: Cache, DRAM, Disk, Morgan Kaufmann.
• Jordan, H. F., 2002, Fundamentals of Parallel Processing, Prentice Hall.

17/uts
Digital Computer Fundamentals

Video Links
http://www.youtube.com/watch?v=mAK9KklkRYY
http://www.youtube.com/watch?v=wXnVAcvJWDk
http://www.youtube.com/watch?v=hP5SSm0alcU

18/uts
© Universal Training Solutions

You might also like