Digital Computer Fundamentals
Unit III
This book is a part of the course by uts, Pune.
This book contains the course content for Digital Computer Fundamentals.
© uts, Pune
First Edition 2011
uts makes reasonable endeavours to ensure the Content is current and accurate. uts reserves the right to alter the
content whenever the need arises, and to vary it at any time without prior notice.
Published by
uts
Bavdhan, Pune - 411021
Website : www.utsglobal.edu.in
Tel : +91-20-41034800, +91 9011067684
Contents
Chapter V....................................................................................................................................................... 1
Central Processing Unit................................................................................................................................ 1
Aim................................................................................................................................................................. 1
Objectives....................................................................................................................................................... 1
Learning outcome......................................................................................................................................... 1
5.1 Introduction . ............................................................................................................................................ 2
5.2 CPU Operation.......................................................................................................................................... 2
5.3 Processor Organisation............................................................................................................................. 3
5.3.1 General Register....................................................................................................................... 3
5.3.2 Addressing Modes ................................................................................................................... 4
5.4 RISC, CISC and VLIW . .......................................................................................................................... 4
5.4.1 RISC ........................................................................................................................................ 4
5.4.2 VLIW........................................................................................................................................ 4
5.4.3 CISC.......................................................................................................................................... 5
5.5 Instruction Set........................................................................................................................................... 5
5.5.1 Code Density............................................................................................................................. 6
5.5.2 Number of Operands................................................................................................................. 6
Summary........................................................................................................................................................ 8
References...................................................................................................................................................... 8
Recommended Reading................................................................................................................................ 8
Chapter VI..................................................................................................................................................... 9
Introduction to Multiprocessor System...................................................................................................... 9
Aim................................................................................................................................................................. 9
Objectives....................................................................................................................................................... 9
Learning outcome......................................................................................................................................... 9
6.1 Introduction . .......................................................................................................................................... 10
6.2 Flynn’s Classification............................................................................................................................. 10
6.3 Characteristics of Multiprocessors...........................................................................................................11
6.3.1 Interprocessor Arbitration....................................................................................................... 12
6.3.2 Inter Processor Communication and Synchronisation............................................................ 12
6.3.3 Cache Coherence.................................................................................................................... 13
6.4 Parallel Processing.................................................................................................................................. 13
6.5 Types of Parallel Organisation................................................................................................................ 14
6.6 Pipelining................................................................................................................................................ 14
Summary...................................................................................................................................................... 17
References.................................................................................................................................................... 17
Recommended Reading.............................................................................................................................. 17
Video Links.................................................................................................................................................. 18
Chapter V
Central Processing Unit
Aim
The aim of this chapter is to:
Objectives
The objectives of the chapter are to:
Learning outcome
At the end of this chapter, the students will be able to:
5.1 Introduction
• The CPU is the brain of the computer. Sometimes simply referred to as the central processor, the CPU is more commonly called the "processor". The CPU is where most calculations take place. In terms of computing power, the CPU is the most important element of a computer system.
• On large machines, CPUs require one or more printed circuit boards. On personal computers and small workstations, the CPU is housed in a single chip called a microprocessor. Since the 1970s, the microprocessor class of CPUs has almost completely overtaken all other CPU implementations.
• The CPU itself is an internal component of the computer. Modern CPUs are small and square in shape and
contain multiple metallic connectors or pins on the underside. The CPU is inserted directly into a CPU socket,
pin side down, on the motherboard.
• Each motherboard will support only a specific type or range of CPU, so one must check the motherboard manufacturer's specifications before attempting to replace or upgrade a CPU. Modern CPUs also have an attached heat sink and a small fan located on top of the CPU to help dissipate heat.
5.2 CPU Operation
Fetch:
• This involves retrieving an instruction (which is represented by a number or sequence of numbers) from program
memory. The location in program memory is determined by a program counter (PC), which stores a number
that identifies the current position in the program.
• In other words, the program counter keeps track of the CPU's place in the program. After an instruction is fetched, the PC is incremented by the length of the instruction word in terms of memory units.
• Often the instruction to be fetched must be retrieved from relatively slow memory, causing the CPU to stall
while waiting for the instruction to be returned. This issue is largely addressed in modern processors by caches
and pipeline architectures.
• The instruction that the CPU fetches from memory is used to determine what the CPU is to do.
Decode:
• In the decode step, the instruction is broken up into parts that have significance to other portions of the CPU.
• The way in which the numerical instruction value is interpreted is defined by the CPU’s instruction set
architecture (ISA). Often, one group of numbers in the instruction, called the opcode, indicates which operation
to perform.
• The remaining parts of the number usually provide information required for that instruction, such as operands
for an addition operation. Such operands may be given as a constant value (called an immediate value), or as a
place to locate a value: a register or a memory address, as determined by some addressing mode.
• In older designs, the portions of the CPU responsible for instruction decoding were unchangeable hardware devices. However, in more abstract and complicated CPUs and ISAs, a microprogram is often used to assist in translating instructions into various configuration signals for the CPU.
• This microprogram is sometimes rewritable, so that it can be modified to change the way the CPU decodes instructions even after it has been manufactured.
Execute:
• After the fetch and decode steps, the execute step is performed. During this step, various portions of the CPU
are connected, so they can perform the desired operation. If, for instance, an addition operation was requested,
an arithmetic logic unit (ALU) will be connected to a set of inputs and a set of outputs.
• The inputs provide the numbers to be added, and the outputs will contain the final sum. The ALU contains the
circuitry to perform simple arithmetic and logical operations on the inputs (like addition and bitwise operations).
If the addition operation produces a result too large for the CPU to handle, an arithmetic overflow flag in a flags
register may also be set.
Write:
• The final step, write back, simply “writes back” the results of the execute step to some form of memory. Very
often, the results are written to some internal CPU register for quick access by subsequent instructions.
• In other cases, results may be written to slower, but cheaper and larger, main memory. Some types of instructions
manipulate the program counter rather than directly produce result data. These are generally called “jumps”
and facilitate behaviour like loops, conditional program execution (through the use of a conditional jump), and
functions in programs.
• Many instructions will also change the state of digits in a “flags” register. These flags can be used to influence
how a program behaves, since they often indicate the outcome of various operations. For example, one type of
“compare” instruction considers two values and sets a number in the flags register according to which one is
greater. This flag could then be used by a later jump instruction to determine program flow.
• After the execution of the instruction and write back of the resulting data, the entire process repeats, with the
next instruction cycle normally fetching the next-in-sequence instruction because of the incremented value in
the program counter.
• If the completed instruction was a jump, the program counter will be modified to contain the address of the
instruction that was jumped to, and program execution continues normally.
• In more complex CPUs than the ones described here, multiple instructions can be fetched, decoded, and executed simultaneously. This section describes what is generally referred to as the "classic RISC pipeline", which in fact is quite common among the simple CPUs used in many electronic devices (often called microcontrollers).
• It largely ignores the important role of the CPU cache, and therefore the memory-access stage of the pipeline. A minimal sketch of the basic cycle appears below.
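To make the four steps concrete, the following minimal Python sketch walks a toy program through fetch, decode, execute and write-back. It is illustrative only: the 16-bit instruction format, the opcode values (1 = load immediate, 2 = add, 15 = halt) and the three-instruction program are invented for this example and do not correspond to any real ISA.

# Toy fetch-decode-execute-write loop (invented instruction format).
memory = [0x1105, 0x1207, 0x2312, 0xF000]   # program: r1 = 5, r2 = 7, r3 = r1 + r2, halt
regs = [0] * 16                              # register file
flags = {"overflow": False}                  # a one-flag "flags register"
pc = 0                                       # program counter

while True:
    instr = memory[pc]                # FETCH the instruction word from program memory
    pc += 1                           # ...and advance the program counter

    opcode = (instr >> 12) & 0xF      # DECODE: split the word into fields
    rd = (instr >> 8) & 0xF           # destination register
    rs = (instr >> 4) & 0xF           # first source register
    rt = instr & 0xF                  # second source register

    if opcode == 0x1:                 # EXECUTE: load an 8-bit immediate value
        result = instr & 0xFF
    elif opcode == 0x2:               # EXECUTE: add two registers (the "ALU")
        result = regs[rs] + regs[rt]
        flags["overflow"] = result > 0xFFFF   # set a flag on overflow
    elif opcode == 0xF:               # halt: leave the loop
        break

    regs[rd] = result                 # WRITE BACK the result to a register

print(regs[1], regs[2], regs[3])      # -> 5 7 12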
5.4 RISC, CISC and VLIW
5.4.1 RISC
• Reduced instruction set computing or RISC, is a CPU design strategy based on the insight that simplified
(as opposed to complex) instructions can provide higher performance, if this simplicity enables much faster
execution of each instruction.
• A computer based on this strategy is a reduced instruction set computer (also RISC). There are many proposals
for precise definitions, but the term is slowly being replaced by the more descriptive load-store architecture.
• Well known RISC families include DEC Alpha, AMD 29k, ARC, ARM, Atmel AVR, MIPS, PA-RISC, Power
(including PowerPC), SuperH, and SPARC.
• Some aspects attributed to the first RISC-labeled designs around 1975 include the observations that the memory-
restricted compilers of the time were often unable to take advantage of features intended to facilitate manual
assembly coding, and that complex addressing modes take many cycles to perform due to the required additional
memory accesses.
• It was argued that such functions would be better performed by sequences of simpler instructions if this could
yield implementations small enough to leave room for many registers, reducing the number of slow memory
accesses.
• In these simple designs, most instructions are of uniform length and similar structure, arithmetic operations are
restricted to CPU registers and only separate load and store instructions access memory.
• These properties enable a better balancing of pipeline stages than before, making RISC pipelines significantly
more efficient and allowing higher clock frequencies.
5.4.2 VLIW
• Very long instruction word or VLIW refers to a CPU architecture designed to take advantage of instruction
level parallelism (ILP).
• A processor that executes every instruction one after the other (i.e., a non-pipelined scalar architecture) may
use processor resources inefficiently, potentially leading to poor performance.
• The performance can be improved by executing different sub-steps of sequential instructions simultaneously (this
is pipelining), or even executing multiple instructions entirely simultaneously as in superscalar architectures.
• Further improvement can be achieved by executing instructions in an order different from the order they appear
in the program; this is called out-of-order execution.
• As often implemented, these three techniques all come at a cost: increased hardware complexity. Before executing
any operations in parallel, the processor must verify that the instructions do not have interdependencies.
• For example, if the result of the first instruction is used as an input to the second instruction, the two clearly cannot execute at the same time, and the second instruction cannot be executed before the first.
• Modern out-of-order processors have increased the hardware resources which do the scheduling of instructions
and determining of interdependencies.
• The VLIW approach, on the other hand, executes operations in parallel based on a fixed schedule determined
when programs are compiled. Since determining the order of execution of operations (including which operations
can execute simultaneously) is handled by the compiler, the processor does not need the scheduling hardware
that the three techniques described above require.
• As a result, VLIW CPUs offer significant computational power with less hardware complexity (but greater
compiler complexity) than is associated with most superscalar CPUs.
• As is the case with any novel architectural approach, the concept is only as useful as code generation makes
it. That is, the fact that a number of special-purpose instructions are available to facilitate certain complicated
operations—say, Fast Fourier Transform (FFT) computation or certain calculations that recur in tomography
contexts—is useless if compilers are unable to spot relevant source code constructs and generate target code
that duly utilises the CPU’s advanced offerings.
• Moreover, the programmer must be able to express algorithms in a manner that makes the compiler's task tractable; a small sketch of such compile-time bundling follows.
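As a rough illustration of compile-time scheduling, the Python sketch below packs a short operation list, in order, into fixed-width "bundles" so that no operation shares a bundle with an earlier operation whose result it reads or overwrites. The operation format and the three-slot bundle width are assumptions made for this example; a real VLIW compiler performs far more sophisticated dependence analysis and scheduling.

# Greedy, in-order packing of operations into VLIW-style bundles (illustrative only).
# Each operation is (destination, source operands).
ops = [
    ("r1", ("a",)),        # r1 <- a
    ("r2", ("b",)),        # r2 <- b
    ("r3", ("r1", "r2")),  # r3 <- r1 + r2  (depends on the first two)
    ("r4", ("c",)),        # r4 <- c        (independent)
]

BUNDLE_WIDTH = 3           # assumed number of issue slots per long instruction word
bundles = [[]]

for dest, srcs in ops:
    current = bundles[-1]
    # An operation conflicts with the current bundle if it reads or overwrites
    # a destination already written in that bundle.
    conflict = any(d in srcs or d == dest for d, _ in current)
    if len(current) >= BUNDLE_WIDTH or conflict:
        current = []                  # dependency or full bundle: start a new one
        bundles.append(current)
    current.append((dest, srcs))

for i, bundle in enumerate(bundles):
    print("bundle", i, [d for d, _ in bundle])
# -> bundle 0 ['r1', 'r2']   (independent operations issued together)
#    bundle 1 ['r3', 'r4']   (r3 must wait for r1 and r2)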
5.4.3 CISC
• A complex instruction set computer (CISC) is a computer where single instructions can execute several low-level operations (such as a load from memory, an arithmetic operation, and a memory store) and/or are capable of multi-step operations or addressing modes within single instructions.
• The term was retroactively coined in contrast to reduced instruction set computer (RISC).
5.5.2 Number of Operands
• so explicit 2-operand load/store instructions are needed: load a,reg1; load b,reg2; add reg1+reg2->reg3; store reg3,c;
• More operands: Some CISC machines permit a variety of addressing modes that allow more than 3 operands
(registers or memory accesses), such as the VAX “POLY” polynomial evaluation instruction.
• Each instruction specifies some number of operands (registers, memory locations, or immediate values)
explicitly.
• Some instructions give one or both operands implicitly, such as by being stored on top of the stack or in an
implicit register.
• When some of the operands are given implicitly, the number of operands specified in an instruction is smaller than the arity of the operation.
• When a "destination operand" explicitly specifies the destination, the number of operand specifiers in an instruction is larger than the arity of the operation. Some instruction sets have different numbers of operands for different instructions. A short sketch contrasting these conventions follows.
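The sketch below illustrates these conventions by writing the same computation, c = a + b, once for a hypothetical 3-operand load-store machine and once for a 0-operand stack machine, and then evaluates the stack form. The mnemonics and the tiny evaluator are invented for this illustration.

# The same computation, c = a + b, under two operand conventions (hypothetical mnemonics).

# 3-operand, load-store style: every operand is named explicitly.
three_operand = [
    ("load",  "r1", "a"),
    ("load",  "r2", "b"),
    ("add",   "r3", "r1", "r2"),
    ("store", "c",  "r3"),
]

# 0-operand, stack style: add takes both operands implicitly from the top of the stack.
zero_operand = [
    ("push", "a"),
    ("push", "b"),
    ("add",),
    ("pop",  "c"),
]

def run_stack(program, memory):
    """Tiny evaluator for the 0-operand (stack) form."""
    stack = []
    for instr in program:
        op = instr[0]
        if op == "push":
            stack.append(memory[instr[1]])
        elif op == "add":                      # both operands are implicit
            stack.append(stack.pop() + stack.pop())
        elif op == "pop":
            memory[instr[1]] = stack.pop()
    return memory

print(run_stack(zero_operand, {"a": 2, "b": 3, "c": 0}))   # -> {'a': 2, 'b': 3, 'c': 5}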
Summary
• The chapter explains central processing unit as the brain of the computer.
• The instruction that the CPU fetches from memory is used to determine what the CPU is to do.
• The internal processor memory of a CPU is served by its registers.
• Addressing modes are aspects of the instruction set architecture in most central processing unit (CPU)
designs.
• Reduced instruction set computing, or RISC, is a CPU design strategy based on the insight that simplified
instructions can provide higher performance if this simplicity enables much faster execution of each
instruction.
• Very long instruction word or VLIW refers to a CPU architecture designed to take advantage of instruction
level parallelism.
• A complex instruction set computer (CISC) is a computer where single instructions can execute several low-level operations and/or are capable of multi-step operations or addressing modes within single instructions.
• The various ways of implementing an instruction set give different tradeoffs between cost, performance, power
consumption, size, etc.
• Instruction sets may be categorised by the maximum number of operands explicitly specified in instructions.
References
• Katevenis, G.H., 2007, Reduced instruction set computer architectures for VLSI, MIT Press.
• Godse, D.A., Godse, A. P., 2006, Computer Organisation and Architecture, Technical Publications.
• Maini, A., 2007, Digital electronics: principles, devices and applications, John Wiley and Sons.
Recommended Reading
• Warford, J. S., Computer Systems, 4th ed., Jones & Bartlett Publishers.
• Patterson, D. A., 2008, Computer Organization and Design: The Hardware/Software Interface, 4th ed.
• White, R., 2007, How Computers Work, Que.
Chapter VI
Introduction to Multiprocessor System
Aim
The aim of this chapter is to:
• comprehend pipelining
Objectives
The objectives of this chapter are to:
Learning outcome
At the end of this chapter, the students will be able to:
6.1 Introduction
• The computers can be made faster by increasing their clock speed or by using improved VLSI technology. But
there is a practical limitation to both of them. The only way that seems possible to theoretically increase the
speed in an unlimited fashion is to have multiple modules, that is, split the original problems into independent
modules and execute them simultaneously.
• When more number of processors do the job simultaneously, the speed also is more. “Parallel processing” is a
term used to denote a large class of techniques that are used to provide simultaneous data processing tasks for
the purpose of increasing the computational speed of a computer.
• The most basic parallel processing can be seen in uniprocessors (systems with one CPU) as well. The simplest parallel processing is the parallel transfer of data from one module to another in the computer. Instead of transferring the data bit by bit, 8/16/32 bits are transferred simultaneously over the bus.
• Various phases of instruction execution also take place simultaneously. For example, while one instruction is being executed in an 8088-based PC, the next instruction is fetched and kept in the queue, ready for execution. In a similar manner, CPU and I/O operations are overlapped to increase the overall speed of the computer.
6.2 Flynn's Classification
[Fig.: SISD organisation — a single control unit issues one instruction stream to a processing unit operating on a single data stream from memory]
• MISD organisation consists of N processor units, each working on a different set of instructions but operating on the same set of data. The output of one unit becomes the input to the next unit. No commercial computer of this kind exists.
[Fig.: MISD organisation — n control units supply separate instruction streams to processors sharing a single data stream from memory]
• MIMD organisation implies interactions between the N processors because all memory streams are derived from the same data space shared by all processors. If the interaction between the processors is high, the system is called tightly coupled; otherwise it is called loosely coupled. Most multiprocessors fit into this category (the four categories are summarised in the sketch below).
[Fig.: MIMD organisation — n control units, each with its own instruction stream, operating on data held in a shared memory]
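The four categories can be summarised programmatically; the example systems listed are common textbook associations rather than part of Flynn's original definition.

# Flynn's classification, keyed by (instruction streams, data streams).
flynn = {
    "SISD": ("single",   "single",   "a conventional uniprocessor"),
    "SIMD": ("single",   "multiple", "array and vector processors"),
    "MISD": ("multiple", "single",   "largely theoretical; rarely built"),
    "MIMD": ("multiple", "multiple", "most multiprocessor systems"),
}

for name, (instr, data, example) in flynn.items():
    print(f"{name}: {instr} instruction stream(s), {data} data stream(s) - {example}")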
• A binary value called a semaphore is usually used to indicate whether a processor is currently executing the critical section; a minimal sketch of this idea in software follows.
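The Python sketch below mimics the idea with a lock object standing in for the binary semaphore; the shared counter and the thread counts are made up for the demonstration. On a real multiprocessor the semaphore would be set and cleared with an indivisible (test-and-set style) instruction rather than a library call.

import threading

semaphore = threading.Lock()   # plays the role of the binary semaphore
shared_counter = 0             # shared data protected by the critical section

def worker(increments):
    global shared_counter
    for _ in range(increments):
        with semaphore:             # semaphore set: enter the critical section
            shared_counter += 1     # only one thread updates the shared data at a time
        # semaphore cleared on exit: another thread may now enter

threads = [threading.Thread(target=worker, args=(10000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(shared_counter)   # -> 40000; without the semaphore, interleaved updates could be lost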
• Multiprocessors are architectures composed of multiple computing units which operate in a synchronous mode.
Generally, both shared and local memory can be available for these processors. The communication between
the processors takes place either through the shared memory area or through the interprocessor messages.
• Dataflow architectures are functionally distributed architectures, in which the operations are triggered with the
arrival of data at these processors. They may be viewed as very general MIMD parallel architectures.
Multiport memory
• A multiport memory has a RAM port that includes a memory cell array with a plurality of memory cells arranged in matrix form.
• It has a sense amplifier circuit for sensing the potential of a bit line after the storage potential has been transferred from the memory cells, and a restore circuit connected to the bit line for pulling up the bit-line potential at a predetermined time after the sense operation has started.
• It has a barrier circuit connected between the bit line and the sense amplifier circuit, and a SAM port including a data register, a transfer gate and functional means for transferring serial data in the column direction.
• In this memory, the RAM port is connected to the SAM port by the transfer gate, with the bit line directly connected to the data register; the potentials on the bit line are amplified by the sense amplifier circuit and transferred directly to the data register.
6.6 Pipelining
The basic idea behind pipeline design is quite natural; it is not specific to computer technology. In fact, the name
pipeline stems from the analogy with petroleum pipelines, in which a sequence of hydrocarbon products is pumped
through a pipeline.
• The last product might well be entering the pipeline before the first product has been removed from the terminus.
The key contribution of pipelining is that it provides a way to start a new task before an old one has been
completed. The concept can be better understood by taking an example of an industrial plant.
• To achieve pipelining, one must subdivide the input task into a sequence of subtasks, each of which can be
executed by a specialised hardware stage that operates concurrently with other stages in the hardware. Successive
tasks are streamed into the pipe and get executed in an overlapped fashion at the subtask level. Hence, the
completion rate is not a function of the total processing time, but rather of how soon a new process can be
introduced.
• The subdivision of labour in assembly lines has contributed to the success of mass production in modern industry.
By the same token, pipeline processing has led to the tremendous improvement of system throughput in the
modern digital computers.
• Now consider that, for the execution of an instruction, a computer performs N processing steps. A server unit, as represented in fig. 6.4 (a), can perform any one of the N steps in one unit of time. If we process M instructions with this server, the completion rate is one instruction every N steps, and the time behaviour of the job stream is as described in fig. 6.4 (a). Compare fig. 6.4 (a) with fig. 6.4 (b), which shows N servers concatenated in a sequence, each performing only a single step. A job flows through the collection of servers by visiting server 1, then server 2, and so on, finally emerging from server N after N steps.
• The time behaviour of this system is shown in fig. 6.4 (b).
• Fig. 6.4 (b) is an ideal model of a constant-speed assembly line, such as an automobile assembly plant.
[Fig. 6.4 (a): Sequential execution with a single N-unit server — one completion every N steps]
[Fig. 6.4 (b): Pipelined execution with N one-unit servers — one completion every step once the pipe is full]
• Thus, any arithmetic task can be divided into steps, and a degree of concurrent operation can be obtained in the pipeline. This helps to increase the throughput of the system.
• Though it may superficially seem that any application can be executed across pipeline stages, this is not actually so. Only applications that can be broken into independent subtasks can take advantage of pipelining.
• It is most efficient for applications that need to repeat the same task many times with different sets of data. Fig. 6.5 shows a four-segment pipeline.
[Fig. 6.5: A four-segment pipeline — segments S1 to S4 separated by interstage registers R1 to R4, all driven by a common clock]
• The diagram shows six tasks, T1 through T6. A task is the total operation performed after going through all the segments of a pipeline. In the initial clock cycle, segment 1 is busy with task T1 while all other segments are idle.
• In the second clock cycle, the second segment starts on the output of task T1 from segment 1, while segment 1 starts on the second task (T2). Once the pipe is filled, it outputs one result per clock period, independent of the number of stages in the pipe (a short script reproducing this space-time diagram follows the table below).
Clock cycle:  1    2    3    4    5    6    7    8    9
Segment 1:    T1   T2   T3   T4   T5   T6
Segment 2:         T1   T2   T3   T4   T5   T6
Segment 3:              T1   T2   T3   T4   T5   T6
Segment 4:                   T1   T2   T3   T4   T5   T6
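A few lines of Python reproduce this space-time diagram for any number of segments and tasks; the four-segment, six-task values below match the example above.

# Print the space-time diagram of a k-segment pipeline processing n tasks.
def space_time(k=4, n=6):
    total_cycles = k + n - 1
    print("cycle:     " + " ".join(f"{c:>3}" for c in range(1, total_cycles + 1)))
    for seg in range(1, k + 1):
        row = []
        for cycle in range(1, total_cycles + 1):
            task = cycle - seg + 1                    # task occupying this segment now
            row.append(f"T{task:<2}" if 1 <= task <= n else "  .")
        print(f"segment {seg}: " + " ".join(row))

space_time()   # four segments, six tasks -> last task finishes in 4 + (6 - 1) = 9 cycles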
• Let us now expand the example by considering a k-segment pipeline. Assume the clock cycle time is Tp and a total of n tasks are to be executed.
• The first task requires a time equal to k·Tp to complete, as it has to go through all k stages of the pipeline. The remaining (n − 1) tasks emerge from the pipe at the rate of one task per clock cycle and are completed after a time equal to (n − 1)·Tp. Therefore, to complete n tasks using a k-segment pipeline, the total time required is (k + n − 1) clock periods.
• For example, in the pipelined processing of fig. 6.5, with four segments and six tasks, the time required to complete all the operations is 4 + (6 − 1) = 9 clock cycles, as indicated in the space-time diagram above.
• Now assume that in a non-pipelined processor a task is executed in time Tn. The speedup obtained from a k-stage pipeline executing n tasks can then be given as:
Sk = (time taken in the non-pipelined system) / (time taken in the k-stage pipelined system)
   = n·Tn / [(k + n − 1)·Tp]
• As the number of tasks increases, n becomes much larger than k − 1, and (k + n − 1) approaches the value of n; the speedup then becomes:
Sk = Tn / Tp
• If we assume that the time taken to process a task is the same in the pipelined and non-pipelined circuits, we have Tn = k·Tp. With this assumption, the speedup becomes:
Sk = k·Tp / Tp = k
• Example: consider a pipeline system with the following specifications (the short script below reproduces this calculation):
time taken to process a sub-operation in each segment, Tp = 20 ns
number of segments in the pipeline, k = 4
number of tasks executed in sequence, n = 100
time taken to complete all tasks = (k + n − 1)·Tp
= (4 + 100 − 1) × 20
= 2060 ns
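The same arithmetic, together with the speedup ratio derived above, can be checked with a short Python function; the 20 ns, 4-segment, 100-task figures are those of the example.

# Pipeline completion time and speedup, using the formulas above.
def pipeline_time(k, n, tp):
    """Time for n tasks on a k-segment pipeline with clock period tp."""
    return (k + n - 1) * tp

def speedup(k, n, tp, tn=None):
    """S = n*Tn / ((k + n - 1)*Tp); if Tn is not given, assume Tn = k*Tp."""
    if tn is None:
        tn = k * tp
    return (n * tn) / ((k + n - 1) * tp)

k, n, tp = 4, 100, 20                      # segments, tasks, nanoseconds per segment
print(pipeline_time(k, n, tp))             # -> 2060 ns, as in the example above
print(round(speedup(k, n, tp), 2))         # -> 3.88, approaching the limiting value k = 4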
• Drawback of a pipeline:
The performance of a pipeline depends heavily on the data dependencies and branches in the program.
Summary
• The chapter explains multiprocessor system and its functioning.
• Flynn’s Classification is based on the multiplicity of instruction streams and data streams in a computer
system.
• Flynn’s classification divides the computers into four categories that are SISD, SIMD, MISD and MIMD.
• A multiprocessor system is an interconnection of two or more CPUs, with memory and input-output equipment.
• Cache memories are high speed buffers which are inserted between the processor and the main memory to
capture those portions of the contents of main memory which are currently in use.
• Parallel processing means simultaneous use of more than one CPU or processor core to execute a program or
multiple computational threads.
• The key contribution of pipelining is that it provides a way to start a new task before an old one has been completed.
References
• Catanzaro, B. J., 2007, Multiprocessor System Architectures, PTR Prentice Hall.
• Jadhav, S.S., 2009, Advanced Computer Architecture and Computing, Technical Publications.
• Popovici, K., Jerraya, A. A., 2010, Embedded Software Design and Programming of Multiprocessor System-
on-Chip, Springer.
Recommended Reading
• Baer, J. L., Microprocessor Architecture: From Simple Pipelines to Chip Multiprocessors, Cambridge University Press, 1st ed.
• Jacob, B., Wang, D., 2007, Memory Systems: Cache, DRAM, Disk, Morgan Kaufmann.
• Jordan, H. F., 2002, Fundamentals of Parallel Processing, Prentice Hall.
Video Links
http://www.youtube.com/watch?v=mAK9KklkRYY
http://www.youtube.com/watch?v=wXnVAcvJWDk
http://www.youtube.com/watch?v=hP5SSm0alcU
© Universal Training Solutions