Computer Architecture


https://sites.google.com/site/computing9691/

Chapter 3.3 Computer Architecture and the Fetch-Execute Cycle

3.3 (a) Von Neumann Architecture


The earliest computing machines had fixed programs. For example, a desk calculator
(in principle) is a fixed program computer. It can do basic mathematics, but it cannot
be used as a word processor or a gaming console. Changing the program of a fixed-
program machine requires re-wiring, re-structuring, or re-designing the machine. The
earliest computers were not so much "programmed" as they were "designed".
"Reprogramming", when it was possible at all, was a laborious process, starting with
flowcharts and paper notes, followed by detailed engineering designs, and then the
often-arduous process of physically re-wiring and re-building the machine. It could
take three weeks to set up a program on ENIAC (a computer of the 1940s) and get it
working.

The phrase Von Neumann architecture derives from a paper written by computer
scientist John von Neumann in 1945. This describes a design architecture for an
electronic digital computer with subdivisions of a central arithmetic part, a central
control part, a memory to store both data and instructions, external storage, and input
and output mechanisms. The meaning of the phrase has evolved to mean a stored-
program computer. A stored-program digital computer is one that keeps its
programmed instructions, as well as its data, in read-write, random-access memory
(RAM). So John von Neumann introduced the idea of the stored program.
Previously data and programs were stored in separate memories. Von Neumann
realised that data and programs are indistinguishable and can, therefore, use the same
memory. On a large scale, the ability to treat instructions as data is what makes
assemblers, compilers and other automated programming tools possible. One can
"write programs which write programs". This led to the introduction of compilers
which accepted high level language source code as input and produced binary code as
output.

The Von Neumann architecture uses a single processor which follows a linear
sequence of fetch-decode-execute. In order to do this, the processor has to use some
special registers, which are discrete memory locations with special purposes attached.
These are

Register Meaning
PC Program Counter
CIR Current Instruction Register
MAR Memory Address Register
MDR Memory Data Register
IR Index Register
Accumulator Holds results

The program counter keeps track of where to find the next instruction so that a copy
of the instruction can be placed in the current instruction register. Sometimes the
program counter is called the Sequence Control Register (SCR) as it controls the
sequence in which instructions are executed.


The current instruction register holds the instruction that is to be executed.

The memory address register is used to hold the memory address that contains either
the next piece of data or an instruction that is to be used.

The memory data register acts like a buffer, holding anything that is copied from
memory ready for the processor to use.

The central processor contains the arithmetic-logic unit (also known as the arithmetic
unit) and the control unit. The arithmetic-logic unit (ALU) is where data is processed.
This involves arithmetic and logical operations. Arithmetic operations are those that
add and subtract numbers, and so on. Logical operations involve comparing binary
patterns and making decisions.

The control unit fetches instructions from memory, decodes them and synchronises
the operations before sending signals to other parts of the computer.

The accumulator is in the arithmetic unit, the program counter and the instruction
registers are in the control unit and the memory data register and memory address
register are in the processor.

An index register is a processor register used for modifying operand addresses
during the run of a program, typically for vector/array operations. Index
registers are used for a special kind of indirect addressing (covered in 3.5 (i)) where
an immediate constant (i.e. one that is part of the instruction itself) is added to the
contents of the index register to form the address of the actual operand or data.

A typical layout is shown in Fig. 3.3.a.1 which also shows the data paths.

[Fig 3.3.a.1: Main memory connected to the Central Processing Unit (CPU). Within
the CPU, the control unit holds the PC, CIR, MAR and MDR, and the ALU holds the
accumulator; data paths link the registers to main memory.]


3.3 (b) The Fetch-Decode-Execute-Reset Cycle


The following is an algorithm that shows the steps in the cycle. At the end the cycle
is reset and the algorithm repeated.

1. Load the address that is in the program counter (PC) into the memory address
register (MAR).
2. Increment the PC by 1.
3. Load the instruction that is in the memory address given by the MAR into the
memory data register (MDR).
4. Load the instruction that is now in the MDR into the current instruction
register (CIR).
5. Decode the instruction that is in the CIR.
6. If the instruction is a jump instruction then
a. Load the address part of the instruction into the PC
b. Reset by going to step 1.
7. Execute the instruction.
8. Reset by going to step 1.

Steps 1 to 4 are the fetch part of the cycle. Steps 5, 6a and 7 are the execute part of
the cycle and steps 6b and 8 are the reset part.

Step 1 simply places the address of the next instruction into the memory address
register so that the control unit can fetch the instruction from the right part of the
memory. The program counter is then incremented by 1 so that it contains the address
of the next instruction, assuming that the instructions are in consecutive locations.

The memory data register is used whenever anything is to go from the central
processing unit to main memory, or vice versa. Thus the next instruction is copied
from memory into the MDR and is then copied into the current instruction register.

Now that the instruction has been fetched the control unit can decode it and decide
what has to be done. This is the execute part of the cycle. If it is an arithmetic
instruction, this can be executed and the cycle restarted as the PC contains the address
of the next instruction in order. However, if the instruction involves jumping to an
instruction that is not the next one in order, the PC has to be loaded with the address
of the instruction that is to be executed next. This address is in the address part of the
current instruction, hence the address part is loaded into the PC before the cycle is
reset and starts all over again.
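The eight steps above can be sketched as a short simulation. This is purely illustrative Python: the instruction format and the opcode names (LOAD, ADD, JUMP, HALT) are invented for the example, not taken from any real instruction set.

```python
memory = [
    ("LOAD", 4),   # address 0: ACC <- contents of address 4
    ("ADD", 5),    # address 1: ACC <- ACC + contents of address 5
    ("JUMP", 3),   # address 2: continue from address 3
    ("HALT", 0),   # address 3: stop
    7,             # address 4: data
    5,             # address 5: data
]

pc = 0   # program counter
acc = 0  # accumulator

while True:
    mar = pc                   # 1. PC -> MAR
    pc += 1                    # 2. increment the PC
    mdr = memory[mar]          # 3. memory[MAR] -> MDR
    cir = mdr                  # 4. MDR -> CIR
    opcode, operand = cir      # 5. decode the instruction in the CIR
    if opcode == "JUMP":       # 6. jump: load the address part into the PC...
        pc = operand
        continue               #    ...and reset
    if opcode == "HALT":
        break
    if opcode == "LOAD":       # 7. execute
        acc = memory[operand]
    elif opcode == "ADD":
        acc += memory[operand]
    # 8. reset: loop back to step 1

# acc now holds 7 + 5 = 12
```

Note that the jump skips the execute step entirely: loading the PC and resetting is all that a jump does, which matches steps 6a and 6b of the algorithm.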

A CPU cannot do arithmetic on addresses directly, although it can do so indirectly
with an index register. The index register works alongside the address registers,
allowing a program to process strings of data efficiently. To process your first
name, for example, a program moves 300 (the address of the first letter) into the
MAR and zero into the index register. An indexed operation adds the index value to
the base address, retrieving the letter at location 300. Next, the program increments
the index by one and gets the next letter. It repeats this process until it has moved the
whole name. By itself, the index register does little; its value is the greater speed
and convenience it brings to addressing. The index register is covered further in 3.5 (i).
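The name-copying example above can be sketched as follows; the address 300 and the stored name are hypothetical, and ordinary Python variables stand in for the registers.

```python
BASE = 300                 # hypothetical start address of the string
memory = {}
for i, ch in enumerate("ADA"):
    memory[BASE + i] = ch  # one letter per location: 300, 301, 302

index = 0                  # index register starts at zero
copied = ""
while (BASE + index) in memory:
    copied += memory[BASE + index]  # effective address = base + index
    index += 1                      # increment the index register
# copied == "ADA"
```

The base address never changes during the loop; only the index register is incremented, which is exactly what makes indexed addressing convenient for stepping through arrays.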


3.3 (c) The need for and the use of buses to convey data (Data,
Address and Control Buses)

A bus is a set of parallel wires connecting two or more components of the computer.
The CPU is connected to main memory by three separate buses. When the CPU
wishes to access a particular memory location, it sends this address to memory on the
address bus. The data in that location is then returned to the CPU on the data bus.
Control signals are sent along the control bus.
In the figure below, you can see that data, address and control buses connect the
processor, memory and I/O controllers. These are all system buses. Each bus is a
shared transmission medium, so only one device can transmit along a bus at any
one time.

Data and control signals travel in both directions between the processor, memory and
I/O controllers. Addresses, on the other hand, travel only one way along the address
bus: the processor sends the address of an instruction, or of data to be stored or
retrieved, to memory or to an I/O controller.

The control bus is a bi-directional bus meaning that signals can be carried in both
directions. The data and address buses are shared by all components of the system.
Control lines must therefore be provided to ensure that access to and use of the data
and address buses by the different components of the system does not lead to conflict.
The purpose of the control bus is to transmit command, timing and specific status
information between system components. Timing signals indicate the validity of data
and address information. Command signals specify operations to be performed.
Specific status signals indicate the state of a data transfer request, or the status of a
request by a component to gain control of the system bus.

The data bus, typically consisting of 8, 16, or 32 separate lines, provides a
bi-directional path for moving data and instructions between system components. The
width of the data bus is a key factor in determining overall system performance. For
example, if the data bus is 8 bits wide, and each instruction is 16 bits long, then the
processor must access the main memory twice during each instruction cycle.
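The worked example above (an 8-bit bus moving 16-bit instructions) is just a ceiling division, which can be sketched as:

```python
import math

def accesses_per_instruction(instruction_bits, bus_width_bits):
    """Memory accesses needed to move one instruction over the data bus."""
    return math.ceil(instruction_bits / bus_width_bits)

# A 16-bit instruction over an 8-bit data bus needs two accesses;
# widen the bus to 16 lines and a single access suffices.
narrow = accesses_per_instruction(16, 8)   # 2
wide = accesses_per_instruction(16, 16)    # 1
```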

Address bus: When the processor wishes to read a word (say 8, 16, or 32 bits) of data
from memory, it first puts the address of the desired word on the address bus. The
width of the address bus determines the maximum possible memory capacity of the
system. For example, if the address bus consisted of only 8 lines, then the maximum
address it could transmit would be (in binary) 11111111 or 255 – giving a maximum

https://sites.google.com/site/computing9691/
Page 4 of 12
https://sites.google.com/site/computing9691/

memory capacity of 256 (including address 0). A more realistic minimum bus width
would be 20 lines, giving a memory capacity of 220, i.e. 1Mb.
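The address-bus arithmetic above is simply 2 raised to the number of address lines; a quick sketch:

```python
def max_locations(address_lines):
    """Number of distinct addresses an n-line address bus can carry."""
    return 2 ** address_lines

locations_8 = max_locations(8)    # 256 locations (addresses 0..255)
locations_20 = max_locations(20)  # 1,048,576 locations, i.e. 1 MB
```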


Von Neumann is a very successful architecture, but it has its problems.

The first problem is that every piece of data and every instruction has to pass across
the data bus in order to move between main memory and the CPU. This is a problem
because the data bus is a lot slower than the rate at which the CPU can carry out
instructions; this is called the 'Von Neumann bottleneck'. If nothing were done, the
CPU would spend most of its time waiting for instructions. A special kind of fast
memory called a 'cache' (pronounced 'cash') is used to tackle this problem: blocks of
memory are fetched into the cache ahead of time, so that instructions and data likely
to be needed next are already close to the CPU.
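A minimal sketch of why the cache helps, assuming made-up access costs (100 time units for main memory, 1 for the cache) and a cache that loads a 4-word block on each miss:

```python
BLOCK_SIZE = 4     # words fetched from memory on each miss
MEMORY_COST = 100  # hypothetical cost of a main-memory access
CACHE_COST = 1     # hypothetical cost of a cache access

def access_cost(addresses):
    """Total cost of accessing the given addresses through the cache."""
    cached_blocks = set()
    total = 0
    for addr in addresses:
        block = addr // BLOCK_SIZE
        if block in cached_blocks:
            total += CACHE_COST    # hit: served from the fast cache
        else:
            cached_blocks.add(block)
            total += MEMORY_COST   # miss: fetch the whole block
    return total

# A linear run of 16 consecutive accesses: only every 4th one misses,
# so the cost is 4*100 + 12*1 = 412 instead of 16*100 = 1600.
sequential = access_cost(range(16))
no_cache = 16 * MEMORY_COST
```

The benefit depends entirely on the program touching nearby addresses soon after a block is loaded; a program that jumps randomly around memory would miss on almost every access.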

The second problem is that data and programs share the same memory space.
This is a problem because it is quite easy for a poorly written or faulty piece of code
to write data into an area holding other instructions, trashing that program.

Another problem is that the rate at which data needs to be fetched and the rate at
which instructions need to be fetched are often very different, and yet they share the
same bottlenecked data bus. The Harvard architecture addresses this by splitting the
memory into two parts, one for data and one for programs, with each part accessed
by its own bus. This means the CPU can fetch data and instructions at the same time,
and there is less chance of program corruption. This architecture is sometimes used
within the CPU to handle its caches, but it is less often used with main memory
because of complexity and cost.


3.3 (d) Parallel Processor Systems

The Von Neumann architecture is a sequential processing machine, but what if we
could process more than one piece of data at the same time? This would dramatically
speed up the rate at which processing could occur. This is the idea behind 'parallel
processing': the simultaneous processing of data. There are a number of ways to
carry out parallel processing; the table below shows each of them and how they are
applied in real life.

Type of parallel processing  Class of computer                          Application
Pipeline                     Single Instruction Single Data (SISD)      Inside a CPU
Array processor              Single Instruction Multiple Data (SIMD)    Graphics cards, games consoles
Multi-core                   Multiple Instruction Multiple Data (MIMD)  Supercomputers, modern multi-core chips

Advantages of parallel processing over the Von Neumann architecture


• Faster when handling large amounts of data, with each data set requiring the
same processing (array and multi-core methods)
• Is not limited by the bus transfer rate (the Von Neumann bottleneck)
• Can make maximum use of the CPU (pipeline method) in spite of the
bottleneck
Disadvantages
• Only certain types of data are suitable for parallel processing. Data that relies
on the result of a previous operation cannot be made parallel. For parallel
processing, each data set must be independent of each other.
• More costly in terms of hardware: multiple processing blocks are needed; this
applies to all three methods


Pipelining:
In the Von Neumann architecture, an instruction can basically be in one of three
phases: being fetched (from memory), being decoded (by the control unit) or being
executed (by the control unit). An alternative is to split the processor into three
parts, each of which handles one of the three stages. This results in the situation
shown in Fig. 3.3.d.1, which shows how this process, known as pipelining, works;
each row shows the instructions occupying the three stages during one time interval.

Fetch Decode Execute


Instruction 1

Instruction 2 Instruction 1

Instruction 3 Instruction 2 Instruction 1

Instruction 4 Instruction 3 Instruction 2

Instruction 5 Instruction 4 Instruction 3

Fig. 3.3.d.1

This helps with the speed of throughput unless the next instruction in the pipe is not
the next one that is needed. Suppose Instruction 2 is a jump to Instruction 10. Then
Instructions 3, 4 and 5 need to be removed from the pipe and Instruction 10 needs to
be loaded into the fetch part of the pipe. Thus, the pipe will have to be cleared and
the cycle restarted in this case. The result is shown in Fig. 3.3.d.2 below
Fetch Decode Execute
Instruction 1

Instruction 2 Instruction 1

Instruction 3 Instruction 2 Instruction 1

Instruction 4 Instruction 3 Instruction 2

Instruction 10

Instruction 11 Instruction 10

Instruction 12 Instruction 11 Instruction 10

Fig. 3.3.d.2
The effect of pipelining is that there are three instructions being dealt with at the
same time. This should reduce execution times considerably (to approximately 1/3 of
the standard times); however, this would only be true for a very linear program. Once
jump instructions are introduced, the problem arises that the wrong instructions are
in the pipeline waiting to be executed, so every time the sequence of instructions
changes, the pipeline has to be cleared and the process started again.
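The behaviour of Fig. 3.3.d.2 can be sketched as a simulation. The instruction labels are arbitrary, and a ("JUMP", target) pair stands for a jump instruction whose address part is target:

```python
def run_pipeline(program):
    """Run labelled instructions through a 3-stage fetch/decode/execute pipe.

    A ("JUMP", target) instruction loads the target into the PC and clears
    the pipe, as in Fig. 3.3.d.2.
    """
    fetch = decode = execute = None
    pc = 0
    executed = []
    while fetch is not None or decode is not None or execute is not None \
            or pc < len(program):
        if execute is not None:
            executed.append(execute)
            if isinstance(execute, tuple) and execute[0] == "JUMP":
                pc = execute[1]                  # load address part into PC
                fetch = decode = execute = None  # clear the pipe
                continue                         # restart the cycle
        execute, decode = decode, fetch          # advance the pipe one stage
        if pc < len(program):
            fetch = program[pc]
            pc += 1
        else:
            fetch = None
    return executed

# Instruction 2 jumps over "I3" straight to "I10"; "I3" enters the pipe
# but is flushed before it ever reaches the execute stage.
flushed_run = run_pipeline(["I1", ("JUMP", 3), "I3", "I10", "I11"])
```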

Array or Vector processing:

Some types of data can be processed independently of one another. A good example
is the simple processing of pixels on a screen, where each coloured pixel is changed
according to the colour it currently holds. For example: "Make all red pixels blue",
"Make blue pixels red", "Leave green pixels alone".

A sequential processor would examine each pixel one at a time and apply the
processing instruction. But you can also arrange for data such as this to be an array.
The simplest array is called a '1 dimensional array' or a 'vector'. A slightly more
complicated array would have rows and columns. This is called a '2 dimensional'
array or 'matrix'. The matrix is fundamental to graphics work.

An array processor (or vector processor) has a number of Arithmetic Logic Units
(ALUs) that allow all the elements of an array to be processed at the same time.
Fig. 3.3.d.3 below shows the architecture of an array or vector processor.

Fig. 3.3.d.3
With an array processor, a single instruction is issued by a control unit and that
instruction is applied to a number of data sets at the same time.

An array processor is a Single Instruction Multiple Data (SIMD) computer. You will
find games consoles and graphics cards making heavy use of array processors to
shift those pixels about.
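The pixel-recolouring example above can be sketched with NumPy, whose boolean-mask operations apply one instruction across a whole array at once, in the spirit of SIMD (whether the hardware actually uses SIMD instructions underneath is an implementation detail):

```python
import numpy as np

pixels = np.array(["R", "B", "G", "R", "B"])  # a row of coloured pixels

red = pixels == "R"    # one comparison applied to every element at once
blue = pixels == "B"

out = pixels.copy()
out[red] = "B"         # make all red pixels blue
out[blue] = "R"        # make all (originally) blue pixels red
                       # green pixels are left alone
```

Note that the masks are computed from the original array before any writes, so a pixel recoloured from red to blue is not then flipped back to red.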

Limitations.
This architecture relies on the fact that the data sets are all acting on a single
instruction. However, if these data sets somehow rely on each other then you cannot
apply parallel processing. For example, if data A has to be processed before data B
then you cannot do A and B simultaneously. This dependency is what makes parallel
processing difficult to implement. And it is why sequential machines are still
extremely common.

Multiple Processors:
Moving on from an array processor, where a single instruction acts upon multiple
data sets, the next level of parallel processing is to have multiple instructions acting
upon multiple data sets. This is achieved by having a number of CPUs applied to a
single problem, with each CPU carrying out only part of the overall problem.

Fig. 3.3.d.4

A good example of this architecture is a supercomputer, such as the massively
parallel IBM Blue Gene, which has 4,098 processors, allowing for 560 TeraFlops of
processing. This is applied to problems such as predicting climate change or running
new drug simulations: large problems that can be broken down into smaller
sub-problems.

But even the humble CPU chip in your personal computer is likely to have multiple
cores. For example, the Intel Core Duo has two CPUs (called 'cores') inside the chip,
whilst a quad-core chip has four. A multi-core computer is a 'Multiple Instruction
Multiple Data' (MIMD) computer.
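The divide-and-combine idea can be sketched as follows. This uses Python threads purely to illustrate the decomposition into independent chunks; a real MIMD speedup would need the chunks running on separate cores:

```python
from concurrent.futures import ThreadPoolExecutor

def chunk_sum(chunk):
    """The sub-problem each worker solves independently."""
    return sum(chunk)

def parallel_sum(data, workers=4):
    """Cut the data into chunks, hand each to a worker, combine the results."""
    size = len(data) // workers
    chunks = [data[i * size:(i + 1) * size] for i in range(workers - 1)]
    chunks.append(data[(workers - 1) * size:])  # last chunk takes the remainder
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(chunk_sum, chunks))

total = parallel_sum(list(range(1000)))  # same answer as sum(range(1000))
```

Summing works here only because the chunks are independent; a computation where each step depends on the previous result could not be split this way, which is exactly the limitation described below.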

Limitations of multi-core processing:


This architecture depends on being able to cut a problem down into chunks, each of
which can then be processed independently. But not many problems can be broken
down in this way, and so it remains a less used architecture.

Furthermore, the software programmer has to write the code to take advantage of the
multiple cores. This is actually quite difficult, and even now most applications
running on a multi-core CPU such as the Intel Core 2 Duo will not be making use of
both cores most of the time.


Maths co-processor
So far, we have discussed parallel processing as a means of speeding up data
processing. This is fine, but it makes the assumption that the Arithmetic Logic Unit
(ALU) within the CPU is perfect for handling all kinds of data, and this is not always
true. There are two basic ways of doing calculations within a CPU: integer maths,
which deals only with whole numbers, and floating point maths, which can deal with
decimal or fractional numbers.
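The difference between the two kinds of arithmetic shows up in any language; a small Python sketch:

```python
int_result = 7 // 2          # integer maths: 3, the fractional part is discarded
float_result = 7 / 2         # floating point maths: 3.5

exact = 1 + 2 == 3           # integer arithmetic is exact...
inexact = 0.1 + 0.2 != 0.3   # ...floating point rounds, so both of these are True
```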

Large number ranges are best handled as 'floating-point' numbers, which are
discussed in 3.4. Handling floating point numbers efficiently requires wide registers
to deal with a calculation in one go, and the CPU architect may not want to dedicate
precious hardware space in the CPU to these wider registers.

So the idea of a 'maths co-processor' came about. A co-processor is specially
designed to carry out floating point calculations extremely quickly. It co-exists with
the CPU on the motherboard. Whenever a floating point calculation needs to be done,
the CPU hands the task over to the co-processor and then carries on with something
else until the task is complete.

The advantage of having a co-processor is that calculation (and hence performance)
is much faster. The disadvantages are that it is more expensive, requires more
motherboard space and takes more power.

But if the computer is dedicated to handling heavy floating point work then it may be
worth it. For instance a computer within a signal processing card in a communication
system may include a maths co-processor to process the incoming data as quickly as
possible.


3.3 Example Questions

The questions in this section are meant to mirror the type and form of
questions that a candidate would expect to see in an exam paper. As before,
the individual questions are each followed up with comments from an
examiner.

1. The Program Counter (Sequence Control Register) is a special register in the
processor of a computer.

a) Describe the function of the program counter. (2)

b) Describe two ways in which the program counter can change during the
normal execution of a program, explaining, in each case, how this change is
initiated. (4)

c) Describe the initial state of the program counter before the running of the
program. (2)

2. Explain what is meant by the term Von Neumann Architecture. (2)

3. Describe the fetch/decode part of the fetch/decode/execute/reset cycle,
explaining the purpose of any special registers that you have mentioned. (7)

4. a) Describe how pipelining normally speeds up the processing done by a
computer. (2)

b) State one type of instruction that would cause the pipeline system to be reset,
explaining why such a reset is necessary. (3)
