Computer Architecture
The phrase Von Neumann architecture derives from a paper written by the computer
scientist John von Neumann in 1945. It describes a design architecture for an
electronic digital computer with subdivisions of a central arithmetic part, a central
control part, a memory to store both data and instructions, external storage, and input
and output mechanisms. The meaning of the phrase has evolved to mean a stored-
program computer: one that keeps its programmed instructions, as well as its data, in
read-write, random-access memory (RAM). Von Neumann thus introduced the idea
of the stored program.
Previously data and programs were stored in separate memories. Von Neumann
realised that data and programs are indistinguishable and can, therefore, use the same
memory. On a large scale, the ability to treat instructions as data is what makes
assemblers, compilers and other automated programming tools possible. One can
"write programs which write programs". This led to the introduction of compilers
which accepted high level language source code as input and produced binary code as
output.
The Von Neumann architecture uses a single processor which follows a linear
sequence of fetch-decode-execute. In order to do this, the processor has to use some
special registers, which are discrete memory locations with special purposes attached.
These are
Register        Meaning
PC              Program Counter
CIR             Current Instruction Register
MAR             Memory Address Register
MDR             Memory Data Register
IR              Index Register
Accumulator     Holds results
The program counter keeps track of where to find the next instruction so that a copy
of the instruction can be placed in the current instruction register. Sometimes the
program counter is called the Sequence Control Register (SCR) as it controls the
sequence in which instructions are executed.
https://sites.google.com/site/computing9691/
The memory address register is used to hold the memory address that contains either
the next piece of data or an instruction that is to be used.
The memory data register acts like a buffer, holding anything that is copied from
memory ready for the processor to use.
The central processor contains the arithmetic-logic unit (also known as the arithmetic
unit) and the control unit. The arithmetic-logic unit (ALU) is where data is processed.
This involves arithmetic and logical operations. Arithmetic operations are those that
add and subtract numbers, and so on. Logical operations involve comparing binary
patterns and making decisions.
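As a rough sketch of this distinction (using Python operators as a stand-in for ALU hardware), arithmetic operations compute new numeric values, while logical operations compare binary patterns and support decisions:

```python
# Arithmetic operations produce new numeric values.
a, b = 12, 10
print(a + b)        # addition    -> 22
print(a - b)        # subtraction -> 2

# Logical (bitwise) operations compare binary patterns bit by bit.
x = 0b1100
y = 0b1010
print(bin(x & y))   # AND -> 0b1000
print(bin(x | y))   # OR  -> 0b1110

# Comparisons are what let the processor make decisions.
print(a > b)        # True
```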
The control unit fetches instructions from memory, decodes them and synchronises
the operations before sending signals to other parts of the computer.
The accumulator is in the arithmetic unit, the program counter and the instruction
registers are in the control unit and the memory data register and memory address
register are in the processor.
A typical layout is shown in Fig. 3.3.a.1 which also shows the data paths.
Fig 3.3.a.1: the central processing unit (CPU), containing the control unit (with the
PC, CIR, MAR and MDR) and the ALU (with the accumulator), connected by data
paths to main memory.
The fetch-decode-execute cycle proceeds as follows:
1. Load the address that is in the program counter (PC) into the memory address
register (MAR).
2. Increment the PC by 1.
3. Load the instruction that is in the memory address given by the MAR into the
memory data register (MDR).
4. Load the instruction that is now in the MDR into the current instruction
register (CIR).
5. Decode the instruction that is in the CIR.
6. If the instruction is a jump instruction then
a. Load the address part of the instruction into the PC
b. Reset by going to step 1.
7. Execute the instruction.
8. Reset by going to step 1.
Steps 1 to 4 are the fetch part of the cycle. Steps 5, 6a and 7 are the execute part of
the cycle and steps 6b and 8 are the reset part.
Step 1 simply places the address of the next instruction into the memory address
register so that the control unit can fetch the instruction from the right part of the
memory. The program counter is then incremented by 1 so that it contains the address
of the next instruction, assuming that the instructions are in consecutive locations.
The memory data register is used whenever anything is to go from the central
processing unit to main memory, or vice versa. Thus the next instruction is copied
from memory into the MDR and is then copied into the current instruction register.
Now that the instruction has been fetched the control unit can decode it and decide
what has to be done. This is the execute part of the cycle. If it is an arithmetic
instruction, this can be executed and the cycle restarted as the PC contains the address
of the next instruction in order. However, if the instruction involves jumping to an
instruction that is not the next one in order, the PC has to be loaded with the address
of the instruction that is to be executed next. This address is in the address part of the
current instruction, hence the address part is loaded into the PC before the cycle is
reset and starts all over again.
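The cycle described above can be sketched as a minimal simulation. The instruction set here (LOAD, ADD, JMP, HALT) and the memory layout are invented purely for illustration; real processors differ:

```python
# Toy memory: instruction addresses hold (opcode, operand); data addresses hold numbers.
memory = {
    0: ("LOAD", 10),   # load contents of address 10 into the accumulator
    1: ("ADD", 11),    # add contents of address 11
    2: ("JMP", 4),     # jump: skip the instruction at address 3
    3: ("ADD", 11),    # (never executed)
    4: ("HALT", None),
    10: 5,
    11: 7,
}

pc = 0          # program counter
acc = 0         # accumulator
running = True
while running:
    mar = pc                    # 1. copy the PC into the MAR
    pc += 1                     # 2. increment the PC
    mdr = memory[mar]           # 3. fetch the instruction into the MDR
    cir = mdr                   # 4. copy the MDR into the CIR
    opcode, operand = cir       # 5. decode the instruction in the CIR
    if opcode == "JMP":         # 6. jump: load the address part into the PC
        pc = operand
    elif opcode == "LOAD":      # 7. execute
        acc = memory[operand]
    elif opcode == "ADD":
        acc += memory[operand]
    elif opcode == "HALT":
        running = False

print(acc)  # 5 + 7 = 12
```

Note how the jump at address 2 simply overwrites the PC, so the next fetch comes from address 4 rather than 3.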
A CPU cannot do arithmetic on address registers directly, although it can do so
indirectly with an index register. The index register works with the address registers,
allowing a program to process strings of data efficiently. To process your first name,
for example, a program might place 300 (the address of the first letter) in the MAR
and zero in the index register. An indexed operation adds the index value to the
address in the MAR, retrieving the letter at location 300. Next, the program
increments the index by one and gets the next letter, repeating this process until it has
moved the whole name. By itself, the index register does little; its value is that it
gives greater speed and convenience in addressing. The index register is covered
further in 3.5 (i).
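Indexed addressing as just described can be sketched like this (the base address 300 and the stored name are invented for illustration):

```python
# A name stored one character per memory location, starting at address 300.
memory = {300 + i: ch for i, ch in enumerate("ADA")}

base = 300      # base address, as would be held in the MAR
index = 0       # index register, starting at zero
result = []

# Indexed operation: effective address = base address + index value.
while (base + index) in memory:
    result.append(memory[base + index])  # fetch the letter at the effective address
    index += 1                           # increment the index register

print("".join(result))  # ADA
```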
3.3 (c) The need for and the use of buses to convey data (Data,
Address and Control Buses)
A bus is a set of parallel wires connecting two or more components of the computer.
The CPU is connected to main memory by three separate buses. When the CPU
wishes to access a particular memory location, it sends this address to memory on the
address bus. The data in that location is then returned to the CPU on the data bus.
Control signals are sent along the control bus.
In the figure below, you can see that data, address and control buses connect the
processor, memory and I/O controllers. These are all system buses. Each bus is a
shared transmission medium, so only one device can transmit along a bus at any one
time.
Data and control signals travel in both directions between the processor, memory and
I/O controllers. Addresses, on the other hand, travel only one way along the address
bus: the processor sends the address of an instruction, or of data to be stored or
retrieved, to memory or to an I/O controller.
The control bus is a bi-directional bus meaning that signals can be carried in both
directions. The data and address buses are shared by all components of the system.
Control lines must therefore be provided to ensure that access to and use of the data
and address buses by the different components of the system does not lead to conflict.
The purpose of the control bus is to transmit command, timing and specific status
information between system components. Timing signals indicate the validity of data
and address information. Command signals specify operations to be performed.
Specific status signals indicate the state of a data transfer request, or the status of a
request by a component to gain control of the system bus.
The data bus, typically consisting of 8, 16 or 32 separate lines, provides a bi-
directional path for moving data and instructions between system components. The
width of the data bus is a key factor in determining overall system performance. For
example, if the data bus is 8 bits wide and each instruction is 16 bits long, then the
processor must access the main memory twice during each instruction cycle.
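The example above can be checked with a quick calculation (a sketch; the bus and instruction widths are just the illustrative figures from the text):

```python
import math

def accesses_per_instruction(instruction_bits, data_bus_bits):
    """Number of memory accesses needed to fetch one whole instruction."""
    return math.ceil(instruction_bits / data_bus_bits)

print(accesses_per_instruction(16, 8))   # 8-bit bus, 16-bit instruction -> 2 accesses
print(accesses_per_instruction(32, 32))  # bus as wide as the instruction -> 1 access
```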
Address bus: When the processor wishes to read a word (say 8, 16, or 32 bits) of data
from memory, it first puts the address of the desired word on the address bus. The
width of the address bus determines the maximum possible memory capacity of the
system. For example, if the address bus consisted of only 8 lines, then the maximum
address it could transmit would be (in binary) 11111111 or 255 – giving a maximum
memory capacity of 256 locations (including address 0). A more realistic minimum
bus width would be 20 lines, giving a memory capacity of 2^20 locations, i.e. 1 MB.
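These figures follow directly from the bus width, as a quick calculation shows (a sketch):

```python
def max_memory_locations(address_lines):
    """Number of distinct addresses a bus of this width can carry: 2 ** width."""
    return 2 ** address_lines

print(max_memory_locations(8))    # 256 locations (addresses 0 to 255)
print(max_memory_locations(20))   # 1 048 576 locations, i.e. 1 MB
```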
The first problem is that every piece of data and every instruction has to pass across
the data bus in order to move from main memory into the CPU (and back again). This
is a problem because the data bus is a lot slower than the rate at which the CPU can
carry out instructions: this is the 'Von Neumann bottleneck'. If nothing were done,
the CPU would spend most of its time waiting for instructions. A special kind of fast
memory called a 'cache' (pronounced 'cash') is used to tackle this problem: blocks of
memory are copied into the cache ahead of time, so that the instructions or data likely
to be required next are already close to the CPU.
The second problem is that both data and programs share the same memory space.
This is a problem because it is quite easy for a poorly written or faulty piece of code
to write data into an area holding other instructions, so trashing that program.
Another problem is that the rate at which data needs to be fetched and the rate at
which instructions need to be fetched are often very different, and yet they share the
same bottlenecked data bus. The Harvard architecture addresses this by splitting the
memory into two parts, one for data and another for programs, each accessed with its
own bus. This means the CPU can be fetching both data and instructions at the same
time, and there is also less chance of program corruption. This architecture is
sometimes used within the CPU to handle its caches, but it is less used with main
memory because of complexity and cost.
Pipelining:
Using the Von Neumann architecture for a microprocessor illustrates that an
instruction can basically be in one of three phases: being fetched (from memory),
being decoded (by the control unit) or being executed (by the control unit). An
alternative is to split the processor into three parts, each of which handles one of the
three stages. This results in the situation shown in Fig. 3.3.d.1, which shows how this
process, known as pipelining, works: while one instruction is being executed, the next
is being decoded and the one after that is being fetched.
Fig. 3.3.d.1 (pipeline diagram: Instruction 1 being decoded while Instruction 2 is
fetched)
This helps with the speed of throughput unless the next instruction in the pipe is not
the next one that is needed. Suppose Instruction 2 is a jump to Instruction 10. Then
Instructions 3, 4 and 5 need to be removed from the pipe and Instruction 10 needs to
be loaded into the fetch part of the pipe. Thus, the pipe will have to be cleared and
the cycle restarted in this case. The result is shown in Fig. 3.3.d.2 below
Fetch            Decode           Execute
Instruction 1
Instruction 2    Instruction 1
Instruction 10
Instruction 11   Instruction 10

Fig. 3.3.d.2
The effect of pipelining is that there are three instructions being dealt with at the
same time. This SHOULD reduce the execution times considerably (to approximately
1/3 of the standard times); however, this would only be true for a very linear program.
Once jump instructions are introduced, the problem arises that the wrong instructions
are in the pipeline waiting to be executed, so every time the sequence of instructions
changes, the pipeline has to be cleared and the process started again.
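This flushing behaviour can be sketched as a simple simulation. The three-stage pipe, the twelve-instruction program and the jump from instruction 2 to instruction 10 are all invented for illustration:

```python
# Toy program: instruction 2 is a jump to instruction 10; the rest do nothing.
program = {i: ("NOP", None) for i in range(1, 13)}
program[2] = ("JMP", 10)

pipeline = []   # instructions currently in the pipe (oldest first)
pc = 1          # next instruction to fetch
executed = []
flushes = 0

while len(executed) < 5:
    # Fetch stage: the next instruction enters the pipe.
    pipeline.append(pc)
    pc += 1
    # Once fetch, decode and execute are all occupied, the oldest instruction retires.
    if len(pipeline) == 3:
        current = pipeline.pop(0)
        executed.append(current)
        opcode, target = program[current]
        if opcode == "JMP":
            pipeline.clear()   # the wrong instructions are in the pipe: flush it
            flushes += 1
            pc = target        # restart fetching from the jump target

print(executed)  # [1, 2, 10, 11, 12]
print(flushes)   # 1
```

When the jump at instruction 2 executes, instructions 3 and 4 are already in the pipe and have to be discarded, exactly as in Fig. 3.3.d.2.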
Consider, for example, an image made up of many pixels that all need the same
operation applied to them. A sequential processor would examine each pixel one at a
time and apply the processing instruction. But data such as this can also be arranged
as an array. The simplest array is called a 'one-dimensional array' or 'vector'. A
slightly more complicated array has rows and columns; this is called a 'two-
dimensional array' or 'matrix'. The matrix is fundamental to graphics work.
An array processor (or vector processor) has a number of Arithmetic Logic Units
(ALU) that allows all the elements of an array to be processed at the same time.
The illustration in Fig. 3.3.d.3 below shows the architecture of an array (or vector)
processor.
Fig. 3.3.d.3
With an array processor, a single instruction is issued by a control unit and that
instruction is applied to a number of data sets at the same time.
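The idea of one instruction acting on many data elements can be sketched as follows. The list-based "vector operation" here is only a conceptual stand-in for real SIMD hardware, and the pixel values are invented:

```python
pixels = [10, 20, 30, 40]

# Sequential processor: examine one pixel at a time.
brightened_seq = []
for p in pixels:
    brightened_seq.append(p + 5)

# Array processor (conceptually): a single 'add' instruction is issued
# once and applied to every element of the array at the same time.
def vector_add(vector, scalar):
    """Apply the same instruction to all elements of the array."""
    return [v + scalar for v in vector]

brightened_vec = vector_add(pixels, 5)

print(brightened_seq)  # [15, 25, 35, 45]
print(brightened_vec)  # same result, issued as one array operation
```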
Limitations.
This architecture relies on the fact that the data sets are all acting on a single
instruction. However, if these data sets somehow rely on each other then you cannot
apply parallel processing. For example, if data A has to be processed before data B
then you cannot do A and B simultaneously. This dependency is what makes parallel
processing difficult to implement. And it is why sequential machines are still
extremely common.
Multiple Processors:
Moving on from an array processor, where a single instruction acts upon multiple
data sets, the next level of parallel processing is to have multiple instructions acting
upon multiple data sets. This is achieved by applying a number of CPUs to a single
problem, with each CPU carrying out only part of the overall problem.
Fig. 3.3.d.4
But even the humble CPU chip in your personal computer is likely to have multiple
cores. For example, the Intel Core Duo has two CPUs (called 'cores') inside the chip,
whilst a quad-core chip has four. A multi-core computer is a 'Multiple Instruction
Multiple Data' (MIMD) computer.
Furthermore, the software programmer has to write the code to take advantage of the
multiple cores. This is actually quite difficult, and even now most applications
running on a multi-core CPU such as the Intel Core 2 Duo will not be making use of
both cores most of the time.
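Splitting one problem between workers can be sketched as follows. Python threads are used here only as a stand-in for separate cores (on real multi-core hardware the two parts would genuinely run at the same time):

```python
import threading

numbers = list(range(1, 101))
results = {}

def partial_sum(name, chunk):
    """Each worker carries out only part of the overall problem."""
    results[name] = sum(chunk)

# Split the work between two workers (standing in for two cores).
t1 = threading.Thread(target=partial_sum, args=("first", numbers[:50]))
t2 = threading.Thread(target=partial_sum, args=("second", numbers[50:]))
t1.start(); t2.start()
t1.join(); t2.join()

# Combine the partial results.
print(results["first"] + results["second"])  # 5050
```

The hard part the text alludes to is exactly this: the programmer has to decide how to split the work and how to combine the results correctly.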
Maths co-processor
So far, we have discussed parallel processing as a means of speeding up data
processing. This is fine, but it makes the assumption that the Arithmetic Logic Unit
(ALU) within the CPU is equally good at handling all kinds of data, and this is not
always true. There are two basic ways of doing calculations within a CPU: integer
maths, which deals only with whole numbers, and floating point maths, which can
deal with decimal or fractional numbers. A maths co-processor is a separate unit
specialised for floating point calculations; adding one costs extra hardware, but if the
computer is dedicated to handling heavy floating point work then it may be worth it.
For instance, a computer within a signal processing card in a communication system
may include a maths co-processor to process the incoming data as quickly as
possible.
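The difference between the two kinds of maths can be illustrated briefly (Python is used here as a sketch; a real co-processor works on hardware representations such as IEEE 754 floating point):

```python
# Integer maths: whole numbers only; division must discard the remainder.
print(7 // 2)    # integer division -> 3
print(7 % 2)     # remainder        -> 1

# Floating point maths: fractional values are representable.
print(7 / 2)     # -> 3.5

# Floating point values are approximations, which is part of why dedicated
# hardware support matters for heavy numeric work.
print(0.1 + 0.2 == 0.3)  # False, because of binary rounding
```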
The questions in this section are meant to mirror the type and form of
questions that a candidate would expect to see in an exam paper. As before,
the individual questions are each followed up with comments from an
examiner.
b) Describe two ways in which the program counter can change during the
normal execution of a program, explaining, in each case, how this change is
initiated. (4)
c) Describe the initial state of the program counter before the running of the
program. (2)
b) State one type of instruction that would cause the pipeline system to be reset,
explaining why such a reset is necessary. (3)