Multiprocessor Architecture and Programming

UNIT-5
MULTIPROCESSOR ARCHITECTURE AND PROGRAMMING
MULTIPROCESSORS:
• A multiprocessor system is an interconnection of two or more CPUs with
memory and input-output equipment.
• Multiprocessor systems are classified as multiple instruction stream, multiple
data stream (MIMD) systems.
• Multiprocessors and multicomputers are distinct, although both support
concurrent operations.
• In a multicomputer, several autonomous computers are connected through a
network and may or may not communicate with each other. In a multiprocessor
system, a single operating system controls the interaction between the processors,
and all the components of the system cooperate in the solution of a problem.
• VLSI circuit technology has reduced the cost of computers to such a low
level that applying multiple processors to meet system performance
requirements has become an attractive design possibility.
FUNCTIONAL STRUCTURES.
LOOSELY COUPLED MULTIPROCESSORS:
 A multiprocessor is a system that has two or more processors.
When the degree of coupling between these processors is very low, the
system is called a loosely coupled multiprocessor system.
 In a loosely coupled system, each processor has its own local memory, a set
of input-output devices, and a channel and arbiter switch (CAS). The
processor together with its local memory, input-output devices, and
CAS is referred to as a computer module.
 Processes that execute on different computer modules communicate with
each other by exchanging messages through a physical segment
of the message transfer system (MTS). A loosely coupled system is also
known as a distributed system. It is efficient when
the processes running on different computer modules require minimal
interaction.
TIGHTLY COUPLED MULTIPROCESSORS:
 The throughput of a loosely coupled system may be too low for
applications that require fast access time. In this case, a tightly coupled
multiprocessor system must be used. A tightly coupled system
consists of processors, shared memory modules, and input-output channels.
 These units are connected through a set
of three interconnection networks: the processor-memory interconnection
network (PMIN), the I/O-processor interconnection network (IOPIN), and
the interrupt-signal interconnection network (ISIN). The three
networks are used as follows.
 PMIN: A switch that connects each processor to every memory
module. It can also be designed so that a processor can broadcast data
to one or more memory modules.
 ISIN: Allows each processor to direct an interrupt to any other
processor.
 IOPIN: Allows a processor to communicate with an I/O channel that
is connected to input-output devices.
PROCESSOR CHARACTERISTICS FOR MULTIPROCESSING:
Process recoverability: Processes and processors should be treated as
two different entities. If a processor fails, the interrupted process should
be recoverable so that another processor can resume its execution. If a processor does not support this
feature, the reliability of the multiprocessor is reduced. Most processors keep the
state of the currently executing instruction in internal registers, so after a
failure the result is not written back to memory.
Efficient context switching: The processor should support more than one
addressing domain (context) and therefore needs efficient context switching for the
effective utilization of resources. The context switch time is the total time taken to
switch from one process to another, which includes saving the state
of the current process and restoring the state of the new one. A special instruction and
multiple register sets are used to make context switching efficient, and
the multiprocessor architecture should support such an instruction and register
sets.
Large virtual and physical address space: The processor should support a large
physical address space so that programs can access large amounts of data when
required. It should also support a large virtual address space. The virtual memory
space should be segmented to achieve modular sharing. The processor should also provide
mechanisms for memory protection and for checking software reliability.
Effective synchronization primitives: The processor should provide
mechanisms that can be used to establish mutual
exclusion among concurrently executing cooperating processes. Mutual
exclusion requires certain operations to be executed
as indivisible operations. The semaphore is one of the most commonly used mechanisms for
achieving mutual exclusion.
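The role of a semaphore in mutual exclusion can be sketched in a few lines of Python. This is only an illustration under stated assumptions: Python threads stand in for processors, and the names counter, mutex, worker, and run are hypothetical, not taken from these notes. A binary semaphore makes the read-modify-write of a shared counter behave as one indivisible operation.

```python
import threading

counter = 0                        # shared data updated by every worker (hypothetical)
mutex = threading.Semaphore(1)     # binary semaphore: at most one holder at a time


def worker(iterations):
    """Increment the shared counter inside a critical section."""
    global counter
    for _ in range(iterations):
        mutex.acquire()            # P (wait): enter the critical section
        try:
            counter += 1           # read-modify-write, protected by the semaphore
        finally:
            mutex.release()        # V (signal): leave the critical section


def run(num_workers=4, iterations=10_000):
    """Start the workers, wait for all of them, and return the final count."""
    global counter
    counter = 0
    threads = [threading.Thread(target=worker, args=(iterations,))
               for _ in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter
```

Calling run(4, 10_000) should return 40000, because no two workers can be inside the critical section at once. In a real multiprocessor, the acquire and release steps themselves rely on an indivisible hardware operation.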
Interprocessor communication mechanism: The processors used to
build a multiprocessor must support interprocessor communication, and the
mechanism must be implemented in
hardware. Such a mechanism is especially useful when processors
frequently exchange requests for services.
INTERCONNECTION NETWORKS.
TIME SHARED OR COMMON BUSES:
 A common-bus multiprocessor system consists of a number of processors
connected through a common path to a memory unit.
 Disadvantage - Only one processor can communicate with the memory or
another processor at any given time. As a consequence, the total
transfer rate within the system is limited by the speed of the single path.
 A more economical implementation of a dual bus structure is depicted in
the figure below. Part of the local memory may be designed as a cache memory
attached to the CPU.
CROSSBAR SWITCH:
 Consists of a number of crosspoints that are placed at intersections
between processor buses and memory module paths.
 The small square in each crosspoint is a switch that determines the path
from a processor to a memory module.
 Advantage - Supports simultaneous transfers from all memory modules.
 Disadvantage - The hardware required to implement the switch can
become quite large and complex.
MULTIPORT MEMORIES:
 A multiport memory system employs separate buses between each
memory module and each CPU.
 The module must have internal control logic to determine which port
will have access to memory at any given time.
 Memory access conflicts are resolved by assigning fixed priorities to
each memory port.
 Advantage - The high transfer rate can be achieved because of the
multiple paths.
 Disadvantage - It requires expensive memory control logic and a large
number of cables and connections.
MULTISTAGE INTERCONNECTION NETWORK:
 The basic component of a multistage network is a two-input, two-output
interchange switch, as shown in the figure below.
 Using the 2x2 switch as a building block, it is possible to build a multistage
network to control the communication between a number of sources and
destinations.
 One such topology is the omega switching network shown in the figure below.
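How a message finds its way through an omega network can be sketched as follows. The notes name the topology but do not describe a routing rule, so this example assumes the standard destination-tag scheme: every stage is preceded by a perfect shuffle of the lines, and each 2x2 switch inspects one bit of the destination address (most significant bit first), taking the upper output for 0 and the lower output for 1. The function name omega_route is hypothetical.

```python
def omega_route(source, dest, n_bits):
    """Return the line number reached after each stage of a 2**n_bits omega network."""
    size = 1 << n_bits
    position = source
    path = []
    for stage in range(n_bits):
        # Perfect shuffle: rotate the n-bit line number left by one bit.
        position = ((position << 1) | (position >> (n_bits - 1))) & (size - 1)
        # Destination-tag routing: this stage uses one destination bit, MSB first.
        bit = (dest >> (n_bits - 1 - stage)) & 1
        # The 2x2 switch sends the message to its upper (0) or lower (1) output.
        position = (position & ~1) | bit
        path.append(position)
    return path
```

For an 8 x 8 network, omega_route(2, 6, 3) gives [5, 3, 6]: after three stages the message sits on line 6, the requested destination. Because each stage overwrites one bit of the line number with a destination bit, the path between any source and destination is fixed by the two addresses.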
PERFORMANCE OF INTERCONNECTION NETWORK:
 A multiprocessor system consists of multiple processing units connected via
some interconnection network plus the software needed to make the
processing units work together.
 A number of communication styles exist for multiprocessing networks.
These can be broadly classified according to the communication model as
shared memory (single address space) versus message passing (multiple
address spaces).
o Communication in shared memory systems is performed by writing
to and reading from the global memory.
o Communication in message passing systems is accomplished via send
and receive commands.
o In both cases, the interconnection network plays a major role in
determining the communication speed. Two schemes are introduced,
namely static and dynamic interconnection networks.
 Static networks form all connections when the system is
designed rather than when the connection is needed. In a
static network, messages must be routed along established
links. Examples are hypercube, mesh, and k-ary n-cube topologies.
 Dynamic interconnection networks establish connections
between two or more nodes on the fly as messages are routed
along the links. Examples are bus, crossbar, and multistage interconnection networks.
Analysis and Performance:
Dynamic Networks
 The Crossbar:
o The cost of the crossbar system can be measured in terms of the
number of switching elements (crosspoints) required inside the
crossbar. For N inputs and N outputs, the crossbar possesses a quadratic rate of cost (complexity)
given by O(N^2).
o The delay (latency) within a crossbar switch, measured as
the input-to-output delay, is constant. The crossbar
possesses a constant rate of delay (latency) given by O(1). The
high cost (complexity) of the crossbar network
pays off in the form of a reduction in time (latency).
o The crossbar is a nonblocking network; that is, it allows
multiple output connection patterns (permutations) to be achieved.
o A fault-tolerant system can be defined as a system that can
still function in the presence of faulty components inside the
system. The crossbar can be affected by a single-point failure.
Nevertheless, segmenting the crossbar and realizing each segment
independently can reduce the effect of a single-point failure.
 Multiple Bus:
o It consists of M memory modules, N processors, and B buses. A
given bus is dedicated to a particular processor for the duration of a
bus transaction.
o A processor-memory transfer can use any of the available buses.
Given B buses in the system, up to B requests for memory
use can be served simultaneously.
o A multiple bus with B buses possesses an O(B) rate of cost (complexity) growth.
o With N processors and B buses, the multiple bus possesses an O(B x N) rate of delay (latency)
growth.
o The multiple-bus multiprocessor organization offers the desirable feature
of being highly reliable and fault-tolerant. This is because a single bus
failure in a B-bus system will leave (B - 1) distinct fault-free paths
between the processors and the memory modules.
o On the other hand, when the number of buses is less than the
number of memory modules (or the number of processors), bus
contention is expected to increase.
 Multistage Interconnection Networks (MINs):
o An N x N MIN built from 2x2 switching elements (SEs) has log2 N stages. Each stage consists of N/2 SEs.
o The network cost (complexity), measured in terms of the total
number of SEs, is O(N log2 N).
o The latency (time) complexity, measured by the number of SEs along
the path from input to output, is O(log2 N).
o Simplicity of message routing inside a MIN is a desirable feature of
such networks. There exists a unique path between a given input-output
pair.
o MINs are characterized as being 0-fault tolerant; that is, a MIN
cannot tolerate the failure of a single component.
Figure 2.10: Performance Comparison of Dynamic Networks.
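The trade-off between crossbar and multistage networks can be made concrete with a small calculation. The sketch below assumes an N x N crossbar with N^2 crosspoints and unit delay, and an N x N multistage network built from 2x2 switches with log2 N stages of N/2 switches each. N is assumed to be a power of two, and the function names are hypothetical.

```python
def crossbar(n):
    """Crosspoint count and input-to-output delay of an n x n crossbar."""
    return {"cost": n * n, "latency": 1}


def multistage(n):
    """Switch count and stage count of an n x n network built from 2x2 switches."""
    stages = n.bit_length() - 1        # log2(n), valid when n is a power of two
    return {"cost": (n // 2) * stages, "latency": stages}
```

For N = 1024 the crossbar needs 1,048,576 crosspoints for a single-switch delay, while the multistage network needs 5,120 switches at a delay of 10 stages. This is the cost versus latency exchange described above.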
Static Networks
1. Degree of a node, d, is defined as the number of channels incident on the
node.
2. Diameter, D, of a network having N nodes is defined as the longest path, p,
of the shortest paths between any two nodes. For example, the diameter of
a 4 x 4 mesh is D = 6.
3. A network is said to be symmetric if it is isomorphic to itself with any node
labeled as the origin; that is, the network looks the same from any node.
Ring and torus networks are symmetric, while linear arrays and mesh
networks are not.
 Completely Connected Networks (CCNs):
o The cost of a completely connected network having N nodes,
measured in terms of the number of links in the network, is given
by N(N - 1)/2, that is, O(N^2).
o The delay (latency) complexity of CCNs, measured in terms of the
number of links traversed as messages are routed from any source to
any destination, is constant, that is, O(1).
o The degree of a node in a CCN is N - 1, that is, O(N), while the
diameter is O(1).
 Linear Array Networks (LCNs):
o In this network architecture, each node is connected to its two
immediate neighboring nodes. Each of the two nodes at the extreme
ends of the network is connected only to its single immediate
neighbor.
o The network cost (complexity), measured in terms of the number of
nodes of the linear array, is O(N).
o The delay (latency) complexity, measured in terms of the average
number of nodes that must be traversed to reach from a source node
to a destination node, is N/2, that is, O(N).
o The node degree in the linear array is 2, that is, O(1), and the
diameter is N - 1, that is, O(N).
Figure 2.11: Performance Characteristics of Static Networks.
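The degree, diameter, and link-count definitions can be checked on small networks with a short script. The sketch below builds adjacency lists for a linear array, a completely connected network, and a mesh, then measures them with a breadth-first search from every node. All function names are hypothetical, and the networks are assumed to be connected.

```python
from collections import deque


def diameter(adj):
    """Longest of the shortest paths between any two nodes (BFS from every node)."""
    best = 0
    for start in adj:
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in adj[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        best = max(best, max(dist.values()))
    return best


def max_degree(adj):
    """Largest number of links incident on any node."""
    return max(len(nbrs) for nbrs in adj.values())


def link_count(adj):
    """Number of bidirectional links (each appears twice in the adjacency lists)."""
    return sum(len(nbrs) for nbrs in adj.values()) // 2


def linear_array(n):
    return {i: [j for j in (i - 1, i + 1) if 0 <= j < n] for i in range(n)}


def complete(n):
    return {i: [j for j in range(n) if j != i] for i in range(n)}


def mesh(rows, cols):
    adj = {}
    for r in range(rows):
        for c in range(cols):
            nbrs = [(r + dr, c + dc) for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1))]
            adj[(r, c)] = [(a, b) for a, b in nbrs if 0 <= a < rows and 0 <= b < cols]
    return adj
```

With N = 8, the completely connected network has 28 links, degree 7, and diameter 1, while the linear array has degree 2 and diameter 7. A 4 x 4 mesh has diameter 6.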
PARALLEL MEMORY ORGANIZATIONS.
INTERLEAVED MEMORY CONFIGURATION:
 Memory interleaving divides the main memory into a number
of modules (banks), with each module interacting with the processor
independently of the others.
 Each memory module has its own memory address register (MAR) and data
register. The MAR is connected to a common
unidirectional memory bus, and the data register is connected to a common
data bus.
 When the CPU requests data, it sends the address to memory via the common
memory bus. The address is saved in the memory address register. The
data is then retrieved from the module and kept in the data register, from where it is
sent to the CPU via the common data bus.
To understand this scenario, consider the example of a two-way interleaved
memory.
 In a two-way interleaved memory, main memory is divided into two
modules in such a way that all the even addresses are in the first module
and all the odd addresses are in the second module.
 Equivalently, the least significant bit (LSB) of the memory address specifies the
module: if the LSB is 0, the first module is selected, and if
the LSB is 1, the second module is selected. In a four-way interleaved
memory, the two least significant bits of the memory address are used to
specify the memory module.
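The address-to-module mapping can be written out directly. This is a minimal sketch assuming low-order interleaving with a power-of-two number of modules. The function name locate and the (module, offset) return convention are hypothetical.

```python
def locate(address, num_modules):
    """Split an address into (module, offset) for low-order interleaving.

    num_modules is assumed to be a power of two.
    """
    module = address & (num_modules - 1)   # low-order bits select the module
    offset = address // num_modules        # remaining bits select the word inside it
    return module, offset
```

With two modules, address 6 maps to module 0 and address 7 to module 1. With four modules, consecutive addresses 0 to 7 fall in modules 0, 1, 2, 3, 0, 1, 2, 3, so sequential accesses are spread across all the banks.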
MULTICACHE PROBLEM AND SOLUTION:
 A coherence problem may occur in a multicache system as soon
as data inconsistency exists in the private caches and the main
memory.
 Without an effective solution to the coherence problem, the
effectiveness of a multicache system will be inherently limited.
 The problem is closely examined in this paper and previous
solutions, both centralized approaches and distributed
approaches, are analyzed based on the notion of semicritical
sections.
 A state model is then presented which clarifies various coherence
mechanisms as well as introduces a new state to enable the
multicache system to more efficiently handle the processor
writes.
 Software guidance, for performance and not for integrity, is
advocated in a new proposal which in a practical multicache
environment explores the benefit of the new state with little cost.
Cache Coherence:
 In a multiprocessor system, data inconsistency may occur among adjacent
levels or within the same level of the memory hierarchy. For example, the
cache and the main memory may have inconsistent copies of the same
object.
 As multiple processors operate in parallel, multiple
caches may independently hold different copies of the same memory block. This
creates the cache coherence problem. Cache coherence schemes
avoid this problem by maintaining a uniform state for each cached block of
data.
 Let X be an element of shared data that has been referenced by two
processors, P1 and P2. In the beginning, the three copies of X (one in each cache and one in main memory) are
consistent. If processor P1 writes new data X1 into its cache
using a write-through policy, the same copy is written immediately
into the shared memory. In this case, inconsistency occurs between
the two caches, because P2's cache still holds the old value X. When a write-back policy is
used, main memory is updated only when the modified data in the
cache is replaced or invalidated, so until then both main memory and P2's cache are inconsistent with P1's cache.
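The two write policies can be compared with a toy model. The sketch below deliberately includes no coherence protocol, so the stale copy stays visible. The class Cache, the function demo, and the values 10 and 11 are hypothetical, and eviction writes every line back (there is no dirty bit) to keep the model short.

```python
class Cache:
    """A private cache holding copies of words from a shared main memory."""

    def __init__(self, memory, write_through):
        self.memory = memory
        self.write_through = write_through
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:            # miss: fetch the word from main memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]

    def write(self, addr, value):
        self.lines[addr] = value
        if self.write_through:                # write-through: update memory immediately
            self.memory[addr] = value

    def evict(self, addr):
        value = self.lines.pop(addr)
        if not self.write_through:            # write-back: memory is updated on replacement
            self.memory[addr] = value


def demo(write_through):
    """Return the value of X seen by P1's cache, P2's cache, and main memory."""
    memory = {"X": 10}
    p1 = Cache(memory, write_through)
    p2 = Cache(memory, write_through)
    p1.read("X")
    p2.read("X")                              # three consistent copies of X
    p1.write("X", 11)                         # P1 writes the new value
    return p1.read("X"), p2.read("X"), memory["X"]
```

demo(True) returns (11, 10, 11): with write-through, memory is current but P2's cache is stale. demo(False) returns (11, 10, 10): with write-back, both P2's cache and main memory lag behind P1's cache until the line is evicted.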
MULTIPROCESSOR OPERATING SYSTEM.
CLASSIFICATION:
Most computer systems are single-processor systems, i.e., they have only one
processor. However, multiprocessor or parallel systems are increasing in
importance. These systems have multiple processors working in
parallel that share the computer clock, memory, bus, peripheral devices, etc.
Types of Multiprocessors
There are mainly two types of multiprocessors: symmetric and asymmetric.
Symmetric Multiprocessors
In these systems, each processor runs an identical copy of the operating
system, and the processors all communicate with each other. All the processors are in a peer-to-peer
relationship, i.e., no master-slave relationship exists between them.
An example of a symmetric multiprocessing system is the Encore version of
Unix for the Multimax computer.
Asymmetric Multiprocessors
In asymmetric systems, each processor is given a predefined task. A
master processor gives instructions to all the other processors, so an asymmetric
multiprocessor system has a master-slave relationship.
Asymmetric multiprocessors were the only type of multiprocessor available before
symmetric multiprocessors were created, and they remain the cheaper option.
Advantages of Multiprocessor Systems
There are multiple advantages to multiprocessor systems. Some of these are:
More Reliable Systems
In a multiprocessor system, even if one processor fails, the system will not halt.
This ability to continue working despite hardware failure is known as graceful
degradation. For example, if there are 5 processors in a multiprocessor system
and one of them fails, the other 4 processors are still working. The system only
becomes slower and does not grind to a halt.
Enhanced Throughput
If multiple processors work in tandem, the throughput of the system
increases, i.e., the number of processes executed per unit of time increases. If
there are N processors, the throughput increases by a factor of just under N.
More Economic Systems
Multiprocessor systems are cheaper than single-processor systems in the long run
because they share data storage, peripheral devices, power supplies, etc. If
multiple processes share data, it is better to schedule them on a
multiprocessor system with shared data than to use different computer systems
with multiple copies of the data.
Disadvantages of Multiprocessor Systems
There are also some disadvantages to multiprocessor systems. Some of these
are:
Increased Expense
Even though multiprocessor systems are cheaper in the long run than
multiple computer systems, they are still quite expensive. It is much cheaper to
buy a simple single-processor system than a multiprocessor system.
Complicated Operating System Required
The multiple processors in a multiprocessor system share peripherals,
memory, etc. It is therefore much more complicated to schedule processes and allocate
resources to them than in single-processor systems. Hence, a more complex
operating system is required in multiprocessor systems.
Large Main Memory Required
All the processors in the multiprocessor system share the memory, so a much
larger pool of memory is required than in single-processor systems.
SOFTWARE REQUIREMENTS FOR MULTIPROCESSORS:
The task of writing basic system software for a multiprocessor system is simplified
if the system is organized so that it may be programmed as a single virtual
machine. This strategy is considered for tightly coupled real-time systems with
many processors. Software is written in a high-level language supporting parallel
execution, without knowledge of the target hardware configuration.
OS requirements:
 Must provide the functionality of a multiprogramming OS plus additional
features to support multiple processors.
 Simultaneous concurrent processes or threads: kernel routines need to be
reentrant.
 Scheduling can be done by any processor, which can create conflicts.
 Synchronization through locks is required.
 Memory management needs to be coordinated across the different processors.
 Much more complex than a multiprogramming OS alone.

EXPLOITING CONCURRENCY FOR MULTIPROCESSING.
EXPLOITING PARALLELISM:
Parallelism in computer architectures is the ability for multiple actions to occur
simultaneously. Parallelism is often categorized into several types, two of which are discussed
here.
Data-level parallelism (DLP)
o General-purpose architectures often support data-level parallelism (DLP),
the parallelism found in the application of identical operations across many
different data elements [HGLS86]. Some scientific applications and many
multimedia applications exhibit large levels of DLP. Vector processors such
as Cray Research's Cray-1 and Cray-2 use single instructions to operate on
vectors of data elements at a time.
o SIMD instructions allow programmers to specify a single operation to apply
to multiple, distinct data elements. Unlike vector processors, SIMD units generally
operate simultaneously on only the small number of elements that fit in a single register.
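The idea of one operation applied across a register-width group of elements can be modeled in plain Python. This is only a conceptual model, not real SIMD execution: the lane width of 4 and the function name simd_add are assumptions, and each loop iteration stands for one modeled instruction.

```python
def simd_add(a, b, lanes=4):
    """Add two equal-length lists, `lanes` elements per modeled instruction."""
    result = []
    instructions = 0
    for start in range(0, len(a), lanes):
        # One modeled SIMD instruction: the same add applied to every lane of the group.
        result.extend(x + y for x, y in zip(a[start:start + lanes],
                                            b[start:start + lanes]))
        instructions += 1
    return result, instructions
```

Adding two six-element lists with four lanes takes two modeled instructions instead of six scalar additions. The second instruction runs with only two of its four lanes occupied.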
Instruction-level parallelism (ILP)
o Many processors today also support instruction-level parallelism (ILP) by
taking advantage of resource and data independence between instructions
and executing multiple operations at one time. Superscalar processors (or
superscalars) are capable of fetching and executing more than one
instruction simultaneously and thus exploit some level of ILP.
o Statically scheduled superscalars may execute several instructions in
program order if there are no hazards between the instructions. These
superscalars stop executing later instructions when the first hazard is
encountered, which limits ILP. Another class of architectures, very long
instruction word (VLIW) architectures, contains multiple operations in a
single instruction.
MULTIPROCESSOR SCHEDULING STRATEGIES:
 Multiprocessor scheduling focuses on designing the scheduling
function of a system that consists of more than one processor. Multiple CPUs share
the load (load sharing) so that various
processes run simultaneously. In general, multiprocessor scheduling is
more complex than single-processor scheduling. When the processors are identical, any process can
run on any processor at any time.
 The multiple CPUs in the system are in close communication and share
a common bus, memory, and other peripheral devices, so
the system is tightly coupled. Such systems are used to
process bulk amounts of data, mainly in applications such as
satellite systems, weather forecasting, etc.
 When the processors are identical (homogeneous) in
terms of their functionality, any
available processor can run any process in the queue.
Dimensions of multiple processor management:
 Multiprocessor scheduling is two-dimensional. The scheduler has to decide
which process to run and which CPU to run it on. This extra dimension
greatly complicates scheduling on multiprocessors.
 Another complicating factor is that in some systems all the processes are
unrelated, whereas in others they come in groups. An example of the
former situation is a timesharing system in which independent users start
up independent processes. The processes are unrelated, and each one can
be scheduled without regard to the others.
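The two decisions (which process, which CPU) can be seen side by side in a small scheduler for unrelated processes. The policy here is an assumption chosen for illustration, not one prescribed by these notes: the longest process is picked first, and it is placed on the CPU that becomes free earliest. The function name schedule and the process names are hypothetical.

```python
import heapq


def schedule(processes, num_cpus):
    """Place each (name, run_time) pair on a CPU; return the placements and makespan."""
    cpus = [(0, cpu) for cpu in range(num_cpus)]   # (time the CPU becomes free, CPU id)
    heapq.heapify(cpus)
    placement = []
    # Decision 1: which process to run next (longest run time first).
    for name, run_time in sorted(processes, key=lambda p: -p[1]):
        # Decision 2: which CPU to run it on (the one that is free earliest).
        free_at, cpu = heapq.heappop(cpus)
        placement.append((name, cpu, free_at))
        heapq.heappush(cpus, (free_at + run_time, cpu))
    makespan = max(free_at for free_at, _ in cpus)
    return placement, makespan
```

For processes A, B, C, D with run times 3, 5, 2, 4 on two CPUs, B and D start at time 0 on CPU 0 and CPU 1, A follows D at time 4, and C follows B at time 5, so all work finishes at time 7.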
DETERMINISTIC SCHEDULING MODELS:
 This section examines schedules in which more than one processor can
be used to optimize measures of performance. It is divided into
two major parts.
 In the first part (Common Scheduling Environments), the parameters
identified in most of the scheduling literature and discussed earlier prevail
unless stated otherwise.
 That is, we assume a number of identical processors, a set of tasks with
equal or unequal execution times, and a precedence order. Both
preemptive and nonpreemptive disciplines are examined.
 In the second part (Special Scheduling Environments), additional constraints
are introduced. These constraints include systems with a finite number of
resources in each member of a set of resource classes, periodic jobs with
specified initiation and completion times, and the presence of intermediate
deadlines within a schedule.
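The common scheduling environment (identical processors, known execution times, a precedence order) can be illustrated with a simple greedy scheduler. This is a sketch of one nonpreemptive, list-style heuristic and is not claimed to be optimal or to be the method the notes have in mind. The task names, the preds mapping, and the function name list_schedule are hypothetical, and the precedence graph is assumed to be acyclic.

```python
def list_schedule(times, preds, num_procs):
    """Greedy nonpreemptive scheduling; returns {task: (processor, start, finish)}."""
    finish = {}
    proc_free = [0] * num_procs            # time at which each processor becomes idle
    result = {}
    remaining = list(times)                # priority list: the order the tasks were given
    while remaining:
        # Take the first task in the list whose predecessors are all scheduled.
        task = next(t for t in remaining
                    if all(p in finish for p in preds.get(t, ())))
        # The task cannot start before every predecessor has finished.
        ready = max((finish[p] for p in preds.get(task, ())), default=0)
        # Choose the processor on which the task can start earliest.
        proc = min(range(num_procs), key=lambda i: max(proc_free[i], ready))
        start = max(proc_free[proc], ready)
        finish[task] = start + times[task]
        proc_free[proc] = finish[task]
        result[task] = (proc, start, finish[task])
        remaining.remove(task)
    return result
```

With execution times T1 = 2, T2 = 3, T3 = 2, T4 = 1, where T3 follows T1 and T4 follows both T2 and T3, two processors complete the set at time 5. T1 and T2 run in parallel, T3 starts at time 2, and T4 starts at time 4.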
