CH17-COA10e - Parallel Processing
William Stallings
Computer Organization and Architecture
10th Edition
© 2016 Pearson Education, Inc., Hoboken, NJ. All rights reserved.
Chapter 17
Parallel Processing
Multiple Processor Organization
[Figure: taxonomy of processor organizations, including the uniprocessor, symmetric multiprocessor (SMP), nonuniform memory access (NUMA), and clusters]
[Figure: alternative computer organizations, beginning with (a) SISD; labels include CU (control unit), PU (processing unit), LM (local memory), IS (instruction stream), DS (data stream), shared memory, and an interconnection network]
[Figure: execution timelines for Process 1, Process 2, and Process 3, shown in two arrangements, with each process alternating between Blocked and Running]
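[Figure: multiprocessor organizations: processors and I/O connected through an interconnection network to main memory; and a shared-bus organization in which processors share the bus with main memory and an I/O subsystem of I/O adapters]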
Simplicity
Simplest approach to multiprocessor organization
Flexibility
Generally easy to expand the system by attaching more processors to the bus
Reliability
The bus is essentially a passive medium, and the failure of any attached device should not cause failure of the whole system
Scheduling
Any processor may perform scheduling, so conflicts must be avoided
The scheduler must assign ready processes to available processors
Synchronization
With multiple active processes having potential access to shared address spaces or I/O resources, care must be taken to provide effective synchronization
Synchronization is a facility that enforces mutual exclusion and event ordering
Memory management
In addition to dealing with all of the issues found on uniprocessor machines, the OS needs to exploit the available hardware parallelism to achieve the best performance
Paging mechanisms on different processors must be coordinated to enforce consistency when several processors share a page or segment and to decide on page replacement
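Synchronization as "mutual exclusion and event ordering" can be illustrated at the thread level. The following is a minimal sketch, not an OS implementation: it assumes Python's standard threading module, and the names SharedCounter and run_workers are hypothetical. The lock makes each read-modify-write of the shared value a critical section (mutual exclusion), and join() makes the caller wait until every worker has finished (event ordering). Note that CPython threads do not run Python bytecode truly in parallel, so this shows the discipline rather than a speedup.

```python
import threading

class SharedCounter:
    """Hypothetical shared data structure; the lock enforces mutual exclusion."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def add(self, n):
        for _ in range(n):
            with self._lock:      # at most one thread inside this critical section
                self.value += 1   # read-modify-write cannot interleave with another thread's

def run_workers(num_threads=4, increments=10_000):
    counter = SharedCounter()
    workers = [threading.Thread(target=counter.add, args=(increments,))
               for _ in range(num_threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()                  # event ordering: continue only after all workers finish
    return counter.value

print(run_workers())  # 40000
```

Because every increment happens under the lock, the final value is always num_threads * increments, regardless of how the threads are scheduled.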
Modified
The line in the cache has been modified and is available only in this cache
Exclusive
The line in the cache is the same as that in main memory and is not present in any other cache
Shared
The line in the cache is the same as that in main memory and may be present in another cache
Invalid
The line in the cache does not contain valid data
                              | M (Modified)       | E (Exclusive)      | S (Shared)                    | I (Invalid)
This cache line valid?        | Yes                | Yes                | Yes                           | No
The memory copy is...         | out of date        | valid              | valid                         | n/a
Copies exist in other caches? | No                 | No                 | Maybe                         | Maybe
A write to this line...       | does not go to bus | does not go to bus | goes to bus and updates cache | goes directly to bus
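[Figure: MESI state transition diagram: (a) line in cache at initiating processor, (b) line in snooping cache; transition labels include WM (write miss), SHR (snoop hit on read), and SHW (snoop hit on write)]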
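The MESI states and their transitions can be traced with a small simulation. This is a minimal sketch under stated assumptions: a single memory line, an atomic snooping bus, and write-invalidate behavior consistent with the state definitions and table above. The class Bus and its methods are hypothetical, and data values and bus arbitration are not modeled. A read miss ends in Exclusive when no other cache holds the line and in Shared otherwise; a write to an Invalid or Shared line invalidates all other copies; a snooping cache holding a Modified copy writes it back first.

```python
from enum import Enum

class State(Enum):
    M = "Modified"
    E = "Exclusive"
    S = "Shared"
    I = "Invalid"

class Bus:
    """Hypothetical snooping bus tracking one memory line across several caches."""

    def __init__(self, num_caches):
        self.states = [State.I] * num_caches
        self.writebacks = 0  # times a Modified copy was written back to memory

    def _snoop(self, requester, for_write):
        """Other caches observe the bus transaction; True if any held a valid copy."""
        shared = False
        for i, st in enumerate(self.states):
            if i == requester or st is State.I:
                continue
            shared = True
            if st is State.M:
                self.writebacks += 1  # modified data goes back to memory before it is used elsewhere
            # SHW (snooped write): copy becomes Invalid; SHR (snooped read): copy becomes Shared
            self.states[i] = State.I if for_write else State.S
        return shared

    def read(self, cpu):
        if self.states[cpu] is State.I:  # read miss
            shared = self._snoop(cpu, for_write=False)
            self.states[cpu] = State.S if shared else State.E
        return self.states[cpu]          # read hit leaves the state unchanged

    def write(self, cpu):
        if self.states[cpu] in (State.I, State.S):  # write miss, or write hit on a shared line
            self._snoop(cpu, for_write=True)         # invalidate every other copy
        self.states[cpu] = State.M                   # hit on E or M needs no bus traffic
        return self.states[cpu]

bus = Bus(2)
print(bus.read(0).name)   # E  (read miss, no other copy)
print(bus.read(1).name)   # S  (read miss; cache 0 goes E -> S)
print(bus.write(0).name)  # M  (cache 1 is invalidated)
print(bus.read(1).name)   # S  (cache 0 writes back and goes M -> S)
print([s.name for s in bus.states], bus.writebacks)  # ['S', 'S'] 1
```

Only writes to Invalid or Shared lines generate invalidation traffic here, which mirrors the last row of the table: a write to an M or E line does not go to the bus.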
Multithreading
Allows for a high degree of instruction-level parallelism without increasing circuit complexity or power consumption
The instruction stream is divided into several smaller streams, known as threads, that can be executed in parallel
Thread:
• Dispatchable unit of work within a process
• Includes processor context (which includes the program counter and stack pointer) and a data area for the stack
• Executes sequentially and is interruptible so that the processor can turn to another thread
Process:
• An instance of a program running on a computer
• Two key characteristics:
  • Resource ownership
  • Scheduling/execution
Process switch:
• Operation that switches the processor from one process to another by saving all the process control data, registers, and other information for the first and replacing them with the process information for the second
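The save-then-restore structure of a process switch can be sketched in a few lines. This is a simplified illustration, not how any particular OS does it: Context and ProcessControlBlock are hypothetical names, and the saved state is limited to the program counter, stack pointer, and a few general registers, whereas a real process switch also saves memory-management and other process control data.

```python
from dataclasses import dataclass, field

@dataclass
class Context:
    """Processor context: program counter, stack pointer, general registers."""
    pc: int = 0
    sp: int = 0
    regs: list = field(default_factory=lambda: [0] * 4)

@dataclass
class ProcessControlBlock:
    """Hypothetical per-process record; a real OS keeps much more control data."""
    pid: int
    saved: Context = field(default_factory=Context)

def process_switch(cpu: Context, old: ProcessControlBlock, new: ProcessControlBlock) -> None:
    # Save the outgoing process's processor state into its control block ...
    old.saved = Context(cpu.pc, cpu.sp, list(cpu.regs))
    # ... then restore the incoming process's saved state onto the processor.
    cpu.pc, cpu.sp, cpu.regs = new.saved.pc, new.saved.sp, list(new.saved.regs)

cpu = Context(pc=100, sp=0x8000, regs=[1, 2, 3, 4])
p1 = ProcessControlBlock(pid=1)
p2 = ProcessControlBlock(pid=2, saved=Context(pc=500, sp=0x9000, regs=[9, 9, 9, 9]))
process_switch(cpu, p1, p2)
print(cpu.pc, p1.saved.pc)  # 500 100
```

Copying the register list on both save and restore keeps the saved context independent of later changes to the live processor state.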
Implicit and Explicit Multithreading
All commercial processors and most experimental ones use explicit multithreading
Concurrently execute instructions from different explicit threads
Interleave instructions from different threads on shared pipelines, or execute them in parallel on parallel pipelines
[Figure: approaches to executing multiple threads (A, B, C, D), drawn as issue slots across the issue bandwidth over successive cycles, with thread switches and latency cycles marked; panels include (e) interleaved multithreading superscalar, (f) blocked multithreading superscalar, (g) VLIW, and (h) interleaved multithreading VLIW, where N marks an unfilled (no-op) slot]
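The difference between interleaved and blocked multithreading can be reproduced on a toy single-issue (scalar) processor. This is a minimal sketch under simplifying assumptions: the function simulate and the workload are hypothetical, thread switches cost zero cycles, and each instruction is reduced to a stall count (the number of extra cycles before that thread can issue again, as after a cache miss). The interleaved policy switches threads every cycle and skips stalled ones; the blocked policy stays on one thread until it stalls, then switches.

```python
def simulate(threads, policy):
    """Return the issue-slot sequence of a single-issue processor.

    threads: dict mapping a one-letter thread name to a list of stall counts;
             stall k means the thread cannot issue again for k extra cycles.
    policy:  "interleaved" or "blocked".
    A "-" marks a cycle in which no thread could issue.
    """
    if policy not in ("interleaved", "blocked"):
        raise ValueError(f"unknown policy: {policy}")
    names = list(threads)
    pos = {n: 0 for n in names}        # index of each thread's next instruction
    ready_at = {n: 0 for n in names}   # earliest cycle each thread may issue again
    current, cycle, slots = 0, 0, []

    def ready(name):
        return pos[name] < len(threads[name]) and ready_at[name] <= cycle

    while any(pos[n] < len(threads[n]) for n in names):
        # Search for a ready thread, starting from the current pointer.
        order = [names[(current + k) % len(names)] for k in range(len(names))]
        chosen = next((n for n in order if ready(n)), None)
        if chosen is None:
            slots.append("-")          # every thread is stalled: wasted issue slot
        else:
            stall = threads[chosen][pos[chosen]]
            pos[chosen] += 1
            ready_at[chosen] = cycle + 1 + stall
            slots.append(chosen)
            idx = names.index(chosen)
            # Interleaved: advance round-robin. Blocked: stay on the same thread.
            current = (idx + 1) % len(names) if policy == "interleaved" else idx
        cycle += 1
    return "".join(slots)

workload = {"A": [0, 0, 3, 0], "B": [0, 0, 0]}
print(simulate(workload, "interleaved"))  # ABABAB--A
print(simulate(workload, "blocked"))      # AAABBBA
```

In this particular workload, blocked multithreading hides thread A's three-cycle latency behind thread B, while the interleaved schedule happens to reach A's long-latency instruction late and leaves two empty slots. Other workloads can favor interleaving, so the example shows the mechanism rather than a general ranking.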
Clusters
Defined as a group of interconnected whole computers working together as a unified computing resource that can create the illusion of being one machine
(The term whole computer means a system that can run on its own, apart from the cluster.)
[Figure: cluster configuration with shared RAID storage]
Two approaches to failure management:
Highly available clusters
Fault-tolerant clusters
Failover
The function of switching applications and data resources over from a failed system to an alternative system in the cluster
Failback
Restoration of applications and data resources to the original system once it has been fixed
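Failover and failback can be expressed as two bookkeeping operations on a placement map. This is a minimal sketch, assuming a cluster that only tracks which node runs each application; the Cluster class and its methods are hypothetical, and the choice of the alternative node (the lowest-named surviving node) is a deliberately simple, deterministic stand-in for whatever policy real cluster middleware applies.

```python
class Cluster:
    """Hypothetical cluster that tracks which node runs each application."""

    def __init__(self, nodes):
        self.up = set(nodes)   # nodes currently working
        self.home = {}         # application -> original node
        self.placement = {}    # application -> node currently running it

    def start(self, app, node):
        self.home[app] = node
        self.placement[app] = node

    def fail(self, node):
        """Failover: move applications (and their data resources) off a failed node."""
        self.up.discard(node)
        for app, where in self.placement.items():
            if where == node:
                if not self.up:
                    raise RuntimeError("no surviving node for failover")
                self.placement[app] = min(self.up)  # simple deterministic alternative

    def repair(self, node):
        """Failback: return applications to their original node once it is fixed."""
        self.up.add(node)
        for app, home in self.home.items():
            if home == node:
                self.placement[app] = node

c = Cluster(["n1", "n2", "n3"])
c.start("db", "n1")
c.start("web", "n2")
c.fail("n1")
print(c.placement)  # {'db': 'n2', 'web': 'n2'}
c.repair("n1")
print(c.placement)  # {'db': 'n1', 'web': 'n2'}
```

Remembering each application's home node separately from its current placement is what makes failback possible after the original system has been repaired.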
Load balancing
Incremental scalability
Automatically include new computers in scheduling
Middleware needs to recognize that processes may switch between machines
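Load balancing with incremental scalability can be sketched as a least-loaded scheduler in which a newly added computer enters with zero load and is immediately eligible for work. This is an illustration under assumptions: LoadBalancer is a hypothetical class, job cost is a single number, and migration of running processes between machines is not modeled. It uses Python's standard heapq module as a priority queue keyed on current load.

```python
import heapq

class LoadBalancer:
    """Hypothetical scheduler that sends each new job to the least-loaded node."""

    def __init__(self):
        self._heap = []  # (current load, node name) pairs

    def add_node(self, name):
        # Incremental scalability: a new computer starts with zero load and is
        # automatically included in scheduling from the next assignment on.
        heapq.heappush(self._heap, (0, name))

    def assign(self, job_cost=1):
        load, name = heapq.heappop(self._heap)  # least-loaded node (ties broken by name)
        heapq.heappush(self._heap, (load + job_cost, name))
        return name

lb = LoadBalancer()
lb.add_node("n1")
lb.add_node("n2")
print([lb.assign() for _ in range(4)])  # ['n1', 'n2', 'n1', 'n2']
lb.add_node("n3")
print([lb.assign() for _ in range(3)])  # ['n3', 'n3', 'n1']
```

After n3 joins, it absorbs new jobs until its load catches up with the existing nodes, which is the behavior the "automatically include new computers in scheduling" point describes.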
[Figure: cluster computer architecture: cluster middleware (single system image and availability infrastructure) spanning several nodes, each with its own network interface hardware]
[Figure: Ethernet switch configuration with 10GbE, 40GbE, and 100GbE links]
SMP
Easier to manage and configure
Much closer to the original single-processor model for which nearly all applications are written
Less physical space and lower power consumption
Clustering
Far superior in terms of incremental and absolute scalability
Superior in terms of availability
All components of the system can readily be made highly redundant
[Figure: NUMA organization: N nodes, each containing processors 1 through m with private L1 and L2 caches, a local main memory, a directory, and I/O, all linked by an interconnection network]
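Each node in the NUMA organization keeps a directory alongside its main memory. One common role for such a directory, assumed here rather than taken from the slides, is to record which nodes currently cache each line so that a write sends invalidations only to those nodes instead of broadcasting. The sketch below shows just that bookkeeping; the Directory class is hypothetical, and states, acknowledgments, and write-backs of dirty data are not modeled.

```python
class Directory:
    """Hypothetical directory for lines whose home memory is on one node."""

    def __init__(self):
        self.sharers = {}  # line address -> set of node ids caching the line

    def read(self, addr, node):
        # A node that reads the line now holds a copy.
        self.sharers.setdefault(addr, set()).add(node)

    def write(self, addr, node):
        # Every other holder must be invalidated; the writer becomes the only holder.
        others = sorted(self.sharers.get(addr, set()) - {node})
        self.sharers[addr] = {node}
        return others  # nodes that would receive an invalidate message

d = Directory()
d.read(0x40, 1)
d.read(0x40, 3)
print(d.write(0x40, 2))  # [1, 3]
print(d.write(0x40, 2))  # []
```

Because invalidations go only to recorded sharers, the cost of a write grows with the number of actual copies rather than the number of nodes.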
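[Figure: cloud computing elements, including resource pooling]
[Figure: cloud service models, panel (c) IaaS]
[Figure: cloud computing context: servers on a cloud service provider's LAN switch, connected through routers to a network or the Internet]
[Figure: cloud computing reference architecture: the cloud provider's IaaS service layer, resource abstraction and control layer, and physical resource layer (hardware, facility), with provisioning/configuration, portability/interoperability, security, and privacy; service aggregation and arbitrage; security, privacy impact, and performance audits; and the cloud carrier]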