Multiprocessors and Threads: Fred Kuhns CS523S: Operating Systems


Multiprocessors and Threads

Lecture 3

Fred Kuhns ( )

CS523S: Operating Systems

Motivation for Multiprocessors


Enhanced Performance - concurrent execution of tasks for increased
throughput (between processes); exploit concurrency within tasks
(parallelism within a process)

Fault Tolerance - graceful degradation in the face of failures


Basic MP Architectures
Single Instruction Single Data (SISD) - conventional uniprocessor designs.
Single Instruction Multiple Data (SIMD) - vector and array processors.
Multiple Instruction Single Data (MISD) - not implemented.
Multiple Instruction Multiple Data (MIMD) - conventional MP designs.

MIMD Classifications
Tightly Coupled System - all processors
share the same global memory and have the
same address space (typical SMP system).
Main memory is used for IPC and synchronization.

Loosely Coupled System - memory is
partitioned and attached to each processor.
Hypercube, clusters (multicomputer).
Message passing is used for IPC and synchronization.
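
The two coupling styles can be contrasted with a small sketch. This is only an illustration using Python threads inside one process: the function names are hypothetical, the lock-protected variable stands in for shared main memory, and the queue stands in for a message-passing interconnect.

```python
import queue
import threading


def _run_workers(work, values, workers):
    """Start one thread per chunk of the input and wait for all of them."""
    threads = [threading.Thread(target=work, args=(values[i::workers],))
               for i in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def shared_memory_sum(values, workers=4):
    """Tightly coupled style: workers update one shared total under a lock."""
    total = 0
    lock = threading.Lock()

    def work(chunk):
        nonlocal total
        partial = sum(chunk)
        with lock:               # synchronization through shared memory
            total += partial

    _run_workers(work, values, workers)
    return total


def message_passing_sum(values, workers=4):
    """Loosely coupled style: workers share nothing and send results as messages."""
    mailbox = queue.Queue()

    def work(chunk):
        mailbox.put(sum(chunk))  # the only interaction is a message

    _run_workers(work, values, workers)
    return sum(mailbox.get() for _ in range(workers))
```

Both functions return the same result; what differs is whether the workers coordinate through a shared variable or through messages.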


MP Block Diagram
[Figure: four CPUs, each with its own cache and MMU, connected through an
Interconnection Network to four main memory (MM) modules.]

Memory Access Schemes


Uniform Memory Access (UMA) - memory is centrally located; all
processors are equidistant from it (equal access times)

Non-Uniform Memory Access (NUMA) - memory is physically partitioned but
accessible by all processors; processors have the same address space

No Remote Memory Access (NORMA) - memory is physically partitioned and
not accessible by all processors; each processor has its own address space

Other Details of MP
Interconnection technology
Bus
Cross-Bar switch
Multistage Interconnect Network

Caching - Cache Coherence Problem!
Write-update
Write-invalidate
Bus snooping
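
A toy model can make the write-invalidate idea concrete. The sketch below is an assumption-laden simplification, not a real protocol: the `Bus` and `Cache` classes are hypothetical, caches are write-through, and there are no line states beyond "present" or "absent". Every write is broadcast on the bus, and each snooping cache drops its own copy of that address.

```python
class Bus:
    """Shared bus: every attached cache snoops every write broadcast on it."""

    def __init__(self):
        self.memory = {}   # addr -> value (main memory)
        self.caches = []

    def broadcast_write(self, writer, addr):
        for cache in self.caches:
            if cache is not writer:
                cache.snoop_invalidate(addr)


class Cache:
    """Per-CPU cache holding only valid lines (write-through for simplicity)."""

    def __init__(self, bus):
        self.bus = bus
        self.lines = {}    # addr -> value
        bus.caches.append(self)

    def read(self, addr):
        if addr not in self.lines:             # miss: fetch from main memory
            self.lines[addr] = self.bus.memory.get(addr, 0)
        return self.lines[addr]

    def write(self, addr, value):
        self.bus.broadcast_write(self, addr)   # invalidate all other copies
        self.lines[addr] = value
        self.bus.memory[addr] = value          # assumption: write-through

    def snoop_invalidate(self, addr):
        self.lines.pop(addr, None)             # drop a now-stale copy
```

A write-update variant would send the new value in the broadcast and have the other caches overwrite their copy instead of dropping it.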


MP OS Structure - 1
Separate Supervisor - all processors have their own copy of the kernel.
some data is shared for interaction
dedicated I/O devices and file systems
good fault tolerance
bad for concurrency


MP OS Structure - 2
Master/Slave Configuration
master monitors the status and assigns work to
other processors (slaves)
Slaves are a schedulable pool of resources for
the master
master can be bottleneck
poor fault tolerance


MP OS Structure - 3
Symmetric Configuration - Most Flexible.
all processors are autonomous and treated equally
one copy of the kernel executed concurrently
across all processors
Synchronize access to shared data structures:
Lock entire OS - floating master
Mitigated by dividing the OS into segments that normally
have little interaction
Multithread the kernel and control access to individual resources
(locking granularity is a continuum)

MP Overview
MultiProcessor
  SIMD
  MIMD
    Shared Memory (tightly coupled)
      Master/Slave
      Symmetric (SMP)
    Distributed Memory (loosely coupled)
      Clusters

SMP OS Design Issues


Threads - effectiveness of parallelism depends
on performance of primitives used to express
and control concurrency.
Process Synchronization - disabling interrupts
is not sufficient (it only affects the local CPU).
Process Scheduling - efficient, policy-controlled
task scheduling (processes/threads)
global versus per-CPU scheduling
task affinity for a particular CPU
resource accounting and intra-task thread
dependencies
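
The per-CPU scheduling and affinity points can be sketched with a toy model. Everything here is hypothetical (the `PerCpuScheduler` class, its policy, and its method names); it only illustrates one possible design: a run queue per CPU, a preference for the CPU a task last ran on, and stealing by idle CPUs to balance load.

```python
from collections import deque


class PerCpuScheduler:
    """Toy per-CPU run queues with soft affinity and idle-time stealing."""

    def __init__(self, ncpus):
        self.queues = [deque() for _ in range(ncpus)]

    def _by_length(self, cpu):
        return len(self.queues[cpu])

    def enqueue(self, task, last_cpu=None):
        # Affinity: prefer the CPU the task last ran on (its cache may be warm).
        # New tasks go to the shortest queue.
        if last_cpu is None:
            last_cpu = min(range(len(self.queues)), key=self._by_length)
        self.queues[last_cpu].append(task)
        return last_cpu

    def pick_next(self, cpu):
        if self.queues[cpu]:
            return self.queues[cpu].popleft()
        # Idle CPU: steal from the tail of the longest queue.
        victim = max(range(len(self.queues)), key=self._by_length)
        if self.queues[victim]:
            return self.queues[victim].pop()
        return None
```

A single global queue would avoid the stealing logic but makes every scheduling decision contend for one lock, which is the trade-off behind "global versus per-CPU scheduling".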

SMP OS design issues - 2


Memory Management - complicated since
main memory is shared by possibly many
processors. Each processor must maintain its
own map tables for each process
cache coherence
memory access synchronization
balancing overhead with increased concurrency

Reliability and Fault Tolerance - degrade
gracefully in the event of failures

Typical SMP System


[Figure: four 500MHz CPUs, each with its own cache and MMU, attached to a
shared System/Memory Bus together with Main Memory (50ns), INT, System
Functions (timer, BIOS, reset) and a Bridge to the I/O subsystem
(ether, scsi, video).]

Issues:
Memory contention
Limited bus BW
I/O contention
Cache coherence

Typical I/O Bus:
33MHz/32bit (132MB/s)
66MHz/64bit (528MB/s)
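
The I/O bus figures quoted for the typical SMP system (132MB/s and 528MB/s) follow from clock rate times bus width. A minimal sketch of that arithmetic, assuming one transfer per clock cycle and 1MB = 10^6 bytes; the function name is hypothetical:

```python
def peak_bus_bandwidth_mb_s(clock_mhz, width_bits):
    """Peak bandwidth in MB/s, assuming one full-width transfer per cycle."""
    bytes_per_transfer = width_bits / 8
    return clock_mhz * bytes_per_transfer


print(peak_bus_bandwidth_mb_s(33, 32))   # 33MHz, 32-bit bus
print(peak_bus_bandwidth_mb_s(66, 64))   # 66MHz, 64-bit bus
```

These are peak rates; arbitration and contention among CPUs and devices reduce the bandwidth actually delivered.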

Some Definitions
Parallelism: degree to which a multiprocessor
application achieves parallel execution
Concurrency: Maximum parallelism an
application can achieve with unlimited
processors
System Concurrency: kernel recognizes
multiple threads of control in a program
User Concurrency: user-space threads
(coroutines) provide a natural programming
model for concurrent applications. This concurrency
is not supported by the system.

Process and Threads


Process: encompasses
set of threads (computational entities)
collection of resources

Thread: dynamic object representing an
execution path and computational state.
threads have their own computational state: PC,
stack, user registers and private data
remaining resources are shared amongst threads
in a process

Threads
Effectiveness of parallel computing depends on
the performance of the primitives used to
express and control parallelism
Threads separate the notion of execution from
the Process abstraction
Useful for expressing the intrinsic concurrency
of a program regardless of resulting
performance
Three types: User threads, kernel threads and
Light Weight Processes (LWP)

User Level Threads


User level threads - supported by user level
(thread) library
Benefits:
no modifications required to kernel
flexible and low cost
Drawbacks:
a thread cannot block in the kernel without blocking the entire process
no parallelism (threads are not recognized by the kernel)
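
A user-level thread library can be sketched with generators acting as coroutines. This is a minimal illustration, not how any particular library is implemented: the `UserThreadLibrary` class and `worker` function are hypothetical, and scheduling is purely cooperative round-robin.

```python
from collections import deque


class UserThreadLibrary:
    """Round-robin scheduler for generator-based user threads."""

    def __init__(self):
        self.ready = deque()
        self.trace = []      # records the order in which threads ran

    def create(self, name, func):
        self.ready.append((name, func(name, self.trace)))

    def run(self):
        while self.ready:
            name, thread = self.ready.popleft()
            try:
                next(thread)                        # switch to the thread, in user space
                self.ready.append((name, thread))   # it yielded, so requeue it
            except StopIteration:
                pass                                # thread finished


def worker(name, trace):
    for step in range(2):
        trace.append(f"{name}{step}")
        yield                # voluntarily give the CPU back to the library


lib = UserThreadLibrary()
lib.create("a", worker)
lib.create("b", worker)
lib.run()
print(lib.trace)             # ['a0', 'b0', 'a1', 'b1']
```

The kernel sees a single flow of control throughout, which shows both drawbacks listed above: if any worker made a blocking system call, `run` itself would stall, and the threads never execute on two CPUs at once.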


Kernel Level Threads


Kernel level threads - kernel directly supports
multiple threads of control in a process. Thread
is the basic scheduling entity
Benefits:
coordination between scheduling and
synchronization
less overhead than a process
suitable for parallel application
Drawbacks:
more expensive than user-level threads
generality leads to greater overhead

Light Weight Processes (LWP)


Kernel supported user thread
Each LWP is bound to one kernel thread.
a kernel thread need not be bound to an LWP

LWP is scheduled by kernel
User threads scheduled by library onto LWPs
Multiple LWPs per process
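
The many-to-many relationship (many user threads multiplexed onto a few LWPs) can be sketched as follows. This is an analogy only: Python threads stand in for LWPs, plain callables stand in for user threads, and the `run_on_lwps` helper is hypothetical.

```python
import queue
import threading


def run_on_lwps(user_threads, num_lwps):
    """Multiplex user-level jobs onto a fixed pool of LWP stand-ins.

    user_threads is a list of (name, callable) pairs. Returns a dict mapping
    each name to (result, id of the LWP that ran it).
    """
    ready = queue.Queue()
    for job in user_threads:
        ready.put(job)
    results = {}
    lock = threading.Lock()

    def lwp(lwp_id):
        # Each LWP repeatedly picks the next runnable user thread.
        while True:
            try:
                name, func = ready.get_nowait()
            except queue.Empty:
                return
            value = func()
            with lock:
                results[name] = (value, lwp_id)

    lwps = [threading.Thread(target=lwp, args=(i,)) for i in range(num_lwps)]
    for t in lwps:
        t.start()
    for t in lwps:
        t.join()
    return results
```

Which LWP runs which job is decided at run time, mirroring the way a thread library schedules user threads onto whatever LWPs are available.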


First Class threads (Psyche OS)


Thread operations in user space:
create, destroy, synch, context switch

kernel threads implement a virtual processor
Coarse grain in kernel - preemptive scheduling
Communication between kernel and threads library:
shared data structures.
Software interrupts (user upcalls or signals), for example for
scheduling decisions and preemption warnings.
Kernel scheduler interface - allows dissimilar thread
packages to coordinate.


Scheduler Activations
An activation:
serves as execution context for running thread
notifies thread of kernel events (upcall)
space for kernel to save processor context of current
user thread when stopped by kernel

kernel is responsible for processor allocation =>
preemption by kernel.
Thread package responsible for scheduling
threads on available processors (activations)

Support for Threading


BSD: process model only. 4.4 BSD enhancements.

Solaris: provides user threads, kernel threads and LWPs.

Mach: supports kernel threads and tasks. Thread libraries provide
semantics of user threads, LWPs and kernel threads.

Digital UNIX: extends Mach to provide usual UNIX semantics.
Pthreads library.

Solaris Threads
Supports:
user threads (uthreads) via libthread and libpthread
LWPs, which act as virtual CPUs for user threads
kernel threads (kthread), every LWP is associated
with one kthread, however a kthread need not have an
LWP

interrupts as threads


Solaris kthreads
Fundamental scheduling/dispatching object
all kthreads share the same virtual address space
(the kernel's) - cheap context switch
System threads - example STREAMS, callout
kthread_t, /usr/include/sys/thread.h
scheduling info, pointers for scheduler or sleep
queues, pointer to klwp_t and proc_t


Solaris LWP
Bound to a kthread
LWP specific fields from proc are kept in
klwp_t (/usr/include/sys/klwp.h)
user-level registers, system call params, resource
usage, pointer to kthread_t and proc_t

klwp_t can be swapped with LWP
LWP non-swappable info kept in kthread_t


Solaris LWP (cont)


All LWPs in a process share:
signal handlers

Each may have its own:
signal mask
alternate stack for signal handling

No global name space for LWPs


Solaris User Threads


Implemented in user libraries
library provides synchronization and scheduling
facilities
threads may be bound to LWPs
unbound threads compete for available LWPs
Manage thread specific info
thread id, saved register state, user stack, signal mask,
priority*, thread local storage

Solaris provides two libraries: libthread and
libpthread.
Try man thread or man pthreads

Solaris Thread Data Structures


proc_t
p_tlist

klwp_t
lwp_thread
lwp_procp

kthread_t
t_procp
t_lwp
t_forw

Solaris: Processes, Threads and LWPs


[Figure: Process 1 and Process 2 drawn across three layers (user, kernel,
hardware), with an interrupt kthread (Int kthr) shown.]

Solaris Interrupts
One system wide clock kthread
pool of 9 partially initialized kthreads per CPU
for interrupts
interrupt thread can block
interrupted thread is pinned to the CPU


Solaris Signals and Fork


Signals are divided into traps (synchronous) and interrupts
(asynchronous)
each thread has its own signal mask; the set
of signal handlers is global
Each LWP can specify an alternate stack
fork replicates all LWPs
fork1 replicates only the invoking LWP/thread


Mach
Two abstractions:
Task - static object, address space and system
resources called port rights.
Thread - fundamental execution unit and runs in
context of a task.
Zero or more threads per task
kernel schedulable
kernel stack
computational state

Processor sets - available processors divided into non-intersecting sets.
permits dedicating processor sets to one or more tasks

Mach c-thread Implementations


Coroutine-based - multiplexes user threads onto
a single-threaded task
Thread-based - one-to-one mapping from c-threads to Mach threads. Default.
Task-based - one Mach task per c-thread.


Digital UNIX
Based on Mach 2.5 kernel
Provides complete UNIX programmer's interface
4.3BSD code and ULTRIX code ported to Mach
u-area replaced by utask and uthread
proc structure retained


Digital UNIX threads


Signals divided into synchronous and
asynchronous
global signal mask
each thread can define its own handlers for
synchronous signals
global handlers for asynchronous signals


Pthreads library
One Mach thread per pthread
implements asynchronous I/O
separate thread created for synchronous I/O which in
turn signals original thread

library includes signal handling, scheduling
functions, and synchronization primitives.


Mach Continuations
Address problem of excessive kernel stack memory
requirements
process model versus interrupt model:
a per-thread kernel stack versus one per-processor kernel
stack

Thread is first responsible for saving any required
state (the thread structure allows up to 28 bytes)
indicate a function to be invoked when unblocked
(the continuation function)
Advantage: stack can be transferred between
threads eliminating copy overhead.
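
The continuation idea can be sketched outside a kernel. The code below is a loose analogy with hypothetical names (`Thread`, `block_with_continuation`, `unblock`, `finish_read`): a blocking thread keeps no stack, only a small saved state and the function to call when it is unblocked.

```python
class Thread:
    def __init__(self, name):
        self.name = name
        self.saved_state = None     # small scratch area in the thread structure
        self.continuation = None    # function to invoke when unblocked


def block_with_continuation(thread, state, continuation):
    """Block without keeping a stack: save state and record where to resume."""
    thread.saved_state = state
    thread.continuation = continuation


def unblock(thread, event):
    """Resume by calling the continuation with the saved state and the event."""
    cont, state = thread.continuation, thread.saved_state
    thread.continuation = thread.saved_state = None
    return cont(thread, state, event)


def finish_read(thread, state, data):
    # Hypothetical continuation: completes a read that was waiting for data.
    return f"{thread.name} read {data[:state['nbytes']]!r}"


t = Thread("t1")
block_with_continuation(t, {"nbytes": 3}, finish_read)
print(unblock(t, "hello"))
```

Because nothing on the blocked thread's stack is needed after the state is saved, the stack can be reused or handed to another thread, which is the advantage the slide points to.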

Threads in Windows NT
Design driven by need to support a variety of
OS environments
NT process implemented as an object
executable process contains >= 1 thread
process and thread objects have built in
synchronization capabilities


NT Threads
Support for kernel (system) threads
Threads are scheduled by the kernel and thus are
similar to UNIX threads bound to an LWP
(kernel thread)
fibers are threads which are not scheduled by the
kernel and thus are similar to unbound user
threads.


4.4 BSD UNIX


Initial support for threads implemented but not
enabled in distribution
Proc structure and u-area reorganized
All threads have a unique ID
How are the proc and u areas reorganized to
support threads?
