The document presents Flynn's taxonomy, a widely used classification system for parallel computers that has been in use since 1966. It describes the taxonomy's four categories, SISD, SIMD, MISD, and MIMD, which are distinguished by whether the instruction and data streams are single or multiple, and gives key aspects and examples of each. It also defines common parallel computing terminology such as nodes, cores, tasks, shared and distributed memory, and parallel overhead.
Lecture # 02: Flynn’s Taxonomy and Parallelism
Raja Shamayel
Department of Computer Science
NUML University, Islamabad

von Neumann Architecture
• John von Neumann authored the general requirements for an electronic computer in his 1945 papers
• Also known as the "stored-program computer"
• Both program instructions and data are kept in electronic memory
• Since then, virtually all computers have followed his design

Von Neumann Architecture: Four Main Components
• Memory: read/write, random-access memory is used to store both program instructions and data
  • Program instructions are coded data which tell the computer to do something
  • Data is simply information to be used by the program
• Control Unit: fetches instructions/data from memory, decodes the instructions, and then sequentially coordinates operations to accomplish the programmed task
• Arithmetic Unit: performs basic arithmetic operations
• Input/Output: the interface to the human operator
• Parallel computers still follow this basic design, just multiplied in units

Flynn's Classical Taxonomy
• There are different ways to classify parallel computers
• One of the more widely used classifications, in use since 1966, is called Flynn's Taxonomy
• It distinguishes multi-processor computer architectures according to two dimensions:
  • Instruction Stream
  • Data Stream

SISD (Scalar Processing)
• A serial (non-parallel) computer
• Single Instruction: only one instruction stream is being acted on by the CPU during any one clock cycle
• Single Data: only one data stream is being used as input during any one clock cycle
• Deterministic execution
• This is the oldest type of computer
• Examples: older-generation mainframes, minicomputers, workstations and single-processor/single-core PCs

SIMD (Vector Processing)
• Single Instruction: all processing units execute the same instruction at any given clock cycle
• Multiple Data: each processing unit can operate on a different data element
• Best suited for specialized problems such as graphics/image processing
• Synchronous and deterministic execution
• Two varieties: Processor Arrays and Vector Pipelines
• Examples:
  • Processor Arrays: Thinking Machines CM-2
  • Vector Pipelines: IBM 9000
  • Most modern computers with GPUs

Scalar vs Vector (SIMD)

Multiple Instruction, Single Data (MISD)
• Multiple Instruction: each processing unit operates on the data independently via separate instruction streams
• Single Data: a single data stream is fed into multiple processing units
• Few (if any) actual examples of this class of parallel computer have ever existed
• Some conceivable uses might be:
  • multiple frequency filters operating on a single signal stream
  • multiple cryptography algorithms attempting to crack a single coded message

Multiple Instruction, Multiple Data (MIMD)
• Multiple Instruction: every processor may be executing a different instruction stream
• Multiple Data: every processor may be working with a different data stream
• Execution can be synchronous or asynchronous, deterministic or non-deterministic
• Currently the most common type of parallel computer; most modern supercomputers fall into this category
• Examples: most current supercomputers, networked parallel computer clusters, grids, multi-core PCs

MIMD Classification
• Shared Memory
• Distributed Memory
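To make the SISD/SIMD/MIMD distinction concrete, here is a minimal C sketch (not from the original slides; it assumes a compiler with OpenMP support, e.g. gcc or clang with -fopenmp). The plain initialization loop is ordinary serial, SISD-style code; the "#pragma omp simd" loop asks the compiler to apply one instruction to many data elements (SIMD); and the "#pragma omp parallel for" loop runs several threads, each executing its own instruction stream over its own slice of the shared arrays (shared-memory MIMD).

```c
/* simd_vs_mimd.c -- illustrative sketch only (assumes gcc/clang with -fopenmp) */
#include <stdio.h>
#include <omp.h>

#define N 1000000

static float a[N], b[N], c[N];

int main(void)
{
    /* Serial (SISD-style) loop: one instruction stream, one data stream. */
    for (int i = 0; i < N; i++) {
        a[i] = (float)i;
        b[i] = 2.0f * (float)i;
    }

    /* SIMD: one instruction stream applied to multiple data elements at once.
       The pragma asks the compiler to vectorize the loop. */
    #pragma omp simd
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    /* Shared-memory MIMD: several threads, each with its own instruction
       stream, working on different parts of the same shared arrays. */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = 0.5f * c[i];

    printf("c[10] = %f (up to %d threads available)\n", c[10], omp_get_max_threads());
    return 0;
}
```

A GPU kernel or an MPI program running across distributed-memory nodes would show the same categories at larger scale; this single-node sketch is only meant to show where the taxonomy's distinctions appear in everyday code.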
Shared vs Distributed Memory

Some General Parallel Terminology
• Supercomputing / High Performance Computing (HPC)
  • Using the world's fastest and largest computers to solve large problems
• Node
  • A standalone "computer in a box", usually comprising multiple CPUs/processors/cores, memory, network interfaces, etc. Nodes are networked together to form a supercomputer.
• CPU / Socket / Processor / Core
  • In the past, a CPU (Central Processing Unit) was a singular execution component
  • Then multiple CPUs were incorporated into a node
  • Then individual CPUs were subdivided into multiple "cores", each being a unique execution unit
  • CPUs with multiple cores are sometimes called "sockets"
  • The result is a node with multiple CPUs, each containing multiple cores

Some General Parallel Terminology
• Task
  • A logically discrete section of computational work. A task is typically a program or program-like set of instructions that is executed by a processor. A parallel program consists of multiple tasks running on multiple processors.
• Pipelining
  • Breaking a task into steps performed by different processor units, with inputs streaming through, much like an assembly line; a type of parallel computing.
• Shared Memory
  • From a strictly hardware point of view, describes a computer architecture in which all processors have direct (usually bus-based) access to common physical memory. In a programming sense, it describes a model in which parallel tasks all have the same "picture" of memory and can directly address and access the same logical memory locations, regardless of where the physical memory actually exists.

Some General Parallel Terminology
• Symmetric Multi-Processor (SMP)
  • A shared-memory hardware architecture in which multiple processors share a single address space and have equal access to all resources.
• Distributed Memory
  • In hardware, refers to network-based memory access for physical memory that is not common. As a programming model, tasks can only logically "see" local machine memory and must use communications to access memory on other machines where other tasks are executing.
• Communications
  • Parallel tasks typically need to exchange data. There are several ways this can be accomplished, such as through a shared memory bus or over a network; however, the actual event of data exchange is commonly referred to as communications, regardless of the method employed.

Some General Parallel Terminology
• Synchronization
  • The coordination of parallel tasks in real time, very often associated with communications.
  • Often implemented by establishing a synchronization point within an application where a task may not proceed further until one or more other tasks reach the same or a logically equivalent point.
  • Synchronization usually involves waiting by at least one task and can therefore increase a parallel application's wall-clock execution time.
• Granularity
  • A qualitative measure of the ratio of computation to communication.
  • Coarse: relatively large amounts of computational work are done between communication events.
  • Fine: relatively small amounts of computational work are done between communication events.
• Observed Speedup
  • The observed speedup of a code which has been parallelized, defined as:
    (wall-clock time of serial execution) / (wall-clock time of parallel execution)
  • One of the simplest and most widely used indicators of a parallel program's performance.
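As a rough illustration of how observed speedup is measured in practice (this example is not from the slides; it assumes OpenMP and its omp_get_wtime() timer), the sketch below times the same summation loop once serially and once in parallel, then prints the ratio of the two wall-clock times.

```c
/* speedup.c -- illustrative sketch only (assumes gcc/clang with -fopenmp) */
#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

int main(void)
{
    const long n = 20000000;               /* 20 million elements */
    double *x = malloc(n * sizeof *x);
    if (x == NULL)
        return 1;
    for (long i = 0; i < n; i++)
        x[i] = 1.0 / (double)(i + 1);

    /* Wall-clock time of serial execution. */
    double sum_serial = 0.0;
    double t0 = omp_get_wtime();
    for (long i = 0; i < n; i++)
        sum_serial += x[i];
    double t_serial = omp_get_wtime() - t0;

    /* Wall-clock time of parallel execution (shared-memory MIMD via OpenMP). */
    double sum_parallel = 0.0;
    t0 = omp_get_wtime();
    #pragma omp parallel for reduction(+:sum_parallel)
    for (long i = 0; i < n; i++)
        sum_parallel += x[i];
    double t_parallel = omp_get_wtime() - t0;

    /* Printing the sums keeps the compiler from discarding either loop. */
    printf("sum (serial) = %.6f, sum (parallel) = %.6f\n", sum_serial, sum_parallel);

    /* Observed speedup = serial wall-clock time / parallel wall-clock time. */
    printf("serial %.3f s, parallel %.3f s, observed speedup %.2f on up to %d threads\n",
           t_serial, t_parallel, t_serial / t_parallel, omp_get_max_threads());

    free(x);
    return 0;
}
```

On typical hardware the observed speedup falls short of the thread count; the gap is largely the parallel overhead described next (thread start-up, the implicit synchronization at the end of the parallel loop, data movement, and memory-bandwidth limits).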
Some General Parallel Terminology
• Parallel Overhead
  • The amount of time required to coordinate parallel tasks, as opposed to doing useful work. Parallel overhead can include factors such as:
  • Task start-up time
  • Synchronizations
  • Data communications
  • Software overhead imposed by parallel languages, libraries, the operating system, etc.
  • Task termination time
• Massively Parallel
  • Refers to the hardware that comprises a given parallel system - having many processing elements. The meaning of "many" keeps increasing, but currently the largest parallel computers are made up of processing elements numbering in the hundreds of thousands.
• Embarrassingly Parallel
  • Solving many similar but independent tasks simultaneously; little to no coordination is needed between the tasks.

Some General Parallel Terminology
• Scalability
  • Refers to a parallel system's (hardware and/or software) ability to demonstrate a proportionate increase in parallel speedup with the addition of more resources. Factors that contribute to scalability include:
  • Hardware - particularly memory-CPU bandwidths and network communication properties
  • Application algorithm
  • Related parallel overhead
  • Characteristics of your specific application

That’s all for today!!