This document discusses parallel database systems and techniques. It covers topics like parallel architectures, data placement, query processing, load balancing and fault tolerance in parallel databases. It describes objectives of data servers and various parallel system architectures like shared memory, shared disk and shared nothing. It also discusses techniques for parallel data processing, query optimization, data partitioning and placement, and parallel join processing.
This document discusses parallel database systems and techniques. It covers topics like parallel architectures, data placement, query processing, load balancing and fault tolerance in parallel databases. It describes objectives of data servers and various parallel system architectures like shared memory, shared disk and shared nothing. It also discusses techniques for parallel data processing, query optimization, data partitioning and placement, and parallel join processing.
This document discusses parallel database systems and techniques. It covers topics like parallel architectures, data placement, query processing, load balancing and fault tolerance in parallel databases. It describes objectives of data servers and various parallel system architectures like shared memory, shared disk and shared nothing. It also discusses techniques for parallel data processing, query optimization, data partitioning and placement, and parallel join processing.
This document discusses parallel database systems and techniques. It covers topics like parallel architectures, data placement, query processing, load balancing and fault tolerance in parallel databases. It describes objectives of data servers and various parallel system architectures like shared memory, shared disk and shared nothing. It also discusses techniques for parallel data processing, query optimization, data partitioning and placement, and parallel join processing.
Download as PPTX, PDF, TXT or read online from Scribd
Download as pptx, pdf, or txt
You are on page 1of 35
DEVANG
DEVANG PATEL INSTITUTEOF
PATEL INSTITUTE OFADVANCE ADVANCETECHNOLOGY TECHNOLOGY AND AND RESEARCH RESEARCH
Chapter : 7
Subject Faculties: Parallel Database Systems
Prof. Kashyap Patel Assistant Professor, Department of Computer Engineering.
Devang Patel Institute of Advance Technology And Research
Charotar University of Science and Technology Outline: • Parallel Architectures • Data Placement • Parallel Query Processing • Load Balancing • Fault-Tolerance • Database Clusters
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
The Database Problem: • Large volume of data use disk and large main memory • I/O bottleneck (or memory access bottleneck) • Speed(disk) << speed(RAM) << speed(microprocessor) • Predictions • Moore’s law: processor speed growth (with multicore): 50 % per year • DRAM capacity growth : 4 × every three years • Disk throughput : 2 × in the last ten years • Conclusion : the I/O bottleneck worsens
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
The Solution: • Increase the I/O bandwidth • Data partitioning • Parallel data access • Origins (1980's): database machines • Hardware-oriented bad cost-performance failure • Notable exception : ICL's CAFS Intelligent Search Processor • 1990's: same solution but using standard hardware components integrated in a multiprocessor • Software-oriented • Standard essential to exploit continuing technology improvements
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Data Server Architecture:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Objectives of Data Servers: • Avoid the shortcomings of the traditional DBMS approach • Centralization of data and application management • General-purpose OS (not DB-oriented) • By separating the functions between • Application server (or host computer) • Data server (or database computer or back-end computer)
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel Data Processing: • Three ways of exploiting high-performance multiprocessor systems: Automatically detect parallelism in sequential programs (e.g., Fortran, OPS5) Augment an existing language with parallel constructs (e.g., C*, Fortran90) Offer a new language in which parallelism can be expressed or automatically inferred • Critique Hard to develop parallelizing compilers, limited resulting speed-up Enables the programmer to express parallel computations but too low-level Can combine the advantages of both (1) and (2)
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Data-based Parallelism:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel DBMS: • Loose definition: a DBMS implemented on a tighly coupled multiprocessor • Alternative extremes • Straighforward porting of relational DBMS (the software vendor edge) • New hardware/software combination (the computer manufacturer edge) • Naturally extends to distributed databases with one server per site
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel DBMS – Objectives: • Much better cost / performance than mainframe solution • High-performance through parallelism • High throughput with inter-query parallelism • Low response time with intra-operation parallelism • High availability and reliability by exploiting data replication • Extensibility with the ideal goals • Linear speed-up • Linear scale-up
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel DBMS – Functional Architecture :
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel DBMS Functions: • Session manager • Host interface • Transaction monitoring for OLTP • Request manager • Compilation and optimization • Data directory management • Semantic data control • Execution control • Data manager • Execution of DB operations • Transaction management support • Data management DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH Parallel System Architectures: • Multiprocessor architecture alternatives • Shared memory (SM) • Shared disk (SD) • Shared nothing (SN) • Hybrid architectures • Non-Uniform Memory Architecture (NUMA) • Cluster
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Shared-Memory: DBMS on symmetric multiprocessors (SMP) Prototypes: XPRS, Volcano, DBS3 + Simplicity, load balancing, fast communication - Network cost, low extensibility
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Shared-Disk: Origins : DEC's VAXcluster, IBM's IMS/VS Data Sharing Used first by Oracle with its Distributed Lock Manager Now used by most DBMS vendors + network cost, extensibility, migration from uniprocessor - complexity, potential performance problem for cache coherency
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Shared-Nothing: Used by Teradata, IBM, Sybase, Microsoft for OLAP Prototypes: Gamma, Bubba, Grace, Prisma, EDS + Extensibility, availability - Complexity, difficult load balancing
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Hybrid Architectures: • Various possible combinations of the three basic architectures are possible to obtain different trade-offs between cost, performance, extensibility, availability, etc. • Hybrid architectures try to obtain the advantages of different architectures: • efficiency and simplicity of shared-memory • extensibility and cost of either shared disk or shared nothing • 2 main kinds: NUMA and cluster
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
NUMA: • Shared-Memory vs. Distributed Memory • Mixes two different aspects : addressing and memory • Addressing: single address space vs multiple address spaces • Physical memory: central vs distributed • NUMA = single address space on distributed physical memory • Eases application portability • Extensibility • The most successful NUMA is Cache Coherent NUMA (CC-NUMA)
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
CC-NUMA: • Principle • Main memory distributed as with shared-nothing • However, any processor has access to all other processors’ memories • Similar to shared-disk, different processors can access the same data in a conflicting update mode, so global cache consistency protocols are needed. • Cache consistency done in hardware through a special consistent cache interconnect • Remote memory access very efficient, only a few times (typically between 2 and 3 times) the cost of local access
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Cluster: • Combines good load balancing of SM with extensibility of SN • Server nodes: off-the-shelf components • From simple PC components to more powerful SMP • Yields the best cost/performance ratio • In its cheapest form, • Fast standard interconnect (e.g., Myrinet and Infiniband) with high bandwidth (Gigabits/sec) and low latency
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
SN cluster vs SD cluster: • SN cluster can yield best cost/performance and extensibility • But adding or replacing cluster nodes requires disk and data reorganization • SD cluster avoids such reorganization but requires disks to be globally accessible by the cluster nodes • Network-attached storage (NAS) • distributed file system protocol such as NFS, relatively slow and not appropriate for database management • Storage-area network (SAN) • Block-based protocol thus making it easier to manage cache consistency, efficient, but costlier
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel DBMS Techniques: • Data placement • Physical placement of the DB onto multiple nodes • Static vs. Dynamic • Parallel data processing • Select is easy • Join (and all other non-select operations) is more difficult • Parallel query optimization • Choice of the best parallel execution plans • Automatic parallelization of the queries and load balancing • Transaction management • Similar to distributed transaction management
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Data Partitioning: • Each relation is divided in n partitions (subrelations), where n is a function of relation size and access frequency • Implementation • Round-robin • Maps i-th element to node i mod n • Simple but only exact-match queries • B-tree index • Supports range queries but large index • Hash function • Only exact-match queries but small index
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Partitioning Schemes:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Placement Directory: • Performs two functions • F1 (relname, placement attval) = lognode-id • F2 (lognode-id) = phynode-id • In either case, the data structure for f1 and f2 should be available when needed at each node
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Join Processing: • Three basic algorithms for intra-operator parallelism • Parallel nested loop join: no special assumption • Parallel associative join: one relation is declustered on join attribute and equi-join • Parallel hash join: equi-join • They also apply to other complex operators such as duplicate elimination, union, intersection, etc. with minor adaptation
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel Nested Loop Join:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel Associative Join:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel Hash Join:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Parallel Query Optimization: • The objective is to select the “best” parallel execution plan for a query using the following components • Search space • Models alternative execution plans as operator trees • Left-deep vs. Right-deep vs. Bushy trees • Search strategy • Dynamic programming for small search space • Randomized for large search space • Cost model (abstraction of execution system) • Physical schema info. (partitioning, indexes, etc.) • Statistics and cost functions
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Load Balancing: • Problems arise for intra-operator parallelism with skewed data distributions • attribute data skew (AVS) • tuple placement skew (TPS) • selectivity skew (SS) • redistribution skew (RS) • join product skew (JPS) • Solutions • sophisticated parallel algorithms that deal with skew • dynamic processor allocation (at execution time)
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Load Balancing in a DB Cluster:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
The Exadata Database Machine: • New machine from Oracle with Sun • Objectives • OLTP, OLAP, mixed workloads • Oracle Real Application Cluster • 8+ servers bi-pro Xeon, 72 GB RAM • Exadata storage server : intelligent cache • 14+ cells, each with • 2 processors, 24 Go RAM • 385 GB of Flash memory (read is 10* faster than disk) • 12+ SATA disks of 2 To or 12 SAS disks of 600 GB
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Exadata Architecture:
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH
Thank you
DEVANG PATEL INSTITUTE OF ADVANCE TECHNOLOGY AND RESEARCH