Lecture 1 Parallel Databases
Lecture 1 Parallel Databases
Lecture 1 Parallel Databases
Advanced Database
Systems
dt
h
10 MB/s
Parallelism:
divide a big problem
into many smaller ones
to be solved in parallel.
Parallel DBMS: Introduction
Centralized DBMSs assume that
– Processing of individual transactions is sequential
– Control and/or data are maintained at one single site
These assumptions have been relaxed in recent decades:
– Parallel DBMSs:
Use of parallel evaluation techniques; parallelization of various operations
such as data loading, index construction, and query evaluations.
Data may still be centralized; distribution dictated solely by performance
considerations
– Distributed DBMSs:
Use of both control and data distribution; data and control are dispersed and
stored across several sites
Data distribution also dictated by considerations of increased availability and
local ownership.
Parallel DBMS: Introduction (Cont’d)
Parallelism is natural to DBMS processing
– Pipeline parallelism:
many machines each doing one step in a multi-step
process.
– Partition parallelism:
many machines doing the same thing to different pieces
of data.
– Both are natural in DBMS!
Any Any
Sequential Sequential
Pipeline Program Program
Sequential
Any Any
Partition Sequential
Sequential
Program
Sequential
Program
Oracle RAC
RAM
What is shared?
Nothing
RAM
Architectures for Parallel Databases
Shared Memory Shared Disk Shared Nothing
(SMP) (network)
Processors
Memory
# of trans./sec.
(throughput)
Speed-Up
– For a given amount of data,
more resources (CPUs)
means proportionally more
transactions processed per
second. # of CPUs
# of trans./sec.
(throughput)
– If resources increased in
proportion to increase in
data size, # of trans./sec.
remains constant.
# of CPUS, DB size
What Systems Work
Which Way ?
Shared Nothing
Teradata: 400 nodes CLIENTS
What to Cover
Partitioning Techniques
• Range, Hash, Round Robin partitioning techniques
Comparison of Partitioning Techniques
• Depending on scan, point, range types queries
Handling of Skew
• Types of skews (attribute-value & partition skew)
Partitioning Techniques
• Assume that there are n disks, D0, D1, . . . , Dn−1,
• Range P assigning contiguous attribute-value ranges to each disk
• Hash P hashes the tuple values from 0 to n-1
• Round Robin P sends the ith tuple to disk number Di mod n .
A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z A...E F...J K...N O...S T...Z
INPUT 2
hash 2
Original Relations ... function
h
(R then S) B-1
B-1
Disk B main memory buffers Disk
A B R S
Parallel Query Optimization (Cont’d)