Query Execution: Intro To Database Systems Andy Pavlo

Download as pdf or txt
Download as pdf or txt
You are on page 1of 56

Query Execution

12 Part I

Intro to Database Systems Andy Pavlo


15-445/15-645
Fall 2019 AP Computer Science
Carnegie Mellon University
2

ADMINISTRIVIA

Homework #3 is due Wed Oct 9th @ 11:59pm

Mid-Term Exam is Wed Oct 16th @ 12:00pm

Project #2 is due Sun Oct 20th @ 11:59pm

CMU 15-445/645 (Fall 2019)


3

QUERY PL AN
SELECT R.id, S.cdate
The operators are arranged in a tree. FROM R JOIN S
ON R.id = S.id
Data flows from the leaves of the tree WHERE S.value > 100
up towards the root.
p R.id, S.value

The output of the root node is the


result of the query. ⨝ R.id=S.id

s value>100

R S
CMU 15-445/645 (Fall 2019)
4

T O D AY ' S A G E N D A

Processing Models
Access Methods
Expression Evaluation

CMU 15-445/645 (Fall 2019)


5

PROCESSING MODEL

A DBMS's processing model defines how the


system executes a query plan.
→ Different trade-offs for different workloads.

Approach #1: Iterator Model


Approach #2: Materialization Model
Approach #3: Vectorized / Batch Model

CMU 15-445/645 (Fall 2019)


6

I T E R AT O R M O D E L

Each query plan operator implements a Next


function.
→ On each invocation, the operator returns either a single
tuple or a null marker if there are no more tuples.
→ The operator implements a loop that calls next on its
children to retrieve their tuples and then process them.

Also called Volcano or Pipeline Model.

CMU 15-445/645 (Fall 2019)


7

I T E R AT O R M O D E L

Next() for t in child.Next():


SELECT R.id, S.cdate
emit(projection(t)) FROM R JOIN S
ON R.id = S.id
Next() for t1 in left.Next(): WHERE S.value > 100
buildHashTable(t1)
for t2 in right.Next():
if probe(t2): emit(t1⨝t2) p R.id, S.value

Next() for t in child.Next(): ⨝ R.id=S.id

s
if evalPred(t): emit(t)
value>100
Next() for t in R: Next() for t in S:

R S
emit(t) emit(t)

CMU 15-445/645 (Fall 2019)


7

I T E R AT O R M O D E L
SELECT R.id, S.cdate
1 for t in child.Next():
emit(projection(t)) FROM R JOIN S
ON R.id = S.id
for t1 in left.Next(): WHERE S.value > 100
buildHashTable(t1)
for t2 in right.Next():
if probe(t2): emit(t1⨝t2) p R.id, S.value

for t in child.Next(): ⨝ R.id=S.id

s
if evalPred(t): emit(t)
value>100
for t in R: for t in S:

R S
emit(t) emit(t)

CMU 15-445/645 (Fall 2019)


7

I T E R AT O R M O D E L
SELECT R.id, S.cdate
1 for t in child.Next():
emit(projection(t)) FROM R JOIN S
ON R.id = S.id
WHERE S.value > 100
2 for t1 in left.Next():
buildHashTable(t1)
for t2 in right.Next():
if probe(t2): emit(t1⨝t2) p R.id, S.value

for t in child.Next(): ⨝ R.id=S.id

s
if evalPred(t): emit(t)
value>100
for t in R: for t in S:

R S
emit(t) emit(t)

CMU 15-445/645 (Fall 2019)


7

I T E R AT O R M O D E L
SELECT R.id, S.cdate
1 for t in child.Next():
emit(projection(t)) FROM R JOIN S
ON R.id = S.id
WHERE S.value > 100
2 for t1 in left.Next():
buildHashTable(t1)
for t2 in right.Next():
if probe(t2): emit(t1⨝t2) p R.id, S.value

Single Tuple
for t in child.Next(): ⨝ R.id=S.id

s
if evalPred(t): emit(t)
value>100

3 for t in R: for t in S:

R S
emit(t) emit(t)

CMU 15-445/645 (Fall 2019)


7

I T E R AT O R M O D E L
SELECT R.id, S.cdate
1 for t in child.Next():
emit(projection(t)) FROM R JOIN S
ON R.id = S.id
WHERE S.value > 100
2 for t1 in left.Next():
buildHashTable(t1)
for t2 in right.Next():
if probe(t2): emit(t1⨝t2) p R.id, S.value

for t in child.Next():
4 ⨝ R.id=S.id

s
if evalPred(t): emit(t)
value>100

3 for t in R: for t in S:
5
R S
emit(t) emit(t)

CMU 15-445/645 (Fall 2019)


7

I T E R AT O R M O D E L
SELECT R.id, S.cdate
1 for t in child.Next():
emit(projection(t)) FROM R JOIN S
ON R.id = S.id
WHERE S.value > 100
2 for t1 in left.Next():
buildHashTable(t1)
for t2 in right.Next():
if probe(t2): emit(t1⨝t2) p R.id, S.value

for t in child.Next():
4 ⨝ R.id=S.id

s
if evalPred(t): emit(t)
value>100

3 for t in R: for t in S:
5
R S
emit(t) emit(t)

CMU 15-445/645 (Fall 2019)


8

I T E R AT O R M O D E L

This is used in almost every DBMS. Allows for


tuple pipelining.

Some operators have to block until their children


emit all of their tuples.
→ Joins, Subqueries, Order By

Output control works easily with this approach.

CMU 15-445/645 (Fall 2019)


9

M AT E R I A L I Z AT I O N M O D E L

Each operator processes its input all at once and


then emits its output all at once.
→ The operator "materializes" its output as a single result.
→ The DBMS can push down hints into to avoid scanning
too many tuples.
→ Can send either a materialized row or a single column.

The output can be either whole tuples (NSM) or


subsets of columns (DSM)

CMU 15-445/645 (Fall 2019)


10

M AT E R I A L I Z AT I O N M O D E L
out = [ ]
1 for t in child.Output():
out.add(projection(t)) SELECT R.id, S.cdate
return out FROM R JOIN S
ON R.id = S.id
out = [ ]
for t1 in left.Output(): WHERE S.value > 100

p
buildHashTable(t1)
for t2 in right.Output(): R.id, S.value
if probe(t2): out.add(t1⨝t2)
return out

out = [ ] ⨝ R.id=S.id

s
for t in child.Output():
if evalPred(t): out.add(t)
return out value>100

R S
out = [ ] out = [ ]
for t in R: for t in S:
out.add(t) out.add(t)
return out return out
CMU 15-445/645 (Fall 2019)
10

M AT E R I A L I Z AT I O N M O D E L
out = [ ]
1 for t in child.Output():
out.add(projection(t)) SELECT R.id, S.cdate
return out FROM R JOIN S
ON R.id = S.id
out = [ ]
WHERE S.value > 100
2
for t1 in left.Output():

p
buildHashTable(t1)
for t2 in right.Output(): R.id, S.value
if probe(t2): out.add(t1⨝t2)
return out

out = [ ] ⨝ R.id=S.id

s
for t in child.Output():
if evalPred(t): out.add(t)
All Tuples return out value>100

3
R S
out = [ ] out = [ ]
for t in R: for t in S:
out.add(t) out.add(t)
return out return out
CMU 15-445/645 (Fall 2019)
10

M AT E R I A L I Z AT I O N M O D E L
out = [ ]
1 for t in child.Output():
out.add(projection(t)) SELECT R.id, S.cdate
return out FROM R JOIN S
ON R.id = S.id
out = [ ]
WHERE S.value > 100
2
for t1 in left.Output():

p
buildHashTable(t1)
for t2 in right.Output(): R.id, S.value
if probe(t2): out.add(t1⨝t2)
return out

out = [ ] ⨝ R.id=S.id

4
s
for t in child.Output():
if evalPred(t): out.add(t)
return out value>100

3 5
R S
out = [ ] out = [ ]
for t in R: for t in S:
out.add(t) out.add(t)
return out return out
CMU 15-445/645 (Fall 2019)
10

M AT E R I A L I Z AT I O N M O D E L
out = [ ]
1 for t in child.Output():
out.add(projection(t)) SELECT R.id, S.cdate
return out FROM R JOIN S
ON R.id = S.id
out = [ ]
WHERE S.value > 100
2
for t1 in left.Output():

p
buildHashTable(t1)
for t2 in right.Output(): R.id, S.value
if probe(t2): out.add(t1⨝t2)
return out

out = [ ] ⨝ R.id=S.id

4
s
for t in child.Output():
if evalPred(t): out.add(t)
return out value>100

3 5
R S
out = [ ] out = [ ]
for t in R: for t in S:
out.add(t) out.add(t)
return out return out
CMU 15-445/645 (Fall 2019)
11

M AT E R I A L I Z AT I O N M O D E L

Better for OLTP workloads because queries only


access a small number of tuples at a time.
→ Lower execution / coordination overhead.
→ Fewer function calls.

Not good for OLAP queries with large


intermediate results.

CMU 15-445/645 (Fall 2019)


12

V E C T O R I Z AT I O N M O D E L

Like the Iterator Model where each operator


implements a Next function in this model.
Each operator emits a batch of tuples instead of a
single tuple.
→ The operator's internal loop processes multiple tuples at a
time.
→ The size of the batch can vary based on hardware or
query properties.

CMU 15-445/645 (Fall 2019)


13

V E C T O R I Z AT I O N M O D E L
out = [ ]
for t in child.Next(): 1 SELECT R.id, S.cdate
out.add(projection(t))
if |out|>n: emit(out) FROM R JOIN S
ON R.id = S.id
2 out = [ ]
for t1 in left.Next(): WHERE S.value > 100

p
buildHashTable(t1)
for t2 in right.Next(): R.id, S.value
if probe(t2): out.add(t1⨝t2)
if |out|>n: emit(out)

out = [ ]
for t in child.Next():
⨝ R.id=S.id

if evalPred(t): out.add(t)
if |out|>n: emit(out) s value>100

out = [ ] Tuple Batch out = [ ]


3 for t in R:
out.add(t)
if |out|>n: emit(out)
for t in S:
out.add(t)
if |out|>n: emit(out)
R S
CMU 15-445/645 (Fall 2019)
13

V E C T O R I Z AT I O N M O D E L
out = [ ]
for t in child.Next(): 1 SELECT R.id, S.cdate
out.add(projection(t))
if |out|>n: emit(out) FROM R JOIN S
ON R.id = S.id
2 out = [ ]
for t1 in left.Next(): WHERE S.value > 100

p
buildHashTable(t1)
for t2 in right.Next(): R.id, S.value
if probe(t2): out.add(t1⨝t2)
if |out|>n: emit(out)

out = [ ]
for t in child.Next(): 4 ⨝ R.id=S.id

if evalPred(t): out.add(t)
if |out|>n: emit(out) s value>100

out = [ ] Tuple Batch out = [ ]


3 5
for t in R:
out.add(t)
if |out|>n: emit(out)
for t in S:
out.add(t)
if |out|>n: emit(out)
R S
CMU 15-445/645 (Fall 2019)
14

V E C T O R I Z AT I O N M O D E L

Ideal for OLAP queries because it greatly reduces


the number of invocations per operator.
Allows for operators to use vectorized (SIMD)
instructions to process batches of tuples.

CMU 15-445/645 (Fall 2019)


15

PLAN PROCESSING DIRECTION

Approach #1: Top-to-Bottom


→ Start with the root and "pull" data up from its children.
→ Tuples are always passed with function calls.

Approach #2: Bottom-to-Top


→ Start with leaf nodes and push data to their parents.
→ Allows for tighter control of caches/registers in pipelines.

CMU 15-445/645 (Fall 2019)


16

ACCESS METHODS
SELECT R.id, S.cdate
An access method is a way that the FROM R JOIN S
DBMS can access the data stored in a ON R.id = S.id
table. WHERE S.value > 100
→ Not defined in relational algebra.
p R.id, S.value

Three basic approaches:


→ Sequential Scan ⨝ R.id=S.id

→ Index Scan
→ Multi-Index / "Bitmap" Scan s value>100

R S
CMU 15-445/645 (Fall 2019)
17

SEQUENTIAL SCAN

For each page in the table: for page in table.pages:


→ Retrieve it from the buffer pool. for t in page.tuples:
→ Iterate over each tuple and check whether if evalPred(t):
to include it. // Do Something!

The DBMS maintains an internal


cursor that tracks the last page / slot
it examined.

CMU 15-445/645 (Fall 2019)


18

S E Q U E N T I A L S C A N : O P T I M I Z AT I O N S

This is almost always the worst thing that the


DBMS can do to execute a query.

Sequential Scan Optimizations:


→ Prefetching
→ Buffer Pool Bypass
→ Parallelization
→ Zone Maps
→ Late Materialization
→ Heap Clustering

CMU 15-445/645 (Fall 2019)


19

ZONE MAPS

Pre-computed aggregates for the attribute values


in a page. DBMS checks the zone map first to
decide whether it wants to access the page.

Original Data Zone Map


val type val
SELECT * FROM table 100 MIN 100
WHERE val > 600 200 MAX 400
300 AVG 280
400 SUM 1400
400 COUNT 5
CMU 15-445/645 (Fall 2019)
20

L AT E M AT E R I A L I Z AT I O N

DSM DBMSs can delay stitching together tuples


until the upper parts of the query plan.
SELECT AVG(foo.c)
γ AVG(foo.c) FROM
ON
foo JOIN bar
foo.b = bar.b
WHERE foo.a > 100
⨝ foo.b=bar.b

s
a b c
a>100 0
1
bar foo 2
3
CMU 15-445/645 (Fall 2019)
20

L AT E M AT E R I A L I Z AT I O N

DSM DBMSs can delay stitching together tuples


until the upper parts of the query plan.
SELECT AVG(foo.c)
γ AVG(foo.c) FROM
ON
foo JOIN bar
foo.b = bar.b
WHERE foo.a > 100
⨝ foo.b=bar.b

s
a b c
a>100 Offsets 0
1
bar foo 2
3
CMU 15-445/645 (Fall 2019)
20

L AT E M AT E R I A L I Z AT I O N

DSM DBMSs can delay stitching together tuples


until the upper parts of the query plan.
SELECT AVG(foo.c)
γ AVG(foo.c) FROM
ON
foo JOIN bar
foo.b = bar.b
WHERE foo.a > 100
⨝ foo.b=bar.b Offsets

s
a b c
a>100 Offsets 0
1
bar foo 2
3
CMU 15-445/645 (Fall 2019)
20

L AT E M AT E R I A L I Z AT I O N

DSM DBMSs can delay stitching together tuples


until the upper parts of the query plan.
SELECT AVG(foo.c)
γ AVG(foo.c) Result FROM
ON
foo JOIN bar
foo.b = bar.b
WHERE foo.a > 100
⨝ foo.b=bar.b Offsets

s
a b c
a>100 Offsets 0
1
bar foo 2
3
CMU 15-445/645 (Fall 2019)
21

HEAP CLUSTERING

Tuples are sorted in the heap's pages Scan Direction


using the order specified by a
clustering index.

If the query accesses tuples using the 101 102 103 104
clustering index's attributes, then the
DBMS can jump directly to the pages
that it needs.

CMU 15-445/645 (Fall 2019)


22

INDEX SCAN

The DBMS picks an index to find the tuples that


the query needs.

Which index to use depends on:


→ What attributes the index contains
→ What attributes the query references
→ The attribute's value domains
→ Predicate composition
→ Whether the index has unique or non-unique keys

CMU 15-445/645 (Fall 2019)


23

INDEX SCAN
SELECT * FROM students
Suppose that we a single table with WHERE age < 30
100 tuples and two indexes: AND dept = 'CS'
→ Index #1: age AND country = 'US'
→ Index #2: dept

Scenario #1 Scenario #2
There are 99 people There are 99 people in
under the age of 30 but the CS department but
only 2 people in the CS only 2 people under the
department. age of 30.
CMU 15-445/645 (Fall 2019)
24

M U LT I - I N D E X S C A N

If there are multiple indexes that the DBMS can


use for a query:
→ Compute sets of record ids using each matching index.
→ Combine these sets based on the query's predicates
(union vs. intersect).
→ Retrieve the records and apply any remaining predicates.

Postgres calls this Bitmap Scan.

CMU 15-445/645 (Fall 2019)


25

M U LT I - I N D E X S C A N
SELECT * FROM students
With an index on age and an index WHERE age < 30
on dept, AND dept = 'CS'
→ We can retrieve the record ids satisfying AND country = 'US'
age<30 using the first,
→ Then retrieve the record ids satisfying
dept='CS' using the second,
→ Take their intersection
→ Retrieve records and check
country='US'.

CMU 15-445/645 (Fall 2019)


26

M U LT I - I N D E X S C A N
SELECT * FROM students
Set intersection can be done with WHERE age < 30
bitmaps, hash tables, or Bloom filters. AND dept = 'CS'
AND country = 'US'

age<30 dept='CS'
record ids

CMU 15-445/645 (Fall 2019)


26

M U LT I - I N D E X S C A N
SELECT * FROM students
Set intersection can be done with WHERE age < 30
bitmaps, hash tables, or Bloom filters. AND dept = 'CS'
AND country = 'US'

age<30 dept='CS'
record ids record ids

CMU 15-445/645 (Fall 2019)


26

M U LT I - I N D E X S C A N
SELECT * FROM students
Set intersection can be done with WHERE age < 30
bitmaps, hash tables, or Bloom filters. AND dept = 'CS'
AND country = 'US'

age<30 dept='CS'
record ids record ids

fetch records country='US'

CMU 15-445/645 (Fall 2019)


27

I N D E X S C A N PA G E S O R T I N G
Scan Direction
Retrieving tuples in the order that
appear in an unclustered index is
inefficient.
101 102 103 104
The DBMS can first figure out all the
tuples that it needs and then sort them
based on their page id.

CMU 15-445/645 (Fall 2019)


27

I N D E X S C A N PA G E S O R T I N G
Scan Direction
Retrieving tuples in the order that
appear in an unclustered index is
inefficient.
101 102 103 104
The DBMS can first figure out all the
tuples that it needs and then sort them
based on their page id.

CMU 15-445/645 (Fall 2019)


27

I N D E X S C A N PA G E S O R T I N G
Scan Direction
Retrieving tuples in the order that
appear in an unclustered index is
inefficient.
101 102 103 104
The DBMS can first figure out all the Page 102
tuples that it needs and then sort them Page
Page
103
104
based on their page id. Page
Page
104
102
Page 103
Page 102
Page 102
Page 101
Page 103
Page 104
Page 103

CMU 15-445/645 (Fall 2019)


27

I N D E X S C A N PA G E S O R T I N G
Scan Direction
Retrieving tuples in the order that
appear in an unclustered index is
inefficient.
101 102 103 104
The DBMS can first figure out all the Page 102 Page 101
tuples that it needs and then sort them Page
Page
103
104
Page
Page
101
102
based on their page id. Page
Page
104
102
Page
Page
102
102
Page 103 Page 102
Page 102 Page 103
Page 102 Page 103
Page 101 Page 103
Page 103 Page 104
Page 104 Page 104
Page 103 Page 104

CMU 15-445/645 (Fall 2019)


28

E X P R E S S I O N E VA L U AT I O N
SELECT R.id, S.cdate
The DBMS represents a WHERE clause FROM R JOIN S
as an expression tree. ON R.id = S.id
WHERE S.value > 100
The nodes in the tree represent
different expression types: AND
→ Comparisons (=, <, >, !=)
→ Conjunction (AND), Disjunction (OR)
→ Arithmetic Operators (+, -, *, /, %) = >
→ Constant Values
→ Tuple Attribute References
Attribute(R.id) Attribute(S.id) Attribute(value) Constant(100)

CMU 15-445/645 (Fall 2019)


29

E X P R E S S I O N E VA L U AT I O N

SELECT * FROM S
WHERE B.value = ? + 1

CMU 15-445/645 (Fall 2019)


29

E X P R E S S I O N E VA L U AT I O N
Execution Context
SELECT * FROM S Current Tuple Query Parameters Table Schema
WHERE B.value = ? + 1 (123, 1000) (int:999) S→(int:id, int:value)

Attribute(S.value) +

Parameter(0) Constant(1)

CMU 15-445/645 (Fall 2019)


29

E X P R E S S I O N E VA L U AT I O N
Execution Context
SELECT * FROM S Current Tuple Query Parameters Table Schema
WHERE B.value = ? + 1 (123, 1000) (int:999) S→(int:id, int:value)

Attribute(S.value) +

Parameter(0) Constant(1)

CMU 15-445/645 (Fall 2019)


29

E X P R E S S I O N E VA L U AT I O N
Execution Context
SELECT * FROM S Current Tuple Query Parameters Table Schema
WHERE B.value = ? + 1 (123, 1000) (int:999) S→(int:id, int:value)

Attribute(S.value) +
1000

Parameter(0) Constant(1)

CMU 15-445/645 (Fall 2019)


29

E X P R E S S I O N E VA L U AT I O N
Execution Context
SELECT * FROM S Current Tuple Query Parameters Table Schema
WHERE B.value = ? + 1 (123, 1000) (int:999) S→(int:id, int:value)

Attribute(S.value) +
1000

Parameter(0) Constant(1)
999
CMU 15-445/645 (Fall 2019)
29

E X P R E S S I O N E VA L U AT I O N
Execution Context
SELECT * FROM S Current Tuple Query Parameters Table Schema
WHERE B.value = ? + 1 (123, 1000) (int:999) S→(int:id, int:value)

Attribute(S.value) +
1000

Parameter(0) Constant(1)
999 1
CMU 15-445/645 (Fall 2019)
29

E X P R E S S I O N E VA L U AT I O N
Execution Context
SELECT * FROM S Current Tuple Query Parameters Table Schema
WHERE B.value = ? + 1 (123, 1000) (int:999) S→(int:id, int:value)

Attribute(S.value) +
1000 1000

Parameter(0) Constant(1)
999 1
CMU 15-445/645 (Fall 2019)
29

E X P R E S S I O N E VA L U AT I O N
Execution Context
SELECT * FROM S Current Tuple Query Parameters Table Schema
WHERE B.value = ? + 1 (123, 1000) (int:999) S→(int:id, int:value)

=
true
Attribute(S.value) +
1000 1000

Parameter(0) Constant(1)
999 1
CMU 15-445/645 (Fall 2019)
30

E X P R E S S I O N E VA L U AT I O N

Evaluating predicates in this manner


=
is slow.
→ The DBMS traverses the tree and for each
node that it visits it must figure out what
the operator needs to do. Constant(1) Constant(1)
Consider the predicate "WHERE 1=1"

A better approach is to just evaluate


1 = 1
the expression directly.
→ Think JIT compilation

CMU 15-445/645 (Fall 2019)


31

CONCLUSION

The same query plan be executed in multiple ways.

(Most) DBMSs will want to use an index scan as


much as possible.

Expression trees are flexible but slow.

CMU 15-445/645 (Fall 2019)


32

NEXT CLASS

Parallel Query Execution

CMU 15-445/645 (Fall 2019)

You might also like