Chapter 15

Topics
{ Query evaluation plan

Relational Query Optimization { Query block
{ Relation algebra equivalence
{ Cost estimation
Linda Wu { Enumeration of alternative plans
{ Nested queries
(CMPT 354 • 2004-2)
References
z Chapter 12: 12.1, 12.4.1, 12.4.2, 12.6.1
z Chapter 15
Chapter 15 CMPT 354 • 2004-2 2
Overview of Query Optimization Query Evaluation Plan

{ SQL queries are decomposed into smaller { An extended relational algebra tree, with
units: query blocks choice of algorithms for each relational
operation
{ Query blocks are translated into extended πsname (On-the-fly)
relational algebra expressions
{ A query optimizer optimizes a single block σ bid=100 ∧ rating>5 (On-the-fly)
at a time
z Enumerate alternative evaluation plans for the (Simple nested loops)
sid=sid
RA expression of the block
z Estimate the cost of each enumerated plan, (File scan) Reserves Sailors (File scan)
choose the plan with the lowest estimated cost
{ Ideally: find the best plan
πsname (σ bid=100 ∧ rating>5 (Reserves ZY Sailors) )
{ Practically: avoid the worst plans
Chapter 15 CMPT 354 • 2004-2 3 Chapter 15 CMPT 354 • 2004-2 4

Query Evaluation Plan (Cont.) Query Evaluation Plan (Cont.)
{ Pipelining multiple-operator queries { Left-deep tree
z The result of one operator is fed to the another z On the plan tree, the right child of each join
operator without creating a temporary table node is a base table
(i.e., intermediate result is not materialized) { Query optimizers typically concentrate on
z Save the cost of writing out the intermediate all left-deep plans
result and reading it back in
z As # of joins increases, # of alternative plans
z When the input table to a unary operator (π,σ) is
increases: it is necessary to prune the space of
pipelined into it, we say the operator is applied
alternative plans
on-the-fly
z Left-deep tree allows the generation of all fully-
pipelined plans
z Many alternative plans with less cost are ruled

out!
Query Evaluation Plan (Cont.) Query Block: Unit of Optimization

{ Query block
A B C D
z An SQL query with no nesting and exactly one
SELECT clause, one FROM clause, and at most
one WHERE, GROUP BY and HAVING clauses
D D
z Nested block: usually treated as calls to a
C C subroutine, made once per outer tuple
C D A B
A B A B SELECT S.sid, MIN (R.day)
FROM Sailors S, Reserves R, Boats B Nested
(Left-deep tree)
WHERE S.sid = R.sid AND R.bid = B.bid AND block
Outer B.color = ‘red’ AND S.rating =
Right child: inner relation of join; block ( SELECT MAX (S2.rating)
must be materialized FROM Sailors S2 )
GROUP BY S.sid
HAVING COUNT (*) > 1
Query Block (Cont.) Query Block (Cont.)
{ Query block to relational algebra { Query block optimization
expression z The optimizer find the best plan for the σπ×
expression
z SELECT → projection; WHERE → selection;
{ Plans are enumerated by applying several
FROM → cross - product equivalences between RA expressions
z A block is essential a σπ× algebra expression, { For each plan, estimate the I/O cost & result
with remaining operations carried out on the size for each operation in the tree
result of the σπ× expression z This plan is evaluated and the resulting tuples
are sorted/hashed to implement the GROUP
π S.sid, MIN(R.day) ( BY clause
HAVING COUNT(*)>1 ( z The HAVING clause is applied to eliminate
GROUP BY S.sid ( some groups
π S.sid, R.day ( z Aggregate operations in SELECT clause are
σ S.sid=R.sid∧R.bid=B.bid∧B.color=‘red’∧S.rating=valueFromNestedBlk ( computed for each remaining group
Sailors × Reserves × Sailors
Chapter 15 CMPT 354) •)2004-2 9 Chapter 15 CMPT 354 • 2004-2 10
Relational Algebra Equivalences RA Equivalences (Cont.)

{ Allow us to choose different join orders and { πa (σc (R) ) ≡ σc (πa (R) )
to push selections and projections ahead of if condition c only involves attributes in a
joins { R ZY c S ≡ σc ( R × S)
Selections: σ c 1 ∧ ... ∧ cn ( R ) ≡ σ c1 ( . . . σ cn ( R ) ) { σc ( R × S) ≡ σc (R) × S, σc ( R ZY S) ≡ σc (R) ZY S
σ c 1 (σ c 2 ( R ) ) ≡ σ c 2 (σ c1 ( R ) ) (Commutative) if c appears in R but not S
π a ( R × S) ≡ π a1 (R) × π a2 (S)
π a 1 ( R ) ≡ π a 1 ( . . . (π a n ( R ) )) (Cascading)
{
Projections:
where ai ⊆ ai+1 π a ( R ZY S) ≡ π a1 (R) ZY π a2 (S)
if a1 ⊂ a, a2 ⊂ a, a1 appear only in R, a2 appear
Cross-products R × (S × T) ≡ (R × S) × T (Associative) only in S (similar equivalences hold for selections)
& joins: R ZY (S ZY T) ≡ (R ZY S) ZY T
R×S≡S×R (Commutative)
R ZY S ≡ S ZY R
Cost Estimation Cost Estimation (Cont.)
{ For each plan considered { System catalogs
z Estimate the I/O cost of each operation in the z A collection of tables that contains the descriptive
plan tree information for every table and index
{ Need to know # of pages and available index { Information in catalog
(from system catalog) z For each relation: file name, file structure,
z Estimate the result size for each operation in attribute names/types, # tuples (NTuples), #
the plan tree pages (NPages), index name, PK, FK, …
{ The result of one operation is the input for the z For each index: index name, index structure,
operation in the parent node, so, it affects the search key attributes, # pages, low/high key
estimation of size and cost of the parent node values High(I)/Low(I), # distinct key values
{ Use information about the input relations
NKeys(I) …
z For each tree index: height
Cost Estimation (Cont.) Cost Estimation (Cont.)

{ Result size estimation (single relation query) Terms Reduction factor
z Result cardinality = (Max # tuples) * (product of column = value 1 / NKeys(I) or 1/10
all reduction factors) Assuming uniform distribution of tuples
{ Assume terms are independent among index key values
z Maximum # tuples = product of the column1 = 1 / MAX(NKeys(I1), NKeys(I2)) or 1/10
cardinalities of relations in the FROM clause column2 Assuming each key value in the smaller
z Reduction factor (RF) associated with each term index has a matching value in the other
reflects the impact of the term in reducing result index
size column > value (High(I) – value) / (High(I) – Low(I))
column IN (list (RF of column=value) * (# of values in
SELECT attribute list of values) the list)
FROM relation list
I, I1, I2: indexes on the corresponding columns
WHERE term1 AND ... AND termk

Cost Estimation (Cont.) Cost Estimation (Cont.)
{ I/O cost estimates for single relation query { Example 1: an index on rating
z Index I on primary key matches selection z (1/NKeys(I)) * NTuples(R) = (1/10) * 40000 tuples
z Clustered index
Height(I)+1 for a B+ tree, ≈ 1.2 for a hash index
cost = (1/NKeys(I)) * (NPages(I)+NPages(R))
z Clustered index I matches one or more selection
= (1/10) * (50+500)
terms
z Unclustered index
(NPages(I)+NPages(R)) * (product of RFs of
Cost = (1/NKeys(I)) * (NPages(I)+NTuples(R))
matching terms)
= (1/10) * (50+40000)
z Non-clustered index I matches one or more
selection terms { Example 2: an index on sid
z Clustered index: cost = 50+500
(NPages(I)+NTuples(R)) * (product of RFs of
matching terms) z Unclustered index: cost = 50+40000
z Sequential scan of file: NPages(R) { Exampe 3: doing a file scan SELECT S.sid
z We retrieve all file pages (500) FROM Sailors S
WHERE S.rating=8
Cost Estimation (Cont.) Enumeration of Alternative Plans

{ Result size estimates (multi- { Two main cases
relation query) z Single-relation queries
{ Over a single relation
z Multirelation plans are built up by { A combination of selects, projects, and
joining one new relation at a time aggregate operations; no join
z Cost of join method, plus z Multi-relation queries
estimation of join cardinality gives { Over two or more relations
us both cost estimate and result { Requires join (or cross-product): expensive
size estimate

Single-Relation Queries Single-Relation Queries (Cont.)
{ Each available access path is considered, { Plan without index
and the one with the least estimated cost is z Perform a file scan to retrieve tuples and apply
chosen the selection and projections on-the-fly
z Access path: a method of retrieving tuples, e,g., z Write out tuples (as table Temp)
file scan, or, an index that match a selection z Sort Temp, and generate one answer tuple for
{ The different operations are essentially each qualifying group on-the-fly
carried out together z I/O cost
{ File scan: NPages(R)
z e.g., if an index is used for a selection,
{ Writing out Temp table: NPages(Temp) =
projection is done for each retrieved tuple, and
the resulting tuples are pipelined into the NPages(R) * (size ratio) * (RF of selections)
aggregate computation { Sorting: 3*NPages(Temp) (assume Temp can
be sorted in 2 passes)
Single-Relation Queries (Cont.) Single-Relation Queries (Cont.)

{ Plans using indexes z Sorted index access path
z Single-index access path { If the list of grouping attribute is a prefix of a
tree index, use the index to retrieve tuples in
{ If several indexes match the selections, choose
the order required by the GROUP BY clause,
the most selective access path to retrieve data
apply projections and selections and compute
entries and then tuples, apply projections and
aggregate values on-the-fly
remaining selection terms, (write the result to a
temp relation), sort, compute aggregate values z Index-only access path
z Multiple-index access path { If all the attributes mentioned in the query are
included in the search key for an index, index-
{ If several indexes of alt. (2)/(3) match the
only scan, apply projections and selections,
selection, use each index to retrieve a set of
(write the result to a temp relation), sort,
rids, intersect rids, sort rids by page id, retrieve
compute aggregate values
tuples, apply projection and remaining selection
terms, (write the result to a temp relation),
sort, compute aggregate values

Multi-Relation Queries Multi-Relation Queries (Cont.)
{ Enumeration of left-deep plans { For each subset of relations, retain only
z All enumerated left-deep plans differ only in the z Cheapest plan overall, plus,
order of relations, the access method for each z Cheapest plan for each “interesting order” of the
relation, and the join method for each join tuples
operation
{ ORDER BY, GROUP BY, aggregates, etc.
z N passes (if N relations joined)
z Handled as a final step, using either an
Pass 1: find best 1-relation plan for each relation; “interestingly ordered” plan or an additional
selection/projections considered as early as possible sorting operator
Pass 2: find best way to join result of each 1-
relation plan (as outer) to another relation (all 2-
{ Avoid cross-product if possible
relation plans) z An N-1 relation plan is not combined with an
additional relation unless there is a join condition
Pass N: find best way to join the result of a (N-1)-
between them, unless all terms in WHERE have
relation plan (as outer) to the N’th relation (all N-
been used up
relation plans)
Nested Queries Summary

{ Nested block is optimized independently, { Query optimization is an important task in
with the outer tuple considered as a relational DBMS
providing a selection condition { Query optimization
{ Outer block is optimized with the cost of z Consider a set of alternative plans
“calling” nested block computation taken z Estimate cost of each plan that is considered
into account { Single-relation queries
z All access paths considered, cheapest is chosen
{ Implicit ordering imposed by nesting
means that some good strategies are not { Multi-relation queries
z Only left-deep plans considered, N passes
considered
z The non-nested version of the query is typically
optimized better

Chapter 15

Uploaded by

Copyright:

Available Formats

Chapter 15

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Chapter 15

Uploaded by

Copyright:

Available Formats

Topics

{ Query evaluation plan

Chapter 15 CMPT 354 • 2004-2 2

Overview of Query Optimization Query Evaluation Plan

Chapter 15 CMPT 354 • 2004-2 3 Chapter 15 CMPT 354 • 2004-2 4

z Many alternative plans with less cost are ruled

Chapter 15 CMPT 354 • 2004-2 5 Chapter 15 CMPT 354 • 2004-2 6

Query Evaluation Plan (Cont.) Query Block: Unit of Optimization

Relational Algebra Equivalences RA Equivalences (Cont.)

Chapter 15 CMPT 354 • 2004-2 13 Chapter 15 CMPT 354 • 2004-2 14

Cost Estimation (Cont.) Cost Estimation (Cont.)

Chapter 15 CMPT 354 • 2004-2 15 Chapter 15 CMPT 354 • 2004-2 16

Cost Estimation (Cont.) Enumeration of Alternative Plans

Chapter 15 CMPT 354 • 2004-2 19 Chapter 15 CMPT 354 • 2004-2 20

Chapter 15 CMPT 354 • 2004-2 21 Chapter 15 CMPT 354 • 2004-2 22

Single-Relation Queries (Cont.) Single-Relation Queries (Cont.)

Chapter 15 CMPT 354 • 2004-2 23 Chapter 15 CMPT 354 • 2004-2 24

Nested Queries Summary

Chapter 15 CMPT 354 • 2004-2 27 Chapter 15 CMPT 354 • 2004-2 28

You might also like