Relational Algebra: R & G, Chapter 4

Download as ppt, pdf, or txt
Download as ppt, pdf, or txt
You are on page 1of 27

Relational Algebra

R & G, Chapter 4
By relieving the brain of all unnecessary
work, a good notation sets it free to
concentrate on more advanced problems,
and, in effect, increases the mental power of
the race.
-- Alfred North Whitehead (1861 - 1947)

Relational Query Languages


Query languages: Allow manipulation and
retrieval of data from a database.
Relational model supports simple, powerful QLs:
Strong formal foundation based on logic.
Allows for much optimization.
Query Languages != programming languages!
QLs not expected to be Turing complete.
QLs not intended to be used for complex
calculations.
QLs support easy, efficient access to large data sets.

Formal Relational Query


Languages
Two mathematical Query Languages form
the basis for real languages (e.g. SQL),
and for implementation:
Relational Algebra: More operational, very
useful for representing execution plans.
Relational Calculus: Lets users describe
what they want, rather than how to
compute it. (Non-procedural,
declarative.)

Understanding Algebra & Calculus is key to


understanding SQL, query processing!

Preliminaries
A query is applied to relation instances, and
the result of a query is also a relation
instance.
Schemas of input relations for a query are fixed
(but query will run over any legal instance)
The schema for the result of a given query is also
fixed. It is determined by the definitions of the
query language constructs.
Compare above two points to XPath, XQuery!
Positional vs. named-field notation:
Positional notation easier for formal definitions,
named-field notation more readable.
Both used in SQL
Though positional notation is not encouraged

Relational Algebra: 5 Basic


Operations

Selection ( ) Selects a subset of rows from


relation (horizontal).
Projection ( ) Retains only wanted columns from
relation (vertical).
Cross-product ( ) Allows us to combine two
relations.
Set-difference ( ) Tuples in r1, but not in r2.
Union ( ) Tuples in r1 or in r2.
Since each operation returns a relation, operations can
be composed! (Algebra is closed.)

Example InstancesR1 sid bid


22
58
Boats

bid
101
102
103
104

bname
Interlake
Interlake
Clipper
Marine

color
blue
red
green
red

day
101 10/10/96
103 11/12/96

S1 sid

sname rating age


dustin
7
45.0
lubber
8
55.5
rusty
10 35.0

S2 sid

sname rating age


yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0

22
31
58

28
31
44
58

Projection ()
Examples:

age(S2) ; sname,rating(S2)

Retains only attributes that are in the projection list.


Schema of result:
exactly the fields in the projection list, with the same names
that they had in the input relation.

Projection operator has to eliminate duplicates


do they arise? Why remove them?)

(How

Note: real systems typically dont do duplicate elimination


unless the user explicitly asks for it. (Why not?)

Projection ()

sname rating
yuppy
lubber
guppy
rusty

sid
28
31
44
58

sname rating age


yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2

9
8
5
10

sname,rating (S 2)
age
35.0
55.5

age(S2)

Selection ()
Selects rows that satisfy selection condition.
Result is a relation.
Schema of result is same as that of the input relation.

Do we need to do duplicate elimination?

sid
28
31
44
58

sname
yuppy
lubber
guppy
rusty

rating
9
8
5
10

rating >8(S2)

age
35.0
55.5
35.0
35.0

sname rating
yuppy 9
rusty 10

sname,rating( rating >8(S2))

Union and Set-Difference


Both of these operations take two input
relations, which must be union-compatible:
Same number of fields.
`Corresponding fields have the same type.

For which, if any, is duplicate elimination


required?

Union
sid sname rating age

sid sname rating age

22
31
58

22
31
58
44
28

dustin
lubber
rusty

7
8
10

45.0
55.5
35.0

S1

sid
28
31
44
58

sname rating age


yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2

dustin
lubber
rusty
guppy
yuppy

7
8
10
5
9

S1 S2

45.0
55.5
35.0
35.0
35.0

Set Difference
sid sname rating age

sid sname rating age

22
31
58

22

dustin
lubber
rusty

7
8
10

45.0
55.5
35.0

dustin 7

45.0

S1 S2

S1

sid
28
31
44
58

sname rating age


yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2

sid sname rating age


28 yuppy
9
35.0
44 guppy
5
35.0

S2 S1

Cross-Product
S1 R1: Each row of S1 paired with each row of
R1.
Q: How many rows in the result?
Result schema has one field per field of S1 and
R1, with field names `inherited if possible.
May have a naming conflict: Both S1 and R1 have a
field with the same name.
In this case, can use the renaming operator:

(C(1 sid1, 5 sid2), S1 R1)

Cross Product Example


sid
22
31
58

sid bid
day
22 101 10/10/96
58 103 11/12/96
R1

sname rating age


dustin
7
45.0
lubber
8
55.5
rusty
10 35.0
S1

(sid) sname rating age (sid) bid day

S1 x R1 =

22 dustin

45.0

22 101 10/10/96

22 dustin

45.0

58 103 11/12/96

31 lubber

55.5

22 101 10/10/96

31 lubber

55.5

58 103 11/12/96

58 rusty

10

35.0

22 101 10/10/96

58 rusty

10

35.0

58 103 11/12/96

Compound Operator: Intersection


In addition to the 5 basic operators, there are
several additional Compound Operators
These add no computational power to the language, but
are useful shorthands.
Can be expressed solely with the basic ops.

Intersection takes two input relations, which must


be union-compatible.
Q: How to express it using basic operators?

R S = R (R S)

Intersection
sid sname rating age
22
31
58

dustin
lubber
rusty

7
8
10

45.0
55.5
35.0

S1

sid
28
31
44
58

sname rating age


yuppy
9
35.0
lubber
8
55.5
guppy
5
35.0
rusty
10 35.0
S2

sid sname rating age


31 lubber 8
55.5
58 rusty 10
35.0

S1S2

Compound Operator: Join


Joins are compound operators involving cross
product, selection, and (sometimes) projection.
Most common type of join is a natural join (often
just called join). R
S conceptually is:
Compute R S
Select rows where attributes that appear in both
relations have equal values
Project all unique atttributes and one copy of each of
the common ones.

Note: Usually done much more efficiently than this.

Natural Join Example


sid bid
day
22 101 10/10/96
58 103 11/12/96
R1

S1

sid
22
31
58

sname rating age


dustin
7
45.0
lubber
8
55.5
rusty
10 35.0
S1

R1 =
sid
22
58

sname rating age bid day


dustin 7
45.0 101 10/10/96
rusty 10
35.0 103 11/12/96

Other Types of Joins


Condition Join (or theta-join):

R ><c S c R S

(sid) sname rating age (sid) bid day


22
dustin 7
45.0 58
103 11/12/96
31
lubber 8
55.5 58
103 11/12/96

S1><

R1

Result schema same as that


of cross-product.
S1.sid
R1.sid
May have fewer tuples than cross-product.

Equi-Join: Special case: condition c contains only conjunction of


equalities.

Examples

sid bid
day
Reserves
22 101 10/10/96
58 103 11/12/96

sid
Sailors 22
31
58
Boats

bid
101
102
103
104

bname
Interlake
Interlake
Clipper
Marine

sname rating age


dustin
7
45.0
lubber
8
55.5
rusty
10 35.0

color
Blue
Red
Green
Red

Find names of sailors whove reserved boat


#103
Solution 1:
Solution 2:

sname ((
sname (

bid =103

Re serves) >< Sailors)

bid =103

(Re serves >< Sailors))

Find names of sailors whove reserved a red


boat
Information about boat color only
available in Boats; so need an extra join:

sname ((

color =' red '

Boats) >< Re serves >< Sailors)

A more efficient solution:

sname ( ((
Boats) >< Re s) >< Sailors)
sid bid color =' red '

A query optimizer can find this given the first solut

Find sailors whove reserved a red boat or a


green boat
Can identify all red or green boats, then
find sailors whove reserved one of these
boats:

(Tempboats , (

color =' red ' color =' green '

Boats))

sname(Tempboats >< Re serves >< Sailors)

Find sailors whove reserved a red and a green


boat

Find sailors whove reserved a red and a green


boat
Cut-and-paste previous slide?

(Tempboats,(

color='red 'color='green'

Boats))

sname(Tempboats >< Re serves >< Sailors)

Find sailors whove reserved a red and a green


boat

Previous approach wont work! Must


identify sailors whove reserved red boats,
sailors whove reserved green boats, then
find the intersection (note that sid is a key
for Sailors):
(Tempred
,
((
Boats) >< Re serves))

sid

(Tempgreen ,

sid

color =' red '

((

color =' green'

Boats) >< Re serves))

sname((Tempred Tempgreen ) >< Sailors)

Summary
Relational Algebra: a small set of
operators mapping relations to relations
Operational, in the sense that you specify
the explicit order of operations
A closed set of operators! Can mix and
match.
Basic ops include: , , , ,
Important compound ops: ,

You might also like