Lecture 77777

Download as pdf or txt
Download as pdf or txt
You are on page 1of 104

CSE221: Database Systems

Lecture : Relational Algebra and


relational calculus
Professor Shaker El-Sappagh
[email protected]
Spring 2024
Outline
• Relational algebra
• Unary Relational Operations
• SELECT (symbol:  (sigma))
• PROJECT (symbol:  (pi))
• RENAME (symbol:  (rho))
• Relational Algebra Operations From Set Theory
• UNION (  ), INTERSECTION (  ), DIFFERENCE (or MINUS, – )
• CARTESIAN PRODUCT ( x )
• Binary Relational Operations
• JOIN (several variations of JOIN exist)
• DIVISION
• Additional Relational Operations
• OUTER JOINS, OUTER UNION
• AGGREGATE FUNCTIONS (These compute summary of information: for example, SUM,
COUNT, AVG, MIN, MAX)
• Relational calculus
• Tuple relational calculus
• Domain relational calculus
Relational Query Languages
▪ Query languages (QLs) allow manipulating and retrieving
data from databases

▪ The relational model supports simple and powerful QLs:


▪ Strong formal foundation based on logic
▪ High amenability for effective optimizations

▪ Query Languages != programming languages!


▪ Know what not how.
▪ QLs are not intended to be used for complex calculations
▪ QLs support easy and efficient access to large datasets
Note that we speak only about query (i.e., no update,
insert, DDL, DCL, nor TML)
Formal Relational Query Languages
▪ There are two mathematical Query Languages which form the basis for
commercial languages (e.g. SQL)
▪ Relational Algebra
▪ Queries are composed of sequence of operators in specific order. The
order affects the execution strategy. So, procedural queries
▪ Each query describes a step-by-step procedure for computing the desired
answer
▪ Very useful for representing execution plans

▪ Relational Calculus: provides a higher-level declarative language for


specifying relational queries
▪ Queries are subsets of first-order logic
▪ Queries describe desired answers (What) without specifying how they will
be computed (How)
▪ A type of non-procedural (or declarative) formal query language
1. Relational Algebra
Relational Algebra
▪ Relational algebra is important for several reasons
▪ It provides a formal foundation for relational model operations.
▪ It is used as a basis for implementing and optimizing queries in
the query processing and optimization modules.
▪ Some of its concepts are incorporated into the SQL standard
query language for RDBMSs.
▪ Its operations can be divided into two groups:
▪ One group includes set operations from mathematical set theory
including UNION, INTERSECTION, SET DIFFERENCE, and
CARTESIAN PRODUCT.
▪ The other group consists of operations developed specifically for
relational databases including SELECT, PROJECT, and JOIN, etc.
▪ These operations can be divided into unary operations that operate
on single relation, and binary operations that work on two
relations.
Relational Algebra
Relational algebra defines a set of operations:
▪ Operators (with notations):
1.Selection ( ) Sigma
2. Projection (  ) Pi
3. Cross-product ( )

4. Set-difference ( )
5. Union ( ∪ )
6. Intersection ( ∩ )
7. Join (  )
8. Division ( ÷ )
9. Renaming ( ) Rho
• Operators take one or two relations as inputs and produce a
new relation as a result.
• Each operation returns a relation, hence, operations can
be composed! (i.e., Algebra is “closed”)
The Projection Operation
▪ Unary Projection:  att−list (R)
▪ “Project out” attributes that are NOT in att-list
▪ PROJECT creates a vertical partitioning
▪ The schema of the output relation contains ONLY the fields in att-list,
with the same names that they had in the input relation

▪ Example 1:
Input Relation:  sname,rating (S 2) Output Relation:

sid sname rating age sname rating


28 yuppy 9 35.0 yuppy 9
31 lubber 8 55.5 lubber 8
44 guppy 5 35.0 guppy 5
58 rusty 10 35.0 rusty 10
S2
The Projection Operation
▪ Example 2:  age(S2)
Input Relation: Output Relation:
sid
28
sname
yuppy
rating
9
age
35.0
age
31 lubber 8 55.5 35.0
44 guppy 5 35.0
58 rusty 10 35.0 55.5
S2

▪ The projection operator eliminates duplicates!


▪ Note: real DBMSs typically do not eliminate
duplicates unless explicitly asked for
Project Operation – Example
• Relation r:

A,C (r)
Project Operation
• Notation:  (r )
A1 , A2 ,, Ak

where A1, A2 are attribute names and r is a relation


name.
• The result is defined as the relation of k columns
obtained by erasing the columns that are not listed
• Duplicate rows removed from result, since relations are
sets
• Example: To eliminate the dept_name attribute of
instructor
ID, name, salary (instructor)

– PROJECT is not commutative


•  <list1> ( <list2> (R) ) =  <list1> (R) as long as <list2> contains the
attributes in <list1>
The Selection Operation
▪ Selection:  condition (R )
▪ Selects rows that satisfy the selection condition or predicate
The schema of the output relation is identical to the schema of the
input relation
▪ The selection condition acts as a filter
▪ Example:
 rating 8(S 2)
Input Relation: Output Relation:
sid sname rating age sid sname rating age
28 yuppy 9 35.0 28 yuppy 9 35.0
31 lubber 8 55.5 58 rusty 10 35.0
44 guppy 5 35.0
58 rusty 10 35.0
S2
Select Operation
• Notation:  p(r)
• p is called the selection predicate
• Defined as:

p(r) = {t | t  r and p(t)}

Where p is a formula in propositional calculus consisting of terms


connected by :  (and),  (or),  (not)
Each term is one of:
<attribute> op <attribute> or <constant>
where op is one of: =, , >, . <. 

• Example of selection:

 dept_name=“Physics”(instructor)
Select Operation – Example
Relation r

 A=B ^ D > 5 (r)


Unary Relational Operations: SELECT (continued)
• SELECT Operation Properties
– The SELECT operation  <selection condition>(R) produces a relation S that
has the same schema (same attributes) as R
– SELECT  is commutative:
•  <condition1>( < condition2> (R)) =  <condition2> ( < condition1> (R))
– Because of commutativity property, a cascade (sequence) of SELECT
operations may be applied in any order:
• <cond1>(<cond2> (<cond3> (R)) = <cond2> (<cond3> (<cond1> ( R)))
– A cascade of SELECT operations may be replaced by a single selection
with a conjunction of all the conditions:
• <cond1>(< cond2> (<cond3>(R)) =  <cond1> AND < cond2> AND < cond3>(R)))
– The number of tuples in the result of a SELECT is less than (or equal
to) the number of tuples in the input relation R
Operator Composition (1)
▪ The output relation can be the input for another relational
algebra operation! (Operator composition)

▪ Example:  sname,rating( rating 8(S2))


Input Relation: Intermediate Relation:
sid sname rating age sid sname rating age
28 yuppy 9 35.0 28 yuppy 9 35.0
31 lubber 8 55.5 58 rusty 10 35.0
44 guppy 5 35.0
58 rusty 10 35.0 Final Output Relation:
S2 sname rating
yuppy 9
rusty 10
Examples of applying SELECT and PROJECT operations

Example
Operator Composition (1)
Single expression versus sequence of relational operations
(Example)
• To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must
apply a select and a project operation
• We can write a single relational algebra expression as follows:
– FNAME, LNAME, SALARY( DNO=5(EMPLOYEE))
• OR We can explicitly show the sequence of operations, giving
a name to each intermediate relation:
– DEP5_EMPS   DNO=5(EMPLOYEE)
– RESULT   FNAME, LNAME, SALARY (DEP5_EMPS)
Examples of applying sequence of operations
Relational Algebra
operations from Set theory
UNION, INTERSECTION, and MINUS Operations
The Union Operation
▪ Union: R U S
▪ The two input relations must be union-compatible
▪ Same number of fields
▪ `Corresponding’ fields have the same names and type compatible.
▪ The output relation includes all tuples that occur “in either” R or S “or both”
▪ The schema of the output relation is identical to the schema of R
▪ Duplicate tuples are eliminated

S1 S 2 Output Relation:


▪ Example:
Input Relations: sid sname rating age
sid sname rating age 22 dustin 7 45.0
sid sname rating age
28 yuppy 9 35.0 31 lubber 8 55.5
22 dustin 7 45.0 31 lubber 8 55.5 58 rusty 10 35.0
31 lubber 8 55.5 44 guppy 5 35.0 44 guppy 5 35.0
58 rusty 10 35.0 58 rusty 10 35.0
28 yuppy 9 35.0
S1 S2
Union Operation – Example
• Relations r, s:

r  s:
Union Operation
• Notation: r  s
• Defined as:
r  s = {t | t  r or t  s}
• For r  s to be valid.
1. r, s must have the same arity (same number of
attributes)
2. The attribute domains must be compatible (example:
2nd column of r deals with the same type of values as does
the 2nd column of s)
• Example: to find all courses taught in the Fall 2009
semester, or in the Spring 2010 semester, or in both
course_id ( semester=“Fall” Λ year=2009 (section)) 
course_id ( semester=“Spring” Λ year=2010 (section))
Intersection Operation
▪ Intersection: 𝑹 ∩ 𝑺
▪ The two input relations must be union-compatible
▪ The output relation includes all tuples that occur “in both” R and S
▪ The schema of the output relation is identical to the schema of R

▪ Example: S1 S 2
Input Relations:
Output Relation:
sid sname rating age sid sname rating age sid sname rating age
28 yuppy 9 35.0
22 dustin 7 45.0 31 lubber 8 55.5
31 lubber 8 55.5
31 lubber 8 55.5 44 guppy 5 35.0 58 rusty 10 35.0
58 rusty 10 35.0 58 rusty 10 35.0

S1 S2
Intersection Operation
• Notation: r  s
• Defined as:
• r  s = { t | t  r and t  s }
• Assume:
– r, s have the same arity
– attributes of r and s are compatible
• Note: r  s = r – (r – s)
Intersection Operation – Example

• Relation r, s:

• rs
Set-Difference Operation
▪ Set-Difference: 𝑹 − 𝑺
▪ The two input relations must be union-compatible
▪ The output relation includes all tuples that occur in R “but not” in S
▪ The schema of the output relation is identical to the schema of R

▪ Example: S1− S 2
Input Relations: Output Relation:

sid sname rating age sid


28
sname
yuppy
rating
9
age
35.0
sid sname rating age
22 dustin 7 45.0 31 lubber 8 55.5 22 dustin 7 45.0
31 lubber 8 55.5 44 guppy 5 35.0
58 rusty 10 35.0 58 rusty 10 35.0

S1 S2
Set difference of two relations
• Relations r, s:

r – s:
Set Difference Operation
• Notation r – s
• Defined as:
r – s = {t | t  r and t  s}

• Set differences must be taken between compatible relations.


– r and s must have the same arity
– attribute domains of r and s must be compatible
• Example: to find all courses taught in the Fall 2009 semester, but not in
the Spring 2010 semester

course_id ( semester=“Fall” Λ year=2009 (section)) −

course_id ( semester=“Spring” Λ year=2010 (section))


Some properties of UNION, INTERSECT, and
DIFFERENCE
• Notice that both union and intersection are commutative
operations; that is
– R  S = S  R, and R  S = S  R
• Both union and intersection can be treated as n-ary
operations applicable to any number of relations as both are
associative operations; that is
– R  (S  T) = (R  S)  T
– (R  S)  T = R  (S  T)
• The minus operation is not commutative; that is, in general
– R–S≠S–R


▪Example:
The Cross-Product and Renaming
Operations
• Cross Product: 𝑹𝑿𝑺
▪ Each row of R is paired with each row of S
▪ The schema of the output relation concatenates S1’s and R1’s schemas
▪ Conflict: R and S might have the same field name
▪ Solution: Rename fields using the “Renaming Operator”
▪ 
Renaming: (R(F ), E)
Output Relation:
▪ Example: S1XR1 (sid) sname rating age (sid) bid day
Input Relations: 22 dustin 7 45.0 22 101 10/10/96
22 dustin 7 45.0 58 103 11/12/96
sid sname rating age sid bid day 31 lubber 8 55.5 22 101 10/10/96
22 dustin 7 45.0 31 lubber 8 55.5 58 103 11/12/96
22 101 10/10/96
31 lubber 8 55.5 58 rusty 10 35.0 22 101 10/10/96
58 rusty 10 35.0
58 103 11/12/96 58 rusty 10 35.0 58 103 11/12/96
R1
S1
Conflict: Both S1 and R1 have a field called sid
Cartesian-Product Operation – Example
Relations r, s:

r x s:
Cartesian-Product Operation
• Notation r x s
• Defined as:
r x s = {t q | t  r and q  s}

• Assume that attributes of r(R) and s(S) are


disjoint. (That is, R  S = ).
• If attributes of r(R) and s(S) are not disjoint,
then renaming must be used.
Rename Operation
• Allows us to name the results of relational-algebra
expressions.
• Allows us to refer to a relation by more than one name.
• Example:
 x (E)

returns the expression E under the name X


• If a relational-algebra expression E has arity n, then
 x ( A1 , A2 ,...,An ) ( E )
returns the result of expression E under the name X, and
with the attributes renamed to A1 , A2 , …., An .
Rename Operation (cont’d)
• The general RENAME operation  can be
expressed by any of the following forms:
– S (B1, B2, …, Bn )(R) changes both:
• the relation name to S, and
• the column (attribute) names to B1, B1, …..Bn
– S(R) changes:
• the relation name only to S
– (B1, B2, …, Bn )(R) changes:
• the column (attribute) names only to B1, B1, …..Bn
Rename Operation (cont’d)
• For convenience, we also use a shorthand for
renaming attributes in an intermediate relation:
– If we write:
• RESULT   FNAME, LNAME, SALARY (DEP5_EMPS)
• RESULT will have the same attribute names as
DEP5_EMPS (same attributes as EMPLOYEE)
• If we write:
• RESULT (F, M, L, S, B, A, SX, SAL, SU, DNO)
 RESULT (F.M.L.S.B,A,SX,SAL,SU, DNO)(DEP5_EMPS)
• The 10 attributes of DEP5_EMPS are renamed to F, M,
L, S, B, A, SX, SAL, SU, DNO, respectively
Note: the  symbol is an assignment operator
Rename Operation: Example Query
• Find the largest salary in the university
– Step 1: find instructor salaries that are less than some other
instructor salary (i.e. not maximum)
– using a copy of instructor under a new name d
• instructor.salary ( instructor.salary < d.salary
(instructor x d (instructor)))
– Step 2: Find the largest salary
• salary (instructor) –
instructor.salary ( instructor.salary < d.salary
(instructor x d (instructor)))
The Join Operation
▪ (Theta) Join : R c S = c(RS )
▪ The schema of the output relation is the same as that of cross-product
▪ It usually includes fewer tuples than cross-product

▪ Example: S1  R1
S1. sid  R1. sid

Input Relations: Output Relation:

sid sname rating age sid bid day (sid) sname rating age (sid) bid day
22 dustin 7 45.0 22 dustin 7 45.0 58 103 11/12/96
22 101 10/10/96 31 lubber 8 55.5 58 103 11/12/96
31 lubber 8 55.5
58 rusty 10 35.0 58 103 11/12/96
S1 R1

Will be redundant “if” the condition is S1.sid = R1.sid!


The Join Operation
▪ Equi-Join: R c S = c(RS )
▪ A special case of theta join where the condition c contains only equalities
▪ The schema of the output relation is the same as that of cross-product,
“but only one copy of the fields for which equality is specified”

▪ Natural Join: R S


▪ Equijoin on “all” common fields

▪ Example: S1 R1
S1.sid = R1.sid
Input Relations: Output Relation:
sid sname rating age sid bid day sid sname rating age bid day
22 dustin 7 45.0
22 101 10/10/96 22 dustin 7 45.0 101 10/10/96
31 lubber 8 55.5
58 103 11/12/96 58 rusty 10 35.0 103 11/12/96
58 rusty 10 35.0
S1 R1 ONLY one sid column!
The Join Operation
▪ Equi-Join: R c S = c(RS )
▪ A special case of theta join where the condition c contains only equalities
▪ The schema of the output relation is the same as that of cross-product,
“but only one copy of the fields for which equality is specified”

▪ Natural Join: R S


▪ Equijoin on “all” common fields

Natural Join
▪ Example: S1 R1
Input Relations: Output Relation:
sid sname rating age sid bid day sid sname rating age bid day
22 dustin 7 45.0
22 101 10/10/96 22 dustin 7 45.0 101 10/10/96
31 lubber 8 55.5
58 103 11/12/96 58 rusty 10 35.0 103 11/12/96
58 rusty 10 35.0
S1 R1 In this case, same as equi-join!
Natural-Join Operation
Notation: r s
• Let r and s be relations on schemas R and S respectively.
Then, r s is a relation on schema R x S obtained as
follows:
– Consider each pair of tuples tr from r and ts from s.
– If tr and ts have the same value on each of the attributes in R
 S, add a tuple t to the result, where
• t has the same value as tr on r
• t has the same value as ts on s
• Example:
R = (A, B, C, D)
S = (E, B, D)
– Result schema = (A, B, C, D, E)
– r s is defined as:
r.A, r.B, r.C, r.D, s.E (r.B = s.B  r.D = s.D (r x s))
Natural Join Example
• Relations r, s:

r s
Outer Join
• An extension of the join operation that avoids loss of
information.
• Computes the join and then adds tuples form one relation
that does not match tuples in the other relation to the result
of the join.
• Uses null values:
– null signifies that the value is unknown or does not exist
– All comparisons involving null are (roughly speaking) false by
definition.
• We shall study precise meaning of comparisons with nulls later
Outer Join – Example
• Relation instructor

ID name dept_name
10101 Srinivasan Comp. Sci.
12121 Wu Finance
15151 Mozart Music

Relation teaches

ID course_id
10101 CS-101
12121 FIN-201
76766 BIO-101
Outer Join – Example
• Join
instructor teaches
ID name dept_name course_id
10101 Srinivasan Comp. Sci. CS-101
12121 Wu Finance FIN-201

Left Outer Join


instructor teaches

ID name dept_name course_id


10101 Srinivasan Comp. Sci. CS-101
12121 Wu Finance FIN-201
15151 Mozart Music null
Outer Join – Example
Right Outer Join
instructor teaches

ID name dept_name course_id


10101 Srinivasan Comp. Sci. CS-101
12121 Wu Finance FIN-201
76766 null null BIO-101

Full Outer Join


instructor teaches

ID name dept_name course_id


10101 Srinivasan Comp. Sci. CS-101
12121 Wu Finance FIN-201
15151 Mozart Music null
76766 null null BIO-101
Notation for Query Trees
Additional Relational Operations: Aggregate
Functions and Grouping
• Examples of such functions include retrieving the average or total
salary of all employees or the total number of employee tuples.
– These functions are used in simple statistical queries that summarize
information from the database tuples.
• Common functions applied to collections of numeric values include
– SUM, AVERAGE, MAXIMUM, and MINIMUM.
• The COUNT function is used for counting tuples or values.
• General syntax is (using the script f operator)
Aggregate Function Operation
• Use of the Aggregate Functional operation ℱ
– ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value from
the EMPLOYEE relation
– ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value from
the EMPLOYEE relation
– ℱSUM Salary (EMPLOYEE) retrieves the sum of the Salary from the
EMPLOYEE relation
– ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count
(number) of employees and their average salary
• Note: count just counts the number of rows, without removing
duplicates
• Note: the NULL values are not considered in the aggregation.
• Relational Algebra is closed: So, result of applying an aggregate
function is a relation, not a scalar number—even if it has a single
value.
Using Grouping with Aggregation
• The previous examples all summarized one or more attributes
for a set of tuples
– Maximum Salary or Count (number of) Ssn
• Grouping can be combined with Aggregate Functions
• Example: For each department, retrieve the DNO, COUNT
SSN, and AVERAGE SALARY
• A variation of aggregate operation ℱ allows this:
– Grouping attribute placed to left of symbol
– Aggregate functions to right of symbol
– DNO ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE)
• Above operation groups employees by DNO (department
number) and computes the count of employees and average
salary per department
Using Grouping with Aggregation
Examples of Queries in Relational Algebra
Examples of Queries in Relational Algebra
Examples of Queries in Relational Algebra
Examples of Queries in Relational Algebra

As a single expression:
Examples of Queries in Relational Algebra
Tool for practice
• RelaX - relational algebra calculator (dbis-uibk.github.io)
2. Relational Calculus
2.1 Tuple Relational Calculus
2.2 Domain relational calculus
Relational Calculus (in General)
▪ It describes what we want to retrieve (not how).

▪ It is a nonprocedural language.

▪ It differs from relational algebra, where we must write a


sequence of operations to specify a retrieval request in a
particular order of applying the operations; thus,
relational algebra is a procedural way of stating a query.

▪ It has two equivalent flavors, ‘tuple’ and ‘domain’ calculus

▪ It is useful for proofs (query optimization)

▪ Expressive power of relational algebra and calculus is


identical.
Tuple Relational Calculus (TRC)
▪ RTC is a subset of ‘first order logic’:
A “formula” that describes t
{t | P (t )}

Give me tuples ‘t’, satisfying predicate ‘P’

▪ Examples:
▪ Find all students:
{t | t  STUDENT }
▪ Find all sailors with a rating above 7:

{t | t  Sailors  t.rating  7}
Syntax of RTC Queries
▪ The allowed symbols:
, , , 
, , =, , , ,
(, ), 

▪ Quantifiers:
, 
Predicate Calculus Formula
1. Set of attributes and constants
2. Set of comparison operators: (e.g., , , =, , , )
3. Set of connectives: and (), or (v)‚ not ()
4. Implication (): x  y, if x if true, then y is true
x  y  x v y
5. Set of quantifiers:
 t  r (Q (t ))  ”there exists” a tuple in t in relation r
such that predicate Q (t ) is true
t  r (Q (t ))  Q is true “for all” tuples t in relation r
Syntax of TRC Queries

▪ Atomic “formulas”: t  TABLE


t.attr op const
t.attr op s.attr
Where op is an operator in the set {<, >, =, ≤, ≥, ≠}
Syntax of TRC Queries
▪ A “formula” is made up of one or more atoms connected via the
logical operators AND, OR, and NOT and is defined recursively by
Rules 1 and 2 as follows:
▪ Rule 1: Any atomic formula
▪ Rule 2: If P1 and P2 are formulas, so are
P1; P 2; P1  P 2; P1  P 2; P1  P 2

▪ If P(s) is a formula, so are ∃𝑠(𝑃(𝑠)), ∀𝑠(𝑃(𝑠))


Syntax of TRC Queries
▪ To retrieve only some of the attributes—say, the first and last
names—we write

• We need to specify the following information in a tuple


relational calculus expression:
– For each tuple variable t, the range relation R of t.
– Condition(s) to select particular combinations of tuples.
– A set of attributes to be retrieved, the requested attributes.
• Ex: Retrieve the birth date and address of the employee (or
employees) whose name is John B. Smith:
Basic Rules
▪ Reminders:
▪ De Morgan: P1  P 2  (P1  P 2)
▪ Implication: P1  P 2  P1  P 2
▪ Double Negation:
s  TABLE ( P( s ))   s  TABLE (P( s ))

‘every human is mortal : no human is immortal’


Example
Example
Example
Example
Example
A Mini University Database

STUDENT CLASS
Ssn Name Address c-id c-name units
123 smith main str 15-413 s.e. 2
234 jones forbes ave 15-412 o.s. 2

TAKES
SSN c-id grade
123 15-413 A
234 15-413 B
Examples
▪ Find all student records

{t | t  STUDENT }

output
tuple of type ‘STUDENT’
Examples
▪ Find the student record with ssn=123
Examples
▪ Find the student record with ssn=123

{t | t  STUDENT  t.ssn = 123}

This is equivalent to the ‘Selection’ operator in Relational Algebra!


Examples
▪ Find the name of the student with ssn=123

{t | t  STUDENT  t.ssn = 123}

Will this work?


Examples
▪ Find the name of the student with ssn=123

{t | s  STUDENT ( s.ssn = 123 


t.name = s.name)}

‘t’ has only one column

This is equivalent to the ‘Projection’ operator in Relational Algebra!


Examples
▪ Get records of both part time and full time students*

{t | t  FT _ STUDENT 
t  PT _ STUDENT}

This is equivalent to the ‘Union’ operator in Relational Algebra!

* Assume we maintain tables for PT_STUDENT and FT_STUDENT in our Mini University DB
Examples
▪ Find students that are not staff*

{t | t  STUDENT 
t  STAFF}
This is equivalent to the ‘Difference’ operator in Relational Algebra!

* Assume we maintain a table for STAFF in our Mini University DB and that STUDENT and
STAFF are union-compatible
Cartesian Product: A Reminder
▪ Assume MALE and FEMALE dog tables as follows:

M.Name F.Name
MALE FEMALE spike lassie
name x name = spike shiba
spike lassie spot lassie
spot shiba spot shiba

This gives all possible couples!


Examples (Cont’d)
▪ Find all the pairs of (male, female) dogs

{t | m  MALE 
f  FEMALE
(t.m − name = m.name 
t. f − name = f .name)}

This is equivalent to the ‘Cartesian Product’ operator in


Relational Algebra!
More Examples
▪ Find the names of students taking 15-415
STUDENT CLASS
Ssn Name Address c-id c-name units
123 smith main str 15-413 s.e. 2
234 jones forbes ave 15-412 o.s. 2

TAKES
2-way Join! SSN c-id grade
123 15-413 A
234 15-413 B
More Examples
▪ Find the names of students taking 15-415

{t | s  STUDENT
 e  TAKES ( s.ssn = e.ssn 
t.name = s.name 
e.c − id = 15 − 415)}
More Examples
▪ Find the names of students taking 15-415

{t | s  STUDENT
 e  TAKES ( s.ssn = e.ssn  join

t.name = s.name  projection


e.c − id = 15 − 415)} selection
More Examples
▪ Find the names of students taking a 2-unit course
STUDENT CLASS
Ssn Name Address c-id c-name units
123 smith main str 15-413 s.e. 2
234 jones forbes ave 15-412 o.s. 2

TAKES
SSN c-id grade
3-way Join!
123 15-413 A
234 15-413 B
More Examples
• Find the names of students taking a 2-unit course

{t | s  STUDENT  e  TAKES
c  CLASS ( s.ssn = e.ssn  join
e.c − id = c.c − id 
t.name = s.name  projection
c.units = 2)}
selection
Example Queries
• Find the ID, name, dept_name, salary for
instructors whose salary is greater than $80,000
{t | t  instructor  t [salary ]  80000}

As in the previous query, but output only the ID attribute value

{t |  s  instructor (t [ID ] = s [ID ]  s [salary ]  80000)}

Notice that a relation on schema (ID) is implicitly defined by


the query
Example Queries
• Find the names of all instructors whose department
is in the Watson building
{t | s  instructor (t [name ] = s [name ]
 u  department (u [dept_name ] = s[dept_name] “
 u [building] = “Watson” ))}

Find the set of all courses taught in the Fall 2009 semester, or in
the Spring 2010 semester, or both

{t | s  section (t [course_id ] = s [course_id ] 


s [semester] = “Fall”  s [year] = 2009
v u  section (t [course_id ] = u [course_id ] 
u [semester] = “Spring”  u [year] = 2010)}
Example Queries
Find the set of all courses taught in the Fall 2009 semester, and in
the Spring 2010 semester

{t | s  section (t [course_id ] = s [course_id ] 


s [semester] = “Fall”  s [year] = 2009
 u  section (t [course_id ] = u [course_id ] 
u [semester] = “Spring”  u [year] = 2010)}

Find the set of all courses taught in the Fall 2009 semester, but not in
the Spring 2010 semester

{t | s  section (t [course_id ] = s [course_id ] 


s [semester] = “Fall”  s [year] = 2009
  u  section (t [course_id ] = u [course_id ] 
u [semester] = “Spring”  u [year] = 2010)}
Universal Quantification
• Find all students who have taken all courses offered in the Biology
department
– {t |  r  student (t [ID] = r [ID]) 
( u  course (u [dept_name]=“Biology” 
 s  takes (t [ID] = s [ID ] 
s [course_id] = u [course_id]))}
– Note that without the existential quantification on student, the above
query would be unsafe if the Biology department has not offered any
courses.
Quantification example
• Find all students who have taken all courses offered in the Biology
department
– {t |  r  student (t [ID] = r [ID]) 
( u  course (u [dept_name]=“Biology” 
 s  takes (t [ID] = s [ID ] 
s [course_id] = u [course_id]))}
– Note that without the existential quantification on student, the above
query would be unsafe if the Biology department has not offered any
courses.
Transforming the Universal and Existential Quantifiers
• It is possible to transform a universal quantifier into an existential quantifier, and vice
versa, to get an equivalent expression.
– One general transformation can be described informally as follows:
– Transform one type of quantifier into the other with negation (preceded by NOT);
– AND and OR replace one another;
– a negated formula becomes unnegated;
– and an unnegated formula becomes negated.

• Some special cases of this transformation can be stated as follows:


2.1 Domain Relational Calculus
Domain Relational Calculus
• Domain calculus differs from tuple calculus in the type of
variables used in formulas:
– Rather than having variables range over tuples, the
variables range over single values from domains of
attributes.
• To form a relation of degree n for a query result, we must have
n of these domain variables— one for each attribute.
• A nonprocedural query language equivalent in power to the
tuple relational calculus
• Each query is an expression of the form:
{  x1, x2, …, xn  | P (x1, x2, …, xn)}

– x1, x2, …, xn represent domain variables


– P represents a formula similar to that of the predicate
calculus
The Domain Relational Calculus (continued)

• An expression of the domain calculus is of the


form
{ x1, x2, . . ., xn |
COND(x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m)}
– where x1, x2, . . ., xn, xn+1, xn+2, . . ., xn+m are domain
variables that range over domains (of attributes)
– and COND is a condition or formula of the domain
relational calculus.
Example Queries
• Find the ID, name, dept_name, salary for instructors whose
salary is greater than $80,000
– {< i, n, d, s> | < i, n, d, s>  instructor  s  80000}
• As in the previous query, but output only the ID attribute value
– {< i> | < i, n, d, s>  instructor  s  80000}
• Find the names of all instructors whose department is in the
Watson building
{< n > |  i, d, s (< i, n, d, s >  instructor
  b, a (< d, b, a>  department  b = “Watson” ))}
Example Queries
Find the set of all courses taught in the Fall 2009 semester, or in
the Spring 2010 semester, or both
{<c> |  a, s, y, b, r, t ( <c, a, s, y, b, t >  section 
s = “Fall”  y = 2009 )
v  a, s, y, b, r, t ( <c, a, s, y, b, t >  section ] 
s = “Spring”  y = 2010)}
This case can also be written as
{<c> |  a, s, y, b, r, t ( <c, a, s, y, b, t >  section 
( (s = “Fall”  y = 2009 ) v (s = “Spring”  y = 2010))}
Find the set of all courses taught in the Fall 2009 semester, and in
the Spring 2010 semester

{<c> |  a, s, y, b, r, t ( <c, a, s, y, b, t >  section 


s = “Fall”  y = 2009 )
  a, s, y, b, r, t ( <c, a, s, y, b, t >  section ] 
s = “Spring”  y = 2010)}
Safety of Expressions
The expression:
{  x1, x2, …, xn  | P (x1, x2, …, xn )}

is safe if all of the following hold:


1. All values that appear in tuples of the expression are
values from dom (P ) (that is, the values appear either in
P or in a tuple of a relation mentioned in P ).
2. For every “there exists” subformula of the form  x (P1(x
)), the subformula is true if and only if there is a value of
x in dom (P1)such that P1(x ) is true.
3. For every “for all” subformula of the form x (P1 (x )), the
subformula is true if and only if P1(x ) is true for all values
x from dom (P1).
Universal Quantification
• Find all students who have taken all courses offered in the Biology
department
– {< i > |  n, d, tc ( < i, n, d, tc >  student 
( ci, ti, dn, cr ( < ci, ti, dn, cr >  course  dn =“Biology”
  si, se, y, g ( <i, ci, si, se, y, g>  takes ))}
– Note that without the existential quantification on student, the
above query would be unsafe if the Biology department has not
offered any courses.
Thank you

You might also like