Lecture 10-Relational Algebra

Download as pdf or txt
Download as pdf or txt
You are on page 1of 42

CSE 324 - Database Systems

Lecture 10: Relational Algebra

September 1, 2022 CSE 324 1


Objectives
• Relational Algebra Overview
• Unary Relational Operations
• Relational Algebra Operations From Set Theory
• Binary Relational Operations
• Additional Relational Operations
• Examples of Queries in Relational Algebra

September 1, 2022 CSE 324 2


Relational Algebra Overview
• Relational algebra is the basic set of operations for
the relational model
• These operations enable a user to specify basic
retrieval requests (or queries)
• The result of an operation is a new relation, which
may have been formed from one or more input
relations
– This property makes the algebra “closed” (all objects in
relational algebra are relations)

September 1, 2022 CSE 324 3


Relational Algebra Overview
• Relational Algebra consists of several groups of operations
– Unary Relational Operations
• SELECT ( s (sigma))
• PROJECT ( p (pi))
• RENAME ( r (rho))
– Relational Algebra Operations From Set Theory
• UNION ( È ), INTERSECTION ( Ç ), DIFFERENCE or MINUS ( – )
• CARTESIAN PRODUCT ( x )
– Binary Relational Operations
• JOIN ( ) (several variations)
• DIVISION ( ÷ )
– Additional Relational Operations
• AGGREGATE FUNCTIONS
• OUTER JOIN (several variations)
• OUTER UNION

September 1, 2022 CSE 324 4


All examples discussed below refer to the COMPANY database shown here

September 1, 2022 CSE 324 5


The following query results refer to this database state

Next slide…
The following query results refer to this database state
Unary Relational Operations: SELECT
• SELECT operation (denoted by s (sigma)) is used to select a subset of the
tuples from a relation R based on a selection condition.

s <selection condition>(R)
• The <selection condition> acts as a filter:
Tuples satisfying the condition are selected whereas the other tuples are
discarded (filtered out)
• Examples:
– Select the EMPLOYEE tuples whose department number is 4:
s DNO = 4 (EMPLOYEE)
– Select the employee tuples whose salary is greater than $30,000:
s SALARY > 30,000 (EMPLOYEE)

September 1, 2022 CSE 324 8


Unary Relational Operations: PROJECT
• PROJECT Operation (denoted by p (pi)) keeps certain columns (attributes)
from a relation R and discards the other columns.
p<attribute list>(R)
• PROJECT creates a vertical partitioning:
The list of specified columns (attributes) is kept in each tuple, while the
other attributes are discarded
• The project operation removes any duplicate tuples
– This is because the result of the project operation must be a set of tuples
• Mathematical sets do not allow duplicate elements.
• If the list of attributes includes a key of R, then the number of tuples
in the result of PROJECT is equal to the number of tuples in R
• Example:
– List employees’ first and last names and salary:
pLNAME, FNAME,SALARY(EMPLOYEE)

September 1, 2022 CSE 324 9


Examples of applying SELECT and PROJECT
operations
Figure 8.1 Results of SELECT and PROJECT operations.
(a) σ(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000) (EMPLOYEE).
(b) πLname, Fname, Salary(EMPLOYEE).
(c) πSex, Salary(EMPLOYEE).

Slide 8- 10
Relational Algebra Expressions
• We may want to apply several relational algebra
operations one after the other
– Either we can write the operations as a single relational
algebra expression by nesting the operations, or
– We can apply one operation at a time and create
intermediate result relations.
• In the latter case, we must give names to the
relations that hold the intermediate results.

September 1, 2022 CSE 324 11


Single expression versus sequence of relational
operations (Example)
• To retrieve the first name, last name, and salary of all
employees who work in department number 5, we must apply
a select and a project operation
• We can write a single relational algebra expression as follows:
pFNAME, LNAME, SALARY(s DNO=5(EMPLOYEE))
• OR We can explicitly show the sequence of operations, giving
a name to each intermediate relation:
DEP5_EMPS ¬ s DNO=5(EMPLOYEE)
RESULT ¬ p FNAME, LNAME, SALARY (DEP5_EMPS)

September 1, 2022 CSE 324 12


Unary Relational Operations: RENAME
• RENAME operator (denoted by r (rho)) renames the attributes of a
relation or the relation name or both

– Useful when a query requires multiple operations


– Necessary in some cases (see JOIN operation later)
• Three forms:
1. rS(R) :
changes the relation name to S
2. r(B1, B2, …, Bn )(R) :
changes the column (attribute) names to B1, B2, …..Bn
3. rS (B1, B2, …, Bn )(R) :
changes both the relation name to S, and the column (attribute) names to B1, B2,
…..Bn

September 1, 2022 CSE 324 13


Relational Algebra Operations from
Set Theory
• UNION È
– The result of R È S is a relation that includes all tuples that are either
in R or in S or in both R and S
• INTERSECTION Ç
– The result of R Ç S is a relation that includes all tuples that are in both
R and S
• SET DIFFERENCE (MINUS or EXCEPT) –
– The result of R – S is a relation that includes all tuples that are in R but
not in S

September 1, 2022 CSE 324 14


Relational Algebra Operations from
Set Theory:
• Duplicate tuples are eliminated
• The operands R and S must be “type compatible”
– R and S must have same number of attributes
– Each pair of corresponding attributes must be type compatible (have
same or compatible domains) but the attribute names may differ!
• In case the attribute names of R and S are different, the attribute names in
the result will be the same as the attribute names in R (the first operand)

September 1, 2022 CSE 324 15


Figure 8.4 The set operations UNION, INTERSECTION, and MINUS.
(a) Two union-compatible relations.
(b) STUDENT ∪ INSTRUCTOR.
(c) STUDENT ∩ INSTRUCTOR.
(d) STUDENT – INSTRUCTOR.
(e) INSTRUCTOR – STUDENT.
CARTESIAN PRODUCT

• Combines tuples from two relations in a combinatorial fashion.


• Result of R(A1, A2, . . ., An) x S(B1, B2, . . ., Bm) is a relation Q with degree n +
m attributes:
Q(A1, A2, . . ., An, B1, B2, . . ., Bm)
• Q has one tuple for each combination of tuples—one from R and one from
S.
– Hence, if R has nR tuples (denoted as |R| = nR ), and S has nS tuples, then R x S
will have nR * nS tuples.
• The two operands R and S do NOT have to be "type compatible”

September 1, 2022 CSE 324 17


CARTESIAN PRODUCT (cont.)
• Generally, CROSS PRODUCT is not a meaningful operation
– Can become meaningful when followed by other operations
• Example (not meaningful):
– FEMALE_EMPS ¬ s SEX=’F’(EMPLOYEE)
– EMPNAMES ¬ p FNAME, LNAME, SSN (FEMALE_EMPS)
– EMP_DEPENDENTS ¬ EMPNAMES x DEPENDENT
• EMP_DEPENDENTS will contain every combination of EMPNAMES and DEPENDENT
– whether or not they are actually related

• To keep only combinations where the DEPENDENT is related to the


EMPLOYEE, we add a SELECT operation as follows
• Example (meaningful):
– FEMALE_EMPS ¬ s SEX=’F’(EMPLOYEE)
– EMPNAMES ¬ p FNAME, LNAME, SSN (FEMALE_EMPS)
– EMP_DEPENDENTS ¬ EMPNAMES x DEPENDENT
– ACTUAL_DEPS ¬ s SSN=ESSN(EMP_DEPENDENTS)
– RESULT ¬ p FNAME, LNAME, DEPENDENT_NAME (ACTUAL_DEPS)
• RESULT will now contain the name of female employees and their dependents
September 1, 2022 CSE 324 18
Figure 8.5 The CARTESIAN PRODUCT (CROSS PRODUCT) operation.

continued on next slide


Figure 8.5 (continued) The CARTESIAN PRODUCT (CROSS PRODUCT)
operation.

continued on next slide


Figure 8.5 (continued) The CARTESIAN PRODUCT (CROSS PRODUCT)
operation.
JOIN
• JOIN Operation (denoted by ) is a sequence of CARTESIAN
PRODECT followed by SELECT
R <join-condition> S
– The resulting relation state has one tuple for each combination of
tuples—r from R and s from S, but only if they satisfy the <join-
condition>
• This operation is very important for any relational database with
more than a single relation, because it allows us combine related
tuples from various relations

September 1, 2022 CSE 324 22


JOIN (cont.)

• Example: retrieving the name of the manager of each


department.
– To get the manager’s name, we need to combine each
DEPARTMENT tuple with the EMPLOYEE tuple whose SSN value
matches the MGRSSN value in the department tuple.
– We do this by using the join operation.
DEPT_MGR ¬ DEPARTMENT Mgr_SSN=SSN EMPLOYEE
– Mgr_SSN = SSN is the join condition
• Combines each department record with the employee who manages
the department
• This join condition can also be specified as
DEPARTMENT.Mgr_SSN = EMPLOYEE.SSN

September 1, 2022 CSE 324 23


Figure 8.6 Result of the JOIN operation
DEPT_MGR ← DEPARTMENT Mgr_ssn=Ssn EMPLOYEE

Slide 8- 24
JOIN Variations
• THETA-JOIN
– The general case of JOIN operation
R theta S
– The join condition is called theta
– Theta can be any general Boolean expression on the attributes of R and S; e.g.,
R.Ai <= S.Bj AND ( R.Ak = S.Bi OR R.Ap <> S.Bq )
• EQUIJOIN
– The JOINT operation where the only comparison operator used is =
R R.Aj = S.Bk S
– equality comparisons are commonly used to combine related tuples from
various relations.
– The JOIN seen in Example 8.6 is an EQUIJOIN:
DEPARTMENT Mgr_ssn=Ssn EMPLOYEE
Note that the names of the two join attributes need not be identical

September 1, 2022 CSE 324 25


JOIN Variations (cont.)
• Natural-JOIN (denoted by * )
– An EQUIJOIN in which the two join attributes, or each pair of corresponding
join attributes, have the same name in both relations
R*S
– Example: To apply a natural join on the DNUMBER attributes of DEPARTMENT
and DEPT_LOCATIONS, it is sufficient to write:
DEPT_LOCS ¬ DEPARTMENT * DEPT_LOCATIONS
• An implicit join condition is created based on the attribute DNUMBER:
DEPARTMENT.DNUMBER = DEPT_LOCATIONS.DNUMBER
– Another example: Q ¬ R(A,B,C,D) * S(C,D,E)
• The implicit join condition includes each pair of attributes with the same name,
“AND”ed together:
(R.C = S.C) AND (R.D = S.D)
• Result keeps only one attribute of each such pair:
Q(A,B,C,D,E)
September 1, 2022 CSE 324 26
Figure 8.7 Results of two natural join operations.
(a) proj_dept ← project * dept.
(b) dept_locs ← department * dept_locations.
DIVISION
• DIVISION Operation (÷) is applied to two relations as:
R(Z) ÷ S(X), where X is a subset of Z
– Let Y = Z - X (and hence Z = X È Y); i.e., Y is the set of attributes
of R that are not attributes of S.
– The result of DIVISION is a relation T(Y) that includes a tuple t if
tuples tR appear in R with tR [Y] = t, and with
tR [X] = ts for every tuple ts in S.

– For a tuple t to appear in the result T of the DIVISION, the values


in t must appear in R in combination with every tuple in S.
Somehow opposite to JOIN (or to CORSS PRODUCT in general)!

September 1, 2022 CSE 324 28


Figure 8.8 The DIVISION operation.
(a) Dividing SSN_PNOS by SMITH_PNOS.
(b) T ← R ÷ S.
Table 8.1 Operations of Relational Algebra

continued on next slide


Table 8.1 (continued) Operations of Relational Algebra
Additional Relational Operations:
Aggregate Functions and Grouping
• A type of request that cannot be expressed in the basic relational
algebra is to specify mathematical aggregate functions on
collections of values from the database.
• Examples of such functions include retrieving the average or total
salary of all employees or the total number of employee tuples.
– These functions are used in simple statistical queries that summarize
information from the database tuples.
• Common functions applied to collections of numeric values include
– SUM, AVERAGE, MAXIMUM, and MINIMUM.
• The COUNT function is used for counting tuples or values.

September 1, 2022 CSE 324 32


Aggregate Function Operation
• Use of the Aggregate Functional operation ℱ
– ℱMAX Salary (EMPLOYEE) retrieves the maximum salary value from
the EMPLOYEE relation
– ℱMIN Salary (EMPLOYEE) retrieves the minimum Salary value from
the EMPLOYEE relation
– ℱSUM Salary (EMPLOYEE) retrieves the sum of the Salary from the
EMPLOYEE relation
– ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE) computes the count
(number) of employees and their average salary
• Note: count just counts the number of rows, without removing
duplicates

September 1, 2022 CSE 324 33


Using Grouping with Aggregation
• The previous examples all summarized one or more attributes for a
set of tuples
– Maximum Salary or Count (number of) Ssn
• Grouping can be combined with Aggregate Functions
• Example: For each department, retrieve the DNO, COUNT SSN, and
AVERAGE SALARY
• A variation of aggregate operation ℱ allows:
– Grouping attribute placed to left of symbol
– Aggregate functions to right of symbol
– DNO ℱCOUNT SSN, AVERAGE Salary (EMPLOYEE)
• Above operation groups employees by DNO (department number)
and computes the count of employees and average salary per
department

September 1, 2022 CSE 324 34


Figure 8.10 The aggregate function operation.
(a) ρR(Dno, No_of_employees, Average_sal)(Dno ℱCOUNT Ssn, AVERAGE Salary (EMPLOYEE)).
(b) Dno ℱCOUNT Ssn, AVERAGE Salary(EMPLOYEE).
(C) ℱCOUNT Ssn, AVERAGE Salary(EMPLOYEE).
Additional Relational Operations (continued)

• The OUTER JOIN Operation


– In NATURAL JOIN and EQUIJOIN, tuples without a matching (or
related) tuple are eliminated from the join result
• Tuples with null in the join attributes are also eliminated
• This amounts to loss of information.
– A set of operations, called OUTER joins, can be used when we
want to keep all the tuples in R, or all those in S, or all those in
both relations in the result of the join, regardless of whether or
not they have matching tuples in the other relation.

September 1, 2022 CSE 324 36


Additional Relational Operations (continued)
• The left outer join operation keeps every tuple in the first or left relation
R in R S; if no matching tuple is found in S, then the attributes of S in
the join result are filled or “padded” with null values.
• A similar operation, right outer join, keeps every tuple in the second or
right relation S in the result of R S.
• A third operation, full outer join, denoted by keeps all tuples in
both the left and the right relations when no matching tuples are found,
padding them with null values as needed.

September 1, 2022 CSE 324 37


Figure 8.12 The result of a LEFT OUTER JOIN operation.

Slide 8- 38
Additional Relational Operations (continued)
• OUTER UNION Operations
– The outer union operation was developed to take the union of tuples
from two relations if the relations are not type compatible.

– This operation will take the union of tuples in two relations R(X, Y) and
S(X, Z) that are partially compatible, meaning that only some of their
attributes, say X, are type compatible.

– The attributes that are type compatible are represented only once in
the result, and those attributes that are not type compatible from
either relation are also kept in the result relation T(X, Y, Z).

September 1, 2022 CSE 324 39


Additional Relational Operations (continued)
• Example: An outer union can be applied to two relations
whose schemas are STUDENT(Name, SSN, Department,
Advisor) and INSTRUCTOR(Name, SSN, Department, Rank).
– Tuples from the two relations are matched based on having the
same combination of values of the shared attributes— Name, SSN,
Department.
– If a student is also an instructor, both Advisor and Rank will have a
value; otherwise, one of these two attributes will be null.
– The result relation STUDENT_OR_INSTRUCTOR will have the
following attributes:
STUDENT_OR_INSTRUCTOR (Name, SSN, Department, Advisor,
Rank)

September 1, 2022 CSE 324 40


Examples of Queries in Relational Algebra :
Procedural Form

n Q1: Retrieve the name and address of all employees who work for the
‘Research’ department.
RESEARCH_DEPT ¬ s DNAME=’Research’ (DEPARTMENT)
RESEARCH_EMPS ¬ (RESEARCH_DEPT DNUMBER= DNOEMPLOYEE EMPLOYEE)
RESULT ¬ p FNAME, LNAME, ADDRESS (RESEARCH_EMPS)

n Q6: Retrieve the names of employees who have no dependents.


ALL_EMPS ¬ p SSN(EMPLOYEE)
EMPS_WITH_DEPS(SSN) ¬ p ESSN(DEPENDENT)
EMPS_WITHOUT_DEPS ¬ (ALL_EMPS - EMPS_WITH_DEPS)
RESULT ¬ p LNAME, FNAME (EMPS_WITHOUT_DEPS * EMPLOYEE)

Slide 8- 41
Examples of Queries in Relational Algebra –
Single expressions

As a single expression, these queries become:


n Q1: Retrieve the name and address of all employees who work for the
‘Research’ department.
p Fname, Lname, Address (σDname= ‘Research’ (DEPARTMENT Dnumber=Dno (EMPLOYEE))

n Q6: Retrieve the names of employees who have no dependents.


p Lname, Fname((p Ssn (EMPLOYEE) − ρ Ssn (p Essn (DEPENDENT))) ∗ EMPLOYEE)

Slide 8- 42

You might also like