Unit 3 Relational Data Model

Download as pdf or txt
Download as pdf or txt
You are on page 1of 14

💻

Unit 3: Relational Data Model


Detailed Notes on Unary Relational Operations: SELECT and
PROJECT

8.1 Unary Relational Operations: SELECT and PROJECT

8.1.1 The SELECT Operation


Definition and Purpose:

The SELECT operation is used to filter and choose a subset of tuples from a
relation that satisfies a specific selection condition.

It acts as a filter, retaining only those tuples that meet the condition and
excluding those that do not.

Visualization:

SELECT can be visualized as a horizontal partitioning of the relation into


two sets:

Tuples that satisfy the condition (selected).

Tuples that do not satisfy the condition (filtered out).

Syntax and Example:

The SELECT operation is denoted by:


σ<selection condition>(R)

Unit 3: Relational Data Model 1


σ (sigma) is the SELECT operator.

<selection condition> is a Boolean expression involving the attributes of


relation R.

Example:
σDno=4(EMPLOYEE)
σSalary>30000(EMPLOYEE)

Selection Condition:

The Boolean expression for the selection condition can be:


<attribute name> <comparison op> <constant value>
<attribute name> <comparison op> <attribute name>

<comparison op> includes {=, <, ≤, >, ≥, ≠}.

Conditions can be combined using Boolean operators: AND, OR, and


NOT.

Example combining conditions:


σ(Dno=4 AND Salary>25000) OR (Dno=5 AND Salary>30000)(EMPLOYEE)

Application and Result:

Each tuple in relation R is evaluated against the selection condition.

If the condition evaluates to TRUE, the tuple is selected.

All selected tuples form the result of the SELECT operation.

Boolean conditions are interpreted as:

(cond1 AND cond2) is TRUE if both are TRUE.

(cond1 OR cond2) is TRUE if either is TRUE.

(NOT cond) is TRUE if cond is FALSE.

Properties:

Unary Operator: SELECT operates on a single relation.

Commutativity: The order of SELECT operations does not affect the result:
σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))

Cascading: Multiple SELECT operations can be combined into one:


σ<cond1>(σ<cond2>(...σ<condn>(R)...) = σ<cond1> AND <cond2> AND ...
AND <condn>(R)

Unit 3: Relational Data Model 2


Example and Correspondence to SQL:

A SELECT operation in relational algebra corresponds to the WHERE clause


in SQL.

Example:
σDno=4 AND Salary>25000(EMPLOYEE)

SQL equivalent:
SELECT * FROM EMPLOYEE WHERE Dno=4 AND Salary>25000

Figures Explanation:

Figure 8.1(a): Result of σ(Dno=4 AND Salary>25000) OR (Dno=5 AND


Salary>30000)(EMPLOYEE).

Figure 8.1(b): Result of projection πLname, Fname, Salary(EMPLOYEE).

Figure 8.1(c): Result of projection πSex, Salary(EMPLOYEE).

Key Points to Remember:

SELECT Operation: Filters tuples based on a condition.

Boolean Operators: AND, OR, NOT are used to form complex conditions.

Commutativity and Cascading: SELECT operations can be reordered or


combined.

SQL Correspondence: SELECT operation in relational algebra is akin to the


WHERE clause in SQL.

This detailed breakdown covers the essential aspects of the SELECT operation
in relational algebra, its syntax, usage, and how it translates to SQL queries.

Detailed Notes on Unary Relational Operations: SELECT and


PROJECT

8.1.2 The PROJECT Operation


Definition and Purpose:

The PROJECT operation selects certain columns (attributes) from a relation


(table) and discards the other columns.

Used when interested in only specific attributes of a relation.

Visualized as a vertical partition of the relation into two: one with needed
columns (result of the operation) and the other with discarded columns.

Unit 3: Relational Data Model 3


Syntax and Example:

The PROJECT operation is denoted by:


π<attribute list>(R)

π (pi) represents the PROJECT operation.

<attribute list> is the sublist of desired attributes from the relation R.

Example:
πLname, Fname, Salary(EMPLOYEE)

This lists each employee’s last name, first name, and salary.

Resulting Relation:

The resulting relation contains only the attributes specified in <attribute


list>.

The degree (number of attributes) of the result equals the number of


attributes in <attribute list>.

Duplicate Elimination:

The PROJECT operation removes any duplicate tuples, ensuring the result
is a set of distinct tuples.

Example:
πSex, Salary(EMPLOYEE)

The result would contain distinct combinations of Sex and Salary,


removing duplicates.

This ensures the result is a valid relation.

Properties:

The number of tuples in the resulting relation is always less than or equal to
the number of tuples in R.

If the projection list is a superkey of R, the resulting relation has the same
number of tuples as R.

Commutativity does not hold for PROJECT operations.

SQL Correspondence:

The PROJECT attribute list corresponds to the SELECT clause in SQL.

Unit 3: Relational Data Model 4


Example:
πSex, Salary(EMPLOYEE)

SQL equivalent:
SELECT DISTINCT Sex, Salary FROM EMPLOYEE

Removing DISTINCT from SQL allows duplicates, resulting in a multiset


or bag of tuples.

8.1.3 Sequences of Operations and the RENAME Operation


Sequences of Operations:

For complex queries, multiple relational algebra operations are often


applied in sequence.

Operations can be nested in a single relational algebra expression or


applied one at a time with intermediate results.

Example of Sequence:

To retrieve the first name, last name, and salary of all employees in
department 5:

In-line expression: πFname, Lname, Salary(σDno=5(EMPLOYEE))

Sequence of operations with intermediate results:


DEP5_EMPS ← σDno=5(EMPLOYEE)
RESULT ← πFname, Lname, Salary(DEP5_EMPS)

Renaming Intermediate Relations:

Intermediate results can be named for clarity and simplicity.

Attributes can be renamed during operations:


TEMP ← σDno=5(EMPLOYEE)
R(First_name, Last_name, Salary) ← πFname, Lname, Salary(TEMP)

Formal RENAME Operation:

The RENAME operation can rename the relation name, attribute names, or
both.

Denoted by:
ρS(B1, B2, ... , Bn)(R)
ρS(R)
ρ(B1, B2, ... , Bn)(R)

Unit 3: Relational Data Model 5


ρ (rho) represents the RENAME operator.

S is the new relation name.

B1, B2, ..., Bn are the new attribute names.

SQL Correspondence:

Complex relational algebra expressions are represented by single SQL


queries.

Renaming in SQL is done using aliasing with AS.

Example:
SELECT E.Fname AS First_name, E.Lname AS Last_name, E.Salary AS
Salary
FROM EMPLOYEE AS E
WHERE E.Dno=5

These notes cover the essential aspects of the PROJECT operation, its syntax,
usage, the concept of sequences of operations, and the RENAME operation in
relational algebra, including how they translate to SQL queries.

8.2 Relational Algebra Operations from Set Theory

8.2.1 The UNION, INTERSECTION, and MINUS Operations


The operations of UNION, INTERSECTION, and MINUS (also known as SET
DIFFERENCE or EXCEPT) are standard mathematical set operations that are
adapted for relational databases. These operations are binary, meaning they
are applied to two relations. For these operations to be valid, the two relations
must be union compatible (or type compatible). This means both relations
must have the same degree (number of attributes) and corresponding attributes
must have the same domains.


1. UNION (R S): This operation combines all tuples from both relations,
eliminating duplicates. The result is a relation containing all unique tuples
that are in either relation or both.

2. INTERSECTION (R ∩ S): This operation returns a relation containing only the


tuples that are present in both relations.

3. SET DIFFERENCE (R - S): This operation returns a relation containing the


tuples that are in the first relation but not in the second.

Unit 3: Relational Data Model 6


These operations are illustrated in Figure 8.4, where the relations STUDENT
and INSTRUCTOR are union compatible.

UNION combines the tuples from both STUDENT and INSTRUCTOR,


eliminating duplicates.

INTERSECTION includes only those tuples that are common to both


relations.

SET DIFFERENCE (STUDENT - INSTRUCTOR) shows tuples in STUDENT


that are not in INSTRUCTOR.

SET DIFFERENCE (INSTRUCTOR - STUDENT) shows tuples in


INSTRUCTOR that are not in STUDENT.

These operations are commutative and associative, allowing for combinations


and simplifications in complex queries. However, the SET DIFFERENCE
operation is not commutative (i.e., R - S ≠ S - R).
In SQL, these operations correspond to UNION, INTERSECT, and EXCEPT
respectively. SQL also supports multiset versions of these operations (UNION
ALL, INTERSECT ALL, and EXCEPT ALL) which do not eliminate duplicates.

8.2.2 The CARTESIAN PRODUCT (CROSS PRODUCT) Operation


The CARTESIAN PRODUCT (or CROSS PRODUCT) is a binary operation that
does not require the relations to be union compatible. It combines every tuple
from one relation with every tuple from another, resulting in a new relation. The
new relation has a degree equal to the sum of the degrees of the two original
relations.
For example, if R has attributes A1, A2, ..., An and S has attributes B1, B2, ...,
Bm, the resulting relation Q from R × S will have attributes A1, A2, ..., An, B1, B2,
..., Bm.
The CARTESIAN PRODUCT operation is rarely useful by itself but is often used
in conjunction with a selection to filter meaningful tuples. This combination is
common enough to warrant the creation of a specific operation called JOIN,
which we will discuss next.

Example: Combining Female Employees with Their Dependents


To retrieve the names of dependents for female employees, we can use the
CARTESIAN PRODUCT followed by a selection:

Unit 3: Relational Data Model 7


1. Select female employees:
FEMALE_EMPS ← σSex = 'F'(EMPLOYEE)

2. Project relevant attributes:


EMPNAMES ← πFname, Lname, Ssn(FEMALE_EMPS)

3. Compute the CARTESIAN PRODUCT:


EMP_DEPENDENTS ← EMPNAMES × DEPENDENT

4. Select tuples where the employee's Ssn matches the dependent's Essn:
ACTUAL_DEPENDENTS ← σSsn = Essn(EMP_DEPENDENTS)

5. Project the final result:


RESULT ← πFname, Lname, Dependent_name(ACTUAL_DEPENDENTS)

This sequence of operations is shown in Figure 8.5.

Summary
These set theory operations (UNION, INTERSECTION, and MINUS) and the
CARTESIAN PRODUCT are fundamental

8.3 Binary Relational Operations: JOIN and DIVISION

8.3.1 The JOIN Operation


Definition and Importance:
The JOIN operation, denoted by ⨝, is used to combine related tuples from two
relations into single “longer” tuples. It is crucial for processing relationships
among relations in a relational database with more than one relation.
Example:
To retrieve the name of the manager of each department, combine department
tuples with employee tuples where Mgr_ssn in the department matches Ssn in
the employee.
Operations:

DEPT_MGR ← DEPARTMENT ⨝ Mgr_ssn=Ssn EMPLOYEE

RESULT ← π Dname, Lname, Fname (DEPT_MGR)

Referential Integrity:
Mgr_ssn is a foreign key in the DEPARTMENT relation referencing Ssn in the
EMPLOYEE relation, ensuring matching tuples in the EMPLOYEE relation.

Unit 3: Relational Data Model 8


JOIN as Cartesian Product and Selection:
JOIN can be seen as a CARTESIAN PRODUCT followed by a SELECT operation.

Example sequence:

EMP_DEPENDENTS ← EMPNAMES × DEPENDENT

ACTUAL_DEPENDENTS ← σ Ssn=Essn (EMP_DEPENDENTS)

Replaced by:

ACTUAL_DEPENDENTS ← EMPNAMES ⨝ Ssn=Essn DEPENDENT

General Form of JOIN:


On two relations R(A1, A2, … , An) and S(B1, B2, … , Bm):

R <join condition> S

Resulting relation Q has n + m attributes. Only combinations satisfying the join


condition appear in the result.
Join Condition:
Specified on attributes from R and S, evaluated for each tuple combination.

Form: <condition> AND <condition> AND … AND <condition>


Each <condition> is of the form Ai θ Bj, where:

Ai is an attribute of R.

Bj is an attribute of S.

Ai and Bj have the same domain.

θ is a comparison operator {=, <, ≤, >, ≥, ≠}.

Result Inclusion:
Tuples are included in the resulting relation Q if the join condition evaluates to
TRUE. Does not necessarily preserve all information from the participating
relations.
Example Result:
Table: DEPT_MGR
Columns: Dname, Dnumber, Mgr_ssn, Fname, Minit, Lname, Ssn
Example rows:

Research, 5, 333445555, Franklin, T, Wong, 333445555

Administration, 4, 987654321, Jennifer, S, Wallace, 987654321

Unit 3: Relational Data Model 9


Headquarters, 1, 888665555, James, E, Borg, 888665555

8.3.2 Variations of JOIN: The EQUIJOIN and NATURAL JOIN


EQUIJOIN:
Most common JOIN involves equality comparisons only. Result includes pairs
of attributes with identical values in every tuple.
Example:
Attributes Mgr_ssn and Ssn have identical values in every tuple of DEPT_MGR.

NATURAL JOIN:
Denoted by *. Removes superfluous attributes from EQUIJOIN. Requires join
attributes to have the same name in both relations.
Example:
Combine PROJECT and DEPARTMENT by renaming Dnumber to Dnum:

PROJ_DEPT ← PROJECT * ρ(Dname, Dnum, Mgr_ssn, Mgr_start_date)


(DEPARTMENT)

Resulting relation:
Each tuple combines PROJECT tuple with DEPARTMENT tuple controlling the
project. Only one join attribute value is kept.

No Renaming Needed:
If join attributes already have the same names:
Example: DEPARTMENT and DEPT_LOCATIONS on Dnumber:

DEPT_LOCS ← DEPARTMENT * DEPT_LOCATIONS

Join Condition:
Constructed by equating pairs of join attributes with the same name in both
relations.

Example result:

PROJ_DEPT combines PROJECT tuples with controlling DEPARTMENT


tuples.

DEPT_LOCS combines DEPARTMENT with its locations.

Empty Result:
If no combination satisfies the join condition, the result is an empty relation.
Result size ranges from zero to nR * nS, where nR and nS are the number of
tuples in R and S.

Unit 3: Relational Data Model 10


Join Selectivity:
Expected size of the join result divided by the maximum size nR * nS.

CROSS JOIN:
If no join condition, all combinations of tuples qualify. Degenerates into a
CARTESIAN PRODUCT or CROSS PRODUCT.

Multiple Relations Join:


JOIN can be specified among multiple tables.
Example: three-way join:

((PROJECT Dnum=Dnumber DEPARTMENT) Mgr_ssn=Ssn EMPLOYEE)

SQL Implementation:
JOIN can be realized in several ways in SQL:

Specifying join conditions in the WHERE clause.

Using nested relations.

Using joined tables.

8.3.3 A Complete Set of Relational Algebra Operations


Complete Set:

{σ, π, , ρ, –, ×} is a complete set of relational algebra operations. Any other
relational algebra operations can be expressed using this set.
Examples:

INTERSECTION:

R ∩ S ≡ (R ∪ S) – ((R – S) ∪ (S – R))
JOIN:

R <condition> S ≡ σ<condition>(R × S)

Convenience Operations:
Operations like JOIN are important for convenience and common usage. Not
strictly necessary for the expressive power of relational algebra. Other
operations, such as NATURAL JOIN, are also included for convenience.

8.4 Additional Relational Operations

8.4.1 Generalized Projection

Unit 3: Relational Data Model 11


Definition:
The generalized projection operation extends the projection operation by
allowing functions of attributes to be included in the projection list. It enables
the inclusion of computed values in query results.
Example:
Suppose we need to produce a report with computed values from the
EMPLOYEE relation:
REPORT ← ρ(Ssn, Net_salary, Bonus, Tax)(π(Ssn, Salary - Deduction, 2000 *
Years_service, 0.25 * Salary)(EMPLOYEE))

8.4.2 Aggregate Functions and Grouping


Aggregate Functions:
Aggregate functions perform calculations on collections of values from the
database, like SUM, AVERAGE, MAXIMUM, MINIMUM, and COUNT. They are
used in statistical queries to summarize information.

Grouping:
Grouping involves grouping tuples by the value of some attributes and applying
an aggregate function independently to each group.

Example:
To retrieve department numbers, the number of employees, and their average
salary:
ρ(Dno, No_of_employees, Average_sal) (Dno ℑ COUNT Ssn, AVERAGE Salary
(EMPLOYEE))

8.4.3 Recursive Closure Operations


Definition:
Recursive closure operations handle recursive relationships between tuples of
the same type, such as employee-supervisor relationships. They are used to
retrieve all related tuples at all levels of recursion.

8.4.4 OUTER JOIN Operations


Definition:
Outer join operations keep all tuples from one or both relations in the result of a
join, regardless of matching tuples in the other relation. Types include LEFT
OUTER JOIN, RIGHT OUTER JOIN, and FULL OUTER JOIN.
Example:
To list all employee names and their department names, indicating if they

Unit 3: Relational Data Model 12


manage a department:
TEMP ← (EMPLOYEE Ssn=Mgr_ssnDEPARTMENT)
RESULT ← π(Fname, Minit, Lname, Dname)(TEMP)

8.4.5 The OUTER UNION Operation


Definition:
The OUTER UNION operation combines tuples from two relations partially
compatible in schema. It includes all tuples from both relations, padding
attributes with NULL where necessary.
Example:
To combine STUDENT and INSTRUCTOR relations:
STUDENT_OR_INSTRUCTOR(Name, Ssn, Department, Advisor, Rank)

8.5 Examples of Queries in Relational Algebra

Query 1:
Retrieve the name and address of all employees who work for the 'Research'
department.

RESULT ← π(Fname, Lname, Address) (σ(Dname='Research')(DEPA


RTMENT ⨝ (Dnumber=Dno) EMPLOYEE))

Query 2:
For every project located in 'Stafford', list the project number, the controlling
department number, and the department manager’s last name, address, and
birth date.

RESULT ← π(Pnumber, Dnum, Lname, Address, Bdate) (PROJECT ⨝


(Dnum=Dnumber) DEPARTMENT ⨝ (Mgr_ssn=Ssn) EMPLOYEE ⨝ σ(Plo
cation='Stafford')(PROJECT))

Query 3:
Find the names of employees who work on all the projects controlled by
department number 5.

Unit 3: Relational Data Model 13


RESULT ← π(Lname, Fname) ((EMPLOYEE ⨝ WORKS_ON) ÷ (π(Pnumbe
r) (σ(Dnum=5)(PROJECT))))

Query 4:
Make a list of project numbers for projects that involve an employee whose last
name is 'Smith', either as a worker or as a manager of the department that
controls the project.

RESULT ← π(Pno) ((WORKS_ON ⨝ (π(Ssn) (σ(Lname='Smith')(EMPL


OYEE)))) ∪ (π(Pnumber) (σ(Lname='Smith') (DEPARTMENT ⨝ (Mg
r_ssn=Ssn) EMPLOYEE))))

Query 5:
List the names of all employees with two or more dependents.

RESULT ← π(Lname, Fname) (EMPLOYEE ⨝ (ρ(Ssn, No_of_dependen


ts) (Essn Ï COUNT(Dependent_name)(DEPENDENT))))

Query 6:
Retrieve the names of employees who have no dependents.

RESULT ← π(Lname, Fname) (EMPLOYEE Ï (ρ(Ssn) (π(Essn)(DEPEN


DENT))))

Query 7:
List the names of managers who have at least one dependent.

RESULT ← π(Lname, Fname) ((DEPARTMENT ⨝ (Mgr_ssn=Ssn) EMPLO


YEE) ∩ (ρ(Ssn) (π(Essn)(DEPENDENT))))

Unit 3: Relational Data Model 14

You might also like