Unit 2
Unit 2
Unit 2
Relational query languages use relational algebra to break the user requests
and instruct the DBMS to execute the requests. It is the language by which user
communicates with the database. These relational query languages can be
procedural or non-procedural.
A procedural query language will have set of queries instructing the DBMS to
perform various transactions in the sequence to meet the user request. For
example, get_CGPA procedure will have various queries to get the marks of
student in each subject, calculate the total marks, and then decide the CGPA
based on his total marks. This procedural query language tells the database
what is required from the database and how to get them from the database.
Relational algebra is a procedural query language.
These query languages basically will have queries on tables in the database. In
the relational database, a table is known as relation. Records / rows of the table
are referred as tuples. Columns of the table are also known as attributes. All
these names are used interchangeably in relational database.
Relational algebra will have operators to indicate the operations. This algebra
can be applied on single relation – called unary or can be applied on two tables
– called binary. While applying the operations on the relation, the resulting
subset of relation is also known as new relation. There can be multiple steps
involved in some of the operations.
Select (σ)
Project (∏)
Rename (ρ)
Cartesian product (X)
Union (U)
Set-difference (-)
Select (σ) – This is a unary relational operation. This operation pulls the
horizontal subset (subset of rows) of the relation that satisfies the conditions.
This can use operators like <, >, <=, >=, = and != to filter the data from the
relation. It can also use logical AND, OR and NOT operators to combine the
various filtering conditions. This operation can be represented as below:
σ p (r)
Where σ is the symbol for select operation, r represents the relation/table, and p
is the logical formula or the filtering conditions to get the subset.
It selects the record/tuple from the STUDENT table with Student name as
‘James’
Where ∏ is the operator for projection, r is the relation and a1, a2, a3 are the
attributes of the relations which will be shown in the resultant subset.
This will select all the records from STUDENT table but only selected columns –
std_name, address and course. Suppose we have to select only these 3
columns for particular student then we have to combine both project and select
operations.
This selects the record for ‘James’ and displays only std_ID, address and his
course columns. Here we can see two unary operators are combined, and it has
two operations performing. First it selects the tuple from STUDENT table for
‘James’. The resultant subset of STUDENT is also considered as intermediary
relation. But it is temporary and exists till the end of this operation. It then filters
the 3 columns from this temporary relation.
Prepared by : Dr. Ahmad Jamal
Rename (ρ)
Rename (ρ) – This is a unary operator used to rename the tables and columns
of a relation. When we perform self join operation, we have to differentiate two
same tables. In such case rename operator on tables comes into picture. When
we join two or more tables and if those tables have same column names, then it
is always better to rename the columns to differentiate them. This occurs when
we perform Cartesian product operation.
ρ R(E)
Where ρ is the rename operator, E is the existing relation name, and R is the
new relation name.
ρ STUDENT (STD_TABLE)
It will rename the columns in the order the names appear in the table
Cartesian product (X): – This is a binary operator. It combines the tuples of two
relations into one relation.
RXS
Where R and S are two relations and X is the operator. If relation R has m
tuples and relation S has n tuples, then the resultant relation will have mn
tuples. For example, if we perform cartesian product on EMPLOYEE (5 tuples)
and DEPT relations (3 tuples), then we will have new tuple with 15 tuples.
EMPLOYEE X DEPT
This operator will simply create a pair between the tuples of each table. i.e.;
each employee in the EMPLOYEE table will be mapped with each department
in DEPT table.
Union (U) – It is a binary operator, which combines the tuples of two relations. It
is denoted by
RUS
DESIGN_EMPLOYEE U TESTING_EMPLOYEE
Cartesian product combines the attributes of two relations into one relation
whereas Union combines the tuples of two relations into one relation.
In Union, both relations should have same number of columns. Suppose we
have to list the employees who are working for design and testing
department. Then we will do the union on employee table. Since it is union on
same table it has same number of attributes. Cartesian product does not
concentrate on number of attribute or rows. It blindly combines the attributes.
In Union, both relations should have same types of attributes in same order.
In the above example, since union is on employee relation, it has same type
of attribute in the same order.
R–S
Tuple Relational Calculus in DBMS uses a tuple variable (t) that goes to each
row of the table and checks if the predicate is true or false for the given row.
Depending on the given predicate condition, it returns the row or part of the row.
Where t is the tuple variable that runs over every Row, and P(t) is the predicate
logic expression or condition.
Solution:
{t \| Customer(t) ∧ t[Zipcode] = 12345 }
The TRC expression can be read as "Return all the tuple which belongs to the
Customer Table and whose Zipcode is equal to 12345."
Result:
Domain Relational Calculus uses domain Variables to get the column values
required from the database based on the predicate expression or condition.
Example:
{<x1,x2,x3,x4...> \| P(x1,x2,x3,x4...)}
where,
<x1,x2,x3,x4...> are domain variables used to get the column values required,
and P(x1,x2,x3...) is predicate expression or condition.
Question 2: Write a DRC query to get the data of all customers with Zip code
12345.
Result:
Enlisted below are the most popular Enterprise-Grade Open Source DBMS that
are used worldwide.
Altibase
MySQL
PostgreSQL
Maria DB
MongoDB
Cassandra
SQLite
Cubrid
SQL Server
Prepared by : Dr. Ahmad Jamal
Commercial Database
A commercial database is one created for commercial purposes only and it's
available at a price. Unlike open source databases, commercial databases can
only be viewed or modified by authorized users.
Relational database design (RDD) models information and data into a set of
tables with rows and columns. Each row of a relation/table represents a record,
and each column represents an attribute of data. The Structured Query
Language (SQL) is used to manipulate relational databases.
The design of a relational database is composed of four stages, where the data
are modeled into a set of related tables. The stages are:
Define relations/attributes
Define primary keys
Define relationships
Normalization
Keys in DBMS are used to uniquely identify any record or row of data from the
table. It is also used to establish and identify relationships between tables.
Keys in DBMS are of different types eg: Super key, Candidate key, Primary Key,
Foreign key, etc.
Except for the primary key, the remaining attributes are considered a candidate
key. The candidate keys are as strong as the primary key.
The data type defined for a column in a database is called a database domain.
This data type can either be a built-in type (such as an integer or a string) or a
custom type that defines data constraints.
Example
Let's say that we have a table, that stores student records. Now, suppose there
is a column, or attribute to store the students' contact numbers. The column for
contact numbers should only expect numeric values, usually of the type INT,
which is the domain for that attribute.
Domain Constraints are user-defined columns that assist the user in entering
values that are appropriate for the data type. If it detects an incorrect input, it
informs the user that the column is not properly filled. Or, to put it another way,
it's an attribute that describes all of the potential values for the attribute, such as
integer, character, date, time, string, and so on.
Data Dependency is the relationship between the data stored in the tables.
Functional Dependency
Fully-Functional Dependency
Transitive Dependency
Multivalued Dependency
Partial Dependency
When existence of one or more rows in a table implies one or more other rows
in the same table, then the Multi-valued dependencies occur.
Example:
P->->Q
Q→→R
The StudentName can be determined by StudentID that makes the relation Partial
Dependent.
The ProjectName can be determined by ProjectID, which that the relation Partial
Dependent. Prepared by : Dr. Ahmad Jamal
Normalization
Normalization is the process of organizing the data and the attributes of a
database. It is performed to reduce the data redundancy in a database and to
ensure that data is stored logically. Data redundancy in DBMS means having
the same data but at multiple places. It is necessary to remove data redundancy
because it causes anomalies in a database which makes it very hard for a
database administrator to maintain it.
Normal Forms
To convert this table into 1NF, we make new rows with each Employee Phone Number as a new row as
In next slide:
For a relational table to be in second normal form, it must satisfy the following
rules:
The table must be in first normal form.
It must not contain any partial dependency.
If a partial dependency exists, we can divide the table to remove the partially
dependent attributes and move them to some other table where they fit in
well.
Prepared by : Dr. Ahmad Jamal
Table Name: EmployeeProjectDetail
In the above table, the prime attributes of the table are Employee Code and Project ID. We
have partial dependencies in this table because Employee Name can be determined by
Employee Code and Project Name can be determined by Project ID. Thus, the above
relational table violates the rule of 2NF.
For a relational table to be in third normal form, it must satisfy the following
rules:
The table must be in the second normal form.
No non-prime attribute is transitively dependent on the primary key.
For each functional dependency X -> Z at least one of the following conditions
hold:
1: X is a super key of the table.
2: Z is a prime attribute of the table.
The above table is not in 3NF because it has Employee Code -> Employee City transitive
dependency because:
Employee Code -> Employee Zipcode
Employee Zipcode -> Employee City
Also, Employee Zipcode is not a super key and Employee City is not a prime attribute.
To remove transitive dependency from this table and normalize it into the third normal form,
we can decompose the <EmployeeDetail> table into the following two tables:
Prepared by : Dr. Ahmad Jamal
Thus, we’ve converted the <EmployeeDetail>
table into 3NF by decomposing it into
<EmployeeDetail> and <EmployeeLocation>
tables as they are in 2NF and they don’t have
any transitive dependency.
Prepared by : Dr. Ahmad Jamal
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains
additional constraints compared to 3NF.
A superkey is a set of one or more attributes that can uniquely identify a row in
a database table.
To convert the given table into BCNF, we decompose it into three tables:
Prepared by : Dr. Ahmad Jamal
Thus, we have converted the
<EmployeeProjectLead> table into BCNF by
decomposing it into <EmployeeProject> and
<ProjectLead> tables. Prepared by : Dr. Ahmad Jamal
Armstrong’s Axioms
William Armstrong in 1974 suggested a few rules related to functional
dependency. They are called RAT rules.
If a relation is not properly decomposed, then it may lead to other problems like
information loss, etc. There are two types of decomposition as shown below:
1: Lossless Join Decomposition
2: Dependency Preserving
No information is lost while decomposing from the original relation.
If we join back the sub decomposed relations, the same relation that was
decomposed is obtained.
Let’s say we have a relation R and we decomposed it into R1 and R2, then
the rules are:
The union of attributes of both the sub relations R1 and R2 must contain all
the attributes of original relation R.
R1 ∪ R2 = R
The intersection of attributes of both the sub relations R1 and R2 must not be
null, i.e., there should be some attributes that are present in both R1 and R2.
R1 ∩ R2 ≠ ∅
The intersection of attributes of both the sub relations R1 and R2 must be the
superkey of R1 or R2, or both R1 and R2.
R1 ∩ R2 = Super key of R1 or R2
In a lossy decomposition, one or more of these conditions would fail and we will
not be able to recover Complete information as present in the original relation.
Query Processing is the activity performed in extracting data from the database.
The activities involved in parsing, validating, execution and optimizing a query is called
Query Processing.
Steps:
The steps involved in query processing and optimization are as follows:
A sequence of primitive operations that can be used to evaluate a query is called
query execution plan or query evaluation plan.
The query execution engine takes a query evaluation plan, executes that plan and
produces the desired output. The different execution plans for a given query can
have different costs based on the number of disks. It is the responsibility of the
system to construct a query evaluation plan which minimizes the cost of query
evaluation. This task is called query optimization.
Don’t store the result of A ⋈ B in a temporary file. Instead the result is passed
directly for projection with C and so on.
Prepared by : Dr. Ahmad Jamal
Query
Equivalence
Query equivalence deals with queries that are equal.
This means the output of the two queries you consider will be the same.
For example,
Consider a table called employee with fields like name, age, salary and
department.
Query 1:
SELECT * FROM (SELECT * FROM EMPLOYEE WHERE AGE>30)
WHERE SALARY>50000;
Query 2:
SELECT * FROM (SELECT * FROM EMPLOYEE WHERE SALARY>50000)
WHERE AGE>30;
Prepared by : Dr. Ahmad Jamal
These two queries output will be the records of the employees whose salary is
greater than 50000 and age is greater than 30.
“The equivalence rule says that expressions of two forms are the same or
equivalent because both expressions produce the same outputs on any legal
database instance. It means that we can possibly replace the expression of
the first form with that of the second form and replace the expression of the
second form with an expression of the first form. Thus, the optimizer of the
query-evaluation plan uses such an equivalence rule or method for
transforming expressions into the logically equivalent one.”
Join in DBMS is a binary operation which allows you to combine join product
and selection in one single statement. The goal of creating a join condition is
that it helps you to combine the data from two or more DBMS tables.
Types of Join
There are mainly two types of joins in DBMS:
Inner Joins: Theta, Natural, EQUI
Outer Join: Left, Right, Full
Inner Join is used to return rows from both tables which satisfy the given
condition. It is the most widely used join operation and can be considered as a
default join-type.
Theta join
Natural join
EQUI join
Theta Join allows you to merge two tables based on the condition represented
by theta. Theta joins work for all comparison operators. It is denoted by
symbol θ. The general case of JOIN operation is called a Theta join.
Syntax:
A ⋈θ B
Natural join does not use any comparison operator. It does not concatenate
the way a Cartesian product does. We can perform a Natural Join only if there
is at least one common attribute that exists between two relations. In addition,
the attributes must have the same name and domain.
Natural join acts on those matching attributes where the values of attributes in
both the relations are same.
All the tuples from the Left relation, R, are included in the resulting relation. If
there are tuples in R without any matching tuple in the Right relation S, then
the S-attributes of the resulting relation are made NULL.
All the tuples from the Right relation, S, are included in the resulting relation. If
there are tuples in S without any matching tuple in R, then the R-attributes of
resulting relation are made NULL.
All the tuples from both participating relations are included in the resulting
relation. If there are no matching tuples for both relations, their respective
unmatched attributes are made NULL.