Unit 2

Download as pdf or txt
Download as pdf or txt
You are on page 1of 92

Relational Query Languages

Relational query languages use relational algebra to break the user requests
and instruct the DBMS to execute the requests. It is the language by which user
communicates with the database. These relational query languages can be
procedural or non-procedural.

Prepared by : Dr. Ahmad Jamal


Procedural Query Language

A procedural query language will have set of queries instructing the DBMS to
perform various transactions in the sequence to meet the user request. For
example, get_CGPA procedure will have various queries to get the marks of
student in each subject, calculate the total marks, and then decide the CGPA
based on his total marks. This procedural query language tells the database
what is required from the database and how to get them from the database.
Relational algebra is a procedural query language.

Prepared by : Dr. Ahmad Jamal


Non-Procedural Query Language
Non-procedural queries will have single query on one or more tables to get
result from the database. For example, get the name and address of the student
with particular ID will have single query on STUDENT table. Relational
Calculus is a non procedural language which informs what to do with the
tables, but doesn’t inform how to accomplish this.

These query languages basically will have queries on tables in the database. In
the relational database, a table is known as relation. Records / rows of the table
are referred as tuples. Columns of the table are also known as attributes. All
these names are used interchangeably in relational database.

Prepared by : Dr. Ahmad Jamal


Relational Algebra
Relational algebra is a procedural query language. It takes one or more
relations / tables and performs the operation and produce the result. This result
is also considered as a new table or relation. Suppose we have to retrieve
student name, address and class for the given ID. What a relational algebra will
do in this case is, it filters the name, address and class from the STUDENT
table for the input ID. In mathematical terms, relational algebra has produced a
subset of STUDENT table for the given ID.

Relational algebra will have operators to indicate the operations. This algebra
can be applied on single relation – called unary or can be applied on two tables
– called binary. While applying the operations on the relation, the resulting
subset of relation is also known as new relation. There can be multiple steps
involved in some of the operations.

Prepared by : Dr. Ahmad Jamal


Relational Algebra in DBMS has 6 fundamental operations.


Select (σ)

Project (∏)

Rename (ρ)

Cartesian product (X)

Union (U)

Set-difference (-)

Prepared by : Dr. Ahmad Jamal


Select (σ)

Select (σ) – This is a unary relational operation. This operation pulls the
horizontal subset (subset of rows) of the relation that satisfies the conditions.
This can use operators like <, >, <=, >=, = and != to filter the data from the
relation. It can also use logical AND, OR and NOT operators to combine the
various filtering conditions. This operation can be represented as below:

σ p (r)
Where σ is the symbol for select operation, r represents the relation/table, and p
is the logical formula or the filtering conditions to get the subset.

Prepared by : Dr. Ahmad Jamal


Select (σ)
Examples

σSTD_NAME = “James” (STUDENT)

It selects the record/tuple from the STUDENT table with Student name as
‘James’

σdept_id = 20 AND salary>=10000 (EMPLOYEE) –

Selects the records from EMPLOYEE table with department ID = 20 and


employees whose salary is more than 10000.

Prepared by : Dr. Ahmad Jamal


Project (∏)
Project (∏) – This is a unary operator and is similar to select operation above. It
creates the subset of relation based on the conditions specified. Here, it selects
only selected columns/attributes from the relation- vertical subset of relation.
The select operation above creates subset of relation but for all the attributes in
the relation. It is denoted as below:

∏a1, a2, a3 (r)

Where ∏ is the operator for projection, r is the relation and a1, a2, a3 are the
attributes of the relations which will be shown in the resultant subset.

Prepared by : Dr. Ahmad Jamal


Project (∏)
Examples
∏std_name, address, course (STUDENT)

This will select all the records from STUDENT table but only selected columns –
std_name, address and course. Suppose we have to select only these 3
columns for particular student then we have to combine both project and select
operations.

∏STD_ID, address, course (σ STD_NAME = “James”(STUDENT))

This selects the record for ‘James’ and displays only std_ID, address and his
course columns. Here we can see two unary operators are combined, and it has
two operations performing. First it selects the tuple from STUDENT table for
‘James’. The resultant subset of STUDENT is also considered as intermediary
relation. But it is temporary and exists till the end of this operation. It then filters
the 3 columns from this temporary relation.
Prepared by : Dr. Ahmad Jamal
Rename (ρ)

Rename (ρ) – This is a unary operator used to rename the tables and columns
of a relation. When we perform self join operation, we have to differentiate two
same tables. In such case rename operator on tables comes into picture. When
we join two or more tables and if those tables have same column names, then it
is always better to rename the columns to differentiate them. This occurs when
we perform Cartesian product operation.

ρ R(E)

Where ρ is the rename operator, E is the existing relation name, and R is the
new relation name.

Prepared by : Dr. Ahmad Jamal


Rename (ρ)
Examples

ρ STUDENT (STD_TABLE)

Renames STD_TABLE table to STUDENT

ρ STD_ID, STD_NAME, STD_ADDRESS(STUDENT)

It will rename the columns in the order the names appear in the table

Prepared by : Dr. Ahmad Jamal


Cartesian product (X)

Cartesian product (X): – This is a binary operator. It combines the tuples of two
relations into one relation.

RXS

Where R and S are two relations and X is the operator. If relation R has m
tuples and relation S has n tuples, then the resultant relation will have mn
tuples. For example, if we perform cartesian product on EMPLOYEE (5 tuples)
and DEPT relations (3 tuples), then we will have new tuple with 15 tuples.

Prepared by : Dr. Ahmad Jamal


Cartesian product (X)
Example

EMPLOYEE X DEPT

This operator will simply create a pair between the tuples of each table. i.e.;
each employee in the EMPLOYEE table will be mapped with each department
in DEPT table.

Prepared by : Dr. Ahmad Jamal


Cartesian product (X)
Example

Prepared by : Dr. Ahmad Jamal


Union (U)

Union (U) – It is a binary operator, which combines the tuples of two relations. It
is denoted by

RUS

Where R and S are the relations and U is the operator.

Prepared by : Dr. Ahmad Jamal


Union (U)
Example

DESIGN_EMPLOYEE U TESTING_EMPLOYEE

Where DESIGN_EMPLOYEE and TESTING_EMPLOYEE are two relations.

Prepared by : Dr. Ahmad Jamal


Union (U)
Example

Prepared by : Dr. Ahmad Jamal


How Union is different from Cartesian product


Cartesian product combines the attributes of two relations into one relation
whereas Union combines the tuples of two relations into one relation.

In Union, both relations should have same number of columns. Suppose we
have to list the employees who are working for design and testing
department. Then we will do the union on employee table. Since it is union on
same table it has same number of attributes. Cartesian product does not
concentrate on number of attribute or rows. It blindly combines the attributes.

In Union, both relations should have same types of attributes in same order.
In the above example, since union is on employee relation, it has same type
of attribute in the same order.

Prepared by : Dr. Ahmad Jamal


Set-difference (-)

Set-difference (-) – This is a binary operator. This operator creates a new


relation with tuples that are in one relation but not in other relation. It is denoted
by ‘-‘symbol.

R–S

Where R and S are the relations.

Prepared by : Dr. Ahmad Jamal


Set-difference (-)
DESIGN_EMPLOYEE −TESTING_EMPLOYEE

In case we want to retrieve the employees who are working in Design


department but not in testing.

Prepared by : Dr. Ahmad Jamal


Relational Calculus
Relational Calculus in database management system (DBMS) is all about
"What you want ?". Relational calculus does not tell us how to get the results
from the Database, but it just cares about what we want.

Prepared by : Dr. Ahmad Jamal


Tuple Relational Calculus (TRC)

Tuple Relational Calculus in DBMS uses a tuple variable (t) that goes to each
row of the table and checks if the predicate is true or false for the given row.
Depending on the given predicate condition, it returns the row or part of the row.

The Tuple Relational Calculus expression Syntax


{t \| P(t)}

Where t is the tuple variable that runs over every Row, and P(t) is the predicate
logic expression or condition.

Prepared by : Dr. Ahmad Jamal


Tuple Relational Calculus (TRC)
Example
Question 1: Write a TRC query to get all the data of customers whose zip code is 12345.

Solution:
{t \| Customer(t) ∧ t[Zipcode] = 12345 }

Prepared by : Dr. Ahmad Jamal


Workflow of query - The tuple variable "t" will go through every tuple of the
Customer table. Each row will check whether the Cust_Zipcode is 12345 or not
and only return those rows that satisfies the Predicate expression condition.

The TRC expression can be read as "Return all the tuple which belongs to the
Customer Table and whose Zipcode is equal to 12345."

Result:

Prepared by : Dr. Ahmad Jamal


Domain Relational Calculus (DRC)

Domain Relational Calculus uses domain Variables to get the column values
required from the database based on the predicate expression or condition.

Example:
{<x1,x2,x3,x4...> \| P(x1,x2,x3,x4...)}

where,
<x1,x2,x3,x4...> are domain variables used to get the column values required,
and P(x1,x2,x3...) is predicate expression or condition.

Prepared by : Dr. Ahmad Jamal


DRC Example

Question 2: Write a DRC query to get the data of all customers with Zip code
12345.

Solution: {<x1,x2,x3> \| <x1,x2> ∈ Customer ∧ x3 = 12345 }

Prepared by : Dr. Ahmad Jamal


Workflow of Query: In the above query x1,x2,x3 (ordered) refers to the attribute
or column which we need in the result, and the predicate condition is that the
first two domain variables x1 and x2 should be present while matching the
condition for each row and the third domain variable x3 should be equal to
12345.

Result:

Prepared by : Dr. Ahmad Jamal


SQL3
SQL3 was accepted as the new standard for SQL in 1999, after more than 7
years of debate. Basically, SQL3 includes data definition and management
techniques from Object-Oriented dbms, OO-dbms, while maintaining the
relational dbms platform. Based on this merger of concepts and techniques,
DBMSs that support SQL3 are called Object-Relational or or-dbms'.

The most central data modelling notions included in SQL3 are:



Classification hierarchies

Embedded structures that support composite attributes

Collection data-types (sets, lists/arrays, and multi-sets) that can be used for
multi-valued attribute types

Large OBject types, LOBs, within the DB, as opposed to requiring external
storage, and

User defined data-types and functions (UDT/UDF) that can be used to define
complex structures and derived attribute value calculations, among many
other function extensions. Prepared by : Dr. Ahmad Jamal
DDL and DML constructs

Prepared by : Dr. Ahmad Jamal


Open source DBMS
An open source database allows users to create a system based on their
unique requirements and business needs. It is free and can also be shared. The
source code can be modified to match any user preference.

Enlisted below are the most popular Enterprise-Grade Open Source DBMS that
are used worldwide.

Altibase

MySQL

PostgreSQL

Maria DB

MongoDB

Cassandra

SQLite

Cubrid

SQL Server
Prepared by : Dr. Ahmad Jamal
Commercial Database

A commercial database is one created for commercial purposes only and it's
available at a price. Unlike open source databases, commercial databases can
only be viewed or modified by authorized users.

Oracle, DB2, Splunk etc.

Prepared by : Dr. Ahmad Jamal


Relational Database Design

Relational database design (RDD) models information and data into a set of
tables with rows and columns. Each row of a relation/table represents a record,
and each column represents an attribute of data. The Structured Query
Language (SQL) is used to manipulate relational databases.

The design of a relational database is composed of four stages, where the data
are modeled into a set of related tables. The stages are:


Define relations/attributes

Define primary keys

Define relationships

Normalization

Prepared by : Dr. Ahmad Jamal


Key

Keys in DBMS are used to uniquely identify any record or row of data from the
table. It is also used to establish and identify relationships between tables.

Keys in DBMS are of different types eg: Super key, Candidate key, Primary Key,
Foreign key, etc.

Prepared by : Dr. Ahmad Jamal


Primary key
It is the first key used to identify one and only one instance of an entity uniquely.
An entity can contain multiple keys, as we saw in the PERSON table. The key
which is most suitable from those lists becomes a primary key.

Prepared by : Dr. Ahmad Jamal


Candidate key
A candidate key is an attribute or set of attributes that can uniquely identify a
tuple.

Except for the primary key, the remaining attributes are considered a candidate
key. The candidate keys are as strong as the primary key.

Prepared by : Dr. Ahmad Jamal


Super Key
Super key is an attribute set that can uniquely identify a tuple. A super key is a
superset of a candidate key.

In the above EMPLOYEE table, for(EMPLOEE_ID, EMPLOYEE_NAME), the name of two


employees can be the same, but their EMPLYEE_ID can't be the same. Hence, this
combination can also be a key.
Prepared by : Dr. Ahmad Jamal
Foreign key

Foreign keys are the column of the table used to point to the primary key of another table.

Every employee works in a specific department in a company, and employee and department
are two different entities. So we can't store the department's information in the employee
table. That's why we link these two tables through the primary key of one table.

We add the primary key of the DEPARTMENT table, Department_Id, as a new attribute in the
EMPLOYEE table.

In the EMPLOYEE table, Department_Id is the foreign key, and both the tables are related.

Prepared by : Dr. Ahmad Jamal


Domain
A domain is a unique set of values that can be assigned to an attribute in a
database. For example, a domain of strings can accept only string values.

The data type defined for a column in a database is called a database domain.
This data type can either be a built-in type (such as an integer or a string) or a
custom type that defines data constraints.

Example
Let's say that we have a table, that stores student records. Now, suppose there
is a column, or attribute to store the students' contact numbers. The column for
contact numbers should only expect numeric values, usually of the type INT,
which is the domain for that attribute.

Prepared by : Dr. Ahmad Jamal


Domain Integrity Constraints

Domain Constraints are user-defined columns that assist the user in entering
values that are appropriate for the data type. If it detects an incorrect input, it
informs the user that the column is not properly filled. Or, to put it another way,
it's an attribute that describes all of the potential values for the attribute, such as
integer, character, date, time, string, and so on.

Types of Domain Constraints


1: NOT NULL
2: Check

Prepared by : Dr. Ahmad Jamal


Data Dependency

Data Dependency is the relationship between the data stored in the tables.

It has the following types:


Functional Dependency

Fully-Functional Dependency

Transitive Dependency

Multivalued Dependency

Partial Dependency

Prepared by : Dr. Ahmad Jamal


Functional Dependency

If the information stored in a table can uniquely determine another information in


the same table, then it is called Functional Dependency. Consider it as an
association between two attributes of the same relation.

If P functionally determines Q, then


P -> Q

Prepared by : Dr. Ahmad Jamal


Functional Dependency
Example

In the above table, EmpName is functionally dependent on EmpID because


EmpName can take only one value for the given value of EmpID:

EmpID -> EmpName

Prepared by : Dr. Ahmad Jamal


Fully-functionally Dependency
An attribute is fully functional dependent on another attribute, if it is Functionally
Dependent on that attribute and not on any of its proper subset.

For example, an attribute Q is fully functional dependent on another attribute P,


if it is Functionally Dependent on P and not on any of the proper subset of P.

{EmpID, ProjectID} Â -> (Days)

Prepared by : Dr. Ahmad Jamal


Transitive Dependency

When an indirect relationship causes functional dependency it is called


Transitive Dependency.

If P -> Q and Q -> R is true, then P-> R is a transitive dependency.

Prepared by : Dr. Ahmad Jamal


Multivalued Dependency

When existence of one or more rows in a table implies one or more other rows
in the same table, then the Multi-valued dependencies occur.

If a table has attributes P, Q and R, then Q and R are multi-valued facts of P.

It is represented by double arrow − →→

Example:
P->->Q
Q→→R

In the above case, Multivalued Dependency exists only if Q and R are


independent attributes. Prepared by : Dr. Ahmad Jamal
Multivalued Dependency occurs when two attributes in a table are independent of
each other but are dependent on the third attribute.
Partial Dependency
Partial Dependency occurs when a nonprime attribute is functionally dependent
on part of a candidate key.

In the above table, we have partial dependency; let us see how:

The prime key attributes are StudentID and ProjectNo.


As stated, the non-prime attributes i.e. StudentName and ProjectName should be
functionally dependent on part of a candidate key, to be Partial Dependent.

The StudentName can be determined by StudentID that makes the relation Partial
Dependent.
The ProjectName can be determined by ProjectID, which that the relation Partial
Dependent. Prepared by : Dr. Ahmad Jamal
Normalization
Normalization is the process of organizing the data and the attributes of a
database. It is performed to reduce the data redundancy in a database and to
ensure that data is stored logically. Data redundancy in DBMS means having
the same data but at multiple places. It is necessary to remove data redundancy
because it causes anomalies in a database which makes it very hard for a
database administrator to maintain it.

Normal Forms

Prepared by : Dr. Ahmad Jamal


First Normal Form (1NF)

A relation is in 1NF if every attribute is a single-valued attribute or it does not


contain any multi-valued or composite attribute, i.e., every attribute is an atomic
attribute. If there is a composite or multi-valued attribute, it violates the 1NF. To
solve this, we can create a new row for each of the values of the multi-valued
attribute to convert the table into the 1NF.

Prepared by : Dr. Ahmad Jamal


Here, the Employee Phone Number is a multi-valued attribute. So, this relation is not in 1NF.

To convert this table into 1NF, we make new rows with each Employee Phone Number as a new row as
In next slide:

Prepared by : Dr. Ahmad Jamal


1NF

Prepared by : Dr. Ahmad Jamal


Second Normal Form (2NF)
The normalization of 1NF relations to 2NF involves the elimination of partial
dependencies. A partial dependency in DBMS exists when any non-prime
attributes, i.e., an attribute not a part of the candidate key, is not fully
functionally dependent on one of the candidate keys.

For a relational table to be in second normal form, it must satisfy the following
rules:


The table must be in first normal form.

It must not contain any partial dependency.


If a partial dependency exists, we can divide the table to remove the partially
dependent attributes and move them to some other table where they fit in
well.
Prepared by : Dr. Ahmad Jamal
Table Name: EmployeeProjectDetail

In the above table, the prime attributes of the table are Employee Code and Project ID. We
have partial dependencies in this table because Employee Name can be determined by
Employee Code and Project Name can be determined by Project ID. Thus, the above
relational table violates the rule of 2NF.

Prepared by : Dr. Ahmad Jamal


To remove partial dependencies from this table and normalize it into second
normal form, we can decompose the EmployeeProjectDetail table into the
following three tables:

Prepared by : Dr. Ahmad Jamal


Prepared by : Dr. Ahmad Jamal
Third Normal Form (3NF)
The normalization of 2NF relations to 3NF involves the elimination of transitive
dependencies in DBMS.

For a relational table to be in third normal form, it must satisfy the following
rules:

The table must be in the second normal form.

No non-prime attribute is transitively dependent on the primary key.

For each functional dependency X -> Z at least one of the following conditions
hold:
1: X is a super key of the table.
2: Z is a prime attribute of the table.

If a transitive dependency exists, we can divide the table to remove the


transitively dependent attributes and place them to a new table along with a
copy of the determinant.
Prepared by : Dr. Ahmad Jamal
If a transitive dependency exists, we can divide the table to remove the
transitively dependent attributes and place them to a new table along with a
copy of the determinant.

The above table is not in 3NF because it has Employee Code -> Employee City transitive
dependency because:
Employee Code -> Employee Zipcode
Employee Zipcode -> Employee City
Also, Employee Zipcode is not a super key and Employee City is not a prime attribute.
To remove transitive dependency from this table and normalize it into the third normal form,
we can decompose the <EmployeeDetail> table into the following two tables:
Prepared by : Dr. Ahmad Jamal
Thus, we’ve converted the <EmployeeDetail>
table into 3NF by decomposing it into
<EmployeeDetail> and <EmployeeLocation>
tables as they are in 2NF and they don’t have
any transitive dependency.
Prepared by : Dr. Ahmad Jamal
Boyce-Codd Normal Form (BCNF)
Boyce-Codd Normal Form(BCNF) is an advanced version of 3NF as it contains
additional constraints compared to 3NF.

For a relational table to be in Boyce-Codd normal form, it must satisfy the


following rules:

The table must be in the third normal form.


For every non-trivial functional dependency X -> Y, X is the superkey of the
table. That means X cannot be a non-prime attribute if Y is a prime attribute.

A superkey is a set of one or more attributes that can uniquely identify a row in
a database table.

Prepared by : Dr. Ahmad Jamal


The above table satisfies all the normal forms till 3NF, but it violates the rules of BCNF
because the candidate key of the above table is {Employee Code, Project ID}. For the non-
trivial functional dependency, Project Leader -> Project ID, Project ID is a prime attribute
but Project Leader is a non-prime attribute. This is not allowed in BCNF.

To convert the given table into BCNF, we decompose it into three tables:
Prepared by : Dr. Ahmad Jamal
Thus, we have converted the
<EmployeeProjectLead> table into BCNF by
decomposing it into <EmployeeProject> and
<ProjectLead> tables. Prepared by : Dr. Ahmad Jamal
Armstrong’s Axioms
William Armstrong in 1974 suggested a few rules related to functional
dependency. They are called RAT rules.

1. Reflexivity: If A is a set of attributes and B is a subset of A, then the


functional dependency A → B holds true.

For example, { Employee_Id, Name } → Name

2. Augmentation: If a functional dependency A → B holds true, then appending


any number of the attribute to both sides of dependency doesn't affect the
dependency. It remains true.

For example, X → Y holds true then, ZX → ZY also holds true.


For example, if { Employee_Id, Name } → { Name } holds true then,
{ Employee_Id, Name, Age } → { Name, Age } Prepared by : Dr. Ahmad Jamal
3. Transitivity: If two functional dependencies X → Y and Y → Z hold true, then
X → Z also holds true by the rule of Transitivity.

For example, if { Employee_Id } → { Name } holds true


and { Name } → { Department } holds true,
then { Employee_Id } → { Department } also holds true.

Prepared by : Dr. Ahmad Jamal


Decomposition
Decomposition in Database Management System is to break a relation into
multiple relations to bring it into an appropriate normal form. It helps to remove
redundancy, inconsistencies, and anomalies from a database. The
decomposition of a relation R in a relational schema is the process of replacing
the original relation R with two or more relations in a relational schema. Each of
these relations contains a subset of the attributes of R and together they include
all attributes of R.

If a relation is not properly decomposed, then it may lead to other problems like
information loss, etc. There are two types of decomposition as shown below:

Prepared by : Dr. Ahmad Jamal


Rules for Decomposition

Whenever we decompose a relation, there are certain properties that must be


satisfied to ensure no information is lost while decomposing the relations. These
properties are:


1: Lossless Join Decomposition

2: Dependency Preserving

Prepared by : Dr. Ahmad Jamal


1: Lossless Join Decomposition

A lossless Join decomposition ensures two things:


No information is lost while decomposing from the original relation.


If we join back the sub decomposed relations, the same relation that was
decomposed is obtained.

Prepared by : Dr. Ahmad Jamal



We can follow certain rules to ensure that the decomposition is a lossless join
decomposition


Let’s say we have a relation R and we decomposed it into R1 and R2, then
the rules are:

The union of attributes of both the sub relations R1 and R2 must contain all
the attributes of original relation R.

R1 ∪ R2 = R

The intersection of attributes of both the sub relations R1 and R2 must not be
null, i.e., there should be some attributes that are present in both R1 and R2.

R1 ∩ R2 ≠ ∅

The intersection of attributes of both the sub relations R1 and R2 must be the
superkey of R1 or R2, or both R1 and R2.

R1 ∩ R2 = Super key of R1 or R2

Prepared by : Dr. Ahmad Jamal


Lossless vs Lossy Decomposition

In a lossy decomposition, one or more of these conditions would fail and we will
not be able to recover Complete information as present in the original relation.

Prepared by : Dr. Ahmad Jamal


2: Dependency Preserving

The second property of lossless decomposition is dependency preservation


which says that after decomposing a relation R into R1 and R2, all
dependencies of the original relation R must be present either in R1 or R2 or
they must be derivable using the combination of functional dependencies
present in R1 and R2.

Prepared by : Dr. Ahmad Jamal


What we learn till now
Conclusion

Decomposition is the process of breaking an original relation into multiple
sub relations.

Decomposition helps to remove anomalies redundancy, and other problems
in a DBMS.

Decomposition can be lossy or lossless.

An ideal decomposition should be lossless join decomposition and
dependency preserving.

Prepared by : Dr. Ahmad Jamal


Query Processing and Optimization

Query Processing is the activity performed in extracting data from the database.

The activities involved in parsing, validating, execution and optimizing a query is called
Query Processing.

Steps:
The steps involved in query processing and optimization are as follows:

A sequence of primitive operations that can be used to evaluate a query is called
query execution plan or query evaluation plan.

The query execution engine takes a query evaluation plan, executes that plan and
produces the desired output. The different execution plans for a given query can
have different costs based on the number of disks. It is the responsibility of the
system to construct a query evaluation plan which minimizes the cost of query
evaluation. This task is called query optimization.

Prepared by : Dr. Ahmad Jamal



A query optimization is expressed in high level query language and is
scanned, parsed and validated. The scanner identifies the SQL keywords,
attributes and relation names in the text of the query.

The parser checks the syntax which is used to determine if the query is
formulated according to syntax rules of the query language.

Finally, the query is evaluated by checking that all attributes and relation
names are valid and semantically meaningful in the schema of a particular
database.

An internal representation of the query is then created which is either a tree
or a graph known as query tree or query graph.

If the query written SQL is translated into relational algebra, then its internal
representation is a query tree. Otherwise, if TRC or DRC its internal
representation is a query graph. A graph has many possible execution
strategies and the process of choosing a strategy with minimum cost is
called query optimization.
Prepared by : Dr. Ahmad Jamal
The query processing and optimization in the DBMS are
explained in the form of a diagram below −

Prepared by : Dr. Ahmad Jamal


Evaluation of relational algebra expression
SQL queries are decomposed into query blocks. One query block contains a
single SELECT-FROM-WHERE expression, as well as GROUP BY and
HAVING clause (if any). Nested queries are split into separate query blocks.

Select lastname, firstname from employee where salary>(select max(salary)


from employee where deptname =CSE ;
C=(select max(salary) from employee where deptname=CSE); // inner block
Select lastname, firstname from employee where salary>c; //outer block

Where C represents the result returned from the inner block.


The relation algebra for the inner block is
Ģmax(salary) (σdname=CSE(employee))
The relation algebra for the outer blocks is
Πlastname, firstname(σsalary>c(employee))
The query optimizer would then choose an execution or evaluation plan for
each block. Prepared by : Dr. Ahmad Jamal
Materialized evaluation

Evaluate one operation at a time. Evaluate the expression in a bottom-up


manner and stores intermediate results to temporary files.

Store the result of A ⋈ B in a temporary file.


Store the result of C ⋈ D in a temporary file.
Finally, join the results stored in temporary files.
The overall cost=sum of costs of individual operations + cost of writing
intermediate results to disk, cost of writing results to results to temporary files
and reading them back is quite high.
Prepared by : Dr. Ahmad Jamal
Pipelined evaluation

Evaluate several operations simultaneously. Result of one operation is passed


to the next operation. Evaluate the expression in a bottom-up manner and
don’t store intermediate results to temporary files.

Don’t store the result of A ⋈ B in a temporary file. Instead the result is passed
directly for projection with C and so on.
Prepared by : Dr. Ahmad Jamal
Query
Equivalence
Query equivalence deals with queries that are equal.

This means the output of the two queries you consider will be the same.

For example,
Consider a table called employee with fields like name, age, salary and
department.
Query 1:
SELECT * FROM (SELECT * FROM EMPLOYEE WHERE AGE>30)
WHERE SALARY>50000;

Query 2:
SELECT * FROM (SELECT * FROM EMPLOYEE WHERE SALARY>50000)
WHERE AGE>30;
Prepared by : Dr. Ahmad Jamal
These two queries output will be the records of the employees whose salary is
greater than 50000 and age is greater than 30.

“The equivalence rule says that expressions of two forms are the same or
equivalent because both expressions produce the same outputs on any legal
database instance. It means that we can possibly replace the expression of
the first form with that of the second form and replace the expression of the
second form with an expression of the first form. Thus, the optimizer of the
query-evaluation plan uses such an equivalence rule or method for
transforming expressions into the logically equivalent one.”

Prepared by : Dr. Ahmad Jamal


Joins
Join is a combination of a Cartesian product followed by a selection process. A
Join operation pairs two tuples from different relations, if and only if a given
join condition is satisfied.

Join in DBMS is a binary operation which allows you to combine join product
and selection in one single statement. The goal of creating a join condition is
that it helps you to combine the data from two or more DBMS tables.

Types of Join
There are mainly two types of joins in DBMS:


Inner Joins: Theta, Natural, EQUI

Outer Join: Left, Right, Full

Prepared by : Dr. Ahmad Jamal


Inner Join

Inner Join is used to return rows from both tables which satisfy the given
condition. It is the most widely used join operation and can be considered as a
default join-type.

Inner Join further divided into three subtypes:


Theta join

Natural join

EQUI join

Prepared by : Dr. Ahmad Jamal


Theta Join

Theta Join allows you to merge two tables based on the condition represented
by theta. Theta joins work for all comparison operators. It is denoted by
symbol θ. The general case of JOIN operation is called a Theta join.

Syntax:
A ⋈θ B

Theta join can use any conditions in the selection criteria.

Condition can be <,> and =.

Prepared by : Dr. Ahmad Jamal


Prepared by : Dr. Ahmad Jamal
STUDENT ⋈Student.Std = Subject.Class SUBJECT

Prepared by : Dr. Ahmad Jamal


Equijoin

When Theta join uses only equality comparison operator, it is said to be


equijoin.

Prepared by : Dr. Ahmad Jamal


Natural Join

Natural join does not use any comparison operator. It does not concatenate
the way a Cartesian product does. We can perform a Natural Join only if there
is at least one common attribute that exists between two relations. In addition,
the attributes must have the same name and domain.

Natural join acts on those matching attributes where the values of attributes in
both the relations are same.

Prepared by : Dr. Ahmad Jamal


Prepared by : Dr. Ahmad Jamal
Prepared by : Dr. Ahmad Jamal
Outer Joins

Left Outer Join


(R Left Outer Join S)

All the tuples from the Left relation, R, are included in the resulting relation. If
there are tuples in R without any matching tuple in the Right relation S, then
the S-attributes of the resulting relation are made NULL.

Prepared by : Dr. Ahmad Jamal


Prepared by : Dr. Ahmad Jamal
Prepared by : Dr. Ahmad Jamal
Right Outer Join

All the tuples from the Right relation, S, are included in the resulting relation. If
there are tuples in S without any matching tuple in R, then the R-attributes of
resulting relation are made NULL.

Prepared by : Dr. Ahmad Jamal


Full Outer Join

All the tuples from both participating relations are included in the resulting
relation. If there are no matching tuples for both relations, their respective
unmatched attributes are made NULL.

Prepared by : Dr. Ahmad Jamal

You might also like