DBMS
DBMS
DBMS
Relational Model was proposed by E.F. Codd to model data in the form of relations or tables. After
designing the conceptual model of Database using ER diagram, we need to convert the conceptual
model in relational model which can be implemented using any RDMBS languages like Oracle
SQL, MySQL etc. So we will see what Relational Model is.
Relational Model represents how data is stored in Relational Databases. A relational database
stores data in the form of relations (tables). Consider a relation STUDENT with attributes
ROLL_NO, NAME, ADDRESS, PHONE and AGE shown in Table 1.
STUDENT
IMPORTANT TERMINOLOGIES
1. Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME
2. Relation Schema: A relation schema represents name of the relation with its attributes.
e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE and AGE) is relation schema
for STUDENT. If a schema has more than 1 relation, it is called Relational Schema.
3. Tuple: Each row in the relation is known as tuple. The above relation contains 4 tuples,
one of which is shown as:
4. Relation Instance: The set of tuples of a relation at a particular instance of time is called
as relation instance. Table 1 shows the relation instance of STUDENT at a particular
time. It can change whenever there is insertion, deletion or updation in the database.
5. Degree: The number of attributes in the relation is known as degree of the relation. The
STUDENT relation defined above has degree 5.
6. Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT
relation defined above has cardinality 4.
7. Column: Column represents the set of values for a particular attribute. The column
ROLL_NO is extracted from relation STUDENT.
ROLL_NO
1
2
3
4
8. NULL Values: The value which is not known or unavailable is called NULL value. It is
represented by blank space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
While designing Relational Model, we define some conditions which must hold for data present
in database are called Constraints. These constraints are checked before performing any
operation (insertion, deletion and updation) in database. If there is a violation in any of
constrains, operation will fail.
Domain Constraints: These are attribute level constraints. An attribute can only take values
which lie inside the domain range. e.g,; If a constrains AGE>0 is applied on STUDENT relation,
inserting negative value of AGE will result in failure.
Key Integrity: Every relation in the database should have atleast one set of attributes which
defines a tuple uniquely. Those set of attributes is called key. e.g.; ROLL_NO in STUDENT is a
key. No two students can have same roll number. So a key has two properties:
Referential Integrity: When one attribute of a relation can only take values from other attribute
of same relation or any other relation, it is called referential integrity. Let us suppose we have 2
relations
STUDENT
BRANCH
BRANCH_CODE BRANCH_NAME
CS COMPUTER SCIENCE
IT INFORMATION TECHNOLOGY
ELECTRONICS AND
ECE
COMMUNICATION ENGINEERING
CV CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take the values which are present in
BRANCH_CODE of BRANCH which is called referential integrity constraint. The relation
which is referencing to other relation is called REFERENCING RELATION (STUDENT in this
case) and the relation to which other relations refer is called REFERENCED RELATION
(BRANCH in this case).
ANOMALIES
An anomaly is an irregularity, or something which deviates from the expected or normal state.
When designing databases, we identify three types of anomalies: Insert, Update and Delete.
ON DELETE CASCADE: It will delete the tuples from REFERENCING RELATION if value
used by REFERENCING ATTRIBUTE is deleted from REFERENCED RELATION. e.g;, if we
delete a row from BRANCH with BRANCH_CODE CS, the rows in STUDENT relation with
BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be deleted.
Projection ()
Projection is used to project required column data from a relation.
Example :
R
(A B C)
----------
1 2 4
2 2 3
3 2 3
4 3 4
(BC)
B C
-----
2 4
2 3
3 4
Selection ()
Selection is used to select required tuples of the relations.
Note: selection operator only selects the required tuples but does not display them. For
displaying, data projection operator is used.
For the above selected tuples, to display we need to use projection also.
A B C
-------
1 2 4
4 3 4
Union (U)
Union operation in relational algebra is same as union operation in set theory, only constraint is
for union of two relation both relation must have same set of Attributes.
Rename ()
Rename is a unary operation used for renaming attributes of a relation.
(a/b)R will rename the attribute b of relation by a.
Cross product between two relations let say A and B, so cross product between A X B will
results all the attributes of A followed by each attribute of B. Each record of A will pairs with
every record of B.
A B
(Name Age Sex ) (Id Course)
------------------ -------------
Ram 14 M 1 DS
Sona 15 F 2 DBMS
kim 20 M
A X B
Name Age Sex Id Course
---------------------------------
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: if A has n tuples and B has m tuples then A X B will have n*m tuples.
Natural Join ()
Natural join is a binary operator. Natural join between two or more relations will result set of all
combination of tuples where they have equal common attribute.
Emp Dep
(Name Id Dept_name ) (Dept_name Manager)
------------------------ ---------------------
A 120 IT Sale Y
B 125 HR Prod Z
C 110 Sale IT A
D 111 IT
Emp Dep
Conditional Join
Conditional join works similar to natural join. In natural join, by default condition is equal
between common attribute while in conditional join we can specify the any condition such as
greater than, less than, not equal
R S
(ID Sex Marks) (ID Sex Marks)
------------------ --------------------
1 F 45 10 M 20
2 F 55 11 M 22
3 F 60 12 M 59
Relational Algebra is a procedural query language which takes relations as an input and returns
relation as an output. There are some basic operators which can be applied on relations to
produce required results which we will discuss one by one. We will use STUDENT,
STUDENT_SPORTS and EMPLOYEE relations as given in Table 1, Table 2 and Table 3
respectively to understand the various operators.
STUDENT
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 1
STUDENT_SPORTS
Table 2
EMPLOYEE
Table 3
Selection operator (): Selection operator is used to select tuples from a relation based on some
condition. Syntax:
(Cond)(Relation Name)
Extract students whose age is greater than 18 from STUDENT relation given in Table 1
(AGE>18)(STUDENT)
RESULT:
Projection Operator (): Projection operator is used to project particular columns from a
relation. Syntax:
(ROLL_NO,NAME)(STUDENT)
RESULT:
ROLL_NO NAME
1 RAM
2 RAMESH
3 SUJIT
4 SURESH
Note: If resultant relation after projection has duplicate rows, it will be removed. For
Example: (ADDRESS)(STUDENT) will remove one duplicate row with value DELHI and return
three rows.
Cross Product(X): Cross product is used to join two relations. For every row of Relation1, each
row of Relation2 is concatenated. If Relation1 has m tuples and and Relation2 has n tuples, cross
product of Relation1 and Relation2 will have m X n tuples. Syntax:
Relation1 X Relation2
To apply Cross Product on STUDENT relation given in Table 1 and STUDENT_SPORTS
relation given in Table 2,
STUDENT X STUDENT_SPORTS
RESULT:
Union (U): Union on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible (These two relation should have same number of attributes and corresponding
attributes in two relations have same domain) . Union operator when applied on two relations R1
and R2 will give a relation with tuples which are either in R1 or in R2. The tuples which are in
both R1 and R2 will appear only once in result relation. Syntax:
Relation1 U Relation2
Find person who are either student or employee, we can use Union operator like:
STUDENT U EMPLOYEE
RESULT:
Minus (-): Minus on two relations R1 and R2 can only be computed if R1 and R2 are union
compatible. Minus operator when applied on two relations as R1-R2 will give a relation with
tuples which are in R1 but not in R2. Syntax:
Relation1 - Relation2
Find person who are student but not employee, we can use minus operator like:
STUDENT - EMPLOYEE
RESULT:
(Relation2, Relation1)
(STUDENT1, STUDENT)
If you want to create a relation STUDENT_NAMES with ROLL_NO and NAME from
STUDENT, it can be done using rename operator as:
Relational Model
Extended operators are those operators which can be derived from basic operators.There are
mainly three types of extended operators in Relational Algebra:
Join
Intersection
Divide
STUDENT
Table 1
STUDENT_SPORTS
ROLL_NO SPORTS
1 Badminton
2 Cricket
2 Badminton
4 Badminton
Table 2
ALL_SPORTS
SPORTS
Badminton
Cricket
Table 3
EMPLOYEE
Table 4
Intersection (): Intersection on two relations R1 and R2 can only be computed if R1 and R2
are union compatible (These two relation should have same number of attributes and
corresponding attributes in two relations have same domain). Intersection operator when applied
on two relations as R1R2 will give a relation with tuples which are in R1 as well as R2. Syntax:
Relation1 Relation2
Example: Find a person who is student as well as employee- STUDENT
EMPLOYEE
RESULT:
Conditional Join(c): Conditional Join is used when you want to join two or more relation
based on some conditions. Example: Select students whose ROLL_NO is greater than EMP_NO
of employees
STUDENTc STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
(STUDENT.ROLL_NO>EMPLOYEE.EMP_NO)(STUDENTEMPLOYEE)
RESULT:
Equijoin: Equijoin is a special case of conditional join where only equality condition holds
between a pair of attributes. As values of two attributes will be equal in result of equijoin, only
one attribute will be appeared in result.
STUDENTSTUDENT.ROLL_NO=EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
Natural Join(): It is a special case of equijoin in which equality condition hold on all attributes
which have same name in relations R and S (relations on which join operation is applied). While
applying natural join on two relations, there is no need to write equality condition explicitly.
Natural Join will also return the similar attributes only once as their value will be same in
resulting relation.
STUDENTSTUDENT_SPORTS
Natural Join is by default inner join because the tuples which does not satisfy the conditions of
join does not appear in result set. e.g.; The tuple having ROLL_NO 3 in STUDENT does not
match with any tuple in STUDENT_SPORTS, so it has not been a part of result set.
Left Outer Join : When applying join on two relations R and S, some tuples of R or S does
not appear in result set which does not satisfy the join conditions. But Left Outer Joins gives all
tuples of R in the result set. The tuples of R which do not satisfy join condition will have values
as NULL for attributes of S.
Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details of
other students as well
STUDENT STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT
Right Outer Join : When applying join on two relations R and S, some tuples of R or S
does not appear in result set which does not satisfy the join conditions. But Right Outer Joins
gives all tuples of S in the result set. The tuples of S which do not satisfy join condition will have
values as NULL for attributes of R.
Example: Select students whose ROLL_NO is greater than EMP_NO of employees and details
of other Employees as well
STUDENT STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
Full Outer Join : When applying join on two relations R and S, some tuples of R or S does
not appear in result set which does not satisfy the join conditions. But Full Outer Joins gives all
tuples of S and all tuples of R in the result set. The tuples of S which do not satisfy join condition
will have values as NULL for attributes of R and vice versa.
Example:Select students whose ROLL_NO is greater than EMP_NO of employees and details of
other Employees as well and other Students as well
STUDENT STUDENT.ROLL_NO>EMPLOYEE.EMP_NOEMPLOYEE
RESULT:
Division Operator (): Division operator AB can be applied if and only if:
Consider the relation STUDENT_SPORTS and ALL_SPORTS given in Table 2 and Table 3
above.
STUDENT_SPORTS ALL_SPORTS
ROLL_NO
2
Database Management System | Dependency
Preserving Decomposition
Dependency Preservation
Problem: Let a relation R (A, B, C, D ) and functional dependency {AB > C, C > D, D >
A}. Relation R is decomposed into R1( A, B, C) and R2(C, D). Check whether decomposition is
dependency preserving or not.
Solution:
closure(A) = { A } // Trivial
closure(B) = { B } // Trivial
closure(C) = {C, A, D} but D can't be in closure as D is not present R1.
= {C, A}
C--> A // Removing C from right side as it is trivial attribute
closure(AB) = {A, B, C, D}
= {A, B, C}
AB --> C // Removing AB from right side as these are trivial attributes
closure(BC) = {B, C, D, A}
= {A, B, C}
BC --> A // Removing BC from right side as these are trivial attributes
closure(AC) = {A, C, D}
AC --> D // Removing AC from right side as these are trivial attributes
F1 {C--> A, AB --> C, BC --> A}.
Similarly F2 { C--> D }
A functional dependency X->Y in a relation holds if two tuples having same value for X also have
same value for Y i.e X uniquely determines Y.
FD E-ID->E-NAME holds because for each E-ID, there is a unique value of E-NAME.
FD E-ID->E-CITY and E-CITY->E-STATE also holds.
FD E-NAME->E-ID does not hold because E-NAME John is not uniquely determining
E-ID. There are 2 E-IDs corresponding to John (E001 and E003).
EMPLOYEE
Table 1
Trivial versus Non-Trivial Functional Dependency: A trivial functional dependency is the one
which will always hold in a relation.
Let X, Y, and Z are sets of attributes in a relation R. There are several properties of functional
dependencies which always hold in R also known as Armstrong Axioms.
1. Add A to S.
2. Recursively add attributes which can be functionally determined from attributes of the set
S until done.
(E-NAME)+ = {E-NAME}
(E-CITY)+ = {E-CITY, E_STATE}
Q. Find the attribute closures of given FDs R(ABCDE) = {AB->C, B->D, C->E, D->A} To
find (B)+ ,we will add attribute in set using various FD which has been shown in table below.
We can find (C, D)+ by adding C and D into the set (triviality) and then E
using(C->E) and then A using (D->A) and set becomes.
(C,D)+ = {C,D,E,A}
Similarly we can find (B,C)+ by adding B and C into the set (triviality) and
then D using (B->D) and then E using (C->E) and then A using (D->A)
and set becomes
(B,C)+ ={B,C,D,E,A}
Candidate Key
Candidate Key is minimal set of attributes of a relation which can be used to identify a tuple
uniquely. For Example, each tuple of EMPLOYEE relation given in Table 1 can be uniquely
identified by E-ID and it is minimal as well. So it will be Candidate key of the relation.
Super Key
Super Key is set of attributes of a relation which can be used to identify a tuple uniquely.For
Example, each tuple of EMPLOYEE relation given in Table 1 can be uniquely identified by E-
ID or (E-ID, E-NAME) or (E-ID, E-CITY) or (E-ID, E-STATE) or (E_ID, E-NAME, E-
STATE) etc. So all of these are super keys of EMPLOYEE relation.
Note: A candidate key is always a super key but vice versa is not true.
Q. Finding Candidate Keys and Super Keys of a Relation using FD set The set of attributes
whose attribute closure is set of all attributes of relation is called super key of relation. For
Example, the EMPLOYEE relation shown in Table 1 has following FD set. {E-ID->E-NAME,
E-ID->E-CITY, E-ID->E-STATE, E-CITY->E-STATE} Let us calculate attribute closure of
different set of attributes:
The minimal set of attributes whose attribute closure is set of all attributes of relation is called
candidate key of relation. As shown above, (E-ID)+ is set of all attributes of relation and it is
minimal. So E-ID will be candidate key. On the other hand (E-ID, E-NAME)+ also is set of all
attributes but it is not minimal because its subset (E-ID)+ is equal to set of all attributes. So (E-ID,
E-NAME) is not a candidate key.
Database Normalization | Introduction
Database normalization is the process of organizing the attributes of database to reduce or
eliminate data redundancy (having same data but at different places) .
Functional Dependency
Functional Dependency is a constraint between two sets of attributes in a relation from a
database. Functional dependency is denoted by arrow (). If an attributed A functionally
determines B, then it is written as A B.
For example employee_id name means employee_id functionally determines name of
employee. As another example in a time table database, {student_id, time} {lecture_room},
student ID and time determine the lecture room where student should be.
For example in the below table A B is true, but B A is not true as there are different values
of A for B = 3.
A B
------
1 3
2 3
4 0
1 3
4 0
ABC --> AB
ABC --> A
ABC --> ABC
Non Trivial Functional Dependencies
X > Y is a non trivial functional dependencies when Y is not a subset of X.
Id --> Name,
Name --> DOB
Normal form
Normal forms are used to eliminate or reduce redundancy in database tables.
Example :
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M C2, c3
A relation is in 2NF iff it has No Partial Dependency, i.e., no non-prime attribute (attributes
which are not part of any candidate key) is dependent on any proper subset of any candidate key
of the table.
In the above relation, AB is the only candidate key and there is no partial dependency, i.e., any
proper subset of AB doesnt determine any non-prime attribute.
All possible candidate keys in above relation are {A, E, CD, BC}
All attribute are on right sides of all functional dependencies are prime.
BCNF
A relation is in BCNF iff in every non-trivial functional dependency X > Y, X is a super key.
Key Points
ABC --> D
CD --> AE
BCNF: ABC -> D is in BCNF. Let us check CD -> AE, CD is not a super key so this
dependency is not in BCNF. So, R is not in BCNF.
3NF: ABC -> D we dont need to check for this dependency as it already satisfied BCNF. Let us
consider CD -> AE. Since E is not a prime attribute, so relation is not in 3NF.
2NF: In 2NF, we need to check for partial dependency. CD which is a proper subset of a
candidate key and it determine E, which is non prime attribute. So, given relation is also not in 2
NF.
Given a Relation with different FD sets for that relation, we have to find out whether one FD set
is subset of other or both are equal.
1. If all FDs of FD1 can be derived from FDs present in FD2, we can say that FD2 FD1.
2. If all FDs of FD2 can be derived from FDs present in FD1, we can say that FD1 FD2.
3. If 1 and 2 both are true, FD1=FD2.
All these three cases can be shown using Venn diagram as:
Q. Let us take an example to show the relationship between two FD sets. A relation
R(A,B,C,D) having two FD sets FD1 = {A->B, B->C, AB->D} and FD2 = {A->B, B->C, A-
>C, A->D}
As all FDs in set FD1 also hold in set FD2, FD2 FD1 is true.
As all FDs in set FD2 also hold in set FD1, FD1 FD2 is true.
Step 3. As FD2 FD1 and FD1 FD2 both are true FD2 =FD1 is true. These two FD sets are
semantically equivalent.
Q. Let us take another example to show the relationship between two FD sets. A relation
R2(A,B,C,D) having two FD sets FD1 = {A->B, B->C,A->C} and FD2 = {A->B, B->C, A-
>D}
As all FDs in set FD1 also hold in set FD2, FD2 FD1 is true.
As all FDs in set FD2 do not hold in set FD1, FD2 FD1.
Step 3. In this case, FD2 FD1 and FD2 FD1, these two FD sets are not semantically
equivalent.
DBMS | How to find the highest normal form
of a relation
To understand this topic, you should have basic idea about
Normal forms
Example 1. Find the highest normal form of a relation R(A,B,C,D,E) with FD set {A->D, B-
>A, BC->D, AC->BE}
Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can determine all attribute of
relation, So AC will be candidate key. A can be derived from B, so we can replace A in AC by B.
So BC will also be a candidate key. So there will be two candidate keys {AC, BC}.
Step 2. Prime attribute are those attribute which are part of candidate key {A,B,C} in this example
and others will be non-prime {D,E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or
composite attribute.
The relation is not in 2nd Normal form because A->D is partial dependency (A which is subset of
candidate key AC is determining non-prime attribute D) and 2nd normal form does not allow partial
dependency.
Example 2. Find the highest normal form of a relation R(A,B,C,D,E) with FD set as {BC->D,
AC->BE, B->E}
Step 1. As we can see, (AC)+ ={A,C,B,E,D} but none of its subset can determine all attribute
of relation, So AC will be candidate key. A or C cant be derived from any other attribute of the
relation, so there will be only 1 candidate key {AC}.
Step 2. Prime attribute are those attribute which are part of candidate key {A,C} in this example
and others will be non-prime {B,D,E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or
composite attribute.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not proper subset
of candidate key AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is in
2nd normal form (B is not a proper subset of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a
prime attribute) and in B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd
normal for, either LHS of an FD should be super key or RHS should be prime attribute.
Example 3. Find the highest normal form of a relation R(A,B,C,D,E) with FD set {B->A, A-
>C, BC->D, AC->BE}
Step 1. As we can see, (B)+ ={B,A,C,D,E}, so B will be candidate key. B can be derived from
AC using AC->B (Decomposing AC->BE to AC->B and AC->E). So AC will be super key but
(C)+ ={C} and (A)+ ={A,C,B,E,D}. So A (subset of AC) will be candidate key. So there will be
two candidate keys {A,B}.
Step 2. Prime attribute are those attribute which are part of candidate key {A,B} in this example
and others will be non-prime {C,D,E} in this example.
Step 3. The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or
composite attribute.
The relation is in 2nd normal form because B->A is in 2nd normal form (B is a super key) and A-
>C is in 2nd normal form (A is super key) and BC->D is in 2nd normal form (BC is a super key)
and AC->BE is in 2nd normal form (AC is a super key).
The relation is in 3rd normal form because LHS of all FDs are super keys. The relation is in
BCNF as all LHS of all FDs are super keys. So the highest normal form is BCNF.
Database Management System | ER Model
ER Model is used to model the logical view of the system from data perspective which consists
of these components:
An Entity may be an object with a physical existence a particular person, car, house, or
employee or it may be an object with a conceptual existence a company, a job, or a university
course.
An Entity is an object of Entity Type and set of all entities is called as entity set. e.g.; E1 is an
entity having Entity Type Student and set of all students is called Entity Set. In ER diagram,
Entity Type is represented as:
Attribute
Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB,
Age, Address, Mobile_No are the attributes which defines entity type Student. In ER diagram,
attribute is represented by an oval.
Key Attribute
The attribute which uniquely identifies each entity in the entity set is called key attribute.For
example, Roll_No will be unique for each student. In ER diagram, key attribute is represented by
Multivalued
Attribute
An attribute consisting more than one value for a given entity. For example, Phone_No (can be
more than one for a given student). In ER diagram, multivalued attribute is represented by
double oval.
Derived Attribute
An attribute which can be derived from other attributes of the entity type is known as derived
attribute. e.g.; Age (can be derived from DOB). In ER diagram, derived attribute is represented
by dashed oval.
The complete entity type Student with its attributes can be represented
as:
Relationship Type and Relationship Set
A relationship type represents the association between entity types. For example,Enrolled in
is a relationship type that exists between entity type Student and Course. In ER diagram,
relationship type is represented by a diamond and connecting the entities with lines.
A set of relationships of same type is known as relationship set. The following relationship set
depicts S1 is enrolled in C2, S2 is enrolled in C1 and S3 is enrolled in C3.
Unary Relationship
When there is only ONE entity set participating in a relation, the relationship is called as
unary relationship. For example, one person is married to only one person.
Binary Relationship
When there are TWO entities set participating in a relation, the relationship is called as binary
relationship.For example, Student is enrolled in Course.
n-ary Relationship
When there are n entities set participating in a relation, the relationship is called as n-ary
relationship.
Cardinality
The number of times an entity of an entity set participates in a relationship set is known as
cardinality. Cardinality can be of different types:
one to one: When each entity in each entity set can take part only once in the relationship, the
cardinality is one to one. Let us assume that a male can marry to one female and a female can
marry to one male. So the relationship will be one to one.
Using Sets, it can be represented as:
many to one: When entities in one entity set can take part only once in the relationship set
and entities in other entity set can take part more than once in the relationship set, cardinality
is many to one. Let us assume that a student can take only one course but one course can be taken
by many students. So the cardinality will be n to 1. It means that for one course there can be n
students but for one student, there will be only one course.
Using Sets, it
can be represented as:
In this case, each student is taking only 1 course but 1 course has been taken by many students.
many to many: When entities in all entity sets can take part more than once in the
relationship cardinality is many to many. Let us assume that a student can take more than one
course and one course can be taken by many students. So the relationship will be many to many.
In this example, student S1 is enrolled in C1 and C3 and Course C3 is enrolled by S1, S3 and S4.
So it is many to many relationships.
Participation Constraint
Total Participation: Each entity in the entity set must participate in the relationship. If each
student must enroll in a course, the participation of student will be total. Total participation is
shown by double line in ER diagram.
Partial Participation: The entity in the entity set may or may NOT participate in the
relationship. If some courses are not enrolled by any of the student, the participation of course
will be partial.
The diagram depicts the Enrolled in relationship set with Student Entity set having total
participation and Course Entity set having partial participation.
Using set, it can be represented as,
Every student in Student Entity set is participating in relationship but there exists a course C4
which is not taking part in the relationship.
As discussed before, an entity type has a key attribute which uniquely identifies each entity in
the entity set. But there exists some entity type for which key attribute cant be defined.
These are called Weak Entity type.
For example, A company may store the information of dependants (Parents, Children, Spouse) of
an Employee. But the dependents dont have existence without the employee. So Dependent will
be weak entity type and Employee will be Identifying Entity type for Dependant.
A weak entity type is represented by a double rectangle. The participation of weak entity type is
always total. The relationship between weak entity type and its identifying strong entity type is
called identifying relationship and it is represented by double diamond.
Minimization of ER Diagram
Entity Relationship (ER) Diagram is diagrammatic representation of data in databases, it shows
how data is related.
Note: This article for those who already know what is ER diagram and how to draw ER diagram.
For Student(SID, Name), SID is the primary key. For Course ( CID, C_name ), CID is the
primary key
Student Course
(SID Name) ( CID C_name )
-------------- -----------------
1 A c1 Z
2 B c2 Y
3 C c3 X
4 D
Enroll
(SID CID)
----------
1 C1
2 C1
3 c3
4 C2
Now the question is, what should be the primary key for Enroll SID or CID or combined. We
cant have CID as primary key as you can see in enroll for the same CID we have multiples
SID. (SID , CID) can distinguish table uniquely, but it is not minimum. So SID is the primary
key for the relation enroll.
Student
Enroll
Course
Student_Enroll
( SID Name CID )
---------------------
1 A c1
2 B c1
3 C c3
4 D c2
Enroll
( SID CID )
----------
1 C1
1 C2
2 C1
2 C2
3 c3
4 C2
Now, same question what is the primary key of Enroll relation, if we carefully analyse the Enroll
primary key for Enroll
table is ( SID , CID ).
But in this case we cant merge Enroll table with any one of Student and Course. If we try to
merge Enroll with any one of the Student and Course it will create redundant data.
Primary key of R can be A1 or B1, but we cant still combine all the three table into one. if we
do, so some entries in combined table may have NULL entries. So idea of merging all three table
into one is not good.
ER model
Relation model
After designing the ER diagram of system, we need to convert it to Relational models which can
directly be implemented by any RDBMS like Oracle, MySQL etc. In this article we will discuss
how to convert ER diagram to Relational Model for different scenarios.
Case 1: Binary Relationship with 1:1 cardinality with total participation of an entity
A
person has 0 or 1 passport number and Passport is always owned by 1 person. So it is 1:1
cardinality with full participation constraint from Passport.
First Convert each entity and relationship to tables. Person table corresponds to Person
Entity with key as Per-Id. Similarly Passport table corresponds to Passport Entity with key as
Pass-No. Has Table represents relationship between Person and Passport (Which person has
which passport). So it will take attribute Per-Id from Person and Pass-No from Passport.
As we can see from Table 1, each Per-Id and Pass-No has only one entry in Has table. So we can
merge all three tables into 1 with attributes shown in Table 2. Each Per-Id will be unique and not
null. So it will be the key. Pass-No cant be key because for some person, it can be NULL.
Other
Other
Per-Id Person Pass-No
PassportAttribute
Attribute
Table 2
Case 2: Binary Relationship with 1:1 cardinality and partial participation of both entities
A male marries 0 or 1 female and vice versa as well. So it is 1:1 cardinality with partial
participation constraint from both. First Convert each entity and relationship to tables. Male
table corresponds to Male Entity with key as M-Id. Similarly Female table corresponds to
Female Entity with key as F-Id. Marry Table represents relationship between Male and Female
(Which Male marries which female). So it will take attribute M-Id from Male and F-Id from
Female.
Table 3
As we can see from Table 3, some males and some females do not marry. If we merge 3 tables
into 1, for some M-Id, F-Id will be NULL and for some F-Id, M-Id will be NULL. So there is no
attribute which is always not NULL. So we cant merge all three tables into 1. We can convert
into 2 tables. In table 4, M-Id who are married will have F-Id associated. For others, it will be
NULL. Table 5 will have information of all females. Primary Keys have been underlined.
Other
M-Id Male F-Id
Attribute
Table 4
Other
F-Id
FemaleAttribute
Table 5
Note: Binary relationship with 1:1 cardinality will have 2 table if partial participation of both
entities in the relationship. If atleast 1 entity has total participation, number of tables required
will be 1.
In this
scenario, every student can enroll only in one elective course but for an elective course there can
be more than one student. First Convert each entity and relationship to tables. Student table
corresponds to Student Entity with key as S-Id. Similarly Elective_Course table corresponds to
Elective_Course Entity with key as E-Id. Enrolls Table represents relationship between Student
and Elective_Course (Which student enrolls in which course). So it will take attribute S-Id from
Person and E-Id from Passport.
Table 6
As we can see from Table 6, S-Id is not repeating in Enrolls Table. So it can be considered as a
key of Enrolls table. Both Student and Enrolls Tables key is same; we can merge it as a single
table. The resultant tables are shown in Table 7 and Table 8. Primary Keys have been underlined.
Other Student
S-Id E-Id
Attribute
Table 7
Other Elective
E-Id
CourseAttribute
Table 8
In this scenario, every student can enroll in more than 1 compulsory course and for a compulsory
course there can be more than 1 student. First Convert each entity and relationship to
tables. Student table corresponds to Student Entity with key as S-Id. Similarly
Compulsory_Courses table corresponds to Compulsory Courses Entity with key as C-Id. Enrolls
Table represents relationship between Student and Compulsory_Courses (Which student enrolls
in which course). So it will take attribute S-Id from Person and C-Id from Compulsory_Courses.
Table 9
As we can see from Table 9, S-Id and C-Id both are repeating in Enrolls Table. But its
combination is unique; so it can be considered as a key of Enrolls table. All tables keys are
different, these cant be merged. Primary Keys of all tables have been underlined.
In this scenario, an employee can have many dependants and one dependant can depend on one
employee. A dependant does not have any existence without an employee (e.g; you as a child can
be dependant of your father in his company). So it will be a weak entity and its participation will
always be total. Weak Entity does not have key of its own. So its key will be combination of key
of its identifying entity (E-Id of Employee in this case) and its partial key (D-Name).
First Convert each entity and relationship to tables. Employee table corresponds to Employee
Entity with key as E-Id. Similarly Dependants table corresponds to Dependant Entity with key
as D-Name and E-Id. Has Table represents relationship between Employee and Dependants
(Which employee has which dependants). So it will take attribute E-Id from Employee and D-
Name from Dependants.
Table 10
As we can see from Table 10, E-Id, D-Name is key for Has as well as Dependants Table. So we
can merge these two into 1. So the resultant tables are shown in Tables 11 and 12. Primary Keys
of all tables have been underlined.
Other Employee
E-Id
Attribute
Table 11
Other
D-Name E-Id
DependantsAttribute
Table 12
Structured Query Language (SQL)
Structured Query Language is a standard Database language which is used to create, maintain
and retrieve the relational database.
Relational database means the data is stored as well as retrieved in the form of relations (tables).
Table 1 shows the relational database with only one relation called STUDENT which stores
ROLL_NO, NAME, ADDRESS, PHONE and AGE of students.
STUDENT
TABLE 1
These are some important terminologies that are used in terms of relation.
Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME etc.
Tuple: Each row in the relation is known as tuple. The above relation contains 4 tuples, one of
which is shown as:
Degree: The number of attributes in the relation is known as degree of the relation. The
STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation
defined above has cardinality 4.
Column: Column represents the set of values for a particular attribute. The column ROLL_NO
is extracted from relation STUDENT.
ROLL_NO
1
2
3
4
Data Definition Language: It is used to define the structure of the database. e.g; CREATE
TABLE, ADD COLUMN, DROP COLUMN and so on.
Data Manipulation Language: It is used to manipulate data in the relations. e.g.; INSERT,
DELETE, UPDATE and so on.
Data Query Language: It is used to extract the data from the relations. e.g.; SELECT
So first we will consider the Data Query Language. A generic query to retrieve from a relational
database is:
Part of the query represented by statement 1 is compulsory if you want to retrieve from a
relational database. The statements written inside [] are optional. We will look at the possible
query combination on relation shown in Table 1.
Case 1: If we want to retrieve attributes ROLL_NO and NAME of all students, the query will
be:
Case 2: If we want to retrieve ROLL_NO and NAME of the students whose ROLL_NO is
greater than 2, the query will be:
CASE 3: If we want to retrieve all attributes of students, we can write * in place of writing all
attributes as:
SELECT * FROM STUDENT
WHERE ROLL_NO>2;
ROLL_NO NAME ADDRESS PHONE AGE
3 SUJIT ROHTAK 9156253131 20
4 SURESH DELHI 9156768971 18
CASE 4: If we want to represent the relation in ascending order by AGE, we can use ORDER
BY clause as:
Note: ORDER BY AGE is equivalent to ORDER BY AGE ASC. If we want to retrieve the
results in descending order of AGE, we can use ORDER BY AGE DESC.
If DISTINCT is not used, DELHI will be repeated twice in result set. Before understanding
GROUP BY and HAVING, we need to understand aggregations functions in SQL.
COUNT: Count function is used to count the number of rows in a relation. e.g;
SUM: SUM function is used to add the values of an attribute in a relation. e.g;
In the same way, MIN, MAX and AVG can be used. As we have seen above, all aggregation
functions return only 1 row.
GROUP BY: Group by is used to group the tuples of a relation based on an attribute or group of
attribute. It is always combined with aggregation function which is computed on group. e.g.;
In this query, SUM(AGE) will be computed but not for entire table but for each address. i.e.;
sum of AGE for address DELHI(18+18=36) and similarly for other address as well. The output
is:
ADDRESS SUM(AGE)
DELHI 36
GURGAON 18
ROHTAK 20
If we try to execute the query given below, it will result in error because we have computed
SUM(AGE) for each address and there can be more than 1 student for each address. So it cant
be displayed in result set.
NOTE: An attribute which is not a part of GROUP BY clause cant be used for selection. Any
attribute which is part of GROUP BY CLAUSE can be used for selection but it is not mandatory.
Student Table
StudentCourse Table
CourseID EnrollNo
1 1001
2 1001
3 1001
1 1002
2 1003
Following is join query that shows names of students enrolled in different courseIDs.
SELECT Student.StudentName,
StudentCourse.CourseID
FROM Student
INNER JOIN StudentCourse
ON StudentCourse.EnrollNo = Student.EnrollNo
ORDER BY StudentCourse.CourseID
Note: INNER is optional above. Simple JOIN is also considered as INNER JOIN
CourseID StudentName
1 geek1
1 geek2
2 geek1
2 geek3
3 geek1
1) Left outer join returns all rows of table on left side of join. The rows for which there is no
matching row on right side, result contains NULL in the right side.
SELECT Student.StudentName,
StudentCourse.CourseID
FROM Student
LEFT OUTER JOIN StudentCourse
ON StudentCourse.EnrollNo = Student.EnrollNo
ORDER BY StudentCourse.CourseID
Note: OUTER is optional above. Simple LEFT JOIN is also considered as LEFT OUTER JOIN
StudentName CourseID
geek4 NULL
geek2 1
geek1 1
geek1 2
geek3 2
geek1 3
2) Right Outer Join is similar to Left Outer Join (Right replaces Left everywhere)
3) Full Outer Join Contains results of both Left and Right outer joins.
The where clause works on rows data, not on aggregated data. Let us consider below table
Marks.
a c1 40
a c2 50
b c3 60
d c1 70
e c2 80
Student Total
a 90
b 60
d 70
e 80
Student Total
a 90
e 80
Note: It is not a predefined rule but in a good number of the SQL queries, we use WHERE prior
to GROUP BY and HAVING after GROUP BY. The Where clause acts as a pre filter where as
Having as a post filter.
DBMS | Nested Queries in SQL
Prerequisites : Basics of SQL
In nested queries, a query is written inside a query. The result of inner query is used in execution
of outer query. We will use STUDENT, COURSE, STUDENT_COURSE tables for
understanding nested queries.
STUDENT
COURSE
C_ID C_NAME
C1 DSA
C2 Programming
C3 DBMS
STUDENT_COURSE
S_ID C_ID
S1 C1
S1 C3
S2 C1
S3 C2
S4 C2
S4 C3
There are mainly two types of nested queries:
IN: If we want to find out S_ID who are enrolled in C_NAME DSA or DBMS, we
can write it with the help of independent nested query and IN operator. From COURSE
table, we can find out C_ID for C_NAME DSA or DBMS and we can use these
C_IDs for finding S_IDs from STUDENT_COURSE TABLE.
The inner query will return a set with members C1 and C3 and outer query will return
those S_IDs for which C_ID is equal to any member of set (C1 and C3 in this case). So,
it will return S1, S2 and S4.
Note: If we want to find out names of STUDENTs who have either enrolled in DSA or
DBMS, it can be done as:
The innermost query will return a set with members C1 and C3. Second inner query will
return those S_IDs for which C_ID is equal to any member of set (C1 and C3 in this
case) which are S1, S2 and S4. The outermost query will return those S_IDs where S_ID
is not a member of set (S1, S2 and S4). So it will return S3.
Co-related Nested Queries: In co-related nested queries, the output of inner query
depends on the row which is being currently executed in outer query. e.g.; If we want to
find out S_NAME of STUDENTs who are enrolled in C_ID C1, it can be done with
the help of co-related nested query as:
For each row of STUDENT S, it will find the rows from STUDENT_COURSE where
S.S_ID = SC.S_ID and SC.C_ID=C1. If for a S_ID from STUDENT S, atleast a row
exists in STUDENT_COURSE SC with C_ID=C1, then inner query will return true
and corresponding S_ID will be returned as output.
ACID Properties in DBMS
A transaction is a single logical unit of work which accesses and possibly modifies the contents
of a database. Transactions access data using read and write operations.
In order to maintain consistency in a database, before and after transaction, certain properties are
followed. These are called ACID properties.
Atomicity
By this, we mean that either the entire transaction takes place at once or doesnt happen at all.
There is no midway i.e. transactions do not occur partially. Each transaction is considered as one
unit and either runs to completion or is not executed at all. It involves following two operations.
Abort: If a transaction aborts, changes made to database are not visible.
Commit: If a transaction commits, changes made are visible.
Atomicity is also known as the All or nothing rule.
Consider the following transaction T consisting of T1 and T2: Transfer of 100 from account X to
account Y.
If the transaction fails after completion of T1 but before completion of T2.( say, after write(X)
but before write(Y)), then amount has been deducted from X but not added to Y. This results in
an inconsistent database state. Therefore, the transaction must be executed in entirety in order to
ensure correctness of database state.
Consistency
This means that integrity constraints must be maintained so that the database is consistent before
and after the transaction. It refers to correctness of a database. Referring to the example above,
The total amount before and after the transaction must be maintained.
Total before T occurs = 500 + 200 = 700.
Total after T occurs = 400 + 300 = 700.
Therefore, database is consistent. Inconsistency occurs in case T1 completes but T2 fails. As a
result T is incomplete.
Isolation
This property ensures that multiple transactions can occur concurrently without leading to
inconsistency of database state. Transactions occur independently without interference. Changes
occurring in a particular transaction will not be visible to any other transaction until that
particular change in that transaction is written to memory or has been committed. This property
ensures that the execution of transactions concurrently will result in a state that is equivalent to a
state achieved these were executed serially in some order.
Let X= 500, Y = 500.
Consider two transactions T and T.
Suppose T has been executed till Read (Y) and then T starts. As a result , interleaving of
operations takes place due to which T reads correct value of X but incorrect value of Y and
sum computed by
T: (X+Y = 50, 000+500=50, 500)
is thus not consistent with the sum at end of transaction:
T: (X+Y = 50, 000 + 450 = 50, 450).
This results in database inconsistency, due to a loss of 50 units. Hence, transactions must take
place in isolation and changes should be visible only after a they have been made to the main
memory.
Durability:
This property ensures that once the transaction has completed execution, the updates and
modifications to the database are stored in and written to disk and they persist even is system
failure occurs. These updates now become permanent and are stored in a non-volatile memory.
The effects of the transaction, thus, are never lost.
The ACID properties, in totality, provide a mechanism to ensure correctness and consistency of
a database in a way such that each transaction is a group of operations that acts a single unit,
produces consistent results, acts in isolation from other operations and updates that it makes are
durably stored.
What is Transaction?
A set of logically related operations is known as transaction. The main operations of a transaction
are:
Read(A): Read operations Read(A) or R(A) reads the value of A from the database and stores it
in a buffer in main memory.
Write (A): Write operation Write(A) or W(A) writes the value back to the database from buffer.
Let us take a debit transaction from an account which consists of following operations:
1. R(A);
2. A=A-1000;
3. W(A);
The first operation reads the value of A from database and stores it in a buffer.
Second operation will decrease its value by 1000. So buffer will contain 4000.
Third operation will write the value from buffer to database. So As final value will be
4000.
But it may also be possible that transaction may fail after executing some of its operations. The
failure can be because of hardware, software or power etc. For example, if debit transaction
discussed above fails after executing operation 2, the value of A will remain 5000 in the database
which is not acceptable by the bank. To avoid this, Database has two important operations:
Commit: After all instructions of a transaction are successfully executed, the changes made by
transaction are made permanent in the database.
Rollback: If a transaction is not able to execute all operations successfully, all the changes made
by transaction are undone.
Properties of a transaction
Atomicity: As a transaction is set of logically related operations, either all of them should be
executed or none. A debit transaction discussed above should either execute all three operations
or none.If debit transaction fails after executing operation 1 and 2 then its new value 4000 will not
be updated in the database which leads to inconsistency.
Consistency: If operations of debit and credit transactions on same account are executed
concurrently, it may leave database in an inconsistent state.
For Example, T1 (debit of Rs. 1000 from A) and T2 (credit of 500 to A) executing
concurrently, the database reaches inconsistent state.
Let us assume Account balance of A is Rs. 5000. T1 reads A(5000) and stores the value in
its local buffer space. Then T2 reads A(5000) and also stores the value in its local buffer
space.
T1 performs A=A-1000 (5000-1000=4000) and 4000 is stored in T1 buffer space. Then T2
performs A=A+500 (5000+500=5500) and 5500 is stored in T2 buffer space. T1 writes the
value from its buffer back to database.
As value is updated to 4000 in database and then T2 writes the value from its buffer back
to database. As value is updated to 5500 which shows that the effect of debit transaction
is lost and database has become inconsistent.
To maintain consistency of database, we need concurrency control protocols which will
be discussed in next article. The operations of T1 and T2 with their buffers and database
have been shown in Table 1.
Table 1
Isolation: Result of a transaction should not be visible to others before transaction is committed.
For example, Let us assume that As balance is Rs. 5000 and T1 debits Rs. 1000 from A. As new
balance will be 4000. If T2 credits Rs. 500 to As new balance, A will become 4500 and after this
T1 fails. Then we have to rollback T2 as well because it is using value produced by T1. So a
transaction results are not made visible to other transactions before it commits.
Durable: Once database has committed a transaction, the changes made by the transaction should
be permanent. e.g.; If a person has credited $500000 to his account, bank cant say that the update
has been lost. To avoid this problem, multiple copies of database are stored at different locations.
What is a Schedule?
A schedule is series of operations from one or more transactions. A schedule can be of two types:
Serial Schedule: When one transaction completely executes before starting another
transaction, the schedule is called serial schedule. A serial schedule is always consistent.
e.g.; If a schedule S has debit transaction T1 and credit transaction T2, possible serial
schedules are T1 followed by T2 (T1->T2) or T2 followed by T1 ((T1->T2). A serial
schedule has low throughput and less resource utilization.
Concurrent Schedule: When operations of a transaction are interleaved with operations
of other transactions of a schedule, the schedule is called Concurrent schedule. e.g.;
Schedule of debit and credit transaction shown in Table 1 is concurrent in nature. But
concurrency can lead to inconsistency in database. The above example of concurrent
schedule is also inconsistent.
Question: Consider the following transaction involving two bank accounts x and y:
1. read(x);
2. x := x 50;
3. write(x);
4. read(y);
5. y := y + 50;
6. write(y);
The constraint that the sum of the accounts x and y should remain constant is that of?
1. Atomicity
2. Consistency
3. Isolation
4. Durability
[GATE 2015]
1) Initial Read
If a transaction T1 reading data item A from initial database in S1 then in S2 also T1 should read
A from initial database.
T1 T2 T3
-------------------
R(A)
W(A)
R(A)
R(B)
2)Updated Read
If Ti is reading A which is updated by Tj in S1 then in S2 also Ti should read A which is updated
by Tj.
T1 T2 T3 T1 T2 T3
------------------- ----------------
W(A) W(A)
W(A) R(A)
R(A) W(A)
Above two schedule are not view equal as in S1 :T3 is reading A updated by T2, in S2 T3 is
reading A updated by T1.
T1 T2 T1 T2
------------ ---------------
R(A) R(A)
W(A) W(A)
W(A) W(A)
Above two schedule are not view as Final write operation in S1 is done by T1 while in S2 done
by T2.
Conflicting operations: Two operations are said to be conflicting if all conditions satisfy:
Example:
Conflicting operations pair (R1(A), W2(A)) because they belong to two different
transactions on same data item A and one of them is write operation.
Similarly, (W1(A), W2(A)) and (W1(A), R2(A)) pairs are also conflicting.
On the other hand, (R1(A), W2(B)) pair is non-conflicting because they operate on different
data item.
Similarly, ((W1(A), W2(B)) pair is non-conflicting.
If Oi and Oj are two operations in a transaction and Oi< Oj (Oi is executed before Oj), same order
will follow in schedule as well. Using this property, we can get two transactions of schedule S1
as:
-> Swapping non-conflicting operations R2(A) and R1(B) in S1, the schedule becomes,
-> Similarly, swapping non-conflicting operations W2(A) and W1(B) in S11, the schedule
becomes,
Swapping non-conflicting operations R1(A) and R2(B) in S2, the schedule becomes,
Similarly, swapping non-conflicting operations W1(A) and W2(B) in S21, the schedule becomes,
In schedule S22, all operations of T2 are performed first, but operations of T1 are not in order
(order should be R1(A), W1(A), R1(B), W1(B)). So S2 is not conflict serializable.
Conflict Equivalent: Two schedules are said to be conflict equivalent when one can be
transformed to another by swapping non-conflicting operations. In the example discussed above,
S11 is conflict equivalent to S1 (S1 can be converted to S11 by swapping non-conflicting
operations). Similarly, S11 is conflict equivalent to S12 and so on.
Note 1: Although S2 is not conflict serializable, but still it is conflict equivalent to S21 and S21
because S2 can be converted to S21 and S22 by swapping non-conflicting operations.
Note 2: The schedule which is conflict serializable is always conflict equivalent to one of the serial
schedule. S1 schedule discussed above (which is conflict serializable) is equivalent to serial
schedule (T1->T2).
Question: Consider the following schedules involving two transactions. Which one of the
following statement is true?
[GATE 2007]
Again, swapping non conflicting operations R1(X) and R2(Y) of S2, we get
Again, swapping non conflicting operations R1(X) and W2(Y) of S2, we get
Above table shows a schedule with two transactions, T1 reads and writes A and that value is read
and written by T2. T2 commits. But later on, T1 fails. So we have to rollback T1. Since T2 has
read the value written by T1, it should also be rollbacked. But we have already committed that.
So this schedule is irrecoverable schedule.
Irrecoverable Schedule: When Tj is reading the value updated by Ti and Tj is committed before
commit of Ti, the schedule will be irrecoverable.
Table 2 shows a schedule with two transactions, T1 reads and writes A and that value is read and
written by T2. But later on, T1 fails. So we have to rollback T1. Since T2 has read the value
written by T1, it should also be rollbacked. As it has not committed, we can rollback T2 as well.
So it is recoverable with cascading rollback.
Recoverable with cascading rollback: If Tj is reading value updated by Ti and commit of Tj is
delayed till commit of Ti , the schedule is called recoverable with cascading rollback.
Table 3 shows a schedule with two transactions, T1 reads and writes A and commits and that
value is read by T2. But if T1 fails before commit, no other transaction has read its value, so
there is no need to rollback other transaction. So this is a cascadeless recoverable schedule.
Question: Which of the following scenarios may lead to an irrecoverable error in a database
system?
(A) A transaction writes a data item after it is read by an uncommitted transaction.
(B) A transaction reads a data item after it is read by an uncommitted transaction.
(C) A transaction reads a data item after it is written by a committed transaction.
(D) A transaction reads a data item after it is written by an uncommitted transaction.
Answer: See the example discussed in Table 1, a transaction is reading a data item after it is
written by an uncommitted transaction, the schedule will be irrecoverable.
Indexing in Databases | Set 1
Indexing is a way to optimize performance of a database by minimizing the number of disk
accesses required when a query is processed.
An index or database index is a data structure which is used to quickly locate and access the data
in a database table.
Indexing Methods
1. Clustered Indexing
A clustered index is the one which sorts the file sequentially based on the search key. It is a
sorted list of where data points actually lie. The data is physically stored in the order of clustered
index.
For eg. In an address/phonebook the first/last name form a clustered index and the phone
numbers are present in front of them. So the data is actually present here. Also, the page numbers
on each page of the book form a clustered index as the data(pages) is sequentially sorted and
physically present where we are looking.
There can be only one clustered index (usually the primary key) for a particular table. Take for
example the case above, data in a phone book cannot be sorted both alphabetically and according
to phone numbers at the same time. Since we are grouping similar values, it is called a clustered
index.
Clustered index sorted according to first name (Search key)
2. Non-Clustered Indexing
A non clustered index just tells us where the data lies, i.e. it gives us a list of virtual pointers or
references to the location where the data is actually stored. Data is not physically stored in the
order of the index. Instead , data is present in leaf nodes. For eg. the contents page of a book.
Each entry gives us the page number or location of the information stored. The actual data
here(information on each page of book) is not organised but we have an ordered
reference(contents page) to where the data points actually lie.
Search: It requires more time as compared to clustered index because some amount of extra
work is done in order to extract the data by further following the pointer. In case of clustered
index, data is directly present in front of the index.
3. Primary Index
In this case, the data is sorted according to the search key. It induces sequential file organisation.
4. Secondary Index
It is used to optimize query processing and access records in a database with some information
other than the usual search key (primary key). In this two levels of indexing are used in order to
reduce the mapping size of the first level and in general. Initially, for the first level, a large range
of numbers is selected so that the mapping size is small. Further, each range is divided into
further sub ranges.
In order for quick memory access, first level is stored in the primary memory. Actual physical
location of the data is determined by the second mapping level.
5. Ordered Indices
For every search key value in the data file, there is an index record. This record contains the
search key and also a reference to the first data record with that search key value.
(II) Sparse Index :
The index record appears only for a few items in the data file. Each item points to a block as
shown. To locate a record, we find the index record with the largest search key value less than or
equal to the search key value we are looking for.
DBMS | How to solve Relational Algebra
problems for GATE
In this article, I will discuss common types of questions in relational algebra which are asked in
GATE. Before reading this article, you should have idea about basic and extended operators in
relational algebra.
Type 1: Given a relational algebra expression, find the result. Suppose you have a relation
Order (Prod_Id, Agent_Id, Order_Month) and you have to find out what will the following
algebra expression return.
ORDER
When we apply the following expression, the rows which are highlighted in blue will be
selected.
After projecting Order1.Prod_Id, the output will be P002 which is Prod_Id of products which are
ordered by at least two different agents in same month.
Note: If we want to find Prod_Id which are ordered by at least three different agents in same
month, it can be done as:
Type 2: Given two relations, what will be the maximum and minimum number of tuples
after natural join? Consider the following relation R(A,B,C) and S(B,D,E) with underlined
primary key. The relation R contains 200 tuples and the relation S contains 100 tuples.
What is the maximum number of tuples possible in the natural Join R and S?
To solve this type of question, first we will see that on which attribute natural join will
take place.
Natural join selects those rows which have equal values for common attribute. In this
case, expression would be like:
R.B=S.B (RX S)
In relation R, attribute B is primary key. So Relation R will have 200 distinct values of B.
On the other hand, Relation S has BD as primary key. So attribute B can have 100
distinct values or 1 value for all rows.
Case 1: S.B has 100 distinct values and each of these values match to R.B
R S
B Other Attributes B Other Attributes
1 1
2 2
. .
. .
. .
200 100
In this case, every value of B in S will match to a value of B in R. So natural join will have 100
tuples.
R S
B Other Attributes B Other Attributes
1 1
2 1
. .
. .
. .
200 1
In this case, every value of B in S will match to a value of B in R. So natural join will have 100
tuples.
Case 3: S.B has 100 distinct values and none of these values matches to R.B
R S
B Other Attributes B Other Attributes
1 201
2 202
. .
. .
. .
200 300
In this case, no value of B in S will match to a value of B in R. So natural join will have 0 tuple.
Case 4: S.B has 1 value and it does not match with R.B
R S
A Other Attributes B Other Attributes
1 300
2 300
. .
. .
. .
200 300
In this case, no value of B in S will match to a value of B in R. So natural join will have 0 tuple.
Note: If it is explicitly mentioned that S.B is foreign key to R.B, then Case 3 and Case 4 discussed
above are not possible because value of S.B will be from the values of R.B. So, minimum and
maximum number of tuples in natural join will be 100.
Last Minute Notes DBMS
We will discuss the important key points useful for GATE exams in summarized
form. For details you may refer this.
1:n cardinality 2
m:n cardinality 3
Note: This is a general observation. Special cases need to be taken care. We may need
extra table if attribute of a relationship cant be moved to any entity side.
Keys of a relation: There are various types of keys in a relation which are:
Candidate Key: The minimal set of attributes which can determine a tuple
uniquely. There can be more than 1 candidate key of a relation and its proper
subset cant determine tuple uniquely and it cant be NULL.
Super Key: The set of attributes which can determine a tuple uniquely. A
candidate key is always a super key but vice versa is not true.
Primary Key and Alternate Key: Among various candidate keys, one key is
taken primary key and others are alternate keys.
Normal Forms
First Normal Form: A relation is in first normal form if it does not contain
any multi-valued or composite attribute.
Second Normal Form: A relation is in second normal form if it does not
contain any partial dependency. A dependency is called partial dependency if
any proper subset of candidate key determines non-prime (which are not part of
candidate key) attribute.
Third Normal Form: A relation is in third normal form if it does not contain
any transitive dependency. For a relation to be in Third Normal Form, either
LHS of FD should be super key or RHS should be prime attribute.
Boyce-Codd Normal Form: A relation is in Boyce-Codd Normal Form if LHS
of every FD is super key. The relationship between Normal Forms can be
represented as: 1NF2NF3NFBCNF
Basic Semantic
Operator
U (Union) Return those tuples which are either in R1 or in R2. Max no. of rows returned =
m+n andMin no. of rows returned = max(m,n)
(Minus) R1-R2 returns those tuples which are in R1 but not in R2. Max no. of rows
returned = m and Min no. of rows returned = m-n
Extended Semantic
Operator
Returns those tuples which are in both R1 and R2. Max no. of rows returned =
(Intersection) min(m,n) and Min no. of rows returned = 0
c Selection from two or more tables based on some condition (Cross product
(Conditional followed by selection)
Join)
c(Equi Join) It is a special case of conditional join when only equality condition is applied
between attributes.
(Natural In natural join, equality condition on common attributes hold and duplicate
Join) attributes are removed by default. Note: Natural Join is equivalent to cross
product if two relations have no attribute in common and natural join of a
relation R with itself will return R only.
/(Division Division operator A/B will return those tuples in A which is associated with
Operator) every tuple of B.Note:Attributes of B should be proper subset of attributes of A.
The attributes in A/B will be Attributes of A- Attribute of B.
SQL: As opposed to Relational Algebra, SQL is a non-procedural language.
Operator Meaning
Note: Two phase locking protocol produce conflict serializable schedule but may
suffer from deadlock. On the other hand, Time-Stamp based protocols are free from
deadlock yet produce conflict serializable schedule.
View Serializable and View Equivalence: Two schedules S1 and S2 are said to be
view-equivalent if all conditions are satisfied for all objects:
Irrecoverable Schedules: For a transaction pair < Ti, Tj >, if Tj is reading the value
updated by Ti and Tj is committed before commit of Ti, the schedule will be
irrecoverable.
Recoverable Schedules: For a transaction pair < Ti, Tj >, if Tj is reading the value
updated by Ti and Tj is committed after commit of Ti, the schedule will be
recoverable.
Cascadeless Recoverable Schedules: For a transaction pair < Ti, Tj >, if value
updated by Ti is read by Tj only after commit of Ti, the schedule will be cascadeless
recoverable.
Strict Recoverable: For a transaction pair < Ti, Tj >, if value updated by Ti is read or
written by Tj only after commit of Ti, the schedule will be strict recoverable. The
relationship between them can be represented as:
What are the differences between DDL, DML and DCL in SQL?
Ans: Following are some details of three.
DDL stands for Data Definition Language. SQL queries like CREATE, ALTER, DROP and
RENAME come under this.
DML stands for Data Manipulation Language. SQL queries like SELECT, INSERT and
UPDATE come under this.
DCL stands for Data Control Language. SQL queries like GRANT and REVOKE come under
this.
What is Join?
Ans: An SQL Join is used to combine data from two or more tables, based on a common field
between them. For example, consider the following two tables.
Student Table
StudentCourse Table
CourseID EnrollNo
1 1000
2 1000
3 1000
1 1002
2 1003
Following is join query that shows names of students enrolled in different courseIDs.
CourseID StudentName
1 geek1
1 geek2
2 geek1
2 geek3
3 geek1
What is Identity?
Ans: Identity (or AutoNumber) is a column that automatically generates numeric values. A start
and increment value can be set, but most DBA leave these at 1. A GUID column also generates
numbers; the value of this cannot be controlled. Identity/GUID columns do not need to be
indexed.
What is a Trigger?
Ans: A Trigger is a code that associated with insert, update or delete operations. The code is
executed automatically whenever the associated query is executed on a table. Triggers can be
useful to maintain integrity in database.
Q. There is a table where only one row is fully repeated. Write a Query to find the
Repeated row
Name Section
abc CS1
bcd CS2
abc CS1
In the above table, we can find duplicate row using below query.
OR
ID salary DeptName
1 10000 EC
2 40000 EC
3 30000 CS
4 40000 ME
5 50000 ME
6 60000 ME
7 70000 CS
SELECT E.ID
FROM Employee E
WHERE EXISTS (SELECT E2.salary
FROM Employee E2
WHERE E2.DeptName = 'CS'
AND E.salary > E2.salary)
Following 5 rows will be result of query as 3000 is the minimum salary of CS Employees and all
these rows are greater than 30000.
2
4
5
6
7
Q. Write a trigger to update Emp table such that, If an updation is done in Dep table then
salary of all employees of that department should be incremented by some amount
(updation)
Assuming Table name are Dept and Emp, trigger can be written as
Q. There is a table which contains two column Student and Marks, you need to find all the
students, whose marks are greater than average marks i.e. list of above average students.
Q.Name the student who has secured third highest marks using sub queries.
SELECT Emp1.Name
FROM Employee Emp1
WHERE 2 = (SELECT COUNT(DISTINCT(Emp2.Salary))
FROM Employee Emp2
WHERE Emp2.Salary > Emp1.Salary
)
*LOGIC- Number of people with salary higher than this person will be 2.
Q. Why we cannot use WHERE clause with aggregate functions like HAVING ?
The difference between the having and where clause in SQL is that the where clause canNOT be
used with aggregates, but the having clause can. Please note : It is not a predefined rule but by
and large youll see that in a good number of the SQL queries, we use WHERE prior to GROUP
BY and HAVING after GROUP BY.
The Where clause acts as a pre filter where as Having as a post filter.
a c1 40
a c2 50
b c3 60
d c1 70
e c2 80
This would select data row by row basis. The having clause works on aggregated data.
For example, output of below query
Student Total
a 90
b 60
d 70
e 80
Student Total
a 90
e 80
Q. Difference between primary key and unique key and why one should use unique key if it
allows only one null ?
Primary key:
Unique Key:
Materialized views
Disk based and are updated periodically based upon the query definition.
A materialized table is created or updated infrequently and it must be synchronized with
its associated base tables.
Dynamic views
Virtual only and run the query definition each time they are accessed.
A dynamic view may be created every time that a specific view is requested by the user.
Q. What is embedded and dynamic SQL?
SQL statements in an application that do not change at runtime and, therefore, can be
hard-coded into the application.
Dynamic SQL
SQL statements that are constructed at runtime; for example, the application may allow
users to enter their own queries.
Dynamic SQL is a programming technique that enables you to buildSQL statements
dynamically at runtime. You can create more general purpose, flexible applications by
using dynamic SQL because the full text of a SQL statement may be unknown at
compilation.
http://docs.oracle.com/cd/A87860_01/doc/appdev.817/a76939/adg09dyn.htm
Which of the following queries cannot be expressed using the basic relational algebra
operations (U, -, x, , , p)? (GATE CS 2000)
(a) Department address of every employee
(b) Employees whose name is the same as their department name
(c) The sum of all employees salaries
(d) All employees of a given department
Answer: (c)
Explanation:
The six basic operators of relational algebra are the selection( ), the projection(), the Cartesian
product (x) (also called the cross product or cross join), the set union (U), the set difference (-),
and the rename (p). These six operators are fundamental in the sense that none of them can be
omitted without losing expressive power. Many other operators have been defined in terms of
these six. Among the most important are set intersection, division, and the natural join, but
aggregation is not possible with these basic relational algebra operations. So, we cannot run sum
of all employees salaries with the six operations.
References:
http://en.wikipedia.org/wiki/Relational_algebra
http://faculty.ksu.edu.sa/zitouni/203%20Haseb%20%20Lecture%20Notes/Relional%20Algebra.
pdf
x y z
1 4 2
1 5 3
1 6 3
3 2 2
Which of the following functional dependencies are satisfied by the instance? (GATE CS
2000)
(a) XY -> Z and Z -> Y
(b) YZ -> X and Y -> Z
(c) YZ -> X and X -> Z
(d) XZ -> Y and Y -> X
Answer: (b)
Explanation:
A functional dependency (FD) is a constraint between two sets of attributes in a relation from a
database. A FD X->Y require that the value of X uniquely determines the value of Y where X
and Y are set of attributes. FD is a generalization of the notion of a key.
Given that X, Y, and Z are sets of attributes in a relation R, one can derive several properties of
functional dependencies. Among the most important are Armstrongs axioms, which are used in
database normalization:
In the above question, Y uniquely determines X and Z, for a given value of Y you can easily find
out values of X and Z.
So, Y -> X and Y -> Z hold for above schema.
From rule of augmentation we can say YZ->X. If we understand the notion of FD, we dont need
to apply axioms to find out which option is true, just by looking at the schema and options we
can say that (b) is true.
References:
http://www.cse.iitb.ac.in/~sudarsha/db-book/slide-dir/ch7.pdf
http://en.wikipedia.org/wiki/Functional_dependency
Answer: (a)
Explanation:
The query selects all attributes of r. Since we have distinct in query, result can be equal to r only
if r doesnt have duplicates.
If we do not give any attribute on which we want to join two tables, then the queries like above
become equivalent to Cartesian product. Cartisian product of two sets will be empty if any of the
two sets is empty. So, s should have atleast one record to get all rows of r.
4. In SQL, relations can contain null values, and comparisons with null values are treated
as unknown. Suppose all comparisons with a null value are treated as false. Which of the
following pairs is not equivalent? (GATE CS 2000)
(a) x = 5, not (not (x = 5)
(b) x = 5, x > 4 and x < 6, where x is an integer
(c) x < 5, not(x = 5)
(d) None of the above
Answer (c)
Explanation:
It doesnt need much explanation. For all values smaller than 5, x < 5 will always be true but x =
5 will be false.
5. Consider a schema R(A, B, C, D) and functional dependencies A -> B and C -> D. Then
the decomposition of R into R1 (A, B) and R2(C, D) is (GATE CS 2001)
a) dependency preserving and loss less join
b) loss less join but not dependency preserving
c) dependency preserving but not loss less join
d) not dependency preserving and not loss less join
Answer: (c)
Explanation:
Dependency Preserving Decomposition:
Decomposition of R into R1 and R2 is a dependency preserving decomposition if closure of
functional dependencies after decomposition is same as closure of of FDs before decomposition.
A simple way is to just check whether we can derive all the original FDs from the FDs present
after decomposition.
In the above question R(A, B, C, D) is decomposed into R1 (A, B) and R2(C, D) and there are
only two FDs A -> B and C -> D. So, the decomposition is dependency preserving
Lossless-Join Decomposition:
Decomposition of R into R1 and R2 is a lossless-join decomposition if at least one of the
following functional dependencies are in F+ (Closure of functional dependencies)
R1 R2 R1
OR
R1 R2 R2
In the above question R(A, B, C, D) is decomposed into R1 (A, B) and R2(C, D), and R1 R2 is
empty. So, the decomposition is not lossless.
Database Management Systems | Set 2
Following Questions have been asked in GATE 2012 exam.
Answer (B)
P is correct. HAVING clause can also be used with aggregate function. If we use a HAVING
clause without a GROUP BY clause, the HAVING condition applies to all rows that satisfy the
search condition. In other words, all rows that satisfy the search condition make up a single
group. See this for more details.
SELECT Count(*)
FROM temp
GROUP BY name;
Output:
count(*)
--------
2
1
1
2) Given the basic ER and relational models, which of the following is INCORRECT?
(A) An attributes of an entity can have more that one value
(B) An attribute of an entity can be composite
(C) In a row of a relational table, an attribute can have more than one value
(D) In a row of a relational table, an attribute can have exactly one value or a NULL value
Answer (C)
The term entity belongs to ER model and the term relational table belongs to relational
model.
A and B both are true. ER model supports both multivalued and composite attributes See this for
more details.
(C) is false and (D) is true. In Relation model, an entry in relational table can can have exactly
one value or a NULL.
3) Suppose (A, B) and (C,D) are two relation schemas. Let r1 and r2 be the corresponding
relation instances. B is a foreign key that refers to C in r2. If data in r1 and r2 satisfy
referential integrity constraints, which of the following is ALWAYS TRUE?
Answer (A)
B is a foreign key in r1 that refers to C in r2. r1 and r2 satisfy referential integrity constraints. So
every value that exists in column B of r1 must also exist in column C of r2.
Answer (C)
BCNF is a stronger version 3NF. So every relation in BCNF will also be in 3NF.
Database Management Systems | Set 3
Following Questions have been asked in GATE 2012 exam.
1) Consider the following transactions with data items P and Q initialized to zero:
Answer (B)
Two or more actions are said to be in conflict if:
1) The actions belong to different transactions.
2) At least one of the actions is a write operation.
3) The actions access the same object (read or write).
The schedules S1 and S2 are said to be conflict-equivalent if the following conditions are
satisfied:
1) Both schedules S1 and S2 involve the same set of transactions (including ordering of actions
within each transaction).
2) The order of each pair of conflicting actions in S1 and S2 are the same.
Table A
Id Name Age
----------------
12 Arun 60
15 Shreya 24
99 Rohit 11
Table B
Id Name Age
----------------
15 Shreya 24
25 Hari 40
98 Rohit 20
99 Rohit 11
Table C
Id Phone Area
-----------------
10 2200 02
99 2100 01
(A) 7
(B) 4
(C) 5
(D) 9
Answer (A)
Id Name Age
----------------
12 Arun 60
15 Shreya 24
99 Rohit 11
25 Hari 40
98 Rohit 20
3) Consider the above tables A, B and C. How many tuples does the result of the following
SQL query contains?
SELECT A.id
FROM A
WHERE A.age > ALL (SELECT B.age
FROM B
WHERE B. name = "arun")
(A) 4
(B) 3
(C) 0
(D) 1
Answer (B)
The meaning of ALL is the A.Age should be greater than all the values returned by the
subquery. There is no entry with name arun in table B. So the subquery will return NULL. If a
subquery returns NULL, then the condition becomes true for all rows of A (See this for details).
So all rows of table A are selected.
Database Management Systems | Set 4
Following Questions have been asked in GATE 2011 exam.
1. Consider a relational table with a single record for each registered student with the
following attributes.
Answer (A)
A Candidate Key value must uniquely identify the corresponding row in table.
BankAccount_Number is not a candidate key. As per the question A student can have multiple
accounts or joint accounts. This attributes stores the primary account number. If two students
have a joint account and if the joint account is their primary account, then
BankAccount_Number value cannot uniquely identify a row.
2) Consider a relational table r with sufficient number of records, having attributes A1,
A2,, An and let 1 <= p <= n. Two queries Q1 and Q2 are given below.
Answer (C)
If record are accessed for a particular value from table, hashing will do better. If records are
accessed in a range of values, ordered indexing will perform better. See this for more details.
3) Database table by name Loan_Records is given below.
SELECT Count(*)
FROM ( (SELECT Borrower, Bank_Manager
FROM Loan_Records) AS S
NATURAL JOIN (SELECT Bank_Manager,
Loan_Amount
FROM Loan_Records) AS T );
(A) 3
(B) 9
(C) 5
(D) 6
Answer (C)
Borrower Bank_Manager
--------------------------
Ramesh Sunderajan
Suresh Ramgqpal
Mahesh Sunderjan
Bank_Manager Loan_Amount
---------------------------
Sunderajan 10000.00
Ramgopal 5000.00
Sunderjan 7000.00
Following will be the result of natural join of above two tables. The key thing to note is that the
natural join happens on column name with same name which is Bank_Manager in the above
example. Sunderjan appears two times in Bank_Manager column, so their will be four entries
with Bank_Manager as Sunderjan.
4) Consider a database table T containing two columns X and Y each of type integer. After
the creation of the table, one record (X=1, Y=1) is inserted in the table.
Let MX and My denote the respective maximum values of X and Y among all records in
the table at any point in time. Using MX and MY, new records are inserted in the table 128
times with X and Y values being MX+1, 2*MY+1 respectively. It may be noted that each
time after the insertion, values of MX and MY change. What will be the output of the
following SQL query after the steps mentioned above are carried out?
(A) 127
(B) 255
(C) 129
(D) 257
Answer (A)
X Y
-------
1 1
2 3
3 7
4 15
5 31
6 63
7 127
......
......
Database Management Systems | Set 5
Following Questions have been asked in GATE CS 2010 exam.
Table: Passenger
pid pname age
-----------------
0 Sachin 65
1 Rahul 66
2 Sourav 67
3 Anil 69
Table : Reservation
pid class tid
---------------
0 AC 8200
1 AC 8201
2 SC 8201
5 AC 8203
1 SC 8204
3 AC 8202
What pids are returned by the following SQL query for the above instance of the tables?
SLECT pid
FROM Reservation ,
WHERE class AC AND
EXISTS (SELECT *
FROM Passenger
WHERE age > 65 AND
Passenger. pid = Reservation.pid)
(A) 1, 0
(B) 1, 2
(C) 1, 3
(S) 1, 5
Answer (C)
When a subquery uses values from outer query, the subquery is called correlated subquery. The
correlated subquery is evaluated once for each row processed by the outer query.
The outer query selects 4 entries (with pids as 0, 1, 5, 3) from Reservation table. Out of these
selected entries, the subquery returns Non-Null values only for 1 and 3.
2) Which of the following concurrency control protocols ensure both conflict serialzability
and freedom from deadlock?
I. 2-phase locking
II. Time-stamp ordering
(A) I only
(B) II only
(C) Both I and II
(D) Neither I nor II
Answer (B)
2 Phase Locking (2PL) is a concurrency control method that guarantees serializability. The
protocol utilizes locks, applied by a transaction to data, which may block (interpreted as signals
to stop) other transactions from accessing the same data during the transactions life. 2PL may be
lead to deadlocks that result from the mutual blocking of two or more transactions. See the
following situation, neither T3 nor T4 can make progress.
Which one of the schedules below is the correct serialization of the above?
(A)T1 T3 T2
(B)T2 T1 T3
(C)T2 T3 T1
(D)T3 T1 T2
Answer (A)
T1 can complete before T2 and T3 as there is no conflict between Write(X) of T1 and the
operations in T2 and T3 which occur before Write(X) of T1 in the above diagram.
T3 should can complete before T2 as the Read(Y) of T3 doesnt conflict with Read(Y) of T2.
Similarly, Write(X) of T3 doesnt conflict with Read(Y) and Write(Y) operations of T2.
Another way to solve this question is to create a dependency graph and topologically sort the
dependency graph. After topologically sorting, we can see the sequence T1, T3, T2.
4) Which of the following functional dependencies hold for relations R(A, B, C) and S(B, D,
E):
B A,
AC
The relation R contains 200 tuples and the rel ation S contains 100 tuples. What is the
maximum number of tuples possible in the natural join RS (R natural join S)
(A) 100
(B) 200
(D) 300
(D) 2000
Answer (A)
From the given set of functional dependencies, it can be observed that B is a candidate key of R.
So all 200 values of B must be unique in R. There is no functional dependency given for S. To
get the maximum number of tuples in output, there can be two possibilities for S.
1) All 100 values of B in S are same and there is an entry in R that matches with this value. In
this case, we get 100 tuples in output.
2) All 100 values of B in S are different and these values are present in R also. In this case also,
we get 100 tuples.
Database Management Systems | Set 6
Following questions have been asked in GATE 2009 CS exam.
1) Consider two transactions T1 and T2, and four schedules S1, S2, S3, S4 of T1 and T2 as
given below:
T1 = R1[X] W1[X] W1[Y]
T2 = R2[X] R2[Y] W2[Y]
S1 = R1[X] R2[X] R2[Y] W1[X] W1[Y] W2[Y]
S2 = R1[X] R2[X] R2[Y] W1[X] W2[Y] W1[Y]
S3 = R1[X] W1[X] R2[X] W1[Y] R2[Y] W2[Y]
S1 = R1[X] R2[Y]R2[X]W1[X] W1[Y] W2[Y]
Which of the above schedules are conflict-serializable?
(A) S1 and S2
(B) S2 and S3
(C) S3 only
(D) S4 only
Answer (B)
There can be two possible serial schedules T1 T2 and T2 T1. The serial schedule T1 T2 has the
following sequence of operations
R1[X] W1[X] W1[Y] R2[X] R2[Y] W2[Y]
And the schedule T2 T1 has the following sequence of operations.
R2[X] R2[Y] W2[Y] R1[X] W1[X] W1[Y]
The Schedule S2 is conflict-equivalent to T2 T1 and S3 is conflict-equivalent to T1 T2.
2) Let R and S be relational schemes such that R={a,b,c} and S={c}. Now consider
the following queries on the database:
SELECT S.sname
FROM Suppliers S
WHERE S.sid NOT IN (SELECT C.sid
FROM Catalog C
WHERE C.pid NOT IN (SELECT P.pid
FROM Parts P
WHERE P.color<> 'blue'))
Assume that relations corresponding to the above schema are not empty. Which one of the
following is the correct interpretation of the above query?
(A) Find the names of all suppliers who have supplied a non-blue part.
(B) Find the names of all suppliers who have not supplied a non-blue part.
(C) Find the names of all suppliers who have supplied only blue parts.
(D) Find the names of all suppliers who have not supplied only blue parts.
Answer (A)
The subquery SELECT P.pid FROM Parts P WHERE P.color<> blue gives pids of parts
which are not blue. The bigger subquery SELECT C.sid FROM Catalog C WHERE C.pid NOT
IN (SELECT P.pid FROM Parts P WHERE P.color<> blue) gives sids of all those suppliers
who have supplied blue parts. The complete query gives the names of all suppliers who have
supplied a non-blue part
4) Assume that, in the suppliers relation above, each supplier and each street within a city
has a unique name, and (sname, city) forms a candidate key. No other functional
dependencies are implied other than those implied by primary and candidate keys. Which
one of the following is TRUE about the above schema?
(A) The schema is in BCNF
(B) The schema is in 3NF but not in BCNF
(C) The schema is in 2NF but not in 3NF
(D) The schema is not in 2NF
Answer (A)
A relation is in BCNF if for every one of its dependencies X ? Y, at least one of the following
conditions hold:
Since (sname, city) forms a candidate key, there is no non-tirvial dependency X ? Y where X is
not a superkey
Database Management Systems | Set 7
Following questions have been asked in GATE 2008 CS exam.
Answer (D)
In I, Ps from natural join of R and S are selected.
In III, all Ps from intersection of (P, Q) pairs present in R and S.
IV is also equivalent to III because (R (R S)) = R S.
II is not equivalent as it may also include Ps where Qs are not same in R and S.
Answer (B)
See http://geeksquiz.com/gate-gate-cs-2008-question-82/ for explanation.
3) Which of the following is a correct attribute set for one of the tables for the correct
answer to the above question?
(A) {M1, M2, M3, P1}
(B) {M1, P1, N1, N2}
(C) {M1, P1, N1}
(D) {M1, P1}
Answer (A)
Assume {Author, Title} is the key for both schemes. Which of the following statements is
true?
(A) Both Book and Collection are in BCNF
(B) Both Book and Collection are in 3NF only
(C) Book is in 2NF and Collection is in 3NF
(D) Both Book and Collection are in 2NF only
Answer (C)
Table Collection is in BCNF as there is only one functional dependency Title Author >
Catalog_no and {Author, Title} is key for collection. Book is not in BCNF because Catalog_no
is not a key and there is a functional dependency Catalog_no > Title Author Publisher Year.
Book is not in 3NF because non-prime attributes (Publisher Year) are transitively dependent on
key [Title, Author]. Book is in 2NF because every non-prime attribute of the table is either
dependent on the key [Title, Author], or on another non prime attribute.
Database Management Systems | Set 8
Following questions have been asked in GATE 2005 CS exam.
Answer (c)
It is not always possible to decompose a table in BCNF and preserve dependencies. For example,
a set of functional dependencies {AB > C, C > B} cannot be decomposed in BCNF. See this
for more details.
2) The following table has two attributes A and C where A is the primary key and C is the
foreign key referencing A with on-delete cascade.
A C
-----
2 4
3 4
4 3
5 2
7 2
9 5
6 4
The set of all tuples that must be additionally deleted to preserve referential integrity when
the tuple (2,4) is deleted is:
(a) (3,4) and (6,4)
(b) (5,2) and (7,2)
(c) (5,2), (7,2) and (9,5)
(d) (3,4), (4,3) and (6,4)
Answer (C)
When (2,4) is deleted. Since C is a foreign key referring A with delete on cascade, all entries
with value 2 in C must be deleted. So (5, 2) and (7, 2) are deleted. As a result of this 5 and 7 are
deleted from A which causes (9, 5) to be deleted.
3) The relation book (title, price) contains the titles and prices of different books. Assuming
that no two books have the same price, what does the following SQL query list?
select title
from book as B
where (select count(*)
from book as T
where T.price > B.price) < 5
Answer (d)
When a subquery uses values from outer query, the subquery is called correlated subquery. The
correlated subquery is evaluated once for each row processed by the outer query.
The outer query selects all titles from book table. For every selected book, the subquery returns
count of those books which are more expensive than the selected book. The where clause of
outer query will be true for 5 most expensive book. For example count (*) will be 0 for the most
expensive book and count(*) will be 1 for second most expensive book.
Database Management Systems | Set 9
Following questions have been asked in GATE 2006 CS exam.
1) Consider the following log sequence of two transactions on a bank account, with initial
balance 12000, that transfer 2000 to a mortgage payment and then apply a 5% interest.
1. T1 start
2. T1 B old=12000 new=10000
3. T1 M old=0 new=2000
4. T1 commit
5. T2 start
6. T2 B old=10000 new=10500
7. T2 commit
Suppose the database system crashes just before log record 7 is written. When the system is
restarted, which one statement is true of the recovery procedure?
(A) We must redo log record 6 to set B to 10500
(B) We must undo log record 6 to set B to 10000 and then redo log records 2 and 3
(C) We need not redo log records 2 and 3 because transaction T1 has committed
(D) We can apply redo and undo operations in arbitrary order because they are idempotent.
Answer (B)
2) Consider the relation enrolled (student, course) in which (student, course) is the primary
key, and the relation paid (student, amount) where student is the primary key. Assume no
null values and no foreign keys or integrity constraints. Given the following four queries:
Query1: select student from enrolled where student in (select student from
paid)
Query2: select student from paid where student in (select student from
enrolled)
Query3: select E.student from enrolled E, paid P where E.student = P.student
Query4: select student from paid where exists
(select * from enrolled where enrolled.student = paid.student)
Answer (A)
The output of Query2, Query3 and Query4 will be identical. Query1 may produce duplicate
rows. But rowset produced by all of them will be same.
Table enrolled
student course
----------------
abc c1
xyz c1
abc c2
pqr c1
Table paid
student amount
-----------------
abc 20000
xyz 10000
rst 10000
Output of Query 1
abc
abc
xyz
Output of Query 2
abc
xyz
Output of Query 3
abc
xyz
Output of Query 4
abc
xyz
3) Consider the relation enrolled(student, course) in which (student, course) is the primary
key, and the relation paid(student, amount), where student is the primary key. Assume no
null values and no foreign keys or integrity constraints. Assume that amounts 6000, 7000,
8000, 9000 and 10000 were each paid by 20% of the students. Consider these query plans
(Plan 1 on left, Plan 2 on right) to list all courses taken by students who have paid more
than x.
A disk seek takes 4ms, disk data transfer bandwidth is 300 MB/s and checking a tuple to
see if amount is greater than x takes 10 micro-seconds. Which of the following statements is
correct?
(A) Plan 1 and Plan 2 will not output identical row sets for all databases.
(B) A course may be listed more than once in the output of Plan 1 for some databases
(C) For x = 5000, Plan 1 executes faster than Plan 2 for all databases.
(D) For x = 9000, Plan I executes slower than Plan 2 for all databases.
Answer (C)
Assuming that large enough memory is available for all data needed. Both plans need to load
both tables courses and enrolled. So disk access time is same for both plans.
Plan 2 does lesser number of comparisons compared to plan 1.
1) Join operation will require more comparisons as the second table will have more rows in plan
2 compared to plan 1.
2) The joined table of two tables will will have more rows, so more comparisons are needed to
find amounts greater than x.
AB CD, AF D, DE F, C G , F E, G A
Answer (C)
Closure of AF or AF+ = {ADEF}, closure of AF doesnt contain C and G.
Option (D) also looks correct. AB+ = {ABCDG}, closure of AB doesnt contain F.
Database Management Systems | Set 10
Following questions have been asked in GATE CS 2005 exam.
1) Let r be a relation instance with schema R = (A, B, C, D). We define r1 = select A,B,C
from r and r2 = select A, D from r. Let s = r1 * r2 where * denotes natural join. Given
that the decomposition of r into r1 and r2 is lossy, which one of the following is TRUE?
(a) s is subset of r
(b) r U s = r
(c) r is a subset of s
(d) r * s = s
Answer (c)
Consider the following example with lossy decomposition of r into r1 and r2. We can see that r is
a subset of s.
Table r
A B C D
---------------------------
1 10 100 1000
1 20 200 1000
1 20 200 1001
Table r1
A B C
------------------
1 10 100
1 20 200
Table r2
A D
-----------
1 1000
1 1001
2) Let E1 and E2 be two entities in an E/R diagram with simple single-valued attributes. R1
and R2 are two relationships between E1 and E2, where R1 is one-to-many and R2 is
many-to-many. R1 and R2 do not have any attributes of their own. What is the minimum
number of tables required to represent this situation in the relational model?
(a) 2
(b) 3
(c) 4
(d) 5
Answer (b)
See http://geeksquiz.com/gate-gate-cs-2005-question-75/ for explanation.
Answer (d)
A set of attributes S is candidate key of relation R if the closure of S is all attributes of R and
there is no subset of S whose closure is all attributes of R.
Closure of AEH, i.e. AEH+ = {ABCDEH}
Closure of BEH, i.e. BEH+ = {ABCDEH}
Closure of DEH, i.e. DEH+ = {ABCDEH}
Database Management Systems | Set 11
Following questions have been asked in GATE CS 2007 exam.
Answer (B)
The expression given in question does following steps in sequence.
a) Select studids of all female students and selects all courseids of all courses.
b) Then the query does a Cartesian Product of the above select two columns from different
tables.
c) Finally it subtracts enroll table from the result of above step (b). This will remove all the
(studid, courseid) pairs which are present in enroll table. If all female students have registered in
a courses, then this course will not be there in the subtracted result.
So the complete expression returns courses in which a proper subset of female students are
enrolled.
studinfo table
studid name sex
------------------------
1 a Male
2 c Female
3 d Female
enroll table
studid courseid
------------------
1 1
2 1
3 1
2 2
3 3
3 2
Result of step b
studid courseid
---------------------
2 1
2 2
2 3
3 1
3 2
3 3
Result of step c
studid courseid
-------------------
2 3
2) Consider the relation employee(name, sex, supervisorName) with name as the key.
supervisorName gives the name of the supervisor of the employee under consideration.
What does the following Tuple Relational Calculus query produce?
Answer (C)
The query selects all those employees whose immediate subordinate is male. In other words, it
selects names of employees with no immediate female subordinates
3) Consider the table employee(empId, name, department, salary) and the two queries Q1
,Q2 below. Assuming that department 5 has more than one employee, and we want to find
the employees who get higher salary than anyone in the department 5, which one of the
statements is TRUE for any arbitrary employee table?
Q1 : Select e.empId
From employee e
Where not exists
(Select * From employee s where s.department = 5 and
s.salary >=e.salary)
Q2 : Select e.empId
From employee e
Where e.salary > Any
(Select distinct salary From employee s Where s.department = 5)
Answer (D)
5) Consider the following schedules involving two transactions. Which one of the following
statements is TRUE?
Answer (C)
S1 is not conflict serializable, but S2 is conflict serializable
Schedule S1
T1 T2
---------------------
r1(X)
r1(Y)
r2(X)
r2(Y)
w2(Y)
w1(X)
The schedule is neither conflict equivalent to T1T2, nor T2T1.
Schedule S2
T1 T2
---------------------
r1(X)
r2(X)
r2(Y)
w2(Y)
r1(Y)
w1(X)
The schedule is conflict equivalent to T2T1.