DBMS GFG
DBMS GFG
DBMS GFG
ER Model is used to model the logical view of the system from data perspective which consists of these
components:
An Entity may be an object with a physical existence – a particular person, car, house, or employee – or it may be an
object with a conceptual existence – a company, a job, or a university course.
An Entity is an object of Entity Type and set of all entities is called as entity set. e.g.; E1 is an entity having Entity
Type Student and set of all students is called Entity Set. In ER diagram, Entity type is represented as:
Attribute(s):
Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB, Age, Address,
Mobile_No are the attributes which defines entity type Student. In ER diagram, attribute is represented by an oval.
1. Key Attribute –
The attribute which uniquely identifies each entity in the entity set is called key attribute.For example,
Roll_No will be unique for each student. In ER diagram, key attribute is represented by an oval with underlying
lines.
1. Composite Attribute –
An attribute composed of many other attribute is called as composite attribute. For example, Address
attribute of student Entity type consists of Street, City, State, and Country. In ER diagram, composite attribute
is represented by an oval comprising of ovals.
2. Multivalued Attribute –
An attribute consisting more than one value for a given entity. For example, Phone_No (can be more than one
for a given student). In ER diagram, multivalued attribute is represented by double oval.
3. Derived Attribute –
An attribute which can be derived from other attributes of the entity type is known as derived attribute. e.g.;
Age (can be derived from DOB). In ER diagram, derived attribute is represented by dashed oval.
The complete entity type Student with its attributes can be represented as:
A set of relationships of same type is known as relationship set. The following relationship set depicts S1 is enrolled
in C2, S2 is enrolled in C1 and S3 is enrolled in C3.
Degree of a relationship set:
The number of different entity sets participating in a relationship set is called as degree of a relationship set.
1. Unary Relationship –
When there is only ONE entity set participating in a relation, the relationship is called as unary relationship.
For example, one person is married to only one person.
2. Binary Relationship –
When there are TWO entities set participating in a relation, the relationship is called as binary
relationship.For example, Student is enrolled in Course.
3. n-ary Relationship –
When there are n entities set participating in a relation, the relationship is called as n-ary relationship.
Cardinality:
The number of times an entity of an entity set participates in a relationship set is known as cardinality.
Cardinality can be of different types:
1. One to one – When each entity in each entity set can take part only once in the relationship, the cardinality is
one to one. Let us assume that a male can marry to one female and a female can marry to one male. So the
relationship will be one to one.
Using
Sets, it can be represented as:
In this case, each student is taking only 1 course but 1 course has been taken by many students.
3. Many to many – When entities in all entity sets can take part more than once in the relationship cardinality
is many to many. Let us assume that a student can take more than one course and one course can be taken by
many students. So the relationship will be many to many.
Every student in Student Entity set is participating in relationship but there exists a course C4 which is not
taking part in the relationship.
Weak Entity Type and Identifying Relationship:
As discussed before, an entity type has a key attribute which uniquely identifies each entity in the entity set. But
there exists some entity type for which key attribute can’t be defined. These are called Weak Entity type.
For example, A company may store the information of dependants (Parents, Children, Spouse) of an Employee. But
the dependents don’t have existence without the employee. So Dependent will be weak entity type and Employee
will be Identifying Entity type for Dependant.
A weak entity type is represented by a double rectangle. The participation of weak entity type is always total. The
relationship between weak entity type and its identifying strong entity type is called identifying relationship and it is
represented by double diamond.
Enhanced ER Model
Prerequisite – Introduction of ER Model
Todays time the complexity of the data is increasing so it becomes more and more difficult to use the traditional ER
model for database modeling. To reduce this complexity of modeling we have to make improvements or
enhancements were made to the existing ER model to make it able to handle the complex application in a better
way.
Enhanced entity-relationship diagrams are advanced database diagrams very similar to regular ER diagrams which
represents requirements and complexities of complex databases.
It is a diagrammatic technique for displaying the Sub Class and Super Class; Specialization and Generalization;
Union or Category; Aggregation etc.
Generalization and Specialization –
These are very common relationship found in real entities. However this kind of relationships was added later as
enhanced extension to classical ER model. Specialized class are often called as subclass while generalized
class are called superclass, probably inspired by object oriented programming. A sub-class is best understood
by “IS-A analysis”. Following statements hopefully makes some sense to your mind “Technician IS-A Employee”,
“Laptop IS-A Computer”.
An entity is specialized type/class of other entity. For example, Technician is special Employee in a university
system Faculty is special class of Employee. We call this phenomenon as generalization/specialization. In the
example here Employee is generalized entity class while Technician and Faculty are specialized class of Employee.
Example – This example instance of “sub-class” relationships. Here we have four sets employee: Secretary,
Technician, and Engineer. Employee is super-class of rest three set of individual sub-class is subset of Employee set.
An entity belonging to a sub-class is related with some super-class entity. For instance emp no 1001 is a
secretary, and his typing speed is 68. Emp no 1009 is engineer (sub-class) and her trade is “Electrical”, so forth.
Sub-class entity “inherits” all attributes of super-class; for example employee 1001 will have attributes eno,
name, salary, and typing speed.
Enhanced ER model of above example –
For Student(SID, Name), SID is the primary key. For Course ( CID, C_name ), CID is the primary key
Student Course
(SID Name) ( CID C_name )
-------------- -----------------
1 A c1 Z
2 B c2 Y
3 C c3 X
4 D
Enroll
(SID CID)
----------
1 C1
2 C1
3 c3
4 C2
Now the question is, what should be the primary key for Enroll SID or CID or combined. We can’t have CID as
primary key as you can see in enroll for the same CID we have multiples SID. (SID , CID) can distinguish table
uniquely, but it is not minimum. So SID is the primary key for the relation enroll.
For above ER diagram, we considered three tables in database
Student
Enroll
Course
But we can combine Student and Enroll table renamed as Student_enroll.
Student_Enroll
( SID Name CID )
---------------------
1 A c1
2 B c1
3 C c3
4 D c2
Student and enroll tables are merged now .
So require minimum two DBMS tables for Student_enroll and Course.
Note: In One to Many relationship we can have minimum two tables.
2. When there is Many to Many cardinality in ER Diagram.
Let us consider above example with the change that now student can also enroll more than 1 course.
Student Course
( SID Name) ( CID C_name )
-------------- -----------------
1 A c1 Z
2 B c2 Y
3 C c3 X
4 D
Enroll
( SID CID )
----------
1 C1
1 C2
2 C1
2 C2
3 c3
4 C2
Now, same question what is the primary key of Enroll relation, if we carefully analyse the Enroll primary key for
Enroll
table is ( SID , CID ).
But in this case we can’t merge Enroll table with any one of Student and Course. If we try to merge Enroll with any
one of the Student and Course it will create redundant data.
Note: Minimum three tables are required in Many to Many relationship.
The primary key of E1, which is in total participation should not be allowed as the primary key of the reduced
table, since if the primary key of E1 is used, it might have null values for many of its entries in the reduced
table for some E2 entities.
Specialization –
In specialization, an entity is divided into sub-entities based on their characteristics. It is a top-down approach where higher
level entity is specialized into two or more lower level entities. For Example, EMPLOYEE entity in an Employee
management system can be specialized into DEVELOPER, TESTER etc. as shown in Figure 2. In this case, common
attributes like E_NAME, E_SAL etc. become part of higher entity (EMPLOYEE) and specialized attributes like TES_TYPE
become part of specialized entity (TESTER).
Aggregation –
An ER diagram is not capable of representing relationship between an entity and a relationship which may be required in
some scenarios. In those cases, a relationship with its corresponding entities is aggregated into a higher level entity. For
Example, Employee working for a project may require some machinery. So, REQUIRE relationship is needed between
relationship WORKS_FOR and entity MACHINERY. Using aggregation, WORKS_FOR relationship with its entities
EMPLOYEE and PROJECT is aggregated into single entity and relationship REQUIRE is created between aggregated entity
and MACHINERY.
Recursive Relationships in ER diagrams
Prerequisite – ER Model
A relationship between two entities of similar entity type is called a recursive relationship. Here the same entity
type participates more than once in a relationship type with a different role for each instance. In other words, a
relationship has always been between occurrences in two different entities. However, it is possible for the same
entity to participate in the relationship. This is termed a recursive relationship.
Example –
Let us suppose that we have an employee table. A manager supervises a subordinate. Every employee can have
a supervisor except the CEO and there can be at most one boss for each employee. One employee may be the
boss of more than one employee. Let’s suppose that REPORTS_TO is a recursive relationship on the Employee
entity type where each Employee plays two roles
Supervisor
Subordinate
Supervisor and Subordinate are called “Role Names”. Here the degree of the REPORTS_TO relationship is 1
i.e. a unary relationship.
The minimum cardinality of Supervisor entity is ZERO since the lowest level employee may not be a manager
for anyone.
The maximum cardinality of Supervisor entity is N since an employee can manage many employees.
Similarly the Subordinate entity has a minimum cardinality of ZERO to account for the case where CEO can
never be a subordinate.
It maximum cardinality is ONE since a subordinate employee can have at most one supervisor.
Note – Here none of the participants have a total participation since both minimum cardinalities are Zero.
Hence, the relationships are connected by a single line instead of a double line in the ER diagram.
To implement a recursive relationship, a foreign key of the employee’s manager number would be held in each
employee record. A Sample table would look something like this:-
The first problem that may occur is that is data type mismatch means the programming language attribute data
type may differ from the attribute data type in the data model.
Hence it is quite necessary to have a binding for each host programming language that specifies for each
attribute type the compatible programming language types. It is necessary to have different data types, for
example, we have different data types available in different programming languages such as data types in C are
different from Java and both differ from SQL data types.
The second problem that may occur is because the results of most queries are sets or multisets of tuples and
each tuple is formed of a sequence of attribute values. In the program, it is necessary to access the individual
data values within individual tuples for printing or processing.
Hence there is a need for binding to map the query result data structure which is a table to an appropriate data
structure in the programming language. A mechanism is needed to loop over the tuples in a query result in
order to access a single tuple at a time and to extract individual values from the tuple.
The extracted values are typically copied to appropriate program variables for further processing by the
program.
A cursor or iterator is a variable which is used for looping over the tuples in a query result. Individual values
within each tuple are extracted into different or unique program variables of the appropriate datatype.
Impedance mismatch is less of a problem when a special database programming language is designed that uses
the same data model and data type as a database model for example Oracles’sPL/SQL.
Introduction of Relational Model and Codd Rules in DBMS
Terminology
Relational Model: Relational model represents data in the form of relations or tables.
Relational Schema: Schema represents structure of a relation. e.g.; Relational Schema of STUDENT relation
can be represented as:
STUDENT (STUD_NO, STUD_NAME, STUD_PHONE, STUD_STATE, STUD_COUNTRY, STUD_AGE)
Relational Instance: The set of values present in a relation at a particular instance of time is known as relational
instance as shown in Table 1 and Table 2.
Attribute: Each relation is defined in terms of some properties, each of which is known as attribute. For
Example, STUD_NO, STUD_NAME etc. are attributes of relation STUDENT.
Domain of an attribute: The possible values an attribute can take in a relation is called its domain. For Example,
domain of STUD_AGE can be from 18 to 40.
Tuple: Each row of a relation is known as tuple. e.g.; STUDENT relation given below has 4 tuples.
NULL values: Values of some attribute for some tuples may be unknown, missing or undefined which are
represented by NULL. Two NULL values in a relation are considered different from each other.
Table 1 and Table 2 represent relational model having two relations STUDENT and STUDENT_COURSE.
Codd rules were proposed by E.F. Codd which should be satisfied by relational model.
Foundation Rule: For any system that is advertised as, or claimed to be, a relational data base management
system, that system must be able to manage data bases entirely through its relational capabilities.
Information Rule: Data stored in Relational model must be a value of some cell of a table.
Guaranteed Access Rule: Every data element must be accessible by table name, its primary key and name of
attribute whose value is to be determined.
Systematic Treatment of NULL values: NULL value in database must only correspond to missing, unknown or
not applicable values.
Active Online Catalog: Structure of database must be stored in an online catalog which can be queried by
authorized users.
Comprehensive Data Sub-language Rule: A database should be accessible by a language supported for
definition, manipulation and transaction management operation.
View Updating Rule: Different views created for various purposes should be automatically updatable by the
system.
High level insert, update and delete rule: Relational Model should support insert, delete, update etc. operations
at each level of relations. Also, set operations like Union, Intersection and minus should be supported.
Physical data independence: Any modification in the physical location of a table should not enforce
modification at application level.
Logical data independence: Any modification in logical or conceptual schema of a table should not enforce
modification at application level. For example, merging of two tables into one should not affect application
accessing it which is difficult to achieve.
Integrity Independence: Integrity constraints modified at database level should not enforce modification at
application level.
Distribution Independence: Distribution of data over various locations should not be visible to end-users.
Non-Subversion Rule: Low level access to data should not be able to bypass integrity rule to change data.
Relational Model was proposed by E.F. Codd to model data in the form of relations or tables. After designing
the conceptual model of Database using ER diagram, we need to convert the conceptual model in the relational
model which can be implemented using any RDMBS languages like Oracle SQL, MySQL etc. So we will see
what Relational Model is.
Relational Model represents how data is stored in Relational Databases. A relational database stores data in the
form of relations (tables). Consider a relation STUDENT with attributes ROLL_NO, NAME, ADDRESS,
PHONE and AGE shown in Table 1.
STUDENT
IMPORTANT TERMINOLOGIES
Attribute: Attributes are the properties that define a relation. e.g.; ROLL_NO, NAME
Relation Schema: A relation schema represents name of the relation with its attributes. e.g.; STUDENT
(ROLL_NO, NAME, ADDRESS, PHONE and AGE) is relation schema for STUDENT. If a schema has more
than 1 relation, it is called Relational Schema.
Tuple: Each row in the relation is known as tuple. The above relation contains 4 tuples, one of which is shown
as:
1 RAM DELHI 9455123451 18
Relation Instance: The set of tuples of a relation at a particular instance of time is called as relation instance.
Table 1 shows the relation instance of STUDENT at a particular time. It can change whenever there is
insertion, deletion or updation in the database.
Degree: The number of attributes in the relation is known as degree of the relation. The STUDENT relation
defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation defined above
has cardinality 4.
Column: Column represents the set of values for a particular attribute. The column ROLL_NO is extracted
from relation STUDENT.
ROLL_NO
1
2
3
4
NULL Values: The value which is not known or unavailable is called NULL value. It is represented by blank
space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
Constraints in Relational Model
While designing Relational Model, we define some conditions which must hold for data present in database are
called Constraints. These constraints are checked before performing any operation (insertion, deletion and
updation) in database. If there is a violation in any of constrains, operation will fail.
Domain Constraints: These are attribute level constraints. An attribute can only take values which lie inside the
domain range. e.g,; If a constrains AGE>0 is applied on STUDENT relation, inserting negative value of AGE
will result in failure.
Key Integrity: Every relation in the database should have atleast one set of attributes which defines a tuple
uniquely. Those set of attributes is called key. e.g.; ROLL_NO in STUDENT is a key. No two students can
have same roll number. So a key has two properties:
STUDENT
BRANCH_CODE BRANCH_NAME
CS COMPUTER SCIENCE
IT INFORMATION TECHNOLOGY
ECE ELECTRONICS AND COMMUNICATION ENGINEERING
CV CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take the values which are present in BRANCH_CODE of BRANCH
which is called referential integrity constraint. The relation which is referencing to other relation is called
REFERENCING RELATION (STUDENT in this case) and the relation to which other relations refer is called
REFERENCED RELATION (BRANCH in this case).
The value of Candidate Key is unique and non-null for every tuple.
There can be more than one candidate key in a relation. For Example, STUD_NO as well as STUD_PHONE
both are candidate keys for relation STUDENT.
The candidate key can be simple (having only one attribute) or composite as well. For Example, {STUD_NO,
COURSE_NO} is a composite candidate key for relation STUDENT_COURSE.
Note – In Sql Server a unique constraint that has a nullable column, allows the value ‘null‘ in that column only
once. That’s why STUD_PHONE attribute as candidate here, but can not be ‘null’ values in primary key
attribute.
Super Key: The set of attributes which can uniquely identify a tuple is known as Super Key. For Example,
STUD_NO, (STUD_NO, STUD_NAME) etc.
Alternate Key: The candidate key other than primary key is called as alternate key. For Example, STUD_NO as
well as STUD_PHONE both are candidate keys for relation STUDENT but STUD_PHONE will be alternate
key (only one out of many candidate keys).
Foreign Key: If an attribute can only take the values which are present as values of some other attribute, it will
be foreign key to the attribute to which it refers. The relation which is being referenced is called referenced
relation and corresponding attribute is called referenced attribute and the relation which refers to referenced
relation is called referencing relation and corresponding attribute is called referencing attribute. Referenced
attribute of referenced relation should be primary key for it. For Example, STUD_NO in STUDENT_COURSE
is a foreign key to STUD_NO in STUDENT relation.
It may be worth noting that unlike, Primary Key of any given relation, Foreign Key can be NULL as well as
may contain duplicate tuples i.e. it need not follow uniqueness constraint.
For Example, STUD_NO in STUDENT_COURSE relation is not unique. It has been repeated for the first and
third tuple. However, the STUD_NO in STUDENT relation is a primary key and it needs to be always unique
and it cannot be null.
Please write comments if you find anything incorrect, or you want to share more information about the topic
discussed above
This means that from a super key when we remove all the attributes that are unnecessary for its uniqueness,
only then it becomes a primary/candidate key. So, in essence, all primary/candidate keys are super keys but not
all superkeys are primary/candidate keys. By the formal definition of a Relation(Table), we know that the
tuples of a relation are all unique. So the set of all attributes itself is a super key.
Counting the possible number of superkeys for a table is a common question for GATE. The examples below will
demonstrate all possible types of questions on this topic.
Example-1 : Let a Relation R have attributes {a1,a2,a3} & a1 is the candidate key. Then how many super keys are
possible?
Here, any superset of a1 is the super key.
Super keys are = {a1, a1 a2, a1 a3, a1 a2 a3}
Thus we see that 4 Super keys are possible in this case.
In general, if we have ‘N’ attributes with one candidate key then the number of possible superkeys are 2 (N – 1).
Example-2 : Let a Relation R have attributes {a1, a2, a3,…,an}. Find Super key of R.
Maximum Super keys = 2n – 1.
If each attribute of relation is candidate key.
Example-3 : Let a Relation R have attributes {a1, a2, a3,…,an} and the candidate key is “a1 a2 a3” then the possible
number of super keys?
Following the previous formula, we have 3 attributes instead of one. So, here the number of possible superkeys are 2 (N-
3)
.
Example-4 : Let a Relation R have attributes {a1, a2, a3,…,an} and the candidate keys are “a1”, “a2” then the possible
number of super keys?
This problem now is slightly different since we now have two different candidate keys instead of only one. Tackling
problems like these is shown in the diagram below:
= (superkeys possible with candidate key A1) + (superkeys possible with candidate key A2)
– (common superkeys from both A1 and A2)
Example-6 : Let a Relation R have attributes {a1, a2, a3,…,an} and the candidate keys are “a1 a2”, “a3 a4”
then the possible number of super keys?
Super keys of(a1 a2) + Super keys of(a3 a4) – Super keys of(a1 a2 a3 a4)
⇒ 2(n – 2) + 2(n – 2) – 2(n – 4)
Example-7 : Let a Relation R have attributes {a1, a2, a3,…,an} and the candidate keys are “a1 a2”, “a1 a3”
then the possible number of super keys?
Super keys of(a1 a2) + Super keys of(a1 a3) – Super keys of(a1 a2 a3)
⇒ 2(n – 2) + 2(n – 2) – 2(n – 3)
Example-8 : Let a Relation R have attributes {a1, a2, a3,…,an} and the candidate keys are “a1”, “a2”, “a3” then
the possible number of super keys?
In this question, we have 3 different candidate keys. Tackling problems like these are shown in the diagram
below.
→ |A1 ∪ A2 ∪ A3| = |A1| + |A2| + |A3| – |A1 ∩ A2| – |A1 ∩ A3| – |A2 ∩ A3| + |A1 ∩ A2 ∩ A3|
= (superkeys possible with candidate key A1) + (superkeys possible with candidate key A2) + (superkeys
possible with candidate key A3) – (common superkeys from both A1 and A2) – (common superkeys from both
A1 and A3) – (common superkeys from both A2 and A3) + (common superkeys from both A1, A2 and A3)
Step 1:- First of all, we have to find what the candidate keys are:-
as we can see in given functional dependency D is missing but in relation, D is given so D must be a prime
attribute of the Candidate key.
#Supekeys = 4 * 26 – 6 * 25 + 4 * 24 – 23 = 120