Entity Relationship Diagram: Information
Entity Relationship Diagram: Information
Entity Relationship Diagram: Information
When a company asks you to make them a working, functional Database Management System (DBMS)
which they can work with, there are certain steps to follow. Let us summarize them here:
1. Gathering information: This could be a written document that describes the system in question
with reasonable amount of details.
2. Producing ERD: ERD or Entity Relationship Diagram is a diagrammatic representation of the
description we have gathered about the system.
3. Designing the database: Out of the ERD we have created, it is very easy to determine the tables,
the attributes which the tables must contain and the relationship among these tables.
4. Normalization: This is a process of removing different kinds of impurities from the tables we
have just created in the above step.
How to Prepare an ERD
Step 1
Let us take a very simple example and we try to reach a fully organized database from it. Let us
look at the following simple statement:
A boy eats an ice cream.
This is a description of a real word activity, and we may consider the above statement as a written
document (very short, of course).
Step 2
Now we have to prepare the ERD. Before doing that we have to process the statement a little. We
can see that the sentence contains a subject (boy), an object (ice cream) and a verb (eats) that
defines the relationship between the subject and the object. Consider the nouns as entities
(boy and ice cream) and the verb (eats) as a relationship. To plot them in the diagram, put the
nouns within rectangles and the relationship within a diamond. Also, show the relationship with a
directed arrow, starting from the subject entity (boy) towards the object entity (ice cream).
Well, fine. Up to this point the ERD shows how boy and ice cream are related. Now, every boy must have
a name, address, phone number etc. and every ice cream has a manufacturer, flavor, price etc. Without these
the diagram is not complete. These items which we mentioned here are known as attributes, and they must
be incorporated in the ERD as connected ovals.
But can only entities have attributes? Certainly not. If we want then the relationship must have their
attributes too. These attribute do not inform anything more either about the boy or the ice cream, but they
provide additional information about the relationships between the boy and the ice cream.
Step 3
We are almost complete now. If you look carefully, we now have defined structures for at least
three tables like the following:
However, this is still not a working database, because by definition, database should be collection of
related tables. To make them connected, the tables must have some common attributes. If we chose the
attribute Name of the Boy table to play the role of the common attribute, then the revised structure of the
above tables become something like the following.
his is as complete as it can be. We now have information about the boy, about the ice cream he has
eaten and about the date and time when the eating was done.
Cardinality of Relationship
While creating relationship between two entities, we may often need to face the cardinality
problem. This simply means that how many entities of the first set are related to how many entities
of the second set. Cardinality can be of the following three types.
One-to-One
Only one entity of the first set is related to only one entity of the second set. E.g. A teacher teaches
a student. Only one teacher is teaching only one student. This can be expressed in the
followingdiagram as:
Many-to-One
Multiple entities of the first set are related to multiple entities of the second set. E.g. Teachers
teach a student. Many teachers are teaching only one student. This can be expressed in the
followingdiagram as:
Many-to-Many
Multiple entities of the first set is related to multiple entities of the second set. E.g. Teachers teach
students. In any school or college many teachers are teaching many students. This can be
consideredas a two way one-to-many relationship. This can be expressed in the following diagram
as:
In this discussion we have not included the attributes, but you can understand that they can be used
without any problem if we want to.
Super Key or Candidate Key: It is such an attribute of a table that can uniquely identify a row in
a table. Generally they contain unique values and can never contain NULL values. There can be
more than one super key or candidate key in a table e.g. within a STUDENT table Roll and Mobile
No. can both serve to uniquely identify a student.
Primary Key: It is one of the candidate keys that are chosen to be the identifying key for the entire
table. E.g. although there are two candidate keys in the STUDENT table, the college would
obviously use Roll as the primary key of the table.
Alternate Key: This is the candidate key which is not chosen as the primary key of the table. They
are named so because although not the primary key, they can still identify a row.
Composite Key: Sometimes one key is not enough to uniquely identify a row. E.g. in a single
class Roll is enough to find a student, but in the entire school, merely searching by the Roll is not
enough, because there could be 10 classes in the school and each one of them may contain a certain
roll no 5. To uniquely identify the student we have to say something like class VII, roll no 5. So,
a combination of two or more attributes is combined to create a unique combination of values,
such as Class + Roll.
Foreign Key: Sometimes we may have to work with an attribute that does not have a primary key
of its own. To identify its rows, we have to use the primary attribute of a related table. Such a copy
of another related tables primary key is called foreign key.
Strong and Weak Entity
Based on the concept of foreign key, there may arise a situation when we have to relate an entity
having a primary key of its own and an entity not having a primary key of its own. In such a case,
the entity having its own primary key is called a strong entity and the entity not having its own
primary key is called a weak entity. Whenever we need to relate a strong and a weak entity
together, the ERD would change just a little.
Say, for example, we have a statement A Student lives in a Home. STUDENT is obviously a
strong entity having a primary key Roll. But HOME may not have a unique primary key, as its
only attribute Address may be shared by many homes (what if it is a housing estate?). HOME is a
weak entity in this case.
The ERD of this statement would be like the following
As you can see, the weak entity itself and the relationship linking a strong and weak entity must
have double border.
Different Types of Database
There are three different types of data base. The difference lies in the organization of the database
and the storage structure of the data. We shall briefly mention them here.
Relational DBMS
This is our subject of study. A DBMS is relational if the data is organized into relations, that is,
tables. In RDBMS, all data are stored in the well-known row-column format.
Hierarchical DBMS
In HDBMS, data is organized in a tree like manner. There is a parent-child relationship among
data items and the data model is very suitable for representing one-to-many relationship. To access
the data items, some kind of tree-traversal techniques are used, such as preorder traversal.
Because HDBMS is built on the one-to-many model, we have to face a little bit of difficulty to
organize a hierarchical database into row column format. For example, consider the following
hierarchical database that shows four employees (E01, E02, E03, and E04) belonging to the same
department D1.
There are two ways to represent the above one-to-many information into a relation that is built in one-to-
one relationship. The first is called Replication, where the department id is replicated a number of times in
the table like the following.
Replication makes the same data item redundant and is an inefficient way to store data. A better way is to
use a technique called the Virtual Record. While using this, the repeating data item is not used in the table.
It is kept at a separate place. The table, instead of containing the repeating information, contains a pointer
to that place where the data item is stored.
Two main types of connection traps are called fan traps and chasm traps.
Fan Trap. It occurs when a model represents a relationship between entity types, but
pathway between certain entity occurrences is ambiguous.
Chasm Trap. It occurs when a model suggests the existence of a relationship between
entity types, but pathway does not exist between certain entity occurrences.
Now, we will discuss the each trap in detail address)
Fan Trap
A fan trap occurs when one to many relationships fan out from a single entity.
For example: Consider a database of Department, Site and Staff, where one site can
contain number of department, but a department is situated only at a single site. There
are multiple staff members working at a single site and a staff member can work from a
single site. The above case is represented in e-r diagram shown.
The problem of above e-r diagram is that, which staff works in a particular department remain
answered. The solution is to restructure the original E-R model to' represent the correct
association as shown.
In other words the two entities should have a direct relationship between them to provide
the necessary information.
There is one another way to solve the problem of e-r diagram of figure, by introducing
direct relationship between DEPT and STAFF as shown in figure.
Another example: Let us consider another case, where one branch contains multiple staff
members and cars, which are represented.
The problem of above E-R diagram is that, it is unable to tell which member of staff uses a
particular, which is represented. It is not possible tell which member of staff uses' car SH34.
E-R NOTATION
There is no standard for representing data objects in ER diagrams. Each modeling methodology
uses its own notation.
All notational styles represent entities as rectangular boxes and relationships as lines
connecting boxes. Each style uses a special set of symbols to represent the cardinality of
connection. The symbols used for the basic ER constructs are:
Entities are represented by labeled rectangles. The label is the name of the entity. Entity
names should be singular nouns.
Attributes are represented by Ellipses.
A solid line connecting two entities represents relationships. The name of the
relationship is written above the line. Relationship names should be verbs and diamonds
sign is used to represent relationship sets.
Attributes, when included, are listed inside the entity rectangle. Attributes, which are
identifiers, are underlined. Attribute names should be singular nouns.
Multi-valued attributes are represented by double ellipses.
Directed line is used to indicate one occurrence and undirected line is used to indicate
many occurrences in a relation.
The symbols used to design an ER diagram are shown.
The ER diagram showing the usage of different symbols
What is attributes?
Attributes means characteristics. For instance, in a database or a spreadsheet you can apply
attributes to each field or cell to customize your document. As a general attribute, you can choose
whether it is to be a text field or a numeric field or perhaps a computed field, whose value the
application calculates for you.
Then you can apply more specific attributes to the field or cell, such as making the text
bold and right-aligned and perhaps in a particular typeface. If a field is numeric, you will
have other attribute options available, such as how many decimal places to display,
whether to use a dollar sign or a percent symbol, or whether to start a formula in the cell.
There will be only one conceptual view, consisting of the abstract representation of the database
in its entirely. Similarly there will be only one internal or physical view, representing the total
database, as it is physically stored.
Schema
It is important to note that the data in the database changes frequently, while the plans or schemes
remain the same over long periods of time. The database plans consist of types of entities that a
database deals with, the relationship among these entities and the ways in which the entities and
relationships are expressed from one level of abstraction to the next level for the users' view. The
users' view of the data (also called logical organization of data) should be in a form that is most
convenient for the users and they should not be concerned about the way data is physically
organized. Therefore, a DBMS should do the translation between the logical (users' view)
organization and the physical organization of the data in the database.
The plan or scheme of the database is known as Schema. Schema gives the names of the entities
and attributes. It specifies the relationship among them. It is a framework into which the values of
the data items (or fields) are fitted. The plans or the format of schema remains the same. But the
values fitted into this format changes from instance to instance. In other terms, schema means
overall plans of all the data item (field) types and record types stored in a database. Schema
includes the definition of the database name, the record type and the components that make up
those records
Types of Schema
There are three different types of schema in the database corresponding to each data view of
database. In other words, the data views at each of three levels are described by schema.
A schema is defined as an outline or a plan that describes the records and relationships
existing at the particular level. The External view is described by means of a schema called
external schema that correspond to different views of the data. Similarly the Conceptual
view is defined by conceptual schema, which describes all the entities, attributes, and
relationship together with integrity constraints. Internal View is defined by internal
schema, which is a complete description of the internal model, containing definition of
stored records, the methods of representation, the data fields, and the indexes used.
There is only one conceptual schema and one internal schema per database. The schema
also describes the way in which data elements at one level can be mapped to the
corresponding data elements in the next level.
Thus, we can say that schema establishes correspondence between the records and
relationships in the two levels. In a relational database, the schema defines the tables, the
fields in each table, and the relationships between fields and tables. Schema are generally
stored in a data dictionary.
The data in the database at any particular point in time is called a database instance.
Therefore, many database instances can correspond to the same database schema. The
schema is sometimes called the intension of the database, while an instance is called an
extension (or state) of the database.
Example: To understand the difference between the three levels, consider again the database
schema that describes College Database system. If User1 is a Library clerk, the external view
would contain only the student and book information. If User2 is an account office clerk then
he/she may be interested in students detail and fee detail. Shows specific information actually
available at each level regarding a particular user.
The external view would depend upon the user who is accessing the database. The conceptual level
contain the logical view of the whole database, it represents the data type of each required field.
The internal view represents the physical location of each element on the disk of the servers well
as how many bytes of storage each element needs.
Relational Model
Relational model stores data in the form of tables. This concept purposed by Dr. E.F. Codd, a researcher of
IBM in the year 1960s. The relational model consists of three major components:
1. The set of relations and set of domains that defines the way data can be represented (data
structure).
2. Integrity rules that define the procedure to protect the data (data integrity).
3. The operations that can be performed on data (data manipulation).
A rational model database is defined as a database that allows you to group its data items into one
or more independent tables that can be related to one another by using fields common to each
related table.
Hardware overheads: Relational database system hides the implementation complexities and the
physical data storage details from the users. For doing this, i.e. for making things easier for the
users, the relational database systems need more powerful hardware computers and data storage
devices. So, the RDBMS needs powerful machines to run smoothly. But, as the processing power
of modem computers is increasing at an exponential rate and in today's scenario, the need for more
processing power is no longer a very big issue.
Ease of design can lead to bad design: The relational database is an easy to design and use. The
users need not know the complex details of physical data storage. They need not know how the
data is actually stored to access it. This ease of design and use can lead to the development and
implementation of very poorly designed database management systems. Since the database is
efficient, these design inefficiencies will not come to light when the database is designed and when
there is only a small amount of data. As the database grows, the poorly designed databases will
slow the system down and will result in performance degradation and data corruption.
'Information island' phenomenon: As we have said before, the relational database systems are
easy to implement and use. This will create a situation where too many people or departments will
create their own databases and applications.
These information islands will prevent the information integration that is essential for the smooth
and efficient functioning of the organization. These individual databases will also create problems
like data inconsistency, data duplication, data redundancy and so on.
But as we have said all these issues are minor when compared to the advantages and all these
issues could be avoided if the organization has a properly designed database and has enforced good
database standards.