ER Model Info
ER Model Info
ER Model Info
Entity
An entity can be a real-world object, either animate or inanimate, that can be easily identifiable. For
example, in a school database, students, teachers, classes, and courses offered can be considered
as entities. All these entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities with
attribute sharing similar values. For example, a Students set may contain all the students of a
school; likewise a Teachers set may contain all the teachers of a school from all faculties. Entity
sets need not be disjoint.
Attributes
Entities are represented by means of their properties, called attributes. All attributes have values.
For example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a
student's name cannot be a numeric value. It has to be alphabetic. A student's age cannot be
negative, etc.
Types of Attributes
Simple attribute Simple attributes are atomic values, which cannot be divided further. For
example, a student's phone number is an atomic value of 10 digits.
Composite attribute Composite attributes are made of more than one simple attribute.
For example, a student's complete name may have first_name and last_name.
Derived attribute Derived attributes are the attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, average_salary in a department should not be saved directly in the database,
instead it can be derived. For another example, age can be derived from data_of_birth.
Multi-value attribute Multi-value attributes may contain more than one values. For
example, a person can have more than one phone number, email_address, etc.
Super Key A set of attributes (one or more) that collectively identifies an entity in an entity
set.
Candidate Key A minimal super key is called a candidate key. An entity set may have
more than one candidate key.
Primary Key A primary key is one of the candidate keys chosen by the database designer
to uniquely identify the entity set.
Relationship
The association among entities is called a relationship. For example, an employee works_ata
department, a student enrolls in a course. Here, Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like entities, a relationship too can
have attributes. These attributes are called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree
Mapping Cardinalities
Cardinality defines the number of entities in one entity set, which can be associated with the
number of entities of other set via relationship set.
One-to-one One entity from entity set A can be associated with at most one entity of entity
set B and vice versa.
One-to-many One entity from entity set A can be associated with more than one entities
of entity set B however an entity from entity set B, can be associated with at most one
entity.
Many-to-one More than one entities from entity set A can be associated with at most one
entity of entity set B, however an entity from entity set B can be associated with more than
one entity from entity set A.
Many-to-many One entity from A can be associated with more than one entity from B and
vice versa.
Let us now learn how the ER Model is represented by means of an ER diagram. Any object, for
example, entities, attributes of an entity, relationship sets, and attributes of relationship sets, can be
represented with the help of an ER diagram.
Entity
Entities are represented by means of rectangles. Rectangles are named with the entity set they
represent.
Attributes
Attributes are the properties of entities. Attributes are represented by means of ellipses. Every
ellipse represents one attribute and is directly connected to its entity (rectangle).
If the attributes are composite, they are further divided in a tree like structure. Every node is then
connected to its attribute. That is, composite attributes are represented by ellipses that are
connected with an ellipse.
Derived attributes
depicted
by
are
dashed
ellipse.
Relationship
Relationships are represented by diamond-shaped box. Name of the relationship is written inside
the diamond-box. All the entities (rectangles) participating in a relationship, are connected to it by a
line.
One-to-one When only one instance of an entity is associated with the relationship, it is
marked as '1:1'. The following image reflects that only one instance of each entity should be
associated with the relationship. It depicts one-to-one relationship.
One-to-many When more than one instance of an entity is associated with a relationship,
it is marked as '1:N'. The following image reflects that only one instance of entity on the left
and more than one instance of an entity on the right can be associated with the relationship.
It depicts one-to-many relationship.
Many-to-one When more than one instance of entity is associated with the relationship, it
is marked as 'N:1'. The following image reflects that more than one instance of an entity on
the left and only one instance of an entity on the right can be associated with the
relationship. It depicts many-to-one relationship.
Many-to-many The following image reflects that more than one instance of an entity on
the left and more than one instance of an entity on the right can be associated with the
relationship. It depicts many-to-many relationship.
Participation Constraints
Partial participation Not all entities are involved in the relationship. Partial participation is
represented by single lines.
The ER Model has the power of expressing database entities in a conceptual hierarchical manner.
As the hierarchy goes up, it generalizes the view of entities, and as we go deep in the hierarchy, it
gives us the detail of every entity included.
Going up in this structure is called generalization, where entities are clubbed together to represent
a more generalized view. For example, a particular student named Mira can be generalized along
with all the students. The entity shall be a student, and further, the student is a person. The reverse
is called specialization where a person is a student, and that student is Mira.
Generalization
As mentioned above, the process of generalizing entities, where the generalized entities contain the
properties of all the generalized entities, is called generalization. In generalization, a number of
entities are brought together into one generalized entity based on their similar characteristics. For
example, pigeon, house sparrow, crow and dove can all be generalized as Birds.
Specialization
Specialization is the opposite of generalization. In specialization, a group of entities is divided into
sub-groups based on their characteristics. Take a group Person for example. A person has name,
date of birth, gender, etc. These properties are common in all persons, human beings. But in a
company, persons can be identified as employee, employer, customer, or vendor, based on what
role they play in the company.
Similarly, in a school database, persons can be specialized as teacher, student, or a staff, based on
what role they play in school as entities.
Inheritance
We use all the above features of ER-Model in order to create classes of objects in object-oriented
programming. The details of entities are generally hidden from the user; this process known
as abstraction.
Inheritance is an important feature of Generalization and Specialization. It allows lower-level entities
to inherit the attributes of higher-level entities.
For example, the attributes of a Person class such as name, age, and gender can be inherited by
lower-level entities such as Student or Teacher.
RDBMS
Introduction
The relational database is the most common form of database found today and is at
the heart of a multi-billion dollar global industry. If you have not directly come across
one you can be assured you will have unknowingly used many products or services
where a relational database is at the core of their business. During this unit you will be
introduced to designing a relational database using methodologies such as
Normalisation and Entity Relationship Modelling, implementing your design using
Structured Query Language (SQL) and finally interacting with your implementation
using SQL and higher level programming languages. But first let us start from the
beginning in defining what a relational database is.
Though you are probably aware of the term database, what youre actually referring to
is a Database Management System. A database in specific terms is a collection of
related data managed as a single unit and stored on some form of persistent storage
device such as a hard disk. This collection of data is managed by the Relational
Database Management System (RDBMS) which acts to control access to the database.
The RDBMS is a collection of applications sitting on top of the database providing a
number of vital functions which include:
Allows users to create databases
Allow users to query the data stored
Support and maintain large amounts of data
Allow multiple users to access the data concurrently
Most importantly the RDBMS provides consistency, integrity and security of the data
it holds and as such the reliability and robustness of the relational database it looks
after.
Separation of Concept
When we consider a Relational Database we can illustrate it as a set of conceptual
components and interfaces. This helps us get a better overview of how each of the
elements to be learnt fit together and interact with each other.
Database can be considered as a collection of related data
managed as a single unit
DBMS a collection of applications providing a number of vital
functions for the database
The other four are our perception of design, implementation, use and maintenance of
the database and how we interact with it to achieve our goals of efficient data storage
and information retrieval. These are the areas we will be looking at within this unit but
for now they are:
When we communicate about a particular subject area it is very important we use the
common terms of reference. Relational database design is no different with its own set
of unique terms, and in order to express or define our ideas and concepts we need to
know the language to use. With Relational Databases this set of terms goes further
being used to express and define the database design process itself and is used
throughout the industry from publications and education to conversations between
database professionals. In this section we are going to look at some of these terms and
where they are used.
The process of creating a database can be broadly divided into two main stages:
1. Data analysis, using a formalised methodology to create a
database design. Two widely used methods are Entity
Relationship Modelling (ER) and Normalisation.
2. Physical implementation of that design in a database system.
There are many examples of Relational databases including
MySQL, Oracle, SQLServer to mention but a few.
As you move from a database design to a physical implementation, different
terminology is used. It is important to understand these differences and ensure the
correct terms are used for the appropriate Methodology or stage you are discussing or
presenting.
The following table identifies each of the different disciplines and their equivalent
terms in relation to the other disciplines.
Entity:
A uniquely identifiable object of important from a top level
perspective of an organisation or business model.
Entity Occurrence:
A single instance of an entity.
Attribute:
An identified element within an entity
If we consider a college as an example of something we might be modelling, a
department, student or module might be examples of a uniquely identifiable object of
important. It is worth noting that by convention we tend to name entities in the
singular as in student rather than students.
Continuing with this example, within a Student entity we would have attributes such
as studentId and studentName and an entity occurrence would be a single instance of
these attributes.
Normalisation Terms
Relation:
A bottom up view of a design concept of a realisation of a
potential database table taken from the movement of
information within an organisation or business model
Tuple:
An ordered finite set of values of a relation.
Domain:
Defines the constraint and type of a single value element of a
relation.
Table:
A table is the conceptual view of the databases internal
structure in the context of the 3 layer model.
Record/Row:
A row/record is a set of related data values of a common item.
Column:
A column is a data value of a particular item type.
This is the implementation of the design and an example can be seen in the following
SQL statement
Relational Model
is representative of a single entity or relation within the context
of a relational database where each of the elements of the
entity or relation have been defined. For example a single
entity has its attributes, constraints and keys defined which are
representative of the completed table to be implemented.
Relational Schema
is a realisation of a relational model. It is the implementation of
this model into the relational database. In other words, it is a
physical single table within the database.
Relational Database Schema
is the collection of Relational Schemas and their relationships
to each other as implemented into a relational database. In
other words, it is a collection of physical tables and their
relationships that make up the database as a whole.
Database Keys
Introduction
For the purposes of clarity we will refer to keys in terms of RDBMS tables but the
same definition, principle and naming applies equally to Entity Modelling and
Normalisation.
Keys are, as their name suggests, a key part of a relational database and a vital part of
the structure of a table. They ensure each record within a table can be uniquely
identified by one or a combination of fields within the table. They help enforce
integrity and help identify the relationship between tables. There are three main types
of keys, candidate keys, primary keys and foreign keys. There is also an alternative
key or secondary key that can be used, as the name suggests, as a secondary or
alternative key to the primary key
Super Key
A Super key is any combination of fields within a table that uniquely identifies each
record within that table.
Candidate Key
A candidate is a subset of a super key. A candidate key is a single field or the least
combination of fields that uniquely identifies each record in the table. The least
combination of fields distinguishes a candidate key from a super key. Every table
must have at least one candidate key but at the same time can have several.
Once your candidate keys have been identified you can now select one to be your
primary key
Primary Key
A primary key is a candidate key that is most appropriate to be the main reference key
for the table. As its name suggests, it is the primary key of reference for the table and
is used throughout the database to help establish relationships with other tables. As
with any candidate key the primary key must contain unique values, must never be
null and uniquely identify each record in the table.
As an example, a student id might be a primary key in a student table, a department
code in a table of all departments in an organisation. This module has the code DH3D
35 that is no doubt used in a database somewhere to identify RDBMS as a unit in a
table of modules. In the table below we have selected the candidate key student_id to
be our most appropriate primary key
Primary keys are mandatory for every table each record must have a value for its
primary key. When choosing a primary key from the pool of candidate keys always
choose a single simple key over a composite key.
Foreign Key
A foreign key is generally a primary key from one table that appears as a field in
another where the first table has a relationship to the second. In other words, if we had
a table A with a primary key X that linked to a table B where X was a field in B, then
X would be a foreign key in B.
An example might be a student table that contains the course_id the student is
attending. Another table lists the courses on offer with course_id being the primary
key. The 2 tables are linked through course_id and as such course_id would be a
foreign key in the student table.
A table may have one or more choices for the primary key. Collectively these are
known as candidate keys as discuss earlier. One is selected as the primary key. Those
not selected are known as secondary keys or alternative keys.
For example in the table showing candidate keys above we identified two candidate
keys, studentId and firstName + lastName. The studentId would be the most
appropriate for a primary key leaving the other candidate key as secondary or
alternative key. It should be noted for the other key to be candidate keys, we are
assuming you will never have a person with the same first and last name combination.
As this is unlikely we might consider fistName+lastName to be a suspect candidate
key as it would be restrictive of the data you might enter. It would seem a shame to
not allow John Smith onto a course just because there was already another John
Smith.
Simple Key
Any of the keys described before (ie primary, secondary or foreign) may comprise one
or more fields, for example if firstName and lastName was our key this would be a
key of two fields where as studentId is only one. A simple key consists of a single
field to uniquely identify a record. In addition the field in itself cannot be broken
down into other fields, for example, studentId, which uniquely identifies a particular
student, is a single field and therefore is a simple key. No two students would have the
same student number.
Compound Key
A compound key consists of more than one field to uniquely identify a record. A
compound key is distinguished from a composite key because each field, which
makes up the primary key, is also a simple key in its own right. An example might be
a table that represents the modules a student is attending. This table has a studentId
and a moduleCode as its primary key. Each of the fields that make up the primary key
are simple keys because each represents a unique reference when identifying a student
in one instance and a module in the other.
Composite
A composite key consists of more than one field to uniquely identify a record. This
differs from a compound key in that one or more of the attributes, which make up the
key, are not simple keys in their own right. Taking the example from compound key,
imagine we identified a student by their firstName + lastName. In our table
representing students on modules our primary key would now be firstName +
lastName + moduleCode. Because firstName + lastName represent a unique reference
to a student, they are not each simple keys, they have to be combined in order to
uniquely identify the student. Therefore the key for this table is a composite key.
Normalisation Overview
The Normalisation Process
Initially there were only three normal forms, First Normal Form (1NF), Second
Normal Form (2NF) and Third Normal Form (3NF), but over time three more were
added. In general terms the first three are more commonly used in database modelling.
The additional three are identification of potential redundancies that could be
considered but however when applied practically can lead to inefficiencies in
performance and tend to be used under special circumstances or for consideration with
complex data structures.
In addition we have something called Un-Normalised Form (UNF), though not
generally considered as part of the normalisation rules, is representative of the very
first stages of the normalisation process.
We can identify each of the normal forms as follows and will define each in detail
thereafter:
1. Un-Normalised Form (UNF) Data Modelling
Un-normalised Form
Create Model Data
Un-Normalised Form (UNF)
In this example with Student ID as the primary key we see the three domains,
StudentName, Year and Semester repeat themselves across the tuples for each of the
different UnitCode and UnitName entries. Though workable it means our relation
could potentially be huge with loads of repeating data taking up valuable space and
costing valuable time to search through.
The rules of First Normal Form break this relation into two and relate them to each
other so the information needed can be found without storing unneeded data. So from
our example we would have one table with the student information and another with
the Unit Information with the two relations linked by a domain common to both, in
this case, the StudentId.
So the steps from UNF to 1NF are:
1. Identify repeating groups of data. Make sure your model data is
of good quality to help identify the repeating groups and dont
be afraid to move the domains around to help with the process.
2. Remove the domains of the repeating groups to a new relation
leaving a copy of the primary key with the relation that is left.
Would it not seem more sensible to have a different relation we could use to look up
UG45783 and find the unit name Advanced Database? This way we wouldnt have
to store lots of additional duplicate information in our Student/Unit relation.
This is exactly what we aim to achieve with Second Normal Form and its purpose is
to remove partial dependancies.
We can consider a relation to be in Second Normal Form when: The relation is in First
Normal Form and all partial key dependencies are removed so that all non key
domains are functionally dependant on all of the domains that make up the primary
key.
Before we start with the steps, if we have a table with only a single simple key this
cant have any partial dependencies as there is only one domain that is a key therfroe
these relations can be moved directly to 2nd normal form.
For the rest the steps from 2NF to 3NF are:
1. Take each non-key domain in turn and check if it is only
dependant on part of the key?
2. If yes
a. Remove the non-key domain along with a copy of the part
of the key it is dependent upon to a new relation.
current location, in order to get to A, our primary key, we need to step on a stepping
stone B, another non-key domain, to help us get there. Of course we could jump
directly from C to A, but it is easier, and we are less likely to fall in, if we use our
stepping stone B. Therefore current location C is transitively dependent on A through
our stepping stone B.
Before we start with the steps, if we have any relations with zero or only one non-key
domain we cant have a transitive dependency so these move straight to 3rd Normal
Form
For the rest the steps from 2NF to 3NF are:
1. Take each non-key domain in turn and check it is more
dependent on another non-key domain than the primary key.
2. If yes
a. Move the dependent domain, together with a copy of the
non-key attribute upon which it is dependent, to a new
relation.
b. Make the non-key domain, upon which it is dependent,
the key in the new relation.
A multiple repeating group is where there are two or more repeating groups within the
relation to be normalised. The data within each repeating group is unconnected with
any other data within a repeating group.
The normalisation process is exactly the same as described previously, except it needs
to be carried out for each repeating group. Each group of repeating domains will result
in a separate table.
Nested Repeating Groups
A nested repeating group is where there are two or more repeating groups within the
relation to be normalised. The data within one repeating group is nested within
another repeating group. That is, the data is related.
The normalisation process is exactly the same as described previously, except it needs
to be carried out repeatedly till the relation is in 1st Normal form.
The upshot of both of these anomalies is you should never move on to 2nd Normal
Form until you are absolutely sure the relation you are dealing with is defiantly in 1st
Normal Form.