Database Management System
DBMS is the acronym of Database Management System: a collection of interrelated data and
a set of programs to access those data. It manages large amounts of data and supports
efficient access to them.
Features of Database:
Simplicity: Simplicity requires that the design and implementation avoid introducing more
elements than are absolutely necessary.
Right kind of element: Attributes are easier to implement, but entity sets and relationships
are sometimes necessary; care is needed to ensure that the right kind of element is introduced.
Types of Database Users
Database Designer: Person responsible for preparing external schemas for applications,
identifying and integrating user needs into a conceptual (or community or enterprise)
schema.
Database Administrator: Person responsible for defining the internal schema and sub-schemas
(with database designers), specifying mappings between schemas, monitoring
database usage and supervising DBMS functionality (e.g., access control, performance
optimisation, backup and recovery policies, conflict management).
End Users: Users who query and update the database through fixed programs (invoked by
non-programmer users) e.g., banking.
Levels of Data Abstraction: A 3-tier architecture separates its tiers from each other based on the
complexity of the users and how they use the data present in the database. It is the most widely
used architecture to design a DBMS.
Physical Level: It is the lowest level of abstraction and describes how the data are actually
stored, describing complex low-level data structures in detail.
Logical Level: It is the next higher level of abstraction and describes what data are stored
and what relationships exist among those data. At the logical level, each such record is
described by a type definition and the interrelationship of these record types is defined as
well. Database administrators usually work at this level of abstraction.
View Level: It is the highest level of abstraction and describes only part of the entire
database and hides the details of the logical level.
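The view level can be illustrated with a database view, which exposes only part of the database and hides the rest. A minimal sketch using Python's sqlite3 module; the table name, columns and rows are assumptions for illustration only:

```python
import sqlite3

# Hypothetical schema: an employee table whose salary column should be
# hidden from ordinary users.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER, name TEXT, salary REAL)")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 50000.0), (2, "Ravi", 60000.0)])

# The view is the "view level": it shows only id and name, hiding salary
# and the details of the logical level from the user.
conn.execute("CREATE VIEW employee_public AS SELECT id, name FROM employee")
rows = conn.execute("SELECT * FROM employee_public ORDER BY id").fetchall()
print(rows)  # [(1, 'Asha'), (2, 'Ravi')]
```

A user querying `employee_public` never sees the salary column, even though it exists at the logical level.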
Relational Algebra:
Relational model is completely based on relational algebra. It consists of a collection of operators
that operate on relations. Its main objective is data retrieval. It is more operational and very much
useful to represent execution plans, while relational calculus is non-operational and declarative.
Here, declarative means users define queries in terms of what they want, not in terms of how to
compute it.
Basic Operation in Relational Algebra
The operations in relational algebra are classified as follows.
Selection (σ): The select operation selects tuples/rows that satisfy a given predicate or condition.
We use σ to denote selection; the predicate/condition appears as a subscript to σ.
Projection (π): It selects only the required/specified columns/attributes from a given relation/table.
Projection operator eliminates duplicates (i.e., duplicate rows from the result relation).
Union (∪): It forms a relation from rows/tuples which appear in either or both of the
specified relations. For a union operation R ∪ S to be valid, the below two conditions must be satisfied.
The relations R and S must be of the same arity, i.e., they must have the same number of
attributes.
The domains of the ith attribute of R and the ith attribute of S must be the same, for all i.
Intersection (∩): It forms a relation of rows/tuples which are present in both the relations R and S.
Ensure that both relations are compatible for union and intersection operations.
Set Difference (−): It allows us to find tuples that are in one relation but not in another. The
expression R − S produces a relation containing those tuples in R but not in S.
Cross Product/Cartesian Product (×): Assume that we have n1 tuples in R and n2 tuples in S.
Then, there are n1 * n2 ways of choosing a pair of tuples, one tuple from each relation. So, there will
be (n1 * n2) tuples in the result relation P if P = R × S.
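The operations above can be sketched over relations modelled as Python sets of tuples; the sample relations R and S below are assumptions for illustration:

```python
# Relational-algebra sketch: a relation is a set of tuples, so Python's
# set operators directly give union, intersection and difference.

def select(relation, predicate):
    """Selection (sigma): keep only tuples satisfying the predicate."""
    return {t for t in relation if predicate(t)}

def project(relation, indices):
    """Projection (pi): keep only the listed columns; using a set
    automatically eliminates duplicate rows, as projection requires."""
    return {tuple(t[i] for i in indices) for t in relation}

R = {(1, "a"), (2, "b"), (3, "a")}
S = {(3, "a"), (4, "c")}

union = R | S                              # R ∪ S (same arity required)
intersection = R & S                       # R ∩ S
difference = R - S                         # R − S
cartesian = {r + s for r in R for s in S}  # R × S: |R| * |S| tuples
```

Note that `len(cartesian)` is 3 * 2 = 6, matching the (n1 * n2) rule above.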
Schema:
A schema is also known as database schema. It is a logical design of the database and a database
instance is a snapshot of the data in the database at a given instant of time. A relational schema
consists of a list of attributes and their corresponding domains.
Types of Schemas: According to the levels of abstraction, schemas can be classified into three types: the physical (internal) schema, the logical (conceptual) schema and the view (external) sub-schemas.
Data model :
A data model is a plan for building a database. Data models define how data is connected to each
other and how they are processed and stored inside the system.
Two widely used data models are the entity-relationship (ER) model and the relational model.
Entity:
An entity may be an object with a physical existence or an object with a conceptual
existence: a thing (animate or inanimate) of independent physical or conceptual existence,
distinguishable from other objects. Each entity has attributes. In the university database context,
an individual student, a faculty member, a classroom and a course are entities.
Attributes
Each entity is described by a set of attributes/properties.
Types of Attributes
Simple Attributes: having atomic or indivisible values. example: Dept - a string; Phone
Number - an eight-digit number.
Composite Attributes: having several components in the value. example: Qualification with
components (Degree Name, Year, University Name)
Derived Attributes: Attribute value is dependent on some other attribute. example: Age
depends on Date Of Birth. So age is a derived attribute.
Single-valued: having only one value rather than a set of values. for instance, Place Of
Birth - single string value.
Multi-valued: having a set of values rather than a single value. for instance, the Courses
Enrolled attribute, the Email Address attribute and the Previous Degree attribute of a
student.
Keys
A super key of an entity set is a set of one or more attributes whose values uniquely
determine each entity.
Although several candidate keys may exist, one of the candidate keys is selected to be
the primary key.
NOTE: this means a pair of entity sets can have at most one relationship in a particular
relationship set.
If we wish to track all access-dates to each account by each customer, we cannot assume a
relationship for each access. We can use a multivalued attribute though.
Must consider the mapping cardinality of the relationship set when deciding what the
candidate keys are.
Need to consider semantics of relationship set in selecting the primary key in case of more
than one candidate key.
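A superkey can be checked against a relation instance by testing whether any two entities agree on all of its attributes. This only disproves key status (a duplicate proves the set is not a superkey of the schema); the student rows below are hypothetical sample data:

```python
# Sketch: an attribute set is a superkey of this instance if no two
# rows share the same combination of values for those attributes.

def is_superkey(rows, attrs):
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False   # two entities agree on attrs: not a superkey
        seen.add(key)
    return True

students = [
    {"roll_no": 1, "name": "Asha", "dept": "CS"},
    {"roll_no": 2, "name": "Ravi", "dept": "CS"},
    {"roll_no": 3, "name": "Asha", "dept": "EE"},
]
```

Here `{roll_no}` uniquely determines each entity, while `{name}` does not (two students are named Asha).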
ER Modeling:
Entity-Relationship model (ER model) in software engineering is an abstract way to describe a
database. Describing a database usually starts with a relational database, which stores data in
tables.
Notations/Shapes in ER Modeling
The overall logical structure of a database can be expressed graphically by an E-R diagram. The
diagram consists of the following major components.
Rectangles: represent entity sets.
Ellipses: represent attributes.
Diamonds: represent relationship sets.
Lines: link attribute sets to entity sets and entity sets to relationship sets.
One to One: An entity in A is associated with at most one entity in B and an entity in B is
associated with at most one entity in A.
One to Many: An entity in A is associated with any number (zero or more) of entities in B.
An entity in B, however, can be associated with at most one entity in A.
Many to Many: An entity in A is associated with any number (zero or more) of entities in B
and an entity in B is associated with any number (zero or more) of entities in A.
Specialization: Consider an entity set person with attributes name, street and city. A person may
be further classified as one of the following: customer and employee. Each of these person types
is described by a set of attributes that includes all the attributes of entity set person plus possibly
additional attributes. The process of designating subgroupings within an entity set is called
specialization.
The specialization of person allows us to distinguish among persons according to whether they are
employees or customers.
The refinement from an initial entity set into successive levels of entity subgroupings represents a
top-down design process in which distinctions are made explicitly.
Generalization: Basically generalization is a simple inversion of specialization. Some common
attributes of multiple entity sets are chosen to create higher level entity set. If the customer entity
set and the employee entity set are having several attributes in common, then this commonality
can be expressed by generalization.
Here, person is the higher level entity set and customer and employee are lower level entity sets.
Higher and lower level entity sets also may be designated by- the terms super class and subclass,
respectively.
Aggregation: Aggregation is used when we have to model a relationship involving entity set and a
relationship set. Aggregation is an abstraction through which relationships are treated as higher
level entities.
Integrity Constraints:
Integrity constraints are necessary conditions to be satisfied by the data values in the relational
instances so that the set of data values constitutes a meaningful database.
There are four types of Integrity constraints
Domain Constraint: The value of attribute must be within the domain.
Case of semi-trivial FD
Sid → Sid Sname (semi-trivial)
Because on decomposition, we will get
Sid → Sid (trivial FD) and
Sid → Sname (non-trivial FD)
Properties of Functional Dependency (FD)
Augmentation: If X → Y, then XZ → YZ.
Attribute Closure: Suppose R(X, Y, Z) is a relation having the set of attributes (X, Y, Z); then X+,
the attribute closure of X, is the set of attributes of the relation functionally determined by X (if not
all of them, then at least X itself).
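The attribute-closure computation described above can be sketched as a fixed-point loop: repeatedly add the right-hand side of any FD whose left-hand side is already inside the closure, until nothing changes. The FDs used here are illustrative assumptions:

```python
# Sketch of the attribute-closure algorithm. An FD is a pair
# (lhs, rhs) of attribute sets.

def closure(attrs, fds):
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If lhs is already determined, rhs is determined too.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical FDs: A -> B, B -> C, CD -> E
fds = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"C", "D"}, {"E"})]
```

For example, `closure({"A"}, fds)` yields {A, B, C}, while `closure({"A", "D"}, fds)` additionally picks up E via CD → E. An attribute set X is a superkey exactly when X+ contains every attribute of the relation.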
Normal Forms/Normalization:
In relational database design, the normalization is the process for organizing data to minimize
redundancy. Normalization usually involves dividing a database into two or more tables and
defining relationship between the tables. The normal forms define the status of the relation about
the individuated attributes. There are five types of normal forms
First Normal Form (1NF): Relation should not contain any multivalued attributes or relation should
contain atomic attribute. The main disadvantage of 1NF is high redundancy.
Second Normal Form (2NF): Relation R is in 2NF if and only if R should be in 1NF, and R should
not contain any partial dependency.
Partial Dependency: Let R be a relational schema having X, Y and A, which are non-empty sets of
attributes, where X = any candidate key of the relation, Y = a proper subset of any candidate key,
and A = a non-prime attribute (i.e., A doesn't belong to any candidate key).
In the above setting, X → A already exists, and Y → A becomes a partial dependency if and only if
Y is a proper subset of a candidate key and A is a non-prime attribute.
If either of the above two conditions fails, then Y → A will instead be a fully functional dependency.
Full Functional Dependency: A functional dependency P → Q is said to be a fully functional
dependency if removal of any attribute S from P means that the dependency doesn't hold any
more.
(Student_Name, College_Name) → College_Address
Suppose the above functional dependency is a full functional dependency; then we must ensure
that there are no FDs such as the below.
Student_Name → College_Address
or College_Name → College_Address
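Whether an FD such as College_Name → College_Address actually holds in a given relation instance can be checked directly: rows that agree on the left-hand side must also agree on the right-hand side. The sample rows are assumptions for illustration:

```python
# Sketch: check whether X -> Y holds in a relation instance by mapping
# each X-value to the Y-value seen with it and detecting conflicts.

def fd_holds(rows, lhs, rhs):
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        if x in seen and seen[x] != y:
            return False   # same X-value, different Y-values: FD violated
        seen[x] = y
    return True

rows = [
    {"student": "Asha",  "college": "IITD", "address": "Delhi"},
    {"student": "Ravi",  "college": "IITD", "address": "Delhi"},
    {"student": "Meera", "college": "IITB", "address": "Mumbai"},
]
```

In this instance `college -> address` holds, which is exactly the situation that would prevent (student, college) → address from being a full functional dependency.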
Third Normal Form (3NF): Let R be a relational schema. R is in 3NF if, for every non-trivial FD
X → Y over R, either X is a candidate key or super key, or Y is a prime attribute (at least one of
the two conditions must hold).
In relation R1, C is not a candidate key and D is a non-prime attribute. Due to this, R1 fails to
satisfy the 3NF condition; a transitive dependency is present here.
Fourth Normal Form (4NF): 4NF is mainly concerned with multivalued dependencies. A relation is in
4NF if and only if, for every one of its non-trivial multivalued dependencies X →→ Y, X is a super
key (i.e., X is either a candidate key or a superset of one).
Fifth Normal Form (5NF): It is also known as Project-Join Normal Form (PJ/NF). 5NF reduces
redundancy in relational databases recording multivalued facts by isolating semantically related
multiple relationships. A table or relation is said to be in 5NF if and only if every join
dependency in it is implied by the candidate keys.
SQL:
Structured Query Language (SQL) is a language that provides an interface to relational database
systems. SQL was developed by IBM in the 1970s for use in System R, and is a de facto standard
as well as an ISO and ANSI standard.
To deal with the above database objects, we need a programming language, and that
programming language is known as SQL.
SQL Keyword                                   Function
CREATE, ALTER, DROP                           Data Definition Language (DDL): used to define, change and drop the structure of a table
TRUNCATE                                      Used to remove all rows from a table
SELECT, INSERT INTO, UPDATE, DELETE FROM      Data Manipulation Language (DML): used to retrieve and manipulate the data in a table
GRANT, REVOKE                                 Data Control Language (DCL): used to provide control over the data in a database
COMMIT, ROLLBACK                              Transaction Control Language (TCL): used to manage changes affecting the data
Transaction Control Language (TCL): It is used to manage changes affecting the data.
COMMIT: used to save the work done, such as inserting, updating or deleting data to/from the
table.
SQL Data Types: SQL data types specify the type, size and format of data/information that
can be stored in columns and variables.
The HAVING clause can be used to set conditions for the GROUP BY clause.
The HAVING clause is similar to the WHERE clause, but HAVING puts conditions on groups.
A WHERE clause can't include aggregate functions, while HAVING conditions can do so.
Example:
SELECT Emp_Id, AVG(Salary)
FROM Employee
GROUP BY Emp_Id
HAVING AVG(Salary) > 25000;
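A runnable version of this query using Python's sqlite3 module; the Employee rows are assumed sample data:

```python
import sqlite3

# Two employees with two salary records each: Emp 1 averages 22000,
# Emp 2 averages 28000, so only Emp 2 survives the HAVING filter.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER, Salary REAL)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 20000), (1, 24000), (2, 30000), (2, 26000)])

rows = conn.execute("""
    SELECT Emp_Id, AVG(Salary)
    FROM Employee
    GROUP BY Emp_Id
    HAVING AVG(Salary) > 25000
    ORDER BY Emp_Id
""").fetchall()
print(rows)  # [(2, 28000.0)]
```

The condition is applied to each group after aggregation, which is exactly what WHERE cannot do.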
Aggregate Functions
Joins: Joins are needed to retrieve data from the related rows of two tables on the basis of some
condition satisfied by both tables. A mandatory condition for a join is that at least one set of
column(s) should take values from the same domain in each table.
Inner Join: Inner join is the most common join operation used in applications and can be regarded
as the default join-type. Inner join creates a new result table by combining column values of two
tables (A and B) based upon the join-predicate. These may be further divided into three parts.
1. Equi Join (satisfies equality condition)
2. Non-Equi Join (satisfies non-equality condition)
3. Self Join (one or more column assumes the same domain of values).
Outer Join: An outer join does not require each record in the two joined tables to have a matching
record. The joined table retains each record-even if no other matching record exists.
It also considers the rows from the table(s) even if they don't satisfy the joining condition.
(i) Right outer join (ii) Left outer join (iii) Full outer join
Left Outer Join: The result of a left outer join for table A and B always contains all records of the
left table (A), even if the join condition does not find any matching record in the right table (B).
Result set of T1 and T2
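A left outer join can be demonstrated with Python's sqlite3 module; the tables below are assumed sample data, not the T1 and T2 referred to above:

```python
import sqlite3

# T1 has a row (id 2) with no match in T2; a left outer join keeps it
# anyway, filling the T2 columns with NULL (None in Python).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE T2 (id INTEGER, dept TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [(1, "CS")])

rows = conn.execute("""
    SELECT T1.id, T1.name, T2.dept
    FROM T1 LEFT OUTER JOIN T2 ON T1.id = T2.id
    ORDER BY T1.id
""").fetchall()
print(rows)  # [(1, 'Asha', 'CS'), (2, 'Ravi', None)]
```

Every record of the left table appears in the result, matching or not.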
Right Outer Join: A right outer join closely resembles a left outer join, except with the treatment of the
tables reversed. Every row from the right table will appear in the joined table at least once. If no
matching row in the left table exists, NULL will appear.
Full Outer Join: A full outer join combines the effect of applying both left and right outer joins.
Where records in the joined tables do not match, the result set will have NULL values
for every column of the table that lacks a matching row; for those records that do match, a single
row is produced in the result set.
Cross Join (Cartesian product): Cross join returns the Cartesian product of rows from the tables in the
join. It will produce rows which combine each row from the first table with each row from the
second table.
SELECT * FROM T1, T2;
Number of rows in result set = (number of rows in table 1) × (number of rows in table 2)
Result set of T1 and T2 (Using previous tables T1 and T2)
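The row-count rule for a cross join can be verified with sqlite3; the sample tables are assumptions for illustration:

```python
import sqlite3

# T1 has 3 rows and T2 has 2 rows, so the cross join must yield
# 3 * 2 = 6 rows, one per pair of rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (a INTEGER)")
conn.execute("CREATE TABLE T2 (b INTEGER)")
conn.executemany("INSERT INTO T1 VALUES (?)", [(1,), (2,), (3,)])
conn.executemany("INSERT INTO T2 VALUES (?)", [(10,), (20,)])

rows = conn.execute("SELECT * FROM T1, T2").fetchall()
print(len(rows))  # 6
```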
Storage Structure:
The storage structure can be divided into two categories:
Volatile storage: As the name suggests, volatile storage cannot survive system crashes. Volatile
storage devices are placed very close to the CPU; normally they are embedded on the chipset
itself. Main memory and cache memory are examples of volatile storage. They are
fast but can store only a small amount of information.
Non-volatile storage: These memories are made to survive system crashes. They are huge in
data storage capacity, but slower in accessibility. Examples may include hard-disks, magnetic
tapes, flash memory, and non-volatile (battery backed up) RAM.
File Organisation:
The database is stored as a collection of files. Each file is a sequence of records. A record is a
sequence of fields. Data is usually stored in the form of records. Records usually describe entities
and their attributes. e.g., an employee record represents an employee entity and each field value in
the record specifies some attributes of that employee, such as Name, Birth-date, Salary or
Supervisor.
Allocating File Blocks on Disk: There are several standard techniques for allocating the blocks of
a file on disk
Contiguous Allocation: The file blocks are allocated to consecutive disk blocks. This
makes reading the whole file very fast.
Linked Allocation: In this, each file contains a pointer to the next file block.
Indexed Allocation: Where one or more index blocks contain pointers to the actual file
blocks.
Files of Unordered Records (Heap Files): In the simplest type of organisation, records are placed
in the file in the order in which they are inserted, so new records are inserted at the end of the file.
Such an organisation is called a heap or pile file.
This organisation is often used with additional access paths, such as secondary indexes.
In this type of organisation, inserting a new record is very efficient. Linear search is used to search
a record.
Files of Ordered Records (Sorted Files): We can physically order the records of a file on disk
based on the values of one of their fields called the ordering field. This leads to an ordered or
sequential file. If the ordering field is also a key field of the file, a field guaranteed to have a unique
value in each record, then the field is called the ordering key for the file. Binary searching is used
to search a record.
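Binary search on the ordering key of a sorted file can be sketched with Python's bisect module; the records are assumed sample data:

```python
import bisect

# A sorted file: records physically ordered by their ordering key
# (here the first field, assumed unique, i.e. an ordering key).
records = [(101, "Asha"), (205, "Ravi"), (310, "Meera"), (412, "John")]
keys = [r[0] for r in records]   # ordering-key column, already sorted

def find(key):
    """Binary search: O(log n) probes instead of a linear scan."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return records[i]
    return None   # key not present in the file
```

With disk blocks instead of list slots, the same idea gives log2(b) block accesses for a file of b blocks, versus b/2 on average for a heap file.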
Indexing Structures for Files: Indexing mechanisms are used to optimize certain accesses to data
(records) managed in files, e.g., the author catalog in a library is a type of index. A search key is,
by definition, an attribute or combination of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form (search-key value, pointer).
Index files are typically much smaller than the original file because only the values for the search key
and the pointer are stored. The most prevalent types of indexes are based on ordered files (single-level indexes) and tree data structures (multilevel indexes).
Types of Single-Level Ordered Indexes: In an ordered index file, index entries are stored sorted
by the search-key value. There are several types of ordered indexes
Primary Index: A primary index is an ordered file whose records are of fixed length with two fields.
The first field is of the same data type as the ordering key field called the primary key of the data
file and the second field is a pointer to a disk block (a block address).
There is one index entry in the index file for each block in the data file.
Dense index: A dense index has an index entry for every search-key value in the data file.
Sparse index: A sparse (non-dense) index, on the other hand, has index entries for only
some of the search values.
A primary index is a non-dense (sparse) index, since it includes an entry for each disk block
of the data file rather than for every search value.
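A sparse primary index can be sketched as one entry per block, holding that block's first (anchor) key: lookup binary-searches the index, then scans a single block. The block layout is an assumption for illustration:

```python
import bisect

# Data file as a list of blocks; records in and across blocks are
# ordered by the primary key.
blocks = [
    [(10, "a"), (20, "b")],   # block 0
    [(30, "c"), (40, "d")],   # block 1
    [(50, "e"), (60, "f")],   # block 2
]

# Sparse primary index: one (anchor key -> block) entry per block,
# not one per record as a dense index would have.
index = [block[0][0] for block in blocks]

def lookup(key):
    # Find the last block whose anchor key is <= key...
    i = bisect.bisect_right(index, key) - 1
    if i < 0:
        return None           # key smaller than every anchor
    # ...then scan just that one block.
    for k, v in blocks[i]:
        if k == key:
            return v
    return None
```

The index here has 3 entries for 6 records; a dense index would need 6.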
Clustering Index: If file records are physically ordered on a non-key field which does not have a
distinct value for each record, that field is called the clustering field. We can create a different type
of index, called a clustering index, to speed up retrieval of records that have the same value for the
clustering field.
A clustering index is also an ordered file with two fields. The first field is of the same type
as the clustering field of the data file.
Secondary Index: A secondary index provides a secondary means of accessing a file for which
some primary access already exists. The secondary index may be on a field which is a candidate
key and has a unique value in every record or a non-key with duplicate values. The index is an
ordered file with two fields. The first field is of the same data type as some non-ordering field of the
data file that is an indexing field. The second field is either a block pointer or a record pointer. A
secondary index usually needs more storage space and longer search time than does a primary
index.
Multilevel Indexes: The idea behind a multilevel index is to reduce the part of the index that must
be searched at each step. A multilevel index considers the original index file as the first (or base)
level of the multilevel index. We can then create a primary index for the first level; this index to the
first level is called the second level of the multilevel index, and so on.
Dynamic Multilevel Indexes Using B-Trees and B+-Trees: There are two such dynamic multilevel indexes
B-Trees
When data volume is large and does not fit in memory, an extension of the binary search
tree to disk based environment is the B-tree.
In fact, since the B-tree is always balanced (all leaf nodes appear at the same level), it is
an extension of the balanced binary search tree.
The problem which the B-tree aims to solve is: given a large collection of objects, each
having a key and a value, design a disk-based index structure which efficiently supports
query and update.
A B-tree of order p, when used as an access structure on a key field to search for records
in a data file, can be defined as follows
1. Each internal node in the B-tree is of the form
⟨P1, ⟨K1, Pr1⟩, P2, ⟨K2, Pr2⟩, ..., ⟨Kq−1, Prq−1⟩, Pq⟩
where q ≤ p, each Pi is a tree pointer, each Pri is a data pointer and K1 < K2 < ... < Kq−1 are search-key values.
In a B-tree, every value of the search field appears once at some level in the tree, along
with a data pointer. In a B+-tree, data pointers are stored only at the leaf nodes of the tree.
Hence, the structure of the leaf nodes differs from the structure of internal nodes.
The pointers in the internal nodes are tree pointers to blocks that are tree nodes, whereas
the pointers in leaf nodes are data pointers.
5. Each leaf node is of the form
⟨⟨K1, Pr1⟩, ⟨K2, Pr2⟩, ..., ⟨Kq−1, Prq−1⟩, Pnext⟩
where q ≤ p, each Pri is a data pointer and Pnext points to the next leaf node of the B+-tree.