DBMS


1. Introduction

DBMS is one of the most important modules for the Specialist Officer (IT) Exam. The objective paper of Professional Knowledge (especially for Scale-I Officers) in the IBPS Exam has many questions from the Database and Networking modules, so aspirants should prepare DBMS thoroughly. The term DBMS stands for Database Management System. So what exactly is a DBMS?
A DBMS is a collection of interrelated data and a set of programs to access this data in a convenient and efficient way. It controls the organization, storage, retrieval, security and integrity of data in a database.

A database management system (DBMS) is computer software that manages databases. It may use any of a variety of database models, such as the hierarchical, network and relational models.

The first type of DBMS, the hierarchical DBMS, emerged during the 1960s and 1970s. IBM had the first model, developed on the IBM System/360; its DBMS, called IMS, was originally written for the Apollo program. This type of DBMS was based on trees, where relations were limited to those between parent and child records.

1.1 Database Models


A database model shows the logical structure of a database, including the relationships and constraints that determine
how data can be stored and accessed.

Hierarchical Database Model

The hierarchical database model is one of the oldest database models. In the hierarchical data model, each record is linked to the superior record on which it depends and to the records which depend on it. A tree structure establishes one-to-many relationships: a parent can have many children. All dependent records are nodes, or dependents, of the root, and in general a root may have any number of dependents.
A tree-structure diagram is the schema for a hierarchical database. Such a diagram
consists of two basic components:
1. Boxes, which correspond to record types
2. Lines, which correspond to links

Pros:
✓ The model allows easy addition and deletion of new information.
✓ Data at the top of the hierarchy is very fast to access.
✓ It works well for anything that naturally follows a one-to-many relationship.
Cons:
• Real-world requirements often involve more sophisticated relationships, which this model fails to support.

• The database can be very slow when searching for information in the lower-level entities.
• Many-to-many relationships are not supported.

Network Database Model

The network database model can be viewed as an upside-down tree, where each member's information is a branch linked to the owner at the bottom of the tree. The network database model was a progression from the hierarchical database model and was designed to solve some of that model's problems, specifically its lack of flexibility. It addresses the need to model more complex relationships, such as the many-to-many relationships which the hierarchical model could not handle.

The network model replaces the hierarchical tree with a graph, thus allowing more general connections among the nodes. The main difference of the network model from the hierarchical model is its ability to handle many-to-many (N:N) relations; in other words, it allows a record to have more than one parent.

Pros:
✓ In network database terminology, a relationship is a set. Each set comprises two types of records: an owner record and a member record.
✓ In a network model an application can access an owner record and all the member records within a set.
✓ The network model supports data independence to some extent, as it draws a clear line of demarcation between programs and the complex physical storage details.
Cons:
• Insertion, deletion and update operations on any record require a large number of pointer adjustments.
• A change in structure demands a change in the application as well, which leads to a lack of structural independence.

Relational Database Model


The relational data model is the primary data model, used widely around the world for data storage and processing. The relational database model was a huge leap forward from the network database model: instead of relying on a parent-child or owner-member relationship, the relational model allows any file to be related to any other by means of a common field. Relational databases go hand-in-hand with the development of SQL; Structured Query Language is a standardized language for defining and manipulating data in a relational database.

What are Tables in Relational Model?


Relations are saved in the form of tables. This format stores the relationships among entities. A table has rows and columns, where rows represent records and columns represent attributes. Following are some terms associated with relations:

1. Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
2. Relation instance − A finite set of tuples in the relational database system represents relation instance.
Relation instances do not have duplicate tuples.
3. Relation schema − A relation schema describes the relation name (table name), attributes, and their names.
4. Relation key − Each row has one or more attributes, known as relation key, which can identify the row in
the relation (table) uniquely.
5. Attribute domain − Every attribute has some pre-defined value scope, known as attribute domain.

Object-Oriented Database Model


An Object-Oriented Database Model (also referred to as an object-oriented database management system, or OODBMS) is a database management system (DBMS) that supports the modelling and creation of data as objects. This includes some kind of support for classes of objects and the inheritance of class properties and methods by subclasses and their objects. OODBMSs were originally expected to replace RDBMSs because of their better fit with object-oriented programming languages. However, high switching costs, the inclusion of object-oriented features in RDBMSs (making them ORDBMSs), and the emergence of object-relational mappers (ORMs) have let RDBMSs successfully defend their dominance in the data center for server-side persistence.

Relational databases store data in tables that are two dimensional. The tables have rows and columns. Relational
database tables are "normalized" so data is not repeated more often than necessary. With traditional databases, data
manipulated by the application is transient and data in the database is persisted (Stored on a permanent storage
device). In object databases, the application can manipulate both transient and persisted data.

Entity-Relationship Database Model


To get a precise understanding of the nature of the data and how it is used by the enterprise, you need a universal model for interaction that is non-technical, free of ambiguities, and easily readable by both technical and non-technical members. The Entity-Relationship Data Model provides this, and it is implemented with the use of ER diagrams.

ER model is based on two concepts:

• Entities, defined as tables that hold specific information (data)


• Relationships, defined as the associations or interactions between entities
What is Entity Relationship Diagram (ER-Diagram)?
An ER diagram is a pictorial representation of data that describes how data items are related to each other. Any object, such as an entity, the attributes of an entity, a relationship set, or the attributes of a relationship, can be characterized with the help of an ER diagram.

1.2 Advantages of Today's DBMS over the Earlier File Management System

These are some important advantages of today’s DBMS:

Reduced Data Redundancy and Inconsistency: With a DBMS, multiple file formats and duplication of the same information in different files are eliminated. Because data duplication is reduced, the data stays more consistent.

Data Integrity: Data integrity refers to the accuracy and consistency of data stored in a database. A DBMS ensures data integrity by managing transactions that satisfy the ACID properties: atomicity, consistency, isolation and durability. Such integrity guarantees are absent in a file management system.

Sharing of Data: In DBMS, data can be shared by authorized users of the organization. The database administrator
manages the data and gives rights to users to access the data.

Control Over Concurrency: In a file-based system, if two users access data simultaneously, it is possible that they will interfere with each other. For example, if both users attempt to perform an update operation on the same record, one may overwrite the values recorded by the other. Most database management systems have sub-systems to control concurrency so that transactions are always recorded accurately.

Backup and Recovery Procedures: In a computer file-based system, the user creates backups of data regularly to protect valuable data from damage due to failures of the computer system or application program. This is a very time-consuming method if the amount of data is large. Most DBMSs provide 'backup and recovery' sub-systems that automatically create backups of data and restore the data if required.

Data Independence: The separation of data structure of database from the application program that uses the data is
called data independence. In DBMS, you can easily change the structure of database without modifying the
application program.

2. Database Architecture

2.1 Data Abstraction – View Levels

The generalized architecture of a DBMS is called the ANSI/SPARC model. The architecture is divided into three levels:

1. External view or User view/View Level


It is the highest level of data abstraction. This includes only those portions of the database that are of concern to a user or application program. Each user has a different external view, and it is described by means of a scheme called the external schema.

2. Conceptual view/Logical Level


All the database entities and the relationships among them are included. A single conceptual view represents the entire database and is described by the conceptual schema.

3. Internal view/Physical Level


It is the lowest level of abstraction, closest to the physical storage method. It describes how the data is stored, the structure of the data storage, and the method of accessing the data. It is represented by the internal schema.

2.2 Instances and Schemas

A schema can be defined as the design of a database; the overall description of the database is called the database schema.
It is analogous to type definitions and variable declarations in programming languages: essentially, the schema is the logical structure of the database. Mirroring the view levels of data abstraction, schemas are of three types:

1. Physical Schema:
The design of a database at the physical level is called the physical schema; how the data is stored in blocks of storage is described at this level.

2. Logical Schema:
The logical schema can be defined as the design of the database at the logical level. Programmers as well as the database administrator (DBA) work at this level. At this level data is described as certain types of data records stored in the form of data structures; however, internal details (such as the implementation of the data structures) remain hidden at this level.

3. View Schema
The view schema can be defined as the design of the database at the view level, which generally describes end-user interaction with the database system.

• Physical data independence – the ability to modify the physical schema without changing the logical schema.
• Applications depend on the logical schema.
• In general, the interfaces between the various levels and components should be well defined so that changes in some parts do not seriously influence others.

What is an Instance?
Databases change over time as information is inserted and deleted. The collection of information stored in the
database at a particular moment is called an instance.

2.3 Database Languages

A database system provides a data-definition language to specify the database schema and a data-manipulation
language to express database queries and updates.

1. Data Definition Language: DDL is used for specifying the database schema. It contains commands to create tables, alter table structures, and delete or rename tables.

Examples of DDL commands in SQL:


To create database objects such as tables – CREATE
To alter the structure of a database object – ALTER
To drop database objects – DROP
To remove all records from a table, keeping its structure – TRUNCATE
To rename database objects – RENAME

2. Data Manipulation Language: As the name itself suggests, DML is used for accessing and manipulating data in a database.

Examples of DML commands in SQL:


To read records from table(s) – SELECT
To insert records into a table – INSERT
To update data in tables – UPDATE
To delete records from a table – DELETE
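To make the DDL/DML distinction concrete, here is a minimal sketch using Python's built-in sqlite3 module; the table and column names are purely illustrative, not part of any standard schema.

import sqlite3

# In-memory database; STUDENT here is an illustrative table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# DDL: define the schema.
cur.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")

# DML: manipulate the data.
cur.execute("INSERT INTO student (roll_no, name) VALUES (?, ?)", (1, "Snehal"))
cur.execute("UPDATE student SET name = ? WHERE roll_no = ?", ("Kajal", 1))
print(cur.execute("SELECT roll_no, name FROM student").fetchall())
cur.execute("DELETE FROM student WHERE roll_no = ?", (1,))

conn.commit()
conn.close()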

3. Data Control Language: DCL is used for granting and revoking user access on a database.

Examples of DCL commands in SQL:


To grant access to user – GRANT
To revoke access from user – REVOKE

3. Entity-Relationship Model

What is an Entity?
In a database, we group only related data together and store it under one group name, called an entity or table. This helps in identifying which data is stored where and under what name, and it reduces the time needed to search for a particular piece of data in the whole database.
Entities can be classified based on their strength: an entity is considered weak if its existence depends on another entity. Following are the basic types of entities:

1. Strong Entity: An entity having its own attribute as a primary key is called a strong entity. For example, EMPLOYEE has EMPLOYEE_ID as its primary key, hence it is a strong entity.
2. Weak Entity: Entities which cannot form a primary key from their own attributes are known as weak entities. These entities derive their primary keys from the combination of their own attributes and the primary key of their mapping entity. The relationship between a weak entity set and a strong entity set is called an identifying relationship.
3. Composite Entity: Entities participating in many-to-many relationships are called composite entities.

The relationship between a weak entity set and a strong entity set is called an identifying relationship. The line connecting the strong entity set to the relationship is single, whereas the line connecting the weak entity set to the identifying relationship is double. A member of a strong entity set is called a dominant entity, and a member of a weak entity set is called a subordinate entity. A weak entity set does not have a primary key, but we need a means of distinguishing among all those entries in the entity set that depend on one particular strong entity. The discriminator of a weak entity set is a set of attributes that allows this distinction to be made; it is also called the partial key of the entity set. A weak entity set is represented by a doubly outlined box and the corresponding identifying relationship by a doubly outlined diamond.

Weak Entity Sets


Weak Entity Set: an entity set whose members owe their existence to some entity in a strong entity set.
• Its entities are not of independent existence.
• Each weak entity is associated with some entity of the owner entity set through a special relationship.
• A weak entity set may not have a key attribute.

Weak Entity set- Example


We depict a weak entity set by double rectangles and underline the discriminator of a weak entity set with a dashed line.

For example: payment_number is the discriminator of the payment entity set.

Primary key for payment – (loan_number, payment_number)

Note: the primary key of the strong entity set is not explicitly stored with the weak entity set, since it is implicit in the
identifying relationship. If loan_number were explicitly stored, payment could be made a strong entity, but then the
relationship between payment and loan would be duplicated by an implicit relationship defined by the attribute
loan_number common to payment and loan.

3.1 Attributes

Each entity is described by a set of attributes/properties.


Types of Attributes
• Simple Attributes: have atomic or indivisible values. Example: Dept (a string), Phone Number (an eight-digit number).
• Composite Attributes: have several components in the value. Example: Qualification, with components (Degree Name, Year, University Name).
• Derived Attributes: the attribute value is dependent on some other attribute. Example: Age depends on Date of Birth, so Age is a derived attribute.
• Single-valued: have only one value rather than a set of values. For instance, Place of Birth – a single string value.
• Multi-valued: have a set of values rather than a single value. For instance, the Courses Enrolled, Email Address or Previous Degree attributes of a student.
• Attributes can be: simple single-valued, simple multi-valued, composite single-valued or composite multi-valued.

3.2 E-R Diagram

An ER diagram is a means of visualizing how the information a system produces is related. There are five main
components of an ER Diagram:

1. Connecting lines: solid lines that connect entities and attributes to show the relationships in the diagram.
2. Entities: Represented by Rectangle
• Strong Entity: These shapes are independent from other entities, and are often called parent entities, since
they will often have weak entities that depend on them.

• Weak Entity: a weak entity is an entity that must be defined through a foreign-key relationship with another entity, as it cannot be uniquely identified by its own attributes alone.

3. Relationship: connects two or more entities into an association; represented by a diamond.

Here you can see: Employee works in Department. EMPLOYEE and DEPT are entity types, and WorksIn is the relationship, represented with a diamond figure.

A recursive relationship is one in which the same entity participates more than once in the relationship. For example: every manager is also an employee, so Manager is not a new entity but just a subset of the instances of the entity EMPLOYEE.

Recursive Relationship (Manages):

EMPLOYEE    MANAGER
Nikhil      Anuj
Sumita      Nikhil
Anuj
Rahul

This is also a representation of one-to-many cardinality.

3.3 Attributes – Represented by Ovals

An attribute describes a property or characteristic of an entity. For example, Name, ID, Age, Address etc. can be attributes of an EMPLOYEE.

A key attribute represents the main characteristic of an entity and is used to represent the primary key. An ellipse with an underline represents a key attribute. Here EmpId is the key attribute: the primary key which will uniquely identify the EMPLOYEE records.

Double ellipses are used to represent multivalued attributes.

An attribute can also have attributes of its own; such attributes are known as composite attributes.

A derived attribute is calculated or otherwise derived from another attribute, such as Age from DOB (Date of Birth).

3.4 Cardinality
The cardinality of a relationship is the number of instances of entity B that can be associated with entity A. There is a
minimum cardinality and a maximum cardinality for each relationship. Cardinality refers to the maximum number of
times an instance in one entity can relate to instances of another entity. Ordinality, on the other hand, is the minimum
number of times an instance in one entity can be associated with an instance in the related entity.

Cardinality notations (crow's-foot legend):
• Many
• Zero or one
• One
• One (and only one)
• Zero or many
• One or many

Binary Relationships and Cardinality Ratio

• The number of entities from E2 with which an entity from E1 can possibly be associated through R (and vice versa) determines the cardinality ratio of R.
• Four possibilities are usually specified:
1. one-to-one (1:1)
2. one-to-many (1:N)
3. many-to-one (N:1)
4. many-to-many (M:N)

Cardinality Ratios
One-to-one: An E1 entity may be associated with at most one E2 entity and similarly an E2 entity may be associated
with at most one E1 entity.
One-to-many: An E1 entity may be associated with many E2 entities whereas an E2 entity may be associated with at
most one E1 entity.
Many-to-one: An E1 entity may be associated with at most one E2 entity, whereas an E2 entity may be associated with many E1 entities.
Many-to-many: Many E1 entities may be associated with a single E2 entity and a single E1 entity may be associated
with many E2 entities.

Mapping Cardinalities

Note: Some elements in A and B may not be mapped to any elements in the other set.

Many-to-one and many-to-many mappings:
Note: Some elements in A and B may not be mapped to any elements in the other set.

Alternative Representation of Cardinality


A many-to-many relationship between User and Course means any number of users can study or enroll in any number of courses, and a one-to-many relationship between a Teacher (who is also a user) and Course means one instructor can teach any number of courses.

One department has many employees, so the following represents a one-to-many relationship.

And in case any number of employees may work in any number of departments, the relationship is many-to-many.

3.5 Keys
A super key of an entity set is a set of one or more attributes whose values uniquely determine each entity. A candidate key of an entity set is a minimal super key.

For example: customer-id is a candidate key of customer, and account-number is a candidate key of account.
Although several candidate keys may exist, one of the candidate keys is selected to be the primary key.

Keys for Relationship Sets


The combination of primary keys of the participating entity sets forms a super key of a relationship set.
(customer-id, account-number) is the super key of depositor
• NOTE: this means a pair of entity sets can have at most one relationship in a particular relationship set.
• E.g. if we wish to track all access-dates to each account by each customer, we cannot assume a relationship for
each access. We can use a multivalued attribute though.
• The mapping cardinality of the relationship set must be considered when deciding what the candidate keys are.
• The semantics of the relationship set must be considered when selecting the primary key in case there is more than one candidate key.

Following is an example of an ER Diagram:

Entity- Relationship (E-R) Diagram


The overall logical structure of a database can be expressed graphically by an E-R diagram. The diagram consists
of the following major components.
• Rectangles: represent entity sets.
• Ellipses: represent attributes.
• Diamonds: represent relationship sets.
• Lines: link attribute sets to entity sets and entity sets to relationship sets.
• Double ellipses: represent multi-valued attributes.
• Dashed ellipses: denote derived attributes.
• Double lines: represent total participation of an entity in a relationship set.
• Double rectangles: represent weak entity sets.

3.6 Specialization, Generalization and Aggregation

Generalization is a bottom-up approach in which two lower-level entities combine to form a higher-level entity. In generalization, the higher-level entity can also combine with other lower-level entities to make a further higher-level entity. Specialization is the opposite of generalization: it is a top-down approach in which one higher-level entity can be broken down into two or more lower-level entities.

Specialization is a top-down design process: we designate subgroupings within an entity set that are distinctive from other entities in the set.
• These subgroupings become lower-level entity sets that have attributes or participate in relationships that do not
apply to the higher-level entity set.
• Depicted by a triangle component labelled ISA (E.g. customer “is a” person).
• Attribute inheritance – a lower-level entity set inherits all the attributes and relationship participation of the
higher-level entity set to which it is linked.

Specialization Example

Generalization
• A bottom-up design process – combine a number of entity sets that share the same features into a higher-level
entity set.
• Specialization and generalization are simple inversions of each other; they are represented in an E-R diagram in
the same way.
• The terms specialization and generalization are used interchangeably.

Generalization Example

Specialization and Generalization


• Can have multiple specializations of an entity set based on different features.
• E.g. permanent-employee vs. temporary-employee, in addition to officer vs. secretary vs. teller
• Each particular employee would be a member of one of permanent-employee or temporary-employee, and also a
member of one of officer, secretary, or teller
• The ISA relationship is also referred to as a superclass-subclass relationship.

Aggregation

Aggregation is a process in which the relationship between two entities is treated as a single entity. It is an abstraction that treats relationships as entities.
• Eliminate this redundancy via aggregation
• Treat relationship as an abstract entity
• Allows relationships between relationships
• Abstraction of relationship into new entity

Aggregation Example

ER Diagram with Aggregation

Summary of Symbols Used in E-R Notation

4. Relational Database Management System

The relational model for database management is a data model based on predicate logic and set theory. It was
invented by Edgar Codd. The fundamental assumption of the relational model is that all data are represented as
mathematical n-ary relations, an n-ary relation being a subset of the Cartesian product of n sets.

n-ary Relationship
When n entity sets participate in a relationship, the relationship is called an n-ary relationship.

1) Relation – The fundamental organizational structure for data in the relational model is the relation. A relation is a two-dimensional table made up of rows and columns. Each relation, also called a table, stores data about entities.
2) Tuples – The rows in a relation are called tuples. They represent specific occurrences (or records) of an entity. Each row consists of a sequence of values, one for each column in the table. In addition, each row (or record) in a table must be unique. A tuple variable is a variable that stands for a tuple.
3) Attributes – The columns in a relation are called attributes. The attributes represent characteristics of an entity.
4) Domain – For each attribute there is a set of permitted values, called the domain of that attribute. For all relations r, the domains of all attributes of r should be atomic. A domain is said to be atomic if the elements of the domain are considered to be indivisible units.

Database Schema – Logical design of the database is termed as database schema.


Database instance – Database instance is a snapshot of the data in a database at a given instant of time.
Relation schema – The concept of relation schema corresponds to the programming notion of type definition. It can
be considered as the definition of a domain of values. The database schema is the collection of relation schemas that
define a database.
Relation instance – The concept of a relation instance corresponds to the programming-language notion of the value of a variable; by a relation instance we mean the current set of tuples, i.e. the “relation” itself.

4.1 Database Keys

1. Primary Key: A key which uniquely identifies a record in a table. Student_ID is the primary key in the STUDENT table below.
2. Candidate Key: A candidate key is a single field or the least combination of fields that uniquely identifies each record in the table. Every table must have at least one candidate key but may have several. For example, in the table STUDENT, Student_ID and Roll_No. are candidate keys.

Roll_No. Student_ID
001 11093100
002 11093101
003 11093126
004 11093127

3. Foreign Key: A foreign key is generally a primary key from one table that appears as a field in another. For example, let us consider the two tables STUDENT and LIBRARY_RECORD.

STUDENT
Roll_No. Student_ID Student_Name Student_Class
001 11093100 Ravi Kumar 3
002 11093101 Nihal Sharma 4
003 11093126 Astha Mathur 3
004 11093127 Nishi Arora 5

LIBRARY_RECORD
Lib_CardNo Student_ID Student_Name Address
AX120 11093101 Nihal Sharma 12th Avenue Street, Delhi
AX121 11093126 Astha Mathur XYZ Lane,Delhi
BL101 11093127 Nishi Arora 5-D, Z Block, Delhi

In the table LIBRARY_RECORD, Lib_CardNo is the primary key and Student_ID is a foreign key, as it is the primary key of the table STUDENT.
4. Alternate Key: A candidate key other than the primary key is called an alternate key.
5. Super Key: The set of attributes which can uniquely identify a tuple is known as Super Key. For Example,
Student_Enroll_No, (Student_ID, Student_Name) etc.

Non-key attributes are attributes other than the candidate key attributes in a table, and non-prime attributes are attributes other than primary key attributes.
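The primary key / foreign key relationship above can be written directly as SQL DDL. A minimal sketch using Python's built-in sqlite3 module, with the columns trimmed for brevity:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked

# STUDENT: Student_ID is the primary key; Roll_No, being unique as well,
# is an alternate candidate key.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        roll_no    TEXT UNIQUE NOT NULL,
        name       TEXT
    )""")

# LIBRARY_RECORD: Lib_CardNo is the primary key; Student_ID is a foreign key
# referencing the primary key of STUDENT.
conn.execute("""
    CREATE TABLE library_record (
        lib_card_no TEXT PRIMARY KEY,
        student_id  INTEGER REFERENCES student(student_id)
    )""")

conn.execute("INSERT INTO student VALUES (11093101, '002', 'Nihal Sharma')")
conn.execute("INSERT INTO library_record VALUES ('AX120', 11093101)")
# Inserting a library record for an unknown Student_ID now raises
# sqlite3.IntegrityError, which is exactly what the foreign key guarantees.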

4.2 Relational Query Languages


Relational query languages use relational algebra to break up user requests and instruct the DBMS to execute them. A query language is the means by which the user communicates with the database. Relational query languages can be procedural or non-procedural.

• In a procedural language, the user instructs the system to perform a sequence of operations on the database to
compute the desired result.

• In a nonprocedural language, the user describes the desired information without giving a specific procedure
for obtaining that information.

4.3 Relational Algebra


Relational algebra is a procedural query language. It takes one or more relations/tables, performs an operation on them, and produces a result; this result is itself a new table or relation. In particular, since the result of a relational query is itself a relation, relational operations can be applied to the results of queries as well as to the given set of relations.

An operator can be either unary or binary. Following are some operations of relational algebra:

1. Selection operator (σ): Selection operator is used to select tuples from a relation based on some condition.
Syntax: σ (Cond)(Relation Name)

Example

Extract employees whose age is greater than 30 from EMPLOYEES relation


σ (AGE>30)(EMPLOYEES)

2. Project Operation (∏): It projects the specified column(s) from a relation.

Syntax: ∏(Column 1, Column 2, …, Column n)(Relation Name)

Example

Extract EMP_ID and NAME from EMPLOYEE relation.


∏(EMP_ID,NAME)(EMPLOYEE)

3. Union Operation (∪): It performs a binary union between two given relations. The union of two relations R1 and R2 can only be computed if R1 and R2 are union-compatible (the two relations must have the same number of attributes, and corresponding attributes must have the same domains). Duplicate tuples are automatically eliminated in a union operation.
Syntax: Relation1 U Relation2
r ∪ s = { t | t ∈ r or t ∈ s}
Note: r, and s must have the same number of attributes.

Example

Project the names of the employees who are managers in IT_Dept, managers in FUNCT_Dept, or both:
∏ Managers (IT_Dept) ∪ ∏ Managers (FUNCT_Dept)

4. Minus (−): The minus of two relations R1 and R2 can only be computed if R1 and R2 are union-compatible. The minus operator applied as R1 − R2 gives a relation with the tuples which are in R1 but not in R2.
Syntax: Relation1 - Relation2

Example

To find persons who are students but not employees, we can use the minus operator like:
∏ Name (STUDENTS) − ∏ Name (EMPLOYEE)

5. Rename(ρ): Rename operator is used to give another name to a relation.


Syntax: ρ(Relation2, Relation1)

Example

To rename STUDENT relation to STUDENT1, we can use rename operator like:


ρ(STUDENT1, STUDENT)

6. Cartesian Product (Χ): The cartesian product of two tables combines each row in one table with each row in
the other table. It combines tuples from two relations, but unlike the join operation, its result contains all pairs
of tuples from the two relations, regardless of whether their attribute values match.
Syntax: r Χ s
Where r and s are relations and their output will be defined as −
r Χ s = { q t | q ∈ r and t ∈ s}
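Because every operator returns a new relation, the operators compose freely. The following toy sketch models a relation as a list of dicts in Python; the data and attribute names are illustrative, not taken from the text.

# A relation modelled as a list of dicts; names and data are illustrative.
employees = [
    {"EMP_ID": 1, "NAME": "Ravi",  "AGE": 34},
    {"EMP_ID": 2, "NAME": "Nishi", "AGE": 28},
]

def select(rows, cond):              # sigma: keep tuples satisfying a predicate
    return [r for r in rows if cond(r)]

def project(rows, cols):             # pi: keep only the named attributes
    seen, out = set(), []
    for r in rows:
        t = tuple((c, r[c]) for c in cols)
        if t not in seen:            # a relation holds no duplicate tuples
            seen.add(t)
            out.append(dict(t))
    return out

def cartesian(r, s):                 # X: pair every tuple of r with every tuple
    # of s; assumes r and s have disjoint attribute names.
    return [{**a, **b} for a in r for b in s]

# sigma(AGE>30)(EMPLOYEES), then pi(EMP_ID, NAME) over the result:
print(project(select(employees, lambda r: r["AGE"] > 30), ["EMP_ID", "NAME"]))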

Tuple Relational Calculus

Relational calculus is a non-procedural query language. It uses mathematical predicate calculus instead of algebra. It describes what the desired result is, whereas relational algebra gives the method for computing the result.

A query in the tuple relational calculus is expressed as: {t | P(t)}


i.e. the set of tuples for which predicate is true.

{t | EMPLOYEE(t) and t.SALARY > 20000} – this selects the tuples from the EMPLOYEE relation such that the resulting employee tuples have a salary greater than 20000. It is an example of selecting a range of values.
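The notation maps almost one-to-one onto a comprehension; a toy illustration in Python (the EMPLOYEE data is invented for the example):

# Analogue of {t | EMPLOYEE(t) and t.SALARY > 20000} as a comprehension.
EMPLOYEE = [
    {"EMP_ID": 1, "NAME": "Ravi",  "SALARY": 25000},
    {"EMP_ID": 2, "NAME": "Nishi", "SALARY": 18000},
]
result = [t for t in EMPLOYEE if t["SALARY"] > 20000]
print(result)  # only the tuples for which the predicate is true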

Domain Relational Calculus

In the tuple relational calculus, the variables range over tuples of a relation. In the domain relational calculus, you also use variables, but in this case the variables take their values from the domains of attributes rather than from tuples of relations. A domain relational calculus expression has the following general format:

{d1, d2, . . . , dn | F(d1, d2, . . . , dm)} m ≥ n


where d1, d2, . . . , dn, . . . , dm stand for domain variables and F(d1, d2, . . . , dm) stands for a formula
composed of atoms.

For example, to select the EMP_ID and EMP_NAME of employees who work for department ID 415:

{<EMP_ID, EMP_NAME> | <EMP_ID, EMP_NAME> ∈ EMPLOYEE ∧ DEPT_ID = 415}

5. Normalization

Normalization is the process of organizing the data in a database to avoid data redundancy and insertion, update and deletion anomalies. Thus, database normalization is a database schema design technique by which an existing schema is modified to minimize redundancy and dependency of data.
What are anomalies in database management?
Anomalies are inconvenient or error-prone situations that arise when we process the tables. There are three types of anomalies:
1. Update Anomalies: Incorrect data may have to be changed, which could involve many records having to be changed, leading to the possibility of some changes being made incorrectly.
2. Delete Anomalies: A record of data can legitimately be deleted from a database, yet the deletion can remove the only instance of other, required data. E.g. deleting a book loan from a library member can remove all details of the particular book from the database, such as the author, book title etc.
3. Insert Anomalies: The nature of a database may be such that it is not possible to add a required piece of data unless another, unavailable piece of data is also added. E.g. a library database that cannot store the details of a new member until that member has taken out a book.

5.1 Functional Dependency

We use functional dependencies to test relations to see whether they are legal under a given set of functional dependencies.
If a relation r is legal under a set F of functional dependencies, we say that r satisfies F.
We say that F holds on R if all legal relations on R satisfy the set of functional dependencies F.
Note: a specific instance of a relation schema may satisfy a functional dependency even if the functional dependency does not hold on all legal instances. For example, a specific instance of loan may, by chance, satisfy amount → customer_name.
A functional dependency is trivial if it is satisfied by all instances of a relation.
Examples:
customer_name, loan_number → customer_name
customer_name → customer_name

Inference rules
Armstrong's axioms are sound and complete, i.e. they enable the derivation of every functional dependency implied by a given set. The axioms are:
1. Reflexivity – if the B's are a subset of the A's, then A → B.
2. Augmentation – if A → B, then A, C → B, C.
3. Transitivity – if A → B and B → C, then A → C.
Additional inference rules:
4. Decomposition – if A → B, C then A → B.
5. Union – if A → B and A → C, then A → B, C.
6. Pseudotransitivity – if A → B and C, B → D, then C, A → D.
Equivalence of sets of functional dependencies: two sets of functional dependencies S and T are equivalent iff each implies the other.
The dependency {A_1, ..., A_n} → {B_1, ..., B_m} is trivial if the B's are a subset of the A's; nontrivial if at least one of the B's is not among the A's; and completely nontrivial if none of the B's is among the A's.

Closure (F+)
All dependencies in F, together with all dependencies that can be inferred from F using the above rules, are called the closure of F, denoted by F+.
After finding a set of functional dependencies that hold on a relation, the next step is to find the super key for that relation (table). The set of identified functional dependencies plays a vital role in finding the key for the relation. We can decide whether an attribute (or set of attributes) of a table is a key for that table by computing the attribute's (or attribute set's) closure. If A is an attribute (or set of attributes), then its attribute closure is denoted A+.
Algorithm to compute closure
We have to find out whether F ⊨ X → Y. This is the case when X → Y ∈ F+. The practical method is to generate X+, the closure of X under F, and then test whether Y ⊆ X+.

Example:
EMPLOYEE(empid, empname, dept, age, salary, experience)
Let the functional dependencies be as follows:

empid->empname
{age,experience}->salary
empid->{age,dept}
dept->experience
In the above example, let us find the closure of the attribute empid, i.e. the closure of {empid}.

Since we are finding the closure of empid, empid is an element of the closure set C+. Now we go step by step.
• Step 1: Select each functional dependency and check whether the left side of the functional dependency is a subset of the closure. If yes, add the right side of that functional dependency to the closure set; if not, check the next functional dependency.
• Step 2: Keep checking the functional dependencies until there are no more functional dependencies whose left side is a subset of the closure set C+.

What is a subset? A set M is said to be a subset of another set N only if all elements of set M are present in set N. Set N can have more elements than M.
• So, in our example, empid is an element of the closure set C+. Initially, C+ = {empid}.
• The first functional dependency says empid functionally determines empname. Its left side ({empid}) is a subset of C+, therefore its right side is added to C+. Now C+ = {empid, empname}.
• The second fd (functional dependency) says {age, experience} → salary. Its left side ({age, experience}) is not a subset of C+, so we check the next fd.
• The third fd says empid → {age, dept}. Its left side ({empid}) is a subset of C+, therefore its right side is added to C+. Now C+ = {empid, empname, age, dept}.
• The fourth fd says dept → experience. Its left side ({dept}) is a subset of C+, so we add its right side ({experience}) to the closure set. Now C+ = {empid, empname, age, dept, experience}.
• We look again for a functional dependency whose left side is a subset of the closure set. Since the closure set C+ changed in some steps, another functional dependency's left side may now be a subset of C+, so we go through every functional dependency again.
• Since sets do not allow duplication, we do nothing if the right side of a functional dependency whose left side is a subset of C+ is already present in C+.
• The second fd's left side ({age, experience}) is now a subset of C+, therefore salary is added to C+. Now C+ = {empid, empname, age, dept, experience, salary}.
• There is no remaining functional dependency whose left side is a subset of C+ and which adds at least one new attribute to the closure set, so we stop.

The closure is therefore C+ = {empid, empname, age, dept, experience, salary}.
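The step-by-step procedure above is mechanical enough to code directly; a minimal sketch of the attribute-closure algorithm in Python, run on the EMPLOYEE example:

def attribute_closure(attrs, fds):
    # attrs: iterable of attribute names, e.g. {"empid"}
    # fds:   list of (lhs, rhs) pairs of attribute sets
    closure = set(attrs)
    changed = True
    while changed:                      # repeat until no FD adds anything new
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)     # LHS contained in closure: add the RHS
                changed = True
    return closure

fds = [
    ({"empid"}, {"empname"}),
    ({"age", "experience"}, {"salary"}),
    ({"empid"}, {"age", "dept"}),
    ({"dept"}, {"experience"}),
]
print(attribute_closure({"empid"}, fds))
# -> {'empid', 'empname', 'age', 'dept', 'experience', 'salary'} (set order varies)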


Minimal Cover of FD

We say that a set of functional dependencies F covers another set of functional dependencies G if every functional dependency in G can be inferred from F. More formally, F covers G if G+ ⊆ F+. F is a minimal cover of G if F is the smallest set of functional dependencies that covers G.

We find the minimal cover by iteratively simplifying the set of functional dependencies. To do this, we will use three
methods:

Simplifying an FD by the Union Rule: Let X, Y , and Z be sets of attributes. If X → Y and X → Z, then X → Y Z

Simplifying an FD by simplifying the left-hand side: Let X and Y be sets of attributes, and B a single attribute not in X.

Let F be: XB → Y
and H be: X → Y
If F ⇒ X → Y, then we can replace F by H. In other words, if Y ⊆ X+ (the closure of X computed under F), then we can replace F by H.
For example, say we have the following functional dependencies (F):
• AB → C
• A → B
And we want to know if we can simplify to the following (H):
• A → C
• A → B
Then A+ under F = ABC. Since Y ⊆ A+, we can replace F by H.

Simplifying an FD by simplifying the right-hand side: Let X and Y be sets of attributes, and C a single attribute not in Y.
Let F be: X → YC
and H be: X → Y
If H ⇒ X → YC, then we can replace F by H. In other words, if YC ⊆ X+ (the closure of X computed under H), then we can replace F by H.
For example, say we have the following functional dependencies (F):
• A → BC
• B → C
And we want to know if we can simplify to the following (H):
• A → B
• B → C
Then A+ under H = ABC. Since BC ⊆ A+, we can replace F by H.

Finding the Minimal Cover


Given a set of functional dependencies F :
1. Start with F
2. Remove all trivial functional dependencies
3. Repeatedly apply (in whatever order you like), until no changes are possible
• Union Simplification (it is better to do it as soon as possible, whenever possible)
• RHS Simplification
• LHS Simplification
4. Result is the minimal cover

Example: applying the algorithm to EGS with

1. E → G
2. G → S
3. E → S
Using the union rule, we combine 1 and 3 and get:
1. E → GS
2. G → S
Simplifying the RHS of 1 (S is the only attribute we can remove, since E → G → S), we get:
1. E → G
2. G → S

Algorithm to find minimal cover for a set of FDs F


Step 1: Let G be the set of FDs obtained from F by decomposing the right-hand side of each FD into single attributes.
Step 2: Remove all redundant attributes from the left-hand sides of the FDs in G.
Step 3: From the resulting set of FDs, remove all redundant FDs.
Output the resulting set of FDs.

Example: Consider R = ABCDEFGH and the following set of FDs, F:


ABH → C
A→D
C→E
BGH → F
F → AD
E→F
BH → E
Converting right hand sides to single attributes, we get:
ABH → C
A→D
C→E
BGH → F
F→A
F→D
E→F
BH → E
Perform steps 2 and 3....
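Steps 2 and 3 are mechanical too. Here is a sketch in Python that reuses the attribute_closure function from the closure example above (paste it alongside to run); the greedy removal order matters, so this finds one minimal cover among possibly several.

def minimal_cover(fds):
    # Step 1: split right-hand sides into single attributes.
    g = [(frozenset(lhs), frozenset(b)) for lhs, rhs in fds for b in rhs]
    # Step 2: drop redundant LHS attributes: a is extraneous in lhs -> rhs
    # if rhs still follows from lhs - {a} under the full set g.
    reduced = []
    for lhs, rhs in g:
        for a in sorted(lhs):
            smaller = lhs - {a}
            if smaller and rhs <= attribute_closure(smaller, g):
                lhs = smaller
        reduced.append((lhs, rhs))
    # Step 3: drop an FD entirely if its RHS follows from its LHS
    # using only the remaining FDs.
    cover = list(dict.fromkeys(reduced))   # also removes exact duplicates
    for fd in list(cover):
        rest = [f for f in cover if f != fd]
        if fd[1] <= attribute_closure(fd[0], rest):
            cover = rest
    return cover

# The R = ABCDEFGH example from the text:
F = [("ABH", "C"), ("A", "D"), ("C", "E"), ("BGH", "F"),
     ("F", "AD"), ("E", "F"), ("BH", "E")]
for lhs, rhs in minimal_cover(F):
    print("".join(sorted(lhs)), "->", "".join(sorted(rhs)))
# With this removal order the result is: BH -> C, A -> D, C -> E, F -> A, E -> F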

Understanding Normalization
In relational database theory, normalization is the process of restructuring the logical data model of a database to eliminate redundancy, organize data efficiently, and reduce the potential for anomalies during data operations. Data normalization may also improve data consistency and simplify future extension of the logical data model. The formal classifications used for describing a relational database's level of normalization are called normal forms (NF).

A non-normalized database can suffer from data anomalies: A non-normalized database may store data representing
a particular referent in multiple locations. An update to such data in some but not all of those locations results in an
update anomaly, yielding inconsistent data. A normalized database prevents such an anomaly by storing such data
(i.e. data other than primary keys) in only one location.

A non-normalized database may have inappropriate dependencies, i.e. relationships between data with no functional
dependencies. Adding data to such a database may require first adding the unrelated dependency. A normalized
database prevents such insertion anomalies by ensuring that database relations mirror functional dependencies.
Similarly, such dependencies in non-normalized databases can hinder deletion. That is, deleting data from such
databases may require deleting data from the inappropriate dependency. A normalized database prevents such
deletion anomalies by ensuring that all records are uniquely identifiable and contain no extraneous information.

Normal forms
Edgar F. Codd originally defined the first three normal forms. The first normal form requires that tables be made up
of a primary key and a number of atomic fields, and the second and third deal with the relationship of non-key fields
to the primary key. These have been summarized as requiring that all non-key fields be dependent on "the key, the
whole key and nothing but the key". In practice, most applications in 3NF are fully normalized. However, research
has identified potential update anomalies in 3NF databases. BCNF is a further refinement of 3NF that attempts to
eliminate such anomalies. The fourth and fifth normal forms (4NF and 5NF) deal specifically with the representation
of many-to-many and one-to-many relationships. Sixth normal form (6NF) only applies to temporal databases.

5.2 First Normal Form (1NF)

First normal form (1NF) lays the groundwork for an organized database design: ensure that each table has a primary key, a minimal set of attributes which can uniquely identify a record. 1NF states that the domain of an attribute must include only atomic values and that the value of any attribute in a tuple must be a single value from the domain of that attribute. It does not allow nested relations. Data that is redundantly duplicated across multiple rows of a table is moved out to a separate table.

Atomicity: Each attribute must contain a single value, not a set of values.
Unnormalized form (UNF): A table that contains one or more repeating groups.
First normal form (1NF): A relation in which the intersection of each row and column contains one and only one
value.
UNF → 1NF: remove repeating groups by:
• Entering appropriate data in the empty columns of rows.
• Placing repeating data, along with a copy of the original key attribute, in a separate relation.
• Identifying a primary key for each of the new relations.

For Example:
Consider a table STUDENT with fields Roll_No, Name, Course. Here a student may have opted for more than one course, thus the values in the Course field will not be atomic:
Roll_No Name Course
1 Snehal Polity, History, Economics
2 Kajal DBMS, CD
3 Amit Physics, Chemistry
After converting it to First Normal Form (1NF)

Roll_No Name Course


1 Snehal Polity
1 Snehal History
1 Snehal Economics
2 Kajal DBMS
2 Kajal CD
3 Amit Physics
3 Amit Chemistry
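The conversion is mechanical: one output row per (student, course) pair, so that every field holds a single atomic value. A tiny sketch in Python using the data above:

students = [
    (1, "Snehal", ["Polity", "History", "Economics"]),
    (2, "Kajal",  ["DBMS", "CD"]),
    (3, "Amit",   ["Physics", "Chemistry"]),
]
# Flatten the non-atomic Course field into separate 1NF rows.
rows_1nf = [(roll, name, course)
            for roll, name, courses in students
            for course in courses]
for row in rows_1nf:
    print(row)   # (1, 'Snehal', 'Polity'), (1, 'Snehal', 'History'), ...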

5.3 Second Normal Form (2NF)

General Definition: A relation schema R is in second normal form (2NF) if every nonprime attribute A in R is not
partially dependent on any key of R.
Partial Dependency – If a proper subset of a candidate key determines a non-prime attribute, it is called a partial dependency.
• Create separate tables for sets of values that apply to multiple records.
• Relate the tables with a foreign key.
• Records should not depend on anything other than a table's primary key (a compound key, if necessary). For
example, consider a customer's address in an accounting system. The address is needed by the Customers
table, but also by the Orders, Shipping, Invoices, Accounts Receivable, and Collections tables. Instead of
storing the customer's address as a separate entry in each of these tables, store it in one place, either in the
Customers table or in a separate Addresses table.

A table is said to be in 2NF if both the following conditions hold:


• The table is in 1NF (first normal form).
• No non-prime attribute is dependent on a proper subset of any candidate key of the table.
1NF → 2NF: remove partial dependencies: the functionally dependent attributes are removed from the relation by
placing them in a new relation along with a copy of their determinant.

As per first normal form, no two rows of data may contain repeating groups of information, i.e. each set of columns must have a unique value, such that multiple rows cannot be used to fetch the same record. The primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key; such a combined key is a composite key, and each minimal combination that uniquely identifies records is a candidate key. To identify or establish 2NF we must identify the candidate keys and any partial dependencies.

Example: Let us consider a relation R with fields A, B, C, D:
R(A B C D)
Here AB → D and B → C.
A and B are the essential attributes here, since from A and B you can find D, and from B you can find C. Thus the candidate key for this relation R is AB.
(AB)+ = ABCD
AB → candidate key
Hence A, B ∈ prime attributes (those that are part of a candidate key)
C, D ∈ non-prime attributes
Identify the partial dependency: here C is dependent only on B and not on the complete candidate key AB, thus in this relation R there is a partial dependency.
Now, for 2NF a relation must be in 1NF and there should not be any partial dependency. So we eliminate the partial dependency by decomposing R into two relations R1 and R2:
R1 (A B D)
AB → D
R2 (B C)
B → C
In this scheme we have eliminated the partial dependency, as in R2 B is the only candidate key and C depends on the whole of it.
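The same conclusion can be reached mechanically with the attribute_closure sketch from Section 5.1 (paste it alongside to run), as a quick check:

fds = [({"A", "B"}, {"D"}), ({"B"}, {"C"})]
print(attribute_closure({"A", "B"}, fds))  # {'A','B','C','D'}: AB is a key
print(attribute_closure({"B"}, fds))       # {'B','C'}: C hangs off part of
                                           # the key, a partial dependency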

5.4 Third Normal Form (3NF)


For 3NF, first the table must be in 2NF; in addition, we want to make sure that the non-key fields are dependent upon ONLY the PK, and not on any other field in the table. This is very similar to 2NF, except that now you are comparing the non-key fields to OTHER non-key fields.

3NF Rule:
1. The database must meet all the requirements of the second normal form.
2. Any field which is dependent not only on the primary key but also on another field is moved out to a separate table.
(No Transitive Dependencies)

Example:
STUDENT( Stu_ID, Stu_Name, City, Zip)

We find that in the above STUDENT relation, Stu_ID is the key and the only prime attribute. City can be identified by Stu_ID as well as by Zip itself. Zip is not a superkey, nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows:

Student_Detail (Stu_ID, Stu_Name, Zip)


ZipCode(Zip, City)
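The transitive dependency can again be confirmed with the attribute_closure sketch from Section 5.1: Zip determines City, yet Zip is not a superkey.

fds = [({"Stu_ID"}, {"Stu_Name", "Zip"}), ({"Zip"}, {"City"})]
print(attribute_closure({"Zip"}, fds))     # {'Zip','City'}: not a superkey
print(attribute_closure({"Stu_ID"}, fds))  # the whole relation: Stu_ID is the key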

General Definition:
A relation schema R is in 3NF if, whenever a nontrivial functional dependency X → A holds in R, either (a) X is a superkey of R, or (b) A is a prime attribute of R.
Equivalently, a relation schema R is in 3NF if every nonprime attribute of R meets both of the following conditions:
1. It is fully functionally dependent on every key of R.
2. It is non-transitively dependent on every key of R.

5.5 Boyce-Codd Normal Form (BCNF)

A relation is in BCNF if and only if every determinant is a candidate key. The second and third normal forms assume that all attributes not part of the candidate keys depend on the candidate keys, but they do not deal with dependencies within the keys.

BCNF deals with such dependencies.


A relation R is said to be in BCNF if whenever X -> A holds in R, and A is not in X, then X is a candidate key for R.

BCNF covers the very specific situations where 3NF misses interdependencies between non-key attributes. It should be noted that most relations that are in 3NF are also in BCNF.

Infrequently, a 3NF relation is not in BCNF, and this happens only if:


(a) the candidate keys in the relation are composite keys (that is, they are not single attributes),
(b) there is more than one candidate key in the relation, and
(c) the keys are not disjoint, that is, some attributes in the keys are common.
BCNF differs from 3NF only when there is more than one candidate key and the keys are composite and overlapping. Consider, for example, the relation enrol (sno, sname, cno, cname, date-enrolled).
Let us assume that the relation has the following candidate keys:
(sno, cno)
(sno, cname)
(sname, cno)
(sname, cname)
(we have assumed sname and cname are unique identifiers). The relation is in 3NF but not in BCNF because there are
dependencies
sno -> sname
cno -> cname
where attributes that are part of one candidate key are dependent on part of another candidate key. Such dependencies indicate that, although the relation is about some entity or association identified by the candidate keys, e.g. (sno, cno), there are attributes that are not about the whole thing that the keys identify. For example, the above
relation is about an association (enrolment) between students and subjects and therefore the relation needs to include
only one identifier to identify students and one identifier to identify subjects. Providing two identifiers about students
(sno, sname) and two keys about subjects (cno, cname) means that some information about students and subjects that is
not needed is being provided. This provision of information will result in repetition of information and the anomalies.
If we wish to include further information about students and courses in the database, it should not be done by
including the information in the present relation but by creating new relations that represent information about
entities student and subject.
These difficulties may be overcome by decomposing the above relation in the following three relations:
(sno, sname)
(cno, cname)
(sno, cno, date-of-enrolment)
We now have a relation that only has information about students, another only about subjects and the third only
about enrolments. All the anomalies and repetition of information have been removed.

5.6 Multivalued Dependency and Fourth Normal Form (4NF)

In a relational model, if all of the information about an entity is to be represented in one relation, it is necessary to repeat all the information other than the multivalued attribute's value in order to represent all the information we wish to represent. This results in many tuples about the same instance of the entity in the relation, and the relation ends up with a composite key (the entity id and the multivalued attribute). Of course, the other option is to represent this multivalued information in a separate relation. The situation becomes much worse if an entity has more than one multivalued attribute and those values are represented in one relation by a number of tuples per entity instance. The multivalued dependency relates to this problem, which arises when more than one multivalued attribute exists.
Consider the following relation that represents an entity employee that has one multivalued attribute proj:
emp (e#, dept, salary, proj)

We have so far considered normalization based on functional dependencies; dependencies that apply only to
single-valued information. For example, e# -> dept implies only one dept value for each value of e#. Not all
information in a database is single-valued, for example, proj in an employee relation may be the list of all projects that
the employee is currently working on. Although e# determines the list of all projects that an employee is working on,
e# -> proj is not a functional dependency.

We can analyze multivalued dependency more clearly with the following example.
programmer (emp_name, qualifications, languages)
This relation includes two multivalued attributes of entity programmer; qualifications and languages. There are no
functional dependencies.

The attributes qualifications and languages are assumed independent of each other. If we were to consider
qualifications and languages as separate entities, we would have two relationships (one between employees and
qualifications and the other between employees and programming languages). Both the above relationships are
many-to-many i.e. one programmer could have several qualifications and may know several programming languages.
Also one qualification may be obtained by several programmers and one programming language may be known to
many programmers.
Functional dependency A -> B relates one value of A to one value of B while multivalued dependency A ->> B defines
a relationship in which a set of values of attribute B are determined by a single value of A.

Now, more formally, X ->> Y is said to hold for R(X, Y, Z) if, whenever t1 and t2 are two tuples in R that have the same
values for the attributes X (that is, t1[X] = t2[X]), R also contains tuples t3 and t4 (not necessarily distinct) such that
t1[X] = t2[X] = t3[X] = t4[X]
t3[Y] = t1[Y] and t3[Z] = t2[Z]
t4[Y] = t2[Y] and t4[Z] = t1[Z]
In other words if t1 and t2 are given by
t1 = [X, Y1, Z1], and
t2 = [X, Y2, Z2]
then there must be tuples t3 and t4 such that
t3 = [X, Y1, Z2], and
t4 = [X, Y2, Z1]

We are therefore insisting that, for a given value of X, every value of Y appears with every value of Z, to keep the
relation instances consistent. In other words, the conditions insist that X alone determines Y and Z, and that there is
no relationship between Y and Z: since Y and Z appear in every possible pairing, the pairings carry no information and
are of no significance.
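Applied to the programmer example above, the standard 4NF remedy is to give each independent multivalued attribute its own table. A hedged SQL sketch (the table names and column types are illustrative assumptions):

CREATE TABLE programmer_qualification (
  emp_name      CHAR(20) NOT NULL,
  qualification CHAR(30) NOT NULL,
  PRIMARY KEY (emp_name, qualification)  -- one row per (programmer, qualification) pair
);

CREATE TABLE programmer_language (
  emp_name CHAR(20) NOT NULL,
  language CHAR(20) NOT NULL,
  PRIMARY KEY (emp_name, language)       -- one row per (programmer, language) pair
);

Each table now holds a single multivalued fact about a programmer, so no spurious (qualification, language) pairings are stored.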

Fourth Normal Form


Fourth normal form (4NF) requires that there be no non-trivial multivalued dependencies of attribute sets on anything
other than a superset of a candidate key. A table is in 4NF if and only if it is in BCNF and every non-trivial
multivalued dependency is implied by the candidate keys. 4NF thus removes the unwanted structures caused by multivalued
dependencies.

Definition: A relation schema R is in 4NF with respect to a set of dependencies F, if, for every non-trivial
multivalued dependency X ->>Y in F+, X is a super key for R.

5.7 Properties of Relational Decompositions

If R doesn’t satisfy a particular normal form, we decompose R into smaller schemas


What’s a decomposition?
R = (A1, A2, …, An)
D = (R1, R2, …, Rk) such that Ri ⊆ R and R = R1 ∪ R2 ∪ … ∪ Rk
(the Ri's need not be disjoint)
Replacing R by R1, R2, …, Rk is the process of decomposing R.
Ex: gradeInfo (rollNo, studName, course, grade)
R1: gradeInfo (rollNo, course, grade)
R2: studInfo (rollNo, studName)

Decomposition Property: A decomposition must satisfy the following two properties.
i. Lossless: The lossless-join property of a decomposition ensures that no spurious rows are generated when the
decomposed relations are reunited through a natural join operation, i.e. the information in an instance r of R must
be preserved in the instances r1, r2, …, rk where ri = ΠRi(r).
Decomposition is lossless with respect to a set of functional dependencies F if, for every relation instance r on R
satisfying F, r = πR1(r) ∗ πR2(r) ∗ … ∗ πRk(r), where ∗ denotes the natural join.

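To see the lossless-join property concretely, take the gradeInfo example above: joining the two projections back on rollNo reproduces exactly the original rows, because the common attribute rollNo is a key of studInfo. A hedged SQL sketch (column types are illustrative assumptions):

-- the two projections from the decomposition example
CREATE TABLE gradeInfo (rollNo SMALLINT, course CHAR(10), grade CHAR(2));
CREATE TABLE studInfo  (rollNo SMALLINT, studName CHAR(20));

-- the natural join r1 * r2, written out in SQL;
-- for a lossless decomposition this returns the original relation, with no spurious rows
SELECT g.rollNo, s.studName, g.course, g.grade
FROM gradeInfo g, studInfo s
WHERE g.rollNo = s.rollNo;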

ii. Dependency Preserving Property: If a set of functional dependencies F holds on R, it should be possible to enforce F
by enforcing appropriate dependencies on each ri.
Decomposition D = (R1, R2, …, Rk) of schema R preserves a set of dependencies F if
(πR1(F) ∪ πR2(F) ∪ … ∪ πRk(F))+ = F+
Here, πRi(F) = { (X → Y) ∈ F+ | X ⊆ Ri, Y ⊆ Ri } (called the projection of F onto Ri).
Informally, any FD that logically follows from F must also logically follow from the union of the projections of F onto
the Ri's. Then D is called dependency preserving.

Join Dependency
Join dependency is the term used to indicate the property of a relation schema that cannot be decomposed losslessly
into two relation schemas, but can be decomposed losslessly into three or more simpler relation schemas. It means that
a table, after it has been decomposed into three or more smaller tables, must be capable of being joined again on
common keys to form the original table.

Algorithm for BCNF decomposition


R – given schema, F – given set of FDs
D = {R}    // initial decomposition
while there is a relation schema Ri in D that is not in BCNF do
{
    let X → A be an FD in Ri violating BCNF;
    replace Ri in D by Ri1 = Ri – {A} and Ri2 = X ∪ {A};
}

The decomposition of Ri is lossless, since Ri1 ∩ Ri2 = X, Ri2 – Ri1 = {A}, and X → A holds.
Result: a lossless decomposition of R into BCNF relations.

5.8 Fifth Normal Form


Fifth normal form (5NF, also called PJ/NF) requires that there be no non-trivial join dependencies that do not follow
from the key constraints. A table is in 5NF if and only if it is in 4NF and every join dependency in it is implied by
the candidate keys.

Goals of Normalization
• Let R be a relation scheme with a set F of functional dependencies.
• Decide whether a relation scheme R is in "good" form.
• In the case that a relation scheme R is not in "good" form, decompose it into a set of relation schemes {R1, R2, …,
Rk} such that each relation scheme is in good form, the decomposition is a lossless-join decomposition and, preferably,
the decomposition is dependency preserving.
‘A relation R is in fifth normal form (5NF) – also called projection-join normal form (PJ/NF) if and only if every
join dependency in R is a consequence of the candidate keys of R.’

For every normal form it is assumed that every occurrence of R can be uniquely identified by a primary key using one
or more attributes in R.

For R to be in fifth normal form it should follow these rules:


1. R should be in 4NF, and
2. every join dependency in R should either be trivial or be implied by the candidate keys; that is, for any join
dependency R -> (R1, R2, R3, …, Rk), each Ri should be a superkey of R.

• Join dependencies generalize multivalued dependencies and lead to project-join normal form (PJNF, also called
fifth normal form).
• A class of even more general constraints leads to a normal form called domain-key normal form.
• The problem with these generalized constraints is that they are hard to reason with, and no sound and complete set of
inference rules exists for them. Hence they are rarely used.

In short, Normalization of a Database is achieved by following a set of rules called 'forms' in creating the
database.
These rules are 5 in number (with one extra one stuck in-between 3&4) and they are:

• 1st Normal Form or 1NF: Each Column Type is Unique.


• 2nd Normal Form or 2NF: The entity under consideration should already be in the 1NF and all
attributes within the entity should depend solely on the entity's unique identifier.
• 3rd Normal Form or 3NF: The entity should already be in the 2NF and no column entry should be
dependent on any other entry (value) other than the key for the table. If such an entity exists, move it
outside into a new table.
• Now if these 3NF are achieved, the database is considered normalized. But there are three more
'extended' NF for the elitist. These are:
• BCNF (Boyce & Codd): The database should be in 3NF and every determinant should be a candidate key.
• 4NF: Tables should have no non-trivial multivalued dependencies except on a candidate key.
• 5NF: Tables should have no join dependencies that are not implied by the candidate keys.

6. Structured Query Language

What's SQL?
In the early 1970s, IBM researchers created a simple non-procedural language called Structured English Query Language,
or SEQUEL. It was based on Dr. Edgar F. (Ted) Codd's design of a relational model for data storage, in which he
described a universal language for accessing databases.

In the late '80s, ANSI and ISO (two organizations dealing with standards for a wide variety of things) came
out with a standardized version called Structured Query Language, or SQL. SQL is pronounced 'Sequel'. There have
been several versions of SQL; the latest is SQL-99, though SQL-92 remains the most universally adopted
standard.
SQL is the language used to query all major databases. It is simple to learn and appears to do very little, but it is
the heart of a successful database application. Understanding SQL and using it efficiently is imperative in designing an
efficient database application. The better your understanding of SQL, the more versatile you'll be in getting
information out of databases.

Onto SQL
There are four basic commands which are the workhorses for SQL and figure in almost all queries to a database.
INSERT - Insert Data
DELETE - Delete Data
SELECT - Pull Data
UPDATE - Change existing Data

As you can see SQL is like English. Let's build a real world example database using MySQL and perform some SQL
operations on it. A database that practically anyone could use would be a Contacts database. In our example we are
going to create a database with the following fields:

First Name, Last Name, Birth Date, Street Address, City, State, Zip, Country, Telephone Home, Telephone Work, Email,
Company Name, Designation

First, let’s decide how we are going to store this data in the database. For illustration purposes, we are going to
keep this data in multiple tables.
This will let us exercise all the SQL commands pertaining to retrieving data from multiple tables. Also, we can
separate different kinds of entities into different tables. So, let's say you have thousands of friends and need to send
a mass email to all of them: a SELECT statement (covered later, and sketched below) will need to look at only one table.
Well, we can keep the FirstName, LastName and BirthDate in one table.
Address related data in another.
Company Details in another.
Emails can be separated into another.
Telephones can be separated into another.
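For instance, once the tables below exist, the mass-email case needs a query against just the email table; a sketch:

SELECT Email FROM email;   -- one table scanned, however many contacts there are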

Let's build the database in MySQL


While building a database - you need to understand the concept of data types. Data types allow the user to define
how data is stored in fields or cells within a database. It's a way to define how your data will actually exist. Whether
it's a Date or a string consisting of 20 characters, an integer etc. When we build tables within a database we also define
the contents of each field in each row in the table using a data type. It's imperative that you use only the data type that
fits your needs and don't use a data type that reserves more memory than the data in the field actually requires.
Let's look at various Data Types under MySQL

Type                                Size in bytes   Description
TINYINT(length)                     1               Integer with an unsigned range of 0 to 255 and a signed range of -128 to 127
SMALLINT(length)                    2               Integer with an unsigned range of 0 to 65535 and a signed range of -32768 to 32767
MEDIUMINT(length)                   3               Integer with an unsigned range of 0 to 16777215 and a signed range of -8388608 to 8388607
INT(length)                         4               Integer with an unsigned range of 0 to 4294967295 and a signed range of -2147483648 to 2147483647
BIGINT(length)                      8               Integer with an unsigned range of 0 to 18446744073709551615 and a signed range of -9223372036854775808 to 9223372036854775807
FLOAT(length, decimal)              4               Floating point number with max. value +/-3.402823466E+38 and min. (non-zero) value +/-1.175494351E-38
DOUBLE PRECISION(length, decimal)   8               Floating point number with max. value +/-1.7976931348623157E+308 and min. (non-zero) value +/-2.2250738585072014E-308
DECIMAL(length, decimal)            length          Number with the range of the DOUBLE type, stored as a CHAR field type
TIMESTAMP(length)                   4               YYYYMMDDHHMMSS, YYMMDDHHMMSS, YYYYMMDD or YYMMDD; a TIMESTAMP value is updated each time the row changes, and assigning NULL sets the field to the current time
DATE                                3               YYYY-MM-DD
TIME                                3               HH:MM:SS
DATETIME                            8               YYYY-MM-DD HH:MM:SS
YEAR                                1               YYYY or YY
CHAR(length)                        length          A fixed-length text string; values shorter than the assigned length are padded with trailing spaces
VARCHAR(length)                     length+1        A variable-length text string (255 characters max); unused trailing spaces are removed before storing
TINYTEXT                            length+1        A text field with a max. length of 255 characters
TINYBLOB                            length+1        A binary field with a max. length of 255 bytes
TEXT                                length+2        Up to 64 KB of text
BLOB                                length+2        Up to 64 KB of binary data
MEDIUMTEXT                          length+3        Up to 16 MB of text
MEDIUMBLOB                          length+3        Up to 16 MB of binary data
LONGTEXT                            length+4        Up to 4 GB of text
LONGBLOB                            length+4        Up to 4 GB of binary data
ENUM                                1 or 2          The field can contain one of up to 65535 predefined options, e.g. ENUM('abc','def','ghi')
SET                                 1-8             The field can contain any number of values from a set of predefined possible values

The following examples will make clear how data types are declared within SQL statements.
Steps in Creating the Database using MySQL.

From the shell prompt (either in DOS or UNIX):


mysqladmin create contacts
This will create an empty database called "contacts".
Now run the command line tool "mysql" and from the mysql prompt do the following:
mysql> use contacts;
(You'll get the response "Database changed")
The following commands entered into the MySQL prompt will create the tables in the database.
CREATE TABLE names (contact_id SMALLINT NOT NULL AUTO_INCREMENT PRIMARY KEY, FirstName
CHAR(20), LastName CHAR(20), BirthDate DATE);
CREATE TABLE address(contact_id SMALLINT NOT NULL PRIMARY KEY, StreetAddress CHAR(50), City
CHAR(20), State CHAR(20), Zip CHAR(10), Country CHAR(20));
CREATE TABLE telephones (contact_id SMALLINT NOT NULL PRIMARY KEY, TelephoneHome CHAR(20),
TelephoneWork CHAR(20));
CREATE TABLE email (contact_id SMALLINT NOT NULL PRIMARY KEY, Email CHAR(20));

CREATE TABLE company_details (contact_id SMALLINT NOT NULL PRIMARY KEY, CompanyName
CHAR(25), Designation CHAR(20));

Note: Here we assume that one person has only one email address. If one person had multiple addresses, this design would be
a problem: we'd need another field to indicate to whom each email address belongs. In this particular design, email
ownership is indicated by the primary key. The same is true for telephones: we are assuming one home telephone and one work
telephone number per person, which need not be true. Similarly, one person could work for multiple companies at the same
time, holding two different designations. In all these cases an extra field would solve the issue. For now, however, let's
work with this small design.

KEYS: The relationships between columns located in different tables are usually described through the use of keys.
As you can see we have a PRIMARY KEY in each table. The Primary key serves as a mechanism to refer to other fields
within the same row. In this case, the Primary key is used to identify a relationship between a row under
consideration and the person whose name is located inside the 'names' table. We use the AUTO_INCREMENT
statement only for the 'names' table as we need to use the generated contact_id number in all the other tables for
identification of the rows.
This type of table design where one table establishes a relationship with several other tables is known as a 'one to
many’ relationship.

In a 'many to many' relationship we could have several Auto Incremented Primary Keys in various tables with
several inter-relationships.
Foreign Key: A foreign key is a field in a table that refers to the Primary Key of another table. The requirement that
every foreign key value matches an existing primary key value is commonly known as 'referential integrity'.
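Older MySQL versions parse but do not enforce FOREIGN KEY constraints, so the design above relies on convention. In an engine that does enforce them, a hedged sketch of the address table with an explicit foreign key would look like:

CREATE TABLE address (
  contact_id    SMALLINT NOT NULL PRIMARY KEY,
  StreetAddress CHAR(50),
  City          CHAR(20),
  State         CHAR(20),
  Zip           CHAR(10),
  Country       CHAR(20),
  FOREIGN KEY (contact_id) REFERENCES names (contact_id)  -- each address row must match a names row
);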
Execute the following commands to see the newly created tables and their contents.
To see the tables inside the database:
mysql> SHOW TABLES;
+-----------------------+
| Tables in contacts |
+-----------------------+
| address |
| company_details |
| email |
| names |
| telephones |
+-----------------------+
5 rows in set (0.00 sec)
To see the columns within a particular table:
mysql>SHOW COLUMNS FROM address;
+---------------+-------------+------+-----+---------+-------+---------------------------------+
| Field         | Type        | Null | Key | Default | Extra | Privileges                      |
+---------------+-------------+------+-----+---------+-------+---------------------------------+
| contact_id    | smallint(6) |      | PRI | 0       |       | select,insert,update,references |
| StreetAddress | char(50)    | YES  |     | NULL    |       | select,insert,update,references |
| City          | char(20)    | YES  |     | NULL    |       | select,insert,update,references |
| State         | char(20)    | YES  |     | NULL    |       | select,insert,update,references |
| Zip           | char(10)    | YES  |     | NULL    |       | select,insert,update,references |
| Country       | char(20)    | YES  |     | NULL    |       | select,insert,update,references |
+---------------+-------------+------+-----+---------+-------+---------------------------------+
6 rows in set (0.00 sec)
So, we have the tables created and ready. Now we put in some data.
Let's start with the 'names' table as it uses a unique AUTO_INCREMENT field which in turn is used in the
other tables.

Inserting data, one row at a time:


mysql> INSERT INTO names (FirstName, LastName, BirthDate) VALUES ('Yamila','Diaz','1974-10-13');
Query OK, 1 row affected (0.00 sec)

Inserting multiple rows at a time:


mysql> INSERT INTO names (FirstName, LastName, BirthDate) VALUES
('Nikki','Taylor','1972-03-04'),('Tia','Carrera','1975-09-18');
Query OK, 2 rows affected (0.00 sec)
Records: 2 Duplicates: 0 Warnings: 0
Let's see what the data looks like inside the table. We use the SELECT command for this.
mysql> SELECT * FROM names;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 2 | Nikki | Taylor | 1972-03-04 |
| 1 | Yamila | Diaz | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)

Try another handy command called 'DESCRIBE'.


mysql> DESCRIBE names;
+------------+-------------+------+-----+---------+----------------+---------------------------------+
| Field      | Type        | Null | Key | Default | Extra          | Privileges                      |
+------------+-------------+------+-----+---------+----------------+---------------------------------+
| contact_id | smallint(6) |      | PRI | NULL    | auto_increment | select,insert,update,references |
| FirstName  | char(20)    | YES  |     | NULL    |                | select,insert,update,references |
| LastName   | char(20)    | YES  |     | NULL    |                | select,insert,update,references |
| BirthDate  | date        | YES  |     | NULL    |                | select,insert,update,references |
+------------+-------------+------+-----+---------+----------------+---------------------------------+
4 rows in set (0.00 sec)
Now let's populate the other tables. Observe the syntax used.
mysql> INSERT INTO address(contact_id, StreetAddress, City, State, Zip, Country) VALUES ('1', '300 Yamila
Ave.', 'Los Angeles', 'CA', '300012', 'USA'),('2','4000 Nikki St.','Boca Raton','FL','500034','USA'),('3','404 Tia
Blvd.','New York','NY','10011','USA');
Query OK, 3 rows affected (0.05 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM address;
+------------+-----------------+-------------+-------+--------+---------+
| contact_id | StreetAddress | City | State | Zip | Country |
+------------+-----------------+-------------+-------+--------+---------+
| 1 | 300 Yamila Ave. | Los Angeles | CA | 300012 | USA |
| 2 | 4000 Nikki St. | Boca Raton | FL | 500034 | USA |
| 3 | 404 Tia Blvd. | New York | NY | 10011 | USA |
+------------+-----------------+-------------+-------+--------+---------+
3 rows in set (0.00 sec)
mysql> INSERT INTO company_details (contact_id, CompanyName, Designation) VALUES ('1','Xerox','New
Business Manager'), ('2','Cabletron','Customer Support Eng'),('3','Apple','Sales Manager');
Query OK, 3 rows affected (0.05 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM company_details;
+------------+-------------+----------------------+
| contact_id | CompanyName | Designation |
+------------+-------------+----------------------+
| 1 | Xerox | New Business Manager |
| 2 | Cabletron | Customer Support Eng |
| 3 | Apple | Sales Manager |
+------------+-------------+----------------------+
3 rows in set (0.06 sec)
mysql> INSERT INTO email (contact_id, Email) VALUES ('1', '[email protected]'),( '2',
'[email protected]'),('3','[email protected]');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM email;
+------------+-------------------+
| contact_id | Email |
+------------+-------------------+
| 1 | [email protected] |
| 2 | [email protected] |
| 3 | [email protected] |
+------------+-------------------+
3 rows in set (0.06 sec)
mysql> INSERT INTO telephones (contact_id, TelephoneHome, TelephoneWork) VALUES
('1','333-50000','333-60000'),('2','444-70000','444-80000'),('3','555-30000','555-40000');
Query OK, 3 rows affected (0.00 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SELECT * FROM telephones;
+------------+---------------+---------------+
| contact_id | TelephoneHome | TelephoneWork |
+------------+---------------+---------------+
| 1 | 333-50000 | 333-60000 |
| 2 | 444-70000 | 444-80000 |
| 3 | 555-30000 | 555-40000 |
+------------+---------------+---------------+
3 rows in set (0.00 sec)

Okay, so we now have all our data ready for experimentation.

Before we start experimenting with manipulating the data let's look at how MySQL stores the Data.
To do this execute the following command from the shell prompt.
mysqldump contacts > contacts.sql
Note: The reverse operation for this command is:
mysql contacts < contacts.sql
The file generated is a text file that contains all the data and SQL instructions needed to recreate the same
database. As you can see, the SQL here is slightly different from what was typed in. Don't worry about this.
It's all good! It should also be obvious that this is a good way to back up your stuff.
# MySQL dump 8.2
#
# Host: localhost Database: contacts
#--------------------------------------------------------
# Server version 3.22.34-shareware-debug
#
# Table structure for table 'address'
#
CREATE TABLE address (
contact_id smallint(6) DEFAULT '0' NOT NULL,
StreetAddress char(50),
City char(20),
State char(20),
Zip char(10),
Country char(20),
PRIMARY KEY (contact_id)
);
#
# Dumping data for table 'address'
#
INSERT INTO address VALUES (1,'300 Yamila Ave.','Los Angeles','CA','300012','USA');
INSERT INTO address VALUES (2,'4000 Nikki St.','Boca Raton','FL','500034','USA');
INSERT INTO address VALUES (3,'404 Tia Blvd.','New York','NY','10011','USA');
#
# Table structure for table 'company_details'
#
CREATE TABLE company_details (
contact_id smallint(6) DEFAULT '0' NOT NULL,
CompanyName char(25),
Designation char(20),
PRIMARY KEY (contact_id)
);
#
# Dumping data for table 'company_details'
#
INSERT INTO company_details VALUES (1,'Xerox','New Business Manager');
INSERT INTO company_details VALUES (2,'Cabletron','Customer Support Eng');
INSERT INTO company_details VALUES (3,'Apple','Sales Manager');
#
# Table structure for table 'email'
#
CREATE TABLE email (
contact_id smallint(6) DEFAULT '0' NOT NULL,
Email char(20),
PRIMARY KEY (contact_id)
);
#
# Dumping data for table 'email'
#
INSERT INTO email VALUES (1,'[email protected]');
INSERT INTO email VALUES (2,'[email protected]');
INSERT INTO email VALUES (3,'[email protected]');
#
# Table structure for table 'names'
#
CREATE TABLE names (
contact_id smallint(6) DEFAULT '0' NOT NULL auto_increment,
FirstName char(20),
LastName char(20),
BirthDate date,
PRIMARY KEY (contact_id)
);
#
# Dumping data for table 'names'
#
INSERT INTO names VALUES (3,'Tia','Carrera','1975-09-18');
INSERT INTO names VALUES (2,'Nikki','Taylor','1972-03-04');
INSERT INTO names VALUES (1,'Yamila','Diaz','1974-10-13');
#
# Table structure for table 'telephones'
#
CREATE TABLE telephones (
contact_id smallint(6) DEFAULT '0' NOT NULL,
TelephoneHome char(20),
TelephoneWork char(20),
PRIMARY KEY (contact_id)
);
#
# Dumping data for table 'telephones'
#
INSERT INTO telephones VALUES (1,'333-50000','333-60000');
INSERT INTO telephones VALUES (2,'444-70000','444-80000');
INSERT INTO telephones VALUES (3,'555-30000','555-40000');
Let's try some SELECT statement variations:
To select all names whose corresponding contact_id is greater than 1.
mysql> SELECT * FROM names WHERE contact_id > 1;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 2 | Nikki | Taylor | 1972-03-04 |
+------------+-----------+----------+------------+
2 rows in set (0.00 sec)
As a condition we can also use NOT NULL. This statement will return all names where there exists a
contact_id.
mysql> SELECT * FROM names WHERE contact_id IS NOT NULL;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 2 | Nikki | Taylor | 1972-03-04 |
| 1 | Yamila | Diaz | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)
Results can be arranged in a particular way using the statement ORDER BY.
mysql> SELECT * FROM names WHERE contact_id IS NOT NULL ORDER BY LastName;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 1 | Yamila | Diaz | 1974-10-13 |
| 2 | Nikki | Taylor | 1972-03-04 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)
'asc' and 'desc' stand for ascending and descending respectively and can be used to arrange the results.
mysql> SELECT * FROM names WHERE contact_id IS NOT NULL ORDER BY LastName desc;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 2 | Nikki | Taylor | 1972-03-04 |
| 1 | Yamila | Diaz | 1974-10-13 |
| 3 | Tia | Carrera | 1975-09-18 |
+------------+-----------+----------+------------+
3 rows in set (0.04 sec)
You can also place date types into conditional statements.
mysql> SELECT * FROM names WHERE BirthDate > '1973-03-06';
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 1 | Yamila | Diaz | 1974-10-13 |
+------------+-----------+----------+------------+
2 rows in set (0.00 sec)
LIKE is a statement to match field values using wildcards. The % sign is used for denoting wildcards and can
represent multiple characters.
mysql> SELECT FirstName, LastName FROM names WHERE LastName LIKE 'C%';
+-----------+----------+
| FirstName | LastName |
+-----------+----------+
| Tia | Carrera |
+-----------+----------+
1 row in set (0.06 sec)
'_' is used to represent a single wildcard.
mysql> SELECT FirstName, LastName FROM names WHERE LastName LIKE '_iaz';
+-----------+----------+
| FirstName | LastName |
+-----------+----------+
| Yamila | Diaz |
+-----------+----------+
1 row in set (0.00 sec)

SQL Logical and Comparison Operators (evaluated from left to right)


1. NOT or !
2. AND or &&
3. OR or ||
4. = : Equal
5. <> or != : Not Equal
6. <=
7. >=
8. <,>
Here are some more variations with Logical Operators and using the 'IN' statement.
mysql> SELECT FirstName FROM names WHERE contact_id < 3 AND LastName LIKE 'D%';
+-----------+
| FirstName |
+-----------+
| Yamila |
+-----------+
1 row in set (0.00 sec)
mysql> SELECT contact_id FROM names WHERE LastName IN ('Diaz','Carrera');
+------------+
| contact_id |
+------------+
|          3 |
|          1 |
+------------+
2 rows in set (0.02 sec)
To return the number of rows in a table
mysql> SELECT count(*) FROM names;
+----------+
| count(*) |
+----------+
|        3 |
+----------+
1 row in set (0.02 sec)
mysql> SELECT count(FirstName) FROM names;
+------------------+
| count(FirstName) |
+------------------+
|                3 |
+------------------+
1 row in set (0.00 sec)
To do some basic arithmetic aggregate functions.
mysql> SELECT SUM(contact_id) FROM names;
+-----------------+
| SUM(contact_id) |
+-----------------+
|               6 |
+-----------------+
1 row in set (0.00 sec)
To select the largest value in a column, use MAX. Substitute 'MIN' and see what happens.
mysql> SELECT MAX(contact_id) FROM names;
+-----------------+
| MAX(contact_id) |
+-----------------+
|               3 |
+-----------------+
1 row in set (0.00 sec)

HAVING
HAVING is normally used together with GROUP BY to filter groups after aggregation, whereas WHERE filters individual
rows before it. MySQL also accepts HAVING without GROUP BY, in which case it behaves much like WHERE. Take a look at
the first query using WHERE and the second using HAVING.
mysql> SELECT * FROM names WHERE contact_id >=1;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 1 | Yamila | Diaz | 1974-10-13 |
| 2 | Nikki | Taylor | 1972-03-04 |
| 3 | Tia | Carrera | 1975-09-18 |
+------------+-----------+----------+------------+
3 rows in set (0.03 sec)
mysql> SELECT * FROM names HAVING contact_id >=1;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 2 | Nikki | Taylor | 1972-03-04 |
| 1 | Yamila | Diaz | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.00 sec)
Now let's work with multiple tables and see how information can be pulled out of the data.
mysql> SELECT names.contact_id, FirstName, LastName, Email FROM names, email WHERE
names.contact_id = email.contact_id;
+------------+-----------+----------+-------------------+
| contact_id | FirstName | LastName | Email |
+------------+-----------+----------+-------------------+
| 1 | Yamila | Diaz | [email protected] |
| 2 | Nikki | Taylor | [email protected] |
| 3 | Tia | Carrera | [email protected] |
+------------+-----------+----------+-------------------+
3 rows in set (0.11 sec)
mysql> SELECT DISTINCT names.contact_id, FirstName, Email, TelephoneWork FROM names, email,
telephones WHERE names.contact_id = email.contact_id AND email.contact_id = telephones.contact_id;
+------------+-----------+-------------------+---------------+
| contact_id | FirstName | Email             | TelephoneWork |
+------------+-----------+-------------------+---------------+
|          1 | Yamila    | [email protected]  | 333-60000     |
|          2 | Nikki     | [email protected]  | 444-80000     |
|          3 | Tia       | [email protected]    | 555-40000     |
+------------+-----------+-------------------+---------------+
3 rows in set (0.05 sec)

So what's a JOIN?
JOIN is the action performed on multiple tables that returns a result as a table. It's what makes a database
'relational'.
There are several types of joins. Let's look at LEFT JOIN (OUTER JOIN) and RIGHT JOIN
Let's first check out the contents of the tables we're going to use
mysql> SELECT * FROM names;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 2 | Nikki | Taylor | 1972-03-04 |
| 1 | Yamila | Diaz | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.00 sec)
mysql> SELECT * FROM email;
+------------+-------------------+
| contact_id | Email |
+------------+-------------------+
| 1 | [email protected] |
| 2 | [email protected] |
| 3 | [email protected] |
+------------+-------------------+
3 rows in set (0.00 sec)

A LEFT JOIN First:


mysql> SELECT * FROM names LEFT JOIN email USING (contact_id);
+------------+-----------+----------+------------+------------+-------------------+
| contact_id | FirstName | LastName | BirthDate  | contact_id | Email             |
+------------+-----------+----------+------------+------------+-------------------+
|          3 | Tia       | Carrera  | 1975-09-18 |          3 | [email protected]    |
|          2 | Nikki     | Taylor   | 1972-03-04 |          2 | [email protected]  |
|          1 | Yamila    | Diaz     | 1974-10-13 |          1 | [email protected]  |
+------------+-----------+----------+------------+------------+-------------------+
3 rows in set (0.16 sec)
To find the people who have a home phone number.
mysql> SELECT names.FirstName FROM names LEFT JOIN telephones ON names.contact_id =
telephones.contact_id WHERE TelephoneHome IS NOT NULL;
+-----------+
| FirstName |
+-----------+
| Tia |
| Nikki |
| Yamila |
+-----------+
3 rows in set (0.02 sec)
The same query leaving out the 'names.' qualifier (writing FirstName instead of names.FirstName) will generate the
same result.
mysql> SELECT FirstName FROM names LEFT JOIN telephones ON names.contact_id =
telephones.contact_id WHERE TelephoneHome IS NOT NULL;
+-----------+
| FirstName |
+-----------+
| Tia |
| Nikki |
| Yamila |
+-----------+
3 rows in set (0.00 sec)

And now a RIGHT JOIN:


mysql> SELECT * FROM names RIGHT JOIN email USING(contact_id);
+------------+-----------+----------+------------+------------+-------------------+
| contact_id | FirstName | LastName | BirthDate  | contact_id | Email             |
+------------+-----------+----------+------------+------------+-------------------+
|          1 | Yamila    | Diaz     | 1974-10-13 |          1 | [email protected]  |
|          2 | Nikki     | Taylor   | 1972-03-04 |          2 | [email protected]  |
|          3 | Tia       | Carrera  | 1975-09-18 |          3 | [email protected]    |
+------------+-----------+----------+------------+------------+-------------------+
3 rows in set (0.03 sec)

BETWEEN
This conditional statement is used to select data where a certain related constraint falls within a certain range
of values. The following example illustrates its use.
mysql> SELECT * FROM names;
+------------+-----------+----------+------------+
| contact_id | FirstName | LastName | BirthDate |
+------------+-----------+----------+------------+
| 3 | Tia | Carrera | 1975-09-18 |
| 2 | Nikki | Taylor | 1972-03-04 |
| 1 | Yamila | Diaz | 1974-10-13 |
+------------+-----------+----------+------------+
3 rows in set (0.06 sec)
mysql> SELECT FirstName, LastName FROM names WHERE contact_id BETWEEN 2 AND 3;
+-----------+----------+
| FirstName | LastName |
+-----------+----------+
| Tia | Carrera |
| Nikki | Taylor |
+-----------+----------+
2 rows in set (0.00 sec)

ALTER
The ALTER statement is used to add a new column to an existing table or to make changes to it.
mysql> ALTER TABLE names ADD Age SMALLINT;
Query OK, 3 rows affected (0.11 sec)
Records: 3 Duplicates: 0 Warnings: 0
Now let's take a look at the 'ALTER'ed Table.
mysql> SHOW COLUMNS FROM names;
+------------+-------------+------+-----+---------+----------------+
| Field | Type | Null | Key | Default | Extra |
+------------+-------------+------+-----+---------+----------------+
| contact_id | smallint(6) | | PRI | 0 | auto_increment |
| FirstName | char(20) | YES | | NULL | |
| LastName | char(20) | YES | | NULL | |
| BirthDate | date | YES | | NULL | |
| Age | smallint(6) | YES | | NULL | |
+------------+-------------+------+-----+---------+----------------+
5 rows in set (0.06 sec)
But we don't require Age to be a SMALLINT type when a TINYINT would suffice. So we use another ALTER
statement.
mysql> ALTER TABLE names CHANGE COLUMN Age Age TINYINT;
Query OK, 3 rows affected (0.02 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SHOW COLUMNS FROM names;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| contact_id | smallint(6) |      | PRI | NULL    | auto_increment |
| FirstName  | char(20)    | YES  |     | NULL    |                |
| LastName   | char(20)    | YES  |     | NULL    |                |
| BirthDate  | date        | YES  |     | NULL    |                |
| Age        | tinyint(4)  | YES  |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)

MODIFY
You can also use the MODIFY statement to change column data types.
mysql> ALTER TABLE names MODIFY COLUMN Age SMALLINT;
Query OK, 3 rows affected (0.03 sec)
Records: 3 Duplicates: 0 Warnings: 0
mysql> SHOW COLUMNS FROM names;
+------------+-------------+------+-----+---------+----------------+
| Field      | Type        | Null | Key | Default | Extra          |
+------------+-------------+------+-----+---------+----------------+
| contact_id | smallint(6) |      | PRI | NULL    | auto_increment |
| FirstName  | char(20)    | YES  |     | NULL    |                |
| LastName   | char(20)    | YES  |     | NULL    |                |
| BirthDate  | date        | YES  |     | NULL    |                |
| Age        | smallint(6) | YES  |     | NULL    |                |
+------------+-------------+------+-----+---------+----------------+
5 rows in set (0.00 sec)
To Rename a Table:
mysql> ALTER TABLE names RENAME AS mynames;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW TABLES;
+--------------------+
| Tables_in_contacts |
+--------------------+
| address |
| company_details |
| email |
| mynames |
| telephones |
+--------------------+
5 rows in set (0.00 sec)
We rename it back to the original name.
mysql> ALTER TABLE mynames RENAME AS names;
Query OK, 0 rows affected (0.01 sec)

UPDATE
The UPDATE command is used to change the value of a field in a table.
mysql> UPDATE names SET Age ='23' WHERE FirstName='Tia';
Query OK, 1 row affected (0.06 sec)
Rows matched: 1 Changed: 1 Warnings: 0
The table now looks like:
mysql> SELECT * FROM names;
+------------+-----------+----------+------------+------+
| contact_id | FirstName | LastName | BirthDate  | Age  |
+------------+-----------+----------+------------+------+
|          3 | Tia       | Carrera  | 1975-09-18 |   23 |
|          2 | Nikki     | Taylor   | 1972-03-04 | NULL |
|          1 | Yamila    | Diaz     | 1974-10-13 | NULL |
+------------+-----------+----------+------------+------+
3 rows in set (0.05 sec)

DELETE
mysql> DELETE FROM names WHERE Age=23;
Query OK, 1 row affected (0.00 sec)
mysql> SELECT * FROM names;
+------------+-----------+----------+------------+------+
| contact_id | FirstName | LastName | BirthDate | Age |
+------------+-----------+----------+------------+------+
| 2 | Nikki | Taylor | 1972-03-04 | NULL |
| 1 | Yamila | Diaz | 1974-10-13 | NULL |
+------------+-----------+----------+------------+------+
2 rows in set (0.00 sec)

A DEADLY MISTAKE
mysql> DELETE FROM names;
Query OK, 0 rows affected (0.00 sec)
mysql> SELECT * FROM names;
Empty set (0.00 sec)
One more destructive tool...

DROP TABLE
mysql> DROP TABLE names;
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW TABLES;
+--------------------+
| Tables in contacts |
+--------------------+
| address |
| company_details |
| email |
| telephones |
+--------------------+
4 rows in set (0.05 sec)
mysql> DROP TABLE address ,company_details, telephones;
Query OK, 0 rows affected (0.06 sec)
mysql> SHOW TABLES;
Empty set (0.00 sec)
As you can see, the table 'names' no longer exists. MySQL does not give a warning so be careful.

Full Text Indexing and Searching


Since version 3.23.23, Full Text Indexing and Searching has been available in MySQL. FULLTEXT indexes can be
created on VARCHAR and TEXT columns, and FULLTEXT searches are performed with the MATCH function, which matches a
natural-language query against a text collection and returns a relevance value for each row in the table. The
resultant rows are returned in order of relevance.
Full Text searches are a very powerful way to search through text. They are not ideal for small tables of text,
however, where they may produce inconsistent results; they work best with large collections of textual data.
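A hedged sketch of FULLTEXT in practice (the articles table and its columns are illustrative assumptions):

CREATE TABLE articles (
  article_id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title      VARCHAR(200),
  body       TEXT,
  FULLTEXT (title, body)                 -- full-text index across both columns
);

-- natural-language search; matching rows come back ordered by relevance
SELECT article_id, title
FROM articles
WHERE MATCH (title, body) AGAINST ('database normalization');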

Optimizing your Database: Well, databases do tend to get large at some point or another, and here arises the issue of
database optimization. Queries take longer and longer as the database grows, and certain things can be done to
speed them up.

Clustering: The easiest method is that of 'clustering'. If you run a certain kind of query often, it is faster if the
database contents are arranged in the same way the data is requested. To keep the tables in a sorted order you need a
clustering index; some databases keep their contents sorted automatically.

Ordered Indices: These are a kind of 'lookup' table. For each column that may be of interest to you, you can
create an ordered index. It needs to be noted that these optimization techniques impose a system load of their own:
the index has to be maintained each time the data is rearranged.
There are additional methods, such as B-Trees and hashing, which you may like to read up about but which will not be
discussed here.
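As a minimal sketch against the contacts database built earlier (the index name is an illustrative assumption):

-- an ordered index on LastName speeds up lookups and sorts on that column
CREATE INDEX idx_lastname ON names (LastName);

-- queries like this can now use the index instead of scanning the whole table
SELECT FirstName, LastName FROM names WHERE LastName LIKE 'C%';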

Replication: Replication is the term given to the process by which databases synchronize with each other. In this process
one database updates its own data with respect to another, or with reference to certain criteria for updates specified by
the programmer. Replication can be used in various circumstances, for example for safety and backup, or to
provide a location closer to the database for certain users.

What are Transactions?


In an RDBMS, when several people access the same data, or if a server dies in the middle of an update, there must be a
mechanism to protect the integrity of the data. Such a mechanism is called a Transaction. A transaction groups a set of
database actions into a single instantaneous event that either succeeds as a whole or fails as a whole.
The definition of a transaction can be summed up by the acronym 'ACID':

(A) Atomicity: If an action consists of multiple steps - it's still considered as one operation.

(C) Consistency: The database exists in a valid and accurate operating state before and after a transaction.

(I) Isolation: Processes within one transaction are independent and cannot interfere with those in other transactions.

(D) Durability: Changes affected by a transaction are permanent.
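A hedged sketch of what a transaction looks like in SQL (the accounts table and the amounts are illustrative assumptions; with the MySQL of this era a transaction-capable table handler such as BDB is needed for ROLLBACK to work):

BEGIN;                                                        -- start the transaction

UPDATE accounts SET balance = balance - 100 WHERE name = 'Amit';
UPDATE accounts SET balance = balance + 100 WHERE name = 'Ram';

COMMIT;                                                       -- make both changes permanent,
                                                              -- or ROLLBACK; to undo both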


To enable transactions a mechanism called 'Logging' needs to be introduced. Logging involves a DBMS writing details
on the tables, columns and results of a particular transaction, both before and after, onto a log file. This log file is used
in the process of recovery. Now to protect a certain database resource (ex. a table) from being used and written onto
simultaneously several techniques are used. One of them is 'Locking' another is to put a 'time stamp' onto an action. In
the case of Locking, to complete an action, the DBMS would need to acquire locks on all resources needed to complete
the action. The locks are released only when the transaction is completed.

Now if there were, say, a large number of tables involved in a particular action, say 50, all 50 tables would be locked
until the transaction completed. To improve things a bit, there is another technique called 2-Phase Locking or 2PL. In
this method of locking, locks are acquired only when needed but are released only when the transaction is completed.
This is done to make sure that altered data can be safely restored if the transaction fails for any reason. This
technique can still result in problems such as 'deadlocks', where processes requiring the same resources lock each
other up by preventing each other from completing an action. The options here are to abort one transaction, or to let
the programmer handle it. MySQL implements transactions by incorporating the Berkeley DB libraries into its own code,
so it's the source version you'd want here for MySQL installation. Read the MySQL manual on implementing this.

7. Beyond MySQL

What are Views?


A view allows you to assign the result of a query to a new virtual table. This table is given the name used in your
VIEW query. Although MySQL does not support views yet, a sample SQL CREATE VIEW statement would look like:

CREATE VIEW TESTVIEW AS SELECT * FROM names;
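In a database that does support views, TESTVIEW can then be queried like any other table; a sketch:

SELECT FirstName, LastName FROM TESTVIEW WHERE contact_id > 1;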


What are Triggers?
A trigger is a pre-programmed action that the database performs automatically when a specified event occurs. Triggers
can be programmed to execute either before or after an event such as an INSERT, UPDATE or DELETE. They are very useful
as they increase efficiency and accuracy in performing operations on databases, and they also increase productivity by
reducing the time needed for application development. Triggers do, however, carry a price in terms of processing overhead.
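MySQL did not support triggers at the time of writing; in systems that do (including later MySQL versions), a hedged sketch looks like this, where names_log is an assumed audit table, not part of the design above:

CREATE TRIGGER names_audit
AFTER INSERT ON names
FOR EACH ROW
  INSERT INTO names_log (contact_id, changed_on)   -- names_log is assumed to exist
  VALUES (NEW.contact_id, NOW());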

What are Procedures?


Like triggers, Procedures or 'Stored' Procedures are productivity enhancers. Suppose you needed to perform an action
using a programming interface to the database in, say, PERL and ASP. If a programmed action can be stored at the
database level, it obviously has to be written only once and can then be called by any programming language
interacting with the database. Procedures are invoked explicitly (typically with a CALL statement), and they can also
be fired from triggers.
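Stored procedures are likewise absent from the MySQL of this era; in systems that have them (including later MySQL versions), a minimal sketch might be (the procedure name and parameter are illustrative assumptions):

DELIMITER //
CREATE PROCEDURE contact_email (IN p_contact_id SMALLINT)
BEGIN
  -- look up the email address for one contact;
  -- callable from PERL, ASP or any other client language
  SELECT Email FROM email WHERE contact_id = p_contact_id;
END //
DELIMITER ;

CALL contact_email(1);   -- usage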

Beyond RDBMS
Distributed Databases (DDB)
A distributed database is a collection of several logically interrelated databases located at multiple sites of a
computer network. A distributed database management system permits the management of such a database and
makes the operation transparent to the user. Good examples of distributed databases would be those utilized by
banks and multinational firms with several office locations, where each distributed data system works only with the data
that is relevant to its operations. DDBs have the full functionality of any DBMS. It is also important to know that a
distributed database is considered to be actually one database rather than discrete files, and the data within a
distributed database is logically interrelated.

Object Database Management Systems or ODBMS


When the capabilities of a database are integrated with object programming language capabilities, the resulting
product is an ODBMS. Database objects appear as programming objects in an ODBMS. Using an ODBMS offers
several advantages. The ones that can be most readily appreciated are:
1. Efficiency: When you use an ODBMS, you're using data the way you store it. You will use less code as you're not
dependent on an intermediary like SQL or ODBC. When this happens you can create highly complex data structures
through your programming language.
2. Speed: When data is stored the way you'd like it to be stored (i.e. natively) there is a massive performance increase
as no to-and-fro translation is required.

57 | P a g e CAREER POWER | PREMIER INSTITUTE FOR ALL GOVT. COMPETITIVE EXAMS


Email : [email protected] | website: www.careerpower.in
A Quick Tutorial on Database Normalization
Let's start off by taking some data represented in a Table.
Table Name: College Table
(The original row layout is too wide to print, so the table is shown here transposed: each line below is one column of the College Table, and each student is one row of the original table.)

Column             Row 1                          Row 2               Row 3
StudentName        Tia Carrera                    John Wayne          Lara Croft
CourseID1          CS123                          CS456               CS789
CourseTitle1       Perl Regular Expressions       Socket Programming  OpenGL
CourseProfessor1   Don Corleone                   DJ Tiesto           Bill Clinton
CourseID2          CS003                          CS004               CS001
CourseTitle2       Object Oriented Programming 1  Algorithms          Data Structures
CourseProfessor2   Daffy Duck                     Barney Rubble       Papa Smurf
StudentAdvisor     Fred Flintstone                Homer Simpson       Seven of Nine
StudentID          400                            401                 402
The First Normal Form: (each column type is unique and there are no repeating groups [types] of data)
This essentially means that you identify data that can exist as a separate table, thereby reducing repetition and
the width of the original table.

We can see that for every student, Course Information is repeated for each course. So if a student had a third course,
you'd need to add yet another set of columns for Course Title, Course Professor and CourseID. Student information
and Course Information can therefore be considered two broad groups.
Table Name: Student Information
StudentID (Primary Key)
StudentName
AdvisorName

Table Name: Course Information


CourseID (Primary Key)
CourseTitle
CourseDescription
CourseProfessor
It's obvious that we have here a Many to Many relationship between Students and Courses.
Note: In a Many to Many relationship we need something called a relating table, which contains information
exclusively on which relationships exist between the rows of two tables. In a One to Many relationship we use a foreign key.
So in this case we need another little table called: Students and Courses
Table Name: Students and Courses
SnCStudentID
SnCCourseID
The Second Normal Form: (All attributes within the entity should depend solely on the entity's unique identifier)
The AdvisorName under Student Information is not an attribute of the student itself (it does not depend solely on the
StudentID), so it can be moved to its own table.
Table Name: Student Information
StudentID (Primary Key)
StudentName

Table Name: Advisor Information


AdvisorID
AdvisorName

Table Name: Course Information


CourseID (Primary Key)
CourseTitle
CourseDescription
CourseProfessor

Table Name: Students and Courses
SnCStudentID
SnCCourseID
Note: Relating Tables can be created as required.
The Third Normal Form: (no column entry should be dependent on any other entry (value) other than the key for the
table)
In simple terms - a table should contain information about only one thing.
In Course Information, we can pull CourseProfessor information out and store it in another table.
Table Name: Student Information
StudentID (Primary Key)
StudentName

Table Name: Advisor Information


AdvisorID
AdvisorName

Table Name: Course Information


CourseID (Primary Key)
CourseTitle
CourseDescription

Table Name: Professor Information


ProfessorID
CourseProfessor

Table Name: Students and Courses


SnCStudentID
SnCCourseID

Note: Relating Tables can be created as required.
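Pulling the final design together as a hedged SQL sketch (column types and the ID columns are illustrative assumptions; the relating table carries the many-to-many relationship):

CREATE TABLE student_information (
  StudentID   SMALLINT NOT NULL PRIMARY KEY,
  StudentName CHAR(30),
  AdvisorID   SMALLINT                      -- foreign key into advisor_information
);

CREATE TABLE advisor_information (
  AdvisorID   SMALLINT NOT NULL PRIMARY KEY,
  AdvisorName CHAR(30)
);

CREATE TABLE course_information (
  CourseID          CHAR(10) NOT NULL PRIMARY KEY,
  CourseTitle       CHAR(40),
  CourseDescription CHAR(100),
  ProfessorID       SMALLINT                -- foreign key into professor_information
);

CREATE TABLE professor_information (
  ProfessorID     SMALLINT NOT NULL PRIMARY KEY,
  CourseProfessor CHAR(30)
);

CREATE TABLE students_and_courses (
  SnCStudentID SMALLINT NOT NULL,
  SnCCourseID  CHAR(10) NOT NULL,
  PRIMARY KEY (SnCStudentID, SnCCourseID)   -- the relating table: one row per enrolment
);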


Well, that's it. Once you are done with 3NF, the database is considered normalized.
Now let's consider some cases where normalization may have to be avoided for practical purposes. Suppose we
needed to store a student's home address along with State and Zip Code information. Would you create a separate
table for every zip code in your country, along with one for cities and one for states? It actually depends on you. I
would prefer just using a non-normalized address table and sticking everything in there. Exceptions crop up often, and
it's up to your better judgement.

8. Transaction Control

Transaction: A collection of actions that transforms the DB from one consistent state into another consistent state;
during the execution, the DB might be inconsistent. A transaction can be defined as a logical unit of work on the
database. This may be an entire program, a piece of a program or a single command (like the SQL commands such as
INSERT or UPDATE) and it may engage in any number of operations on the database.

Consider an example of the steps involved in a simple transaction of deducting 100 rupees from Amit's account:
Open_Acc (Amit)
OldBal = Amit.bal
NewBal = OldBal - 100
Amit.bal = NewBal
Close_Acc (Amit)
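The same logical unit of work expressed in SQL (the account table and its columns are illustrative assumptions); either every step becomes permanent at COMMIT or, on failure, ROLLBACK restores the state before the transaction started:

START TRANSACTION;
UPDATE account SET bal = bal - 100 WHERE holder = 'Amit';   -- the debit step
-- ...any remaining steps of the unit of work...
COMMIT;                                                      -- or ROLLBACK; on failure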

8.1 States of Transaction

• Active: Initial state and during the execution


• Partially committed: After the final statement has been executed
• Committed: After successful completion
• Failed: After the discovery that normal execution can no longer proceed
• Aborted: After the transaction has been rolled back and the DB restored to its state prior to the start of the
transaction; the system may then restart the transaction or kill it.
A transaction enters the failed state after the system determines that the transaction can no longer proceed with its
normal execution (for example, because of hardware or logical errors). Such a transaction must be rolled back. Then, it
enters the aborted state. At this point, the system has two options:
• It can restart the transaction, but only if the transaction was aborted as a result of some hardware or software
error that was not created through the internal logic of the transaction. A restarted transaction is considered to
be a new transaction.
• It can kill the transaction. It usually does so because of some internal logical error that can be corrected only
by rewriting the application program, or because the input was bad, or because the desired data were not
found in the database.

8.2 Concurrency Control

Problems in Concurrency Control


Lost update
• Occurs in two concurrent transactions when:
• Same data element is updated
• One of the updates is lost

Uncommitted data
• Occurs when:
• Two transactions are executed concurrently
• First transaction is rolled back after the second transaction has already accessed uncommitted data

Inconsistent retrievals
• Occurs when a transaction accesses data before and after one or more other transactions finish working with such
data

8.3 Schedules
Schedule – a sequence of instructions that specifies the chronological order in which the instructions of concurrent
transactions are executed. A schedule for a set of transactions must consist of all instructions of those transactions
and must preserve the order in which the instructions appear within each individual transaction.

A transaction that successfully completes its execution will have a commit instruction as the last statement (will be
omitted if it is obvious). A transaction that fails to successfully complete its
execution will have an abort instruction as the last statement (will be omitted if it is obvious).

These execution sequences are called schedules; they represent the chronological order in which instructions are
executed in the system. Schedule S2 in the next subsection is an example of a serial schedule.
8.4 Serializability
If a schedule S can be transformed into a schedule S′ by a series of swaps of non-conflicting instructions, we say that
S and S′ are conflict equivalent. A schedule S is serializable (conflict serializable) if it is conflict equivalent to
a serial schedule.

Schedule S1 below can be transformed into S2, a serial schedule where T2 follows T1, by a series of swaps of
non-conflicting instructions. Therefore, schedule S1 is serializable.

Schedule S1:
T1              T2
read (A)
write (A)
                read (A)
                write (A)
read (B)
write (B)
                read (B)
                write (B)

Schedule S2:
T1              T2
read (A)
write (A)
read (B)
write (B)
                read (A)
                write (A)
                read (B)
                write (B)

Schedule S3 below is not serializable: we are unable to swap instructions in the schedule to obtain either the serial
schedule <T3, T4> or the serial schedule <T4, T3>.

Schedule S3:
T3              T4
read (P)
                write (P)
write (P)
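
A standard way to test conflict serializability is to build a precedence graph: add an edge Ti → Tj whenever an
operation of Ti conflicts with a later operation of Tj; the schedule is conflict serializable if and only if the graph
is acyclic. The sketch below (illustrative, not from the original text) applies this test to schedules S1 and S3 above:

# Each schedule is a list of (transaction, action, data_item) in execution order.
s1 = [("T1", "read", "A"), ("T1", "write", "A"),
      ("T2", "read", "A"), ("T2", "write", "A"),
      ("T1", "read", "B"), ("T1", "write", "B"),
      ("T2", "read", "B"), ("T2", "write", "B")]
s3 = [("T3", "read", "P"), ("T4", "write", "P"), ("T3", "write", "P")]

def conflicts(op1, op2):
    # Two operations conflict if they touch the same item and at least one writes.
    return op1[2] == op2[2] and ("write" in (op1[1], op2[1]))

def is_conflict_serializable(schedule):
    edges = set()
    for i, op1 in enumerate(schedule):
        for op2 in schedule[i + 1:]:
            if op1[0] != op2[0] and conflicts(op1, op2):
                edges.add((op1[0], op2[0]))   # precedence edge Ti -> Tj
    # Detect a cycle with a depth-first search over the precedence graph.
    nodes = {t for t, _, _ in schedule}
    def has_cycle(node, visiting, done):
        visiting.add(node)
        for a, b in edges:
            if a == node:
                if b in visiting or (b not in done and has_cycle(b, visiting, done)):
                    return True
        visiting.discard(node)
        done.add(node)
        return False
    return not any(has_cycle(n, set(), set()) for n in nodes)

print(is_conflict_serializable(s1))   # True  -- equivalent to serial <T1, T2>
print(is_conflict_serializable(s3))   # False -- cycle T3 -> T4 -> T3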

8.5 Recoverability
Recoverable schedule: for each pair of transactions Ti and Tj where Tj reads data items previously written by Ti,
Ti must commit before Tj commits.
The following schedule is not recoverable if T6 commits immediately after its read:

T5              T6
read (A)
write (A)
                read (A)
read (B)

If T5 should abort, T6 would have read (and possibly shown to the user) an inconsistent database state. Hence, the
database must ensure that schedules are recoverable.

8.6 Cascadeless schedules

T7              T8              T9
read (A)
read (B)
write (A)
                read (A)
                write (A)
                                read (A)

Transaction T7 writes a value of A that is read by transaction T8. Transaction T8 writes a value of A that is read by
transaction T9. Suppose at this point T7 fails. T7 must be rolled back; since T8 is dependent on T7, T8 must be rolled
back; and since T9 is dependent on T8, T9 must be rolled back as well.

This phenomenon, in which a single transaction failure leads to a series of transaction rollbacks, is called
cascading rollback.
• Cascading rollback is undesirable, since it leads to the undoing of a significant amount of work.
• It is desirable to restrict the schedules to those where cascading rollbacks cannot occur. Such schedules are
called cascadeless schedules.
• Formally, a cascadeless schedule is one where, for each pair of transactions Ti and Tj such that Tj reads a data
item previously written by Ti, the commit operation of Ti appears before the read operation of Tj.
• Every cascadeless schedule is also a recoverable schedule.

Cascadeless Schedule
T10             T11
read (A)
write (A)
read (B)
commit
                read (A)
8.7 Implementation of Isolation Levels

The goal of concurrency-control policies is to provide a high degree of concurrency, while ensuring that all schedules
that can be generated are conflict or view serializable, recoverable, and cascadeless.

Locking
Rather than locking the entire database, a transaction could lock only those data items that it accesses. Under such
a policy, the transaction must hold locks long enough to ensure serializability, but for a period short enough not
to harm performance excessively.

Further improvements to locking result if we have two kinds of locks: shared and exclusive. Shared locks are used for
data that the transaction reads and exclusive locks are used for those it writes. Many transactions can hold shared
locks on the same data item at the same time, but a transaction is allowed an exclusive lock on a data item only if no
other transaction holds any lock (regardless
of whether shared or exclusive) on the data item. This use of two modes of locks along with two-phase locking allows
concurrent reading of data while still ensuring serializability.

Timestamps
Another category of techniques for implementing isolation assigns each transaction a timestamp, typically when it
begins. For each data item, the system keeps two timestamps. The read timestamp of a data item holds the largest
(that is, the most recent) timestamp of the transactions that read the data item. The write timestamp of a data item
holds the timestamp of the transaction that wrote the item's current value.
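
As an illustrative sketch (not from the original text, which only describes the bookkeeping), the following keeps
these two timestamps for a single data item and applies the classic timestamp-ordering accept/reject rules built
on them; the rules themselves are an assumption beyond what the text states:

class DataItem:
    def __init__(self):
        self.read_ts = 0    # largest timestamp of any transaction that read it
        self.write_ts = 0   # timestamp of the transaction that wrote the current value

    def read(self, ts):
        if ts < self.write_ts:
            return False            # item already overwritten by a later txn: abort
        self.read_ts = max(self.read_ts, ts)
        return True

    def write(self, ts):
        if ts < self.read_ts or ts < self.write_ts:
            return False            # write arrives too late: abort
        self.write_ts = ts
        return True

item = DataItem()
print(item.read(ts=5), item.write(ts=7), item.read(ts=6))  # True True False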

Snapshot Isolation
By maintaining more than one version of a data item, it is possible to allow a transaction to read an old version of a
data item rather than a newer version written by an uncommitted transaction or by a transaction that should come
later in the serialization order. There are a variety of multi-version concurrency control techniques. One in particular,
called snapshot isolation, is widely used in practice.

Snapshot isolation ensures that attempts to read data never need to wait (unlike locking). Read-only transactions
cannot be aborted; only those that modify data run a slight risk of aborting. Since each transaction reads its own
version or snapshot of the database, reading data does not cause subsequent update attempts by other transactions to
wait (unlike locking). Since most transactions are read-only (and most others read more data than they update), this is
often a major source of performance improvement as compared to locking.
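
A minimal multi-version sketch of this idea follows (illustrative only; real snapshot-isolation systems also perform
write-conflict checks at commit, which this omits, and the item names and timestamps are assumptions):

# Each data item keeps a list of (commit_timestamp, value) versions,
# appended in increasing commit-timestamp order.
versions = {"A": [(0, 100)]}     # item A: initial value 100, committed at ts 0

def read(item, snapshot_ts):
    # Return the newest version committed at or before the reader's snapshot.
    committed = [v for ts, v in versions[item] if ts <= snapshot_ts]
    return committed[-1]

def commit_write(item, value, commit_ts):
    versions[item].append((commit_ts, value))

commit_write("A", 150, commit_ts=10)   # a writer commits a new version
print(read("A", snapshot_ts=5))        # 100 -- reader's snapshot predates the write
print(read("A", snapshot_ts=12))       # 150 -- a later snapshot sees the new version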

8.8 Lock-Based Protocols


A lock is a mechanism to control concurrent access to a data item. Data items can be locked in two modes:
1. Exclusive (X) mode: the data item can be both read and written. An X-lock is requested using the lock-X instruction.
2. Shared (S) mode: the data item can only be read. An S-lock is requested using the lock-S instruction.
Lock requests are made to the concurrency-control manager. A transaction can proceed only after its request is granted.

A transaction may be granted a lock on an item if the requested lock is compatible with the locks already held on the
item by other transactions. Any number of transactions can hold shared locks on an item, but if any transaction holds
an exclusive lock on the item, no other transaction may hold any lock on it. If a lock cannot be granted, the
requesting transaction is made to wait until all incompatible locks held by other transactions have been released;
the lock is then granted. A small sketch of this compatibility check follows.
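
The following illustrative sketch (not from the original text) encodes the shared/exclusive compatibility rule just
described:

# held: set of lock modes currently held on the item by *other* transactions.
def can_grant(requested, held):
    if not held:
        return True                 # no locks held: any request is compatible
    if requested == "S":
        return "X" not in held      # S is compatible only with other S locks
    return False                    # X is incompatible with any existing lock

print(can_grant("S", {"S"}))        # True  -- many shared locks may coexist
print(can_grant("X", {"S"}))        # False -- must wait for readers to release
print(can_grant("S", {"X"}))        # False -- must wait for the writer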

Drawbacks
The potential for deadlock exists in most locking protocols; deadlocks are a necessary evil.
Starvation is also possible if the concurrency-control manager is badly designed.
For example:
• A transaction may be waiting for an X-lock on an item while a sequence of other transactions request and are
granted an S-lock on the same item.
• The same transaction is repeatedly rolled back due to deadlocks.
The concurrency-control manager can be designed to prevent starvation.

The Two-Phase Locking Protocol

One protocol that ensures serializability is the two-phase locking protocol. This protocol requires that each transaction
issue lock and unlock requests in two phases:
1. Growing phase. A transaction may obtain locks, but may not release any lock.
2. Shrinking phase. A transaction may release locks, but may not obtain any new locks.

Initially, a transaction is in the growing phase. The transaction acquires locks as needed. Once the transaction releases
a lock, it enters the shrinking phase, and it can issue no more lock requests.
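
An illustrative sketch of the two-phase rule follows (not from the original text; lock-manager details such as lock
modes and waiting are omitted):

# Once a transaction releases any lock, it may not acquire new ones.
class TwoPhaseTransaction:
    def __init__(self):
        self.locks = set()
        self.shrinking = False      # becomes True after the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock in shrinking phase")
        self.locks.add(item)        # growing phase: acquiring is allowed

    def unlock(self, item):
        self.locks.discard(item)
        self.shrinking = True       # the transaction enters the shrinking phase

t = TwoPhaseTransaction()
t.lock("A"); t.lock("B")            # growing phase
t.unlock("A")                       # first release: shrinking phase begins
try:
    t.lock("C")                     # violates the two-phase rule
except RuntimeError as e:
    print(e)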

Two-phase locking does not ensure freedom from deadlocks.


• Cascading roll-back is possible under two-phase locking. To avoid this, follow a modified protocol called
strict two-phase locking. Here a transaction must hold all its exclusive locks till it commits/aborts.
• Rigorous two-phase locking is even stricter: here all locks are held till commit/abort. In this protocol
transactions can be serialized in the order in which they commit.

9. Deadlock Handling
In a database, a deadlock is a situation in which two or more transactions are waiting for one another to give up locks.
For example, Transaction A might hold a lock on some rows in the Accounts table and needs to update some rows in
the Orders table to finish. Transaction B holds locks on those very rows in the Orders table but needs to update the
rows in the Accounts table held by Transaction A. Transaction A cannot complete its transaction because of the lock
on Orders. Transaction B cannot complete its transaction because of the lock on Accounts. All activity comes to a halt
and remains at a standstill forever unless the DBMS detects the deadlock and aborts one of the transactions.

Deadlock Prevention
To prevent any deadlock situation in the system, the DBMS aggressively inspects all the operations which transactions
are about to execute. The DBMS analyses the operations to see whether they can create a deadlock situation. If it
finds that a deadlock situation might occur, then that transaction is never allowed to execute.
There are deadlock prevention schemes that use the timestamp-ordering mechanism of transactions in order to
predict a deadlock situation.

WAIT-DIE SCHEME:
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting
lock by some other transaction, one of two possibilities may occur:
• If TS(Ti) < TS(Tj), that is, Ti, which is requesting a conflicting lock, is older than Tj, then Ti is allowed to
wait until the data item is available.
• If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies. Ti is restarted later with a random delay but
with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.

WOUND-WAIT SCHEME:
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting
lock by some other transaction, one of two possibilities may occur:
• If TS(Ti) < TS(Tj), that is, Ti, which is requesting a conflicting lock, is older than Tj, then Ti forces Tj to be
rolled back; that is, Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
• If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti is forced to wait until the resource is available.

This scheme allows the younger transaction to wait, but when an older transaction requests an item held by a younger
one, the older transaction forces the younger one to abort and release the item. In both schemes, the transaction
that enters the system later is the one that is aborted.
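
The two schemes can be captured in a pair of small decision functions; this is an illustrative sketch (not from the
original text), where a lower timestamp means an older transaction:

def wait_die(ts_requester, ts_holder):
    # Older requester waits; younger requester dies (is rolled back).
    return "wait" if ts_requester < ts_holder else "die"

def wound_wait(ts_requester, ts_holder):
    # Older requester wounds (rolls back) the holder; younger requester waits.
    return "wound holder" if ts_requester < ts_holder else "wait"

print(wait_die(ts_requester=3, ts_holder=7))     # wait  -- older txn waits
print(wait_die(ts_requester=9, ts_holder=7))     # die   -- younger txn is killed
print(wound_wait(ts_requester=3, ts_holder=7))   # wound holder
print(wound_wait(ts_requester=9, ts_holder=7))   # wait  -- younger txn waits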

Deadlock Detection

Deadlocks can be described precisely in terms of a directed graph called a wait-for graph. This graph consists of a pair
G = (V, E), where V is a set of vertices and E is a set of edges. The set of vertices consists of all the transactions in the
system. Each element in the set E of edges is an ordered pair Ti → Tj. If Ti → Tj is in E, then there is a directed edge
from transaction Ti to Tj , implying that transaction Ti is waiting for transaction Tj to release a data item that it needs.

When transaction Ti requests a data item currently being held by transaction Tj, then the edge Ti → Tj is inserted in
the wait-for graph. This edge is removed only when transaction Tj is no longer holding a data item needed by
transaction Ti.

A deadlock exists in the system if and only if the wait-for graph contains a cycle. Each transaction involved in the
cycle is said to be deadlocked. To detect deadlocks, the system needs to maintain the wait-for graph, and periodically
to invoke an algorithm that searches for a cycle in the graph.

[Figure: a wait-for graph with no cycle]

Since the graph has no cycle, the system is not in a deadlock state.
Suppose now that transaction T4 is requesting an item held by T3. The edge T4 → T3 is added to the wait-for graph,
resulting in the new system state. This time, the graph contains the cycle:

T2 → T4 → T3 → T2


implying that transactions T2, T3, and T4 are all deadlocked.
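
An illustrative sketch of this detection step follows (not from the original text; only the edges named in the example
above are included, and the cycle search is a simple depth-first walk):

# Wait-for graph from the example: Ti -> Tj means Ti waits for Tj.
edges = {"T2": ["T4"], "T4": ["T3"], "T3": ["T2"]}

def find_cycle(edges):
    # Depth-first search; an edge back to a node on the current path is a cycle.
    def visit(node, path, seen):
        path.append(node)
        for nxt in edges.get(node, []):
            if nxt in path:
                return path[path.index(nxt):] + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                cycle = visit(nxt, path, seen)
                if cycle:
                    return cycle
        path.pop()
        return None
    for start in edges:
        cycle = visit(start, [], {start})
        if cycle:
            return cycle
    return None

print(find_cycle(edges))   # ['T2', 'T4', 'T3', 'T2'] -- these txns are deadlocked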

Deadlock Recovery

When a detection algorithm determines that a deadlock exists, the system must recover from the deadlock. The most
common solution is to roll back one or more transactions to break the deadlock. Choosing which transaction to abort
is known as Victim Selection.

Selection of a victim: In the wait-for graph discussed above, transactions T3, T2 and T4 are deadlocked. In order to
remove the deadlock, one of these three transactions must be rolled back.

We should roll back those transactions that will incur the minimum cost. When a deadlock is detected, the choice of
which transaction to abort can be made using the following criteria (a small scoring sketch follows the list):
• The transaction which holds the fewest locks
• The transaction that has done the least work
• The transaction that is farthest from completion
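
The following is an illustrative sketch (not from the original text) that scores each deadlocked transaction by the
criteria above and picks the cheapest victim; the field names and numbers are assumptions:

transactions = [
    {"name": "T2", "locks": 4, "work_done": 120, "work_left": 10},
    {"name": "T3", "locks": 1, "work_done": 15,  "work_left": 90},
    {"name": "T4", "locks": 2, "work_done": 60,  "work_left": 40},
]

def cost(t):
    # Cheaper to abort: few locks held, little work done, far from completion.
    return (t["locks"], t["work_done"], -t["work_left"])

victim = min(transactions, key=cost)
print(victim["name"])   # T3 -- fewest locks and least work done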

Rollback: Once we have decided that a particular transaction must be rolled back, we must determine how far this
transaction should be rolled back.
The simplest solution is a total rollback: Abort the transaction and then restart it. However, it is more effective to roll
back the transaction only as far as necessary to break the deadlock. Such partial rollback requires the system to
maintain additional information about the state of all the running transactions. Specifically, the sequence of lock
requests/grants and updates performed by the transaction needs to be recorded. The deadlock detection mechanism
should decide which locks the selected transaction needs to release in order to break the deadlock. The selected
transaction must be rolled back to the point where it obtained the first of these locks, undoing all actions it took after
that point. The recovery mechanism must be capable of performing such partial rollbacks. Furthermore, the
transactions must be capable of resuming execution after a partial rollback.

Problem of Starvation: In a system where the selection of victims is based primarily on cost factors, it may happen
that the same transaction is always picked as a victim. As a result, this transaction never completes. We must ensure
that a transaction can be picked as a victim only a (small) finite number of times. The most common solution is to
include the number of rollbacks in the cost factor.

PRACTICE SET
1. Mechanism developed to enforce users to enter data in required format is?
(a) Data validation (b) Input mask (c) Criteria (d) Data verification (e) None of these
2. What is the size of the Date & Time field type?
(a) 1 (b) 8 (c) 255 (d) 50 (e) None of these
3. The options like Save, Open Database, Print are available in
(a) Home tab (b) Backstage View tab (c) None of these (d) Database Tools tab (e) File menu
4. Which of the following methods can be used to add more tables in a database?
(a) Design View (b) Table Wizard (c) By Entering Data (d) All of above (e) None of these
5. The feature that database allows to access only certain records in database is?
(a) Forms (b) Reports (c) Queries (d) Tables (e) None of these
6. You can find the Sort & Filter group of commands in?
(a) Create ribbon (b) Home ribbon (c) Database tools ribbon (d) Fields ribbon (e) None of these
7. Arrange according to the size
(a) Record, field, byte, bit (b) Bit, field, byte, record (c) Field, byte, record, bit (d) Byte, bit, record, field (e) None of these
8. What is the maximum allowed field size for Boolean (Yes/No) fields?
(a) 1 (b) 8 (c) 50 (d) 255 (e) None of these
9. What is a relational database?
(a) A database structured to recognize relations between stored items of information. It is based on the relational model of data (b) A database that is not related to other databases (c) A database to store human relations (d) Both (a) and (b) (e) None of these
10. What is the best data type for a field that stores mobile numbers?
(a) Memo (b) Number (c) Date/Time (d) Text (e) None of these
11. Which filter method lets you filter records based on criterion you specify?
(a) Filter by form (b) Filter by selection (c) Auto filter (d) Advanced filter (e) None of these
12. Which of the following field types is used to store photographs of employees?
(a) Memo (b) None of these (c) OLE (d) Photo (e) Picture
13. Which of the following is not a function of the DBA?
(a) Network Maintenance (b) Routine Maintenance (c) Schema Definition (d) Authorization for data access (e) None of these
14. Which of the following is a Data Model?
(a) Entity-Relationship model (b) Relational data model (c) Object-Based data model (d) Network model (e) All of the above
15. Which of the following is not a Modification of the Database?
(a) Deletion (b) Insertion (c) Sorting (d) Updating (e) None of these
16. Which of the following represents a relationship among a set of values?
(a) A row (b) A table (c) A field (d) A column (e) None of these
17. A column header is referred to as?
(a) Table (b) Relation (c) Attributes (d) Domain (e) None of these
18. In mathematical terms, a table is referred to as?
(a) Relation (b) Attribute (c) Tuple (d) Domain (e) None of these
19. Minimal super keys are called?
(a) Schema keys (b) Candidate keys (c) Domain keys (d) Attribute keys (e) None of these
20. The Primary key must be?
(a) Non Null (b) Unique (c) Either (a) or (b) (d) Both (a) and (b) (e) Null
21. By Grouped Report you understand-
(a) Type of report generated by the Report Wizard (b) Type of report that presents records sorted in ascending or descending order as you specify (c) Type of report that displays data grouped by fields you specify (d) Both (a) and (b) (e) None of these
22. Which of the following is not a level of data abstraction?
(a) Physical Level (b) Critical Level (c) Logical Level (d) View Level (e) None of these
23. Data Manipulation Language enables users to
(a) Retrieval of information stored in database (b) Only (a) and (b) (c) Insertion of new information into the database (d) Deletion of information from the database (e) All of the above
24. Which of the following is not a Storage Manager Component?
(a) Transaction Manager (b) Logical Manager (c) Buffer Manager (d) None of these (e) File Manager
25. To display associated records from a related table in datasheet view, you can?
(a) Double click the record (b) Apply filter by form command (c) Single click on expand indicator (+) next to the record (d) Double click on expand indicator (+) next to the record (e) None of these
26. A Relation is a
(a) Subset of a Cartesian product of a list of attributes (b) Subset of a Cartesian product of a list of domains (c) Subset of a Cartesian product of a list of tuples (d) Subset of a Cartesian product of a list of relations (e) None of these
27. Who proposed the relational model?
(a) Bill Gates (b) E.F. Codd (c) Herman Hollerith (d) Charles Babbage (e) None of these
28. In an Entity-Relationship Diagram, "Ellipses" represent
(a) Attributes (b) Weak entity set (c) Relationship sets (d) Multi-valued attributes (e) None of these
29. What type of relationship exists between a Teacher table and a Class table?
(a) One to many (b) Many to many (c) One to one (d) Two to two (e) None of these
30. Data Manipulation Language (DML) is not to
(a) Create information table in the Database (b) Insertion of new information into the Database (c) Deletion of information in the Database (d) Modification of information in the Database (e) None of these
31. Which of the following is true regarding Referential Integrity?
(a) Every primary-key value must match a primary-key value in an associated table (b) Every primary-key value must match a foreign-key value in an associated table (c) Every foreign-key value must match a primary-key value in an associated table (d) Every foreign-key value must match a foreign-key value in an associated table (e) None of these
32. Group names in the ribbon can be helpful to
(a) Group the commands so that when you move one, you can move all of them together (b) Give a name for buttons on ribbon (c) Find the required option by inspecting through them (d) All of above (e) None of these
33. Which of the following is a unary operation?
(a) Selection operation (b) Generalized selection (c) Primitive operation (d) Projection operation (e) None of these
34. Which of the following NFs is based on Multi-Valued Dependency?
(a) First (b) Second (c) Third (d) Fourth (e) None of these
35. Which of the following is not an Outer join?
(a) Left outer join (b) Right outer join (c) Full outer join (d) All of the above (e) None of these
36. A relation that has no partial dependencies is in which normal form?
(a) First (b) Second (c) Third (d) BCNF (e) None of these
37. A transaction that completes its execution is said to be
(a) Saved (b) Loaded (c) Rolled (d) Committed (e) None of these
38. An entity type whose existence depends on another entity type is called a _____ entity.
(a) Strong (b) Weak (c) Codependent (d) Variant (e) Independent
39. The advantage of a computerized database over a manual database is?
(a) We can get the information quickly (b) We can put in information quickly (c) Solve the repeated information and consistency problem (d) All of above (e) None of these
40. The property of a transaction which ensures that either all operations of the transaction are reflected properly in the database or none, is called
(a) Atomicity (b) Durability (c) Isolation (d) Consistency (e) Deadlock
41. In a super type/subtype hierarchy, each subtype has?
(a) Only one super type (b) Many super types (c) At most two super types (d) At least one subtype (e) Not at all
42. A property or characteristic of an entity type that is of interest to the organization is called an
(a) Attribute (b) Coexisting entity (c) Relationship (d) Cross-function (e) Weak entity
43. In the context of a database table, the statement "A determines B" indicates that
(a) Knowing the value of attribute A you cannot look up the value of attribute B (b) You do not need to know the value of attribute A in order to look up the value of attribute B (c) Knowing the value of attribute B you can look up the value of attribute A (d) Knowing the value of attribute A you can look up the value of attribute B (e) None of the above
44. A method that speeds query processing by running a query at the same time against several partitions of a table using multiple processors is called
(a) Multiple partition query (b) Perpendicular query processing (c) Parallel query processing (d) Query optimization (e) Query Execution
45. A database management software (DBMS) includes
(a) Automated tools (CASE) used to design databases and application programs (b) A software application that is used to define, create, maintain and provide controlled access to user databases (c) Application programs that are not used to provide information to users (d) Database that contains only occurrences of logically organised data or information (e) Repository of meta data, which is a central storehouse for all data definitions, data relationships, screen and report formats and other system components
46. Which of the following statements concerning the primary key is true?
(a) All primary key entries are unique (b) The primary key may be null (c) The primary key is not required for all tables (d) The primary key data does not have to be unique (e) None of these
47. An index in which an index record appears for every search key value in the file is a
(a) Secondary index (b) Dense index (c) Sparse index (d) Multi level index (e) B+ tree
48. What does the following SQL statement do?
Select * From Customer Where Cust_Type = "Best";
(a) Selects all the fields from the Customer table for each row with a customer labeled "Best" (b) Selects the "*" field from the Customer table for each row with a customer labeled "Best" (c) Selects fields with a "*" in them from the Customer table (d) Selects all the fields from the Customer table for each row with a customer labeled "*" (e) Counts all records and displays the value
49. If K is a foreign key in a relation R1, then
(a) Every tuple of R1 has a distinct value for K (b) K cannot have a null value for tuples in R1 (c) K is a key for some other relation (d) K is a Primary key for R1 (e) K is a Composite key for R1
50. Select the correct statement from the following on proper naming of schema constructs:
(a) Entity type name applies to all the entities belonging to that entity type and therefore a plural name is selected for entity type (b) In the narrative description of the database requirements, verbs tend to indicate the names of relationship types (c) The nouns arising from a database requirement description can be considered as names of attributes (d) Additional nouns which appear in the narrative description of the database requirements represent the weak entity type names (e) Adjectives written in the database requirement description help to identify the partial relationships among entities
51. Embedded SQL means
(a) Using the EMBED key word in a SQL statement (b) Writing a SQL statement to retrieve data from more than one relation (c) Writing SQL statements within code written in a general programming language (d) Specifying a condition and action to be taken in case the given condition is satisfied in a trigger (e) Using SQL language constructs like revoke and grant respectively for revoking and granting privileges to users
52. State the unit of storage that can store one or more records in a hash file organization
(a) Buckets (b) Disk pages (c) Blocks (d) Nodes (e) Baskets
53. Which of the following questions is answered by the SQL statement?
Select Count (Product_Description) from Product_T;
(a) How many products are in the Product Table? (b) How many different product descriptions are in the Product Table? (c) How many characters are in the field name "Product_Description"? (d) How many different columns named "Product Description" are there in table Product_T? (e) How many total records in a table?
54. Consider the following table obtained using the Student and Instructor relations.
Fname    Lname
Ajith    Gamage
Sujith   Hewage
Kasun    Peiris
Which relational algebra operation could have been applied on the pair of relations Student and Instructor to obtain the above data?
(a) Student ∩ Instructor (b) Instructor ÷ Student (c) Student – Instructor (d) Student ? Instructor (e) Instructor – Student
55. Which of the following types of index is automatically created when we do not specify one?
(a) Bitmap (b) Balanced Tree Index (c) Binary Tree Index (d) Hashed (e) Sparse Index
56. Which of the following is a procedure for acquiring the necessary locks for a transaction where all necessary locks are acquired before any are released?
(a) Record controller (b) Exclusive lock (c) Authorization rule (d) Two phase lock (e) Three Phase lock
57. In the relational model, cardinality is termed as
(a) Number of tuples (b) Number of attributes (c) Number of tables (d) Number of constraints (e) None of these
58. Out of the following activities, which is the one that is normally performed by the DBMS, without the interference of the DBA?
(a) Integrity (b) Retention (c) Security (d) Granting the Privileges (e) Recovery
59. Which of the following Relational Algebra operations require that both tables (or virtual tables) involved have the exact same attributes/data types?
(a) Join, Projection, Restriction (b) Multiplication and Division (c) Union, Intersection, Minus (d) Minus, Multiplication, Intersection (e) Projection, Selection, Rename
60. Which of the following is a component of the relational data model included to specify business rules to maintain the integrity of data when they are manipulated?
(a) Business rule constraint (b) Data integrity (c) Business integrity (d) Data structure (e) Entity Integrity
SOLUTIONS
1. (b); ‘Input mask’
2. (b); ‘8’
3. (e); ‘File menu’
4. (d); ‘All of above’
5. (c); ‘Queries’
6. (b); ‘Home ribbon’
7. (a); ‘Record, field, byte, bit’
8. (a); ‘1’
9. (a)
10. (d); ‘Text’
11. (a); ‘Filter by form’
12. (c); ‘OLE’
13. (a); ‘Network Maintenance’
14. (e); ‘All of the above’
15. (c); ‘Sorting’
16. (a); ‘A row’
17. (c); ‘Attributes’
18. (a); ‘Relation’
19. (b); ‘Candidate keys’
20. (d); ‘Both (a) and (b)’
21. (b)
22. (b)
23. (e)
24. (b)
25. (c)
26. (b)
27. (b)
28. (a)
29. (b)
30. (a)
31. (c)
32. (c)
33. (b)
34. (d)
35. (d)
36. (b)
37. (d)
38. (b)
39. (d)
40. (a)
41. (a); Reason: In a super type/sub-type hierarchy, each sub-type has only one super type.
42. (a); Reason: A property or characteristic of an entity type that is of interest to the organization is called an attribute.
43. (d); Reason: Knowing the value of attribute A you can look up the value of attribute B.
44. (c); Reason: A method that speeds query processing by running a query at the same time against several partitions of a table using multiple processors is called parallel query processing.
45. (b); Reason: A software application that is used to define, create, maintain and provide controlled access to user databases.
46. (a)
47. (b); Reason: A dense index has an index record for every search key value in the file.
48. (a)
49. (c); Reason: If K is a foreign key in a relation R1, then K is a key for some other relation.
50. (b); Reason: In the narrative description of the database requirements, verbs tend to indicate the names of relationship types.
51. (c); Embedded SQL refers to writing SQL statements within code written in a general programming language.
52. (a); Buckets are used to store one or more records in a hash file organization.
53. (b); Reason: It answers "How many different product descriptions are in the Product Table?"
54. (e); Instructor – Student is the relational algebra operation that could be applied on the pair of relations Student and Instructor to obtain the above data.
55. (b); A Balanced Tree Index is automatically created when we do not specify one.
56. (d); Two-phase lock is a procedure for acquiring the necessary locks for a transaction where all necessary locks are acquired before any are released.
57. (a)
58. (e); Recovery is the one that is normally performed by the DBMS, without the interference of the DBA.
59. (c); In relational algebra, Union, Intersection and Minus operations require that both tables (or virtual tables) involved have the exact same attributes/data types.
60. (b); Data integrity is a component of the relational data model included to specify business rules to maintain the integrity of data when they are manipulated.

3 DATA WAREHOUSING
& DATA MINING

1. Data warehousing:

1.1 Introduction

Data warehousing is combining data from multiple sources into one comprehensive, easily manipulated database.
The primary aim of data warehousing is to provide businesses with analytic results from data mining, OLAP,
scorecarding and reporting.

A data warehouse is a subject-oriented, integrated, time-variant and non-volatile collection of data in support
of management's decision making process.

Subject Oriented: Data that gives information about a particular subject instead of about a company's ongoing
operations.

Integrated: Data that is gathered into the data warehouse from a variety of sources and merged into a coherent
whole.

Time variant: All data in the data warehouse is identified with a particular time period.

Non-volatile: Data is stable in a data warehouse. More data is added but data is never removed.

1.2. Benefits of data warehousing:


⚫ Enhance Business Intelligence
⚫ Increased Query and System performance
⚫ Business Intelligence from Multiple Source
⚫ Timely Access to Data
⚫ Enhanced Data quality and Consistency
⚫ Historical intelligence
⚫ High return on investment

1.3. Operational and Informational Data

Operational Data:
➢ Focusing on transactional functions such as bank card withdrawals and deposits
➢ Detailed
➢ Updateable
➢ Reflects current data

Informational Data:
➢ Focusing on providing answers to problems posed by decision makers
➢ Summarized
➢ Non-updateable

Important Terminology:

Enterprise Data Warehouse- It collects all information about subjects (customers, products, sales, assets, personnel)
that span the entire organization.
Data Mart- Departmental subsets that focus on selected subjects. A data mart is a segment of a data warehouse that
can provide data for reporting and analysis on a section, unit, department or operation in the company, e.g. sales,
payroll, production. Data marts are sometimes complete individual data warehouses which are usually smaller
than the corporate data warehouse.

Decision Support System (DSS)- Information technology to help the knowledge worker (executive, manager and
analyst) make faster and better decisions.

Drill-down- Traversing the summarization levels from highly summarized data to the underlying current or old
detail.

Meta data- Data about data, containing the location and description of warehouse system components: names,
definitions, structure etc.

1.4. Data Warehouse Architecture:

Data warehouse architecture is based on a relational database management system server that functions as the central
repository for informational data. In the data warehouse architecture, operational data and processing is completely
separate from data warehouse processing.

Components of Data Warehouse Architecture:


1. Data sourcing, cleanup, transformation and migration tools.
2. Metadata repository
3. Warehouse database technology
4. Data marts
5. Data query, reporting, analysis and mining tools
6. Data warehouse administration and management
7. Information delivery system

NOTE on Operational Data Store: Operational Data store (ODS) is an architecture concept to support
day-to-day operational decision support and contains current value data propagated from operational applications
⚫ ODS is subject-oriented, similar to a classic definition of a Data warehouse
⚫ ODS is integrated, in the same sense as a data warehouse
However
⚫ ODS is volatile, while a data warehouse is nonvolatile
⚫ ODS contains very current data, while a data warehouse contains both current and historical data.
⚫ ODS contains detailed data only, and not precalculated summaries and aggregates, as is typical for a data
warehouse.
1.5. Data Warehouse Design Approaches:

Designing or Building of a Data Warehouse can be done following either one of the approaches. These approaches are
notably known as:
• Top-Down Approach- In the top-down approach suggested by Bill Inmon, we build a centralized repository to
house corporate-wide business data. This repository is called the Enterprise Data Warehouse (EDW). The data in the
EDW is stored in a normalized form in order to avoid redundancy, and the central repository for corporate-wide data
helps us maintain one version of the truth. The data in the EDW is stored at the most detailed level, for two
reasons: flexibility to be used by multiple departments, and flexibility to cater for future requirements.
• Bottom-Up Approach- The bottom-up approach suggested by Ralph Kimball is an incremental approach to building
a data warehouse. Here we build the data marts separately, at different points in time, as and when the requirements
of a specific subject area are clear. The data marts are then integrated or combined together to form a data
warehouse. Separate data marts are combined through the use of conformed dimensions and conformed facts. A
conformed dimension or conformed fact is one that can be shared across data marts.

A conformed dimension has consistent dimension keys, consistent attribute names and consistent values across
separate data marts. A conformed dimension means exactly the same thing with every fact table it is joined to.
A conformed fact has the same definition of measures, the same dimensions joined to it, and the same granularity
across data marts.
The bottom-up approach helps us incrementally build the warehouse by developing and integrating data marts as and
when their requirements are clear. We do not have to wait until the overall requirements of the warehouse are known.

1.6. Meta Data:

It is data about data. It is used for maintaining, managing and using the data warehouse. It is classified into two types:
Technical Meta data: It contains information about data warehouse data used by warehouse designer, administrator
to carry out development and management tasks. It includes,
• Info about data stores
• Transformation descriptions. That is mapping methods from operational db to warehouse db
• Warehouse Object and data structure definitions for target data
• The rules used to perform clean up, and data enhancement
• Data mapping operations
• Access authorization, backup history, archive history, info delivery history, data acquisition history, data access
etc.,

Business Meta data: It contains information that describes the data stored in the data warehouse in terms meaningful to business users. It includes,
• Subject areas, and info object type including queries, reports, images, video, audio clips etc.
• Internet home pages
• Info related to info delivery system
• Data warehouse operational info such as ownerships, audit trails etc.,

Meta data helps the users to understand content and find the data. Meta data are stored in a separate data
stores which is known as informational directory or Meta data repository which helps to integrate, maintain and
view the contents of the data warehouse.

A metadata repository is a database of data about data (metadata). The purpose of the metadata repository is
to provide a consistent and reliable means of access to data. The repository itself may be stored in a physical
location or may be a virtual database, in which metadata is drawn from separate sources.

1.7. Access Tools In Data warehouse

Its purpose is to provide info to business users for decision making. There are five categories:
1. Data query and reporting tools- Query and reporting tools are used to generate queries and reports.

2. Application development tools- These are used to generate SQL queries. They use meta-layer software between users
and databases which offers point-and-click creation of SQL statements.
3. Executive info system tools (EIS)- An Executive Information System (EIS) is a management information system
generally designed with an emphasis on graphical displays and very easy-to-use, appealing interfaces, since it is
intended to support and facilitate the information and decision-making needs of senior executives.
4. OLAP tools- These tools are based on concepts of multidimensional database and allow a sophisticated user to
analyse the data using elaborate, multidimensional and complex views. Typical business applications for these
tools include product performance and profitability, effectiveness of a sales program or a marketing campaign,
sales forecasting and capacity planning. These tools assume that the data is organised in a multidimensional
model, which is supported by a special multidimensional database or by a Relational database designed to enable
multidimensional properties.
5. Data mining tools- Data mining tools are used to discover knowledge from data warehouse data; they can also be
used for data visualization and data correction purposes.

1.8. Data marts:

A data mart is a subset of a data warehouse that supports the requirements of a particular department or business
function.
The characteristic that differentiates a data mart from a data warehouse is that a data mart focuses only on the
requirements of users associated with one department or business function.

1.9. OLAP (Online Analytical Processing):

OLAP is an approach to answering multidimensional analytical queries. OLAP is part of the broader category of
business intelligence, which also encompasses relational databases, report writing and data mining. OLAP tools enable
users to analyze multidimensional data interactively from multiple perspectives.
OLAP databases are highly de-normalized, which makes the files redundant but helps to improve analytic
performance. Processing can be very slow and can take many hours depending on the data involved. An illustrative
roll-up/drill-down sketch follows the list of OLAP types below.
Types of OLAP:
➢ Relational OLAP
➢ Multidimensional OLAP
➢ Hybrid OLAP
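
The following is an illustrative sketch (not from the original text; the regions, years and sales figures are
invented) of the kind of multidimensional roll-up and drill-down an OLAP tool performs:

# Fact rows as (region, year, sales); an OLAP cube generalizes this idea.
from collections import defaultdict

facts = [("North", 2022, 120), ("North", 2023, 150),
         ("South", 2022, 90),  ("South", 2023, 110)]

def roll_up(facts, dims):
    # Aggregate sales over the chosen dimensions (e.g. region only).
    totals = defaultdict(int)
    for region, year, sales in facts:
        key = tuple(v for v, d in ((region, "region"), (year, "year")) if d in dims)
        totals[key] += sales
    return dict(totals)

print(roll_up(facts, {"region"}))           # roll-up: totals per region
print(roll_up(facts, {"region", "year"}))   # drill-down: per region and year
print(roll_up(facts, set()))                # grand total over the whole cube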

1.10. OLTP (Online Transaction Processing):

It is a class of systems that facilitate and manage transaction-oriented applications, typically for data entry and
retrieval transaction processing. It manages current data and stores all of the given data. It is characterized by a large
number of short online transactions and their quick real time response to the users.
The main purpose of the OLTP system is to control or run the fundamental business tasks.

1.11. Difference between OLAP and OLTP:

• OLAP technology is used to perform complex analysis of the data in a data warehouse, whereas OLTP technology is
used to perform updates on operational or transactional systems (e.g. a point-of-sale system).
• OLAP holds historical and only relevant data; OLTP holds current and all types of data.
• OLAP has few concurrent users; OLTP has many concurrent users.
• OLAP systems are used by knowledge workers such as executives, managers and analysts; OLTP systems are used by
clerks, DBAs, or database professionals.
• OLAP provides a summarized and multidimensional view of data; OLTP provides a detailed and flat relational view
of data.
• OLAP is based on the Star Schema, Snowflake Schema and Fact Constellation Schema; OLTP is highly normalized,
with many tables.

2. Data Mining:

Data mining is a process of extracting previously unknown, valid and actionable information from a large set of data
and then using that information to make crucial business decisions.
Data mining is concerned with the analysis of data and the use of software techniques for finding hidden and
unexpected patterns and relationships in sets of data. The focus of data mining is to find the information that is
hidden and unexpected.
2.1. Data Mining Techniques:
➢ Association
➢ Classification
➢ Clustering
➢ Prediction
➢ Sequential Patterns
➢ Decision Trees
➢ Combinations

2.2. Data Mining Applications:

Various fields use data mining technologies because of the fast access to data and the valuable information that can
be extracted from vast amounts of data. Data mining technologies have been applied successfully in many areas:
• Financial Data Analysis:
The financial data in the banking and financial industry is generally reliable and of high quality, which facilitates
systematic data analysis and data mining. Here are a few typical cases: design and construction of data
warehouses for multidimensional data analysis and data mining; loan payment prediction and customer credit
policy analysis; classification and clustering of customers for targeted marketing; detection of money laundering
and other financial crimes.

• Retail Industry:
Data mining has great application in the retail industry because it collects large amounts of data on sales,
customer purchasing history, goods transportation, consumption and services. It is natural that the quantity of
data collected will continue to expand rapidly because of the increasing ease, availability and popularity of the web.
Data mining in the retail industry helps in identifying customer buying patterns and trends, which leads to improved
quality of customer service and good customer retention and satisfaction.

• Telecommunication Industry:
Today the telecommunication industry is one of the most rapidly emerging industries, providing various services such
as fax, pager, cellular phone, Internet messenger, images, email, web data transmission etc. Due to the development
of new computer and communication technologies, the telecommunication industry is rapidly expanding. This is
why data mining has become very important in helping to understand the business. Data mining in the
telecommunication industry helps in identifying telecommunication patterns, catching fraudulent activities,
making better use of resources, and improving quality of service.

• Biological Data Analysis:


Nowadays there is vast growth in fields of biology such as genomics, proteomics, functional genomics and biomedical
research. Biological data mining is a very important part of bioinformatics. The following are aspects in which
data mining contributes to biological data analysis:
➢ Semantic integration of heterogeneous, distributed genomic and proteomic databases.
➢ Alignment, indexing, similarity search and comparative analysis of multiple nucleotide sequences.
➢ Discovery of structural patterns and analysis of genetic networks and protein pathways.

• Other Scientific Applications


The applications discussed above tend to handle relatively small and homogeneous data sets for which statistical
techniques are appropriate. Huge amounts of data have been collected from scientific domains such as geosciences
and astronomy, and large data sets are also being generated by fast numerical simulations in various fields such as
climate and ecosystem modelling, chemical engineering and fluid dynamics. Data mining is applied in these scientific
domains as well.

