DBMS Reference Notes
Keeping organizational information in a file processing system has a number
of major disadvantages:
- The designers of the original system may not have anticipated all the
requests that are needed in modern business.
- File processing systems do not allow needed data to be retrieved in
a convenient and efficient manner.
3. Data isolation:
- Because data are scattered in various files, and those files may be in
different formats, writing new application programs to retrieve the
appropriate data is difficult.
4. Integrity Problems:
- The data values stored in the database must satisfy certain types of
consistency constraints.
- Developers enforce these constraints in the system by adding
appropriate code in the various application programs.
- When new constraints are added, it is difficult to change the
programs to enforce them.
- The problem is compounded when constraints involve several data
items from different files.
5. Atomicity problems:
- If a failure occurs midway through an update, the data must be restored
to the consistent state that existed before the failure. Ensuring such
all-or-nothing (atomic) execution is difficult in a conventional file
processing system.
6. Concurrent-access anomalies:
- When several users update shared data simultaneously, their interactions
can leave the data in an inconsistent state.
- Supervision is needed to guard against such possibilities, which is
not possible in a conventional file processing system.
7. Security Problems:
- Not every user of the database system should be able to access all
the data
- In the file processing system, the application program does not
enforce such security.
Data Abstraction:
The purpose of a database system is to provide users with an abstract view of
data. This means the system hides certain details of how the data are stored
and maintained.
Since many users are not computer trained, developers hide the complexity
from users through several levels of abstraction to simplify users’ interactions
with the system.
1. Physical Level: This level describes how the data are actually stored,
using complex low-level data structures.
2. Logical Level: This level describes what data are stored in the
database and what relationships exist among those data. It
describes the entire database in terms of a relatively simple structure.
Users of the logical level do not need to be aware of the
complexity of the physical level. Database administrators, who decide
what data to keep in the database, use the logical level.
3. View Level: This level describes only a part of the database. Because
of the variety of information stored in a large database, the logical level
is still complex for many users, who need to access only a part
of the database. The view level simplifies their interaction with the
system. The system may provide many views for the same database.
Fig: Three levels of data abstraction (View Level → Logical Level → Physical Level)
Instances and Schemas:
The overall design of the database is called the database schema; the
collection of information stored in the database at a particular moment is
called an instance of the database.
Data Models:
A data model consists of a set of conceptual tools that are used to describe
the structure (data types, relationships, and constraints) of a database.
3. Entity-Relationship Model:
- The entity-relationship model is based on a perception of the real world as
a collection of objects, called entities, and relationships among those objects.
- It is a conceptual data model.
- The database is modeled as a collection of entities and relationships among
those entities.
- It is represented graphically by the basic components:
Rectangles (represent entity sets)
Ellipses (represent attributes)
Diamonds (represent relationship sets among entity sets)
Lines (link attributes to entity sets and entity sets to relationship sets)
4. Relational model:
- It is a representational (implementation) data model.
- Unlike the hierarchical and network models, there are no physical links.
- Data are maintained in the form of tables consisting of rows and columns.
- Each row (record) represents an entity, and each column (field) represents an
attribute of the entity.
- A relationship between two tables is implemented through a common
attribute in the tables, not by physical links or pointers.
- This makes querying much easier in a relational database system than in a
hierarchical or network database system.
- It is more programmer-friendly and the most popular model in today's commercial world.
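The "common attribute, not pointers" idea can be sketched with Python's built-in sqlite3 module. The table and column names (publisher, book, pub_id) are illustrative, not taken from the notes; this is a minimal demonstration, not a full design:

```python
import sqlite3

# In-memory database; schema names are hypothetical examples.
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE publisher (pub_id INTEGER PRIMARY KEY, name TEXT)")
con.execute("CREATE TABLE book (book_id INTEGER PRIMARY KEY, title TEXT, pub_id INTEGER)")
con.execute("INSERT INTO publisher VALUES (1, 'Pearson'), (2, 'O''Reilly')")
con.execute("INSERT INTO book VALUES (10, 'DBMS Basics', 1), (11, 'SQL Primer', 1)")

# The relationship is expressed through the shared pub_id attribute,
# not through physical links between records.
rows = con.execute(
    "SELECT book.title, publisher.name FROM book "
    "JOIN publisher ON book.pub_id = publisher.pub_id"
).fetchall()
print(rows)
```

Because the link is just a matching value, queries can join the tables in any direction without following stored pointers.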
Data Independence
• The capacity to change the database schema without affecting
application programs is called data independence.
• Database system provides two types of data independence:
1. Logical Data Independence
2. Physical Data Independence
Database Users:
• Database users are those who interact with the database in order to
query and update the database and generate reports.
• On the basis of how users interact with the database, users are
classified as below:
- Naive users: ordinary users who invoke previously written application
programs.
- Sophisticated users: business analysts and scientists who interact with the
database using a query language.
- Specialized users: users who write specialized database applications, such as
computer-aided design systems.
- Application programmers: Computer professionals like database
application programmers develop application programs to facilitate easy data
access.
Database Administrator:
• A database administrator (DBA) is the person who has central control over
both the data and the application programs.
• Responsibilities of DBA:
- Schema definition and modification
- New software installation
- Security enforcement and administration
- Data analysis
- Preliminary database design
- Physical organization modification
- Routine maintenance checks
Application Architecture:
1. Entity Sets:
• An entity set is a set of entities of the same type that share the same properties or attributes, eg, the set of all
students in a class is an entity set.
• Entity sets are represented by rectangle with entity name within it in E-R diagram.
2. Attributes:
• Attributes are the pieces of information that describe the properties of an entity, eg, a student entity can have
student-id, student name, and student address.
• They are the descriptive properties possessed by every entity of an entity set.
• Each entity may have its own value for each attribute.
3. Relationship Sets:
• An association between two entity sets is called a relationship set, eg, in "Teacher teaches students",
teaches is the association between the entity sets teacher and student.
• A relationship may also have attributes called descriptive attributes. Consider a relationship set
depositor with entity sets customer and account. We could associate the attribute access-date to that
relationship to specify the most recent date on which a customer accessed an account.
Degree of relationship
• Number of entity sets that participate in a relationship set is called degree of relationship set
1. Unary Relationship:
If only one entity set participates in a relationship set more than once, it is called a unary relationship.
Here the same entity set participates in the relationship twice, with different roles.
Role names are specified above the link joining entity set and relationship set.
This type of relationship set is sometimes called a recursive relationship set.
2. Binary Relationship:
Relationship sets in which two entity sets participate are called binary relationships. In other
words, relationship sets of degree 2 are called binary relationships.
This is the most common type of relationship in database systems.
3. N-ary Relationship:
A relationship set in which more than two entity sets are involved is called an n-ary relationship.
Ternary (degree 3) and quaternary (degree 4) relationships are special cases of n-ary relationships.
For example, consider a ternary relationship Arrange between the three entity sets financial institution,
solicitor, and buyer.
• On the basis of cardinality ratio, relationships can be categorized into one-to-one, one-to-many, many-to-
one, and many-to-many.
1. One-to-one relationship: If every entity in A is associated with at most one entity in B and vice-versa
then the relationship is called one-to-one relationship, eg, every bank has only one CEO and a
person can be CEO of only one bank.
2. One-to-many relationship: If an entity in A can be associated with any number of entities in B but
every entity in B can be associated with at most one entity in A, then it is called a one-to-many
relationship. Eg, a mother can have any number of children, but a child can have only one mother.
3. Many-to-one relationship: If every entity in A can be associated with at most one entity in B but an
entity in B can be associated with any number of entities in A, then it is called a many-to-one
relationship. Eg, a book is always published by only one publisher, but a publisher can publish any
number of books.
4. Many-to-many relationship: If an entity in A can be associated with any number of entities in B and
vice-versa then it is called many-to-many relationship. Eg, a student can enroll into more than one
subject and a subject can be enrolled by many students.
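The four cardinality ratios above can be checked mechanically on a small relationship instance given as (A, B) pairs. This is a toy sketch; the function name and the example entities (Mary, Ann, etc.) are illustrative, not from the notes:

```python
from collections import Counter

def cardinality(pairs):
    """Classify the cardinality ratio of a relationship instance of (a, b) pairs."""
    a_partners = Counter(a for a, b in pairs)  # how many B-entities each A relates to
    b_partners = Counter(b for a, b in pairs)  # how many A-entities each B relates to
    a_max = max(a_partners.values())
    b_max = max(b_partners.values())
    if a_max == 1 and b_max == 1:
        return "one-to-one"
    if a_max > 1 and b_max == 1:
        return "one-to-many"   # one A entity relates to many B entities
    if a_max == 1 and b_max > 1:
        return "many-to-one"
    return "many-to-many"

# A mother can have many children; each child has one mother.
print(cardinality([("Mary", "Ann"), ("Mary", "Bob"), ("Sue", "Cal")]))
```

Note that a real schema declares the intended cardinality as a constraint; this check only observes one particular instance.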
Participation Constraints:
• A constraint on the E-R model that determines whether all or only some entity occurrences participate in a
relationship is called a participation constraint.
• There are two types of participation constraints: Total Participation Constraints and Partial Participation
Constraints.
• The participation of an entity set A in a relationship set R is said to be total if every entity in A
participates in the relationship at least once. This is represented by a double line joining A to R.
• The participation of an entity set A in a relationship set R is said to be partial if only some of the entities
of A participate in the relationship. It is represented by a single line joining A to R.
5. Keys:
• A set of one or more attributes whose values are distinct for each individual entity in the entity set is called
a key.
• Its values can be used to identify each entity uniquely. Eg, the Name attribute is a key of the entity set
company because no two companies are allowed to have the same name.
Types of Keys:
Super Key:
• A super key of an entity set is a set of one or more attributes whose values uniquely determine each
entity in the entity set.
• Example: Student-id attribute of the entity set student is sufficient to distinguish one student entity from
another. Thus the student-id is a super key. Similarly the set of attributes {roll-number, name, address} is
also a super key of the entity set student.
Candidate Key:
• A candidate key of an entity set is a minimal super key; that is, a super key no proper subset of which is
itself a super key.
• For example, student-id is a candidate key of the entity set student, but the set of attributes {roll-number,
name, address} is not a candidate key of the entity set student because its proper subset {roll-number} is
itself a super key.
• All candidate keys are super keys but vice versa is not true.
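The super-key and candidate-key definitions translate directly into a brute-force check over a small relation. A sketch under stated assumptions: rows are dicts, and the sample data (students, roll, etc.) is invented for illustration:

```python
from itertools import combinations

# Toy relation: attribute names and values are hypothetical.
students = [
    {"student_id": 1, "roll": "A1", "name": "Ram",  "address": "Butwal"},
    {"student_id": 2, "roll": "A2", "name": "Sita", "address": "Butwal"},
    {"student_id": 3, "roll": "A3", "name": "Ram",  "address": "Pokhara"},
]

def is_super_key(attrs, rows):
    """A super key's values uniquely determine each tuple."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return len(seen) == len(rows)

def is_candidate_key(attrs, rows):
    """Minimal super key: no proper subset is itself a super key."""
    if not is_super_key(attrs, rows):
        return False
    return not any(
        is_super_key(sub, rows)
        for k in range(1, len(attrs))
        for sub in combinations(attrs, k)
    )

print(is_super_key(("roll", "name"), students))      # super key, but not minimal
print(is_candidate_key(("roll", "name"), students))  # its subset ("roll",) already works
print(is_candidate_key(("student_id",), students))
```

This mirrors the note above: every candidate key passes `is_super_key`, but a super key with a uniquely-identifying proper subset fails `is_candidate_key`.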
Primary Key:
• A primary key is a candidate key that is chosen by the database designer as the principal means of uniquely
identifying entities within an entity set.
• There may exist several candidate keys; one of the candidate keys is selected to be the primary key.
• For example, if the entity set student has candidate keys student-id and roll-number and the database
designer chooses student-id for the purpose of uniquely identifying entities within the entity set, then
student-id becomes the primary key.
- It cannot be null
- It cannot be duplicate
• Let R be a relationship set involving entity sets E1, E2, …, En. Let primary-key(Ei) denote the set of
attributes that forms the primary key for entity set Ei. The composition of the primary key for a
relationship set depends on the set of attributes associated with the relationship set R.
• If the relationship set R has no attributes associated with it, then the set of attributes
primary-key(E1) ∪ primary-key(E2) ∪ … ∪ primary-key(En)
describes an individual relationship in R.
• If the relationship set R has attributes a1, a2, …, am associated with it, then the set of attributes
primary-key(E1) ∪ primary-key(E2) ∪ … ∪ primary-key(En) ∪ {a1, a2, …, am}
describes an individual relationship in R.
• The structure of the primary key for the relationship set depends on the mapping cardinality of the
relationship set.
• Suppose that the depositor relationship set is many-to-many; then the primary key of depositor consists of the union
of the primary keys of customer and account.
• However, if a customer can have only one account, that is, if the depositor relationship is many-to-one from
customer to account, then the primary key of depositor is simply the primary key of customer.
• Similarly, if the relationship is one-to-many, that is, an account is owned by at most one customer, then the
primary key of depositor is simply the primary key of account.
7. Entity-Relationship diagram:
• This is the most popular conceptual model used for designing a database.
• Once the entity types, relationship types, and their corresponding attributes have been identified, the
next step is to graphically represent these components using an entity-relationship (E-R) diagram.
• An E-R diagram is a specialized graphical tool that demonstrates the interrelationships among various
entities of a database. We use different symbols in E-R Diagram.
• An E-R diagram focuses on high-level database design and hides low-level details of database representation;
therefore it can be used to communicate with users of the system while collecting requirements.
• An entity set that does not have sufficient attributes to form a primary key is called a weak entity set.
One that does have a primary key is called a strong entity set.
• A weak entity set depends on a strong entity set for its existence. Such a strong entity set is
called the owner or identifying entity set.
• Weak entity sets do not have a primary key, but we still need a means of distinguishing among their entities. The
primary key of the owner entity set together with the discriminator of the weak entity set allows this distinction
to be made.
• An attribute of a weak entity set that is used in combination with the primary key of the strong entity set to
identify each weak entity uniquely is called the discriminator (partial key).
• Weak entity sets are represented by double rectangle, identifying relationship by double diamond and
discriminator by dotted line.
• Both the E-R model and the relational database model are logical representations of real-world
enterprises. Because the two models employ similar design principles, we can convert an E-R design
into a relational design.
• For schema derived from strong entity sets, the primary key of the entity set serves as a primary key of
the resulting schema.
• For example, consider the entity set student of the E-R diagram. This entity set has 4 attributes: SID,
Sname, address, phoneno. We represent this entity set by a schema called student with four attributes:
student (SID, Sname, address, phoneno)
• Note that since SID is the primary key of the entity set, it is also the primary key of the relation schema.
• Composite attributes are represented by their component attributes only. For example, for the composite
attribute Sname of the entity set Student, the generated schema contains the attributes f_name, m_name,
and l_name; there is no separate attribute or schema for Sname.
• The relational schema derived from the Student entity with complex attributes is thus:
Student (SID, f_name, m_name, l_name, street_name, ward_no., dist_name, Phone no.)
• For a multivalued attribute M, we create a relation schema R with an attribute A that corresponds to M
and attributes corresponding to the primary key of the entity set of which M is an attribute.
• For example, consider an entity set student, which includes the multi-valued attribute phone no. The
primary key of the entity set student is SID. For this multi-valued attribute, we create a relation schema
such as student_phone (SID, phone_no).
• We create a primary key of the relation schema consisting of all attributes of the schema. In the above
example the primary key consists of both SID and phone no.
• In addition, we create a foreign-key constraint on the relation schema created from a multi-valued
attribute. In the above example SID will be the foreign key.
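The multivalued-attribute rule above can be sketched with sqlite3. This is a minimal illustration, assuming hypothetical table names (student, student_phone); note that SQLite only enforces foreign keys when the `foreign_keys` pragma is on:

```python
import sqlite3

# Sketch: phone_no moves into its own relation whose primary key is
# {SID, phone_no} and whose SID is a foreign key referencing student.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")   # SQLite needs this to enforce FKs
con.execute("CREATE TABLE student (sid INTEGER PRIMARY KEY, name TEXT)")
con.execute("""CREATE TABLE student_phone (
    sid INTEGER REFERENCES student(sid),
    phone_no TEXT,
    PRIMARY KEY (sid, phone_no))""")
con.execute("INSERT INTO student VALUES (1, 'Ram')")
con.execute("INSERT INTO student_phone VALUES (1, '9841000000'), (1, '071-540000')")

# The foreign-key constraint rejects phones for a non-existent student.
try:
    con.execute("INSERT INTO student_phone VALUES (99, '9800000000')")
    ok = True
except sqlite3.IntegrityError:
    ok = False
print(ok)
```

One student can now hold several phone numbers, which the single-table design with an atomic phone attribute could not express.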
9.3 Representation of Weak Entity Sets:
• Let A be a weak entity set with attributes a1, a2, …, an. Let B be the strong entity set on which A depends.
Let the primary key of B consist of attributes b1, b2, …, bm. We represent the entity set A by a relation
schema called A with the attribute set
{a1, a2, …, an} ∪ {b1, b2, …, bm}
• For schemas derived from a weak entity set, the combination of the primary key of the strong entity set
and the discriminator of the weak entity set serves as the primary key of the schema.
• In addition to creating a primary key, we also create a foreign key constraint on the relation A.
• For example, consider the weak entity set section. This entity set has the attributes sec_id, semester, and year.
The primary key of the course entity set, on which section depends, is course_id. Thus we represent
section by a schema with the following attributes:
section (course_id, sec_id, semester, year)
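The weak-entity rule above (owner key + discriminator as the primary key, plus a foreign key back to the owner) can be sketched in sqlite3. Table contents are invented for illustration:

```python
import sqlite3

# The weak entity set section borrows course_id from its owner course;
# its primary key is {course_id, sec_id, semester, year}.
con = sqlite3.connect(":memory:")
con.execute("PRAGMA foreign_keys = ON")
con.execute("CREATE TABLE course (course_id TEXT PRIMARY KEY, title TEXT)")
con.execute("""CREATE TABLE section (
    course_id TEXT REFERENCES course(course_id),
    sec_id TEXT, semester TEXT, year INTEGER,
    PRIMARY KEY (course_id, sec_id, semester, year))""")
con.execute("INSERT INTO course VALUES ('CS101', 'Databases')")
con.execute("INSERT INTO section VALUES ('CS101', 'A', 'Fall', 2024)")
n = con.execute("SELECT COUNT(*) FROM section").fetchone()[0]
print(n)
```

Two sections of different courses may share the same discriminator values; only the combination with the owner's key must be unique.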
• The primary key for the relation schema is chosen as follows:
I. For a binary many-to-many relationship, the union of the primary key attributes from the participating
entity sets becomes the primary key.
II. For a binary one-to-one relationship set, primary key of either entity set can be chosen as the primary
key. The choice can be made arbitrarily.
III. For a binary many-to-one or one-to-many relationship set, the primary key of the entity set on the many
side of the relationship set serves as the primary key.
IV. For an n-ary relationship set without any arrows on its edges, the union of the primary key attributes from
the participating entity sets becomes the primary key.
V. For an n-ary relationship set with an arrow on one of its edges, the primary keys of the entity sets not on
the arrow side of the relationship set serve as the primary key for the schema.
• For example, consider the relationship set Enrolls. The relationship set involves the following two entity
sets: student (with primary key SID) and course (with primary key course-id).
• Since the relationship set has no attributes of its own, the Enrolls schema has two attributes, SID and course-id.
• We also create two foreign-key constraints on the Enrolls relation with attribute SID referencing the
primary key of student and attribute course-id referencing the primary key of course.
• The basic E-R model is not expressive enough to model complex applications such as databases for
engineering design and manufacturing, telecommunications, complex software systems, and geographic
information systems (GIS).
• The EER model includes all the concepts introduced by the ER model.
• Additionally it includes the concepts of specialization, generalization, higher and lower level entity sets,
attribute inheritance, and aggregation.
10.1 Specialization:
• An entity set may include sub groupings of entities that are distinct in some way from other entities in the
set. A subset of entities within an entity set may have attributes that are not shared by all the entities in
the entity set.
• Thus the E-R model provides the means of sub-grouping these distinct entities.
• The process of designating sub groupings within an entity set is called specialization.
• For example, the entity set person can be specialized into the entity sets employee and customer, because
employee entities can be further described by emp-id and salary, whereas customer entities can be
described by cus-id.
• Further on the basis of job type the entity set employee can be divided into entity sets manager, typist,
sweeper etc. Thus specialization may create hierarchy.
• In an E-R diagram, specialization is represented by a triangle component labeled ISA. The label ISA stands for
"is a" and represents, for example, that a customer "is a" person. This may be referred to as a superclass-
subclass relationship.
10.2 Generalization:
• Generalization is the reverse of specialization. It is a bottom-up design process.
• Here we combine a number of entity sets that share the same features into a higher-level entity set.
• Designers apply generalization to emphasize the similarities among the entity sets and hide their
differences.
• Specialization and generalization are simple inversions of each other; they are represented in the E-R
diagram in the same way.
• For example, the entity set "employee" can be viewed as a generalization of {officer, typist, sweeper}. In
reverse, we can view {officer, typist, sweeper} as a specialization of "employee".
• The attributes of higher-level entity sets are said to be inherited by lower-level entity sets.
• For example, customer and employee inherit the attributes of person. Thus a customer is described by its
name, street, and city attributes and additionally by a customer-id attribute; an employee is described by
its name, street, and city attributes and additionally by employee-id and salary.
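Attribute inheritance in specialization has a loose analogy in class inheritance. This is only an analogy (classes are an assumption of the sketch, not part of the E-R model), with the attribute names taken from the example above:

```python
class Person:
    def __init__(self, name, street, city):
        self.name, self.street, self.city = name, street, city

class Customer(Person):            # customer "is a" person
    def __init__(self, name, street, city, customer_id):
        super().__init__(name, street, city)
        self.customer_id = customer_id

class Employee(Person):            # employee "is a" person
    def __init__(self, name, street, city, employee_id, salary):
        super().__init__(name, street, city)
        self.employee_id, self.salary = employee_id, salary

# An employee entity carries inherited attributes plus its own.
e = Employee("Ram", "Main St", "Butwal", "E-7", 50000)
print(e.name, e.city, e.salary)
```

As in the E-R model, the lower-level class automatically carries the higher-level attributes (name, street, city) and adds its own (employee_id, salary).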
• Condition-defined vs user-defined: This type of constraint determines which entities can be members
of a given lower-level entity set. If we can determine exactly which entities can be members of each
subclass by a condition, the subclasses are called predicate-defined (or condition-defined); if
membership is determined by the database user, they are called user-defined.
• Disjoint vs overlapping: Constraints on whether or not entities may belong to more than one lower-
level entity set within a single specialization.
Disjoint: an entity can belong to only one lower-level entity set. Noted in E-R diagram by writing
disjoint next to the ISA triangle.
Overlapping: an entity can belong to more than one lower-level entity set.
• Completeness constraint: specifies whether or not an entity in the higher-level entity set must belong
to at least one of the lower-level entity sets within a generalization.
Total: every entity in the higher-level entity set must belong to at least one of the lower-level entity sets.
Partial: an entity need not belong to one of the lower-level entity sets.
10.5 Aggregation:
• A relationship set is an association between entity sets. Sometimes we have to model a relationship
between a collection of entities and a relationship set.
• The basic E-R model cannot express relationships among relationship sets, or between a relationship set
and an entity set.
• Consider a DB with information about employees who work on a particular project and use a number of
machines in doing that work. One alternative in this case is to use a ternary relationship among {employee,
project, machinery}.
• The main problem that will arise in this case is that it has redundant relationships.
• Every {employee, project} combination that appears in the uses relationship set also appears in the works
relationship set.
• The solution is to use aggregation: an abstraction through which relationship sets (along with their associated
entity sets) are treated as higher-level entities. This allows us to express relationships between a
relationship set and an entity set.
• For example, we treat the relationship set works and the entity sets employee and project as a higher-
level entity set.
ii. Here phone_number is used as an attribute, but if we want to keep extra information about a phone, such as
its location (home or office) or its type (mobile or landline), we can treat phone_number as an entity
set.
iii. Thus, if we take this point of view, we do not add the attribute phone_number to the instructor. Rather we
create: phone entity set with attribute phone_number and location and a relationship set inst_phone,
denoting the association between instructors and the phones.
iv. The difference is that treating a phone as an entity better models a situation where one may want to
keep extra information about a phone. Treating a phone as an attribute implies that instructors have
precisely one phone number each. Treating a phone as an entity permits instructors to have
several phone numbers and other information.
• Consider the takes relationship set to model the situation where a student takes a course.
• If we want to enter a course-registration record for each course that each student takes, then we have an
entity set to represent the course- registration record.
• Each registration entity is related to exactly one student and to exactly one section, so we have two
relationship sets, one to relate course registration records to student and one to relate course
registration record to section.
• Thus the entity sets section and student with the takes relationship set are replaced by one entity set and
two relationship sets:
- registration, the entity set representing course-registration records
- one relationship set relating registration and section, and one relating registration and student
• Both approaches represent the university's information, but the use of takes is more compact and
preferable. However, if we want to associate other information with a registration record, it might be best
to make it an entity in its own right.
Entity and entity set
An entity is a “thing” or “object” in the real world, either animate or
inanimate, that can be easily identifiable. For example, in a school
database, students, teachers, classes, and courses offered can be
considered as entities. All these entities have some attributes or properties
that give them their identity.
Attributes
Entities are represented by means of their properties, called attributes. All
attributes have values. For example, a student entity may have name,
class, roll no and age as attributes.
Types of Attributes
Simple attribute − Simple attributes are atomic values, which cannot be
divided further. For example, a student's phone number is an atomic value of
10 digits.
Candidate Key − A minimal super key is called a candidate key. An entity set
may have more than one candidate key.
Primary Key − A primary key is one of the candidate keys chosen by the
database designer to uniquely identify the entity set.
Relationship
The association among entities is called a relationship. For example, an
employee works_at a department, a student enrolls in a course. Here,
Works_at and Enrolls are called relationships.
Relationship Set
A set of relationships of similar type is called a relationship set. Like
entities, a relationship too can have attributes. These attributes are
called descriptive attributes.
Degree of Relationship
The number of participating entities in a relationship defines the degree of
the relationship.
Binary = degree 2
Ternary = degree 3
n-ary = degree n
Mapping Constraints
1 Mapping Cardinalities
Cardinality defines the number of entities in one entity set, which can be
associated with the number of entities of other set via relationship set.
One-to-one − One entity from entity set A can be associated with at most one
entity of entity set B and vice versa.
One-to-many − One entity from entity set A can be associated with more than
one entity of entity set B; however, an entity from entity set B can be
associated with at most one entity from entity set A.
Many-to-one − More than one entity from entity set A can be associated with
at most one entity of entity set B; however, an entity from entity set B can be
associated with more than one entity from entity set A.
Many-to-many − One entity from A can be associated with more than one
entity from B and vice versa.
2 Participation Constraints
Tuples or rows
• Consider the account table. It has three column headers: account-number, branch-name, and balance. We
refer to these headers as attributes.
• For each attribute, there is a set of permitted values, called the domain of that attribute. For the
attribute branch-name, the domain is the set of all branch names.
• The rows in a table are called tuples. For example, in the account relation above, (A101, Butwal, 500) is
a tuple.
• Attribute values are normally required to be atomic, that is, indivisible.
• The special value null is a member of every domain; it signifies that a value is unknown or does
not exist. Null values cause a number of difficulties when we access or update the database, and thus
should be eliminated if at all possible.
Let D1 denote the set of all account numbers, D2 the set of all branch names, and D3 the set of all
balances. Then the account relation is a subset of
D1 × D2 × D3
that is, the account relation is a relation over account-number × branch-name × balance.
In general, a table of n attributes is a subset of
D1 × D2 × … × Dn.
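The "relation as a subset of the Cartesian product of domains" idea can be verified directly with itertools. The tiny domains and tuples below are invented for illustration:

```python
from itertools import product

# Tiny finite domains; any account relation must be a subset of D1 x D2 x D3.
D1 = {"A101", "A102"}            # account numbers
D2 = {"Butwal", "Pokhara"}       # branch names
D3 = {500, 700}                  # balances

account = {("A101", "Butwal", 500), ("A102", "Pokhara", 700)}
universe = set(product(D1, D2, D3))   # every possible (number, branch, balance)

print(len(universe))       # 2 * 2 * 2 = 8 candidate tuples
print(account <= universe) # the relation picks out a subset of them
```

Real domains are far too large to enumerate; the point is only that a relation instance is one particular subset of all possible tuples.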
Database Schema:
• A database schema is the logical design of the database. We use the convention of lower-case
names for relations and names beginning with an uppercase letter for relation schemas.
• If A1, A2, A3 are attributes, then R = (A1, A2, A3) is a relation schema. For example:
Account-schema = (account number, branch-name, balance)
r(R) is a relation on relation schema R; For example:
account (Account-schema)
• In general, a relation schema consists of a list of attributes and their corresponding domains.
Database Instance:
• The contents of a relation change with time as the relation is updated.
• A database instance is a snapshot of the data in the database at a given instant of time.
• We often simply say relation when we actually mean relation instance, because the current values
(relation instance) of a relation are specified by a table.
Database:
• A database consists of multiple relations
• Information about an enterprise is broken up into parts, with each relation storing one part of the
information. E.g:
account: stores information about accounts
depositor: stores information about which customer owns which account
customer: stores information about customers
A database schema, along with primary key and foreign key dependencies, can be depicted pictorially
by schema diagrams.
The above figure shows the schema diagram for our banking enterprise. Each relation appears as a box,
with the attributes listed inside it and the relation name above it. If there are primary key attributes, we
draw a horizontal line across the box, with the primary key attributes listed above the line.
Foreign key dependencies appear as arrows from the foreign-key attributes of the referencing relation
to the primary key of the referenced relation. For example, the account-number attribute of the depositor
relation acts as a foreign key, referencing the primary key of the account relation. The dependency is
shown by the arrow.
(Note: do not confuse a schema diagram with an E-R diagram. In an E-R diagram, attributes are not listed in
columns, and foreign keys are not shown explicitly as they are in a schema diagram.)
Query Language:
• A query language is a language in which a user requests information from the database.
• Query language can be categorized as either procedural or non-procedural.
• In procedural language, the user instructs the system to perform a sequence of operations on the
database to compute the desired result.
• In nonprocedural language, the user describes the desired information without giving a specific
procedure for obtaining that information.
• The relational algebra is procedural, whereas tuple relational calculus and domain relational calculus
are non-procedural approaches.
Fundamental Operations:
• The select, project and rename operations are called unary operations, because they operate on one
relation.
• The union, set difference and cartesian product are binary operations, because they operate on pairs of
relations.
Note: The four tables below (account, depositor, loan, borrower) are used to explain all the fundamental
relational-algebra operations.
[Tables: account, depositor, loan, borrower — data not reproduced here]
1. The Select Operation:
• The select operation, denoted σ, selects tuples that satisfy a given predicate.

Relation r:
A | B | C  | D
α | α | 1  | 7
α | β | 5  | 7
β | β | 12 | 3
β | β | 23 | 10

σA=B ∧ D>5(r):
A | B | C  | D
α | α | 1  | 7
β | β | 23 | 10
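The select operation can be sketched in Python as a filter over a set of tuples; α and β are written as the strings "alpha" and "beta":

```python
# A toy relation r over attributes (A, B, C, D).
r = {
    ("alpha", "alpha", 1, 7),
    ("alpha", "beta", 5, 7),
    ("beta", "beta", 12, 3),
    ("beta", "beta", 23, 10),
}

def select(rel, pred):
    """Relational sigma: keep the tuples of rel that satisfy pred."""
    return {t for t in rel if pred(t)}

# sigma_{A = B and D > 5}(r): positions 0/1 are A/B, position 3 is D.
result = select(r, lambda t: t[0] == t[1] and t[3] > 5)
print(sorted(result))
```

Like σ itself, the filter never changes the shape of the tuples; it only drops those failing the predicate.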
• The Union operation is a binary operation involving a pair of relations. For r ∪ s to be valid, r and s must
have the same arity (number of attributes), and the domains of corresponding attributes must be compatible.
• Consider a query to find the customer name of the bank who have either an account or a loan or both.
• To answer this query, we need the information in the depositor relation and in the borrower relation.
• To find the names of all customers with a loan in the bank, we write:
Πcustomer-name (borrower)
• To find the names of all customers with an account in the bank, we write:
Πcustomer-name (depositor)
• To answer the query we need the union of these two sets because we need all the customer names that
appear in either or both of the two relations. So the expression needed is:
Πcustomer-name (borrower) ∪ Πcustomer-name (depositor)
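With relations as Python sets of tuples, projection followed by union is a direct translation; the sample customers are invented for illustration:

```python
# Toy depositor and borrower relations: (customer-name, account/loan number).
depositor = {("Hari", "A101"), ("Gita", "A102")}
borrower  = {("Hari", "L-15"), ("Sita", "L-16")}

def project(rel, *indexes):
    """Relational Pi: keep only the listed column positions (duplicates collapse)."""
    return {tuple(t[i] for i in indexes) for t in rel}

# Pi_customer-name(borrower) U Pi_customer-name(depositor)
names = project(borrower, 0) | project(depositor, 0)
print(sorted(names))
```

Because both operands are sets, a customer appearing in both relations (Hari here) shows up only once, exactly as in relational-algebra union.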
• The set-difference operation, denoted by (–), allows us to find tuples that are in one relation but not in
another.
• The expression r – s produces a relation containing those tuples in r but not in s.
• Thus we can find all the customers of the bank who have an account but not a loan by writing
Πcustomer-name (depositor) – Πcustomer-name (borrower)
• As with the union operation, set differences must be taken between compatible relations: the two
relations must have the same arity, and the domains of corresponding attributes must be compatible.
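The same query reads naturally with Python's set difference operator; the sample data is invented for illustration:

```python
# Customers with an account but no loan: difference of name projections.
depositor = {("Hari", "A101"), ("Gita", "A102")}
borrower  = {("Hari", "L-15"), ("Sita", "L-16")}

dep_names = {t[0] for t in depositor}   # Pi_customer-name(depositor)
bor_names = {t[0] for t in borrower}    # Pi_customer-name(borrower)

result = dep_names - bor_names          # relational r - s
print(sorted(result))
```

Only Gita remains: Hari is removed because he also holds a loan, mirroring the Π(depositor) − Π(borrower) expression above.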
Relation r:
A | B
α | 1
β | 2

Relation s:
C | D  | E
α | 10 | a
β | 10 | a
β | 20 | b

r × s:
A | B | C | D  | E
α | 1 | α | 10 | a
α | 1 | β | 10 | a
α | 1 | β | 20 | b
β | 2 | α | 10 | a
β | 2 | β | 10 | a
β | 2 | β | 20 | b
• The Cartesian-product operation, denoted by a cross (X), allows
us to combine information from any two relations. We write the Cartesian product of relations r and s
as
r X s.
• Here the same attribute name may appear in both r and s, so we need to differentiate between these
attributes. We differentiate between the attributes by attaching the name of relation to the attribute
from which the attribute originally came. For example the relation schema for borrower X loan is
(customer-name, borrower.loan-number, loan.loan-number, branch-name, amount)
• Assume that we have n1 tuples in borrower and n2 tuples in loan; then there are n1 X n2 ways of
choosing a pair of tuples. Thus the Cartesian product results in a large table.
• Suppose that we want to find the names of all customers who have a loan at the bank, along with the
branch name and amount. For this we need the information in both the loan and borrower relations.
Therefore, we take the Cartesian product of borrower and loan (borrower X loan). We know that there
is some tuple in borrower X loan that has
borrower.loan-number = loan.loan-number.
Thus we write
Πcustomer-name, branch-name, amount (σborrower.loan-number = loan.loan-number (borrower X loan))
This will produce the required result.
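As a concrete check, the expression above can be run as SQL, where the selection condition becomes the where clause and the projection becomes the select list. The following is a minimal sketch using SQLite; the sample tuples are invented for illustration, not data from these notes:

```python
import sqlite3

# In-memory database with the borrower and loan schemas from the notes;
# the tuples themselves are made up.
con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table borrower (customer_name text, loan_number text)")
cur.execute("create table loan (loan_number text, branch_name text, amount int)")
cur.executemany("insert into borrower values (?, ?)",
                [("Hari", "L-11"), ("Sita", "L-17")])
cur.executemany("insert into loan values (?, ?, ?)",
                [("L-11", "Butwal", 900), ("L-17", "Sunwal", 1500)])

# Cartesian product (comma in FROM), selection (WHERE), projection (SELECT list):
rows = cur.execute("""
    select customer_name, branch_name, amount
    from borrower, loan
    where borrower.loan_number = loan.loan_number
    order by customer_name
""").fetchall()
print(rows)  # [('Hari', 'Butwal', 900), ('Sita', 'Sunwal', 1500)]
```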
Practice questions:
branch (branch-name, branch-city, assets)
customer (customer-name, customer-street, customer-city )
account (account-number, branch-name, balance)
loan (loan-number, branch-name, amount)
depositor (customer-name, account-number)
borrower (customer-name, loan-number)
a. Find all loans of over $1200.
b. Find the loan number for each loan of an amount greater than $1200
c. Find the names of all customers who have a loan, an account, or both, from the bank
d. Find the names of all customers who have a loan at the Butwal branch.
e. Find the names of all customers who have a loan at the Butwal branch but do not have an account at the
bank.
f. Find the largest account balance.
Answers for above questions:
a. σamount > 1200 (loan)
b. ∏loan-number (σamount > 1200 (loan))
c. ∏customer-name (borrower) ∪ ∏customer-name (depositor)
d. ∏customer-name (σbranch-name = Butwal (σborrower.loan-number = loan.loan-number (borrower x loan)))
e. ∏customer-name (σbranch-name = Butwal (σborrower.loan-number = loan.loan-number (borrower x loan))) – ∏customer-name (depositor)
Additional Operations:
• The fundamental operations of the relational algebra are sufficient to express any relational-algebra
query.
• However if we restrict ourselves to just the fundamental operations, certain common queries are
lengthy to express.
• Therefore, there are additional operations that do not add any power to the algebra, but simplify
common queries.
• Those additional operations are:
set intersection, natural join, division, and assignment.
• Suppose that we want to find the names of all customers who have a loan at the bank, along with the
branch name and amount. We first form the Cartesian product of the borrower and loan relations;
then we select those tuples that have the same loan-number, followed by the projection operation:
Πcustomer-name, branch-name, amount (σborrower.loan-number = loan.loan-number (borrower X loan))
• The natural join, denoted by ⋈, simplifies this operation. We can obtain the same result by using the
natural join as follows:
Πcustomer-name, branch-name, amount (borrower ⋈ loan)
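That the natural join gives the same result as the Cartesian-product form can be verified in SQLite, whose natural join matches the shared column loan_number implicitly. The sample tuples are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table borrower (customer_name text, loan_number text)")
cur.execute("create table loan (loan_number text, branch_name text, amount int)")
cur.executemany("insert into borrower values (?, ?)",
                [("Hari", "L-11"), ("Sita", "L-17")])
cur.executemany("insert into loan values (?, ?, ?)",
                [("L-11", "Butwal", 900), ("L-17", "Sunwal", 1500)])

# Explicit join condition: Cartesian product plus selection ...
explicit = cur.execute("""
    select customer_name, branch_name, amount
    from borrower, loan
    where borrower.loan_number = loan.loan_number
    order by customer_name
""").fetchall()

# ... versus natural join: the common attribute is matched implicitly.
natural = cur.execute("""
    select customer_name, branch_name, amount
    from borrower natural join loan
    order by customer_name
""").fetchall()
print(natural == explicit)  # True
```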
2. Insertion:
• To insert data into a relation, we either specify a tuple to be inserted or write a query whose result is a
set of tuples to be inserted.
• The attribute value for the inserted tuples must be members of the attribute’s domain.
• Similarly, tuples inserted must be of the correct arity; that is, the number of attributes must be the same.
• In relational algebra, an insertion is expressed by
r ← r ∪ E
where, r is a relation and E is a relational algebra expression
• We express the insertion of a single tuple by letting E be a constant relation containing one tuple
Example:
• Insert information in the database specifying that Meena has Rs 1200 in account A-209 at the Sunwal
branch.
account ← account ∪ {(A-209, Sunwal, 1200)}
depositor ← depositor ∪ {(Meena, A-209)}
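In SQL the same two insertions are ordinary insert statements. A runnable sketch, using invented schemas from the notes:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table account (account_number text, branch_name text, balance int)")
cur.execute("create table depositor (customer_name text, account_number text)")

# account <- account U {(A-209, Sunwal, 1200)}
cur.execute("insert into account values ('A-209', 'Sunwal', 1200)")
# depositor <- depositor U {(Meena, A-209)}
cur.execute("insert into depositor values ('Meena', 'A-209')")

accounts = cur.execute("select * from account").fetchall()
print(accounts)  # [('A-209', 'Sunwal', 1200)]
```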
3. Updating:
• It is a mechanism to change a value in a tuple without changing all values in the tuple.
• We use a generalized projection operator to do this task
r ← ΠA1, A2, ….., An (r)
• Each Ai is either
the ith attribute of r, if the ith attribute is not updated, or,
if the attribute is to be updated Ai is an expression, involving only constants and the attributes of r,
which gives the new value for the attribute.
Example:
• Make interest payments by increasing all balances by 5 percent
account ← Πaccount-number, branch-name, balance * 1.05 (account)
• Pay all accounts with balances over 1000, 6 percent interest and pay all others 5 percent
account ← Πaccount-number, branch-name, balance * 1.06 (σbalance > 1000 (account)) ∪ Πaccount-number, branch-name, balance * 1.05 (σbalance ≤ 1000 (account))
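The two-rate update can be mimicked directly in plain Python, treating the relation as a list of tuples; the sample balances are invented:

```python
# account as a list of (account_number, branch_name, balance) tuples (invented data)
account = [("A-101", "Butwal", 500), ("A-102", "Sunwal", 2000)]

# Generalized projection: each attribute is either kept as-is or replaced by an
# expression -- here balance*1.06 for balances over 1000, else balance*1.05.
account = [(no, br, round(bal * 1.06 if bal > 1000 else bal * 1.05, 2))
           for (no, br, bal) in account]
print(account)  # [('A-101', 'Butwal', 525.0), ('A-102', 'Sunwal', 2120.0)]
```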
Views:
• In some cases, it is not desirable for all users to see the entire logical model, that is all the actual
relations stored in the database. Security consideration may require that certain data be hidden from
users.
• Consider a person who needs to know the customer’s account number but has no need to see the
balance. This person should see a relation described, in relational algebra by
∏customer-name, account-number (account ⋈ depositor)
• Any relation that is not part of the logical model, but is made visible to a user as a virtual relation, is
called a view.
View Definition:
• To define a view, we must give the view a name, and must state a query that computes the view.
• A view is defined using the create view statement which has the form
create view v as <query expression>
Where, <query expression> is any relational algebra query expression and the view name is represented by v.
• As an example, consider the view consisting of the customers' names and their account numbers. If we
wish this view to be called cust_acc, we define this view as follows:
create view cust_acc as
∏customer-name, account-number (account ⋈ depositor)
• Once the view is defined, we can use the view name to refer to the virtual relation that the view
generates.
• Using the view cust_acc, we can find all the customers who have an account by writing
∏customer-name (cust_acc)
• At any given time, the set of tuples in the view relation is the result of evaluation of the query
expression that defines the view at that time.
• Thus if a view relation is computed and stored, it may become out of date if the relations used to define
it are modified.
• To avoid this, views are usually implemented as follows:
When we define view, the database system stores the definition of view itself rather than the result of
evaluation of the relational-algebra expression that defines the view
Whenever a view relation appears in a query, it is replaced by the stored query expression.
Thus, whenever we evaluate the query, the view relation gets recomputed.
Certain database systems allow view relations to be stored, but they make sure that, if the actual
relations used in the view definition change, the view is kept up to date. Such views are called
materialized views. The process of keeping the view up to date is called view maintenance.
Difference in assignment operation and view definition:
We evaluate the assignment operation once, and the assignment variable does not change when we
update the relations that involves in the query expression assigned to it. In contrast, any modification
we make to the relations involved in defining the views, changes the set of tuples in the view relation as
well.
CHAPTER-4
To explain all the terms in this chapter we will be using the following relation schema:
Branch (branch_name, branch_city, assets)
Customer (customer_name, customer_street, customer_city)
Loan (loan_number, branch_name, amount)
Borrower (customer_name, loan_number)
Account (account_number, branch_name, balance)
Depositor(customer_name, account_number)
Tuple Variables
• A tuple variable in SQL must be associated with a particular relation.
• We define a tuple variable in the from clause by placing it after the name of the relation with
which it is associated, with the keyword as in between.
• To illustrate it, we rewrite the query "For all customers who have a loan from the bank, find the
names, loan numbers, and loan amounts" as
select customer_name, T.loan_number, S.amount
from borrower as T, loan as S
where T.loan_number = S.loan_number
• Tuple variables are most useful for comparing two tuples in the same relation.
• Suppose that we want the query Find the names of all branches that have assets greater than
those of at least one branch located in Butwal.
• We can write the expression as
select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and S.branch_city = 'Butwal'
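The self-join above runs unchanged in SQLite. A minimal sketch with invented branch tuples:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table branch (branch_name text, branch_city text, assets int)")
cur.executemany("insert into branch values (?, ?, ?)", [
    ("Butwal", "Butwal", 100),       # invented sample branches
    ("Tilottama", "Butwal", 250),
    ("Sunwal", "Sunwal", 400),
])

# T and S are two tuple variables ranging over the same branch relation.
rows = cur.execute("""
    select distinct T.branch_name
    from branch as T, branch as S
    where T.assets > S.assets and S.branch_city = 'Butwal'
    order by T.branch_name
""").fetchall()
print(rows)  # [('Sunwal',), ('Tilottama',)]
```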
String Operation
• SQL specifies strings by enclosing them in single quotes, for example, 'Butwal'.
• A single-quote character that is part of a string can be specified by using two single-quote
characters; for example, the string It's right can be specified as 'It''s right'.
• The most commonly used operation on strings is pattern matching using the operator like. We
describe the patterns by using two special characters:
Percent (%): The % character matches any substring.
Underscore (_): The _ character matches any character.
• Consider the following examples to illustrate pattern matching:
'But%' matches any string beginning with But.
'%tw%' matches any string containing tw as a substring.
'_ _' matches any string of exactly two characters.
'_ _ %' matches any string of at least two characters.
• Consider the query "Find the names of all customers whose street address includes the
substring nag". The query can be written as
select customer_name
from customer
where customer_street like '%nag%'
• For patterns to include special pattern characters (that is % and _), SQL allows the specification
of an escape character.
• The escape character is used immediately before a special pattern character to indicate that
the special pattern character is to be treated like a normal character.
• We define the escape character for a like comparison using the escape keyword
• Consider the following patterns, which use a backslash (\) as the escape character:
like 'ab\%cd%' escape '\' matches all strings beginning with ab%cd.
like 'ab\_cd%' escape '\' matches all strings beginning with ab_cd.
• SQL allows us to search for mismatches instead of matches by using the not like comparison
operator.
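These patterns can be checked directly in SQLite, which evaluates like as an expression returning 1 (match) or 0 (no match); the strings tested here are invented examples:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()

m1 = cur.execute("select 'Butwal' like 'But%'").fetchone()[0]   # begins with But
m2 = cur.execute("select 'ab' like '__'").fetchone()[0]         # exactly two chars
m3 = cur.execute("select 'abc' like '__'").fetchone()[0]        # three chars: no match
# Escaped pattern: \% stands for a literal % in the matched string.
m4 = cur.execute(r"select 'ab%cd-x' like 'ab\%cd%' escape '\'").fetchone()[0]
print(m1, m2, m3, m4)  # 1 1 0 1
```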
Ordering of Tuples
• SQL offers the user some control over the order in which tuples in a relation are displayed.
• The order by clause causes the tuples in the result of a query to appear in sorted order.
• To list in alphabetic order all customers who have a loan at the Butwal branch, we write
select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and branch_name = 'Butwal'
order by customer_name
• By default, the order by clause lists the items in ascending order.
• To specify the sort order, we may specify desc for descending order or asc for ascending order.
• Furthermore, ordering can be performed on multiple attributes.
• Suppose that we wish to list the entire loan relation in descending order of amount. If several
loans have the same amount, we order them in ascending order by loan number. We express
this query in SQL as follows:
select *
from loan
order by amount desc, loan_number asc
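The two-attribute sort can be seen on a small invented loan relation:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table loan (loan_number text, branch_name text, amount int)")
cur.executemany("insert into loan values (?, ?, ?)", [
    ("L-1", "Butwal", 900), ("L-3", "Sunwal", 1500), ("L-2", "Butwal", 1500),
])

# amount descending; ties broken by loan_number ascending
rows = cur.execute(
    "select * from loan order by amount desc, loan_number asc").fetchall()
print([r[0] for r in rows])  # ['L-2', 'L-3', 'L-1']
```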
Set Operations
• The set operations union, intersect, and except operate on relations and correspond to the
relational-algebra operations ∪, ∩, and −.
• Each of the above operations automatically eliminates duplicates; to retain all duplicates use
the corresponding multiset versions union all, intersect all and except all.
• Like union, intersection, and set difference in relational algebra, the relations participating in
the set operations must be compatible; that is, they must have the same set of attributes.
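All three operators are supported by SQLite. A sketch on invented depositor and borrower tuples:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table depositor (customer_name text, account_number text)")
cur.execute("create table borrower (customer_name text, loan_number text)")
cur.executemany("insert into depositor values (?, ?)",
                [("Hari", "A-1"), ("Meena", "A-2")])
cur.executemany("insert into borrower values (?, ?)",
                [("Hari", "L-1"), ("Sita", "L-2")])

q = ("select customer_name from depositor {} "
     "select customer_name from borrower order by 1")
both = cur.execute(q.format("union")).fetchall()        # account or loan or both
common = cur.execute(q.format("intersect")).fetchall()  # both
only_acc = cur.execute(q.format("except")).fetchall()   # account but no loan
print(both, common, only_acc)
```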
Aggregate Functions
• Aggregate functions are functions that take a collection of values as input and return a single
value. SQL offers five built-in aggregate functions:
avg: average value
min: minimum value
max: maximum value
sum: sum of values
count: number of values
• Consider the query "Find the average balance at the Butwal branch". We write this query as
follows:
select avg (balance)
from account
where branch_name = 'Butwal'
The result of this query is a relation with a single attribute, containing a single tuple with a numerical
value corresponding to the average balance at the Butwal branch.
• There are cases where we must eliminate duplicates while computing an aggregate, and for this we
use distinct in the aggregate expression. For example:
Find the number of branches of the bank.
select count (distinct branch_name)
from account
• We use the aggregate function count frequently to count the number of tuples in a relation. We
use the notation (*) for this. For example
Find the number of tuples in the account relation.
select count (*)
from account
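The three aggregate queries above can be run together on a small invented account relation:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table account (account_number text, branch_name text, balance int)")
cur.executemany("insert into account values (?, ?, ?)", [
    ("A-1", "Butwal", 500), ("A-2", "Butwal", 700), ("A-3", "Sunwal", 900),
])

avg_butwal = cur.execute(
    "select avg(balance) from account where branch_name = 'Butwal'").fetchone()[0]
branches = cur.execute(
    "select count(distinct branch_name) from account").fetchone()[0]
tuples = cur.execute("select count(*) from account").fetchone()[0]
print(avg_butwal, branches, tuples)  # 600.0 2 3
```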
1. Aggregate Functions – Group By:
• The group by clause is used where we would like to apply the aggregate function not only to
the set of tuples but also to a group of sets of tuples.
• The attributes given in the group by clause are used to form groups.
• Tuples with same value on the attribute in the group by clause are placed in one group.
• For example: "Find the average account balance at each branch". We write this query as
follows:
select branch_name, avg (balance)
from account
group by branch_name
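Grouping produces one output tuple per branch, as the following sketch on invented tuples shows:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table account (account_number text, branch_name text, balance int)")
cur.executemany("insert into account values (?, ?, ?)", [
    ("A-1", "Butwal", 500), ("A-2", "Butwal", 700), ("A-3", "Sunwal", 900),
])

# one output tuple per group of rows sharing the same branch_name
rows = cur.execute("""
    select branch_name, avg(balance)
    from account
    group by branch_name
    order by branch_name
""").fetchall()
print(rows)  # [('Butwal', 600.0), ('Sunwal', 900.0)]
```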
Null values
• It is possible for tuples to have a null value, denoted by null, for some of their attributes
• Null signifies an unknown value or that a value does not exist.
• The predicate is null can be used to check for null values.
E.g. Find all account number which appear in the account relation with null values for balance.
select account_number
from account
where balance is null
• The result of any arithmetic expression involving null is null
E.g. 5 + null returns null
• Any comparison with null returns null
E.g. 5 < null returns null
• However, aggregate functions simply ignore nulls except count (*).
• Since the predicate in a where clause can involve operations such as and, or, and not on the
results of comparisons, such operations deal with unknown values as follows:
OR: (unknown or true) = true, (unknown or false) = unknown, (unknown or unknown) =
unknown
AND: (true and unknown) = unknown, (false and unknown) = false, (unknown and unknown)
= unknown
NOT: (not unknown) = unknown
(Note that a tuple is included in the result only if the predicate evaluates to true; if the predicate
evaluates to false or unknown, the tuple is excluded.)
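The null rules above can be observed directly in SQLite; the account tuples here are invented:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table account (account_number text, branch_name text, balance int)")
cur.executemany("insert into account values (?, ?, ?)",
                [("A-1", "Butwal", 500), ("A-9", "Butwal", None)])

arith = cur.execute("select 5 + null").fetchone()[0]   # arithmetic with null -> null
comp = cur.execute("select 5 < null").fetchone()[0]    # comparison with null -> null
null_accs = cur.execute(
    "select account_number from account where balance is null").fetchall()
# count(*) counts all tuples; count(balance) ignores the null balance
counted = cur.execute("select count(*), count(balance) from account").fetchone()
print(arith, comp, null_accs, counted)  # None None [('A-9',)] (2, 1)
```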
Nested Subqueries
• SQL provides a mechanism for the nesting of subqueries.
• A subquery is a select-from-where expression that is nested
• A common use of subqueries is to perform tests for set membership, set comparisons, and set
cardinality.
1. Set Membership
• The in connective tests for set membership, where the set is the collection of values produced
by a select clause.
• The not in connective tests for the absence of set membership.
• Consider the query "Find all the customers who have both a loan and an account at the bank".
We can write the query as
select distinct customer_name
from borrower
where customer_name in (select customer_name
from depositor)
• Similarly, to find all customers who have a loan at the bank, but do not have an account
at the bank, we can write:
select distinct customer_name
from borrower
where customer_name not in (select customer_name
from depositor)
• The following query selects the names of customers who have a loan at the bank, and whose
names are neither Hari nor Sita.
select distinct customer_name
from borrower
where customer_name not in ('Hari', 'Sita')
• Find all customers who have both an account and a loan at the Butwal branch.
select distinct customer_name
from borrower, loan
where borrower.loan_number = loan.loan_number and
branch_name = 'Butwal' and
(branch_name, customer_name) in
(select branch_name, customer_name
from depositor, account
where depositor.account_number = account.account_number)
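The in and not in tests can be exercised on invented borrower and depositor tuples:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table borrower (customer_name text, loan_number text)")
cur.execute("create table depositor (customer_name text, account_number text)")
cur.executemany("insert into borrower values (?, ?)",
                [("Hari", "L-1"), ("Sita", "L-2")])
cur.executemany("insert into depositor values (?, ?)",
                [("Hari", "A-1"), ("Meena", "A-2")])

both = cur.execute("""
    select distinct customer_name from borrower
    where customer_name in (select customer_name from depositor)
""").fetchall()                    # loan and account
loan_only = cur.execute("""
    select distinct customer_name from borrower
    where customer_name not in (select customer_name from depositor)
""").fetchall()                    # loan but no account
print(both, loan_only)  # [('Hari',)] [('Sita',)]
```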
2.Set Comparison:
• Nested subquery can be used to compare sets.
• Consider a query "Find the names of all branches that have assets greater than those of at least
one branch in Butwal".
• Here, the phrase "greater than at least one" is represented in SQL by > some. Thus the query
can be written as:
select branch_name
from branch
where assets > some (select assets
from branch
where branch_city = 'Butwal')
• SQL allows < some, <= some, >= some, and = some comparisons.
• Similarly, the construct > all corresponds to the phrase "greater than all".
• If we modify the above query to "Find the names of all branches that have an asset value
greater than that of each branch in Butwal", then we write the query as follows:
select branch_name
from branch
where assets > all (select assets
from branch
where branch_city = 'Butwal')
• SQL also allows < all, <= all, >= all, and = all comparisons.
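SQLite does not implement the some/all constructs, but for a non-empty subquery result with no nulls, > some is equivalent to comparing against the minimum and > all to comparing against the maximum. A sketch on invented branch tuples:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table branch (branch_name text, branch_city text, assets int)")
cur.executemany("insert into branch values (?, ?, ?)", [
    ("Butwal", "Butwal", 100), ("Tilottama", "Butwal", 250), ("Sunwal", "Sunwal", 400),
])

# assets > some (subquery)  ==  assets > (select min(...) ...)
gt_some = cur.execute("""
    select branch_name from branch
    where assets > (select min(assets) from branch where branch_city = 'Butwal')
    order by branch_name
""").fetchall()

# assets > all (subquery)  ==  assets > (select max(...) ...)
gt_all = cur.execute("""
    select branch_name from branch
    where assets > (select max(assets) from branch where branch_city = 'Butwal')
""").fetchall()
print(gt_some, gt_all)  # [('Sunwal',), ('Tilottama',)] [('Sunwal',)]
```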
Derived Relations
• SQL allows a subquery expression to be used in the from clause.
• If we use such an expression, then we must give the result relation a name, and we can rename
the attributes.
• We do this renaming by using the as clause. For example, consider the subquery
(select branch-name, avg (balance)
from account
group by branch-name)
as branch-avg (branch-name, avg-balance)
• The subquery result is named branch-avg, with the attributes branch-name and avg-balance.
• To illustrate the use of a subquery expression in the from clause, consider the query "Find the
average account balance of those branches where the average account balance is greater than
$1200".
• We can write the query as:
select branch-name, avg-balance
from (select branch-name, avg (balance)
from account
group by branch-name)
as branch-avg (branch-name, avg-balance)
where avg-balance > 1200
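The same query runs in SQLite, with the small difference that the column is renamed with as inside the subquery rather than with the branch-avg(branch-name, avg-balance) form; the account tuples are invented:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table account (account_number text, branch_name text, balance int)")
cur.executemany("insert into account values (?, ?, ?)", [
    ("A-1", "Butwal", 500), ("A-2", "Butwal", 700),
    ("A-3", "Sunwal", 1200), ("A-4", "Sunwal", 1800),
])

# the subquery in FROM is the derived relation
rows = cur.execute("""
    select branch_name, avg_balance
    from (select branch_name, avg(balance) as avg_balance
          from account
          group by branch_name) as branch_avg
    where avg_balance > 1200
""").fetchall()
print(rows)  # [('Sunwal', 1500.0)]
```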
Views
• We define a view in SQL by using the create view command.
• To define a view, we must give the view a name and must state the query that computes the
view. The form of the create view command is
create view v as <query expression>
where <query expression> is any legal query expression. The view name is represented by v.
• As an example, consider the view consisting of branch names and the names of customers who
have either an account or a loan at that branch. Assume that we want this view to be called all-
customer. We define this view as follows:
create view all-customer as
(select branch-name, customer-name
from depositor, account
where depositor.account-number = account.account-number)
union
(select branch-name, customer-name
from borrower, loan
where borrower.loan-number = loan.loan-number)
• View names may appear in any place where a relation name may appear. Using the view all-
customer, we can find all customers of the Butwal branch by writing
select customer-name
from all-customer
where branch-name = 'Butwal'
• The attribute names of a view can be specified explicitly as follows:
create view branch-total-loan(branch-name, total-loan) as
select branch-name, sum(amount)
from loan
group by branch-name
• The preceding view gives for each branch the sum of the amounts of all the loans at the branch.
Since the expression sum(amount) does not have a name, the attribute name is specified
explicitly in the view definition.
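The all-customer view and the Butwal query can be run end to end in SQLite; the tuples are invented (underscores replace hyphens in identifiers, as SQLite requires):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
create table depositor (customer_name text, account_number text);
create table account (account_number text, branch_name text, balance int);
create table borrower (customer_name text, loan_number text);
create table loan (loan_number text, branch_name text, amount int);
insert into depositor values ('Meena', 'A-1');
insert into account values ('A-1', 'Butwal', 500);
insert into borrower values ('Hari', 'L-1'), ('Sita', 'L-2');
insert into loan values ('L-1', 'Butwal', 900), ('L-2', 'Sunwal', 1500);

create view all_customer as
    select branch_name, customer_name
    from depositor, account
    where depositor.account_number = account.account_number
    union
    select branch_name, customer_name
    from borrower, loan
    where borrower.loan_number = loan.loan_number;
""")

# the view name is used exactly like a relation name
rows = cur.execute("""
    select customer_name from all_customer
    where branch_name = 'Butwal' order by customer_name
""").fetchall()
print(rows)  # [('Hari',), ('Meena',)]
```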
3. Updates
• The update statement is used to change a value in a tuple without changing all values in the tuple.
• Increase all balances by 5 percent:
update account
set balance = balance * 1.05
• If the increment is to be done with a balance of 1000 or more, then, we write:
update account
set balance = balance * 1.05
where balance >= 1000
• Let us now suppose that all accounts with balances over 1,000 receive 6 percent interest,
whereas all others receive 5 percent. We could write two update statements:
update account
set balance = balance * 1.06
where balance > 1000
update account
set balance = balance * 1.05
where balance <= 1000
• SQL provides a case construct, which we can use to perform both the updates with a single
update statement, avoiding the problem with order of updates.
update account
set balance = case
when balance <= 1000 then balance * 1.05
else balance * 1.06
end
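The single-statement case update can be verified in SQLite on invented balances:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table account (account_number text, branch_name text, balance real)")
cur.executemany("insert into account values (?, ?, ?)",
                [("A-1", "Butwal", 800), ("A-2", "Sunwal", 2000)])

# one statement, so there is no order-of-updates problem
cur.execute("""
    update account
    set balance = round(case when balance <= 1000 then balance * 1.05
                             else balance * 1.06
                        end, 2)
""")
balances = [r[0] for r in
            cur.execute("select balance from account order by balance").fetchall()]
print(balances)  # [840.0, 2120.0]
```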
4. Update of a view
• Note: borrower information is missing for L-301 and loan information is missing for L-401.
• The expression computes the inner join of the loan and the borrower relations, with the join
condition being loan.loan-number = borrower.loan-number.
• The attributes of the result consist of the attributes of the left-hand-side relation followed by
the attributes of the right-hand-side relation.
• We can rename the result relation and the attributes by using as clause.
loan inner join borrower on loan.loan-number = borrower.loan-number
as lb(loan-number, branch, amount, cust, cust-loan-num)
• The result is similar to the result of the inner join with the on condition, since they have, in
effect, the same join condition. However, the attribute loan-number appears only once in the
result of the natural join, whereas it appears twice in the result of the join with the on
condition.
• The attributes of the result are defined by the join condition, which is a natural join; hence,
loan-number appears only once. The first two tuples in the result are from the inner natural
join of loan and borrower. The tuple (Sita, L-401) from the right-hand-side relation does not
match any tuple from the left-hand-side relation loan in the natural inner join. Hence, the tuple
(L-401, null, null, Sita) appears in the join result.
Transaction
• A transaction consists of a sequence of query and/or update statements. The SQL standard
specifies that a transaction begins implicitly when an SQL statement is executed. One of the
following SQL statements must end the transaction:
1. Commit work commits the current transaction; that is, it makes the updates performed by the
transaction become permanent in the database. After the transaction is committed, a new
transaction is automatically started.
2. Rollback work causes the current transaction to be rolled back; that is, it undoes all the
updates performed by the SQL statements in the transaction. Thus, the database state is
restored to what it was before the first statement of the transaction was executed.
• Commit is similar to saving changes to a document that is being edited.
• Rollback is similar to quitting the edit session without saving changes.
• Transaction rollback is useful if some error condition is detected during execution of a
transaction.
• The database system guarantees that in the event of some failure, such as an error in one of the
SQL statements, a power outage, or a system crash, a transaction's effects will be rolled back if
it has not yet executed commit work.
Example:
• Transfer of money from one account to another involves two steps:
deduct from one account and credit to another
• If one step succeeds and the other fails, the database is in an inconsistent state. Therefore, either
both steps should succeed or neither should.
• If any step of a transaction fails, all work done by the transaction can be undone by rollback
work.
• Rollback of incomplete transactions is done automatically, in case of system failures
• In many SQL implementations, by default each SQL statement is taken to be a transaction on its
own, and gets committed as soon as it is executed.
• Automatic commit of individual SQL statements must be turned off if a transaction consisting
of multiple SQL statements needs to be executed. How to turn off automatic commit depends
on the specific SQL implementation.
• An alternative, which is part of the SQL:1999 standard, is to allow multiple SQL statements to
be enclosed between the keywords
begin atomic . . .
end
All the statements between the keywords then form a single transaction.
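The transfer example can be sketched with Python's sqlite3 module, which starts a transaction implicitly around data-changing statements; commit() and rollback() play the roles of commit work and rollback work. The accounts and amounts are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("create table account (account_number text, balance int)")
cur.executemany("insert into account values (?, ?)",
                [("A-101", 500), ("A-102", 300)])
con.commit()                 # like commit work: the inserts become permanent

# transfer 100 from A-101 to A-102 ...
cur.execute("update account set balance = balance - 100 "
            "where account_number = 'A-101'")
cur.execute("update account set balance = balance + 100 "
            "where account_number = 'A-102'")
con.rollback()               # ... then undo both steps, like rollback work

balances = dict(cur.execute("select * from account").fetchall())
print(balances)  # {'A-101': 500, 'A-102': 300}
```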
Drop Clause:
• To remove a relation from an SQL database, we use the drop table command. The drop table
command deletes all information about the relation from the database. The command
drop table r
is a more drastic action than
delete from r
• The latter doesn't delete relation r, but deletes all tuples in r. The former deletes not only all
tuples of r, but also the schema for r.
Alter clause:
• We use the alter table command to add attributes to an existing relation. All tuples in the
relation are assigned null as the value for the new attribute. The form of the alter table
command is
alter table r add A D
where r is the name of an existing relation, A is the name of the attribute to be added, and D is
the domain of the added attribute.
• We can drop attributes from a relation by the command
alter table r drop A
where r is the name of an existing relation, and A is the name of an attribute of the relation.
Embedded SQL:
• The SQL standard defines embeddings of SQL in a variety of programming languages such as
Pascal, PL/I, Fortran, C, and Cobol.
• A language in which SQL queries are embedded is referred to as a host language, and the SQL
structures permitted in the host language comprise embedded SQL.
• The EXEC SQL statement is used to identify an embedded SQL request to the preprocessor:
EXEC SQL <embedded SQL statement> END-EXEC
• A programmer must have access to a database from a general-purpose programming language
for at least two reasons:
1. Not all queries can be expressed in SQL, since SQL does not provide the full expressive power
of a general-purpose language. That is, there exist queries that can be expressed in a language
such as C, Java, or Cobol that cannot be expressed in SQL. To write such queries, we can embed
SQL within a more powerful language.
2. Nondeclarative actions—such as printing a report, interacting with a user, or sending the
results of a query to a graphical user interface—cannot be done from within SQL. For an
integrated application, the programs written in the programming language must be able to
access the database.
Dynamic SQL
• The dynamic SQL component of SQL allows programs to construct and submit SQL queries at
run time.
• In contrast, embedded SQL statements must be completely present at compile time; they are
compiled by the embedded SQL preprocessor.
• Using dynamic SQL, programs can create SQL queries as strings at run time (based on input
from the user) and can either have them executed immediately or have them prepared for
subsequent use. Preparing a dynamic SQL statement compiles it, and subsequent uses of the
prepared statement use the compiled version.
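The idea can be sketched in Python, where the query text is assembled at run time and the ? placeholder acts as a prepared-statement parameter bound at execution; the table contents and the "user input" are invented:

```python
import sqlite3

cur = sqlite3.connect(":memory:").cursor()
cur.execute("create table customer "
            "(customer_name text, customer_street text, customer_city text)")
cur.executemany("insert into customer values (?, ?, ?)",
                [("Hari", "Mainroad", "Butwal"), ("Sita", "Lakeside", "Pokhara")])

# query string constructed at run time (column name from assumed user input);
# the value itself is passed as a bound parameter, not spliced into the string
column, city = "customer_name", "Butwal"
query = f"select {column} from customer where customer_city = ?"
rows = cur.execute(query, (city,)).fetchall()
print(rows)  # [('Hari',)]
```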
Unit 5 Integrity Constraints
A domain is defined as the set of all unique values permitted for an attribute. For
example, a domain of date is the set of all possible valid dates, a domain of integer
is all possible whole numbers, and a domain of day-of-week is Sunday, Monday, Tuesday,
..., Saturday.
This in effect is defining rules for a particular attribute. If it is determined that an
attribute is a date, then it should be implemented in the database so as to prevent invalid
dates from being entered.
Domain constraints are user-defined data types, and we can define them like this:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY
KEY / FOREIGN KEY / CHECK / DEFAULT)
Integrity constraints guard against accidental damage to the database, by ensuring that
authorized changes to the database do not result in a loss of data consistency.
Domain constraints are the most elementary form of integrity constraint.
They test values inserted in the database, and test queries to ensure that the
comparisons make sense.
New domains can be created from existing data types
E.g. create domain Dollars numeric(12, 2)
create domain Pounds numeric(12,2)
We cannot assign or compare a value of type Dollars to a value of type Pounds.
However, we can convert type as below
(cast r.A as Pounds)
(it should also multiply by the dollar-to-pound conversion-rate)
The check clause in SQL-92 permits domains to be restricted:
Use the check clause to ensure that an hourly-wage domain allows only values
greater than a specified value, for example:
create domain HourlyWage numeric(5,2)
constraint value-test check (value > 4.00)
The domain has a constraint that ensures that the hourly-wage is greater than
4.00.
The clause constraint value-test is optional; it is useful to indicate which constraint
an update violated.
Can have complex conditions in domain check
create domain AccountType char(10)
constraint account-type-test check (value in (‘Checking’, ‘Saving’))
Example:
I want to create a table “student_info” with “stu_id” field having value greater than 100,
I can create a domain and table like this:
Another example:
I want to create a table bank_account with account_type field having value either checking or
saving :
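Since the original figures for these two examples are missing, the following is a hedged, runnable sketch of what such definitions could look like. SQLite has no create domain statement, so the check constraints are attached to the columns directly; all table and constraint names here are assumptions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# stu_id restricted to values greater than 100 (no CREATE DOMAIN in SQLite)
cur.execute("""create table student_info (
    stu_id integer constraint stu_id_test check (stu_id > 100),
    stu_name text)""")
# account_type restricted to 'checking' or 'saving'
cur.execute("""create table bank_account (
    account_no text,
    account_type text constraint account_type_test
        check (account_type in ('checking', 'saving')))""")

cur.execute("insert into student_info values (101, 'Meena')")    # satisfies stu_id > 100
try:
    cur.execute("insert into student_info values (99, 'Hari')")  # violates the check
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)  # True
```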
Formal Definition
Consider a relationship set R between entity sets E1 and E2, realized as relations r1 (referenced) and r2 (referencing).
• Delete. If a tuple t1 is deleted from r1, the system must compute the set of
tuples in r2 that reference t1:
σα = t1[K] (r2)
If this set is not empty, either the delete command is rejected as an error, or the
tuples that reference t1 must themselves be deleted. The latter solution may lead
to cascading deletions, since tuples may reference tuples that reference t1, and so
on.
• Update. We must consider two cases for update: updates to the referencing
relation (r2), and updates to the referenced relation (r1).
If a tuple t2 is updated in relation r2, and the update modifies
values for the foreign key α, then a test similar to the insert
case is made. Let t2' denote the new value of tuple t2. The
system must ensure that
t2'[α] ∈ ΠK (r1)
If a tuple t1 is updated in r1, and the update modifies values
for the primary key (K), then a test similar to the delete case is
made. The system must compute
σα = t1[K] (r2)
using the old value of t1 (the value before the update is applied). If
this set is not empty, the update is rejected as an error, or the update is
cascaded in a manner similar to delete.
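Both outcomes — rejecting a dangling reference and cascading a delete — can be observed in SQLite, which enforces foreign keys only when the pragma is enabled; the tuples are invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("pragma foreign_keys = on")   # SQLite enforces FKs only when enabled
cur = con.cursor()
cur.execute("create table customer (customer_name text primary key)")
cur.execute("""create table depositor (
    customer_name text references customer(customer_name) on delete cascade,
    account_number text)""")
cur.execute("insert into customer values ('Hari')")
cur.execute("insert into depositor values ('Hari', 'A-101')")

# deleting the referenced tuple cascades to the referencing tuples
cur.execute("delete from customer where customer_name = 'Hari'")
left = cur.execute("select count(*) from depositor").fetchone()[0]

# inserting a dangling reference is rejected as an error
try:
    cur.execute("insert into depositor values ('Sita', 'A-102')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(left, rejected)  # 0 True
```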
Assertion Example
Eg 1)The sum of all loan amounts for each branch must be less than the sum of all
account balances at the branch.
create assertion sum-constraint check
(not exists (select * from branch
where (select sum(amount) from loan
where loan.branch-name = branch.branch-name)
>= (select sum(balance) from account
where account.branch-name = branch.branch-name)))
Eg 2) Every loan has at least one borrower who maintains an account with a
minimum balance of $1000.00.
5.4 Triggers
A trigger is a statement that is executed automatically by the system as a side effect of a
modification to the database.
To design a trigger mechanism, we must meet two requirements:
1) Specify the conditions under which the trigger is to be executed.
2) Specify the actions to be taken when the trigger executes.
The above model of triggers is referred to as the event-condition-action model for
triggers.
Triggers have many uses, such as implementing business rules, audit logging, and even
carrying out actions outside the database system
Triggers were introduced into the SQL standard in SQL:1999, but were supported even earlier, using
non-standard syntax, by most databases.
Trigger Example
Suppose that instead of allowing negative account balances, the bank deals with overdrafts by
setting the account balance to zero
creating a loan in the amount of the overdraft
giving this loan a loan number identical to the account number of the
overdrawn account
The condition for executing the trigger is an update to the account relation that results in a
negative balance value
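The overdraft trigger can be sketched in SQLite's trigger syntax (which differs slightly from the SQL standard's); the account tuple is invented:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
create table account (account_number text, branch_name text, balance real);
create table loan (loan_number text, branch_name text, amount real);

-- event: update of account; condition: the new balance is negative;
-- action: create the loan and reset the balance to zero
create trigger overdraft after update on account
when new.balance < 0
begin
    insert into loan values (new.account_number, new.branch_name, -new.balance);
    update account set balance = 0 where account_number = new.account_number;
end;

insert into account values ('A-102', 'Butwal', 100);
update account set balance = balance - 300 where account_number = 'A-102';
""")

balance = cur.execute("select balance from account").fetchone()[0]
loans = cur.execute("select * from loan").fetchall()
print(balance, loans)  # 0.0 [('A-102', 'Butwal', 200.0)]
```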
d) Physical level
# Physical access to computers allows destruction of data by intruders;
traditional lock-and-key security is needed
# Computers must also be protected from floods, fire, etc.
e) Human level
# Users must be screened to ensure that authorized users do not give access
to intruders
# Users should be trained on password selection and secrecy
Authorization
Forms of authorization on parts of the database:
Read authorization - allows reading, but not modification of data.
Insert authorization - allows insertion of new data, but not modification of existing data.
Update authorization - allows modification, but not deletion of data.
Delete authorization - allows deletion of data
6.3 Decomposition
We should decompose a relation schema that has many attributes into several schemas with fewer
attributes. Careless decomposition, however, may lead to another form of bad design.
Careless decomposition may lead to the loss of information and such decomposition is called lossy-
decomposition, or a lossy-join decomposition. Lossy-decomposition is a bad database design.
A decomposition that is not a lossy-join decomposition is a lossless-join decomposition.
When we decompose a relation into a number of smaller relations, it is crucial that the decomposition
be lossless.
A set of relation schemas {R1 , R2 , . . . , Rn } is a decomposition of R if
R = R1∪R2 ∪···∪Rn
That is, {R1, R2,...,Rn} is a decomposition of R if, for i = 1,2,...,n, each Ri is a subset of R, and every
attribute in R appears in at least one Ri.
To have a lossless-join decomposition, we need to impose constraints on the set of possible relations.
Let C represent a set of constraints on the database, and let R be a relation schema. A decomposition
{R1, R2, . . . , Rn} of R is a lossless-join decomposition if, for all relations r on schema R that are legal
under C,
r = ΠR1(r) ⋈ ΠR2(r) ⋈ ··· ⋈ ΠRn(r)
Some functional dependencies are said to be trivial because they are satisfied by all relations.
For example, A → A is satisfied by all relations involving attribute A. Similarly, AB → A is satisfied by all
relations involving attribute A.
In general, a functional dependency of the form α → β is trivial if β ⊆ α.
Desirable Properties of Decomposition
The decomposition has mainly two desirable properties:
1. Lossless-Join Decomposition
2. Dependency Preservation
We can use a given set of functional dependencies in designing a relational database in which most of
the undesirable properties do not occur. When we design such systems, it may become necessary to
decompose a relation into several smaller relations.
Consider a Lending-Schema:
Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
The set F of functional dependencies that we require to hold on Lending-schema are
branch-name → branch-city assets
loan-number → amount branch-name
loan-number → customer-name
Based on the above functional dependencies, we decompose it into the following three relations:
Branch-schema = (branch-name, branch-city, assets)
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
Lossless-Join Decomposition
When we decompose a relation into a number of smaller relations, it is crucial that the decomposition
be lossless.
Let R be a relation schema, and let F be a set of functional dependencies on R. Let R1 and R2 form a
decomposition of R. This decomposition is a lossless-join decomposition of R if at least one of the
following functional dependencies is in F+:
R1 ∩ R2 → R1
R1 ∩ R2 → R2
In other words, if R1 ∩ R2 forms a superkey of either R1 or R2, the decomposition of R is a lossless-join
decomposition. We can use attribute closure to efficiently test for superkeys.
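The superkey test can be sketched with a small attribute-closure routine; the schemas and dependencies below are the Lending-schema decomposition discussed in this section:

```python
def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r1, r2, fds):
    """A binary decomposition {r1, r2} is lossless-join if r1 ∩ r2 is a
    superkey of r1 or of r2, i.e. the closure of the common attributes
    covers one of the two schemas."""
    common = closure(r1 & r2, fds)
    return r1 <= common or r2 <= common

F = [({"branch-name"}, {"branch-city", "assets"}),
     ({"loan-number"}, {"amount", "branch-name"})]
branch = {"branch-name", "branch-city", "assets"}
loan_info = {"branch-name", "customer-name", "loan-number", "amount"}
print(is_lossless(branch, loan_info, F))   # True
```

Here branch-name is the only common attribute, and its closure covers all of Branch-schema, so the test succeeds.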
Let's demonstrate that our decomposition of Lending-schema is a lossless-join decomposition by
showing a sequence of steps that generate the decomposition. We begin by decomposing Lending-
schema into two schemas:
Branch-schema = (branch-name, branch-city, assets)
Loan-info-schema= (branch-name, customer-name, loan-number, amount)
Since branch-name → branch-city assets, the augmentation rule for functional dependencies implies
that
branch-name → branch-name branch-city assets
Since Branch-schema ∩ Loan-info-schema = {branch-name}, it follows that our initial decomposition is a
lossless-join decomposition.
Next, we decompose Loan-info-schema into
Loan-schema = (loan-number, branch-name, amount)
Borrower-schema = (customer-name, loan-number)
This step results in a lossless-join decomposition, since loan-number is a common attribute and loan-
number → amount branch-name.
Dependency Preservation
When an update is made to the database, the system should be able to check that the update will not
create an illegal relation that is, one that does not satisfy all the given functional dependencies.
Suppose it is given that a set F of functional dependencies holds on any relation based on schema R.
Then the set of functional dependencies that holds on any relation with subschema R1 is F1, which
contains those functional dependencies of F involving only attributes of R1. So if the decomposition of R
is {R1, R2, …, Rn} and the corresponding sets of functional dependencies that hold on them are
{F1, F2, …, Fn}, then the following should be true:
F+ = (F1 ∪ F2 ∪ … ∪ Fn)+
Such a decomposition is called dependency preserving decomposition.
Example
Consider the schema
R = {A, B, C, D}
such that following functional dependency holds on it
F = {A→B, A →BC, C →D}.
Now suppose the decomposition of this R is
R1= {A,B} and R2 = {B,C,D},
The functional dependencies that hold on R1 are F1 = {A→B}. (Note: F1 should contain all the
functional dependencies in F which have only attributes of R1.)
Those on R2 are F2 = {C→D}.
The union F1 ∪ F2 is {A→B, C→D}, which does not contain
A→BC, so it is not a dependency-preserving decomposition.
If we decompose R into these relation schemas
R1 ={A,B,C} and R2={C,D} then
F1 = {A→B, A→BC} and F2 = {C→D}, and
F1 ∪ F2 is {A→B, A→BC, C→D}, so it is a dependency-preserving decomposition.
Algorithm for testing dependency preservation
The algorithm shown below is for testing dependency preservation. The input is a set D = {R1, R2, . . . ,
Rn} of decomposed relation schemas, and a set F of functional dependencies.
compute F+;
for each schema Ri in D do
begin
Fi : = the restriction of F+ to Ri;
end
F′ := ∅
for each restriction Fi do
begin
F ′ = F ′ ∪ Fi
end
compute F′+;
if (F′+ = F+) then return (true)
else return (false);
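Computing F+ outright is expensive, so a sketch like the following builds each restriction Fi from attribute closures instead; it checks the same condition F′+ = F+ by verifying that every dependency in F follows from the union of the restrictions (a correct test, since each restriction is by construction a subset of F+). The function names are my own:

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure X+ of attrs under fds, given as (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def restriction(fds, ri):
    """Restriction of F+ to schema ri: X -> (X+ ∩ ri) for every non-empty X ⊆ ri."""
    out = []
    for k in range(1, len(ri) + 1):
        for xs in combinations(sorted(ri), k):
            x = set(xs)
            rhs = (closure(x, fds) & set(ri)) - x
            if rhs:
                out.append((x, rhs))
    return out

def preserves_dependencies(fds, decomposition):
    """True iff every FD in F follows from the union of the restrictions."""
    union = [fd for ri in decomposition for fd in restriction(fds, ri)]
    return all(rhs <= closure(lhs, union) for lhs, rhs in fds)

F = [({"A"}, {"B"}), ({"A"}, {"B", "C"}), ({"C"}, {"D"})]
print(preserves_dependencies(F, [{"A", "B"}, {"B", "C", "D"}]))   # False
print(preserves_dependencies(F, [{"A", "B", "C"}, {"C", "D"}]))   # True
```

The two calls reproduce the worked example above: {AB, BCD} loses A→BC, while {ABC, CD} preserves all of F. Enumerating subsets of each Ri is exponential, so this sketch is only practical for small schemas.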
Example
R = (A, B, C)
F = A → B, B → C
Can be decomposed in two different ways
1. R1 =(A,B), R2 =(B,C)
Lossless-join decomposition:
R1 ∩ R2 = {B} and B → BC
Dependency preserving
R1 = (A,B), F1 : A → B
R2 = (B,C), F2 : B → C
F1 ∪ F2 = F
2. R1=(A,B), R2 =(A,C)
Lossless-join decomposition:
R1 ∩ R2 = {A} and A → AB
Not dependency preserving
R1 = (A,B), F1 : A → B
R2 = (A,C)
cannot check B → C
Suppose that we decompose the schema R = (A, B, C, D, E) into
(A, B, C) and (A, D, E)
Show that this decomposition is a lossless-join decomposition if the following set F of functional
dependencies holds:
A → BC CD → E B→D E→A
Ans: A decomposition {R1,R2} is a lossless-join decomposition if R1 ∩ R2 → R1 or R1 ∩ R2 → R2.
Let R1 = (A,B,C) and R2 = (A,D,E)
Given,
A → BC
AA → ABC (by the augmentation rule)
A → ABC (union of identical sets)
And the common attribute in R1 and R2 is A, that is,
R 1 ∩ R 2 → R1
Thus the requirement for lossless-join decomposition is fulfilled, so, this decomposition is lossless-join
decomposition.
6.4 Normalization
The normalization process takes a relation schema through a series of tests to "certify" whether it
satisfies a certain normal form.
There are different normal forms:
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. Boyce-Codd Normal Form (BCNF)
Remove the attribute Dlocations that violates 1NF and place it in a separate relation Dept_locations
along with the primary key Dnumber of Department. The primary key of this relation is the combination
{Dnumber, Dlocation}. A distinct value exists for each location of a department in Dept_locations. This
decomposes the non-1NF relation into two 1NF relations.
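With hypothetical sample rows, the flattening into two 1NF relations looks like this:

```python
# hypothetical sample rows for the non-1NF Department relation, where
# Dlocations is a multivalued attribute (violating 1NF)
departments = [
    {"Dnumber": 5, "Dname": "Research", "Dlocations": ["Bellaire", "Sugarland", "Houston"]},
    {"Dnumber": 4, "Dname": "Administration", "Dlocations": ["Stafford"]},
]

# Department keeps only the single-valued attributes
department = [{"Dnumber": d["Dnumber"], "Dname": d["Dname"]} for d in departments]

# Dept_locations gets one row per (Dnumber, Dlocation) pair - its primary key
dept_locations = [
    {"Dnumber": d["Dnumber"], "Dlocation": loc}
    for d in departments
    for loc in d["Dlocations"]
]
```

Research's three locations become three Dept_locations rows, all keyed by {Dnumber, Dlocation}, and every attribute in both relations is now atomic.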
7.2 Data governance drivers
• While data governance initiatives can be driven by a desire
to improve data quality, they are more often driven by C-
Level leaders responding to external regulations.
• Examples of these regulations include Sarbanes-
Oxley, Basel I, Basel II, HIPAA, and a number of data
privacy regulations.
• To achieve compliance with these regulations, business
processes and controls require formal management
processes to govern the data subject to these regulations.
• Successful programs identify drivers meaningful to both
supervisory and executive leadership.
• Common themes among the external regulations center on
the need to manage risk. The risks can be financial
misstatement, inadvertent release of sensitive data, or
poor data quality for key decisions.
• Methods to manage these risks vary from industry to
industry. Examples of commonly referenced best practices
and guidelines include COBIT, ISO/IEC 38500, and others.
• The proliferation of regulations and standards creates
challenges for data governance professionals, particularly
when multiple regulations overlap the data being managed.
Organizations often launch data governance initiatives to
address these challenges.
Data Governance - Drivers
An effective data governance program leads to a successful data management capability.
[Figure: wheel of data governance drivers - data security & quality, customer service, data strategy &
accountability, corporate agility, corporate decision making, cost reduction, and compliance & risk
management, surrounding the central corporate governance drivers.]
1. Corporate Governance
• Following the Enron scandal the US government took action to hold
directors accountable for the misrepresentation of financial results with
the Sarbanes-Oxley Act.
Many South African companies have dual listings or international partners
and have to comply with this act. On the local front, King III is the most
comprehensive set of guidelines yet published to ensure good financial
governance for South African companies, while government entities must
comply with the Public Financial Management Act (PFMA).
2. Privacy legislation
• The Consumer Protection Act, the PCI-DSS (for credit card vendors) and
the pending Protection of Personal Information bill all require that
sensitive data is secure, is of good quality, is accessed on a need to know
basis only, and is disposed of when no longer needed.
3. Risk
• Basel II (and the pending Basel III) and Solvency II are regulations set up to
manage and assess risk for banks and insurance companies.
• Organizations are required to establish a process for data quality
management and account for adjustments to historical data.
• The regulations require that risk calculations must be provably correct
based on ongoing assessment and monitoring of core risk data.
• Like any large program that touches all departments in the business a
pragmatic data governance implementation will add value.
Unit 7: Data Governance
Data governance is a control that ensures that the data entry by an operations team member or by an
automated process meets precise standards, such as a business rule, a data definition and data integrity
constraints in the data model.
The data governor uses data quality monitoring against production data to communicate errors in data
back to operational team members, or to the technical support team, for corrective action.
Data governance includes the people, processes, and information technology required to create a
consistent and proper handling of an organization's data across the business enterprise in order to
improve data quality.
Data governance is a set of processes that ensures that important data assets are formally managed
throughout the enterprise.
Data governance ensures that data can be trusted and that people can be made accountable for any
adverse event that happens because of low data quality.
It is about putting people in charge of fixing and preventing issues with data so that the enterprise can
become more efficient.
Data governance also describes an evolutionary process for a company, altering the company’s way of
thinking and setting up the processes to handle information so that it may be utilized by the entire
organization.
It’s about using technology when necessary in many forms to help aid the process.
When companies desire, or are required, to gain control of their data, they empower their people, set
up processes and get help from technology to do it.
According to one vendor, data governance is a quality control discipline for assessing, managing, using,
improving, monitoring, maintaining, and protecting organizational information. It is a system of decision
rights and accountabilities for information-related processes, executed according to agreed-upon
models which describe who can take what actions with what information, and when, under what
circumstances, using what methods.
Business enterprises realize the following benefits by
implementing data governance programs:
1. Increasing consistency and confidence in decision making
2. Decreasing the risk of regulatory fines
3. Improving data security, also defining and verifying the requirements for data distribution
policies
4. Maximizing the income generation potential of data
5. Designating accountability for information quality
6. Enabling better planning by supervisory staff
7. Minimizing or eliminating re-work
8. Optimizing staff effectiveness
9. Establishing process performance baselines to enable improvement efforts
10. Acknowledging and holding all gains
Data Governance - Benefits
Cost
Increased efficiency reduces costs
Reduce redundant systems and associated operating costs
Reduce time and effort needed to verify and correct poor data
Regulatory
Reduce risk
Improve compliance
Opportunities
Driving business growth
Delivering improved client service
Providing prompt and accurate responses to regulatory requests
Delivering increased business intelligence from consistent and aggregated data
Information sharing and coordination across organizations.
Cross-selling
Segmenting and targeted service opportunities
Implementation
Implementation of a Data Governance initiative may vary in scope as well as origin.
Sometimes, an executive mandate will arise to initiate an enterprise wide effort, sometimes the
mandate will be to create a pilot project or projects, limited in scope and objectives, aimed at either
resolving existing issues or demonstrating value.
Sometimes an initiative will originate lower down in the organization’s hierarchy, and will be deployed in
a limited scope to demonstrate value to potential sponsors higher up in the organization.
The initial scope of an implementation can vary greatly as well, from review of a one-off IT system, to a
cross-organization initiative.
Unit 9: Transaction Management
Transaction:
A business deal in which goods, services, or money are passed from one person, account,
etc., to another is called a transaction. A transaction can be defined as a group of tasks. A
single task is the minimum processing unit, which cannot be divided further.
1) Due to system failures, many operations may remain incomplete; transaction management
provides a reliable way to recover the database and make it consistent.
Atomicity − This property states that a transaction must be treated as an atomic unit, i.e.,
either all of its operations are executed or none. There must be no state in a database
where a transaction is left partially completed. States should be defined either before the
execution of the transaction or after the execution/abortion/failure of the transaction.
a transaction is a unit of operation - either all the transaction's actions are completed or none
are
atomicity is maintained in the presence of deadlocks
Consistency −The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well. E.g., when an email is sent, the data
received must be the same as the data that was sent.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that all the transactions will be
carried out and executed as if each were the only transaction in the system. No transaction will
affect the existence of any other transaction. E.g., in a banking system, deposit, loan, and
payment transactions handled at once in different departments do not interfere with one another.
Durability −The database should be durable enough to hold all its latest updates even if
the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be updated
once the system springs back into action.
recovery to the most recent successful commit after a database software failure
recovery to the most recent successful commit after an application software failure
recovery to the most recent successful commit after a data disk failure
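A toy sketch of the atomicity property (nothing like a real DBMS implementation, which uses logs rather than a full snapshot): either both updates of a transfer are applied, or neither is.

```python
def transfer(accounts, frm, to, amount):
    """Apply a debit and a credit atomically: on any error, restore the
    'before' image so no partially completed state survives."""
    snapshot = dict(accounts)            # cheap before-image of all balances
    try:
        accounts[frm] -= amount
        if accounts[frm] < 0:
            raise ValueError("insufficient funds")
        accounts[to] += amount
    except Exception:
        accounts.clear()
        accounts.update(snapshot)        # roll back to the before-image
        raise
```

A successful call moves money between the two balances; a failing call leaves every balance exactly as it was, which is the "all or none" guarantee described above.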
9.2 Transaction States (States of Transactions)
A transaction in a database can be in one of the following states −
Active −In this state, the transaction is being executed. This is the initial state of every
transaction.
Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then
the recovery manager rolls back all its write operations on the database to bring the
database back to its original state where it was prior to the execution of the transaction.
Transactions in this state are called aborted. The database recovery module can select one
of the two operations after a transaction aborts: re-start the transaction, or kill the transaction.
If at any point the transaction has to be aborted, the system merely deletes the new copy.
The old copy of the database has not been affected. If the transaction completes, it is
committed as follows. First, the operating system is asked to make sure that all pages of the
new copy of the database have been written out to disk. After the operating system has
written all the pages to disk, the database system updates the pointer db-pointer to point to
the new copy of the database; the new copy then becomes the current copy of the
database. The old copy of the database is then deleted. Figure depicts the scheme,
showing the database state before and after the update.
The transaction is said to have been committed at the point where the updated db pointer
is written to disk.
We now consider how the technique handles transaction and system failures. First, consider
transaction failure. If the transaction fails at any time before db-pointer is updated, the old
contents of the database are not affected. We can abort the transaction by just deleting
the new copy of the database. Once the transaction has been committed, all the updates
that it performed are in the database pointed to by db-pointer. Thus, either all updates of
the transaction are reflected, or none of the effects are reflected, regardless of transaction
failure.
Now consider the issue of system failure. Suppose that the system fails at any time before
the updated db-pointer is written to disk. Then, when the system restarts, it will read db-
pointer and will thus see the original contents of the database, and none of the effects of
the transaction will be visible on the database.
Next, suppose that the system fails after db-pointer has been updated on disk. Before the
pointer is updated, all updated pages of the new copy of the database were written to
disk. Again, we assume that, once a file is written to disk, its contents will not be damaged
even if there is a system failure. Therefore, when the system restarts, it will read db-pointer
and will thus see the contents of the database after all the updates performed by the
transaction. The implementation actually depends on the write to db-pointer being atomic;
that is, either all its bytes are written or none of its bytes are written. If some of the bytes of
the pointer were updated by the write, but others were not, the pointer is meaningless, and
neither old nor new versions of the database may be found when the system restarts.
Luckily, disk systems provide atomic updates to entire blocks, or at least to a disk sector.
In other words, the disk system guarantees that it will update db-pointer atomically, as long
as we make sure that db-pointer lies entirely in a single sector, which we can ensure by
storing db-pointer at the beginning of a block. Thus, the atomicity and durability properties
of transactions are ensured by the shadow-copy implementation of the recovery-
management component.
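The commit point can be mimicked in Python, with `os.replace`'s atomic rename standing in for the atomic sector write of db-pointer (a sketch only, with hypothetical file names):

```python
import os

def shadow_commit(new_copy="db.new", db_pointer="db-pointer"):
    """Shadow-copy commit sketch: flush the new database copy to disk,
    then atomically repoint db-pointer at it."""
    # 1. make sure every page of the new copy is on disk
    fd = os.open(new_copy, os.O_RDONLY)
    os.fsync(fd)
    os.close(fd)
    # 2. write the new pointer value and flush it, then atomically swing
    # the pointer; a crash before os.replace leaves the old copy current
    tmp = db_pointer + ".tmp"
    with open(tmp, "w") as f:
        f.write(new_copy)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, db_pointer)   # commit point: all-or-nothing switch
```

The ordering matters exactly as in the text: the copy is durable before the pointer moves, and the pointer move itself is the single atomic step that commits.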
9.2.2 Serializability
When multiple transactions are being executed by the operating system in a
multiprogramming environment, there are possibilities that instructions of one transactions
are interleaved with some other transaction.
Serial Schedule − It is a schedule in which transactions are aligned in such a way that one
transaction is executed first. When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after the other. This type of schedule
is called a serial schedule, as transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered as a benchmark. The
execution sequence of an instruction in a transaction cannot be changed, but two
transactions can have their instructions executed in a random fashion. This execution does
no harm if two transactions are mutually independent and working on different segments of
data; but in case these two transactions are working on the same data, then the results
may vary. This ever-varying result may bring the database to an inconsistent state.
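Whether a given interleaving does harm can be tested with a precedence graph: two operations conflict when they belong to different transactions, touch the same data item, and at least one is a write, and the schedule is conflict serializable iff the resulting graph is acyclic. A sketch, representing a schedule as a list of (transaction, operation, item) tuples:

```python
def conflict(a, b):
    """Ops conflict: different transactions, same item, at least one write."""
    return a[0] != b[0] and a[2] == b[2] and "W" in (a[1], b[1])

def precedence_edges(schedule):
    """Edge Ti -> Tj for each conflicting pair where Ti's op comes first.
    The schedule is conflict serializable iff this graph is acyclic."""
    edges = set()
    for i, a in enumerate(schedule):
        for b in schedule[i + 1:]:
            if conflict(a, b):
                edges.add((a[0], b[0]))
    return edges

# T1 reads A, T2 writes A, then T1 writes A: edges T1->T2 and T2->T1
s = [("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]
print(precedence_edges(s))   # contains both ('T1', 'T2') and ('T2', 'T1'): a cycle
```

The cycle means no serial order of T1 and T2 produces the same result, so this interleaving can leave the database inconsistent.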
a) Lock-based Protocols
Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it.
Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
Shared/exclusive − This type of locking mechanism differentiates the locks based on their
uses. If a lock is acquired on a data item to perform a write operation, it is an exclusive lock.
Allowing more than one transaction to write on
the same data item would lead the database into an inconsistent state. Read locks are
shared because no data value is being changed.
Simplistic lock-based protocols allow transactions to obtain a lock on every object before a
'write' operation is performed. Transactions may unlock the data item after completing the
‘write’ operation.
Pre-claiming protocols evaluate their operations and create a list of data items on which
they need locks. Before initiating an execution, the transaction requests the system for all
the locks it needs beforehand. If all the locks are granted, the transaction executes and
releases all the locks when all its operations are over. If all the locks are not granted, the
transaction rolls back and waits until all the locks are granted.
This locking protocol divides the execution phase of a transaction into three parts. In the first
part, when the transaction starts executing, it seeks permission for the locks it requires. The
second part is where the transaction acquires all the locks. As soon as the transaction
releases its first lock, the third phase starts. In this phase, the transaction cannot demand
any new locks; it only releases the acquired locks.
Two-phase locking has two phases, one is growing, where all the locks are being acquired
by the transaction; and the second phase is shrinking, where the locks held by the
transaction are being released.
To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and
then upgrade it to an exclusive lock.
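The growing and shrinking phases can be sketched as a small guard class (an illustration of the two-phase rule only; there is no real lock manager, blocking, or lock modes here):

```python
class TwoPLTransaction:
    """Once the transaction releases its first lock, it enters the
    shrinking phase and any further lock request is rejected."""
    def __init__(self):
        self.held = set()
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested after a release")
        self.held.add(item)          # growing phase: acquire freely

    def unlock(self, item):
        self.shrinking = True        # first release starts the shrinking phase
        self.held.discard(item)
```

Strict 2PL would simply never call `unlock` until commit, releasing everything at once.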
The first phase of Strict-2PL is same as 2PL. After acquiring all the locks in the first phase, the
transaction continues to execute normally. But in contrast to 2PL, Strict-2PL does not release
a lock after using it. Strict-2PL holds all the locks until the commit point and releases all the
locks at a time.
The most commonly used concurrency protocol is the timestamp based protocol. This
protocol uses either system time or logical counter as a timestamp
Lock-based protocols manage the order between the conflicting pairs among transactions
at the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the
age of the transaction. A transaction created at 0002 clock time would be older than all
other transactions that come after it. For example, any transaction 'y' entering the system at
0004 is two seconds younger and the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system
know when the last ‘read and write’ operation was performed on the data item.
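The basic timestamp-ordering checks against these read- and write-timestamps can be sketched as follows (the dictionary field names are hypothetical):

```python
def to_read(tx_ts, item):
    """Reject a read if a younger transaction has already written the item."""
    if tx_ts < item["write_ts"]:
        return "rollback"
    item["read_ts"] = max(item["read_ts"], tx_ts)
    return "ok"

def to_write(tx_ts, item, value):
    """Reject a write if a younger transaction has already read or
    written the item."""
    if tx_ts < item["read_ts"] or tx_ts < item["write_ts"]:
        return "rollback"
    item["write_ts"], item["value"] = tx_ts, value
    return "ok"
```

An older transaction arriving after a younger one has touched the item is rolled back and restarted with a fresh timestamp, which is how the protocol orders conflicts before execution rather than during it.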
Recovery:
Crash Recovery
DBMS is a highly complex system with hundreds of transactions being executed every
second. The durability and robustness of a DBMS depends on its complex architecture and
its underlying hardware and system software. If it fails or crashes during transactions, it is
expected that the system would follow some sort of algorithm or techniques to recover lost
data.
Failure Classification
Failures can be generalized into the following categories:
1) Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it
can't go any further. This is called transaction failure, where only a few transactions or
processes are affected.
Logical errors − Where a transaction cannot complete because it has some code error or
any internal error condition.
System errors − Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system aborts an
active transaction.
2) System Crash
There are problems − external to the system − that may cause the system to stop abruptly
and crash. For example, interruptions in the power supply may cause the failure of underlying
hardware, or software failures, including operating system errors.
3) Disk Failure
In the early days of technology evolution, it was a common problem that hard-disk drives or
storage drives failed frequently.
Disk failures include the formation of bad sectors, unreachability of the disk, disk head crash, or
any other failure that destroys all or part of the disk storage.
When recovering from a crash, the DBMS should do the following:
It should check the states of all the transactions, which were being executed.
A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to be rolled
back.
Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
Maintaining shadow paging, where the changes are done on a volatile memory, and
later, the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification and
stored on a stable storage media, which is failsafe.
Deferred database modification − All logs are written on to the stable storage and the
database is updated when a transaction commits.
Checkpoint
Keeping and maintaining logs in real time and in real environment may fill out all the
memory space available in the system. As time passes, the log file may grow too big to be
handled at all. Checkpoint is a mechanism where all the previous logs are removed from
the system and stored permanently in a storage disk. Checkpoint declares a point before
which the DBMS was in consistent state, and all the transactions were committed.
Recovery
When a system with concurrent transactions crashes and recovers, it behaves in the
following manner:-
The recovery system reads the logs backwards from the end to the last checkpoint.
If the recovery system sees a log with <Tn, Start> and <Tn, Commit> or just <Tn, Commit>, it
puts the transaction in the redo-list.
If the recovery system sees a log with <Tn, Start> but no commit or abort log found, it puts
the transaction in undo-list.
All the transactions in the undo-list are then undone and their logs are removed. All the
transactions in the redo-list are redone, and their logs are saved again.
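The scan that builds the two lists can be sketched as follows, with the log as a list of (transaction-id, action) records and checkpoint handling omitted for brevity:

```python
def classify(log):
    """Partition logged transactions into a redo-list (those with a
    commit record) and an undo-list (those started but never committed)."""
    committed = {tid for tid, action in log if action == "commit"}
    redo, undo = set(), set()
    for tid, action in log:
        if action == "start":
            (redo if tid in committed else undo).add(tid)
    return redo, undo
```

Transactions whose start and commit both appear land in the redo-list; those with a start but no commit or abort land in the undo-list, exactly as described above.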