Degree Dbms Notes
CHAPTER-1
INTRODUCTION TO DBMS
File Processing System
In the conventional File Processing System, each and every sub system of the
information system will have its own set of files. As a result there will be duplication
of data between various sub systems.
It is clear that some of the files are duplicated in different subsystems of the
conventional file processing system. This will in turn increase the data redundancy.
1
DBMS Material
Consider the example of a hospital system. The diagram of the hospital system
is shown below.
The patients come to the hospital from the society. Upon arrival, a
preliminary registration is done by seeking information about the patient. Then,
depending on the type of illness, the patient will either be treated as an out-patient
(or) an in-patient. In some cases a patient will initially be treated as an out-patient and
then be admitted as an in-patient if necessary. Finally the bills are to be
paid before the patient is discharged. In this system we are using four files. The files
are
2. A lot of paper work and telephone calls would be required to synchronize the file
structures.
3. The system cannot provide answers to complex queries.
1. Redundancy of data
When each sub system of an organization maintains its own set of files without data
sharing, the same data will be available in different files. This results in increased
disc space, increased time of data entry and inconsistency of data.
2. Inconsistency of data
The uncontrolled redundancy of data will permit the system to have the same
data in different files. As a result, a particular data element like patient name may be
entered differently in different files, which is nothing but inconsistency of that
particular data element. While performing the basic data operations like updation,
deletion, retrieval etc., this inconsistency will give misleading results.
3. Inflexibility
In the conventional file processing system, generally a top down approach will be
followed in file design. In this approach a set of meaningful entities of the proposed
system will be identified along with their attributes to create the files. The actual set
of reports required from the system will not be considered in this approach.
As a result, it may not be possible to meet the requirements of the users fully. Also,
in the future, if there are changes in the user requirements, then the
conventional file processing system will not be flexible enough to provide the results.
4. No backup and recovery
In the conventional file processing system there is no implicit facility for backup
and recovery from system failure. It means that when an application program fails
in the middle of updating a file, there is no automatic way to bring the file back to a
consistent state.
5. Low programmer productivity
The different applications and their respective files were developed by different
groups, and each group will follow its own standards for field names, field widths,
field types etc. This creates serious difficulty while modifying programs and data
structures, which leads to low programmer productivity.
What is a DATABASE?
For e.g., consider the names, telephone numbers and addresses of people. We
may have recorded this data in an address book, (or) we may have stored it in a file
on the hard disk, using a computer and software such as Microsoft Excel (or) MS
Access. This collection of inter related data is a database. Generally a database
contains one data file (or) a large number of data files. The database is organized in
such a way that application programs can quickly retrieve the required data.
Defining a database involves specifying the data types, structures, and constraints
for the data to be stored in the database.
Constructing the database is the process of storing the data itself on some storage
medium that is controlled by the DBMS.
Sharing a database allows multiple users and programs to access the database
simultaneously. Another important function provided by the DBMS is protection:
unauthorized users cannot access the database.
Consider the example of the hospital system which deals in-patients as well
as out- patients. The hospital system is shown below.
[Diagram: hospital database system with In-Patient Treatment, Out-Patient Treatment and Accounts sub systems sharing a central database.]
The database approach uses a centralized place to store data; hence the
amount of data redundancy will be minimized. Data redundancy can be minimized
by applying the normalization process in database design.
2. Consistency of data
3. Flexibility
1. Increased complexity.
2. Requirement of new and specialized manpower.
3. Large size of DBMS.
4. Increased installation and maintenance cost.
5. Conversion cost.
1. Increased complexity
3. Large size of DBMS
The DBMS occupies many gigabytes of storage space and requires a large
amount of main memory to run efficiently.
4. Increased installation and maintenance cost
The DBMS software has a high initial cost. It requires trained persons to install
and operate it, and also has high annual maintenance costs. Installing such software
also requires upgrades to the hardware and software.
5. Conversion cost
[Diagram: multi-tier DBMS architecture — users and application programmers work through client applications, which communicate with an application server, which in turn accesses the database server.]
Internal level
The internal schema defines the internal level. The internal level is the lowest
level of data abstraction. This level indicates how the data will be stored in the
database and describes the file structures, data structures and access methods to be
used by the database.
Conceptual level
The conceptual schema defines the conceptual level. The conceptual level is the
middle level of abstraction. This level indicates the entities, attributes, and
relationships between entities and attributes.
External level
The external schema defines the external level. The external level is the highest level
of data abstraction. This level describes the part of the database that is relevant to a
particular user.
Physical independency
In physical independency, changes to the internal schema, such as file
structures, access methods and devices used to store the data, must be possible
without changing the conceptual schema and external schema.
Logical independency
In logical independency, changes to the conceptual schema, such as addition
and deletion of entities, addition and deletion of attributes, and addition and deletion
of relationships, must be possible without changing the external schema.
b. DBMS software.
Utilities are the software tools used to help manage the database system's
computer components.
What is Data?
What is Information?
Meta data is the data about the data i.e., information for accessing the data.
Field: A field is the lowest level of data item of an entity which is alternatively called
as an attribute of that entity.
Record: A record is a collection of fields of an entity, e.g. 1 sweaty banglore
File: File is a collection of records having same set of fields arranged in the same
sequence.
1 manimala Chennai
2 priya Hyderabad
Key field (or) Primary key: A field is said to be a key field (or) primary key if it
can identify a record uniquely in a file.
Non key field (or) secondary key: A field is said to be Non key field (or)
secondary key if it cannot identify a record uniquely in a file.
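The key/non-key distinction can be checked mechanically: a field is a key field only if its values are unique across all records. The following is a minimal Python sketch; the sample records and field names are hypothetical, not from these notes.

```python
def is_key_field(records, field):
    """A field is a key (primary) field if no two records share its value."""
    values = [rec[field] for rec in records]
    return len(values) == len(set(values))

# Hypothetical student records
students = [
    {"sno": 1, "name": "manimala", "city": "Chennai"},
    {"sno": 2, "name": "priya", "city": "Hyderabad"},
    {"sno": 3, "name": "priya", "city": "Chennai"},
]

print(is_key_field(students, "sno"))   # unique in every record -> key field
print(is_key_field(students, "name"))  # duplicated value -> non-key (secondary) field
```

Here `sno` qualifies as a key field, while `name` is only a secondary key.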
CHAPTER-2
Data Models
Data model:
Data modeling, the first step in designing a database, refers to the process of
creating a specific data model for a problem.
1. Data Model can facilitate interaction among the designer, the application
programmer and the end user.
2. Applications are used to transform data into information. But data are viewed in
different ways by different people.
3. For e.g., a manager and a clerk both work in the same company; the
manager has a wider view of company data than the clerk.
4. A company president has a universal view of the data.
5. Different managers view data differently in a company. The inventory manager
is more concerned about inventory levels, while the purchasing manager is
concerned about items and supplies.
6. Application programmers have another view of data i.e., concerned with data
locations and formatting.
7. A house is a collection of rooms; if someone is going to build a house, they need
the overall view, i.e., the one provided by the blueprint. Similarly, a sound data
environment requires an overall database blueprint based on an appropriate data model.
The basic building blocks of data models are entities, attributes, relationships
and constraints. An entity represents a real-world object such as a person (or) a place.
Many-to-many (M:N, *…….*): An employee may learn many job skills and each
job skill may be learned by many employees. Therefore, the database designer labels
the relationship “Employee learns Skills” as many-to-many (M:N).
One-to-one (1:1, 1…….1): Each store manager manages only a single store.
Therefore, the database designer labels the relationship “Employee manages Store” as
one-to-one (1:1).
BUSINESS RULES:
2. A training session cannot be scheduled for fewer than 10 employees or for more
than 30 employees. These business rules establish entities, relationships
and constraints.
The first business rule establishes two entities (customer, invoices) and a one-
to-many relationship between these two entities.
The second business rule establishes a constraint (no fewer than 10 people (or)
more than 30 people), two entities (employee, training) and a relationship between
employee and training.
The main sources of business rules are company managers, policy makers,
department managers and written documents such as the company's procedures and
standards (or) operations manuals. A faster and more direct source of business rules
is direct interviews with the concerned persons.
General rule: A noun in a business rule translates into an entity in the model,
and a verb associating nouns translates into a relationship among the entities.
For e.g., the business rule “A customer may generate many invoices”
contains two nouns (customer and invoices) and a verb (generate) that
associates the nouns.
For e.g. the business rule “ A customer may generate many invoices”, the
relationship is one-to-many (1:M, 1…….*). Customer is the 1 side and invoice is the
many side.
The above conceptual data model can be mapped in one of two ways, as shown
below.
In alternative 1, the student file is kept at the root segment of the tree and the
faculty file is kept at the bottom of the tree. By mapping the conceptual data model
into the hierarchical data model the following facts are observed.
In alternative 2, the faculty file is kept at the root of the tree and the student
file is kept at the bottom of the tree. While mapping the conceptual data model the
following facts are observed.
Advantages:
Disadvantages:
A network data model consists of a set of pairwise associations between the
entities.
1. Schema: The schema provides overall view of the database to the administrator.
2. Sub Schema: The sub schema which defines the portion of the database seen
by the application programs.
3. Data Management Language: defines the environment in which
data can be managed. The DBTG specified three DML components:
a. A schema Data Definition Language (DDL), which enables the database
administrator to create the database.
b. A subschema DDL, which allows the application programs to define the database
components that will be used by the application.
c. A Data Manipulation Language (DML), to manipulate the data in the database.
Sets
Advantages:
3. Data access is more flexible than in hierarchical and file system models.
6. It includes a data definition language (DDL) and a data manipulation language
(DML) in the DBMS.
Disadvantages:
Tables are related through the sharing of a common attribute. For e.g., consider
the AGENT and CUSTOMER tables shown below.
Agent
Customer
Agent: Agent_Code, Agent_Name, Agent_Address, Agent_PhoneNo
Customer: Cust_No, Cust_name, Cust_Address, Phone_No
The CUSTOMER represents the “many” side, because an AGENT can have many
CUSTOMERS.
The AGENT represents the “1” side, because each CUSTOMER has only one
AGENT.
Advantages:
4. Powerful RDBMS isolates the end user from physical-level details and improves
implementation and management simplicity.
Disadvantages:
Peter Chen first introduced the E-R data model in 1976; it is a graphical
representation of entities and their relationships in a database.
a. Entity: Entities are real-world objects. An entity is represented by a rectangle.
e.g. painter, employee, skills.
b. Attribute: Attributes are the characteristics of entities.
e.g. Empno, Empname, Empaddress etc.
c. Relationships: A relationship describes association among the entities.
There are three types of relationships, one-to-many, many-to-many and one-
to-one.
1. Chen notation
CHEN NOTATION
In Chen notation, entities are represented by rectangles and entity names are
written in capital letters at the centre of the rectangle. Relationships are
represented by a diamond. The diamond is connected to the entities by lines, and
the relationship name is written inside the diamond.
CROW'S FOOT NOTATION
In the Crow's Foot notation, the crow's foot is derived from the three-pronged
symbol used to represent the “many” side of a relationship. In this notation, “one” is
represented by a short line segment, and “many” is represented by the crow's foot. The
relationship name is written above the relationship line. The relationships can also be
shown vertically.
Advantages:
1. Visual modeling yields exceptional conceptual simplicity.
2. Visual representation makes it an effective communication tool.
3. It is integrated with dominant relational model.
Disadvantages:
1. There is limited constraint representation.
2. There is limited relationship representation.
3. There is no data manipulation language.
4. Loss of information content occurs when attributes are removed from entities to
avoid crowded displays. (This limitation has been addressed in subsequent graphical
versions)
For e.g., let us consider an invoice program. In this case the invoices are generated
by customers, each invoice contains one (or) more lines, and each line represents an
item purchased by the customer. The following figure shows the object representation
for the invoice.
The object representation of the invoice includes all the related objects within the
same object. The 1 next to the CUSTOMER object indicates that each invoice is related
to one customer. The M next to the LINE object indicates that each invoice contains
a number of lines.
The UML class diagram uses three separate classes (CUSTOMER, INVOICE and LINE)
and two relationships to represent this problem.
The E-R model also uses three separate entities and two relationships to
represent the invoice problem.
Advantages:
Disadvantages:
CHAPTER-3
Relational Database Model
In the relational database model, records are stored in tables. The relational data
model is easier to understand than the hierarchical and network data
models. The relational data model provides a logical view of the data and its
relationships with other data.
A table is composed of rows and columns. Each row of the table is called a
tuple. Each column of the table is called an attribute. A table is also called a
relation. A table contains a group of related entities.
Characteristics of a Table:
KEYS
A key may be composed of more than one attribute; such a multi-attribute key is
known as a Composite Key.
{Eno } :No two rows have same Eno ( Eno uniquely identifies a tuple(row) in a
relation)
{Voter_id} :No two rows have same Voter_id (Voter_id uniquely identifies a
tuple(row) in a relation)
Eg : In a STUDENT table
STUDENT{STU_NUM,STU_LNAME,STU_FNAME,STU_INIT,STU_DOB,STU_HRS,STU_
CLASS}
{STU_NUM}
{STU_NUM, STU_LNAME}
{STU_NUM, STU_LNAME, STU_INIT}
Definition : A Minimal Super Key (Key) K is a superkey with the additional property
that removal of any attribute from K will cause K not to be a superkey any more.
{Eno} : is a Minimal Super Key (a Super Key which has only one attribute is a
Minimal Super Key)
{Eno, Ename} : is not a Minimal Super Key (removal of Ename from {Eno, Ename}
leaves {Eno}, which is still a Super Key); hence {Eno, Ename} is not a Minimal Super
Key.
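These definitions translate directly into code: an attribute set is a superkey if no two rows agree on all of its attributes, and it is minimal if dropping any one attribute destroys that property. A Python sketch follows; the EMP rows are hypothetical sample data.

```python
from itertools import combinations

def is_superkey(rows, attrs):
    """attrs is a superkey if no two rows agree on all attributes in attrs."""
    projections = [tuple(row[a] for a in attrs) for row in rows]
    return len(projections) == len(set(projections))

def is_minimal_superkey(rows, attrs):
    """A key: a superkey from which no single attribute can be removed."""
    if not is_superkey(rows, attrs):
        return False
    return all(not is_superkey(rows, rest)
               for rest in combinations(attrs, len(attrs) - 1))

# Hypothetical EMP relation
emp = [
    {"Eno": 1, "Ename": "ravi", "Voter_id": "V9"},
    {"Eno": 2, "Ename": "ravi", "Voter_id": "V7"},
]
print(is_superkey(emp, ("Eno", "Ename")))          # superkey: rows still distinct
print(is_minimal_superkey(emp, ("Eno", "Ename")))  # not minimal: {Eno} alone suffices
print(is_minimal_superkey(emp, ("Eno",)))          # minimal superkey (a key)
```

This mirrors the {Eno} vs {Eno, Ename} example above.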
Candidate Key :
Definition : If a relation schema has more than one key (Minimal Super Key) then
each of them is called as candidate key.
Primary Key :
Note :
A Relation(table) can have many Superkeys, and also many Minimal Superkeys.
One of the candidate keys is arbitrarily designated to be the primary key, and the
others are called secondary keys(or Alternative key).
Key Hierarchy
Foreign Key:
A foreign key is an attribute whose values match the primary key values in the
related table.
A foreign key means the values of a column in one table must also appear
in a column in another table. The foreign key in the child table will generally
reference a primary key in the parent table. The referencing table is called
the child table & referenced table is called the parent table
Self Referential Integrity Constraint mean a column in one table references
the primary key column(s) in the same table.
EMP (referencing relation)
In the EMP table, Eno is the Primary Key. (Duplicates and null values are not allowed
in Eno.)
In the EMP table, Dept_no is a foreign key (FK) which references the Dept_no column
of the DEPT table. (A value for Dept_no in the EMP table is accepted only if it exists
in the Dept_no column of the DEPT table.)
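The acceptance rule just stated can be sketched as a check: collect the parent table's primary-key values and flag any child foreign-key value that is absent. The rows below are hypothetical sample data, not from the notes.

```python
def referential_violations(child_rows, fk, parent_rows, pk):
    """Return the set of FK values in the child that do not occur as PK values
    in the parent (NULL/None values are allowed, per referential integrity)."""
    parent_keys = {row[pk] for row in parent_rows}
    return {row[fk] for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys}

dept = [{"Dept_no": 10}, {"Dept_no": 20}, {"Dept_no": 30}]
emp = [
    {"Eno": 1, "Dept_no": 10},
    {"Eno": 2, "Dept_no": 20},
    {"Eno": 3, "Dept_no": 99},   # 99 does not exist in DEPT -> violation
]
print(referential_violations(emp, "Dept_no", dept, "Dept_no"))  # {99}
```

A real DBMS enforces this automatically when the foreign key is declared; the sketch only shows what is being enforced.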
DEPT (referenced relation), with Dept_no as PK:
10 MTech BVRM
20 MBA HYD
30 MCA BVRM
Secondary Key:
A secondary key is defined as a key that is used purely for data retrieval purposes.
Example: In the CUSTOMER table, data retrieval can be facilitated when
CUST_LAST and CUST_PHONE are used together.
Integrity Rules:
1. Entity Integrity: All primary key entries are unique and no part of the primary
key may be NULL.
Example: In Agent table the agent_code is primary key and this column is
free from null values.
2. Referential Integrity: A foreign key is an attribute whose values match the
primary key values in the related table.
Example: The vendor_code is the primary key in the VENDOR table and occurs
as a foreign key in a related table.
3. NOT NULL: A NOT NULL constraint can be placed on a column; while inserting a
row, that column must have a value.
4. Unique: A UNIQUE constraint can be placed on a column; while inserting a row,
that column must have unique values (no duplication).
The relational set operators are SELECT, PROJECT, JOIN, INTERSECT, UNION,
DIFFERENCE, PRODUCT, DIVIDE.
1. UNION:
The UNION operator combines all rows from two tables, excluding duplicate
rows. The tables must have the same structure.
Product 1
Product 2
2. INTERSECT:
The INTERSECT operator gives only the rows that appear in both tables. The
tables must also have the same structure.
3. DIFFERENCE:
The DIFFERENCE operator gives all rows in one table that are not found in the
other table.
STUDENT EMPLOYEE
4. PRODUCT:
The PRODUCT operator gives all possible pair of rows from two tables.
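The four operators above map naturally onto Python set operations when each row is modelled as a tuple. A minimal sketch with hypothetical union-compatible relations:

```python
from itertools import product

# Two union-compatible relations as sets of tuples (hypothetical sample rows)
r1 = {("P1", "pen"), ("P2", "pencil")}
r2 = {("P2", "pencil"), ("P3", "eraser")}

union = r1 | r2                                   # all rows, duplicates removed
intersect = r1 & r2                               # rows appearing in both tables
difference = r1 - r2                              # rows of r1 not found in r2
cartesian = {a + b for a, b in product(r1, r2)}   # every possible pair of rows

print(len(union), len(intersect), len(difference), len(cartesian))  # 3 1 1 4
```

UNION and INTERSECT require the same structure, exactly as the text says; PRODUCT does not, since it simply concatenates each pair of rows.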
5. SELECT:
The SELECT operator gives all rows that satisfy a given condition.
6. PROJECT:
The PROJECT operator gives all values for selected attributes. In other words,
PROJECT gives a vertical subset of a table.
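SELECT (a horizontal subset) and PROJECT (a vertical subset) can be sketched as plain Python functions over rows stored as dictionaries; the employee rows here are hypothetical.

```python
employees = [
    {"Eno": 1, "Ename": "ravi",    "City": "BVRM"},
    {"Eno": 2, "Ename": "ram",     "City": "HYD"},
    {"Eno": 3, "Ename": "krishna", "City": "BVRM"},
]

def select(rows, predicate):
    """Horizontal subset: only the rows satisfying the condition."""
    return [r for r in rows if predicate(r)]

def project(rows, attrs):
    """Vertical subset: only the chosen attributes, duplicates removed."""
    seen = []
    for r in rows:
        t = {a: r[a] for a in attrs}
        if t not in seen:
            seen.append(t)
    return seen

print(select(employees, lambda r: r["City"] == "BVRM"))  # rows for Eno 1 and 3
print(project(employees, ["City"]))                      # one copy of each city
```

Note that PROJECT removes duplicate result rows, which is why only two cities remain.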
7. JOIN: The JOIN operator combines rows from 2 or more tables. There are
several types of joins.
a. Natural Join
A Natural Join joins tables by selecting the rows with common values in their
common attributes. A natural join is the result of a three-stage process:
1). First, a PRODUCT of the tables is created, yielding the results shown in Figure
3.12.
2). Second, a SELECT is performed on the output of Step 1 to yield only the rows for
which the AGENT_CODE values are equal. The common columns are referred to as
the join columns.
3). Third, a PROJECT is performed on the results of Step 2 to yield a single copy of
each attribute, thereby eliminating duplicate columns.
The final outcome of a natural join yields a table that does not include unmatched
pairs and provides only the copies of the matches.
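The three-stage process (PRODUCT, then SELECT on the join columns, then PROJECT away the duplicate column) can be sketched in Python; the customer/agent rows are hypothetical.

```python
def natural_join(r1, r2):
    """PRODUCT of the two tables, SELECT rows with equal common-column
    values, then keep a single copy of each attribute."""
    common = set(r1[0]) & set(r2[0])               # join columns
    result = []
    for a in r1:                                   # stage 1: product
        for b in r2:
            if all(a[c] == b[c] for c in common):  # stage 2: select matches
                result.append({**a, **b})          # stage 3: one copy per column
    return result

customers = [{"Cust_No": 1, "Agent_Code": "A1"},
             {"Cust_No": 2, "Agent_Code": "A2"}]
agents = [{"Agent_Code": "A1", "Agent_Name": "ravi"}]

print(natural_join(customers, agents))
# Customer 2 has no matching agent and is dropped, as the text describes.
```

The unmatched pair disappearing from the result is exactly the behaviour that motivates the outer join below.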
b. Equijoin (or Inner Join):
An inner join produces only the set of records that match in both Table A and
Table B.
Product:
Vendor:
Vend_code Vend_name
V101 ravi
V102 ram
V103 krishna
Result:
Outer Join:
In an outer join, the matched pairs of records are kept, and any unmatched
values from the other table are filled with NULL.
Product:
Vendor:
Vend_code Vend_name
V101 ravi
V102 ram
V103 krishna
Result:
8. DIVIDE:
The DIVIDE operator uses one single column table as a deviser and two column
table as the dividend. The output of DIVIDE operator is a single column with a
values column-A from the dividend table rows where the values of the common
column in both tables match.
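A Python sketch of DIVIDE under this definition, using hypothetical letter/number pairs: keep the column-A values that are paired with every value in the divisor.

```python
def divide(dividend, divisor):
    """dividend: set of (a, b) pairs; divisor: set of b values.
    Returns the a values paired with EVERY value in the divisor."""
    candidates = {a for a, b in dividend}
    return {a for a in candidates
            if divisor <= {b for x, b in dividend if x == a}}

dividend = {("A", 1), ("A", 2), ("B", 1), ("C", 1), ("C", 2)}
divisor = {1, 2}
print(divide(dividend, divisor))  # A and C are paired with both 1 and 2; B is not
```

The `<=` subset test is what enforces the "matches every divisor value" requirement.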
Data Dictionary:
The Data Dictionary provides a description of all tables in the database. The
Data Dictionary contains attribute names and characteristics of each table in the
system. The data dictionary contains meta data.
Example:
System Catalog:
Like the Data Dictionary, the System Catalog contains metadata. The system catalog
describes table names, the table creator, the creation date, the number of columns in
each table, the data types of the columns, authorized users and access privileges.
One-to-Many
In this relationship, one entity of one table is associated with a number of entities
in another table. Consider “PAINTER paints PAINTINGS”; the ER model and
implementation are shown below.
One-to-One
In this relationship one entity of one table is associated with one entity of the other
table and vice-versa.
Many-to-Many
In this relationship, each and every entity of one table will be associated with one or
more entities of the other table and vice versa. This relationship is not supported in
the relational environment. The many-to-many relationship can be converted into two
one-to-many relationships.
Consider student and subjects example. The ER model and implementation are
shown in below.
In the above, each and every entity of the student table is associated with one or
more entities of the subject table, because each student will opt for one or more
subjects in a semester.
Each and every entity of the subject table is associated with one or more entities of
the student table, because each subject will be opted for by more than one student in
a semester. Many-to-many associations are not supported.
Hence the relationship is converted into two one-to-many associations, as shown
below, by introducing an intermediate table in which the common data of the
original tables are stored.
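The intermediate-table idea can be sketched in Python: the M:N student-subject association is stored only as rows of an ENROLL table, from which both one-to-many directions are recovered. The sample data is hypothetical.

```python
# Hypothetical student and subject tables keyed by their primary keys
students = {1: "manimala", 2: "priya"}
subjects = {"S1": "DBMS", "S2": "OS"}

# Intermediate ENROLL table: one row per (student, subject) pair
enroll = [(1, "S1"), (1, "S2"), (2, "S1")]

# One direction of the original M:N — subjects taken by student 1:
print([subjects[sub] for stu, sub in enroll if stu == 1])     # ['DBMS', 'OS']
# The other direction — students enrolled in subject S1:
print([students[stu] for stu, sub in enroll if sub == "S1"])  # ['manimala', 'priya']
```

Each ENROLL row participates in two 1:M relationships (STUDENT enrolls, SUBJECT is enrolled in), which is exactly the decomposition the text describes.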
Index:
An index is composed of an index key and a set of pointers. Each key points to
the location of the data identified by the key.
Example: Suppose we want to look up all of the paintings created by a given
painter. Without an index, we must read each row in the PAINTING table. If we index
the PAINTING table and use the index key of painter number, we can look up the
appropriate rows directly.
DBMSs use indexes for many different purposes. You just learned that an
index can be used to retrieve data more efficiently. But indexes can also be used by
a DBMS to retrieve data ordered by a specific attribute or attributes.
For example, creating an index on a customer's last name will allow you to
retrieve the customer last names in alphabetical order.
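Both uses of an index — direct lookup and ordered retrieval — can be sketched with a Python dictionary mapping key values to row positions; the painting rows are hypothetical.

```python
paintings = [
    {"painting": "Sunrise", "painter_num": 123},
    {"painting": "Lake",    "painter_num": 126},
    {"painting": "Storm",   "painter_num": 123},
]

# Build an index: each key value points to the positions of matching rows
index = {}
for pos, row in enumerate(paintings):
    index.setdefault(row["painter_num"], []).append(pos)

# Use 1: look up painter 123 without scanning the whole table
print([paintings[i]["painting"] for i in index[123]])   # ['Sunrise', 'Storm']

# Use 2: iterating the index keys in sorted order yields data
# ordered by the indexed attribute
print(sorted(index))                                    # [123, 126]
```

A real DBMS index (typically a B-tree) serves both purposes the same way: fast point lookup plus ordered traversal of keys.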
Rule 1: Information:
The metadata must be stored as ordinary data in a table within the database. Such
data must be available to authorized users.
The relational database may support many languages. However, it must support data
definition, view definition, data manipulation, integrity constraints, authorization and
transaction management.
Application programs are logically unaffected when storage structures are changed.
Application programs are logically unaffected when changes are made to the table
structures.
All relational integrity constraints must be definable in the relational language and
stored in the system catalogs.
The end users and application programs are unaffected by the data locations.
If the system support low level access to the data, there must not be a way to
bypass the integrity rules of the database.
UNIT-2 CHAPTER-4
In Chen notation the attributes are represented by ovals and are connected to the
entity rectangle with a line. Each oval contains the name of the attribute.
In the Crow's Foot notation the attributes are written in an attribute box below the
entity rectangle.
In the Crow's Foot notation the required attributes are represented in boldface.
Domain: Attributes have a domain. A domain is the set of possible values for a
given attribute.
Eg: The domain for the gender attribute consists of only two possibilities namely
male and female.
For e.g., to identify each student entity, a composite primary key composed of the
combination of Stu_Lname and Stu_Fname may be used instead of Stu_no.
A Composite Attribute is an attribute that can be further subdivided to give
additional attributes.
Eg: Address can be subdivided into street, city and pincode.
Single-valued Attribute: A single-valued attribute is an attribute that can have only
one value.
Multi-valued Attribute: A multi-valued attribute is an attribute that can have many
values.
In Chen notation, multi-valued attributes are shown by a double line connecting
the attribute to the entity.
Note: The crows foot notation does not identify multivalued attributes.
Derived Attribute: A derived attribute is computed from other attributes.
Eg: An employee's age can be computed as the difference between the current date
and the employee's date of birth.
1. One- One
2. One- Many
3. Many-Many
The first value represents the minimum number of associated entities. The
second value represents the maximum number of associated entities.
For Eg:
The cardinality (1,1) indicates that each class is taught by one and only one professor.
The cardinality (1,4) indicates that one entity in the PROFESSOR relation is associated
with at least one and not more than four entities in the CLASS relation.
Weak Relationship
A weak relationship exists if the primary key of the related entity does not contain
a primary key component of the parent entity.
The relationship between COURSE and CLASS is weak because class_code is
the primary key in the CLASS entity, while course_code in CLASS is a foreign key.
In this example the CLASS primary key did not inherit a primary key component
from the COURSE entity.
STRONG RELATIONSHIP
A strong relationship exists if the primary key of the related entity contains a primary
key component of the parent entity. The relationship between COURSE and CLASS is
strong when the CLASS entity's composite key is composed of class_code + crs_code.
Here the CLASS primary key inherits a primary key component from the COURSE
entity.
WEAK ENTITY
In a weak entity the primary key is partially or totally derived from the parent entity
in the relationship. For example, the DEPENDENT entity's key is inherited from the
EMPLOYEE entity as shown below.
STRONG ENTITY
In a strong entity the primary key is not partially or totally derived from the parent
entity in the relationship. For example, in the COURSE and CLASS relationship, the
CLASS table primary key is class_code, which is not derived from the COURSE parent
key. The entity CLASS is then a strong entity.
RELATIONSHIP PARTICIPATION:
Optional participation
Optional participation means that one entity occurrence does not require a
corresponding entity occurrence in a particular relationship.
Eg: In the “COURSE generates CLASS” relationship, an entity occurrence in the
COURSE table does not necessarily require a corresponding entity occurrence in the
CLASS table.
In the Crow's Foot notation there is no separate symbol for a mandatory entity; the
minimum cardinality is one for a mandatory entity.
The cardinality (1,1) represents that each class is taught by one professor. The
cardinality (0,3) indicates that a professor may teach no classes, or up to three classes.
The following table shows various cardinalities that are supported by crow’s foot
notation.
RELATIONSHIP DEGREE
Eg: An employee within the EMPLOYEE entity is the manager of one or more
employees within that entity.
Binary Relationship: A binary relationship exists when two entities are associated.
1. One- One
2. One- Many
3. Many-Many
Many to Many: The many-to-many relationship may be expressed as: a course may
be a prerequisite to many other courses, and a course may have many prerequisites.
Example: In the Crow's Foot notation, the relationship shown between the parent and
child entities indicates a strong relationship.
DEVELOPING AN ER DIAGRAM:
During the review process additional objects, attributes and relationships will be
discovered. Therefore the basic ERD will be modified to incorporate the newly
discovered components.
Eg: Let us begin with initial interviews with the Tiny College administrators; the
interview process yields the following business rules.
5. Each department may have many professors; one and only one of those professors
chairs the department, and no professor is required to accept the chair position.
6. Each professor may teach classes; a professor may also teach no classes.
7. A student may enroll in several classes, and each class contains several students.
Student is optional to class in the many-to-many relationship. This many-to-many
relationship must be divided into two one-to-many relationships through an
ENROLL entity.
The following diagram shows the Crow's Foot ERD for Tiny College.
Database Design Challenges:
Database designers often must make design compromises that are triggered by
conflicting goals, such as adherence to design standards (design elegance),
processing speed, and information requirements.
Design standards:
The database design must conform to design standards. Such standards have
guided you in developing logical structures that minimize data redundancies.
In short, design standards allow you to work with well-defined components and to
evaluate the interaction of those components with some precision.
Processing speed:
In many organizations, particularly those generating large numbers of transactions,
high processing speeds are often a top priority in database design. High processing
speed means minimal access time. If the focus is on data-retrieval speed, you might
also be forced to include derived attributes in the design.
Information requirements:
The quest for timely information might be the focus of database design.
Complex information requirements may dictate data transformations, and they may
expand the number of entities and attributes within the design.
Therefore, the database may have to sacrifice some of its “clean” design structures
and/or some of its high transaction speed to ensure maximum information
generation.
A design that meets all logical requirements and design conventions is an important
goal.
However, if this perfect design fails to meet the customer’s transaction speed and/or
information requirements, the designer will not have done a proper job from the end
user’s point of view.
Finally, prepare the document! Put all design activities in writing. Then review what
you’ve written.
Chapter-5
Def:
SubTotal 10259.45
18 Amber Wave 114 Harika Applications designer 48.10 25.6 1231.36
118 Ganesh General support 18.36 45.3 831.708
The easiest way to generate the required report is to create a table that has all the
fields of the report.
Table_Name : Construction_Company
For example, employee number 104 has been assigned to two projects.
Therefore, knowing the Proj_No and Emp_No, we can find the job classification
and hours worked. Therefore Proj_No and Emp_No jointly will be taken as the
primary key.
1. The Proj_No is a part of the primary key, but it contains null values.
2. The table entries invite data inconsistency; for example, the job classification
value Electrical_Engineer might also be entered as Elec_engi (or) EE.
3. The table displays data redundancy.
Update Anomalies: Modifying the job class for Emp_No 105 requires updating many rows.
Insertion Anomalies: To complete a row, a new employee must be assigned to a project. If the employee is not yet assigned, a dummy project must be created to complete the row.
Deletion Anomalies: Suppose only one employee is associated with a project; if that employee leaves the company and the employee data are deleted, the project information is also deleted.
Although the table structure appears to work, these deficiencies mean that reports may give different results depending on how the data were entered.
Normalization Process: The most common Normal forms and their characteristics
are
CONSTRUCTION_COMPANY(Proj_No, Proj_Name, (Emp_No, Emp_Name, Job_Classification, Charge_Per_Hour, Hours_Billed)) ----- (1)
The fields Total_Charge, SubTotal, and Grand_Total are not included in the relation because they are derived attributes.
In relation (1), the fields in the innermost set of parentheses are known as a repeating group, which results in data redundancy. To achieve 1NF, the repeating group must be removed; hence relation (1) is subdivided into two relations:
PROJECT(Proj_No, Proj_Name) ------ (2)
PROJECT_EMP(Proj_No, Emp_No, Emp_Name, Job_Class, Charge_Per_Hour, Hours_Billed) ------ (3)
Now relations (2) and (3) are in 1NF. In relation (3), Proj_No and Emp_No jointly serve as the key.
In relation (2) there is only one key field, so there is no scope for partial dependency; the absence of partial dependency takes relation (2) to 2NF without any modification.
Assignment(Proj_No, Emp_No, Hours_Billed) ---------------------------- (4)
Emp_Job(Emp_No, Emp_Name, Job_Class, Charge_Per_Hour) ---------------- (5)
Transitive Dependency:
In relation (2) there is only one non-key field, so it has no transitive dependency; relation (2) can be treated as 3NF without any modification. Similarly, relation (4) has only one non-key field, so it too has no transitive dependency and can be treated as 3NF without any modification.
Diagram
Hence relation (5) is subdivided into two relations, (6) and (7), as shown below.
Job(Job_Class, Charge_Per_Hour)-----------------(6)
Emp(Emp_No,Emp_Name,Job_Class)------------(7)
Hence, the process of normalization stops, and the final 3NF relations of the construction company are shown below.
Project(Proj_No, Proj_Name)
Assignment(Proj_No, Emp_No, Hours_Billed)
Emp(Emp_No, Emp_Name, Job_Class)
Job(Job_Class, Charge_Per_Hour)
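As a sketch (using Python's built-in sqlite3 module, with sample values taken from the report fragment earlier), the final 3NF relations can be created and joined back together to compute the derived Total_Charge, which is why the derived fields need not be stored:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
CREATE TABLE Project   (Proj_No INTEGER PRIMARY KEY, Proj_Name TEXT);
CREATE TABLE Job       (Job_Class TEXT PRIMARY KEY, Charge_Per_Hour REAL);
CREATE TABLE Emp       (Emp_No INTEGER PRIMARY KEY, Emp_Name TEXT,
                        Job_Class TEXT REFERENCES Job(Job_Class));
CREATE TABLE Assignment(Proj_No INTEGER REFERENCES Project(Proj_No),
                        Emp_No  INTEGER REFERENCES Emp(Emp_No),
                        Hours_Billed REAL,
                        PRIMARY KEY (Proj_No, Emp_No));
""")
cur.execute("INSERT INTO Project VALUES (18, 'Amber Wave')")
cur.execute("INSERT INTO Job VALUES ('General support', 18.36)")
cur.execute("INSERT INTO Emp VALUES (118, 'Ganesh', 'General support')")
cur.execute("INSERT INTO Assignment VALUES (18, 118, 45.3)")

# Total_Charge is derived (Charge_Per_Hour * Hours_Billed), so it is
# computed in the query rather than stored in any table.
row = cur.execute("""
    SELECT p.Proj_Name, e.Emp_Name,
           ROUND(j.Charge_Per_Hour * a.Hours_Billed, 3) AS Total_Charge
    FROM Assignment a
    JOIN Project p ON p.Proj_No   = a.Proj_No
    JOIN Emp     e ON e.Emp_No    = a.Emp_No
    JOIN Job     j ON j.Job_Class = e.Job_Class
""").fetchone()
print(row)   # ('Amber Wave', 'Ganesh', 831.708)
```

The report line 18.36 x 45.3 = 831.708 is reproduced without ever storing the total.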
Improving Design:
Emp(Emp_No,Emp_Name,Job_Code)
Job(Job_Code,Job_Classification,Charge_Per_Hour)
Job(Job_Code,Job_Description,Job_Chg_Hour)
Assignment(Proj_No,Emp_No,Assign_Hours_Billed)
Emp(Emp_No,Emp_LName,Emp_FName,Emp_Init,Job_Code)
Emp(Emp_No, Emp_LName, Emp_FName, Emp_Init, Hire_Date, Gross_Salary, Job_Code)
more than once on the same day, it would violate the primary key constraint. The same data entry causes no problem when Assign_No is used as the primary key of the Assignment relation.
Assignment(Assign_No, Assign_Date, Proj_No, Emp_No, Assign_Hours_Billed)
Assignment(Assign_No, Assign_Date, Proj_No, Emp_No, Assign_Hours_Billed, Assign_Chg_Hour)
BCNF can be violated only when the table contains more than one candidate key.
Candidate key:
A candidate key is a minimal superkey, i.e., a superkey that contains no redundant subset of attributes; the key itself is a superkey.
A, B → C, D
C → B
To convert the above table structure from 3NF to BCNF, first change the primary key to (A, C). The dependency C → B means that C in effect determines B. The decomposition procedure then produces the results shown below.
Each Class_Code identifies a class uniquely. A student can take many classes and earns a grade in each, and a staff member can teach many classes, but each class is taught by only one staff member.
Class_Code → Staff_Id
The above table contains two overlapping candidate keys, and this dependency violates BCNF. We can eliminate the problem by decomposing the table.
Consider an employee who can have multiple assignments: the employee works as a volunteer in a service organization and also works on different projects, as shown below.
After decomposition, the Assignment table and the Service table do not contain multivalued dependencies.
Eg: The need for denormalization arises when generating a faculty evaluation report in which each row lists the mean scores obtained during the last four semesters taught.
Instructor | Dept. | Sem-1 | Mean | Sem-2 | Mean | Sem-3 | Mean | Sem-4 | Mean | Last_sem_avg
It is easy to generate the above report, but a problem arises because the data are stored in a normalized table in which each row represents a different score for a given faculty member in a given semester.
EVALDATA:
The other table, FACHIST (the faculty history table), contains the last four semester means for each faculty member. The faculty history table is a temporary table created from EVALDATA as shown below.
Instructor | Dept. | Sem-1 | Mean | Sem-2 | Mean | Sem-3 | Mean | Sem-4 | Mean | Last_sem_avg
FACHIST is an unnormalized table; using it, the faculty evaluation report can be generated very quickly. After generating the report, the temporary
table FACHIST is deleted. By working this way, we increase the performance of the database.
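The EVALDATA-to-FACHIST transformation can be sketched with sqlite3. The instructor name and scores below are made-up sample data, and the conditional-aggregation query is one possible way to build the temporary denormalized table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
# EVALDATA is normalized: one row per instructor per semester.
cur.execute("CREATE TABLE EVALDATA (Instructor TEXT, Dept TEXT, Sem INTEGER, Mean REAL)")
cur.executemany("INSERT INTO EVALDATA VALUES (?,?,?,?)", [
    ("Graf", "CIS", 1, 3.8), ("Graf", "CIS", 2, 3.5),
    ("Graf", "CIS", 3, 3.9), ("Graf", "CIS", 4, 4.0),
])

# FACHIST is a temporary, denormalized table: one column per semester mean.
cur.execute("""
CREATE TEMP TABLE FACHIST AS
SELECT Instructor, Dept,
       MAX(CASE WHEN Sem = 1 THEN Mean END) AS Sem1_Mean,
       MAX(CASE WHEN Sem = 2 THEN Mean END) AS Sem2_Mean,
       MAX(CASE WHEN Sem = 3 THEN Mean END) AS Sem3_Mean,
       MAX(CASE WHEN Sem = 4 THEN Mean END) AS Sem4_Mean,
       AVG(Mean) AS Last_Sem_Avg
FROM EVALDATA
GROUP BY Instructor, Dept
""")
row = cur.execute("SELECT * FROM FACHIST").fetchone()
print(row)

# After the report is produced, the temporary table is dropped.
cur.execute("DROP TABLE FACHIST")
```

The normalized EVALDATA remains the source of truth; FACHIST exists only for the duration of the report run.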
Chapter – 6
For example, a pilot shares certain characteristics with other employees, such as Emp_No, Emp_Name, Emp_Address, and Emp_Hire_Date. However, some pilot characteristics, such as Emp_License and Emp_Rating, are not shared by other employees; storing them in the employee entity would generate nulls for employees who are not pilots.
The pilot entity stores only the attributes that are unique to pilots, and the employee entity stores the attributes that are common to all employees.
We can conclude that PILOT is a subtype of EMPLOYEE, and EMPLOYEE is a supertype of PILOT. An entity supertype is a generic entity type that is related to one or more entity subtypes: the supertype contains the common characteristics, while each subtype contains its unique characteristics.
Specialization Hierarchy:
Entity supertypes and subtypes are organized in a hierarchy that depicts the relationships between a higher-level supertype (parent entity) and lower-level subtypes (child entities).
1. Pilot
2. Mechanic
3. Accountant.
The specialization hierarchy reflects a one-to-one relationship between the supertype entity and each subtype entity.
Eg:
Inheritance:
Subtype Discriminator:
A subtype discriminator is the attribute in the supertype entity that determines to which subtype each occurrence is related.
If Emp_Type has the value 'P', the supertype occurrence is related to the PILOT subtype.
If Emp_Type has the value 'A', it is related to the ACCOUNTANT subtype.
If Emp_Type has the value 'M', it is related to the MECHANIC subtype.
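A minimal sketch of the supertype/subtype split with an Emp_Type discriminator; the table and column names, and the sample license values, are illustrative rather than taken from the text:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.executescript("""
-- Supertype holds common attributes plus the discriminator column.
CREATE TABLE Employee (Emp_No INTEGER PRIMARY KEY, Emp_Name TEXT,
                       Emp_Type TEXT CHECK (Emp_Type IN ('P','M','A')));
-- Subtype holds only pilot-specific attributes.
CREATE TABLE Pilot    (Emp_No INTEGER PRIMARY KEY REFERENCES Employee(Emp_No),
                       Emp_License TEXT, Emp_Rating TEXT);
""")
cur.execute("INSERT INTO Employee VALUES (101, 'Rao', 'P')")
cur.execute("INSERT INTO Pilot VALUES (101, 'ATP', 'SEL/MEL')")
cur.execute("INSERT INTO Employee VALUES (102, 'Devi', 'A')")  # accountant: no Pilot row

# Pilot attributes live only in the subtype table, so non-pilots do not
# carry NULL-filled license/rating columns.
pilots = cur.execute("""
    SELECT e.Emp_No, e.Emp_Name, p.Emp_License
    FROM Employee e JOIN Pilot p ON p.Emp_No = e.Emp_No
    WHERE e.Emp_Type = 'P'
""").fetchall()
print(pilots)   # [(101, 'Rao', 'ATP')]
```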
Disjoint subtypes are subtypes that contain unique, non-overlapping subsets of the supertype entity set. They are indicated by the letter 'd' inside the category shape. With disjoint subtypes, each supertype occurrence is related to only one subtype.
Eg: An employee who is a pilot can appear only in the PILOT subtype, not in any of the other subtypes.
Overlapping subtypes are subtypes that contain non-unique subsets of the supertype entity set. They are indicated by the letter 'o' inside the category shape. With overlapping subtypes, a supertype occurrence may be related to more than one subtype.
The ADMINISTRATOR and PROFESSOR subtypes overlap within the supertype EMPLOYEE.
Completeness constraint:
Partial completeness means that not every supertype occurrence is a member of a subtype; a single horizontal line under the circle represents the partial completeness constraint.
We can use various approaches to develop entity supertypes and subtypes.
Specialization is the top-down process of identifying lower-level entity subtypes from a higher-level entity supertype.
Eg: Specialization is used to identify multiple entity subtypes (Pilot, Mechanic, Accountant) from the supertype EMPLOYEE.
Generalization is the bottom-up process of identifying a higher-level entity supertype from lower-level entity subtypes.
Eg: Generalization is used to identify the entity supertype EMPLOYEE from the subtypes (Pilot, Mechanic, Accountant).
Entity Clustering:
Generally, the data modeler will develop an initial ERD containing just a few entities. As the design approaches completion, the ERD may contain hundreds of entities and relationships. In such cases, entity clusters can be used to minimize the number of entities shown in the ERD.
Entity Integrity:
Unique identifiers are commonly encountered in the real world: for example, a class_no is used to register for classes, an invoice_no identifies a particular invoice, an account_no identifies a credit card, and so on. These examples illustrate natural keys.
If an entity has a natural identifier, a data modeler uses that natural key as
the primary key of the entity.
The primary key's main function is to uniquely identify an entity: for a given primary key value, the relational model can determine the values of all dependent attributes.
Its second function is that primary key/foreign key pairs implement relationships between tables (entities).
A primary key contains unique values and does not accept null values.
The primary key should be permanent and unchangeable.
A primary key should have the minimum number of attributes.
Unique values are easier to manage when primary keys are numeric.
In the first case, assume that we have a STUDENT entity and a CLASS entity, with a many-to-many relationship between them implemented via an ENROLL entity. The ENROLL entity contains the key fields of STUDENT and CLASS, which together identify each entity instance in ENROLL.
In the second case, a weak entity in a strong relationship with a parent entity is normally used.
Eg: The key field of the EMPLOYEE entity is used as one of the key fields of the DEPENDENT entity.
Emp(Emp_No, Emp_FName, Emp_LName, Email)
There are some instances in which a primary key does not exist in the real-world object.
(or)
For example, consider a facility that rents rooms for small parties. The manager of the facility keeps a record of all the events in the following table format.
In the above entity there is no simple natural key that can be used as a primary key in the model.
Based on the concept of primary keys, we might suggest one of these composite keys: (Date, Time_Start, Room) or (Date, Time_End, Room).
1. When the data model is implemented, a composite primary key in the EVENT entity adds complexity to the design and to the application code.
2. The solution to the problem is to use a single numeric attribute as a surrogate primary key.
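A sketch of the surrogate-key option in SQLite, where an INTEGER PRIMARY KEY column generates unique values automatically; the table and column names are illustrative. Note that the natural composite key can still be kept as a UNIQUE constraint so that double-booking a room is rejected:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("""
CREATE TABLE Event (
    Event_Id   INTEGER PRIMARY KEY,          -- surrogate key, auto-generated
    Event_Date TEXT, Time_Start TEXT, Room TEXT,
    UNIQUE (Event_Date, Time_Start, Room)    -- natural key kept as a constraint
)""")
cur.execute("INSERT INTO Event (Event_Date, Time_Start, Room) "
            "VALUES ('2024-06-01', '18:00', '2B')")
cur.execute("INSERT INTO Event (Event_Date, Time_Start, Room) "
            "VALUES ('2024-06-01', '18:00', '3A')")
ids = [r[0] for r in cur.execute("SELECT Event_Id FROM Event ORDER BY Event_Id")]
print(ids)   # the surrogate values were generated automatically
```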
CHAPTER - 7
Transaction:
In database terms, a transaction is any action that reads from and/or writes to a database. A transaction may consist of a single SQL statement, a series of related UPDATE or INSERT statements, or a combination of SELECT, UPDATE, and INSERT statements.
Transaction Properties
Atomicity.
Consistency.
Isolation.
Durability.
Atomicity example: If a transaction T1 contains four SQL requests, all four must complete successfully; otherwise the entire transaction is aborted.
Isolation: The data used during the execution of the transaction cannot be used
by a second transaction until the first transaction is completed.
Durability: Once transaction changes are committed, they cannot be lost, even in the event of a system failure.
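Atomicity can be sketched with Python's sqlite3 module: either both updates of a hypothetical money transfer succeed, or the ROLLBACK restores the original state. The account table and amounts are invented for illustration:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE Account (Acc_No TEXT PRIMARY KEY, Balance REAL)")
con.execute("INSERT INTO Account VALUES ('A', 100.0), ('B', 50.0)")
con.commit()

def transfer(amount, fail=False):
    """Move money from A to B as one transaction; a failure triggers ROLLBACK."""
    try:
        con.execute("UPDATE Account SET Balance = Balance - ? WHERE Acc_No = 'A'",
                    (amount,))
        if fail:
            raise RuntimeError("simulated crash between the two updates")
        con.execute("UPDATE Account SET Balance = Balance + ? WHERE Acc_No = 'B'",
                    (amount,))
        con.commit()
    except RuntimeError:
        con.rollback()   # atomicity: the partial update is undone

transfer(30, fail=True)  # the simulated failure leaves no partial change
balances = dict(con.execute("SELECT Acc_No, Balance FROM Account"))
print(balances)          # {'A': 100.0, 'B': 50.0}
```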
1. The DBMS uses a transaction log file to keep track of all transactions that update the database.
2. The information stored in the log file is used by the DBMS for ROLLBACK after an abnormal termination or a system failure.
3. While the DBMS executes transactions that modify the database, it also automatically updates the transaction log.
4. The transaction log stores:
a. A record for the beginning of the transaction.
b. For each transaction component:
i. The type of operation being performed.
ii. The name of the object affected by the transaction (the name of the table).
iii. The "before" and "after" values of the fields being updated.
iv. Pointers to the previous and next transaction log entries for the same transaction.
c. A record for the end of the transaction.
Where cust_no=’c1234’;
If a system failure occurs, the DBMS examines the transaction log for incomplete transactions and rolls the database back to its previous consistent state.
Concurrent execution of transactions can cause three main problems:
1. Lost updates.
2. Uncommitted data.
3. Inconsistent retrievals.
Lost Updates:
The lost update problem occurs when two concurrent transactions T1 and T2 update the same data element and one of the updates is lost.
Eg: Two concurrent transactions T1 and T2 update the PROD_QOH value for the same item in the PRODUCT table. Assume that the current PROD_QOH value is 35.
Transaction   Computation
T1            PROD_QOH = PROD_QOH + 100
T2            PROD_QOH = PROD_QOH - 30
The following table shows that the serial execution of these transactions under normal circumstances gives the correct answer.
Time  Transaction  Step                    Stored value
1     T1           READ PROD_QOH           35
2     T1           PROD_QOH = 35 + 100
3     T1           WRITE PROD_QOH          135
4     T2           READ PROD_QOH           135
5     T2           PROD_QOH = 135 - 30
6     T2           WRITE PROD_QOH          105
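The lost-update interleaving can be sketched deterministically in plain Python (a toy in-memory "database" with no concurrency control; the class is purely illustrative):

```python
# Both transactions read PROD_QOH = 35 before either one writes, so the
# second write overwrites (loses) the first transaction's update.
class Database:
    def __init__(self):
        self.prod_qoh = 35

    def read(self):
        return self.prod_qoh

    def write(self, value):
        self.prod_qoh = value

db = Database()

# Interleaved schedule with no locking:
t1_local = db.read()       # T1 reads 35
t2_local = db.read()       # T2 reads 35 (before T1 writes!)
db.write(t1_local + 100)   # T1 writes 135
db.write(t2_local - 30)    # T2 writes 5 -- T1's update is lost

print(db.read())           # 5, instead of the correct 105
```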
Uncommitted Data
The uncommitted data problem occurs when two transactions T1 and T2 are executed concurrently and the first transaction (T1) is rolled back after the second transaction (T2) has already accessed the uncommitted data.
Eg: Two concurrent transactions T1 and T2 update the PROD_QOH value for the same item in the PRODUCT table; assume that the current PROD_QOH value is 35.
Transaction   Computation
T1            PROD_QOH = PROD_QOH + 100 (rolled back)
T2            PROD_QOH = PROD_QOH - 30
The following table shows the serial execution of these transactions under normal circumstances.
The following table shows how the uncommitted data problem can arise when the rollback is completed after T2 has begun its execution.
Inconsistent Retrievals:
Inconsistent retrievals occur when a transaction accesses data before and after another transaction finishes working with the same data.
For example, transaction T1 calculates the total PROD_QOH of the products stored in the PRODUCT table. At the same time, T2 updates the PROD_QOH of two products in the PRODUCT table.
TRANSACTION 1                              TRANSACTION 2
SELECT SUM(PROD_QOH) FROM PRODUCT;         UPDATE PRODUCT
                                           SET PROD_QOH = PROD_QOH + 10
                                           WHERE PROD_CODE = 1003;
                                           UPDATE PRODUCT
                                           SET PROD_QOH = PROD_QOH - 10
                                           WHERE PROD_CODE = 1004;
                                           COMMIT;
Total = 92                                 Total = 92
The Scheduler
The scheduler is a special DBMS process. It uses concurrency control algorithms, such as locking and time-stamping methods, to control the concurrent execution of transactions on the same database.
A transaction acquires a lock prior to data access; the lock is released when the transaction is completed, so that another transaction can then lock the data item for its exclusive use. All lock information is managed by a lock manager.
Lock Granularity
Lock granularity indicates the level of lock use. Locking can take place at the following levels:
Database level:
With a database-level lock, the entire database is locked, preventing the use of any table in the database by transaction T2 while transaction T1 is being executed.
1. This type of locking is unsuitable for multiuser DBMSs, because thousands of transactions would have to wait for the previous transaction to complete.
2. With a database-level lock, transactions cannot access the same database concurrently even when they use different tables.
Table level:
1. With a table-level lock, the entire table is locked, preventing access to any row by transaction T2 while transaction T1 is using the table.
2. If a transaction requires several tables, each table may be locked; two transactions can still access the same database as long as they access different tables.
3. Table-level locks also cause traffic jams when many transactions are waiting to access the same table.
The following shows that transactions T1 and T2 cannot access the same table even when they try to use different rows; T2 must wait until T1 unlocks the table.
Page Level
1. A table can span several pages, and a page can contain several rows. Page-level locks are currently the most frequently used locking method in multiuser DBMSs.
The following shows that transactions can access the same table while locking different pages. If T2 requires the use of a row located on a page that is locked by T1, T2 must wait until the page is unlocked by T1.
Row level
With row-level locks, as shown in the figure above, both transactions can execute concurrently, even when the requested rows are on the same page. T2 must wait only if it requests the same row as T1.
Field Level
A field-level lock allows concurrent transactions to access the same row as long as they use different fields. Field-level locking provides the most flexible multiuser data access, but it requires an extremely high level of overhead.
LOCK TYPES:
1. Binary lock
2. Shared/Exclusive lock
Binary Lock: A binary lock has two states: locked (1) and unlocked (0). Every transaction requires a lock and an unlock operation for each data item accessed; such operations are automatically managed by the DBMS.
Shared/Exclusive Lock:
1. A shared lock is issued when a transaction wants to read a data item from the database and no exclusive lock is held on that data item.
2. An exclusive lock is issued when a transaction wants to update a data item and no locks are held on that data item by any other transaction.
The following table shows that a conflict arises when at least one of the transactions performs a write operation.
T1      T2      RESULT
READ    READ    No conflict
READ    WRITE   Conflict
WRITE   READ    Conflict
WRITE   WRITE   Conflict
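The compatibility rule in the table can be sketched as a small function ('S' = shared, 'X' = exclusive; the naming is illustrative): a request is compatible only when both the held lock and the requested lock are shared.

```python
def compatible(held_lock: str, requested_lock: str) -> bool:
    """Return True when the requested lock does not conflict with a held lock.
    'S' is a shared (read) lock; 'X' is an exclusive (write) lock."""
    return held_lock == "S" and requested_lock == "S"

# The four rows of the conflict table:
assert compatible("S", "S")        # READ/READ: no conflict
assert not compatible("S", "X")    # READ/WRITE: conflict
assert not compatible("X", "S")    # WRITE/READ: conflict
assert not compatible("X", "X")    # WRITE/WRITE: conflict
```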
Two-Phase Locking:
Two-phase locking defines how transactions acquire and relinquish locks. It consists of two phases:
i. Growing Phase
ii. Shrinking Phase.
Growing Phase:
In the growing phase, the transaction acquires all the locks it requires on data items. Once all locks have been acquired, the transaction has reached its lock point and performs its operations.
Shrinking Phase:
When the transaction has completed its operations, the shrinking phase starts. In the shrinking phase, the transaction releases its locks and cannot obtain any new lock.
106
DBMS Material
The transaction acquires all the locks it needs; when the lock point is reached it performs its operations, and when the transaction is completed it releases all the locks.
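A toy sketch of the two-phase rule (the class is illustrative, not a DBMS API): once any lock has been released, requesting a new lock is an error.

```python
class TwoPhaseTransaction:
    """Enforces 2PL: no lock may be acquired after the first unlock."""
    def __init__(self):
        self.locks = set()
        self.shrinking = False   # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: lock requested in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        self.shrinking = True
        self.locks.discard(item)

t = TwoPhaseTransaction()
t.lock("X"); t.lock("Y")   # growing phase; lock point reached here
t.unlock("X")              # shrinking phase begins
try:
    t.lock("Z")            # illegal: would break the two-phase rule
except RuntimeError as e:
    print(e)
```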
DEADLOCK:
A deadlock occurs when two transactions each hold a lock on a separate resource and each wants to acquire a lock on the other's resource.
In this situation, the first transaction has to wait until the second releases its lock, and the second likewise waits for the first. At this stage both are in a waiting state, and neither can proceed with its work.
Deadlocks are possible only when one of the transactions wants to obtain an exclusive lock on a data item; no deadlock can exist among shared locks.
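One standard way to prevent the circular wait is a global lock ordering, so that no transaction can hold one lock while waiting for another in the opposite order. A sketch with Python threads (the ordering-by-id scheme is an illustrative choice, not from the text):

```python
import threading

lock_a = threading.Lock()
lock_b = threading.Lock()
results = []

def transaction(name, first, second):
    # Both transactions sort their required locks the same way, so the
    # circular wait required for deadlock cannot arise.
    ordered = sorted([first, second], key=id)
    for lk in ordered:
        lk.acquire()
    results.append(name)        # "work" done while holding both locks
    for lk in ordered:
        lk.release()

# Without the ordering, T1 (a then b) and T2 (b then a) could deadlock.
t1 = threading.Thread(target=transaction, args=("T1", lock_a, lock_b))
t2 = threading.Thread(target=transaction, args=("T2", lock_b, lock_a))
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(results))   # both transactions complete; no deadlock
```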
Deadlock detection: The DBMS periodically tests the database for deadlocks. If a deadlock is found, one of the transactions is rolled back and the other continues.
Deadlock prevention: A transaction must obtain all the locks it needs before it can be executed.
The choice of the best deadlock control method depends on the database environment.
Assume that we have two conflicting transactions T1 and T2. T1 has time stamp 1154 and T2 has time stamp 1956, so T1 is the older transaction and T2 the younger one.
Unintentional
Intentional
An unintentional failure: under this category, humans delete the wrong rows from a table, press the wrong keys, or shut down the server accidentally.
An intentional failure: under this category, unauthorized users access the database to perform operations on it, or virus attacks damage the company's data.
Natural Incidents
Under this category fall earthquakes, floods, and power failures. Such critical events can leave the database in an inconsistent state; various techniques are used to recover the database from an inconsistent state to a consistent one.
Transaction Recovery
The following are the four important concepts that affect the recovery
process.
The transaction log entries are always written before any database data are updated (the write-ahead-log protocol).
Deferred Write
The recovery process for all started and committed transactions follows these steps:
1. For a transaction that started and was committed before the last checkpoint, nothing needs to be done, because the data are already saved.
2. For a transaction that performed a commit operation after the last check
point, the DBMS uses the transaction log records to redo the transaction and
to update the database, using the after values in the transaction log.
3. For a transaction that performed a rollback operation after the last checkpoint, nothing needs to be done, because the database was never updated.
Write-Through
1. For a transaction that started and was committed before the last checkpoint, nothing needs to be done, because the data are already saved.
2. For a transaction that was committed after the last check point, the DBMS
uses the transaction log records to redo the transaction, using the after
values in the transaction log.
3. For any transaction that had a rollback operation after the last checkpoint, the DBMS uses the transaction log to undo the operations, using the before values in the transaction log.
The centralized approach worked well to fill the structured information needs of corporations, but the centralized database management system fell short when faster response and quicker access to information were required.
What was needed was quick access to unstructured information, using ad hoc queries to generate on-the-spot information.
Because of these problems the technology changed, and this affected database development and design.
A Distributed Database Management System (DDBMS) consists of a single
logical database that is split into a number of fragments. Each fragment is stored on
one or more computers under the control of a separate DBMS, with the computers
connected by a communications network. Each site is capable of independently
processing user requests that require access to local data (that is, each site has
some degree of local autonomy) and is also capable of processing data stored on
other computers in the network.
• Less danger of a single-point failure. When one of the computers fails, the
workload is picked up by other workstations. Data are also distributed at multiple
sites.
•Processor independence. The end user is able to access any available copy of
the data, and an end user's request is processed by any processor at the data
location.
Disadvantages:
•Complexity of management and control. Applications must recognize data
location, and they must be able to stitch together data from various sites.
•Technological difficulty. Data integrity, transaction management, concurrency
control, security, backup, recovery, query optimization, access path selection, and so
on, must all be addressed and resolved.
• Security. The probability of security lapses increases when data are located at
multiple sites. The responsibility of data management will be shared by different
people at several sites.
•Lack of standards. There are no standard communication protocols at the
database level.
• Increased storage and infrastructure requirements. Multiple copies of data
are required at different sites, thus requiring additional disk storage space.
• Increased training cost. Training costs are generally higher in a distributed model than in a centralized model.
• Costs. Distributed databases require duplicated infrastructure to operate (physical
location, environment, personnel, software, licensing, etc.)
Distributed processing:
In distributed processing, a database’s logical processing is shared among
two or more physically independent sites that are connected through a network.
Although the database resides at only one site, each site can access the data and update the database. The database is located on Computer A, a network computer known as the database server.
Distributed Database:
A distributed database stores a logically related database at two or more sites connected via a communication network.
In a distributed database system, the database is composed of several parts known as database fragments, which are located at different sites. An example of a distributed database environment is shown below.
In the above diagram the database is divided into three fragments (E1, E2, E3) located at different sites. When you compare distributed processing with distributed databases, the following points are observed.
1. Distributed processing does not require a distributed database, but a
distributed database requires distributed processing (each database fragment
is managed by its own local database process).
2. Both distributed processing and distributed databases require a network to
connect all components.
Characteristics of distributed database management system:
The Characteristics of distributed database management system are
1. Application interface: To interact with the end user, application programs,
and other DBMSs within the distributed database.
2. Validation: To analyze data requests for syntax correctness.
3. Transformation: To decompose complex requests into atomic data request
components
4. Mapping: To determine the data location of local and remote fragments.
5. I/O interface: To read or write data from or to permanent local storage.
6. Security : To provide data privacy at both local and remote databases.
DDBMS COMPONENTS:
The components of DDBMS are
Computer workstations: The workstations form the network system. The distributed database system must be independent of the computer system hardware.
Network hardware and software: The network hardware and software components reside in each workstation; they allow all sites to interact and exchange data.
Communications media: The communication media carry the data from one node to another.
Transaction processor (TP):
The transaction processor (TP) is the software component, residing on each computer that requests data, that processes the application's data requests (remote and local). The TP is also known as the application processor (AP) or the transaction manager (TM).
Data processor (DP):
The data processor (DP), which is the software component residing on each
computer or device that stores and retrieves data located at the site. The DP is also
known as the data manager (DM).
Single-Site Processing, Single-Site Data (SPSD)
Under the single-site processing, single-site data (SPSD) scenario, all processing is done on a single host computer (a mainframe), and all data are stored on the host computer's hard disk. Processing cannot be done on the end user's side of the system.
The transaction processor and Data processor are embedded with in the
DBMS located on a single computer.
Multiple-Site Processing, Single-Site Data (MPSD)
Under the multiple-site processing, single-site data (MPSD) scenario,
multiple processes run on different computers sharing a single data repository. The
MPSD scenario requires a network file server.
All record- and file-locking activities are done at the workstation.
All data selection, search, and update functions take place at the workstation, thus
requiring that entire files travel through the network for processing at the
workstation. Such a requirement increases network traffic, slows response time, and
increases communication costs.
Multiple-Site Processing, Multiple-Site Data (MPMD):
The multiple-site processing, multiple-site data (MPMD) scenario
supports multiple transaction processing and multiple data processing at multiple
sites. The DDBMSs are classified as either homogeneous or heterogeneous.
Homogeneous DDBMSs: Homogeneous DDBMS supports same DBMS will be
running on different server platforms.
Heterogeneous DDBMSs: Heterogeneous DDBMS supports different DBMS will be
running on different server platforms.
DISTRIBUTION TRANSPARENCY:
Three levels of distribution transparency are recognized:
Fragmentation transparency: This is the highest level of transparency. The end user or programmer does not need to know the fragment names or fragment locations to access data.
Location transparency: This is the middle level of transparency. The end user must specify the fragment names but does not need to specify their locations.
Local mapping transparency: This is the lowest level of transparency. The end user or programmer must specify both the fragment names and their locations.
For example, the employee data are distributed over three different locations: Delhi, Hyderabad, and Bangalore.
The Delhi employees' data are stored in fragment E1, the Hyderabad employees' data in fragment E2, and the Bangalore employees' data in fragment E3, as shown below.
Suppose the end user wants to list all the employees with a date of birth before January 1, 1980. With fragmentation transparency, the end user writes this query as shown below.
Select * from employees where DOB< ’01-jan-1980’;
With location transparency, the end user writes the query as shown below.
Select * from E1 where DOB< ’01-jan-1980’
Union
Select * from E2 where DOB< ’01-jan-1980’
Union
Select * from E3 where DOB< ’01-jan-1980’;
With local mapping transparency, the end user writes the query as shown below.
Select * from E1 NODE Delhi where DOB< ’01-jan-1980’
Union
Select * from E2 NODE Hyderabad where DOB< ’01-jan-1980’
Union
Select * from E3 NODE Bangalore where DOB< ’01-jan-1980’;
TRANSACTION TRANSPARENCY
To understand how the transactions are managed in the DDBMS, we should
know the basic concepts of remote request, remote transaction, Distributed request
and Distributed transaction.
Remote Request: A Remote request is shown below
BEGIN WORK;
UPDATE PRODUCT SET PROD_QTY = PROD_QTY – 1
WHERE PROD_NUM = ‘231785’;
BEGIN WORK;
UPDATE PRODUCT
SET PROD_QTY=PROD_QTY – 1 WHERE PROD_NUM = ‘231785’;
INSERT INTO INVOICE (CUS_NUM, INV_DATE, INV_TOTAL) VALUES (‘100’, ‘15-FEB-
2010’, 120.00);
UPDATE CUSTOMER SET CUS_BALANCE = CUS_BALANCE + 120
WHERE CUS_NUM = ‘100’;
COMMIT WORK;
As we observe in the above diagram, the transaction has the following features:
1. The transaction references two remote sites (B and C).
2. The first two requests (UPDATE PRODUCT and INSERT INTO INVOICE) are processed by the DP at remote site C, and the last request (UPDATE CUSTOMER) is processed by the DP at remote site B.
3. Each request can access only one remote site at a time.
The third characteristic may create problems. For example, suppose the table PRODUCT is divided into two fragments, PROD1 and PROD2, located at sites B and C respectively; then a distributed transaction cannot access the data, because no single request may span more than one site. Therefore, the DDBMS must support a distributed request.
Distributed request:
The Distributed request allows a single request to reference a partitioned
table in several sites.
For example, the CUSTOMER table is divided into two fragments E1 and E2, located at site B and site C, as shown below.
Phase 1: Preparation
1. The subordinates receive the PREPARE TO COMMIT message; write the transaction log, using the write-ahead protocol; and send an acknowledgment (YES/PREPARED TO COMMIT or NO/NOT PREPARED) message to the coordinator.
2. If all nodes are PREPARED TO COMMIT, the transaction goes to Phase 2. If one
or more nodes reply NO or NOT PREPARED, the coordinator broadcasts an ABORT
message to all subordinates.
Phase 2: The Final COMMIT
1. The coordinator broadcasts a COMMIT message to all subordinates and waits for
the replies.
2. Each subordinate receives the COMMIT message, and then updates the database
using the DO protocol.
3. The subordinates reply with a COMMITTED or NOT COMMITTED message to the
coordinator.
If one or more subordinates did not commit, the coordinator sends an ABORT
message, thereby forcing them to UNDO all changes.
Note: The objective of the two-phase commit is to ensure that each node commits
its part of the transaction; otherwise, the transaction is aborted.
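The two phases above can be condensed into a minimal sketch, with participant sites modeled as plain Python objects. The `Node` class and its vote flag are assumptions made for illustration; a real DDBMS would also write DO-UNDO-REDO log records at every step.

```python
class Node:
    """One participant (subordinate) site in the two-phase commit protocol."""

    def __init__(self, can_commit=True):
        self.can_commit = can_commit  # how this node will vote in Phase 1
        self.state = "ACTIVE"

    def prepare(self):
        # Phase 1: write the log (write-ahead protocol) and send the vote.
        self.state = "PREPARED" if self.can_commit else "REFUSED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"      # Phase 2: apply via the DO protocol

    def abort(self):
        self.state = "ABORTED"        # UNDO all changes


def two_phase_commit(subordinates):
    # Phase 1: the coordinator broadcasts PREPARE and collects the votes.
    votes = [node.prepare() for node in subordinates]
    if all(votes):
        # Phase 2: every node is PREPARED TO COMMIT, so broadcast COMMIT.
        for node in subordinates:
            node.commit()
        return "COMMITTED"
    # Any NO/NOT PREPARED vote forces a global ABORT on all subordinates.
    for node in subordinates:
        node.abort()
    return "ABORTED"
```

A single dissenting vote in Phase 1 is enough to abort the whole transaction, which is what guarantees that either every node commits its part or none does.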
Data Fragmentation
Data fragmentation allows you to break a single object into two or more
segments, or fragments. The object might be a user’s database, a system database,
or a table. Each fragment can be stored at any site over a computer network.
Horizontal fragmentation partitions the rows of a global fragment into subsets. A
fragment ri is a selection on the global fragment r using a predicate Pi, called its
qualification. The reconstruction of r is obtained by taking the union of all fragments.
Vertical fragmentation subdivides the attributes of the global fragment into
groups. The simplest form of vertical fragmentation is decomposition. A unique row-
id may be included in each fragment to guarantee that the reconstruction through a
join operation is possible.
Mixed fragmentation is the result of the successive application of both
fragmentation techniques.
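The three strategies and their reconstruction operators (union for horizontal fragments, a join on the shared row-id for vertical fragments) can be illustrated with ordinary Python data; the employee-style table below is an assumed example, not taken from the notes.

```python
# A small "global fragment" represented as a list of rows (dicts).
employees = [
    {"row_id": 1, "name": "Ann", "city": "New York", "salary": 50000},
    {"row_id": 2, "name": "Bob", "city": "Boston",   "salary": 60000},
]

# Horizontal fragmentation: each fragment is a selection by a predicate Pi.
frag_ny = [r for r in employees if r["city"] == "New York"]
frag_bo = [r for r in employees if r["city"] == "Boston"]
reconstructed_h = frag_ny + frag_bo  # reconstruction = union of fragments

# Vertical fragmentation: attributes are split into groups, and the
# row_id is kept in every fragment so a join can rebuild the table.
frag_left = [{"row_id": r["row_id"], "name": r["name"]} for r in employees]
frag_right = [
    {"row_id": r["row_id"], "city": r["city"], "salary": r["salary"]}
    for r in employees
]
reconstructed_v = [
    # Join each left row with the right row carrying the same row_id.
    {**left, **next(x for x in frag_right if x["row_id"] == left["row_id"])}
    for left in frag_left
]
```

Mixed fragmentation would simply apply the second step to one of the horizontal fragments (or vice versa); the reconstruction then composes a join inside a union.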
Rules for Fragmentation
1. Fragments are formed by the select predicates associated with dominant database
transactions. The predicates specify attribute values used in the conjunctive (AND)
and disjunctive (OR) form of select commands, and rows (records) containing the
same values form fragments.
2. Fragments must be disjoint and their union must become the whole fragment.
Overlapping fragments are too difficult to analyze and implement.
3. The largest fragment is the whole table, and the smallest fragment is a single
record. Fragments should be designed to maintain a balance between these extremes.
a new network is added, provided they do not need information from the added
node.
Rule-4.
Location Independence (or Transparency). A common global user view of the
database should be supported so that users need not know where the data is
located. This allows data to be moved for performance considerations or in response
to storage constraints without affecting the user applications.
Rule-5.
Fragmentation Independence (or Transparency). This allows tables to be split
among several sites, transparent to user applications. For example, we can store
New York employee records at the New York site and Boston employees at the
Boston site, but allow the user to refer to the separated data as EMPLOYEES,
independent of their locations.
Rule-6.
Replication Independence (or Transparency). This allows several copies of a
table (or portions thereof) to reside at different nodes. Query performance can be
improved since applications can work with a local copy instead of a remote one.
Update performance, however, may be degraded due to the additional copies.
Availability can improve.
Rule-7.
Distributed Query Processing. No central site should perform optimization; but
the submitting site, which receives the query from the user, should decide the
overall strategy. Other participants perform optimization at their own levels.
Rule-8.
Distributed Transaction Processing. The system should process a transaction
across multiple databases exactly as if all of the data were local. Each node should
be capable of acting as a coordinator for distributed updates, and as a participant in
other transactions. Concurrency control must occur at the local level (Rule 2), but
there must also be cooperation between individual systems to ensure that a “global
deadlock” does not occur.
Rule-9.
Hardware Independence. The concept of a single database system must be
presented regardless of the underlying hardware used to implement the individual
systems.
Rule-10.
Operating System Independence. The concept of a single database system must
be presented regardless of the underlying operating systems used.
Rule-11.
Network Independence. The distributed system must be capable of
communicating over a wide variety of networks, often different ones in the same
configuration. Standard network protocols must be adhered to.
Rule-12.
DBMS Independence (Heterogeneity). The distributed system should be able to
be made up of individual sites running different database management systems.