Module - 1 Introduction To Database Management System
Module - 1 Introduction To Database Management System
Module - 1 Introduction To Database Management System
Contents
1.1 Introduction ........................................................................................................................................ 4
1.2 Basic definitions and concepts ........................................................................................................... 4
1.2.1 Data......................................................................................................................................................... 4
1.2.2 Information ............................................................................................................................................. 4
1.2.3 Meta Data ............................................................................................................................................... 5
1.2.4 Data Dictionary....................................................................................................................................... 5
1.2.5 Database.................................................................................................................................................. 6
1.2.5.1 Fields ..................................................................................................................................... 6
1.2.5.2 Records.................................................................................................................................. 6
1.2.5.3 Files ....................................................................................................................................... 6
1.2.6 Components of a Database ..................................................................................................................... 6
1.2.7 Database Management System (DBMS) ................................................................................................ 7
1.2.8 Components of DBMS ........................................................................................................................... 7
1.2.8.1 Data Definition Language (DDL) ......................................................................................... 8
1.2.8.2 Data Manipulation Language (DML) and Query Language ................................................. 8
1.2.8.3 Software for Controlled Access of Database ......................................................................... 8
1.3 traditional file system Versus database systems ........................................................................... 8
1.3.1 Disadvantages of Traditional File System .............................................................................................. 9
1.3.2 Database Systems or Database System environment ............................................................................ 10
1.3.3 Advantages of Database Systems (DBMS’s)........................................................................................ 11
1.3.4 Disadvantages of Database Systems ..................................................................................................... 12
1.4 dBms users ................................................................................................................................... 13
1.4.1 end users or Naive users ....................................................................................................................... 13
An organization must have accurate and reliable data (information) for effective decision making. Data
(information) is the backbone and most critical resource of an organization that enables managers and
organizations to gain a competitive edge. In this age of information explosion, where people are bombarded
with data, getting the right information, in the right amount, at the right time is not an easy task. So, only
those organizations will survive that successfully manage information.
A database system simplifies the tasks of managing the data and extracting useful information in a timely
fashion. A database system is an integrated collection of related files, along with the details of the
interpretation of the data. A Data Base Management System is a software system or program that allows
access to data contained in a database. The objective of the DBMS is to provide a convenient and effective
method of defining, storing, and retrieving the information stored in the database.
The database and database management systems have become essential for managing business, governments,
schools, universities, banks etc.
In an organization, the data is the most basic resource. To run the organization efficiently, the proper
organization and management of data is essential. The formal definition of the major terms used in databases
and database systems is defined in this section.
1.2.1 Data
The term data may be defined as known facts that could be recorded and stored on Computer Media. It is also
defined as raw facts from which the required information is produced.
1.2.2 Information
Data and information are closely related and are often used interchangeably. Information is nothing but
refined data. In other way, we can say, information is processed, organized or summarized data. According
to Burch et. al., “Information is data that have been put into a meaningful and useful content and
communicated to a recipient who uses it to made decisions”. Information consists of data, images, text,
documents and voice, but always in a meaningful content. So we can say, that information is something more
than mere data.
Data are processed to create information. The recipient receives the information and then makes a decision
and takes an action, which may triggers other actions
In these days, there is no lack of data, but there is lack of quality information. The quality information means
information that is accurate, timely and relevant, which are the three major key attributes of information.
1. Accuracy : It means that the information is free from errors, and it clearly and accurately reflects the
meaning of data on which it is based. It also means it is free from bias and conveys an accurate picture to
the recipient.
2. Timeliness : It means that the recipients receive the information when they need it and within the required
time frame.
3. Relevancy : It means the usefulness of the piece of information for the corresponding persons. It is a very
subjective matter. Some information that is relevant for one person might not be relevant for another and
vice versa e.g., the price of printer is irrelevant for a person who wants to purchase computer.
So, organization that have good information system, which produce information that is accurate, timely and
relevant will survive and those that do not realize the importance of information will soon be out of business.
1.2.5 Database
A database is a collection of interrelated data stored together with controlled redundancy to serve one or more
applications in an optimal way. The data are stored in such a way that they are independent of the programs
used by the people for accessing the data. The approach used in adding the new data, modifying and retrieving
the existing data from the database is common and controlled one.
It is also defined as a collection of logically related data stored together that is designed to meet information
requirements of an organization. We can also define it as an electronic filling system.
The example of a database is a telephone directory that contains names, addresses and telephone
numbers of the people stored in the computer storage.
Databases are organized by fields, records and files. These are described briefly as follows :
1.2.5.1 Fields
It is the smallest unit of the data that has meaning to its users and is also called data item or data
element. Name, Address and Telephone number are examples of fields. These are represented in the
database by a value.
1.2.5.2 Records
A record is a collection of logically related fields and each field is possessing a fixed number of bytes and is
of fixed data type. Alternatively, we can say a record is one complete set of fields and each field have some
value. The complete information about a particular phone number in the database represents a record. Records
are of two types fixed length records and variable length records.
1.2.5.3 Files
A file is a collection of related records. Generally, all the records in a file are of same size and record type
but it is not always true. The records in a file may be of fixed length or variable length depending upon the
size of the records contained in a file. The telephone directory containing records about the different telephone
holders is an example of file. More detail is available in chapter 3.
DataItems
Relationships
Physical
Database Constraints
Schema
1. Data item : It is defined as a distinct piece of information and is explained in the previous section.
2. Relationships : It represents a correspondence between various data elements.
3. Constraints : These are the predicates that define correct database states.
4. Schema : It describes the organization of data and relationships within the database. The schema consists
of definitions of the various types of record in the database, the data-items they contain and the sets into
which they are grouped. The storage structure of the database is described by the storage schema. The
conceptual schema defines the stored data structure. The external schema defines a view of the database
for particular users.
ApplicationPrograms
DataDefinitionLanguage
DBMS (DDL)
Components
Softwaretoprocess queries
andprograms
(DML/SQL)
Softwareforcontrolledaccess
ofstoreddata
Physical
MetaData
Database
The DBMS software together with the database is called a Database System.
Conventionally, the data were stored and processed using traditional file processing systems. In these
traditional file systems, each file is independent of other file, and data in different files can be integrated only
by writing individual program for each application. The data and the application programs that uses the data
are so arranged that any change to the data requires modifying all the programs that uses the data. This is
because each file is hard-coded with specific information like data type, data size etc. Some time it is even
not possible to identify all the programs using that data and is identified on a trial-and-error basis.
A file processing system of an organization is shown in Figure 1.3. All functional areas in the organization
creates, processes and disseminates its own files. The files such as inventory and payroll generate separate
files and do not communicate with each other.
File File File File File File File File File File File
1 2 3 1 2 3 1 2 3 1 2
FIgure 1.3. Traditional file system.
No doubt such an organization was simple to operate and had better local control but the data of the
organization is dispersed throughout the functional sub-systems. These days, databases are preferred because
of many disadvantages of traditional file systems.
1.3.1 Disadvantages of Traditional File System
A traditional file system has the following disadvantages:
1. Data Redundancy : Since each application has its own data file, the same data may have to be recorded
and stored in many files. For example, personal file and payroll file, both contain data on employee
name, designation etc. The result is unnecessary duplicate or redundant data items. This redundancy
requires additional or higher storage space, costs extra time and money, and requires additional efforts
to keep all files upto-date.
2. Data Inconsistency : Data redundancy leads to data inconsistency especially when data is to be updated.
Data inconsistency occurs due to the same data items that appear in more than one file do not get updated
simultaneously in each and every file. For example, an employee is promoted from Clerk to
Superintendent and the same is immediately updated in the payroll file may not necessarily be updated
in provident fund file. This results in two different designations of an employee at the same time. Over
the period of time, such discrepencis degrade the quality of information contain in the data file that
affects the accuracy of reports.
3. Lack of Data Integration : Since independent data file exists, users face difficulty in getting information
on any ad hoc query that requires accessing the data stored in many files. In such a case complicated
programs have to be developed to retrieve data from every file or the users have to manually collect the
required information.
4. Program Dependence : The reports produced by the file processing system are program dependent,
which means if any change in the format or structure of data and records in the file is to be made, the
programs have to modified correspondingly. Also, a new program will have to be developed to produce
a new report.
5. Data Dependence : The Applications/programs in file processing system are data dependent i.e., the
file organization, its physical location and retrieval from the storage media are dictated by the
requirements of the particular application. For example, in payroll application, the file may be organised
on employee records sorted on their last name, which implies that accessing of any employee’s record
has to be through the last name only.
6. Limited Data Sharing : There is limited data sharing possibilities with the traditional file system. Each
application has its own private files and users have little choice to share the data outside their own
applications. Complex programs required to be written to obtain data from several incompatible files.
7. Poor Data Control : There was no centralised control at the data element level, hence a traditional file
system is decentralised in nature. It could be possible that the data field may have multiple names defined
by the different departments of an organization and depending on the file it was in. This situation leads
to different meaning of a data field in different context or same meaning for different fields. This causes
poor data control.
8. Problem of Security : It is very difficult to enforce security checks and access rights in a traditional file
system, since application programs are added in an adhoc manner.
9. Data Manipulation Capability is Inadequate : The data manipulation capability is very limited in
traditional file systems since they do not provide strong relationships between data in different files.
10. Needs Excessive Programming : An excessive programming effort was needed to develop a new
application program due to very high interdependence between program and data in a file system. Each
new application requires that the developers start from the scratch by designing new file formats and
descriptions and then write the file access logic for each new file.
Database
(Hardware)
User
(Data)
Application
programs User
(Software)
(Data)
User
(Data)
User
(Users)
FIgure 1.4. Database system.
1. Data : The whole data in the system is stored in a single database. This data in the database are both
shared and integrated. Sharing of data means individual pieces of data in the database is shared among
different users and every user can access the same piece of data but may be for different purposes. Integration
of data means the database can be function of several distinct files with redundancy controlled among the
files.
2. Hardware : The hardware consists of the secondary storage devices like disks, drums and so on,
where the database resides together with other devices. There is two types of hardware. The first one, i.e.,
processor and main memory that supports in running the DBMS. The second one is the secondary storage
devices, i.e., hard disk, magnetic disk etc., that are used to hold the stored data.
3. Software : A layer or interface of software exists between the physical database and the users. This
layer is called the DBMS. All requests from the users to access the database are handled by the DBMS. Thus,
the DBMS shields the database users from hardware details. Furthermore, the DBMS provides the other
facilities like accessing and updating the data in the files and adding and deleting files itself.
4. Users : The users are the people interacting with the database system in any way. There are four
types of users interacting with the database systems. These are Application Programmers, online users, end
users or naive users and finally the Database Administrator (DBA). More about users in section 1.4.
The users of a database system can be classified into various categories depending upon their interaction and
degree of expertise of the DBMS.
1.4.1 end users or Naive users
The end users or naive users use the database system through a menu-oriented application program, where
the type and range of response is always displayed on the screen. The user need not be aware of the presence
of the database system and is instructed through each step. A user of an ATM falls in this category.
The DBMS provides different languages and interfaces for each category of users to express database queries
and updations. When the design of the database is complete and the DBMS is chosen to implement it, the
first thing to be done is to specify the conceptual and internal schemas for the database and the corresponding
mappings. The following five languages are available to specify different schemas.
1. Data Definition Language (DDL) 2. Storage Definition Language (SDL)
3. View Definition Language (VDL) 4. Data Manipulation Language (DML)
5. Fourth-Generation Language (4-GL)
1.5.1 Data Definition Language (DDL)
It is used to specify a database conceptual schema using set of definitions. It supports the definition or
declaration of database objects. Many techniques are available for writing DDL. One widely used technique
is writing DDL into a text file. More about DDL in chapter 7.
The plans of the database and data stored in the database are most important for an organization, since
database is designed to provide information to the organization. The data stored in the database changes
regularly but the plans remain static for longer periods of time.
1.6.1 Schema
A schema is plan of the database that give the names of the entities and attributes and the relationship among
them. A schema includes the definition of the database name, the record type and the components that make
up the records. Alternatively, it is defined as a frame-work into which the values of the data items are fitted.
The values fitted into the frame-work changes regularly but the format of schema remains the same e.g.,
consider the database consisting of three files ITEM, CUSTOMER and SALES. The data structure diagram
for this schema is shown in Figure 1.5. The schema is shown in database language. Generally, a schema can
be partitioned into two categories, i.e., (i) Logical schema and (ii) Physical schema.
(i) The logical schema is concerned with exploiting the data structures offered by the DBMS so that the
schema becomes understandable to the computer. It is important as programs use it to construct
applications.
(ii) The physical schema is concerned with the manner in which the conceptual database get represented in
the computer as a stored database. It is hidden behind the logical schema and can usually be modified
without affecting the application programs.
The DBMS’s provide DDL and DSDL to specify both the logical and physical schema.
Schema name is ITEM_SALES_REC
FIgure 1.5. Data structure diagram for the item sales record.
1.6.2 Subschema
A subschema is a subset of the schema having the same properties that a schema has. It identifies a subset of
areas, sets, records, and data names defined in the database schema available to user sessions. The subschema
allows the user to view only that part of the database that is of interest to him. The subschema defines the
portion of the database as seen by the application programs and the application programs can have different
view of data stored in the database.
The different application programs can change their respective subschema without affecting other’s
subschema or view.
The Subschema Definition Language (SDL) is used to specify a subschema in the DBMS.
1.6.3 Instances
The data in the database at a particular moment of time is called an instance or a database state. In a given
instance, each schema construct has its own current set of instances. Many instances or database states can
be constructed to correspond to a particular database schema. Everytime we update (i.e., insert, delete or
modify) the value of a data item in a record, one state of the database changes into another state. The Figure
1.6 shows an instance of the ITEM relation in a database schema.
ITeM
ITeM-ID ITeM_DeSC ITeM_COST
1111A Nutt 3
1112A Bolt 5
The architecture is a framework for describing database concepts and specifying the structure of database
system. The three level architecture was suggested by ANSI/SPARC. Here database is divided into three
levels external level, conceptual level and internal level as shown in Figure 1.7.
Viewsand External External External External
mappings view1 view2 viewN level
are
maintained
byDBA
Conceptual/Internal
mapping
Internal Internalview(Physical
level storageofdata)
A data model is a collection of concepts that can be used to describe the structure of the database including
data types, relationships and the constraints that apply on the data.
A data model helps in understanding the meaning of the data and ensures that, we understand.
– The data requirements of each user.
– The use of data across various applications.
– The nature of data independent of its physical representations.
A data model supports communication between the users and database designers. The major use of data
model is to understand the meaning of the data and to facilitate communication about the user requirements.
In the Figure 1.8, the ‘WORLD’ acts as a root of the tree structure which has many children’s like Asia,
Europe, Australia etc. These children can act as a parent for different countries such as ASIA continents acts
as a parent for countries like India, China, Pakistan etc. Similarly these children can act as a parent for
different states such as INDIA country acts as a parent for states Punjab, Haryana, Rajasthan etc. Further the
same follows.
Consider child ‘ROHTAk’ which has a parent ‘HARyANA’ which further has a parent ‘INDIA’ and so on.
Now ‘India’ will acts a grandparent for the child ‘ROHTAk’.
The major advantages of Hierarchical Model are that it is simple, efficient, maintains data integrity and is
the first model that provides the concept of data security. The major disadvantages of Hierarchical model are
that it is complex to implement, Lacking of structural independence, operational anomalies and data
management problem.
(ii) Network Data Model : As a result of limitations in the hierarchical model, designers developed the
Network Model. The ability of this model to handle many to many (N : N) relations between its records is
the main distinguishing feature from the hierarchical model. Thus, this model permits a child record to have
more than one parent. In this model, directed graphs are used instead of tree structure in which a node can
have more than one parent. This model was basically designed to handle non-hierarchical relationships.
The relationships between specific records of 1 : 1 (one to one), 1 : N (one to many) or N : N (many to many)
are explicitly defined in database definition of this model. The Network Model was standardized as the
CODASyL DBTG (Conference of Data System Languages, Database Task Group) model.
There are two basic data structures in this model—Records and Sets. The record contains the detailed
information regarding the data which are classified into record types. A set type represents relationship
between record types and this model use linked lists to represent these relationships. Each set type definition
consists of three basic elements : a name for set type an owner record type (like parent) and a member record
type (like child).
To represent many to many relationship in this model, the relationship is decomposed into two one to many
(1 : N) relationships by introducing an additional record type called an Intersection Record or Connection
Record.
The major advantages of Network Model are that it is conceptually simple, Handles more relationship types,
promotes database integrity, data access flexibility and conformance to the standards.
The major disadvantages of Network Model are that it is complex and lack of structural independence.
(iii) Relational data Model : The Relational Model was first introduced by Dr. Edgar Frank, an Oxford-
trained Mathematician, while working in IBM Research Centre in 1970’s.
The Relational Model is considered one of the most popular developments in the database technology
because it can be used for representing most of the real world objects and the relationships between them.
The main significance of the model is the absolute separation of the logical view and the physical view of
the data. The physical view in relational model is implementation dependent and not further defined.
The logical view of data in relational model is set oriented. A relational set is an unordered group of items.
The field in the items are the columns. The column in a table have names.
The rows are unordered and unnamed. A database consists of one or more tables plus a catalogue describing
the database.
The relational model consists of three components:
1. A structural component—A set of tables (also called relations) and set of domains that defines the way
data can be represented.
2. A set of rules for maintaining the integrity of the database.
3. A manipulative component consisting of a set of high-level operations which act upon and produce whole
tables.
In the relational model the data is represented in the form of tables which is used interchangeably with the
word Relation. Each table consists of rows also knowns as tuples (A tuple represents a collection of
information about an item, e.g., student record) and column also known as attributes. (An attribute represents
the characteristics of an item, e.g., Student’s Name and Phone No.). There are relationships existing between
different tables. This model doesn’t require any information that specifies how the data should be stored
physically.
The major advantages of Relational Model are that it is structurally independent, improved conceptual
simplicity adhoc query capability and powerful DBMS. The major disadvantages of relational model are
substantial hardware and software overhead and facilitates poor design and implementation.
1.8.1.2 Object Based Data Models
Object Based Data Models are also known as conceptual models used for defining concepts including entries,
attributes and relationships between them. These models are used in describing data at the logical and user
view levels. These models allow the constraints to be specified on the data explicitly by the users.
An entity is a distinct object which has existence in real world. It will be implemented as a table in a database.
An attribute is the property of an entity, in other words, attribute is a single atomic unit of information that
describes something about its entity. It will be implemented as a column or field in the database.
The associations or links between the various entities is known as relationships.
There are 4 types of object based data models. These are:
(a) Entity-relationship (E-R) Model
(b) Object-Oriented Model
(c) Semantic Data Model
(d) Functional Data Model
These are discussed as follows:
(a) Entity-Relationship (E-R) Model : The E-R model is a high level conceptual data model developed by
Chen in 1976 to facilitate database design. The E-R model is the generalization of earlier available
commercial model like the hierarchical and network model. It also allows the representation of the various
constraints as well as their relationships.
The relationship between entity sets is represented by a name. E-R relationship is of 1 : 1, 1 : N or N : N type
which tells the mapping from one entity set to another.
E-R model is shown diagrammatically using entity-relationship (E-R) diagrams which represents the
elements of the conceptual model that show the meanings and relationships between those elements
independent of any particular DBMS. The various features of E-R model are:
(i) E-R Model can be easily converted into relations (tables).
(ii) E-R Model is used for purpose of good database design by database developer.
(iii) It is helpful as a problem decomposition tool as it shows entities and the relationship between those
entities.
(iv) It is an iterative process.
(v) It is very simple and easy to understand by various types of users.
The major advantages of E-R model are that it is conceptually simple, have vishal representation, an effective
communication tool and can be integrated with the relational data model.
The major disadvantages of E-R model are that there are limited constraint representation, limited
relationship representation, no data manipulation language and loss of information content.
(b) Object-Oriented Data Model : Object-oriented data model is a logical data model that captures the
semantics of objects supported in an object-oriented programming. It is based on collection of objects,
attributes and relationships which together form the static properties. It also consists of the integrity rules
over objects and dynamic properties such as operations or rules defining new database states.
An object is a collection of data and methods. When different objects of same type are grouped together they
form a class. This model is used basically for multimedia applications as well as data with complex
relationships. The object model is represented graphically with object diagrams containing object classes.
Classes are arranged into hierarchies sharing common structure and behaviour and are associated with other
classes.
Operating System
Database Files (Storage)
Communications
LocalAreaNetwork
(LAN)
PC PC
—UserInterface
—DBMS
—OperatingSystem
—DatabaseFiles(Storage)
—OperatingSystem
Terminal Terminal
—Display —Display
—Keyboard —Keyboard
Database Database
Network
Client/Server Client/Server
Database Database
Client Client
Network
Server
Client Client
1. Server : Server is DBMS itself. It consists of DBMS and supports all basic DBMS functions. Server
components of DBMS are installed at server. It acts as monitor of all of its clients. It distributes work-load to
other computers. Clients must obey their servers.
Functions of Server : The server performs various functions, which are as follows.
1. It supports all basic DBMS functions.
2. Monitor all his clients.
3. Distribute work-load over clients.
4. Solve problems which are not solved by clients.
5. Maintain security and privacy.
6. Avoiding unauthorized access of data.
2. Clients : Client machine is a personal computer or workstation which provide services to both server
and users. It must obey his server. Client components of DBMS are installed at client site. Clients are taking
instructions from server and help them by taking their load. When any user want to execute a query on client,
the client first take data from server then execute the query on his own hardware and returns the result to the
server. As a result, server is free to do more complex applications.
3. Network Interface : Clients are connected to server by network interface. It is useful in connecting
the server interface with user interface so that server can run his applications over his clients.
In the client server architecture, there are more than one server. Sometimes, a server is used as
Database Server, other as Application Server, other as Backup Server etc.
Advantages of Client-Server Database System
1. It increase the overall performance of DBMS.
2. Load can be distributed among clients.
3. It provides better user interface.
4. It is used to develop highly complex applications.
5. Clients with different operating systems can be connected with each other.
6. Single copy of DBMS is shared.
7. It reduces cost.
Disadvantages of Client-Server Database System
1. Network is error prone.
2. It is a kind of centralized system. If server is crashed or failed, there is loss of data.