
KABARAK UNIVERSITY

BSCIT, BMIT & BSC TLCM


INTE 121: FUNDAMENTALS OF DATABASE
MANAGEMENT SYSTEMS
REVISED 3RD EDITION (2013)


INTRODUCTION TO DATABASE CONCEPTS


Introduction:
Databases and database technology are having a major impact on the growing use of computers.
It is fair to say that databases play a critical role in almost all areas where computers are used, including
business, electronic commerce, engineering, medicine, law, education, and library science, to name a few.
The word database is in such common use that we must begin by defining what a database is.

Database: (DB)
This is a shared collection of logically related data, and a description of that data, designed to meet
the information needs of an organization. It is a single repository of data that can be used
simultaneously by many departments and users. All the data items are integrated with a minimum
amount of duplication.
Since a database holds not only the organization’s operational data but also a description of this
data, a database can be defined as a self-describing collection of integrated records. The
description of the data is known as the system catalog (also called the data dictionary or meta-data:
data about data). When analyzing the information needs of an organization we identify entities,
attributes and relationships. An entity is a distinct object in the organization that is to be
represented in the database. An attribute is a property that describes some aspect of the object
one wishes to record, while a relationship is an association between entities.

A database has the following implicit properties:


 A database represents some aspect of the real world, sometimes called the miniworld or the
universe of discourse (UoD). Changes to the miniworld are reflected in the database.
 A database is a logically coherent collection of data with some inherent meaning. A random
assortment of data cannot correctly be referred to as a database.
 A database is designed, built, and populated with data for a specific purpose. It has an intended
group of users and some preconceived applications in which these users are interested.

Database Management System (DBMS)


This is a software system that enables users to define, create, maintain and control access to the
database. The DBMS is hence a general-purpose software system that facilitates the processes of
defining, constructing, manipulating, and sharing databases among various users and applications.
Defining a database involves specifying the data types, structures, and constraints for the data to be
stored in the database. Constructing the database is the process of storing the data itself on some storage
medium that is controlled by the DBMS. Manipulating a database includes such functions as querying
the database to retrieve specific data, updating the database to reflect changes in the miniworld, and
generating reports from the data. Sharing a database allows multiple users and programs to access the
database concurrently.
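
As a minimal sketch of the first three of these processes, the following standard SQL statements use a
hypothetical Student table (the table and column names are illustrative only, not taken from any
particular system):

-- Defining: specify the data types, structures and constraints.
CREATE TABLE Student (
    studentNo   CHAR(8)      NOT NULL PRIMARY KEY,
    name        VARCHAR(50)  NOT NULL,
    feeBalance  DECIMAL(9,2) DEFAULT 0
);

-- Constructing: store the actual data on the storage medium controlled by the DBMS.
INSERT INTO Student (studentNo, name, feeBalance)
VALUES ('S0001', 'Jane Wanjiru', 15000.00);

-- Manipulating: query the database and update it to reflect changes in the miniworld.
SELECT studentNo, name FROM Student WHERE feeBalance = 0;
UPDATE Student SET feeBalance = 0 WHERE studentNo = 'S0001';

Sharing is provided by the DBMS itself: many users and programs can issue such statements
concurrently against the one stored copy of the table.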
Other important functions provided by the DBMS include protecting the database and maintaining it over
a long period of time. Protection includes both system protection against hardware or software
malfunction (or crashes), and security protection against unauthorized or malicious access. A typical large
database may have a life cycle of many years, so the DBMS must be able to maintain the database system
by allowing the system to evolve as requirements change over time.

Database Application Program


A database application is simply a program that interacts with the database at some point in its execution.
We also use the more inclusive term database system to be a collection of application programs that
interact with the database along with the DBMS and database itself.

TRADITIONAL FILE-BASED APPROACH


It is almost a tradition that comprehensive database books introduce the database system with a review of
its predecessor, the file-based system. We will not depart from this tradition. Although the file-based
approach is largely obsolete, there are good reasons for studying it:
 Understanding the problems inherent in file-based systems may prevent us from repeating these
problems in database systems. In other words, we should learn from our earlier mistakes.
Actually, using the word ‘mistakes’ is derogatory and does not give any cognizance to the work
that served a useful purpose for many years. However, we have learned from this work that there
are better ways to handle data.
 If you wish to convert a file-based system to a database system, understanding how the file
system works will be extremely useful, if not essential.

FILE-BASED APPROACH
A file-based system is a collection of application programs that perform services for the end-users, such
as the production of reports. Each program defines and manages its own data.
This is the approach that attempted to computerize manual file keeping. A file is a collection of records
which contain logically related data. Since each application program was independent of the others, each
department needed its own data processing staff.

LIMITATIONS OF THE FILE-BASED SYSTEM.

Separation And Isolation Of Data.


When data is isolated in separate files, it is more difficult to access data that should be readily available.
For example, suppose a university running a file-based system organizes a graduation ceremony for
students who have completed their courses and cleared their fees. The details would have to be compiled
from the registration office, then the exams office, then the accounts office. Such processing across
departments is therefore difficult.

Duplication Of Data.
Due to the decentralization approach taken by each department, uncontrolled duplication of data is
encouraged. Uncontrolled duplication is undesirable because:
 Duplication is wasteful: it costs time and money to enter the data more than once.
 It takes up additional storage space.
 It can lead to loss of data integrity through data inconsistency.


Data Dependence.
Since the physical structure and storage of data files and records are defined in the application code,
changes to an existing structure are difficult to make. This characteristic is known as program-data
dependence.

Incompatible File Format.


Because the structure of the files is embedded in the application programs, the structures are dependent
on the application programming language. Because different departments developed their applications in
different programming languages, the file structures are incompatible, which makes it difficult to process
them jointly.
This forces application developers to write software to convert the files to a common format to
facilitate joint processing, which is time consuming and expensive.

Fixed Queries/Proliferation Of Application Program.


This system is very dependent on the application developer, who has to write any queries or reports that
are required. Consequently, a query or report that was not planned for from the very beginning cannot be
produced.

Database Approach
All the above limitations of the file-based approach can be attributed to two factors:
i. The definition of the data is embedded in the application programs, rather than being stored
separately and independently;
ii. There is no control over the access and manipulation of data beyond that imposed by the
application programs.
To become more effective, a new approach was required. What emerged were the database and the
Database Management System (DBMS).

CHARACTERISTICS OF THE DATABASE APPROACH


A number of characteristics distinguish the database approach from the traditional approach of
programming with files. In traditional file processing, each user defines and implements the files needed
for a specific software application as part of programming the application. For example, one user, the
grade reporting office, may keep a file on students and their grades. Programs to print a student's
transcript and to enter new grades into the file are implemented as part of the application. A second user,
the accounting office, may keep track of students' fees and their payments. Although both users are
interested in data about students, each user maintains separate files, and programs to manipulate these
files, because each requires some data not available from the other user's files. This redundancy in
defining and storing data results in wasted storage space and in redundant efforts to maintain common
data up to date.
In the database approach, a single repository of data is maintained that is defined once and then is
accessed by various users. The main characteristics of the database approach versus the file-processing
approach are the following:
 Self-describing nature of a database system
 Insulation between programs and data, and data abstraction
 Support of multiple views of the data
 Sharing of data and multiuser transaction processing


Self-Describing Nature of a Database System

A fundamental characteristic of the database approach is that the database system contains not only the
database itself but also a complete definition or description of the database structure and constraints. This
definition is stored in the DBMS catalog, which contains information such as the structure of each file,
the type and storage format of each data item, and various constraints on the data. The information stored
in the catalog is called meta-data, and it describes the structure of the primary database. Precisely,
meta-data is data about data.
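
Most relational DBMSs let the catalog itself be queried with ordinary SQL. As an illustrative sketch, the
standard INFORMATION_SCHEMA views used below exist in many (though not all) systems, and the
Student table is the hypothetical one used earlier; the case in which names are stored in the catalog
varies between products:

-- Each row returned describes one column (data item) of the table:
-- its name, its type and storage format, and whether nulls are allowed.
SELECT column_name, data_type, is_nullable
FROM   information_schema.columns
WHERE  table_name = 'student';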

Insulation between Programs and Data, and Data Abstraction

In traditional file processing, the structure of data files is embedded in the application programs, so any
changes to the structure of a file may require changing all programs that access this file. By contrast,
DBMS access programs do not require such changes in most cases. The structure of data files is stored in
the DBMS catalog separately from the access programs. We call this property program-data
independence.
The characteristic that allows program-data independence is called data abstraction.

Support of Multiple Views of the Data

A database typically has many users, each of whom may require a different perspective or view of the
database. A view may be a subset of the database or it may contain virtual data that is derived from the
database files but is not explicitly stored. Some users may not need to be aware of whether the data they
refer to is stored or derived. A multiuser DBMS whose users have a variety of distinct applications must
provide facilities for defining multiple views. For example, one user of the database may be interested
only in accessing and printing the transcript of each student. A second user may be interested only in
checking that students have taken all the prerequisites of each course for which they register. These users
require different views because their needs vary.
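
As a sketch of how the two users above might be given different views, assume hypothetical Student and
Enrolment base tables, with Enrolment holding a studentNo, a courseCode and a grade:

-- View for the transcript user: students together with their grades.
CREATE VIEW Transcript AS
SELECT s.studentNo, s.name, e.courseCode, e.grade
FROM   Student s
JOIN   Enrolment e ON s.studentNo = e.studentNo;

-- View for the prerequisite checker: current registrations only.
-- This is derived, virtual data; it is not explicitly stored anywhere.
CREATE VIEW CurrentRegistration AS
SELECT studentNo, courseCode
FROM   Enrolment
WHERE  grade IS NULL;

Each user queries his or her own view and need not know how, or even whether, the underlying data is
stored.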

Sharing of Data and Multiuser Transaction Processing

A multiuser DBMS, as its name implies, must allow multiple users to access the database at the same
time. This is essential if data for multiple applications is to be integrated and maintained in a single
database. The DBMS must include concurrency control software to
ensure that several users trying to update the same data do so in a controlled manner so that the result of
the updates is correct. For example, when several reservation clerks try to assign a seat on an airline
flight, the DBMS should ensure that each seat can be accessed by only one clerk at a time for assignment
to a passenger. These types of applications are generally called online transaction processing (OLTP)
applications. A fundamental role
of multiuser DBMS software is to ensure that concurrent transactions operate correctly.
The concept of a transaction has become central to many database applications. A
transaction is an executing program or process that includes one or more database accesses,
such as reading or updating of database records. Each transaction is supposed to execute a
logically correct database access if executed in its entirety without interference from other transactions.
The DBMS must enforce several transaction properties. The isolation property ensures that each
transaction appears to execute in isolation from other transactions, even though hundreds of transactions
may be executing concurrently. The
atomicity property ensures that either all the database operations in a transaction are
executed or none are.
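
As a sketch of atomicity, the seat-assignment example might be written as a single transaction. The Seat
and Flight tables are hypothetical, and transaction syntax varies slightly between DBMSs:

START TRANSACTION;

-- Assign the seat only if it is still free.
UPDATE Seat
SET    passengerNo = 'P0042'
WHERE  flightNo = 'KQ101' AND seatNo = '12A' AND passengerNo IS NULL;

-- Keep the running total consistent with the assignment.
UPDATE Flight
SET    seatsAvailable = seatsAvailable - 1
WHERE  flightNo = 'KQ101';

COMMIT;  -- both updates become permanent together; ROLLBACK would undo both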

Components of the DBMS Environment:


There are five major components:
i. Hardware
ii. Software
iii. Data
iv. Procedures
v. People
Hardware:
The DBMS and the applications require hardware to run. The hardware depends on the organization’s
requirements and the DBMS used. A DBMS operating on the minimum configuration may not necessarily
give acceptable performance.

Software:
This comprises the DBMS, application programs, the operating system and network software if the DBMS is
being used over a network.

Data:
The database contains both operational data and meta-data. The structure of the DB is called the schema.

Procedures:
These are the instructions and rules that govern the design and use of the DB. The users of the system and
the staff that manage the DB require documented procedures on how to use or run the system. These may
consist of instructions on how:
 To log on to the DBMS
 To use a particular DBMS facility
 To start and stop the DBMS
 To make backup copies of DB
 To handle hardware and software failure.
People:
The different people who participate in the DB environment are:
 Data and Database Administrator
 Database designer
 Application developer
 End – users
The people are also known as roles.

Roles in the Database Environment

Data and Database Administrators

The database and the DBMS are corporate resources that must be managed like any other resource. Data
and database administration are the roles generally associated with the management and control of a
DBMS and its data. The Data Administrator (DA) is responsible for the management of the data resource
including database planning, development and maintenance of standards, policies and procedures, and
conceptual/logical database design. The DA consults with and advises senior managers, ensuring that the
direction of database development will ultimately support corporate objectives.

The Database Administrator (DBA) is responsible for the physical realization of the database, including
physical database design and implementation, security and integrity control, maintenance of the
operational system, and ensuring satisfactory performance of the applications for users. The role of the
DBA is more technically oriented than the role of the DA, requiring detailed knowledge of the target
DBMS and the system environment.
The tasks associated with the data administrator and the database administrator are:

Data Administrator
 Selecting appropriate productivity tools
 Assisting in the development of the corporate Information Technology/Information System and
enterprise strategies
 Developing the corporate data model
 Determining the organization’s data requirements
 Setting data collection standards and establishing data formats
 Estimating volumes of data and likely growth
 Determining patterns and frequencies of data usage

Database Administrator
 Evaluating and selecting DBMS products
 Undertaking physical database design
 Implementing a physical DB using a target DBMS
 Defining security and integrity constraints
 Liaising with DB application developers
 Developing test strategies
 Training users

Database Designers

In large database design projects, we can distinguish between two types of designer: logical database
designers and physical database designers. The logical database designer is concerned with identifying
the data (that is, the entities and attributes), the relationships between the data, and the constraints on the
data that is to be stored in the database.

The logical database designer must have a thorough and complete understanding of the organization’s
data and any constraints on this data (the constraints are sometimes called business rules). These
constraints describe the main characteristics of the data as viewed by the organization.
To be effective, the logical database designer must involve all prospective database users in the
development of the data model, and this involvement should begin as early in the process as possible. The
work of the logical database designer can be divided into two stages:

 Conceptual database design, which is independent of implementation details such as the target
DBMS, application programs, programming languages, or any other physical considerations;

 Logical database design, which targets a specific data model, such as relational, network,
hierarchical, or object-oriented; this stage takes the target DBMS into account.

The physical database designer decides how the logical database design is to be physically realized. This
involves:


 Mapping the logical database design into a set of tables and integrity constraints;

 Selecting specific storage structures and access methods for the data to achieve good
performance;

 Designing any security measures required on the data.

Many parts of physical database design are highly dependent on the target DBMS, and there may be more
than one way of implementing a mechanism. Consequently, the physical database designer must be fully
aware of the functionality of the target DBMS and must understand the advantages and disadvantages of
each alternative for a particular implementation. The physical database designer must be capable of
selecting a suitable storage strategy that takes account of usage. Whereas conceptual and logical database
designs are concerned with the what, physical database design is concerned with the how.

Application Developers

Once the database has been implemented, the application programs that provide the required functionality
for the end-users must be implemented. This is the responsibility of the application developers. Typically,
the application developers work from a specification produced by systems analysts. Each program
contains statements that request the DBMS to perform some operation on the database. This includes
retrieving, inserting, updating, and deleting data.

End-Users

The end-users are the ‘clients’ for the database, which has been designed and implemented, and is being
maintained to serve their information needs. End-users can be classified according to the way they use the
system:

 Naive users are typically unaware of the DBMS. They access the database through specially
written application programs that attempt to make the operations as simple as possible. They
invoke database operations by entering simple commands or choosing options from a menu. This
means that they do not need to know anything about the database or the DBMS. For example, the
checkout assistant at the local supermarket uses a bar code reader to find out the price of the item.
However, there is an application program present that reads the bar code, looks up the price of the
item in the database, reduces the database field containing the number of such items in stock, and
displays the price on the till.

 Sophisticated users. At the other end of the spectrum, the sophisticated end-user is familiar with
the structure of the database and the facilities offered by the DBMS. Sophisticated end-users may
use a high-level query language such as SQL to perform the required operations. Some
sophisticated end-users may even write application programs for their own use.


Advantages and Disadvantages of DBMSs

Advantages

The advantages of database management systems are:

Control of data redundancy

Traditional file-based systems waste space by storing the same information in more than one file. In
contrast, the database approach attempts to eliminate the redundancy by integrating the files so that
multiple copies of the same data are not stored. However, the database approach does not eliminate
redundancy entirely, but controls the amount of redundancy inherent in the database. Sometimes, it is
necessary to duplicate key data items to model relationships. At other times, it is desirable to duplicate
some data items to improve performance.

Data consistency

By eliminating or controlling redundancy, we reduce the risk of inconsistencies occurring. If a data item
is stored only once in the database, any update to its value has to be performed only once and the new
value is available immediately to all users. If a data item is stored more than once and the system is aware
of this, the system can ensure that all copies of the item are kept consistent. Unfortunately, many of
today’s DBMSs do not automatically ensure this type of consistency.

More information from the same amount of data

With the integration of the operational data, it may be possible for the organization to derive additional
information from the same data.

Sharing of data

Typically, files are owned by the people or departments that use them. On the other hand, the database
belongs to the entire organization and can be shared by all authorized users. In this way, more users share
more of the data. Furthermore, new applications can build on the existing data in the database and add
only data that is not currently stored, rather than having to define all data requirements again. The new
applications can also rely on the functions provided by the DBMS, such as data definition and
manipulation, and concurrency and recovery control, rather than having to provide these functions
themselves.

Improved data integrity

Database integrity refers to the validity and consistency of stored data. Integrity is usually expressed in
terms of constraints, which are consistency rules that the database is not permitted to violate. Constraints
may apply to data items within a single record or they may apply to relationships between records. Again,
integration allows the DBA to define, and the DBMS to enforce, integrity constraints.


Improved security

Database security is the protection of the database from unauthorized users. Without suitable security
measures, integration makes the data more vulnerable than file-based systems. However, integration
allows the DBA to define, and the DBMS to enforce, database security. This may take the form of user
names and passwords to identify people authorized to use the database. The access that an authorized user
is allowed on the data may be restricted by the operation type (retrieval, insert, update, delete).

Enforcement of standards

Again, integration allows the DBA to define and enforce the necessary standards. These may include
departmental, organizational, national, or international standards for such things as data formats to
facilitate exchange of data between systems, naming conventions, documentation standards, update
procedures, and access rules.

Economy of scale

Combining all the organization’s operational data into one database, and creating a set of applications that
work on this one source of data, can result in cost savings. In this case, the budget that would normally be
allocated to each department for the development and maintenance of its file-based system can be
combined, possibly resulting in a lower total cost, leading to an economy of scale. The combined budget
can be used to buy a system configuration that is more suited to the organization’s needs. This may
consist of one large, powerful computer or a network of smaller computers.

Balance of conflicting requirements

Each user or department has needs that may be in conflict with the needs of other users. Since the
database is under the control of the DBA, the DBA can make decisions about the design and operational
use of the database that provide the best use of resources for the organization as a whole. These decisions
will provide optimal performance for important applications, possibly at the expense of less critical ones.

Increased concurrency

In some file-based systems, if two or more users are allowed to access the same file simultaneously, it is
possible that the accesses will interfere with each other, resulting in loss of information or even loss of
integrity. Many DBMSs manage concurrent database access and ensure such problems cannot occur.

Improved backup and recovery services

Many file-based systems place the responsibility on the user to provide measures to protect the data from
failures of the computer system or application program. This may involve taking a nightly backup of the
data. In the event of a failure during the next day, the backup is restored and the work that has taken place
since this backup is lost and has to be re-entered. In contrast, modern DBMSs provide facilities to
minimize the amount of processing that is lost following a failure.


Disadvantages

The disadvantages of the database approach are:

Complexity

The provision of the functionality we expect of a good DBMS makes the DBMS an extremely complex
piece of software. Database designers and developers, the data and database administrators, and end-users
must understand this functionality to take full advantage of it. Failure to understand the system can lead to
bad design decisions, which can have serious consequences for an organization.

Size

The complexity and breadth of functionality makes the DBMS an extremely large piece of software,
occupying many megabytes of disk space and requiring substantial amounts of memory to run efficiently.

Cost of DBMSs

The cost of DBMSs varies significantly, depending on the environment and functionality provided. For
example, a single-user DBMS for a personal computer may only cost US$100. However, a large
mainframe multi-user DBMS servicing hundreds of users can be extremely expensive, perhaps
US$100,000 or even US$1,000,000. There is also the recurrent annual maintenance cost, which is
typically a percentage of the list price.

Additional hardware costs

The disk storage requirements for the DBMS and the database may necessitate the purchase of additional
storage space. Furthermore, to achieve the required performance, it may be necessary to purchase a larger
machine, perhaps even a machine dedicated to running the DBMS. The procurement of additional
hardware results in further expenditure.

Cost of conversion

In some situations, the cost of the DBMS and extra hardware may be insignificant compared with the cost
of converting existing applications to run on the new DBMS and hardware. This cost also includes the
cost of training staff to use these new systems, and possibly the employment of specialist staff to help
with the conversion and running of the system. This cost is one of the main reasons why some
organizations feel tied to their current systems and cannot switch to more modern database technology.
The term legacy system is sometimes used to refer to an older, and usually inferior, system.

Performance

Typically, a file-based system is written for a specific application, such as invoicing. As a result,
performance is generally very good. However, the DBMS is written to be more general, to cater for many
applications rather than just one. The effect is that some applications may not run as fast as they used to.

Higher impact of a failure


The centralization of resources increases the vulnerability of the system. Since all users and applications
rely on the availability of the DBMS, the failure of certain components can bring operations to a halt.

The Three-Level ANSI-SPARC Architecture

An early proposal for a standard terminology and general architecture for database systems was produced
in 1971 by the DBTG (Data Base Task Group) appointed by the Conference on Data Systems and
Languages (CODASYL, 1971). The DBTG recognized the need for a two-level approach with a system
view called the schema and user views called subschemas. The American National Standards Institute
(ANSI) Standards Planning and Requirements Committee (SPARC), ANSI/X3/SPARC, produced a
similar terminology and architecture in 1975 (ANSI, 1975). ANSI-SPARC recognized the need for a
three-level approach with a system catalog. These proposals reflected those published by the IBM user
organizations Guide and Share some years previously, and concentrated on the need for an
implementation-independent layer to isolate programs from underlying representational issues
(Guide/Share, 1970). Although the ANSI-SPARC model did not become a standard, it still provides a
basis for understanding some of the functionality of a DBMS.

For our purposes, the fundamental point of these and later reports is the identification of three levels of
abstraction, that is, three distinct levels at which data items can be described. The levels form a three level
architecture comprising an external, a conceptual, and an internal level, as depicted in Figure below. The
way users perceive the data is called the external level. The way the DBMS and the operating system
perceive the data is the internal level, where the data is actually stored using the data structures and file
organizations. The conceptual level provides both the mapping and the desired independence between the
external and internal levels.


[Figure: The three-level ANSI-SPARC architecture. The external level holds the user views (View 1,
View 2, ..., View n); the conceptual level holds the conceptual schema; the internal level holds the
internal schema and the physical data organization, beneath which lies the stored database (DB).]

The objective of the three-level architecture is to separate each user’s view of the database from the way
the database is physically represented. There are several reasons why this separation is desirable:


 Each user should be able to access the same data, but have a different customized view of the
data. Each user should be able to change the way he or she views the data, and this change should
not affect other users.

 Users should not have to deal directly with physical database storage details, such as indexing or
hashing. In other words, a user’s interaction with the database should be independent of storage
considerations.

 The Database Administrator (DBA) should be able to change the database storage structures
without affecting the users’ views.

 The internal structure of the database should be unaffected by changes to the physical aspects of
storage, such as the changeover to a new storage device.

 The DBA should be able to change the conceptual structure of the database without affecting all
users.

External Level

The users’ view of the database. This level describes that part of the database that is relevant to each user.
The external level consists of a number of different external views of the database. Each user has a view
of the ‘real world’ represented in a form that is familiar for that user. The external view includes only
those entities, attributes, and relationships in the ‘real world’ that the user is interested in. Other entities,
attributes, or relationships that are not of interest may be represented in the database, but the user will be
unaware of them. In addition, different views may have different representations of the same data.

Conceptual Level

The community view of the database. This level describes what data is stored in the database and the
relationships among the data. The middle level in the three-level architecture is the conceptual level. This
level contains the logical structure of the entire database as seen by the DBA. It is a complete view of the
data requirements of the organization that is independent of any storage considerations.
The conceptual level represents:

 All entities, their attributes, and their relationships;

 The constraints on the data;

 Semantic information about the data;

 Security and integrity information.

The conceptual level supports each external view, in that any data available to a user must be contained
in, or derivable from, the conceptual level. However, this level must not contain any storage-dependent
details. For instance, the description of an entity should contain only data types of attributes (for example,
integer, real, character) and their length (such as the maximum number of digits or characters), but not any
storage considerations, such as the number of bytes occupied.


Internal Level

The physical representation of the database on the computer. This level describes how the data is stored in
the database. The internal level covers the physical implementation of the database to achieve optimal run
time performance and storage space utilization. It covers the data structures and file organizations used to
store data on storage devices. It interfaces with the operating system access methods (file management
techniques for storing and retrieving data records) to place the data on the storage devices, build the
indexes, retrieve the data, and so on. The internal level is concerned with such things as:

 Storage space allocation for data and indexes;

 Record descriptions for storage (with stored sizes for data items);

 Record placement;

 Data compression and data encryption techniques.

Schemas, Mappings, and Instances

The overall description of the database is called the database schema. There are three different types of
schema in the database and these are defined according to the levels of abstraction of the three-level
architecture. At the highest level, we have multiple external schemas (also called subschemas) that
correspond to different views of the data.
At the conceptual level, we have the conceptual schema, which describes all the entities, attributes, and
relationships together with integrity constraints.
At the lowest level of abstraction we have the internal schema, which is a complete description of the
internal model, containing the definitions of stored records, the methods of representation, the data fields,
and the indexes and storage structures used. There is only one conceptual schema and one internal schema
per database.

The DBMS is responsible for mapping between these three types of schema. It must also check the
schemas for consistency; in other words, the DBMS must check that each external schema is derivable
from the conceptual schema, and it must use the information in the conceptual schema to map between
each external schema and the internal schema.

The conceptual schema is related to the internal schema through a conceptual/internal mapping. This
enables the DBMS to find the actual record or combination of records in physical storage that constitute a
logical record in the conceptual schema, together with any constraints to be enforced on the operations for
that logical record. It also allows any differences in entity names, attribute names, attribute order, data
types, and so on, to be resolved.
Finally, each external schema is related to the conceptual schema by the external/conceptual mapping.
This enables the DBMS to map names in the user’s view on to the relevant part of the conceptual schema.

It is important to distinguish between the description of the database and the database itself. The
description of the database is the database schema. The schema is specified during the database design
process and is not expected to change frequently. However, the actual data in the database may change
frequently. The data in the database at any particular point in time is called a database instance.


Therefore, many database instances can correspond to the same database schema. The schema is
sometimes called the intension of the database, while an instance is called an extension (or state) of the
database.

Data Independence
A major objective for the three-level architecture is to provide data independence, which means that upper
levels are unaffected by changes to lower levels. There are two kinds of data independence: logical and
physical.

Logical data independence


Logical data independence refers to the immunity of the external schemas to changes in the conceptual
schema.
Changes to the conceptual schema, such as the addition or removal of new entities, attributes, or
relationships, should be possible without having to change existing external schemas or having to rewrite
application programs. Clearly, the users for whom the changes have been made need to be aware of them,
but what is important is that other users should not be.

Physical data independence


Physical data independence refers to the immunity of the conceptual schema to changes in the internal
schema.
Changes to the internal schema, such as:
 using different file organizations or storage structures,
 using different storage devices,
 modifying indexes or hashing algorithms,
should be possible without having to change the conceptual or external schemas.
From the users’ point of view, the only effect that may be noticed is a change in performance. In fact,
deterioration in performance is the most common reason for internal schema changes.
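
A small sketch of both kinds of independence, reusing the hypothetical Student table from earlier
(CREATE INDEX is not defined by the SQL standard but is supported in some form by virtually every
relational DBMS):

-- Logical data independence: a new attribute is added to the conceptual
-- schema; existing external views and queries that do not use it are unaffected.
ALTER TABLE Student ADD COLUMN email VARCHAR(80);

-- Physical data independence: the internal schema changes (a new index);
-- no query has to be rewritten, and only performance may change.
CREATE INDEX idxStudentName ON Student (name);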

Database Languages

A data sublanguage consists of two parts: a Data Definition Language (DDL) and a Data Manipulation
Language (DML). The DDL is used to specify the database schema and the DML is used to both read and
update the database. These languages are called data sublanguages because they do not include constructs
for all computing needs such as conditional or iterative statements, which are provided by the high-level
programming languages.

The Data Definition Language (DDL)

This is a language that allows the DBA or user to describe and name the entities, attributes, and
relationships required for the application, together with any associated integrity and security constraints.
The DDL is used to define a schema or to modify an existing one. It cannot be used to manipulate data.
The result of the compilation of the DDL statements is a set of tables stored in special files collectively
called the system catalog. The system catalog integrates the meta-data, that is, data that describes objects
in the database and makes it easier for those objects to be accessed or manipulated. The meta-data
contains definitions of records, data items, and other objects that are of interest to users or are required by
the DBMS. The DBMS normally consults the system catalog before the actual data is accessed in the
database.
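
SQL’s DDL statements are the most familiar concrete example. A brief sketch, using hypothetical names:

-- Define a new relation, with integrity constraints included in the definition.
CREATE TABLE Department (
    deptNo   CHAR(4)     NOT NULL PRIMARY KEY,
    deptName VARCHAR(40) NOT NULL UNIQUE
);

-- Modify an existing definition held in the system catalog.
ALTER TABLE Department ADD COLUMN budget DECIMAL(12,2);

-- Remove the definition (and the stored data) entirely.
DROP TABLE Department;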

The Data Manipulation Language (DML)

This is a language that provides a set of operations to support the basic data manipulation operations on
the data held in the database.
Data manipulation operations usually include the following:

 insertion of new data into the database;

 modification of data stored in the database;

 retrieval of data contained in the database;

 deletion of data from the database.

Therefore, one of the main functions of the DBMS is to support a data manipulation language in which
the user can construct statements that will cause such data manipulation to occur. Data manipulation
applies to the external, conceptual, and internal levels. However, at the internal level we must define
rather complex low-level procedures that allow efficient data access. In contrast, at higher levels,
emphasis is placed on ease of use and effort is directed at providing efficient user interaction with the
system.
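
In SQL, these four operations correspond to the INSERT, UPDATE, SELECT and DELETE statements.
A minimal sketch against the hypothetical Student table used earlier:

INSERT INTO Student (studentNo, name) VALUES ('S0002', 'John Otieno');  -- insertion
UPDATE Student SET name = 'John O. Otieno' WHERE studentNo = 'S0002';   -- modification
SELECT studentNo, name FROM Student;                                    -- retrieval
DELETE FROM Student WHERE studentNo = 'S0002';                          -- deletion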

The part of a DML that involves data retrieval is called a query language. A query language can be
defined as a high-level special-purpose language used to satisfy diverse requests for the retrieval of data
held in the database. The term ‘query’ is therefore reserved to denote a retrieval statement expressed in a
query language.
DMLs are distinguished by their underlying retrieval constructs. We can distinguish between two types of
DML: procedural and non-procedural. The prime difference between these two data manipulation
languages is that procedural languages specify how the output of a DML statement is to be obtained,
while non-procedural DMLs describe only what output is to be obtained. Typically, procedural languages
treat records individually, whereas non-procedural languages operate on sets of records.

Procedural DMLs

This is a language that allows the user to tell the system what data is needed and exactly how to retrieve
the data. With a procedural DML, the user, or more normally the programmer, specifies what data is
needed and how to obtain it. This means that the user must express all the data access operations that are
to be used by calling appropriate procedures to obtain the information required. Typically, such a
procedural DML retrieves a record, processes it and, based on the results obtained by this processing,
retrieves another record that would be processed similarly, and so on. This process of retrievals continues
until the data requested from the retrieval has been gathered. Typically, procedural DMLs are embedded
in a high-level programming language that contains constructs to facilitate iteration and handle
navigational logic.


Non-procedural DMLs

It’s a language that allows the user to state what data is needed rather than how it is to be retrieved. Non-
procedural DMLs allow the required data to be specified in a single retrieval or update statement. With
non-procedural DMLs, the user specifies what data is required without specifying how it is to be obtained.
The DBMS translates a DML statement into one or more procedures that manipulate the required sets of
records. This frees the user from having to know how data structures are internally implemented and what
algorithms are required to retrieve and possibly transform the data, thus providing users with a
considerable degree of data independence. Non-procedural languages are also called declarative
languages.

Data Models

An integrated collection of concepts for describing and manipulating data, relationships between data, and
constraints on the data in an organization. A model is a representation of ‘real world’ objects and events,
and their associations. It is an abstraction that concentrates on the essential, inherent aspects of an
organization and ignores the accidental properties.
A data model represents the organization itself. It should provide the basic concepts and notations that
will allow database designers and end-users unambiguously and accurately to communicate their
understanding of the organizational data. A data model can be thought of as comprising three
components:
 A structural part, consisting of a set of rules according to which databases can be constructed;
 A manipulative part, defining the types of operation that are allowed on the data (this includes the
operations that are used for updating or retrieving data from the database and for changing the
structure of the database);
 Possibly a set of integrity constraints, which ensures that the data is accurate.

Based on the ANSI-SPARC architecture there are three categories of data models: object-based,
record-based, and physical data models. The first two are used to describe data at the conceptual and
external levels; the latter is used to describe data at the internal level.

Object-Based Data Models

Object-based data models use concepts such as entities, attributes, and relationships. An entity is a distinct
object (a person, place, thing, concept, or event) in the organization that is to be represented in the
database. An attribute is a property that describes some aspect of the object that we wish to record, and a
relationship is an association between entities. Some of the more common types of object-based data
model are:

 Entity–Relationship

 Semantic

 Functional

 Object-Oriented.


Record-Based Data Models

In a record-based model, the database consists of a number of fixed-format records possibly of differing
types. Each record type defines a fixed number of fields, each typically of a fixed length. There are three
principal types of record-based logical data model:
 The relational data model
 The network data model
 And the hierarchical data model.
The hierarchical and network data models were developed almost a decade before the relational data
model, so their links to traditional file processing concepts are more evident.

Relational data model

The relational data model is based on the concept of mathematical relations. In the relational model, data
and relationships are represented as tables, each of which has a number of columns with a unique name.
Note that the relational data model requires only that the database be perceived by the user as tables.
However, this perception applies only to the logical structure of the database, that is, the external and
conceptual levels of the ANSI-SPARC architecture. It does not apply to the physical structure of the
database, which can be implemented using a variety of storage structures.

Network data model

In the network model, data is represented as collections of records, and relationships are represented by
sets. Compared with the relational model, relationships are explicitly modeled by the sets, which become
pointers in the implementation. The records are organized as generalized graph structures with records
appearing as nodes (also called segments) and sets as edges in the graph.

Hierarchical data model

The hierarchical model is a restricted type of network model. Again, data is represented as collections of
records and relationships are represented by sets. However, the hierarchical model allows a node to have
only one parent. A hierarchical model can be represented as a tree graph, with records appearing as nodes
(also called segments) and sets as edges.

Record-based (logical) data models are used to specify the overall structure of the database and a higher
level description of the implementation. Their main drawback lies in the fact that they do not provide
adequate facilities for explicitly specifying constraints on the data, whereas the object-based data models
lack the means of logical structure specification but provide more semantic substance by allowing the
user to specify constraints on the data.

The majority of modern commercial systems are based on the relational paradigm, whereas the early
database systems were based on either the network or hierarchical data models. The latter two models
require the user to have knowledge of the physical database being accessed, whereas the former provides
a substantial amount of data independence.

Hence, while relational systems adopt a declarative approach to database processing (that is, they specify
what data is to be retrieved), network and hierarchical systems adopt a navigational approach (that is, they
specify how the data is to be retrieved).


Physical Data Models

Physical data models describe how data is stored in the computer, representing information such as record
structures, record orderings, and access paths. There are not as many physical data models as logical data
models, the most common ones being the unifying model and the frame memory.
DATABASE DEVELOPMENT PROCESS
A database can be developed using the following methods:
 Prototyping
 CASE tool
 Waterfall model

Prototyping
A prototype is a working model that does not normally have all the required features or provide all the
functionality of the final system. The main purpose of developing a prototype database system is to allow
users to use the prototype to identify the features of the system that work well, or are inadequate, and if
possible to suggest improvements or even new features to the database system. In this way, we can
greatly clarify the users’ requirements for both the users and developers of the system and evaluate the
feasibility of a particular system design. Prototypes should have the major advantage of being relatively
inexpensive and quick to build.
There are two prototyping strategies in common use today: requirements prototyping and evolutionary
prototyping. Requirements prototyping uses a prototype to determine the requirements of a proposed
database system; once the requirements are complete, the prototype is discarded. Evolutionary
prototyping is used for the same purposes, but the important difference is that the prototype is not
discarded and with further development becomes the working database system.

CASE Tools
The first stage of the database system development lifecycle, namely database planning, may also involve
the selection of suitable Computer-Aided Software Engineering (CASE) tools. In its widest sense, CASE
can be applied to any tool that supports software engineering. Appropriate productivity tools are needed
by data administration and database administration staff to permit the database development activities to
be carried out as efficiently and effectively as possible. CASE support may include:
 a data dictionary to store information about the database system’s data;
 Design tools to support data analysis; tools to permit development of the corporate data model,
and the conceptual and logical data models; tools to enable the prototyping of applications.
CASE tools may be divided into three categories: upper-CASE, lower-CASE, and integrated-CASE.
Upper-CASE tools support the initial stages of the database system development lifecycle, from planning
through to database design. Lower-CASE tools support the later stages of the lifecycle, from
implementation through testing, to operational maintenance. Integrated-CASE tools support all stages of
the lifecycle and thus provide the functionality of both upper- and lower-CASE in one tool.

Waterfall model
It involves a series of logical steps whereby the end results of one step become the input of the
succeeding step. The steps are:

Enterprise Modeling:
The activities involved in this level are:
 Analyzing the current data processing system
 Analyzing the general business functions and database needs
 Justifying the need for new data and databases in support of the business
Conceptual Modeling:
The activities involved in this level are:
 Identifying the scope of database requirement for proposed information system
 Analyzing overall data requirements for business function(s) supported by database
 Developing preliminary conceptual data model including entities and relationships
 Comparing preliminary conceptual data model with enterprise data model
 Developing detailed conceptual data model which include all entities, relationships, attributes,
and business rules
 Making the conceptual data model consistent with other models
Logical database design:
The activities involved in this level are:
 Analyzing in detail the transactions, forms, inquiries required by the business functions supported
by the database
 Integrating database views into conceptual data model
 Identifying data integrity and security requirements
 Creating a stable and well-defined structure for the database.
Physical database design
The activities involved in this level are:
 Defining the database to the DBMS
 Deciding on the physical organization of data
 Designing database processing programs
Database implementation:
The activities involved in this level are:
 Coding and testing database processing programs
 Completing database documentation and training materials
 Installing the database and converting data from the prior system
Database maintenance
The activities involved in this level are:
 Analyzing database and database applications to ensure that the evolving information
requirements are met.
 Tuning the DB for improved performance
 Fixing errors in DB and DB applications as well as recovering the DB when failure occurs

RELATIONAL DATA MODEL


Relational Data Structure
The structural part of the relational data model includes:
1. Relation:
This is a table with columns and rows. An RDBMS requires only that the DB be perceived by the user as
tables. This applies only to the external and conceptual levels of the ANSI-SPARC architecture.
2. Attribute:
This is a name given to a column of a relation.
3. Domain:
This is a set of allowable values for one or more attributes.

4. Tuple:
This is a row of a relation.
5. Intension:
This is the structure of a relation together with a specification of the domains and any other restrictions
on possible values.
6. Extension: (state)
These are the tuples of a relation which change over time.
7. Degree:
The degree of a relation is the number of attributes it contains. This is a property of the intension of the
relation.
8. Cardinality:
This is the number of tuples a relation contains.

9. Relational DB:
This is a collection of normalized relations with distinct relation names.
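
A compact sketch tying these terms together, using hypothetical names:

-- Intension: the structure plus domain restrictions. This relation has
-- degree 3 (three attributes); the CHECK clause restricts the domain of
-- the gender attribute to two allowable values.
CREATE TABLE Lecturer (
    staffNo CHAR(5)     NOT NULL PRIMARY KEY,
    name    VARCHAR(50) NOT NULL,
    gender  CHAR(1)     CHECK (gender IN ('M', 'F'))
);

-- Extension (state): the tuples, which change over time. After these
-- two insertions the cardinality of the relation is 2.
INSERT INTO Lecturer VALUES ('L0001', 'A. Kiprotich', 'M');
INSERT INTO Lecturer VALUES ('L0002', 'B. Chebet', 'F');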

Properties of Relations
A relation has the following properties:
 the relation has a name that is distinct from all other relation names in the relational schema;
 each cell of the relation contains exactly one atomic (single) value;
 each attribute has a distinct name;
 the values of an attribute are all from the same domain;
 each tuple is distinct; there are no duplicate tuples;
 the order of attributes has no significance;
 The order of tuples has no significance, theoretically. (However, in practice, the order may affect
the efficiency of accessing tuples.)

Relational Keys
As stated above, there are no duplicate tuples within a relation. Therefore, we need to be able to identify
one or more attributes (called relational keys) that uniquely identify each tuple in a relation. They
include:
 Superkey
This is an attribute, or set of attributes, that uniquely identifies a tuple within a relation.
A superkey uniquely identifies each tuple within a relation. However, a superkey may contain additional
attributes that are not necessary for unique identification, and we are interested in identifying superkeys
that contain only the minimum number of attributes necessary for unique identification.
 Candidate key
This is a superkey such that no proper subset is a superkey within the relation.
A candidate key, K, for a relation R has two properties:
i. Uniqueness – in each tuple of R, the values of K uniquely identify that tuple;
ii. Irreducibility – no proper subset of K has the uniqueness property.
There may be several candidate keys for a relation. When a key consists of more than one attribute, we
call it a composite key.

 Primary key
The candidate key that is selected to identify tuples uniquely within the relation.
Since a relation has no duplicate tuples, it is always possible to identify each row uniquely. This means
that a relation always has a primary key. In the worst case, the entire set of attributes could serve as the
primary key, but usually some smaller subset is sufficient to distinguish the tuples. The candidate keys
that are not selected to be the primary key are called alternate keys.

 Foreign key
This is an attribute, or set of attributes, within one relation that matches the candidate key of some
(possibly the same) relation.


When an attribute appears in more than one relation, its appearance usually represents a relationship
between tuples of the two relations.
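
The key vocabulary can be seen in a single pair of definitions. This is an independent sketch with
hypothetical tables: studentNo is chosen as the primary key of Student, the remaining candidate key
(email) becomes an alternate key, and Enrolment carries both a composite primary key and a foreign key:

CREATE TABLE Student (
    studentNo CHAR(8)     NOT NULL PRIMARY KEY,  -- primary key
    email     VARCHAR(80) NOT NULL UNIQUE,       -- alternate (candidate) key
    name      VARCHAR(50) NOT NULL
);

CREATE TABLE Enrolment (
    studentNo  CHAR(8) NOT NULL,
    courseCode CHAR(7) NOT NULL,
    grade      CHAR(2),
    PRIMARY KEY (studentNo, courseCode),         -- composite primary key
    FOREIGN KEY (studentNo) REFERENCES Student (studentNo)
);

Any superset of a candidate key, such as (studentNo, courseCode, grade) in Enrolment, is a superkey,
but it is not irreducible.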
Integrity Constraints
Since every attribute has an associated domain, there are constraints (called domain constraints) that
form restrictions on the set of values allowed for the attributes of relations. In addition, there are two
important integrity rules, which are constraints or restrictions that apply to all instances of the database.
The two principal rules for the relational model are known as entity integrity and referential integrity.
Before we define entity and referential integrity, it is necessary to understand the concept of nulls.

Nulls
It represents a value for an attribute that is currently unknown or is not applicable for this tuple. A null
can be taken to mean the logical value ‘unknown’. It can mean that a value is not applicable to a
particular tuple, or it could merely mean that no value has yet been supplied. Nulls are a way to deal with
incomplete or exceptional data. However, a null is not the same as a zero numeric value or a text string
filled with spaces; zeros and spaces are values, but a null represents the absence of a value. Therefore,
nulls should be treated differently from other values. Some authors use the term ‘null value’, however as a
null is not a value but represents the absence of a value, the term ‘null value’ is deprecated.

ENTITY INTEGRITY
This rule applies to the primary key of a base relation. A base relation is a relation that corresponds to an
entity in the conceptual schema.
This constraint ensures that no attributes of the primary key can be null.

REFERENTIAL INTEGRITY
This rule applies to foreign keys. This constraint ensures that if a foreign key exists in a relation, then its
value must either match a candidate key value of some tuple in its home relation or be wholly null.
The FOREIGN KEY constraint is used to prevent actions that would destroy links between tables.
The FOREIGN KEY constraint also prevents invalid data from being inserted into the foreign key column,
because the value has to be one of the values contained in the table it points to.
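As an illustration, both integrity rules can be declared when relations are created. The following SQL
sketch assumes simplified Branch and Staff relations (table names and column types are illustrative only):

CREATE TABLE Branch (
   branchNo  CHAR(4) NOT NULL,   -- entity integrity: no primary key attribute may be null
   bAddress  VARCHAR(50),
   PRIMARY KEY (branchNo)
);

CREATE TABLE Staff (
   staffNo   CHAR(4) NOT NULL,
   fName     VARCHAR(30),
   salary    DECIMAL(9,2),
   branchNo  CHAR(4),            -- foreign key: must be wholly null or match a Branch tuple
   PRIMARY KEY (staffNo),
   FOREIGN KEY (branchNo) REFERENCES Branch (branchNo)   -- referential integrity
);

An attempt to insert a Staff tuple whose branchNo does not appear in Branch would be rejected by the
DBMS.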

VIEWS
This is the dynamic result of one or more relational operations operating on the base relations to produce
another relation. It is a virtual relation that does not necessarily exist in the database, but can be produced
upon request by a particular user, at the time of request.

PURPOSE OF VIEWS
 It provides a powerful and flexible security mechanism by hiding parts of the DB from certain
users
 It permits users to access data in a way that is customized to their needs
 It simplifies complex operations on the base relations

Updating Views
All updates to a base relation should be immediately reflected in all views that reference that base
relation. Similarly, if a view is updated, then the underlying base relation should reflect the change.
However, there are restrictions on the types of modification that can be made through views. We
summarize below the conditions under which most systems determine whether an update is allowed
through a view:


 Updates are allowed through a view defined using a simple query involving a single base relation
and containing either the primary key or a candidate key of the base relation.
 Updates are not allowed through views involving multiple base relations.
 Updates are not allowed through views involving aggregation or grouping operations.
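For example, the following SQL sketch defines a view over a hypothetical Staff relation that hides the
salary attribute from its users:

CREATE VIEW StaffPublic AS
   SELECT staffNo, fName, position, branchNo
   FROM Staff;

SELECT * FROM StaffPublic WHERE branchNo = 'B003';

Because StaffPublic is defined over a single base relation and contains its primary key (staffNo), most
systems would allow updates through it, consistent with the conditions listed above.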

Conceptual Modeling

From an examination of the three-level architecture, we see that the conceptual schema is the ‘heart’ of
the database. It supports all the external views and is, in turn, supported by the internal schema. However,
the internal schema is merely the physical implementation of the conceptual schema. The conceptual
schema should be a complete and accurate representation of the data requirements of the enterprise. If this
is not the case, some information about the enterprise will be missing or incorrectly represented and we
will have difficulty fully implementing one or more of the external views.

Conceptual modeling, or conceptual database design, is the process of constructing a model of the
information used in an enterprise that is independent of implementation details, such as the target DBMS,
application programs, programming languages, or any other physical considerations. This model is called
a conceptual data model.
The conceptual data model that results can be theoretically implemented on any database, or even on a
flat file system. The person who performs conceptual database design is often called a data modeler.

COMPONENTS OF CONCEPTUAL MODEL
The basic components of a conceptual database design are:
 Entity
 Attribute
 Relationship
 Business Rule
 Intersection Data
Entities
An entity is a person, place, thing, event, or concept about which data is collected. In other words, entities
are the real world things in which we have sufficient interest to capture and store data about them in a
database. An entity is represented as a rectangle on the diagram. Just about anything that can be named
with a noun can be an entity.
However, to avoid designing everything on the planet into our database, we restrict ourselves to entities of
interest to the people who will use our database.

Attribute
An attribute is a unit fact that characterizes or describes an entity in some way. These are represented on
the conceptual design diagram as names inside the rectangle that represents the entity to which they
belong. The attribute (or attributes) that appears at the top of the rectangle (above the horizontal line) is
the unique identifier for the entity. A unique identifier, as the name suggests, provides a unique value for
each instance of the entity.

Relationships
Relationships are the associations among the entities. Because databases are all about storing related data,
the relationships become the glue that holds the database together. Relationships are shown on the
conceptual design diagram as lines connecting one or more entities.


Business Rules
A business rule is a policy, procedure, or standard that an organization has adopted. Business rules are
very important in database design because they dictate controls that must be placed upon the data.
Business rules can be implemented in the database as constraints, which are formally defined rules that
restrict the data values in the database in some way.

NB: Intersection data is presented later under “Many-to-Many Relationships.”

ENTITY RELATIONSHIP MODELLING
One of the most difficult aspects of database design is the fact that designers, programmers, and end-users
tend to view data and its use in different ways. Unfortunately, unless we gain a common understanding
that reflects how the enterprise operates, the design we produce will fail to meet the users’ requirements.
To ensure that we get a precise understanding of the nature of the data and how it is used by the
enterprise, we need to have a model for communication that is non-technical and free of ambiguities.

Basic concepts of the Entity–Relationship model
The basic components of ER model are: entities, relationships, and attributes.

Entity Types
These are a group of objects with the same properties, which are identified by the enterprise as having an
independent existence.
The basic concept of the ER model is the entity type, which represents a group of ‘objects’ in the ‘real
world’ with the same properties. An entity type has an independent existence and can be objects with a
physical (or ‘real’) existence or objects with a conceptual (or ‘abstract’) existence. Note that we are only
able to give a working definition of an entity type as no strict formal definition exists. This means that
different designers may identify different entities.
Examples of physical entity types include: staff, product, student, customer, lecture halls etc. Examples of
conceptual entity types include: course, sport, department, sale, unit etc.

Entity occurrence:
This is a uniquely identifiable object of an entity type. It is also known as an instance.
Diagrammatically the entity type in the ER model is represented by a rectangular shape with its name
within it. The name should be in singular form and start with a capital letter.

Staff

We can classify entity types as being strong or weak. Strong entity type is an entity type that is not
existence-dependent on some other entity type. An entity type is referred to as being strong if its existence
does not depend upon the existence of another entity type. Examples of strong entities are: the Staff,
Branch, and Client entities. A characteristic of a strong entity type is that each entity occurrence is
uniquely identifiable using the primary key attribute(s) of that entity type. For example, we can uniquely
identify each member of staff using the staffNo attribute, which is the primary key for the Staff entity
type.


Weak entity type is an entity type that is existence-dependent on some other entity type. A weak entity
type is dependent on the existence of another entity type. An example of a weak entity type is Preference.
A characteristic of a weak entity is that each entity occurrence cannot be uniquely identified using only
the attributes associated with that entity type. For example, note that there is no primary key for the
Preference entity. This means that we cannot identify each occurrence of the Preference entity type using
only the attributes of this entity. We can only uniquely identify each preference through the relationship
that a preference has with a client who is uniquely identifiable using the primary key for the Client entity
type, namely clientNo. In this example, the Preference entity is described as having an existence
dependency on the Client entity, which is referred to as being the owner entity.
Weak entity types are sometimes referred to as child, dependent, or subordinate entities and strong entity
types as parent, owner, or dominant entities.

Relationship Types
This is a set of meaningful associations among entity types. A relationship type is a set of associations
between one or more participating entity types.
Each relationship type is given a name that describes its function.
Relationship occurrence is a uniquely identifiable association, which includes one occurrence from each
participating entity type.
Consider a relationship type called Has, which represents an association between Branch and Staff
entities that is Branch Has Staff. Each occurrence of the Has relationship associates one Branch entity
occurrence with one Staff entity occurrence.

Has
Branch Staff

Note that we represent each Branch and Staff entity occurrence using values for their primary key
attributes, namely branchNo and staffNo. The relationship should be read in both directions. For
instance, the above can also imply that every staff member belongs to a branch.
Each relationship type is shown as a line connecting the associated entity types, labeled with the
name of the relationship. Normally, a relationship is named using a verb (for example, Supervises or
Manages) or a short phrase including a verb (for example, LeasedBy).
Again, the first letter of each word in the relationship name is shown in upper case. Whenever possible, a
relationship name should be unique for a given ER model.

Degree of Relationship Type
This is the number of participating entity types in a relationship. The entities involved in a particular
relationship type are referred to as participants in that relationship. The number of participants in a
relationship type is called the degree of that relationship. Therefore, the degree of a relationship indicates
the number of entity types involved in a relationship.
A relationship of degree two is called binary. An example of a binary relationship is the Has relationship
shown above with two participating entity types namely, Staff and Branch.
A relationship of degree three is called ternary. An example of a ternary relationship is Registers with
three participating entity types, namely Staff, Branch, and Client. This relationship represents the
registration of a client by a member of staff at a branch.
The term ‘complex relationship’ is used to describe relationships with degrees higher than binary.


A relationship of degree four is called quaternary.

Recursive Relationship
This is a relationship type where the same entity type participates more than once in different roles.
Consider a recursive relationship called Supervises, which represents an association of staff with a
Supervisor where the Supervisor is also a member of staff. In other words, the Staff entity type
participates twice in the Supervises relationship; the first participation as a Supervisor, and the second
participation as a member of staff who is supervised (Supervisee). Recursive relationships are sometimes
called unary relationships. Thus

Supervises

Supervisor

Supervisee
Staff

Cardinality and Participation Constraints
Cardinality describes the maximum number of possible relationship occurrences for an entity
participating in a given relationship type.
The cardinality of a binary relationship is one of the following:
i. one-to-one (1:1),
ii. one-to-many (1:n),
iii. many-to-many (m:n), where m and n symbolize more than one.
The cardinality of a relationship appears as the maximum values for the multiplicity ranges on either side
of the relationship.
One-to-One Relationships
A one-to-one relationship is an association where an instance of one entity can be associated with at most
one instance of the other entity, and vice versa.
One-to-one relationships are surprisingly rare among entities. In practice, one-to-one relationships that are
mandatory in both directions represent a design flaw that should be corrected by combining the two
entities. For example, a staff member should belong to a branch.


1 Has 1
Branch Staff

One-to-Many Relationships
A one-to-many relationship is an association between two entities where any instance of the first entity
may be associated with one or more instances of the second, and any instance of the second entity may be
associated with at most one instance of the first. For example Customer and Order entities. The
relationship between Customer and Order, which is mandatory in only one direction, is read as follows:
“At any point in time, each customer can have zero to many orders, and each order must have one and
only one owning customer.”
One-to-many relationships are quite common. In fact, they are the fundamental building block of the
relational database model in that all relationships in a relational database are implemented as if they are
one-to-many. It is rare for them to be optional on the “one” side and even more rare for them to be
mandatory on the “many” side, but these situations do happen.
The many side of a relationship is represented by a crow's foot. Thus:

1 M

Order
Customer

Many-to-Many Relationships
A many-to-many relationship is an association between two entities where any instance of the first entity
may be associated with zero, one, or more instances of the second, and vice versa. For example Order and
Product entity type is many-to-many relationship. We read the relationship thus: “At any given point in
time, each order contains zero to many products, and each product appears on zero to many orders.”
Data that belongs to a many-to-many relationship is called intersection data. It can be resolved by
deriving a new entity between the associated entities.

M N
Order
Product
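For instance, the many-to-many relationship between Order and Product above can be resolved by
deriving a new intersection entity, here called OrderProduct. A SQL sketch (all names are illustrative,
and the Orders and Product tables are assumed to exist):

CREATE TABLE OrderProduct (
   orderNo    CHAR(6) NOT NULL,
   productNo  CHAR(6) NOT NULL,
   quantity   INTEGER,                    -- intersection data describing the pairing
   PRIMARY KEY (orderNo, productNo),      -- composite key: one row per order/product pair
   FOREIGN KEY (orderNo) REFERENCES Orders (orderNo),
   FOREIGN KEY (productNo) REFERENCES Product (productNo)
);

The single many-to-many relationship is thereby replaced by two one-to-many relationships.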

Participation Determines whether all or only some entity occurrences participate in a relationship. The
participation constraint represents whether all entity occurrences are involved in a particular relationship
(referred to as mandatory participation) or only some (referred to as optional participation). The
participation of entities in a relationship appears as the minimum values for the multiplicity ranges on
either side of the relationship. Optional participation is represented as a minimum value of 0 while
mandatory participation is shown as a minimum value of 1. It is important to note that the participation
for a given entity in a relationship is represented by the minimum value on the opposite side of the
relationship; that is the minimum value for the multiplicity beside the related entity.
Minimum relationship cardinality is shown with a small circle near the end of the line to signify “zero”
(participation in the relationship is optional) or a short perpendicular line across the relationship line to
signify “one” (participation in the relationship is mandatory).

Attributes
This is a property of an entity or a relationship type. Each attribute is associated with a set of values called
a domain. The domain defines the potential values that an attribute may hold and is similar to the domain
concept in the relational model. For example, the valid number of days a student may keep a library book
might be 0 – 14 days.
Attributes can be classified as being: simple or composite; single-valued or multi-valued; or derived.

Simple and Composite Attributes

A simple attribute is an attribute composed of a single component with an independent existence.
Simple attributes cannot be further subdivided into smaller components. Examples of simple attributes
include position and salary of the Staff entity. Simple attributes are sometimes called atomic attributes.
A composite attribute is an attribute composed of multiple components, each with an independent
existence. Some attributes can be further divided to yield smaller components with an independent
existence of their own. For example, the address attribute of the Branch entity with the value (163 Main
St, Glasgow, G11 9QX) can be subdivided into street (163 Main St), city
(Glasgow), and postcode (G11 9QX) attributes.

Single-Valued and Multi-Valued Attributes

A single – valued attribute is an attribute that holds a single value for each occurrence of an entity
type.
The majority of attributes are single-valued. For example, each occurrence of the Branch entity type has a
single value for the branch number (branchNo) attribute (for example B003), and therefore the branchNo
attribute is referred to as being single-valued.
A multi – valued attribute is an attribute that holds multiple values for each occurrence of an entity
type. Some attributes have multiple values for each entity occurrence. For example, each occurrence of
the Branch entity type can have multiple values for the telNo attribute (for example, branch number B003
has telephone numbers 0141-339-2178 and 0141-339-4439) and therefore the telNo attribute in this case
is multi-valued.

Derived Attributes
This is an attribute that represents a value that is derivable from the value of a related attribute or set of
attributes, not necessarily in the same entity type. The values held by some attributes may be derived. For
example, the value for the duration attribute of the Lease entity is calculated from the rentStart and
rentFinish attributes also of the Lease entity type. We refer to the duration attribute as a derived attribute,
the value of which is derived from the rentStart and rentFinish attributes.
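Where the DBMS supports it, such a derived attribute is usually computed at query time rather than
stored. A minimal SQL sketch, assuming a Lease table with a leaseNo column (exact date arithmetic
syntax varies between DBMSs):

SELECT leaseNo,
       rentFinish - rentStart AS duration   -- derived value, not stored in the base relation
FROM Lease;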


NORMALIZATION
Normalization is a technique for producing a set of relations with desirable properties, given the data
requirements of an enterprise. Normalization is a database design technique, which begins by examining
the relationships (called functional dependencies) between attributes. Attributes describe some property of
the data or of the relationships between the data that is important to the enterprise.
Normalization uses a series of tests (described as normal forms) to help identify the optimal grouping for
these attributes to ultimately identify a set of suitable relations that supports the data requirements of the
enterprise.
PURPOSE OF NORMALIZATION
The purpose of normalization is to identify a suitable set of relations that support the data requirements of
an enterprise. The characteristics of a suitable set of relations include the following:
 the minimal number of attributes necessary to support the data requirements of the enterprise;
 attributes with a close logical relationship (described as functional dependency) are found in the
same relation;
 minimal redundancy with each attribute represented only once with the important exception of
attributes that form all or part of foreign keys which are essential for the joining of related
relations.
The benefit of using a database that has a suitable set of relations is that the data will be easier for users
to access and maintain, and will take up minimal storage space on the computer.
A relation that is not normalized is said to be unnormalized (or denormalized). The problems associated
with using a relation that is not appropriately normalized are data redundancy and update anomalies.
The major aim of relational database design is to group attributes into relations to minimize data
redundancy. If this aim is achieved, the potential benefits for the implemented database include the
following:
 updates to the data stored in the database are achieved with a minimal number of operations thus
reducing the opportunities for data inconsistencies occurring in the database;
 reduction in the file storage space required by the base relations thus minimizing costs.
Of course, relational databases also rely on the existence of a certain amount of data redundancy. This
redundancy is in the form of copies of primary keys (or candidate keys) acting as foreign keys in related
relations to enable the modeling of relationships between data.
Relations that have redundant data may have problems called update anomalies, which are classified as
insertion, deletion, or modification anomalies. Consider the following Staffbranch relation, which is not
appropriately normalized:

StaffNo   Fname    Position      Salary   BranchNo   Baddress
S002      Kay      Manager       50000    B003       Nairobi
S005      Hassan   Ass.Manager   30000    B004       Mombasa
S007      Jane     HR            45000    B002       Nakuru

NB: the primary key for this relation is the composite (StaffNo, BranchNo).

Insertion Anomalies
There are two main types of insertion anomaly, which we illustrate using the Staffbranch relation shown
above:
 To insert the details of new members of staff into the Staffbranch relation, we must include the
details of the branch at which the staff are to be located. For example, to insert the details of new
staff located at branch number B007, we must enter the correct details of branch number B007 so
that the branch details are consistent with the values for branch B007 in other tuples of the
Staffbranch relation; otherwise the relation will hold inconsistent copies of the same branch
details.
 To insert details of a new branch that currently has no members of staff into the Staffbranch
relation, it is necessary to enter nulls into the attributes for staff, such as staffNo. However, as
staffNo is the primary key for the Staffbranch relation, attempting to enter nulls for staffNo
violates entity integrity, and is not allowed. We therefore cannot enter a tuple for a new branch
into the Staffbranch relation with a null for the staffNo.

Deletion Anomalies
If we delete a tuple from the Staffbranch relation that represents the last member of staff located at a
branch, the details about that branch are also lost from the database.

Modification Anomalies
If we want to change the value of one of the attributes of a particular branch in the Staffbranch relation,
for example the address for branch number B003, we must update the tuples of all staff located at that
branch. If this modification is not carried out on all the appropriate tuples of the Staffbranch relation, the
database will become inconsistent. In this example, branch number B003 may appear to have different
addresses in different staff tuples.
This demonstrates that while the Staffbranch relation is subject to update anomalies, we can avoid these
anomalies by decomposing the original relation into the Staff and Branch relations. There are two
important properties associated with decomposition of a larger relation into smaller relations:
 The lossless-join property ensures that any instance of the original relation can be identified from
corresponding instances in the smaller relations.
 The dependency preservation property ensures that a constraint on the original relation can be
maintained by simply enforcing some constraint on each of the smaller relations. In other words,
we do not need to perform joins on the smaller relations to check whether a constraint on the
original relation is violated.

Functional Dependencies
An important concept associated with normalization is functional dependency, which describes the
relationship between attributes (Maier, 1983). It describes the relationship between attributes in a relation.
For example, if A and B are attributes of relation R, B is functionally dependent on A (denoted A → B), if
each value of A is associated with exactly one value of B. (A and B may each consist of one or more
attributes.)
Determinant refers to the attribute, or group of attributes, on the left-hand side of the arrow of a
functional dependency. From the above explanation A is the determinant of B. For example consider the
attributes staffNo and position of the Staff relation. For a specific staffNo, for example S002, we can
determine the position of that member of staff as Manager. In other words, staffNo functionally
determines position.
However, the opposite is not true, as position does not functionally determine staffNo. A member of staff
holds one position; however, there may be several members of staff with the same position.

FULL FUNCTIONAL DEPENDENCY
This indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is
functionally dependent on A, but not on any proper subset of A. A functional dependency A → B is a
full functional dependency if removal of any attribute from A results in the dependency no longer
existing. A functional dependency A → B is a partial dependency if there is some attribute that can be
removed from A and yet the dependency still holds.


For example, consider the functional dependency (staffNo, sName) → branchNo. It is correct to say that
each value of (staffNo, sName) is associated with a single value of branchNo. However, it is not a full
functional dependency because branchNo is also functionally dependent on a subset of (staffNo, sName),
namely staffNo. In other words, (staffNo, sName) → branchNo is an example of a partial dependency.
The type of functional dependency that we are interested in identifying is a full functional dependency,
as shown below.

StaffNo → branchNo

TRANSITIVE DEPENDENCY
This is a condition where A, B, and C are attributes of a relation such that if A → B and B → C, then C is
transitively dependent on A via B (provided that A is not functionally dependent on B or C). For example

StaffNo → sName, position, salary, branchNo, Baddress
BranchNo → Baddress
The transitive dependency branchNo → Baddress exists on staffNo via branchNo. In other words, the
staffNo attribute functionally determines the Baddress via the branchNo attribute and neither branchNo
nor Baddress functionally determines staffNo.

The Process of Normalization
Normalization is a formal technique for analyzing relations based on their primary key (or candidate
keys) and functional dependencies (Codd, 1972b). The technique involves a series of rules that can be
used to test individual relations so that a database can be normalized to any degree. When a requirement
is not met, the relation violating the requirement must be decomposed into relations that individually meet
the requirements of normalization.
Three normal forms were initially proposed, called First Normal Form (1NF), Second Normal Form
(2NF) and Third Normal Form (3NF). Normalization is often executed as a series of steps.
Each step corresponds to a specific normal form that has known properties. As normalization proceeds,
the relations become progressively more restricted (stronger) in format and also less vulnerable to update
anomalies. For the relational data model, it is important to recognize that it is only First Normal Form
(1NF) that is critical in creating relations; all subsequent normal forms are optional.

First Normal Form (1NF)
The state before normalization is known as unnormalized form. A relation is said to be in first normal
form if the intersection of each row and column contains one and only one value. The
process of normalization begins by transferring the data from the source (for example, a standard data
entry form) into table format with rows and columns. In this format, the table is in unnormalized form
and is referred to as an unnormalized table.
To transform the unnormalized table to First Normal Form we identify and remove repeating groups
within the table. A repeating group is an attribute, or group of attributes, within a table that occurs with
multiple values for a single occurrence of the nominated key attribute(s) for that table. Note that in this
context, the term ‘key’ refers to the attribute(s) that uniquely identify each row within the unnormalized
table. There are two common approaches to removing repeating groups from unnormalized tables:
i. By entering appropriate data in the empty columns of rows containing the repeating data. In
other words, we fill in the blanks by duplicating the nonrepeating data, where required. This
approach is commonly referred to as ‘flattening’ the table.
ii. By placing the repeating data, along with a copy of the original key attribute(s), in a separate
relation. Sometimes the unnormalized table may contain more than one repeating group, or
repeating groups within repeating groups. In such cases, this approach is applied repeatedly until
no repeating groups remain. A set of relations is in 1NF if it contains no repeating groups.


For both approaches, the resulting tables are now referred to as 1NF relations containing atomic (or
single) values at the intersection of each row and column.
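For example, consider a hypothetical unnormalized table in which each student row carries a repeating
group of course codes:

studentNo   sName   courseNo
S01         Mary    C10, C12
S02         Paul    C10

Flattening the table (the first approach) duplicates the nonrepeating data:

studentNo   sName   courseNo
S01         Mary    C10
S01         Mary    C12
S02         Paul    C10

Alternatively (the second approach), the repeating group is placed in a separate relation
StudentCourse(studentNo, courseNo) along with a copy of the key studentNo. In either case, every cell
now holds a single atomic value, so the resulting relations are in 1NF.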

Second Normal Form (2NF)
Second Normal Form (2NF) is based on the concept of full functional dependency. Second Normal
Form applies to relations with composite keys, that is, relations with a primary key composed of two or
more attributes. A relation with a single-attribute primary key is automatically in at least 2NF.
A relation is said to be in second normal form if it is in First Normal Form and every non-primary-
key attribute is fully functionally dependent on the primary key.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial
dependency exists, we remove the partially dependent attribute(s) from the relation by placing them in a
new relation along with a copy of their determinant.
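For example, in the Staffbranch relation shown earlier, with composite primary key (StaffNo, BranchNo),
the attributes Fname, Position and Salary depend on StaffNo alone, while Baddress depends on BranchNo
alone; all of these are partial dependencies. One possible decomposition to 2NF places the dependent
attributes in new relations together with a copy of their determinant:

Staff (StaffNo, Fname, Position, Salary)
Branch (BranchNo, Baddress)
StaffBranch (StaffNo, BranchNo)

Every non-primary-key attribute is now fully functionally dependent on the primary key of its relation.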

Third Normal Form (3NF)
Although 2NF relations have less redundancy than those in 1NF, they may still suffer from update
anomalies. A relation is said to be in third normal form if it is in First and Second Normal Form and
no non-primary-key attribute is transitively dependent on the primary key.
The normalization of 2NF relations to 3NF involves the removal of transitive dependencies. If a transitive
dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the
attribute(s) in a new relation along with a copy of the determinant.
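Continuing the functional dependency example given earlier, suppose Staff(StaffNo, sName, Position,
Salary, BranchNo, Baddress) has primary key StaffNo, with StaffNo → BranchNo and BranchNo →
Baddress. Baddress is transitively dependent on StaffNo via BranchNo, so it is removed along with a
copy of its determinant:

Staff (StaffNo, sName, Position, Salary, BranchNo)
Branch (BranchNo, Baddress)

Both resulting relations are in 3NF.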

DATABASE SECURITY
Data is a valuable resource that must be strictly controlled and managed, as with any corporate resource.
Part or all of the corporate data may have strategic importance to an organization and should therefore be
kept secure and confidential.
The term database security refers to the protection of the database against unauthorized access, either
intentional or accidental. Security considerations apply not only to the data held in a database: breaches of
security may affect other parts of the system, which may in turn affect the database. Consequently,
database security encompasses hardware, software, people, and data. To effectively implement security
requires appropriate controls, which are defined in specific mission objectives for the system.
We consider database security in relation to the following situations:
 theft and fraud;
 loss of confidentiality (secrecy);
 loss of privacy;
 loss of integrity;
 Loss of availability.
These situations broadly represent areas in which the organization should seek to reduce risk, that is, the
possibility of incurring loss or damage. In some situations, these areas are closely related such that an
activity that leads to loss in one area may also lead to loss in another. In addition, events such as fraud or
loss of privacy may arise because of either intentional or unintentional acts, and do not necessarily result
in any detectable changes to the database or the computer system.
Theft and fraud affect not only the database environment but also the entire organization. As it is people
who perpetrate such activities, attention should focus on reducing the opportunities for this occurring.
Theft and fraud do not necessarily alter data, as is the case for activities that result in either loss of
confidentiality or loss of privacy.
Confidentiality refers to the need to maintain secrecy over data, usually only that which is critical to the
organization, whereas privacy refers to the need to protect data about individuals. Breaches of security
resulting in loss of confidentiality could, for instance, lead to loss of competitiveness, and loss of privacy
could lead to legal action being taken against the organization.


Loss of data integrity results in invalid or corrupted data, which may seriously affect the operation of an
organization. Many organizations are now seeking virtually continuous operation, the so-called 24/7
availability (that is, 24 hours a day, 7 days a week).
Loss of availability means that the data, or the system, or both cannot be accessed, which can seriously
affect an organization’s financial performance. In some cases, events that cause a system to be
unavailable may also cause data corruption.
Database security aims to minimize losses caused by anticipated events in a cost effective manner without
unduly constraining the users. In recent times, computer-based criminal activities have significantly
increased and are forecast to continue to rise over the next few years.

Database Threats
This is any situation or event, whether intentional or accidental, that may adversely affect a system and
consequently the organization.
Any threat must be viewed as a potential breach of security which, if successful, will have a certain
impact. Threats include:
 Using another person's means of access.
 Unauthorized amendment or copying of data.
 Program alteration
 Inadequate policies and procedures that allow a mix of confidential and normal output.
 Wire tapping
 Illegal entry by hackers
 Inadequate staff training
 Introduction of viruses
 Failure of security mechanism giving greater access than normal.
 Blackmail
The extent that an organization suffers as a result of a threat succeeding depends upon a number of factors
such as countermeasures.

Countermeasures – Computer-Based Controls
The following are computer-based security controls for a multi-user environment (some of which may
not be available in the PC environment):
 authorization
 access controls
 views
 backup and recovery
 integrity
 encryption
 RAID technology

Authorization
This refers to the granting of a right or privilege that enables a subject to have legitimate access to a
system or a system’s object. Authorization controls can be built into the software, and govern not only
what system or object a specified user can access, but also what the user may do with it. The process of
authorization involves authentication of subjects requesting access to objects, where ‘subject’ represents a
user or program and ‘object’ represents a database table, view, procedure, trigger, or any other object that
can be created within the system.


Authentication
This is a mechanism that determines whether a user is who he or she claims to be. A system administrator
is usually responsible for allowing users to have access to a computer system by creating individual user
accounts. Each user is given a unique identifier, which is used by the operating system to determine who
they are. Associated with each identifier is a password, chosen by the user and known to the operating
system, which must be supplied to enable the operating system to verify (or authenticate) who the user
claims to be.
This procedure allows authorized use of a computer system but does not necessarily authorize access to
the DBMS or any associated application programs. A separate, similar procedure may have to be
undertaken to give a user the right to use the DBMS. The responsibility to authorize use of the DBMS
usually rests with the Database Administrator (DBA), who must also set up individual user accounts and
passwords using the DBMS itself.

Access Controls
The typical way to provide access controls for a database system is based on the granting and revoking of
privileges. A privilege allows a user to create or access (that is read, write, or modify) some database
object (such as a relation, view, or index) or to run certain DBMS utilities. Privileges are granted to users
to accomplish the tasks required for their jobs. Excessive granting of unnecessary privileges can
compromise security: a privilege should only be granted to a user if that user cannot accomplish his or her
work without that privilege. A user who creates a database object such as a relation or a view
automatically gets all privileges on that object. The DBMS subsequently keeps track of how these
privileges are granted to other users, and possibly revoked, and ensures that at all times only users with
necessary privileges can access an object.
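In SQL, such privileges are granted and revoked with the GRANT and REVOKE statements. A minimal
sketch, assuming a Staff relation and users named clerk and payroll (support for column-level privileges
varies between DBMSs):

GRANT SELECT ON Staff TO clerk;                      -- read-only access to the relation
GRANT SELECT, UPDATE (salary) ON Staff TO payroll;   -- payroll may also modify the salary column
REVOKE SELECT ON Staff FROM clerk;                   -- a privilege can later be withdrawn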

Views
A view is the dynamic result of one or more relational operations operating on the base relations to
produce another relation. A view is a virtual relation that does not actually exist in the database, but is
produced upon request by a particular user, at the time of request. The view mechanism provides a
powerful and flexible security mechanism by hiding parts of the database from certain users. The user is
not aware of the existence of any attributes or rows that are missing from the view. A view can be defined
over several relations with a user being granted the appropriate privilege to use it, but not to use the base
relations. In this way, using a view is more restrictive than simply having certain privileges granted to a
user on the base relation(s).

Backup and Recovery
This is the process of periodically taking a copy of the database and log file (and possibly programs) on
to offline storage media. A DBMS should provide logging facilities, sometimes referred to as journaling,
which keep track of the current state of transactions and database changes, to provide support for recovery
procedures. The advantage of journaling is that, in the event of a failure, the database can be recovered to
its last known consistent state using a backup copy of the database and the information contained in the
log file.

Encryption
This is the encoding of the data by a special algorithm that renders the data unreadable by any program
without the decryption key.
If a database system holds particularly sensitive data, it may be deemed necessary to encode it as a
precaution against possible external threats or attempts to access it. Some DBMSs provide an encryption
facility for this purpose. The DBMS can access the data (after decoding it), although there is a
degradation in performance because of the time taken to decode it. Encryption also protects data
transmitted over communication lines. There are a number of techniques for encoding data to conceal the
information; some are termed ‘irreversible’ and others ‘reversible’. Irreversible techniques, as the name
implies, do not permit the original data to be known. However, the data can be used to obtain valid
statistical information. Reversible techniques are more commonly used. To transmit data securely over
insecure networks requires the use of a cryptosystem, which includes:
 an encryption key to encrypt the data (plaintext);
 an encryption algorithm that, with the encryption key, transforms the plaintext into ciphertext;
 a decryption key to decrypt the ciphertext;
 a decryption algorithm that, with the decryption key, transforms the ciphertext back into
plaintext.
One technique, called symmetric encryption, uses the same key for both encryption and decryption and
relies on safe communication lines for exchanging the key. However, most users do not have access to a
secure communication line and, to be really secure, the keys need to be as long as the message (Leiss,
1982). However, most working systems are based on user keys shorter than the message. One scheme
used for encryption is the Data Encryption Standard (DES), which is a standard encryption algorithm
developed by IBM. This scheme uses one key for both encryption and decryption, which must be kept
secret, although the algorithm need not be. The algorithm transforms each 64-bit block of plaintext using
a 56-bit key. The DES is not universally regarded as being very secure, and some authors maintain that a
larger key is required.

RAID (Redundant Array of Independent Disks)
The hardware that the DBMS is running on must be fault-tolerant, meaning that the DBMS should
continue to operate even if one of the hardware components fails. This suggests having redundant
components that can be seamlessly integrated into the working system whenever there is one or more
component failures. The main hardware components that should be fault-tolerant include disk drives, disk
controllers, CPU, power supplies, and cooling fans. Disk drives are the most vulnerable components with
the shortest times between failures of any of the hardware components.
One solution is the use of Redundant Array of Independent Disks (RAID) technology. RAID
originally stood for Redundant Array of Inexpensive Disks, but more recently the ‘I’ in RAID has come to
stand for Independent. RAID works on having a large disk array comprising an arrangement of several
independent disks that are organized to improve reliability and at the same time increase performance.
Performance is increased through data striping: the data is segmented into equal-size partitions (the
striping unit) which are transparently distributed across multiple disks. This gives the appearance of a
single large, fast disk where in actual fact the data is distributed across several smaller disks. Striping
improves overall I/O performance by allowing multiple I/Os to be serviced in parallel. At the same time,
data striping also balances the load among disks.

Transaction Management
Transaction Support
A transaction is an action, or series of actions, carried out by a single user or application program,
which reads or updates the contents of the database. For example, if an organization dealing with
property for rent has two relations in the database, Staff and PropertyForRent, then the following
transactions can take place:

a) Read (Staff_No = x, salary)
   salary = salary * 1.1
   Write (Staff_No = x, salary)
   End


b) Delete (Staff_No = x)
   For all property for rent records, Pno
   Begin
      Read (propertyNo = Pno, Staff_No)
      If (Staff_No = x) then
      Begin
         Staff_No = newstaffNo
         Write (propertyNo = Pno, Staff_No)
      End
   End

The read and write operations on a data item x are denoted by read(x) and write(x); so read(staffNo = x,
salary) implies that we want to read the data item salary for the tuple with primary key value x. In
transaction (a) we have two database operations and one non-database operation (salary = salary * 1.1).
In transaction (b), as well as deleting the staff member with Staff_No x, we need to identify the properties
in the PropertyForRent relation which he managed and reassign them to a different member of staff. If all
these updates are not made, referential integrity will be lost and the database will be in an inconsistent
state. A transaction always transforms the database from one consistent state to another, although
consistency may be violated while the transaction is in progress.

A transaction is a logical unit of work on the database. It may be an entire program, a part of a program,
or a single command (for example, the SQL command INSERT or UPDATE), and it may involve any
number of operations on the database. In the database context, the execution of an application program
can be thought of as one or more transactions with non-database processing taking place in between.
A transaction can have one of two outcomes. If it completes successfully, the transaction is said to be
committed and the database reaches a new consistent state. On the other hand, if the transaction does not
execute successfully, the transaction is aborted. If a transaction is aborted, the database must be restored
to the consistent state it was in before the transaction started. Such a transaction is rolled back or
undone. A committed transaction cannot be aborted.
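As an illustration, a funds transfer written as a single SQL transaction either commits in its entirety or is
rolled back. A sketch assuming a hypothetical Account relation (the statement that starts a transaction,
for example START TRANSACTION or BEGIN, varies between DBMSs):

START TRANSACTION;
UPDATE Account SET balance = balance - 100 WHERE accountNo = 'A1';
UPDATE Account SET balance = balance + 100 WHERE accountNo = 'A2';
COMMIT;       -- both updates become permanent together

-- if anything goes wrong before COMMIT, issue instead:
-- ROLLBACK;  -- the database returns to its state before the transaction started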

Properties of Transactions
There are properties that all transactions should possess. The four basic or so-called ACID, properties of a
transaction are (Haerder and Reuter, 1983):
 Atomicity The ‘all or nothing’ property. A transaction is an indivisible unit that is either
performed in its entirety or is not performed at all. It is the responsibility of the recovery
subsystem of the DBMS to ensure atomicity.
 Consistency A transaction must transform the database from one consistent state to another
consistent state. It is the responsibility of both the DBMS and the application developers to ensure
consistency. The DBMS can ensure consistency by enforcing all the constraints that have been
specified on the database schema, such as integrity and enterprise constraints. However, in itself
this is insufficient to ensure consistency. For example, suppose we have a transaction that is
intended to transfer money from one bank account to another and the programmer makes an error
in the transaction logic and debits one account but credits the wrong account, then the database is
in an inconsistent state. However, the DBMS would not have been responsible for introducing
this inconsistency and would have had no ability to detect the error.
 Isolation Transactions execute independently of one another. In other words, the partial effects of
incomplete transactions should not be visible to other transactions. It is the responsibility of the
concurrency control subsystem to ensure isolation.
 Durability The effects of a successfully completed (committed) transaction are permanently
recorded in the database and must not be lost because of a subsequent failure. It is the
responsibility of the recovery subsystem to ensure durability.

Concurrency Control
Concurrency control is the process of managing simultaneous operations on the database without having
them interfere with one another.

The Need for Concurrency Control
A major objective in developing a database is to enable many users to access shared data concurrently.
Concurrent access is relatively easy if all users are only reading data, as there is no way that they can
interfere with one another. However, when two or more users are accessing the database simultaneously
and at least one is updating data, there may be interference that can result in inconsistencies.
However, although two transactions may be perfectly correct in themselves, the interleaving of operations
in this way may produce an incorrect result, thus compromising the integrity and consistency of the
database. We examine three examples of potential problems caused by concurrency: the lost update
problem, the uncommitted dependency problem, and the inconsistent analysis problem.

Lost Update
This occurs when an apparently successful update operation by one user is overridden by another user.
For example, suppose transactions T1 and T2 are executing concurrently and are accessing the same
account, whose initial balance is Ksh 100 (bal x); T1 is withdrawing Ksh 10 and T2 is depositing Ksh 100.
At the start, each reads the balance as Ksh 100. T2 increases bal x to Ksh 200 and stores the update in the
database. Meanwhile, T1 decrements its copy by Ksh 10 and stores this value, overwriting the previous
update and hence losing the Ksh 100 which was previously added.
The loss of T2's update is avoided by preventing T1 from reading the value of bal x until T2's update has
been completed, viz:
Time   T1                      T2
t1                             begin transaction
t2     begin transaction       read (bal x)
t3     read (bal x)            bal x = bal x + 100
t4     bal x = bal x - 10      write (bal x)
t5     write (bal x)           commit
t6     commit

The Uncommitted Dependency
This occurs when one transaction is allowed to see the intermediate results of another transaction before
it has been committed. For example, assume T3 and T4 operate on the same account as in the previous
example. T4 updates the balance to Ksh 200 but then aborts the transaction, so that the balance should
revert to Ksh 100. However, by this time T3 has read the new bal x of Ksh 200 and uses this value as the
basis of its Ksh 10 reduction, giving a new, incorrect bal x of Ksh 190 instead of 90.
This can be avoided by not allowing T3 to read the balance until it has been decided whether T4 is to be
committed or aborted, viz:
Time   T3                      T4
t1                             begin transaction
t2                             read (bal x)
t3                             bal x = bal x + 100
t4     begin transaction       write (bal x)
t5     read (bal x)
t6     bal x = bal x - 10      rollback
t7     write (bal x)
t8     commit

The Inconsistent Analysis Problem
This problem occurs when a transaction reads several values from the database while a second transaction
updates some of them during the execution of the first. For example, a transaction that is summarizing
data in the database will obtain inaccurate results if, while it is executing, other transactions are updating
the database. For instance, suppose T6 is totaling the balances of account x (Ksh 100), account y (Ksh 50)
and account z (Ksh 25). In the meantime, T5 transfers Ksh 10 from bal x to bal z, so that T6 obtains a
wrong result. This can be prevented by stopping T6 from reading bal x and bal z until T5 has completed
its updates, viz:
Time   T5                      T6
t1                             begin transaction
t2     begin transaction       sum = 0
t3     read (bal x)            read (bal x)
t4     bal x = bal x - 10      sum = sum + bal x
t5     write (bal x)           read (bal y)
t6     read (bal z)            sum = sum + bal y
t7     bal z = bal z + 10
t8     write (bal z)
t9     commit                  read (bal z)
t10                            sum = sum + bal z
t11                            commit

Serializability
This is a mechanism for identifying those executions of transactions that are guaranteed to ensure
consistency of the database when executed together. The commonly used terms are:
a. Schedule: This refers to a sequence of the operations of a set of concurrent transactions that
preserves the order of the operations within each of the individual transactions.
b. Serial schedule: This is a schedule where the operations of each transaction are executed
consecutively, without any interleaved operations from other transactions.
c. Non-serial schedule: This is a schedule where the operations from a set of concurrent
transactions are interleaved.
Recoverability
Serializability identifies schedules that maintain the consistency of the database, assuming that none of
the transactions in the schedule fails. An alternative perspective examines the recoverability of
transactions within a schedule. If a transaction fails, the atomicity property requires that we undo the
effects of the transaction. In addition, the durability property states that once a transaction commits, its
changes cannot be undone (without running another, compensating, transaction).

Concurrency control techniques
Serializability can be achieved in several ways. There are two main concurrency control techniques that
allow transactions to execute safely in parallel subject to certain constraints: locking and timestamp
methods.

Locking Methods
This is a procedure used to control concurrent access to data. When one transaction is accessing the
database, a lock may deny access to other transactions to prevent incorrect results. Locking methods are
the most widely used approach to ensure serializability of concurrent transactions. There are several
variations, but all share the same fundamental characteristic, namely that a transaction must claim a
shared (read) or exclusive (write) lock on a data item before the corresponding database read or write
operation. The lock prevents another transaction from modifying the item or even reading it, in the case
of an exclusive lock. Data items of various sizes, ranging from the entire database down to a field, may be
locked. The size of the item determines the fineness, or granularity, of the lock. The actual lock might be
implemented by setting a bit in the data item to indicate that portion of the database is locked, or by
keeping a list of locked parts of the database, or by other means.
Shared lock: If a transaction has a shared lock on a data item, it can read the item but not update it.
Exclusive lock: If a transaction has an exclusive lock on a data item, it can both read and update the item.
Since read operations cannot conflict, it is permissible for more than one transaction to hold shared locks
simultaneously on the same item. On the other hand, an exclusive lock gives a transaction exclusive
access to that item. Thus, as long as a transaction holds the exclusive lock on the item, no other
transactions can read or update that data item. Locks are used in the following way:
 Any transaction that needs to access a data item must first lock the item, requesting a shared lock
for read only access or an exclusive lock for both read and write access.
 If the item is not already locked by another transaction, the lock will be granted.
 If the item is currently locked, the DBMS determines whether the request is compatible with the
existing lock. If a shared lock is requested on an item that already has a shared lock on it, the
request will be granted; otherwise, the transaction must wait until the existing lock is released.
 A transaction continues to hold a lock until it explicitly releases it either during execution or
when it terminates (aborts or commits). It is only when the exclusive lock has been released that
the effects of the write operation will be made visible to other transactions.
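Many SQL DBMSs acquire these locks implicitly, and several also let a transaction request them
explicitly. A sketch using the widely supported SELECT ... FOR UPDATE syntax (details vary between
products, and the Account relation is assumed for illustration):

START TRANSACTION;
SELECT balance FROM Account
WHERE accountNo = 'A1'
FOR UPDATE;                   -- exclusive lock on the selected row
UPDATE Account SET balance = balance - 10 WHERE accountNo = 'A1';
COMMIT;                       -- the lock is released when the transaction terminates

A second transaction attempting to lock the same row would wait until the first terminates, which
prevents the lost update problem described earlier.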