Fundamentals of Database System Chapter 1 (1)
Fundamentals of Database System Chapter 1 (1)
Fundamentals of Database System Chapter 1 (1)
Chapter Contents:
• Functions of DBMS
➢ Data: Known facts that can be recorded and that have implicit meaning.
➢ Database: Logically coherent collection of data with inherent meaning.
➢ Mini-world or universe of discourse (UoD): Some part of the real world about which
data is stored in a database. For example, student grades and transcripts at a university.
➢ Database management system (DBMS)
✓ A database management system (DBMS) is a collection of programs that enables
you to store, modify, and extract information from a database. There are many
different types of database management systems, ranging from small systems that
run on personal computers to huge systems that run on mainframes.
✓ The DBMS provides users and programmers with a systematic way to create,
retrieve, update and manage data.
✓ The DBMS manages three important things: the data, the database engine that
allows data to be accessed, locked and modified -- and the database schema, which
defines the database’s logical structure. These three foundational elements help
provide concurrency, security, data integrity and uniform administration
procedures.
➢ Database System: The DBMS software together with the data itself. Sometimes, the
applications are also included.
Users/Programmers
Application Programs/Queries
Software to Process
Queries/Programs
Software to Access
Stored Data
1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach: In the manual approach, data storage and retrieval follow the primitive
and traditional way of information handling where cards and paper are used for the purpose.
✓ Files for as many event and objects as the organization has been used to store
information.
2 Prepared & compiled by: Muluken W.
Fundamentals of Database System Chapter One
✓ Each of the files containing various kinds of information is labelled and stored in one or
more cabinets.
✓ The cabinets could be kept in safe places for security purpose based on the sensitivity of
the information contained in it.
✓ Insertion and retrieval are done by searching first for the right cabinet then for the right
file then the information.
✓ One could have an indexing system to facilitate access to the data
✓ Prone to error
✓ Difficult to update, retrieve, integrate
✓ You have the data but it is difficult to compile the information
✓ Limited to small size information
✓ Cross referencing is difficult
2. Traditional File Based Approach: After the introduction of Computer for data processing to
the business community, the need to use the device for data storage and processing increase.
There were, and still are, several computer applications with file-based processing used for the
purpose of data handling.
✓ File based systems were an early attempt to computerize the manual filing system.
✓ This approach is the decentralized computerized data handling method.
✓ A collection of application programs performs services for the end-users. In such
systems, every application program that provides service to end users define and manage
its own data.
✓ Such systems have number of programs for each of the different applications in the
organization.
✓ Since every application defines and manages its own data, the system is subjected to
serious data duplication problem.
✓ File, in traditional file-based approach, is a collection of records which contains
logically related data.
The most significant problem experienced by the traditional file-based approach of data handling
is the “update anomalies”. We have three types of update anomalies;
1. Modification Anomalies: a problem experienced when one or more data value is modified on
one application program but not on others containing the same data set.
2. Deletion Anomalies: a problem encountered where one record set is deleted from one
application but remain untouched in other application programs.
3. Insertion Anomalies: a problem encountered where one cannot decide whether the data to be
inserted is valid and consistent with other similar data set.
3. Database Approach
Following a famous paper written by Ted Codd in 1970, database systems changed significantly.
Codd proposed that database systems should present the user with a view of data organized as
tables called relations. Behind the scenes, there might be a complex data structure that allowed
rapid response to a variety of queries. But, unlike the user of earlier database systems, the user of
a relational system would not be concerned with the storage structure. Queries could be
expressed in a very high-level language, which greatly increased the efficiency of database
programmers. The database approach emphasizes the integration and sharing of data throughout
the organization.
✓ Improved accessibility of data: by using structured query languages, the users can easily
access data without programming experience.
✓ Redundancy can be reduced: isolated data is integrated in database to decrease the
redundant data stored at different applications.
✓ Inconsistency can be avoided: controlled data redundancy will avoid inconsistency of the
data in the database to some extent.
✓ Transaction support can be provided: basic demands of any transaction support systems
are implanted in a full-scale DBMS.
✓ Improved decision support: the database will provide information useful for decision
making.
✓ Speed: data storage and retrieval are fast as it will be using the modern fast computer
systems.
✓ Less labor: unlike the other data handling methods, data maintenance will not demand
much resource.
✓ Centralized information control: Since relevant data in the organization will be stored at
one repository, it can be controlled and managed at the central level.
Notice that we can apply similar controls to the DBMS software. For example, only the DBA's
staff may be allowed to use certain privileged software, such as the software for creating new
accounts.
3. DBMS provide backup and recovery. A DBMS must provide facilities for recovering from
hardware or software failures. The backup and recovery subsystem of the DBMS is responsible
for recovery. For example, if the computer system fails in the middle of a complex update
transaction, the recovery subsystem is responsible for making sure that the database is restored to
the state it was in before the transaction started executing. Alternatively, the recovery subsystem
could ensure that the transaction is resumed from the point at which it was interrupted so that its
full effect is recorded in the database.
4. DBMS provide multiple user interfaces.
Because many types of users with varying levels of technical knowledge use a database, a
DBMS should provide a variety of user interfaces. These include query languages for casual
users, programming language interfaces for application programmers, forms and command codes
for parametric users, and menu-driven interfaces and natural language interfaces for stand-alone
users.
As a summery, in a File System, files are used to store data while, a database is a collection
of organized data. Although File System and databases are two ways of managing data,
databases clearly have many advantages over File Systems. Typically, when using a File System,
most tasks such as storage, retrieval and search are done manually (even though most operating
systems provide graphical interfaces to make these tasks easier) and it is quite tedious whereas
when using a database, the inbuilt DBMS will provide automated methods to complete these
tasks. Because of this reason, using a File System will lead to problems like data integrity, data
inconsistency and data security, but these problems could be avoided by using a database. Unlike
a File System, databases are efficient because reading line by line is not required, and certain
control mechanisms are in place.
Roles in Database Design and Use
As people are one of the components in DBMS environment, there are group of roles played by
different stakeholders of the designing and operation of a database system.
We can have further classifications of this role in big organizations having huge amount of data
and user requirement.
We have two distinctions of database designers, one involving in the logical and conceptual
design and another involving in physical design.
✓ Sees the database independent of any data model at conceptual level and consider
one specific data model at logical design phase.
➢ Physical DBD
✓ Take logical design specification as input and decide how it should be physically
realized.
✓ Map the logical data model on the specified DBMS with respect to tables and
integrity constraints. (DBMS dependent designing)
✓ Select specific storage structure and access path to the database
✓ Design security measures required on the database
3. Application Programmer and Systems Analyst
✓ System analyst determines the user requirement and how the user wants to view
the database.
✓ The application programmer implements these specifications as programs; code,
test, debug, document and maintain the application program.
✓ Determines the interface on how to retrieve, insert, update and delete data in the
database.
✓ The application could use any high-level programming language according to the
availability, the facility and the required service.
4. End Users
Workers, whose job requires accessing the database frequently for various purpose. There are
different group of users in this category.
➢ Naïve Users:
✓ Sizable proportion of users
✓ Unaware of the DBMS
✓ Only access the database based on their access level and demand
✓ Use standard and pre-specified types of queries.
➢ Sophisticated Users
✓ Are users familiar with the structure of the Database and facilities of the DBMS.
✓ Have complex requirements
✓ Have higher level queries
These users can be again classified as “Actors on the Scene” and “Workers Behind the Scene”.
Actors On the Scene:
✓ Data Administrator
✓ Database Administrator
✓ Database Designer
✓ End Users
✓ DBMS designers and implementers: who design and implement different DBMS
software.
✓ Tool Developers: experts who develop software packages that facilitates database
system designing and use. Prototype, simulation, code generator developers could be an
example. Independent software vendors could also be categorized in this group.
✓ Operators and Maintenance Personnel: system administrators who are responsible for
actually running and maintaining the hardware and software of the database system and
the information technology facilities.
The ANSI/SPARC architecture and framework have been widely applied in relational databases.
Information at the levels is represented by the internal, conceptual, and external schemas.
The goal of the three-schema architecture is to separate the user applications from the physical
database. The three-level ANSI-SPARC architecture has an important place in database
technology development because it clearly separates the users’ external level, the database’s
conceptual level, and the internal storage level for designing a database. In a DBMS based on the
three-schema architecture, each user group refers to its own external schema.
External Layer: The user’s view of the database contains data representations that may
represent higher-level abstractions than how data are actually stored in the database. An example
is an application that produces dynamic graphs based on numerical data from the database. The
user would see illustrations but the actual data would be numbers.
Conceptual Layer: The conceptual layer defines how the data is stored within the entire
database and provides a global view of data. Relationships between data are defined here as are
the conceptual models for database data. This layer is still independent of the hardware of
software.
Internal Layer: The internal layer defines how the database is represented, physically, on the
system. This includes partitioning, indexing, and OS-related accommodations that may be
required. This level involves the least degree of abstraction and is accessed by Database
Management System (DBMS) developers, OS developers, and etc.
Data independence in DBMS means the ability to change a schema to one level without
affecting the other schemas. It is when the schema is changed at some level, the schema at the
next higher level remains unchanged but only the mapping between the two levels is changed.
There are two types of data independence namely logical data independence and physical data
independence.
Logical data independence is the ability to change the conceptual schema without having to
change external schemas. A database administrator may change the conceptual schema to expand
the database. e.g. by adding a record type or data item. Only the view definition and the
mappings need to be changed in a DBMS that supports logical data independence. After the
conceptual schema undergoes a logical reorganization, application programs that reference the
external schema constructs must work as before.
Physical data independence is the ability to change the internal schema without having to
change the conceptual schema. Therefore, the external schemas need not be changed as well. It
allows users to frame queries in terms of the logical structure of the data, letting a query
processor automatically translate them into optimal plans that access physical storage structures.
For example, creating additional access structures to improve the performance of retrieval or
update in a database must not affect the users provided the same data as before remains in the
database.
In general, physical data independence exists in most databases and file environments where
physical details such as the exact location of data on disk, and hardware details of storage
encoding, placement, compression, splitting, merging of records, and so on are hidden from the
user.
Functionality of DBMS
Database Management System (DBMS) is a Software package used for providing EFFICIENT,
CONVENIENT and SAFE MULTI-USER (many people/programs accessing same database, or
even same data, simultaneously) storage of and access to MASSIVE amounts of PERSISTENT
(data outlives programs that operate on it) data. A DBMS also provides a systematic method for
creating, updating, storing, retrieving data in a database. DBMS also provides the service of
controlling data access, enforcing data integrity, managing concurrency control, and recovery.
Having this in mind, a full-scale DBMS should at least have the following services to provide to
the user.
A specific DBMS has its own specific Data Definition Language, but this type of language is too
low level to describe the data requirements of an organization in a way that is readily
understandable by a variety of users. We need a higher-level language. Such a higher-level is
called data-model.
Data Model: a set of concepts to describe the structure of a database, and certain constraints that
the database should obey. A data model is a description of the way that data is stored in a
database. Data model helps to understand the relationship between entities and to create the most
effective structure to hold data.
✓ Data
✓ Data relationships
✓ Data semantics
✓ Data constraints
The main purpose of Data Model is to represent the data in an understandable way. Categories of
data models include:
✓ Object-based
✓ Record-based
✓ Physical
❖ Hierarchical Model
❖ Network Model
❖ Relational Data Model
1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or segment
• The top node is the root node
• Nodes are arranged in a hierarchical structure as sort of upside-down tree
• A parent node can have more than one child node
• A child node can only have one parent node
• The relationship between parent and child is one-to-many
• Relation is established by creating physical link between stored records (each is stored
with a predefined access path to other records)
• To add new record type or relationship, the database must be redefined and then stored
in a new form.
✓ Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET
NEXT WITHIN PARENT etc.
✓ Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND
NEXT within set, GET etc.
✓ Programmers can do optimal navigation through the database.
The rows represent records (collections of information about separate items). The columns
represent fields (particular attributes of a record). Conducts searches by using data in specified
columns of one table to find additional data in another table. In conducting searches, a relational
database matches information from a field in one table with information in a corresponding field
of another table to produce a third table that combines requested data from both tables.
Database Languages
✓ Allows DBA or user to describe and name entitles, attributes and relationships required
for the application.
✓ Specification notation for defining the database schema
✓ Query Languages
✓ Forms Generators
✓ Report Generators
✓ Graphics Generators
✓ Application Generators
•Enterprise Information
•Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).
•Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.
•Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.
– If the database system is not able to handle the complexity of data because of
modeling limitations
– If the database users need special operations not supported by the DBMS.
Chapter 1: End