Fundamentals of Database System Chapter 1 (1)

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Fundamentals of Database System Chapter One

Chapter One: Introduction

Chapter Contents:

• Data Handling Approach

• Roles in Database Design & Development

• The ANSI-SPARC Architecture

• Functions of DBMS

• Data models and conceptual models

• Database Languages (DDL, DML, DCL)

Common Terms in a Database

➢ Data: Known facts that can be recorded and that have implicit meaning.
➢ Database: Logically coherent collection of data with inherent meaning.
➢ Mini-world or universe of discourse (UoD): Some part of the real world about which
data is stored in a database. For example, student grades and transcripts at a university.
➢ Database management system (DBMS)
✓ A database management system (DBMS) is a collection of programs that enables
you to store, modify, and extract information from a database. There are many
different types of database management systems, ranging from small systems that
run on personal computers to huge systems that run on mainframes.
✓ The DBMS provides users and programmers with a systematic way to create,
retrieve, update and manage data.
✓ The DBMS manages three important things: the data, the database engine that
allows data to be accessed, locked and modified -- and the database schema, which
defines the database’s logical structure. These three foundational elements help
provide concurrency, security, data integrity and uniform administration
procedures.
➢ Database System: The DBMS software together with the data itself. Sometimes, the
applications are also included.

1 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

Users/Programmers

Application Programs/Queries

Software to Process
Queries/Programs

Software to Access
Stored Data

Stored Database Stored


Definition
(Metadata) Database

Figure 1: The database system environment

Data Handling Approach


Data management passes through the different levels of development along with the
development in technology and services. These levels could best be described by categorizing
the levels into three levels of development. Even though there is an advantage and a problem
overcome at each new level, all methods of data handling are in use to some extent.

The major three levels are;

1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach: In the manual approach, data storage and retrieval follow the primitive
and traditional way of information handling where cards and paper are used for the purpose.

✓ Files for as many event and objects as the organization has been used to store
information.
2 Prepared & compiled by: Muluken W.
Fundamentals of Database System Chapter One

✓ Each of the files containing various kinds of information is labelled and stored in one or
more cabinets.
✓ The cabinets could be kept in safe places for security purpose based on the sensitivity of
the information contained in it.
✓ Insertion and retrieval are done by searching first for the right cabinet then for the right
file then the information.
✓ One could have an indexing system to facilitate access to the data

Limitations of the Manual approach

✓ Prone to error
✓ Difficult to update, retrieve, integrate
✓ You have the data but it is difficult to compile the information
✓ Limited to small size information
✓ Cross referencing is difficult

2. Traditional File Based Approach: After the introduction of Computer for data processing to
the business community, the need to use the device for data storage and processing increase.
There were, and still are, several computer applications with file-based processing used for the
purpose of data handling.

✓ File based systems were an early attempt to computerize the manual filing system.
✓ This approach is the decentralized computerized data handling method.
✓ A collection of application programs performs services for the end-users. In such
systems, every application program that provides service to end users define and manage
its own data.
✓ Such systems have number of programs for each of the different applications in the
organization.
✓ Since every application defines and manages its own data, the system is subjected to
serious data duplication problem.
✓ File, in traditional file-based approach, is a collection of records which contains
logically related data.

3 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

Limitations of the Traditional File Based approach:

✓ Separation or Isolation of Data: Available information in one application may not be


known.
✓ Limited data sharing
✓ Lengthy development and maintenance time
✓ Duplication or redundancy of data
✓ Data dependency on the application
✓ Incompatible file formats between different applications and programs creating
inconsistency.
✓ Fixed query processing which is defined during application development

The most significant problem experienced by the traditional file-based approach of data handling
is the “update anomalies”. We have three types of update anomalies;

1. Modification Anomalies: a problem experienced when one or more data value is modified on
one application program but not on others containing the same data set.

2. Deletion Anomalies: a problem encountered where one record set is deleted from one
application but remain untouched in other application programs.

3. Insertion Anomalies: a problem encountered where one cannot decide whether the data to be
inserted is valid and consistent with other similar data set.

3. Database Approach

Following a famous paper written by Ted Codd in 1970, database systems changed significantly.
Codd proposed that database systems should present the user with a view of data organized as
tables called relations. Behind the scenes, there might be a complex data structure that allowed
rapid response to a variety of queries. But, unlike the user of earlier database systems, the user of
a relational system would not be concerned with the storage structure. Queries could be
expressed in a very high-level language, which greatly increased the efficiency of database
programmers. The database approach emphasizes the integration and sharing of data throughout
the organization.

4 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

Thus, in Database Approach:

✓ Database is just a computerized record keeping system or a kind of electronic filing


cabinet.
✓ Database is a repository for collection of computerized data files.
✓ Database is a shared collection of logically related data designed to meet the
information needs of an organization. Since it is a shared corporate resource, the
database is integrated with minimum amount of or no duplication.
✓ Database is a collection of logically related data where these logically related data
comprise entities, attributes, relationships, and business rules of an organization's
information.
✓ In addition to containing data required by an organization, database also contains a
description of the data which called as “Metadata” or “Data Dictionary” or “Systems
Catalogue” or “Data about Data”.
✓ Since a database contains information about the data (metadata), it is called a self-
descriptive collection on integrated records.
✓ The purpose of a database is to store information and to allow users to retrieve and
update that information on demand.
✓ Database is deigned once and used simultaneously by many users.
✓ Unlike the traditional file-based approach in database approach there is program data
independence. That is the separation of the data definition from the application. Thus,
the application is not affected by changes made in the data structure and file
organization.
✓ Each database application will perform the combination of: Creating database, Reading,
Updating and Deleting data.

Main Characteristics of the Database Approach

✓ Improved accessibility of data: by using structured query languages, the users can easily
access data without programming experience.
✓ Redundancy can be reduced: isolated data is integrated in database to decrease the
redundant data stored at different applications.

5 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

✓ Inconsistency can be avoided: controlled data redundancy will avoid inconsistency of the
data in the database to some extent.
✓ Transaction support can be provided: basic demands of any transaction support systems
are implanted in a full-scale DBMS.
✓ Improved decision support: the database will provide information useful for decision
making.
✓ Speed: data storage and retrieval are fast as it will be using the modern fast computer
systems.
✓ Less labor: unlike the other data handling methods, data maintenance will not demand
much resource.
✓ Centralized information control: Since relevant data in the organization will be stored at
one repository, it can be controlled and managed at the central level.

Difference between File processing system and DBMS (Database):


1. Redundancy is control in DBMS, but not in file system
Redundancy is storing the same data multiple times and different place that leads to varies
problems. First, there is the need to perform a single logical update-such as entering data on a
new student- multiple times: once for each file where student data is recorded. This leads to
duplication of effort. Second, storage space is wasted when the same data is stored repeatedly,
and this problem may be serious for large databases. Third, files that represent the same data may
become inconsistent. This may happen because an update is applied to some of the files but not
to others. Even if an update-such as adding a new student-is applied to all the appropriate files,
the data concerning the student may still be inconsistent because the updates are applied
independently by each user group. For example, one user group may enter a student's birth date
erroneously as JAN-19-1984, whereas the other user groups may enter the correct value of JAN-
29-1984.
2. Unauthorized access is restricted in DBMS but not in file system.
When multiple users share a large database, it is likely that most users will not be authorized to
access all information in the database. Typically, users or user groups are given account numbers
protected by passwords, which they can use to gain access to the database. A DBMS should
provide a security and authorization subsystem, which the DBA uses to create accounts and to
specify account restrictions. The DBMS should then enforce these restrictions automatically.
6 Prepared & compiled by: Muluken W.
Fundamentals of Database System Chapter One

Notice that we can apply similar controls to the DBMS software. For example, only the DBA's
staff may be allowed to use certain privileged software, such as the software for creating new
accounts.
3. DBMS provide backup and recovery. A DBMS must provide facilities for recovering from
hardware or software failures. The backup and recovery subsystem of the DBMS is responsible
for recovery. For example, if the computer system fails in the middle of a complex update
transaction, the recovery subsystem is responsible for making sure that the database is restored to
the state it was in before the transaction started executing. Alternatively, the recovery subsystem
could ensure that the transaction is resumed from the point at which it was interrupted so that its
full effect is recorded in the database.
4. DBMS provide multiple user interfaces.
Because many types of users with varying levels of technical knowledge use a database, a
DBMS should provide a variety of user interfaces. These include query languages for casual
users, programming language interfaces for application programmers, forms and command codes
for parametric users, and menu-driven interfaces and natural language interfaces for stand-alone
users.
As a summery, in a File System, files are used to store data while, a database is a collection
of organized data. Although File System and databases are two ways of managing data,
databases clearly have many advantages over File Systems. Typically, when using a File System,
most tasks such as storage, retrieval and search are done manually (even though most operating
systems provide graphical interfaces to make these tasks easier) and it is quite tedious whereas
when using a database, the inbuilt DBMS will provide automated methods to complete these
tasks. Because of this reason, using a File System will lead to problems like data integrity, data
inconsistency and data security, but these problems could be avoided by using a database. Unlike
a File System, databases are efficient because reading line by line is not required, and certain
control mechanisms are in place.
Roles in Database Design and Use

As people are one of the components in DBMS environment, there are group of roles played by
different stakeholders of the designing and operation of a database system.

7 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

1. Database Administrator (DBA)


✓ Responsible to oversee, control and manage the database resources (the database
itself, the DBMS and other related software)
✓ Authorizing access to the database
✓ Coordinating and monitoring the use of the database
✓ Responsible for determining and acquiring hardware and software resources
✓ Accountable for problems like poor security, poor performance of the system
✓ Involves in all steps of database development

We can have further classifications of this role in big organizations having huge amount of data
and user requirement.

✓ Data Administrator (DA): is responsible on management of data resources. Involves in


database planning, development, maintenance of standards policies and procedures at the
conceptual and logical design phases.
✓ Database Administrator (DBA): is more technically oriented role. Responsible for the
physical realization of the database. Involves in physical design, implementation,
security and integrity control of the database.
2. Database Designer (DBD)
✓ Identifies the data to be stored and choose the appropriate structures to represent
and store the data.
✓ Should understand the user requirement and should choose how the user views
the database.
✓ Involve on the design phase before the implementation of the database system

We have two distinctions of database designers, one involving in the logical and conceptual
design and another involving in physical design.

➢ Logical and Conceptual DBD


✓ Identifies data (entity, attributes and relationship) relevant to the organization
✓ Identifies constraints on each data
✓ Understand data and business rules in the organization

8 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

✓ Sees the database independent of any data model at conceptual level and consider
one specific data model at logical design phase.
➢ Physical DBD
✓ Take logical design specification as input and decide how it should be physically
realized.
✓ Map the logical data model on the specified DBMS with respect to tables and
integrity constraints. (DBMS dependent designing)
✓ Select specific storage structure and access path to the database
✓ Design security measures required on the database
3. Application Programmer and Systems Analyst
✓ System analyst determines the user requirement and how the user wants to view
the database.
✓ The application programmer implements these specifications as programs; code,
test, debug, document and maintain the application program.
✓ Determines the interface on how to retrieve, insert, update and delete data in the
database.
✓ The application could use any high-level programming language according to the
availability, the facility and the required service.
4. End Users

Workers, whose job requires accessing the database frequently for various purpose. There are
different group of users in this category.

➢ Naïve Users:
✓ Sizable proportion of users
✓ Unaware of the DBMS
✓ Only access the database based on their access level and demand
✓ Use standard and pre-specified types of queries.
➢ Sophisticated Users
✓ Are users familiar with the structure of the Database and facilities of the DBMS.
✓ Have complex requirements
✓ Have higher level queries

9 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

✓ Are most of the time engineers, scientists, business analysts, etc


➢ Casual Users
✓ Users who access the database occasionally.
✓ Need different information from the database each time.
✓ Use sophisticated database queries to satisfy their needs.
✓ Are most of the time middle to high level managers.

These users can be again classified as “Actors on the Scene” and “Workers Behind the Scene”.
Actors On the Scene:

✓ Data Administrator
✓ Database Administrator
✓ Database Designer
✓ End Users

Workers Behind the Scene

✓ DBMS designers and implementers: who design and implement different DBMS
software.
✓ Tool Developers: experts who develop software packages that facilitates database
system designing and use. Prototype, simulation, code generator developers could be an
example. Independent software vendors could also be categorized in this group.
✓ Operators and Maintenance Personnel: system administrators who are responsible for
actually running and maintaining the hardware and software of the database system and
the information technology facilities.

The ANSI-SPARC Architecture


The American National Standards Institute (ANSI) Standards Planning and Requirements
Committee (SPARC) produced the ANSI-SPARC architecture in 1975 (ANSI, 1975). It is a
three-level schema architecture which have the physical, logical and external levels.

The ANSI/SPARC architecture and framework have been widely applied in relational databases.
Information at the levels is represented by the internal, conceptual, and external schemas.

10 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

The goal of the three-schema architecture is to separate the user applications from the physical
database. The three-level ANSI-SPARC architecture has an important place in database
technology development because it clearly separates the users’ external level, the database’s
conceptual level, and the internal storage level for designing a database. In a DBMS based on the
three-schema architecture, each user group refers to its own external schema.

External Layer: The user’s view of the database contains data representations that may
represent higher-level abstractions than how data are actually stored in the database. An example
is an application that produces dynamic graphs based on numerical data from the database. The
user would see illustrations but the actual data would be numbers.

Conceptual Layer: The conceptual layer defines how the data is stored within the entire
database and provides a global view of data. Relationships between data are defined here as are
the conceptual models for database data. This layer is still independent of the hardware of
software.

Internal Layer: The internal layer defines how the database is represented, physically, on the
system. This includes partitioning, indexing, and OS-related accommodations that may be
required. This level involves the least degree of abstraction and is accessed by Database
Management System (DBMS) developers, OS developers, and etc.

11 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

Data independence in DBMS means the ability to change a schema to one level without
affecting the other schemas. It is when the schema is changed at some level, the schema at the
next higher level remains unchanged but only the mapping between the two levels is changed.
There are two types of data independence namely logical data independence and physical data
independence.

Logical data independence is the ability to change the conceptual schema without having to
change external schemas. A database administrator may change the conceptual schema to expand
the database. e.g. by adding a record type or data item. Only the view definition and the
mappings need to be changed in a DBMS that supports logical data independence. After the
conceptual schema undergoes a logical reorganization, application programs that reference the
external schema constructs must work as before.

Physical data independence is the ability to change the internal schema without having to
change the conceptual schema. Therefore, the external schemas need not be changed as well. It
allows users to frame queries in terms of the logical structure of the data, letting a query
processor automatically translate them into optimal plans that access physical storage structures.
For example, creating additional access structures to improve the performance of retrieval or
update in a database must not affect the users provided the same data as before remains in the
database.

12 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

In general, physical data independence exists in most databases and file environments where
physical details such as the exact location of data on disk, and hardware details of storage
encoding, placement, compression, splitting, merging of records, and so on are hidden from the
user.

Functionality of DBMS

Database Management System (DBMS) is a Software package used for providing EFFICIENT,
CONVENIENT and SAFE MULTI-USER (many people/programs accessing same database, or
even same data, simultaneously) storage of and access to MASSIVE amounts of PERSISTENT
(data outlives programs that operate on it) data. A DBMS also provides a systematic method for
creating, updating, storing, retrieving data in a database. DBMS also provides the service of
controlling data access, enforcing data integrity, managing concurrency control, and recovery.
Having this in mind, a full-scale DBMS should at least have the following services to provide to
the user.

1. Data storage, retrieval and update in the database


2. A user accessible catalogue
3. Transaction support service: ALL or NONE transaction, which minimize data
inconsistency. 4. Concurrency Control Services: access and update on the database by
different users simultaneously should be implemented correctly.
5. Recovery Services: a mechanism for recovering the database after a failure must be
available. 6. Authorization Services (Security): must support the implementation of
access and authorization service to database administrator and users.
7. Support for Data Communication: should provide the facility to integrate with data
transfer software or data communication managers.
8. Integrity Services: rules about data and the change that took place on the data,
correctness and consistency of stored data, and quality of data based on business
constraints.
9. Services to promote data independency between the data and the application
10. Utility services: sets of utility service facilities like Importing data, Statistical
analysis support, Index reorganization and Garbage collection.

13 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

Data models and conceptual models

A specific DBMS has its own specific Data Definition Language, but this type of language is too
low level to describe the data requirements of an organization in a way that is readily
understandable by a variety of users. We need a higher-level language. Such a higher-level is
called data-model.

Data Model: a set of concepts to describe the structure of a database, and certain constraints that
the database should obey. A data model is a description of the way that data is stored in a
database. Data model helps to understand the relationship between entities and to create the most
effective structure to hold data.

Data Model is a collection of tools or concepts for describing

✓ Data
✓ Data relationships
✓ Data semantics
✓ Data constraints

The main purpose of Data Model is to represent the data in an understandable way. Categories of
data models include:

✓ Object-based
✓ Record-based
✓ Physical

Record-based Data Models

✓ Consist of a number of fixed format records.


✓ Each record type defines a fixed number of fields,
✓ Each field is typically of a fixed length.
- Relational Data Model
- Network Data Model
- Hierarchical Data Model

14 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

We have three major types of data models

❖ Hierarchical Model
❖ Network Model
❖ Relational Data Model
1. Hierarchical Model
• The simplest data model
• Record type is referred to as node or segment
• The top node is the root node
• Nodes are arranged in a hierarchical structure as sort of upside-down tree
• A parent node can have more than one child node
• A child node can only have one parent node
• The relationship between parent and child is one-to-many
• Relation is established by creating physical link between stored records (each is stored
with a predefined access path to other records)
• To add new record type or relationship, the database must be redefined and then stored
in a new form.

ADVANTAGES of Hierarchical Data Model:

✓ Hierarchical Model is simple to construct and operate on


✓ Corresponds to a number of natural hierarchically organized domains - e.g., assemblies
in manufacturing, personnel organization in companies

15 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

✓ Language is simple; uses constructs like GET, GET UNIQUE, GET NEXT, GET
NEXT WITHIN PARENT etc.

DISADVANTAGES of Hierarchical Data Model:

✓ Navigational and procedural nature of processing


✓ Database is visualized as a linear arrangement of records
✓ Little scope for "query optimization"
2. Network Model
• Allows record types to have more than one parent unlike hierarchical model
• A network data models sees records as set members
• Each set has an owner and one or more members
• Allow no many to many relationships between entities
• Like hierarchical model network model is a collection of physically linked records.
• Allow member records to have more than one owner

ADVANTAGES of Network Data Model:

✓ Network Model is able to model complex relationships and represents semantics of


add/delete on the relationships.
✓ Can handle most situations for modeling using record types and relationship types.

16 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

✓ Language is navigational; uses constructs like FIND, FIND member, FIND owner, FIND
NEXT within set, GET etc.
✓ Programmers can do optimal navigation through the database.

DISADVANTAGES of Network Data Model:

✓ Navigational and procedural nature of processing


✓ Database contains a complex array of pointers that thread through a set of records.
✓ Little scope for automated "query optimization”
3. Relational Data Model
• Developed by Dr. Edgar Frank Codd in 1970 (famous paper, 'A Relational Model for
Large Shared Data Banks')
• Terminologies originates from the branch of mathematics called set theory and relation
• Can define more flexible and complex relationship
• Viewed as a collection of tables called “Relations” equivalent to collection of record
types
• Relation: Two-dimensional table
• Stores information or data in the form of tables rows and columns
• A row of the table is called tuple equivalent to record
• A column of a table is called attribute equivalent to fields
• Data value is the value of the Attribute
• Records are related by the data stored jointly in the fields of records in two tables or
files. The related tables contain information that creates the relation
• The tables seem to be independent but are related somehow.
• No physical consideration of the storage is required by the user
• Many tables are merged together to come up with a new virtual view of the relationship

17 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

The rows represent records (collections of information about separate items). The columns
represent fields (particular attributes of a record). Conducts searches by using data in specified
columns of one table to find additional data in another table. In conducting searches, a relational
database matches information from a field in one table with information in a corresponding field
of another table to produce a third table that combines requested data from both tables.

Database Languages

Data Definition Language (DDL)

✓ Allows DBA or user to describe and name entitles, attributes and relationships required
for the application.
✓ Specification notation for defining the database schema

Data Manipulation Language (DML)

✓ Provides basic data manipulation operations on data held in the database.


✓ Language for accessing and manipulating the data organized by the appropriate data
model
✓ DML also known as query language
✓ Procedural DML: user specifies what data is required and how to get the data.
✓ Non-Procedural DML: user specifies what data is required but not how it is to be
retrieved.

SQL is the most widely used non-procedural language query language

Fourth Generation Language (4GL)

✓ Query Languages
✓ Forms Generators
✓ Report Generators
✓ Graphics Generators
✓ Application Generators

18 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

Limitations and risk of Database Approach

✓ Introduction of new professional and specialized personnel.


✓ Complexity in designing and managing data.
✓ The cost and risk during conversion from the old to the new system.
✓ High cost incurred to develop and maintain.
✓ Complex backup and recovery services from the user’s perspective.
✓ Reduced performance due to centralization.
✓ High impact on the system when failure occur.

Application Areas of Database System

Databases are widely used. Here are some representative applications:

•Enterprise Information

➢ Sales: For customer, product, and purchase information.


➢ Accounting: For payments, receipts, account balances, assets and other accounting
information.
➢ Human resources: For information about employees, salaries, payroll taxes, and
benefits, and for generation of pay checks.
➢ Manufacturing: For management of the supply chain and for tracking production
of items in factories, inventories of items in warehouses and stores, and orders for
items.
➢ Online retailers: For sales data noted above plus online order tracking, generation
of recommendation lists, and maintenance of online product evaluations.

•Banking and Finance

➢ Banking: For customer information, accounts, loans, and banking transactions.


➢ Credit card transactions: For purchases on credit cards and generation of monthly
statements.
➢ Finance: For storing information about holdings, sales, and purchases of financial
instruments such as stocks and bonds;

19 Prepared & compiled by: Muluken W.


Fundamentals of Database System Chapter One

•Universities: For student information, course registrations, and grades (in addition to standard
enterprise information such as human resources and accounting).

•Airlines: For reservations and schedule information. Airlines were among the first to use
databases in a geographically distributed manner.

•Telecommunication: For keeping records of calls made, generating monthly bills, maintaining
balances on prepaid calling cards, and storing information about the communication networks.

When not to use a DBMS

⚫ When a DBMS may be unnecessary:


– If the database and applications are simple, well defined, and not expected to
change.
– If there are stringent real-time requirements that may not be met because of
DBMS overhead.
– If access to data by multiple users is not required.

⚫ When no DBMS may suffice:

– If the database system is not able to handle the complexity of data because of
modeling limitations

– If the database users need special operations not supported by the DBMS.

Chapter 1: End

20 Prepared & compiled by: Muluken W.

You might also like