Database Concepts EGS2207

Download as pdf or txt
Download as pdf or txt
You are on page 1of 46

Database Concepts

EGS 2207

Database Concepts EGS2207


The world is increasingly
driven by data…

This class teaches the


basics of how to use &
manage data.
Key Questions We Will Answer
• How can we collect and store large amounts of data?

• How can we efficiently query data?

• How can we safely update data?


tion 1 > Logistics

EXPECTATIONS
• Attend lectures
• If you don’t, it’s at your own peril

• Be active and think critically


• Ask questions

• Do programming and homework projects


• Start early and be honest NO COPYING

• Study for tests and exams


Introduction: databases
Data hierarchy Definition Example

Bit a 0 or a 1 0 or 1

Byte a group of related bits A, B, 1, 2

Field a group of related bytes name

Record a group of related fields name, idnum,


rate
Introduction: databases cont
 We live in an information society that values data.
 Efficient data management typically requires the use of a
computer database.
 The term database is normally used vaguely.
 Database is a collection of computerised information
available to many people for various uses.
What is a database system?
 A database is a large, integrated collection of data
 A database contains a model of something!
 A database management system (DBMS) is a software
system designed to store, manage and facilitate access
to the database

Database Concepts EGS2207


What does a database system do?
 Manages Very Large Amounts of Data
 Supports efficient access to Very Large Amounts of Data
 Supports concurrent access to Very Large Amounts of
Data
 Supports secure, atomic access to Very Large Amounts
of Data

Database Concepts EGS2207


Cont
 It has two important properties:
 It is integrated
 It is shared
 Integration allows reduction of data redundancy and
facilitates data access.
 Sharing allows all users in an organisation to have access to
the same data for use in a variety of ways.
Features of data
 It is well organised
 It is related
 It is accessible in different orders without great difficulty.
 It is stored only once
Operational data
 Data is a valuable resource that is valued and must be
secured.
 In any modern organisation a large amount of data is
generated about its operations . This data is sometimes
referred to as operational data.
 Operational data:
 Includes data that an organisation must necessarily maintain
about its operation
 Includes relationships linking the basic entities.
 Excludes temporary results or any transient information.
Characteristics of the db approach
 A database application is a program (or a set of related
programs ) that is used to perform a set of database activities
(create, read, update and delete) on behalf of database users.
 Self describing nature of the database.
 Insulation between programs and, and data abstraction.
 Support of multiple views of data.
 Sharing of data and multi user transaction processing.
Self describing nature
 The db system not only the database itself but also a
complete definition or description of the database. The
description of the data is known as the system catalog /data
dictionary and it contains such information as the structure
of each file , the type and storage format of each data item.
The information in the catalog is called meta-data.
 The catalog is used by the DBMS and database users who
need information about the structure of the database.
Insulation between programs and
data, and data abstraction
 In traditional file processing the structure of the files is
embedded in the access programs , so any changes to the
structure of a file may require changing all programs that
access the file while in the db approach the structure of data
files is stored in the DBMS catalog separately from access
programs.
 This is called program-data independence.
Support of multiple views
 A database typically has many users each of which may
require a different perspective or view of the database.
 A view may be a subset of the database or it may contain
virtual data that is derived from the database but is not
explicitly stored in the database
 Some users may not need be aware whether data that they
refer to is stored or derived.
Sharing of data and multi user
transaction processing
 The DBMS must allow users to access the db at the same
time.
 The DBMS must include concurrency control software to
make sure that the data in the db is updated in an orderly
manner.
The DBMS
 A collection of programs that manages the db structure and
controls access to the data stored in the db.
 Serves as an intermediary between the user and the
database.
 Reduces complexity when accessing data.
Advantages of DBMS
 Improved data sharing. The DBMS helps create an
environment in which end users have better access to more
and better-managed data.
 Improved data security. provides a framework for better
enforcement of data privacy and security policies.
 Better data integration. Wider access to well-managed data
promotes an integrated view of the organization’s operations
and a clearer view of the big picture.
cont
 Minimized data inconsistency. Data inconsistency exists
when different versions of the same data appear in different
places.
 Improved data access. The DBMS makes it possible to
produce quick answers to ad hoc queries.
 Improved decision making. Better-managed data and
improved data access make it possible to generate better-
quality information, on which better decisions are based.
 Increased end-user productivity. The availability of data,
combined with the tools that transform data into usable
information, empowers end users to make quick, informed
decisions.
Traditional file system vs database
system
 File based system :
 A collection of application programs that perform services for
the end users(eg reports).
 Each user defines and implements the files needed for each
specific application –there is redundancy in defining and
storing data which in turn results in wasted storage space.
Limitations of the file based
approach
 Separation and isolation of data:
 each program maintains its own set of data
 Users of one program may be unaware of potentially useful
data held by other programs.
 Duplication of data (redundancy):
 Same data is held by different programs
 Wasted space and potentially different values and/or different
formats for the same item-> leads to data inconsistency.
Limitations of the file based
approach 1
 Data dependency/application program dependency:
 File structure / format and records are defined in the program/
application code.
 Changes in formats must be reflected in the code
 Time consuming and error prone tasks.
 Incompatible file formats :
 Programs are written in different languages , and so can not
easily access each other’s file’s , rather files written in different
programming languages cannot be readily be combined or
compared.
Limitations of the file based
approach
 Separation and isolation of data:
 each program maintains its own set of data
 Users of one program may be unaware of potentially useful
data held by other programs.
 Duplication of data (redundancy):
 Same data is held by different programs
 Wasted space and potentially different values and/or different
formats for the same item-> leads to data inconsistency.
Disadvantages of using DBMS
 A database system generally provides online access to the
database for many users. In contrast, a conventional system
is often designed to meet a specific need and therefore
generally provides access to only a small number of users.
 Because of the larger number of users accessing the data
when a db is used, the enterprise may involve additional
risks as compared to conventional data processing system in
the following areas:
Disadvantages of using DBMS1
 Confidentiality , privacy and security.
 Data quality
 Data integrity
 Enterprise vulnerability may be higher
 The cost of using DBMS
Confidentiality ,privacy and
security
 When information is centralised and is made available to
users from remote locations , the possibilities of abuse are
often more than in conventional data processing system.
 To reduce the chances of unauthorised users accessing
sensitive information, it is necessary to take technical ,
administrative and possibly legal measures.
Data quality
 With increased number of users accessing data directly ,
there are enormous opportunities for users to damage the
data.
 Unless there are suitable controls, the data quality may be
compromised.
Data integrity
 The main threat to data integrity comes from several
different users attempting to update the data at the same
time.
 The database therefore needs to be protected from
inadvertent changes by the users.
Enterprise vulnerability
 Centralising all data of an enterprise in one database may
mean that the database becomes an indispensable resource.
 The survival of the enterprise may depend on reliable
information being available from its database.
 The enterprise therefore becomes vulnerable to the database
or to the destruction of the db or to unauthorised
modification of the db.
cost
 Additional Hardware
 Conversion
Summary-Potential benefits
of the db approach
 Program data independence
 Minimal data redundancy
 Improved data consistency
 Improved data sharing
 Reduced application development time
 Enforcement of standards
 Improved data quality
 Improved data accessibility and responsiveness
 Reduced program maintenance
Database system architecture

 It is common to describe databases in two ways


– The logical level:
 What users see, the program or query language interface, …
– The physical level:
 How files are organised, what indexing mechanisms are
used, …
 It is traditional to split the logical level into two: overall
database design (conceptual) and the views that
various users get to see
 A schema is a description of a database

Database Concepts EGS2207


Three-level architecture
External External … External
Schema 1 Schema 2 Schema n

External
level
Conceptual Conceptual
level Schema

Physical Internal
level Schema

Database Concepts EGS2207


Logical and physical data
independence
• Data independence is the ability to change the
schema at one level of the database system without
changing the schema at the next higher level
• Logical data independence is the capacity to
change the conceptual schema without changing the
user views
• Physical data independence is the capacity to
change the internal schema without having to change
the conceptual schema or user views

Database Concepts EGS2207


DBMS architecture 1
 The view of each of these levels is described by a scheme.
 A scheme is an outline or a plan that describes the records
and relationships existing in the views.
 A db schema is a description of the db and this is specified
during the db design and is not expected to change
frequently.
External view
The external view:
 It is the view that the individual user of the database has.
 This view is often a restricted view of the database and the
same database may provide a number of different views for
different classes of users.
 In general, the end users and even the application
programmers are only interested in a subset of the database.
 Users view of data
 Describes that part of the database that is relevant to a
particular user.
External view 2
External level 2:
 This is the level at which users interact with the system via
application programs, a host language or data sub language.
 The data definition language (DDL) and the data
manipulation language (DML)are the most common
interface tools used in this schema.
 different views may have different representations of the
same data. Eg user1 views dates as (day ,month ,year)
whereas user2 may view them as (year, month,day)
 Some views may include some derived or calculated data.
Conceptual level
 The conceptual view is a representation of the entire
information content of the database.
 This level describes what data is stored in the database and
the relationships among the data.
 This level contains the logical structure of entire database as
seen by the DBA.
 The conceptual schema hides the details of physical storage
structures and concentrates on describing entities , data
types , relationships, user operations and constraints.
 The schema can be regarded as being derived from a model
of the organisation.
Internal level
 It is a low level representation of the entire database. This
level describes how the data is stored in the database and the
access paths for the db.
 The internal view is the view about the actual physical
storage of data.
 It tells us what data is stored in the database and how.
 Storage allocation
 Access paths eg specification of the primary and secondary key
 Miscellaneous eg data compression and encryption techniques
Objectives of Three level
Architecture
 All users should be able to access same data.
 A user’s view is immune to changes in other views.
 Users should not know physical database storage details
 DBA should be able to change database storage structures
without affecting the user’s views.
 Internal structure of database should be unaffected by
changes to physical aspects of storage.
 DBA should be able to change conceptual structure of db
without affecting all users.
Views
 The view mechanism provides users with only the data they
want or need to use.
 A view allows each user to have his or her own view of the
database
 A view is essentially a subset of the database.
Benefits of views:
 Reduce complexity
 Provide a level of security
 Provide a mechanism to customise the appearance of the
database;
 Present a consistent , unchanging picture of the structure of
the database , even if the underlying database is changed.
Data independence
 The three level architecture can be used to explain the
concept of data independence.
 Which can be defined as the capacity to change the schema
at one level of a db system without having to change at the
next higher level.
 We can define two types of data independence:
 Logical data independence
 Physical data independence
Logical data independence
 It is the capacity to change the conceptual schema without
having to change the external schema or application
programs.
 The conceptual may be changed to expand the db eg by
adding a new record type or data item or to reduce the db eg
by removing a record type or data item.
Physical data independence
 It is the capacity to change the internal schema without
having to change the conceptual or external schema.
 Changes to the internal schema may be needed because
some files may have to be reorganised – by using different
file organisation, storage structures or creation of additional
access structures to improve the performance of retrieval or
update.
Data Models

 A collection of tools for describing


 Data
 Data relationships
 Data semantics
 Data constraints

 Relational model
 Entity-Relationship data model (mainly for database
design)
 Object-based data models (Object-oriented and Object-
relational)
 Semistructured data model (XML)
 Other older models:
 Network model
References

 Silberschatz, Korth and Sudarshan. Database System Concepts


 Elmasri, R.; Navathe, S.B. (2010). Fundamentals of Database Systems,
Sixth Edition

You might also like