Unit 1
Unit 1
Unit 1
Systems(CS19443)
II YEAR- IV SEMESTER
SYLLABUS
UNIT-I INTRODUCTION TO DATABASE SYSTEMS 10
Introduction – Purpose of Database Systems - View of Data –Database Architecture - Relational Databases – Database Schema – Keys – Codd’s Rule –
Relational Algebra – Data Models – Entity Relationship Model – Constraints – Entity Relationship Diagram - Design Issues of ER Model – Extended ER
Features – Mapping ER Model to Relational Model.
UNIT-II SQL AND QUERY PROCESSING 10
SQL: Data Definition – Domain types – Structure of SQL Queries - Modifications of the database – Set Operations – Aggregate Functions – Null Values –
Nested Sub queries – Complex Queries – Views – Joined relations – Complex Queries – PL/SQL: Functions, Procedures, Triggers, Cursors -Embedded SQL –
Query Processing – Heuristics for Query Optimization .
UNIT-III DEPENDENCIES AND NORMAL FORMS 8
Motivation for Normal Forms – Functional dependencies – Armstrong’s Axioms for Functional Dependencies – Closure for a set of Functional
Dependencies – Definitions of 1NF-2NF-3NF and BCNF – Multivalued Dependency 4NF - Joint Dependency- 5NF.
UNIT-IV TRANSACTIONS 7
Transaction Concept – State – ACID Properties – Concurrency control - Serializability – Recoverability – Locking based protocols –Timestamp Based
Protocol - Deadlock handling.
2
Text Book(s):
1. Abraham Silberschatz, Henry F. Korth and S. Sudharshan, “Database System Concepts”, Seventh
Edition, Mc Graw Hill, March 2019.
2.P. J. Sadalage and M. Fowler, "NoSQL Distilled: A Brief Guide to the Emerging World of Polyglot
Persistence", Addison-Wesley Professional, 2013.
Reference Books(s):
1.“Ramez Elmasri and Shamkant B. Navathe, “Fundamentals of Database Systems”, Seventh
Edition, Pearson Education, 2016.
2.C.J.Date, A.Kannan and S.Swamynathan, “An Introduction to Database Systems”, Eighth
Edition, Pearson Education, 2006.
3.Atul Kahate, “Introduction to Database Management Systems”, Pearson Education, New
Delhi, 2006.
4.Steven Feuerstein with Bill Pribyl,”Oracle PL/SQL Programming”,sixth edition, Publisher:
O'Reill 2014.
5. MongoDB: The Definitive Guide, 3rd Edition,by Kristina Chodorow, Shannon
Bradshaw,Publisher: O'Reilly Media,2019
6.Cassandra: The Definitive Guide, 2nd Edition,by Eben Hewitt, Jeff Carpenter.Publisher:
O'Reilly Media, 2016.
3
Unit 1
Introduction to Database Management
Systems
Application
End-user
program DBMS
4
Basic Definitions
Data:-Raw facts and figures that can be recorded.
Information: Meaningful (processed) data is known as information.
Example-
10,971,108
Data
Chennai Population in 2020
Information
5
Basic Definitions(Cont.)
Database: Collection of inter-related data stored in
a secondary storage device, organized meaningfully
for a specific purpose.
◦ Database is a collection of data that contains
information relevant to an enterprise.
◦ Databases are designed to manage large bodies
of information.
6
Basic Definitions(Cont.)
DBMS: Database Management System is a collection of interrelated persistent
data and a set of program to access and manage those data.
◦ Management of data involves both defining structures for storage of
information and providing mechanisms for manipulation of information.
◦ The primary goal of a DBMS is to provide an environment to store, retrieve
and modify database information that is both convenient and efficient.
7
Database Applications Examples
oEnterprise Information
◦ Sales: customers, products, purchases
◦ Accounting: payments, receipts, assets
◦ Human Resources: Information about employees, salaries, payroll taxes.
oManufacturing: management of production, inventory, orders, supply chain.
oBanking and finance
◦ customer information, accounts, loans, and banking transactions.
◦ Credit card transactions
◦ Finance: sales and purchases of financial instruments (e.g., stocks and bonds;
storing real-time market data)
oUniversities: registration, grades
oAirlines: reservations, schedules
8
Database Applications Examples
(Cont.)
oTelecommunication: records of calls, texts, and data usage, generating monthly bills,
maintaining balances on prepaid calling cards
oWeb-based services
◦ Online retailers: order tracking, customized recommendations
◦ Online advertisements
oDocument databases
◦ A document database is a type of nonrelational database that is designed to store and
query data as JSON-like documents.
◦ Document databases make it easier for developers to store and query data in a database
by using the same document-model format they use in their application code.
oNavigation systems: For maintaining the locations of varies places of interest along with the
exact routes of roads, train systems, buses, etc.
oScientific: digital libraries, genomics, satellite imagery, physical sensors, simulation data
oPersonal: Music, photo, & video libraries ,Email archives , File contents (“desktop search”).
9
Disadvantages of Conventional File-Processing
System
•File system is a collection of data .
•In the traditional file approach, each application maintains its own master file and generally, has
its own set of transaction files.
•Files are custom-designed for each application and there is little sharing of data among the
various applications .
•Application programs are data-dependent. It is impossible to change the physical representation
or access techniques without affecting the application.
•This typical file processing system is supported by a conventional operating system. The system
stores permanent records in various files ,and it needs different application programs to extract
records/ information from the file system.
10
Purpose of DBMS / Disadvantages of
Conventional File-Processing System
Data redundancy and inconsistency
◦ In file processing, every user group maintains its own files for handling its data-processing applications.
Storing the same data multiple times is called data redundancy.
◦ This redundancy leads to several problems such as storage space is wasted; need to perform a single
logical operation like insertion, update and deletion more than one times which may lead to data
inconsistency problem.
◦ For Example: The address and phone number of a particular customer may appear in a file that consists
of personal information and in saving account records file also. This redundancy leads to data
consistency that is, the various copies of the same data may no longer agree.
11
Purpose of DBMS / Disadvantages of
Conventional File-Processing System (Cont.)
Difficulty in accessing data
◦ File processing environments do not allow needed data to be retrieved in a convenient and efficient
manner.
◦ For Example: Suppose that bank officer needs to find out the names of all customers who live within the
city‘s 411027 zip code. The bank officer has now two choices: Either get the list of customers and
extract the needed information manually or ask the data processing department to have a system
programmer write the necessary application program. Both alternatives are unsatisfactory.
Data isolation
◦ Because data are scattered in various files and files may be in different formats, writing new application
programs to retrieve the appropriate data is difficult.
12
Purpose of DBMS / Disadvantages of
Conventional File-Processing System (Cont.)
Integrity problems
◦ The data values stored in database must satisfy certain types of consistency constraints. Developers
enforce those constraints in the system by adding appropriate code in the various application programs.
When new constraints are added, it is difficult to change the programs to enforce them. The problem is
compounded when constraints involve several data items from different files.
◦ For Example: The balance of a bank account may never fall below a prescribed amount (say
₹5000).These constraints are enforced in the system by adding appropriate code in the various
application programs.
Atomicity problems
◦ Atomic means the transaction must happen in it’s entirely or not at all. It is difficult to ensure atomicity
in a conventional file-processing system.
◦ Failures may leave database in an inconsistent state with partial updates carried out
◦ Example: Transfer of funds from one account to another should either complete or not happen at all
13
Purpose of DBMS / Disadvantages of
Conventional File-Processing System (Cont.)
Concurrent-access anomalies
◦ Many systems allow multiple users to update the data simultaneously to provide overall performance
of the system and faster response.
◦ The result of the concurrent access may leave the account in an incorrect state.
◦ For Example: Consider bank account A, containing ₹5000. If two customers withdraw funds say ₹500
and ₹1000 respectively from account A at about the same time, the result of the concurrent executions
may leave the account in an incorrect (or inconsistent) state. Balance will be ₹4000 instead of ₹3500. To
protect against this possibility, the system must maintain some form of supervision.
Security problems
◦ Enforcing security constraints in an ad hoc manner to the file processing system is difficult. Not every
user of the database system should be able to access all the data.
◦ For Example: In a banking system, payroll personnel should be only given authority to see the part of
the database that has information about the various bank employees. They do not need access to
information about customer accounts
14
Advantages of a DBMS
•Improved security - Database security is the protection of the database from an unauthorized access.
• Improved data integrity- Database integrity refers to the validity and consistency of stored data.
Integrity is expressed in terms of constraints, which are consistency rules that the database is not
permitted to violate.
• Data consistency - If a data item is stored more than once and the system is aware of this, the system
can ensure that all copies of the item are kept consistent.
• Improved data accessibility and responsiveness - Many DBMSs provide query language that allows
users to enquire questions and to obtain the required information at their terminals, without any need
of separate software.
• Increased concurrency -Many DBMSs manage concurrent database access and ensure the data in the
database is consistent and valid.
• Improved backup and recovery services - Modern DBMSs provide facilities to minimize the amount of
processing that is lost following a failure.
15
Disadvantages of a DBMS
• Cost of DBMS -The cost of DBMS varies significantly, depending on the environment and
functionality provided
•Complexity and Size -The provision of the functionality makes DBMS an extremely complex
piece of software. Failure to understand the system can lead to bad design decisions.
• Higher impact of a failure -The centralization of resources increases the vulnerability of the
system. Since all users and applications rely on the availability of the DBMS, the failure of any
component can bring operations to a halt.
• Cost of conversion -The cost of converting existing applications to run on the new DBMS and
hardware includes the cost of training staff to use these new systems and running of the system.
•Performance -The DBMS is written to be more general in order to support applications in all
domains. The effect is that some applications may not run as they used to.
16
Functions of DBMS
17
Levels of Abstraction
A major purpose of a database system is to provide users with an abstract view of the data.
A database system must retrieve data efficiently.
The need for efficiency has led designers to use complex data structures to represent data in the
database.
Since many users are not computer trained, developers hide the complexity from users through
several levels of abstraction to simplify users’ interactions with the system.
Three levels abstraction of a database system
◦ Physical / Internal Level: The lowest level of abstraction
◦ Logical Level: The next-higher level of abstraction
◦ View Level: The highest level of abstraction
18
View of Data
A three-level architecture for a database system
19
Three-levels architecture
•Physical / Internal Level: The lowest level of abstraction describes how
the data are stored. This level describes complex low-level data
structures in details.
•Logical Level: The next-higher level of abstraction describes what data
are stored in the database and what relationships exist among those
data, although implementation of the simple structures at the logical
level does not need to be aware of this complexity.
Database administrators, who must decide what information to keep in
the database, use the logical level of abstraction.
•View Level: The highest level of abstraction describes only part of the
entire database. Even though the logical level uses simpler structures,
complexity remains because of the variety of information stored in a
large database.
20
Instances and Schemas
Like types and variables in programming languages
21
Data Independence
•The ability to modify a schema definition in one level without affecting a schema definition in
the next higher level is called data independence. There are two levels of data independence:
1. Physical data independence is the ability to modify the physical schema without causing
application programs to be rewritten.
◦ Modifications at the physical level are occasionally necessary in order to improve performance.
2. Logical data independence is the ability to modify the conceptual schema without causing
application programs to be rewritten.
◦ Modifications at the conceptual level are necessary whenever the logical structure of the database is
altered.
22
Data Models
•Data Model is a collection of tools for describing
◦ Data
◦ Data relationships
◦ Data semantics
◦ Data constraints
•Types:
1. Relational Model
2. The Entity-Relationship Model
3. Object-Based Data Model
4. Semi-Structured Data Model (XML)
5. Network Data Model
6. Hierarchical Data Model
23
Relational Model
▪The relational model uses a collection of tables to
represent both data and the relationships among those
data.
▪It is based on the concept of mathematical relations.
▪Each table corresponds to an entity and each row
represents an instance of that entity.
▪Table are also called relations, related to each other
through the sharing of a common entity characteristic.
▪The relational data model is widely used data model
and a vast majority of current database systems are
based on the relational model e.g., Relational DBMS -
DB2, MS SQL Server.
24
Entity-Relational Model
•The ER Model is based on two components namely entity
and relationships.
•An entity is a collection of basic real time objects and an
entity is described by a set of attributes that describes
associations among data.
•A relationship describes associations among these objects/
data. There are three types of relationships exist such as
One-to-One, One-to-Many and Many-to-Many.
25
Object-Based Data Model
▪In the object-oriented data model (OODM) both data and their relationships are contained in a
single structure known as an object.
▪The object-oriented data model can be seen extending the ER model with notions of
encapsulation, methods and object identity. It combines features of the object-oriented data
model and relational data model. The OODM is said to be a semantic data model because
semantic indicates meaning.
▪The OODM is based on the following components:
◦ An object is an abstraction of a real-world entity.
◦ Attributes describe the properties of an object.
◦ A class is a collection of similar objects with shared structure and behaviour. Classes are organized in a
class hierarchy.
◦ Inheritance is the ability of an object within the class hierarchy.
26
Hierarchical Model
• The hierarchical data model organizes data in a tree
structure.
•There is a hierarchy of parent and child data segments.
•This structure implies that a record can have repeating
information, generally in the child data segments.
•Data is represented by a collection of records (record types).
•A record type is the equivalent of a table in the relational
model, and with the individual records being the equivalent
of rows.
•To create links between these record types, the hierarchical
model uses parent -child relationships.
27
Network Model
•The network model permitted the modeling of many-to-many
relationships in data.
•In 1971, the Conference on Data Systems Languages
(CODASYL) formally defined the network model.
• Data in the network model is represented by a collection of
records and the relationships among data are represented by
links (pointers).
•The records in the database are organized as collections of
graphs. Example: IDMS.
•IDMS, short for Integrated Database Management System, is
primarily a network model database management system for
mainframes.
28
Database Design
•The process of designing the general structure of the database:
•Logical Design – Deciding on the database schema. Database design requires
that we find a “good” collection of relation schemas.
• Business decision – What attributes should we record in the database?
• Computer Science decision – What relation schemas should we have and
how should the attributes be distributed among the various relation
schemas?
•Physical Design – Deciding on the physical layout of the database
29
Database Engine
•A database system is partitioned into modules that deal with each of the
responsibilities of the overall system.
•The functional components of a database system can be divided into
• The storage manager,
30
Storage Manager
•A program module that provides the interface between the low-level data stored in the database and the
application programs and queries submitted to the system.
•The storage manager is responsible to the following tasks:
• Interaction with the OS file manager
• Efficient storing, retrieving and updating of data
•The storage manager components include:
• Authorization and integrity manager- It tests for satisfaction of various integrity constraints and checks the
authority of users accessing the data.
• Transaction manager- It ensures that the database remains in a consistent state despite system failures,
and concurrent executions proceed without conflicting.
• File manager- It manages the allocation of space on disk storage and the data structures used to represent
information stored on disk.
• Buffer manager- It is responsible for fetching data from disk storage into main memory and to decide what
data to cache in main memory. It enables the database to handle data sizes that are much larger than the
size of the main memory.
31
Storage Manager(Cont.)
•The storage manager implements several data structures as
part of the physical system implementation:
◦ Data files -- store the database itself
◦ Data dictionary -- stores metadata about the structure of the
database, in particular the schema of the database. A database
system consults the data dictionary before reading and
modifying actual data.
◦ Indices -- can provide fast access to data items. A database
index provides pointers to those data items that hold a particular
value.
32
Query Processor
▪The major component of a DBMS is Query Processor that
transforms queries into a series of low-level instructions.
▪It helps the database system to simplify and facilitate access
to data.
▪The query processor components include:
▪ DDL interpreter -- interprets DDL statements and
records the definitions in the data dictionary.
▪ DML compiler -- translates DML statements in a query
language into an evaluation plan consisting of low-level
instructions that the query evaluation engine
understands.
▪ The DML compiler performs query optimization; that
is, it picks the lowest cost evaluation plan from
among the various alternatives.
▪ Query evaluation engine -- executes low-level
instructions generated by the DML compiler.
33
Transaction Management
•A transaction is a collection of operations that performs a single logical function in a database
application
•Transaction-management component ensures that the database remains in a consistent
(correct) state despite system failures (e.g., power failures and operating system crashes) and
transaction failures.
•Concurrency-control manager controls the interaction among the concurrent transactions, to
ensure the consistency of the database.
34
Database Architecture
(Centralized/Shared-Memory)
35
Database Users
•There are four different types of database-system users, differentiated by the way they expect to
interact with the system.
• Different types of user interfaces have been designed for the different types of users.
•Naive users: Naive users interact with the system by invoking one of the application programs
that have been written previously.
• Naive users are typical users of form interface, where the user can fill in appropriate fields of the form.
• Naive users may also simply read reports generated from the database
36
Database Users(Cont.)
•Sophisticated users: Sophisticated users interact with the system without writing programs.
Instead, they form their requests in a database query language.
• They submit each such query to a query processor that the storage manager understands.
• Online analytical processing (OLAP) tools simplify analysis and data mining tools specify certain kinds of
patterns in data
•Specialized users: Specialized users are sophisticated users who write specialized database
applications that do not fit into the traditional data-processing framework.
• The applications are computer-aided design systems, knowledge base and expert systems, systems that
store data with complex data types .
37
Database Administrator
•A person who has central control over the system is called a database administrator (DBA).
Functions of a DBA include:
• Schema definition
• Storage structure and access-method definition
• Schema and physical-organization modification
• Granting of authorization for data access
• Routine maintenance
• Periodically backing up the database
• Ensuring that enough free disk space is available for normal operations, and upgrading disk space as
required
• Monitoring jobs running on the database
38
Users at different level of abstractions
39
Introduction to Relational Model
Example of an Instructor Relation attributes
(or columns)
tuples
(or rows)
Relation Schema and Instance
A1, A2, …, An are attributes
R = (A1, A2, …, An ) is a relation schema
Example:
Instructor = (ID, name, dept_name, salary)
A relation instance r defined over schema R is denoted by r (R).
The current values a relation are specified by a table
An element t of relation r is called a tuple and is represented by a
row in a table
Attributes
•The set of allowed values for each attribute is called the domain of the attribute
•Attribute values are required to be atomic; that is, indivisible
•The special value null is a member of every domain. Indicated that the value is
“unknown”
Relations are Unordered
Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
Example: instructor relation with unordered tuples
Database Schema
Database schema -- is the logical structure of the database.
Database instance -- is a snapshot of the data in the database at a given instant in
time.
Example:
◦ schema:
◦ instructor (ID, name, dept_name, salary)
◦ Instance:
Keys
•Let K ⊆ R , K is a super key of R if values for K are sufficient to identify a
unique tuple of each possible relation r(R)
◦ Example: {ID} and {ID , name} are both super keys of instructor.
•Super key K is a candidate key if K is minimal
Example: {ID} is a candidate key for Instructor
•A candidate key is a set of one or more attributes that can uniquely identify
a row in a given table.
•There can be more than one candidate keys in a table.
•Candidate keys are identified during the design phase
•While creating the table, the database designer chooses one candidate key
from amongst the several available, to serve as a primary key
•It is preferred to select a candidate key with a minimal number of attributes
to function as a primary key
Keys(Cont.)
One of the candidate keys is selected to be the Primary Key.
◦ Which one?
46
Foreign key
Foreign key constraint: If an attribute can only take the values which are present as values of some
other attribute, it will be foreign key to the attribute to which it refers.
The relation which is being referenced is called referenced relation and corresponding attribute
is called referenced attribute and the relation which refers to referenced relation is called
referencing relation and corresponding attribute is called referencing attribute.
Referenced attribute of referencing attribute should be primary key.
47
Foreign key(Cont.)
STUD_NO in STUDENT_COURSE is a foreign key to STUD_NO
in STUDENT relation.
A foreign key is a set of attributes of a table, the values of
which are required to match values of some candidate key in
the same or another table.
The constraint that values of a given foreign key must match
the values of the corresponding candidate key is known as
referential constraint.
A table which has a foreign key referring to its own candidate
key is known as self-referencing table
48
OVERVIEW OF CODD’s RULE
•Any database which simply has relational data model is not a relational database system
(RDBMS).
•There are certain rules for a database to be perfect RDBMS. These rules are developed by Dr.
Edgar F Codd in 1985 to define a perfect RDBMS. For a RDBMS to be a perfect RDBMS, it must
follow his rules.
•EF Codd has developed 13 rules for a database to be a RDBMS.
•According to him, all these rule help to have perfect RDBMS and hence correct data and relation
among the objects in database.
•But none of the database follows all these rules; but obeys to some extent.
•For example, Oracle supports 11.5 of E.F. CODD Rule. The .5 missing was according to E.F. CODD
Rule DML operation through a complex view is possible. But in oracle DML operation through a
complex view is not always.
49
Codd’s Rule 0
▪This is the foundational Rule.
▪This rule states that any database system should be relational in nature and must have a
management system to be RDBMS.
▪That means a database should be a relational by having the relation / mapping among the tables
in the database. They have to be related to one another by means of constraints/ relation. There
should not be any independent tables hanging in the database.
▪RDBMS is management system – that means it should be able to manage the data, relation,
retrieval, update, delete, permission on the objects. It should be able handle all these
administrative tasks without affecting the objectives of database. It should be performing all
these tasks by using query languages.
50
Codd’s Rule 1
▪Information Rule
All information in the database is to be represented in one and only one way.
Data must be stored in a table in the form of rows and columns.
The order of rows and columns in the table should not affect the meaning of the table.
Each cell should have single data.
There should not be any group/range of values separated by comma, space or hyphen (Normalized data).
▪This rule is satisfied by all the databases.
For example:
Order of storing personal details about ‘Ram’ and ‘Hari’ in PERSON table should not have any difference. There
should be flexibility of storing them in any order in a row. Similarly, storing Person name first and then his
address should be same as storing address and then his name. It does not make any difference on the meaning of
table.
51
Codd’s Rule 2
Guaranteed access Rule.
▪Each unique piece of data should be accessible by,
table name + primary key(row) +
attribute(column).
▪All data are uniquely identified and accessible via
this identity.
▪When combination of these 3 is used, it should
give the correct result.
▪ Any column/ cell value should not be directly
accessed without specifying the table and primary
key.
▪Most RDBMS do not make the definition of the Address of Kathy
primary key mandatory and are deficient to that STUDENT + STUDENT_ID (100) + ADDRESS is the
extent . right way of getting any cell value.
52
Codd’s Rule 3
Systematic Treatment of NULL
▪This rule states about handling the NULLs in
the database.
▪ As database consists of various types of data,
each cell will have different datatypes. If any
of the cell value is unknown, or not applicable
or missing, it cannot be represented as zero or
empty.
▪It will be always represented as NULL. This
NULL should be acting irrespective of the
datatype used for the cell. When used in
logical or arithmetical operation, it should
result the value correctly.
53
Codd’s Rule 4
Active Online Catalog Example:
▪This rule illustrates data dictionary. ▪SELECT * FROM ALL_TAB;
▪Metadata should be maintained for all the ALL_TAB is the table which has the table
data in the database. definitions that the user owns and has access.
▪These metadata stored in the data dictionary
should also obey all the characteristics of a
database.
▪ We should be able to access these metadata
by using same query language that we use to
access the database.
54
Codd’s Rule 5
Comprehensive Data Sub-Language Rule Example:
▪Any RDBMS database should not be directly ▪SQL is a structured query language which
accessed. support creating tables / views/
constraints/indexes, accessing the records of
▪It should always be accessed by using some tables/views (SELECT), manipulating the
strong query language. records by insert/delete/update, provides
▪This query language should be able to access security by giving different level of access
the data, manipulate the data and maintain rights (GRANT and REVOKE) and integrity and
the consistency , atomicity ,and integrity of consistency by using constraints.
the database.
55
Codd’s Rule 6
View Updating Rule
▪Views are the virtual tables created by using
queries to show the partial view of the table.
Example:
▪ That is views are subset of table, it is only
▪ If a view is formed as join of 3 tables, changes
partial table with few rows and columns.
to view should be reflected in base tables.
▪This rule states that views are also be able to
get updated as we do with its table.
56
Codd’s Rule 7
High-level insert, update, and delete Example:
▪This rule states that insert, update, and delete ▪Suppose employees got 5% hike in a year.
operations should be supported for any Then their salary has to be updated to reflect
retrievable set rather than just for a single row the new salary. Since this is the annual hike
in a single table. given to the employees, this increment is
applicable for all the employees. Hence, the
▪It also perform the operation on multiple row query should not be written for updating the
simultaneously . salary one by one for thousands of employee.
▪There must be delete, updating and insertion A single query should be strong enough to
at each level of operation. Set operation like update the entire employee’s salary at a time.
union, all union , insertion and minus should
also support.
57
Codd’s Rule 8
Physical Data Independence EXAMPLE:
▪The ability to change the physical schema ▪A change to the internal schema, such as using
without changing the logical schema is called different file organization or storage
physical data independence. structures, storage devices, or indexing
strategy, should be possible without having to
change the conceptual or external schemas.
▪This is saying that users shouldn’t be
concerned about how the data is stored or
how it’s accessed. In fact, users of the data
need only be able to get the basic definition of
the data they need.
58
Codd’s Rule 9
Logical Data Independence EXAMPLE:
The ability to change the logical (conceptual) The addition or removal of new entities,
schema without changing the External schema attributes, or relationships to the conceptual
(User View) is called logical data schema should be possible without having to
independence. change existing external schemas or having to
rewrite existing application programs.
59
Codd’s Rule 10
Integrity Independence Example:
Database should apply integrity rules by using Suppose we want to insert an employee for
its query languages. department 50 using an application.
It should not be dependent on any external But department 50 does not exists in the
factor or application to maintain the integrity. system.
The keys and constraints in the database In such case, the application should not
should be strong enough to handle the perform the task of fetching if department 50
integrity. exists, if not insert the department and then
inserting the employee.
A good RDBMS should be independent of the
frontend application. It should at least support It should all handled by the database.
primary key and foreign key integrity
constraints.
60
Codd’s Rule 11
Distribution Independence
▪The database can be located at the user server or at any other network.
▪The end user should not be able to know about the database servers.
▪He should be able to get the records as if he is pulling the records locally.
▪Even if the database is located in different servers, the accessibility time should be
comparatively less.
61
Codd’s Rule 12
Non-Subversion Rule Example:
When a query is fired in the database, it will be Update Student’s address query should always be
converted into low level language so that it can be converted into low level language which updates
understood by the underlying systems to retrieve the address record in the student file in the
the data. memory.
In such case, when accessing or manipulating the It should not be updating any other record in the
records at low level language, there should not be file nor inserting some malicious record into the
any loopholes that alter the integrity of the file/memory.
database.
If low level access is allowed to a system, it should
not be able to subvert or bypass integrity rules to
change the data.
This can be achieved by some sort of looking or
encryption.
62
RELATIONAL QUERY LANGUAGES
•It is the language by which user
communicates with the database.
•These relational query languages can
be procedural or non-procedural.
•Relational query languages use
relational algebra to break the user
requests and instruct the DBMS to
execute the requests.
63
RELATIONAL QUERY LANGUAGES(Cont.)
Procedural Query Language Non-Procedural Query Language
In procedural languages, the program code is In the non-procedural languages, the user has
written as a sequence of instructions. to specify only “what to do” and not “how to
do”.
User has to specify “what to do” and also
“how to do” (step by step procedure). It is also known as an applicative or functional
language.
These instructions are executed in the
sequential order. These instructions are It involves the development of the functions
written to solve specific problems. from other functions to construct more
complex functions.
Examples : Relational algebra , FORTRAN,
COBOL, ALGOL, BASIC, C and Pascal. Examples : SQL, PROLOG, LISP.
64
Relational Algebra
A procedural language consisting of a set of operations that take
one or two relations as input and produce a new relation as their
result.
Six basic operators
◦ select: σ
◦ project: ∏
◦ union: ∪
◦ set difference: –
◦ Cartesian product: x
◦ rename: ρ
Select Operation
The select operation selects tuples that satisfy a given predicate.
Notation: σ p (r)
p is called the selection predicate
Example: select those tuples of the instructor relation where the instructor is in
the “Physics” department.
◦ Query
σ dept_name=“Physics” (instructor)
◦ Result
Select Operation (Cont.)
We can use comparison operators like =, ≠, >, ≥. <. ≤ in the
selection predicate.
We can combine several predicates into a larger predicate by using
the connectives:
∧ (and), ∨ (or), ¬ (not)
Example: Find the instructors in Physics with a salary greater
$90,000, we write:
It creates the subset of relation based on the conditions specified. Here, it selects only
selected columns/attributes from the relation- vertical subset of relation
Notation:
The result is defined as the relation of k columns obtained by erasing the columns that are
not listed
Duplicate rows removed from result, since relations are sets
Project Operation Example
72
instructor X teaches
Join Operation
•The Cartesian-Product
instructor X teaches
associates every tuple of instructor with every tuple of teaches.
◦ Most of the resulting rows have information about instructors who did NOT teach a
particular course.
To get only those tuples of “instructor X teaches “ that pertain to instructors and the courses
that they taught, we write:
σ instructor.id = teaches.id
(instructor x teaches ))
◦ We get only those tuples of “instructor X teaches” that pertain to instructors and the
courses that they taught.
The result of this expression, shown in the next slide
Join Operation (Cont.)
σ instructor.id = teaches.id
(instructor x teaches))
Join Operation (Cont.)
Union Operation
The union operation allows us to combine two relations
Notation: r ∪ s
For r ∪ s to be valid.
1. r, s must have the same arity (same number of
attributes)
2. The attribute domains must be compatible (example:
2nd
column of r deals with the same type of values as does the
2nd column of s)
Result of:
∏course_id (σ semester=“Fall” Λ year=2017
(section)) ∪
∏course_id (σ semester=“Spring” Λ year=2018
(section))
Set-Intersection Operation
The set-intersection operation allows us to find tuples that
are in both the input relations.
Notation: r ∩ s
Assume:
◦ r, s have the same arity
◦ attributes of r and s are compatible
Example: Find the set of all courses taught in both the Fall
2017 and the Spring 2018 semesters.
◦ Result
Set-Difference Operation
The set-difference operation allows us to find tuples that
are in one relation but are not in another.
Notation r – s
Set differences must be taken between compatible
relations.
◦ r and s must have the same arity
◦ attribute domains of r and s must be compatible
◦ Result
The Assignment Operation
▪It is convenient at times to write a relational-algebra expression by assigning parts
of it to temporary relation variables.
▪The assignment operation is denoted by ← and works like assignment in a
programming language.
▪Example: Find all instructor in the “Physics” and Music department.
ρx (E)
returns the result of expression E under the name x
Another form of the rename operation:
Query 2
The two queries are not identical; they are, however, equivalent --
they give the same result on any database.
Equivalent Queries
Design Phases
Initial phase -- characterize fully the data needs of the prospective database users.
Second phase -- choosing a data model
◦ Applying the concepts of the chosen data model
◦ Translating these requirements into a conceptual schema of the database.
◦ A fully developed conceptual schema indicates the functional requirements of the
enterprise.
◦ Describe the kinds of operations (or transactions) that will be performed on the data.
▪Final Phase -- Moving from an abstract data model to the implementation of the database
◦ Logical Design – Deciding on the database schema.
◦ Database design requires that we find a “good” collection of relation schemas.
▪ Business decision – What attributes should we record in the database?
▪ Computer Science decision – What relation schemas should we have and how should
the attributes be distributed among the various relation schemas?
◦ Physical Design – Deciding on the physical layout of the database
Design Alternatives
▪In designing a database schema, we must ensure that we avoid two major pitfalls:
• Redundancy: a bad design may result in repeat information.
▪ Redundant representation of information may lead to data inconsistency
among the various copies of information
• Incompleteness: a bad design may make certain aspects of the enterprise difficult
or impossible to model.
▪Avoiding bad designs is not enough. There may be a large number of good designs
from which we must choose.
ER model -- Database Modeling
The ER data mode was developed to facilitate database design by allowing
specification of an enterprise schema that represents the overall logical structure of
a database.
The ER data model employs three basic concepts:
◦ entity sets,
◦ relationship sets,
◦ attributes.
The ER model also has an associated diagrammatic representation, the ER diagram,
which can express the overall logical structure of a database graphically.
Entity Sets
An entity is an object that exists and is distinguishable from other objects.
◦ Example: specific person, company, event, plant
An entity set is a set of entities of the same type that share the same properties.
◦ Example: set of all persons, companies, trees, holidays
A subset of the attributes form a primary key of the entity set; i.e., uniquely
identifying each member of the set.
Entity Sets -- instructor and student
Ternary Relationship
A ternary relationship exists when there are three entities associated. The
entities teacher, subject and student are related using a ternary relationship
98
Attributes
•An attribute of an entity set is a function that maps from the entity set
into a domain.
•For each attribute, there is a set of permitted values called ‘domain’ or
‘value set’ of that attribute.
•An attribute can be characterized by the following attribute types,
1. Simple and composite attributes.
2. Single valued and multi-valued attributes.
3. Derived attribute.
Composite Attributes
•An attribute composed of a single component with an independent
existence is called ‘simple attribute or atomic attribute’. Simple
attribute cannot be further subdivided into smaller components.
e.g.: gender of a staff entity.
An attribute composed of multiple components each with an
independent existence is called ‘composite attribute’.
e.g.: The ‘address’ attribute of the branch entity, can be subdivided
into street, city, country and postal code attributes.
Representing Complex Attributes in ER Diagram
Mapping Cardinality Constraints
•Express the number of entities to which another entity can be associated via a
relationship set.
•Most useful in describing binary relationship sets.
•For a binary relationship set the mapping cardinality must be one of the following
types:
◦ One to one
◦ One to many
◦ Many to one
◦ Many to many
Mapping Cardinalities
The choice of the primary key for a relationship set depends on the mapping
cardinality of the relationship set.
Choice of Primary key for Binary
Relationship
▪Many-to-Many relationships. The preceding union of the primary keys is a minimal
superkey and is chosen as the primary key.
▪One-to-Many relationships . The primary key of the “Many” side is a minimal superkey
and is used as the primary key.
▪Many-to-one relationships. The primary key of the “Many” side is a minimal superkey
and is used as the primary key.
▪One-to-one relationships. The primary key of either one of the participating entity
sets forms a minimal superkey, and either one can be chosen as the primary key.
Weak Entity Sets
▪A weak entity set is one whose existence is dependent on another entity, called its
identifying entity
▪Instead of associating a primary key with a weak entity, we use the identifying entity, along
with extra attributes called discriminator to uniquely identify a weak entity.
▪ An entity set that is not a weak entity set is termed a strong entity set.
▪Every weak entity must be associated with an identifying entity; that is, the weak entity set
is said to be existence dependent on the identifying entity set.
▪The identifying entity set is said to own the weak entity set that it identifies.
▪The relationship associating the weak entity set with the identifying entity set is called the
identifying relationship.
▪Note that the relational schema we eventually create from the entity set section does have
the attribute course_id, for reasons that will become clear later, even though we have
dropped the attribute course_id from the entity set section.
Expressing Weak Entity Sets
▪In E-R diagrams, a weak entity set is depicted via a double rectangle.
▪We underline the discriminator of a weak entity set with a dashed line.
▪The relationship set connecting the weak entity set to the identifying strong entity
set is depicted by a double diamond.
▪A weak entity set becomes a table that includes a column for the primary key of the
identifying strong entity set