Information Systems (6CFU) : Prof. Cinzia Bernardeschi Master of Science in Computer Engineering 2014-15
Information Systems (6CFU) : Prof. Cinzia Bernardeschi Master of Science in Computer Engineering 2014-15
Information Systems (6CFU) : Prof. Cinzia Bernardeschi Master of Science in Computer Engineering 2014-15
MySQL
Book:
Silberschatz, Korth and Sudarshan
Database System Concepts
McGraw-Hill
These slides are a modified version of the slides of the book “Database
System Concepts”, 5th Ed., Silberschatz, Korth and Sudarshan. Original
slides are available at www.db-book.com
1.2
Database Management System (DBMS)
1.3
Purpose of Database Systems
In the early days, database applications were built directly on top of
file systems. The system stores permanent records into files.
Application programs to extract and add records to files.
A file-processing-system is supported by a conventional operating
system.
Drawbacks of using file systems to store data:
Data redundancy and inconsistency
Multiple file formats, duplication of information in different files
(different programers created data/progr over a long period)
Difficulty in accessing data
Need to write a new program to carry out each new task
Data isolation — multiple files and formats
Integrity problems
Integrity constraints (e.g. account balance > 0) become
“buried” in program code rather than being stated explicitly
Hard to add new constraints or change existing ones
1.4
Purpose of Database Systems (Cont.)
Drawbacks of using file systems (cont.)
Atomicity of updates (a computer is subject to failures)
Failures may leave database in an inconsistent state with partial
updates carried out
Example: Transfer of funds from one account to another should
either complete or not happen at all
Concurrent access by multiple users (for the sake of overall
performance of the system)
Concurrent accessed needed for performance
Uncontrolled concurrent accesses can lead to inconsistencies
– Example: Two people reading a balance and updating it at the
same time
Security problems
Hard to provide user access to some, but not all, data
Database systems offer solutions to all the above problems
1.5
Levels of Abstraction
A major purpose of a DBMS is to provide a user with an abstract view
of data
1.6
View of Data
1.7
Instances and Schemas
Databases change over time as information is inserted and deleted.
1.8
Data Models
Underlying the structure of a data base is the data model
A collection of conceptual tools for describing
Data
Data relationships
Data semantics
Data constraints
1.9
The Entity-Relationship Model
Models an enterprise as a collection of entities and relationships
Entity: a “thing” or “object” in the enterprise that is distinguishable
from other objects
Described by a set of attributes
Relationship: an association among several entities
Represented diagrammatically by an entity-relationship diagram:
1.10
Data Manipulation Language (DML)
Language for accessing and manipulating the data organized by the
appropriate data model
DML also known as query language
Two classes of languages
Procedural – user specifies what data is required and how to get
those data
Declarative (nonprocedural) – user specifies what data is
required without specifying how to get those data
SQL is the most widely used query language
1.11
Data Definition Language (DDL)
Specification notation for defining the database schema
Example: create table account (
account-number char(10),
balance integer)
DDL compiler generates a set of tables stored in a data dictionary
Data dictionary contains metadata (i.e., data about data)
Database schema
Data storage and definition language
Specifies the storage structure and access methods used
Integrity constraints
Domain constraints
Referential integrity (references constraint in SQL)
Assertions
Authorization
1.12
Relational Model
Attributes
Example of tabular data in the relational model
Each table has multiple columns, each column has a unique name
1.13
A Sample Relational Database
which accounts
belong to which
customers
1.14
SQL
non-procedural:
“what data are needed withouth specifying how to get those data”
1.15
Database Design
The process of designing the general structure of the database:
User requirements – interaction with domain experts to carry out the specification
of user requirements
Conceptual design – Translate the requirements into a conceptual schema
Functional Requirements: user defines the kinds of operations that will be
performed on data. Review of the schema to meet functional requirements
Logical Design –
What attributes should we record in the database?
What relation schemas should we have and how should the attributes be
distributed among the various relation schemas?
Deciding on the database schema. Database design requires that we find a
“good” collection of relation schemas (no unnecessary redundancy, retrieve
information easily)
The most common approach is to use “functional dependencies”
1.16
Data Base Design for Banking Enterprise
Major characteristics
The bank is organised into branches. Each branch is located in a
particular city and is identified by a unique name. The bank monitors
the assets of each branch.
Bank customers are identified by their customer_id value. The bank
stores each customer’s name, and the street and the city where the
customer lives. Customers may have accounts and can take out
loans. A customer may be associated with a particular banker; who
may act as a loan officer or personal banker for that customer.
The bank offers two types of accounts: savings and checking
accounts. Accounts can be held by more than one customer, and a
customer can have more than one account. Each account is assigned
a unique account number. The bank mantains a record of each
account’s balance and the most recent date on which the account was
accessed by each customer holding the account. In addition each
savings account has an interest rate, and overdrafts are recorded for
each checking account.
1.17
Banking Enterprise
The bank provides its customers with loans. A loan originates at a particular
branch and can be held by one or more customers. A loan is identified by
unique loan number. For each loan, the bank keeps track of loan amount and
the loan payments. Although a loan-payment number does not uniquely identify
a particular payment among those for all the bank’s loans, a payment number
does identify a particular payment for a specific loan. The date and the amount
are recorded for each payment.
To keep the example small, we do not keep track of deposits and withdrawals from
savings and checking accounts, just as it keeps track of payments to loan
accounts.
1.18
E-R Diagram for a Banking Enterprise
1.19
Summary of Symbols Used in E-R Notation
1.20
Summary of Symbols (Cont.)
1.21
Entity Sets customer and loan
customer_id customer_ customer_ customer_ loan_ amount
name street city number
1.22
Relationship Set borrower
1.23
Relationship Sets (Cont.)
1.24
Degree of a Relationship Set
Refers to number of entity sets that participate in a relationship
set.
Relationship sets that involve two entity sets are binary (or
degree two). Generally, most relationship sets in a database
system are binary.
Relationship sets may involve more than two entity sets.
1.25
Attributes
1.26
Composite Attributes
1.27
E-R Design Decisions
The use of an attribute or entity set to represent an object.
Whether a real-world concept is best expressed by an entity set or
a relationship set.
The use of a ternary relationship versus a pair of binary
relationships.
The use of a strong or weak entity set.
The use of specialization/generalization – contributes to modularity
in the design.
The use of aggregation – can treat the aggregate entity set as a
single unit without concern for the details of its internal structure.
1.28
Reduction to Relation Schemas
1.29
The Banking Schema
branch = (branch_name, branch_city, assets)
customer = (customer_id, customer_name, customer_street, customer_city)
loan = (loan_number, amount)
account = (account_number, balance)
employee = (employee_id. employee_name, telephone_number, start_date)
dependent_name = (employee_id, dname)
account_branch = (account_number, branch_name)
loan_branch = (loan_number, branch_name)
borrower = (customer_id, loan_number)
depositor = (customer_id, account_number)
cust_banker = (customer_id, employee_id, type)
works_for = (worker_employee_id, manager_employee_id)
payment = (loan_number, payment_number, payment_date, payment_amount)
savings_account = (account_number, interest_rate)
checking_account = (account_number, overdraft_amount)
1.30
Representing Entity Sets as Schemas
1.31
Representing Relationship Sets as
Schemas
A many-to-many relationship set is represented as a schema with
attributes for the primary keys of the two participating entity sets,
and any descriptive attributes of the relationship set.
Example: schema for relationship set borrower
borrower = (customer_id, loan_number )
1.32
Storage access: Buffer Manager
1.33
Storage Access
1.34
Buffer Manager
Programs call on the buffer manager when they need a block
from disk.
1. If the block is already in the buffer, buffer manager returns
the address of the block in main memory
2. If the block is not in the buffer, the buffer manager
1. Allocates space in the buffer for the block
1. Replacing (throwing out) some other block, if required,
to make space for the new block.
2. Replaced block written back to disk only if it was
modified since the most recent time that it was written
to/fetched from the disk.
2. Reads the block from the disk to the buffer, and returns
the address of the block in main memory to requester.
1.35
Buffer-Replacement Policies
Most operating systems replace the block least recently used
(LRU strategy)
Idea behind LRU – use past pattern of block references as a
predictor of future references
Queries have well-defined access patterns (such as sequential
scans), and a database system can use the information in a user’s
query to predict future references
LRU can be a bad strategy for certain access patterns involving
repeated scans of data
For example: when computing the join of 2 relations r and s
by a nested loops
for each tuple tr of r do
for each tuple ts of s do
if the tuples tr and ts match …
Mixed strategy with hints on replacement strategy provided
by the query optimizer is preferable
1.36
Buffer-Replacement Policies (Cont.)
Pinned block – memory block that is not allowed to be written
back to disk.
Toss-immediate strategy – frees the space occupied by a block
as soon as the final tuple of that block has been processed
Most recently used (MRU) strategy – system must pin the block
currently being processed. After the final tuple of that block has
been processed, the block is unpinned, and it becomes the most
recently used block.
Buffer manager can use statistical information regarding the
probability that a request will reference a particular relation
E.g., the data dictionary is frequently accessed. Heuristic:
keep data-dictionary blocks in main memory buffer
Buffer managers also support forced output of blocks for the
purpose of recovery (more in Chapter 17)
1.37
File Organization
1.38
File Organization
1.39
Fixed-Length Records
Simple approach:
Store record i starting from byte n (i – 1), where n is the size of
each record.
Record access is simple but records may cross blocks
Modification: do not allow records to cross block boundaries
Deletion of record i:
alternatives:
move records i + 1, . . ., n
to i, . . . , n – 1
move record n to i
do not move records, but
link all free records on a
free list
1.40
Record 2 Deleted and All Records Moved
1.41
Record 2 deleted and Final Record Moved
1.42
Free Lists
Store the address of the first deleted record in the file header.
Use this first record to store the address of the second deleted record,
and so on
Can think of these stored addresses as pointers since they “point” to
the location of a record.
More space efficient representation: reuse space for normal attributes
of free records to store pointers. (No pointers stored in in-use records.)
1.43
Variable-Length Records
1.44
Variable-Length Records: Slotted Page Structure
1.45
Record Representation
Records with fixed length fields are easy to represent
Similar to records (structs) in programming languages
Extensions to represent null values
E.g. a bitmap indicating which attributes are null
Variable length fields can be represented by a pair
(offset,length)
where offset is the location within the record and length is field length.
All fields start at predefined location, but extra indirection required
for variable length fields
1.46
Byte-String Representation of Variable-Length Records
1.47
Fixed-Length Representation
Use one or more fixed length records:
reserved space
pointers
Reserved space – can use fixed-length records of a known
maximum length; unused space in shorter records filled with a null
or end-of-record symbol.
1.48
Pointer Method
Pointer method
A variable-length record is represented by a list of fixed-length
records, chained together via pointers.
Can be used even if the maximum record length is not known
1.49
Pointer Method (Cont.)
Disadvantage to pointer structure; space is wasted in all
records except the first in a a chain.
Solution is to allow two kinds of block in file:
Anchor block – contains the first records of chain
Overflow block – contains records other than those that
are the first records of chairs.
1.50
Organization of Records in Files
1.51
Sequential File Organization
Suitable for applications that require sequential processing of
the entire file
The records in the file are ordered by a search-key
1.52
Sequential File Organization (Cont.)
Deletion – use pointer chains
Insertion –locate the position where the record is to be inserted
if there is free space insert there
if no free space, insert the record in an overflow block
In either case, pointer chain must be updated
Need to reorganize the file
from time to time to restore
sequential order
1.53
Multitable Clustering File Organization
Store several relations in one file using a multitable clustering
file organization Depositor
Customer
1.54
Multitable Clustering File Organization (cont.)
1.55
Clustering File Structure With Pointer Chains
1.56
Data Dictionary Storage
Data dictionary (also called system catalog) stores metadata;
that is, data about data, such as
Information about relations
names of relations
names and types of attributes of each relation
names and definitions of views
integrity constraints
User and accounting information, including passwords
Statistical and descriptive data
number of tuples in each relation
Physical file organization information
How relation is stored (sequential/hash/…)
Physical location of relation
Information about indices (Chapter 12)
1.57
Data Dictionary Storage (Cont.)
Catalog structure
Relational representation on disk
specialized data structures designed for efficient access, in
memory
A possible catalog representation:
1.58