Database Management Systems Week 1
Database Management Systems Week 1
Database Management Systems Week 1
Week 1: Introduction
2
University Database Example
▪ Application program examples
▪ Add new students, instructors, and courses
▪ Register students for courses, and generate class rosters
▪ Assign grades to students, compute grade point
averages (GPA) and generate transcripts
▪ In the early days, database applications were built directly on
top of file systems
3
Drawbacks of using file systems to store data
4
Drawbacks of using file systems to store data (Cont.)
▪ Atomicity of updates
▪ Failures may leave database in an inconsistent state with partial
updates carried out
▪ Example: Transfer of funds from one account to another should
either complete or not happen at all
▪ Concurrent access by multiple users
▪ Concurrent access needed for performance
▪ Uncontrolled concurrent accesses can lead to inconsistencies
▪ Example: Two people reading a balance (say 100) and
updating it by withdrawing money (say 50 each) at the same
time
▪ Security problems
▪ Hard to provide user access to some, but not all, data
5
Relational Model
▪ All the data is stored in various tables.
▪ Example of tabular data in the relational model Columns
Rows
6
A Sample Relational Database
7
Database Design
The process of designing the general structure of the database:
8
Database Design (Cont.)
▪ Is there any problem with this relation?
9
Design Approaches
▪ Need to come up with a methodology to ensure that each of the
relations in the database is “good”
▪ Two ways of doing so:
▪ Entity Relationship Model
▪ Models an enterprise as a collection of entities and
relationships
▪ Represented diagrammatically by an entity-relationship
diagram:
▪ Normalization Theory
▪ Formalize what designs are bad, and test for them
10
Data Definition Language (DDL)
▪ Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
▪ DDL compiler generates a set of table templates stored in a data dictionary
▪ Data dictionary contains metadata (i.e., data about data)
▪ Database schema
▪ Integrity constraints
▪ Primary key (ID uniquely identifies instructors)
▪ Authorization
▪ Who can access what
11
Data Manipulation Language (DML)
▪ Language for accessing and manipulating the data organized
by the appropriate data model
▪ DML also known as query language
▪ Two classes of languages
▪ Pure – used for proving properties about computational
power and for optimization
▪ Relational Algebra
▪ Tuple relational calculus
▪ Domain relational calculus
▪ Commercial – used in commercial systems
▪ SQL is the most widely used commercial language
12
SQL
13
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation
14
Query Processing (Cont.)
▪ Alternative ways of evaluating a given query
▪ Equivalent expressions
▪ Different algorithms for each operation
▪ Cost difference between a good and a bad way of evaluating a
query can be enormous
▪ Need to estimate the cost of operations
▪ Depends critically on statistical information about relations
which the database must maintain
▪ Need to estimate statistics for intermediate results to compute
cost of complex expressions
15
Data Analysis
16
Data Analysis (Cont.)
▪ Data analysis tasks are simplified by specialized tools and SQL
extensions
▪ Example tasks
▪ For each product category and each region, what were the
total sales in the last quarter and how do they compare with
the same quarter last year
▪ As above, for each product category and each customer
category
▪ Statistical analysis packages (e.g., : S++) can be interfaced with
databases
▪ Statistical analysis is a large field, but not covered here
▪ Data mining seeks to discover knowledge automatically in the form of
statistical rules and patterns from large databases.
▪ A data warehouse archives information gathered from multiple
sources, and stores it under a unified schema, at a single site.
▪ Important for large businesses that generate data from multiple
divisions, possibly at multiple sites
▪ Data may also be purchased externally
17
History of Database Systems
• 1950s and early 1960s:
▪ Data processing using magnetic tapes for storage
▪ Tapes provided only sequential access
▪ Punched cards for input
• Late 1960s and 1970s:
▪ Hard disks allowed direct access to data
▪ Network and hierarchical data models in widespread use
▪ Ted Codd defines the relational data model
▪ Would win the ACM Turing Award for this work
▪ IBM Research begins System R prototype
▪ UC Berkeley (Michael Stonebraker) begins Ingres prototype
▪ Oracle releases first commercial relational database
▪ High-performance (for the era) transaction processing
18
History of Database Systems (Cont.)
• 1980s:
▪ Research relational prototypes evolve into commercial systems
▪SQL becomes industrial standard
▪ Parallel and distributed database systems
▪Wisconsin, IBM, Teradata
▪ Object-oriented database systems
• 1990s:
▪ Large decision support and data-mining applications
▪ Large multi-terabyte data warehouses
▪ Emergence of Web commerce
19
History of Database Systems (Cont.)
• 2000s
▪ Big data storage systems
▪ Google BigTable, Yahoo PNuts, Amazon,
▪ “NoSQL” systems.
▪ Big data analysis: beyond SQL
▪ Map reduce and friends
• 2010s
▪ SQL reloaded
▪ SQL front end to Map Reduce systems
▪ Massively parallel database systems
▪ Multi-core main-memory databases
20