Database Management Systems Week 1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 20

Adapted for SEN2104 – DBMS

Week 1: Introduction

Database System Concepts


©Silberschatz, Korth and Sudarshan
Database Management System (DBMS)

▪ DBMS contains information about a particular enterprise


▪ Collection of interrelated data
▪ Set of programs to access the data
▪ An environment that is both convenient and efficient to use
▪ Database Applications:
▪ Banking: transactions
▪ Airlines: reservations, schedules
▪ Universities: registration, grades
▪ Sales: customers, products, purchases
▪ Online retailers: order tracking, customized recommendations
▪ Manufacturing: production, inventory, orders, supply chain
▪ Human resources: employee records, salaries, tax deductions
▪ Databases can be very large.
▪ Databases touch all aspects of our lives

2
University Database Example
▪ Application program examples
▪ Add new students, instructors, and courses
▪ Register students for courses, and generate class rosters
▪ Assign grades to students, compute grade point
averages (GPA) and generate transcripts
▪ In the early days, database applications were built directly on
top of file systems

3
Drawbacks of using file systems to store data

▪ Data redundancy and inconsistency


▪ Multiple file formats, duplication of information in different files
▪ Difficulty in accessing data
▪ Need to write a new program to carry out each new task
▪ Data isolation
▪ Multiple files and formats
▪ Integrity problems
▪ Integrity constraints (e.g., account balance > 0) become “buried”
in program code rather than being stated explicitly
▪ Hard to add new constraints or change existing ones

4
Drawbacks of using file systems to store data (Cont.)

▪ Atomicity of updates
▪ Failures may leave database in an inconsistent state with partial
updates carried out
▪ Example: Transfer of funds from one account to another should
either complete or not happen at all
▪ Concurrent access by multiple users
▪ Concurrent access needed for performance
▪ Uncontrolled concurrent accesses can lead to inconsistencies
▪ Example: Two people reading a balance (say 100) and
updating it by withdrawing money (say 50 each) at the same
time
▪ Security problems
▪ Hard to provide user access to some, but not all, data

Database systems offer solutions to all the above problems

5
Relational Model
▪ All the data is stored in various tables.
▪ Example of tabular data in the relational model Columns

Rows

6
A Sample Relational Database

7
Database Design
The process of designing the general structure of the database:

▪ Logical Design – Deciding on the database schema.


Database design requires that we find a “good” collection of
relation schemas.
▪ Business decision – What attributes should we record in
the database?
▪ Computer Science decision – What relation schemas
should we have and how should the attributes be
distributed among the various relation schemas?
▪ Physical Design – Deciding on the physical layout of the
database

8
Database Design (Cont.)
▪ Is there any problem with this relation?

9
Design Approaches
▪ Need to come up with a methodology to ensure that each of the
relations in the database is “good”
▪ Two ways of doing so:
▪ Entity Relationship Model
▪ Models an enterprise as a collection of entities and
relationships
▪ Represented diagrammatically by an entity-relationship
diagram:
▪ Normalization Theory
▪ Formalize what designs are bad, and test for them

10
Data Definition Language (DDL)
▪ Specification notation for defining the database schema
Example: create table instructor (
ID char(5),
name varchar(20),
dept_name varchar(20),
salary numeric(8,2))
▪ DDL compiler generates a set of table templates stored in a data dictionary
▪ Data dictionary contains metadata (i.e., data about data)
▪ Database schema
▪ Integrity constraints
▪ Primary key (ID uniquely identifies instructors)
▪ Authorization
▪ Who can access what

11
Data Manipulation Language (DML)
▪ Language for accessing and manipulating the data organized
by the appropriate data model
▪ DML also known as query language
▪ Two classes of languages
▪ Pure – used for proving properties about computational
power and for optimization
▪ Relational Algebra
▪ Tuple relational calculus
▪ Domain relational calculus
▪ Commercial – used in commercial systems
▪ SQL is the most widely used commercial language

12
SQL

▪ The most widely used commercial language


▪ To be able to compute complex functions SQL is usually
embedded in some higher-level language
▪ Application programs generally access databases through one of
▪ Language extensions to allow embedded SQL
▪ Application program interface (e.g., ODBC/JDBC) which allow
SQL queries to be sent to a database

13
Query Processing
1. Parsing and translation
2. Optimization
3. Evaluation

14
Query Processing (Cont.)
▪ Alternative ways of evaluating a given query
▪ Equivalent expressions
▪ Different algorithms for each operation
▪ Cost difference between a good and a bad way of evaluating a
query can be enormous
▪ Need to estimate the cost of operations
▪ Depends critically on statistical information about relations
which the database must maintain
▪ Need to estimate statistics for intermediate results to compute
cost of complex expressions

15
Data Analysis

▪ Decision-support systems are used to make business decisions,


often based on data collected by on-line transaction-processing
systems.
▪ Examples of business decisions:
▪ What items to stock?
▪ What insurance premium to change?
▪ To whom to send advertisements?
▪ Examples of data used for making decisions
▪ Retail sales transaction details
▪ Customer profiles (income, age, gender, etc.)

16
Data Analysis (Cont.)
▪ Data analysis tasks are simplified by specialized tools and SQL
extensions
▪ Example tasks
▪ For each product category and each region, what were the
total sales in the last quarter and how do they compare with
the same quarter last year
▪ As above, for each product category and each customer
category
▪ Statistical analysis packages (e.g., : S++) can be interfaced with
databases
▪ Statistical analysis is a large field, but not covered here
▪ Data mining seeks to discover knowledge automatically in the form of
statistical rules and patterns from large databases.
▪ A data warehouse archives information gathered from multiple
sources, and stores it under a unified schema, at a single site.
▪ Important for large businesses that generate data from multiple
divisions, possibly at multiple sites
▪ Data may also be purchased externally

17
History of Database Systems
• 1950s and early 1960s:
▪ Data processing using magnetic tapes for storage
▪ Tapes provided only sequential access
▪ Punched cards for input
• Late 1960s and 1970s:
▪ Hard disks allowed direct access to data
▪ Network and hierarchical data models in widespread use
▪ Ted Codd defines the relational data model
▪ Would win the ACM Turing Award for this work
▪ IBM Research begins System R prototype
▪ UC Berkeley (Michael Stonebraker) begins Ingres prototype
▪ Oracle releases first commercial relational database
▪ High-performance (for the era) transaction processing

18
History of Database Systems (Cont.)

• 1980s:
▪ Research relational prototypes evolve into commercial systems
▪SQL becomes industrial standard
▪ Parallel and distributed database systems
▪Wisconsin, IBM, Teradata
▪ Object-oriented database systems
• 1990s:
▪ Large decision support and data-mining applications
▪ Large multi-terabyte data warehouses
▪ Emergence of Web commerce

19
History of Database Systems (Cont.)

• 2000s
▪ Big data storage systems
▪ Google BigTable, Yahoo PNuts, Amazon,
▪ “NoSQL” systems.
▪ Big data analysis: beyond SQL
▪ Map reduce and friends
• 2010s
▪ SQL reloaded
▪ SQL front end to Map Reduce systems
▪ Massively parallel database systems
▪ Multi-core main-memory databases

20

You might also like