Data Base Lesson 2
Data Base Lesson 2
Data Base Lesson 2
Database System Concepts, 5th Ed.
©Silberschatz, Korth and Sudarshan
See www.dbbook.com for conditions on reuse
Chapter 2: Relational Model
■ Structure of Relational Databases
■ Fundamental RelationalAlgebraOperations
■ Additional RelationalAlgebraOperations
■ Extended RelationalAlgebraOperations
■ Null Values
■ Modification of the Database
■ Example: If
● customer_name = {Jones, Smith, Curry, Lindsay, …}
/* Set of all customer names */
● customer_street = {Main, North, Park, …} /* set of all street names*/
● customer_city = {Harrison, Rye, Pittsfield, …} /* set of all city names */
Then r = { (Jones, Main, Harrison),
(Smith, North, Rye),
(Curry, North, Rye),
(Lindsay, Park, Pittsfield) }
is a relation over
customer_name x customer_street x customer_city
Database System Concepts 5th Edition, Oct 5, 2006 2.4 ©Silberschatz, Korth and Sudarshan
Attribute Types
■ Each attribute of a relation has a name
■ The set of allowed values for each attribute is called the domain of the
attribute
■ Attribute values are (normally) required to be atomic; that is, indivisible
● E.g. the value of an attribute can be an account number,
but cannot be a set of account numbers
■ Domain is said to be atomic if all its members are atomic
■ The special value null is a member of every domain
■ The null value causes complications in the definition of many operations
● We shall ignore the effect of null values in our main presentation and
consider their effect later
■ R = (A1, A2, …, An ) is a relation schema
Example:
Customer_schema = (customer_name, customer_street, customer_city)
■ r(R) denotes a relation r on the relation schema R
Example:
customer (Customer_schema)
attributes
(or columns)
customer_name customer_street customer_city
customer
■ Order of tuples is irrelevant (tuples may be stored in an arbitrary order)
■ Example: account relation with unordered tuples
account : stores information about accounts
depositor : stores information about which customer
owns which account
customer : stores information about customers
■ Storing all information as a single relation such as
bank(account_number, balance, customer_name, ..)
results in
● repetition of information
e.g.,if two customers own an account (What gets repeated?)
● the need for null values
e.g., to represent a customer without an account
■ Normalization theory (Chapter 7) deals with how to design relational schemas
● select: σ
● project: ∏
● union: ∪
● set difference: –
● Cartesian product: x
● rename: ρ
■ The operators take one or two relations as inputs and produce a new
relation as a result.
α α 1 7
α β 5 7
β β 12 3
β β 23 10
σA=B ^ D > 5 (r)
A B C D
α α 1 7
β β 23 10
σp(r) = {t | t ∈ r and p(t)}
Where p is a formula in propositional calculus consisting of terms
connected by : ∧ (and), ∨ (or), ¬ (not)
Each term is one of:
<attribute> op <attribute> or <constant>
where op is one of: =, ≠, >, ≥. <. ≤
■ Example of selection:
σ branch_name=“Perryridge”(account)
α 10 1
α 20 1
β 30 1
β 40 2
∏A,C (r) A C A C
α 1 α 1
α 1 = β 1
β 1 β 2
β 2
∏account_number, balance (account)
α 1 α 2
α 2 β 3
β 1 s
r
A B
■ r ∪ s: α 1
α 2
β 1
β 3
■ Example: to find all customers with either an account or a loan
∏customer_name (depositor) ∪ ∏customer_name (borrower)
α 1 α 2
α 2 β 3
β 1 s
r
■ r – s:
A B
α 1
β 1
■ Set differences must be taken between compatible
relations.
● r and s must have the same arity
● attribute domains of r and s must be compatible
α 1 α 10 a
β 10 a
β 2 β 20 b
r γ 10 b
s
■ r x s:
A B C D E
α 1 α 10 a
α 1 β 10 a
α 1 β 20 b
α 1 γ 10 b
β 2 α 10 a
β 2 β 10 a
β 2 β 20 b
β 2 γ 10 b
■ Assume that attributes of r(R) and s(S) are disjoint. (That is, R ∩ S = ∅).
■ If attributes of r(R) and s(S) are not disjoint, then renaming must be
used.
■ σA=C(r x s)
A B C D E
α 1 α 10 a
β 2 β 10 a
β 2 β 20 b
ρ x (E)
returns the expression E under the name X
■ If a relationalalgebra expression E has arity n, then
ρ (E )
x ( A1 , A2 ,..., An )
returns the result of expression E under the name X, and with the
attributes renamed to A1 , A2 , …., An .
customer (customer_name, customer_street, customer_city)
account (account_number, branch_name, balance)
loan (loan_number, branch_name, amount)
depositor (customer_name, account_number)
borrower (customer_name, loan_number)
■ Find the loan number for each loan of an amount greater than
$1200
∏loan_number (σamount > 1200 (loan))
■ Find the names of all customers who have a loan, an account, or both,
from the bank
∏customer_name (borrower) ∪ ∏customer_name (depositor)
∏customer_name (σbranch_name=“Perryridge”
(σborrower.loan_number = loan.loan_number(borrower x loan)))
■ Find the names of all customers who have a loan at the
Perryridge branch but do not have an account at any branch of
the bank.
∏customer_name (σbranch_name = “Perryridge”
(σborrower.loan_number = loan.loan_number(borrower x loan))) –
∏customer_name(depositor)
● Query 1
∏customer_name (σbranch_name = “Perryridge” (
σborrower.loan_number = loan.loan_number (borrower x loan)))
● Query 2
∏customer_name(σloan.loan_number = borrower.loan_number (
(σbranch_name = “Perryridge” (loan)) x borrower))
● E1 ∪ E2
● E1 – E2
● E1 x E2
● σp (E1), P is a predicate on attributes in E1
● ∏s(E1), S is a list consisting of some of the attributes in E1
● ρ (E ), x is the new name for the result of E
x th Edition, Oct 5, 2006
Database System Concepts 5 1 2.34 1 ©Silberschatz, Korth and Sudarshan
Additional Operations
We define additional operations that do not add any power to the
relational algebra, but that simplify common queries.
■ Set intersection
■ Natural join
■ Division
■ Assignment
■ Relation r, s:
A B A B
α 1 α 2
α 2 β 3
β 1
r s
■ r ∩ s
A B
α 2
● If tr and ts have the same value on each of the attributes in R ∩ S, add a
tuple t to the result, where
t has the same value as tr on r
t has the same value as ts on s
■ Example:
R = (A, B, C, D)
S = (E, B, D)
● Result schema = (A, B, C, D, E)
● r s is defined as:
Database System Concepts 5th Edition, Oct 5, 2006 2.38 ©Silberschatz, Korth and Sudarshan
Natural Join Operation – Example
■ Relations r, s:
A B C D B D E
α 1 α a 1 a α
β 2 γ a 3 a β
γ 4 β b 1 a γ
α 1 γ a 2 b δ
δ 2 β b 3 b ∈
r s
■ r s
A B C D E
α 1 α a α
α 1 α a γ
α 1 γ a α
α 1 γ a γ
δ 2 β b δ
■ Notation: r÷s
■ Suited to queries that include the phrase “for all”.
■ Let r and s be relations on schemas R and S respectively
where
● R = (A1, …, Am , B1, …, Bn )
● S = (B1, …, Bn)
The result of r ÷ s is a relation on schema
R – S = (A1, …, Am)
r ÷ s = { t | t ∈ ∏ RS (r) ∧ ∀ u ∈ s ( tu ∈ r ) }
Where tu means the concatenation of tuples t and u to
produce a single tuple
α
β
α a α a 1 a 1
α a γ a 1 b 1
α a γ b 1 s
β a γ a 1
β a γ b 3
γ a γ a 1
γ a γ b 1
γ a β b 1
r
■ r ÷ s:
A B C
α a γ
γ a γ
r ÷ s = ∏RS (r ) – ∏RS ( ( ∏RS (r ) x s ) – ∏RS,S(r ))
To see why
● ∏RS,S (r) simply reorders attributes of r
● ∏RS (∏RS (r ) x s ) – ∏RS,S(r) ) gives those tuples t in
∏RS (r ) such that for some tuple u ∈ s, tu ∉ r.
temp1 ← ∏RS (r )
temp2 ← ∏RS ((temp1 x s ) – ∏RS,S (r ))
result = temp1 – temp2
● The result to the right of the ← is assigned to the relation variable on
the left of the ←.
● May use variable in subsequent expressions.
∏customer_name (borrower) ∩ ∏customer_name (depositor)
■ Find the name of all customers who have a loan at the bank and the
loan amount
∏customer_name, loan_number, amount (borrower loan)
∏customer_name (σbranch_name = “Downtown” (depositor account )) ∩
∏customer_name (σbranch_name = “Uptown” (depositor account))
● Query 2
∏customer_name, branch_name (depositor account)
÷ ρtemp(branch_name) ({(“Downtown” ), (“Uptown” )})
Note that Query 2 uses a constant relation.
∏customer_name, branch_name (depositor account)
÷ ∏branch_name (σbranch_city = “Brooklyn” (branch))
∏ F1 ,F2 ,..., Fn (E )
■ E is any relationalalgebra expression
■ Each of F1, F2, …, Fn are are arithmetic expressions involving constants
and attributes in the schema of E.
■ Given relation credit_info(customer_name, limit, credit_balance), find
how much more each person can spend:
∏customer_name, limit – credit_balance (credit_info)
G1,G2 ,,Gn
ϑF ( A ),F ( A ,,F ( A ) (E )
1 1 2 2 n n
E is any relationalalgebra expression
● G1, G2 …, Gn is a list of attributes on which to group (can be empty)
● Each Fi is an aggregate function
● Each Ai is an attribute name
A B C
α α 7
α β 7
β β 3
β β 10
27
branch_name g sum(balance) as sum_balance (account)
■ Relation borrower
customer_name loan_number
Jones L170
Smith L230
Hayes L155
loan borrower
■ Left Outer Join
loan borrower
loan_number branch_name amount customer_name
L170 Downtown 3000 Jones
L230 Redwood 4000 Smith
L260 Perryridge 1700 null
■ The result of any arithmetic expression involving null is null.
■ Aggregate functions simply ignore null values (as in SQL)
■ For duplicate elimination and grouping, null is treated like any other
value, and two nulls are assumed to be the same (as in SQL)
account ← account – σ branch_name = “Perryridge” (account )
■ Delete all loan records with amount in the range of 0 to 50
■ Delete all accounts at branches located in Needham.
r1 ← σ branch_city = “Needham” (account branch )
r2 ← ∏ account_number, branch_name, balance (r1)
r3 ← ∏ customer_name, account_number (r2 depositor)
account ← account – r2
depositor ← depositor – r3
account ← account ∪ {(“A973”, “Perryridge”, 1200)}
depositor ← depositor ∪ {(“Smith”, “A973”)}
■ Provide as a gift for all loan customers in the Perryridge
branch, a $200 savings account. Let the loan number serve
as the account number for the new savings account.
r1 ← (σbranch_name = “Perryridge” (borrower loan))
account ← account ∪ ∏loan_number, branch_name, 200 (r1)
depositor ← depositor ∪ ∏customer_name, loan_number (r1)
r ← ∏ F ,F ,,F , ( r )
1 2 l
■ Each F is either
i
● the I th attribute of r, if the I th attribute is not updated, or,
● if the attribute is to be updated Fi is an expression, involving only
constants and the attributes of r, which gives the new value for the
attribute
■ Make interest payments by increasing all balances by 5 percent.
account ← ∏ account_number, branch_name, balance * 1.05 (account)
■ Pay all accounts with balances over $10,000 6 percent interest
and pay all others 5 percent
account ← ∏ account_number, branch_name, balance * 1.06 (σ BAL > 10000 (account ))
∪ ∏ account_number, branch_name, balance * 1.05 (σBAL ≤ 10000 (account))
Database System Concepts, 5th Ed.
©Silberschatz, Korth and Sudarshan
See www.dbbook.com for conditions on reuse
Figure 2.3. The branch relation