Overview of Storage and Indexing: CSCD34 - Data Management Systems - A. Vaisman 1
Overview of Storage and Indexing: CSCD34 - Data Management Systems - A. Vaisman 1
Overview of Storage and Indexing: CSCD34 - Data Management Systems - A. Vaisman 1
Indexing
Indexes
Index Classification
Index Classification
1
2
3
4
Brown ..
Smith..
White ..
Yu ..
5 Chen ..
6 Peterson..
7 Rhodes..
..
Brown
Chen
Peterson
Rhodes
Smith
Yu
White
CLUSTERED
Index entries
direct search for
data entries
UNCLUSTERED
Data entries
Data entries
(Index File)
(Data file)
Data Records
Data Records
Example B+ Tree
Root
17
Entries <= 17
5
2*
3*
Entries > 17
27
13
5*
7* 8*
14* 16*
22* 24*
30
27* 29*
Hash-Based Indexes
Static Hashing
# primary pages fixed, allocated sequentially, never deallocated; overflow pages if needed.
h(k) mod N = bucket to which data entry with key k
belongs. (N = # of buckets)
Long overflow chains can develop and degrade
performance.
Extendible and Linear Hashing: Dynamic techniques to
0
fix h(key)
this. mod N
2
key
N-1
Primary bucket pages
CSCD34 - Data Management Systems - A. Vaisman
Overflow pages
10
11
12
13
Operations to Compare
14
Cost of Operations
(1) Heap
(a) Scan
(b) Equality
(c ) Range
(d) Insert
(e) Delete
BD
0.5BD
BD
2D
Search
+D
Search
+BD
Search
+D
Search
+ 2D
Dlog 2 B +
# matches
(3) Clustered
1.5BD
Dlog F 1.5B Dlog 2 1.5B
+ # matches
(4) Unclustered BD(R+0.15) D(1 +log F Dlog F
Tree index
0.15B)
0.15B
+ # matches
(5) Unclustered BD(R+0.1 2D
BD
Hash index
25)
(2) Sorted
BD
Dlog 2B
Search
+ BD
Search
+D
D(3 +log F
0.15B)
4D
Search
+ 2D
Several
assumptions
underlie these
(rough)
estimates!
15
Choice of Indexes
What indexes should we create?
One approach: Consider the most important queries
in turn. Consider the best plan using the current
indexes, and see if a better plan is possible with an
additional index. If so, create it.
16
17
Examples of Clustered
Indexes
SELECT E.dno
FROM Emp E
WHERE E.age>40
(*
FROM Emp E
WHERE E.hobby=Stamp
18
11,80
11
12,10
12
12,20
13,75
<age, sal>
10,12
20,12
75,13
10
cal 11
80
joe 12
20
sue 13
75
12
13
<age>
10
Data records
sorted by name
80,11
<sal, age>
20
75
80
<sal>
Data entries
sorted by <sal>
19
20
Summary (Contd.)
21
Summary (Contd.)
22