Lesson 8 Cs450 - Indexing
Lesson 8 Cs450 - Indexing
Lesson 8 Cs450 - Indexing
Important Points
Data which is organized based on one field, may be difficult to search
based on a different field.
Consider a phone book. The data is well organized if you want to find
Jessica Lins phone number. On the other hand, finding out whose
number 234-2342 belongs to is much harder!
Informally, the attribute we are most interested in searching is called
the search key, or just key (we will formalize this notation later).
Note that the search key can be a combination of fields, for example
phone books are organized by <Last_name, First_name>
Unfortunately, the word key is overloaded in databases, the word key in this context, has
nothing to do with primary key, candidate key etc.
1) Heap Files
The data is unsorted in heap files. We can initially build the
database in sorted order, but if our application is dynamic, our database
will become unsorted very quickly. So we assume that heap files are
unsorted.
Data
Page
Data
Page
Data
Page
Data
Page
Data
Page
Data
Page
Full Pages
Header
Page
Pages with
Free Space
Entries <= 17
5
Data
Page
13
14
Data
Page
Entries > 17
18
16
Data
Page
Data
Page
30
35
Data
Page
43
Data
Page
h
Data
Page
Data
Page
Data
Page
Data
Page
Data
Page
Data
Page
Basic Concepts
Indexing mechanisms are used to speed up access to desired
data.
e.g., author catalog in library
pointer"
Index files are typically much smaller than the original file
Two basic kinds of indices:
Ordered indices: search keys are stored in sorted order (I.e tree based)
Hash indices: search keys are distributed uniformly across buckets
using a hash function .
Blue
100.30
Marge
Red
12.34
Homer
Pink
32,12
search-key"
pointer"
search-key"
pointer"
Bart
Blue
null
search-key"
pointer"
Lisa
Black
45.12
Seymour
Red
56.91
Apu
3 Green
Manjula
Blue
234.23
Lenny
4 White
45.34
search-key"
pointer"
Chapter 10
Tree-Structured Indexing
First, we look at a simple approach (ISAM)
We will see why it is unsatisfactory
This will motivate the B+ tree
Data
Page 1
Data
Page 2
Data
Page 3
::
Data
Page N-1
Data
Page N
K P
Maggie page 7
Data Page 7
Maggie
Manjula
Marge
Monty
K3 P0 K1 P1 K2 P2 K3 P3
K3 P3 7 P4
Index File
12 p ::
16
19 p
Data Files
Data
Page 1
Data
Page 2
Data
Page 3
::
Data
Page N-1
Data
Page N
Index File
::
34
77 p
Data Files
Data
Page 1
Data
Page 2
Data
Page 3
::
Data
Page N-1
Data
Page N
Index File
12 p ::
16
19 p
Data Files
Data
Page 1
Data
Page 2
Data
Page 3
::
Data
Page N-1
Data
Page N
p 21
Index File
12 p ::
16
19 p
Data Files
Data
Page 1
Data
Page 2
Data
Page 3
::
Data
Page N-1
Data
Page N
Data
Page 1
Data
Page 2
17
13
14
Data
Page 3
18
16
Data
Page 4
Data
Page 5
Data
Page 6
Data
Page 7
30
35
Data
Page 8
43
Data
Page 9
Data
Page 10
Suppose we add a
bunch of 15 year olds
to the database
5
Data
Page 1
Data
Page 2
13
14
Data
Page 3
18
16
Data
Page 4
Overflow 1
Data
Page 5
Data
Page 6
Data
Page 7
30
35
Data
Page 8
43
Data
Page 9
Data
Page 10
B+-Trees Example
Root
17
Entries <= 17
5
2*
3*
Entries > 17
27
13
5*
7* 8*
14* 16*
22* 24*
30
27* 29*
Queries on
+
B -Trees
Queries on B+-Trees
Find 28*, Find 0*, Find all records > 25
Root
17
Entries <= 17
5
2*
3*
Entries > 17
27
13
5*
7* 8*
14* 16*
22* 24*
30
27* 29*
Updates on
+
B -Trees:
Insertion
Updates on
+
B -Trees:
13
17
24
Insertion
30
Insert 23
2*
3* 5*
7*
14* 16*
2*
3* 5*
7*
13
14* 16*
17
24
30
17
24
30
Insert 8
2*
3* 5*
7*
14* 16*
13
17
24
30
2*
3*
5*
7* 8*
14* 16*
Because the insertion will cause overfill, we split the leaf node into two nodes, we split the data
into two nodes (and distribute the data evenly between them). 5 is special, since it
discriminates between the two new siblings, so it is copied up.
We now need to insert 5 into the parent node
13
17
24
30
2*
3*
5*
7* 8*
14* 16*
17
2*
3*
5*
13
7* 8*
24
14* 16*
30
Because the insertion will cause overfill, we split the node into two nodes, we split the data into two nodes. 17 is special,
since it discriminates between the two new siblings, so it is pushed up.
Updates on
+
B -Trees:
Insertion
17
2*
3*
5*
13
7* 8*
24
14* 16*
30
17
2*
3*
5*
13
7* 8*
24
14* 16*
30