DBMS Unit III Notes
File Organization: It refers to the way in which data is stored in a file and
the methods by which it can be accessed. It refers to the logical structuring of
the records as determined by the way in which they are accessed. It is a way
of arranging the records in a file when the file is stored on disk.
File Operation: The typical operations that may be performed on the
information stored in the file are as follows:
1. Retrieve: Finds a record (or set of records) having a particular value
in a particular field, or whose field values satisfy certain conditions.
2. Insert: Inserts a record (or set of records) at some specific location.
3. Update: Modifies the field values of a record (or set of records).
4. Delete: Deletes a particular record (or set of records).
Blocks
Refers to a physical unit of storage in storage devices (for example,
sectors in hard disks, pages in virtual memory).
Usually (but not necessarily) stores records of a single file.
Of fixed length, based on the physical characteristics of the
storage/computing device and operating system.
Blocking Factor
The number of records that are stored in a block is called the “blocking
factor”. Blocking factor is constant across blocks if record length is fixed, or
variable otherwise.
If B is the block size and R is the record size, then the blocking factor is:
bfr = ⌊B / R⌋
Since R may not exactly divide B, there could be some left-over space in
each block equal to:
B − (bfr * R) bytes.
Spanned and Unspanned Records: when extra space in blocks is left
unused, the record organization is said to be UNSPANNED.
Block: | Record 1 | Record 2 | Record 3 | unused space |
In SPANNED record storage, records can be split so that they span across
blocks.
Block m: | Record 1 | Record 2 | Record 3 | Record 4 (part) | pointer P |
Block p: | Record 4 (remaining) | ...
When record size is greater than block size, use of “spanned” record storage
is compulsory.
Average Blocking Factor: For variable-length records, with the use of
record spanning, the "average" record length can be used in the bfr
computation. The number of blocks b required for a file of r records is then:
b = ⌈r / bfr⌉
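The blocking-factor and block-count formulas can be checked with a short sketch; the block size, record size, and record count below are illustrative values, not figures from the notes:

```python
import math

def blocking_factor(B, R):
    """Records per block (unspanned): bfr = floor(B / R)."""
    return B // R

def unused_bytes(B, R):
    """Left-over space per block: B - (bfr * R) bytes."""
    return B - blocking_factor(B, R) * R

def blocks_needed(r, bfr):
    """Blocks for a file of r records: b = ceil(r / bfr)."""
    return math.ceil(r / bfr)

B, R, r = 512, 100, 30000           # block size, record size, record count
bfr = blocking_factor(B, R)         # 5 records fit per 512-byte block
print(bfr, unused_bytes(B, R), blocks_needed(r, bfr))  # 5 12 6000
```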
Index File Organization: It organizes the file like a large dictionary; that is,
records are stored in order of the key, but an index is also kept, which
permits a type of direct access.
An index is a set of <index value, address> pairs. Indexing associates a set of
objects with a set of orderable quantities that are usually smaller in number.
Thus, an index is a mechanism for faster search.
Advantages
The indexed file organization permits the economical and efficient use of
sequential processing techniques when the activity rate is high.
This organization also provides quick access to records in a relatively
efficient way.
Records can be inserted or updated in the middle of the file.
Disadvantages
It requires relatively expensive hardware and software resources.
It requires unique keys.
Processing is occasionally slow.
Requires periodic re-organization of file.
Hashing
Another type of primary file organization is based on hashing, which
provides very fast access to records under certain search conditions. This
organization is usually called a hash file. The search condition must be an
equality condition on a single field, called the hash field. In most cases, the
hash field is also a key field of the file, in which case it is called the hash key.
The idea behind hashing is to provide a function h, called a hash function or
randomizing function, which is applied to the hash field value of a record and
yields the address of the disk block in which the record is stored. A search for
the record within the block can be carried out in a main memory buffer. For
most records, we need only a single-block access to retrieve that record.
In this technique, data is stored at the data blocks whose address is generated
by using the hashing function. The memory location where these records are
stored is known as data bucket or data blocks.
Advantages
Records need not be stored in order when they are added.
It gives the fastest retrieval of records.
Searching time depends on the mapping procedure, not on the logarithm of
the number of search keys as in a B-tree.
Disadvantages
In direct organization, the address must be calculated, so processing time
is consumed in computing the record address.
Addition and deletion are complicated in a direct file.
Expensive hardware and software are needed for direct organization.
External Hashing
Hashing for disk files is called external hashing.
Types of Hashing Approach: static hashing and dynamic hashing.
Static Hashing: In static hashing, the resultant data bucket address will
always be the same.
That means if we generate an address for EMP_ID =103 using the hash
function mod (5) then it will always result in same bucket address 3. Here,
there will be no change in the bucket address. Hence in this static hashing,
the number of data buckets in memory remains constant throughout.
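A minimal sketch of the EMP_ID example above, assuming the mod (5) hash function described in the text:

```python
def h(emp_id, M=5):
    """Static hashing: bucket address = h(K) = K mod M, fixed M."""
    return emp_id % M

# EMP_ID 103 always maps to bucket 3; the address never changes
# because the number of buckets M stays constant.
print(h(103))  # 3
```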
To suit the characteristics of disk storage, the target address space is made of
buckets, each of which holds multiple records. A bucket is either one disk
block or a cluster of contiguous disk blocks. The hashing function maps a key
into a relative bucket number, rather than assigning an absolute block address
to the bucket. A table maintained in the file header converts the bucket
number into the corresponding disk block address.
The collision problem is less severe with buckets, because as many records as
will fit in a bucket can hash to the same bucket without causing problems.
We can use a variation of chaining in which a pointer is maintained in each
bucket to a linked list of overflow records for the bucket.The pointers in the
linked list should be record pointers, which include both a block address and
a relative record position within the block.
The hashing scheme described so far is called static hashing because a fixed
number of buckets M is allocated. This can be a serious drawback for
dynamic files. Suppose that we allocate M buckets for the address space and
let m be the maximum number of records that can fit in one bucket; then at
most (m *M) records will fit in the allocated space. If the number of records
turns out to be substantially fewer than (m * M), we are left with a lot of
unused space. On the other hand, if the number of records increases to
substantially more than (m * M), numerous collisions will result and retrieval
will be slowed down because of the long lists of overflow records. In either
case, we may have to change the number of blocks M allocated and then use a
new hashing function (based on the new value of M) to redistribute
the records. These reorganizations can be quite time-consuming for large files.
Operations in Static Hashing:
Insertion − When a record is required to be entered using static hash,
the hash function h computes the bucket address for search key K,
where the record will be stored.
Bucket address = h(K)
Search − When a record needs to be retrieved, the same hash function
can be used to retrieve the address of the bucket where the data is
stored.
Delete − This is simply a search followed by a deletion operation.
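The three operations can be sketched as follows; the bucket count M, the bucket capacity m, and the list-based overflow chains are simplifying assumptions for illustration, not part of the notes:

```python
class StaticHashFile:
    """Static hashing sketch: M fixed buckets of capacity m,
    with a per-bucket list standing in for the overflow chain."""
    def __init__(self, M=5, m=2):
        self.M, self.m = M, m
        self.buckets = [[] for _ in range(M)]    # main buckets
        self.overflow = [[] for _ in range(M)]   # overflow chains

    def h(self, key):
        return key % self.M                      # bucket address = h(K)

    def insert(self, key):
        b = self.h(key)
        if len(self.buckets[b]) < self.m:
            self.buckets[b].append(key)
        else:                                    # bucket full: overflow
            self.overflow[b].append(key)

    def search(self, key):
        b = self.h(key)                          # same hash function
        return key in self.buckets[b] or key in self.overflow[b]

    def delete(self, key):
        """A search followed by removal."""
        b = self.h(key)
        for area in (self.buckets[b], self.overflow[b]):
            if key in area:
                area.remove(key)
                return True
        return False

f = StaticHashFile()
for k in (103, 108, 113, 7):                     # 103, 108, 113 all hash to 3
    f.insert(k)
print(f.search(113), f.delete(113), f.search(113))  # True True False
```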
Dynamic hashing approaches:
o Extendible Hashing
o Linear Hashing
Extendible Hashing: Extendible hashing has a dynamic structure that grows
and shrinks as the database grows and shrinks. This approach
simultaneously solves the problem of making hash tables extendible.
In extendible hashing, a type of directory (an array of 2^d bucket addresses)
is maintained, where d is called the global depth of the directory. The integer
value corresponding to the first (high-order) d bits of a hash value is used as
an index into the array to determine a directory entry, and the address in that
entry determines the bucket in which the corresponding records are stored.
However, there does not have to be a distinct bucket for each of the 2^d
directory locations. Several directory locations with the same first d′ bits of
their hash values may contain the same bucket address if all the records that
hash to these locations fit in a single bucket. A local depth d′, stored with
each bucket, specifies the number of bits on which the bucket contents are
based.
The following figure shows a directory with global depth d = 3.
To illustrate bucket splitting, suppose that a newly inserted record causes
overflow in the bucket whose hash values start with 01 (the third bucket in
the figure). The records will be distributed between two buckets: the first
contains all records whose hash values start with 010, and the second all
those whose hash values start with 011. Now the two directory locations for
010 and 011 point to the two new distinct buckets. Before the split, they
pointed to the same bucket. The local depth d′ of the two new buckets is 3,
which is one more than the local depth of the old bucket.
If a bucket that overflows and is split used to have a local depth d′ equal to
the global depth d of the directory, then the size of the directory must be
doubled so that an extra bit can be used to distinguish the two new buckets.
The main advantage of extendible hashing that makes it attractive is that the
performance of the file does not degrade as the file grows, as opposed to
static external hashing where collisions increase and the corresponding
chaining effectively increases the average number of accesses per key.
Additionally, no space is allocated in extendible hashing for future growth;
instead, additional buckets are allocated dynamically as needed.
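A toy sketch of extendible hashing with a small bucket capacity m; for simplicity it uses the low-order d bits of the key as the directory index, whereas the text describes the high-order-bits variant (everything else is an illustrative assumption):

```python
class Bucket:
    """A bucket with a local depth d' and a small fixed capacity."""
    def __init__(self, depth):
        self.depth = depth
        self.keys = []

class ExtendibleHash:
    def __init__(self, m=2):
        self.d = 1                            # global depth
        self.m = m                            # bucket capacity
        self.dir = [Bucket(1), Bucket(1)]     # 2^d directory entries

    def _index(self, key):
        return key & ((1 << self.d) - 1)      # low-order d bits (toy choice)

    def search(self, key):
        return key in self.dir[self._index(key)].keys

    def insert(self, key):
        b = self.dir[self._index(key)]
        if len(b.keys) < self.m:
            b.keys.append(key)
            return
        if b.depth == self.d:                 # d' == d: directory must double
            self.dir += self.dir
            self.d += 1
        # Split b: local depth grows by one, and directory entries whose
        # distinguishing bit is 1 are repointed to the new bucket.
        b.depth += 1
        new = Bucket(b.depth)
        bit = 1 << (b.depth - 1)
        for i, entry in enumerate(self.dir):
            if entry is b and i & bit:
                self.dir[i] = new
        old, b.keys = b.keys, []
        for k in old:                         # redistribute the old records
            self.dir[self._index(k)].keys.append(k)
        self.insert(key)                      # retry; may split again

eh = ExtendibleHash(m=2)
for k in (0, 2, 4, 6):
    eh.insert(k)
print(eh.d, eh.search(6))  # 2 True
```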
Linear Hashing: It is a dynamically updateable disk based index structure
which implements a hashing scheme and which grows or shrinks one bucket
at a time. The idea behind linear hashing is to allow a hash file to expand and
shrink its number of buckets dynamically without needing a directory.
Suppose that the file starts with M buckets numbered 0, 1, ..., M − 1 and uses
the mod hash function h_i(K) = K mod M; this is called the initial hash
function. Overflow because of collisions is still possible and can be handled
by maintaining individual overflow chains for each bucket. However, when a
collision leads to an overflow record in any file bucket, the first bucket in
the file (bucket 0) is split into two buckets: the original bucket 0 and a new
bucket M at the end of the file. The records originally in bucket 0 are
distributed between the two buckets based on a different hashing function,
h_{i+1}(K) = K mod 2M. A key property of the two hash functions h_i and
h_{i+1} is that any record that hashed to bucket 0 based on h_i will hash to
either bucket 0 or bucket M based on h_{i+1}; this is necessary for linear
hashing to work.
As further collisions lead to overflow records, additional buckets are split in
the linear order 1, 2, 3, .... If enough overflows occur, all the original file
buckets 0, 1, ..., M − 1 will have been split, so the file now has 2M buckets
instead of M, and all buckets use the hash function h_{i+1}. Hence, the
overflow records are eventually redistributed into regular buckets, using the
function h_{i+1} via a delayed split of their buckets. There is no directory;
only a value n, which is initially set to 0 and is incremented by 1 whenever a
split occurs, is needed to determine which buckets have been split.
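The bucket-address computation and the split bookkeeping above can be sketched as follows; only the addressing logic is shown, and record storage and overflow chains are omitted:

```python
class LinearHashSketch:
    """Linear hashing addressing: M initial buckets and a split
    pointer n; buckets below n have been split and use h_{i+1}."""
    def __init__(self, M=4):
        self.M = M
        self.n = 0                 # next bucket to split; 0 initially
        self.nbuckets = M

    def address(self, key):
        b = key % self.M           # h_i(K) = K mod M
        if b < self.n:             # bucket b was already split
            b = key % (2 * self.M) # h_{i+1}(K) = K mod 2M
        return b

    def split(self):
        """Split bucket n (triggered, e.g., by an overflow)."""
        self.n += 1
        self.nbuckets += 1
        if self.n == self.M:       # all original buckets split
            self.M *= 2            # h_{i+1} becomes the new h_i
            self.n = 0

lh = LinearHashSketch(M=4)
print(lh.address(9))   # 9 mod 4 = 1
lh.split()             # bucket 0 split into buckets 0 and 4
print(lh.address(8))   # 8 mod 4 = 0 < n, so 8 mod 8 = 0
print(lh.address(12))  # 12 mod 4 = 0 < n, so 12 mod 8 = 4
```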
Collision in Hashing:-
When two or more keys map to the same location in the hash table, it is
called a collision. A hash function h maps the keys k and j to the same slot,
so they collide.
Index structure:
Indexes can be created using some database columns.
o The first column of the index is the search key: it contains a copy of
the primary key or candidate key of the table. These values are stored
in sorted order so that the corresponding data can be accessed easily.
o The second column of the index is the data reference: it contains a set
of pointers holding the address of the disk block where the value of
the particular key can be found.
Indexing Methods
Primary Index
o If the index is created on the basis of the primary key of the table, then
it is known as primary indexing. These primary keys are unique to each
record, and there is a 1:1 relation between index entries and records.
o As primary keys are stored in sorted order, the performance of the
searching operation is quite efficient.
o Primary index is defined on an ordered data file. The data file is
ordered on a key field. The key field is generally the primary key of the
relation.
o The primary index can be classified into two types: Dense index and
Sparse index.
Dense index
o The dense index contains an index record for every search key value in
the data file. This makes searching faster.
o Here, the number of records in the index table is the same as the
number of records in the main table.
o It needs more space to store the index records themselves. Each index
record holds the search key value and a pointer to the actual record on
the disk.
Sparse index
o In the data file, an index record appears only for some items; each
entry points to a block.
o Here, instead of pointing to every record in the main table, the index
points to records in the main table at intervals.
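The difference between dense and sparse indexing can be sketched on a toy sorted file; the key values and the block size of three records are invented for illustration:

```python
import bisect

# Toy sorted data file, broken into blocks of three records each
blocks = [[100, 105, 110], [115, 120, 125], [130, 135, 140]]

# Dense index: one entry per search-key value -> (block, offset)
dense = {key: (bi, oi)
         for bi, blk in enumerate(blocks)
         for oi, key in enumerate(blk)}

# Sparse index: one entry per block (its first key)
sparse_keys = [blk[0] for blk in blocks]      # [100, 115, 130]

def sparse_lookup(key):
    """Find the last index key <= the search key, then scan that block."""
    bi = bisect.bisect_right(sparse_keys, key) - 1
    return key in blocks[bi]

print(dense[120])          # (1, 1): block 1, second record
print(sparse_lookup(125))  # True
```

The dense index answers a lookup in one step but has as many entries as the file has records; the sparse index holds one entry per block and finishes with a short scan inside the chosen block.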
Clustering Index
The previous scheme is a little confusing because one disk block is shared by
records which belong to different clusters. Using a separate disk block for
each cluster is the better technique.
Secondary Index
In sparse indexing, as the size of the table grows, the size of the mapping
also grows. These mappings are usually kept in primary memory so that
address fetches are faster; the secondary memory is then searched for the
actual data based on the address obtained from the mapping. If the mapping
size grows, fetching the address itself becomes slower, and the sparse index
is no longer efficient. To overcome this problem, secondary indexing is
introduced.
Secondary index may be generated from a field which is a candidate key and
has a unique value in every record, or a non-key with duplicate values.
For example:
o To find the record with roll number 111 in the diagram, search the
first-level index for the highest entry that is smaller than or equal to
111; this yields 100.
o Then, in the second index level, again find the highest entry that is
smaller than or equal to 111; this yields 110. Using address 110, go to
the data block and search each record sequentially until 111 is found.
o This is how a search is performed in this method. Inserting, updating,
or deleting is done in the same manner.
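The two-level search above can be sketched with toy index levels; the key values mirror the roll-number example, but the exact index layout is an assumption:

```python
import bisect

# First-level index keys, second-level segments per first-level key,
# and the data block addressed by second-level key 110 (toy layout).
first_level  = [1, 100, 200]
second_level = {1: [1, 50], 100: [100, 110], 200: [200, 250]}
data_blocks  = {110: [110, 111, 112]}

def lookup(roll):
    # Level 1: highest entry <= roll (yields 100 for roll 111)
    k1 = first_level[bisect.bisect_right(first_level, roll) - 1]
    seg = second_level[k1]
    # Level 2: highest entry <= roll (yields 110 for roll 111)
    k2 = seg[bisect.bisect_right(seg, roll) - 1]
    # Sequential scan of the data block
    return roll in data_blocks.get(k2, [])

print(lookup(111))  # True
```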
Multilevel Index
o Index records comprise search-key values and data pointers. Multilevel
index is stored on the disk along with the actual database files. As the
size of the database grows, so does the size of the indices. There is an
immense need to keep the index records in the main memory so as to
speed up the search operations. If single-level index is used, then a
large size index cannot be kept in memory which leads to multiple disk
accesses.
B-Tree
A B-tree of order p, when used as an access structure on a key field to search
for records in a data file, can be defined as follows:
1. Each internal node in the B-tree is of the form
<P_1, <K_1, Pr_1>, P_2, <K_2, Pr_2>, ..., <K_{q−1}, Pr_{q−1}>, P_q>
where q ≤ p. Each P_i is a tree pointer (a pointer to another node in the
B-tree). Each Pr_i is a data pointer (a pointer to the record whose search
key field value is equal to K_i, or to the data file block containing that
record).
2. Within each node, K_1 < K_2 < ... < K_{q−1}.
3. For all search key field values X in the subtree pointed at by P_i (the ith
subtree), we have:
K_{i−1} < X < K_i for 1 < i < q; X < K_i for i = 1; and K_{i−1} < X for i = q.
4. Each node has at most p tree pointers.
5. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree
pointers. The root node has at least two tree pointers unless it is the only
node in the tree.
6. A node with q tree pointers, q ≤ p, has q − 1 search key field values
(and hence has q − 1 data pointers).
7. All leaf nodes are at the same level. Leaf nodes have the same structure
as internal nodes except that all of their tree pointers P_i are NULL.
Example of a B-tree
Keys: 1, 12, 8, 2, 25, 6, 14, 28, 17
Order = 5
Procedure for adding a key to a B-tree:
Step 1. Insert the first key as the root node.
Step 2. Insert each subsequent key into the appropriate node in sorted order.
Step 3. Repeat this process until a node becomes full; when a node
overflows, the splitting process is applied: the median key moves up into the
parent and the remaining keys are divided between two nodes.
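The steps above can be sketched as a minimal B-tree insert of order p; this is a simplified illustration (data pointers and disk blocks are omitted), not the notes' full definition:

```python
class Node:
    def __init__(self, leaf=True):
        self.keys, self.children, self.leaf = [], [], leaf

class BTree:
    """Minimal B-tree of order p: a node overflows on reaching p keys,
    and the median key then moves up into the parent."""
    def __init__(self, p=5):
        self.p = p
        self.root = Node()

    def insert(self, k):
        path = []
        node = self.root
        while not node.leaf:                     # descend to a leaf
            i = sum(key < k for key in node.keys)
            path.append((node, i))
            node = node.children[i]
        node.keys.insert(sum(key < k for key in node.keys), k)
        # Walk back up, splitting any node that overflowed.
        while len(node.keys) == self.p:          # overflow: p keys
            mid = self.p // 2
            right = Node(node.leaf)
            right.keys = node.keys[mid + 1:]
            right.children = node.children[mid + 1:]
            median = node.keys[mid]              # moves up to the parent
            node.keys = node.keys[:mid]
            node.children = node.children[:mid + 1]
            if path:
                parent, i = path.pop()
                parent.keys.insert(i, median)
                parent.children.insert(i + 1, right)
                node = parent
            else:                                # split of the root itself
                new_root = Node(leaf=False)
                new_root.keys = [median]
                new_root.children = [node, right]
                self.root = new_root
                break

t = BTree(p=5)
for k in (1, 12, 8, 2, 25, 6, 14, 28, 17):       # the keys from the example
    t.insert(k)
print(t.root.keys)  # [8, 17]
```

With order 5, the fifth key (25) overflows the root and 8 moves up; later, 17 overflows the right leaf and also moves up, leaving root keys [8, 17].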
Structure of B+ Tree
Every leaf node is at an equal distance from the root node. A B+-tree is of
order n, where n is fixed for every B+-tree.
In a B-tree, every value of the search field appears once at some level in the
tree, along with a data pointer. In a B+-tree, data pointers are stored only at
the leaf nodes of the tree; hence, the structure of leaf nodes differs from the
structure of internal nodes. The leaf nodes have an entry for every value of
the search field, along with a data pointer to the record (or to the block that
contains this record) if the search field is a key field. For a nonkey search
field, the pointer points to a block containing pointers to the data file
records, creating an extra level of indirection.
B-Tree vs. B+-Tree (comparison figure)
Example
Delete the key 200 from the B+-tree shown in the following figure.
After deleting 200: