Database Management System

A Database Management System (DBMS) is a collection of interrelated data and a set of programs to access those data. It manages large amounts of data and supports efficient access to them.
Features of Database:

Faithfulness: The design and implementation should be faithful to the requirements.

Avoid Redundancy: Redundant storage of the same information wastes space and can lead to inconsistency, so the design should avoid it.

Simplicity: Simplicity requires that the design and implementation avoid introducing more
elements than are absolutely necessary.

Right kind of element: Attributes are easier to implement, but sometimes an entity set or a relationship set is the appropriate choice; the design should introduce the kind of element that fits the concept being modelled.

Types of Database

Centralized Database: All data is located at a single site.

Distributed Database: The database is stored on several computers.

The information contained in a database is represented on two levels:

1. Data (which is large and is being frequently modified)


2. Structure of data (which is small and stable in time)
Database Management System (DBMS) provides efficient, reliable, convenient and safe multi user
storage of and access to massive amounts of persistent data.

Key People Involved in a DBMS:

DBMS Implementer: Person who builds the system.

Database Designer: Person responsible for preparing external schemas for applications,
identifying and integrating user needs into a conceptual (or community or enterprise)
schema.

Database Application Developer: Person responsible for implementing database


application programs that facilitate data access for end users.

Database Administrator: Person responsible for defining the internal schema and sub-schemas
(with database designers), specifying mappings between schemas, monitoring
database usage and supervising DBMS functionality (e.g., access control, performance
optimisation, backup and recovery policies, conflict management).

End Users: Users who query and update the database through fixed programs (invoked by
non-programmer users) e.g., banking.

Levels of Data Abstraction: A 3-tier architecture separates its tiers from each other based on the
complexity of the users and how they use the data present in the database. It is the most widely
used architecture to design a DBMS.

Physical Level: It is the lowest level of abstraction and describes how the data are actually
stored, in terms of complex low-level data structures.

Logical Level: It is the next higher level of abstraction and describes what data are stored
and what relationships exist among those data. At the logical level, each such record is
described by a type definition and the interrelationship of these record types is defined as
well. Database administrators usually work at this level of abstraction.

View Level: It is the highest level of abstraction and describes only part of the entire
database and hides the details of the logical level.
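The separation between the logical and view levels can be illustrated with SQLite: a table defines what data are stored (logical level), while a view exposes only part of it. The table and view names below (student, student_public) are illustrative, not from the source.

```python
import sqlite3

# Logical level: a record type definition describing what data are stored.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, dob TEXT, gpa REAL)")
conn.execute("INSERT INTO student VALUES (1, 'Asha', '2001-05-17', 8.2)")

# View level: describes only part of the database, hiding logical details.
conn.execute("CREATE VIEW student_public AS SELECT id, name FROM student")
rows = conn.execute("SELECT * FROM student_public").fetchall()
print(rows)  # dob and gpa are hidden behind the view
```

A query against the view never sees the dob or gpa columns, even though they exist at the logical level.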

Relational Algebra:
Relational model is completely based on relational algebra. It consists of a collection of operators
that operate on relations. Its main objective is data retrieval. It is more operational and very much
useful to represent execution plans, while relational calculus is non-operational and declarative.
Here, declarative means user define queries in terms of what they want, not in terms of how
compute it.
Basic Operation in Relational Algebra
The operations in relational algebra are classified as follows.
Selection (σ): The select operation selects tuples/rows that satisfy a given predicate or condition.
We use σ to denote selection; the predicate/condition appears as a subscript to σ.
Projection (π): It selects only the required/specified columns/attributes from a given relation/table.
The projection operator eliminates duplicates (i.e., duplicate rows from the result relation).
Union (∪): It forms a relation from rows/tuples which appear in either or both of the
specified relations. For a union operation R ∪ S to be valid, the two conditions below must be satisfied.

The relations R and S must be of the same arity, i.e., they must have the same number of
attributes.

The domains of the ith attribute of R and the ith attribute of S must be the same, for all i.

Intersection (∩): It forms a relation of rows/tuples which are present in both the relations R and S.
Both relations must be union-compatible for the intersection to be valid.
Set Difference (−): It allows us to find tuples that are in one relation but not in another. The
expression R − S produces a relation containing those tuples in R but not in S.
Cross Product/Cartesian Product (×): Assume that we have n1 tuples in R and n2 tuples in S.
Then, there are n1 * n2 ways of choosing a pair of tuples, one from each relation. So, there will
be n1 * n2 tuples in the result relation P if P = R × S.
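The basic operators above can be sketched over relations represented as lists of dicts. This is an illustrative Python sketch of the algebra's semantics, not how a DBMS implements it; the sample relations R and S are made up.

```python
# Relations as lists of dicts (tuples with named attributes).
R = [{"id": 1, "city": "Delhi"}, {"id": 2, "city": "Pune"}]
S = [{"id": 2, "city": "Pune"}, {"id": 3, "city": "Agra"}]

def select(rel, pred):            # sigma: tuples satisfying a predicate
    return [t for t in rel if pred(t)]

def project(rel, attrs):          # pi: chosen columns, duplicates removed
    seen, out = set(), []
    for t in rel:
        row = tuple(t[a] for a in attrs)
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def union(r, s):                  # r ∪ s over union-compatible relations
    return r + [t for t in s if t not in r]

def difference(r, s):             # r − s: tuples in r but not in s
    return [t for t in r if t not in s]

def product(r, s):                # r × s: every pair of tuples
    return [(t1, t2) for t1 in r for t2 in s]

print(project(R, ["city"]))
print(len(product(R, S)))  # n1 * n2 pairs
```

Selection and projection take a single relation; union, difference and product combine two, exactly as the definitions above state.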
Schema:
A schema is also known as database schema. It is a logical design of the database and a database
instance is a snapshot of the data in the database at a given instant of time. A relational schema
consists of a list of attributes and their corresponding domains.
Types of Schemas: It can be classified into three parts, according to the levels of abstraction

Physical/Internal Schema: Describes the database design at the physical level.

Logical/Conceptual Schema/Community User View: Describes the database design at the logical level.

Sub-schemas/Views/External Schema: Describe different views of the database. Views may be
queried, combined in queries with base relations, and used to define other views, but in
general they cannot be updated freely.

Data model :
A data model is a plan for building a database. Data models define how data is connected to each
other and how they are processed and stored inside the system.
Two widely used data models are:

Object based logical model

Record based logical model

Entity :
An entity may be an object with a physical existence or it may be an object with a conceptual
existence. Each entity has attributes. An entity is a thing (animate or inanimate) of independent
physical or conceptual existence, distinguishable from other objects. In the University database
context, an individual student, a faculty member, a classroom and a course are entities.
Attributes
Each entity is described by a set of attributes/properties.
Types of Attributes

Simple Attributes: having atomic or indivisible values. Example: Dept (a string), Phone
Number (an eight-digit number).

Composite Attributes: having several components in the value. example: Qualification with
components (Degree Name, Year, University Name)

Derived Attributes: Attribute value is dependent on some other attribute. example: Age
depends on Date Of Birth. So age is a derived attribute.

Single-valued: having only one value rather than a set of values. for instance, Place Of
Birth - single string value.

Multi-valued: having a set of values rather than a single value. For instance, the Courses
Enrolled, Email Address and Previous Degree attributes of a student.

Attributes can be: simple single-valued, simple multi-valued, composite single-valued or
composite multi-valued.

Keys

A super key of an entity set is a set of one or more attributes whose values uniquely
determine each entity.

A candidate key of an entity set is a minimal super key.

Customer-id is candidate key of customer

account-number is candidate key of account

Although several candidate keys may exist, one of the candidate keys is selected to be
the primary key.
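A super key can be tested against a relation instance by checking whether the chosen attributes uniquely determine each row. The sketch below, using a hypothetical customer relation, also enumerates minimal such sets; note that an instance can only refute key candidates, never prove them in general, so this is an illustration of the definitions, not a design tool.

```python
from itertools import combinations

# Hypothetical customer entity set as rows of attribute values.
rows = [
    {"customer_id": 1, "name": "Ravi", "city": "Delhi"},
    {"customer_id": 2, "name": "Ravi", "city": "Pune"},
    {"customer_id": 3, "name": "Mira", "city": "Delhi"},
]

def is_super_key(rows, attrs):
    # attrs is a super key if its value combination is unique per row
    values = [tuple(r[a] for a in attrs) for r in rows]
    return len(set(values)) == len(values)

def candidate_keys(rows):
    # minimal super keys: no proper subset is itself a super key
    attrs = sorted(rows[0])
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            if is_super_key(rows, combo) and not any(set(k) <= set(combo) for k in keys):
                keys.append(combo)
    return keys

print(candidate_keys(rows))
```

Here customer_id alone is a candidate key, while name is not even a super key because two customers share the name Ravi.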

Keys for Relationship Sets


The combination of primary keys of the participating entity sets forms a super key of a relationship
set.
(customer-id, account-number) is the super key of depositor

NOTE: this means a pair of entity sets can have at most one relationship in a particular
relationship set.

If we wish to track all access-dates to each account by each customer, we cannot assume a
relationship for each access. We can use a multivalued attribute though.

Must consider the mapping cardinality of the relationship set when deciding what the
candidate keys are.

Need to consider semantics of relationship set in selecting the primary key in case of more
than one candidate key.

ER Modeling:
Entity-Relationship model (ER model) in software engineering is an abstract way to describe a
database. Describing a database usually starts with a relational database, which stores data in
tables.
Notations/Shapes in ER Modeling:

The overall logical structure of a database can be expressed graphically by an E-R diagram. The
diagram consists of the following major components.

Rectangles: represent entity sets.

Ellipses: represent attributes.

Diamonds: represent relationship sets.

Lines: link attributes to entity sets and entity sets to relationship sets.

Double ellipses: represent multi-valued attributes.

Dashed ellipses: denote derived attributes.

Double lines: represent total participation of an entity in a relationship set.

Double rectangles: represent weak entity sets.

Mapping Cardinalities / Cardinality Ratio / Types of Relationship:


Expresses the number of entities to which another entity can be associated via a relationship set.
For a binary relationship set R between entity sets A and B, the mapping cardinality must be one of
the following:

One to One: An entity in A is associated with at most one entity in B and an entity in B is
associated with at most one entity in A.

One to Many: An entity in A is associated with any number (zero or more) of entities in B.
An entity in B, however, can be associated with at most one entity in A.

Many to Many: An entity in A is associated with any number (zero or more) of entities in B
and an entity in B is associated with any number (zero or more) of entities in A.

Specialization: Consider an entity set person with attributes name, street and city. A person may
be further classified as one of the following: customer and employee. Each of these person types
is described by a set of attributes that includes all the attributes of entity set person plus possibly
additional attributes. The process of designating subgroupings within an entity set is called
specialization.
The specialization of person allows us to distinguish among persons according to whether they are
employees or customers.
The refinement from an initial entity set into successive levels of entity subgroupings represents a
top-down design process in which distinctions are made explicitly.
Generalization: Basically generalization is a simple inversion of specialization. Some common
attributes of multiple entity sets are chosen to create higher level entity set. If the customer entity
set and the employee entity set are having several attributes in common, then this commonality
can be expressed by generalization.
Here, person is the higher level entity set and customer and employee are lower level entity sets.
Higher and lower level entity sets may also be designated by the terms superclass and subclass,
respectively.
Aggregation: Aggregation is used when we have to model a relationship involving an entity set and a
relationship set. Aggregation is an abstraction through which relationships are treated as higher
level entities.
Integrity Constraints:
Necessary conditions to be satisfied by the data values in the relational instances so that the set of
data values constitute a meaningful database.
There are four types of Integrity constraints
Domain Constraint: The value of attribute must be within the domain.

Key Constraint: Every relation must have a primary key.


Entity Integrity Constraint: The primary key of a relation should not contain NULL values.
Referential Integrity Constraint: In the relational model, two relations are related to each other on
the basis of attributes. Every value of the referencing attribute must either be NULL or be available
in the referenced attribute.
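The referential integrity rule can be observed directly in SQLite, which enforces foreign keys once the corresponding pragma is enabled. Table and column names below are illustrative, echoing the customer/account example used earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
conn.execute("CREATE TABLE customer (customer_id INTEGER PRIMARY KEY)")
conn.execute("""CREATE TABLE account (
    account_number INTEGER PRIMARY KEY,
    customer_id INTEGER REFERENCES customer(customer_id))""")

conn.execute("INSERT INTO customer VALUES (1)")
conn.execute("INSERT INTO account VALUES (100, 1)")     # referenced value exists: OK
conn.execute("INSERT INTO account VALUES (101, NULL)")  # NULL reference: also OK

violation = ""
try:
    conn.execute("INSERT INTO account VALUES (102, 99)")  # 99 not in customer
except sqlite3.IntegrityError as e:
    violation = str(e)
print(violation)
```

The first two inserts succeed (an existing referenced value, and NULL), while the third is rejected, matching the constraint as stated above.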
Schema Refinement/Normalization
Decomposition of complex records into simple records. Normalization reduces redundancy using
the non-loss decomposition principle.
Decomposition
Splitting a relation R into two or more sub-relations R1 and R2. A fully normalized relation must have
a primary key and a set of attributes.
Decomposition should satisfy: (i) lossless join, and (ii) dependency preservation.
Lossless Join Decomposition
A join between the sub-relations should not create any additional tuples; the original relation must
be recoverable exactly:
R ⊂ R1 ⋈ R2 (lossy)
R = R1 ⋈ R2 (lossless)
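The lossless versus lossy cases can be demonstrated on a tiny instance by projecting a relation two ways and naturally joining the parts back. A Python sketch, with a hypothetical relation R(sid, course, fee) that is not from the source:

```python
R = [
    {"sid": 1, "course": "DB", "fee": 500},
    {"sid": 2, "course": "OS", "fee": 500},
]

def natural_join(r, s):
    # join on all attributes the two relations share
    common = set(r[0]) & set(s[0])
    return [{**t1, **t2} for t1 in r for t2 in s
            if all(t1[a] == t2[a] for a in common)]

def project(rel, attrs):
    out = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in out:
            out.append(row)
    return out

# Lossless: the common attribute sid uniquely identifies rows of each part.
R1, R2 = project(R, ["sid", "course"]), project(R, ["sid", "fee"])
recovered = natural_join(R1, R2)

# Lossy: joining on the non-key attribute fee creates spurious tuples.
R3, R4 = project(R, ["sid", "fee"]), project(R, ["course", "fee"])
joined = natural_join(R3, R4)
print(len(joined))  # more tuples than R has
```

Joining R1 and R2 reproduces R exactly, while joining R3 and R4 pairs every student with every course sharing the same fee, i.e., R ⊂ R3 ⋈ R4.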
Dependency Preservence: Because of decomposition, there must not be loss of any single
dependency.
Functional Dependency (FD): A dependency between attributes is known as a functional
dependency. Let R be the relational schema, X, Y be non-empty sets of attributes and t1,
t2, ..., tn be the tuples of relation R. X → Y means the values for X functionally determine the values for Y.
Trivial Functional Dependency: If Y ⊆ X, then X → Y is a trivial FD.

Here, X and Y are set of attributes of a relation R.


In a trivial FD, there must be a common attribute on both sides of the arrow.
Non-Trivial Functional Dependency: If X ∩ Y = ∅ (no common attributes) and X → Y satisfies FD,
then it will be a non-trivial FD.
(no common attribute on either side of the arrow)

Case of semi-trivial FD
Sid → Sid Sname (semi-trivial)
Because on decomposition, we will get
Sid → Sid (trivial FD) and
Sid → Sname (non-trivial FD)
Properties of Functional Dependency (FD)

Reflexivity: If Y ⊆ X, then X → Y (trivial)

Transitivity: If X → Y and Y → Z, then X → Z

Augmentation: If X → Y, then XZ → YZ

Splitting or Decomposition: If X → YZ, then X → Y and X → Z

Union: If X → Y and X → Z, then X → YZ

Attribute Closure: Suppose R(X, Y, Z) is a relation having the set of attributes (X, Y, Z); then X+
denotes the attribute closure of X: the set of attributes functionally determined by X (if not all
attributes, then at least X itself).
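The closure X+ can be computed by repeatedly applying any FD whose left side is already contained in the result. A minimal Python sketch, assuming FDs are given as (LHS, RHS) strings of single-letter attributes (an assumption for illustration):

```python
def closure(attrs, fds):
    # X+ : start from X and absorb the right side of every applicable FD
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical relation R(A, B, C, D) with A -> B and B -> C.
fds = [("A", "B"), ("B", "C")]
print(sorted(closure("A", fds)))
```

Starting from {A}, the FD A → B adds B and then B → C adds C, so A+ = {A, B, C}, while D+ stays {D}.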
Normal Forms/Normalization:
In relational database design, the normalization is the process for organizing data to minimize
redundancy. Normalization usually involves dividing a database into two or more tables and
defining relationship between the tables. The normal forms define the status of the relation about
the individuated attributes. There are five types of normal forms
First Normal Form (1NF): Relation should not contain any multivalued attributes or relation should
contain atomic attribute. The main disadvantage of 1NF is high redundancy.
Second Normal Form (2NF): Relation R is in 2NF if and only if R should be in 1NF, and R should
not contain any partial dependency.
Partial Dependency: Let R be the relational schema having X, Y, A, which are non-empty sets of
attributes, where X = any candidate key of the relation, Y = a proper subset of any candidate key,
and A = a non-prime attribute (i.e., A doesn't belong to any candidate key).

In the above example, X → A already exists, and if Y → A also holds, then it becomes a partial
dependency, if and only if

Y is a proper subset of a candidate key.

A is a non-prime attribute.

If either of the above two conditions fails, then Y → A is instead a fully functional dependency.
Full Functional Dependency: A functional dependency P → Q is said to be a fully functional
dependency if removal of any attribute S from P means that the dependency no longer holds.
(Student_Name, College_Name → College_Address)

Suppose the above functional dependency is a full functional dependency; then we must ensure
that there are no FDs such as
(Student_Name → College_Address)
or (College_Name → College_Address)
Third Normal Form (3NF): Let R be a relational schema. Any non-trivial FD X → Y over R satisfies
3NF if X is a candidate key or super key, or Y is a prime attribute.

At least one of the above conditions must be true.

R should not contain any transitive dependency.

For a relation schema R to be in 3NF, it is necessary for it to be in 2NF.

Transitive Dependency: An FD, P → Q, in a relation schema R is transitive if

There is a set of attributes Z that is not a subset of any key of R.

Both P → Z and Z → Q hold.

The above relation is in 2NF.

In relation R1, C is not a candidate key and D is a non-prime attribute. Due to this, R1 fails to
satisfy the 3NF condition; a transitive dependency is present here:

If AB → C and C → D, then AB → D is transitive.


Boyce-Codd Normal Form (BCNF): Let R be the relation schema and X → Y be any non-trivial FD over R. R is in BCNF if and only if X is a candidate key or super key.

If R satisfies this condition, then of course it satisfies 2NF and 3NF.

Summary of 1 NF, 2 NF and 3 NF:

Fourth Normal Form (4NF): 4NF is mainly concerned with multivalued dependency. A relation is in
4NF if and only if, for every one of its non-trivial multivalued dependencies X ↠ Y, X is a super
key (i.e., X is either a candidate key or a superset of one).
Fifth Normal Form (5NF): It is also known as Project-Join Normal Form (PJ/NF). 5NF reduces
redundancy in relational databases recording multivalued facts by isolating semantically related
multiple relationships. A table or relation is said to be in 5NF if and only if every join
dependency in it is implied by the candidate keys.
SQL:
Structured Query Language (SQL) is a language that provides an interface to relational database
systems. SQL was developed by IBM in the 1970s for use in System R and is a de facto standard,
as well as an ISO and ANSI standard.

To deal with the above database objects, we need a programming language and that
programming language is known as SQL.

The subordinate languages of SQL are:


Type of SQL Statement                SQL Keywords                                Function

Data Definition Language (DDL)       CREATE, ALTER, DROP                         Used to define, change and drop the
                                                                                 structure of a table
                                     TRUNCATE                                    Used to remove all rows from a table

Data Manipulation Language (DML)     SELECT, INSERT INTO, UPDATE, DELETE FROM    Used to enter, modify, delete and
                                                                                 retrieve data from a table

Data Control Language (DCL)          GRANT, REVOKE                               Used to provide control over the data
                                                                                 in a database

Transaction Control Language (TCL)   COMMIT, ROLLBACK                            Used to define the end of a transaction

Data Definition Language (DDL) :


It includes the commands as

CREATE To create tables in the database.

ALTER To modify the existing table structure.

DROP To drop the table with table structure.

Data Manipulation Language(DML)


It is used to insert, delete, update data and perform queries on these tables. Some of the DML
commands are given below.

INSERT To insert data into the table.

SELECT To retrieve data from the table.

UPDATE To update existing data in the table.

DELETE To delete data from the table.

Data Control Language (DCL)


It is used to control user's access to the database objects. Some of the DCL commands are:

GRANT Used to grant select/insert/delete access.

REVOKE Used to revoke the provided access

Transaction Control Language (TCL): It is used to manage changes affecting the data.

COMMIT To save the work done, such as inserting or updating or deleting data to/from the
table.

ROLLBACK To restore database to the original state, since last commit.

SQL Data Types: SQL data types specify the type, size and format of data/information that
can be stored in columns and variables.

Constraint Types with Description


Default Constraint: It is used to insert a default value into a column, if no other value is specified
at the time of insertion.
Syntax
CREATE TABLE Employee
(
Emp_id INT NOT NULL,
Last_Name VARCHAR (250),
City VARCHAR (50) DEFAULT 'BANGALURU'
);
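The effect of the default constraint can be verified by running the same table definition in SQLite: inserting a row that omits City yields the default value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE Employee (
    Emp_id    INT NOT NULL,
    Last_Name VARCHAR(250),
    City      VARCHAR(50) DEFAULT 'BANGALURU')""")

# City is omitted, so the DEFAULT value is inserted for it.
conn.execute("INSERT INTO Employee (Emp_id, Last_Name) VALUES (1, 'Rao')")
city = conn.execute("SELECT City FROM Employee WHERE Emp_id = 1").fetchone()[0]
print(city)
```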
DDL Commands
1. CREATE TABLE <Table_Name>
(
Column_name1 <data_type>,
Column_name2 <data_type>
);
2. ALTER TABLE <Table_Name>
ALTER COLUMN <Column_Name> SET NOT NULL;
3. RENAME <object_type> <object_name> TO <new_name>;
4. DROP TABLE <Table_Name>;
DML Commands
SELECT A1, A2, A3, ..., An -- what to return
FROM R1, R2, R3, ..., Rm -- relations or tables
WHERE condition -- filter condition, i.e., on what basis we want to restrict the outcome/result
If we want to write the above SQL script in the form of relational calculus, we use the following
syntax
Comparison operators which we can use in the filter condition are (=, >, <, >=, <=, <>); <> means
not equal to.
INSERT Statement: Used to add row(s) to a table in a database.
INSERT INTO Employee (F_Name, L_Name) VALUES ('Atal', 'Bihari');
UPDATE Statement: It is used to modify/update or change existing data in a single row, a group of
rows or all the rows in a table.
Example:
-- Updates some rows in a table
UPDATE Employee
SET City = 'LUCKNOW'
WHERE Emp_Id BETWEEN 9 AND 15;
-- Updates the City column for all the rows
UPDATE Employee SET City = 'LUCKNOW';
DELETE Statement: This is used to delete rows from a table.
Example:
-- Delete the row with Emp_Id 7
DELETE FROM Employee
WHERE Emp_Id = 7;
-- Delete all the rows from the Employee table
DELETE FROM Employee;
ORDER BY Clause: This clause is used to sort the result of a query in a specific order (ascending
or descending); by default the sorting order is ascending.
SELECT Emp_Id, Emp_Name, City FROM Employee
WHERE City = 'LUCKNOW'
ORDER BY Emp_Id DESC;
GROUP BY Clause: It is used to divide the result set into groups. Grouping can be done by a
column name or by the results of computed columns when using numeric data types.

The HAVING clause can be used to set conditions for the GROUP BY clause.

HAVING clause is similar to the WHERE clause, but having puts conditions on groups.

WHERE clause places conditions on rows.

A WHERE clause can't include aggregate functions, while HAVING conditions can.

Example:
SELECT Emp_Id, AVG (Salary)
FROM Employee
GROUP BY Emp_Id
HAVING AVG (Salary) > 25000;
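The same GROUP BY/HAVING query can be run against SQLite with a few sample rows (the salary figures are made up): only groups whose average salary exceeds 25000 survive the HAVING filter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Employee (Emp_Id INTEGER, Salary INTEGER)")
conn.executemany("INSERT INTO Employee VALUES (?, ?)",
                 [(1, 20000), (1, 24000), (2, 30000), (2, 26000)])

# HAVING filters whole groups after aggregation; WHERE would filter rows first.
rows = conn.execute("""
    SELECT Emp_Id, AVG(Salary)
    FROM Employee
    GROUP BY Emp_Id
    HAVING AVG(Salary) > 25000
""").fetchall()
print(rows)
```

Employee 1 averages 22000 and is dropped; employee 2 averages 28000 and is kept.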
Aggregate Functions
Joins: Joins are needed to retrieve data from related rows of two tables on the basis of some
condition satisfied across both tables. A mandatory condition for a join is that at least one set of
columns should take values from the same domain in each table.
Inner Join: Inner join is the most common join operation used in applications and can be regarded
as the default join-type. Inner join creates a new result table by combining column values of two
tables (A and B) based upon the join-predicate. These may be further divided into three parts.
1. Equi Join (satisfies equality condition)
2. Non-Equi Join (satisfies non-equality condition)
3. Self Join (one or more column assumes the same domain of values).
Outer Join: An outer join does not require each record in the two joined tables to have a matching
record. The joined table retains each record-even if no other matching record exists.

Considers also the rows from table (s) even if they don't satisfy the joining condition
(i) Right outer join (ii) Left outer join (iii) Full outer join

Left Outer Join: The result of a left outer join for table A and B always contains all records of the
left table (A), even if the join condition does not find any matching record in the right table (B).
Result set of T1 and T2
Right Outer Join: A right outer join closely resembles a left outer join, except with the treatment of the
tables reversed. Every row from the right table will appear in the joined table at least once. If no
match with the left table exists, NULL will appear.

Result set of T1 and T2

Full Outer Join: A full outer join combines the effect of applying both left and right outer joins.
Where records in the joined tables do not match, the result set will have NULL values
for every column of the table that lacks a matching row; for those records that do match, a single
row is produced in the result set.

Result set of T1 and T2 (Using tables of previous example)

Cross Join (Cartesian Product): A cross join returns the Cartesian product of rows from the tables in the
join. It produces rows which combine each row from the first table with each row from the
second table.
SELECT * FROM T1, T2
Number of rows in result set = (Number of rows in table 1 × Number of rows in table 2)
Result set of T1 and T2 (Using previous tables T1 and T2)
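The join types above can be compared side by side in SQLite on two small illustrative tables T1 and T2 (contents made up; full outer join is omitted since older SQLite versions lack it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (id INTEGER, name TEXT)")
conn.execute("CREATE TABLE T2 (id INTEGER, dept TEXT)")
conn.executemany("INSERT INTO T1 VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
conn.executemany("INSERT INTO T2 VALUES (?, ?)", [(2, "Sales"), (3, "HR")])

# Inner join: only rows satisfying the join predicate survive.
inner = conn.execute(
    "SELECT T1.id, name, dept FROM T1 JOIN T2 ON T1.id = T2.id").fetchall()

# Left outer join: every T1 row is retained; unmatched rows get NULL dept.
left = conn.execute(
    "SELECT T1.id, name, dept FROM T1 LEFT JOIN T2 ON T1.id = T2.id "
    "ORDER BY T1.id").fetchall()

# Cross join: Cartesian product, 2 * 2 rows.
cross = conn.execute("SELECT * FROM T1, T2").fetchall()

print(inner)
print(left)
print(len(cross))
```

The inner join keeps only id 2, the left outer join keeps Asha with a NULL department, and the cross join produces all four row combinations.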

Storage Structure:
The storage structure can be divided into two categories:
Volatile storage: As the name suggests, volatile storage cannot survive system crashes. Volatile
storage devices are placed very close to the CPU; normally they are embedded on the chipset
itself. Main memory and cache memory are examples of volatile storage. They are fast but can
store only a small amount of information.
Non-volatile storage: These memories are made to survive system crashes. They are huge in
data storage capacity, but slower in accessibility. Examples may include hard-disks, magnetic
tapes, flash memory, and non-volatile (battery backed up) RAM.

File Organisation:
The database is stored as a collection of files. Each file is a sequence of records. A record is a
sequence of fields. Data is usually stored in the form of records. Records usually describe entities
and their attributes. e.g., an employee record represents an employee entity and each field value in
the record specifies some attributes of that employee, such as Name, Birth-date, Salary or
Supervisor.
Allocating File Blocks on Disk: There are several standard techniques for allocating the blocks of
a file on disk

Contiguous Allocation: The file blocks are allocated to consecutive disk blocks. This
makes reading the whole file very fast.

Linked Allocation: In this, each file contains a pointer to the next file block.

Indexed Allocation: Where one or more index blocks contain pointers to the actual file
blocks.

Files of Unordered Records (Heap Files): In the simplest type of organization records are placed
in the file in the order in which they are inserted, so new records are inserted at the end of the file.
Such an organisation is called a heap or pile file.
This organisation is often used with additional access paths, such as the secondary indexes.
In this type of organisation, inserting a new record is very efficient. Linear search is used to search
a record.
Files of Ordered Records (Sorted Files): We can physically order the records of a file on disk
based on the values of one of their fields called the ordering field. This leads to an ordered or
sequential file. If the ordering field is also a key field of the file, a field guaranteed to have a unique
value in each record, then the field is called the ordering key for the file. Binary searching is used
to search a record.
Indexing Structures for Files: Indexing mechanisms are used to optimize certain accesses to data
(records) managed in files. e.g., the author catalog in a library is a type of index. A search key is
an attribute or combination of attributes used to look up records in a file.
An index file consists of records (called index entries) of the form.

Index files are typically much smaller than the original file because only the values for the search key
and a pointer are stored. The most prevalent types of indexes are based on ordered files (single-level indexes) and tree data structures (multilevel indexes).
Types of Single-Level Ordered Indexes: In an ordered index file, index entries are stored sorted
by the search key value. There are several types of ordered indexes.
Primary Index: A primary index is an ordered file whose records are of fixed length with two fields.
The first field is of the same data type as the ordering key field called the primary key of the data
file and the second field is a pointer to a disk block (a block address).

There is one index entry in the index file for each block in the data file.

Indexes can also be characterised as dense or sparse.

Dense index: A dense index has an index entry for every search key value in the data file.

Sparse index: A sparse (non-dense) index, on the other hand, has index entries for only
some of the search values.

A primary index is a non-dense (sparse) index, since it includes an entry for each disk block
of the data file rather than for every search value.
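A sparse primary index can be sketched as one (anchor key, block) entry per block: binary search over the anchors picks the block, then only that block is scanned. The block contents below are made up for illustration.

```python
from bisect import bisect_right

# Data file: blocks of records, physically ordered on the primary key.
blocks = [
    [(1, "a"), (3, "b")],    # block 0
    [(5, "c"), (7, "d")],    # block 1
    [(9, "e"), (11, "f")],   # block 2
]

# Sparse primary index: one entry per block -- the first key in each block.
index_keys = [blk[0][0] for blk in blocks]   # [1, 5, 9]

def lookup(key):
    # Find the last block whose anchor key is <= the search key,
    # then scan only inside that one block.
    pos = bisect_right(index_keys, key) - 1
    if pos < 0:
        return None
    for k, value in blocks[pos]:
        if k == key:
            return value
    return None

print(lookup(7))
print(lookup(4))
```

Only the index (three entries) is binary-searched, not the six records, which is exactly why a sparse index can afford one entry per disk block instead of one per search value.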

Clustering Index: If file records are physically ordered on a non-key field which does not have a
distinct value for each record, that field is called the clustering field. We can create a different type
of index, called a clustering index, to speed up retrieval of records that have the same value for the
clustering field.

A clustering index is also an ordered file with two fields. The first field is of the same type
as the clustering field of the data file.

The record field in the clustering index is a block pointer.

A clustering index is another example of a non-dense index.

Secondary Index: A secondary index provides a secondary means of accessing a file for which
some primary access already exists. The secondary index may be on a field which is a candidate
key and has a unique value in every record or a non-key with duplicate values. The index is an
ordered file with two fields. The first field is of the same data type as some non-ordering field of the
data file that is an indexing field. The second field is either a block pointer or a record pointer. A
secondary index usually needs more storage space and longer search time than does a primary
index.
Multilevel Indexes: The idea behind a multilevel index is to reduce the part of the index that must
be searched at each step. A multilevel index considers the index file, which will now be referred to
as the first (or base) level of a multilevel index. We can then create a primary index for the first
level; this index to the first level is called the second level of the multilevel index, and so on.
Dynamic Multilevel Indexes Using B-Trees and B+-Trees: There are two dynamic multilevel index structures: B-trees and B+-trees.
B-Trees

When data volume is large and does not fit in memory, an extension of the binary search
tree to disk based environment is the B-tree.

In fact, since the B-tree is always balanced (all leaf nodes appear at the same level), it is
an extension of the balanced binary search tree.

The problem which the B-tree aims to solve is given a large collection of objects, each
having a key and a value, design a disk based index structure which efficiently supports
query and update.

A B-tree of order p, when used as an access structure on a key field to search for records
in a data file, can be defined as follows:
1. Each internal node in the B-tree is of the form
<P1, <K1, Pr1>, P2, <K2, Pr2>, ..., <Kq−1, Prq−1>, Pq>, where q ≤ p.

Each Pi is a tree pointer to another node in the B-tree.

Each Pri is a data pointer to the record whose search key field value is equal to Ki.
2. Within each node, K1 < K2 < ... < Kq−1.
3. Each node has at most p tree pointers.
4. Each node, except the root and leaf nodes, has at least ⌈p/2⌉ tree pointers.
5. A node with q tree pointers, q ≤ p, has q−1 search key field values (and hence
has q−1 data pointers).
e.g., a B-tree of order p = 3, with the values inserted in the order 8, 5, 1, 7, 3, 12,
9, 6.
B+-Trees

It is a variation of the B-tree data structure.

In a B-tree, every value of the search field appears once at some level in the tree, along
with a data pointer. In a B+-tree, data pointers are stored only at the leaf nodes of the tree.
Hence, the structure of the leaf nodes differs from the structure of internal nodes.

The pointers in the internal nodes are tree pointers to blocks that are tree nodes, whereas
the pointers in leaf nodes are data pointers.

B+-Tree's Structure: The structure of a B+-tree of order p is as follows:

1. Each internal node is of the form <P1, K1, P2, K2, ..., Pq−1, Kq−1, Pq>, where q ≤ p and
each Pi is a tree pointer.
2. Within each internal node, K1 < K2 < K3 < ... < Kq−1.
3. Each internal node has at most p tree pointers and, except for the root, has at least ⌈p/2⌉
tree pointers.
4. The root node has at least two tree pointers, if it is an internal node.
5. Each leaf node is of the form <<K1, Pr1>, <K2, Pr2>, ..., <Kq−1, Prq−1>, Pnext>, where
q ≤ p, each Pri is a data pointer and Pnext points to the next leaf node of the B+-tree.
