DBMS Exam
CO1
1. Main Characteristics of the Database Approach
a. Allowing a set of concurrent users to retrieve from and to update the
database.
b. Concurrency control within the DBMS guarantees that each transaction is
correctly executed or aborted
c. Recovery subsystem ensures each completed transaction has its effect
permanently recorded in the database
d. OLTP (Online Transaction Processing) is a major part of database applications.
This allows hundreds of concurrent transactions to execute per second.
2. Advantages of DBMS
Controlling Redundancy: Duplication of data is largely, though not completely, avoided.
Avoiding Inconsistency: When the same data is duplicated and a change made at one site
is not propagated to the other site, the two entries for the same data will disagree,
giving rise to inconsistency.
Providing Storage Structures (e.g. indexes) for efficient Query Processing
Data can be shared.
Data Independence.
Concurrent access.
Providing Backup and Recovery from crashes .
Restricting unauthorized access
Meeting enterprise-wide requirements rather than individual requirements
Providing multiple interfaces to different classes of users.
Integrity can be enforced: Integrity of data means that data in database is always
accurate, such that incorrect information cannot be stored in database
Drawing inferences and actions from the stored data using deductive and active rules
3. Disadvantages of DBMS
High cost: hardware/software, staff training, data conversion, maintenance.
Higher impact of a failure.
Complexity: a DBMS is extremely complex software and requires expert knowledge.
Frequent updates/upgrades are required.
Performance: for small-scale applications, a DBMS may not give good
performance.
4. When not to use a DBMS
If the database and applications are simple, well defined, and not expected
to change.
If access to data by multiple users is not required.
5. Different types of Database Users
3. System Analyst :
A System Analyst is a user who analyzes the requirements of parametric end users and
checks whether all the requirements of the end users are satisfied.
4. Sophisticated Users :
Sophisticated users can be engineers, scientists, or business analysts who are familiar
with the database. They can develop their own database applications according to their
requirements. They don't write program code; instead, they interact with the database
by writing SQL queries directly through the query processor.
6. Application Programmers :
Application programmers are the back-end programmers who write the code for the
application programs. They are computer professionals. These programs could be
written in programming languages such as Visual Basic, Developer, C, FORTRAN,
COBOL, etc.
6. Data Abstraction
Database systems are made up of complex data structures. To ease user interaction with the
database, developers hide internal, irrelevant details from users. This process of hiding irrelevant
details from the user is called data abstraction.
Physical level: This is the lowest level of data abstraction. It describes how the data is actually
stored in the database.
Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what
data is stored in the database.
View level: Highest level of data abstraction. This level describes the user's interaction with the
database system.
Example: Let's say we are storing customer information in a customer table. At the physical level,
these records are described as blocks of storage (bytes, gigabytes, terabytes, etc.) in memory. These
details are often hidden from the programmers.
At the logical level, these records are described as fields and attributes along with their data
types, and the relationships among them can be logically implemented. Programmers generally
work at this level because they are aware of such details of database systems.
At the view level, users just interact with the system with the help of a GUI and enter details on the
screen; they are not aware of how the data is stored or what data is stored. Such details are hidden
from them.
7. Data Model
A Data Model is an abstract model that organizes data description, data semantics, and consistency
constraints of data. The data model emphasizes what data is needed and how it should be organized, rather
than what operations will be performed on the data. A data model is like an architect's building plan: it helps to
build conceptual models and to set relationships between data items.
Types of Data Models: There are mainly three different types of data models: conceptual data models, logical
data models, and physical data models, and each one has a specific purpose. The data models are used to
represent the data and how it is stored in the database and to set the relationship between data items.
1. Conceptual Data Model: This Data Model defines WHAT the system contains. This model is
typically created by Business stakeholders and Data Architects. The purpose is to organize, scope and
define business concepts and rules.
2. Logical Data Model: Defines HOW the system should be implemented regardless of the DBMS. This
model is typically created by Data Architects and Business Analysts. The purpose is to develop a
technical map of rules and data structures.
3. Physical Data Model: This Data Model describes HOW the system will be implemented using a
specific DBMS system. This model is typically created by DBA and developers. The purpose is actual
implementation of the database.
Characteristics of a Logical Data Model:
At this data modeling level, no primary or secondary key is defined, and you need to verify and
adjust the connector details that were set earlier for relationships.
Describes data needs for a single project but could integrate with other logical data models
depending on the scope of the project.
Designed and developed independently of the DBMS.
Data attributes will have data types with exact precisions and lengths.
Normalization is applied to the model, typically up to 3NF.
Characteristics of a Physical Data Model:
The physical data model describes data needs for a single project or application, though it may be
integrated with other physical data models depending on project scope.
Contains relationships between tables, addressing the cardinality and nullability of the
relationships.
Developed for a specific version of a DBMS, location, data storage, or technology to be used in the
project.
Columns have exact data types, lengths, and default values assigned.
Primary and foreign keys, views, indexes, access profiles, authorizations, etc. are defined.
8. Data Independence
Data Independence is the property of a DBMS that lets you change the database schema at one level
of the database system without having to change the schema at the next higher level. Data independence
helps you keep data separate from all the programs that make use of it.
You can use this stored data for computing and presentation. In many systems, data independence is an
essential property of the system's components.
Physical Data Independence
Physical data independence separates the conceptual level from the internal/physical level. It allows
you to give a logical description of the database without the need to specify physical structures. Compared to
logical data independence, physical data independence is easier to achieve.
With physical independence, you can change the physical storage structures or devices without affecting
the conceptual schema. Any change is absorbed by the mapping between the conceptual and
internal levels. Physical data independence is achieved through the presence of the internal level of the
database and the mapping (transformation) from the conceptual level to the internal level.
Logical Data Independence
Logical Data Independence is the ability to change the conceptual scheme without changing
1. External views
2. External API or programs
Any change made will be absorbed by the mapping between external and conceptual levels.
When compared to Physical Data independence, it is challenging to achieve logical data independence
9. Database Language
o A DBMS has appropriate languages and interfaces to express database queries and updates.
o Database languages can be used to read, store and update the data in the database.
o Using the DDL statements, you can create the skeleton of the database.
o Data Definition Language is used to store metadata, such as the number of
tables and schemas, their names, indexes, the columns in each table, constraints, etc.
These commands are used to update the database schema, which is why they come under Data
Definition Language.
(In the Oracle database, however, the execution of Data Control Language statements cannot be
rolled back.)
TCL (Transaction Control Language) is used to manage the changes made by DML statements. TCL
statements can be grouped into a logical transaction.
o Commit: It is used to permanently save the changes made by the transaction.
o Rollback: It is used to restore the database to its original state since the last Commit.
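For instance, a minimal sketch of a transaction using TCL (the ACCOUNTS table here is a
hypothetical example, not from the notes above):

UPDATE ACCOUNTS SET BALANCE = BALANCE - 500 WHERE ID = 1;
UPDATE ACCOUNTS SET BALANCE = BALANCE + 500 WHERE ID = 2;
COMMIT;     -- make both updates permanent
-- or, if something went wrong before the COMMIT:
ROLLBACK;   -- restore the database to its state as of the last COMMIT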
An entity can be a real-world object, either animate or inanimate, that can be easily identifiable. For
example, in a school database, students, teachers, classes, and courses offered can be considered
as entities. All these entities have some attributes or properties that give them their identity.
An entity set is a collection of similar types of entities. An entity set may contain entities whose
attributes share similar values. For example, a Students set may contain all the students of a school;
likewise, a Teachers set may contain all the teachers of a school from all faculties. Entity sets need
not be disjoint.
Entity-Set and Keys
A key is an attribute or collection of attributes that uniquely identifies an entity within an entity set.
For example, the roll_number of a student makes him/her identifiable among students.
Super Key − A set of attributes (one or more) that collectively identifies an entity in an entity
set.
Candidate Key − A minimal super key is called a candidate key. An entity set may have
more than one candidate key.
Primary Key − A primary key is one of the candidate keys chosen by the database designer
to uniquely identify the entity set.
Entities are represented by means of their properties, called attributes. All attributes have values.
For example, a student entity may have name, class, and age as attributes.
There exists a domain or range of values that can be assigned to attributes. For example, a student's
name cannot be a numeric value. It has to be alphabetic. A student's age cannot be negative, etc.
Types of Attributes
Simple attribute − Simple attributes are atomic values, which cannot be divided further. For
example, a student's phone number is an atomic value of 10 digits.
Composite attribute − Composite attributes are made of more than one simple attribute. For
example, a student's complete name may have first_name and last_name.
Derived attribute − Derived attributes are the attributes that do not exist in the physical
database, but their values are derived from other attributes present in the database. For
example, average_salary in a department should not be saved directly in the database,
instead it can be derived. As another example, age can be derived from date_of_birth.
Single-value attribute − Single-value attributes contain a single value. For example −
Social_Security_Number.
Multi-value attribute − Multi-value attributes may contain more than one value. For
example, a person can have more than one phone number, email_address, etc.
Degree of a Relationship
The degree of a relationship is the number of entity types that participate in it. For example, if we
have two entity types, 'Customer' and 'Account', linked using a primary key and foreign key, the
degree of the relationship is 2 because two entity types are taking part in the relationship.
Based on the number of entity types that are connected, we have the following degrees of
relationship:
Unary
Binary
Ternary
N-ary
Unary (degree 1)
A unary relationship exists when both participating entity types are the same. When such a
relationship is present, we say that the degree of the relationship is 1.
For example, suppose in a classroom we have many students who belong to a particular club, like a
dance club or basketball club, and some of them are club leads. A particular group of students is
managed by their respective club lead. Here, the groups are formed from students, and the club
leads are also chosen from students, so 'Student' is the only entity type participating. This
relationship can be represented with an E-R diagram.
So, yes, there can be only one entity type in a relationship, and the minimum degree of a
relationship is one.
Binary (degree 2)
A binary relationship exists when exactly two entity types participate. When such a relationship is
present, we say that the degree is 2. This is the most common degree of relationship, and it is easy
to deal with because binary relationships can be easily converted into relational tables.
For example, we have two entity types, 'Customer' and 'Account', where each 'Customer' has an
'Account' which stores the account details of the 'Customer'. Since two entity types participate,
we call it a binary relationship. Also, one 'Customer' can have many 'Account's, but each
'Account' should belong to only one 'Customer', so it is a one-to-many binary relationship.
Ternary(degree 3)
A ternary relationship exists when exactly three entity types participate. When such a relationship is
present, we say that the degree is 3. As the number of entity types in the relationship increases, it
becomes more complex to convert them into relational tables.
For example, we have three entity types, 'Employee', 'Department' and 'Location'. The relationship
between these entities is defined as: an employee works in a department, and an employee works at
a particular location. Since three entity types participate in the relationship, it is a ternary
relationship, and its degree is 3.
N-ary (n degree)
An N-ary relationship exists when 'n' entity types participate, so any number of entity types can
participate in a relationship; there is no limit to the maximum number. However, relationships with
a higher degree are not common, because converting higher-degree relationships into relational
tables gets complex. We build an E-R model because it can easily be converted into another model
for implementing the database, and this benefit is lost with higher-degree relationships. So binary
relationships are more popular and widely used: though we can make a relationship with any
number of entity types, in practice we don't.
An N-ary relationship is drawn with entity types E1, E2, ..., En all connected to a single relationship
R. For example, if a total of 5 entity types participate in the relationship, the degree of that n-ary
relationship is 5.
1. Many-to-Many Cardinality-
An entity in set A can be associated with any number (zero or more) of entities in set B.
An entity in set B can be associated with any number (zero or more) of entities in set A.
Example- Student and Course: a student can enroll in many courses, and a course can have
many students.
2. Many-to-One Cardinality-
An entity in set A can be associated with at most one entity in set B.
An entity in set B can be associated with any number (zero or more) of entities in set A.
Example- Employee and Department: many employees work in one department, but each
employee works in at most one department.
3. One-to-Many Cardinality-
An entity in set A can be associated with any number (zero or more) of entities in set B.
An entity in set B can be associated with at most one entity in set A.
Example- Customer and Account: one customer can have many accounts, but each account
belongs to at most one customer.
4. One-to-One Cardinality-
An entity in set A can be associated with at most one entity in set B.
An entity in set B can be associated with at most one entity in set A.
Example- Person and Passport: a person can hold at most one passport, and a passport
belongs to exactly one person.
Participation Constraint
Consider two entities, Employee and Department, related via a Works_For relationship. Every
Employee works in at least one department, so an Employee entity exists only if it has at least one
Works_For relationship with a Department entity. Thus the participation of Employee in Works_For
is total.
Relational Model (RM) represents the database as a collection of relations. A relation is nothing but a table of
values. Every row in the table represents a collection of related data values. These rows in the table denote a
real-world entity or relationship.
The table name and column names are helpful to interpret the meaning of values in each row. The data are
represented as a set of relations. In the relational model, data are stored as tables. However, the physical storage
of the data is independent of the way the data are logically organized.
1. Attribute: Each column in a table. Attributes are the properties that define a relation, e.g.,
Student_Rollno, NAME, etc.
2. Tables – In the relational model, relations are saved in table format, stored along with their
entities. A table has two properties: rows and columns. Rows represent records and columns
represent attributes.
3. Tuple – It is nothing but a single row of a table, which contains a single record.
4. Relation Schema: A relation schema represents the name of the relation together with its attributes.
5. Degree: The total number of attributes in the relation is called the degree of the relation.
6. Cardinality: Total number of rows present in the Table.
7. Column: The column represents the set of values for a specific attribute.
8. Relation instance – Relation instance is a finite set of tuples in the RDBMS system. Relation instances
never have duplicate tuples.
9. Relation key – Every row has one or more attributes that can identify it uniquely; these are
called the relation key.
10. Attribute domain – Every attribute has a pre-defined scope of values, known as the attribute
domain.
Relational integrity constraints in DBMS refer to the conditions that must hold for a valid relation.
These constraints are derived from the rules of the mini-world that the database represents.
There are many types of integrity constraints. Constraints on a relational database management
system are mostly divided into three main categories:
1. Domain Constraints
2. Key Constraints
3. Referential Integrity Constraints
Domain Constraints
Domain constraints specify that within each tuple, the value of each attribute must be an atomic value
from the attribute's domain. The domain is specified as a data type, which includes standard data types
such as integers, real numbers, characters, Booleans, variable-length strings, etc.
A domain constraint is violated if an attribute value does not appear in the corresponding domain or is
not of the appropriate data type.
Key Constraints
An attribute that can uniquely identify a tuple in a relation is called the key of the table. The value of the
attribute for different tuples in the relation has to be unique.
Referential Integrity Constraints
Referential integrity constraints in DBMS are based on the concept of foreign keys. A foreign key is an
attribute of a relation that refers to a key attribute of another (or the same) relation. Referential
integrity requires that the referenced key value must exist in the referenced table.
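As an illustrative sketch, the three constraint categories map to SQL as follows (the STUDENT and
DEPARTMENT tables here are hypothetical):

CREATE TABLE DEPARTMENT (
    DEPT_ID   INT PRIMARY KEY,          -- key constraint
    DEPT_NAME VARCHAR(30) NOT NULL
);
CREATE TABLE STUDENT (
    ROLL_NO   INT PRIMARY KEY,                     -- key constraint
    NAME      VARCHAR(20) NOT NULL,
    AGE       INT CHECK (AGE >= 0),                -- domain constraint
    DEPT_ID   INT REFERENCES DEPARTMENT(DEPT_ID)   -- referential integrity
);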
CO2
1.DML
DML is the short name for Data Manipulation Language, which deals with data
manipulation and includes the most common SQL statements such as SELECT, INSERT,
UPDATE, DELETE, etc. It is used to store, modify, retrieve, delete and update
data in a database.
2. Joins
Inner Join
An Inner Join (or Equi Join) returns rows when there is a match in both tables.
The INNER JOIN creates a new result table by combining column values of two tables
(table1 and table2) based upon the join-predicate. The query compares each row of
table1 with each row of table2 to find all pairs of rows which satisfy the join-
predicate. When the join-predicate is satisfied, the column values of each matched pair
of rows of table1 and table2 are combined into a result row.
Syntax of INNER JOIN:
SELECT table1.column1, table2.column2...
FROM table1 INNER JOIN table2
ON table1.common_field = table2.common_field;
Left Outer Join
Left-Outer Join returns all rows from the left table, even if there are no matches in
the right table.
This means that if the ON clause matches 0 (zero) records in right table, the join will
still return a row in the result, but with NULL in each column from right table.
This means that a left join returns all the values from the left table, plus matched
values from the right table or NULL in case of no matching join predicate.
Syntax of Left Outer Join or left join
SELECT table1.column1, table2.column2...
FROM table1 LEFT OUTER JOIN table2
ON table1.common_field = table2.common_field;
Right Outer Join
RIGHT JOIN or RIGHT OUTER JOIN returns all rows from the right table, even if there
are no matches in the left table.
This means that if the ON clause matches 0 (zero) records in left table, the join will
still return a row in the result, but with NULL in each column from left table.
This means that a right join returns all the values from the right table, plus matched
values from the left table or NULL in case of no matching join predicate.
Syntax of Right Outer Join or Right join
SELECT table1.column1, table2.column2...
FROM table1 RIGHT OUTER JOIN table2
ON table1.common_field = table2.common_field;
Full Join
FULL JOIN combines the results of both left and right outer joins.
The joined table will contain all records from both tables, and fill in NULLs for missing
matches on either side.
Syntax of Full join
SELECT table1.column1, table2.column2...
FROM table1 FULL JOIN table2
ON table1.common_field = table2.common_field;
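As a usage sketch, assume hypothetical CUSTOMERS and ORDERS tables that share an
ID/CUSTOMER_ID column:

SELECT CUSTOMERS.NAME, ORDERS.AMOUNT
FROM CUSTOMERS LEFT OUTER JOIN ORDERS
ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
-- returns every customer; AMOUNT is NULL for customers with no matching orders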
3.Null Values
The SQL NULL is the term used to represent a missing value. A NULL value in a
table is a value in a field that appears to be blank.
A field with a NULL value is a field with no value. It is very important to understand
that a NULL value is different than a zero value or a field that contains spaces.
Syntax
The basic syntax of NULL while creating a table.
SQL> CREATE TABLE CUSTOMERS(
ID INT NOT NULL,
NAME VARCHAR (20) NOT NULL,
AGE INT NOT NULL,
ADDRESS CHAR (25) ,
SALARY DECIMAL (18, 2),
PRIMARY KEY (ID)
);
Here, NOT NULL signifies that the column must always be given an explicit value of the
given data type. There are two columns where we did not use NOT NULL, which
means these columns may be NULL.
A field with a NULL value is one that has been left blank during record
creation.
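For example, to find the rows of this CUSTOMERS table where SALARY is missing, use IS NULL
(not = NULL, which never matches):

SQL> SELECT ID, NAME FROM CUSTOMERS
     WHERE SALARY IS NULL;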
4. Views in SQL
o A view in SQL is considered a virtual table. A view also contains rows and
columns.
o To create the view, we can select the fields from one or more tables present in the
database.
o A view can either have specific rows based on certain condition or all the rows
of a table.
1. Creating view
A view can be created using the CREATE VIEW statement. We can create a view from
a single table or multiple tables.
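A minimal sketch, reusing the CUSTOMERS table defined in the NULL section above:

CREATE VIEW HIGH_SALARY_CUSTOMERS AS
SELECT ID, NAME, SALARY
FROM CUSTOMERS
WHERE SALARY > 50000;

-- the view can now be queried like an ordinary table:
SELECT * FROM HIGH_SALARY_CUSTOMERS;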
2. Updating a view
A view can be updated only under certain conditions; for example, all NOT NULL columns
from the base table must be included in the view in order for an INSERT through the view
to work.
So if a view satisfies the rules for updatable views, then you can update it.
Advantages of views
Security: Each user can be given permission to access the database only
through a small set of views that contain the specific data the user is
authorized to see, thus restricting the user's access to stored data
Query Simplicity: A view can draw data from several different tables and
present it as a single table, turning multi-table queries into single-table
queries against the view.
Data Integrity: If data is accessed and entered through a view, the DBMS can
automatically check the data to ensure that it meets the specified integrity
constraints.
Logical data independence: Views can make the application and the database
tables independent of each other to a certain extent. Without views, an application
must be written directly against the tables; with views, the application is separated
from the physical table structure.
Disadvantages of views
Performance: Views create the appearance of a table, but the DBMS must still
translate queries against the view into queries against the underlying source
tables. If the view is defined by a complex, multi-table query then simple
queries on the views may take considerable time.
Update restrictions: When a user tries to update rows of a view, the DBMS
must translate the request into an update on rows of the underlying base
tables. This is possible for simple views, but more complex views are often
restricted to read-only.
6. Relational Algebra
Relational algebra is a procedural query language. It gives a step by step process to
obtain the result of the query. It uses operators to perform queries.
1. Select Operation:
o The select operation selects tuples that satisfy a given predicate.
o It is denoted by sigma (σ).
1. Notation: σp(r)
Where σ stands for selection, p is the selection predicate (condition), and r is the relation.
2. Project Operation:
o This operation shows the list of those attributes that we wish to appear in the
result. Rest of the attributes are eliminated from the table.
o It is denoted by ∏.
1. Notation: ∏ A1, A2, ..., An (r)
Where A1, A2, ..., An are the attributes of relation r to be kept in the result.
Union Operation:
o Suppose there are two relations R and S. The union operation contains all the
tuples that are in R, in S, or in both R and S.
o It eliminates duplicate tuples. It is denoted by ∪.
1. Notation: R ∪ S
Set Intersection:
o Suppose there are two relations R and S. The set intersection operation contains
all tuples that are in both R and S.
o It is denoted by ∩.
1. Notation: R ∩ S
Set Difference:
o Suppose there are two relations R and S. The set difference operation contains
all tuples that are in R but not in S.
o It is denoted by minus (−).
1. Notation: R - S
Cartesian product
o The Cartesian product is used to combine each row in one table with each row
in the other table. It is also known as a cross product.
o It is denoted by X.
1. Notation: E X D
Rename Operation:
The rename operation is used to rename the output relation. It is denoted by rho (ρ).
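As a short worked example, these operations compose. Assuming a hypothetical relation
Student(roll_no, name, age):

∏ name (σ age > 18 (Student))

first selects the Student tuples with age > 18, then projects only the name attribute.
The equivalent SQL is:

SELECT name FROM Student WHERE age > 18;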
7. Query Tree
A query tree represents a relational algebra expression as a tree and gives a good visual
feel of the complexity of the query and the operations involved.
8.What is SQL?
SQL stands for Structured Query Language
SQL lets you access and manipulate databases
SQL became a standard of the American National Standards
Institute (ANSI) in 1986, and of the International Organization for
Standardization (ISO) in 1987
9. RDBMS
RDBMS stands for Relational Database Management System.
RDBMS is the basis for SQL, and for all modern database systems such as MS
SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
How it works
Data in an RDBMS is stored as an organized set of tables, so it can be accessed easily.
10.SQL Commands
o SQL commands are instructions. It is used to communicate with the database.
It is also used to perform specific tasks, functions, and queries of data.
o SQL can perform various tasks like create a table, add data to tables, drop the
table, modify the table, set permission for users.
a. CREATE: It is used to create a new table in the database.
b. DROP: It is used to delete both the structure and record stored in the table.
c. ALTER: It is used to alter the structure of the database. This change could be either
to modify the characteristics of an existing attribute or probably to add a new
attribute.
d. TRUNCATE: It is used to delete all the rows from the table and free the space
containing the table.
e. INSERT: The INSERT statement is a SQL query. It is used to insert data into the row of a
table.
f. UPDATE: This command is used to update or modify the value of a column in the table.
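A brief sketch of these commands in use (the EMPLOYEE table is a hypothetical example):

CREATE TABLE EMPLOYEE (EMP_ID INT PRIMARY KEY, NAME VARCHAR(20));
ALTER TABLE EMPLOYEE ADD SALARY DECIMAL(18, 2);
INSERT INTO EMPLOYEE (EMP_ID, NAME, SALARY) VALUES (1, 'Asha', 40000);
UPDATE EMPLOYEE SET SALARY = 45000 WHERE EMP_ID = 1;
TRUNCATE TABLE EMPLOYEE;   -- removes all rows, keeps the table structure
DROP TABLE EMPLOYEE;       -- removes both the structure and the records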
CO3
1.PL/SQL
PL/SQL is tightly integrated with SQL.
DECLARE
<declarations section>
BEGIN
<executable command(s)>
EXCEPTION
<exception handling>
END;
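A minimal runnable sketch of this block structure (it assumes a CUSTOMERS table like the
one created in the CO2 section):

DECLARE
   v_total NUMBER;
BEGIN
   SELECT COUNT(*) INTO v_total FROM CUSTOMERS;
   -- requires SET SERVEROUTPUT ON in SQL*Plus to see the output
   DBMS_OUTPUT.PUT_LINE('Total customers: ' || v_total);
EXCEPTION
   WHEN OTHERS THEN
      DBMS_OUTPUT.PUT_LINE('Error: ' || SQLERRM);
END;
/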
Programming structures
• Conditional statements
• Loops
• Procedures
• Functions
• Cursors
• Triggers
• A cursor allows you to iterate over the set of rows returned by a query and process each
row individually. A cursor is declared for a SELECT statement.
MySQL cursors have the following properties:
➢ Read-only: You cannot update data in the underlying table through the cursor
➢ Non-scrollable: Can be traversed only in one direction and cannot skip rows
➢ Asensitive: The server may or may not make a copy of its result table
• You can use MySQL cursors in stored procedures, stored functions, and triggers.
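A minimal sketch of a stored procedure using a cursor (it assumes the CUSTOMERS table
from earlier):

DELIMITER //
CREATE PROCEDURE total_salary(OUT total DECIMAL(18,2))
BEGIN
   DECLARE done INT DEFAULT FALSE;
   DECLARE sal DECIMAL(18,2);
   DECLARE cur CURSOR FOR SELECT SALARY FROM CUSTOMERS;
   DECLARE CONTINUE HANDLER FOR NOT FOUND SET done = TRUE;
   SET total = 0;
   OPEN cur;
   read_loop: LOOP
      FETCH cur INTO sal;        -- fetch the next row
      IF done THEN
         LEAVE read_loop;        -- exit once no rows remain
      END IF;
      SET total = total + IFNULL(sal, 0);
   END LOOP;
   CLOSE cur;
END //
DELIMITER ;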
3.MySQL Triggers
• A trigger is a stored program invoked automatically in response to an event that occurs in
the associated table.
• MySQL supports triggers that are invoked in response to the INSERT, UPDATE or DELETE
event.
– Row-level triggers: activated once for each affected row (MySQL supports only these)
– Statement-level triggers: activated once per statement (not supported by MySQL)
The following six types of triggers can be created:
➢ Before Insert
➢ After Insert
➢ Before Update
➢ After Update
➢ Before Delete
➢ After Delete
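A minimal Before Update sketch on the same CUSTOMERS table:

CREATE TRIGGER before_customer_update
BEFORE UPDATE ON CUSTOMERS
FOR EACH ROW
SET NEW.NAME = UPPER(NEW.NAME);   -- normalize the name before it is stored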
4.Normalization
o Normalization is the process of organizing the data in the database.
o Normalization is used to minimize the redundancy from a relation or set of
relations. It is also used to eliminate the undesirable characteristics like
Insertion, Update and Deletion Anomalies.
o Normalization divides the larger table into the smaller table and links them
using relationship.
o The normal form is used to reduce redundancy from the database table.
1NF A relation will be in 1NF if it contains only atomic (indivisible) values.
2NF A relation will be in 2NF if it is in 1NF and all non-key attributes are fully functionally
dependent on the primary key.
3NF A relation will be in 3NF if it is in 2NF and no transitive dependency exists for non-prime
attributes.
BCNF A relation will be in BCNF if it is in 3NF and, for every functional dependency X → Y, X is
a super key.
4NF A relation will be in 4NF if it is in Boyce-Codd normal form and has no multi-valued
dependency.
5NF A relation is in 5NF if it is in 4NF, does not contain any join dependency, and joining is
lossless.
5.Data Redundancy
Redundancy means having multiple copies of the same data in the database.
This problem arises when a database is not normalized. Suppose the attributes of a
student details table are: student ID, student name, college name, college rank,
course opted.
1.Insertion Anomaly –
If the details of a student whose course has not yet been decided have to be
inserted, insertion is not possible until the course is decided.
This problem happens when a data record cannot be inserted without
adding some additional unrelated data to the record.
2.Deletion Anomaly –
If the details of the students in this table are deleted, then the details of the
college also get deleted, which should not occur by common sense.
This anomaly happens when the deletion of a data record results in
losing some unrelated information that was stored as part of the
deleted record.
It is not possible to delete some information without also losing other
information in the table.
3.Updation Anomaly –
Suppose the rank of the college changes; then the change will have to
be made all over the database, which is time-consuming and
computationally costly.
If the update does not occur in all places, the database will be left in an
inconsistent state.
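A sketch of how normalization removes this redundancy: split the table so that college
details are stored only once (table and column names are hypothetical):

CREATE TABLE COLLEGES (
    COLLEGE_NAME VARCHAR(50) PRIMARY KEY,
    COLLEGE_RANK INT                        -- rank stored in exactly one row
);
CREATE TABLE STUDENTS (
    STUDENT_ID   INT PRIMARY KEY,
    STUDENT_NAME VARCHAR(30),
    COLLEGE_NAME VARCHAR(50) REFERENCES COLLEGES(COLLEGE_NAME),
    COURSE       VARCHAR(30)                -- may stay NULL until decided
);

Now a college's rank is updated in one place, a student can be inserted before the course
is decided, and deleting students no longer deletes college details.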
7.Storage Devices
Databases are stored in file formats, which contain records. At physical level, the
actual data is stored in electromagnetic format on some device. These storage
devices can be broadly categorized into three types −
Primary Storage
• Direct access by CPU.
• Fast access to data
• Limited storage capacity.
• More expensive.
• Volatile
• Main memory & faster cache memory.
Secondary Storage
• No direct access by the CPU
• Slow access to data
• Large capacity
• Less expensive
• Non-Volatile
• Online devices
• Magnetic disk & Flash memory
• Most databases are stored permanently on magnetic disks
Tertiary Storage
• No direct access to CPU
• Slower access to data
• Large capacity
• Less expensive than disks
• Non-Volatile
• Offline devices
• Magnetic tapes & Optical disks
• Used for backing up databases
8.Hashing
In a huge database structure, it is very inefficient to search all the index values and
reach the desired data. Hashing technique is used to calculate the direct location of a
data record on the disk without using index structure.
In this technique, data is stored at the data blocks whose address is generated by
using the hashing function. The memory location where these records are stored is
known as data bucket or data blocks.
In this technique, the hash function can use any column value to generate the address.
Most of the time, the hash function uses the primary key to generate the address of
the data block. The hash function can be anything from a simple mathematical function
to a complex one. We can even use the primary key itself as the address of the data
block, meaning each row is stored in the data block whose address equals its primary
key value.
Types of Hashing:
Static Hashing
In static hashing, the resultant data bucket address will always be the same. That
means if we generate an address for EMP_ID =103 using the hash function mod (5)
then it will always result in same bucket address 3. Here, there will be no change in
the bucket address.
Hence in this static hashing, the number of data buckets in memory remains constant
throughout. In this example, we will have five data buckets in the memory used to
store the data.
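For instance, the bucket address for EMP_ID 103 can be checked directly (MySQL syntax):

SELECT MOD(103, 5);   -- returns 3, so the record goes to bucket 3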
The operations supported under static hashing are:
o Insert a Record
o Delete a Record
o Update a Record
Dynamic Hashing
In dynamic hashing, the number of data buckets is not fixed: buckets grow or shrink as
records are inserted or deleted, which overcomes the bucket-overflow problem of static
hashing. This method is also known as extendible hashing.
9. Indexing in DBMS
o Indexing is used to optimize the performance of a database by minimizing the
number of disk accesses required when a query is processed.
o The index is a type of data structure. It is used to locate and access the data in
a database table quickly.
Index structure:
Indexing Methods
Ordered indices
The indices are usually sorted to make searching faster. The indices which are sorted
are known as ordered indices.
Primary Index
o If the index is created on the basis of the primary key of the table, then it is
known as primary indexing. These primary keys are unique to each record and
contain 1:1 relation between the records.
o As primary keys are stored in sorted order, the performance of the searching
operation is quite efficient.
o The primary index can be classified into two types: Dense index and Sparse
index.
Dense index
o The dense index contains an index record for every search key value in the
data file. It makes searching faster.
o In this, the number of records in the index table is same as the number of
records in the main table.
o It needs more space to store index record itself. The index records have the
search key and a pointer to the actual record on the disk.
Sparse index
o In the data file, index record appears only for a few items. Each item points to
a block.
o In this, instead of pointing to each record in the main table, the index points
to the records in the main table in a gap.
Clustering Index
o A clustered index can be defined as an ordered data file. Sometimes the index
is created on non-primary key columns which may not be unique for each
record.
o In this case, to identify the record faster, we will group two or more columns
to get the unique value and create index out of them. This method is called a
clustering index.
o The records which have similar characteristics are grouped, and indexes are
created for these group.
Secondary Index
In sparse indexing, as the size of the table grows, the size of the mapping also grows.
These mappings are usually kept in primary memory so that address fetches are fast; the
secondary memory then searches for the actual data based on the address obtained from
the mapping. If the mapping size grows, fetching the address itself becomes slower, and
the sparse index is no longer efficient. To overcome this problem, secondary indexing is
introduced.
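In SQL, an index on a non-key column can be created explicitly; a minimal sketch on the
CUSTOMERS table used earlier:

CREATE INDEX idx_customers_name
ON CUSTOMERS (NAME);
-- queries filtering on NAME, e.g. WHERE NAME = 'Asha',
-- can now locate rows without scanning the whole table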
Operations of Transaction:
Read(X): Read operation is used to read the value of X from the database and stores
it in a buffer in main memory.
Write(X): Write operation is used to write the value back to the database from the
buffer.
2.Recoverable Schedules:
Schedules in which transactions commit only after all transactions whose
changes they read have committed are called recoverable schedules. In other
words, if some transaction Tj is reading a value updated or written by some
other transaction Ti, then the commit of Tj must occur after the commit of
Ti.
Irrecoverable Schedule:
Consider a schedule with two transactions: T1 reads and writes A, and that
value is read and written by T2. T2 commits, but later T1 fails, so we have
to roll back T1. Since T2 has read the value written by T1, it should also
be rolled back, but we have already committed it. So this schedule is
irrecoverable. In general, when Tj reads a value updated by Ti and Tj
commits before Ti commits, the schedule is irrecoverable.
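A sketch of such a schedule (time runs downward):

T1                 T2
----------------   ----------------
Read(A)
Write(A)
                   Read(A)
                   Write(A)
                   Commit
Fails (Rollback)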
Recoverable with Cascading Rollback:
Consider a schedule with two transactions: T1 reads and writes A, and that
value is read and written by T2. Later, T1 fails, so we have to roll back
T1. Since T2 has read the value written by T1, it should also be rolled
back; as T2 has not committed, we can roll it back as well. So the schedule
is recoverable with cascading rollback. In general, if Tj reads a value
updated by Ti and the commit of Tj is delayed until after the commit of Ti,
the schedule is recoverable with cascading rollback.
Cascadeless Recoverable Schedule:
Consider a schedule with two transactions: T1 reads and writes A and
commits, and that value is then read by T2. If T1 fails before committing,
no other transaction has read its value, so no other transaction needs to
be rolled back. So this is a cascadeless recoverable schedule. In general,
if Tj reads a value updated by Ti only after Ti has committed, the schedule
is cascadeless recoverable.
Characterization of schedules based on Serializabilty
• Serial Schedule
• Serializable Schedule
• Result equivalent
• Conflict equivalent
• Conflict serializable
Note: Serializable and serial schedules are not the same; they are different.
Serializability is very hard to verify, so in most database management
systems the current approach is to use locks.
3.Concurrency Control
Concurrency Control is the working concept that is required for controlling and
managing the concurrent execution of database operations and thus avoiding the
inconsistencies in the database. Thus, for maintaining the concurrency of the
database, we have the concurrency control protocols.
Lock-Based Protocol
In this type of protocol, any transaction cannot read or write data until it acquires an
appropriate lock on it. There are two types of lock:
1. Shared lock:
o It is also known as a Read-only lock. With a shared lock, the data item can only
be read by the transaction.
o It can be shared between transactions, because a transaction holding a shared
lock cannot update the data item.
2. Exclusive lock:
o With an exclusive lock, the data item can be both read and written by the
transaction.
o This lock is exclusive: under it, multiple transactions cannot modify the
same data item simultaneously.
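In MySQL (InnoDB), for example, the two lock types can be requested explicitly inside a
transaction (FOR SHARE is the MySQL 8.0 spelling; older versions use LOCK IN SHARE MODE):

START TRANSACTION;
SELECT SALARY FROM CUSTOMERS WHERE ID = 1 FOR SHARE;    -- shared (read) lock
SELECT SALARY FROM CUSTOMERS WHERE ID = 2 FOR UPDATE;   -- exclusive (write) lock
UPDATE CUSTOMERS SET SALARY = SALARY + 1000 WHERE ID = 2;
COMMIT;   -- locks are released at commit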
Validation-Based Protocol
The transaction is executed in three phases:
1. Read phase: In this phase, the transaction T is read and executed. It reads the
values of the various data items and stores them in temporary local variables.
It performs all write operations on the temporary variables, without updating
the actual database.
2. Validation phase: In this phase, the temporary variable value will be validated
against the actual data to see if it violates the serializability.
3. Write phase: If the transaction passes validation, the temporary results are
written to the database; otherwise the transaction is rolled back.
Undo: Using the log, the effects of an uncommitted (failed) transaction are undone by
restoring the old values recorded in the log.
Redo: Using the log, the effects of a committed transaction are redone by reapplying
the new values recorded in the log.
5.Aries
• ARIES stands for Algorithms for Recovery and Isolation Exploiting Semantics.
• ARIES is a recovery algorithm designed to work with a no-force, steal
database approach.
• It is based on WAL (Write-Ahead Logging).
• It repeats history during redo.
• It logs changes during undo.