Unit 5

Download as pptx, pdf, or txt
Download as pptx, pdf, or txt
You are on page 1of 34

UNIT-5

Storage System in DBMS

• A database system provides an ultimate view of the stored data. However, data in the form of bits, bytes get
stored in different storage devices.
• In this section, we will take an overview of various types of storage devices that are used for accessing and
storing data.

Types of Data Storage


For storing the data, there are different types of storage options available. These storage types differ from one
another as per the speed and accessibility. There are the following types of storage devices used for storing the
data:

• Primary Storage
• Secondary Storage
• Tertiary Storage
Primary Storage
• It is the primary area that offers quick access to the stored data. We also know the primary storage as volatile
storage. It is because this type of memory does not permanently store the data. As soon as the system leads to a
power cut or a crash, the data also get lost. Main memory and cache are the types of primary storage.

• Main Memory: It is the one that is responsible for operating the data that is available by the storage medium. The
main memory handles each instruction of a computer machine. This type of memory can store gigabytes of data on
a system but is small enough to carry the entire database. At last, the main memory loses the whole content if the
system shuts down because of power failure or other reasons.
• Cache: It is one of the costly storage media. On the other hand, it is the fastest one. A cache is a tiny storage media
which is maintained by the computer hardware usually. While designing the algorithms and query processors for
the data structures, the designers keep concern on the cache effects.
Secondary Storage
• Secondary storage is also called as Online storage. It is the storage area that allows the user to save and store data
permanently. This type of memory does not lose the data due to any power failure or system crash. That's why we
also call it non-volatile storage.
• There are some commonly described secondary storage media which are available in almost every type of
computer system:

• Flash Memory: A flash memory stores data in USB (Universal Serial Bus) keys which are further plugged into the USB
slots of a computer system. These USB keys help transfer data to a computer system, but it varies in size limits. Unlike
the main memory, it is possible to get back the stored data which may be lost due to a power cut or other reasons.
This type of memory storage is most commonly used in the server systems for caching the frequently used data. This
leads the systems towards high performance and is capable of storing large amounts of databases than the main
memory.
Magnetic Disk Storage: This type of storage media is also known as online storage media. A magnetic disk is used for
storing the data for a long time. It is capable of storing an entire database. It is the responsibility of the computer
system to make availability of the data from a disk to the main memory for further accessing. Also, if the system
performs any operation over the data, the modified data should be written back to the disk. The tremendous
capability of a magnetic disk is that it does not affect the data due to a system crash or failure, but a disk failure can
easily ruin as well as destroy the stored data.
Tertiary Storage: It is the storage type that is external from the computer system. It has the slowest speed. But it is
capable of storing a large amount of data. It is also known as Offline storage. Tertiary storage is generally used for data
backup. There are following tertiary storage devices available:
• Optical Storage: An optical storage can store megabytes or gigabytes of data. A Compact Disk (CD) can store 700
megabytes of data with a playtime of around 80 minutes. On the other hand, a Digital Video Disk or a DVD can store
4.7 or 8.5 gigabytes of data on each side of the disk.
• Tape Storage: It is the cheapest storage medium than disks. Generally, tapes are used for archiving or backing up the
data. It provides slow access to data as it accesses data sequentially from the start. Thus, tape storage is also known as
sequential-access storage. Disk storage is known as direct-access storage as we can directly access the data from any
location on disk.
Transaction Control in DBMS

• The transaction is a single logical unit that accesses and modifies the contents of the database. Transactions access
data using read and write operations.
• Transaction is a single operation of processing that can have many operations. Transaction is needed when more
than one user wants to access same database. Transaction has ACID properties.
ACID Properties of Transaction:

• Atomicity: All the operations in a transaction are considered to be atomic and as one unit. If system fails
or any read/write conflicts occur during transaction the system needs to revert back to its previous state.
Atomicity is maintained by the Transaction Management Component.
• Consistency: If system fails because of invalid data while doing an operation revert back the system to
its previous state. Consistency is maintained by the Application manager.
• Isolation: If multiple transactions are executing on single database, each transaction should be isolated
from other transaction. If multiple transactions are performed on single database, operation from any
transaction should not interfere with operation in other transaction. Isolation is maintained by the
concurrency control manager.
• Durability: Durability means the changes made during the transactions should exist after completion of
transaction. Changes must be permanent and must not be lost due to any database failure. It is
maintained by the recovery manager.
Concurrency Control

• Concurrency control mechanisms provide us with


various concepts & implementations to ensure the
execution of any transaction across any node doesn’t
violate ACID or BASE (depending on database)
properties causing inconsistency & mixup of data in
the distributed systems.
• Transactions in the distributed system are executed in
“sets“, every set consists of various sub-transactions.
These sub-transactions across every node must be
executed serially to maintain data integrity & the
concurrency control mechanisms do this serial
execution.
Pessimistic Concurrency Control (PCC)
• The Pessimistic Concurrency Control Mechanisms proceeds on assumption that, most of the transactions
will try to access the same resource simultaneously. It’s basically used to prevent concurrent access to a
shared resource and provide a system of acquiring a Lock on the data item before performing any
operation.
Optimistic Concurrency Control (OCC)
• The problem with pessimistic concurrency control systems is that, if a transaction acquires a lock on a
resource so that no other transactions can access it. This will result in reducing concurrency of the
overall system.
• The Optimistic Concurrency control techniques proceeds on the basis of assumption that, 0 or very few
transactions will try to access a certain resource simultaneously. We can describe a system as FULLY OPTIMISTIC,
if it uses No-Locks at all & checks for conflicts at commit time. It has following 4-phases of operation:
• Read Phase: When a transaction begins, it reads the data while also logging the time-stamp at which data is read
to verify for conflicts during the validation phase.
• Execution Phase: In this phase, the transaction executes all its operation like create, read, update or delete etc.
• Validation Phase: Before committing a transaction, a validation check is performed to ensure consistency by
checking the last_updated timestamp with the one recorded at read_phase. If the timestamp matches, then the
transaction will be allowed to be committed and hence proceed with the commit phase.
• Commit phase: During this phase, the transactions will either be committed or aborted, depending on the
validation check performed during previous phase. If the timestamp matches, then transactions are committed
else they’re aborted.
Concurrency problems in DBMS Transactions

• Concurrency control is an essential aspect of database management systems (DBMS) that ensures transactions can
execute concurrently without interfering with each other. However, concurrency control can be challenging to
implement, and without it, several problems can arise, affecting the consistency of the database. In this article, we
will discuss some of the concurrency problems that can occur in DBMS transactions and explore solutions to prevent
them.
• When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, then it might lead to
several problems. These problems are commonly referred to as concurrency problems in a database environment.
The five concurrency problems that can occur in the database are:
• Temporary Update Problem
• Incorrect Summary Problem
• Lost Update Problem
• Unrepeatable Read Problem
• Phantom Read Problem
Temporary Update Problem:
• Temporary update or dirty read problem occurs when one transaction updates an item and fails. But the updated
item is used by another transaction before the item is changed or reverted back to its last value.

• In the above example, if transaction 1 fails for some reason


then X will revert back to its previous value. But transaction
2 has already read the incorrect value of X.
Incorrect Summary Problem:
• Consider a situation, where one transaction is applying the aggregate function on some records while another
transaction is updating these records. The aggregate function may calculate some values before the values have been
updated and others after they are updated.

• In the above example, transaction 2 is calculating the sum of


some records while transaction 1 is updating them.
Therefore the aggregate function may calculate some values
before they have been updated and others after they have
been updated.
Lost Update Problem:
• In the lost update problem, an update done to a data item by a transaction is lost as it is overwritten by the update
done by another transaction.

• In the above example, transaction 2 changes the


value of X but it will get overwritten by the write
commit by transaction 1 on X (not shown in the
image above). Therefore, the update done by
transaction 2 will be lost. Basically, the write commit
done by the last transaction will overwrite all
previous write commits.
Unrepeatable Read Problem:
• The unrepeatable problem occurs when two or more read operations of the same transaction read different values of
the same variable.

• In the above example, once transaction 2 reads the variable


X, a write operation in transaction 1 changes the value of
the variable X. Thus, when another read operation is
performed by transaction 2, it reads the new value of X
which was updated by transaction 1.
Phantom Read Problem:
The phantom read problem occurs when a transaction reads a variable once but when it tries to read that same variable
again, an error occurs saying that the variable does not exist.

• In the above example, once transaction 2 reads the variable


X, transaction 1 deletes the variable X without transaction
2’s knowledge. Thus, when transaction 2 tries to read X, it is
not able to do it.
Recovery With Concurrent Transactions

• Concurrency control means that multiple transactions can be executed at the same time and then the
interleaved logs occur. But there may be changes in transaction results so maintain the order of execution
of those transactions.
• During recovery, it would be very difficult for the recovery system to backtrack all the logs and then start
recovering.
• Recovery with concurrent transactions can be done in the following four ways.
• Interaction with concurrency control
• Transaction rollback
• Checkpoints
• Restart recovery
Interaction with concurrency control :

• In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used. So, to rollback a
failed transaction, we must undo the updates performed by the transaction.
Transaction rollback :
• In this scheme, we rollback a failed transaction by using the log.
• The system scans the log backward a failed transaction, for every log record found in the log the system restores the
data item.
Checkpoints :
• Checkpoints is a process of saving a snapshot of the applications state so that it can restart from that point in case of
failure.
• Checkpoint is a point of time at which a record is written onto the database form the buffers.
• Checkpoint shortens the recovery process.
• When it reaches the checkpoint, then the transaction will be updated into the database, and till that point, the entire log
file will be removed from the file. Then the log file is updated with the new step of transaction till the next checkpoint
and so on.
• The checkpoint is used to declare the point before which the DBMS was in the consistent state, and all the transactions
were committed.
To ease this situation, ‘Checkpoints‘ Concept is used by the most DBMS.
• In this scheme, we used checkpoints to reduce the number of log records that the system must scan when it
recovers from a crash.
• In a concurrent transaction processing system, we require that the checkpoint log record be of the form
<checkpoint L>, where ‘L’ is a list of transactions active at the time of the checkpoint.
• A fuzzy checkpoint is a checkpoint where transactions are allowed to perform updates even while buffer blocks are
being written out.

Restart recovery :
• When the system recovers from a crash, it constructs two lists.
• The undo-list consists of transactions to be undone, and the redo-list consists of transaction to be redone.
• The system constructs the two lists as follows: Initially, they are both empty. The system scans the log backward,
examining each record, until it finds the first <checkpoint> record.
What is CRUD in MongoDB?
• CRUD operations describe the conventions of a user-interface that let users view, search, and modify parts of the
database.
• MongoDB documents are modified by connecting to a server, querying the proper documents, and then
changing the setting properties before sending the data back to the database to be updated. CRUD is data-
oriented, and it’s standardized according to HTTP action verbs.
When it comes to the individual CRUD operations:
• The Create operation is used to insert new documents in the MongoDB database.
• The Read operation is used to query a document in the database.
• The Update operation is used to modify existing documents in the database.
• The Delete operation is used to remove documents in the database.

How to Perform CRUD Operations


• Now that we’ve defined MongoDB CRUD operations, we can take a look at how to carry out the individual
operations and manipulate documents in a MongoDB database. Let’s go into the processes of creating, reading,
updating, and deleting documents, looking at each operation in turn.
Create Operations
• For MongoDB CRUD, if the specified collection doesn’t exist, the create operation will create the collection when
it’s executed. Create operations in MongoDB target a single collection, not multiple collections. Insert operations
in MongoDB are atomic on a single document level.
• MongoDB provides two different create operations that you can use to insert documents into a collection:
Method Description

db.collection.insertOne() It is used to insert a single document in the collection.

db.collection.insertMany() It is used to insert multiple documents in the collection.

db.createCollection() It is used to create an empty collection.


insertOne()
• As the namesake, insertOne() allows you to insert one document into
the collection. For this example, we’re going to work with a collection db.RecordsDB.insertOne({
name: "Marsh",
called RecordsDB. We can insert a single entry into our collection by age: "6 years",
calling the insertOne() method on RecordsDB. We then provide the species: "Dog",
ownerAddress: "380 W. Fir Ave",
information we want to insert in the form of key-value pairs, chipped: true
establishing the schema. })
db.RecordsDB.insertMany([{
insertMany()
name: "Marsh",
It’s possible to insert multiple items at one time by calling the age: "6 years",
species: "Dog",
insertMany() method on the desired collection. In this case, we
ownerAddress: "380 W. Fir Ave",
pass multiple items into our chosen collection (RecordsDB) and chipped: true},
{name: "Kitana",
separate them by commas. Within the parentheses, we use
age: "4 years",
brackets to indicate that we are passing in a list of multiple species: "Cat",
ownerAddress: "521 E. Cortland",
entries. This is commonly referred to as a nested method.
chipped: true}])

db.RecordsDB.insertMany([{ name: "Marsh", age: "6 years", species: "Dog",


ownerAddress: "380 W. Fir Ave", chipped: true}, {name: "Kitana", age: "4 years",
species: "Cat", ownerAddress: "521 E. Cortland", chipped: true}])
{
"acknowledged" : true,
"insertedIds" : [
ObjectId("5fd98ea9ce6e8850d88270b4"),
ObjectId("5fd98ea9ce6e8850d88270b5")
]
}
Read Operations
• The read operations allow you to supply special query filters and criteria that let you specify which documents
you want. The MongoDB documentation contains more information on the available query filters. Query
modifiers may also be used to change how many results are returned.
• MongoDB has two methods of reading documents from a collection:
db.collection.find()
db.collection.findOne()
find()
• In order to get all the documents from a collection, we can simply use the find() method on our chosen
collection. Executing just the find() method with no arguments will return all records currently in the collection.
db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species"
: "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years", "species"
: "Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "3 years", "species" :
"Dog", "ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "8 years", "species" :
"Dog", "ownerAddress" : "900 W. Wood Way", "chipped" : true }
• Here we can see that every record has an assigned “ObjectId” mapped to the “_id” key.
• If you want to get more specific with a read operation and find a desired subsection of the records, you can use
the previously mentioned filtering criteria to choose what results should be returned. One of the most common
ways of filtering the results is to search by value.

db.RecordsDB.find({"species":"Cat"})
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat", "o

findOne()
• In order to get one document that satisfies the search criteria, we can simply use the findOne() method on our
chosen collection. If multiple documents satisfy the query, this method returns the first document according to the
natural order which reflects the order of documents on the disk. If no documents satisfy the search criteria, the
function returns null. The function takes the following form of syntax.
db.{collection}.findOne({query}, {projection})

{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "8 years", "species" : "Cat",


"ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years", "species" : "Dog",
"ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "3 years", "species" : "Dog",
"ownerAddress" : "380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "8 years", "species" : "Dog",
"ownerAddress" : "900 W. Wood Way", "chipped" : true }

db.RecordsDB.find({"age":"8 years"})

{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "8 years", "species" : "Cat",


"ownerAddress" : "521 E. Cortland", "chipped" : true }
Update Operations
• Like create operations, update operations operate on a single collection, and they are atomic at a single document
level. An update operation takes filters and criteria to select the documents you want to update.
• You should be careful when updating documents, as updates are permanent and can’t be rolled back. This applies to
delete operations as well.
• For MongoDB CRUD, there are three different methods of updating documents:
• db.collection.updateOne()
db.collection.updateMany()
db.collection.replaceOne()
updateOne()
We can update a currently existing record and change a single document with an update operation. To do this, we
use the updateOne() method on a chosen collection, which here is “RecordsDB.” To update a document, we
provide the method with two arguments: an update filter and an update action.

The update filter defines which items we want to update, and the update action defines how to update those
items. We first pass in the update filter. Then, we use the “$set” key and provide the fields we want to update as a
value. This method will update the first record that matches the provided filter.

db.RecordsDB.updateOne({name: "Marsh"}, {$set:{ownerAddress: "451 W. Coffee St. A204"}})


{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "6 years", "species" : "Dog",
"ownerAddress" : "451 W. Coffe
updateMany()
updateMany() allows us to update multiple items by passing in a list of items, just as we did when inserting multiple
items. This update operation uses the same syntax for updating a single document.

db.RecordsDB.updateMany({species:"Dog"}, {$set: {age: "5"}})


{ "acknowledged" : true, "matchedCount" : 3, "modifiedCount" : 3 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat",
"ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" : "Dog", "ownerAddress" :
"451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" : "Dog", "ownerAddress" : "380
W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Kevin", "age" : "5", "species" : "Dog", "ownerAddress" :
"900 W. Wood Wa
replaceOne()
The replaceOne() method is used to replace a single document in the specified collection. replaceOne() replaces
the entire document, meaning fields in the old document not contained in the new will be lost.

db.RecordsDB.replaceOne({name: "Kevin"}, {name: "Maki"})


{ "acknowledged" : true, "matchedCount" : 1, "modifiedCount" : 1 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat",
"ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" : "Dog", "ownerAddress"
: "451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" : "Dog", "ownerAddress" :
"380 W. Fir Ave", "chipped" : true }
{ "_id" : ObjectId("5fd994efce6e8850d88270ba"), "name" : "Maki" }
Delete Operations
Delete operations operate on a single collection, like update and create operations. Delete operations are also
atomic for a single document. You can provide delete operations with filters and criteria in order to specify which
documents you would like to delete from a collection. The filter options rely on the same syntax that read operations
utilize.
MongoDB has two different methods of deleting records from a collection:
db.collection.deleteOne()
db.collection.deleteMany()

deleteOne()
deleteOne() is used to remove a document from a specified collection on the MongoDB server. A filter criteria
is used to specify the item to delete. It deletes the first record that matches the provided filter.
db.RecordsDB.deleteOne({name:"Maki"})
{ "acknowledged" : true, "deletedCount" : 1 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat",
"ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" : "Dog", "ownerAddress" :
"451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" : "Dog", "ownerAddress" :
"380 W. Fir Ave", "chipped" : true }
deleteMany()
deleteMany() is a method used to delete multiple documents from a desired collection with a single delete operation.
A list is passed into the method and the individual items are defined with filter criteria as in deleteOne().

db.RecordsDB.deleteMany({species:"Dog"})
{ "acknowledged" : true, "deletedCount" : 2 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat"

You might also like