Unit 5
• A database system presents a unified, high-level view of the stored data. Underneath, however, the data is stored as bits and bytes on different storage devices.
• In this section, we give an overview of the types of storage devices used for storing and accessing data:
• Primary Storage
• Secondary Storage
• Tertiary Storage
Primary Storage
• Primary storage offers the quickest access to stored data. It is also known as volatile storage, because this type of memory does not retain data permanently: if the system loses power or crashes, the data is lost. Main memory and cache are the two types of primary storage.
• Main Memory: This is the memory in which data must reside before it can be operated on; every instruction a computer executes works on data in main memory. It can hold gigabytes of data on a typical system, but it is usually too small to hold an entire database. Main memory loses all of its contents if the system shuts down, whether because of a power failure or for other reasons.
• Cache: The cache is the costliest storage medium, but also the fastest. It is a tiny store that is usually managed by the computer hardware. When designing data structures, algorithms, and query processors, designers take cache effects into account.
Secondary Storage
• Secondary storage is also called online storage. It allows the user to save and store data permanently: this type of memory does not lose data on a power failure or system crash, which is why it is also called non-volatile storage.
• The following secondary storage media are available in almost every type of computer system:
• Flash Memory: Flash memory stores data in devices such as USB (Universal Serial Bus) keys, which plug into the USB slots of a computer system. USB keys help transfer data between systems and come in a range of capacities. Unlike main memory, flash memory retains its data across a power cut or system crash. Flash storage is commonly used in server systems for caching frequently used data; this improves performance, and flash can hold far more of a database than main memory can.
• Magnetic Disk Storage: This storage medium is also classified as online storage. A magnetic disk stores data for the long term and is capable of holding an entire database. The computer system is responsible for bringing data from disk into main memory for access, and if any operation modifies the data, the modified data must be written back to the disk. A great strength of magnetic disk is that a system crash or failure does not affect the stored data; a disk failure, however, can easily destroy it.
Tertiary Storage: Tertiary storage is external to the computer system. It is the slowest type of storage, but it can hold very large amounts of data. It is also known as offline storage and is generally used for data backup. The following tertiary storage devices are available:
• Optical Storage: Optical storage can hold megabytes or gigabytes of data. A Compact Disc (CD) can store 700 megabytes of data, with a playtime of around 80 minutes. A Digital Versatile Disc (DVD) can store 4.7 gigabytes per side in single-layer form, or 8.5 gigabytes per side in dual-layer form.
• Tape Storage: Tape is the cheapest storage medium, cheaper even than disk. Tapes are generally used for archiving or backing up data. Access is slow because a tape must be read sequentially from the beginning; tape storage is therefore also known as sequential-access storage. Disk storage, by contrast, is known as direct-access storage because data can be read directly from any location on the disk.
Transaction Control in DBMS
• A transaction is a single logical unit of work that accesses and possibly modifies the contents of the database. Transactions access data using read and write operations.
• A transaction is a single logical unit of processing that may consist of many operations. Transactions are needed whenever more than one user wants to access the same database. Every transaction must satisfy the ACID properties.
ACID Properties of a Transaction:
• Atomicity: All the operations in a transaction are treated as one atomic unit. If the system fails, or any read/write conflict occurs during the transaction, the system must revert to its previous state. Atomicity is maintained by the transaction management component.
• Consistency: If the transaction fails because of invalid data during an operation, the system reverts to its previous consistent state. Consistency is maintained by the application manager.
• Isolation: If multiple transactions execute on a single database, each transaction should be isolated from the others: an operation in one transaction must not interfere with an operation in another. Isolation is maintained by the concurrency control manager.
• Durability: The changes made by a transaction must persist after the transaction completes. They must be permanent and must not be lost due to any database failure. Durability is maintained by the recovery manager.
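As a sketch, atomicity and the "revert to the previous state" idea can be illustrated with a toy in-memory database. All names here (accounts, transfer) are illustrative, not a real DBMS API:

```javascript
// Toy sketch of atomicity: a transaction applies all of its writes or none.
const accounts = { A: 100, B: 50 };

function transfer(db, from, to, amount) {
  const snapshot = { ...db };            // save state for possible rollback
  try {
    if (db[from] < amount) throw new Error("insufficient funds");
    db[from] -= amount;                  // both writes happen together...
    db[to] += amount;                    // ...or neither does
    return true;
  } catch (e) {
    Object.assign(db, snapshot);         // atomicity: revert on failure
    return false;
  }
}

transfer(accounts, "A", "B", 30);        // succeeds: A=70, B=80
transfer(accounts, "A", "B", 500);       // fails mid-way: state rolled back
console.log(accounts.A, accounts.B);     // 70 80
```

The snapshot plays the role a DBMS log plays in a real system: it records enough information to undo a failed transaction's effects.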
Concurrency Control
• Concurrency control is an essential aspect of database management systems (DBMS): it ensures that transactions can execute concurrently without interfering with each other. Concurrency control can be challenging to implement, and without it several problems can arise that affect the consistency of the database. In this section, we discuss some of the concurrency problems that can occur in DBMS transactions and explore solutions to prevent them.
• When multiple transactions execute concurrently in an uncontrolled or unrestricted manner, then it might lead to
several problems. These problems are commonly referred to as concurrency problems in a database environment.
The five concurrency problems that can occur in the database are:
• Temporary Update Problem
• Incorrect Summary Problem
• Lost Update Problem
• Unrepeatable Read Problem
• Phantom Read Problem
Temporary Update Problem:
• The temporary update (dirty read) problem occurs when one transaction updates an item and then fails, but the updated item is read by another transaction before the item is reverted to its original value.
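A minimal sketch of the dirty read, with no concurrency control applied (the variable names are illustrative):

```javascript
// T1 and T2 share a buffer; T2 reads a value T1 wrote before T1 aborts.
const committed = { X: 100 };        // last committed value of item X
let buffer = { ...committed };       // shared buffer both transactions see

// T1 updates X but has not committed yet
buffer.X = 200;

// T2 reads the uncommitted ("dirty") value
const dirtyRead = buffer.X;          // 200

// T1 fails and is rolled back
buffer = { ...committed };

console.log(dirtyRead, buffer.X);    // 200 100
```

T2 has acted on a value (200) that, after the rollback, never officially existed in the database.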
Recovery with Concurrent Transactions
• Concurrency control allows multiple transactions to execute at the same time, so their log records are interleaved. Because interleaving can change transaction results, the order of execution of those transactions must be preserved during recovery.
• During recovery, it would be very difficult for the recovery system to backtrack through all the interleaved logs and then start recovering.
• Recovery with concurrent transactions can be done in the following four ways.
• Interaction with concurrency control
• Transaction rollback
• Checkpoints
• Restart recovery
Interaction with concurrency control :
• In this scheme, the recovery scheme depends greatly on the concurrency control scheme that is used. To roll back a failed transaction, we must undo the updates performed by that transaction.
Transaction rollback :
• In this scheme, we roll back a failed transaction by using the log.
• The system scans the log backward for the failed transaction; for every log record of that transaction found in the log, the system restores the data item to its old value.
Checkpoints :
• Checkpointing is the process of saving a snapshot of the application's state so that the system can restart from that point in case of failure.
• A checkpoint is a point in time at which a record is written to the database from the buffers.
• Checkpoints shorten the recovery process.
• When a checkpoint is reached, all updates up to that point have been written to the database, and the portion of the log file before the checkpoint can be discarded. The log file then accumulates new transaction steps until the next checkpoint, and so on.
• The checkpoint declares a point before which the DBMS was in a consistent state and all transactions had been committed.
To ease this situation, the checkpoint concept is used by most DBMSs.
• In this scheme, checkpoints are used to reduce the number of log records that the system must scan when it recovers from a crash.
• In a concurrent transaction-processing system, we require that the checkpoint log record be of the form <checkpoint L>, where L is the list of transactions active at the time of the checkpoint.
• A fuzzy checkpoint is a checkpoint at which transactions are allowed to perform updates even while buffer blocks are being written out.
Restart recovery :
• When the system recovers from a crash, it constructs two lists.
• The undo-list consists of transactions to be undone, and the redo-list consists of transactions to be redone.
• The system constructs the two lists as follows: initially, both are empty. The system scans the log backward, examining each record, until it finds the first <checkpoint L> record.
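The backward scan can be sketched as follows. The log record format (`start`, `commit`, `checkpoint` with an `active` list) is an illustrative simplification, not a real DBMS log layout:

```javascript
// Build undo/redo lists by scanning the log backward to the last checkpoint.
const log = [
  { type: "start", txn: "T1" },
  { type: "checkpoint", active: ["T1"] },   // <checkpoint L>, L = ["T1"]
  { type: "start", txn: "T2" },
  { type: "commit", txn: "T1" },
  { type: "start", txn: "T3" },
  // crash here: T2 and T3 never committed
];

function buildLists(log) {
  const redo = [], undo = [];
  let i = log.length - 1;
  for (; i >= 0; i--) {                     // scan backward from the end
    const rec = log[i];
    if (rec.type === "commit") redo.push(rec.txn);
    else if (rec.type === "start" && !redo.includes(rec.txn))
      undo.push(rec.txn);                   // started but never committed
    if (rec.type === "checkpoint") break;   // stop at <checkpoint L>
  }
  if (i >= 0)                               // transactions active at the checkpoint
    for (const t of log[i].active)
      if (!redo.includes(t) && !undo.includes(t)) undo.push(t);
  return { undo, redo };
}

console.log(buildLists(log));  // { undo: [ 'T3', 'T2' ], redo: [ 'T1' ] }
```

T1 committed after the checkpoint, so it is redone; T2 and T3 started but never committed, so they are undone.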
What is CRUD in MongoDB?
• CRUD operations describe the conventions of a user interface that let users view, search, and modify parts of the database.
• MongoDB documents are modified by connecting to a server, querying for the proper documents, and then changing the desired properties before sending the data back to the database to be updated. CRUD is data-oriented, and its four operations map naturally onto the standard HTTP action verbs.
When it comes to the individual CRUD operations:
• The Create operation is used to insert new documents in the MongoDB database.
• The Read operation is used to query a document in the database.
• The Update operation is used to modify existing documents in the database.
• The Delete operation is used to remove documents in the database.
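The four operations above can be sketched with a tiny in-memory stand-in for a collection. The function names mimic MongoDB's, but this is just the idea, not the real driver API (in particular, this `updateOne` takes the new fields directly rather than a `$set` document):

```javascript
// Minimal in-memory sketch of the four CRUD operations.
const collection = [];

function insertOne(doc) { collection.push(doc); }                 // Create
function find(filter) {                                           // Read
  return collection.filter(d =>
    Object.keys(filter).every(k => d[k] === filter[k]));
}
function updateOne(filter, fields) {                              // Update
  const doc = find(filter)[0];             // first match only
  if (doc) Object.assign(doc, fields);
}
function deleteOne(filter) {                                      // Delete
  const doc = find(filter)[0];             // first match only
  if (doc) collection.splice(collection.indexOf(doc), 1);
}

insertOne({ name: "Kitana", species: "Cat" });
insertOne({ name: "Marsh", species: "Dog" });
updateOne({ name: "Marsh" }, { age: "5" });
deleteOne({ species: "Cat" });
console.log(collection);  // [ { name: 'Marsh', species: 'Dog', age: '5' } ]
```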
find()
• find() returns every document in the collection that matches the query. For example, to find all cats:
db.RecordsDB.find({"species":"Cat"})
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }
findOne()
• In order to get one document that satisfies the search criteria, we can simply use the findOne() method on our
chosen collection. If multiple documents satisfy the query, this method returns the first document according to the
natural order which reflects the order of documents on the disk. If no documents satisfy the search criteria, the
function returns null. The function takes the following form of syntax.
db.{collection}.findOne({query}, {projection})
db.RecordsDB.findOne({"age":"8 years"})
updateOne()
The update filter defines which documents we want to update, and the update action defines how to update them. We first pass in the update filter; then we use the "$set" key and provide the fields to update as its value. This method updates the first record that matches the provided filter.
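The filter-plus-"$set" behavior can be sketched with a toy implementation (`applySet` is illustrative, not MongoDB's actual code):

```javascript
// Toy sketch of updateOne semantics: match the first document against a
// filter, then merge in the fields from the update's $set document.
function applySet(docs, filter, update) {
  const target = docs.find(d =>
    Object.keys(filter).every(k => d[k] === filter[k]));  // first match
  if (target && update.$set) Object.assign(target, update.$set);
  return target;
}

const pets = [{ name: "Marsh", age: "5" }, { name: "Loo", age: "5" }];
applySet(pets, { name: "Marsh" }, { $set: { age: "6" } });
console.log(pets[0].age, pets[1].age);  // 6 5
```

Note that only the first matching document changes; the second document with age "5" is untouched, which is exactly how updateOne differs from updateMany.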
deleteOne()
deleteOne() is used to remove a document from a specified collection on the MongoDB server. A filter criterion specifies the item to delete. It deletes the first record that matches the provided filter.
db.RecordsDB.deleteOne({name:"Maki"})
{ "acknowledged" : true, "deletedCount" : 1 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat",
"ownerAddress" : "521 E. Cortland", "chipped" : true }
{ "_id" : ObjectId("5fd993a2ce6e8850d88270b7"), "name" : "Marsh", "age" : "5", "species" : "Dog", "ownerAddress" :
"451 W. Coffee St. A204", "chipped" : true }
{ "_id" : ObjectId("5fd993f3ce6e8850d88270b8"), "name" : "Loo", "age" : "5", "species" : "Dog", "ownerAddress" :
"380 W. Fir Ave", "chipped" : true }
deleteMany()
deleteMany() is a method used to delete multiple documents from a desired collection with a single delete operation. A filter document is passed into the method, just as in deleteOne(), but every document matching the filter criteria is deleted rather than only the first.
db.RecordsDB.deleteMany({species:"Dog"})
{ "acknowledged" : true, "deletedCount" : 2 }
> db.RecordsDB.find()
{ "_id" : ObjectId("5fd98ea9ce6e8850d88270b5"), "name" : "Kitana", "age" : "4 years", "species" : "Cat", "ownerAddress" : "521 E. Cortland", "chipped" : true }