
DATABASE MANAGEMENT SYSTEM

MODULE -5
SUBJECT CODE: 21AIM45A
MODULE –V : SYLLABUS

NoSQL Databases: What is NoSQL, Need of NoSQL,
Features of NoSQL, CAP Theorem, ACID v/s BASE,
Advantages & Disadvantages of NoSQL,
Types of NoSQL: Key-Value database, Document-based database,
Column-based database, Graph-based database.

Introduction to Cassandra: Architecture, Gossip protocol, Snitches,
Virtual Nodes, write consistency level and write process, read
consistency level and read data operation, compaction, Anti-entropy,
Tombstones
Relational databases Vs NoSQL

✓ Relational databases require that schemas be defined before you can add data. For example, you
might want to store data about your customers such as phone numbers, first and last name,
address, city and state – a SQL database needs to know this in advance.

✓ This fits poorly with agile development approaches, because each time you complete new features,
the schema of your database often needs to change. So if you decide, a few iterations into
development, that you'd like to store customers' favorite items in addition to their addresses and
phone numbers, you'll need to add that column to the database, and then migrate the entire
database to the new schema.

✓ If the database is large, this is a very slow process that involves significant downtime.
If you are frequently changing the data your application stores, this downtime may also be frequent.

✓ There's also no way, using a relational database, to effectively address data that's completely
unstructured or unknown in advance.

✓ NoSQL databases are built to allow the insertion of data without a predefined schema. That
makes it easy to make significant application changes in real-time, without worrying about
service interruptions – which means development is faster, code integration is more reliable,
and less database administrator time is needed.
What is unstructured data?
• Unstructured data is data that you cannot store in the traditional structure of a relational
database.

• It is sometimes referred to as qualitative data — you can derive meaning from it, but it can also
be incredibly ambiguous and difficult to parse.

• Because of the nature of unstructured data, document-type storage options, which are non-
relational (NoSQL) databases, are ideal since they can more easily adapt their data model with
a flexible schema.

Unstructured data examples


•Email: While we sometimes consider this semi-structured, email message fields are text fields
that are not always easily analyzed.
•Multimedia content: Digital photos, audio, and video files are all unstructured. Complicating
matters, multimedia can come in multiple format files, produced through various means. For
instance, a photo can be TIFF, JPEG, GIF, PNG, or RAW, each with its own characteristics.
•Text files: Almost all traditional business files, including your word processing documents,
presentations, notes, and PDFs, are unstructured data.
•Social media: Social media has a component of semi-structured data you can access through
built-in analytics, but the content of each social media message is unstructured.
Unstructured data examples
•Websites and markup language:
The content on the web may be tagged, but code is not designed to capture the
meaning or function of tagged elements in ways that support automated
processing of the information contained on each page. XML provides an element
of structure; however, these building blocks are filled with unstructured elements.

•Mobile and communications data:


Your customer service and sales team are collating unstructured data in their
phone calls and chat logs, including text messages, phone recordings,
collaboration software, conferencing, and instant messaging.

•Survey responses:
Every time you gather feedback from your customers, you're collecting
unstructured data. For example, surveys with text responses and open-ended
comment fields are unstructured data.
NoSQL Database
❖NoSQL Database is a term used to refer to a non-SQL or non-
relational database.

❖It provides a mechanism for storage and retrieval of


data other than the tabular relations model used in
relational databases.
❖ NoSQL database doesn't use tables for storing data. It is
generally used for big data and real-time web
applications, where there is a high volume of data that needs
to be processed and analyzed in real-time, such as social
media analytics, e-commerce, and gaming. They can also be
used for other applications, such as content management
systems, document management, and customer relationship
management.
NoSQL Databases-Technical concepts

❑ NoSQL databases are open-source, schema-less, horizontally


scalable and high-performance databases.

❑ The four types of data stores in NoSQL databases (key-value store,


document store, column store, and graph store) contribute to
significant flexibility for a range of applications.

❑ NoSQL databases are well suited to handle typical challenges of big


data, including volume, variety, and velocity.

❑ For these reasons, they are increasingly adopted by private


industries and used in research.
❑ They have gained tremendous popularity in the last decade due to
their ability to manage unstructured data (e.g. social media data).
NoSQL databases may not be suitable for all applications, as they may not provide
the same level of data consistency and transactional guarantees as traditional
relational databases
History behind the creation of NoSQL Databases

❖ In the early 1970s, flat file systems were used. Data were stored in
flat files, and the biggest problem with flat files was that each company
implemented its own flat files with no common standards. It was very
difficult to store data in and retrieve data from these files because
there was no standard way to do so.

❖ Then the relational database was created by E.F. Codd, and these
databases answered the question of having no standard way to
store data.

❖ Later, relational databases ran into the problem of not being able to
handle big data. This created the need for a database that could
handle such workloads, and the NoSQL database was developed.
Characteristics of NoSQL databases

With a technique called “sharding”, MongoDB easily distributes data
and grows the deployment over inexpensive hardware or in the
cloud. One of the benefits of scaling with MongoDB is that
sharding is automatic and built into the database. This relieves
developers of having to build sharding logic into the
application code to scale out the system.
ACID Vs BASE properties
Types of NOSQL

NoSQL databases are generally classified into four main categories:

1.Document databases: These databases store data as semi-structured documents,


such as JSON or XML, and can be queried using document-oriented query
languages.

2.Key-value stores: These databases store data as key-value pairs, and are
optimized for simple and fast read/write operations.

3.Column-family stores:
These databases store data as column families, which are sets of columns that are
treated as a single entity. They are optimized for fast and efficient querying of large
amounts of data.

4.Graph databases:
These databases store data as nodes and edges, and are designed to handle
complex relationships between data.
Graph store database
A graph store is a database that is designed to handle data with
complex relationships and interconnections.
In a graph database, data is stored as nodes and edges, where
nodes represent entities and edges represent the relationships
between those entities.
Graph stores include Neo4J and HyperGraphDB.
Key Features of NoSQL

❖ Dynamic schema: NoSQL databases do not have a fixed schema


and can accommodate changing data structures without the
need for migrations or schema alterations.
❖ Horizontal scalability: NoSQL databases are designed to scale out
by adding more nodes to a database cluster, making them well-
suited for handling large amounts of data and high levels of
traffic.
❖ Document-based: Some NoSQL databases, such as MongoDB, use
a document-based data model, where data is stored in semi-
structured format, such as JSON or BSON.
❖ Key-value-based: Other NoSQL databases, such as Redis, use a
key-value data model, where data is stored as a collection of key-
value pairs.
❖ Column-based: Some NoSQL databases, such as
Cassandra, use a column-based data model, where data is
organized into columns instead of rows.
❖ Distributed and high availability: NoSQL databases are
often designed to be highly available and to automatically
handle node failures and data replication across multiple
nodes in a database cluster.
❖ Flexibility: NoSQL databases allow developers to store
and retrieve data in a flexible and dynamic manner, with
support for multiple data types and changing data
structures.
❖ Performance: NoSQL databases are optimized for high
performance and can handle a high volume of reads and
writes, making them suitable for big data and real-time
applications.
Need of NOSQL

•NoSQL databases frequently enable developers to customize the data format.


They are well suited to contemporary Agile production techniques centred on
sprints, fast modifications, and frequent software releases.

•It can be time-consuming for a programmer to request that a SQL database administrator
modify the database structure and afterwards unload and then reload the
data.

•NoSQL databases are frequently more suitable for collecting and analyzing
structured, semi-structured, and unstructured data in a centralized database.

•NoSQL databases frequently store information in a format comparable to the
entities used in applications, minimizing the requirement for translation between
the format in which the information is stored and the format in which the data
appears in code.

•As part of its core architecture, NoSQL databases were originally designed to
manage large amounts of information.
Types of NoSQL Database:
•Document-based databases
•Key-value stores
•Column-oriented databases
•Graph-based databases
Document-Based Database:
❖ The document-based database is a nonrelational database. Instead of storing the
data in rows and columns (tables), it uses documents to store the data in the database.
❖ A document database stores data in JSON, BSON, or XML documents.
❖ Documents can be stored and retrieved in a form that is much closer to the data
objects used in applications, which means less translation is required to use the data
in the applications.
❖ In a document database, particular elements can be accessed by using the index
value that is assigned for faster querying.

Key features of document database:
❖ Flexible schema: Documents in the database have a flexible schema. It means the
documents in the database need not have the same schema.
❖ Faster creation and maintenance: the creation of documents is easy and minimal
maintenance is required once we create the document.
❖ No foreign keys: There is no dynamic relationship between two documents, so
documents can be independent of one another. So, there is no requirement for a
foreign key in a document database.
❖ Open formats: To build a document we use XML, JSON, and others.
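As an illustrative sketch of the flexible-schema idea above (field names and values are made up), two customer documents in the same collection can carry different fields, which is exactly what a fixed relational schema would not allow:

import json

customer_1 = {"_id": 1, "name": "Asha", "phone": "9845000000",
              "address": {"city": "Bengaluru", "state": "Karnataka"}}
customer_2 = {"_id": 2, "name": "Ravi",
              "favorite_items": ["shoes", "camera"]}   # extra field, no schema migration needed

print(json.dumps(customer_1, indent=2))   # stored and retrieved close to the application object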
Key-Value Stores:
❖A key-value store is a nonrelational database. The simplest form of a
NoSQL database is a key-value store. Every data element in the database
is stored in key-value pairs. The data can be retrieved by using a unique
key allotted to each element in the database. The values can be simple
data types like strings and numbers or complex objects.

❖A key-value store is like a relational database with only two columns:
the key and the value.
Key features of the key-value store:
•Simplicity.
•Scalability.
•Speed.
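A minimal Python sketch of the key-value idea (a real store such as Redis exposes the same get/put operations over the network; the keys and values here are made up):

store = {}                                                     # the whole database: key -> value
store["user:42"] = {"name": "Asha", "cart": ["book", "pen"]}   # the value can be simple or a complex object
print(store.get("user:42"))                                    # retrieval is a single lookup by the unique key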
Column Oriented Databases:
❖ A column-oriented database is a non-relational database that
stores the data in columns instead of rows. That means when you
want to run analytics on a small number of columns, you can
read those columns directly without consuming memory with the
unwanted data.
❖ Columnar databases are designed to read data more efficiently
and retrieve the data with greater speed. A columnar database is
used to store a large amount of data.

Key features of column-oriented database:
•Scalability.
•Compression.
•Very responsive.
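A rough Python sketch contrasting row-oriented and column-oriented layouts (the data is illustrative only):

row_store = [                                   # row-oriented: each record kept together
    {"id": 1, "name": "Asha", "city": "Mysuru"},
    {"id": 2, "name": "Ravi", "city": "Hubli"},
]
column_store = {                                # column-oriented: each column kept together
    "id":   [1, 2],
    "name": ["Asha", "Ravi"],
    "city": ["Mysuru", "Hubli"],
}
cities = column_store["city"]                   # an analytic query on one column reads only that list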
Graph-Based databases:
❖Graph-based databases focus on the relationships between the elements. They store
the data in the form of nodes in the database. The connections
between the nodes are called links or relationships.

❖Key features of graph database:


❖In a graph-based database, it is easy to identify the relationship between
the data by using the links.
❖The Query’s output is real-time results.
❖The speed depends upon the number of relationships among the
database elements.
❖Updating data is also easy, as adding a new node or edge to a graph
database is a straightforward task that does not require significant
schema changes.
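A small Python sketch of the nodes-and-edges model (entity and relationship names are invented for illustration):

nodes = {"alice": {"type": "Person"}, "bob": {"type": "Person"}, "neo4j": {"type": "Product"}}
edges = [("alice", "FRIENDS_WITH", "bob"), ("alice", "LIKES", "neo4j")]

# Traversing a relationship is just following edges out of a node:
friends_of_alice = [dst for src, rel, dst in edges if src == "alice" and rel == "FRIENDS_WITH"]
print(friends_of_alice)   # ['bob']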
CAP Theorem
❑ NoSQL databases are also distributed databases, which means that information is copied and
stored on various servers, which can be remote or local.

❑ This ensures availability and reliability of data. If some of the data goes offline, the rest
of the database can continue to run.

❑ The CAP theorem, originally introduced as the CAP principle, can be used to explain some of the competing
requirements in a distributed system with replication. It is a tool used to make system designers aware of the
trade-offs while designing networked shared-data systems.

❑ The three letters in CAP refer to three desirable properties of distributed systems with replicated data:
consistency (among replicated copies), availability (of the system for read and write operations) and partition
tolerance (in the face of the nodes in the system being partitioned by a network fault).

❑ The CAP theorem states that it is not possible to guarantee all three of the desirable properties –
consistency, availability, and partition tolerance at the same time in a distributed system with data
replication.
CAP Theorem

In CAP, the term consistency refers to the


consistency of the values in different copies of
the same data item in a replicated distributed
system.
The theorem states that networked shared-data systems can only strongly support two of the
following three properties:
•Consistency –
Consistency means a guarantee that every node in a distributed cluster returns the same, most recent and a
successful write. Consistency refers to every client having the same view of the data. There are various types of
consistency models. Consistency in CAP refers to sequential consistency, a very strong form of consistency.

•Availability –
Availability means that each read or write request for a data item will either be processed successfully or will
receive a message that the operation cannot be completed. Every non-failing node returns a response for all the
read and write requests in a reasonable amount of time. The key word here is “every”. In simple terms, every
node (on either side of a network partition) must be able to respond in a reasonable amount of time.

•Partition Tolerance –
Partition tolerance means that the system can continue operating even if the network connecting the nodes has a
fault that results in two or more partitions, where the nodes in each partition can only communicate among each
other. That means, the system continues to function and upholds its consistency guarantees in spite of network
partitions. Network partitions are a fact of life. Distributed systems guaranteeing partition tolerance can gracefully
recover from partitions once the partition heals.
CAP theorem NoSQL database types
NoSQL databases are ideal for distributed network applications.

NoSQL databases are horizontally scalable and distributed by design—


they can rapidly scale across a growing network consisting of multiple
interconnected nodes.
Today, NoSQL databases are classified based on the CAP characteristics they
support:

1.CP database: A CP database delivers consistency and partition tolerance at


the expense of availability. When a partition occurs between any two nodes,
the system has to shut down the non-consistent node (i.e., make it
unavailable) until the partition is resolved.
2.AP database: An AP database delivers availability and partition tolerance at the expense of
consistency. When a partition occurs, all nodes remain available but those at the wrong end of a
partition might return an older version of data than others. (When the partition is resolved, the
AP databases typically resync the nodes to repair all inconsistencies in the system.)

3.CA database:
A CA database delivers consistency and availability across all nodes. It can’t do this if there is a
partition between any two nodes in the system, however, and therefore can’t deliver fault
tolerance.
In a distributed system, partitions can’t be avoided. So, while we can discuss a CA distributed
database in theory, for all practical purposes a CA distributed database can’t exist. This doesn’t
mean you can’t have a CA database for your distributed application if you need one.

Many relational databases, such as PostgreSQL, deliver consistency and availability and can
be deployed to multiple nodes using replication.
SQL
❖ In relational database management systems, data is arranged in tables.
❖ SQL provided a standard interface to interact with relational data, allowing analysts to connect
tables by merging on common fields.
❖ Provides vertical scaling of data.
❖ Not a popular choice for unstructured data, big data and real-time data applications.

NoSQL
❖ NoSQL databases are non-relational, which eliminates the need for connecting tables.
❖ Their built-in sharding and high-availability capabilities ease horizontal scaling.
❖ If a single database server is not enough to store all your data or handle all the queries, the
workload can be divided across two or more servers, allowing companies to scale their data
horizontally.
❖ In this era of growth within cloud, big data, and mobile and web applications, NoSQL databases
provide that speed and scalability, making them a popular choice for their performance and ease
of use.

Introduction to Cassandra
• Architecture
• Gossip protocol
• Snitches, Virtual Nodes
• write consistency level and write process
• read consistency level and read data operation
• indexing, compaction
• Anti-entropy
• Tombstones

Cassandra
• Apache Cassandra is an open source distributed database management system designed
to handle large amounts of data across many commodity servers, providing high
availability with no single point of failure.

• Apache Cassandra was developed by Avinash Lakshman and Prashant Malik when
both were working as engineers at Facebook. The database was designed to
power Facebook's inbox search feature, making it easy for users to quickly find the
conversations and other content they were looking for.

• Cassandra offers robust support for clusters spanning multiple datacenters, with
asynchronous masterless replication allowing low latency operations for all clients.

1.A distributed database:


• In computing, distributed means splitting data or tasks across multiple
machines. In the context of Cassandra, it means that the data is distributed
across multiple machines. It means that no single node (a machine in a
cluster is usually called a node) holds all the data, but just a chunk of it.

• It means that you are not limited by the storage and processing capabilities
of a single machine. If the data gets larger, add more machines. If you need
more parallelism (ability to access data in parallel/concurrently), add more
machines.

• This means that a node going down does not mean that all the data is lost. If
a distributed mechanism is well designed, it will scale with the number of
nodes. Cassandra is one of the best examples of such a system.

2.High availability
A high-availability system is one that is ready to serve any request at any
time. High availability is usually achieved by adding redundancies. So, if
one part fails, the other part of the system can serve the request. To a
client, it seems as if everything works fine.

Cassandra is robust software. Nodes joining and leaving are


automatically taken care of.

With proper settings, Cassandra can be made failure-resistant. This means


that if some of the servers fail, the data loss will be zero.

So, you can just deploy Cassandra over cheap commodity hardware or a
cloud environment, where hardware or infrastructure failures may occur.

3. Replication
Cassandra has a pretty powerful replication mechanism. Cassandra treats every node in the same manner.

Data need not be written on a specific server (master), and you need not wait until the data is written to all the
nodes that replicate this data (slaves).

So, there is no master or slave in Cassandra, and replication happens asynchronously. This means that
the client can be returned with success as a response as soon as the data is written on at least one server.

4.Multiple data centers


Expanding from a single machine to a single data center cluster or multiple data centers is very simple compared to
traditional databases where you need to make a plethora of configuration changes and watch replication.

We can use each data center to perform different tasks without overloading the other data centers.

This is powerful support: you do not have to worry about whether users in Japan, served by a data center in Tokyo, and users
in the US, served by a data center in Virginia, are in sync or not.

Cassandra data model


• Cassandra has three containers, one within another.
• The outermost container is keyspace. You can think of keyspace as a database in the
RDBMS.
• Tables reside under keyspace. A table can be assumed as a relational database table,
except it is more flexible.
• A table is basically a sorted map of sorted maps.
• Each table must have a primary key. This primary key is called row key or partition key.
• (In a CQL table, the row key is the same as the primary key. If the primary key is made
up of more than one column, the first component of this composite key is equivalent to
the row key).
• Each partition is associated with a set of cells. Each cell has a name and a value. These
cells may be thought of as columns in the traditional database system.
• The CQL engine interprets a group of cells with the same cell name prefix as a row.

Cassandra
data model

Features of Cassandra

The benefit of having such a flexible data storage mechanism is that you can have an arbitrary number of cells with
customized names and have a partition key store data as a list of tuples (a tuple is an ordered set; in this case, the tuple
is a key-value pair).

This comes in handy when you have to store things such as time series, for example, if you want to use Cassandra to store
your Facebook timeline or your Twitter feed, or you want the partition key to be a sensor ID and each cell to represent a
tuple with the name as the timestamp when the data was created and the value as the data sent by the sensor.

Unlike RDBMS, Cassandra does not have relations. This means relational logic will need to be
handled at the application level.

This also means that we may want to denormalize the database because there is no join and to avoid
looking up multiple tables by running multiple queries. Denormalization is a process of adding
redundancy in data to achieve high read performance

Types of keys
In the context of Cassandra, there are five types of keys:

• Primary key: This is the column or a group of columns that uniquely defines a row of the CQL table.

• Composite key: This is a type of primary key that is made up of more than one column. Sometimes, the composite key
is also referred to as the compound key.

• Partition key: Cassandra’s internal data representation is large rows with a unique key called the row key. It uses these row
key values to distribute data across cluster nodes. Since these row keys are used to partition data, they are called partition
keys. When you define a table with a simple key, that key is the partition key. If you define a table with a composite key,
the first term of that composite key works as the partition key. This means all the CQL rows with the same partition key
live on one machine.

• Clustering key: This is the column that tells Cassandra how the data within a partition is ordered (or clustered). This
essentially provides presorted retrieval if you know what order you want your data to be retrieved in.

• Composite partition key:


• Optionally, CQL lets you define a composite partition key (the first part of a composite key). This key helps you
distribute data across nodes if any part of the composite partition key differs.
CREATE TABLE customers ( id uuid, email text, PRIMARY KEY (id) )

In the preceding example, id is the primary key and also the partition key. There is no
clustering. It is a simple key.

Let’s add a twist to the primary key:


CREATE TABLE country_states ( country text, state text, population int, PRIMARY KEY
(country, state) )

In the preceding example, we have a composite key that uses country and state to uniquely
define a CQL row.

The country column is the partition key, so all the rows with the same country value will
belong to the same node/machine. The rows within a partition will be sorted by the state
names.

So, when you query for states in the US, you will encounter the row with California before
the one with New York
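The composite partition key mentioned above can be sketched in the same style. The following hedged example uses the DataStax Python driver (the contact point, keyspace, and table name are assumptions); here (country, state) together form the partition key and city is the clustering column:

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")     # hypothetical cluster and keyspace
session.execute("""
    CREATE TABLE IF NOT EXISTS city_populations (
        country text, state text, city text, population int,
        PRIMARY KEY ((country, state), city))""")    # composite partition key: (country, state)

# All rows for ('US', 'CA') live in one partition, presorted by city:
rows = session.execute(
    "SELECT city, population FROM city_populations "
    "WHERE country = 'US' AND state = 'CA'")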

Cassandra Architecture
RDBMS has been tremendously successful in the last 40 years (the relational data model was proposed in its first
incarnation by Codd, E.F. (1970) in his research paper, “A Relational Model of Data for Large Shared Data Banks”).

However, in early 2000s, big companies such as Google and Amazon, which have a gigantic load on their databases to
serve, started to feel bottlenecked with RDBMS, even with helper services such as Memcache on top of them.

As a response to this, Google came up with BigTable and Amazon with Dynamo

While using RDBMS for a complicated web application, we face problems such as slow queries due to complex joins,
expensive vertical scaling, and problems in horizontal scaling.

Due to these problems, indexing takes a long time. At some point, you may have chosen to replicate the database, but
there was still some locking, and this hurts the availability of the system. This means that under a heavy load, locking
will cause the user’s experience to deteriorate

NoSQL

NoSQL is a blanket term for the databases that solve the scalability issues which are
common among relational databases. This term, in its modern meaning, was first coined
by Eric Evans

NoSQL solutions provide scalability and high availability, but may not guarantee
ACID: atomicity, consistency, isolation, and durability in transactions.

Many of the NoSQL solutions, including Cassandra, sit on the other extreme of
ACID, named BASE, which stands for basically available, soft-state, eventual
consistency

Architecture of Cassandra

Cassandra is a distributed, decentralized, fault tolerant, eventually consistent, linearly scalable, and
column-oriented data store.

Cassandra is made to easily deploy over a cluster of machines located at geographically different places.

There is no central master server, so no single point of failure, no bottleneck, data is replicated, and a
faulty node can be replaced without any downtime. It is eventually consistent.

It is linearly scalable, which means that with more nodes, the requests served per second per node will not
go down.

Also, the total throughput of the system will increase with each node being added. And finally, it’s column
oriented, much like a map (or better, a map of sorted maps) or a table with flexible columns where each
column is essentially a key-value pair

Architecture of Cassandra
Although Cassandra has a tabular data presentation, it is internally rather a dictionary-like structure where each entry holds another sorted
dictionary/map.

This model is more powerful than the usual key-value store and it is named a table, formerly known as a column
family.
• Each row is identified by a row key with a number (token), and unlike spreadsheets, each cell may have
its own unique name within the row.

• In Cassandra, the columns in the rows are sorted by this unique column name.

• Also, since the number of partitions is allowed to be very large (about 1.7 × 10^38), it distributes the rows almost
uniformly across all the available machines by dividing the rows in equal token groups.

• Tables or column families are contained within a logical container or name space called keyspace.

• A keyspace can be assumed to be more or less similar to database in RDBMS.



• A cell in a partition can be assumed as a key-value pair.

• The maximum number of cells per partition is limited by the Java integer’s max value, which is about 2
billion. So, one partition can hold a maximum of 2 billion cells.

• A row, in CQL terms, is a bunch of cells with predefined names.

• When you define a table with a primary key that has just one column, the primary key also serves as the
partition key.

• But when you define a composite primary key, the first column in the definition of the primary key works
as the partition key.

• So, all the rows (bunch of cells) that belong to one partition key go into one partition. This means that
every partition can have a maximum of X rows, where X = (2 × 10^9 / number_of_columns_in_a_row).

• Essentially, rows * columns cannot exceed 2 billion per partition.



Gossip protocol
Cassandra uses the gossip protocol for inter-node communication.

It’s a way for nodes to build the global map of the system with a small number of local interactions.

Cassandra uses gossip to find out the state and location of other nodes in the ring (cluster).

The gossip process runs every second and exchanges information with, at the most, three other nodes in the cluster.
Nodes exchange information about themselves and other nodes that they come to know about via some other gossip
session. This causes all the nodes to eventually know about all the other nodes.

Gossip messages have a version number associated with them. So, whenever two nodes gossip, the older information
about a node gets overwritten with newer information.

Cassandra uses an anti-entropy version of the gossip protocol that utilizes Merkle trees to repair unread data.

Implementation-wise, the gossip task is handled by the org.apache.cassandra.gms.Gossiper class.


The Gossiper class maintains a list of live and dead endpoints (the unreachable endpoints). At each one-second interval,
this module starts a gossip round with a randomly chosen node. A full round of gossip consists of three messages.

A node X sends a syn message to a node Y to initiate gossip. Y, on receipt of this syn message, sends an ack message
back to X. To reply to this ack message, X sends an ack2 message to Y completing a full message round.
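The round can be sketched with a toy Python model (this is an illustration of the idea, not Cassandra's Gossiper code): each node keeps a map of endpoint to version number, and after the syn/ack/ack2 exchange both sides hold the newer entry for every endpoint either of them knew about.

def merge_state(local, remote):
    # Older information about a node gets overwritten with newer information
    for endpoint, version in remote.items():
        if version > local.get(endpoint, -1):
            local[endpoint] = version

def gossip_round(node_x, node_y):
    merge_state(node_y, dict(node_x))   # syn + ack: Y learns X's newer state
    merge_state(node_x, dict(node_y))   # ack2: X learns Y's newer state

node_x = {"10.0.0.1": 42, "10.0.0.2": 17}
node_y = {"10.0.0.2": 19, "10.0.0.3": 5}
gossip_round(node_x, node_y)
print(node_x)   # both nodes now know all three endpoints with the newest versions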

The following figure shows the two nodes gossiping

Syn and ack are also known as a message handshake. It is a mechanism that allows two machines trying to
communicate with each other to negotiate the parameters of the connection before transmitting data. Syn stands for
“synchronize packet” and ack stands for “acknowledge packet”.

The Gossiper module is linked to failure detection. The module, on hearing one of these messages,
updates the failure detector with the liveness information that it has gained.
If it hears GossipShutdownMessage, the module marks the remote node as dead in the failure detector. The
node to be gossiped with is chosen based on the following rules:
• Gossip to a random live endpoint
• Gossip to a random unreachable endpoint
• If the node in point 1 was not a seed node or the number of live nodes is less than the number of seeds,
gossip to a random seed

Ring representation

Token ownership and


distribution in a
balanced Cassandra ring

Virtual nodes
In Cassandra 1.1 and previous versions, when you create a cluster or add a node, you manually assign its initial
token. This is extra work that the database should handle internally.

Apart from this, adding and removing nodes requires manually resetting token ownership for some or all nodes.
This is called rebalancing.

Yet another problem was replacing a node. In the event of replacing a node with a new one, the data (rows that
the to-be-replaced node owns) is required to be copied to the new machine from a replica of the old machine

For a large database, this could take a while because we are streaming from one machine.
To solve all these problems, Cassandra 1.2 introduced virtual nodes (vnodes).

The figure shows 16 vnodes distributed over four servers


• Each node is responsible for a single continuous range.
• In the case of a replication factor of 2 or more, the data is also stored on other machines
than the one responsible for the range. (Replication factor (RF) represents the number of
copies of a table that exist in the system. So, RF=2, means there are two copies of each
record for the table.)

• In this case, one can say one machine, one range.

• With vnodes, each machine can have multiple smaller ranges and these ranges are
automatically assigned by Cassandra.

• Suppose you have a 30-node ring and a node with 256 vnodes has to be replaced.
• If vnodes are well-distributed randomly across the cluster, each of the
remaining 29 physical nodes will have 8 or 9 vnodes (256/29 ≈ 8.8) that are replicas of vnodes on the
dead node. In older versions, with a replication factor of 3, the data had to be streamed
from three replicas (10 percent utilization). In the case of vnodes, all the nodes can
participate in helping the new node get up.

• The other benefit of using vnodes is that you can have a heterogeneous ring where some
machines are more powerful than others, and change the vnode settings such that the
stronger machines will take proportionally more data than others.

Compaction

A read may require Cassandra to read across multiple SSTables to get a result. This is wasteful, costs
multiple (disk) seeks, may require conflict resolution, and if there are too many SSTables, it may slow down the
read.

To handle this problem, Cassandra has a process in place, namely compaction.

Compaction merges multiple SSTable files into one.

Off the shelf, Cassandra offers two types of compaction mechanisms: size-tiered compaction strategy and leveled
compaction strategy.

The compaction process starts when the number of SSTables on disk reaches a certain threshold (configurable).
Although the merge process is a little I/O intensive, it benefits in the long term with a lower number of disk seeks
during reads.

Benefits of compaction:
Removal of expired tombstones (Cassandra v0.8+)
Merging row fragments
Rebuilds primary and secondary indexes

For example, let’s say you have a compaction threshold set to four.

Cassandra initially creates SSTables of the same size as MemTable.

When the number of SSTables surpasses the threshold, the compaction thread triggers. This compacts the four
equal-sized SSTables into one.

Temporarily, you will have two times the total SSTable data on your disk. Another thing to note is that SSTables
that get merged have the same size. So, when the four SSTables get merged to give a larger SSTable of size, say G,
the buckets for the rest of the to-be-filled SSTables will be G each.

So, the next compaction will take an even larger space while merging. The SSTables, after merging, are marked as
deletable. They get deleted at a garbage collection cycle of the JVM, or when Cassandra restarts. The compaction
process happens on each node and does not affect other nodes. This is called minor compaction. This is
automatically triggered, system controlled, and regular.
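A rough Python sketch of this size-tiered idea (the threshold, data, and ordering assumption are illustrative; this is not Cassandra's actual strategy code):

def maybe_compact(sstables, threshold=4):
    # Merge a group of similar-sized SSTables once the group reaches the threshold
    if len(sstables) < threshold:
        return sstables
    merged = {}
    for table in sstables:            # list is assumed oldest-to-newest, so newer values win
        merged.update(table)
    return [merged]                   # one larger SSTable; the merged inputs become deletable

tables = [{"k1": "v1"}, {"k2": "v2"}, {"k1": "v9"}, {"k3": "v3"}]
print(maybe_compact(tables))          # [{'k1': 'v9', 'k2': 'v2', 'k3': 'v3'}]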

WRITE PROCESS AND WRITE CONSISTENCY LEVELS IN CASSANDRA

Cassandra is designed to distribute and manage large data loads across multiple nodes in
a cluster constituted of commodity hardware.

A node in Cassandra is structurally identical to any other node.



1.Gossip and Failure Detection:


Gossip is a peer-to-peer communication protocol in which nodes periodically exchange
state information about themselves and about other nodes they know about. The gossip
process runs every second and exchanges state messages with up to three other nodes in the
cluster. The nodes exchange information about themselves and about the other nodes that
they have gossiped about, so all nodes quickly learn about all other nodes in the cluster
whether a node is alive or offline. A gossip message has a version associated with it, so that
during a gossip exchange, older information is overwritten with the most current state for a
particular node.
2.Partitioner:
A partitioner takes a call on how to distribute data on the various nodes in a cluster. A
partitioner is a hash function to compute the token of the partition key. The partition key
helps to identify a row uniquely.

3.Replication factor:
It determines the number of copies of data (replicas) that will be stored across nodes in a cluster.
4. Anti-Entropy and Read Repair:

When a client connects to any node in the cluster to read data, the key question is how
many nodes will be read before responding to the client; this is based on the consistency level
specified by the client.

If the consistency level is not met, the read operation blocks, as a few of the nodes may respond
with an out-of-date value.

For repairing unread data, Cassandra uses an anti-entropy version of the gossip
protocol, which implies comparing all the replicas of each piece of data and updating each
replica to the newest version.

5. Writes in Cassandra:
Assume a client initiates a write request. Where does this write get written to? It is first
written to the commit log. A write is taken as successful only if it is written to the
commit log.
The next step is to push the write to a memory resident data structure called Memtable.

When the number of objects stored in the Memtable reaches a defined threshold value,
the contents of Memtable are flushed to the disk in a file called SSTable(Sorted String
Table).

It is possible to have multiple Memtables for a single column family. One of them is
current and the rest are waiting to be flushed.
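A highly simplified Python sketch of this local write path (the class, names, and flush threshold are illustrative, not Cassandra code):

class Node:
    def __init__(self, flush_threshold=3):
        self.commit_log = []          # append-only log; a write succeeds once it lands here
        self.memtable = {}            # in-memory structure for a column family
        self.sstables = []            # immutable on-disk Sorted String Tables
        self.flush_threshold = flush_threshold

    def write(self, row_key, column, value):
        self.commit_log.append((row_key, column, value))        # 1. commit log
        self.memtable.setdefault(row_key, {})[column] = value   # 2. MemTable
        if len(self.memtable) >= self.flush_threshold:           # 3. flush when threshold is reached
            self.sstables.append(dict(sorted(self.memtable.items())))
            self.memtable = {}

node = Node()
node.write("user:1", "email", "asha@example.com")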
6. Hinted Handoffs:

Cassandra works on the philosophy that it will always be available for writes.
Assume there are 3 nodes A, B, C in a cluster. Node C is down for some reason.
We maintain a replication factor of 2.
The client makes a write request to Node A.
Node A is the coordinator and serves as a proxy between the client and the nodes on which
the replica is to be placed (B and C). Since C is down, the coordinator stores the write locally as a
hint and replays (hands off) that write to C when C comes back online, so the write is not lost.

7. Tunable Consistency: The consistency level on Cassandra is tunable by the user.


i. Strong Consistency: If we work with strong consistency it implies that each update propagates to all locations
where that piece of data resides.
ii. Eventual Consistency: It implies that the client is acknowledged with a success as soon as a part of the cluster
acknowledges the write

Consistency levels in Cassandra

Apache Cassandra is a distributed, highly scalable NoSQL
database management system. Cassandra's configurable
consistency is one of its core characteristics, allowing users to
trade off data consistency against speed and availability.

A consistency level in Cassandra is the number of replicas that must respond
before a response is returned to the user. Cassandra's consistency is
configurable, which means that any client may decide how much
consistency and availability they want.

Consistency Level (CL): is the number of replica nodes that must acknowledge a read or write request
for the whole operation/query to be successful.
Write CL controls how many replica nodes must acknowledge that they received and wrote the
partition.
Read CL controls how many replica nodes must send their most recent copy of partition to the
coordinator.

Write Consistency
Write consistency means having consistent data (immediate or eventual) after your write query to your Cassandra
cluster. You can tune the write consistency for performance (by setting the write CL as ONE) or for immediate
consistency for a critical piece of data (by setting the write CL as ALL). Following is how it works:

A client sends a write request to the coordinator.

The coordinator forwards the write request (INSERT, UPDATE or DELETE) to all replica nodes, whatever write CL
you have set.
The coordinator waits for n number of replica nodes to respond. n is set by the write CL.
The coordinator sends the response back to the client.
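As a hedged sketch of setting the write CL from a client (using the DataStax Python driver; the contact point and keyspace are assumptions, and the customers table is the one from the earlier CQL example), the consistency_level attached to the statement is the write CL the coordinator enforces:

from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("demo")     # hypothetical cluster and keyspace
insert = SimpleStatement(
    "INSERT INTO customers (id, email) VALUES (uuid(), 'asha@example.com')",
    consistency_level=ConsistencyLevel.QUORUM)       # coordinator waits for a majority of replicas
session.execute(insert)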

The strong consistency formula is R + W > RF, where R stands for read consistency, W stands for write
consistency, and RF stands for replication factor. Consistency is deemed weak if R + W ≤ RF.
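For example, with RF = 3, writing and reading at QUORUM gives W = 2 and R = 2, so R + W = 4 > 3 and every read overlaps at least one replica that holds the latest write (strong consistency). Writing and reading at ONE gives R + W = 2 ≤ 3, which is weak (eventual) consistency.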
Cassandra offers five consistency levels:

•Consistency Level ONE − For the read or write operation to be regarded as successful, just one node needs
to acknowledge it. This level offers the least amount of consistency guarantee but the most availability and
performance. This level is appropriate for applications where data integrity is not crucial, such as tracking
user activity.

•Example − Administration of User Profiles


•Consider a web application that enables users to edit and maintain their profiles. As it is not necessary for all
nodes to concur on the user profile data in this situation, we may choose Consistency Level ONE. The
application may instantly access user profile information from any node with Consistency Level ONE,
enhancing performance and availability.
•Consistency Level TWO − For the read or write operation to be regarded as successful, two nodes must
acknowledge it. This level promises greater consistency than Consistency Level ONE while providing adequate
availability and performance.

Read Consistency
Read consistency refers to having the same data on all replica nodes for any read request. Following is how it works:

A client sends a read request to the coordinator.


The coordinator forwards the read (SELECT) request to n number of replica nodes. n is set by the read CL.
The coordinator waits for n number of replica nodes to respond.
The coordinator then merges (finds out most recent copy of written data) the n number of responses to a single response
and sends response to the client.
Read CL = ALL gives you immediate consistency as it reads data from all replica nodes and merges them, meaning it keeps
the most current data.

Read CL = ONE gives you the benefit of speed: Cassandra only contacts the one closest/fastest replica node, so the latency of
the read request will be lower and performance will be higher. Also, it might so happen that 2 out of 3 replica nodes
are down or the queries to them fail, and you will still get a result because CL = ONE, so you have the highest
availability.

For all these benefits, the price you pay is lower consistency. So, your consistency guarantees are much lower.

Read CL = QUORUM (Cassandra contacts majority of the replica nodes) gives you a nice balance, it gives you high
performance reads, good availability and good throughput.
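A matching read sketch with the same driver (again, the contact point, keyspace, and row id are assumptions); only the consistency_level changes between a fast ONE read and a balanced QUORUM read:

import uuid
from cassandra.cluster import Cluster
from cassandra import ConsistencyLevel
from cassandra.query import SimpleStatement

session = Cluster(["127.0.0.1"]).connect("demo")           # hypothetical cluster and keyspace
customer_id = uuid.uuid4()                                 # id of a previously written row (assumed)

read = SimpleStatement(
    "SELECT email FROM customers WHERE id = %s",
    consistency_level=ConsistencyLevel.QUORUM)             # majority of replicas: the balanced choice
row = session.execute(read, [customer_id]).one()           # swap in ConsistencyLevel.ONE for the fastest read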
Anti-entropy

Cassandra promises eventual consistency, and read repair is one of the processes that delivers it. Read
repair is the process of fixing inconsistencies among the replicas at the time of a read.

Example:
Let’s say we have three replica nodes, A, B, and C, that contain a data X. During an update, X
is updated to X1 in replicas A and B, but it fails in replica C for some reason.
On a read request for data X, the coordinator node asks for a full read from the nearest node
(based on the configured snitch) and a digest of data X from the other nodes needed to satisfy the consistency
level. The coordinator node compares these values (something like digest(full_X) ==
digest_from_node_C).
If it turns out that the digests are the same as the digests of the full read, the system is
consistent and the value is returned to the client. On the other hand, if there is a mismatch, full
data is retrieved and reconciliation is done and the client is sent the reconciled value. After this,
in the background, all the replicas are updated with the reconciled value to have a consistent
view of data on each node

The following figure shows this process: the client queries for data X from a node C
(coordinator); C gets data from replicas R1, R2, and R3 and reconciles them; C sends the reconciled
data to the client; and if there is a mismatch across replicas, a repair is invoked.

The following figure shows the read repair dynamics:


What if the node containing hinted handoff data dies, and the data that contains the hint is
never read? Is there a way to fix them without read?

This brings us to the anti-entropy architecture of Cassandra (borrowed from Dynamo).

Anti-entropy compares all the replicas of a column family and updates the replicas to the
latest version.

This happens during major compaction. It uses Merkle trees to determine discrepancies
among the replicas and fixes them.

Merkle tree

A Merkle tree is a hash tree where the leaves of the tree hold hashes of the actual data in a column family
and non-leaf nodes hold hashes of their children.

The unique advantage of a Merkle tree is that a whole subtree can be validated just by
looking at the value of the parent node.

So, if nodes on two replica servers have the same hash values, then the underlying data is
consistent and there is no need to synchronize.

If one node passes the whole Merkle tree of a column family to another node, it can
determine all the inconsistencies.
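A toy Python sketch of the comparison (the hashing scheme and data are illustrative): if the root hashes of two replicas match, their data is consistent; if not, the subtrees are walked to find the range that needs repair.

import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    level = [h(leaf) for leaf in leaves]            # leaf nodes: hashes of the actual data
    while len(level) > 1:
        level = [h(level[i] + level[i + 1])         # parents: hashes of their children
                 for i in range(0, len(level), 2)]
    return level[0]

replica_a = [b"row1", b"row2", b"row3", b"row4"]
replica_b = [b"row1", b"row2", b"rowX", b"row4"]    # one divergent row
if merkle_root(replica_a) != merkle_root(replica_b):
    print("Replicas differ; walk down the tree to locate the mismatching range")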

The following figure shows the Merkle tree to determine a mismatch in hash values at
the parent nodes due to the difference in the underlying data:

Tombstones
• Cassandra is a complex system with its data distributed among commit logs, MemTables, and SSTables on a node. The
same data is then replicated over replica nodes.

• Deletion, to an extent, follows an update pattern, except Cassandra tags the deleted data with a special value, and marks it as a
tombstone. In Cassandra, deleted data is not immediately purged from the disk. Instead, Cassandra writes a special value, known as a
tombstone, to indicate that data has been deleted. Tombstones prevent deleted data from being returned during reads, and will
eventually allow the data to be dropped via compaction.

• This marker helps future queries, compaction, and conflict resolution. Let’s step further down and see what happens
when a column from a column family is deleted.

• A client connected to a node (a coordinator node may not be the one holding the data that we are going to mutate),
issues a delete command for a column C, in a column family CF.

• If the consistency level is satisfied, the delete command gets processed. When a node containing the row key receives a
delete request, it updates or inserts the column in the MemTable with a special value, namely a tombstone.

• The tombstone basically has the same column name as the previous one; the value is set to the Unix epoch. The
timestamp is set to what the client has passed. When a MemTable is flushed to SSTable, all tombstones go into it as any
regular column will.

For immediate deletion, i.e., without waiting for any period of time, Cassandra supports the
‘DROP KEYSPACE’ and ‘DROP TABLE’ statements.
Besides these two statements, there are two other methods of deletion, as shown in Figure 3,
in which immediate deletion does not take place. These methods are: 1. the user issues a delete
command; 2. the user marks a record (row/column) with a TTL (time to live).

In the 1st method, when the delete command is issued, a tombstone, which marks the
record for deletion, gets added to a particular record. The tombstone is then written to
the SSTable. When the built-in grace period (expressed as gc_grace_seconds) expires,
the tombstone is deleted. The default value for grace period is 864,000 seconds (ten
days)

However, each table can have its own value for the grace period. In order to identify the
grace period of a tombstone, the ‘cassandra.yaml’ file can be used.

The ’cassandra.yaml’ file consists of configuration properties, one of which is the


‘gc_grace_seconds.’ The value in ‘gc_grace_seconds’ denotes the grace period
associated with a tombstone.

In the 2nd method, the user marks row/column with a TTL value. In order to identify the
TTL associated with a record, one can use the ‘TTL function’. The TTL function takes 1
argument, which is a column name, and returns the corresponding associated TTL. After the
TTL value expires, that particular record is marked with a tombstone, and then this
tombstone is written to SSTable. Then, the tombstone is deleted when the grace period
expires.
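A hedged sketch of the TTL method using the DataStax Python driver (the keyspace, table, and column names are made up): the cell expires after the TTL and is then treated as tombstoned.

from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("demo")    # hypothetical cluster and keyspace
session.execute(
    "INSERT INTO sensor_data (sensor_id, ts, value) "
    "VALUES ('s1', toTimestamp(now()), 21.5) USING TTL 86400")   # record lives for one day

row = session.execute(
    "SELECT TTL(value) FROM sensor_data WHERE sensor_id = 's1'").one()
print(row)   # remaining seconds before the cell is marked with a tombstone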

In order to identify whether a record has a tombstone associated with it, it is required to
check if the associated SSTable dump output of the corresponding partition has the
‘deletion_info’ tombstone marker. The ‘deletion_info’ tombstone marker present in the
SSTable dump output includes a ‘marked_deleted’ timestamp which indicates the time at
which a record was requested to be deleted as well as a ‘local_delete_time’ timestamp
indicating the local time at which the Cassandra server received a request to delete a record.
Hence, the ‘marked_deleted’ timestamp is specified by the user or the user application, and
the ‘local_delete_time’ timestamp is set by the Cassandra server.

The SSTable event log dump has the ‘deletion_info’ associated with a deleted record. To
read the log dump of an SSTable, ‘cassandra-tools’ can be installed, and ‘nodetool’ can then be
used to create a snapshot; consequently, an ‘sstabledump’ can be generated and stored
in a file. An analyst can access this log dump to identify information pertaining to the
deletion of a record.
