Introduction To Cassandra
2016.03.21
Agenda
Relational DBMS
• In use since 1970
• Use SQL to manipulate data
• Excellent for management applications
(accounting, reservations, staff management, etc.)
• Consistency: all nodes see the same data at the same time
• Partition tolerance: the system continues to operate despite arbitrary message loss
Consistency Level
• Strong (Sequential): After the update completes any
subsequent access will return the updated value
• Weak (weaker than Sequential): The system does not
guarantee that subsequent accesses will return the
updated value
• Eventual: All updates propagate to all of the replicas in a
distributed system, but this may take some time.
Eventually, all replicas will be consistent.
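The difference between a stale read and an eventually consistent read can be sketched with a toy model. This is a minimal illustration, not Cassandra code: the `Replica` and `EventualStore` classes, the version counter, and the explicit `propagate()` step are all assumptions made for the example.

```python
class Replica:
    """One copy of a single value, tagged with a version number."""

    def __init__(self):
        self.value = None
        self.version = 0

    def apply(self, value, version):
        # Keep only the newest write (last write wins, by version).
        if version > self.version:
            self.value, self.version = value, version


class EventualStore:
    """Toy store: a write lands on one replica and reaches the others later."""

    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self.pending = []  # (replica_index, value, version) not yet delivered
        self.clock = 0

    def write(self, value):
        self.clock += 1
        # The first replica is updated immediately...
        self.replicas[0].apply(value, self.clock)
        # ...the remaining replicas only get the update when propagate() runs.
        for i in range(1, len(self.replicas)):
            self.pending.append((i, value, self.clock))

    def read(self, replica_index):
        return self.replicas[replica_index].value

    def propagate(self):
        # Deliver every queued update; afterwards all replicas agree.
        for i, value, version in self.pending:
            self.replicas[i].apply(value, version)
        self.pending.clear()


store = EventualStore()
store.write("v1")
stale = store.read(2)   # replica 2 has not received the update yet
store.propagate()
fresh = store.read(2)   # after propagation every replica holds "v1"
```

A strongly consistent system would never expose the `stale` read; an eventually consistent one may, but converges once propagation completes.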
Cassandra
• Apache Cassandra was initially
developed at Facebook to power their
Inbox Search
• Its design draws on Amazon’s highly
available Dynamo and on the data model
of Google’s BigTable
Use-case: Facebook Inbox Search
• Cassandra was developed to address this problem.
• Tested on 50+ TB of user message data in a 150-node cluster.
• Searches a per-user index of all messages in 2 ways:
– Term search: search by a keyword
– Interactions search: search by a user ID
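The two search modes can be pictured as two per-user indexes. The following is a rough in-memory sketch using nested dictionaries; the function names, the layout (user ID, then term or other user's ID, then message IDs), and the whitespace tokenization are assumptions for illustration, not Facebook's actual schema.

```python
from collections import defaultdict

# owner_id -> term -> [message_id, ...]
term_index = defaultdict(lambda: defaultdict(list))
# owner_id -> other_user_id -> [message_id, ...]
interaction_index = defaultdict(lambda: defaultdict(list))


def index_message(owner_id, other_user_id, message_id, text):
    """Record one message in both of the owner's indexes."""
    for word in set(text.lower().split()):
        term_index[owner_id][word].append(message_id)
    interaction_index[owner_id][other_user_id].append(message_id)


def term_search(owner_id, word):
    """Term search: messages in the owner's inbox containing the keyword."""
    return list(term_index.get(owner_id, {}).get(word.lower(), []))


def interaction_search(owner_id, other_user_id):
    """Interactions search: messages exchanged with a given user."""
    return list(interaction_index.get(owner_id, {}).get(other_user_id, []))


index_message("alice", "bob", "m1", "Lunch tomorrow")
index_message("alice", "carol", "m2", "Project update before lunch")

lunch_hits = term_search("alice", "lunch")
bob_hits = interaction_search("alice", "bob")
```

In both cases the user ID comes first, so every query touches only one user's data.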
Use-cases: Apple
• Cassandra is Apple's dominant NoSQL database
– MongoDB - 35 job listings (iTunes, Customer Systems Platform, and
others)
– Couchbase - 4 job listings (iTunes Social)
– HBase - 33 job listings (Maps, Siri, iAd, iCloud, and more)
– Cassandra - 70 job listings (Maps, iAd, iCloud, iTunes, and more)
Replication and Multi Data Center Replication
Use-cases: Netflix
Use-cases - Apple
Data Model
• Partitioning
How data is partitioned across nodes
• Replication
How data is duplicated across nodes
• Cluster Membership
How nodes are added to and removed from the cluster
Partitioning
Partitions, Partition Key
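As a rough sketch of how a partition key can decide which node owns a row, the example below hashes the key to a token and picks the first node at or after that token on a ring. The `Ring` class, the node names, and the use of MD5 are assumptions for illustration (Cassandra's default partitioner uses Murmur3), so treat this as the general idea rather than Cassandra's implementation.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a stable integer token (MD5 as a stand-in hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class Ring:
    """Hypothetical token ring: each node sits at the token of its own name."""

    def __init__(self, nodes):
        self.entries = sorted((token(node), node) for node in nodes)
        self.tokens = [t for t, _ in self.entries]

    def owner(self, partition_key: str) -> str:
        # First node whose token is >= the key's token, wrapping to the start.
        i = bisect.bisect_left(self.tokens, token(partition_key))
        return self.entries[i % len(self.entries)][1]


ring = Ring(["A", "B", "C", "D", "E", "F"])
owner = ring.owner("user:42")  # always the same node for the same key
```

Rows that share a partition key hash to the same token and therefore land on the same node.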
Replication
• Each data item is replicated at N (replication factor) nodes.
[Figure: ring of nodes (B, D and F visible) with the key position h(key2) marked]
* Figure taken from Avinash Lakshman and Prashant Malik (authors of the paper) slides.
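One simple way to place N replicas, assumed here for illustration, is to start at the node that owns the key and continue clockwise around the ring until N distinct nodes are collected. The function name `replicas_for`, the node names, and the MD5 stand-in hash are hypothetical; real replica placement depends on the configured replication strategy.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a stable integer token (MD5 as a stand-in hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def replicas_for(key, nodes, n):
    """Return up to n distinct nodes, walking clockwise from the key's owner."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    # Owner: first node at or after the key's token, wrapping around.
    start = bisect.bisect_left(tokens, token(key)) % len(ring)
    count = min(n, len(ring))  # cannot have more replicas than nodes
    return [ring[(start + i) % len(ring)][1] for i in range(count)]


nodes = ["A", "B", "C", "D", "E", "F"]
replicas = replicas_for("key2", nodes, n=3)  # replication factor N = 3
```

The first entry in `replicas` is the owner of the key; the others are its successors on the ring.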
Partitioning and Replication
Cassandra Key features
• Elastic Scalability
– Elastic scalability is a special case of horizontal scalability:
the cluster can seamlessly scale up by adding nodes and scale
back down by removing them.
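Why scaling can be seamless is easier to see with a small experiment on a hash ring: when a node joins, only the keys that now fall into its range change owner, and everything else stays put. The sketch below assumes one token per node, MD5 as a stand-in hash, and made-up node and key names.

```python
import bisect
import hashlib


def token(key: str) -> int:
    """Map a string to a stable integer token (MD5 as a stand-in hash)."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


def owners(keys, nodes):
    """Return {key: owning node} for a ring built from the given nodes."""
    ring = sorted((token(node), node) for node in nodes)
    tokens = [t for t, _ in ring]
    result = {}
    for key in keys:
        i = bisect.bisect_left(tokens, token(key)) % len(ring)
        result[key] = ring[i][1]
    return result


keys = [f"row-{i}" for i in range(1000)]
before = owners(keys, ["A", "B", "C", "D", "E", "F"])
after = owners(keys, ["A", "B", "C", "D", "E", "F", "G"])  # node G joins

# Keys whose owner changed when G was added.
moved = [k for k in keys if before[k] != after[k]]
moved_fraction = len(moved) / len(keys)
```

Every key in `moved` ends up on the new node `G`; no data is shuffled between the existing nodes, which is what lets a cluster grow or shrink without a full reshuffle.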
Thank You