Lecture 1
NoSQL!
NoSQL databases are currently a hot topic in some parts of computing, and more than a hundred different NoSQL databases exist.
RDBMS Characteristics
Document Databases (Document Store)
• Documents
• Loosely structured sets of key/value pairs in documents, e.g., XML, JSON, BSON
• Encapsulate and encode data in some standard format or encoding
• Are addressed in the database via a unique key
• Are treated as a whole, avoiding splitting a document into its constituent name/value pairs
• Can be retrieved by key or by contents
• Notable examples:
• MongoDB (used by Foursquare, GitHub, and more)
• CouchDB (used by Apple, BBC, Canonical, CERN, and more)
Document Databases, JSON Example (MongoDB shell notation)
{
_id: ObjectId("51156a1e056d6f966f268f81"),
type: "Article",
author: "Derick Rethans",
title: "Introduction to Document Databases with MongoDB",
date: ISODate("2013-04-24T16:26:31.911Z"),
body: "This arti…"
},
{
_id: ObjectId("51156a1e056d6f966f268f82"),
type: "Book",
author: "Derick Rethans",
title: "php|architect's Guide to Date and Time Programming with PHP",
isbn: "978-0-9738621-5-7"
}
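To make "retrieve by key or by contents" concrete, here is a minimal in-memory sketch. `DocumentStore`, `insert`, `get` and `find` are hypothetical names, not the MongoDB or CouchDB API; a real document database adds persistence, indexes and a query language. Documents are stored whole, and, as in the example above, they need not share the same fields.

```python
import uuid
from typing import Any, Dict, List, Optional


class DocumentStore:
    """Toy document store: each document is a dict stored whole under a unique key."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def insert(self, doc: Dict[str, Any]) -> str:
        # Assign a unique key if none is given (stands in for MongoDB's ObjectId).
        key = doc.get("_id") or uuid.uuid4().hex
        self._docs[key] = {**doc, "_id": key}
        return key

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        # Retrieval by key returns the whole document, never individual fields.
        return self._docs.get(key)

    def find(self, **criteria: Any) -> List[Dict[str, Any]]:
        # Retrieval by contents: documents whose fields match every criterion.
        return [d for d in self._docs.values()
                if all(d.get(f) == v for f, v in criteria.items())]


# Usage: two documents with different fields (schema-less).
store = DocumentStore()
a = store.insert({"type": "Article", "author": "Derick Rethans",
                  "title": "Introduction to Document Databases with MongoDB"})
b = store.insert({"type": "Book", "author": "Derick Rethans",
                  "isbn": "978-0-9738621-5-7"})
print(len(store.find(author="Derick Rethans")))  # 2
print(store.get(b)["type"])                      # Book
```

A linear scan in `find` keeps the sketch short; real systems answer content queries through secondary indexes.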
Key/Value Stores
• Store data in a schema-less way
• Store data as maps
• HashMaps or associative arrays
• Provide very efficient average-case access time (constant time per lookup, on average, for a hash map)
• Notable examples:
• Couchbase (Zynga, Vimeo, NAVTEQ, ...)
• Redis (Craigslist, Instagram, Stack Overflow, Flickr, ...)
• Amazon Dynamo (Amazon, Elsevier, IMDb, ...)
• Apache Cassandra (Facebook, Digg, Reddit, Twitter, ...)
• Voldemort (LinkedIn, eBay, ...)
• Riak (GitHub, Comcast, Mochi, ...)
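The "HashMaps or associative arrays" point can be illustrated with a toy hash table using separate chaining. This is a sketch under simplifying assumptions (single process, in memory, Python's built-in `hash()`); `HashMapKV` is a hypothetical name and does not reflect how Redis, Dynamo or the other systems listed are implemented. Resizing to keep the load factor bounded is what keeps the average lookup time constant.

```python
from typing import Any, List, Optional, Tuple


class HashMapKV:
    """Minimal key/value store backed by a hash table with separate chaining."""

    def __init__(self, capacity: int = 8) -> None:
        self._buckets: List[List[Tuple[str, Any]]] = [[] for _ in range(capacity)]
        self._size = 0

    def _bucket(self, key: str) -> List[Tuple[str, Any]]:
        # The hash picks one bucket; a lookup only scans that (short) chain.
        return self._buckets[hash(key) % len(self._buckets)]

    def put(self, key: str, value: Any) -> None:
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing entry
                return
        bucket.append((key, value))
        self._size += 1
        # Grow when the load factor exceeds 0.75 so chains stay short on average.
        if self._size > 0.75 * len(self._buckets):
            self._resize(2 * len(self._buckets))

    def get(self, key: str) -> Optional[Any]:
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def _resize(self, new_capacity: int) -> None:
        # Rehash every entry into a larger bucket array.
        entries = [item for bucket in self._buckets for item in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in entries:
            self._bucket(k).append((k, v))


# Usage
kv = HashMapKV()
for i in range(100):
    kv.put(f"user:{i}", {"id": i})
kv.put("user:7", {"id": 7, "name": "updated"})
print(kv.get("user:7"))   # {'id': 7, 'name': 'updated'}
print(kv.get("missing"))  # None
```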
Sorted Ordered Column-Oriented Stores
• Data are stored in a column-oriented way
• Data are stored efficiently
• Avoids consuming space for storing nulls
• Columns are grouped in column families
• Data aren’t stored as a single table but by column families
• The unit of data is a set of key/value pairs
• Identified by a “row-key”
• Ordered and sorted based on the row-key
• Notable examples:
• Google's Bigtable (used in many of Google's services)
• HBase (Facebook, StumbleUpon, Hulu, Yahoo!, ...)
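The ideas on this slide (column families, sparse cells with no stored nulls, rows sorted by row-key) can be sketched as a small in-memory structure. `ColumnFamilyStore` and its methods are hypothetical and only loosely modelled on the Bigtable/HBase data model; real systems also version cells and persist sorted files on disk.

```python
import bisect
from typing import Dict, Iterator, List, Tuple


class ColumnFamilyStore:
    """Toy sorted, column-family-oriented store."""

    def __init__(self, families: List[str]) -> None:
        # One sparse map per column family: row_key -> {column: value}.
        self._families: Dict[str, Dict[str, Dict[str, str]]] = {f: {} for f in families}
        self._row_keys: List[str] = []  # kept sorted for ordered range scans

    def put(self, row_key: str, family: str, column: str, value: str) -> None:
        if family not in self._families:
            raise KeyError(f"unknown column family: {family}")
        idx = bisect.bisect_left(self._row_keys, row_key)
        if idx == len(self._row_keys) or self._row_keys[idx] != row_key:
            self._row_keys.insert(idx, row_key)
        # Only existing cells are stored; absent columns cost no space (no nulls).
        self._families[family].setdefault(row_key, {})[column] = value

    def get(self, row_key: str, family: str) -> Dict[str, str]:
        return dict(self._families[family].get(row_key, {}))

    def scan(self, start: str, stop: str, family: str) -> Iterator[Tuple[str, Dict[str, str]]]:
        # Range scan over [start, stop) in row-key order.
        lo = bisect.bisect_left(self._row_keys, start)
        hi = bisect.bisect_left(self._row_keys, stop)
        for row_key in self._row_keys[lo:hi]:
            cells = self._families[family].get(row_key)
            if cells:
                yield row_key, dict(cells)


# Usage
cf = ColumnFamilyStore(["profile", "activity"])
cf.put("user#003", "profile", "name", "Carol")
cf.put("user#001", "profile", "name", "Alice")
cf.put("user#001", "profile", "email", "alice@example.com")
cf.put("user#002", "activity", "last_login", "2013-04-24")
print([k for k, _ in cf.scan("user#000", "user#999", "profile")])
# ['user#001', 'user#003']: user#002 has no cells in the profile family
```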
Graph Databases
• Graph-oriented
• Everything is stored as an edge, a node or an attribute.
• Each node and edge can have any number of attributes.
• Both the nodes and edges can be labelled.
• Labels can be used to narrow searches.
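A minimal sketch of a labelled property graph, assuming an adjacency-list representation in memory. `PropertyGraph`, `add_node`, `add_edge` and `neighbours` are hypothetical names, not the API of any graph database; the point is that nodes and edges both carry labels and arbitrary attributes, and labels filter a traversal.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    node_id: str
    label: str
    attrs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    src: str
    dst: str
    label: str
    attrs: Dict[str, Any] = field(default_factory=dict)


class PropertyGraph:
    """Toy labelled property graph: nodes, edges, and attributes on both."""

    def __init__(self) -> None:
        self.nodes: Dict[str, Node] = {}
        self.out_edges: Dict[str, List[Edge]] = {}

    def add_node(self, node_id: str, label: str, **attrs: Any) -> None:
        self.nodes[node_id] = Node(node_id, label, attrs)
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, src: str, dst: str, label: str, **attrs: Any) -> None:
        if src not in self.nodes or dst not in self.nodes:
            raise KeyError("both endpoints must exist")
        self.out_edges[src].append(Edge(src, dst, label, attrs))

    def neighbours(self, node_id: str, edge_label: str, node_label: str) -> List[Node]:
        # Labels narrow the search: follow only edges with edge_label
        # and keep only target nodes carrying node_label.
        return [self.nodes[e.dst] for e in self.out_edges.get(node_id, [])
                if e.label == edge_label and self.nodes[e.dst].label == node_label]


# Usage
g = PropertyGraph()
g.add_node("alice", "Person", name="Alice")
g.add_node("bob", "Person", name="Bob")
g.add_node("acme", "Company", name="Acme")
g.add_edge("alice", "bob", "FOLLOWS", since=2012)
g.add_edge("alice", "acme", "WORKS_AT", role="engineer")
print([n.attrs["name"] for n in g.neighbours("alice", "FOLLOWS", "Person")])  # ['Bob']
```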
Dealing with Big Data and Scalability
• Scaling up becomes a problem when the dataset is simply too big
• RDBMSs were not designed to be distributed
• Traditional DBMSs are designed to run best on a “single” machine
• Larger volumes of data/operations require upgrading the server with faster CPUs or more memory, known as ‘scaling up’ or ‘vertical scaling’
• NoSQL solutions are designed to run on clusters, i.e., multi-node database deployments
• Larger volumes of data/operations require adding more machines to the cluster, known as ‘scaling out’ or ‘horizontal scaling’
• Different approaches include:
• Master-slave
• Sharding (partitioning)
Scaling RDBMS
• Master-Slave
• All writes go to the master; all reads are performed against the replicated slave databases
• Critical reads may return stale data, because writes may not yet have propagated to the slaves
• Large data sets can pose problems, as the master needs to duplicate the data to every slave
• Sharding
• Any database distributed across multiple machines needs to know on which machine a piece of data is stored, or must be stored
• A sharding system makes this decision for each row, using its key
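Both approaches on this slide can be sketched in a few lines, under loudly simplified assumptions: replication is modelled as an explicit `replicate()` call standing in for asynchronous propagation, and the shard is chosen by hashing the row key modulo the number of shards. `MasterSlaveCluster` and `shard_for` are hypothetical names. MD5 is used only as a stable hash, because Python's built-in `hash()` on strings is randomized per process and would route keys differently on different machines.

```python
import hashlib
from typing import Any, Dict, List, Optional, Tuple


class MasterSlaveCluster:
    """Toy master-slave setup with asynchronous replication."""

    def __init__(self, n_slaves: int) -> None:
        self.master: Dict[str, Any] = {}
        self.slaves: List[Dict[str, Any]] = [{} for _ in range(n_slaves)]
        self._pending: List[Tuple[str, Any]] = []  # writes not yet propagated

    def write(self, key: str, value: Any) -> None:
        self.master[key] = value  # all writes go to the master
        self._pending.append((key, value))

    def read(self, key: str, slave: int = 0) -> Optional[Any]:
        return self.slaves[slave].get(key)  # all reads go to a slave

    def replicate(self) -> None:
        # The master duplicates every pending write to each slave.
        for key, value in self._pending:
            for s in self.slaves:
                s[key] = value
        self._pending.clear()


def shard_for(key: str, n_shards: int) -> int:
    """Map a row key to a shard index with a stable hash (MD5, an arbitrary choice)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards


# Master-slave: a read issued before replication is stale.
cluster = MasterSlaveCluster(n_slaves=2)
cluster.write("balance:42", 100)
print(cluster.read("balance:42"))           # None (not replicated yet)
cluster.replicate()
print(cluster.read("balance:42", slave=1))  # 100

# Sharding: any node can recompute shard_for(key, 4) to locate a row.
shards: List[Dict[str, Any]] = [{} for _ in range(4)]
for key in ["user:1", "user:2", "user:3"]:
    shards[shard_for(key, 4)][key] = {"id": key}
```

One design caveat of plain modulo sharding: changing the number of shards remaps most keys, which is why many systems use consistent hashing or range-based partitioning instead.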
NoSQL, No ACID
[Figure: CAP theorem diagram (C = Consistency, A = Availability)]
• AP systems relax consistency in favor of availability, but they are not inconsistent: they are typically eventually consistent
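What "relaxed but not inconsistent" means can be sketched with two replicas that both accept writes during a network partition and then reconcile. The reconciliation rule here, last-writer-wins on a (timestamp, replica name) version, is an assumption chosen for brevity; it is one of several strategies (others include vector clocks or application-level merges). `Replica`, `write` and `merge` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple

Version = Tuple[int, str]  # (timestamp, replica name); the name breaks ties


@dataclass
class Replica:
    name: str
    data: Dict[str, Tuple[Version, Any]] = field(default_factory=dict)

    def _apply(self, key: str, version: Version, value: Any) -> None:
        # Keep whichever version is newer (last-writer-wins).
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def write(self, key: str, value: Any, ts: int) -> None:
        # Writes are accepted locally even if other replicas are unreachable.
        self._apply(key, (ts, self.name), value)

    def read(self, key: str) -> Any:
        entry = self.data.get(key)
        return entry[1] if entry else None

    def merge(self, other: "Replica") -> None:
        # Anti-entropy: adopt any newer versions held by the other replica.
        for key, (version, value) in other.data.items():
            self._apply(key, version, value)


# Usage: during a partition both sides accept conflicting writes.
a, b = Replica("a"), Replica("b")
a.write("status", "online", ts=1)
b.write("status", "away", ts=2)
print(a.read("status"), b.read("status"))  # online away (temporarily divergent)

# After the partition heals, replicas exchange state in both directions.
a.merge(b)
b.merge(a)
print(a.read("status"), b.read("status"))  # away away (converged)
```

The trade-off is visible in the output: both replicas stayed available, reads were briefly inconsistent, and the replicas converged once they could communicate, at the cost of silently discarding the older conflicting write.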
• Most likely, 10 years from now, the majority of data will still be stored in RDBMSs.
• Leading users of NoSQL datastores are social networking
sites such as Twitter, Facebook, LinkedIn, and Digg.
• Not every problem is a nail and not every solution is a
hammer.
• NoSQL has taken a field that was "dead" (database development) and
suddenly brought it back to life.