Research Report: IBM DB2 BLU Acceleration vs. SAP HANA vs. Oracle Exadata


Research Report

IBM DB2 BLU Acceleration vs. SAP HANA vs. Oracle Exadata
Executive Summary

The problem: how to analyze vast amounts of data (Big Data) most efficiently.
The solution is threefold:
1. Find ways to organize and compress data such that large amounts of data take up
less space and find a way to read compressed data to speed query completion;
2. Use efficient algorithms to accelerate the speed of Big Data analysis; and,
3. Select a system environment that provides balanced resource utilization such that
CPU power, memory, input/output (I/O), networks and storage all work together in
a balanced fashion in order to generate query results as expeditiously as possible.
Three companies, IBM, SAP, and Oracle, all build software environments designed to
accelerate Big Data analysis. There are, however, very significant differences in how each
vendor organizes and queries data and in the related system designs:

- IBM's approach uses an innovative new technique known as DB2 BLU Acceleration. Using a columnar approach, BLU quickly whittles down the size of a Big Data database to isolate relevant data, in effect speed-reading large databases (this approach enables BLU to achieve a 10-50X performance advantage over traditional row-based approaches). IBM's approach also features database compression, the ability to read compressed data in memory, and a balanced system design;
- SAP's HANA relies on placing large amounts of columnar data in main memory, where the whole database can be analyzed in real time. HANA also compresses data by up to 20X (but needs to decompress it to enable query processing). We like HANA, but we question whether system resource utilization is well balanced; and,
- Oracle's own Website describes the Exadata Database Machine as combining "massive memory and low-cost disks to deliver the highest performance and petabyte scalability at the lowest cost." To us, Exadata is a highly tuned Oracle Real Application Cluster (RAC) packaged as an appliance with storage that uses the Oracle database along with in-database advanced analytics. This offering does not exploit columnar data; does not read compressed data; and its compression facilities lag IBM's DB2.

A closer look at each vendor's Big Data analytics offerings shows major differences in how
data is cached and compressed; how workloads are managed; how memory is used; and in
system balance/optimization. As we examined each vendor's offerings from these perspectives, we found that IBM's DB2 with BLU Acceleration has strong advantages over SAP
HANA and Oracle Exadata, particularly when it comes to balanced system design and
performance. In this Research Report, Clabby Analytics discusses in greater depth how
each of these systems differs.

The Big Data Marketplace

Due to advances in parallel processing, continual reductions in storage costs, the
simplification of data management software, and lower cost, more powerful, and more
flexible business analytics software, it has now become more affordable than ever before to
run analytics applications against large volumes of enterprise data. For years enterprises
have been capturing valuable, usable data, but this data has been too expensive to mine
(analyze). Now, with reduced systems, storage, and software costs, enterprises are finding
that they can achieve a clear return on investment by analyzing more and more of the
structured and unstructured data that they have long been able to capture.
A CEO study entitled "Leading Through Connections" shows that enterprises now realize
the strategic value of business analytics. Enterprises are cleansing their data, consolidating
and parallelizing their databases, and building integrated infrastructures. As evidence of
the strategic importance of Big Data analysis, also consider a study published in the MIT
Sloan Management Review that concludes that enterprises that use Big Data analytics are
twice as likely to outperform their competitors. This study also found that there has been a
60% increase in the use of business analytics over the past few years.
The use of analytics is growing, and it is growing because enterprises now see the strategic value of
analyzing all sorts of data. By moving to Big Data analytics, enterprises are better able to respond to new
opportunities as well as to create or respond more quickly to competitive threats. Enterprises that have
embraced Big Data analytics are finding that they can better serve customers, better manage risk, spot
trends and thus improve customer relationships, and exploit new business opportunities.

Organizing and Working with Big Data

The goal of business analytics is to help people make more informed decisions, thus
leading to better business outcomes. In order to make more informed decisions, enterprises
need to:

- Organize, integrate, and govern their structured as well as unstructured data;
- Address data growth (scale) through data and compute parallelism; and,
- Find ways to cost-effectively manage and store large, complex data sets.

To achieve data consistency and operational efficiency, multiple, siloed data warehouses
need to be consolidated and parallelized in order to create one version of the truth (a single,
common database view). Further, Quality-of-Service (QoS) requirements for reliability,
availability, and security as well as scalability and performance need to be addressed.
Once the data has been cleansed and federated, enterprises need to figure out how to work
with that data. Entire databases can be placed into memory, or they can be dynamically
cached (placed in the storage subsystem, then accessed as needed). Data can be tiered such
that the most important data can be located on fast disks in close proximity to the processors such that it can be analyzed most expeditiously. Data can be compressed such that it
takes up less space (saving on storage and memory costs). Data can be organized into
columns in order to improve analytics performance over traditional row-based data. Some
vendors can even read this compressed data, producing results more quickly than
competitors.
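The difference between row-based and column-based organization can be sketched in a few lines of code. This is our own illustrative example (not any vendor's implementation): the same tiny table is stored both ways, and an analytic query that needs only one column touches far less data in the columnar layout.

```python
# Illustrative sketch: the same table stored row-wise and column-wise,
# and an analytic query (sum of one column) against each layout.

# Row-oriented: every row carries every column, so a scan reads all fields.
rows = [
    {"id": 1, "region": "EU", "revenue": 120.0},
    {"id": 2, "region": "US", "revenue": 340.0},
    {"id": 3, "region": "EU", "revenue": 75.0},
]

# Column-oriented: each column is a contiguous array; a query that needs
# only "revenue" never touches "id" or "region".
columns = {
    "id": [1, 2, 3],
    "region": ["EU", "US", "EU"],
    "revenue": [120.0, 340.0, 75.0],
}

row_total = sum(r["revenue"] for r in rows)   # reads whole rows
col_total = sum(columns["revenue"])           # reads one column only
assert row_total == col_total == 535.0
```

The answer is identical either way; the difference is how many bytes must be scanned to produce it, which is why columnar stores tend to win on analytic (few-columns, many-rows) queries.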

September, 2013

© 2013 Clabby Analytics

Page 2

Big Differences Start to Show Up When Analyzing How Data Is Organized and How the System
Design Supports Big Data Analytics Processing

Some of the comparison points that we look for when comparing and contrasting IBM's,
SAP's, and Oracle's approaches include:

- What is the data structure? Is it row data, columnar, or both?
- Is the data cached or is it all held in memory?
- How is compression handled?
- What are the system design characteristics? Where/how is the data processed?
- What is being done to streamline CPU, memory, and I/O interaction?
- What are the deployment characteristics?

As Figure 1 shows, there are big differences in how IBM, SAP, and Oracle express data
(rows vs. columnar); how memory management/caching is handled; how compression is
handled; and how data is processed by the system (resource utilization).
Figure 1: How IBM, SAP, and Oracle Organize Big Data

Source: Clabby Analytics, September 2013

The way we see it, organizing data into columns has big performance advantages as compared with
processing row-based data. We like in-memory databases like SAP's HANA, especially for real-time
processing, but we have some resource utilization issues with this design (so clever caching is more
appealing to us). We see the ability to analyze compressed data as a huge competitive differentiator for
IBM because it speeds the time it takes to achieve results. From a systems design perspective, we like to
see the balanced use of all resources, and we like to see processors exploit SIMD (single instruction,
multiple data) instructions in order to more efficiently process data in parallel. As for deployment (and
operations), it can be argued that IBM's approach is less complex (load-and-go simplicity) and more
flexible (more control over how to configure/build a Big Data server) as compared with its SAP and Oracle
competitors.

A Closer Look at IBM's DB2 BLU Acceleration Solution

First, it is important to note that IBM tunes DB2 BLU Acceleration for its own hardware,
just as Oracle does for its own hardware. SAP's HANA, on the other hand, has been
designed to run on the commodity (x86) hardware offered by many vendors.
As we looked at IBM's BLU Acceleration data structure, we found that it can be used with
row- or column-based data. In column mode, it has been reported to be 10-50X faster than
traditional row-based relational databases. From a memory management perspective,
IBM's DB2 with BLU Acceleration is designed to use in-memory columnar processing that
maintains in-memory performance while dynamically moving unused data to/from storage
as needed. As for compression, DB2 has long had a compression advantage over other
major database competitors (a few years ago we wrote about how some Oracle customers
were able to save 40% of their storage costs by taking advantage of DB2 compression).
But, in addition to compression efficiency advantages, BLU Acceleration is able to read
compressed data (no decompression necessary) while also employing data-skipping
algorithms to speed-read compressed databases. BLU Acceleration also takes advantage of
processor-level parallel vector processing to exploit multi-core and SIMD (single
instruction, multiple data) parallelism (SIMD instructions help improve parallel
performance, helping to produce query results faster than systems that do not exploit
SIMD).
DB2 with BLU Acceleration is NOT an in-memory database processing environment (like SAP's HANA);
instead it uses dynamic memory management caching techniques to off-load some data to fast storage in
near proximity. DB2 compression can reduce the size of Big Data databases more efficiently than Oracle or
SAP. BLU Acceleration can read compressed data in memory without having to decompress it, and it uses
advanced data-skipping techniques; neither SAP's HANA nor Oracle's Exadata reads compressed data in
memory. We see this ability to read compressed data in memory as a huge advantage for IBM's BLU when
it comes to the speed of query completion.
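The two techniques described above can be sketched in miniature. This is our own hypothetical illustration, not IBM's actual BLU internals: dictionary compression lets a predicate be evaluated directly on the small integer codes (no decompression), and "data skipping" uses per-block min/max metadata so blocks that cannot match are never read.

```python
# Sketch of dictionary compression + predicate-on-codes, and data skipping
# via per-block min/max synopses. All names/layouts are our own illustration.

values = ["EU", "US", "EU", "APAC", "US", "EU"]

# Dictionary-encode: each distinct value gets a small integer code.
dictionary = {v: i for i, v in enumerate(sorted(set(values)))}
codes = [dictionary[v] for v in values]   # [1, 2, 1, 0, 2, 1]

# Predicate region == 'EU' is evaluated directly on the codes: one dictionary
# lookup, then cheap integer comparisons -- nothing is decompressed.
target = dictionary["EU"]
matches = [i for i, c in enumerate(codes) if c == target]
assert matches == [0, 2, 5]

# Data skipping: keep min/max per block of a numeric column; blocks whose
# value range cannot satisfy the predicate are skipped entirely.
sales_blocks = [[10, 22, 15], [90, 88, 95], [40, 55, 61]]
synopsis = [(min(b), max(b)) for b in sales_blocks]
predicate_lo = 80                          # looking for sales >= 80
blocks_to_scan = [i for i, (lo, hi) in enumerate(synopsis) if hi >= predicate_lo]
assert blocks_to_scan == [1]               # only one of three blocks is read
```

In a real column store the codes are also bit-packed and scanned with SIMD instructions, but even this toy version shows why operating on compressed data shrinks the work a query must do.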

Also noteworthy from a systems design perspective: IBM's DB2 with BLU Acceleration can be
deployed on IBM POWER-based Power Systems as well as on x86-based System x servers.
Because POWER processors can execute twice as many threads as their Intel counterparts,
it is reasonable to expect Power Systems to significantly outperform x86-based
SAP and Oracle counterparts when running the same query.
Finally, note that IBM's DB2 with BLU Acceleration takes advantage of a balanced system
design where CPU, memory, and I/O all work together in an optimized fashion. As we
examined how DB2 with BLU Acceleration works with underlying processors and
subsystems, we observed that those subsystems are optimized for:


- In-memory processing, because: 1) the most useful data is placed in memory (the data stays compressed so more data can be placed in memory, while data in storage is scan-friendly from a caching perspective); 2) less data is placed in memory (as a result of the use of columnar data, late materialization, and data-skipping techniques); and, 3) memory latency is optimized for scans, joins, and aggregation.
- High CPU performance, thanks to the use of SIMD instructions that speed scans, joins, grouping, and arithmetic performance; and thanks to core-friendly parallelism.
- I/O optimization, because the system design places less stress on the I/O subsystem (there is less data to read thanks to columnar processing and the ability to read compressed data in memory). When data is retrieved from cache, it is easier to read because it has been packaged as scan-friendly. And, finally, specialized columnar prefetching algorithms also speed up cache calls.
IBM's system design is a great example of how to build a well-balanced environment that places the most
important data in memory while making cached data easy to retrieve. We like this dynamic data caching
approach better than an in-memory database approach, largely because it accommodates database
scalability better. With in-memory databases, to deal with the size of ever-growing Big Data databases,
either more systems need to be purchased or data needs to be dropped out of memory in favor of new
data. Dynamic caching provides more flexibility when dealing with data sets that are larger than main
memory can hold. We also like the way the processor is fed a steady stream of data to process, as well as
the use of SIMD instructions to improve parallel processing performance. We also like the way the I/O
subsystem is organized such that cache calls are few and far between, and when cache is called,
prefetch algorithms speed data acquisition. Finally, we note that IBM's DB2 environment offers advanced
workload management facilities, whereas its SAP HANA competitor does not.

A Closer Look at SAP's HANA

SAP's HANA environment is an in-memory processing environment. With HANA,
ALL data is placed in main memory for fast processing.
The emphasis in the HANA design is to converge online analytical processing (OLAP) and
online transaction processing (OLTP) into one columnar store in order to eliminate latency
(reads/writes to disk) and thus speed real-time decision making. The design advantage in
using an in-memory data store is that cache/disk latency is eliminated, and OLAP and
OLTP activities can take place in parallel in real time. In contrast, IBM's DB2 with BLU
Acceleration would need to access row-based tables stored on disk to perform certain operational
queries, while trend and historical data would probably reside in an in-memory data mart
(having to go to two different places in order to gather data could make BLU less suitable
for real-time analysis as compared with using main memory exclusively).
Enterprises that need to converge OLAP and OLTP into one common environment for real-time decision
making, therefore, would likely be well served by adopting the HANA approach.

The big questions to be answered with respect to how HANA handles large in-memory
databases (given that the size of memory is finite) are how many concurrent users can
be supported and how the system performs as query complexity increases. SAP's
own HANA memory usage guide indirectly raises these same questions when it states
that the amount of extra memory will depend on the size of the tables (larger tables will
create larger intermediate result tables in operations like joins), but even more on the
expected workload in terms of the concurrency and complexity of the analytical queries
(each concurrent query needs its own workspace). To us, this means that as query
complexity increases, performance will slow down, and, because each query needs its
own workspace, the number of concurrent workloads may need to be decreased in order to
meet service level requirements for performance.
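The trade-off above amounts to simple arithmetic. The numbers below are our own illustrative assumptions (not SAP sizing guidance): once the column store itself is resident, the remaining RAM must be divided among per-query workspaces, so the supported concurrency falls as query complexity grows.

```python
# Back-of-the-envelope sketch of why finite memory caps concurrency in an
# in-memory database. All figures are hypothetical, for illustration only.

total_ram_gb = 512.0
data_resident_gb = 300.0        # the in-memory column store itself
workspace_per_query_gb = 8.0    # grows with join/aggregation complexity

# Whatever RAM the data does not occupy is the budget for query workspaces.
headroom_gb = total_ram_gb - data_resident_gb
max_concurrent = int(headroom_gb // workspace_per_query_gb)
assert max_concurrent == 26

# If more complex queries double the per-query workspace, the number of
# concurrent queries that fit in the same headroom is halved.
assert int(headroom_gb // (2 * workspace_per_query_gb)) == 13
```

The point is not the specific figures but the shape of the relationship: workspace size and concurrency trade off directly against each other inside a fixed memory budget.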
The other important elements we examine when evaluating Big Data processing environments include compression efficiency and the ability to analyze compressed data. We have
not yet seen a published HANA vs. BLU compression comparison, but we did see an IBM
Labs test showing that, for the same 220 GB of raw, uncompressed fact data, IBM beat
HANA compression by 13%. Further, when it comes to analyzing compressed data,
HANA needs to decompress data before reading it, which should result in substantial
performance advantages for IBM's DB2 with BLU Acceleration.
In short, we think SAP's HANA has been designed as a real-time OLTP/OLAP environment. If the database
can fit into memory and if the queries are simple, this is a good architecture for processing real-time
operational OLTP/OLAP workloads. We do have reservations, however, about intermediate and complex
query demands on the system and the level of concurrency that can be supported accordingly.

Oracle's Exadata Database Machine Environment

Oracle's Exadata environment is a highly tuned x86-based Real Application Cluster environment
(packaged as an appliance) that has been designed to process large Oracle databases.
It does not use columnar processing other than as a compression technique, and it is not an
in-memory-only solution (like SAP's HANA). The way it deals with large volumes
of data is similar to IBM's BLU approach in that it places hot data in main memory and
caches the overflow.
The beauty of this environment is that it features a tuned and optimized Oracle database
(which is important because some enterprises have standardized on Oracle as their primary
database), an operating system, servers, and storage along with analytics tools and utilities,
all as an integrated solution. It offers good performance and can scale readily. It can be
used to perform all types of analytics ranging from scan-intensive applications to highly
concurrent transactional processing. Finally, it offers solid workload management
facilities. Oracle customers love Exadata because it is a prepackaged data appliance that is
straightforward to deploy and provides an immediate performance boost when compared to
standard Oracle database performance on various commercial servers.
Our big issue with Oracle's Exadata Database Machine is that it appears to be a jack-of-all-trades
designed to handle a variety of analytics workloads (we don't see this design as
being optimized for any specific analytics workload). In contrast, consider IBM's
PureData Systems, systems that have been engineered to process specific analytics
workloads. For instance, IBM's PureData System for Analytics is a high performance,
scalable, massively parallel appliance that has been tuned and optimized to perform
analytics on large volumes of historical data. IBM's PureData System for Hadoop
simplifies Hadoop deployment and accelerates Hadoop performance. PureData System for
Operational Analytics is optimized for operational analytics and has been designed to handle
1,000+ concurrent operational queries. And IBM's PureData System for Transactions has
been designed specifically to process high volumes of transactions. IBM builds appliances
that are tuned and optimized to handle specific analytics workloads optimally.
As for comparing Oracle's Exadata Database Machine compression with IBM's DB2 BLU
Acceleration compression, we have already mentioned that Oracle customers who
switched to IBM's DB2 saw compression rates greatly improve. We suspect that IBM's
compression efficiency is probably about 20% better than Oracle's,
saving IBM DB2 customers from having to purchase a lot of additional storage hardware.
We also noted earlier that Oracle's Exadata does not read compressed data, requiring
that data be decompressed before it is acted upon. Decompression can happen at the
storage tier (as part of a smart scan) or at the database tier, but regardless of where it
occurs, it has a performance impact.
As for Oracle's Exadata Database Machine's failure to exploit columnar processing, this
too has an impact on database processing performance. Because Oracle's Exadata is not
based on a columnar database design, it does not have the ability to read just one column for
many rows, so it places all columns for a given row into a compression unit. During
decompression, rows are reconstructed out of the compression unit. Decompressed rows
are returned to the database if Smart Scan is used, but if it isn't used, then the entire
compression unit is returned to the buffer cache. Compression, on the other hand, occurs on the
database tier only (never on storage cells). It is reasonable to assume that having to
constantly compress/decompress data will negatively impact analytics performance.
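The compression-unit behavior described above can be modeled in a few lines. This is a deliberately simplified sketch (our own construction, not Oracle's on-disk format): when all columns for a set of rows are compressed together, answering a one-column query still means inflating the whole unit and rebuilding rows first.

```python
# Toy model of a row-major "compression unit": columns are not separable,
# so reading one column requires decompressing the entire unit.
import json
import zlib

rows = [{"id": i, "region": "EU", "revenue": float(i) * 10} for i in range(4)]
unit = zlib.compress(json.dumps(rows).encode())   # whole unit compressed together

# To scan just "revenue", the entire unit is inflated and rows rebuilt first.
decoded = json.loads(zlib.decompress(unit))
revenue = [r["revenue"] for r in decoded]
assert revenue == [0.0, 10.0, 20.0, 30.0]
```

Contrast this with a columnar store, where each column is compressed separately and a one-column scan never touches the other columns at all.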
From a system design perspective, IBM has a further advantage over Oracle's
x86-based Exadata Database Machine in that IBM can offer DB2 with BLU Acceleration on
POWER-based systems. Due to faster threading and the ability to support SIMD
instructions, it stands to reason that IBM Power Systems can process more work on fewer
cores than Oracle's Exadata (using less hardware may result in price advantages when
selecting an IBM BLU Acceleration solution).
As for the complexity-to-manage aspect that we described in Figure 1, consider how
Exadata handles query parallelism. The Oracle database as implemented on Exadata
allocates parallel execution processes on a first-come, first-served basis until the maximum
number of parallel processes is reached. When the system load is low, queries are
allocated the maximum number of parallel execution processes, thus improving performance.
But when the load is high, Oracle downgrades the number of execution processes
allocated to queries and/or forces queries to wait in queues. Downgrading the number of
parallel execution processes while queuing others degrades query performance, as fewer
resources are made available to execute query requests. Further, downgraded queries
remain downgraded until the query is finished. To us, this is an example of why this
environment is difficult to manage.
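The allocation behavior described above can be sketched as a toy model (our own construction, not Oracle's code): slots are granted first come, first served; once free slots run low, a later query is downgraded to whatever remains, and a query that cannot get even a minimum allocation must queue.

```python
# Toy first-come-first-served parallel execution allocator illustrating
# downgrade and queuing. Hypothetical sketch, not Oracle's implementation.

def allocate(requested, free_slots, minimum=1):
    """Grant up to `requested` slots; downgrade to whatever remains,
    or return 0 (the query queues) if not even `minimum` is free."""
    if free_slots >= requested:
        return requested
    if free_slots >= minimum:
        return free_slots            # downgraded for its whole lifetime
    return 0                         # must wait in the queue

free = 20
grants = []
for want in [8, 8, 8]:               # three queries each asking for 8 processes
    got = allocate(want, free)
    grants.append(got)
    free -= got

assert grants == [8, 8, 4]           # third query is downgraded to 4 processes
assert free == 0
```

Note that the third query stays at 4 processes until it finishes, even if slots free up later; that "downgraded until done" property is the manageability concern raised above.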


Finally, it is important to note that, due to processor advantages (faster processors, SIMD instruction
execution, the ability to process more threads using fewer cores) and balanced I/O and memory
management, we suspect that IBM's POWER-based Power Systems with BLU Acceleration should be able
to outperform Oracle's Exadata by 1.5X to 2.5X, depending on the complexity of the reports being executed.

Summary Observations

The bottom line in comparing these three architectures is that there are some commonalities
in the ways that each vendor approaches Big Data analytics, but there are also some
very distinct differences. These differences manifest themselves in database query processing
speed; the number of concurrent users that can be supported; the effect that query complexity
can have on system performance; system efficiency/optimization; and manageability.
The key take-aways from this report should be as follows:

- IBM's DB2 with BLU Acceleration and SAP's HANA use a columnar approach to set up data for processing (IBM can also use rows). Columnar processing can be much faster than row processing (Oracle uses a row-only processing approach);
- IBM's DB2 with BLU Acceleration has a distinct advantage over SAP's HANA and Oracle's Exadata in that it can analyze compressed data in memory. This means that more data can be read more quickly;
- SAP's HANA can be excellent at processing large volumes of data in memory for combined OLAP/OLTP environments where real-time results are required. We suspect, however, that as query complexity or volume (concurrent users) increases, performance will slow down significantly as each query contends for resources;
- Oracle's Exadata Database Machine has been well received by the Oracle community because it offers good out-of-the-box performance and scales well. We see this appliance, however, as a general-purpose analytics processor not tuned for specific analytics workloads. Other shortcomings are its row-based orientation as well as its inability to read compressed data in memory. We think that, given IBM's system design advantages (faster processors, more threads, SIMD, balanced I/O handling, solid memory management), IBM POWER-based Power Systems should be able to easily outperform Exadata systems when processing intermediate and complex workloads.
When choosing a Big Data processing environment, each approach has its merits. But, from our
perspective, IBM's DB2 BLU Acceleration has several design advantages that should lead to consistently
higher performance when querying Big Data databases.

Clabby Analytics
http://www.clabbyanalytics.com
Telephone: 001 (207) 846-6662

Clabby Analytics is an independent technology research and analysis organization. Unlike
many other research firms, we advocate certain positions and encourage our readers to find
counter opinions, then balance both points of view in order to decide on a course of action.
Other research and analysis conducted by Clabby Analytics can be found at:
www.ClabbyAnalytics.com.

© 2013 Clabby Analytics. All rights reserved.
September, 2013
