Cpo - Analyze Data Faster With Db2 Blu On Power v7.9


Analyze Data Faster with DB2 BLU Acceleration and POWER8 Systems

Satish Upadhyay - [email protected]


IBM Competitive Project Office
November 2014

Executive Summary
Big Data and Analytics have become an essential part of every business in the market today.
Organizations want to glean meaningful information from the massive amounts of data coming
from a variety of sources, as well as from data stored in data warehouses. Line of Business
leaders are looking for ways to analyze this data, called 'Big Data', to make key business
decisions. However, having the right data solution in place is a big challenge. Many data solution
providers offer a partial solution that does not meet the client's requirements. IBM DB2 BLU
Acceleration running on newly released POWER8 servers provides a better and faster solution for
Big Data Analytics. In this white paper we look at these two technologies in detail and show
how DB2 BLU running on a POWER8 server delivers superior performance at lower cost.
We compared the price/performance of DB2 BLU on Power with that of a competitor database
running on a pre-integrated database machine, using various Cognos Business Intelligence (BI)
test cases. The test results show that the POWER8 server delivers up to 70% more throughput at
5.7x lower cost than the competitor's solution.


Why Are Big Data and Analytics Important?


Enterprises are collecting data at unprecedented rates. According to an IDC study [1], nearly 4
zettabytes of data (1 zettabyte = 1 trillion gigabytes, or 1 followed by 21 zeros in bytes) were
generated in 2013 alone. If this trend continues, the total will reach 40 zettabytes by 2020. It
is not just the volume of data that is increasing; many different types of data (audio, video,
tweets, etc.) are being generated that are not stored in traditional databases. Previously, this
was considered throw-away data because there was no cost-effective way to analyze it. In the last
few years, new technologies have emerged that changed the landscape. Technologies now exist that
make Big Data Analytics possible, and IBM is at the forefront. Businesses want to examine these
large sets of data to uncover hidden patterns, unknown correlations, and other useful
information that can be used to make better business decisions. Companies that are able to
apply Big Data Analytics in their decision making have a major competitive advantage.

There are three major forces driving the need for Analytics: the growth of mobile devices,
the emergence of social media, and the shift of power from companies to consumers. Each of
these poses a special set of challenges to organizations in the area of big data and warrants
a closer look at why it is important.

Growth of mobile devices


In the last decade, mobile devices like smartphones and tablets have become key internet
devices for interacting from anywhere at any time. They are the biggest source of unstructured
data. Users of these devices are also sharing a lot more of their personal data through them. For
example, smartphone users willingly share their location data in order to find places of interest
such as the nearest restaurant, gas station, or shopping place. Smart companies can take
advantage of that information to sell their products to these users.

Emergence of Social Media

With the phenomenal growth of social media channels like Facebook, Twitter and YouTube,
consumers are quick to share their experiences, both good and bad, and they are able to reach
a much wider audience. A short video clip from someone's cell phone can go viral in a matter
of hours. So if companies are not quick to act on the sentiments in social media, a minor
negative incident can turn into a major embarrassment for the company. A Big Data-savvy
competitor can take advantage of the situation and grow its customer base.

[1] http://www.emc.com/about/news/press/2012/20121211-01.htm

Shift of power to consumers

Another interesting side effect of the factors above is that consumers have started to rely on
reviews from other consumers. Reviews on sites like Yelp and Tripadvisor can make or break a
restaurant or a hotel. They have become so important that most big hotels have dedicated staff
whose job is to respond to these reviews.

In addition, data from smart meters, banking transactions, weblogs, GPS and more can be
tracked, stored, and analyzed in many different ways. In short, for every business, analysis of
Big Data, whether structured or unstructured, can have a substantial impact. It can help to:

Increase customer retention and loyalty

Improve customer engagement by matching customer needs to the company's products

Gain a competitive edge by acting on competitors' missteps

Save money by preventing fraud

Challenges of Big Data Analytics


There are three major challenges in analyzing Big Data:
1. Four Vs: The volume, velocity, variety, and veracity of data are so complex in each dimension
that the data is difficult to process with traditional database software. Legacy hardware
platforms such as old servers and storage are unable to handle it. Big Data analysis
requires high-performance analytic platforms including servers, storage and software.
2. Simplicity: The majority of analytic solutions are complex in nature. They require
specialized data scientists and solution experts to deploy and are very time consuming to
implement.
3. Cost: In order to handle vast amounts of data, analytic solutions, which include hardware
and software, get expensive to deploy. They often demand more system resources, such as
more servers with a higher number of processors, more memory and storage, as well as
costly analytic software tools, all of which leads to an expensive solution.

IBM Solution for Big Data and Analytics


IBM has defined a Big Data Platform which addresses the full spectrum of Analytics, both
structured and unstructured. It delivers this in a zone architecture, with each zone handling a
different type of analytics workload. It runs on new-generation hardware that is built with Big
Data in mind. Figure 1 shows the different zones in the platform. In this white paper
our study is limited to the Enterprise warehouse data mart zone, outlined in red.


IBM DB2 BLU Acceleration running on a POWER8 server is a great implementation of the Big
Data platform. The combination of IBM hardware and software can run the most complex
analytical queries much faster.

Figure: 1, IBM Big Data Platform Architecture: Analyze all data from any source

IBM DB2 with BLU Acceleration


DB2 version 10.5 comes with BLU Acceleration built in. It consists of a collection of technologies
that are designed for Analytics workloads. It is not just fast; it is an order of magnitude faster,
up to 50-70 times in many cases. You can get all this speed without creating new indexes or
aggregates and without tuning. DB2 BLU Acceleration is easy to deploy and easier to administer.
Its efficient compression algorithm, in conjunction with IBM's Terabyte pricing options (cost
based on database size), makes it a great solution for small and medium businesses. There are
four major technological innovations in BLU Acceleration:

Dynamic in-memory and columnar database- BLU Acceleration is a dynamic in-memory technology
that optimizes the movement of data from storage to system memory to CPU cache. With its
efficient prefetching technology it loads only the required data into memory to reduce
query latency. So your entire data set doesn't have to fit in memory, unlike with some of its
competitors. Excess data can reside on storage.

Actionable Compression- BLU Acceleration preserves the order of the data, enabling
compressed data in BLU Acceleration tables to be used without decompression. A broad
range of operations, such as predicate evaluation and joins, can be completed on compressed
data. The most frequent values are encoded with fewer bits to optimize the compression.
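
The idea of evaluating a predicate directly on compressed data can be illustrated with a small sketch. It is not DB2's actual encoding, which this paper does not detail. It assumes a simple order-preserving dictionary in which each distinct value is replaced by its rank in sorted order, so that comparing codes gives the same answer as comparing the original values. The frequency-based bit widths mentioned above are not modeled, and the function names and sample column are hypothetical.

```python
from bisect import bisect_left


def build_dictionary(values):
    """Map each distinct value to its rank in sorted order (order-preserving codes)."""
    distinct = sorted(set(values))
    return distinct, {value: code for code, value in enumerate(distinct)}


def encode(values, codes):
    """Replace every value with its integer code."""
    return [codes[v] for v in values]


def count_less_than(encoded, distinct, threshold):
    """Count rows whose value is < threshold without decoding any row.

    The threshold is translated to a code once. Because codes preserve
    order, `code < threshold_code` is equivalent to `value < threshold`.
    """
    threshold_code = bisect_left(distinct, threshold)
    return sum(1 for code in encoded if code < threshold_code)


# Hypothetical column of state abbreviations.
column = ["NY", "CA", "TX", "CA", "NY", "FL", "CA"]
distinct, codes = build_dictionary(column)
encoded = encode(column, codes)
matches = count_less_than(encoded, distinct, "NY")
print(encoded, matches)
```

In this sketch the comparison runs over small integers only; the original strings are never reconstructed during the scan.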

Parallel Vector Processing- Recent processors such as POWER8, POWER7+/7 and Intel Ivy
Bridge support SIMD (Single Instruction Multiple Data), which allows a single instruction to
operate on many data elements simultaneously, thus parallelizing operations for faster
processing at the chip level. BLU Acceleration is designed to take advantage of this feature.

Data Skipping- BLU Acceleration can automatically detect large sections of data that do
not qualify for a query and effectively skip them. Data skipping utilizes a secondary
object called a synopsis table, which is tiny (just 0.1% of the size of the user table) and
is created and maintained automatically. BLU Acceleration keeps the minimum and
maximum data values for "chunks" of data (about 1,000 records) in this table. So it
can get the min and max of 1,000 records with just one read.
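
The data skipping mechanism described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the DB2 implementation: the synopsis is modeled as a plain list of per-chunk minimum and maximum values, the chunk size of 1,000 follows the figure quoted in the text, and the function names and sample data are hypothetical.

```python
CHUNK_SIZE = 1000  # the paper cites chunks of about 1,000 records


def build_synopsis(column, chunk_size=CHUNK_SIZE):
    """Return one (start, end, min, max) entry per chunk of the column."""
    synopsis = []
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]
        synopsis.append((start, start + len(chunk), min(chunk), max(chunk)))
    return synopsis


def scan_range(column, synopsis, low, high):
    """Return values in [low, high] and the number of chunks actually read."""
    matches, chunks_read = [], 0
    for start, end, chunk_min, chunk_max in synopsis:
        if chunk_max < low or chunk_min > high:
            continue  # no row in this chunk can qualify, so skip it
        chunks_read += 1
        matches.extend(v for v in column[start:end] if low <= v <= high)
    return matches, chunks_read


# Hypothetical data: 10,000 values in ascending order.
column = list(range(10_000))
synopsis = build_synopsis(column)
matches, chunks_read = scan_range(column, synopsis, 2_500, 2_600)
print(len(synopsis), chunks_read, len(matches))
```

Skipping is most effective when values are clustered, as in this ordered example; with randomly scattered values most chunks would overlap the query range and would still be read.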

IBM's POWER8 Servers Designed for Big Data and Analytics


IBM has recently announced its next generation of Power servers based on the new POWER8
processor. These systems are purpose-built for complex workloads like Big Data Analytics
and cloud computing. The POWER8 processor has many technological advancements that deliver
better performance. It is designed to be a massively multi-threaded chip, with each of its cores
capable of handling eight hardware threads (SMT8) simultaneously, for a total of 96 threads
executed on a 12-core chip, compared to just 24 and 30 threads on Intel's latest Ivy Bridge EP
and EX processors respectively. The processor has larger caches (on-chip L1, L2 and L3) and
an additional 128 MB off-chip L4 cache, a technology brought into POWER8 from IBM's
Mainframe systems, which makes cache-intensive workloads like Big Data Analytics much
faster. Higher memory bandwidth and higher I/O bandwidth make the system balanced, with
reduced memory and I/O latencies. Table 1 compares some of the major features of
POWER8 and Intel Ivy Bridge EX processors.

Feature                    Ivy Bridge EX E7-88xx v2    POWER8 Systems
-------------------------  --------------------------  --------------
Clock rates                1.9-3.4 GHz                 3.0-4.15 GHz
SMT Options                1, 2*                       1, 2, 4, 8
Max Threads/Socket         30                          96
Max L1 Cache               32 KB*                      64 KB
Max L2 Cache               256 KB                      512 KB
Max L3 Cache               37.5 MB                     96 MB
Max L4 Cache               0                           128 MB
Memory Bandwidth (GB/s)    68-85**                     230-410

Table 1: Comparing Intel's Ivy Bridge-EX to IBM's POWER8

2* = Intel calls this Hyper-Threading technology; with HT or no HT
32 KB* = Running in non-RAS mode; 16 KB results in better RAS
85** = Running in non-RAS mode, and dual-device error NOT supported

In addition, there are key innovations in the POWER8 processor that boost performance:

Transactional Memory, which speeds up memory write operations by reducing contention

Coherent Accelerator Processor Interface (CAPI), a new open interface which gives PCIe3
devices higher bandwidth and low latency

BLU Acceleration is optimized for Power Systems


DB2 with BLU Acceleration exploits POWER8 features for even better performance. It is
optimized to take advantage of the higher number of cores and threads, which provide massive
parallelization. Power Systems have a higher number of registers than x86 (64 on Power vs. 16 on
Intel), which results in 4x more data being loaded into registers on Power than on x86 for higher
performance. AIX running on Power also has 2 pipes for processing SIMD requests, which
further parallelizes processing compared with x86. When BLU Acceleration runs on POWER
systems, it uses separate binaries that are specifically optimized for the Power architecture.

Business Intelligence Performance Study


A competitive study was conducted by running Cognos Business Intelligence (BI) reports
with DB2 BLU Acceleration on a POWER8 server versus a competitor database running on a
pre-integrated database machine.
Cognos BI can analyze large volumes of data and generate reports, dashboards and scoreboards.
You can freely explore information, analyze key facts and quickly collaborate to align decisions
with key stakeholders. In this test, we generated 9 Cognos reports, which fall into three
different categories:
1) Simple reports
2) Intermediate reports, and
3) Complex reports

Simple reports are very fast-running reports such as dashboard and ad hoc reports. They take
seconds or less to execute.

Intermediate reports are more advanced reports that require predicate evaluation over the large
sales_fact table, joins, and aggregation into a relatively small result set. They take minutes to
complete.


Complex reports are much more complex and resource intensive, requiring multiple joins and
aggregations on the sales_fact table. For the 1TB database, these reports scan all 9 billion
records in the sales_fact table. Complex reports can take hours to finish.
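
To make the shape of such a report query concrete, the sketch below runs a predicate, a join and an aggregation against a tiny star-schema fragment. It is only an illustration: it uses an in-memory SQLite database so that it is self-contained, not DB2 or Cognos, and apart from the sales_fact table name every table, column and value is a hypothetical stand-in for the test schema, which the paper does not list.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE product_dim (product_id INTEGER PRIMARY KEY, category TEXT);
    CREATE TABLE sales_fact (product_id INTEGER, sale_year INTEGER, amount REAL);
    INSERT INTO product_dim VALUES (1, 'Grocery'), (2, 'Apparel');
    INSERT INTO sales_fact VALUES
        (1, 2013, 10.0), (1, 2014, 20.0), (2, 2014, 5.0), (2, 2014, 7.5);
""")

# Predicate on the fact table, a join to a dimension table, and an
# aggregation that reduces many fact rows to a small result set.
rows = conn.execute("""
    SELECT d.category, SUM(f.amount) AS total_sales
    FROM sales_fact AS f
    JOIN product_dim AS d ON d.product_id = f.product_id
    WHERE f.sale_year = 2014
    GROUP BY d.category
    ORDER BY d.category
""").fetchall()
conn.close()

for category, total_sales in rows:
    print(category, total_sales)
```

A complex report in the sense used here would combine several such joins and aggregations over the full fact table rather than a filtered slice.
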
BI tests were conducted against a database with 5 schemas and 80 tables designed in a
snowflake star schema. The database contains 1TB (raw) of retail sample data. Each report
contains multiple SQL queries. All of these reports were generated in three different testing
scenarios:
1. Operational Analytics Test (multiple concurrent user test)
2. Deep Analytics Test (multiple concurrent user test)
3. Fixed Execution Test (multiple concurrent user test)

Systems Under test


BI tests were conducted and the results were compared between the IBM POWER8 system and the
pre-integrated database machine. Table 2 shows their system configuration details.

Database System Specifications  IBM POWER8                           Competitor System
------------------------------  -----------------------------------  ------------------------------------------
System                          IBM Power System S824                Rack Pre-integrated database competitor V3
Processor                       24 cores POWER8, 3.5 GHz             68 cores Intel Xeon E5-2690, 2.9 GHz
RAM                             256 GB                               256 GB
Storage                         Flash System 840                     SSD based Flash Storage
Operating System                AIX 7.1 TL03                         Competitor Enterprise Linux
Database                        DB2 10.5 AESE with BLU Acceleration  Competitor database

Table: 2, Configurations of the systems under test


Both systems were loaded with the 1TB database (raw data) and tuned to deliver their best
throughput. For the competitor system, its best practices guide was used to tune and optimize
performance. For BLU Acceleration, since it is self-optimizing, no additional tuning steps were
taken. Both systems were driven to close to 100 percent CPU utilization.

Performance Results
Operational Analytics
In the Operational Analytics test, all three types of reports (simple, intermediate and complex)
were generated simultaneously by 80 concurrent users, and we measured the total throughput of
both systems. Below are the performance results.

POWER8 system generated 30% more Simple reports than the competitor

POWER8 system generated 70% more Intermediate reports than the competitor, with
similar Complex report throughput


[Figure: two bar charts of throughput in reports/hour for the systems under test.
Simple reports/hour (POWER8 delivers 30% more): POWER8 591,355; Competitor 472,229.
Intermediate reports/hour (POWER8 delivers 70% more): POWER8 227; Competitor 137.]

Figure: 2, Performance results from Operational Analytics

Deep Analytics
The second test was the Deep Analytics test, with a heavier workload. In this scenario, 24
concurrent users generated intermediate and complex reports simultaneously, and we measured the
total throughput of both systems. Below are the results of this test.

POWER8 system generated 60% more Intermediate reports per hour

POWER8 system generated 40% more Complex reports per hour


[Figure: two bar charts of throughput in reports/hour for the systems under test.
Intermediate reports/hour (POWER8 delivers 60% more): POWER8 237; Competitor 149.
Complex reports/hour (POWER8 delivers 40% more): POWER8 3.27; Competitor 2.27.]

Figure: 3, Performance results from Deep Analytics

Fixed Execution
Lastly, we ran the Fixed Execution test. In this test a predefined workload (161,166 reports)
was executed on both servers. We measured the total time taken by each system to complete all
the reports. As Figure 4 shows, POWER8 was 13% faster than the pre-integrated database
competitor system.

[Figure: bar chart of time taken in minutes. POWER8 completed the Fixed Execution test 13%
faster: POWER8 124; Competitor 141.]

Figure: 4, Performance results from Fixed Execution test
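
The arithmetic behind the 13% figure can be checked with a short calculation. This sketch assumes that the two chart values are the elapsed minutes for each system (124 for POWER8, 141 for the competitor) and that "faster" refers to the ratio of throughput rates; the paper does not state which definition it used, so the variable names and the interpretation are assumptions.

```python
REPORTS = 161_166          # fixed workload size stated for the test
POWER8_MINUTES = 124       # elapsed time read from Figure 4
COMPETITOR_MINUTES = 141   # elapsed time read from Figure 4


def reports_per_hour(reports, minutes):
    """Convert a fixed workload and its elapsed minutes into an hourly rate."""
    return reports / (minutes / 60)


power8_rate = reports_per_hour(REPORTS, POWER8_MINUTES)
competitor_rate = reports_per_hour(REPORTS, COMPETITOR_MINUTES)

# Throughput ratio; algebraically this is 141 / 124.
speedup = power8_rate / competitor_rate
# Fraction of elapsed time saved relative to the competitor.
time_saved = 1 - POWER8_MINUTES / COMPETITOR_MINUTES

print(f"POWER8 rate:     {power8_rate:,.0f} reports/hour")
print(f"Competitor rate: {competitor_rate:,.0f} reports/hour")
print(f"Throughput gain: {(speedup - 1) * 100:.1f}%")
print(f"Time saved:      {time_saved * 100:.1f}%")
```

Under these assumptions the throughput gain lands between 13% and 14%, while the reduction in elapsed time is about 12%, so the quoted 13% appears to be a throughput comparison.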


Pricing
DB2 BLU on POWER8 outperformed the competitor in every test. But performance and price go
hand in hand, so we also compared the total cost of both solutions. The table below shows the
Total Cost of Acquisition (TCA). It includes the cost of hardware and software with 3 years of
maintenance/support. All prices are US list prices.

Cost item                                                POWER S824    Competitor (Rack)
-------------------------------------------------------  ----------    -----------------
System Hardware Including Operating System (3 Year TCA)     $98,388             $488,921
Virtual Manager (3 Year TCA)                                  $8,688                    -
Cluster HW+SW (3 Year TCA)                                         -             $610,880
Storage cost (3 Year TCA)                                   $429,820             $597,600
Total Hardware Cost                                         $536,896           $1,697,401
Database Software (3 Year TCA)                               $79,380           $1,261,600
Analytics Add-on Components (3 Year TCA)                           -             $571,040
Total Software Cost                                          $79,380           $1,832,640
Total Cost (HW and SW) with 3 Year TCA                      $616,276           $3,530,041

Table: 3 Year TCA comparison

As the table above shows, the POWER8 S824 system provides 5.7x lower TCA compared to the
pre-integrated database competitor V3 (Rack).
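
The 5.7x figure can be reproduced from the line items in the TCA table. The sketch below assumes the assignment of each dollar amount to a system that makes the published subtotals and totals add up; the dictionary keys are hypothetical labels, not names used by IBM.

```python
# Three-year TCA line items in US dollars, taken from the TCA table.
power8 = {
    "system_hardware_and_os": 98_388,
    "virtual_manager": 8_688,
    "storage": 429_820,
    "database_software": 79_380,
}
competitor = {
    "system_hardware_and_os": 488_921,
    "cluster_hw_sw": 610_880,
    "storage": 597_600,
    "database_software": 1_261_600,
    "analytics_add_ons": 571_040,
}

power8_total = sum(power8.values())
competitor_total = sum(competitor.values())
ratio = competitor_total / power8_total

print(f"POWER8 total:     ${power8_total:,}")
print(f"Competitor total: ${competitor_total:,}")
print(f"Cost ratio:       {ratio:.1f}x")
```

The two sums match the totals published in the table, and their ratio rounds to the 5.7x quoted in the text.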

Why is BLU Acceleration on Power better than the Pre-integrated database competitor?

There are many innovations built into BLU Acceleration and POWER8 servers that are absent from
the competitor's system, and these lead to the higher performance.
BLU Acceleration is a dynamic in-memory optimized technology. With IBM's innovative
prefetching technology, it loads and caches only the hot data in memory. The pre-integrated
database competitor, however, is NOT an in-memory database. The competitor's system caches
entire rows of data regardless of whether you need them and wastes memory on unused data,
which leads to slower performance. BLU Acceleration uses column-organized data, which is
highly suitable for Analytics operations. The competitor database, however, is row based and
has no columnar database capability. Actionable compression makes BLU's performance faster,
as it can operate on compressed data, whereas the competitor database incurs the additional
overhead of decompression before it performs any operation, which worsens performance. And
lastly, BLU Acceleration is easy to deploy and administer. Just load and go. No need for
indexes and complex performance tuning.

Conclusion
The POWER8 server is designed specifically for today's complex workloads like Big Data
Analytics. IBM's DB2 BLU Acceleration is optimized for the POWER8 server, and together they
provide breakthrough performance for Analytics workloads at 5.7x lower cost. DB2 BLU has
innovations like dynamic in-memory processing, column-organized data, actionable compression
and exploitation of SIMD that are simply unmatched by any competition.

The combination of BLU Acceleration and POWER8 provides the best solution, in both price and
performance, to clients who are looking to gain new insights into their data.


DISCLAIMER NOTICE
This case study's results are based on measurements and projections using standard IBM workloads in a
controlled environment. This information is presented along with general recommendations to assist
the reader to have a better understanding of IBM (*) products. The actual throughput or performance
that any user will experience will vary depending upon considerations such as the amount of
multiprogramming in the user's job stream, the I/O configuration, the storage configuration, and the
workload processed. Therefore, no assurance can be given that an individual user will achieve
throughput or performance or power savings improvements equivalent to the ratios stated here. All
performance and power data contained in this publication was obtained in the specific operating
environment and under the conditions described within the document and is presented as an
illustration. Performance or power characteristics obtained in other operating environments may vary
and customers should conduct their own testing.
Information is provided "AS IS" without warranty of any kind.
The use of this information or the implementation of any of these techniques is a customer
responsibility and depends on the customer's ability to evaluate and integrate them into the customer's
operational environment. While each item may have been reviewed by IBM for accuracy in a specific
situation, there is no guarantee that the same or similar results will be obtained elsewhere. Customers
attempting to adapt these techniques to their own environments do so at their own risk.

Copyright IBM Corporation 2014


IBM Corporation
Software Group
Route 100
Somers, NY 10589
Produced in the United States of America
November 2014
IBM, the IBM logo, ibm.com, and POWER7+ are trademarks of International Business Machines Corp., registered in many
jurisdictions worldwide. Other product and service names might be trademarks of IBM or other companies. A current list of IBM
trademarks is available on the Web at "Copyright and trademark information" at www.ibm.com/legal/copytrade.shtml
This document is current as of the initial date of publication and may be changed by IBM at any time. Not all offerings are available
in every country in which IBM operates.
The performance data and client examples cited are presented for illustrative purposes only. Actual performance results may vary
depending on specific configurations and operating conditions.
THE INFORMATION IN THIS DOCUMENT IS PROVIDED AS IS WITHOUT ANY WARRANTY, EXPRESS OR IMPLIED,
INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND ANY
WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are warranted according to the terms and conditions of the
agreements under which they are provided.

Please Recycle
