Big Data
This document contains Confidential, Proprietary and Trade Secret Information (Confidential
Information) of Informatica and may not be copied, distributed, duplicated, or otherwise reproduced
in any manner without the prior written consent of Informatica.
While every attempt has been made to ensure that the information in this document is accurate and
complete, some typographical errors or technical inaccuracies may exist. Informatica does not accept
responsibility for any kind of loss resulting from the use of information contained in this document.
The information contained in this document is subject to change without notice.
The incorporation of the product attributes discussed in these materials into any release or upgrade of
any Informatica software product, as well as the timing of any such release or upgrade, is at the sole
discretion of Informatica.
Protected by one or more of the following U.S. Patents: 6,032,158; 5,794,246; 6,014,670;
6,339,775; 6,044,374; 6,208,990; 6,208,990; 6,850,947; 6,895,471; or by the following
pending U.S. Patents: 09/644,280; 10/966,046; 10/727,700.
This edition published September 2015
White Paper
Table of Contents
Executive Summary
Why Aren't We There Yet?
Why Do Many Big Data Efforts Fail?
The Big Data Laboratory vs. Big Data Factory
Three Essential Pillars. The Importance of Data Integration, Governance, and Security
Trustworthy Data is Essential to Achieving ROI
Big Data Management in Action: Customer Examples
Why Informatica for Big Data Management?
Get Big Data Ready
Executive Summary
Now that the industry has nearly a decade of experience with big data, surrounded by far too much hype
about data volume, variety, and velocity, it's time to finally catch up to the promise. After years of trial and
error, businesses should now be ready to define a big data strategy that delivers tangible results
and ultimately achieves increased ROI.
Big data integration and management form the foundation for success. With the right approach, you can bring
together disparate and complex information from multiple sources and turn it into trusted information assets at
scale to drive competitive advantage.
This white paper describes a big data management platform that relies on three fundamental pillars to deliver
expected business results from big data projects:
Dynamic and optimized big data integration
End-to-end big data governance and quality
Risk-centric big data security
Without question, the key big data challenges for most businesses today are how to get started and how to
achieve quick and measurable results.
[Figure 1 diagram. Labels: data sources (Relational, Mainframe; Documents and Emails; Social Media, Web Logs; Machine Device, Cloud); roles (Data Modeler, Data Scientist, Data Analyst, Data Steward, Data Engineer); Laboratory (insights); Factory (actions); business objectives (Improve Fraud Detection, Increase Customer Loyalty, Reduce Security Risk, Improve Predictive Maintenance, Increase Operational Efficiency).]
Figure 1. Businesses are challenged by how to get started and how to achieve results with big data
Capgemini Consulting, Cracking the Data Conundrum: How Successful Companies Make Big
Data Operational.
[Diagram: the same data sources, data roles, and business objectives as in Figure 1, with the labels Self-Service Autonomy and Operational Agility added alongside Laboratory (insights) and Factory (actions).]
For big data management to truly be effective, you need to start with a platform that delivers
three key elements:
1. Dynamic and optimized big data integration
2. End-to-end big data governance and quality
3. Risk-centric big data security
In the early days of big data, most of the investment went to Hadoop and data analysis, with
less going to data governance. But as businesses have begun building complex architectures for
big data, challenges around data governance and data privacy have increased [5]. Now,
as big data strategies mature, we're seeing more interest in comprehensive big data
management platforms that handle data integration, data governance, and data security
for multiple projects across the enterprise. And although businesses are still intrigued by
Hadoop, investment in the technology remains tentative.
Integration
Big data integration should deliver high-throughput data ingestion and at-scale processing
so business analysts can make better decisions using next-generation analytics tools.
Big data integration helps businesses gain better insights from big data because it:
Speeds up development, leverages existing IT skills, and simplifies maintenance through
the use of a simple visual interface supported by easy-to-use templates
Increases performance and resource utilization by optimizing data processing execution
and providing flexible, hybrid deployment across a variety of platforms
Handles a wide variety of data sources through hundreds of pre-built transforms and
connectors, and orchestrates data flows by using broker-based data ingestion
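To make the idea of broker-based ingestion concrete, here is a minimal Python sketch. It is an illustration only and does not represent Informatica's implementation: the `InMemoryBroker` class, the `normalize` transform, and the topic names are all hypothetical, and a production system would use a real message broker instead of in-process queues. The sketch shows sources publishing records to named topics while a downstream pipeline drains each topic, applies one transform, and tags every record with its origin.

```python
from collections import defaultdict, deque


class InMemoryBroker:
    """Hypothetical stand-in for a message broker: one FIFO queue per topic."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def publish(self, topic, record):
        """Append a record to the queue for the given topic."""
        self._topics[topic].append(record)

    def drain(self, topic):
        """Yield and remove every record currently queued on a topic."""
        pending = self._topics[topic]
        while pending:
            yield pending.popleft()


def normalize(record):
    """Example transform: trim and lowercase keys, trim string values."""
    return {
        key.strip().lower(): (value.strip() if isinstance(value, str) else value)
        for key, value in record.items()
    }


def run_pipeline(broker, topics):
    """Drain each topic in order, normalize records, and tag them with their source."""
    output = []
    for topic in topics:
        for record in broker.drain(topic):
            row = normalize(record)
            row["source"] = topic
            output.append(row)
    return output
```

Because producers only know the broker and the topic name, a new data source can be added without changing the downstream pipeline, which is the decoupling that a broker is meant to provide.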
EY, Big data: Changing the way businesses compete and operate. April 2014
Security
Risk-centric big data security analyzes all data to quickly detect and act upon risks and vulnerabilities. This
requires a 360-degree view of sensitive data, supported by risk analytics and policy-based protection of
data at risk. Big data security should de-identify information controlled by corporate policies and industry
regulations. Risk-centric big data security must enable:
Single-pane-of-glass monitoring of sensitive data stores to provide visibility into the locations of
sensitive data
Sensitive data discovery and classification for a comprehensive 360-degree view of sensitive data
Usage and proliferation analysis for a precise understanding of data risk
Risk assessment to help prioritize investments in security programs
Non-intrusive persistent and dynamic data masking to protect sensitive data in development and
production environments to help minimize the risk of a security breach
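The capabilities listed above can be illustrated with a short Python sketch. It is a simplified example under stated assumptions, not a description of any product: the two regular expressions, the salt value, the `tok_` prefix, and the `steward` role name are all hypothetical. `classify` stands in for sensitive data discovery, `persistent_mask` replaces a value irreversibly before it reaches a development environment, and `dynamic_mask` leaves stored data untouched and masks it at read time based on who is asking.

```python
import hashlib
import re

# Hypothetical detection rules; real classifiers use far richer patterns and context.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def classify(text):
    """Return the sorted names of the sensitive-data patterns found in a string."""
    return sorted(name for name, pattern in PATTERNS.items() if pattern.search(text))


def persistent_mask(value, salt="demo-salt"):
    """Replace a value with a salted SHA-256 token (same input, same token)."""
    digest = hashlib.sha256((salt + value).encode("utf-8")).hexdigest()
    return "tok_" + digest[:12]


def dynamic_mask(value, role):
    """Mask at read time: the assumed 'steward' role sees the raw value,
    every other role sees at most the last four characters."""
    if role == "steward":
        return value
    visible = value[-4:] if len(value) > 4 else ""
    return "*" * (len(value) - len(visible)) + visible
```

A deterministic token keeps joins across masked tables working, but a salted hash of a low-entropy value such as a Social Security number can be brute-forced if the salt leaks, so this sketch should be read as an illustration of the concept and not as a recommended protection scheme.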
[Diagram: data sources, data roles, and business objectives, as in Figure 1.]
Figure 3: Data integration, governance, and security are three essential pillars of Big Data Management
6 KEY BUSINESS OBJECTIVES
Here are six of the most common business objectives for big data projects today. View the Big
Data Management in Action section for customer examples.
Increasing customer loyalty
Increasing operational efficiency
Reducing security risks
Improving fraud detection in financial services and insurance
Improving predictive maintenance in manufacturing
You can avoid the cost and bad publicity associated with security breaches
by understanding where all your sensitive PII/PHI customer data resides and
masking it to de-identify and de-sensitize the data.
And finally, you can build a data pipeline that is easy to maintain and
delivers information directly to consumers' mobile devices, optimizing their
personal engagement and increasing loyalty, thereby increasing revenue and
market share.
Western Union Creates Enterprise Data Hub to Help Identify Trends and
Enhance Customer Experience
Money transfer and financial services company Western Union wanted to develop an omni-channel marketing approach
that integrated retail, web, and mobile and would help it expand into new markets with digital products. The
firm sought to reach customers with a more tailored and personalized experience, while also reducing risk.
Western Union built a new big data platform based on Hadoop and Informatica Big Data Management to
help the company identify trends and analyze data from multiple diverse sources (legacy, online, and mobile).
Western Union can now quickly evaluate customer preferences and buying patterns to enhance the overall
customer experience.
Leading Insurance Firm Relies on Unified Big Data Platform to Power Marketing Campaigns
With nearly 20 million customers, a leading insurance firm wanted a 360-degree view of all consumer activity
for improved marketing, planning, and analytics. It sought to discover and mine relationships and to create
highly targeted and personalized campaigns.
Many data sources needed to be integrated, cleansed, and matched at scale from a variety of systems. Data
sources for marketing include customer profile data, Salesforce CRM, prospect and partner data, solicitation
history, web logs, and social media data.
Informatica provided a single big data management platform that delivers a consistent enterprise-wide
view across all business units. The platform enables rapid intake of new data sources, both structured and
unstructured, and eliminates data pipeline bottlenecks while increasing processing power for statistical
analytics. Insurance is a highly regulated industry, so the platform also supports data governance with tools
and processes to profile data, validate data quality, capture metadata, provide end-to-end data lineage, and
ensure security.
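As a rough illustration of what profiling data and validating data quality can mean in practice, the following Python sketch computes simple per-column statistics and flags incomplete records. The function names, the field names, and the rule that empty strings count as missing are assumptions made for this example. They are not taken from the customer's platform.

```python
def profile(rows):
    """Per column: total rows, missing values, and distinct non-missing values."""
    columns = sorted({key for row in rows for key in row})
    report = {}
    for column in columns:
        values = [row.get(column) for row in rows]
        present = [value for value in values if value not in (None, "")]
        report[column] = {
            "rows": len(values),
            "nulls": len(values) - len(present),
            "distinct": len(set(present)),
        }
    return report


def validate(rows, required):
    """Return the indexes of rows that are missing any required field."""
    return [
        index
        for index, row in enumerate(rows)
        if any(row.get(field) in (None, "") for field in required)
    ]
```

Running `profile` on each newly onboarded source gives a quick view of how complete each field is, and `validate` can gate records before they are matched and merged into a single customer view.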
About Informatica
Informatica is a leading independent software provider focused on delivering transformative innovation for
the future of all things data. Organizations around the world rely on Informatica to realize their
information potential and drive top business imperatives. More than 5,800 enterprises depend on
Informatica to fully leverage their information assets residing on-premise, in the Cloud and on the
internet, including social networks.
A major oil and gas company is building a logical data warehouse using Informatica Big
Data Management and a Hadoop-based data lake that supports critical capabilities such as
data ingestion, metadata, data integration, data quality and data governance, master data
management, information security, and information sharing and reusability.
The company is among the largest U.S.-based independent natural gas and oil producers.
Its business users need fast, accurate data at their fingertips to make smart, quick decisions.
Informatica Big Data Management makes manual data entry and reconciliation a thing of
the past by providing authoritative and trusted data as it relates to wells, suppliers, and
other key master and reference data.
Informatica accelerates time-to-value with readily available skills to build data pipelines
in Hadoop that transform, prepare, and deliver data to the company's big data analytic
applications. This improves the automation of many core business tasks, such as improving
fracking process efficiency, dispatching drivers to locations more efficiently, and improving
decision support during well construction.
We interviewed Informatica customers who have built a new data organization that turns
their data into a competitive advantage. Read the eBook "How to organize the data-ready
enterprise" to discover the seven key principles for building an organization that creates
great data for real business value.
Gartner, 2015 Magic Quadrant for Data Integration Tools, Eric Thoo, Lakshmi Randall, July 29, 2015.
Gartner, 2014 Magic Quadrant for Data Quality Tools, Saul Judah, Ted Friedman, November 26, 2014.
Gartner, 2014 Magic Quadrant for Master Data Management of Customer Data Solutions, Bill O'Kane, Saul Judah, October 30, 2014.
Gartner, 2014 Magic Quadrant for Data Masking Technology, Joseph Feiman, Brian Lowans, December 10, 2014.
Gartner, 2015 Magic Quadrant for Enterprise Integration Platform as a Service, Worldwide, Massimo Pezzini, et al., March 23, 2015.
Worldwide Headquarters, 2100 Seaport Blvd, Redwood City, CA 94063, USA Phone: 650.385.5000 Fax: 650.385.5500
Toll-free in the US: 1.800.653.3871 informatica.com linkedin.com/company/informatica twitter.com/Informatica
© 2015 Informatica LLC. All rights reserved. Informatica and "Put potential to work" are trademarks or registered trademarks of Informatica in the
United States and in jurisdictions throughout the world. All other company and product names may be trade names or trademarks.
IN09_0915_02975