Big Data Analysis - Module 1


Big Data Analysis

Module 1
8th sem Elective - IV

Prof. Vinutha H
Asst. Professor
Dept. of CSE
Dr.AIT
What is BIG DATA?
• Big Data is the next generation of data warehousing and business analytics and is
poised to deliver top line revenues cost efficiently for enterprises

• Rapid pace of innovation and change; where we are today is not where we'll be in
just two years and definitely not where we'll be in a decade

• Big Data has been around for decades for firms that have been handling tons of
transactional data over the years—even dating back to the mainframe era

• But what's really going on?


• Computing perfect storm: Big Data analytics are the natural result of four major global trends
1) Moore's Law – technology always gets cheaper
2) Mobile computing – a smartphone or mobile tablet in your hand
3) Social networking – Facebook, Foursquare, Pinterest, Twitter, etc.
4) Cloud computing – you don't even have to own hardware or software anymore; you can
rent or lease someone else's

• Data perfect storm:


– Data flood gates have now opened with more volume, velocity, and variety—
the three Vs—of data
– The storm of the three Vs makes data extremely complex and cumbersome to handle with current data
management and analytics technology and practices

• Convergence perfect storm: Traditional data management and analytics software + hardware
technologies + open-source technology + commodity hardware
– All are merging to create new alternatives for IT and business executives
• “Big Data” isn’t new - companies that have dealt with billions of transactions for many years

• Recent innovations –
– give the ability to leverage new technology and approaches
– enable to affordably handle more data
– take advantage of the variety of data—such as unstructured data

• Ability to store data in an affordable way has changed the game in industries

• People are able to store more data now than ever before
– They don't have to make decisions about which half to keep or how much history to keep
– It's now economically feasible to keep all the history and go back later to start looking through it
again
• Aside from the changes in the actual hardware and software technology, there has also been a
massive change in the actual evolution of data systems
• 3 pinnacle stages in the evolution of data systems:
– Dependent (Early Days)
• Data systems were fairly new
• Users didn't quite know what they wanted
• IT assumed that “Build it and they shall come.”

– Independent (Recent Years)


• Users understood what an analytical platform was
• Users worked together with IT to define the business needs and approach for deriving
insights for their firm

– Interdependent (Big Data Era)


• Interactional stage between various companies, creating more social collaboration
beyond your firm's walls
• Big Data corporate applications and other applications turn their data skills into a helping hand
for humanity
– Eg1: A disaster recovery organization
• Using optimization analytics they help in directing the correct supplies to areas where
they are needed most
• Does a village need bottled water or boats, rice or wheat, shelter or toilets? Follow up
surveys 6, 12, 18, and 24 months following the disaster help track the recovery and
direct further relief efforts
– Eg2: A company called DataKind
• supports a data-driven social sector through services, tools, and educational resources to
help with the entire data pipeline
• In the service of humanity, they were able to secure funding from several corporations
and foundations
• As data and technology become more ubiquitous and the need for insights more pressing,
ordinary data scientists are finding themselves with extraordinary powers
• The world is changing and those who are stepping up to use data for the greater good have a
real opportunity to change it for the better
• The flood of data represents something we've never seen before. It's new, it's powerful, and yes, it's
scary, but extremely exciting
• Treasured firms (Facebook, Google, LinkedIn, and eBay) that rely on the skills of new data
scientists
– They are breaking the traditional barriers by leveraging new technology and approaches to
capture and analyze data that drives their business
• Organizations must ensure that they have a mechanism to change with the times
• At the end of the day, legacy data warehousing and BI analytics are not going away anytime soon
• It's all about finding the right home for the new approaches and making them work for us!
• Organizations capture trillions of bytes of information about their customers, suppliers, and
operations through digital systems
• Millions of networked sensors embedded in mobile phones, automobiles, and other products
are continually sensing, creating, and communicating data
• The result is a 40 percent projected annual growth in the volume of data generated
• What makes Big Data different from “regular” data?
• Different Definitions of BIG DATA
1) Big Data is the “data that becomes large enough that it cannot be processed using
conventional methods”
– O'Reilly Open Source Convention

• 2) Big data refers to “datasets whose size is beyond the ability of typical database software
tools to capture, store, manage, and analyze”
- McKinsey study
• Big data in many sectors today will range from a few dozen terabytes to multiple petabytes
(thousands of terabytes)
• Big Data isn't just a description of raw volume. “The real issue is usability”
• The real challenge: identifying or developing the most cost-effective and reliable methods for
extracting value from all the terabytes and petabytes of data now available
Why Big Data Now?
• Past mistakes - investing in new technologies that didn’t fit into existing business frameworks
• Many companies made substantial investments in customer-facing technologies that subsequently
failed to deliver expected value
• Management either forgot (or just didn’t know) that big projects require a synchronized
transformation of people, process, and technology - All three must be marching in step or the
project is doomed
• The technology of Big Data is the easy part

• Hard part is figuring out what you are going to do with the output
generated by your Big Data analytics

• Making sure that you have the people and process pieces ready
before buying the technology
A Convergence of Key Trends
• Big companies have been collecting and storing large amounts of data for a long
time
• Difference between “Old Big Data” and “New Big Data” – Accessibility
• Large amounts of information were stored on tape
• The real change has been in the ways that we access that data and use it to create
value
• Eg: technologies like Hadoop make it functionally practical to access a tremendous
amount of data, and then extract value from it (see the sketch below)
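A minimal, hypothetical sketch of how that access typically looks with Hadoop Streaming: a mapper and a reducer written as ordinary scripts that Hadoop runs across the cluster. The word-count task, file paths, and command line are illustrative assumptions, not from the slides.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming word-count sketch (illustrative only).
# Example invocation, assuming a Hadoop cluster and HDFS paths that exist:
#   hadoop jar hadoop-streaming.jar -mapper "python3 wc.py map" \
#       -reducer "python3 wc.py reduce" -input /logs -output /wordcounts
import sys

def mapper():
    # Emit "word<TAB>1" for every word read from stdin.
    for line in sys.stdin:
        for word in line.split():
            print(f"{word}\t1")

def reducer():
    # Input arrives sorted by key, so counts for a word are contiguous.
    current, total = None, 0
    for line in sys.stdin:
        word, count = line.rsplit("\t", 1)
        if word != current:
            if current is not None:
                print(f"{current}\t{total}")
            current, total = word, 0
        total += int(count)
    if current is not None:
        print(f"{current}\t{total}")

if __name__ == "__main__":
    mapper() if sys.argv[1:] == ["map"] else reducer()
```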
• Convergence of several trends: -> more data -> Less expensive & faster hardware
• The availability of lower-cost hardware - easier and more feasible to retrieve and
process information, quickly and at lower costs
• Cost/benefit has really been a game changer – getting raw speed at an affordable price

• Ability to analyze Big Data in real time is making an impact


– Eg1: Insurance companies - need to know the answers to questions like this: As people age,
what kinds of different services will they need from us?
– In the past, the companies would have been forced to settle for general answers
– Today, they can use their data to find answers that are more specific and significantly more
useful
– They can look at actual data from real customers and can extract and analyze every policy they've
ever held—potentially petabytes' worth of data considering all of the insurance
customers across the lifespans of their policies
– They can compare one individual to all the other people in an age bracket and perform that
analysis in real time
– If a customer service rep had access to that kind of information in real time, think of all the
opportunities and advantages there would be, for the company and for the customer
• Ability to analyze Big Data in real time is making an impact (contd…)
– Eg2: Shopping in a store to buy a pair of pants
– At the cash register, the clerk asks if you would like to save 10 percent off your
purchase by signing up for the store's credit card—99.9 percent of the time, the answer is “No”
– What if the store could automatically look at all the customer's past purchases, see what other items were
bought when they came in to buy a pair of pants, and then offer 50 percent off a
similar purchase?
– Now that is relevant to the customer
– The store isn't offering another lame credit card—it's offering something the customer
probably wants, at an attractive price (a sketch of this kind of co-purchase lookup follows below)
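A minimal sketch of the co-purchase idea behind this example, assuming a hypothetical purchase-history table; the column names and the pandas approach are assumptions made for illustration, not from the slides.

```python
import pandas as pd

# Hypothetical purchase history: one row per item in a basket.
purchases = pd.DataFrame({
    "basket_id": [1, 1, 2, 2, 3, 3, 3],
    "item":      ["pants", "belt", "pants", "shirt", "pants", "belt", "socks"],
})

# Find items most often bought in the same basket as pants.
pants_baskets = purchases.loc[purchases["item"] == "pants", "basket_id"]
co_items = purchases[purchases["basket_id"].isin(pants_baskets)]
co_counts = co_items[co_items["item"] != "pants"]["item"].value_counts()

# The most frequent co-purchase (here, "belt") is the item to discount at the register.
print(co_counts.head(3))
```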
• The two scenarios aren’t fantasies
• Yesterday, the cost of real-time data analysis was prohibitive
• Today, real-time analytics have become affordable
• Lastly, not all new Big Data technology is open source – e.g., SAP HANA (an in-memory database
platform for real-time analytics and applications)
Big Data Defined by Three dimensions: 3 V’s
1)Volume – defined as sheer quantity of transactions, events or amounts of history
– Analytics used to rely on smaller data sets (samples) to create predictive
models for business, and using smaller samples severely blunted predictive insight
– With the data volume constraint removed and larger data sets in use:
• Enterprises discover subtle patterns that lead to targeted, actionable micro-
decisions OR
• Enterprises factor more observations into predictions, thereby increasing
the accuracy of the predictive models
• Enterprises look at data over a longer period of time to create more accurate
forecasts that mirror real-world complexities of interrelated bits of
information
2) Variety - Data variety is the assortment of data

• Traditional data is “structured” and is put into a database based on the type of
data (i.e., character, numeric, floating point, etc.)
• New data is “unstructured,” e.g., text, audio, video, image, geospatial, and Internet
data (including clickstreams and log files)
• “Semi-structured” data –
– a combination of different types of data
– has some pattern or structure that is not as strictly defined as
structured data. Eg: call center logs may contain customer name +
date of call + complaint, where the complaint information is
unstructured and not easily synthesized into a data store (a short
parsing sketch follows below)
– XML data
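As an illustration of the call-center example above, here is a small sketch that splits a semi-structured log line into its structured fields (name, date) and its unstructured free-text complaint. The log format itself is an assumption invented for this example.

```python
from datetime import datetime

# Hypothetical semi-structured call-center log line:
# structured fields up front, free-text complaint at the end.
log_line = ("CUST=Jane Smith|DATE=2013-04-02|"
            "COMPLAINT=the bill was higher than quoted and nobody called back")

# Split into key/value pairs on '|', then '=' (only the first '=' counts).
fields = dict(part.split("=", 1) for part in log_line.split("|"))

record = {
    "customer": fields["CUST"],                                   # structured
    "call_date": datetime.strptime(fields["DATE"], "%Y-%m-%d"),   # structured
    "complaint_text": fields["COMPLAINT"],                        # unstructured free text
}
print(record)
```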
• 3) Velocity - speed at which data is created, accumulated, ingested, and
processed

– The increasing pace of the world has put demands on businesses to
process information in real time or with near real-time responses

– Data is processed on the fly, or while “streaming” by, to make
quick, real-time decisions
A Wider Variety of Data Sources

• Internet data (i.e., clickstream, social media, social networking links, search engine data)
• Primary research (i.e., surveys, experiments, observations)
• Secondary research (i.e., competitive and marketplace data, industry reports, consumer data, business
data)
• Location data (i.e., mobile device data, geospatial data)
• Image data (i.e., video, satellite image, surveillance)
• Supply chain data (i.e., EDI, vendor catalogs and pricing, quality information)
• Device data (i.e., sensors, PLCs, RF devices, LIMs, telemetry)
• Black Box Data (i.e., voices of the flight crew, recordings of microphones and earphones, and the
performance information of the aircraft)
• Stock Exchange Data (i.e., information about the ‘buy’ and ‘sell’ decisions on shares of different
companies made by customers)
• Power grid data (i.e., information about the power consumed by a particular node with respect to a base station)
• Transport data (i.e., Transport data includes model, capacity, distance and availability of a vehicle)
• The wide variety of data leads to complexities in ingesting the data into data
storage

• The variety of data also complicates:

– The transformation of the data (or the changing of data into a
form that can be used in analytics processing), and

– The analytic computation or processing of the data


The Expanding Universe of Unstructured Data
• Structured data (the kind that is easy to define, store, and analyze)

• Unstructured data (the kind that tends to complicate the data definition, takes up lots of storage
capacity, and is typically more difficult to analyze)
– does not have a predefined data model and/or does not fit well into a relational database
– typically text heavy, but may contain data such as dates, numbers, and facts

• Semi-structured data (the kind that describes structured data that doesn't fit into a formal structure
of data models)
– contains tags that separate semantic elements, which includes the capability to enforce
hierarchies within the data
If unstructured data is so complicating, then why bother?

• The amount of data (all data, everywhere) is doubling every two years
• Most new data is unstructured (unstructured data represents almost 95 percent of new data, while
structured data represents only 5 percent)
• Unstructured data tends to grow exponentially, unlike structured data, which tends to grow in a
more linear fashion
• Unstructured data is vastly underutilized—there's a lot of money to be made for smart individuals
and companies that can mine unstructured data successfully
• The implosion of data is happening due to more open and transparent societies
– Eg: “Resumes used to be considered private information,” - “Not anymore with the advent of
LinkedIn.”
– Instagram and Flickr for pictures, Facebook for circle of friends, and Twitter for our personal
thoughts (and what the penalty can be given the recent London Olympics, where a Greek athlete
was sent home for violating strict guidelines on what athletes can say in social media)
“Even if you don’t know how you are going to apply it today,
unstructured data has value”

• Smart companies are beginning to capture that value, or they
are partnering with companies that can capture the value of
unstructured data
– For eg: some companies use unstructured social data to monitor
their own systems.
– If your customer-facing website goes down, you'll hear about it really
quickly if you monitor Twitter
– Monitoring social media can also help to spot and fix embarrassing
mistakes before they cost serious money
Is Big Data analytics worth the effort?
• Yes. It will be a competitive advantage, and it's likely to play a key role in sorting winners from losers
in an ultracompetitive global economy
• But we need to learn to
– Use Big Data analytics to drive value for the enterprise in a way that aligns with core competencies and
creates a competitive advantage for the enterprise
– Capitalize on new technology capabilities and leverage existing technology assets
– Enable the appropriate organizational change to move towards fact based decisions, adoption
of new technologies, and uniting people from multiple disciplines into a single multidisciplinary
team
– Deliver faster and superior results by embracing and capitalizing on the ever-increasing rate of
change that is occurring in the global marketplace
• Big Data analytics uses a wide variety of advanced analytics, to provide:
– Deeper insights. Rather than looking at segments, classifications, regions, or
groups, you'll have insights into all the individuals, all the products, all the
parts, all the events, all the transactions, etc.
– Broader insights. Big Data analytics takes into account all the data, including
new data sources, to understand the complex, evolving, and interrelated
conditions to produce more accurate insights.
– Frictionless actions. Increased reliability and accuracy that will allow the
deeper and broader insights to be automated into systematic actions
• GigaOm, a leading technology industry research firm, uses a simple framework (see Table 1.1)
to describe potential Big Data Business Models for enterprises seeking to exploit Big Data
analytics
• The competitive strategies outlined in the GigaOm framework are enabled today via packaged or
custom analytic applications depending on the maturity of the competitive strategy in the
marketplace
• The key to success for organizations seeking to take advantage of this opportunity is:
■ Leverage all your current data and enrich it with new data sources
■ Enforce data quality policies and leverage today's best technology and people to support
the policies
■ Relentlessly seek opportunities to imbue your enterprise with fact-based decision making
■ Embed your analytic insights throughout your organization
• With Big Data analytics, the universe of questions you can ask is vastly larger; you can define new variables “on the fly”
• With traditional methodologies, the ability to ask questions is limited
• Why is the ability to define new variables so critically important?
– In the real world, you don't always know what you're looking for, so you can't possibly know in
advance which questions you'll need to ask to find a solution
• Dr. Usama Fayyad is one of the best minds in Big Data analytics. A world-renowned pioneer in the world of
analytics, data mining, and corporate data strategy, he was formerly Yahoo!’s chief data officer and
executive vice president, as well as founder of Yahoo!’s research organization
• Dr. Fayyad uses the second Palomar Sky Survey, a comprehensive effort to map the heavens, as an
analogy to explain the inherent problems of handling Big Data. The survey, also known as POSS II,
generated a huge amount of data
– Astronomers collect layers of resolution data about billions of stars and other objects
– It is similar to how businesses deal with their customers: you know very little about the majority of your customers, and the data
you have is noisy, incomplete, and potentially inaccurate. It's the same with stars
– When astronomers need to take a deeper look, they use a much higher-resolution telescope that has a much narrower field of view of
the sky, i.e., they look at a very tiny proportion of the universe, but in much greater detail. They can then see whether objects are stars or
galaxies or something else
– Similarly, when you have higher-resolution data, a lot of objects that were hardly recognizable in the main part of the survey
become recognizable
POSS II survey:
• Initially, the astronomers were working with 50 or 60 variables for each object—but that is too
many variables for the human mind to handle
• Eventually the astronomers discovered that only eight dimensions are necessary to make accurate
predictions
• They struggled with this problem for 30 years until they found the right variables. Of course, nobody
knew that they needed only eight, and that they needed the eight simultaneously—if any one of the key
attributes was dropped, it became very difficult to predict with better than 70 percent accuracy
whether something was a star or a galaxy
• But if you actually used all eight variables together, you could get up to the 90 to 95 percent
accuracy level that's critical for drawing certain conclusions
• Gathering data is often easier than figuring out how to use it. As the saying goes, “You don't know
what you don't know.” Are all of the variables important, or only a small subset? With Big Data
analytics, you can get to the answer faster (a small feature-selection sketch follows below)
• Most of us won't have the luxury of working a problem for 30 years to find the optimal solution
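The star/galaxy story is essentially a feature-selection problem. The sketch below shows the general idea on synthetic data with scikit-learn; the data, the choice of SelectKBest, and the specific values of k are illustrative assumptions, since the slides do not say how the astronomers actually found their eight variables.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

# Synthetic stand-in for the survey: 60 measured variables per object,
# of which only a handful actually separate "star" from "galaxy".
X, y = make_classification(n_samples=2000, n_features=60,
                           n_informative=8, n_redundant=4, random_state=0)

# Too few variables hurts accuracy; the right small subset does nearly as
# well as using everything.
for k in (2, 8, 60):
    model = make_pipeline(SelectKBest(f_classif, k=k),
                          LogisticRegression(max_iter=1000))
    acc = cross_val_score(model, X, y, cv=5).mean()
    print(f"{k:2d} variables -> {acc:.2f} accuracy")
```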
Industry Examples of Big Data
• Google's digital marketing evangelist and author Avinash Kaushik spent the first 10 years of his
professional career in the world of business intelligence, during which he actually built large multi-
terabyte data warehouses and the intelligence platforms
• Some hard and valuable lessons he learned during this time about becoming more competitive with data:
– He worked with large, complicated multinational companies to build the single source of truth in a relational
database such as Oracle
– The single source of truth would rely on very simplistic data from ERP and other sources
– But the big data warehouse approach does not work in the online world
– They were tasked to collect all the clickstream data from their digital activities, and it worked great
for a few months; then it was a big disaster, because everything that works in the BI world does
not work in the online world
– In order to be successful online, you have to embrace multiplicity, i.e., give up on the
deeply embedded principle that companies can survive by building a single source of the truth
• This multiplicity requires multiple skills in the decision-making team, multiple tools, and multiple
types of data (clickstream data, consumer data, competitive intelligence data, etc.)
• That's why most companies struggle with making smart decisions online: they cannot, at some
level, embrace multiplicity, and they cannot bring themselves to embrace incomplete data—it is
against their blood and DNA; they're forced to pick perfect data in order to make decisions
• On the Web, you have to learn how to use different kinds of tools to bring multiple types of data together
and make decisions at the speed of light!
• You have to embrace incomplete data and make 80 percent good decisions every day, and not wait
for the truth, because by the time you get the truth, you're dead
• Digital marketing encompasses using any sort of online media channel. It can be any existence
online—whether profit or nonprofit—it doesn't matter
• It means driving people to a website, a mobile app, and the like, and, once there, retaining
them and interacting with them. But digital marketing, too, has a material impact on what happens in
the “offline” world
Non-Line World
• A world David Hughes calls the “non-line world”:
• The reality is that consumers live in a non-line world
• They move fluidly between these two worlds
• There is such little friction. When I'm in the offline world, I'm using my mobile
phone or laptop to pull in information from the online world and vice versa. I'm
taking snapshots of coupons in the supermarket and then redeeming them
online
• Companies are organized to execute in an online world or an offline world. They
need to organize for the non-line world. It has implications for people, systems,
marketing programs, and data analysis
• Eg: Avinash Kaushik describes a simplistic scenario involving a major newspaper publisher that
struggled to find out what people want to read when they come to its website, digital
existence, mobile app, or iPad app:
– The analyst or the marketer logs into Google Analytics or Omniture's clickstream analysis tool and
looks at the top viewed pages on the website, mobile apps, and so on
– That will help them understand what people want to read. Do they want to read more sports?
International news? In their minds, that will help frame the front page of the digital platform they
have

• The actual reason for their problem:

– The only way a clickstream analysis tool can collect a piece of data is that a page has to be
rendered: the page executes the analytics tracking code, and the tool records the page view
– But how will Google Analytics or Omniture actually measure the content that a typical reader
wanted to read but could not find because of inefficient website navigation?
– In that scenario, there is no way the publisher knows the content the reader could not find,
because the page was never rendered
• In order to actually find that data, one way to do it is to pop up a
survey at the end of a person's visit and ask two different questions:
Why are you here? And: Were you able to complete your task?
If the answer comes back as no, the survey dynamically asks what the visitor was
looking for.
• Then you would use that actual text from the user to understand what it is
that people want but could not find—content that never rendered a page and
therefore never executed the analytics code.
• Then we have to bring these two pieces of disparate data together from two
completely different systems—mash that massive data set together and then be
smart enough to make a decision: this is what people actually want when they
come to our website, or this is what people were looking for, but we failed them
(a small merge sketch of this follows below).
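A tiny sketch of the "mash the two data sets together" step, assuming a hypothetical export of page views from the analytics tool and a hypothetical export of survey responses. The column names and the pandas merge are invented for illustration.

```python
import pandas as pd

# Hypothetical clickstream export: what readers actually viewed.
pageviews = pd.DataFrame({
    "visit_id": [101, 102, 103],
    "top_section_viewed": ["sports", "politics", "sports"],
})

# Hypothetical survey export: what readers said they were looking for.
survey = pd.DataFrame({
    "visit_id": [101, 103, 104],
    "task_completed": ["yes", "no", "no"],
    "looking_for": [None, "local weather", "obituaries"],
})

# Outer merge keeps visits that appear in only one of the two systems.
combined = pageviews.merge(survey, on="visit_id", how="outer")

# Content people wanted but could not find (no page view ever recorded it).
missed = combined[(combined["task_completed"] == "no")
                  & combined["looking_for"].notna()]
print(missed[["visit_id", "looking_for"]])
```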
• Truth: digital marketing isn't easy
– One facet is that corporations used to have all the data they needed for people who were their
“consumers”
– They could do secondary research or some kind of primary research in order to collect data
about people who might be prospective customers
• One of the problems to deal with in a Big Data, digital existence is that even Google does not have access to data
for all its consumers. There is massive fragmentation
• Google obtains really good information about people when they interact with its primary
platforms, the ones it owns
• But once people start to interact with the company on other platforms—Facebook, Twitter, Google+,
pick a platform—
• it has very little visibility into people when they meet in concentrated masses and talk
about the company, on platforms that are not its own in any way
• Companies are starting to lose control of their ability to access the data they need in order to make smart,
timely decisions
• You have to have the primary outpost from where you can collect your own “big data” and have a
really solid relationship with the consumers you have and their data so you can make smarter
decisions
• Eg: the companies that do this best are consumer packaged goods companies such as Procter & Gamble
(P&G)
– P&G can plug into the digital marketing world and get direct access to consumers through social
media. Once that connection is made, it helps the firm with critical business decisions such as
new product development
Is IT Losing Control of Web Analytics?
• It's really complex to build an infrastructure to access and manage all the Internet data within the IT
department
• Way back, IT departments would opt for a four-year project and millions of dollars to go that route
• Today this sector has built up an ecosystem of companies that spread the burden and allow others to
benefit
• Chief information officers (CIOs) are and will continue to lose massive amounts of control over data
and to create large bureaucratic organizations whose only purpose is to support, collect, create, and mash data,
and be in the business of data “puking”. Such CIOs are “losing control in spades”
• Kaushik's “elevator pitch” to businesses is that Big Data on the Web will completely transform a company's
ability to understand the effectiveness of its marketing and hold its people accountable for the millions
of dollars that they spend. It will also transform a company's ability to understand how its competitors are
behaving
• It can create a democracy in your organization where, rather than a few people making big decisions, the
“organization is making hundreds and thousands of smart decisions every day and having the kind of
impact on your company that would be impossible in the offline world. Not that the offline world is bad or
anything, it's just that the way the data gets produced, assessed, and used on the Web is dramatically
different.”
Database Marketers, Pioneers of Big Data
• Shaun Doyle, president and CEO of Cognitive Box
– Database marketing concerned with
• building databases containing information about individuals,
• using that information to better understand those individuals, and
• communicating effectively with some of those individuals to drive business value
– Marketing databases are typically used for a couple of things :
1) Customer acquisition - There are some large-scale databases in the U.S. with large prospect
universes and those prospect universes are used to acquire and drive incremental business
2) Retaining and cross-selling to existing customers, which reactivates the cycle
• Database Marketing (1960s & 1970s)
– mainframe systems were built to contain information on customers and information about the
products and services those customers were buying
– Companies used this information to drive communications to those customers
– companies began extracting data from the mainframe, putting the data into separate databases, and
then using those databases to drive direct mail activity
• As companies grew and systems proliferated, we ended up with a situation where you had one system for
one product, another system for another product, and then potentially another system for yet another
product; as a result, there were duplicate customer records
• Then companies began developing technologies to manage and de-duplicate data from multiple source
systems. Companies such as Acxiom and Experian started developing software that could eliminate
duplicate customer information (de-duping)—see the small sketch below
• That enabled them to extract customer information from those siloed product systems, merge the
information into a single database, remove all the duplicates, and then send direct mail to subsets of the
customers in the database
• Companies such as Reader's Digest and several of the larger financial services firms were early champions
of this new kind of marketing, and they used it very effectively
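A toy sketch of the de-duping idea described above, assuming a hypothetical customer table pulled from two product systems. The normalization rule here (match on lowercased email) is deliberately simple; commercial tools like those from Acxiom or Experian use far more sophisticated matching.

```python
import pandas as pd

# Hypothetical customer records extracted from two siloed product systems.
customers = pd.DataFrame({
    "name":    ["John A. Smith", "john smith", "Mary Jones"],
    "email":   ["JSMITH@MAIL.COM", "jsmith@mail.com", "mjones@mail.com"],
    "product": ["checking", "mortgage", "checking"],
})

# Normalize the field used for matching, then collapse duplicates,
# keeping one record per matched customer.
customers["match_key"] = customers["email"].str.strip().str.lower()
deduped = (customers
           .sort_values("name")
           .drop_duplicates(subset="match_key", keep="first"))
print(deduped[["name", "email", "product"]])
```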
• Database Marketing (1980s)
– marketers developed the ability to run reports on the information in their
databases
– The reports gave them better and deeper insights into the buying habits and
preferences of customers
– Companies began storing contact history, which enabled them to determine
which kinds of direct mail marketing campaigns generated the most responses
and which kinds of customers were more likely to respond
– Telemarketing became popular when marketers figured out how to feed
information extracted from customer databases to call centers
• Database Marketing (1990s)
– email entered the picture, and marketers quickly saw opportunities for reaching customers via the
Internet and the World Wide Web
– As the dot-com boom accelerated, marketers rapidly adopted new technologies for pulling data from the
web and using it to fuel online marketing campaigns
– The Web also enabled marketers to launch campaigns based on behavioral data, and the promise of “real-
time marketing” seemed just over the horizon
– Software vendors also saw an opportunity to sell marketing solutions directly to marketers
– The software vendors saw a new market and began developing products that didn’t require highly
specialized knowledge to generate acceptable ROIs
– Technology started to help marketers collect more relevant data to make informed decisions
– Companies began bringing these solutions in-house
• They began using new software for campaign management, reporting, and even predictive analytics
• This enabled them to bring in lots of different data from different places
• They started to see detailed data based on a product, and detailed transactional data (in the case of banks, they
saw detailed data about individual transactions within bank accounts)
• Signs of scale emerged – databases got larger and larger

– As technology evolved to absorb greater volumes of data, the costs of data environments started
to come down, and companies began collecting even more transactional data
– Today, many companies have the capability to store and analyze data generated from every
search you run on their websites, every article you read, and every product you look at
– By combining that specific data with anonymous data from external sources, they can predict
your likely behavior with astonishing accuracy
– It might sound creepy, but it's also helping keep us safe from criminals and terrorists. “A lot of
the technology used by the CIA and other security agencies evolved through database
marketing,” says Doyle.
– Some of the tools originally developed for database marketers are now used to detect fraud and
prevent money-laundering
Big Data and the New School of Marketing
• Dan Springer, CEO of Responsys, defines the new school of marketing
– “Today's consumers have changed. They've put down the newspaper, they fast-
forward through TV commercials, and they junk unsolicited email. Why?
– They have new options that better fit their digital lifestyle.
– They can choose which marketing messages they receive, when, where, and
from whom. They prefer marketers who talk with them, not at them.
– New School marketers deliver what today's consumers want: relevant
interactive communication across the digital power channels: email, mobile,
social, display, and the web”
• Consumers Have Changed. So Must Marketers
– The lifecycle model is still the best way to approach marketing
– But the linear concept of a succession of lifecycle “stages” is no longer a useful framework for planning marketing
campaigns and programs
– Because today's new cross-channel customer is online, offline, captivated, distracted, satisfied, annoyed,
vocal, or quiet at any given moment
– Marketing to today's cross-channel consumer demands a more nimble, holistic approach, one in which
customer behavior and preference data determine the content and timing—and delivery channel—of
marketing messages
– Marketing campaigns should be cohesive: content should be versioned and distributable across multiple
channels.
– Marketers can still drive conversions and revenue, based on their own needs, with targeted campaigns sent
manually, but more of their marketing should be driven by—and sent via preferred channels in response
to—individual customer behaviors and events
– How can marketers plan for that?
• Permission, integration, and automation are the keys
• Along with a more practical lifecycle model designed to make every acquisition marketing investment
result in conversion, after conversion, after conversion
The Right Approach: Cross-Channel Lifecycle
Marketing
• starts with the capture of customer permission, contact information, and
preferences for multiple channels
• It also requires marketers to have the right integrated marketing and customer
information systems, so that
(1) they can have complete understanding of customers through stated
preferences and observed behavior at any given time; and
(2) they can automate and optimize their programs and processes throughout
the customer lifecycle
• Once marketers have that, they need a practical framework for planning marketing
activities
Loops that guide marketing strategies and tactics in the Cross-Channel Lifecycle
Marketing approach: conversion, repurchase, stickiness, win-back and re-permission
Social and Affiliate Marketing
• Word-of-mouth marketing has been the most powerful form of marketing since before the Internet
– Eg: the Avon Lady and the neighbor organizing Tupperware parties made buying plastics acceptable back in the 1940s
• The concept of affiliate marketing, or pay for performance marketing on the Internet is credited to William J. Tobin, the
founder of PC Flowers & Gifts as he was granted patents around the concept of an online business rewarding another site (an
affiliate site) for each referred transaction or purchase
• Amazon.com launched its own affiliate program in 1996 and middleman affiliate networks like Linkshare and Commission
Junction emerged preceding the 1990s Internet boom, providing the tools and technology to allow any brand to put affiliate
marketing practices to use
• Today, industry analysts estimate affiliate marketing to be a $3 billion industry
• By 2012, using the social web (Facebook, Twitter, and Tumblr), any consumer can do affiliate marketing
• Couponmountain.com and other well-known affiliate sites generate multimillion-dollar yearly revenues for driving transactions
for the merchants they promote
• Most people trust a recommendation from the people they know
• Professional affiliate marketing sites provide the aggregation of many merchant offers on one centralized site, but they completely
lack the concept of trusted-source recommendations
• Using the backbone and publication tools created by companies like Facebook and Twitter, brands will soon find that
rewarding their own consumers for their advocacy is a required piece of their overall digital marketing mix
Empowering Marketing with Social Intelligence
• Niv Singer, Chief Technology Officer at Tracx, a social media intelligence software provider, speaks about
the big data challenges faced in the social media realm and how it's impacting the way business is
done today—and in the future
– As a result of the growing popularity and use of social media, an immense amount of “big data” is
created, and it continues to grow exponentially
– Millions of status updates, blog posts, photographs, and videos are shared every second
– Successful organizations will not only need to identify the information relevant to their company
and products—but also be able to dissect it, make sense of it, and respond to it in real time and
on a continuous basis, drawing business intelligence that helps predict future customer behavior
• The real challenge is to unify social profiles for a single user who may be using different names or
handles on each of their social networks
• So an algorithm combs through key factors (content of posts, location, etc.), among others, to
provide robust identity unification (a toy matching sketch follows below)
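Tracx's actual unification algorithm is proprietary. Purely as an illustration of the idea, here is a toy sketch that scores how likely two profiles from different networks belong to the same person, using name similarity and location agreement; the factors, weights, and threshold are invented for this example.

```python
from difflib import SequenceMatcher

def unify_score(profile_a, profile_b):
    """Toy similarity score between two social profiles (0..1)."""
    name_sim = SequenceMatcher(None,
                               profile_a["name"].lower(),
                               profile_b["name"].lower()).ratio()
    loc_match = 1.0 if profile_a["location"] == profile_b["location"] else 0.0
    # Invented weights: names matter more than location.
    return 0.7 * name_sim + 0.3 * loc_match

twitter  = {"name": "Jon Q. Public",   "location": "Bangalore"}
facebook = {"name": "Jonathan Public", "location": "Bangalore"}

score = unify_score(twitter, facebook)
print(f"match score = {score:.2f}")
if score > 0.8:  # invented threshold
    print("treat as the same person for analytics")
```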
• The client should have the flexibility to sort influencers by any of these characteristics:
– Very intelligent software is required to parse all that social data to define things like the sentiment of a post
– A system is needed that also learns over time what that sentiment means to a specific client or brand
– It can then represent that data with increased levels of accuracy
– This provides clients a way to “train” a social platform to measure sentiment more closely to the way they would be doing
it manually themselves (a minimal sentiment-training sketch follows below)
– It’s important for brands to be able to understand the demographic information of the individual driving social discussions
around their brand such as gender, age, and geography so they can better understand their customers and better target
campaigns and programs based on that knowledge
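A minimal sketch of what "training" sentiment to a specific brand could look like, using a generic scikit-learn text classifier on a few hand-labeled posts. The examples and the choice of model are assumptions made for illustration; they do not describe Tracx's actual implementation.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# A handful of posts hand-labeled by the brand's analysts (invented examples).
posts = [
    "love the new phone, battery lasts forever",
    "worst customer service I have ever had",
    "the update bricked my device, furious",
    "great deal on the tablet this week",
]
labels = ["positive", "negative", "negative", "positive"]

# Bag-of-words + logistic regression; retrain as analysts correct more labels,
# so the model drifts toward the brand's own definition of sentiment.
model = make_pipeline(CountVectorizer(), LogisticRegression())
model.fit(posts, labels)

print(model.predict(["the battery on this phone is great"]))  # likely 'positive'
```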
• In terms of geography
– Social check-in data from Facebook, Foursquare, and similar social sites and applications is combined over maps to show
brands, down to the country, state/region, and street level, where conversations are happening about their
brand, products, or competitors
– This capability enables marketers to provide better service or push coupons in real time, right when someone states a need,
offering value within steps of where they already are, which has immense potential to drive sales and brand loyalty
• Singer believes that the real power comes in mining social data for business intelligence, not only for marketing, but also for
customer support and sales
• As a result, they've built their system as a data management system that just happens to be focused on managing
unstructured social data, but which can easily integrate with other kinds of data sets too
• Eg1: Integration with CRM systems like Salesforce.com and Microsoft Dynamics to enable
companies to get a more holistic view of what's going on with their clients by supplementing existing
data sets (which are more static in nature) with the social data set (which is more dynamic and real-
time)
• Eg2: Integration with popular analytics platforms like Google Analytics and Omniture, so marketers
can see a direct correlation and payoff of social campaigns through improved social sentiment or an
increase in social conversations around their brand or product
• Social media is the world's largest and purest focus group
• Marketers now have the opportunity to mine social conversations for purchase intent and brand lift
through Big Data
• So, marketers can communicate with consumers when they are emotionally engaged, regardless of
the channel
• Since this data is captured in real-time, Big Data is coercing marketing organizations into moving
more quickly to optimize media mix and message
• Because this data sheds light on all aspects of consumer behavior, companies are aligning data to insight to
prescription across channels, across media, and across the path to purchase
Fraud and Big Data
• Fraud is intentional deception made for personal gain or to damage another individual
• One of the most common forms of fraudulent activity is credit card fraud
• The credit card fraud rate is increasing across countries
• As per Javelin's research, “8th Annual Card Issuers' Safety Scorecard: Proliferation of Alerts Lead to
Quicker Detection Time and Lower Fraud Costs,” credit card fraud incidence increased 87 percent in
2011, culminating in significant aggregate fraud losses
• Despite the significant increase in incidence, total cost of credit card fraud increased only 20 percent
• The comparatively small rise in total cost can be attributed to an increasing sophistication of fraud
detection mechanisms
One approach to solve fraud with Big Data
• According to the Capgemini Financial Services Team:
– Even though fraud detection is improving, the rate of incidents is rising
– This means banks need more proactive approaches to prevent fraud
– While issuers' investments in detection and resolution have resulted in an influx of customer-
facing tools and falling average detection times among credit card fraud victims, the rising
incidence rate indicates that credit card issuers should prioritize preventing fraud
• Social media and mobile phones are forming the new frontiers for fraud
• Social networks are a great resource for fraudsters: consumers are still sharing a significant amount
of the personal information frequently used to authenticate a consumer's identity. Those with public
profiles (those visible to everyone) are more likely to expose this personal information.
• In order to prevent the fraud, credit card transactions are monitored and checked in near real time. If
the checks identify pattern inconsistencies and suspicious activity, the transaction is identified for
review and escalation.
• The Capgemini Financial Services team believes that due to the nature of data streams and
processing required, Big Data technologies provide an optimal technology solution based on the
following three Vs:
– 1. High volume. Years of customer records and transactions (150 billion records per year)
– 2. High velocity. Dynamic transactions and social media information
– 3. High variety. Social media plus other unstructured data such as customer emails, call center
conversations, as well as transactional structured data
• Capgemini's new fraud Big Data initiative focuses on flagging suspicious credit card transactions
to prevent fraud in near real time via multi-attribute monitoring
• Real-time inputs involving transaction data and customer records are monitored via validity checks
and detection rules
• Pattern recognition is performed against the data to score and weight individual transactions across
each of the rules and scoring dimensions
• A cumulative score is then calculated for each transaction record and compared against thresholds to
decide if the transaction is potentially suspicious or not (a toy scoring sketch follows below)
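A toy sketch of the multi-attribute scoring step described above: each detection rule contributes a weighted score, the scores are summed per transaction, and the total is compared to a threshold. The rules, weights, and threshold are invented for illustration and are not Capgemini's actual values.

```python
# Each rule returns True if the transaction looks suspicious on that attribute.
RULES = [
    ("amount_far_above_average", lambda t: t["amount"] > 5 * t["avg_amount"], 0.4),
    ("unusual_country",          lambda t: t["country"] != t["home_country"], 0.3),
    ("rapid_repeat_purchase",    lambda t: t["seconds_since_last_txn"] < 60,  0.3),
]
THRESHOLD = 0.6  # invented cut-off

def score_transaction(txn):
    """Cumulative weighted score across all matching rules."""
    return sum(weight for _, rule, weight in RULES if rule(txn))

txn = {"amount": 2400.0, "avg_amount": 180.0, "country": "RO",
       "home_country": "IN", "seconds_since_last_txn": 35}

score = score_transaction(txn)
print(score, "-> flag for review" if score >= THRESHOLD else "-> pass")
```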
• Capgemini uses an open-source tool named Elastic Search, which is a distributed, free/open-
source search server based on Apache Lucene
– It can be used to search all kinds of documents in near real time
– They use the tool to index new transactions
– Indexed historical data sets can be used in conjunction with real-time data to identify
deviations from typical payment patterns
– This Big Data component allows overall historical patterns to be compared and contrasted, and allows the
number of attributes and characteristics about consumer behavior to be very wide, with little impact on
overall performance.
– Once the transaction data has been processed, the percolator query then identifies new
transactions that match raised (suspicious) profiles
– Percolator is a system for incrementally processing updates to large data sets.
– Percolator is the technology that Google used in building the index— that links keywords and URLs—used to
answer searches on the Google page
– Percolator query can handle both structured and unstructured data.
– This provides scalability to the event processing framework, and allows specific suspicious transactions to
be enriched with additional unstructured information—phone location/geospatial records, customer travel
schedules, and so on
– This ability to enrich the transaction further can reduce false positives and improve the experience of the
customer while redirecting fraud efforts to actual instances of suspicious activity (an illustrative percolator sketch follows below)
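Purely as an illustration of how a percolator-style check can work, here is a sketch against Elasticsearch's REST API using its percolate query. It assumes a recent Elasticsearch version, a node at localhost:9200, and invented index, rule, and field names; it is not Capgemini's actual configuration.

```python
import requests

ES = "http://localhost:9200"

# 1) Create an index whose "query" field holds stored (percolator) queries,
#    alongside mappings for the fields those queries refer to.
requests.put(f"{ES}/fraud-rules", json={
    "mappings": {"properties": {
        "query":   {"type": "percolator"},
        "amount":  {"type": "double"},
        "country": {"type": "keyword"},
    }}
})

# 2) Register a rule: flag transactions over 2000 made outside the home country.
requests.put(f"{ES}/fraud-rules/_doc/high-foreign-amount?refresh=true", json={
    "query": {"bool": {"must": [
        {"range": {"amount": {"gt": 2000}}},
        {"bool": {"must_not": {"term": {"country": "IN"}}}},
    ]}}
})

# 3) Percolate an incoming transaction: which stored rules does it match?
resp = requests.get(f"{ES}/fraud-rules/_search", json={
    "query": {"percolate": {"field": "query",
                            "document": {"amount": 2400.0, "country": "RO"}}}
})
for hit in resp.json()["hits"]["hits"]:
    print("matched rule:", hit["_id"])
```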
Another approach to solving fraud with Big Data -
social network analysis (SNA)
• SNA is the precise analysis of social networks
• Social network analysis views social relationships in terms of networks made up of nodes (individuals or accounts) and the ties between them

• SNA can reveal all the individuals involved in fraudulent activity, from perpetrators to their associates, and their relationships and behaviors,
helping to identify a “bust out” fraud case
• According to a recent article in bankersonline.com posted by Experian, “bust out” is a hybrid credit and fraud problem and the scheme is
typically defined by the following behavior:
– The account in question is delinquent or charged-off
– The balance is close to or over the limit
– One or more payments have been returned
– The customer cannot be located
– The above conditions exist with more than one account and/or financial institution.

• There are some Big Data solutions in the market, like SAS's SNA solution, which helps institutions go beyond individual and account views
to analyze all related activities and relationships at a network dimension.
• The network dimension allows you to visualize social networks and see previously hidden connections and relationships, which potentially could
be a group of fraudsters.
• Obviously there are huge reams of data involved behind the scenes, but the key to SNA solutions like SAS's is the visualization techniques that let
users easily engage and take action (a small network-analysis sketch follows below)
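SAS's SNA product is commercial and its internals are not described in the slides. As a generic illustration of the network view, the sketch below uses the open-source networkx library to link accounts by shared attributes (the accounts and shared phone/address/device edges are invented) and then pulls out the connected group around a charged-off account.

```python
import networkx as nx

G = nx.Graph()

# Invented example: edges connect accounts that share a phone, address, or device.
G.add_edges_from([
    ("acct_A", "acct_B", {"link": "same_phone"}),
    ("acct_B", "acct_C", {"link": "same_address"}),
    ("acct_D", "acct_E", {"link": "same_device"}),
])

charged_off = {"acct_B"}  # accounts already known to be delinquent/charged-off

# Every account connected (directly or indirectly) to a charged-off account
# is a candidate for bust-out review.
for component in nx.connected_components(G):
    if component & charged_off:
        print("review network:", sorted(component))
```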
Risk and Big Data
• The two most common types of risk management: credit risk management and market risk
management
• The tactics for risk professionals typically include
– avoiding risk
– reducing the negative effect or probability of risk
– or accepting some or all of the potential consequences in exchange for a potential upside gain
• Credit risk analytics - focus on past credit behaviors to predict the likelihood that a borrower will
default on any type of debt by failing to make payments they are obligated to make. For example, “Is
this person likely to default on their $300,000 mortgage?” (a minimal default-scoring sketch follows below)
• Market risk analytics - focus on understanding the likelihood that the value of a portfolio will
decrease due to the change in stock prices, interest rates, foreign exchange rates, and commodity
prices. For example, “Should we sell this holding if the price drops another 10 percent?”
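A minimal sketch of the credit-risk side, assuming a hypothetical table of past borrowers with a default flag; a simple logistic regression estimates the probability that a new applicant will default. The features, data, and model choice are illustrative assumptions, not a description of any lender's actual scorecard.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical history: [credit utilization, missed payments last year, years on book]
X = np.array([[0.9, 3, 1], [0.2, 0, 6], [0.7, 2, 2],
              [0.1, 0, 8], [0.8, 4, 1], [0.3, 1, 5]])
y = np.array([1, 0, 1, 0, 1, 0])  # 1 = defaulted, 0 = repaid

model = LogisticRegression().fit(X, y)

# Score a new applicant: probability of default drives the credit decision.
applicant = np.array([[0.6, 1, 3]])
p_default = model.predict_proba(applicant)[0, 1]
print(f"estimated probability of default: {p_default:.2f}")
```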
Credit Risk Management
• Credit risk management is a critical function that spans a diversity of businesses across a wide range
of industries
• Ori Peled is the American Product Leader for MasterCard Advisors Risk & Marketing Solutions. He
brings several years of information services experience to his current role with MasterCard, having
served in various product development capacities at Dun & Bradstreet. Peled shares his insight
with us on credit risk:
– Whether you're a small B2B regional plastics manufacturer or a large global consumer financial
institution, the underlying credit risk principles are essentially the same: driving the business
using the optimal balance of risk and reward
• Traditionally, credit risk management was rooted in the philosophy of minimizing losses
• Credit risk professionals and business leaders came to understand that there are acceptable levels of
risk that can boost profitability beyond what would normally have been achieved by simply focusing
on avoiding write-offs
• The shift to the more profitable credit risk management approach owes in large part to an ever-
expanding availability of data, tools, and advanced analytics

• Credit risk professionals are stakeholders in key decisions, addressing everything

– from finding new and profitable customers to maintaining and growing relationships with existing
customers
– Maximizing the risk and reward opportunities requires that risk managers understand their
customer portfolio, allowing them to leverage a consistent credit approach while acknowledging
that you can't treat every customer the same
• As businesses grow, what starts out as a manual and judgmental process of making credit decisions
gives way to a more structured and increasingly automated process in which data-driven decisions
become the core
• These are decisions that impact not only revenue but also operational costs, like the staffing levels of customer
support representatives or collections agents
• The future of credit risk management will continue to change as we leverage new data sources emanating from
a highly digital and mobile world
• Eg: Social media and cell phone usage data are opening up new opportunities to uncover customer behavior
insights that can be used for credit decisioning
• This is especially relevant in the parts of the world where a majority of the population is unbanked and
traditional bureau data is unavailable.
• There are four critical parts of the typical credit risk framework: planning, customer acquisition, account
management, and collections
• All four parts are handled in unique ways through the use of Big Data.
