Big Data
INTRODUCTION
1.1) INTRODUCTION
Big data is a broad term for data sets so large or complex that traditional data processing applications are
inadequate. Challenges include analysis, capture, data curation, search, sharing, storage, transfer, visualization,
and information privacy. The term often refers simply to the use of predictive analytics or certain other
advanced methods to extract value from data, and seldom to a particular size of data set. Accuracy in big data
may lead to more confident decision making, and better decisions can mean greater operational efficiency, cost
reductions and reduced risk.
Analysis of data sets can find new correlations, to "spot business trends, prevent diseases, combat crime and
so on." Scientists, practitioners of media and advertising and governments alike regularly meet difficulties
with large data sets in areas including Internet search, finance and business informatics. Scientists encounter
limitations in e-Science work, including meteorology, genomics, connectomics, complex physics simulations,
and biological and environmental research.
Data sets grow in part because they are increasingly being gathered by cheap and numerous information-
sensing mobile devices, aerial (remote sensing), software logs, cameras, microphones, radio-frequency
identification (RFID) readers, and wireless sensor networks. The world's technological per-capita capacity to
store information has roughly doubled every 40 months since the 1980s; as of 2012, 2.5 exabytes
(2.5×10^18 bytes) of data were created every day. The challenge for large enterprises is determining who should
own big data initiatives that straddle the entire organization.
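To make the growth rate above concrete, the following Python sketch converts a fixed doubling period into an equivalent annual growth rate: doubling every 40 months means capacity is multiplied by 2^(12/40) each year. The function names are hypothetical and chosen only for illustration.

```python
import math

def annual_growth_from_doubling(doubling_months: float) -> float:
    """Annual growth rate implied by a fixed doubling period (in months)."""
    # Capacity doubles every `doubling_months`, so over 12 months it is
    # multiplied by 2 ** (12 / doubling_months).
    return 2 ** (12 / doubling_months) - 1

def years_to_multiply(factor: float, doubling_months: float) -> float:
    """Years needed to grow by `factor` at the given doubling period."""
    # Number of doublings needed (log2 of the factor) times the doubling period in years.
    return math.log2(factor) * doubling_months / 12

rate = annual_growth_from_doubling(40)
print(f"{rate:.1%} per year")                                      # 23.1% per year
print(f"{years_to_multiply(1000, 40):.1f} years for a 1000x increase")  # 33.2 years for a 1000x increase
```

In other words, a 40-month doubling period corresponds to roughly 23% growth per year, sustained over decades.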
1.2) Definition
Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture,
curate, manage, and process data within a tolerable elapsed time. Big data "size" is a constantly moving
target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data is a set of techniques
and technologies that require new forms of integration to uncover large hidden value from datasets that are
diverse, complex, and of a massive scale.
In a 2001 research report and related lectures, META Group (now Gartner) analyst Doug Laney defined data
growth challenges and opportunities as being three-dimensional, i.e. increasing volume, velocity, and variety.
Gartner, and now much of the industry, continue to use this "3Vs" model for describing big data. In 2012,
Gartner updated its definition as follows: "Big data is high volume, high velocity, and/or high variety
information assets that require new forms of processing to enable enhanced decision making, insight discovery
and process optimization." Some organizations add a fourth V, "Veracity", to describe it.
While Gartner’s definition (the 3Vs) is still widely used, the growing maturity of the concept draws a sharper
distinction between big data and Business Intelligence with regard to data and its use:
Business Intelligence uses descriptive statistics on data with high information density to measure things,
detect trends, etc.; big data uses inductive statistics and concepts from nonlinear system identification to infer
laws from large sets of data with low information density, revealing relationships and dependencies and
predicting outcomes and behaviours.
A more recent, consensual definition states that "Big Data represents the Information assets characterized by
such a High Volume, Velocity and Variety to require specific Technology and Analytical Methods for its
transformation into Value".
1.3) Characteristics
• Volume – The quantity of data that is generated is very important in this context. It is the size of the
data which determines the value and potential of the data under consideration and whether it can be
considered Big Data or not. The name ‘Big Data’ itself contains a term related to size, hence this
characteristic.
• Variety - The next aspect of Big Data is its variety: the type or category of data involved (for example,
structured records, text, images or video). Knowing which category the data belongs to is essential for data
analysts, because it helps the people who analyse and work closely with the data to use it effectively to their
advantage, which upholds the importance of Big Data.
• Velocity - The term ‘velocity’ in this context refers to the speed at which data is generated and processed
to meet the demands and challenges that lie ahead in the path of growth and development.
• Variability - This refers to the inconsistency the data can show at times, which hampers the process of
handling and managing the data effectively. It can therefore be a problem for those who analyse the data.
• Veracity - The quality of the data being captured can vary greatly. Accuracy of analysis depends on
the veracity of the source data.
• Complexity - Data management can become a very complex process, especially when large volumes
of data come from multiple sources. These data need to be linked, connected and correlated in order to
be able to grasp the information that is supposed to be conveyed by these data. This situation is
therefore termed the ‘complexity’ of Big Data.
Everything we do, online or offline, is a data source. Our ability and methods to measure and collect data
improve as technology changes, and we study behavioural trends to understand the world. But the problem we
face today is that technology is very extensive and a huge amount of data is available. Organizing, researching,
and understanding this information has become more complex than ever, as it is full of numbers, facts,
proportions, and unlimited perceptions. Big data has existed in the digital world for many years, yet,
unfortunately, it remains a concept that many people do not understand at all.
Big Data is a very common term today, and it has wide application in business. It can be used to solve many
problems, support crucial decisions, and much more. In this paper we focus on the influence of big data on one
aspect of business: e-commerce. Big data contains information about e-commerce activity and offers insight
to e-commerce businesses. Owners can capture information from a large volume of data and use it to study
trends, attract more customers, and streamline operations for success. E-commerce involves both business and
technology and requires a flow of information; collected on a large scale, this information becomes big data,
so e-commerce players need big data processing systems to run their businesses profitably. E-commerce
businesses can use big data to analyse sales, track faulty transactions, forecast supply and demand, and much
more.
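As a rough illustration of two of these uses, tracking faulty transactions and forecasting demand, the following Python sketch separates failed orders from completed ones and forecasts next-day demand with a simple moving average. The order records, their field layout, and the function names are all hypothetical, and a moving average is only one of many possible forecasting methods.

```python
from collections import defaultdict

# Hypothetical order records: (day, sku, units, status).
orders = [
    (1, "A", 3, "completed"), (1, "B", 1, "failed"),
    (2, "A", 5, "completed"), (3, "A", 4, "completed"),
    (3, "A", 2, "failed"),    (4, "A", 6, "completed"),
]

def summarize(orders):
    """Return units sold per day (completed orders only) and the failed orders."""
    units_per_day = defaultdict(int)
    failed = []
    for day, sku, units, status in orders:
        if status == "completed":
            units_per_day[day] += units
        else:
            failed.append((day, sku, units))  # candidates for investigation
    return units_per_day, failed

def moving_average_forecast(daily, window=3):
    """Forecast next-day demand as the mean of the last `window` recorded days."""
    days = sorted(daily)
    if len(days) < window:
        raise ValueError("need at least `window` days of history")
    return sum(daily[d] for d in days[-window:]) / window

daily, failed = summarize(orders)
print(dict(daily))                     # {1: 3, 2: 5, 3: 4, 4: 6}
print(failed)                          # [(1, 'B', 1), (3, 'A', 2)]
print(moving_average_forecast(daily))  # 5.0
```

Days with no completed orders are simply absent from the daily totals in this sketch; a real pipeline would fill them with zero before averaging.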
CHAPTER 2
Five aspects of the digital economy are converging to create the Big Data phenomenon:
As a result, an unprecedented amount of digital data is being created all around the world.
2. To collect customer data that can be used to improve sales and service and for targeted advertising
The phrase Big Data appears a lot in the media these days. It is used to describe a statistical approach to
genetics or epidemiological projects, the sequencing of DNA, and to explain the new search and storage
technologies that allow companies to scan different types of online media for “sentiment” data. Many people,
hearing the phrase, think of Google, or possibly the NSA, and the implications of collecting and selling
personal data on civil liberties and privacy. Others see it manifested in new technologies like Hadoop, or cloud
computing, or the coming Internet of Things. Some say Big Data is revolutionary; others that it is all
overblown, an unnecessarily capitalized sobriquet for injecting excitement into what is simply the next
evolutionary phase in the advancement of information technology.
It can be argued that Big Data is all those things, and yet not limited to any one defining characteristic, making
it a “big thing” in the sense that all these various interpretations are a reflection of a larger process of
technological and economic change that is just now beginning to mature and manifest itself in what I refer to
as the Big Data-intelligence complex: a group of wealthy and influential companies and government agencies
responsible for the myriad of technological developments powering growth and innovation in the economy
and revolutionizing how we communicate, entertain ourselves, and interact with others throughout the world.
Is Big Data revolutionary and transformational? Probably so, by historical standards, although it may not really
matter whether it is a process of evolution or revolution, or even whether Big Data qualifies as a
transformational technology in the manner of electricity, the telephone, or the internal combustion engine.
That debate seems to me to be just a marketer’s way of trying to label (and glamorize) an amorphous and
rapidly changing group of technologies. But however it is described, it is important to consider where Big
Data as a broader phenomenon takes us over the next decade, because the emerging Big Data-intelligence
complex is already proving to be both rambunctious and irrepressible, introducing innovation at a rate that is
almost impossible to follow, much less regulate. That has implications not just for civil liberties and personal
privacy (which are already significant), but for the way in which our businesses, our global economy, our
laws, and even the relationships between nations, develop in the future.
1.5) THE BIG DATA ECOSYSTEM
First and most prominent is the familiar consumer technology: The Internet, e-commerce, telematics, social
media, and mobile technologies that combine to create a consumer-driven Big Data industry. This is all about
entertainment and smartphones and instant messaging. We live it and see it around us every day: the quarrels
and the mergers and the Initial Public Offerings (IPOs), the money surrounding these tools and toys, and the
billions being offered for the latest app. The consumer-driven Big Data industry is helpful, and it’s
entertaining, even distracting, and because of that, we have a tendency to trivialize it somewhat, to see it as
being not as serious or as economically important as the normal, productive economy, essential to employment
and economic growth. But the consumer-driven Big Data industry is not trivial, even if it involves Tweets,
photo sharing, and Angry Birds. This is big business, and big money, concentrating our best and brightest
minds on advertising, apps, and games and ever-more-clever ways of capturing enormous amounts of personal
data.
The actual technologies are impressive, if not technologically revolutionary. But what is more likely to be
revolutionary is that, increasingly, the companies that dominate this consumer-driven Big Data economy
(Google, Amazon, Facebook, Alibaba, Twitter, Apple, and the myriad of Internet-related start-ups that support
and feed off these companies around the world) will also dominate the industrial side of the economy in the
next decades. For the foreseeable future, economic growth, not only in the developed economies but also in the
developing world, will be determined by where these big data giants take us. Some may find it at least
unsettling, if not downright scary, to have so much of our global economic future tied to a handful of
gatekeeping technology companies.
But the consumer side of Big Data that we see every day is only one aspect of the phenomenon. While the
consumer-driven Big Data industry is roaring away, another, possibly even more important, side of Big Data is
emerging. It is the industrial side of Big Data applied to what is increasingly seen as the “old” economy. This
is because the combination of mechatronics and Internet-based technologies is transforming the collection and
analysis of business data in a much more traditional and orthodox but nonetheless important way. New self-
reporting sensors, components, and systems now can feed performance data into ever-more sophisticated
enterprise computing systems, making traditional business functions such as sales, accounting, inventory, and
logistics much more efficient. These innovative machine-based data collection and analysis technologies lie
behind the expansion of the Industrial Internet and the Internet of Things, and together are causing a parallel
deluge of digital data generation and collection.
And although the hardware may still be manufactured by “old economy” powerhouses like GE, Siemens,
or Ericsson, the companies that are likely to make the Internet of Things happen and to control it and profit
from it when it does happen will be the new and powerful young Turks like Google, Facebook, and Amazon.
That’s because their core competency is mining and analysing Big Data. Those who once worried IBM would
be their Big Brother should think again. When the battle for the Internet is over, chances are that IBM and GE
will simply provide the supportive infrastructure to help Amazon, Google, and Facebook control the digital
data flowing to and from our businesses, homes, cars, and smart phones.
Again, in themselves these industrial Internet Big Data technologies are not revolutionary. My Saab has been
alerting me to ongoing mechanical and electrical failures for six years. Predictive diagnostics are helpful,
but they still don’t repair the car. But Google may soon be driving the car for me. And Google will suggest
where I should go to have it repaired and how to get there. And the bill will be paid with the Ease app, activated
by my voice command or a nod while wearing Google Glass, after I’ve scanned the QR code through its
camera and activated my virtual Apple Pay or Bitcoin wallet to transfer the payment. And while I’m away,
Google will adjust my home’s thermostat using Nest technologies while Amazon orders the parts and
organizes the repairs, along with my groceries and my dry cleaning. And I will monitor and direct it all from
my mobile, running on Google’s Android operating system, which will allow Google to monitor and capture
all that activity, and add it to the ever-growing digital profile that it has on me, so that I can receive customized
advertising. Google will then record how I react to that coupon, or if I recommend the service centre to social
media friends through a “like” button, and then Google will follow that cookie onto the accounts of my friends
and similar coupons will appear on their Facebook sites inviting them to come to the service centre and the
digital data collection will continue and grow and grow.
This apocryphal story about an aging Saab makes a more serious point. The convergence of these three Big
Data trends (the Consumer Internet, the Industrial Internet, and the Internet of Things) begins to take on new
significance.
That is in part because, in parallel to these major Big Data trends, another powerful industry has emerged: the
digital data collection industry. It consists of the big Internet players like Google, Yahoo!, Facebook, and
Twitter, and online retailers like Amazon and Apple. It also includes most of the major online and offline retail
stores such as (in the United States) Walmart, Target, and Walgreens, which collect and sell their customers’
personal and transaction data. It also consists of hundreds of online data tracking software and service
companies that most people have never heard about but that monitor our everyday online activity, following
our digital footprints and selling the data to advertisers and employment agencies and debt collectors, and
anyone else who will pay for it. And, of course, it also includes the major advertising agencies, and the large
data aggregators like Experian, FICO, and Acxiom, which had their origins in credit reporting but now maintain
colossal databases on the personal and private details of millions of people around the world.
Together, this sometimes competing, sometimes mutually supporting, collective of data-handlers has become
a powerful, shadowy, economic force, making a fortune by interpreting and selling consumer-related data and
allowing companies to know much, much more about those consumers than they ever thought possible. And
the data collectors derive their power from the fact that they have the databases and the tools to control the
data everyone wants to get their hands on: to determine how it is distributed, who sees it, and how it is used.
They are the drivers, the manipulators, the monetizers of Big Data. They are important, in fact essential, to the
success of a Big Data economy, because they are the ones that spin raw data into the supposed gold of
customer-targeted advertising.
First, Big Data is not just about crunching numbers. Big Data is about collecting and utilizing the
unprecedented, almost inconceivable, amount of digital data now available and applying new analytical tools
to reveal new insight from that data. It is called Big Data because quantity is the key element, and it is premised
on the fact that all over the world an explosion of digital data is occurring every second and nearly everywhere.
Simply to describe this digital explosion enters territory far beyond the meagre gigabyte or terabyte thresholds
that used to impress; today we talk in terms of petabytes, exabytes, zettabytes, and yottabytes. My personal
favourite, partly because it sounds like something that the Flintstones might have ordered as a take-out, is the
brontobyte.
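The units named above are decimal multiples of the byte, each 1,000 times the previous one (1 zettabyte is 10^21 bytes, or a trillion gigabytes). The following Python sketch, using a hypothetical helper name, formats a raw byte count in the largest such unit; it stops at yottabytes, since the brontobyte is an informal name rather than an SI unit.

```python
# Decimal (SI) byte units, each 1000 times the previous one.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_bytes(n: float) -> str:
    """Format a byte count in the largest decimal unit that keeps the value >= 1."""
    i = 0
    while n >= 1000 and i < len(UNITS) - 1:
        n /= 1000   # step up to the next unit
        i += 1
    return f"{n:.1f} {UNITS[i]}"

print(human_bytes(2.5e18))  # 2.5 EB  (2.5 quintillion bytes)
print(human_bytes(1.8e21))  # 1.8 ZB  (1.8 trillion gigabytes)
```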
For most of us, these numbers don’t reveal much. I’ve worked in data management most of my life, and I have
no sense of what it means when IBM estimates that an additional 2.5 quintillion bytes of data are being
generated every day. The figure provides a good sense of the explosive nature of this growth of digital data.
Comparisons can help. For example, it is estimated that somewhere around 2011, the amount of data being
produced around the world exceeded 1.8 zettabytes (1.8 trillion gigabytes), at which point there were as many
bytes held electronically as there are stars in the universe. Or consider that the 25 petabytes of new data
entering the Internet every day is 70 times larger than the total of all the collections in the Library of
Congress. IDC estimates that the digital universe will grow by a factor of ten, from 4.4 zettabytes in 2013
to 44 zettabytes by 2020.
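A projection like IDC's implies a compound annual growth rate, (end / start)^(1 / years) - 1. The following Python sketch, with hypothetical function names, computes it: a tenfold increase from 4.4 to 44 over the seven years from 2013 to 2020 works out to roughly 39% per year.

```python
def cagr(start: float, end: float, years: float) -> float:
    """Compound annual growth rate that takes `start` to `end` in `years` years."""
    return (end / start) ** (1 / years) - 1

def project(value: float, rate: float, years: float) -> float:
    """Value after compounding at `rate` per year for `years` years."""
    return value * (1 + rate) ** years

r = cagr(4.4, 44, 2020 - 2013)
print(f"{r:.1%}")                   # 38.9%
print(f"{project(4.4, r, 7):.1f}")  # 44.0  (projecting forward recovers the endpoint)
```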
Better still, we can think in terms of transactions that we are familiar with. For example, every minute, 48
hours’ worth of new video is loaded onto YouTube. In that same 60 seconds, 34,722 “likes” are recorded on
Facebook, and 571 new web sites are created around the world. In one hour, the point-of-sale systems for
Walmart capture more than 1 million customer transactions.3 Each day there are more than 180 billion e-mails
exchanged around the world, and it has recently been announced that the Library of Congress is maintaining
a comprehensive collection of the more than 500 million Tweets sent every day, giving it an archive of more
than 180 billion Twitter messages.
More unsettling than the idea of a comprehensive Twitter archive, a single data-aggregating company,
Acxiom, now maintains a profile containing some 1,500 data points on each of nearly 190 million people.
That database accounts for nearly 126 million households in the United States, and about 500 million people
worldwide. Acxiom processes more than 50 trillion data “transactions” a year, and it is only one of
thousands of data aggregators that collect and sell personal data.
This digital torrent is not limited to just the United States and Europe; 70% of all digital data is already being
generated outside the United States, and by 2020, the Asian data market alone will be producing more digital
data than the United States and Western Europe combined. And this mass digitization process is only just
getting started; 90% of all the world’s digital data has been produced in the past two years, and the rate of data
generation is growing steadily at 50% year-on-year. That means that there will be nearly 800% more digital
data being produced and stored by 2020 than there is now.
The second important feature of Big Data is that it comes from a variety of data sources: online Internet
searches, phone recordings, GPS, social media, a car’s diagnostic systems, and thousands of other
sensors and self-reporting components that are increasingly a part of our world.
To put this in perspective, consider for a moment the amount of digital data that individuals create every day:
not just the web sites visited, or the output from Twitter accounts, or text and e-mail messages. Also include
all the data that is generated on the job through enterprise systems, presentations, and forwarded and group e-
mails. Then think about all the online activity being logged, tracked, and saved in some way.
But much of the data being produced and collected today worldwide (probably as much as 90%) emanates
from videos, Internet search tracking, customer service phone calls, and other sources of digital data that exist
only in semi-structured or unstructured formats, which makes search and retrieval using our conventional
storage, database, and business intelligence technologies much more difficult. The figure below reflects the
relative growth between structured and unstructured digital data. One of the fundamental contentions of those
who see Big Data as a unique new phenomenon is that we need to keep all the data we produce, because it is
only when we apply algorithms to a full and complete universe of a single large data set that computers can
discern new patterns or correlations that otherwise would remain invisible.
This reflects the number-crunching origins of Big Data in science and engineering, and the assumption that
the data in that complete universe of a single, large data set is clean, uncorrupted, and relevant. Obviously, if
90% of that data comes from such a wide variety of sources and in such varied formats, ensuring that we have
a usable data set for analysis is much more difficult.
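The difficulty of producing a usable data set can be illustrated with a small Python sketch that turns semi-structured records (one JSON object per line, with fields that vary from record to record) into fixed-column rows, rejecting lines that are malformed or missing required fields. The sample records, the column names, and the choice of required fields are all hypothetical.

```python
import json

# Hypothetical semi-structured input: one JSON object per line, fields vary.
raw_lines = [
    '{"user": "u1", "action": "search", "query": "shoes"}',
    '{"user": "u2", "action": "purchase", "amount": 59.9}',
    'not valid json',
    '{"action": "search"}',
]

COLUMNS = ("user", "action", "query", "amount")

def normalize(lines, required=("user", "action")):
    """Parse JSON lines into fixed-column rows; drop malformed or incomplete records."""
    rows, rejected = [], 0
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            rejected += 1          # not parseable at all
            continue
        if not isinstance(record, dict) or any(k not in record for k in required):
            rejected += 1          # parseable but missing required fields
            continue
        # Missing optional fields become None so every row has the same shape.
        rows.append(tuple(record.get(c) for c in COLUMNS))
    return rows, rejected

rows, rejected = normalize(raw_lines)
print(rows)      # [('u1', 'search', 'shoes', None), ('u2', 'purchase', None, 59.9)]
print(rejected)  # 2
```

Even in this tiny sample, half the input cannot be used as-is, which is the kind of loss the text warns about when data arrives from many sources in varied formats.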
And if we are going to deal with these large data sets, given the sheer volume of data being produced and
made available from the consumer and industrial spheres (most of which is in an unstructured format), we need
to change our conventional approach to data management.
This brings us to the third important feature of Big Data: the new tools and technologies that now allow us to
store and analyse that data so as to draw correlations and conclusions about everyday activities (customer
preferences, political positions, purchasing patterns, and personal health) in ways that weren’t possible in
the past. This is what makes Big Data different from just “more data”: the ability to apply
sophisticated algorithms and powerful computers to large data sets to reveal correlations and insight previously
inaccessible through conventional data warehousing or Business Intelligence tools.
These Big Data tools consist broadly of new storage systems (mostly cloud computing) and new search and
analytical tools such as Hadoop and other MapReduce-type technologies that allow storage and analysis of
massive amounts of data from many different formats. Technologies that had their origins in the enormously
powerful search engines Yahoo! and Google have revolutionized the way we search the Internet. We look at
all these things more carefully throughout the book, but the important thing to note is that for the first time
these types of technologies for collection, storage, search, and analysis are becoming democratized, that is,
made available to organizations of any size through a wide variety of cloud-based offerings and enterprise software.
Part of the reason that the Big Data phenomenon has captured the imagination of the business world is because
now almost anyone can get a piece of the Big Data action.
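The MapReduce model mentioned above splits work into a map step that emits key/value pairs, a shuffle that groups the pairs by key, and a reduce step that combines each group. The following Python sketch simulates these phases in memory for a word count; it only illustrates the programming model and does not use Hadoop's actual API, which distributes the same phases across a cluster. The function names and input splits are hypothetical.

```python
from collections import defaultdict
from itertools import chain

def map_phase(document: str):
    """Emit (word, 1) pairs for one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle_phase(pairs):
    """Group intermediate values by key, as the framework does between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Combine all values for each key into a single count."""
    return {key: sum(values) for key, values in groups.items()}

splits = ["big data big ideas", "Data at scale"]
intermediate = chain.from_iterable(map_phase(s) for s in splits)
counts = reduce_phase(shuffle_phase(intermediate))
print(counts)  # {'big': 2, 'data': 2, 'ideas': 1, 'at': 1, 'scale': 1}
```

Because each split is mapped independently and each key is reduced independently, both phases can run in parallel on many machines, which is what lets this style of processing scale to very large data sets.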
CHAPTER 3
Big Data and the Battle for Control of the Consumer Internet
Consider what it was like in the early 1990s, just 25 short years ago. Back then, the information technology
marketplace was still almost exclusively the domain of the business world. There were still only a few token
rivals to IBM in the mainframe marketplace (ICL, Amdahl, Olivetti), mostly small, nationally sponsored rivals
to Big Blue. The IT world consisted mostly of mainframes and batch work; companies had only begun,
reluctantly, to move toward distributed computing. Most importantly, data management (access to digitized
data) was still pretty much exclusively in the hands of the hardware and software giants. Even in the early
1990s, most of the PCs in the world were owned and used by businesses. And even though most computing
involved large companies and large mainframe computers, data storage and memory were still infinitesimally
small by today’s standards, mostly recorded on reels of tape. In 1986, 99% of all data storage was analogue.
Four years later, only 3% of the world’s information was kept on emerging digital technologies like optical
disks or hard drives. 1990 was also the year that Tim Berners-Lee invented something he called a networked
hypertext system, or the World Wide Web, for government and academia. The public wouldn’t have access to
it for another year, and the nascent Internet then bumped along slowly for several years, impeded by dial-up
connections running at 56 kilobits per second, bland web sites, and limited bandwidth. Apple made personal
computers, but no one could have imagined the iPod, much less the iPad or smartphone. Google wouldn’t even
register its domain name for another seven years. No one had even dreamt of technologies like Facebook or
Twitter (Mark Zuckerberg was six years old). Even as late as 2000, for most people the Internet was still just
a frustrating curiosity. Discussions about the Internet revolution at that time centred on how we were moving
from Web 1.0 (static web pages) to Web 2.0 (interactive web pages).
On the consumer media side in the early 1990s, we had audiocassettes for music and VHS format VCRs for
films. Almost no one had a mobile phone, and those who did carried them around with heavy battery packs.
Mostly driven by an innovative market in the Nordic countries, they were cumbersome first-generation (1G)
systems based on analogue technology.
By 2000, with the boom in personal computers and the rise of Apple, Dell, HP, Compaq, and Acer, we began
to see the rapid commercialization of digital technology, and for the first time non-business users (people in
their homes) began to take control of the creation and ownership of digitized data. IT was shifting away from the
impersonal world of business to the personal world of the “me-machine” technology that could benefit
individual consumers. In a matter of a few short years, the combined take-up of home-based PCs, music CDs,
and the digital camera all combined to drive home digital storage away from the original floppy discs and into
the improved 3½-inch disks (which could hold up to a mind-boggling 20 MB). These would soon give way to
DVDs; hard, flash, and zip drives; and memory cards that captured and stored personal digital data in quantities
inconceivable only a few years before. By 2002, digital storage had surpassed analogue storage for the first time,
mostly driven by digital data created by non-business users.
Flash forward now to 2015, and the world looks very different. The Internet, with its high-capacity fibre-optic
cables, routers, and data centres, has expanded into the remarkable global, public, and private network of
networks that it is today. Three billion people are now on the Internet worldwide, and more than seven billion
people use mobile phones. Facebook boasts 1.2 billion users worldwide. And although IT and the Internet are
a core feature of most businesses today, an enormous market for IT, mobile, telematics, and Big Data-related
products and services in 2015 lies not only with businesses but also with consumers (old, young, married,
single, rich, poor, rural, and urban) all around the world who are surfing the Web; watching streaming TV
programs, films, and videos; playing games; emailing; texting; posting photos; and buying things online. All
this, increasingly, is being done on a mobile device like a tablet or a smartphone. That is a different paradigm
from 1990, but it is also a different paradigm from 2000, or for that matter, 2010. If current rates of storage
improvement continue, a micro-memory card in 2050 could conceivably have the storage equivalent of three
times the brain capacity of the entire human race.
Of the once-dominant IT manufacturers, only IBM survived relatively unscathed among the mainframe
manufacturers of the day, and it is now a global services company mainly focusing on analytics and cloud-
based applications and storage (to a large extent in support of consumer Big Data). Once feared because it
was so large and powerful, IBM is, like the rest of the business-focused IT and software houses, in some ways
only a bit player, important and supportive, but following the major Internet technology companies on
the ever-expanding consumer Internet side of the market. Not even the PC makers, such as Dell or HP, which
for a fleeting moment potentially owned the gateways (the user interface) to the Internet, managed to become
dominant players in the consumer Internet marketplace.
This is in part because the consumer Internet and the data that flows over it are not fundamentally about
sophisticated technology hardware or software, or even networks. And, as we’ll see, the Internet is less and
less about PCs. All these groups essentially missed the train on the consumer Internet side, and for the most
part remain focused on the industrial Internet, with its occasional intersections and overlaps with the consumer
data world through the Internet of Things.
To get into the world of Facebook and Twitter, with IPOs worth billions of dollars, companies need to have a
unique way of accessing and controlling Big Data on the consumer side. That’s where almost all the
commercial momentum lies, and that’s why the big, traditional hardware and software services groups are
finding it difficult to elbow their way back into the mainstream of that market. For many, the best that they
can hope for is to provide the tools necessary to help store and analyse consumer-derived digital data. In some
ways, it’s a real comedown for the once mighty giants of IT.
And that’s also why, for the most part, when the press, politicians, businessmen, Wall Street investors, and
venture capitalists talk about Big Data, they are talking about big consumer data, not big business data.
So, if the Big Data market and the consumer Internet do not belong to IBM or SAP or Oracle, who does it
belong to? The key rivals for control of the consumer Internet today are just a handful of innovative, popular,
well-run companies, names we have all come to know:
• With 215 million active account holders, and sales growing at around 20% a year,2 Amazon has
become the largest online retailer in the world. Amazon also produces mobile devices, including the
Kindle e-readers and the Kindle Fire tablet, has begun creating original video content, and provides
online payment services through its “login and pay” system and Simple Pay, a PayPal-like service that
allows consumers to use their Amazon account platform to pay for purchases on other web sites. It
competes with Netflix and Hulu in streaming video through the Amazon Prime offering and Fire TV,
an Android-powered instant video streaming and gaming set-top box. Amazon recently announced the
release of its own smartphone, running on a customized Android operating system. These various
offerings had a combined revenue of more than $50 billion in 2013. The company is also the world’s
largest cloud computing infrastructure holder, with Amazon Web Services generating $1.8 billion in
revenue last year, and with what Gartner estimates to be five times the capacity of 14 other cloud
computing companies (including IBM) combined. Its Web Services (cloud) clients include
organizations as varied as Netflix, Shell, and the CIA. Amazon was valued at around $150 billion in
2014.
• At a market value of more than $500 billion in 2014, Apple, the most valuable company in the world, makes
personal computers and its OS X operating system software, as well as iPad tablets and iPhones, which
run its widely popular mobile operating system, iOS. Apple also makes the iPod portable digital music
players, tied into the Apple iTunes store, where customers can download digital music and video, and
runs the Mac App Store for software, apps, and peripherals. It also has Apple TV, a set-top box for
accessing iTunes content, and is widely expected to release the iWatch, its long-awaited but still unannounced
wearable computer. The iPhone’s fingerprint sensor system, Touch ID, opens the way for Apple’s own online
purchasing platform. Apple also has its own cloud service, iCloud, which allows all Apple products to be
managed online without a computer and stores documents, music, movies, TV programs, and books.
The company recently announced its intention to move into the household Internet of Things with its
home automation ecosystem offering.
• Still dominant in search, through both Google Search and the Google Chrome browser, and number
one in digital advertising, Google has a market capitalization of around $400 billion.
Google provides its popular Gmail, the Google+ social network, a dominant news filter, travel scheduling,
Google Maps, traffic and location tracking, online shopping, video and photo sharing, YouTube,
and a messaging system. The company also makes Google e-readers, owns the Android operating
system, and partnered with Samsung to produce the Galaxy mobile phones and tablets. In the home,
Google has a TV set-top box and has invested heavily in Google Fiber, its high-speed fibre-optic
service scheduled for installation in 34 US metropolitan areas. It also has investments in satellites and
wind farms and owns several drone design and manufacturing companies, the Nest home monitoring
system, six massive data centres, and the second largest cloud computing service in the world.
• At a market valuation of $175 billion, Facebook stands out among the large Internet tech companies
vying for control of the consumer Internet because of its massive user base, its ready cash supply, and its
willingness to keep building on the key elements of its consumer Internet platform. Facebook has its
social network, of course, with more than 1.28 billion users worldwide, and it has moved into the mobile
marketplace by adapting its desktop software for both the Android and iOS mobile markets and by offering
apps and a wide variety of alliances that provide music and video downloads.
• Microsoft has its globally dominant Windows software, and in July 2014 announced its intention to
combine its mobile and desktop operating systems after moving into the world of mobile hardware
with the recent purchase of Nokia’s devices business. Microsoft has the Bing search engine (which also powers Yahoo!
search); Azure, its cloud analytics and storage platform; Skype; the social network Yammer; and
ownership of 18% of the MSNBC cable news channel.
• A relative newcomer to the big league, Twitter is now valued at anywhere between $25 and $70 billion,
owing to wild share price fluctuations. Twitter boasts more than 250 million users worldwide. It has moved into
mobile advertising through its MoPub advertising exchange and is experimenting with video sharing.
• With more than 270 million users, Yahoo! is still a consumer Internet giant. It took some time for the
company to realize that it could no longer attract users simply through a news and entertainment web
portal, and it recently spent $200 million on nearly 20 new start-ups, alongside its purchase of Tumblr,
the blogging network, for $1.1 billion. Unlike Google, Facebook, Amazon, or Apple, Yahoo! doesn’t
make mobile devices or control them through an operating system. It does, though, have an Internet
advertising sales system (known as Panama), offers a variety of online shopping services such
as Yahoo! Shopping and Yahoo! Travel, and has a loyal following and a strong global presence.
It would be wrong, though, to think of all this as a purely Western phenomenon; bids to dominate the
consumer Internet aren’t just coming from Silicon Valley. There are also IndiaMART (India) and ECPlaza (South
Korea). And with Taobao, its billion-product eBay-like online shopping site, and Alipay, its online payment
service, China’s Alibaba had sales of $170 billion in 2012, making it a rival to both Amazon and Google.
Weibo provides a Twitter-like service and plans an IPO in New York with a valuation of $7 to $8
billion. Baidu, the Chinese-language search engine and encyclopaedia, also provides video, music, and
multimedia files and is moving quickly into mobile search. Tencent, the massive Chinese investment company,
offers everything from online shopping and games to instant messaging. It has a market capitalization of $157
billion, making it larger than companies like McDonald’s, Boeing, and Cisco. These companies are making
headlines in the financial and business press every day, spending vast amounts of money to buy up the most
important new technologies that will give them the ability to control the flow of data to and from consumers
on the Internet.
The Alibaba group is China’s largest online business conglomerate, combining many of the same elements as
today’s Western Internet giants: It has an Internet-based news and information portal, an online shopping
search engine, consumer online shopping sites (Taobao and Tmall) that rival Amazon and eBay, and a massive
business-to-business trading portal (Alibaba.com) to allow Chinese companies to trade with overseas partners.
It also owns Alipay, an online payment system, and recently began offering a wide variety of cloud computing
services.
Tencent is its chief rival in China, with a broad portfolio of Internet-based services including Internet and
mobile phone services, instant messaging (Tencent QQ), and the mobile chat service WeChat, which already
has 355 million users, including 100 million outside China. Tencent also has a social networking site that
earned $2 billion more than Facebook in 2012, a web portal, online shopping, several powerful online gaming
companies, and a cloud-based storage facility.
Tencent is listed in Hong Kong and is a constituent of the Hang Seng Index, and Alibaba is preparing for a
much-anticipated listing in New York.
The pattern is probably already obvious. Each of the groups listed previously, however diverse, from
Facebook to Apple, is seeking a strong portfolio of offerings that will help it capture the gateway to the
Internet with a platform that includes some, or all, of the following:
• An attractive core function funded predominantly through the collection of customer data and the sale
of digital advertising
• A focus on mobile devices, including mobile hardware, and if possible, a close affiliation with, or
ownership of, a mobile operating system
• A multitude of apps and a competitive arrangement with other major operating systems (iOS, Android)
This comprehensive approach, owning or controlling offerings in as many of these key areas as possible, allows
these companies not only to leverage their core services but also to intercept and collect as much consumer
data as possible from as many different sources as possible. Becoming the principal interface between the
consumer and the Internet gives these companies the ability to dominate the crucial gateways to our lives:
telecoms, the media (books, music, TV, gaming), Internet search, online shopping, banking, and data storage.
We have to turn to this handful of Internet tech companies and examine what they are doing in these important
areas in order to understand why Big Data as a topic and a movement is so powerful, because what these
companies are doing not only excites investors, but it also makes civil libertarians blink in horror at the sheer
size and span of influence that these few companies will soon have over our everyday lives.
Although these powerful new Internet technology companies are all now competing for this same broad set of
online consumer services, their origins are diverse: from Internet search to smartphones, and from selling books online
to providing social media and chat apps. A few years ago, not many people would have thought that Apple,
Microsoft, Google, and Facebook would be competing in the same market to control the gateways to the
consumer Internet. So, what do all these Internet tech companies have in common that makes them stand apart
from other powerful companies? The answer, obviously, is that they all have access to huge amounts of
personal customer data and are good at collecting and analysing that personal data to sell their products and
services (Apple and Amazon) or to provide targeted advertising (all the rest). This ability to capture and
manipulate customer data provides them, however diverse their core offerings, with an underlying business
model that is distinct from that of the former Fortune 100 powerhouses: IBM, SAP, GE, Procter &
Gamble, and so on. In short, these Internet tech companies, whatever else they are about, are all about
collecting Big Consumer Data.