International Journal on Future Revolution in Computer Science & Communication Engineering, ISSN: 2454-4248, Volume 4, Issue 1, pp. 322–325
Overview of Big Data

Kavita D. Zinjurde, Sujata S. Magare, Sandeep S. Parwe
Asst. Professors, Department of MCA
Jawaharlal Nehru Engineering College, Aurangabad, Maharashtra, India
[email protected], [email protected], [email protected]

Abstract— In this paper we focus on the basics of data and Big Data, and on the characteristics and challenges of Big Data. Since Big Data systems must store massive amounts of data, we describe the approaches and the architecture that are used to store such data. Nowadays Big Data is used in various subject areas such as pattern recognition, medicine, commercial industries and social networking.

Keywords- Big Data; Hadoop; MapReduce; Veracity; Variability



I. INTRODUCTION

Big data can be used in every application where a massive amount of data is involved and which demands innovative computational methodologies and more powerful tools. With the rapid growth of data, extracting useful knowledge from it has become more challenging [1].

Big data demands special infrastructure and a trained workforce: a trained workforce is needed to take full benefit of Big Data analysis. Security and privacy are challenging issues, since data is physically stored in a different place and accessed through a network [2]. Big data generally includes traditional enterprise data, machine-generated or sensor-generated data, and social data [3].

II. DATA

Data is a special kind of information which has been derived or generated from our own or someone else's experience, with sense, intellect, knowledge and understanding. Data has always been required for proper decision making: on the basis of the available data, one can take a decision about a further task.

Data has always been everywhere, and there has always been a need for the storage, processing and management of data, ever since the beginning of human evolution. Still, the amount and type of data captured, stored, processed and managed depended then, and even now, on various aspects, including the requirements felt by humans, the available tools and technologies for storage, processing and management, the effort and cost involved, and the ability to gain insight into the data, make decisions, and so on.

The ability to capture, store and process data has enabled human beings to propagate knowledge and research from one generation to the next, so that the next generation does not have to re-invent the pulley.

The capacity of data storage has been increasing dramatically, and today, with the availability of cloud infrastructure, one can possibly store unlimited amounts of data. Today petabytes and exabytes of data are being generated, captured, processed, stored and managed [4].

If the stored data is live, i.e. continuously arriving data (for example Facebook data, where users are recurrently sending likes and comments and uploading or sharing photos and videos), all of it gets stored on the server. At the backend of the server there is storage, which is nothing but the place where all the data gets stored, and the size of this storage depends on the company from which you are taking or hiring the storage [5].

A. Data Measurement

Figure 1. Pyramid of Data Measurement

Byte = a single letter, like "A."

Kilobyte = a 14-line e-mail; a pretty lengthy paragraph of text.

Megabyte = a good-sized novel; Shelley's "Frankenstein" is only about four-fifths of a megabyte.

Gigabyte = about 300 MP3s; about 40 minutes of video at DVD quality (this varies, depending on the maker). A CD holds about three-fourths of a gigabyte.

Terabyte = statistically, the average person has spoken about this much by age 25.

Petabyte = the amount of data available on the web in the year 2000 is thought to occupy 8 petabytes.

Exabyte = in a world with a population of 3 billion, all information generated annually in any form would occupy a single exabyte.
Zettabyte = three hundred trillion MP3s; two hundred billion DVDs.

Yottabyte = ???????? [6]
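To make these scales concrete, the following is a minimal Python sketch (not from the paper) that converts a raw byte count into a human-readable unit using 1024-based steps; the unit labels and the example value are illustrative assumptions.

```python
# Minimal sketch: express a raw byte count in the largest convenient unit.
# Uses 1024-based steps; the unit list and example value are illustrative.
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: int) -> str:
    """Return num_bytes expressed in the largest unit that keeps the value above 1."""
    value = float(num_bytes)
    unit = UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1024:
            break
        value /= 1024
        unit = next_unit
    return f"{value:.2f} {unit}"

if __name__ == "__main__":
    print(human_readable(4_700_000_000))  # roughly one DVD: "4.38 GB"
```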
III. BIG DATA

As the name signifies, Big Data consists of a huge amount/volume of heterogeneous data which is being generated at high speed. Such high-speed data cannot be managed and processed using the traditional data management tools and applications at hand. Big Data is an emerging technology which can deal with huge volumes of complex data generated at high speed, using a new set of tools, applications and frameworks to process and manage the data [7].

For example, if we construct one house with a single labourer it takes 20 days, but if we construct the same house with 20 labourers it will take 1 day. The same idea applies to big data: by adding or using parallel resources, you can reduce the processing time of big data analysis for decision making.
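To make the "20 labourers" analogy concrete, here is a minimal Python sketch (not from the paper) that splits one counting job across several worker processes using the standard multiprocessing module; the task and the worker count are illustrative assumptions, not part of any Big Data framework.

```python
# Minimal sketch: the "20 labourers" idea, splitting one job across workers.
# Counting even numbers in a range is only an illustrative stand-in for a
# real big-data workload.
from multiprocessing import Pool

def count_evens(bounds):
    """Count even numbers in [start, stop); one worker's share of the job."""
    start, stop = bounds
    return sum(1 for n in range(start, stop) if n % 2 == 0)

if __name__ == "__main__":
    total_n = 10_000_000
    workers = 4                      # the "20 labourers", scaled down
    step = total_n // workers
    chunks = [(i * step, (i + 1) * step) for i in range(workers)]

    with Pool(processes=workers) as pool:
        partial_counts = pool.map(count_evens, chunks)  # work done in parallel

    print(sum(partial_counts))       # same answer as one sequential pass
```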
A. Why Big Data

To perform deep analytics it is necessary to integrate Big Data with the organizational data [3]. Some of the reasons that attract industries towards Big Data are shown in Fig. 2, and Table I shows the reasons for and benefits of using Big Data.

Figure 2. Motives for Big Data

TABLE I. REASONS AND BENEFITS OF BIG DATA

Reasons                      | Big Data Benefits
-----------------------------|------------------------------------------------------------------
Well-timed                   | Increased immediate insights from different data sources
Enhanced Analytics           | Improvement of business performance through real-time analytics
Massive amount of data       | Huge amounts of data can be managed
Comprehensions               | Unstructured and semi-structured data help to provide better perception
Supervisory Decision-Making  | Risk analysis helps to moderate risk

B. Big Data Sources

Big data has many sources, including large stock exchanges, video sharing portals (like YouTube, Dailymotion etc.), social networks (like Instagram, LinkedIn, Facebook, Twitter etc.), network sensors, web pages, text and documents, and logs (such as web logs, system logs and search index data).

Each mouse click on a web site can be captured in web log files and analyzed in order to better understand shoppers' buying activities and to encourage their shopping by actively endorsing products. Social media sources generate tremendous amounts of comments, likes and tweets. There are also remarkable amounts of geospatial (e.g., GPS) data, such as that produced by cell phones, which can be used by applications like Foursquare to help you know the locations of friends and to receive offers from nearby stores and restaurants. Data such as images, voice and audio can be analyzed by applications such as recognition systems in security systems.
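As a small illustration of the web log analysis mentioned above, the following hedged Python sketch (not from the paper) counts how often each page appears in a click log; the three-field log format and the sample entries are assumptions made purely for illustration.

```python
# Minimal sketch: counting page hits from a web (click) log.
# The log format (IP, timestamp, requested path) is assumed for illustration;
# real server logs vary.
from collections import Counter

SAMPLE_LOG = """\
10.0.0.1 2018-01-02T10:15:01 /products/shoes
10.0.0.2 2018-01-02T10:15:04 /products/shoes
10.0.0.1 2018-01-02T10:16:12 /cart
10.0.0.3 2018-01-02T10:17:30 /products/bags
"""

def hits_per_page(log_text: str) -> Counter:
    """Return how often each path was requested."""
    counts = Counter()
    for line in log_text.splitlines():
        parts = line.split()
        if len(parts) == 3:            # ip, timestamp, path
            counts[parts[2]] += 1
    return counts

if __name__ == "__main__":
    for path, hits in hits_per_page(SAMPLE_LOG).most_common():
        print(path, hits)              # e.g. "/products/shoes 2"
```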
C. Big Data Usages

• To understand patterns, for example analyzing patterns for analytics.
• To understand the behavior of people, for example in the medical industry, for patients under observation for unknown diseases.
• Share market analysis.
• In bank transactions, to analyze suspicious behavior.
• Notifications on social sites like Facebook, Twitter etc., and many more [8].

D. Storage Component and Processing

Data acquisition, data organization and data analysis are the stages that a Big Data infrastructure platform has to carry out. Data acquisition refers to collecting data from different sources. The data may be in any format and size; it has to be organized in such a manner that it becomes easy to extract knowledgeable and useful information from such diverse data. Data organization and analysis do this work: data is stored uniformly and can be extracted for analysis purposes. Statistical methods and data mining techniques are used to achieve this task.

• Big data is stored with a special kind of technique known as HDFS, i.e. the Hadoop Distributed File System, which is the storage component of Hadoop. HDFS provides high performance across data clusters [9].
• Normally you store your data in .txt, .xml, .ldb or .mdf files.
• The processing part of Big Data is done by MapReduce (the business logic).

1) Architecture of Storing Data

Fig. 3 shows the traditional architecture and Fig. 4 shows the master-slave architecture.

Figure 3. Traditional Client Server Architecture

The master-slave architecture consists of a NameNode and DataNodes.

Figure 4. Modern Master-Slave Architecture

• NameNode (Master): The NameNode runs on the master system; in other words, the NameNode acts as the master node. The NameNode is responsible for managing the metadata, which controls access to the file system by clients and consists of the file system namespace, records of the DataNodes and the replication factor, and it ensures that this metadata is continuously maintained.

• DataNodes (Slave): DataNodes run on the slave systems. The main task of a DataNode is to follow the instructions coming from the master (NameNode). You can read data from and write data to a DataNode. Whatever data is received from the user is stored on DataNodes in the form of blocks. A DataNode itself is unaware of which file the blocks it stores belong to. As a consequence, if the NameNode crashes, the data in HDFS becomes unusable, since only the NameNode knows which blocks belong to which file, where each block is located exactly, and so on [10].
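As a concrete illustration of the MapReduce processing mentioned in Section III-D, the following is a minimal, single-machine Python sketch of the classic word-count job; it is not code from the paper. On a real Hadoop cluster the map and reduce functions would be distributed over the DataNodes by the framework; only the map, shuffle (group by key) and reduce data flow is shown here.

```python
# Minimal single-process sketch of the MapReduce word-count data flow:
# map -> shuffle (group by key) -> reduce. On Hadoop this logic would be
# distributed by the framework; here it runs locally for illustration.
from collections import defaultdict

def map_phase(document: str):
    """Map: emit a (word, 1) pair for every word in the input."""
    for word in document.lower().split():
        yield word, 1

def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reduce: sum the counts for each word."""
    return {word: sum(counts) for word, counts in grouped.items()}

if __name__ == "__main__":
    doc = "big data needs big tools and big ideas"
    print(reduce_phase(shuffle(map_phase(doc))))
    # {'big': 3, 'data': 1, 'needs': 1, 'tools': 1, 'and': 1, 'ideas': 1}
```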
IV. CHALLENGES OF BIG DATA

As data is increasing day by day, the management of data and the integration of data from different sources have become the biggest challenges [11]. A Big Data infrastructure is required to support statistical methods and data mining techniques. Fig. 5 shows the characteristics, which are at the same time challenges, of Big Data.

Figure 5. Six V's in Big Data

1) Volume:
Volume can be described as data at rest. Currently data is gathered in petabytes and is assumed to increase to zettabytes in the immediate future. Social networking sites, e-commerce sites, banking sites etc. are themselves generating data on the order of terabytes every day, and this amount of data is no doubt difficult to handle using the present traditional systems. Machine-generated data is larger in volume than traditional data [3].

2) Velocity:
Velocity is all about data in motion (streaming data, with milliseconds to respond). Velocity refers to the rate at which data is produced and the speed at which it should be analyzed and acted upon.

3) Variety:
A continuous flow of heterogeneous data is called Variety, consisting of structured (tabular) data, unstructured data (text, images, audio and video) and semi-structured (XML) data. Technological advances permit organizations to use a variety of structured, semi-structured and unstructured data.

4) Veracity:
Veracity signifies the unreliability inherent in some sources of data. For example, data generated from social media is uncertain in nature, yet it contains valuable information. Thus the need to deal with uncertain data is another aspect of big data, which is handled using various tools developed for the management and mining of uncertain data.

5) Variability (and complexity):
Variation in the rate of flow of data is referred to as Variability. Complexity refers to the fact that big data is spawned through numerous sources. This enforces a critical challenge: the need to link, match, cleanse and transform data received from various sources.

6) Value:
By definition, big data is characterized by a relatively "low value density". That is, the data received in its original form usually has a low value relative to its volume. However, a high value can be obtained by analyzing large volumes of such data [12].

ACKNOWLEDGMENT

The authors are grateful to the Department of MCA, Jawaharlal Nehru Engineering College, Aurangabad. The authors would like to thank all the faculty members and staff, and also the Institute authorities, for providing the infrastructure to carry out this research.

REFERENCES

[1] Feng Xia, Wei Wang, Teshome Megersa Bekele and Huan Liu, "Big scholarly data: A survey", IEEE Transactions on Big Data, vol. 3, issue 1, pp. 18-35, March 2017.
[2] D. R. Luna, J. C. Mayan, M. J. García, A. A. Almerares and M. Househ, "Challenges and Potential Solutions for Big Data Implementations in Developing Countries", Yearbook Med Inform 2014, 9(1), pp. 36–41, published online 15 Aug 2014, doi: 10.15265/IY-2014-0012.
[3] Oracle White Paper, "Big Data for the Enterprise", pp. 1-16, June 2013.
[4] Wikibon, "Big Data Statistics", http://wikibon.org/blog/big-data-statistics, in Analytics and Bigdata [Accessed on Jan 03, 2018].
[5] Dattatrey Sindol, "Introduction to big data", https://www.mssqltips.com/sqlservertip/3132/big-data-basics--part-1--introduction-to-big-data [Accessed on Jan 02, 2018].
[6] http://www.hjo3.net/bytes.html [Accessed on Jan 02, 2018].
[7] Ibrahim Abaker et al., "The rise of big data on cloud computing: Review and open research issues", Elsevier, Information Systems 47, pp. 98–115, 2015.

[8] "Application of Big Data in Real Life", article in Intellipaat, https://intellipaat.com/blog/7-big-data-examples-application-of-big-data-in-real-life [Accessed on Jan 8, 2018].
[9] Fazlur Rahman, "Beginners Guide - Introduction of Big Data Hadoop", article in CodeProject, Jan 2017.
[10] Mohd Rehan Ghazia, Durgaprasad Gangodkara, "Hadoop, MapReduce and HDFS: A Developers Perspective", published by Elsevier B.V., International Conference on Intelligent Computing, Communication & Convergence (ICCC-2015), 1877-0509, pp. 45-50, 2015.
[11] Hammond WE, Bailey C, Boucher P, Spohr M, Whitaker P., "Connecting Information To Improve Health", Health Aff (Millwood), 29(2), pp. 284–288, Feb 2010.
[12] Amir Gandomi, Murtaza Haider, "Beyond the hype: Big data concepts, methods, and analytics", Elsevier, International Journal of Information Management 35, pp. 137–144, 2015.

