A Survey On Big Data Applications and Challenges
Abstract—Big data refers to huge, diverse and fast-growing data that requires new technologies to handle. With the rapid growth of data, big data has drawn the attention of researchers who want to use it for decision making in various emerging applications. Such data is extremely useful and valuable for scientific exploration, increased business productivity and the betterment of mankind. It helps in areas ranging from the public sector to business activities, from healthcare to better navigation, and from smart cities to national security. However, along with these opportunities, the challenges of handling the data have also increased. This paper discusses the basics of big data together with its applications and challenges. These challenges are inherent in the variety, volume and velocity of data. If the issues related to big data can be managed, there is potential for improvement in the quality of our lives.

Keywords—Big data, Challenges, Big data applications

I. INTRODUCTION

Big data is currently a booming term, and it offers many opportunities for researchers. In many fields, big data is a revolutionary concept for analyzing data to obtain accurate results in the interest of people. Scientific research, the Internet of Things, e-commerce, health care and finance are some of the applications in which huge amounts of data are produced and need to be handled correctly to extract knowledge.

Various definitions of big data are available, from 3V to 4V [2]. The concept of big data was initially defined using the 3V model in 2001 by Laney [1] as “high volume, high velocity and high variety information assets that demand cost effective, innovative forms of information processing for enhanced insight and decision making.”

An updated definition of big data was given by Gartner in 2012: “Big data are high-volume, high-velocity and high-variety information assets that require new forms of processing to enable enhanced decision making, insight discovery and process optimization” [2].

The concept of this V model is briefly described below [2][11]:

Volume: It indicates the huge amount of data arriving from any source and of any type. Storing it is a challenge; however, a large volume of data helps to obtain accurate results.

Velocity: It defines the speed at which data arrives. Depending on the data collection strategy, data arrive at different rates from multiple sources and in any type. This requires new algorithms for analysis. The amount of data increases exponentially.

Variety: It indicates the different kinds of data that can be collected from sources such as sensors, scientific results, health care and social media. The data can be classified as structured or unstructured, and can be text, audio, video, images, data logs and so on.

Value: It is the process of extracting valuable information from a huge set of data. It is important because it generates knowledge for people and businesses.

Veracity: It refers to the accuracy of the collected information. Data quality, together with privacy, is important for correct analysis.

Section II discusses different big data applications, Section III presents the challenges of big data, and Section IV shows different big data techniques.

II. APPLICATION OF BIG DATA

In big data, huge and diversified data is collected and analyzed to generate accurate results that help people make good decisions. This paper discusses four fields in which big data plays a significant role in profitable decision making.

A. Social network analysis

Social network analysis is an application of graph theory used to understand and classify relationships in social networks [3]. Each day, social media generate a huge amount of data that is difficult to handle with traditional analytics algorithms and techniques such as data mining and machine learning [1]. Social media data are very useful for finding relationships between entities, trust analysis, influence analysis, recommendation of products or places (in association with e-commerce websites), link prediction, crime detection and so on [4].

B. Health care

Today it is essential for health care organizations to keep their data in digital form. This huge amount of data can include patients’ past medical history, their X-rays, drug details for research, audio
surgery videos, health policies, output data of medical equipment and so on. Both structured and unstructured data are involved. With big data analytics, doctors can find hidden knowledge and analyze patterns to advance treatment and achieve more accurate diagnosis at reduced cost and in less time [5][7].

Big data analytics can improve the quality of healthcare through patient-centric services, detection of diseases at an earlier stage, improved treatment methods and monitoring of hospital quality using the derived knowledge [6].

C. Business and marketing

Today almost every company generates a large amount of data from customers’ profiles, their purchase details, employees’ data and goods data [2]. Multinational companies encounter big data problems because the huge and diverse transactional data involved is difficult to understand in its raw form.

By using big data analytics, a company can find hidden patterns and knowledge that help it to develop customer-centric strategies, which eventually benefit the company [8].

D. Education

Universities have started digitizing massive amounts of student data but are not using it to improve education and administration. Adopting big data would benefit both universities and their students. By preserving and analyzing the records of existing and past students, admissions for the current year can be predicted. Students’ problems can be identified and solved with sentiment and behavioral analysis.

The data can be gathered from the history of work done by students, their social media accounts or their feedback [9].

By analyzing a student’s profile together with other records, a suitable educational course can be suggested to the student, or the university can even introduce a new course based on trend analysis.

III. BIG DATA CHALLENGES

Big data brings many attractive opportunities and applications, along with many challenges. Some important challenges of big data are described here.

A. Data storage

Data does not come from a vacuum. It comes from many sources through underlying activities, such as web logs, social networks, sensors, scientific research and experiments, which produce huge amounts of data [10].

Raw data generated at the source is enormous, and not all of the generated or captured data is useful. The collected data is also diverse in structure and needs to be correlated for further processing. So initially all data needs to be stored for preprocessing.

Apart from the captured data, some additional data, called metadata, is automatically generated by the system and defines which type of data will be stored. The available storage is not sufficient to store such massive data. One solution is to upload it to the cloud. However, terabytes or zettabytes of data take a long time to upload. Also, because the data arrive so rapidly, it is not possible to use the cloud for real-time data [11]. Storage of all this data is therefore a key challenge in big data.

B. Heterogeneity

The data generated by users are heterogeneous in nature, whereas data analysis algorithms expect homogeneous data for better processing and analysis.

Consequently, data must be properly structured at the beginning of analysis. Structured data is well organized and manageable. Unstructured data covers all kinds of social network data, documents, reviews, etc. [11]. Unstructured data is costly to work with, and it is not feasible to convert all unstructured data to structured data. This diversity in data is a challenge to handle and process.

Another issue is metadata, which is used to describe other recorded data. Such metadata is of little use unless it can be understood throughout the data analysis pipeline [12].

C. Inconsistency, incompleteness and quality of data

The data that big data systems receive for analysis come from various sources with different levels of reliability and may contain erroneous, uncertain and missing values. Such data must be managed before the analysis process [10].

On the positive side, a huge amount of data helps to compensate for missing values and to find hidden relationships and inherent groups. The captured data is also used for decision making and predictive analysis, which leads to better results if the quality of the data is good and genuine [11]. Business managers want more data for analysis, whereas technical staff want useful data that can practically be stored and analyzed.

It is therefore better to have quality data, which helps to find genuine and unbiased results, than massive irrelevant data. Data quality is also important because it can skew the final result. For example, biased feedback or random details filled in by users will not help to obtain quality results. Separating good-quality, consistent data from irrelevant data is a challenge.

D. Timeliness

Each day data grows at an exponential rate and needs to be summarized, filtered and stored. However, for real-time applications such as fraud detection, social networks, the Internet of Things, intelligent transport systems and biomedicine, timeliness must be the top priority [2].

It is difficult to generate a timely response when the volume of data is very large. For example, a disease must be detected early; otherwise the analysis is of no help to the patient. Finding common patterns first requires time to scan the whole dataset. For good traffic control and vehicle navigation, accurate real-time decisions must be taken in a timely manner. Once an accident has occurred, even an accurate decision is of no use.

E. Privacy
Privacy of data is one of the foremost concerns in big data. Preserving the privacy of patients’ data is essential, as there is a fear of inappropriate use of personal data, which might be revealed when such data is integrated from several sources. Privacy of data is not only a technical but also a sociological problem [12].

For example, observing movements and integrating health history from multiple sources may reveal the health condition and identity of a patient, which compromises the user’s personal data. Another major privacy concern is location-based services, which require users to share their location. This leads to leakage of privacy, as a user’s identity may be revealed from patterns of visited locations.

Each day massive data is generated on social media, where users share private information, locations, pictures and so on. This not only reveals the identity of users but can also be used for criminal activities and fraud.

F. Security

Storing big data requires large repositories such as servers, data warehouses and even the cloud. Securing these repositories is a concern, as criminal groups target them to obtain important and confidential data, which may include customers’ personal or financial details, employees’ data or company secrets. The risk is greatest in the financial and healthcare sectors [12].

Apart from repositories, the big data environment is also distributed in nature, which increases security concerns because it is more prone to attack.

G. Scalability

The first impression of big data is its size, which gives rise to the most important challenge: scalability.

Researchers are working on algorithms that continue to generate accurate results even with an increasing amount of data. Among big data analytical techniques, only incremental algorithms have good scalability; not all machine learning algorithms can work with rapidly growing data [2].

Apart from algorithms for analysis, infrastructure that supports incrementally arriving data is also needed. Unfortunately, parallel data processing techniques that were applied across nodes cannot be used directly for intra-node parallelism; a different architecture is required to process the data.

H. Visualization

In big data we consider not only the system but also the human perspective. We must ensure that people can properly understand results and do not get lost in massive data. For each application, special care must be taken so that users can understand and discover interesting patterns. The way output is presented for a recommendation system will not be useful for scientific exploration, because in scientific applications new patterns need to be discovered. The main objective of data visualization is to represent knowledge effectively using different graphs and plots.

For example, eBay has millions of users and billions of products sold every year, which generates huge data that is difficult to understand. eBay therefore turned to a big data visualization tool called “Tableau”, which can convert complex data into pictures [2]. This helps employees to conduct sentiment analysis and visualize the relevance of goods and customers.

However, data visualization is difficult because of the high dimensionality of data. Current visualization tools also have performance and scalability issues. Better visualization of outcomes is important, as a growing number of people wish to analyze the data.

I. Fault tolerance

With new technologies such as cloud computing and big data, it is always required that, if a failure occurs, the damage done stays within an acceptable threshold rather than the whole process having to start from scratch [11]. It is not possible to provide 100% reliable fault-tolerant machines or software, so the main task is to reduce the rate of failure to an acceptable level.

Two methods are currently used to deal with this. The first is to divide the whole computation into individual independent tasks that execute on different nodes. One node is assigned to observe whether all the other nodes are working properly. However, it is not always possible to divide the whole task into independent tasks. In that case, if a failure occurs, the whole process needs to be restarted. To avoid this, checkpoints can be introduced to save the state of the system at defined time intervals, so that even if a failure occurs the computation can be restarted from the last checkpoint.

CONCLUSION

We are now in an era of digitization that generates massive data every day. Applying big data in many sectors will increase productivity and decision-making capabilities. This paper has addressed applications and major challenges of big data in which research can be done to take maximum advantage of this rapidly generated massive data for improving the quality of human life.

REFERENCES

[1] Gema Bello-Orgaz, Jason J. Jung and David Camacho, “Social big data: Recent achievements and new challenges”, Elsevier, Information Fusion, 2016.
[2] C.L. Philip Chen and C.-Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on Big Data”, Elsevier, Information Sciences, 2014.
[3] Iva Sorić, Davor Dinjar, Marko Štajcer and Dražen Oreščanin, “Efficient Social Network Analysis in Big Data Architectures”, MIPRO 2017.
[4] Sancheng Peng, Guojun Wang, and Dongqing Xie, “Social Influence Analysis in Social Networking Big Data: Opportunities and Challenges”, IEEE Network, 2016.
[5] Mimoh Ojha and Dr. Kirti Mathur, “Proposed Application of Big Data Analytics in Healthcare at Maharaja Yeshwantrao Hospital”, IEEE 3rd MEC International Conference on Big Data and Smart City, 2016.