A Review On Big Data Privacy and Security
ISSN 2229-5518
Abstract: Big data is a convergence of new hardware and algorithms that allows us to discover new patterns in large data sets, patterns we can apply to make better predictions and, ultimately, better decisions. Today, organizations are putting big data into practice in fields as diverse as healthcare, smart cities, energy and finance. Big data can be characterized by 3 V’s, i.e., Volume, Velocity and Variety. Big data analytics is the term used to describe the process of examining massive amounts of complex data in order to reveal hidden patterns or identify unknown correlations. However, there is an obvious tension between the security and privacy of big data and its widespread use. This paper focuses on privacy and security concerns in big data, differentiates between privacy and security, and discusses privacy requirements in big data. A number of privacy-preserving mechanisms have been developed to protect privacy at different stages of the big data life cycle (for example, data generation, data storage, and data processing). The goal of this paper is to provide a comprehensive review of privacy preservation mechanisms in big data and to present the challenges for existing mechanisms. This paper also presents recent privacy-preserving techniques for big data, such as differential privacy and privacy-preserving big data publishing. It also addresses privacy and security aspects of healthcare in big data. Finally, a comparative study of various recent big data privacy techniques is presented.
IJSER
Big Data [1, 2], which consists of both structured and unstructured data in ever larger volumes, flows into the data servers of organizations at a rate of billions of bytes every day. Due to recent technological developments, the amount of data generated by the internet, social networking sites, sensor networks, healthcare applications, and many other sources is increasing rapidly day by day. The vast amount of data produced from various sources in multiple formats at very high speed [3] is referred to as big data. The term big data [4, 5] is defined as “a new generation of technologies and architectures, designed to economically extract value from very large volumes of a wide variety of data, by enabling high-velocity capture, discovery and analysis”. On the premise of this definition, the properties of big data are reflected by the 3V’s, which are volume, velocity and variety. Additionally, veracity, validity, value, variability, venue, vocabulary, and vagueness were added as complementary explanations of big data [6]. A common theme of big data is that the data are diverse, i.e., they may contain text, audio, images, video, etc. This diversity of data is signified by variety. In order to ensure big data privacy, various

Fig. 1: Big data life cycle stages, i.e., data generation, storage, and processing.

To handle the various dimensions of big data in terms of volume, velocity, and variety, there is a need to design efficient and effective frameworks to process large amounts of data arriving at very high speed from various sources. Big data passes through multiple phases during its life cycle.

Smart energy big data analytics is also a very complex and challenging topic that shares many common issues with generic big data analytics. Smart energy big data is closely tied to physical processes, where data
IJSER © 2017
http://www.ijser.org
International Journal of Scientific & Engineering Research Volume 8, Issue 5, May-2017 130
intelligence can have a huge impact on the safe operation of the systems in real time [7]. This can also be useful for marketing and other commercial companies to grow their business. Because the database contains personal information, it is risky to give researchers and analysts direct access to it: the privacy of individuals would be leaked, which poses a threat and is also illegal. The “Privacy and security concerns” section discusses privacy and security concerns in big data, and the “Privacy requirements in big data” section covers privacy requirements in big data. The “Big data privacy in data generation phase”, “Big data privacy in data storage phase” and “Big data privacy preserving in data processing” sections discuss big data privacy in the data generation, data storage, and data processing phases. The “Recent Techniques of Privacy Preserving in Big Data” section presents some recent techniques of big data privacy and a comparative study between them.

• Y Rajeswari is currently pursuing Master of Computer Applications in KMMIPS, Tirupathi, PH-0877-2289100.
• C Sreekanya is currently pursuing Master of Computer Applications in KMMIPS, Tirupathi, PH-0877-2289100.
• Dr K Venkata Ramana, Head of the Department, Master of Applications in KMMIPS, Tirupathi, PH-0877-2289100.

2 PRIVACY AND SECURITY CONCERNS IN BIG DATA

Privacy and security are important issues. The big data security model is not recommended for complex applications, and as a result it is disabled by default. However, in its absence, data can always be compromised easily. This section therefore focuses on privacy and security issues.

Privacy: Information privacy is the privilege to have some control over how personal information is collected and used. It is the capacity of an individual or group to prevent information about themselves from becoming known to people other than those they give the information to.

Security: Security is the practice of defending information and information assets through the use of technology, processes and training from unauthorized access, disclosure, disruption, modification, inspection, recording, and destruction.

Privacy vs. security: Data privacy is focused on the use and governance of individual data, for example putting policies in place to ensure that consumers’ personal information is collected, shared and used in appropriate ways. Security concentrates more on protecting data from malicious attacks and the misuse of stolen data for profit [8]. While security is fundamental for protecting data, it is not sufficient for addressing privacy.

3 PRIVACY REQUIREMENTS IN BIG DATA

Big data analytics attracts various organizations, but a large portion of them decide not to use these services because of the absence of standard security and privacy protection tools. This section analyses possible strategies to upgrade big data platforms with privacy protection capabilities.

Businesses and government agencies are generating and continuously collecting large amounts of data. The current increased focus on large volumes of data will undoubtedly create opportunities and avenues to understand the processing of such data across numerous domains. But the potential of big data comes with a price: users’ privacy is frequently at risk. Ensuring conformance to privacy terms and regulations is difficult in current big data analytics and mining practices. Developers should be able to verify that their applications conform to privacy agreements and that sensitive information is kept private regardless of changes in the applications and/or in privacy regulations. To address these challenges, there is a need for new contributions in the areas of formal methods and testing procedures. New paradigms for privacy conformance testing can be applied to the four areas of the ETL (Extract, Transform, and Load) process, as shown in Fig. 2 [9, 10].
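Privacy conformance testing of this kind can be partly automated. The following is a minimal Python sketch, not the method of [9, 10]: it assumes the privacy terms for a data set can be written down as a set of sensitive fields, a set of fields that may be stored, and a retention period, and it checks each record against them before the loading step. `PrivacyTerms`, `check_record`, `conforms` and the field names in the example are hypothetical.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class PrivacyTerms:
    """Hypothetical machine-readable privacy terms for one data set."""
    sensitive_fields: frozenset  # fields that can uniquely identify a user or entity
    allowed_fields: frozenset    # non-sensitive fields that may be stored
    retention_days: int          # how long a record may be kept after collection


def check_record(record: dict, terms: PrivacyTerms, today: date) -> list:
    """Return the privacy-term violations found in one record (empty list if none)."""
    violations = []
    for name in sorted(record):
        if name in terms.sensitive_fields:
            violations.append(f"sensitive field '{name}' present")
        elif name not in terms.allowed_fields:
            violations.append(f"field '{name}' not permitted")
    collected = record.get("collected_on")
    if collected is not None and (today - collected).days > terms.retention_days:
        violations.append("retention period exceeded")
    return violations


def conforms(batch: list, terms: PrivacyTerms, today: date) -> bool:
    """A batch passes the conformance test only if every record passes."""
    return all(not check_record(r, terms, today) for r in batch)


# Example terms and records (all values are illustrative).
terms = PrivacyTerms(
    sensitive_fields=frozenset({"name", "ssn"}),
    allowed_fields=frozenset({"age", "zip3", "collected_on"}),
    retention_days=365,
)
today = date(2017, 5, 1)
print(check_record({"age": 34, "zip3": "123", "collected_on": date(2017, 1, 10)}, terms, today))
# []
print(check_record({"name": "A", "age": 34, "collected_on": date(2015, 1, 1)}, terms, today))
# ["sensitive field 'name' present", 'retention period exceeded']
```

Returning a list of violations rather than a single boolean lets a test report say why a batch failed, which is useful when the privacy terms or the application change.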
This step represents the data loading process. At this step, the privacy specifications characterize the sensitive pieces of data that can uniquely identify a user or an entity. Privacy terms can also indicate which pieces of data can be stored and for how long. Schema restrictions can also be applied at this step.

3.2 Map-reduce Process Validation

This process transforms big data assets so that they respond effectively to a query. Privacy terms can specify the minimum number of returned records required to conceal individual values, in addition to constraints on data sharing between various processes.

3.3 ETL Process Validation

Similar to step (2), the warehousing logic should be checked at this step for compliance with privacy terms. Some data values may be aggregated anonymously or excluded from the warehouse if they indicate a high probability of identifying individuals.

3.4 Reports Testing

Reports are another form of queries, possibly with higher visibility and a wider audience. Privacy terms that define ‘purpose’ are essential for checking that sensitive

supposed to be shared, it refuses to provide such data. If the data owner is providing the data passively, a few measures can be taken to ensure privacy, such as anti-tracking extensions, advertisement or script blockers, and encryption tools.

4.2 Falsifying Data

In some circumstances, it is unrealistic to prevent access to sensitive data. In that case, the data can be distorted using certain tools before they are received by a third party. If the data are distorted, the true information cannot easily be revealed. The following techniques are used by the data owner to falsify the data:

• A sock puppet tool is used to hide the online identity of an individual through deception. By using multiple sock puppets, the data belonging to one specific individual will be regarded as belonging to various people. In that way, the data collector will not have enough knowledge to relate the different sock puppets to one individual.
• Certain security tools, such as Mask Me, can be used to mask an individual’s identity. This is especially useful when the data owner needs to give credit card details during online shopping.
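The idea behind sock puppets, showing each data collector a different identity so that collectors cannot link their records to one person, can be illustrated with per-collector pseudonyms. The minimal Python sketch below is an assumption-laden illustration rather than a description of any particular sock puppet tool: it derives a stable pseudonym for each collector with HMAC-SHA256 over the collector's name, keyed with a secret that only the data owner holds. `sock_puppet_id` and the collector names are hypothetical.

```python
import hashlib
import hmac
import secrets


def sock_puppet_id(owner_secret: bytes, collector: str) -> str:
    """Derive the pseudonymous identity the owner presents to one collector.

    The same (secret, collector) pair always yields the same identity, while
    different collectors see unrelated-looking identities that cannot be
    linked to each other without the owner's secret.
    """
    tag = hmac.new(owner_secret, collector.encode("utf-8"), hashlib.sha256)
    return "user-" + tag.hexdigest()[:16]


# The secret stays with the data owner and is never sent to any collector.
owner_secret = secrets.token_bytes(32)
id_a = sock_puppet_id(owner_secret, "shop-a.example")
id_b = sock_puppet_id(owner_secret, "shop-b.example")
print(id_a, id_b)  # two different identities for the same person
```

Pseudonyms alone do not prevent linking if the same e-mail address, IP address or card number is shared with every collector, which is one reason masking tools such as Mask Me are listed alongside sock puppets.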
have the ability to be configured dynamically to accommodate various applications. One promising technology to address these requirements is storage virtualization, empowered by the emerging cloud computing paradigm [15]. Storage virtualization is a process in which multiple network storage devices are combined into what appears to be a single storage device. SecCloud is one of the models for data security in the cloud that jointly considers both data storage security and computation auditing security in the cloud [16].

6 BIG DATA PRIVACY PRESERVING IN DATA PROCESSING

The big data processing paradigm categorizes systems into batch, stream, graph, and machine learning processing [17]. Privacy protection in data processing can be divided into two phases. In the first phase, the goal is to safeguard information from unsolicited disclosure, since the collected data might contain sensitive information about the data owner. In the second phase, the aim is to extract meaningful information from the data without violating privacy.

7 RECENT TECHNIQUES OF PRIVACY PRESERVING IN BIG DATA

7.1 Differential Privacy

called the privacy guard.

Step 1: The analyst makes a query to the database through this intermediary privacy guard.

Step 2: The privacy guard takes the query from the analyst and evaluates this query, together with earlier queries, for privacy risk.

Step 3: After evaluating the privacy risk, the privacy guard gets the answer from the database.

Step 4: The privacy guard adds some distortion to the answer according to the evaluated privacy risk and finally provides it to the analyst.

The amount of distortion added to the original answer is proportional to the evaluated privacy risk. If the privacy risk is low, the distortion added is small enough that it does not affect the quality of the answer, but large enough to protect the individual privacy of the database. If the privacy risk is high, more distortion is added.
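The four steps of the privacy guard can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: queries are restricted to counts, the “evaluated privacy risk” of Step 2 is modelled as a per-query privacy parameter epsilon charged against a fixed total budget that also accounts for earlier queries, and Step 4 uses the Laplace mechanism, which adds noise with scale 1/epsilon to a count (a count changes by at most 1 when one person is added or removed). A smaller epsilon therefore means more distortion. `PrivacyGuard`, `count`, `BudgetExceeded` and the budget value are hypothetical.

```python
import random
from dataclasses import dataclass, field


class BudgetExceeded(Exception):
    """Raised when a query would push total privacy loss past the budget."""


@dataclass
class PrivacyGuard:
    """Hypothetical intermediary between the analyst and the database."""
    records: list                  # protected database rows (dicts)
    total_epsilon: float = 1.0     # overall privacy budget (assumed value)
    spent_epsilon: float = field(default=0.0, init=False)  # used by earlier queries
    rng: random.Random = field(default_factory=random.Random)

    def _laplace(self, scale: float) -> float:
        # The difference of two i.i.d. exponential variables with mean `scale`
        # is Laplace-distributed with location 0 and scale `scale`.
        return self.rng.expovariate(1 / scale) - self.rng.expovariate(1 / scale)

    def count(self, predicate, epsilon: float) -> float:
        """Answer 'how many records satisfy predicate?' with Laplace noise."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        # Step 2: evaluate the risk of this query together with earlier ones.
        if self.spent_epsilon + epsilon > self.total_epsilon:
            raise BudgetExceeded("privacy budget exhausted; query refused")
        self.spent_epsilon += epsilon
        # Step 3: obtain the true answer from the database.
        true_answer = sum(1 for r in self.records if predicate(r))
        # Step 4: distort it. A count has sensitivity 1, so the noise scale is
        # 1 / epsilon: the smaller the epsilon, the larger the distortion.
        return true_answer + self._laplace(1 / epsilon)


# Step 1: the analyst sends a query through the guard, never to the database.
db = [{"age": a} for a in (23, 35, 41, 52, 67)]
guard = PrivacyGuard(db, total_epsilon=1.0)
print(guard.count(lambda r: r["age"] > 40, epsilon=0.5))  # 3 plus Laplace noise
```

Charging every query against one shared budget is what lets the guard take earlier queries into account: repeating the same query to average away the noise eventually gets refused.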
networks and mobile devices from which data might be gathered, the volume of such data has also increased over time. Privacy-preserving models broadly fall into two different settings, which are referred to as input privacy and output privacy. Much of the work in privacy has focused on the quality of privacy preservation (vulnerability quantification) and the utility of the published data. One solution is simply to divide the data into smaller parts (fragments) and anonymize each part independently [19].

9 PRIVACY AND SECURITY ASPECTS OF HEALTHCARE IN BIG DATA

The new wave of digitizing medical records has brought a paradigm shift in the healthcare industry. As a result, the healthcare industry is witnessing an increase in the absolute volume of data, as well as in its complexity, diversity and timeliness. The term “big data” refers to large and complex collections of data sets that exceed the computational, storage and communication capabilities of conventional methods or systems. In healthcare, several factors provide the necessary impetus to harness the power of big data. Harnessing the power of big data analysis, together with real-time access to patient records, could allow doctors to make informed decisions on treatments. Big data will oblige security to amend their predictive models. The real-time

overpowering businesses. Yet only a small percentage of data is actually analysed. In this paper, we have investigated the privacy challenges in big data by first identifying big data privacy requirements and then discussing whether existing privacy-preserving techniques are sufficient for big data processing. Privacy challenges in each phase of the big data life cycle [7] are presented along with the advantages and disadvantages of existing privacy-preserving technologies in the context of big data applications. This paper also presents traditional as well as recent techniques of privacy preserving in big data. In healthcare services as well, more efficient privacy techniques need to be developed.

REFERENCES

[1] Abadi DJ, Carney D, Cetintemel U, Cherniack M, Convey C, Lee S, Stonebraker M, Tatbul N, Zdonik SB. Aurora: a new model and architecture for data stream management. VLDB J. 2003;12(2):120–39.
[2] Kolomvatsos K, Anagnostopoulos C, Hadjiefthymiades S. An efficient time optimized scheme for progressive analytics in big data. Big Data Res. 2015;2(4):155–65.
[3] Big data at the speed of business, [online]. http://www-01.ibm.com/software/data/bigdata/2012.
[4] Manyika J, Chui M, Brown B, Bughin J, Dobbs R, Roxburgh C, Byers A. Big data: the next
Prof. 2011;15(1):2–3.
[13] Sokolova M, Matwin S. Personal privacy protection in time of big data. Berlin: Springer; 2015.
[14] Cheng H, Rong C, Hwang K, Wang W, Li Y. Secure big data storage and sharing scheme for cloud tenants. China Commun. 2015;12(6):106–15.
[15] Mell P, Grance T. The NIST definition of cloud computing. Natl Inst Stand Technol. 2009;53(6):50.
[16] Wei L, Zhu H, Cao Z, Dong X, Jia W, Chen Y, Vasilakos AV. Security and privacy for storage and computation in cloud computing. Inf Sci. 2014;258:371–86.
[17] Xu K, et al. Privacy-preserving machine learning algorithms for big data systems. In: Distributed