Achieving Availability
E-mail:[email protected]
Abstract--Cloud computing is an elastic execution environment of resources involving multiple stakeholders, such as users, providers, resellers and adopters, and provides a metered service at multiple fine granularities of data access for a specified level of quality of service. To be more specific, a cloud is a platform or infrastructure that enables the execution of code, services, applications, etc. in a managed and elastic fashion. The main objective of this paper is to achieve availability, elasticity and reliability of data access in cloud systems, in particular when users outsource sensitive data for sharing between owners and customers via cloud servers that are not within the same trusted domain as the data owners. Elasticity is an essential core feature of cloud systems and circumscribes the capability of the underlying infrastructure to adapt to changing, potentially non-functional requirements; it goes one step further than scalability, though, in that it also allows the dynamic integration and extraction of physical resources into and out of the infrastructure. Whilst from the application perspective this is identical to scaling, from the middleware management perspective it poses additional requirements, in particular regarding reliability. Reliability is essential for all cloud systems: in order to support today's data centre-type applications in a cloud, it is considered one of the main features needed to exploit cloud capabilities. It denotes the capability to ensure constant operation of the system without disruption, i.e. no loss of data, no code reset during execution, etc., and can be achieved through redundant resource utilisation; many of the reliability aspects thereby move from a hardware- to a software-based solution. There is a strong relationship between availability and reliability; reliability, however, focuses in particular on the prevention of loss (of data or execution progress). Availability of services and data is an essential capability of cloud systems and was actually one of the core aspects that gave rise to clouds in the first instance. Fault tolerance also requires the ability to introduce new redundancy in an online, non-intrusive manner. With increasing concurrent access, availability is particularly achieved through replication of data / services and distributing them across different resources to achieve load balancing. This can be regarded as the original essence of scalability in cloud systems.

Keywords--Availability, Cloud computing, Data communication, Elasticity, Reliability, Computer Network.

I. INTRODUCTION

Today a growing number of companies have to process huge amounts of data in a cost-efficient manner. Classic representatives of these companies are operators of Internet search engines, like Google, Yahoo, or Microsoft. The vast amount of data they have to deal with every day has made traditional database solutions prohibitively expensive [1]. Instead, these companies have popularized an architectural paradigm based on a large number of commodity servers. Problems like processing crawled documents or regenerating a web index are split into several independent subtasks, distributed among the available nodes, and computed in parallel. In order to simplify the development of distributed applications on top of such architectures, many of these companies have also built customized data processing frameworks. Examples are Google's MapReduce [2], Microsoft's Dryad [3], or Yahoo!'s Map-Reduce-Merge [4]. They can be classified by terms like high throughput computing (HTC) or many-task computing (MTC), depending on the amount of data and the number of tasks involved in the computation [5]. Although these systems differ in design, their programming models share similar objectives, namely hiding the hassle of parallel programming, fault tolerance, and execution optimizations from the developer. Developers can typically continue to write sequential programs. The processing framework then takes care of distributing the program among the available nodes and executes each instance of the program on the appropriate fragment of data.
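To illustrate the programming model these frameworks share, the following minimal Python sketch mimics the map and reduce phases of a word-count job on a single machine. The function names and the in-memory "shuffle" step are illustrative simplifications of our own, not the actual API of Hadoop or Google's MapReduce.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit (word, 1) for every word in one input fragment."""
    for word in document.split():
        yield (word.lower(), 1)

def reduce_phase(word, counts):
    """Reduce: aggregate all partial counts emitted for one key."""
    return (word, sum(counts))

def run_job(documents):
    # Shuffle: group intermediate pairs by key. A real framework does this
    # across nodes; here it happens in memory for illustration only.
    groups = defaultdict(list)
    for doc in documents:
        for word, count in map_phase(doc):
            groups[word].append(count)
    return dict(reduce_phase(w, c) for w, c in groups.items())

if __name__ == "__main__":
    print(run_job(["the cloud stores data", "the cloud scales"]))
```

The developer only writes the sequential map and reduce functions; distribution, scheduling and fault tolerance are left to the framework.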
For companies that only have to process large amounts of data occasionally, running their own data centre is obviously not an option. Instead, Cloud computing has emerged as a promising approach to rent a large IT infrastructure on a short-term pay-per-usage basis. Operators of so-called Infrastructure-as-a-Service (IaaS) clouds, like Amazon EC2 [6], let their customers allocate, access, and control a set of virtual machines (VMs) which run inside their data centres and only charge them for the period of time the machines are allocated. The VMs are typically offered in different types, each type with its own characteristics (number of CPU cores, amount of main memory, etc.) and cost. Since the VM abstraction of IaaS clouds fits the architectural paradigm assumed by the data processing frameworks described above, projects like Hadoop [7], a popular open source implementation of Google's MapReduce framework, have already begun to promote using their frameworks in the cloud. Only recently, Amazon has integrated Hadoop as one of its core infrastructure services [9].

However, instead of embracing its dynamic resource allocation, current data processing frameworks rather expect the cloud to imitate the static nature of the cluster environments they were originally designed for. E.g., at the moment the types and number of VMs allocated at the beginning of a compute job cannot be changed in the course of processing, although the tasks the job consists of might have completely different demands on the environment. As a result, rented resources may be inadequate for big parts of the processing job, which may lower the overall processing performance and increase the cost. In this paper we want to discuss the particular challenges and opportunities for efficient parallel data processing in clouds and present Nephele, a new processing framework explicitly designed for cloud environments. Most notably, Nephele is the first data processing framework to include the possibility of dynamically allocating/deallocating different compute resources from a cloud in its scheduling and during job execution. This paper is an extended version of [10]. It includes further details on scheduling strategies and extended experimental results.
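The idea of allocating and deallocating VMs per job stage, rather than holding a fixed pool for the whole job, can be sketched as follows. This is a hypothetical illustration of the concept only; the class and method names are invented for this example and are not Nephele's or any IaaS provider's API.

```python
from dataclasses import dataclass

@dataclass
class Stage:
    name: str
    vm_type: str     # e.g. "compute-optimized" or "memory-optimized"
    vm_count: int

class CloudDriver:
    """Stand-in for an IaaS interface: allocate and release VMs on demand."""
    def allocate(self, vm_type, count):
        print(f"allocating {count} x {vm_type}")
        return [f"{vm_type}-{i}" for i in range(count)]

    def release(self, vms):
        print(f"releasing {len(vms)} VMs")

def run_job(stages, driver):
    # Each stage gets exactly the resources it asked for and returns them
    # as soon as it finishes, so the job is only billed for what it uses.
    for stage in stages:
        vms = driver.allocate(stage.vm_type, stage.vm_count)
        try:
            print(f"running {stage.name} on {vms}")
        finally:
            driver.release(vms)

run_job([Stage("extract", "compute-optimized", 4),
         Stage("aggregate", "memory-optimized", 1)], CloudDriver())
```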
Several trends are opening up the era of Cloud Computing, which is an Internet-based development and use of computer technology. Ever cheaper and more powerful processors, together with the Software as a Service (SaaS) computing architecture, are transforming data centres into pools of computing services on a huge scale. Increasing network bandwidth and reliable yet flexible network connections make it even possible for users to subscribe to high-quality services from data and software that reside solely on remote data centres. Moving data into the cloud offers great convenience to users, since they do not have to care about the complexities of direct hardware management. The pioneering Cloud Computing offerings Amazon Simple Storage Service (S3) and Amazon Elastic Compute Cloud (EC2) [11] are both well-known examples. While these Internet-based online services do provide huge amounts of storage space and customizable computing resources, this computing platform shift is at the same time eliminating the responsibility of local machines for data maintenance. As a result, users are at the mercy of their cloud service providers for the availability and integrity of their data. The recent downtime of Amazon's S3 is such an example [12].

From the perspective of data security, which has always been an important aspect of quality of service, Cloud Computing inevitably poses new challenging security threats for a number of reasons. Firstly, traditional cryptographic primitives for the purpose of data security protection cannot be directly adopted due to the users' loss of control over their data under Cloud Computing. Therefore, verification of correct data storage in the cloud must be conducted without explicit knowledge of the whole data. Considering the various kinds of data each user stores in the cloud and the demand for long-term continuous assurance of their data safety, the problem of verifying the correctness of data storage in the cloud becomes even more challenging. Secondly, Cloud Computing is not just a third-party data warehouse. The data stored in the cloud may be frequently updated by the users, including insertion, deletion, modification, appending, reordering, etc. Ensuring storage correctness under dynamic data updates is hence of paramount importance. However, this dynamic feature also makes traditional integrity assurance techniques futile and entails new solutions. Last but not least, the deployment of Cloud Computing is powered by data centres running in a simultaneous, cooperated and distributed manner. An individual user's data is redundantly stored in multiple physical locations to further reduce the data integrity threats. Therefore, distributed protocols for storage correctness assurance will be of most importance in achieving a robust and secure cloud data storage system in the real world. However, such an important area remains to be fully explored in the literature.
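The requirement of verifying storage correctness without retrieving the whole data can be illustrated with a very simple spot-checking sketch: before outsourcing, the owner remembers digests of a few randomly chosen blocks and later challenges the server to return exactly those blocks. This is only a minimal sketch of the idea (the schemes cited in [13]-[16] use more sophisticated homomorphic tokens or signatures); the helper names are ours.

```python
import hashlib, os, secrets

BLOCK = 4096

def split_blocks(data):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def precompute_digests(data, samples=8):
    """Owner side, before outsourcing: remember digests of random blocks."""
    blocks = split_blocks(data)
    picks = secrets.SystemRandom().sample(range(len(blocks)), samples)
    return {i: hashlib.sha256(blocks[i]).hexdigest() for i in picks}

def audit(digests, fetch_block):
    """Challenge the server for each sampled block and compare digests."""
    return all(hashlib.sha256(fetch_block(i)).hexdigest() == d
               for i, d in digests.items())

data = os.urandom(32 * BLOCK)           # the outsourced file
digests = precompute_digests(data)      # kept locally, only a few KB

server_copy = data                                              # intact server
print(audit(digests, lambda i: split_blocks(server_copy)[i]))   # True

tampered = os.urandom(BLOCK) + data[BLOCK:]                     # block 0 corrupted
print(audit(digests, lambda i: split_blocks(tampered)[i]))      # False if block 0 was sampled
```

The owner only stores and transfers a handful of block digests, which is what makes auditing cheaper than downloading the whole file.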
Recently, the importance of ensuring remote data integrity has been highlighted by the following research works [13]–[14]. These techniques, while useful for ensuring storage correctness without having users possess the data, cannot address all the security threats in cloud data storage, since they all focus on the single-server scenario and most of them do not consider dynamic data operations. As a complementary approach, researchers have also proposed distributed protocols [15]–[16] for ensuring storage correctness across multiple servers or peers. Again, none of these distributed schemes is aware of dynamic data operations. As a result, their applicability in cloud data storage can be drastically limited.

Cloud Computing has been envisioned as the next-generation architecture of IT enterprise, due to its long list of unprecedented advantages in IT history: on-demand self-service, ubiquitous network access, location-independent resource pooling, rapid resource elasticity, usage-based pricing and transference of risk [17]. As a disruptive technology with profound implications, Cloud Computing is transforming the very nature of how businesses use information technology. One fundamental aspect of this paradigm shift is that data is being centralized or outsourced into the Cloud. From the users' perspective, including both individuals and IT enterprises, storing data remotely in the cloud in a flexible on-demand manner brings appealing benefits: relief of the burden of storage management, universal data access independent of geographical location, and avoidance of capital expenditure on hardware, software, and personnel maintenance, etc. [18]. While these advantages of using clouds are unarguable, due to the opaqueness of the Cloud (as separate administrative entities, the internal operation details of cloud service providers (CSP) may not be known by cloud users), data outsourcing also relinquishes the users' ultimate control over the fate of their data. As a result, the correctness of the data in the cloud is being put at risk for the following reasons. First of all, although the infrastructures under the cloud are much more powerful and reliable than personal computing devices, they still face a broad range of both internal and external threats to data integrity. Examples of outages and security breaches of noteworthy cloud services appear from time to time.

II. RELATED WORK

Existing work close to ours can be found in the areas of "shared cryptographic file systems" and "access control of outsourced data". In [19], Kallahalla et al. proposed Plutus as a cryptographic file system to secure file storage on untrusted servers. Plutus groups a set of files with similar sharing attributes as a file-group and associates each file-group with a symmetric lockbox-key. Each file is encrypted using a unique file-block key, which is further encrypted with the lockbox-key of the file-group to which the file belongs. If the owner wants to share a file-group, he just delivers the corresponding lockbox-key to the users. As the complexity of key management is proportional to the total number of file-groups, Plutus is not suitable for the case of fine-grained access control in which the number of possible "file-groups" could be huge.
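The grouping idea can be sketched in a few lines: every file gets its own file-block key, and each file-block key is wrapped with the group's lockbox-key, so sharing the whole group only requires handing over a single key. This is our own simplification using the Python cryptography package, not Plutus' actual construction.

```python
from cryptography.fernet import Fernet

class FileGroup:
    """One lockbox-key per group; one file-block key per file (Plutus-style sketch)."""
    def __init__(self):
        self.lockbox_key = Fernet.generate_key()
        self.files = {}           # name -> (wrapped file-block key, ciphertext)

    def add_file(self, name, plaintext):
        file_key = Fernet.generate_key()
        ciphertext = Fernet(file_key).encrypt(plaintext)
        wrapped = Fernet(self.lockbox_key).encrypt(file_key)   # wrap with the lockbox-key
        self.files[name] = (wrapped, ciphertext)

    def read_file(self, name, lockbox_key):
        wrapped, ciphertext = self.files[name]
        file_key = Fernet(lockbox_key).decrypt(wrapped)        # unwrap the file-block key
        return Fernet(file_key).decrypt(ciphertext)

group = FileGroup()
group.add_file("report.txt", b"quarterly numbers")
# Sharing the whole group only requires delivering group.lockbox_key once.
print(group.read_file("report.txt", group.lockbox_key))
```

The trade-off discussed above is visible here: finer-grained sharing means more groups, and therefore more lockbox-keys to manage.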
In [20], Goh et al. proposed SiRiUS, which is layered over existing file systems such as NFS but provides end-to-end security. For the purpose of access control, SiRiUS attaches to each file a metadata file that contains the file's access control list (ACL), each entry of which is the encryption of the file's file encryption key (FEK) under the public key of an authorized user. The extended version of SiRiUS uses the NNL broadcast encryption algorithm [21] to encrypt the FEK of each file instead of encrypting it with each individual user's public key. As the complexity of the user revocation solution in NNL is proportional to the number of revoked users, SiRiUS has the same complexity in terms of each metadata file's size and the encryption overhead, and is thus not scalable.
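The per-file metadata idea, one ACL entry per authorized user holding the FEK wrapped under that user's public key, can be sketched as follows. This is a simplification using RSA-OAEP from the Python cryptography package, not SiRiUS' exact format.

```python
import os
from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import rsa, padding

OAEP = padding.OAEP(mgf=padding.MGF1(hashes.SHA256()),
                    algorithm=hashes.SHA256(), label=None)

def make_metadata(fek, acl):
    """One entry per authorized user: the FEK encrypted under that user's public key."""
    return {user: pub.encrypt(fek, OAEP) for user, pub in acl.items()}

def recover_fek(metadata, user, private_key):
    return private_key.decrypt(metadata[user], OAEP)

alice = rsa.generate_private_key(public_exponent=65537, key_size=2048)
bob = rsa.generate_private_key(public_exponent=65537, key_size=2048)

fek = os.urandom(32)                                   # symmetric file encryption key
meta = make_metadata(fek, {"alice": alice.public_key(), "bob": bob.public_key()})
print(recover_fek(meta, "alice", alice) == fek)        # True

# Metadata size and re-encryption cost grow with the number of ACL entries,
# which is exactly the scalability issue discussed above.
```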
Ateniese et al. [22] proposed a secure distributed storage scheme based on proxy re-encryption. Specifically, the data owner encrypts blocks of content with symmetric content keys. The content keys are all encrypted with a master public key, which can only be decrypted by the master private key kept by the data owner. The data owner uses his master private key and the user's public key to generate proxy re-encryption keys, with which the semi-trusted server can then convert the ciphertext into one for a specific granted user and thus fulfil the task of access control enforcement. The main issue with this scheme is that collusion between a malicious server and any single malicious user would expose the decryption keys of all the encrypted data and compromise the data security of the system completely. In addition, user access privilege is not protected from the proxy server, and user secret key accountability is not supported either.

In [23], Vimercati et al. proposed a solution for securing data storage on untrusted servers based on key derivation methods [24]. In this scheme, each file is encrypted with a symmetric key and each user is assigned a secret key. To grant the access privilege for a user, the owner creates corresponding public tokens from which, together with his secret key, the user is able to derive the decryption keys of the desired files. The owner then transmits these public tokens to the semi-trusted server and delegates the task of token distribution to it. Given only these public tokens, the server is not able to derive the decryption key of any file. This solution introduces a minimal number of secret keys per user and a minimal number of encryption keys for each file. However, the complexity of the file creation and user grant/revocation operations is linear in the number of users, which makes the scheme unscalable. User access privilege accountability is also not supported.
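The token idea can be illustrated loosely with a keyed hash: a public token is the XOR of the file key with a pseudorandom value derived from the user's secret key, so the token together with the secret reveals the file key, while the token alone (which is all the semi-trusted server sees) reveals nothing. This is our own rough illustration of key derivation, not the actual construction of [23], [24].

```python
import hashlib, hmac, os

def derive(user_secret, file_id):
    """Pseudorandom value a user can compute from their own secret key."""
    return hmac.new(user_secret, file_id.encode(), hashlib.sha256).digest()

def make_token(user_secret, file_id, file_key):
    """Owner side: public token = file key XOR PRF(user secret, file id)."""
    pad = derive(user_secret, file_id)
    return bytes(a ^ b for a, b in zip(file_key, pad))

def derive_file_key(user_secret, file_id, token):
    """User side: recombine the public token with the secret key."""
    pad = derive(user_secret, file_id)
    return bytes(a ^ b for a, b in zip(token, pad))

alice_secret = os.urandom(32)            # one secret key per user
file_key = os.urandom(32)                # one symmetric key per file

token = make_token(alice_secret, "file-17", file_key)    # handed to the server
print(derive_file_key(alice_secret, "file-17", token) == file_key)   # True
# The server, knowing only `token`, cannot recover `file_key`.
```

The linear cost criticised above also shows here: granting a file to n users means creating and distributing n such tokens.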
III. BASIC CONCEPTS OF PAPER

A. Cloud computing
Since "clouds" do not refer to a specific technology, but to a general provisioning paradigm with enhanced capabilities, it is mandatory to elaborate on these aspects. There is currently a strong tendency to regard clouds as "just a new name for an old idea", which is mostly due to a confusion between the cloud concepts and the strongly related P/I/SaaS paradigms (see also II.A.2), but also due to the fact that similar aspects have already been addressed without the dedicated term "cloud" associated with them (see also II). This section specifies the concrete capabilities associated with clouds that are considered essential (required in any cloud environment) and relevant (ideally supported, but possibly restricted to specific use cases). We can thereby distinguish non-functional, economic and technological capabilities addressed, respectively to be addressed, by cloud systems.

Non-functional aspects: represent qualities or properties of a system, rather than specific technological requirements. Implicitly, they can be realized in multiple fashions and interpreted in different ways, which typically leads to strong compatibility and interoperability issues between individual providers, as they pursue their own approaches to realize their respective requirements, which strongly differ between providers [25]. Non-functional aspects are one of the key reasons why "clouds" differ so strongly in their interpretation (see also II.B).

Economic considerations: are one of the key reasons to introduce cloud systems in a business environment in the first instance. The particular interest typically lies in the reduction of cost and effort through outsourcing and/or automation of essential resource management. As has been noted in the first section, relevant aspects to consider thereby relate to the cut-off between loss of control and reduction of effort. With respect to hosting private clouds, the gain through cost reduction has to be carefully balanced against the increased effort to build and run such a system.

Technological challenges: implicitly arise from the non-functional and economic aspects when trying to realize them. As opposed to those aspects, technological challenges typically imply a specific realization – even though there may be no standard approach as yet and deviations may hence arise. In addition to these implicit challenges, one can identify additional technological aspects to be addressed by cloud systems, partially as a pre-condition to realize some of the high-level features, but partially also because they directly relate to specific characteristics of cloud systems.

Fig.1: Structure of the Cloud computing

A.1 Availability
Availability of services and data is an essential capability of cloud systems and was actually one of the core aspects that gave rise to clouds in the first instance. It lies in the ability to introduce redundancy for services and data so that failures can be masked transparently. Fault tolerance also requires the ability to introduce new redundancy (e.g. previously failed or fresh nodes) in an online manner, non-intrusively (without a significant performance penalty). With increasing concurrent access, availability is particularly achieved through replication of data / services and distributing them across different resources to achieve load balancing. This can be regarded as the original essence of scalability in cloud systems.
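A minimal sketch of what "replication plus load balancing" means in practice is given below: the same object is written to several nodes, reads are spread across the replicas round-robin, and a failed node is simply skipped, so the failure is masked from the client. The node and method names are illustrative only.

```python
import itertools

class Node:
    def __init__(self):
        self.data, self.alive = {}, True

class ReplicatedStore:
    """Replicate every object to all nodes; balance reads and mask node failures."""
    def __init__(self, nodes):
        self.nodes = nodes
        self._rr = itertools.cycle(range(len(nodes)))

    def put(self, key, value):
        for node in self.nodes:                  # write to every replica
            node.data[key] = value

    def get(self, key):
        for _ in range(len(self.nodes)):         # try replicas round-robin
            node = self.nodes[next(self._rr)]
            if node.alive:
                return node.data[key]
        raise RuntimeError("all replicas unavailable")

nodes = [Node(), Node(), Node()]
store = ReplicatedStore(nodes)
store.put("invoice-42", b"payload")
nodes[0].alive = False                           # one replica fails
print(store.get("invoice-42"))                   # still served: the failure is masked
```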
A.2 Elasticity
Elasticity is an essential core feature of cloud systems and circumscribes the capability of the underlying infrastructure to adapt to changing, potentially non-functional requirements, for example the amount and size of data supported by an application, the number of concurrent users, etc. [25]. One can distinguish between horizontal and vertical scalability, whereby horizontal scalability refers to the number of instances needed to satisfy, e.g., a changing amount of requests, and vertical scalability refers to the size of the instances themselves and thus, implicitly, to the amount of resources required to maintain that size. Cloud scalability involves both (rapid) up- and down-scaling. Elasticity goes one step further, though, and also allows the dynamic integration and extraction of physical resources into and out of the infrastructure. Whilst from the application perspective this is identical to scaling, from the middleware management perspective it poses additional requirements, in particular regarding reliability. In general, it is assumed that changes in the resource infrastructure are announced first to the middleware manager, but with large-scale systems it is vital that such changes can be handled automatically.
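Horizontal scaling, as distinguished above, can be illustrated by a toy controller that adds an instance when the average utilisation exceeds an upper threshold and removes one when it falls below a lower threshold. The thresholds, capacity figure and function names are illustrative assumptions, not any provider's auto-scaling rule.

```python
def desired_instances(current, total_requests_per_s,
                      per_instance_capacity=100,
                      upper=0.8, lower=0.3, minimum=1):
    """Toy horizontal-scaling rule: keep utilisation between `lower` and `upper`."""
    utilisation = total_requests_per_s / (current * per_instance_capacity)
    if utilisation > upper:
        return current + 1            # scale out
    if utilisation < lower and current > minimum:
        return current - 1            # scale in
    return current

instances = 2
for load in [150, 190, 260, 260, 90, 40]:      # requests per second over time
    instances = desired_instances(instances, load)
    print(f"load={load:>3} req/s -> {instances} instance(s)")
```

Vertical scaling would instead change `per_instance_capacity` (a bigger VM type), and elasticity in the sense used here additionally allows the physical resource pool behind these instances to grow and shrink.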
A.3 Reliability
Reliability is essential for all cloud systems: in order to support today's data centre-type applications in a cloud, reliability is considered one of the main features needed to exploit cloud capabilities. Reliability denotes the capability to ensure constant operation of the system without disruption, i.e. no loss of data, no code reset during execution, etc. Reliability is typically achieved through redundant resource utilisation. Interestingly, many of the reliability aspects move from hardware- to software-based solutions (redundancy in the file system vs. RAID controllers, stateless front-end servers vs. UPS, etc.). Notably, there is a strong relationship between availability (see A.1 above) and reliability – however, reliability focuses in particular on the prevention of loss (of data or execution progress).
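The shift from hardware to software redundancy can be sketched as a write that is only acknowledged once a majority of replicas has stored the data, so the loss of any single node loses neither data nor execution progress. This is a generic quorum-write pattern used here for illustration, not a specific product's mechanism.

```python
def replicated_write(replicas, key, value, quorum=None):
    """Acknowledge the write only if a majority of replicas stored it."""
    quorum = quorum or len(replicas) // 2 + 1
    acks = 0
    for replica in replicas:
        try:
            replica[key] = value          # a real system would also flush to disk here
            acks += 1
        except IOError:
            continue                      # tolerate individual replica failures
    if acks < quorum:
        raise RuntimeError(f"write failed: only {acks}/{quorum} acknowledgements")
    return acks

class FlakyReplica(dict):
    def __setitem__(self, key, value):
        raise IOError("replica down")

replicas = [dict(), dict(), FlakyReplica()]
print(replicated_write(replicas, "checkpoint", {"step": 7}))   # 2 acks, quorum met
```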
B. Data communication
The fundamental purpose of a communications system is the exchange of data between two parties. Data communications are the exchange of data between two devices via some form of transmission medium such as a wire cable [22]. For data communications to occur, the communicating devices must be part of a communication system made up of a combination of hardware (physical equipment) and software (programs). The effectiveness of a data communications system depends on four fundamental characteristics: delivery, accuracy, timeliness, and jitter.

Delivery: The system must deliver data to the correct destination. Data must be received by the intended device or user and only by that device or user.

Accuracy: The system must deliver the data accurately. Data that have been altered in transmission and left uncorrected are unusable.

Timeliness: The system must deliver data in a timely manner. Data delivered late are useless. In the case of video and audio, timely delivery means delivering data as they are produced, in the same order that they are produced, and without significant delay. This kind of delivery is called real-time transmission.

Jitter: Jitter refers to the variation in packet arrival time. It is the uneven delay in the delivery of audio or video packets. For example, let us assume that video packets are sent every 30 ms. If some of the packets arrive with a 30-ms delay and others with a 40-ms delay, the result is an uneven quality in the video.
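Jitter can be quantified from packet arrival timestamps as the variation of the inter-arrival gaps around their mean. A minimal sketch, assuming the 30 ms pacing of the example above (the metric used here is a simple mean absolute deviation, chosen for illustration):

```python
def inter_arrival_jitter(arrival_times_ms):
    """Mean absolute deviation of inter-arrival gaps from their average (ms)."""
    gaps = [b - a for a, b in zip(arrival_times_ms, arrival_times_ms[1:])]
    mean_gap = sum(gaps) / len(gaps)
    return sum(abs(g - mean_gap) for g in gaps) / len(gaps)

# Packets sent every 30 ms but delivered with uneven delay:
arrivals = [0, 30, 70, 100, 140, 170]        # gaps of 30, 40, 30, 40, 30 ms
print(inter_arrival_jitter(arrivals))        # 4.8 ms of jitter
```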
C. Computer network
A network is a collection of computers and devices interconnected by communication channels that facilitate communication among users and allow users to share resources [26]. Networks may be classified according to a wide variety of characteristics. A computer network allows sharing of resources and information among interconnected devices.

Computer networks can be used for a variety of purposes:

Facilitating communications: Using a network, people can communicate efficiently and easily via email, instant messaging, chat rooms, telephone, video telephone calls, and video conferencing.

Sharing hardware: In a networked environment, each computer on a network may access and use hardware resources on the network, such as printing a document on a shared network printer.

Sharing files, data, and information: In a network environment, authorized users may access data and information stored on other computers on the network. The capability of providing access to data and information on shared storage devices is an important feature of many networks.

Sharing software: Users connected to a network may run application programs on remote computers.

Fig.2: Cloud data storage architecture

[...] ones in this field to consider distributed data storage in Cloud Computing. Our contribution can be summarized as the following three aspects:

Compared to many of its predecessors, which only provide binary results about the storage state across the distributed servers, the challenge-response protocol in our work further provides the localization of data errors.

Unlike most prior works for ensuring remote data integrity, the new scheme supports secure and efficient dynamic operations on data blocks, including: update, delete and append.

Extensive security and performance analysis shows that the proposed scheme is highly efficient and resilient against Byzantine failure, malicious data modification attack, and even server colluding attacks.
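The error-localization property claimed above can be illustrated by challenging every server holding a copy of the same block and comparing the returned digests: a disagreement not only signals corruption but also identifies which server's copy is wrong. This is a simplified replica-based sketch of the idea, not the token-based construction itself; server names and data are made up for the example.

```python
import hashlib, os

def challenge(servers, block_index, nonce):
    """Ask every server for H(nonce || block) and report which ones disagree."""
    digests = {name: hashlib.sha256(nonce + store[block_index]).hexdigest()
               for name, store in servers.items()}
    majority = max(set(digests.values()), key=list(digests.values()).count)
    return [name for name, d in digests.items() if d != majority]

block = os.urandom(4096)
servers = {
    "server-A": [block],
    "server-B": [block],
    "server-C": [block[:-1] + bytes([block[-1] ^ 1])],   # silently corrupted copy
}

print(challenge(servers, 0, os.urandom(16)))   # ['server-C'] -> corruption localized
```

The fresh nonce in each challenge prevents a server from simply replaying a digest it computed before discarding or altering the data.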