
3  Cloud Architecture, Services and Storage
Syllabus
Layered cloud architecture design - NIST cloud computing reference architecture - Public, Private
and Hybrid clouds - IaaS - PaaS - SaaS - Architectural design challenges - Cloud storage -
Storage-as-a-Service - Advantages of cloud storage - Cloud storage providers - S3.

Contents
3.1 Cloud Architecture Design
3.2 NIST Cloud Computing Reference Architecture
3.3 Cloud Deployment Models
3.4 Cloud Service Models
3.5 Architectural Design Challenges
3.6 Cloud Storage
3.7 Storage as a Service
3.8 Advantages of Cloud Storage
3.9 Cloud Storage Providers
3.10 Simple Storage Service (S3)


3.1 Cloud Architecture Design


Cloud architecture design is an important aspect of designing a cloud. Simple, user-friendly
cloud services attract users and create a positive business impact, and the architecture design
plays a key role in making such services possible. Every cloud platform is intended to meet
four essential design goals : scalability, reliability, efficiency and virtualization. To achieve
these goals, certain requirements have to be considered. The basic requirements for cloud
architecture design are as follows :
 The cloud architecture design must provide automated delivery of cloud services
along with automated management.
 It must support the latest web standards, such as Web 2.0 and REST or RESTful
APIs (a minimal sketch of such an API call follows this list).
 It must support very large-scale HPC infrastructure with both physical and virtual
machines.
 The architecture of the cloud must be loosely coupled.
 It should provide easy access to cloud services through a self-service web portal.
 Cloud management software must efficiently receive user requests, find the
correct resources and then call the provisioning services that invoke those
resources in the cloud.
 It must provide enhanced security for shared access to resources from data
centers.
 It must use a cluster architecture to achieve system scalability.
 The cloud architecture design must be reliable and flexible.
 It must provide efficient performance and fast access.
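As a small illustration of the REST and self-service requirements above, the sketch below shows how a portal or script might call a provider's RESTful provisioning API using the third-party requests library. The endpoint, token and request body are hypothetical placeholders, not any real provider's API.

```python
# Hedged sketch of a RESTful self-service provisioning request.
# The URL, token and payload fields are illustrative placeholders.
import requests

API_BASE = "https://cloud.example.com/api/v1"   # hypothetical portal endpoint
TOKEN = "user-api-token"                        # assumed to be issued by the portal

payload = {
    "service": "virtual-machine",
    "cpu": 2,            # vCPUs requested
    "memory_gb": 4,      # RAM in GB
    "image": "ubuntu-22.04",
}

response = requests.post(
    f"{API_BASE}/provision",
    json=payload,
    headers={"Authorization": f"Bearer {TOKEN}"},
    timeout=30,
)
response.raise_for_status()
print("Provisioning request accepted:", response.json())
```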
Today's clouds are built to support a large number of tenants over shared resource
pools and large data volumes, so hardware and software both play an important role in
achieving this. Rapid development in multicore CPUs, memory chips and disk arrays has
made it possible to build data centers with huge volumes of storage space almost instantly,
while developments in software standards such as Web 2.0 and SOA have greatly helped
in developing cloud services. Service-Oriented Architecture (SOA) is a crucial component
used in the delivery of SaaS. The web service software detects each node server joining or
leaving and performs the appropriate tasks accordingly. Virtualization of the infrastructure
allows quick cloud delivery and recovery from disasters. In recent cloud platforms, resources are
built into data centers that are typically owned and operated by a third-party
provider. The next section explains the layered architecture design for a cloud platform.

3.1.1 Layered Cloud Architecture Design


The layered architecture of a cloud is composed of three basic layers called
infrastructure, platform and application. These three levels of architecture are
implemented with virtualization and standardization of cloud - provided hardware and
software resources. This architectural design facilitates public, private and hybrid cloud
services that are conveyed to users through networking support over the internet and the
intranets. The layered cloud architecture design is shown in Fig. 3.1.1.

Fig. 3.1.1 : Layered cloud architecture design

In the layered architecture, the foundation layer is the infrastructure layer, which is
responsible for providing the various Infrastructure as a Service (IaaS) components and
related services. It is the first layer to be deployed, before the platform and application
layers, so that IaaS services are available and the other two layers can run. The
infrastructure layer consists of virtualized services for computing, storage and networking.
It is responsible for provisioning infrastructure components such as compute (CPU and
memory), storage, network and I/O resources to run
virtual machines or virtual servers along with virtual storage. The abstraction of these
hardware resources is intended to provide flexibility to the users. Internally,
virtualization performs automated resource provisioning and optimizes the process of
managing resources. The infrastructure layer acts as the foundation for building the second
layer, the platform layer, which supports PaaS services.
The platform layer is responsible for providing a readily available development and
deployment platform for web applications to cloud users, without requiring them to
install anything on a local device. The platform layer has a collection of software tools for
developing, deploying and testing software applications. This layer provides an
environment for users to create their applications, test operation flows, track
performance and monitor execution results. The platform must ensure scalability,
reliability and security. In this layer, the virtualized cloud platform acts as
"application middleware" between the cloud infrastructure and the application layer of the cloud.
The platform layer is the foundation for the application layer.
A collection of all the software modules required for SaaS applications forms the
application layer. This layer is mainly responsible for on-demand application
delivery. The software in this layer includes day-to-day office management
applications for information collection, document processing, calendaring and
authentication. Enterprises also use the application layer extensively in business
marketing, sales, Customer Relationship Management (CRM), financial transactions and
Supply Chain Management (SCM). It is important to remember that not all cloud services
are limited to a single layer; many applications may require resources from mixed layers.
After all, the three layers are built bottom-up, with a relation of dependency between
them. From the perspective of the user, the services at the various levels require different
amounts of vendor support and resource management. In general, SaaS requires the
provider to do the most work, PaaS is in the middle and IaaS requires the least. A good
example of the application layer is Salesforce.com's CRM service, where the vendor supplies
not only the hardware at the bottom layer and the software at the top layer, but also the
platform and software tools for user application development and monitoring.

3.2 NIST Cloud Computing Reference Architecture


In this section, we will examine and discuss the reference architecture model given by
the National Institute of Standards and Technology (NIST). The model offers approaches
for secure cloud adoption while contributing to cloud computing guidelines and
standards.

The NIST team works closely with leading IT vendors, standards developers, industry
and other government agencies at a global level to support effective cloud computing
security standards and their further development. It is important to note that the NIST
cloud reference architecture is not tied to any specific vendor's products, services or
reference implementation, nor does it prevent further innovation in cloud technology.
The NIST reference architecture is shown in Fig. 3.2.1.

Fig. 3.2.1 : Conceptual cloud reference model showing different actors and entities

From Fig. 3.2.1, note that the cloud reference architecture includes five major actors :
 Cloud consumer
 Cloud provider
 Cloud auditor
 Cloud broker
 Cloud carrier
Each actor is an organization or entity that plays an important role in a transaction or
process, or performs some important task in cloud computing. The interactions between
these actors are illustrated in Fig. 3.2.2.

Fig. 3.2.2 : Interactions between different actors in a cloud

Now, understand that a cloud consumer can request cloud services directly from a
CSP or from a cloud broker. The cloud auditor independently audits and then contacts
other actors to gather information. We will now discuss the role of each actor in detail.

3.2.1 Cloud Consumer


A cloud consumer is the most important stakeholder : the cloud service is built to
support the cloud consumer. A cloud consumer is a person or organization that maintains
a business relationship with, and uses the services of, a cloud provider. The consumer
browses the service catalogue of the cloud provider, requests an appropriate service or
sets up a service contract for using the service, and is billed for the service used.
Some typical usage scenarios include :

Example 1 : The cloud consumer requests a service from the broker instead of contacting
the CSP directly. The cloud broker can then create a new service by combining
multiple services or by enhancing an existing service. Here, the actual cloud provider is
not visible to the cloud consumer; the consumer interacts only with the broker. This is
illustrated in Fig. 3.2.3.

Fig. 3.2.3 : Cloud broker interacting with cloud consumer

Example 2 : In this scenario, the cloud carrier provides for connectivity and transports
cloud services to consumers. This is illustrated in Fig. 3.2.4.

Fig. 3.2.4 : Scenario for cloud carrier

In Fig. 3.2.4, the cloud provider participates by arranging two SLAs : one with the
cloud carrier (SLA2) and one with the consumer (SLA1). The cloud provider will have an
arrangement (SLA) with the cloud carrier for secured, encrypted connections, which
ensures that the services are available to the consumer at a consistent level to fulfil service
requests. Here, the provider can specify requirements such as flexibility, capability and
functionality in SLA2 in order to fulfil the essential service requirements of SLA1.

Example 3 : In this usage scenario, the cloud auditor conducts independent evaluations
for a cloud service. The evaluations will relate to operations and security of cloud service
implementation. Here the cloud auditor interacts with both the cloud provider and
consumer, as shown in Fig. 3.2.5.

Fig. 3.2.5 : Usage scenario involving a cloud auditor

In all the given scenarios, the cloud consumer plays the most important role. Based on
the service request, the activities of the other actors and the usage scenarios can differ from
one cloud consumer to another. Fig. 3.2.6 shows an example of the available cloud service types.
In Fig. 3.2.6, note that SaaS applications are made available over a network to all consumers.
These consumers may be organisations with access to software applications, end users,
app developers or administrators. Billing is based on the number of end users, the time of
use, the network bandwidth consumed and the amount or volume of data stored.

Fig. 3.2.6 : Example of cloud services available to cloud consumers

PaaS consumers can utilize the tools, execution resources and development IDEs made
available by cloud providers. Using these resources, they can develop, test, deploy,
manage and configure many applications hosted on a cloud. PaaS consumers are
billed based on the processing, database, storage and network resources consumed, and on
the duration of platform usage.
On the other hand, IaaS consumers can access virtual computers, network-attached
storage, network components, processor resources and other computing resources on which
they can deploy and run arbitrary software. IaaS consumers are billed based on the amount
and duration of hardware resources consumed : the number of IP addresses, the volume of
data stored, network bandwidth, and CPU hours used over a certain duration.
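To make the pay-per-use idea concrete, the following sketch computes a simple monthly IaaS bill from instance-hours, stored data and transferred data. The unit rates are made-up numbers used only for illustration; real pricing varies by provider, region and resource type.

```python
# Illustrative IaaS billing arithmetic with assumed (not real) unit rates.
RATE_PER_INSTANCE_HOUR = 0.05   # currency units per VM-hour (assumed)
RATE_PER_GB_STORED = 0.02       # per GB-month of storage (assumed)
RATE_PER_GB_TRANSFER = 0.01     # per GB of network transfer (assumed)

def monthly_bill(instance_hours, gb_stored, gb_transferred):
    """Return the total charge for one month of IaaS usage."""
    compute = instance_hours * RATE_PER_INSTANCE_HOUR
    storage = gb_stored * RATE_PER_GB_STORED
    network = gb_transferred * RATE_PER_GB_TRANSFER
    return compute + storage + network

# Example : 2 VMs running 24x7 for 30 days, 500 GB stored, 200 GB transferred.
print(monthly_bill(instance_hours=2 * 24 * 30, gb_stored=500, gb_transferred=200))
```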

3.2.2 Cloud Provider


A cloud provider is an entity that offers cloud services to interested parties. A cloud
provider manages the infrastructure needed for providing cloud services, runs the
software that provides the services and organizes the delivery of those services to cloud
consumers through networks.
SaaS providers then deploy, configure, maintain and update all operations of the
software application on the cloud infrastructure, in order to ensure that services are
provisioned and to fulfill cloud consumer service requests. SaaS providers assume most
of the responsibilities associated with managing and controlling applications deployed on
the infrastructure. On the other hand, SaaS consumers have no or limited administrative
controls.
PaaS cloud providers manage the computing infrastructure and run the cloud software
that implements the platform : databases, an appropriate runtime software execution stack
and other required middleware elements. They support the development, deployment and
management activities of PaaS consumers by providing them with the necessary tools, such
as IDEs and SDKs. PaaS consumers have control over their applications and some settings
of the hosting environment, but have little or no control over the infrastructure lying under
the platform : the network, servers, OS and storage, which remain with the provider.
Now, the IaaS CSP aggregates physical cloud resources such as networks, servers,
storage and the network hosting infrastructure. The provider operates the cloud software and
makes all compute resources available to IaaS cloud consumers via a set of service
interfaces, such as VMs and virtual network interfaces. The IaaS cloud provider retains
control over the physical hardware and the cloud software that makes the provisioning of
these infrastructure services possible.
The main activities of a cloud provider can be viewed in Fig. 3.2.7.

Fig. 3.2.7 : Major activities of a cloud provider

The major activities of a cloud provider include :


 Service deployment : Service deployment refers to provisioning private, public,
hybrid and community cloud models.
 Service orchestration : Service orchestration implies the coordination and management
of the cloud infrastructure, and its arrangement to offer optimized cloud service
capabilities. These capabilities must be cost-effective in managing IT resources and must
be driven by strategic business needs.
 Cloud services management : This activity involves all service-related functions
needed to manage and operate the services requested or proposed by cloud
consumers.
 Security : Security, which is a critical function in cloud computing, spans all layers
in the reference architecture. Security must be enforced end-to-end. It has a wide
range from physical to application security. CSPs must take care of security.
 Privacy : Privacy in cloud must be ensured at different levels, such as user privacy,
data privacy, authorization and authentication and it must also have adequate
assurance levels. Since clouds allow resources to be shared, privacy challenges are a
big concern for consumers using clouds.

3.2.3 Cloud Auditor


The cloud auditor performs the task of independently evaluating cloud service
controls to provide an honest opinion when requested. Cloud audits are done to validate
standards conformance by reviewing the objective evidence. The auditor will examine
services provided by the cloud provider for its security controls, privacy, performance,
and so on.

3.2.4 Cloud Broker


The cloud broker collects service requests from cloud consumers and manages the use,
performance, and delivery of cloud services. The cloud broker will also negotiate and
manage the relationship between cloud providers and consumers. A cloud broker may
provide services that fall into one of the following categories :
 Service intermediation : Here the cloud broker improves some specific capability and
provides value-added services to cloud consumers.
 Service aggregation : The cloud broker links and integrates different services into
one or more new services.
 Service Arbitrage : This is similar to aggregation, except for the fact that services
that are aggregated are not fixed. In service arbitrage, the broker has the liberty to
choose services from different agencies.

3.2.5 Cloud Carrier


The cloud carrier establishes connectivity and transports cloud services between
a cloud consumer and a cloud provider. Cloud carriers offer network access to
consumers by providing telecommunication links for accessing resources from different
devices (laptops, computers, tablets, smartphones, etc.). Usually, the transport agent is a
telecommunication carrier that a business organization relies on to access resources.
The cloud provider sets up SLAs with the cloud carrier to ensure that the carrier's
transport is consistent with the level of service promised to consumers in the provider's own
SLAs. Cloud carriers provide secure, dedicated high-speed links to cloud providers and
between different cloud entities.

3.3 Cloud Deployment Models


Cloud deployment models are defined according to where the computing
infrastructure resides and who controls that infrastructure. NIST has classified cloud
deployment models into four categories, namely :
 Public cloud
 Private cloud
 Hybrid cloud
 Community cloud
These models describe the way in which users can access cloud services. Each
deployment model fits different organizational needs, so it is important to pick the
model that suits your organization. The four deployment models are characterized by the
functionality and accessibility of the cloud services they offer, and are shown in
Fig. 3.3.1.

Fig. 3.3.1 : Four deployment models of cloud computing

3.3.1 Public Cloud


Public cloud services run over the internet, so users who want to use them must have
an internet connection on their local device, such as a thin client, thick client, mobile
phone, laptop or desktop. Public cloud services are managed and maintained by Cloud
Service Providers (CSPs) or Cloud Service Brokers (CSBs), and they are often offered with
utility-based pricing such as subscription or pay-per-use models. The services are provided
through the internet and APIs, which allows users to access them easily without purchasing
any specialized hardware or software; any device with a web browser and internet
connectivity can be a public cloud client. Popular public cloud service providers are Amazon
Web Services, Microsoft Azure, Google App Engine, Salesforce, etc.

Advantages of public cloud


1. It saves the capital cost of purchasing server hardware, operating systems
and application software licenses.
2. There is no need for server administrators, as the servers are kept in the CSP's
data center and managed by the CSP.
3. No training is required to use or access the cloud services.
4. No upfront or setup cost is required.
5. A user gets easy access to multiple services under a single self-service portal.
6. Users have a choice to compare and select between providers.
7. It is cheaper than an in-house cloud implementation because users pay only for
what they use.
8. The resources are easily scalable.

Disadvantages of public cloud


1. Data security is weaker, as data is stored in a public data center managed by
third-party vendors, so users' confidential data may be compromised.
2. Recovery of backup data can be expensive.
3. Users never know where (at which location) their data is stored, how it can be
recovered or how many replicas of the data have been created.

3.3.2 Private Cloud


Private cloud services are used by organizations internally and most of the time run
over an intranet connection. They are designed for a single organization : anyone within
the organization can easily access data, services and web applications through local servers
and the local network, but users outside the organization cannot access them. Because these
services are hosted on the intranet, only users connected to that intranet can access them.
The infrastructure for a private cloud is fully managed and maintained by the organization
itself. A private cloud is much more secure than a public cloud, as it gives local administrators
the freedom to write their own security policies for user access, and it provides a good level
of trust and privacy to users. Private clouds are more expensive than public clouds due to
the capital expenditure involved in acquiring and maintaining them. Well-known private
cloud platforms are OpenStack, OpenNebula, Eucalyptus, VMware private cloud, etc.

Advantages of private cloud


1. Speed of access is very high, as services are provided through local servers over
the local network.
2. It is more secure than a public cloud, as the security of cloud services is handled by
local administrators.
3. It can be customized as per the organization's needs.
4. It does not require an internet connection for access.
5. It is easier to manage than a public cloud.

Disadvantages of private cloud

1. Implementation cost is very high, as the setup involves purchasing and installing
servers, hypervisors and operating systems.
2. It requires administrators for managing and maintaining the servers.
3. The scope of scalability is very limited.

3.3.3 Hybrid Cloud


Hybrid cloud services are composed of two or more clouds and offer the benefits
of multiple deployment models. A hybrid cloud mostly comprises an on-premise private
cloud and an off-premise public cloud, to leverage the benefits of both and to allow users
inside and outside the organization to access it. The hybrid cloud provides flexibility, in that
users can migrate their applications and services from the private cloud to the public cloud
and vice versa. It has become the most favoured model in the IT industry because of its
eminent features, such as mobility, customized security, high throughput, scalability,
disaster recovery, easy backup and replication across clouds, high availability and cost
efficiency. Popular hybrid clouds are AWS with Eucalyptus, AWS with VMware Cloud,
Google Cloud with Nutanix, etc.
The limitations of hybrid cloud include compatibility between deployment models, vendor
lock-in solutions, the need for common cloud management software and the management of
separate cloud platforms.

3.3.4 Community Cloud


The community cloud is basically a combination of one or more public, private or
hybrid clouds, shared by several organizations for a single cause. A community cloud is
set up between multiple organizations that share the same objective. Its infrastructure is
shared by several organizations within a specific community with common security and
compliance objectives, and it is managed either by a third-party organization or internally.
Well-known community clouds include the Salesforce community cloud, the Google
community cloud, etc.

3.3.5 Comparison between various Cloud Deployment Models


The comparison between the different deployment models of cloud computing is given
in Table 3.3.1.

Sr. No. | Feature            | Public Cloud             | Private Cloud                                        | Hybrid Cloud                                          | Community Cloud
1       | Scalability        | Very High                | Limited                                              | Very High                                             | Limited
2       | Security           | Less Secure              | Most Secure                                          | Very Secure                                           | Less Secure
3       | Performance        | Low to Medium            | Good                                                 | Good                                                  | Medium
4       | Reliability        | Medium                   | High                                                 | Medium to High                                        | Medium
5       | Upfront Cost       | Low                      | Very High                                            | Medium                                                | Medium
6       | Quality of Service | Low                      | High                                                 | Medium                                                | Medium
7       | Network            | Internet                 | Intranet                                             | Intranet and Internet                                 | Internet
8       | Availability       | For general public       | Organization's internal staff                        | For general public and organization's internal staff | For community members
9       | Example            | Windows Azure, AWS etc.  | OpenStack, VMware cloud, CloudStack, Eucalyptus etc. | Combination of OpenStack and AWS                      | Salesforce community

Table 3.3.1 : Comparison between various Cloud Deployment Models

3.4 Cloud Service Models


Cloud computing is meant to provide a variety of services and applications to users
over the internet or an intranet. The most widespread cloud services are categorised into
three service classes, which are called cloud service models, cloud reference models or
working models of cloud computing. They are based on the abstraction level of the offered
capabilities and the service model of the CSP. The various service models are :
 Infrastructure as a Service (IaaS)
 Platform as a Service (PaaS)
 Software as a Service (SaaS)
The three service models of cloud computing and their functions are shown in
Fig. 3.4.1.

Fig. 3.4.1 : Cloud service models

From Fig. 3.4.1, we can see that Infrastructure as a Service (IaaS) is the bottommost
layer in the model and Software as a Service (SaaS) lies at the top. IaaS has the lowest
level of abstraction and visibility, while SaaS has the highest.
The Fig. 3.4.2 represents the cloud stack organization from physical infrastructure to
applications. In this layered architecture, the abstraction levels are seen where higher
layer services include the services of the underlying layer.

Fig. 3.4.2 : The cloud computing stack

As you can see in Fig. 3.4.2, the three services, IaaS, PaaS and SaaS, can exist
independently of one another or may be combined with one another at some layers. The
different layers in each cloud computing model are managed either by the user or by the vendor
(provider). In the traditional IT model, all the layers are managed by the user, who is
solely responsible for hosting and managing the applications. In IaaS, the top five layers are
managed by the user, while the four lower layers (virtualisation, server hardware, storage
and networking) are managed by the vendor or provider; the user is therefore accountable
for managing the operating system and applications, along with databases and application
security. In PaaS, the user needs to manage only the application, and all the other layers of
the cloud computing stack are managed by the vendor. Lastly, SaaS abstracts the user from
all the layers : all of them are managed by the vendor and the user is responsible only for
using the application.
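The responsibility split just described can be summarised programmatically. The mapping below is a simplified illustration of the usual IaaS/PaaS/SaaS division; the exact layer names and boundaries are an assumption and vary between providers.

```python
# Who manages each layer of the stack in the three service models
# (a simplified illustration of the division described in the text).
STACK = ["application", "data", "runtime", "middleware", "OS",
         "virtualization", "servers", "storage", "networking"]

MANAGED_BY_USER = {
    "IaaS": {"application", "data", "runtime", "middleware", "OS"},
    "PaaS": {"application"},
    "SaaS": set(),   # the vendor manages everything; the user only uses the app
}

for model, user_layers in MANAGED_BY_USER.items():
    vendor_layers = [layer for layer in STACK if layer not in user_layers]
    print(f"{model}: user manages {sorted(user_layers) or 'nothing'}, "
          f"vendor manages {vendor_layers}")
```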
The core middleware manages the physical resources and the VMs are deployed on
top of them. This deployment will provide the features of pay-per-use services and multi-
tenancy. Infrastructure services support cloud development environments and provide
capabilities for application development and implementation. It provides different
libraries, models for programming, APIs, editors and so on to support application
development. When such a deployment is ready on the cloud, it can be used by end
users and organisations. With this idea, let us further explore the different service models.

3.4.1 Infrastructure as a Service (IaaS)


Infrastructure-as-a-Service (IaaS) can be defined as the use of servers, storage,
computing power, network and virtualization to form utility-like services for users. It is a
cloud service model that provides hardware resources virtualized in the cloud and delivers
virtual computing resources to users through a resource pool. In IaaS, the CSP owns all the
equipment, such as servers, storage disks and network infrastructure. Developers use the
IaaS service model to create virtual hardware on which their applications and/or services
are developed. In other words, an IaaS cloud provider creates a hardware utility service and
makes it available for users to provision virtual resources as needed. Developers can create
virtual private storage, virtual private servers and virtual private networks using IaaS, and
these private virtual systems contain the software applications that complete the IaaS
solution. The infrastructure of IaaS consists of communication networks, physical compute
nodes, storage solutions and the pool of virtualized computing resources managed by a
service provider. IaaS provides users with a web-based service that can be used to create,
destroy and manage virtual machines and storage. It is a way of delivering cloud computing
infrastructure, such as virtual servers, virtual storage, virtual networks and virtual operating
systems, as an on-demand service. Instead of purchasing extra servers, software, data center space or
network equipment, IaaS enables on-demand provisioning of computational resources in
the form of virtual machines in a cloud data center. Some key providers of IaaS are Amazon
Web Services (AWS), Microsoft Azure, GoGrid, Joyent, Rackspace etc., and some of the
private cloud software stacks through which IaaS can be set up are OpenStack, Apache
CloudStack, Eucalyptus and VMware vSphere.
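As a concrete, hedged illustration of IaaS provisioning, the sketch below uses the AWS SDK for Python (boto3) to launch a single virtual machine on Amazon EC2. The AMI ID and key pair name are placeholders, and valid AWS credentials are assumed to be configured on the machine running the script.

```python
# A minimal sketch of provisioning a virtual machine on an IaaS cloud
# using the AWS SDK for Python (boto3). Identifiers are placeholders.
import boto3

ec2 = boto3.resource("ec2", region_name="us-east-1")

instances = ec2.create_instances(
    ImageId="ami-0123456789abcdef0",   # placeholder AMI ID
    InstanceType="t2.micro",           # small general-purpose instance
    KeyName="my-key-pair",             # assumed existing key pair
    MinCount=1,
    MaxCount=1,
)
print("Launched instance:", instances[0].id)
```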
You must understand that in IaaS the virtualised resources are mapped to real systems :
when a user of an IaaS service makes a request to a virtual system, that request is redirected
to the physical server that does the actual work. The structure of the IaaS model is shown
in Fig. 3.4.3.

Fig. 3.4.3 : Components in the IaaS service model (Cloud Security Alliance)

In IaaS service delivery, the workload is the fundamental component of the virtualised
client. It simulates the capacity of a physical server to perform work; hence, the work done
is measured as the total number of Transactions Per Minute (TPM). Note that the workload
also has other attributes, such as disk I/O (measured in I/O operations per second), RAM
used in MB, latency, network throughput and others.
In the case of hosted applications, the client runs on a dedicated server inside a server
rack. It may also run on a standalone server. In cloud computing, the provisioned server is
known as an instance (or server instance), which is reserved by a customer along with
adequate computing resources to fulfil its resource requirements. The user reserves a
machine equivalent to that required to run the workloads.
The IaaS infrastructure runs the server instances in the data centre offering the
service. The resources for a server instance are drawn from a mix of virtualised
systems, RAID disks, and network and interface capacity. These are physical systems
partitioned into smaller logical units.
The client in IaaS is allocated its own private network. For example, Amazon EC2
makes each server appear to have its own separate network, unless the user creates a
virtual private cloud. If the EC2 deployment is scaled by adding additional networks on
the infrastructure, it is easy to scale logically, but this can create overhead as traffic gets
routed between logical networks.

In IaaS, the customer has control over the OS, storage and installed applications, but
has limited control over network components; the user cannot control the underlying
cloud infrastructure. Services offered by IaaS include web servers, server hosting,
computer hardware, OS, virtual instances, load balancing and bandwidth provisioning.
These services are useful during volatile demand, when computing resources are needed
for a new business launch, when the company does not want to buy hardware, or when
the organisation wants to expand.

3.4.2 Platform as a Service


Platform as a Service can be defined as a computing platform that allows users to create
web applications quickly and easily, without worrying about buying and maintaining the
software and infrastructure. Platform-as-a-Service provides tools for developing, deploying
and testing software, along with middleware solutions, databases, programming languages
and APIs, so that developers can build custom applications without installing or configuring
the development environment. PaaS provides a platform to run web applications without
installing them on a local machine, i.e. applications written by users can run directly on the
PaaS cloud. It is built on top of the IaaS layer. PaaS realizes many unique benefits, such as
utility computing, hardware virtualization, dynamic resource allocation, low investment
costs and a pre-configured development environment, and it has all the applications typically
required by the client deployed on it. The challenge associated with PaaS is compatibility : if
a user wants to migrate services from one provider to another, they must first check the
compatibility of the execution engine and the cloud APIs. Some key providers of PaaS clouds
are Google App Engine, Microsoft Azure, NetSuite and Red Hat OpenShift.
The PaaS model includes the software environment where the developer can create
custom solutions using the development tools available with the PaaS platform. The
components of a PaaS platform are shown in Fig. 3.4.4.

Fig. 3.4.4 : Components of PaaS

Platforms can support specific development languages, frameworks for applications and
other constructs. PaaS also provides tools and development environments to design
applications. Usually, a fully Integrated Development Environment (IDE) is available as a
PaaS service. For PaaS to be a cloud computing service, the platform supports user interface
development, and it supports standards such as HTML, JavaScript, rich media and so on.

In this model, users interact with the software to append and retrieve data, perform
actions, obtain results from a process task and perform other actions allowed by the
PaaS vendor. In this service model, the customer has no responsibility for maintaining
the hardware, the software or the development environment. The applications created
are the only interaction between the customer and the PaaS platform. The PaaS cloud
provider owns responsibility for all the operational aspects, such as maintenance,
updates, management of resources and the product lifecycle. A PaaS customer can control
services such as device integration, session management, content management, sandboxes
and so on. In addition, customer control is also possible over Universal Description,
Discovery and Integration (UDDI), a platform-independent Extensible Markup Language
(XML) based registry that allows registration and identification of web service apps.
Let us consider the example of Google App Engine.
The platform allows developers to program apps using Google's published APIs. On
this platform, Google defines the tools to be used within the development framework, the
file system structure and the data stores. A similar PaaS offering is provided by Force.com,
a vendor whose platform is based on the Salesforce.com development platform for the
latter's SaaS offerings; Force.com provides an add-on development environment.
In PaaS, developers can, for instance, build an app with Python and the Google API. Here,
the PaaS vendor offers a complete solution to the user; Google, acting as a PaaS vendor,
offers web service apps to users. Other examples are Google Earth, Google Maps, Gmail, etc.
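As a minimal sketch of what a PaaS-hosted application looks like, the following Python web app could be deployed to the Google App Engine standard environment (deployment also needs an app.yaml file declaring the Python runtime). The use of the Flask framework is an assumption; it must be listed in the project's requirements.

```python
# main.py - a minimal sketch of a web app for a PaaS such as Google App Engine.
# Names and configuration details here are illustrative.
from flask import Flask

app = Flask(__name__)

@app.route("/")
def hello():
    # The platform routes incoming HTTP requests to this handler.
    return "Hello from a PaaS-hosted application!"

if __name__ == "__main__":
    # Local testing only; on the platform a production WSGI server runs the app.
    app.run(host="127.0.0.1", port=8080, debug=True)
```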
PaaS has a few disadvantages. It locks the developer and the application into a
solution specific to the platform vendor. For example, an application developed in Python
using the Google API on Google App Engine might work only in that environment.
For this reason, PaaS may not be the best choice in the following situations :
 When the application must be portable.
 When proprietary programming languages are used.
 When the underlying hardware and software must be customized.
Major PaaS applications include software development projects where developers and
users collaborate to develop applications and automate testing services.
3.4.2.1 Power of PaaS

PaaS offers promising services and continues to offer a growing list of benefits. The
following are some standard features that come with a PaaS solution :

 Source code development : PaaS solutions provide the users with a wide range of
language choices including stalwarts such as Java, Perl, PHP, Python and Ruby.
 Websites : PaaS solutions provide environments for creating, running and
debugging complete websites, including user interfaces, databases, privacy and
security tools. In addition, foundational tools are also available to help developers
update and deliver new web applications to meet the fast-changing needs and
requirements of their user communities.
 Developer sandboxes : PaaS also provides dedicated “sandbox” areas for
developers to check how snippets of a code perform prior to a more formal test.
Sandboxes help the developers to refine their code quickly and provide an area
where other programmers can view a project, offer additional ideas and suggest
changes or fixes to bugs.
The advantages of PaaS go beyond relieving the overheads of managing servers,
operating systems and development frameworks. PaaS resources can be provisioned and
scaled quickly, within days or even minutes, because the organisation does not have to host
any infrastructure on premises. PaaS may also help organisations reduce costs through its
multitenancy model of cloud computing, which allows multiple entities to share the same
IT resources. The costs are also predictable, because the fees are negotiated in advance
for each month.
The following boosting features can empower a developer’s productivity, if efficiently
implemented on a PaaS site :
 Fast deployment : For organisations whose developers are geographically scattered,
seamless access and fast deployment are important.
 Integrated Development Environment (IDE) : PaaS must provide the developers
with Internet - based development environment based on a variety of languages,
such as Java, Python, Perl, Ruby etc., for scripting, testing and debugging their
applications.
 Database : Developers must be provided with access to data and databases. PaaS
must provision services such as accessing, modifying and deleting data.
 Identity management : Some mechanism for authentication management must be
provided by PaaS. Each user must have a certain set of permissions with the
administrator having the right to grant or revoke permissions.
 Integration : Leading PaaS vendors, such as Amazon, Google App Engine or
Force.com, provide integration with external or web-based databases and services.
This is important to ensure compatibility.

 Logs : PaaS must provide APIs to open and close log files, write and examine log
entries and send alerts for certain events. This is a basic requirement of application
developers irrespective of their projects.
 Caching : This feature can greatly boost application performance. PaaS must make
available a tool for developers to send a resource to the cache and to flush the cache
(a plain-Python sketch of the idea follows this list).
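The caching idea above can be illustrated in plain Python. This is not any vendor's caching API; it simply shows the "put a resource in the cache, read it back, flush it" pattern that a PaaS caching service exposes.

```python
import time

class SimpleCache:
    """A tiny in-memory cache with per-entry expiry (illustrative only)."""

    def __init__(self, ttl_seconds=60):
        self._ttl = ttl_seconds
        self._store = {}          # key -> (value, expiry timestamp)

    def set(self, key, value):
        self._store[key] = (value, time.time() + self._ttl)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.time() > expires_at:     # expired entry: drop it and report a miss
            del self._store[key]
            return None
        return value

    def flush(self):
        self._store.clear()

cache = SimpleCache(ttl_seconds=300)
cache.set("rendered/home-page", "<html>...</html>")
print(cache.get("rendered/home-page"))   # served from cache
cache.flush()                            # e.g. after the page is updated
```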
3.4.2.2 Complications with PaaS

PaaS can significantly affect an application's performance, availability and flexibility.
However, there are critical issues to consider. The following are some of the
complications or issues of using PaaS :
Interoperability : PaaS works best on each provider's own cloud platform, allowing
customers to get the most value out of the service. The risk, however, is that
customisations or applications developed in one vendor's cloud environment may not be
compatible with another vendor's environment, and hence may not migrate easily to it.
Although customers often accept being tied to a single vendor, this may not be the case
every time, and users may want to keep their options open. In this situation, developers
can opt for open-source solutions. Open-source PaaS provides flexibility by revealing the
underlying code and by allowing the PaaS solution to be installed on any infrastructure.
The disadvantage of using an open-source version of PaaS is that certain benefits of an
integrated platform are lost.
Compatibility : Most businesses have a restricted set of programming languages,
architectural frameworks and databases that they deploy. It is thus important to make
sure that the vendor you choose supports the same technologies. For example, if you are
strongly dedicated to a .NET architecture, then you must select a vendor with native .NET
support. Likewise, database support is critical to performance and minimising
complexity.
Vulnerability and security : Multitenancy spreads users over interconnected
hosts. Providers must take adequate security measures to protect these vulnerable hosts
from attacks, so that an attacker cannot easily access the resources of a host or of the
tenant objects on it.
Providers have the ability to access and modify user objects/systems. The following
are three ways in which the security of an object can be breached in PaaS systems :
 A provider may access any user object that resides on its hosts. This type of attack is
unavoidable, but it can be mitigated to some extent by a trusted relationship between
the user and the provider.

 Co-tenants, who share the same resources, may attack each other's objects.
 Third parties may attack a user object; objects need to be securely coded to defend
themselves.
Cryptographic methods, namely symmetric and asymmetric encryption, hashing and
signatures, are the solution to object vulnerability. It is the responsibility of providers to
protect the integrity and privacy of user objects on a host.
Vendor lock-in : Owing to the lack of standardisation, vendor lock-in becomes a
key barrier that stops users from migrating to cloud services. Technology-related solutions
are being built to tackle this problem. Most customers are unaware of
the terms and conditions of providers that prevent interoperability and portability of
applications. A number of strategies have been proposed on how to avoid or lessen lock-in
risks before adopting cloud computing.
Lock-in issues arise when a company decides to change cloud providers but is unable
to migrate its applications or data to a different vendor. This heterogeneity of cloud
semantics creates technical incompatibility, which in turn leads to interoperability and
portability challenges and makes interoperation, collaboration, portability and
manageability of data and services a very complex task.

3.4.3 Software as a Service


Software-as-a-Service is specifically designed for on-demand application or software
delivery to cloud users. It gives remote access to software that resides on a cloud server
rather than on the user's device, so users do not need to install the software locally : it is
provided remotely over the network. The consumer of a SaaS application requires only
thin-client software, such as a web browser, to access the cloud-hosted application. This
reduces the hardware requirements for end users and allows centralized control,
deployment and maintenance of the software.
SaaS provides a model for a complete infrastructure : it is viewed as a complete cloud
model in which the hardware, the software and the solution are all provided as a complete
service. SaaS can be described as software deployed on the cloud or on a hosted service,
accessed through a browser from anywhere over the internet. The user accesses the
software, but all the other aspects of the service are abstracted away from the user. Some
examples of popular SaaS applications are Google Docs, Hotmail, Salesforce and Gmail.
The structure of the SaaS system is illustrated in Fig. 3.4.5.

Fig. 3.4.5 : Structure of SaaS

SaaS provides the capability to use applications supplied by the service provider, but it
does not give the consumer control of the platform or the underlying infrastructure. Most
users are familiar with SaaS systems because they offer a substitute for local software;
examples are Google Calendar, Zoho Office Suite and Gmail.
SaaS applications come in many varieties, including custom software such as CRM
applications, helpdesk applications, HR applications, billing and invoicing applications and
so on. SaaS applications may not be fully customisable, but many of them provide APIs for
developers to create customised applications.
The APIs allow modifications to the security model, data schema, workflow
characteristics and other functionalities of the service as experienced by the user. A few
examples of API-enabled SaaS platforms include Salesforce.com, Quicken.com and
others. SaaS apps are delivered by CSPs, which implies that the user does not have a
hand in infrastructure management or individual app capabilities; rather, the SaaS apps
are accessed over a thin-client web interface. SaaS provides the following services :
 Enterprise - level services
 Web 2.0 applications including social networking, blogs, wiki servers, portal
services, metadata management and so on.
Some of the common characteristics found in SaaS applications are as follows :
 Applications deployed on SaaS are available over the internet and can be accessed
from any location.
 Software can be licensed based on subscriptions or billed based on usage, usually
on a recurring basis.
 The vendor monitors and maintains the software and the service.
 SaaS applications are cheaper because they reduce the cost of distribution and
maintenance. End - user costs are also reduced significantly.
 SaaS enables faster rollout, as features such as automatic rollouts, upgrades, patch
management and other tasks are easier to implement from a centralised system.
 SaaS applications can scale up or scale down based on demand, and they have a
lower barrier to entry than their locally installed competitors.
 All SaaS users have the same version of the software, and hence the issue of
compatibility is eliminated.
 SaaS has the capacity to support multiple users.
In spite of the above benefits, there are some drawbacks of SaaS. For example, SaaS is
not suited to applications that need a real-time response, or to cases where data cannot be
hosted externally.
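Although SaaS is normally consumed through a browser, many SaaS products also expose REST APIs for integration, as noted above. The sketch below shows the general pattern of a thin client calling such an API over HTTPS using the third-party requests library; the URL, token and field names are hypothetical.

```python
# Hedged sketch: reading records from a hypothetical SaaS CRM REST API.
import requests

BASE_URL = "https://crm.example-saas.com/api/v2"   # hypothetical SaaS endpoint
API_TOKEN = "subscription-api-token"               # assumed to be issued by the vendor

response = requests.get(
    f"{BASE_URL}/contacts",
    headers={"Authorization": f"Bearer {API_TOKEN}"},
    params={"limit": 10},     # ask for the first ten contact records
    timeout=30,
)
response.raise_for_status()
for contact in response.json().get("contacts", []):
    print(contact.get("name"), contact.get("email"))
```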

3.5 Architectural Design Challenges


The cloud architecture design plays an important role in making cloud services
successful in all aspects, but it still faces some challenges. The major challenges involved in
the architectural design of cloud computing are shown in Fig. 3.5.1 and explained below.

Fig. 3.5.1 : Architectural design challenges in cloud

3.5.1 Challenges related to Data Privacy, Compliance and Security Concerns


Presently, most cloud offerings run on public networks, which renders the
infrastructure more susceptible to attack. The most common attacks on the network
include buffer overflows, DoS attacks, spyware, malware, rootkits, trojan horses and worms.
With well-known technologies such as encrypted data, virtual LANs and network
middleboxes (firewalls, packet filters, etc.), many of these challenges can be addressed
immediately. Newer attacks in a cloud environment may come from hypervisor malware,
guest hopping and hijacking, or VM rootkits. Another form of attack, on VM migration, is
the man-in-the-middle attack. Passive attacks typically steal personal data or passwords,
while active attacks can exploit data structures in the kernel and cause significant damage
to cloud servers.

To protect against cloud attacks, users can encrypt their data before placing it in the
cloud. In many countries, there are laws that require SaaS providers to keep consumer data
and copyrighted material within national boundaries; these are known as compliance or
regulatory requirements. Many countries still do not have compliance laws, so it is important
to check the cloud service provider's SLA for how compliance is enforced for the services.
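A common client-side precaution mentioned above is to encrypt data before placing it in the cloud. The sketch below uses the third-party `cryptography` package (Fernet symmetric encryption) to illustrate the idea; key management in practice needs more care than shown here.

```python
# Encrypt data locally before uploading it to cloud storage
# (requires the third-party "cryptography" package).
from cryptography.fernet import Fernet

# Generate a key once and keep it safe; losing it makes the data unrecoverable.
key = Fernet.generate_key()
cipher = Fernet(key)

plaintext = b"confidential business records"
ciphertext = cipher.encrypt(plaintext)      # this is what goes to the cloud

# Later, after downloading the object back from the cloud:
recovered = cipher.decrypt(ciphertext)
assert recovered == plaintext
```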

3.5.2 Challenges related to Unpredictable Performance and Bottlenecks


In cloud computing, the cloud platform is responsible for deploying and running
services on top of a resource pool that shares hardware from different physical servers.
In a production environment, multiple Virtual Machines (VMs) share resources such as
CPU, memory, I/O and network with each other. Whenever I/O devices are shared between
VMs, provisioning becomes a big challenge because the I/O traffic is interleaved between
them; this can produce unpredictable performance and system bottlenecks. The problem
becomes worse when such I/O resources are pooled across cloud boundaries, where
accessibility complicates data placement and transport. To overcome this, data transfer
bottlenecks must be removed, bottleneck links must be widened and weak servers in the
cloud infrastructure should be removed. One solution to this challenge is to improve the I/O
architectures and operating systems used in the physical servers, so that interrupts and I/O
channels can be easily virtualized.

3.5.3 Challenges related to Service Availability and Vendor/Data Lock-in


Due to the popularity of cloud computing, many organizations run their mission-critical
or business-critical applications on the cloud, on shared infrastructure provided by cloud
service providers. Any compromise in service availability may therefore result in a huge
financial loss, and depending on a single enterprise cloud service often leads to single points
of failure. The solution to this challenge is to use multiple cloud providers : even if a
company has multiple data centers located in different geographic regions, it may still share
a common software infrastructure and accounting system, so spreading services across
multiple cloud providers gives more protection from failures and helps ensure high
availability for the organization. Distributed Denial
of Service (DDoS) attacks are another obstacle to availability. Criminals attempt to cut into
SaaS providers' revenue by making their services unavailable. Some utility computing
services give SaaS providers the ability to use quick scale-ups to protect themselves
against DDoS attacks.
In some cases, the lock-in concern arises from the failure of a single company that was
providing cloud storage, and because of the vendor lock-in solutions of some cloud service
providers, organizations face difficulties in migrating to a new provider. To mitigate the
challenges related to data lock-in and vendor lock-in, software stacks can be used to enhance
interoperability between various cloud platforms, and APIs can be standardized so that data
can be rescued if a single company fails. Standardization also supports "surge computing",
in which the same technological framework is used in both public and private clouds and
the public cloud is used to absorb additional tasks that cannot be performed efficiently in a
private cloud's data center.

3.5.4 Challenges related to Cloud Scalability, Interoperability and Standardization
In cloud computing, the pay-as-you-go model is a utility-based model in which the bill for
storage and network bandwidth is calculated according to the number of bytes used. Billing
for computation differs with the degree of virtualization : Google App Engine scales up and
down automatically in response to load and charges users for the cycles used, whereas
Amazon Web Services charges per VM instance by the hour, even when the machine is idle.
The opportunity here is to scale up and down quickly in response to load variation, in order
to save money without breaching SLAs. For virtualization, the Open Virtualization Format
(OVF) defines an open, secure, portable, efficient and extensible format for VM packaging
and distribution. It specifies a format for distributing software to be run in VMs, as well as a
transport mechanism for VM templates that can be applied to various virtualization
platforms with different virtualization levels.
This VM format does not depend on a particular host platform, virtualization platform or
guest operating system. The aim is to address virtual platform-agnostic packaging with
certification and integrity of the packaged software, and the package can support virtual
appliances that span more than one VM. In terms of cloud standardization, virtual appliances
should be able to run on any virtual platform, and VMs should be able to run on hypervisors
on heterogeneous hardware platforms. The cloud platform should also introduce live
cross-platform migration between x86
Intel and AMD technologies and support legacy load balancing hardware to avoid the
challenges related to interoperability.

3.5.5 Challenges related to Software Licensing and Reputation Sharing


Most cloud computing providers rely primarily on open source software, as the
commercial software licensing model is not well suited to utility computing. The main
opportunity is either to remain popular with open source, or to encourage commercial
software companies to adjust their licensing structures to better suit cloud computing; one
may consider using both pay-for-use and bulk licensing schemes to broaden the scope of the
business. Bad conduct by one client can also affect the reputation of the cloud as a whole.
For example, in AWS, spam-prevention services can hinder smooth VM installation by
blacklisting EC2 IP addresses. One improvement would be to offer reputation-guarding
services, similar to the "trusted e-mail" services currently provided for providers hosted on
smaller ISPs. Another legal issue concerns the transfer of legal responsibility : cloud
providers want legal liability to remain with the customer, and customers want the opposite.
This problem needs to be resolved at the SLA level.

3.5.6 Challenges related to Distributed Storage and Bugs in Softwares


In cloud applications, database services grow continuously. The opportunity is to
build a storage infrastructure that not only meets this growth but also combines it with the
cloud benefit of scaling up and down dynamically on demand. This calls for the design
of efficient distributed SANs, and the data centers must meet programmers' expectations
in terms of scalability, system reliability and HA. A major problem in cloud computing is
data consistency checking in SAN-connected data centers. Large-scale distributed bugs
cannot be easily reproduced, so debugging must take place at scale in the production data
centers, and hardly any data center offers that convenience. One solution may be to rely on
VMs in cloud computing : the virtualization layer can capture valuable information in ways
that are impossible without VMs. Debugging on simulators is another way to attack the
problem, provided the simulator is well designed.

3.6 Cloud Storage


With the rise in the popularity of cloud computing, you may be wondering where and how data is stored in the cloud. Cloud storage is the model in which digital data is stored in logical pools, that is, in an online repository. It is then the responsibility of the storage service provider to take care of the data files. Take the example of the email service you use, such as Gmail or Yahoo : the emails you send or receive are not stored on your local hard disk but are kept on the email provider's servers. None of that data resides on your local hard drives.
It is true that all computer owners store data. For many of these users, finding enough storage space to hold all the data they have accumulated seems like an impossible mission. Earlier, people stored information on the computer's hard drive or other local storage devices; today, this data can be saved in a remote database. The internet provides the connection between the computer and the database. Fig. 3.6.1 illustrates how cloud storage works.

Fig. 3.6.1 : The working of cloud storage

People may store their data on large hard drives or other external storage devices like
thumb drives or compact discs. But with cloud, the data is stored in a remote database.
Fig. 3.6.1 consists of a client computer, which has a bulk of data to be stored, and the control node of a third-party service provider, which manages several databases together. The cloud storage system has storage servers; the subscriber copies their files to these storage servers over the internet, and the servers record the data. If the client needs to retrieve the data, the client accesses the data server through a web-based interface, and the server either sends the files back to the client or allows the client to access and manipulate the data on the server itself.
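To make this upload and retrieve flow concrete, the short Python sketch below shows how a client might copy a file to a storage server and fetch it back over HTTP. The endpoint URL, the bearer-token header and the file names are purely hypothetical placeholders; a real provider defines its own web interface.

    import requests

    # Hypothetical storage endpoint and credential; a real provider defines its own API.
    BASE_URL = "https://storage.example-provider.com/files"
    AUTH = {"Authorization": "Bearer <access-token>"}   # placeholder token

    # Upload : the client copies a local file to the provider's storage servers.
    with open("report.pdf", "rb") as f:
        resp = requests.put(f"{BASE_URL}/report.pdf", data=f, headers=AUTH)
        resp.raise_for_status()

    # Retrieve : the server sends the stored copy back to the client on request.
    resp = requests.get(f"{BASE_URL}/report.pdf", headers=AUTH)
    resp.raise_for_status()
    with open("report-restored.pdf", "wb") as f:
        f.write(resp.content)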
Cloud storage is a service model in which data is maintained, managed and backed up remotely and made available to users over a network. Cloud storage provides extremely efficient storage of objects that scales to exabytes of data. It allows you to access data from any storage class instantly, integrate storage into your applications through a single unified API and optimize performance with ease. It is the responsibility of cloud storage providers
to keep the data available and accessible and to secure and run the physical environment.
Even though data is stored and accessed remotely, you can maintain data both locally and
on the cloud as a measure of safety and redundancy.
At its simplest, a cloud storage system requires just one data server connected to the internet. The client sends copies of its files to that data server, which records the information; when the client wants to retrieve the data, it accesses the server through a web-based interface, and the server either sends the files back or lets the client access and change the files on the server itself. In practice, however, cloud storage services use tens or hundreds of data servers. Because servers need maintenance or repair, it is important to store the same data on several machines, providing redundancy; without redundancy, cloud storage services could not guarantee clients that they would be able to access their information at any given time. There are two techniques used for storing data in the cloud, called cloud sync and cloud backup, which are explained as follows.

3.6.1 Difference between Cloud Sync and Cloud Backup


 Cloud sync : Cloud sync keeps the same, most up-to-date set of files and folders on client devices and in cloud storage. When you modify the data, sync uploads the updated files, which can then be manually downloaded by the user; this is one-way sync. In two-way sync, the cloud acts as the intermediate storage between devices. Cloud sync is suitable for organisations or people who regularly use multiple devices. Some cloud sync services are Dropbox, iCloud Drive, OneDrive, Box and Google Drive. These services match folders on your PC to folders on other machines or to the cloud, enabling clients to work from a folder or directory from anywhere (a rough sketch of this idea follows the list).
 Cloud backup : Sending a copy of the data over a public network to an off-site server is called cloud backup; it is handled by a third-party service provider. Some cloud backup services are IBackup, Carbonite, BackBlaze, etc. These services work automatically in the background; the client does not have to take any action, such as setting up folders. Backup services typically copy any new or changed data on your PC to another location.
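As a rough sketch of the one-way sync idea, assuming a simple polling approach, the Python fragment below re-uploads any file in a local folder whose modification time has changed since the last pass. The folder path, the polling interval and the upload_file helper are hypothetical stand-ins for whatever a real sync client would use.

    import os
    import time

    SYNC_DIR = "/home/user/SyncFolder"   # hypothetical local sync folder
    last_seen = {}                       # path -> modification time at last upload

    def upload_file(path: str) -> None:
        """Placeholder for the provider-specific upload call."""
        print("uploading", path)

    while True:
        for entry in os.scandir(SYNC_DIR):
            if entry.is_file():
                mtime = entry.stat().st_mtime
                # Upload only files that are new or modified since the last pass.
                if last_seen.get(entry.path) != mtime:
                    upload_file(entry.path)
                    last_seen[entry.path] = mtime
        time.sleep(30)   # poll every 30 seconds (interval chosen arbitrarily)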

3.7 Storage as a Service


Storage as a service is a good option for small or medium scale organisations that are not in a position to run their own storage infrastructure, have budget constraints or lack the technical personnel for storage implementation. It is an outsourcing model in which third-party providers rent space on their storage to end users who lack the capital budget to build it on their own. End users store their data on rented storage space at a remote location in the cloud. Storage as a service providers rent their storage space to organizations on a cost-per-gigabyte-stored or cost-per-data-transfer basis. The end users do not pay for the infrastructure; they pay only for how much data they transfer to and save on the provider's servers.
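As a rough illustration of this pricing model, the small Python sketch below estimates a monthly bill from the volume stored and the volume transferred. The rates used are purely hypothetical placeholders and do not correspond to any particular provider's prices.

    # Hypothetical cost model for a storage-as-a-service bill.
    # The rates below are illustrative assumptions, not real provider prices.
    STORAGE_RATE_PER_GB_MONTH = 0.02   # $ per GB stored per month (assumed)
    TRANSFER_RATE_PER_GB = 0.05        # $ per GB transferred out (assumed)

    def estimate_monthly_bill(stored_gb: float, transferred_gb: float) -> float:
        """Return the estimated monthly charge in dollars."""
        storage_cost = stored_gb * STORAGE_RATE_PER_GB_MONTH
        transfer_cost = transferred_gb * TRANSFER_RATE_PER_GB
        return storage_cost + transfer_cost

    # Example : 500 GB kept on the provider's servers, 40 GB downloaded this month.
    print(estimate_monthly_bill(500, 40))   # 500*0.02 + 40*0.05 = 12.0 dollars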
Storage as a service is thus a good alternative for small or mid-size businesses that lack the capital budget to implement and maintain their own storage infrastructure. The key providers of storage as a service include Amazon S3, Google Cloud Storage, Rackspace, Dell EMC, Hewlett Packard Enterprise (HPE), NetApp and IBM. It is also promoted as a way for all companies to mitigate their risks in disaster recovery, provide long-term retention of records and enhance both business continuity and availability. Small-scale enterprises find it difficult and costly to buy dedicated storage hardware for data storage and backup. This issue is addressed by storage as a service, a business model that lets small companies rent storage from large companies that have a wider storage infrastructure. It is also suitable when technical staff are not available or have insufficient experience to implement and manage a storage infrastructure.
Individuals as well as small companies can use storage as a service to save cost and manage backups; they save on hardware, personnel and physical space. Storage as a service is also called hosted storage, and the companies providing it are known as Storage Service Providers (SSPs). These vendors promote the model as a convenient way of managing backups in the enterprise, targeting secondary storage applications, and it also helps in mitigating the effects of disasters.
Storage providers are responsible for storing data of their customers using this model.
The storage provider provides the software required for the client to access their stored
data on cloud from anywhere and at any time. Customers use that software to perform
standard storage related activities, including data transfers and backups. Since storage as
a service vendors agree to meet SLAs, businesses can be assured that storage can scale
and perform as required. It can facilitate direct connections to both public and private
cloud storage.
In most instances, organizations that use storage as a service opt for the public cloud for storage and backup instead of keeping data on premises. The services provided by storage as a service include backup and restore, disaster recovery, block storage, SSD storage, object storage, cold storage and transmission of bulk data. Backup and restore refers to backing data up to the cloud, which provides protection and recovery when data loss occurs. Disaster recovery refers to protecting and replicating data from Virtual Machines (VMs) in case of disaster. Block storage allows customers to provision block storage volumes for lower-latency I/O. SSD storage is another type of storage generally used for data-intensive read/write and I/O operations. Object storage systems are used in data analytics, disaster recovery and cloud applications. Cold storage is a low-cost tier for data that is accessed infrequently. Bulk data transfers can use disks and other physical equipment to move large volumes of data.
There are many cloud storage providers available on the internet, but some of the
popular storage as a service providers are listed as follows :
 Google Drive - Google provides Google Drive as a storage service for every Gmail user, who can store up to 15 GB of data free of cost, scalable up to ten terabytes. It allows the use of Google Docs, embedded with the Google account, to upload documents, spreadsheets and presentations to Google's data servers.
 Microsoft OneDrive - Microsoft provides OneDrive with 5 GB of free storage space, scalable to 5 TB, for storing users' files. It is integrated with Microsoft 365 and Outlook mail. It synchronizes files between the cloud and a local folder, and provides client software for all major platforms so that files can be stored and accessed from multiple devices. It also backs up files with ransomware protection and allows users to recover previously saved versions of files from the cloud.
 Dropbox - Dropbox is a file hosting service that offers cloud storage, file synchronization, personal cloud and client software services. It can be installed and run on any OS platform. It provides 2 GB of free storage space, which can scale up to 5 TB.
 MediaMax and Strongspace - They offer rented storage space for any kind of
digital data to be stored on cloud servers.

3.7.1 Advantages of Storage as a Service


The key advantages of storage as a service are given as follows
 Cost - Storage as a service reduces much of the expense of conventional backup
methods, by offering ample cloud storage space at a small monthly charge.
 Invisibility - Storage as a service is invisible, as no physical presence can be seen in
its deployment, and therefore does not take up valuable office space.
 Security - In this type of service, data is encrypted both during transmission and at rest, preventing unauthorized access to the files.

 Automation - Storage as a service makes the time-consuming process of backup easier to accomplish through automation. Users simply select what they want to back up and when, and the service does the rest.
 Accessibility - By using storage as a service, users can access data from
smartphones, netbooks to desktops and so on.
 Syncing - Syncing in storage as a service ensures that your files are updated automatically across all of your devices. This way, the latest version of a file saved on your desktop is also available on your smartphone.
 Sharing - Online storage services make it easy for users to share their data with just
a few clicks.
 Collaboration - Cloud storage services are also ideal for collaborative purposes.
They allow multiple people to edit and collaborate in a single file or document. So,
with this feature, users don't need to worry about tracking the latest version or who
made any changes.
 Data protection - By storing data on cloud storage services, data is well protected
against all kinds of disasters such as floods, earthquakes and human error.
 Disaster recovery - Data stored in the cloud is not only protected from disasters by
having the same copy at several locations, but can also favor disaster recovery in
order to ensure business continuity.

3.7.2 Disadvantages of Storage as a Service


The disadvantages of storage as a service are given as follows
 Potential downtimes : Due to failures in the cloud, vendors may go through periods of downtime in which the service is not available; this can be a major issue for mission-critical data.
 Limited customization : As the cloud infrastructure is owned and managed by the
service provider, it is less customizable.
 Vendor lock-in : Due to potential for vendor lock-in, it may be difficult to migrate
from one service provider to another.
 Unreliable : In some cases, there is still a possibility that the system could crash and leave consumers with no means of accessing their stored data; a small service provider may become unreliable in that case. When a cloud storage system is unreliable, it becomes a liability, and no one wants to save data on an unstable platform or trust an organization that is unstable. Most cloud storage providers seek to resolve the issue of reliability through redundancy.

3.8 Advantages of Cloud Storage


In today's scenario, cloud storage is an extremely important and valuable tool for all kinds of businesses. It is therefore necessary to understand the benefits and risks associated with cloud storage, and we will now discuss some of them.
The following are the benefits of cloud storage :
 Accessibility : With the internet, clients can access their information from anywhere and at any time using devices such as smartphones, laptops and tablets. This reduces the effort of transferring files, and files remain the same across all devices. Cloud storage gives you the freedom to access your files from anywhere, at any time and on any device through an internet connection.
 Greater collaboration : Without wasting time, cloud storage enables you to transfer
or share files or folders in a simple and a quick way. It removes the pain of sending a
lot of emails to share files. This helps save your time and provides better
collaboration. Also, all the changes are automatically saved and shared with the
collaborators.
 Security : Security is a major concern when it comes to your confidential data. Cloud storage is secure, with various encryption techniques that prevent unauthorised access. Cloud storage providers complement their services with additional security layers. Since many users keep files in the cloud, these services go to great lengths to ensure that files are not accessed by anyone who is not authorized to do so.
 Cost - efficient : Cloud storage, which is an online repository, eliminates the cost of hard drives or other external devices such as compact discs, and organisations do not need to spend extra money on additional expensive servers. There is plenty of space in online storage, and physical storage can be more expensive than cloud storage, since cloud storage offers remarkably cheaper per-GB pricing without the need to buy storage hardware such as external drives.
 Instant data recovery : You can access your files in the cloud and recover them in case of a hard drive failure or some other hardware malfunction. The cloud serves as a backup for the data stored locally on your physical drives, allowing easy recovery of your original files and restoring them with minimal downtime.
 Syncing and updating : With cloud storage, any changes you make to a file are synchronized and updated across all of the devices from which you access the cloud.

 Disaster recovery : Companies are highly advised to have an emergency response plan ready in case of an emergency. Enterprises may use cloud storage as a backup service that keeps a second copy of critical files; such files are stored remotely and can be accessed through an internet connection.

3.8.1 Risks in Cloud Storage


The following are the risks in cloud storage :
 Dependency : It is also known as “vendor lock-in”. The term refers to the difficulty of moving from one cloud service provider to another, because of the effort involved in migrating data. Since services run in a remote virtual environment, the client has only restricted access to the underlying software and hardware, which gives rise to concerns about control.
 Unintended permanence : There have been scenarios in which cloud users complained that specific pictures were erased, as in the recent ‘iCloud hack’. Service providers are therefore under a full obligation to ensure that clients' information is not damaged or lost, and clients are urged to make full use of cloud backup facilities, so that copies of files can be recovered from the servers even if the client loses its local records.
 Insecure interfaces and APIs : To manage and interact with cloud services, various
interfaces and APIs are used by customers. Two categories of web - based APIs are
SOAP (based on web services) and REST (based on HTTP). These APIs are easy
targets for man-in-the-middle or replay attacks. Therefore, secure authentication,
encryption and access control must be used to provide protection against these
malicious attacks.
 Compliance risks : It is a risk for organisations that have earned certifications to
either meet industry standards or to gain the competitive edge when migrating to
clouds. This is a risk when cloud provider does not follow their own compliance
requirements or when the cloud provider does not allow the audit by the cloud
customer.

3.8.2 Disadvantages of Cloud Storage


 Privacy concerns : In cloud storage, the data no longer exists only on your physical disks; it is stored on a cloud platform run by cloud service providers. In many cases, cloud providers outsource the storage solution to other firms, and in such cases privacy concerns may arise due to the involvement of these third-party providers.
 Dependency on internet connection : Data files can only be moved to a cloud server while your internet connection is working. When your internet connection faces technical problems or stops functioning, you will face difficulties in transmitting data to, or recovering it from, the remote server.
 Compliance problems : Many countries restrict cloud service providers from moving their users' data across the country's geographic boundaries; providers that do so may be penalized, or their IT operations in that country may be shut down, which can lead to huge data loss. Therefore, one should never purchase cloud storage from an unknown source or third party, and should always buy from well-established companies. Depending on the degree of regulation within your industry, it might not even be possible to operate within the public cloud. This is particularly the case for healthcare, financial services and publicly traded enterprises, which need to be very cautious when considering this option.
 Vulnerability to attacks : With your business information stored in the cloud, the vulnerability to external hacking attacks is always present. The internet is not entirely secure, and for this reason sensitive data can still be stolen.
 Data management : Managing cloud data can be a challenge because cloud storage systems have their own structures. Your business's current storage management system may not always fit well with the system offered by the cloud provider.
 Data protection concerns : There are concerns about the remote storage of sensitive and essential data. Before adopting cloud technologies, you should be aware that you are giving a third-party cloud service provider confidential business details, which could potentially harm your firm. That is why it is crucial to choose a trustworthy service provider you trust to keep your information protected.

3.9 Cloud Storage Providers


The cloud storage provider, also known as the Managed Service Provider (MSP), is a
company that provides organizations and individuals with the ability to place and retain
data in an off - site storage system. Customers can lease cloud storage capacity per month
or on request. Cloud storage provider hosts customer data in its own data center,
providing cost - based computing, networking and storage infrastructure. Individual and
corporate customers can have unlimited storage capacity on the provider's servers at a
low per - gigabyte price. Instead of storing data on local storage devices, such as a hard
disk drive, flash storage or tape, customers choose a cloud storage provider to host data on a remote data center system. Users can then access these files via an internet
connection. The cloud storage provider also sells non - storage services at a fee.
Enterprises purchase computing, software, storage and related IT components as discrete cloud services with a pay-as-you-go license. Customers may choose to lease infrastructure as a service; platform as a service; or security, software and storage as a service. The level and type of services chosen are set out in a service level agreement signed with the provider. The ability to streamline costs by using the cloud can be particularly beneficial for small and medium-sized organizations with limited budgets and IT staff. The main advantages of using a cloud storage provider are cost control, elasticity and self-service. Users can scale computing resources on demand as needed and then discard those resources after the task has been completed. This removes any concerns about exceeding storage limitations with on-site networked storage. Some of the popular cloud storage providers are Amazon Web Services, Google, Microsoft and Nirvanix. Descriptions of these popular cloud storage providers are given as follows :
 Amazon S3 : Amazon S3 (Simple Storage Service) offers a simple cloud services
interface that can be used to store and retrieve any amount of data from anywhere
on the cloud at any time. It gives every developer access to the same highly scalable
data storage infrastructure that Amazon uses to operate its own global website
network. The goal of the service is to optimize the benefits of scale and to pass those
benefits on to the developers.
 Google Bigtable datastore : Google defines Bigtable as a fast and highly scalable
datastore. The google cloud platform allows Bigtable to scale through thousands of
commodity servers that can store petabytes of data together. Bigtable has been
designed with very high speed, versatility and extremely high scalability in mind.
The size of the Bigtable database can be petabytes, spanning thousands of
distributed servers. Bigtable is now open to developers as part of the Google app
engine, their cloud computing platform.
 Microsoft live mesh : Windows live mesh was a free-to-use internet - based file
synchronization application designed by Microsoft to enable files and directories
between two or more computers to be synchronized on Windows or Mac OS
platforms. It has support of mesh objects that consists of data feeds, which can be
represented in Atom, RSS, JSON, or XML. It uses live framework APIs to share any
data item between devices that recognize the data.

 Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with usage-based pricing. It supports Cloud-based Network Attached Storage (CloudNAS) to store data on premises. Nirvanix CloudNAS is intended for
businesses that manage archival, backup or unstructured archives that need long -
term, secure storage, or organizations that use automated processes to migrate files
to mapped drives. The CloudNAS has built - in disaster data recovery and
automatic data replication feature for up to three geographically distributed storage
nodes.

3.10 Simple Storage Service (S3)


Amazon S3 offers a simple web services interface that can be used to store and retrieve
any amount of data from anywhere, at any time on the web. It gives any developer access
to the same scalable, secure, fast, low - cost data storage infrastructure that Amazon uses
to operate its own global website network. S3 is an online backup and storage system. Its high-speed data transfer feature, known as AWS Import/Export, exchanges data with AWS by shipping portable storage devices and moving the data over Amazon's own internal network rather than over the internet.
Amazon S3 is a cloud-based storage system that stores data objects ranging from 1 byte up to 5 TB in size (objects larger than 5 GB must be uploaded in multiple parts) within a flat namespace. The storage containers in S3 are called buckets. A bucket serves the function of a directory, although there is no object hierarchy within a bucket, and the user saves objects, not files, to it. It is important to note that the concept of a file system is not associated with S3, because file systems are not supported; only objects are stored. In addition, the user is not required to mount a bucket, as is done with a file system. Fig. 3.10.1 shows S3 diagrammatically.

Fig. 3.10.1 : AWS S3

S3 system allows buckets to be named (Fig. 3.10.2), but the name must be unique in the
S3 namespace across all consumers of AWS. The bucket can be accessed through the S3
web API (with SOAP or REST), which is similar to a normal disk storage system.

Fig. 3.10.2 : Source bucket

The performance of S3 makes it best suited to non-operational functions such as data archiving, retrieval and disk backup. The REST API is preferred over the SOAP API because it is easier to work with large binary objects through REST.
Amazon S3 offers large volumes of reliable storage with high protection and low
bandwidth access. S3 is most ideal for applications that need storage archives. For
example, S3 is used by large storage sites that share photos and images.
The APIs to manage buckets have the following features (a short illustrative sketch follows the list) :
 Create new, modify or delete existing buckets.
 Upload or download new objects to a bucket.
 Search and identify objects in buckets.
 Identify metadata associated with objects and buckets.
 Specify where the bucket is stored.
 Provide public access to buckets and objects.
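The sketch below illustrates a few of these bucket and object operations using the boto3 Python SDK for AWS. The bucket name, key, region and metadata are hypothetical, and AWS credentials are assumed to be already configured in the environment; this is an illustrative sketch, not a complete application.

    import boto3

    # Hypothetical names used only for illustration.
    BUCKET = "example-notes-bucket-2024"   # must be globally unique across all of AWS
    KEY = "reports/monthly.txt"

    s3 = boto3.client("s3", region_name="ap-south-1")   # region is an assumption

    # Create a new bucket (outside us-east-1, a location constraint is required).
    s3.create_bucket(
        Bucket=BUCKET,
        CreateBucketConfiguration={"LocationConstraint": "ap-south-1"},
    )

    # Upload (PUT) an object into the bucket, with some user-defined metadata.
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"monthly backup data",
                  Metadata={"department": "accounts"})

    # List and identify objects stored in the bucket.
    for obj in s3.list_objects_v2(Bucket=BUCKET).get("Contents", []):
        print(obj["Key"], obj["Size"])

    # Read back the metadata associated with an object.
    head = s3.head_object(Bucket=BUCKET, Key=KEY)
    print(head["Metadata"])

    # Download (GET) the object and delete it when it is no longer needed.
    body = s3.get_object(Bucket=BUCKET, Key=KEY)["Body"].read()
    s3.delete_object(Bucket=BUCKET, Key=KEY)

Deleting the bucket itself (s3.delete_bucket) is possible only once the bucket is empty.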
The S3 service can be used by many users as a backup component in a 3-2-1 backup
method. This implies that your original data is 1, a copy of your data is 2 and an off-site
copy of data is 3. In this method, S3 is the 3rd level of backup. In addition to this, Amazon
S3 provides the feature of versioning.

In versioning, every version of an object stored in an S3 bucket is retained, but for this the user must enable the versioning feature on the bucket. Any HTTP or REST operation, namely PUT, POST, COPY or DELETE, creates a new version of the object that is stored alongside the older versions. A GET operation retrieves the newest version of the object, but the ability to recover older versions and undo actions is also available. Versioning is a useful method for preserving and archiving data.
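A minimal sketch of how versioning might be enabled and used through the boto3 SDK is given below; the bucket and key names are the same hypothetical ones used earlier, and credentials are assumed to be configured.

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-notes-bucket-2024"   # hypothetical bucket name
    KEY = "reports/monthly.txt"

    # Enable versioning on the bucket (it is disabled by default).
    s3.put_bucket_versioning(
        Bucket=BUCKET,
        VersioningConfiguration={"Status": "Enabled"},
    )

    # Each PUT now creates a new version instead of overwriting the object.
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"first draft")
    s3.put_object(Bucket=BUCKET, Key=KEY, Body=b"revised draft")

    # List all retained versions of the object.
    versions = s3.list_object_versions(Bucket=BUCKET, Prefix=KEY)
    for v in versions.get("Versions", []):
        print(v["VersionId"], v["IsLatest"])

    # A plain GET returns the latest version; passing VersionId recovers an older one.
    old_id = versions["Versions"][-1]["VersionId"]
    old_body = s3.get_object(Bucket=BUCKET, Key=KEY, VersionId=old_id)["Body"].read()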

3.10.1 Amazon Glacier


Amazon Glacier is a very low-priced online file storage web service that offers secure, flexible and durable storage for online data backup and archiving. The service is specially designed for data that is not accessed frequently; data for which a retrieval time of three to five hours is acceptable is a good fit for Amazon Glacier.
You can store virtually any type, format and amount of data using Amazon Glacier. Files in ZIP and TAR format are the most common type of data stored in Amazon Glacier.
Some of the common uses of Amazon Glacier are :
 Replacing the traditional tape solutions with backup and archive which can last
longer.
 Storing data which is used for the purposes of compliance.

3.10.2 Glacier Vs S3
Both amazon S3 and amazon glacier work almost the same way. However, there are
certain important aspects that can reflect the difference between them. Table 3.10.1 shows
the comparison of amazon glacier and amazon S3 :
Amazon Glacier                                             | Amazon S3
It supports 40 TB archives                                 | It supports 5 TB objects
It is recognised by archive IDs which are system generated | It can use “friendly” key names
It encrypts the archives automatically                     | It is optional to encrypt the data automatically
It is extremely low - cost storage                         | Its cost is much higher than Amazon Glacier

Table 3.10.1 : Amazon Glacier Vs Amazon S3

You can also use the Amazon S3 interface to avail the offerings of Amazon Glacier without needing to learn a new interface. This is done by using Glacier as an S3 storage class together with object lifecycle policies, as illustrated in the sketch below.
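A minimal boto3 sketch of such a lifecycle policy is shown below. The bucket name, the archive/ prefix and the 90-day transition period are assumptions chosen only for illustration.

    import boto3

    s3 = boto3.client("s3")
    BUCKET = "example-notes-bucket-2024"   # hypothetical bucket name

    # Lifecycle rule : objects under the archive/ prefix move to the Glacier
    # storage class 90 days after creation (prefix and period are assumptions).
    s3.put_bucket_lifecycle_configuration(
        Bucket=BUCKET,
        LifecycleConfiguration={
            "Rules": [
                {
                    "ID": "move-old-archives-to-glacier",
                    "Filter": {"Prefix": "archive/"},
                    "Status": "Enabled",
                    "Transitions": [{"Days": 90, "StorageClass": "GLACIER"}],
                }
            ]
        },
    )

    # Objects can also be written to the Glacier storage class directly via the S3 API.
    s3.put_object(Bucket=BUCKET, Key="archive/2023-logs.tar",
                  Body=b"...", StorageClass="GLACIER")

With such a rule in place, the objects remain listed in the bucket through the normal S3 API, although retrieving a Glacier-class object first requires a restore request.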

Summary

 The cloud architecture design is the important aspect while designing a cloud.
Every cloud platform is intended to provide four essential design goals like
scalability, reliability, efficiency, and virtualization. To achieve this goal, certain
requirements has to be considered.
 The layered architecture of a cloud is composed of three basic layers called
infrastructure, platform, and application. These three levels of architecture are
implemented with virtualization and standardization of cloud-provided
hardware and software resources.
 The NIST cloud computing reference architecture is designed with the help of IT vendors, developers of standards, governmental agencies and industries at a global level, to support effective cloud computing security standards and their further development.
 Cloud deployment models are defined according to where the computing infrastructure resides and who controls it. Four deployment models are characterized based on the functionality and accessibility of cloud services, namely public, private, hybrid and community.
 Public cloud services run over the internet, so users who want them must have an internet connection on their local device. Private cloud services are used by organizations internally and most of the time run over an intranet connection. Hybrid cloud services are composed of two or more clouds and offer the benefits of multiple deployment models, while a community cloud is basically a combination of one or more public, private or hybrid clouds shared by many organizations for a single cause.
 The most widespread services of cloud computing are categorised into three
service classes which are also called Cloud service models namely IaaS, PaaS
and SaaS.
 Infrastructure-as-a-Service (IaaS) can be defined as the use of servers, storage,
computing power, network and virtualization to form utility like services for
users, Platform as a Service can be defined as a computing platform that allows
the user to create web applications quickly and easily and without worrying
about buying and maintaining the software and infrastructure while Software-
as-a-Service is specifically designed for on demand applications or software
delivery to the cloud users.
 There are six architectural design challenges in cloud computing, relating to data privacy, security, compliance, performance, interoperability, standardization, service availability, licensing, data storage and software bugs.

 Cloud storage is a service model in which data is maintained, managed and backed up remotely and made available to users over a network. Cloud storage provides extremely efficient storage of objects that scales to exabytes of data.
 Storage as a service is an outsourcing model which allows third-party providers to rent space on their storage to end users who lack the capital budget to pay for it on their own.
 The cloud storage provider, also known as the Managed Service Provider (MSP),
is a company that provides organizations and individuals with the ability to
place and retain data in an off-site storage system.
 Amazon S3 offers a simple web services interface that can be used to store and
retrieve any amount of data from anywhere, at any time on the web. It gives any
developer access to the same scalable, secure, fast, low-cost data storage
infrastructure that Amazon uses to operate its own global website network.

Short Answered Questions

Q.1 Bring out differences between private cloud and public cloud. AU : Dec.-16
Ans. : The differences between private cloud and public cloud are given in Table 3.1.

Sr. No | Feature            | Public Cloud            | Private Cloud
1      | Scalability        | Very High               | Limited
2      | Security           | Less Secure             | Most Secure
3      | Performance        | Low to Medium           | Good
4      | Reliability        | Medium                  | High
5      | Upfront Cost       | Low                     | Very High
6      | Quality of Service | Low                     | High
7      | Network            | Internet                | Intranet
8      | Availability       | For General Public      | Organizations Internal Staff
9      | Example            | Windows Azure, AWS etc. | Openstack, VMware Cloud, CloudStack, Eucalyptus etc.

Table 3.1 : Comparison between various cloud deployment models

Q.2 Why do we need hybrid cloud ? AU : Dec.-16

Ans. : Hybrid cloud services are composed of two or more clouds and offer the benefits of multiple deployment models. A hybrid cloud mostly comprises an on-premise private cloud and an off-premise public cloud, to leverage the benefits of both and allow users inside and outside the organization to access it. The hybrid cloud provides flexibility, such that users can migrate their applications and services from the private cloud to the public cloud and vice versa. It has become one of the most favored models in the IT industry because of features like mobility, customized security, high throughput, scalability, disaster recovery, easy backup and replication across clouds, high availability and cost efficiency. The other benefits of hybrid cloud are :
 Easy accessibility between private cloud and public cloud, with a plan for disaster recovery.
 We can take a decision about what needs to be shared on public network and what
needs to be kept private.
 Get unmatched scalability as per demand.

 Easy to control and manage public and private cloud resources.

Q.3 Write a short note on community cloud. AU : Dec.-18


Ans. : Refer section 3.3.4.

Q.4 Summarize the differences between PaaS and SaaS. AU : May-17


Ans. : The differences between PaaS and SaaS are given as follows.

Platform as a Service (PaaS) | Software as a Service (SaaS)
It is used for providing a platform to develop, deploy, test or run web applications quickly and easily without worrying about buying and maintaining the software and infrastructure. | It is used for on demand software or application delivery over the internet or intranet.
It is used for web hosting. | It is used for software or application hosting.
It provides tools for development, deployment and testing the software, along with middleware solutions, databases and APIs for developers. | It provides a hosted software stack to the users, from which they can get access to particular software at any time over the network.
It is used by developers. | It is used by end users.
The abstraction in PaaS is moderate. | The abstraction in SaaS is very high.
It has a significantly lower degree of control than SaaS. | It has a higher degree of control than PaaS.
Risk of vendor lock-in is medium. | Risk of vendor lock-in is very high.
Operational cost is lower than IaaS. | Operational cost is very minimal compared to IaaS and PaaS.
It has lower portability than IaaS. | It doesn't provide portability.
Examples : AWS Elastic Beanstalk, Windows Azure, Heroku, Force.com, Google App Engine, Apache Stratos, OpenShift | Examples : Google Apps, Dropbox, Salesforce, Cisco WebEx, Concur, GoToMeeting

Q.5 Who are the major players in the cloud ? AU : May-19


Ans. : There are many major players who provide cloud services; some of them, along with the services they support, are given in Table 3.2.
Sr. No. | Name of cloud service provider | Supported services | Deployment model
1) | Amazon Web Service (AWS) | Infrastructure as a service using EC2, Platform as a service using Elastic Beanstalk, Database as a service using RDS, Storage as a service using S3, Network as a service using Pureport, Containers as a service using Amazon Elastic Container Service, Serverless computing using Lambda etc. | Public cloud
2) | Openstack | Infrastructure as a service using Nova, Platform as a service using Solum, Database as a service using Trove, Network as a service using Neutron, Big data as a service using Sahara etc. | Private cloud
3) | Google cloud platform | Infrastructure as a service using Google Compute Engine, Platform as a service using Google App Engine, Software as a service using Google Docs, Gmail and Google Suite, Database as a service using Cloud SQL, Containers as a service using Kubernetes, Serverless computing using functions as a service, Big data as a service using BigQuery, Storage as a service using Google Cloud Storage etc. | Public cloud
4) | Microsoft Azure | Infrastructure as a service using Azure virtual machines, Platform as a service using Azure app services, Database as a service using Azure SQL, Storage as a service using Azure Blob storage, Containers as a service using Azure Kubernetes Service, Serverless computing using Azure Functions etc. | Public cloud
5) | Salesforce | Software as a service | Public cloud
6) | Oracle Cloud | Infrastructure as a service using Oracle Cloud Infrastructure (OCI), Platform as a service using Oracle application container, Storage as a service using Oracle Cloud Storage (OCI), Containers as a service using Oracle Kubernetes service, Serverless computing using Oracle Cloud Fn etc. | Public cloud
7) | Heroku Cloud | Platform as a service | Public cloud

Q.6 What are the basic requirements for cloud architecture design ?
Ans. : The basic requirements for cloud architecture design are given as follows :

 The cloud architecture design must provide automated delivery of cloud services
along with automated management.
 It must support latest web standards like Web 2.0 or higher and REST or RESTful
APIs.
 It must support very large - scale HPC infrastructure with both physical and virtual
machines.
 The architecture of cloud must be loosely coupled.

 It should provide easy access to cloud services through a self-service web portal.

 Cloud management software must be efficient to receive the user request, finds the
correct resources, and then calls the provisioning services which invoke the resources
in the cloud.
 It must provide enhanced security for shared access to the resources from data
centers.
 It must use cluster architecture for getting the system scalability.

 The cloud architecture design must be reliable and flexible.

 It must provide efficient performance and faster speed of access.

Q.7 What are different layers in layered cloud architecture design ?


Ans. : The layered architecture of a cloud is composed of three basic layers called
infrastructure, platform, and application. The infrastructure layer consists of virtualized
services for computing, storage, and networking. It is responsible for provisioning
infrastructure components like Compute (CPU and memory), Storage, Network and IO
resources to run virtual machines or virtual servers along with virtual storages. The
platform layer is responsible for providing readily available development and
deployment platform for web applications to the cloud users without needing them to
install in a local device. The platform layer has collection of software tools for
development, deployment and testing the software applications. A collection of all
software modules required for SaaS applications forms the application layer. This layer
is mainly responsible for making on demand application delivery. In this layer,
software applications include day-to-day office management softwares used for
information collection, document processing, calendar and authentication. Enterprises
also use the application layer extensively in business marketing, sales, Customer
Relationship Management (CRM), financial transactions, and Supply Chain
Management (SCM).
Q.8 What are different roles of cloud providers ?
Ans. : Cloud provider is an entity that offers cloud services to interested parties. A
cloud provider manages the infrastructure needed for providing cloud services. The
CSP also runs the software to provide services, and organizes the service delivery to
cloud consumers through networks.
SaaS providers then deploy, configure, maintain and update all operations of the
software application on the cloud infrastructure, in order to ensure that services are
provisioned and to fulfil cloud consumer service requests. SaaS providers assume most
of the responsibilities associated with managing and controlling applications deployed
on the infrastructure. On the other hand, SaaS consumers have no or limited
administrative controls.
The major activities of a cloud provider include :
 Service deployment : Service deployment refers to provisioning private, public,
hybrid and community cloud models.
 Service orchestration : Service orchestration implies the coordination, management
of cloud infrastructure, and arrangement to offer optimized capabilities of cloud
services. The capabilities must be cost-effective in managing IT resources and must
be determined by strategic business needs.
 Cloud services management : This activity involves all service-related functions
needed to manage and operate the services requested or proposed by cloud
consumers.
 Security : Security, which is a critical function in cloud computing, spans all layers in
the reference architecture. Security must be enforced end-to-end. It has a wide range
from physical to application security. CSPs must take care of security.


Fig. 3.1 : Major activities of a cloud provider

 Privacy : Privacy in cloud must be ensured at different levels, such as user privacy,
data privacy, authorization and authentication, and it must also have adequate
assurance levels. Since clouds allow resources to be shared, privacy challenges are a
big concern for consumers using clouds.
Q.9 What are different complications in PaaS ?
Ans. : The following are some of the complications or issues of using PaaS :

 Interoperability : PaaS works best on each provider’s own cloud platform, allowing
customers to make the most value out of the service. But the risk here is that the
customisations or applications developed in one vendor’s cloud environment may
not be compatible with another vendor, and hence not necessarily migrate easily to
it.
Although most of the times customers agree with being hooked up to a single
vendor, this may not be the situation every time. Users may want to keep their
options open. In this situation, developers can opt for open-source solutions. Open-
source PaaS provides elasticity by revealing the underlying code, and the ability to
install the PaaS solution on any infrastructure. The disadvantage of using an open
source version of PaaS is that certain benefits of an integrated platform are lost.
 Compatibility : Most businesses have a restricted set of programming languages,
architectural frameworks and databases that they deploy. It is thus important to
make sure that the vendor you choose supports the same technologies. For example,
if you are strongly dedicated to a .NET architecture, then you must select a vendor
with native .NET support. Likewise, database support is critical to performance and
minimising complexity.


 Vulnerability and Security : Multitenancy lets users be spread over interconnected hosts. The providers must take adequate security measures to protect these vulnerable hosts from attacks, so that an attacker is not able to easily access the resources of the host or the tenants' objects.
 Providers have the ability to access and modify user objects/systems. The following
are the three ways by which security of an object can be breached in PaaS systems :
o A provider may access any user object that resides on its hosts. This type of attack
is inevitable but can be avoided to some extent by trusted relations between the
user and the provider.
o Co-tenants, who share the same resources, may mutually attack each other’s
objects.
o Third parties may attack a user object. Objects need to securely code themselves to
defend themselves.
o Cryptographic methods, namely symmetric and asymmetric encryption, hashing
and signatures are the solution for object vulnerability. It is the responsibility of
the providers to protect the integrity and privacy of user objects on a host.
 Vendor lock-in : Pertaining to the lack of standardisation, vendor lock-in becomes a
key barrier that stops users from migrating to cloud services. Technology related
solutions are being built to tackle this problem of vendor lock-in. Most customers are
unaware of the terms and conditions of the providers that prevent interoperability
and portability of applications. A number of strategies are proposed on how to
avoid/lessen lock-in risks before adopting cloud computing.
Lock-in issues arise when a company decides to change cloud providers but is
unable to migrate its applications or data to a different vendor. This heterogeneity of
cloud semantics creates technical incompatibility, which in turn leads to
interoperability and portability challenges. This makes interoperation, collaboration,
portability and manageability of data and services a very complex task.
Q.10 Enlist the pros and cons of storage as a service.
Ans. : The key advantages or pros of storage as a service are given as follows :

 Cost - Storage as a service reduces much of the expense of conventional backup methods, by offering ample cloud storage space at a small monthly charge.
 Invisibility - Storage as a service is invisible, as no physical presence can be seen in
its deployment, and therefore does not take up valuable office space.
 Security - In this type of service, data is encrypted both during transmission and at rest, preventing unauthorized access to the files.

 Automation - Storage as a service makes the time-consuming process of backup easier to accomplish through automation. Users can simply select what and when they want to back up, and the service does the rest of it.
 Accessibility - By using storage as a service, users can access data from smartphones,
netbooks to desktops, and so on.
 Syncing - Syncing in storage as a service ensures that your files are updated
automatically across all of your devices. This way, the latest version of a user file
stored on their desktop is available on your smartphone.
 Sharing - Online storage services make it easy for users to share their data with just a
few clicks.
 Collaboration - Cloud storage services are also ideal for collaborative purposes. They
allow multiple people to edit and collaborate in a single file or document. So, with
this feature, users don't need to worry about tracking the latest version or who made
any changes.
 Data Protection - By storing data on cloud storage services, data is well protected
against all kinds of disasters, such as floods, earthquakes and human error.
 Disaster Recovery - Data stored in the cloud is not only protected from disasters by
having the same copy at several locations, but can also favor disaster recovery in
order to ensure business continuity.

The disadvantages or cons of storage as a service are given as follows


 Potential downtimes : Due to failure in cloud, vendors may go through periods of
downtime where the service is not available, which may be a major issue for mission-
critical data.
 Limited customization : As the cloud infrastructure is owned and managed by the
service provider, it is less customizable.
 Vendor lock-in : Due to Potential for vendor lock-in, it may be difficult to migrate
from one service provider to another.
 Unreliable : In some cases, there is still a possibility that the system could crash and
leave consumers with no means of accessing their stored data. The small service
provider becomes unreliable in that case. Therefore, when a cloud storage system is
unreliable, it becomes a liability. No one wants to save data on an unstable platform
or trust an organization that is unstable. Most cloud storage providers seek to resolve
the issue of reliability through redundancy.
Q.11 What are different risks in cloud storages ?


Ans. : The following are the risks in cloud storage :

 Dependency : It is also known as “vendor lock-in”. The term refers to the difficulty of moving from one cloud service provider to another, because of the effort involved in migrating data. Since services run in a remote virtual environment, the client has only restricted access to the underlying software and hardware, which gives rise to concerns about control.
 Unintended Permanence : There have been scenarios in which cloud users complained that specific pictures were erased, as in the recent ‘iCloud hack’. Service providers are therefore under a full obligation to ensure that clients' information is not damaged or lost, and clients are urged to make full use of cloud backup facilities, so that copies of files can be recovered from the servers even if the client loses its local records.
 Insecure Interfaces and APIs : To manage and interact with cloud services, various
interfaces and APIs are used by customers. Two categories of web-based APIs are
SOAP (based on web services) and REST (based on HTTP). These APIs are easy
targets for man-in-the-middle or replay attacks. Therefore, secure authentication,
encryption and access control must be used to provide protection against these
malicious attacks.
 Compliance Risks : It is a risk for organisations that have earned certifications to
either meet industry standards or to gain the competitive edge when migrating to
clouds. This is a risk when cloud provider does not follow their own compliance
requirements or when the cloud provider does not allow the audit by the cloud
customer.
Q.12 Enlist the different cloud storage providers.
Ans. : The description about popular cloud storage providers are given as follows :

 Amazon S3 : Amazon S3 (Simple Storage Service) offers a simple cloud services interface that can be used to store and retrieve any amount of data from anywhere on
the cloud at any time. It gives every developer access to the same highly scalable data
storage infrastructure that Amazon uses to operate its own global website network.
The goal of the service is to optimize the benefits of scale and to pass those benefits
on to the developers.
 Google Bigtable Datastore : Google defines Bigtable as a fast and highly scalable
datastore. The google cloud platform allows Bigtable to scale through thousands of
commodity servers that can store petabytes of data together. Bigtable has been
designed with very high speed, versatility and extremely high scalability in mind.
The size of the Bigtable database can be petabytes, spanning thousands of distributed
servers. Bigtable is now open to developers as part of the Google App Engine, their
cloud computing platform.
 Microsoft Live Mesh : Windows Live Mesh was a free-to-use Internet-based file
synchronization application designed by Microsoft to enable files and directories
between two or more computers to be synchronized on Windows or Mac OS
platforms. It has support of mesh objects that consists of data feeds, which can be
represented in Atom, RSS, JSON, or XML. It uses Live Framework APIs to share any
data item between devices that recognize the data.
 Nirvanix : Nirvanix offers public, hybrid and private cloud storage services with
usage-based pricing. It supports Cloud-based Network Attached Storage
(CloudNAS) to store data in premises. Nirvanix CloudNAS is intended for
businesses that manage archival, backup, or unstructured archives that need long-
term, secure storage, or organizations that use automated processes to migrate files
to mapped drives. The CloudNAS has built-in disaster data recovery and automatic
data replication feature for up to three geographically distributed storage nodes.
Q.13 What is Amazon S3 ?
Ans. : Amazon S3 is a cloud-based storage system that allows storage of data objects ranging from 1 byte up to 5 TB in a flat namespace. The storage containers in S3 are called buckets; a bucket serves the function of a directory, though there is no object hierarchy within a bucket, and the user saves objects to it rather than files. Amazon S3
offers a simple web services interface that can be used to store and retrieve any amount
of data from anywhere, at any time on the web. It gives any developer access to the
same scalable, secure, fast, low-cost data storage infrastructure that Amazon uses to
operate its own global website network.

Long Answered Questions

Q.1 With architecture, elaborate the various deployment models and reference
models of cloud computing. AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models.
Q.2 Describe service and deployment models of cloud computing environment
with illustration. How do they fit in NIST cloud architecture ? AU : Dec.-17
Ans. : Refer section 3.3 for cloud deployment models and section 3.4 for cloud reference
models and section 3.2 for NIST cloud architecture.


Q.3 List the cloud deployment models and give a detailed note about them.
AU : Dec.-16
Ans. : Refer section 3.3 for cloud deployment models.

Q.4 Give the importance of cloud computing and elaborate the different types of
services offered by it. AU : Dec.-16
Ans. : Refer section 3.4 for cloud service models.

Q.5 What are pros and cons for public, private and hybrid cloud ? AU : Dec.-18
Ans. : Refer section 3.3 for pros and cons of public, private and hybrid cloud and
section 3.3.5 for their comparison.
Q.6 Describe Infrastructure as a Service (IaaS), Platform-as-a-Service (PaaS) and
Software-as-a-Service (SaaS) with example. AU : Dec.-18
Ans. : Refer section 3.4 for cloud service models for description of Infrastructure as a
Service (IaaS), Platform-as-a-Service (PaaS) and Software-as-a-Service (SaaS).
Q.7 Illustrate the cloud delivery models in detail. AU : Dec.-19
Ans. : Refer section 3.4 for cloud delivery models.

Q.8 Compare and contrast cloud deployment models. AU : Dec.-19


Ans. : Refer section 3.3 for cloud deployment models and 3.3.5 for comparison between
cloud deployment models.
Q.9 Describe the different working models of cloud computing. AU : May-19
Ans. : Refer sections 3.3 and 3.4 for working models of cloud computing which are
deployment models and service models.
Q.10 Write a detailed note on layered cloud architecture design.
Ans. : Refer section 3.1.1.

Q.11 Explain in brief NIST cloud computing reference architecture.


Ans. : Refer section 3.2.

Q.12 Enlist and contrast architectural design challenges of cloud computing.


Ans. : Refer section 3.5.

Q.13 Explain in detail cloud storage along with its pros and cons.
Ans. : Refer section 3.6 for cloud storage and 3.8 for pros and cons of cloud storage.

Q.14 Write a detailed note on storage-as-a-service


Ans. : Refer section 3.7.

Q.15 Explain in brief significance of Amazon S3 in cloud computing.


Ans. : Refer section 3.10.


4 Resource Management and
Security in Cloud
Syllabus
Inter Cloud Resource Management - Resource Provisioning and Resource Provisioning
Methods - Global Exchange of Cloud Resources - Security Overview - Cloud Security
Challenges - Software-as-a-Service Security - Security Governance - Virtual Machine Security -
IAM - Security Standards.

Contents
4.1 Inter Cloud Resource Management
4.2 Resource Provisioning and Resource Provisioning Methods
4.3 Global Exchange of Cloud Resources
4.4 Security Overview
4.5 Cloud Security Challenges
4.6 Software-as-a-Service Security
4.7 Security Governance
4.8 Virtual Machine Security
4.9 IAM
4.10 Security Standards

(4 - 1)

4.1 Inter Cloud Resource Management


Resource management is a process for the allocation of computing, storage,
networking and subsequently energy resources to a set of applications, in a context that
aims to collectively meet the performance goals of infrastructure providers, cloud users
and applications. The cloud users prefer to concentrate on application performance while
the conceptual framework offers a high-level view of the functional aspect of cloud
resource management systems and all their interactions. Cloud resource management is a
challenge due to the scale of modern data centers, the heterogeneity of resource types, the
interdependence between such resources, the variability and unpredictability of loads,
and the variety of objectives of the different players in the cloud ecosystem.
Whenever any service is deployed on cloud, it uses resources aggregated in a common
resource pool which are collected from different federated physical servers. Sometimes,
cloud service brokers may deploy cloud services on shared servers for their customers
which lie on different cloud platforms. In that situation, the interconnection between
different servers needs to be maintained. Sometimes, there may be a loss of control if any
particular cloud server faces downtime which may generate huge business loss.
Therefore, it’s quite important to look at inter cloud resource management to address the
limitations related to resource provisioning.
We have already seen the NIST architecture for cloud computing which has three
layers namely infrastructure, platform and application.
These three layers are referred by three services like Infrastructure as a service,
Platform as a service and Software as a service respectively. The Infrastructure as a
service is the foundation layer which provides compute, storage and network services to
other two layers, platform as a service and software as a service. Although the three basic services differ in use, they are built on top of each other. In practice, five layers are required to run cloud applications. The functional layers of cloud computing
services are shown in Fig. 4.1.1.


Fig. 4.1.1 Functional layers of Cloud computing

 The consequence is that one cannot directly launch SaaS applications on a cloud
platform. The cloud platform for SaaS cannot be built unless compute, storage and network infrastructure are established.
 In the above architecture, the lower three layers are more closely connected to physical specifications.
 Hardware as a Service (HaaS) is the lowermost layer, which provides the various hardware resources needed to run cloud services.
 The next layer is Infrastructure as a Service, which interconnects all hardware elements through compute, storage and network services.
 The next layer has two services, namely Network as a Service (NaaS) to bind and provision cloud services over the network, and Location as a Service (LaaS) to provide a colocation service that houses, controls and protects all physical hardware and network resources.
 The next layer is Platform as a Service for web application deployment and delivery, while the topmost layer is used for on-demand application delivery.
In any cloud platform, cloud infrastructure performance is the primary concern for every cloud service provider, while quality of service, service delivery and security are the concerns of cloud users. Every SaaS application is subdivided into different business application areas; for example, CRM is used for sales, promotion and marketing services, and CRM was the first SaaS offered successfully on the cloud. Other tools may provide distributed collaboration, financial management or human resources management.
In inter cloud resource provisioning, developers have to consider how to design the
system to meet critical requirements such as high throughput, HA, and fault tolerance.
The infrastructure for operating cloud computing services may be either a physical server


or a virtual server. By using VMs, the platform becomes flexible, i.e. running services are not tied to specific hardware platforms, which adds flexibility to cloud computing platforms. The software layer at the top of the platform is a layer for storing huge amounts of data.
Like in the cluster environment, there are some runtime support services accessible in
the cloud computing environment. Cluster monitoring is used to obtain the running state
of the cluster as a whole. The scheduler queues the tasks submitted to the entire cluster
and assigns tasks to the processing nodes according to the availability of the node. The
runtime support system helps to keep the cloud cluster working with high efficiency.
Runtime support is the software needed for browser-initiated applications used by
thousands of cloud customers. The SaaS model offers software solutions as a service,
rather than requiring users to buy software. As a result, there is no initial investment in
servers or software licenses on the customer side. On the provider side, the cost is rather
low compared to the conventional hosting of user applications. Customer data is stored in
a cloud that is either private or publicly hosted by PaaS and IaaS providers.

4.2 Resource Provisioning and Resource Provisioning Methods


The rise of cloud computing reflects major improvements in the design of software and hardware. Cloud architecture places greater emphasis on the number of VM instances or CPU cores, and parallelism is exploited at the cluster node level. This section broadly focuses on the concept of resource provisioning and its methods.

4.2.1 Provisioning of Compute Resources


Cloud service providers offer cloud services by signing SLAs with end-users. The
SLAs must commit appropriate resources, such as CPU, memory, and bandwidth that the
user can use for a preset time. The lack of services and under provisioning of resources
would contribute to violation of the SLAs and penalties. The over provisioning of
resources can contribute to under-use of services and, as a consequence, to a decrease in
revenue for the supplier. The design of an automated system to provision resources and
services effectively is a difficult task. The difficulties arise from the unpredictability of
consumer demand, heterogeneity of services, software and hardware failures, power
management and disputes in SLAs signed between customers and service providers.
Cloud architecture and management of cloud infrastructure rely on effective VM
provisioning. Resource provisioning schemes are also used for the rapid discovery of
cloud computing services and data in cloud. The virtualized cluster of servers involve
efficient VM deployment, live VM migration, and fast failure recovery. To deploy VMs,


users treat virtual machines as physical hosts with customized operating systems for different applications.
For example, Amazon's EC2 uses Xen as the Virtual Machine Monitor (VMM), which is also used in IBM's Blue Cloud. Some VM templates are supplied on the EC2 platform, from which users can select different types of VMs, whereas IBM's Blue Cloud does not provide VM templates; in general, any form of VM may be run on top of Xen. Microsoft also applies virtualization in its Azure cloud platform. A provider should deliver resource-economic services. The increase in energy wasted through heat dissipation from data centers means that power-efficient caching, query processing and heat management schemes are necessary. Public and private clouds promise to streamline software, hardware and data provisioned as a service, saving on-demand IT deployment costs and achieving economies of scale in IT operations.

4.2.2 Provisioning of Storage Resources


As cloud storage systems also offer resources to customers, it is likely that data is
stored in the clusters of the cloud provider. The data storage layer in layered architecture
lies at the top of a physical or virtual server. The provisioning of storage resources in
cloud is often associated with the terms like distributed file system, storage technologies
and databases.
Several cloud computing providers have developed large scale data storage services to
store a vast volume of data collected every day. A distributed file system is very essential
for storing large data, as traditional file systems have failed to do that. For cloud
computing, it is also important to construct databases such as large-scale systems based
on data storage or distributed file systems. Some examples of distributed file system are
Google’s GFS that stores huge amount of data generated on web including images, text
files, PDFs or spatial data for Google Earth. The Hadoop Distributed File System (HDFS)
developed by Apache is another framework used for distributed data storage from the
open source community. Hadoop is an open-source implementation of Google's cloud
computing technology. Cosmos, the distributed file system used by Windows Azure, is another example. Since a storage service or distributed file system can be accessed directly, much like a conventional database, cloud computing also provides structured or semi-structured database processing capabilities. However, there are other forms of data storage as well. In cloud computing, another type of data storage is (Key, Value) pair or object-based storage : Amazon DynamoDB uses (Key, Value) pairs to store data in a NoSQL database, while Amazon S3 navigates objects stored in the cloud through its web services (REST or SOAP) interface.
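A minimal sketch of such (Key, Value) access is shown below. It assumes the boto3 SDK and a hypothetical DynamoDB table named UserProfiles, already created with user_id as its partition key.

# A minimal sketch of (Key, Value) style storage with Amazon DynamoDB via boto3.
# The table name "UserProfiles" and its key schema are hypothetical; the table is
# assumed to already exist with "user_id" as its partition key.
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("UserProfiles")

# Write an item : the key identifies the item, the remaining attributes are schemaless.
table.put_item(Item={"user_id": "u-1001", "name": "Asha", "plan": "standard"})

# Read the item back by its key.
item = table.get_item(Key={"user_id": "u-1001"}).get("Item")
print(item)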


In storage, numerous technologies are available like SCSI, SATA, SSDs, and Flash
storages and so on. In future, hard disk drives with solid-state drives may be used as an
enhancement in storage technologies. It would ensure reliable and high-performance data
storage. The key obstacles to the adoption of flash memory in data centers have been
price, capacity and, to some extent, lack of specialized query processing techniques.
However, this is about to change as the I/O bandwidth of the solid-state drives is
becoming too impressive to overlook.
Databases are very popular with many applications because they serve as the underlying storage container. The size of such a database can be very large for the processing of huge quantities of data. The main aim is to store data in structured or semi-structured form so that application developers can use it easily and construct their applications quickly. Traditional databases may hit a performance bottleneck when the system is scaled to a larger size. However, some real applications do not need such strong consistency. The size of these databases can keep growing. Typical cloud databases include Google's Bigtable, Amazon's SimpleDB and DynamoDB, and the Azure SQL service from Microsoft Azure.

4.2.3 Provisioning in Dynamic Resource Deployment


Cloud computing utilizes virtual machines as basic building blocks to construct an execution environment across multiple resource sites. Resource provisioning in a dynamic environment can be carried out to achieve scalability of performance. The InterGrid is a Java-implemented software system that allows users to build cloud-based execution environments on top of all participating grid resources. The peering arrangements established between gateways enable the allocation of resources from multiple grids to establish the execution environment. The InterGrid Gateway (IGG) allocates resources from a local cluster to deploy applications in three steps : requesting virtual machines, authorizing leases and deploying virtual machines as demanded. At peak demand, an IGG interacts with another IGG that can allocate resources from a cloud service provider. The grid has pre-configured peering relationships with other grids, which are managed through the IGG. The IGG is aware of the peering terms with other grids, selects suitable grids that can provide the required resources, and responds to requests from other IGGs. Request-redirection policies decide which peering grid should process a request and the rate at which that grid can perform the task. The IGG can also allocate resources from a cloud service provider. The cloud system provides a virtual environment that lets users deploy their applications; like InterGrid, such technologies use the tools of the distributed grid. The InterGrid allocates and manages a Distributed Virtual Environment (DVE), a cluster of available VMs isolated from other virtual clusters. The DVE Manager component performs resource allocation and management on behalf of specific user applications. The central component of the IGG is the scheduler, which enforces provisioning policies and peering with other gateways. The communication component provides an asynchronous message-passing mechanism, and messages are handled in parallel by a thread pool.

4.2.4 Methods of Resource Provisioning


There are three problem cases in a static cloud resource provisioning scheme : over-provisioning of resources at peak load, which wastes resources; under-provisioning of resources below the required capacity, which causes shortages and losses for both the user and the provider; and constant provisioning of resources with fixed capacity against declining user demand, which can result in even worse waste of resources. In such cases, both the user and the provider lose when resources are provisioned without elasticity.
 There are three resource-provisioning methods which are presented in the
following sections :
 The demand-driven method offers static resources and has been used for many years in grid computing.
 The event-driven method is based on the expected time-dependent workload.
 The popularity-driven method is based on the monitoring of Internet traffic.
These methods of resource provisioning are defined as follows.

4.2.4.1 Demand-Driven Resource Provisioning

In demand-driven resource provisioning, resources are allocated according to user demand in a dynamic environment. This method adds or removes computing instances depending on the current level of utilization of the allocated resources. For example, the demand-driven method automatically allocates two CPUs to a user application when the user has been using one CPU more than 60 percent of the time for an extended period. In general, when a resource has exceeded a threshold for a certain amount of time, the scheme increases that resource based on demand; if the resource is utilized below the threshold for a certain amount of time, it can be reduced accordingly. This method is implemented by Amazon Web Services as the auto-scaling feature running on its EC2 platform. It is very easy to implement, but the approach does not work well if the workload changes abruptly.
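The following sketch illustrates the threshold rule described above. The thresholds, monitoring window and instance limits are illustrative assumptions, not the values used by any particular provider's auto-scaling service.

# A minimal sketch of the demand-driven rule : scale out when average CPU
# utilization stays above a threshold, scale in when it stays below a lower one.
def decide_scaling(cpu_samples, instances,
                   high=60.0, low=20.0, min_instances=1, max_instances=10):
    """Return the new instance count for a window of CPU utilization samples (%)."""
    avg = sum(cpu_samples) / len(cpu_samples)
    if avg > high and instances < max_instances:
        return instances + 1          # sustained high load : add an instance
    if avg < low and instances > min_instances:
        return instances - 1          # sustained low load : remove an instance
    return instances                  # otherwise keep the current capacity

# Example : one CPU busy more than 60 % of the time over the monitoring window.
print(decide_scaling([72, 65, 81, 70], instances=1))   # -> 2
print(decide_scaling([12, 18, 9, 15], instances=2))    # -> 1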

4.2.4.2 Event-Driven Resource Provisioning

In event-driven resource provisioning, resources are allocated when events generated by users occur at specific times in a dynamic environment. This method adds or removes machine instances based on a specific time event. The approach works well for seasonal or predicted events, where additional resources are required for a short interval : the number of users increases before the event and decreases after the event period. The scheme provisions for the estimated peak traffic before the event happens. It results in only a small loss of QoS if the event is correctly predicted; otherwise, the wasted resources are even larger, because such events do not follow a fixed pattern.
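A minimal sketch of event-driven provisioning is given below : capacity is scheduled to rise before a known event window and to fall back afterwards. The event dates and instance counts are illustrative assumptions, not a real provider API.

# A minimal sketch of event-driven provisioning based on a known event window.
from datetime import datetime

EVENT_START = datetime(2024, 11, 29, 0, 0)    # e.g. a seasonal sale begins
EVENT_END   = datetime(2024, 11, 30, 0, 0)
BASELINE_INSTANCES = 2
EVENT_INSTANCES = 8

def planned_capacity(now: datetime) -> int:
    """Return the instance count scheduled for the given point in time."""
    return EVENT_INSTANCES if EVENT_START <= now < EVENT_END else BASELINE_INSTANCES

print(planned_capacity(datetime(2024, 11, 29, 12, 0)))  # during the event -> 8
print(planned_capacity(datetime(2024, 12, 1, 12, 0)))   # after the event  -> 2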

4.2.4.3 Popularity-Driven Resource Provisioning

In popularity-driven resource provisioning, resources are allocated based on the popularity of certain applications and the demand for them. In this method, Internet traffic is monitored for the popularity of certain applications, and instances are created according to that popularity demand. The scheme expects traffic to increase with popularity. Again, if the predicted popularity is correct, the scheme incurs a minimal loss of QoS; if traffic does not grow as expected, resources may be wasted.

4.3 Global Exchange of Cloud Resources


To serve a large number of users worldwide, IaaS cloud providers have set up data centers in various geographical locations to provide redundancy and ensure reliability in the event of site failure. However, Amazon currently asks its cloud customers (i.e. SaaS providers) to express a preference for where they want their application services to be hosted; it does not provide seamless or automatic mechanisms for scaling hosted services across multiple geographically dispersed data centers. This approach has several weaknesses. First, cloud customers cannot determine the best location for their services in advance, because they do not know the origin of their services' consumers. Second, SaaS providers may not be able to meet the QoS requirements of service consumers spread across multiple geographical locations. Meeting cloud customers' QoS targets therefore requires the development of mechanisms that allow complex applications spanning multiple domains to efficiently federate cloud data centers. Moreover, no single cloud infrastructure provider will be able to set up data centers everywhere around the world, which makes it difficult for cloud application service (SaaS) providers to meet QoS expectations for all of their customers. Such providers also want to draw on resources from multiple providers that can best serve their specific needs for cloud infrastructure. This form of requirement often arises in companies with global businesses and applications such as Internet services, media hosting and Web 2.0 applications, and it calls for a federation of cloud infrastructure providers whose services can be offered across multiple clouds. To accomplish this, the Intercloud architecture has been proposed to enable brokerage and sharing of cloud resources so that applications can scale across multiple clouds. The generalized Intercloud architecture is shown in Fig. 4.3.1.

Fig. 4.3.1 Intercloud Architecture

Using Intercloud architectural principles, cloud providers can expand or resize their provisioning capacity in a competitive and dynamic manner by leasing the computation and storage resources of other cloud service providers. This helps operators such as Salesforce.com to host services based on an agreed SLA contract and to operate within a market-driven resource-leasing federation. It offers reliable, on-demand, affordable and QoS-aware services using virtualization technology, ensures high QoS and reduces the cost of operation. Providers must be able to employ market-based utility models as the basis for offering virtualized software services and federated hardware infrastructures to heterogeneous user applications.
The Intercloud architecture consolidates the distributed storage and computing capabilities of clouds into a single resource-leasing abstraction. It comprises client brokerage and coordination services that support utility-based cloud federation : scheduling of applications, allocation of resources and workload migration. The system facilitates cross-domain integration for on-demand, flexible and reliable access to infrastructure based on virtualization technology. The Cloud Exchange (CEX) aggregates the infrastructure demands of application brokers and evaluates them against the available supply. It acts as a market authority, bringing service producers and consumers together to encourage the trading of cloud services on the basis of competitive economic models such as commodity markets and auctions. The SLA (Service Level Agreement) specifies the details of the service to be provided in terms of agreed metrics, along with incentives and penalties for meeting or violating expectations. The availability of a banking system within the market ensures that transactions related to SLAs between participants are carried out in a secure and reliable environment.

4.4 Security Overview


Cloud computing provisions resources, applications and information as on-demand services over the network. It offers very high computational power and storage capacity. Nowadays most small and medium-sized enterprises (SMEs) move to the cloud because of advantages such as lower infrastructure and maintenance costs, the pay-as-you-go model, scalability, load balancing, location-independent on-demand access, quicker deployment and flexibility.
Although cloud computing has many benefits in most aspects, security issues in cloud platforms have led many companies to hesitate to migrate their essential resources to the cloud. In this new environment, companies and individuals often worry about how security, privacy, trust, confidentiality, integrity and compliance can be maintained. Companies that do jump to cloud computing may be even more worried about the implications of placing critical applications and data in the cloud. The migration of critical applications and sensitive data to public, shared cloud environments is a major concern for companies moving beyond the network perimeter defense of their own data centers. To resolve these concerns, a cloud provider needs to ensure that customers can continue to maintain the same security and privacy controls over their services and applications, provide customers with evidence that their organization and consumers are secure and that service-level agreements can be fulfilled, and demonstrate compliance to their auditors. Lack of trust between service providers and cloud users has prevented cloud computing from being universally accepted as a solution for on-demand services.
Trust and privacy are also more challenging for web and cloud services as most
desktop and server users have resisted leaving user applications to their cloud provider’s
data center. Some users worry about the lack of privacy, security and copyright
protection on cloud platforms. Trust is not a mere technological question, but a social
problem. However, with a technical approach, the social problem can be addressed. The cloud uses a virtualized environment that poses new security threats, which are harder to manage than those in traditional client-server configurations. Therefore, a new data protection
model is needed to solve these problems.
Three basic enforcements of cloud security are expected. First, data center facilities require year-round on-site security; biometric readers, CCTV (closed-circuit television), motion detection and mantraps are frequently deployed. Second, global firewalls, Intrusion Detection Systems (IDSes) and third-party vulnerability assessments are often required to achieve fault-tolerant network security. Finally, the security platform must include SSL transmission, data encryption, strict password policies and certification of the system's trust, since cloud servers can be either physical or virtual machines. Security compliance requires a security-aware cloud architecture that provides remedies against malware-based attacks such as worms, viruses and DDoS attacks that exploit system vulnerabilities. These attacks compromise system functionality or give intruders unauthorized access to critical information.

4.4.1 Cloud Infrastructure Security


Cloud computing provides resources, applications and information as on-demand services over the network. It comprises very high computational power and storage capabilities. Nowadays most small and medium-sized companies are migrating to the cloud because of its benefits, like reduced hardware, no maintenance cost, the pay-as-you-go model, scalability, load balancing, location-independent access, on-demand security controls, fast deployment and flexibility.
But many organizations are still hesitant to move into the cloud because of security concerns. Many security issues arise in cloud computing, and they need to be resolved on a priority basis. As we have seen in chapter 3, cloud computing has three service models, called the SPI model, in which infrastructure is the core of all the service models. Infrastructure as a service comprises the servers, storage, network, virtual machines and virtual operating systems on which the other services are deployed. So, there is a need to protect the infrastructure first. Infrastructure security is an important factor in cloud security. The cloud is composed of a network of connected servers, called hosts, with applications deployed on them.
Infrastructure security has a three-level security model composed of network-level security, host-level security and application-level security. The three levels of infrastructure security are explained as follows.

4.4.2 Network Level Security


Network-level security is related to vulnerabilities in public and private networks. At the network level it is important to distinguish between public and private clouds. In a private cloud, the attacks, vulnerabilities and risks specific to the network topology are known in advance, and information security personnel need to consider only those.
In a public cloud, changing security requirements will require changes to the network topology and to the manner in which the existing network topology interacts with the cloud provider's network. In a public cloud, data moves to or from the organization, and its confidentiality and integrity need to be ensured. So, if a user accesses services over HTTP rather than HTTPS, the risk increases and needs to be pointed out. In a hybrid cloud, private and public clouds work together in different environments and have different network topologies, so the challenge is to consider the risks associated with both topologies.
There are four significant factors that need to be considered in network-level security.
 The Confidentiality and Integrity needs to be ensured for data-in-transit to and
from public cloud.
 The Access control including authentication, authorization, and auditing need to be
provided for resources you are using from public cloud.
 The Availability must be ensured for resources in a public cloud those are being
used by your organization or assigned to you by your public cloud providers.
 The Established Model of Network Zones and Tiers should be replaced with
domains.
The above factors are explained in detail as follows.

a) Ensuring Data Confidentiality and Integrity


In the cloud, some resources need to be accessed from a public data center while others reside in a private data center, so resources and data formerly confined to a private network are exposed to the Internet and shared over a public network belonging to a third-party cloud provider. There we need to ensure data confidentiality and integrity together.
For example, according to an Amazon Web Services (AWS) security vulnerability report, users who employed digital signature algorithms to access Amazon SimpleDB and Amazon Elastic Compute Cloud (EC2) over HTTP instead of HTTPS faced an increased risk that their data could be altered in transit without their knowledge.
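A minimal sketch of enforcing encrypted, signed requests is shown below, assuming the boto3 SDK and configured AWS credentials. boto3 signs requests and uses HTTPS by default; the flags simply make that choice explicit.

# A minimal sketch of forcing encrypted transport when calling AWS APIs with boto3.
# The region is an illustrative assumption; credentials must already be configured.
import boto3

ec2 = boto3.client(
    "ec2",
    region_name="us-east-1",
    use_ssl=True,   # refuse plain HTTP endpoints
    verify=True,    # validate the server's TLS certificate
)

# The request below travels over TLS and carries a signed Authorization header,
# protecting both confidentiality and integrity in transit.
print(ec2.describe_regions()["Regions"][:2])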

b) Ensuring Proper Access Control


As some resources from the private network are exposed to the public network, an organization using a public cloud faces a significantly increased risk to its data. So there is a need to audit the operations of the cloud provider's network by observing network-level logs and data and conducting thorough investigations on them. Reused IP addresses and DNS attacks are examples of this risk factor.

c) Ensuring the Availability of Internet-Facing Resources


Network-level security is needed because the increased amount of data and services hosted on external devices makes it necessary to ensure the availability of cloud-provided resources. BGP prefix hijacking is a good example of this risk factor : prefix hijacking involves announcing an autonomous system address space that belongs to someone else without permission, which affects availability.

d) Replacing the existing model of network zones with Domains


In the cloud, the previously established models of network zones and tiers no longer exist. In those models, network security relied on zones such as the intranet and the Internet, and the models were based on exclusion : only individuals and systems in specific roles had access to specific zones. For example, systems within a presentation tier are not allowed to communicate directly with systems in the database tier, but can communicate within the application zone.

4.4.3 Host Level Security


Host-level security is related to the cloud service models, SaaS, PaaS and IaaS, and to the deployment models, public, private and hybrid. Besides the known threats to hosts, virtualization threats such as VM escape, system configuration drift, weak access control to the hypervisor and insider threats need to be prevented. Therefore, managing vulnerabilities and applying patches becomes much harder than just running a scan. Host-level security for SaaS, PaaS and IaaS is explained as follows.


a) SaaS and PaaS Host Security


In general, a CSP never discloses information about the host platform and the operating systems that are in place to secure the hosts. So, in the context of SaaS and PaaS, the CSP is liable for securing the hosts. To get assurance from the CSP, the user can ask it to share information under a Non-Disclosure Agreement (NDA), or the CSP can share the information via a controls assessment framework such as SysTrust or ISO 27002. As we have seen, both PaaS and SaaS platforms hide the host operating system from end users behind a host abstraction layer. Therefore, host security responsibilities for SaaS and PaaS services are transferred to the CSP, so you do not have to worry about protecting hosts from host-based security threats.

b) IaaS Host Security


In IaaS, the hypervisor is the main controller of all the VMs running under it. So, IaaS host security involves securing the virtualization software (the hypervisor) and the guest OSes or virtual servers. The virtualization software sits on top of the bare metal and allows customers to create, destroy and manage virtual instances, so it is important to protect it as it sits between the hardware and the virtual servers. Virtual instances of operating systems such as Windows and Linux are provisioned on top of the virtualization layer and are visible to customers, so VM instances running business-critical applications also need to be protected. If the hypervisor becomes vulnerable, it could expose all user VM instances and domains to outsiders.
The most common attacks at the host level in a public cloud are :
 Hijacking of accounts that are not properly secured.
 Stealing keys, like the SSH private keys that are used to access and manage hosts.
 Attacking unpatched and vulnerable services listening on standard ports like FTP, NetBIOS and SSH.
 Attacking systems that are not secured by host firewalls.
 Deploying Trojans and viruses embedded in the software running inside the VM.
The recommendations for host-level security are given as follows (a key-based access sketch follows this list) :
 Do not allow password-based authentication for shell-based user access.
 Install and configure a host firewall, with only the minimum ports needed for the necessary services opened.
 Install host-based IDS and IPS.
 Always enable system auditing and event logging.
 Use public and private keys to access hosts in the public cloud.
 Periodically review logs to inspect for suspicious activities.
 Protect the integrity of VM images from unauthorized access.
 Always require the superuser password or role-based access for Unix-based host images.
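The sketch below illustrates key-based (non-password) SSH access to a cloud host, in line with the recommendations above. It assumes the third-party paramiko library; the host address, user name and key path are hypothetical placeholders.

# A minimal sketch of key-based SSH access to a cloud host using paramiko.
import paramiko

client = paramiko.SSHClient()
# Reject unknown host keys instead of silently trusting them.
client.set_missing_host_key_policy(paramiko.RejectPolicy())
client.load_system_host_keys()

client.connect(
    hostname="203.0.113.10",          # example address (TEST-NET range)
    username="cloudadmin",
    key_filename="/home/cloudadmin/.ssh/id_ed25519",
    look_for_keys=False,              # use only the key supplied above
    allow_agent=False,
)

# Run a harmless command over the encrypted channel.
stdin, stdout, stderr = client.exec_command("uptime")
print(stdout.read().decode())
client.close()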

4.4.4 Application Level Security


Application security describes the security features and specifications of the applications and discusses the outcomes of security testing. Security of applications in the cloud is one of the vital success factors for every SaaS or PaaS cloud. Secure software procedures, secure coding instructions, training scripts and testing methods are usually a joint effort of the development and security teams. The security team should provide security standards to the product development engineers, although product engineering is likely to focus on the application layer, the security design of the application itself and the infrastructure layers that interact with the application.
The security team and the product development team must work together to provide better application-level security. External penetration testers are used for source code reviews of applications and to gain insight into attacks; they fulfill the objective of an impartial security review of the application and regularly perform attacks and penetration tests on behalf of customers. Firewalls, intrusion detection and prevention systems, integrity monitoring and log inspection can all be deployed as virtual machine applications to enhance server and application protection and maintain compliance integrity as virtual resources migrate from on-site to public cloud environments.
Application security applies at the SaaS and PaaS levels of the SPI (SaaS, PaaS and IaaS) model, so cloud service providers are responsible for providing security for the applications hosted in their data centers. At the SaaS level, the hosted applications need to be protected, while at the PaaS level the platform, databases and runtime engines need to be protected. Designing and implementing new applications to be deployed on a public cloud platform will require re-evaluation of existing application security programs and standards. Web applications like content management systems, websites, blogs, portals, bulletin boards and discussion forums are widely used by small and large organizations and are hosted on cloud platforms, so web-related attacks need to be prevented by understanding the vulnerabilities in these websites. Cross-site scripting (XSS) attacks, SQL injection and malicious file execution are the common attacks at the application level in the cloud.
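As an illustration of why parameterized queries are the standard defence against SQL injection, consider the following sketch. It uses Python's built-in sqlite3 module purely for demonstration; the same idea applies to whatever database driver a cloud-hosted web application uses.

# A minimal sketch showing how parameterized queries block SQL injection.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cr3t')")

malicious_input = "x' OR '1'='1"

# Unsafe : string concatenation lets the attacker rewrite the query and dump every row.
unsafe = conn.execute(
    "SELECT * FROM users WHERE name = '" + malicious_input + "'").fetchall()

# Safe : the placeholder sends the input as data, never as SQL syntax.
safe = conn.execute(
    "SELECT * FROM users WHERE name = ?", (malicious_input,)).fetchall()

print(unsafe)  # -> [('alice', 's3cr3t')]  (injection succeeded)
print(safe)    # -> []                     (input treated as a literal name)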


The common types of attacks that happen at the network, host and application levels are explained in Table 4.4.1.

Network level attacks
  Eavesdropping : The attacker monitors network traffic in transit and interprets all unprotected data.
  Replay attack : Valid data is transmitted maliciously and repeatedly to gain access to unauthorized resources.
  Reused IP address : The same IP address is reassigned to a new customer while it is still associated with the previous customer, violating the privacy of the original user.
  DNS attacks : The attacker manipulates the translation of a domain name into an IP address so that the sender and the receiver are rerouted through a malicious connection.
  BGP prefix hijacking : A wrong announcement of the IP address space associated with an Autonomous System (AS) is made, so that malicious parties gain access to untraceable IP addresses.
  Sniffer attack : Data flowing on the network is traced and captured by a sniffer program through the NIC, so that data and traffic are rerouted to a malicious connection.
  Port scanning : Open ports that a customer has configured to allow traffic from any source to a specific port are scanned, making that port vulnerable.
  DoS attack : Authorized users are prevented from accessing services on the network by flooding, disrupting, jamming or crashing them.
  Distributed Denial of Service (DDoS) attack : A DoS attack launched from more than one source and more than one location at the same time to flood the server.

Host level attacks
  Threats to hypervisor : The hypervisor runs multiple guest operating systems on a single hardware unit and is difficult to monitor, so malicious code can take control of the system and block other guest OSes.
  Threats to virtual servers : The self-provisioning feature of virtual servers on an IaaS platform creates the risk of insecure virtual servers being deployed.

Application level attacks
  SQL injection attack : Attackers insert malicious code into standard SQL statements, allowing them to download the entire database in illicit ways.
  Cross-site scripting (XSS) : Script tags are embedded in URLs; when a user clicks on them, the JavaScript executes on the user's machine and the attacker gains control and access to private information.
  EDoS : An Economic Denial of Sustainability attack on a pay-as-you-go cloud application dramatically increases the cloud utility bill through increased use of network bandwidth, CPU and storage; it targets the billing model that underlies the cost of providing a service.
  Cookie poisoning : An unauthorized person changes or modifies the contents of cookies to obtain credential information and access applications or web pages.
  Backdoor and debug options : Occurs when developers leave the debugging option enabled while publishing a web site, so a hacker can easily enter the web site and make changes.
  Hidden field manipulation : The attacker identifies hidden fields in HTML forms, saves the catalogue page and changes the values of the hidden fields posted back to the web page.
  Man in the middle attack : Similar to eavesdropping; the attacker sets up a connection between two users and tries to listen to the conversation between them.

Table 4.4.1 Common types of attacks at the network, host and application levels

4.5 Cloud Security Challenges


Although cloud computing and virtualization can enhance business efficiency by
breaking the physical ties between an IT infrastructure and its users, it is important to
resolve increased security threats in order to fully benefit from this new computing
paradigm. This applies to SaaS suppliers in particular. You share computing services with

other companies in a cloud environment. In such an environment, you may not have awareness or control of where the resources are running in a shared pool outside the organization's boundary. Sharing the cloud environment with other companies may even give the government a reason to seize your assets because another company has violated compliance laws; you may put your data at risk of seizure simply because you share the environment. Most of the time, if you want to switch from one cloud provider to another, the storage services offered by one cloud vendor may be incompatible with another vendor's platform services; for example, Amazon's Simple Storage Service (S3) is incompatible with IBM's Blue Cloud, Dell or the Google cloud platform. In a storage cloud, most clients probably want their data to be encrypted via SSL (Secure Sockets Layer) both ways across the Internet, and they most probably also want their data to be encrypted while it sits in the cloud storage pool. So who controls the encryption / decryption keys when the information is encrypted in the cloud - the client or the cloud vendor ? These are often unanswered questions. Therefore, before moving data to the cloud, make sure that the encryption / decryption keys work and have been tested, just as when the data resides on your own servers.
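One way to keep control of the keys is to encrypt data on the client before it reaches the cloud storage pool. The sketch below assumes the third-party cryptography package; key management itself (where the key is stored and rotated) is deliberately left out of scope.

# A minimal sketch of client-side encryption before upload to cloud storage.
from cryptography.fernet import Fernet

key = Fernet.generate_key()          # keep this key outside the cloud provider
cipher = Fernet(key)

plaintext = b"quarterly financial report"
ciphertext = cipher.encrypt(plaintext)    # upload only the ciphertext

# Later, after downloading the object back from the cloud :
assert cipher.decrypt(ciphertext) == plaintext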
The integrity of data means making certain that data is maintained identically during every operation (e.g. transmission, storage or retrieval). In other words, data integrity ensures the consistency and correctness of the data. Ensuring the integrity of the information means that it changes only in response to authorized transactions. This sounds good, but you must remember that there is still no common standard for ensuring data integrity.
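A simple, widely used integrity check is to record a cryptographic digest of the data before it leaves your servers and to verify it after every transfer or retrieval, as in the sketch below; this detects unauthorized modification but is not a substitute for a common integrity standard.

# A minimal sketch of an integrity check using a SHA-256 digest.
import hashlib

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

original = b"customer ledger 2024"
stored_digest = digest(original)          # kept by the data owner

retrieved = original                      # e.g. the object fetched back later
assert digest(retrieved) == stored_digest, "data was modified in transit or at rest"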
The use of SaaS services in the cloud means that much less software development is necessary. If you plan to use internally developed code in the cloud, a formal secure software development life cycle (SDLC) is even more important. Inadequate use of mashup technology (combinations of web services), which is crucial for cloud applications, will almost certainly lead to unknown security vulnerabilities in such applications. A security model should be integrated into the development tools to guide developers during the development phase and to restrict users to their authorized data once the system has been deployed. With an increasing number of mission-critical processes moving into the cloud, SaaS providers will need to provide log information directly and in real time, probably to their administrators and customers alike. Someone must take responsibility for monitoring security and compliance controls, and compliance cannot be demonstrated unless the application and the data can be tracked down to the end users.
As the Payment Card Industry Data Security Standard (PCI DSS) includes access to logs, auditors and regulators may refer to them when auditing a security report. Security managers must ensure that they obtain access to the service provider's logs as part of any service agreement. Cloud applications are constantly being enhanced with new features, and users must remain up to date about application improvements to be sure they are protected. The SDLC and security are affected by the speed at which applications change in the cloud. For example, Microsoft's SDLC assumes that mission-critical software will not change substantially for three to five years, but the cloud may require a change in an application every couple of weeks. Unfortunately, a secure SDLC cannot deliver a security cycle that keeps pace with such rapid change. This means that users have to update continuously, because an older version may not work or may not protect the data.
The appropriate fail-over technology is an often-overlooked aspect of securing the cloud. A company cannot survive if a mission-critical application goes offline, though it may survive the loss of a non-mission-critical application. Security must shift to the device and data level, so that businesses can ensure their data is protected wherever it goes. In cloud computing, security at the data level is one of the major challenges.
In a cloud world, the majority of compliance requirements do not yet provide for enforcement in the cloud. There is a wide range of IT security and compliance standards regulating most business interactions, and over time they will have to be translated to the cloud. SaaS makes it much more difficult for a customer to determine where its data resides in a network managed by the SaaS provider or a partner of that provider, posing all kinds of concerns about data protection, aggregation and security enforcement. Many compliance regulations require that data not be intermixed with other data on shared servers or databases. Some national governments place strict restrictions on what data about their citizens can be stored and for how long, and some banking regulations require that customers' financial data remain in their home country. Through cloud-based applications, many mobile IT users can access business data and infrastructure without going through the corporate network, which increases the need for businesses to enforce security between mobile users and cloud-based services. Placing large amounts of confidential information in a global cloud exposes companies to wide-ranging distributed threats - attackers no longer have to come on site to steal data, because it can all be found in one "virtual" location. Cloud virtualization efficiencies require virtual machines from multiple organizations to be co-located on the same physical resources. Although traditional data center security still applies in the cloud environment, physical separation and hardware-based security cannot protect virtual machines on the same server from attacking one another. Management access is via the Internet, rather than through the monitored and restricted direct or on-site connections of the conventional data center model. This increases risk and exposure and demands strict monitoring of changes to system controls and of access-control restrictions.

The complex and fluid design of virtual machines makes it hard to maintain security and keep it auditable. It is difficult to demonstrate the security status of a system and to detect the location of an insecure virtual machine. No matter where a virtual machine is located in the virtual environment, intrusion detection and prevention systems need to be able to detect malicious activity on that virtual machine. The interconnection of several virtual machines increases the attack surface and the risk of compromise spreading between machines. Individual virtual machines and physical servers in the cloud environment use the same operating systems together with business and web applications, which raises the threat of an attacker or malware exploiting vulnerabilities remotely. Virtual machines also become vulnerable as they move between the private cloud and the public cloud. A fully or partially shared cloud system has a greater attack surface and is thus at greater risk than a dedicated resource environment. Operating systems and application files in a virtualized cloud environment reside on shared physical infrastructure, which requires system, file and activity monitoring to provide confidence and auditable proof to corporate clients that their resources have not been compromised or manipulated.
In the cloud computing environment, the organization uses cloud resources for which the subscriber, not the cloud provider, is responsible for patching, so it is essential to maintain patch-management awareness. Companies are frequently required to prove that their conformance with security regulations, standards and auditing practices is consistent, irrespective of the location of the systems on which the data resides. In a cloud environment, data is fluid and can be placed on on-premises physical servers, on-premises virtual machines or off-premises virtual cloud computing resources, and auditors and compliance managers may have to take this into account. Many companies are likely to rush into cloud computing without serious consideration of the security implications, in their effort to profit from its benefits, including significant cost savings.
To create zones of trust in the cloud, the virtual machines need to protect themselves, essentially moving the perimeter to the virtual machine itself. Enterprise perimeter security is provided through firewalls, network segmentation, IDS/IPS, monitoring tools, De-Militarized Zones (DMZs) and the security policies associated with them. These security strategies and policies control the data that resides behind or transits the perimeter. In the cloud computing environment, the cloud service provider is responsible for the security and privacy of the customer's data.


4.5.1 Key Privacy Issues in the Cloud


Privacy deals with the collection, use, retention, storage and disclosure of Personally Identifiable Information (PII) or data. According to the American Institute of Certified Public Accountants (AICPA), the definition of privacy is given as :
"Privacy is nothing but the rights and obligations of individuals and organizations with respect to the collection, retention and disclosure of personal information."
Although privacy is an important aspect of security, most of the time it is ignored by users. Privacy raises many concerns related to data collection, use, retention and storage in the cloud, explained as follows.

a) Compliance issue
Compliance relates to the regulatory standards laid down for the use of personal information, or data privacy, by a country's laws or legislation. Compliance places restrictions on the use or sharing of personally identifiable information by cloud service providers. Various regulatory standards for data privacy exist in the USA, like the USA PATRIOT Act, HIPAA, GLBA, FISMA, etc.
The compliance concern depends on various factors like applicable laws, regulations, standards, contractual commitments, privacy requirements, etc. For example, as the cloud is a multitenant environment, user data is stored across multiple countries, regions or states, and each region or country has its own legislation on the use and sharing of personal data, which restricts how such data may be used.

b) Storage issue
In the cloud, storage is a big issue because the multitenant environment makes multiple copies of a user's data and stores them in multiple data centers across multiple countries. Therefore, users never come to know where their personal data is stored and in which country. The storage concern relates to where users' data is stored. So, the main concerns for a user or organization are : where is the data stored ? Was it transferred to another data center in another country ? What privacy standards enforced by those countries limit the transfer of personal data ?

c) Retention issue
The retention issue is related to the duration for which personal data is kept in storage, governed by retention policies. Each Cloud Service Provider (CSP) has its own set of retention policies that govern the data, so the user or organization has to look at the retention policy used by the CSP along with its exceptions.

®
TECHNICAL PUBLICATIONS - An up thrust for knowledge
Cloud Computing 4 - 22 Resource Management and Security in Cloud

d) Access issue
The access issue relates to the organization's ability to provide individuals with access to their personal information and to comply with stated requests. The user or organization has the right to know what personal data is kept in the cloud and can request that the CSP stop processing it or delete it from the cloud.

e) Auditing and monitoring


The organization has the right to know which audit policies have been implemented by the CSP. It can monitor the CSP's activities and assure its stakeholders that the privacy requirements for PII in the cloud are being met.

f) Destruction of data
At the end of the retention period, CSPs are supposed to destroy PII. The concern here is that organizations never come to know whether their data or PII in the cloud has actually been destroyed by the CSP, whether additional copies have been kept, or whether it has merely been made inaccessible to the organization.

g) Privacy and security breaches


Users in the cloud never come to know whether security breaches have occurred or not, so negligence on the part of the CSP may lead to privacy breaches; these have to be detected, disclosed and resolved by the CSP.

4.6 Software-as-a-Service Security


Future cloud models will likely use the Internet to fulfill their customers' requirements via SaaS and other XaaS models combined with Web 2.0 collaboration technologies. The move to cloud computing not only leads to the creation of new business models, but also creates new security problems and requirements, as previously mentioned. The evolutionary steps in the cloud service models are shown in Fig. 4.6.1.
For the near future, SaaS is likely to remain the dominant model in cloud services, and it is in this field that security practices and monitoring are most needed. As with a managed service provider, businesses and end users must look at a vendor's data protection policies before using its services, to avoid losing or being unable to access their data.


Fig. 4.6.1 Evolutionary steps in the cloud service models

The survey firm Gartner lists seven security problems, which should be discussed with
a cloud computing provider.
 Data location : Is it possible for the provider to check data location ?
 Data segregation : Ensure that encryption is effective at all times and that the encryption schemes were designed and tested by qualified experts.
 Recovery : In the event of a disaster, find out what will happen with data. Are
service providers are offering full restoration ? If so, how much time does it take ?
 Privileged user access : Find out who has privileged access to data and how such administrators are hired and managed.
 Regulatory compliance : Ensure the vendor is ready to be audited externally
and/or certified for security.
 Long-term viability : What if the company goes out of business, and what will
happen with the data ? How and in what format will the data be restored ?
 Investigative support : Is the vendor able to investigate any inappropriate or illegal
activity ?
Assessing data protection is now more difficult, which makes data security roles more critical
than in past years. Beyond the points in Gartner's report, encrypting the data yourself is a useful
tactic : if you encrypt the data using a trustworthy algorithm, the data is usable only with the
decryption key, regardless of the security and encryption policies of the service provider. This,
of course, leads to a further problem : how do you manage private keys in a pay-on-demand
computing infrastructure ? SaaS suppliers will have to incorporate and enhance the security
practices that managed service providers
provide, and develop new practices as the cloud environment evolves, in order to deal with the
security issues above along with those mentioned earlier. A structured agreement on how the
security organization operates and what it aims to achieve is one of the most critical artefacts
for a security team : it fosters a shared view of what security leadership is trying to accomplish
and encourages 'ownership' of the group's success.
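As a minimal sketch of the "encrypt it yourself" tactic mentioned above (this is an illustration only, not a prescription from Gartner or from any particular provider), the following Java code encrypts a record with AES-GCM using the standard javax.crypto APIs before the record is handed to a cloud store. The key stays with the customer, which is exactly the key-management problem raised in the paragraph above.

import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class ClientSideEncryption {
    public static void main(String[] args) throws Exception {
        // Assumption for illustration : a 256-bit AES key generated and kept by the
        // customer, never shared with the cloud provider.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey key = keyGen.generateKey();

        byte[] iv = new byte[12];                 // 96-bit IV, as recommended for GCM
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("sensitive record".getBytes(StandardCharsets.UTF_8));

        // Only the ciphertext (plus IV) would be uploaded; without the key the provider
        // cannot read the data, regardless of its own encryption policies.
        System.out.println("Encrypted " + ciphertext.length + " bytes");
    }
}

In practice the generated key would be stored in a key management system under the customer's control, which is the open question the text raises for pay-on-demand infrastructures.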

4.7 Security Governance


A security steering committee should be set up with the goal of providing guidance on security
initiatives and ensuring coordination with business and IT strategies. One of the outcomes of the
steering committee is usually a security charter. The charter must clearly define the roles and
responsibilities of the security team and of the other groups involved in performing information
security functions. The lack of a formalized strategy can lead to an unsupportable operating
model and an inadequate level of security. Furthermore, a lack of attention to security governance
may lead to failure to satisfy key business needs, including risk management, security monitoring,
application security and support for sales. The inability to properly govern and manage tasks may
also result in potential security risks going unaddressed and in missed opportunities to improve
the business, because the security team is not focused on the key business security functions
and activities.
The essential factors required in security governance are explained as follows.

1. Risk Assessment
A security risk assessment is crucial for helping the information security organization make
informed decisions that balance business utility against security goals and asset protection.
Failure to carry out formal risk assessments can contribute to an increase in information security
audit findings, can jeopardize certification goals and can lead to an ineffective and inefficient
collection of security controls that do not mitigate information security risks to an acceptable
level. A structured information security risk management process will proactively identify,
assess and manage security risks on a regular or as-required basis. Applications and infrastructure
should also undergo further, more detailed technical risk assessments in the form of threat
modeling. This helps product management and engineering groups to be more proactive in design
and testing and in collaborating with the internal security team. Threat modeling requires both
business-process knowledge and technical knowledge of how the applications or systems under
review actually work.


2. Risk Management
The identification of technology assets; the identification of data and its links to business
processes, applications and data stores; and the assignment of ownership and custodial
responsibilities are all part of effective risk management. Risk management measures also
involve maintaining a repository of information assets. Owners have the responsibility and the
authority to ensure the confidentiality, integrity, availability and privacy of information assets,
including their protection requirements. A formal risk assessment process must be developed for
allocating security resources, linked to business continuity.

3. Third-Party Risk Management


As SaaS moves further into cloud computing for the storage and processing of customer data,
security threats arising from third parties also have to be handled effectively. Failing to adopt a
third-party risk management framework may harm the provider's reputation, cause revenue losses
and invite legal proceedings if it is found that the provider has not carried out due diligence on
its third-party vendors.

4. Security Awareness
Security awareness and culture are among the few effective methods for handling human risk in
security. Failing to provide people with adequate knowledge and training may expose the
organization to a range of security risks in which people, rather than systems or application
vulnerabilities, become the threats and entry points. The risks caused by the lack of an effective
security awareness program include social engineering attacks, damage to reputation, slow
responses to potential security incidents and inadvertent leakage of customer data. A
one-size-fits-all approach to security awareness is not necessarily right for all SaaS
organizations; what matters more is an information security awareness and training program that
adapts the information and training to the person's role in the organisation. For example,
development engineers can receive security awareness training in the form of secure coding and
testing training, while data privacy and security certification training can be provided to
customer service representatives. Ideally, both generic and role-specific content should be used.

5. Security Portfolio Management


Given the speed of provisioning and the interactive nature of cloud computing, security portfolio
management is important to ensure the efficient and successful operation of any information
security program. A lack of portfolio and
project management discipline can result in projects not being completed and never realizing
their expected returns. Excessive and unrealistic workload expectations arise when projects are
not prioritized according to strategy, goals and resource availability. The security team should
ensure that a project plan and a project manager with appropriate training and experience are in
place for each new project undertaken by the security team, so that the project can be seen
through to completion. Portfolio and project management capabilities can be enhanced by
developing methodology, tools and processes that support the expected complexity of projects,
for both traditional business practices and cloud-based approaches.

6. Security Standards, Guidelines and Policies


Many resources and templates are available for developing information security policies,
standards and guidelines. A cloud computing security team should first identify the information
security and business requirements that are specific to cloud computing, SaaS and collaborative
applications. It should then develop policies, supporting standards and guidelines for these
requirements, which should be documented and implemented. These policies, standards and
guidelines should be reviewed regularly (at least annually), or whenever there are significant
changes in the business or IT environment, in order to maintain their relevance. Neglected
security standards, guidelines and policies may lead to misrepresentation and inadvertent
disclosure of information, as the business model for cloud computing changes frequently. To keep
pace with changing business initiatives, the business environment and the risk landscape, it is
important to maintain the accuracy and relevance of information security standards, guidelines
and policies. These standards, guidelines and policies also constitute the basis for maintaining
consistent performance and continuity of knowledge during resource turnover.

7. Training and Education
Without appropriate training and mentorship programs, the security team may not be prepared to
address the goals of the business. Programs should be developed to provide the security team and
its internal partners with core security and risk management skills and knowledge. This involves
a formal process for assessing and aligning skill sets within the security team and providing
appropriate training and mentorship, covering a broad base of fundamental security topics
including data protection and risk management. The security challenges facing an organization
will also change as the cloud computing business model and its associated services change.


8. Security Monitoring and Incident Response


Centralized security information management systems should be used to notify on security
vulnerabilities and to monitor systems on an ongoing basis through automated technologies. They
should be integrated with network and other systems monitoring processes, covering dedicated
activities such as security information management, security event management and the operation
of security operations centers. Regular, independent security testing by third parties should
also be incorporated. Many of the security threats and issues in SaaS at the application and data
layers require different security management approaches from conventional infrastructure and
perimeter controls, because the nature and severity of threats or attacks on SaaS organisations
changes dynamically. The company may therefore need to expand its security monitoring capability
to include application and data activity. This may also require experts in application security
and in the unique aspects of cloud privacy. Without that capacity and expertise, a company cannot
detect and prevent security threats or attacks against its customer data and service stability.

9. Requests for Information security during Sales Support


The security of the cloud computing customer is a top priority and a major concern; the absence
of information security representatives who can help the sales team address customers' concerns
could potentially lead to the loss of a sales opportunity. Responding to requests for information
and supporting sales are therefore part of the role of the organization's SaaS security team,
which vouches for the integrity of the provider's security business model, regulatory compliance
and certification, company reputation, competitiveness and marketability. Sales support teams
rely on the security team's ability to provide truthful, clear and concise responses to customer
needs expressed through a Request For Information (RFI) or Request For Proposal (RFP). A
structured process and a knowledge base of frequently requested information will add significant
efficiency and prevent the customer's RFI/RFP process from being supported on an ad-hoc basis.

10. Business Continuity Plan and Disaster Recovery


The goal of Business Continuity (BC) and Disaster Recovery (DR) planning is to reduce the impact
of an adverse event on business processes to an acceptable level. Business continuity and
resilience services ensure continuous operations across all layers of the business and help it
avoid, prepare for and recover from interruptions. SaaS services that facilitate uninterrupted
communications not only help the company to
recover from failure, but can also reduce the overall complexity, cost and risk of managing its
most critical applications on a regular basis. The cloud also offers dramatic opportunities for
cost-effective BC/DR solutions.

11. Vulnerability Assessment


Vulnerability assessment classifies network assets so that vulnerability management programs,
including patching and upgrading, can be prioritized more efficiently. It measures risk reduction
by setting targets for reduced vulnerability exposure and faster mitigation. Vulnerability
management should be incorporated into the business, together with investigation, patch
management and upgrade processes, so that vulnerabilities are addressed before they are exploited.

12. Data Privacy


In order to maintain data privacy, a risk assessment and a gap analysis of controls and procedures
must be carried out. Depending on the size and scale of the organization, either an individual or
a team must be assigned responsibility for maintaining privacy. A member of the privacy or company
security team should work with the legal team to address privacy issues and concerns. Where the
security team is responsible for privacy, there should, as with security, also be a privacy
steering committee to assist in making decisions on data protection. A qualified consultant or
staff member will ensure that the company is able to meet the data protection needs of its
customers and regulators. Mitigating privacy concerns requires relevant skills, training and
expertise that are normally not found in a security team.

13. Computer Forensics


Computer forensics is used for data gathering and analysis after an unfortunate incident :
collecting and preserving information, analyzing the data to reconstruct events and assessing the
status of the incident. Network forensics involves recording and analyzing network events in
order to determine the nature and source of information abuse, security attacks and other such
incidents. This is usually achieved by long-term recording or capturing of packets at key points
within the infrastructure, followed by data mining for content analysis and event reconstruction.

14. Password Testing


In cloud computing, distributed password crackers can be used by the SaaS security team or its
customers to periodically test password strength.


15. Security Images


Because cloud computing is based on virtualization, secure builds of "gold" VM images can be
created, providing up-to-date protection and reduced exposure through offline patching. The
offline VMs can be patched off-network, which makes testing the effect of security changes easier,
cheaper and more productive. This is a convenient way to duplicate the VM images of your
production environment, implement a security change, test its impact at low cost and with minimal
start-up time, and remove major obstacles to making security changes in a production environment.

16. Compliance and Security Investigation logs


You can use cloud computing to generate and store logs in the cloud, index those logs in real
time and take advantage of instant search results. A real-time view can be obtained because the
compute instances used for log analysis can be scaled with the logging load as required. Cloud
computing also provides the option of enhanced logging.

17. Secure Software Development Life Cycle


The Secure Software Development Life Cycle identifies specific threats and the risks they present,
develops and implements appropriate controls to counter those threats, and assists the
organization and/or its clients in managing the risks they pose. It aims to ensure consistency,
repeatability and conformity.

Fig. 4.7.1 Secure Software Development Life Cycle

The Secure Software Development Life Cycle consists of six phases which are shown
in Fig. 4.7.1 and described as follows.


I. Initial Investigation : To define and document the project processes and goals in the
program's security policy.
II. Requirement Analysis : To analyze current security policies and systems, assess emerging
threats and controls, study legal issues and perform risk analysis.
III. Logical design : To develop a security blueprint; plan incident response actions and
business responses to disaster; and determine whether the project can be continued in-house
and/or outsourced.
IV. Physical design : To select technologies that support the security plan, develop a
definition of a successful solution, design physical security measures to support the
technological solution, and review and approve the plans.
V. Implementation : To buy or develop the security solutions and, at the end of this phase,
submit a tested package to management for approval.
VI. Maintenance : To monitor, test and maintain the application code continuously for efficient
enhancement. Additional security processes, such as external and internal penetration testing
and standard security requirements for data classification, are developed to support application
development projects. Formal training and communication should also be introduced to raise
awareness of process improvement.

18. Security architectural framework


A security architectural framework must be developed for implementing authentication,
authorization, access control, confidentiality, integrity, non-repudiation, security management
and so on across every application in the enterprise. It is also used for evaluating processes,
operating procedures, technology specifications, people and organizational management, compliance
with security programs, and reporting. The major goals of the security architectural framework
are authentication, authorization, availability, confidentiality, integrity, accountability and
privacy. To achieve the business objectives, a security architecture document should be
established that outlines the security and privacy principles; this documentation is needed for
assessing risk management plans, asset-specific metrics, physical security, system access control,
network and computer management, application development and maintenance, business continuity
and compliance. Against this architecture, new designs can be reviewed to ensure that they conform
to the principles it describes, allowing more coherent and effective design reviews.

4.8 Virtual Machine Security


In a traditional network, several security attacks arise, such as buffer overflows, DoS attacks,
spyware, malware, rootkits, Trojan horses and worms. In a cloud environment, newer attacks may
arise, such as hypervisor malware, guest hopping, hijacking or VM rootkits. Another type of attack
on virtual machines is the man-in-the-middle attack on VM migration. Passive attacks on VMs
usually steal sensitive information or passwords, while active attacks manipulate kernel data
structures and can cause significant damage to cloud servers.
To overcome security attacks on VMs, network-level or hardware-level IDS can be used for
protection, shepherding programs can be applied for code execution control and verification, and
additional security technologies can be adopted. These additional technologies include tools such
as VMware's vSafe and vShield, hypervisor security enforcement and Intel vPro technology, together
with dynamic optimization infrastructure, hardened OS environments or isolated, sandboxed
execution.
In the cloud environment, physical servers are consolidated onto virtualized servers running
several virtual machine instances. Firewalls, intrusion detection and prevention, integrity
monitoring and log inspection can all be deployed as software on virtual machines to enhance
server integrity, increase protection and maintain compliance; this allows applications to move
from on-site environments to public clouds as virtual resources. The security software loaded on
a virtual machine should include a two-way stateful firewall that enables virtual machine
isolation and location awareness, allowing tighter policies and the flexibility to move the
virtual machine from on-premises to cloud resources, and making centralized management of the
server firewall policy easier. Integrity monitoring and log inspection should be applied at the
virtual machine level. This approach, which connects the machine back to a home server, has the
benefit that the security software can be incorporated into a single software agent providing
consistent, cloud-wide control and management, while integrating seamlessly with existing
investments in security infrastructure and delivering economies of scale, ease of deployment and
cost savings.

4.9 IAM
Identity and Access Management (IAM) is a vital function for every organisation, and SaaS
customers have a fundamental expectation that their data is governed by the principle of least
privilege. The least privilege principle says that only the minimum access required to perform an
operation should be granted, and only for the minimum amount of time required. Aspects of current
models, including trust principles,
privacy implications and operational aspects of authentication and authorization, are challenged
within the cloud environment, in which services are delivered on demand and can continuously
evolve. To meet these challenges, SaaS providers need to align their efforts by piloting new
models and management processes for IAM that extend end-to-end trust and identity across the cloud
and their enterprises. The balance between usability and security is an additional issue : if a
good balance is not achieved, obstacles to the successful completion of support and maintenance
activities will impact both the business and its user groups.
As the cloud is composed of many services deployed on large infrastructure, it requires multiple
security mechanisms to protect it from failure.
Identity and access management is the security framework composed of policy and governance
components used for the creation, maintenance and termination of digital identities, with
controlled access to shared resources. It is made up of multiple processes, components, services
and standard practices, and it focuses on two parts, namely identity management and access
management. Directory services are used in IAM to create a repository for identity management,
authentication and access management. IAM provides many features such as user management,
authentication management, authorization management, credential and attribute management,
compliance management, monitoring and auditing. The lifecycle of identity management is shown in
Fig. 4.9.1.

Fig. 4.9.1 Lifecycle of Identity Management

The IAM architecture is made up of several processes and activities (see Fig. 4.9.2). The
processes supported by IAM are given as follows.


a) User management - It provides processes for managing the identities of different entities.
b) Authentication management - It provides activities for managing the process of determining
that an entity is who or what it claims to be.
c) Access management - It provides policies for access control in response to a request for a
resource by an entity.
d) Data management - It provides activities for propagating the data needed for authorization
to resources using automated processes.
e) Authorization management - It provides activities for determining the rights associated with
entities and deciding what resources an entity is permitted to access in accordance with the
organization's policies.
f) Monitoring and auditing - Based on the defined policies, it provides monitoring, auditing and
reporting of users' compliance regarding access to resources.
The activities supported by IAM are given as follows.
a) Provisioning - Provisioning comprises the essential processes that provide users with the
access to data and resources they need. It supports the management of all user account operations
such as add, modify, suspend and delete, together with password management. Through provisioning,
users are given access to data, systems, applications and databases based on a unique user
identity. Deprovisioning does the reverse : it deactivates or deletes the user's identity and
privileges.
b) Credential and attribute management - Credential and attribute management prevents identity
impersonation and inappropriate account use. It deals with the management of credentials and user
attributes - their creation, issuance, management and revocation - to minimize the associated
business risk. Individuals' credentials are verified during the authentication process. Credential
and attribute management processes include provisioning static or dynamic attributes that comply
with a password standard, managing the encryption of credentials and handling access policies for
user attributes.
c) Compliance management - Compliance management is the process of monitoring and tracking access
rights and privileges to ensure the security of an enterprise's resources. It also helps auditors
verify compliance with various access control policies and standards. It includes practices such
as access monitoring, periodic auditing and reporting.


d) Identity federation management - Identity federation management is the process of managing
trust relationships beyond network boundaries, where organizations come together to exchange
information about their users and entities.
e) Entitlement management - In IAM, entitlements are nothing but authorization policies.
Entitlement management provides processes for the provisioning and deprovisioning of the
privileges needed by users to access resources, including systems, applications and databases.

Fig.4.9.2 IAM Architecture
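To make the authorization and entitlement management processes described above more concrete, the sketch below is a toy entitlement check in Java. It is an illustration only : the class name, identities, resources and permissions are invented for the example and are not part of any IAM product. An already-authenticated identity is granted a minimal set of entitlements (least privilege), and every access request is evaluated against them.

import java.util.Map;
import java.util.Set;

// Toy entitlement store : identity -> resource -> allowed actions (least privilege).
public class EntitlementChecker {
    private final Map<String, Map<String, Set<String>>> entitlements = Map.of(
            "alice", Map.of("payroll-db", Set.of("read")),            // read-only access
            "bob",   Map.of("payroll-db", Set.of("read", "write")));  // broader role

    // Authorization check : called only after the identity has been authenticated.
    public boolean isAllowed(String identity, String resource, String action) {
        return entitlements
                .getOrDefault(identity, Map.of())
                .getOrDefault(resource, Set.of())
                .contains(action);
    }

    public static void main(String[] args) {
        EntitlementChecker iam = new EntitlementChecker();
        System.out.println(iam.isAllowed("alice", "payroll-db", "write")); // false - not granted
        System.out.println(iam.isAllowed("bob", "payroll-db", "write"));   // true
    }
}

In a real IAM deployment the entitlement store would live in a directory service or policy engine and be maintained through the provisioning and deprovisioning activities listed above, rather than being hard-coded.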


4.10 Security Standards


Security standards define the processes, measures and practices required to implement a security
program in a web or network environment. These standards also apply to cloud-related IT activities
and include specific actions to ensure that a secure environment is provided for cloud services,
along with privacy for confidential information. Security standards are based on a set of key
principles designed to protect such a trusted environment. The following sections explain the
different security standards used to protect the cloud environment.

4.10.1 Security Assertion Markup Language (SAML)


Security Assertion Markup Language (SAML) is a security standard developed by
OASIS Security Services Technical Committee that enables Single Sign-On technology
(SSO) by offering a way of authenticating a user once and then communicating
authentication to multiple applications. It is an open standard for exchanging
authentication and authorization data between parties, in particular, between an identity
provider and a service provider.
It enables Identity Providers (IdPs) to pass authentication and authorization assertions to
Service Providers (SPs). A range of existing standards, including SOAP, HTTP and XML, are
incorporated into SAML, and SAML transactions use Extensible Markup Language (XML) for
standardized communication between the identity provider and service providers. SAML is the link
between the authentication of a user's identity and the authorization to use a service. The
majority of SAML transactions are expressed in a standardized XML form; the XML schema is mainly
used to specify SAML assertions and protocols. For authentication and message integrity, both
SAML 1.1 and SAML 2.0 use digital signatures based on the XML Signature standard. XML encryption
is supported in SAML 2.0 but not in SAML 1.x, which has no encryption capabilities. SAML defines
assertions, protocols, bindings and profiles based on XML.
A SAML binding is a mapping of a SAML protocol message onto standard messaging formats and/or
communications protocols; it defines how SAML requests and responses map to those protocols. SAML
Core refers to the general syntax and semantics of SAML assertions and to the protocol used to
request and transmit those assertions from one system entity to another. SAML standardizes queries
and responses about user authentication, entitlements and attributes in an XML format. A platform
or application that can issue security data can be a SAML authority, often called the asserting
party; the assertion consumer or relying party is a partner site receiving the security
information. The information exchanged covers
the authentication status of a subject, access permissions and attribute information. SAML
assertions are usually passed from identity providers to service providers; assertions contain
statements that service providers use to make access-control decisions. A SAML protocol describes
how certain SAML elements (including assertions) are packaged within SAML request and response
elements, and gives the processing rules that SAML entities must follow when producing or
consuming these elements. For the most part, a SAML protocol is a simple request-response protocol.
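As a hedged illustration of what a service provider might do with a received assertion (a real deployment would use a SAML library and must validate the XML signature and conditions, which this sketch omits), the following Java code uses only the standard XML APIs to pull the NameID - the authenticated subject asserted by the IdP - out of a SAML 2.0 assertion. The sample XML string is invented for the example.

import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class SamlNameIdReader {
    public static void main(String[] args) throws Exception {
        // Minimal, made-up assertion fragment for illustration only.
        String assertion =
            "<saml:Assertion xmlns:saml=\"urn:oasis:names:tc:SAML:2.0:assertion\">" +
            "  <saml:Subject><saml:NameID>alice@example.com</saml:NameID></saml:Subject>" +
            "</saml:Assertion>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        Document doc = factory.newDocumentBuilder()
                .parse(new ByteArrayInputStream(assertion.getBytes(StandardCharsets.UTF_8)));

        // The NameID element identifies the subject the identity provider has authenticated.
        NodeList ids = doc.getElementsByTagNameNS(
                "urn:oasis:names:tc:SAML:2.0:assertion", "NameID");
        System.out.println("Authenticated subject : " + ids.item(0).getTextContent());
    }
}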

4.10.2 Open Authentication (OAuth)


OAuth is a standard protocol that allows secure API authorization for various types of web
applications in a simple, standard way. It is an open standard for delegated access, commonly used
as a way for internet users to grant websites and applications access to their data without giving
out their passwords. It enables secure authorization from web, mobile or desktop applications in a
simple and standard manner, and provides a method for publishing and interacting with protected
data. It gives applications access to user data while protecting the credentials of the users'
accounts. OAuth enables users to share information held by service providers with consumer
applications without revealing their full credentials. This mechanism is used by companies such as
Amazon, Google, Facebook, Microsoft and Twitter to permit users to share information about their
accounts with third-party applications or websites. It specifies a process by which resource
owners can authorize third-party access to their server resources without sharing their
credentials. Over secure Hypertext Transfer Protocol (HTTPS), OAuth essentially allows access
tokens to be issued to third-party clients by an authorization server, with the approval of the
resource owner; the third party then uses the access token to access the protected resources
hosted by the resource server.
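A minimal sketch of the last step just described - the client exchanging an authorization grant for an access token over HTTPS - using only the JDK's built-in HttpClient (Java 11+). The endpoint URL, client credentials and authorization code below are placeholders; the exact parameters and response format depend on the provider and the OAuth version in use.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class OAuthTokenRequest {
    public static void main(String[] args) throws Exception {
        // Placeholder values : a real flow obtains the code from the authorization server
        // after the resource owner has approved the request.
        String body = "grant_type=authorization_code"
                + "&code=AUTH_CODE_FROM_CALLBACK"
                + "&redirect_uri=https://client.example.com/callback"
                + "&client_id=my-client-id"
                + "&client_secret=my-client-secret";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://auth.example.com/oauth/token")) // hypothetical endpoint
                .header("Content-Type", "application/x-www-form-urlencoded")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());

        // The response typically carries the access token, which the client then presents
        // to the resource server when requesting the protected resources.
        System.out.println(response.body());
    }
}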

4.10.3 Secure Sockets Layer and Transport Layer Security


Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are cryptographically secure
protocols that provide security and data integrity for TCP/IP-based communications. Network
connection segments at the transport layer are encrypted by TLS and SSL. Many implementations of
these protocols are widely used in web browsers, e-mail, instant messaging and voice over IP. TLS
is the more recent IETF standard protocol, defined in RFC 5246. The TLS protocol allows
client/server applications to communicate across a network in a way that avoids eavesdropping,
exploitation, tampering and message forgery. TLS uses cryptography to ensure endpoint
authentication and data confidentiality.


TLS authentication is typically one-way : only the server is authenticated, so the client learns
the server's identity while the client itself is not authenticated. At the browser level, this
means the browser validates the server's certificate by checking the digital signatures of the
Certification Authorities (CAs) in the certificate's issuing chain. This validation alone does not
identify the server to the end user. To be truly sure of the server's identity, the end user must
verify the identifying information contained in the server's certificate - in effect, checking
that the certificate specifies the URL, name or address that the user intends to reach. That is
the only way for end users to know the server's "identity" and to establish it securely.
Malicious websites cannot use another website's valid certificate, because they do not possess the
corresponding private key and therefore cannot complete the handshake and decrypt the
transmission. Since only a trusted CA can embed a URL in a certificate, comparing the apparent
URL with the URL specified in the certificate is a meaningful check. TLS also supports a more
secure bilateral connection mode, which ensures that both ends of the connection are communicating
with the party they believe they are connected to. This is called mutual authentication, and it
requires the TLS client side to hold a certificate as well.
TLS involves three basic phases :
 Peer negotiation for algorithm support - cipher suites are negotiated between the client and
the server to determine the ciphers to be used;
 Authentication and key exchange - decisions are made on the authentication and key-exchange
algorithms to be used; these are public-key algorithms;
 Symmetric cipher encryption and message authentication - message authentication codes are
computed using cryptographic hash functions.
Once these decisions are made, the transfer of data can commence.
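The short sketch below performs exactly the client-side steps just described, using the standard Java Secure Socket Extension (JSSE) APIs : the handshake negotiates a cipher suite, the server's certificate chain is validated against the JVM's trusted CAs, and the negotiated parameters can then be inspected before any application data is sent. The host name is a placeholder.

import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsHandshakeDemo {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("www.example.com", 443)) {
            socket.startHandshake();  // negotiation, server authentication and key exchange

            // After the handshake, the protocol version and cipher suite are fixed;
            // application data sent over this socket is encrypted and integrity protected.
            System.out.println("Protocol     : " + socket.getSession().getProtocol());
            System.out.println("Cipher suite : " + socket.getSession().getCipherSuite());
        }
    }
}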
Summary

 Resource management is a process for the allocation of computing, storage,


networking and subsequently energy resources to a set of applications, in a
context that aims to collectively meet the performance goals of infrastructure
providers, cloud users and applications.
 In Inter cloud resource provisioning, developers have to consider how to design
the system to meet critical requirements such as high throughput, HA, and fault
tolerance. The infrastructure for operating cloud computing services may be
either a physical server or a virtual server.
 Resource provisioning schemes are used for the rapid discovery of cloud
computing services and data in cloud.


 The provisioning of storage resources in cloud is often associated with the terms
like distributed file system, storage technologies and databases.
 There are three methods of resource provisioning namely Demand-Driven,
Event-Driven and Popularity-Driven.
 The cloud providers can expand or redimension their provision capacity in a
competitive and dynamic manner by leasing the computation and storage
resources of other cloud service providers with the use of Intercloud
architectural principles.
 Although cloud computing has many benefits, security issues in cloud platforms have led many
companies to hesitate to migrate their essential resources to the cloud.
 Even though cloud computing and virtualization can enhance business efficiency by breaking
the physical ties between an IT infrastructure and its users, it is important to resolve the
increased security threats in order to fully benefit from this new computing paradigm.
 Some key security concerns in cloud platforms are trust, privacy, lack of security controls
and copyright protection.
 Key privacy issues in cloud computing include compliance, storage, retention, access,
auditing and monitoring, and so on.
 The lack of a formalized strategy can lead to an unsupportable operating model and an
inadequate level of security.
 The essential factors required in security governance are Risk Assessment and
management, Security Awareness, Security Portfolio Management, Security
Standards, Guidelines and Policies, Security Monitoring and Incident Response,
Business Continuity Plan and Disaster Recovery and so on.
 To overcome the security attacks on VMs, Network level IDS or Hardware level
IDS can be used for protection, shepherding programs can be applied for code
execution control and verification and additional security technologies can be
used.
 Identity and access management is the security framework composed of policy and governance
components used for the creation, maintenance and termination of digital identities, with
controlled access to shared resources. It is composed of multiple processes, components,
services and standard practices.
 Security standards are needed to define the processes, measures and practices
required to implement the security program in a web or network environment.

 Security Assertion Markup Language (SAML) is a security standard that enables


Single Sign-On technology (SSO) by offering a way of authenticating a user once
and then communicating authentication to multiple applications.
 OAuth is an open standard for delegating access and uses it as a way of allowing
internet users to access their data on websites and applications without
passwords.
 Secure Sockets Layer (SSL) and Transport Layer Security (TLS) are
cryptographically secure protocols to provide security and data integrity for
TCP/IP based communications.

Short Answered Questions

Q.1 List any four host security threats in public IaaS. AU : Dec.-17
Ans. : The most common host security threats in the public IaaS cloud are
 Hijacking of accounts that are not properly secured.

 Stealing keys, such as the SSH private keys used to access and manage hosts.

 Attacking unpatched and vulnerable services listening on standard ports such as FTP,
NetBIOS and SSH.
 Attacking systems that are not secured by host firewalls.

 Deploying Trojans and embedding viruses in the software running inside the VM.

Q.2 Mention the importance of transport level security. AU : Dec.-16


Ans. : The TLS protocol allows client / server applications to communicate across a
network in a way that avoids eavesdropping, exploitation, tampering and message forgery. TLS uses
cryptography to ensure endpoint authentication and data confidentiality. TLS authentication is
usually one-way : only the server is authenticated, and the client learns the server's identity.
At the browser level, this means the browser validates the server's certificate by checking the
digital signatures of the CAs in the certificate's issuing chain. This validation alone does not
identify the server to the end user; the end user must verify the identifying information
contained in the server's certificate for the server to be truly identifiable.
Q.3 Discuss on the application and use of identity and access management. AU : Dec.-16

OR What is identity and access management in a cloud environment ? AU : Dec.18


Ans. : Identity and access management in cloud computing is the security framework composed of
policy and governance components used for the creation, maintenance and termination of digital
identities, with controlled access to shared resources. It is composed of multiple processes,
components, services and standard
practices. It focuses on two parts, namely identity management and access management. Directory
services are used in IAM for creating a repository for identity management, authentication and
access management. IAM provides many features such as user management, authentication management,
authorization management, credential and attribute management, compliance management, monitoring
and auditing, etc.
Q.4 What are the various challenges in building the trust environment ? AU : May-17
Ans. : In cloud computing, trust is important for building healthy relationships between the
cloud service provider and the cloud user. A trust environment between service providers and
cloud users cannot be built easily, as customers place limited belief and trust in any particular
cloud service provider because of the growing number of providers available on the internet. The
various challenges in building the trust environment are :
a) Lack of trust between service providers and cloud users can prevent cloud computing from
being generally accepted as a solution for on-demand services.
b) It can create a lack of transparency, difficulty in communication and confidentiality issues
between the cloud service provider and cloud users.
c) Lack of standardization.
d) Challenges due to multi-tenancy and audit trails.
Q.5 Differentiate between authentication and authorization. AU : Dec.-19

Ans. : Authentication is the process of validating an individual's credentials, such as user
name / user ID and password, to verify their identity. Authentication technology provides access
control by checking whether a user's credentials match the credentials in a database of authorized
users or in an authentication server. It establishes the user's identity before revealing any
sensitive information.
Authorization is the technique used to determine the permissions granted to an authenticated
user; in simple words, it checks whether the user is permitted to access particular resources or
not. Authorization occurs after authentication : once the user's identity has been assured, the
access list for the user is determined by looking up the entries stored in tables and databases.
In the authentication process the identity of the user is checked to grant access to the system,
while in the authorization process the user's privileges are checked before granting access to
resources. Authentication is always performed before authorization.


Q.6 List key privacy issues in cloud. AU : Dec.-19


Ans. : Privacy refers to the rights and obligations of individuals and organizations with
respect to the collection, retention and disclosure of personal information. Although privacy is
an important aspect of security, it is often ignored by users. Privacy raises many concerns
related to data collection, use, retention and storage in the cloud, which are listed as follows :
a) Compliance issue b) Storage issue
c) Retention issue d) Access issue
e) Auditing and monitoring f) Destruction of data
g) Privacy and security breaches
Q.7 List out the security challenges in cloud. AU : May -19
Ans. : The security challenges in the cloud are :
a) Lack of visibility and control
b) Compliance complexity issues
c) Trust and data privacy issues
d) Data breaches and downtime
e) Issues related to user access control
f) Vendor lock-in
g) Lack of transparency
h) Insecure interfaces and APIs
i) Insufficient due diligence
j) Shared technology vulnerabilities
k) Potential threats like Distributed Denial of Service (DDoS), man-in-the-middle attacks or
traffic hijacking, etc.
Q.8 How can the data security be enforced in cloud ? AU : May-19
Ans. : In cloud computing, data security can be enforced by :
a) Providing data encryption for data in transit
b) Providing data privacy and privacy protection
c) Providing data availability with minimal downtime
d) Preserving data integrity
e) Maintaining confidentiality, integrity and availability of data
f) Incorporating different access control schemes like Role Based Access Control (RBAC),
Mandatory Access Control or Discretionary Access Control
g) Securing data against different threats


Q.9 What are three methods of resource provisioning ?


Ans. : Refer section 4.2.4.
Q.10 What is the purpose of Open Authentication in the cloud computing ?
Ans. : OAuth is a standard protocol in cloud computing that allows secure API authorization for
various types of web applications in a simple, standard way. It is an open standard for delegated
access, commonly used as a way for internet users to grant websites and applications access to
their data without giving out their passwords. It enables secure authorization from web, mobile
or desktop applications in a simple and standard manner, and provides a method for publishing and
interacting with protected data. It gives applications access to user data while protecting the
credentials of the users' accounts. OAuth enables users to share information held by service
providers with consumer applications without revealing their full credentials. This mechanism is
used by companies such as Amazon, Google, Facebook, Microsoft and Twitter to permit users to share
information about their accounts with third-party applications or websites.

Long Answered Questions

Q.1 “In today’s world, infrastructure security and data security are highly challenging at
network, host and application levels”, Justify and explain the several ways of protecting the
data at transit and at rest. AU : May-18
Ans. : Refer section 4.4.1 to 4.4.4.
Q.2 Explain the baseline Identity and access management (IAM) factors to be practices by
the stakeholders of cloud services and the common key privacy issues likely to happen in the
cloud environment. AU : May-18
Ans. : Refer section 4.9 for Identity and access management and 4.5.1 for the common
key privacy issues likely to happen in the environment.
Q.3 What is the purpose of IAM ? Describe its functional architecture with an illustration.
AU : Dec.-17
Ans. : Refer section 4.9.
Q.4 Write details about cloud security infrastructure. AU : Dec.-16
Ans. : Refer section 4.4.
Q.5 Write detailed note on identity and access management architecture. AU : May-17
Ans. : Refer section 4.9.
Q.6 Describe the IAM practices in SaaS, PaaS and IaaS availability in cloud. AU : Dec.-19
Ans. : Refer section 4.9.


Q.7 How is the identity and access management established in cloud to counter threats ?
AU : May-19
Ans. : Refer section 4.9.
Q.8 Write detailed note on Resource Provisioning and Resource Provisioning Methods
Ans. : Refer section 4.2.
Q.9 How Security Governance can be achieved in cloud computing environment
Ans. : Refer section 4.7.
Q.10 Explain different Security Standards used in cloud computing
Ans. : Refer section 4.10.



5
Cloud Technologies and
Advancements
Syllabus
Hadoop – MapReduce – Virtual Box -- Google App Engine – Programming Environment for
Google App Engine - Open Stack – Federation in the Cloud – Four Levels of Federation –
Federated Services and Applications – Future of Federation.

Contents
5.1 Hadoop
5.2 Hadoop Distributed File system (HDFS)
5.3 Map Reduce
5.4 Virtual Box
5.5 Google App Engine
5.6 Programming Environment for Google App Engine
5.7 Open Stack
5.8 Federation in the Cloud
5.9 Four Levels of Federation
5.10 Federated Services and Applications
5.11 The Future of Federation

(5 - 1)

5.1 Hadoop
With the evolution of the internet and related technologies, high computational power, large
volumes of data storage and faster data processing have become basic needs for most organizations,
and these needs have increased significantly over time. Organizations today produce huge amounts
of data at a very fast rate. Recent surveys on data generation report that Facebook produces
roughly 600+ TB of data per day and analyzes 30+ petabytes of user-generated data; a Boeing jet
airplane generates more than 10 TB of data per flight, including geo maps, special images and
other information; and Walmart handles more than 1 million customer transactions every hour, which
are imported into databases estimated to contain more than 2.5 petabytes of data.
So, there is a need to acquire, analyze, process, handle and store such huge amounts of data,
called big data. The different challenges associated with big data are given below :
a) Volume : Volume relates to the size of big data. The amount of data is growing day by day and
is very large. According to IBM, in the year 2000, 8 lakh petabytes of data were stored in the
world. The challenge here is how to deal with such huge volumes of big data.
b) Variety : Variety relates to the different formats of big data. Nowadays most of the data
stored by organizations has no proper structure and is called unstructured data. Such data has a
complex structure and cannot be represented using rows and columns. The challenge here is how to
store different formats of data in databases.
c) Velocity : Velocity relates to the speed of data generation, which is very high. It is the
rate at which data is captured, generated and shared. The challenge here is how to react to the
massive flow of information in the time required by the application.
d) Veracity : Veracity refers to the uncertainty of data. The data stored in databases is
sometimes inaccurate or inconsistent, which results in poor data quality. Inconsistent data
requires a lot of effort to process.
Traditional database management techniques cannot handle these four characteristics and do not
support storing, processing, handling and analyzing big data. The challenges associated with big
data can therefore be addressed using one of the most popular frameworks provided by Apache,
called Hadoop.
Apache Hadoop is an open source software project that enables distributed processing of large
data sets across clusters of commodity servers using programming
models. It is designed to scale up from a single server to thousands of machines, with a very
high degree of fault tolerance. It is a software framework for running applications on clusters
of commodity hardware, offering massive storage, enormous processing power and support for an
almost limitless number of concurrent tasks or jobs.
The Hadoop core is divided into two fundamental components : HDFS and the MapReduce engine. HDFS
is a distributed file system, inspired by GFS, that organizes files and stores their data on a
distributed computing system and acts as the data storage manager, while MapReduce is the
computation engine running on top of HDFS. The Hadoop ecosystem components are introduced next,
and the working of HDFS and MapReduce is explained in the following sections.

5.1.1 Hadoop Ecosystem Components


Although HDFS and MapReduce are the two main components of the Hadoop architecture, there are
several other components used for storing, analyzing and processing big data, collectively termed
the Hadoop ecosystem components. The different components of the Hadoop ecosystem are shown in
Fig. 5.1.1 and explained in Table 5.1.1.

Fig. 5.1.1 : Hadoop ecosystem components


Sr. No. Name of Component Description

1) HDFS It is the Hadoop distributed file system, used to split data into blocks stored across
distributed servers for processing. It keeps several copies of each data block on multiple
cluster nodes so that they can be used if a failure occurs.

2) MapReduce It is a programming model for processing big data. It comprises two programs written
in Java, the mapper and the reducer. The mapper extracts data from HDFS and puts it into maps,
while the reducer aggregates the results generated by the mappers.

3) Zookeeper It is a centralized service used for maintaining configuration information, with
distributed synchronization and coordination.

4) HBase It is a column-oriented database service used as a NoSQL solution for big data.

5) Pig It is a platform for analyzing large data sets using a high-level language. It uses a
dataflow language and provides a parallel execution framework.

6) Hive It provides data warehouse infrastructure for big data.

7) Flume It provides a distributed and reliable service for efficiently collecting, aggregating
and moving large amounts of log data.

8) Sqoop It is a tool designed for efficiently transferring bulk data between Hadoop and
structured data stores such as relational databases.

9) Mahout It provides libraries of scalable machine learning algorithms implemented on top of
Hadoop using the MapReduce framework.

10) Oozie It is a workflow scheduler system to manage Hadoop jobs.

11) Ambari It provides a software framework for provisioning, managing and monitoring Hadoop
clusters.

Table 5.1.1 : Different components of Hadoop ecosystem

Of all the components above, HDFS and MapReduce are the two core components of the Hadoop
framework; they are explained in the next sections.

5.2 Hadoop Distributed File system (HDFS)


The Hadoop Distributed File System (HDFS) is the Hadoop implementation of the distributed file
system design and holds large amounts of data. It provides easy access to the stored data for many
clients distributed across the network. It is highly fault tolerant and designed to run on
low-cost hardware (called commodity hardware). Files in HDFS are stored across multiple machines
in a redundant fashion to recover from data loss in case of failure.
HDFS enables the storage and management of large files on a distributed storage medium spread over
a pool of data nodes. A single name node runs in a cluster and is associated with multiple data
nodes; together they provide management of the hierarchical file organization and namespace. An
HDFS file is composed of fixed-size blocks or chunks that are stored on data nodes. The name node
is responsible for storing the metadata about each file, which includes attributes such as file
type, size, date and time of creation and properties of the file, as well as the mapping of blocks
to files at the data nodes. The data node treats each data block as a separate file and shares
this critical information with the name node.
HDFS provides fault tolerance through data replication, which can be specified at file creation
time by the degree-of-replication attribute (i.e., the number of copies made); this becomes
progressively more significant in bigger environments consisting of many racks of data servers.
The significant benefits provided by HDFS are given as follows :
 It provides streaming access to file system data.
 It is suitable for distributed storage and processing.
 It is optimized to support high-throughput streaming read operations rather than random access.
 It supports file operations such as read, write, delete and append, but not update.
 It provides Java APIs and command line interfaces to interact with HDFS.
 It provides file permissions and authentication for files on HDFS.
 It provides continuous monitoring of name nodes and data nodes based on continuous
"heartbeat" communication from the data nodes to the name node.
 It provides rebalancing of data nodes so as to equalize the load by migrating blocks of data
from one data node to another.
 It uses checksums and digital signatures to manage the integrity of data stored in a file.


 It has built-in metadata replication to recover data during failures and to protect against
corruption.
 It also provides synchronous snapshots to facilitate rollback after a failure.

5.2.1 Architecture of HDFS


HDFS follows a master-slave architecture using a name node and data nodes. The name node acts as
the master, while multiple data nodes work as slaves. HDFS is implemented as a block-structured
file system in which files are broken into blocks of fixed size and stored on Hadoop clusters.
The HDFS architecture is shown in Fig. 5.2.1.

Fig. 5.2.1 : HDFS architecture

HDFS is composed of the following elements :

1. Name Node

An HDFS cluster consists of a single name node, called the master server, that manages the file system namespace and regulates access to files by clients. It runs on commodity hardware and stores all metadata for the file system across the cluster. The name node serves as the single arbitrator and repository for HDFS metadata, which is kept in main memory for faster random access. The entire file system namespace is contained in a file called FsImage stored on the name node's file system, while the transaction log is recorded in the EditLog file.

2. Data Node

In HDFS, multiple data nodes exist that manage the storage attached to the nodes they run on. They are usually used to store users' data on HDFS clusters.

Internally, a file is split into one or more blocks that are stored on data nodes. The data nodes are responsible for handling read/write requests from clients. They also perform block creation, deletion and replication upon instruction from the name node. A data node stores each HDFS data block in a separate file, and several blocks are stored on different data nodes. The requirement of such a block - structured file system is to store, manage and access file metadata reliably.
The representation of name node and data node is shown in Fig. 5.2.2.

Fig. 5.2.2 : Representation of name node and data nodes

3. HDFS Client

In the Hadoop distributed file system, user applications access the file system using the HDFS client. Like any other file system, HDFS supports operations to read, write and delete files, and operations to create and delete directories. The user references files and directories by paths in the namespace. The user application does not need to be aware that file system metadata and storage are on different servers, or that blocks have multiple replicas. When an application reads a file, the HDFS client first asks the name node for the list of data nodes that host replicas of the blocks of the file. The client then contacts a data node directly and requests the transfer of the desired block. When a client writes, it first asks the name node to choose data nodes to host replicas of the first block of the file. The client organizes a pipeline from node to node and sends the data. When the first block is filled, the client requests new data nodes to be chosen to host replicas of the next block. The choice of data nodes for each block is likely to be different.

4. HDFS Blocks

In general, a user's data is stored in HDFS in terms of blocks. The files in the file system are divided into one or more segments called blocks. The default size of an HDFS block is 64 MB, which can be increased as per need.
HDFS is fault tolerant such that if a data node fails, the current block write operation on that data node is re - replicated to some other node. The block size, number of
replicas and replication factor are specified in the Hadoop configuration file. The synchronization between the name node and data nodes is done by heartbeat functions which are periodically generated by the data nodes to the name node.
Apart from the above components, a job tracker and task trackers are used when a MapReduce application runs over HDFS. Hadoop core consists of one master job tracker and several task trackers. The job tracker runs on the name node like a master while the task trackers run on data nodes like slaves.
The job tracker is responsible for taking the requests from a client and assigning task
trackers to it with tasks to be performed. The job tracker always tries to assign tasks to
the task tracker on the data nodes where the data is locally present. If for some reason the
node fails the job tracker assigns the task to another task tracker where the replica of the
data exists since the data blocks are replicated across the data nodes. This ensures that the
job does not fail even if a node fails within the cluster.

5.3 MapReduce


MapReduce is a programming model provided by Hadoop that allows expressing distributed computations on huge amounts of data. It provides easy scaling of data processing over multiple computational nodes or clusters. In the MapReduce model, the data processing primitives used are called mapper and reducer. Every MapReduce program must have at least one mapper and one reducer subroutine. The mapper has a map method that transforms an input key - value pair into any number of intermediate key - value pairs, while the reducer has a reduce method that transforms the aggregated intermediate key - value pairs into any number of output key - value pairs.
MapReduce keeps all processing operations separate for parallel execution, where a complex problem of extremely large size is decomposed into subtasks. These subtasks are executed independently of each other. After that, the results of all independent executions are combined together to get the complete output.

5.3.1 Features of MapReduce


The different features provided by MapReduce are explained as follows :
 Synchronization : MapReduce supports the execution of concurrent tasks. When concurrent tasks are executed, they need synchronization. Synchronization is provided by reading the state of each MapReduce operation during execution and using shared variables for them.

 Data locality : In MapReduce, although the data resides on different clusters, it appears local to the user's application. To obtain the best results, the code and data of an application should reside on the same machine.
 Error handling : The MapReduce engine provides different fault tolerance mechanisms in case of failure. When tasks are running on different cluster nodes and a failure occurs, the MapReduce engine finds the incomplete tasks and reschedules them for execution on different nodes.
 Scheduling : MapReduce involves map and reduce operations that divide large problems into smaller chunks, which are run in parallel by different machines. So there is a need to schedule different tasks on computational nodes on a priority basis, which is taken care of by the MapReduce engine.

5.3.2 Working of MapReduce Framework


The unit of work in MapReduce is a job. During the map phase, the input data is divided into input splits for analysis, where each split is an independent task. These tasks run in parallel across the Hadoop cluster. The reducer phase uses the result obtained from the mapper as an input to generate the final result.
MapReduce takes a set of input <key, value> pairs and produces a set of output <key, value> pairs by passing the data through map and reduce functions. The typical MapReduce operations are shown in Fig. 5.3.1.

Fig. 5.3.1 : MapReduce operations

Every MapReduce program undergoes different phases of execution. Each phase has
its own significance in MapReduce framework. The different phases of execution in
MapReduce are shown in Fig. 5.3.2 and explained as follows.

Fig. 5.3.2 : Different phases of execution in MapReduce

In the input phase, a large data set in the form of <key, value> pairs is provided as the standard input for the MapReduce program. The input files used by MapReduce are kept on the HDFS (Hadoop Distributed File System) store and have a standard InputFormat specified by the user.
Once the input file is selected, the split phase reads the input data and divides it into smaller chunks. The split chunks are then given to the mapper. The map operations extract the relevant data and generate intermediate key - value pairs. The mapper reads input data from a split using a record reader and generates intermediate results. It transforms the input key - value list into an output key - value list, which is then passed to the combiner.
The combiner is used between the mapper and reducer to reduce the volume of data transfer. It is also known as a semi - reducer, which accepts input from the mapper and passes output key - value pairs to the reducer. The shuffle and sort are the components of the reducer. Shuffling is the process of partitioning and moving the mapped output to the reducers, where intermediate keys are assigned to a reducer. Each partition is called a subset, and each subset becomes input to a reducer. In general, the shuffle phase ensures that the partitioned splits reach the appropriate reducers, where each reducer uses the HTTP protocol to retrieve its own partition from the mappers.

The sort phase is responsible for sorting the intermediate keys on a single node automatically before they are presented to the reducer. The shuffle and sort phases occur simultaneously, where mapped outputs are fetched and merged.
The reducer reduces the set of intermediate values that share a common key to a smaller set of values. The reducer uses the sorted input to generate the final output. The final output is written by the reducer using a record writer into an output file with the standard output format. The final output of each MapReduce program is a set of key - value pairs written to an output file, which is written back to the HDFS store. An example of the word count process using MapReduce, with all phases of execution, is illustrated in Fig. 5.3.3.

Fig. 5.3.3 : Word count process using MapReduce
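
As an illustration of the word count flow shown in Fig. 5.3.3, the following is a minimal sketch of a mapper and a reducer written for Hadoop Streaming in Python. The file names and the local test command are assumptions; the canonical Hadoop word count example is normally written in Java.

```python
# mapper.py - emits one <word, 1> pair per word read from standard input
import sys

for line in sys.stdin:
    for word in line.strip().split():
        print(f"{word}\t1")
```

```python
# reducer.py - sums the counts for each word; Hadoop delivers the mapper
# output sorted by key, so equal words arrive on consecutive lines
import sys

current_word, current_count = None, 0
for line in sys.stdin:
    word, count = line.rstrip("\n").split("\t", 1)
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:
    print(f"{current_word}\t{current_count}")
```

The pipeline can be checked locally with "cat input.txt | python mapper.py | sort | python reducer.py", which mimics the split, map, shuffle/sort and reduce phases described above.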

5.4 Virtual Box


VirtualBox (formerly Sun VirtualBox and presently called Oracle VM VirtualBox) is an x86 virtualization software package, created by the software company Innotek GmbH, purchased by Sun Microsystems, and now taken over by Oracle Corporation as part of its family of virtualization products. It is cross - platform virtualization software that allows users to extend their existing computer to run multiple operating systems at the same time. VirtualBox runs on Microsoft Windows, Mac OS, Linux and Solaris systems. It is ideal for testing, developing, demonstrating and deploying solutions across multiple platforms on a single machine.
It is a Type II (hosted) hypervisor that can be installed on an existing host operating system as an application. This hosted application allows running additional operating systems inside it, each known as a guest OS. Each guest OS can be loaded and run within its own virtual environment. VirtualBox allows you to run guest operating systems using its own
virtual hardware. Each instance of a guest OS is called a "virtual machine". The functional architecture of the VirtualBox hypervisor is shown in Fig. 5.4.1.

Fig. 5.4.1 : Functional architecture of VirtualBox hypervisor

It has a lightweight, extremely fast and powerful virtualization engine. The guest system will run in its VM environment just as if it were installed on a real computer. It operates according to the VM settings you have specified. All software that you choose to run on the guest system will operate just as it would on a physical computer. Each VM runs over its own independent virtualized hardware.
The latest version of VirtualBox simplifies cloud deployment by allowing developers to create multiplatform environments and to develop applications for container and virtualization technologies within Oracle VM VirtualBox on a single machine. VirtualBox also supports virtual disk images in the VMware (.vmdk) and Virtual Hard Disk (.vhd) formats made using VMware Workstation or Microsoft Virtual PC, so it can flawlessly run and integrate guest machines which were configured via VMware Workstation or other hypervisors.
VirtualBox provides the following main features :
 It supports a fully paravirtualized environment along with hardware virtualization.
 It provides device drivers from its driver stack which improve the performance of virtualized input/output devices.
 It provides shared folder support to copy data from the host OS to a guest OS and vice versa.
 It has the latest virtual USB controller support.
 It facilitates a broad range of virtual network driver support along with host - only, bridged and NAT modes.
 It supports the Remote Desktop Protocol to connect to a Windows virtual machine (guest OS) remotely on a thin, thick or mobile client seamlessly.
 It has support for virtual disk formats which are used by both the VMware and Microsoft Virtual PC hypervisors.

5.5 Google App Engine


Google App Engine (GAE) is a Platform - as - a - Service cloud computing model that supports many programming languages. GAE is a scalable runtime environment mostly devoted to executing web applications. In fact, it allows developers to integrate third - party frameworks and libraries with the infrastructure that is still managed by Google. It allows developers to use a readymade platform to develop and deploy web applications using development tools, runtime engines, databases and middleware solutions. It supports languages like Java, Python, .NET, PHP, Ruby, Node.js and Go in which developers can write their code and deploy it on the available Google infrastructure with the help of a Software Development Kit (SDK). In GAE, SDKs are required to set up your computer for developing, deploying and managing your apps in App Engine. GAE enables users to run their applications on a large number of data centers associated with Google's search engine operations. Presently, Google App Engine is a fully managed, serverless platform that allows developers to choose from several popular languages, libraries and frameworks to develop their applications, and then takes care of provisioning servers and scaling application instances based on demand. The functional architecture of the Google cloud platform for App Engine is shown in Fig. 5.5.1.

Fig. 5.5.1 : Functional architecture of the Google cloud platform for app engine

The infrastructure for the Google cloud is managed inside data centers. All the cloud services and applications on Google run through servers inside the data centers. Inside each data center, there are thousands of servers forming different clusters. Each cluster can run multipurpose servers. The infrastructure for GAE is composed of four main components : Google File System (GFS), MapReduce, BigTable and Chubby. GFS is used for storing large amounts of data on Google storage clusters. MapReduce is used for application program development with data processing on large clusters. Chubby is used as a distributed application locking service, while BigTable offers a storage service for accessing structured as well as unstructured data. In this architecture, users can interact with Google applications via the web interface provided by each application.
The GAE platform comprises five main components :
 Application runtime environment offers a platform that has a built - in execution engine for scalable web programming and execution.
 Software Development Kit (SDK) for local application development and deployment over the Google cloud platform.
 Datastore to provision object - oriented, distributed, structured data storage for application data. It also provides secure data management operations based on BigTable techniques.
 Admin console used for easy management of user application development and resource management.
 GAE web services for providing APIs and interfaces.
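
As a small illustration of the runtime environment and SDK components listed above, the following is a minimal sketch of a web application for the App Engine standard environment, assuming the Python runtime and the Flask microframework; the file name, handler and route are illustrative assumptions.

```python
# main.py - a minimal App Engine standard environment web application (sketch)
from flask import Flask

app = Flask(__name__)


@app.route("/")
def hello():
    # App Engine routes incoming HTTP requests to this WSGI application.
    return "Hello from Google App Engine!"


if __name__ == "__main__":
    # Local testing only; in production App Engine runs the app behind its
    # own serving infrastructure and scales instances on demand.
    app.run(host="127.0.0.1", port=8080, debug=True)
```

Such an app is typically described by an app.yaml file naming the runtime and is deployed with the gcloud app deploy command, after which App Engine provisions and scales instances automatically.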

5.6 Programming Environment for Google App Engine


Google provides programming support for its cloud environment, that is, Google App Engine, through the Google File System (GFS), BigTable and Chubby. The following sections provide a brief description of GFS, BigTable, Chubby and the Google APIs.

5.6.1 The Google File System (GFS)


Google has designed a distributed file system, named GFS, for meeting its exacting demands of processing a large amount of data. Most of the objectives of designing the GFS are similar to those of earlier distributed file systems. Some of the
objectives include availability, performance, reliability, and scalability of systems. GFS
has also been designed with certain challenging assumptions that also provide
opportunities for developers and researchers to achieve these objectives. Some of the
assumptions are listed as follows :

a) Automatic recovery from component failure on a routine basis


b) Efficient storage support for large - sized files as a huge amount of data to be
processed is stored in these files. Storage support is provided for small - sized files
without requiring any optimization for them.
c) With workloads that mainly consist of two kinds of reads - large streaming reads and small random reads - the system should be performance conscious so that applications can batch and sort their small reads to advance steadily through the file rather than go back and forth.
d) The system supports small writes without being inefficient, along with the usual
large and sequential writes through which data is appended to files.
e) Semantics that are defined well are implemented.
f) Atomicity is maintained with the least overhead due to synchronization.
g) Sustained bandwidth is given priority over reduced latency.
Google takes the aforementioned assumptions into consideration and supports its cloud platform, Google App Engine, through GFS. Fig. 5.6.1 shows the architecture of GFS clusters.

Fig. 5.6.1 : Architecture of GFS clusters

GFS provides a file system interface and different APIs for supporting different file
operations such as create to create a new file instance, delete to delete a file instance, open to
open a named file and return a handle, close to close a given file specified by a handle,
read to read data from a specified file and write to write data to a specified file.

It can be seen from Fig. 5.6.1 that a single GFS Master and three chunk servers serving two clients comprise a GFS cluster. These clients and servers, as well as the Master, are Linux machines, each running a server process at the user level. These processes are known as user - level server processes.
In GFS, the metadata is managed by the GFS Master that takes care of all the
communication between the clients and the chunk servers. Chunks are small blocks of
data that are created from the system files. Their usual size is 64 MB. The clients interact
directly with chunk servers for transferring chunks of data. For better reliability, these
chunks are replicated across three machines so that whenever the data is required, it can
be obtained in its complete form from at least one machine. By default, GFS stores three
replicas of the chunks of data. However, users can designate any levels of replication.
Chunks are created by dividing the files into fixed-sized blocks. A unique immutable
handle (of 64-bit) is assigned to each chunk at the time of their creation by the GFS
Master. The data that can be obtained from the chunks, the selection of which is specified
by the unique handles, is read or written on local disks by the chunk servers. GFS has all
the familiar system interfaces. It also has additional interfaces in the form of snapshots
and appends operations. These two features are responsible for creating a copy of files or
folder structure at low costs and for permitting a guaranteed atomic data-append
operation to be performed by multiple clients of the same file concurrently.
Applications use a file - system - specific Application Programming Interface (API) that is implemented by the code written for the GFS client. Further, communication with the GFS Master and chunk servers is established for performing the read and write operations on behalf of the application. The clients interact with the
forwarded directly to chunk servers. POSIX API, a feature that is common to most of the
popular file systems, is not included in GFS, and therefore, Linux vnode layer hook-in is
not required. Clients and servers do not perform caching of file data. Due to the streamed workload, caching does not benefit clients, whereas caching by chunk servers has little consequence because the local buffer cache already keeps frequently requested data in memory.
The GFS provides the following features :
 Large - scale data processing and storage support
 Normal treatment for components that stop responding

 Optimization for large-sized files (mostly appended concurrently and read


sequentially)
 Fault tolerance by constant monitoring, data replication, and automatic recovering
 Data corruption detections at the disk or Integrated Development Environment
(IDE) subsystem level through the checksum method
 High throughput for concurrent readers and writers
 Simple designing of the Master that is centralized and not bottlenecked
GFS provides caching for the performance and scalability of a file system and logging
for debugging and performance analysis.
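
GFS itself is internal to Google, but the read path described above can be sketched conceptually : the Master returns only metadata (chunk handle and replica locations), and the client then reads the data directly from a chunk server. The following Python sketch is purely illustrative; every class and name in it is hypothetical.

```python
# Conceptual sketch of the GFS read path (not Google's implementation).
CHUNK_SIZE = 64 * 1024 * 1024  # fixed-size 64 MB chunks


class Master:
    """Holds only metadata: (file, chunk index) -> (chunk handle, replicas)."""
    def __init__(self, chunk_table):
        self.chunk_table = chunk_table

    def lookup(self, file_name, chunk_index):
        return self.chunk_table[(file_name, chunk_index)]


class ChunkServer:
    """Stores chunk data on its local disk, keyed by chunk handle."""
    def __init__(self, blocks):
        self.blocks = blocks

    def read(self, chunk_handle, offset, length):
        return self.blocks[chunk_handle][offset:offset + length]


def gfs_read(master, chunk_servers, file_name, offset, length):
    chunk_index = offset // CHUNK_SIZE                        # which chunk holds the byte range
    handle, replicas = master.lookup(file_name, chunk_index)  # metadata from the Master
    server = chunk_servers[replicas[0]]                       # contact a replica directly
    return server.read(handle, offset % CHUNK_SIZE, length)


servers = {"cs1": ChunkServer({"0xABC": b"hello gfs"})}
master = Master({("/logs/app.log", 0): ("0xABC", ["cs1", "cs2", "cs3"])})
print(gfs_read(master, servers, "/logs/app.log", 0, 9))   # b'hello gfs'
```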

5.6.2 Big Table


Google's BigTable is a distributed storage system that allows storing huge volumes of structured as well as unstructured data on storage media. Google created BigTable with an aim to develop a fast, reliable, efficient and scalable storage system that can
process concurrent requests at a high speed. Millions of users access billions of web pages
and many hundred TBs of satellite images. A lot of semi-structured data is generated
from Google or web access by users. This data needs to be stored, managed, and
processed to retrieve insights. This required data management systems to have very high
scalability.
Google's aim behind developing Big Table was to provide a highly efficient system for
managing a huge amount of data so that it can help cloud storage services. It is required
for concurrent processes that can update various data pieces so that the most recent data
can be accessed easily at a fast speed. The design requirements of Big Table are as
follows :
1. High speed
2. Reliability
3. Scalability
4. Efficiency
5. High performance
6. Examination of changes that take place in data over a period of time.
Big Table is a popular, distributed data storage system that is highly scalable and self-
managed. It involves thousands of servers, terabytes of data storage for in-memory

operations, millions of read/write requests by users in a second and petabytes of data


stored on disks. Its self-managing services help in dynamic addition and removal of
servers that are capable of adjusting the load imbalance by themselves.
It has gained extreme popularity at Google as it stores almost all kinds of data, such as web indexes, personalized searches, Google Earth, Google Analytics and Google Finance. The data that it contains from the Web is referred to as a Web table. The generalized architecture of BigTable is shown in Fig. 5.6.2.

Fig. 5.6.2 : Generalized architecture of Big table

It is composed of three entities, namely the client, the BigTable master and tablet servers. BigTables are implemented over one or more clusters that are similar to GFS clusters. The client application uses libraries to execute BigTable queries on the master server. A BigTable is initially broken up into one or more tablets, which are served by slave servers called tablet servers for the execution of secondary tasks. Each tablet is 100 to 200 MB in size.
The master server is responsible for allocating tablets to tasks, clearing garbage
collections and monitoring the performance of tablet servers. The master server splits
tasks and executes them over tablet servers. The master server is also responsible for
maintaining a centralized view of the system to support optimal placement and load-
balancing decisions. It performs separate control and data operations strictly with tablet
servers. Upon granting the tasks, tablet servers provide row access to clients. Fig. 5.6.3
shows the structure of Big table :

Fig. 5.6.3 : Structure of Big table

BigTable is arranged as a sorted map that is spread in multiple dimensions and involves sparse, distributed and persistent features. The BigTable data model primarily combines three dimensions, namely row, column and timestamp. The first two dimensions are string types, whereas the time dimension is taken as a 64 - bit integer. The resulting combination of these dimensions is a string type.
Each row in BigTable has an associated row key that is an arbitrary string of up to 64 KB in size. In BigTable, a row name is a string, and the rows are ordered in lexicographic order. Although BigTable rows do not support the relational model, they offer atomic access to the data, which means you can access only one record at a time. The rows contain a large amount of data about a given entity such as a web page. The row keys represent URLs that contain information about the resources that are referenced by the URLs.
The naming conventions that are used for columns are more structured than those of rows. Columns are organized into a number of column families that logically group data of the same type under a family. Individual columns are designated by qualifiers within families. In other words, a given column is referred to using the syntax column_family:optional_qualifier, where column_family is a printable string and qualifier is an arbitrary string. It is necessary to provide an arbitrary name to one level, which is known as the column family, but it is not mandatory to give a name to a qualifier. The column family contains information about the data type and is actually the unit of access control.
Qualifiers are used for assigning columns in each row. The number of columns that can
be assigned in a row is not restricted.
The other important dimension that is assigned to Big Table is a timestamp. In Big
table, the multiple versions of data are indexed by timestamp for a given cell. The
timestamp is either related to real-time or can be an arbitrary value that is assigned by a
programmer. It is used for storing various data versions in a cell. By default, any new
data that is inserted into Big Table is taken as current, but you can explicitly set the
timestamp for any new write operation in Big Table. Timestamps provide the Big Table
lookup option that returns the specified number of the most recent values. It can be used
for marking the attributes of the column families. The attributes either retain the most
recent values in a specified number or keep the values for a particular time duration.
Big Table supports APIs that can be used by developers to perform a wide range of
operations such as metadata operations, read/write operations, or modify/update
operations. The commonly used operations by APIs are as follows:
 Creation and deletion of tables
 Creation and deletion of column families within tables
 Writing or deleting cell values
 Accessing data from rows
 Associate metadata such as access control information with tables and column
families
The functions that are used for atomic write operations are as follows :
 Set () is used for writing cells in a row.
 DeleteCells () is used for deleting cells from a row.
 DeleteRow() is used for deleting the entire row, i.e., all the cells from a row are
deleted.
It is clear that BigTable is a highly reliable, efficient and fast system that can be used for storing different types of semi - structured or unstructured data by users.
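
The row / column family : qualifier / timestamp model described above is exposed publicly through Google Cloud Bigtable. The following is a minimal sketch using the Cloud Bigtable Python client; the project, instance, table and column family names are assumptions, and the instance, table and family are presumed to already exist.

```python
# A minimal sketch of writing and reading a Bigtable cell with the
# google-cloud-bigtable Python client. All resource names are assumptions.
from google.cloud import bigtable

client = bigtable.Client(project="my-project", admin=True)
instance = client.instance("my-instance")
table = instance.table("webtable")

# Write one cell: row key + column family "contents", qualifier "html"
row = table.direct_row(b"com.example.www/index.html")
row.set_cell("contents", b"html", b"<html>...</html>")
row.commit()

# Read the row back; multiple timestamped versions may exist per cell
result = table.read_row(b"com.example.www/index.html")
cell = result.cells["contents"][b"html"][0]   # most recent version first
print(cell.value, cell.timestamp)
```

Mutations accumulated on a single row object and committed together are applied atomically to that row, analogous to the Set() and DeleteCells() operations listed above.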

5.6.3 Chubby
Chubby is the crucial service in the Google infrastructure that offers storage and
coordination for other infrastructure services such as GFS and Bigtable. It is a coarse -
grained distributed locking service that is used for synchronizing distributed activities in
an asynchronous environment on a large scale. It is used as a name service within Google
and provides reliable storage for file systems along with the election of coordinator for
multiple replicas. The Chubby interface is similar to the interfaces that are provided by
distributed systems with advisory locks. However, the aim of designing Chubby is to
provide reliable storage with consistent availability. It is designed to use with loosely
coupled distributed systems that are connected in a high-speed network and contain
several small-sized machines. The lock service enables the synchronization of the
activities of clients and permits the clients to reach a consensus about the environment in
which they are placed. Chubby’s main aim is to efficiently handle a large set of clients by
providing them a highly reliable and available system. Its other important characteristics
that include throughput and storage capacity are secondary. Fig. 5.6.4 shows the typical
structure of a Chubby system :

Fig. 5.6.4 : Structure of a Chubby system

The chubby architecture involves two primary components, namely server and client
library. Both the components communicate through a Remote Procedure Call (RPC).
However, the library has a special purpose, i.e., linking the clients against the chubby cell.
A Chubby cell contains a small set of servers. The servers are also called replicas, and
usually, five servers are used in every cell. The Master is elected from the five replicas
through a distributed protocol that is used for consensus. Most of the replicas must vote

for the Master with the assurance that no other Master will be elected by replicas that
have once voted for one Master for a duration. This duration is termed as a Master lease.
Chubby supports a similar file system as Unix. However, the Chubby file system is
simpler than the Unix one. The files and directories, known as nodes, are contained in the
Chubby namespace. Each node is associated with different types of metadata. The nodes
are opened to obtain the Unix file descriptors known as handles. The specifiers for
handles include check digits for preventing the guess handle for clients, handle sequence
numbers, and mode information for recreating the lock state when the Master changes.
Reader and writer locks are implemented by Chubby using files and directories. While
exclusive permission for a lock in the writer mode can be obtained by a single client, there
can be any number of clients who share a lock in the reader’s mode. The nature of locks is
advisory, and a conflict occurs only when the same lock is requested again for an
acquisition. The distributed locking mode is complex. On one hand, its use is costly, and
on the other hand, it only permits numbering the interactions that are already using locks.
The status of locks after they are acquired can be described using specific descriptor
strings called sequencers. The sequencers are requested by locks and passed by clients to
servers in order to progress with protection.
Another important term that is used with Chubby is an event that can be subscribed
by clients after the creation of handles. An event is delivered when the action that
corresponds to it is completed. An event can be :
a. Modification in the contents of a file
b. Addition, removal, or modification of a child node
c. Failing over of the Chubby Master
d. Invalidity of a handle
e. Acquisition of lock by others
f. Request for a conflicting lock from another client
In Chubby, caching is done by a client that stores file data and metadata to reduce the
traffic for the reader lock. Although there is a possibility for caching of handles and files
locks, the Master maintains a list of clients that may be cached. The clients, due to
caching, find data to be consistent. If this is not the case, an error is flagged. Chubby
maintains sessions between clients and servers with the help of a keep-alive message,
which is required every few seconds to remind the system that the session is still active.
Handles that are held by clients are released by the server in case the session is overdue
for any reason. If the Master responds late to a keep-alive message, as the case may be, at

times, a client has its own timeout (which is longer than the server timeout) for the
detection of the server failure.
If the server failure has indeed occurred, the Master does not respond to a client about
the keep-alive message in the local lease timeout. This incident sends the session in
jeopardy. It can be recovered in a manner as explained in the following points:
 The cache needs to be cleared.
 The client needs to wait for a grace period, which is about 45 seconds.
 Another attempt is made to contact the Master.
If the attempt to contact the Master is successful, the session resumes and its jeopardy
is over. However, if this attempt fails, the client assumes that the session is lost. Fig. 5.6.5
shows the case of the failure of the Master :

Fig. 5.6.5 : Case of failure of Master server

Chubby offers a decent level of scalability, which means that there can be any
(unspecified) number of the Chubby cells. If these cells are fed with heavy loads, the lease
timeout increases. This increment can be anything between 12 seconds and 60 seconds.
The data is fed in a small package and held in the Random-Access Memory (RAM) only.
The Chubby system also uses partitioning mechanisms to divide data into smaller
packages. All of its excellent services and applications included, Chubby has proved to
be a great innovation when it comes to storage, locking, and program support services.
The Chubby is implemented using the following APls :
1. Creation of handles using the open() method
2. Destruction of handles using the close() method
The other important methods include GetContentsAndStat(), GetStat(), ReadDir(),
SetContents(), SetACl(), Delete(), Acquire(), TryAcquire(), Release(), GetSequencer(),
SetSequencer(), and CheckSequencer(). The commonly used APIs in chubby are listed in
Table 5.6.1 :
API | Description
Open | Opens the file or directory and returns a handle
Close | Closes the file or directory and releases the associated handle
Delete | Deletes the file or directory
ReadDir | Returns the contents of a directory
SetContents | Writes the contents of a file
GetStat | Returns the metadata
GetContentsAndStat | Returns the file contents and the metadata associated with the file
Acquire | Acquires a lock on a file
Release | Releases a lock on a file

Table 5.6.1 : APIs in Chubby
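
Chubby is an internal Google service, so the API in Table 5.6.1 cannot be called directly from outside Google. The following in - process mock is a purely conceptual sketch of the open / acquire / release flow with sequencers; every class and name here is hypothetical.

```python
# Conceptual mock of Chubby-style advisory locking (illustrative only).
import threading


class MockChubbyCell:
    def __init__(self):
        self._locks = {}     # node path -> lock object
        self._sequence = 0   # used to build sequencer strings

    def open(self, path):
        # "Open" a node and return a handle (here simply the path itself).
        self._locks.setdefault(path, threading.Lock())
        return path

    def acquire(self, handle):
        # Acquire the writer lock and return a sequencer that a client could
        # present to other servers as proof that it still holds the lock.
        self._locks[handle].acquire()
        self._sequence += 1
        return f"{handle}#writer#{self._sequence}"

    def release(self, handle):
        self._locks[handle].release()


cell = MockChubbyCell()
handle = cell.open("/ls/demo-cell/master-election")
sequencer = cell.acquire(handle)
print("acting as master, sequencer:", sequencer)
cell.release(handle)
```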

5.6.4 Google APIs


Google developed a set of Application Programming Interfaces (APIs) that can be used to communicate with Google services. This set of APIs is referred to as the Google APIs. They also help in integrating Google services with other services. Google App Engine helps in deploying an API for an app without the developer needing to be aware of its infrastructure. Google App Engine also hosts the endpoint APIs which are created by Google Cloud Endpoints. A set of libraries, tools and capabilities that can be used to generate client libraries and APIs from an App Engine application is known as Google Cloud Endpoints. It eases data accessibility for client applications. We can also save the time of writing the network communication code by using Google Cloud Endpoints, which can also generate client libraries for accessing the backend API.

5.7 OpenStack


OpenStack is an open - source cloud operating system that is increasingly gaining admiration among data centers. This is because OpenStack provides a cloud computing platform to handle enormous computing, storage, database and networking resources in a data center. In simple terms, OpenStack is an open - source, highly scalable cloud computing platform that provides tools for developing private, public or hybrid clouds, along with a web interface for users to access resources and for admins to manage those resources.

Put otherwise, OpenStack is a platform that enables potential cloud providers to


create, manage and bill their custom-made VMs to their future customers. OpenStack is
free and open, which essentially means that everyone can have access to its source code
and can suggest or make changes to it and share it with the OpenStack community.
OpenStack is an open-source and freely available cloud computing platform that enables
its users to create, manage and deploy virtual machines and other instances. Technically,
OpenStack provides Infrastructure-as-a-Service (IaaS) to its users to enable them to
manage virtual private servers in their data centers.
OpenStack provides the required software tools and technologies to abstract the
underlying infrastructure to a uniform consumption model. Basically, OpenStack allows
various organisations to provide cloud services to the user community by leveraging the
organization’s pre-existing infrastructure. It also provides options for scalability so that
resources can be scaled whenever organisations need to add more resources without
hindering the ongoing processes.
The main objective of OpenStack is to provide a cloud computing platform that is :
 Global
 Open-source
 Freely available
 Easy to use
 Highly and easily scalable
 Easy to implement
 Interoperable
OpenStack is for all. It satisfies the needs of users, administrators and operators of
private clouds as well as public clouds. Some examples of open-source cloud platforms
already available are Eucalyptus, OpenNebula, Nimbus, CloudStack and OpenStack,
which are used for infrastructure control and are usually implemented in private clouds.

5.7.1 Components of OpenStack


OpenStack consists of many different components. Because OpenStack cloud is open -
source, developers can add components to benefit the OpenStack community. The
following are the core components of OpenStack as identified by the OpenStack
community:
 Nova : This is one of the primary services of OpenStack, which provides numerous
tools for the deployment and management of a large number of virtual machines.
Nova is the compute service of OpenStack.
 Swift : Swift provides storage services for storing files and objects. Swift can be equated with Amazon's Simple Storage Service (S3).
 Cinder : This component provides block storage to Nova virtual machines. Its working is similar to a traditional computer storage system where the computer is able to access specific locations on a disk drive. Cinder is analogous to AWS's EBS.
 Glance : Glance is OpenStack's image service component that provides virtual templates (images) of hard disks. These templates can be used for new VMs. Glance may use either Swift or flat files to store these templates.
 Neutron (formerly known as Quantum) : This component of OpenStack provides Networking - as - a - Service, Load - Balancer - as - a - Service and Firewall - as - a - Service. It also ensures communication between other components.
 Heat : It is the orchestration component of OpenStack. It allows users to manage
infrastructural needs of applications by allowing the storage of requirements in
files.
 Keystone : This component provides identity management in OpenStack
 Horizon : This is a dashboard of OpenStack, which provides a graphical interface.
 Ceilometer : This component of OpenStack provisions metering and billing models for users of the cloud services. It also keeps an account of the resources used by each individual user of the OpenStack cloud.
Let us also discuss some of the non - core components of OpenStack and their offerings.
 Trove : Trove is a component of OpenStack that provides Database - as - a - Service. It provisions relational databases and big data engines.
 Sahara : This component provisions Hadoop to enable the management of data
processors.
 Zaqar : This component allows messaging between distributed application
components.
 Ironic : Ironic provisions bare - metal machines, which can be used as a substitute for VMs.
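
To make the roles of Nova and Swift listed above concrete, the following is a minimal sketch using the openstacksdk Python library. The cloud name, image, flavor, network and container names are assumptions taken from a hypothetical clouds.yaml configuration.

```python
# A minimal sketch of consuming Nova (compute) and Swift (object storage)
# through openstacksdk. All resource names are assumptions.
import openstack

conn = openstack.connect(cloud="mycloud")   # credentials come from clouds.yaml

# Nova: boot a VM from an existing image, flavor and network
image = conn.compute.find_image("cirros")
flavor = conn.compute.find_flavor("m1.small")
network = conn.network.find_network("private")

server = conn.compute.create_server(
    name="demo-vm",
    image_id=image.id,
    flavor_id=flavor.id,
    networks=[{"uuid": network.id}],
)
server = conn.compute.wait_for_server(server)
print("Server status:", server.status)

# Swift: create a container and upload a small object
conn.object_store.create_container(name="demo-container")
conn.object_store.upload_object(container="demo-container",
                                name="hello.txt",
                                data=b"hello from OpenStack")
```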
The basic architectural components of OpenStack, shown in Fig. 5.7.1, include its core and optional services/components. The optional services of OpenStack are also known as Big Tent services; OpenStack can be used without these components, or they can be used as per requirement.

Fig. 5.7.1 : Components of open stack architecture


We have already discussed the core services and the four optional services. Let us now
discuss the rest of the services.
 Designate : This component offers DNS services analogous to Amazon’s Route 53.
The following are the subsystems of Designate :
o Mini DNS Server
o Pool Manager
o Central Service and APIs
 Barbican : Barbican is the key management service of OpenStack that is comparable
to KMS from AWS. This provides secure storage, retrieval, and provisioning and
management of various types of secret data, such as keys, certificates, and even
binary data.
 AMQP : AMQP stands for Advanced Message Queuing Protocol and is a messaging mechanism used by OpenStack. The AMQP broker lies between two components of Nova and enables communication in a loosely coupled fashion.
Further, OpenStack uses two architectures - Conceptual and Logical, which are
discussed in the next section.

5.7.2 Features and Benefits of OpenStack


OpenStack helps build cloud environments by providing the ability to integrate various
technologies of your choice. Apart from the fact that OpenStack is open-source, there are
numerous benefits that make it stand out. Following are some of the features and benefits
of OpenStack Cloud :

 Compatibility : OpenStack supports both private and public clouds and is very easy
to deploy and manage. OpenStack APIs are supported in Amazon Web Services. The
compatibility eliminates the need for rewriting applications for AWS, thus enabling
easy portability between public and private clouds.
 Security : OpenStack addresses the security concerns, which are the top- most
concerns for most organisations, by providing robust and reliable security systems.
 Real-time Visibility : OpenStack provides real-time client visibility to
administrators, including visibility of resources and instances, thus enabling
administrators and providers to track what clients are requesting for.
 Live Upgrades : This feature allows upgrading services without any downtime. Earlier, upgrades required shutting down complete systems, which resulted in loss of performance. Now, OpenStack enables upgrading systems while they are running by requiring only individual components to shut down.
Apart from these, OpenStack offers other remarkable features, such as networking,
compute, Identity Access Management, orchestration, etc.

5.7.3 Conceptual OpenStack Architecture


Fig. 5.7.2 depicts a magnified version of the architecture by showing relationships among different services and between the services and VMs. This expanded representation is also known as the conceptual architecture of OpenStack.

Fig. 5.7.2 : Conceptual architecture of OpenStack

From Fig. 5.7.2, we can see that every service of OpenStack depends on other services
within the systems, and all these services exist in a single ecosystem working together to
produce a virtual machine. Any service can be turned on or off depending on the VM
required to be produced. These services communicate with each other through APIs and
in some cases through privileged admin commands.
Let us now discuss the relationship between the various components or services specified in the conceptual architecture of OpenStack. As you can see in Fig. 5.7.2, three components, Keystone, Ceilometer and Horizon, are shown on top of the OpenStack platform.
Here, Horizon is providing user interface to the users or administrators to interact with
underlying OpenStack components or services, Keystone is providing authentication to
the user by mapping the central directory of users to the accessible OpenStack services,
and Ceilometer is monitoring the OpenStack cloud for the purpose of scalability, billing,
benchmarking, usage reporting and other telemetry services. Inside the OpenStack
platform, you can see that various processes are handled by different OpenStack services;
Glance is registering Hadoop images, providing image services to OpenStack and
allowing retrieval and storage of disk images. Glance stores the images in Swift, which is
responsible for providing reading service and storing data in the form of objects and files.
All other OpenStack components also store data in Swift, which also stores data or job
binaries. Cinder, which offers permanent block storage or volumes to VMs, also stores
backup volumes in Swift. Trove stores backup databases in Swift and boots databases
instances via Nova, which is the main computing engine that provides and manages virtual
machines using disk images.
Neutron enables network connectivity for VMs and facilitates PXE Network for Ironic
that fetches images via Glance. VMs are used by the users or administrators to avail and
provide the benefits of cloud services. All the OpenStack services are used by VMs in
order to provide best services to the users. The infrastructure required for running cloud
services is managed by Heat, which is the orchestration component of OpenStack that orchestrates clusters and stores the necessary resource requirements of a cloud application. Here, Sahara is used to offer a simple means of provisioning a data processing framework to the cloud users.
Table 5.7.1 shows the dependencies of these services.

Code Name | Dependent on | Optional
Nova (Compute) | Keystone, Horizon, Glance | Cinder, Neutron
Swift (Object Storage) | Keystone | -
Cinder (Block Storage) | Keystone | -
Glance (Image Service) | Swift, Keystone, Horizon | -
Neutron (Network) | Keystone, Nova | -
Keystone (Identity) | - | -
Horizon (Dashboard) | Keystone | -

Table 5.7.1 : Service dependencies

5.7.4 Modes of Operations of OpenStack


OpenStack majorly operates in two modes - single host and multi host. A single host
mode of operation is that in which the network services are based on a central server,
whereas a multi host operation mode is that in which each compute node has a duplicate
copy of the network running on it and the nodes act like Internet gateways that are
running on individual nodes. In addition to this, in a multi host operation mode, the
compute nodes also individually host floating IPs and security groups. On the other
hand, in a single host mode of operation, floating IPs and security groups are hosted on
the cloud controller to enable communication.
Both single host and multi host modes of operation are widely used and have their own sets of advantages and limitations. The single host mode of operation has a major limitation : if the cloud controller goes down, the entire system fails because instances stop communicating. This limitation is overcome by the multi host mode, where a copy of the network is provisioned to every node. However, the multi host mode requires a unique public IP address for each compute node to enable communication. In case public IP addresses are not available, using the multi host mode is not possible.

5.8 Federation in the Cloud


Many cloud computing environments present difficulties in creating and managing decentralized provisioning of cloud services, along with maintaining consistent connectivity between untrusted components and fault tolerance. Therefore, to overcome

such challenges, the federated cloud ecosystem is introduced by associating multiple cloud computing providers using a common standard. Cloud federation includes services from different providers aggregated in a single pool supporting three essential interoperability features : resource redundancy, resource migration and combination of complementary resources. It allows an enterprise to distribute workloads around the globe, move data between disparate networks and implement innovative security models for user access to cloud resources. In federated clouds, the cloud resources are provisioned through network gateways that connect public or external clouds with private or internal clouds owned by a single entity and/or community clouds owned by several co - operating entities.
A popular project on identity management for federated cloud is conducted by
Microsoft, called the Geneva Framework. The Geneva framework was principally
centered around claim based access where claims describe the identity attributes, Identity
Metasystem characterizes a single identity model for the enterprise and federation, and
Security Token Services (STS) are utilized in the Identity Metasystem to assist with user
access management across applications regardless of location or architecture.
In this section, we are going to see federation in the cloud using the Jabber Extensible Communications Platform (Jabber XCP) and the IETF (Internet Engineering Task Force) standard protocol for interdomain federation called XMPP (Extensible Messaging and Presence Protocol), which have been adopted by many popular companies like Google, Facebook and Twitter for cloud federation.

1. Jabber XCP
Instant Messaging (IM) allows users to exchange messages that are delivered
synchronously. As long as the recipient is connected to the service, the message will be
pushed to it directly. This can either be realized using a centralized server or peer to peer
connections between each client.
The Jabber Extensible Communications Platform (Jabber XCP) is a commercial IM
server, created by the Cisco in association with Sun Microsystems. It is a highly
programmable presence and messaging platform. It supports the exchange of
information between applications in real time. It supports multiple protocols such as
Session Initiation Protocol for Instant Messaging and Presence Leveraging Extensions
(SIMPLE) and Instant Messaging and Presence Service (IMPS). It is a highly
programmable platform and scalable solution, which makes it ideal for adding presence
and messaging to existing applications or services and for building next-generation,
presence - based solutions.

2. XMPP (Extensible Messaging and Presence Protocol)


The Extensible Messaging & Presence Protocol (XMPP) is an open standard for instant
messaging. It is published in Request For Comments (RFCs) by the Internet Engineering
Task Force (IETF) and can be used freely. The protocol messages are formatted using
XML, which allows extending the messages with additional XML formatted information.
It was previously known as Jabber. Although in principle a protocol does not dictate the
underlying architecture, XMPP is a single-endpoint protocol. XMPP clients connect to one
machine of the architecture and transmit and receive both instant messages and presence
updates. In XMPP, users are identified by a username and domain name. Users can be
connected multiple times using the same username, which allows to have an IM&P
connection at work while the client at home is still running. Each connection is identified
by a unique resource, which is combined with the username and domain name to yield a
unique Jabber Identifier (JID).
In cloud architectures, web services play an important role in provisioning resources and services. But the protocols used by current cloud services, like SOAP (Simple Object Access Protocol) and other assorted HTTP - based protocols, can only perform one - way information exchanges. Due to that, cloud services face challenges related to scalability, real - time communication and bypassing firewall rules. Therefore, in search of a solution, many researchers have found XMPP (also called Jabber) to be a convenient protocol that can overcome those barriers and can be used effectively with cloud solutions. Many cloud pioneer companies like Google, IBM and Apple have already incorporated this protocol into their cloud - based solutions over the last few years.
The XMPP is advantageous and best match for cloud computing because of following
consummate benefits :
a. It is decentralized and supports easy two-way communication
b. It doesn’t require polling for synchronization
c. It has built-in publish subscribe (pub-sub) functionality
d. It works on XML based open standards
e. It is perfect for Instant Messaging features and custom cloud services
f. It is efficient and scales up to millions of concurrent users on a single service
g. It supports worldwide federation models
h. It provides strong security using Transport Layer Security (TLS) and Simple
Authentication and Security Layer (SASL)
i. It is flexible and extensible.
In the current scenario, XMPP and XCP are extensively used for federation in the cloud due to their unique capabilities. The next sections of this chapter explain the levels of federation along with their applications and services.
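
As a small illustration of how an XMPP client connects, announces presence and sends a message to a user in another domain, the following sketch uses the third - party slixmpp Python library. The JIDs and password are assumptions.

```python
# A minimal XMPP client sketch using slixmpp; JIDs and password are assumptions.
import slixmpp


class HelloBot(slixmpp.ClientXMPP):
    def __init__(self, jid, password, recipient):
        super().__init__(jid, password)
        self.recipient = recipient
        self.add_event_handler("session_start", self.on_start)

    async def on_start(self, event):
        self.send_presence()              # announce presence to the server
        await self.get_roster()           # fetch the contact list
        self.send_message(mto=self.recipient,
                          mbody="Hello over XMPP federation!",
                          mtype="chat")
        self.disconnect()


if __name__ == "__main__":
    bot = HelloBot("alice@example.com/demo", "secret", "bob@other-domain.org")
    bot.connect()
    bot.process(forever=False)
```

Whether the message actually reaches the recipient in the other domain depends on server - to - server federation between the two domains, which is exactly what the federation levels in the next section govern.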

5.9 Four Levels of Federation


In real - time communication, federation defines the way XMPP servers in different domains exchange XML - based messages. As per XEP - 0238, which describes the XMPP protocol flows for inter - domain federation, there are four basic levels of federation, which are shown in Fig. 5.9.1 and explained as follows.

Fig. 5.9.1 : Four levels of federation

Level 1 : Permissive federation


Permissive federation is the lowest level of federation, where a server accepts a connection from a peer network server without confirming its identity using DNS lookups or certificate checking. To run permissive federation, there are no minimum criteria. The absence of verification or validation (authentication) may lead to domain spoofing (the unapproved utilization of a third - party domain name in an email message so as to pretend to be someone else). It has the least security mechanisms, which opens the way for widespread spam and other abuses. Initially, permissive federation was the only solution to work with web applications, but with the arrival of the open - source Jabberd 1.2, permissive federation became obsolete on the XMPP network.

Level 2 : Verified federation

Verified federation works at level 2 and runs above permissive federation. In this level, the server accepts a connection from a peer network server only when the identity of the peer is verified or validated. Peer verification is the minimum criterion to run

verified federation. It utilizes information acquired by DNS for domain-specific keys


exchange in advance. In this type, the connection isn't encrypted but because of identity
verification it effectively prevents the domain spoofing. To make this work effectively,
the federation requires appropriate DNS setup but still it is prone to DNS poisoning
attacks. The verified federation has been the default service approach on the open XMPP
since the arrival of the open - source Jabberd 1.2 server. It act as a foundation for
encrypted federation.

Level 3 : Encrypted federation


The Encrypted federation is the third level of federation and runs above verified federation. At this level, a server accepts a connection from a peer server if and only if the peer supports Transport Layer Security (TLS) as defined for XMPP in RFC 3920. Support for TLS is the minimum criterion to run encrypted federation.
TLS is the successor to Secure Sockets Layer (SSL), which was developed to secure communications over HTTP. XMPP uses a TLS profile that enables two entities to upgrade a connection from unencrypted to encrypted, and TLS is mainly used here for channel encryption. In encrypted federation, the peer typically presents a self-signed digital certificate, which prevents strong mutual certificate-based authentication. In that case, both parties proceed to weakly verify identity using the Server Dialback protocol. XEP-0220 defines the Server Dialback protocol, which is used between XMPP servers to provide identity verification. Server Dialback uses the DNS as the basis for verifying identity; the basic approach is that when a receiving server gets a server-to-server connection request from an originating server, it does not accept the request until it has checked a key with an authoritative server for the domain asserted by the originating server. Although Server Dialback does not provide strong authentication or trusted federation, and although it is subject to DNS poisoning attacks, it has effectively prevented most instances of address spoofing on the XMPP network. The outcome is an encrypted connection with weak identity verification; here the certificates are signed by the server itself.

Level 4 : Trusted federation


The Trusted federation is the topmost level of federation and runs above encrypted federation. At this level, a server accepts a connection from a peer server if and only if the peer supports TLS and can present a digital certificate issued by a root Certification Authority (CA) that is trusted by the authenticating server. The use of digital certificates in trusted federation provides both strong authentication and channel encryption. The trusted root CAs are identified based on one or more factors, such as the operating system environment, the XMPP server software, or local service policy. The use of trusted domain certificates prevents DNS poisoning attacks but makes federation more difficult, because such certificates can be difficult to obtain. Here the certificates are signed by a CA.

5.10 Federated Services and Applications


Server-to-server federation is required to move toward constant real-time communication in the cloud. The cloud typically comprises a considerable number of clients, devices, services and applications connected to the network. In order to fully use the capabilities of this cloud structure, a participant needs the ability to discover other entities of interest. Such entities may be end users, real-time content feeds, user directories, messaging gateways, and so on. XMPP uses service discovery to find these entities. The discovery protocol enables any network participant to query another entity about its identity, capabilities and related entities. Whenever a participant connects to the network, it queries the authoritative server for its specific domain about the entities associated with that authoritative server. In response to a service discovery query, the authoritative server informs the inquirer about the services hosted there and may also list services that are available but hosted elsewhere. XMPP includes a method for maintaining lists of other entities, known as rosters, which enables end users to keep track of various types of entities. For the most part, these lists are made up of other entities that the users are interested in or interact with regularly. Most XMPP deployments include custom directories so that internal users of those services can easily find what they are looking for.
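As an illustrative sketch, a service discovery query of the kind described above can look like the following; the JIDs are hypothetical, and the namespace follows the XMPP service discovery extension (XEP-0030).

```xml
<!-- Hypothetical disco#info query : a client asks its authoritative server
     which identities and features (capabilities) it supports. -->
<iq from="alice@example.com/work"
    to="example.com"
    type="get"
    id="disco1">
  <query xmlns="http://jabber.org/protocol/disco#info"/>
</iq>
```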
Some organizations are wary of federation because they fear that real-time communication networks will introduce the same types of problems that are endemic to email networks, such as spam and viruses. While these concerns are not unfounded, they tend to be exaggerated because of certain technical strengths of federation. Learning from the past problems of email systems, XMPP helps prevent address spoofing, inline scripts, unlimited binary attachments and other attack tactics. The use of point-to-point federation avoids the problems that occur in traditional multi-hop federation, as it restricts injection attacks, data loss and unencrypted intermediate links. With federation, the network can ensure encrypted connections and strong authentication by using certificates issued by trusted root CAs.

5.11 The Future of Federation


The success of federated communications is a precursor to building a consistent cloud that can interact with people, devices, information feeds, documents, application interfaces and so on. The power of a federated, presence-enabled communications infrastructure is that it enables software developers and service providers to build and deploy such applications without asking permission from a large, centralized communications operator. The process of server-to-server federation for the purpose of inter-domain communication has played an enormous role in the success of XMPP, which relies on a small set of simple but powerful mechanisms for domain checking and security to produce verified, encrypted and trusted connections between any two deployed servers. These mechanisms have provided a stable, secure foundation for the growth of the XMPP network and similar real-time technologies.
Summary

 The Apache Hadoop is an open source software project that enables distributed
processing of large data sets across clusters of commodity servers using
programming models.
 The Hadoop core is divided into two fundamental layers called HDFS and
MapReduce engine. The HDFS is a distributed file system inspired by GFS that
organizes files and stores their data on a distributed computing system, while
MapReduce is the computation engine running on top of HDFS as its data
storage manager.
 The HDFS follows a master-slave architecture using a name node and data nodes. The name node acts as a master while multiple data nodes work as slaves.
 In the MapReduce model, the data processing primitives used are called the mapper and the reducer. The mapper has a map method that transforms an input key/value pair into any number of intermediate key/value pairs, while the reducer has a reduce method that transforms the aggregated intermediate key/value pairs into any number of output key/value pairs.
 VirtualBox is a Type II (hosted) hypervisor that runs on Microsoft Windows,
Mac OS, Linux, and Solaris systems. It is ideal for testing, developing,
demonstrating, and deploying solutions across multiple platforms on single
machine using VirtualBox.


 Google App Engine (GAE) is a Platform-as-a-Service cloud computing model


that supports many programming languages like Java, Python, .NET, PHP,
Ruby, Node.js and Go in which developers can write their code and deploy it on
available google infrastructure with the help of Software Development Kit
(SDK).
 The Google provides programming support for its cloud environment through
Google File System (GFS), Big Table, and Chubby.
 The GFS is used for storing large amounts of data on google storage clusters,
BigTable offers a storage service for accessing structured as well as unstructured
data while Chubby is used as a distributed application locking service.
 OpenStack is an open source, highly scalable cloud computing platform that provides tools for developing private, public or hybrid clouds, along with a web interface for users to access resources and admins to manage those resources.
 Openstack architecture has many components to manage compute, storage, network and security services.
 Cloud federation includes services from different providers aggregated in a
single pool supporting three essential interoperability features like resource
redundancy, resource migration, and combination of complementary resources
respectively.
 The Jabber XCP and XMPP (Extensible Messaging and Presence Protocol) are
two popular protocols used in federation of cloud by companies like Google,
Facebook, Twitter, etc.
 There are four levels of federation, namely permissive, verified, encrypted and trusted federation, where trusted is the most secure federation and permissive is the least secure.

Short Answered Questions

Q.1 What are the advantages of using Hadoop ? AU : Dec.-16


Ans. : The Apache Hadoop is an open source software project that enables distributed
processing of large data sets across clusters of commodity servers using programming
models. It is designed to scale up from a single server to thousands of machines, with a
very high degree of fault tolerance.
The advantages of Hadoop are listed as follows :
a. Hadoop is highly scalable in nature as a data storage and processing platform.
b. It satisfies all four characteristics of big data like volume, velocity, variety and
veracity.

c. It is a cost-effective solution for Big data applications as it uses a cluster of


commodity hardware to store data.
d. It provides high throughput and low latency for high computational jobs.
e. It is highly fault tolerant in nature, with features like self-healing and replication. It automatically replicates data if a server or disk crashes.
f. It is flexible in nature as it supports different file formats of data like structured,
unstructured and semi structured.
g. It provides faster execution environment for big data applications.
h. It provides support for business intelligence by querying, reporting, searching,
filtering, indexing, aggregating the datasets.
i. It provides tools for report generation, trend analysis, search optimization, and
information retrieval.
j. It supports different types of analytics like predictive, prescriptive and descriptive
along with functions like as document indexing, concept filtering, aggregation,
transformation, semantic text analysis, pattern recognition, and searching.
Q.2 What is the purpose of heart beat in Hadoop. AU : Dec.-17

OR State the significance of heart beat message in Hadoop. AU : Dec.-19

OR Give the significance of heart beat message in Hadoop. AU : May-19


Ans. : In Hadoop, the name node and data nodes communicate using heartbeats. The heartbeat is a signal sent by a data node to the name node at a regular time interval to indicate its presence, i.e. to indicate that it is alive. Data nodes send a heartbeat signal to the name node every three seconds by default. The working of the heartbeat is shown in Fig. 5.1.

Fig. 5.1 : Heartbeat in HDFS


If, after a certain period (which is ten minutes by default), the name node does not receive any response from a data node, that particular data node is declared dead. If the death of a node causes the replication factor of data blocks to drop below their minimum value, the name node initiates additional replication to bring them back to the normal state.
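As a minimal sketch (not taken from this book), the heartbeat and liveness-check intervals can be read or tuned through the standard Hadoop Configuration API; the property names below follow common HDFS documentation, and the exact defaults may vary between Hadoop releases.

```java
import org.apache.hadoop.conf.Configuration;

public class HeartbeatTuning {
    public static void main(String[] args) {
        // Illustrative assumption : property names as commonly documented for HDFS.
        Configuration conf = new Configuration();
        conf.setLong("dfs.heartbeat.interval", 3);                       // seconds between DataNode heartbeats
        conf.setLong("dfs.namenode.heartbeat.recheck-interval", 300000); // ms between NameNode liveness rechecks
        // A DataNode is typically marked dead after roughly
        // 2 * recheck-interval + 10 * heartbeat-interval, i.e. about ten minutes with the defaults.
        System.out.println("Heartbeat interval (s) : " + conf.getLong("dfs.heartbeat.interval", 3));
    }
}
```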
Q.3 Name the different modules in Hadoop framework. AU : May-17
Ans. : The Hadoop core is divided into two fundamental modules called HDFS and
MapReduce engine. The HDFS is a distributed file system inspired by GFS that
organizes files and stores their data on a distributed computing system, while
MapReduce is the computation engine running on top of HDFS as its data storage
manager. Apart from that there are several other modules in Hadoop, used for data
storage, processing and analysis which are listed below :
a. HBase : Column-oriented NOSQL database service
b. Pig : Dataflow language and provides parallel data processing framework
c. Hive : Data warehouse infrastructure for big data
d. Sqoop : Transferring bulk data between Hadoop and structured data stores
e. Oozie : Workflow scheduler system
f. Zookeeper : Distributed synchronization and coordination service
g. Mahout : Machine learning tool for big data.
Q.4 “HDFS” is fault tolerant. Is it true ? Justify your answer. AU : Dec.-17
Ans. : Fault tolerance refers to the ability of the system to work or operate uninterrupted even in unfavorable conditions (like component failure due to a disaster or any other reason). The main purpose of fault tolerance is to mask frequently occurring failures that disturb the ordinary functioning of the system. The three main mechanisms used to provide fault tolerance in HDFS are data replication, heartbeat messages, and checkpoint and recovery.
In data replication, HDFS stores multiple replicas of the same data across different clusters based on the replication factor. HDFS uses an intelligent replica placement model for reliability and performance. The same copy of data is placed on several different computing nodes, so when that data copy is needed it can be provided by any of those data nodes. The major advantage of this technique is that it provides instant recovery from node and data failures. One main disadvantage is that it consumes extra storage by keeping the same data on multiple nodes.


In heartbeat messages, a message is sent by each data node to the name node at a regular time interval to indicate its presence, i.e. to indicate that it is alive. If, after a certain period, the name node does not receive any response from a data node, that particular data node is declared dead. In that case, a replica node is used as the primary data node to recover the data.
In checkpoint and recovery, a concept similar to rollback is used to tolerate faults up to some point. After a fixed time interval, the copy report is saved and stored. When a failure occurs, the system simply rolls back to the last save point and then starts performing the transactions again.
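As a small illustrative sketch (the file path is assumed, not taken from this book), the configured default replication factor and the actual replication of a file can be inspected through the standard FileSystem API :

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class ReplicationCheck {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();                        // picks up core-site.xml / hdfs-site.xml
        System.out.println("Default replication : " + conf.getInt("dfs.replication", 3));
        FileSystem fs = FileSystem.get(conf);
        FileStatus status = fs.getFileStatus(new Path("/user/demo/input.txt"));  // hypothetical file
        System.out.println("Replication of file : " + status.getReplication());
    }
}
```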
Q.5 How does divide and conquer strategy related to MapReduce paradigm ?
AU : May-18
Ans. : In the divide and conquer strategy, a computational problem is divided into smaller parts which are executed independently until all parts are completed, and the partial results are then combined to get the desired solution of the problem.
The MapReduce takes a set of input <key, value> pairs and produces a set of output <key, value> pairs by supplying data through the map and reduce functions. The typical MapReduce operations are shown in Fig. 5.2.
Fig. 5.2 : MapReduce operations
In MapReduce, the mapper uses the divide approach, where the input data is split into blocks; each block is represented as an input key and value pair. The unit of work in MapReduce is a job. During the map phase the input data is divided into input splits for analysis, where each split is an independent task. These tasks run in parallel across the Hadoop cluster. A map function is applied to each input key/value pair, which does some user-defined processing and emits new key/value pairs to intermediate storage to be processed by the reducer. The reducer uses the conquer approach to combine the results. The reduce phase uses the result obtained from the mapper as an input to generate the final result. A reduce function is applied in parallel to all values corresponding to each unique map key and generates a single output key/value pair.
Q.6 How MapReduce framework executes user jobs ? AU : Dec.-18
Ans. : The unit of work in MapReduce is a job. During the map phase the input data is divided into input splits for analysis, where each split is an independent task. These tasks run in parallel across the Hadoop cluster. The reduce phase uses the result obtained from the mapper as an input to generate the final result. The MapReduce takes a set of input <key, value> pairs and produces a set of output <key, value> pairs by supplying data through the map and reduce functions. The typical MapReduce operations are shown in Fig. 5.3.
Fig. 5.3 : MapReduce operations
Every MapReduce program undergoes different phases of execution. Each phase has its own significance in the MapReduce framework.
In the input phase, the large data set in the form of <key, value> pairs is provided as a standard input for the MapReduce program. The input files used by MapReduce are kept on the HDFS (Hadoop Distributed File System) store, which has a standard InputFormat specified by the user.
Once the input file is selected, the split phase reads the input data and divides it into smaller chunks. The split chunks are then given to the mapper. The map operations extract the relevant data and generate intermediate key/value pairs, which are passed to the combiner.
The combiner is used between the mapper and the reducer to reduce the volume of data transfer. It is also known as a semi-reducer, which accepts input from the mapper and passes output key/value pairs to the reducer. The shuffle and sort are the components of the reducer. Shuffling is the process of partitioning and moving the mapped output to the reducer, where intermediate keys are assigned to the reducer. The sort phase is responsible for sorting the intermediate keys on a single node automatically before they are presented to the reducer. The shuffle and sort phases occur simultaneously, where mapped outputs are fetched and merged.
The reducer reduces a set of intermediate values which share a unique key to a smaller set of values. The reducer uses the sorted input to generate the final output. The final output of each MapReduce program is generated as key/value pairs written to an output file, which is written back to the HDFS store.
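To make the submission flow concrete, the following is a minimal, hypothetical driver sketch using the standard org.apache.hadoop.mapreduce API; the HDFS paths are placeholders, and the identity Mapper and Reducer base classes stand in for user-defined map and reduce logic.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SampleJobDriver {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "sample job");      // the unit of work is a job
        job.setJarByClass(SampleJobDriver.class);
        job.setMapperClass(Mapper.class);                   // identity mapper (replace with user-defined map logic)
        job.setReducerClass(Reducer.class);                 // identity reducer (replace with user-defined reduce logic)
        FileInputFormat.addInputPath(job, new Path("/user/demo/input"));    // input splits are created from HDFS files
        FileOutputFormat.setOutputPath(job, new Path("/user/demo/output")); // final key/value pairs written back to HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);   // submits the job and waits for map, shuffle/sort and reduce
    }
}
```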
Q.7 What is Map - reduce ? enlist the features of Map - reduce framework.
Ans. : The Map - reduce is a programming model provided by Hadoop that allows expressing distributed computations on huge amounts of data. It provides easy scaling of data processing over multiple computational nodes or clusters.


Features of MapReduce
The different features provided by MapReduce are explained as follows :
 Synchronization : The MapReduce supports execution of concurrent tasks. When concurrent tasks are executed, they need synchronization. Synchronization is provided by reading the state of each MapReduce operation during execution and by using shared variables for those states.
 Data locality : In MapReduce, although the data resides on different clusters, it appears local to the user's application. To obtain the best result, the code and data of the application should reside on the same machine.
 Error handling : The MapReduce engine provides different fault tolerance mechanisms in case of failure. When tasks are running on different cluster nodes and any failure occurs, the MapReduce engine finds the incomplete tasks and reschedules them for execution on different nodes.
 Scheduling : MapReduce involves map and reduce operations that divide large problems into smaller chunks, which are run in parallel by different machines. So there is a need to schedule different tasks on computational nodes on a priority basis, which is taken care of by the MapReduce engine.
Q.8 Enlist the features of Virtual Box.
Ans. : The VirtualBox provides the following main features :
 It supports a fully paravirtualized environment along with hardware virtualization.
 It provides device drivers from driver stack which improves the performance of
virtualized input/output devices.
 It provides shared folder support to copy data from host OS to guest OS and vice
versa.
 It has latest Virtual USB controller support.
 It facilitates broad range of virtual network driver support along with host, bridge
and NAT modes.
 It supports Remote Desktop Protocol to connect windows virtual machine (guest
OS) remotely on a thin, thick or mobile client seamlessly.
 It has Support for Virtual Disk formats which are used by both VMware and
Microsoft Virtual PC hypervisors.
Q.9 Describe google app engine.


Ans. : Google App Engine (GAE) is a platform-as-a-service cloud computing model


that supports many programming languages. GAE is a scalable runtime environment
mostly devoted to executing web applications. In fact, it allows developers to integrate
third-party frameworks and libraries with the infrastructure still being managed by
Google. It allows developers to use readymade platform to develop and deploy web
applications using development tools, runtime engine, databases and middleware
solutions. It supports languages like Java, Python, .NET, PHP, Ruby, Node.js and Go in
which developers can write their code and deploy it on available google infrastructure
with the help of Software Development Kit (SDK). In GAE, SDKs are required to set up
your computer for developing, deploying, and managing your apps in App Engine.
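As an illustrative sketch (not from this book), a Java application for the App Engine standard environment is ordinarily just a servlet-based web application; the class below is hypothetical, and deployment additionally requires the usual descriptor files and SDK tooling, which are omitted here.

```java
import java.io.IOException;
import javax.servlet.http.HttpServlet;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletResponse;

// Hypothetical servlet that App Engine can route HTTP requests to once deployed.
public class HelloAppEngineServlet extends HttpServlet {
    @Override
    public void doGet(HttpServletRequest req, HttpServletResponse resp) throws IOException {
        resp.setContentType("text/plain");
        resp.getWriter().println("Hello from Google App Engine");
    }
}
```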
Q.10 What are the core components of Google app engine architecture ?
Ans. : The infrastructure for GAE composed of four main components like Google File
System (GFS), MapReduce, BigTable, and Chubby. The GFS is used for storing large
amounts of data on google storage clusters. The MapReduce is used for application
program development with data processing on large clusters. Chubby is used as a
distributed application locking services while BigTable offers a storage service for
accessing structured as well as unstructured data.
Q.11 Enlist the advantages of GFS.
Ans. : Google has designed a distributed file system, named GFS, for meeting its exacting demands of processing a large amount of data. GFS provides a file system
interface and different APIs for supporting different file operations such as create to
create a new file instance, delete to delete a file instance, open to open a named file and
return a handle, close to close a given file specified by a handle, read to read data from a
specified file and write to write data to a specified file.
The advantages of GFS are :
a. Automatic recovery from component failure on a routine basis.
b. Efficient storage support for large - sized files as a huge amount of data to be
processed is stored in these files. Storage support is provided for small - sized files
without requiring any optimization for them.
c. With the workloads that mainly consist of two large streaming reads and small
random reads, the system should be performance conscious so that the small
reads are made steady rather than going back and forth by batching and sorting
while advancing through the file.
d. The system supports small writes without being inefficient, along with the usual
large and sequential writes through which data is appended to files.


e. Semantics that are defined well are implemented.


f. Atomicity is maintained with the least overhead due to synchronization.
g. Provisions for sustained bandwidth is given priority rather than a reduced latency.
Q.12 What is the role of chubby in Google app engine ?
Ans. : Chubby is the crucial service in the Google infrastructure that offers storage and
coordination for other infrastructure services such as GFS and Bigtable. It is a coarse -
grained distributed locking service that is used for synchronizing distributed activities
in an asynchronous environment on a large scale. It is used as a name service within
Google and provides reliable storage for file systems along with the election of
coordinator for multiple replicas. The Chubby interface is similar to the interfaces that
are provided by distributed systems with advisory locks. However, the aim of
designing Chubby is to provide reliable storage with consistent availability.
Q.13 What is Openstack ? Enlist its important components.
Ans. : OpenStack is an open source highly scalable cloud computing platform that
provides tools for developing private, public or hybrid clouds, along with a web
interface for users to access resources and admins to manage those resources.
The different components of Openstack architecture are :
a. Nova (Compute)
b. Swift (Object storage)
c. Cinder (Block level storage)
d. Neutron (Networking)
e. Glance (Image Management)
f. Keystone (Identity management)
g. Horizon (Dashboard)
h. Ceilometer (Metering)
i. Heat (Orchestration)
Q.14 Explain the term “federation in the cloud”
Ans. : Many cloud computing environments present difficulties in creating and managing decentralized provisioning of cloud services, along with maintaining consistent connectivity between untrusted components and fault tolerance. Therefore, to overcome such challenges, the federated cloud ecosystem was introduced by associating multiple cloud computing providers using a common standard. Cloud federation
includes services from different providers aggregated in a single pool supporting three
essential interoperability features like resource redundancy, resource migration, and
combination of complementary resources respectively. It allows an enterprise to

distribute workload around the globe, move data between disparate networks and
implement innovative security models for user access to cloud resources. In federated
clouds, the cloud resources are provisioned through network gateways that connect
public or external clouds with private or internal clouds owned by a single entity
and/or community clouds owned by several co-operating entities.
Q.15 Mention the importance of Transport Level Security (TLS). AU : Dec.-16
Ans. : Transport Layer Security (TLS) is designed to provide security at the transport layer. TLS was derived from a security protocol called Secure Sockets Layer (SSL). TLS ensures that no third party may eavesdrop on or tamper with any message.
The benefits of TLS are :
a. Encryption : TLS/SSL can help to secure transmitted data using encryption.
b. Interoperability : TLS/SSL works with most web browsers, including Microsoft
Internet Explorer and on most operating systems and web servers.
c. Algorithm Flexibility : TLS/SSL provides operations for authentication
mechanism, encryption algorithms and hashing algorithm that are used during
the secure session.
d. Ease of Deployment : Many applications use TLS/SSL transparently, for example on a Windows Server 2003 operating system.
e. Ease of Use : Because TLS/SSL is implemented beneath the application layer, most of its operations are completely invisible to the client.
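As a minimal illustrative sketch (using the standard Java JSSE classes; the host name is a placeholder), an application can open a TLS-protected connection as follows :

```java
import javax.net.ssl.SSLSocket;
import javax.net.ssl.SSLSocketFactory;

public class TlsClientSketch {
    public static void main(String[] args) throws Exception {
        SSLSocketFactory factory = (SSLSocketFactory) SSLSocketFactory.getDefault();
        try (SSLSocket socket = (SSLSocket) factory.createSocket("example.com", 443)) {
            socket.startHandshake();   // negotiates the protocol version, cipher suite and certificates
            System.out.println("Negotiated protocol : " + socket.getSession().getProtocol());
        }                              // closing the socket ends the encrypted session
    }
}
```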
Q.16 Enlist the features of extensible messaging & presence protocol for cloud
computing.
Ans. : The features of extensible messaging & presence protocol for cloud computing
are :
a. It is decentralized and supports easy two-way communication.
b. It doesn’t require polling for synchronization.
c. It has built-in publish subscribe (pub-sub) functionality.
d. It works on XML based open standards.
e. It is perfect for instant messaging features and custom cloud services.
f. It is efficient and scales up to millions of concurrent users on a single service.
g. It supports worldwide federation models
h. It provides strong security using Transport Layer Security (TLS) and Simple
Authentication and Security Layer (SASL).
i. It is flexible and extensible.


Long Answered Questions

Q.1 Give a detailed note on Hadoop framework. AU : Dec.-16


Ans. : Refer section 5.1 and 5.1.1.

Q.2 Explain the Hadoop Ecosystem framework.


Ans. : Refer section 5.1 and 5.1.1.

Q.3 Explain the Hadoop Distributed File System architecture with a diagram.
AU : Dec.-18
Ans. : Refer section 5.2 and 5.2.1.
Q.4 Elaborate HDFS concepts with suitable diagram. AU : May-17
Ans. : Refer section 5.2 and 5.2.1.
OR Illustrate the design of Hadoop file system. AU : Dec.-19
Ans. : Refer section 5.2 and 5.2.1.
Q.5 Illustrate dataflow in HDFS during file read/write operation with suitable
diagrams. AU : Dec.-17
Ans. : The HDFS follows a master-slave architecture using a name node and data nodes. The name node acts as a master while multiple data nodes work as slaves. HDFS is implemented as a block-structured file system where files are broken into blocks of fixed size and stored on Hadoop clusters. The HDFS architecture is shown in Fig. 5.4.

Fig. 5.4 : HDFS Architecture

The components of HDFS are composed of the following elements :


1. Name Node
An HDFS cluster consists of a single name node, called the master server, that manages the file system namespace and regulates access to files by clients. It runs on commodity hardware and stores all metadata for the file system across the clusters. The name node serves as the single arbitrator and repository for HDFS metadata, which is kept in main memory for faster random access. The entire file system namespace is contained in a file called FsImage stored on the name node's file system, while the transaction log records are stored in the EditLog file.
2. Data Node
In HDFS there are multiple data nodes that manage the storage attached to the nodes they run on. They are usually used to store users' data on HDFS clusters. Internally, a file is split into one or more blocks stored across data nodes. The data nodes are responsible for handling read/write requests from clients. They also perform block creation, deletion and replication upon instruction from the name node. A data node stores each HDFS data block in a separate file, and several blocks are stored on different data nodes. The requirement of such a block-structured file system is to store, manage and access file metadata reliably.
The representation of name node and data node is shown in Fig. 5.5.

Fig. 5.5 : Representation of name node and data nodes

3. HDFS Client
In the Hadoop distributed file system, user applications access the file system using the HDFS client. Like any other file system, HDFS supports various operations to read, write and delete files, and operations to create and delete directories. The user references files and directories by paths in the namespace. The user application does not need to be aware that file system metadata and storage are on different servers, or that blocks have multiple replicas. When an application reads a file, the HDFS client first asks the name node for the list of data nodes that host replicas of the blocks of the file. The client then contacts a data node directly and requests the transfer of the desired block. When a client writes, it first asks the name node to choose data nodes to host replicas of the first block of the file. The client organizes a pipeline from node to node and sends the data. When the first block is filled, the client requests new data nodes to be chosen to host replicas of the next block. The choice of data nodes for each block is likely to be different.

4. HDFS Blocks
In general, the user's data is stored in HDFS in terms of blocks. The files in the file system are divided into one or more segments called blocks. The default size of an HDFS block is 64 MB, which can be increased as per need.
A. Read Operation in HDFS
The Read Operation in HDFS is shown in Fig. 5.6 and explained as follows.

Fig. 5.6 : Read Operation in HDFS


1. A client initiates a read request by calling the 'open ()' method of the FileSystem object; it is an object of type DistributedFileSystem.
2. This object connects to the name node using RPC and gets metadata information such as the locations of the blocks of the file. Note that these addresses are of the first few blocks of the file.
3. In response to this metadata request, the addresses of the Data Nodes having a copy of each block are returned back.
4. Once the addresses of the Data Nodes are received, an object of type FSDataInputStream is returned to the client. FSDataInputStream contains DFSInputStream, which takes care of interactions with the Data Nodes and the Name Node. In step 4 shown in the above diagram, the client invokes the 'read ()' method, which causes DFSInputStream to establish a connection with the first Data Node holding the first block of the file.
5. Data is read in the form of streams, wherein the client invokes the 'read ()' method repeatedly. This read () operation continues till it reaches the end of the block.


6. Once the end of a block is reached, DFSInputStream closes the connection and moves on to locate the next Data Node for the next block.
7. Once the client is done with the reading, it calls the close () method.
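As a minimal sketch (not the book's code; the file path is a placeholder), the same read flow can be driven through the public FileSystem API :

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsReadExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();          // picks up core-site.xml / hdfs-site.xml
        FileSystem fs = FileSystem.get(conf);              // DistributedFileSystem when HDFS is the default FS
        try (FSDataInputStream in = fs.open(new Path("/user/demo/input.txt"));   // open() : NameNode returns block locations
             BufferedReader reader = new BufferedReader(new InputStreamReader(in))) {
            String line;
            while ((line = reader.readLine()) != null) {   // read() : stream blocks directly from the DataNodes
                System.out.println(line);
            }
        }                                                  // close() : releases the connection
    }
}
```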

B. Write Operation in HDFS

The Write Operation in HDFS is shown in Fig. 5.7 and explained as follows

Fig. 5.7 : Write operation in HDFS

1. A client initiates the write operation by calling the 'create ()' method of the DistributedFileSystem object, which creates a new file - Step no. 1 in the above diagram.
2. The DistributedFileSystem object connects to the Name Node using an RPC call and initiates new file creation. However, this file create operation does not associate any blocks with the file. It is the responsibility of the Name Node to verify that the file (which is being created) does not already exist and that the client has correct permissions to create a new file. If the file already exists or the client does not have sufficient permission to create a new file, then an IOException is thrown to the client. Otherwise, the operation succeeds and a new record for the file is created by the Name Node.
3. Once a new record in the Name Node is created, an object of type FSDataOutputStream is returned to the client. A client uses it to write data into the HDFS. The data write method is invoked (step 3 in the diagram).

4. FSDataOutputStream contains a DFSOutputStream object, which looks after communication with the Data Nodes and the Name Node. While the client continues writing data, DFSOutputStream continues creating packets with this data. These packets are enqueued into a queue called the DataQueue.
5. There is one more component called DataStreamer which consumes
this DataQueue. DataStreamer also asks Name Node for allocation of new blocks
thereby picking desirable Data Nodes to be used for replication.
6. Now, the process of replication starts by creating a pipeline using Data Nodes. In
our case, we have chosen a replication level of 3 and hence there are 3 Data Nodes
in the pipeline.
7. The DataStreamer pours packets into the first Data Node in the pipeline.
8. Every Data Node in the pipeline stores the packet received by it and forwards the same to the next Data Node in the pipeline.
9. Another queue, 'Ack Queue' is maintained by DFSOutputStream to store packets
which are waiting for acknowledgment from Data Nodes.
10. Once acknowledgment for a packet in the queue is received from all Data Nodes
in the pipeline, it is removed from the 'Ack Queue'. In the event of any Data Node
failure, packets from this queue are used to reinitiate the operation.
11. After the client is done with writing data, it calls the close () method (Step 9 in the diagram). The call to close () results in flushing the remaining data packets to the pipeline, followed by waiting for acknowledgment.
12. Once a final acknowledgment is received, Name Node is contacted to tell it that
the file write operation is complete.
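Similarly, a minimal write sketch (illustrative only; the path and replication value are assumptions) using the public FileSystem API looks like this :

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsWriteExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);
        Path out = new Path("/user/demo/output.txt");          // hypothetical destination path
        try (FSDataOutputStream os = fs.create(out, true)) {   // create() : NameNode records the new file
            os.writeBytes("hello hdfs\n");                     // data is packetized and pipelined to the DataNodes
        }                                                      // close() : flushes remaining packets and waits for acks
        fs.setReplication(out, (short) 3);                     // replication level of 3, as in the pipeline described above
    }
}
```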
Q.6 Discuss MapReduce with suitable diagram. AU : May-17

OR Analyze how MapReduce framework supports parallel and distributed


computing on large data sets with a suitable example. AU : Dec.-19
OR Illustrate the Hadoop implementation of MapReduce framework.
AU : May-19
Ans. : Refer section 5.3.

Q.7 Develop a wordcount application with Hadoop MapReduce programming


model. AU : May-19


Ans. : The MapReduce takes a set of input <key, value> pairs and produces a set of
output <key, value> pairs by supplying data through map and reduce functions. Every
MapReduce program undergoes different phases of execution. Each phase has its own
significance in MapReduce framework. The different phases of execution in
MapReduce are shown in Fig. 5.8 and explained as follows.

Fig. 5.8 : Different phases of execution in MapReduce

Let us take an example of a Word count application where the input is a set of words. The input to the mapper has three sets of words : [Deer, Bear, River], [Car, Car, River] and [Deer, Car, Bear]. These three sets are taken arbitrarily as an input to the MapReduce process. The various stages in MapReduce for the wordcount application are shown in Fig. 5.9.
In the input phase, the large data set in the form of <key, value> pairs is provided as a standard input for the MapReduce program. The input files used by MapReduce are kept on the HDFS (Hadoop Distributed File System) store, which has a standard InputFormat specified by the user.


Fig. 5.9 : Various stages in MapReduce for Wordcount application

Once the input file is selected, the splitting phase reads the input data and divides it into smaller chunks, i.e. [Deer, Bear, River], [Car, Car, River] and [Deer, Car, Bear] as separate sets. The split chunks are then given to the mapper.
The mapper performs the map operations that extract the relevant data and generate intermediate key/value pairs. It reads the input data from the split using a record reader and generates intermediate results like [Deer:1; Bear:1; River:1], [Car:1; Car:1; River:1] and [Deer:1; Car:1; Bear:1]. It transforms the input key/value list into an output key/value list, which is then passed to the combiner.
The shuffle and sort are the components of the reducer. Shuffling is the process of partitioning and moving the mapped output to the reducer, where intermediate keys are assigned to the reducer. Each partition is called a subset, and each subset becomes an input to the reducer. In general, the shuffle phase ensures that the partitioned splits reach the appropriate reducers, where each reducer uses the HTTP protocol to retrieve its own partition from the mapper. The output of this stage would be [Deer:1, Deer:1], [Bear:1, Bear:1], [River:1, River:1] and [Car:1, Car:1, Car:1].
The sort phase is responsible for sorting the intermediate keys on a single node automatically before they are presented to the reducer. The shuffle and sort phases occur simultaneously, where mapped outputs are fetched and merged. It sorts all intermediate results alphabetically, like [Bear:1, Bear:1], [Car:1, Car:1, Car:1], [Deer:1, Deer:1] and [River:1, River:1]. The combiner is used between the mapper and the reducer to reduce the volume of data transfer. It is also known as a semi-reducer, which accepts input from the mapper and passes output key/value pairs to the reducer. The output of this stage would be [Bear:2], [Car:3], [Deer:2] and [River:2].

The reducer reduces a set of intermediate values which share a unique key to a smaller set of values. The reducer uses the sorted input to generate the final output. The final output is written by the reducer using a record writer into the output file with a standard output format, like [Bear:2, Car:3, Deer:2, River:2]. The final output of each MapReduce program is generated as key/value pairs written to an output file, which is written back to the HDFS store. The Word count process using MapReduce, with all phases of execution, is illustrated in Fig. 5.9.
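A minimal WordCount sketch using the standard org.apache.hadoop.mapreduce API is given below (illustrative, not this book's code). The classes compile on their own and can be submitted with a driver like the one sketched earlier under Q.6, after replacing the identity mapper and reducer with these classes.

```java
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

public class WordCount {

    // Map phase : emit (word, 1) for every token in the input split
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, ONE);          // intermediate pair, e.g. (Deer, 1)
            }
        }
    }

    // Reduce phase (also usable as a combiner) : sum the counts for each unique word
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);            // final pair, e.g. (Deer, 2)
        }
    }
}
```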
Q.8 Explain the functional architecture of the Google cloud platform for app
engine in detail.
Ans. : Refer section 5.5.

Q.9 Write a short note on Google file system.


Ans. : Refer section 5.6.1.

Q.10 Explain the functionality of Chubby.


Ans. : Refer section 5.6.3.

Q.11 Explain the significance of Big table along with its working.
Ans. : Refer section 5.6.2.

Q.12 Explain in brief the conceptual architecture of Openstack.


Ans. : Refer section 5.7.1.

Q.13 Write a short note on levels of federation in cloud.


Ans. : Refer section 5.9.


